The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Race to Stamp Out Duplicate Images — Here's Where That Effort Stands This Week

From Vila Madalena to the Paulista Avenue cultural corridor, institutions managing public photo collections are adopting new automated tools to clean up bloated, redundant visual databases.

By São Paulo News Desk · Published 4 July 2026, 3:58 pm

Photo: Frederico Luz on Pexels

São Paulo's major cultural and public institutions spent much of this week confronting a problem that has quietly inflated their digital storage costs for years: thousands of duplicate images clogging archives that were supposed to preserve the city's visual history. The Municipal Secretariat of Culture, which oversees collections spanning the Pinacoteca do Estado and the Municipal Library on Rua da Consolação, confirmed it is mid-rollout of a deduplication audit across its digitised holdings — a process administrators have been pushing since at least early 2025.

The timing matters. São Paulo's tech procurement cycle runs on a fiscal calendar that closes in August, meaning any software contracts for artificial-intelligence-assisted duplicate detection need to be signed, tested, and operational before the mid-year budget window shuts. Institutions that miss the deadline face waiting until 2027 for fresh capital allocation. That bureaucratic pressure is what turned a technical housekeeping task into a genuine news story this week.

What the Problem Actually Looks Like on the Ground

At the Centro Cultural São Paulo, on Rua Vergueiro in Liberdade, archivists have been manually flagging redundant scans since a digitisation push that ran through 2023 and 2024. The centre holds tens of thousands of photographic records covering São Paulo's urban transformation — construction of the Minhocão elevated road, Ibirapuera Park in its various states, and street documentation from floods along the Tietê and Pinheiros rivers. When different digitisation contractors scanned the same physical prints at different resolutions, the result was multiple file versions of a single image, each catalogued separately.

The Instituto Moreira Salles, which operates a São Paulo reading room on Rua Paulistana in Alto de Pinheiros and holds one of Brazil's largest photographic collections, is further along in this process. In the second half of 2024 the institute began applying perceptual hashing to its digital catalogue, a method that generates a compact fingerprint for each image and flags near-identical matches. The technique can detect duplicates even when one version has been resized, lightly cropped, colour-corrected, or saved in a different file format.
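To make the fingerprint idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash", written in plain Python. It is an illustration only and not necessarily the variant the institute uses. The function names, the 8x8 grid size, and the synthetic test images are all assumptions made for the example, and images are represented as plain lists of grayscale values rather than real files.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale pixel values, 0-255


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Reduce an image to size x size cells by averaging blocks of pixels.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Build the fingerprint: one bit per cell, set if the cell is brighter than the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


def gradient(n: int) -> Gray:
    """Synthetic stand-in for a scan: dark on the left, bright on the right."""
    return [[(x * 255) // (n - 1) for x in range(n)] for _ in range(n)]


original = gradient(32)
brighter = [[min(255, v + 20) for v in row] for row in original]  # colour-corrected copy
half_size = gradient(16)                                          # lower-resolution copy
unrelated = [list(col) for col in zip(*original)]                 # different picture

for name, image in [("brighter", brighter), ("half_size", half_size), ("unrelated", unrelated)]:
    print(name, hamming(average_hash(original), average_hash(image)))
```

In practice an archive would pick a distance threshold below which two files are sent for human review. This simple variant tolerates resizing and mild tonal changes, while heavier cropping usually calls for more robust hashing methods.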

For smaller organisations, the economics are blunt. Cloud storage on Brazilian infrastructure typically runs between R$0.10 and R$0.25 per gigabyte per month, depending on the provider and tier. A mid-sized archive carrying 500,000 images with a 15 percent duplication rate, a conservative estimate for institutions that digitised without strict version control, is paying for roughly 75,000 files it does not need. Across a year, that adds up to budget that could fund acquisitions or conservation work.
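The arithmetic behind that estimate can be worked through in a few lines of Python. The image count, duplication rate, and price range come from the figures above. The average file size is not reported anywhere in this story, so the 50 MB used here is a purely hypothetical placeholder that an archive would replace with its own number.

```python
TOTAL_IMAGES = 500_000
DUPLICATION_PERCENT = 15
PRICE_LOW_BRL = 0.10    # R$ per gigabyte per month, low end of the quoted range
PRICE_HIGH_BRL = 0.25   # R$ per gigabyte per month, high end of the quoted range
AVG_FILE_SIZE_MB = 50   # ASSUMPTION: hypothetical average scan size, not from the article


def yearly_cost_brl(files: int, avg_mb: float, price_per_gb_month: float) -> float:
    """Yearly storage cost in reais for `files` files of `avg_mb` megabytes each."""
    gigabytes = files * avg_mb / 1000
    return gigabytes * price_per_gb_month * 12


redundant_files = TOTAL_IMAGES * DUPLICATION_PERCENT // 100
yearly_low = yearly_cost_brl(redundant_files, AVG_FILE_SIZE_MB, PRICE_LOW_BRL)
yearly_high = yearly_cost_brl(redundant_files, AVG_FILE_SIZE_MB, PRICE_HIGH_BRL)

print(f"Redundant files: {redundant_files:,}")
print(f"Yearly cost of duplicates: R${yearly_low:,.2f} to R${yearly_high:,.2f}")
```

Under that assumed file size, the 75,000 redundant files would cost roughly R$4,500 to R$11,250 a year. The result scales directly with file size, so uncompressed master scans would push the figure considerably higher.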

The Software Question and What Comes Next

The Municipal Secretariat of Science, Technology and Innovation, housed in the Parque Tecnológico Anhembi complex in the north zone, has been coordinating with the Secretariat of Culture on a joint procurement framework. The framework, still being finalised as of this week, is expected to cover AI-assisted deduplication tools that would be licensed across multiple municipal bodies rather than purchased separately — a model already used in the city's 2022 consolidation of geographic information systems.

The practical stakes extend beyond cost. São Paulo's flooding crisis, which has drawn sustained attention to the drainage infrastructure along the Marginal Pinheiros, has generated its own photographic documentation problem. Civil defence agencies have been photographing the same affected neighbourhoods — Pinheiros, Vila Sônia, Campo Limpo — with different teams and devices, creating redundant before-and-after sets that slow down damage assessment when staff have to cross-reference overlapping files manually.

For cultural institutions waiting on a procurement decision, the immediate advice from archival consultants familiar with the municipal process is to run internal audits using open-source tools — ExifTool and dupeGuru are both free and widely documented — before the August budget deadline, even if the full AI-platform contract is still months away. Building a clear count of duplicate files now gives administrators a defensible number to take into budget negotiations, and it positions those institutions at the front of the line when the joint licensing framework eventually opens for sign-on.
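For a first rough count, an institution could also tally byte-identical files with nothing more than the Python standard library. The sketch below does not use ExifTool or dupeGuru, and it will not catch the same print scanned at two resolutions; it only finds exact copies. The function names and the folder path in the usage note are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's contents, read in chunks so large scans fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under `root` by content hash, keeping only groups of two or more."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def count_redundant(groups: Dict[str, List[Path]]) -> int:
    """Files that could be dropped if one copy per group is kept."""
    return sum(len(paths) - 1 for paths in groups.values())
```

Calling `count_redundant(find_exact_duplicates(Path("scans")))` on a hypothetical `scans` folder would return the number of surplus copies, which is the kind of figure administrators could bring to a budget meeting.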

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
