São Paulo's Digital Archives Face a Reckoning Over Duplicate Images: The Key Decisions Ahead
A growing crisis of redundant visual data is forcing city institutions, tech firms and cultural repositories to choose how—and how fast—to clean house.

Tens of millions of duplicate image files are clogging the digital infrastructure of São Paulo's public institutions, and administrators at city hall and major cultural organisations now face a hard deadline to decide what to do about them. The problem is no longer abstract: the Prefeitura de São Paulo's Secretaria Municipal de Gestão, which oversees municipal data systems, has flagged the redundancy crisis as a priority item for the second half of 2026, after internal audits revealed that storage costs had grown significantly faster than the volume of genuinely new digital content.
Why now? São Paulo's push to digitise public records accelerated sharply after the 2021 launch of the SP156 platform and subsequent expansions under Mayor Ricardo Nunes, who tied his second-term efficiency agenda to cutting municipal IT overhead. Duplicate images—the same photograph, scanned document or infographic stored under multiple filenames across different departments—have quietly become one of the largest single drains on server capacity managed by PRODAM, the city's official technology company, headquartered on Rua Líbero Badaró in the Centro histórico. The redundancy does not just waste money; it complicates public-records requests and slows the city's own search and retrieval systems.
The cultural sector is equally exposed. The Museu de Arte de São Paulo, on Avenida Paulista, and the Instituto Moreira Salles, with its photography archive hub in Higienópolis, both maintain vast digital image libraries that have grown organically over years of scanning donations and digitising historical collections. Neither institution has publicly disclosed the exact scale of its duplicate problem, but specialists in digital preservation who work with Latin American collections describe the challenge as systemic across the region's largest repositories. Without a standardised deduplication protocol, the same high-resolution image of a 1950s Paulistano street scene can exist in dozens of slightly different file versions, each treated as a unique record requiring individual metadata and backup.
Two broad approaches are now on the table for São Paulo's institutions. The first is automated hash-based deduplication—software that generates a unique fingerprint for each image file and flags exact or near-exact matches for deletion or consolidation. PRODAM has piloted this method in at least one municipal department since early 2026, though the city has not published results. The second approach is manual curation combined with AI-assisted similarity detection, which is slower but better suited to cultural archives where two nearly identical photographs may have distinct historical value depending on their provenance metadata.
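The first approach can be illustrated with a minimal sketch. It assumes a plain directory tree of image files and uses SHA-256 as the fingerprint. The function names `file_digest` and `find_exact_duplicates` are hypothetical and are not drawn from PRODAM's pilot, whose tooling has not been published. A cryptographic hash of this kind only catches byte-for-byte copies; near-exact matches, such as a re-saved or resized scan, would need a perceptual hash or a similarity model instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` that share identical bytes, whatever their filenames.

    Only groups with two or more files are returned. Nothing is deleted here:
    the groups are candidates for review or consolidation.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates(Path("some/archive"))` on a hypothetical archive folder would return one list per set of identical files. Flagging rather than deleting leaves room for the provenance checks that cultural archives need.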
The cost difference is real. Automated deduplication tools licensed at enterprise scale can run from R$80,000 to R$400,000 per year for a mid-sized public institution, depending on storage volume and vendor. Manual curation projects at comparable archives in cities like New York and London have taken two to four years and required dedicated archivist teams. São Paulo's tech unicorn ecosystem, several of whose companies operate data management platforms out of offices in the Vila Olímpia and Faria Lima corridor, has begun marketing hybrid solutions to both public and private clients, pitching faster timelines and lower costs than pure manual approaches.
The most immediate decision point falls in August 2026, when PRODAM is expected to present its storage rationalisation plan to the Secretaria de Gestão ahead of the municipal budget cycle. That plan will determine whether the city commits to a citywide deduplication standard or leaves each secretaria to solve the problem independently—the latter outcome being, by most technical assessments, the more expensive one in the long run.
For cultural institutions, the pressure is different but equally urgent. The Arquivo Público do Estado de São Paulo, located on Rua Voluntários da Pátria in Santana, is understood to be in early-stage talks with federal counterparts about adopting a shared deduplication standard that could eventually align state and national digital archives. Whether that alignment happens before or after individual institutions make their own procurement decisions will shape São Paulo's digital heritage infrastructure for the next decade.
Civic technologists and archivists watching the process say the real risk is not choosing the wrong tool—it is the city and its institutions making incompatible choices that lock in fragmentation for years. The window to coordinate is open now, but it will not stay open indefinitely once budget allocations are locked and vendor contracts are signed.
About this article
Published by The Daily São Paulo