São Paulo's city government is sitting on a digitisation problem that accumulated quietly for nearly twenty years. Municipal databases managed by the Secretaria Municipal de Gestão hold hundreds of thousands of duplicate image files — scanned documents, urban survey photographs, and planning maps that were uploaded multiple times by different departments, often without any shared naming convention or deduplication protocol. The result is a ballooning storage cost and a retrieval crisis that city archivists have been flagging internally since at least 2022.
The issue matters now because Mayor Ricardo Nunes's administration is pushing a broader smart-city agenda, including the expansion of the Centro de Operações São Paulo on Rua Líbero Badaró, which depends on clean, searchable image data to function. Redundant files clog the pipelines that feed real-time urban monitoring. Every duplicated flood-sensor photograph from the Tietê river basin, for example, slows query times and distorts the analytical dashboards that emergency teams in the sala de crise rely on during the city's notoriously destructive summer rains.
How the Duplication Built Up Over Two Decades
The roots run back to the early 2000s, when individual secretariats began scanning paper records independently, each buying its own storage infrastructure and writing its own file management procedures. The Secretaria Municipal de Habitação digitised favela survey images in Vila Madalena and Heliópolis on platforms separate from those used by SVMA, the environment secretariat, which was simultaneously photographing Atlantic Forest remnants in Parelheiros. The departments did not coordinate with one another, and no common metadata standard existed.
A 2018 federal directive under the Arquivo Nacional framework pushed Brazilian municipalities to adopt shared digital preservation standards, but implementation in São Paulo was piecemeal. Budget cycles meant that some departments upgraded their systems in 2019 while others were still running legacy software from 2011. When pandemic-era digitisation grants arrived in 2020 and 2021, agencies rushed to scan physical backlogs without first auditing what was already in the system. Industry analysts covering public-sector IT in Brazil have estimated that major city governments in the country spend between 15 and 30 percent of their digital storage budgets on redundant data — though no official São Paulo-specific figure has been published by the city.
The Arquivo Histórico Municipal, located on Rua Presidente Prudente in Mooca, has the most acute version of this problem. Its collection of approximately 1.2 million historical images — many from the original Secretaria de Obras records of early 20th-century urban expansion — was digitised across three separate projects between 2004 and 2017. Librarians there have said publicly, in interviews published by trade journals covering Brazilian archival practice, that cross-referencing between the three datasets is still largely a manual process.
What a Fix Actually Requires
Deduplication at this scale is not a weekend job. Automated hash-matching tools can identify bit-for-bit identical copies quickly, but the harder problem is near-duplicate images — the same document scanned twice at different resolutions, or the same aerial photograph of Interlagos cropped differently for different departmental reports. That requires perceptual hashing algorithms and, ultimately, human review to confirm which version to keep and which provenance metadata to attach to the surviving file.
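The two stages described above can be sketched in a few lines of Python. This is an illustration only, not the city's tooling: the function names are hypothetical, images are assumed to be already decoded into rows of grayscale values, and the "average hash" shown is just one simple perceptual hash among several.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files):
    """Group file names whose bytes are identical, using SHA-256 digests.

    `files` maps a file name to its raw bytes (hypothetical input shape).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def average_hash(pixels, size=8):
    """Compute a size*size-bit average hash of a grayscale image.

    `pixels` is a rectangular list of rows of values 0-255 and must be at
    least `size` pixels in each dimension. The image is reduced to a
    size x size grid of block means; each bit records whether a block is
    brighter than the mean of all blocks.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")
```

Two scans of the same page at different resolutions should produce hashes a small Hamming distance apart, while unrelated images should differ in many bits. The cut-off between "probably the same" and "different" is a tuning decision, which is why flagged pairs would still go to a human reviewer.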
The city's current contract with its primary cloud infrastructure provider runs through December 2026, which creates a hard deadline. Any serious deduplication and reclassification project needs to complete its first audit phase before that contract is renegotiated, or the city risks locking in inflated storage allocations for another three-year term. Procurement rules under federal Lei 14.133/2021 — the public contracting legislation that replaced the old Lei 8.666 — require new technical specifications to be finalised at least 90 days before a contract renewal tender opens.
For city residents, the practical consequences of inaction are slower responses to public records requests filed through the SP156 platform, longer waits for building permit documents at subprefeituras across the city, and degraded performance in the flood-monitoring tools that matter most when the rain hits in January. The administration has until the end of the third quarter to put a corrective plan on the table, or start explaining to ratepayers why the city is paying twice to store the same photograph of a Consolação street drain.