São Paulo's municipal government is sitting on a digital archive problem years in the making. Across dozens of city secretariats, cultural institutions, and public-facing platforms, duplicate images — identical or near-identical photo files stored under different filenames, in different systems, sometimes on different servers — have quietly inflated storage costs, slowed public portals, and made reliable document retrieval a recurring headache for administrators and journalists alike. The reckoning, long deferred, is now formally underway.
The issue matters right now because the federal government's ongoing push toward integrated digital public services, formalised under the Estratégia de Governo Digital 2024–2027 framework, has set compliance benchmarks that cities must meet to access certain federal technology transfer funds. São Paulo, as the country's largest municipal economy and the administrative seat for roughly 12.3 million registered residents, cannot afford to lag. Redundant image data is not merely an aesthetic annoyance inside a content management system — it creates legal exposure when versioning conflicts arise in official records, and it degrades search performance on portals that millions of paulistanos use every week.
How the Mess Was Made
The roots of the problem trace back to at least 2013, when a wave of rushed digitisation projects hit Brazilian city governments ahead of the 2014 World Cup infrastructure push. São Paulo's Secretaria Municipal de Cultura, based in the Centro Histórico near Praça da Sé, launched its own digital cataloguing effort for the city's photographic heritage without a unified metadata standard. Simultaneously, the Empresa de Tecnologia da Informação e Comunicação do Município de São Paulo — known as PRODAM — was building out the backbone of the city's e-government infrastructure, but interoperability between its systems and the cultural secretariat's was never fully resolved.
Throughout the late 2010s, other agencies piled in. The São Paulo Turismo organisation digitised promotional photography for the city's tourism campaigns. The Secretaria Municipal de Urbanismo e Licenciamento scanned aerial survey images tied to zoning applications along corridors like Avenida Faria Lima and the expanded Linha 4-Amarela metro catchment area. Each initiative imported files using its own naming conventions. When staff uploaded the same event photographs (say, images from Virada Cultural on Avenida Paulista, which draws millions of people each May) to multiple platforms, duplicate files proliferated, with no automated detection layer to flag the redundancy.
A 2023 audit commissioned by PRODAM, with results reported in technical documentation the agency published that year, flagged a significant share of files across several municipal content repositories as potential duplicates, a finding that accelerated internal pressure to act. The agency did not release a single headline figure, but it confirmed the problem was systemic rather than isolated.
The Cleanup, and What Comes Next
The current deduplication effort is being coordinated through PRODAM and involves deploying perceptual hashing tools (software that generates a fingerprint for each image based on visual content rather than filename) across the city's cloud-based document management environments. Because the fingerprint reflects what an image looks like, copies that differ only in filename, size or compression produce matching or near-matching fingerprints. The programme began a phased rollout in early 2026 and is expected to process the backlog across priority secretariats by the end of the third quarter.
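For readers curious about what a perceptual fingerprint looks like in practice, here is a minimal sketch of one common variant, the average hash. It is an illustration only: the tools PRODAM is deploying have not been described in detail, and the function names, the 8x8 grid and the distance threshold of 5 are all assumptions made for this example. Images are represented as plain grids of greyscale values so the sketch runs without any image library.

```python
from typing import List

# An image is assumed to be already decoded into rows of 0-255 greyscale values.
Gray = List[List[int]]


def downscale(pixels: Gray, size: int = 8) -> Gray:
    """Shrink an image to size x size by averaging rectangular blocks."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # at least one row per block
        row = []
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if above the mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def likely_duplicates(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Hypothetical rule: treat images as duplicates if few fingerprint bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# A left-to-right gradient, a slightly brighter copy, and an inverted version.
base = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in base]
inverted = [[255 - v for v in row] for row in base]

print(likely_duplicates(base, brighter))  # True
print(likely_duplicates(base, inverted))  # False
```

The filename never enters the calculation, which is the point: two uploads of the same photograph under different names land on the same or a nearby fingerprint. The threshold is a trade-off, since a looser one catches more re-encoded copies but risks merging images that are merely similar.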
For ordinary paulistanos, the practical dividend should arrive in stages. Public portals like SP156, the city's main citizen services platform, and the Biblioteca Digital do Município have both struggled with sluggish image-loading speeds during peak access windows. Lighter, deduplicated repositories should reduce those bottlenecks, though administrators are careful not to overpromise timelines given the technical complexity of merging metadata from incompatible legacy systems.
For organisations that routinely request image records from city archives — press offices, urban researchers, law firms working on property licensing cases along corridors like Avenida Paulista and Rua Augusta — the more immediate benefit will be search accuracy. When duplicate records carry different dates or slightly different descriptions, retrieval results become unreliable. Removing the redundancies means the record you pull is, at minimum, the canonical one.
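How a canonical record gets chosen when duplicates disagree has not been spelled out by the city. The sketch below shows one plausible approach under stated assumptions: records that share a fingerprint are grouped, and a hypothetical tie-breaking rule (earliest date, then the fuller description, then the lowest identifier) picks the survivor. The record fields, the rule and the sample data are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class ImageRecord:
    record_id: str     # hypothetical identifier within a repository
    fingerprint: int   # e.g. a perceptual hash of the image content
    recorded_on: date  # date carried in the record's metadata
    description: str


def pick_canonical(records: List[ImageRecord]) -> ImageRecord:
    """Assumed rule: earliest date wins, then longest description, then lowest id."""
    return min(
        records,
        key=lambda r: (r.recorded_on, -len(r.description), r.record_id),
    )


def canonical_by_fingerprint(records: List[ImageRecord]) -> Dict[int, ImageRecord]:
    """Group records with identical fingerprints and keep one per group."""
    groups: Dict[int, List[ImageRecord]] = defaultdict(list)
    for record in records:
        groups[record.fingerprint].append(record)
    return {fp: pick_canonical(group) for fp, group in groups.items()}


# Invented sample data: two records describe the same image differently.
records = [
    ImageRecord("cultura-001", 0xABCD, date(2019, 5, 19), "Virada Cultural"),
    ImageRecord("turismo-417", 0xABCD, date(2019, 5, 18), "Virada Cultural, Avenida Paulista"),
    ImageRecord("urbanismo-88", 0x1234, date(2021, 3, 2), "Aerial survey"),
]

canonical = canonical_by_fingerprint(records)
print(canonical[0xABCD].record_id)  # turismo-417
```

Grouping on exact fingerprint equality keeps the example short; near-duplicates whose fingerprints differ by a few bits would need a distance-based comparison instead. In an official archive, the losing records would more likely be linked to the canonical one than deleted outright, so that the conflicting dates and descriptions remain auditable.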
The larger lesson São Paulo is absorbing, somewhat belatedly, is that digitisation without governance architecture simply moves analogue chaos into a digital format. The city now has an opportunity to build the metadata standards it skipped the first time around — if the political will to fund PRODAM at the necessary scale holds through the next municipal budget cycle, expected to be debated at the Câmara Municipal on Viaduto Jacareí in late 2026.