São Paulo's municipal government holds an estimated 4.7 million digital image files spread across at least eleven departmental servers, and a significant portion of those files are exact or near-exact duplicates. That is the operational reality that information-management specialists contracted by the Secretaria Municipal de Gestão have been documenting since late 2024: a problem that eats into the server infrastructure budget, slows archival searches and, in several documented cases, has caused the wrong version of a photograph to appear in official communications.
The story of how the city arrived at this point is not dramatic. It is the slow accumulation of two decades of uncoordinated digitisation, departmental silos and procurement decisions made in isolation. Understanding that history matters now because the Prefeitura de São Paulo, under Mayor Ricardo Nunes, has included image-library consolidation as a line item in the 2026 administrative modernisation programme — meaning public money is about to be committed to solving a problem that public decisions created.
Digitisation Without a Plan
The roots go back to roughly 2003 and 2004, when city departments began converting analogue photo archives to digital formats. The Arquivo Histórico Municipal, located near Largo do Paissandu in the Centro district, held glass-plate negatives and silver-gelatin prints dating to the nineteenth century. Those were scanned by at least three separate contractors over a ten-year period, each using different resolution standards and file-naming conventions. Nobody consolidated the outputs into a single repository. The Arquivo ended up with multiple scans of the same historical image — sometimes at three different resolutions, sometimes with different colour profiles, all saved as separate files with no parent-child metadata linking them.
The same pattern repeated across the city's communications infrastructure. When the Centro de Processamento de Dados do Município de São Paulo, the CPD-SP, migrated to cloud-hybrid storage in 2019, departments transferred their local drives wholesale rather than cleaning them first. Photographs used in press releases, campaign materials for Avenida Paulista events and flood-response documentation from the Córrego Tamanduateí basin all arrived on the new servers carrying years of accumulated redundancy. Compression artefacts from earlier format conversions meant that images that looked identical were technically distinct files with different checksums, defeating simple hash-based deduplication tools, which only match byte-for-byte copies.
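Why checksum matching falls short can be shown with a minimal Python sketch. It is illustrative only: the file names and byte contents are invented stand-ins, and SHA-256 is an assumed choice of checksum, not a detail taken from the city's tooling.

```python
import hashlib
from collections import defaultdict


def group_by_checksum(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only groups holding more than one file, i.e. exact duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical stand-ins for image data: the third file differs from the
# other two by a single byte value, as a re-compressed copy might.
original = bytes([10, 20, 30, 40] * 64)
recompressed = bytes([10, 21, 30, 40] * 64)

files = {
    "press/foto_001.jpg": original,
    "backup/foto_001_copy.jpg": original,
    "web/foto_001_small.jpg": recompressed,
}

duplicates = group_by_checksum(files)
print(duplicates)
```

Only the two byte-identical files end up grouped together. The re-compressed copy gets a different checksum and is missed, even though a viewer would call it the same photograph.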
By 2022, the Coordenadoria de Tecnologia e Inovação estimated internally that storage waste attributable to duplicate or near-duplicate media files was running at around 18 percent of total allocation — a figure that, applied to the CPD-SP's contracted server capacity, represented a non-trivial annual expenditure. Procurement of additional storage to accommodate the bloat proceeded regardless.
What the 2026 Programme Is Supposed to Fix
The current administrative modernisation push draws on a methodology that combines perceptual hashing — a technique that identifies visually similar images even when file checksums differ — with manual curatorial review by archivists. The Instituto de Pesquisas Tecnológicas, the IPT on Avenida Professor Almeida Prado in Cidade Universitária, is among the technical partners named in the programme's tender documentation published in the Diário Oficial do Município earlier this year.
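The idea behind perceptual hashing can be sketched in a few lines of Python using one simple variant, an average hash. The tender documentation is not quoted here as specifying any particular algorithm, so the function names, the 4x4 brightness grids (standing in for images already shrunk and converted to greyscale) and the distance threshold are all assumptions made for illustration.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Return one bit per pixel: 1 if brighter than the grid's mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_like_duplicate(a, b, max_distance: int = 2) -> bool:
    # max_distance is an assumed threshold; a real pipeline would tune it
    # and pass borderline pairs to an archivist for review.
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical greyscale grids: scan_b is scan_a with slight noise, as two
# scans of the same print might be; unrelated is a different picture.
scan_a = [[200, 200, 50, 50], [200, 200, 50, 50],
          [50, 50, 200, 200], [50, 50, 200, 200]]
scan_b = [[198, 203, 52, 47], [201, 199, 49, 53],
          [48, 52, 202, 197], [51, 49, 199, 204]]
unrelated = [[10, 20, 30, 40], [50, 60, 70, 80],
             [90, 100, 110, 120], [130, 140, 150, 160]]

print(looks_like_duplicate(scan_a, scan_b))     # True
print(looks_like_duplicate(scan_a, unrelated))  # False
```

Because the hash is built from the overall pattern of light and dark rather than exact byte values, small differences in compression or scanning leave it unchanged, which is what lets it catch the near-duplicates that checksums miss.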
The work is being piloted first in two departments: the Secretaria de Cultura and the Secretaria de Desenvolvimento Urbano. Both generate high volumes of photographic content — the former through events at venues from the Sala São Paulo to the Casa das Rosas on Avenida Paulista, the latter through construction-monitoring photography across the city's perpetually expanding peripheral districts.
For residents and civil-society organisations that routinely request images under the Lei de Acesso à Informação, the practical consequence of the cleanup should eventually be faster response times and less confusion about which photograph represents the authoritative record of a given event or infrastructure project. Several freedom-of-information researchers working with institutions in Pinheiros have noted informally that duplicate files have complicated requests in the past, with departments sometimes providing different versions of the same image without explanation.
The consolidation timeline runs through December 2026. Whether the pilot produces a replicable model that can be rolled out across all eleven server environments — including the politically sensitive databases held by the Secretaria de Segurança Urbana — will depend on budget decisions in the 2027 municipal spending review. That review begins in September.