São Paulo's municipal government is sitting on hundreds of thousands of duplicate digital images — photographs, maps, infrastructure scans, and event records — spread across at least a dozen separate departmental servers, with no unified system to detect or eliminate the redundancies. The problem, years in the making, has now reached a breaking point as the city prepares to consolidate its data infrastructure under the Programa São Paulo Inteligente, the smart-city initiative overseen by the Secretaria Municipal de Inovação e Tecnologia.
The duplication crisis matters for straightforward reasons: storage costs money, disorganised records slow emergency response, and a city dealing with chronic flooding in areas like Itaquera and Jardim Pantanal cannot afford administrative friction when engineers need accurate, up-to-date drainage maps in a hurry. Officials have known about the redundancy problem since at least 2022, when an internal audit flagged it during a review of the GeoSampa geospatial platform maintained by the city's urban planning arm, SMUL. But budget cycles, the 2024 municipal election, and competing infrastructure priorities kept any serious fix off the table.
A Fragmented History of Municipal Data
The roots of the problem go back to the mid-2000s, when individual secretariats began digitising their own records independently, with no shared naming convention, no metadata standard, and no central repository. The Secretaria de Obras, for instance, built its own image archive for construction-site documentation. The Secretaria de Saúde maintained a parallel system for health-facility imagery. CET, the Companhia de Engenharia de Tráfego, operated yet another database of camera stills and incident photographs from the city's traffic-monitoring network, which covers more than 900 intersections across the city.
By 2019, when the city began moving records onto cloud infrastructure through contracts with providers operating data centres in Tamboré, in Barueri, the volume of uploaded files exploded, and so did the duplication. Files migrated from legacy servers were often copied over without any deduplication pass. The same photograph of a burst water main on Avenida Rebouças, for example, might exist in four separate directories under different filenames, each uploaded by a different department responding to the same incident.
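To illustrate why filenames alone cannot reveal this kind of duplication, here is a minimal Python sketch that groups files by a SHA-256 digest of their contents. It assumes the files have already been read into memory as a mapping of path to bytes; the function name `group_exact_duplicates` and the sample paths are hypothetical, not drawn from any city system. Exact hashing catches byte-identical copies such as the four Rebouças uploads, but it misses a copy that was resized or re-saved in another format, which is why the proposal relies on perceptual hashing instead.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def group_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group paths whose contents are byte-for-byte identical.

    Filenames and directories play no part in the comparison: only the
    SHA-256 digest of each file's bytes is used as the grouping key.
    In real use the bytes would be read from disk (hypothetical here).
    """
    by_digest: Dict[str, List[str]] = defaultdict(list)
    for path, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    # Only digests shared by more than one path indicate duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    photo = b"\xff\xd8 stand-in JPEG bytes \xff\xd9"  # placeholder content
    archive = {
        "obras/reboucas_vazamento.jpg": photo,
        "saude/incidente_0412.jpg": photo,
        "cet/cam_rebouças_still.jpg": photo,
        "smul/foto_ocorrencia.jpg": photo,
        # A re-encoded copy has different bytes, so exact hashing misses it.
        "cet/reboucas_reduzida.png": b"\x89PNG re-encoded bytes",
    }
    for group in group_exact_duplicates(archive):
        print(group)
```

The re-encoded PNG stays ungrouped even though it shows the same scene, which is the gap perceptual hashing is meant to close.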
The direct financial cost is real but modest. Cloud storage for unstructured data, the category that includes images, runs at roughly R$0.08 to R$0.15 per gigabyte per month on contracts of the kind São Paulo typically signs, according to publicly available pricing from the major providers operating in Brazil. Estimates inside the Secretaria de Inovação suggest the city holds upward of 40 terabytes of image data. At those rates, storing 40 terabytes (about 40,000 gigabytes) costs roughly R$38,000 to R$72,000 a year, so the redundant share, whatever its size, accounts for only part of that sum: on the order of tens of thousands of reais a year in direct storage fees. That is still money that could otherwise go toward drainage work in the Zona Leste or bus shelters along Avenida Celso Garcia.
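The storage figures in the paragraph above can be checked with a few lines of Python. This sketch assumes decimal units (1 TB = 1,000 GB), flat per-gigabyte pricing, and no charges for replication, backups, or data transfer. The function names are illustrative, and the 30% duplicate share in the example is a placeholder, not an estimate.

```python
# Back-of-envelope storage cost using the per-GB prices and volume cited above.
# Assumptions: decimal units, flat monthly per-GB pricing, no tiering,
# replication, backup, or egress charges.
TB_IN_GB = 1_000
MONTHS_PER_YEAR = 12


def annual_storage_cost(terabytes: float, price_per_gb_month: float) -> float:
    """Annual cost in reais of storing `terabytes` at a flat monthly per-GB price."""
    return terabytes * TB_IN_GB * price_per_gb_month * MONTHS_PER_YEAR


def redundancy_cost(terabytes: float, price_per_gb_month: float, duplicate_share: float) -> float:
    """Annual cost attributable to duplicates, given the redundant fraction (0 to 1)."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    return annual_storage_cost(terabytes, price_per_gb_month) * duplicate_share


if __name__ == "__main__":
    low = annual_storage_cost(40, 0.08)
    high = annual_storage_cost(40, 0.15)
    print(f"All 40 TB: R${low:,.0f} to R${high:,.0f} per year")
    # Hypothetical 30% duplicate share, for illustration only.
    print(f"30% duplicates at the high rate: R${redundancy_cost(40, 0.15, 0.30):,.0f} per year")
```

At both ends of the price range, the full 40 terabytes come to R$38,400 to R$72,000 a year, so any duplicate share costs proportionally less.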
The Mechanism of the Fix — and What Comes Next
The proposed solution centres on automated duplicate-image replacement. Software scans existing archives using perceptual hashing, a technique that reduces each image to a compact fingerprint so that visually identical or near-identical images can be matched regardless of filename, format, or minor changes such as resizing and recompression. It then flags the duplicates and replaces redundant copies with a single canonical version stored in a master repository accessible to all departments. The technology is well understood and widely deployed by large media organisations and e-commerce platforms; applying it to a fragmented government archive is the harder part.
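The proposal does not specify which perceptual hash the pilot would use. As a sketch, assuming the simple average-hash (aHash) variant, the Python below shrinks a grayscale image to an 8×8 grid by block-averaging, sets one bit for each cell brighter than the grid's mean, and treats two images as duplicates when their 64-bit fingerprints differ in at most `max_distance` bits. The threshold of 5, the hypothetical helper `plan_replacements`, and the rule of choosing the alphabetically first path as canonical are all illustrative assumptions; a production system would decode real image files with an imaging library and would more plausibly prefer the highest-resolution or earliest-uploaded copy as the canonical version.

```python
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixels, 0-255, row-major


def average_hash(pixels: Grid, hash_size: int = 8) -> int:
    """Average hash (aHash): block-average down to hash_size x hash_size cells,
    then emit one bit per cell, 1 if the cell is brighter than the mean."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells: List[float] = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def plan_replacements(hashes: Dict[str, int], max_distance: int = 5) -> Dict[str, str]:
    """Map each redundant path to the canonical path it should be replaced by.

    Greedy clustering: paths are visited in sorted order; a path within
    max_distance bits of an existing canonical joins it, otherwise it
    becomes a new canonical image.
    """
    canonical: List[str] = []
    plan: Dict[str, str] = {}
    for path in sorted(hashes):
        match = next(
            (c for c in canonical if hamming(hashes[c], hashes[path]) <= max_distance),
            None,
        )
        if match is None:
            canonical.append(path)
        else:
            plan[path] = match
    return plan


if __name__ == "__main__":
    base = [[x * 4 for x in range(64)] for _ in range(64)]        # horizontal gradient
    brighter = [[min(255, v + 10) for v in row] for row in base]  # same scene, brightened
    inverted = [[255 - v for v in row] for row in base]           # a different image
    hashes = {
        "obras/reboucas_001.jpg": average_hash(base),
        "cet/incidente_7781.png": average_hash(brighter),
        "saude/ubs_fachada.jpg": average_hash(inverted),
    }
    print(plan_replacements(hashes))
```

In this toy run, the brightened copy hashes identically to its source and is mapped onto it, while the inverted image lands 64 bits away and remains a canonical image of its own.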
IPT, the Instituto de Pesquisas Tecnológicas do Estado de São Paulo, has been consulted on the technical specifications. Under the timeline currently under discussion there, a pilot phase covering the GeoSampa and CET archives is expected to begin in the third quarter of 2026. Full citywide rollout, contingent on budget approval in the next fiscal cycle, is targeted for 2027.
For residents and businesses that rely on city data, such as the startups clustered around Cubo Itaú in Vila Olímpia or the urban planners who work daily with GeoSampa layers, the practical upshot is better data hygiene and faster query times. For the city, it means a leaner, cheaper, more defensible digital infrastructure. The hard part, as always in São Paulo's administrative machinery, will be getting a dozen secretariats to agree on who controls the master archive and who signs off on what gets deleted.