São Paulo's municipal and private digital archives are sitting on a time bomb. Hundreds of thousands of duplicate images — photographs of the same potholes on Avenida Paulista, the same flood damage in Heliópolis, the same infrastructure inspections on Marginal Tietê — are clogging government servers, distorting machine-learning datasets and, in some cases, actively misleading urban planning decisions. The question now is who decides what gets deleted, and when.
The issue has moved from a technical nuisance to a governance headache precisely because São Paulo is in the middle of expanding its smart-city ambitions. The Prefeitura de São Paulo is pushing forward with its Programa Cidade Inteligente, which relies on image data gathered from municipal cameras, drone inspections and citizen-submitted reports through the SP156 complaint platform. When that input data contains duplicates, sometimes dozens of copies of a single event, the models built on top of it count one event many times and carry that skew into their outputs. Planners in the Secretaria Municipal de Urbanismo e Licenciamento have been asked to make budget and zoning decisions informed partly by visual evidence that has not been deduplicated in any systematic way.
Where the Backlog Is Worst
The sharpest pressure points are in the city's eastern zone. The subprefecture of São Mateus and the broader Zona Leste corridor have been subject to repeated drone survey campaigns since 2023, particularly after flooding events that prompted emergency documentation. Overlapping missions by different contractors, some commissioned by the Defesa Civil and others by the Secretaria Municipal de Infraestrutura Urbana, produced image sets that were stored separately without cross-referencing. The Instituto de Pesquisas Tecnológicas, the state-linked research body based in Cidade Universitária, has been working on deduplication protocols, but no single authority has been given a mandate to enforce them across agencies.
The private sector adds another layer of complexity. Several of São Paulo's tech unicorns, including logistics and urban-mobility platforms that operate across the metropolitan area from Pinheiros to the neighbouring municipality of Santo André, maintain their own georeferenced image libraries for street-level mapping. These datasets are occasionally licensed to municipal bodies. Without standardised metadata, specifically a shared geotag schema and a timestamp convention, merging those commercial records with public archives routinely produces the duplication problem all over again.
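To make the metadata point concrete, here is a minimal sketch of what a shared convention could look like in practice. It is an illustration under stated assumptions, not any agency's actual schema: the field names (`captured_at`, `lat`, `lon`), the four-decimal rounding and the rule that untagged timestamps are local São Paulo time are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps without an offset are local São Paulo time (UTC-3).
SAO_PAULO = timezone(timedelta(hours=-3))


def normalise_timestamp(raw: str) -> str:
    """Parse an ISO 8601 string and return it in UTC at second precision."""
    dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=SAO_PAULO)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def normalise_geotag(lat: float, lon: float, places: int = 4) -> tuple:
    """Round decimal-degree coordinates (4 places is roughly 11 m of latitude)."""
    return (round(lat, places), round(lon, places))


def merge_key(record: dict) -> tuple:
    """Build a comparable key from a record's capture time and location."""
    when = normalise_timestamp(record["captured_at"])
    return (when, *normalise_geotag(record["lat"], record["lon"]))


# Hypothetical records: one from a municipal archive, one from a licensed dataset.
municipal = {"captured_at": "2024-02-10T14:30:00", "lat": -23.56141, "lon": -46.65588}
commercial = {"captured_at": "2024-02-10T17:30:00+00:00", "lat": -23.561412, "lon": -46.655879}

print(merge_key(municipal) == merge_key(commercial))
```

Matching keys only flag candidates for review. Rounding splits points that sit on either side of a grid boundary, so a real pipeline would likely compare distances rather than rounded values.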
A complicating factor is scale. São Paulo's SP156 platform received more than 2.1 million service requests in 2024, according to data published by the Prefeitura. A significant share of those submissions include attached photos. If even 15 percent of image-backed submissions are duplicates (a conservative estimate by industry standards for crowdsourced platforms of that size), the municipal system could be managing up to roughly 300,000 redundant files from that source alone, every year. The exact figure depends on how many requests actually carry a photo.
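The arithmetic behind that estimate hinges on a number the published figures do not supply: the share of requests that carry a photo. The sketch below works through the calculation for a few hypothetical shares. Only the 2.1 million requests and the 15 percent duplicate rate come from the text above; the photo shares are assumptions chosen for illustration.

```python
def redundant_files(requests: int, photo_share: float, duplicate_rate: float) -> int:
    """Estimate duplicate image files: requests x share with photos x duplicate rate."""
    return round(requests * photo_share * duplicate_rate)


REQUESTS_2024 = 2_100_000   # from the Prefeitura figure cited above
DUPLICATE_RATE = 0.15       # the 15 percent scenario cited above

# Hypothetical photo shares; the real share is not published in the source.
for photo_share in (0.50, 0.75, 1.00):
    estimate = redundant_files(REQUESTS_2024, photo_share, DUPLICATE_RATE)
    print(f"photo share {photo_share:.0%}: about {estimate:,} redundant files")
```

At a 100 percent photo share the estimate is 315,000 files, which is the ceiling for this scenario. A figure near 300,000 therefore implies that almost every request includes an image.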
The Decisions No One Has Made Yet
Three choices will define how this unfolds. First, the city must decide whether deduplication is a centralised function or a per-secretaria responsibility. Centralising it under the Empresa de Tecnologia da Informação e Comunicação do Município de São Paulo, known as PRODAM, is the tidier option technically, but it requires political agreement between departments that have historically guarded their own data infrastructure. PRODAM has the server architecture; it does not yet have the legal mandate.
Second, there is a procurement question. If the city moves toward an automated deduplication pipeline (using perceptual hashing or embedding-based similarity search), that contract will need to go through a licitação, the public tender process, under federal procurement rules. The federal government's recent push to digitise contracting under the new framework introduced in 2023 should, in theory, accelerate that, but municipal IT contracts in São Paulo have historically taken 14 to 22 months from specification to signature.
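For readers unfamiliar with perceptual hashing, the idea can be sketched in a few lines. The example below is a simplified average hash written in plain Python, not a description of any pipeline the city has specified. The function names, the 8x8 grid and the distance threshold of 5 are assumptions for illustration, and the input is assumed to be a grayscale image already decoded into rows of 0 to 255 values.

```python
def average_hash(pixels: list, size: int = 8) -> int:
    """Average hash of a grayscale image given as rows of 0-255 values.

    Assumes the image height and width are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    cells = []
    for r in range(size):
        for c in range(size):
            # Shrink the image by averaging each block of pixels into one cell.
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if brighter than the overall mean, else 0.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: list, b: list, threshold: int = 5) -> bool:
    """Flag two images as likely duplicates (threshold is a hypothetical choice)."""
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold


# Synthetic 16x16 test images: a gradient, a brighter copy, and an inverted copy.
original = [[8 * x + 8 * y for x in range(16)] for y in range(16)]
brighter = [[v + 10 for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]

print(is_near_duplicate(original, brighter))
print(is_near_duplicate(original, inverted))
```

Because the hash records only whether each cell is brighter than the image's own mean, a uniform exposure change leaves it unchanged, while a genuinely different image flips many bits. Embedding-based similarity search follows the same compare-and-threshold pattern but uses learned feature vectors instead of a fixed hash.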
Third, and most consequentially, someone must decide what the retention rules are. Deleting a government image record is not trivial: Brazil's Lei de Acesso à Informação and LGPD both touch on document preservation obligations. The Arquivo Histórico Municipal on Rua Voluntários da Pátria would need to be consulted on archival standards before any mass deletion programme could begin. Get the retention policy wrong and the city risks destroying evidence it may need for litigation, audit or, given the flooding history along the Tietê, disaster-response accountability. The next six months, before the Programa Cidade Inteligente moves into its next procurement phase, are the window to get this right.