São Paulo's Duplicate Image Crisis: The Key Decisions That Will Define What Comes Next
City agencies and tech firms face a hard reckoning over how to clean up duplicated digital records flooding municipal databases — and the clock is ticking.

São Paulo's municipal government is staring down a sprawling data integrity problem that has quietly undermined years of urban planning work: thousands of duplicate images embedded across city databases, from favela mapping projects in Heliópolis to infrastructure records managed by the Secretaria Municipal de Urbanismo e Licenciamento on Rua São Bento. The duplication has created conflicting records, slowed permit approvals and, in at least one documented case reviewed by this newspaper, fed inaccurate aerial data into the city's flood-risk modelling system — a serious liability given São Paulo's chronic drainage failures.
The issue matters now because the Nunes administration is midway through a digitisation push tied to the city's PlanTech 2025-2028 programme, a framework designed to modernise municipal data infrastructure across all 96 subprefeituras. Duplicate images are not a cosmetic glitch. When the same georeferenced photograph of a street drain on Avenida do Estado appears under two different cadastral codes, engineers working on drainage interventions can draw contradictory conclusions. The city lost at least one federal co-financing window in 2025 because submitted documentation contained conflicting photographic records — a fact disclosed in a Tribunal de Contas do Município audit summary published in March 2026.
Two institutions sit at the centre of the decisions that must now be made. The Instituto de Pesquisas Tecnológicas, based in Cidade Universitária on the west side of the city, has been contracted to develop a deduplication protocol for image assets held by the Empresa de Tecnologia da Informação e Comunicação do Município de São Paulo, known as PRODAM. A working group between the two bodies met for the third time in June 2026, according to a PRODAM agenda document obtained by The Daily São Paulo. The core question on the table: whether to run an automated hash-matching sweep across the full archive first, or to prioritise a manual audit of roughly 14,000 flagged records in the flood-zone mapping layer that covers the Tietê and Pinheiros river corridors.
The automated approach would be faster and cheaper — IPT engineers have estimated a full sweep could be completed within 90 days using existing server capacity at the PRODAM data centre in Santo André. The manual audit, by contrast, would likely take until the first quarter of 2027 and require hiring at least 12 specialist contractors. But the automated route carries risk: hash-matching removes exact duplicates but misses near-duplicates — images taken seconds apart that register as distinct files yet represent identical infrastructure conditions. In flood-risk mapping, that distinction matters enormously.
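To make the distinction concrete, here is a minimal Python sketch of the two kinds of matching. It is an illustration only, not the protocol IPT is developing, which this newspaper has not seen. The small pixel grids stand in for image thumbnails already reduced to the same size, and every name in it (`exact_hash`, `average_hash`, `is_near_duplicate`, the `max_bits` threshold) is hypothetical.

```python
import hashlib

# Illustrative stand-in: an image is a grid of grayscale pixels (0-255).
Image = list[list[int]]


def exact_hash(img: Image) -> str:
    """SHA-256 of the raw pixel bytes; any single changed pixel changes it."""
    raw = bytes(p for row in img for p in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(img: Image) -> int:
    """Perceptual fingerprint: one bit per pixel, set if brighter than the mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, max_bits: int = 2) -> bool:
    """Assumed rule: same-sized images whose fingerprints differ in few bits."""
    return hamming(average_hash(a), average_hash(b)) <= max_bits


# Two shots of the same scene taken moments apart (slightly different
# pixel values), plus one unrelated image.
drain_a = [[10, 10, 200, 200] for _ in range(4)]
drain_b = [[12, 11, 198, 201] for _ in range(4)]
other = [[200, 200, 10, 10] for _ in range(4)]

print(exact_hash(drain_a) == exact_hash(drain_b))  # exact match misses the pair
print(is_near_duplicate(drain_a, drain_b))         # perceptual match catches it
print(is_near_duplicate(drain_a, other))           # unrelated image is left alone
```

The trade-off sits in the threshold. Set it too tight and near-duplicates slip through, as with exact matching. Set it too loose and genuinely different photographs of similar-looking infrastructure could be merged, which is the kind of call a manual audit would make case by case.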
The cost differential is not trivial for a city running a 2026 technology budget of approximately R$980 million, of which PRODAM controls a discretionary slice of around R$140 million, according to figures published in the city's Lei Orçamentária Anual for 2026. Hiring a full manual-audit team would consume an estimated R$3.2 million, or roughly 2.3% of that discretionary slice, a figure that PRODAM's leadership has flagged as requiring mayoral sign-off given current fiscal constraints.
The political dimension is real. Paulista Avenue is not just a cultural flashpoint — it is also the address of several fintech and mapping-tech startups that have supply contracts with the city. At least two of those firms, both members of the Associação Brasileira de Startups, have offered proprietary deduplication tools to PRODAM at no upfront cost in exchange for data-sharing agreements. That arrangement has drawn scrutiny from the Câmara Municipal's technology subcommittee, which held a public hearing on municipal data contracts in May 2026.
The decisions ahead follow a tight sequence. PRODAM must submit a deduplication strategy recommendation to the Secretaria de Inovação e Tecnologia by 31 July 2026. If approved, procurement for whichever approach wins out must be completed before the October municipal recess to avoid another lost co-financing cycle. Residents and civil society groups in flood-prone districts — particularly in Grajaú and along the Córrego do Sacomã in the south zone — have the most to lose from further delay. The practical advice for anyone tracking this file: watch the 31 July deadline and the Câmara subcommittee's next scheduled session, currently set for 22 July, where the startup data-sharing proposals are expected to come to a vote.
About this article
Published by The Daily São Paulo