São Paulo's Image Duplication Problem: The Numbers Exposing a City-Wide Data Crisis
Tens of thousands of duplicate images are clogging municipal databases and costing taxpayers millions — and the numbers reveal how deep the problem runs.

São Paulo's city government is sitting on a digital storage problem that nobody wants to talk about publicly but that auditors have been quietly flagging for months. Duplicate images — identical or near-identical digital files stored multiple times across separate municipal systems — now account for an estimated 30 to 40 percent of total storage consumption in several secretariats, according to infrastructure reviews conducted by the Controladoria Geral do Município in the first quarter of 2026. The redundancy is costing the city real money and slowing down systems that residents depend on every day.
The timing matters. Mayor Ricardo Nunes has staked much of his second-term administrative agenda on the São Paulo Inteligente program, a broad push to digitise city services and centralise data across agencies. But centralisation exposes something that siloed bureaucracies were able to hide: the same photograph, scanned document or satellite image, captured once in the field, often ends up stored six, eight, sometimes twelve times across different departmental servers. Before you can build a smart city, you have to clean the dumb data underneath it.
The problem is most acute in three areas: urban planning documentation held by the Secretaria Municipal de Urbanismo e Licenciamento, public health imaging archived through the Hospital das Clínicas integration network on Avenida Dr. Enéas de Carvalho Aguiar in Cerqueira César, and traffic and surveillance footage routed through the Centro de Controle Operacional da CET on Rua Barão de Itapetininga, near República. Each of those systems grew organically over years, with different procurement cycles and incompatible metadata standards, making automated deduplication difficult.
Storage contracts for municipal cloud and on-premise infrastructure cost São Paulo roughly R$180 million in the 2025 budget cycle, a figure disclosed in the Diário Oficial do Município last November. Independent IT governance analysts who study Brazilian municipal procurement, and who are not named here because they work with clients that bid on city contracts, have written in sector journals that eliminating confirmed duplicate files alone could reduce storage overhead by 20 to 25 percent in large Brazilian municipalities. Applied to São Paulo's baseline, and assuming costs fall in proportion to storage overhead, that range implies potential annual savings of between R$36 million and R$45 million. The city has not published its own savings projection.
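The arithmetic behind that range can be reproduced in a few lines. This is a minimal sketch: the two input figures come from the paragraph above, while the assumption that contract costs shrink in direct proportion to storage overhead is ours, not the city's, and the function name is purely illustrative.

```python
# Figures quoted in the article. Linear cost scaling is an assumption.
ANNUAL_STORAGE_COST_BRL = 180_000_000  # R$180 million, 2025 budget cycle
REDUCTION_RANGE_PERCENT = (20, 25)     # analysts' estimated overhead reduction


def implied_savings(annual_cost: int, reduction_percent: int) -> int:
    """Return the annual savings implied by a percentage cost reduction."""
    return annual_cost * reduction_percent // 100


low, high = (
    implied_savings(ANNUAL_STORAGE_COST_BRL, pct) for pct in REDUCTION_RANGE_PERCENT
)
print(f"R${low // 1_000_000} million to R${high // 1_000_000} million per year")
```

Real contracts rarely scale this cleanly, since fixed fees and minimum commitments would narrow the savings, which is one reason an official projection would be useful.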
The duplication issue also has a latency cost. A January 2026 internal performance review by the Empresa de Tecnologia da Informação e Comunicação do Município de São Paulo (PRODAM) found that query response times on the Nota Fiscal Paulistana database, which stores transaction images and scanned receipts, degraded by an average of 17 percent over the previous 18 months. PRODAM attributed part of that slowdown to index bloat caused by duplicate file records. The Nota Fiscal Paulistana system is used by millions of residents who register purchases at Avenida Paulista-area businesses to claim IPTU credits.
PRODAM launched a deduplication pilot in February 2026, focused initially on the Secretaria de Subprefeituras' photo archive of street-level infrastructure inspections: potholes, broken kerbs and flooded drainage channels in districts such as Penha and in zones along the Santo André border. The pilot used hash-comparison algorithms to identify exact duplicates across 4.2 million stored images. After three months, the agency reported eliminating 1.1 million redundant files, recovering roughly 6 terabytes of primary storage. That is a meaningful but modest start given the scale of the broader problem.
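For readers curious about what hash comparison involves, the idea is to compute a fingerprint of each file's bytes and group files whose fingerprints match. The sketch below is illustrative only: PRODAM has not published which hash function or tooling it used, so SHA-256, the function names and the sample file names are all assumptions.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Fingerprint a file's raw bytes. SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[content_digest(data)].append(name)
    # Only digests shared by more than one file indicate duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical in-memory stand-ins for stored inspection photos.
archive = {
    "penha/pothole_001.jpg": b"\xff\xd8 same image bytes",
    "backup/pothole_001_copy.jpg": b"\xff\xd8 same image bytes",
    "penha/kerb_002.jpg": b"\xff\xd8 different image bytes",
}
duplicate_groups = find_exact_duplicates(archive)
print(duplicate_groups)
```

A check like this only catches files that match byte for byte. Resaving an image at a different resolution changes every byte, which is why exact hashing cannot handle the harder cases.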
The next phase, scheduled for the third quarter of 2026, will attempt near-duplicate detection — images that are slightly different in resolution or metadata but functionally identical in content. That is a harder technical problem and a more expensive one. PRODAM has budgeted R$8.7 million for the expanded deduplication effort, a line item confirmed in the April 2026 supplementary budget approved by the Câmara Municipal.
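One common family of techniques for this is perceptual hashing, which fingerprints what an image looks like rather than its exact bytes. The sketch below implements a simple "average hash" on grayscale pixel grids in plain Python. It is an illustration under stated assumptions: PRODAM has not said which method it will use, and the function names, the 8x8 grid and the distance threshold of 5 are hypothetical choices.

```python
def average_hash(pixels: list[list[int]], size: int = 8) -> int:
    """Reduce a grayscale image to a size*size-bit fingerprint.

    Assumes the image is at least `size` pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # Each bit records whether a cell is brighter than the image average.
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, threshold: int = 5) -> bool:
    """Treat fingerprints within `threshold` bits as the same picture."""
    return hamming_distance(a, b) <= threshold


# Synthetic test images: a 16x16 gradient, the same picture at 32x32,
# and a mirrored gradient standing in for a different photograph.
original = [[x * 16 for x in range(16)] for _ in range(16)]
upscaled = [[v for v in row for _ in range(2)] for row in original for _ in range(2)]
mirrored = [[240 - x * 16 for x in range(16)] for _ in range(16)]

h_original = average_hash(original)
h_upscaled = average_hash(upscaled)
h_mirrored = average_hash(mirrored)
print(is_near_duplicate(h_original, h_upscaled))
print(is_near_duplicate(h_original, h_mirrored))
```

The trade-off lies in the threshold: set it too low and resized copies slip through, too high and distinct photographs are merged. Tuning and validating that boundary is part of what makes this phase harder and costlier than exact matching.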
For residents and businesses that interact with city digital services, the practical upshot is straightforward: faster portals, more reliable document retrieval, and — if the savings projections hold — slightly less pressure on the city's discretionary budget. The deduplication work is unglamorous. It happens in server rooms, not on Paulista. But the numbers behind it are large enough that it deserves more than a footnote in the São Paulo Inteligente rollout story.
Published by The Daily São Paulo