São Paulo's Digital Archive Crisis: The Hidden Scale of Duplicate Images Clogging City Systems
New data reveals how tens of millions of redundant image files are draining public storage budgets and slowing down government platforms across the city.

São Paulo's municipal digital infrastructure is carrying a weight most residents never see. Across platforms managed by the Secretaria Municipal de Inovação e Tecnologia, technical audits conducted in the first half of 2026 identified more than 40 million duplicate image files distributed across servers handling everything from urban planning records to public health databases — a problem that costs the city an estimated R$12 million annually in unnecessary cloud storage fees alone.
The issue matters now because the Nunes administration is midway through a R$400 million digital modernisation push that includes migrating legacy systems to hybrid cloud architecture. Carrying millions of redundant files into a new infrastructure compounds costs and slows processing times at exactly the moment the city is trying to accelerate services. The Programa São Paulo Digital, launched in March 2025, was supposed to streamline data governance before the migration window closes at the end of 2026. Duplicate image data is testing that deadline.
The problem is not abstract. At the Centro de Operações São Paulo — the city's main data nerve centre, located in Mooca — staff running quality checks on urban mobility datasets found the same drone-survey images of Marginal Pinheiros flood barriers stored, in some cases, up to 23 times across different departmental folders. The Instituto de Tecnologia e Sociedade do Brasil, which has a research partnership with the city, flagged the same pattern in health-record imaging archives linked to the Rede Hora Certa clinics across the Zona Leste, where patient intake photographs were being duplicated each time a case was transferred between units without automated deduplication protocols in place.
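Byte-for-byte copies of the kind described above, where the same file sits in several departmental folders, can be found without looking at the image content at all. The following is a minimal sketch, not the city's actual tooling: the function names and the folder layout are hypothetical, and it assumes the copies are exact, so resized or recompressed versions would not be caught.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large images are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep hashes seen more than once."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_exact_duplicates(Path("/some/shared/drive"))` (a placeholder path) would return one entry per distinct file that exists in more than one place, listing every location where it was found.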
The scale becomes clearer when broken down by file type. JPEG and PNG files make up roughly 78 percent of the redundant data volume, according to internal technical documentation circulated within the Secretaria. Each duplicate image costs, on average, R$0.023 per month to store on contracted cloud infrastructure: trivial individually, catastrophic at volume. At 40 million redundant files, that adds up to roughly R$920,000 per month, or just over R$11 million per year, before accounting for bandwidth costs triggered whenever those files are accessed or backed up.
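The storage arithmetic can be reproduced in a few lines. This sketch uses only the two figures cited above; the function and constant names are illustrative, and bandwidth and backup costs are left out, as they are in the article's estimate.

```python
DUPLICATE_FILES = 40_000_000      # redundant image files, as cited
COST_PER_FILE_PER_MONTH = 0.023   # R$ per duplicate per month, as cited


def storage_cost(files: int, unit_cost: float) -> tuple[float, float]:
    """Return (monthly, annual) storage cost in reais."""
    monthly = files * unit_cost
    return monthly, monthly * 12


monthly, annual = storage_cost(DUPLICATE_FILES, COST_PER_FILE_PER_MONTH)
print(f"Monthly: R${monthly:,.0f}")   # Monthly: R$920,000
print(f"Annual:  R${annual:,.0f}")    # Annual:  R$11,040,000
```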
Globally, this is a recognised problem. A 2024 study by the International Data Corporation estimated that between 30 and 40 percent of enterprise image storage worldwide consists of exact or near-exact duplicates. São Paulo's figures fall squarely within that range, which means the city is not an outlier, but it also means the tools to fix it already exist and are in commercial deployment elsewhere. Cities including Barcelona and Seoul have deployed automated perceptual hashing systems (software that fingerprints what an image looks like, so visually identical copies match even when file names, formats or compression settings differ), cutting duplicate image volumes by more than 60 percent within 18 months of implementation.
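To give a sense of how perceptual hashing works, here is a sketch of one of the simplest variants, usually called average hash. It is an illustration only: nothing here is known about the systems Barcelona or Seoul actually run. The code assumes each image has already been decoded and shrunk to an 8x8 grid of grayscale values (0 to 255), and the match threshold of 5 differing bits is an arbitrary choice for the example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def looks_duplicate(a: list[list[int]], b: list[list[int]],
                    max_distance: int = 5) -> bool:
    """Treat two grids as the same picture if their fingerprints nearly match."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 8x8 grids standing in for downscaled images (made-up data).
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[v + 3 for v in row] for row in original]   # same picture, slightly lighter
inverted = [[252 - v for v in row] for row in original]   # a different picture

print(looks_duplicate(original, brightened))  # True
print(looks_duplicate(original, inverted))    # False
```

Because the fingerprint depends on relative brightness rather than raw bytes, the slightly lighter copy still matches, which a byte-level checksum would miss.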
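For São Paulo, the math on a comparable intervention is straightforward. A deduplication system deployment across all municipal servers, priced by comparable procurement contracts in Brazil's federal government, runs between R$3 million and R$5 million upfront. At the roughly R$920,000 a month the duplicates now cost to store, payback would arrive in under six months if the system cleared the whole backlog; at the top of that price range and the 60 percent reduction reported elsewhere, it would take about nine months.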
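The payback estimate depends on how much of the duplicate backlog a system actually removes, which the figures above do not specify. The sketch below checks two scenarios chosen for illustration: full removal, and the 60 percent reduction reported for other cities. It treats savings as starting immediately, which is a simplification, and the function name is hypothetical.

```python
MONTHLY_DUPLICATE_COST = 40_000_000 * 0.023   # R$ per month, from the cited figures


def payback_months(upfront: float, monthly_cost: float, share_removed: float) -> float:
    """Months until cumulative storage savings equal the upfront price."""
    return upfront / (monthly_cost * share_removed)


for upfront in (3_000_000, 5_000_000):
    for share in (1.0, 0.6):
        months = payback_months(upfront, MONTHLY_DUPLICATE_COST, share)
        print(f"R${upfront:,} upfront, {share:.0%} removed: {months:.1f} months")
```

Under these assumptions the result ranges from a little over three months (lowest price, full removal) to about nine months (highest price, 60 percent removal).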
The Secretaria Municipal de Inovação e Tecnologia has reportedly included deduplication tooling in a procurement notice expected to be published in the Diário Oficial do Município before August 2026, though the specific scope has not been confirmed publicly. Civil society groups monitoring the Programa São Paulo Digital — including the think tank Centro de Estudos em Governança Digital, which tracks city tech spending from its office near Avenida Paulista — have pushed for the tender to include open-source deduplication options alongside proprietary bids, arguing that transparency requirements under Brazil's Lei de Acesso à Informação demand auditability of how public image data is deduplicated and which files are permanently deleted.
For residents interacting with city services, the practical consequence of unchecked duplication is slower load times on platforms like SP156, the main citizen services portal, and longer processing delays when submitting image-based documents — construction permits, social assistance applications, health intake forms. Cleaning the data is not glamorous governance. The numbers say it is overdue.
About this article
Published by The Daily São Paulo