The Daily São Paulo


São Paulo's Duplicate Image Problem: The Numbers Exposing a Hidden Crisis in City Digital Archives

Redundant and repeated images are clogging municipal databases, costing taxpayer money and slowing the tech infrastructure that runs Latin America's largest city.

By São Paulo News Desk · Published 4 July 2026, 3:48 pm

Photo: Matheus Natan on Pexels

São Paulo's municipal digital infrastructure is carrying a growing dead weight. Across city government servers — from the Secretaria Municipal de Urbanismo e Licenciamento on Rua São Bento to the data centres supporting the Prefeitura's GeoSampa mapping platform — duplicate images now account for a measurable share of stored data, creating real costs in storage contracts and processing time. The problem is not abstract. It is measured in terabytes, in reais, and in delayed public services.

The issue has gained urgency in 2026 because São Paulo is midway through a R$1.2 billion digital transformation programme announced under Mayor Ricardo Nunes, which includes the migration of legacy files, many from paper-to-scan drives conducted between 2018 and 2022, into unified cloud environments. When that migration runs into hundreds of thousands of redundant image files, every duplicated JPEG and TIFF becomes a budget line item.

What the Data Actually Shows

Studies of large-scale municipal digitisation projects in comparable megacities have found that between 18 and 34 percent of scanned document images are functional duplicates: either identical files uploaded more than once or near-identical versions with minor compression differences that automated systems fail to catch. Apply the lower end of that range to São Paulo's publicly disclosed archive of roughly 40 million digitised documents, held across platforms including the Arquivo Histórico Municipal on Rua Cantareira and the city's centralised Nota Fiscal Paulistana database, and the result is about 7.2 million redundant image files, or a conservative floor of seven million.

Storage is not free. Enterprise cloud contracts in Brazil, where the main providers include regional data centre operators in Tamboré (Barueri) and hyperscale facilities in Vinhedo, are priced at roughly R$0.08 to R$0.15 per gigabyte per month depending on redundancy tier and contract size. A single uncompressed government-grade scan runs to approximately 2 to 4 megabytes, so seven million such files occupy somewhere between 14 and 28 terabytes. Pairing the smallest volume with the cheapest rate and the largest volume with the dearest, that translates to between R$1,120 and R$4,200 per month in avoidable expenditure, before factoring in bandwidth and indexing overhead.
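
For readers who want to check the sums, the estimate can be reproduced in a few lines of Python. This is a back-of-envelope sketch rather than an official costing: it assumes decimal units (1 GB = 1,000 MB), uses only the figures quoted above, and the constant and function names are illustrative.

```python
# Back-of-envelope storage estimate built only from figures quoted in the article.
DOCS_TOTAL = 40_000_000       # digitised documents in the city's archive
DUP_SHARE_LOW = 0.18          # lower bound of the reported duplicate share
DUP_FLOOR = 7_000_000         # the article's rounded-down "conservative floor"
MB_PER_SCAN = (2, 4)          # size of one scan in megabytes (low, high)
PRICE_PER_GB = (0.08, 0.15)   # R$ per gigabyte per month (low, high)


def monthly_cost_range(n_files, mb_per_file, price_per_gb):
    """Return (low, high) monthly cost in reais, assuming 1 GB = 1,000 MB."""
    low_gb = n_files * mb_per_file[0] / 1000
    high_gb = n_files * mb_per_file[1] / 1000
    # Cheapest rate on the smallest volume, dearest rate on the largest volume.
    return low_gb * price_per_gb[0], high_gb * price_per_gb[1]


duplicates = round(DOCS_TOTAL * DUP_SHARE_LOW)
low, high = monthly_cost_range(DUP_FLOOR, MB_PER_SCAN, PRICE_PER_GB)

print(f"Duplicates at the 18% floor: {duplicates:,}")
print(f"Avoidable storage cost: R${low:,.0f} to R${high:,.0f} per month")
```

The two ends of the result come from the extremes of both ranges, so a contract priced in the middle of the band would land between them.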

The GeoSampa platform, which serves urban planners, journalists and residents looking up zoning and infrastructure data across all 96 districts of the city, has itself been flagged in internal technical reviews for image layer redundancy. Aerial survey images of neighbourhoods like Pinheiros, Vila Mariana and the Água Branca urban renewal zone are stored in multiple versions across different departmental silos, with no automated deduplication running at the point of ingestion.
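
What deduplication at the point of ingestion could look like, in its simplest form, is a lookup against a digest of each file's bytes before it is stored. The sketch below is hypothetical and is not a description of GeoSampa's architecture: the `IngestIndex` class and the file paths are invented for illustration, and a byte-level digest only catches exact copies, not re-compressed or re-tagged versions.

```python
import hashlib


class IngestIndex:
    """Hypothetical in-memory index of content digests already stored."""

    def __init__(self):
        self._seen = {}  # SHA-256 hex digest -> identifier of the first copy

    def ingest(self, identifier, data):
        """Record new content and return None, or return the identifier
        of the existing copy if the bytes are an exact duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = identifier
        return None


# Illustrative paths only: two departments upload the same aerial image.
index = IngestIndex()
image_bytes = b"stand-in for the bytes of an aerial survey image"
first = index.ingest("urbanismo/pinheiros_survey.tif", image_bytes)
second = index.ingest("planejamento/pinheiros_copy.tif", image_bytes)
print(first)   # None: nothing stored yet, so the file is accepted
print(second)  # path of the copy that is already held
```

Because the check depends on content rather than on filename or department, the same file arriving from two silos would be caught the second time it is uploaded.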

Why Fixing It Is Harder Than It Sounds

The technical solution, perceptual hashing algorithms that compare images by content rather than filename, exists and is widely deployed. E-commerce operations on Avenida Paulista's fintech corridor use it routinely to clean product catalogues. The municipal challenge is different: government image archives span incompatible file formats accumulated across three decades of varying procurement decisions, and many duplicate files carry different metadata tags that make them appear legally distinct even when the image content is identical.
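
To make the idea of comparing images by content concrete, here is a minimal sketch of one simple perceptual hash, the average hash, written in plain Python. It is an illustration under stated assumptions, not the method used by any city system: images are represented as lists of greyscale rows rather than decoded JPEG or TIFF files, the synthetic test images stand in for real scans, and the threshold of 5 bits is an arbitrary choice for the example.

```python
def shrink(pixels, size=8):
    """Downscale a greyscale image (list of rows, values 0-255) to
    size x size cells by averaging the pixels in each block."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return an integer with one bit per cell: 1 where the cell is
    brighter than the mean of all cells, 0 otherwise."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 images: a left-to-right gradient, the same gradient with
# small pixel noise (standing in for compression differences), and an
# unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for y in range(32)]
recompressed = [[min(255, max(0, x * 8 + (x + y) % 3 - 1)) for x in range(32)]
                for y in range(32)]
unrelated = [[y * 8 for x in range(32)] for y in range(32)]

THRESHOLD = 5  # assumed cut-off for this example, not a standard value
near = hamming(average_hash(original), average_hash(recompressed))
far = hamming(average_hash(original), average_hash(unrelated))
print("near-duplicate" if near <= THRESHOLD else "distinct", near)
print("near-duplicate" if far <= THRESHOLD else "distinct", far)
```

Two files that differ only in compression noise or metadata produce hashes a few bits apart at most, while unrelated images differ in many bits. The hash ignores metadata entirely, which is why a technically matching pair can still need a human or legal decision about which record to keep.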

The city's Centro de Operações São Paulo, which coordinates real-time monitoring of flooding, traffic and public safety from its facility in the Barra Funda district, has already piloted automated image deduplication within its CCTV snapshot archive. According to the Prefeitura's 2025 annual technology report — the most recent publicly available — that pilot recovered 11 percent of allocated storage within 90 days of deployment. Scaling that result citywide is the next logical step, though it requires harmonising procurement rules across secretarias that currently operate independent IT budgets.

For residents and businesses interacting with city digital services, whether filing construction permits through the SP Sem Papel portal or accessing historical records at the Arquivo Histórico Municipal, the practical consequence of unresolved duplication is slower load times and occasional data conflicts when two versions of the same document produce different automated readings. The fix is technical, but the motivation is financial and civic. With São Paulo's 2026 municipal budget under pressure from infrastructure spending after last summer's flooding along the Tietê corridor, eliminating even low-level waste in digital contracts is a saving worth taking seriously.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
