The Daily São Paulo

São Paulo news, every day


São Paulo's Digital Archive Problem: The Numbers Behind the City's Duplicate Image Crisis

Municipal databases, cultural institutions and tech firms across the city are sitting on billions of redundant digital files — and the storage bill is mounting fast.

By São Paulo News Desk · Published 4 July 2026, 4:51 pm


Photo: Kaique Rocha on Pexels

São Paulo's public digital infrastructure is choking on copies of itself. Across municipal servers, cultural archives and the private tech ecosystem concentrated around Faria Lima Avenue, duplicate image files now account for a disproportionate share of stored data, driving up costs, slowing systems and complicating everything from urban planning to emergency response. The problem has a name in the industry: duplicate image redundancy. And in a metropolitan region of some 22 million people generating data at Latin America's highest per-capita rate, the numbers behind it are striking.

This is not an abstract problem. The São Paulo City Hall's own digital transformation unit, the Secretaria Municipal de Inovação e Tecnologia, manages archives that span decades of urban documentation — satellite imagery of the Tietê River floodplain, permit photographs from Brás and Mooca, surveillance stills from the Centro Histórico. Technical specialists in the field estimate that, in large municipal archives of this type, between 30 and 45 percent of stored image files are functionally identical or near-identical duplicates — a figure consistent with audits conducted by similar city governments in Mexico City and Bogotá over the past three years. For São Paulo, operating one of the largest municipal data centres in the Southern Hemisphere, that redundancy translates directly into millions of reais in avoidable annual expenditure on server capacity and energy.

What the Data Actually Shows

The commercial cloud storage market gives the problem a concrete price tag. As of mid-2026, enterprise cold-storage pricing from major providers operating Brazilian data centres, including facilities in Barueri's Tamboré district and the Alphaville tech corridor, runs at roughly R$0.08 to R$0.12 per gigabyte per month for large institutional clients. A municipal archive carrying 40 percent redundancy across, say, 500 terabytes of image data is paying for approximately 200 terabytes it does not need. At R$0.10 per gigabyte, those 200,000 gigabytes cost around R$20,000 per month, or R$240,000 per year, before accounting for backup replication, which typically doubles that figure.
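The storage arithmetic in the paragraph above can be reproduced in a few lines. This is a minimal Python sketch, not a billing model: it assumes decimal units (1 TB = 1,000 GB), treats the duplicate share as a flat percentage of the archive, and models backup replication as a simple multiplier. The function name `redundant_storage_cost` is our own, chosen for illustration.

```python
GB_PER_TB = 1_000  # assumption: decimal units, as storage is usually billed


def redundant_storage_cost(total_tb: float, duplicate_share: float,
                           price_per_gb_month: float, replicas: int = 1) -> dict:
    """Cost in R$ of keeping the duplicate share of an archive on disk.

    `replicas` multiplies the bill to model backup copies (2 = primary + one backup).
    """
    redundant_tb = total_tb * duplicate_share
    monthly = redundant_tb * GB_PER_TB * price_per_gb_month * replicas
    return {"redundant_tb": redundant_tb, "monthly": monthly, "annual": monthly * 12}


# 500 TB archive, 40 percent duplicates, across the quoted R$0.08-0.12 price range.
for price in (0.08, 0.10, 0.12):
    cost = redundant_storage_cost(500, 0.40, price)
    print(f"R${price:.2f}/GB: {cost['redundant_tb']:.0f} TB redundant, "
          f"R${cost['monthly']:,.0f}/month, R${cost['annual']:,.0f}/year")

with_backup = redundant_storage_cost(500, 0.40, 0.10, replicas=2)
print(f"With one backup copy: R${with_backup['annual']:,.0f}/year")
```

At R$0.10 per gigabyte the middle line reproduces the article's R$20,000 a month and R$240,000 a year; passing `replicas=2` doubles both, to R$480,000 a year, in line with the note on backup replication.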

Private sector players in the Paulista Avenue and Vila Olímpia corridors face the same arithmetic at greater scale. São Paulo is home to more than 30 technology unicorns and a startup ecosystem that, according to the Associação Brasileira de Startups, attracted more than R$12 billion in venture capital in 2025. Many of those companies, particularly in fintech, proptech and healthtech, maintain image-heavy databases: property photos, identity document scans, medical imaging. Industry benchmarks published by data management firm Iron Mountain suggest that unmanaged image libraries in companies with 500 or more employees reach duplicate rates of up to 52 percent within three years of archive creation.

The Instituto Moreira Salles, which operates one of Brazil's most significant photographic collections from its unit on Paulista Avenue, began a systematic deduplication project in 2023. The institution has not published full results, but the challenge of distinguishing true duplicates from near-identical variants — different scans of the same print, for instance — is one that archivists describe as labour-intensive without automated tooling.
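The difficulty archivists describe is easy to see with the simplest automated check, a cryptographic hash of each file's bytes. The Python sketch below groups files by SHA-256 digest; the file names and contents are invented for illustration, and this is not a description of the institute's own method. Two straight copies of a scan land in the same group, but a re-scan of the same print that differs in even a single byte does not.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical (SHA-256)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical archive: two copies of one scan, plus a re-scan of the same
# print whose bytes differ in one position (e.g. a slightly different exposure).
scan = bytes(range(256)) * 4
rescan = bytearray(scan)
rescan[10] ^= 0x01
archive = {
    "print_0001_a.tif": scan,
    "print_0001_copy.tif": scan,
    "print_0001_rescan.tif": bytes(rescan),
}
print(group_exact_duplicates(archive))
```

It prints `[['print_0001_a.tif', 'print_0001_copy.tif']]`: the re-scan is left out, which is why near-identical variants need a different kind of comparison.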

The Fix — and What Comes Next

Deduplication software has matured considerably. Tools using perceptual hashing algorithms — which compare images by visual content rather than file metadata — can now process millions of images in hours rather than weeks. Several firms with São Paulo offices, including those clustered in the tech hub around Rua Funchal in Vila Olímpia, offer services that pair automated flagging with human review workflows, reducing false-positive deletion rates to below two percent.
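The article does not say which perceptual hashing algorithm these services use. As an illustration, here is a minimal Python sketch of one widely used variant, the difference hash (dHash): shrink the image to a 9×8 grid of average brightness values, record whether each cell is brighter than its right-hand neighbour, and compare two images by counting the bits that differ (the Hamming distance). To stay self-contained it works on a synthetic grayscale image held as nested lists; real tools decode and resize actual image files with an imaging library. The `REVIEW_THRESHOLD` of 10 bits is a hypothetical cut-off for sending a pair to human review, not a vendor setting.

```python
def _span(i: int, size: int, n: int) -> tuple[int, int]:
    """Start/end of the i-th of n roughly equal slices of `size` (never empty)."""
    start = i * size // n
    return start, max((i + 1) * size // n, start + 1)


def downscale(pixels: list[list[int]], out_w: int, out_h: int) -> list[list[float]]:
    """Shrink a grayscale image (rows of 0-255 values) by averaging pixel blocks."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for oy in range(out_h):
        y0, y1 = _span(oy, h, out_h)
        row = []
        for ox in range(out_w):
            x0, x1 = _span(ox, w, out_w)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash: one bit per pair of horizontally adjacent cells."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | int(row[x] > row[x + 1])  # 1 if left cell is brighter
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


REVIEW_THRESHOLD = 10  # hypothetical: pairs within 10 of 64 bits go to a human reviewer


def needs_review(a: list[list[int]], b: list[list[int]]) -> bool:
    return hamming(dhash(a), dhash(b)) <= REVIEW_THRESHOLD


# Synthetic 72x64 image: vertical bands alternating between dark and light.
W, H = 72, 64
original = [[(x // 8) % 2 * 100 + y for x in range(W)] for y in range(H)]
rescan = [[v + 10 for v in row] for row in original]     # same picture, brighter exposure
stamped = [row[:] for row in original]
for y in range(8):
    for x in range(8):
        stamped[y][x] = 255                               # small white patch in one corner
negative = [[255 - v for v in row] for row in original]  # a visually different image

for name, img in [("rescan", rescan), ("stamped", stamped), ("negative", negative)]:
    print(name, hamming(dhash(original), dhash(img)), needs_review(original, img))
```

Running it prints a distance of 0 for the uniformly brighter re-scan, 1 for the copy with a small white patch and 64 for the negative, so only the first two are flagged. Real re-scans add noise and usually land at small non-zero distances, which is where a human review step earns its place.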

For Mayor Ricardo Nunes's administration, which has committed to a broader smart-city digitisation push under the 2024-2028 municipal plan, the practical advice from data specialists is to run deduplication audits before migrating legacy systems — not after. Migration is when redundant files get copied into new infrastructure, compounding the cost. The window to act efficiently is now, before the city's next major server infrastructure contract cycle, expected to open for tender in the first quarter of 2027. Institutions sitting on unaudited image archives — municipal or private — will find the cost of inaction measured not just in storage bills, but in the slower, heavier systems that everyone who depends on them will notice.
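The advice to audit before migrating can be sketched as a migration manifest. Once an audit has given every file a content key (for example a SHA-256 digest for exact copies, or the ID of a human-confirmed near-duplicate cluster), only one file per key needs to be copied to the new infrastructure, and the other paths can be recorded as aliases so existing references still resolve. The Python below is a hypothetical sketch: the `ArchiveFile` type, the example paths and sizes, and the rule of keeping the alphabetically first path in each group are our own assumptions, not part of any city system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveFile:
    path: str
    size_bytes: int
    content_key: str  # assumed audit output: a hash or reviewed duplicate-cluster ID


def plan_migration(files: list[ArchiveFile]) -> tuple[list[ArchiveFile], dict[str, str]]:
    """Pick one file per content key to copy; map every other path to its keeper."""
    keepers: dict[str, ArchiveFile] = {}
    for f in sorted(files, key=lambda f: f.path):  # hypothetical rule: first path wins
        keepers.setdefault(f.content_key, f)
    aliases = {
        f.path: keepers[f.content_key].path
        for f in files
        if f.path != keepers[f.content_key].path
    }
    return list(keepers.values()), aliases


files = [
    ArchiveFile("permits/bras/0001.jpg", 4_000_000, "k1"),
    ArchiveFile("permits/bras/0001_copy.jpg", 4_000_000, "k1"),
    ArchiveFile("scans/2019/bras_0001.jpg", 4_000_000, "k1"),
    ArchiveFile("permits/mooca/0042.jpg", 3_500_000, "k2"),
]
to_copy, aliases = plan_migration(files)
saved = sum(f.size_bytes for f in files) - sum(f.size_bytes for f in to_copy)

print([f.path for f in to_copy])
print(aliases)
print(f"Bytes not migrated: {saved:,}")
```

Here three copies of the same permit photo collapse to one, so only two files are migrated and 8,000,000 bytes stay behind. Which copy to keep (original resolution, best metadata, most-referenced path) is a policy choice the audit should settle before the move.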


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
