The Daily São Paulo

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story

Municipal databases, newsrooms and e-commerce platforms across the city are losing millions of reais annually to redundant image files, and data analysts are finally quantifying the damage.

By São Paulo News Desk · Published 4 July 2026, 4:16 pm

Photo: Rafael Silva on Pexels

São Paulo's public and private institutions are sitting on a data-storage crisis hiding in plain sight. Duplicate images — identical or near-identical digital files stored multiple times across servers — are consuming an estimated 30 to 40 percent of raw storage capacity in large Brazilian municipal databases, according to sector benchmarks published by the Brazilian Association of Information Technology and Communication (Brasscom) in its 2025 digital infrastructure report. In a city running thousands of concurrent digital services, that redundancy carries a direct price tag.

The issue has moved from a back-office nuisance to a budget line item worth watching. São Paulo's city hall is currently migrating legacy records from the old Diário Oficial print archive to a centralised digital platform managed by the Secretaria Municipal de Gestão (SMG), a process that began formally in March 2025. Technicians working on that migration have reportedly encountered file duplication rates that complicate indexing — a problem familiar to archivists and, increasingly, to finance officers who sign off on cloud storage contracts.

What Duplication Actually Costs in Reais and Gigabytes

Cloud storage in Brazil is not cheap. Pricing from major providers serving the Brazilian market ran between R$0.08 and R$0.23 per gigabyte per month for standard tiers as of early 2026. A mid-sized São Paulo newsroom or municipal department managing a photographic archive of, say, 10 terabytes (a conservative figure for any institution covering Paulista Avenue protests, Carnaval or flooding events in Jardim Ângela and Heliópolis over a decade) could be paying for three or four terabytes of pure redundancy every single month. At those rates, three terabytes at the low-end price costs about R$240 a month and four terabytes at the high-end price about R$920, so the annualised cost is roughly R$2,880 to R$11,040 wasted on files the system already has.
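The arithmetic behind that estimate can be checked with a short sketch. It uses only the figures quoted above (3 to 4 terabytes of redundancy, R$0.08 to R$0.23 per gigabyte per month); the function name and the choice of decimal terabytes (1 TB = 1,000 GB) are assumptions made for illustration, and real invoices will also include request and transfer charges that are ignored here.

```python
# Annualised cost of redundant storage, using the price range quoted in the article.
GB_PER_TB = 1000  # assumption: decimal terabytes, not 1,024 GB


def annual_waste_brl(redundant_tb: float, price_per_gb_month: float) -> float:
    """Yearly spend in reais on redundant data at a flat per-GB monthly price."""
    return redundant_tb * GB_PER_TB * price_per_gb_month * 12


# Low end: 3 TB of duplicates at the cheapest quoted tier.
low = annual_waste_brl(3, 0.08)
# High end: 4 TB of duplicates at the most expensive quoted tier.
high = annual_waste_brl(4, 0.23)

print(f"R${low:,.0f} to R${high:,.0f} per year")
```

Swapping in an institution's own archive size and contracted price gives a first-order figure to put in front of whoever signs the storage contract.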

For e-commerce operations based in the Vila Olímpia and Faria Lima technology corridor, home to logistics-tech firms and retail startups that photograph thousands of SKUs, the numbers scale fast. Industry tooling firm Bling, which serves Brazilian small and medium retailers, flagged in a 2024 user survey that product image duplication was among the top five causes of catalogue inconsistency reported by its São Paulo user base. Inconsistent catalogues mean delayed listings, which translate to lost sales windows, particularly during high-volume dates like Black Friday and the FIFA World Cup commercial cycle currently running through July 2026.

Detection Tools and What Comes Next for São Paulo Operators

Automated deduplication software, meaning tools that use perceptual hashing algorithms to identify visually identical images even when file names differ, has existed for years, but adoption among Brazilian municipal bodies has lagged behind the private sector. The São Paulo State Data Processing Company, Prodesp, which manages infrastructure for state-level digital services including the Poupatempo network of citizen service centres, began piloting deduplication protocols across its storage environment in the second half of 2025. Prodesp has not yet published outcomes from that pilot.
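For readers unfamiliar with perceptual hashing, the idea can be illustrated with one of its simplest variants, the average hash. The sketch below is not the method used by Prodesp or any named vendor. It assumes each image has already been decoded and shrunk to an 8x8 grayscale thumbnail by an imaging library, and the sample thumbnails and the distance threshold of 5 bits are hypothetical values chosen for the demonstration.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_duplicate(a: list[list[int]], b: list[list[int]], max_distance: int = 5) -> bool:
    """Treat two thumbnails as the same picture if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 8x8 thumbnails: a gradient, a slightly brightened copy, an inverted copy.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

print(looks_duplicate(original, brighter))  # brightened copy of the same picture
print(looks_duplicate(original, inverted))  # visually different picture
```

Because the hash depends on the picture's overall light and dark pattern rather than its bytes or its file name, a re-saved or lightly edited copy lands close to the original, which is what lets these tools catch duplicates that a file-name comparison would miss.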

On the private side, the math for action is straightforward. A one-time deduplication pass on a 10-terabyte archive using open-source tooling costs primarily in engineer time — typically eight to twenty hours depending on archive organisation — against ongoing monthly savings that compound indefinitely. For startups in the Sambódromo-adjacent Barra Funda tech cluster or established media companies operating out of Pinheiros, the return-on-investment calculation closes within the first billing cycle after cleanup.

For organisations in São Paulo that have not yet audited their image libraries, the practical starting point is a storage audit rather than an immediate software purchase. Mapping which departments or product lines generate the highest upload volumes (events photography, social media asset libraries, scanned documents) identifies where duplication concentrates. From there, perceptual hash tools including open-source options like dupeGuru or enterprise platforms can automate the matching. The SMG migration offers a live case study; how the city accounts for storage savings in its next infrastructure procurement round, expected in the first quarter of 2027, will be worth tracking for anyone managing digital assets at scale in the region.
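A first-pass audit of that kind does not need specialised software. The sketch below, which assumes the archive is reachable as an ordinary directory tree, groups files by SHA-256 digest and totals the bytes held by repeat copies under each top-level folder. The function names are illustrative, and the approach only catches byte-identical files; near-duplicates such as resized or re-compressed images would still need a perceptual comparison.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def redundant_bytes_by_folder(root: Path) -> dict[str, int]:
    """Bytes held by byte-identical repeat copies, keyed by top-level folder under root.

    Files are visited in sorted path order; the first file seen with a given
    digest counts as the original and every later one as redundant.
    """
    seen: set[str] = set()
    wasted: dict[str, int] = defaultdict(int)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = file_digest(path)
        if digest in seen:
            rel = path.relative_to(root)
            folder = rel.parts[0] if len(rel.parts) > 1 else "."
            wasted[folder] += path.stat().st_size
        else:
            seen.add(digest)
    return dict(wasted)
```

Calling `redundant_bytes_by_folder(Path("/path/to/archive"))` with a real archive path returns a per-folder tally, which shows at a glance whether the events desk, the social media team or the scanning unit is the main source of repeat uploads.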

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.