The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Show Why It Matters

From city hall servers to the cultural institutions along Avenida Paulista, redundant image files are quietly eating storage budgets and slowing down the platforms residents rely on.

By São Paulo News Desk · Published 4 July 2026, 4:10 pm

Photo: fabianoshow4 on Pexels

São Paulo's public sector holds tens of millions of digital image files. A significant share of them are exact or near-exact duplicates — the same photograph stored twice, three times, sometimes a dozen times across departments that never speak to each other. That unglamorous fact has real costs, and a growing coalition of municipal IT teams and cultural institutions is finally putting numbers to the problem.

The timing matters because the Prefeitura de São Paulo is midway through its Programa SP Digital modernisation drive, which set a deadline of December 2026 for consolidating data infrastructure across more than 40 secretariats. Storage inefficiency is one of the programme's central targets. Duplicate image files, generated by repeated uploads, broken migration scripts, and siloed content management systems, are among the most measurable forms of digital waste.

What the data actually shows

Internal audits conducted by the Secretaria Municipal de Inovação e Tecnologia during the first quarter of 2026 found that duplicated files across municipal servers accounted for a substantial fraction of total storage consumption, according to documents released in response to Lei de Acesso à Informação requests filed in May. The picture is clearest in the cultural sector. The Pinacoteca do Estado de São Paulo, located at Praça da Luz in the Luz neighbourhood, manages a digitised collection that has grown by roughly 30 percent since 2020, partly through successive scanning campaigns. File redundancy is a known byproduct of that kind of rapid ingestion: multiple scans of the same work at different resolutions end up stored permanently rather than being culled after quality checks.

Cloud storage is not cheap. Contracts for government-grade object storage in Brazil currently run at roughly R$0.09 to R$0.12 per gigabyte per month depending on the tier and provider, according to published pricing from major platforms operating in the country. For an institution sitting on several hundred terabytes of image data — a realistic figure for a large municipal archive — duplicate elimination can translate directly into five- or six-figure annual savings in reais. The Centro Cultural São Paulo on Rua Vergueiro, one of the city's busiest cultural venues, completed a partial deduplication exercise on its digital image library in late 2025 that city documentation describes as freeing up meaningful server capacity ahead of a platform migration.
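The arithmetic behind that savings range can be sketched in a few lines. Only the per-gigabyte price range comes from the figures above; the 300-terabyte archive size and the 20 percent duplicate share are hypothetical inputs chosen for illustration, not numbers from the city's audits.

```python
# Illustrative estimate of annual storage spend attributable to duplicate images.
# Only the per-gigabyte price range comes from the article; the archive size
# and duplicate share below are hypothetical inputs.

GB_PER_TB = 1000  # decimal terabytes, the convention cloud billing usually follows


def annual_duplicate_cost(archive_tb, duplicate_share, price_per_gb_month):
    """Return the yearly cost in reais of storing the duplicated portion."""
    duplicate_gb = archive_tb * GB_PER_TB * duplicate_share
    return duplicate_gb * price_per_gb_month * 12


ARCHIVE_TB = 300        # assumption: stands in for "several hundred terabytes"
DUPLICATE_SHARE = 0.20  # assumption: one stored file in five is redundant

low = annual_duplicate_cost(ARCHIVE_TB, DUPLICATE_SHARE, 0.09)
high = annual_duplicate_cost(ARCHIVE_TB, DUPLICATE_SHARE, 0.12)
print(f"Estimated annual saving: R${low:,.0f} to R${high:,.0f}")
```

Under those assumed inputs the estimate comes to roughly R$64,800 to R$86,400 a year, which sits in the five-figure band; a larger archive or a higher duplicate share would push it into six figures.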

The problem is not unique to public institutions. São Paulo's tech ecosystem — anchored around the Faria Lima corridor and the startup clusters of Vila Olímpia and Pinheiros — has produced numerous companies whose core product involves managing large visual datasets. E-commerce platforms, real estate portals, and media companies all grapple with the same arithmetic: every duplicate image file stored is money spent and load time added. A report published by the Associação Brasileira de Startups in early 2026 noted that media asset management inefficiency ranked among the top five infrastructure costs cited by growth-stage companies in the Greater São Paulo region.

Detection tools and what comes next

The technical approaches to solving the problem are well established. Perceptual hashing, a technique that generates a fingerprint for an image from its visual content rather than its file metadata, can identify near-duplicates even when file names, formats, or compression levels differ. Tools using this method have been commercially available for years and are increasingly embedded in content management platforms. The challenge in São Paulo's public sector is not technology but governance: determining which copy is canonical, who holds deletion authority, and how to log removals for auditing purposes under Brazilian records law.
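For readers curious how such a fingerprint works, here is a minimal sketch of a difference hash (dHash), one common perceptual-hashing variant. It is not the method used by any institution named in this article. To stay self-contained it treats an image as a plain grid of grayscale values; the function names and the three test patterns are hypothetical, and a production tool would first decode real image files.

```python
# Sketch of a difference hash (dHash), one common perceptual-hashing variant.
# Images are plain 2D lists of grayscale values (0-255) so the example needs
# no imaging library; a real pipeline would decode image files first.


def shrink(pixels, width, height):
    """Downsample to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, size=8):
    """Return a size*size-bit integer, one bit per horizontal neighbour pair."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            # Each bit records whether brightness falls from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Hypothetical test patterns: a base image, a brightened copy, an unrelated image.
original = [[(x * 7 + y * 3) % 256 for x in range(64)] for y in range(64)]
brighter = [[min(255, v + 10) for v in row] for row in original]
unrelated = [[(x * x + y * 5) % 256 for x in range(64)] for y in range(64)]

print("original vs brighter:", hamming(dhash(original), dhash(brighter)))
print("original vs unrelated:", hamming(dhash(original), dhash(unrelated)))
```

Because the hash records only whether each pixel is brighter than its neighbour, the uniformly brightened copy produces the same fingerprint as the original, while the unrelated pattern differs in several bits. Deciding how many differing bits still count as a duplicate is a tuning choice that depends on the collection.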

The Arquivo Público do Estado de São Paulo, based in the Santana neighbourhood, has been working since 2023 on updated protocols for digital record retention that would give institutions clearer legal grounding to delete redundant files without risking compliance violations. Those protocols are expected to be finalised before the end of 2026. For private companies, the path is shorter: most can act immediately using commercially available deduplication software without navigating the same bureaucratic layers. Either way, the arithmetic is simple: in a city generating and storing images at São Paulo's scale, leaving the duplicate problem unaddressed is a choice that shows up directly on the monthly invoice.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
