The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damning Story

From Prefeitura servers in República to university libraries in Pinheiros, redundant image files are eating storage budgets and slowing down the city's push toward open digital government.

By São Paulo News Desk · Published 4 July 2026, 4:00 pm

Photo: Wallace Chuck on Pexels

São Paulo's public digital infrastructure is carrying a hidden weight. Across municipal servers, state university repositories and the sprawling content networks of city-linked cultural institutions, duplicate image files account for an estimated 30 to 40 percent of total stored visual content — a redundancy problem that quietly inflates IT costs and degrades the performance of platforms that millions of paulistanos use every day. The estimate comes from internal audits discussed at the 2025 Congresso de Tecnologia da Informação Pública, held at the Centro de Convenções Rebouças in November of that year.

The timing of this reckoning matters. Mayor Ricardo Nunes has committed the Prefeitura de São Paulo to a digital transformation agenda that includes migrating legacy city databases to cloud infrastructure by the end of 2027. But storage bloat caused by unmanaged image duplication is complicating procurement decisions and pushing projected cloud migration costs upward at a moment when the city is already navigating tight fiscal margins. The federal government's broader push under Lula's PT administration for open data standards means São Paulo is under pressure to clean up its digital house before national interoperability requirements kick in.

The problem is concrete and addressable. At the Arquivo Histórico Municipal Washington Luís, located on Rua Roberto Simonsen in the Sé neighbourhood, archivists have been working since early 2025 to deduplicate a photographic collection that had grown to more than 1.2 million digital image files — many of them scanned multiple times across different digitisation campaigns over the past decade. The Universidade de São Paulo's Sistema Integrado de Bibliotecas, headquartered on the main Cidade Universitária campus in Butantã, faces a parallel challenge: its digital image repository for academic journals and theses had accumulated duplicate entries across at least four separate indexing systems by the time an internal review flagged the issue in March 2026.

What Duplication Actually Costs

Storage is not free, and in São Paulo's municipal IT environment, the price adds up fast. Cloud object storage in Brazil currently runs between R$0.08 and R$0.15 per gigabyte per month depending on tier and provider, according to publicly listed pricing from major cloud vendors operating locally. A municipal archive holding 500 terabytes of image data (a conservative figure for an institution the size of the Arquivo Histórico) is storing roughly 500,000 gigabytes, which works out to R$40,000 to R$75,000 a month, or R$480,000 to R$900,000 annually on storage alone. If 35 percent of that volume is duplicated and removable, the potential annual saving lands between R$168,000 and R$315,000 per institution. Multiply that across the dozens of city and state agencies that manage visual content in São Paulo, and the aggregate figure could run into the tens of millions of reais each year.
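The arithmetic behind those figures is easy to check. The short sketch below reruns it using only the numbers quoted above; the decimal conversion (1 terabyte = 1,000 gigabytes) and the function names are assumptions made for illustration, not part of any agency's costing model.

```python
# Figures quoted in the article. Decimal units (1 TB = 1,000 GB) are an assumption.
ARCHIVE_TB = 500
GB_PER_TB = 1_000
PRICE_LOW_BRL = 0.08    # R$ per GB per month, low end of the quoted range
PRICE_HIGH_BRL = 0.15   # R$ per GB per month, high end of the quoted range
DUPLICATE_SHARE = 0.35  # share of stored volume assumed duplicated and removable


def annual_storage_cost(terabytes, price_per_gb_month):
    """Yearly storage bill in reais for a given volume and monthly unit price."""
    return terabytes * GB_PER_TB * price_per_gb_month * 12


def annual_saving(terabytes, price_per_gb_month, duplicate_share):
    """Yearly saving in reais if the duplicated share of the volume is removed."""
    return annual_storage_cost(terabytes, price_per_gb_month) * duplicate_share


cost_low = annual_storage_cost(ARCHIVE_TB, PRICE_LOW_BRL)
cost_high = annual_storage_cost(ARCHIVE_TB, PRICE_HIGH_BRL)
saving_low = annual_saving(ARCHIVE_TB, PRICE_LOW_BRL, DUPLICATE_SHARE)
saving_high = annual_saving(ARCHIVE_TB, PRICE_HIGH_BRL, DUPLICATE_SHARE)

print(f"Annual storage cost: R${cost_low:,.0f} to R${cost_high:,.0f}")
print(f"Annual saving at 35% duplication: R${saving_low:,.0f} to R${saving_high:,.0f}")
```

Changing `ARCHIVE_TB` or `DUPLICATE_SHARE` shows how sensitive the saving is to each institution's actual volume and duplication rate, both of which would have to come from an audit.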

Processing overhead compounds the financial hit. Image deduplication software — tools that use perceptual hashing algorithms to identify visually identical or near-identical files regardless of filename or metadata — can reduce index query times by 20 to 50 percent in large repositories, according to benchmark data published by the Sociedade Brasileira de Computação in its 2024 annual technical report. Slower query times are not an abstraction for a city that routes building permit image attachments, flood-monitoring drone footage and urban planning maps through the same backend infrastructure.
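To give a sense of how perceptual hashing works, here is a minimal sketch of one of the simplest variants, often called an average hash. It is an illustration under stated assumptions, not the method used by any tool or institution named in this article: images are represented as plain grids of grayscale values (0 to 255) instead of decoded files, every image is assumed to be at least 8 by 8 pixels, and the function names are hypothetical.

```python
def average_hash(pixels, hash_size=8):
    """Return a 64-bit fingerprint of a grayscale image.

    pixels is a list of rows, each a list of values from 0 to 255.
    The image is shrunk to hash_size x hash_size by averaging blocks, then
    each cell becomes a 1 bit if it is brighter than the overall mean.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = (row + 1) * height // hash_size
        for col in range(hash_size):
            left = col * width // hash_size
            right = (col + 1) * width // hash_size
            block = [pixels[y][x] for y in range(top, bottom) for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two fingerprints."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic stand-ins for scans: a left-to-right gradient, the same picture
# rescanned at double resolution and slightly brighter, and an unrelated
# top-to-bottom gradient.
original = [[x * 255 // 31 for x in range(32)] for _ in range(32)]
rescan = [[min(255, x * 255 // 63 + 10) for x in range(64)] for _ in range(64)]
unrelated = [[y * 255 // 31 for _ in range(32)] for y in range(32)]

distance_rescan = hamming_distance(average_hash(original), average_hash(rescan))
distance_unrelated = hamming_distance(average_hash(original), average_hash(unrelated))
print("original vs rescan:", distance_rescan)
print("original vs unrelated:", distance_unrelated)
```

The rescan differs from the original in size and brightness, so a byte-for-byte comparison would treat the two as different files, yet their fingerprints stay close. A small Hamming distance marks a pair as a candidate duplicate; where to set the threshold is a tuning decision for each collection.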

The Path Forward for City Systems

Several São Paulo institutions are already running pilot deduplication programs. The Instituto de Pesquisas Tecnológicas, based in Cidade Universitária in the Butantã district, began testing an open-source perceptual hash tool called dupeGuru on a subset of its engineering document image library in April 2026, with initial results suggesting it flagged roughly 28 percent of files as candidates for review or deletion. The Secretaria Municipal de Urbanismo e Licenciamento has separately budgeted R$1.2 million in the 2026 fiscal year for a storage rationalisation project that includes image deduplication as a core component, according to the secretariat's published budget allocation document.

For municipal IT managers watching these pilots, the practical advice is sequential: audit first using automated tools, then quarantine flagged duplicates for human review rather than deleting automatically, then establish metadata governance rules to prevent re-accumulation. The Prefeitura's PRODAM technology company, which manages much of the city's core digital infrastructure from its headquarters near Parque Dom Pedro II, is expected to publish updated data management guidelines for city agencies before the end of the third quarter of 2026. How rigorously those guidelines address image deduplication will go a long way toward determining whether the cloud migration deadline of 2027 holds.
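The first two steps of that sequence, audit and quarantine, can be sketched in a few lines. The example below is a simplified illustration and not PRODAM's or any agency's actual procedure: it detects only byte-identical copies using SHA-256 checksums (a deliberately simpler technique than perceptual hashing, which would be needed to catch rescans), and the function names, folder names and manifest fields are all hypothetical. The quarantine folder is assumed to sit outside the archive being audited.

```python
import hashlib
import shutil
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root):
    """Step 1: group files by checksum and keep only groups with more than one copy."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def quarantine(groups, root, quarantine_dir):
    """Step 2: move every copy except the first into quarantine. Nothing is deleted."""
    root, quarantine_dir = Path(root), Path(quarantine_dir)
    manifest = []
    for digest, paths in groups.items():
        kept, *extras = paths
        for extra in extras:
            target = quarantine_dir / extra.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(extra), str(target))
            manifest.append({"sha256": digest, "kept": str(kept),
                             "moved_from": str(extra), "moved_to": str(target)})
    return manifest


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    (archive / "campaign_2015").mkdir(parents=True)
    (archive / "campaign_2022").mkdir(parents=True)
    (archive / "campaign_2015" / "plate_001.tif").write_bytes(b"same scan")
    (archive / "campaign_2022" / "plate_001_copy.tif").write_bytes(b"same scan")
    (archive / "campaign_2022" / "plate_002.tif").write_bytes(b"different scan")

    manifest = quarantine(audit(archive), archive, Path(tmp) / "quarantine")
    remaining = sorted(p.name for p in archive.rglob("*") if p.is_file())

for entry in manifest:
    print("quarantined:", Path(entry["moved_from"]).name, "| kept:", Path(entry["kept"]).name)
print("still in archive:", remaining)
```

The manifest is what makes the human review step possible: each quarantined file is listed alongside the copy that was kept, so an archivist can restore it if the match turns out to be wrong. The third step, metadata governance, is a matter of policy more than code.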


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
