The Daily São Paulo

How São Paulo's Digital Archives Became a Swamp of Duplicate Images — and What's Being Done About It

Years of rapid digitisation, competing municipal platforms and overlapping federal programs left the city's public image databases riddled with redundant files, costing storage budgets and eroding public trust in official records.

By São Paulo News Desk · Published 4 July 2026, 4:00 pm

4 min read

Photo: Luiz Silva on Pexels

São Paulo's municipal government is sitting on a digital storage crisis it largely created itself. Across at least three separate platforms (the Prefeitura's official press portal, the SP Notícias image bank, and the Arquivo Histórico Municipal on Rua Voluntários da Pátria in Santana), administrators have been quietly grappling with tens of thousands of duplicate image files: redundant entries that inflate storage costs, slow public-facing search tools and, in several documented cases, have caused the wrong photograph to accompany official press releases.

The problem did not appear overnight. It is the direct product of a decade-long digitisation push that lacked a unified metadata standard, and it matters now because the Ricardo Nunes administration is midway through a R$47 million modernisation contract for city digital infrastructure — a contract that specifically targets data quality and redundancy as priority concerns before the system overhaul goes live in the first quarter of 2027.

A Timeline of Well-Intentioned Chaos

The roots run back to roughly 2014, when the then-administration of Fernando Haddad accelerated the scanning of physical records held at the Arquivo Histórico. The drive was admirable in scope: thousands of photographic prints, glass-plate negatives and administrative documents from the early twentieth century were digitised and pushed online. But different departments uploaded copies independently, often with conflicting file names and no cross-referencing protocol. The Secretaria Municipal de Cultura ran its own parallel ingestion pipeline. So did the Secretaria de Comunicação.

By the time the João Doria administration launched the SP156 citizen services app in 2017, engineers integrating image assets from legacy databases found that some photographs — including aerial shots of the Anhangabaú Valley and official portraits of past mayors — appeared in the system under four or five separate file identifiers. Duplicate image replacement, meaning the methodical process of identifying redundant files, designating a canonical version and redirecting all references to that master copy, was discussed internally but never funded as a standalone project.

The problem compounded under successive administrations. Federal programs including the Rede Nacional de Ensino e Pesquisa's digitisation grants pushed more content into municipal pipelines between 2019 and 2022, again without harmonised metadata standards. The Biblioteca Mário de Andrade on Rua da Consolação, which holds one of the country's most significant urban photography collections, contributed several large batches of scanned images that arrived already carrying duplicate entries from the institution's own internal catalogue.

Why the Reckoning Is Happening Now

Storage is not cheap. Municipal IT departments across Brazil typically pay between R$0.08 and R$0.23 per gigabyte per month for government-contracted cloud tiers, and analysts who study public procurement data note that redundant files can account for 15 to 30 percent of total image storage volume at institutions that never implemented deduplication protocols. Multiplied across years of accumulation, that is a measurable drain on a budget that the Secretaria Municipal de Gestão has been under pressure to trim since 2024.
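To give a sense of scale, the figures above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the archive size (`ARCHIVE_GB`) is a hypothetical number chosen for the example, not a figure reported for any São Paulo platform, and the function name is ours.

```python
def monthly_redundancy_cost(total_gb: float, price_per_gb: float, redundant_share: float) -> float:
    """Monthly spend (in R$) attributable to redundant files."""
    return total_gb * price_per_gb * redundant_share

# Hypothetical archive size, assumed purely for illustration.
ARCHIVE_GB = 50_000

# Price and redundancy ranges quoted in the article.
low = monthly_redundancy_cost(ARCHIVE_GB, 0.08, 0.15)   # cheapest tier, least redundancy
high = monthly_redundancy_cost(ARCHIVE_GB, 0.23, 0.30)  # priciest tier, most redundancy

print(f"Monthly: R${low:,.0f} to R${high:,.0f}")
print(f"Yearly:  R${low * 12:,.0f} to R${high * 12:,.0f}")
```

Under that assumed 50,000 GB archive, the redundant share would cost roughly R$600 to R$3,450 a month. The real figure depends on actual storage volumes, which are not given here.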

Beyond cost, there is an accuracy problem. Journalists and researchers who use the Arquivo Histórico's online portal have reported that keyword searches for landmarks such as the Viaduto do Chá or the Mercadão on Rua da Cantareira return results cluttered with near-identical images catalogued under contradictory dates and attribution fields. That kind of noise erodes the utility of what should be a definitive public record.

The current modernisation contract, awarded in late 2025, tasks a consortium led by the technology firm Stefanini — which has a significant São Paulo operation — with implementing automated hash-based deduplication across the municipal image ecosystem before the 2027 system migration. The process works by generating a unique digital fingerprint for each file; exact or near-exact matches are flagged, human archivists review borderline cases, and a single canonical file is designated while all database references are updated to point to it.
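The fingerprint-and-redirect process described above can be sketched in a few lines. This is a minimal illustration, not the consortium's implementation: it assumes SHA-256 as the fingerprint, which only catches byte-for-byte identical files. Near-exact matches would need a perceptual hash plus the human review the contract calls for. The file IDs, the "smallest ID wins" rule and the sample data are all hypothetical.

```python
import hashlib
from collections import defaultdict

def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a file's bytes (matches exact duplicates only)."""
    return hashlib.sha256(data).hexdigest()

def designate_canonical(files: dict[str, bytes]) -> dict[str, str]:
    """Map every file ID to the canonical ID for its content.

    The canonical copy is the lexicographically smallest ID in each group,
    an arbitrary rule assumed here for illustration.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for file_id, data in files.items():
        groups[fingerprint(data)].append(file_id)

    canonical: dict[str, str] = {}
    for ids in groups.values():
        master = min(ids)
        for file_id in ids:
            canonical[file_id] = master
    return canonical

def redirect(references: dict[str, str], canonical: dict[str, str]) -> dict[str, str]:
    """Point every reference (document -> image ID) at the canonical image ID."""
    return {doc: canonical[image_id] for doc, image_id in references.items()}

# Hypothetical sample data: two IDs share identical bytes.
files = {
    "img-0001": b"aerial shot of the Anhangabau Valley",
    "img-0417": b"aerial shot of the Anhangabau Valley",
    "img-0932": b"portrait of a past mayor",
}
references = {"release-a": "img-0417", "release-b": "img-0932"}

canonical = designate_canonical(files)
updated = redirect(references, canonical)
print(updated)  # {'release-a': 'img-0001', 'release-b': 'img-0932'}
```

Redirecting references rather than deleting files first means that press releases and search results keep resolving while archivists review the flagged groups.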

For researchers, journalists and city agencies that rely on these archives daily, the practical takeaway is straightforward: the portal cleanup is scheduled in phases, beginning with the Secretaria de Comunicação's press image bank in the third quarter of 2026. Users should expect temporary search disruptions and are advised to download and locally archive any specific images they need for ongoing projects before September, when the first deduplication batch goes live.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
