The Daily São Paulo

How São Paulo's Digital Archives Became a Minefield of Duplicated Images — and How We Got Here

A decade of unchecked digital growth, underfunded municipal systems, and rapid social media adoption left the city's public records and newsrooms drowning in duplicate visual content.

By São Paulo News Desk · Published 4 July 2026, 4:23 pm

São Paulo's municipal government confirmed this week that more than 340,000 duplicate image files are clogging the servers of the Secretaria Municipal de Comunicação, a backlog that has grown quietly since the city's first major digitisation push began under the Programa SP Digital in 2014. The problem is not new. But it has reached a scale that is now actively distorting public records, slowing journalism verification workflows, and costing taxpayers real money in storage and administrative labour.

The timing matters. The Nunes administration has staked significant political capital on its smart-city agenda, promising by the end of 2026 to migrate key urban infrastructure data — including flood sensor maps along the Tietê River corridor and the drainage crisis documentation across the Zona Leste — onto a unified open-data platform. Duplicate images embedded in those datasets are not merely inconvenient. They produce false metadata matches, skew analytical outputs, and in at least two documented cases earlier this year delayed emergency response coordination because field teams pulled outdated aerial photographs that had been stored multiple times under different file names.

A Problem Built Slowly, Layer by Layer

The roots go back further than 2014. When newsrooms and government offices across São Paulo began digitising physical archives in earnest during the early 2000s, the process was fragmented. The Arquivo Histórico Municipal, housed on Rua Campos Melo in the Bela Vista neighbourhood, ran its own digitisation protocols. The Empresa Municipal de Urbanização — EMURB, later restructured into SP Urbanismo — ran separate ones. Neither system spoke to the other, and both routinely scanned the same photographs of development projects along Avenida Paulista and the old Centro Histórico without any deduplication step built into the workflow.

Social media accelerated the chaos. Between 2016 and 2020, as prefeitura communications teams began publishing heavily on Facebook and Instagram, images were downloaded, recompressed, re-uploaded, and reshared at scale. A single aerial photograph of the Parque Estadual da Cantareira, for example, might exist in six or seven versions across official channels — each with slightly different resolution or colour profile, each treated by the system as a unique asset. By the time the Secretaria de Infraestrutura e Obras adopted cloud storage through a contract with a domestic provider in March 2022, the inherited mess came with it.
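A minimal sketch of why re-encoded copies slip through, assuming a system that compares files by a hash of their raw bytes (the article does not say how the city's systems identify assets). The file names and byte strings below are made up for illustration, and `group_by_content` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Group file names by the SHA-256 digest of their bytes.

    `files` maps a file name to its raw bytes (a hypothetical in-memory
    stand-in for a storage bucket or directory).
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one file name.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up byte strings standing in for image files.
files = {
    "cantareira_aerial.jpg": b"original-bytes",
    "IMG_0042.jpg": b"original-bytes",                  # same bytes, new name
    "cantareira_instagram.jpg": b"recompressed-bytes",  # re-encoded variant
}
duplicates = group_by_content(files)
```

Under these assumptions, the renamed copy is caught because its bytes are identical, while the recompressed version produces a different digest and is counted as a separate asset.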

Newsrooms were not immune. Internal audits at two major Paulistano digital outlets — neither of which agreed to be identified by name — found that their content management systems contained image duplication rates of between 18 and 24 percent as of late 2025. For a publication running 50 visual stories a day, that is a meaningful drag on search, on load times, and on the work of photo editors trying to verify whether an image has been used before.

What Comes Next for Public and Private Archives

The federal government's Rede Nacional de Ensino e Pesquisa ran a pilot deduplication project with the Universidade de São Paulo's Instituto de Ciências Matemáticas e de Computação — the ICMC, based in São Carlos — that reduced duplicate rates in a test archive by 67 percent using perceptual hashing algorithms. The methodology is now being proposed for adoption by SP Urbanismo for its urban planning image library, according to documents obtained by this newspaper through a Lei de Acesso à Informação request filed in May.
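The article does not say which perceptual hashing algorithm the pilot used. As an illustration only, the sketch below implements a difference hash (dHash), one common variant, in plain Python. It assumes each image has already been reduced to a small grid of grayscale brightness values; the pixel grids and the `max_distance` threshold are hypothetical.

```python
def dhash(pixels):
    """Difference hash of a grayscale image given as rows of brightness values.

    Each bit records whether a pixel is brighter than its right-hand
    neighbour, so a uniform brightness shift leaves the hash unchanged.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(p1, p2, max_distance=2):
    """Treat two images as duplicates if their hashes differ in few bits."""
    return hamming(dhash(p1), dhash(p2)) <= max_distance


# Made-up 4x5 brightness grids (real pipelines typically downscale to 9x8).
original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 50, 10, 50, 10],
    [30, 30, 60, 20, 20],
]
brighter = [[v + 15 for v in row] for row in original]  # same picture, brighter
mirrored = [list(reversed(row)) for row in original]    # a different picture
```

Here `is_near_duplicate(original, brighter)` holds because the brightness shift preserves every left-versus-right comparison, while the mirrored grid differs in enough bits to be treated as a distinct image. Unlike a byte-level checksum, this kind of comparison tolerates recompression and colour-profile changes, at the cost of needing a tuned threshold.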

The practical cost is not trivial. Cloud storage for municipal image assets ran to approximately R$4.2 million in the 2025 fiscal year, according to the prefeitura's published budget. Analysts who reviewed those figures estimate that between 15 and 20 percent of that expenditure covered redundant data. That is roughly R$630,000 to R$840,000 a year spent storing the same photographs twice, three times, or more.
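A quick check of the arithmetic, using only the budget figure and the analysts' percentage range quoted above (the variable names are illustrative):

```python
budget_brl = 4_200_000           # cloud storage spend, 2025 fiscal year
low_pct, high_pct = 15, 20       # estimated redundant share, in percent

# Integer arithmetic avoids floating-point rounding in the result.
low_brl = budget_brl * low_pct // 100
high_brl = budget_brl * high_pct // 100

print(f"Redundant storage: R${low_brl:,} to R${high_brl:,} a year")
```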

For journalists and researchers working with São Paulo's public image archives, the immediate advice is blunt: cross-reference any visual asset pulled from official sources against the Arquivo Histórico Municipal's catalogue before publication. The Secretaria Municipal de Comunicação has said it expects to complete a first-pass deduplication of its servers by November 2026, though no independent oversight body has been named to verify that timeline. Until then, the duplicates remain — a quiet, expensive record of how fast the city digitised, and how little anyone planned for what happened next.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
