How São Paulo's Digital Archives Became a Minefield of Duplicate Images — and What's Being Done About It
Years of rapid digitisation across city agencies and newsrooms left a sprawling mess of redundant visual content; now a reckoning is underway.

São Paulo's municipal government is sitting on a problem it largely created itself. Across the Prefeitura's network of digital platforms — from the official portal on prefeitura.sp.gov.br to the press offices of individual secretariats — thousands of duplicate images have accumulated over nearly a decade of uncoordinated digitisation drives, clogging databases, slowing public-facing websites and, in several documented cases, attaching the wrong photographs to official press releases.
The scale of the mess reflects how fast the city moved online without setting consistent rules. Between 2015 and 2023, successive administrations pushed agencies to digitise records and go paperless, a goal accelerated by the pandemic. But without a centralised asset management system, each secretariat built its own image library. Photographs of the same Paulista Avenue ceremony or the same Ibirapuera Park flooding event would be uploaded four or five times under different file names, in different resolutions, tagged with different metadata — or no metadata at all.
The consequences are more than cosmetic. When a duplicate image is attached to the wrong story (a 2019 flood photograph illustrating a 2024 infrastructure announcement, for example), it creates a credibility gap that reporters and civil society monitors are quick to exploit. Transparência Brasil, an organisation that tracks government communications for accuracy, flagged misattributed city imagery as a recurring issue in its 2024 monitoring report on municipal transparency. Duplicate and mismatched visuals were among the metadata failures the report identified across Brazilian municipal platforms.
Private newsrooms face a parallel crisis. Grupo Folha, which operates the Folha de S.Paulo from its headquarters on Alameda Barão de Limeira in Campos Elísios, began an internal audit of its digital photo archive in late 2024 after editors flagged repeated instances of the same wire image appearing in print and online under different captions. The problem was not unique to any single outlet; it reflected industry-wide habits formed when digital storage was cheap and image curation was considered a back-office afterthought.
Technicalities matter here. A duplicate image is not simply two copies of the same file. It includes visually identical photographs saved in different formats — a JPEG and a PNG of the same shot — as well as near-duplicates: slightly cropped versions, images with different colour corrections, or the same frame pulled from a video at marginally different timestamps. Standard search tools miss most of these. Identifying them requires perceptual hashing algorithms, a technology that only in the past three years has become affordable enough for mid-sized organisations to deploy without enterprise budgets.
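For readers who want to see what perceptual hashing means in practice, here is a minimal sketch of one common variant, the difference hash (dHash). It is an illustration only, not the tool being piloted by the city or by any newsroom named in this article. It assumes each image has already been decoded into rows of greyscale values from 0 to 255. The function names and the 10-bit threshold are illustrative choices, not a standard.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 luminance values (assumed already decoded)


def shrink(pixels: Gray, width: int, height: int) -> List[List[float]]:
    """Downsample to width x height by averaging rectangular blocks of pixels."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Gray, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair in a small grid."""
    small = shrink(pixels, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, threshold: int = 10) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits.

    The threshold of 10 out of 64 bits is an assumed rule of thumb, not a standard.
    """
    return hamming(dhash(a), dhash(b)) <= threshold
```

Because the hash records only whether each region is brighter than its neighbour, a uniformly brightened copy of a photograph produces the same hash as the original, and a re-saved or lightly colour-corrected copy typically lands within a few bits of it. A byte-for-byte file comparison would treat all of these as unrelated files. A production system would also need image decoding, handling of crops and rotations, and an index for searching large archives, none of which is shown here.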
The Secretaria Municipal de Inovação e Tecnologia, which operates out of offices in the Centro district, announced in March 2026 that it would pilot a duplicate-detection tool across three city secretariats before the end of the third quarter. The pilot includes the Secretaria de Obras, whose image archive runs to roughly 400,000 files accumulated since 2012, according to the secretariat's own inventory disclosed under a freedom-of-information request filed by this newspaper in April.
On the private side, the São Paulo tech ecosystem has produced at least two startups specifically targeting this space. One, based in Vila Olímpia, raised a seed round in early 2025 to develop tools for Brazilian Portuguese-language content management systems used by media companies. Another, incubated at the Centro de Inovação do Parque Tecnológico São José dos Campos — about 90 kilometres from the city — has been adapting similar technology for use by state agencies under a contract with the Governo do Estado de São Paulo.
For ordinary readers and for civil servants trying to keep public communications clean, the practical advice is straightforward: any organisation still relying on folder-based image storage with manual naming conventions is operating with a system that was obsolete by 2020. Metadata standards (embedding the date, photographer, location and usage rights directly into every image file at the moment of upload) would prevent most downstream problems before they start. The pilot announced in March 2026 will be the first real test of whether São Paulo's bureaucracy can move fast enough to close a gap that has been widening for a decade.
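As a rough sketch of what enforcing such a standard at upload could look like, the check below refuses any image whose record lacks one of the four fields named above. The class and function names are hypothetical, and the sketch only validates a record. Writing the values into the image file itself would rely on EXIF, IPTC or XMP tooling, which is not shown.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ImageMetadata:
    """The four fields named in the article; the field names are illustrative."""
    date: Optional[str] = None          # e.g. an ISO 8601 date string
    photographer: Optional[str] = None
    location: Optional[str] = None
    usage_rights: Optional[str] = None


def missing_fields(meta: ImageMetadata) -> List[str]:
    """Return the names of fields that are absent or blank, in declaration order."""
    return [f.name for f in fields(meta) if not (getattr(meta, f.name) or "").strip()]


def accept_upload(meta: ImageMetadata) -> bool:
    """Accept an upload only when every required field is filled in."""
    return not missing_fields(meta)
```

A rejected upload can then report exactly which fields are missing, which is the point at which a press officer is best placed to supply them.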
About this article
Published by The Daily São Paulo