The Daily São Paulo

São Paulo news, every day

How São Paulo's Digital Archives Became a Minefield of Duplicate Images — and Why It Took Years to Fix

A sprawling, fragmented network of municipal databases left the city's public record riddled with redundant photos, broken links and legal liability — here's how it happened.

By São Paulo News Desk · Published 4 July 2026, 3:45 pm

São Paulo's city hall has been quietly working through a problem that accumulated over roughly two decades: tens of thousands of duplicate images spread across disconnected municipal servers, from the Secretaria Municipal de Urbanismo e Licenciamento on Rua São Bento to the digital collections maintained by the São Paulo city archives on Rua Voluntários da Pátria, in Santana. The cleanup effort, now in its third year under the Nunes administration, has involved at least four municipal secretariats and a contracted technology firm operating out of a data centre in the Berrini corridor.

The problem matters now because Brazil's Lei Geral de Proteção de Dados (LGPD), whose administrative sanctions became enforceable in August 2021, made redundant and untracked image files a legal liability, not just an administrative headache. Duplicate photographs of citizens in permit applications, social program registrations and public event documentation exposed the city to fines from the Autoridade Nacional de Proteção de Dados. For a city the size of São Paulo, which processed more than 1.4 million digital documents in 2023 alone according to figures published by the municipal Controladoria Geral, the stakes were significant.

How the Duplication Crisis Built Up

The roots go back to the early 2000s. Each municipal department digitised its own records independently, using incompatible file management systems purchased in separate budget cycles. The Secretaria de Educação, for instance, built its own image repository for school enrollment documentation. The Secretaria de Saúde did the same for clinic registrations. Neither talked to the other. When the city later attempted to migrate both into a unified platform — a project that started under the Haddad administration in 2014 — automated batch uploads created copies rather than merging records.

The problem compounded during the Covid-19 pandemic, when emergency digitisation of social assistance documents pushed thousands of images into the system under deadline pressure, without deduplication protocols. The Secretaria de Assistência e Desenvolvimento Social, which operates service centres across peripheral neighbourhoods from Cidade Tiradentes to Capão Redondo, saw its image backlog grow sharply between March 2020 and the end of 2021.

Technicians at the Instituto de Pesquisas Tecnológicas (IPT), which has advised the city on infrastructure projects from the Pinheiros river cleanup to the drainage works along the Tietê, flagged the scale of the duplicate image problem in an internal diagnostic commissioned in late 2022. The report, referenced in a municipal budget justification document from January 2023, identified redundancy rates above 30 percent in at least three departmental image libraries. The city has not published the full diagnostic.

What the Fix Looks Like — and What Comes Next

The current remediation effort uses perceptual hashing, a technique that derives a compact fingerprint from each image's visual content and matches near-identical versions even when file names differ. A São Paulo-based startup from the Cubo Itaú tech hub in Vila Olímpia won the contract in early 2024 to deploy the tooling across six secretariats. Progress has been uneven: as of the first quarter of 2026, three secretariats had completed their audits while two others, including the housing secretariat, were still mid-process, according to a progress update published on the Prefeitura de São Paulo's transparency portal in March 2026.
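For readers curious how perceptual hashing works in practice, here is a minimal sketch in Python. It uses "average hash", one of the simplest variants. The article does not say which algorithm the city's contractor uses, so this is an illustration of the general idea, not a description of the deployed tooling. The function names, the 8 x 8 grid size and the toy images are all assumptions made for the example.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values in the range 0-255


def downscale(pixels: Gray, size: int = 8) -> Gray:
    """Shrink an image to size x size by averaging rectangular blocks.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    grid = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        grid.append(row)
    return grid


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Build a fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = [value for row in downscale(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    fingerprint = 0
    for value in cells:
        fingerprint = (fingerprint << 1) | (1 if value > mean else 0)
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


# Toy data (hypothetical): a 16 x 16 gradient, a slightly brighter copy,
# and an inverted version standing in for a genuinely different image.
original = [[8 * x + 4 * y for x in range(16)] for y in range(16)]
brighter = [[value + 3 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

near = hamming_distance(average_hash(original), average_hash(brighter))
far = hamming_distance(average_hash(original), average_hash(inverted))
print("brighter copy:", near, "inverted image:", far)
```

The fingerprint depends only on pixel content, so two files with different names hash the same if they look the same. A small Hamming distance between fingerprints suggests a near-duplicate. In this toy case the brightened copy differs by zero bits, while the inverted image differs in most of them.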

The city is also piloting a new image ingestion policy that requires deduplication checks at the point of upload, rather than after the fact. The pilot launched in February 2026 at two service hubs — one in Pinheiros, one in São Miguel Paulista — before any planned citywide rollout.
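A deduplication check at the point of upload could look roughly like the sketch below. This is an assumption-laden illustration, not the city's actual ingestion system: the `ImageIndex` class, the record identifiers and the threshold of 5 differing bits are all hypothetical, and the 64-bit fingerprints are assumed to come from some perceptual hash computed beforehand.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ImageIndex:
    """Hypothetical in-memory index of fingerprints for images already on file."""

    max_distance: int = 5  # assumed threshold: bits allowed to differ for a "duplicate"
    known: Dict[str, int] = field(default_factory=dict)

    def find_duplicate(self, fingerprint: int) -> Optional[str]:
        """Return the id of an existing record whose fingerprint is close enough, if any."""
        for record_id, existing in self.known.items():
            if bin(fingerprint ^ existing).count("1") <= self.max_distance:
                return record_id
        return None

    def ingest(self, record_id: str, fingerprint: int) -> Optional[str]:
        """Check before storing: keep the upload only when no near-duplicate exists.

        Returns the id of the matching record, or None when the image was stored as new.
        """
        match = self.find_duplicate(fingerprint)
        if match is None:
            self.known[record_id] = fingerprint
        return match


index = ImageIndex()
first = index.ingest("permit-001", 0b1111)                  # nothing on file yet
second = index.ingest("permit-002", 0b1110)                 # differs from permit-001 by one bit
third = index.ingest("clinic-001", 0xFFFFFFFF00000000)      # clearly different image
print(first, second, third)
```

The point of the design is that the comparison happens before the file is stored, so a near-duplicate is linked to the existing record rather than added alongside it. A linear scan like this would not scale to millions of images; a production system would need a proper similarity index.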

For residents and businesses dealing with São Paulo's bureaucracy, the practical upshot is slower but cleaner digital processing. Permit applicants at the Poupatempo unit on Avenida Paulista have already reported fewer requests to resubmit documents that the system had previously failed to locate because of conflicting duplicate records. The full integration of municipal image databases remains, by the city's own published timeline, a target for completion by the end of 2027.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
