São Paulo's city hall has been quietly working through a problem that accumulated over roughly two decades: tens of thousands of duplicate images spread across disconnected municipal servers, from the Secretaria Municipal de Urbanismo e Licenciamento on Rua São Bento to the digital collections maintained by the São Paulo city archives on Rua Voluntários da Pátria, in Santana. The cleanup effort, now in its third year under the Nunes administration, has involved at least four municipal secretariats and a contracted technology firm operating out of a data centre in the Berrini corridor.
The problem matters now because Brazil's Lei Geral de Proteção de Dados (the LGPD, whose penalties became enforceable in August 2021) made redundant and untracked image files a legal liability, not just an administrative headache. Duplicate photographs of citizens in permit applications, social program registrations and public event documentation created exposure to fines from the Autoridade Nacional de Proteção de Dados. For a city the size of São Paulo, which processed more than 1.4 million digital documents in 2023 alone according to figures published by the municipal Controladoria Geral, the stakes were significant.
How the Duplication Crisis Built Up
The roots go back to the early 2000s. Each municipal department digitised its own records independently, using incompatible file management systems purchased in separate budget cycles. The Secretaria de Educação, for instance, built its own image repository for school enrollment documentation. The Secretaria de Saúde did the same for clinic registrations. Neither talked to the other. When the city later attempted to migrate both into a unified platform — a project that started under the Haddad administration in 2014 — automated batch uploads created copies rather than merging records.
The problem compounded during the Covid-19 pandemic, when emergency digitisation of social assistance documents pushed thousands of images into the system under deadline pressure, without deduplication protocols. The Secretaria de Assistência e Desenvolvimento Social, which operates service centres across peripheral neighbourhoods from Cidade Tiradentes to Capão Redondo, saw its image backlog grow sharply between March 2020 and the end of 2021.
Technicians at IPT (the Instituto de Pesquisas Tecnológicas, which has advised the city on infrastructure projects from the Pinheiros river cleanup to the drainage works along the Tietê) flagged the scale of the duplicate image problem in an internal diagnostic commissioned in late 2022. The report, referenced in a municipal budget justification document from January 2023, identified redundancy rates above 30 percent in at least three departmental image libraries. The city has not published the full diagnostic.
What the Fix Looks Like — and What Comes Next
The current remediation effort uses perceptual hashing, a technique that derives a compact fingerprint from each image's visual content, so that near-identical versions match even when file names, formats or compression levels differ. A São Paulo-based startup from the Cubo Itaú tech hub in Vila Olímpia won the contract in early 2024 to deploy the tooling across six secretariats. Progress has been uneven: as of the first quarter of 2026, three secretariats had completed their audits while two others, including the housing secretariat, were still mid-process, according to a progress update published on the Prefeitura de São Paulo's transparency portal in March 2026.
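For readers curious how perceptual hashing works in practice, the sketch below shows one of its simplest variants, the "average hash". It is illustrative only: the city has not published which algorithm or threshold its contractor uses, and the function names, the 8x8 grid and the distance cutoff of 5 bits are all assumptions made for this example. Images are represented as plain grids of grayscale values so the code runs without any imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging rectangular blocks.

    Assumes the image is at least size pixels wide and tall.
    """
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Hypothetical cutoff: treat images as duplicates if few bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends on the pattern of light and dark regions rather than on exact bytes, a rescanned or slightly brightened copy of a document photo yields the same or a very similar hash, while an unrelated image differs in many bits. Production systems typically use more robust variants and tune the cutoff against real data.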
The city is also piloting a new image ingestion policy that requires deduplication checks at the point of upload, rather than after the fact. The pilot launched in February 2026 at two service hubs — one in Pinheiros, one in São Miguel Paulista — before any planned citywide rollout.
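A check at the point of upload can be pictured as a lookup against fingerprints already on file, performed before a new image is stored. The sketch below is a minimal illustration under stated assumptions, not a description of the city's pilot: the class name `UploadIndex`, the 64-bit integer fingerprints and the 5-bit tolerance are hypothetical, and a linear scan like this would need a proper index at municipal scale.

```python
from typing import Dict, Optional


class UploadIndex:
    """Hypothetical in-memory registry of image fingerprints seen so far."""

    def __init__(self, max_distance: int = 5) -> None:
        self.max_distance = max_distance  # assumed tolerance, in differing bits
        self._known: Dict[str, int] = {}  # document id -> fingerprint

    def find_match(self, fingerprint: int) -> Optional[str]:
        """Return the id of a stored image within the tolerance, if any."""
        for doc_id, known in self._known.items():
            if bin(fingerprint ^ known).count("1") <= self.max_distance:
                return doc_id
        return None

    def ingest(self, doc_id: str, fingerprint: int) -> Optional[str]:
        """Store the image only if it is new; otherwise return the existing match."""
        match = self.find_match(fingerprint)
        if match is None:
            self._known[doc_id] = fingerprint
        return match


index = UploadIndex()
first = index.ingest("permit-001", 0xFFFF0000FFFF0000)   # new image, stored
second = index.ingest("permit-002", 0xFFFF0000FFFF0001)  # one bit off, flagged
```

Here `first` is `None` because nothing matched, while `second` is `"permit-001"`, signalling that the upload should be linked to the existing record instead of saved as a fresh copy.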
For residents and businesses dealing with São Paulo's bureaucracy, the practical upshot is slower but cleaner digital processing. Permit applicants at the Poupatempo unit on Avenida Paulista have already reported fewer requests to resubmit documents that the system had previously failed to locate because of conflicting duplicate records. The full integration of municipal image databases remains, by the city's own published timeline, a target for completion by the end of 2027.