São Paulo's public digital infrastructure is sitting on a problem years in the making. Across municipal platforms — from the Prefeitura de São Paulo's open-data portal to the cultural archives maintained by the Centro Cultural São Paulo on Rua Vergueiro — redundant and duplicated image files have accumulated into the hundreds of thousands, consuming server resources, inflating cloud storage contracts, and complicating the work of designers, journalists, and government communicators alike.
The issue matters now because the city is midway through a major digital modernisation push tied to Mayor Ricardo Nunes's Programa São Paulo Digital, a municipal initiative aimed at centralising civic services online. As more photography, infographics, and social media assets are uploaded to shared repositories, the duplicate problem compounds. Administrators are discovering that the same scanned document or event photograph sometimes exists in six or seven versions across different departmental folders — each one slightly renamed, slightly re-cropped, and tagged differently.
How the Archive Got This Cluttered
The roots run back to at least 2017, when the city's communications secretariat, the Secretaria Municipal de Comunicação, expanded its digital newsroom operations and began requiring all subprefeituras — São Paulo has 32 of them — to upload photographic records of public events independently. Without a centralised deduplication protocol, parallel uploads became routine. The problem accelerated during the Covid-19 pandemic years, when remote work meant dozens of staffers across districts like Pinheiros, Itaquera, and Santana were pulling images from shared drives without coordination, editing locally, and re-uploading finished versions alongside the originals.
The commercial sector compounded the effect. Agencies working along Avenida Paulista, the city's de facto advertising and media corridor, routinely licensed stock photography from multiple platforms, downloaded the same asset in different resolutions, and pushed all versions into client folders for supposed future-proofing. A digital asset management consultant working with mid-sized agencies in the Vila Olímpia financial district described the pattern as endemic across the sector, though the practice is hardly unique to São Paulo. Britain's National Health Service faced a comparable internal reckoning with duplicated digital assets after a 2021 procurement audit found redundancy rates exceeding 40 percent in certain departmental databases, according to a report from the UK's Government Digital Service.
The financial toll is not abstract. Cloud storage pricing in Brazil, where the market is dominated by contracts with providers operating under the compliance requirements of the LGPD (the Lei Geral de Proteção de Dados, enacted in 2018 and in force since 2020), has climbed steadily. Industry estimates from the Associação Brasileira das Empresas de Software (ABES) put average corporate cloud expenditure growth at roughly 22 percent annually between 2022 and 2025 in São Paulo state alone. Bloated image libraries feed directly into that cost, since tiered storage pricing penalises volume, not just active usage.
The Technology Gap Behind the Mess
Duplicate image replacement, the systematic process of identifying redundant files, designating a canonical master version, and redirecting all references to that master, has existed as a technical discipline for decades. Tools capable of perceptual hashing, which reduces each image to a compact fingerprint so that visually near-identical images match regardless of filename or minor compression differences, have been commercially available since the early 2010s. The gap in São Paulo has been institutional, not technological. Procurement cycles are long, inter-departmental data-sharing agreements are cumbersome, and IT budgets at the subprefeitura level have historically been thin.
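To make the idea of perceptual hashing concrete, here is a minimal sketch of one common variant, the average hash, written in plain Python. It is an illustration only, not the method used by any tool or agency named in this article. The function names (`shrink`, `average_hash`, `hamming`) and the synthetic test images are assumptions made for the example, and the images are represented as grids of grayscale values at least 8 pixels on each side rather than real files.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> List[float]:
    """Downsample to size x size cells by averaging equal blocks.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * height // size, (r + 1) * height // size
        for c in range(size):
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Pack one bit per cell: 1 if the cell is brighter than the mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 32x32 images standing in for real files (hypothetical data).
original = [[x * 8 for x in range(32)] for _ in range(32)]      # left-to-right gradient
recompressed = [[v + 3 for v in row] for row in original]       # slightly brighter copy
unrelated = [[y * 8 for _ in range(32)] for y in range(32)]     # top-to-bottom gradient

print(hamming(average_hash(original), average_hash(recompressed)))
print(hamming(average_hash(original), average_hash(unrelated)))
```

A small distance between two hashes suggests the images are near-duplicates; a large one suggests different content. The threshold that separates the two is a tuning decision that depends on the archive, and production tools typically work from decoded image files rather than hand-built grids.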
The Escola de Artes, Ciências e Humanidades at USP's Ermelino Matarazzo campus has been researching municipal digital governance, and researchers there have flagged the archive issue as a case study in what happens when digitisation outpaces digital governance. No formal remediation timeline has been announced by city hall as of this week.
For organisations facing the same problem right now, archivists recommend starting with a full audit using open-source perceptual hashing tools, several of which are compatible with Portuguese-language metadata standards, before any deletion campaign. Establishing a single named owner for each canonical image file, rather than a shared departmental folder, dramatically reduces re-duplication rates. For the Prefeitura, the pressure will only intensify as São Paulo Digital's 2027 full-integration deadline approaches.
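The audit-before-deletion approach can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not a description of any city system: the `ImageRecord` fields, the `plan_audit` function, the folder paths, the hash values, and the rule of treating the highest-resolution copy as the canonical master are all hypothetical choices made for the example. The sketch takes perceptual hashes as already computed and only produces a redirect plan; it deletes nothing and does not model file ownership.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ImageRecord:
    path: str    # location in the shared repository (hypothetical)
    phash: int   # perceptual hash computed beforehand
    pixels: int  # width * height, used to pick the master copy


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def plan_audit(records: List[ImageRecord], max_distance: int = 4) -> Dict[str, str]:
    """Map each duplicate path to the canonical path it should point to.

    Largest images are visited first, so the first member of every
    cluster is its highest-resolution copy. Nothing is deleted here.
    """
    ordered = sorted(records, key=lambda r: (-r.pixels, r.path))
    canonicals: List[ImageRecord] = []
    redirects: Dict[str, str] = {}
    for rec in ordered:
        match = next(
            (c for c in canonicals if hamming(c.phash, rec.phash) <= max_distance),
            None,
        )
        if match is None:
            canonicals.append(rec)  # new cluster, this record is its master
        else:
            redirects[rec.path] = match.path
    return redirects


# Hypothetical records: three versions of one event photo plus one map.
records = [
    ImageRecord("pinheiros/evento_final.jpg", 0b1111000011110000, 800 * 600),
    ImageRecord("comunicacao/evento.tif", 0b1111000011110000, 4000 * 3000),
    ImageRecord("itaquera/evento_crop.jpg", 0b1111000011110001, 1200 * 900),
    ImageRecord("santana/mapa.png", 0b0000111100001111, 1000 * 1000),
]

plan = plan_audit(records)
for duplicate, master in sorted(plan.items()):
    print(f"{duplicate} -> {master}")
```

Producing a reviewable plan first, rather than deleting on match, leaves room for a human to catch false positives, such as two different photographs of the same podium, before any reference is redirected.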