An estimated 34% or more of all image files stored on São Paulo's municipal digital infrastructure are duplicates or near-duplicates, according to internal audits reviewed by data specialists working with the city's Secretaria Municipal de Inovação e Tecnologia. The redundancy is not a trivial housekeeping problem. Storage costs, server load, and the downstream failure of AI-powered content tools all trace back to bloated, unmanaged image libraries that, until now, nobody has had a mandate to clean up.
The issue has landed on the desk of the Nunes administration at a sensitive moment. São Paulo is midway through a R$280 million digital modernisation programme called SP Digital, which aims to consolidate public data systems before the 2028 municipal election cycle. Duplicate image data is emerging as one of the programme's most stubborn friction points, because legacy scanning campaigns were conducted without standardised file-naming or hash-checking protocols, particularly those run between 2017 and 2022 during infrastructure and zoning surveys across districts like Brás and Itaquera. The result: servers holding three, four, sometimes five copies of the same photograph of the same building facade.
The Scale of the Redundancy Problem
Storage is not cheap, even at municipal contract rates. São Paulo City Hall currently pays cloud storage providers roughly R$0.09 per gigabyte per month for cold-tier archival data, according to procurement records published on the Portal da Transparência do Município de São Paulo. Analysts working with the Rede Nossa São Paulo civic monitoring group have estimated that duplicate image files across just three municipal departments (Planning, Housing, and Public Works) occupy roughly 18 terabytes of excess billable storage. At current rates, that translates to approximately R$1,620 a month, or R$19,440 a year, spent on files that are, by definition, already elsewhere in the same system.
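The cost arithmetic can be checked directly from the quoted rate and volume. The sketch below is a minimal check, assuming decimal units (1 terabyte = 1,000 gigabytes) and a flat rate with no tiering or minimum charges; the variable names are illustrative and do not come from the procurement records.

```python
from decimal import Decimal

# Figures quoted in the article.
RATE_BRL_PER_GB_MONTH = Decimal("0.09")   # cold-tier archival rate
EXCESS_TERABYTES = Decimal("18")          # estimated duplicate volume

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = Decimal("1000")

excess_gb = EXCESS_TERABYTES * GB_PER_TB
monthly_cost = excess_gb * RATE_BRL_PER_GB_MONTH
annual_cost = monthly_cost * 12

print(f"Monthly: R${monthly_cost:,.2f}")
print(f"Annual:  R${annual_cost:,.2f}")
```

Under a binary convention (1 terabyte = 1,024 gigabytes) the totals would be slightly higher, so the result is best read as an order-of-magnitude check.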
Private sector platforms headquartered in São Paulo are grappling with an amplified version of the same problem. Vtex, the e-commerce technology company based in the Vila Olímpia district, has publicly discussed image deduplication as a technical priority for merchant catalogues on its platform, where millions of product images are uploaded by tens of thousands of Brazilian sellers. Research on the broader Brazilian e-commerce sector, compiled by the Associação Brasileira de Comércio Eletrônico (ABComm), found that product catalogues on major platforms average a 22% image duplication rate, which inflates CDN delivery costs and lengthens the page load times that directly affect conversion rates. A one-second delay in mobile page load can reduce conversions by up to 20%, a figure ABComm has cited in reports on digital retail performance.
What Deduplication Actually Costs — and Saves
The technology to solve this is not new. Perceptual hashing algorithms, which compare images by structural similarity rather than exact byte-matching, have been commercially available since the early 2010s. Open-source libraries implementing these methods are in active use by newsrooms including Agência Pública, the investigative outlet based in the Pinheiros neighbourhood, which uses image fingerprinting to check whether photos submitted by freelancers have been previously published or digitally manipulated.
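To make the idea of comparing images by structure rather than by bytes concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is an illustration only, not the method used by any organisation named here. It assumes the image has already been decoded, converted to greyscale, and downscaled to a small grid (8x8 is a common choice); the function names and the threshold of 5 bits are hypothetical.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # rows of greyscale values, 0-255


def average_hash(pixels: Grid) -> int:
    """Return a bit string (as an int): 1 where a pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Treat two images as near-duplicates if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold


# Synthetic 8x8 test data: a gradient, a slightly brighter copy, a negative.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[value + 3 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

print(is_near_duplicate(original, brightened))  # same structure
print(is_near_duplicate(original, inverted))    # opposite structure
```

A uniform brightness shift changes every byte of the file but leaves the hash untouched, which is why this family of techniques catches re-scans and re-exports that exact byte comparison misses.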
The barrier in São Paulo's public sector is not technology but procurement lag and inter-secretariat coordination. A deduplication project tendered by the Secretaria de Gestão in late 2024 stalled for seven months in committee review before receiving a go-ahead in March 2026. The contract, valued at R$4.2 million, was awarded to a São Paulo-based technology integrator and covers a 14-month implementation window across six municipal data repositories, including the GeoSampa geospatial platform, which holds hundreds of thousands of aerial and street-level photographs of the city's 96 administrative districts.
For city agencies, technology departments, and businesses managing large visual databases, the practical steps are straightforward even if the organisational will has historically been absent: implement hash-based deduplication at the point of upload, run retrospective audits on archives older than three years, and establish a single master-file policy before any new digitisation campaign begins. The mathematics of doing nothing are now too visible to ignore. São Paulo has built a data infrastructure worthy of Latin America's largest city economy. Keeping duplicate copies of it is a luxury no administration's IT budget can reasonably justify.
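The first of those steps, hash-based deduplication at the point of upload, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of the city's contracted system. The `UploadIndex` class and its `register` method are hypothetical names, the index lives in memory rather than in a database, and SHA-256 catches only byte-identical copies, so near-duplicates would need a perceptual hash on top.

```python
import hashlib


class UploadIndex:
    """Hypothetical in-memory index mapping content digests to master files."""

    def __init__(self):
        # Maps SHA-256 hex digest -> name of the first file seen with it.
        self._master_by_digest = {}

    def register(self, name: str, data: bytes) -> str:
        """Record an upload and return the name of its master file.

        If identical bytes were uploaded before, the earlier name is
        returned and the new copy does not need to be stored.
        """
        digest = hashlib.sha256(data).hexdigest()
        return self._master_by_digest.setdefault(digest, name)


index = UploadIndex()
first = index.register("facade_bras_001.jpg", b"example image bytes")
second = index.register("scan_copy_final.jpg", b"example image bytes")
third = index.register("facade_itaquera_002.jpg", b"different bytes")

print(first, second, third)
```

Returning the master name instead of rejecting the upload lets the calling system store a reference to the existing file, which is one way to enforce a single master-file policy without discarding the uploader's metadata.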