São Paulo's Digital Image Crisis: The Numbers Behind the City's Duplicate Photo Problem
Researchers and public agencies are finally counting the cost of redundant image data clogging municipal databases, and the figures are striking.

São Paulo's public digital archives hold millions of images, and a growing share of them are exact or near-exact duplicates. A technical audit completed in June 2026 by engineers at the Instituto de Pesquisas Tecnológicas, the state research body headquartered on the Cidade Universitária campus in Butantã, found that duplicate images accounted for roughly 34 percent of total file storage across the three major municipal platforms it reviewed. That figure is not an anomaly. It reflects a structural problem that has been building since city agencies began aggressive digitisation drives after 2018.
The timing matters. Mayor Ricardo Nunes's administration is midway through a R$2.1 billion smart-city investment cycle — much of it funnelled through the Secretaria Municipal de Inovação e Tecnologia — and storage inefficiency directly erodes the return on that spending. Every redundant image file occupies server space, inflates cloud-hosting costs, and slows the retrieval systems that emergency responders, urban planners, and journalists rely on daily. With São Paulo's flooding and drainage crisis forcing the city to maintain near-real-time photographic documentation of risk zones from Itaquera to Pinheiros, the duplication problem is no longer a back-office annoyance.
The IPT audit examined image repositories across the GeoSampa mapping portal, the Prefeitura's urban mobility documentation system, and the digital archive maintained by the Empresa Municipal de Urbanização, known as EMURB. Across those three platforms, engineers identified more than 1.4 million duplicate image pairs. Storage costs for redundant files were estimated at approximately R$180,000 per year in cloud contracts alone — a figure that does not include staff time spent manually tagging, retrieving, or resolving conflicting file versions.
The duplication rate varies sharply by department. Infrastructure documentation — photographs of road works on Avenida Paulista, drainage interventions along the Tietê corridor, construction sites near the Linha 6-Laranja metro works in Higienópolis — tends to have the highest redundancy, sometimes exceeding 50 percent per project folder. That happens because field technicians from different contractors upload independently, without a unified deduplication protocol at the point of ingestion. The result is a database that grows faster than the city's actual photographic output would justify.
Brazil's broader digital governance framework compounds the problem. The Lei Geral de Proteção de Dados, which came into full force in August 2021, imposes retention and auditing obligations on public bodies holding personal data — and many of the duplicate images in question contain identifiable faces, vehicle plates, or addresses. Each redundant copy technically represents a separate data-compliance exposure. Legal scholars at the Pontifícia Universidade Católica de São Paulo have flagged this interpretation in working papers, though no enforcement action has yet been taken against a municipal body specifically over duplicate image retention.
The solution is technically straightforward: perceptual hashing algorithms can scan image libraries and flag duplicates with accuracy rates above 98 percent for identical files and around 89 percent for near-duplicates altered by compression or cropping. Several São Paulo-based technology companies operating out of the São Paulo Tech Hub in Vila Olímpia have built exactly this kind of tooling for private clients in retail and media. The municipal market, however, has been slower to adopt it.
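To make the idea concrete, here is a minimal sketch of one simple perceptual hash, the "average hash". It is an illustration only, not the method used in the IPT audit or by any vendor named here. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid of brightness values; the function names, the toy 4x4 grids, and the distance threshold are all hypothetical.

```python
# Illustrative average-hash sketch. All names and the threshold are assumptions.

def average_hash(pixels):
    """Return a bit-string hash for a small grayscale grid (rows of 0-255 ints)."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if the pixel is brighter than the grid's mean, else 0.
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=2):
    """Flag two grids as near-duplicates if their hashes differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Toy data: a dark-left, bright-right image, a slightly brightened copy
# (standing in for recompression), and a mirrored, visibly different image.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
recompressed = [[min(255, value + 3) for value in row] for row in original]
mirrored = [row[::-1] for row in original]

print(is_near_duplicate(original, recompressed))  # small pixel shifts, same hash
print(is_near_duplicate(original, mirrored))      # layout differs in every bit
```

A hash this simple tolerates small brightness and compression changes but copes poorly with cropping; production tools generally use larger grids and more robust transforms, and tune the threshold against labelled samples.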
A procurement process opened by the Secretaria Municipal de Inovação e Tecnologia in March 2026 sought vendors for an automated image-management platform. The budget envelope published in the Diário Oficial do Município was set at R$4.7 million for a three-year contract. As of July 4, 2026, no contract had been awarded, according to procurement records reviewed by The Daily São Paulo.
For agencies and organisations outside city hall grappling with the same problem, the practical steps are well established: implement hash-based deduplication at upload rather than retrospectively, enforce a single point of ingestion per project, and audit existing libraries at least annually. São Paulo's own numbers show why the case cannot rest on storage alone: R$180,000 a year in preventable cloud costs comes to about R$540,000 over a three-year contract, well short of the R$4.7 million budgeted. The argument for moving faster depends on what that figure leaves out, namely the staff time spent resolving conflicting file versions and the compliance exposure that each redundant copy carries.
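The first two steps, deduplication at upload and a single point of ingestion, can be sketched in a few lines. This is a hedged illustration using Python's standard `hashlib` module; the `IngestionPoint` class, its methods, and the file names are hypothetical and do not describe any municipal system.

```python
import hashlib


class IngestionPoint:
    """Hypothetical single point of ingestion for one project's images."""

    def __init__(self):
        # Maps a SHA-256 hex digest to the name of the first file accepted with it.
        self._seen = {}

    def upload(self, filename, data):
        """Accept the file unless identical bytes were already accepted.

        Returns (accepted, existing_name): existing_name is the earlier
        file's name when the upload is rejected, otherwise None.
        """
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False, self._seen[digest]
        self._seen[digest] = filename
        return True, None

    def stored_count(self):
        """Number of distinct files accepted so far."""
        return len(self._seen)


# Two contractors upload the same photo under different names (assumed data).
point = IngestionPoint()
first = point.upload("contractor_a/site_001.jpg", b"example image bytes")
second = point.upload("contractor_b/IMG_4471.jpg", b"example image bytes")
third = point.upload("contractor_b/IMG_4472.jpg", b"other image bytes")

print(first, second, third, point.stored_count())
```

A cryptographic hash such as SHA-256 catches only byte-identical files. Copies altered by compression or cropping produce different digests and would need a perceptual comparison instead.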
Published by The Daily São Paulo