The Daily São Paulo


São Paulo's Image Duplication Problem: The Numbers That Reveal a City Drowning in Redundant Data

Municipal databases, e-commerce platforms and public archives are haemorrhaging storage costs and processing time because of duplicate image files — and the scale of the problem is larger than most administrators want to admit.

By São Paulo News Desk · Published 4 July 2026, 4:00 pm


Photo: Caroline Cagnin on Pexels

São Paulo's municipal digital infrastructure is carrying millions of redundant image files across its public databases. By some estimates, the problem costs city agencies tens of millions of reais a year in excess storage alone, and private-sector players in the city's sprawling tech corridor are struggling with it just as badly. The remedy has a name: image deduplication, the process of identifying and eliminating repeated visual data that clogs systems, slows performance and inflates operational budgets. The data behind the problem is striking.

Brazil's largest city generates an extraordinary volume of digital images every single day. The São Paulo City Hall's Secretaria Municipal de Gestão manages document archives that include property records, permit photographs and infrastructure inspection imagery for all 96 districts. According to digital governance specialists working with municipal contractors, image duplication rates in large institutional repositories commonly run between 20 and 40 percent of total stored files — meaning that in any database holding 10 million images, as many as 4 million may be redundant copies. Applied to the scale of São Paulo's public administration, the implications for wasted expenditure are significant.
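To make the arithmetic concrete: at a duplication rate of 20 to 40 percent, a 10-million-image repository holds 2 to 4 million redundant copies. The short Python sketch below turns that count into a storage figure. The 2 MB average file size is a hypothetical assumption for illustration, not a number drawn from municipal data.

```python
def redundant_estimate(total_images: int, rate_pct: int, avg_bytes: int) -> tuple[int, int]:
    """Return (redundant_files, redundant_bytes) for a duplication rate given in whole percent."""
    redundant = total_images * rate_pct // 100  # integer maths avoids float rounding
    return redundant, redundant * avg_bytes


# Hypothetical average file size of 2 MB; not a figure from the article.
AVG_BYTES = 2_000_000

for rate in (20, 40):
    files, size = redundant_estimate(10_000_000, rate, AVG_BYTES)
    print(f"{rate}%: {files:,} redundant files, about {size / 1e12:.1f} TB")
```

Under that assumed file size, the sketch prints 4.0 TB of waste at 20 percent and 8.0 TB at 40 percent; real totals depend on the actual mix of scans and photographs in each archive.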

Where the Problem Is Worst

The Poupatempo network — the state government's one-stop service centres with major hubs on Avenida do Estado and in the Itaquera district — processes hundreds of thousands of citizen documents each month. Each transaction can generate multiple image captures of the same ID card, utility bill or facial photograph, depending on how many times a form is resubmitted or a system times out. Duplication in this context is not accidental sloppiness. It is a structural byproduct of how legacy systems were designed, often without any deduplication logic built into the intake pipeline.
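Deduplication logic at intake can be as simple as fingerprinting each uploaded file before it is stored. The Python sketch below is a hypothetical illustration, not a description of Poupatempo's actual systems: the `IntakeStore` class and its id format are invented for this example. It keys stored images by a SHA-256 digest of their bytes, so a form resubmitted after a timeout reuses the existing record instead of creating a new one.

```python
import hashlib


class IntakeStore:
    """Hypothetical intake stage that stores each distinct image payload only once."""

    def __init__(self) -> None:
        self._by_digest: dict[str, str] = {}  # content digest -> stored image id
        self._next_id = 1

    def ingest(self, payload: bytes) -> tuple[str, bool]:
        """Return (image_id, is_new); an identical resubmission reuses the existing id."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._by_digest:
            return self._by_digest[digest], False
        image_id = f"img-{self._next_id}"
        self._next_id += 1
        self._by_digest[digest] = image_id
        return image_id, True


store = IntakeStore()
print(store.ingest(b"scan of ID card"))  # ('img-1', True)
print(store.ingest(b"scan of ID card"))  # ('img-1', False): same file resent after a timeout
print(store.ingest(b"utility bill"))     # ('img-2', True)
```

Byte-level hashing only catches exact resubmissions of the same file. A second photograph of the same ID card produces different bytes, which is the case perceptual hashing is designed to handle.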

On the private side, São Paulo's e-commerce ecosystem, centred on companies operating out of the Vila Olímpia and Berrini tech clusters, faces the same arithmetic. Brazil's e-commerce sector processed more than R$185 billion in transactions in 2024, according to data published by the Associação Brasileira de Comércio Eletrônico. Product catalogues are where the problem concentrates. A single SKU, such as one pair of trainers or one kitchen appliance, can appear in a retailer's database under dozens of duplicate image entries, each uploaded by a different supplier portal, marketing team or automated feed. Cloud storage in Brazil is often priced in dollars but paid in reais, so every wasted gigabyte also carries exchange-rate exposure, a genuine financial risk.

Perceptual hashing technology, which generates a compact digital fingerprint for each image and flags near-identical matches, has been available commercially for years. Companies, including some based at the Cubo Itaú innovation hub on Avenida Brigadeiro Faria Lima, have begun integrating these tools into data pipelines. The process involves assigning a hash value to every image on ingestion; if two hashes differ by no more than a set threshold, usually counted as the number of differing bits, one file is flagged for deletion or archival. Implementation costs for a mid-sized platform of 5 to 10 million stored images typically run between R$80,000 and R$250,000 for a one-time integration project, with ongoing maintenance adding roughly 15 to 20 percent of that amount each year, based on pricing circulated by technology vendors operating in the São Paulo market.
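Perceptual hashes come in several variants, and the article does not say which ones São Paulo vendors use. The Python sketch below shows one of the simplest, an average hash (aHash), under stated assumptions: each image has already been decoded and downscaled to an 8x8 grayscale grid by an imaging library, and the 5-bit match threshold is an illustrative value, not a vendor default. The synthetic test images are likewise invented for the example.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Average hash (aHash): one bit per pixel, 1 where the pixel is brighter than the mean.

    Assumes `pixels` is an already-downscaled grayscale grid (e.g. 8x8, values 0-255).
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bits in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(h1: int, h2: int, threshold: int = 5) -> bool:
    """Flag a pair when the hashes differ in at most `threshold` bits (illustrative default)."""
    return hamming(h1, h2) <= threshold


# Synthetic 8x8 images standing in for decoded, downscaled photos.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]  # gradient 0..252
brighter = [[min(255, p + 3) for p in row] for row in original]     # same scene, brighter
inverted = [[255 - p for p in row] for row in original]             # a different image

h_orig, h_bright, h_inv = (average_hash(img) for img in (original, brighter, inverted))
print(hamming(h_orig, h_bright), is_near_duplicate(h_orig, h_bright))  # 0 True
print(hamming(h_orig, h_inv), is_near_duplicate(h_orig, h_inv))        # 64 False
```

Because each bit records brightness relative to the image's own mean, a uniformly brightened copy keeps the same fingerprint, while a genuinely different image lands far away. Production tools typically use sturdier variants, such as difference hashes or DCT-based hashes, and tune the threshold against a sample of known duplicates.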

What Comes Next for Administrators

The Ricardo Nunes administration's ongoing São Paulo Inteligente programme, which aims to modernise city digital services, does not yet include a published mandate for image deduplication across municipal databases, based on publicly available programme documents. That gap matters. Without a formal policy, each secretariat manages its own storage independently, which means the duplication problem compounds rather than shrinks over time.

For private companies, the practical next step is straightforward: audit existing image libraries before migrating to any new cloud architecture. The worst outcomes happen when organisations move to a new platform and carry every redundant file with them, paying twice: once to transfer the data and again to store it indefinitely. Municipal administrators face the same calculus. A city that spent years digitising paper records on Rua Líbero Badaró and throughout the historic centre now needs a second pass, this time to remove what was scanned more than once.
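A pre-migration audit can start with exact duplicates, which are cheap to find. The Python sketch below is a hypothetical helper rather than any vendor's tool. It walks a directory tree, fingerprints image files by SHA-256 and reports how many bytes would be freed by keeping one copy of each. The list of file extensions is an assumption to adjust per archive.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large scans do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit_duplicates(
    root: Path, suffixes: tuple[str, ...] = (".jpg", ".jpeg", ".png", ".tif", ".tiff")
) -> tuple[dict[str, list[Path]], int]:
    """Group image files under `root` by content; return (duplicate_groups, reclaimable_bytes)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            groups[file_digest(path)].append(path)
    duplicates = {d: sorted(ps) for d, ps in groups.items() if len(ps) > 1}
    # Keeping one copy per group, every additional copy is reclaimable.
    reclaimable = sum(ps[0].stat().st_size * (len(ps) - 1) for ps in duplicates.values())
    return duplicates, reclaimable
```

For example, `audit_duplicates(Path("/archive/scans"))` (the path is illustrative) returns the groups of identical files and the total reclaimable bytes. Documents scanned twice on different days will not match byte for byte, so a thorough audit would follow this pass with a perceptual-hash comparison.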


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
