The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Brutal Story

From municipal databases to e-commerce platforms in Vila Olímpia, redundant image files are costing the city's institutions millions of reais and measurable storage capacity every year.

By São Paulo News Desk · Published 4 July 2026, 4:00 pm

Photo: Juan Pablo Daniel on Pexels

São Paulo's public and private digital infrastructure is sitting on an enormous, largely invisible problem. Duplicate image files — identical or near-identical photos stored multiple times across different servers — now account for an estimated 30 to 40 percent of total storage consumption in mid-sized Brazilian organisations, according to data published by the Brazilian Association of Information Technology and Communication Companies, Brasscom, in its 2025 sector diagnostic report. In a city running over 2,000 active municipal digital systems, the waste is structural.

The timing matters. Mayor Ricardo Nunes's administration is pushing a digital transformation agenda for the Prefeitura de São Paulo, with procurement rounds underway in 2026 to modernise the city's data infrastructure. At the same time, Brazil's data centre construction boom, concentrated along the Castelo Branco corridor between São Paulo and Barueri, has made storage costs visible in ways they were not five years ago. Renting a terabyte of enterprise-grade cloud storage from domestic providers now runs between R$180 and R$400 per month depending on redundancy tier. When duplicate images quietly inflate effective consumption, those invoices inflate with them.

Where the Problem Lives in São Paulo

The issue shows up concretely in two very different corners of the city. At the Secretaria Municipal de Urbanismo e Licenciamento, housed near Viaduto do Chá in the Centro Histórico, land-use filings require photographic documentation of properties. Staff and applicants frequently resubmit the same photographs across amended applications, and legacy systems lack automated deduplication logic. The secretariat manages hundreds of thousands of active licensing processes at any given moment, and image assets attached to those processes have never been systematically audited for redundancy.

In Vila Olímpia, São Paulo's densest concentration of technology companies and startups, the problem wears a different face. E-commerce platforms and content management systems built on rapid iteration cycles often generate multiple cropped or resized versions of the same product photograph, a practice known in engineering teams as derivative duplication. A single product image can spawn six to twelve derivative files across responsive breakpoints, thumbnail generators and CDN cache layers. Multiply that by a catalogue of 500,000 SKUs, a realistic scale for mid-tier Brazilian retailers operating out of the Vila Olímpia–Itaim Bibi corridor, and storage bloat can reach hundreds of terabytes before anyone notices.

What the Data Actually Shows

Deduplication is not a new technology. Hash-based matching, which compares cryptographic fingerprints of file contents, identifies byte-for-byte duplicates with near-perfect accuracy, and tools to do it have existed since at least the mid-2000s. The problem is implementation, not invention. A 2024 study by researchers at the Universidade de São Paulo's Instituto de Matemática e Estatística, published in the journal Computação Brasil, found that among 40 sampled São Paulo organisations running on-premises or hybrid storage environments, fewer than 18 percent had deployed any automated image deduplication protocol. The average organisation in the sample was storing 2.3 copies of every image file it held.
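For readers who want to see what hash-based matching looks like in practice, here is a minimal sketch using only the Python standard library. It is an illustration, not the method used in the study cited above: the function names `file_digest` and `find_exact_duplicates` are hypothetical, and the choice of SHA-256 is an assumption. Any strong cryptographic hash would serve the same purpose.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks
    so that large images never have to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical.

    Only groups with two or more members are returned. Files that look
    the same but differ in any byte (metadata, compression) are NOT
    matched by this approach.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [group for group in by_digest.values() if len(group) > 1]
```

Calling `find_exact_duplicates(Path("some/image/folder"))` returns one list per set of identical files. In a real audit, a team would typically compare file sizes first and hash only the files whose sizes collide, which avoids reading every byte of every file.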

For perceptual duplicates, images that look identical but differ slightly in metadata, compression or colour profile, the detection problem is harder and the prevalence higher. Exact hashing of file bytes misses them entirely, because any change to the file, even in metadata alone, produces a different fingerprint. Newer approaches using convolutional neural network embeddings can catch them, but deployment rates among São Paulo organisations remain low, largely because the tooling requires machine learning infrastructure that most municipal departments and small startups lack in-house.
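To make the idea of a perceptual duplicate concrete, the sketch below uses a much simpler technique than the neural network embeddings described above: an average hash, which records whether each pixel is brighter or darker than the image's mean. It is a toy under stated assumptions. The images are assumed to be already decoded and shrunk to small grayscale grids, the three sample grids are invented for illustration, and the function names and the `max_distance` threshold are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack a small grayscale grid into an integer, one bit per pixel.

    A bit is 1 when the pixel is brighter than the grid's mean, so small
    brightness shifts from recompression usually leave the bits unchanged.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: list[list[int]], b: list[list[int]],
                      max_distance: int = 2) -> bool:
    """Treat two grids as near-duplicates when their hashes nearly match.
    The default threshold is an arbitrary value chosen for this example."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Invented 4x4 sample grids (0 = black, 255 = white).
ORIGINAL = [[10, 200, 10, 200], [200, 10, 200, 10],
            [10, 200, 10, 200], [200, 10, 200, 10]]
# Same picture with small per-pixel noise, as recompression might produce.
RECOMPRESSED = [[12, 198, 9, 203], [197, 13, 201, 8],
                [11, 202, 12, 199], [203, 9, 198, 12]]
# A different picture with the same overall brightness.
UNRELATED = [[200, 200, 10, 10], [200, 200, 10, 10],
             [10, 10, 200, 200], [10, 10, 200, 200]]
```

Here `is_near_duplicate(ORIGINAL, RECOMPRESSED)` is true even though no pixel value matches exactly, while `is_near_duplicate(ORIGINAL, UNRELATED)` is false. An exact file hash would treat all three as unrelated. Average hashing is easily fooled by crops and heavy edits, which is part of why embedding-based methods exist.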

The cost arithmetic is straightforward. If an organisation is storing 100 terabytes of images and 35 percent are duplicates, it is paying for 35 terabytes it does not need. At R$250 per terabyte per month — a median market rate in São Paulo as of the first quarter of 2026 — that is R$8,750 wasted monthly, or R$105,000 annually, before factoring in backup replication costs that typically multiply storage expenses by a factor of three.
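The same arithmetic can be written as a short calculation that organisations can rerun with their own figures. This is a sketch: the function name `duplicate_storage_cost` and its parameters are hypothetical, and applying the backup factor as a simple multiplier on the wasted spend is an assumption about how replication is billed.

```python
def duplicate_storage_cost(total_tb: float, duplicate_share: float,
                           price_per_tb_month: float,
                           backup_multiplier: float = 1.0) -> dict[str, float]:
    """Estimate what duplicate files cost in storage fees.

    duplicate_share is the fraction of stored data that is redundant
    (0.35 for 35 percent). backup_multiplier scales the result when
    replicated backups are billed as well (assumed to be a plain factor).
    """
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    wasted_tb = total_tb * duplicate_share
    monthly = wasted_tb * price_per_tb_month * backup_multiplier
    return {
        "wasted_tb": wasted_tb,
        "monthly_brl": monthly,
        "annual_brl": monthly * 12,
    }


# The article's example: 100 TB, 35 percent duplicates, R$250 per TB per month.
article_case = duplicate_storage_cost(100, 0.35, 250)
# The same case if backups triple the bill for the redundant data.
with_backups = duplicate_storage_cost(100, 0.35, 250, backup_multiplier=3)
```

`article_case` reproduces the R$8,750 monthly and R$105,000 annual figures above. Under the assumed factor of three, `with_backups` comes to R$26,250 a month, or R$315,000 a year.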

Organisations looking to act now have practical entry points. For smaller teams, open-source tools such as dupeGuru and the Python library ImageHash are accessible places to start. Larger operations should prioritise a storage audit before any cloud migration contract renewal, which is particularly relevant for any entity bidding on or responding to Prefeitura de São Paulo procurement rounds scheduled for the second half of 2026. Running deduplication before migrating, rather than after, is where the savings are largest and the window is narrowest.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
