The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damning Story

Municipal databases, newsrooms, and e-commerce platforms across the city are burning storage budgets and slowing workflows because of a problem that algorithms can already solve.

By São Paulo News Desk · Published 4 July 2026, 4:06 pm

Photo: Rafael Silva on Pexels

São Paulo's public digital infrastructure is carrying millions of redundant image files. A 2025 audit by the Secretaria Municipal de Gestão (SMG), covering city hall's internal document management systems across all 32 subprefeituras and their 96 districts, found that duplicate or near-duplicate image files accounted for roughly 34 percent of total storage consumption. According to the SMG's budget disclosure published in March 2026, that share translated into approximately R$2.3 million per year in unnecessary spending on server contracts.

The problem is not unique to government. But São Paulo's scale makes it unusually expensive and unusually visible. The city sits at the center of Brazil's tech economy, hosting more than 1,200 active startups registered with the Cubo Itaú hub in Itaim Bibi alone. E-commerce operations, media companies clustered around Avenida Paulista, and the municipal bureaucracy all share the same structural weakness: content pipelines that ingest images faster than they can be deduplicated.

Why the Numbers Got This Bad

Duplicate image replacement, the automated process of identifying and substituting redundant visual assets, sounds like a maintenance task. Skipping it is a cost driver. Each time a photographer files an assignment from Parque Ibirapuera or a city engineer uploads site documentation from a drainage project in Itaquera, standard content management systems often store every uploaded version independently. A single photo resized for web, tablet, and print generates three files. If two departments each upload that set, the total becomes six. Across a municipal system processing tens of thousands of documents monthly, the arithmetic turns punishing.
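Content-addressed storage is one common way to stop that six-file pileup at the point of upload instead of cleaning it up later. The sketch below is hypothetical: the class name, department folders, and placeholder byte strings are illustrative and not drawn from any São Paulo system. It keys each upload by the SHA-256 digest of its bytes, so a repeat upload of identical content adds a new name but no new copy.

```python
import hashlib


class ContentAddressedStore:
    """Hypothetical ingestion layer that stores each unique file body once.

    Uploads are keyed by the SHA-256 digest of their bytes, so a repeat
    upload of identical content records a new name but no new copy.
    """

    def __init__(self):
        self.blobs = {}  # digest -> bytes actually kept on disk
        self.names = {}  # uploaded path -> digest it points to

    def upload(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.names[name] = digest
        return digest

    def stored_bytes(self):
        return sum(len(body) for body in self.blobs.values())


# One photo exported at three sizes (placeholder bytes, not real JPEGs)...
renditions = {"web": b"W" * 200, "tablet": b"T" * 500, "print": b"P" * 2000}

store = ContentAddressedStore()
# ...uploaded separately by two hypothetical departments: six uploads.
for dept in ("obras", "comunicacao"):
    for size, data in renditions.items():
        store.upload(f"{dept}/ibirapuera_{size}.jpg", data)

print(len(store.names))      # 6 names
print(len(store.blobs))      # 3 stored files
print(store.stored_bytes())  # 2700
```

Exact hashing collapses the two departments' copies, turning six uploads into three stored files, but it cannot merge the three renditions with each other, because resizing changes the bytes. Catching those requires a perceptual hash.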

Google's research on large-scale image deduplication, published through its engineering blog in 2023, put the share of near-duplicate images in typical enterprise visual databases at between 20 and 40 percent. São Paulo's municipal audit number sits squarely inside that range. The difference is that São Paulo's government was still paying full storage rates for every duplicate copy until the SMG flagged the issue in last year's internal review.

On the private side, Brazilian e-commerce platforms have moved faster. Mercado Livre, headquartered in Buenos Aires but with its largest engineering office on Rua Fidêncio Ramos in Vila Olímpia, São Paulo, began rolling out perceptual hashing across its product catalog systems in 2024. The technique fingerprints what an image looks like rather than its exact bytes, so it can flag visually identical or near-identical images even after recompression, resizing, or minor cropping. The company has not published specific deduplication savings figures. But the adoption signals where the market is heading.
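Mercado Livre has not said which perceptual hash it uses. To illustrate the general technique, here is a minimal sketch of one common variant, the difference hash (dHash), assuming images have already been decoded into grayscale grids of 0 to 255 values. In practice a library such as Pillow would handle decoding, and the open-source imagehash package ships ready-made `dhash` and `phash` functions; the pure-Python version below only shows the idea, using synthetic test images.

```python
def box_resize(pixels, width, height):
    """Shrink a 2-D grid of grayscale values by averaging rectangular blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max((y + 1) * src_h // height, y0 + 1)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max((x + 1) * src_w // width, x0 + 1)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: shrink to (hash_size + 1) x hash_size, then record
    whether each pixel is brighter than its right-hand neighbour."""
    small = box_resize(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances suggest near-duplicates."""
    return bin(a ^ b).count("1")


# Synthetic 144 x 128 test "photo" made of flat 16 x 16 tiles.
original = [[((x // 16) * 37 + (y // 16) * 91) % 256 for x in range(144)]
            for y in range(128)]
# A half-size copy, like a web rendition a CMS might generate.
thumbnail = box_resize(original, 72, 64)
# A photographic negative: same layout, visually different content.
negative = [[255 - v for v in row] for row in original]

h = dhash(original)
print(hamming(h, dhash(thumbnail)))  # 0: flagged as a near-duplicate
print(hamming(h, dhash(negative)))   # 64: clearly a different image
```

The half-size copy hashes identically even though it shares almost no bytes with the original, while the negative differs in all 64 bits. Real systems compare the distance against a small threshold tuned for each collection rather than demanding an exact match.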

What São Paulo Is — and Isn't — Doing About It

The prefeitura's current response is a pilot program called Gestão Digital de Ativos, or GDA, being tested inside the Secretaria de Infraestrutura Urbana e Obras. The pilot covers roughly 800,000 files related to public works records, many connected to the city's chronic flooding and drainage infrastructure projects in the Zona Leste. The GDA system uses open-source deduplication libraries paired with a local cloud storage contract signed with a Brazilian provider in January 2026, valued at R$480,000 for 18 months.

The results after six months of operation, shared in a Câmara Municipal budget hearing in May 2026, showed storage consumption in the pilot department dropped by 28 percent. If that rate holds across the full municipal system, the SMG projects annual savings of around R$640,000 — not transformative, but real money in a city where the 2026 municipal budget for digital infrastructure sits at roughly R$310 million.
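The projection is easier to weigh with the arithmetic written out. The figures above do not state how the SMG derived it; one reading that reproduces the number, assumed here, applies the pilot's 28 percent reduction to the R$2.3 million in annual duplicate-storage spending from the March 2026 disclosure, then compares the result with the digital infrastructure budget:

$$0.28 \times \text{R\$}\,2{,}300{,}000 = \text{R\$}\,644{,}000 \approx \text{R\$}\,640{,}000$$

$$\frac{\text{R\$}\,640{,}000}{\text{R\$}\,310{,}000{,}000} \approx 0.0021 = 0.21\%$$

On that reading, the projected savings come to roughly a fifth of one percent of the digital infrastructure budget.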

For the private sector and for newsrooms covering the city from offices along Alameda Barão de Limeira and elsewhere in Campos Elíseos, the practical advice is the lesson the prefeitura learned the expensive way: audit first. Perceptual hashing tools are available in open-source Python libraries, carry no licensing fees, and can run on an existing server. The deduplication problem is not a technology gap. It is a workflow discipline problem. And in a city that generates as much visual data as São Paulo, discipline around image management is worth, by the SMG's own accounting, up to R$2.3 million a year in avoidable storage spending.
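The SMG has not published how its audit was run. A minimal, hypothetical first pass for any organization could walk a directory tree, group image files by size, hash only the sizes that repeat, and total the bytes held by redundant copies. This sketch finds byte-identical duplicates only; near-duplicates such as resized renditions would need a perceptual-hash pass on top. The folder and file names in the demo are invented.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(root, extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Group byte-identical images under `root` and count reclaimable bytes.

    Returns (groups, reclaimable): groups maps a digest to every path that
    shares it; reclaimable is the bytes held by all copies except one.
    """
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    groups, reclaimable = {}, 0
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact twin
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        for digest, same in by_hash.items():
            if len(same) > 1:
                groups[digest] = sorted(same)
                reclaimable += size * (len(same) - 1)
    return groups, reclaimable


# Demo on a throwaway directory with two hypothetical department folders.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "obras/itaquera_drenagem.jpg": b"A" * 4096,
        "comunicacao/itaquera_drenagem.jpg": b"A" * 4096,  # same bytes
        "obras/itaquera_vista.jpg": b"B" * 4096,           # same size only
    }
    for rel, data in samples.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
    groups, reclaimable = audit(root)
    for paths in groups.values():
        print(len(paths), "copies:", [os.path.relpath(p, root) for p in paths])
    print("Reclaimable bytes:", reclaimable)  # 4096
```

Pointed at a real archive, `audit` returns the same two values. The size pre-check means files with a unique size are never read at all, which keeps a first scan cheap on a large share.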

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
