The Daily São Paulo

São Paulo news, every day


São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Are Staggering

From city hall servers to the tech startups of Vila Olímpia, the hidden cost of redundant image files is bleeding budgets and slowing infrastructure across Brazil's largest city.

By São Paulo News Desk · Published 4 July 2026, 3:45 pm


Photo: Denilson Santos de Oliveira on Pexels

São Paulo's municipal digital infrastructure is carrying a weight that most residents never see. According to internal assessments circulated within the Secretaria Municipal de Inovação e Tecnologia in the first quarter of 2026, duplicate image files account for an estimated 34 percent of total unstructured data stored across city government servers, a figure that translates directly into wasted expenditure on the cloud storage contracts that the Prefeitura do Município de São Paulo renews annually.

The problem sits at the intersection of rapid digitisation and poor data governance. Over the past five years, city departments accelerated the scanning of physical records, the upload of event photography, and the archiving of urban monitoring footage from the 1,200-plus cameras operated along corridors including Avenida Paulista and the Marginal Tietê. Nobody built a systematic deduplication protocol into those workflows. The result is servers filled with near-identical images stored under different filenames, in different folders, sometimes in different departments entirely.

What the Data Actually Shows

The scale of the problem becomes clearer when you move from government systems into the private sector. A 2025 benchmark report published by the Brazilian Association of Software Companies, Abes, found that mid-sized technology companies in Greater São Paulo were allocating between 18 and 22 percent of their cloud storage budgets to duplicate or near-duplicate files, a category dominated by image assets. For a startup burning R$80,000 per month on AWS or Google Cloud infrastructure, applying that share to the full bill would put avoidable costs at roughly R$14,400 to R$17,600 per month.
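The arithmetic behind that estimate is easy to reproduce. The sketch below assumes the 18 to 22 percent share applies to the entire R$80,000 monthly bill, which is a simplification: the Abes figure refers to storage budgets, not total cloud spend. The function name `avoidable_cost` is illustrative.

```python
def avoidable_cost(monthly_bill: float, duplicate_share: float) -> float:
    """Return the part of a monthly bill attributable to duplicate files."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    return monthly_bill * duplicate_share


monthly_bill = 80_000  # R$ per month, the figure quoted in the article

# Assumption: the benchmark range is applied to the whole bill.
low = avoidable_cost(monthly_bill, 0.18)
high = avoidable_cost(monthly_bill, 0.22)

print(f"R${low:,.0f} to R${high:,.0f} per month")
# prints "R$14,400 to R$17,600 per month"
```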

The tech cluster concentrated in Vila Olímpia and Itaim Bibi, home to regional offices of companies including Totvs and several of the city's listed unicorns, has begun treating this as a balance-sheet issue rather than a technical inconvenience. Storage costs in Brazil carry an additional layer of complexity because data sovereignty rules under the Lei Geral de Proteção de Dados, in force since September 2020, push many organisations toward domestic data centres, which charge a premium compared to international equivalents. Locaweb and UOL Host, two São Paulo-based providers with infrastructure in the Alphaville business district in Barueri, price enterprise storage tiers roughly 40 percent higher than comparable AWS São Paulo region rates, according to publicly listed pricing as of June 2026.

At the municipal level, the Arquivo Histórico de São Paulo on Rua Voluntários da Pátria in Santana digitised more than 400,000 documents between 2021 and 2025 as part of the city's Memória Paulistana programme. Sources familiar with the project, who were not named, have described the absence of automated deduplication as one of the programme's unresolved technical gaps. The practical consequence is that storage allocation for the archive has grown faster than the volume of unique content would justify.

The Fix Exists — The Will to Deploy It Is the Question

Deduplication tools are not new. Perceptual hashing algorithms — software that generates a fingerprint for each image and flags near-identical copies even when file names or formats differ — have been commercially available since the early 2010s. Open-source libraries widely used by development teams in São Paulo's software ecosystem, including those gathering at events like Campus Party Brasil at Expo Center Norte in Santana, can reduce duplicate image libraries by 25 to 60 percent in a single automated pass, depending on how aggressively the similarity threshold is set.
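To make the idea of a fingerprint and a similarity threshold concrete, here is a minimal sketch of one simple perceptual hash, the average hash. It is not taken from any particular library mentioned above. It assumes each image has already been converted to a small grayscale grid of equal size (8x8 here); a real tool would do that decoding and downscaling with an imaging library. The names `average_hash`, `hamming_distance` and `is_near_duplicate`, and the threshold of 5 bits, are illustrative choices.

```python
from typing import List

# Assumption: images arrive as small grayscale grids (values 0-255),
# already decoded and downscaled to the same dimensions.
Grid = List[List[int]]


def average_hash(pixels: Grid) -> int:
    """Build a bit fingerprint: one bit per pixel, 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Flag two images as near-identical if their fingerprints differ by
    at most `threshold` bits. A higher threshold matches more aggressively."""
    return hamming_distance(average_hash(a), average_hash(b)) <= threshold


# Illustrative data: a gradient, a slightly brighter copy, and its inverse.
gradient = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[value + 3 for value in row] for row in gradient]
inverted = [[252 - value for value in row] for row in gradient]

print(is_near_duplicate(gradient, brighter))  # True
print(is_near_duplicate(gradient, inverted))  # False
```

Raising the threshold catches more re-encoded or lightly edited copies at the cost of more false matches, which is the trade-off behind the wide 25 to 60 percent range.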

The challenge is integration. Legacy systems inside city departments and older enterprise platforms were not designed to run deduplication at ingestion — meaning files land in storage unscreened. Retrofitting that logic into existing pipelines requires developer hours that compete with higher-visibility projects.
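As a rough illustration of what screening at ingestion means, the sketch below checks each incoming file against the content hashes of files already stored and keeps only the first copy. It is a hypothetical in-memory example, not a description of any city system: the class name `IngestionGate` is invented, and it catches only byte-identical files. Near-duplicates would need a perceptual fingerprint instead of SHA-256.

```python
import hashlib
from typing import Dict


class IngestionGate:
    """Hypothetical in-memory store that refuses byte-identical uploads."""

    def __init__(self) -> None:
        # Maps SHA-256 digest of the content to the first name stored.
        self._by_digest: Dict[str, str] = {}

    def ingest(self, name: str, data: bytes) -> str:
        """Store `data` unless identical content exists.

        Returns the name under which the content is held, which is the
        earlier file's name when the upload is a duplicate.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing  # duplicate: reuse the copy already stored
        self._by_digest[digest] = name
        return name

    def stored_count(self) -> int:
        """Number of unique files held."""
        return len(self._by_digest)


gate = IngestionGate()
first = gate.ingest("event_001.jpg", b"same image bytes")
second = gate.ingest("copy_of_event_001.jpg", b"same image bytes")
third = gate.ingest("event_002.jpg", b"different image bytes")

print(first, second, third, gate.stored_count())
# prints "event_001.jpg event_001.jpg event_002.jpg 2"
```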

For organisations in São Paulo looking to address this now, the practical entry point is an audit rather than an immediate overhaul. Running a read-only deduplication scan, available through tools like dupeGuru or enterprise options from vendors present at the Distrito Faria Lima tech hub, produces a concrete number: the volume of redundant data and its associated monthly cost. That number, more than any governance argument, tends to move budget decisions. In a city where budgets are under sustained pressure and every secretaria is being asked to demonstrate digital efficiency, the duplicate image problem is finally becoming a finance conversation, not just a technical one.
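A read-only audit of this kind can be sketched with the Python standard library alone. The example below is not how dupeGuru works internally; it is a simplified stand-in that groups byte-identical files by SHA-256, totals the redundant bytes, and converts them to a monthly cost. The function names and the price per gigabyte are hypothetical, and the demo runs against a temporary folder of made-up files.

```python
import hashlib
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_duplicates(root: str) -> Dict[str, List[str]]:
    """Walk `root` read-only and return groups of byte-identical files."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            path = os.path.join(folder, filename)
            groups[sha256_of_file(path)].append(path)
    return {d: sorted(p) for d, p in groups.items() if len(p) > 1}


def redundant_bytes(groups: Dict[str, List[str]]) -> int:
    """Bytes that would be freed by keeping one copy per group."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in groups.values()
    )


def monthly_cost(redundant: int, price_per_gb: float) -> float:
    """Convert redundant bytes to a monthly cost (1 GB = 10**9 bytes)."""
    return redundant / 1e9 * price_per_gb


# Demo on throwaway files; names and sizes are invented for illustration.
with tempfile.TemporaryDirectory() as root:
    samples = [
        ("a.jpg", b"x" * 1000),
        ("copy_of_a.jpg", b"x" * 1000),
        ("b.jpg", b"y" * 500),
    ]
    for name, payload in samples:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(payload)
    groups = audit_duplicates(root)
    wasted = redundant_bytes(groups)

print(len(groups), wasted)  # prints "1 1000"
```

Because the scan only reads files, it can be run against production storage without changing anything, and the resulting figure can be priced with whatever per-gigabyte rate the organisation actually pays.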


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
