The Daily São Paulo

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Reveal Why

A growing crisis in municipal and corporate data storage is costing São Paulo institutions millions of reais and slowing down the city's push toward digital efficiency.

By São Paulo News Desk · Published 4 July 2026, 4:12 pm

São Paulo's public institutions and private companies are sitting on a problem they largely refuse to quantify out loud: duplicate image files are consuming storage infrastructure at a scale that is now measurable, costly, and accelerating. Across municipal data centres and the city's sprawling tech ecosystem, redundant image data — the same photograph, scan, or graphic stored two, five, sometimes dozens of times across different servers — is eating into budgets that administrators would rather spend elsewhere.

The timing matters. Mayor Ricardo Nunes has pushed a digitisation agenda for city services throughout 2025 and into 2026, migrating health records, permit applications, and urban planning documents onto centralised platforms. That migration has created enormous image libraries — scanned IDs, satellite photos used by SP Urbanismo, drone footage from Defesa Civil flood-monitoring operations — and with them, a duplication problem that storage specialists say was predictable but largely unplanned for.

The Numbers Are Blunt

Industry benchmarks from data management firms operating in Brazil suggest that between 25 and 40 percent of enterprise image storage in large Latin American cities consists of duplicate or near-duplicate files. São Paulo's municipal technology secretariat says the city manages more than 180 terabytes of digitised public records as of early 2026; applied to that figure, the range translates to between 45 and 72 terabytes of potentially redundant data. Cloud and co-location storage in Brazil's southeast region currently runs at roughly R$0.35 to R$0.60 per gigabyte per month depending on contract tier. At those rates, storing the duplicates alone in São Paulo's municipal systems would cost on the order of R$190,000 to R$520,000 a year, though the city has not published a precise breakdown.
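The arithmetic behind that estimate can be checked directly. The sketch below uses only the figures quoted in this article; the decimal conversion (1 TB = 1,000 GB) and the function name are assumptions made for illustration, not a published municipal methodology.

```python
# Figures quoted in the article.
TOTAL_TB = 180                      # digitised public records, early 2026
DUP_SHARE = (0.25, 0.40)            # benchmark share of duplicate images
PRICE_PER_GB_MONTH = (0.35, 0.60)   # reais per gigabyte per month
GB_PER_TB = 1000                    # assumption: decimal units


def annual_duplicate_cost(total_tb, dup_share, price_per_gb_month):
    """Yearly cost in reais of storing the duplicated share of an archive."""
    duplicate_gb = total_tb * dup_share * GB_PER_TB
    return duplicate_gb * price_per_gb_month * 12


# Low end: smallest duplicate share at the cheapest rate.
low = annual_duplicate_cost(TOTAL_TB, DUP_SHARE[0], PRICE_PER_GB_MONTH[0])
# High end: largest duplicate share at the most expensive rate.
high = annual_duplicate_cost(TOTAL_TB, DUP_SHARE[1], PRICE_PER_GB_MONTH[1])

print(f"R${low:,.0f} to R${high:,.0f} per year")
# prints: R$189,000 to R$518,400 per year
```

Because the 180-terabyte figure is a floor ("more than"), the real bill would sit somewhat above this range.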

The private sector picture is sharper. The Cubo Itaú innovation hub in Itaim Bibi, which houses more than 300 startups, has become a testing ground for deduplication tools, partly because several resident companies in the health-tech and legal-tech verticals discovered their own image redundancy rates exceeded 50 percent after rapid scaling. Startups on Avenida Brigadeiro Faria Lima, where many of the city's fintech unicorns maintain operational offices, face similar pressures: regulatory requirements from the Banco Central and the CVM demand that document images be retained for defined periods, creating parallel archiving practices that multiply file counts fast.

Why Deduplication Is Harder Than It Looks

Deleting duplicates sounds simple. In practice, image deduplication requires cryptographic hashing to identify byte-identical files, perceptual hashing to catch near-duplicates that differ only in resolution or compression, and a governance framework to decide which copy is authoritative before the rest are removed. For public bodies like the Secretaria Municipal de Urbanismo e Licenciamento, which maintains image records tied to legal processes for sites ranging from Avenida Paulista to the city's industrial zones, deleting the wrong copy can create evidentiary gaps in active cases.
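The difference between the two kinds of hashing can be illustrated with a minimal sketch. It assumes each image has already been decoded and shrunk to a tiny grayscale grid (a real pipeline would use an imaging library for that step), and it uses a simplified "average hash" as a stand-in for perceptual hashing in general. The pixel grids and function names are hypothetical.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Exact fingerprint: changes completely if a single byte differs."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels):
    """Simplified perceptual hash: one bit per pixel, set when the pixel
    is at least as bright as the grid's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two perceptual hashes."""
    return bin(a ^ b).count("1")


# Hypothetical 4x4 grayscale thumbnails (values 0-255).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [11, 21, 201, 211],
]
# Same picture with small per-pixel noise, as recompression might introduce.
recompressed = [
    [p + ((r * 4 + c) % 5) - 2 for c, p in enumerate(row)]
    for r, row in enumerate(original)
]
# A different picture: bright and dark halves swapped.
different = [row[2:] + row[:2] for row in original]


def to_bytes(pixels):
    return bytes(p for row in pixels for p in row)


exact_match = sha256_digest(to_bytes(original)) == sha256_digest(to_bytes(recompressed))
near_distance = hamming_distance(average_hash(original), average_hash(recompressed))
far_distance = hamming_distance(average_hash(original), average_hash(different))

print(exact_match, near_distance, far_distance)
# prints: False 0 16
```

The exact hash treats the recompressed copy as a new file, while the perceptual hash places it at distance zero from the original. Where to set the distance threshold for "near-duplicate" is a policy decision, which is part of why the governance step matters.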

Several São Paulo-based data engineering consultancies have begun offering deduplication audits as a standalone service since late 2024, pricing initial assessments for mid-sized organisations at between R$40,000 and R$120,000 depending on data volume. Brasscom, the Brazilian association of information and communication technology companies, flagged data hygiene as a top-five operational cost driver for Brazilian enterprises in a report released in the first quarter of 2026.

For organisations that want to act before a formal audit, storage engineers recommend three immediate steps. First, implement SHA-256 or perceptual hashing on all new image ingestion pipelines so duplicates are flagged at the point of entry rather than discovered later. Second, establish a single source-of-truth repository (a master library) before running any deletion scripts. Third, set retention policies that comply with Brazilian data protection law, the LGPD, which has been in force since 2020 and governs how long certain categories of image data must be kept and by whom. Companies or agencies that delete records prematurely risk ANPD sanctions; those that never clean their archives pay the storage bill indefinitely. Neither outcome is free.
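Those three steps might be sketched as follows. This is an illustration under stated assumptions, not a recommended implementation: the class and function names are hypothetical, the first-seen copy is treated as authoritative purely for simplicity, and the retention period is a placeholder rather than a figure taken from the LGPD or any regulator.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class MasterLibrary:
    """Hypothetical source-of-truth index: one authoritative copy per digest."""
    canonical: dict = field(default_factory=dict)    # digest -> authoritative name
    duplicates: list = field(default_factory=list)   # (duplicate, authoritative)

    def ingest(self, name: str, data: bytes) -> bool:
        """Register a file at the point of entry.
        Returns True if it is new, False if it was flagged as a duplicate."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.canonical:
            # Flag the duplicate; nothing is deleted at this stage.
            self.duplicates.append((name, self.canonical[digest]))
            return False
        # Assumption: the first copy seen becomes the authoritative one.
        self.canonical[digest] = name
        return True


def may_delete(stored_on: date, retention_days: int, today: date) -> bool:
    """True only once the retention period has fully elapsed."""
    return today >= stored_on + timedelta(days=retention_days)


library = MasterLibrary()
first = library.ingest("permit_001.png", b"scan-bytes-A")
copy = library.ingest("permit_001_copy.png", b"scan-bytes-A")
other = library.ingest("permit_002.png", b"scan-bytes-B")

# Placeholder retention period of five years, chosen only for the example.
too_early = may_delete(date(2024, 1, 1), 5 * 365, today=date(2026, 7, 4))
```

Here `copy` comes back `False` and `library.duplicates` records which file it duplicates, while `too_early` is `False` because the placeholder retention window has not yet run out. Flagging and deleting are kept as separate steps so that no file is removed before both checks pass.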

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
