The Daily São Paulo

São Paulo's Digital Archive Crisis: The Numbers Behind Millions of Duplicate Images Clogging City Systems

From Prefeitura servers in Itaquera to startup data centres in Berrini, redundant image files are costing São Paulo's institutions real money and real storage capacity.

By São Paulo News Desk · Published 4 July 2026, 4:45 pm

Photo: Kaique Rocha on Pexels

São Paulo's public and private digital infrastructure is drowning in copies of itself. Across municipal databases, creative agencies along Avenida Brigadeiro Faria Lima, and the sprawling tech corridor of Vila Olímpia, duplicate image files have quietly metastasised into one of the city's least-discussed data management headaches — one that carries a concrete price tag and measurable operational drag.

The issue matters right now because cloud storage costs in Brazil rose sharply through 2025, driven by the real's fluctuation against the dollar and the global surge in demand for data centre capacity. Brazilian companies typically contract storage priced in U.S. dollars through providers such as Amazon Web Services and Google Cloud, meaning that every redundant JPEG, PNG or RAW file sitting undetected on a server represents a direct currency-exposure liability. For a city generating as much digital content as São Paulo — which hosts more than 400 active tech startups classified as scale-ups by the Associação Brasileira de Startups — that exposure compounds fast.

What the Data Actually Shows

Industry benchmarks from data governance consultancies operating in Latin America suggest that between 20 and 40 percent of unmanaged corporate image libraries consist of exact or near-duplicate files. Apply even the conservative end of that range to São Paulo's context and the scale becomes uncomfortable. The city's Secretaria Municipal de Comunicação, which maintains visual archives for public campaigns posted across Paulista Avenue digital panels and the city's official social channels, manages libraries that span years of mayoral administrations and multiple rebranding cycles. Without systematic deduplication protocols, those archives accumulate redundancy with every new upload cycle.

Storage pricing on major cloud platforms hovered around $0.023 per gigabyte per month for standard-tier object storage as of early 2026. A library of 10 terabytes carrying 30 percent duplication holds roughly 3 terabytes of redundant files, which works out to about $69 a month, or approximately $828 per year, for that one archive alone. Scale that across dozens of municipal departments and the figure becomes a legitimate budget line. The Instituto de Tecnologia e Sociedade do Rio de Janeiro published research in late 2024 estimating that Brazilian public-sector entities collectively waste hundreds of millions of reais annually on redundant digital assets, though São Paulo-specific breakdowns remain unpublished.
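The arithmetic behind that estimate can be reproduced in a few lines. The sketch below is illustrative only: it uses the article's example figures, assumes decimal terabytes (1 TB = 1,000 GB), and the function name and parameters are hypothetical rather than taken from any provider's billing tool.

```python
# Illustrative cost estimate using the example figures from the text.
# Assumptions: decimal terabytes and a flat per-gigabyte monthly price.
PRICE_PER_GB_MONTH_USD = 0.023
GB_PER_TB = 1000


def wasted_storage_cost_usd(library_tb, duplicate_share,
                            price_per_gb_month=PRICE_PER_GB_MONTH_USD,
                            months=12):
    """Return the cost in dollars of storing duplicate files for `months`."""
    wasted_gb = library_tb * GB_PER_TB * duplicate_share
    return wasted_gb * price_per_gb_month * months


annual = wasted_storage_cost_usd(library_tb=10, duplicate_share=0.30)
monthly = wasted_storage_cost_usd(library_tb=10, duplicate_share=0.30, months=1)
print(f"per month: ${monthly:.2f}")  # per month: $69.00
print(f"per year:  ${annual:.2f}")   # per year:  $828.00
```

Swapping in the conservative 20 percent or the upper 40 percent duplication rate gives a rough range for the same archive. Converting the result to reais would require an exchange rate, which is deliberately left out here.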

Private sector losses are harder to audit but no less real. Creative production houses clustered around Rua Funchal in Vila Olímpia, a street that has become shorthand for São Paulo's advertising and content industry, routinely manage campaign assets in the tens of thousands of files per client. Account managers and producers interviewed by data governance firms describe version-control failures as routine, with multiple near-identical crop variations of the same hero image proliferating simultaneously across Dropbox folders, internal servers, and client-shared drives.

Detection Tools and What Comes Next

The technical solutions exist and are not exotic. Perceptual hashing algorithms, which generate a fingerprint for each image based on visual content rather than file bytes or metadata, can identify near-duplicate images even when they have been resaved, recoloured or, within limits, slightly cropped. Tools using this approach have been commercially available since at least 2018, and several Brazilian SaaS companies, including ventures that have gone through the Cubo Itaú accelerator in Itaim Bibi, have built deduplication modules into broader digital asset management platforms targeting the domestic market.
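For readers curious how such a fingerprint works, here is a minimal sketch of one common perceptual hash, the difference hash (dHash). It is not the method used by any vendor mentioned above. To stay dependency-free it assumes the image has already been decoded into a grid of grayscale values from 0 to 255. In practice an imaging library would do that step, and the synthetic test images below are invented for illustration.

```python
def dhash(pixels, hash_size=8):
    """Difference hash of a grayscale image given as a list of pixel rows."""
    height, width = len(pixels), len(pixels[0])
    rows, cols = hash_size, hash_size + 1  # one extra column to compare pairs

    def block_mean(r, c):
        # Average the source pixels that fall into grid cell (r, c).
        y0 = r * height // rows
        y1 = max((r + 1) * height // rows, y0 + 1)
        x0 = c * width // cols
        x1 = max((c + 1) * width // cols, x0 + 1)
        total = sum(pixels[y][x] for y in range(y0, y1) for x in range(x0, x1))
        return total / ((y1 - y0) * (x1 - x0))

    # Shrink the image to a small grid, then record whether each cell is
    # brighter than its right-hand neighbour: one bit per comparison.
    small = [[block_mean(r, c) for c in range(cols)] for r in range(rows)]
    bits = 0
    for r in range(rows):
        for c in range(hash_size):
            bits = (bits << 1) | (1 if small[r][c] > small[r][c + 1] else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Synthetic 72x64 gradient, a uniformly brightened copy, and a mirrored copy.
original = [[2 * x + y for x in range(72)] for y in range(64)]
brighter = [[value + 20 for value in row] for row in original]
mirrored = [row[::-1] for row in original]

print(hamming(dhash(original), dhash(brighter)))  # 0: treated as a duplicate
print(hamming(dhash(original), dhash(mirrored)))  # 64: clearly a different image
```

A small Hamming distance between two hashes suggests a near-duplicate. The cut-off is a tuning decision for each library, and heavier crops or rotations generally defeat a simple hash like this one.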

The practical advice for São Paulo organisations is to run a baseline audit before the next budget cycle closes. Any institution managing more than 50 gigabytes of image assets should run an automated deduplication pass at least once a quarter. The Prefeitura de São Paulo's ongoing smart-city modernisation programme, which has allocated resources to upgrading municipal digital infrastructure under the current Nunes administration, represents a logical vehicle for embedding these protocols into standard procurement requirements for communication and IT contracts.
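A baseline audit of the kind described can start with exact duplicates, which are the cheapest to find. The sketch below groups files by a SHA-256 digest of their contents and totals the bytes held by surplus copies. The function names and the list of file extensions are assumptions chosen for illustration, and the approach catches only byte-identical files, not resaved or cropped variants.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: which extensions count as image assets for the audit.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_exact_duplicates(root):
    """Return (groups of identical files, bytes used by the surplus copies)."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)
    duplicates = {d: paths for d, paths in groups.items() if len(paths) > 1}
    # Every copy beyond the first in each group counts as wasted storage.
    wasted_bytes = sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
    return duplicates, wasted_bytes
```

Calling `audit_exact_duplicates("/path/to/archive")` on a hypothetical archive folder returns the duplicate groups and a byte count that can be fed into a storage cost estimate. The sketch only reports. It deletes nothing, so a person can review the groups before any file is removed.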

The cost of inaction is not abstract. Every month a duplicate image sits on a dollar-denominated cloud server, the bill in reais ticks slightly higher. For a city of 12.3 million people trying to stretch public budgets across flooding mitigation, transport upgrades, and social services, that is a saving worth engineering.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
