The Daily São Paulo

São Paulo news, every day

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damaging Story

Across city government databases and the booming tech sector of Faria Lima, redundant image files are quietly eating storage budgets and slowing down public services.

By São Paulo News Desk · Published 4 July 2026, 3:51 pm

Photo: fabianoshow4 on Pexels

São Paulo's municipal digital infrastructure is carrying tens of millions of duplicate image files across its systems, a problem that researchers and IT procurement records indicate is costing the city's agencies real money every budget cycle. The issue spans everything from the Prefeitura de São Paulo's urban planning photo archives in the Centro Administrativo on Viaduto do Chá to the health secretariat's patient-record imaging systems spread across the city's 469 basic health units, known as UBSs.

The timing matters. The Ricardo Nunes administration has pushed hard in 2025 and 2026 to digitise municipal services under the SP156 platform, the city's unified citizen-services app. That digitisation drive has generated enormous volumes of photographic evidence — flooding documentation, infrastructure inspection records, permit applications — uploaded repeatedly by different departments without a unified deduplication protocol. The result is a bureaucratic version of digital hoarding, and the storage bills reflect it.

What the Data Actually Shows

A 2025 audit by the Tribunal de Contas do Município de São Paulo — the city's own fiscal oversight body — flagged that data storage costs across municipal secretariats had risen by roughly 34 percent between 2022 and 2025, outpacing both inflation and the actual expansion of city services over the same period. While the TCM report did not isolate duplicate images as the sole driver, IT specialists contracted by the Secretaria Municipal de Inovação e Tecnologia pointed to uncontrolled file replication as a primary factor in unplanned storage growth.

The scale is not unique to government. Along Avenida Brigadeiro Faria Lima, where São Paulo's tech unicorn cluster is concentrated, a 2024 survey by the Associação Brasileira de Startups found that Brazilian companies on average waste between 18 and 22 percent of their cloud storage budgets on duplicate or near-duplicate files, with image assets being the single largest category of redundant data. For a mid-sized e-commerce operation running product catalogues — and dozens of such companies operate from offices in the Itaim Bibi and Vila Olímpia neighbourhoods — that translates into tens of thousands of reais in unnecessary monthly cloud fees.

Cloud storage pricing in Brazil adds a local dimension. Major providers charge Brazilian corporate clients between R$0.08 and R$0.23 per gigabyte per month depending on tier and contract, according to publicly listed pricing as of June 2026. A database carrying 500,000 duplicate images averaging 3 MB each consumes roughly 1.5 terabytes of redundant space — a cost of between R$120 and R$345 every single month, for files that serve no purpose.
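The arithmetic behind that range can be checked in a few lines. The sketch below assumes decimal units (1 GB = 1,000 MB), which is how the 1.5-terabyte figure works out; the function name is illustrative and not drawn from any provider's billing tools.

```python
def redundant_storage_cost(num_files, avg_size_mb, price_per_gb_month):
    """Monthly cost, in reais, of keeping num_files redundant copies."""
    # Assumption: decimal units, 1 GB = 1,000 MB, matching the article's rounding.
    redundant_gb = num_files * avg_size_mb / 1000
    return redundant_gb * price_per_gb_month


# The article's scenario: 500,000 duplicates at 3 MB each,
# priced at the low and high ends of the quoted range.
low = redundant_storage_cost(500_000, 3, 0.08)
high = redundant_storage_cost(500_000, 3, 0.23)
print(f"R${low:.2f} to R${high:.2f} per month")
# prints "R$120.00 to R$345.00 per month"
```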

Local Programmes Trying to Close the Gap

Two initiatives in São Paulo are attempting to address the problem systematically. The Instituto de Pesquisas Tecnológicas, based on the Cidade Universitária campus on the west side of the city, has since late 2024 been developing a hashing-based deduplication tool designed specifically for Portuguese-language public sector deployments. The project, partially funded through a FAPESP grant, aims to produce an open-source solution that municipal governments across Brazil could adopt without licensing costs.
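The internals of the IPT tool have not been published, so the following is not its method. It is a minimal sketch of the general idea behind hashing-based deduplication, assuming a local folder of files: compute a SHA-256 digest of each file's bytes and group the files whose digests match. The function names are illustrative.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Group files under root that are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

An approach like this only catches exact copies. A photo that has been resized or saved again at a different quality produces a different digest and slips through.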

Separately, the Centro de Tecnologia da Informação Renato Archer, a federal research unit with a São Paulo liaison office near Consolação, has been piloting AI-assisted image fingerprinting in partnership with three state-level secretariats. Early internal results, presented at a São Paulo tech forum in May 2026, suggested the tool could flag duplicates with accuracy above 91 percent even when files had been slightly resized or recompressed, the most common way duplicates evade simple checksum tools.
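The pilot's fingerprinting method has not been described publicly. To illustrate why a fingerprint can survive resizing where a checksum cannot, here is a sketch of a much simpler, well-known technique, the average hash: shrink the image to an 8-by-8 grid, mark each cell as brighter or darker than the grid's mean, and compare the resulting 64-bit patterns. It assumes images are already decoded into rows of grayscale values; the function names and the distance threshold of 5 are illustrative choices.

```python
def shrink(pixels, size=8):
    """Downsample a 2D grayscale grid to size x size by block averaging.

    Assumes the grid is at least size pixels in each dimension.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        y0, y1 = row * height // size, (row + 1) * height // size
        out_row = []
        for col in range(size):
            x0, x1 = col * width // size, (col + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            out_row.append(sum(block) / len(block))
        small.append(out_row)
    return small


def average_hash(pixels, size=8):
    """Return an integer whose bits mark cells brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a, b, max_distance=5):
    """Treat two images as near-duplicates if their hashes almost match."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the hash depends on the overall pattern of light and dark rather than on exact bytes, a scaled-down copy of a photo tends to land on the same or a nearly identical bit pattern.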

For private businesses and public administrators watching their storage invoices climb, the practical path forward involves three concrete steps: conducting a baseline audit with open-source tools such as dupeGuru or rmlint before migrating any archive to the cloud; implementing mandatory deduplication checks at the point of file upload rather than retrospectively; and establishing a named data steward in each department responsible for image asset governance. None of this requires new legislation or large capital expenditure. It requires process discipline, which São Paulo's digital reform agenda has so far struggled to enforce consistently across its 32 sub-prefectures and dozens of secretariats.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
