The Daily São Paulo

São Paulo's Duplicate Image Problem: The Numbers That Are Costing the City Millions

From municipal databases to media archives, redundant digital images are quietly draining public resources — and the data shows the problem is bigger than most people realise.

By São Paulo News Desk · Published 4 July 2026, 4:06 pm

Photo: Caroline Cagnin on Pexels

São Paulo's city government is sitting on a digital storage problem it can barely measure. Across municipal platforms — from the Secretaria Municipal de Gestão's internal document systems to the public-facing portals maintained by SP Negócios — duplicate images now account for an estimated 30 to 40 percent of total stored files, according to information technology professionals working on São Paulo's ongoing digital transformation programme. The result: taxpayer money spent storing the same photograph, diagram or scanned form dozens of times over.

The timing matters. City Hall under Mayor Ricardo Nunes has been pushing a broader modernisation agenda, including the São Paulo Digital initiative, which aims to migrate legacy paper records into unified cloud infrastructure before the end of 2026. But migrating bloated, uncleaned databases amplifies costs rather than cutting them. Every duplicated file that travels into a new system drags its redundant twins along with it.

What the Data Actually Shows

Digital storage in enterprise environments is not cheap, even at scale. In Brazil, corporate cloud storage contracts with providers operating out of data centres in Tamboré (Barueri) or along the Rodovia Castelo Branco corridor typically run between R$0.08 and R$0.25 per gigabyte per month, depending on redundancy and compliance tiers. Municipal contracts, subject to public procurement rules under Lei 8.666/1993 and its successor, Lei 14.133/2021, often lock governments into multi-year agreements that make renegotiation difficult once storage volumes balloon.

The costs compound quickly. If a single municipal secretariat stores 500 terabytes of image data and 35 percent of that is duplicated, roughly 175 terabytes (175,000 gigabytes) of paid storage is wasted. At even the lower end of market pricing, R$0.08 per gigabyte per month, that translates to approximately R$14,000 per month in unnecessary expenditure for one department alone. São Paulo operates more than 20 secretariats, each with its own digital infrastructure stack.
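The arithmetic above can be reproduced in a few lines. The following is a minimal sketch, not an official costing model: the function name wasted_storage_cost is hypothetical, decimal units (1 TB = 1,000 GB) are assumed, and the upper figure simply applies the R$0.25 top of the price range quoted earlier to the same 500 TB example.

```python
def wasted_storage_cost(total_tb: float, duplicate_share: float,
                        price_per_gb: float) -> float:
    """Monthly cost in R$ of storing duplicated data.

    Assumes decimal units (1 TB = 1,000 GB) and a flat monthly
    price per gigabyte; both are simplifying assumptions.
    """
    wasted_gb = total_tb * 1000 * duplicate_share
    return wasted_gb * price_per_gb


# The article's example: 500 TB of images, 35 percent duplicated.
low = wasted_storage_cost(500, 0.35, 0.08)   # lower end of quoted pricing
high = wasted_storage_cost(500, 0.35, 0.25)  # upper end of quoted pricing

print(f"R${low:,.0f} to R${high:,.0f} per month")
# prints: R$14,000 to R$43,750 per month
```

Swapping in a department's own storage volume and contract price gives a rough first estimate of what a deduplication audit could recover.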

The problem is not unique to government. The Museu de Arte de São Paulo Assis Chateaubriand (MASP), on Avenida Paulista, has been digitising its collection of more than 11,000 works since 2019. Cultural institutions worldwide report that 20 to 60 percent of digitisation output results in duplicate or near-duplicate files when quality control workflows are absent, according to standards published by the International Federation of Library Associations and Institutions (IFLA). MASP's digitisation team uses deduplication software as a standard step in its pipeline, a practice that municipal IT offices have been slower to adopt.

Where the Fixes Are Coming From

Deduplication, the automated process of identifying and removing redundant image files, is not a new technology. Open-source tools and commercial products from vendors present in Brazil's market can cut stored image volumes by 25 to 60 percent in a single processing run, depending on how chaotic the original archive is. The Instituto de Pesquisas Tecnológicas (IPT), based at Cidade Universitária on the west side of the city, has been consulted on data quality standards for at least two municipal contracts in recent years, though specific financial terms were not available at publication time.

For organisations outside government, the business case is more straightforward. E-commerce platforms operating out of Vila Olímpia and Berrini — where much of São Paulo's tech unicorn ecosystem is concentrated — deal with product image libraries running into tens of millions of files. Retailers that fail to deduplicate before platform migrations routinely report post-migration performance degradation and inflated CDN bandwidth bills.

The practical advice for any organisation — public or private — confronting this problem follows a clear sequence. First, audit before migrating: running a hash-based duplicate detection scan on an existing archive costs a fraction of what redundant cloud storage costs over 12 months. Second, establish naming and metadata conventions before new images enter any system, since retroactive cleaning is always more expensive than prevention. Third, for São Paulo's municipal departments specifically, the window before the São Paulo Digital migration closes is narrow. Technology officers who have not yet commissioned a deduplication audit of their image libraries should treat the second half of 2026 as the last practical moment to act before legacy problems become permanent features of new infrastructure.
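For readers wondering what the hash-based scan in the first step looks like in practice, here is a minimal sketch using only Python's standard library. It is an illustration, not the tooling any named institution uses: the function names and the list of image extensions are assumptions, and it only finds byte-identical files. Resized or re-encoded near-duplicates need perceptual comparison, which this does not attempt.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to the archive being audited.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"}


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large scans stay memory-light."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root: Path) -> list[list[Path]]:
    """Return groups of byte-identical image files found under root."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) > 1:
            for path in same_size:
                by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def wasted_bytes(groups: list[list[Path]]) -> int:
    """Bytes that would be freed by keeping one copy from each group."""
    return sum(group[0].stat().st_size * (len(group) - 1) for group in groups)
```

Calling find_duplicate_images(Path("/path/to/archive")) on a hypothetical archive directory returns the duplicate groups, and wasted_bytes turns them into a storage figure that can be priced against the contract rate. The sketch only reports duplicates; deciding which copy to keep, and updating any records that point at the others, is a separate step.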

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
