The Daily São Paulo

São Paulo news, every day

São Paulo's Duplicate Image Problem: The Numbers Exposing a Hidden Crisis in City Digital Archives

Municipal databases and creative agencies across the city are drowning in redundant visual files — and the storage bills are piling up fast.

By São Paulo News Desk · Published 4 July 2026, 3:47 pm

Photo: Sérgio Souza on Pexels

São Paulo's public and private digital infrastructure is carrying tens of millions of duplicate image files, an accumulation years in the making that is now costing organisations real money and slowing down the systems that run everything from traffic monitoring on the Marginal Pinheiros to health records at Hospital das Clínicas. Specialists who work with municipal data systems say the problem is measurable, structural and largely invisible to the people paying for it.

The issue has come into sharper focus in 2026 as the city's tech sector, concentrated around the Berrini Avenue corridor and the Vila Olímpia district, has pushed deeper into artificial intelligence tools that depend on clean, deduplicated image libraries. When training sets contain repeated files, model accuracy can drop and computing costs rise. That double penalty is one São Paulo's growing pool of AI startups can ill afford as they compete for Series B funding rounds now averaging R$85 million in the Brazilian market, according to figures published by the Brazilian Private Equity and Venture Capital Association (ABVCAP) in its March 2026 report.

What the Data Actually Shows

Duplication rates inside large unmanaged image repositories typically run between 30 and 60 percent of total file volume, according to methodology published by the International Press Telecommunications Council (IPTC), which sets global metadata standards for digital images. Apply that range to São Paulo's context and the figures become striking. The Secretaria Municipal de Inovação e Tecnologia, which oversees the city's open data portal at dados.prefeitura.sp.gov.br, hosts image-linked datasets covering urban infrastructure, public transport and environmental monitoring. If even the lower end of that duplication range applies, storage overhead runs into hundreds of gigabytes of redundant data on city-managed servers alone.

The private creative economy amplifies the problem. The Faria Lima financial district and the advertising agencies clustered around Rua Wisard in Vila Madalena together employ thousands of designers, content producers and social media managers who move image assets across platforms daily. A 2025 audit framework published by Brazil's Associação Brasileira de Agências de Publicidade (ABAP) identified asset redundancy as one of the top three operational inefficiencies reported by mid-size agencies, alongside licensing compliance and file versioning. The average mid-size São Paulo agency manages between 400,000 and 1.2 million active image files at any given time, by ABAP's own survey estimates released in November 2025.

Cloud storage is not free. Amazon Web Services S3 standard storage, widely used by São Paulo's tech companies, costs approximately R$0.115 per gigabyte per month as of mid-2026. For an organisation carrying 500 GB of duplicated images, a conservative figure for a medium-sized media or government operation, that represents roughly R$57.50 a month, or about R$690 a year, spent on files that add zero value. Multiply that across hundreds of city agencies, newsrooms and creative companies and the aggregate waste at that volume runs into the hundreds of thousands of reais annually, and climbs further for organisations holding multi-terabyte libraries.
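
The storage arithmetic can be checked directly from the quoted rate and volume. The short Python sketch below is illustrative only: the function and constant names are hypothetical, and it assumes a flat per-gigabyte price with no tiering, request or transfer charges.

```python
# Inputs taken from the figures quoted in the article.
# Assumption: a flat rate, with no tiered pricing or transfer fees.
PRICE_PER_GB_MONTH_BRL = 0.115  # approximate S3 standard rate cited above
DUPLICATE_GB = 500              # duplicated volume used in the example


def monthly_cost(duplicate_gb: float, price_per_gb: float) -> float:
    """Monthly storage cost of the duplicated data, in reais."""
    return duplicate_gb * price_per_gb


def annual_cost(duplicate_gb: float, price_per_gb: float) -> float:
    """Twelve months of the same charge, assuming the price does not change."""
    return 12 * monthly_cost(duplicate_gb, price_per_gb)


per_month = monthly_cost(DUPLICATE_GB, PRICE_PER_GB_MONTH_BRL)
per_year = annual_cost(DUPLICATE_GB, PRICE_PER_GB_MONTH_BRL)
print(f"R${per_month:,.2f} per month")  # R$57.50 per month
print(f"R${per_year:,.2f} per year")    # R$690.00 per year
```

Changing `DUPLICATE_GB` to an organisation's own audited figure gives a first estimate of what its redundant files cost to store.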

Detection Tools and What Comes Next

Automated deduplication software has existed for years, but adoption inside Brazilian public institutions remains uneven. The federal government's Ministério da Gestão e da Inovação em Serviços Públicos has pushed standardisation protocols under its Estratégia de Governo Digital 2024-2027, which explicitly targets redundant data as a cost-reduction priority. São Paulo's municipal government has not yet published a parallel local target for image-specific deduplication, though the Secretaria de Inovação's open data roadmap, updated in April 2026, lists data quality improvement as a stated objective for the second half of this year.

For organisations that want to act now, the practical path starts with a hash-based audit: software that computes a cryptographic fingerprint for every file and flags exact matches regardless of filename. Open-source tools built on the SHA-256 algorithm can scan a one-terabyte library in under four hours on standard hardware. Several São Paulo-based IT consultancies operating out of the Paulista Avenue tech cluster, including firms that work on contracts with Poupatempo digital infrastructure, already offer this as a standalone service priced from R$3,500 per engagement.
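
A hash-based audit of this kind can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the method of any particular vendor or tool: the function names `sha256_of_file` and `find_exact_duplicates` are hypothetical, and the sketch assumes the library sits on a locally readable file system.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Reading in chunks keeps memory use flat for very large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; keep groups of two or more."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    # Filenames are ignored: only byte-identical content ends up together.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_exact_duplicates(Path("/path/to/library"))` returns one entry per set of byte-identical files. An audit of this type only catches exact copies; an image that has been resized, cropped or re-encoded produces a different hash and would need a separate similarity check.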

The deeper fix requires policy. Without mandatory metadata standards at the point of file ingestion — the moment an image enters a government or agency system — duplicates will keep accumulating faster than any audit can clear them. São Paulo has the technical talent and the institutional framework to move. The arithmetic of inaction is getting harder to ignore.


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
