The Daily São Paulo


São Paulo's Digital Archive Crisis: The Key Decisions Ahead on Duplicate Image Replacement

City agencies and private platforms are approaching a decision point that will determine how millions of stored images are cleaned, catalogued and preserved across Brazil's largest city.

By São Paulo News Desk · Published 4 July 2026, 3:44 pm


São Paulo's sprawling network of public databases, municipal archives and private tech platforms is sitting on a problem that has quietly compounded for years: duplicate digital images, stored redundantly across servers, costing money, distorting search results and slowing down the systems that millions of residents use every day. The pressure to act is now acute, and the decisions made in the second half of 2026 will set the terms for years of digital infrastructure work.

The issue landed on the agenda of the Secretaria Municipal de Inovação e Tecnologia — the city body responsible for São Paulo's digital transformation programs — earlier this year, as part of a broader audit of the Prefeitura's data storage contracts. Municipal storage costs have ballooned across Latin American city governments in recent years as unstructured image data accumulates from sources ranging from traffic cameras on Avenida Paulista to satellite imagery used in the city's flood-monitoring systems along the Tietê and Pinheiros rivers. Without systematic deduplication, those costs compound with every new upload cycle.

Why the Timing Is Critical

Two deadlines are driving urgency. The Prefeitura de São Paulo, under Mayor Ricardo Nunes, is operating under a technology modernisation framework that runs to the end of 2026, and procurement decisions for new cloud storage contracts must be finalised before the fiscal year closes. At the same time, the federal government's ongoing push through the Ministério da Gestão e da Inovação em Serviços Públicos to standardise data governance across all three tiers of government — municipal, state and federal — means any system São Paulo adopts now will likely need to conform to federal interoperability standards that are still being written in Brasília.

For the city's private sector, the stakes are equally concrete. The dozens of fintech and health-tech startups in the Vila Olímpia and Faria Lima corridor, several of them unicorns valued above R$1 billion, rely on image recognition pipelines that degrade in accuracy when trained or queried against datasets bloated with duplicates. Engineers at companies operating out of the JK Iguatemi tech cluster have flagged the problem in public developer forums, describing how duplicate product and identity images inflate model error rates and slow API response times.

The financial arithmetic is not abstract. Cloud storage pricing from major providers operating in Brazil — including local data centres in Barueri and Tamboré in the greater São Paulo metropolitan zone — runs at roughly R$0.10 to R$0.25 per gigabyte per month for standard tiers as of mid-2026. A municipal archive holding even 500 terabytes of unaudited image data with a 30 percent duplication rate is paying for roughly 150 terabytes of redundant storage every month. That figure, multiplied across a dozen separate secretariat systems, adds up to millions of reais annually in avoidable expenditure.
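The arithmetic above can be checked with a short sketch. It uses only the figures quoted in the paragraph; the decimal conversion of 1,000 GB per TB and the function name are assumptions made for illustration, and real provider billing may differ.

```python
# Rough estimate of redundant storage cost, using the figures quoted above.
GB_PER_TB = 1_000  # assumption: decimal terabytes; some providers bill in binary units


def redundant_cost_per_month(archive_tb, duplication_rate, price_per_gb):
    """Monthly cost, in reais, of storing the duplicated share of an archive."""
    redundant_gb = archive_tb * duplication_rate * GB_PER_TB
    return redundant_gb * price_per_gb


# One 500 TB archive with a 30 percent duplication rate.
low_monthly = redundant_cost_per_month(500, 0.30, 0.10)
high_monthly = redundant_cost_per_month(500, 0.30, 0.25)

# Scale to twelve months and, as the article suggests, a dozen secretariat systems.
SYSTEMS = 12
low_annual = low_monthly * 12 * SYSTEMS
high_annual = high_monthly * 12 * SYSTEMS

print(f"Per archive, per month: R${low_monthly:,.0f} to R${high_monthly:,.0f}")
print(f"Twelve archives, per year: R${low_annual:,.0f} to R${high_annual:,.0f}")
```

Under these assumptions, a single archive wastes roughly R$15,000 to R$37,500 a month, and a dozen comparable systems would waste about R$2.2 million to R$5.4 million a year, which is consistent with the "millions of reais" estimate.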

What Happens Next

Three decisions will define the path forward. First, city officials must choose between a centralised deduplication approach, which would run a single hash-comparison and content-fingerprinting pass across all municipal image stores, and a distributed model in which each secretariat manages its own cleanup. The centralised route is faster and cheaper per gigabyte but requires standardised metadata formats that do not currently exist across all departments. For example, the Secretaria de Urbanismo e Licenciamento, which maintains construction permit image databases, uses a different file-naming convention from the Companhia de Engenharia de Tráfego, which archives traffic camera footage.
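A hash-comparison pass of the kind described can be sketched in a few lines. This is a minimal illustration, not the city's method: it assumes SHA-256 as the hash, the file names and sample bytes are hypothetical, and it catches only byte-identical copies. Detecting resized or re-encoded near-duplicates would need a separate content-fingerprinting step that is not shown here.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(images):
    """Group image names by the SHA-256 digest of their bytes.

    `images` maps a name to the raw file bytes. Only groups with more
    than one member (byte-identical copies) are returned.
    """
    groups = defaultdict(list)
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(name)
    return {d: names for d, names in groups.items() if len(names) > 1}


# Hypothetical sample: two departments hold the same bytes under different names.
sample = {
    "sul/permit_0001.jpg": b"\xff\xd8 example image bytes",
    "cet/cam17_2026-07-04.jpg": b"\xff\xd8 example image bytes",
    "cet/cam18_2026-07-04.jpg": b"\xff\xd8 different bytes",
}

duplicates = find_exact_duplicates(sample)
for digest, names in duplicates.items():
    print(digest[:12], names)
```

Because the comparison is made on file contents rather than names, differing file-naming conventions between departments do not prevent a match, although the metadata needed to decide which copy to keep would still have to be reconciled.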

Second, there is the question of what to do with images flagged as duplicates before deletion. Archivists and legal teams are likely to push for a quarantine window — typically 90 days in comparable public-sector implementations — before any permanent removal, to avoid destroying records that may have evidentiary value in ongoing litigation or environmental monitoring cases tied to the city's drainage projects in the Várzea do Tietê.
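A quarantine rule like the one described could be expressed as follows. This is a sketch under stated assumptions: the 90-day figure is the one cited above rather than an adopted policy, and the `legal_hold` flag and function names are hypothetical, introduced here to represent records with possible evidentiary value.

```python
from datetime import date, timedelta

QUARANTINE_DAYS = 90  # the window cited above; the actual policy is undecided


def earliest_deletion_date(flagged_on, legal_hold=False):
    """Return the first date a flagged duplicate may be removed.

    Returns None while the record is under a (hypothetical) legal hold.
    """
    if legal_hold:
        return None
    return flagged_on + timedelta(days=QUARANTINE_DAYS)


def may_delete(flagged_on, today, legal_hold=False):
    """True only if the quarantine window has passed and no hold applies."""
    release = earliest_deletion_date(flagged_on, legal_hold)
    return release is not None and today >= release


flagged = date(2026, 7, 4)
print(earliest_deletion_date(flagged))
print(may_delete(flagged, today=date(2026, 9, 1)))
print(may_delete(flagged, today=date(2026, 12, 1), legal_hold=True))
```

Keeping the hold check separate from the date check means a record tied to litigation stays protected no matter how long ago it was flagged.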

Third, and most consequential for the private sector, is whether São Paulo adopts an open API standard for deduplication verification that startups and civil society organisations can query. If it does, the potential to clean image data across both public and private systems simultaneously is real. If it does not, companies along Faria Lima will continue running their own parallel processes at their own cost, and the coordination opportunity will close. Procurement documents and public consultations expected before September will be the first real signal of which direction the city intends to move.


About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
