The Daily São Paulo

São Paulo Races to Scrub Duplicate Images From City Records — But Lags Behind Bogotá and Tokyo

A sprawling digitisation backlog means millions of duplicate scanned documents are clogging the city's public databases, and the fix is proving harder than officials expected.

By São Paulo News Desk · Published 4 July 2026, 3:43 pm

Photo: Sérgio Souza on Pexels

São Paulo's municipal archive holds more than 40 million digitised documents, and a significant share of them appear more than once. That is the core finding of an internal review completed earlier this year by the Secretaria Municipal de Gestão, which identified duplicate image files as a growing burden on storage budgets, search accuracy and the public transparency portals used by residents across the city's 96 districts.

The problem goes beyond bureaucratic tidiness. Requests filed under the Lei de Acesso à Informação, of which São Paulo processes more each year than any other Brazilian municipality, return cluttered results when duplicate scans of the same permit, contract or urban-planning map remain in the system. Lawyers working near the Fórum João Mendes, in the Centro Histórico neighbourhood, have complained for years that licence searches for properties in districts such as Pinheiros and Vila Madalena pull up redundant PDFs that slow due-diligence work on transactions worth millions of reais.

What São Paulo Is Actually Doing

Since March 2026, the Secretaria has been running a pilot deduplication programme across three municipal departments — Habitação, Obras and Finanças — using a combination of perceptual hashing and metadata cross-referencing. The pilot covers roughly 1.2 million document images, according to materials circulated at a public hearing held at the Câmara Municipal on Viaduto Jacareí in May. A full rollout is pencilled in for the first quarter of 2027, contingent on budget approval in the next Plano Plurianual cycle.
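For readers unfamiliar with the technique, the sketch below shows roughly how perceptual hashing and a metadata check can be combined. It is an illustration only: the hearing materials do not say which algorithm the pilot uses, so the code uses a simple "average hash", and the `Scan` fields, the `is_probable_duplicate` helper and the distance threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash of a small grayscale grid (values 0-255).

    Each pixel contributes one bit: 1 if it is brighter than the grid mean.
    The input is assumed to be already downscaled (for example to 8x8).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


@dataclass
class Scan:
    # Hypothetical metadata fields, chosen only for illustration.
    department: str
    reference: str
    pixels: list[list[int]]


def is_probable_duplicate(a: Scan, b: Scan, max_distance: int = 5) -> bool:
    """Flag a pair only when the metadata matches AND the images look alike."""
    same_metadata = (a.department, a.reference) == (b.department, b.reference)
    distance = hamming_distance(average_hash(a.pixels), average_hash(b.pixels))
    return same_metadata and distance <= max_distance
```

Because the hash depends on each pixel's brightness relative to the image mean, a slightly brighter rescan of the same page yields the same or a very similar hash, while a byte-for-byte file comparison would treat the two scans as unrelated. Requiring a metadata match as well is one way to reduce false positives between different documents that happen to look alike, such as blank forms.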

The tool being tested is not proprietary. The Secretaria contracted the São Paulo-based technology company Totvs to adapt an open-source deduplication stack for the city's Oracle-based document management environment. Totvs, headquartered in Bom Retiro, already handles payroll and fiscal systems for several state-level agencies in Brazil, which gave it an advantage in the procurement process over international bidders.

Funding is the immediate constraint. The pilot was budgeted at R$4.2 million for the 18-month testing phase. Scaling to the full 40-million-document archive would, by the Secretaria's own projections presented at the May hearing, require between R$28 million and R$35 million, a figure that has yet to secure a dedicated line in the 2026 municipal budget under Mayor Ricardo Nunes.

How That Compares to Other Cities

Bogotá completed a similar deduplication exercise across its Archivo de Bogotá in 2024, clearing an estimated 6 million redundant images from a 22-million-document base. The Colombian capital used funding from the Inter-American Development Bank's digital-government programme and finished the work in 14 months. Tokyo's Bureau of General Affairs began automated deduplication of its municipal land-registry scans in 2021 and now runs quarterly audits to keep duplicates from building up again.

Mexico City, whose Archivo Histórico sits in the Palacio de Lecumberri, completed a two-year deduplication project in partnership with the Universidad Nacional Autónoma de México in 2023. Officials there reported a 31 percent reduction in storage load and a measurable improvement in keyword-search response times on public portals — concrete benchmarks that São Paulo's Secretaria has cited internally as a target model, according to the May hearing materials.

Where São Paulo differs from all three comparators is scale. Its document archive is larger than Bogotá's and Mexico City's combined, and its incoming digitisation rate — driven by court filings, building permits and social-programme enrolments — runs at an estimated 800,000 new scans per month. That intake pace means that even a successful deduplication sweep risks being outrun by new redundant uploads if the city does not simultaneously reform how documents are scanned and ingested at source.
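The risk of being outrun comes down to simple arithmetic, sketched below. Only the intake figure of 800,000 scans per month comes from the article. The share of new scans that are duplicates and the number of duplicates a sweep removes each month are hypothetical parameters, since the Secretaria's materials as reported here give neither.

```python
MONTHLY_INTAKE = 800_000  # estimated new scans per month, as reported above


def project_backlog(start_backlog: int, months: int,
                    duplicate_share: float, removed_per_month: int) -> list[int]:
    """Duplicate backlog at the end of each month (never below zero).

    duplicate_share and removed_per_month are illustrative assumptions,
    not figures from the city.
    """
    backlog = start_backlog
    history = []
    for _ in range(months):
        new_duplicates = round(MONTHLY_INTAKE * duplicate_share)
        backlog = max(0, backlog + new_duplicates - removed_per_month)
        history.append(backlog)
    return history
```

Under these assumptions the backlog shrinks only when `removed_per_month` exceeds `MONTHLY_INTAKE * duplicate_share`. With a hypothetical 25 percent duplicate share, for instance, a sweep would need to clear more than 200,000 duplicates a month just to hold the backlog steady.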

The next public checkpoint is a progress report the Secretaria is scheduled to present to the Câmara's technology committee in September. If the pilot results are strong enough, budget negotiators could accelerate the timeline. If the R$35 million figure sticks without a dedicated funding line, the broader rollout will almost certainly slip into 2028 — by which point, at current ingestion rates, the duplicate backlog will have grown by another 10 million images at minimum.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.
