The Daily São Paulo

São Paulo's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Damning Story

A growing body of data reveals how redundant image files are costing public institutions and private companies in São Paulo millions of reais in wasted storage and slowed workflows.

By São Paulo News Desk · Published 4 July 2026, 3:58 pm

Photo: Carol Cardoso on Pexels

São Paulo's public and private digital archives collectively store an estimated 40% of their image files as duplicates — redundant copies that consume server capacity, inflate IT budgets, and slow the document management systems that underpin everything from city hall permit processing to hospital record-keeping. That figure, drawn from internal audits cited by technology consultancies operating in the Paulista Avenue corridor, has put pressure on institutions to act before year's end.

The timing matters. The Ricardo Nunes administration is midway through a R$2.3 billion digital transformation program for the Prefeitura de São Paulo, and procurement officers say storage inefficiency is one of the hardest line items to justify to federal oversight bodies under the Lula government's tightened public spending framework. Duplicate image files — product photos uploaded twice to municipal e-procurement platforms, scanned documents that exist in three folders simultaneously, news photography ingested without deduplication filters — are a mundane problem with a surprisingly large price tag.

The Scale of the Problem in Numbers

Object storage in Brazil's enterprise cloud market runs between R$0.08 and R$0.23 per gigabyte per month, depending on the provider and redundancy tier. For a mid-sized São Paulo hospital storing roughly 80 terabytes of patient imaging data (chest X-rays, MRI scans, dermatological photos), a 40% duplication rate means 32 terabytes of redundant storage billed every month. At the higher pricing tier, that is approximately R$7,360 (32,000 GB × R$0.23) wasted every month on a single institution's image library. Scale that across the 96 hospitals in the municipal and state network and the figure becomes difficult to dismiss as a rounding error.
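The hospital example can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 1,000 GB) and that duplicate files are, on average, the same size as unique ones; the function name `monthly_duplicate_cost` is hypothetical and not drawn from any billing tool.

```python
# Sketch of the storage-waste arithmetic for the hospital example.
# Assumptions (not from any provider's price sheet): 1 TB = 1,000 GB,
# and duplicates have the same average size as unique files.

def monthly_duplicate_cost(total_tb: float, duplicate_share: float,
                           price_per_gb_month: float) -> float:
    """Return the monthly cost in reais of storing the duplicate share of an archive."""
    duplicate_gb = total_tb * duplicate_share * 1_000
    return duplicate_gb * price_per_gb_month


# Mid-sized hospital: 80 TB of imaging data, 40% duplicates,
# priced at both ends of the R$0.08 to R$0.23 per GB range.
for label, price in [("low tier", 0.08), ("high tier", 0.23)]:
    cost = monthly_duplicate_cost(80, 0.40, price)
    print(f"{label}: R${cost:,.2f} per month")

# Expected output:
# low tier: R$2,560.00 per month
# high tier: R$7,360.00 per month
```

Even at the cheapest tier, the redundant copies in this single example cost a few thousand reais a month.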

The Instituto de Tecnologia e Sociedade do Rio de Janeiro published an analysis in late 2025 suggesting that Brazilian public-sector bodies spend up to 18% of their cloud infrastructure budgets on redundant data, broadly defined, with images the single largest contributor, ahead of video and document formats. Private-sector firms headquartered in the Vila Olímpia and Faria Lima financial districts have been faster to automate deduplication because their cloud costs hit quarterly earnings reports directly. The public sector moves slower, but the pressure is building.

Perceptual hashing and content-aware deduplication — the two dominant technical approaches to identifying and replacing duplicate images — have been available as commercial products since at least 2018. Tools from vendors with São Paulo offices, including operations based in the Bela Vista and Brooklin Novo neighbourhoods, now market deduplication as a standalone service starting around R$12,000 per year for small archive volumes. The problem is not that solutions are unavailable. The problem is that procurement cycles at entities like the Secretaria Municipal de Gestão traditionally run 12 to 18 months from needs assessment to contract signature.
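Perceptual hashing is named above without explanation. As a rough illustration, the sketch below implements a difference hash (dHash), one common perceptual-hashing variant; it is not taken from any vendor's product. It assumes images arrive as grayscale pixel grids (a real pipeline would first decode files with an imaging library), and the helper names `downsample`, `dhash`, `hamming` and `random_image` are hypothetical. Seeded random pixel grids stand in for photographs.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixels 0-255, rows of equal length


def downsample(img: Image, out_w: int, out_h: int) -> List[List[float]]:
    """Shrink an image by averaging the block of pixels behind each output cell."""
    h, w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        y0 = oy * h // out_h
        y1 = max((oy + 1) * h // out_h, y0 + 1)
        row = []
        for ox in range(out_w):
            x0 = ox * w // out_w
            x1 = max((ox + 1) * w // out_w, x0 + 1)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Image, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair of cells,
    set when the left cell is brighter than the right one."""
    small = downsample(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


def random_image(seed: int, size: int = 64) -> Image:
    """Hypothetical stand-in for a decoded photo (values kept at or below 200)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 200) for _ in range(size)] for _ in range(size)]


original = random_image(seed=1)
brightened = [[p + 30 for p in row] for row in original]  # same picture, re-exposed
unrelated = random_image(seed=2)

# A uniform brightness shift leaves every left/right comparison unchanged.
print(hamming(dhash(original), dhash(brightened)))  # 0
print(hamming(dhash(original), dhash(unrelated)))   # typically much larger
```

Unlike a byte-level checksum, which changes completely when any pixel changes, the hash above stays identical under a uniform brightness shift, so visually equivalent copies can be grouped. The distance cut-off for flagging a pair is a tuning choice each archive would set for itself.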

What the City and Companies Are Doing About It

The Arquivo Público do Estado de São Paulo, housed near Largo do Arouche in the city centre, began a pilot deduplication project in March 2026 covering its digitised photograph collection, which spans more than 1.2 million scanned images dating back to the 1860s. According to documentation published on the Arquivo's website, the pilot is targeting a 25% reduction in active storage load by December 2026. If achieved, it would free roughly 4.7 terabytes of space on the institution's on-premises servers: modest in absolute terms, but meaningful as a proof of concept for larger municipal rollouts.

Private newsrooms and media companies on Avenida Engenheiro Luís Carlos Berrini have been running deduplication workflows inside their content management systems for longer, typically integrating them into photo desk software during post-processing. The efficiency gains there are measured in journalist hours as much as gigabytes: editors who previously spent 20 to 30 minutes per assignment reconciling duplicate images pulled from wire services now rely on automated flagging that cuts that task to under five minutes.

Institutions that have not yet audited their image libraries should treat the second half of 2026 as the window to act. Federal IT governance guidelines updated in January 2026 now require public bodies receiving federal digital transformation funding to demonstrate storage efficiency benchmarks annually. For São Paulo's city government and state institutions, that means the question of duplicate image replacement is no longer just a technical housekeeping matter — it is a compliance requirement with a deadline attached.

About this article

Published by The Daily São Paulo

This article was produced by The Daily São Paulo editorial desk and covers news in São Paulo. See our editorial standards for how we use AI.