São Paulo's public and private institutions are sitting on a storage crisis they rarely discuss publicly. Duplicate image files (photographs, scans, satellite frames, medical imagery) now account for an estimated 30 to 40 percent of total data volume held across large urban administrative systems in Brazilian cities, according to benchmarks published by the Brazilian Association of Information Technology and Communication (Brasscom) in its 2025 sector report. In a city of more than 12 million residents that generates petabytes of visual data annually, that redundancy carries a direct cost.
The timing matters for São Paulo specifically. Mayor Ricardo Nunes's administration has committed the city to a R$2.1 billion digital transformation programme through 2027, folding everything from urban flood monitoring to public health records into integrated cloud platforms. When the underlying data is bloated with duplicates, the cost of migrating it to those platforms escalates: IT procurement specialists routinely cite a multiplier of 1.3 to 1.8 times initial storage budget projections when deduplication has not been performed before a major system transfer.
Where the Duplication Accumulates
Two local institutions illustrate the scale of the problem on the ground. The Hospital das Clínicas complex in Cerqueira César, the largest hospital in Latin America by bed count, processes roughly 3,000 diagnostic image exams per day across its various institutes, according to figures the institution has cited in academic publications. Radiology departments worldwide typically find that between 15 and 25 percent of stored DICOM files are duplicate or near-duplicate records created by upload errors, patient re-registration, or system migrations. If that range held for Hospital das Clínicas' daily exam volume, it would mean roughly 450 to 750 redundant scans accumulating every day, consuming expensive regulated storage that must meet Brazil's Lei Geral de Proteção de Dados (LGPD) compliance standards.
At the municipal level, Geosampa, the Prefeitura de São Paulo's open geodata platform, managed out of the Secretaria Municipal de Urbanismo e Licenciamento on Rua São Bento, hosts aerial and satellite imagery layers updated on irregular cycles. Planners working on flood drainage projects along the Tietê River corridor have flagged internally that overlapping image captures from different survey rounds create redundant tile sets, inflating the platform's storage footprint and slowing query response times for engineers working on urgent infrastructure problems.
The Cost in Reais and Hours
Storage costs in Brazil have not fallen as fast as global benchmarks suggest. Enterprise cloud storage from providers operating Brazilian data centres, which federal regulation requires for government data, runs between R$0.18 and R$0.35 per gigabyte per month at commercial rates negotiated by mid-sized public institutions. For an organisation holding 500 terabytes of imagery with 35 percent duplication, eliminating those files frees roughly 175 terabytes, or 175,000 gigabytes. At R$0.25 per gigabyte per month, that translates to R$43,750 in monthly savings, or about R$525,000 annually from a single system.
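The arithmetic behind that estimate can be reproduced in a few lines. The sketch below is illustrative only: the function name and parameters are hypothetical, it assumes decimal units (1 terabyte = 1,000 gigabytes), and it plugs in the figures quoted above rather than any institution's actual contract rates.

```python
GB_PER_TB = 1_000  # assumption: decimal units, as in the estimate above


def storage_savings_brl(total_tb, duplicate_percent, price_per_gb_month):
    """Return (terabytes freed, monthly saving in reais) for a given duplication rate."""
    freed_tb = total_tb * duplicate_percent / 100
    monthly_saving = freed_tb * GB_PER_TB * price_per_gb_month
    return freed_tb, monthly_saving


# Figures quoted in the text: 500 TB held, 35 percent duplicated, R$0.25 per GB per month.
freed_tb, monthly = storage_savings_brl(500, 35, 0.25)
annual = monthly * 12

print(f"Freed: {freed_tb:.0f} TB")        # Freed: 175 TB
print(f"Monthly: R${monthly:,.0f}")       # Monthly: R$43,750
print(f"Annual: R${annual:,.0f}")         # Annual: R$525,000
```

Swapping in the low and high ends of the quoted price range (R$0.18 and R$0.35) gives a sense of how sensitive the saving is to the negotiated rate.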
The labour dimension compounds this. IT teams at Avenida Paulista-based fintech firms, several of which manage image-heavy KYC document archives for millions of Brazilian users, report that manual deduplication reviews, when conducted without automated tooling, consume between 12 and 20 staff hours per terabyte of reviewed data. At that rate, a manual review of a 500-terabyte archive would take 6,000 to 10,000 staff hours. São Paulo's tech unicorn ecosystem, anchored in neighbourhoods like Vila Olímpia and Faria Lima, has accelerated adoption of AI-powered deduplication pipelines precisely because the math on manual review stops working at that scale.
Federal guidance from the Tribunal de Contas da União has pushed federal agencies toward formal data governance frameworks since 2023, but municipal compliance remains patchy. The Prefeitura de São Paulo's own Controladoria Geral do Município noted in its 2024 annual report that data quality audits across secretarias remain incomplete, a gap that deduplication reviews would partially address.
For administrators and IT managers navigating these numbers now, the practical path is sequenced: audit total image storage volumes first, run automated hash-comparison deduplication tools before any cloud migration, and build deduplication checkpoints into procurement contracts for new imaging systems. The savings can only be captured if those steps make it into the 2026–2027 budget cycle, before the Nunes administration's digital transformation contracts are fully executed.
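As a rough illustration of what a hash-comparison pass involves, here is a minimal Python sketch. It is not the tooling any of the institutions above use. The function names and the list of file extensions are assumptions for the example, and it detects only byte-identical copies: near-duplicates such as re-encoded scans or overlapping aerial tiles would need perceptual or content-aware comparison instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumption: extensions treated as imagery for this example.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".dcm"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large scans are never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_images(root):
    """Return groups of byte-identical image files found under root."""
    # Cheap first pass: files of different sizes cannot be identical.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def reclaimable_bytes(groups):
    """Bytes freed if one copy per duplicate group is kept."""
    return sum(group[0].stat().st_size * (len(group) - 1) for group in groups)
```

Calling find_duplicate_images on an archive directory and passing the result to reclaimable_bytes gives the audit figure from the first step. The sketch only reports duplicates; deleting them is deliberately left out, since regulated records such as medical imagery would need review before anything is removed.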