São Paulo's public and private digital infrastructure is carrying a hidden weight: hundreds of millions of duplicate image files clogging servers, inflating storage costs, and degrading the performance of platforms that range from the Prefeitura Municipal's urban services portal to the logistics hubs that ring the Rodovia Anhanguera. A survey of storage management practices across major municipal systems and private operators, reviewed by The Daily São Paulo this week, points to a problem that has grown quietly for years but is now forcing budget conversations that administrators can no longer defer.
The timing matters. The city's IT directorate is currently finalising its 2026–2027 technology procurement cycle, and Mayor Ricardo Nunes's administration has committed to modernising digital public services under the Programa São Paulo Inteligente. Into that context steps a very specific operational failure: when image deduplication is not enforced, storage needs balloon — and in enterprise environments, storage is never cheap.
What the Data Actually Shows
Industry benchmarks from storage analytics firms suggest that between 25 and 40 percent of files held on large unmanaged servers are exact or near-exact duplicates, with image formats — JPEG, PNG, HEIC — accounting for the largest share by volume. For a city the scale of São Paulo, which manages roughly 12 million active resident records through its digital services stack, even a conservative 25 percent redundancy rate across image-heavy document repositories translates into tens of terabytes of avoidable storage consumption each year.
On the private side, the numbers are sharper still. The e-commerce distribution centres concentrated around the Guarulhos and São Bernardo do Campo corridors rely on product image libraries that can run to several million SKUs. One logistics sector analysis circulated among supply-chain consultants in the first quarter of 2026 estimated that a mid-sized Brazilian marketplace operator maintains, on average, 3.4 copies of every product image across its content delivery infrastructure: master files, regional mirrors, legacy backups, and CDN caches that were never purged. At current AWS São Paulo region pricing of approximately R$0.023 per gigabyte per month for standard object storage, a library of 10 million images averaging 800 kilobytes each, held at that 3.4-copy average, generates a monthly bill about 3.4 times what a single-copy library would cost. Roughly 70 percent of that spend pays for redundant copies.
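The arithmetic behind the marketplace example can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted above. The decimal unit conversion (1 GB = 1,000,000 KB) and the function name `monthly_cost_brl` are assumptions made for illustration, not part of the cited analysis.

```python
# Figures quoted in the article.
UNIQUE_IMAGES = 10_000_000        # distinct product images in the library
AVG_SIZE_KB = 800                 # average size of one image, in kilobytes
COPIES_PER_IMAGE = 3.4            # average copies held per image
PRICE_BRL_PER_GB_MONTH = 0.023    # quoted standard object storage price

# Assumption: decimal units, so 1 GB = 1,000,000 KB.
KB_PER_GB = 1_000_000


def monthly_cost_brl(images: int, avg_kb: float, copies: float,
                     price_per_gb: float) -> float:
    """Monthly storage cost in reais for a library held `copies` times over."""
    gigabytes = images * avg_kb * copies / KB_PER_GB
    return gigabytes * price_per_gb


# Bill with every redundant copy retained.
actual = monthly_cost_brl(UNIQUE_IMAGES, AVG_SIZE_KB,
                          COPIES_PER_IMAGE, PRICE_BRL_PER_GB_MONTH)
# Bill if exactly one copy of each image were kept.
ideal = monthly_cost_brl(UNIQUE_IMAGES, AVG_SIZE_KB,
                         1.0, PRICE_BRL_PER_GB_MONTH)

# Share of the actual bill that pays for redundant copies.
redundant_share = (actual - ideal) / actual

print(f"actual: R${actual:,.2f}  ideal: R${ideal:,.2f}  "
      f"redundant share: {redundant_share:.1%}")
```

Under these assumptions the redundant share equals 1 − 1/3.4, or about 70.6 percent of the bill, whatever the unit price. The absolute amounts scale linearly with the price per gigabyte, so a different rate changes the totals but not the proportion.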
The municipal dimension is equally concrete. The Secretaria Municipal de Gestão launched a digital document audit in March 2026 covering file systems tied to the Nota Fiscal Paulistana programme and the city's urban permit portal, known internally as the SP Licença system. Early findings from that audit, referenced in a budget annex published on the Prefeitura's transparency portal, flagged image duplication as a primary driver of storage overruns in the 2025 fiscal year. The annex did not specify a total excess cost figure, but it identified the problem as a category warranting dedicated remediation funding in the next cycle.
Where the Bottlenecks Show Up on the Ground
Walk into any of the Poupatempo service centres — the Largo do Pinheiros unit on the city's west side handles tens of thousands of document submissions monthly — and the back-end reality is invisible to the person at the counter. But technicians who maintain those systems know that every uploaded ID photo, every scanned utility bill, every permit attachment passes through image pipelines that were built without mandatory deduplication rules. The result accumulates silently until a system slowdown or a storage invoice forces the issue.
For startups clustered along Avenida Brigadeiro Faria Lima near Itaim Bibi, the problem is more immediately commercial. Venture-backed companies burning through infrastructure budgets are increasingly deploying automated deduplication tools to cut cloud spend before Series B reviews. The two dominant approaches are content-addressable storage, which catches byte-identical copies, and perceptual hashing, which catches near-duplicates such as resized or recompressed versions of the same image. Several São Paulo-based fintechs that process KYC document images have reported storage cost reductions of 20 to 35 percent after implementing hash-based deduplication, according to presentations shared at the Cubo Itaú tech hub's infrastructure roundtable held in May 2026.
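For readers who want to see what the two approaches look like in practice, here is a minimal Python sketch. It is illustrative only and does not describe any tool used by the companies mentioned. The class `ContentAddressableStore` and the functions `average_hash` and `hamming_distance` are hypothetical names. The perceptual hash is a simplified average hash that assumes the image has already been decoded and downsampled to a small grayscale grid, a step a real pipeline would hand to an image library.

```python
import hashlib
from typing import Dict, Sequence


def content_key(data: bytes) -> str:
    """SHA-256 digest of the raw bytes; identical files share one key."""
    return hashlib.sha256(data).hexdigest()


class ContentAddressableStore:
    """Hypothetical in-memory store that keeps one blob per distinct content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}   # content key -> stored bytes
        self.names: Dict[str, str] = {}     # file name -> content key

    def put(self, name: str, data: bytes) -> bool:
        """Register a file; return True only if new bytes were stored."""
        key = content_key(data)
        is_new = key not in self.blobs
        if is_new:
            self.blobs[key] = data
        self.names[name] = key
        return is_new

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Simplified perceptual hash: one bit per pixel, set when the pixel
    is brighter than the grid's mean. Assumes a small grayscale grid."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests near-duplicates."""
    return bin(a ^ b).count("1")


# Usage: the second upload of identical bytes consumes no extra storage.
store = ContentAddressableStore()
store.put("id_photo.jpg", b"same image bytes")
store.put("id_photo_copy.jpg", b"same image bytes")
print(len(store.names), "names,", len(store.blobs), "stored blob(s)")
```

Exact hashing is cheap and has no false positives short of a hash collision, but it misses a file that has been re-encoded or resized. Perceptual hashing covers that gap at the cost of choosing a distance threshold, which is why the two are often combined.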
The practical next step for any organisation sitting on unmanaged image libraries is an audit before the end of Q3 2026, when cloud providers typically revise regional pricing. Municipal bodies tied to the Programa São Paulo Inteligente roadmap have until September to submit technology modernisation proposals that qualify for federal co-financing under the Ministério das Comunicações digital infrastructure programme. Missing that window means waiting another budget cycle — and paying for redundant bytes the whole time.