São Paulo's Digital Mess: The Numbers Behind the City's Duplicate Image Problem
From public archives to e-commerce platforms, redundant image files are costing São Paulo businesses and agencies millions of reais in wasted storage and slower systems.

São Paulo's public and private digital infrastructure is carrying a hidden weight. A growing body of technical audits and platform data shows that duplicate image files — identical or near-identical visual assets stored multiple times across servers — account for a disproportionate share of the city's digital storage costs, slowing down everything from municipal portals to the e-commerce backends of companies headquartered along Avenida Faria Lima.
The timing matters. Brazil's Lei Geral de Proteção de Dados, the LGPD, has pushed organisations across the country to audit what they store and why. That audit process, which has accelerated since enforcement actions taken by the Autoridade Nacional de Proteção de Dados in 2023, is forcing IT departments in São Paulo to confront data they never properly catalogued, and duplicate images are surfacing in enormous quantities.
Industry benchmarks from cloud storage consultancies operating in Brazil suggest that duplicate and redundant files can represent between 25 and 40 percent of total unstructured data held by mid-sized companies. For image-heavy sectors such as retail, real estate and media, that figure climbs higher. Greater São Paulo hosts major e-commerce operations, including Mercado Livre's Brazilian logistics hub in Osasco and Magazine Luiza, whose technology teams manage product catalogues running into the tens of millions of SKUs. Each product listing historically generated multiple image variants (thumbnails, banners, mobile crops and campaign-specific versions), often stored without a deduplication protocol.
The financial exposure is real. Amazon Web Services and Google Cloud, both of which have data centre infrastructure serving the Greater São Paulo region, charge Brazilian clients in US dollars for storage tiers. With the real trading around R$5.60 to the dollar in mid-2026, every unnecessary gigabyte of duplicated image storage carries a direct foreign-currency cost. A retail operation storing 10 terabytes of duplicate images on a standard cloud tier could be paying the equivalent of R$3,000 to R$5,000 per month for data it does not need.
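As a rough check on those figures, the short sketch below works backwards from the article's own numbers (10 terabytes, R$3,000 to R$5,000 per month, R$5.60 to the dollar) to the per-gigabyte price they imply. It assumes decimal units (1 TB = 1,000 GB) and does not use any provider's published price list, so the result is only the rate implied by the estimate, roughly US$0.054 to US$0.089 per gigabyte per month.

```python
# All monetary figures come from the article; the decimal TB-to-GB
# conversion is an assumption made for this illustration.
BRL_PER_USD = 5.60
DUPLICATE_TB = 10
GB_PER_TB = 1000  # assumption: decimal units


def implied_usd_per_gb_month(monthly_brl: float) -> float:
    """Per-GB monthly price (in USD) implied by a monthly bill in reais."""
    monthly_usd = monthly_brl / BRL_PER_USD
    return monthly_usd / (DUPLICATE_TB * GB_PER_TB)


def annual_cost_brl(monthly_brl: float) -> float:
    """Yearly cost in reais if the duplicates are never removed."""
    return monthly_brl * 12


low = implied_usd_per_gb_month(3000)
high = implied_usd_per_gb_month(5000)
print(f"Implied price: US${low:.4f} to US${high:.4f} per GB-month")
print(f"Annual cost: R${annual_cost_brl(3000):,.0f} to R${annual_cost_brl(5000):,.0f}")
```

Over a year, the same estimate adds up to between R$36,000 and R$60,000 for data the retailer does not need.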
The Prefeitura de São Paulo's own digital estate is not exempt. The city's open data portal, managed through the Secretaria Municipal de Inovação e Tecnologia, publishes datasets and accompanying visual assets across multiple departments. Technical reviews shared at events organised by the São Paulo Tech Hub — a city-supported initiative operating out of Consolação — have repeatedly flagged duplicated asset management as an unresolved structural issue in municipal IT governance.
The solution is not technically complex. Hash-based deduplication, a process in which software computes a fingerprint of each image file's contents and flags files whose fingerprints match, can identify exact duplicates within hours on most enterprise systems. Perceptual hashing tools go further, catching near-duplicate images that differ only in compression or minor cropping. Several technology firms based in the Vila Olímpia and Itaim Bibi neighbourhoods already sell deduplication tools tailored to Portuguese-language CMS platforms commonly used by Brazilian publishers and retailers.
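A minimal sketch of the exact-match approach, assuming the images sit on a local or mounted file system, might look like the following. The function names, the list of extensions and the choice of SHA-256 are illustrative assumptions rather than a description of any vendor's product, and the sketch covers byte-identical files only; perceptual hashing needs image decoding and is not shown.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: which file extensions count as images for the audit.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Group image files under `root` by content digest.

    Returns only the groups that contain more than one file.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.join(dirpath, name)
                by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed if one copy per duplicate group is kept."""
    return sum(
        os.path.getsize(path)
        for paths in duplicates.values()
        for path in paths[1:]
    )
```

Calling `find_exact_duplicates` on an image directory and passing the result to `reclaimable_bytes` gives a first estimate of how much storage a deduplication pass could recover. The sketch only reports duplicates and deletes nothing, which leaves the decision about which copy to keep with the team that owns the assets.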
The barrier is organisational, not technical. Teams that created image libraries under one department rarely communicate with teams in another, and without a centralised digital asset management policy, duplicates accumulate by default. The Brazilian Association of Software Companies, Abes, noted in its 2025 annual report that data governance remains the single most cited IT deficiency among its member companies.
For São Paulo businesses starting this process now, the practical path is straightforward: commission a storage audit before the end of the third quarter, prioritise image directories, which typically grow faster than document repositories, and implement a deduplication pass before any cloud contract renewal. Organisations under LGPD scrutiny have additional incentive: storing unnecessary copies of images that include personal data, such as employee headshots or customer-submitted photos, creates compliance exposure on top of the storage bill. The numbers make the case. Cleaning up duplicated image libraries is not a cosmetic fix: the duplicates are a recurring cost that São Paulo's organisations can eliminate before the end of the year.
Published by The Daily São Paulo