São Paulo's Duplicate Image Problem: The Numbers Exposing a Hidden Digital Crisis
Municipal databases, e-commerce platforms and public archives are drowning in redundant image files — and the storage bill is climbing fast.

São Paulo's digital infrastructure is carrying more dead weight than most administrators want to admit. Across municipal databases, retail platforms headquartered in the Faria Lima corridor, and public-sector archives managed by the Prefeitura de São Paulo, duplicate image files have quietly accumulated into a problem measured in petabytes — and, increasingly, in reais wasted every month on cloud storage that should never have been purchased in the first place.
The timing matters. Brazil's federal government has been pushing state and municipal bodies toward a unified digital governance framework under its Programa de Transformação Digital, which sets compliance benchmarks for data deduplication and storage efficiency through 2027. São Paulo, as the country's largest municipal economy, is under particular pressure to meet those targets. Meanwhile, the city's own tech unicorn ecosystem — clustered around the Vila Olímpia and Berrini districts — is generating image-heavy product catalogues, marketing assets and user-generated content at a pace that makes clean data hygiene genuinely difficult.
Industry benchmarks from cloud infrastructure research consistently place duplicate file rates inside large enterprise environments between 25 percent and 40 percent of total stored data, with image files (JPEGs, PNGs, WebP assets) among the worst offenders due to inconsistent upload pipelines and manual file-management habits. For a mid-sized São Paulo e-commerce operation storing, say, 50 terabytes of product imagery, that translates to somewhere between 12.5 and 20 terabytes of redundant data sitting on paid cloud nodes.
At current AWS pricing for the São Paulo region (sa-east-1, which serves the bulk of Brazilian enterprise cloud traffic), standard S3 storage runs approximately R$0.23 per gigabyte per month. Take 15 terabytes, or 15,000 gigabytes, of duplicate images as a figure within that range, and a single company is burning roughly R$3,450 every month on files it does not need. Annualised, that comes to about R$41,400 before egress costs are factored in. For the dozens of scale-ups operating out of Itaim Bibi and Pinheiros, the aggregate loss across the sector is not a rounding error.
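The arithmetic above can be reproduced in a few lines. The sketch below is illustrative only: the function name `duplicate_cost` is hypothetical, the R$0.23 rate is the approximate figure quoted in this article rather than a live price, and one terabyte is treated as 1,000 gigabytes, as the article's own numbers do.

```python
def duplicate_cost(total_tb, dup_rate, price_per_gb_month):
    """Estimate the storage cost of duplicate files.

    total_tb           -- total stored data in terabytes (1 TB = 1,000 GB here)
    dup_rate           -- assumed share of stored data that is duplicated (0.0 to 1.0)
    price_per_gb_month -- storage price in reais per gigabyte per month (assumed)

    Returns (duplicate_gb, monthly_cost, annual_cost), each rounded to 2 decimals.
    """
    duplicate_gb = round(total_tb * 1000 * dup_rate, 2)
    monthly = round(duplicate_gb * price_per_gb_month, 2)
    annual = round(monthly * 12, 2)
    return duplicate_gb, monthly, annual


if __name__ == "__main__":
    # Low end, the article's 15 TB example, and high end of the 25-40 percent band.
    for rate in (0.25, 0.30, 0.40):
        gb, monthly, annual = duplicate_cost(50, rate, 0.23)
        print(f"{rate:.0%}: {gb:,.0f} GB -> R${monthly:,.2f}/month, R${annual:,.2f}/year")
```

At the low end of the benchmark range the same 50-terabyte catalogue would waste about R$2,875 a month, and at the high end about R$4,600, so the R$3,450 example sits between the two.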
On the public-sector side, the Secretaria Municipal de Inovação e Tecnologia, which oversees the Prefeitura's digital asset management systems, has been conducting an internal audit of GeoSampa, the city's open geographic data portal, following reports that aerial and satellite imagery uploaded across multiple departments contained significant duplication. GeoSampa indexes tens of thousands of georeferenced image files covering São Paulo's 96 subprefeituras, and overlapping uploads from independent municipal bodies have been flagged as a source of database bloat. The Prefeitura has not yet published the audit results.
The technical fix is well understood. Perceptual hashing algorithms — tools that generate a fingerprint for each image and flag near-identical files regardless of filename or metadata — can reduce duplicate image loads by 30 percent or more in a first pass, according to published benchmarks from open-source projects including ImageHash and the Python-based imagededup library. Several São Paulo-based startups working out of coworking spaces on Rua Pequetita in Vila Olímpia are already building deduplication layers into their content management pipelines, treating it as a cost-control measure rather than a purely technical one.
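To make the idea of an image fingerprint concrete, here is a minimal sketch of an average hash, one of the simplest perceptual hashes. It is not the ImageHash or imagededup API: it uses only the Python standard library and assumes each image has already been decoded into rows of 0-255 grayscale values. The function names and the toy gradient images are illustrative assumptions.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image (list of rows) to size x size by block averaging."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return a size*size-bit fingerprint: one bit per cell, set if brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Toy images: a left-to-right gradient, a brighter copy, an upscaled copy,
# and a genuinely different top-to-bottom gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 10 for v in row] for row in original]
upscaled = [[(x // 2) * 16 for x in range(32)] for _ in range(32)]
different = [[y * 16 for _ in range(16)] for y in range(16)]

if __name__ == "__main__":
    h = average_hash(original)
    print("brighter copy distance:", hamming(h, average_hash(brighter)))
    print("upscaled copy distance:", hamming(h, average_hash(upscaled)))
    print("different image distance:", hamming(h, average_hash(different)))
```

Copies that differ only in brightness or resolution land at or near a distance of zero, while unrelated images differ in many bits. In practice a pipeline would pick a distance threshold and treat anything below it as a candidate duplicate for review.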
For organisations that have not yet acted, the practical path forward starts with an audit. Running a hash-based scan across an image repository costs relatively little in compute time and surfaces the scale of the problem within hours. The harder step is governance: establishing upload protocols that prevent duplicates from re-entering the system after a clean-up. Without that, the numbers reset within months.
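A first-pass audit of the kind described above might look like the following sketch. It groups files by SHA-256 digest, so it catches only byte-identical copies; resized or re-encoded versions need a perceptual hash instead. The function names and the list of file extensions are assumptions for illustration, and the script reads from a local directory rather than a cloud bucket.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Extensions treated as images in this sketch (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def wasted_bytes(duplicate_groups):
    """Bytes that would be freed by keeping one copy from each group."""
    return sum(p.stat().st_size for group in duplicate_groups for p in group[1:])


if __name__ == "__main__":
    import sys

    found = find_exact_duplicates(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(found)} duplicate groups, {wasted_bytes(found):,} bytes reclaimable")
```

The same digest index could, in principle, be consulted at upload time to reject a file whose contents are already stored, which is one way to approach the governance step.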
São Paulo's digital economy is not going to slow its image production. The Feira do Empreendedor, held annually at the Expo Center Norte in Santana, routinely showcases hundreds of small businesses moving product photography online for the first time — each one a new potential source of unmanaged image duplication. Getting the numbers under control now, before municipal compliance deadlines land and cloud bills grow further, is the more affordable option.
Published by The Daily São Paulo