São Paulo's successive municipal administrations have spent more than a decade digitising public records, urban planning documents and cultural heritage photographs. The result, according to archivists and technology specialists familiar with the city's systems, is a sprawling mess of duplicate images that has quietly paralysed search functions across multiple official platforms.
The problem did not appear overnight. It accumulated across successive administrations, each of which launched its own digitisation initiative with its own file-naming conventions, storage vendors and quality controls — or the lack thereof.
A Patchwork of Projects, a Mountain of Redundant Files
The roots of the crisis trace back to at least 2013, when the Prefeitura de São Paulo began an ambitious push to digitise physical records held at the Arquivo Histórico Municipal Washington Luís, a repository on Rua Cantareira in the Centro district that holds colonial-era maps, urban planning blueprints and over a century of photographic documentation. That effort ran in parallel with a separate scanning programme coordinated by the Secretaria Municipal de Cultura, which operated out of offices near Avenida São João, and the two projects never shared a common metadata standard.
By the time later administrations layered in additional platforms, including the GeoSampa geographic information portal and the São Paulo Aberto open-data initiative, every new batch of scanned images was landing on top of files that had already been uploaded, often more than once and under different filenames. A photograph of the Viaduto do Chá, for example, might exist simultaneously in three separate folders, each labelled by a different archivist using a different date format and a different resolution setting, with no persistent unique identifier to flag the duplication.
Technology workers contracted by the city to audit the GeoSampa database described encountering image sets where individual aerial photographs of neighbourhoods like Pinheiros and Vila Madalena appeared in four or five separate subdirectories. The audit, conducted in late 2024, identified the structural cause: procurement rules had pushed each secretariat to contract its own cloud-storage provider, and none of those providers' systems exchanged data with the others automatically.
Why the Problem Matters Beyond Filing Cabinets
Duplicate images are not merely a librarian's headache. For urban planners working on São Paulo's chronic flooding and drainage infrastructure — a priority for the Ricardo Nunes administration, which has publicly committed to expanding the city's network of piscinões, or retention reservoirs — outdated or mislabelled aerial imagery can mean engineers are referencing the wrong version of a neighbourhood map. Parelheiros and Grajaú, two southern districts repeatedly battered by flooding during the summer rainy season, have both been the subject of conflicting satellite-image datasets held across different city systems.
The legal stakes are real. Brazil's Lei de Acesso à Informação, in force since 2012, requires public bodies to respond to records requests within 20 days, a deadline that can be extended by a further 10 days with justification. Delays caused by staff having to manually triage duplicate files before releasing documents have contributed to municipalities across the state missing that deadline, according to monitoring data published by the Escola de Administração de Empresas de São Paulo da Fundação Getulio Vargas. São Paulo state bodies received more than 890,000 information requests in 2024 alone, a figure released by the state's Ouvidoria Geral.
The federal government's push under Lula's administration to expand the Conecta Gov digital-services framework has added pressure. Federal guidelines issued in early 2025 set interoperability standards that municipal databases must meet to qualify for infrastructure transfer payments — and duplicate, untagged image files are a direct obstacle to compliance.
What comes next is a phased deduplication programme that the Secretaria Municipal de Inovação e Tecnologia began piloting in March 2026 across a subset of the Arquivo Histórico's scanned collections. The approach uses hash-matching algorithms to flag identical binary files before human archivists confirm deletions. Officials familiar with the process say a full rollout across GeoSampa and São Paulo Aberto is targeted for completion before the end of 2027 — giving the city roughly 18 months to bring its image infrastructure in line with federal interoperability standards and, more practically, to ensure that the next time a planner in Grajaú pulls up a flood-plain map, it is the right one.
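For readers curious what hash-matching looks like in practice, the following is a minimal sketch in Python, not the city's actual tooling. It assumes the scans sit in an ordinary directory tree and uses SHA-256 as the hash; the function names `sha256_of_file` and `find_duplicate_groups` are hypothetical, chosen here for illustration. The script only reports groups of byte-identical files and deletes nothing, mirroring the step in which human archivists confirm removals.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical, whatever their names.

    Illustrative helper: it flags candidates only and never deletes anything.
    """
    # Cheap first pass: files of different sizes cannot be byte-identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Second pass: hash only the files that share a size with another file.
    by_digest = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_digest[sha256_of_file(path)].append(path)

    # Keep groups with at least two members, in a stable order for review.
    groups = [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
    return sorted(groups, key=lambda group: group[0])


if __name__ == "__main__":
    # "scans" is a placeholder directory name, not a real city path.
    for group in find_duplicate_groups(Path("scans")):
        print("Possible duplicates for archivist review:")
        for path in group:
            print(f"  {path}")
```

One limitation is worth noting: a hash of the raw bytes matches only exact copies. The same photograph scanned at two different resolutions, or re-saved in another format, produces different bytes and would not be flagged by a sketch like this one.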