One question we get from time to time is whether manzana-level GIS data is available for Mexico, whether that’s manzana boundaries or manzana demographics. INEGI delineates more than a million manzanas, covering urban areas. That alone sounds like a compelling reason to use these shapefiles for demographic analysis, but there's more to the story. Let's dig into what a manzana is, what it does, and why (or why not) you should be using them.
Manzana is an interesting word in Spanish in that it has two very different meanings. On the one hand, a manzana is an apple. But, of more relevance to GIS analysts, a manzana is a block (as in, a city block). It also gives rise to the word amanzanamiento (a tongue twister for sure), which INEGI and other Latin American census bureaus often use to describe the process of dividing urban territory into statistical units that are generally very small.
How small? The coy answer is “smaller than they need to be”. For census users (by that, we mean hardcore demographers, mostly in the public or academic sectors), smaller is better. A select academic elite has access to microdata, that is, household-level raw data from the original Censo 2010 questionnaires. Everyone else, business users included, only has access to summary statistics: household-level data aggregated into a geostatistical unit (such as a municipio, a localidad, an AGEB, or a manzana).
The problem with smaller geostatistical units, and particularly the manzana, is the privacy protection mechanism that INEGI applies, as does virtually every census bureau facing similar pressure to prevent misuse of personal data. For very small geostatistical units, ancillary demographic data such as age, indigenous status, or access to poverty-oriented social benefits could be used to pry into someone’s private details. Nosy neighbors, potential in-laws, creditors, or criminals could use these details to the respondent’s detriment. To guard against this, when INEGI rolls up Censo 2010 data to the manzana or AGEB level, it zeroes out any demographic indicator with fewer than 3 persons counted. Note that INEGI does not zero out the main “mass” counts: population count and household count.
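To make the rule concrete, here is a minimal sketch of the suppression step as described above: indicators counting fewer than 3 persons are published as 0, while the mass counts pass through untouched. The field names and the `suppress` function are hypothetical illustrations, not INEGI's actual column names or processing code.

```python
# Hypothetical sketch of the suppression rule described above.
# Field names are illustrative only, not INEGI's actual column names.

MASS_COUNTS = {"population", "households"}  # never suppressed
THRESHOLD = 3  # indicators counting fewer persons than this are zeroed


def suppress(record: dict[str, int], threshold: int = THRESHOLD) -> dict[str, int]:
    """Return a copy of one unit's counts with small indicators zeroed out."""
    published = {}
    for field, count in record.items():
        if field in MASS_COUNTS:
            published[field] = count  # mass counts are published as-is
        elif count < threshold:
            published[field] = 0  # small cells become (possibly false) zeroes
        else:
            published[field] = count
    return published


manzana = {"population": 14, "households": 4, "age_50_plus": 2, "indigenous": 1, "age_0_14": 5}
print(suppress(manzana))
# {'population': 14, 'households': 4, 'age_50_plus': 0, 'indigenous': 0, 'age_0_14': 5}
```

Note that the published record gives no way to tell a suppressed 2 from a genuine 0, which is exactly what makes these zeroes hard to catch downstream.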
While that may seem rare, it can have a big impact on your demographic analysis. These false zeroes pass through the process undetected and can wreak havoc on calculated variables. Say, for example, you want to calculate the percentage of persons aged 50 or older. When you sum the age 50+ variable across manzanas and divide that count by the total population, you are dividing an artificially low numerator by a real-world denominator, which can understate the true percentage.
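A small worked example shows the bias. The five manzanas below are made-up numbers, and the threshold of 3 follows the rule described earlier; only the age 50+ counts are suppressed, while population (a mass count) is published as-is, so the denominator stays correct while the numerator shrinks.

```python
# Toy illustration with hypothetical numbers: false zeroes bias a ratio downward.
THRESHOLD = 3

# Each tuple is (population, persons aged 50+) for one made-up manzana.
manzanas = [
    (40, 2),
    (25, 1),
    (60, 12),
    (18, 2),
    (33, 5),
]


def published(count: int, threshold: int = THRESHOLD) -> int:
    """Indicator counts below the threshold are published as 0."""
    return 0 if count < threshold else count


total_pop = sum(pop for pop, _ in manzanas)  # mass count: never suppressed
true_50_plus = sum(n for _, n in manzanas)
published_50_plus = sum(published(n) for _, n in manzanas)

true_pct = 100 * true_50_plus / total_pop
published_pct = 100 * published_50_plus / total_pop
print(f"true: {true_pct:.1f}%  published: {published_pct:.1f}%")
# true: 12.5%  published: 9.7%
```

Three of the five blocks each lose one or two persons, and together those losses drop the computed share by almost 3 percentage points.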
AGEBs are subject to the same privacy protection mechanism, but dramatically fewer of them trigger it: because an AGEB groups many blocks, its indicator counts are far less likely to fall below the threshold. The mechanism affects at least one variable in 98% of manzanas, yet only 32% of AGEBs.
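The sketch below illustrates why aggregation helps, using hypothetical raw counts for five manzanas in two made-up AGEBs. As an assumption for illustration, a unit counts as "affected" when any indicator holds 1 or 2 persons, that is, when suppression would replace a real count with a false zero. The toy numbers show only the direction of the effect; they are not meant to reproduce the 98% and 32% figures above.

```python
# Toy sketch (hypothetical data): coarser units trip the threshold less often.
THRESHOLD = 3
INDICATORS = ["age_50_plus", "indigenous", "age_0_14"]

# Raw, pre-suppression indicator counts per manzana, keyed by a made-up AGEB id.
manzanas = [
    {"ageb": "A", "age_50_plus": 2, "indigenous": 0, "age_0_14": 6},
    {"ageb": "A", "age_50_plus": 7, "indigenous": 1, "age_0_14": 9},
    {"ageb": "A", "age_50_plus": 4, "indigenous": 2, "age_0_14": 3},
    {"ageb": "B", "age_50_plus": 5, "indigenous": 0, "age_0_14": 8},
    {"ageb": "B", "age_50_plus": 1, "indigenous": 1, "age_0_14": 2},
]


def affected(unit: dict) -> bool:
    """True if suppression would turn at least one real count into a false zero."""
    return any(0 < unit[k] < THRESHOLD for k in INDICATORS)


def share_affected(units: list) -> float:
    return sum(affected(u) for u in units) / len(units)


# Roll manzanas up to AGEBs by summing raw counts before any suppression.
agebs = {}
for m in manzanas:
    totals = agebs.setdefault(m["ageb"], {k: 0 for k in INDICATORS})
    for k in INDICATORS:
        totals[k] += m[k]

print(f"manzanas affected: {share_affected(manzanas):.0%}")
print(f"AGEBs affected:    {share_affected(list(agebs.values())):.0%}")
# manzanas affected: 80%
# AGEBs affected:    50%
```

In AGEB "A", three small indigenous counts (0, 1, 2) add up to 3 and clear the threshold, while AGEB "B" still has a single indigenous person and remains affected.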
While manzanas are a really interesting, fine-grained level of geography for demographic analysis, they come with too much baggage. Their small size may give users a false sense of security that the data will be more accurate, but for any numbers besides the mass counts, they leave information off the table. While these challenges have led GeoAnalitica to discontinue manzana-level demographic data as a standard product, we do realize that every client has unique needs and processes. We continue to work with manzana data to create custom, tailored products to serve specific client needs. If you’re thinking about manzana-level data, or are curious how AGEB data stacks up, we’d love to have that conversation with you.