A Note on the Ecological Fallacy

The ecological fallacy is an example of the effect of spurious correlation. It is called "ecological" not because it has anything to do with ecology or the environment, but because it has to do with analyzing data for areas, groups, or aggregates. Typically, aggregates of data (such as counties) show some relationship between the average value of one variable and the average value of another; at the same time, the relationship between the individual values of those variables may be quite different. The fallacy lies in assuming that the relationship seen in the aggregates also holds for the individuals.

Here is a hypothetical example. Suppose we had a survey of ethnic restaurants in the Bay Area that scored them on the quality of their food. Suppose we analyzed the relationship between the quality of food in ethnic restaurants and the level of personal income by zip code. It is not unreasonable to expect that the better ethnic restaurants will be in low-income areas; for example, it is not easy to buy soul food in Piedmont or Atherton. This means that the zip code data would show a negative association between income and quality of food.

On the other hand, it is not unreasonable to expect that customers with higher incomes will pay more for better ethnic food. So, if you looked at this information at the level of the individual customer and the individual restaurant, you would see that there was a positive association between level of income and quality of food, even if people only eat out in their own zip code area.

Here is a hypothetical example of the data at the level of individual restaurants, each with a food score, a zip code location, and the average income of the customers who eat in that restaurant.

Zip code   Food Score   Income
99991      20           48
99991      21           49
99991      22           50
99991      23           51
99991      24           52
99992      25           43
99992      26           44
99992      27           45
99992      28           46
99992      29           47
99993      30           38
99993      31           39
99993      32           40
99993      33           41
99993      34           42
99994      35           33
99994      36           34
99994      37           35
99994      38           36
99994      39           37
99995      40           28
99995      41           29
99995      42           30
99995      43           31
99995      44           32
99996      45           23
99996      46           24
99996      47           25
99996      48           26
99996      49           27

If this information were available in a typical census, it would be reported as averages by zip code and would look like this:

Zip code   Mean Food Score   Mean Income
99991      22                50
99992      27                45
99993      32                40
99994      37                35
99995      42                30
99996      47                25
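
The aggregation step can be sketched in a few lines of Python. This is only an illustration, assuming the thirty restaurant rows follow the regular pattern of the individual-level table; the names `restaurants` and `zip_means` are hypothetical and not part of the original note.

```python
from collections import defaultdict
from statistics import mean

# Rebuild the 30 hypothetical restaurant rows:
# six zip codes (99991..99996), five restaurants in each.
restaurants = []
for k in range(6):
    zip_code = 99991 + k
    for j in range(5):
        food_score = 20 + 5 * k + j   # 20..49 across all restaurants
        income = 48 - 5 * k + j       # rises within a zip, falls across zips
        restaurants.append((zip_code, food_score, income))


def zip_means(rows):
    """Return {zip_code: (mean food score, mean income)}."""
    groups = defaultdict(list)
    for zip_code, food, income in rows:
        groups[zip_code].append((food, income))
    return {
        z: (mean(f for f, _ in vals), mean(i for _, i in vals))
        for z, vals in sorted(groups.items())
    }


# Print one aggregated row per zip code, as a census-style summary would.
for z, (food, income) in zip_means(restaurants).items():
    print(z, food, income)
```

Each printed row is the mean of the five restaurants in that zip code, which is all that a reader of the aggregated table ever sees.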

If you graphed the information at the zip code level, you would see the mean food score falling steadily as mean income rises: the higher the income, the worse the food.

On the other hand, if you graphed the information at the level of the individual restaurant, you would see six short upward-sloping runs of points, one for each zip code, with each run sitting lower on the income axis than the one before it.

It is clear from this picture that food quality and customer income rise together within a zip code, that this same relationship holds in every zip code, and that the decline across zip code areas is just a function of the differences between them as aggregates.
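
The two readings can also be checked numerically. The sketch below computes the Pearson correlation between food score and income within each zip code, and then between the six zip code means. It assumes the data follow the regular pattern of the tables above; `pearson` and `by_zip` are names introduced here for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# by_zip[zip_code] = (food scores, customer incomes) for its five restaurants
by_zip = {
    99991 + k: (
        [20 + 5 * k + j for j in range(5)],
        [48 - 5 * k + j for j in range(5)],
    )
    for k in range(6)
}

# Individual level: correlation inside each zip code.
within = {z: pearson(food, inc) for z, (food, inc) in by_zip.items()}

# Aggregate level: correlation between the zip code means.
mean_food = [mean(food) for food, _ in by_zip.values()]
mean_income = [mean(inc) for _, inc in by_zip.values()]
between = pearson(mean_food, mean_income)

for z, r in within.items():
    print(z, round(r, 3))
print("zip code means:", round(between, 3))
```

With these hypothetical numbers the correlation is +1 inside every zip code and -1 between the zip code means, so the sign of the relationship depends entirely on the level at which the data are analyzed.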

It would be false to claim that people spend more money for worse food. They spend more money for better food, but it depends on where they eat.
