SPURIOUS CORRELATION AND ITS DETECTION

## STORKS BRING BABIES

```
Ratios are used in many fields to adjust or normalize
one measure for another in order to make comparisons or
rankings.  In economics, national indices for wealth are
formed by the ratio of wealth to population size, examples
being per capita income and gross national product per
capita.  In nutrition, the weight of people relative to their
frame size is captured as body mass index (weight in kg /
height in meters squared).

Regression analysis is the standard way to adjust one
measure for another.  Any observations falling on the
regression line are thought of as being equal relative to the
x variable.  Observations above the line are relatively large
and ones below the line are relatively small.  In regression
analysis, a constant is included to estimate the value of the
response variable when the covariate is at zero.  A ratio is
a special case of regression, equivalent to fitting without a
constant or forcing the constant to be zero.

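If Y = B0 + B1 X, then dividing both sides by X gives

Z = Y/X = B1 + B0/X

Z equals the slope B1 at every X only when B0 = 0.  So
analyzing Z is identical to regression analysis where B0 = 0
by specification, i.e., by omitting the constant.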
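
# A minimal sketch (not part of the original notes) of why the ratio
# Z = Y/X behaves like a regression slope only when B0 = 0.  The
# values B0 = 10 and B1 = 2 and the name ratio_of_line are made up
# for illustration.

def ratio_of_line(x, b0, b1):
    """Return Z = Y/X for a point lying exactly on Y = b0 + b1*x."""
    return (b0 + b1 * x) / x

xs = [1.0, 2.0, 5.0, 10.0]

# With a nonzero intercept, points on the same line get different
# ratios, so the ratio still depends on X.
with_intercept = [ratio_of_line(x, 10.0, 2.0) for x in xs]

# With a zero intercept, the ratio equals the slope at every X.
without_intercept = [ratio_of_line(x, 0.0, 2.0) for x in xs]

print(with_intercept)     # [12.0, 7.0, 4.0, 3.0]
print(without_intercept)  # [2.0, 2.0, 2.0, 2.0]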

In human biology, physiology, and nutrition, there is
increasing awareness that the use of such ratios can lead to
spurious results.  A workshop at the April 1996 Experimental
Biology meetings was devoted to these concerns, which stem
from the implicit assumption that the relationship between
the numerator and denominator of a ratio is a straight line
with an intercept of zero.  Recent studies have demonstrated
that these linear and zero-intercept assumptions are often
not met, with the consequence that proper adjustment for the
denominator is not achieved.

It has also been shown that two ratios sharing a denominator
can appear to be related even when their numerators are
completely independent.
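
# A minimal simulation sketch (an illustration, not taken from the
# studies mentioned above).  It assumes the two ratios share one
# denominator w: u and v are drawn independently, yet u/w and v/w
# come out strongly correlated.  The distributions, sample size,
# seed, and the name pearson_r are arbitrary choices made here.
import math
import random

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

rng = random.Random(0)
n = 2000
u = [rng.uniform(90, 110) for _ in range(n)]  # first numerator
v = [rng.uniform(90, 110) for _ in range(n)]  # second numerator, independent of u
w = [rng.uniform(1, 10) for _ in range(n)]    # shared denominator

r_raw = pearson_r(u, v)
r_ratio = pearson_r([a / c for a, c in zip(u, w)],
                    [b / c for b, c in zip(v, w)])

print("r(u, v)     =", round(r_raw, 3))    # near zero
print("r(u/w, v/w) =", round(r_ratio, 3))  # close to 1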

The use of ratios as response variables in regression
should be avoided if possible in favor of adjusting for the
denominator measure by including it as a covariate in the
regression.  If ratios are used, one simple way to mitigate
these concerns and to ensure that complete adjustment has
been made is to include the denominator of the ratios used as
a covariate.

Jerzy Neyman used the following example to "prove" that
storks bring babies.

County       Storks (u)           Babies born (v)    Women in 10,000s (w)

County 1          2                     10                   1
County 2          2                     15                   1
County 3          2                     20                   1
County 4          3                     10                   1
County 5          3                     15                   1
County 6          3                     20                   1
County 7          4                     10                   1
County 8          4                     15                   1
County 9          4                     20                   1
County 10         4                     15                   2
County 11         4                     20                   2
County 12         4                     25                   2
County 13         5                     15                   2
County 14         5                     20                   2
County 15         5                     25                   2
County 16         6                     15                   2
County 17         6                     20                   2
County 18         6                     25                   2
County 19         5                     20                   3
County 20         5                     25                   3
County 21         5                     30                   3
County 22         6                     20                   3
County 23         6                     25                   3
County 24         6                     30                   3
County 25         7                     20                   3
County 26         7                     25                   3
County 27         7                     30                   3
County 28         6                     25                   4
County 29         6                     30                   4
County 30         6                     35                   4
County 31         7                     25                   4
County 32         7                     30                   4
County 33         7                     35                   4
County 34         8                     25                   4
County 35         8                     30                   4
County 36         8                     35                   4
County 37         7                     30                   5
County 38         7                     35                   5
County 39         7                     40                   5
County 40         8                     30                   5
County 41         8                     35                   5
County 42         8                     40                   5
County 43         9                     30                   5
County 44         9                     35                   5
County 45         9                     40                   5
County 46         8                     35                   6
County 47         8                     40                   6
County 48         8                     45                   6
County 49         9                     35                   6
County 50         9                     40                   6
County 51         9                     45                   6
County 52        10                     35                   6
County 53        10                     40                   6
County 54        10                     45                   6

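From these data we can compute the ratios

x = u/w        storks per 10,000 women
y = v/w        babies per 10,000 women (the birth rate)

and find a statistically significant relationship between x
and y, even though, at any given number of women, the number
of babies does not depend on the number of storks.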

Density of storks    Number of    Average       Class
per 10,000 women     counties     birth rate    average
--------------------------------------------------------
      1.33               3           6.67         7.12
      1.40               3           7.00
      1.50               6           7.08
      1.60               3           7.00
      1.67               6           7.50
--------------------------------------------------------
      1.75               3           7.50         9.22
      1.80               3           7.00
      2.00              12          10.21
--------------------------------------------------------
      2.33               3           8.33        11.67
      2.50               3          10.00
      3.00               6          12.50
      4.00               3          15.00
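
# A sketch that recomputes the ratio analysis above (an illustration,
# not Neyman's own calculation).  It assumes the regular layout that
# the summary table implies: for each level of women w (in 10,000s),
# three stork counts crossed with three birth counts.  The class
# boundaries (1.7 and 2.0) are read off the dashed lines in the
# table, and all helper names are made up here.
import math

def stork_counties():
    """Return (u, v, w) = (storks, babies, women) for the 54 counties."""
    first_storks = {1: 2, 2: 4, 3: 5, 4: 6, 5: 7, 6: 8}
    rows = []
    for w in range(1, 7):
        for u in range(first_storks[w], first_storks[w] + 3):
            for v in range(5 * w + 5, 5 * w + 20, 5):
                rows.append((u, v, w))
    return rows

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def class_of(density):
    """Class 1: below 1.7; class 2: 1.7 through 2.0; class 3: above 2.0."""
    if density < 1.7:
        return 1
    if density <= 2.0:
        return 2
    return 3

rows = stork_counties()
x = [u / w for u, v, w in rows]  # storks per 10,000 women
y = [v / w for u, v, w in rows]  # babies per 10,000 women
r = pearson_r(x, y)

class_rates = {1: [], 2: [], 3: []}
for density, rate in zip(x, y):
    class_rates[class_of(density)].append(rate)
class_means = {k: round(sum(vals) / len(vals), 2)
               for k, vals in class_rates.items()}

print(len(rows), round(r, 2))  # 54 counties, r of roughly 0.6
print(class_means)             # {1: 7.12, 2: 9.22, 3: 11.67}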

The lesson is that one should analyze the original measures,
with the denominator (here, the number of women) included as
a covariate, rather than the ratios.
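
# A sketch of the analysis on the original measures (an illustration,
# not Neyman's own calculation): babies are regressed on storks with
# women included as a covariate.  It assumes the regular layout that
# the summary table implies (three stork counts crossed with three
# birth counts at each level of women).  The helper names
# stork_counties, solve, and ols are made up here.

def stork_counties():
    """Return (u, v, w) = (storks, babies, women) for the 54 counties."""
    first_storks = {1: 2, 2: 4, 3: 5, 4: 6, 5: 7, 6: 8}
    rows = []
    for w in range(1, 7):
        for u in range(first_storks[w], first_storks[w] + 3):
            for v in range(5 * w + 5, 5 * w + 20, 5):
                rows.append((u, v, w))
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(columns, response):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], response))
           for i in range(k)]
    return solve(xtx, xty)

rows = stork_counties()
ones = [1.0] * len(rows)
storks = [float(u) for u, v, w in rows]
babies = [float(v) for u, v, w in rows]
women = [float(w) for u, v, w in rows]

# babies = intercept + b_storks * storks + b_women * women
intercept, b_storks, b_women = ols([ones, storks, women], babies)

# Under the assumed layout the stork coefficient is zero up to
# rounding error: once women are adjusted for, storks add nothing.
print(intercept, b_storks, b_women)  # about 10, 0, and 5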