SPURIOUS CORRELATION AND ITS DETECTION

SPURIOUS CORRELATION

Pearson's correlation procedure sometimes yields a coefficient that indicates a relationship where none really exists. For example, there is a fairly strong positive correlation between the number of fire trucks at a fire scene and the amount of fire damage: the more fire trucks at the scene, the more damage is done. Are the fire trucks causing the damage? No! The correlation is false and accidental. Statisticians and users of statistics refer to this type of accidental association as a SPURIOUS CORRELATION.

Applying Pearson's correlation coefficient to time series is a common pitfall. If the two series are normally distributed and free of autocorrelation, then such simple procedures are appropriate. Independent observations of this kind, however, are typically found in cross-sectional data and not in time series data. When the series are autocorrelated, you must instead use the transfer function methods detailed by Box and Jenkins: build a prewhitening filter for the input series and use it to assess the conditional impact of one series on the other.

Spurious correlation is normally due to extraneous variables that are associated with both the independent and the dependent variable under study. In the fire damage example the extraneous variable is fire intensity. The intensity of the fire is positively related to the number of fire trucks at the scene and positively related to the amount of damage at the scene. The result is the statistical appearance that fire trucks and fire damage are directly related. They are related, but only by accident (or spuriously).
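The time series pitfall described above can be illustrated with a small simulation. What follows is a minimal sketch, not the Box-Jenkins transfer function procedure itself. It generates pairs of independent random walks, which by construction have no relationship to each other, and compares the Pearson coefficient of the raw levels with the coefficient of the first differences. First differencing is used here only as a simple stand-in for prewhitening; for a pure random walk, differencing happens to be the filter that turns the series into white noise. The function names, series length, number of trials and seed are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def random_walk(n, rng):
    """Cumulative sum of independent standard normal shocks."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


def diff(series):
    """First differences: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def mean_abs_correlations(trials=200, n=200, seed=1):
    """Average |r| over many pairs of INDEPENDENT random walks,
    computed on the levels and on the first differences."""
    rng = random.Random(seed)
    levels, diffs = [], []
    for _ in range(trials):
        x, y = random_walk(n, rng), random_walk(n, rng)
        levels.append(abs(pearson(x, y)))
        diffs.append(abs(pearson(diff(x), diff(y))))
    return sum(levels) / trials, sum(diffs) / trials


mean_abs_levels, mean_abs_diffs = mean_abs_correlations()
print(f"mean |r| on levels:            {mean_abs_levels:.2f}")
print(f"mean |r| on first differences: {mean_abs_diffs:.2f}")
```

Although the two series in each pair are unrelated, the coefficients computed on the autocorrelated levels are typically far from zero, while those computed on the differenced (whitened) series stay close to zero.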

Statistics on the amount of damage caused by house fires show that the larger the number of firefighters attending the scene, the worse the damage!

This is an example of a spurious association produced by a confounding (or lurking) variable. The apparent association is due to the omission of some important information. In the example of house fires, the size of the fire needs to be taken into account: more firefighters are sent to larger fires, and the larger the fire, the worse the damage.
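One way to take the size of the fire into account is a partial correlation, which measures the association between two variables after the linear effect of a third has been removed. The sketch below uses simulated data in which fire intensity drives both the number of trucks and the damage, and the trucks have no effect of their own. All numbers, names and the seed are invented for illustration and are not real fire statistics.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)
n = 2000

# Hypothetical data: intensity is the extraneous variable.
intensity = [rng.uniform(1, 10) for _ in range(n)]
# Trucks and damage each depend on intensity plus independent noise;
# trucks never appear in the formula for damage.
trucks = [i + rng.gauss(0, 1) for i in intensity]
damage = [10 * i + rng.gauss(0, 10) for i in intensity]

raw_r = pearson(trucks, damage)
partial_r = partial_corr(trucks, damage, intensity)
print(f"r(trucks, damage)             = {raw_r:.2f}")
print(f"r(trucks, damage | intensity) = {partial_r:.2f}")
```

The raw coefficient is strongly positive, while the partial coefficient is close to zero, which is what one would expect when the whole association runs through the omitted variable.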

This is also related to Simpson's paradox if you consider fire size as categorical (e.g. small, medium, large). Overall, more firefighters (seem to) imply more damage; within each category of fire, however, more firefighters imply less damage. The relationship in every subgroup is the opposite of the relationship in the group taken as a whole.
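The reversal can be reproduced with simulated data. In the sketch below, larger fires receive more firefighters and start from a higher baseline of damage, while within each size category every additional firefighter reduces the damage. The category means, slopes, noise levels and seed are all hypothetical values chosen to make the effect visible.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical categories: (typical firefighters sent, baseline damage).
CATEGORIES = {"small": (5, 20), "medium": (15, 80), "large": (30, 200)}

rng = random.Random(7)
within_r = {}
all_firefighters, all_damage = [], []

for name, (mean_ff, base_damage) in CATEGORIES.items():
    firefighters, damage = [], []
    for _ in range(300):
        ff = mean_ff + rng.gauss(0, 2)
        # Within a category, more firefighters means LESS damage.
        dmg = base_damage - 2 * (ff - mean_ff) + rng.gauss(0, 3)
        firefighters.append(ff)
        damage.append(dmg)
    within_r[name] = pearson(firefighters, damage)
    all_firefighters += firefighters
    all_damage += damage

overall_r = pearson(all_firefighters, all_damage)

print(f"overall r = {overall_r:.2f}")
for name, r in within_r.items():
    print(f"r within {name} fires = {r:.2f}")
```

The pooled coefficient is positive because the differences between categories dominate, yet the coefficient inside every category is negative.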

In many situations, the explanation for an apparent association cannot be identified easily. One example is the association between smoking and lung cancer. It has been argued that the apparent association between the two may be due to some genetic factor that predisposes people both to nicotine addiction and to lung cancer. If this were true, then smoking could not be blamed for causing cancer. Only after considerable research, aided by statistical methods, has it become generally accepted that smoking is a contributory cause of lung cancer.
