Frequently Asked Statistical Questions (INTERVENTION/OUTLIERS)

QUESTION:
Give me the formal presentation of the IMPACT of outliers.

ANSWER:
Outliers and structural changes are commonly encountered in time series data
analysis. Such extraordinary events can mislead, and have misled, conventional
time series analysts into erroneous conclusions. Their impact is often
overlooked, however, for lack of a simple yet effective means of incorporating
these isolated events into the model. Several approaches have been considered
in the literature for handling outliers in a time series. We will first
illustrate the effect of unknown events, which cause simple model
identification to go awry. We will then illustrate what to do when one knows
a priori the date and nature of the isolated event. We will also point out a
major flaw that arises when one assumes an incorrect model specification.
Then we introduce the notion of finding the intervention variables through a
sequence of alternative regression models, yielding maximum likelihood
estimates of both the form and the effect of the isolated event.

Standard identification of ARIMA models uses the sample ACF as one of
the two vehicles for model identification. The ACF is computed using the
covariance and the variance. An outlier distorts both of these and in effect
dampens the ACF, since the variance in the denominator is typically inflated
far more than the covariance in the numerator. Outliers can also distort the
sample ACF and PACF by introducing spurious structure or correlations. For
example, consider the circumstance where the outlier dampens the ACF: a single
extreme value adds roughly its squared size to the variance but much less to
the covariance at any lag, so every autocorrelation shrinks toward zero. Thus
the net effect is to conclude that the ACF is flat, and the resulting
conclusion is that no information from the past is useful. These are the
results of incorrectly using statistics without validating the parametric
requirements. It is necessary to check that no isolated event has inflated
either of these measures, leading to an "Alice in Wonderland" conclusion.
Various researchers have concluded that the history of stock market prices is
information-less. Perhaps the conclusion should have been that the analysts
were statistic-less.

Another way to understand this is to derive the estimator of the coefficient
from a simple model and to evaluate the effect of a distortion. Consider the
true model as an AR(1) with the
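following familiar form:

    Y(t) = PHI1*Y(t-1) + A(t)
or
    (1 - PHI1*B) Y(t) = A(t)
or
    Y(t) = A(t) / (1 - PHI1*B)
or
    Y(t) = A(t) + PHI1*A(t-1) + PHI1*PHI1*A(t-2) + ...

where B is the backshift operator and A(t) is the random shock. The variance
of Y can be derived as:

    variance(Y) = PHI1*PHI1*variance(Y) + variance(A)
thus
    PHI1*PHI1 = 1 - variance(A)/variance(Y)

Writing the autoregressive operator as P(B), which is 1 - PHI1*B for the
simple AR(1) case, we have:

    Y(t) = A(t) / P(B) = A(t) / (1 - PHI1*B)

Now, if the true state of nature is one where an intervention of form I(t)
occurs at time period t with a magnitude of W, we have:

    Y(t) = W*I(t) + A(t) / (1 - PHI1*B)
with
    observed variance(Y) = true variance(Y) + distortion
thus
    estimated PHI1*PHI1 = 1 - [variance(A) + squared bias] / [true variance(Y) + distortion]

The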
inaccuracy or bias due to the intervention is not easily predicted, because
of the complex nature of the relationship. At one extreme, the addition of the
squared bias to variance(A) would increase the numerator, driving the ratio
to 1 and the estimate of PHI1 to zero. The rate at which this happens depends
on the relative size of the variances and on the magnitude and duration of the
isolated event. Thus the presence of an outlier could hide the true model.
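
The dampening described above can be checked numerically. The following is a
minimal sketch, not part of the original derivation: it assumes an AR(1)
series with PHI1 = 0.7, unit-variance normal shocks, 200 observations, and a
single additive outlier of magnitude W = 25 at one time point. The helper
names sample_acf and simulate_ar1, the seed, and all parameter values are
illustrative assumptions. The lag-1 sample autocorrelation is used here as
the usual moment estimate of PHI1.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (autocovariance / variance)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    c0 = np.dot(d, d)  # n * sample variance
    return np.array([np.dot(d[:-k], d[k:]) / c0 for k in range(1, max_lag + 1)])

def simulate_ar1(phi, n, rng):
    """Generate Y(t) = phi*Y(t-1) + A(t) with standard normal shocks A(t)."""
    a = rng.standard_normal(n)
    y = np.empty(n)
    y[0] = a[0]
    for t in range(1, n):
        y[t] = phi * y[t - 1] + a[t]
    return y

rng = np.random.default_rng(0)          # fixed seed, assumed for repeatability
clean = simulate_ar1(phi=0.7, n=200, rng=rng)

# Additive outlier: a one-period pulse of magnitude W = 25 (assumed value).
contaminated = clean.copy()
contaminated[100] += 25.0

acf_clean = sample_acf(clean, 5)
acf_outlier = sample_acf(contaminated, 5)

print("lag-1 autocorrelation, clean series:       ", round(acf_clean[0], 3))
print("lag-1 autocorrelation, with single outlier:", round(acf_outlier[0], 3))
```

Under these assumptions the single pulse adds about W*W to the sum of squares
in the denominator but far less to the lag-1 cross products, so the estimate
computed from the contaminated series falls well below the one from the clean
series. Larger values of W push it closer to zero.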
Now consider another option where variance(Y) is large relative to
variance(A). The effect of the bias is then to drive the ratio to zero and the
estimate of PHI1 to unity. A shift in the mean, for example, would generate an
ACF that dies out only slowly, thus leading to a misidentified
first-difference model. In conclusion, the effects of the outlier depend on
the true state of nature. It can both incorrectly hide the true model form and
incorrectly generate evidence of a bogus model.
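
The opposite case, where an isolated event manufactures evidence for a bogus
model, can be sketched the same way. This example is an illustration under
stated assumptions rather than a definitive treatment: a weakly autocorrelated
AR(1) series (PHI1 = 0.2, unit-variance normal shocks, 200 observations)
receives a permanent level shift of W = 5 at its midpoint. The helper name
sample_acf, the seed, and the parameter values are hypothetical choices made
for the demonstration.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1..r_max_lag (autocovariance / variance)."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    c0 = np.dot(d, d)  # n * sample variance
    return np.array([np.dot(d[:-k], d[k:]) / c0 for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)          # fixed seed, assumed for repeatability
n, phi = 200, 0.2                       # assumed length and true PHI1

# Stationary AR(1): Y(t) = phi*Y(t-1) + A(t)
shocks = rng.standard_normal(n)
stationary = np.empty(n)
stationary[0] = shocks[0]
for t in range(1, n):
    stationary[t] = phi * stationary[t - 1] + shocks[t]

# Level shift: step intervention I(t) = 1 from the midpoint on, magnitude W = 5.
step = np.zeros(n)
step[n // 2:] = 5.0
shifted = stationary + step

acf_stationary = sample_acf(stationary, 10)
acf_shifted = sample_acf(shifted, 10)

print("lags 1 and 10, stationary series:", np.round(acf_stationary[[0, 9]], 3))
print("lags 1 and 10, with level shift: ", np.round(acf_shifted[[0, 9]], 3))
```

With the shift in place, the sample autocorrelations stay large out to lag 10
even though the underlying process has little memory. This is the slowly
decaying pattern that would tempt an analyst to take a first difference, when
the appropriate remedy is a step intervention variable.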