AN UNUSUAL VALUE
- Consider a case where the true but unknown model is:

    Y(t) = v0*X(t) + v1*I(t=t0) + A(t)

where A(t) is an i.i.d. (Gaussian) error term and I(t=t0) is an
Intervention Variable such that

    I(t) = 0 for all t = 1,...,n where t <> t0
    I(t) = 1 for t = t0

and where the current tentative, albeit incorrect, model is:

    Y(t) = v0*X(t) + e(t)

Thus we have a case where

    e(t) = v1*I(t=t0) + A(t)

It is clear that if we correlate e(t) and I(t=t0) we will identify a
statistically significant relationship, which will then lead to the
required augmentation strategy.
The strategy is then clear: for all possible or candidate "new
variables", such as I(t=1), I(t=2), ..., I(t=n), compute partial
correlations, which measure the expected impact of each trial candidate.
Select the candidate with the maximum absolute correlation. This is
equivalent to selecting the point in time where the largest absolute
residual occurred.
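
The search over candidate pulse variables can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the procedure of any particular package: the data are simulated, the values chosen for v0, v1 and t0 are arbitrary, the function name pulse_correlations is hypothetical, and a simple correlation between the residuals and each pulse stands in for the partial correlation described above.

```python
import numpy as np

def pulse_correlations(residuals):
    """Correlate the residual series with each candidate pulse I(t=k)."""
    e = np.asarray(residuals, dtype=float)
    n = e.size
    corrs = np.empty(n)
    for k in range(n):
        pulse = np.zeros(n)
        pulse[k] = 1.0          # I(t) = 1 at t = k, 0 elsewhere
        corrs[k] = np.corrcoef(e, pulse)[0, 1]
    return corrs

# Simulated data (assumed values): true model has a pulse at t0.
rng = np.random.default_rng(0)
n, t0 = 50, 30                  # t0 is a 0-based index here
v0, v1 = 2.0, 8.0
x = rng.normal(size=n)
a = rng.normal(scale=0.5, size=n)
pulse_true = np.zeros(n)
pulse_true[t0] = 1.0
y = v0 * x + v1 * pulse_true + a

# Fit the tentative (incorrect) model Y(t) = v0*X(t) + e(t) by least squares.
v0_hat = (x @ y) / (x @ x)
e = y - v0_hat * x

# Pick the candidate pulse with the largest absolute correlation.
corrs = pulse_correlations(e)
best = int(np.argmax(np.abs(corrs)))
print("selected time index:", best)
```

Because each pulse has the same variance, its correlation with e(t) is proportional to the deviation of e(t) from the mean residual at that point, so the selected index coincides with the largest absolute (mean-centered) residual.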
These approaches to developing new variables from model implied series have
had outstanding success with time-series data.
UNUSUAL VALUES
Unless you deal with anomalies, e.g. unusual values, the estimates of
coefficients can be seriously biased. In this example, we illustrate the
effect of one unusual value. Outlier detection requires an estimate of the
standard deviation. This estimate in turn requires the predicted values,
since it is computed from the residuals (observed minus predicted).
value. See for more details on the
standard deviation.
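
A small simulation can make the bias concrete. The sketch below is an assumption-laden illustration rather than the example referred to in the text: the data are simulated, the true slope of 3.0 and the outlier size of 60.0 are arbitrary, and the helper names fit_slope and residual_sd are hypothetical. It fits a no-intercept regression with and without a single unusual value and compares the slope and the residual standard deviation.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope for the no-intercept model y = v0*x + e."""
    return (x @ y) / (x @ x)

def residual_sd(x, y, slope):
    """Residual standard deviation, using predicted values slope*x."""
    resid = y - slope * x
    return np.sqrt(resid @ resid / (len(y) - 1))  # one estimated coefficient

# Simulated clean data (assumed values).
rng = np.random.default_rng(1)
n = 40
x = np.linspace(1.0, 10.0, n)
y_clean = 3.0 * x + rng.normal(scale=1.0, size=n)

# Same data with one unusual value added at the last observation.
y_outlier = y_clean.copy()
y_outlier[-1] += 60.0

slope_clean = fit_slope(x, y_clean)
slope_out = fit_slope(x, y_outlier)
sd_clean = residual_sd(x, y_clean, slope_clean)
sd_out = residual_sd(x, y_outlier, slope_out)

print("slope without / with unusual value:", slope_clean, slope_out)
print("residual sd without / with unusual value:", sd_clean, sd_out)
```

The single unusual value shifts the slope by exactly 60.0*x[-1]/sum(x**2) and inflates the residual standard deviation, which in turn makes the unusual value itself look less extreme when judged against that inflated estimate.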