outlier detection

AN UNUSUAL VALUE

Consider a case where the true but unknown model is

Y(t) = v0*X(t) + v1*I(t=t0) + A(t)

where A(t) is an i.i.d. Gaussian error term.

I(t=t0) is an intervention variable such that

I(t) = 0 for all t = 1,...,n where t <> t0
I(t) = 1 for t = t0

and where the current tentative, albeit incorrect, model is

Y(t) = v0*X(t) + e(t)

Thus we have a case where

e(t) = v1*I(t=t0) + A(t)

It is clear that if we correlate e(t) and I(t=t0) we will identify a statistically significant relationship (provided v1 is large relative to the noise in A(t)), which will then lead to the required augmentation strategy.

The strategy is then clear: for all candidate "new variables", such as I(t=1), I(t=2), ..., I(t=n), compute partial correlations, which measure the expected impact of each trial candidate. Select the candidate with the maximum absolute correlation. When the candidates are these single-point indicators, this is equivalent to selecting the point in time where the largest absolute residual occurred.
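The search over candidates can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular package: the data are simulated, the sample size, coefficients, and seed are hypothetical, time is indexed from 0 rather than 1, and a plain Pearson correlation between the residuals and each single-point indicator stands in for the partial correlation.

```python
import numpy as np

def pulse_correlations(residuals):
    """Correlate the residual series with every candidate indicator I(t=k)."""
    n = len(residuals)
    corrs = np.empty(n)
    for k in range(n):
        pulse = np.zeros(n)
        pulse[k] = 1.0                      # I(t) = 1 at t = k, 0 elsewhere
        corrs[k] = np.corrcoef(residuals, pulse)[0, 1]
    return corrs

# Simulated data; every number below is a hypothetical choice.
rng = np.random.default_rng(0)
n, t0 = 60, 25                              # sample size, true intervention time (0-based)
v0, v1 = 2.0, 20.0                          # true coefficients
x = rng.normal(size=n)
a = rng.normal(size=n)                      # i.i.d. Gaussian error A(t)
intervention = np.zeros(n)
intervention[t0] = 1.0
y = v0 * x + v1 * intervention + a          # true but "unknown" model

# Tentative (incorrect) model Y(t) = v0*X(t) + e(t), fitted without I(t=t0).
v0_hat = (x @ y) / (x @ x)                  # least squares, no intercept
e = y - v0_hat * x

corrs = pulse_correlations(e)
best = int(np.argmax(np.abs(corrs)))        # candidate with maximum |correlation|
largest = int(np.argmax(np.abs(e - e.mean())))

print("selected time point:", best)
print("largest absolute (mean-centred) residual at:", largest)
```

Because every single-point indicator has the same variance, its correlation with e(t) is proportional to the residual at that point minus the residual mean, which is why the two printed indices agree.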

These approaches to developing new variables from model-implied series have had outstanding success with time-series data.

UNUSUAL VALUES

Unless you deal with anomalies, such as unusual values, the coefficient estimates can be seriously biased. In this example, we illustrate the effect of one unusual value. Outlier detection requires an estimate of the standard deviation of the errors, and this estimate in turn requires the predicted values, since it is computed from the residuals. See for more details on the standard deviation.
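A small simulation can make both points concrete: how a single unusual value shifts the estimated coefficient, and how the standard deviation used for detection is built from the predicted values. The sketch below is illustrative only. The data, the size of the unusual value, the seed, and the 3-standard-deviation cutoff are all assumptions made for the example, and the model is the no-intercept form Y(t) = v0*X(t) + e(t) used earlier.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares estimate of v0 in Y(t) = v0*X(t) + e(t) (no intercept)."""
    return (x @ y) / (x @ x)

# Simulated data; every number below is a hypothetical choice.
rng = np.random.default_rng(1)
n = 40
v0 = 2.0
x = np.linspace(1.0, 10.0, n)
y_clean = v0 * x + rng.normal(size=n)

y = y_clean.copy()
y[-1] += 30.0                               # one unusual value at the last time point

v0_clean = fit_slope(x, y_clean)
v0_contaminated = fit_slope(x, y)           # pulled upward by the unusual value

# The standard deviation estimate is computed from the residuals,
# and therefore from the predicted values.
predicted = v0_contaminated * x
residuals = y - predicted
s = np.sqrt(np.sum(residuals**2) / (n - 1)) # n - 1: one coefficient was estimated
z = residuals / s                           # standardized residuals
flagged = np.flatnonzero(np.abs(z) > 3.0)   # 3-SD cutoff is an assumption

print("estimate without the unusual value:", v0_clean)
print("estimate with the unusual value:   ", v0_contaminated)
print("flagged time points (0-based):     ", flagged)
```

Note that the unusual value also inflates s itself, so the standardized residual understates how extreme the point is; this is one reason the standard deviation estimate deserves care.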