DATA

QUESTION ABOUT ARIMA MODEL IDENTIFICATION:

How do you determine the order of an ARIMA model?

ANSWER:

A simple question and, as always, there are two kinds of answers: a complex answer and a simple answer.

COMPLEX ANSWER ---------------

Box and Jenkins suggested that you use the sample autocorrelation and the sample partial autocorrelation to do this. The approach explicitly assumes:

1. That the correct number of observations is being used. For example, if the model or its parameters have changed over time, then one might either model these changes or simply use the data from the most recent regime. AUTOBOX, a software package that we develop and market, does this as part of its tour de force.

2. That the variance of the errors of the underlying model is invariant or constant. This means that the variance for each subgroup of data is the same and doesn't depend on the level or the point in time. If this is violated, then one can remedy it by stabilizing the variance (for example, with a power transformation such as the logarithm). As before, aggressive software incorporates these kinds of issues.

3. That there are no deterministic patterns in the data. For example, there may be a TREND or a number of TRENDS, which should be ascribed to dummy variables of the form 1,2,3,...,t or 0,0,0,1,2,3,...,t-3, where a dead time of, say, 3 periods is appropriate. Also, one must not have any PULSES or one-time unusual values. Additionally, there should be no level or step shifts and no seasonal pulses.
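The dummy variables named in item 3 can be written down directly. The following is a minimal sketch in Python with NumPy; the function names, the series length, and the positions of the events are all hypothetical choices made for illustration, not anything prescribed by Box and Jenkins or by AUTOBOX.

```python
import numpy as np

def trend(n, dead_time=0):
    """Trend dummy 1,2,3,... preceded by `dead_time` zeros."""
    return np.maximum(np.arange(1, n + 1) - dead_time, 0)

def pulse(n, at):
    """One-time unusual value: 1 at index `at`, 0 elsewhere."""
    out = np.zeros(n, dtype=int)
    out[at] = 1
    return out

def level_shift(n, at):
    """Level (step) shift: 0 before index `at`, 1 from `at` onward."""
    return (np.arange(n) >= at).astype(int)

def seasonal_pulse(n, start, period):
    """Pulse that recurs every `period` observations from index `start`."""
    out = np.zeros(n, dtype=int)
    out[start::period] = 1
    return out

# Hypothetical example: 12 observations, events placed arbitrarily.
n = 12
X = np.column_stack([
    trend(n),                # 1,2,3,...,t
    trend(n, dead_time=3),   # 0,0,0,1,2,3,...,t-3
    pulse(n, at=5),
    level_shift(n, at=8),
    seasonal_pulse(n, start=2, period=4),
])
print(X)
```

Each column of X is one candidate deterministic component that could enter a model as a regressor.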

The reason for all of this is that if they do exist, then the sample autocorrelation and partial autocorrelation will seem to imply ARIMA structure when none is needed (FALSE POSITIVE). Also, the presence of these kinds of model components can obfuscate or hide structure. For example, a single outlier or pulse can create an "Alice in Wonderland" effect where the structure is masked by the outlier.
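The masking effect of a single pulse is easy to reproduce on simulated data. The sketch below assumes an AR(1) series with coefficient 0.7 and a deliberately extreme pulse of size 100 at the midpoint; the seed, the series length, and the pulse size are arbitrary illustrative choices, and the bound 2/sqrt(n) is the usual rough significance limit for a sample autocorrelation.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(0)
n, phi = 200, 0.7

# Simulate an AR(1) series: x[t] = phi * x[t-1] + e[t]
e = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# Contaminate a copy with one large pulse (hypothetical size and position).
x_out = x.copy()
x_out[n // 2] += 100.0

bound = 2.0 / np.sqrt(n)
acf_clean = sample_acf(x, 5)
acf_out = sample_acf(x_out, 5)

print("approx. significance bound:", round(bound, 3))
print("lag-1 acf, clean series:   ", round(acf_clean[1], 3))
print("lag-1 acf, with one pulse: ", round(acf_out[1], 3))
```

The pulse inflates the variance in the denominator of every autocorrelation, so the genuine lag-1 dependence shrinks toward zero and can look non-significant.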

Not to worry, though: good software can augment an initial model by accounting for these kinds of variables (INTERVENTION DETECTION), leading to a positive resolution and a robust identification of the order of the ARIMA model.
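As a rough illustration of why accounting for such a variable helps, the sketch below takes pure white noise, adds a level shift, and then removes the shift by ordinary least squares on a step dummy. It assumes the location of the shift is already known, which is a strong simplification: actual intervention detection has to search for the type and timing of each event, and this sketch says nothing about how AUTOBOX or any other package does that. The shift size, location, and seed are hypothetical.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(2)
n = 200

# White noise plus a level shift of 5 units starting at index 100.
step = (np.arange(n) >= 100).astype(float)
y = 5.0 * step + rng.standard_normal(n)

# Unadjusted series: the shift alone produces large autocorrelations.
acf_raw = sample_acf(y, 5)

# Regress on a constant and the step dummy (shift location assumed known).
X = np.column_stack([np.ones(n), step])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
acf_adj = sample_acf(resid, 5)

print("estimated shift:", round(coef[1], 2))
print("lag-1 acf before adjustment:", round(acf_raw[1], 3))
print("lag-1 acf after adjustment: ", round(acf_adj[1], 3))
```

Before the adjustment the autocorrelations suggest ARIMA structure that is not there; after it, the residuals behave like the white noise they were built from.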

.............

If all of these issues are resolved, then and only then can MODEL IDENTIFICATION proceed à la Box and Jenkins. They suggested that if the sample autocorrelation did not "die out" or become non-significant after a number of lags, then one should difference the series to make it stationary. Unfortunately this was a slight (gross?) oversell, as a series that has deterministic structure will also evidence a non-disappearing autocorrelation function. It is like a doctor observing a symptom (the sample autocorrelation) and inferring that the sickness or underlying problem is one thing whilst the true state of nature is something else again.
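The point about deterministic structure can be checked on a simulated series made of a straight-line trend plus white noise. In the sketch below the slope, length, and seed are arbitrary assumptions. It compares the sample autocorrelation of the raw series, of the de-trended residuals, and of the first differences; the last comparison illustrates the well-known side effect that differencing a trend-plus-noise series induces a negative lag-1 autocorrelation that was not in the original noise.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
n = 200
t = np.arange(n)

# Deterministic linear trend plus white noise (no ARIMA structure at all).
y = 0.05 * t + rng.standard_normal(n)

# 1) Raw series: the acf does not die out.
acf_raw = sample_acf(y, 10)

# 2) De-trended series: fit and remove a straight line.
slope, intercept = np.polyfit(t, y, 1)
acf_detrended = sample_acf(y - (intercept + slope * t), 10)

# 3) Differenced series: stationary, but with induced lag-1 correlation.
acf_diff = sample_acf(np.diff(y), 10)

print("lag-10 acf, raw:        ", round(acf_raw[10], 3))
print("lag-10 acf, de-trended: ", round(acf_detrended[10], 3))
print("lag-1 acf, differenced: ", round(acf_diff[1], 3))
```

Both remedies remove the slowly decaying autocorrelation, but only de-trending leaves the noise as it was, which is why the symptom alone does not tell you which remedy is appropriate.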

Continuing, let's assume that the series has been made stationary, either by differencing or by de-trending (multiple trends?), and is now ready to be analyzed in order to assess the order of the ARMA structure.

IFF (if and ONLY IF) the underlying model is a PURE AR or a PURE MA

THEN

If the sample autocorrelation function (acf) has MORE significant lags than the sample partial autocorrelation function (pacf), that is, the acf decays gradually while the pacf cuts off, then the process is DOMINATED by AUTOREGRESSIVE STRUCTURE. The order of the AUTOREGRESSIVE PROCESS is indicated by the number of significant partials.

If the sample partial autocorrelation function (pacf) has MORE significant lags than the sample autocorrelation function (acf), that is, the pacf decays gradually while the acf cuts off, then the process is DOMINATED by MOVING AVERAGE STRUCTURE. The order of the MOVING AVERAGE PROCESS is indicated by the number of significant autocorrelations.

ELSE

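More complicated procedures are necessary, since for a mixed ARMA process both the acf and the pacf decay gradually and neither cuts off cleanly.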

ENDIF
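The IF/THEN rule above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions: the partial autocorrelations are obtained from the sample autocorrelations with the Durbin-Levinson recursion, significance is judged against the rough bound 2/sqrt(n), only the first 10 lags are examined, and the function name suggest_pure_order is hypothetical. Because chance exceedances of the bound are counted too, the suggested order can come out too high; it is a starting point, not a verdict.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelations for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, nlags + 1)])

def sample_pacf(x, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(x, nlags)
    pacf = np.zeros(nlags + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])  # AR coefficients of the order k-1 fit
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi_prev, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, r[1:k])
        phi_kk = num / den
        phi_prev = np.append(phi_prev - phi_kk * phi_prev[::-1], phi_kk)
        pacf[k] = phi_kk
    return pacf

def suggest_pure_order(x, nlags=10):
    """Apply the pure-AR / pure-MA rule; returns (kind, order)."""
    bound = 2.0 / np.sqrt(len(x))
    n_acf = int(np.sum(np.abs(sample_acf(x, nlags)[1:]) > bound))
    n_pacf = int(np.sum(np.abs(sample_pacf(x, nlags)[1:]) > bound))
    if n_acf > n_pacf:
        return ("AR", n_pacf)   # order = number of significant partials
    if n_pacf > n_acf:
        return ("MA", n_acf)    # order = number of significant autocorrelations
    return ("undetermined", None)  # more complicated procedures are necessary

rng = np.random.default_rng(3)
n = 2000
e = rng.standard_normal(n + 1)

# Pure AR(1): x[t] = 0.7 * x[t-1] + e[t]
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = 0.7 * ar[t - 1] + e[t]

# Pure MA(1): x[t] = e[t] + 0.7 * e[t-1]
ma = e[1:] + 0.7 * e[:-1]

pacf_ar = sample_pacf(ar, 10)
acf_ma = sample_acf(ma, 10)
pacf_ma = sample_pacf(ma, 10)

print("AR(1) series:", suggest_pure_order(ar))
print("MA(1) series:", suggest_pure_order(ma))
```

For the simulated AR(1) the lag-1 partial autocorrelation should sit near 0.7 with later partials near zero, while for the MA(1) the lag-1 autocorrelation should sit near 0.7/(1 + 0.49), about 0.47, with later autocorrelations near zero.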

SIMPLE ANSWER -------------

Use AUTOBOX as a productivity aid.