UNDERSTANDING AND IDENTIFICATION OF ARIMA MODELS
THIS SEMINAR SECTION EXPLAINS IN SIMPLE TERMS SOME OF THE WHYS AND WHEREFORES OF
SIMPLE AUTO-PROJECTIVE MODELS.
Time series analysis attempts to match the observable patterns in
the data to an underlying model or sequence of models. These models
may be identical or different for distinct ranges of time. Even if the model
form is identical, the model parameters may differ. It is possible, and
even probable, that the variance of the series is not constant. If it is
not constant, it may be related to the level of the series, may simply
reflect major changes, or may even evolve stochastically over time.

Time series problems fall into three major cases. The first and simplest is a single
endogenous (dependent) variable that may or may not be affected by unusual
events. The second case extends the first by including input (causal)
variables that may have a role in predicting or
modelling this one dependent variable. The third case
allows for simultaneous modelling of multiple dependent series.
Just as unexpected interventions may have affected the Y variable in case 1, these
same Intervention Variables might also affect cases 2 and 3.

The modelling skills required to do this are both rare and labor-intensive. Thus
it is natural to develop and hone methods to aid or to automate the process.
AUTOMATIC FORECASTING SYSTEMS has been doing this since 1976.

The fundamental objective of AUTOBOX is to aid the researcher by developing
candidate models. It does so by comparing the theoretical autocorrelation structure
of viable alternatives with the observed state of nature, i.e. the actual
autocorrelation structure. Test statistics are developed to optimally
match these, thus suggesting a model that approximates the true but unknown
model.

Statisticians can develop the theoretical fingerprints for different
underlying models. What is done in practice is to follow the idea
originally proposed by Yogi Berra, "You can observe a lot by
simply watching". We observe the data, characterize it and then
attempt to match it to alternative states of nature. Sardonically,
this is sometimes referred to as Hypothesis Generation.
The alternative approach, none too pretty I might add, is to
assume knowledge of the underlying model before data is actually
collected and analyzed.

For example, if the underlying model is a simple autoregressive model
of order 1 with a parameter of .8, we can write it as
Y(t) = .8 Y(t-1) + a(t), where a(t) is a random shock at time t.
This model has a theoretical autocorrelation function that follows a simple
recursive shape: each autocorrelation is .8 times the one before it
(.8, .64, .512 and so on). If we observe a time series with a similarly
shaped autocorrelation function, then we may have found a match and
can set out to actually test a statistical hypothesis.
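The recursive shape of the AR1 fingerprint can be sketched in a few lines of Python. This is a minimal illustration and not AUTOBOX code; the function name `ar1_theoretical_acf` is hypothetical. It relies on the standard result that each autocorrelation of a stationary AR1 equals the parameter times the previous one.

```python
def ar1_theoretical_acf(phi, max_lag):
    """Theoretical autocorrelations of a stationary AR(1) for lags 0..max_lag."""
    if abs(phi) >= 1:
        raise ValueError("an AR(1) is stationary only when |phi| < 1")
    acf = [1.0]  # the lag-0 autocorrelation is always 1
    for _ in range(max_lag):
        acf.append(phi * acf[-1])  # rho(k) = phi * rho(k-1)
    return acf


# Fingerprint of the AR(1) with parameter .8 discussed in the text.
for lag, rho in enumerate(ar1_theoretical_acf(0.8, 5)):
    print(f"lag {lag}: {rho:.3f}")
```

A sample autocorrelation function that decays in roughly this geometric way is the kind of match the text describes.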
We will now apply this paradigm to an actual time series.
We show a plot of the original time series and ponder how to characterize
these values in order to uncover the underlying
pattern or scheme.

As a first cut, we can compute various basic statistics such as the
mean and variance. The problem is that these statistics don't speak to the
underlying memory structure, or auto-dependence, between successive values.

The sample autocorrelation measures the auto-dependence or internal
predictability between values a fixed number of periods apart. For example, this series has a
simple correlation of .764 between values one period apart. Additionally,
the simple correlation between values two periods apart is .315.
Our task is to match this pattern with a good candidate model. In this
regard, we are behaving like a "yenta" in trying to get a good match.
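The correlations between values one and two periods apart are sample autocorrelations. A minimal sketch of the usual estimator follows. The function name and the toy series are assumptions for illustration only, since the seminar's data are not reproduced here.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r(1)..r(max_lag) of a sequence of numbers."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]  # deviations from the mean
    c0 = sum(d * d for d in dev)      # lag-0 sum of squares
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


# Toy data (hypothetical), small enough to check by hand.
print(sample_acf([1, 2, 3, 4, 5], 2))  # [0.4, -0.1]
```

Every lag is divided by the same lag-0 sum of squares, which is the common textbook convention.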
There is a statistical tool called the Partial Regression Coefficient
which measures the importance of additional or auxiliary variables. Applied to
successive lags, it gives the sample PACF (partial autocorrelation function). For
example, although the PACF value of .764 at lag one implies a one-period
auto-dependency, there is evidence
for an additional lag of two, as the value -.645 is quite significant. Note
that subsequent auxiliary lags are not significant (-.051, -.193, -.101, etc.).
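The lag-two partial autocorrelation of -.645 can be reproduced from the two autocorrelations quoted earlier (.764 and .315). The sketch below uses the Durbin-Levinson recursion, a standard way to turn autocorrelations into partial autocorrelations. It is an illustration under that assumption, not a statement of how AUTOBOX computes the PACF, and the function name is hypothetical.

```python
def pacf_from_acf(acf):
    """Partial autocorrelations from autocorrelations [r1, r2, ...]
    using the Durbin-Levinson recursion."""
    pacf = []
    prev = []  # coefficients of the order k-1 autoregression
    for k in range(1, len(acf) + 1):
        num = acf[k - 1] - sum(prev[j] * acf[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j] for j in range(k - 1))
        phi_kk = num / den  # partial autocorrelation at lag k
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# The two sample autocorrelations quoted in the text.
print([round(p, 3) for p in pacf_from_acf([0.764, 0.315])])  # [0.764, -0.645]
```

At lag two the recursion reduces to (r2 - r1^2) / (1 - r1^2), which is why a modest r2 of .315 still yields a large negative partial autocorrelation.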
The next 6 visuals present alternative models, and an examination
supports the selection of a mixed model (AR2 and MA1).
This selection is a slight over-parameterization and leads us to a
reduced form of AR2. Model identification is essentially a trial and
error approach until one reaches a comfortable solution or match.
The six candidates, in order, are MA2, AR1, AR1/MA1, AR1/MA2,
AR2 and finally AR2/MA1, the ultimate choice.

We show the results of estimation for the candidate model. Note the
clear non-significance of the MA1 coefficient: the estimated value is
.158, with an associated probability value of .2.
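A probability value such as the .2 quoted for the MA1 coefficient comes from comparing the estimate with its standard error. The sketch below uses the large-sample normal approximation. The standard error of .123 is a hypothetical value chosen only so that the example lands near the quoted .2; the source does not report the actual standard error.

```python
import math


def two_sided_p_value(t_ratio):
    """Two-sided p-value for a t-ratio, using the normal approximation."""
    return math.erfc(abs(t_ratio) / math.sqrt(2))


estimate = 0.158                 # MA1 estimate quoted in the text
hypothetical_std_error = 0.123   # ASSUMPTION: not reported in the source
t_ratio = estimate / hypothetical_std_error
print(f"t-ratio {t_ratio:.2f}, p-value {two_sided_p_value(t_ratio):.2f}")
```

A p-value this far above the conventional .05 is the reason for dropping the coefficient.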
This test of necessity leads to a reduced model, an AR2.
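For a rough sense of what an AR2 for this series might look like, the Yule-Walker equations give moment estimates directly from the two autocorrelations quoted earlier (.764 and .315). These are back-of-the-envelope values under that assumption, not the estimates produced by AUTOBOX, which the source does not reproduce. The function names are hypothetical.

```python
def yule_walker_ar2(r1, r2):
    """Moment estimates (phi1, phi2) of an AR(2) from r(1) and r(2)."""
    den = 1.0 - r1 * r1
    phi1 = r1 * (1.0 - r2) / den
    phi2 = (r2 - r1 * r1) / den
    return phi1, phi2


def is_stationary_ar2(phi1, phi2):
    """Standard stationarity conditions for an AR(2)."""
    return phi1 + phi2 < 1 and phi2 - phi1 < 1 and abs(phi2) < 1


phi1, phi2 = yule_walker_ar2(0.764, 0.315)
print(round(phi1, 3), round(phi2, 3), is_stationary_ar2(phi1, phi2))
```

The second coefficient equals the lag-two partial autocorrelation, as it must for an AR2.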
Expressed as a lagged rational expectations model, an AR2 has the form
Y(t) = constant + phi1 Y(t-1) + phi2 Y(t-2) + a(t).

This equation is then used to make a prediction.
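The way an AR2 turns its last two values into a prediction can be sketched as follows. The coefficients, constant and history in the usage lines are hypothetical values chosen for easy arithmetic, not the estimates for this series.

```python
def ar2_forecast(history, phi1, phi2, constant, steps):
    """Recursive forecasts from an AR(2); future shocks are set to zero."""
    if len(history) < 2:
        raise ValueError("an AR(2) forecast needs at least two observations")
    values = list(history[-2:])
    forecasts = []
    for _ in range(steps):
        nxt = constant + phi1 * values[-1] + phi2 * values[-2]
        forecasts.append(nxt)
        values.append(nxt)  # later forecasts build on earlier ones
    return forecasts


# Hypothetical numbers: 1 + 0.5*12 + 0.25*10 = 9.5, then 1 + 0.5*9.5 + 0.25*12 = 8.75
print(ar2_forecast([10.0, 12.0], phi1=0.5, phi2=0.25, constant=1.0, steps=2))  # [9.5, 8.75]
```

Beyond one step ahead, each forecast is built on earlier forecasts rather than on observed values, so uncertainty grows with the horizon.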
In terms of analysis, we show the Fit, Actual and Forecast.

And now the Actual and Residuals. Notice how the residuals have been
purged of any structure, and are thus unpredictable and random in form. Statisticians
are Noise Makers in practice.
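Whether residuals have really been purged of structure can be checked by looking at their own autocorrelations. The sketch below computes one-step-ahead AR2 errors and flags lags whose sample autocorrelation falls outside a rough band of 2 divided by the square root of n. The band is a common rule of thumb and an assumption here, not AUTOBOX's test; all names are hypothetical.

```python
import math


def ar2_residuals(series, phi1, phi2, constant=0.0):
    """One-step-ahead errors of an AR(2) with the given (assumed) parameters."""
    return [
        series[t] - (constant + phi1 * series[t - 1] + phi2 * series[t - 2])
        for t in range(2, len(series))
    ]


def lags_outside_band(values, max_lag):
    """Lags whose sample autocorrelation exceeds the rough 2/sqrt(n) band."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("values are constant; autocorrelation is undefined")
    band = 2.0 / math.sqrt(n)
    flagged = []
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        if abs(r_k) > band:
            flagged.append(k)
    return flagged


# Hypothetical usage: a steadily trending series is far from random noise,
# so its low lags are flagged.
print(lags_outside_band([float(i) for i in range(50)], 3))
```

An empty list for the residuals of a fitted model is what "purged of any structure" looks like in practice.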
Finally, the Actual and the Forecasts.

Some software packages, SAS for example, try to crunch their way
to a solution.

I have been told that it is never a good idea to show students what shouldn't be
done, as they might get the idea that they should follow the example.
However, here is what you should studiously avoid. One unusual
value is enough to make the "pick-best" from a pre-set selection of models
absolutely go bonkers! The data should be allowed to mix and match
the combination of history, causals and dummies. Try it yourself. Create
a time series with a KNOWN structure, or select a series from a textbook,
and perturb it with one or more pulses, a seasonal pulse, a level
shift or a local time trend. Canned or "crunch" approaches simply don't work.
The main idea is that pre-set or pick-from-the-bunch approaches don't
offer enough latitude or breadth.
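The "try it yourself" experiment can be sketched with the standard library alone. The code simulates an AR1 with a known parameter of .8, adds a single pulse, and compares the lag-one autocorrelation before and after. The series length, seed, pulse size and pulse position are all arbitrary assumptions made for illustration.

```python
import random


def simulate_ar1(phi, n, seed):
    """Simulate n values of an AR(1) with standard normal shocks."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


def lag1_autocorrelation(series):
    """Sample autocorrelation between values one period apart."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / c0


clean = simulate_ar1(0.8, 200, seed=1)   # known structure: AR(1), phi = .8
pulsed = list(clean)
pulsed[100] += 50.0                      # one unusual value (a pulse)

clean_r1 = lag1_autocorrelation(clean)
pulsed_r1 = lag1_autocorrelation(pulsed)
print(f"lag-1 autocorrelation: clean {clean_r1:.3f}, with pulse {pulsed_r1:.3f}")
```

The pulse inflates the variance without adding matching lag-one covariance, so the fingerprint of the known AR1 is badly blurred. Any procedure that picks a model from the distorted autocorrelations without first identifying the pulse is working from the wrong evidence.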