QUESTIONS/ANSWERS:
Subject: Forecasting (TS and input variables)
Date: Wed, 12 Nov 1997 21:28:46 -0800

Hello sci.stat.math and sci.op-research readers! I am a young OR analyst. I have a few questions regarding forecasting. Most of them are related to input variables. Please reply by e-mail (maryse.turcotte@sympatico.ca) and I will summarize to the groups! Thank you very much!
Questions: 1. What tests can one perform to determine whether it is appropriate to use an input variable? Is a test of correlation between the 2 time series sufficient, or are there tests specifically designed to determine whether to include the additional information? (I've tried with and without an input variable and compared the resulting RMSE, ... on hold-out samples, because adding an input variable always seemed to improve the fit!)
AFS RESPONSE:
If you omit a needed stochastic input series, then the noise or error from the under-specified model will incorporate or reflect that omission. This aspect of ARIMA structures is well known. If you omit a needed deterministic series, for example a known intervention, the mean of the under-specified error process will be affected, and thus the variance of the untreated errors will be larger than necessary, leading to a downward bias in test statistics.

An example of an omitted stochastic series: if you omit the earnings of a company in predicting the stock price, the history of the stock price becomes "important", as previous earnings have affected previous stock prices; thus the effect of earnings has already been incorporated by using the history of the stock price. Another example: if you omit temperature data from a model designed to predict monthly beer sales, you will identify a seasonal ARIMA structure. This structure disappears when you incorporate temperature into the model as an explicit input series. Thus the ARIMA structure is clearly seen as a proxy for the omitted temperature series. This aspect of ARIMA structure was pointed out to me by Fernandez in an AER article in 1977.

The test for correlation, as you put it, is potentially flawed by
1. autocorrelation within the input series itself
2. the effect of Pulses, level shifts, Seasonal Pulses and Local time trends on either the input or the output series.
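
The first flaw in the list above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the two series are independent random walks built for the purpose, and the function names, the 0.5 cutoff and the seed are all hypothetical choices, not anything from the original post. Because the series are random walks by construction, the first difference is known to be the right filter here; with real data the filter has to be identified from the series.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def random_walk(n, rng):
    """Cumulative sum of independent standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First difference, which turns a random walk back into white noise."""
    return [b - a for a, b in zip(series, series[1:])]


def share_large_correlations(n_pairs=200, n_obs=200, cutoff=0.5, seed=1):
    """Fraction of independent pairs with |r| > cutoff, in levels and in differences."""
    rng = random.Random(seed)
    big_levels = big_diffs = 0
    for _ in range(n_pairs):
        x, y = random_walk(n_obs, rng), random_walk(n_obs, rng)
        if abs(pearson(x, y)) > cutoff:
            big_levels += 1
        if abs(pearson(diff(x), diff(y))) > cutoff:
            big_diffs += 1
    return big_levels / n_pairs, big_diffs / n_pairs


if __name__ == "__main__":
    levels, diffs = share_large_correlations()
    print(f"share with |r| > 0.5 in levels:      {levels:.2f}")
    print(f"share with |r| > 0.5 in differences: {diffs:.2f}")
```

The pairs are unrelated by construction, so any large correlation in levels comes purely from the autocorrelation inside each series.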
The effects listed above can be identified and treated using residual diagnostic checking or INTERVENTION DETECTION procedures. Bartlett warned in 1932 about "why we sometimes get nonsense correlation between two time series". Fama erred in the other direction by not accounting for unusual values in his "proof" that the stock market was a random walk.

Whether you add ARIMA structure or the actual X variable, the in-sample fit or the R-Squared can only improve. This is an aspect of the error minimization process. Whether or not the improvement is statistically significant can be assessed via the likelihood ratio test (F or T). The real question is whether or not this actually improves the prediction. Care should be taken to evaluate forecast errors from a number of different origins and for a number of different lead times.

The correct procedure to identify the nature and form of a stochastic input is to pre-whiten the input series, pass the same filter over the Y series, and compute cross-correlations of these two proxies. This is done for one and only one reason: to IDENTIFY the appropriate model structure. The literature on TRANSFER FUNCTIONS is appropriate in this regard. Note that this is a tentative identification and may be flawed by outliers or incorrect model identification. It is necessary to estimate simultaneously and to examine the residuals for:
a. any autocorrelative structure
b. whether or not the residuals can be predicted or modeled by omitted lags in the stochastic series
c. unusual values in the mean of the errors (INTERVENTION DETECTION) or the variance of the errors (NON-CONSTANT VARIANCE).

Note that differencing is included ONLY to identify and may or may not be necessary in the actual transfer function. Early statisticians often detrended or differenced data prior to computing cross-correlations. These up-front filters are of course subsets of extended ARIMA structures and may be counter-productive. The form and nature of the correct filter can be identified from the data itself.
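
The pre-whitening step described above can be sketched in a few lines. This is a minimal illustration, not a full transfer function identification: it assumes the input is a pure AR(1) process (true here only because the data are simulated that way), it estimates the filter with the lag-1 autocorrelation, and every function name, parameter value and seed is a hypothetical choice made for the example.

```python
import math
import random


def simulate(n=600, phi=0.7, true_lag=3, gain=2.0, seed=7):
    """x is AR(1); y responds to x with a pure delay of true_lag periods plus noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    y = [(gain * x[t - true_lag] if t >= true_lag else 0.0) + rng.gauss(0.0, 1.0)
         for t in range(n)]
    return x, y


def ar1_coefficient(series):
    """Lag-1 autocorrelation, used as the Yule-Walker estimate of an AR(1) coefficient."""
    m = sum(series) / len(series)
    d = [v - m for v in series]
    return sum(a * b for a, b in zip(d, d[1:])) / sum(v * v for v in d)


def apply_ar1_filter(series, phi):
    """Apply (1 - phi*B) to the mean-centred series; the first observation is lost."""
    m = sum(series) / len(series)
    d = [v - m for v in series]
    return [d[t] - phi * d[t - 1] for t in range(1, len(d))]


def cross_correlation(alpha, beta, max_lag):
    """Sample correlation between alpha[t - k] and beta[t] for k = 0..max_lag."""
    n = len(alpha)
    ma, mb = sum(alpha) / n, sum(beta) / n
    sa = math.sqrt(sum((v - ma) ** 2 for v in alpha))
    sb = math.sqrt(sum((v - mb) ** 2 for v in beta))
    return [sum((alpha[t - k] - ma) * (beta[t] - mb) for t in range(k, n)) / (sa * sb)
            for k in range(max_lag + 1)]


def identify_lag(x, y, max_lag=8):
    """Pre-whiten x, pass the SAME filter over y, and return the lag with the largest |ccf|."""
    phi_hat = ar1_coefficient(x)
    alpha = apply_ar1_filter(x, phi_hat)  # pre-whitened input
    beta = apply_ar1_filter(y, phi_hat)   # output passed through the input's filter
    ccf = cross_correlation(alpha, beta, max_lag)
    best = max(range(len(ccf)), key=lambda k: abs(ccf[k]))
    return best, ccf


if __name__ == "__main__":
    x, y = simulate()
    best, ccf = identify_lag(x, y)
    for k, value in enumerate(ccf):
        print(f"lag {k}: {value:+.3f}")
    print("tentative delay:", best)
```

The result is only a tentative identification; the residual checks a, b and c above still have to be run on the estimated model.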
Questions: 2. There are some analyses that can be performed prior to choosing a time series forecasting model. I'm thinking about autocorrelations, partial autocorrelations, ... In your opinion, should one stick to the model prescribed by the results of these analyses even though there are other models that seem to perform better? Is it wise to choose the most easily implementable model from a subset of models that seem to perform a little better than the others when one is not sure which one is the most suitable?
AFS RESPONSE: The techniques of autocorrelations, partial autocorrelations and cross-correlations are all useful, but they are estimated by error minimization procedures. These tools can be, and often are, flawed by anomalies. Robust identification procedures, such as the one for pulses described by Masarotto, are often useful. Even more useful are the INTERVENTION DETECTION procedures and model diagnostic checking for necessity and sufficiency. One has to identify a model, sometimes based on priors, then estimate that model and evaluate its ability to create a Gaussian white noise error process, which means an error term that is uncorrelated over time, has a mean of zero everywhere and has a variance that is constant. Part and parcel of this is to test the constancy of the parameters over the fitting period. In summary, the modeler uses sample acf's and ccf's to identify, then tests the estimated parameters for significance and invariance, and makes sure that the error process cannot be predicted by any known information such as lags or leads in the input series or lags in the noise process.
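
One piece of the sufficiency check, whether the errors can still be predicted from their own lags, can be sketched with the Ljung-Box portmanteau statistic. This is a swapped-in standard test, not something named in the response, and it covers autocorrelation only; it says nothing about pulses, level shifts, changing variance or parameter constancy. The helper names and the simulated series are hypothetical. When the statistic is applied to residuals from a fitted ARMA model, the degrees of freedom are usually reduced by the number of estimated ARMA parameters, which this sketch does not do.

```python
import random

# 5% critical value of the chi-square distribution with 10 degrees of freedom.
CHI2_95_DF10 = 18.307


def autocorrelations(series, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(series)
    m = sum(series) / n
    d = [v - m for v in series]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag=10):
    """Q = n(n+2) * sum_k r_k^2 / (n-k); large values signal leftover autocorrelation."""
    n = len(residuals)
    r = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))


def ar1_series(n, phi, rng):
    """Simulated AR(1) series, standing in for residuals from an under-specified model."""
    out, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


if __name__ == "__main__":
    rng = random.Random(3)
    white = [rng.gauss(0.0, 1.0) for _ in range(500)]
    leftover = ar1_series(500, 0.8, rng)
    for label, series in (("white noise", white), ("AR(1) leftover", leftover)):
        q = ljung_box_q(series)
        verdict = "reject" if q > CHI2_95_DF10 else "do not reject"
        print(f"{label}: Q = {q:.1f}, {verdict} whiteness at 5%")
```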
Questions: 3. Aside from RMSE and R^2, are there other statistics that a forecaster should consider important?
AFS RESPONSE: The error process should be unpredictable using either its own history or the values in the X series. In terms of a single statistic, the AIC is just another weighted variance, although a widely popular one, and is judged to be of import. I would examine closely the forecast errors for different lead times from different origins to assess expected performance. Unfortunately, it is not totally clear whether one should use BIAS, VARIANCE or RMSE to assess the expected performance. My answer is that it depends on the loss function that you have.
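
Evaluating forecast errors from several origins and for several lead times can be sketched as follows. The forecaster used here is a hypothetical stand-in (carry the last value forward) so that the example is self-contained; in practice it would be replaced by a function that re-estimates the model on the history available at each origin. All names are assumptions made for the illustration.

```python
import math
import random


def naive_forecast(history, lead):
    """Hypothetical stand-in model: carry the last observed value forward."""
    return history[-1]


def rolling_origin_errors(series, forecaster, first_origin, max_lead):
    """Collect forecast errors by lead time, forecasting afresh from every origin."""
    errors = {h: [] for h in range(1, max_lead + 1)}
    for origin in range(first_origin, len(series) - max_lead + 1):
        history = series[:origin]            # data available at this origin
        for h in range(1, max_lead + 1):
            actual = series[origin + h - 1]  # value h steps past the last observation
            errors[h].append(actual - forecaster(history, h))
    return errors


def summarize(errors):
    """Bias (mean error), error variance and RMSE for each lead time."""
    table = {}
    for h, e in errors.items():
        bias = sum(e) / len(e)
        mse = sum(v * v for v in e) / len(e)
        table[h] = {"bias": bias, "variance": mse - bias ** 2, "rmse": math.sqrt(mse)}
    return table


if __name__ == "__main__":
    rng = random.Random(11)
    walk, level = [], 0.0
    for _ in range(400):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    table = summarize(rolling_origin_errors(walk, naive_forecast, 100, 4))
    for h, row in table.items():
        print(f"lead {h}: bias {row['bias']:+.3f}  "
              f"variance {row['variance']:.3f}  rmse {row['rmse']:.3f}")
```

Since mean squared error equals squared bias plus variance, the table shows all three quantities side by side; which one to weight most heavily is still a matter of the loss function.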
Questions: 4. My understanding of an input variable is that, knowing the value of a variable, we can use that information to improve the accuracy of our forecast. If I have to forecast the value of my input variable (I don't know it in advance, just like the value I'm trying to forecast), is it still appropriate to use it? I guess it is, but I'm afraid that it won't be as efficient...
AFS RESPONSE: One often has to predict the input series in order to predict the output series. Good statistical packages incorporate the uncertainty in the predictor variables when estimating the uncertainty in the forecast of the output series.
Questions: 5. How do we select the lag for the input variable? Is the answer the same as for question #1, but with lags?
AFS RESPONSE: The initial selection of the lags is done via cross-correlations of the suitably stationary and filtered series using transfer function identification procedures. Any omitted structure can be identified by examining the cross-correlation between the pre-whitened X and the tentatively identified noise process. Model re-definition using the acf of the currently identified noise process, appropriately incorporating intervention series, will lead to further re-identification. If you have multiple stochastic input series, you have to be wary of pairwise identification strategies, as they can be, and often are, flawed by cross-correlation between the input noise series. A more correct procedure is to identify the initial structure using a COMMON FILTER. This was suggested by Liu and Hanssens. Presumptive pair-wise identification that assumes independent, i.e. uncorrelated, input noise structures can have a nasty effect. This is why some packages incorrectly advise their users that transfer function identification is valid only for models with one stochastic input. Not so! Of course, model diagnostic checking can lead in either case to reasonable models.
Questions: 6. In SAS/ETS, there are different methods to estimate the parameters of the model: Maximum Likelihood, Unconditional Least Squares, ... Does anybody know where I could find the algorithms used to compute the estimates with these methods? Or could someone help me identify the reasons why the estimates from those methods sometimes don't converge, so that I could avoid the problem by applying some specific conditions to select an estimation method that wouldn't fail?
AFS RESPONSE: These procedures are flawed not because of what they do but because of what they don't do. It is necessary to check the invertibility of the parameters; this is not done by the tools you have referred to.
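
As a rough illustration of what an invertibility check involves, the sketch below tests whether the roots of a moving-average polynomial lie outside the unit circle. It assumes the Box-Jenkins sign convention, theta(B) = 1 - theta1*B - theta2*B^2, and handles only orders 1 and 2 so that it needs nothing beyond the standard library; the function names and tolerance are hypothetical, and this is not a description of what any particular package does.

```python
import cmath


def ma_roots(thetas):
    """Roots of 1 - theta1*B - theta2*B^2 (Box-Jenkins signs), for MA order 1 or 2."""
    if len(thetas) == 1:
        (t1,) = thetas
        return [] if t1 == 0 else [1.0 / t1]
    if len(thetas) == 2:
        t1, t2 = thetas
        if t2 == 0:
            return ma_roots([t1])
        # Solve -t2*B^2 - t1*B + 1 = 0 with the quadratic formula.
        disc = cmath.sqrt(t1 * t1 + 4.0 * t2)
        return [(t1 + disc) / (-2.0 * t2), (t1 - disc) / (-2.0 * t2)]
    raise ValueError("this sketch only handles MA orders 1 and 2")


def is_invertible(thetas, tol=1e-8):
    """True when every root of the MA polynomial lies strictly outside the unit circle."""
    return all(abs(root) > 1.0 + tol for root in ma_roots(thetas))


if __name__ == "__main__":
    for thetas in ([0.5], [1.2], [0.5, 0.3], [0.5, 0.6], [0.0, -0.5]):
        moduli = [round(abs(r), 3) for r in ma_roots(thetas)]
        print(thetas, "root moduli", moduli, "invertible:", is_invertible(thetas))
```

Estimates that drift toward the boundary of this region during iteration are one plausible source of trouble, so inspecting the root moduli of intermediate or final estimates is a cheap diagnostic.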
SUMMARY: Please visit http://www.autobox.com for more info on time series analysis, forecasting and more. AUTOBOX is an industrial strength time series package which performs all of the above procedures and more ... much more. As always, care should be taken to represent relationships parsimoniously and never to believe or develop theory around a purely empirical approach. Remember, all models are wrong but some models are useful!

The bottom line in all of this is that you have to weave or combine the following four kinds of model structure:
1. the value of historical readings of the Y series (reflects omitted stochastic input series)
2. the effect of contemporary, lead or lag relationships with user specified input series
3. the effect of omitted deterministic series, which can be proxied with PULSES, LEVEL SHIFTS, SEASONAL PULSES, LOCAL TIME TRENDS
4. the changing aspect of the variance and model parameters.

All of these things are dealt with very aggressively with AUTOBOX. If I can help please call (215-675-0652).

Dave Reilly
AUTOMATIC FORECASTING SYSTEMS (DEVELOPERS OF AUTOBOX)

Thanks in advance! Any indications on any of the above questions will be greatly appreciated!

Maryse Turcotte
maryse.turcotte@sympatico.ca