QUESTIONS/ANSWERS:

 

 

 
Subject: Forecasting (TS and input variables)
Date: Wed, 12 Nov 1997 21:28:46 -0800

Hello sci.stat.math and sci.op-research readers! I am a young OR analyst. I have a few questions regarding forecasting. Most of them are related to input variables.
 
 Please, reply by e-mail (maryse.turcotte@sympatico.ca) and I will summarize to the groups! Thank you very much!
 
Questions: 1. What tests can one perform to determine whether it is appropriate to use an input variable? Is a test of correlation between the two time series sufficient, or are there tests specifically designed to determine whether to include the additional information? (I've tried fitting with and without an input variable and compared the resulting RMSE, ... on hold-out samples, because adding an input variable always seemed to improve the fit!)
 
 
AFS RESPONSE:
 
If you omit a needed stochastic input series, the noise (error) process of the under-specified model will absorb and reflect that omission. This aspect of ARIMA structure is well known. If you omit a needed deterministic series, for example a known intervention, the mean of the under-specified error process will be affected. Left untreated, the error variance will be larger than necessary, which biases test statistics downward.

An example of an omitted stochastic series: if you omit a company's earnings when predicting its stock price, the history of the stock price becomes "important", because previous earnings have affected previous stock prices. The effect of earnings is thus already incorporated through the history of the stock price. Another example: if you omit temperature data from a model designed to predict monthly beer sales, you will identify a seasonal ARIMA structure. This structure disappears when you incorporate temperature into the model as an explicit input series, so the ARIMA structure is clearly a proxy for the omitted temperature series. This aspect of ARIMA structure was pointed out to me by Fernandez in an AER article in 1977.

The test for correlation, as you put it, is potentially flawed by:
 
 
 
1. autocorrelation within the input series itself
2. the effect of pulses, level shifts, seasonal pulses and local time trends on either the input or the output series.
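The four deterministic effects named in item 2 can be written as simple dummy regressors. The sketch below is illustrative only: the function names are hypothetical and are not taken from AUTOBOX or any other package, and the toy data is constructed so that the two series are exactly uncorrelated until one shared pulse is added.

```python
import math

def pulse(n, t0):
    """1 at time t0 and 0 elsewhere: a one-time unusual value."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]

def level_shift(n, t0):
    """0 before t0 and 1 from t0 onward: a permanent step."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]

def seasonal_pulse(n, t0, period):
    """1 at t0 and at every `period` steps after it."""
    return [1.0 if t >= t0 and (t - t0) % period == 0 else 0.0 for t in range(n)]

def local_time_trend(n, t0):
    """0 before t0, then 1, 2, 3, ...: a change in slope."""
    return [float(t - t0 + 1) if t >= t0 else 0.0 for t in range(n)]

def correlation(a, b):
    """Ordinary Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Two toy series that are exactly uncorrelated.
x = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
clean = correlation(x, y)

# Add the same untreated pulse (size 8, at t = 3) to both series.
shock = pulse(len(x), 3)
x_hit = [u + 8.0 * s for u, s in zip(x, shock)]
y_hit = [v + 8.0 * s for v, s in zip(y, shock)]
contaminated = correlation(x_hit, y_hit)

print(round(clean, 3), round(contaminated, 3))  # 0.0 0.833
```

A single shared unusual value is enough to move the sample correlation from 0 to about 0.83, which is why a plain correlation test should not be trusted before such effects are treated.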
 
These effects can be identified and treated using residual diagnostic checking or INTERVENTION DETECTION procedures. Yule warned in 1926 about "why we sometimes get nonsense correlations between time series". Fama erred in the other direction by not accounting for unusual values in his "proof" that the stock market was a random walk. Whether you add ARIMA structure or the actual X variable, you can only improve the fit or the R-squared; this is a consequence of the error minimization process. Whether the improvement is statistically significant can be assessed via the likelihood ratio test (F or t). The real question is whether it actually improves the prediction. Care should be taken to evaluate forecast errors from a number of different origins and for a number of different lead times.

The correct procedure to identify the nature and form of a stochastic input is to pre-whiten the input series, pass the same filter over the Y series, and compute the cross-correlations of these two proxies. This is done for one and only one reason: to IDENTIFY the appropriate model structure. The literature on TRANSFER FUNCTIONS is appropriate in this regard. Note that this is a tentative identification and may be flawed by outliers or incorrect model identification. It is necessary to estimate all of the model's parameters simultaneously and to examine the residuals for:

a. any autocorrelative structure
b. whether or not the residuals can be predicted or modeled by omitted lags in the stochastic series
c. unusual values in the mean of the errors (INTERVENTION DETECTION) or the variance of the errors (NON-CONSTANT VARIANCE).

Note that differencing is included ONLY to identify and may or may not be necessary in the actual transfer function. Early statisticians often detrended or differenced data prior to computing cross-correlations. These up-front filters are of course subsets of extended ARIMA structures and may be counter-productive. The form and nature of the correct filter can be identified from the data itself.
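As a rough illustration of pre-whitening, the sketch below simulates an autocorrelated input and an output that responds three periods later, then compares the raw cross-correlations with the pre-whitened ones. Everything here is an assumption made for the example: the AR(1) form of the input, the delay of 3, the gain of 2, and all function names. A real identification would choose the filter from the data rather than fix it at AR(1).

```python
import math
import random

def simulate(n, phi, lag, gain, seed):
    """x is AR(1); y responds to x `lag` periods later, plus white noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n + lag - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    y = [gain * x[t] + rng.gauss(0.0, 1.0) for t in range(n)]
    return x[lag:], y  # aligned so that y[t] depends on x[t - lag]

def fit_ar1(series):
    """Least-squares AR(1) coefficient of the mean-centered series."""
    mean = sum(series) / len(series)
    c = [v - mean for v in series]
    return sum(c[t] * c[t - 1] for t in range(1, len(c))) / sum(v * v for v in c[:-1])

def apply_filter(series, phi):
    """Apply the filter (1 - phi*B) to a series."""
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]

def ccf(a, b, max_lag):
    """Correlation between a[t] and b[t + k] for k = 0..max_lag."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    scale = math.sqrt(sum((v - ma) ** 2 for v in a) * sum((v - mb) ** 2 for v in b))
    return [sum((a[t] - ma) * (b[t + k] - mb) for t in range(n - k)) / scale
            for k in range(max_lag + 1)]

x, y = simulate(n=500, phi=0.8, lag=3, gain=2.0, seed=1)
phi_hat = fit_ar1(x)                      # filter estimated from the input only
raw = ccf(x, y, 8)
white = ccf(apply_filter(x, phi_hat),     # pre-whitened input
            apply_filter(y, phi_hat), 8)  # SAME filter passed over the output

for k in range(9):
    print(k, round(raw[k], 2), round(white[k], 2))
best_lag = max(range(len(white)), key=lambda k: abs(white[k]))
print("tentatively identified lag:", best_lag)
```

In the raw cross-correlations the autocorrelation of the input smears the relationship across neighboring lags, while the pre-whitened version should concentrate on the delay actually used in the simulation. The result is still only a tentative identification to be checked against the residuals.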
 
 
 
Questions: 2. There are some analyses that can be performed prior to choosing a time series forecasting model. I'm thinking of autocorrelations, partial autocorrelations, ... In your opinion, should one stick to the model prescribed by the results of these analyses even though some other models seem to perform better? Is it wise to choose the most easily implementable model from a subset of models that seem to perform a little better than the others, when one is not sure which one is the most suitable?
 

 

 
 
 
AFS RESPONSE: Autocorrelations, partial autocorrelations and cross-correlations are all useful, but they are estimated by error minimization procedures. These tools can be, and often are, flawed by anomalies. Robust identification procedures, such as the one described by Masarotto for pulses, are often useful. Even more useful are INTERVENTION DETECTION procedures and model diagnostic checking for necessity and sufficiency. One has to identify a model, sometimes based on priors, then estimate that model and evaluate its ability to produce a Gaussian white noise error process, meaning an error term that is uncorrelated over time, has a mean of zero everywhere and has a constant variance. Part and parcel of this is testing the constancy of the parameters over the fitting period. In summary, the modeler uses sample ACFs and CCFs to identify a model, then tests the estimated parameters for significance and invariance, and makes sure that the error process cannot be predicted from any known information such as lags or leads in the input series or lags in the noise process.
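One common way to check whether an error process still has lags in it is a portmanteau statistic on the residual autocorrelations. The sketch below uses the Ljung-Box Q statistic, which is a choice made for this example and is not named in the response above. The function names are hypothetical, and the critical value assumes 10 lags with no adjustment for estimated parameters (for residuals of a fitted ARMA(p, q) the degrees of freedom are usually reduced by p + q).

```python
import random

CHI2_95_DF10 = 18.307  # 5% critical value, chi-square with 10 degrees of freedom

def acf(e, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a series."""
    n = len(e)
    m = sum(e) / n
    c0 = sum((v - m) ** 2 for v in e)
    return [sum((e[t] - m) * (e[t - k] - m) for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def ljung_box(e, max_lag):
    """Q = n(n+2) * sum_k r_k^2 / (n - k); large Q means the errors are predictable."""
    n = len(e)
    r = acf(e, max_lag)
    return n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, max_lag + 1))

rng = random.Random(7)
shocks = [rng.gauss(0.0, 1.0) for _ in range(300)]

# "Residuals" that still contain AR(1) structure (an under-specified model).
leftover = [shocks[0]]
for s in shocks[1:]:
    leftover.append(0.7 * leftover[-1] + s)

for name, series in (("white noise", shocks), ("leftover AR(1)", leftover)):
    q = ljung_box(series, 10)
    verdict = "structure remains" if q > CHI2_95_DF10 else "no evidence of structure"
    print(name, round(q, 1), verdict)
```

This covers only autocorrelation in the errors. Constancy of the mean, variance and parameters over the fitting period needs separate checks.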
 
 
 
Questions: 3. Aside from RMSE and R^2, are there other statistics that a forecaster should consider important?
 
 
 
AFS RESPONSE: The error process should be unpredictable using either its own history or the values in the X series. As a single statistic, the AIC is just another weighted variance, although a widely popular one, and is judged to be of import. I would closely examine the forecast errors for different lead times from different origins to assess expected performance. Unfortunately, it is not totally clear whether one should use BIAS, VARIANCE or RMSE to assess the expected performance. The answer depends on the loss function that you have.
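Examining forecast errors from several origins and lead times can be sketched as a rolling-origin evaluation. The example below is a minimal illustration under stated assumptions: the data is simulated, the forecasting model is a plain AR(1) refitted at every origin, and the function names are hypothetical. It reports bias, variance and RMSE per lead time, which are tied together by RMSE^2 = bias^2 + variance.

```python
import math
import random

def fit_ar1(y):
    """Mean and least-squares AR(1) coefficient of a series."""
    m = sum(y) / len(y)
    num = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    den = sum((y[t - 1] - m) ** 2 for t in range(1, len(y)))
    return m, num / den

def rolling_origin_errors(y, first_origin, max_lead):
    """Forecast errors by lead time, using only data before each origin."""
    errors = {h: [] for h in range(1, max_lead + 1)}
    for origin in range(first_origin, len(y) - max_lead + 1):
        history = y[:origin]
        m, phi = fit_ar1(history)
        for h in range(1, max_lead + 1):
            forecast = m + phi ** h * (history[-1] - m)
            errors[h].append(y[origin + h - 1] - forecast)
    return errors

def summarize(errors):
    """Map each lead time to (bias, variance, rmse) of its forecast errors."""
    table = {}
    for h, e in errors.items():
        bias = sum(e) / len(e)
        variance = sum((v - bias) ** 2 for v in e) / len(e)
        rmse = math.sqrt(sum(v * v for v in e) / len(e))
        table[h] = (bias, variance, rmse)
    return table

rng = random.Random(3)
y = [0.0]
for _ in range(399):
    y.append(0.8 * y[-1] + rng.gauss(0.0, 1.0))

table = summarize(rolling_origin_errors(y, first_origin=100, max_lead=5))
for h, (bias, variance, rmse) in table.items():
    print(h, round(bias, 3), round(variance, 3), round(rmse, 3))
```

Which column matters most depends on the loss function: a squared-error loss points to RMSE, while a cost that penalizes systematic over- or under-forecasting puts more weight on bias.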
 
 
 
Questions: 4. My understanding of an input variable is that, knowing the value of a variable, we can use that information to improve the accuracy of our forecast. If I have to forecast the value of my input variable (I don't know it in advance, just like the value I'm trying to forecast), is it still appropriate to use it? I guess it is, but I'm afraid that it won't be as efficient...
 
 
 
AFS RESPONSE: One often has to predict the input series in order to predict the output series. Good
 
 statistical packages incorporate the uncertainty in the predictor variables when estimating the
 
 uncertainty in the forecast of the output series.
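To see how uncertainty in a forecast input carries over to the output, consider a deliberately simple case. The sketch assumes a model y_t = beta * x_t + n_t with a known beta, an AR(1) input, white noise n_t, and input and noise forecast errors that are independent. These assumptions and the function names are chosen for the example; they do not describe how any particular package computes its intervals.

```python
import math

def ar1_forecast_var(phi, innov_var, h):
    """h-step forecast error variance of an AR(1) input."""
    return innov_var * sum(phi ** (2 * j) for j in range(h))

def output_forecast_var(beta, x_forecast_var, noise_forecast_var):
    """Var(y error) = beta^2 * Var(x error) + Var(noise error), assuming independence."""
    return beta ** 2 * x_forecast_var + noise_forecast_var

beta, phi = 2.0, 0.5
for h in (1, 2, 3):
    x_var = ar1_forecast_var(phi, 1.0, h)
    known_x = output_forecast_var(beta, 0.0, 1.0)       # input known in advance
    forecast_x = output_forecast_var(beta, x_var, 1.0)  # input must be forecast
    print(h,
          round(1.96 * math.sqrt(known_x), 2),          # 95% half-width, known input
          round(1.96 * math.sqrt(forecast_x), 2))       # 95% half-width, forecast input
```

Using a forecast input is still appropriate, but the interval around the output forecast widens with the lead time as the input itself becomes harder to predict.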
 
 
 
Questions: 5. How do we select the lag for the input variable? Is the answer the same as for question #1, applied to lags?
 
 
 
AFS RESPONSE: The initial selection of the lags is done via cross-correlations of the suitably stationary and filtered series, using transfer function identification procedures. Any omitted structure can be identified by examining the cross-correlation between the pre-whitened X and the tentatively identified noise process. Model re-definition using the ACF of the currently identified noise process, appropriately incorporating intervention series, will lead to further re-identification. If you have multiple stochastic input series, you have to be wary of pairwise identification strategies, as they can be, and often are, flawed by cross-correlation between the input noise series. A more correct procedure is to identify the initial structure using a COMMON FILTER, as suggested by Liu and Hanssens. Presumptive pairwise identification that assumes independent, i.e. uncorrelated, input noise structures can have a nasty effect. This is why some packages incorrectly advise their users that transfer function identification is valid only for models with one stochastic input. Not so! Of course, model diagnostic checking can lead in either case to reasonable models.
 
 
 
Questions: 6. In SAS/ETS, there are different methods to estimate the parameters of the model: Maximum Likelihood, Unconditional Least Squares, ... Does anybody know where I could find the algorithms used to compute the estimates with these methods? Or could someone help me identify the reasons why the estimates from those methods sometimes don't converge, so that I could avoid the problem by applying specific conditions to select an estimation method that wouldn't fail?
 
 
 
AFS RESPONSE: These procedures are flawed not because of what they do but because of what they don't do. It is necessary to check the invertibility of the parameters; this is not done by the tools you have referred to.
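An invertibility check on estimated moving-average parameters can be sketched without a root finder by using the step-down (reverse Durbin-Levinson) recursion. The example assumes the Box-Jenkins sign convention, theta(B) = 1 - theta_1*B - ... - theta_q*B^q, and the function names are hypothetical. It is a generic check and is not meant to describe what SAS/ETS or AUTOBOX do internally.

```python
def reflection_coefficients(coeffs):
    """Step-down recursion for the polynomial 1 - c1*B - ... - cq*B^q.

    All roots lie outside the unit circle exactly when every returned
    coefficient is strictly between -1 and 1.
    """
    c = list(coeffs)
    ks = []
    while c:
        k = c[-1]                 # last coefficient at the current order
        ks.append(k)
        if abs(k) >= 1.0:         # already fails; stop before dividing by 1 - k^2
            break
        m = len(c)
        # Reduce the order by one: c_j <- (c_j + k * c_{m-j}) / (1 - k^2)
        c = [(c[j] + k * c[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return ks

def is_invertible(ma_coeffs):
    """True if the MA polynomial has all of its roots outside the unit circle."""
    return all(abs(k) < 1.0 for k in reflection_coefficients(ma_coeffs))

print(is_invertible([0.5]))          # True:  1 - 0.5B
print(is_invertible([1.2]))          # False: 1 - 1.2B
print(is_invertible([1.5, -0.56]))   # True:  (1 - 0.7B)(1 - 0.8B)
print(is_invertible([0.8, 0.3]))     # False: theta_1 + theta_2 > 1
```

The same recursion applied to autoregressive coefficients gives a stationarity check. Estimates that drift toward the boundary of this region during iteration are one plausible source of convergence trouble.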
 
SUMMARY: Please visit http://www.autobox.com for more info on time series analysis, forecasting and more. AUTOBOX is an industrial strength time series package which performs all of the above procedures and more ...... much more. As always, care should be taken to represent relationships parsimoniously and never to believe or develop theory around a purely empirical approach. Remember: all models are wrong, but some models are useful! The bottom line in all of this is that you have to weave together the following four kinds of model structure:

1. the value of historical readings of the Y series (reflects omitted stochastic input series)
2. the effect of contemporary, lead or lag relationships with user-specified input series
3. the effect of omitted deterministic series, which can be proxied with PULSES, LEVEL SHIFTS, SEASONAL PULSES and LOCAL TIME TRENDS
4. the changing aspect of the variance and model parameters.

All of these things are dealt with very aggressively with AUTOBOX. If I can help, please call (215-675-0652).

Dave Reilly
AUTOMATIC FORECASTING SYSTEMS (DEVELOPERS OF AUTOBOX)

Thanks in advance! Any indications on any of the above questions will be greatly appreciated!

Maryse Turcotte
maryse.turcotte@sympatico.ca
 



 
