If you are going to analyze time
series data, perhaps we can be of help. Multiple regression was
originally developed for cross-sectional data, but statisticians and economists have
been applying it (mostly incorrectly) to chronological or longitudinal data
with little regard for the Gaussian assumptions.
Following is a brief introduction
to time series analysis.
Time series = a sequence of
observations taken on a variable or multiple variables at successive points in
time.
Objectives of time series analysis:
1. To understand the structure of
the time series (how it depends on time, itself, and other time series
variables)
2. To forecast/predict future
values of the time series
What is wrong with using
regression for modeling time series?
* Perhaps nothing. The test is
whether the residuals satisfy the regression assumptions: linearity,
homoscedasticity, independence, and (if necessary) normality. It is important
to test for Pulses or one-time unusual values and to either adjust the data or
to incorporate a Pulse Intervention variable to account for the identified
anomaly.
Unusual values can often arise
seasonally, so one has to identify and incorporate Seasonal Intervention
variables (Seasonal Pulses).
Unusual values can also arise at
successive points in time, signaling the need for a Level Shift
Intervention to deal with the proven mean shift in the residuals.
* Often, time series analyzed by
regression suffer from autocorrelated residuals. In practice, positive
autocorrelation seems to occur much more frequently than negative.
* Positively autocorrelated
residuals make regression tests more significant than they should be and
confidence intervals too narrow; negatively autocorrelated residuals do the
reverse.
* In some time series regression
models (for example, when lags of the response are used as predictors and the
errors are still autocorrelated), autocorrelation produces biased estimates, and
the bias cannot be removed no matter how many observations you have.
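As a rough illustration of checking residuals for autocorrelation, the sketch below fits a straight-line trend by least squares and computes two common summaries of the residuals: the lag-1 sample autocorrelation and the Durbin-Watson statistic. The data are simulated (a trend plus AR(1) errors with coefficient 0.8), and the helper names are made up for this example; it is a minimal sketch under those assumptions, not a full diagnostic procedure.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit y = X b by ordinary least squares and return the residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of a residual series."""
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e * e))

def durbin_watson(e):
    """Durbin-Watson statistic: values near 2 indicate little lag-1
    autocorrelation, values well below 2 suggest positive autocorrelation,
    and values well above 2 suggest negative autocorrelation."""
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Simulated data (an assumption for illustration): trend plus AR(1) errors.
rng = np.random.default_rng(0)
n = 200
t = np.arange(n, dtype=float)
errors = np.zeros(n)
for i in range(1, n):
    errors[i] = 0.8 * errors[i - 1] + rng.normal()
y = 5.0 + 0.1 * t + errors

X = np.column_stack([np.ones(n), t])   # intercept and time trend
resid = ols_residuals(X, y)
r1 = lag1_autocorrelation(resid)
dw = durbin_watson(resid)
print(f"lag-1 autocorrelation = {r1:.2f}, Durbin-Watson = {dw:.2f}")
```

Because the simulated errors are positively autocorrelated, the Durbin-Watson value should fall well below 2, which is the situation in which ordinary regression tests look more significant than they should.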
To use regression methods on time
series data, first plot the data over time. Study the plot for evidence of
trend and seasonality. Use numerical tests for autocorrelation, if not apparent
from the plot.
* Trend can be dealt with by using
functions of time as predictors. Sometimes we have multiple trends and the
trick is to identify the beginning and end periods for each of the trends.
* Seasonality can be dealt with by
using seasonal indicators (Seasonal Pulses) as predictors or by allowing specific
auto-dependence or auto-projection such that the historical values Y(t-s),
where s is the seasonal period, are used to predict Y(t).
* Autocorrelation can be dealt
with by using lags of the response variable Y as predictors.
* Run the regression and diagnose
how well the regression assumptions are met.
* The residuals should have
approximately the same variance (homoscedasticity); otherwise some form of
"weighted" analysis might be needed.
* The model form/parameters should
be invariant, i.e. unchanging over time. If not, then we perhaps have too much
data and need to determine at what points in time the model form or parameters
changed.
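The steps above can be sketched in a few lines. The example below builds a design matrix containing an intercept, a linear function of time, seasonal indicators, and the lag-1 response Y(t-1), fits it by least squares, and then compares the residual variance in the first and second halves of the sample as a crude homoscedasticity check. The quarterly series is simulated and the function name `build_design` is hypothetical; treat this as an illustrative sketch rather than a recommended modeling procedure.

```python
import numpy as np

def build_design(y, period):
    """Return (X, target) where X holds an intercept, a linear time trend,
    period - 1 seasonal indicators, and the lag-1 response Y(t-1)."""
    n = len(y)
    t = np.arange(1, n)                       # rows t = 1 .. n-1, since Y(t-1) is needed
    cols = [np.ones(n - 1), t.astype(float)]  # intercept and trend
    for s in range(1, period):                # season 0 is the baseline
        cols.append((t % period == s).astype(float))
    cols.append(y[:-1])                       # lagged response as a predictor
    return np.column_stack(cols), y[1:]

# Simulated quarterly data (an assumption for illustration).
rng = np.random.default_rng(1)
n, period = 120, 4
season_effect = np.array([0.0, 3.0, -2.0, 1.0])
y = np.zeros(n)
for i in range(1, n):
    y[i] = (2.0 + 0.05 * i + season_effect[i % period]
            + 0.5 * y[i - 1] + rng.normal(scale=0.5))

X, target = build_design(y, period)
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
resid = target - X @ beta

# Crude diagnostic: ratio of residual variances, first half vs second half.
half = len(resid) // 2
variance_ratio = float(resid[:half].var() / resid[half:].var())
print("coefficient on Y(t-1):", round(float(beta[-1]), 2))
print("variance ratio (first half / second half):", round(variance_ratio, 2))
```

A variance ratio far from 1 would hint at non-constant variance, and the same split-sample idea can be extended to compare coefficients across periods when checking whether the model is stable over time.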
1. How to determine the temporal
relationship for each input series, i.e. is the relationship contemporaneous,
lead or lag, or some combination? (How to identify the form of a multi-input
transfer function without assuming independence of the inputs.)
2. How to determine the ARIMA
model for the noise structure reflecting omitted variables.
3. How to do this in a ROBUST
MANNER where pulses, seasonal pulses, level shifts and local time trends are
identified and incorporated.
4. How to test for and include
specific structure to deal with non-constant variance of the error process.
5. How to test for and treat
non-constancy of parameters or model form.
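To make the first question concrete, the sketch below scans candidate lags for a single input series: for each lag k it regresses Y(t) on X(t-k) over a common sample and records the residual sum of squares. This is a deliberately simplified illustration, with simulated data and a hypothetical helper `scan_lags`. It handles one input only and ignores the noise structure and any correlation among inputs, which are exactly the complications the list above raises.

```python
import numpy as np

def scan_lags(y, x, max_lag):
    """For each lag k in 0..max_lag, regress Y(t) on X(t-k) using the same
    observations of Y, and return {k: residual sum of squares}."""
    n = len(y)
    yy = y[max_lag:]                               # common sample for every lag
    rss = {}
    for k in range(max_lag + 1):
        xx = x[max_lag - k: n - k]                 # X(t-k) aligned with yy
        X = np.column_stack([np.ones(len(yy)), xx])
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        rss[k] = float(np.sum((yy - X @ beta) ** 2))
    return rss

# Simulated data (an assumption for illustration): Y responds to X two periods later.
rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.0 + rng.normal(scale=0.5, size=n)
y[2:] += 2.0 * x[:-2]                              # Y(t) = 1 + 2*X(t-2) + noise

rss = scan_lags(y, x, max_lag=5)
best_lag = min(rss, key=rss.get)
print("residual sum of squares by lag:", {k: round(v, 1) for k, v in rss.items()})
print("lag with the smallest residual sum of squares:", best_lag)
```

In this simulation the input is white noise, so the scan is unambiguous. With autocorrelated or mutually correlated inputs, a simple scan like this can mislead, which is why the identification problem is listed as a challenge.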
Time series data presents a number
of problems/opportunities that standard statistical packages either avoid or
ignore.
Contrasting the Modelling Opportunities

| Bullet Point | Opportunity | Cross-Sectional Data (independent values) | Time Series Data (dependent values) |
|---|---|---|---|
| #1 | DETERMINATION OF CORRECT TEMPORAL RELATIONSHIP | Y(I)=B0 + B1*X(I) (THERE IS NO T) | Y(T)= B0 + B1*X(T) or Y(T)= B0 + B1*X(T-1) or .... |
| #2 | DETECTION OF EFFECT OF OMITTED VARIABLES | DOESN'T APPLY | ARIMA MODEL FIXUP PROXYING OMITTED VARIABLES |
| #3 | EFFECT OF OUTLIERS | | |
| | PULSES | ROBUST REGRESSION | OUTLIER DETECTION (0,0,0,1,0,0,...) |
| | SEASONAL PULSES | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,0,0,0,1,...) |
| | LEVEL OR STEP SHIFT | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,1,1,1,1,1,...) |
| | LOCAL TIME TRENDS | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,2,3,4,5,6,...) |
| #4 | EFFECT OF NON-CONSTANT VARIANCE | | |
| | POWER TRANSFORMATIONS | TRANSFORMATIONS (LOGS,RECIPROCALS) | TRANSFORMATIONS (LOGS,RECIPROCALS) |
| | REGIME CHANGES IN VARIANCE | WEIGHTED REGRESSION (KNOWN WEIGHTS) | WEIGHTED REGRESSION (KNOWN OR UNKNOWN WEIGHTS) |
| #5 | EFFECT OF NON-CONSTANT PARAMETERS OR MODEL | CHOW TEST FOR CLASSIFICATION | CHOW TEST FOR CLASSIFICATION |
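The indicator patterns listed for pulses, seasonal pulses, level shifts and local time trends can be written out as regression columns. The sketch below generates each pattern for a short series. The function names, the series length of 9, the start position, and the seasonal period of 4 are all assumptions chosen to reproduce the patterns shown in the table. In practice the hard part is detecting where such effects begin, which this example does not attempt.

```python
import numpy as np

def pulse(n, at):
    """One-time unusual value: 1 at position `at`, 0 elsewhere."""
    v = np.zeros(n, dtype=int)
    v[at] = 1
    return v

def seasonal_pulse(n, start, period):
    """Recurring unusual value: 1 at `start` and every `period` steps after."""
    v = np.zeros(n, dtype=int)
    v[start::period] = 1
    return v

def level_shift(n, start):
    """Step change in the mean: 0 before `start`, 1 from `start` onward."""
    v = np.zeros(n, dtype=int)
    v[start:] = 1
    return v

def local_time_trend(n, start):
    """Trend beginning at `start`: 0 before it, then 1, 2, 3, ..."""
    v = np.zeros(n, dtype=int)
    v[start:] = np.arange(1, n - start + 1)
    return v

n, start = 9, 3   # positions are 0-indexed, so the effect begins at the 4th point
print(pulse(n, start).tolist())               # [0, 0, 0, 1, 0, 0, 0, 0, 0]
print(seasonal_pulse(n, start, 4).tolist())   # [0, 0, 0, 1, 0, 0, 0, 1, 0]
print(level_shift(n, start).tolist())         # [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(local_time_trend(n, start).tolist())    # [0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Each vector can be appended as an extra column of a regression design matrix, so that the fitted coefficient estimates the size of the pulse, seasonal effect, shift, or trend slope.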