Regression and Lies My Mother Never Told Me

For starters...

If you are going to analyze time series data, perhaps we can be of help. Multiple regression was originally developed for cross-sectional data, but statisticians and economists have been applying it (mostly incorrectly) to chronological or longitudinal data with little regard for the Gaussian assumptions.

What follows is a brief introduction to time series analysis.

Time series = a sequence of observations taken on a variable or multiple variables at successive points in time.

Objectives of time series analysis:

1. To understand the structure of the time series (how it depends on time, itself, and other time series variables)

2. To forecast/predict future values of the time series

What is wrong with using regression for modeling time series?

* Perhaps nothing. The test is whether the residuals satisfy the regression assumptions: linearity, homoscedasticity, independence, and (if necessary) normality. It is important to test for Pulses (one-time unusual values) and either to adjust the data or to incorporate a Pulse Intervention variable to account for the identified anomaly.

Unusual values often arise seasonally, so one has to identify and incorporate Seasonal Intervention variables.

Unusual values also often arise at successive points in time, signaling the need for a Level Shift Intervention to deal with the demonstrated mean shift in the residuals.

* Often, time series analyzed by regression suffer from autocorrelated residuals. In practice, positive autocorrelation seems to occur much more frequently than negative.

* Positively autocorrelated residuals make regression tests more significant than they should be and confidence intervals too narrow; negatively autocorrelated residuals do the reverse.

* In some time series regression models (for example, those that use lagged values of Y as predictors while the errors remain autocorrelated), autocorrelation produces biased estimates, and the bias does not go away no matter how many observations you have.
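One common numerical check for the autocorrelated residuals described above is the Durbin-Watson statistic: the sum of squared successive differences of the residuals divided by their sum of squares. It sits near 2 when residuals are uncorrelated and drops toward 0 under positive autocorrelation. The sketch below is only an illustration under stated assumptions: the simulated series, the AR(1) coefficient of 0.8, and all function names are hypothetical and not taken from the text.

```python
import random

def fit_line(t, y):
    """Ordinary least squares for y = b0 + b1*t; returns (b0, b1)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    return y_bar - b1 * t_bar, b1

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

def simulate(n, phi, seed):
    """Linear trend plus AR(1) noise with coefficient phi (made-up data)."""
    rng = random.Random(seed)
    noise, e = [], 0.0
    for _ in range(n):
        e = phi * e + rng.gauss(0.0, 1.0)
        noise.append(e)
    t = list(range(n))
    y = [2.0 + 0.1 * ti + ei for ti, ei in zip(t, noise)]
    return t, y

def residual_dw(phi, n=500, seed=1):
    """Regress the simulated series on time and return the residual DW."""
    t, y = simulate(n, phi, seed)
    b0, b1 = fit_line(t, y)
    resid = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]
    return durbin_watson(resid)

dw_independent = residual_dw(phi=0.0)  # independent errors: expect a value near 2
dw_positive = residual_dw(phi=0.8)     # positively autocorrelated: expect well below 2
print(round(dw_independent, 2), round(dw_positive, 2))
```

A value well below 2 from a regression on time alone is a hint that the usual significance tests and confidence intervals from that regression are too optimistic.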

To use regression methods on time series data, first plot the data over time. Study the plot for evidence of trend and seasonality. If autocorrelation is not apparent from the plot, use numerical tests to check for it.

* Trend can be dealt with by using functions of time as predictors. Sometimes we have multiple trends and the trick is to identify the beginning and end periods for each of the trends.

* Seasonality can be dealt with by using seasonal indicators (Seasonal Pulses) as predictors, or by allowing specific auto-dependence (auto-projection) such that the historical values Y(t-s), where s is the seasonal period, are used to predict Y(t).

* Autocorrelation can be dealt with by using lags of the response variable Y as predictors.

* Run the regression and diagnose how well the regression assumptions are met.

* The residuals should have approximately the same variance (homoscedasticity); otherwise some form of "weighted" analysis might be needed.

* The model form and parameters should be invariant, i.e. unchanging over time. If they are not, we perhaps have too much data and need to determine at what points in time the model form or parameters changed.
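The steps above (a function of time for trend, seasonal indicators for seasonality, and a lag of Y for autocorrelation) can be sketched as a single least-squares fit. Everything here is an assumption made for illustration: the quarterly series is simulated, the coefficient values are made up, and the small `solve`/`ols` helpers merely stand in for whatever regression routine your statistics package provides.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up quarterly series: trend + a bump every 4th period + dependence on Y(t-1).
rng = random.Random(7)
n, period = 400, 4
y = [10.0]
for t in range(1, n):
    season = 3.0 if t % period == 0 else 0.0
    y.append(2.0 + 0.05 * t + season + 0.6 * y[t - 1] + rng.gauss(0.0, 1.0))

# Design matrix columns: intercept, time trend, seasonal indicators, lag of Y.
rows, target = [], []
for t in range(1, n):
    # period - 1 indicators; the last season is the baseline.
    indicators = [1.0 if t % period == q else 0.0 for q in range(period - 1)]
    rows.append([1.0, float(t)] + indicators + [y[t - 1]])
    target.append(y[t])

beta = ols(rows, target)
trend_coef, season_coef, lag_coef = beta[1], beta[2], beta[-1]
print("trend:", round(trend_coef, 3),
      "seasonal bump:", round(season_coef, 3),
      "lag of Y:", round(lag_coef, 3))
```

After a fit like this, the residuals still need to be checked against the assumptions listed above before the coefficients or their tests are trusted.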

Problems and Opportunities

* 1. How to determine the temporal relationship for each input series, i.e. is the relationship contemporaneous, lead, lag, or some combination? (How to identify the form of a multi-input transfer function without assuming independence of the inputs.)

* 2. How to determine the ARIMA model for the noise structure reflecting omitted variables.

* 3. How to do this in a ROBUST MANNER where pulses, seasonal pulses, level shifts and local time trends are identified and incorporated.

* 4. How to test for and include specific structure to deal with non-constant variance of the error process.

* 5. How to test for and treat non-constancy of parameters or model form.

Time series data present a number of problems and opportunities that standard statistical packages either avoid or ignore.
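For item 5, one classical check is the Chow test, which compares the residual sum of squares of one pooled regression against the sum from two regressions fitted on either side of a candidate break point. The sketch below assumes the break point is already known (finding it is the harder problem the list refers to), uses a simple straight-line model, and runs on simulated data; the function names and the size of the shift are hypothetical.

```python
import random

def sse_line(x, y):
    """Residual sum of squares from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x, y, split, k=2):
    """Chow F statistic for a break at index `split`; k = parameters per regime."""
    pooled = sse_line(x, y)
    separate = sse_line(x[:split], y[:split]) + sse_line(x[split:], y[split:])
    dof = len(x) - 2 * k
    return ((pooled - separate) / k) / (separate / dof)

rng = random.Random(3)
x = [float(t) for t in range(100)]
stable = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
# Made-up break: the intercept jumps by 15 from observation 50 onward.
shifted = [yi + (15.0 if i >= 50 else 0.0) for i, yi in enumerate(stable)]

f_stable = chow_f(x, stable, split=50)
f_shifted = chow_f(x, shifted, split=50)
print(round(f_stable, 2), round(f_shifted, 2))
```

The statistic is compared with an F distribution on k and n - 2k degrees of freedom. Note that this simple form assumes independent, constant-variance errors, which is exactly what items 2 through 4 warn may not hold for time series.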

 

 

Contrasting the Modelling Opportunities

 

| Bullet Point | Opportunity | Cross-Sectional Data (independent values) | Time Series Data (dependent values) |
|---|---|---|---|
| #1 | DETERMINATION OF CORRECT TEMPORAL RELATIONSHIP | Y(I) = B0 + B1 * X(I) (THERE IS NO T) | Y(T) = B0 + B1 * X(T) or Y(T) = B0 + B1 * X(T-1) or .... |
| #2 | DETECTION OF EFFECT OF OMITTED VARIABLES | DOESN'T APPLY | ARIMA MODEL FIXUP PROXYING OMITTED VARIABLES |
| #3 | EFFECT OF OUTLIERS | | |
| | PULSES | ROBUST REGRESSION | OUTLIER DETECTION (0,0,0,1,0,0,...) |
| | SEASONAL PULSES | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,0,0,0,1,...) |
| | LEVEL OR STEP SHIFT | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,1,1,1,1,1,...) |
| | LOCAL TIME TRENDS | DOESN'T APPLY | OUTLIER DETECTION (0,0,0,1,2,3,4,5,6,...) |
| #4 | EFFECT OF NON-CONSTANT VARIANCE | | |
| | POWER TRANSFORMATIONS | TRANSFORMATIONS (LOGS, RECIPROCALS) | TRANSFORMATIONS (LOGS, RECIPROCALS) |
| | REGIME CHANGES IN VARIANCE | WEIGHTED REGRESSION (KNOWN WEIGHTS) | WEIGHTED REGRESSION (KNOWN OR UNKNOWN WEIGHTS) |
| #5 | EFFECT OF NON-CONSTANT PARAMETERS OR MODEL | CHOW TEST FOR CLASSIFICATION | CHOW TEST FOR CLASSIFICATION |
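The 0/1 patterns listed for the four time series outlier types are deterministic regressors that can be added to a model once the starting period is known. A minimal sketch of how each pattern could be generated follows; the function names, the series length of 9, the start index of 3, and the seasonal period of 4 are assumptions chosen to reproduce the patterns shown, and detecting where an intervention starts is a separate problem not covered here.

```python
def pulse(n, start):
    """One-time anomaly: 1 at `start`, 0 elsewhere."""
    return [1 if t == start else 0 for t in range(n)]

def seasonal_pulse(n, start, period):
    """Anomaly recurring every `period` observations from `start` onward."""
    return [1 if t >= start and (t - start) % period == 0 else 0
            for t in range(n)]

def level_shift(n, start):
    """Permanent change in mean: 0 before `start`, 1 from `start` onward."""
    return [1 if t >= start else 0 for t in range(n)]

def local_time_trend(n, start):
    """Trend that begins at `start`: 0 before, then 1, 2, 3, ..."""
    return [max(0, t - start + 1) for t in range(n)]

n, start = 9, 3
print(pulse(n, start))              # [0, 0, 0, 1, 0, 0, 0, 0, 0]
print(seasonal_pulse(n, start, 4))  # [0, 0, 0, 1, 0, 0, 0, 1, 0]
print(level_shift(n, start))        # [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(local_time_trend(n, start))   # [0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Each list can be used as an extra predictor column alongside the other regressors.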