QUESTION:

Could someone give me a brief explanation of autocorrelation and maybe a formula? I was satisfied with
averaging until I started reading this group. Thanks.

ANSWER:

Autocorrelation is, as the name suggests, the correlation of a series with itself. A sequence of equally
spaced readings is called a time series; it is also called longitudinal data. When dealing with a time
series, we are concerned that the mean may not be representative or, restated, that the mean may not be
the expected value of the next reading. The mean is equal to the expected value if the random variable
being analyzed follows the model X(t) = u + A(t), where u is a constant and A(t) is NIID (normally,
independently and identically distributed), i.e. Gaussian noise. If, for example,
X(t) = u + 0.7 * X(t-1) + A(t) is more appropriate for the data, then the expected value of X(t) given
the previous reading is u + 0.7 * X(t-1), which is no longer the mean. The way to find out whether this
might be the correct model is to lag the observed series by one period to create a new series called Z,
and to compute the simple correlation coefficient between X and Z. This simple cross-correlation
coefficient between X and its lagged copy is, by definition, the lag-1 autocorrelation coefficient.
Autocorrelation coefficients for other lags are computed the same way.
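
Since the question asked for a formula, here is a minimal sketch of the lag-and-correlate recipe above. The
usual textbook estimate at lag k is r(k) = sum over t = k+1..n of (X(t) - Xbar) * (X(t-k) - Xbar), divided
by the sum over t = 1..n of (X(t) - Xbar)^2. The function names, the use of NumPy, and the simulated series
(u = 10, coefficient 0.7, seed 0) are assumptions made for illustration only; they are not taken from
AUTOBOX.

```python
import numpy as np

def lag_correlation(x, lag=1):
    """Simple correlation between X(t) and Z(t) = X(t - lag)."""
    x = np.asarray(x, dtype=float)
    current = x[lag:]   # X(t)
    lagged = x[:-lag]   # Z(t): the same series shifted back by `lag` periods
    return np.corrcoef(current, lagged)[0, 1]

def sample_autocorrelation(x, lag=1):
    """Textbook estimate: lagged cross-products over the total sum of squares."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()    # deviations from the overall mean
    return np.sum(d[lag:] * d[:-lag]) / np.sum(d * d)

def simulate_ar1(n=20000, u=10.0, phi=0.7, seed=0):
    """Hypothetical test data: X(t) = u + phi * X(t-1) + A(t), A(t) Gaussian."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)
    x = np.empty(n)
    x[0] = u / (1.0 - phi)  # start at the long-run mean of the process
    for t in range(1, n):
        x[t] = u + phi * x[t - 1] + a[t]
    return x

x = simulate_ar1()
print("lag-1 via lag-and-correlate:", lag_correlation(x, 1))
print("lag-1 via textbook formula: ", sample_autocorrelation(x, 1))
```

On a long series generated this way both numbers should come out near 0.7. The two estimates differ
slightly in short series because the first uses separate means and variances for X and Z, while the
second uses the overall mean and the full sum of squares.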

The bottom line is that the autocorrelation coefficient measures the unconditional correlation between the
series and its lagged copy. The partial autocorrelation coefficient, by contrast, measures the conditional
relationship between the two series, that is, the relationship that remains after allowing for the
intermediate lags. This is the same as the partial correlation coefficient. One would compute the partial
for lag 2 by estimating a multiple regression with two input series: the first is the series lagged 1
period and the second is the series lagged 2 periods. The coefficient for the second series is the
partial autocorrelation, or partial correlation, due to lag 2.

Box and Jenkins used these concepts in their work on identifying model forms. Please see our Web site
(http://www.autobox.com) for more on autocorrelation. AUTOBOX uses the sample autocorrelation and sample
partial autocorrelation to identify an initial model and then refine it. Like any other estimate based on
minimizing an error sum of squares, these coefficients are not robust to outliers, be they pulses, level
shifts, seasonal pulses or local time trends; thus the need for intervention detection, which provides
more robust (healthy) estimates. These are things that we have been concerned with in our work.
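
The regression recipe for the partial autocorrelation described above can be sketched as follows. This is
only an illustration under stated assumptions: the function names are hypothetical, ordinary least squares
is done with NumPy's lstsq, an intercept is included in the regression, and the data are simulated from
X(t) = u + 0.7 * X(t-1) + A(t). It is not how AUTOBOX itself is implemented.

```python
import numpy as np

def partial_autocorrelation_by_regression(x, lag=2):
    """Regress X(t) on X(t-1), ..., X(t-lag); return the coefficient on X(t-lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x[lag:]                             # X(t) for t = lag .. n-1
    columns = [np.ones(n - lag)]            # intercept (an assumption of this sketch)
    for k in range(1, lag + 1):
        columns.append(x[lag - k : n - k])  # the series lagged k periods
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[-1]                         # coefficient of the longest lag

def simulate_ar1(n=20000, u=10.0, phi=0.7, seed=0):
    """Hypothetical test data: X(t) = u + phi * X(t-1) + A(t), A(t) Gaussian."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)
    x = np.empty(n)
    x[0] = u / (1.0 - phi)
    for t in range(1, n):
        x[t] = u + phi * x[t - 1] + a[t]
    return x

x = simulate_ar1()
print("partial at lag 1:", partial_autocorrelation_by_regression(x, 1))
print("partial at lag 2:", partial_autocorrelation_by_regression(x, 2))
```

For a series that really follows the lag-1 model, the lag-1 partial should land near 0.7 and the lag-2
partial near zero, because lag 2 adds nothing once lag 1 is accounted for. That cut-off pattern is the
kind of signature used when identifying a model form.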