QUESTION: We often need to test whether two or more time series are statistically different (in mean or distribution), in the sense of a t-test, U-test or ANOVA. The problem, however, is that the data come from time series, which may be autocorrelated. The measurements within a series are therefore not independent, yet the standard tests require this independence. Is there a test available that, e.g., takes care of the autocorrelation? Or is it possible to "eliminate" the autocorrelation by ARIMA modeling? We build an ARIMA model and then test using the time series means and the residuals of the ARIMA model. Is this allowed? All hints and suggestions are highly appreciated!

ANSWER: The problem you refer to is generally known as pooled cross-sectional time series analysis. It is possible to estimate one set of parameters under the null hypothesis and to compare the resulting error sum of squares with the locally estimated error sums of squares. We were asked by a client to incorporate this feature into AUTOBOX. Please visit the web page http://www.autobox.com and download a copy. Specifically, AUTOBOX allows you to test the hypothesis that the ARIMA models from K groups are equal; the test is also extended to Transfer Functions.

Consider the case where you have n distinct time series (maximum of 3) and you wish to test the hypothesis that the individual ARIMA models are equal to each other against the alternative that at least one model differs from the rest. This requires that one model be specified for all n series, and that parameter estimation be done locally and compared to a global, or generic, set of coefficients. A STARTING MODEL MUST EXIST, as this model will be used. If AUTOMATIC MODELING IS DISABLED and this answer is greater than one (1), the program will:

1. disable all model modification options (sufficiency, necessity, etc.);
2. expect the time series to be a concatenated series of the n distinct time series, and estimate parameters without using the last set of group i to predict the start of group i+1, where i goes from 1 to n-1 (maximum of 3 groups).

Hypothesis testing is done by summing the error sums of squares from the n local estimations (done separately) and dividing by the total degrees of freedom to obtain a denominator mean square error. The numerator mean square error is the differential error sum of squares (the composite estimation less the sum of the locals, divided by the number of groups); see JOHNSTON: ECONOMETRIC METHODS, 1963, page 137. Following is an example of the approach.
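Before the AUTOBOX printout, the same Chow-type comparison can be sketched in ordinary code. Everything below is an illustrative assumption, not AUTOBOX output: it uses a plain AR(1) model fitted by least squares instead of a full ARIMA model, simulated series instead of real data, and the textbook numerator degrees of freedom (g-1)*k.

```python
# Sketch of the pooled (Chow-type) F-test under the assumptions above:
# fit each group locally, fit one common coefficient under H0, and
# compare the restricted and unrestricted sums of squared residuals.
import random

def fit_ar1_sse(series):
    """Least-squares AR(1) fit; returns (phi_hat, sum of squared residuals)."""
    x, y = series[:-1], series[1:]          # lagged and current values
    phi = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return phi, sum((b - phi * a) ** 2 for a, b in zip(x, y))

def pooled_f_test(groups, k=1):
    """F statistic for H0: all groups share one AR(1) coefficient.
    Numerator df = (g-1)*k, denominator df = N - g*k, where N is the
    total number of regression observations (one lost per group to the lag)."""
    g = len(groups)
    n_obs = sum(len(s) - 1 for s in groups)
    q2 = sum(fit_ar1_sse(s)[1] for s in groups)          # sum of local SSEs
    # Restricted fit: a single common phi estimated from pooled moments.
    num = sum(sum(a * b for a, b in zip(s[:-1], s[1:])) for s in groups)
    den = sum(sum(a * a for a in s[:-1]) for s in groups)
    phi0 = num / den
    q1 = sum(sum((b - phi0 * a) ** 2 for a, b in zip(s[:-1], s[1:]))
             for s in groups)                            # restricted SSE
    q3 = q1 - q2
    df1, df2 = (g - 1) * k, n_obs - g * k
    return (q3 / df1) / (q2 / df2), df1, df2

def simulate_ar1(phi, n):
    y, out = 0.0, []
    for _ in range(n):
        y = phi * y + random.gauss(0.0, 1.0)
        out.append(y)
    return out

random.seed(0)
groups = [simulate_ar1(0.6, 256) for _ in range(3)]  # same phi: expect small F
f, df1, df2 = pooled_f_test(groups)
print(f, df1, df2)
```

Because the three simulated groups really do share one coefficient, the resulting F should be an unremarkable draw from F(df1, df2).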
TESTING THE EQUIVALENCE OF MODELS BETWEEN TIME SERIES
(POOLED CROSS-SECTIONAL TIME SERIES)
THERE ARE THREE (3) STATES IN A STUDY. WE HAVE BUILT SEPARATE MODELS FOR EACH OF THE THREE STATES AND ARE INTERESTED IN COMPARING THESE MODELS AND TESTING THE HYPOTHESIS THAT THE MODELS HAVE A SET OF COMMON COEFFICIENTS.

FOLLOWING IS THE MODEL FOR NEW JERSEY.
Estimation/Diagnostic Checking for Variable Y = NJ

Number of Residuals (R)       = n                         256
Number of Degrees of Freedom  = n-m                       253
Residual Mean                 = Sum R / n                 720.571
Sum of Squares                = Sum R**2                  .277010E+11
Variance                      var = Sum R**2 / n          .108207E+09
Adjusted Variance             = Sum R**2 / (n-m)          .109490E+09
Standard Deviation            = sqrt(Adj Variance)        10463.8
Standard Error of the Mean    = Std Dev / sqrt(n-m)       657.851
Mean / its Standard Error                                 1.09534
Mean Absolute Deviation       = Sum |R| / n               8218.96
AIC Value  (Uses var)         = n*ln(var) + 2m            4741.89
SBC Value  (Uses var)         = n*ln(var) + m*ln(n)       4752.52
BIC Value  (Uses var)         = see Wei p153              3727.69
R Square                      = 1 - [Sum R**2 / Sum (A - mean A)**2]   .832228

THE ESTIMATED MODEL PARAMETERS

 #  MODEL COMPONENT            LAG (BOP)  COEFFICIENT   STANDARD ERROR  T-RATIO

    Lambda Value                           1.000000
    Differencing                   1
 1  Autoregressive-Factor # 1      1       -.6051146     .611033E-01    -9.903
 2                                 2       -.3265066     .689884E-01    -4.733
 3                                 3       -.2133137     .612056E-01    -3.485

[(1-B**1)]Y(T) = + A(T)[(1+ .605B+ .327B**2+ .213B**3)]**-1
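The diagnostics above are linked by simple identities (variance is the sum of squares over n, AIC is n*ln(var) + 2m, and so on). As a quick sanity check, the New Jersey figures can be reproduced from the printed sum of squares alone; the script below is only an arithmetic verification, not program output.

```python
# Reproduce the printed NJ diagnostics from n, m and the residual sum of squares.
import math

n, m = 256, 3
ss = 0.277010e11                            # Sum R**2 as printed
var = ss / n                                # .108207E+09
adj_var = ss / (n - m)                      # .109490E+09
sd = math.sqrt(adj_var)                     # 10463.8
sem = sd / math.sqrt(n - m)                 # 657.851
aic = n * math.log(var) + 2 * m             # 4741.89
sbc = n * math.log(var) + m * math.log(n)   # 4752.52
print(var, adj_var, sd, sem, aic, sbc)
```

Each computed value matches the printout to the displayed precision, which confirms the reconstructed formula column.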
FOLLOWING IS THE MODEL FOR NEW YORK.

Estimation/Diagnostic Checking for Variable Y = NY

Number of Residuals (R)       = n                         256
Number of Degrees of Freedom  = n-m                       253
Residual Mean                 = Sum R / n                 765.901
Sum of Squares                = Sum R**2                  .341596E+11
Variance                      var = Sum R**2 / n          .133436E+09
Adjusted Variance             = Sum R**2 / (n-m)          .135018E+09
Standard Deviation            = sqrt(Adj Variance)        11619.7
Standard Error of the Mean    = Std Dev / sqrt(n-m)       730.526
Mean / its Standard Error                                 1.04842
Mean Absolute Deviation       = Sum |R| / n               8882.33
AIC Value  (Uses var)         = n*ln(var) + 2m            4795.54
SBC Value  (Uses var)         = n*ln(var) + m*ln(n)       4806.17
BIC Value  (Uses var)         = see Wei p153              3862.24
R Square                      = 1 - [Sum R**2 / Sum (A - mean A)**2]   .871268

THE ESTIMATED MODEL PARAMETERS

 #  MODEL COMPONENT            LAG (BOP)  COEFFICIENT   STANDARD ERROR  T-RATIO

    Lambda Value                           1.000000
    Differencing                   1
 1  Autoregressive-Factor # 1      1       -.5617603     .621071E-01    -9.045
 2                                 2       -.3516715     .677407E-01    -5.191
 3                                 3       -.1608633     .620052E-01    -2.594

[(1-B**1)]Y(T) = + A(T)[(1+ .562B+ .352B**2+ .161B**3)]**-1
AND OUR THIRD STATE, PENNSYLVANIA.

Estimation/Diagnostic Checking for Variable Y = PA

Number of Residuals (R)       = n                         256
Number of Degrees of Freedom  = n-m                       253
Residual Mean                 = Sum R / n                 1132.07
Sum of Squares                = Sum R**2                  .415012E+11
Variance                      var = Sum R**2 / n          .162114E+09
Adjusted Variance             = Sum R**2 / (n-m)          .164036E+09
Standard Deviation            = sqrt(Adj Variance)        12807.7
Standard Error of the Mean    = Std Dev / sqrt(n-m)       805.211
Mean / its Standard Error                                 1.40593
Mean Absolute Deviation       = Sum |R| / n               9981.22
AIC Value  (Uses var)         = n*ln(var) + 2m            4845.38
SBC Value  (Uses var)         = n*ln(var) + m*ln(n)       4856.01
BIC Value  (Uses var)         = see Wei p153              3786.36
R Square                      = 1 - [Sum R**2 / Sum (A - mean A)**2]   .806812

THE ESTIMATED MODEL PARAMETERS

 #  MODEL COMPONENT            LAG (BOP)  COEFFICIENT   STANDARD ERROR  T-RATIO

    Lambda Value                           1.000000
    Differencing                   1
 1  Autoregressive-Factor # 1      1       -.7578220     .613060E-01    -12.36
 2                                 2       -.3965126     .733749E-01    -5.404
 3                                 3       -.1984544     .613150E-01    -3.237

[(1-B**1)]Y(T) = + A(T)[(1+ .758B+ .397B**2+ .198B**3)]**-1
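Backshift equations such as the Pennsylvania model above unwind into an ordinary recursion: difference the series, and each differenced value W(T) then satisfies (1+ .758B+ .397B**2+ .198B**3) W(T) = A(T), i.e. W(T) = -.758 W(T-1) - .397 W(T-2) - .198 W(T-3) + A(T). A minimal illustrative simulation of that recursion follows; the unit-variance noise, zero starting values, and series length are assumptions, not part of the fitted model.

```python
# Simulate one realization of the PA-style model:
# difference obeys an AR(3) recursion, level is the cumulative sum.
import random

random.seed(1)
phi = [-0.758, -0.397, -0.198]   # AR weights implied by (1+.758B+.397B**2+.198B**3)
w = [0.0, 0.0, 0.0]              # differenced series, zero startup values (assumed)
y = [0.0]                        # level series, arbitrary starting level
for _ in range(260):
    a = random.gauss(0.0, 1.0)   # white noise A(T), unit variance (assumed)
    w_t = phi[0] * w[-1] + phi[1] * w[-2] + phi[2] * w[-3] + a
    w.append(w_t)
    y.append(y[-1] + w_t)        # undo the differencing: Y(T) = Y(T-1) + W(T)
print(len(y))
```

Because the AR polynomial has all roots outside the unit circle, the differenced series stays bounded and the level series behaves like a correlated random walk.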
TO TEST THE HYPOTHESIS OF A COMMON SET OF COEFFICIENTS OR PARAMETERS, WE NOW SHOW THE RESULT OF ESTIMATION WHERE THE DATA FROM ALL THREE STATES ARE POOLED AND USED COLLECTIVELY.
Estimation/Diagnostic Checking UNDER THE NULL HYPOTHESIS

Number of Residuals (R)       = n                         776
Number of Degrees of Freedom  = n-m                       773
Residual Mean                 = Sum R / n                 863.112
Sum of Squares                = Sum R**2                  0.104376E+12
Variance                      var = Sum R**2 / n          0.134505E+09
Adjusted Variance             = Sum R**2 / (n-m)          0.135027E+09
Standard Deviation            = sqrt(Adj Variance)        11620.1
Standard Error of the Mean    = Std Dev / sqrt(n-m)       417.946
Mean / its Standard Error                                 2.06513
Mean Absolute Deviation       = Sum |R| / n               8964.92
AIC Value  (Uses var)         = n*ln(var) + 2m            14530.5
SBC Value  (Uses var)         = n*ln(var) + m*ln(n)       14544.4
BIC Value  (Uses var)         = see Wei p153              10695.1
R Square                      = 1 - [Sum R**2 / Sum (A - mean A)**2]   0.847911

THE ESTIMATED MODEL PARAMETERS

 #  MODEL COMPONENT            LAG (BOP)  COEFFICIENT   STANDARD ERROR  T-RATIO

    Lambda Value                           1.000000
    Differencing                   1
 1  Autoregressive-Factor # 1      1      -0.6563098    0.353132E-01    -18.59
 2                                 2      -0.3574422    0.404579E-01    -8.835
 3                                 3      -0.1929096    0.353165E-01    -5.462

[(1-B**1)]Y(T) = + A(T)[(1+ 0.656B+ 0.357B**2+ 0.192B**3)]**-1
MODEL SUMMARY

NJ      [(1-B**1)]Y(T) = + A(T)[(1+ .605B+ .327B**2+ .213B**3)]**-1
        Adj Variance = Sum R**2/(n-m) = .109490E+09    Sum R**2 = .277010E+11

NY      [(1-B**1)]Y(T) = + A(T)[(1+ .562B+ .352B**2+ .161B**3)]**-1
        Adj Variance = Sum R**2/(n-m) = .135018E+09    Sum R**2 = .341596E+11

PA      [(1-B**1)]Y(T) = + A(T)[(1+ .758B+ .397B**2+ .198B**3)]**-1
        Adj Variance = Sum R**2/(n-m) = .164036E+09    Sum R**2 = .415012E+11

GLOBAL  [(1-B**1)]Y(T) = + A(T)[(1+ 0.656B+ 0.357B**2+ 0.192B**3)]**-1
        Adj Variance = Sum R**2/(n-m) = .135027E+09    Sum R**2 = .104376E+12
We have 3 groups, k=3 coefficients in the model, and m observations in group 1, n observations in group 2, and o observations in group 3. In our example m = n = o = 260.

F = [ Q3/k ] / [ Q2/(m+n+o-3k) ]

1. Pool all observations (m+n+o, where m = 260, n = 260 and o = 260) and
   compute the sum of squared residuals from the common model:

   Q1 = .104376E+12

2. Carry out the model estimation locally and total the sums of squared
   residuals to obtain Q2:

   Q2 = .277010E+11 + .341596E+11 + .415012E+11
   Q2 = .103362E+12

3. Compute Q3 = Q1 - Q2 and hence compute F as defined above:

   Q3 = .104376E+12 - .103362E+12
   Q3 = .1014E+10

   F  = [ Q3/k ] / [ Q2/(m+n+o-3k) ]
   F  = [ .1014E+10/3 ] / [ .103362E+12/(260+260+260-3*3) ]
   F  = [ .338E+09 ] / [ .134E+09 ]
   F  = 2.52

   With degrees of freedom (k, m+n+o-3k), i.e. (3, 771), the tabular
   F value is approximately 3.78 at alpha = .01.

4. If F exceeds the tabular F, reject the hypothesis of equality of the
   sets of coefficients; otherwise conclude that there is not enough
   evidence to state that they are statistically significantly different
   from each other.

   Since 2.52 is less than 3.78, we conclude that the groups cannot be
   statistically proven to be unequal with respect to their parameters.
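The F ratio can be recomputed directly from the printed sums of squares; the few lines below are only a check of the arithmetic, using the degrees-of-freedom convention (k, m+n+o-3k) from the Johnston-style formula.

```python
# Recompute the pooled-vs-local F ratio from the printed sums of squares.
q1 = 0.104376e12                           # pooled (restricted) SSE
locals_sse = [0.277010e11, 0.341596e11, 0.415012e11]
q2 = sum(locals_sse)                       # sum of the local SSEs
q3 = q1 - q2                               # differential SSE

k, g = 3, 3                                # coefficients per model, groups
n_total = 3 * 260                          # m + n + o
df1, df2 = k, n_total - g * k              # (3, 771)
f = (q3 / df1) / (q2 / df2)
print(f, df1, df2)
```

The recomputed ratio is about 2.5, comfortably below any conventional F critical value at these degrees of freedom, so the equal-coefficients hypothesis is not rejected.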
Consider this! (Motivated by D.Davis@lboro.ac.uk.) One of the major problems with statistical tests that assume independence is the manner in which the degrees of freedom, or "information content", are computed. We now present a rather difficult analysis problem where the number of observations is 4 x 10 x 20 x 50 = 4000, but they are not independent random samples.

(1) Consider 4 subjects or groups, for example 4 individuals.
(2) Consider 10 conditions, i.e. separate experimental conditions.
(3) Consider 20 trials, i.e. repeat samplings using exactly the same conditions.
(4) Consider 50 readings taken at fixed intervals over time.

   (1)       (2)        (3) Trial 1          Trial 2 ... Trial 20
   subject   condition  (4) time 1...time 50
   A          1             ......            ......
   ..         ..            ......            ......
   A         10             ......            ......
   B          1             ......            ......
   ..         ..            ......            ......
   B         10             ......            ......
   C          1             ......            ......
   ..         ..            ......            ......
   C         10             ......            ......
   D          1             ......            ......
   ..         ..            ......            ......
   D         10             ......            ......

To use a 4-way ANCOVA (testing between subjects and conditions, testing within trials, with the 10 conditions nested within the subjects) would be downright silly. The obvious problem is that this is time series data, and there would be a serious overestimation of the degrees of freedom due to the 50 repeat measurements over time, never mind the 20 trials, i.e. repeat samplings of the individual at the same test conditions.
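One standard way to quantify that overestimation: for an AR(1) series with lag-1 autocorrelation rho, the variance of the sample mean behaves as if only about n(1-rho)/(1+rho) independent observations were available (the well-known large-sample approximation). The numbers below are purely illustrative.

```python
# Effective number of independent observations for n autocorrelated
# readings from an AR(1) process with lag-1 autocorrelation rho.
def effective_n(n, rho):
    return n * (1.0 - rho) / (1.0 + rho)

for rho in (0.0, 0.5, 0.8):
    print(rho, effective_n(50, rho))
# rho = 0.0 -> 50 observations' worth of information
# rho = 0.5 -> about 16.7
# rho = 0.8 -> about 5.6
```

So the 50 readings per trial may carry the information of only a handful of independent samples, which is exactly why the naive 4000-observation degrees-of-freedom count is misleading.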