QUESTION:

We often need to test whether two or more time series are statistically different (in mean or in distribution), in the sense of a t-test, U-test or ANOVA. The problem, however, is that the data come from time series that may be autocorrelated. The measurements within a series are therefore not independent, while the standard tests require this independence. Is there a test available that accounts for the autocorrelation? Or is it possible to "eliminate" the autocorrelation by ARIMA modeling? We would build an ARIMA model and then test using the time series means and the residuals of the ARIMA model. Is this allowed? All hints and suggestions are highly appreciated!

ANSWER:

The problem you refer to is generally known as pooled cross-sectional time series analysis. It is possible to estimate one set of parameters under the null hypothesis and to compare the resulting error sum of squares with the locally estimated error sums of squares. We were asked by a client to incorporate this feature into AUTOBOX; please visit http://www.autobox.com and download a copy. Specifically, AUTOBOX allows you to test the hypothesis that the ARIMA models from K groups are equal, and the test extends to Transfer Functions as well.

Consider the case where you have n distinct time series (a maximum of 3) and you wish to test the hypothesis that the individual ARIMA models are equal to each other against the alternative that at least one model differs from the rest. This requires that one model be specified for all n series, and that parameter estimation be done locally and compared with a global, or generic, set of coefficients. A STARTING MODEL MUST EXIST, as this model will be used. If AUTOMATIC MODELING IS DISABLED and this answer is greater than one (1), the program will:

1. disable all model modification options (sufficiency, necessity, etc.);
2. expect the time series to be a concatenated series of the n distinct time series, and estimate parameters without using the last observations of group i to predict the start of group i+1 (maximum of 3 groups).

Hypothesis testing is done by summing the error sums of squares from the n local estimations (done separately) and dividing by the total degrees of freedom to obtain a denominator mean square error. The numerator mean square error is the differential error sum of squares (the composite estimation less the sum of the locals, divided by the number of groups). See JOHNSTON, ECONOMETRIC METHODS, 1963, page 137. Following is an example of the approach.
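As a rough illustration of the pooled-versus-local comparison described above, here is a from-scratch sketch in Python using ordinary least squares on simulated data. This is not AUTOBOX code; the series, coefficients, and function names are all made up for the example, and the F ratio mirrors the Q3/k form used later in this answer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sim_ar(phi, n):
    """Simulate a zero-mean AR(len(phi)) series of length n (burn-in dropped)."""
    p = len(phi)
    w = np.zeros(n + 200)
    for t in range(p, len(w)):
        w[t] = np.dot(phi, w[t - p:t][::-1]) + rng.normal()
    return w[200:]

def lagged(w, p):
    """Design matrix of lags 1..p and the aligned response for an AR(p) fit."""
    X = np.column_stack([w[p - j:len(w) - j] for j in range(1, p + 1)])
    return X, w[p:]

def sse(X, y):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

p = 3
groups = [sim_ar([0.5, 0.2, 0.1], 256) for _ in range(3)]  # 3 hypothetical series
designs = [lagged(w, p) for w in groups]

# Local estimation: one AR(3) per group; the local SSEs sum to Q2.
Q2 = sum(sse(X, y) for X, y in designs)

# Pooled estimation under the null (one common coefficient set) gives Q1.
# Stacking the per-group design matrices avoids using the end of group i
# to predict the start of group i+1.
Q1 = sse(np.vstack([X for X, _ in designs]),
         np.concatenate([y for _, y in designs]))

k, g, N = p, len(groups), sum(len(y) for _, y in designs)
F = ((Q1 - Q2) / k) / (Q2 / (N - g * k))  # compare with the tabular F value
print(F)
```

Because the three simulated series share the same true coefficients, the resulting F should be small; Q1 is always at least Q2, since the pooled fit is a restricted version of the three separate fits.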

 

TESTING THE EQUIVALENCE OF MODELS BETWEEN TIME SERIES

POOLED CROSS-SECTIONAL TIME SERIES

THERE ARE THREE (3) STATES IN A STUDY. WE HAVE BUILT SEPARATE MODELS FOR EACH OF THE THREE STATES AND ARE INTERESTED IN COMPARING THESE MODELS AND TESTING THE HYPOTHESIS THAT THE MODELS HAVE A SET OF COMMON COEFFICIENTS.

FOLLOWING IS THE MODEL FOR NEW JERSEY.

Estimation/Diagnostic Checking for Variable Y = NJ

Number of Residuals (R)        = n                    256
Number of Degrees of Freedom   = n-m                  253
Residual Mean                  = ΣR/n                 720.571
Sum of Squares                 = ΣR²                  .277010E+11
Variance                       var = ΣR²/n            .108207E+09
Adjusted Variance              = ΣR²/(n-m)            .109490E+09
Standard Deviation (σ)         =                      10463.8
Standard Error of the Mean     = σ/√(n-m)             657.851
Mean / its Standard Error      = (ΣR/n)/[σ/√(n-m)]    1.09534
Mean Absolute Deviation        = Σ|R|/n               8218.96
AIC Value (uses var)           = n·ln(var) + 2m       4741.89
SBC Value (uses var)           = n·ln(var) + m·ln(n)  4752.52
BIC Value (uses var)           = see Wei p.153        3727.69
R Square                       = 1 - [ΣR²/Σ(A-Ā)²]    .832228

THE ESTIMATED MODEL PARAMETERS

  MODEL COMPONENT              LAG    COEFFICIENT   STANDARD      T-RATIO
  #                           (BOP)                 ERROR

  Lambda Value                         1.000000
  Differencing                    1
  1 Autoregressive-Factor # 1     1    -.6051146    .611033E-01    -9.903
  2                               2    -.3265066    .689884E-01    -4.733
  3                               3    -.2133137    .612056E-01    -3.485

[(1-B**1)]Y(T) = A(T)[(1 + .605B + .327B**2 + .213B**3)]**-1
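Written out as a recursion, the printed backshift form says: with W(t) = Y(t) - Y(t-1), we have (1 + .605B + .327B**2 + .213B**3) W(t) = A(t), i.e. W(t) = -.605 W(t-1) - .327 W(t-2) - .213 W(t-3) + A(t). A tiny illustrative sketch of that recursion (not AUTOBOX code):

```python
# The NJ model's differenced series W(t) = Y(t) - Y(t-1) as a recursion.
# The coefficient signs flip when the AR polynomial crosses the equals sign.
phi = [-0.605, -0.327, -0.213]

def next_w(w_hist, a):
    """w_hist = [W(t-1), W(t-2), W(t-3)]; a = shock A(t)."""
    return sum(c * w for c, w in zip(phi, w_hist)) + a

# With zero shocks, an initial disturbance in W dies out (stable AR operator):
w = [1.0, 0.0, 0.0]
for _ in range(5):
    w = [next_w(w, 0.0)] + w[:2]
print(abs(w[0]) < 0.5)  # → True: the impulse has decayed
```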

FOLLOWING IS THE MODEL FOR NEW YORK.

Estimation/Diagnostic Checking for Variable Y = NY

Number of Residuals (R)        = n                    256
Number of Degrees of Freedom   = n-m                  253
Residual Mean                  = ΣR/n                 765.901
Sum of Squares                 = ΣR²                  .341596E+11
Variance                       var = ΣR²/n            .133436E+09
Adjusted Variance              = ΣR²/(n-m)            .135018E+09
Standard Deviation (σ)         =                      11619.7
Standard Error of the Mean     = σ/√(n-m)             730.526
Mean / its Standard Error      = (ΣR/n)/[σ/√(n-m)]    1.04842
Mean Absolute Deviation        = Σ|R|/n               8882.33
AIC Value (uses var)           = n·ln(var) + 2m       4795.54
SBC Value (uses var)           = n·ln(var) + m·ln(n)  4806.17
BIC Value (uses var)           = see Wei p.153        3862.24
R Square                       = 1 - [ΣR²/Σ(A-Ā)²]    .871268

THE ESTIMATED MODEL PARAMETERS

  MODEL COMPONENT              LAG    COEFFICIENT   STANDARD      T-RATIO
  #                           (BOP)                 ERROR

  Lambda Value                         1.000000
  Differencing                    1
  1 Autoregressive-Factor # 1     1    -.5617603    .621071E-01    -9.045
  2                               2    -.3516715    .677407E-01    -5.191
  3                               3    -.1608633    .620052E-01    -2.594

[(1-B**1)]Y(T) = A(T)[(1 + .562B + .352B**2 + .161B**3)]**-1

AND OUR THIRD STATE PENNSYLVANIA.

Estimation/Diagnostic Checking for Variable Y = PA

Number of Residuals (R)        = n                    256
Number of Degrees of Freedom   = n-m                  253
Residual Mean                  = ΣR/n                 1132.07
Sum of Squares                 = ΣR²                  .415012E+11
Variance                       var = ΣR²/n            .162114E+09
Adjusted Variance              = ΣR²/(n-m)            .164036E+09
Standard Deviation (σ)         =                      12807.7
Standard Error of the Mean     = σ/√(n-m)             805.211
Mean / its Standard Error      = (ΣR/n)/[σ/√(n-m)]    1.40593
Mean Absolute Deviation        = Σ|R|/n               9981.22
AIC Value (uses var)           = n·ln(var) + 2m       4845.38
SBC Value (uses var)           = n·ln(var) + m·ln(n)  4856.01
BIC Value (uses var)           = see Wei p.153        3786.36
R Square                       = 1 - [ΣR²/Σ(A-Ā)²]    .806812

THE ESTIMATED MODEL PARAMETERS

  MODEL COMPONENT              LAG    COEFFICIENT   STANDARD      T-RATIO
  #                           (BOP)                 ERROR

  Lambda Value                         1.000000
  Differencing                    1
  1 Autoregressive-Factor # 1     1    -.7578220    .613060E-01    -12.36
  2                               2    -.3965126    .733749E-01    -5.404
  3                               3    -.1984544    .613150E-01    -3.237

[(1-B**1)]Y(T) = A(T)[(1 + .758B + .397B**2 + .198B**3)]**-1

TO TEST THE HYPOTHESIS OF A COMMON SET OF COEFFICIENTS OR PARAMETERS, WE NOW SHOW THE RESULT OF ESTIMATION WHERE THE DATA FROM ALL THREE STATES ARE POOLED AND USED COLLECTIVELY.

Estimation/Diagnostic Checking UNDER THE NULL HYPOTHESIS

Number of Residuals (R)        = n                    776
Number of Degrees of Freedom   = n-m                  773
Residual Mean                  = ΣR/n                 863.112
Sum of Squares                 = ΣR²                  0.104376E+12
Variance                       var = ΣR²/n            0.134505E+09
Adjusted Variance              = ΣR²/(n-m)            0.135027E+09
Standard Deviation (σ)         =                      11620.1
Standard Error of the Mean     = σ/√(n-m)             417.946
Mean / its Standard Error      = (ΣR/n)/[σ/√(n-m)]    2.06513
Mean Absolute Deviation        = Σ|R|/n               8964.92
AIC Value (uses var)           = n·ln(var) + 2m       14530.5
SBC Value (uses var)           = n·ln(var) + m·ln(n)  14544.4
BIC Value (uses var)           = see Wei p.153        10695.1
R Square                       = 1 - [ΣR²/Σ(A-Ā)²]    0.847911

THE ESTIMATED MODEL PARAMETERS

  MODEL COMPONENT              LAG    COEFFICIENT    STANDARD       T-RATIO
  #                           (BOP)                  ERROR

  Lambda Value                         1.000000
  Differencing                    1
  1 Autoregressive-Factor # 1     1    -0.6563098    0.353132E-01    -18.59
  2                               2    -0.3574422    0.404579E-01    -8.835
  3                               3    -0.1929096    0.353165E-01    -5.462

[(1-B**1)]Y(T) = A(T)[(1 + 0.656B + 0.357B**2 + 0.192B**3)]**-1

MODEL SUMMARY

NJ      [(1-B**1)]Y(T) = A(T)[(1 + .605B + .327B**2 + .213B**3)]**-1
        σ² = ΣR²/(n-m) = .109490E+09     ΣR² = .277010E+11

NY      [(1-B**1)]Y(T) = A(T)[(1 + .562B + .352B**2 + .161B**3)]**-1
        σ² = ΣR²/(n-m) = .135018E+09     ΣR² = .341596E+11

PA      [(1-B**1)]Y(T) = A(T)[(1 + .758B + .397B**2 + .198B**3)]**-1
        σ² = ΣR²/(n-m) = .164036E+09     ΣR² = .415012E+11

GLOBAL  [(1-B**1)]Y(T) = A(T)[(1 + .656B + .357B**2 + .192B**3)]**-1
        σ² = ΣR²/(n-m) = .135027E+09     ΣR² = .104376E+12

We have 3 groups, k = 3 coefficients in each model, and m observations in group 1, n observations in group 2, and o observations in group 3. In our example m = n = o = 260.

 

F = [ Q3/k ]/ [ Q2/(m+n+o-3k) ]

 

1. Pool all observations (m+n+o), where m = 260, n = 260 and o = 260,
and compute the sum of squared residuals from the pooled model = .10437E+12

 

Q1= .10437E+12

 

2. Carry out model estimation locally and total the sums of squared

residuals to obtain Q2

 

Q2= .277010E+11 + .341596E+11 + .415012E+11

 

Q2= .1034E+12

3. Compute Q3 = Q1 - Q2 and hence compute F as defined above.

Q3 = .1044E+12 - .1034E+12

Q3 = .10E+10

F = [Q3/k] / [Q2/(m+n+o-3k)]

F = [.10E+10/3] / [.1034E+12/(260+260+260-3*3)]

F = [.333E+09] / [.134E+09]

F ≈ 2.5

With degrees of freedom (k, m+n+o-3k), i.e. (3, 771), the tabular F value is approximately 3.8 at alpha = .01.

4. If F exceeds the tabular F, reject the hypothesis of equality of the sets of coefficients; otherwise conclude that there is not enough evidence to state that they are statistically significantly different from each other.

Since 2.5 is less than 3.8, we conclude that the groups cannot be statistically shown to be unequal with respect to their parameters.
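As a quick check of the arithmetic, the F ratio can be recomputed directly from the sums of squares reported above:

```python
# Reproducing the pooled-vs-local F ratio from the reported SSEs.
Q1 = 0.104376e12                                    # pooled (null) SSE
local_sse = [0.277010e11, 0.341596e11, 0.415012e11]  # NJ, NY, PA
Q2 = sum(local_sse)
Q3 = Q1 - Q2

k = 3            # coefficients per model
m = n = o = 260  # observations per group
F = (Q3 / k) / (Q2 / (m + n + o - 3 * k))
print(round(F, 2))  # → 2.52
```

The ratio is well below the alpha = .01 critical value, so the hypothesis of a common set of coefficients is not rejected.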

 


 

 


 



  Consider this! Motivated by D.Davis@lboro.ac.uk

  One of the major problems with statistical tests that assume independence
  is the manner in which the degrees of freedom, or "information content",
  are computed. We now present a rather difficult analysis problem where
  the number of observations is 4 x 10 x 20 x 50 = 4000, but these are
  not independent random samples.




  Consider 4 subjects or groups .... for example 4 individuals.            (1)

  Consider 10 conditions or separate experimental conditions.              (2)

  Consider 20 trials or repeat samplings using exactly the same conditions (3)

  Consider 50 readings taken at fixed intervals over time.                 (4)
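One standard way to quantify the cost of that within-trial dependence (a textbook approximation, not taken from the text above): if the 50 readings behave like an AR(1) process with lag-1 autocorrelation rho, the effective number of independent observations is roughly n(1 - rho)/(1 + rho):

```python
def effective_n(n, rho):
    """Approximate effective sample size of n equally spaced readings
    whose errors are AR(1) with lag-1 autocorrelation rho."""
    return n * (1 - rho) / (1 + rho)

# 50 readings with rho = 0.8 carry roughly the information of 5-6
# independent observations, not 50:
print(effective_n(50, 0.8))  # → 5.55...
```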









     (1)      (2)             (3)
                    |Trial 1                 |Trial 2 ....... Trial 20
                                 (4)
 |subject|condition |time 1|time 2 ...time 50|
 |       |          |      |                 |
 | A     | 1        |......|......
 | ..    | ..       |......|......
 | A     |10        |......|......
 | B     | 1        |......|......
 | ..    | ..       |......|......
 | B     |10        |......|......
 | C     | 1        |......|......
 | ..    | ..       |......|......
 | C     |10        |......|......
 | D     | 1        |......|......
 | ..    | ..       |......|......
 | D     |10        |......|......

 To consider a 4-way ANCOVA (testing between subjects and conditions,
 testing within trials, with the 10 conditions nested within
 subjects) would be downright silly.

 The obvious problem is that this is time series data and there will be a
 serious overestimation of the degrees of freedom due to the 50 repeat