DATA

QUESTION ABOUT SAMPLE SIZE:

I am looking for guidance regarding sample size requirements for time series analysis (ARIMA). I have seen it written that a minimum of 100 time points are required, yet I'm wondering if this applies to all forms of time series analysis. I'm interested in looking at lagged cross correlations between two time series.

ANSWER:

The idea that there is a single minimum sample size for time series analysis is quite naive. In order to IDENTIFY a seasonal model, one needs a minimum of two seasons of data, and preferably three or more, in order to develop statistics that might confirm the presence of either Autoregressive or Moving-Average structure. However, one can always set up a "straw-man" model and use estimation to "knock him down". Thus, even with the smallest of sample sizes, model building can proceed.

Aside from the requirement of two or three recorded seasons, the important thing is the Signal to Noise ratio, which is the ratio of the EXPLAINED to the UNEXPLAINED VARIANCE. If you have a paucity of data but a strong signal (i.e. a high signal to noise ratio), then one can trust the identification process. However, if there is a large amount of error due to outliers, changes in parameters, or changes in variance, then even with large N it may not be possible to properly identify the model form.
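
To make the signal to noise idea concrete, here is a minimal Python sketch that computes the ratio of explained to unexplained variance for a fitted model. It is only an illustration: the simulated AR(1) series, the least-squares fit, and the helper name signal_to_noise are assumptions made for this example, not something taken from a particular package.

```python
import numpy as np

def signal_to_noise(y, fitted):
    """Ratio of explained variance (fitted values) to unexplained variance (residuals)."""
    y = np.asarray(y, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    resid = y - fitted
    return np.var(fitted) / np.var(resid)

# Simulate a short AR(1) series (hypothetical data, 40 points only).
rng = np.random.default_rng(0)
n = 40
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + e[t]

# Fit y[t] = c + phi * y[t-1] by ordinary least squares.
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
fitted = X @ beta

snr = signal_to_noise(y[1:], fitted)
print(f"estimated phi = {beta[1]:.2f}, signal to noise = {snr:.2f}")
```

A high ratio on a short series suggests the identified structure can be trusted; a low ratio on a long series suggests that outliers or changing parameters may be masking the structure.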

The best thing IMHO is to err on the liberal side: estimate potentially over-identified (but not ridiculously over-identified) models, then use estimation with step-down and step-up strategies to refine and redefine the model. It is very important to studiously avoid "kitchen-sink modelling", as some very bad things can happen. Do not burden your initial model with differencing operators or various forms of power transforms such as logarithms or reciprocals. Keep it as simple as you can in the beginning and then use model error diagnostics to refine the model, testing alternative models as you go.
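
As a rough illustration of a step-down strategy, the Python sketch below starts from a mildly over-identified autoregressive model and drops the highest lag while its coefficient is not significant. This is a simplified stand-in under stated assumptions: the helper names fit_ar and step_down, the pure AR form fitted by least squares, and the |t| >= 2 cutoff are all choices made for this example, and a real analysis would also examine residual diagnostics and moving-average terms.

```python
import numpy as np

def fit_ar(y, p):
    """OLS fit of an AR(p) model with intercept; returns coefficients and t-statistics."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Columns: intercept, y[t-1], ..., y[t-p], for t = p .. n-1.
    X = np.column_stack([np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    tstats = beta / np.sqrt(np.diag(cov))
    return beta, tstats

def step_down(y, max_p, t_crit=2.0):
    """Drop the highest lag while its |t| is below t_crit (an assumed cutoff)."""
    p = max_p
    while p > 0:
        beta, tstats = fit_ar(y, p)
        if abs(tstats[-1]) >= t_crit:
            return p, beta
        p -= 1
    return 0, np.array([np.mean(y)])  # nothing survived: mean-only model

# Hypothetical data: an AR(1) series, deliberately over-fitted with 4 lags at the start.
rng = np.random.default_rng(1)
n = 500
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * y[t - 1] + e[t]

order, coefs = step_down(y, max_p=4)
print("retained AR order:", order)
print("coefficients (intercept first):", np.round(coefs, 2))
```

Starting at 4 lags rather than, say, 20 is the point of "over-identified, but not ridiculously": the initial model is generous enough to catch real structure without becoming a kitchen sink.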