Frequently Asked Statistical Questions (INTERVENTION/OUTLIERS)

QUESTION: I have a problem in testing the significance of an event. Through a panel survey conducted on 115 points of sale (POS), I gather weekly sales data for 3 products. A summary table follows, where:

Xi = Weekly total sales (115 POS) per product
Yi = Number of POS (counts) which in the corresponding week had the product available on shelf

	Week 1		Week 2		......	Week n	
	X1	Y1	X2	Y2	......	Xn	Yn
Product A	136	90	142	86	......	156	99
Product B	645	76	538	84	......	552	81
Product C	318	103	346	108	......	301	92

My questions are: 1. How can I test if the effect of, e.g., a promotional action caused sales in week 2 for product A (142 US$) to be significantly different from the corresponding level in week 1 (136 US$)?

ANSWER: At one level, your problem is a Transfer Function in which you specify two cause variables:

1. The Number of Stores.
2. An indicator variable Z (all zeroes except for the week in question, which is indicated by a one).

The variable Z is called an Intervention Variable and is hypothesized in advance. If it were found empirically it would still be called an Intervention Variable, but it would have been detected by Intervention Detection or Outlier Analysis.

Briefly, the reason that you have to construct a Transfer Function is that Sales in a particular week may be affected by sales in the previous week or weeks, sales at this point in time a year ago, the number of sites carrying the product last week or the week before that, a recent level shift due to some omitted variable, or even a trend in sales. All of these things, and more, may be operating at once. To identify and measure the increment or lift due to this unique activity, one therefore has to allow for the other effects. Failing this, one can incorrectly assign significance to what is actually caused by lag effects, seasonal processes or local changes in the mean.

The Transfer Function is defined as:

Y_t = CONSTANT + c_1 Y_{t-1} + ... + c_j Y_{t-j} + a_t + d_0 X_t + d_1 X_{t-1} + ... + d_k X_{t-k}

where a_t is a White Noise process. The model relates the input series (one, in this case) to the output series. The transfer model is represented as a polynomial distributed lag. The problem is to IDENTIFY and ESTIMATE the structure of the polynomial(s). But before continuing, I think that your problem could be slightly restated.

The Series

The data could be laid out as follows, using Y as the variable to be predicted.
Time Series Data For Product A
Time Period	# of Stores (X)	Indicator Variable for Special Promotion (Z)	$ Sales (Y)
Week 1	90	0	136
Week 2	86	1	142
Week n	99	0	156
Time Series Data For Product B
Time Period	# of Stores (X)	Indicator Variable for Special Promotion (Z)	$ Sales (Y)
Week 1	76	0	645
Week 2	84	0	538
Week n	81	0	552
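As a rough illustration of how the Transfer Function defined above could be estimated on data laid out like these tables, here is a minimal sketch in Python. It is not the identification procedure used by AUTOBOX or any other package: it simply fits, by ordinary least squares, a model with one lag of Y, the current and one lagged value of X, and the pulse indicator Z. The lag lengths, the function name fit_intervention_model and all of the numbers are assumptions made for illustration. Because only three weeks of figures are shown above, the series is simulated. The t-ratios are only approximate when a lagged Y appears among the regressors.

```python
import numpy as np

def fit_intervention_model(y, x, z):
    """OLS fit of y_t = const + c1*y_{t-1} + d0*x_t + d1*x_{t-1} + w*z_t + a_t.

    Hypothetical helper: the lag structure is fixed in advance here,
    whereas in practice it would have to be identified from the data.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    # The first week is dropped because the lagged terms need week t-1.
    design = np.column_stack([
        np.ones(len(y) - 1),  # constant
        y[:-1],               # y_{t-1}
        x[1:],                # x_t   (number of stores this week)
        x[:-1],               # x_{t-1}
        z[1:],                # z_t   (promotion indicator)
    ])
    target = y[1:]
    coef, _, _, _ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = len(target) - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Simulated two years of weekly data (all values are made up).
rng = np.random.default_rng(0)
n_weeks = 104
promo_week = 60      # index of the week with the special promotion
true_lift = 40.0     # sales lift built into the simulation

x = rng.integers(80, 111, size=n_weeks).astype(float)  # stores stocking the product
z = np.zeros(n_weeks)
z[promo_week] = 1.0                                    # pulse: 1 in the promotion week only
noise = rng.normal(0.0, 2.0, size=n_weeks)             # white noise a_t

y = np.empty(n_weeks)
y[0] = 290.0
for t in range(1, n_weeks):
    y[t] = (20.0 + 0.5 * y[t - 1] + 1.0 * x[t] + 0.3 * x[t - 1]
            + true_lift * z[t] + noise[t])

coef, se, t_stats = fit_intervention_model(y, x, z)
print("estimated lift:", coef[4])
print("approximate t-ratio for the promotion:", t_stats[4])
```

The coefficient on Z is the estimated lift after allowing for last week's sales and the number of stores, and its t-ratio is what one would examine to judge whether the promotion week differs significantly from what the rest of the model predicts.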
In this restated layout:

Yi = Weekly total sales (115 POS) per product
Xi = Number of POS (counts) which in the corresponding week had the product available on shelf

	Week 1		Week 2		......	Week n	
	Y1	X1	Y2	X2	......	Yn	Xn
Product A	136	90	142	86	......	156	99

Construct a Transfer Function between Y and the two exogenous variables (X and Z), making sure that the error process, that is the ARIMA component, is correctly modelled. Test to see if there are violations of the Gaussian assumptions, viz.:

1. The mean of the errors is zero everywhere. This can be tested via Outlier Detection. Various model augmentations may be required (Pulse, Seasonal Pulse, Level Shift).
2. The variance of the errors is constant. It may not be: the variance may be proportional to the level, or it may have had regime changes, where for some period of time the variance may have doubled or whatever.

All of the above is premised on the assumption that the Number of Stores and the Sales for a second product (say, Product B) have no effect on the Sales of the first (Product A). If this were not true, or had to be tested, then the required tool would be Vector ARIMA, where in this case there would be two endogenous variables (Y1, Y2) and three exogenous series (X1, X2 and the hypothesized Z1 variable).

To the best of my knowledge AFS is the sole provider of software to deal with either solution. AUTOBOX allows one to identify and model outliers in a Transfer Function. I don't know of any other piece of software that does that. Secondly, MTS, which is a product of AFS, deals with the Vector formulation. Again I believe that it is a unique solution, because the only other Vector ARIMA program requires all variables, both endogenous and exogenous, to be simultaneously predicted; thus there are no purely exogenous variables. All variables in that system are treated as endogenous.

References:

Box, G.E.P., and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control, 2nd ed. (San Francisco: Holden-Day).
Reilly, D.P. (1980). "Experiences with an Automatic Box-Jenkins Modeling Algorithm," in Time Series Analysis, ed. O.D. Anderson (Amsterdam: North-Holland), pp. 493-508.
Reilly, D.P. (1987). "Experiences with an Automatic Transfer Function Algorithm," in Computer Science and Statistics: Proceedings of the 19th Symposium on the Interface, ed. R.M. Heiberger (Alexandria, VA: American Statistical Association), pp. 128-135.
Tiao, G.C., and Box, G.E.P. (1981). "Modeling Multiple Time Series with Applications," Journal of the American Statistical Association, Vol. 76, pp. 802-816.
Tsay, R.S. (1986). "Time Series Model Specification in the Presence of Outliers," Journal of the American Statistical Association, Vol. 81, pp. 132-141.
