In their example on forecasting (they don't provide the data they review with Alteryx, but you can request it, and we did!), they have a video tutorial on analyzing monthly housing starts.

While this is only one example (we have done many!!), they use over 20 years of data. It is kind of unnecessary to use that much data, as patterns and models change over time, but it highlights a powerful feature of Autobox that protects you from this potential issue. We discuss the use of the Chow test below.

Using 299 observations for fitting and the last 12 for comparison, they determine which of two alternative models (i.e., ETS and ARIMA) is best, making a total of 311 observations used in the example. The video says they use 301 observations, but that is just a slight mistake. It should be noted that Autobox never withholds data, as it has adaptive techniques which USE all of the data to detect changes. It also doesn't fit models to data, but provides "a best answer". Combinations of forecasts never consider outliers. We do.

The MAPE for ARIMA was 5.17 and for ETS was 5.65, as shown in the video. When running this in Autobox using the automatic mode, the MAPE was 3.85 (go to the bottom). That's a big difference: accuracy improves by more than 25%. Here is the model output and data file to reproduce this in Autobox.
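The comparison above is simple arithmetic, so here is a minimal sketch of it. The function names are ours for illustration and are not part of Autobox or Alteryx.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (assumes no actual value is zero)."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def relative_improvement(baseline_mape, new_mape):
    """Percentage reduction in MAPE relative to a baseline MAPE."""
    return 100.0 * (baseline_mape - new_mape) / baseline_mape


# MAPE values quoted in the text above
vs_arima = relative_improvement(5.17, 3.85)
vs_ets = relative_improvement(5.65, 3.85)
```

With the quoted values this works out to roughly a 25.5% reduction against ARIMA and roughly 31.9% against ETS.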

Autobox is unique in that it checks if the model changes over time using the Chow test. A break was identified at period 180 and the older data will be deleted.

DIAGNOSTIC CHECK #4: THE CHOW PARAMETER CONSTANCY TEST

The critical value used for this test: .01

The minimum group or interval size was: 119

F TEST TO VERIFY CONSTANCY OF PARAMETERS

| CANDIDATE BREAKPOINT | DATE | F VALUE | P VALUE |
|---|---|---|---|
| 120 | 1999/12 | 4.55639 | .0039929423 |
| 132 | 2000/12 | 7.41461 | .0000906435 |
| 144 | 2001/12 | 8.56839 | .0000199732 |
| 156 | 2002/12 | 9.32945 | .0000074149 |
| 168 | 2003/12 | 7.55716 | .0000751465 |
| 180 | 2004/12 | 9.19764 | .0000087995* |

Note: * INDICATES THE MOST RECENT SIGNIFICANT BREAK POINT: 1% SIGNIFICANCE LEVEL.

IMPLEMENTING THE BREAKPOINT AT TIME PERIOD 180: 2004/12. THUS WE WILL DROP (DELETE) THE FIRST 179 OBSOLETE OBSERVATIONS AND ANALYZE THE MOST RECENT 120 STATISTICALLY HOMOGENOUS OBSERVATIONS
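To make the idea of a parameter constancy test concrete, here is a rough sketch of a textbook Chow F statistic. It is not Autobox's implementation: Autobox tests the parameters of its time series model and reports p-values, while this sketch assumes a plain regression of y on an intercept and one x, and takes the critical F value from the caller. The data and all names are hypothetical.

```python
def ols_rss(x, y):
    """Residual sum of squares from regressing y on an intercept and x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split, n_params=2):
    """Chow F statistic for a new regime starting at index `split`."""
    pooled = ols_rss(x, y)
    separate = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    dof = len(y) - 2 * n_params
    return ((pooled - separate) / n_params) / (separate / dof)


def most_recent_break(x, y, candidates, f_critical):
    """Latest candidate whose F statistic exceeds a user-supplied critical value."""
    significant = [k for k in candidates if chow_f(x, y, k) > f_critical]
    return max(significant) if significant else None


# Hypothetical data: the relationship between x and y changes at index 20.
x = list(range(40))
noise = [0.1 if i % 2 == 0 else -0.1 for i in x]
y = [(1 + 0.5 * xi if xi < 20 else 10 - 0.2 * xi) + e for xi, e in zip(x, noise)]
f_by_candidate = {k: chow_f(x, y, k) for k in (10, 20, 30)}
```

As in the report above, several candidate breakpoints can be significant at once, which is why a rule such as "take the most recent significant one" is needed to decide how much history to keep.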


The model built using the more recent data had seasonal and regular differencing, an AR1 and a weak AR12. Two outliers were found at periods 225 (9/08) and 247 (7/10). Septembers are typically low, but not in 2008. Julys are usually high, but not in 2010. If you don't identify and adjust for these outliers, you can never achieve a better model. Here is the Autobox model:

[(1-B**1)][(1-B**12)]Y(T) = +[X1(T)][(1-B**1)][(1-B**12)][(- 831.26 )] :PULSE 2010/ 7 247 +[X2(T)][(1-B**1)][(1-B**12)][(+ 613.63 )] :PULSE 2008/ 9 225 + [(1+ .302B** 1)(1+ .359B** 12)]**-1 [A(T)]
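In the model above, B is the backshift operator, so (1-B**12)Y(T) is Y(T) - Y(T-12) and (1-B**1)Y(T) is Y(T) - Y(T-1). Here is a small sketch of applying both differences. The helper name and the toy series are ours, not Autobox output.

```python
def difference(series, lag=1):
    """Apply (1 - B**lag): subtract the value `lag` periods earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy monthly series with a pure linear trend (hypothetical data).
toy = [100 + 2 * t for t in range(36)]

# Seasonal differencing first, then regular differencing, as in the model above.
seasonal = difference(toy, lag=12)
both = difference(seasonal, lag=1)
```

Each difference costs observations at the start of the series: 12 for the seasonal difference and 1 more for the regular difference.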

Here is the table for forecasts for the 12 withheld periods.

]]>

It is 12 years of monthly data, and Box-Jenkins used logs to adjust for the increasing variance. They didn't have the research we have today on outliers, but what about everyone else? I. Chang had an unpublished dissertation (look for the name Chang) at the University of Wisconsin in 1982 laying out an approach to detect and adjust for outliers, providing a huge leap in modeling power.

It was in 1973 that Chatfield and Prothero published a paper containing the words "we have concerns" regarding the approach Box-Jenkins took with the Airline Passenger time series. What they saw was a forecast that turned out to be too aggressive and too high. It is in the "Introduction" section. Naively, people think that when they take a transformation, make a forecast, and then inverse transform the forecast, they are OK. Statisticians and mathematicians know that this is quite incorrect. There is no general solution for this except for the case of logarithms, which requires a special modification to the inverse transform. This was pointed out by Chatfield in his book in 1985. See Rob Hyndman's discussion as well.
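To see why the inverse transform needs care, here is a minimal sketch for the log case. It assumes the forecast errors on the log scale are normally distributed with a known variance, in which case exponentiating the log forecast gives the median of the original series, and the mean needs an extra half-variance term. This is the standard lognormal result; we are not claiming it is the exact modification Chatfield describes, and the function names and numbers are ours.

```python
import math


def naive_back_transform(log_forecast):
    """exp() of the log-scale forecast: the median on the original scale."""
    return math.exp(log_forecast)


def adjusted_back_transform(log_forecast, log_error_variance):
    """Mean on the original scale, assuming normal errors on the log scale."""
    return math.exp(log_forecast + log_error_variance / 2.0)


# Hypothetical numbers: a log-scale forecast and its error variance.
log_forecast = 6.0
log_error_variance = 0.04
naive = naive_back_transform(log_forecast)
adjusted = adjusted_back_transform(log_forecast, log_error_variance)
```

The adjusted value is always at least as large as the naive one, and the two only agree when the log-scale variance is zero.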

We do question why software companies, textbooks and practitioners didn't check the assumptions and approaches that previous researchers presented as fact. It was "always take logs" for the Airline series, and so everyone did. Maybe the assumption that it was optimal was never rechecked? I would imagine that all of the data scientists and researchers with ample tools would have found this out by now (start on page 114 and read on; hint: you won't find the word "outlier" in it!). Maybe they have, but haven't spread the word? We are now. :)

We accidentally discovered that logs weren't needed when we were implementing Chang's approach. We ran the example on the unlogged dataset and noticed the residual variance was constant. What? No need to transform??

Logs are a transformation. Drugs also transform us, sometimes with good consequences and sometimes with nasty side effects. In this case, the forecast for the Passenger series was way too high, and this was pointed out but went largely unnoticed (not by us).

Why did their criticism get ignored or forgotten? Either way, we are here to tell you that schools and statistical software across the globe are repeating a mistake in methodology that should be fixed.

Here is the model that Autobox identifies: seasonal differencing and an AR1 with 3 outliers. It is much simpler than the regular and seasonal differencing, MA1, MA12 model ... with a bad forecast. The forecast is not as aggressive. The outlier in March 1960 (period 135) is the main culprit, but the others are also important. If you limit Autobox to search for one outlier, it finds the 1960 outlier, but it still uses logs, so you need to "be better". The outlier caused a false positive F test that logs were needed. They weren't and aren't needed!

The Residuals are clear of any variance Trend.

Here is a Description of the Possible Violations of the Assumptions of Constancy of the Mean and Variance in Residuals and How to Fix it.

Mean of the Error Changes: (Tiao/Box/Chang)

1. A 1 period change in Level (i.e. a Pulse )

2. A contiguous multi-period change in Level (Intercept Change)

3. Systematically with the Season (Seasonal Pulse)

4. A change in Trend (nobody but Autobox)

Variance of the Error Changes:

5. At Discrete Points in Time (Tsay Test)

6. Linked to the Expected Value (Box-Cox)

7. Can be described as an ARMA Model (GARCH)

8. Due to Parameter Changes (Chow, Tong/TAR Model)
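The four "mean of the error" cases in the list above each correspond to a simple deterministic regressor. Here is a sketch of what those columns look like. The function names and the zero-based `start` convention are our own illustration, not Autobox's internals.

```python
def pulse(n, at):
    """One-period change in level: a single 1 at index `at`."""
    return [1 if t == at else 0 for t in range(n)]


def level_shift(n, start):
    """Change in level (intercept) from index `start` onward."""
    return [1 if t >= start else 0 for t in range(n)]


def seasonal_pulse(n, start, season_length):
    """A 1 every `season_length` periods, beginning at index `start`."""
    return [1 if t >= start and (t - start) % season_length == 0 else 0
            for t in range(n)]


def time_trend(n, start):
    """A trend that begins counting 1, 2, 3, ... at index `start`."""
    return [t - start + 1 if t >= start else 0 for t in range(n)]
```

Each column would enter a regression as an extra explanatory variable, with its coefficient measuring the size of the pulse, shift, seasonal effect or slope change.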

]]>

The tutorial is well written and allows you to easily download the 1,724 days of data and model this yourself. While SAP had a .13 MAPE (in sample), they posed a challenge at the end for those who get a MAPE less than .12 to contact them. Can you predict what Autobox did? .0724. Guess who is going to contact them? I will also add that if you can do better, contact us, as we might have something to learn too. I also suggest that you post how other tools handle this, as that would be interesting to see. Autobox thrives on daily data (1st among automated tools in a daily forecasting competition), which is much more difficult to model and something we have dedicated 25 years to perfecting.

After reading the SAP user's guide let's make the distinction that Autobox uses all of the data to build the model, while SAP (like all other tools) withholds data to "train" on.

Autobox adjusts for outliers. One could argue that adjusting for outliers will only make the MAPE go down, which is true, but be aware that it also allows for a clearer identification of the relationships in the data (i.e., coefficients / separating signal from noise).

The first approach in the SAP tutorial runs with only historical data, and they add in the causals later. Outliers are identified, and the run has a MAPE of .197.

A bunch of very curious variables (66?? PenultimateWednesday, for example) are included that we have never seen before, which made us scratch our heads (with delight???). They seem to try to capture the day of the week, so we will turn off some of Autobox's searches to avoid collinearity when we run with these in the first pass. They seem to use a day of year variable, which I have never seen before. What book are they getting the ideas for these kinds of variables from? Not one that I have ever seen, but perhaps someone can enlighten me? There are two variables that measure the number of working days that have occurred in the month and the number left in the month. We did find that some of these variables have importance in the tests we ran, so SAP has some good ideas for generating useful variables, but many are collinear and this could be called "kitchen sink" modeling. We will do more research into these. There is a holiday variable which also flags working days, so the two variables would seem to be collinear. These two end up as the second and third most powerful variables in the SAP model. When we tried these in Autobox, both runs found them significant. Perhaps they measure (implicitly) holidays too? We are not sure, but they help.

There are weather variables which are useful and actually represent seasonality, so using **both monthly dummies/weekly dummies and the weather variables could be problematic**. The holidays have all been combined into one catch-all variable. This assumes that each holiday behaves similarly. It should be noted that a major difference is that SAP does not search for lead or lag relationships around the causals, while Autobox can do that. Just try running this example in Autobox and then SAP. We ran with all of these curious variables. **We then reduced these variables and kept only Holiday, gust, rain, tmean, hmean, dmean, pmean, wmean, fmean, TubeStrike and Olympics and removed the curious other variables.** The question might arise, "how much can you trust the weather predictions?", but here we are looking at only the MAPE of the fit, so that is not a topic of concern.

SAP ended up with a .13 MAPE when using their long list of causals. The key here is that no outliers are identified in the analysis. This is a distinction and why Autobox is so different. If you ignore outliers, they still exist, and yes, they exist in causal problems. Ignoring something doesn't mean it goes away; it ends up impacting you elsewhere, such as in the model, and you likely aren't even aware of its impact. If you can't deal with outliers, your model with causals will be skewed, but no one talks about this in any school or textbook, so sorry to ruin this illusion for you. Alice in Wonderland (search on alice) thought everything was perfect too, until.....

Autobox does stepdown regression, but also does "stepup", where it will search for changes in seasonality (i.e., day of the week), trend, level, parameters and variance, as things sometimes drastically change. If you're not looking for it then you will never find it! The MAPE we are presenting can be found in the detail.htm audit report from the Autobox run (hint: near the bottom). Autobox allows for holidays in the Top 15 GDPs, but in general assumes the data is from the US, so we will need to suppress that search. We suppressed the search for special days of the month, which are useful in ATM daily data as paydays are important, but not theoretically plausible for this data.

To summarize: We can run this a few different ways, but we can't present all of these results down below as it would be too much information to present here. We included some output and the Autobox file (current.asc-rename that if you want to reproduce the results) so you can see for yourself. What we do know is that including ARIMA increases run time.

MAPE's

- Run using all variables with Autobox default options (suppressing US holidays, day of month and monthly/weekly dummies): .0883
- Run using all variables with Autobox default options (suppressing US holidays, day of month and monthly/weekly dummies). Allow for ARIMA: .0746
- Run using a reduced set of variables (see above) and suppressing US holidays, day of month and monthly/weekly dummies: .1163
- Run using a reduced set of variables (see above) and suppressing US holidays, day of month and monthly/weekly dummies. Allow for ARIMA: .0732
- Run using only Holiday, Strike/Olympics and rely upon monthly or weekly dummies: .1352
- Run using only Holiday, Strike/Olympics and rely upon monthly or weekly dummies. Allow for ARIMA: .1145
- Run using a reduced set of variables, but remove the catch-all "holiday" variable and create 6 separate main holiday variables that were flagged by SAP, as they might each behave differently (suppressing US holidays, day of month and monthly/weekly dummies): .1132
- Run using a reduced set of variables, but remove the catch-all "holiday" variable and create 6 separate main holiday variables that were flagged by SAP, as they might each behave differently (suppressing US holidays, day of month and monthly/weekly dummies). Allow for ARIMA: .0724

Let's consider the model that was used to develop the lowest MAPE of .0724.

There were 38 outliers identified over the 1,724 observations so the goal is not to have the best fit, but to model and be parsimonious.

So, what did we do to make things right? We started by deleting all kinds of variables. There were linearly redundant variables such as WorkingDay, which is perfectly (inversely) correlated with Holiday; including both should never be done when using dummy variables. The variable "Special Event" is redundant with TubeStrike and Olympics as well. The Special Event name isn't even a number, but rather text, and is also redundant.
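A quick way to spot this kind of redundancy before modeling is to look for pairs of columns with a correlation of exactly +1 or -1. Here is a rough sketch with made-up data. It only catches pairwise redundancy (not a column that is a combination of several others), and the function names are ours.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def redundant_pairs(columns, tolerance=1e-9):
    """Pairs of column names whose correlation is +1 or -1 within tolerance."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(abs(pearson(columns[a], columns[b])) - 1.0) < tolerance]


# Hypothetical week of data: WorkingDay is simply 1 - Holiday.
columns = {
    "Holiday": [0, 0, 1, 0, 1, 1, 0],
    "WorkingDay": [1, 1, 0, 1, 0, 0, 1],
    "rain": [0.0, 1.2, 0.3, 0.0, 2.1, 0.4, 0.0],
}
flagged = redundant_pairs(columns)
```

Any flagged pair carries the same information twice, so one of the two columns can be dropped before estimation.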

All other software withholds data, whereas Autobox uses all of the data to build the model, as we have adaptive technology that can detect change (seasonality/level/trend/parameters/variance plus outliers). We won best dedicated forecasting tool in J. Scott Armstrong's "Principles of Forecasting". For the record, we politely disagree with a few of the 139 "Principles" as well.

We report the in sample MAPE, in the file "details.htm" seen below...

Another way to compare the Autobox and SAP results is to view the actual and fit side by side, where you will clearly see how Autobox does a better job. The tutorial shows the graph for univariate, but unfortunately not for the causal run! Here is the graph of the actual, fit and forecast.

We prefer the actual and residuals plot as you can see the data more clearly.

The signs of the coefficients make sense (for the UK, which is cold). When it's warmer, people will skip the car and use the bike, for example, so when temperature goes up (+ sign), people rent more bikes. When it's gusty, people will not, and just drive. The tutorial explains the variable names in the back: tmean is average temperature, w is wind, d is dewpoint, h is humidity, p is barometric pressure, and f is real feel temperature. All 6 holidays were found to be important, with all but one having lead or lag impacts. When you see a B**-2, that means that two days before Christmas the volume was low by 5036. Autobox found all 6 days of the week to be important. The SAP Holiday variable was a mixture of Saturday and Sunday and causes some confusion with interpretation of the model. This approach is much cleaner. The first day of the data is a Saturday (1/1/2011), and the variable "FIXED_EFF_N10107" is measuring the impact that Saturday is low by 4114. Sunday is considered average, as day 7 is the baseline. See below for more on the day of the week rough verification (i.e., pivot table/contribution %).

Note the "level shift" variables added to the model. This means that the volume changed up or down for a period and Autobox identified and ADAPTED to it. We call this "step up regression" (nothing there right? Yes, we own that world!), as we are identifying deterministic variables on the fly and adding them to the model. The runs with the SAP variables fit 2012 much better. The first time trend began at period 1 with volume steadily increasing 10.5 units each day. This gets tempered down by the second time trend beginning at period 177, making the net effect a +4.3 increase per day. 38 outliers were identified, which is the key to the whole analysis. They are sorted by their entry into the model and really their importance.

Note the seasonal pulse where the first day becomes much higher starting at period 1639 and forward, with an average 3956.8 higher volume. That's quite a lot, and if you do some simple plotting of the data it will be very apparent. Day 1 and Day 2 were always low, but over time Day 1 has become more average. Note the AR1 and AR7 parameters.

Let's consider the day of the week data by building a pivot table.

And getting this % of the whole. We call this the contribution %. Day 7 in Excel is Saturday, which is low, and notice Sunday (baseline) is even lower (remember that the holiday variable had a negative sign?). The sign for Saturday was +1351.5, meaning it was 1351 higher than Sunday, which matches the plot below. This type of summarization ignores trend, changes in day of the week impacts, etc., so be careful. We call this a poor man's regression because those percentages would be the coefficients if you ran a regression just using day of the week. It is directional, but by no means as accurate as Autobox. We use this type of analysis to "roughly verify" Autobox with day of the week dummies, monthly dummies, and day of the month effects using pivot tables. The goal is not to overfit, but rather to be parsimonious. Auto.arima is not parsimonious.
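If you would rather not build the pivot table by hand, the contribution % can be sketched in a few lines. This assumes the series starts on day 1 of the cycle and has no missing days. The function name and the toy numbers are ours.

```python
def contribution_pct(values, period=7):
    """Share of the grand total (in percent) falling on each position in the cycle."""
    totals = [0.0] * period
    for i, value in enumerate(values):
        totals[i % period] += value
    grand_total = sum(totals)
    return [100.0 * t / grand_total for t in totals]


# Hypothetical two weeks of daily volume, starting on day 1 of the week.
daily = [10, 20, 30, 40, 50, 60, 70,
         12, 18, 33, 41, 48, 62, 66]
shares = contribution_pct(daily, period=7)
```

Like the pivot table, this ignores trend, outliers and changes in the day of the week pattern, so treat it as a rough check only.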

Let's look at the monthly breakout. Jan,Feb,Dec are average and the other months are higher with a slope up to the Summer months and down back to Winter. The temperature data replaces the use of monthly or weekly dummies here.

]]>

LEVEL SHIFTS

When Hurricane Sandy hit last October, it caused a big drop for a number of weeks. Your model might have identified a "level shift" to react to the new average. The forecast would reflect this new average, but we all know that things will return to normal, and the model and forecast aren't smart enough to address that. It would make sense to introduce a causal variable that reflects the drop due to the hurricane, BUT the future values of the causal would NOT reflect the impact, so the forecast would return to the original level. So, the causal would have a lot of leading zeroes, 1's when the impact of Sandy was felt, and 0's when the impact disappears. You could actually transition the 1 to a 0 gradually with some ramping techniques we learned from the famous modeler/forecaster Peg Young of the US DOT. The dummy variable might look like this: 0,0,0,0,0,0,0,1,1,1,1,1,1,1,.9,.8,.7,.6,.5,.4,.3,.2,.1,0,0,0,0,0,0,etc.
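Here is a small sketch that builds a ramped dummy of that shape. The function and parameter names are ours, and a linear ramp is just one choice; we are not claiming it is the specific technique Peg Young uses.

```python
def ramped_dummy(n_before, n_full, n_ramp, n_after):
    """Zeros, then the full effect (1), then a linear ramp back down, then zeros."""
    ramp = [round(1 - (i + 1) / (n_ramp + 1), 10) for i in range(n_ramp)]
    return [0.0] * n_before + [1.0] * n_full + ramp + [0.0] * n_after


# 7 periods before the event, 7 at full impact, 9 ramping down, 6 back to normal.
sandy = ramped_dummy(n_before=7, n_full=7, n_ramp=9, n_after=6)
```

The zeros at the end are what let the forecast return to the original level, since the future values of the causal carry no impact.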

OUTLIERS

When you see outliers you should be reviewing them to see if there is any pattern to them. For example, if you don't properly model the "Super Bowl" impact, you might see an outlier on those days. It takes a little time and effort to review and think about "why" this happens. The benefits of taking the time to do this can be powerful. You can then add a causal with a 1 in the history when the Super Bowls took place and then provide a 1 for the next one. For monthly data, you might see a low June as an outlier. Don't adjust it to the mean, as that is throwing the baby out with the bath water. This means you might not be modeling the seasonality correctly. You might need an AR12, seasonal differencing or seasonal dummies.

SEASONAL PULSES

Let's continue with the low June example. This doesn't necessarily mean all months have seasonality and assuming a model instead of modeling the data might lead to a false conclusion for the need of seasonality. We are talking about a "seasonal pulse" where only June has an impact and the other months are near the average. This is where your causal dummy variable has 0's and a 1 on the low Junes and also the future Junes(ie 1,0,0,0,0,0,0,0,0,0,0,0,1).

]]>

A fun dataset to explore is the "age of the death of kings of England". The data comes from the 1977 book from McNeill called "Interactive Data Analysis" and is an example used by some to perform time series analysis. We intend on showing you the right way and the wrong way (we have seen examples of this!). Here is the data so you can try this out yourself: 60,43,67,50,56,42,50,65,68,43,65,34,47,34,49,41,13,35,53,56,16,43,69,59,48,59,86,55,68,51,33,49,67,77,81,67,71,81,68,70,77,56

It begins with William the Conqueror, from the year 1028 to the present (excluding the current Queen Elizabeth II), and shows the ages at death for 42 kings. It is an interesting example in that there is an underlying variable, life expectancy, which gets larger over time due to better health, eating, medicine, cryogenic chambers???, etc., and that is ignored in the "wrong way" example. We have seen the wrong way example, where they are not looking for deterministic approaches to modeling and forecasting. Box-Jenkins ignored deterministic aspects of modeling when they formulated the ARIMA modeling process in 1976. The world has changed since then, with research by Tsay, Chatfield/Prothero ("Box-Jenkins seasonal forecasting: Problems in a case study (with discussion)", J. Roy. Statist. Soc., A, 136, 295-352), I. Chang and Fox showing how important it is to consider deterministic options to arrive at a better model and forecast.

As for this dataset, one could argue that there would be no autocorrelation in the age between each king, but one could also argue that heredity/genetics could have an autocorrelative impact, or that periods of stability or instability of the government would also matter. One could argue that there is an upper limit to how long we can live, so there should be a cap on the maximum life span.

If you looked at the dataset and knew nothing about statistics, you might say that the first dozen observations look stable and see that there is a trend up with some occasional really low values. If you ignored the outliers you might say there has been a change to a new higher mean, but that is when you ignore outliers and fall prey to Simpson's paradox or, simply put, "local vs global" inferences.

If you have some knowledge about time series analysis and were using your "rule book" on how to model, you might look at the ACF and PACF and say the series has no need for differencing and an AR1 model would suit it just fine. We have seen examples on the web where these experts use their brain and see the need for differencing and an AR1, as they like the forecast.

You might (incorrectly) look at the autocorrelation function and partial autocorrelation function, see a spike at lag 1, conclude that there is autocorrelation at lag 1, and then include an AR1 component in the model. Not shown here, but if you calculate the ACF on the first 10 observations the sign is negative, and if you do the same on the last 32 observations it is positive, supporting the "two trend" theory.
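The sign change is easy to check yourself. Here is a minimal sketch that computes the lag 1 autocorrelation on the two halves, using the usual sample formula (each segment's own mean, with the full sum of squares in the denominator). The function name is ours.

```python
KINGS = [60, 43, 67, 50, 56, 42, 50, 65, 68, 43, 65, 34, 47, 34, 49, 41, 13,
         35, 53, 56, 16, 43, 69, 59, 48, 59, 86, 55, 68, 51, 33, 49, 67, 77,
         81, 67, 71, 81, 68, 70, 77, 56]


def lag1_autocorrelation(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((v - mean) ** 2 for v in series)
    numerator = sum((series[t] - mean) * (series[t - 1] - mean)
                    for t in range(1, n))
    return numerator / denominator


r_first_10 = lag1_autocorrelation(KINGS[:10])
r_last_32 = lag1_autocorrelation(KINGS[10:])
```

With only 10 observations in the first segment the estimate is noisy, so read the sign as a hint about the changing behavior rather than as a formal test.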

The PACF looks as follows:

Here is the forecast when using differencing and an AR1 model.

The ACF and PACF of the residuals look OK, and here are the residuals. This is where you start to see how the outliers have been ignored, with big spikes at 11, 17, 23, 27 and 31, and general underfitting with values on the high side in the second half of the data, as the model is inadequate. We want the residuals to be random around zero.

Now, to do it the right way....and with no human intervention whatsoever.

Autobox finds an AR1 to be significant and brings in a constant. It then identifies two time trends and 4 outliers to be brought into the model. We all know what "step down" regression modeling is, but when you are adding variables to the model it is called "step up". This is what is lacking in other forecasting software.

Note that the first trend is not significant at the 95% level. Autobox uses a sliding scale based on the number of observations. So, for large N .05 is the critical value, but this data set only has 42 observations so the critical value is adjusted. When all of the variables are assembled in the model, the model looks like this:

If you consider deterministic variables like outliers, level shifts and time trends, your model and forecast will look very different. Do we expect people to live longer in a straight line? No. This is just a time series example showing you how to model data. Is the current king (Queen Elizabeth II) 87 years old? Yes. Are people living longer? Yes. The trend variable is a surrogate for the general population's longer life expectancy.

Here are the residuals. They are pretty random. There is some underfitting in the middle part of the dataset, but the model is more robust and sensible than the flat forecast kicked out by the difference, AR1 model.

Here is the actual and cleansed history of outliers. It's when you correct for outliers that you can really see why Autobox is doing what it is doing.

Does your current forecasting process automatically adjust for outliers? (correct answer Yes)

Do you make a separate run for certain problems that SHOULD NOT get adjusted for outliers as the outliers are in fact real and shouldn't be adjusted? (correct answer Yes)

Do you know what standard deviations are used to identify an outlier? (correct answer "who cares" You shouldn't be having to tell the system)

Who knows that the standard deviation calculation is itself skewed by the outlier? (correct answer "who cares" You shouldn't be having to tell the system)

Does the system ask you how many times it should "iterate" when removing outliers? How many times do you "iterate"? (correct answer "who cares" You shouldn't be having to tell the system)

Does the system allow you to convert outliers to causals and flag future values when the event will happen again? (correct answer Yes)

Does the system identify inliers? ie. 1,9,1,9,1,9,1,5 (correct answer Yes)

Does the system recognize the difference between an outlier and a seasonal pulse? (correct answer Yes) (IE 1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,etc)

Does the system recognize the difference between an outlier and a level shift? (correct answer Yes) (IE 0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,etc)

Does the system recognize the difference between an outlier and a change in the trend? (correct answer Yes) (IE 0,0,0,0,0,0,0,1,2,3,4,5,6,7,8,etc)

Does the system allow you to force the outlier in the most recent period to be a "level shift" or a "seasonal pulse"? (correct answer Yes)

Does the system report a file adjusted for outliers for pure data cleansing purposes? (correct answer Yes)

Does the system adjust for outliers in a time series regression (i.e., ARIMAX/Transfer Function)? (correct answer Yes)

Who tries to find the assignable cause why the outliers existed? (correct answer I do)

Who then provides causals to explain the outliers to the system? (correct answer I do)

]]>For a couple of reasons:

**It wasn't an outlier. It was a seasonal pulse.**

The observations outside of the 2 or 3 sigma bounds could in fact be a newly formed seasonal pattern. For example, halfway through the time series Junes become very high when they had been average. Simple approaches would just remove anything outside the bounds, which could be throwing the "baby out with the bathwater".

**Your 3 sigma calculation was skewed due to the outlier itself.**

It is a chicken and egg dilemma. The outliers make the sigma wide so that you miss outliers.
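Here is a small made-up example of that masking effect. The 3 sigma rule misses an obvious outlier because the outlier itself inflates sigma, while a screen based on the median and the median absolute deviation (MAD) catches it. The MAD screen is only a stand-in to show the problem; it is not how Autobox finds outliers, which is done while building the model. All names and data are ours.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Indexes of points more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]


def mad_outliers(values, k=3.0):
    """Indexes flagged by a robust z-score built from the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > k]


# Hypothetical series: nine ordinary values and one wild one.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
missed_by_sigma = three_sigma_outliers(data)
caught_by_mad = mad_outliers(data)
```

With 10 observations and the sample standard deviation, no single point can ever sit more than about 2.85 sigma from the mean, so the 3 sigma rule cannot flag the 100 no matter how large it is.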

**The outlier was in fact a promotion.**

Using just the history of the series is not enough. You should include causals as they can help explain what is perceived to be an outlier.

**Now let's consider the inlier.**

There could be outliers that are within 3 sigma; let's say the observation is near the mean. When could the mean be unusual? When the observation should have been high and it just wasn't for some reason.
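To make the inlier idea concrete, take the series 1,9,1,9,1,9,1,5. The last value sits almost exactly on the mean, so a sigma rule sees nothing, yet it breaks the pattern badly. A rough sketch, using a "same as two periods ago" prediction purely as a stand-in for a proper model (the function name is ours):

```python
import statistics


def seasonal_naive_residuals(series, period):
    """Residuals from predicting each value with the value `period` steps earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


series = [1, 9, 1, 9, 1, 9, 1, 5]

# Distance of the last value from the mean, in sample standard deviations.
z_last = abs(series[-1] - statistics.mean(series)) / statistics.stdev(series)

# Residuals against the alternating pattern.
residuals = seasonal_naive_residuals(series, period=2)
```

The last residual is the only nonzero one, which is how a model-based view exposes a value that looks perfectly ordinary against the mean.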

**Simple methods force the user to specify the # of times the system should iterate to remove outliers.**

You are then asked how many times do you want to iterate to find the interventions by the forecasting tool? Is this intelligence or a crutch? So, you are somehow supposed to provide some empirically based guidance??? You don't know as it would be just a guess.

The reality is that Simple methods/software use a process where they assume a "mean model" to determine the outliers. The correct way is to build a model and identify the outliers at the same time. Sounds simple, right?

Refer to these articles for more on how to identify outliers properly

Fox JA (1972). Outliers in time series. J. Royal Stat. Soc., Series B, 34: 350-363.

Chang, I., and Tiao, G.C. (1983). "Estimation of Time Series Parameters in the Presence of Outliers," Technical Report #8, Statistics Research Center, Graduate School of Business, University of Chicago, Chicago.

Tsay R (1986a). Time series model specification in the presence of outliers. J. Am. Stat. Assoc., 81: 132-141.

Tsay R (1988). Outliers, level shifts and variance changes in time series. J. Forecast., 7: 1-20.

**Does anyone have any other examples of bad outlier methodologies? or other software with their examples posted?**

]]>