The most studied time series on the planet would have to be the Box-Jenkins International Airline Passenger series found in their 1970 landmark textbook Time Series Analysis: Forecasting and Control. Just google AirPassengers or "airline passenger arima" and you will see it all over the place. It is on every major forecasting tool's website as an example. It is there with a giant flaw. We have been waiting and waiting for someone to notice. This example has let us know (for decades) that we have something that the others don't: robust outlier detection. Let's explore why, and how you can check it out yourself.

It is 12 years of monthly data, and Box-Jenkins used Logs to adjust for the increasing variance. They didn't have the research we have today on outliers, but what about everyone else? I. Chang wrote an unpublished dissertation (look for the name Chang) at the University of Wisconsin in 1982 laying out an approach to detect and adjust outliers, providing a huge leap in modeling power.
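
To give a feel for what "detect and adjust" means, here is a deliberately simplified sketch. It is not Chang's procedure. It assumes you already have model residuals and treats them as white noise. It looks only for one-period pulses and ignores the ARIMA filter that a full detection procedure would apply. It works iteratively: it finds the most extreme residual, flags it if it clears a threshold, removes its effect, and repeats. The helper name `flag_pulses` and the 3.0 threshold are hypothetical choices.

```python
import statistics

def flag_pulses(residuals, threshold=3.0, max_outliers=5):
    """Hypothetical helper: iteratively flag one-period pulses in residuals.

    Simplified illustration only: residuals are treated as white noise and
    only additive (pulse) outliers are considered.
    """
    resid = list(residuals)
    flagged = []
    for _ in range(max_outliers):
        # Median and MAD, so the outlier itself does not inflate the scale.
        med = statistics.median(resid)
        mad = statistics.median(abs(r - med) for r in resid)
        scale = 1.4826 * mad  # makes MAD comparable to a normal sigma
        if scale == 0:
            break
        # Observation farthest from the center, in standardized units.
        idx = max(range(len(resid)), key=lambda i: abs(resid[i] - med))
        z = (resid[idx] - med) / scale
        if abs(z) < threshold:
            break
        flagged.append((idx, resid[idx] - med))  # (period index, estimated effect)
        resid[idx] = med  # "adjust": remove the estimated pulse effect
    return flagged

residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3, 0.1, 0.0]
print(flag_pulses(residuals))
```

The robust scale matters here. If you use the ordinary standard deviation, a large outlier inflates the scale that is supposed to expose it. This masking problem is one reason naive screening misses outliers.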

It was in 1973 that Chatfield and Prothero published a paper with the words "we have concerns" regarding the approach Box-Jenkins took with the Airline Passenger time series. What they saw was a forecast that turned out to be too aggressive and too high. It is in the "Introduction" section. Naively, people think that if they take a transformation, make a forecast, and then inverse transform the forecast, they are ok. Statisticians and mathematicians know that this is quite incorrect. There is no general solution for this except for the case of logarithms, which requires a special modification to the inverse transform. This was pointed out by Chatfield in his book in 1985. See Rob Hyndman's discussion as well.
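
The "special modification" for logs can be shown in a few lines. This is a minimal sketch. It assumes the forecast on the log scale is normally distributed with mean `mu` and variance `sigma2`, so the value on the original scale is lognormal. Under that assumption, simply taking `exp(mu)` gives the median of the forecast distribution, not the mean. The mean is `exp(mu + sigma2 / 2)`. The inputs below (a log-scale forecast of log(500) and a variance of 0.01) are made-up numbers for illustration.

```python
import math

def back_transform_log_forecast(mu, sigma2):
    """Back-transform a natural-log-scale forecast, assuming lognormal errors.

    Returns (naive, bias_adjusted):
      naive         = exp(mu), the median of the lognormal
      bias_adjusted = exp(mu + sigma2 / 2), the mean of the lognormal
    """
    naive = math.exp(mu)
    bias_adjusted = math.exp(mu + sigma2 / 2)
    return naive, bias_adjusted

# Hypothetical log-scale point forecast and forecast-error variance.
naive, adjusted = back_transform_log_forecast(mu=math.log(500), sigma2=0.01)
print(round(naive, 2), round(adjusted, 2))
```

The gap between the two numbers grows with the forecast-error variance. It therefore widens at longer horizons, which is exactly where a transformed model's forecast is hardest to interpret.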

We do question why software companies, textbooks and practitioners didn't check the assumptions and approaches that previous researchers presented as fact. It was "always take Logs" for the Airline series, and so everyone did. Maybe the assumption that Logs were optimal was never rechecked? I would imagine that, with all of the data scientists and researchers out there with ample tools, someone would have found this out by now (start on page 114 and read on; hint: you won't find the word "outlier" in it!). Maybe they have, but haven't spread the word? We are now. :)

We accidentally discovered that Logs weren't needed when we were implementing Chang's approach. We ran the example on the unlogged dataset and noticed that the residual variance was constant. What? No need to transform??

Logs are a transformation. Drugs also transform us, sometimes with good consequences and sometimes with nasty side effects. In this case, the forecast for the Airline Passenger series was way too high; this was pointed out, but it went largely unnoticed (not by us).

Was their criticism ignored or simply forgotten? Either way, we are here to tell you that schools and statistical software across the globe are repeating a mistake in methodology that should be fixed.

Here is the model that Autobox identifies: seasonal differencing and an AR1 with 3 outliers. It is much simpler than the regular and seasonal differencing, MA1, MA12 model, which produces a bad forecast. Autobox's forecast is not as aggressive. The outlier in March 1960 (period 135) is the main culprit, but the others are also important. If you limit Autobox to search for only one outlier, it finds the 1960 outlier but still uses Logs, so you need to "be better". That outlier caused a false positive on the F test indicating that logs were needed. They weren't and aren't needed!
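
To make the model's structure concrete, here is a rough sketch. The form is seasonal differencing, an AR(1) noise term and pulse outliers. It is not Autobox's estimator. It uses Cochrane-Orcutt iteration, a simple least-squares method for regression with AR(1) errors, and it fits a constant `mu` that the post does not mention. The helper name `fit_sdiff_ar1_pulses` is hypothetical. The data are synthetic, not the airline series. One pulse sits at 0-based index 134 (period 135), and the "true" values phi = 0.6, mu = 5 and w = 15 are invented so you can see them recovered.

```python
import numpy as np

def fit_sdiff_ar1_pulses(y, pulse_idx, s=12, n_iter=20):
    """Hypothetical helper: fit (1 - B^s) y_t = mu + sum_k w_k (1 - B^s) P_kt + N_t,
    with N_t = phi * N_{t-1} + a_t, by Cochrane-Orcutt iteration (not Autobox's method).
    Returns (phi, beta) where beta = [mu, w_1, ..., w_k].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    P = np.zeros((n, len(pulse_idx)))        # one pulse column per outlier
    for k, t0 in enumerate(pulse_idx):
        P[t0, k] = 1.0
    z = y[s:] - y[:-s]                       # seasonal difference of the series
    X = np.column_stack([np.ones(n - s), P[s:] - P[:-s]])  # intercept + differenced pulses
    phi = 0.0
    for _ in range(n_iter):
        # Quasi-difference with the current phi, then OLS for (mu, w_1, ..., w_k).
        beta, *_ = np.linalg.lstsq(X[1:] - phi * X[:-1], z[1:] - phi * z[:-1], rcond=None)
        # Re-estimate phi as the lag-1 regression coefficient of the noise N_t.
        noise = z - X @ beta
        phi = float(noise[1:] @ noise[:-1] / (noise[:-1] @ noise[:-1]))
    return phi, beta

# Synthetic example (NOT the airline data): 144 months, one pulse at index 134.
rng = np.random.default_rng(0)
n, s, t0 = 144, 12, 134
a = rng.normal(size=n - s)
N = np.zeros(n - s)
N[0] = a[0]
for t in range(1, n - s):
    N[t] = 0.6 * N[t - 1] + a[t]             # AR(1) noise with phi = 0.6
x = np.zeros(n - s)
x[t0 - s] = 1.0                              # (1 - B^12) pulse; t0 + 12 is past the end
z = 5.0 + 15.0 * x + N                       # seasonally differenced series
y = np.empty(n)
y[:s] = 100.0                                # arbitrary first year
for j in range(n - s):
    y[j + s] = y[j] + z[j]                   # undo the seasonal difference
phi_hat, beta_hat = fit_sdiff_ar1_pulses(y, [t0])
print(phi_hat, beta_hat)
```

A production fit would use exact maximum likelihood instead. One option is statsmodels' `SARIMAX` with the pulse columns passed as `exog`. The sketch sticks to least squares so every step is visible.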

The residuals show no trend in variance.
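
One rough way to screen residuals for a variance trend is to compare the variance of the later half with the variance of the earlier half. This is a Goldfeld-Quandt-style F ratio. It is not the Tsay test or whatever check Autobox runs internally. It is only a quick sketch that assumes the residuals are roughly zero-mean and in time order. A ratio near 1 is consistent with constant variance. A ratio well above 1 is the pattern that usually gets "fixed" with Logs. The helper name `variance_ratio` is hypothetical.

```python
import statistics

def variance_ratio(residuals):
    """Hypothetical helper: second-half variance divided by first-half variance.

    Near 1 suggests constant variance; well above 1 suggests variance
    growing over time. A rough screen, not a formal outlier-aware test.
    """
    half = len(residuals) // 2
    early, late = residuals[:half], residuals[half:]
    return statistics.variance(late) / statistics.variance(early)

constant = [1.0, -1.0] * 10                            # same spread throughout
growing = [(-1) ** t * (1 + t / 10) for t in range(20)]  # spread widens over time
print(variance_ratio(constant), variance_ratio(growing))
```

Note that a single large outlier in one half can push this ratio far from 1 on its own. The same mechanism can fool a formal F test into calling for a transformation.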

Here is a Description of the Possible Violations of the Assumptions of Constancy of the Mean and Variance in Residuals and How to Fix Them.

Mean of the Error Changes (Tiao/Box/Chang):

1. A 1-period change in Level (i.e., a Pulse)

2. A contiguous multi-period change in Level (Intercept Change)

3. Systematically with the Season (Seasonal Pulse)

4. A change in Trend (nobody but Autobox)

Variance of the Error Changes:

5. At Discrete Points in Time (Tsay Test)

6. Linked to the Expected Value (Box-Cox)

7. Can be described as an ARMA Model (GARCH)

8. Due to Parameter Changes (Chow Test, Tong/TAR Model)
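
The four mean-shift cases (items 1 to 4) are usually handled by adding an intervention variable to the model as an extra regressor. Below is a minimal sketch of those four variables. The function names are hypothetical. Definitions vary between authors and packages. In particular, we assume a seasonal pulse repeats every `s` periods starting at `t0`, and a trend change ramps 1, 2, 3, ... from `t0` onward.

```python
def pulse(n, t0):
    """Item 1: 1 at period t0, 0 elsewhere (one-period change in level)."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]

def level_shift(n, t0):
    """Item 2: 0 before t0, 1 from t0 on (intercept change)."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]

def seasonal_pulse(n, t0, s=12):
    """Item 3: 1 at t0 and every s periods after it (assumed definition)."""
    return [1.0 if t >= t0 and (t - t0) % s == 0 else 0.0 for t in range(n)]

def trend_change(n, t0):
    """Item 4: 0 before t0, then 1, 2, 3, ... (change in trend)."""
    return [float(t - t0 + 1) if t >= t0 else 0.0 for t in range(n)]

print(pulse(5, 2), level_shift(5, 2), seasonal_pulse(10, 1, s=4), trend_change(5, 2))
```

Each list becomes one column in a regression with ARIMA errors. The estimated coefficient on that column measures the size of the pulse, shift or trend change. The variance cases (items 5 to 8) need different tools, such as splitting the error variance at a break point, a Box-Cox transformation, a GARCH model, or a threshold model.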