Tom Reilly

Waging a war against how to model time series vs fitting

Boxing Match - Statistica vs. Autobox (Ding Ding!) - Let's get ready to CRUMBLE!

Posted in Forecasting

Dell acquired Statistica in 2014 and did what was called a "major overhaul", and now, through the magic of Gartner pixie dust, it is in the Magic Quadrant for Advanced Analytics. Woohoo. Tibco bought it from Dell in 2016. We stumbled upon an example in the Statistica documentation and benchmarked it against Autobox. The differences we saw were dramatic, and typical of what we see vs. the competition. By that logic, if Statistica is in the Magic Quadrant, then we should be the undisputed heavyweight champ of the world. Now, the major overhaul was built around the interface and "in-database" connectivity, so the analytics didn't get the fresh coat of paint. Try the example below in your own tool to see whether yours works or not.

You can download a 30-day trial of Statistica here.

You can download the Autobox output here (along with the ASC data file to run it with), or try it yourself with our 30-day trial.

The example is meant to show how to model time series data when an "interruption" has occurred. The data comes from the McCleary/Hay textbook: monthly phone calls to directory assistance in Cincinnati. A charge for each call began at period 147, so a level shift variable was tested to model the impact. Statistica identifies an impact of -399 with a bad forecast, while Autobox identifies an impact of -533 with a good forecast. What makes this example disturbing is that Statistica is not an automated tool, so these were choices made by the person running the analysis. That person decided to analyze only the data before the interruption, using just the first 146 of the 180 observations to determine whether seasonal differencing should be applied. Time series analysis isn't done in a piecemeal fashion. The model built that way is then inappropriately applied to the entire 180 observations, along with a level shift (i.e., interruption) variable, to measure the impact of the policy. Bad. Bad. Bad. Or is it that the person didn't know they should be looking for outliers and level shifts, as Autobox does?
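For readers who haven't built a level shift variable before, here is a minimal sketch in Python of the idea described above. It is not the Statistica or Autobox procedure, and it does not use the Cincinnati data: the series is synthetic, and the baseline level (800), the true shift (-500), and the noise are made-up values chosen only for illustration. The helper names are hypothetical too. The sketch builds a step regressor that switches on at period 147 and estimates the shift by plain least squares over all 180 observations. It ignores autocorrelation, seasonality, and outliers, which a real analysis would have to handle.

```python
import random


def level_shift_dummy(n_obs, start_period):
    """Step regressor: 0 before start_period, 1 from start_period on (1-indexed)."""
    return [0 if t < start_period else 1 for t in range(1, n_obs + 1)]


def ols_intercept_slope(x, y):
    """Ordinary least squares for y = intercept + slope * x (single regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic illustration only: these numbers are assumptions, not the book's data.
random.seed(0)
N_OBS, SHIFT_AT, TRUE_SHIFT = 180, 147, -500.0

step = level_shift_dummy(N_OBS, SHIFT_AT)
series = [800.0 + TRUE_SHIFT * s + random.gauss(0, 20) for s in step]

# Estimate the level before the interruption and the size of the shift,
# using every observation rather than only the pre-interruption ones.
baseline, shift_estimate = ols_intercept_slope(step, series)
print(f"estimated baseline: {baseline:.1f}")
print(f"estimated level shift at period {SHIFT_AT}: {shift_estimate:.1f}")
```

With a single 0/1 regressor, the least squares slope is simply the mean after the interruption minus the mean before it. In a real model the same step variable would sit alongside the ARIMA structure, and both would be identified and estimated on all of the data together.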

We did contact the creators of Statistica, and the response was "Our users would be upset if the answer was different than the book". That was shocking to hear. So, you never want to learn, or to change your methodology to something better?

