Domain knowledge remains a deciding factor in machine learning applications

Background

Time series forecasting is usually a complex task because even univariate data often reflects many unobserved factors. Standard models such as ARIMA, or filters such as the Kalman filter, are complex and often need tuning, which requires a rigorous understanding of the underlying theory.

Practitioners with good domain knowledge but little statistical know-how want to use machine learning and forecasting methods to inform their business decisions. A number of software packages and libraries therefore attempt to bridge this gap by offering automated solutions.

The aim of this article is to investigate a promising library by Facebook…


Goals and contents

ARIMA time series models are often taught in econometrics courses as part of the regular business science curriculum and are thus put to use by data scientists who are sometimes inexperienced.

The intention of this case study is to understand the data generating process behind simple MA(1) models and to illustrate the weakness of the estimators at small sample sizes.
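
The MA(1) data generating process can be sketched in a few lines. The snippet below is a minimal, dependency-free illustration, assuming a zero-mean process y_t = eps_t + beta * eps_{t-1} with standard normal innovations. The function names `simulate_ma1` and `lag1_autocorrelation`, the seed, and the sample size are illustrative choices and are not taken from the original article.

```python
import random


def simulate_ma1(n, beta, sigma=1.0, seed=None):
    """Simulate n observations of y_t = eps_t + beta * eps_{t-1}."""
    rng = random.Random(seed)
    # Draw n + 1 innovations so the first observation also has a lagged shock.
    eps = [rng.gauss(0.0, sigma) for _ in range(n + 1)]
    return [eps[t] + beta * eps[t - 1] for t in range(1, n + 1)]


def lag1_autocorrelation(series):
    """Sample autocorrelation of the series at lag 1."""
    n = len(series)
    mean = sum(series) / n
    centered = [y - mean for y in series]
    variance = sum(c * c for c in centered)
    covariance = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return covariance / variance


series = simulate_ma1(n=5000, beta=0.3, seed=42)
# For an MA(1) process the theoretical lag-1 autocorrelation is beta / (1 + beta^2).
print("theoretical:", 0.3 / (1 + 0.3 ** 2))
print("sample:     ", lag1_autocorrelation(series))
```

Because each observation shares exactly one shock with its neighbour, the autocorrelation is non-zero at lag 1 and zero at all higher lags, which is the signature that an estimator has to pick up from the data.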

Result

For the tested MA(1) model with coefficient beta = 0.3, a time series of at least 5000 observations is necessary to obtain a reasonably narrow confidence interval around the estimated coefficient.
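
How the spread of the estimate shrinks with the series length can be illustrated with a small Monte Carlo experiment. The sketch below is not the article's estimation procedure: instead of fitting an ARIMA model by maximum likelihood, it uses a simple method-of-moments estimator that inverts the lag-1 autocorrelation rho_1 = beta / (1 + beta^2). The function names, the number of replications, the sample sizes, and the clipping threshold are all assumptions made for illustration.

```python
import math
import random
import statistics


def simulate_ma1(n, beta, rng):
    """Simulate n observations of y_t = eps_t + beta * eps_{t-1}."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eps[t] + beta * eps[t - 1] for t in range(1, n + 1)]


def estimate_beta_moments(series):
    """Method-of-moments estimate: invert rho_1 = beta / (1 + beta^2)."""
    n = len(series)
    mean = sum(series) / n
    centered = [y - mean for y in series]
    rho = sum(centered[t] * centered[t - 1] for t in range(1, n)) / sum(
        c * c for c in centered
    )
    if rho == 0.0:
        return 0.0
    # An invertible MA(1) implies |rho_1| < 0.5, so noisy estimates are clipped
    # (the threshold 0.499 is an arbitrary illustrative choice).
    rho = max(-0.499, min(0.499, rho))
    # Pick the root of rho * beta^2 - beta + rho = 0 with |beta| < 1.
    return (1.0 - math.sqrt(1.0 - 4.0 * rho * rho)) / (2.0 * rho)


def estimator_spread(n, beta=0.3, replications=100, seed=0):
    """Mean and standard deviation of the estimate across simulated series."""
    rng = random.Random(seed)
    estimates = [
        estimate_beta_moments(simulate_ma1(n, beta, rng))
        for _ in range(replications)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


results = {n: estimator_spread(n) for n in (50, 500, 5000)}
for n, (mean_hat, sd_hat) in results.items():
    print(f"n={n:5d}  mean estimate={mean_hat:.3f}  std. dev.={sd_hat:.3f}")
```

The standard deviation across replications is a rough stand-in for the width of a confidence interval: it should fall roughly with the square root of the series length, which is why short series leave the coefficient poorly determined.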

The impact on forecast quality is also evaluated; it depends critically on the estimated coefficient.
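
To see how the coefficient feeds into the forecast, recall that the one-step-ahead forecast of a zero-mean MA(1) process is the estimated coefficient times the most recent estimated shock. The sketch below is a simplified illustration rather than the article's evaluation: it simulates one series with true beta = 0.3 and compares the mean squared one-step forecast error for several hypothetical values of the estimated coefficient. The function names, the candidate values, and the assumption that the initial shock is zero are illustrative.

```python
import random


def simulate_ma1(n, beta, seed=None):
    """Simulate n observations of y_t = eps_t + beta * eps_{t-1}."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [eps[t] + beta * eps[t - 1] for t in range(1, n + 1)]


def one_step_forecast_mse(series, beta_hat):
    """Mean squared error of the forecasts y_hat_t = beta_hat * e_{t-1}."""
    residual = 0.0  # assumption: the unobserved initial shock is zero
    squared_errors = []
    for y in series:
        forecast = beta_hat * residual
        # The forecast error doubles as the estimate of the current shock.
        residual = y - forecast
        squared_errors.append(residual ** 2)
    return sum(squared_errors) / len(squared_errors)


series = simulate_ma1(n=20000, beta=0.3, seed=1)
mse_by_beta = {
    b: one_step_forecast_mse(series, b) for b in (0.0, 0.15, 0.3, 0.45, 0.6)
}
for b, mse in mse_by_beta.items():
    print(f"beta_hat={b:.2f}  one-step MSE={mse:.4f}")
```

Under these assumptions the error is smallest when the estimated coefficient is close to the true one and grows as the estimate moves away from it in either direction.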

The case

Install some libraries first:

Konrad Hoppe

applied AI / strategy consultant / aspiring XC rider on weekends // http://www.konrad-hoppe.com
