ARIMA

Given a time series of data X, the Autoregressive Integrated Moving Average (ARIMA) model is a tool for understanding and, perhaps, predicting future values in the series. The model consists of three parts, an autoregressive (AR) part, a moving average (MA) part, and an integrated (I) part where an initial differencing step can be applied to remove any non-stationarity in the signal. The model is generally referred to as an ARIMA(p, d, q) model where parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively.

- Training Function

The ARIMA training function has the following syntax.

arima_train( input_table, output_table, timestamp_column, timeseries_column, grouping_columns, include_mean, non_seasonal_orders, optimizer_params )

**Arguments**

- input_table
TEXT. The name of the table containing time series data.

- output_table
TEXT. The name of the table to store the ARIMA model. Three tables are created, with names based on the value of the

*output_table*argument in the training function:*output_table*: Table containing the ARIMA model. Contains the following columns:mean Model mean (only if 'include_mean' is TRUE) mean_std_error Standard errors for mean ar_params Auto-regressions parameters of the ARIMA model ar_std_errors Standard errors for AR parameters ma_params Moving average parameters of the ARIMA model ma_std_errors Standard errors for MA parameters *output_table*_summary: Table containing descriptive statistics of the ARIMA model. Contains the following columns:input_table Table name with the source data timestamp_col Column name in the source table that contains the timestamp index to data timeseries_col Column name in the source table that contains the data values non_seasonal_orders Orders of the non-seasonal ARIMA model include_mean TRUE if intercept was included in ARIMA model residual_variance Variance of the residuals log_likelihood Log likelihood value (when using MLE) iter_num The number of iterations executed exec_time Total time taken to train the model *output_table*_residual: Table containing the residuals for each data point in 'input_table'. Contains the following columns:timestamp_col Same as the 'timestamp_col' parameter (all indices from source table included except the first *d*elements, where*d*is the differencing order value from 'non_seasonal_orders')residual Residual value for each data point

- timestamp_column
TEXT. The name of the column containing the timestamp (index) data. This could be a serial index (INTEGER) or date/time value (TIMESTAMP).

- timeseries_column
TEXT. The name of the column containing the time series data. This data is currently restricted to DOUBLE PRECISION.

- grouping_columns (optional)
TEXT, default: NULL.

*Not currently implemented. Any non-NULL value is ignored.*A comma-separated list of column names used to group the input dataset into discrete groups, training one ARIMA model per group. It is similar to the SQL

`GROUP BY`

clause. When this value is null, no grouping is used and a single result model is generated.- include_mean (optional)
BOOLEAN, default: FALSE. Mean value of the data series is added in the ARIMA model if this variable is True.

- non_seasonal_orders (optional)
INTEGER[], default: 'ARRAY[1,1,1]'. Orders of the ARIMA model. The orders are [p, d, q], where parameters p, d, and q are non-negative integers that refer to the order of the autoregressive, integrated, and moving average parts of the model respectively.

- optimizer_params (optional)
- TEXT. Comma-separated list of optimizer-specific parameters of the form â€˜name=value'. The order of the parameters does not matter. The following parameters are recognized:
**max_iter:**Maximum number of iterations to run learning algorithm (Default = 100)**tau:**Computes the initial step size for gradient algorithm (Default = 0.001)**e1:**Algorithm-specific threshold for convergence (Default = 1e-15)**e2:**Algorithm-specific threshold for convergence (Default = 1e-15)**e3:**Algorithm-specific threshold for convergence (Default = 1e-15)**hessian_delta:**Delta parameter to compute a numerical approximation of the Hessian matrix (Default = 1e-6)

- Forecasting Function

The ARIMA forecast function has the following syntax.

arima_forecast( model_table, output_table, steps_ahead )

**Arguments**

- model_table
TEXT. The name of the table containing the ARIMA model trained on the time series dataset.

- output_table
TEXT. The name of the table to store the forecasted values. The output table produced by the forecast function contains the following columns.

group_by_cols Grouping column values (if grouping parameter is provided) step_ahead Time step for the forecast forecast_value Forecast of the current time step - steps_ahead
- INTEGER. The number of steps to forecast at the end of the time series.

- Examples
- View online help for the ARIMA training function.
SELECT madlib.arima_train();

- Create an input data set.
DROP TABLE IF EXISTS arima_beer; CREATE TABLE arima_beer (time_id integer NOT NULL, value double precision NOT NULL ); COPY arima_beer (time_id, value) FROM stdin WITH DELIMITER '|'; 1 | 93.2 2 | 96.0 3 | 95.2 4 | 77.0 5 | 70.9 6 | 64.7 7 | 70.0 8 | 77.2 9 | 79.5 10 | 100.5 11 | 100.7 12 | 107.0 13 | 95.9 14 | 82.7 15 | 83.2 16 | 80.0 17 | 80.4 18 | 67.5 19 | 75.7 20 | 71.0 21 | 89.2 22 | 101.0 23 | 105.2 24 | 114.0 25 | 96.2 26 | 84.4 27 | 91.2 28 | 81.9 29 | 80.5 30 | 70.4 31 | 74.7 32 | 75.9 33 | 86.2 34 | 98.7 35 | 100.9 36 | 113.7 37 | 89.7 38 | 84.4 39 | 87.2 40 | 85.5 \.

- Train an ARIMA model.
-- Train ARIMA model with 'grouping_columns'=NULL, 'include_mean'=TRUE, -- and 'non_seasonal_orders'=[1,1,1] SELECT madlib.arima_train( 'arima_beer', 'arima_beer_output', 'time_id', 'value', NULL, FALSE, ARRAY[1, 1, 1] );

- Examine the ARIMA model.
\x ON SELECT * FROM arima_beer_output;

Result:-[ RECORD 1 ]-+------------------ ar_params | {0.221954769696} ar_std_errors | {0.575367782602} ma_params | {-0.140623564576} ma_std_errors | {0.533445214346}

- View the summary statistics table.
SELECT * FROM arima_beer_output_summary;

Result:-[ RECORD 1 ]-------+--------------- input_table | arima_beer timestamp_col | time_id timeseries_col | value non_seasonal_orders | {1,1,1} include_mean | f residual_variance | 100.989970539 log_likelihood | -145.331516396 iter_num | 28 exec_time (s) | 2.75

- View the residuals.
\x OFF SELECT * FROM arima_beer_output_residual;

Result:time_id | residual ---------+-------------------- 2 | 0 4 | -18.222328834394 6 | -5.49616627282665 ... 35 | 1.06298837051437 37 | -25.0886854003757 39 | 3.48401666299571 (40 rows)

- Use the ARIMA forecast function to forecast 10 future values.
SELECT madlib.arima_forecast( 'arima_beer_output', 'arima_beer_forecast_output', 10 ); SELECT * FROM arima_beer_forecast_output;

Result:steps_ahead | forecast_value -------------+---------------- 1 | 85.3802343659 3 | 85.3477516875 5 | 85.3461514635 7 | 85.3460726302 9 | 85.3460687465 2 | 85.3536518121 4 | 85.3464421267 6 | 85.3460869494 8 | 85.3460694519 10 | 85.34606859 (10 rows)

- View online help for the ARIMA training function.

- Technical Background
- An ARIMA model is an
*a*uto-*r*egressive*i*ntegrated*m*oving*a*verage model. An ARIMA model is typically expressed in the form

where is the backshift operator. The time is from to .

ARIMA models involve the following variables:

- The values of the time series: .
- Parameters of the model: , , and ; is the differencing order, is the order of the AR operator, and is the order of the MA operator.
- The AR operator: .
- The MA operator: .
- The lag difference: , where .
- The mean value: , which is set to be zero for and estimated from the data when d=0.
- The error terms: .

The auto regression operator models the prediction for the next observation as some linear combination of the previous observations. More formally, an AR operator of order is defined as

The moving average operator is similar, and it models the prediction for the next observation as a linear combination of the errors in the previous prediction errors. More formally, the MA operator of order is defined as

We estimate the parameters using the Levenberg-Marquardt Algorithm. In mathematics and computing, the Levenberg-Marquardt algorithm (LMA), also known as the damped least-squares (DLS) method, provides a numerical solution to the problem of minimizing a function, generally nonlinear, over a space of parameters of the function.

Like other numeric minimization algorithms, LMA is an iterative procedure. To start a minimization, the user has to provide an initial guess for the parameter vector, $p$, as well as some tuning parameters .

- Literature

[1] Rob J Hyndman and George Athanasopoulos: Forecasting: principles and practice, http://otexts.com/fpp/

[2] Robert H. Shumway, David S. Stoffer: Time Series Analysis and Its Applications With R Examples, Third edition Springer Texts in Statistics, 2010

[3] Henri Gavin: The Levenberg-Marquardt method for nonlinear least squares curve-fitting problems, 2011

- Related Topics

File arima.sql_in documenting the ARIMA functions