ARIMA#
ARIMA is an industry-standard name for a family of linear models that explain a time series using only its own previous values. It takes its name from:
AutoRegressive.
Integrated.
Moving Average.
ARMA#
These models interpret the observed time series as a sum of two components:
\(x_t\): the deterministic part of the \(t\)-th element of the time series.
\(\varepsilon_t\): the random noise of the \(t\)-th element of the time series.
Thus, each observed value in the sample is the sum \(y_{t} = x_t + \varepsilon_t\).
The ARMA model assumes that the \(t\)-th value of the time series (\(x_t\)) depends linearly on \(p\) previous values of the time series (\(x_{t-1}, x_{t-2}, \ldots, x_{t-p}\)) and \(q\) previous values of the random noise (\(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}\)).
It is typically written as the following equation:

\[y_t - \sum_{i=1}^{p} \alpha_i y_{t-i} = \varepsilon_t - \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}\]

Where
\(\alpha_i, i = \overline{1,p}\): The coefficient that describes how the \(t\)-th value of the time series depends on the \((t-i)\)-th value of the time series.
\(\theta_i, i = \overline{1,q}\): The coefficient that describes how the \(t\)-th value of the time series depends on the random noise for the \(t-i\)-th observation.
The official definition can be a bit confusing because it does not isolate a particular value of the time series on one side. Thus, it can be rewritten using basic algebraic transformations as follows:

\[y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \varepsilon_t - \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}\]
Since \(\varepsilon_t\) is just noise, the sign in front of it is not important. We can rewrite the entire identity as follows:

\[y_t = \sum_{i=1}^{p} \alpha_i y_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}\]
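As an illustration, the ARMA recursion can be simulated directly. This is a minimal sketch with arbitrarily chosen small coefficients (they are illustrations, not estimates of anything), and pre-sample values are implicitly treated as zero:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative ARMA(2, 1) coefficients, kept small so the
# simulated series stays stationary.
alpha = [0.5, -0.3]  # autoregressive coefficients alpha_1, alpha_2
theta = [0.4]        # moving-average coefficient theta_1

n = 200
eps = rng.normal(size=n)  # random noise epsilon_t
y = np.zeros(n)

for t in range(n):
    # Sum over the available previous series and noise values;
    # indices before the start of the sample are treated as zero.
    ar = sum(a * y[t - i] for i, a in enumerate(alpha, start=1) if t - i >= 0)
    ma = sum(th * eps[t - i] for i, th in enumerate(theta, start=1) if t - i >= 0)
    y[t] = ar + eps[t] + ma

print(y[:3])
```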
The ARMA model can be applied only under the assumption that the time series is stationary. This means that there is no trend and the variance is constant over time.
Computation#
Because the ARMA model uses the previous \(\max(p, q)\) values to estimate the \(t\)-th element of the sequence, the procedure is recursive.
This creates the issue that we need initial values to start the process. Typically, the missing pre-sample values \(x_{-p}, \ldots, x_{-1}\) and \(\varepsilon_{-q}, \ldots, \varepsilon_{-1}\) needed for the first \(\max(p, q)\) elements are either set to constants or estimated using backcasting.
Backcasting is a method of estimating pre-sample values of a time series by running the model equations backward in time to generate plausible initial conditions.
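The effect of the chosen initial constants can be seen in a small sketch: for a stationary AR(1) recursion (with a hypothetical coefficient \(\alpha = 0.7\)), two different initial values produce sequences that differ at the start but converge, because their difference decays as \(\alpha^t\):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.7                # hypothetical AR(1) coefficient
eps = rng.normal(size=60)  # shared noise sequence

def ar1(x0):
    """Run the AR(1) recursion x_t = alpha * x_{t-1} + eps_t from x0."""
    x = np.empty(len(eps))
    prev = x0
    for t, e in enumerate(eps):
        prev = alpha * prev + e
        x[t] = prev
    return x

a = ar1(0.0)  # pre-sample value set to the constant 0
b = ar1(5.0)  # a different initial constant

print(abs(a[0] - b[0]))    # large: the initial value matters early on
print(abs(a[-1] - b[-1]))  # tiny: its influence has decayed away
```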
Integration#
The ARMA model requires the explained series to be stationary, and the integration procedure is typically applied to achieve stationarity. In this context, integration is simply a differencing transformation that subtracts the previous value of the time series from the current one:

\[\nabla x_t = x_t - x_{t-1}\]
Generally, the transformation can be applied several times:

\[\nabla^d x_t = \nabla(\nabla^{d-1} x_t)\]
The \(ARIMA(p, d, q)\) notation stands for applying the \(ARMA(p, q)\) model to the \(d\)-times differenced series \(\nabla^d x_t\).
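A quick NumPy sketch shows how differencing removes a trend (the linear series here is a toy example):

```python
import numpy as np

t = np.arange(10, dtype=float)
x = 2.0 * t + 3.0     # a series with a pure linear trend

d1 = np.diff(x)       # first difference:  x_t - x_{t-1}
d2 = np.diff(x, n=2)  # second difference: the transformation applied twice

print(d1)  # constant 2.0 everywhere: the trend is gone
print(d2)  # all zeros
```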
To determine how many times a time series must be integrated, the series is typically integrated until a selected stationarity test indicates that it is stationary. The following tests can be applied in this case:
ADF: Augmented Dickey-Fuller (ADF) test.
KPSS: Kwiatkowski-Phillips-Schmidt-Shin test.
PP: Phillips-Perron test.
Choosing the order#
There are a number of approaches that allow you to determine which autoregressive and moving-average components should be included in the model.
If the first \(p\) partial autocorrelation coefficients are high and then cut off rapidly, it may indicate that you should include \(p\) autoregressive terms.
If the first \(q\) autocorrelation coefficients are high and then cut off rapidly, it may indicate that you should include \(q\) moving-average terms.
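For instance, the sample autocorrelations of a simulated pure moving-average process show this cutoff pattern. A minimal sketch (the coefficient 0.8 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an MA(1) process: x_t = eps_t + 0.8 * eps_{t-1}.
n = 5000
eps = rng.normal(size=n + 1)
x = eps[1:] + 0.8 * eps[:-1]

def sample_acf(x, lag):
    """Sample autocorrelation coefficient at the given lag."""
    xc = x - x.mean()
    return np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc)

# The lag-1 autocorrelation is clearly non-zero, while lag 2 is near
# zero: an MA(q) process has an ACF that cuts off after lag q.
print(sample_acf(x, 1))
print(sample_acf(x, 2))
```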