Time Series ForecastingΒΆ

In this example, we use a feature representation pipeline to forecast a continuous time series target with a regressor.

The algorithm is trained from the target from the features and targets in the training set. Then predict (future segments) from the features in the test set.

We do not sequentially retrain the algorithm as we move through the test set - which is an approach you will sometimes see with time series forecasting (and which may or may not be useful in your application).

../_images/sphx_glr_plot_forecast_001.png

Out:

/home/circleci/miniconda/envs/testenv/lib/python3.7/site-packages/sklearn/base.py:197: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
  FutureWarning)
N series in train:  1
N series in test:  1
N segments in train:  34
N segments in test:  9
Score:  0.9969665115318143

# Author: David Burns
# License: BSD


import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

from seglearn.pipe import Pype
from seglearn.split import temporal_split
from seglearn.transform import FeatureRep, SegmentXYForecast, last

t = np.arange(5000) / 100.
y = np.sin(t) * t * 2.5 + t * t

# with forecasting, X can include the target
X = np.stack([t, y], axis=1)

# remember for a single time series, we need to make a list
X = [X]
y = [y]

# split the data along the time axis (our only option since we have only 1 time series)
X_train, X_test, y_train, y_test = temporal_split(X, y, test_size=0.25)

# create a feature representation pipeline
# setting y_func = last, and forecast = 200 makes us predict the value of y
# 200 samples ahead of the segment
# other reasonable options for y_func are ``mean``, ``all`` (or create your own function)
# see the API documentation for further details
clf = Pype([('segment', SegmentXYForecast(width=200, overlap=0.5, y_func=last, forecast=200)),
            ('features', FeatureRep()),
            ('lin', LinearRegression())])

# fit and score
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)

print("N series in train: ", len(X_train))
print("N series in test: ", len(X_test))
print("N segments in train: ", clf.N_train)
print("N segments in test: ", clf.N_test)
print("Score: ", score)

# generate some predictions
y, y_p = clf.transform_predict(X, y)  # all predictions
ytr, ytr_p = clf.transform_predict(X_train, y_train)  # training predictions
yte, yte_p = clf.transform_predict(X_test, y_test)  # test predictions

# note - the first few segments in the test set won't have predictions (gap)
# we plot the 'gap' for the visualization to hopefully make the situation clear
Ns = len(y)
ts = np.arange(Ns)  # segment number
ttr = ts[0:len(ytr)]
tte = ts[(Ns - len(yte)):Ns]
tga = ts[len(ytr):(Ns - len(yte))]
yga = y[len(ytr):(Ns - len(yte))]

# plot the results
plt.plot(ttr, ytr, '.', label="training")
plt.plot(tga, yga, '.', label="gap")
plt.plot(tte, yte, '.', label="test")
plt.plot(tte, yte_p, label="predicted")

plt.xlabel("Segment Number")
plt.ylabel("Target")
plt.legend()
plt.show()

Total running time of the script: ( 0 minutes 0.090 seconds)

Gallery generated by Sphinx-Gallery