Temporal K Fold Data Splitter
This module provides a class (TemporalKFold) and a function (temporal_split) for splitting time series data temporally, so that train/test or fold splits are created within each series in the data set. This splitting approach evaluates how well an algorithm performs on segments drawn from the same time series but excluded from the training set. Performance measured this way should be similar to performance on the training data, so long as the data within each series is relatively uniform.
Note that splitting along the temporal axis violates the assumption of independence between train and test samples, as successive samples in a sequence or series are not i.i.d. However, temporal splitting is still useful in certain cases, such as the analysis of a single sequence / series.
class seglearn.split.TemporalKFold(n_splits=3)
K-fold iterator variant for temporal splitting of time series data.
The time series are divided in time with no overlap, and are balanced.
Splitting the time series changes the number of samples in the data set, so the split function returns new arrays for the data and target in addition to the iterator.
- Parameters
- n_splits : int > 1
number of folds
Examples
>>> from seglearn.split import TemporalKFold
>>> from seglearn.datasets import load_watch
>>> data = load_watch()
>>> splitter = TemporalKFold(n_splits=4)
>>> X, y, cv = splitter.split(data['X'], data['y'])
Methods
split(self, X, y)
Splits time series data and target arrays, and generates splitting indices.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series, ]
target vector
- Returns
- X : array-like, shape [n_series * n_splits, ]
Split time series data and contextual data
- y : array-like, shape [n_series * n_splits]
Split target data
- cv : list, shape [2, n_splits]
Splitting indices
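To make the returned shapes concrete, here is a minimal pure-Python sketch of temporal K-fold splitting. It is an illustration, not seglearn's implementation: the name temporal_kfold_sketch is hypothetical, and it assumes one target per series, no contextual data, a fold-major ordering of the output segments, and that any trailing samples which do not divide evenly are dropped.

```python
def temporal_kfold_sketch(X, y, n_splits=3):
    """Cut every series into n_splits contiguous segments (hypothetical helper).

    Returns the segmented data, the repeated targets, and a list of
    (train_indices, test_indices) pairs, one pair per fold.
    """
    n_series = len(X)
    X_new, y_new = [], []
    for k in range(n_splits):
        for series, target in zip(X, y):
            seg_len = len(series) // n_splits  # remainder samples are dropped
            X_new.append(series[k * seg_len:(k + 1) * seg_len])
            y_new.append(target)  # assumption: one target per series

    n_total = n_series * n_splits
    cv = []
    for k in range(n_splits):
        # Fold k tests on the k-th segment of every series
        test_idx = list(range(k * n_series, (k + 1) * n_series))
        train_idx = [i for i in range(n_total) if i not in test_idx]
        cv.append((train_idx, test_idx))
    return X_new, y_new, cv


# Two toy series of different lengths, one label each
X_demo = [list(range(6)), list(range(10, 19))]
y_demo = ["a", "b"]
X_seg, y_seg, cv_demo = temporal_kfold_sketch(X_demo, y_demo, n_splits=3)
```

With 2 series and 3 splits the sketch yields 6 segments, matching the n_series * n_splits shape given above, and each fold holds out one segment from every series.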
seglearn.split.temporal_split(X, y, test_size=0.25)
Split time series or sequence data along the time axis. Test data is drawn from the end of each series / sequence.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series, ]
target vector
- test_size : float
between 0 and 1, fraction of each series to allocate to the test set
- Returns
- X_train : array-like, shape [n_series, ]
- X_test : array-like, shape [n_series, ]
- y_train : array-like, shape [n_series, ]
- y_test : array-like, shape [n_series, ]
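The idea of drawing test data from the end of each series can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the library's code: temporal_split_sketch is a hypothetical name, each series is assumed to carry a single target value, and contextual data is ignored.

```python
def temporal_split_sketch(X, y, test_size=0.25):
    """Split each series in time; the final test_size fraction becomes test data."""
    X_train, X_test, y_train, y_test = [], [], [], []
    for series, target in zip(X, y):
        cut = int(len(series) * (1.0 - test_size))  # index of the first test sample
        X_train.append(series[:cut])
        X_test.append(series[cut:])
        # Assumption: one target per series, so both parts share it
        y_train.append(target)
        y_test.append(target)
    return X_train, X_test, y_train, y_test


# Two toy series of lengths 8 and 4
X_demo = [list(range(8)), list(range(4))]
y_demo = [0, 1]
Xtr, Xte, ytr, yte = temporal_split_sketch(X_demo, y_demo, test_size=0.25)
```

Every output keeps one entry per input series, consistent with the [n_series, ] shapes listed above; only the length of each individual series changes.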