Time Series Preprocessing

This module is for preprocessing time series data.

class seglearn.preprocessing.TargetRunLengthEncoder(min_length=200)[source]

Takes a data set whose categorical target variable is encoded as a time series and transforms it using run length encoding (RLE) of the target variable.

RLE finds contiguous runs of the same target value within the input data. The transformed data set is the amalgam of all contiguous runs of all target classes from all series in the input data.

This is useful for generating “pure” series, with no mixing of target values, from datasets that encode the target variable as a series (e.g. MHEALTH and PAMAP2).

Note that seglearn can handle datasets with target variables encoded as a series natively (using SegmentXY) and so this preprocessing is not required but may be helpful for some tasks. Effectively it will let you use SegmentX on datasets that would otherwise require SegmentXY.
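The run-splitting idea described above can be sketched in plain Python. This is an illustration only, not seglearn's implementation: the helper name run_length_encode is hypothetical, series are modelled as simple lists, and a run is assumed to be kept when its length is at least min_length.

```python
from itertools import groupby


def run_length_encode(X, y, min_length=2):
    """Split each (series, target series) pair into single-class runs.

    X: list of series (each a list of samples)
    y: list of target series, aligned sample-by-sample with X
    Returns (Xt, yt): the extracted runs and one target label per run.
    Hypothetical helper for illustration; not part of seglearn.
    """
    Xt, yt = [], []
    for series, targets in zip(X, y):
        start = 0
        # groupby yields one group per contiguous run of equal labels
        for label, group in groupby(targets):
            length = sum(1 for _ in group)
            if length >= min_length:  # assumed rule: drop runs that are too short
                Xt.append(series[start:start + length])
                yt.append(label)
            start += length
    return Xt, yt


X = [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [1.0, 1.1, 1.2]]
y = [[0, 0, 0, 1, 1, 0], [2, 2, 2]]
Xt, yt = run_length_encode(X, y, min_length=2)
# Xt holds three runs; the trailing single-sample run of class 0 is dropped
```

Each output series now carries a single target value, which is why a segmenter that expects one label per series can be applied afterwards.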

Parameters
min_length : integer > 1

minimum number of samples in a run for it to be included in the transformed data

Methods

fit(self, X[, y])

Fit the transform

fit_transform(self, X, y[, sample_weight])

Fit the data and transform (required by sklearn API)

get_params(self[, deep])

Get parameters for this estimator.

set_params(self, **params)

Set the parameters of this estimator.

transform(self, X, y[, sample_weight])

Transforms the time series data with run length encoding of the target variable. Note that this transformation changes the number of samples in the data. If sample_weight is provided, it is transformed to align with the new target encoding.

fit(self, X, y=None)[source]

Fit the transform

Parameters
X : array-like, shape [n_series, …]

Time series data and (optionally) contextual data

y : None

A target is not needed to fit this transformer, but the pipeline API requires this parameter.

Returns
self : object

Returns self.

transform(self, X, y, sample_weight=None)[source]

Transforms the time series data with run length encoding of the target variable. Note that this transformation changes the number of samples in the data. If sample_weight is provided, it is transformed to align with the new target encoding.

Parameters
X : array-like, shape [n_series, …]

Time series data and (optionally) contextual data

y : array-like, shape [n_series, …]

target variable encoded as a time series

sample_weight : array-like, shape [n_series], default = None

sample weights

Returns
Xt : array-like, shape [n_rle_series, ]

transformed time series data

yt : array-like, shape [n_rle_series]

target values for each series

sample_weight_new : array-like, shape [n_rle_series]

sample weights
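The shapes above imply one weight per input series going in and one weight per extracted run coming out. A plausible way to picture that alignment is sketched below, under the assumption (not confirmed by this page) that every retained run inherits the weight of the series it was cut from. The helper name align_sample_weight is hypothetical and is not a seglearn function.

```python
from itertools import groupby


def align_sample_weight(y, sample_weight, min_length=2):
    """Return one weight per retained run.

    Assumption for illustration: each run of at least min_length samples
    inherits the weight of its source series. Hypothetical helper only.
    """
    weights = []
    for targets, w in zip(y, sample_weight):
        for _, group in groupby(targets):
            if sum(1 for _ in group) >= min_length:
                weights.append(w)
    return weights


y = [[0, 0, 0, 1, 1, 0], [2, 2, 2]]  # n_series = 2
sample_weight = [0.5, 2.0]           # one weight per series
sample_weight_new = align_sample_weight(y, sample_weight, min_length=2)
# the first series contributes two runs, the second contributes one
```

Under this assumption the length of sample_weight_new always matches the number of runs, i.e. n_rle_series.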