Time Series Transforms
This module is for transforming time series data.
-
class seglearn.transform.Segment(width=100, overlap=0.5, step=None, y_func=<function last>, shuffle=False, random_state=None, order='F')
Transformer for sliding window segmentation of datasets where X is time series data, optionally with contextual variables, and y either has a single value for each time series or is itself a time series with the same sampling interval as X.
The target y is mapped to segments from their parent series.
If the target y is a time series, the optional parameter y_func determines the mapping behavior. The segment targets can be a single value or a sequence of values, depending on the y_func parameter.
The transformed data consists of segment/target pairs that can be learned through a feature representation or directly with a neural network.
- Parameters
- width : int > 0
width of segments (number of samples)
- overlap : float, range [0, 1]
amount of overlap between segments; must be in range 0 <= overlap <= 1 (note: setting overlap to 1.0 results in the segments being advanced by a single sample)
- step : int, range [1, width] (default=None)
number of samples to advance between adjacent segments (note: this takes precedence over overlap)
- y_func : function
returns target from array of target segments (e.g. last, middle, or mean)
- shuffle : bool, optional
shuffle the segments after transform (recommended for batch optimizations)
- random_state : int, default = None
Randomized segment shuffling will return different results for each call to transform. If you have set shuffle to True and want the same result with each call to fit, set random_state to an integer.
- order : str, optional (default='F')
Determines the index order of the segmented time series. 'C' means C-like index order (first index changes slowest) and 'F' means Fortran-like index order (last index changes slowest). 'C' ordering is suggested for neural network estimators, and 'F' ordering is suggested for computing feature representations.
- Returns
- self : object
Returns self.
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data into segments (temporal tensor). Note that this transformation changes the number of samples in the data. If y and sample_weight are provided, they are transformed to align with the new samples.
-
fit(self, X, y=None)
Fit the transform
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- shuffle : bool
Shuffles data after transformation
- Returns
- self : object
Returns self.
-
transform(self, X, y=None, sample_weight=None)
Transforms the time series data into segments (temporal tensor). Note that this transformation changes the number of samples in the data. If y and sample_weight are provided, they are transformed to align with the new samples.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series], default = None
target vector
- sample_weight : array-like, shape [n_series], default = None
sample weights
- Returns
- Xt : array-like, shape [n_segments, ]
transformed time series data
- yt : array-like, shape [n_segments]
expanded target vector
- sample_weight_new : array-like, shape [n_segments]
expanded sample weights
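The interaction of width, overlap, step, and y_func described for Segment can be illustrated with a small dependency-free sketch. It is not the seglearn implementation: the helper names (segment_step, sliding_windows) are hypothetical, and the rule used to derive the step from the overlap (truncate width * (1 - overlap), with a minimum of one sample) is an assumption chosen to match the documented behavior that overlap=1.0 advances segments by a single sample.

```python
def segment_step(width, overlap=0.5, step=None):
    """Samples to advance between windows; an explicit step wins over overlap."""
    if step is not None:
        return step
    # Assumed rule (not taken from the library source): truncate
    # width * (1 - overlap) and never advance by less than one sample.
    return max(1, int(width * (1.0 - overlap)))


def sliding_windows(series, width, step):
    """Return every full-width window of `series`, advancing by `step`."""
    last_start = len(series) - width
    return [series[i:i + width] for i in range(0, last_start + 1, step)]


def last(window):
    """Stand-in for a y_func: keep the final target value in the window."""
    return window[-1]


x = list(range(10))                   # one univariate series with 10 samples
y = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]    # target sampled at the same times as x

step = segment_step(width=4, overlap=0.5)
x_segments = sliding_windows(x, 4, step)
y_segments = [last(w) for w in sliding_windows(y, 4, step)]

print(step)        # 2
print(x_segments)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(y_segments)  # [0, 1, 1, 2]
```

One series of 10 samples becomes 4 segments here, which is why the transform changes the number of samples and why y and sample_weight have to be expanded to match.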
-
class seglearn.transform.SegmentX(width=100, overlap=0.5, step=None, shuffle=False, random_state=None, order='F')
DEPRECATED - Use Segment class instead
Transformer for sliding window segmentation of datasets where X is time series data, optionally with contextual variables, and each time series in X has a single target value y.
The target y is mapped to all segments from their parent series. The transformed data consists of segment/target pairs that can be learned through a feature representation or directly with a neural network.
- Parameters
- width : int > 0
width of segments (number of samples)
- overlap : float, range [0, 1]
amount of overlap between segments; must be in range 0 <= overlap <= 1 (note: setting overlap to 1.0 results in the segments being advanced by a single sample)
- step : int, range [1, width] (default=None)
number of samples to advance between adjacent segments (note: this takes precedence over overlap)
- shuffle : bool, optional
shuffle the segments after transform (recommended for batch optimizations)
- random_state : int, default = None
Randomized segment shuffling will return different results for each call to transform. If you have set shuffle to True and want the same result with each call to fit, set random_state to an integer.
- order : str, optional (default='F')
Determines the index order of the segmented time series. 'C' means C-like index order (first index changes slowest) and 'F' means Fortran-like index order (last index changes slowest). 'C' ordering is suggested for neural network estimators, and 'F' ordering is suggested for computing feature representations.
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data into segments (temporal tensor). Note that this transformation changes the number of samples in the data. If y and sample_weight are provided, they are transformed to align with the new samples.
-
class seglearn.transform.SegmentXY(width=100, overlap=0.5, step=None, y_func=<function last>, shuffle=False, random_state=None, order='F')
DEPRECATED - Use Segment class instead
Transformer for sliding window segmentation of datasets where X is time series data, optionally with contextual variables, and y is also time series data with the same sampling interval as X.
The target y is mapped to segments from their parent series, using the parameter y_func to determine the mapping behavior. The segment targets can be a single value or a sequence of values, depending on the y_func parameter.
The transformed data consists of segment/target pairs that can be learned through a feature representation or directly with a neural network.
- Parameters
- width : int > 0
width of segments (number of samples)
- overlap : float, range [0, 1]
amount of overlap between segments; must be in range 0 <= overlap <= 1 (note: setting overlap to 1.0 results in the segments being advanced by a single sample)
- step : int, range [1, width] (default=None)
number of samples to advance between adjacent segments (note: this takes precedence over overlap)
- y_func : function
returns target from array of target segments (e.g. last, middle, or mean)
- shuffle : bool, optional
shuffle the segments after transform (recommended for batch optimizations)
- random_state : int, default = None
Randomized segment shuffling will return different results for each call to transform. If you have set shuffle to True and want the same result with each call to fit, set random_state to an integer.
- order : str, optional (default='F')
Determines the index order of the segmented time series. 'C' means C-like index order (first index changes slowest) and 'F' means Fortran-like index order (last index changes slowest). 'C' ordering is suggested for neural network estimators, and 'F' ordering is suggested for computing feature representations.
- Returns
- self : object
Returns self.
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data into segments (temporal tensor). Note that this transformation changes the number of samples in the data. If y and sample_weight are provided, they are transformed to align with the new samples.
-
class seglearn.transform.SegmentXYForecast(width=100, overlap=0.5, step=None, forecast=10, y_func=<function last>, shuffle=False, random_state=None, order='F')
Forecast sliding window segmentation for time series or sequence datasets.
The target y is mapped to segments from their parent series, using the forecast and y_func parameters to determine the mapping behavior. The segment targets can be a single value or a sequence of values, depending on the y_func parameter.
The transformed data consists of segment/target pairs that can be learned through a feature representation or directly with a neural network.
- Parameters
- width : int > 0
width of segments (number of samples)
- overlap : float, range [0, 1]
amount of overlap between segments; must be in range 0 <= overlap <= 1 (note: setting overlap to 1.0 results in the segments being advanced by a single sample)
- step : int, range [1, width] (default=None)
number of samples to advance between adjacent segments (note: this takes precedence over overlap)
- forecast : int
The number of samples ahead in time to forecast
- y_func : function
returns target from array of target forecast segments (e.g. last or mean)
- shuffle : bool, optional
shuffle the segments after transform (recommended for batch optimizations)
- random_state : int, default = None
Randomized segment shuffling will return different results for each call to transform. If you have set shuffle to True and want the same result with each call to fit, set random_state to an integer.
- order : str, optional (default='F')
Determines the index order of the segmented time series. 'C' means C-like index order (first index changes slowest) and 'F' means Fortran-like index order (last index changes slowest). 'C' ordering is suggested for neural network estimators, and 'F' ordering is suggested for computing feature representations.
- Returns
- self : object
Returns self.
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Forecast sliding window segmentation for time series or sequence datasets.
-
transform(self, X, y=None, sample_weight=None)
Forecast sliding window segmentation for time series or sequence datasets. Note that this transformation changes the number of samples in the data. Currently, sample weights are always returned as None.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series]
target vector
- sample_weight : array-like, shape [n_series], default = None
sample weights
- Returns
- X_new : array-like, shape [n_segments, ]
segmented X data
- y_new : array-like, shape [n_segments]
forecast y data
- sample_weight_new : None
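As a rough illustration of forecast segmentation, the sketch below pairs each input window with the targets that follow it. The exact alignment used by seglearn is not stated above, so this assumes that each segment of width samples is paired with the next forecast target samples, which y_func then reduces; forecast_pairs is a hypothetical helper, not a library function.

```python
def forecast_pairs(x, y, width, forecast, step, y_func):
    """Pair each width-sample window of x with the targets that follow it."""
    span = width + forecast  # samples consumed by one (input, horizon) pair
    pairs = []
    for start in range(0, len(x) - span + 1, step):
        x_seg = x[start:start + width]
        # Assumption: the target comes from the `forecast` samples after the window.
        y_horizon = y[start + width:start + span]
        pairs.append((x_seg, y_func(y_horizon)))
    return pairs


def last(values):
    """Stand-in for a y_func: keep the final value of the forecast horizon."""
    return values[-1]


x = [10, 11, 12, 13, 14, 15, 16, 17]
y = [0, 1, 2, 3, 4, 5, 6, 7]

pairs = forecast_pairs(x, y, width=3, forecast=2, step=2, y_func=last)
print(pairs)  # [([10, 11, 12], 4), ([12, 13, 14], 6)]
```

With y_func=last, each segment is paired with the target value forecast samples after the end of its window; a function such as mean would instead summarize the whole horizon.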
-
class seglearn.transform.PadTrunc(width=100)
Transformer for using padding and truncation to enforce a fixed length on all time series in the dataset. Series longer than width are truncated to length width. Series shorter than length width are padded at the end with zeros up to length width.
The same behavior is applied to the target if it is a series and passed to the transformer.
- Parameters
- width : int >= 1
width of segments (number of samples)
Methods
- fit(self, X[, y])
Fit the transform.
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data into fixed length segments using padding and/or truncation. If y is a time series and passed, it will be transformed as well.
-
fit(self, X, y=None)
Fit the transform. Does nothing, for compatibility with sklearn API.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
-
transform(self, X, y=None, sample_weight=None)
Transforms the time series data into fixed length segments using padding and/or truncation. If y is a time series and passed, it will be transformed as well.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series], default = None
target vector
- sample_weight : array-like, shape [n_series], default = None
sample weights
- Returns
- X_new : array-like, shape [n_series, ]
transformed time series data
- y_new : array-like, shape [n_series]
expanded target vector
- sample_weight_new : None
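The padding and truncation rule stated for PadTrunc is simple enough to sketch without the library. The helper below, pad_trunc, is a hypothetical name used only for illustration; it works on plain lists rather than arrays.

```python
def pad_trunc(series, width, fill=0):
    """Truncate to `width` samples, or pad the end with `fill` up to `width`."""
    clipped = series[:width]
    return clipped + [fill] * (width - len(clipped))


X = [[1, 2, 3, 4, 5], [7, 8]]   # two series of different lengths
X_fixed = [pad_trunc(s, width=3) for s in X]

print(X_fixed)  # [[1, 2, 3], [7, 8, 0]]
```

The number of series is unchanged; only their lengths are made equal. A target that is itself a series would be passed through the same helper.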
-
class seglearn.transform.InterpLongToWide(sample_period, kind='linear', categorical_target=False, assume_sorted=True)
Converts time series in long format dataframes (where variables are sampled at different times) to wide format dataframes usable by the rest of seglearn, using direct value interpolation.
Input data for this class must have at least 3 columns of type (time, var_type, var_value). Additional columns are treated as additional channels of var_value (e.g. time, var_type, var_value1, var_value2).
Each time series must have the same var_types and the same number of columns.
Default interpolation is linear, but other types can be specified. If the target is a series, it will be resampled as well.
categorical_target should be set to True if the target series is a class. The transformer will then use nearest neighbor interpolation on the target.
The interpolation to a linear sampling space and the conversion to a wide format dataframe result in the removal of the time and var_type columns from the data.
If start time or similar is important to the estimator, use a context variable.
- Parameters
- sample_period : numeric
desired sampling period
- kind : string
interpolation type - valid types as per scipy.interpolate.interp1d
- categorical_target : bool
set to True for classification problems to use nearest instead of linear interpolation for the target
- assume_sorted : bool
assume time series data are sorted by timestamp
Examples
>>> import numpy as np
>>> from seglearn.transform import InterpLongToWide
>>>
>>> # sample stacked input with values from 2 variables each with 2 channels
>>> t = np.array([1.1, 1.2, 2.1, 3.3, 3.4, 3.5])
>>> s = np.array([0, 1, 0, 0, 1, 1])
>>> v1 = np.array([3, 4, 5, 7, 15, 25])
>>> v2 = np.array([5, 7, 6, 9, 22, 35])
>>> X = [np.column_stack([t, s, v1, v2])]
>>> y = [np.array([1, 2, 2, 2, 3, 3])]
>>>
>>> stacked_interp = InterpLongToWide(0.5)
>>> stacked_interp.fit(X, y)
>>> Xc, yc, _ = stacked_interp.transform(X, y)
Methods
- fit(self, X[, y])
Fit the transform.
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data with linear direct value interpolation. If y is a time series and passed, it will be transformed as well. The time dimension is removed from the data.
-
fit(self, X, y=None)
Fit the transform. Does nothing, for compatibility with sklearn API.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
-
transform(self, X, y=None, sample_weight=None)
Transforms the time series data with linear direct value interpolation. If y is a time series and passed, it will be transformed as well. The time dimension is removed from the data.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series], default = None
target vector
- sample_weight : array-like, shape [n_series], default = None
sample weights
- Returns
- X_new : array-like, shape [n_series, ]
transformed time series data
- y_new : array-like, shape [n_series]
expanded target vector
- sample_weight_new : array-like or None
None is returned if target is changed. Otherwise it is returned unchanged.
-
class seglearn.transform.Interp(sample_period, kind='linear', categorical_target=False, assume_sorted=True)
Transformer for resampling time series data to a fixed period over a closed interval (direct value interpolation). Default interpolation is linear, but other types can be specified. If the target is a series, it will be resampled as well.
categorical_target should be set to True if the target series is a class. The transformer will then use nearest neighbor interpolation on the target.
This transformer assumes the time dimension is column 0, i.e. X[0][:,0]. Note that the time dimension is removed, since the data becomes a linear sequence. If start time or similar is important to the estimator, use a context variable.
- Parameters
- sample_period : numeric
desired sampling period
- kind : string
interpolation type - valid types as per scipy.interpolate.interp1d
- categorical_target : bool
set to True for classification problems to use nearest instead of linear interpolation for the target
- assume_sorted : bool
assume time series data is sorted by timestamp
Methods
- fit(self, X[, y])
Fit the transform.
- fit_transform(self, X, y[, sample_weight])
Fit the data and transform (required by sklearn API)
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X[, y, sample_weight])
Transforms the time series data with linear direct value interpolation. If y is a time series and passed, it will be transformed as well. The time dimension is removed from the data.
-
fit(self, X, y=None)
Fit the transform. Does nothing, for compatibility with sklearn API.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
-
transform(self, X, y=None, sample_weight=None)
Transforms the time series data with linear direct value interpolation. If y is a time series and passed, it will be transformed as well. The time dimension is removed from the data.
- Parameters
- X : array-like, shape [n_series, …]
Time series data and (optionally) contextual data
- y : array-like, shape [n_series], default = None
target vector
- sample_weight : array-like, shape [n_series], default = None
sample weights
- Returns
- X_new : array-like, shape [n_series, ]
transformed time series data
- y_new : array-like, shape [n_series]
expanded target vector
- sample_weight_new : array-like or None
None is returned if target is changed. Otherwise it is returned unchanged.
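To make the resampling idea concrete, here is a minimal pure-Python sketch of linear interpolation onto a fixed-period grid. It is an illustration only: resample_linear is a hypothetical helper, and the grid construction (start at the first timestamp and advance by sample_period while staying inside the observed interval) is an assumption rather than the library's documented rule. The input is assumed to be sorted by time and to contain at least two samples.

```python
def resample_linear(t, v, sample_period):
    """Linearly interpolate samples (t, v) onto a grid with a fixed period."""
    # Assumed grid: t[0], t[0] + period, ... up to the last observed timestamp.
    n_points = int((t[-1] - t[0]) / sample_period) + 1
    grid = [t[0] + k * sample_period for k in range(n_points)]

    out = []
    j = 0
    for tq in grid:
        # Advance to the input interval [t[j], t[j + 1]] that contains tq.
        while j < len(t) - 2 and t[j + 1] < tq:
            j += 1
        frac = (tq - t[j]) / (t[j + 1] - t[j])
        out.append(v[j] + frac * (v[j + 1] - v[j]))
    return out  # values only: the time column is no longer needed


t = [0.0, 1.0, 3.0]        # irregular timestamps (column 0 of a series)
v = [0.0, 10.0, 30.0]      # one variable sampled at those times

resampled = resample_linear(t, v, sample_period=0.5)
print(resampled)  # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
```

Because the output is evenly spaced, the timestamps carry no further information, which is why the time dimension is dropped. For a categorical target, nearest neighbor lookup would replace the linear blend.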
-
class seglearn.transform.FeatureRep(features='default', verbose=False)
A transformer for calculating a feature representation from segmented time series data.
This transformer calculates features from the segmented time series by computing the same feature set for each segment from each time series in the data set.
The features computed are a parameter of this transformer, defined by a dict of functions. The seglearn package includes some useful features, but this basic feature set can be easily extended.
- Parameters
- features : dict, optional
Dictionary of functions for calculating features from a segmented time series. Each function in the dictionary is specified to compute features from a multivariate segmented time series along axis 1 (the segment), e.g.:
>>> def mean(X):
...     F = np.mean(X, axis=1)
...     return F
X : array-like, shape [n_samples, segment_width, n_variables]
F : array-like, shape [n_samples, n_features]
The number of features returned (n_features) must be >= 1.
If features is not specified, a default feature dictionary will be used (see base_features). See feature_functions for example implementations.
- verbose : boolean, optional (default=False)
Controls the verbosity of output messages
Examples
>>> from seglearn.transform import FeatureRep, Segment
>>> from seglearn.pipe import Pype
>>> from seglearn.feature_functions import mean, var, std, skew
>>> from seglearn.datasets import load_watch
>>> from sklearn.ensemble import RandomForestClassifier
>>> data = load_watch()
>>> X = data['X']
>>> y = data['y']
>>> fts = {'mean': mean, 'var': var, 'std': std, 'skew': skew}
>>> clf = Pype([('seg', Segment()),
...             ('ftr', FeatureRep(features=fts)),
...             ('rf', RandomForestClassifier())])
>>> clf.fit(X, y)
>>> print(clf.score(X, y))
- Attributes
- f_labels : list of string feature labels (in order) corresponding to the computed features
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X[, y])
Fit to data, then transform it.
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X)
Transform the segmented time series data into feature data.
-
fit(self, X, y=None)
Fit the transform
- Parameters
- X : array-like, shape [n_series, …]
Segmented time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
-
transform(self, X)
Transform the segmented time series data into feature data. If contextual data is included in X, it is returned with the feature data.
- Parameters
- X : array-like, shape [n_series, …]
Segmented time series data and (optionally) contextual data
- Returns
- X_new : array, shape [n_series, …]
Feature representation of segmented time series data and contextual data
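The shape contract for feature functions (input [n_samples, segment_width, n_variables], output [n_samples, n_features]) can be sketched with nested lists standing in for arrays. This is a dependency-free illustration, not library code: value_range is a hypothetical extra feature, and the column order of the combined output simply follows dict insertion order here, which is an assumption about how the features are arranged.

```python
def mean(X):
    """Per-variable mean over the segment axis (axis 1)."""
    F = []
    for segment in X:                    # segment: [segment_width][n_variables]
        columns = zip(*segment)          # regroup the samples by variable
        F.append([sum(col) / len(col) for col in columns])
    return F


def value_range(X):
    """Hypothetical extra feature: per-variable max minus min over the segment."""
    return [[max(col) - min(col) for col in zip(*segment)] for segment in X]


features = {"mean": mean, "range": value_range}

# 2 segments, each with 2 samples of 2 variables
X = [
    [[1.0, 10.0], [3.0, 30.0]],
    [[2.0, 0.0], [4.0, 8.0]],
]

# Compute every feature for every segment and join the results column-wise.
rows = [[] for _ in X]
for name in features:
    for row, values in zip(rows, features[name](X)):
        row.extend(values)

print(rows)  # [[2.0, 20.0, 2.0, 20.0], [3.0, 4.0, 2.0, 8.0]]
```

Each segment collapses to one row, so the segment axis disappears and an ordinary estimator can consume the result.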
-
class seglearn.transform.FeatureRepMix(transformers)
A transformer for calculating a feature representation from segmented time series data.
This transformer calculates features from the segmented time series by applying the supplied list of FeatureRep transformers to the specified columns of data. Non-specified columns are dropped.
The segmented time series data is expected to enter this transform in the form of num_samples x segment_size x num_features and to leave this transform in the form of num_samples x num_features. The term columns refers to the last dimension of both representations.
- Note: This code is partially taken (_validate and _transformers functions with docstring) from the scikit-learn ColumnTransformer made available under the 3-Clause BSD license.
- Parameters
- transformers : list of (name, transformer, columns) tuples to be applied to the segmented time series
- name : string
unique string which is used to prefix the f_labels of the FeatureRep below
- transformer : FeatureRep transform
to be applied to the columns specified below
- columns : integer, slice or boolean mask
to specify the columns to be transformed
Examples
>>> from seglearn.transform import FeatureRepMix, FeatureRep, Segment
>>> from seglearn.pipe import Pype
>>> from seglearn.feature_functions import mean, var, std, skew
>>> from seglearn.datasets import load_watch
>>> from sklearn.ensemble import RandomForestClassifier
>>> data = load_watch()
>>> X = data['X']
>>> y = data['y']
>>> mask = [False, False, False, True, True, True]
>>> clf = Pype([('seg', Segment()),
...             ('union', FeatureRepMix([
...                 ('ftr_a', FeatureRep(features={'mean': mean}), 0),
...                 ('ftr_b', FeatureRep(features={'var': var}), [0, 1, 2]),
...                 ('ftr_c', FeatureRep(features={'std': std}), slice(3, 7)),
...                 ('ftr_d', FeatureRep(features={'skew': skew}), mask),
...             ])),
...             ('rf', RandomForestClassifier())])
>>> clf.fit(X, y)
>>> print(clf.score(X, y))
- Attributes
- f_labels : list of string feature labels (in order) corresponding to the computed features
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X[, y])
Fit to data, then transform it.
- get_params(self[, deep])
Get parameters for this transformer.
- set_params(self, **kwargs)
Set the parameters of this transformer.
- transform(self, X)
Transform the segmented time series data into feature data.
-
fit(self, X, y=None)
Fit the transform
- Parameters
- X : array-like, shape [n_series, …]
Segmented time series data and (optionally) contextual data
- y : None
There is no need for a target in a transformer, yet the pipeline API requires this parameter.
- Returns
- self : object
Returns self.
-
get_params(self, deep=True)
Get parameters for this transformer.
- Parameters
- deep : boolean, optional
If True, will return the parameters for this transformer and contained transformers.
- Returns
- params : mapping of string to any
Parameter names mapped to their values.
-
set_params(self, **kwargs)
Set the parameters of this transformer.
Valid parameter keys can be listed with get_params().
- Returns
- self
-
transform(self, X)
Transform the segmented time series data into feature data. If contextual data is included in X, it is returned with the feature data.
- Parameters
- X : array-like, shape [n_series, …]
Segmented time series data and (optionally) contextual data
- Returns
- X_new : array, shape [n_series, …]
Feature representation of segmented time series data and contextual data
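The columns entry of each (name, transformer, columns) tuple can take several forms. The sketch below shows one plausible way to normalize them into explicit column indices; resolve_columns is a hypothetical helper written for illustration and is not part of seglearn. It also accepts a list of integers, as used in the FeatureRepMix example.

```python
def resolve_columns(columns, n_columns):
    """Turn an integer, list of integers, slice, or boolean mask into column indices."""
    if isinstance(columns, slice):
        # Slicing clamps to the available columns, so slice(3, 7) is safe for 6 columns.
        return list(range(n_columns))[columns]
    if isinstance(columns, int) and not isinstance(columns, bool):
        return [columns]
    columns = list(columns)
    if columns and all(isinstance(c, bool) for c in columns):
        return [i for i, keep in enumerate(columns) if keep]
    return columns


n_columns = 6
mask = [False, False, False, True, True, True]

print(resolve_columns(0, n_columns))            # [0]
print(resolve_columns([0, 1, 2], n_columns))    # [0, 1, 2]
print(resolve_columns(slice(3, 7), n_columns))  # [3, 4, 5]
print(resolve_columns(mask, n_columns))         # [3, 4, 5]
```

Any column index that no specification resolves to is simply never selected, which corresponds to the statement that non-specified columns are dropped.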
-
class seglearn.transform.FunctionTransformer(func=None, func_kwargs={})
Transformer for applying a custom function to time series data.
- Parameters
- func : function, optional (default=None)
the function to be applied to Xt, the time series part of X (contextual variables Xc are passed through unaltered); X remains unchanged if no function is supplied
- func_kwargs : dictionary, optional (default={})
keyword arguments to be passed to the function call
- Returns
- self : object
returns self
Examples
>>> from seglearn.transform import FunctionTransformer
>>> import numpy as np
>>>
>>> def choose_cols(Xt, cols):
...     return [time_series[:, cols] for time_series in Xt]
>>>
>>> X = [np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]),
...      np.array([[30, 40, 50], [60, 70, 80], [90, 100, 110]])]
>>> y = [np.array([True, False, False, True]),
...      np.array([False, True, False])]
>>> trans = FunctionTransformer(choose_cols, func_kwargs={"cols": [0, 1]})
>>> X = trans.fit_transform(X, y)
Methods
- fit(self, X[, y])
Fit the transform
- fit_transform(self, X[, y])
Fit to data, then transform it.
- get_params(self[, deep])
Get parameters for this estimator.
- set_params(self, **params)
Set the parameters of this estimator.
- transform(self, X)
Transforms the time series data based on the provided function.
-
fit(self, X, y=None)
Fit the transform
- Parameters
- X : array-like, shape [n_samples, …]
time series data and (optionally) contextual data
- y : None
there is no need for a target in a transformer, yet the pipeline API requires this parameter
- Returns
- self : object
returns self
-
transform(self, X)
Transforms the time series data based on the provided function. Note that this transformation must not change the number of samples in the data.
- Parameters
- X : array-like, shape [n_samples, …]
time series data and (optionally) contextual data
- Returns
- Xt : array-like, shape [n_samples, …]
transformed time series data
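The contract described for FunctionTransformer (apply func with func_kwargs to the time series, pass data through when no function is given, and keep the number of samples unchanged) can be sketched as follows. This is a plain-list illustration, not the library's code: apply_function is a hypothetical helper, and the explicit length check is an assumption about how the "must not change the number of samples" rule could be enforced.

```python
def apply_function(Xt, func=None, func_kwargs=None):
    """Apply func to the list of time series; return it unchanged if func is None."""
    if func is None:
        return Xt
    Xt_new = func(Xt, **(func_kwargs or {}))
    # Hypothetical guard: the function may reshape each series,
    # but it must return one output per input series.
    if len(Xt_new) != len(Xt):
        raise ValueError("func changed the number of series")
    return Xt_new


def choose_cols(Xt, cols):
    """List-based counterpart of the choose_cols example: keep selected columns."""
    return [[[row[c] for c in cols] for row in series] for series in Xt]


X = [
    [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]],
    [[30, 40, 50], [60, 70, 80], [90, 100, 110]],
]

X_new = apply_function(X, choose_cols, {"cols": [0, 1]})
print(X_new)
# [[[0, 1], [3, 4], [6, 7], [9, 10]], [[30, 40], [60, 70], [90, 100]]]
```

Dropping a column changes the shape of each series but not the number of series, so the targets still line up with the data after the transform.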