Scientific framework for representation in sequential data

SeqRep

Description

This package aims to simplify the evaluation workflow for machine learning models, with a primary focus on sequential data. It helps with:

  • labeling data,
  • splitting data,
  • feature extraction,
  • feature reduction (i.e. selection or transformation),
  • running the pipeline,
  • evaluation of results.

It also allows you to visualize each step.

The framework is designed for easy customization and extension of its functionality.
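
To make the steps listed above concrete, here is a minimal sketch on a toy sequence. It uses plain pandas rather than seqrep's own classes, so nothing in it reflects the package's API. The `value` column, the "next value is higher" labeling rule, the single lag feature, and the 75/25 split are all assumptions chosen for illustration.

```python
import pandas as pd

# Toy sequence, invented for illustration only.
data = pd.DataFrame({"value": [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 7.0, 5.0]})

# Labeling (assumed rule): 1 if the next value is higher than the current one, else 0.
data["label"] = (data["value"].shift(-1) > data["value"]).astype(int)

# Feature extraction: the previous value as a lag feature.
data["prev_1"] = data["value"].shift(1)

# Drop the first row (no previous value) and the last row (no next value).
data = data.iloc[1:-1]

# Splitting: keep the chronological order, so the rows are not shuffled.
cut = int(len(data) * 0.75)
train, test = data.iloc[:cut], data.iloc[cut:]
```

Splitting by position instead of at random keeps every test row later in time than every training row, which is usually what you want when evaluating models on sequential data.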

Installation

python -m pip install git+https://github.com/MIR-MU/seqrep

Features

See the README in the seqrep folder.

Usage

Using the package takes three steps after the imports:

  1. Create the pipeline you want to evaluate;
  2. Create a PipelineEvaluator that defines how it is evaluated;
  3. Run the evaluation.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

from seqrep.feature_engineering import PreviousValuesExtractor, TimeFeaturesExtractor
from seqrep.labeling import NextColorLabeler
from seqrep.splitting import TrainTestSplitter
from seqrep.scaling import UniversalScaler
from seqrep.evaluation import ClassificationEvaluator
from seqrep.pipeline_evaluation import PipelineEvaluator

# Step 1: create the pipeline to evaluate
pipe = Pipeline([
    ('fext_prev', PreviousValuesExtractor()),
    ('fext_time', TimeFeaturesExtractor()),
    ('scale_u', UniversalScaler(scaler=MinMaxScaler())),
])

# Step 2: create the PipelineEvaluator
pipe_eval = PipelineEvaluator(
    labeler=NextColorLabeler(),
    splitter=TrainTestSplitter(),
    pipeline=pipe,
    model=SVC(),
    evaluator=ClassificationEvaluator(),
)

# Step 3: run the evaluation
# `data` is your input dataset; it must be loaded before this point.
result = pipe_eval.run(data=data)

See the examples folder for more details.
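
Because the pipeline above is a regular scikit-learn `Pipeline`, one plausible way to add your own step is to write a class that follows scikit-learn's transformer interface (`fit` and `transform`). The sketch below rests on that assumption and is not seqrep's documented extension mechanism. `RollingMeanExtractor`, its `column` and `window` parameters, and the `close` column are hypothetical names.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class RollingMeanExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical custom step: adds a rolling mean of one column."""

    def __init__(self, column="close", window=3):
        self.column = column
        self.window = window

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step works inside a Pipeline.
        return self

    def transform(self, X):
        X = X.copy()  # do not modify the caller's DataFrame
        name = f"{self.column}_mean_{self.window}"
        X[name] = X[self.column].rolling(self.window).mean()
        return X


# Toy data, invented for illustration only.
toy = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0, 5.0]})
pipe = Pipeline([("fext_mean", RollingMeanExtractor(column="close", window=3))])
features = pipe.fit_transform(toy)
```

The first `window - 1` rows of the new column are NaN because the rolling window is not yet full there; a later step or the caller has to decide whether to drop or fill them.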

License

This package is licensed under the MIT license, so it is open source. Feel free to use it!

Acknowledgement

Many thanks to my supervisor, Michal Stefanik, for his huge support! I am also grateful to all members of the MIR-MU group. Finally, thanks go to the Faculty of Informatics of Masaryk University for supporting this project as a dean's project.
