Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. The short name is `SeaQuBe` or `seaqube`. Simply call it '| ˈsi: kjuːb |'

SeaQuBe

Semantic Quality Benchmark for Word Embeddings, i.e. Natural Language Models in Python. Acronym SeaQuBe or seaqube.

This Python framework provides several text augmentation implementations and word embedding quality evaluation methods. It is designed to fit into your machine learning pipeline. The BaseAugmentation class provides the same API as the Python package nlpaug, so the two packages can be used together smoothly. BaseAugmentation also provides additional methods. See the detailed examples below.

SeaQuBe also provides a toolkit for wrapping a trained NLP model in an interactive tool.

Features

Text Data Augmentation
Chaining and Reducing of Text Data Augmentations
Word Embedding Quality Methods
Interactive NLM Model Wrapper

Demo

Augmentation

Level	Augmenter	Description
Character	QwertyAugmentation	Simulate keyboard distance error
Corpus	UnigramAugmentation	Replace ubiquitous words with other ubiquitous words
Word	Active2PassiveAugmentation	Change surface of document using a simple active-to-passive transformer
Word	EDAAugmentation	Augment document using the EDA algorithm
Word	EmbeddingAugmentation	Replace words with similar words using WordNet
Word	TranslationAugmentation	Change surface of document using translation and back-translation (with GoogleTranslate)
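
To make the character-level idea concrete, here is a minimal sketch of a keyboard-distance typo augmentation in plain Python. It is not the QwertyAugmentation implementation: the function `qwerty_typos`, its `rate` and `seed` parameters, and the partial neighbour map are all assumptions made for illustration.

```python
import random

# Hypothetical, partial neighbour map for a QWERTY layout (illustration only).
QWERTY_NEIGHBOURS = {
    "a": "qwsz",
    "e": "wsdr",
    "i": "ujko",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def qwerty_typos(tokens, rate=0.1, seed=None):
    """Return a copy of `tokens` with some characters swapped for a keyboard neighbour."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    augmented = []
    for token in tokens:
        chars = []
        for ch in token:
            neighbours = QWERTY_NEIGHBOURS.get(ch.lower())
            # Only characters with known neighbours can be replaced.
            if neighbours and rng.random() < rate:
                chars.append(rng.choice(neighbours))
            else:
                chars.append(ch)
        augmented.append("".join(chars))
    return augmented


doc = ["This", "is", "a", "tokenized", "corpus"]
noisy = qwerty_typos(doc, rate=0.3, seed=42)
print(noisy)
```

The token count and token lengths are preserved, so the noisy document can replace the original anywhere a tokenized document is expected.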

Augmentation Chainer

Streaming of augmentations is implemented in the AugmentationStreamer class. One reduction class exists; more can be implemented by extending the BaseReduction class.

Action	Class	Description
Streaming	AugmentationStreamer	Run augmentation for each document through all chained augmentations.
Reducing	UniqueCorpusReduction	Given a list of documents, return only the unique documents.
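
The streaming and reducing steps can be sketched in plain Python as follows. This is an illustration of the concept only and does not use the AugmentationStreamer or UniqueCorpusReduction classes: `stream_augmentations`, `unique_reduction`, and the two toy augmenters are hypothetical names, and the assumption that each augmenter returns a list of document variants is ours.

```python
def stream_augmentations(corpus, augmenters):
    """Pass every document through each augmenter in turn, stage by stage."""
    docs = [list(doc) for doc in corpus]
    for augment in augmenters:
        next_docs = []
        for doc in docs:
            # Each augmenter returns a list of document variants.
            next_docs.extend(augment(doc))
        docs = next_docs
    return docs


def unique_reduction(corpus):
    """Keep the first occurrence of every document, preserving order."""
    seen = set()
    unique = []
    for doc in corpus:
        key = tuple(doc)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Two toy augmenters (hypothetical): each keeps the original and adds one variant.
def lowercase(doc):
    return [doc, [token.lower() for token in doc]]


def drop_articles(doc):
    return [doc, [t for t in doc if t.lower() not in {"a", "an", "the"}]]


corpus = [["This", "is", "a", "test"], ["this", "is", "a", "test"]]
augmented = stream_augmentations(corpus, [lowercase, drop_articles])
reduced = unique_reduction(augmented)
print(len(augmented), len(reduced))
```

Chaining multiplies the number of documents at every stage, which is why a reduction step at the end is useful: many of the generated variants coincide.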

Word Embedding Evaluation

Method	Description
WordAnalogyBenchmark	This method benchmarks how well relations of the type `a is to b as c is to d` can be solved correctly.
WordSimilarityBenchmark	This method compares the similarity of a word pair, as calculated by a model, with a human-estimated similarity score.
WordOutliersBenchmark	This method benchmarks how well an outlier in a group of words can be detected.
SemanticWordnetBenchmark	Based on the WordNet graph, the quality of the semantic similarity of an NLP model is benchmarked.
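
As an illustration of what a word analogy benchmark measures, the sketch below solves `a is to b as c is to d` questions with the common vector-offset approach (`b - a + c`, ranked by cosine similarity) and reports accuracy. It does not use WordAnalogyBenchmark; the toy vectors, the function names, and the choice of the vector-offset method are assumptions for illustration.

```python
import math

# Hand-made toy embeddings (illustration only, not from a trained model).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors, a, b, c):
    """Return the word closest to b - a + c, excluding the question words."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, d) questions where the predicted word equals d."""
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)


questions = [
    ("man", "king", "woman", "queen"),
    ("king", "queen", "man", "woman"),
]
accuracy = analogy_accuracy(TOY_VECTORS, questions)
print(accuracy)
```

With a real model, `TOY_VECTORS` would be replaced by the model's word-to-vector lookup and `questions` by a standard analogy test set.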

Installation

SeaQuBe can be installed from PyPI with `pip install seaqube`, or from the main directory of a source checkout with `python setup.py install`.

External Dependencies

Some external dependencies are not installed automatically. When one is missing, seaqube or nltk may raise an error with instructions on what to do. For example, seaqube might ask you to run:

python -c "from seaqube import download;download('vec4ir')"

Quick Demo

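from seaqube.augmentation.word import Active2PassiveAugmentation, EDAAugmentation, TranslationAugmentation, EmbeddingAugmentation
# Back-translation augmentation (uses Google Translate, so network access is needed)
translate = TranslationAugmentation(max_length=2)
# Augment a single tokenized document
translate.doc_augment(['This', 'is', 'a', 'tokenized', 'corpus'])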

Setup Dev Environment

TODO

Release history

Version	Release date
0.1.11	Jan 28, 2021
0.1.10	Jan 27, 2021
0.1.1	Jan 14, 2021
0.1.0	Jan 6, 2021
