Training of multi-label embeddings for k-shingled input sequences with TensorFlow2/Keras
keras-multilabel-embedding
The package contains a TensorFlow2/Keras class to train an embedding matrix for multi-label inputs, i.e. instead of one ID per token (one-hot encoding), N IDs per token can be provided as model input.
A PyTorch implementation can be found here: https://github.com/ulf1/torch-multilabel-embedding (pip install torch-multilabel-embedding)
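To make the idea concrete, here is a minimal NumPy sketch of a multi-label lookup. It is not the package's implementation: the helper name `multilabel_embed`, the small embedding size of 8, and in particular the choice to average the selected rows are assumptions made for illustration; the package may aggregate differently. The sketch also shows that, for distinct IDs per token, averaging is the same as multiplying a multi-hot matrix with the embedding matrix and dividing by the label count.

```python
import numpy as np


def multilabel_embed(ids_per_token, emb):
    """Average the embedding rows selected by each token's label IDs.

    Hypothetical helper; mean aggregation is an assumption of this sketch.
    """
    return np.stack([emb[np.asarray(ids)].mean(axis=0) for ids in ids_per_token])


rng = np.random.default_rng(42)
emb = rng.normal(size=(5, 8))  # 5 labels (vocab_size), 8 dimensions

# same multi-label data points as in the usage example
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
y = multilabel_embed(x_ids, emb)

# equivalent formulation: multi-hot rows instead of one-hot rows
multi_hot = np.zeros((len(x_ids), emb.shape[0]))
for i, ids in enumerate(x_ids):
    multi_hot[i, ids] = 1.0
y_multi_hot = multi_hot @ emb / multi_hot.sum(axis=1, keepdims=True)
```

Each token ends up with a single vector regardless of how many labels it carries, which is what lets downstream layers treat the result like an ordinary embedded sequence.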
Usage
Multi-label embeddings with fixed number of labels
import keras_multilabel_embedding as tml
import tensorflow as tf
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1, 4], [3, 2, 1]]
x_ids = tf.constant(x_ids)
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
Multi-label embeddings with variable number of labels
import keras_multilabel_embedding as tml
import tensorflow as tf
# a sequence of multi-label data points
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1], [3]]
# initialize layer
layer = tml.MultiLabelEmbedding(
    vocab_size=5, embed_size=300, random_state=42)
# predict
y = layer(x_ids)
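One common way to handle a variable number of labels per token is to pad the ID lists to a rectangular array and mask out the padding when aggregating. The NumPy sketch below illustrates that approach; it is an assumption about how such inputs can be processed, not a description of what the layer does internally. The helpers `pad_ids` and `masked_mean_embed`, the padding value -1, and the mean aggregation are all hypothetical.

```python
import numpy as np


def pad_ids(ids_per_token, pad_value=-1):
    """Pad ragged ID lists to a rectangular array and return a validity mask."""
    max_len = max(len(ids) for ids in ids_per_token)
    padded = np.full((len(ids_per_token), max_len), pad_value, dtype=np.int64)
    for i, ids in enumerate(ids_per_token):
        padded[i, :len(ids)] = ids
    return padded, padded != pad_value


def masked_mean_embed(padded, mask, emb):
    """Average only the embedding rows that belong to real (unpadded) IDs."""
    safe_ids = np.where(mask, padded, 0)        # placeholder row, zeroed below
    vectors = emb[safe_ids] * mask[..., None]   # (tokens, max_len, dim)
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    return vectors.sum(axis=1) / counts


rng = np.random.default_rng(42)
emb = rng.normal(size=(5, 8))  # 5 labels, 8 dimensions

# same ragged data points as in the usage example
x_ids = [[1, 2, 4], [0, 1, 2], [2, 1], [3]]
padded, mask = pad_ids(x_ids)
y = masked_mean_embed(padded, mask, emb)
```

A token with a single label, such as `[3]`, reduces to a plain embedding lookup of that row.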
Appendix
Installation
The keras-multilabel-embedding git repo is available as a PyPI package, and can also be installed directly from GitHub:
pip install keras-multilabel-embedding
pip install git+ssh://git@github.com/ulf1/keras-multilabel-embedding.git
Install a virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
pip install -r requirements-demo.txt --no-cache-dir
(If your git repo is stored in a folder whose path contains whitespace, don't use the subfolder .venv. Use an absolute path without whitespace instead.)
Python commands
Jupyter for the examples: jupyter lab
Check syntax: flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
Run Unit Tests: PYTHONPATH=. pytest
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
Clean up
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
Support
Please open an issue for support.
Contributing
Please contribute using GitHub Flow. Create a branch, add commits, and open a pull request.
Source Distribution
Hashes for keras-multilabel-embedding-0.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f701806463d76e0949568979388c60f59d2cce58517c93fb7e8c1d75b84018ed
MD5 | 6e0b5dee545611e52cb5f700b8033555
BLAKE2b-256 | 68b11a192b5fed9be71b1d84c06a69ba64939029cd30511111e3a7816f067be9