
napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification.


napkinXC


napkinXC is an extremely simple and fast library for extreme multi-class and multi-label classification that focuses on implementing various methods for Probabilistic Label Trees. It allows training a classifier on very large datasets in just a few lines of code and with minimal resources.

Right now, napkinXC implements the following features both in Python and C++:

  • Probabilistic Label Trees (PLTs) and Hierarchical softmax (HSM),
  • different types of inference methods (top-k, above a given threshold, etc.),
  • fast prediction with label weights, e.g., propensity scores,
  • an efficient online F-measure optimization (OFO) procedure,
  • different tree building methods, including hierarchical k-means clustering,
  • training of tree node
  • support for custom tree structures and node weights,
  • helpers to download and load data from XML Repository,
  • helpers to measure performance (precision@k, recall@k, nDCG@k, propensity-scored precision@k, and more).
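
To make the top-k and threshold inference modes listed above more concrete, here is a minimal sketch of how a Probabilistic Label Tree can turn per-node estimates into label probabilities. It illustrates the general PLT idea only and is not napkinXC's implementation; the toy tree, the `COND_PROB` values, and the function names are all hypothetical.

```python
import heapq

# Hypothetical toy tree: internal node -> children; leaves correspond to labels.
CHILDREN = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
LEAF_LABEL = {3: "a", 4: "b", 5: "c", 6: "d"}

# Hypothetical node classifier outputs for one example x:
# COND_PROB[n] stands in for P(node n is relevant | its parent is relevant, x).
COND_PROB = {0: 1.0, 1: 0.9, 2: 0.4, 3: 0.8, 4: 0.1, 5: 0.5, 6: 0.5}


def predict_top_k(k):
    """Return the k most probable (label, probability) pairs.

    A label's probability is the product of conditional probabilities on the
    path from the root to its leaf. Because every factor is at most 1, the
    product never grows along a path, so a best-first search pops leaves in
    order of decreasing probability.
    """
    heap = [(-COND_PROB[0], 0)]  # max-heap emulated with negated probabilities
    result = []
    while heap and len(result) < k:
        neg_p, node = heapq.heappop(heap)
        if node in LEAF_LABEL:
            result.append((LEAF_LABEL[node], -neg_p))
            continue
        for child in CHILDREN[node]:
            heapq.heappush(heap, (neg_p * COND_PROB[child], child))
    return result


def predict_above(threshold):
    """Return all (label, probability) pairs with probability >= threshold.

    Subtrees whose path probability already falls below the threshold are
    pruned, since no leaf below them can reach it.
    """
    stack = [(COND_PROB[0], 0)]
    result = []
    while stack:
        p, node = stack.pop()
        if p < threshold:
            continue
        if node in LEAF_LABEL:
            result.append((LEAF_LABEL[node], p))
        else:
            for child in CHILDREN[node]:
                stack.append((p * COND_PROB[child], child))
    return sorted(result, key=lambda pair: -pair[1])


print(predict_top_k(2))
print(predict_above(0.15))
```

In both modes only a small part of the tree is visited when most node probabilities are low, which is what makes this kind of inference cheap for very large label sets.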

Please note that this library is still under development and also serves as a base for experiments. The API may not be compatible between releases, and some of the experimental features may not be documented. Do not hesitate to open an issue in case of a question or problem!

napkinXC is distributed under the MIT license. All contributions to the project are welcome!

Python Quick Start and Documentation

Install via pip:

pip install napkinxc

We provide precompiled wheels for many Linux distros, macOS, and Windows for Python 3.7+. If there is no wheel for your OS, the package will be compiled from source during installation. Compilation from source requires a modern C++17 compiler, CMake, Git, and Python 3.7+.

The latest (master) version can be installed directly from the GitHub repository (not recommended):

pip install git+https://github.com/mwydmuch/napkinXC.git

A minimal example of usage:

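from napkinxc.datasets import load_dataset
from napkinxc.models import PLT
from napkinxc.measures import precision_at_k

# Load the train and test parts of the EURLex-4K dataset
X_train, Y_train = load_dataset("eurlex-4k", "train")
X_test, Y_test = load_dataset("eurlex-4k", "test")

# Train a PLT model; model files are stored under "eurlex-model"
plt = PLT("eurlex-model")
plt.fit(X_train, Y_train)

# Predict the top label for each test example and report precision@1
Y_pred = plt.predict(X_test, top_k=1)
print(precision_at_k(Y_test, Y_pred, k=1))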
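
For readers unfamiliar with the metric printed above, the following is a minimal pure-Python sketch of precision@k as it is commonly defined: the fraction of the top k predicted labels that are relevant, averaged over examples. The function name `precision_at_k_sketch` and the toy data are hypothetical; the library's own `precision_at_k` may differ in accepted input formats and in what it returns.

```python
def precision_at_k_sketch(y_true, y_pred, k):
    """Average, over examples, of (relevant labels among the top k) / k.

    y_true: list of lists with the true labels of each example.
    y_pred: list of lists with predicted labels, ranked best first.
    """
    total = 0.0
    for true_labels, ranked in zip(y_true, y_pred):
        true_set = set(true_labels)
        hits = sum(1 for label in ranked[:k] if label in true_set)
        total += hits / k
    return total / len(y_true)


# Toy data: two examples with their true labels and ranked predictions.
y_true = [[1, 2], [3]]
y_pred = [[1, 2], [4, 3]]

print(precision_at_k_sketch(y_true, y_pred, k=1))  # 0.5
print(precision_at_k_sketch(y_true, y_pred, k=2))  # 0.75
```

In the first example both predictions are relevant; in the second only the label ranked second is, so precision@1 is 0.5 and precision@2 is 0.75.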

More examples can be found in the python/examples directory, and napkinXC's documentation is available at https://napkinxc.readthedocs.io.

Executable

napkinXC can also be used as an executable to train and evaluate models using data in LIBSVM format. See the documentation for more details.
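
As a rough illustration of the data format mentioned above, this sketch parses a single line of the multi-label LIBSVM variant commonly used for such datasets: comma-separated labels first, followed by `index:value` feature pairs. The helper `parse_libsvm_line` is hypothetical and not part of napkinXC; real files may also contain a header line or other details that this sketch does not handle.

```python
def parse_libsvm_line(line):
    """Parse 'l1,l2 idx:val idx:val ...' into (labels, features).

    Assumes labels and feature indices are integers. A line with no labels
    starts directly with an 'index:value' pair.
    """
    parts = line.split()
    labels = []
    # The first token holds the labels unless it already looks like a feature.
    if parts and ":" not in parts[0]:
        labels = [int(label) for label in parts[0].split(",")]
        parts = parts[1:]
    features = {}
    for item in parts:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


print(parse_libsvm_line("3,17 1:0.5 42:1.25"))
# ([3, 17], {1: 0.5, 42: 1.25})
```

Only the non-zero features are listed on each line, which keeps files compact for the sparse, high-dimensional data typical of extreme classification.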

References and acknowledgments

This library implements methods from the following papers (see the experiments directory for scripts to replicate the results):

Another implementation of the PLT model is available in the extremeText library, which implements the approach described in this NeurIPS paper.
