Python+Rust implementation of the Probabilistic Principal Component Analysis model
Probabilistic Principal Component Analysis (PPCA) model
This project provides a PPCA model written in Rust and exposed to Python using pyO3 and maturin.
Installing
This package is available on PyPI!
pip install ppca-rs
And you can also use it natively in Rust:
cargo add ppca
Why use PPCA?
Glad you asked!
- PPCA is a simple extension of PCA (principal component analysis), but can be more robust to train overall.
- PPCA is a proper statistical model. It doesn't spit out only the mean. You get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
- The PPCA model can handle missing values. If data is missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
- Training converges quickly and always tends to a global maximum. There are no hyperparameters to dabble with and no local maxima.
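To make the points above concrete, here is a minimal NumPy sketch of the textbook PPCA model, not of this package's internals. It assumes complete training data and uses the closed-form maximum-likelihood solution of Tipping and Bishop; `fit_ppca` and `impute` are hypothetical helper names invented for illustration. The fitted model is a Gaussian with covariance W Wᵀ + σ² I, so a missing entry can be filled with its conditional mean, and the conditional covariance gives the uncertainty.

```python
import numpy as np

def fit_ppca(x, q):
    """Closed-form maximum-likelihood PPCA fit on complete data (illustrative helper)."""
    mu = x.mean(axis=0)
    cov = np.cov(x - mu, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending
    sigma2 = eigvals[q:].mean()                         # noise variance: mean of discarded eigenvalues
    w = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)  # loadings, up to a rotation
    return mu, w, sigma2

def impute(sample, mu, w, sigma2):
    """Fill NaNs in one sample with their conditional mean; also return the conditional covariance."""
    c = w @ w.T + sigma2 * np.eye(len(mu))              # covariance implied by the model
    miss = np.isnan(sample)
    obs = ~miss
    gain = c[np.ix_(miss, obs)] @ np.linalg.inv(c[np.ix_(obs, obs)])
    filled = sample.copy()
    filled[miss] = mu[miss] + gain @ (sample[obs] - mu[obs])
    cond_cov = c[np.ix_(miss, miss)] - gain @ c[np.ix_(obs, miss)]
    return filled, cond_cov

# Synthetic data: 5 observed dimensions driven by 2 latent ones, plus noise of variance 0.01.
rng = np.random.default_rng(0)
true_w = rng.normal(size=(5, 2))
data = rng.normal(size=(2000, 2)) @ true_w.T + 0.1 * rng.normal(size=(2000, 5))
mu, w, sigma2 = fit_ppca(data, q=2)

# Hide one entry of a sample and recover it, together with a standard deviation.
sample = data[0].copy()
sample[3] = np.nan
filled, cond_cov = impute(sample, mu, w, sigma2)
std = np.sqrt(np.diag(cond_cov))
```

Here `std` holds the standard deviation of each imputed entry, which is the kind of confidence information plain PCA does not provide. Training on data that already has missing values requires an iterative scheme rather than this closed form, and that case is not covered by the sketch.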
Why use ppca-rs?
That's an easy one!
- It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
- It uses rayon to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.
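The source does not say which linear algebra tricks are used, so the following is only an assumption about the kind of shortcut that applies to PPCA. A standard one is the Woodbury identity: the d × d covariance W Wᵀ + σ² I can be inverted by inverting only a q × q matrix, where q is the (small) state size. The sketch checks the identity numerically with NumPy; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, sigma2 = 500, 5, 0.3          # output size, state size, noise variance
w = rng.normal(size=(d, q))

# Naive route: invert the full d x d covariance, roughly O(d^3).
c = w @ w.T + sigma2 * np.eye(d)
naive_inv = np.linalg.inv(c)

# Woodbury route: only a q x q system is solved, roughly O(d q^2 + q^3)
# if the d x d result is never materialized and is applied to vectors instead.
m = w.T @ w + sigma2 * np.eye(q)
woodbury_inv = (np.eye(d) - w @ np.linalg.solve(m, w.T)) / sigma2

same = np.allclose(naive_inv, woodbury_inv)
```

When q is much smaller than d, the saving grows quickly, which is why tricks of this family matter for wide datasets.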
Quick example
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCAModel
samples: np.ndarray  # your data: one row per sample
# Create your dataset from a rank 2 np.ndarray, where each row is a sample.
# Use non-finite values (`inf`s and `nan`) to signal masked values
dataset = Dataset(samples)
# Train the model (convenient edition!):
model: PPCAModel = PPCATrainer(dataset).train(state_size=10, n_iters=10)
# And now, here is a free sample of what you can do:
# Extrapolates the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)
# Smooths (removes noise from) samples and fills in missing values:
smoothed: Dataset = model.filter_extrapolate(dataset)
# ... go back to numpy:
extrapolated_np = extrapolated.numpy()
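The example above leaves samples undefined. As a hedged illustration of what such an array could look like, the sketch below builds a synthetic rank 2 np.ndarray with one sample per row and marks about 10% of the entries as missing with nan. The sizes, the noise level, and the masking rate are arbitrary assumptions made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, rank = 1000, 20, 3

# Low-rank signal plus a little noise: the kind of data PPCA is meant for.
latent = rng.normal(size=(n_samples, rank))
loadings = rng.normal(size=(rank, n_features))
samples = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Mask roughly 10% of the entries with NaN to signal missing values.
mask = rng.random(samples.shape) < 0.1
samples[mask] = np.nan
```

An array like this can then be passed to Dataset(samples) as in the quick example.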
Juicy extras!
- Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
- Support for adaptation of DataFrames using either pandas or polars. Never juggle those dfs in your code again.
Building from source
Prerequisites
You will need Rust, which can be installed locally (i.e., without sudo), and you will also need maturin, which can be installed with
pip install maturin
pipenv is also a good idea if you are going to mess around with it locally. At the very least, you need a venv set up; otherwise, maturin will complain.
Installing it locally
Check the Makefile for the available commands (or just type make). To install it locally, do
make install # optional: i=python.version (e.g., `i=3.9`)
Messing around and testing
To mess around, inside a virtual environment (a Pipfile is provided for the pipenv lovers), do
maturin develop # use the flag --release to unlock superspeed!
This will install the package locally as is from source.
How do I use this stuff?
See the examples in the examples folder. Also, all functions are type hinted and commented. If you are using pylance or mypy, it should be easy to navigate.
Is it faster than the pure Python implementation you made?
You bet!