An eXplainable AI package for tabular data.

Maintainers

givasile

Effector

effector is an eXplainable AI package for tabular data. It:

creates global and regional effect plots
has a simple API with smart defaults, but can become flexible if needed
is model agnostic; can explain any underlying ML model
integrates easily with popular ML libraries, like Scikit-Learn, TensorFlow and PyTorch
is fast, for both global and regional methods
provides a large collection of global and regional effects methods

Installation

Effector requires Python 3.10+:

pip install effector

Dependencies: numpy, scipy, matplotlib, tqdm, shap.
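To confirm that an environment meets these requirements, a quick check along the following lines may help. This is a minimal sketch that uses only the standard library; the helper name `check_environment` is hypothetical and is not part of effector.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the installation notes above.
REQUIRED = ["effector", "numpy", "scipy", "matplotlib", "tqdm", "shap"]


def check_environment(packages=REQUIRED):
    """Return {package: installed version, or None if missing} (hypothetical helper)."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    print("Python >= 3.10:", sys.version_info >= (3, 10))
    for name, installed in check_environment().items():
        print(f"{name}: {installed or 'not installed'}")
```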

Quickstart

Train an ML model

import effector
import keras
import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)

# Load dataset
bike_sharing = effector.datasets.BikeSharing(pcg_train=0.8)
X_train, Y_train = bike_sharing.x_train, bike_sharing.y_train
X_test, Y_test = bike_sharing.x_test, bike_sharing.y_test

# Define and train a neural network
model = keras.Sequential([
    keras.layers.Dense(1024, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(1)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae", keras.metrics.RootMeanSquaredError()])
model.fit(X_train, Y_train, batch_size=512, epochs=20, verbose=1)
model.evaluate(X_test, Y_test, verbose=1)

Wrap it in a callable

def predict(x):
    return model(x).numpy().squeeze()
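The wrapper above suggests the contract that the explainers rely on: a function that takes an array of shape (N, D) and returns N predictions as a flat array. The sketch below illustrates that contract for an arbitrary model, assuming this shape convention holds. `as_predict_fn` and the toy `linear_model` are hypothetical names used for illustration only.

```python
import numpy as np


def as_predict_fn(raw_predict):
    """Wrap a model call so it always returns a flat array of shape (N,)."""
    def predict(x):
        x = np.asarray(x, dtype=float)
        y = np.asarray(raw_predict(x))
        # reshape instead of squeeze: squeeze would collapse a batch of one
        # prediction with shape (1, 1) into a 0-d array.
        return y.reshape(x.shape[0])
    return predict


# Toy stand-in for a trained model; its output has shape (N, 1).
weights = np.array([[1.0], [2.0], [-1.0]])


def linear_model(x):
    return x @ weights


predict = as_predict_fn(linear_model)
out = predict(np.ones((5, 3)))
print(out.shape)  # (5,)
```

Reshaping to `x.shape[0]` keeps the output one-dimensional even for a single-row input, which may matter if predictions are ever requested one instance at a time.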

Explain it with global effect plots

# Initialize the Partial Dependence Plot (PDP) object
pdp = effector.PDP(
    X_test,  # Use the test set as background data
    predict,  # Prediction function
    feature_names=bike_sharing.feature_names,  # (optional) Feature names
    target_name=bike_sharing.target_name  # (optional) Target variable name
)

# Plot the effect of a feature
pdp.plot(
    feature=3,  # Select the feature at index 3 (hr: hour)
    nof_ice=200,  # (optional) Number of Individual Conditional Expectation (ICE) curves to plot
    scale_x={"mean": bike_sharing.x_test_mu[3], "std": bike_sharing.x_test_std[3]},  # (optional) Scale x-axis
    scale_y={"mean": bike_sharing.y_test_mu, "std": bike_sharing.y_test_std},  # (optional) Scale y-axis
    centering=True,  # (optional) Center PDP and ICE curves
    show_avg_output=True,  # (optional) Display the average prediction
    y_limits=[-200, 1000]  # (optional) Set y-axis limits
)

Feature effect plot
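For intuition about what a PDP with ICE curves represents, the computation can be sketched in plain NumPy. This is a from-scratch illustration of the textbook definition, not effector's implementation: the function name `ice_and_pdp`, the uniform grid, and the toy model are all assumptions, and centering is done here by subtracting each curve's mean over the grid, which is one common choice.

```python
import numpy as np


def ice_and_pdp(X, predict, feature, grid_size=20, centering=True):
    """Compute ICE curves and their average (the PDP) for one feature."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    ice = np.empty((X.shape[0], grid_size))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value      # set the feature to the grid value for every instance
        ice[:, k] = predict(X_mod)     # one ICE point per instance
    if centering:
        ice = ice - ice.mean(axis=1, keepdims=True)  # center each curve around zero
    pdp = ice.mean(axis=0)             # the PDP is the average ICE curve
    return grid, ice, pdp


def toy_predict(x):
    # Hypothetical model: the effect of feature 0 depends on feature 1.
    return 3.0 * x[:, 0] + x[:, 0] * x[:, 1]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
grid, ice, pdp = ice_and_pdp(X, toy_predict, feature=0)
print(ice.shape, pdp.shape)  # (200, 20) (20,)
```

Each grid point requires one model call on the full background set, which is consistent with PDP being practical mainly for small datasets.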

Explain it with regional effect plots

# Initialize the Regional Partial Dependence Plot (RegionalPDP)
r_pdp = effector.RegionalPDP(
    X_test,  # Test set data
    predict,  # Prediction function
    feature_names=bike_sharing.feature_names,  # Feature names
    target_name=bike_sharing.target_name  # Target variable name
)

# Summarize the subregions of feature 3 (hr: hour)
r_pdp.summary(
    features=3,  # Select feature 3 (hr) for the summary
    scale_x_list=[  # scale each feature with mean and std
        {"mean": bike_sharing.x_test_mu[i], "std": bike_sharing.x_test_std[i]}
        for i in range(X_test.shape[1])
    ]
)

Feature 3 - Full partition tree:
🌳 Full Tree Structure:
───────────────────────
hr 🔹 [id: 0 | heter: 0.43 | inst: 3476 | w: 1.00]
    workingday = 0.00 🔹 [id: 1 | heter: 0.36 | inst: 1129 | w: 0.32]
        temp ≤ 6.50 🔹 [id: 3 | heter: 0.17 | inst: 568 | w: 0.16]
        temp > 6.50 🔹 [id: 4 | heter: 0.21 | inst: 561 | w: 0.16]
    workingday ≠ 0.00 🔹 [id: 2 | heter: 0.28 | inst: 2347 | w: 0.68]
        temp ≤ 6.50 🔹 [id: 5 | heter: 0.19 | inst: 953 | w: 0.27]
        temp > 6.50 🔹 [id: 6 | heter: 0.20 | inst: 1394 | w: 0.40]
--------------------------------------------------
Feature 3 - Statistics per tree level:
🌳 Tree Summary:
─────────────────
Level 0🔹heter: 0.43
    Level 1🔹heter: 0.31 | 🔻0.12 (28.15%)
        Level 2🔹heter: 0.19 | 🔻0.11 (37.10%)
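The per-level numbers in the tree summary appear consistent with an instance-weighted average of the node heterogeneities at that level. The sketch below checks this using the rounded values printed in the tree above. It is an inference from the printed output, not a statement about effector's internals, and `level_heterogeneity` is a hypothetical helper.

```python
# (node id, heterogeneity, number of instances), copied from the printed tree.
nodes_per_level = {
    1: [(1, 0.36, 1129), (2, 0.28, 2347)],
    2: [(3, 0.17, 568), (4, 0.21, 561), (5, 0.19, 953), (6, 0.20, 1394)],
}
TOTAL_INSTANCES = 3476  # instances at the root node (id 0)


def level_heterogeneity(level_nodes, total_instances):
    """Average node heterogeneity, weighted by each node's share of instances."""
    return sum(heter * inst / total_instances for _, heter, inst in level_nodes)


for level, level_nodes in nodes_per_level.items():
    print(level, round(level_heterogeneity(level_nodes, TOTAL_INSTANCES), 2))
# 1 0.31
# 2 0.19
```

The percentage drops in the summary cannot be reproduced exactly this way, because the printed node values are already rounded to two decimals.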

The summary of feature hr (hour) shows that its effect on the output depends strongly on the values of two other features:

workingday: whether or not it is a working day
temp: the temperature at that hour
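To see why splitting on another feature can reduce heterogeneity, consider a toy model in which the effect of one feature flips sign depending on a binary feature. The sketch below measures heterogeneity as the standard deviation of centered ICE curves, averaged over the grid. That measure is a stand-in chosen for illustration and may differ from the one effector uses; all function names and the toy model are hypothetical.

```python
import numpy as np


def centered_ice(X, predict, feature, grid):
    """ICE curves for one feature, each centered around its own mean."""
    ice = np.empty((X.shape[0], grid.size))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        ice[:, k] = predict(X_mod)
    return ice - ice.mean(axis=1, keepdims=True)


def heterogeneity(X, predict, feature, grid):
    """Illustrative measure: std across ICE curves, averaged over the grid."""
    return centered_ice(X, predict, feature, grid).std(axis=0).mean()


def toy_predict(x):
    # The effect of feature 0 flips sign depending on binary feature 1.
    return np.where(x[:, 1] == 1, x[:, 0], -x[:, 0])


rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(-1, 1, 400), rng.integers(0, 2, 400)])
grid = np.linspace(-1, 1, 21)

het_all = heterogeneity(X, toy_predict, 0, grid)              # whole dataset
het_on = heterogeneity(X[X[:, 1] == 1], toy_predict, 0, grid)   # subregion: feature 1 == 1
het_off = heterogeneity(X[X[:, 1] == 0], toy_predict, 0, grid)  # subregion: feature 1 == 0
print(het_all > het_on, het_all > het_off)  # True True
```

Within each subregion all ICE curves coincide, so the average curve describes every instance well; across the whole dataset it does not.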

Let's see how the effect changes in these subregions!

Is it a working day or not?

# Plot regional effects after the first-level split (workingday vs non-workingday)
for node_idx in [1, 2]:  # Iterate over the nodes of the first-level split
    r_pdp.plot(
        feature=3,  # Feature 3 (hr: hour)
        node_idx=node_idx,  # Node index (1: non-working day, 2: working day)
        nof_ice=200,  # Number of ICE curves
        scale_x_list=[  # Scale features by mean and std
            {"mean": bike_sharing.x_test_mu[i], "std": bike_sharing.x_test_std[i]}
            for i in range(X_test.shape[1])
        ],
        scale_y={"mean": bike_sharing.y_test_mu, "std": bike_sharing.y_test_std},  # Scale the target
        y_limits=[-200, 1000]  # Set y-axis limits
    )

Is it hot or cold?

# Plot regional effects after second-level splits (workingday vs non-workingday and hot vs cold temperature)
for node_idx in [3, 4, 5, 6]:  # Iterate over the nodes of the second-level splits
    r_pdp.plot(
        feature=3,  # Feature 3 (hr: hour)
        node_idx=node_idx,  # Node index (hot/cold temperature and workingday/non-workingday)
        nof_ice=200,  # Number of ICE curves
        scale_x_list=[  # Scale features by mean and std
            {"mean": bike_sharing.x_test_mu[i], "std": bike_sharing.x_test_std[i]}
            for i in range(X_test.shape[1])
        ],
        scale_y={"mean": bike_sharing.y_test_mu, "std": bike_sharing.y_test_std},  # Scale target
        y_limits=[-200, 1000]  # Set y-axis limits
    )

Supported Methods

effector implements global and regional effect methods:

Method	Global Effect	Regional Effect	Reference	ML model	Speed
PDP	`PDP`	`RegionalPDP`	PDP	any	Fast for a small dataset
d-PDP	`DerPDP`	`RegionalDerPDP`	d-PDP	differentiable	Fast for a small dataset
ALE	`ALE`	`RegionalALE`	ALE	any	Fast
RHALE	`RHALE`	`RegionalRHALE`	RHALE	differentiable	Very fast
SHAP-DP	`ShapDP`	`RegionalShapDP`	SHAP	any	Fast for a small dataset and a light ML model
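To give a feel for why ALE is listed as fast and model agnostic, here is a from-scratch sketch of a first-order ALE curve. It follows the textbook recipe (bin the feature, average the prediction differences across each bin's edges, accumulate, center) and is not effector's implementation. The name `ale_1d`, the equal-width bins, and the unweighted centering are simplifying assumptions.

```python
import numpy as np


def ale_1d(X, predict, feature, n_bins=10):
    """First-order ALE for one feature using equal-width bins."""
    x = X[:, feature]
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index in 0..n_bins-1 for every instance.
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    local_effects = np.zeros(n_bins)
    for b in range(n_bins):
        members = X[idx == b]
        if members.shape[0] == 0:
            continue  # empty bin: leave its local effect at zero
        lo, hi = members.copy(), members.copy()
        lo[:, feature] = edges[b]       # move bin members to the lower edge
        hi[:, feature] = edges[b + 1]   # move bin members to the upper edge
        local_effects[b] = np.mean(predict(hi) - predict(lo))
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])  # accumulate over bins
    return edges, ale - ale.mean()      # simple (unweighted) centering


def toy_predict(x):
    # Hypothetical linear model: the effect of feature 0 has slope 2.
    return 2.0 * x[:, 0] + 5.0 * x[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
edges, ale = ale_1d(X, toy_predict, feature=0)
print(ale.shape)  # (11,)
```

Every instance is evaluated only twice, at the two edges of its own bin, so the number of model evaluations grows with N rather than with N times the grid size as in PDP.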

Method Selection Guide

From a runtime perspective, there are three criteria:

Is the dataset small (N < 10K instances) or large (N > 10K instances)?
Is the ML model light (runtime < 0.1s) or heavy (runtime > 0.1s)?
Is the ML model differentiable or non-differentiable?

Trust us and follow this guide:

light + small + differentiable = any([PDP, RHALE, ShapDP, ALE, DerPDP])
light + small + non-differentiable = any([PDP, ALE, ShapDP])
heavy + small + differentiable = any([PDP, RHALE, ALE, DerPDP])
heavy + small + non-differentiable = any([PDP, ALE])
large + non-differentiable = ALE
large + differentiable = RHALE
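The guide above can be encoded as a small lookup function. This is only a sketch: `suggest_methods` is a hypothetical helper, not part of effector, and it treats the boundary cases (exactly 10K instances or exactly 0.1s), which the guide leaves open, as large and heavy.

```python
def suggest_methods(n_instances, runtime_seconds, differentiable):
    """Return the method names the guide suggests for a given setting."""
    small = n_instances < 10_000       # assumption: exactly 10K counts as large
    light = runtime_seconds < 0.1      # assumption: exactly 0.1s counts as heavy
    if not small:
        # Large datasets: the guide does not distinguish light from heavy models.
        return ["RHALE"] if differentiable else ["ALE"]
    if light:
        if differentiable:
            return ["PDP", "RHALE", "ShapDP", "ALE", "DerPDP"]
        return ["PDP", "ALE", "ShapDP"]
    if differentiable:
        return ["PDP", "RHALE", "ALE", "DerPDP"]
    return ["PDP", "ALE"]


print(suggest_methods(3_476, 0.05, differentiable=True))
# ['PDP', 'RHALE', 'ShapDP', 'ALE', 'DerPDP']
print(suggest_methods(50_000, 0.5, differentiable=False))
# ['ALE']
```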

Citation

If you use effector, please cite it:

@misc{gkolemis2024effector,
  title={effector: A Python package for regional explanations},
  author={Vasilis Gkolemis et al.},
  year={2024},
  eprint={2404.02629},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

License

effector is released under the MIT License.

effector 0.1.5
