SMPrecursorPrediction

SMPrecursorPredictor

An ML pipeline for predicting the starting substances (precursors) of specialised metabolites.

Installation

Manually

  1. Clone the repository and move into the directory:
git clone
cd SMPrecursorPredictor
  2. Create a conda environment and activate it:
conda create -n sm_precursor_predictor python=3.10
conda activate sm_precursor_predictor
  3. Install the dependencies:
pip install -r requirements.txt
  4. Install the package:
pip install .

PyPI

  1. Create a conda environment and activate it:
conda create -n sm_precursor_predictor python=3.10
conda activate sm_precursor_predictor
  2. Install the package from PyPI:
pip install SMPrecursorPrediction
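
After either installation route, it can be useful to confirm that the package landed in the active environment. The following is a minimal sketch using only the Python standard library. It assumes the distribution is registered under the name used in the pip command above; the helper `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name passed to pip install.
found = installed_version("SMPrecursorPrediction")
if found is None:
    print("SMPrecursorPrediction is not installed in this environment")
else:
    print(f"SMPrecursorPrediction {found} is installed")
```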

Making predictions

Models available:

  • Layered FP + Low Variance FS + Ridge Classifier
  • Morgan FP + Ridge Classifier
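
The model is selected by passing its name as a string, so a typo is easy to make. The sketch below checks a name against the two models listed above before it is used. It assumes the names must match the list exactly; how the package itself reacts to an unrecognised name is not stated here, and both `AVAILABLE_MODELS` and `check_model_name` are hypothetical helpers, not part of the package.

```python
# Assumption: these are the only accepted model names, copied from the list above.
AVAILABLE_MODELS = (
    "Layered FP + Low Variance FS + Ridge Classifier",
    "Morgan FP + Ridge Classifier",
)


def check_model_name(name):
    """Return the name unchanged if it is a listed model, else raise ValueError."""
    if name not in AVAILABLE_MODELS:
        options = ", ".join(repr(m) for m in AVAILABLE_MODELS)
        raise ValueError(f"Unknown model {name!r}; expected one of: {options}")
    return name


model = check_model_name("Morgan FP + Ridge Classifier")
print(model)
```

The validated name can then be passed as the `model` argument in the calls shown in this section.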
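
Pass a list of SMILES strings and the name of the model to use:

from sm_precursor_predictor import predict_precursors

# SMILES strings of the metabolites whose precursors should be predicted.
# The first string is split over two lines; Python joins adjacent literals.
smiles = [
    "[H][C@]89CN(CCc1c([nH]c2ccccc12)[C@@](C(=O)OC)(c3cc4c(cc3OC)N(C)[C@@]5([H])[C@@]"
    "(O)(C(=O)OC)[C@H](OC(C)=O)[C@]7(CC)C=CCN6CC[C@]45[C@@]67[H])C8)C[C@](O)(CC)C9",
    "COC1=C(C=CC(=C1)C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O",
]
precursors = predict_precursors(
    smiles,
    model="Layered FP + Low Variance FS + Ridge Classifier",
)
print(precursors)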

Alternatively, read a CSV file with a column of SMILES and a column of IDs, and save the predictions to a CSV file:

from sm_precursor_predictor import predict_from_csv
predictions = predict_from_csv("path_to_csv", 
                               smiles_field="SMILES", 
                               ids_field="ID",
                               model="Layered FP + Low Variance FS + Ridge Classifier")
predictions.to_csv("path_to_save_predictions.csv")
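
The CSV route expects an input file with one column of SMILES and one column of IDs. The following is a minimal sketch of preparing such a file with Python's standard `csv` module. It assumes the column names "SMILES" and "ID" used in the call above; the file name `compounds.csv`, the IDs, and the helper `write_input_csv` are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_input_csv(path, records, smiles_field="SMILES", ids_field="ID"):
    """Write (id, smiles) pairs to a CSV file with a single header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[ids_field, smiles_field])
        writer.writeheader()
        for compound_id, smiles in records:
            writer.writerow({ids_field: compound_id, smiles_field: smiles})
    return path


# Hypothetical IDs; the SMILES strings are taken from the examples in this README.
records = [
    ("linalool", "CC(=CCCC(C)(C=C)O)C"),
    ("compound_2", "COC1=C(C=CC(=C1)C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O"),
]
csv_path = Path(tempfile.gettempdir()) / "compounds.csv"
write_input_csv(csv_path, records)
print(csv_path)
```

The resulting path can then be passed as the first argument to `predict_from_csv`, with `smiles_field="SMILES"` and `ids_field="ID"`.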

Making and explaining predictions

This is only possible with one model: Morgan FP + Ridge Classifier.

Example with linalool:

from sm_precursor_predictor import get_prediction_and_explanation

prediction, images, plots = get_prediction_and_explanation(smiles="CC(=CCCC(C)(C=C)O)C", threshold=0.20)

[Figure: feature_importance]

prediction
['Geranyl diphosphate']
images[0]

[Image: Linalool]
