
A Python wrapper for minimap2-rs

Project description

Python bindings for the Rust FFI minimap2 library. In development! Feedback appreciated!

Why?

PyO3 makes it very easy to create Python libraries via Rust. Further, we can use Polars to export results as a dataframe (which can be used as-is, or converted to Pandas). Python allows for faster experimentation with novel algorithms and integration into machine learning pipelines, and it gives those unfamiliar with Rust or C/C++ a way to use minimap2.

Current State

Very early alpha. Please try it out, and open an issue for any missing features you need or any bugs you find.

How to use

Requirements

Polars and PyArrow. Both should be installed when you install minimappers2.

Creating an Aligner Instance

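# Assumes the package exposes these names at the top level
from minimappers2 import map_ont, map_hifi, Sequence

aligner = map_ont()
aligner.threads(4)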

If you want a full alignment performed, rather than just matches, enable .cigar():

aligner = map_hifi()
aligner.cigar()

Please note, at this time the following syntax is NOT supported:

aligner = map_ont().threads(4).cigar()

Creating an index

aligner.index("ref.fa")

To save a built index for future use:

aligner.index_and_save("ref.fa", "ref.mmi")

Next time, loading the saved index will be faster than building it again:

aligner.load_index("ref.mmi")
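
The build-once, load-afterwards pattern can be sketched as a single helper. `load_or_build_index` is a hypothetical name, not part of minimappers2. It relies only on the `index_and_save` and `load_index` methods shown above, and assumes that an existing file at `index_path` is a valid saved index for the same reference.

```python
import os


def load_or_build_index(aligner, reference_path, index_path):
    """Load a saved index if one exists, otherwise build and save it (hypothetical helper)."""
    if os.path.exists(index_path):
        # Reuse the saved index instead of rebuilding from the reference
        aligner.load_index(index_path)
        return "loaded"
    # First run: build the index from the reference and save it for next time
    aligner.index_and_save(reference_path, index_path)
    return "built"
```

For example, `load_or_build_index(aligner, "ref.fa", "ref.mmi")` would build and save the index on the first run and load it on later runs.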

Aligning a Single Sequence

query = Sequence(seq_name, seq)
aligner.map1(query)

# Example
seq = "CCAGAACGTACAAGGAAATATCCTCAAATTATCCCAAGAATTGTCCGCAGGAAATGGGGATAATTTCAGAAATGAGAG"
result = aligner.map1(Sequence("MySeq", seq))

Here, seq_name and seq are both strings. The output is a Polars DataFrame.

Aligning Multiple Sequences

seqs = [Sequence("name of seq 1", seq1),
        Sequence("name of seq 2", seq2)]
result = aligner.map(seqs)

Example Notebook

Please see the example notebook for more examples.

Mapping a file

Please open an issue if you need to map files from this API.
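
In the meantime, one possible workaround is to read the file in Python and pass the records to `aligner.map`. The sketch below is a minimal FASTA reader using only the standard library. `parse_fasta` is a hypothetical helper, not part of minimappers2, and it assumes plain (uncompressed) FASTA small enough to hold in memory.

```python
def parse_fasta(lines):
    """Yield (name, sequence) tuples from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


# Usage with the API described above ("reads.fa" is a placeholder path):
# with open("reads.fa") as handle:
#     seqs = [Sequence(name, seq) for name, seq in parse_fasta(handle)]
# result = aligner.map(seqs)
```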

Results

All results are returned as Polars dataframes. You can convert Polars dataframes to Pandas dataframes with .to_pandas().

  • Polars is one of the fastest dataframe libraries in the Python ecosystem.
  • Polars provides a nice data bridge between Rust and Python.

For more information, please see the Polars User Guide or the Polars Guide for Pandas users.

Example of Results

Here is an image of the resulting dataframe:

[Image: Resulting Dataframe]

NOTE: Mapq, Cigar, and others will not show up unless .cigar() is enabled on the aligner itself.

Errors

As this is a very early-stage library, error checking is not yet implemented. When things crash you will likely need to restart your Python interpreter (or Jupyter kernel). Please open an issue describing what happened and I will get to it.

Compatibility
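
Since the library does not yet check its inputs, validating them in Python first may catch some problems before they reach the aligner. The sketch below is one way to do that. `check_sequence` and `VALID_BASES` are hypothetical names, not part of minimappers2, and the accepted alphabet (A, C, G, T, U, N in either case) is an assumption you may want to widen.

```python
# Assumed alphabet; adjust if your data uses other IUPAC codes
VALID_BASES = set("ACGTUNacgtun")


def check_sequence(name, seq):
    """Return a list of problems found; an empty list means the input looks usable."""
    problems = []
    if not isinstance(name, str) or not name:
        problems.append("name must be a non-empty string")
    if not isinstance(seq, str) or not seq:
        problems.append("sequence must be a non-empty string")
    else:
        unexpected = sorted(set(seq) - VALID_BASES)
        if unexpected:
            problems.append("unexpected characters: " + "".join(unexpected))
    return problems


print(check_sequence("MySeq", "CCAGAACGTACAAGG"))
```

A caller could skip or report any record for which the returned list is not empty, and only build a `Sequence` from the rest.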

  • Windows: Unlikely

  • Linux: Likely

  • Mac: Unknown

  • x86_64: Likely

  • aarch64: Unknown

  • neon: No (Open an issue)

  • Google Colab: No, not sure why though.

Performance

Effort has been made to make this as performant as possible, but if you need more performance, please use minimap2 directly and import the results.
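
If you do run minimap2 directly, its default output is PAF, a tab-separated text format with 12 mandatory columns followed by optional tags. The sketch below parses one PAF line into a dictionary using only the standard library. `parse_paf_line` is a hypothetical helper, and the dictionary keys are labels chosen here for illustration, not names used by minimappers2.

```python
# The 12 mandatory PAF columns, in order, with the type of each
PAF_COLUMNS = [
    ("query_name", str), ("query_length", int), ("query_start", int),
    ("query_end", int), ("strand", str), ("target_name", str),
    ("target_length", int), ("target_start", int), ("target_end", int),
    ("residue_matches", int), ("block_length", int), ("mapq", int),
]


def parse_paf_line(line):
    """Parse one PAF line into a dict; optional tags are kept as raw strings."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(PAF_COLUMNS):
        raise ValueError("expected at least 12 tab-separated fields")
    record = {key: cast(value) for (key, cast), value in zip(PAF_COLUMNS, fields)}
    record["tags"] = fields[len(PAF_COLUMNS):]  # e.g. "tp:A:P", "cg:Z:..."
    return record


# Made-up example line, for illustration only
example = "read1\t100\t0\t100\t+\tchr1\t1000\t200\t300\t95\t100\t60\ttp:A:P"
record = parse_paf_line(example)
print(record["target_name"], record["mapq"])
```

A list of such records can then be loaded into the dataframe library of your choice.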

Citation

You should cite the minimap2 papers if you use this in your work.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. doi:10.1093/bioinformatics/bty191

and/or:

Li, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. doi:10.1093/bioinformatics/btab705

Changelog

0.1.0

  • Initial Functions implemented
  • Return results as Polars dfs

Funding

Genomics Aotearoa
