svim · PyPI

A structural variant caller for long reads.

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

SVIM (pronounced SWIM) is a structural variant caller for long sequencing reads. It is able to detect and classify the following six classes of structural variation: deletions, insertions, inversions, tandem duplications, interspersed duplications and translocations. SVIM also estimates the genotypes of deletions, insertions, inversions and interspersed duplications. Unlike other methods, SVIM integrates information from across the genome to precisely distinguish similar events, such as tandem and interspersed duplications and simple insertions. In our experiments on simulated data and real datasets from PacBio and Nanopore sequencing machines, SVIM reached consistently better results than competing methods.

Note! To analyze haploid or diploid genome assemblies or contigs, please use our other method SVIM-asm.

Background on Structural Variants and Long Reads

Figure: SV classes (https://raw.githubusercontent.com/eldariont/svim/master/docs/SVclasses.png)

Structural variants (SVs) are typically defined as genomic variants larger than 50 bp (e.g. deletions, duplications, inversions). Studies have shown that they affect more bases in an average genome than SNPs or small indels. Consequently, they have a large impact on genes and regulatory regions. This is reflected in the large number of genetic disorders and other diseases that are associated with SVs.
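The size threshold in the definition above can be written as a one-line check. This is only a sketch: the function name and the use of a signed length (negative for deletions) are assumptions made for illustration, and some tools use "at least 50 bp" rather than "larger than 50 bp".

```python
SV_MIN_LENGTH = 50  # bp; threshold taken from the definition above


def is_structural_variant(sv_length, min_length=SV_MIN_LENGTH):
    """Return True if the absolute variant length is larger than min_length.

    sv_length may be negative (a common convention for deletions), so the
    absolute value is compared.
    """
    return abs(sv_length) > min_length
```

With this convention, a 300 bp deletion passed as -300 counts as an SV, while a 12 bp insertion is treated as a small indel.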

Common sequencing technologies from providers such as Illumina generate short reads with high accuracy. However, they exhibit weaknesses in repeat and low-complexity regions, where SVs are particularly common. Single-molecule long-read sequencing technologies from Pacific Biosciences and Oxford Nanopore produce reads with error rates of up to 15% but with lengths of several kbp. These long reads can span entire repeats and SVs, which facilitates SV detection.

Installation

#Install via conda into a new environment (recommended): installs all dependencies including read alignment dependencies
conda create -n svim_env --channel bioconda svim

#Install via conda into existing (active) environment: installs all dependencies including read alignment dependencies
conda install --channel bioconda svim

#Install via pip (requires Python 3.6.* or newer): installs all dependencies except those necessary for read alignment (ngmlr, minimap2, samtools)
pip install svim

#Install from github (requires Python 3.6.* or newer): installs all dependencies except those necessary for read alignment (ngmlr, minimap2, samtools)
git clone https://github.com/eldariont/svim.git
cd svim
pip install .
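Because the pip and GitHub installs do not include the read alignment tools, it can help to check whether they are available before running SVIM on raw reads. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the tool names are the ones listed in the comments above.

```python
import shutil

# Tools named in the installation notes as not installed by pip.
ALIGNMENT_TOOLS = ("ngmlr", "minimap2", "samtools")


def find_alignment_tools(tools=ALIGNMENT_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in find_alignment_tools().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

A missing tool only matters when SVIM has to align reads itself; an existing BAM file can be analyzed without the aligners.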

Dependencies

edlib for edit distance computation
matplotlib>=3.3.0 for plotting
numpy and scipy for hierarchical clustering
pysam (>=0.15.2) for SAM/BAM file processing
pyspoa (>=0.0.6) for consensus sequence computation
py-cpuinfo (>=7.0.0) for CPU info retrieval (checking for SIMD capabilities)
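The minimum versions listed above can be checked against an existing environment. The sketch below assumes Python 3.8 or newer (for importlib.metadata) and uses a simplified numeric version comparison rather than full PEP 440 parsing; the helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions copied from the dependency list above; packages listed
# without a version constraint map to None.
MINIMUM_VERSIONS = {
    "edlib": None,
    "matplotlib": "3.3.0",
    "numpy": None,
    "scipy": None,
    "pysam": "0.15.2",
    "pyspoa": "0.0.6",
    "py-cpuinfo": "7.0.0",
}


def version_tuple(version):
    """Turn '3.3.0' into (3, 3, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # pad so that '3.3' compares equal to '3.3.0'
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum and version_tuple(installed) < version_tuple(minimum):
            report[name] = f"too old ({installed} < {minimum})"
        else:
            report[name] = f"ok ({installed})"
    return report
```

Calling check_dependencies() and printing the result gives a quick overview of which packages are missing or older than the stated minimum.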

Changelog

v2.0.0: adds consensus sequence computation for insertions, improves clustering step (considers sequence similarity when clustering insertions and prevents signatures from the same read from being clustered together), outputs sequence alleles for all SV types except BNDs and DUPs by default, updates default parameters, bugfixes
v1.4.2: fixes invalid start coordinates in VCF output, issues warning for invalid characters in contig names
v1.4.1: improves clustering of translocation breakpoints (BNDs), improves --all_bnds mode, bugfixes
v1.4.0: fixes and improves clustering of insertions, adds option --all_bnds to output all SV classes in breakend notation, updates default value of --partition_max_distance to avoid very large partitions, bugfixes
v1.3.1: small changes to partitioning and clustering algorithm, adds two new command-line options to output duplications as INS records in VCF, removes limit on number of supplementary alignments, removes q5 filter, bugfixes
v1.3.0: improves BND detection, adds INFO:ZMWS tag with number of supporting PacBio wells, adds sequence alleles for INS, adds FORMAT:CN tag for tandem duplications, bugfixes
v1.2.0: adds 3 more VCF output options: output sequence instead of symbolic alleles in VCF, output names of supporting reads, output insertion sequences of supporting reads
v1.1.0: outputs BNDs in VCF, detects large tandem duplications, allows skipping genotyping, makes VCF output more flexible, adds genotype scatter plot
v1.0.0: adds genotyping of deletions, inversions, insertions and interspersed duplications, produces plots of SV length distribution, improves help descriptions
v0.5.0: replaces graph-based clustering with hierarchical clustering, modifies scoring function, improves partitioning prior to clustering, improves calling from coordinate-sorted SAM/BAM files, improves VCF output
v0.4.4: includes exception message into log files, bug fixes, adds tests and sets up Travis
v0.4.3: adds support for coordinate-sorted SAM/BAM files, improves VCF output and increases compatibility with IGV and truvari, bug fixes

Input

SVIM analyzes long reads given as a FASTA/FASTQ file (uncompressed or gzipped) or a file list. Alternatively, it can analyze an alignment file in BAM format. SVIM has been successfully tested on PacBio CLR, PacBio CCS (HiFi) and Oxford Nanopore data. It has been tested on alignment files produced by the read aligners minimap2, pbmm2 and NGMLR.
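The two kinds of input map onto two ways of invoking the tool. The sketch below builds the command line without running it. It assumes the subcommands `reads` and `alignment` and the positional argument order (working directory, input file, reference genome), none of which are stated on this page, so please confirm them against the wiki or `svim --help`. The function name and file names are hypothetical.

```python
from pathlib import Path


def build_svim_command(working_dir, input_path, genome):
    """Build an argument list for SVIM based on the input file type.

    Assumption: BAM input uses the 'alignment' subcommand and everything
    else (FASTA/FASTQ, gzipped or not, or a file list) uses 'reads'.
    """
    is_bam = Path(input_path).suffix.lower() == ".bam"
    mode = "alignment" if is_bam else "reads"
    return ["svim", mode, str(working_dir), str(input_path), str(genome)]
```

The returned list can be passed to subprocess.run once the assumed arguments have been checked.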

Output

SVIM’s main output file, variants.vcf, is placed in the given working directory.
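As a first look at the output, the calls in variants.vcf can be tallied by class. The following sketch uses only the standard library and assumes that each record carries an SVTYPE entry in its INFO column, as is conventional for structural variants in VCF; the function name is hypothetical.

```python
from collections import Counter


def count_sv_types(vcf_lines):
    """Count VCF records per SVTYPE value found in the INFO column."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        info = fields[7]  # INFO is the eighth VCF column
        sv_type = "UNKNOWN"
        for entry in info.split(";"):
            if entry.startswith("SVTYPE="):
                sv_type = entry.split("=", 1)[1]
                break
        counts[sv_type] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for variants.vcf in the working directory can be passed directly.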

Usage

Please see our wiki.

Contact

If you experience problems or have suggestions, please create an issue or a pull request, or contact heller_d@molgen.mpg.de.

Citation

Feel free to read and cite our paper in Bioinformatics: https://doi.org/10.1093/bioinformatics/btz041

License

The project is licensed under the GNU General Public License v3 (GPLv3).

Release history

2.0.0 (this version): Jun 18, 2021
2.0.0b1 (pre-release): Apr 6, 2021
1.4.2: Oct 8, 2020
1.4.1: Jul 27, 2020
1.4.0: May 27, 2020
1.3.1: Mar 30, 2020
1.3.0: Jan 20, 2020
1.2.0: Jul 25, 2019
1.1.1: Jun 25, 2019
1.1.0: Jun 21, 2019
1.0.0: Apr 29, 2019
0.5.0: Mar 6, 2019
0.4.4: Feb 11, 2019
0.4.3: Jan 8, 2019
0.4.2: Dec 11, 2018
0.4.1: Dec 1, 2018
0.4.0: Dec 1, 2018
0.3.1: Nov 21, 2018
0.3: Jul 27, 2018

Download files

Source Distribution
svim-2.0.0.tar.gz (59.1 kB), uploaded Jun 18, 2021

Built Distribution
svim-2.0.0-py3-none-any.whl (81.4 kB), uploaded Jun 18, 2021, Python 3

Hashes for svim-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`1be9cfb84e420858b9e08fc3664b8d16d76bd2f241e6a87d876d9292d66ea1a3`
MD5	`6e94c5ae4e812dcf207cbccd6a49740e`
BLAKE2b-256	`3249e9ad488571e0eb88757f9b8920730582b4080063cc346c3e1ce86205827a`

Hashes for svim-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6a6e5d963055651bb6d89e574e06c405f5a2b2cc88c31a1ea8992a412d5edbe8`
MD5	`798bdaa007f4accd3569a8d9ec418695`
BLAKE2b-256	`30e9f6a00a8b69bd623b9c8d50f0d178a9b70caf20894515bfa02cc6e814b311`
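
A downloaded file can be compared against the SHA256 digests listed above. The sketch below uses hashlib from the standard library; the function names are hypothetical, and the file path passed in is whatever location the download was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash can be called with the path of the downloaded svim-2.0.0.tar.gz and the SHA256 value from the first hash list; a result of False suggests a corrupted or altered download.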