Tephra

A tool for discovering transposable elements and describing patterns of genome evolution

What is Tephra?

Tephra is a command-line application to annotate transposable elements from a genome assembly. The goal is to provide a high-quality set of de novo annotations for all transposon types, describe the structure and evolution of those sequences, and do so without a reference set of transposon sequences (and therefore be as unbiased as possible).

RECOMMENDED USAGE

With Docker, you can create a container to run Tephra with the following command:

docker run -it --name tephra-con -v $(pwd)/db:/db:Z sestaton/tephra

That will create a container called tephra-con and start an interactive shell. The above assumes you have a directory called db in the working directory that contains your database files and the Tephra configuration. To run the full analysis, change to the mounted directory with cd /db in your container and run the following command:

tephra all -c tephra_config.yml

I recommend starting the command with nohup so that it keeps running after you log out, which allows you to leave the container running in the background.
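As a minimal sketch of the nohup approach, the full analysis could be started inside the container as shown here. The log file name tephra_all.log is an assumption chosen for illustration; the guard only checks that the tephra command and the configuration file are present before starting the run.

```shell
# Run from the mounted directory (cd /db) inside the container.
# TEPHRA_LOG is a hypothetical log file name, not something Tephra requires.
TEPHRA_CONFIG=tephra_config.yml
TEPHRA_LOG=tephra_all.log

if command -v tephra >/dev/null 2>&1 && [ -f "$TEPHRA_CONFIG" ]; then
    # nohup keeps the job alive after logout; stdout and stderr go to the log.
    nohup tephra all -c "$TEPHRA_CONFIG" > "$TEPHRA_LOG" 2>&1 &
    echo "Started tephra in the background with PID $!"
else
    echo "tephra or $TEPHRA_CONFIG not found; nothing was started"
fi
```

Progress can then be followed with tail -f on the log file after logging back in.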

If you cannot use Docker, please see the INSTALL file included with this distribution to install Tephra on various operating systems.

BASIC USAGE

Tephra is currently a command-line program only. The tephra command dispatches to the subcommands, each of which performs a specific task. Typing tephra with no arguments shows the available commands, for example:

$ tephra 

Tephra version 0.14.0

Copyright (C) 2015-2023 S. Evan Staton
LICENSE -- MIT

Citation: Staton, SE. 2023. https://github.com/sestaton/tephra

Name:
     Tephra - A tool for discovering transposable elements and describing
     patterns of genome evolution

Description:
     This is an application to find transposable elements based on structural and sequence similarity features,
     group those elements into recognized (superfamilies) and novel (families) taxonomic groups,
     and infer patterns of evolution.

-------------------------------------------------------------------------------------------
USAGE: tephra <command> [options]

Available commands:
     
            age: Calculate the age distribution of LTR or TIR transposons.
            all: Run all subcommands and generate annotations for all transposon types.
   classifyltrs: Classify LTR retrotransposons into superfamilies and families.
   classifytirs: Classify TIR transposons into superfamilies.
  findfragments: Search a masked genome with a repeat database to find fragmented elements.
  findhelitrons: Find Helitrons in a genome assembly.
       findltrs: Find LTR retrotransposons in a genome assembly.
    findnonltrs: Find non-LTR retrotransposons in a genome assembly.  
       findtirs: Find TIR transposons in a genome assembly.
      findtrims: Find TRIM retrotransposons in a genome assembly.
      illrecomb: Characterize the distribution of illegitimate recombination in a genome.
        maskref: Mask a reference genome with transposons.
     reannotate: Transfer annotations from a reference set of repeats to Tephra annotations.
        sololtr: Find solo-LTRs in a genome assembly.
           info: Show version information for all external programs configured and used by Tephra.
       
Most common usage:

    tephra all -c tephra_config.yml

 That will produce a FASTA and GFF3 of all intact and fragmented transposons in the genome,
 and generate a table of annotation results.

To get the configuration file, run:

    wget https://raw.githubusercontent.com/sestaton/tephra/master/config/tephra_config.yml

To see information about a subcommand, run:

    tephra <command> --help

To get more detailed information, run:

    tephra <command> --man

Typing a subcommand will show the usage of that command, for example:

$ tephra findnonltrs

[ERROR]: Required arguments not given.

Name:
     tephra findnonltrs - Find non-LTR retrotransposons in a genome assembly.

Description:
     Find non-LTR retrotransposons in a reference genome, classify them into known superfamilies, 
     and generate a GFF file showing their locations and properties.

USAGE: tephra findnonltrs [-h] [-m]
    -m --man      :   Get the manual entry for a command.
    -h --help     :   Print the command usage.

Required:
    -g|genome     :   The genome sequences in FASTA format to search for non-LTR-RTs. 
    -o|gff        :   The GFF3 outfile to place the non-LTRs found in <genome>.

Options:
    -r|reference  :   The non-masked reference genome for base correction.
    -d|outdir     :   The location to place the results.
    -p|pdir       :   Location of the HMM models (Default: configured automatically).
    -t|threads    :   The number of threads to use for BLAST searches (Default: 1).
    -v|verbose    :   Display progress for each chromosome (Default: no).
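Based on the usage message above, an invocation of this subcommand might look like the following sketch. The flags -g, -o, and -t are taken from the usage text; the file names genome.fasta and genome_nonLTRs.gff3 and the thread count are placeholders, not defaults.

```shell
# Hypothetical input and output names; substitute your own files.
GENOME=genome.fasta
GFF_OUT=genome_nonLTRs.gff3
THREADS=4

if command -v tephra >/dev/null 2>&1 && [ -f "$GENOME" ]; then
    # -g and -o are the required arguments; -t is optional (default: 1).
    tephra findnonltrs -g "$GENOME" -o "$GFF_OUT" -t "$THREADS"
else
    echo "tephra or $GENOME not found; skipping the example run"
fi
```

The other options listed above (-r, -d, -p, -v) can be appended in the same way when needed.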

SUPPORT AND DOCUMENTATION

You can get usage information at the command line with the following command:

perldoc tephra

The tephra program will also print a diagnostic help message when executed with no arguments, and display the available subcommands.

You can also look for information at:

Tephra wiki
        https://github.com/sestaton/tephra/wiki

Tephra issue tracker
        https://github.com/sestaton/tephra/issues

CITATION

A manuscript is in preparation that describes all of the methods and their uses, compares Tephra to other programs, and presents results from model systems. These will be provided in some form ahead of publication, as soon as they are available.

For now, please cite the GitHub URL of this repository if you use Tephra. Thank you.

CURRENT STATUS

Please check the wiki for progress updates.

CONTRIBUTING

I welcome any comments, bug reports, feature requests, or contributions to the development of the project. Please submit a new issue (preferred) or send me an email; I would be happy to talk about Tephra or transposons.

LICENSE AND COPYRIGHT

Part of this project uses code from MGEScan-nonLTR, which is released under the GPL license. With permission of the authors, this code is packaged with Tephra. Below is the copyright for MGEScan-nonLTR:

Copyright (C) 2015. See the LICENSE file for license rights and limitations (GPL v3).

This program is part of MGEScan.

MGEScan is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

The license for Tephra is below:

Copyright (C) 2015-2023 S. Evan Staton

This program is distributed under the MIT (X11) License, which should be distributed with the package. If not, it can be found here: http://www.opensource.org/licenses/mit-license.php