Logomaker: Beautiful sequence logos in Python

Mapping Intimacies

10.1101/635029

2019

Cited By ~ 10

Author(s):

Ammar Tareen

Justin B. Kinney

Keyword(s):

Source Code

Biological Properties

Programming Environment

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Link Type

Sequence Logos

Python Programming

Publication Quality

Abstract

Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA, and protein sequences, yet it is currently difficult to generate such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from any matrix-like array of numbers. Logos are rendered as vector graphics that are easy to stylize using standard matplotlib functions. Methods for creating logos from multiple-sequence alignments are also included.

Availability and Implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Source code is available at http://github.com/jbkinney/logomaker.

Supplemental Information: Documentation is provided at http://logomaker.readthedocs.io.
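
The abstract describes building logos from a matrix-like array of numbers or from a multiple-sequence alignment. As an illustration only, the plain-Python sketch below reduces a DNA alignment to a position-by-character counts matrix and then normalizes it to probabilities. The helper names `alignment_to_counts` and `counts_to_probabilities` are hypothetical and are not Logomaker's API; skipping gap characters and the optional pseudocount are assumptions made for the example.

```python
from collections import Counter


def alignment_to_counts(seqs, alphabet="ACGT"):
    """Return one {character: count} dict per alignment column (hypothetical helper)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for pos in range(length):
        column = Counter(s[pos].upper() for s in seqs)
        # Characters outside the alphabet (e.g. '-' gaps) are simply not counted.
        matrix.append({c: column.get(c, 0) for c in alphabet})
    return matrix


def counts_to_probabilities(counts, pseudocount=0.0):
    """Normalize each position so its character probabilities sum to 1."""
    probs = []
    for row in counts:
        total = sum(row.values()) + pseudocount * len(row)
        if total == 0:
            # An all-gap column carries no counts; keep it as all zeros.
            probs.append({c: 0.0 for c in row})
        else:
            probs.append({c: (n + pseudocount) / total for c, n in row.items()})
    return probs


seqs = ["ACGT", "ACGA", "ACCT", "TCG-"]
counts = alignment_to_counts(seqs)
print(counts[0])  # {'A': 3, 'C': 0, 'G': 0, 'T': 1}
```

A matrix of this shape (one row per position, one column per character) is the kind of input that Logomaker's documentation describes for its `Logo` class, where it is typically supplied as a pandas DataFrame.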

Logomaker: beautiful sequence logos in Python

Bioinformatics

10.1093/bioinformatics/btz921

2019

Vol 36 (7)

pp. 2272-2274

Cited By ~ 23

Author(s):

Ammar Tareen

Justin B Kinney

Keyword(s):

Sequence Alignment

Multiple Sequence Alignment

Source Code

Protein Sequences

Biological Properties

Programming Environment

Multiple Sequence

Sequence Logos

Python Programming

Publication Quality

Abstract

Summary: Sequence logos are visually compelling ways of illustrating the biological properties of DNA, RNA and protein sequences, yet it is currently difficult to generate and customize such logos within the Python programming environment. Here we introduce Logomaker, a Python API for creating publication-quality sequence logos. Logomaker can produce both standard and highly customized logos from either a matrix-like array of numbers or a multiple-sequence alignment. Logos are rendered as native matplotlib objects that are easy to stylize and incorporate into multi-panel figures.

Availability and implementation: Logomaker can be installed using the pip package manager and is compatible with both Python 2.7 and Python 3.6. Documentation is provided at http://logomaker.readthedocs.io; source code is available at http://github.com/jbkinney/logomaker.
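
Standard sequence logos usually scale each position by its information content. The sketch below assumes the common DNA definition with a uniform background and no small-sample correction: IC = log2(4) - H, where H is the Shannon entropy of the position in bits, and each letter is drawn with height p * IC. The function name `information_heights` is hypothetical; Logomaker's own matrix-type conversions are described in its documentation and may differ in details such as background handling.

```python
import math


def information_heights(probs, alphabet_size=4):
    """Scale per-position probabilities to letter heights in bits (hypothetical helper).

    IC = log2(alphabet_size) - H, where H is the Shannon entropy of the
    position; each letter's height is p * IC. No small-sample correction
    or non-uniform background is applied.
    """
    max_bits = math.log2(alphabet_size)
    heights = []
    for row in probs:
        entropy = -sum(p * math.log2(p) for p in row.values() if p > 0)
        info = max_bits - entropy
        heights.append({c: p * info for c, p in row.items()})
    return heights


probs = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved position
    {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5},      # two-way split
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uniform position
]
for row in information_heights(probs):
    print(row)
```

Under these assumptions a fully conserved DNA position reaches the 2-bit maximum, a two-way split carries 1 bit shared between its two letters, and a uniform position contributes nothing to the logo.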

PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools

10.1101/2021.03.10.434297

2021

Author(s):

Christopher Pockrandt

Martin Steinegger

Steven L. Salzberg

Keyword(s):

Source Code

File Format

Sequence Alignments

Multiple Sequence

Protein Coding

Multiple Sequence Alignments

Coding Regions

Link Type

A Genome

User Friendly

Abstract

Summary: PhyloCSF++ is an efficient, parallelized C++ implementation of the popular PhyloCSF method for distinguishing protein-coding and non-coding regions in a genome based on multiple sequence alignments. It can score alignments or produce browser tracks for entire genomes in the wig file format. Additionally, PhyloCSF++ annotates coding sequences in GFF/GTF files using precomputed tracks, or computes and scores multiple sequence alignments on the fly with MMseqs.

Availability: PhyloCSF++ is released under the AGPLv3 license. Binaries and source code are available at https://github.com/cpockrandt/PhyloCSFpp. The software can be installed through bioconda. A variety of tracks can be accessed through ftp://ftp.ccb.jhu.edu/pub/software/phylocsf++/.
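
The wig format named above is a plain-text track format read by genome browsers such as the UCSC Genome Browser. The sketch below, which is not PhyloCSF++ code, writes per-base scores as one `fixedStep` wig section and averages a track over a 1-based, inclusive interval, the coordinate convention that GFF/GTF files use. The helper names and the made-up scores are assumptions; only `fixedStep` sections are handled, whereas real tracks may also contain `variableStep` sections or `track` lines.

```python
def to_fixedstep_wig(chrom, start, scores, step=1):
    """Render per-base scores as one fixedStep wig section (start is 1-based)."""
    header = f"fixedStep chrom={chrom} start={start} step={step}"
    return "\n".join([header] + [f"{s:.3f}" for s in scores]) + "\n"


def parse_fixedstep_wig(text):
    """Parse fixedStep sections into {(chrom, position): score}."""
    track, chrom, pos, step = {}, None, None, 1
    for line in text.splitlines():
        if line.startswith("fixedStep"):
            fields = dict(kv.split("=", 1) for kv in line.split()[1:])
            chrom, pos = fields["chrom"], int(fields["start"])
            step = int(fields.get("step", 1))
        elif line.strip() and pos is not None:
            track[(chrom, pos)] = float(line)
            pos += step
    return track


def mean_over_interval(track, chrom, start, end):
    """Average the scores inside a 1-based, inclusive interval (e.g. a GFF/GTF CDS line)."""
    values = [track[(chrom, p)] for p in range(start, end + 1) if (chrom, p) in track]
    return sum(values) / len(values) if values else None


# Made-up scores for illustration; real scores would come from PhyloCSF++ itself.
wig = to_fixedstep_wig("chr1", 100, [0.5, 1.0, -0.5])
track = parse_fixedstep_wig(wig)
print(mean_over_interval(track, "chr1", 100, 101))  # 0.75
```

Averaging is only one possible way to summarize a track over a coding sequence; the abstract does not say how PhyloCSF++ combines per-base scores when it annotates GFF/GTF files.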

MSAC: Compression of multiple sequence alignment files

10.1101/240341

2017

Cited By ~ 1

Author(s):

Sebastian Deorowicz

Joanna Walczyszyn

Agnieszka Debudaj-Grabysz

Keyword(s):

Sequence Alignment

Compression Ratio

Multiple Sequence Alignment

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Link Type

Bioinformatics Databases

Supplementary Material

Burrows Wheeler Transform

Abstract

Motivation: Bioinformatics databases are growing rapidly and have reached sizes that were hard to imagine a decade ago. Among the many bioinformatics processes that generate hundreds of gigabytes of data is the multiple sequence alignment of protein families. The largest database of such alignments, Pfam, occupies 40–230 GB, depending on the variant. Storing and transferring such massive data has become a challenge.

Results: We propose a novel compression algorithm, MSAC (Multiple Sequence Alignment Compressor), designed especially for aligned data. It is based on a generalisation of the positional Burrows–Wheeler transform to non-binary alphabets. MSAC handles both FASTA and Stockholm files. It achieves a compression ratio up to six times better than other commonly used compressors, such as gzip. The experiments also include an analysis of how the size of a protein family influences the compression ratio.

Availability: MSAC is available for free at https://github.com/refresh-bio/msac and http://sun.aei.polsl.pl/REFRESH/.

Supplementary material: Supplementary data are available at the publisher Web site.
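
The positional Burrows–Wheeler transform (PBWT) processes an alignment column by column, reordering the sequences at each step by their reversed prefixes so that similar rows become adjacent and each permuted column tends to contain long runs of one symbol. The sketch below is a minimal generalization to an arbitrary alphabet that uses a stable sort on the symbol just seen, plus run-length encoding of the permuted columns. It only illustrates the idea; it is not MSAC's algorithm or file format, which the abstract does not detail, and the function names are hypothetical.

```python
from itertools import groupby


def pbwt_forward(rows):
    """Return the permuted columns of an alignment (list of equal-length strings)."""
    order = list(range(len(rows)))  # column 0 is emitted in input order
    columns = []
    for k in range(len(rows[0])):
        columns.append("".join(rows[i][k] for i in order))
        # Stable sort on the symbol at column k: rows are now ordered by their
        # reversed prefixes rows[i][k], rows[i][k-1], ..., rows[i][0].
        order = sorted(order, key=lambda i: rows[i][k])
    return columns


def pbwt_inverse(columns):
    """Rebuild the original rows from the permuted columns."""
    n = len(columns[0])
    order = list(range(n))
    rows = [[] for _ in range(n)]
    for col in columns:
        for j, symbol in enumerate(col):
            rows[order[j]].append(symbol)
        # Repeat the forward pass's stable sort using the recovered symbols.
        order = [order[j] for j in sorted(range(n), key=lambda j: col[j])]
    return ["".join(r) for r in rows]


def run_length_encode(column):
    """Collapse a column into (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(column)]


rows = ["ACGT-A", "ACGA-A", "ACCT-A", "TCGT-G", "ACGT-A"]
columns = pbwt_forward(rows)
assert pbwt_inverse(columns) == rows  # the transform is lossless
print(run_length_encode(columns[-1]))  # [('A', 4), ('G', 1)]
```

In this toy alignment the last original column reads AAAGA (three runs), while its permuted version reads AAAAG (two runs); on large protein families that clustering effect is what a downstream entropy coder can exploit.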

PhyKIT: a UNIX shell toolkit for processing and analyzing phylogenomic data

10.1101/2020.10.27.358143

2020

Cited By ~ 1

Author(s):

Jacob L. Steenwyk

Thomas J. Buida

Abigail L. Labella

Yuanning Li

Xing-Xing Shen

...

Keyword(s):

Information Content

Phylogenetic Trees

Sequence Alignments

Multiple Sequence

Sequence Composition

Multiple Sequence Alignments

Functional Relationships

Link Type

Biology Process

Rate Evaluation

Abstract

Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes, and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock, and collapsing bipartitions (internal branches) with low support. To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees to diagnose potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes; and (3) identifying lack-of-resolution events, or polytomies, in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining, and deriving biological meaning from increasingly large phylogenomic datasets. PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), and documentation, including user tutorials, is available online (https://jlsteenwyk.com/PhyKIT).
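
As an example of the kind of MSA information-content summary described above, the sketch below counts variable and parsimony-informative sites. It uses the common definition that a site is parsimony-informative when at least two different characters each occur in at least two sequences. The function name and the default set of ignored characters (`-` and `?`) are assumptions; PhyKIT's own functions and their exact gap handling are described in its documentation.

```python
from collections import Counter


def site_summary(seqs, missing="-?"):
    """Return (alignment length, variable sites, parsimony-informative sites).

    Characters listed in `missing` (an assumed default) are ignored at each site.
    """
    length = len(seqs[0])
    variable = informative = 0
    for pos in range(length):
        states = Counter(s[pos].upper() for s in seqs if s[pos] not in missing)
        if len(states) >= 2:
            variable += 1
            # Informative: at least two states, each shared by at least two sequences.
            if sum(1 for n in states.values() if n >= 2) >= 2:
                informative += 1
    return length, variable, informative


print(site_summary(["AAGTA", "AAGCA", "ATCTA", "ATCCG"]))  # (5, 4, 3)
```

In the example, the first column is invariant and the last is variable but carries only a singleton (one G), so it is not parsimony-informative.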

ProfileGrids solve the large alignment visualization problem: influenza hemagglutinin example

F1000Research

10.12688/f1000research.2-2.v1

2013

Vol 2

pp. 2

Cited By ~ 4

Author(s):

Alberto I Roca

Aaron C Abajian

David J Vigerust

Keyword(s):

Protein Sequences

Sequence Alignments

Multiple Sequence

Multiple Sequence Alignments

Link Type

Influenza Hemagglutinin

Residue Frequency

Hemagglutinin Protein

Version 2.0

Large multiple sequence alignments are a challenge for current visualization programs. ProfileGrids are a solution that reduces alignments to a matrix, color-shaded according to the residue frequency at each column position. ProfileGrids are not limited by the number of sequences and so solve this visualization problem. We demonstrate the new metadata searching and grep filtering features of the JProfileGrid version 2.0 software on an alignment of 11,900 hemagglutinin protein sequences. JProfileGrid is free and available from http://www.ProfileGrid.org.
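
A ProfileGrid replaces the sequence-by-column view with a residue-by-column frequency matrix, so its size depends on the alphabet and the alignment length rather than on the number of sequences. The sketch below computes that matrix and shades it with a character ramp as a stand-in for color. The helper names and the text rendering are assumptions for illustration, not JProfileGrid's implementation or color scheme.

```python
from collections import Counter


def profile_grid(seqs):
    """Return (residues, grid) with grid[r][col] = fraction of sequences with residue r at col."""
    n, length = len(seqs), len(seqs[0])
    residues = sorted({c for s in seqs for c in s})  # gaps, if present, get their own row
    grid = {r: [0.0] * length for r in residues}
    for col in range(length):
        counts = Counter(s[col] for s in seqs)
        for r in residues:
            grid[r][col] = counts[r] / n
    return residues, grid


def render_ascii(residues, grid, ramp=" .:*#"):
    """Shade each cell with a character from `ramp`, darker for higher frequency."""
    lines = []
    for r in residues:
        cells = "".join(ramp[min(int(f * len(ramp)), len(ramp) - 1)] for f in grid[r])
        lines.append(f"{r} |{cells}|")
    return "\n".join(lines)


residues, grid = profile_grid(["MKV", "MKI", "MRV", "LKV"])
print(render_ascii(residues, grid))
```

Adding more sequences to the input changes the frequencies in the grid but not its dimensions, which is the property the abstract relies on for very large alignments.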

NX4: a web-based visualization of large multiple sequence alignments

Bioinformatics

10.1093/bioinformatics/btz457

2019

Vol 35 (22)

pp. 4800-4802

Author(s):

A Solano-Roman

C Cruz-Castillo

D Offenhuber

A Colubri

Keyword(s):

Large Scale

Supplementary Information

Sequence Alignments

Multiple Sequence

High Genetic Diversity

Web Based

Multiple Sequence Alignments

Line Chart

Sequence Logos

Scalable Analysis

Abstract

Summary: Multiple sequence alignment (MSA) is a fundamental operation in genome analysis. However, MSA visualizations such as sequence logos and matrix representations have changed little since the nineties and are not well suited for displaying large-scale alignments. We propose a novel, web-based MSA visualization tool called NX4, which can handle genome alignments comprising thousands of sequences. NX4 calculates the frequency of each nucleotide along the alignment and visually summarizes the results using a color-blind-friendly palette that helps identify regions of high genetic diversity. NX4 also helps the user find these regions with a ‘focus + context’ mechanism that uses a line chart of the Shannon entropy across the alignment. The tool offers geneticists an easy-to-use and scalable analysis for large MSA studies.

Availability and implementation: NX4 is freely available at https://www.nx4.io, and its source code at https://github.com/NX4/nx4.

Supplementary information: Supplementary data are available at Bioinformatics online.
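
The ‘focus + context’ line chart described above plots the Shannon entropy of each alignment column. A minimal sketch of that quantity follows, assuming entropy in bits over the nucleotides A, C, G and T with gaps and other characters ignored, together with a simple sliding-window scan for high-diversity regions. The function names, window size and threshold are illustrative assumptions, not NX4 settings.

```python
import math
from collections import Counter


def column_entropies(seqs, alphabet="ACGT"):
    """Shannon entropy in bits for each alignment column (gaps/other symbols ignored)."""
    entropies = []
    for column in zip(*seqs):  # assumes all sequences have the same length
        counts = Counter(c.upper() for c in column if c.upper() in alphabet)
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def high_diversity_windows(entropies, window=3, threshold=1.0):
    """0-based start indices of windows whose mean entropy reaches `threshold`."""
    return [
        i
        for i in range(len(entropies) - window + 1)
        if sum(entropies[i:i + window]) / window >= threshold
    ]


ent = column_entropies(["AAAAC", "AAACG", "AAAGT", "AAATA"])
print(ent)                          # [0.0, 0.0, 0.0, 2.0, 2.0]
print(high_diversity_windows(ent))  # [2]
```

A conserved column scores 0 bits and a column in which all four nucleotides are equally frequent scores the 2-bit maximum, so peaks in this series mark the diverse regions a viewer would want to focus on.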

Treerecs: an integrated phylogenetic tool, from sequences to reconciliations

Bioinformatics

10.1093/bioinformatics/btaa615

2020

Vol 36 (18)

pp. 4822-4824

Cited By ~ 1

Author(s):

Nicolas Comte

Benoit Morel

Damir Hasić

Laurent Guéguen

Bastien Boussau

...

Keyword(s):

Open Source

Source Code

Phylogenetic Inference

Species Tree

Gene Trees

Sequence Alignments

Multiple Sequence

Tree Reconciliation

Multiple Sequence Alignments

Multiple Alignments

Abstract

Motivation: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated into standard phylogenetic software, and they either perform poorly on certain functions or lack usability for biologists.

Results: We present Treerecs, a phylogenetic software package based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphical output, and can be used alongside methods for phylogenetic inference on multiple alignments, such as PLL and Seaview.

Availability and implementation: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
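
Duplication-loss reconciliation is classically computed with the LCA mapping: each gene-tree node is mapped to the lowest species-tree node that contains the species of all its leaves; a node is a duplication when it maps to the same species node as one of its children; and losses are counted from depth differences along each gene-tree edge (depth gap for a duplication, depth gap minus one for a speciation). The sketch below implements that textbook parsimony scheme for rooted binary trees written as nested tuples of species names. The tree representation and function names are assumptions, and this is an illustration of the general technique, not Treerecs' implementation.

```python
def index_species_tree(tree):
    """Map every species-tree node (nested tuples, string leaves) to its parent and depth."""
    parent, depth = {tree: None}, {tree: 0}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Lowest common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def reconcile(gene_tree, species_tree):
    """Return (duplications, losses) under the LCA mapping; gene leaves are species names."""
    parent, depth = index_species_tree(species_tree)
    dups = losses = 0

    def visit(node):
        nonlocal dups, losses
        if not isinstance(node, tuple):
            return node  # a gene leaf maps to its own species
        left, right = (visit(child) for child in node)  # binary trees assumed
        m = lca(left, right, parent, depth)
        is_dup = m == left or m == right
        dups += is_dup
        for child_map in (left, right):
            gap = depth[child_map] - depth[m]
            losses += gap if is_dup else gap - 1
        return m

    visit(gene_tree)
    return dups, losses


species = (("A", "B"), "C")
print(reconcile((("A", "C"), "B"), species))  # (1, 3)
```

The example gene tree groups A with C against a species tree that groups A with B; the most parsimonious explanation under this scheme is one duplication at the root followed by three losses.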

Treerecs: an integrated phylogenetic tool, from sequences to reconciliations

10.1101/782946

2019

Cited By ~ 2

Author(s):

Nicolas Comte

Benoit Morel

Damir Hasic

Laurent Guéguen

Bastien Boussau

...

Keyword(s):

Open Source

Source Code

Phylogenetic Inference

Species Tree

Gene Trees

Sequence Alignments

Multiple Sequence

Tree Reconciliation

Multiple Sequence Alignments

Multiple Alignments

Abstract

Motivation: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated into standard phylogenetic software, and they either perform poorly on certain functions or lack usability for biologists.

Results: We present Treerecs, a phylogenetic software package based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphical output, and can be used alongside methods for phylogenetic inference on multiple alignments, such as PLL and Seaview.

Availability: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
