Base-By-Base Version 3: New Comparative Tools for Large Virus Genomes

Shin-Lin Tu; Jeannette Staheli; Colum McClay; Kathleen McLeod; Timothy Rose; Chris Upton

doi:10.3390/v10110637

Viruses ◽

10.3390/v10110637 ◽

2018 ◽

Vol 10 (11) ◽

pp. 637 ◽

Cited By ~ 6

Author(s):

Shin-Lin Tu ◽

Jeannette Staheli ◽

Colum McClay ◽

Kathleen McLeod ◽

Timothy Rose ◽

...

Keyword(s):

Sequence Data ◽

Degenerate Primers ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Viral Genomes ◽

User Data ◽

Large Virus ◽

Intuitive Interface ◽

Virus Genomes

Base-By-Base is a comprehensive tool for the creation and editing of multiple sequence alignments that is coded in Java and runs on multiple platforms. It can be used with gene and protein sequences as well as with large viral genomes, which themselves can contain gene annotations. This report describes new features added to Base-By-Base over the last 7 years. The two most significant additions are: (1) the recoding and inclusion of “consensus-degenerate hybrid oligonucleotide primers” (CODEHOP), a popular tool for the design of degenerate primers from a multiple sequence alignment of proteins; and (2) the ability to perform fuzzy searches within the columns of sequence data in multiple sequence alignments to determine the distribution of sequence variants among the sequences. The intuitive interface focuses on presenting results in easily understood visualizations and on providing the ability to annotate the sequences in a multiple alignment with analytic and user data.
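The column search described in this abstract can be illustrated with a minimal Python sketch. It is not Base-By-Base's implementation (the tool itself is written in Java); the function names `column_variants` and `fuzzy_matches`, the toy alignment, and the choice of "at most N mismatches" as the fuzziness rule are all assumptions made for illustration.

```python
from collections import Counter


def column_variants(alignment, start, end):
    """Count the distinct substrings found in columns [start, end) of an alignment.

    `alignment` maps a sequence name to its aligned (equal-length) string.
    """
    return Counter(seq[start:end] for seq in alignment.values())


def fuzzy_matches(alignment, start, query, max_mismatches=1):
    """Return the names of sequences whose columns beginning at `start`
    match `query` with at most `max_mismatches` differing positions."""
    end = start + len(query)
    hits = []
    for name, seq in alignment.items():
        window = seq[start:end]
        if len(window) != len(query):
            continue  # the window runs past the end of the alignment
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= max_mismatches:
            hits.append(name)
    return hits


# Hypothetical toy alignment, invented for this example.
alignment = {
    "virus_A": "ATGACCGTA",
    "virus_B": "ATGACTGTA",
    "virus_C": "ATGTCTGTA",
    "virus_D": "ATGACCGTA",
}

variants = column_variants(alignment, 3, 6)
hits = fuzzy_matches(alignment, 3, "ACC", max_mismatches=1)
print(variants)
print(hits)
```

Here `variants` summarizes how the variants in columns 3 to 5 are distributed across the sequences, and `hits` lists the sequences that carry the query or a near match of it.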

VisFeature: a stand-alone program for visualizing and analyzing statistical features of biological sequences

Bioinformatics ◽

10.1093/bioinformatics/btz689 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jun Wang ◽

Pu-Feng Du ◽

Xin-Yu Xue ◽

Guang-Ping Li ◽

Yuan-Ke Zhou ◽

...

Keyword(s):

Sequence Data ◽

Software Tool ◽

Data Retrieval ◽

Supplementary Information ◽

Statistical Features ◽

Biological Sequence ◽

Sequence Alignments ◽

Multiple Sequence ◽

Source Codes ◽

Multiple Sequence Alignments

Summary: Many efforts have been made in developing bioinformatics algorithms to predict functional attributes of genes and proteins from their primary sequences. One challenge in this process is to intuitively analyze and understand the statistical features that have been selected by heuristic or iterative methods. In this paper, we developed VisFeature, a software tool that allows users to intuitively visualize and analyze statistical features of all types of biological sequence, including DNA, RNA and proteins. VisFeature also integrates sequence data retrieval, multiple sequence alignments and statistical feature generation functions. Availability and implementation: VisFeature is a desktop application that is implemented using JavaScript/Electron and R. The source code of VisFeature is freely accessible from the GitHub repository (https://github.com/wangjun1996/VisFeature). The binary release, which includes an example dataset, can be freely downloaded from the same GitHub repository (https://github.com/wangjun1996/VisFeature/releases). Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.

Organization of the multigene families of African Swine Fever Virus

Fine Focus ◽

10.33043/ff.3.2.155-170 ◽

2017 ◽

Vol 3 (2) ◽

pp. 155-170 ◽

Cited By ~ 1

Author(s):

Jacob Imbery ◽

Chris Upton

Keyword(s):

African Swine Fever Virus ◽

Sequence Divergence ◽

African Swine Fever ◽

Fever Virus ◽

Multigene Families ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Viral Genomes ◽

Gene Maps

African swine fever virus is a complex DNA virus that infects swine and is spread by ticks. Mortality rates in domestic pigs are very high and the virus is a significant threat to pork farming. The genomes of 16 viruses have been sequenced completely, but these represent only a few of the 23 genotypes. The viral genome is unusual in that it contains 5 multigene families, each of which contains 3-19 duplicated copies (paralogs). There is significant sequence divergence between the paralogs in a single virus and between the orthologs in the different viral genomes. This, together with the fact that most of the multigene families contain numerous gene indels that create truncations and fusions, makes annotation of these regions very difficult; it has led to inconsistent annotation of the 16 viral genomes. In this project, we have created multiple sequence alignments for each of the multigene families and have produced gene maps to help researchers more easily understand the organization of the multigene families among the different viruses. These gene maps will help researchers ascertain which members of the multigene families are present in each of the viruses. This is critical because some of the multigene families are known to be associated with virus virulence.

A minimum reporting standard for multiple sequence alignments

10.1101/2020.01.15.907733 ◽

2020 ◽

Author(s):

Thomas KF Wong ◽

Subha Kalyaanamoorthy ◽

Karen Meusemann ◽

David K Yeates ◽

Bernhard Misof ◽

...

Keyword(s):

Amino Acids ◽

Sequence Data ◽

Pivotal Role ◽

Sequence Alignments ◽

Reporting Standard ◽

Multiple Sequence ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Multiple Sequence Alignments

Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.

A minimum reporting standard for multiple sequence alignments

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa024 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 8

Author(s):

Thomas K F Wong ◽

Subha Kalyaanamoorthy ◽

Karen Meusemann ◽

David K Yeates ◽

Bernhard Misof ◽

...

Keyword(s):

Amino Acids ◽

Sequence Data ◽

Pivotal Role ◽

Sequence Alignments ◽

Reporting Standard ◽

Multiple Sequence ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Multiple Sequence Alignments

Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.
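To make the idea of "completeness in terms of completely specified characters" concrete, the following Python sketch computes the fraction of unambiguous characters for a whole alignment, for each sequence, and for each column. The abstract does not define its four metrics, so this is only one plausible reading and is not taken from AliStat; the function name `completeness`, the nucleotide alphabet `ACGT`, and the toy alignment are assumptions.

```python
def completeness(alignment, unambiguous="ACGT"):
    """Return (overall, per_sequence, per_column) fractions of completely
    specified characters in an alignment.

    `alignment` maps sequence names to equal-length aligned strings.
    Gaps and ambiguity codes (anything outside `unambiguous`) count as
    not completely specified. This definition is an illustrative assumption.
    """
    names = list(alignment)
    rows = [alignment[name].upper() for name in names]
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")

    allowed = set(unambiguous)
    flags = [[ch in allowed for ch in row] for row in rows]

    per_sequence = {name: sum(f) / length for name, f in zip(names, flags)}
    per_column = [sum(f[j] for f in flags) / len(rows) for j in range(length)]
    overall = sum(sum(f) for f in flags) / (len(rows) * length)
    return overall, per_sequence, per_column


# Hypothetical toy alignment: one gap in s2, one ambiguity code in s3.
msa = {"s1": "ACGT", "s2": "AC-T", "s3": "NCGT"}
overall, per_seq, per_col = completeness(msa)
print(overall, per_seq, per_col)
```

For protein alignments the same function could be called with the 20 standard amino acid letters passed as `unambiguous`.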

Greene SCPrimer: a rapid comprehensive tool for designing degenerate primers from multiple sequence alignments

Nucleic Acids Research ◽

10.1093/nar/gkl966 ◽

2006 ◽

Vol 34 (22) ◽

pp. 6605-6611 ◽

Cited By ~ 48

Author(s):

Omar J. Jabado ◽

Gustavo Palacios ◽

Vishal Kapoor ◽

Jeffrey Hui ◽

Neil Renwick ◽

...

Keyword(s):

Degenerate Primers ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Using sound to understand protein sequence data: new sonification algorithms for protein sequences and multiple sequence alignments

BMC Bioinformatics ◽

10.1186/s12859-021-04362-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Edward J. Martin ◽

Thomas R. Meagher ◽

Daniel Barker

Keyword(s):

Focus Group ◽

User Experience ◽

Protein Sequence ◽

Sequence Data ◽

Protein Sequences ◽

Sequence Alignments ◽

Multiple Sequence ◽

Future Directions ◽

Multiple Sequence Alignments ◽

Protein Sequence Data

Background: The use of sound to represent sequence data (sonification) has great potential as an alternative and complement to visual representation, exploiting features of human psychoacoustic intuitions to convey nuance more effectively. We have created five parameter-mapping sonification algorithms that aim to improve knowledge discovery from protein sequences and small protein multiple sequence alignments. For two of these algorithms, we investigated their effectiveness at conveying information. To do this we focussed on subjective assessments of user experience. This entailed a focus group session and survey research by questionnaire of individuals engaged in bioinformatics research. Results: For single protein sequences, the success of our sonifications for conveying features was supported by both the survey and focus group findings. For protein multiple sequence alignments, there was limited evidence that the sonifications successfully conveyed information. Additional work is required to identify effective algorithms to render multiple sequence alignment sonification useful to researchers. Feedback from both our survey and focus groups suggests future directions for sonification of multiple alignments: animated visualisation indicating the column in the multiple alignment as the sonification progresses, user control of sequence navigation, and customisation of the sound parameters. Conclusions: Sonification approaches undertaken in this work have shown some success in conveying information from protein sequence data. Feedback points out future directions to build on the sonification approaches outlined in this paper. The effectiveness assessment process implemented in this work proved useful, giving detailed feedback and key approaches for improvement based on end-user input. The uptake of similar user experience focussed effectiveness assessments could also help with other areas of bioinformatics, for example in visualisation.

An Overview of Multiple Sequence Alignments and Cloud Computing in Bioinformatics

ISRN Biomathematics ◽

10.1155/2013/615630 ◽

2013 ◽

Vol 2013 ◽

pp. 1-14 ◽

Cited By ~ 28

Author(s):

Jurate Daugelaite ◽

Aisling O' Driscoll ◽

Roy D. Sleator

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Sequence Data ◽

Cloud Base ◽

Data Sets ◽

Next Generation ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Computing Technologies

Multiple sequence alignment (MSA) of DNA, RNA, and protein sequences is one of the most essential techniques in the fields of molecular biology, computational biology, and bioinformatics. Next-generation sequencing technologies are changing the biology landscape, flooding the databases with massive amounts of raw sequence data. MSA of ever-increasing sequence data sets is becoming a significant bottleneck. In order to realise the promise of MSA for large-scale sequence data sets, it is necessary for existing MSA algorithms to be run in a parallelised fashion with the sequence data distributed over a computing cluster or server farm. Combining MSA algorithms with cloud computing technologies is therefore likely to improve the speed, quality, and capability for MSA to handle large numbers of sequences. In this review, multiple sequence alignments are discussed, with a specific focus on the ClustalW and Clustal Omega algorithms. Cloud computing technologies and concepts are outlined, and the next generation of cloud-based MSA algorithms is introduced.

Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models

10.1101/028936 ◽

2015 ◽

Cited By ~ 2

Author(s):

Hugo Jacquin ◽

Amy Gilson ◽

Eugene Shakhnovich ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Protein Structure ◽

Structural Information ◽

Sequence Data ◽

Careful Analysis ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Pairwise Models ◽

Statistical Approaches ◽

And Function

Inverse statistical approaches to determine protein structure and function from multiple sequence alignments (MSAs) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use a lattice protein (LP) model to benchmark those inverse statistical approaches. We build MSAs of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSAs. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These are remarkable results, as the effective LP Hamiltonians used to generate the MSAs are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the power of inverse approaches to the modelling of proteins from sequence data, and their limitations; we show, in particular, that their success crucially depends on the accurate inference of the Potts pairwise couplings.
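For readers unfamiliar with pairwise Potts models, the short Python sketch below evaluates a Potts energy of the commonly used form H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) for a sequence s. The sign convention, the sparse dictionary layout of the fields `h` and couplings `J`, and the toy parameter values are assumptions for illustration; they are not taken from the paper, and the inference of `h` and `J` from an MSA is not shown.

```python
import itertools


def potts_energy(seq, h, J):
    """Evaluate H(s) = -sum_i h[i][s_i] - sum_{i<j} J[(i, j)][(s_i, s_j)].

    `h` is a list with one dict per site, mapping a symbol to its local field.
    `J` maps a site pair (i, j) with i < j to a dict of pairwise couplings
    keyed by (symbol_i, symbol_j). Missing entries are treated as zero.
    """
    energy = -sum(h[i].get(symbol, 0.0) for i, symbol in enumerate(seq))
    for i, j in itertools.combinations(range(len(seq)), 2):
        energy -= J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical toy parameters for a three-site sequence.
h = [{"A": 1.0}, {"B": 0.5}, {}]
J = {(0, 2): {("A", "C"): 2.0}}

print(potts_energy("ABC", h, J))  # local fields plus the (0, 2) coupling
print(potts_energy("ABD", h, J))  # the coupling does not apply to "D"
```

Passing an empty `J` reduces the function to an independent-site model, the baseline the abstract contrasts with the pairwise model.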

Molecular characterization of intraspecific variations in Helicoverpa armigera (Hübner) populations across India

Journal of Environmental Biology ◽

10.22438/jeb/42/5/mrn-1764 ◽

2021 ◽

Vol 42 (5) ◽

pp. 1320-1329

Author(s):

S. Chakravarty ◽

K.G. Padwal ◽

C.P. Srivastava ◽

...

Keyword(s):

Helicoverpa Armigera ◽

Sequence Data ◽

Pcr Amplification ◽

Haplotype Network ◽

Coi Gene ◽

Sequence Alignments ◽

Separate Species ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Helicoverpa Armígera

Aim: The present study was undertaken to explore the genetic diversity among Helicoverpa armigera populations from varied geographic regions of India using mitochondrial cytochrome c oxidase I (COI) gene fragments. Methodology: The larval specimens of H. armigera collected from 20 locations were subjected to DNA extraction, PCR amplification of the target gene, sequencing and then multiple sequence alignments. Results: Based on COI sequence data, high levels of genetic differentiation among some H. armigera populations were detected, but the existing divergence was not high enough to delineate them as separate species. The Indian population as a whole exhibited similarity with the global genetic assemblage. Significant negative neutrality test indices and a unimodal mismatch distribution further supported that this insect experienced a demographic expansion in the past. The phylogenetic tree and median-joining haplotype network indicated that genetic similarity was not related to the geographic proximity of populations. Interpretation: Differences based on genetic analyses indicate considerable subspecific-level variations among H. armigera populations of India. However, there is no evidence of any unidentified cryptic species of H. armigera in the country.

gmos: Rapid detection of genome mosaicism over short evolutionary distances

10.1101/053694 ◽

2016 ◽

Author(s):

Mirjana Domazet-Lošo ◽

Tomislav Domazet-Lošo

Keyword(s):

Real Data ◽

Mosaic Structure ◽

Local Alignment ◽

Data Sets ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Viral Genomes ◽

Alignment Free ◽

Query Region

Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in FASTA format and can be visualized using an accessory program, gmosDraw (freely available with gmos).
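The step of "choosing the best local alignment for each query region" can be sketched in Python as below. This is a deliberately simplified illustration and not the gmos algorithm: it assumes local alignment hits are already available as `(start, end, subject, score)` tuples with an exclusive `end`, picks the highest-scoring hit at each query position, and merges adjacent positions into segments. The function name `mosaic_structure` and the example hits are hypothetical.

```python
def mosaic_structure(query_length, hits):
    """Assign each query position to the subject of its best-scoring hit and
    merge consecutive positions with the same subject into segments.

    `hits` is a list of (start, end, subject, score) tuples in query
    coordinates, with `end` exclusive. Returns a list of
    (start, end, subject) segments; subject is None where no hit applies.
    """
    best = [(None, float("-inf"))] * query_length
    for start, end, subject, score in hits:
        for pos in range(max(start, 0), min(end, query_length)):
            if score > best[pos][1]:
                best[pos] = (subject, score)

    segments = []
    for pos, (subject, _) in enumerate(best):
        if segments and segments[-1][2] == subject:
            # Extend the current segment by one position.
            segments[-1] = (segments[-1][0], pos + 1, subject)
        else:
            segments.append((pos, pos + 1, subject))
    return segments


# Hypothetical hits of a 130-base query against two subject genomes.
hits = [
    (0, 60, "subject_1", 50.0),
    (40, 100, "subject_2", 80.0),
    (90, 120, "subject_1", 30.0),
]
segments = mosaic_structure(130, hits)
for segment in segments:
    print(segment)
```

A per-position scan keeps the sketch short; it scales with query length times the number of hits, so a real tool working on whole genomes would need an interval-based approach.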
