Boosting alignment accuracy through adaptive local realignment

Mapping Intimacies ◽

10.1101/063131 ◽

2016 ◽

Cited By ~ 1

Author(s):

Dan DeBlasio ◽

John Kececioglu

Keyword(s):

New Method ◽

Alignment Accuracy ◽

Global Alignment ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Local Realignment ◽

First Time ◽

Final Alignment

Abstract Motivation While mutation rates can vary across the residues of a protein, when computing alignments of protein sequences the same setting of values for substitution score and gap penalty parameters is typically used across their entire length. We provide for the first time a new method called adaptive local realignment that automatically uses diverse parameter settings in different regions of the input sequences when computing multiple sequence alignments. This allows parameter settings to adapt to more closely match the local mutation rate across a protein. Method Our method builds on our prior work on global alignment parameter advising with the Facet alignment accuracy estimator. Given a computed alignment, in each region that has low estimated accuracy, a collection of candidate realignments is generated using a precomputed set of alternate parameter settings. If one of these alternate realignments has higher estimated accuracy than the original subalignment, the region is replaced with the new realignment, and the concatenation of these realigned regions forms the final alignment that is output. Results Adaptive local realignment significantly improves the quality of alignments over using the single best default parameter setting. In particular, this new method of local advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align benchmarks (and by 6.4% over using global advising alone). Availability A new version of the Opal multiple sequence aligner that incorporates adaptive local realignment, using Facet for parameter advising, is available free for non-commercial use at http://[email protected]
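The realignment loop described in the Method paragraph can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Opal or Facet implementation: `estimate` and `realign` are hypothetical callables standing in for the accuracy estimator and the aligner, regions are taken to be fixed-width column windows (the abstract does not say how regions are delimited), and `toy_estimate` and `toy_realign` exist only to make the sketch runnable.

```python
from typing import Callable, List, Sequence

Alignment = List[str]  # equal-length rows, '-' marks a gap


def adaptive_local_realign(
    aln: Alignment,
    estimate: Callable[[Alignment], float],       # hypothetical accuracy estimator
    realign: Callable[[List[str], str], Alignment],  # hypothetical aligner
    parameter_settings: Sequence[str],
    window: int = 10,        # assumption: regions are fixed-width windows
    threshold: float = 0.5,  # assumption: cutoff for "low estimated accuracy"
) -> Alignment:
    """Replace low-scoring regions with the best-scoring candidate realignment."""
    ncols = len(aln[0])
    blocks = []
    for start in range(0, ncols, window):
        block = [row[start:start + window] for row in aln]
        best, best_score = block, estimate(block)
        if best_score < threshold:
            # Strip gaps and realign the region under each alternate setting.
            residues = [row.replace("-", "") for row in block]
            for params in parameter_settings:
                candidate = realign(residues, params)
                score = estimate(candidate)
                if score > best_score:
                    best, best_score = candidate, score
        blocks.append(best)
    # Concatenate the (possibly realigned) regions row by row.
    return ["".join(parts) for parts in zip(*blocks)]


def toy_estimate(block: Alignment) -> float:
    """Toy stand-in: fraction of gap-free columns with identical residues."""
    ncols = len(block[0]) if block else 0
    if ncols == 0:
        return 0.0
    good = sum(1 for col in zip(*block) if "-" not in col and len(set(col)) == 1)
    return good / ncols


def toy_realign(residues: List[str], params: str) -> Alignment:
    """Toy stand-in: pad rows on the right ("left") or on the left (otherwise)."""
    width = max(len(r) for r in residues)
    if params == "left":
        return [r.ljust(width, "-") for r in residues]
    return [r.rjust(width, "-") for r in residues]


result = adaptive_local_realign(
    ["ACGT--AC", "ACGTAC--"], toy_estimate, toy_realign,
    ["left", "right"], window=4, threshold=0.5,
)
```

With the toy inputs, the first window already scores well and is kept, while the second window is replaced by a candidate that scores higher under the toy estimator.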

Intra-species recombination among strains of the ampelovirus Grapevine leafroll-associated virus 4

Virology Journal ◽

10.1186/s12985-019-1243-4 ◽

2019 ◽

Vol 16 (1) ◽

Cited By ~ 1

Author(s):

Jati Adiputra ◽

Sridhar Jarugula ◽

Rayapati A. Naidu

Keyword(s):

Genome Sequence ◽

High Throughput Sequencing ◽

Washington State ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Genome Wide ◽

Detection Program ◽

Leafroll Disease ◽

First Time

Abstract Background Grapevine leafroll disease is one of the most economically important viral diseases affecting grape production worldwide. Grapevine leafroll-associated virus 4 (GLRaV-4, genus Ampelovirus, family Closteroviridae) is one of the six GLRaV species documented in grapevines (Vitis spp.). GLRaV-4 is made up of several distinct strains that were previously considered as putative species. Currently known strains of GLRaV-4 stand apart from other GLRaV species in lacking the minor coat protein. Methods In this study, the complete genome sequences of three strains of GLRaV-4 from Washington State vineyards were determined using a combination of high-throughput sequencing, Sanger sequencing and RACE. The genome sequences of these three strains were compared with corresponding sequences of GLRaV-4 strains reported from other grapevine-growing regions. Phylogenetic analysis, SimPlot and the Recombination Detection Program (RDP) were used to identify putative recombination events among GLRaV-4 strains. Results The genome sizes of GLRaV-4 strain 4 (isolate WAMR-4), strain 5 (isolate WASB-5) and strain 9 (isolate WALA-9) from Washington State vineyards were determined to be 13,824 nucleotides (nt), 13,820 nt, and 13,850 nt, respectively. Multiple sequence alignments showed that an 11-nt sequence (5′-GTAATCTTTTG-3′) towards the 5′ terminus of the 5′ non-translated region (NTR) and a 10-nt sequence (5′-ATCCAGGACC-3′) towards the 3′ end of the 3′ NTR are conserved among the currently known GLRaV-4 strains. The LR-106 isolate of strain 4 and the Estellat isolate of strain 6 were identified as recombinants due to putative recombination events involving divergent sequences in the ORF1a from strain 5 and strain Pr. Conclusion Genome-wide analyses showed for the first time that recombination can occur between distinct strains of GLRaV-4, resulting in the emergence of genetically stable and biologically successful chimeric viruses.
Although the origin of recombinant strains of GLRaV-4 remains elusive, intra-species recombination could be playing an important role in shaping genetic diversity and evolution of the virus and modulating the biology and epidemiology of GLRaV-4 strains.

Relationship between multiple sequence alignments and quality of protein comparative models

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.20284 ◽

2004 ◽

Vol 58 (1) ◽

pp. 151-157 ◽

Cited By ~ 32

Author(s):

Domenico Cozzetto ◽

Anna Tramontano

Keyword(s):

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

A statistical score for assessing the quality of multiple sequence alignments

BMC Bioinformatics ◽

10.1186/1471-2105-7-484 ◽

2006 ◽

Vol 7 (1) ◽

Cited By ~ 28

Author(s):

Virpi Ahola ◽

Tero Aittokallio ◽

Mauno Vihinen ◽

Esa Uusipaikka

Keyword(s):

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Alignathon: A competitive assessment of whole genome alignment methods.

10.1101/003285 ◽

2014 ◽

Cited By ~ 1

Author(s):

Dent Earl ◽

Ngan K Nguyen ◽

Glenn Hickey ◽

Robert S. Harris ◽

Stephen Fitzgerald ◽

...

Keyword(s):

Genome Alignment ◽

Whole Genome ◽

Alignment Quality ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Simulation Based ◽

Benchmark Datasets ◽

Whole Genome Alignment

Background: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole genome alignment (WGA). Results: Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments, and assessments were performed collectively after all the submissions were received. Three datasets were used: two of simulated primate and mammalian phylogenies, and one of 20 real fly genomes. In total 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. Conclusions: We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable difference in the alignment quality of differently annotated regions, and found few tools aligned the duplications analysed. We found many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all datasets, submissions and assessment programs for further study, and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.

Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark

International Journal of Applied Mathematics and Computer Science ◽

10.2478/v10006-009-0054-y ◽

2009 ◽

Vol 19 (4) ◽

pp. 675-678 ◽

Cited By ~ 5

Author(s):

Jacek Błażewicz ◽

Piotr Formanowicz ◽

Paweł Wojciechowski

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Formal Definitions ◽

Accuracy Measures ◽

Total Column ◽

Better Than

BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured by bali_score, an application provided together with the database. The standard accuracy measures are the Sum of Pairs (SP) and the Total Column (TC). We have found that, for non-core block columns, results calculated by bali_score are different from those obtained on the basis of the formal definitions of the measures. We do not claim that one of these measures is better than the other, but they are definitely different. Such a situation can be the source of confusion when alignments obtained using various methods are compared. Therefore, we propose a new nomenclature for the measures of the quality of multiple sequence alignments to distinguish which one was actually calculated. Moreover, we have found that the occurrence of a gap in some column in the first sequence of the reference alignment causes column discarding.
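For orientation, the two measures named in this abstract can be sketched from their commonly stated definitions: SP as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC as the fraction of reference columns reproduced exactly. This is a minimal sketch, not a reimplementation of bali_score: it scores every reference column rather than core blocks only, and the function names are illustrative assumptions.

```python
from itertools import combinations
from typing import List, Optional, Set, Tuple

Column = Tuple[Optional[int], ...]


def residue_columns(aln: List[str]) -> List[Column]:
    """Describe each column by the residue index it holds in each row (None for a gap)."""
    counters = [0] * len(aln)
    cols = []
    for col in zip(*aln):
        entry = []
        for row, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[row])
                counters[row] += 1
        cols.append(tuple(entry))
    return cols


def aligned_pairs(aln: List[str]) -> Set[tuple]:
    """All pairs of (row, residue index) that share a column."""
    pairs = set()
    for col in residue_columns(aln):
        present = [(row, idx) for row, idx in enumerate(col) if idx is not None]
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref_pairs = aligned_pairs(ref)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test: List[str], ref: List[str]) -> float:
    """Fraction of reference columns that appear identically in the test alignment."""
    ref_cols = [c for c in residue_columns(ref) if any(i is not None for i in c)]
    if not ref_cols:
        return 0.0
    test_cols = set(residue_columns(test))
    return sum(1 for c in ref_cols if c in test_cols) / len(ref_cols)


reference = ["AC-G", "ACTG"]
candidate = ["ACG-", "ACTG"]
sp = sp_score(candidate, reference)
tc = tc_score(candidate, reference)
```

Because gap-containing reference columns are counted here, the values can differ from those of a tool that restricts scoring to selected columns, which is the kind of discrepancy the abstract describes.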

SNP-E: A New Method For Multiple Sequence Alignments Analysis And Accurate Single Nucleotide Polymorphism Evaluation

Atlas Journal of Biology ◽

10.5147/ajb.2014.0134 ◽

2014 ◽

Vol 3 (1) ◽

pp. 206-2011

Author(s):

Melody N. Hemmati-Sholeh ◽

Larry A. Sholeh ◽

David A. Lightfoot

Keyword(s):

Single Nucleotide Polymorphism ◽

New Method ◽

Nucleotide Polymorphism ◽

Sequence Alignments ◽

Multiple Sequence ◽

Single Nucleotide ◽

Multiple Sequence Alignments ◽

Analysis

Faculty Opinions recommendation of Evolutionary profiles from the QR factorization of multiple sequence alignments.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1024515.296730 ◽

2005 ◽

Author(s):

Anne-Catherine Dock-Bregeon

Keyword(s):

Qr Factorization ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Faculty Opinions recommendation of Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.732011981.793542976 ◽

2018 ◽

Author(s):

Chandra Verma ◽

Suryani Lukman

Keyword(s):

Machine Learning ◽

Sequence Alignments ◽

Multiple Sequence ◽

Contact Prediction ◽

Multiple Sequence Alignments

Positive natural selection in primate genes of the type I interferon response

BMC Ecology and Evolution ◽

10.1186/s12862-021-01783-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Elena N. Judd ◽

Alison R. Gilchrist ◽

Nicholas R. Meyerson ◽

Sara L. Sawyer

Keyword(s):

Natural Selection ◽

Positive Selection ◽

Type I Interferon ◽

Interferon Response ◽

Type I ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Interferon Stimulated Genes ◽

Interferon Induction

Abstract Background The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (i.e., degrade, mis-localize, etc.) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have greater effect if they worked upstream of the production of interferon molecules because, once interferon is produced, hundreds of interferon-stimulated proteins would activate and the virus would need to counteract them one-by-one. Results We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for the signatures of recurrent positive selection. Counter to our hypothesis, we found the interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). 
In contrast, interferon-stimulated genes evolve differently, with 33% of genes evolving under positive selection and containing a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid. Conclusion Viruses may antagonize individual products of the interferon response more often than trying to neutralize the system altogether.

SNN-SB: Combining Partial Alignment Using Modified SNN Algorithm with Segment-Based for Multiple Sequence Alignments

Journal of Physics Conference Series ◽

10.1088/1742-6596/1962/1/012048 ◽

2021 ◽

Vol 1962 (1) ◽

pp. 012048

Author(s):

Aziz Nasser Boraik Ali ◽

Hassan Pyar Ali Hassan ◽

Hesham Bahamish

Keyword(s):

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Partial Alignment
