Boosting alignment accuracy through adaptive local realignment

2016
Author(s):
Dan DeBlasio
John Kececioglu

Abstract Motivation: While mutation rates can vary across the residues of a protein, alignments of protein sequences are typically computed with a single setting of the substitution-score and gap-penalty parameters across their entire length. We present, for the first time, a method called adaptive local realignment that automatically uses different parameter settings in different regions of the input sequences when computing multiple sequence alignments. This allows the parameter settings to adapt to more closely match the local mutation rate along a protein. Method: Our method builds on our prior work on global alignment parameter advising with the Facet alignment accuracy estimator. Given a computed alignment, for each region with low estimated accuracy, a collection of candidate realignments is generated using a precomputed set of alternate parameter settings. If one of these realignments has higher estimated accuracy than the original subalignment, it replaces that region, and the concatenation of the resulting regions forms the final output alignment. Results: Adaptive local realignment significantly improves alignment quality over using the single best default parameter setting. In particular, this new method of local advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align benchmarks (and by 6.4% over using global advising alone). Availability: A new version of the Opal multiple sequence aligner that incorporates adaptive local realignment, using Facet for parameter advising, is available free for non-commercial use at http://[email protected]
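The realignment loop described under Method can be sketched in Python. This is a minimal sketch under stated assumptions, not the Opal/Facet implementation: it assumes the alignment has already been cut into consecutive column blocks (`regions`) and takes the accuracy estimator and the aligner as callables. The names `adaptive_local_realign`, `estimate`, `realign`, `param_sets` and `threshold` are hypothetical, and the toy estimator and aligner stand in for Facet and Opal only so the example runs.

```python
from typing import Callable, List, Sequence

Alignment = List[str]  # one row per sequence, equal lengths, '-' marks a gap


def adaptive_local_realign(
    regions: Sequence[Alignment],
    estimate: Callable[[Alignment], float],  # hypothetical stand-in for an accuracy estimator
    realign: Callable[[Alignment, str], Alignment],  # hypothetical stand-in for an aligner
    param_sets: Sequence[str],
    threshold: float,
) -> Alignment:
    """Realign low-accuracy regions and concatenate the results (sketch)."""
    chosen: List[Alignment] = []
    for region in regions:
        best, best_score = region, estimate(region)
        # Only regions whose estimated accuracy is low are realigned.
        if best_score < threshold:
            for params in param_sets:
                candidate = realign(region, params)
                score = estimate(candidate)
                # Replace the region only if the candidate strictly improves the estimate.
                if score > best_score:
                    best, best_score = candidate, score
        chosen.append(best)
    if not chosen:
        return []
    # Concatenate the chosen subalignments row by row.
    n_rows = len(chosen[0])
    return ["".join(block[i] for block in chosen) for i in range(n_rows)]


# --- Toy stand-ins (hypothetical), used only to exercise the function ---
def toy_estimate(region: Alignment) -> float:
    """Fraction of columns that are gap-free and fully identical."""
    cols = list(zip(*region))
    good = sum(1 for col in cols if "-" not in col and len(set(col)) == 1)
    return good / len(cols)


def toy_realign(region: Alignment, params: str) -> Alignment:
    """Strip gaps, then pad every row to a common width on one side."""
    seqs = [row.replace("-", "") for row in region]
    width = max(len(s) for s in seqs)
    if params == "pad_left":
        return [s.rjust(width, "-") for s in seqs]
    return [s.ljust(width, "-") for s in seqs]


regions = [["AC", "AC"], ["G-T", "-GT"]]
result = adaptive_local_realign(
    regions, toy_estimate, toy_realign, ["pad_right", "pad_left"], threshold=0.9
)
print(result)  # ['ACGT', 'ACGT']
```

Here the first block already scores above the threshold and is kept, while the second (estimated accuracy 1/3) is replaced by the first candidate that improves on it. Keeping the original subalignment unless a candidate strictly beats it follows the replacement rule stated in the abstract.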

2019
Vol 16 (1)
Author(s):
Jati Adiputra
Sridhar Jarugula
Rayapati A. Naidu

Abstract Background: Grapevine leafroll disease is one of the most economically important viral diseases affecting grape production worldwide. Grapevine leafroll-associated virus 4 (GLRaV-4, genus Ampelovirus, family Closteroviridae) is one of the six GLRaV species documented in grapevines (Vitis spp.). GLRaV-4 comprises several distinct strains that were previously considered putative species. Currently known strains of GLRaV-4 stand apart from other GLRaV species in lacking the minor coat protein. Methods: In this study, the complete genome sequences of three GLRaV-4 strains from Washington State vineyards were determined using a combination of high-throughput sequencing, Sanger sequencing and RACE. These genome sequences were compared with the corresponding sequences of GLRaV-4 strains reported from other grapevine-growing regions. Phylogenetic analysis, SimPlot and the Recombination Detection Program (RDP) were used to identify putative recombination events among GLRaV-4 strains. Results: The genome sizes of GLRaV-4 strain 4 (isolate WAMR-4), strain 5 (isolate WASB-5) and strain 9 (isolate WALA-9) from Washington State vineyards were determined to be 13,824 nucleotides (nt), 13,820 nt and 13,850 nt, respectively. Multiple sequence alignments showed that an 11-nt sequence (5′-GTAATCTTTTG-3′) towards the 5′ terminus of the 5′ non-translated region (NTR) and a 10-nt sequence (5′-ATCCAGGACC-3′) towards the 3′ end of the 3′ NTR are conserved among the currently known GLRaV-4 strains. The LR-106 isolate of strain 4 and the Estellat isolate of strain 6 were identified as recombinants, owing to putative recombination events involving divergent sequences in ORF1a from strain 5 and strain Pr. Conclusion: Genome-wide analyses showed for the first time that recombination can occur between distinct strains of GLRaV-4, resulting in the emergence of genetically stable and biologically successful chimeric viruses. Although the origin of recombinant strains of GLRaV-4 remains elusive, intra-species recombination could play an important role in shaping the genetic diversity and evolution of the virus and in modulating the biology and epidemiology of GLRaV-4 strains.
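Once genomes are assembled, the terminal-motif observation can be checked mechanically. The Python sketch below simply tests whether each motif quoted in the abstract occurs within a fixed distance of the relevant genome end. The function names, the 50-nt default window and the toy sequences are hypothetical, and a plain substring search ignores ambiguity codes and is no substitute for the multiple sequence alignment the authors used.

```python
from typing import Dict, Tuple

FIVE_PRIME_MOTIF = "GTAATCTTTTG"  # 11-nt motif towards the 5' terminus of the 5' NTR
THREE_PRIME_MOTIF = "ATCCAGGACC"  # 10-nt motif towards the 3' end of the 3' NTR


def motif_near_end(genome: str, motif: str, window: int, end: str) -> bool:
    """Return True if `motif` occurs within `window` nt of the given genome end."""
    if window <= 0:
        raise ValueError("window must be positive")  # genome[-0:] would be the whole genome
    if end == "5p":
        region = genome[:window]
    elif end == "3p":
        region = genome[-window:]
    else:
        raise ValueError("end must be '5p' or '3p'")
    return motif in region


def conserved_termini(genomes: Dict[str, str], window: int = 50) -> Dict[str, Tuple[bool, bool]]:
    """For each isolate, report (5' motif found, 3' motif found); window size is an assumption."""
    return {
        name: (
            motif_near_end(seq, FIVE_PRIME_MOTIF, window, "5p"),
            motif_near_end(seq, THREE_PRIME_MOTIF, window, "3p"),
        )
        for name, seq in genomes.items()
    }


# Hypothetical toy sequences, not real GLRaV-4 genomes.
toy_genomes = {
    "isolate_A": "AC" + FIVE_PRIME_MOTIF + "A" * 100 + THREE_PRIME_MOTIF + "TT",
    "isolate_B": "A" * 80 + FIVE_PRIME_MOTIF + "A" * 100 + THREE_PRIME_MOTIF,
}
print(conserved_termini(toy_genomes, window=30))
# {'isolate_A': (True, True), 'isolate_B': (False, True)}
```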


2006
Vol 7 (1)
Author(s):
Virpi Ahola
Tero Aittokallio
Mauno Vihinen
Esa Uusipaikka

2014
Author(s):
Dent Earl
Ngan K Nguyen
Glenn Hickey
Robert S. Harris
Stephen Fitzgerald
...

Background: Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark datasets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks for the more general problem of whole genome alignment (WGA). Results: Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments, and assessments were performed collectively after all the submissions were received. Three datasets were used: two of simulated primate and mammalian phylogenies, and one of 20 real fly genomes. In total, 35 submissions were assessed, submitted by ten teams using 12 different alignment pipelines. Conclusions: We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions, and found that few tools aligned the analysed duplications. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all datasets, submissions and assessment programs for further study, and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments.
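In a simulation-based assessment the true alignment is known, so a submitted alignment can be scored by comparing the residue pairs it aligns against the true pairs. The Python sketch below computes precision, recall and F1 over aligned residue pairs for a toy gapped alignment. It is an assumption about how such a comparison can be set up, not the assessment code released with the study; the function names are hypothetical, and real whole-genome alignments (many blocks, both strands, duplications) need considerably more machinery.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int, int, int]  # (seq_i, pos_i, seq_j, pos_j) with seq_i < seq_j


def aligned_pairs(rows: List[str]) -> Set[Pair]:
    """All residue pairs placed in the same column; '-' marks a gap."""
    pos = [0] * len(rows)  # next residue index in each row
    pairs: Set[Pair] = set()
    for col in zip(*rows):
        residues = []
        for i, ch in enumerate(col):
            if ch != "-":
                residues.append((i, pos[i]))
                pos[i] += 1
        # residues are in increasing sequence order, so seq_i < seq_j holds
        for (i, pi), (j, pj) in combinations(residues, 2):
            pairs.add((i, pi, j, pj))
    return pairs


def precision_recall_f1(predicted: List[str], truth: List[str]) -> Tuple[float, float, float]:
    """Score a predicted alignment against the known (simulated) true alignment."""
    pred, true = aligned_pairs(predicted), aligned_pairs(truth)
    tp = len(pred & true)  # pairs aligned in both
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


truth = ["AC-G", "A-TG"]    # toy "simulated" true alignment
predicted = ["ACG", "ATG"]  # toy submission that over-aligns C with T
print(precision_recall_f1(predicted, truth))
```

The submission recovers both true pairs (recall 1.0) but also aligns one pair that the truth leaves unaligned (precision 2/3), which is why both numbers are usually reported together.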


Author(s):
Jacek Błażewicz
Piotr Formanowicz
Paweł Wojciechowski

Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark
BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured by bali_score, an application provided together with the database. The standard accuracy measures are the Sum of Pairs (SP) and the Total Column (TC) scores. We have found that, for non-core block columns, the results calculated by bali_score differ from those obtained from the formal definitions of the measures. We do not claim that one of these measures is better than the other, but they are definitely different. This can be a source of confusion when alignments obtained using various methods are compared. Therefore, we propose a new nomenclature for measures of multiple sequence alignment quality that distinguishes which one was actually calculated. Moreover, we have found that a gap in the first sequence of the reference alignment causes the corresponding column to be discarded.
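To make the two measures concrete, here is a minimal Python sketch of one straightforward reading of their formal definitions: SP as the fraction of reference residue pairs reproduced by the test alignment, and TC as the fraction of reference columns reproduced exactly. It is not bali_score and does not model core blocks. Every reference column is scored, including single-residue columns, and choices of this kind (which columns to score, how gaps are treated) are what can make two reported "SP" or "TC" values disagree. The function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, residue index within that sequence)


def columns(rows: List[str]) -> List[FrozenSet[Residue]]:
    """Each alignment column as the set of residues it contains ('-' = gap)."""
    pos = [0] * len(rows)
    cols = []
    for col in zip(*rows):
        members = set()
        for i, ch in enumerate(col):
            if ch != "-":
                members.add((i, pos[i]))
                pos[i] += 1
        cols.append(frozenset(members))
    return cols


def pairs(rows: List[str]) -> Set[Tuple[Residue, Residue]]:
    """All residue pairs that share a column, in a canonical (sorted) order."""
    out: Set[Tuple[Residue, Residue]] = set()
    for col in columns(rows):
        out.update(combinations(sorted(col), 2))
    return out


def sp_score(test: List[str], ref: List[str]) -> float:
    """Sum of Pairs: fraction of reference residue pairs the test alignment reproduces."""
    ref_pairs = pairs(ref)
    return len(pairs(test) & ref_pairs) / len(ref_pairs)


def tc_score(test: List[str], ref: List[str]) -> float:
    """Total Column: fraction of reference columns reproduced exactly in the test."""
    test_cols = set(columns(test))
    ref_cols = columns(ref)
    return sum(1 for c in ref_cols if c in test_cols) / len(ref_cols)


ref = ["AC-G", "A-TG"]
test = ["ACG", "ATG"]
print(sp_score(test, ref), tc_score(test, ref))  # 1.0 0.5
```

The toy test alignment reproduces every reference pair, so SP is perfect, yet it merges the two single-residue reference columns into one, so only two of the four reference columns survive and TC is 0.5. Dropping single-residue columns from the count would raise TC to 1.0, which illustrates how a scoring convention alone can change the reported value.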


2021
Vol 21 (1)
Author(s):
Elena N. Judd
Alison R. Gilchrist
Nicholas R. Meyerson
Sara L. Sawyer

Abstract Background: The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (e.g., degrade or mislocalize) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have a greater effect if they worked upstream of interferon production because, once interferon is produced, hundreds of interferon-stimulated proteins would be activated and the virus would need to counteract them one by one. Results: We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for signatures of recurrent positive selection. Counter to our hypothesis, we found that interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes, respectively, bear signatures of positive selection). In contrast, interferon-stimulated genes evolve differently: 33% of them evolve under positive selection, and they contain a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid. Conclusion: Viruses may antagonize individual products of the interferon response more often than they try to neutralize the system altogether.
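The abstract does not name the test used to detect recurrent positive selection. One widely used option for codon alignments is a likelihood-ratio test comparing PAML codeml site models M7 (ω = dN/dS restricted to between 0 and 1) and M8 (which adds a class of sites whose ω may exceed 1). Assuming that test, the Python sketch below turns two fitted log-likelihoods into a test statistic and p-value. The function name and the log-likelihood values are hypothetical; in practice the model fits come from codeml itself.

```python
import math
from typing import Tuple


def m7_vs_m8_lrt(lnl_m7: float, lnl_m8: float) -> Tuple[float, float]:
    """Likelihood-ratio test of site model M7 (null) against M8 (alternative).

    M8 adds two free parameters relative to M7, so the statistic is compared
    with a chi-square distribution with 2 degrees of freedom, whose survival
    function has the closed form exp(-x / 2).
    """
    # M8 nests M7, so the statistic should be >= 0; clamp tiny negative values
    stat = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical log-likelihoods for one gene alignment (not values from the study).
stat, p = m7_vs_m8_lrt(lnl_m7=-1000.0, lnl_m8=-995.0)
print(f"2*dlnL = {stat:.1f}, p = {p:.4f}")  # 2*dlnL = 10.0, p = 0.0067
```

The closed form avoids a statistics dependency but only holds for 2 degrees of freedom; other model comparisons would need a general chi-square survival function.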

