A Survey of the State-of-the-Art Parallel Multiple Sequence Alignment Algorithms on Multicore Systems

Sara Shehab; Sameh Abdulah; Arabi E.

doi:10.5120/ijca2018917658

Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers

2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011

doi:10.1109/iembs.2011.6090208

Philip C. Church; Andrzej Goscinski; Kathryn Holt; Michael Inouye; Amol Ghoting; ...

Keywords: Sequence Alignment; Multiple Sequence Alignment; Distributed Memory; Multiple Sequence; Alignment Algorithms

Efficient Multiple Sequences Alignment Algorithm Generation via Components Assembly Under PAR Framework

Frontiers in Genetics, 2021, Vol 11

doi:10.3389/fgene.2020.628175

Haipeng Shi; Haihe Shi; Shenghua Xu

Keywords: Sequence Alignment; Multiple Sequence Alignment; Sequence Similarity; Alignment Algorithm; Pairwise Sequence Alignment; Multiple Sequence; Sequence Alignment Algorithm; Alignment Algorithms; Sequence Similarity Analysis; High Level

As a key algorithm in bioinformatics, sequence alignment is widely used in sequence similarity analysis and genome sequence database search. Existing research focuses mainly on specific steps of the algorithm or on specific problems, and lacks a high-level, abstract domain algorithm framework. Multiple sequence alignment algorithms are more complex, redundant, and difficult to understand. As a result, it is not easy for users to select an appropriate algorithm, and computing errors may occur. Building on our pairwise sequence alignment algorithm component library and the convenient PAR software platform, we develop a few extended domain components for the multiple sequence alignment domain. With these components, a specific multiple sequence alignment algorithm can be designed and its corresponding C++, Java, or Python program generated efficiently. This improves both the development efficiency of complex algorithms and the accuracy of sequence alignment calculations. A star alignment algorithm is designed and generated to demonstrate the development process.
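
The abstract uses star alignment as its worked example but does not describe the algorithm. The following is an illustrative center-star sketch in Python, not code generated by PAR. It makes the following assumptions:

- It uses a toy linear scoring scheme. The values of MATCH, MISMATCH and GAP are hypothetical.
- It aligns every sequence to a single center with Needleman-Wunsch.
- It picks as center the sequence with the highest total pairwise score.
- The input sequences contain no "-" characters.

The function names needleman_wunsch and center_star are likewise illustrative.

```python
# Illustrative center-star MSA sketch (not PAR-generated code).
# Toy linear scoring scheme: these values are assumptions.
MATCH, MISMATCH, GAP = 1, -1, -2


def sub(x: str, y: str) -> int:
    """Substitution score for two residues under the toy scheme."""
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> tuple[int, str, str]:
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * GAP
    for j in range(1, m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + GAP,
                          S[i][j - 1] + GAP)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ra.append(a[i - 1])
            rb.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            ra.append(a[i - 1])
            rb.append("-")
            i -= 1
        else:
            ra.append("-")
            rb.append(b[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def center_star(seqs: list[str]) -> list[str]:
    """Return aligned rows in the same order as the input sequences."""
    k = len(seqs)
    if k < 2:
        return list(seqs)
    # Center = sequence with the highest total pairwise score to all others.
    totals = [sum(needleman_wunsch(seqs[c], seqs[o])[0] for o in range(k) if o != c)
              for c in range(k)]
    c = max(range(k), key=totals.__getitem__)
    rows, owners = [seqs[c]], [c]  # rows[0] is always the (gapped) center
    for o in range(k):
        if o == c:
            continue
        _, ac, ao = needleman_wunsch(seqs[c], seqs[o])
        center_row = rows[0]
        new_rows = [[] for _ in rows]
        new_o = []
        i = j = 0
        while i < len(center_row) or j < len(ac):
            if i < len(center_row) and center_row[i] == "-":
                # Existing gap column in the MSA: the new sequence gets a gap.
                for r, row in enumerate(rows):
                    new_rows[r].append(row[i])
                new_o.append("-")
                i += 1
            elif j < len(ac) and ac[j] == "-":
                # New insertion relative to the center: gap in every existing row.
                for r in range(len(rows)):
                    new_rows[r].append("-")
                new_o.append(ao[j])
                j += 1
            else:
                # Both alignments point at the same center residue.
                for r, row in enumerate(rows):
                    new_rows[r].append(row[i])
                new_o.append(ao[j])
                i, j = i + 1, j + 1
        rows = ["".join(r) for r in new_rows] + ["".join(new_o)]
        owners.append(o)
    result = [""] * k
    for owner, row in zip(owners, rows):
        result[owner] = row
    return result
```

For example, print("\n".join(center_star(["ACGT", "ACT", "AGT", "CGT"]))) prints one aligned row per input sequence. Only pairwise alignments are ever computed. The sketch still yields a consistent MSA because merging follows "once a gap, always a gap": a gap that a new sequence induces in the center is copied into every row already merged.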

ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system

Journal of Bioinformatics and Computational Biology, 2020, Vol 18 (02), pp. 2050005

doi:10.1142/s0219720020500055

Sanjay Bankapur; Nagamma Patil

Keywords: Sequence Alignment; Multiple Sequence Alignment; Scoring System; State Of The Art; Biological Sequences; Alignment Quality; Multiple Sequence; Iterative Optimization; Optimization Framework; Proposed Model

Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in analyzing biological sequences, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. Analyzing these growing volumes of raw sequences requires computationally efficient (polynomial-time) models that produce accurate alignments. In this study, a progressive-based alignment model named ProgSIO-MSA is proposed, consisting of an effective scoring system and an optimization framework. The scoring system combines two scoring strategies:

- Look Back Ahead scores a residue pair dynamically, based on the status information of the previous position, to improve the sum-of-pairs score.
- Position-Residue-Specific Dynamic Gap Penalty dynamically penalizes a gap using a mutation matrix, based on the residue and its position information.

The proposed single iterative optimization (SIO) framework identifies and optimizes local optima traps to improve alignment quality. The model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, BAliBASE and SABmark. On BAliBASE, the alignment quality (biological accuracy) of the proposed model increases by 17.7%. The model's efficiency is compared with state-of-the-art models using both time-complexity and runtime analysis. Wilcoxon signed-rank test results showed that the proposed model significantly outperformed progressive-based state-of-the-art models in alignment quality.
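
ProgSIO-MSA's scoring strategies aim to improve the sum-of-pairs (SP) score, which sums a pairwise score over every pair of rows in every alignment column. As a reference point, here is a sketch of the plain SP score. It assumes a toy linear scheme: match +1, mismatch -1, residue-gap -2, gap-gap 0. It is not the Look Back Ahead or Position-Residue-Specific Dynamic Gap Penalty scheme, which the abstract does not specify in enough detail to reproduce. The function name sp_score and its default values are hypothetical.

```python
from itertools import combinations


def sp_score(msa: list[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Plain sum-of-pairs score with a linear gap cost; gap-gap pairs score 0."""
    if not msa:
        return 0
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in msa]
        # Score every unordered pair of residues in this column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs are conventionally ignored
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sp_score(["A-", "AC", "AG"]))  # -2
```

In the example, the first column contributes three matches (+3). The second column contributes two residue-gap pairs and one mismatch (-2 - 2 - 1 = -5), for a total of -2.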

Component-Based Design and Assembly of Heuristic Multiple Sequence Alignment Algorithms

Frontiers in Genetics, 2020, Vol 11

doi:10.3389/fgene.2020.00105

Haihe Shi; Xuchu Zhang

Keywords: Sequence Alignment; Multiple Sequence Alignment; Multiple Sequence; Alignment Algorithms

Instability in progressive multiple sequence alignment algorithms

Algorithms for Molecular Biology, 2015, Vol 10 (1)

doi:10.1186/s13015-015-0057-1

Kieran Boyce; Fabian Sievers; Desmond G. Higgins

Keywords: Sequence Alignment; Multiple Sequence Alignment; Multiple Sequence; Alignment Algorithms; Progressive Multiple Sequence Alignment

Recursive MAGUS: Scalable and accurate multiple sequence alignment

PLoS Computational Biology, 2021, Vol 17 (10), pp. e1008950

doi:10.1371/journal.pcbi.1008950

Vladimir Smirnov

Keywords: Open Source; Sequence Alignment; Multiple Sequence Alignment; State Of The Art; Sequence Data; Large Datasets; Alignment Accuracy; Multiple Sequence; Large Numbers; Source Form

Multiple sequence alignment tools struggle to keep pace with rapidly growing sequence data, as few methods can handle large datasets while maintaining alignment accuracy. We recently introduced MAGUS, a new state-of-the-art method for aligning large numbers of sequences. In this paper, we present a comprehensive set of enhancements that allow MAGUS to align vastly larger datasets with greater speed. We compare MAGUS to other leading alignment methods on datasets of up to one million sequences. Our results demonstrate the advantages of MAGUS over other alignment software in both accuracy and speed. MAGUS is freely available in open-source form at https://github.com/vlasmirnov/MAGUS.

Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction

Systematic Biology, 2018, Vol 68 (1), pp. 117-130

doi:10.1093/sysbio/syy036

Haim Ashkenazy; Itamar Sela; Eli Levy Karin; Giddy Landan; Tal Pupko

Keywords: Sequence Alignment; Multiple Sequence Alignment; Sequence Data; Phylogenetic Signal; Large Set; Multiple Sequence; Extra Effort; Alignment Algorithms; Tree Inference; Alignment Errors

The classic methodology of inferring a phylogenetic tree from sequence data is composed of two steps. First, a multiple sequence alignment (MSA) is computed. Then, a tree is reconstructed assuming the MSA is correct. Yet inferred MSAs have been shown to be inaccurate, and alignment errors reduce tree inference accuracy. It was previously proposed that filtering unreliable alignment regions can increase the accuracy of tree inference. However, it was also demonstrated that the benefit of this filtering is often obscured by the resulting loss of phylogenetic signal. In this work we explore an approach in which, instead of relying on a single MSA, we generate a large set of alternative MSAs and concatenate them into a single SuperMSA. By doing so, we account for phylogenetic signals contained in columns that are not present in the single MSA computed by alignment algorithms. Using simulations, we demonstrate that this approach results, on average, in more accurate trees than 1) using an unfiltered MSA and 2) using a single MSA with weights assigned to columns according to their reliability. Next, we explore in which regions of the MSA space our approach is expected to be beneficial. Finally, we provide a simple criterion for deciding whether the extra effort of computing a SuperMSA and inferring a tree from it is beneficial. Based on these assessments, we expect our methodology to be useful in many cases in which diverged sequences are analyzed. The option to generate such a SuperMSA is available at http://guidance.tau.ac.il.
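
Mechanically, building a SuperMSA means concatenating several alternative MSAs of the same sequences row by row, so each sequence's row in the SuperMSA is its rows from every alternative MSA joined end to end. The sketch below assumes each MSA is a dict from sequence name to aligned row; this representation and the function name concat_msas are hypothetical, not GUIDANCE's. How the alternative MSAs are generated is not shown.

```python
def concat_msas(msas: list[dict[str, str]]) -> dict[str, str]:
    """Concatenate alternative MSAs of the same sequences into one wide alignment."""
    if not msas:
        raise ValueError("need at least one MSA")
    names = list(msas[0])
    for msa in msas:
        if set(msa) != set(names):
            raise ValueError("every MSA must contain the same sequence names")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError("rows within one MSA must have equal length")
        for name in names:
            # Alternative MSAs must align the same underlying (ungapped) sequences.
            if msa[name].replace("-", "") != msas[0][name].replace("-", ""):
                raise ValueError(f"sequence {name!r} differs between MSAs")
    # Join each sequence's rows across all MSAs, in the order the MSAs are given.
    return {name: "".join(msa[name] for msa in msas) for name in names}


a = {"s1": "AC-T", "s2": "ACGT"}
b = {"s1": "A-CT", "s2": "ACGT"}
print(concat_msas([a, b]))  # {'s1': 'AC-TA-CT', 's2': 'ACGTACGT'}
```

A tree would then be inferred from the concatenated rows. Columns that appear in only some of the alternative MSAs therefore still contribute signal, in line with the rationale given in the abstract.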

Recursive MAGUS: scalable and accurate multiple sequence alignment

2021

doi:10.1101/2021.04.09.439137

Vladimir Smirnov

Keywords: Open Source; Sequence Alignment; Multiple Sequence Alignment; State Of The Art; Sequence Data; Large Datasets; Alignment Accuracy; Multiple Sequence; Large Numbers; Source Form

Improving accuracy of multiple sequence alignment algorithms based on alignment of neighboring residues

Nucleic Acids Research, 2008, Vol 37 (2), pp. 463-472

doi:10.1093/nar/gkn945

Yue Lu; Sing-Hoi Sze

Keywords: Sequence Alignment; Multiple Sequence Alignment; Multiple Sequence; Alignment Algorithms; Improving Accuracy

Performance assessment of protein multiple sequence alignment algorithms based on permutation similarity measurement

Biochemical and Biophysical Research Communications, 2010, Vol 399 (4), pp. 470-474

doi:10.1016/j.bbrc.2010.07.103

Zhi Gong; Fangzhen Li; Liuhuan Dong

Keywords: Performance Assessment; Sequence Alignment; Multiple Sequence Alignment; Similarity Measurement; Multiple Sequence; Alignment Algorithms; Protein Multiple Sequence Alignment
