A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment

2017 ◽  
Author(s):  
Álvaro Rubio-Largo ◽  
Leonardo Vanneschi ◽  
Mauro Castelli ◽  
Miguel A. Vega-Rodríguez

Abstract: The simultaneous alignment of three or more nucleotide or amino-acid sequences is known as Multiple Sequence Alignment (MSA), an NP-hard optimization problem. The time needed to find an optimal alignment rises exponentially as the number of sequences to align increases. In this work, we deal with a multiobjective version of the MSA problem where the goal is to simultaneously optimize the accuracy and conservation of the alignment. A parallel version of the Hybrid Multiobjective Memetic Metaheuristics for Multiple Sequence Alignment is proposed. To evaluate the parallel performance of our proposal, we selected a pool of datasets with different numbers of sequences (up to 1000 sequences) and compared its parallel performance against other well-known parallel approaches published in the literature, such as MSAProbs, T-Coffee, Clustal Ω, and MAFFT. The comparative study reveals that, with 32 cores, our parallel aligner is around 25 times faster than the sequential version, obtaining a parallel efficiency of around 80%.
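
As a quick check on the figures quoted in the abstract above, parallel efficiency is conventionally defined as speedup divided by the number of cores. The sketch below assumes that standard definition (the abstract does not state its formula), and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup / cores, the usual definition of parallel efficiency."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return speedup / cores


# Figures quoted in the abstract: about 25x speedup on 32 cores.
efficiency = parallel_efficiency(25.0, 32)
print(f"{efficiency:.1%}")  # 78.1%, consistent with "around 80%"
```

Under this definition, a value of 100% would mean that every added core contributes a full unit of speedup.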

2003 ◽  
Vol 01 (02) ◽  
pp. 267-287 ◽  
Author(s):  
Chuan Yi Tang ◽  
Chin Lung Lu ◽  
Margaret Dah-Tsyr Chang ◽  
Yin-Te Tsai ◽  
Yuh-Ju Sun ◽  
...  

In this paper, we design a heuristic algorithm for computing a constrained multiple sequence alignment (CMSA for short), guaranteeing that the generated alignment satisfies user-specified constraints that particular residues be aligned together. If the number of residues that must be aligned together is a constant α, then the time complexity of our CMSA algorithm for aligning K sequences is O(αKn^4), where n is the maximum sequence length. In addition, we have built a CMSA software system and run several experiments on the sequences of RNases, which mainly function in catalyzing the degradation of RNA molecules. The resulting alignments illustrate the practicability of our method.
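
To make the notion of a constraint concrete, the following minimal sketch checks whether a finished alignment places one chosen residue from each sequence in the same column. It is not the CMSA heuristic from the abstract above, only a verifier for its output; the function names, the `-` gap symbol, and the toy sequences are all assumptions made for illustration.

```python
from typing import Sequence

GAP = "-"  # assumed gap symbol


def residue_column(aligned_row: str, residue_index: int) -> int:
    """Map the index of a residue in the ungapped sequence to its alignment column."""
    seen = -1
    for column, symbol in enumerate(aligned_row):
        if symbol != GAP:
            seen += 1
            if seen == residue_index:
                return column
    raise IndexError("residue_index is beyond the end of the sequence")


def satisfies_constraint(alignment: Sequence[str], residue_indices: Sequence[int]) -> bool:
    """True if the chosen residue of every row falls in one shared column."""
    if len(alignment) != len(residue_indices):
        raise ValueError("need exactly one residue index per aligned row")
    columns = {residue_column(row, idx) for row, idx in zip(alignment, residue_indices)}
    return len(columns) == 1


# Toy alignment (made-up sequences): the constraint asks that the 'H' of each row line up.
toy = ["AC-HG", "A-CHG", "ACTH-"]
print(satisfies_constraint(toy, [2, 2, 3]))
```

Residue indices are zero-based positions in the ungapped sequences, so the same constraint can be checked against any alignment of those sequences.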


2021 ◽  
pp. 1-18
Author(s):  
Hafiz Asadul Rehman ◽  
Kashif Zafar ◽  
Ayesha Khan ◽  
Abdullah Imtiaz

Discovering structural, functional and evolutionary information in biological sequences has been considered a core research area in bioinformatics. Multiple Sequence Alignment (MSA) tries to align all sequences in a given query set to ease the annotation of new sequences. Traditional methods for finding the optimal alignment are computationally expensive. This research presents an enhanced version of the Bird Swarm Algorithm (BSA), based on bio-inspired optimization. The Enhanced Bird Swarm Align Algorithm (EBSAA) is proposed for the multiple sequence alignment problem to determine the optimal alignment among different sequences. Twenty-one different datasets have been used to compare the performance of EBSAA with a Genetic Algorithm (GA) and the Particle Swarm Align Algorithm (PSAA). The proposed technique produces better alignments than GA and PSAA in most cases.


2012 ◽  
Vol 23 (04) ◽  
pp. 877-901 ◽  
Author(s):  
ADAM GUDYŚ ◽  
SEBASTIAN DEOROWICZ

Multiple sequence alignment (MSA) is one of the most important problems in computational biology. As the availability of genomic and proteomic data constantly increases, new tools for processing these data in reasonable time are needed. One way of addressing this issue is parallelization. Nowadays, graphics processing units (GPUs) offer much more computational power than central processors, hence GPUs are becoming more and more popular in computationally intensive tasks, including sequence alignment. We investigate the constrained multiple sequence alignment (CMSA) problem, which allows some prior knowledge to be introduced into the final alignment. As a result, we propose a GPU-parallel version of the Center Star algorithm that vastly outperforms its CPU-serial equivalent as well as the parallel version run on a quad-core processor. The speedups over the CPU-serial algorithm ranged from 30 to 110 for synthetic sets and from 55 to 75 for real sequences obtained from the Pfam database.
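
The Center Star algorithm named in the abstract above begins by picking the sequence closest to all the others. The sketch below covers that center-selection step only. It uses plain Levenshtein distance as a stand-in for the pairwise score, which is an assumption: the abstract does not say which score the authors use. The function names are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def choose_center(sequences: List[str]) -> int:
    """Index of the sequence with the smallest summed distance to all others."""
    if not sequences:
        raise ValueError("need at least one sequence")
    totals = [sum(edit_distance(s, t) for t in sequences) for s in sequences]
    return totals.index(min(totals))


# Made-up toy sequences; the second one differs from each of the others by one edit.
print(choose_center(["AGGT", "ACGT", "ACGA"]))
```

Each pairwise distance is independent of the others, which is the kind of workload that tends to suit parallel hardware.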


2014 ◽  
Vol 31 (2) ◽  
pp. 283-296
Author(s):  
Guoli Ji ◽  
Yong Zeng ◽  
Zijiang Yang ◽  
Congting Ye ◽  
Jingci Yao

Purpose – The time complexity of most multiple sequence alignment algorithms is O(N^2) or O(N^3), where N is the number of sequences. In addition, with the development of biotechnology, the number of biological sequences is growing significantly. Traditional methods have difficulty handling large-scale sequence sets. The proposed LemK_MSA method aims to reduce the time complexity, especially for large-scale sequence sets, while keeping an accuracy level similar to that of traditional methods. Design/methodology/approach – LemK_MSA converts multiple sequence alignment into the corresponding 10D vector alignment using ten types of copy modes based on Lempel-Ziv. Then, it uses the k-means algorithm and the neighbor-joining (NJ) algorithm to divide the sequences into several groups and calculate a guide tree for each group. A complete guide tree for multiple sequence alignment can then be constructed by merging the guide trees of all groups. Moreover, for large-scale sequence sets, LemK_MSA proposes a GPU-based parallel approach to distance matrix calculation. Findings – Under this approach, the time efficiency of multiple sequence alignment can be improved. High-throughput mouse antibody sequences are used to validate the proposed method. Compared to ClustalW, MAFFT and Mbed, LemK_MSA is more than ten times as efficient while maintaining alignment accuracy. Originality/value – This paper proposes a novel method with sequence vectorization for multiple sequence alignment based on Lempel-Ziv. A GPU-based parallel method has been designed for large-scale distance matrix calculation. It provides a new way forward for multiple sequence alignment research.
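
A back-of-the-envelope sketch of why grouping helps: a full distance matrix over N sequences needs N(N-1)/2 pairwise distances, whereas computing distances only within groups needs far fewer. The code assumes evenly sized groups and ignores the cost of clustering and of merging the group trees, so it illustrates the scaling argument rather than modeling LemK_MSA itself. The function names are hypothetical.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def grouped_pair_count(n: int, groups: int) -> int:
    """Within-group pairs when n sequences are split as evenly as possible."""
    if groups <= 0:
        raise ValueError("groups must be a positive integer")
    base, extra = divmod(n, groups)
    sizes = [base + 1] * extra + [base] * (groups - extra)
    return sum(pair_count(size) for size in sizes)


print(pair_count(10_000))              # 49995000
print(grouped_pair_count(10_000, 10))  # 4995000
```

With ten equal groups, the number of within-group distances drops by roughly a factor of ten in this idealized setting.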


2017 ◽  
Author(s):  
Massimo Maiolo ◽  
Xiaolei Zhang ◽  
Manuel Gil ◽  
Maria Anisimova

Abstract: Sequence alignment lies at the heart of many evolutionary and comparative genomics studies. However, the optimal alignment of multiple sequences is NP-hard, so exact algorithms become impractical for more than a few sequences. Thus, state-of-the-art alignment methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogenetic tree. Changes between homologous characters are typically modelled by a continuous-time Markov substitution model. In contrast, the dynamics of insertions and deletions (indels) are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. Recently, Bouchard-Côté and Jordan [PNAS (2012) 110(4):1160–1166] introduced a modification to a classical indel model, describing indel evolution on a phylogenetic tree as a Poisson process. The model, termed PIP, allows the joint marginal probability of a multiple sequence alignment and a tree to be computed in linear time. Here, we present a new dynamic programming algorithm to align two multiple sequence alignments by maximum likelihood in polynomial time under PIP, and apply it in a progressive algorithm. To our knowledge, this is the first progressive alignment method using a rigorous mathematical formulation of an evolutionary indel process and with polynomial time complexity.
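
Progressive methods are built from pairwise dynamic programming steps. As a simplified illustration of that building block, the sketch below computes a classic Needleman-Wunsch global alignment score for two plain sequences. It is a swapped-in textbook technique with fixed, assumed scores (match +1, mismatch -1, gap -1), not the maximum-likelihood recursion under PIP described in the abstract above.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i, char_a in enumerate(a, start=1):
        current = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            up = previous[j] + gap        # char_a aligned to a gap
            left = current[j - 1] + gap   # char_b aligned to a gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 2: three matches and one gap
```

The table has (len(a) + 1) × (len(b) + 1) cells, so the running time is quadratic in the sequence length, while keeping only two rows makes the memory use linear.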


2020 ◽  
Vol 17 (1) ◽  
pp. 59-77
Author(s):  
Anand Kumar Nelapati ◽  
JagadeeshBabu PonnanEttiyappan

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and different analyses were performed, including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture and physicochemical properties (including pI, EC, Ai and Ii). Results: Multiple sequence alignment of all the selected protein sequences exhibited distinct differences between bacterial, fungal, plant and animal sources based on the position-specific existence of conserved amino acid residues. The maximum homology of all the selected protein sequences is in the range 51-388. Within individual categories, homology is in the range 16-337 for bacterial uricase, 14-339 for fungal uricase, 12-317 for plant uricase, and 37-361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences disclosed clusters indicating that the uricases come from different sources. The physicochemical features revealed that the uricases are between 300 and 338 amino acid residues long, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared to the other amino acids in all analyzed species. Conclusion: In the field of bioinformatics, this work may be informative and a stepping stone for other researchers seeking the physicochemical features, evolutionary history and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
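
The amino acid composition statistic reported above (for example, the 8.79 percent average frequency of valine) is a per-residue percentage. A minimal sketch of that calculation for a single sequence follows. The function name is hypothetical and the peptide is made up, not a real uricase sequence; the study's average would additionally be taken over all sixty sequences.

```python
from collections import Counter
from typing import Dict


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Percentage of each residue (one-letter code) in a protein sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    counts = Counter(residues)
    return {residue: 100.0 * n / len(residues) for residue, n in counts.items()}


# Made-up peptide, not a real uricase sequence.
composition = amino_acid_composition("MVKVAVLG")
print(round(composition["V"], 1))  # 37.5
```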


2019 ◽  
Vol 15 (4) ◽  
pp. 353-362
Author(s):  
Sambhaji B. Thakar ◽  
Maruti J. Dhanavade ◽  
Kailas D. Sonawane

Background: Legume plants are known for their rich medicinal and nutritional value. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge of their medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft Structured Query Language (SQL) Server 2017. The DBMS was used as the back end, ASP.Net was used to lay out the front-end operations, and VB.Net was used as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which may help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of a mangrove species, Thespesia populnea, as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies may help in understanding the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in

