SHOOT: phylogenetic gene search and ortholog inference

2021 ◽  
Author(s):  
David Emms ◽  
Steven Kelly

Determining the evolutionary relationships between gene sequences is fundamental to comparative biological research. However, conducting such analyses requires a high degree of technical proficiency in several computational tools including gene family construction, multiple sequence alignment, and phylogenetic inference. Here we present SHOOT, an easy to use phylogenetic search engine for fast and accurate phylogenetic analysis of biological sequences. SHOOT searches a user-provided query sequence against a database of phylogenetic trees of gene sequences (gene trees) and returns a gene tree with the given query sequence correctly grafted within it. We show that SHOOT can perform this search and placement with comparable speed to a conventional BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional multiple sequence alignment and maximum likelihood tree inference approaches. We further show that SHOOT can be used to identify orthologs with equivalent accuracy to conventional orthology inference methods. In summary, SHOOT is an accurate and fast tool for complete phylogenetic analysis of novel query sequences. An easy to use webserver is available online at www.shoot.bio.

2019 ◽  
Vol 15 (4) ◽  
pp. 353-362
Author(s):  
Sambhaji B. Thakar ◽  
Maruti J. Dhanavade ◽  
Kailas D. Sonawane

Background: Legume plants are known for their rich medicinal and nutritional value. A large amount of medicinal information on various legume plants is scattered across the literature in text form. Objective: It is essential to design and construct a database of medicinal legume plants that integrates the respective classes of legumes and includes knowledge of their medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft SQL Server 2017 as the back-end DBMS, ASP.Net for the front end, and VB.Net as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: The database includes information on 50 medicinal legume species, which may help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from the legume species Vigna unguiculata, using matK of the mangrove species Thespesia populnea as a template. The matK sequence analysis indicates residues conserved among legume and mangrove species. Conclusion: Phylogenetic analysis revealed that the legume species Vigna unguiculata and the mangrove species Thespesia populnea are closely related, indicating their similarity and origin from a common ancestor. Thus, these studies may help in understanding the evolutionary relationship between legumes and mangroves. Availability: LegumeDB is available at http://legumedatabase.co.in


2020 ◽  
pp. 565-579 ◽  
Author(s):  
Mohamed Issa ◽  
Aboul Ella Hassanien

Sequence alignment is a vital process in many biological applications, such as phylogenetic tree construction, DNA fragment assembly and structure/function prediction. There are two kinds of alignment: pairwise alignment, which aligns two sequences, and multiple sequence alignment (MSA), which aligns more than two sequences. The most accurate alignment methods are based on dynamic programming (DP), whose running time increases exponentially with the length and number of the aligned sequences. Stochastic or meta-heuristic techniques speed up alignment, but yield near-optimal alignments that are less accurate than those of DP. Hence, this chapter reviews recent developments in MSA using meta-heuristic algorithms. In addition, two recent techniques are examined in more depth: the first is fragmented protein sequence alignment using two-layer particle swarm optimization (FTLPSO), and the second is multiple sequence alignment using a multi-objective bacterial foraging optimization algorithm (MO-BFO).
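As a minimal illustration of the dynamic programming approach this abstract refers to, the sketch below implements Needleman-Wunsch global alignment for a single pair of sequences. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration and are not taken from the chapter; FTLPSO and MO-BFO are not shown.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment by dynamic programming.

    Scoring values are illustrative assumptions, not from the chapter.
    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of substitution, or a gap in
    # either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

For two sequences the table has (m+1)(n+1) cells. Exact alignment of N sequences needs an N-dimensional table, so the cost grows exponentially with the number of sequences, which is the motivation for the meta-heuristic methods reviewed in the chapter.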


2019 ◽  
Vol 20 (S18) ◽  
Author(s):  
Qing Zhan ◽  
Nan Wang ◽  
Shuilin Jin ◽  
Renjie Tan ◽  
Qinghua Jiang ◽  
...  

Background: In multiple sequence alignment, the substitution scores of pairwise alignments are essential. To compute adaptive scores for alignment, researchers usually use hidden Markov models (HMMs) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise alignment accuracy. However, the combination of the partition function and an optimized HMM, which could further improve alignment accuracy, was overlooked by these studies. Results: This paper presents ProbPFP, a novel algorithm for MSA that integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. PSO was applied to optimize the HMM's parameters. The posterior probability obtained by the HMM was then combined with the one obtained by the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The alignments obtained by ProbPFP achieved the highest mean TC scores and mean SP scores on two benchmark datasets, SABmark and OXBench, and the second highest mean TC scores and mean SP scores on the benchmark dataset BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing the phylogenetic trees of six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The results indicate that the phylogenetic trees reconstructed from ProbPFP alignments are closer to the reference trees than those of the other methods. Conclusions: We propose a new multiple sequence alignment method that combines an optimized HMM with the partition function. Its performance shows that the method can substantially improve alignment accuracy.


2017 ◽  
Vol 23 (1) ◽  
pp. 36
Author(s):  
Sesanti Basuki ◽  
Sudarsono Sudarsono

Abstract

The PMT gene encodes putrescine N-methyltransferase (PMT), an enzyme in the nicotine biosynthesis pathway of tobacco (Nicotiana tabacum). Ten tobacco varieties with different nicotine levels were examined in order to (1) analyze the diversity of the partial PMT gene sequences among the varieties, and (2) evaluate the relationships among the ten varieties based on that diversity. Sequence diversity was analyzed by multiple sequence alignment of the partial PMT gene sequences from the ten varieties with the Ntpmt_Sindoro1 sequence (JQ438825) deposited in the NCBI GenBank database. The alignment was used to compute a distance matrix, from which the phylogenetic relationships among the ten varieties were inferred. The partial PMT gene sequences varied in size and number among the varieties analyzed. The analysis also indicated that the sequences derive from a common ancestor and are related to nicotine biosynthesis in tobacco. Phylogenetic analysis separated the sequences into two branches (bootstrap value = 100): introduced tobacco varieties (low to medium nicotine content) and local tobacco varieties (medium to high nicotine content). The two groups separate according to nicotine level and to base changes at specific sites of the partial PMT gene sequences. Information on these mutations could be used to study the relationship between base changes in the PMT gene fragment and the variation in total nicotine content of tobacco that arose during evolution.

Key words: clustering analysis, PMT gene, nicotine, Nicotiana tabacum


2021 ◽  
Author(s):  
Frederic Lemoine ◽  
Olivier Gascuel

Besides computationally intensive steps, phylogenetic analysis workflows are usually composed of many small, recurring, but important data manipulation steps. These include file reformatting, sequence renaming, tree re-rooting, tree comparison, bootstrap support computation, etc. They are often performed by custom scripts or by several heterogeneous tools, which may be error prone, hard to maintain, and produce results that are challenging to reproduce. For all these reasons, the development and reuse of phylogenetic workflows is often a complex task. We identified many operations that are part of most phylogenetic analyses, and implemented them in a toolkit called Gotree/Goalign. The Gotree/Goalign toolkit implements more than 120 user-friendly commands and an API dedicated to multiple sequence alignment and phylogenetic tree manipulations. It is developed in Go, which makes executables efficient, easily installable, integrable in workflow environments, and parallelizable when possible. This toolkit is freely available on most platforms (Linux, MacOS and Windows) and most architectures (amd64, i386). Sources and binaries are available on GitHub at https://github.com/evolbioinfo/{gotree|goalign} , Bioconda, and DockerHub.


2019 ◽  
Author(s):  
Tasfia Zahin ◽  
Md. Hasin Abrar ◽  
Mizanur Rahman ◽  
Tahrina Tasnim ◽  
Md. Shamsuzzoha Bayzid ◽  
...  

Phylogenetic analysis, i.e., the construction of an accurate phylogenetic tree from genomic sequences of a set of species, is one of the main challenges in bioinformatics. The popular approaches require either aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity and the difficulty of obtaining accurate alignments have led to the development of alignment-free methods to estimate phylogenies. However, alignment-free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment-free method for phylogeny construction based on contiguous sub-sequences of length k, termed k-mers. The presence or absence of these k-mers is used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
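The k-mer presence/absence encoding described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the toy sequences and the choice of k are hypothetical, and the maximum likelihood tree inference step, which the abstract does not detail, is not shown.

```python
def kmers(seq, k):
    """Return the set of contiguous sub-sequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence_matrix(sequences, k):
    """Encode each taxon as a binary vector over all observed k-mers.

    `sequences` maps a taxon name to its sequence. Returns the sorted
    k-mer list (the columns) and a dict mapping each taxon to its 0/1 row.
    """
    per_taxon = {name: kmers(seq, k) for name, seq in sequences.items()}
    columns = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: [1 if kmer in found else 0 for kmer in columns]
        for name, found in per_taxon.items()
    }
    return columns, matrix


if __name__ == "__main__":
    # Toy input; real genomes and a realistic k would be far larger.
    toy = {"taxon_A": "ACGT", "taxon_B": "ACGA"}
    cols, mat = presence_absence_matrix(toy, k=3)
    print(cols)
    for name, row in mat.items():
        print(name, row)
```

Each row is a binary character vector for one taxon. A matrix of this shape is the kind of input that a phylogenetic inference program supporting binary character data could take for the maximum likelihood step.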


Intervirology ◽  
2020 ◽  
pp. 1-8
Author(s):  
Ravali Thota ◽  
Vishweshwar Kumar Ganji ◽  
Sharanya Machanagari ◽  
Narasimha Reddy Yella ◽  
Bhagyalakshmi Buddala ◽  
...  

Introduction: Bluetongue disease is an economically important viral disease of livestock caused by bluetongue virus (BTV), which has multiple serotypes. BTV belongs to the genus Orbivirus of the family Reoviridae, subfamily Sedoreovirinae. The BTV genome consists of 10 dsRNA segments that code for 7 structural and 4 nonstructural proteins, of which VP2 has been reported to be serotype-specific and a major antigenic determinant. Objective: It is important to know the circulating serotypes in a particular geographical location for effective control of the disease. The present study unravels the molecular evolution of the BTV serotypes circulating during 2014–2018 in the Telangana and Andhra Pradesh states of India. Methods: Multiple sequence alignment with available BTV serotypes in GenBank and phylogenetic analysis were performed for the partial VP2 sequences of the major BTV serotypes circulating during the study period. Results: The multiple sequence alignment of circulating serotypes with the respective reference isolates revealed variations in the antigenic VP2. The phylogenetic analysis revealed that the major circulating serotypes were grouped into eastern topotypes (BTV-1, BTV-2, BTV-4, and BTV-16) and western topotypes (BTV-5, BTV-12, and BTV-24). Conclusion: Our study strengthens the need for development of an effective vaccine that can induce an immune response against a range of serotypes within and between topotypes.

