pairwise sequence identity
Recently Published Documents


2021 ◽  
Vol 18 (184) ◽  
Author(s):  
Patrick C. F. Buchholz ◽  
Bert van Loo ◽  
Bernard D. G. Eenink ◽  
Erich Bornberg-Bauer ◽  
Jürgen Pleiss

Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is compared to consensus sequences, too. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
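
The network measures named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the identity definition (matches over gap-free aligned columns), the toy sequences, the node name "anc" and the 0.5 threshold are all assumptions chosen for illustration. It builds a threshold network from pairwise sequence identity and reports betweenness centrality and the clustering coefficient for each node.

```python
from collections import deque
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over aligned columns without gaps (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def identity_network(seqs, threshold):
    """Undirected graph with an edge wherever pairwise identity >= threshold."""
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if pairwise_identity(seqs[u], seqs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def betweenness_centrality(graph):
    """Unnormalised betweenness for an unweighted, undirected graph (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # nodes in breadth-first visiting order
        preds = {v: [] for v in graph}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):        # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every unordered pair was counted from both endpoints
    return {v: c / 2 for v, c in score.items()}


# Hypothetical pre-aligned toy sequences: two small groups plus one intermediate "anc".
toy = {
    "A1": "AAAAAAAAAA",
    "A2": "AAAAAAAAAC",
    "anc": "AAAAACCCCC",
    "B1": "CCCCCCCCCC",
    "B2": "CCCCCCCCCA",
}

network = identity_network(toy, threshold=0.5)
centrality = betweenness_centrality(network)
for name in toy:
    print(name, centrality[name], round(clustering_coefficient(network, name), 2))
```

In this toy network the intermediate sequence "anc" has the highest betweenness, because every shortest path between the two groups passes through it. Repeating the calculation for several thresholds shows how a bridging role can appear or disappear as edges are added or removed.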


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Maria Littmann ◽  
Michael Heinzinger ◽  
Christian Dallago ◽  
Tobias Olenyi ◽  
Burkhard Rost

Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with <20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
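
The idea of transferring annotations by proximity in embedding space, rather than by sequence similarity, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the published method: the two-dimensional vectors, the protein names, the placeholder term labels, the Euclidean metric and the `1 / (1 + distance)` score are all hypothetical stand-ins. Real embeddings would come from a protein language model such as SeqVec.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def transfer_go_terms(query_vec, lookup, k=1):
    """Copy GO terms from the k nearest annotated proteins in embedding space.

    Each term is scored with 1 / (1 + distance); this score is an
    illustrative assumption, not a value taken from the abstract.
    """
    ranked = sorted(
        lookup.items(),
        key=lambda item: euclidean(query_vec, item[1]["embedding"]),
    )
    scores = {}
    for _name, entry in ranked[:k]:
        score = 1.0 / (1.0 + euclidean(query_vec, entry["embedding"]))
        for term in entry["go_terms"]:
            # keep the best score if several neighbours carry the same term
            scores[term] = max(scores.get(term, 0.0), score)
    return scores


# Hypothetical lookup set of annotated proteins with toy 2-D embeddings.
lookup = {
    "P1": {"embedding": [1.0, 0.0], "go_terms": {"GO:term_A"}},
    "P2": {"embedding": [0.0, 1.0], "go_terms": {"GO:term_B"}},
}

query = [0.9, 0.1]  # toy embedding of an unannotated query protein
print(transfer_go_terms(query, lookup, k=1))
print(transfer_go_terms(query, lookup, k=2))
```

With `k=1` the query inherits only the terms of its nearest neighbour. With a larger `k`, terms from more distant proteins are included at lower scores, and a cut-off on the score would then control the trade-off between precision and recall.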


2020 ◽  
Author(s):  
Maria Littmann ◽  
Michael Heinzinger ◽  
Christian Dallago ◽  
Tobias Olenyi ◽  
Burkhard Rost

Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with <20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 222 ◽  
Author(s):  
Jing Zhang ◽  
Qian Cong ◽  
Xiao-Ling Fan ◽  
Rongjiang Wang ◽  
Min Wang ◽  
...  

Background: Giant-Skipper butterflies from the genus Megathymus are North American endemics. These large and thick-bodied Skippers resemble moths and are unique in their life cycles. Grub-like at the later stages of development, caterpillars of these species feed and live inside yucca roots. Adults do not feed and are mostly local, not straying far from the patches of yucca plants. Methods: Pieces of muscle were dissected from the thorax of specimens and genomic DNA was extracted (also from the abdomen of a specimen collected nearly 60 years ago). Paired-end libraries were prepared and sequenced for 150 bp from both ends. The mitogenomes were assembled from the reads, followed by a manual gap-closing procedure, and a phylogenetic tree was constructed using a maximum likelihood method from an alignment of the mitogenomes. Results: We determined mitogenome sequences of nominal subspecies of all five known species of Megathymus, and of Agathymus mariae to confidently root the phylogenetic tree. Pairwise sequence identity indicates high similarity, ranging from 88% to 96% among coding regions for 13 proteins, 22 tRNAs and 2 rRNAs, with a gene order typical for mitogenomes of Lepidoptera. Phylogenetic analysis confirms that Giant-Skippers (Megathymini) originate within the subfamily Hesperiinae and do not warrant a subfamily rank. Genus Megathymus is monophyletic and splits into two species groups. M. streckeri and M. cofaqui caterpillars feed deep in the main root system of yucca plants and deposit frass underground. M. ursus, M. beulahae and M. yuccae feed in the yucca caudex and roots near the ground, and deposit frass outside through a "tent" (a silk tube projecting from the center of the yucca plant). M. yuccae and M. beulahae are sister species, consistent with the morphological similarities between them. Conclusions: We constructed the first DNA-based phylogeny of the genus Megathymus from their mitogenomes. The phylogeny agrees with morphological considerations.


2014 ◽  
Vol 64 (Pt_9) ◽  
pp. 3314-3319 ◽  
Author(s):  
Fatemeh Mohammadipanah ◽  
Javad Hamedi ◽  
Cathrin Spröer ◽  
María del Carmen Montero-Calasanz ◽  
Peter Schumann ◽  
...  

A novel strain of the genus Promicromonospora, designated HM 792T, was isolated from soil in Fars Province, Iran. On ISP 2 medium, the yellow-pigmented isolate produced long and branched hyphae that developed into a large number of irregularly shaped spores. It showed growth at 25–30 °C and pH 6.0–9.0 with 0–8 % (w/v) NaCl. Chemotaxonomic and molecular characteristics of the isolate matched those described for members of the genus Promicromonospora. Whole-cell hydrolysates of strain HM 792T contained the amino acids D-glutamic acid, L-alanine and L-lysine along with the sugars glucose and ribose. The main polar lipids were diphosphatidylglycerol, two unknown phospholipids, two unknown glycolipids and two unknown phosphoglycolipids, complemented by minor concentrations of phosphatidylinositol and phosphatidylglycerol. MK-9(H4) was the predominant menaquinone. The fatty-acid pattern was composed mainly of the saturated branched-chain acids anteiso-C15:0 and iso-C15:0. 16S rRNA gene sequence analysis showed the highest pairwise sequence identity (96.6–99.0 %) with the members of the genus Promicromonospora. Based on phenotypic and genotypic features, strain HM 792T is considered to represent a novel species of the genus Promicromonospora, for which the name Promicromonospora iranensis sp. nov. is proposed. Strain HM 792T (= DSM 45554T = UTMC00792T = CCUG 63022T) is the type strain.

