pairwise sequence identity Latest Research Papers

Ancestral sequences of a large promiscuous enzyme family correspond to bridges in sequence space in a network representation

Journal of The Royal Society Interface ◽

10.1098/rsif.2021.0389 ◽

2021 ◽

Vol 18 (184) ◽

Author(s):

Patrick C. F. Buchholz ◽

Bert van Loo ◽

Bernard D. G. Eenink ◽

Erich Bornberg-Bauer ◽

Jürgen Pleiss

Keyword(s):

Protein Sequence ◽

Protein Sequences ◽

Large Family ◽

Time Axis ◽

Pairwise Sequence Identity ◽

Consensus Sequences ◽

Sequence Identity ◽

Ancestral Sequences ◽

Hierarchical Grouping ◽

Ancestral Protein

Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is compared to consensus sequences, too. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
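The network measures named in this abstract can be illustrated with a small sketch. It assumes a hypothetical table of pairwise identities (`IDENTITY`, with made-up node names and values) and uses a plain-Python Brandes-style betweenness computation; it is not the authors' pipeline, only an illustration of how a node that links two clusters scores high on betweenness and low on clustering coefficient once an identity threshold is chosen.

```python
from collections import deque
from itertools import combinations

# Hypothetical pairwise identities (fractions); names and values are invented.
IDENTITY = {
    ("A1", "A2"): 0.82, ("A1", "A3"): 0.78, ("A2", "A3"): 0.80,
    ("B1", "B2"): 0.85, ("B1", "B3"): 0.79, ("B2", "B3"): 0.81,
    ("ANC", "A1"): 0.62, ("ANC", "B1"): 0.61,
}


def build_network(identity, threshold):
    """Connect two sequences when their identity reaches the threshold."""
    graph = {}
    for (u, v), value in identity.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if value >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def betweenness_centrality(graph):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {n: 0.0 for n in graph}
    for source in graph:
        stack = []
        preds = {n: [] for n in graph}
        sigma = {n: 0 for n in graph}   # number of shortest paths from source
        dist = {n: -1 for n in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in graph}
        while stack:                     # accumulate path dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was visited from both ends.
    return {n: s / 2 for n, s in score.items()}


if __name__ == "__main__":
    for threshold in (0.6, 0.7):
        g = build_network(IDENTITY, threshold)
        bc = betweenness_centrality(g)
        print(threshold, bc["ANC"], clustering_coefficient(g, "ANC"))
```

In this toy data the node `ANC` bridges the two clusters at a threshold of 0.6 and becomes isolated at 0.7, which is why such measures are evaluated over a range of thresholds.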

Embeddings from deep learning transfer GO annotations beyond homology

Scientific Reports ◽

10.1038/s41598-020-80786-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Maria Littmann ◽

Michael Heinzinger ◽

Christian Dallago ◽

Tobias Olenyi ◽

Burkhard Rost

Keyword(s):

Protein Function ◽

Protein Sequences ◽

Language Models ◽

Evolutionary Information ◽

Pairwise Sequence Identity ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Sequence Identity ◽

Experimental Function ◽

GO Terms

Abstract: Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
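The idea of transferring annotations by proximity in embedding space can be sketched as a nearest-neighbour lookup. The example below is a minimal illustration under stated assumptions: the three-dimensional vectors in `LOOKUP`, the protein names, the use of Euclidean distance and the distance-to-score conversion are all placeholders chosen for this sketch, since the abstract does not specify them.

```python
import math

# Hypothetical lookup set: protein name -> (toy embedding, annotated GO terms).
# Real protein embeddings have far more dimensions; these values are invented.
LOOKUP = {
    "P1": ([0.9, 0.1, 0.0], {"GO:0003824"}),
    "P2": ([0.1, 0.8, 0.1], {"GO:0005515"}),
    "P3": ([0.0, 0.2, 0.9], {"GO:0005634", "GO:0005515"}),
}


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotations(query, lookup, k=1):
    """Copy GO terms from the k nearest annotated proteins to the query.

    Each term gets a score in (0, 1] that decreases with distance; this
    scoring rule is an assumption made for the sketch.
    """
    ranked = sorted(lookup.items(), key=lambda item: euclidean(query, item[1][0]))
    scores = {}
    for _name, (vector, terms) in ranked[:k]:
        similarity = 1.0 / (1.0 + euclidean(query, vector))
        for term in terms:
            scores[term] = max(scores.get(term, 0.0), similarity)
    return scores


if __name__ == "__main__":
    query_embedding = [0.85, 0.15, 0.0]
    print(transfer_annotations(query_embedding, LOOKUP, k=1))
```

Raising `k` trades precision for coverage: more neighbours contribute terms, each with a lower score the further away it lies.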

Embeddings from deep learning transfer GO annotations beyond homology

10.1101/2020.09.04.282814 ◽

2020 ◽

Author(s):

Maria Littmann ◽

Michael Heinzinger ◽

Christian Dallago ◽

Tobias Olenyi ◽

Burkhard Rost

Keyword(s):

Protein Function ◽

Protein Sequences ◽

Language Models ◽

Evolutionary Information ◽

Pairwise Sequence Identity ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Sequence Identity ◽

Experimental Function ◽

GO Terms

Abstract: Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37±2%, 50±3%, and 57±2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with <20% pairwise sequence identity to the query, performance drops (Fmax BPO 33±2%, MFO 43±3%, CCO 53±2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
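The Fmax values quoted here are maxima of a protein-centric F-measure over score thresholds. The following sketch shows that calculation on invented data: precision is averaged over proteins with at least one prediction at the threshold, recall over all proteins. It assumes every protein in `truth` has at least one true term and omits details of the official CAFA evaluation, such as propagating terms through the ontology.

```python
def fmax(predictions, truth, thresholds=None):
    """Maximum protein-centric F-measure over a grid of score thresholds.

    predictions: {protein: {term: score in [0, 1]}}
    truth:       {protein: non-empty set of true terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:  # precision only counts proteins with predictions
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    # Hypothetical predictions and ground truth with placeholder term names.
    predictions = {"Q1": {"a": 0.9, "b": 0.4}, "Q2": {"c": 0.8}}
    truth = {"Q1": {"a"}, "Q2": {"c", "d"}}
    print(round(fmax(predictions, truth), 3))
```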

Mitogenomes of Giant-Skipper Butterflies reveal an ancient split between deep and shallow root feeders

F1000Research ◽

10.12688/f1000research.10970.1 ◽

2017 ◽

Vol 6 ◽

pp. 222 ◽

Cited By ~ 8

Author(s):

Jing Zhang ◽

Qian Cong ◽

Xiao-Ling Fan ◽

Rongjiang Wang ◽

Min Wang ◽

...

Keyword(s):

Phylogenetic Tree ◽

Maximum Likelihood Method ◽

Life Cycles ◽

Likelihood Method ◽

High Similarity ◽

Pairwise Sequence Identity ◽

Main Root ◽

Coding Regions ◽

Species Groups ◽

Gap Closing

Background: Giant-Skipper butterflies from the genus Megathymus are North American endemics. These large and thick-bodied Skippers resemble moths and are unique in their life cycles. Grub-like at the later stages of development, caterpillars of these species feed and live inside yucca roots. Adults do not feed and are mostly local, not straying far from the patches of yucca plants. Methods: Pieces of muscle were dissected from the thorax of specimens and genomic DNA was extracted (also from the abdomen of a specimen collected nearly 60 years ago). Paired-end libraries were prepared and sequenced for 150 bp from both ends. The mitogenomes were assembled from the reads, followed by a manual gap-closing procedure, and a phylogenetic tree was constructed using a maximum likelihood method from an alignment of the mitogenomes. Results: We determined mitogenome sequences of nominal subspecies of all five known species of Megathymus and Agathymus mariae to confidently root the phylogenetic tree. Pairwise sequence identity indicates high similarity, ranging from 88–96% among coding regions for 13 proteins, 22 tRNAs and 2 rRNAs, with a gene order typical for mitogenomes of Lepidoptera. Phylogenetic analysis confirms that Giant-Skippers (Megathymini) originate within the subfamily Hesperiinae and do not warrant a subfamily rank. Genus Megathymus is monophyletic and splits into two species groups. M. streckeri and M. cofaqui caterpillars feed deep in the main root system of yucca plants and deposit frass underground. M. ursus, M. beulahae and M. yuccae feed in the yucca caudex and roots near the ground, and deposit frass outside through a "tent" (a silk tube projecting from the center of the yucca plant). M. yuccae and M. beulahae are sister species, consistent with morphological similarities between them. Conclusions: We constructed the first DNA-based phylogeny of the genus Megathymus from their mitogenomes. The phylogeny agrees with morphological considerations.
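Pairwise sequence identity of the kind reported in these results can be computed directly from an alignment. The sketch below uses one common convention, counting matches over columns where neither sequence has a gap; other conventions (for example dividing by the full alignment length) give different numbers, and the abstract does not say which one was used. The taxon names and sequences are invented.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Percent identity for every unordered pair of names in the alignment."""
    return {
        (x, y): percent_identity(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment; real mitogenome alignments are ~15 kb long.
    alignment = {
        "taxon_1": "ATGCCTGA-T",
        "taxon_2": "ATGCTTGAAT",
        "taxon_3": "ATGACTG-AT",
    }
    for pair, value in identity_matrix(alignment).items():
        print(pair, round(value, 1))
```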

Promicromonospora iranensis sp. nov., an actinobacterium isolated from rhizospheric soil

International Journal of Systematic and Evolutionary Microbiology ◽

10.1099/ijs.0.063982-0 ◽

2014 ◽

Vol 64 (Pt_9) ◽

pp. 3314-3319 ◽

Cited By ~ 12

Author(s):

Fatemeh Mohammadipanah ◽

Javad Hamedi ◽

Cathrin Spröer ◽

María del Carmen Montero-Calasanz ◽

Peter Schumann ◽

...

Keyword(s):

Gene Sequence ◽

Fatty Acid Pattern ◽

Molecular Characteristics ◽

rRNA Gene ◽

Fars Province ◽

Pairwise Sequence Identity ◽

Content Type ◽

Link Type ◽

Gene Sequence Analysis ◽

Branched Chain

A novel strain of the genus Promicromonospora, designated HM 792T, was isolated from soil in Fars Province, Iran. On ISP 2 medium, the yellow-pigmented isolate produced long and branched hyphae that developed into a large number of irregularly shaped spores. It showed growth at 25–30 °C and pH 6.0–9.0 with 0–8 % (w/v) NaCl. Chemotaxonomic and molecular characteristics of the isolate matched those described for members of the genus Promicromonospora. Whole-cell hydrolysates of strain HM 792T contained the amino acids d-glutamic acid, l-alanine and l-lysine along with the sugars glucose and ribose. The main polar lipids were diphosphatidylglycerol, two unknown phospholipids, two unknown glycolipids and two unknown phosphoglycolipids, complemented by minor concentrations of phosphatidylinositol and phosphatidylglycerol. MK-9(H4) was the predominant menaquinone. The fatty-acid pattern was composed mainly of the saturated branched-chain acids anteiso-C15 : 0 and iso-C15 : 0. 16S rRNA gene sequence analysis showed the highest pairwise sequence identity (96.6–99.0 %) with the members of the genus Promicromonospora. Based on phenotypic and genotypic features, strain HM 792T is considered to represent a novel species of the genus Promicromonospora, for which the name Promicromonospora iranensis sp. nov. is proposed. Strain HM 792T (= DSM 45554T = UTMC00792T = CCUG 63022T) is the type strain.

Faculty Opinions recommendation of How well is enzyme function conserved as a function of pairwise sequence identity?

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1016794.201479 ◽

2004 ◽

Author(s):

Mark Nelson

Keyword(s):

Enzyme Function ◽

Pairwise Sequence Identity ◽

Sequence Identity

How Well is Enzyme Function Conserved as a Function of Pairwise Sequence Identity?

Journal of Molecular Biology ◽

10.1016/j.jmb.2003.08.057 ◽

2003 ◽

Vol 333 (4) ◽

pp. 863-882 ◽

Cited By ~ 241

Author(s):

Weidong Tian ◽

Jeffrey Skolnick

Keyword(s):

Enzyme Function ◽

Pairwise Sequence Identity ◽

Sequence Identity
