HCAtk and pyHCA: A Toolkit and Python API for the Hydrophobic Cluster Analysis of Protein Sequences

2018 ◽  
Author(s):  
Tristan Bitard-Feildel ◽  
Isabelle Callebaut

Abstract Motivation: Detecting protein domains sharing no similarity to known domains, as stored in domain databases, is a challenging problem, particularly for unannotated proteomes, domains emerged recently, fast diverging proteins or domains with intrinsically disordered regions. Results: We developed pyHCA and HCAtk, a python API and standalone tool gathering together improved versions of previously developed methodologies, with new functionalities. The developed tools can be either used from command line or from a python API. Availability: HCAtk and pyHCA are available at https://github.com/T-B-F/pyHCA under the CeCILL-C license.

Biochimie ◽  
1990 ◽  
Vol 72 (8) ◽  
pp. 555-574 ◽  
Author(s):  
L. Lemesle-Varloot ◽  
B. Henrissat ◽  
C. Gaboriaud ◽  
V. Bissery ◽  
A. Morgat ◽  
...  

1988 ◽  
Vol 255 (3) ◽  
pp. 901-905 ◽  
Author(s):  
B Henrissat ◽  
Y Popineau ◽  
J C Kader

Hydrophobic-cluster analysis was used to characterize a conserved domain located near the C-terminal amino acid sequence of wheat (Triticum aestivum) storage proteins. This domain was transformed into a linear template for a global search for similarities in over 5200 protein sequences. In addition to proteins that had already been found to exhibit homology to wheat storage proteins, a previously unreported homology was found with non-specific lipid-transfer proteins from castor bean (Ricinus communis) and from spinach (Spinacia oleracea) leaf. Hydrophobic-cluster analysis of various members of the present protein group clearly shows a typical domain structure where (i) variable and conserved domains are located along the sequence at precise positions, (ii) the conserved domains probably reflect a common ancestor, and (iii) the unique properties of a given protein (chain cut into subunits, repetitive domains, trypsin-inhibitor active site) are associated with the variable domains.


2019 ◽  
Vol 47 (10) ◽  
pp. 4970-4973 ◽  
Author(s):  
Ben Smithers ◽  
Matt Oates ◽  
Julian Gough

Abstract The alignment between the boundaries of protein domains and the boundaries of exons could provide evidence for the evolution of proteins via domain shuffling, but literature in the field has so far struggled to conclusively show this. Here, on larger data sets than previously possible, we do finally show that this phenomenon is indisputably found widely across the eukaryotic tree. In contrast, the alignment between exons and the boundaries of intrinsically disordered regions of proteins is not a general property of eukaryotes. Most interesting of all is the discovery that domain–exon alignment is much more common in recently evolved protein sequences than older ones.


PROTEOMICS ◽  
2018 ◽  
Vol 18 (21-22) ◽  
pp. 1800054 ◽  
Author(s):  
Tristan Bitard-Feildel ◽  
Alexis Lamiable ◽  
Jean-Paul Mornon ◽  
Isabelle Callebaut

1995 ◽  
pp. 129-144
Author(s):  
Michel T. Semertzidis ◽  
Etienne Thoreau ◽  
Anne Tasso ◽  
Bernard Henrissat ◽  
Isabelle Callebaut ◽  
...  

1993 ◽  
Vol 9 (1) ◽  
pp. 37-44 ◽  
Author(s):  
L. Lemesle-Varloot ◽  
C. Gaboriaud ◽  
A. Morgat ◽  
G. Pantel ◽  
J. P. Mornon ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Maria Littmann ◽  
Michael Heinzinger ◽  
Christian Dallago ◽  
Tobias Olenyi ◽  
Burkhard Rost

Abstract Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
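
The idea of annotation transfer by proximity in embedding space can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors stand in for real SeqVec embeddings, the protein names and GO term assignments are toy data, and both the function `transfer_annotations` and the distance-to-confidence conversion `1 / (1 + d)` are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotations(query, lookup, k=1):
    """Copy GO terms from the k annotated proteins closest to the query.

    Each term is scored with 1 / (1 + distance), a hypothetical
    confidence measure; the best score per term is kept.
    """
    ranked = sorted(lookup, key=lambda entry: euclidean(query, entry["embedding"]))
    scores = {}
    for entry in ranked[:k]:
        confidence = 1.0 / (1.0 + euclidean(query, entry["embedding"]))
        for term in entry["go_terms"]:
            scores[term] = max(scores.get(term, 0.0), confidence)
    return scores


# Toy lookup set: embeddings and annotations are made up for the example.
lookup = [
    {"id": "protA", "embedding": [0.9, 0.1, 0.0], "go_terms": {"GO:0003677"}},
    {"id": "protB", "embedding": [0.0, 1.0, 0.2], "go_terms": {"GO:0005215"}},
    {"id": "protC", "embedding": [0.8, 0.2, 0.1], "go_terms": {"GO:0003677", "GO:0005634"}},
]

query = [0.85, 0.15, 0.05]
predicted = transfer_annotations(query, lookup, k=2)
for term in sorted(predicted):
    print(term, round(predicted[term], 3))
```

With `k=2` the query inherits terms only from its two nearest neighbours (`protA` and `protC` in this toy set), so the distant `protB` contributes nothing. A real pipeline would replace the toy vectors with per-protein embeddings from a language model and evaluate the ranked terms per ontology.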

