HCAtk and pyHCA: A Toolkit and Python API for the Hydrophobic Cluster Analysis of Protein Sequences

bioRxiv

10.1101/249995

2018

Cited By ~ 2

Author(s):

Tristan Bitard-Feildel

Isabelle Callebaut

Keyword(s):

Cluster Analysis

Protein Domains

Protein Sequences

Command Line

Challenging Problem

Hydrophobic Cluster Analysis

Intrinsically Disordered

Intrinsically Disordered Regions

Hydrophobic Cluster

Abstract

Motivation: Detecting protein domains that share no similarity to known domains stored in domain databases is a challenging problem, particularly for unannotated proteomes, recently emerged domains, fast-diverging proteins, or domains with intrinsically disordered regions.

Results: We developed pyHCA and HCAtk, a Python API and a standalone tool that bring together improved versions of previously developed methodologies along with new functionalities. The tools can be used either from the command line or through the Python API.

Availability: HCAtk and pyHCA are available at https://github.com/T-B-F/pyHCA under the CeCILL-C license.

Hydrophobic cluster analysis: procedures to derive structural and functional information from 2-D-representation of protein sequences

Biochimie

10.1016/0300-9084(90)90120-6

1990

Vol 72 (8)

pp. 555-574

Cited By ~ 186

Author(s):

L. Lemesle-Varloot

B. Henrissat

C. Gaboriaud

V. Bissery

A. Morgat

...

Keyword(s):

Cluster Analysis

Protein Sequences

Functional Information

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Hydrophobic-cluster analysis of plant protein sequences. A domain homology between storage and lipid-transfer proteins

Biochemical Journal

10.1042/bj2550901

1988

Vol 255 (3)

pp. 901-905

Cited By ~ 25

Author(s):

B Henrissat

Y Popineau

J C Kader

Keyword(s):

Cluster Analysis

Storage Proteins

Spinacia Oleracea

Protein Sequences

Protein Chain

Lipid Transfer

Lipid Transfer Proteins

Conserved Domains

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Hydrophobic-cluster analysis was used to characterize a conserved domain located near the C-terminal amino acid sequence of wheat (Triticum aestivum) storage proteins. This domain was transformed into a linear template for a global search for similarities in over 5200 protein sequences. In addition to proteins that had already been found to exhibit homology to wheat storage proteins, a previously unreported homology was found with non-specific lipid-transfer proteins from castor bean (Ricinus communis) and from spinach (Spinacia oleracea) leaf. Hydrophobic-cluster analysis of various members of the present protein group clearly shows a typical domain structure where (i) variable and conserved domains are located along the sequence at precise positions, (ii) the conserved domains probably reflect a common ancestor, and (iii) the unique properties of a given protein (chain cut into subunits, repetitive domains, trypsin-inhibitor active site) are associated with the variable domains.

‘Why genes in pieces?’—revisited

Nucleic Acids Research

10.1093/nar/gkz284

2019

Vol 47 (10)

pp. 4970-4973

Cited By ~ 3

Author(s):

Ben Smithers

Matt Oates

Julian Gough

Keyword(s):

General Property

Protein Domains

Protein Sequences

Data Sets

Domain Shuffling

Intrinsically Disordered

Intrinsically Disordered Regions

Evolution Of Proteins

Domain Exon

Disordered Regions

Abstract

The alignment between the boundaries of protein domains and the boundaries of exons could provide evidence for the evolution of proteins via domain shuffling, but literature in the field has so far struggled to conclusively show this. Here, on larger data sets than previously possible, we do finally show that this phenomenon is indisputably found widely across the eukaryotic tree. In contrast, the alignment between exons and the boundaries of intrinsically disordered regions of proteins is not a general property of eukaryotes. Most interesting of all is the discovery that domain–exon alignment is much more common in recently evolved protein sequences than older ones.

Hydrophobic Cluster Analysis of G Protein-Coupled Receptors: A Powerful tool to Derive Structural and Functional Information from 2D-Representation of Protein Sequences

Journal of Receptor Research

10.3109/10799899309073654

1993

Vol 13 (1-4)

pp. 179-194

Cited By ~ 4

Author(s):

Klaus-Ulrich Lentes

Eric Mathieu

Rainer Bischoff

Keyword(s):

Cluster Analysis

G Protein

Protein Sequences

G Protein Coupled Receptors

Functional Information

Hydrophobic Cluster Analysis

Hydrophobic Cluster

G Protein Coupled

Order in Disorder as Observed by the “Hydrophobic Cluster Analysis” of Protein Sequences

PROTEOMICS

10.1002/pmic.201800054

2018

Vol 18 (21-22)

pp. 1800054

Cited By ~ 8

Author(s):

Tristan Bitard-Feildel

Alexis Lamiable

Jean-Paul Mornon

Isabelle Callebaut

Keyword(s):

Cluster Analysis

Protein Sequences

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Visualization of Protein Sequences Using the Two-Dimensional Hydrophobic Cluster Analysis Method

Visualizing Biological Information

10.1142/9789812832054_0012

1995

pp. 129-144

Author(s):

Michel T. Semertzidis

Etienne Thoreau

Anne Tasso

Bernard Henrissat

Isabelle Callebaut

...

Keyword(s):

Cluster Analysis

Protein Sequences

Two Dimensional

Analysis Method

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Cluster Analysis Method

MANSEK and SUNHCA. Two interactive programs for the hydrophobic cluster analysis of protein sequences

Bioinformatics

10.1093/bioinformatics/9.1.37

1993

Vol 9 (1)

pp. 37-44

Cited By ~ 1

Author(s):

L. Lemesle-Varloot

C. Gaboriaud

A. Morgat

G. Pantel

J. P. Mornon

...

Keyword(s):

Cluster Analysis

Protein Sequences

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Embeddings from deep learning transfer GO annotations beyond homology

Scientific Reports

10.1038/s41598-020-80786-0

2021

Vol 11 (1)

Author(s):

Maria Littmann

Michael Heinzinger

Christian Dallago

Tobias Olenyi

Burkhard Rost

Keyword(s):

Protein Function

Protein Sequences

Language Models

Evolutionary Information

Pairwise Sequence Identity

Intrinsically Disordered

Intrinsically Disordered Regions

Sequence Identity

Experimental Function

Go Terms

Abstract

Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
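The abstract above proposes transferring GO terms from the annotated proteins closest to a query in SeqVec embedding space rather than in sequence space. The sketch below illustrates that general idea under stated assumptions: Euclidean distance, transfer from the k nearest annotated proteins (k = 1 by default), and toy 3-dimensional vectors with illustrative GO identifiers standing in for real SeqVec embeddings and annotations. The names `euclidean`, `transfer_go_terms`, and the lookup layout are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Dict, Sequence, Set, Tuple

# Hypothetical lookup table: protein id -> (embedding vector, annotated GO terms).
Lookup = Dict[str, Tuple[Sequence[float], Set[str]]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_go_terms(query: Sequence[float], lookup: Lookup, k: int = 1) -> Set[str]:
    """Return the union of GO terms of the k lookup proteins nearest to the query.

    Assumption: "nearest" means plain Euclidean distance in embedding space;
    no sequence alignment is used at any point.
    """
    # Rank annotated proteins by distance to the query embedding.
    ranked = sorted(lookup.values(), key=lambda entry: euclidean(query, entry[0]))
    terms: Set[str] = set()
    for _embedding, go_terms in ranked[:k]:
        terms |= go_terms
    return terms


# Toy data: 3-d vectors instead of real SeqVec embeddings (illustrative only).
example_lookup: Lookup = {
    "protA": ([0.0, 0.0, 1.0], {"GO:0005524"}),
    "protB": ([1.0, 1.0, 0.0], {"GO:0016787"}),
}
example_query = [0.1, 0.0, 0.9]

if __name__ == "__main__":
    # Nearest neighbour only, then the union over the two nearest neighbours.
    print(transfer_go_terms(example_query, example_lookup))
    print(transfer_go_terms(example_query, example_lookup, k=2))
```

With k = 1 the query inherits only the annotations of its single nearest neighbour; a larger k pools annotations from more neighbours. The abstract's restriction to lookup proteins with < 20% pairwise sequence identity to the query could be mimicked by filtering `lookup` before ranking.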

Structural symmetry of the extracellular domain of the Cytokine/Growth hormone/Prolactin receptor family and Interferon receptors revealed by Hydrophobic Cluster Analysis

FEBS Letters

10.1016/0014-5793(91)80437-8

1991

Vol 282 (1)

pp. 26-31

Cited By ~ 103

Author(s):

E. Thoreau

B. Petridou

P.A. Kelly

J. Djiane

J.P. Mornon

Keyword(s):

Growth Hormone

Cluster Analysis

Prolactin Receptor

Extracellular Domain

Structural Symmetry

Hydrophobic Cluster Analysis

Hydrophobic Cluster

Receptor Family

Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives

Cellular and Molecular Life Sciences

10.1007/s000180050082

1997

Vol 53 (8)

pp. 621-645

Cited By ~ 338

Author(s):

I. Callebaut

G. Labesse

P. Durand

A. Poupon

L. Canard

...

Keyword(s):

Cluster Analysis

Protein Sequence

Current Status

Sequence Information

Hydrophobic Cluster Analysis

Hydrophobic Cluster