Prediction and validation of protein–protein interactors from genome-wide DNA-binding data using a knowledge-based machine-learning approach

Ashley J. Waardenberg; Bernou Homan; Stephanie Mohamed; Richard P. Harvey; Romaric Bouveret

doi:10.1098/rsob.160183

Open Biology ◽

10.1098/rsob.160183 ◽

2016 ◽

Vol 6 (9) ◽

pp. 160183 ◽

Cited By ~ 9

Author(s):

Ashley J. Waardenberg ◽

Bernou Homan ◽

Stephanie Mohamed ◽

Richard P. Harvey ◽

Romaric Bouveret

Keyword(s):

Machine Learning ◽

DNA Binding ◽

Protein Interactions ◽

Heart Development ◽

Machine Learning Algorithms ◽

Protein Protein Interactions ◽

Knowledge Based ◽

Genome Wide ◽

Binding Data ◽

Gene Regulatory

The ability to accurately predict the DNA targets and interacting cofactors of transcriptional regulators from genome-wide data can significantly advance our understanding of gene regulatory networks. NKX2-5 is a homeodomain transcription factor that sits high in the cardiac gene regulatory network and is essential for normal heart development. We previously identified genomic targets for NKX2-5 in mouse HL-1 atrial cardiomyocytes using DNA-adenine methyltransferase identification (DamID). Here, we apply machine learning algorithms and propose a knowledge-based feature selection method for predicting NKX2-5 protein–protein interactions based on motif grammar in genome-wide DNA-binding data. We assessed model performance using leave-one-out cross-validation and a completely independent DamID experiment performed with replicates. In addition to identifying previously described NKX2-5-interacting proteins, including GATA, HAND and TBX family members, we identified a number of novel interactors and validated direct protein–protein interactions between NKX2-5 and retinoid X receptor (RXR), paired-related homeobox (PRRX) and Ikaros zinc fingers (IKZF) using the yeast two-hybrid assay. We also found that the interaction of RXRα with NKX2-5 variants carrying mutations found in congenital heart disease (Q187H, R189G and R190H) was altered. These findings highlight an intuitive approach to accessing protein–protein interaction information of transcription factors in DNA-binding experiments.
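
The abstract above mentions leave-one-out cross-validation as one way the models were assessed. As a rough illustration of that general technique only (not the authors' pipeline), the sketch below holds out each sample in turn, fits a simple nearest-centroid classifier on the rest, and scores the held-out prediction. The classifier, the function names and the toy two-feature data are all assumptions made for this example; the paper's actual features and learners are not given in the abstract.

```python
from typing import Dict, List, Sequence, Tuple

# One sample = (feature vector, class label). Hypothetical toy representation.
Sample = Tuple[Sequence[float], int]


def fit_centroids(train: List[Sample]) -> Dict[int, List[float]]:
    """Compute the mean feature vector (centroid) of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids: Dict[int, List[float]], features: Sequence[float]) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


def leave_one_out_accuracy(data: List[Sample]) -> float:
    """Hold out each sample once, train on the remainder, and score the prediction."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every sample except the i-th
        model = fit_centroids(train)
        if predict(model, features) == label:
            correct += 1
    return correct / len(data)


# Toy data (assumed): two well-separated classes with two features each.
toy_data: List[Sample] = [
    ([0.9, 0.8], 1), ([1.0, 0.7], 1), ([0.8, 0.9], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.0, 0.3], 0),
]
accuracy = leave_one_out_accuracy(toy_data)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Because every sample serves as the test case exactly once, the estimate uses all available data, which is why the approach is common when labelled examples are scarce. An independent replicate experiment, as described in the abstract, remains a stronger check than any resampling of the same data.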

Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches

Current Drug Metabolism ◽

10.2174/1389200219666180829121038 ◽

2019 ◽

Vol 20 (3) ◽

pp. 177-184 ◽

Cited By ~ 16

Author(s):

Nantao Zheng ◽

Kairou Wang ◽

Weihua Zhan ◽

Lei Deng

Keyword(s):

Machine Learning ◽

Computational Methods ◽

Protein Interactions ◽

Prediction Models ◽

Learning Algorithms ◽

Biological Data ◽

Machine Learning Algorithms ◽

Host Protein ◽

Protein Protein Interactions ◽

Protein Motifs

Background: Targeting critical virus-host protein-protein interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized by the features they use and by their machine learning algorithms, covering both classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms used to build binary prediction models for classifying virus-host protein pairs and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although the prediction of virus-host PPIs has progressed significantly in recent years, there is still considerable room for improvement.

Evaluation of Machine Learning Algorithms on Protein-Protein Interactions

Advances in Intelligent Systems and Computing - Man-Machine Interactions 3 ◽

10.1007/978-3-319-02309-0_22 ◽

2014 ◽

pp. 211-218 ◽

Cited By ~ 1

Author(s):

Indrajit Saha ◽

Tomas Klingström ◽

Simon Forsberg ◽

Johan Wikander ◽

Julian Zubek ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Protein Protein Interactions

Erratum: Evaluation of Machine Learning Algorithms on Protein-Protein Interactions

Advances in Intelligent Systems and Computing - Man-Machine Interactions 3 ◽

10.1007/978-3-319-02309-0_71 ◽

2014 ◽

pp. E1-E1 ◽

Cited By ~ 1

Author(s):

Indrajit Saha ◽

Tomas Klingström ◽

Simon Forsberg ◽

Johan Wikander ◽

Julian Zubek ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Protein Protein Interactions

Physicochemical descriptors to discriminate protein-protein interactions in permanent and transient complexes selected by means of machine learning algorithms

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.21104 ◽

2006 ◽

Vol 65 (3) ◽

pp. 607-622 ◽

Cited By ~ 35

Author(s):

Peter Block ◽

Juri Paern ◽

Eyke Hüllermeier ◽

Paul Sanschagrin ◽

Christoph A. Sotriffer ◽

...

Keyword(s):

Machine Learning ◽

Protein Interactions ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Protein Protein Interactions ◽

Physicochemical Descriptors

Estimating gene regulatory networks and protein-protein interactions of Saccharomyces cerevisiae from multiple genome-wide data

Bioinformatics ◽

10.1093/bioinformatics/bti1133 ◽

2005 ◽

Vol 21 (Suppl 2) ◽

pp. ii206-ii212 ◽

Cited By ~ 20

Author(s):

N. Nariai ◽

Y. Tamada ◽

S. Imoto ◽

S. Miyano

Keyword(s):

Saccharomyces Cerevisiae ◽

Gene Regulatory Networks ◽

Protein Interactions ◽

Regulatory Networks ◽

Protein Protein Interactions ◽

Multiple Genome ◽

Genome Wide ◽

Genome Wide Data ◽

Gene Regulatory

Roles of the 14-3-3 gene family in cotton flowering

BMC Plant Biology ◽

10.1186/s12870-021-02923-9 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Na Sang ◽

Hui Liu ◽

Bin Ma ◽

Xianzhong Huang ◽

Lu Zhuo ◽

...

Keyword(s):

Gene Family ◽

Protein Interactions ◽

Leucine Zipper ◽

Expression Patterns ◽

Bimolecular Fluorescence Complementation ◽

Structural Motif ◽

Virus Induced Gene Silencing ◽

Protein Protein Interactions ◽

Basic Leucine Zipper ◽

Genome Wide

Background: In plants, 14-3-3 proteins, also called GENERAL REGULATORY FACTORs (GRFs), are encoded by a large multigene family, are involved in protein–protein interactions and play crucial roles in various physiological processes. No genome-wide analysis of the GRF gene family has been performed in cotton, and the functions of these genes in flowering are largely unknown. Results: In this study, 17, 17, 31, and 17 GRF genes were identified in Gossypium herbaceum, G. arboreum, G. hirsutum, and G. raimondii, respectively, by genome-wide analyses and were designated as GheGRFs, GaGRFs, GhGRFs, and GrGRFs, respectively. A phylogenetic analysis revealed that these proteins were divided into ε and non-ε groups. Gene structure, motif composition, synteny, and duplicated gene analyses of the identified GRF genes provided insights into the evolution of this family in cotton. GhGRF genes exhibited diverse expression patterns in different tissues. Yeast two-hybrid and bimolecular fluorescence complementation assays showed that the GhGRFs interacted with the cotton FLOWERING LOCUS T homologue GhFT in the cytoplasm and nucleus, while they interacted with the basic leucine zipper transcription factor GhFD only in the nucleus. Virus-induced gene silencing in G. hirsutum and transgenic studies in Arabidopsis demonstrated that GhGRF3/6/9/15 repressed flowering and that GhGRF14 promoted flowering. Conclusions: Here, 82 GRF genes were identified in cotton, and their gene and protein features, classification, evolution, and expression patterns were comprehensively and systematically investigated. GhGRF3/6/9/15 interacted with GhFT and GhFD to form florigen activation complexes that inhibited flowering, whereas GhGRF14 interacted with GhFT and GhFD to form a florigen activation complex that promoted flowering. The results provide a foundation for further studies on the regulatory mechanisms of flowering.

Mechanism of NanR gene repression and allosteric induction of bacterial sialic acid metabolism

Nature Communications ◽

10.1038/s41467-021-22253-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Christopher R. Horne ◽

Hariprasad Venugopal ◽

Santosh Panjikar ◽

David M. Wood ◽

Amy Henrickson ◽

...

Keyword(s):

Sialic Acid ◽

DNA Binding ◽

Protein Interactions ◽

Acid Metabolism ◽

Environmental Changes ◽

DNA Interaction ◽

Protein Protein Interactions ◽

Nutrient Source ◽

Cryo Electron Microscopy ◽

Close Proximity

Bacteria respond to environmental changes by inducing transcription of some genes and repressing others. Sialic acids, which coat human cell surfaces, are a nutrient source for pathogenic and commensal bacteria. The Escherichia coli GntR-type transcriptional repressor, NanR, regulates sialic acid metabolism, but the mechanism is unclear. Here, we demonstrate that three NanR dimers bind a (GGTATA)3-repeat operator cooperatively and with high affinity. Single-particle cryo-electron microscopy structures reveal the DNA-binding domain is reorganized to engage DNA, while three dimers assemble in close proximity across the (GGTATA)3-repeat operator. Such an interaction allows cooperative protein-protein interactions between NanR dimers via their N-terminal extensions. The effector, N-acetylneuraminate, binds NanR and attenuates the NanR-DNA interaction. The crystal structure of NanR in complex with N-acetylneuraminate reveals a domain rearrangement upon N-acetylneuraminate binding to lock NanR in a conformation that weakens DNA binding. Our data provide a molecular basis for the regulation of bacterial sialic acid metabolism.

Shared Blood Transcriptomic Signatures between Alzheimer’s Disease and Diabetes Mellitus

Biomedicines ◽

10.3390/biomedicines9010034 ◽

2021 ◽

Vol 9 (1) ◽

pp. 34

Author(s):

Taesic Lee ◽

Hyunju Lee

Keyword(s):

Diabetes Mellitus ◽

Alzheimer's Disease ◽

Protein Interactions ◽

Expression Patterns ◽

Protein Protein Interactions ◽

Hub Genes ◽

Gene Modules ◽

Gene Regulatory ◽

The Brain

Alzheimer’s disease (AD) and diabetes mellitus (DM) are known to have a shared molecular mechanism. We aimed to identify shared blood transcriptomic signatures between AD and DM. Blood expression datasets for each disease were combined, and a co-expression network was used to construct modules consisting of genes with similar expression patterns. For each module, a gene regulatory network based on gene expression and protein-protein interactions was established to identify hub genes. We selected one module in which COPS4, PSMA6, GTF2B, GTF2F2, and SSB were identified as dysregulated transcription factors common to AD and DM. These five genes were also differentially co-expressed in disease-related tissues, such as the brain in AD and the pancreas in DM. Our study identified gene modules that were dysregulated in both AD and DM blood samples, which may help to reveal the common pathophysiology of the two diseases.

Machine learning identifies two autophagy-related genes as markers of recurrence in colorectal cancer

Journal of International Medical Research ◽

10.1177/0300060520958808 ◽

2020 ◽

Vol 48 (10) ◽

pp. 030006052095880

Author(s):

Jianping Wu ◽

Sulai Liu ◽

Xiaoming Chen ◽

Hongfei Xu ◽

Yaoping Tang

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Protein Interactions ◽

Early Stage ◽

Predictive Ability ◽

Improve Patient Care ◽

Active Role ◽

Risk Groups ◽

Recurrence Risk ◽

Machine Learning Algorithms

Objective: Colorectal cancer (CRC) is the most common cancer worldwide. Patient outcomes following recurrence of CRC are very poor. Therefore, identifying the risk of CRC recurrence at an early stage would improve patient care. Accumulating evidence shows that autophagy plays an active role in tumorigenesis, recurrence, and metastasis. Methods: We used machine learning algorithms and two regression models, univariable Cox proportional hazards regression and least absolute shrinkage and selection operator (LASSO), to identify 26 autophagy-related genes (ARGs) related to CRC recurrence. Results: By functional annotation, these ARGs were shown to be enriched in necroptosis and apoptosis pathways. Protein–protein interaction analysis identified SQSTM1, CASP8, HSP80AB1, FADD, and MAPK9 as core genes in CRC autophagy. Of the 26 ARGs, BAX and PARP1 were regarded as having the most significant ability to predict CRC recurrence, with a prediction accuracy of 71.1%. Conclusion: These results shed light on the prediction of CRC recurrence by ARGs. Stratification of patients into recurrence risk groups by testing ARGs would be a valuable tool for early detection of CRC recurrence.

Genome-wide analysis of vaccinia virus protein-protein interactions

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.080078197 ◽

2000 ◽

Vol 97 (9) ◽

pp. 4879-4884 ◽

Cited By ~ 193

Author(s):

S. McCraith ◽

T. Holtzman ◽

B. Moss ◽

S. Fields

Keyword(s):

Vaccinia Virus ◽

Protein Interactions ◽

Virus Protein ◽

Protein Protein Interactions ◽

Genome Wide Analysis ◽

Genome Wide
