Prediction and validation of protein–protein interactors from genome-wide DNA-binding data using a knowledge-based machine-learning approach

Open Biology ◽  
2016 ◽  
Vol 6 (9) ◽  
pp. 160183 ◽  
Author(s):  
Ashley J. Waardenberg ◽  
Bernou Homan ◽  
Stephanie Mohamed ◽  
Richard P. Harvey ◽  
Romaric Bouveret

The ability to accurately predict the DNA targets and interacting cofactors of transcriptional regulators from genome-wide data can significantly advance our understanding of gene regulatory networks. NKX2-5 is a homeodomain transcription factor that sits high in the cardiac gene regulatory network and is essential for normal heart development. We previously identified genomic targets for NKX2-5 in mouse HL-1 atrial cardiomyocytes using DNA-adenine methyltransferase identification (DamID). Here, we apply machine learning algorithms and propose a knowledge-based feature selection method for predicting NKX2-5 protein–protein interactions based on motif grammar in genome-wide DNA-binding data. We assessed model performance using leave-one-out cross-validation and a completely independent DamID experiment performed with replicates. In addition to identifying previously described NKX2-5-interacting proteins, including GATA, HAND and TBX family members, a number of novel interactors were identified, with direct protein–protein interactions between NKX2-5 and retinoid X receptor (RXR), paired-related homeobox (PRRX) and Ikaros zinc fingers (IKZF) validated using the yeast two-hybrid assay. We also found that the interaction of RXRα with NKX2-5 mutations found in congenital heart disease (Q187H, R189G and R190H) was altered. These findings highlight an intuitive approach to accessing protein–protein interaction information of transcription factors in DNA-binding experiments.
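
The leave-one-out cross-validation mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which classifiers or features were used, so everything here is an assumption made for illustration: the nearest-centroid classifier, the function names, and the toy motif-count data are hypothetical and are not taken from the paper.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(train_x, train_y, query):
    """Nearest-centroid classifier: return the label whose class mean is closest."""
    centroids = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in set(train_y)
    }
    return min(centroids, key=lambda label: dist(centroids[label], query))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: each row counts two motifs in one genomic region.
features = [[5, 1], [6, 0], [5, 2], [1, 5], [0, 6], [2, 5]]
labels = ["bound", "bound", "bound", "unbound", "unbound", "unbound"]
accuracy = leave_one_out_accuracy(features, labels)
print(accuracy)
```

Because every sample is held out exactly once, the estimate uses all of the data for both training and testing, which is why the scheme suits small datasets; an independent experiment, as described in the abstract, remains the stronger check.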

2019 ◽  
Vol 20 (3) ◽  
pp. 177-184 ◽  
Author(s):  
Nantao Zheng ◽  
Kairou Wang ◽  
Weihua Zhan ◽  
Lei Deng

Background: Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. Methods: In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized by the features they use and by the machine learning algorithms they apply, including classical and novel methods. Results: We describe the pivotal and representative features extracted from relevant sources of biological data, which mainly include sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. Conclusion: The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Na Sang ◽  
Hui Liu ◽  
Bin Ma ◽  
Xianzhong Huang ◽  
Lu Zhuo ◽  
...  

Background: In plants, 14-3-3 proteins, also called GENERAL REGULATORY FACTORs (GRFs), encoded by a large multigene family, are involved in protein–protein interactions and play crucial roles in various physiological processes. No genome-wide analysis of the GRF gene family has been performed in cotton, and their functions in flowering are largely unknown. Results: In this study, 17, 17, 31, and 17 GRF genes were identified in Gossypium herbaceum, G. arboreum, G. hirsutum, and G. raimondii, respectively, by genome-wide analyses and were designated as GheGRFs, GaGRFs, GhGRFs, and GrGRFs, respectively. A phylogenetic analysis revealed that these proteins were divided into ε and non-ε groups. Gene structure, motif composition, synteny, and duplicated gene analyses of the identified GRF genes provided insights into the evolution of this family in cotton. GhGRF genes exhibited diverse expression patterns in different tissues. Yeast two-hybrid and bimolecular fluorescence complementation assays showed that the GhGRFs interacted with the cotton FLOWERING LOCUS T homologue GhFT in the cytoplasm and nucleus, while they interacted with the basic leucine zipper transcription factor GhFD only in the nucleus. Virus-induced gene silencing in G. hirsutum and transgenic studies in Arabidopsis demonstrated that GhGRF3/6/9/15 repressed flowering and that GhGRF14 promoted flowering. Conclusions: Here, 82 GRF genes were identified in cotton, and their gene and protein features, classification, evolution, and expression patterns were comprehensively and systematically investigated. GhGRF3/6/9/15 interacted with GhFT and GhFD to form florigen activation complexes that inhibited flowering. However, GhGRF14 interacted with GhFT and GhFD to form a florigen activation complex that promoted flowering. The results provide a foundation for further studies on the regulatory mechanisms of flowering.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Christopher R. Horne ◽  
Hariprasad Venugopal ◽  
Santosh Panjikar ◽  
David M. Wood ◽  
Amy Henrickson ◽  
...  

Bacteria respond to environmental changes by inducing transcription of some genes and repressing others. Sialic acids, which coat human cell surfaces, are a nutrient source for pathogenic and commensal bacteria. The Escherichia coli GntR-type transcriptional repressor, NanR, regulates sialic acid metabolism, but the mechanism is unclear. Here, we demonstrate that three NanR dimers bind a (GGTATA)3-repeat operator cooperatively and with high affinity. Single-particle cryo-electron microscopy structures reveal the DNA-binding domain is reorganized to engage DNA, while three dimers assemble in close proximity across the (GGTATA)3-repeat operator. Such an interaction allows cooperative protein-protein interactions between NanR dimers via their N-terminal extensions. The effector, N-acetylneuraminate, binds NanR and attenuates the NanR-DNA interaction. The crystal structure of NanR in complex with N-acetylneuraminate reveals a domain rearrangement upon N-acetylneuraminate binding to lock NanR in a conformation that weakens DNA binding. Our data provide a molecular basis for the regulation of bacterial sialic acid metabolism.


Biomedicines ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 34
Author(s):  
Taesic Lee ◽  
Hyunju Lee

Alzheimer’s disease (AD) and diabetes mellitus (DM) are known to have a shared molecular mechanism. We aimed to identify shared blood transcriptomic signatures between AD and DM. Blood expression datasets for each disease were combined and a co-expression network was used to construct modules consisting of genes with similar expression patterns. For each module, a gene regulatory network based on gene expression and protein-protein interactions was established to identify hub genes. We selected one module, where COPS4, PSMA6, GTF2B, GTF2F2, and SSB were identified as dysregulated transcription factors that were common between AD and DM. These five genes were also differentially co-expressed in disease-related tissues, such as the brain in AD and the pancreas in DM. Our study identified gene modules that were dysregulated in both AD and DM blood samples, which may help reveal the common pathophysiology of the two diseases.
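
The idea of grouping genes with similar expression patterns into modules can be sketched as follows. This is a deliberately simplified illustration: it links genes whose Pearson correlation meets a hard threshold and takes connected components as modules, which is an assumption and not necessarily the method used in the study. The gene names, expression values, threshold, and function names are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def coexpression_modules(expression, threshold=0.9):
    """Link genes with correlation >= threshold; return connected components."""
    genes = sorted(expression)
    neighbours = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(expression[g], expression[h]) >= threshold:
                neighbours[g].add(h)
                neighbours[h].add(g)

    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        # Depth-first traversal collects every gene reachable from g.
        stack, module = [g], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(neighbours[node] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


# Hypothetical toy expression profiles across four samples.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [5, 1, 4, 2],
    "geneD": [10, 2, 8, 4.5],
}
modules = coexpression_modules(expression)
print(modules)
```

Within each module, a hub gene could then be approximated as the gene with the most neighbours, though the study combined expression with protein-protein interaction data for that step.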


2020 ◽  
Vol 48 (10) ◽  
pp. 030006052095880
Author(s):  
Jianping Wu ◽  
Sulai Liu ◽  
Xiaoming Chen ◽  
Hongfei Xu ◽  
Yaoping Tang

Objective: Colorectal cancer (CRC) is the most common cancer worldwide. Patient outcomes following recurrence of CRC are very poor. Therefore, identifying the risk of CRC recurrence at an early stage would improve patient care. Accumulating evidence shows that autophagy plays an active role in tumorigenesis, recurrence, and metastasis. Methods: We used machine learning algorithms and two regression models, univariable Cox proportional hazards and least absolute shrinkage and selection operator (LASSO), to identify 26 autophagy-related genes (ARGs) related to CRC recurrence. Results: By functional annotation, these ARGs were shown to be enriched in necroptosis and apoptosis pathways. Protein–protein interactions identified SQSTM1, CASP8, HSP80AB1, FADD, and MAPK9 as core genes in CRC autophagy. Of the 26 ARGs, BAX and PARP1 had the strongest ability to predict CRC recurrence, with a prediction accuracy of 71.1%. Conclusion: These results shed light on the prediction of CRC recurrence by ARGs. Stratification of patients into recurrence risk groups by testing ARGs would be a valuable tool for early detection of CRC recurrence.

