RGBM: Regularized Gradient Boosting Machines for the Identification of Transcriptional Regulators of Discrete Glioma Subtypes

2017 ◽  
Author(s):  
Raghvendra Mall ◽  
Luigi Cerulo ◽  
Khalid Kunji ◽  
Halima Bensmail ◽  
Thais S. Sabedot ◽  
...  

Abstract: The transcription factors (TFs) that regulate gene expression are key determinants of cellular phenotypes. Reconstructing large-scale genome-wide networks that capture the influence of TFs on target genes is essential for understanding and accurately modelling living cells. We propose RGBM, a gene regulatory network (GRN) inference algorithm that can handle data from heterogeneous information sources, including dynamic time-series, gene knockout, gene knockdown, DNA microarray and RNA-Seq expression profiles. RGBM allows the use of an a priori mechanistic active binding network consisting of TFs and their corresponding target genes. RGBM is evaluated on the DREAM challenge datasets, where it surpasses the winners of the competitions and other established methods on two evaluation metrics by about 10-15%. We use RGBM to identify the main regulators of the molecular subtypes of brain tumors. Our analysis reveals the identity and corresponding biological activities of the master regulators driving the transformation of the G-CIMP-high into the G-CIMP-low subtype of glioma and of PA-like into LGm6-GBM, thus providing a clue to the as yet undetermined nature of the transcriptional events driving the evolution among these novel glioma subtypes. RGBM is available for download on CRAN at https://cran.rproject.org/web/packages/RGBM/index.html

2015 ◽  
Author(s):  
Yifei Chen ◽  
Yi Li ◽  
Rajiv Narayan ◽  
Aravind Subramanian ◽  
Xiaohui Xie

Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiles over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ~1,000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression, which limits its accuracy because it does not capture complex nonlinear relationships between the expression of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based GEO dataset, consisting of 111K expression profiles, to train our model and compared its performance with that of other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms linear regression, with a 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than linear regression in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2,921 expression profiles. Deep learning still outperforms linear regression, with a 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability: D-GEX is available at https://github.com/uci-cbcl/D-GEX.
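To make the reported comparison concrete, the following is a minimal sketch of how a mean absolute error and a relative improvement over a baseline could be computed. The abstract does not state the exact formula, so the definition used here (reduction in error as a percentage of the baseline error) is an assumption, and all function names and toy values are hypothetical rather than taken from D-GEX.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and inferred expression."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error.

    Assumed definition: 100 * (baseline - model) / baseline.
    """
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical toy values for one target gene across four samples.
measured = [2.0, 4.0, 6.0, 8.0]
linear_pred = [2.5, 3.0, 6.5, 9.0]    # stand-in for a linear-regression baseline
deep_pred = [2.25, 3.5, 6.25, 8.5]    # stand-in for a nonlinear model

mae_linear = mean_absolute_error(measured, linear_pred)
mae_deep = mean_absolute_error(measured, deep_pred)
improvement = relative_improvement(mae_linear, mae_deep)
print(mae_linear, mae_deep, improvement)
```

In a gene-wise analysis of the kind described above, the same two functions would be applied per target gene, and the share of genes with a positive improvement would be counted.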


2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Li Teng ◽  
Laiwan Chan

Summary: Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time-consuming and can be difficult for very large-scale datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Since the patterns shown by gene expressions reveal their regulatory mechanisms, it is important to find all the different patterns existing in a dataset when there is little prior knowledge. It is also a helpful start before taking on further analysis. We propose an algorithm for DDP that iteratively picks out pairs of gene expression patterns with the largest dissimilarities. This method can also be used as a preprocessing step to initialize centers for clustering methods such as K-means. Experiments on both synthetic and real gene expression datasets show that our method is efficient and very effective in finding distinct patterns that have gene functional significance.
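The idea of iteratively picking out the most dissimilar pairs can be illustrated with a naive greedy sketch. This is only one possible reading of the summary, not the authors' algorithm: the Euclidean distance, the exhaustive pair search, and the function name `pick_distinct_patterns` are all assumptions made for illustration, and the quadratic search here makes no attempt at the efficiency the summary claims.

```python
import math
from itertools import combinations


def pick_distinct_patterns(profiles, n_pairs):
    """Greedily select up to n_pairs pairs of maximally dissimilar profiles.

    profiles: list of equal-length numeric sequences (one per gene).
    Returns the indices of the selected profiles in selection order.
    Dissimilarity is Euclidean distance (an assumption of this sketch).
    """
    remaining = set(range(len(profiles)))
    selected = []
    for _ in range(n_pairs):
        if len(remaining) < 2:
            break
        # Exhaustively find the farthest pair among profiles not yet selected.
        i, j = max(
            combinations(sorted(remaining), 2),
            key=lambda pair: math.dist(profiles[pair[0]], profiles[pair[1]]),
        )
        selected.extend([i, j])
        remaining -= {i, j}
    return selected


# Hypothetical toy expression profiles (five genes, three conditions).
toy_profiles = [
    [0, 0, 0],
    [1, 1, 1],
    [9, 9, 9],
    [0, 1, 0],
    [5, 5, 5],
]
seed_indices = pick_distinct_patterns(toy_profiles, n_pairs=2)
print(seed_indices)
```

The selected profiles could then serve as initial centers for a clustering method such as K-means, in line with the preprocessing use mentioned in the summary.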


Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic pathways to exploring genetic variability. Experiments carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.


2021 ◽  
Author(s):  
Tianyuan Lei ◽  
Xuhong Liao ◽  
Xiaodan Chen ◽  
Tengda Zhao ◽  
Yuehua Xu ◽  
...  

Abstract: Functional brain networks require dynamic reconfiguration to support flexible cognitive function. However, the developmental principles shaping brain network dynamics remain poorly understood. Here, we report the longitudinal development of large-scale brain network dynamics during childhood and adolescence, and its connection with gene expression profiles. Using a multilayer network model, we show the temporally varying modular architecture of child brain networks, with higher network switching primarily in the association cortex and lower switching in the primary regions. This topographical profile exhibits progressive maturation, which manifests as reduced modular dynamics, particularly in the transmodal (e.g., default-mode and frontoparietal) and sensorimotor regions. These developmental refinements mediate age-related enhancements of global network segregation and are linked with the expression profiles of genes associated with the enrichment of ion transport and nucleobase-containing compound transport. These results highlight a progressive stabilization of brain dynamics, which expands our understanding of the neural mechanisms that underlie cognitive development.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Fengjiao Xu ◽  
Mingmin Xu ◽  
Zutan Li ◽  
...  

Abstract Background: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of ever-expanding gene expression profiling, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of the information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to mine the influential genes in the genome more deeply. Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The results show that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics [mean absolute error (MAE) and Pearson correlation coefficient (PCC)]. This reveals that the landmark genes detected by our method contain more genomic information. Conclusions: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.
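Of the two metrics named above, the Pearson correlation coefficient measures how well predicted expression tracks measured expression regardless of scale. The following is a minimal, dependency-free sketch of the standard formula; the function name and the toy values are hypothetical, and how the authors aggregate the metric across genes or samples is not stated in the abstract.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered cross-products and squares (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical measured vs. predicted expression of one target gene.
measured_expr = [1.0, 2.0, 3.0, 4.0]
predicted_expr = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated, different scale
reversed_expr = [4.0, 3.0, 2.0, 1.0]    # perfectly anti-correlated

pcc_same = pearson_correlation(measured_expr, predicted_expr)
pcc_reversed = pearson_correlation(measured_expr, reversed_expr)
print(pcc_same, pcc_reversed)
```

Note that the second toy prediction is off by a factor of two yet still yields a correlation of 1, which is why a scale-sensitive error such as MAE is typically reported alongside PCC.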


2021 ◽  
pp. gr.275901.121
Author(s):  
Alexandre Laverre ◽  
Eric Tannier ◽  
Anamaria Necsulea

Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements is rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve not only regulatory element sequences but also their chromatin contacts with target genes. The extent of evolutionary conservation is remarkable for long-range promoter-enhancer contacts, illustrating how the structure of regulatory landscapes constrains large-scale genome evolution. We show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny or contacts with target genes, is significantly associated with gene expression evolution.


2021 ◽  
Author(s):  
Alexandre Laverré ◽  
Eric Tannier ◽  
Anamaria Necsulea

Abstract: Gene expression is regulated through complex molecular interactions, involving cis-acting elements that can be situated far away from their target genes. Data on long-range contacts between promoters and regulatory elements is rapidly accumulating. However, it remains unclear how these regulatory relationships evolve and how they contribute to the establishment of robust gene expression profiles. Here, we address these questions by comparing genome-wide maps of promoter-centered chromatin contacts in mouse and human. We show that there is significant evolutionary conservation of cis-regulatory landscapes, indicating that selective pressures act to preserve regulatory element sequences and their interactions with target genes. The extent of evolutionary conservation is remarkable for long-range promoter-enhancer contacts, illustrating how the structure of regulatory interactions constrains large-scale genome evolution. Notably, we show that the evolution of cis-regulatory landscapes, measured in terms of distal element sequences, synteny or contacts with target genes, is tightly linked to gene expression evolution.


2019 ◽  
Vol 2019 ◽  
pp. 1-8 ◽  
Author(s):  
Jie Ren ◽  
Lulu Shang ◽  
Qing Wang ◽  
Jing Li

Proteomics, the large-scale analysis of proteins, is contributing greatly to understanding gene function in the postgenomic era. However, disease protein ranking using shotgun proteomics data has not been fully evaluated. In this study, we prioritized disease-related proteins by integrating the protein-protein interaction (PPI) network and protein differential expression profiles from colon and rectal cancer (CRC) or breast cancer (BC) proteomics. We applied Local Ranking (LR) and Global Ranking (GR) methods in the network with three kinds of protein sets as a priori knowledge: known disease proteins (KDPs) collected from the Online Mendelian Inheritance in Man (OMIM) database, differentially expressed proteins (DEPs), and the collection of KDPs and their direct neighbors with differential expression (eKDPs). Cross-validation showed that the GR method outperformed the LR method, and that using eKDPs as the initial training set gave significantly higher accuracy than using the other two a priori sets. We then validated the top-ranked proteins using RNAi-based loss-of-function screens in the DepMap database. The results showed that 75% of the top 20 proteins in CRC are necessary for tumor survival. In summary, network-based Global Ranking with protein differential expression can efficiently prioritize cancer-related proteins and discover new candidate cancer genes or proteins.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2437 ◽  
Author(s):  
Changwei Bi ◽  
Yiqing Xu ◽  
Qiaolin Ye ◽  
Tongming Yin ◽  
Ning Ye

WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which is found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole-genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Based on their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (group I–III), with five subgroups (IIa–IIe) in group II. Using multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide (WRKYGRK, WKKYGQK and WRKYGKK) and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues: tender roots, young leaves, vegetative buds, non-lignified stems and barks. These analyses of the WRKY gene family in willow not only help to complete the functional and annotation information of the WRKY gene family in woody plants, but also provide important references for investigating the expansion and evolution of this gene family in flowering plants.


2020 ◽  
Author(s):  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Cong Pian ◽  
Mingmin Xu ◽  
Zutan Li ◽  
...  

Abstract Background: Currently, large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of ever-expanding gene expression profiling, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1,000 carefully chosen landmark genes that can capture ~80% of the information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to mine the influential genes in the genome more deeply. Results: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we have re-obtained a landmark gene set. The results show that our landmark genes can predict target genes more accurately and robustly than those of L1000 based on two metrics (mean absolute error (MAE) and Pearson correlation coefficient (PCC)). This reveals that the landmark genes detected by our method contain more genomic information. Conclusions: We believe that our proposed framework is very suitable for the analysis of biological big data to reveal the mysteries of life. Furthermore, the landmark genes inferred from this study can be used for the explosive amplification of gene expression profiles to facilitate research into functional connections.

