Discovery of perturbation gene targets via free text metadata mining in Gene Expression Omnibus

2017 ◽  
Author(s):  
Djordje Djordjevic ◽  
Joshua Y. S. Tang ◽  
Yun Xin Chen ◽  
Shu Lun Shannon Kwan ◽  
Raymond W. K. Ling ◽  
...  

Abstract There exist over 2.5 million publicly available gene expression samples across 101,000 data series in NCBI’s Gene Expression Omnibus (GEO) database. Because GEO’s free text metadata does not use standardised ontology terms to annotate the experimental type and sample type, this database remains difficult to harness computationally without significant manual intervention. In this work, we present an interactive R/Shiny tool called GEOracle that utilises text mining and machine learning techniques to automatically identify perturbation experiments, group treatment and control samples and perform differential expression analysis. We present applications of GEOracle to discover conserved signalling pathway target genes and identify an organ specific gene regulatory network. GEOracle is effective in discovering perturbation gene targets in GEO by harnessing its free text metadata. Its effectiveness and applicability have been demonstrated by cross validation and two real-life case studies. It opens up new avenues to unlock the gene regulatory information embedded inside large biological databases such as GEO. GEOracle is available at https://github.com/VCCRI/GEOracle.
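As a rough illustration of what grouping treatment and control samples from free text can look like, the sketch below labels sample titles with keyword patterns. It is not GEOracle's method (the abstract describes text mining and machine learning in R/Shiny); the keyword lists, the function names `label_sample` and `group_samples`, and the example titles are all assumptions made for illustration.

```python
import re

# Hypothetical keyword patterns; GEOracle's actual features and classifier
# are not described in the abstract.
CONTROL_PATTERNS = [
    r"\bcontrol\b", r"\bwild[- ]?type\b", r"\bwt\b",
    r"\buntreated\b", r"\bvehicle\b", r"\bmock\b",
]
PERTURBATION_PATTERNS = [
    r"\bknock[- ]?out\b", r"\bko\b", r"\bknock[- ]?down\b",
    r"\bsirna\b", r"\boverexpress\w*", r"\btreated\b",
]


def label_sample(title):
    """Return 'control', 'perturbation' or 'unknown' for one sample title."""
    text = title.lower()
    # Control keywords are checked first, so "siRNA control" counts as control.
    if any(re.search(p, text) for p in CONTROL_PATTERNS):
        return "control"
    if any(re.search(p, text) for p in PERTURBATION_PATTERNS):
        return "perturbation"
    return "unknown"


def group_samples(titles):
    """Group sample titles by their predicted label."""
    groups = {"control": [], "perturbation": [], "unknown": []}
    for title in titles:
        groups[label_sample(title)].append(title)
    return groups


if __name__ == "__main__":
    # Made-up titles in the style of GEO sample records.
    example = [
        "Wild-type heart rep1",
        "Gata4 knockout heart rep1",
        "siRNA control rep2",
        "heart rep3",
    ]
    print(group_samples(example))
```

A fixed keyword list like this breaks down quickly on real metadata, which is presumably why a learned classifier with manual review is attractive for this task.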

2019 ◽  
Vol 36 (1) ◽  
pp. 197-204 ◽  
Author(s):  
Xin Zhou ◽  
Xiaodong Cai

Abstract Motivation: Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity between the two GRNs, and may sacrifice inference accuracy. Results: In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM) to jointly infer GRNs under two conditions, and then to identify the differences between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, whose genes with largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions. Availability and implementation: The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN. Supplementary information: Supplementary data are available at Bioinformatics online.


Cell Systems ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 169-182.e5 ◽  
Author(s):  
Supriya Sen ◽  
Zhang Cheng ◽  
Katherine M. Sheu ◽  
Yu Hsin Chen ◽  
Alexander Hoffmann

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Constance M Smith ◽  
James A Kadin ◽  
Richard M Baldarelli ◽  
Jonathan S Beal ◽  
Olin Blodgett ◽  
...  

Abstract The Gene Expression Database (GXD), an extensive community resource of curated expression information for the mouse, has developed an RNA-Seq and Microarray Experiment Search (http://www.informatics.jax.org/gxd/htexp_index). This tool allows users to quickly and reliably find specific experiments in ArrayExpress and the Gene Expression Omnibus (GEO) that study endogenous gene expression in wild-type and mutant mice. Standardized metadata annotations, curated by GXD, allow users to specify the anatomical structure, developmental stage, mutated gene, strain and sex of samples of interest, as well as the study type and key parameters of the experiment. These searches, powered by controlled vocabularies and ontologies, can be combined with free text searching of experiment titles and descriptions. Search result summaries include link-outs to ArrayExpress and GEO, providing easy access to the expression data itself. Links to the PubMed entries for accompanying publications are also included. More information about this tool and GXD can be found at the GXD home page (http://www.informatics.jax.org/expression.shtml). Database URL: http://www.informatics.jax.org/expression.shtml


2011 ◽  
Vol 28 (2) ◽  
pp. 214-221 ◽  
Author(s):  
Geert Geeven ◽  
Ronald E. van Kesteren ◽  
August B. Smit ◽  
Mathisca C. M. de Gunst

2019 ◽  
Author(s):  
Sabina Kanton ◽  
Michael James Boyle ◽  
Zhisong He ◽  
Malgorzata Santel ◽  
Anne Weigert ◽  
...  

Abstract The human brain has changed dramatically since humans diverged from our closest living relatives, chimpanzees and the other great apes [1–5]. However, the genetic and developmental programs underlying this divergence are not fully understood [6–8]. Here, we have analyzed stem cell-derived cerebral organoids using single-cell transcriptomics (scRNA-seq) and accessible chromatin profiling (scATAC-seq) to explore gene regulatory changes that are specific to humans. We first analyze cell composition and reconstruct differentiation trajectories over the entire course of human cerebral organoid development from pluripotency, through neuroectoderm and neuroepithelial stages, followed by divergence into neuronal fates within the dorsal and ventral forebrain, midbrain and hindbrain regions. We find that brain region composition varies in organoids from different iPSC lines, yet regional gene expression patterns are largely reproducible across individuals. We then analyze chimpanzee and macaque cerebral organoids and find that human neuronal development proceeds at a delayed pace relative to the other two primates. Through pseudotemporal alignment of differentiation paths, we identify human-specific gene expression resolved to distinct cell states along progenitor to neuron lineages in the cortex. We find that chromatin accessibility is dynamic during cortex development, and identify instances of accessibility divergence between human and chimpanzee that correlate with human-specific gene expression and genetic change. Finally, we map human-specific expression in adult prefrontal cortex using single-nucleus RNA-seq and find developmental differences that persist into adulthood, as well as cell state-specific changes that occur exclusively in the adult brain. Our data provide a temporal cell atlas of great ape forebrain development, and illuminate dynamic gene regulatory features that are unique to humans.


2021 ◽  
Author(s):  
Deborah Weighill ◽  
Marouen Ben Guebila ◽  
Kimberly Glass ◽  
John Quackenbush ◽  
John Platig

Abstract The majority of disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding and the alteration of downstream gene expression. Identifying how a person’s genotype affects their individual gene regulatory network has the potential to provide important insights into disease etiology and to enable improved genotype-specific disease risk assessments and treatments. However, the impact of genetic variants is generally not considered when constructing gene regulatory networks. To address this unmet need, we developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network (GRN) for each individual in a study population by using message passing to integrate genotype-informed TF motif predictions (derived from individual genotype data, the predicted effects of variants on TF binding and gene expression, and TF motif predictions) with TF protein-protein interactions and gene expression. Comparing EGRET networks for two blood-derived cell lines identified genotype-associated cell-line specific regulatory differences which were subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential TF binding from ChIP-seq. In addition, EGRET GRNs for three cell types across 119 individuals captured regulatory differences associated with disease in a cell-type-specific manner. Our analyses demonstrate that EGRET networks can capture the impact of genetic variants on complex phenotypes, supporting a novel fine-scale stratification of individuals based on their genetic background. EGRET is available through the Network Zoo R package (netZooR v0.9; netzoo.github.io).


2020 ◽  
Author(s):  
Maud Fagny ◽  
Marieke Lydia Kuijjer ◽  
Maike Stam ◽  
Johann Joets ◽  
Olivier Turc ◽  
...  

Abstract Enhancers are important regulators of gene expression during numerous crucial processes including tissue differentiation across development. In plants, their recent molecular characterization revealed their capacity to activate the expression of several target genes through the binding of transcription factors. Nevertheless, identifying these target genes at a genome-wide level remains a challenge, in particular in species with large genomes, where enhancers and target genes can be hundreds of kilobases away. Therefore, the contribution of enhancers to regulatory networks is still poorly understood in plants. In this study, we investigate the enhancer-driven regulatory network of two maize tissues at different stages: leaves at seedling stage and husks (bracts) at flowering. Using a systems biology approach, we integrate genomic, epigenomic and transcriptomic data to model the regulatory relationship between transcription factors and their potential target genes. We identify regulatory modules specific to husk and V2-IST, and show that they are involved in distinct functions related to the biology of each tissue. We identify enhancers exhibiting binding sites for two distinct transcription factor families (DOF and AP2/ERF) that drive the tissue-specificity of gene expression in seedling immature leaf and husk. Analysis of the corresponding enhancer sequences reveals that two different transposable element families (TIR transposon Mutator and MITE Pif/Harbinger) have shaped the regulatory network in each tissue, and that MITEs have provided new transcription factor binding sites that are involved in husk tissue-specificity.

Significance Enhancers play a major role in regulating tissue-specific gene expression in higher eukaryotes, including angiosperms. While molecular characterization of enhancers has improved over the past years, identifying their target genes at the genome-wide scale remains challenging. Here, we integrate genomic, epigenomic and transcriptomic data to decipher the tissue-specific gene regulatory network controlled by enhancers at two different stages of maize leaf development. Using a systems biology approach, we identify transcription factor families regulating tissue-specific gene expression in husk and seedling leaves, and characterize the enhancers likely to be involved. We show that a large proportion of maize enhancers are derived from transposable elements, which can provide novel transcription factor binding sites crucial to the regulation of tissue-specific biological functions.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0244864
Author(s):  
Carlos Mora-Martinez

Considerable effort has been invested in trying to understand how a single genome is able to specify the identity of hundreds of cell types. Inspired by some aspects of Caenorhabditis elegans biology, we implemented an in silico evolutionary strategy to produce gene regulatory networks (GRNs) that drive cell-specific gene expression patterns, mimicking the process of terminal cell differentiation. Dynamics of the gene regulatory networks are governed by a thermodynamic model of gene expression, which uses DNA sequences and transcription factor degenerate position weight matrices as input. In one version of the model, we included chromatin accessibility. Experimentally, it has been determined that cell-specific and broadly expressed genes are regulated differently. In our in silico evolved GRNs, broadly expressed genes are regulated very redundantly and the architecture of their cis-regulatory modules is different, in accordance with what has been found in C. elegans and also in other systems. Finally, we found differences in topological positions in GRNs between these two classes of genes, which help to explain why broadly expressed genes are so resilient to mutations. Overall, our results offer an explanatory hypothesis on why broadly expressed genes are regulated so redundantly compared to cell-specific genes, which can be extrapolated to phenomena such as ChIP-seq HOT regions.
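To make the idea of a thermodynamic model driven by DNA sequence and position weight matrices more concrete, here is a minimal sketch of one common textbook formulation: each candidate site gets a Boltzmann weight from its PWM score, and occupancy is the fraction of the partition function in which the factor is bound. This is not the paper's model. The 3-bp `PWM`, the single-bound-factor assumption, the `tf_concentration` parameter and the function names are all illustrative assumptions.

```python
import math

# Hypothetical log-odds PWM for a 3-bp motif: one dict per motif position.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": 0.5},
]


def site_score(site, pwm):
    """Sum the PWM entries for a site of the same length as the motif."""
    return sum(column[base] for column, base in zip(pwm, site))


def occupancy(sequence, pwm, tf_concentration=1.0):
    """Probability that the factor is bound somewhere on the sequence.

    Assumes at most one factor is bound at a time, so the partition
    function is 1 (unbound state) plus the sum of per-site weights.
    """
    width = len(pwm)
    weights = [
        tf_concentration * math.exp(site_score(sequence[i:i + width], pwm))
        for i in range(len(sequence) - width + 1)
    ]
    total = sum(weights)
    return total / (1.0 + total)


if __name__ == "__main__":
    for seq in ("ACG", "ACT", "TTT"):
        print(seq, round(occupancy(seq, PWM), 3))
```

In this toy setting a degenerate matrix (here, the tolerated T at the last position) lets near-consensus sites contribute intermediate occupancy rather than none.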

