Genome-wide active enhancer identification using cell type-specific signatures of epigenomic activity

10.1101/421230 ◽

2018 ◽

Author(s):

Shalu Jhanwar ◽

Stephan Ossowski ◽

Jose Davila-Velderrain

Keyword(s):

Cell Fate ◽

Structural Complexity ◽

Predictive Performance ◽

Cell Types ◽

Genome Wide ◽

Active Enhancer ◽

Cell Type Specific ◽

Genomic Regions ◽

Human And Mouse ◽

Enhancer Identification

Abstract: Recently, enhancers have emerged as key players regulating crucial mechanisms such as cell fate determination and the establishment of spatiotemporal patterns of gene expression during development. Due to their functional and structural complexity, accurate in silico identification of active enhancers under specific conditions remains challenging. We present a novel machine learning-based method that derives epigenomic patterns exclusively from experimentally characterized active enhancers contrasted with a weighted set of non-enhancer genomic regions. We demonstrate better predictive performance than previous methods, as well as wide generalizability, by identifying and annotating active enhancers genome-wide across different tissues and cell types in human and mouse.

Learning a genome-wide score of human–mouse conservation at the functional genomics level

Nature Communications ◽

10.1038/s41467-021-22653-8 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Soo Bin Kwon ◽

Jason Ernst

Keyword(s):

Mouse Model ◽

Functional Genomics ◽

Functional Genomic ◽

Transcriptomic Data ◽

Model Studies ◽

Genome Wide ◽

A Genome ◽

Important Challenge ◽

Genomic Regions ◽

Human And Mouse

Abstract: Identifying genomic regions with functional genomic properties that are conserved between human and mouse is an important challenge in the context of mouse model studies. To address this, we develop a method to learn a score of evidence of conservation at the functional genomics level by integrating information from a compendium of epigenomic, transcription factor binding, and transcriptomic data from human and mouse. The method, Learning Evidence of Conservation from Integrated Functional genomic annotations (LECIF), trains neural networks to generate this score for the human and mouse genomes. The resulting LECIF score highlights human and mouse regions with shared functional genomic properties and captures correspondence of biologically similar human and mouse annotations. Analysis with independent datasets shows the score also highlights loci associated with similar phenotypes in both species. LECIF will be a resource for mouse model studies by identifying loci whose functional genomic properties are likely conserved.

Simultaneous profiling of multiple chromatin proteins in the same cells

10.1101/2021.04.27.441642 ◽

2021 ◽

Author(s):

Sneha Gopalan ◽

Yuqing Wang ◽

Nicholas W. Harper ◽

Manuel Garber ◽

Thomas G Fazzio

Keyword(s):

Rna Polymerase Ii ◽

Direct Analysis ◽

Cell Types ◽

Regulatory Elements ◽

Genome Wide ◽

Distinct Cell ◽

Direct Measurements ◽

Cell Type Specific ◽

Chromatin Proteins

Methods derived from CUT&RUN and CUT&Tag enable genome-wide mapping of the localization of proteins on chromatin from as few as one cell. These and other mapping approaches focus on one protein at a time, preventing direct measurements of co-localization of different chromatin proteins in the same cells and requiring prioritization of targets where samples are limiting. Here we describe multi-CUT&Tag, an adaptation of CUT&Tag that overcomes these hurdles by using antibody-specific barcodes to simultaneously map multiple proteins in the same cells. Highly specific multi-CUT&Tag maps of histone marks and RNA Polymerase II uncovered sites of co-localization in the same cells, active and repressed genes, and candidate cis-regulatory elements. Single-cell multi-CUT&Tag profiling facilitated identification of distinct cell types from a mixed population and characterization of cell type-specific chromatin architecture. In sum, multi-CUT&Tag increases the information content per cell of epigenomic maps, facilitating direct analysis of the interplay of different proteins on chromatin.

Mathematical modelling of promoter occupancies in MYC-dependent gene regulation

Genomics and Computational Biology ◽

10.18547/gcb.2017.vol3.iss2.e54 ◽

2017 ◽

Vol 3 (2) ◽

pp. 54

Author(s):

Uwe Benary ◽

Elmar Wolf ◽

Jana Wolf

Keyword(s):

Dna Binding ◽

Mathematical Modelling ◽

Expression Patterns ◽

Cell Types ◽

Osteosarcoma Cell ◽

Genome Wide ◽

Human Osteosarcoma Cell ◽

Cell Type Specific ◽

Binding Behaviour ◽

Oncogene Protein

The human MYC proto-oncogene protein (MYC) is a transcription factor that plays a major role in the regulation of cell proliferation. Deregulation of MYC expression is often found in cancer. In recent years, several hypotheses have been proposed to explain cell-type-specific MYC target gene expression patterns despite genome-wide DNA binding of MYC. In a recent publication, a mathematical modelling approach combined with experimental data demonstrated that differences in MYC-DNA-binding affinity are sufficient to explain distinct promoter occupancies and allow stratification of distinct MYC-regulated biological processes at different MYC concentrations. Here, we extend the analysis of the published mathematical model of the DNA-binding behaviour of MYC to demonstrate that the insights gained from the human osteosarcoma cell line U2OS can be generalized to other human cell types.
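The affinity argument in this abstract can be illustrated with a minimal sketch. It assumes simple equilibrium binding, where the fractional occupancy of a promoter is [MYC] / ([MYC] + K_d); this is a textbook simplification and not necessarily the form of the published model. The promoter labels, K_d values, and concentrations are hypothetical and in arbitrary units.

```python
def occupancy(myc_conc, kd):
    """Fractional promoter occupancy under simple equilibrium binding (assumed model)."""
    return myc_conc / (myc_conc + kd)

# Hypothetical promoters with assumed dissociation constants (arbitrary units).
promoters = {"high_affinity": 1.0, "low_affinity": 100.0}

# Scan a few illustrative MYC concentrations and report occupancy per promoter.
for conc in (1.0, 10.0, 100.0):
    row = {name: round(occupancy(conc, kd), 3) for name, kd in promoters.items()}
    print(conc, row)
```

Under this assumption, a high-affinity promoter is already largely occupied at low MYC levels, while a low-affinity promoter only approaches comparable occupancy at much higher concentrations, which is the kind of concentration-dependent separation the abstract describes.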

HiCRes: a computational method to estimate and predict the resolution of HiC libraries

10.1101/2020.09.22.307967 ◽

2020 ◽

Author(s):

Claire Marchal ◽

Nivedita Singh ◽

Ximena Corso-Díaz ◽

Anand Swaroop

Keyword(s):

Expression Patterns ◽

Three Dimensional ◽

Computational Method ◽

Mathematical Concepts ◽

Regulate Gene Expression ◽

Chromatin Interactions ◽

Genome Wide ◽

A Cell ◽

Cell Type Specific ◽

Human And Mouse

Abstract: The three-dimensional (3D) conformation of chromatin is crucial for stringently regulating gene expression patterns and DNA replication in a cell-type-specific manner. HiC is a key technique for measuring 3D chromatin interactions genome-wide. Estimating and predicting the resolution of a library is an essential step in any HiC experimental design. Here, we present the mathematical concepts needed to estimate the resolution of a library and to predict whether deeper sequencing would enhance the resolution. We have developed HiCRes, a Docker pipeline, by applying these concepts to human and mouse HiC libraries.

A compendium of uniformly processed human gene expression and splicing quantitative trait loci

Nature Genetics ◽

10.1038/s41588-021-00924-w ◽

2021 ◽

Vol 53 (9) ◽

pp. 1290-1299

Author(s):

Nurlan Kerimov ◽

James D. Hayhurst ◽

Kateryna Peikova ◽

Jonathan R. Manning ◽

Peter Walter ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Target Genes ◽

Genome Wide Association Study ◽

Cell Types ◽

Summary Statistics ◽

Genome Wide ◽

Cell Type Specific ◽

Trait Locus ◽

Complex Human Traits

Abstract: Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.

A single cell brain atlas in human Alzheimer’s disease

10.1101/628347 ◽

2019 ◽

Cited By ~ 4

Author(s):

Alexandra Grubman ◽

Gabriel Chew ◽

John F. Ouyang ◽

Guizhi Sun ◽

Xin Yi Choo ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Cell Fate ◽

Expression Patterns ◽

Cell Types ◽

Gene Expression Patterns ◽

Cell Type ◽

Web Resource ◽

Cell Type Specific

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain, we used single-nucleus RNA-seq in AD and control patient brains to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE, which was repressed in oligodendrocyte progenitor cell (OPC) and astrocyte AD subclusters and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD and provide new insights into the coordinated control of AD GWAS genes and their cell-type-specific contribution to disease susceptibility. Finally, we created an interactive reference web resource that will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, and molecular and functional diversity, at the single-cell level.

Highlights:

We generated the first human single cell transcriptome in AD patient brains.

Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes).

Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories.

Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters.

We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters.

Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.

Data resource web interface: http://adsn.ddnetbio.com

Evaluating the contribution of cell-type specific alternative splicing to variation in lipid levels

10.1101/659326 ◽

2019 ◽

Author(s):

K.A.B. Gawronski ◽

W. Bone ◽

Y. Park ◽

E. Pashos ◽

X. Wang ◽

...

Keyword(s):

Alternative Splicing ◽

Quantitative Trait ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Lipid Levels ◽

Cell Type ◽

Genome Wide ◽

Genetic Mechanisms ◽

Cell Type Specific

Abstract

Background: Genome-wide association studies have identified 150+ loci associated with lipid levels. However, the genetic mechanisms underlying most of these loci are not well understood. Recent work indicates that changes in the abundance of alternatively spliced transcripts contribute to complex trait variation. Consequently, identifying genetic loci that associate with alternative splicing in disease-relevant cell types and determining the degree to which these loci are informative for lipid biology is of broad interest.

Methods and Results: We analyze gene splicing in 83 sample-matched induced pluripotent stem cell (iPSC) and hepatocyte-like cell (HLC) lines (n=166), as well as in an independent collection of primary liver tissues (n=96). We observe that transcript splicing is highly cell-type specific, and the genes that are differentially spliced between iPSCs and HLCs are enriched for metabolism pathway annotations. We identify 1,381 HLC splicing quantitative trait loci (sQTLs) and 1,462 iPSC sQTLs and find that sQTLs are often shared across cell types. To evaluate the contribution of sQTLs to variation in lipid levels, we conduct colocalization analysis using lipid genome-wide association data. We identify 19 lipid-associated loci that colocalize either with an HLC expression quantitative trait locus (eQTL) or sQTL. Only one locus colocalizes with both an sQTL and eQTL, indicating that sQTLs contribute information about GWAS loci that cannot be obtained by analysis of steady-state gene expression alone.

Conclusions: These results provide an important foundation for future efforts that use iPSC and iPSC-derived cells to evaluate genetic mechanisms influencing both cardiovascular disease risk and complex traits in general.

Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes

10.1101/2021.11.29.470374 ◽

2021 ◽

Author(s):

Yunhee Jeong ◽

Reka Toth ◽

Marlene Ganslmeier ◽

Kersten Breuer ◽

Christoph Plass ◽

...

Keyword(s):

Cell Types ◽

Systematic Evaluation ◽

Cell Type ◽

Factors Affecting ◽

Genome Wide ◽

Cell Type Composition ◽

Type Composition ◽

Level Information ◽

Genomic Regions ◽

The Impact

DNA methylation sequencing is becoming increasingly popular, yielding genome-wide methylome data at single-base-pair resolution through novel cost- and labor-optimized protocols. It has tremendous potential for cell-type heterogeneity analysis, particularly in tumors, due to its intrinsic read-level information. Although diverse deconvolution methods have been developed to infer cell-type composition from bulk sequencing-based methylomes, they have not yet been systematically evaluated. Here, we thoroughly review and evaluate five previously published deconvolution methods: Bayesian epiallele detection (BED), PRISM, csmFinder + coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. Sequencing-based deconvolution methods consist of two main steps: informative region selection and cell-type composition estimation. Accordingly, we assessed the performance of each step individually and demonstrated the impact of the former step on the performance of the latter. We identify the method with the highest accuracy across different samples and infer the factors affecting cell-type deconvolution performance, which depend on the number of cell types in the mixture. Whereas selecting genomic regions similar to DMRs generally increased performance in bi-component mixtures, the uniformity of the cell-type distribution showed a high correlation with performance in five-cell-type bulk analyses.

Inferring relevant tissues and cell types for complex traits in genome-wide association studies

10.1101/2021.06.09.447805 ◽

2021 ◽

Author(s):

Rujin Wang ◽

Danyu Lin ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Complex Traits ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Cell Type ◽

Disease Etiology ◽

Genome Wide ◽

Cell Type Specific

More than a decade of genome-wide association studies (GWASs) has identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that trait-associated variants likely act in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
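The generalized least squares step mentioned in this abstract can be sketched generically as follows. This is a textbook GLS estimator, beta = (X' S^-1 X)^-1 X' S^-1 y, written with NumPy; it is not the EPIC implementation. The function name `gls_fit`, the simulated gene-level statistics, the expression covariate, and the correlation matrix are all hypothetical and serve only to show how correlated errors enter the fit.

```python
import numpy as np

def gls_fit(X, y, sigma):
    """Textbook GLS: beta = (X' S^-1 X)^-1 X' S^-1 y, plus the covariance of beta."""
    sigma_inv_X = np.linalg.solve(sigma, X)   # S^-1 X without forming S^-1 explicitly
    sigma_inv_y = np.linalg.solve(sigma, y)   # S^-1 y
    xtsx = X.T @ sigma_inv_X
    beta = np.linalg.solve(xtsx, X.T @ sigma_inv_y)
    cov_beta = np.linalg.inv(xtsx)
    return beta, cov_beta

# Hypothetical toy data: 200 "genes" with correlated errors (assumed AR(1)-like structure).
rng = np.random.default_rng(0)
n = 200
idx = np.arange(n)
sigma = 0.3 ** np.abs(np.subtract.outer(idx, idx))
x = rng.standard_normal(n)                    # stand-in for cell-type-specific expression
noise = np.linalg.cholesky(sigma) @ rng.standard_normal(n)
y = 0.5 * x + noise                           # stand-in for gene-level test statistics
X = np.column_stack([np.ones(n), x])          # intercept + covariate

beta, cov_beta = gls_fit(X, y, sigma)
z_score = beta[1] / np.sqrt(cov_beta[1, 1])   # enrichment-style z-score for the slope
print(beta, z_score)
```

When `sigma` is the identity matrix, this reduces to ordinary least squares; a non-diagonal `sigma` down-weights information that is redundant across correlated genes.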

Cell type-specific histone acetylation profiling of Alzheimer’s Disease subjects and integration with genetics

10.1101/2020.03.26.010330 ◽

2020 ◽

Author(s):

Easwaran Ramamurthy ◽

Gwyneth Welch ◽

Jemmie Cheng ◽

Yixin Yuan ◽

Laura Gunsalus ◽

...

Keyword(s):

Alzheimer's Disease ◽

Late Onset ◽

Cell Types ◽

Brain Cell ◽

Nucleotide Polymorphisms ◽

Risk Genes ◽

Genome Wide ◽

Dorsolateral Prefrontal ◽

Genomic Regions

We profile genome-wide histone 3 lysine 27 acetylation (H3K27ac) of 3 major brain cell types from hippocampus and dorsolateral prefrontal cortex (dlPFC) of subjects with and without Alzheimer’s Disease (AD). We confirm that single nucleotide polymorphisms (SNPs) associated with late onset AD (LOAD) prefer to reside in the microglial histone acetylome, which varies most strongly with age. We observe acetylation differences associated with AD pathology at 3,598 peaks, predominantly in an oligodendrocyte-enriched population. Strikingly, these differences occur at the promoters of known early onset AD (EOAD) risk genes (APP, PSEN1, PSEN2, BACE1), late onset AD (LOAD) risk genes (BIN1, PICALM, CLU, ADAM10, ADAMTS4, SORL1 and FERMT2), and putative enhancers annotated to other genes associated with AD pathology (MAPT). More broadly, acetylation differences in the oligodendrocyte-enriched population occur near genes in pathways for central nervous system myelination and oxidative phosphorylation. In most cases, these promoter acetylation differences are associated with differences in transcription in oligodendrocytes. Overall, we reveal deregulation of known and novel pathways in AD and highlight genomic regions as therapeutic targets in oligodendrocytes of hippocampus and dlPFC.
