Direct prediction of regulatory elements from partial data without imputation

2019 ◽  
Author(s):  
Yu Zhang ◽  
Shaun Mahony

Abstract: Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. To analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the number of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Imputation approaches, which estimate missing regulatory signals, have previously been applied before genome segmentation. However, this strategy is computationally costly and propagates any imputation errors into the downstream genome segmentation results.

We present an extension to the IDEAS genome segmentation platform that can perform genome segmentation on incomplete collections of regulatory genomics datasets without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or on other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of the combinatorial regulatory patterns that appear on the human genome.
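The key idea of estimating each state's density from whichever marks were observed, rather than from imputed values, can be illustrated with a much simpler model. The sketch below is not IDEAS: it assumes a plain Gaussian mixture with diagonal covariances and values missing at random (NaN). Under those assumptions the marginal density of the observed marks within a state is just the product of their per-mark Gaussians, so missing marks drop out of the E-step, and the M-step uses expected sufficient statistics for them. The function name `em_gmm_missing` and the toy data are hypothetical.

```python
import numpy as np


def em_gmm_missing(X, k, n_iter=200, seed=0, eps=1e-6):
    """EM for a diagonal-covariance Gaussian mixture on data with NaN gaps.

    Missing entries are never imputed up front: the E-step scores each state
    using only the observed marks, and the M-step replaces each missing entry's
    sufficient statistics with their expectation under the current parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    obs = ~np.isnan(X)                      # True where a mark was measured
    Xz = np.where(obs, X, 0.0)              # placeholder zeros, always masked by obs
    o = obs[:, None, :].astype(float)       # (n, 1, d) for broadcasting over states

    idx = rng.choice(n, size=k, replace=False)
    mu = np.where(obs[idx], Xz[idx], np.nanmean(X, axis=0))   # (k, d) initial means
    var = np.tile(np.nanvar(X, axis=0) + eps, (k, 1))          # (k, d) initial variances
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: per-entry Gaussian log-density summed over observed entries only,
        # i.e. the log marginal density of the observed marks within each state.
        ll = -0.5 * (np.log(2 * np.pi * var)[None] + (Xz[:, None] - mu[None]) ** 2 / var[None])
        ll = (ll * o).sum(axis=2) + np.log(pi)[None]           # (n, k)
        resp = np.exp(ll - ll.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)                # state posteriors

        # M-step with expected sufficient statistics for the missing entries.
        nk = resp.sum(axis=0)[:, None] + 1e-12                 # (k, 1)
        ex = o * Xz[:, None] + (1 - o) * mu[None]              # E[x] under each state
        mu_new = (resp[:, :, None] * ex).sum(axis=0) / nk
        ex2 = o * (Xz[:, None] - mu_new[None]) ** 2 + (1 - o) * (
            var[None] + (mu[None] - mu_new[None]) ** 2
        )                                                      # E[(x - mu_new)^2]
        var = (resp[:, :, None] * ex2).sum(axis=0) / nk + eps
        mu, pi = mu_new, nk.ravel() / n

    return pi, mu, var, resp


# Toy example: two "states" over 3 marks, with about 30% of entries missing at random.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(200, 3)), rng.normal(5.0, 0.5, size=(200, 3))])
X[rng.random(X.shape) < 0.3] = np.nan
pi, mu, var, resp = em_gmm_missing(X, k=2)
```

Because the posteriors `resp` are computed without ever filling in the NaNs, the fitted state means could afterwards be used to predict missing marks, which mirrors in miniature the segment-then-impute order the abstract describes.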

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Seyed Ali Madani Tonekaboni ◽  
Benjamin Haibe-Kains ◽  
Mathieu Lupien

Abstract: The human genome is partitioned into a collection of genomic features, including genes, transposable elements, lamina-interacting regions, early-replicating control elements and cis-regulatory elements such as promoters, enhancers, and anchors of chromatin interactions. Uneven distribution of these features within chromosomes gives rise to clusters, such as topologically associating domains (TADs), lamina-associated domains, clusters of cis-regulatory elements or large organized chromatin lysine (K) domains (LOCKs). Here we show that LOCKs from diverse histone modifications discriminate primitive from differentiated cell types. Active LOCKs (H3K4me1, H3K4me3 and H3K27ac) cover a higher fraction of the genome in primitive than in differentiated cell types, while repressive LOCKs (H3K9me3, H3K27me3 and H3K36me3) do not. Active LOCKs in differentiated cells lie proximal to highly expressed genes, while active LOCKs in primitive cells tend to be bivalent. Genes proximal to bivalent LOCKs are minimally expressed in primitive cells. Furthermore, bivalent LOCKs populate TAD boundaries and are preferentially bound by regulators of chromatin interactions, including CTCF, RAD21 and ZNF143. Together, our results argue that LOCKs discriminate primitive from differentiated cell populations.
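The comparison of how much of the genome LOCKs cover can be made concrete with a small sketch. It assumes LOCKs have already been called and are given as half-open `(start, end)` intervals per chromosome; overlapping calls are merged first so that no base is counted twice. How LOCKs are called is not shown, and the function names and coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current block
        else:
            merged.append([start, end])               # start a new block
    return [tuple(iv) for iv in merged]


def genome_fraction(domains_by_chrom, chrom_sizes):
    """Fraction of the genome (sum of chrom_sizes) covered by the domains."""
    covered = sum(
        end - start
        for intervals in domains_by_chrom.values()
        for start, end in merge_intervals(intervals)
    )
    return covered / sum(chrom_sizes.values())


# Hypothetical LOCK calls on a toy two-chromosome genome.
locks = {"chr1": [(0, 100), (50, 200), (300, 400)], "chr2": [(10, 20)]}
sizes = {"chr1": 1000, "chr2": 1000}
frac = genome_fraction(locks, sizes)   # (200 + 100 + 10) / 2000 = 0.155
```

Computing this fraction separately for active and repressive marks in each cell type would give the kind of primitive-versus-differentiated comparison summarized above.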


2017 ◽  
Author(s):  
Can Wang ◽  
Shihua Zhang

Abstract: Histone modifications are widely recognized to play vital roles in gene regulation and cell identity. The Roadmap Epigenomics Consortium generated a reference catalogue of several key histone modifications across more than 100 human cell types and tissues. Decoding these epigenomes into functional regulatory elements is a challenging task in computational biology. To this end, we adopted a differential chromatin modification analysis framework to comprehensively determine and characterize cell type-specific regulatory elements (CSREs) and their histone modification codes in the human epigenomes of five histone modifications across 127 tissues or cell types. The CSREs are significantly associated with cell type-specific biological functions, diseases and cell identity. Clustering CSREs by their specificity signals reveals diverse histone codes, demonstrating that CSREs play diverse functional roles within the same cell type or tissue. Finally, the dynamics of CSREs across closely related cell types or tissues give a detailed view of developmental processes such as normal tissue development and cancer occurrence.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 121 ◽  
Author(s):  
Enrico Ferrero

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak link between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common and complex diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics aim to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.
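The workflow itself is written for Bioconductor in R. As a language-neutral illustration of its most basic step, finding which catalogued regulatory elements contain each disease-associated variant, a minimal Python sketch follows. It assumes 0-based variant positions and half-open `[start, end)` element coordinates; the function name, element names and variant IDs are hypothetical, and a production pipeline would use an interval-tree or range library instead of this linear filter.

```python
from bisect import bisect_right
from collections import defaultdict


def annotate_variants(variants, elements):
    """Map each variant ID to the names of regulatory elements containing it.

    variants: iterable of (variant_id, chrom, pos), 0-based positions.
    elements: iterable of (chrom, start, end, name), half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in elements:
        by_chrom[chrom].append((start, end, name))
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()                                   # sort elements by start coordinate
        starts[chrom] = [s for s, _, _ in ivs]

    hits = {}
    for vid, chrom, pos in variants:
        ivs = by_chrom.get(chrom, [])
        k = bisect_right(starts.get(chrom, []), pos)  # elements starting at or before pos
        hits[vid] = [name for s, e, name in ivs[:k] if e > pos]   # ...and ending after it
    return hits


elements = [
    ("chr1", 100, 200, "enhancer_A"),
    ("chr1", 150, 300, "promoter_B"),
    ("chr2", 0, 50, "enhancer_C"),
]
variants = [("var1", "chr1", 160), ("var2", "chr1", 250), ("var3", "chr2", 75), ("var4", "chr3", 10)]
hits = annotate_variants(variants, elements)
```

Here `var1` falls inside two overlapping elements, `var2` only inside the promoter, and `var3` and `var4` inside none, which shows why overlapping elements must all be checked rather than only the nearest one.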


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John A. Halsall ◽  
Simon Andrews ◽  
Felix Krueger ◽  
Charlotte E. Rutledge ◽  
Gabriella Ficz ◽  
...  

Abstract: Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTMs) of core histones seem to be involved in chromatin structural transitions, but how they act remains unclear. To explore this, we used ChIP-seq in two cell types, HeLa and lymphoblastoid cells (LCLs), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications: H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCLs but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared using 5 kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle-related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2/M, possibly because of ongoing transcription. In conclusion, the modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.
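A rolling-window comparison between cell cycle phases can be sketched as follows, assuming each ChIP-seq sample has been reduced to read positions on a single chromosome. Windows are `window` bp wide and advance by `step` bp, so choosing `step < window` gives overlapping (rolling) windows. Summarizing the similarity of two phases as a Pearson correlation of log-scaled counts is an assumption made here for illustration, not necessarily the statistic used in the study, and the function names are hypothetical.

```python
import numpy as np


def window_counts(read_positions, chrom_length, window=5_000, step=5_000):
    """Count reads whose position p satisfies start <= p < start + window."""
    pos = np.sort(np.asarray(read_positions))
    starts = np.arange(0, max(chrom_length - window, 0) + 1, step)
    left = np.searchsorted(pos, starts, side="left")            # first read >= start
    right = np.searchsorted(pos, starts + window, side="left")  # first read >= window end
    return right - left


def phase_similarity(counts_a, counts_b):
    """Pearson correlation of log1p-scaled window counts from two samples."""
    a, b = np.log1p(counts_a), np.log1p(counts_b)
    return float(np.corrcoef(a, b)[0, 1])


reads = [0, 4_999, 5_000, 12_000]
g1 = window_counts(reads, chrom_length=15_000)                    # non-overlapping 5 kb bins
rolling = window_counts(reads, chrom_length=15_000, step=2_500)   # 5 kb windows every 2.5 kb
```

With real data, `phase_similarity(window_counts(g1_reads, L), window_counts(g2_reads, L))` close to 1 would indicate a profile that changes little between phases; note that the correlation is undefined if either profile is constant.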

