Direct prediction of regulatory elements from partial data without imputation

10.1101/643486 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Zhang ◽

Shaun Mahony

Keyword(s):

Missing Data ◽

Histone Modifications ◽

Cell Types ◽

Regulatory Elements ◽

Data Imputation ◽

Regulatory State ◽

Partial Data ◽

Segmentation Analysis ◽

Regulatory Genomics ◽

Complex Picture

Abstract: Genome segmentation approaches allow us to characterize regulatory states in a given cell type using combinatorial patterns of histone modifications and other regulatory signals. In order to analyze regulatory state differences across cell types, current genome segmentation approaches typically require that the same regulatory genomics assays have been performed in all analyzed cell types. This necessarily limits both the numbers of cell types that can be analyzed and the complexity of the resulting regulatory states, as only a small number of histone modifications have been profiled across many cell types. Data imputation approaches that aim to estimate missing regulatory signals have been applied before genome segmentation. However, this approach is computationally costly and propagates any errors in imputation to produce incorrect genome segmentation results downstream. We present an extension to the IDEAS genome segmentation platform which can perform genome segmentation on incomplete regulatory genomics dataset collections without using imputation. Instead of relying on imputed data, we use an expectation-maximization approach to estimate marginal density functions within each regulatory state. We demonstrate that our genome segmentation results compare favorably with approaches based on imputation or other strategies for handling missing data. We further show that our approach can accurately impute missing data after genome segmentation, reversing the typical order of imputation/genome segmentation pipelines. Finally, we present a new 2D genome segmentation analysis of 127 human cell types studied by the Roadmap Epigenomics Consortium. By using an expanded set of chromatin marks that have been profiled in subsets of these cell types, our new segmentation results capture a more complex picture of combinatorial regulatory patterns that appear on the human genome.
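
The idea of fitting per-state marginal densities from incomplete data, then imputing afterwards, can be illustrated with a toy model. The sketch below is not the IDEAS implementation: it assumes a plain mixture of diagonal Gaussians with no positional or cross-cell-type structure, marks missing signals as NaN, and uses hypothetical function names (`fit_states_with_missing`, `impute_after_fit`). Under those assumptions, each row contributes to the likelihood only through the marks that were actually measured, and the missing entries are filled in after the states have been estimated.

```python
import numpy as np


def fit_states_with_missing(X, n_states, n_iter=50):
    """EM for a diagonal-Gaussian mixture; NaN entries are treated as missing."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # (n, d) mask of measured entries
    X0 = np.where(observed, X, 0.0)      # placeholder zeros, always masked out below

    # Deterministic initialisation (a simplification): spread the state means
    # over per-column quantiles of the observed values.
    q = (np.arange(n_states) + 0.5) / n_states
    means = np.nanquantile(X, q, axis=0)                              # (k, d)
    variances = np.tile(np.nanvar(X, axis=0) + 1e-6, (n_states, 1))   # (k, d)
    weights = np.full(n_states, 1.0 / n_states)

    for _ in range(n_iter):
        # E-step: sum log-densities over observed dimensions only, which is the
        # marginal density of the measured part of each row under each state.
        diff2 = (X0[:, None, :] - means[None, :, :]) ** 2
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                          + diff2 / variances[None, :, :])
        log_pdf = np.where(observed[:, None, :], log_pdf, 0.0)
        log_r = np.log(weights)[None, :] + log_pdf.sum(axis=2)
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: each dimension is updated from the rows where it was measured.
        w = resp[:, :, None] * observed[:, None, :]                   # (n, k, d)
        denom = w.sum(axis=0) + 1e-12
        means = (w * X0[:, None, :]).sum(axis=0) / denom
        diff2 = (X0[:, None, :] - means[None, :, :]) ** 2
        variances = (w * diff2).sum(axis=0) / denom + 1e-6
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        weights /= weights.sum()

    # resp holds the responsibilities from the final E-step.
    return weights, means, variances, resp


def impute_after_fit(X, means, resp):
    """Fill NaN entries with the responsibility-weighted state means."""
    X = np.asarray(X, dtype=float)
    expected = resp @ means              # (n, d) expected signal per row
    return np.where(np.isnan(X), expected, X)
```

Because the covariance is diagonal in this sketch, marginalising over a missing mark amounts to dropping its term from the likelihood, which is why no imputed value is needed during fitting. A model with correlated marks or spatial dependence between genomic positions would need a more careful treatment.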

Large organized chromatin lysine domains help distinguish primitive from differentiated cell populations

Nature Communications ◽

10.1038/s41467-020-20830-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Seyed Ali Madani Tonekaboni ◽

Benjamin Haibe-Kains ◽

Mathieu Lupien

Keyword(s):

Transposable Elements ◽

Histone Modifications ◽

Cell Types ◽

Regulatory Elements ◽

Cell Populations ◽

Genomic Features ◽

Differentiated Cells ◽

Chromatin Interactions ◽

Topologically Associating Domains ◽

Highly Expressed Genes

Abstract: The human genome is partitioned into a collection of genomic features, inclusive of genes, transposable elements, lamina interacting regions, early replicating control elements and cis-regulatory elements, such as promoters, enhancers, and anchors of chromatin interactions. Uneven distribution of these features within chromosomes gives rise to clusters, such as topologically associating domains (TADs), lamina-associated domains, clusters of cis-regulatory elements or large organized chromatin lysine (K) domains (LOCKs). Here we show that LOCKs from diverse histone modifications discriminate primitive from differentiated cell types. Active LOCKs (H3K4me1, H3K4me3 and H3K27ac) cover a higher fraction of the genome in primitive compared to differentiated cell types while repressive LOCKs (H3K9me3, H3K27me3 and H3K36me3) do not. Active LOCKs in differentiated cells lie proximal to highly expressed genes while active LOCKs in primitive cells tend to be bivalent. Genes proximal to bivalent LOCKs are minimally expressed in primitive cells. Furthermore, bivalent LOCKs populate TAD boundaries and are preferentially bound by regulators of chromatin interactions, including CTCF, RAD21 and ZNF143. Together, our results argue that LOCKs discriminate primitive from differentiated cell populations.

Large-scale determination and characterization of cell type-specific regulatory elements in the human genome

10.1101/176602 ◽

2017 ◽

Author(s):

Can Wang ◽

Shihua Zhang

Keyword(s):

Histone Modifications ◽

Large Scale ◽

Chromatin Modification ◽

Cell Types ◽

Regulatory Elements ◽

Cell Type ◽

Cell Identity ◽

Functional Roles ◽

Cancer Occurrence ◽

Cell Type Specific

Abstract: Histone modifications have been widely elucidated to play vital roles in gene regulation and cell identity. The Roadmap Epigenomics Consortium generated a reference catalogue of several key histone modifications across more than 100 human cell types and tissues. Decoding these epigenomes into functional regulatory elements is a challenging task in computational biology. To this end, we adopted a differential chromatin modification analysis framework to comprehensively determine and characterize cell type-specific regulatory elements (CSREs) and their histone modification codes in the human epigenomes of five histone modifications across 127 tissues or cell types. The CSREs show significant relevance to cell type-specific biological functions, diseases and cell identity. Clustering of CSREs with their specificity signals reveals diverse histone codes, demonstrating the diversity of functional roles of CSREs within the same cell or tissue. Last but not least, dynamics of CSREs from close cell types or tissues can give a detailed view of developmental processes such as normal tissue development and cancer occurrence.

Using regulatory genomics data to interpret the function of disease variants and prioritise genes from expression studies

F1000Research ◽

10.12688/f1000research.13577.1 ◽

2018 ◽

Vol 7 ◽

pp. 121 ◽

Cited By ~ 1

Author(s):

Enrico Ferrero

Keyword(s):

Gene Expression ◽

Large Scale ◽

Association Studies ◽

Cell Types ◽

Regulatory Elements ◽

New Drugs ◽

Genome Wide Association Studies ◽

Functional Interpretation ◽

Expression Studies ◽

Regulatory Genomics

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak linkage between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common and complex diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics are aiming to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.

Using regulatory genomics data to interpret the function of disease variants and prioritise genes from expression studies

F1000Research ◽

10.12688/f1000research.13577.2 ◽

2018 ◽

Vol 7 ◽

pp. 121

Author(s):

Enrico Ferrero

Keyword(s):

Gene Expression ◽

Large Scale ◽

Association Studies ◽

Cell Types ◽

Regulatory Elements ◽

New Drugs ◽

Genome Wide Association Studies ◽

Functional Interpretation ◽

Expression Studies ◽

Regulatory Genomics

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak linkage between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics are aiming to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.

244-OR: Single Nuclei Chromatin and Transcriptome Profiling across Human and Rat Skeletal Muscle Identifies Regulatory Elements, Genes, and Cell Types Associated with Diabetes GWAS Signals

Diabetes ◽

10.2337/db20-244-or ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 244-OR

Author(s):

Stephen Parker

Keyword(s):

Skeletal Muscle ◽

Transcriptome Profiling ◽

Cell Types ◽

Regulatory Elements ◽

Rat Skeletal Muscle

An Improved Novel Index Measured Segmentation Based Imputation Algorithm for Missing Data Imputation

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i6/0217 ◽

2017 ◽

Vol 7 (6) ◽

pp. 283-286

Author(s):

Priyadharsini C. ◽

Antony Selvadoss Thanamani

Keyword(s):

Missing Data ◽

Data Imputation ◽

Missing Data Imputation

A Two-stage Deep Autoencoder-based Missing Data Imputation Method for Wind Farm SCADA Data

IEEE Sensors Journal ◽

10.1109/jsen.2021.3061109 ◽

2021 ◽

pp. 1-1

Author(s):

Xin Liu ◽

Zijun Zhang

Keyword(s):

Missing Data ◽

Wind Farm ◽

Imputation Method ◽

Data Imputation ◽

Two Stage ◽

Missing Data Imputation

Cooperative Clustering Missing Data Imputation

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽

10.1109/smc42975.2020.9283484 ◽

2020 ◽

Author(s):

Daoming Wan ◽

Roozbeh Razavi-Far ◽

Mehrdad Saif

Keyword(s):

Missing Data ◽

Data Imputation ◽

Missing Data Imputation ◽

Cooperative Clustering

Spatio-Temporal Missing Data Imputation for Smart Power Grids

Proceedings of the Twelfth ACM International Conference on Future Energy Systems ◽

10.1145/3447555.3466586 ◽

2021 ◽

Author(s):

Sanmukh R. Kuppannagari ◽

Yao Fu ◽

Chung Ming Chueng ◽

Viktor K. Prasanna

Keyword(s):

Missing Data ◽

Power Grids ◽

Data Imputation ◽

Smart Power ◽

Missing Data Imputation ◽

Smart Power Grids ◽

Spatio Temporal

Histone modifications form a cell-type-specific chromosomal bar code that persists through the cell cycle

Scientific Reports ◽

10.1038/s41598-021-82539-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

John A. Halsall ◽

Simon Andrews ◽

Felix Krueger ◽

Charlotte E. Rutledge ◽

Gabriella Ficz ◽

...

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Histone Modifications ◽

Expression Patterns ◽

Cell Types ◽

Cell Type ◽

Bar Code ◽

Genes Encoding ◽

Cell Type Specific ◽

Rolling Windows

Abstract: Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. In conclusion, modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.
