Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting

M. Blanchette

doi:10.1101/gr.6902

Haplotype-aware single-cell multiomics uncovers functional effects of somatic structural variation

10.1101/2021.11.11.468039 ◽

2021 ◽

Author(s):

Hyobin Jeong ◽

Karen Grimes ◽

Peter-Martin Bruch ◽

Tobias Rausch ◽

Patrick Hasenfeld ◽

...

Keyword(s):

Single Cell ◽

Nucleosome Occupancy ◽

Single Cells ◽

Chromosomal Rearrangements ◽

Regulatory Elements ◽

Computational Method ◽

Tumour Heterogeneity ◽

Cancer Genomes ◽

Functional Consequences ◽

Oncogenic Transcription Factor

Somatic structural variants (SVs) are widespread in cancer genomes; however, their impact on tumorigenesis and intra-tumour heterogeneity is incompletely understood, since methods to functionally characterize the broad spectrum of SVs arising in cancerous single cells are lacking. We present a computational method, scNOVA, that couples SV discovery with nucleosome occupancy analysis by haplotype-resolved single-cell sequencing, to systematically uncover SV effects on cis-regulatory elements and gene activity. Application to leukemias and cell lines uncovered SV outcomes at several loci, including dysregulated cancer-related pathways and mono-allelic oncogene expression near SV breakpoints. At the intra-patient level, we identified different yet overlapping subclonal SVs that converge on aberrant Wnt signaling. We also deconvoluted the effects of catastrophic chromosomal rearrangements resulting in oncogenic transcription factor dysregulation. scNOVA directly links SVs to their functional consequences, opening the door for single-cell multiomics of SVs in heterogeneous cell populations.


CONREAL: Conserved Regulatory Elements Anchored Alignment Algorithm for Identification of Transcription Factor Binding Sites by Phylogenetic Footprinting

Genome Research ◽

10.1101/gr.1642804 ◽

2003 ◽

Vol 14 (1) ◽

pp. 170-178 ◽

Cited By ~ 53

Author(s):

E. Berezikov

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Transcription Factor Binding Sites ◽

Phylogenetic Footprinting ◽

Regulatory Elements ◽

Transcription Factor Binding ◽

Alignment Algorithm ◽

Factor Binding


EpiSAFARI: Sensitive detection of valleys in epigenetic signals for enhancing annotations of functional elements

Bioinformatics ◽

10.1093/bioinformatics/btz702 ◽

2019 ◽

Author(s):

Arif Harmanci ◽

Akdes Serin Harmanci ◽

Jyothishmathi Swaminathan ◽

Vidya Gopalakrishnan

Keyword(s):

Transcription Factor ◽

Regulatory Elements ◽

Transcription Factor Binding ◽

Computational Method ◽

Sensitive Detection ◽

Supplementary Information ◽

Chip Sequencing ◽

Factor Binding ◽

Nucleotide Resolution ◽

Systematic Identification

Motivation: Functional genomics experiments generate genome-wide signal profiles that are dense information sources for annotating regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. The most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines.

Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell-line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys.

Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI

Supplementary information: Supplementary data are available at Bioinformatics online.
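To make the idea of a "valley" in the EpiSAFARI abstract concrete, here is a minimal sketch of valley calling on a one-dimensional signal. It is not EpiSAFARI's algorithm: the abstract does not specify the smoothing method, so a plain moving average stands in for it as an assumption, and the function names, the `window` and `min_depth` parameters, and the depth criterion are all hypothetical. Technical factors such as mappability and nucleotide content are ignored.

```python
from typing import List, Tuple


def moving_average(signal: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def find_valleys(
    signal: List[float], window: int = 5, min_depth: float = 1.0
) -> List[Tuple[int, int, int]]:
    """Return (left_summit, dip, right_summit) index triples.

    A dip is a strict local minimum of the smoothed signal. Its summits are
    found by climbing outward while the signal keeps strictly rising. The
    valley is kept if the lower summit exceeds the dip by at least min_depth.
    """
    s = moving_average(signal, window)
    valleys = []
    for i in range(1, len(s) - 1):
        if not (s[i - 1] > s[i] < s[i + 1]):
            continue  # not a strict local minimum
        left = i
        while left > 0 and s[left - 1] > s[left]:
            left -= 1
        right = i
        while right < len(s) - 1 and s[right + 1] > s[right]:
            right += 1
        depth = min(s[left], s[right]) - s[i]
        if depth >= min_depth:
            valleys.append((left, i, right))
    return valleys
```

Calling `find_valleys(profile, window=5, min_depth=1.0)` on a list of per-position signal values returns one triple per retained valley. Larger `window` values suppress noise at the cost of merging narrow valleys.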


Structure-aided prediction of mammalian transcription factor complexes in conserved non-coding elements

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2013.0029 ◽

2013 ◽

Vol 368 (1632) ◽

pp. 20130029 ◽

Cited By ~ 24

Author(s):

Harendra Guturu ◽

Andrew C. Doxey ◽

Aaron M. Wenger ◽

Gill Bejerano

Keyword(s):

Transcription Factor ◽

Binding Site ◽

Protein Interactions ◽

Binding Sites ◽

Three Dimensional ◽

Structural Data ◽

Regulatory Elements ◽

Computational Method ◽

Web Resource ◽

Three Dimensional Models

Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF–TF–DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein–protein interactions, potentially indirect interactions and ‘through-DNA’ interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.
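The comparison of motif spacings in conserved versus unconserved regions can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' procedure: the abstract does not give the statistic used, so a pseudocount-smoothed frequency ratio is substituted here, and the function name, inputs, and `pseudocount` parameter are hypothetical. Structural plausibility checks and false discovery rate control are out of scope.

```python
from collections import Counter
from typing import Dict, List


def spacing_enrichment(
    conserved: List[int], unconserved: List[int], pseudocount: float = 1.0
) -> Dict[int, float]:
    """Ratio of spacing frequencies, conserved over unconserved.

    Each input lists the spacing (in bp) between co-occurring instances of
    two motifs, one value per instance pair. A ratio well above 1 suggests a
    spacing that is preferentially retained in conserved sequence.
    """
    c, u = Counter(conserved), Counter(unconserved)
    spacings = sorted(set(c) | set(u))
    k = len(spacings)
    # Add the pseudocount to every spacing class so no frequency is zero.
    c_total = len(conserved) + pseudocount * k
    u_total = len(unconserved) + pseudocount * k
    return {
        s: ((c[s] + pseudocount) / c_total) / ((u[s] + pseudocount) / u_total)
        for s in spacings
    }
```

A real analysis would also need a significance test and multiple-testing correction before calling any spacing constrained.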


Shortening of 3' UTRs in most cell types composing tumor tissues implicates alternative polyadenylation in protein metabolism

10.1101/2021.06.30.450496 ◽

2021 ◽

Author(s):

Dominik Burri ◽

Mihaela Zavolan

Keyword(s):

T Cell Activation ◽

Cell Activation ◽

Alternative Polyadenylation ◽

Cell Types ◽

Regulatory Elements ◽

Computational Method ◽

Control Tissue ◽

Mrna Maturation ◽

The Individual ◽

And Control

During pre-mRNA maturation, 3' end processing can occur at different polyadenylation sites in the 3' untranslated region (3' UTR) to give rise to transcript isoforms that differ in the length of their 3' UTRs. Longer 3' UTRs contain additional cis-regulatory elements that impact the fate of the transcript and/or of the resulting protein. Extensive alternative polyadenylation (APA) has been observed in cancers, but the mechanisms and roles remain elusive. In particular, it is unclear whether the APA occurs in the malignant cells or in other cell types that infiltrate the tumor. To resolve this, we developed a computational method, called SCUREL, that quantifies changes in 3' UTR length between groups of cells, including cells of the same type originating from tumor and control tissue. We used this method to study APA in human lung adenocarcinoma (LUAD). SCUREL relies solely on annotated 3' UTRs and, on control systems such as T cell activation and spermatogenesis, gives qualitatively similar results at much greater sensitivity compared to the previously published scAPA method. In the LUAD samples, we find a general trend towards 3' UTR shortening not only in cancer cells compared to the cell type of origin, but also when comparing other cell types from the tumor vs. the control tissue environment. However, we also find high variability in the individual targets between patients. The findings help to understand the extent and impact of APA in LUAD, which may support improvements in diagnosis and treatment.


Identification of significant chromatin contacts from HiChIP data by FitHiChIP

Nature Communications ◽

10.1038/s41467-019-11950-y ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 19

Author(s):

Sourya Bhattacharyya ◽

Vivek Chandra ◽

Pandurangan Vijayanand ◽

Ferhat Ay

Keyword(s):

Genetic Variants ◽

Statistical Significance ◽

Regulatory Elements ◽

Computational Method ◽

Genomic Distance ◽

Uniform Coverage

HiChIP/PLAC-seq is becoming increasingly popular for profiling 3D chromatin contacts among regulatory elements and for annotating functions of genetic variants. Here we describe FitHiChIP, a computational method for loop calling from HiChIP/PLAC-seq data, which jointly models the non-uniform coverage and genomic distance scaling of contact counts to compute statistical significance estimates. We also develop a technique to filter putative bystander loops that can be explained by stronger adjacent loops. Compared to existing methods, FitHiChIP performs better in recovering contacts reported by Hi-C, promoter capture Hi-C and ChIA-PET experiments and in capturing previously validated promoter-enhancer interactions. FitHiChIP loop calls are reproducible among replicates and are consistent across different experimental settings. Our work also provides a framework for differential HiChIP analysis with an option to utilize ChIP-seq data for further characterizing differential loops. Even though designed for HiChIP, FitHiChIP is also applicable to other conformation capture assays.


Faculty Opinions recommendation of Phylogenetic footprinting reveals multiple regulatory elements involved in control of the meiotic recombination gene, REC102.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1004072.50154 ◽

2002 ◽

Author(s):

David Catcheside

Keyword(s):

Meiotic Recombination ◽

Phylogenetic Footprinting ◽

Regulatory Elements ◽

Recombination Gene


Faculty Opinions recommendation of Phylogenetic footprinting reveals multiple regulatory elements involved in control of the meiotic recombination gene, REC102.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1004072.46355 ◽

2002 ◽

Author(s):

Michael Eisen

Keyword(s):

Meiotic Recombination ◽

Phylogenetic Footprinting ◽

Regulatory Elements ◽

Recombination Gene


Graph-based data integration predicts long-range regulatory interactions across the human genome

10.1101/004622 ◽

2014 ◽

Cited By ~ 1

Author(s):

Sofie Demeyer ◽

Tom Michoel

Keyword(s):

Gene Expression ◽

Long Range ◽

Regulation Of Gene Expression ◽

Cell Types ◽

Exon Array ◽

Regulatory Elements ◽

Computational Method ◽

Open Chromatin ◽

Transcription Start Sites ◽

Distal Regulatory Elements

Transcriptional regulation of gene expression is one of the main processes that affect cell diversification from a single set of genes. Regulatory proteins often interact with DNA regions located distally from the transcription start sites (TSS) of the genes. We developed a computational method that combines open chromatin and gene expression information for a large number of cell types to identify these distal regulatory elements. Our method builds correlation graphs for publicly available DNase-seq and exon array datasets with matching samples and uses graph-based methods to filter findings supported by multiple datasets and remove indirect interactions. The resulting set of interactions was validated with both anecdotal information of known long-range interactions and unbiased experimental data deduced from Hi-C and CAGE experiments. Our results provide a novel set of high-confidence candidate open chromatin regions involved in gene regulation, often located several Mb away from the TSS of their target gene.
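The correlation-graph step in the abstract above can be sketched as follows. This is a toy under stated assumptions, not the authors' pipeline: the abstract does not name the correlation measure or thresholds, so Pearson correlation and the `threshold` parameter are assumptions, and all function names and data layouts are hypothetical. Removal of indirect interactions is not attempted.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_edges(
    accessibility: Dict[str, List[float]],
    expression: Dict[str, List[float]],
    threshold: float = 0.9,
) -> Set[Tuple[str, str]]:
    """Link each open chromatin region to each gene whose expression tracks
    its accessibility across matched cell types (same sample order in both)."""
    return {
        (region, gene)
        for region, acc in accessibility.items()
        for gene, expr in expression.items()
        if pearson(acc, expr) >= threshold
    }


def supported_by_all(edge_sets: List[Set[Tuple[str, str]]]) -> Set[Tuple[str, str]]:
    """Keep only region-gene links found in every dataset."""
    if not edge_sets:
        return set()
    return set.intersection(*edge_sets)
```

Building one edge set per dataset and passing them to `supported_by_all` mirrors, in a simplified form, the idea of keeping findings supported by multiple datasets.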


Probabilities of Fitness Consequences for Point Mutations Across the Human Genome

10.1101/006825 ◽

2014 ◽

Cited By ~ 2

Author(s):

Brad Gulko ◽

Ilan Gronau ◽

Melissa J Hubisz ◽

Adam Siepel

Keyword(s):

Human Genome ◽

Point Mutations ◽

Cell Types ◽

Regulatory Elements ◽

Computational Method ◽

Fitness Consequences ◽

A Genome ◽

Public Data ◽

Fitness Consequence ◽

Functional Content

We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct "fingerprints" based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2-7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and, in contrast to several recent studies, they suggest that recent evolutionary turnover has had a limited impact on the functional content of the genome.
