Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales

2016, Vol 6 (4)
Author(s): Long Qian, Edo Kussell

2017, Vol 46 (1), pp. 54-70
Author(s): Shandar Ahmad, Philip Prathipati, Lokesh P Tripathi, Yi-An Chen, Ajay Arya, ...

2007, Vol 36 (1), pp. e8-e8
Author(s): Jue Zeng, Jizhou Yan, Ting Wang, Deborah Mosbrook-Davis, Kyle T. Dolan, ...

2016
Author(s): Long Qian, Edo Kussell

Abstract: Ectopic DNA binding by transcription factors and other DNA binding proteins can be detrimental to cellular functions and ultimately to organismal fitness. The frequency of protein-DNA binding at non-functional sites depends on the global composition of a genome with respect to all possible short motifs, or k-mer words. To determine whether weak yet ubiquitous protein-DNA interactions could exert significant evolutionary pressures on genomes, we correlate in vitro measurements of binding strengths on all 8-mer words from a large collection of transcription factors, in several different species, against their relative genomic frequencies. Our analysis reveals a clear signal of purifying selection to reduce the large number of weak binding sites genome-wide. This evolutionary process, which we call global selection, has a detectable hallmark in that similar words experience similar evolutionary pressure, a consequence of the biophysics of protein-DNA binding. By analyzing a large collection of genomes, we show that global selection exists in all domains of life, and operates through tiny selective steps, maintaining genomic binding landscapes over long evolutionary timescales.
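The correlation described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, the "relative genomic frequency" is assumed here to be the observed k-mer count divided by the count expected from single-base composition, and Spearman rank correlation is an assumed choice of statistic. The abstract does not specify either the normalization or the statistic.

```python
from collections import Counter
from math import sqrt


def kmer_counts(genome, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for i in range(len(genome) - k + 1):
        word = genome[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def relative_frequency(word, counts, base_freq, total):
    """Observed count over the count expected from base composition alone.

    Assumption: an independent-base null model; the paper may normalize differently.
    """
    expected = total
    for base in word:
        expected *= base_freq[base]
    return counts[word] / expected if expected > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def binding_vs_frequency(genome, binding_scores, k):
    """Correlate per-word binding scores with relative genomic frequencies.

    binding_scores: hypothetical dict mapping each k-mer to a measured
    in vitro binding strength for one protein. Assumes a non-empty genome.
    """
    counts = kmer_counts(genome, k)
    total = sum(counts.values())
    bases = Counter(b for b in genome if b in "ACGT")
    n_bases = sum(bases.values())
    base_freq = {b: bases[b] / n_bases for b in "ACGT"}
    words = sorted(binding_scores)
    freqs = [relative_frequency(w, counts, base_freq, total) for w in words]
    scores = [binding_scores[w] for w in words]
    return spearman(scores, freqs)
```

Under these assumptions, a negative return value for a given protein would mean that more strongly bound words are rarer than base composition predicts, which is the direction the abstract attributes to purifying selection. A real analysis would also need to handle reverse complements and functional binding sites, which this sketch ignores.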


2020
Author(s): Soraya Shehata, Savannah Spradlin, Alison Swearingen, Graycen Wheeler, Arpan Das, ...

Abstract: A key aspect in defining cell state is the complex choreography of DNA binding events in a given cell type, which in turn establishes a cell-specific gene-expression program. In the two decades since the sequencing of the human genome, a deluge of genome-wide experiments has measured gene expression and DNA binding events across numerous cell types and tissues. Here we re-analyze ENCODE data in a highly reproducible manner by utilizing standardized analysis pipelines, containerization, and literate programming with Rmarkdown. Our approach validated many findings from previous independent studies, underscoring the importance of ENCODE’s goals in providing these reproducible data resources. This approach also revealed several new findings: (i) 1,362 promoters, termed ‘reservoirs,’ have up to 111 different DNA-binding proteins localized on one promoter yet show no expression of steady-state RNA; (ii) the human-specific SVA repeat element may have been co-opted for enhancer regulation. Collectively, this study, performed by the students of a CU Boulder computational biology class (BCHM 5631 – Spring 2020), demonstrates the value of reproducible findings and how resources like ENCODE that prioritize data standards can foster new findings with existing data in a didactic environment.

