Heterogeneity of Transcription Factor binding specificity models within and across cell lines
Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA-binding interaction partners, which motivates genomic-context-based models of TF occupancy. However, existing approaches have assumed a single, uniform binding model to explain all genome-wide bound sites of a TF in a given cell type. As a result, neither the heterogeneity of TF occupancy models nor the extent to which the binding rules underlying a TF's occupancy are shared across cell types has been investigated. Here, we develop an ensemble-based approach, TRISECT, to identify heterogeneous binding rules of cell-type-specific TF occupancy and to analyze the sharing of such rules across cell types. A comprehensive analysis of 23 TFs, each with ChIP-seq data in 4 to 12 cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy (93%), substantially improving upon previous methods. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type-specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, widespread heterogeneity of binding rules (each comprising a set of interaction partners) within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.
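The contrast between a uniform binding model and a heterogeneous set of binding rules can be made concrete with a toy sketch. This is not TRISECT, whose model the abstract does not specify; it assumes a deliberately simplified representation in which each bound site is reduced to the set of partner-TF motifs found near it, a binding rule is a set of partners that must co-occur, a uniform model uses a single rule, and a heterogeneous ensemble predicts occupancy when any one of several rules holds. Inter-cell-type sharing is scored here with a Jaccard index over rule sets, which is our assumption rather than the paper's measure. The partner names (`P1`, `P2`, ...) and the functions `satisfies`, `predict_bound`, and `rule_sharing` are hypothetical.

```python
from typing import FrozenSet, Iterable, Set

# Hypothetical representation: a binding rule is a set of partner-TF motifs
# that must all co-occur near a site for the rule to explain its occupancy.
Rule = FrozenSet[str]


def satisfies(site_partners: Set[str], rule: Rule) -> bool:
    """True if every partner motif in `rule` is present near the site."""
    return rule <= site_partners


def predict_bound(site_partners: Set[str], rules: Iterable[Rule]) -> bool:
    """Toy ensemble: a site is predicted bound if ANY rule explains it.

    Passing a single rule corresponds to a uniform binding model.
    """
    return any(satisfies(site_partners, rule) for rule in rules)


def rule_sharing(rules_a: Set[Rule], rules_b: Set[Rule]) -> float:
    """Jaccard overlap between the rule sets of two cell types (assumed metric)."""
    union = rules_a | rules_b
    return len(rules_a & rules_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Partner motifs near three hypothetical bound sites in one cell type.
    sites = [{"P1", "P2", "P5"}, {"P3"}, {"P4", "P6"}]

    uniform = [frozenset({"P1", "P2"})]
    cell_type_x = {frozenset({"P1", "P2"}), frozenset({"P3"}), frozenset({"P4"})}
    cell_type_y = {frozenset({"P3"}), frozenset({"P4"}), frozenset({"P7", "P8"})}

    print([predict_bound(s, uniform) for s in sites])      # [True, False, False]
    print([predict_bound(s, cell_type_x) for s in sites])  # [True, True, True]
    print(rule_sharing(cell_type_x, cell_type_y))          # 0.5
```

In this toy setting the single rule explains only one of the three sites, while three distinct rules explain all of them, and two of the four distinct rules across the two hypothetical cell types are shared.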