LR-DNase: Predicting TF binding from DNase-seq data

Mapping Intimacies

10.1101/082594

2016

Author(s): Arjan van der Velde, Michael Purcaro, William Stafford Noble, Zhiping Weng

Keyword(s): Logistic Regression, Binding Sites, Genomic Sequence, Regulation Of Gene Expression, Cell Types, Computational Method, Binding Prediction, Logistic Regression Models, Motif Score, Genomic Locations

Abstract: Transcription factors (TFs) play a key role in the regulation of gene expression. Hypersensitivity to DNase I cleavage has long been used to gauge the accessibility of genomic DNA for TF binding and as an indicator of regulatory genomic locations. An increasing amount of ChIP-seq data on a large number of TFs is being generated, mostly in a small number of cell types, whereas DNase-seq data are being produced for hundreds of cell types. We aimed to develop a computational method that combines ChIP-seq and DNase-seq data to predict TF binding sites in a wide variety of cell types. We trained and tested a logistic regression model, called LR-DNase, to predict binding sites for a specific TF using seven features derived from DNase-seq and genomic sequence. We calculated the area under the precision-recall curve at a false discovery rate (FDR) cutoff of 0.5 for the LR-DNase model, for several logistic regression models with fewer features, and for several existing state-of-the-art TF binding prediction methods. The LR-DNase model outperformed existing unsupervised and supervised methods. Additionally, for many TFs, a model that uses only two features, DNase-seq reads and motif score, was sufficient to match the performance of the best existing methods.
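
The models above are compared by the area under the precision-recall curve at an FDR cutoff of 0.5. A minimal pure-Python sketch of one way to compute such a partial area, assuming FDR = 1 - precision and step-wise (average-precision-style) integration; the function name `partial_aupr` is hypothetical and the paper's exact integration scheme may differ.

```python
def partial_aupr(scores, labels, max_fdr=0.5):
    """Area under the precision-recall curve, counting only operating points
    whose false discovery rate (FDR = 1 - precision) is at most max_fdr.

    Uses step-wise integration: each true positive reached while scanning
    the ranked list adds precision * (1 / n_pos) to the area.
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("need at least one positive (bound) site")
    # Rank candidate sites from highest to lowest predicted binding score
    # (the sort is stable, so tied sites keep their input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    area = 0.0
    for _, is_bound in ranked:
        if is_bound:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        # Recall only increases at a true positive; add that step if the
        # current operating point satisfies the FDR cutoff.
        if is_bound and 1.0 - precision <= max_fdr:
            area += precision / n_pos
    return area
```

For example, with scores [0.9, 0.8, 0.7, 0.6] and labels [1, 0, 1, 0], the first bound site is recalled at precision 1 and the second at precision 2/3 (FDR 1/3), giving an area of 1/2 + 1/3, about 0.83.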

Graph-based data integration predicts long-range regulatory interactions across the human genome

10.1101/004622

2014

Cited By ~ 1

Author(s): Sofie Demeyer, Tom Michoel

Keyword(s): Gene Expression, Long Range, Regulation Of Gene Expression, Cell Types, Exon Array, Regulatory Elements, Computational Method, Open Chromatin, Transcription Start Sites, Distal Regulatory Elements

Transcriptional regulation of gene expression is one of the main processes underlying cell diversification from a single set of genes. Regulatory proteins often interact with DNA regions located distally from the transcription start sites (TSS) of genes. We developed a computational method that combines open chromatin and gene expression information for a large number of cell types to identify these distal regulatory elements. Our method builds correlation graphs for publicly available DNase-seq and exon array datasets with matching samples and uses graph-based methods to retain findings supported by multiple datasets and to remove indirect interactions. The resulting set of interactions was validated with both anecdotal information on known long-range interactions and unbiased experimental data derived from Hi-C and CAGE experiments. Our results provide a novel set of high-confidence candidate open chromatin regions involved in gene regulation, often located several Mb away from the TSS of their target gene.
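
As a minimal sketch of the first step, building a correlation graph between open-chromatin regions and genes across matched samples, the code below links a DNase-seq region to a gene when their profiles across samples have a Pearson correlation at or above a threshold. The correlation measure, the 0.8 cutoff, and the function names are assumptions for illustration; the graph-based filtering of indirect interactions described in the abstract is not reproduced.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treat as no signal
    return cov / (sx * sy)

def correlation_edges(accessibility, expression, threshold=0.8):
    """Link each open-chromatin region to each gene whose profile across
    matched samples correlates at or above `threshold` (hypothetical cutoff).

    accessibility: {region_id: [DNase signal per sample]}
    expression:    {gene_id: [expression per sample]}, same sample order
    Returns a sorted list of (region_id, gene_id, r) edges.
    """
    edges = []
    for region, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                edges.append((region, gene, r))
    return sorted(edges)
```

Requiring matched samples is what makes this possible: each region and gene must be measured in the same cell types, in the same order, for the correlation to be meaningful.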

Low Affinity Binding Sites in an Activating CRM Mediate Negative Autoregulation of the Drosophila Hox Gene Ultrabithorax

10.1101/744631

2019

Author(s): Rebecca K Delker, Vikram Ranade, Ryan Loker, Roumen Voutev, Richard S Mann

Keyword(s): Gene Expression, Binding Sites, Transcriptional Control, Regulation Of Gene Expression, Cell Types, Transcriptional Level, Regulation Of Transcription, Control Of Gene Expression, Hox Gene, Endogenous Locus

Abstract: Specification of cell identity and the proper functioning of a mature cell depend on precise regulation of gene expression. Both binary ON/OFF regulation of transcription and more fine-tuned control of transcription levels in the ON state are required to define cell types. The Drosophila melanogaster Hox gene Ultrabithorax (Ubx) exhibits both of these modes of control during development. While ON/OFF regulation is needed to specify the fate of the developing wing (Ubx OFF) and haltere (Ubx ON), the levels of Ubx within the haltere differ between compartments along the proximal-distal axis. Here, we identify and molecularly dissect the novel contribution of a previously identified Ubx cis-regulatory module (CRM), anterobithorax (abx), to a negative autoregulatory loop that maintains lower Ubx expression in the proximal compartment of the haltere than in the distal compartment. We find that Ubx, in complex with the known Hox cofactors Homothorax (Hth) and Extradenticle (Exd), acts through low-affinity Ubx-Exd binding sites to reduce the levels of Ubx transcription in the proximal compartment. Importantly, we also show that Ubx-Exd binding-site mutations that are sufficient to de-repress abx activity in the proximal haltere in a transgenic context are not sufficient to de-repress Ubx expression when introduced at the endogenous locus, suggesting the presence of multiple mechanisms through which Ubx-mediated repression occurs.
Our results underscore the complementary nature of CRM analysis through transgenic reporter assays and genome modification of the endogenous locus, but they also highlight the increasing need to understand gene regulation within the native context to capture the potential input of multiple genomic elements on gene control.

Author Summary: One of the most fundamental questions in biology is how information encoded in DNA is translated into the diversity of cell types that exist within a multicellular organism, each with the same genome. Regulation at the transcriptional level, mediated through the activity of transcription factors bound to cis-regulatory modules (CRMs), plays a key role in this process. While we typically distinguish cell types by the specific subset of genes that are transcriptionally ON or OFF, it is also important to consider the more fine-tuned transcriptional control of gene expression levels. We focus on the regulatory logic of the Hox developmental regulator Ultrabithorax (Ubx) in fruit flies, which exhibits both forms of transcriptional control. While ON/OFF control of Ubx is required to define the different appendage fates of the T2 and T3 thoracic segments (wing and haltere, respectively), more fine-tuned control of transcription levels is observed in distinct compartments within the T3 appendage itself, in which all cells are in a Ubx ON state. Through genetic analysis of regulatory inputs, and dissection of a Ubx CRM in a transgenic context and at the endogenous locus, we reveal a compartment-specific negative autoregulatory loop that dampens Ubx transcription to maintain distinct transcriptional levels within a single developing tissue.

SACSANN: identifying sequence-based determinants of chromosomal compartments

10.1101/2020.10.06.328039

2020

Author(s): Julie A Prost, Christopher JF Cameron, Mathieu Blanchette

Keyword(s): Machine Learning, Binding Sites, Genomic Organization, Genomic Sequence, Cell Types, Mouse Cell, Cell Type, 3D Genome, Cell Type Specific, Human And Mouse

Genomic organization is critical for proper gene regulation and is described by a hierarchical model in which chromosomes are segmented into megabase-sized, cell-type-specific transcriptionally active (A) and inactive (B) compartments. Here, we describe SACSANN, a machine learning pipeline consisting of stacked artificial neural networks that predicts compartment annotations solely from genomic sequence-based features such as predicted transcription factor binding sites and transposable elements. SACSANN provides accurate, cell-type-specific compartment predictions while identifying key genomic sequence determinants associated with A/B compartments. Models are shown to be largely transferable across analogous human and mouse cell types. By enabling the study of chromosome compartmentalization in species for which no Hi-C data are available, SACSANN paves the way toward the study of 3D genome evolution. SACSANN is publicly available on GitHub: https://github.com/BlanchetteLab/SACSANN
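
SACSANN is described as a set of stacked neural networks. In a typical stacking design, a first model scores each genomic bin independently and a second model refines each call using first-level predictions from the surrounding bins, which suits megabase-scale compartments that span many consecutive bins. The sketch below builds that second-level input; the windowing scheme, the 0.5 padding value, and the helper name `neighbor_features` are assumptions, not SACSANN's documented design.

```python
def neighbor_features(first_level_probs, window=2, pad=0.5):
    """Build second-level inputs for a stacked compartment classifier
    (hypothetical layout).

    first_level_probs: per-bin probabilities of the A compartment from a
    first-level model, in genomic order along one chromosome.
    Each output row holds the probabilities of the bin itself and of its
    `window` neighbours on each side; positions past the chromosome ends
    are filled with `pad` (0.5 = no information either way).
    """
    n = len(first_level_probs)
    rows = []
    for i in range(n):
        row = [
            first_level_probs[j] if 0 <= j < n else pad
            for j in range(i - window, i + window + 1)
        ]
        rows.append(row)
    return rows
```

Each row would then be passed, possibly alongside the bin's own sequence features, to the second network, so that an isolated first-level call can be overruled by its neighbours.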

NetTIME: improving multitask transcription factor binding site prediction with base-pair resolution

10.1101/2021.05.29.446316

2021

Author(s): Ren Yi, Kyunghyun Cho, Richard Bonneau

Keyword(s): Transcription Factor, Base Pair, Binding Sites, Learning Strategy, Cell Types, Multitask Learning, Cell Type, Binding Prediction, Single Task, Cell Type Specific

Machine learning models for predicting cell-type-specific transcription factor (TF) binding sites have become increasingly accurate thanks to the greater availability of next-generation sequencing data and more standardized model evaluation criteria. However, knowledge transfer from data-rich to data-limited TFs and cell types remains crucial for improving TF binding prediction models, because available binding labels are highly skewed towards a small collection of TFs and cell types. Transfer prediction of TF binding sites can potentially benefit from a multitask learning approach; however, existing methods typically use shallow single-task models to generate low-resolution predictions. Here we propose NetTIME, a multitask learning framework for predicting cell-type-specific TF binding sites with base-pair resolution. We show that the multitask learning strategy for TF binding prediction is more efficient than the single-task approach because of the increased data availability. NetTIME trains high-dimensional embedding vectors to distinguish TF and cell-type identities. We show that this approach is critical for the success of the multitask learning strategy and allows our model to make accurate transfer predictions within and beyond the training panels of TFs and cell types. We additionally train a linear-chain conditional random field (CRF) to classify binding predictions and show that this CRF eliminates the need to set a probability threshold and reduces classification noise. We compare our method's predictive performance with several state-of-the-art methods, including DeepBind, BindSpace, and Catchitt, and show that our method outperforms previous methods under both supervised and transfer learning settings.
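
The abstract states that a linear-chain CRF removes the need for a probability threshold. The core idea can be sketched as Viterbi decoding over two labels (unbound = 0, bound = 1) per base: per-base log-probabilities act as emission scores, and a penalty for switching labels between adjacent bases discourages isolated calls. This is a simplified, hand-parameterized stand-in (the `switch_penalty` value and the function name are hypothetical); NetTIME trains its CRF parameters, which this sketch does not do.

```python
from math import log

def viterbi_binary(probs, switch_penalty=2.0, eps=1e-9):
    """Highest-scoring 0/1 label path for per-base binding probabilities
    under a two-state linear-chain model with a fixed switch penalty."""
    if not probs:
        return []

    def emit(p, label):
        # Log-probability of the label at one base, clipped away from 0 and 1.
        p = min(max(p, eps), 1 - eps)
        return log(p) if label == 1 else log(1 - p)

    # score[s]: best total score of any path ending in state s so far.
    score = [emit(probs[0], 0), emit(probs[0], 1)]
    backptrs = []
    for p in probs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s]
            switch = score[1 - s] - switch_penalty
            if stay >= switch:
                new_score.append(stay + emit(p, s))
                ptr.append(s)
            else:
                new_score.append(switch + emit(p, s))
                ptr.append(1 - s)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With `probs = [0.1, 0.1, 0.6, 0.1, 0.1]`, a 0.5 threshold would call the middle base bound, whereas `viterbi_binary(probs)` returns all zeros because one weak base does not outweigh the cost of two label switches.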

Predicting Enhancer-Promoter Interaction from Genomic Sequence with Deep Neural Networks

10.1101/085241

2016

Cited By ~ 22

Author(s): Shashank Singh, Yang Yang, Barnabás Póczos, Jian Ma

Keyword(s): Deep Learning, Target Genes, Genomic Sequence, Cell Types, Computational Method, Learning Models, Cell Type, Genome Wide, Level Information, Single Cell Type

Abstract: In the human genome, distal enhancers are involved in regulating target genes through proximal promoters by forming enhancer-promoter interactions. Although recently developed high-throughput experimental approaches have allowed us to recognize potential enhancer-promoter interactions genome-wide, it is still largely unclear to what extent the sequence-level information encoded in our genome helps guide such interactions. Here we report a new computational method (named “SPEID”) that uses deep learning models to predict enhancer-promoter interactions based on sequence-based features only, given the locations of putative enhancers and promoters in a particular cell type. Our results across six different cell types demonstrate that SPEID is effective in predicting enhancer-promoter interactions compared with state-of-the-art methods that only use information from a single cell type. As a proof of principle, we also applied SPEID to identify somatic non-coding mutations in melanoma samples that may have reduced enhancer-promoter interactions in tumor genomes. This work demonstrates, using deep learning models, that sequence-based features alone are sufficient to reliably predict enhancer-promoter interactions genome-wide.
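
SPEID predicts interactions from sequence alone. Deep learning models on DNA commonly receive sequence as a one-hot matrix; the sketch below shows that encoding, assuming (not confirmed by the abstract) that enhancer and promoter sequences are encoded this way and fed to the network as a pair. `one_hot` and `pair_input` are hypothetical helper names.

```python
def one_hot(seq):
    """One-hot encode a DNA sequence as rows of [A, C, G, T] indicators.
    Ambiguous bases such as N become all-zero rows."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def pair_input(enhancer_seq, promoter_seq):
    """Hypothetical model input: one-hot matrices for an enhancer and a
    promoter, kept separate so each can pass through its own branch."""
    return one_hot(enhancer_seq), one_hot(promoter_seq)
```

Because the encoding uses nothing but the bases themselves, any predictive signal the network finds must come from sequence, which is the premise the abstract tests.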

Assessing damage severity of plant hopper and leaf folder in rice using hyperspectral remote sensing and multinomial logistic regression models

10.1603/ice.2016.93420

2016

Author(s): Mathyam Prabhakar

Keyword(s): Remote Sensing, Logistic Regression, Regression Models, Multinomial Logistic Regression, Hyperspectral Remote Sensing, Logistic Regression Models

The impact of SPY angiography on intraoperative decision making and outcomes for post-mastectomy reconstruction

Journal of Cancer Science and Therapeutics

10.36879/jcst.19.000109

2019

pp. 1-5

Keyword(s): Decision Making, Logistic Regression, Skin Necrosis, Regression Models, Multivariable Analysis, P Value, Flap Necrosis, Single Institution, Logistic Regression Models, The Impact

Objective: While the use of intraoperative laser angiography (SPY) is increasing in mastectomy patients, its impact in the operating room on the type of reconstruction performed has not been well described. The purpose of this study is to investigate whether SPY angiography influences post-mastectomy reconstruction decisions and outcomes. Methods and materials: A retrospective analysis of mastectomy patients with reconstruction at a single institution was performed for 2015–2017. All patients underwent intraoperative SPY after mastectomy but prior to reconstruction. SPY results were defined as ‘good’, ‘questionable’, ‘bad’, or ‘had skin excised’. Complications within 60 days of surgery were compared between patients whose SPY results did not change the type of reconstruction and those whose results did. Preoperative and intraoperative variables were entered into multivariable logistic regression models if significant at the univariate level. A p-value <0.05 was considered significant. Results: 267 mastectomies were identified; 42 underwent a change in the type of planned reconstruction due to intraoperative SPY results. Of the 42 breasts that underwent a change in reconstruction, 6 had a ‘good’ SPY result, 10 ‘questionable’, 25 ‘bad’, and 2 ‘had areas excised’ (p<0.01). After multivariable analysis, predictors of skin necrosis included ‘questionable’ SPY results (p<0.01, OR: 8.1, 95% CI: 2.06–32.2) and smoking (p<0.01, OR: 5.7, 95% CI: 1.5–21.2). Predictors of any complication included a change in reconstruction (p<0.05, OR: 4.5, 95% CI: 1.4–14.9) and a ‘questionable’ SPY result (p<0.01, OR: 4.4, 95% CI: 1.6–14.9). Conclusion: SPY angiography results strongly influence intraoperative surgical decisions regarding the type of reconstruction performed. Patients most at risk for flap necrosis and complications post-mastectomy are those with questionable SPY results.
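
The ORs and 95% CIs above come from multivariable models, which cannot be reproduced from the abstract. As a hedged illustration of what an odds ratio and its confidence interval mean, the unadjusted OR and Wald 95% CI from a single 2×2 table can be computed as below; the function name and the counts in the usage note are hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI for a 2x2 table:
         exposed:   a with outcome, b without
         unexposed: c with outcome, d without
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)
```

For example, with hypothetical counts of 10 necroses among 30 ‘questionable’ results and 5 among 45 others, `odds_ratio_ci(10, 20, 5, 40)` returns an OR of 4.0 with a CI of roughly 1.2 to 13.3. The interval is symmetric on the log scale, which is why reported CIs such as 2.06–32.2 around 8.1 look lopsided on the raw scale.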

Utilizing Twitter Data Analysis and Deep Learning to Identify Drug Use (Preprint)

10.2196/preprints.14681

2019

Author(s): Joseph Tassone, Peizhi Yan, Mackenzie Simpson, Chetan Mendhe, Vijay Mago, ...

Keyword(s): Social Media, Logistic Regression, Deep Learning, Decision Tree, Semantic Meaning, Predictive Capability, Logistic Regression Models, Twitter Data, Data Points, Positive Classification

BACKGROUND The collection and examination of social media data has become a useful mechanism for studying the mental activity and behavioral tendencies of users. OBJECTIVE Through the analysis of a collected set of Twitter data, a model will be developed for predicting positively referenced, drug-related tweets. From this, trends and correlations can be determined. METHODS Tweets and attribute data were collected from Twitter and processed using topic-related keywords, such as drug slang and use conditions (methods of drug consumption). Potential candidates were preprocessed, resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared, including regression, decision trees, and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets. RESULTS The logistic regression and decision tree models used 12,142 data points for training and 1041 data points for testing. The logistic regression models displayed accuracies of 54.56% and 57.44%, respectively, and an AUC of 0.58. The decision tree performed somewhat better, with an accuracy of 63.40% and an AUC of 0.68. All these values implied low predictive capability with little to no discrimination. Conversely, both CNN-based classifiers tested showed a substantial improvement. The first was trained with 2,661 manually labeled samples, while the other also included synthetically generated tweets, for a total of 12,142 samples. Their accuracy scores were 76.35% and 82.31%, with AUCs of 0.90 and 0.91, respectively. Association rule mining in conjunction with the CNN-based classifier showed a high likelihood for keywords such as “smoke”, “cocaine”, and “marijuana” to trigger a drug-positive classification. CONCLUSIONS Predictive analysis without a CNN is limited and possibly fruitless.
Attribute-based models showed little predictive capability and were not suitable for analyzing this type of data. The semantic meaning of the tweets needed to be used, which gave the CNN-based classifier an advantage over other solutions. Additionally, the most commonly mentioned drugs corresponded to some degree with frequently used illicit substances, supporting the practical usefulness of this system. Lastly, the synthetically generated set yielded higher scores, improving the predictive capability. CLINICAL TRIAL None
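
The results report accuracy and AUC side by side and read an AUC of 0.58 as near chance. A minimal pure-Python sketch of both metrics, assuming binary labels (1 = drug-positive tweet) and real-valued classifier scores; `roc_auc` computes the area under the ROC curve as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), which is an equivalent definition.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positives and negatives.
    O(n_pos * n_neg) time, which is fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(predicted, labels):
    """Fraction of hard 0/1 predictions that match the labels."""
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)
```

An AUC of 0.5 means the scores rank positives above negatives no better than chance, which is why 0.58 and 0.68 indicate weak discrimination while 0.90 and 0.91 indicate strong ranking. Accuracy, by contrast, depends on the decision threshold and the class balance of the test set.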

Effect of prostatic apex shape (Lee types) and urethral sphincter length in preoperative MRI on very early continence rates after radical prostatectomy

International Urology and Nephrology

10.1007/s11255-021-02809-7

2021

Author(s): Mike Wenzel, Felix Preisser, Matthias Mueller, Lena H. Theissen, Maria N. Welte, ...

Keyword(s): Logistic Regression, Radical Prostatectomy, Regression Models, Urethral Sphincter, Anatomical Characteristics, Logistic Regression Models, Pad Test, Type D, Multivariable Logistic Regression, Type C

Abstract Purpose: To test the effect of anatomic variants of the prostatic apex overlapping the membranous urethra (Lee type classification), as well as median urethral sphincter length (USL), in preoperative multiparametric magnetic resonance imaging (mpMRI) on very early continence in patients undergoing open (ORP) or robotic-assisted radical prostatectomy (RARP). Methods: In 128 consecutive patients (01/2018–12/2019), USL and the prostatic apex, classified according to Lee types A–D on mpMRI prior to ORP or RARP, were retrospectively analyzed. Uni- and multivariable logistic regression models were used to identify anatomic characteristics associated with very early continence, defined as urine loss of ≤ 1 g in the PAD test. Results: Of 128 patients with mpMRI prior to surgery, 76 (59.4%) underwent RARP vs. 52 (40.6%) ORP. Overall, median USL was 15, 15 and 10 mm in the sagittal, coronal and axial dimensions, respectively. After stratification according to very early continence in the PAD test (≤ 1 g vs. > 1 g), continent patients significantly more frequently had Lee type D (71.4 vs. 54.4%) and type C (14.3 vs. 7.6%; p = 0.03). In multivariable logistic regression models, sagittal median USL (odds ratio [OR] 1.03) and Lee types C (OR: 7.0) and D (OR: 4.9) were independent predictors of very early continence in the PAD test. Conclusion: Patients’ individual anatomical characteristics on mpMRI prior to radical prostatectomy can be used to predict very early continence. Lee types C and D appear to be the most favorable anatomical characteristics. Moreover, a longer sagittal median USL on mpMRI seems to improve very early continence rates.
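
The multivariable model reports an odds ratio of 1.03 for sagittal median USL. Assuming USL entered the model as a continuous variable in millimetres (the abstract does not state the unit of this odds ratio, so this is an assumption), the odds ratio for a longer sphincter compounds multiplicatively rather than additively, where β = ln 1.03 is the log-odds coefficient and Δ is the difference in USL:

```latex
\mathrm{OR}(\Delta) = e^{\beta\,\Delta} = \mathrm{OR}_{1\,\mathrm{mm}}^{\,\Delta},
\qquad 1.03^{5} \approx 1.16,
\qquad 1.03^{10} \approx 1.34
```

Under that assumption, a sagittal USL 10 mm longer would correspond to roughly 34% higher odds of very early continence, with the other covariates held fixed.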

785 Reported Restful Sleep Predicting Emotional Distress: Does Exercise (and its modalities) moderate?

SLEEP

10.1093/sleep/zsab072.782

2021

Vol 44 (Supplement_2)

pp. A305-A306

Author(s): Jesse Moore, Ellita Williams, Collin Popp, Anthony Briggs, Judite Blanc, ...

Keyword(s): Logistic Regression, Aerobic Exercise, Regression Models, Regression Analyses, Logistic Regression Models, The Past, Strengthening Exercise, Waking Up, The Relationship, Age And Sex

Abstract Introduction: Literature shows that exercise moderates the relationship between sleep and emotional distress (ED). However, it is unclear whether different types of exercise, such as aerobic and strengthening exercise, affect this relationship differently. We investigated the moderating role of two types of exercise (aerobic and strengthening) in the relationship between ED and sleep. Methods: Our analysis was based on data from the 2018 National Health Interview Survey (NHIS), a nationally representative study in which 2,814 participants provided complete data. Participants were asked 1) on how many days over the past week they woke up feeling rested, 2) the Kessler 6 (K6) scale items to determine ED (a score >13 indicates ED), and 3) the average weekly frequency of strengthening and aerobic exercise. Logistic regression analyses were performed to determine whether the reported number of days of waking up rested predicted ED. We then investigated whether strengthening or aerobic exercise differentially moderated this relationship. Covariates such as age and sex were adjusted for in the logistic regression models. Results: On average, participants reported 4.41 restful nights of sleep (SD = 2.41), 3.43 strengthening activities (SD = 3.19), and 8.47 aerobic activities per week (SD = 5.91). We found a significant association between the number of days in the past week on which participants reported waking up feeling rested and ED according to the K6, χ²(1) = -741, p < .001. The odds ratio indicated a 52% decrease in the odds of ED for each additional day of restful sleep (OR = .48, 95% CI = .33–.65, p < .001).
In the logistic regression model with moderation, aerobic exercise had a significant moderation effect, χ²(1) = .03, p = .04, but strengthening exercise did not. Conclusion: We found that restful sleep predicted reduced ED. Aerobic exercise moderated this relationship, while strengthening exercise did not. Further research should investigate the longitudinal effects of exercise type on the relationship between restful sleep and ED. Support (if any): NIH (K07AG052685, R01MD007716, K01HL135452, R01HL152453)
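
In a logistic model, moderation is usually tested with an interaction term, and the odds ratio for restful sleep then depends on the exercise level. The sketch below assumes a model of the form logit P(ED) = b0 + b_sleep*sleep + b_ex*exercise + b_int*sleep*exercise; the abstract does not give its exact specification, and the function names and coefficient values are hypothetical. The second helper converts an odds ratio into a percent change in the odds, which is how an OR of .48 maps to a 52% decrease in the odds of ED.

```python
from math import exp

# Hypothetical model (coefficients on the log-odds scale):
#   logit P(ED) = b0 + b_sleep*sleep + b_ex*exercise + b_int*sleep*exercise
# where sleep = days woken up rested in the past week and
# exercise = weekly sessions of one exercise type.

def conditional_odds_ratio(b_sleep, b_int, exercise):
    """Odds ratio for one extra restful day, holding exercise fixed.

    With an interaction term, the sleep effect on the log-odds is
    b_sleep + b_int * exercise, so the odds ratio changes with exercise.
    """
    return exp(b_sleep + b_int * exercise)

def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into the percent change in the odds."""
    return (odds_ratio - 1.0) * 100.0
```

`percent_change_in_odds(0.48)` gives approximately -52. A non-zero `b_int` makes the sleep odds ratio vary with weekly aerobic sessions, which is what a significant moderation effect means; with `b_int = 0`, the sleep odds ratio is the same at every exercise level.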
