Deep Learning with Implicit Handling of Tissue-Specific Phenomena Predicts Tumor DNA Accessibility and Immune Activity

2019 ◽  
Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Kevin Givechian ◽  
Patrick Soon-Shiong ◽  
Shahrooz Rabizadeh ◽  
...  
iScience ◽  
2019 ◽  
Vol 20 ◽  
pp. 119-136 ◽  
Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Kevin B. Givechian ◽  
Patrick Soon-Shiong ◽  
Shahrooz Rabizadeh ◽  
...  

2017 ◽  
Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Kevin B. Givechian ◽  
Patrick Soon-Shiong ◽  
Shahrooz Rabizadeh ◽  
...  

Abstract: DNA accessibility is a key dynamic feature of chromatin regulation that can potentiate transcriptional events and tumor progression. Recently, neural networks have begun to make it possible to explore the impact of mutations on DNA accessibility and transcriptional regulation by demonstrating state-of-the-art prediction of chromatin features from DNA sequence data in specific tissue types. We demonstrate enhancements to improve such tissue-specific prediction performance, and show that by extending models with RNA-seq expression input, they can be applied to novel tissue samples whose types were not present in training. We show that our expression-informed model achieved particularly consistent accuracy predicting DNA accessibility at promoter and promoter flank regions of the genome.

Leveraging this new tool to analyze tumor genomes across tissues, we provide a first glimpse of the DNA accessibility landscape across The Cancer Genome Atlas (TCGA). Our analysis of the Lung Adenocarcinoma (LUAD) cohort reveals that viewing tumors from the perspective of accessibility at promoters uniquely highlights several immune pathways inversely correlated with an overall more open chromatin state. Further, through identification of accessibility sites linked with differential gene expression in immune-inflamed LUAD tumors and training of a classifier ensemble, we show that patterns of predicted chromatin state are discriminative of immune activity across many tumor types, with direct implications for patient prognosis. We see such models playing a significant future role in matching patients to appropriate immunotherapy treatment regimens, as well as in analysis of other conditions where epigenetic state may play a significant role.

Significance Statement: DNA accessibility determines whether proteins have access to DNA-binding sites and is a key dynamic feature that influences regulation of gene expression that differentiates cells. We improve and extend a neural network model in a way that expands its application domain beyond studying the impact of genetic sequence and mutations on DNA accessibility in specific cell types, to tissues for which training data is unavailable.

Leveraging our tool to analyze tumor genomes, we demonstrate that in lung adenocarcinomas the accessibility perspective uniquely highlights immune pathways inversely correlated with a more accessible DNA state. Further, we show that accessibility patterns learned from even a single tumor type can discriminate immune inflammation across many cancers, often with direct relation to patient prognosis.
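
To make the idea of an expression-informed input concrete, the following is a minimal sketch of how a DNA window and an RNA-seq expression vector could be prepared as the two inputs to such a model. The one-hot encoding, the log2(x + 1) transform, and the names `one_hot`, `log_expression`, and `build_input` are illustrative assumptions; the abstract does not state how the authors encode either input.

```python
import math

BASES = "ACGT"  # assumed channel order; not taken from the paper


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def log_expression(values):
    """Apply log2(x + 1) to hypothetical expression values (e.g. TPM)."""
    return [math.log2(v + 1.0) for v in values]


def build_input(seq, expression):
    """Bundle the two inputs a hypothetical sequence-plus-expression model would consume."""
    return {"sequence": one_hot(seq), "expression": log_expression(expression)}


# Toy example: a 5-base window and three made-up gene expression values.
example = build_input("ACGTN", [0.0, 3.0, 15.0])
```

Keeping the sequence and expression inputs separate leaves the choice of how to combine them (for instance, concatenating the expression vector with pooled convolutional features) to the model itself.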


2019 ◽  
Author(s):  
Mike Phuycharoen ◽  
Peyman Zarrineh ◽  
Laure Bridoux ◽  
Shilu Amin ◽  
Marta Losa ◽  
...  

Abstract

Motivation: Transcription factors (TFs) can bind DNA in a cooperative manner, enabling a mutual increase in occupancy. Through this type of interaction, alternative binding sites can be preferentially bound in different tissues to regulate tissue-specific expression programmes. Recently, deep learning models have become state-of-the-art in various pattern analysis tasks, including applications in the field of genomics. We therefore investigate the application of convolutional neural network (CNN) models to the discovery of sequence features determining cooperative and differential TF binding across tissues.

Results: We analyse ChIP-seq data from MEIS, TFs which are broadly expressed across mouse branchial arches, and HOXA2, which is expressed in the second and more posterior branchial arches. By developing models predictive of MEIS differential binding in all three tissues, we are able to accurately predict HOXA2 co-binding sites. We evaluate transfer-like and multitask approaches to regularising the high-dimensional classification task with a larger regression dataset, allowing for creation of deeper and more accurate models. We test the performance of perturbation and gradient-based attribution methods in identifying the HOXA2 sites from differential MEIS data. Our results show that deep regularised models significantly outperform shallow CNNs as well as k-mer methods in the discovery of tissue-specific sites bound in vivo.

Availability: For implementation and models please visit https://doi.org/10.5281/zenodo.2635463.
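
For readers unfamiliar with the k-mer methods used as a comparison point, the sketch below shows the basic feature such methods build on: counts of every overlapping length-k substring in a sequence. It is a generic illustration, not the authors' baseline; the function name `kmer_counts` and the toy sequence are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = seq.upper()
    # A sequence shorter than k yields an empty range and therefore an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy sequence; real inputs would be the regions under ChIP-seq peaks.
counts = kmer_counts("ATGATGA", 3)
```

Unlike a CNN, a representation of this kind discards where each k-mer occurs, which is one reason positional or cooperative binding patterns can be harder to recover from it.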


Author(s):  
Kamil Wnuk ◽  
Jeremi Sudol ◽  
Shahrooz Rabizadeh ◽  
Patrick Soon-Shiong ◽  
Christopher Szeto ◽  
...  

2021 ◽  
Author(s):  
R. Tyler McLaughlin ◽  
Maansi Asthana ◽  
Marc Di Meo ◽  
Michele Ceccarelli ◽  
Howard J. Jacob ◽  
...  

In precision oncology, reliable identification of tumor-specific DNA mutations requires sequencing tumor DNA and non-tumor DNA (so-called "matched normal") from the same patient. The normal sample allows researchers to distinguish acquired (somatic) and hereditary (germline) variants. The ability to distinguish somatic and germline variants facilitates estimation of tumor mutation burden (TMB), which is a recently FDA-approved pan-cancer marker for highly successful cancer immunotherapies; in tumor-only variant calling (i.e., without a matched normal), the difficulty in discriminating germline and somatic variants results in inflated and unreliable TMB estimates. We apply machine learning to the task of somatic vs. germline classification in tumor-only samples using TabNet, a recently developed attentive deep learning model for tabular data that has achieved state-of-the-art performance in multiple classification tasks (Arik and Pfister 2019). We constructed a training set for supervised classification using features derived from tumor-only variant calling and drawing somatic and germline truth labels from an independent pipeline incorporating the patient-matched normal samples. Our trained model achieved state-of-the-art performance on two hold-out test datasets: a TCGA dataset including sarcoma, breast adenocarcinoma, and endometrial carcinoma samples (F1-score: 88.3), and a metastatic melanoma dataset (F1-score: 79.8). Concordance between matched-normal and tumor-only TMB improves from R² = 0.006 to R² = 0.705 with the addition of our classifier. Importantly, this approach generalizes across tumor tissue types and capture kits and has a call rate of 100%. The interpretable feature masks of the attentive deep learning model explain the reasons for misclassified variants. We reproduce the recent finding that tumor-only TMB estimates for Black patients are extremely inflated relative to those for White patients due to the racial biases of germline databases. We show that our machine learning approach appreciably reduces this racial bias in tumor-only variant calling.
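
The link between germline misclassification and inflated TMB can be illustrated with a small worked example. The sketch below assumes TMB is computed as somatic mutations per megabase of sequenced territory and uses made-up labels and a made-up 2 Mb panel; the function names and all numbers are hypothetical and unrelated to the results reported above.

```python
def tumor_mutation_burden(n_somatic, covered_bases):
    """Somatic mutations per megabase of sequenced territory."""
    return n_somatic / (covered_bases / 1_000_000)


def f1_score(truth, predicted, positive="somatic"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: truth from a matched-normal pipeline, predictions from a classifier.
truth = ["somatic", "germline", "germline", "somatic", "germline"]
predicted = ["somatic", "somatic", "germline", "somatic", "germline"]
covered = 2_000_000  # hypothetical 2 Mb capture territory

# Counting every tumor-only call as somatic inflates the estimate.
tmb_unfiltered = tumor_mutation_burden(len(predicted), covered)
# Counting only calls the classifier labels somatic brings it closer to the reference.
tmb_classified = tumor_mutation_burden(predicted.count("somatic"), covered)
tmb_reference = tumor_mutation_burden(truth.count("somatic"), covered)
score = f1_score(truth, predicted)
```

In this toy case the unfiltered estimate is 2.5 mutations/Mb against a reference of 1.0, and a single remaining false positive still leaves the classified estimate at 1.5, which shows why per-variant classification quality translates directly into TMB concordance.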


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Guanhua Zhu ◽  
Yu A. Guo ◽  
Danliang Ho ◽  
Polly Poon ◽  
Zhong Wee Poh ◽  
...  

Abstract: Profiling of circulating tumor DNA (ctDNA) may offer a non-invasive approach to monitor disease progression. Here, we develop a quantitative method, exploiting local tissue-specific cell-free DNA (cfDNA) degradation patterns, that accurately estimates ctDNA burden independent of genomic aberrations. Nucleosome-dependent cfDNA degradation at promoters and first exon-intron junctions is strongly associated with differential transcriptional activity in tumors and blood. A quantitative model, based on just 6 regulatory regions, could accurately predict ctDNA levels in colorectal cancer patients. Strikingly, a model restricted to blood-specific regulatory regions could predict ctDNA levels across both colorectal and breast cancer patients. Using compact targeted sequencing (<25 kb) of predictive regions, we demonstrate how the approach could enable quantitative low-cost tracking of ctDNA dynamics and disease progression.


2020 ◽  
Vol 48 (5) ◽  
pp. e27-e27 ◽  
Author(s):  
Mike Phuycharoen ◽  
Peyman Zarrineh ◽  
Laure Bridoux ◽  
Shilu Amin ◽  
Marta Losa ◽  
...  

Abstract Transcription factors (TFs) can bind DNA in a cooperative manner, enabling a mutual increase in occupancy. Through this type of interaction, alternative binding sites can be preferentially bound in different tissues to regulate tissue-specific expression programmes. Recently, deep learning models have become state-of-the-art in various pattern analysis tasks, including applications in the field of genomics. We therefore investigate the application of convolutional neural network (CNN) models to the discovery of sequence features determining cooperative and differential TF binding across tissues. We analyse ChIP-seq data from MEIS, TFs which are broadly expressed across mouse branchial arches, and HOXA2, which is expressed in the second and more posterior branchial arches. By developing models predictive of MEIS differential binding in all three tissues, we are able to accurately predict HOXA2 co-binding sites. We evaluate transfer-like and multitask approaches to regularizing the high-dimensional classification task with a larger regression dataset, allowing for the creation of deeper and more accurate models. We test the performance of perturbation and gradient-based attribution methods in identifying the HOXA2 sites from differential MEIS data. Our results show that deep regularized models significantly outperform shallow CNNs as well as k-mer methods in the discovery of tissue-specific sites bound in vivo.

