scholarly journals An Integrated Deep Network for Cancer Survival Prediction Using Omics Data

2021 ◽  
Vol 4 ◽  
Author(s):  
Hamid Reza Hassanzadeh ◽  
May D. Wang

As a highly sophisticated disease that humanity faces, cancer is known to be associated with dysregulation of cellular mechanisms in different levels, which demands novel paradigms to capture informative features from different omics modalities in an integrated way. Successful stratification of patients with respect to their molecular profiles is a key step in precision medicine and in tailoring personalized treatment for critically ill patients. In this article, we use an integrated deep belief network to differentiate high-risk cancer patients from the low-risk ones in terms of the overall survival. Our study analyzes RNA, miRNA, and methylation molecular data modalities from both labeled and unlabeled samples to predict cancer survival and subsequently to provide risk stratification. To assess the robustness of our novel integrative analytics, we utilize datasets of three cancer types with 836 patients and show that our approach outperforms the most successful supervised and semi-supervised classification techniques applied to the same cancer prediction problems. In addition, despite the preconception that deep learning techniques require large size datasets for proper training, we have illustrated that our model can achieve better results for moderately sized cancer datasets.

2020 ◽  
Author(s):  
Luis Andre Vale-Silva ◽  
Karl Rohr

The age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv is composed of three main modules. A feature representation module includes a dedicated submodel for each input data modality. A data fusion layer aggregates the multimodal representations. Finally, a prediction submodel yields conditional survival probabilities for a predefined set of follow-up time intervals. We trained MultiSurv on clinical, imaging, and four different high-dimensional omics data modalities from patients diagnosed with one of 33 different cancer types. We evaluated unimodal input configurations against several previous methods and different multimodal data combinations. MultiSurv achieved the best results according to different time-dependent metrics and delivered highly accurate long-term patient survival curves. The best performance was obtained when combining clinical information with either gene expression or DNA methylation data, depending on the evaluation metric. Additionally, MultiSurv can handle missing data, including missing values and complete data modalities. Interestingly, for unimodal data we found that simpler modeling approaches, including the classical Cox proportional hazards method, can achieve results rivaling those of more complex methods for certain data modalities. We also show how the learned feature representations of MultiSurv can be used to visualize relationships between cancer types and individual patients, after embedding into a low-dimensional space.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Luís A. Vale-Silva ◽  
Karl Rohr

AbstractThe age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv uses dedicated submodels to establish feature representations of clinical, imaging, and different high-dimensional omics data modalities. A data fusion layer aggregates the multimodal representations, and a prediction submodel generates conditional survival probabilities for follow-up time intervals spanning several decades. MultiSurv is the first non-linear and non-proportional survival prediction method that leverages multimodal data. In addition, MultiSurv can handle missing data, including single values and complete data modalities. MultiSurv was applied to data from 33 different cancer types and yields accurate pan-cancer patient survival curves. A quantitative comparison with previous methods showed that Multisurv achieves the best results according to different time-dependent metrics. We also generated visualizations of the learned multimodal representation of MultiSurv, which revealed insights on cancer characteristics and heterogeneity.


Author(s):  
Wei Wang ◽  
Wei Liu

Abstract Motivation Accurately predicting the risk of cancer patients is a central challenge for clinical cancer research. For high-dimensional gene expression data, Cox proportional hazard model with the least absolute shrinkage and selection operator for variable selection (Lasso-Cox) is one of the most popular feature selection and risk prediction algorithms. However, the Lasso-Cox model treats all genes equally, ignoring the biological characteristics of the genes themselves. This often encounters the problem of poor prognostic performance on independent datasets. Results Here, we propose a Reweighted Lasso-Cox (RLasso-Cox) model to ameliorate this problem by integrating gene interaction information. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. We used random walk to evaluate the topological weight of genes, and then highlighted topologically important genes to improve the generalization ability of the RLasso-Cox model. Experiments on datasets of three cancer types showed that the RLasso-Cox model improves the prognostic accuracy and robustness compared with the Lasso-Cox model and several existing network-based methods. More importantly, the RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types. Availability and implementation http://bioconductor.org/packages/devel/bioc/html/RLassoCox.html Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Erik van Dijk ◽  
Tom van den Bosch ◽  
Kristiaan J. Lenos ◽  
Khalid El Makrini ◽  
Lisanne E. Nijman ◽  
...  

AbstractSurvival rates of cancer patients vary widely within and between malignancies. While genetic aberrations are at the root of all cancers, individual genomic features cannot explain these distinct disease outcomes. In contrast, intra-tumour heterogeneity (ITH) has the potential to elucidate pan-cancer survival rates and the biology that drives cancer prognosis. Unfortunately, a comprehensive and effective framework to measure ITH across cancers is missing. Here, we introduce a scalable measure of chromosomal copy number heterogeneity (CNH) that predicts patient survival across cancers. We show that the level of ITH can be derived from a single-sample copy number profile. Using gene-expression data and live cell imaging we demonstrate that ongoing chromosomal instability underlies the observed heterogeneity. Analysing 11,534 primary cancer samples from 37 different malignancies, we find that copy number heterogeneity can be accurately deduced and predicts cancer survival across tissues of origin and stages of disease. Our results provide a unifying molecular explanation for the different survival rates observed between cancer types.


2013 ◽  
Vol 106 ◽  
pp. S193-S194 ◽  
Author(s):  
A. Dekker ◽  
G. Nalbantov ◽  
C. Oberije ◽  
W. Wiessler ◽  
M. Eble ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
Bo Wang ◽  
Jing Zhang

Long noncoding RNAs (lncRNAs) have an important role in various life processes of the body, especially cancer. The analysis of disease prognosis is ignored in current prediction on lncRNA–disease associations. In this study, a multiple linear regression model was constructed for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp), which integrated the cancer data of clinical prognosis and the expression quantity of lncRNA transcript. MlrLDAcp could realize not only cancer survival prediction but also lncRNA–disease association prediction. Ultimately, 60 lncRNAs most closely related to prostate cancer survival were selected from 481 alternative lncRNAs. Then, the multiple linear regression relationship between the prognosis survival of 176 patients with prostate cancer and 60 lncRNAs was also given. Compared with previous studies, MlrLDAcp had a predominant survival predictive ability and could effectively predict lncRNA–disease associations. MlrLDAcp had an area under the curve (AUC) value of 0.875 for survival prediction and an AUC value of 0.872 for lncRNA–disease association prediction. It could be an effective biological method for biomedical research.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Juncai Li ◽  
Xiaofei Jiang

Molecular property prediction is an essential task in drug discovery. Most computational approaches with deep learning techniques either focus on designing novel molecular representation or combining with some advanced models together. However, researchers pay fewer attention to the potential benefits in massive unlabeled molecular data (e.g., ZINC). This task becomes increasingly challenging owing to the limitation of the scale of labeled data. Motivated by the recent advancements of pretrained models in natural language processing, the drug molecule can be naturally viewed as language to some extent. In this paper, we investigate how to develop the pretrained model BERT to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale prediction BERT model is pretrained to generate the embedding of molecular substructures, by using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, the pretrained BERT model can be fine-tuned on various molecular property prediction tasks. To examine the performance of our proposed Mol-BERT, we conduct several experiments on 4 widely used molecular datasets. In comparison to the traditional and state-of-the-art baselines, the results illustrate that our proposed Mol-BERT can outperform the current sequence-based methods and achieve at least 2% improvement on ROC-AUC score on Tox21, SIDER, and ClinTox dataset.


2020 ◽  
Author(s):  
Georgios Kantidakis ◽  
Hein Putter ◽  
Carlo Lancia ◽  
Jacob de Boer ◽  
Andries E Braat ◽  
...  

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest.Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism to ML is related to unsuitable performance measures and lack of interpretability which is important for clinicians.Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, over more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is given on the advantages and pitfalls of each method and on the interpretability of the ML techniques.Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years.Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.


Sign in / Sign up

Export Citation Format

Share Document