An Integrated Deep Network for Cancer Survival Prediction Using Omics Data

Frontiers in Big Data ◽

10.3389/fdata.2021.568352 ◽

2021 ◽

Vol 4 ◽

Author(s):

Hamid Reza Hassanzadeh ◽

May D. Wang

Keyword(s):

Cancer Survival ◽

Molecular Data ◽

Survival Prediction ◽

Cellular Mechanisms ◽

Large Size ◽

Learning Techniques ◽

Proper Training ◽

Cancer Types ◽

High Risk Cancer ◽

Prediction Problems

Cancer is a highly complex disease associated with dysregulation of cellular mechanisms at different levels, which demands novel paradigms to capture informative features from different omics modalities in an integrated way. Successful stratification of patients with respect to their molecular profiles is a key step in precision medicine and in tailoring personalized treatment for critically ill patients. In this article, we use an integrated deep belief network to differentiate high-risk cancer patients from low-risk ones in terms of overall survival. Our study analyzes RNA, miRNA, and methylation molecular data modalities from both labeled and unlabeled samples to predict cancer survival and subsequently to provide risk stratification. To assess the robustness of our novel integrative analytics, we utilize datasets of three cancer types with 836 patients and show that our approach outperforms the most successful supervised and semi-supervised classification techniques applied to the same cancer prediction problems. In addition, despite the preconception that deep learning techniques require large datasets for proper training, we show that our model can achieve better results on moderately sized cancer datasets.

MultiSurv: Long-term cancer survival prediction using multimodal deep learning

10.1101/2020.08.06.20169698 ◽

2020 ◽

Author(s):

Luis Andre Vale-Silva ◽

Karl Rohr

Keyword(s):

Deep Learning ◽

Missing Values ◽

Cancer Survival ◽

Dimensional Space ◽

Feature Representation ◽

High Dimensional ◽

Computational Techniques ◽

Survival Prediction ◽

Cancer Types

The age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv is composed of three main modules. A feature representation module includes a dedicated submodel for each input data modality. A data fusion layer aggregates the multimodal representations. Finally, a prediction submodel yields conditional survival probabilities for a predefined set of follow-up time intervals. We trained MultiSurv on clinical, imaging, and four different high-dimensional omics data modalities from patients diagnosed with one of 33 different cancer types. We evaluated unimodal input configurations against several previous methods and different multimodal data combinations. MultiSurv achieved the best results according to different time-dependent metrics and delivered highly accurate long-term patient survival curves. The best performance was obtained when combining clinical information with either gene expression or DNA methylation data, depending on the evaluation metric. Additionally, MultiSurv can handle missing data, including missing values and complete data modalities. Interestingly, for unimodal data we found that simpler modeling approaches, including the classical Cox proportional hazards method, can achieve results rivaling those of more complex methods for certain data modalities. We also show how the learned feature representations of MultiSurv can be used to visualize relationships between cancer types and individual patients, after embedding into a low-dimensional space.
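The abstract above describes a prediction submodel that outputs conditional survival probabilities for a set of follow-up intervals. A minimal sketch of how such outputs could be turned into a survival curve is given below. It assumes each value is the probability of surviving an interval given survival up to its start, so the cumulative survival is the running product. The function name `survival_curve` and the example values are hypothetical and are not taken from MultiSurv itself.

```python
from itertools import accumulate
from operator import mul


def survival_curve(conditional_probs):
    """Turn per-interval conditional survival probabilities into a curve.

    conditional_probs[k] is assumed to be P(survive interval k | alive at
    the start of interval k). The returned list holds the cumulative
    survival S(t_k) = p_0 * p_1 * ... * p_k at the end of each interval.
    """
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Running product of the conditional probabilities.
    return list(accumulate(conditional_probs, mul))


if __name__ == "__main__":
    # Hypothetical outputs for three consecutive follow-up intervals.
    print(survival_curve([0.9, 0.8, 0.5]))
```

Because every factor lies between 0 and 1, the resulting curve can never increase over time, which is the property a survival curve needs.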

Long-term cancer survival prediction using multimodal deep learning

Scientific Reports ◽

10.1038/s41598-021-92799-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Luís A. Vale-Silva ◽

Karl Rohr

Keyword(s):

Deep Learning ◽

Cancer Survival ◽

Prediction Method ◽

High Dimensional ◽

Computational Techniques ◽

Survival Prediction ◽

Feature Representations ◽

Cancer Types ◽

Pan Cancer

Abstract: The age of precision medicine demands powerful computational techniques to handle high-dimensional patient data. We present MultiSurv, a multimodal deep learning method for long-term pan-cancer survival prediction. MultiSurv uses dedicated submodels to establish feature representations of clinical, imaging, and different high-dimensional omics data modalities. A data fusion layer aggregates the multimodal representations, and a prediction submodel generates conditional survival probabilities for follow-up time intervals spanning several decades. MultiSurv is the first non-linear and non-proportional survival prediction method that leverages multimodal data. In addition, MultiSurv can handle missing data, including single values and complete data modalities. MultiSurv was applied to data from 33 different cancer types and yields accurate pan-cancer patient survival curves. A quantitative comparison with previous methods showed that MultiSurv achieves the best results according to different time-dependent metrics. We also generated visualizations of the learned multimodal representation of MultiSurv, which revealed insights into cancer characteristics and heterogeneity.

Integration of gene interaction information into a reweighted Lasso-Cox model for accurate survival prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa1046 ◽

2020 ◽

Author(s):

Wei Wang ◽

Wei Liu

Keyword(s):

Cox Model ◽

Gene Interaction ◽

Supplementary Information ◽

Survival Prediction ◽

Gene Interaction Network ◽

Clinical Cancer Research ◽

Prognostic Performance ◽

Interaction Information ◽

Cancer Types ◽

Risk Of Cancer

Abstract Motivation: Accurately predicting the risk of cancer patients is a central challenge for clinical cancer research. For high-dimensional gene expression data, the Cox proportional hazards model with the least absolute shrinkage and selection operator for variable selection (Lasso-Cox) is one of the most popular feature selection and risk prediction algorithms. However, the Lasso-Cox model treats all genes equally, ignoring the biological characteristics of the genes themselves. This often leads to poor prognostic performance on independent datasets. Results: Here, we propose a Reweighted Lasso-Cox (RLasso-Cox) model to ameliorate this problem by integrating gene interaction information. It is based on the hypothesis that topologically important genes in the gene interaction network tend to have stable expression changes. We used random walk to evaluate the topological weight of genes, and then highlighted topologically important genes to improve the generalization ability of the RLasso-Cox model. Experiments on datasets of three cancer types showed that the RLasso-Cox model improves the prognostic accuracy and robustness compared with the Lasso-Cox model and several existing network-based methods. More importantly, the RLasso-Cox model has the advantage of identifying small gene sets with high prognostic performance on independent datasets, which may play an important role in identifying robust survival biomarkers for various cancer types. Availability and implementation: http://bioconductor.org/packages/devel/bioc/html/RLassoCox.html Supplementary information: Supplementary data are available at Bioinformatics online.
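The abstract above states that a random walk is used to score the topological importance of genes in an interaction network. The sketch below illustrates one common variant, a random walk with uniform restart (PageRank-style), on a toy undirected network. The restart probability, the function name `random_walk_weights`, and the toy gene names are assumptions for illustration; the abstract does not specify the exact walk or how the weights enter the Lasso penalty, so this is not the RLassoCox implementation.

```python
def random_walk_weights(adjacency, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a random walk with restart.

    adjacency maps each gene to a list of neighbouring genes; every
    neighbour must itself be a key of the mapping. At each step the
    walker jumps to a uniformly chosen gene with probability `restart`
    and otherwise moves to a uniformly chosen neighbour.
    """
    genes = sorted(adjacency)
    n = len(genes)
    weights = {g: 1.0 / n for g in genes}
    for _ in range(max_iter):
        new = {g: restart / n for g in genes}
        for g in genes:
            neighbours = adjacency[g]
            # Genes without neighbours spread their mass over all genes.
            targets = neighbours if neighbours else genes
            share = (1.0 - restart) * weights[g] / len(targets)
            for t in targets:
                new[t] += share
        delta = sum(abs(new[g] - weights[g]) for g in genes)
        weights = new
        if delta < tol:
            break
    return weights


if __name__ == "__main__":
    # Hypothetical star-shaped network: gene A interacts with B, C and D.
    toy = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
    for gene, w in random_walk_weights(toy).items():
        print(gene, round(w, 4))
```

In this toy network the hub gene A receives the largest weight, matching the intuition that highly connected genes are topologically important.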

Chromosomal copy number heterogeneity predicts survival rates across cancers

Nature Communications ◽

10.1038/s41467-021-23384-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Erik van Dijk ◽

Tom van den Bosch ◽

Kristiaan J. Lenos ◽

Khalid El Makrini ◽

Lisanne E. Nijman ◽

...

Keyword(s):

Copy Number ◽

Cancer Survival ◽

Survival Rates ◽

Cancer Prognosis ◽

Disease Outcomes ◽

Distinct Disease ◽

Cancer Types ◽

Chromosomal Copy Number ◽

Chromosomal Copy ◽

Pan Cancer

Abstract: Survival rates of cancer patients vary widely within and between malignancies. While genetic aberrations are at the root of all cancers, individual genomic features cannot explain these distinct disease outcomes. In contrast, intra-tumour heterogeneity (ITH) has the potential to elucidate pan-cancer survival rates and the biology that drives cancer prognosis. Unfortunately, a comprehensive and effective framework to measure ITH across cancers is missing. Here, we introduce a scalable measure of chromosomal copy number heterogeneity (CNH) that predicts patient survival across cancers. We show that the level of ITH can be derived from a single-sample copy number profile. Using gene-expression data and live cell imaging we demonstrate that ongoing chromosomal instability underlies the observed heterogeneity. Analysing 11,534 primary cancer samples from 37 different malignancies, we find that copy number heterogeneity can be accurately deduced and predicts cancer survival across tissues of origin and stages of disease. Our results provide a unifying molecular explanation for the different survival rates observed between cancer types.

PD-0496: Multi-centric learning with a federated IT infrastructure: application to 2-year lung-cancer survival prediction

Radiotherapy and Oncology ◽

10.1016/s0167-8140(15)32802-4 ◽

2013 ◽

Vol 106 ◽

pp. S193-S194 ◽

Cited By ~ 3

Author(s):

A. Dekker ◽

G. Nalbantov ◽

C. Oberije ◽

W. Wiessler ◽

M. Eble ◽

...

Keyword(s):

Lung Cancer ◽

Cancer Survival ◽

Survival Prediction ◽

It Infrastructure ◽

Lung Cancer Survival

Multiple Linear Regression Analysis of lncRNA–Disease Association Prediction Based on Clinical Prognosis Data

BioMed Research International ◽

10.1155/2018/3823082 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Bo Wang ◽

Jing Zhang

Keyword(s):

Prostate Cancer ◽

Linear Regression ◽

Multiple Linear Regression ◽

Cancer Survival ◽

Disease Association ◽

The Body ◽

Survival Prediction ◽

Clinical Prognosis ◽

Disease Associations ◽

Auc Value

Long noncoding RNAs (lncRNAs) play an important role in various life processes of the body, especially in cancer. Current methods for predicting lncRNA–disease associations ignore the analysis of disease prognosis. In this study, a multiple linear regression model was constructed for lncRNA–disease association prediction based on clinical prognosis data (MlrLDAcp), which integrated clinical prognosis data for cancer with lncRNA transcript expression levels. MlrLDAcp could perform not only cancer survival prediction but also lncRNA–disease association prediction. Ultimately, the 60 lncRNAs most closely related to prostate cancer survival were selected from 481 candidate lncRNAs. The multiple linear regression relationship between the survival of 176 patients with prostate cancer and these 60 lncRNAs was then derived. Compared with previous studies, MlrLDAcp showed superior survival prediction ability and could effectively predict lncRNA–disease associations. MlrLDAcp had an area under the curve (AUC) value of 0.875 for survival prediction and an AUC value of 0.872 for lncRNA–disease association prediction. It could be an effective biological method for biomedical research.

Breast cancer survival prediction using seven prognostic biomarker genes

Oncology Letters ◽

10.3892/ol.2019.10635 ◽

2019 ◽

Cited By ~ 1

Author(s):

Liu Liu ◽

Zhilin Chen ◽

Wenjie Shi ◽

Hui Liu ◽

Weiyi Pang

Keyword(s):

Breast Cancer ◽

Cancer Survival ◽

Prognostic Biomarker ◽

Breast Cancer Survival ◽

Survival Prediction

Mol-BERT: An Effective Molecular Representation with BERT for Molecular Property Prediction

Wireless Communications and Mobile Computing ◽

10.1155/2021/7181815 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Juncai Li ◽

Xiaofei Jiang

Keyword(s):

Deep Learning ◽

Language Processing ◽

Large Scale ◽

Molecular Data ◽

Molecular Property ◽

Property Prediction ◽

Learning Framework ◽

Learning Techniques ◽

Potential Benefits ◽

Current Sequence

Molecular property prediction is an essential task in drug discovery. Most computational approaches based on deep learning techniques focus either on designing novel molecular representations or on combining them with advanced models. However, researchers pay less attention to the potential benefits of massive unlabeled molecular data (e.g., ZINC). This task becomes increasingly challenging owing to the limited scale of labeled data. Motivated by the recent advancements of pretrained models in natural language processing, the drug molecule can be naturally viewed as language to some extent. In this paper, we investigate how to develop the pretrained model BERT to extract useful molecular substructure information for molecular property prediction. We present a novel end-to-end deep learning framework, named Mol-BERT, that combines an effective molecular representation with a pretrained BERT model tailored for molecular property prediction. Specifically, a large-scale prediction BERT model is pretrained to generate the embedding of molecular substructures, by using four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, the pretrained BERT model can be fine-tuned on various molecular property prediction tasks. To examine the performance of our proposed Mol-BERT, we conduct several experiments on 4 widely used molecular datasets. In comparison to the traditional and state-of-the-art baselines, the results illustrate that our proposed Mol-BERT can outperform the current sequence-based methods and achieve at least 2% improvement in ROC-AUC score on the Tox21, SIDER, and ClinTox datasets.

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

10.21203/rs.3.rs-22670/v3 ◽

2020 ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E Braat ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Survival Prediction ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Abstract Background: Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest. Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism of ML is related to unsuitable performance measures and lack of interpretability, which is important for clinicians. Methods: In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, out of more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is given to the advantages and pitfalls of each method and to the interpretability of the ML techniques. Results: Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. The clinical endpoint is overall graft survival, defined as the time between transplantation and the date of graft failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion: In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. Of the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables.
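The C-index mentioned in the abstract above can be illustrated with a short sketch. The code below computes Harrell's concordance index for right-censored data: a pair of patients is comparable when the one with the shorter follow-up time had an observed event, and it is concordant when that patient also received the higher predicted risk. The function name, the toy data, and the simplification of skipping pairs with tied times are assumptions for illustration, not the evaluation code of the study.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical follow-up data for four patients.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.4, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Unlike the Brier score, the C-index measures only discrimination, not calibration, which is one reason the two metrics can favour different models.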

A WCO Based Cancer Survival Prediction Using Statistical Feature Selection

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/121942020 ◽

2020 ◽

Vol 9 (4) ◽

pp. 5029-5034

Author(s):

Sanku Rajendra Kumar

Keyword(s):

Feature Selection ◽

Cancer Survival ◽

Survival Prediction ◽

Statistical Feature
