Machine learning models to predict in vivo drug response via optimal dimensionality reduction of tumour molecular profiles

Mapping Intimacies ◽

10.1101/277772 ◽

2018 ◽

Cited By ~ 3

Author(s):

Linh Nguyen ◽

Stefan Naulaerts ◽

Alexandra Bomane ◽

Alejandra Bruna ◽

Ghita Ghislat ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Single Gene ◽

Response Prediction ◽

Gene Marker ◽

Data Sets ◽

Precision Oncology ◽

Gene Markers ◽

Combining Data ◽

Gene Alterations

ABSTRACT: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, lessons from the past indicate that single-gene markers of response are rare and/or often fail to achieve a significant impact in the clinic. In this context, Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. Results show that combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: Paclitaxel (breast cancer), Binimetinib (breast cancer) and Cetuximab (colorectal cancer). Interestingly, each of these ML models identifies some responsive PDXs not harbouring the best actionable mutation for that case (such PDXs were missed by those single-gene markers). Moreover, ML multi-gene predictors generally retrieve a much higher proportion of treatment-sensitive PDXs than the corresponding single-gene marker. As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if multiple ML algorithms were applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.


Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles

Biomedicines ◽

10.3390/biomedicines9101319 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1319

Author(s):

Linh C. Nguyen ◽

Stefan Naulaerts ◽

Alejandra Bruna ◽

Ghita Ghislat ◽

Pedro J. Ballester

Keyword(s):

Breast Cancer ◽

Drug Response ◽

Single Gene ◽

Cancer Drug ◽

Gene Marker ◽

Precision Oncology ◽

Gene Markers ◽

Combining Data ◽

Gene Alterations

(1) Background: Inter-tumour heterogeneity is one of cancer’s most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.
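The false-negative comparison reported in the Results can be illustrated with a toy calculation. This is only a sketch, not the authors' pipeline: the PDX labels and both sets of predictions are made-up values, and `false_negatives` is a hypothetical helper introduced for illustration.

```python
def false_negatives(actual_sensitive, predicted_sensitive):
    """Count sensitive PDXs that a predictor failed to flag as sensitive."""
    return sum(
        1
        for actual, predicted in zip(actual_sensitive, predicted_sensitive)
        if actual and not predicted
    )


# Hypothetical data for 8 PDXs (True = sensitive to the drug).
actual = [True, True, True, True, False, False, False, False]
# A single-gene marker flags only the PDXs carrying one specific alteration.
single_gene_calls = [True, False, False, False, False, False, False, False]
# A multi-gene classifier can also flag PDXs that lack that alteration.
multi_gene_calls = [True, True, True, False, False, True, False, False]

fn_single = false_negatives(actual, single_gene_calls)
fn_multi = false_negatives(actual, multi_gene_calls)
print(f"false negatives: single-gene={fn_single}, multi-gene={fn_multi}")
```

In this invented example the multi-gene calls recover more sensitive PDXs at the price of one false positive, which is the kind of trade-off a full evaluation would need to weigh.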


Machine Learning-Based A Priori Chemotherapy Response Prediction in Breast Cancer Patients using Textural CT Biomarkers

2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) ◽

10.1109/embc44109.2020.9176099 ◽

2020 ◽

Author(s):

Hadi Moghadas-Dastjerdi ◽

Hira R. Sha-E-Tallat ◽

Lakshmanan Sannachi ◽

Laurentius O. Osapoeta ◽

Ali Sadeghi-Naini ◽

...

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Cancer Patients ◽

A Priori ◽

Response Prediction ◽

Chemotherapy Response ◽

Breast Cancer Patients


Can machine learning be used to reduce overtreatment of the axilla in breast cancer? Results from a retrospective cohort study (Preprint)

10.2196/preprints.34600 ◽

2021 ◽

Author(s):

Felix Jozsa ◽

Rose Baker ◽

Peter Kelly ◽

Muneer Ahmed ◽

Michael Douek

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Axillary Node ◽

Machine Learning Techniques ◽

Data Sets ◽

Axillary Ultrasound ◽

Primary Surgery ◽

Large Patient ◽

Specificity And Sensitivity ◽

Axillary Node Clearance

BACKGROUND Patients with early breast cancer undergoing primary surgery who have low axillary nodal burden can safely forego axillary node clearance (ANC). However, routine use of axillary ultrasound (AUS) leads to 43% of patients in this group having ANC unnecessarily following a positive AUS. The intersection of machine learning with medicine can provide innovative ways to understand specific risk within large patient data sets, but this has not yet been trialled in the arena of axillary node management in breast cancer. OBJECTIVE To assess if machine learning techniques could be used to improve pre-operative identification of patients with low and high axillary metastatic burden. METHODS A single-centre retrospective analysis was performed on patients with breast cancer who had a preoperative axillary ultrasound, and the specificity and sensitivity of AUS were calculated. Machine learning and standard statistical methods were applied to the data to see if, when used preoperatively, they could have improved the accuracy of AUS to better discern between high and low axillary burden. RESULTS The study included 459 patients; 31% (n=142) had a positive AUS, and, among this group, 62% (n=88) had two or fewer macrometastatic nodes at ANC. When applied to the dataset, logistic regression outperformed AUS and machine learning methods with a specificity of 0.950, correctly identifying 66 patients in this group who had been incorrectly classed as having high axillary burden by AUS alone. Of all the methods, the artificial neural network had the highest accuracy (0.919). Interestingly, AUS had the highest sensitivity of all methods (0.777), underlining its utility in this setting. CONCLUSIONS Machine learning greatly improves identification of the important subgroup of patients with no palpable axillary disease, positive ultrasound, and more than two metastatically involved nodes. 
A negative ultrasound in patients with no palpable lymphadenopathy is highly indicative of low burden and it is unclear if sentinel node biopsy adds value in this situation. CLINICALTRIAL n/a
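The sensitivity, specificity and accuracy figures quoted in this abstract all derive from a 2x2 confusion matrix. The sketch below shows the standard formulas, assuming that "positive" means high axillary burden. The counts are hypothetical and are not taken from the study, and `diagnostic_metrics` is an illustrative helper name.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true high-burden cases detected
    specificity = tn / (tn + fp)  # share of true low-burden cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=40, fp=10, tn=190, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A method can lead on one of these metrics and trail on another, which is how the abstract can report the best specificity, accuracy and sensitivity for three different methods.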


Molecular markers for circulating tumor cells in breast cancer.

Journal of Clinical Oncology ◽

10.1200/jco.2011.29.27_suppl.223 ◽

2011 ◽

Vol 29 (27_suppl) ◽

pp. 223-223

Author(s):

R. Zeillinger ◽

E. Obermayr ◽

A. Fink-Retter ◽

G. Heinze ◽

A. Reinthaller ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Patients ◽

Tumor Cells ◽

Circulating Tumor Cells ◽

Gene Panel ◽

Control Group ◽

Gene Marker ◽

Breast Cancer Patients ◽

Gene Markers ◽

Blood Samples

Background: Recently, we identified a six-gene panel (CCNE2, DKFZp762E1312, EMP2, MAL2, PPIC, and SLC6A8) for the RT-qPCR-based detection of circulating tumor cells (CTC) in breast cancer patients. The aim of the present study was to evaluate the gene panel in further blood samples. Methods: Blood samples were taken from breast cancer patients with metastatic disease (MBC, N=10) or with no evidence of disease (NED, N=30). Putative CTC were enriched by Oncoquick density gradient centrifugation. Total RNA was isolated with RNeasy Micro Kit (QIAgen). Template cDNA was generated with M-MLV Reverse Transcriptase, RNase H Minus (Promega) and random nonamers as primers. RT-qPCR was performed in duplicate reactions using TaqMan Assays (Applied Biosystems) with default thermal cycling parameters. Raw data were analyzed with the AB7900 Sequence Detection Software version 2.2.2 using automatic baseline correction and manual cycle threshold setting. Gene expression was normalized to GAPDH expression. A threshold value TX for each gene X was set at two standard deviations above the mean dCtX value in the healthy control group. A patient was defined as CTC-positive if at least one gene marker was over-expressed compared to the defined threshold. Results: The gene panel consisting of CCNE2, DKFZp762E1312, EMP2, MAL2, PPIC, and SLC6A8 identified 4/11 MBC but only 5/27 NED patients as CTC positive (p=0.163). By adding known CTC markers (SCGB2A2, TFF1, FXYD3, AGR2, S100A18, and EPCAM) to the panel, 7/11 MBC but only 6/27 NED patients were CTC positive (p=0.018). The presence of CTC in NED patients correlated with pN staging (p=0.026). Only one out of the six CTC positive NED patients relapsed within the observation period (median 35 months, range 25-39 months from blood sampling). We observed no correlation of CTC positivity and recurrence in NED patients.
Conclusions: The sensitivity of the RT-qPCR-based CTC detection in breast cancer patients may be enhanced by adding known CTC markers (SCGB2A2, TFF1, FXYD3, AGR2, S100A18, and EPCAM) to the six-gene panel (CCNE2, DKFZp762E1312, EMP2, MAL2, PPIC, and SLC6A8). Longer follow-up times are needed to evaluate the predictive value of the gene markers on survival.
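The CTC-positivity rule described in the Methods (per-gene threshold at two standard deviations above the healthy-control mean, positive if any gene exceeds its threshold) can be sketched as follows. Several things here are assumptions: the dCt values are invented, the sample standard deviation is used because the abstract does not say which estimator was applied, and dCt is taken to be scaled so that larger values mean higher expression. The function names are hypothetical.

```python
from statistics import mean, stdev


def gene_thresholds(control_dct):
    """Map each gene to mean + 2 * SD of its dCt values in healthy controls."""
    return {
        gene: mean(values) + 2 * stdev(values)
        for gene, values in control_dct.items()
    }


def is_ctc_positive(patient_dct, thresholds):
    """A patient is CTC-positive if at least one gene exceeds its threshold."""
    return any(patient_dct[gene] > limit for gene, limit in thresholds.items())


# Invented control measurements for two of the panel genes.
controls = {"EMP2": [1.0, 2.0, 3.0], "MAL2": [0.0, 1.0, 2.0]}
thresholds = gene_thresholds(controls)

patient_a = {"EMP2": 4.5, "MAL2": 0.5}  # EMP2 is above its threshold
patient_b = {"EMP2": 3.0, "MAL2": 2.5}  # no gene is above its threshold
print(is_ctc_positive(patient_a, thresholds), is_ctc_positive(patient_b, thresholds))
```

Because a single over-threshold gene is enough, adding markers to the panel can only keep or raise the number of positive calls, which is consistent with the higher positivity counts reported for the extended panel.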


Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

F1000Research ◽

10.12688/f1000research.10529.1 ◽

2016 ◽

Vol 5 ◽

pp. 2927 ◽

Cited By ~ 9

Author(s):

Linh Nguyen ◽

Cuong C Dang ◽

Pedro J. Ballester

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cell Line ◽

Cell Lines ◽

Gene Expression Data ◽

Single Gene ◽

Cancer Cell Line ◽

Expression Data ◽

Gene Markers ◽

Pan Cancer

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than K-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: We now know that this type of model can predict in vitro tumour response to these drugs. These models can thus be further investigated on in vivo tumour models.
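The time-based validation and the precision/recall trade-off described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the records, the `release` field and the marker calls are invented, and `temporal_split` and `precision_recall` are hypothetical helper names.

```python
def temporal_split(records, cutoff_release):
    """Train on records up to the cutoff release; test on strictly newer ones."""
    train = [r for r in records if r["release"] <= cutoff_release]
    test = [r for r in records if r["release"] > cutoff_release]
    return train, test


def precision_recall(actual, predicted):
    """Precision and recall for the 'sensitive' class (True)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if p and not a)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented cell-line records; 'release' stands in for the data release number.
records = [
    {"cell_line": "A", "release": 1, "sensitive": True},
    {"cell_line": "B", "release": 1, "sensitive": False},
    {"cell_line": "C", "release": 2, "sensitive": True},
    {"cell_line": "D", "release": 2, "sensitive": True},
    {"cell_line": "E", "release": 2, "sensitive": False},
]
train, test = temporal_split(records, cutoff_release=1)

actual = [r["sensitive"] for r in test]
marker_calls = [True, False, False]  # hypothetical single-gene marker calls on C, D, E
precision, recall = precision_recall(actual, marker_calls)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Unlike K-fold cross-validation, this split never lets newer measurements leak into training, so batch effects between releases show up in the test score instead of being averaged away.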


Development and clinical validation of Lantern Pharma’s AI engine: Response algorithm for drug positioning and rescue (RADR).

Journal of Clinical Oncology ◽

10.1200/jco.2019.37.15_suppl.3114 ◽

2019 ◽

Vol 37 (15_suppl) ◽

pp. 3114-3114

Author(s):

Umesh Kathad ◽

Yuvanesh Vedaraju ◽

Aditya Kulkarni ◽

Gregory Tobin ◽

Panna Sharma

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Machine Learning ◽

Prediction Accuracy ◽

Response Prediction ◽

Supervised Machine Learning ◽

Clinical Validation ◽

Tumor Type ◽

Patient Records ◽

Number Of Patients

Background: The Response Algorithm for Drug positioning and Rescue (RADR) technology is Lantern Pharma's proprietary Artificial Intelligence (AI)-based machine learning approach for biomarker identification and patient stratification. RADR is a combination of three automated modules working sequentially to generate drug- and tumor type-specific gene signatures predictive of response. Methods: RADR integrates genomics, drug sensitivity and systems biology inputs with supervised machine learning strategies and generates gene expression-based responder/non-responder profiles for specific tumor indications with high accuracy, in addition to identification of new correlations of genetic biomarkers with drug activity. Pre-treatment patient gene expression profiles along with corresponding treatment outcomes were used as algorithm inputs. Model training was typically performed using an initial set of genes derived from cancer cell line data when available, and further applied to patient data for model tuning, cross-validation and final gene signature development. Model testing and performance computation were carried out on patient records held out as blinded datasets. Response prediction accuracy and sensitivity were among the model performance metrics calculated. Results: On average, RADR achieved a response prediction accuracy of 80% during clinical validation. We present retrospective analyses performed as part of RADR validation using more than 10 independent datasets of patients from selected cancer types treated with approved drugs including chemotherapy, targeted therapy and immunotherapy agents. For instance, the application of the RADR program to a Paclitaxel trial in breast cancer patients could have potentially reduced the number of patients in the treatment arm from 92 unselected patients to 24 biomarker-selected patients to produce the same number of responders. Also, we cite published evidence correlating genes from RADR-derived biomarkers with increased Paclitaxel sensitivity in breast cancer. Conclusions: The value of the RADR platform architecture is derived from its validation through the analysis of about 17 million oncology-specific clinical data points and about 1,000 patient records. By implementing unique biological, statistical and machine learning workflows, Lantern Pharma's RADR technology is capable of deriving robust biomarker panels for pre-selecting true responders for recruitment into clinical trials, which may improve the success rate of oncology drug approvals.
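The Paclitaxel example implies an enrichment in response rate that can be worked out directly. The arm sizes (92 and 24) come from the abstract; the number of responders is not reported there, so the value used below is a hypothetical placeholder. The ratio of the two response rates does not depend on it.

```python
UNSELECTED_PATIENTS = 92  # from the abstract
SELECTED_PATIENTS = 24    # from the abstract


def response_rate(responders, enrolled):
    """Fraction of enrolled patients who respond."""
    return responders / enrolled


# Hypothetical responder count; the abstract only says it is the same in both arms.
responders = 12

rate_unselected = response_rate(responders, UNSELECTED_PATIENTS)
rate_selected = response_rate(responders, SELECTED_PATIENTS)
enrichment = rate_selected / rate_unselected  # equals 92 / 24 for any responder count

print(f"unselected rate={rate_unselected:.3f}")
print(f"selected rate={rate_selected:.3f}")
print(f"enrichment factor={enrichment:.2f}")
```

Under the abstract's "same number of responders" premise, the response rate in the biomarker-selected arm would be roughly 3.8 times that of the unselected arm.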


Handling Large Medical Data Sets for Disease Detection

Biomedical Engineering and Information Systems - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-61692-004-3.ch008 ◽

2011 ◽

pp. 162-176 ◽

Cited By ~ 2

Author(s):

Rahul Kala ◽

Anupam Shukla ◽

Ritu Tiwari

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Biomedical Engineering ◽

Intelligent Systems ◽

Computational Cost ◽

Poor Performance ◽

Medical Data ◽

Data Sets ◽

Training Time ◽

Low Computational Cost

Breakthroughs in intelligent systems have also reached biomedical engineering, where a range of models are being applied to detect diseases automatically from a set of parameters or inputs. Continued research in this field has produced large databases for many diseases, and models become very difficult to train on them. The number of attributes is also constantly rising. This increases the dimensionality of the problem and ultimately leads to poor performance. In this chapter we deal with methods to handle these situations. We discuss a mechanism to divide data between different sub-systems. We also discuss a method of dividing the attributes to reduce training time and complexity. The resultant systems train better due to their low computational cost and hence give better performance. We validated this with the Breast Cancer database from the UCI Machine Learning repository and found our algorithm optimal.
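One simple way to picture the division of attributes between sub-systems is a round-robin partition, where each sub-system sees only its own slice of every record. The abstract does not describe the chapter's actual partitioning scheme, so the sketch below is an assumption made purely for illustration, with hypothetical function names and an invented 9-attribute record.

```python
def partition_attributes(n_attributes, n_subsystems):
    """Assign attribute indices to sub-systems in round-robin order."""
    groups = [[] for _ in range(n_subsystems)]
    for index in range(n_attributes):
        groups[index % n_subsystems].append(index)
    return groups


def project(row, attribute_indices):
    """Keep only the attributes that belong to one sub-system."""
    return [row[i] for i in attribute_indices]


# Invented record with 9 attributes, split across 3 sub-systems.
groups = partition_attributes(n_attributes=9, n_subsystems=3)
row = [5, 1, 1, 1, 2, 1, 3, 1, 1]
views = [project(row, group) for group in groups]

for group, view in zip(groups, views):
    print(group, view)
```

Each sub-system then trains on a lower-dimensional input, which is where the reduction in training time would come from. How the sub-system outputs are combined is left open here.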


Machine learning and feature selection for drug response prediction in precision oncology applications

Biophysical Reviews ◽

10.1007/s12551-018-0446-z ◽

2018 ◽

Vol 11 (1) ◽

pp. 31-39 ◽

Cited By ~ 45

Author(s):

Mehreen Ali ◽

Tero Aittokallio

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Drug Response ◽

Response Prediction ◽

Precision Oncology ◽

Selection For


Deep learning assisted multi-omics integration for survival and drug-response prediction in breast cancer

BMC Genomics ◽

10.1186/s12864-021-07524-2 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Vidhi Malik ◽

Yogesh Kalakoti ◽

Durai Sundar

Keyword(s):

Breast Cancer ◽

Cancer Patients ◽

Drug Response ◽

Prediction Models ◽

Response Prediction ◽

Precision Oncology ◽

Z Score ◽

Breast Cancer Patients ◽

Omics Integration ◽

Ic50 Values

Background: Survival and drug response are two highly emphasized clinical outcomes in cancer research that direct the prognosis of a cancer patient. Here, we have proposed a late multi-omics integrative framework that robustly quantifies survival and drug response for breast cancer patients with a focus on the relative predictive ability of available omics datatypes. Neighborhood component analysis (NCA), a supervised feature selection algorithm, selected relevant features from multi-omics datasets retrieved from The Cancer Genome Atlas (TCGA) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. A neural network framework, fed with NCA-selected features, was used to develop survival and drug response prediction models for breast cancer patients. The drug response framework used regression and unsupervised clustering (K-means) to segregate samples into responders and non-responders based on their predicted IC50 values (Z-score). Results: The survival prediction framework was highly effective in categorizing patients into risk subtypes with an accuracy of 94%. Compared to single-omics and early integration approaches, our drug response prediction models performed significantly better and were able to predict IC50 values (Z-score) with a mean square error (MSE) of 1.154 and an overall regression value of 0.92, showing a linear relationship between predicted and actual IC50 values. Conclusion: The proposed omics integration strategy provides an effective way of extracting critical information from diverse omics data types enabling estimation of prognostic indicators. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.
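The last step of the drug response framework (Z-scoring predicted IC50 values, then using K-means to separate responders from non-responders) can be sketched with a one-dimensional two-cluster K-means. This is a minimal illustration, not the authors' implementation: the IC50 values are invented, the population standard deviation is an assumed choice, lower IC50 is assumed to indicate a responder, and the input is assumed to contain at least two distinct values.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def two_means_1d(values, iterations=20):
    """1-D K-means with k=2; label 0 is the low-value cluster, 1 the high one."""
    low, high = min(values), max(values)  # initial centroids
    labels = []
    for _ in range(iterations):
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low = mean([v for v, lab in zip(values, labels) if lab == 0])
        high = mean([v for v, lab in zip(values, labels) if lab == 1])
    return labels


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean([(a - p) ** 2 for a, p in zip(actual, predicted)])


# Invented predicted IC50 values for six samples.
predicted_ic50 = [0.1, 0.2, 0.3, 5.0, 6.0, 7.0]
scores = z_scores(predicted_ic50)
labels = two_means_1d(scores)
groups = ["responder" if lab == 0 else "non-responder" for lab in labels]
print(groups)
```

The `mse` helper shows how a regression error like the one quoted in the Results would be computed from actual and predicted Z-scores.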


Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

10.1101/095224 ◽

2016 ◽

Author(s):

Linh C. Nguyen ◽

Cuong C. Dang ◽

Pedro J. Ballester

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Cell Line ◽

Cell Lines ◽

Gene Expression Data ◽

Single Gene ◽

Cancer Cell Line ◽

Expression Data ◽

Gene Markers ◽

Pan Cancer

Abstract: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than K-fold cross-validation. Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG.
