Feature selection strategies for drug sensitivity prediction

2019 ◽  
Author(s):  
Krzysztof Koras ◽  
Dilafruz Juraeva ◽  
Julian Kreis ◽  
Johanna Mazur ◽  
Eike Staub ◽  
...  

Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. The major difficulty of this problem stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is the key to interpretable results and identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has so far not been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model utilizing genome-wide gene expression features and common data-driven feature selection techniques. Together, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation of observed and predicted response using the test set is achieved for Linifanib (r=0.75). Extending the drug-dependent features with gene expression signatures yields models that are most predictive of drug response for 60 drugs, with the best performing example of Dabrafenib. Examples of how pre-selection of features benefits the model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with the genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication. 
For a significant group of the compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action and be used as guidelines for their prescription. In general, choosing appropriate feature selection strategies has the potential to develop interpretable models that are indicative for therapy design.
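
The prior-knowledge selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy expression matrix, and the gene lists below are hypothetical and only show the idea of restricting a genome-wide feature matrix to a drug's target and pathway genes before any model is fitted.

```python
def select_prior_knowledge_features(gene_names, drug_targets, pathway_genes):
    """Return column indices of genes that are drug targets or pathway members.

    All three arguments are hypothetical inputs for illustration only.
    """
    wanted = set(drug_targets) | set(pathway_genes)
    return [i for i, gene in enumerate(gene_names) if gene in wanted]


def subset_columns(matrix, indices):
    """Keep only the selected feature columns of a row-major matrix."""
    return [[row[i] for i in indices] for row in matrix]


# Toy data (made up): rows are cell lines, columns are genes.
gene_names = ["BRAF", "TP53", "MAP2K1", "GAPDH"]
expression = [
    [2.1, 0.3, 1.5, 9.0],
    [0.4, 1.2, 0.2, 8.7],
    [1.8, 0.1, 1.1, 9.2],
]

# Assumed prior knowledge for a BRAF inhibitor: one target, two pathway genes.
selected = select_prior_knowledge_features(
    gene_names, drug_targets=["BRAF"], pathway_genes=["MAP2K1", "MAPK1"]
)
reduced = subset_columns(expression, selected)
```

Genes listed in the prior knowledge but absent from the matrix (here `MAPK1`) are simply ignored, so the reduced matrix contains only measured features.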

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Krzysztof Koras ◽  
Dilafruz Juraeva ◽  
Julian Kreis ◽  
Johanna Mazur ◽  
Eike Staub ◽  
...  

2020 ◽  
Vol 13 (S11) ◽  
Author(s):  
Khandakar Tanvir Ahmed ◽  
Sunho Park ◽  
Qibing Jiang ◽  
Yunku Yeu ◽  
TaeHyun Hwang ◽  
...  

Abstract Background: Drug sensitivity prediction and drug responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose, including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network on drug sensitivity prediction is investigated in this study. Methods: In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed, and both models integrate gene network information directly into the neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction. Results: In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to the deep neural network model; (4) the prediction performance is drug dependent and it may relate to the drug’s mechanism of action. 
Conclusions: The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimension, low-sample-size genomic datasets.
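
For orientation, the Pearson-correlation baseline that the network-based method is compared against can be sketched as follows. This is an assumed, generic form of correlation-based feature ranking (keep the k genes whose expression correlates most strongly with drug response), not code from the linked repository; the function names and toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def top_k_by_correlation(expression, response, k):
    """Rank gene columns by |r| with the response and return the top-k indices."""
    n_genes = len(expression[0])
    scored = []
    for j in range(n_genes):
        column = [row[j] for row in expression]
        scored.append((abs(pearson_r(column, response)), j))
    # Highest absolute correlation first; ties broken by column index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


# Toy data (made up): 4 cell lines x 3 genes and one drug response per cell line.
expression = [
    [1.0, 5.0, 0.2],
    [2.0, 5.1, 0.9],
    [3.0, 4.9, 0.1],
    [4.0, 5.0, 0.8],
]
response = [0.1, 0.2, 0.3, 0.4]
best_gene = top_k_by_correlation(expression, response, k=1)
```

Each gene is scored independently of the others, which is the property the abstract contrasts with network-based selection.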


Genomics ◽  
2019 ◽  
Vol 111 (5) ◽  
pp. 1078-1088 ◽  
Author(s):  
Mehmet Tan ◽  
Ozan Fırat Özgül ◽  
Batuhan Bardak ◽  
Işıksu Ekşioğlu ◽  
Suna Sabuncuoğlu

2013 ◽  
Vol 31 (15_suppl) ◽  
pp. e14544-e14544
Author(s):  
Eva Budinska ◽  
Jenny Wilding ◽  
Vlad Calin Popovici ◽  
Edoardo Missiaglia ◽  
Arnaud Roth ◽  
...  

e14544 Background: We identified CRC gene expression subtypes (ASCO 2012, #3511), which associate with established parameters of outcome as well as relevant biological motifs. We now substantiate their biological and potentially clinical significance by linking them with cell line data and drug sensitivity, primarily attempting to identify models for the poor prognosis subtypes Mesenchymal and CIMP-H like (characterized by EMT/stroma and immune-associated gene modules, respectively). Methods: We analyzed gene expression profiles of 35 publicly available cell lines with sensitivity data for 82 drug compounds, and our 94 cell lines with data on sensitivity for 7 compounds and colony morphology. Because stromal and immune-associated genes lose their relevance in vitro, we trained a new classifier based on genes expressed in both systems, which identifies the subtypes in both tissue and cell cultures. Cell line subtypes were validated by comparing their enrichment for molecular markers with that of our CRC subtypes. Drug sensitivity was assessed by linking original subtypes with 92 drug response signatures (MsigDB) via gene set enrichment analysis, and by screening drug sensitivity of cell line panels against our subtypes (Kruskal-Wallis test). Results: Of the cell lines, 70% could be assigned to a subtype with a probability as high as 0.95. The cell line subtypes were significantly associated with their KRAS, BRAF and MSI status and corresponded to our CRC subtypes. Interestingly, the cell lines which in matrigel created a network of undifferentiated cells were assigned to the Mesenchymal subtype. Drug response studies revealed potential sensitivity of subtypes to multiple compounds, in addition to what could be predicted based on their mutational profile (e.g. sensitivity of the CIMP-H subtype to Dasatinib, p<0.01). 
Conclusions: Our data support the biological and potentially clinical significance of the CRC subtypes in their association with cell line models, including results of drug sensitivity analysis. Our subtypes might not only have prognostic value but might also be predictive for response to drugs. Subtyping cell lines further substantiates their significance as relevant models for functional studies.
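
The Kruskal-Wallis test named in the Methods compares drug sensitivity across subtypes using ranks rather than raw values. A minimal sketch of the H statistic with the standard tie correction is given below; the function name and the toy sensitivity values are hypothetical, and the p-value step (comparing H with a chi-square distribution on k-1 degrees of freedom) is left out.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of non-empty groups of numbers.

    Tied values receive the average of the ranks they span, and H is divided
    by the usual tie-correction factor. Assumes not all values are identical.
    """
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0

    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Values at positions i..j-1 share ranks i+1..j; use their average.
        average_rank = (i + 1 + j) / 2.0
        for _, g in pooled[i:j]:
            rank_sums[g] += average_rank
        tied = j - i
        tie_term += tied ** 3 - tied
        i = j

    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Toy sensitivity values (made up) for cell lines in three subtypes.
subtype_sensitivity = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_statistic = kruskal_wallis_h(subtype_sensitivity)
```

A larger H indicates that the rank distributions of the groups differ more than expected if all subtypes responded alike.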


2014 ◽  
Vol 15 (4) ◽  
pp. 347-353 ◽  
Author(s):  
G Riddick ◽  
H Song ◽  
S L Holbeck ◽  
W Kopp ◽  
J Walling ◽  
...  

2020 ◽  
Author(s):  
Francisco J. Esteban ◽  
Peter J. Tonellato ◽  
Dennis P. Wall

Abstract The genetic heterogeneity of autism has stymied the search for causes and cures. Even whole-genomic studies on large numbers of families have yielded results of relatively little impact. In the present work, we analyze two genomic databases using a novel strategy that takes prior knowledge of genetic relationships into account and that was designed to boost signal important to our understanding of the molecular basis of autism. Our strategy was designed to identify significant genomic variation within a priori defined biological concepts and improves signal detection while lessening the severity of multiple test correction seen in standard analysis of genome-wide association data. Upon application of our approach using 3,244 biological concepts, we detected genomic variation in 68 biological concepts with significant association to autism in comparison to family-based controls. These concepts clustered naturally into a total of 19 classes, principally including cell adhesion, cancer, and immune response. The top-ranking concepts contained high percentages of genes already suspected to play roles in autism or in a related neurological disorder. In addition, many of the sets associated with autism at the DNA level also proved to be predictive of changes in gene expression within a separate population of autistic cases, suggesting that the signature of genomic variation may also be detectable in blood-based transcriptional profiles. This robust cross-validation with gene expression data from individuals with autism coupled with the enrichment within autism-related neurological disorders supported the possibility that the mutations play important roles in the onset of autism and should be given priority for further study. 
In sum, our work provides new leads into the genetic underpinnings of autism and highlights the importance of reanalysis of genomic studies of complex disease using prior knowledge of genetic organization. Author Summary: The genetic heterogeneity of autism has stymied the search for causes and cures. Even whole-genomic studies on large numbers of families have yielded results of relatively little impact. In the present work, we reanalyze two of the most influential whole-genomic studies using a novel strategy that takes prior knowledge of genetic relationships into account in an effort to boost signal important to our understanding of the molecular structure of autism. Our approach demonstrates that these genome-wide association studies contain more information relevant to autism than previously realized. We detected 68 highly significant collections of mutations that map to genes with measurable and significant changes in gene expression in autistic individuals, and that have been implicated in other neurological disorders believed to be closely related, and genetically linked, to autism. Our work provides leads into the genetic underpinnings of autism and highlights the importance of reanalysis of genomic studies of disease using prior knowledge of genetic organization.


2018 ◽  
Author(s):  
Adrià Fernández-Torras ◽  
Miquel Duran-Frigola ◽  
Patrick Aloy

Abstract Background: The integration of large-scale drug sensitivity screens and genome-wide experiments is changing the field of pharmacogenomics, revealing molecular determinants of drug response without the need for previous knowledge about drug action. In particular, transcriptional signatures of drug sensitivity may guide drug repositioning, prioritize drug combinations and point to new therapeutic biomarkers. However, the inherent complexity of transcriptional signatures, with thousands of differentially expressed genes, makes them hard to interpret, thus giving poor mechanistic insights and hampering translation to clinics. Methods: To simplify drug signatures, we have developed a network-based methodology to identify functionally coherent gene modules. Our strategy starts with the calculation of drug-gene correlations and is followed by a pathway-oriented filtering and a network-diffusion analysis across the interactome. Results: We apply our approach to 189 drugs tested in 671 cancer cell lines and observe a connection between gene expression levels of the modules and mechanisms of action of the drugs. Further, we characterize multiple aspects of the modules, including their functional categories, tissue-specificity and prevalence in clinics. Finally, we prove the predictive capability of the modules and demonstrate how they can be used as gene sets in conventional enrichment analyses. Conclusions: Network biology strategies like module detection are able to digest the outcome of large-scale pharmacogenomic initiatives, thereby contributing to their interpretability and improving the characterization of the drugs screened.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 4249-4249
Author(s):  
Amit Kumar Mitra ◽  
Ujjal Mukherjee ◽  
Taylor Harding ◽  
Holly Stessman ◽  
Ying Li ◽  
...  

Abstract Multiple myeloma (MM) is characterized by significant genetic diversity at subclonal levels that likely plays a defining role in the heterogeneity of tumor progression, clinical aggressiveness and drug sensitivity. Such heterogeneity is a driving factor in the evolution of MM, from founder clones through outgrowth of subclonal fractions. DNA sequencing studies on MM samples have indeed demonstrated such heterogeneity in subclonal architecture at diagnosis based on recurrent mutations in pathologically relevant genes that may ultimately lead to relapse. However, no study so far has reported a predictive gene expression signature that can identify, distinguish and quantify drug-sensitive and drug-resistant subpopulations within a bulk population of myeloma cells. In recent years, our laboratory has successfully developed a gene expression profile (GEP)-based signature that could not only distinguish drug response of MM cell lines, but also was effective in stratifying patient outcomes when applied to GEP profiles from MM clinical trials using proteasome inhibitors (PI) as chemotherapeutic agents. Further, we noted that myeloma cell lines that responded to the drug often contained a residual subpopulation of cells that did not respond, and likely were selectively propagated during drug treatment in vitro, and in patients. In this study, we performed targeted qRT-PCR analysis of single cells using a gene panel that included PI sensitivity genes and gene signatures that could discriminate between low and high-risk myeloma followed by intensive bioinformatics and statistical analysis for the classification and prediction of PI response in individual cells within bulk multiple myeloma tumors. 
Fluidigm's C1 Single-Cell Auto Prep System was used to perform automated single-cell capture, processing and cDNA synthesis on 576 pre-treatment cells from 12 cell lines representing a wide range of PI-sensitivity and 370 cells from 7 patient samples undergoing PI treatment, followed by targeted gene expression profiling of single cells using automated, high-throughput on-chip qRT-PCR analysis using 96.96 Dynamic Array IFCs on the BioMark HD System. Probability of resistance for each individual cell was predicted using a pipeline that employed the machine learning methods Random Forest, Support Vector Machine (radial and sigmoidal), LASSO and kNN (k Nearest Neighbor) for making single-cell GEP data-driven predictions/decisions. The weighted probabilities from each of the algorithms were used to quantify resistance of each individual cell and plotted using an ensemble forecasting algorithm. Using our drug response GEP signature at the single cell level, we could successfully identify distinct subpopulations of tumor cells that were predicted to be sensitive or resistant to PIs. Subsequently, we developed an R statistical analysis package (http://cran.r-project.org), SCATTome (Single Cell Analysis of Targeted Transcriptome), that can restructure data obtained from a Fluidigm qPCR analysis run, filter missing data, perform scaling of filtered data, build classification models and successfully predict drug response of individual cells and classify each cell's probability of response based on the targeted transcriptome. We will present the program output as graphical displays of single cell response probabilities. This package provides a novel classification method that has the potential to predict subclonal response to a variety of therapeutic agents. 
Disclosures Kumar: Skyline: Consultancy, Honoraria; BMS: Consultancy; Onyx: Consultancy, Research Funding; Sanofi: Consultancy, Research Funding; Janssen: Consultancy, Research Funding; Novartis: Research Funding; Takeda: Consultancy, Research Funding; Celgene: Consultancy, Research Funding.
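
The weighting of per-classifier probabilities mentioned in the abstract above could take many forms; the abstract does not specify one. The sketch below assumes the simplest option, a weighted average of each model's predicted resistance probability for one cell. The model names, probabilities, and weights are hypothetical, and the code is written in Python for illustration rather than taken from the SCATTome R package.

```python
def ensemble_resistance_probability(probabilities, weights):
    """Weighted average of per-model resistance probabilities for one cell.

    `probabilities` maps model name -> predicted probability of resistance;
    `weights` maps model name -> non-negative weight (assumed, e.g. reflecting
    how much each model is trusted). Both mappings are hypothetical inputs.
    """
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight == 0:
        raise ValueError("at least one model must have a positive weight")
    weighted_sum = sum(weights[name] * p for name, p in probabilities.items())
    return weighted_sum / total_weight


# Made-up predictions for a single cell from four classifiers.
cell_probabilities = {
    "random_forest": 0.8,
    "svm_radial": 0.6,
    "lasso": 0.7,
    "knn": 0.5,
}
model_weights = {
    "random_forest": 2.0,
    "svm_radial": 1.0,
    "lasso": 1.0,
    "knn": 1.0,
}
cell_score = ensemble_resistance_probability(cell_probabilities, model_weights)
```

Because the weights are normalized inside the function, the result stays between 0 and 1 whenever every input probability does.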

