Assessing the clinical utility of genomic expression data across human cancers

Oncotarget ◽  
2016 ◽  
Vol 7 (29) ◽  
pp. 45926-45936 ◽  
Author(s):  
Xinsen Xu ◽  
Lei Huang ◽  
Chun Hei Chan ◽  
Tao Yu ◽  
Runchen Miao ◽  
...  
Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 290-290
Author(s):  
Abdelrahman H Elsayed ◽  
Huiyun Wu ◽  
Xueyuan Cao ◽  
Susana C. Raimondi ◽  
James R. Downing ◽  
...  

Abstract

Introduction: Resistance and relapse remain major obstacles in the treatment of acute myeloid leukemia (AML). The pre-existence and persistence of drug-resistant leukemia stem cells (LSCs) is considered one of the major causes of relapse. A previous study (Ng et al., 2016) reported a prognostic signature of 17 genes (LSC17 score), differentially expressed in LSC+ compared to LSC- cell fractions, that predicted outcome in patients with AML and thereby classified patients into high- and low-risk groups. The goal of this study is to determine the validity of the LSC17 score in pediatric AML patients and to enhance its clinical utility by exploring a new score based on a limited number of stem cell genes.

Methods: 150 pediatric patients with AML enrolled in the multicenter AML02 clinical trial (ClinicalTrials.gov Identifier: NCT00136084) with Affymetrix U133A microarray gene expression data and clinical data were included in the study. Since only 14 of the 17 genes were represented on the Affymetrix U133A gene chip, we tested the validity of the LSC14 score using the previously defined equation (Ng et al., 2016) against multiple clinical endpoints: minimal residual disease (MRD), event-free survival (EFS) and overall survival (OS). To reduce model complexity, we applied a penalized regression algorithm, the least absolute shrinkage and selection operator (LASSO), as implemented in the glmnet R package, with EFS as the outcome variable. The score from the new equation, which included three genes, was designated pediatric-LSC3 (pLSC3). pLSC3 was tested in the AML02 cohort for association of high or low pLSC3 (split at the median value) with the clinical endpoints mentioned above. The pLSC3 score equation was validated using publicly available gene expression data from a relapse-enriched cohort of 117 pediatric AML patients enrolled in a Children's Oncology Group (COG) protocol (TARGET database). Cox proportional hazards models and the log-rank test were used for survival data analysis.

Results: AML02 cohort: Patients with high LSC14 scores (greater than the median) had significantly worse MRD (P < 0.0001), EFS (HR = 3.72, P < 0.00001) and OS (HR = 4.85, P < 0.00001) compared to patients with low LSC14 scores. After applying LASSO regression to simplify the score equation, only three genes (DNMT3B, CD34 and GPR56) remained significant to the model fit of the EFS data; we therefore created pLSC3 with coefficients as described in the equation: pLSC3_SCORE = (DNMT3B*0.0431) + (CD34*0.00076) + (GPR56*0.0326). Patients were classified as high or low pLSC3, and patients with high pLSC3 scores had significantly worse EFS (HR = 3.595, P < 0.0001; Figure 1A) and OS (HR = 4.53, P < 0.0001) and higher MRD after induction I and induction II (P < 0.00001 and P = 0.0001, respectively; Figure 1C). These results were further validated in an independent cohort of patients from the TARGET database, where a higher pLSC3 score was associated with worse EFS, OS and MRD (EFS: HR = 1.64, P = 0.0248, Figure 1B; OS: HR = 1.77, P = 0.0349; MRD: P = 0.0002, Figure 1D). Consistent results were also observed within standard-risk patients, where high pLSC3 was predictive of significantly worse outcome in both the AML02 and COG cohorts (AML02 EFS: HR = 2.97, P = 0.0153; COG EFS: HR = 2.22, P = 0.0096; Figure 1E and F, respectively). In a multivariate Cox regression model, pLSC3 score group was the only significant covariate (Table 1). It explained 13.1% of variability in EFS and 11.6% of variability in OS, while other prognostic factors such as risk group, FLT3 status, treatment arm and age collectively explained 15.1% and 12.1% of variability, respectively.

Discussion: In summary, our results show the validity of a previously defined LSC14 score in a pediatric AML population from the multicenter AML02 clinical trial. To enhance clinical utility, the score equation was further simplified and the final score (pLSC3) was derived from three genes: DNMT3B, which encodes a DNA methyltransferase; CD34, an important cell surface marker for early, undifferentiated LSCs; and GPR56, a G protein-coupled receptor of significance in AML. Given the need to refine classification of the highly heterogeneous group of patients with standard-risk AML, our results suggest that differentiating standard-risk patients by pLSC3 score should be considered in the future. We show the relevance of pLSC3 in two independent cohorts, opening up opportunities to improve treatment outcomes of pediatric patients with AML.

Disclosures: No relevant conflicts of interest to declare.
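The pLSC3 equation and the median split described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch: the coefficients come from the equation quoted above, but the function names, the patient dictionaries and all expression values are hypothetical, and the choice to label a score "high" only when it is strictly above the cohort median is an assumption (the abstract says only that groups were "based on the median value").

```python
from statistics import median

# Coefficients as quoted in the abstract's pLSC3 equation.
PLSC3_COEFFS = {"DNMT3B": 0.0431, "CD34": 0.00076, "GPR56": 0.0326}


def plsc3_score(expr):
    """Weighted sum of the three gene expression values for one patient."""
    return sum(coef * expr[gene] for gene, coef in PLSC3_COEFFS.items())


def split_by_median(scores):
    """Label a score 'high' if strictly above the cohort median (assumption), else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical expression values, for illustration only (not study data).
patients = [
    {"DNMT3B": 10.0, "CD34": 1000.0, "GPR56": 20.0},
    {"DNMT3B": 50.0, "CD34": 4000.0, "GPR56": 100.0},
    {"DNMT3B": 5.0, "CD34": 200.0, "GPR56": 10.0},
    {"DNMT3B": 80.0, "CD34": 6000.0, "GPR56": 150.0},
]

scores = [plsc3_score(p) for p in patients]
groups = split_by_median(scores)

for score, group in zip(scores, groups):
    print(f"pLSC3 = {score:.3f} -> {group}")
```

The resulting high/low labels are what a survival analysis (for example a log-rank test or Cox model, as named in the abstract) would then compare; that step is not shown here.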


2020 ◽  
Author(s):  
Dmitry Rychkov ◽  
Jessica Neely ◽  
Tomiko Oskotsky ◽  
Steven Yu ◽  
Noah Perlmutter ◽  
...  

Abstract

Background/Purpose: There is an urgent need to identify effective biomarkers for early diagnosis of rheumatoid arthritis (RA) and to accurately monitor disease activity. Here we define an RA meta-profile using publicly available cross-tissue gene expression data and apply machine learning to identify putative biomarkers, which we further validate on independent datasets.

Methods: We carried out a comprehensive search for publicly available microarray gene expression data in the NCBI Gene Expression Omnibus database for whole blood and synovial tissues from RA patients and healthy controls. The raw data from 13 synovium datasets with 284 samples and 14 blood datasets with 1,885 samples were downloaded and processed. The datasets for each tissue were merged, batch corrected and split into training and test sets. We then developed and applied a robust feature selection pipeline to identify genes dysregulated in both tissues and highly associated with RA. From the training data, we identified a set of overlapping differentially expressed genes following the condition of co-directionality. The classification performance of each gene in the resulting set was evaluated on the testing sets using the area under a receiver operating characteristic curve. Five independent datasets were used to validate and threshold the feature-selected (FS) genes. Finally, we defined the RA Score, composed of the geometric mean of the selected RA Score Panel genes, and demonstrated its clinical utility.

Results: This feature selection pipeline resulted in a set of 25 upregulated and 28 downregulated genes. To assess the robustness of these FS genes, we trained a Random Forest machine learning model with this set of 53 genes and then with the set of 33 overlapping genes differentially expressed in both tissues, and tested on the validation cohorts. The model with FS genes outperformed the model with common DE genes with AUC 0.89 ± 0.04 vs 0.87 ± 0.04. The FS genes were further validated on the 5 independent datasets, resulting in 10 upregulated genes, TNFAIP6, S100A8, TNFSF10, DRAM1, LY96, QPCT, KYNU, ENTPD1, CLIC1, and ATP6V0E1, which are involved in innate immune system pathways, including neutrophil degranulation and apoptosis. There were also three downregulated genes, HSP90AB1, NCL, and CIRBP, that are involved in metabolic processes and T-cell receptor regulation of apoptosis.

To investigate the clinical utility of the 13 validated genes, the RA Score was developed and found to be highly correlated with the disease activity score based on the 28 examined joints (DAS28) (r = 0.33 ± 0.03, p = 7e-9) and able to distinguish osteoarthritis (OA) from RA samples (OR 0.57, 95% CI [0.34, 0.80], p = 8e-10). Moreover, the RA Score was not significantly different for rheumatoid factor (RF) positive and RF-negative RA sub-phenotypes (p = 0.9) and also distinguished polyarticular juvenile idiopathic arthritis (polyJIA) from healthy individuals in 10 independent pediatric cohorts (OR 1.15, 95% CI [1.01, 1.3], p = 2e-4), suggesting the generalizability of this score in clinical applications. The RA Score was also able to monitor the treatment effect among RA patients (t-test of treated vs untreated, p = 2e-4). Finally, we performed immunoblotting analysis of 6 proteins in unstimulated PBMC lysates from an independent cohort of 8 newly diagnosed RA patients and 7 healthy controls, where two proteins, TNFAIP6/TSG6 and HSP90AB1/HSP90, were validated and the S100A8 protein showed near-significant up-regulation.

Conclusion: The RA Score, consisting of 13 putative biomarkers identified through a robust feature selection procedure on public data and validated using multiple independent data sets, could be useful in the diagnosis and treatment monitoring of RA.
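As a rough illustration of a geometric-mean panel score of the kind the abstract describes, the Python sketch below computes one value per sample from the 13 validated genes. The abstract states only that the RA Score is "composed of the geometric mean" of the panel genes; how the upregulated and downregulated genes are combined is not given there, so the subtraction used here (geometric mean of upregulated genes minus geometric mean of downregulated genes) is an assumption, as are the function names and every expression value.

```python
import math

# Gene lists as named in the abstract.
UP_GENES = ["TNFAIP6", "S100A8", "TNFSF10", "DRAM1", "LY96",
            "QPCT", "KYNU", "ENTPD1", "CLIC1", "ATP6V0E1"]
DOWN_GENES = ["HSP90AB1", "NCL", "CIRBP"]


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ra_score(expr):
    """ASSUMED combination rule: GM(upregulated) - GM(downregulated)."""
    up = geometric_mean([expr[g] for g in UP_GENES])
    down = geometric_mean([expr[g] for g in DOWN_GENES])
    return up - down


# Hypothetical positive expression values, for illustration only.
sample_a = {g: 8.0 for g in UP_GENES}
sample_a.update({g: 2.0 for g in DOWN_GENES})

sample_b = {g: 4.0 for g in UP_GENES}
sample_b.update({"HSP90AB1": 2.0, "NCL": 4.0, "CIRBP": 8.0})

score_a = ra_score(sample_a)
score_b = ra_score(sample_b)
print(f"sample_a: {score_a:.3f}, sample_b: {score_b:.3f}")
```

A geometric mean is less sensitive than an arithmetic mean to a single very highly expressed gene, which is one plausible reason to prefer it for a multi-gene panel; it requires strictly positive inputs.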


2017 ◽  
Vol 2017 ◽  
pp. 1-9
Author(s):  
Annamalai Muthiah ◽  
Susanna R. Keller ◽  
Jae K. Lee

Different computational approaches for inferring network relationships from time-series genomic data on human disease mechanisms have been examined and compared under the recent Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenge. Many of these approaches infer all possible relationships among all candidate genes, often resulting in extremely crowded candidate networks with many more false positives than true positives. To overcome this limitation, we introduce a novel approach, Module Anchored Network Inference (MANI), that constructs networks by sequentially analyzing small adjacent building blocks (modules). Using MANI, we inferred a 7-gene adipogenesis network based on time-series gene expression data during adipocyte differentiation. MANI was also applied to infer two 10-gene networks based on time-course perturbation datasets from the DREAM3 and DREAM4 challenges. In these applications, MANI accurately inferred and distinguished serial, parallel, and time-dependent gene interactions and network cascades, showing superior performance to other in silico network inference techniques for discovering and reconstructing gene network relationships.

