Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Chi-Cheng Huang ◽  
Shih-Hsin Tu ◽  
Ching-Shui Huang ◽  
Heng-Hui Lien ◽  
Liang-Chuan Lai ◽  
...  

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools are limited to binary responses. Our aim was to apply partial least square (PLS) regression to breast cancer intrinsic taxonomy, in which five distinct molecular subtypes have been identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays used for PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n=535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to many more samples being left unclassified by PLS-regression. When these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved markedly (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
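The agreement statistic quoted in this abstract, weighted Kappa, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the abstract does not say which weighting scheme or category ordering was used, so the linear/quadratic option, the function name `weighted_kappa`, and the subtype labels in the demo are all assumptions made for the example.

```python
def weighted_kappa(labels_a, labels_b, categories, weights="linear"):
    """Cohen's weighted kappa for two label sequences.

    `categories` fixes the (assumed) ordering of the classes; the
    disagreement weight grows with the distance between class indices.
    Requires at least two categories.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(labels_a)

    # Observed confusion counts between the two raters/classifiers.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(labels_a, labels_b):
        observed[index[a]][index[b]] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d  # otherwise quadratic

    observed_cost = sum(
        disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    # Disagreement expected by chance, from the marginal totals.
    expected_cost = sum(
        disagreement(i, j) * row_totals[i] * col_totals[j] / n
        for i in range(k)
        for j in range(k)
    )
    return 1.0 - observed_cost / expected_cost


if __name__ == "__main__":
    # Hypothetical calls from two classifiers; "Unclassified" is dropped
    # before scoring, mirroring the second comparison in the abstract.
    ssp = ["LumA", "LumA", "LumB", "Basal", "HER2"]
    pls = ["LumA", "LumB", "LumB", "Basal", "Unclassified"]
    kept = [(a, b) for a, b in zip(ssp, pls) if b != "Unclassified"]
    order = ["LumA", "LumB", "HER2", "Basal"]  # assumed ordering
    print(weighted_kappa([a for a, _ in kept], [b for _, b in kept], order))
```

Because the subtypes are nominal, the result depends on the ordering passed in `categories`; with an unweighted (0/1) disagreement the statistic reduces to ordinary Cohen's kappa.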

Author(s):  
Eiman Tamah Alshammari

The motivation of this paper is to find the most accurate technique for predicting ground-level ozone at Al Jahra station, Kuwait. Data on meteorological variables (air temperature, relative humidity, solar radiation, wind direction and speed) and on the concentrations of seven environmental pollutants (SO2, NO2, NO, CO2, CO, NMHC, and CH4) were used to forecast the atmospheric ozone concentration. Three methods (PLS regression, support vector machine (SVM), and multiple least-square regression) were used to predict ground-level ozone, and fifteen parameters were used to evaluate their performance. Multiple least-square regression, partial least square regression (PLS regression), and SVM using linear and radial kernels were the best performers, with MAE (mean absolute error) of 9.17 × 10⁻³, 9.72 × 10⁻³, 9.64 × 10⁻³, and 9.12 × 10⁻³, respectively. SVM with a polynomial kernel had an MAE of 5.46 × 10⁻². These results show that these methods could be used to predict ground-level ozone concentrations at Al Jahra station in Kuwait.
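For readers unfamiliar with the MAE figure used to rank the models, the following is a small sketch of how it is computed. The function name and the sample values are illustrative assumptions and are not data from the study.

```python
def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Made-up ozone readings and model outputs, for illustration only.
    measured = [0.031, 0.042, 0.027]
    forecast = [0.035, 0.040, 0.030]
    print(mean_absolute_error(measured, forecast))
```

MAE is expressed in the same units as the predicted quantity, so the values quoted above are only comparable across models fitted to the same scaling of the ozone data.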


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 2531-2531
Author(s):  
J. Hannemann ◽  
H. Halfwerk ◽  
A. Velds ◽  
C. Loo ◽  
E. J. Rutgers ◽  
...  

2531 Background: Preoperative chemotherapy is increasingly employed to treat primary breast cancer, allowing an ‘in vivo chemosensitivity test’. Markers which predict a pathological complete response are urgently needed to refine this strategy. This study was conducted to evaluate the use of gene expression profiling to predict response to neoadjuvant anthracycline- or taxane-based chemotherapy. Methods: Patients with operable or locally advanced HER2-negative breast cancer received preoperative chemotherapy: either dose-dense doxorubicin and cyclophosphamide (ddAC) or capecitabine and docetaxel (CD). Core needle biopsies were taken before treatment and gene expression profiling was performed using 35k oligo microarrays. Results: Gene expression profiles were obtained from pretreatment biopsies of 63 tumors. 27% of the patients achieved a (near) pathologic complete remission (pCR), 40% of the patients had a partial remission and 33% of the patients did not respond to chemotherapy. Based on the gene expression profiles, tumors were assigned to the previously identified “molecular subtypes” luminal, basal-like or ERBB2-like (Sorlie et al., PNAS 98: 10869, 2001). 13 out of 25 patients with a basal-like tumor (52%) achieved a complete remission, whereas for the luminal tumors a pCR was only obtained in 2 out of 29 patients. Using four published gene expression classifiers of response to chemotherapy, a reasonable separation between responders and non-responders could be observed for two of these. We also performed exploratory supervised classification analyses on our dataset to identify a novel classifier. This resulted in a classifier for response to therapy irrespective of the chemotherapy regimen used and a second classifier specifically associated with response to ddAC chemotherapy. We will perform validation of these classifiers in samples from patients that are currently being enrolled in the study.
Conclusions: Basal-like tumors have a better response to neoadjuvant chemotherapy as compared to other tumor types. The identification of robust gene expression signatures for better response prediction may require larger patient groups and should probably be established separately for each of the molecular subtypes of breast cancer. No significant financial relationships to disclose.


2006 ◽  
Vol 9 (1) ◽  
pp. 1-3
Author(s):  
P. E. Lønning

Citation of original article: F. Bertucci, P. Finetti, J. Rougemont, E. Charafe-Jauffret, N. Cervera, C. Tarpin, et al. Gene expression profiling identifies molecular subtypes of inflammatory breast cancer. Cancer Research 2005; 65(6): 2170–8.

Abstract of the original article: Breast cancer is a heterogeneous disease. Comprehensive gene expression profiles obtained using DNA microarrays have revealed previously indistinguishable subtypes of non-inflammatory breast cancer (NIBC) related to different features of mammary epithelial biology and significantly associated with survival. Inflammatory breast cancer (IBC) is a rare, particular, and aggressive form of disease. Here we have investigated whether the five molecular subtypes described for NIBC (luminal A and B, basal, ERBB2 overexpressing, and normal breast-like) were also present in IBC. We monitored the RNA expression of approximately 8,000 genes in 83 breast tissue samples including 37 IBC, 44 NIBC, and 2 normal breast samples. Hierarchical clustering identified the five subtypes of breast cancer in both NIBC and IBC samples. These subtypes were highly similar to those defined in previous studies and associated with similar histoclinical features. The robustness of this classification was confirmed by the use of both alternative gene set and analysis method, and the results were corroborated at the protein level. Furthermore, we show that the differences in gene expression between NIBC and IBC and between IBC with and without pathologic complete response that we have recently reported persist in each subtype. Our results show that the expression signatures defining molecular subtypes of NIBC are also present in IBC. Obtained using different patient series and different microarray platforms, they reinforce confidence in the expression-based molecular taxonomy but also give evidence for its universality in breast cancer, independently of a specific clinical form.


2018 ◽  
Vol 21 (2) ◽  
pp. 74-83
Author(s):  
Tzu-Hung Hsiao ◽  
Yu-Chiao Chiu ◽  
Yu-Heng Chen ◽  
Yu-Ching Hsu ◽  
Hung-I Harry Chen ◽  
...  

Aim and Objective: The number of anticancer drugs available currently is limited, and some of them have low treatment response rates. Moreover, developing a new drug for cancer therapy is labor intensive and sometimes cost prohibitive. Therefore, “repositioning” of known cancer treatment compounds can speed up the development time and potentially increase the response rate of cancer therapy. This study proposes a systems biology method for identifying new compound candidates for cancer treatment in two separate procedures. Materials and Methods: First, a “gene set–compound” network was constructed by conducting gene set enrichment analysis on the expression profile of responses to a compound. Second, survival analyses were applied to gene expression profiles derived from four breast cancer patient cohorts to identify gene sets that are associated with cancer survival. A “cancer–functional gene set–compound” network was constructed, and candidate anticancer compounds were identified. Through the use of breast cancer as an example, 162 breast cancer survival-associated gene sets and 172 putative compounds were obtained. Results: We demonstrated how to utilize the clinical relevance of previous studies through gene sets and then connect it to candidate compounds by using gene expression data from the Connectivity Map. Specifically, we chose a gene set derived from a stem cell study to demonstrate its association with breast cancer prognosis and discussed six new compounds that can increase the expression of the gene set after the treatment. Conclusion: Our method can effectively identify compounds with a potential to be “repositioned” for cancer treatment according to their active mechanisms and their association with patients’ survival time.


2021 ◽  
Vol 13 (4) ◽  
pp. 641
Author(s):  
Gopal Ramdas Mahajan ◽  
Bappa Das ◽  
Dayesh Murgaokar ◽  
Ittai Herrmann ◽  
Katja Berger ◽  
...  

Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database within the 350–1050 nm wavelength range of the leaf samples and leaf nutrients were analyzed for the development of spectral indices and multivariate model development. The normalized difference and ratio spectral indices and multivariate models–partial least square regression (PLSR), principal component regression, and support vector regression (SVR) were ineffective in predicting any of the leaf nutrients. An approach of using PLSR-combined machine learning models was found to be the best to predict most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R2 ≥ 0.91, the ratio of performance to deviation (RPD) ≥ 3.3, and the ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc, SVR (R2 ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, boron, and elastic net (R2 ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested to be employed within operational retrieval workflows for precision management of mango orchard nutrients.
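The RPD and RPIQ figures reported in this abstract are commonly defined as the spread of the reference values divided by the prediction error. The sketch below uses those common definitions (sample standard deviation over RMSE, and interquartile range over RMSE). Whether the authors used exactly these variants, and which quartile convention they applied, is not stated in the abstract, so treat the details as assumptions.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


def rpiq(observed, predicted):
    """Ratio of performance to interquartile distance: (Q3 - Q1) / RMSE.

    Quartiles use the 'inclusive' method of statistics.quantiles; other
    quartile conventions give slightly different values.
    """
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)
```

Both ratios grow as the prediction error shrinks relative to the natural spread of the measured nutrient values, which is why larger values indicate a more useful model.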


2021 ◽  
Vol 28 ◽  
pp. 107327482098851
Author(s):  
Zeng-Hong Wu ◽  
Yun Tang ◽  
Yan Zhou

Background: Epigenetic changes are tightly linked to tumorigenesis development and malignant transformation. However, DNA methylation occurs earlier and is constant during tumorigenesis. It plays an important role in controlling gene expression in cancer cells. Methods: In this study, we determined the prognostic value of molecular subtypes based on DNA methylation status in breast cancer samples obtained from The Cancer Genome Atlas (TCGA) database. Results: Seven clusters and 204 corresponding promoter genes were identified based on consensus clustering using 166 CpG sites that significantly influenced survival outcomes. The overall survival (OS) analysis showed a significant prognostic difference among the 7 groups (p<0.05). Finally, a prognostic model was used to estimate the results of patients on the testing set based on the classification of DNA methylation subgroups in a training dataset. Conclusions: The model was found to be important in the identification of novel biomarkers and could be of help to patients with different breast cancer subtypes when predicting prognosis, clinical diagnosis and management.


2021 ◽  
Vol 11 (2) ◽  
pp. 618
Author(s):  
Tanvir Tazul Islam ◽  
Md Sajid Ahmed ◽  
Md Hassanuzzaman ◽  
Syed Athar Bin Amir ◽  
Tanzilur Rahman

Diabetes is a chronic illness that affects millions of people worldwide and requires regular monitoring of a patient’s blood glucose level. Currently, blood glucose is monitored by a minimally invasive process where a small droplet of blood is extracted and passed to a glucometer—however, this process is uncomfortable for the patient. In this paper, a smartphone video-based noninvasive technique is proposed for the quantitative estimation of glucose levels in the blood. The videos are collected steadily from the tip of the subject’s finger using smartphone cameras and subsequently converted into a Photoplethysmography (PPG) signal. A Gaussian filter is applied on top of the Asymmetric Least Square (ALS) method to remove high-frequency noise, optical noise, and motion interference from the raw PPG signal. These preprocessed signals are then used for extracting signal features such as systolic and diastolic peaks, the time differences between consecutive peaks (DelT), first derivative, and second derivative peaks. Finally, the features are fed into Principal Component Regression (PCR), Partial Least Square Regression (PLS), Support Vector Regression (SVR) and Random Forest Regression (RFR) models for the prediction of glucose level. Out of the four statistical learning techniques used, the PLS model, when applied to an unbiased dataset, has the lowest standard error of prediction (SEP) at 17.02 mg/dL.
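The standard error of prediction (SEP) used to compare the four models is often defined as the bias-corrected standard deviation of the prediction residuals. The sketch below follows that common chemometrics definition. The abstract does not give the exact formula the authors used, so the definition, the function name, and the sample readings are assumptions.

```python
import math


def standard_error_of_prediction(observed, predicted):
    """Bias-corrected standard deviation of the residuals (n - 1 divisor)."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    n = len(residuals)
    if n < 2:
        raise ValueError("at least two paired values are required")
    bias = sum(residuals) / n  # mean residual
    return math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))


if __name__ == "__main__":
    # Made-up reference and predicted glucose values in mg/dL.
    reference = [95.0, 110.0, 142.0, 180.0]
    estimated = [101.0, 104.0, 150.0, 171.0]
    print(standard_error_of_prediction(reference, estimated))
```

Under this definition a constant offset between predictions and reference values contributes to the bias but not to the SEP, so the two are usually reported together.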


Molecules ◽  
2021 ◽  
Vol 26 (6) ◽  
pp. 1546
Author(s):  
Ioanna Dagla ◽  
Anthony Tsarbopoulos ◽  
Evagelos Gikas

Colistimethate sodium (CMS) is widely administered for the treatment of life-threatening infections caused by multidrug-resistant Gram-negative bacteria. Until now, the quality control of CMS formulations has been based on microbiological assays. Herein, a methodology based on ultra-high-performance liquid chromatography coupled to an ultraviolet detector was developed for the quantitation of CMS in injectable formulations. Design of experiments was used to optimize the chromatographic parameters. The chromatographic separation was achieved using a Waters Acquity BEH C8 column employing gradient elution with a mobile phase consisting of (A) 0.001 M aq. ammonium formate and (B) methanol/acetonitrile 79/21 (v/v). CMS compounds were detected at 214 nm. In all, 23 univariate linear-regression models were constructed to measure CMS compounds separately, and one partial least-square regression (PLSr) model was constructed to assess the total CMS amount in formulations. The method was validated over the range 100–220 μg mL⁻¹. The developed methodology was employed to analyze several batches of CMS injectable formulations, which were also compared against a reference batch using Principal Component Analysis, similarity and distance measures, heatmaps and the structural similarity index. The methodology was based on freely available software in order to be readily available for the pharmaceutical industry.

