Missing Values Estimation in Microarray Data with Partial Least Squares Regression

Fitting Cox models in a big data context (data whose volume, intensity, and complexity exceed the capacity of usual analytic tools) is often challenging, and even more so when some data are missing. We previously proposed algorithms that fit Cox models in high-dimensional settings using extensions of partial least squares (PLS) regression to the Cox model; some of them can cope with missing data. We recently extended our most recent algorithms to big data, which makes it possible to fit Cox models to big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial log-likelihood, computed with either a naive scheme or the van Houwelingen scheme, which makes efficient use of the death times of the left-out data in relation to the death times of all the data. Quite surprisingly, we show, using an extensive simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, whether straightforward or more involved, of PLS regression to the Cox model. This result is of interest for at least two reasons. First, several attractive features of PLS-based models, including regularization, interpretability of the components, missing data support, data visualization through biplots of individuals and variables, and even parsimony or group parsimony for sparse PLS (SPLS) or sparse group SPLS based models, account for the common use of these extensions by statisticians, who usually select their hyperparameters by cross-validation. Second, these extensions are almost always included in benchmarking studies that assess the performance of a new estimation technique in a high-dimensional or big data context, and they often show poor statistical properties there. We carried out a large simulation study to evaluate more than a dozen potential cross-validation criteria, based on either AUC or prediction error. Several of them lead to the selection of a reasonable number of components. Using these newly found cross-validation criteria to fit extensions of PLS regression to the Cox model, we performed a benchmark reanalysis that showed improved performance of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on CRAN and, until then, is available on GitHub, https://github.com/fbertran/bigPLS.
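For readers unfamiliar with the two cross-validation schemes named in this abstract, the block below writes them out in their usual textbook form. The notation is an assumption introduced here and is not taken from the article: K folds, \hat\beta_{(-k)} the coefficients fitted without fold k, \ell the partial log-likelihood on all data, \ell_{(k)} on fold k alone, and \ell_{(-k)} on the data without fold k. For the PLS extensions, \beta would act on the extracted components rather than on the original covariates.

```latex
% Cox partial log-likelihood, with event indicators \delta_i
% and risk sets R(t_i) = \{ j : t_j \ge t_i \}
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \Bigl[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr]

% Naive scheme: evaluate each left-out fold on its own
\mathrm{CVPL}_{\text{naive}} = \sum_{k=1}^{K} \ell_{(k)}\bigl(\hat\beta_{(-k)}\bigr)

% van Houwelingen scheme: contribution of fold k measured as a difference,
% so the left-out death times are placed among the death times of all the data
\mathrm{CVPL}_{\text{vH}} = \sum_{k=1}^{K}
  \Bigl[ \ell\bigl(\hat\beta_{(-k)}\bigr) - \ell_{(-k)}\bigl(\hat\beta_{(-k)}\bigr) \Bigr]
```

In both schemes the tuning parameter (here, the number of components) would be chosen to maximize the criterion; the abstract reports that this choice behaves poorly for the PLS-based extensions.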


Use of Biplots and Partial Least Squares Regression in Microarray Data Analysis for Assessing Association between Genes Involved in Different Biological Pathways

Computational Intelligence Methods for Bioinformatics and Biostatistics - Lecture Notes in Computer Science ◽

10.1007/978-3-642-21946-7_10 ◽

2011 ◽

pp. 123-134 ◽

Cited By ~ 2

Author(s):

Niccoló Bassani ◽

Federico Ambrogi ◽

Danila Coradini ◽

Elia Biganzoli

Keyword(s):

Data Analysis ◽

Least Squares ◽

Partial Least Squares ◽

Microarray Data ◽

Partial Least Squares Regression ◽

Biological Pathways ◽

Microarray Data Analysis ◽

Least Squares Regression


Outlier detection and ambiguity detection for microarray data in probabilistic discriminant partial least squares regression

Journal of Chemometrics ◽

10.1002/cem.1304 ◽

2010 ◽

Vol 24 (7-8) ◽

pp. 434-443 ◽

Cited By ~ 7

Author(s):

C. Botella ◽

J. Ferré ◽

R. Boqué

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Outlier Detection ◽

Microarray Data ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Ambiguity Detection


Particles Counting in Intracellular Images by Partial Least Squares Regression and HLAC Feature between Multiple Features

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.135.236 ◽

2015 ◽

Vol 135 (2) ◽

pp. 236-243

Author(s):

Shohei Kumagai ◽

Kazuhiro Hotta

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Multiple Features


Use of reflectance spectroscopy to estimate the organic carbon and CaCO3 contents of soils

Agrokémia és Talajtan ◽

10.1556/agrokem.60.2012.2.5 ◽

2012 ◽

Vol 61 (2) ◽

pp. 277-290 ◽

Cited By ~ 1

Author(s):

Ádám Csorba ◽

Vince Láng ◽

László Fenyvesi ◽

Erika Michéli

Keyword(s):

Organic Carbon ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Mean Squared Error ◽

Reflectance Spectroscopy ◽

Least Squares Regression ◽

Root Mean Squared Error ◽

Squared Error

There is a growing demand today for the development and application of technologies and methods that enable rapid, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy meets these demands; it is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, aimed at determining the composition of soils. We built and tested predictive models based on multivariate mathematical-statistical methods (partial least squares regression, PLSR) that allow the organic carbon and CaCO3 contents of soils to be estimated. Testing of the models showed that the procedure gave high R² values for both soil parameters [R²(organic carbon) = 0.815; R²(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508], and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we conclude that combining reflectance spectroscopy with multivariate chemometric procedures yields a rapid and cost-effective method for data collection and evaluation.
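The two accuracy measures quoted in this abstract, R² and RMSE, can be computed from observed and predicted values as in the minimal sketch below. The function names and the toy numbers are illustrative assumptions and have nothing to do with the soil data of the study; R² is taken here as one minus the ratio of residual to total sum of squares, which may differ from the exact definition the authors used.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values (hypothetical, for illustration only)
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the units of the predicted quantity, so the two RMSE values in the abstract are not directly comparable with each other without knowing the range of each soil parameter.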


Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽

10.2478/aoa-2013-0055 ◽

2013 ◽

Vol 38 (4) ◽

pp. 465-470 ◽

Cited By ~ 11

Author(s):

Jingjie Yan ◽

Xiaolan Wang ◽

Weiyi Gu ◽

LiLi Ma

Keyword(s):

Dimensionality Reduction ◽

Emotion Recognition ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Speech Emotion Recognition ◽

Least Squares Regression ◽

Computer Science Pedagogy ◽

Reduction Methods ◽

Analysis Computer

Speech emotion recognition is regarded as a meaningful yet difficult problem in a number of domains, including sentiment analysis, computer science, and pedagogy. In this study, we investigate in depth speech emotion recognition based on the sparse partial least squares regression (SPLSR) approach. We use SPLSR to perform feature selection and dimensionality reduction on the full set of acquired speech emotion features. With SPLSR, the components of redundant and uninformative speech emotion features are shrunk to zero, while useful and informative features are retained and passed to the subsequent classification step. Tests on the Berlin database show that the recognition rate of the SPLSR method reaches up to 79.23% and is superior to that of the other dimensionality reduction methods compared.
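The abstract does not say how SPLSR drives uninformative feature weights to zero. A common mechanism in sparse PLS formulations is soft-thresholding of the weight vector under an L1 penalty, and the sketch below illustrates only that mechanism. The function names, the threshold value, and the example weights are all hypothetical and are not taken from the paper.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights with |w| <= lam become 0."""
    shrunk = []
    for w in weights:
        magnitude = abs(w) - lam
        if magnitude <= 0:
            shrunk.append(0.0)  # feature is dropped
        else:
            shrunk.append(magnitude if w > 0 else -magnitude)
    return shrunk


def selected_features(weights):
    """Indices of features whose weight survived the thresholding."""
    return [i for i, w in enumerate(weights) if w != 0.0]


# Hypothetical weight vector for four features
raw_weights = [0.9, -0.05, 0.3, -0.6]
sparse_weights = soft_threshold(raw_weights, lam=0.1)

print(sparse_weights)
print(selected_features(sparse_weights))
```

Only the features with non-zero weights would be passed on to the classifier; a larger threshold removes more features.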


ESTIMATION OF RIVER WATER QUALITY USING DIFFERENTIAL ULTRAVIOLET-VISIBLE SPECTRA BASED ON PARTIAL LEAST SQUARES REGRESSION

Journal of Japan Society of Civil Engineers Ser B1 (Hydraulic Engineering) ◽

10.2208/jscejhe.74.i_301 ◽

2018 ◽

Vol 74 (4) ◽

pp. I_301-I_306 ◽

Cited By ~ 1

Author(s):

Yanping LYU ◽

Tsuyoshi KINOUCHI

Keyword(s):

Water Quality ◽

Least Squares ◽

River Water ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

River Water Quality ◽

Least Squares Regression ◽

Visible Spectra


Aroma profiles of commercial Chinese traditional fermented fish (Suan yu) in Western Hunan: GC-MS, odor activity value and sensory evaluation by partial least squares regression

International Journal of Food Properties ◽

10.1080/10942912.2020.1716790 ◽

2020 ◽

Vol 23 (1) ◽

pp. 213-226 ◽

Cited By ~ 2

Author(s):

Pei Gao ◽

Qixing Jiang ◽

Yanshun Xu ◽

Fang Yang ◽

Peipei Yu ◽

...

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Sensory Evaluation ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Fermented Fish ◽

Western Hunan ◽

Odor Activity Value


Partial least squares regression with compositional response variables and covariates

Journal of Applied Statistics ◽

10.1080/02664763.2020.1795813 ◽

2020 ◽

pp. 1-20

Author(s):

Jiajia Chen ◽

Xiaoqin Zhang ◽

Karel Hron

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Least Squares Regression ◽

Response Variables
