A Comparison of Techniques for Modelling Data with Non-Linear Structure

2003, Vol 11 (1), pp. 55-70
Author(s): Laila Stordrange, Olav M. Kvalheim, Per A. Hassel, Dick Malthe-Sørenssen, Fred Olav Libnau

Partial least squares (PLS) is a powerful tool for multivariate linear regression. But what if the data show a non-linear structure? Near infrared spectra from a pharmaceutical process were used as a case study. An ANOVA test revealed that the data are well described by a second-order polynomial. This work investigates regression techniques that account for slightly non-linear data: linearising the data by applying transformations, local PLS (i.e. splitting the data into subsets) and quadratic PLS. These models were compared with ordinary PLS and principal component regression (PCR). The predictive ability of the models was tested on an independent data set acquired a year later. Using knowledge of the non-linear pattern and of the important spectral regions, simpler models with better predictive ability can be obtained.
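
The linearisation idea above can be sketched in a few lines: augmenting a linear least-squares model with a squared term lets a linear fitting framework absorb mild second-order curvature. The data below are a synthetic stand-in for a single spectral feature, not the paper's NIR spectra, and plain least squares stands in for PLS.

```python
import numpy as np

# Illustrative sketch only: a single synthetic variable x stands in
# for a spectral feature with a mildly non-linear response.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 + 1.5 * x + 4.0 * x**2 + rng.normal(0, 0.05, x.size)

# Ordinary linear fit: design matrix [1, x]
X_lin = np.column_stack([np.ones_like(x), x])
b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# "Quadratic" fit: augment the design matrix with x**2 and fit linearly,
# mirroring the idea of handling slight non-linearity inside a linear framework
X_quad = np.column_stack([np.ones_like(x), x, x**2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

rmse = lambda X, b: np.sqrt(np.mean((y - X @ b) ** 2))
print(rmse(X_lin, b_lin), rmse(X_quad, b_quad))  # quadratic fit is markedly better
```

The same augmentation trick carries over to multivariate data by adding squared (and cross-product) columns before the latent-variable regression.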

2005, Vol 13 (5), pp. 241-254
Author(s): Ralf Marbach

A new method for multivariate calibration is described that combines the best features of “classical” (also called “physical” or “K-matrix”) calibration and “inverse” (or “statistical” or “P-matrix”) calibration. By estimating the spectral signal in the physical way and the spectral noise in the statistical way, so to speak, the prediction accuracy of the inverse model can be combined with the low cost and ease of interpretability of the classical model, including “built-in” proof of specificity of response. The cost of calibration is significantly reduced compared to today's standard practice of statistical calibration using partial least squares or principal component regression, because the need for lab-reference values is virtually eliminated. The method is demonstrated on a data set of near-infrared spectra from pharmaceutical tablets, which is available on the web (so-called Chambersburg Shoot-out 2002 data set). Another benefit is that the correct definitions of the “limits of multivariate detection” become obvious. The sensitivity of multivariate measurements is shown to be limited by the so-called “spectral noise,” and the specificity is shown to be limited by potentially existing “unspecific correlations.” Both limits are testable from first principles, i.e. from measurable pieces of data and without the need to perform any calibration.
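
The two calibration families contrasted above can be sketched on toy data: the classical (K-matrix) model estimates the spectral signal from known concentrations (A = CK), while the inverse (P-matrix) model estimates a regression vector directly from spectra (c = Ab). All data, dimensions and names below are invented for illustration; this is not Marbach's combined estimator.

```python
import numpy as np

# Toy comparison of classical vs. inverse calibration on simulated spectra.
rng = np.random.default_rng(1)
n_samples, n_channels = 40, 120
K_true = rng.normal(size=(1, n_channels))          # pure-component "spectrum"
C = rng.uniform(0.5, 2.0, size=(n_samples, 1))     # reference concentrations
A = C @ K_true + rng.normal(0, 0.01, size=(n_samples, n_channels))

# Classical (K-matrix): estimate the spectral signal physically, A = C K
K_hat, *_ = np.linalg.lstsq(C, A, rcond=None)

# Inverse (P-matrix): estimate a regression vector statistically, c = A b
b_hat, *_ = np.linalg.lstsq(A, C, rcond=None)

# Predict an unseen spectrum with both models
c_new = 1.3
a_new = c_new * K_true[0] + rng.normal(0, 0.01, n_channels)
c_classical = np.linalg.lstsq(K_hat.T, a_new, rcond=None)[0].item()
c_inverse = (a_new @ b_hat).item()
print(c_classical, c_inverse)  # both close to the true value 1.3
```

Note the asymmetry: the classical route yields an interpretable estimated spectrum K_hat, while the inverse route yields only a regression vector.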


1993, Vol 1 (2), pp. 85-97
Author(s): Tomas Isaksson, Ziyi Wang, Bruce Kowalski

A recently presented calibration method, called optimised scaling (OS-2), was tested and compared to multiplicative scatter correction (MSC) and principal component regression (PCR). The predictive ability of these regression methods was tested on eight data sets consisting of diffuse near infrared (NIR) reflectance and transmittance continuous spectra of meat, sausages, soya beans and designed sample sets. Calibration was performed for constituents such as fat, protein, water, carbohydrate, temperature, lactate and glucose. A total of 21 calibration models were validated and compared. OS-2 gave good or promising prediction results for major constituents with large variation, such as fat in two of the studied meat sample sets. For minor constituents, OS-2 gave poorer prediction results than MSC or first derivatives of the data combined with PCR.
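
MSC, one of the reference methods above, is simple enough to sketch: each spectrum is regressed on the mean spectrum and then corrected for the fitted offset and slope. The spectra below are synthetic; OS-2 itself is not reproduced here.

```python
import numpy as np

# Minimal multiplicative scatter correction (MSC) sketch: regress each
# spectrum on the mean spectrum (x_i ~ a_i + b_i * m), then correct it
# as (x_i - a_i) / b_i to remove additive and multiplicative scatter.
def msc(spectra):
    m = spectra.mean(axis=0)            # reference (mean) spectrum
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(m, x, 1)      # slope and offset vs. reference
        corrected[i] = (x - a) / b
    return corrected

# Synthetic demo: one underlying spectrum distorted by offsets and scalings
base = np.sin(np.linspace(0, 3, 100)) + 2.0
raw = np.array([a + b * base for a, b in [(0.1, 1.2), (-0.2, 0.8), (0.05, 1.0)]])
clean = msc(raw)
print(np.std(clean, axis=0).max())  # near zero: scatter effects removed
```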


2020, Vol 16 (8), pp. 1088-1105
Author(s): Nafiseh Vahedi, Majid Mohammadhosseini, Mehdi Nekoei

Background: The poly(ADP-ribose) polymerases (PARPs) are a nuclear enzyme superfamily present in eukaryotes.
Methods: In the present report, several efficient linear and non-linear methods, including multiple linear regression (MLR), support vector machines (SVM) and artificial neural networks (ANN), were successfully used to develop and establish quantitative structure-activity relationship (QSAR) models capable of predicting pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP inhibitors. Principal component analysis (PCA) was used for a rational division of the whole data set into training and test sets. A genetic algorithm (GA) variable selection method was employed to select, from the large pool of calculated descriptors, the optimal subset of descriptors with the most significant contributions to the overall inhibitory activity.
Results: The accuracy and predictability of the proposed models were further confirmed using cross-validation, validation through an external test set and Y-randomization (chance correlation) approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the proposed models. The results revealed that the non-linear modelling approaches, SVM and ANN, provided much better predictive capability.
Conclusion: Among the constructed models, in terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q²LOO and Q²LGO), and the R² and F-statistic values for the training set, the predictive power of the GA-SVM approach was better. However, compared with MLR and SVM, the statistical parameters for the test set were better with the GA-ANN model.
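
The Y-randomization check mentioned in the results can be sketched compactly: refitting the model on randomly shuffled activities should destroy the fit, confirming the original model is not a chance correlation. The descriptors and response below are synthetic, and a plain OLS fit stands in for the GA-selected QSAR models.

```python
import numpy as np

# Y-randomization sketch: a genuine structure-activity relationship should
# collapse to R-squared near chance level once the activities are shuffled.
rng = np.random.default_rng(3)
n, p = 60, 5
X = rng.normal(size=(n, p))                 # synthetic molecular descriptors
w = rng.normal(size=p)
y = X @ w + rng.normal(0, 0.1, n)           # pEC50-like response

def r2_ols(X, y):
    Xc = np.column_stack([np.ones(len(y)), X])   # add intercept column
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ b
    return 1 - resid.var() / y.var()

r2_true = r2_ols(X, y)
r2_rand = np.mean([r2_ols(X, rng.permutation(y)) for _ in range(50)])
print(r2_true, r2_rand)  # high for the real response, near chance when shuffled
```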


Author(s): Qiang Zhao, Jianguo Sun

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to survival outcomes of patients, with the purpose of building regression models for predicting future patients' survival from their gene expression data. For this, several authors have discussed the use of the proportional hazards or Cox model after reducing the dimension of the gene expression data. This paper presents a new approach to the Cox survival analysis of microarray gene expression data, with a focus on the models' predictive ability. The method modifies correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. The results, based on simulated data and a set of publicly available data on diffuse large B-cell lymphoma, show that the proposed method works well in terms of robustness and predictive ability in comparison with some existing partial least squares approaches. The new approach is also simpler and easier to implement.
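
The dimension-reduction idea behind this line of work can be sketched on synthetic data: project the gene-expression matrix onto its leading principal component and fit a Cox model on the resulting score. The toy version below uses a grid-search fit of a one-covariate partial likelihood with no censoring; it is a stand-in for, not an implementation of, the modified correlation PCR method.

```python
import numpy as np

# Toy PCA + Cox sketch: one latent risk score drives both expression and survival.
rng = np.random.default_rng(4)
n, genes = 80, 200
risk_dir = rng.normal(size=genes)
scores_true = rng.normal(size=n)
X = np.outer(scores_true, risk_dir) + rng.normal(0, 0.5, (n, genes))

# PCA via SVD: first principal-component score per patient
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[0]

# Simulated survival times depend on the latent score; no censoring for brevity
t = rng.exponential(1.0 / np.exp(0.8 * scores_true))
order = np.argsort(t)
z_ord = z[order]

def neg_log_partial_lik(beta):
    # risk set for the i-th ordered event is everyone still alive: i, i+1, ...
    eta = beta * z_ord
    log_risk = np.log(np.cumsum(np.exp(eta)[::-1])[::-1])
    return -np.sum(eta - log_risk)

betas = np.linspace(-3, 3, 301)
beta_hat = betas[np.argmin([neg_log_partial_lik(b) for b in betas])]
print(beta_hat)  # fitted coefficient on the first-PC score
```

The real method additionally weights components by their correlation with the outcome and handles censored observations, which this sketch omits.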


1992, Vol 46 (11), pp. 1685-1694
Author(s): Tomas Isaksson, Charles E. Miller, Tormod Næs

In this work, the abilities of near-infrared diffuse reflectance (NIR) and transmittance (NIT) spectroscopy to noninvasively determine the protein, fat and water contents of plastic-wrapped homogenized meat are evaluated. One hundred homogenized beef samples, ranging from 1 to 23% fat and wrapped in polyamide/polyethylene laminates, were used. Results of multivariate calibration and prediction for protein, fat and water contents are presented. The optimal test-set prediction errors (root mean square error of prediction, RMSEP), obtained using the principal component regression method with NIR data, were 0.45, 0.29 and 0.50 wt % for protein, fat and water, respectively, for plastic-wrapped meat (compared to 0.40, 0.28 and 0.45 wt % for unwrapped meat). The optimal prediction errors for the NIT method were 0.31, 0.52 and 0.42 wt % for protein, fat and water, respectively, for plastic-wrapped meat samples (compared to 0.27, 0.38 and 0.37 wt % for unwrapped meat). We conclude that the addition of the laminate only slightly reduced the abilities of the NIR and NIT methods to predict protein, fat and water contents in homogenized meat.
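
The RMSEP criterion quoted above is simply the square root of the mean squared difference between predicted and reference values on the test set; a minimal sketch with made-up fat-content values:

```python
import math

# Root mean square error of prediction (RMSEP) on a test set.
def rmsep(predicted, reference):
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

pred = [20.1, 19.7, 21.0, 18.9]   # predicted fat content, wt % (illustrative)
ref  = [20.0, 20.0, 20.5, 19.0]   # reference (lab) values, wt %
print(round(rmsep(pred, ref), 3))  # prints 0.3
```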


Author(s): Abhishek Taneja

The enormous production of databases in almost every area of human endeavour, particularly through the web, has created a great demand for new, powerful tools for turning data into useful, task-oriented knowledge. The aim of this study is to examine the predictive ability of factor analysis, a web mining technique, so as to avoid voting, averaging, stacked generalization and meta-learning, and thus save much of the time spent choosing the right technique for the underlying dataset. This chapter compares three factor-based techniques, viz. principal component regression (PCR), generalized least squares (GLS) regression and maximum likelihood regression (MLR), and explores their predictive ability on a theoretical as well as an experimental basis. All three factor-based techniques have been compared using the necessary conditions for forecasting, such as R-square, adjusted R-square, the F-test and the JB (Jarque-Bera) test of normality. This study can be further explored and enhanced using sufficient conditions for forecasting, such as Theil's inequality coefficient (TIC) and the Janus quotient (JQ).
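
The forecasting checks listed above (R-square, adjusted R-square, F-test, Jarque-Bera) can be sketched for a simple one-predictor OLS fit; the data below are illustrative, not from the chapter:

```python
import math

# Forecasting diagnostics for a one-variable OLS fit on toy data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.2, 15.9]
n, k = len(x), 1                              # observations, predictors

xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

ss_res = sum(e ** 2 for e in resid)
ss_tot = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# Jarque-Bera normality test on the residuals: JB = n/6 * (S^2 + (K-3)^2 / 4)
# (residual mean is zero by construction, since an intercept is included)
m2 = sum(e ** 2 for e in resid) / n
skew = (sum(e ** 3 for e in resid) / n) / m2 ** 1.5
kurt = (sum(e ** 4 for e in resid) / n) / m2 ** 2
jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
print(r2, adj_r2, f_stat, jb)
```

A small JB value (roughly below the chi-squared critical value with 2 degrees of freedom) is consistent with normally distributed residuals, one of the stated necessary conditions.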

