Regression-Based Methods for Finding Coupled Patterns

2008 ◽  
Vol 21 (17) ◽  
pp. 4384-4398 ◽  
Author(s):  
Michael K. Tippett ◽  
Timothy DelSole ◽  
Simon J. Mason ◽  
Anthony G. Barnston

Abstract There are a variety of multivariate statistical methods for analyzing the relations between two datasets. Two commonly used methods are canonical correlation analysis (CCA) and maximum covariance analysis (MCA), which find the projections of the data onto coupled patterns with maximum correlation and covariance, respectively. These projections are often used in linear prediction models. Redundancy analysis and principal predictor analysis construct projections that maximize the explained variance and the sum of squared correlations of regression models. This paper shows that the above pattern methods are equivalent to different diagonalizations of the regression between the two datasets. The different diagonalizations are computed using the singular value decomposition of the regression matrix developed using data that are suitably transformed for each method. This common framework for the pattern methods permits easy comparison of their properties. Principal component regression is shown to be a special case of CCA-based regression. A commonly used linear prediction model constructed from MCA patterns does not give a least squares estimate since correlations among MCA predictors are neglected. A variation, denoted least squares estimate (LSE)-MCA, is suggested that uses the same patterns but minimizes squared error. Since the different pattern methods correspond to diagonalizations of the same regression matrix, they all produce the same regression model when a complete set of patterns is used. Different prediction models are obtained when an incomplete set of patterns is used, with each method optimizing different properties of the regression. Some key points are illustrated in two idealized examples, and the methods are applied to statistical downscaling of rainfall over the northeast of Brazil.
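
The abstract's central claim, that CCA and MCA differ only in how the data are transformed before a singular value decomposition, can be illustrated with a minimal NumPy sketch. It uses the standard textbook formulation (SVD of the cross-covariance for MCA, SVD of the whitened cross-covariance for CCA), which is not necessarily the paper's own notation. The synthetic data, the helper `inv_sqrt`, and all variable names are assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse symmetric square root of a positive-definite matrix (hypothetical helper)."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(w ** -0.5) @ V.T

# Synthetic predictor X and predictand Y (illustrative data only).
rng = np.random.default_rng(0)
n, p, q = 500, 4, 3
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + 0.5 * rng.standard_normal((n, q))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Cxx, Cyy, Cxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n

# MCA: SVD of the cross-covariance; singular values are covariances.
U_m, s_m, Vt_m = np.linalg.svd(Cxy, full_matrices=False)
cov_mca = (X @ U_m[:, 0]) @ (Y @ Vt_m[0]) / n

# CCA: SVD of the whitened cross-covariance; singular values are
# the canonical correlations.
Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
U_c, s_c, Vt_c = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
corr_cca = np.corrcoef(X @ Wx @ U_c[:, 0], Y @ Wy @ Vt_c[0])[0, 1]

# Ordinary least squares regression matrix, B = Cxx^{-1} Cxy.
B_ls = np.linalg.solve(Cxx, Cxy)

# Regression rebuilt from ALL CCA modes reproduces least squares ...
Wy_inv = np.linalg.inv(Wy)
B_cca = Wx @ U_c @ np.diag(s_c) @ Vt_c @ Wy_inv

# ... whereas a truncated (here rank-1) set of modes gives a different model.
B_rank1 = Wx @ U_c[:, :1] @ np.diag(s_c[:1]) @ Vt_c[:1] @ Wy_inv
```

With the full set of modes, `B_cca` matches `B_ls` to numerical precision, which mirrors the statement that all pattern methods yield the same regression when a complete set of patterns is used. Truncating the modes is where the methods begin to differ.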

2007 ◽  
Vol 90 (2) ◽  
pp. 391-404 ◽  
Author(s):  
Fadia H Metwally ◽  
Yasser S El-Saharty ◽  
Mohamed Refaat ◽  
Sonia Z El-Khateeb

Abstract New selective, precise, and accurate methods are described for the determination of a ternary mixture containing drotaverine hydrochloride (I), caffeine (II), and paracetamol (III). The first method uses the first (D1) and third (D3) derivative spectrophotometry at 331 and 315 nm for the determination of (I) and (III), respectively, without interference from (II). The second method depends on the simultaneous use of the first derivative of the ratio spectra (DD1) with measurement at 312.4 nm for determination of (I) using the spectrum of 40 μg/mL (III) as a divisor or measurement at 286.4 and 304 nm after using the spectrum of 4 μg/mL (I) as a divisor for the determination of (II) and (III), respectively. In the third method, the predictive abilities of the classical least-squares, principal component regression, and partial least-squares were examined for the simultaneous determination of the ternary mixture. The last method depends on thin-layer chromatography-densitometry after separation of the mixture on silica gel plates using ethyl acetate-chloroform-methanol (16 + 3 + 1, v/v/v) as the mobile phase. The spots were scanned at 281, 272, and 248 nm for the determination of (I), (II), and (III), respectively. Regression analysis showed good correlation in the selected ranges with excellent percentage recoveries. The chemical variables affecting the analytical performance of the methodology were studied and optimized. The methods showed no significant interferences from excipients. Intraday and interday assay precision and accuracy values were within regulatory limits. The suggested procedures were checked using laboratory-prepared mixtures and were successfully applied for the analysis of their pharmaceutical preparations. The validity of the proposed methods was further assessed by applying a standard addition technique. The results obtained by applying the proposed methods were statistically analyzed and compared with those obtained by the manufacturer's method.


2021 ◽  
Vol 11 (13) ◽  
pp. 5895
Author(s):  
Kristina Serec ◽  
Sanja Dolanski Babić

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of a rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in the series that addresses conformational transitions in DNA thin films utilizing FTIR spectroscopy, we exploit popular chemometric methods: the principal component analysis (PCA), support vector machine (SVM) learning algorithm, and principal component regression (PCR), in order to quantify and categorize DNA conformation in thin films of different hydrated states. By complementing FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films based on the vibrational signatures in the 1800–935 cm−1 range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and how to avoid discrepancies by careful sampling.


Processes ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 166
Author(s):  
Majed Aljunaid ◽  
Yang Tao ◽  
Hongbo Shi

Partial least squares (PLS) and linear regression methods are widely utilized for quality-related fault detection in industrial processes. Standard PLS decomposes the process variables into principal and residual parts. However, the principal part still contains many components unrelated to quality, and leaving them in can cause many false alarms. Moreover, although these components do not affect product quality, they have a great impact on process safety and carry information about other faults, so discarding them reduces the detection rate of quality-unrelated faults. To overcome the drawbacks of standard PLS, a novel method, MI-PLS (mutual information PLS), is proposed in this paper. The proposed MI-PLS algorithm utilizes mutual information to divide the process variables into selected and residual components, and then uses singular value decomposition (SVD) to further decompose the selected part into quality-related and quality-unrelated components, subsequently constructing quality-related monitoring statistics. To ensure that there is no information loss and that the proposed MI-PLS can be used in quality-related and quality-unrelated fault detection, a principal component analysis (PCA) model is fitted to the residual component to obtain its score matrix, which is combined with the quality-unrelated part to obtain the total quality-unrelated monitoring statistics. Finally, the proposed method is applied to a numerical example and the Tennessee Eastman process. The proposed MI-PLS has a lower computational load and more robust performance compared with T-PLS and PCR.


Author(s):  
Hervé Cardot ◽  
Pascal Sarda

This article presents a selected bibliography on functional linear regression (FLR) and highlights the key contributions from both applied and theoretical points of view. It first defines FLR in the case of a scalar response and shows how the model can also be extended to the case of a functional response. It then considers two kinds of estimation procedures for the slope parameter: projection-based estimators in which regularization is performed through dimension reduction, such as functional principal component regression, and penalized least squares estimators, which solve a penalized least squares minimization problem. The article proceeds by discussing the main asymptotic properties, separating results on mean square prediction error from results on L2 estimation error. It also describes some related models, including generalized functional linear models and FLR on quantiles, and concludes with a complementary bibliography and some open problems.


2015 ◽  
Vol 28 (3) ◽  
pp. 1016-1030 ◽  
Author(s):  
Erik Swenson

Abstract Various multivariate statistical methods exist for analyzing covariance and isolating linear relationships between datasets. The most popular linear methods are based on singular value decomposition (SVD) and include canonical correlation analysis (CCA), maximum covariance analysis (MCA), and redundancy analysis (RDA). In this study, continuum power CCA (CPCCA) is introduced as one extension of continuum power regression for isolating pairs of coupled patterns whose temporal variation maximizes the squared covariance between partially whitened variables. Similar to the whitening transformation, the partial whitening transformation acts to decorrelate individual variables but only to a partial degree with the added benefit of preconditioning sample covariance matrices prior to inversion, providing a more accurate estimate of the population covariance. CPCCA is a unified approach in the sense that the full range of solutions bridges CCA, MCA, RDA, and principal component regression (PCR). Recommended CPCCA solutions include a regularization for CCA, a variance bias correction for MCA, and a regularization for RDA. Applied to synthetic data samples, such solutions yield relatively higher skill in isolating known coupled modes embedded in noise. Provided with some crude prior expectation of the signal-to-noise ratio, the use of asymmetric CPCCA solutions may be justifiable and beneficial. An objective parameter choice is offered for regularization with CPCCA based on the covariance estimate of O. Ledoit and M. Wolf, and the results are quite robust. CPCCA is encouraged for a range of applications.


2016 ◽  
Vol 99 (5) ◽  
pp. 1247-1251 ◽  
Author(s):  
Hamed M Elfatatry ◽  
Mokhtar M Mabrouk ◽  
Sherin F Hammad ◽  
Fotouh R Mansour ◽  
Amira H Kamal ◽  
...  

Abstract The present work describes new spectrophotometric methods for the simultaneous determination of phenylephrine hydrochloride and ketorolac tromethamine in their synthetic mixtures. The applied chemometric techniques are multivariate methods including classical least squares, principal component regression, and partial least squares. In these techniques, the concentration data matrix was prepared by using the synthetic mixtures containing these drugs dissolved in distilled water. The absorbance data matrix corresponding to the concentration data was obtained by measuring the absorbances at 16 wavelengths in the range 244–274 nm at 2 nm intervals in the zero-order spectra. The spectrophotometric procedures do not require any separation steps. The accuracy, precision, and linearity ranges of the methods were determined, and the methods were validated by analyzing synthetic mixtures containing the studied drugs. The developed methods were successfully applied to the synthetic mixtures and the results were compared to those obtained by a reported HPLC method.
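
As a rough illustration of how principal component regression relates an absorbance matrix to a concentration matrix, the following sketch fits PCR on simulated two-component mixtures measured at the 16 wavelengths mentioned above. The Gaussian band shapes, concentration ranges, noise level, and the function names `pcr_fit` and `pcr_predict` are all assumptions for illustration. None of them comes from the study, and the simulated spectra are not those of the actual drugs.

```python
import numpy as np

rng = np.random.default_rng(1)

# 16 wavelengths from 244 to 274 nm at 2 nm intervals.
wavelengths = np.arange(244, 276, 2)

def band(center, width):
    """Hypothetical Gaussian absorption band on the wavelength grid."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Assumed pure-component spectra (2 x 16); not real drug spectra.
K = np.vstack([band(250.0, 10.0), band(268.0, 12.0)])

def simulate(n_samples):
    """Simulated concentrations and Beer-Lambert absorbances with small noise."""
    C = rng.uniform(1.0, 10.0, size=(n_samples, 2))
    A = C @ K + 0.001 * rng.standard_normal((n_samples, wavelengths.size))
    return A, C

def pcr_fit(A, C, k):
    """Regress centered concentrations on the first k principal component scores."""
    a_mean, c_mean = A.mean(axis=0), C.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - a_mean, full_matrices=False)
    P = Vt[:k].T                                   # loadings, shape (16, k)
    T = (A - a_mean) @ P                           # scores, shape (n, k)
    Q, *_ = np.linalg.lstsq(T, C - c_mean, rcond=None)
    return a_mean, c_mean, P @ Q                   # coefficients, shape (16, 2)

def pcr_predict(model, A):
    """Predict concentrations for new absorbance spectra."""
    a_mean, c_mean, B = model
    return (A - a_mean) @ B + c_mean

A_train, C_train = simulate(25)
A_test, C_test = simulate(5)

model = pcr_fit(A_train, C_train, k=2)
C_pred = pcr_predict(model, A_test)
max_err = np.abs(C_pred - C_test).max()
```

Two components are retained here because the simulated mixture has two absorbing species. In practice the number of components is usually chosen by cross-validation.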


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
Edwin García-Miguel ◽  
Ofelia Gabriela Meza-Márquez ◽  
Guillermo Osorio-Revilla ◽  
Darío Iker Téllez-Medina ◽  
Cristian Jiménez-Martínez ◽  
...  

Chemometric methods using mid-FTIR spectroscopy were developed in order to reduce the time of study of melamine and cyanuric acid in infant formulas. Chemometric models were constructed using the algorithms Partial Least Squares (PLS1, PLS2) and Principal Component Regression (PCR) in order to correlate the IR signal with the levels of melamine or cyanuric acid in the infant formula samples. Results showed that the best correlations were obtained using PLS1 (R2: 0.9998, SEC: 0.0793, and SEP: 0.5545 for melamine and R2: 0.9997, SEC: 0.1074, and SEP: 0.5021 for cyanuric acid). Also, the SIMCA model was studied to distinguish between adulterated formulas and nonadulterated samples, giving optimum discrimination and good interclass distances between samples. Results showed that chemometric models demonstrated a good predictive ability of melamine and cyanuric acid concentrations in infant formulas, showing that this is a rapid and accurate technique to be used in the identification and quantification of these adulterants in infant formulas.

