Logistic Biplot by Conjugate Gradient Algorithms and Iterated SVD

Mathematics ◽

10.3390/math9162015 ◽

2021 ◽

Vol 9 (16) ◽

pp. 2015

Author(s):

Jose Giovany Babativa-Márquez ◽

José Luis Vicente-Villardón

Keyword(s):

Conjugate Gradient ◽

Binary Data ◽

Cross Validation ◽

Principal Component ◽

Monte Carlo Study ◽

Data Matrix ◽

Binary Matrix ◽

Data Set ◽

Conjugate Gradient Algorithms ◽

The Matrix

Multivariate binary data are increasingly frequent in practice. Although some adaptations of principal component analysis are used to reduce dimensionality for this kind of data, none of them provide a simultaneous representation of rows and columns (biplot). Recently, a technique named logistic biplot (LB) has been developed to represent the rows and columns of a binary data matrix simultaneously, even though the algorithm used to fit the parameters is too computationally demanding to be useful in the presence of sparsity or when the matrix is large. We propose the fitting of an LB model using nonlinear conjugate gradient (CG) or majorization–minimization (MM) algorithms, and a cross-validation procedure is introduced to select the hyperparameter that represents the number of dimensions in the model. A Monte Carlo study that considers scenarios with several sparsity levels and different dimensions of the binary data set shows that the procedure based on cross-validation is successful in the selection of the model for all algorithms studied. The comparison of the running times shows that the CG algorithm is more efficient in the presence of sparsity and when the matrix is not very large, while the performance of the MM algorithm is better when the binary matrix is balanced or large. As a complement to the proposed methods and to give practical support, a package has been written in the R language called BiplotML. To complete the study, real binary data on gene expression methylation are used to illustrate the proposed methods.
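
The MM route mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation and not the BiplotML API. It assumes the usual logit model logit(p_ij) = mu_j + a_i' b_j, majorizes the Bernoulli negative log-likelihood with a quadratic using the curvature bound 1/4, and solves each majorized step with a truncated SVD. The function names, the zero initialization, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def sigmoid(theta):
    # Logistic function written with tanh to avoid overflow in exp().
    return 0.5 * (1.0 + np.tanh(0.5 * theta))

def neg_log_lik(X, theta):
    # Bernoulli negative log-likelihood under a logit link.
    return float(np.sum(np.logaddexp(0.0, theta) - X * theta))

def fit_logistic_biplot_mm(X, k=2, n_iter=30):
    """Hypothetical helper: MM fit of theta = 1 mu' + A B' by iterated SVD."""
    n, p = X.shape
    mu, A, B = np.zeros(p), np.zeros((n, k)), np.zeros((p, k))
    losses = [neg_log_lik(X, mu + A @ B.T)]
    for _ in range(n_iter):
        theta = mu + A @ B.T
        # Working matrix from the quadratic majorizer (curvature <= 1/4).
        Z = theta + 4.0 * (X - sigmoid(theta))
        # Least-squares fit of Z: column offsets, then best rank-k part.
        mu = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - mu, full_matrices=False)
        A = U[:, :k] * s[:k]   # row markers (individuals)
        B = Vt[:k].T           # column markers (variables)
        losses.append(neg_log_lik(X, mu + A @ B.T))
    return mu, A, B, losses

# Synthetic sparse binary matrix, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = (rng.random((60, 12)) < 0.3).astype(float)
mu, A, B, losses = fit_logistic_biplot_mm(X, k=2, n_iter=30)
print(round(losses[0], 2), round(losses[-1], 2))
```

Because every step minimizes a majorizer of the loss, the recorded negative log-likelihood should never increase, which gives a simple sanity check on the iteration.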

Association Plots: Visualizing associations in high-dimensional correspondence analysis biplots

10.1101/2020.10.23.352096 ◽

2020 ◽

Author(s):

Elzbieta Gralinska ◽

Martin Vingron

Keyword(s):

Correspondence Analysis ◽

Principal Component ◽

Projection Methods ◽

Three Dimensions ◽

Data Matrix ◽

Small Data ◽

Complex Data ◽

Science Data ◽

Data Set ◽

The Matrix

Summary: In molecular biology, as in many other fields of science, data often come in the form of matrices or contingency tables with many measurements (rows) for a set of variables (columns). While projection methods such as Principal Component Analysis or Correspondence Analysis can give an overview of such data, when the matrix is very large the loss of information upon projection into two or three dimensions may be dramatic. However, when the set of variables can be grouped into clusters, this opens up a new angle on the data. We focus on the question of which measurements are associated with a cluster and distinguish it from other clusters. Correspondence Analysis employs a geometry geared towards answering this question. We exploit this feature to introduce Association Plots for visualizing cluster-specific measurements in complex data. Association Plots are two-dimensional, independent of the size of the data matrix or cluster, and depict the measurements associated with a cluster of variables. We demonstrate our method first on a small data set and then on a genomic example comprising more than 10,000 conditions. We show that Association Plots can clearly highlight those measurements that characterize a cluster of variables.

A New Neural Network Approach For Face Recognition Based On Conjugate Gradient Algorithms And Principal Component Analysis

Journal of Mathematics and Computer Science ◽

10.22436/jmcs.06.01.09 ◽

2013 ◽

Vol 06 (01) ◽

pp. 166-175 ◽

Cited By ~ 1

Author(s):

Hamed Azami ◽

Milad Malekzadeh ◽

Saeid Sanei

Keyword(s):

Neural Network ◽

Principal Component Analysis ◽

Face Recognition ◽

Conjugate Gradient ◽

Principal Component ◽

Component Analysis ◽

Network Approach ◽

Neural Network Approach ◽

Conjugate Gradient Algorithms ◽

Gradient Algorithms

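Deteksi Penyakit Kanker Payudara dengan Seleksi Fitur berbasis Principal Component Analysis dan Random Forest [Breast Cancer Detection with Feature Selection Based on Principal Component Analysis and Random Forest]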

Jurnal Infortech ◽

10.31294/infortech.v2i1.8079 ◽

2020 ◽

Vol 2 (1) ◽

pp. 96-101

Author(s):

Ahmad Fauzi ◽

Riki Supriyadi ◽

Nurlaelatul Maulidah

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Random Forest ◽

Cross Validation ◽

Principal Component ◽

Component Analysis ◽

Data Set ◽

Fold Cross Validation

Abstract - Screening is an early-detection effort to identify diseases or abnormalities that are not yet clinically apparent, using specific tests, examinations, or procedures. It can be used to quickly distinguish people who appear healthy but in fact have a disorder. The main aim of this study is to improve classification performance in breast cancer diagnosis by applying feature selection to several classification algorithms. The study uses the Breast Cancer Coimbra Data Set. A feature selection method based on principal component analysis is paired with several classification algorithms and methods, such as LogitBoost, Bagging, and Random Forest. The study uses 10-fold cross-validation as the evaluation method. The results show that the feature selection method based on principal component analysis improved classification performance significantly after being paired with Random Forest and LogitBoost feature selection; Random Forest showed the best performance, with an accuracy of 79.3103% and an AUC of 0.843. Keywords: Feature Selection, PCA, Breast Cancer, Screening, Random Forest

Identification of Rainfall Patterns on Hydrological Simulation Using Robust Principal Component Analysis

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v11.i3.pp1162-1167 ◽

2018 ◽

Vol 11 (3) ◽

pp. 1162 ◽

Cited By ~ 1

Author(s):

S.M. Shaharudin ◽

N. Ahmad ◽

N.H. Zainuddin ◽

N.S. Mohamed

Keyword(s):

Principal Component Analysis ◽

Simulated Data ◽

Principal Component ◽

Breakdown Point ◽

Component Analysis ◽

Data Matrix ◽

Robust PCA ◽

Data Set ◽

Number Of Components ◽

Rainfall Patterns

A robust dimension reduction method in Principal Component Analysis (PCA) was used to rectify the issue of unbalanced clusters in rainfall patterns caused by the skewed nature of rainfall data. A robust measure in PCA that uses Tukey’s biweight correlation to downweight observations is introduced, and the optimum breakdown point for extracting the number of components in PCA with this approach is proposed. A simulated data matrix that mimicked the real data set was used to determine an appropriate breakdown point for robust PCA and to compare the performance of both approaches. The simulated data indicated that a breakdown point of 70% cumulative percentage of variance gave a good balance in extracting the number of components. The results showed a more significant and substantial improvement with the robust PCA than with the PCA based on Pearson correlation in terms of the average number of clusters obtained and cluster quality.
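
The 70% cumulative-variance cutoff described above can be illustrated with a short sketch. It is only an assumption-laden example: it uses the ordinary Pearson correlation matrix rather than Tukey's biweight correlation, and the function name `n_components_for_variance` and the synthetic data are made up for the illustration.

```python
import numpy as np

def n_components_for_variance(data, threshold=0.70):
    """Hypothetical helper: smallest number of components whose
    cumulative share of variance reaches the threshold."""
    # Pearson correlation stands in for the robust biweight correlation.
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to descending.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share is >= threshold, as a count.
    k = int(np.searchsorted(cum_share, threshold)) + 1
    return k, cum_share

# Synthetic data with three latent factors plus noise (not rainfall data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 8))
data = latent @ loadings + 0.5 * rng.normal(size=(200, 8))

k, cum_share = n_components_for_variance(data, threshold=0.70)
print(k, np.round(cum_share, 3))
```

Swapping in a robust correlation estimate would only change how `corr` is computed; the cutoff logic would stay the same.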

Factor Analysis, Random Data, and Patterned Results

American Antiquity ◽

10.2307/280208 ◽

1981 ◽

Vol 46 (2) ◽

pp. 272-283 ◽

Cited By ~ 17

Author(s):

Robert K. Vierra ◽

David L. Carlson

Keyword(s):

Factor Analysis ◽

Correlation Matrix ◽

Statistical Significance ◽

Principal Component ◽

Multivariate Statistical Techniques ◽

Random Data ◽

Multivariate Statistical ◽

Data Set ◽

Bartlett’s Test ◽

The Matrix

Multivariate statistical techniques such as factor analysis are capable of producing patterned results with most, if not all, data matrices. This paper demonstrates that patterned results are obtainable when principal component analysis is applied to a random data set. It is suggested that Bartlett's test for the statistical significance of a correlation matrix be employed in deciding whether a factor analysis of the matrix is justified.
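
As a rough illustration of the recommendation above, the sketch below computes Bartlett's sphericity statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, for uncorrelated and for correlated synthetic data. The function name and the data are assumptions for the example. The p-value is deliberately left out to keep the sketch dependent on NumPy only, so the statistic should be compared against a chi-square table or a statistics library.

```python
import numpy as np

def bartlett_sphericity(data):
    """Hypothetical helper: Bartlett's test statistic and degrees of
    freedom for H0 'the correlation matrix is the identity'."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet gives ln|R| without underflow in the determinant.
    _, logdet = np.linalg.slogdet(corr)
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

rng = np.random.default_rng(2)
random_data = rng.normal(size=(100, 10))            # no real structure
shared = rng.normal(size=(100, 1))                  # common factor
correlated_data = shared + 0.5 * rng.normal(size=(100, 10))

stat_rand, dof = bartlett_sphericity(random_data)
stat_corr, _ = bartlett_sphericity(correlated_data)

# PCA on the random data still yields a leading eigenvalue above 1,
# which is the kind of "patterned" result the abstract warns about.
top_eig = np.linalg.eigvalsh(np.corrcoef(random_data, rowvar=False))[-1]
print(round(stat_rand, 1), round(stat_corr, 1), dof, round(top_eig, 2))
```

Under the null hypothesis the statistic is approximately chi-square distributed, so for random data it should sit near its degrees of freedom, while structured data should give a far larger value.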

Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

Ekológia (Bratislava) ◽

10.1515/eko-2016-0014 ◽

2016 ◽

Vol 35 (2) ◽

pp. 173-190 ◽

Cited By ~ 13

Author(s):

S. Shahid Shaukat ◽

Toqeer Ahmed Rao ◽

Moazzam A. Khan

Keyword(s):

Principal Component Analysis ◽

Sample Size ◽

Principal Component ◽

Component Analysis ◽

Small Sample ◽

Environmental Data ◽

Data Matrix ◽

Data Sets ◽

Data Set ◽

The Impact

Abstract: In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix of water quality variables (p = 22) from a small data set comprising 55 samples (stations from which water samples were collected). Because data sets in ecology and the environmental sciences are invariably small, owing to the high cost of collecting and analysing samples, we restricted our study to relatively small sample sizes. We focused on comparing the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared by agglomerative cluster analysis with Ward’s method, which does not require any stringent distributional assumptions.
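
The resampling design described above can be sketched as below. This is only an illustration under stated assumptions: the 55 x 22 matrix is synthetic noise standing in for the water quality data, PCA is taken on the correlation matrix (the abstract does not say whether correlation or covariance was used), and `bootstrap_eigenvalues` is a hypothetical helper name.

```python
import numpy as np

def bootstrap_eigenvalues(data, sample_size, n_boot=100, seed=0):
    """Hypothetical helper: PCA eigenvalues (descending) for n_boot
    bootstrap samples of the given size, drawn with replacement."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    out = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=sample_size)  # rows, with replacement
        corr = np.corrcoef(data[idx], rowvar=False)
        out[b] = np.linalg.eigvalsh(corr)[::-1]
    return out

# Synthetic placeholder with the same shape as the data in the abstract.
rng = np.random.default_rng(3)
data = rng.normal(size=(55, 22))

results = {N: bootstrap_eigenvalues(data, N) for N in (20, 30, 40, 50)}
for N, eig in results.items():
    # Mean and spread of the leading eigenvalue for each sample size.
    print(N, round(float(eig[:, 0].mean()), 2), round(float(eig[:, 0].std()), 2))
```

Each row of a result array sums to p = 22, because the eigenvalues of a correlation matrix add up to its trace.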

Analyzing Raman Maps of Pharmaceutical Products by Sample—Sample Two-Dimensional Correlation

Applied Spectroscopy ◽

10.1366/0003702053946047 ◽

2005 ◽

Vol 59 (5) ◽

pp. 630-638 ◽

Cited By ~ 21

Author(s):

Slobodan Šašić ◽

Donald A. Clark ◽

John C. Mitchell ◽

Martin J. Snowden

Keyword(s):

Covariance Matrix ◽

Principal Component ◽

Pharmaceutical Products ◽

Correlation Spectroscopy ◽

Data Matrix ◽

Two Dimensional ◽

Data Set ◽

Spectral Matrix ◽

Projection Approach ◽

Few Data

Sample–sample (SS) two-dimensional (2D) correlation spectroscopy is applied in this study as a spectral selection tool to produce chemical images of real-world pharmaceutical samples consisting of two, three, and four components. The most unique spectra in a Raman mapping spectral matrix are found after analysis of the covariance matrix. (This is obtained by multiplying the original mapping data matrix by itself.) These spectra are identified by analyzing the slices of the covariance matrix at the positions where covariance values are at maxima. Chemical images are subsequently produced in a univariate fashion by visually selecting the wavenumbers in the extracted spectra that are least overlapped. The performance of SS 2D correlation is compared with principal component analysis in terms of highlighting the most prominent spectral differences across the whole data set (which typically comprises several thousand spectra) and determining the total number of species present. In addition, the selection of the unique spectra by SS 2D correlation is compared with the selection obtained by the orthogonal projection approach (OPA). Both comparisons are found to be satisfactory and demonstrate that a quite simple SS 2D correlation routine can be used for producing reliable images of unknown samples. The main benefit of using SS 2D correlation is that it is based on a few data processing commands that can be executed separately and produce results that are closely related to the chemical features of the system.

A New Neural Network Approach For Face Recognition Based On Conjugate Gradient Algorithms And Principal Component Analysis

Journal of Mathematics and Computer Science ◽

10.22436/jmcs.06.03.01 ◽

2013 ◽

Vol 06 (03) ◽

pp. 166-175 ◽

Cited By ~ 9

Author(s):

Hamed Azami ◽

Milad Malekzadeh ◽

Saeid Sanei

Keyword(s):

Neural Network ◽

Principal Component Analysis ◽

Face Recognition ◽

Conjugate Gradient ◽

Principal Component ◽

Component Analysis ◽

Network Approach ◽

Neural Network Approach ◽

Conjugate Gradient Algorithms ◽

Gradient Algorithms

DIFFERENTIATION OF VIETNAMESE COFFEE ORIGIN AND CULTIVARS BY AMINO AND FATTY ACID PROFILE ANALYSIS PRELIMINARY STUDY

Vietnam Journal of Science and Technology ◽

10.15625/2525-2518/58/6a/15629 ◽

2021 ◽

Vol 58 (6A) ◽

pp. 288

Author(s):

Hoang Quoc Tuan ◽

Lai Quoc Dat ◽

Cung Thi To Quynh ◽

Nguyen Hoang Dung ◽

Nguyen Xuan Loi ◽

...

Keyword(s):

Fatty Acids ◽

Glutamic Acid ◽

Geographical Origin ◽

Profile Analysis ◽

Principal Component ◽

Hierarchical Cluster ◽

Eicosenoic Acid ◽

Complete Data ◽

Data Matrix ◽

Data Set

The fatty acid and amino acid compositions of coffee beans, including Arabica and Robusta cultivars grown in three regions of Vietnam, were investigated. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were performed on the complete data set to reveal chemical differences among all samples and to identify markers characteristic of a particular botanical and geographical origin of the coffee. The major fatty acids in the coffee oil analyzed in this study were linoleic acid (C18:2), stearic acid (C18:0), oleic acid (C18:1), palmitic acid (C16:0) and myristic acid (C14:0), followed by small amounts of arachic acid (C20:0), docosanoic acid (C22:0) and eicosenoic acid (C20:1). Glutamic acid and aspartic acid were found in high amounts in Robusta coffee, from 271 mg/100 g DW to 786 mg/100 g DW and from 373 mg/100 g DW to 486 mg/100 g DW, respectively, whereas alanine and glutamic acid were found in high amounts in Arabica coffee, at 268 mg/100 g DW to 351 mg/100 g DW and 209 mg/100 g DW to 285 mg/100 g DW, respectively. Leucine (301 to 416 mg/100 g DW), phenylalanine (226 to 305 mg/100 g DW), and lysine (199 to 269 mg/100 g DW). PCA of the complete data matrix demonstrated that there were significant differences among all coffee cultivars and geographical origins; HCA supported the results of PCA and achieved a satisfactory classification performance.

Time Series Components Separation Based on Singular Spectral Analysis Visualization: an HJ-biplot Method Application

Statistics Optimization & Information Computing ◽

10.19139/soic-2310-5070-897 ◽

2020 ◽

Vol 8 (2) ◽

pp. 346-358

Author(s):

Alberto Oliveira da Silva ◽

Adelaide Freitas

Keyword(s):

Time Series ◽

Time Series Data ◽

Singular Spectrum Analysis ◽

Principal Component ◽

Series Data ◽

Real World Data ◽

Components Separation ◽

Data Set ◽

The Matrix ◽

Simultaneous Representation

The extraction of essential features of any real-valued time series is crucial for exploring, modeling and producing, for example, forecasts. Taking advantage of the representation of time series data by its Hankel trajectory matrix, constructed using Singular Spectrum Analysis, as well as of its decomposition through Principal Component Analysis via Partial Least Squares, we implement a graphical display employing the biplot methodology. Various types of biplots can be constructed depending on the two matrices considered in the factorization of the trajectory matrix. In this work, we discuss the so-called HJ-biplot, which yields a simultaneous representation of both rows and columns of the matrix with maximum quality. The interpretation of this type of biplot on Hankel-related trajectory matrices is discussed using a real-world data set.
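
The construction described above can be sketched in a few lines. This is a hedged illustration rather than the authors' procedure: it uses a plain SVD instead of PCA via Partial Least Squares, applies no centering, and takes the standard HJ-biplot markers J = U S for rows and H = V S for columns. The function names, the window length, and the synthetic series are assumptions for the example.

```python
import numpy as np

def trajectory_matrix(series, window):
    """Hankel trajectory matrix of SSA: entry (i, j) equals series[i + j]."""
    series = np.asarray(series, dtype=float)
    k = series.size - window + 1
    return np.array([series[i:i + k] for i in range(window)])

def hj_biplot_markers(X, dims=2):
    """Hypothetical helper: HJ-biplot row markers U*S and column markers V*S."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    row_markers = U[:, :dims] * s[:dims]
    col_markers = Vt[:dims].T * s[:dims]
    return row_markers, col_markers

# Synthetic series: linear trend plus a period-12 oscillation.
t = np.arange(120)
series = 0.05 * t + np.sin(2 * np.pi * t / 12)

X = trajectory_matrix(series, window=24)   # shape (24, 97)
rows, cols = hj_biplot_markers(X, dims=2)
print(X.shape, rows.shape, cols.shape)
```

Plotting `rows` and `cols` on the same axes would give the simultaneous display of lagged vectors and window positions; the plotting step is omitted to keep the sketch dependent on NumPy only.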
