Two-Phase Incremental Kernel PCA for Learning Massive or Online Datasets

Complexity ◽

10.1155/2019/5937274 ◽

2019 ◽

Vol 2019 ◽

pp. 1-17 ◽

Cited By ~ 2

Author(s):

Feng Zhao ◽

Islem Rekik ◽

Seong-Whan Lee ◽

Jing Liu ◽

Junying Zhang ◽

...

Keyword(s):

Principal Component Analysis ◽

Principal Components ◽

Principal Component ◽

Component Analysis ◽

Incremental Algorithm ◽

Kernel Principal Component Analysis ◽

Second Phase ◽

Two Phase ◽

Batch Mode ◽

Machine Learning Applications

As a powerful nonlinear feature extractor, kernel principal component analysis (KPCA) has been widely adopted in many machine learning applications. However, KPCA is usually performed in batch mode, which causes problems when handling massive or online datasets. To overcome this drawback, we propose a two-phase incremental KPCA (TP-IKPCA) algorithm that incorporates data into KPCA incrementally. In the first phase, an incremental algorithm is developed to explicitly express the data in the kernel space. In the second phase, we extend an incremental principal component analysis (IPCA) to estimate the kernel principal components. Extensive experimental results on both synthesized and real datasets show that the proposed TP-IKPCA produces principal components similar to those of conventional batch-based KPCA but is computationally faster than KPCA and several of its incremental variants. Therefore, our algorithm can be applied to massive or online datasets where the batch method is not available.
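The two-phase idea in this abstract can be illustrated with a rough sketch. This is not the authors' TP-IKPCA: phase 1 here is assumed to be a Nystrom-style explicit feature map built from a fixed set of landmark points, and phase 2 is assumed to be PCA on a running mean and scatter matrix. The class names `ExplicitMap` and `RunningPCA`, the landmark choice, and `gamma` are all hypothetical.

```python
import numpy as np

def rbf(X, Y, gamma):
    # Gaussian kernel between every row of X and every row of Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class ExplicitMap:
    """Phase 1 (assumed): Nystrom-style explicit features from fixed landmarks."""
    def __init__(self, landmarks, gamma):
        self.landmarks, self.gamma = landmarks, gamma
        vals, vecs = np.linalg.eigh(rbf(landmarks, landmarks, gamma))
        keep = vals > 1e-10
        # K_mm^{-1/2}, restricted to numerically nonzero eigen-directions.
        self.W = vecs[:, keep] / np.sqrt(vals[keep])

    def transform(self, X):
        return rbf(X, self.landmarks, self.gamma) @ self.W

class RunningPCA:
    """Phase 2 (assumed): PCA from a running mean and scatter matrix."""
    def __init__(self, dim):
        self.n, self.mean, self.scatter = 0, np.zeros(dim), np.zeros((dim, dim))

    def partial_fit(self, Z):
        for z in Z:  # Welford update, one sample at a time
            self.n += 1
            delta = z - self.mean
            self.mean += delta / self.n
            self.scatter += np.outer(delta, z - self.mean)

    def components(self, k):
        # Eigen-decompose the current covariance estimate, largest first.
        vals, vecs = np.linalg.eigh(self.scatter / self.n)
        order = np.argsort(vals)[::-1][:k]
        return vals[order], vecs[:, order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # synthetic stand-in data
fmap = ExplicitMap(X[:20], gamma=0.2)    # first 20 points used as landmarks
pca = RunningPCA(fmap.W.shape[1])
for start in range(0, 200, 50):          # data arrives in chunks of 50
    pca.partial_fit(fmap.transform(X[start:start + 50]))
vals, vecs = pca.components(3)
```

Because the running scatter matrix matches the batch covariance of the mapped data, the chunked run yields the same components a batch PCA would give on the same explicit features. How closely this approximates true KPCA depends on the landmark set, which is the part the paper's first phase addresses in its own way.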


Hyperspectral Dimensionality Reduction Based on Multiscale Superpixelwise Kernel Principal Component Analysis

Remote Sensing ◽

10.3390/rs11101219 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1219 ◽

Cited By ~ 4

Author(s):

Lan Zhang ◽

Hongjun Su ◽

Jingwei Shen

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Components ◽

Classification Accuracy ◽

Hyperspectral Image ◽

Principal Component ◽

Component Analysis ◽

Homogeneous Region ◽

Kernel Principal Component Analysis ◽

Nonlinear Features

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR that performs kernel principal component analysis (KPCA) on each homogeneous region is proposed to fully utilize the KPCA’s ability to acquire nonlinear features. Moreover, for the proposed method, the differences in the DR results obtained based on different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA can be increased by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, when compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained based on different first principal components are different and complementary. By fusing the multiscale classification results obtained based on different first principal components, the classification accuracy can be increased by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, when compared with the method based only on the most suitable fundamental image.


Comparing the performance of linear and nonlinear principal components in the context of high-dimensional genomic data integration

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2016-0066 ◽

2017 ◽

Vol 16 (3) ◽

Author(s):

Shofiqul Islam ◽

Sonia Anand ◽

Jemila Hamid ◽

Lehana Thabane ◽

Joseph Beyene

Keyword(s):

Principal Component Analysis ◽

Data Integration ◽

Principal Components ◽

Mirna Expression ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Data Sets ◽

Data Set ◽

Multiple Data Sets

Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods for data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. In this case, the first few kernel principal components perform poorly compared with the linear principal components. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to improved classification accuracy for the outcome.


Combination Method of Kernel Principal Component Analysis and Independent Component Analysis for Process Monitoring

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.249-250.153 ◽

2012 ◽

Vol 249-250 ◽

pp. 153-158

Author(s):

Ying Wang Xiao ◽

Ying Du

Keyword(s):

Principal Component Analysis ◽

Independent Component Analysis ◽

Process Monitoring ◽

Principal Component ◽

Component Analysis ◽

Independent Component ◽

Combination Method ◽

Kernel Principal Component Analysis ◽

Two Phase ◽

Monitoring Method

A combination method of kernel principal component analysis (KPCA) and independent component analysis (ICA) for process monitoring is proposed. The new method is a two-phase algorithm: whitened KPCA plus ICA. KPCA spheres the data and makes the data structure as linearly separable as possible by virtue of an implicit nonlinear mapping determined by the kernel. ICA seeks the projection directions in the KPCA-whitened space, making the distribution of the projected data as non-Gaussian as possible. The application to the Tennessee Eastman (TE) simulated process indicates that the proposed process monitoring method can effectively capture the nonlinear relationships among process variables. It significantly outperforms monitoring methods based on ICA or KPCA.


Quantitative assessment of spiritual capital in changing organizations by principal component analysis and fuzzy clustering

Journal of Organizational Change Management ◽

10.1108/jocm-07-2014-0127 ◽

2015 ◽

Vol 28 (3) ◽

pp. 469-485 ◽

Cited By ~ 6

Author(s):

Mohammad Reza Taghizadeh Yazdi

Keyword(s):

Principal Component Analysis ◽

Fuzzy Clustering ◽

Questionnaire Survey ◽

Quantitative Assessment ◽

Principal Component ◽

Component Analysis ◽

Second Phase ◽

Two Phase ◽

Content Type ◽

Spiritual Capital

Purpose – The purpose of this paper is to illustrate the application of statistical tools and techniques for quantitative assessment of spiritual capital (SC) based on a questionnaire survey in the organizations which undergo large-scale organizational change projects. Design/methodology/approach – A sample of 65 individuals from three organizations were interviewed. The paper uses the 12 principles of transformation available to spiritual intelligence (referred to as SQ characteristics) to assess SC in a two-phase integrated algorithm of principal component analysis (PCA) and fuzzy clustering. Findings – The paper proposes a two-phase integrated algorithm. In the first phase, PCA is used to reduce the scores of items related to each of SQ characteristics and aggregate them into a single and unique measure. In the second phase, PCA is applied for total SQ quantification. For verification and validation, fuzzy clustering is employed along with PCA to cluster the people in the survey into different classes, which may possess different stocks of SC and rank them based on their level of SQ. The results of PCA are verified and validated by fuzzy clustering revealing the applicability and usefulness of PCA for SC quantification. Research limitations/implications – The paper is based on individual judgments about their own SQ characteristics hence the results of questionnaire survey may be biased by individual personal characteristics. Future research can apply the proposed algorithm and check for its reliability using other psychometric instruments available in the field. Originality/value – The paper contributes by filling a gap in the quantitative management tools literature, in which empirical studies on validated multivariate analysis of spirituality have been scarce until now.
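The two PCA phases described in the findings can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the questionnaire data are synthetic, four items per SQ characteristic is a hypothetical choice, the helper `first_pc_score` is an illustrative name, and the fuzzy clustering step is omitted.

```python
import numpy as np

def first_pc_score(X):
    # Score of each row on the first principal component of X.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    v = Vt[0]
    if v.sum() < 0:   # heuristic: resolve the arbitrary sign of the component
        v = -v
    return Xc @ v

rng = np.random.default_rng(0)
n_people, n_traits, n_items = 65, 12, 4   # 65 and 12 follow the abstract; 4 is assumed
trait_level = rng.normal(size=(n_people, n_traits))
# Hypothetical questionnaire: each characteristic is measured by 4 noisy items.
items = trait_level[:, :, None] + 0.3 * rng.normal(size=(n_people, n_traits, n_items))

# Phase 1: aggregate the items of each characteristic into a single score.
phase1 = np.column_stack([first_pc_score(items[:, j, :]) for j in range(n_traits)])
# Phase 2: aggregate the 12 characteristic scores into one overall score.
overall = first_pc_score(phase1)
ranking = np.argsort(-overall)            # people ordered from highest to lowest score
```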


Kernel principal component analysis for multimedia retrieval

Global Journal of Information Technology Emerging Technologies ◽

10.18844/gjit.v6i1.384 ◽

2016 ◽

Vol 6 (1) ◽

Author(s):

Guang-Ho Cha

Keyword(s):

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Components ◽

Principal Component ◽

Feature Space ◽

Component Analysis ◽

Multimedia Retrieval ◽

Kernel Principal Component Analysis ◽

Kernel Pca ◽

Data Set

Principal component analysis (PCA) is an important tool in many areas, including data reduction and interpretation, information retrieval, and image processing. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components are computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to retrieval quality as well as retrieval precision in content-based image retrieval. Keywords: principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
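The basic idea described above (map to a feature space, then compute principal components there) can be sketched with the standard batch kernel PCA recipe and a Gaussian kernel. The function names, the value of `gamma`, and the random data standing in for image features are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Squared Euclidean distances between every row of X and every row of Y.
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components, gamma):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # eigh returns ascending eigenvalues; keep the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projections of the training points onto the kernel principal axes.
    scores = vecs * np.sqrt(np.maximum(vals, 0.0))
    return scores, vals

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))     # stand-in for 50 image feature vectors
scores, vals = kernel_pca(X, n_components=3, gamma=0.1)
```

The rows of `scores` are the reduced representations that would be compared at retrieval time. The feature map itself is never formed; only the 50 x 50 kernel matrix is needed.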


Face Recognition Based on a Combination of Increasing Virtual Samples and Kernel Principal Component Analysis

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.743.522 ◽

2015 ◽

Vol 743 ◽

pp. 522-525

Author(s):

Hao Zhang ◽

S.F. Wang

Keyword(s):

Principal Component Analysis ◽

Face Recognition ◽

Principal Components ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Training Samples ◽

Virtual Samples ◽

Recognition Result ◽

Series Of Experiments

In pattern recognition tasks such as face recognition, the recognition result is limited not only by the quality and quantity of samples but also by the extracted principal components. To improve the quality and quantity of training samples and to extract more efficient principal components, this paper presents a recognition method combining additional virtual samples with kernel principal component analysis (KPCA), which doubly weakens the influence of nonlinear factors on face recognition. A new database is generated with pose-changed and mirror-like virtual images. KPCA is then used for dimension reduction and feature extraction, and the shortest Euclidean distance is applied to measure similarity. A series of experiments is conducted on the ORL and YALE face databases, and the experimental results show the efficiency of the proposed method.
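Two of the steps named in this abstract, mirror-like virtual samples and matching by shortest Euclidean distance, are easy to sketch. The example below is a minimal illustration under assumptions: the images are random arrays, the helper names are hypothetical, pose-changed samples are not generated, and raw pixels stand in for the KPCA features the paper uses.

```python
import numpy as np

def add_mirror_samples(images, labels):
    # images: (n, h, w); flip each image left-right to create a virtual sample.
    mirrored = images[:, :, ::-1]
    return np.concatenate([images, mirrored]), np.concatenate([labels, labels])

def nearest_neighbor(train_feats, train_labels, query_feats):
    # Pairwise Euclidean distances, then the label of the closest training row.
    d = np.linalg.norm(query_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return train_labels[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
images = rng.random((6, 4, 4))                 # six tiny synthetic "faces"
labels = np.array([0, 0, 1, 1, 2, 2])
aug_images, aug_labels = add_mirror_samples(images, labels)

# Raw pixels stand in for KPCA features in this sketch.
feats = aug_images.reshape(len(aug_images), -1)
pred = nearest_neighbor(feats, aug_labels, feats[6:])
```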


Stormwater inflow prediction using radar rainfall data compressed by principal component analysis

Water Practice & Technology ◽

10.2166/wpt.2006.017 ◽

2006 ◽

Vol 1 (1) ◽

Author(s):

K. Katayama ◽

K. Kimijima ◽

O. Yamanaka ◽

A. Nagaiwa ◽

Y. Ono

Keyword(s):

Principal Component Analysis ◽

Prediction Model ◽

Principal Components ◽

Prediction Method ◽

Principal Component ◽

Component Analysis ◽

Rainfall Data ◽

Radar Rainfall ◽

Input Variables ◽

Inflow Prediction

This paper proposes a method of stormwater inflow prediction using radar rainfall data as the input of a prediction model constructed by system identification. The aim of the proposal is to construct a compact system by reducing the dimension of the input data. In this paper, Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using radar rainfall data and inflow data acquired in a combined sewer system. This study reveals that a few principal components of radar rainfall data are appropriate input variables for the stormwater inflow prediction model. Consequently, we have established a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.
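The compression step can be sketched as PCA on gridded rainfall followed by a small predictor on the leading components. Everything below is an assumption for illustration: the data are synthetic, the grid size and number of components are arbitrary, and ordinary least squares stands in for the system identification model used in the paper.

```python
import numpy as np

def pca_fit(X, k):
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# Synthetic stand-in for radar data: 300 time steps x 100 grid cells,
# driven by 3 latent rainfall patterns plus small noise.
latent = rng.normal(size=(300, 3))
patterns = rng.normal(size=(3, 100))
rain = latent @ patterns + 0.01 * rng.normal(size=(300, 100))
inflow = latent @ np.array([2.0, -1.0, 0.5]) + 0.01 * rng.normal(size=300)

mean, comps = pca_fit(rain, k=3)
scores = pca_transform(rain, mean, comps)          # 100 inputs compressed to 3

# Least-squares predictor on the compressed inputs (with an intercept column).
A = np.column_stack([scores, np.ones(len(scores))])
coef, *_ = np.linalg.lstsq(A, inflow, rcond=None)
pred = A @ coef
r2 = 1 - ((inflow - pred) ** 2).sum() / ((inflow - inflow.mean()) ** 2).sum()
```

On this synthetic data the three components recover almost all of the predictive information, which mirrors the paper's claim that a few principal components can serve as model inputs. Real radar data would need the number of components chosen from the explained variance or validation error.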


Towards fine-scale population stratification modeling based on kernel principal component analysis and random forest

Genes & Genomics ◽

10.1007/s13258-021-01057-4 ◽

2021 ◽

Author(s):

Weiwen Zhang ◽

Lianglun Cheng ◽

Guoheng Huang

Keyword(s):

Principal Component Analysis ◽

Random Forest ◽

Population Stratification ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Fine Scale ◽

Scale Population


Monitoring a Reverse Osmosis Process with Kernel Principal Component Analysis: A Preliminary Approach

Applied Sciences ◽

10.3390/app11146370 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6370

Author(s):

Elena Quatrini ◽

Francesco Costantino ◽

David Mba ◽

Xiaochuan Li ◽

Tat-Hean Gan

Keyword(s):

Principal Component Analysis ◽

Fault Detection ◽

Water Purification ◽

Principal Component ◽

Temporal Correlation ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Original Dataset ◽

Monitoring Process ◽

Machine Health

The water purification process is becoming increasingly important to ensure the continuity and quality of subsequent production processes, and it is particularly relevant in pharmaceutical contexts. However, in this context, the difficulties arising during the monitoring process are manifold. On the one hand, the monitoring process reveals various discontinuities due to different characteristics of the input water. On the other hand, the monitoring process is discontinuous and random itself, thus not guaranteeing continuity of the parameters and hindering a straightforward analysis. Consequently, further research on water purification processes is paramount to identify the most suitable techniques able to guarantee good performance. Against this background, this paper proposes an application of kernel principal component analysis for fault detection in a process with the above-mentioned characteristics. Based on the temporal variability of the process, the paper suggests the use of past and future matrices as input for fault detection as an alternative to the original dataset. In this manner, the temporal correlation between process parameters and machine health is accounted for. The proposed approach confirms the possibility of obtaining very good monitoring results in the analyzed context.
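The abstract does not define its past and future matrices. A common construction, assumed here, stacks a fixed number of lagged samples before each time point (past) and the same number of samples from that point onward (future), so each row carries temporal context. The function name and the `lags` parameter are hypothetical.

```python
import numpy as np

def past_future_matrices(Y, lags):
    # Y: (T, m) multivariate series. Each output row stacks `lags` samples.
    T = Y.shape[0]
    rows = T - 2 * lags + 1
    times = range(lags, lags + rows)
    # Past: the `lags` samples before time t, most recent first.
    past = np.stack([Y[t - lags:t][::-1].ravel() for t in times])
    # Future: the `lags` samples starting at time t.
    future = np.stack([Y[t:t + lags].ravel() for t in times])
    return past, future

Y = np.arange(20, dtype=float).reshape(10, 2)   # 10 time steps, 2 process variables
past, future = past_future_matrices(Y, lags=3)
```

Either matrix could then be passed to kernel PCA in place of the original dataset, so that the monitoring statistics reflect correlations across time as well as across variables.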
