Two-Phase Incremental Kernel PCA for Learning Massive or Online Datasets

Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-17 ◽  
Author(s):  
Feng Zhao ◽  
Islem Rekik ◽  
Seong-Whan Lee ◽  
Jing Liu ◽  
Junying Zhang ◽  
...  

As a powerful nonlinear feature extractor, kernel principal component analysis (KPCA) has been widely adopted in many machine learning applications. However, KPCA is usually performed in batch mode, which can cause problems when handling massive or online datasets. To overcome this drawback, we propose a two-phase incremental KPCA (TP-IKPCA) algorithm that incorporates data into KPCA incrementally. In the first phase, an incremental algorithm is developed to explicitly express the data in the kernel space. In the second phase, we extend an incremental principal component analysis (IPCA) to estimate the kernel principal components. Extensive experimental results on both synthesized and real datasets show that the proposed TP-IKPCA produces principal components similar to those of conventional batch-based KPCA but is computationally faster than KPCA and several of its incremental variants. Therefore, our algorithm can be applied to massive or online datasets where the batch method is not available.
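The abstract does not spell out either phase, so the following is not TP-IKPCA itself. It is only a minimal sketch of one generic building block of incremental PCA: updating a mean and covariance one sample at a time (a Welford-style update) so that the principal components can be estimated without holding the whole dataset in memory. The class name `RunningCovariance` and the synthetic stream are assumptions made for illustration.

```python
import numpy as np

class RunningCovariance:
    """Accumulates the mean and scatter matrix one sample at a time (Welford-style)."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.scatter = np.zeros((dim, dim))  # running sum of deviation outer products

    def update(self, x):
        self.n += 1
        delta = x - self.mean                # deviation from the previous mean
        self.mean += delta / self.n
        # Second factor uses the updated mean, which keeps the sum exact.
        self.scatter += np.outer(delta, x - self.mean)

    def principal_components(self, k):
        cov = self.scatter / (self.n - 1)    # unbiased sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
        return eigvals[order], eigvecs[:, order]

# Hypothetical data stream: 200 samples in 5 dimensions with unequal variances.
rng = np.random.default_rng(0)
stream = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])

tracker = RunningCovariance(dim=5)
for x in stream:
    tracker.update(x)
eigvals, components = tracker.principal_components(k=2)
```

After the loop, `tracker.scatter / (tracker.n - 1)` agrees with the batch covariance of the same samples, which is the property an incremental method relies on. A kernel variant would additionally need an explicit, incrementally maintained representation of the data in the kernel space, as the first phase of the abstract describes.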

2019 ◽  
Vol 11 (10) ◽  
pp. 1219 ◽  
Author(s):  
Lan Zhang ◽  
Hongjun Su ◽  
Jingwei Shen

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR that performs kernel principal component analysis (KPCA) on each homogeneous region is proposed to fully utilize the KPCA’s ability to acquire nonlinear features. Moreover, for the proposed method, the differences in the DR results obtained based on different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA can be increased by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, when compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained based on different first principal components are different and complementary. By fusing the multiscale classification results obtained based on different first principal components, the classification accuracy can be increased by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, when compared with the method based only on the most suitable fundamental image.


Author(s):  
Shofiqul Islam ◽  
Sonia Anand ◽  
Jemila Hamid ◽  
Lehana Thabane ◽  
Joseph Beyene

Linear principal component analysis (PCA) is a widely used approach to reduce the dimension of gene or miRNA expression data sets. This method relies on the linearity assumption, which often fails to capture the patterns and relationships inherent in the data. Thus, a nonlinear approach such as kernel PCA might be optimal. We develop a copula-based simulation algorithm that takes into account the degree of dependence and nonlinearity observed in these data sets. Using this algorithm, we conduct an extensive simulation to compare the performance of linear and kernel principal component analysis methods for data integration and death classification. We also compare these methods using a real data set with gene and miRNA expression of lung cancer patients. The first few kernel principal components perform poorly compared with the linear principal components in this case. Reducing dimensions using linear PCA and a logistic regression model for classification seems to be adequate for this purpose. Integrating information from multiple data sets using either of these two approaches leads to an improved classification accuracy for the outcome.


2012 ◽  
Vol 249-250 ◽  
pp. 153-158
Author(s):  
Ying Wang Xiao ◽  
Ying Du

A method combining kernel principal component analysis (KPCA) and independent component analysis (ICA) for process monitoring is proposed. The new method is a two-phase algorithm: whitened KPCA plus ICA. KPCA spheres the data and makes the data structure as linearly separable as possible by virtue of an implicit nonlinear mapping determined by the kernel. ICA seeks the projection directions in the KPCA-whitened space, making the distribution of the projected data as non-Gaussian as possible. The application to the Tennessee Eastman (TE) simulated process indicates that the proposed process monitoring method can effectively capture the nonlinear relationships among process variables. It significantly outperforms monitoring methods based on ICA or KPCA.


2015 ◽  
Vol 28 (3) ◽  
pp. 469-485 ◽  
Author(s):  
Mohammad Reza Taghizadeh Yazdi

Purpose – The purpose of this paper is to illustrate the application of statistical tools and techniques for quantitative assessment of spiritual capital (SC) based on a questionnaire survey in organizations that undergo large-scale organizational change projects. Design/methodology/approach – A sample of 65 individuals from three organizations was interviewed. The paper uses the 12 principles of transformation available to spiritual intelligence (referred to as SQ characteristics) to assess SC in a two-phase integrated algorithm of principal component analysis (PCA) and fuzzy clustering. Findings – The paper proposes a two-phase integrated algorithm. In the first phase, PCA is used to reduce the scores of items related to each of the SQ characteristics and aggregate them into a single and unique measure. In the second phase, PCA is applied for total SQ quantification. For verification and validation, fuzzy clustering is employed along with PCA to cluster the people in the survey into different classes, which may possess different stocks of SC, and rank them based on their level of SQ. The results of PCA are verified and validated by fuzzy clustering, revealing the applicability and usefulness of PCA for SC quantification. Research limitations/implications – The paper is based on individuals' judgments about their own SQ characteristics; hence, the results of the questionnaire survey may be biased by personal characteristics. Future research can apply the proposed algorithm and check its reliability using other psychometric instruments available in the field. Originality/value – The paper contributes by filling a gap in the quantitative management tools literature, in which empirical studies on validated multivariate analysis of spirituality have been scarce until now.


Author(s):  
Guang-Ho Cha

Principal component analysis (PCA) is an important tool in many areas including data reduction and interpretation, information retrieval, image processing, and so on. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components are computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA performs better than linear PCA with respect to both retrieval quality and retrieval precision in content-based image retrieval.

Keywords: Principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
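The basic idea described above (map to a feature space, then compute principal components there) can be sketched with the standard kernel trick, in which the feature map is never formed and only the centered kernel matrix is eigendecomposed. This is a minimal illustration, not the paper's implementation; the function names, the `gamma` value, and the random feature vectors are assumptions.

```python
import numpy as np

def gaussian_kernel_matrix(X, gamma):
    """Pairwise Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off

def kernel_pca(X, n_components, gamma):
    """Project the training samples onto the leading kernel principal components."""
    n = X.shape[0]
    K = gaussian_kernel_matrix(X, gamma)
    # Center the data in feature space: Kc = H K H with H = I - (1/n) 1 1^T.
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = H @ K @ H
    # eigh returns eigenvalues in ascending order; reverse to put the largest first.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]
    # Score of training sample i on component k is sqrt(lambda_k) * v_k[i].
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return scores, eigvals

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))   # hypothetical stand-in for image feature vectors
scores, eigvals = kernel_pca(X, n_components=3, gamma=0.1)
print(scores.shape)            # (50, 3)
```

The columns of `scores` are the new, lower-dimensional coordinates that a retrieval system could index in place of the original features. Projecting a query that was not in the training set would additionally require centering its kernel vector against the training data, which is omitted here.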


2015 ◽  
Vol 743 ◽  
pp. 522-525
Author(s):  
Hao Zhang ◽  
S.F. Wang

In pattern recognition tasks such as face recognition, the recognition result is limited not only by the quality and quantity of samples but also by the extracted principal components. To improve the quality and quantity of training samples and to extract more efficient principal components, this paper presents a recognition method combining additional virtual samples with kernel principal component analysis (KPCA), which doubly weakens the influence of nonlinear factors on face recognition. A new database is generated with pose-changed and mirror-like virtual images. Then KPCA is used for dimension reduction and feature extraction. The shortest Euclidean distance is applied to measure similarity. A series of experiments is conducted on the ORL and YALE face databases, and the experimental results show the efficiency of the proposed method.
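Two steps of the pipeline above are simple enough to sketch: generating mirror-like virtual samples and classifying by the shortest Euclidean distance. The sketch below is illustrative only. It skips the pose-changed images and the KPCA step (raw pixels stand in for the extracted features), and the function names and toy 2x3 "images" are assumptions.

```python
import numpy as np

def add_mirror_samples(images, labels):
    """Append a left-right mirrored copy of every image, keeping its label."""
    mirrored = images[:, :, ::-1]          # reverse the column axis of each image
    return np.concatenate([images, mirrored]), np.concatenate([labels, labels])

def nearest_neighbor(train_features, train_labels, query):
    """Label of the training vector with the smallest Euclidean distance to query."""
    dists = np.linalg.norm(train_features - query, axis=1)
    return train_labels[np.argmin(dists)]

# Hypothetical 2x3 "images"; raw pixels stand in for the KPCA features here.
images = np.array([[[1, 0, 0], [1, 0, 0]],
                   [[0, 1, 0], [0, 1, 0]]], dtype=float)
labels = np.array([7, 9])
query = images[0][:, ::-1]                 # mirrored view of the first subject

aug_images, aug_labels = add_mirror_samples(images, labels)
train = aug_images.reshape(len(aug_images), -1)   # flatten each image to a vector
predicted = nearest_neighbor(train, aug_labels, query.ravel())
print(predicted)                           # 7
```

Without the mirrored copies, the query is equally far from both original images in this toy setup; the virtual sample is what makes the match unambiguous.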


2006 ◽  
Vol 1 (1) ◽  
Author(s):  
K. Katayama ◽  
K. Kimijima ◽  
O. Yamanaka ◽  
A. Nagaiwa ◽  
Y. Ono

This paper proposes a method of stormwater inflow prediction that uses radar rainfall data as the input of a prediction model constructed by system identification. The aim is to construct a compact system by reducing the dimension of the input data. Principal Component Analysis (PCA), which is widely used as a statistical method for data analysis and compression, is applied to pre-process the radar rainfall data. We then evaluate the proposed method using the radar rainfall data and the inflow data acquired in a combined sewer system. This study reveals that a few principal components of radar rainfall data are appropriate as input variables to the stormwater inflow prediction model. Consequently, we have established a procedure for stormwater inflow prediction using a few principal components of radar rainfall data.
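The dimension-reduction step can be illustrated with PCA computed by singular value decomposition. This is a minimal sketch under stated assumptions: the synthetic "rainfall" matrix (time steps by grid cells) is generated from three made-up spatial patterns and is not physically realistic, and the function name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Return PCA scores, component directions, and explained-variance ratios."""
    Xc = X - X.mean(axis=0)                        # center each grid cell's series
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)            # variance share per component
    scores = U[:, :n_components] * s[:n_components]
    return scores, Vt[:n_components], explained[:n_components]

# Hypothetical stand-in for radar rainfall: 300 time steps over 100 grid cells,
# driven by 3 underlying spatial patterns plus small noise.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(3, 100))
activity = rng.normal(size=(300, 3))
rainfall = activity @ patterns + 0.05 * rng.normal(size=(300, 100))

scores, components, explained = pca_reduce(rainfall, n_components=3)
print(scores.shape)          # (300, 3): three model inputs instead of 100
```

The columns of `scores` would then replace the full grid as the inputs of the identified prediction model. How many components to keep is a modeling choice; the explained-variance ratios give one common criterion.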


2021 ◽  
Vol 11 (14) ◽  
pp. 6370
Author(s):  
Elena Quatrini ◽  
Francesco Costantino ◽  
David Mba ◽  
Xiaochuan Li ◽  
Tat-Hean Gan

The water purification process is becoming increasingly important to ensure the continuity and quality of subsequent production processes, and it is particularly relevant in pharmaceutical contexts. However, in this context, the difficulties arising during the monitoring process are manifold. On the one hand, the monitoring process reveals various discontinuities due to different characteristics of the input water. On the other hand, the monitoring process is discontinuous and random itself, thus not guaranteeing continuity of the parameters and hindering a straightforward analysis. Consequently, further research on water purification processes is paramount to identify the most suitable techniques able to guarantee good performance. Against this background, this paper proposes an application of kernel principal component analysis for fault detection in a process with the above-mentioned characteristics. Based on the temporal variability of the process, the paper suggests the use of past and future matrices as input for fault detection as an alternative to the original dataset. In this manner, the temporal correlation between process parameters and machine health is accounted for. The proposed approach confirms the possibility of obtaining very good monitoring results in the analyzed context.
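The abstract does not define its past and future matrices. One common construction, assumed here, stacks lagged windows of the measurements so that each row pairs the recent history of the process with its upcoming values. The function name, window lengths, and toy series below are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def past_future_matrices(Y, n_past, n_future):
    """Stack lagged windows of a multivariate series Y (time x variables).

    Row i of `past` holds y[t-1], ..., y[t-n_past] and row i of `future` holds
    y[t], ..., y[t+n_future-1], where t = n_past + i.
    """
    T = Y.shape[0]
    times = range(n_past, T - n_future + 1)
    # Reverse each past window so the most recent sample comes first.
    past = np.array([Y[t - n_past:t][::-1].ravel() for t in times])
    future = np.array([Y[t:t + n_future].ravel() for t in times])
    return past, future

Y = np.arange(20, dtype=float).reshape(10, 2)   # hypothetical 10 samples, 2 variables
past, future = past_future_matrices(Y, n_past=3, n_future=2)
print(past.shape, future.shape)                 # (6, 6) (6, 4)
```

Either matrix could then be passed to a kernel PCA routine in place of the raw measurements, so that each input row carries temporal context.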

