Identifying the Informational/Signal Dimension in Principal Component Analysis

Mathematics ◽  
2018 ◽  
Vol 6 (11) ◽  
pp. 269 ◽  
Author(s):  
Sergio Camiz ◽  
Valério Pillar

The identification of a reduced-dimensional representation of the data is among the main issues of exploratory multidimensional data analysis, and several solutions have been proposed in the literature, varying with the method. Principal Component Analysis (PCA) is the method that has received the most attention thus far, and several identification methods (the so-called stopping rules) have been proposed, giving very different results in practice; several comparative studies have been carried out. Inconsistencies in the previous studies led us to clarify the distinction between signal and noise in PCA, and its limits, and to propose a new testing method. This consists in generating simulated data according to a predefined eigenvalue structure that includes zero eigenvalues. From random populations built according to several such structures, reduced-size samples were extracted, to which different levels of random normal noise were added. This controlled introduction of noise allows a clear distinction between expected signal and noise, the latter confined to those non-zero sample eigenvalues that correspond to zero eigenvalues in the population. With this new method, we tested the performance of ten different stopping rules. For every method, structure, and noise level, both power (the ability to correctly identify the expected dimension) and type-I error (the detection of a dimension composed only of noise) were measured, by counting the relative frequency with which the smallest non-zero population eigenvalue was recognized as signal in the samples, and that with which the largest zero eigenvalue was recognized as noise, respectively. In this way, the behaviour of the examined methods is clear and their comparison and evaluation are possible.
The reported results show that both the generalization of Bartlett's test by Rencher and the bootstrap method by Pillar perform much better than all the others: both show reasonable power, which decreases with noise, and very good type-I error rates. Thus, more than the others, these methods deserve to be adopted.

Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 548 ◽  
Author(s):  
Yuqing Sun ◽  
Jun Niu

Hydrological regionalization is a useful step in hydrological modeling and prediction. Regionalization is not always straightforward, however, due to the lack of long-term hydrological data and the complex multi-scale variability features embedded in the data. This study examines the multiscale soil moisture variability of simulated data on a grid-cell basis, obtained from a large-scale hydrological model, and clusters the grid-cell-based soil moisture data using wavelet-based multiscale entropy and principal component analysis, over the Xijiang River basin in South China, for the period 2002–2010. The regionalization, for 169 grid cells at a spatial resolution of 0.5° × 0.5°, produced homogeneous groups based on the pattern of wavelet-based entropy information. Four distinct modes explain 80.14% of the total embedded variability of the transformed wavelet power across different timescales. Moreover, the possible implications of the regionalization results for local hydrological applications, such as parameter estimation for an ungauged catchment and the design of a uniform prediction strategy for a sub-area of a large-scale basin, are discussed.
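The pipeline (per-cell wavelet features, PCA compression, clustering into regions) can be sketched in a few lines. The toy below substitutes a plain Haar decomposition and k-means for the study's wavelet-entropy features and clustering scheme; the synthetic "grid cells" and all parameters are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def haar_scale_energies(x, levels=4):
    """Relative energy of Haar detail coefficients per scale: a minimal
    stand-in for the wavelet power profiles used in the study."""
    energies, a = [], np.asarray(x, dtype=float)
    for _ in range(levels):
        n = len(a) // 2 * 2
        d = (a[:n:2] - a[1:n:2]) / np.sqrt(2.0)
        a = (a[:n:2] + a[1:n:2]) / np.sqrt(2.0)
        energies.append(np.sum(d ** 2))
    e = np.array(energies)
    return e / e.sum()

# Toy "grid cells": two regimes dominated by different timescales
n_t = 256
fast = [np.sin(np.arange(n_t)) + 0.2 * rng.normal(size=n_t) for _ in range(20)]
slow = [np.sin(0.05 * np.arange(n_t)) + 0.2 * rng.normal(size=n_t) for _ in range(20)]

features = np.array([haar_scale_energies(s) for s in fast + slow])
scores = PCA(n_components=2).fit_transform(features)   # compress scale profiles
regions = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
```

Cells whose variability lives at similar timescales end up in the same region, which is the essence of the regionalization step.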


2014 ◽  
Vol 556-562 ◽  
pp. 4317-4320
Author(s):  
Qiang Zhang ◽  
Li Ping Liu ◽  
Chao Liu

As a zero-emission mode of transportation, an increasing number of Electric Vehicles (EVs) have come into daily use. The EV charging station is an important component of the Smart Grid, which now faces the challenges of big data. This paper presents a data compression and reconstruction method based on Principal Component Analysis (PCA). The data reconstruction error, measured as the Normalized Absolute Percent Error (NAPE), is taken into consideration to balance the compression ratio against the reconstruction quality. Using simulated data, the effectiveness of data compression and reconstruction for EV charging stations is verified.
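The compress-then-reconstruct loop is simple to sketch. The data below are hypothetical daily load curves (the paper's station data are not given), and the NAPE normalization by data range is one plausible reading of the abbreviation, not the paper's confirmed formula.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Hypothetical charging-load profiles: 365 daily curves sampled every 15 min,
# sharing a low-rank structure plus small measurement noise.
t = np.linspace(0, 2 * np.pi, 96)
base = np.vstack([np.sin(t), np.sin(2 * t), np.cos(t)])
X = rng.uniform(0.5, 1.5, size=(365, 3)) @ base + 0.05 * rng.normal(size=(365, 96))

def nape(X, X_hat):
    """Normalized Absolute Percent Error: mean absolute reconstruction
    error scaled by the data range (assumed normalization)."""
    return np.mean(np.abs(X - X_hat)) / (X.max() - X.min())

def pca_compress(X, k):
    """Keep k principal components; storing scores plus components in
    place of the raw matrix is the compression step."""
    pca = PCA(n_components=k).fit(X)
    return pca.inverse_transform(pca.transform(X))

err1, err3 = nape(X, pca_compress(X, 1)), nape(X, pca_compress(X, 3))
```

Sweeping `k` and plotting NAPE against the achieved compression ratio is how one would pick the balance point the paper describes.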


2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Cong Liu ◽  
Xu Wei-sheng ◽  
Wu Qi-di

We propose Tensorial Kernel Principal Component Analysis (TKPCA) for dimensionality reduction and feature extraction from tensor objects, which extends conventional Principal Component Analysis (PCA) in two respects: it works directly with multidimensional data (tensors) in their native form, and it generalizes an existing linear technique to its nonlinear version by applying the kernel trick. Our method aims to remedy the shortcomings of recently developed multilinear subspace learning (tensorial PCA) in modelling the nonlinear manifold of tensor objects, and brings together the desirable properties of kernel methods and tensor decompositions for a significant performance gain when the data are multidimensional and nonlinear dependencies exist. Our approach begins by formulating TKPCA as an optimization problem. We then develop a kernel function based on the Grassmann manifold that can directly take tensorial representations as input instead of the traditional vectorized representation. Furthermore, a TKPCA-based tensor object recognition method is proposed and applied to action recognition. Experiments with real action datasets show that the proposed method is insensitive to both noise and occlusion and performs well compared with state-of-the-art algorithms.
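The core idea, a Grassmann-manifold kernel feeding standard kernel PCA, can be sketched for order-2 tensors (matrices). The projection kernel used below is one well-known positive semi-definite Grassmann kernel, not necessarily the exact kernel derived in the paper, and the two-class matrix data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

def subspace_basis(T, r=2):
    """Represent a matrix (order-2 tensor) sample by its top-r left
    singular vectors, i.e. a point on the Grassmann manifold."""
    U, _, _ = np.linalg.svd(T, full_matrices=False)
    return U[:, :r]

def projection_kernel(A, B):
    """Projection kernel between subspaces, k(A, B) = ||A^T B||_F^2;
    positive semi-definite, so it can drive kernel PCA."""
    return np.linalg.norm(A.T @ B, "fro") ** 2

def kernel_pca(K, k=2):
    """Standard kernel PCA on a precomputed kernel matrix: double-center,
    eigendecompose, scale eigenvectors by sqrt(eigenvalue)."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    w, V = np.linalg.eigh(J @ K @ J)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy data: two classes of 8x6 matrix samples with different column spaces
# (a hypothetical stand-in for the paper's action-recognition tensors).
U0 = np.linalg.qr(rng.normal(size=(8, 2)))[0]
U1 = np.linalg.qr(rng.normal(size=(8, 2)))[0]
samples = [U0 @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(8, 6)) for _ in range(10)]
samples += [U1 @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(8, 6)) for _ in range(10)]

bases = [subspace_basis(S) for S in samples]
K = np.array([[projection_kernel(a, b) for b in bases] for a in bases])
embedding = kernel_pca(K, k=2)
```

Because the kernel compares subspaces rather than flattened vectors, samples from the same underlying column space stay close in the embedding even under noise.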


2020 ◽  
Vol 17 (2) ◽  
pp. 67
Author(s):  
Arief Ginanjar ◽  
Awan Setiawan

When using Kansei Engineering to find the best candidate for a website interface design model, the researchers applied Partial Least Squares (PLS) analysis iteratively until the best implementable elements were found. PLS serves as a tool for determining the best values among website elements. The resulting comparison output is grouped by Kansei Word, as specified in the initial plan for the Kansei Engineering implementation; the first-iteration PLS comparison output may yield the best proposed value when combined with a second iteration over the assimilation of the two or three highest-scoring elements. The methodology follows Kansei Engineering Type I: data processing uses Cronbach's Alpha to test respondent reliability; Coefficient Correlation Analysis (CCA) to examine the relationships among Kansei Words; Principal Component Analysis (PCA) to relate Kansei Words to the specimens; and Factor Analysis (FA) together with PLS analysis to find the most influential Kansei Words, with the PLS process iterated until the recommended interface design model variables become more varied.


2019 ◽  
Author(s):  
Florian Wagner ◽  
Dalia Barkley ◽  
Itai Yanai

Single-cell RNA-Seq measurements are commonly affected by high levels of technical noise, posing challenges for data analysis and visualization. A diverse array of methods has been proposed to computationally remove noise by sharing information across similar cells or genes; however, their respective accuracies have been difficult to establish. Here, we propose a simple denoising strategy based on principal component analysis (PCA). We show that while PCA performed on raw data is biased towards highly expressed genes, this bias can be mitigated with a cell aggregation step, allowing the recovery of denoised expression values for both highly and lowly expressed genes. We benchmark our resulting ENHANCE algorithm and three previously described methods on simulated data that closely mimic real datasets, showing that ENHANCE provides the best overall denoising accuracy, recovering modules of co-expressed genes and cell subpopulations. Implementations of our algorithm are available at https://github.com/yanailab/enhance.
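The aggregate-then-PCA idea can be sketched minimally. This is not the full ENHANCE algorithm (which includes additional normalization and component-selection steps); it is a toy on invented Poisson counts showing why neighborhood averaging before PCA helps.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(5)

# Toy counts: 200 cells x 50 genes, two cell types, Poisson technical noise.
true = np.ones((200, 50))
true[:100, :10] = 8.0     # markers of cell type A
true[100:, 10:20] = 8.0   # markers of cell type B
counts = rng.poisson(true).astype(float)

def aggregate_pca_denoise(X, n_neighbors=10, k=2):
    """Average each cell with its nearest neighbors to tame Poisson noise,
    then keep only the top k principal components of the aggregated matrix."""
    _, idx = NearestNeighbors(n_neighbors=n_neighbors).fit(X).kneighbors(X)
    agg = X[idx].mean(axis=1)               # neighborhood-averaged profiles
    pca = PCA(n_components=k).fit(agg)
    return pca.inverse_transform(pca.transform(agg))

denoised = aggregate_pca_denoise(counts)
```

Aggregation shrinks the Poisson variance before PCA sees the data, which is what lets lowly expressed genes survive the low-rank truncation.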


2010 ◽  
Vol 08 (06) ◽  
pp. 995-1011 ◽  
Author(s):  
HAO ZHENG ◽  
HONGWEI WU

Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. The assembly of metagenomic DNA fragments is much like the overlap-layout-consensus procedure for assembling isolated genomes, but is augmented by an additional binning step that sorts scaffolds, contigs and unassembled reads into taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bps) metagenomic fragments. Principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers implemented with linear discriminant analysis. These local classifiers bin prokaryotic DNA fragments into superkingdoms, fragments of the same superkingdom into phyla, those of the same phylum into genera, and those of the same genus into species, respectively. We evaluated the performance of PCAHIER using our own simulated data sets as well as the widely used simHC synthetic metagenome data set from the IMG/M system. The effectiveness of PCAHIER was demonstrated through comparisons against a non-hierarchical classifier and two existing binning algorithms (TETRA and Phylopythia).
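One layer of the pipeline (n-mer frequencies, PCA reduction, an LDA local classifier) can be sketched end to end. The two synthetic "taxa" below differ only in base composition and are an invented stand-in for real prokaryotic fragments; the full PCAHIER stacks four such classifiers hierarchically.

```python
import numpy as np
from itertools import product
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)

KMERS = ["".join(p) for p in product("ACGT", repeat=2)]   # 16 dinucleotides

def kmer_freqs(seq, k=2):
    """Overlapping k-mer frequency vector: the n-mer feature set."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    v = np.array([counts[m] for m in KMERS], dtype=float)
    return v / v.sum()

def fragment(base_probs):
    return "".join(rng.choice(list("ACGT"), size=500, p=base_probs))

# Two synthetic "taxa" with different base compositions
gc_rich = [fragment([0.15, 0.35, 0.35, 0.15]) for _ in range(20)]
at_rich = [fragment([0.35, 0.15, 0.15, 0.35]) for _ in range(20)]

features = np.array([kmer_freqs(s) for s in gc_rich + at_rich])
labels = np.array([0] * 20 + [1] * 20)

reduced = PCA(n_components=4).fit_transform(features)     # dimensionality cut
clf = LinearDiscriminantAnalysis().fit(reduced, labels)   # one local classifier
accuracy = clf.score(reduced, labels)
```

In the real system the same pattern repeats at each taxonomic level, with each local classifier trained only on fragments from its parent group.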


2019 ◽  
Vol 11 (14) ◽  
pp. 246 ◽  
Author(s):  
B. R. A. Moreira ◽  
R. S. Viana ◽  
L. A. M. Lisboa ◽  
P. R. M. Lopes ◽  
P. A. M. Figueiredo ◽  
...  

The biggest challenge facing sugar-energy plants is to move towards the biorefinery concept without threatening the environment or health. Energy cane is the state of the art among smart energy crops, providing suitable whole-plant raw material to produce upgraded biofuels, dehydrated alcohol for transportation, refined sugar, yeast-fermented alcoholic beverages, soft drinks, silage and high-quality fodder, as well as to cogenerate heat and bioelectricity from burnt lignocellulose. Accordingly, we present the fuzzy c-means (FCM) clustering algorithm coupled with principal component analysis (PCA) as a powerful exploratory data analysis tool to classify hybrids of energy cane for the production of first-generation ethanol and the cogeneration of heat and bioelectricity. In the orthogonally-rotated factorial map, fuzzy cluster I aggregated the hybrids VX12-0277, VX12-1191, VX12-1356 and VX12-1658, which showed higher contents of soluble solids and sucrose and larger productive yields of fermentable sugars. These parameters correlated with the X-axis component, referring to the technological quality of cane juice. Fuzzy cluster III aggregated the hybrids VX12-0180 and VX12-1022, which showed higher fiber content. This parameter correlated with the Y-axis component, referring to the physicochemical quality of lignocellulose. From the PCA-FCM methodology, we therefore conclude that the hybrids in fuzzy cluster I are type I energy cane (higher sucrose-to-fiber ratio) and could serve as energy supply pathways to produce bioethanol, while the hybrids in fuzzy cluster III are type II energy cane (lower sucrose-to-fiber ratio), showing potential as higher-fiber biomass sources to feed the cogeneration of heat and bioelectricity in high-temperature, high-pressure furnace-boiler systems.
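The PCA-FCM combination can be sketched with a textbook fuzzy c-means implementation. The two-trait "hybrid" data below (sucrose vs fiber) are invented to mimic the type I / type II split; this is a minimal sketch, not the authors' software or real measurements.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

def fuzzy_cmeans(X, c, m=2.0, n_iter=100):
    """Plain fuzzy c-means: alternate soft-membership and centroid updates
    (a minimal textbook implementation)."""
    n = len(X)
    U = rng.dirichlet(np.ones(c), size=n)          # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# Hypothetical hybrid traits [sucrose, fiber]: a high-sucrose group (type I)
# and a high-fiber group (type II), as in the paper's clusters I and III.
type_i  = rng.normal([18.0, 10.0], 1.0, size=(10, 2))
type_ii = rng.normal([10.0, 20.0], 1.0, size=(10, 2))
X = np.vstack([type_i, type_ii])

scores = PCA(n_components=2).fit_transform(X)      # the factorial map
U, _ = fuzzy_cmeans(scores, c=2)
groups = U.argmax(axis=1)                          # hard labels from soft memberships
```

The membership matrix `U` also quantifies borderline hybrids, which a hard clustering would hide.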


2021 ◽  
Vol 30 (1) ◽ 
pp. 177-186
Author(s):  
Silviu Cornel Virgil Chiriac

The current paper is part of a wider study that aims to identify the determining factors of the performance of entities in the real estate field and to construct a composite index of company performance, based on a sample of 29 companies listed on the BVB (Bucharest Stock Exchange) in 2019, using principal component analysis, one of the multidimensional data analysis techniques. Descriptive analysis and principal component analysis for constructing the composite performance index were applied in the study in order to highlight the most important companies from the point of view of financial performance. The descriptive analysis of the data set provides an overview of the companies selected for analysis. The study aims to build a synthetic indicator of the financial performance of the selected companies based on 9 financial indicators, using principal component analysis (PCA). The 9 indicators considered for the analysis were selected based on specialised articles; they are: ROA (return on assets), which reflects the company's capacity to use its assets productively; ROE (return on equity), which measures the efficiency of use of the stockholders' capital; total asset turnover; general liquidity ratio; general solvency ratio; general debt-to-equity ratio; net profit margin; and gross portfolio return.
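Building a PCA-based composite index is a standard recipe: standardize the indicators, take the first principal component's scores, and fix its arbitrary sign. The sketch below uses invented data for 29 firms and 9 indicators; it illustrates the construction, not the study's actual results.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)

# Hypothetical data: 29 firms x 9 financial ratios, all loading on one
# latent "performance" factor plus idiosyncratic noise.
latent = rng.normal(size=29)
loadings = rng.uniform(0.5, 1.0, size=9)
X = latent[:, None] * loadings[None, :] + 0.3 * rng.normal(size=(29, 9))

Z = StandardScaler().fit_transform(X)      # PCA on correlations, not covariances
pca = PCA(n_components=1).fit(Z)
weights = pca.components_[0]
if weights.sum() < 0:                      # a PC's sign is arbitrary; orient it
    weights = -weights                     # so higher index = better performance
index = Z @ weights                        # the composite performance index
ranking = np.argsort(index)[::-1]          # best-performing firms first
```

The first component's share of explained variance tells you how well a single synthetic indicator can summarize the 9 ratios.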

