Comparative Study of Dimensionality Reduction Techniques for Spectral–Temporal Data

Information ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 1
Author(s):  
Shingchern D. You ◽  
Ming-Jen Hung

This paper studies the use of three different approaches to reduce the dimensionality of a type of spectral–temporal features, called Moving Picture Experts Group (MPEG)-7 audio signature descriptors (ASD). The studied approaches include principal component analysis (PCA), independent component analysis (ICA), and factor analysis (FA). These approaches are applied to ASD features obtained from audio items with or without distortion. The resulting low-dimensional features are used as queries against a dataset of low-dimensional features extracted from undistorted items, which allows the distortion resistance of each approach to be investigated. The experimental results show that features obtained with the ICA or FA reduction approaches yield higher identification accuracy than the PCA approach for moderately distorted items. Therefore, when extracting features from distorted items, ICA and FA should be considered in addition to PCA.
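As a rough illustration of how such a comparison can be set up (a minimal sketch with synthetic placeholder features, not the authors' MPEG-7 ASD pipeline), the three reductions can be fitted on undistorted reference features and distorted queries matched by nearest neighbour in the reduced space:

```python
# Minimal sketch: PCA, ICA, and FA reductions fitted on undistorted reference
# features, with mildly distorted copies used as queries. Feature matrices are
# synthetic placeholders, not actual MPEG-7 ASD descriptors.
import numpy as np
from sklearn.decomposition import PCA, FastICA, FactorAnalysis
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 128))                        # undistorted item features
queries = reference + 0.1 * rng.normal(size=reference.shape)   # mildly distorted copies

for name, reducer in [("PCA", PCA(n_components=20)),
                      ("ICA", FastICA(n_components=20, max_iter=1000)),
                      ("FA", FactorAnalysis(n_components=20))]:
    low_ref = reducer.fit_transform(reference)   # fit on undistorted features
    low_qry = reducer.transform(queries)         # project distorted queries
    nn = NearestNeighbors(n_neighbors=1).fit(low_ref)
    _, idx = nn.kneighbors(low_qry)
    accuracy = np.mean(idx.ravel() == np.arange(len(queries)))
    print(f"{name}: identification accuracy = {accuracy:.3f}")
```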

2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of two dimensionality reduction techniques, t-distributed stochastic neighbour embedding and probabilistic principal component analysis, in the context of clustering. High-dimensional data have been reduced to low-dimensional data using each of these techniques, and cluster analysis has been performed on the high-dimensional data as well as on the two reduced datasets, with varying numbers of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to reduce the high-dimensional data with probabilistic principal component analysis is higher than the time taken with t-distributed stochastic neighbour embedding, whereas the storage space required by the dataset reduced through probabilistic principal component analysis is less than that required by the dataset reduced through t-distributed stochastic neighbour embedding.
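A minimal sketch of this kind of comparison on synthetic data is given below; scikit-learn's TSNE class stands in for t-distributed stochastic neighbour embedding and its PCA class is used as a stand-in for probabilistic principal component analysis, with reduction time and storage recorded for each:

```python
# Sketch only: reduce synthetic high-dimensional data with two techniques,
# cluster the reduced data with k-means, and record time and storage.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
high_dim = rng.normal(size=(1000, 50))           # placeholder high-dimensional data

for name, reducer in [("t-SNE", TSNE(n_components=2, init="pca", random_state=0)),
                      ("PPCA (PCA stand-in)", PCA(n_components=2))]:
    start = time.perf_counter()
    low_dim = reducer.fit_transform(high_dim)
    elapsed = time.perf_counter() - start
    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(low_dim)
    print(f"{name}: reduction time {elapsed:.2f}s, "
          f"storage {low_dim.nbytes} bytes, clusters found {len(set(labels))}")
```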


2021 ◽  
Vol 54 (4) ◽  
pp. 1-34
Author(s):  
Felipe L. Gewers ◽  
Gustavo R. Ferreira ◽  
Henrique F. De Arruda ◽  
Filipi N. Silva ◽  
Cesar H. Comin ◽  
...  

Principal component analysis (PCA) is often applied for analyzing data in the most diverse areas. This work reports, in an accessible and integrated manner, several theoretical and practical aspects of PCA. The basic principles underlying PCA, data standardization, possible visualizations of the PCA results, and outlier detection are addressed in turn. Next, the potential of using PCA for dimensionality reduction is illustrated on several real-world datasets. Finally, we summarize PCA-related approaches and other dimensionality reduction techniques. All in all, the objective of this work is to assist researchers from the most diverse areas in using and interpreting PCA.
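A compact sketch of the practical workflow the survey covers, standardization, projection, explained variance, and a simple distance-based outlier check, might look as follows (synthetic data and an illustrative threshold only):

```python
# Sketch of a basic PCA workflow: standardize, project, inspect explained
# variance, and flag points far from the origin in the projected space.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                   # placeholder data matrix

X_std = StandardScaler().fit_transform(X)        # zero mean, unit variance per feature
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)

print("explained variance ratio:", pca.explained_variance_ratio_)
dist = np.linalg.norm(scores, axis=1)            # distance from origin in PC space
outliers = np.where(dist > dist.mean() + 3 * dist.std())[0]   # illustrative rule
print("candidate outliers:", outliers)
```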


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Alexandra-Maria Tăuţan ◽  
Alessandro C. Rossi ◽  
Ruben de Francisco ◽  
Bogdan Ionescu

Methods developed for automatic sleep stage detection make use of large amounts of data in the form of polysomnographic (PSG) recordings to build predictive models. In this study, we investigate the effect of several dimensionality reduction techniques, i.e., principal component analysis (PCA), factor analysis (FA), and autoencoders (AE), on common classifiers, e.g., random forests (RF), multilayer perceptron (MLP), and long short-term memory (LSTM) networks, for automated sleep stage detection. Experimental testing is carried out on the MGH Dataset provided in the “You Snooze, You Win: The PhysioNet/Computing in Cardiology Challenge 2018”. The signals used as input are the six available electroencephalographic (EEG) channels and their combinations with the other PSG signals provided: ECG (electrocardiogram), EMG (electromyogram), and respiration-based signals (respiratory effort and airflow). We observe that similar or improved accuracy is obtained in most cases when using any of the dimensionality reduction techniques, a promising result as it reduces the computational load while maintaining, and in some cases improving, the accuracy of automated sleep stage detection. In our study, using autoencoders for dimensionality reduction maintains the performance of the model, while PCA and FA improve the accuracy of the models in most cases.
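The general pattern, reducing epoch-level features before feeding a classifier, can be sketched as below; the feature matrix, labels, and component counts are placeholders, not the actual PSG features or the authors' configuration:

```python
# Sketch: apply PCA or FA before a random forest / MLP sleep-stage classifier.
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 120))                 # placeholder epoch-level features
y = rng.integers(0, 5, size=2000)                # 5 sleep stages (dummy labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for red_name, reducer in [("PCA", PCA(n_components=30)),
                          ("FA", FactorAnalysis(n_components=30))]:
    for clf_name, clf in [("RF", RandomForestClassifier(random_state=0)),
                          ("MLP", MLPClassifier(max_iter=500, random_state=0))]:
        model = make_pipeline(reducer, clf).fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        print(f"{red_name} + {clf_name}: accuracy = {acc:.3f}")
```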


2020 ◽  
Author(s):  
Gregory Kiar ◽  
Yohan Chatelain ◽  
Ali Salari ◽  
Alan C. Evans ◽  
Tristan Glatard

Machine learning models are commonly applied to human brain imaging datasets in an effort to associate function or structure with behaviour, health, or other individual phenotypes. Such models often rely on low-dimensional maps generated by complex processing pipelines. However, the numerical instabilities inherent to pipelines limit the fidelity of these maps and introduce computational bias. Monte Carlo Arithmetic, a technique for introducing controlled amounts of numerical noise, was used to perturb a structural connectome estimation pipeline, ultimately producing a range of plausible networks for each sample. The variability in the perturbed networks was captured in an augmented dataset, which was then used for an age classification task. We found that resampling brain networks across a series of such numerically perturbed outcomes led to improved performance in all tested classifiers, preprocessing strategies, and dimensionality reduction techniques. Importantly, we find that this benefit does not hinge on a large number of perturbations, suggesting that even minimally perturbing a dataset adds meaningful variance which can be captured in the subsequently designed models.
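Conceptually, the augmentation step can be sketched as below; real Monte Carlo Arithmetic perturbs the pipeline's floating-point operations themselves, so the small random jitter here is only a stand-in used to show how perturbed copies of each training sample enlarge the dataset:

```python
# Conceptual sketch: augment a training set with perturbed copies of each
# sample (jitter standing in for pipeline-level numerical perturbation) and
# compare a classifier trained with and without the augmentation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))                  # placeholder connectome features
y = rng.integers(0, 2, size=200)                 # binary age group (dummy labels)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_perturbations = 5
X_aug = np.vstack([X_tr] + [X_tr + 1e-3 * rng.normal(size=X_tr.shape)
                            for _ in range(n_perturbations)])
y_aug = np.tile(y_tr, n_perturbations + 1)

baseline = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
augmented = RandomForestClassifier(random_state=0).fit(X_aug, y_aug)
print("baseline accuracy: ", accuracy_score(y_te, baseline.predict(X_te)))
print("augmented accuracy:", accuracy_score(y_te, augmented.predict(X_te)))
```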


2014 ◽  
Vol 578-579 ◽  
pp. 1020-1023
Author(s):  
Jing Zhou Lu ◽  
Jia Chen Wang ◽  
Xu Zhu

In this paper, we introduce a set of techniques for time series analysis based on principal component analysis (PCA). Firstly, an autoregressive (AR) model is established using acceleration response data, and the root mean squared error (RMSE) of the AR model is calculated based on PCA. Then a new damage-sensitive feature (DSF) based on the AR coefficients is presented. To test the efficacy of the damage detection and localization methodologies, the algorithm has been applied to analytical and experimental results for the three-story frame structure model of the Los Alamos National Laboratory. The damage detection results indicate that the algorithm is able to identify and localize minor to severe damage as defined for the structure, and that the suggested method requires less computing time while achieving high suitability and identification accuracy.
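A simplified sketch of the feature-extraction step (leaving out the PCA-based localization) is given below: an AR model is fitted to an acceleration response and the RMSE of its residuals is used as a damage-sensitive quantity; the signals are synthetic placeholders:

```python
# Sketch: AR-model residual RMSE as a simple damage-sensitive quantity,
# compared between a baseline and a (noisier) "damaged" response.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def ar_rmse(signal, order=10):
    """Fit an AR(order) model and return the RMSE of its residuals."""
    result = AutoReg(signal, lags=order).fit()
    return np.sqrt(np.mean(result.resid ** 2))

rng = np.random.default_rng(0)
baseline = np.sin(0.1 * np.arange(2000)) + 0.05 * rng.normal(size=2000)
damaged = np.sin(0.1 * np.arange(2000)) + 0.20 * rng.normal(size=2000)

print("baseline RMSE:", ar_rmse(baseline))
print("damaged RMSE: ", ar_rmse(damaged))
```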


Animals ◽  
2020 ◽  
Vol 10 (8) ◽  
pp. 1406
Author(s):  
Ronald M. Parra-Hernández ◽  
Jorge I. Posada-Quintero ◽  
Orlando Acevedo-Charry ◽  
Hugo F. Posada-Quintero

Vocalizations from birds are a fruitful source of information for the classification of species. However, currently used analyses are ineffective for determining the taxonomic status of some groups. To provide a clearer grouping of taxa for such bird species from the analysis of vocalizations, more sensitive techniques are required. In this study, we have evaluated the sensitivity of the Uniform Manifold Approximation and Projection (UMAP) technique for grouping the vocalizations of individuals of the Rough-legged Tyrannulet Phyllomyias burmeisteri complex. Although the existence of two taxonomic groups has been suggested by some studies, the species has presented taxonomic difficulties in classification. UMAP exhibited a clearer separation of groups than previously used dimensionality-reduction techniques (i.e., principal component analysis), as it was able to effectively identify the two taxa. The results achieved with UMAP in this study suggest that the technique can be useful for analyzing species with complex taxonomy through vocalization data, as a complementary tool that incorporates behavioral traits such as acoustic communication.
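A minimal sketch of such an embedding, using the umap-learn package on placeholder acoustic features rather than the actual vocalization measurements, is:

```python
# Sketch: project acoustic feature vectors into two dimensions with UMAP and,
# for comparison, with PCA.
import numpy as np
import umap  # pip install umap-learn
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(150, 40))            # placeholder vocalization features

embedding_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
embedding_pca = PCA(n_components=2).fit_transform(features)
print("UMAP embedding shape:", embedding_umap.shape)
print("PCA embedding shape: ", embedding_pca.shape)
```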


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Ireneusz Czarnowski ◽  
Piotr Jędrzejowicz

In the paper, several data reduction techniques for machine learning from big datasets are discussed and evaluated. The discussed approach focuses on combining several techniques, including stacking, rotation, and data reduction, aimed at improving the performance of machine classification. Stacking is used to take advantage of multiple classification models. The rotation-based techniques are used to increase the heterogeneity of the stacking ensembles. Data reduction makes it possible to classify instances belonging to big datasets. We propose to use an agent-based population learning algorithm for data reduction in the feature and instance dimensions. To diversify the classifier ensembles, either principal component analysis or, alternatively, independent component analysis is used within the rotation step. The research question addressed in the paper is formulated as follows: can the performance of a classifier using the reduced dataset be improved by integrating the data reduction mechanism with the rotation-based technique and stacking?
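One way to sketch the stacking-with-rotation idea (without the agent-based population learning algorithm for data reduction) is a scikit-learn stacking ensemble whose base learners see PCA- or ICA-rotated views of an already reduced dataset; the data here are synthetic:

```python
# Sketch: stacking ensemble with PCA- and ICA-rotated base learners and a
# logistic-regression meta-learner.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 40))                  # stand-in for a reduced big dataset
y = rng.integers(0, 2, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

estimators = [
    ("pca_tree", make_pipeline(PCA(n_components=10),
                               DecisionTreeClassifier(random_state=0))),
    ("ica_tree", make_pipeline(FastICA(n_components=10, max_iter=1000),
                               DecisionTreeClassifier(random_state=0))),
]
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression()).fit(X_tr, y_tr)
print("stacked accuracy:", accuracy_score(y_te, stack.predict(X_te)))
```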


2011 ◽  
Vol 341-342 ◽  
pp. 790-797 ◽  
Author(s):  
Zhi Yan Xiang ◽  
Tie Yong Cao ◽  
Peng Zhang ◽  
Tao Zhu ◽  
Jing Feng Pan

In this paper, an object tracking approach is introduced for color video sequences. The approach integrates color distributions and probabilistic principal component analysis (PPCA) into a particle filtering framework. Color distributions are robust to partial occlusion, are rotation and scale invariant, and can be calculated efficiently. Principal component analysis (PCA) is used to update the eigenbasis and the mean, which reflect the appearance changes of the tracked object, and the low-dimensional subspace representation of PPCA efficiently adapts to these changes in the appearance of the target object. At the same time, a forgetting factor is incorporated into the updating process, which can be used to economize on processing time and enhance the efficiency of object tracking. Computer simulation experiments demonstrate the effectiveness and robustness of the proposed tracking algorithm when the target object undergoes pose and scale changes, occlusion, and complex backgrounds.
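A toy sketch of the appearance-model update only (not the full particle-filter tracker, and with a weighting scheme chosen purely for illustration) is given below: a buffer of vectorized object patches is weighted by a forgetting factor, and the mean and eigenbasis are recomputed from the weighted data:

```python
# Toy sketch: forgetting-factor-weighted mean and eigenbasis of recent patches.
import numpy as np

def update_subspace(patches, n_components=5, forgetting=0.95):
    """patches: (n_frames, n_pixels), newest last. Returns (mean, eigenbasis)."""
    n = len(patches)
    weights = forgetting ** np.arange(n - 1, -1, -1)     # newest frame weighted 1.0
    weights = weights / weights.sum()
    mean = weights @ patches
    centered = patches - mean
    cov = (centered * weights[:, None]).T @ centered     # weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, ::-1][:, :n_components]           # top components
    return mean, basis

rng = np.random.default_rng(0)
patch_buffer = rng.normal(size=(30, 32 * 32))            # 30 frames of 32x32 patches
mean, basis = update_subspace(patch_buffer)
coeffs = (patch_buffer[-1] - mean) @ basis               # low-dimensional representation
print("subspace coefficients of latest patch:", coeffs)
```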


2014 ◽  
Vol 571-572 ◽  
pp. 753-756
Author(s):  
Wei Li Li ◽  
Xiao Qing Yin ◽  
Bin Wang ◽  
Mao Jun Zhang ◽  
Ke Tan

Denoising is an important issue for laser active images. This paper attempts to process laser active images in a low-dimensional subspace. We adopted the principal component analysis with local pixel grouping (LPG-PCA) denoising method proposed by Zhang [1] and compared it with conventional denoising methods for laser active images, such as wavelet filtering, wavelet soft-threshold filtering, and median filtering. Experimental results show that the image denoised by LPG-PCA has a higher BIQI value than the other images; most of the speckle noise is removed and the detailed structure information is well preserved. The low-dimensional subspace idea is a new direction for laser active image denoising.
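A greatly simplified, single-block illustration of the LPG-PCA idea (not the full two-stage method of Zhang [1]) is sketched below: the most similar blocks in a local window are grouped, PCA is applied to the group, weak components are discarded, and the block is reconstructed; all sizes and thresholds are illustrative:

```python
# Sketch: denoise one image block via local pixel grouping and PCA truncation.
import numpy as np
from sklearn.decomposition import PCA

def denoise_block(image, row, col, block=5, window=15, n_keep=4, n_similar=20):
    half_w = window // 2
    target = image[row:row + block, col:col + block].ravel()
    candidates = []
    for r in range(max(0, row - half_w), min(image.shape[0] - block, row + half_w)):
        for c in range(max(0, col - half_w), min(image.shape[1] - block, col + half_w)):
            candidates.append(image[r:r + block, c:c + block].ravel())
    candidates = np.array(candidates)
    dists = np.linalg.norm(candidates - target, axis=1)
    group = candidates[np.argsort(dists)[:n_similar]]     # local pixel grouping
    pca = PCA(n_components=n_keep).fit(group)
    denoised_group = pca.inverse_transform(pca.transform(group))
    return denoised_group[0].reshape(block, block)        # closest block is the target

rng = np.random.default_rng(0)
noisy = np.clip(rng.normal(0.5, 0.1, size=(64, 64)), 0, 1)  # placeholder noisy image
print(denoise_block(noisy, 30, 30))
```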

