On Simultaneous Data-Based Dimension Reduction and Hidden Phase Identification

2008 ◽  
Vol 65 (6) ◽  
pp. 1941-1954 ◽  
Author(s):  
Illia Horenko

Abstract A problem of simultaneous dimension reduction and identification of hidden attractive manifolds in multidimensional data with noise is considered. The problem is approached in two consecutive steps: (i) embedding the original data in a sufficiently high-dimensional extended space in a way proposed by Takens in his embedding theorem, followed by (ii) a minimization of the residual functional. The residual functional is constructed to measure the distance between the original data in extended space and their reconstruction based on a low-dimensional description. The reduced representation of the analyzed data results from projection onto a fixed number of unknown low-dimensional manifolds. Two specific forms of the residual functional are proposed, defining two different types of essential coordinates: (i) localized essential orthogonal functions (EOFs) and (ii) localized functions called principal original components (POCs). The application of the framework is exemplified both on a Lorenz attractor model with measurement noise and on historical air temperature data. It is demonstrated how the new method can be used for the elimination of noise and identification of the seasonal low-frequency components in meteorological data. An application of the proposed POCs to the construction of low-dimensional predictive models is presented.
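The Takens-style embedding in step (i) can be sketched as follows; the lag, embedding dimension, and the toy noisy signal are illustrative assumptions, not values from the paper:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Embed a scalar time series into `dim`-dimensional delay coordinates
    with lag `tau` (Takens-style embedding)."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# toy signal: noisy sine standing in for a measured observable
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
X = delay_embed(x, dim=3, tau=25)   # (1950, 3) point cloud in extended space
```

The residual minimization of step (ii) would then operate on the rows of `X`, comparing them against their reconstruction from a low-dimensional projection.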

2013 ◽  
Vol 380-384 ◽  
pp. 4035-4038 ◽  
Author(s):  
Nan Yao ◽  
Feng Qian ◽  
Zuo Lei Sun

Dimensionality reduction (DR) of image features plays an important role in image retrieval and classification tasks. Recently, two types of methods have been proposed to improve both the accuracy and efficiency of dimensionality reduction. One uses non-negative matrix factorization (NMF) to describe the image distribution in the space of the basis matrix. The other trains a subspace projection matrix with a deep architecture to project the original data space into low-dimensional subspaces, so that low-dimensional codes can be learned. In parallel, graph-based similarity learning algorithms, which exploit contextual information to improve the effectiveness of image rankings, have been proposed for image classification and retrieval. In this paper, after the two methods mentioned above are applied to reduce the high-dimensional image features respectively, we learn the graph-based similarity for the image classification problem. This paper compares the proposed approach with other approaches on an image database.
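A minimal sketch of the NMF route, using standard Lee-Seung multiplicative updates on a random non-negative matrix; the update rule, rank, and data here are generic choices, not the authors' configuration:

```python
import numpy as np

def nmf(V, r, iters=200, seed=0):
    """Basic NMF via multiplicative updates: V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 1e-3
    H = rng.random((r, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update codes
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update basis matrix
    return W, H

V = np.abs(np.random.default_rng(1).random((50, 30)))   # 50 "image features"
W, H = nmf(V, r=5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)     # relative fit error
```

Each column of `H` is a low-dimensional code for the corresponding column of `V` in the space spanned by the basis matrix `W`.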


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Chuanlei Zhang ◽  
Jiangtao Liu ◽  
Wei Chen ◽  
Jinyuan Shi ◽  
Minda Yao ◽  
...  

The unsupervised anomaly detection task based on high-dimensional or multidimensional data occupies a very important position in the field of machine learning and industrial applications; in network security in particular, the anomaly detection of network data is especially important. The key to anomaly detection is density estimation. Although methods of dimension reduction and density estimation have made great progress in recent years, most dimension reduction methods struggle to retain the key information of the original data or multidimensional data. Recent studies have shown that the deep autoencoder (DAE) can solve this problem well. In order to improve the performance of unsupervised anomaly detection, we propose an anomaly detection scheme based on a deep autoencoder (DAE) and clustering methods. The deep autoencoder is trained to learn a compressed representation of the input data, which is then fed to a clustering approach. This scheme makes full use of the advantages of the DAE to generate low-dimensional representations and reconstruction errors for the input high-dimensional or multidimensional data and uses them to reconstruct the input samples. The proposed scheme can eliminate redundant information contained in the data, improve the performance of clustering methods in identifying abnormal samples, and reduce the amount of calculation. To verify the effectiveness of the proposed scheme, extensive comparison experiments have been conducted with traditional dimension reduction algorithms and clustering methods. The results demonstrate that, in most cases, the proposed scheme outperforms the traditional dimension reduction algorithms combined with different clustering methods.
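The pipeline can be illustrated with a linear autoencoder computed in closed form via SVD standing in for the trained DAE; the synthetic data, code dimension, and flagging rule below are assumptions for illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
# inliers lie near a 2-D subspace of 10-D space; anomalies do not
inliers = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 10))
outliers = rng.normal(size=(10, 10)) * 3.0
X = np.vstack([inliers, outliers])

# linear encoder/decoder via SVD -- a closed-form stand-in for the DAE
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Z = Xc @ Vt[:k].T                           # low-dimensional codes
recon = Z @ Vt[:k]                          # reconstruction from the codes
err = np.linalg.norm(Xc - recon, axis=1)    # per-sample reconstruction error

# codes and errors would feed a clustering step; here a simple
# percentile threshold on the error flags candidate anomalies
flagged = err > np.percentile(err, 95)
```

The key point mirrored from the abstract: both the low-dimensional representation `Z` and the reconstruction error `err` carry information, and off-manifold samples stand out in the latter.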


2021 ◽  
Vol 13 (6) ◽  
pp. 1098
Author(s):  
Egor Prikaziuk ◽  
Peiqi Yang ◽  
Christiaan van der Tol

In this study, we demonstrate that the Google Earth Engine (GEE) dataset of Sentinel-3 Ocean and Land Color Instrument (OLCI) level-1 deviates from the original Copernicus Open Access Data Hub Service (DHUS) data by 10–20 W m−2 sr−1 μm−1 per pixel per band. We compared GEE and DHUS single pixel time series for the period from April 2016 to September 2020 and identified two sources of this discrepancy: the ground pixel position and reprojection. The ground pixel position of the OLCI product can be determined in two ways: from geo-coordinates (DHUS) or from tie-point coordinates (GEE). We recommend using geo-coordinates for pixel extraction from the original data. When the Sentinel Application Platform (SNAP) Pixel Extraction Tool is used, an additional distance check has to be conducted to exclude pixels that lie further than 212 m from the point of interest. Even geo-coordinates-based pixel extraction requires the homogeneity of the target area at a 700 m diameter (49 ha) footprint (double the pixel resolution). The GEE OLCI dataset can be safely used if the homogeneity assumption holds at 2700 m diameter (9-by-9 OLCI pixels) or if an uncertainty in the radiance of 10% is not critical for the application. Further analysis showed that the scaling factors reported in the GEE dataset description must not be used. Finally, observation geometry and meteorological data are not present in the GEE OLCI dataset, but they are crucial for most applications. Therefore, we propose to calculate angles and extraterrestrial solar fluxes and to use an alternative data source, the Copernicus Atmosphere Monitoring Service (CAMS) dataset, for meteorological data.
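The recommended 212 m distance check can be sketched with a plain haversine distance; the coordinates below are hypothetical:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def keep_pixel(poi, pixel, max_dist_m=212.0):
    """Distance check to apply after SNAP Pixel Extraction Tool output:
    drop extracted pixels lying further than max_dist_m from the point."""
    return haversine_m(*poi, *pixel) <= max_dist_m

poi = (52.0, 4.0)                 # hypothetical point of interest
near = keep_pixel(poi, (52.0, 4.0022))   # ~150 m away -> kept
far = keep_pixel(poi, (52.0, 4.0044))    # ~300 m away -> excluded
```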


2020 ◽  
Vol 49 (3) ◽  
pp. 421-437
Author(s):  
Genggeng Liu ◽  
Lin Xie ◽  
Chi-Hua Chen

Dimensionality reduction plays an important role in the data processing of machine learning and data mining, making the processing of high-dimensional data more efficient. Dimensionality reduction can extract a low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only extract most of the useful information of the original data, but also remove useless noise. Dimensionality reduction methods can be applied to all types of data, especially image data. Although supervised learning methods have achieved good results in dimensionality reduction, their performance depends on the number of labeled training samples. With the growth of information from the internet, labeling data requires more resources and becomes more difficult. Therefore, using unsupervised learning to learn the features of data has extremely important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied on text data, so that the mapping from high-dimensional features to low-dimensional features becomes efficient and the low-dimensional features retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the dimensionality reduction results of the variational auto-encoder (VAE), and the method shows significant improvement over the other comparison methods.
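Two core pieces of any VAE, the reparameterization trick and the KL term of the objective, can be sketched as follows; the shapes and zero-valued parameters are illustrative, not the paper's multilayered architecture:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """VAE reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
    which keeps sampling differentiable w.r.t. mu and log_var."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """Per-sample KL(q(z|x) || N(0, I)) term of the VAE objective."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var, rng)       # low-dimensional latent codes
kl = kl_divergence(mu, log_var)            # zero when q matches the prior
```

In a full model, `mu` and `log_var` are produced by the encoder network, `z` feeds the decoder, and the loss adds the reconstruction error to `kl`.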


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Taylor Miller ◽  
Keval Patel ◽  
Coralis Rodriguez ◽  
Eric V. Stabb ◽  
Stephen J. Hagen

Abstract Many pheromone sensing bacteria produce and detect more than one chemically distinct signal, or autoinducer. The pathways that detect these signals are typically noisy and interlocked through crosstalk and feedback. As a result, the sensing response of individual cells is described by statistical distributions that change under different combinations of signal inputs. Here we examine how signal crosstalk reshapes this response. We measure how combinations of two homoserine lactone (HSL) input signals alter the statistical distributions of individual cell responses in the AinS/R- and LuxI/R-controlled branches of the Vibrio fischeri bioluminescence pathway. We find that, while the distributions of pathway activation in individual cells vary in complex fashion with environmental conditions, these changes have a low-dimensional representation. For both the AinS/R and LuxI/R branches, the distribution of individual cell responses to mixtures of the two HSLs is effectively one-dimensional, so that a single tuning parameter can capture the full range of variability in the distributions. Combinations of crosstalking HSL signals extend the range of responses for each branch of the circuit, so that signals in combination allow population-wide distributions that are not available under a single HSL input. Dimension reduction also simplifies the problem of identifying the HSL conditions to which the pathways and their outputs are most sensitive. A comparison of the maximum sensitivity HSL conditions to actual HSL levels measured during culture growth indicates that the AinS/R and LuxI/R branches lack sensitivity to population density except during the very earliest and latest stages of growth, respectively.


Geophysics ◽  
2009 ◽  
Vol 74 (6) ◽  
pp. V123-V132 ◽  
Author(s):  
Daniel Trad

Although 3D seismic data are being acquired in larger volumes than ever before, the spatial sampling of these volumes is not always adequate for certain seismic processes. This is especially true of marine and land wide-azimuth acquisitions, leading to the development of multidimensional data interpolation techniques. Simultaneous interpolation in all five seismic data dimensions (inline, crossline, offset, azimuth, and frequency) has great utility in predicting missing data with correct amplitude and phase variations. Although there are many techniques that can be implemented in five dimensions, this study focused on sparse Fourier reconstruction. The success of Fourier interpolation methods depends largely on two factors: (1) having efficient Fourier transform operators that permit the use of large multidimensional data windows and (2) constraining the spatial spectrum along dimensions where seismic amplitudes change slowly so that the sparseness and band limitation assumptions remain valid. Fourier reconstruction can be performed when enforcing a sparseness constraint on the 4D spatial spectrum obtained from frequency slices of five-dimensional windows. Binning spatial positions into a fine 4D grid facilitates the use of the FFT, which helps the convergence of the inversion algorithm and improves both the results and computational efficiency. The 5D interpolation can successfully interpolate sparse data, improve AVO analysis, and reduce migration artifacts. Target geometries for optimal interpolation and regularization of land data can be classified in terms of whether they preserve the original data and whether they are designed to achieve surface or subsurface consistency.
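The sparseness-constrained Fourier reconstruction can be illustrated in one dimension with a POCS-style iteration: threshold the spectrum, then re-insert the known samples. The toy signal, decimation rate, and threshold are assumptions for illustration, not a 5D implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
k = np.arange(n)
x = np.cos(2 * np.pi * 5 * k / n) + 0.5 * np.sin(2 * np.pi * 12 * k / n)
mask = rng.random(n) > 0.5           # roughly half the traces are missing
y = x * mask                         # decimated data

est = y.copy()
for _ in range(50):
    spec = np.fft.fft(est)
    keep = np.abs(spec) >= 0.2 * np.abs(spec).max()   # sparseness constraint
    est = np.real(np.fft.ifft(spec * keep))           # band-limited estimate
    est[mask] = y[mask]              # data-consistency: restore known samples

err = np.linalg.norm(est - x) / np.linalg.norm(x)     # relative recovery error
```

The same two ingredients, a fast transform on a regular grid and a sparsity constraint on the spectrum, carry over to the 4D spatial spectrum of binned 5D windows.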


2019 ◽  
Vol 30 (3) ◽  
pp. 559-570
Author(s):  
Jukka Sirén ◽  
Samuel Kaski

Abstract Approximate Bayesian computation (ABC) and other likelihood-free inference methods have gained popularity in the last decade, as they allow rigorous statistical inference for complex models without analytically tractable likelihood functions. A key component for accurate inference with ABC is the choice of summary statistics, which summarize the information in the data, but at the same time should be low-dimensional for efficiency. Several dimension reduction techniques have been introduced to automatically construct informative and low-dimensional summaries from a possibly large pool of candidate summaries. Projection-based methods, which are based on learning simple functional relationships from the summaries to parameters, are widely used and usually perform well, but might fail when the assumptions behind the transformation are not satisfied. We introduce a localization strategy for any projection-based dimension reduction method, in which the transformation is estimated in the neighborhood of the observed data instead of the whole space. Localization strategies have been suggested before, but the performance of the transformed summaries outside the local neighborhood has not been guaranteed. In our localization approach the transformation is validated and optimized over validation datasets, ensuring reliable performance. We demonstrate the improvement in the estimation accuracy for localized versions of linear regression and partial least squares, for three different models of varying complexity.
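The projection-based construction via linear regression, the simplest method the localization strategy applies to, can be sketched as follows; the toy simulator, prior, and raw summaries are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, rng, n=50):
    """Toy model: n draws from N(theta, 1); raw summaries are sample statistics."""
    x = rng.normal(theta, 1.0, size=n)
    return np.array([x.mean(), x.var(), np.median(x), x.max()])

# training set of (parameter, raw-summary) pairs drawn from the prior
thetas = rng.uniform(-3, 3, size=500)
S = np.array([simulate(t, rng) for t in thetas])

# projection step: regress the parameter on the raw summaries;
# the fitted prediction becomes the single low-dimensional summary
A = np.column_stack([np.ones(len(S)), S])
beta, *_ = np.linalg.lstsq(A, thetas, rcond=None)
summary = lambda s: beta[0] + s @ beta[1:]

obs = simulate(1.5, rng)      # "observed" data with true theta = 1.5
est = summary(obs)            # low-dimensional summary, close to 1.5
```

The localization idea of the paper would fit `beta` only on training pairs whose raw summaries fall near `obs`, and validate the transformation on held-out datasets.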


Author(s):  
Akira Imakura ◽  
Momo Matsuda ◽  
Xiucai Ye ◽  
Tetsuya Sakurai

Dimensionality reduction methods that project high-dimensional data to a low-dimensional space by matrix trace optimization are widely used for clustering and classification. The matrix trace optimization problem leads to an eigenvalue problem for a low-dimensional subspace construction, preserving certain properties of the original data. However, most of the existing methods use only a few eigenvectors to construct the low-dimensional space, which may lead to a loss of useful information for achieving successful classification. Herein, to overcome this information loss, we propose a novel complex moment-based supervised eigenmap including multiple eigenvectors for dimensionality reduction. Furthermore, the proposed method provides a general formulation for matrix trace optimization methods to incorporate ridge regression, which models the linear dependency between covariate variables and univariate labels. To reduce the computational complexity, we also propose an efficient and parallel implementation of the proposed method. Numerical experiments indicate that the proposed method is competitive with existing dimensionality reduction methods in recognition performance. Additionally, the proposed method exhibits high parallel efficiency.
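The reduction of a matrix trace optimization to a generalized eigenvalue problem can be illustrated with Fisher-discriminant-style scatter matrices; this is a standard instance, not the authors' complex moment-based method:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
# two labeled classes in 5-D
X0 = rng.normal(loc=0.0, size=(100, 5))
X1 = rng.normal(loc=2.0, size=(100, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# within- and between-class scatter matrices; maximizing the trace ratio
# tr(V^T Sb V) / tr(V^T Sw V) preserves class separation after projection
m = X.mean(axis=0)
Sw = sum(np.cov(X[y == c].T) for c in (0, 1))
Sb = sum(len(X[y == c]) * np.outer(X[y == c].mean(0) - m,
                                   X[y == c].mean(0) - m) for c in (0, 1))

# the trace optimization reduces to the generalized eigenproblem Sb v = lam Sw v
vals, vecs = eigh(Sb, Sw)
V = vecs[:, ::-1][:, :2]     # eigenvectors with the largest eigenvalues
Z = X @ V                    # low-dimensional projection of the data
```

Keeping more columns of `vecs` is the kind of "multiple eigenvectors" extension the abstract argues for, at the cost of solving for more of the spectrum.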


2004 ◽  
Vol 14 (02) ◽  
pp. 653-666 ◽  
Author(s):  
AXEL HUTT

The present work briefly reviews a segmentation method and a modeling approach for multivariate quasi-stationary data. The combination of both parts allows the extraction of low-dimensional models from multidimensional data. The segmentation method is applied to event-related potentials and fields and to early auditory evoked potentials, and extracts ERP- and ERF-components and early auditory waves objectively and independently of the number of segments. Additionally, the early auditory wave Pa is modeled by a two-dimensional system of ordinary differential equations. We find a common topology of wave Pa, which lets us conjecture intrinsic low-dimensional underlying attractors in the corresponding neuronal dynamics.
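A two-dimensional ODE system of the kind mentioned can be sketched with a damped oscillator; the functional form and parameters here are illustrative assumptions, not the fitted model of wave Pa:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, u, gamma=2.0, omega=30.0):
    """Two-dimensional state (x, v): a damped oscillator, a simple
    illustrative instance of a planar ODE model for an evoked wave."""
    x, v = u
    return [v, -2 * gamma * v - omega**2 * x]

sol = solve_ivp(rhs, (0.0, 1.0), [1.0, 0.0], dense_output=True, max_step=1e-3)
t = np.linspace(0, 1, 500)
x = sol.sol(t)[0]       # damped oscillation decaying toward the fixed point
```

Fitting such a system to a measured wave would amount to estimating the two parameters from the segmented component.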


2009 ◽  
Vol 180 (10) ◽  
pp. 2002-2012 ◽  
Author(s):  
Sergei Manzhos ◽  
Koichi Yamashita ◽  
Tucker Carrington
