Manifold Learning for Visualizing and Analyzing High-dimensional Data

Author(s): Junping Zhang, Hua Huang, Jue Wang
Mathematics, 2021, Vol 9 (4), pp. 406
Author(s): Harold A. Hernández-Roig, M. Carmen Aguilera-Morillo, Rosa E. Lillo

This paper introduces stringing via Manifold Learning (ML-stringing), an alternative to the original stringing based on Unidimensional Scaling (UDS). Our proposal is framed within a wider class of methods that map high-dimensional observations to the infinite-dimensional space of functions, allowing the use of Functional Data Analysis (FDA). Stringing handles general high-dimensional data as scrambled realizations of an unknown stochastic process; the essential feature of the method is therefore a rearrangement of the observed values. Motivated by the linear nature of UDS and the increasing number of applications in the biosciences (e.g., functional modeling of gene expression arrays and single nucleotide polymorphisms, or the classification of neuroimages), we aim to recover more complex relations between predictors through ML. Simulation studies show that ML-stringing achieves higher-quality orderings and that, in general, this leads to improvements in the functional representation and modeling of the data. The versatility of our method is also illustrated with an application to a colon cancer study that deals with high-dimensional gene expression arrays. This paper shows that ML-stringing is a feasible alternative to the UDS-based version. It also opens the door to new contributions to the field of FDA and the study of high-dimensional data.
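The rearrangement at the heart of stringing can be sketched with a linear, UDS-like step: place the features on a one-dimensional scale via classical multidimensional scaling of their pairwise distances, then reorder them by that coordinate. This is a minimal numpy illustration, not the paper's implementation; ML-stringing would substitute a nonlinear manifold learner for the MDS step, and the function name `stringing_order` and the toy data are assumptions for the demo.

```python
import numpy as np

def stringing_order(X):
    """Order the p features (columns) of X along a 1-D embedding.

    Uses classical MDS on inter-feature distances -- a linear stand-in
    for UDS. ML-stringing would replace this embedding step with a
    nonlinear manifold learner restricted to one dimension.
    """
    # Squared Euclidean distances between feature columns
    G = X.T @ X
    sq = np.diag(G)
    D2 = sq[:, None] + sq[None, :] - 2.0 * G
    # Double-centre the squared distances (classical MDS)
    p = D2.shape[0]
    J = np.eye(p) - np.ones((p, p)) / p
    B = -0.5 * J @ D2 @ J
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    coord = vecs[:, -1]                      # leading 1-D MDS coordinate
    return np.argsort(coord)

# Toy demo: each feature is a multiple of its position on a line, the
# columns are scrambled, and stringing recovers the order (up to a flip).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
slopes = rng.uniform(0.5, 1.5, size=50)
X = np.outer(slopes, t)                      # 50 samples x 30 features
perm = rng.permutation(30)                   # scramble the features
order = stringing_order(X[:, perm])
```

Applying `perm[order]` then yields the features back in their original smooth order, possibly reversed, since a 1-D embedding is only defined up to a flip.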


2021, Vol 7, pp. e477
Author(s): Amalia Villa, Abhijith Mundanad Narayanan, Sabine Van Huffel, Alexander Bertrand, Carolina Varon

Feature selection techniques are useful approaches for dimensionality reduction in data analysis. They provide interpretable results by reducing the data to a subset of the original set of features. When the data lack annotations, unsupervised feature selectors are required. Several algorithms for this aim exist in the literature, but despite their wide applicability they can be inaccessible or cumbersome to use, mainly because of the need to tune non-intuitive parameters and their high computational demands. In this work, a publicly available, ready-to-use unsupervised feature selector is proposed, with results comparable to the state-of-the-art at a much lower computational cost. The suggested approach belongs to the family of spectral feature selectors. These methods generally consist of two stages: manifold learning and subset selection. In the first stage, the underlying structures in the high-dimensional data are extracted, while in the second stage a subset of the features is selected to replicate these structures. This paper makes two contributions to this field, one for each stage. In the manifold learning stage, the effect of non-linearities in the data is explored using a radial basis function (RBF) kernel, for which an alternative estimator of the kernel parameter is presented for high-dimensional data. For the subset selection stage, a backward greedy approach based on the least-squares utility metric is proposed. The combination of these ingredients results in the Utility metric for Unsupervised Feature Selection (U2FS) algorithm. The proposed U2FS algorithm succeeds in selecting the correct features in a simulation environment. In addition, its performance on benchmark datasets is comparable to the state-of-the-art while requiring less computational time. Moreover, unlike the state-of-the-art, U2FS does not require any parameter tuning.
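The two-stage pipeline described above can be sketched roughly as follows. This is a hedged illustration, not the released U2FS code: the kernel width uses the common median-distance heuristic rather than the paper's estimator, and the backward greedy step naively recomputes the least-squares error instead of using the fast utility-metric updates; all function names and the toy data are invented for the demo.

```python
import numpy as np

def rbf_embedding(X, k=2):
    """Stage 1: spectral embedding of the samples with an RBF kernel.
    The kernel width is the median pairwise squared distance -- a common
    heuristic standing in for the paper's estimator."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma2 = np.median(d2[d2 > 0])
    K = np.exp(-d2 / sigma2)
    L = np.diag(K.sum(axis=1)) - K           # unnormalised graph Laplacian
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:k + 1]                  # skip the constant eigenvector

def backward_greedy_select(X, E, n_keep):
    """Stage 2: backward greedy selection. Repeatedly drop the feature
    whose removal least increases the least-squares error of
    reconstructing the embedding E from the remaining features."""
    def lsq_err(cols):
        A = np.column_stack([X[:, cols], np.ones(X.shape[0])])  # intercept
        resid = E - A @ np.linalg.lstsq(A, E, rcond=None)[0]
        return np.sum(resid ** 2)
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        errs = [lsq_err([c for c in keep if c != j]) for j in keep]
        # Naive recomputation for clarity; the least-squares utility
        # metric makes this update cheap in the actual method.
        keep.pop(int(np.argmin(errs)))
    return keep

# Toy demo: features 0 and 1 carry a four-cluster structure, the other
# four features are pure noise.
rng = np.random.default_rng(1)
centers = rng.integers(0, 2, size=(80, 2)) * 4.0
X = np.hstack([centers + 0.1 * rng.normal(size=(80, 2)),
               rng.normal(size=(80, 4))])
selected = backward_greedy_select(X, rbf_embedding(X), n_keep=2)
```

Because the Laplacian eigenvectors encode the cluster structure, which only the first two features can reconstruct linearly, the greedy pass discards the noise features first.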


2021, pp. 1-19
Author(s): Guo Niu, Zhengming Ma, Haoqing Chen, Xue Su

Manifold learning plays an important role in nonlinear dimensionality reduction, but many manifold learning algorithms offer no explicit expression for handling out-of-sample (new) data. Recently, many improved algorithms have introduced a fixed function into the objective function of manifold learning to learn such an expression. In manifold learning, however, the relationship between the high-dimensional data and its low-dimensional representation is a local homeomorphic mapping, so these improved algorithms actually change or damage the intrinsic structure of manifold learning and are no longer true manifold learning. In this paper, a novel polynomial-approximation-based manifold learning method (PAML) is proposed, which learns a polynomial approximation of manifold learning from the dimensionality-reduction results of manifold learning and the original high-dimensional data. In particular, we establish a polynomial representation of the high-dimensional data using the Kronecker product and learn an optimal transformation matrix for this representation. This matrix gives an explicit, optimal nonlinear mapping between the high-dimensional data and its low-dimensional representation, and can be applied directly to new data. Compared with substituting a fixed linear or nonlinear relationship for the manifold relationship, our method learns the optimal polynomial approximation of manifold learning without changing its objective function (i.e., it keeps the intrinsic structure of manifold learning). We conduct experiments on eight data sets against advanced algorithms published in recent years to demonstrate the benefits of our algorithm.
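The idea of learning an explicit out-of-sample mapping can be sketched as follows: build a degree-2 polynomial representation of each point via a Kronecker product, then fit a transformation matrix that maps this representation onto a given low-dimensional embedding. This is an assumption-laden toy, not PAML itself; the "embedding" here is a known quadratic of the inputs so that the learned mapping can be checked on unseen points, and both function names are invented for the demo.

```python
import numpy as np

def poly_features(X):
    """Degree-2 polynomial representation of each row x via the
    Kronecker product [1, x] (x) [1, x], i.e. all monomials up to
    degree 2 (with duplicates, which the least-squares fit tolerates)."""
    Z = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.einsum('ni,nj->nij', Z, Z).reshape(X.shape[0], -1)

def learn_mapping(X, Y):
    """Fit a transformation matrix W with poly_features(X) @ W ~= Y,
    where Y is a precomputed low-dimensional embedding of X. W then
    maps new points without re-running manifold learning."""
    W, *_ = np.linalg.lstsq(poly_features(X), Y, rcond=None)
    return W

# Toy check: the "embedding" is a known quadratic of the inputs, so the
# learned mapping can be verified on points it never saw.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] ** 2 - X[:, 1], X[:, 1] * X[:, 2]])
W = learn_mapping(X, Y)
X_new = rng.normal(size=(20, 3))
Y_new = poly_features(X_new) @ W             # out-of-sample mapping
```

In PAML the target Y would instead be the output of a manifold learning algorithm, so W approximates that (nonlinear) reduction rather than a hand-chosen quadratic.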


2009, Vol 35 (7), pp. 859-866
Author(s): Ming LIU, Xiao-Long WANG, Yuan-Chao LIU
