Dimensionality Reduction by Weighted Connections between Neighborhoods

Abstract and Applied Analysis ◽

10.1155/2014/928136 ◽

2014 ◽

Vol 2014 ◽

pp. 1-5 ◽

Cited By ~ 1

Author(s):

Fuding Xie ◽

Yutao Fan ◽

Ming Zhou

Keyword(s):

Dimensionality Reduction ◽

Dimensional Space ◽

High Dimensional Data ◽

Reduction Technique ◽

Experimental Results ◽

High Dimensional ◽

Reduced Dimensionality ◽

Dimensionality Reduction Technique ◽

Low Dimensionality ◽

Local Topology

Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. This paper introduces a dimensionality reduction technique based on weighted connections between neighborhoods to improve the K-Isomap method, attempting to perfectly preserve the relationships between neighborhoods during dimensionality reduction. The validity of the proposal is tested on three typical examples that are widely used with manifold-based algorithms. The experimental results show that the proposed method preserves the local topology of the dataset well while transforming a dataset in high-dimensional space into a new dataset of low dimensionality.
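
As a rough illustration of the neighborhood-graph step that Isomap-style methods build on, the sketch below approximates geodesic distances by shortest paths on a k-nearest-neighbor graph. It is not the weighted-connection scheme proposed in the paper, which the abstract does not spell out. The function name `knn_geodesic_distances`, the half-circle sample data, and the use of Floyd-Warshall are assumptions made for this example.

```python
import math

def knn_geodesic_distances(points, k):
    """Approximate geodesic distances as shortest paths on an undirected k-NN graph."""
    n = len(points)
    # Start with no edges; only the zero-length distance from a point to itself.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Euclidean distances from point i to all other points, nearest first.
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            # Connect i to its k nearest neighbours and keep the graph symmetric.
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: relax every pair of points through every intermediate point.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist

# Hypothetical sample: 11 points on a half circle of radius 1.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
geo = knn_geodesic_distances(points, k=2)
straight = math.dist(points[0], points[10])
```

Here `straight` is the Euclidean distance between the two ends of the arc, while `geo[0][10]` follows the curve through neighboring points, so it is longer than the straight line and bounded by the arc length. A full Isomap pipeline would then embed the points with classical multidimensional scaling on `geo`, which is omitted here.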

Explaining three-dimensional dimensionality reduction plots

Information Visualization ◽

10.1177/1473871615600010 ◽

2015 ◽

Vol 15 (2) ◽

pp. 154-172 ◽

Cited By ~ 11

Author(s):

Danilo B Coimbra ◽

Rafael M Martins ◽

Tácito TAT Neves ◽

Alexandru C Telea ◽

Fernando V Paulovich

Keyword(s):

Dimensionality Reduction ◽

Dimensional Space ◽

Three Dimensional ◽

Original Data ◽

Reduction Technique ◽

High Dimensional ◽

Dimensionality Reduction Technique ◽

Visualization Techniques ◽

High Dimensional Datasets ◽

Three Dimensional Space

Understanding three-dimensional projections created by dimensionality reduction from high-variate datasets is very challenging. In particular, classical three-dimensional scatterplots used to display such projections do not explicitly show the relations between the projected points, the viewpoint used to visualize the projection, and the original data variables. To explore and explain such relations, we propose a set of interactive visualization techniques. First, we adapt and enhance biplots to show the data variables in the projected three-dimensional space. Next, we use a set of interactive bar chart legends to show variables that are visible from a given viewpoint and also assist users to select an optimal viewpoint to examine a desired set of variables. Finally, we propose an interactive viewpoint legend that provides an overview of the information visible in a given three-dimensional projection from all possible viewpoints. Our techniques are simple to implement and can be applied to any dimensionality reduction technique. We demonstrate our techniques on the exploration of several real-world high-dimensional datasets.

Evaluation of UMAP as an alternative to t-SNE for single-cell data

10.1101/298430 ◽

2018 ◽

Cited By ~ 23

Author(s):

Etienne Becht ◽

Charles-Antoine Dutertre ◽

Immanuel W. H. Kwok ◽

Lai Guan Ng ◽

Florent Ginhoux ◽

...

Keyword(s):

Dimensionality Reduction ◽

Single Cell ◽

Rna Sequencing ◽

Reduction Technique ◽

High Dimensional ◽

The Past ◽

Dimensionality Reduction Technique ◽

Single Cell Rna Sequencing ◽

Linear Dimensionality Reduction ◽

Cell Data

Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for this task in past years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting faster runtime and consistency, meaningful organization of cell clusters, and preservation of continuums in UMAP compared to t-SNE.

Cluster Weighted Model Based on TSNE Algorithm for High-Dimensional Data

10.21203/rs.3.rs-347795/v1 ◽

2021 ◽

Author(s):

Kehinde Olobatuyi

Keyword(s):

Mixture Models ◽

Dimensional Space ◽

High Dimensional Data ◽

Expectation Maximization Algorithm ◽

Real Data ◽

R Package ◽

High Dimensional ◽

Data Sets ◽

Dimensionality Reduction Technique ◽

Weighted Model

Similar to many machine learning models, both the accuracy and the speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on a parsimonious technique to reduce the effect of the "curse of dimensionality" on mixture models. In this work, we review the background of CWMs. We further show that the parsimonious technique is not sufficient for mixture models to thrive in the presence of very large high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of the location parameters using the default values in the "FlexCWM" R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via the expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.

High-Dimensional Text Clustering by Dimensionality Reduction and Improved Density Peak

Wireless Communications and Mobile Computing ◽

10.1155/2020/8881112 ◽

2020 ◽

Vol 2020 ◽

pp. 1-16

Author(s):

Yujia Sun ◽

Jan Platoš

Keyword(s):

Dimensionality Reduction ◽

Data Clustering ◽

High Dimensional Data ◽

Random Projection ◽

Experimental Results ◽

High Dimensional ◽

Density Peak ◽

Text Data ◽

Number Of Clusters ◽

Density Peaks

This study focuses on high-dimensional text data clustering, given the inability of K-means to process high-dimensional data and the need to specify the number of clusters and randomly select the initial centers. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm DPC-K-means based on the improved density peaks algorithm. The improved density peaks algorithm determines the number of clusters and the initial clustering centers of K-means. Our proposed algorithm is validated using seven text datasets. Experimental results show that this algorithm is suitable for clustering of text data by correcting the defects of K-means.
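
To make the idea of choosing K-means starting centers from density peaks more concrete, here is a minimal sketch of the basic density peaks heuristic: each point gets a local density and a distance to the nearest denser point, and points that score high on both are taken as centers. This is the plain heuristic, not the improved variant or the Stacked-Random Projection step described in the abstract. The function name `density_peak_centers`, the `cutoff` and `n_centers` parameters, the index-based tie-breaking, and the sample points are all assumptions made for illustration.

```python
import math

def density_peak_centers(points, cutoff, n_centers):
    """Return indices of the points with the largest density * separation score."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Local density: how many other points lie within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff) for i in range(n)]

    def denser(j, i):
        # Assumed tie-break: among equal densities, the lower index counts as denser.
        return rho[j] > rho[i] or (rho[j] == rho[i] and j < i)

    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if j != i and denser(j, i)]
        # Separation: distance to the nearest denser point; the densest point
        # of all has none, so it gets its largest distance to any point.
        delta.append(min(higher) if higher else max(d[i]))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return ranked[:n_centers]

# Hypothetical sample: two small groups, around (0, 0) and around (5, 5).
points = [
    (0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
    (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9),
]
center_idx = density_peak_centers(points, cutoff=0.12, n_centers=2)
initial_centers = [points[i] for i in center_idx]
```

In this sketch the number of centers is passed in by hand; the abstract states that the improved algorithm determines it automatically, and that step is not reproduced here. The resulting `initial_centers` could be handed to any K-means implementation in place of random initialization.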

Hierarchical surrogate model with dimensionality reduction technique for high‐dimensional uncertainty propagation

International Journal for Numerical Methods in Engineering ◽

10.1002/nme.6299 ◽

2020 ◽

Vol 121 (9) ◽

pp. 2068-2085

Author(s):

Kai Cheng ◽

Zhenzhou Lu

Keyword(s):

Dimensionality Reduction ◽

Surrogate Model ◽

Uncertainty Propagation ◽

Reduction Technique ◽

High Dimensional ◽

Dimensionality Reduction Technique

Decision Tree based Classification and Dimensionality Reduction of Cervical Cancer

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f4530.049620 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1531-1535

Keyword(s):

Cervical Cancer ◽

Decision Tree ◽

Dimensionality Reduction ◽

Reduction Technique ◽

High Dimensional ◽

Biological Processes ◽

Decision Tree Algorithm ◽

Dimensionality Reduction Technique ◽

C4.5 Algorithm ◽

C4.5 Decision Tree

The data revolution in medicine and biology has increased our fundamental understanding of biological processes and of the factors that cause disease, but it has also made analysis more challenging. After breast cancer, cervical cancer accounts for the most deaths among women. According to IARC, an estimated 7095 cases of cervical cancer were reported in 2012 alone; 16.5% of the deaths were due to cervical cancer, with total deaths of 28,711 among women. To analyze high-dimensional data with high accuracy and in less time, its dimensionality needs to be reduced to remove irrelevant features. Classification is performed using C5.0, the recent iteration of Quinlan's C4.5 decision tree algorithm, with PCA as the dimensionality reduction technique. The proposed methodology shows a significant improvement in the time taken by both algorithms. This shows that the C5.0 algorithm is superior to the C4.5 algorithm.
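
For readers unfamiliar with PCA as a preprocessing step, the following sketch extracts the first principal component of a small table by power iteration on the covariance matrix and projects each row onto it. It only illustrates the general technique: the data are invented, the function name `first_principal_component` is hypothetical, and the decision tree stage (C4.5 or C5.0) is not shown.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the leading principal component."""
    n, dim = len(rows), len(rows[0])
    # Center every column so the covariance is taken around the mean.
    means = [sum(r[c] for r in rows) / n for c in range(dim)]
    centered = [[r[c] - means[c] for c in range(dim)] for r in rows]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(dim)]
        for a in range(dim)
    ]
    # Power iteration; the all-ones start is an assumption and would fail
    # if it happened to be orthogonal to the leading eigenvector.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One score per row: its coordinate along the leading direction.
    scores = [sum(x[c] * v[c] for c in range(dim)) for x in centered]
    return v, scores

# Invented two-feature data lying close to the line y = 2x.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores = first_principal_component(rows)
```

The single column `scores` could replace the two original features as input to a classifier; with more features, further components would be kept until enough variance is retained.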

High-dimensional data analysis with subspace comparison using matrix visualization

Information Visualization ◽

10.1177/1473871617733996 ◽

2017 ◽

Vol 18 (1) ◽

pp. 94-109 ◽

Cited By ~ 3

Author(s):

Junpeng Wang ◽

Xiaotong Liu ◽

Han-Wei Shen

Keyword(s):

Dimensionality Reduction ◽

Dimensional Space ◽

High Dimensional Data ◽

Principal Component ◽

Data Exploration ◽

High Dimensional ◽

Data Sets ◽

Subspace Analysis ◽

Matrix Visualization ◽

Fine Tune

Due to the intricate relationship between different dimensions of high-dimensional data, subspace analysis is often conducted to decompose dimensions and give prominence to certain subsets of dimensions, i.e. subspaces. Exploring and comparing subspaces are important to reveal the underlying features of subspaces, as well as to portray the characteristics of individual dimensions. To date, most of the existing high-dimensional data exploration and analysis approaches rely on dimensionality reduction algorithms (e.g. principal component analysis and multi-dimensional scaling) to project high-dimensional data, or their subspaces, to two-dimensional space and employ scatterplots for visualization. However, the dimensionality reduction algorithms are sometimes difficult to fine-tune and scatterplots are not effective for comparative visualization, making subspace comparison hard to perform. In this article, we aggregate high-dimensional data or their subspaces by computing pair-wise distances between all data items and showing the distances with matrix visualizations to present the original high-dimensional data or subspaces. Our approach enables effective visual comparisons among subspaces, which allows users to further investigate the characteristics of individual dimensions by studying their behaviors in similar subspaces. Through subspace comparisons, we identify dominant, similar, and conforming dimensions in different subspace contexts of synthetic and real-world high-dimensional data sets. Additionally, we present a prototype that integrates parallel coordinates plot and matrix visualization for high-dimensional data exploration and incremental dimensionality analysis, which also allows users to further validate the dimension characterization results derived from the subspace comparisons.
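
The aggregation step described above, computing pairwise distances between all data items within a chosen subspace, can be sketched as follows. The comparison measure used here (mean absolute difference between peak-normalized distance matrices) is a stand-in chosen for the example and is not taken from the article; the function names and the sample rows are likewise hypothetical.

```python
import math

def subspace_distance_matrix(rows, dims):
    """Pairwise Euclidean distances using only the columns listed in dims."""
    return [
        [math.sqrt(sum((p[c] - q[c]) ** 2 for c in dims)) for q in rows]
        for p in rows
    ]

def normalized(matrix):
    # Scale by the largest entry so subspaces with different units are comparable.
    peak = max(max(row) for row in matrix) or 1.0
    return [[value / peak for value in row] for row in matrix]

def matrix_difference(a, b):
    """Assumed comparison: mean absolute difference of the normalized matrices."""
    a, b = normalized(a), normalized(b)
    n = len(a)
    return sum(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n)) / (n * n)

# Invented data: column 1 is exactly twice column 0, column 2 is unrelated.
rows = [(0, 0, 5), (1, 2, 1), (2, 4, 4), (3, 6, 0)]
m0 = subspace_distance_matrix(rows, dims=[0])
m1 = subspace_distance_matrix(rows, dims=[1])
m2 = subspace_distance_matrix(rows, dims=[2])
similar = matrix_difference(m0, m1)
dissimilar = matrix_difference(m0, m2)
```

Subspaces whose distance matrices nearly coincide, like columns 0 and 1 here, arrange the data items in the same way, while a larger difference, as for column 2, points to a dimension that behaves differently.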

Effective semi-supervised nonlinear dimensionality reduction for wood defects recognition

Computer Science and Information Systems ◽

10.2298/csis1001127z ◽

2010 ◽

Vol 7 (1) ◽

pp. 127-138 ◽

Cited By ~ 2

Author(s):

Zhao Zhang ◽

Ye Ning

Keyword(s):

Data Analysis ◽

Dimensionality Reduction ◽

Data Visualization ◽

Domain Knowledge ◽

High Dimensional Data ◽

Original Data ◽

Experimental Results ◽

High Dimensional ◽

Nonlinear Dimensionality Reduction ◽

Practical Usefulness

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis that should preserve intrinsic information. A semi-supervised nonlinear dimensionality reduction method called KNDR is considered for wood defects recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or to different classes. KNDR can project the data onto a set of 'useful' features and preserve the structure of labeled and unlabeled data as well as the constraints defined in the embedding space, under which the projections of the original data can be effectively partitioned from each other. We demonstrate the practical usefulness of KNDR for data visualization and wood defects recognition through extensive experiments. Experimental results show that it achieves similar or even higher performance than some existing methods.

Serial multimodal framework for enhancing user convenience using dimensionality reduction technique

2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) ◽

10.1109/iccpct.2016.7530162 ◽

2016 ◽

Cited By ~ 1

Author(s):

Sandra Prasad ◽

J. Aravinth

Keyword(s):

Dimensionality Reduction ◽

Reduction Technique ◽

Dimensionality Reduction Technique

Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Scientific Programming ◽

10.1155/2015/180214 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Sai Kiranmayee Samudrala ◽

Jaroslaw Zola ◽

Srinivas Aluru ◽

Baskar Ganapathysubramanian

Keyword(s):

Dimensionality Reduction ◽

Organic Solar Cells ◽

Large Scale ◽

Parallel Implementation ◽

High Dimensional Data ◽

Real Life ◽

Processing Parameters ◽

High Dimensional ◽

Morphology Evolution ◽

Reduction Techniques

Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
