Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization

S. Lafon; A.B. Lee

doi:10.1109/tpami.2006.184

A New Approach for Supervised Dimensionality Reduction

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2018100102 ◽

2018 ◽

Vol 14 (4) ◽

pp. 20-37 ◽

Cited By ~ 1

Author(s):

Yinglei Song ◽

Yongzhong Li ◽

Junfeng Qu

Keyword(s):

Eigenvalue Problem ◽

Dimensionality Reduction ◽

Image Databases ◽

Data Sets ◽

Data Set ◽

New Approach ◽

Local Structures ◽

Benchmark Data ◽

Global And Local

This article develops a new approach for supervised dimensionality reduction. This approach considers both global and local structures of a labelled data set and maximizes a new objective that includes the effects from both of them. The objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated based on a few benchmark data sets and image databases. Its performance is also compared with a few other existing approaches for dimensionality reduction. Testing results show that, on average, this new approach can achieve more accurate results for dimensionality reduction than existing approaches.

Download Full-text

Review of Chinese Short Text Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.336-338.2171 ◽

2013 ◽

Vol 336-338 ◽

pp. 2171-2174

Author(s):

Fen Lin Wu ◽

Jin Gou ◽

Cheng Wang

Keyword(s):

Text Classification ◽

Open Data ◽

Unified Framework ◽

Data Set ◽

Research Directions ◽

Short Text

Because features of Chinese short text are different from long text, current text classification technology of long text classification are not suit of short text classification. This paper summarizes features, various methods and Application for Chinese short text classification, and pointes out their advantages, disadvantages and applicability scope. Establishing Open data set and unified framework standard are two main Problems and further research directions.

Download Full-text

A Unified Framework for Dimensionality Reduction

2010 Chinese Conference on Pattern Recognition (CCPR) ◽

10.1109/ccpr.2010.5659134 ◽

2010 ◽

Author(s):

Fei Ma ◽

Jie Chen

Keyword(s):

Dimensionality Reduction ◽

Unified Framework

Download Full-text

Dimensionality Reduction and Classification of Hyperspectral Images using Genetic Algorithm

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v3.i3.pp503-511 ◽

2016 ◽

Vol 3 (3) ◽

pp. 503 ◽

Cited By ~ 2

Author(s):

R. Kiran Kumar ◽

B. Saichandana ◽

K. Srinivas

Keyword(s):

Genetic Algorithm ◽

Dimensionality Reduction ◽

Hyperspectral Image ◽

Image Data ◽

Unsupervised Classification ◽

Hyperspectral Data ◽

Band Selection ◽

Data Set ◽

Mode Decomposition ◽

Fused Image

This paper presents genetic algorithm based band selection and classification on hyperspectral image data set. Hyperspectral remote sensors collect image data for a large number of narrow, adjacent spectral bands. Every pixel in hyperspectral image involves a continuous spectrum that is used to classify the objects with great detail and precision. In this paper, first filtering based on 2-D Empirical mode decomposition method is used to remove any noisy components in each band of the hyperspectral data. After filtering, band selection is done using genetic algorithm in-order to remove bands that convey less information. This dimensionality reduction minimizes many requirements such as storage space, computational load, communication bandwidth etc which is imposed on the unsupervised classification algorithms. Next image fusion is performed on the selected hyperspectral bands to selectively merge the maximum possible features from the selected images to form a single image. This fused image is classified using genetic algorithm. Three different indices, such as K-means Index (KMI) and Jm measure are used as objective functions. This method increases classification accuracy and performance of hyperspectral image than without dimensionality reduction.

Download Full-text

A new efficient method to detect genetic interactions for lung cancer GWAS

BMC Medical Genomics ◽

10.1186/s12920-020-00807-9 ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Siting Li ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract Background Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset. Methods To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Results Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10–15), as the top marker to predict age of lung cancer onset. Conclusions From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.

Download Full-text

Generalized notions of sparsity and restricted isometry property. Part I: a unified framework

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iay018 ◽

2019 ◽

Vol 9 (1) ◽

pp. 157-193 ◽

Cited By ~ 1

Author(s):

Marius Junge ◽

Kiryung Lee

Keyword(s):

Banach Space ◽

Inverse Problems ◽

Compressed Sensing ◽

Dimensionality Reduction ◽

Group Actions ◽

Function Spaces ◽

Restricted Isometry Property ◽

Low Rank ◽

Unified Framework ◽

Order Tensor

Abstract The restricted isometry property (RIP) is an integral tool in the analysis of various inverse problems with sparsity models. Motivated by the applications of compressed sensing and dimensionality reduction of low-rank tensors, we propose generalized notions of sparsity and provide a unified framework for the corresponding RIP, in particular when combined with isotropic group actions. Our results extend an approach by Rudelson and Vershynin to a much broader context including commutative and non-commutative function spaces. Moreover, our Banach space notion of sparsity applies to affine group actions. The generalized approach in particular applies to high-order tensor products.

Download Full-text

BLOCK ENTROPY ANALYSIS OF HEART RATE VARIABILITY SIGNALS

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127406015933 ◽

2006 ◽

Vol 16 (07) ◽

pp. 2093-2101 ◽

Cited By ~ 15

Author(s):

K. KARAMANOS ◽

S. NIKOLOPOULOS ◽

K. HIZANIDIS ◽

G. MANIS ◽

A. ALEXANDRIDI ◽

...

Keyword(s):

Healthy Subjects ◽

Heat Rate ◽

Coarse Graining ◽

Coarse Grained ◽

Entropy Analysis ◽

Data Set ◽

Novel Approach ◽

Data Files ◽

Block Entropy ◽

Artery Disease

In this paper we present a novel approach to the analysis of Heat Rate Variability (HRV) data, by coarse-graining analysis using the estimation of Block Entropies with the technique of lumping. HRV time series are generated from long recordings of Electrocardiograms (ECGs) and are then filtered in order to produce a coarse-grained symbolic dynamics. Block Entropy analysis is applied to these dynamics in order to examine its coarse-grained statistics. Our data set is comprised of two subsets, one of healthy subjects and another of Coronary Artery Disease (CAD) patients. It is found that Entropy analysis provides a quick and efficient tool for the differentiation of these series according to subject category. Healthy subjects provided more complex statistics compared to patients; specifically, the healthy data files provided higher values of block Entropies compared to patient ones. We also compare these results with the Correlation Dimension Estimation in order to establish coherency. We believe that this analysis may provide a useful statistical method towards the better understanding of the human cardiac system.

Download Full-text

Diffusion maps for dimensionality reduction with partially labeled samples

2010 The 2nd International Conference on Computer and Automation Engineering (ICCAE) ◽

10.1109/iccae.2010.5451384 ◽

2010 ◽

Author(s):

Feng Zheng ◽

Zhan Song

Keyword(s):

Dimensionality Reduction ◽

Diffusion Maps

Download Full-text

Hybrid modeling using multivariate, discrete probability distributions

10.5194/egusphere-egu21-2866 ◽

2021 ◽

Author(s):

Uwe Ehret

Keyword(s):

Probability Distributions ◽

Coarse Graining ◽

Training Data ◽

Data Set ◽

Discrete Probability ◽

Learning From Data ◽

Out Of Sample ◽

Hydrological System ◽

Discrete Probability Distributions ◽

Process Based Models

In this contribution, I will suggest an approach to build models as ordered and connected collections of multivariate, discrete probability distributions (dpd's). This approach can be seen as a Machine-Learning (ML) approach as it allows very flexible learning from data (almost) without prior constraints. Models can be built on dpd's only (fully data-based model), but they can also be included into existing process-based models at places where relations among data are not well-known (hybrid model). This provides flexibility for learning similar to including other ML approaches - e.g. Neural Networks - into process-based models, with the advantage that the dpd's can be investigated and interpreted by the modeler as long as their dimensionality remains low. Models based on dpd's are fundamentally probabilistic, and model responses for out-of-sample situations can be assured by dynamically coarse-graining the dpd's: The farther a predictive situation is from the learning situations, the coarser/more uncertain the prediction will be, and vice versa.I will present the main elements and steps of such dpd-based modeling at the example of several systems, ranging from simple deterministic (ideal spring) to complex (hydrological system), and will discuss the influence of i) the size of the available training data set, ii) choice of the dpd priors, and iii) binning choices on the models' predictive power.

Download Full-text

ExTree—Explainable Genetic Feature Coupling Tree Using Fuzzy Mapping for Dimensionality Reduction with Application to NACA 0012 Airfoils Self-Noise Data Set

Advances in Intelligent Systems and Computing - Fuzzy Information Processing 2020 ◽

10.1007/978-3-030-81561-5_24 ◽

2021 ◽

pp. 289-300

Author(s):

Javier Viaña ◽

Kelly Cohen

Keyword(s):

Dimensionality Reduction ◽

Fuzzy Mapping ◽

Data Set ◽

Genetic Feature ◽

Noise Data ◽

Naca 0012

Download Full-text