Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization

2006 ◽  
Vol 28 (9) ◽  
pp. 1393-1403 ◽  
Author(s):  
S. Lafon ◽  
A.B. Lee

2018 ◽  
Vol 14 (4) ◽  
pp. 20-37 ◽  
Author(s):  
Yinglei Song ◽  
Yongzhong Li ◽  
Junfeng Qu

This article develops a new approach for supervised dimensionality reduction. The approach considers both the global and local structures of a labelled data set and maximizes a new objective that combines the effects of both. The objective can be approximately optimized by solving an eigenvalue problem. The approach is evaluated on several benchmark data sets and image databases, and its performance is compared with several existing approaches for dimensionality reduction. Experimental results show that, on average, the new approach achieves more accurate dimensionality reduction than existing approaches.
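The paper's exact objective is not reproduced in the abstract. As a rough sketch of how such supervised reductions typically reduce to an eigenvalue problem, the classical linear discriminant analysis (LDA) objective (between-class versus within-class scatter, capturing global label structure only, without the paper's local term) can be solved as a generalized eigenproblem:

```python
import numpy as np
from scipy.linalg import eigh

def supervised_projection(X, y, n_components=1):
    """Project labelled data by maximizing between-class scatter
    relative to within-class scatter (classical LDA; the paper's
    objective additionally blends in local structure, which this
    sketch does not reproduce)."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))  # between-class scatter (global label structure)
    Sw = np.zeros((d, d))  # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
        Sw += (Xc - mc).T @ (Xc - mc)
    Sw += 1e-6 * np.eye(d)   # regularize so Sw is positive definite
    # Generalized eigenproblem Sb v = lambda Sw v; the top eigenvectors
    # span the reduced space.
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, ::-1][:, :n_components]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
W = supervised_projection(X, y, n_components=1)
print(W.shape)  # (3, 1)
```

The projection direction is the eigenvector with the largest generalized eigenvalue; projecting onto it separates the two synthetic classes.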


2013 ◽  
Vol 336-338 ◽  
pp. 2171-2174
Author(s):  
Fen Lin Wu ◽  
Jin Gou ◽  
Cheng Wang

Because the features of Chinese short texts differ from those of long texts, classification techniques developed for long texts are not well suited to short-text classification. This paper surveys the features, methods, and applications of Chinese short-text classification, and points out the advantages, disadvantages, and scope of applicability of each method. Establishing open data sets and a unified framework standard are two main open problems and directions for further research.


Author(s):  
R. Kiran Kumar ◽  
B. Saichandana ◽  
K. Srinivas

This paper presents genetic algorithm based band selection and classification on a hyperspectral image data set. Hyperspectral remote sensors collect image data in a large number of narrow, adjacent spectral bands, so every pixel carries a continuous spectrum that can be used to classify objects with great detail and precision. First, filtering based on 2-D empirical mode decomposition is used to remove noisy components from each band of the hyperspectral data. After filtering, band selection is performed using a genetic algorithm in order to remove bands that convey little information. This dimensionality reduction lowers the storage space, computational load, and communication bandwidth demanded of the unsupervised classification algorithms. Next, image fusion is performed on the selected hyperspectral bands to merge the maximum possible features from the selected images into a single image. This fused image is classified using a genetic algorithm, with clustering indices such as the K-means Index (KMI) and the Jm measure used as objective functions. The method achieves higher classification accuracy and performance on hyperspectral images than classification without dimensionality reduction.
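As a minimal sketch of genetic-algorithm band selection, the following toy GA evolves a binary mask over bands. The fitness here is a simple variance-minus-cost proxy for "information carried", standing in for the clustering indices (KMI, Jm) the paper actually optimizes; the population size, rates, and synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(mask, cube):
    """Proxy fitness: total variance of the selected bands minus a
    small cost per band. (The paper uses clustering indices such as
    KMI and Jm; variance stands in here purely for illustration.)"""
    if mask.sum() == 0:
        return -np.inf
    return cube[:, mask].var(axis=0).sum() - 0.1 * mask.sum()

def ga_band_selection(cube, pop_size=20, n_gen=30, p_mut=0.05):
    n_bands = cube.shape[1]
    # Each individual is a boolean chromosome: band kept or dropped.
    pop = rng.integers(0, 2, (pop_size, n_bands)).astype(bool)
    for _ in range(n_gen):
        scores = np.array([fitness(ind, cube) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]        # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_bands)           # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_bands) < p_mut       # bit-flip mutation
            children.append(child ^ flip)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, cube) for ind in pop])
    return pop[np.argmax(scores)]

# Synthetic "hyperspectral" cube: 200 pixels x 50 bands, where only
# the first 10 bands carry signal variance above the noise floor.
cube = rng.normal(0, 0.1, (200, 50))
cube[:, :10] += rng.normal(0, 2.0, (200, 10))
best = ga_band_selection(cube)
print(best.sum(), "bands selected")
```

With this fitness, noisy bands lower the score while high-variance bands raise it, so the GA tends to keep the informative bands and discard the rest.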


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Jennifer Luyapan ◽  
Xuemei Ji ◽  
Siting Li ◽  
Xiangjun Xiao ◽  
Dakai Zhu ◽  
...  

Abstract
Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival outcomes, such as age of disease onset. To confront this problem, we developed a novel algorithm, the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses martingale residuals as the outcome parameter to estimate survival outcomes and implements the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease onset.
Methods: To demonstrate efficacy, we evaluated the method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with less computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data, with 14,935 cases and 12,787 controls for lung cancer (108,254 SNPs), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data.
Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker for predicting age of lung cancer onset.
Conclusions: From the results of our extensive simulations and the analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identifies genetic interactions to include in models predicting survival outcomes.
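The key device in ES-MDR is using martingale residuals as a quantitative outcome for the MDR machinery. A minimal sketch, assuming a null (no-covariate) model with the Nelson-Aalen cumulative hazard and ignoring tie-handling refinements, computes them as M_i = delta_i − H(t_i):

```python
import numpy as np

def martingale_residuals(time, event):
    """Martingale residuals from a null (no-covariate) survival model:
    M_i = delta_i - H(t_i), with H the Nelson-Aalen cumulative hazard.
    ES-MDR treats residuals like these as a quantitative outcome so
    that Quantitative MDR can rank SNP interaction models; tied event
    times are handled only naively here."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    order = np.argsort(time)
    t, d = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)       # subjects still at risk at each t_i
    increments = d / at_risk         # Nelson-Aalen hazard increments
    H = np.cumsum(increments)        # cumulative hazard at each t_i
    resid = np.empty(n)
    resid[order] = d - H             # residual in the original order
    return resid

time = [2.0, 3.0, 5.0, 7.0, 11.0]   # follow-up times
event = [1, 0, 1, 1, 0]             # 1 = event observed, 0 = censored
r = martingale_residuals(time, event)
print(np.round(r, 3))
```

A useful sanity check is that martingale residuals sum to zero: subjects who experience the event "earlier than expected" get positive residuals, long-censored subjects get negative ones.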


2019 ◽  
Vol 9 (1) ◽  
pp. 157-193 ◽  
Author(s):  
Marius Junge ◽  
Kiryung Lee

Abstract The restricted isometry property (RIP) is an integral tool in the analysis of various inverse problems with sparsity models. Motivated by the applications of compressed sensing and dimensionality reduction of low-rank tensors, we propose generalized notions of sparsity and provide a unified framework for the corresponding RIP, in particular when combined with isotropic group actions. Our results extend an approach by Rudelson and Vershynin to a much broader context including commutative and non-commutative function spaces. Moreover, our Banach space notion of sparsity applies to affine group actions. The generalized approach in particular applies to high-order tensor products.
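For reference, the standard RIP for s-sparse vectors, which the paper generalizes to broader (including Banach-space) sparsity models, states that a measurement map A acts as a near-isometry on all sparse vectors:

```latex
(1 - \delta_s)\,\|x\|_2^2 \;\le\; \|Ax\|_2^2 \;\le\; (1 + \delta_s)\,\|x\|_2^2
\qquad \text{for all } s\text{-sparse } x,
```

where \(\delta_s \in (0,1)\) is the restricted isometry constant of order \(s\).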


2006 ◽  
Vol 16 (07) ◽  
pp. 2093-2101 ◽  
Author(s):  
K. KARAMANOS ◽  
S. NIKOLOPOULOS ◽  
K. HIZANIDIS ◽  
G. MANIS ◽  
A. ALEXANDRIDI ◽  
...  

In this paper we present a novel approach to the analysis of Heart Rate Variability (HRV) data: coarse-graining analysis via the estimation of block entropies with the technique of lumping. HRV time series are generated from long electrocardiogram (ECG) recordings and are then filtered to produce a coarse-grained symbolic dynamics. Block entropy analysis is applied to these dynamics in order to examine their coarse-grained statistics. Our data set comprises two subsets, one of healthy subjects and one of Coronary Artery Disease (CAD) patients. We find that entropy analysis provides a quick and efficient tool for differentiating these series by subject category: healthy subjects exhibit more complex statistics than patients, with the healthy recordings yielding higher block entropy values than the patient ones. We also compare these results with correlation dimension estimation in order to establish coherency. We believe that this analysis may provide a useful statistical method towards a better understanding of the human cardiac system.
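Block entropy "by lumping" means the symbolic sequence is cut into non-overlapping words rather than scanned with a sliding window. A minimal sketch (the binary symbolization of RR intervals below is an illustrative assumption, not the paper's exact filtering):

```python
import math
from collections import Counter

def block_entropy_lumping(symbols, n):
    """Block entropy H(n) estimated by lumping: the sequence is cut
    into non-overlapping words of length n (step size n, not 1) and
    the Shannon entropy of the word frequencies is returned (bits)."""
    words = [tuple(symbols[i:i + n])
             for i in range(0, len(symbols) - n + 1, n)]
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

# Coarse-grained symbolic HRV surrogate: 0 = RR interval below the
# median, 1 = above (one common binary partition of the time series).
regular = [0, 1, 0, 1, 0, 1, 0, 1]       # perfectly periodic rhythm
print(block_entropy_lumping(regular, 2))  # only the word (0, 1) occurs
```

A perfectly periodic sequence yields zero block entropy, while richer symbolic dynamics (as reported for healthy subjects) yield higher values.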


2021 ◽  
Author(s):  
Uwe Ehret

In this contribution, I will suggest an approach to build models as ordered and connected collections of multivariate, discrete probability distributions (dpd's). This approach can be seen as a machine-learning (ML) approach, as it allows very flexible learning from data (almost) without prior constraints. Models can be built on dpd's only (fully data-based models), but dpd's can also be included in existing process-based models at places where the relations among data are not well known (hybrid models). This provides flexibility for learning similar to including other ML approaches, e.g. neural networks, into process-based models, with the advantage that the dpd's can be investigated and interpreted by the modeler as long as their dimensionality remains low. Models based on dpd's are fundamentally probabilistic, and model responses in out-of-sample situations can be assured by dynamically coarse-graining the dpd's: the farther a predictive situation is from the learning situations, the coarser and more uncertain the prediction will be, and vice versa.

I will present the main elements and steps of such dpd-based modeling using several example systems, ranging from simple and deterministic (an ideal spring) to complex (a hydrological system), and will discuss the influence of (i) the size of the available training data set, (ii) the choice of dpd priors, and (iii) binning choices on the models' predictive power.
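A toy sketch of the core idea, under assumptions of my own (2-D histogram as the dpd, pairwise bin merging as the coarse-graining rule, an ideal-spring data set): learn p(x, y) from data, predict p(y | x) for a query, and coarsen the x-binning whenever the query falls in an empty (out-of-sample) bin, so predictions broaden with distance from the training data:

```python
import numpy as np

def fit_dpd(x, y, bins):
    """Learn a discrete joint distribution p(x_bin, y_bin) as a 2-D
    histogram -- the elementary building block of a dpd-based model."""
    hist, xe, ye = np.histogram2d(x, y, bins=bins)
    return hist, xe, ye

def predict(hist, xe, ye, x_query):
    """Return the conditional distribution p(y | x_query's bin).
    If that bin holds no training data, merge adjacent x-bins pairwise
    until it does: a simplified version of dynamic coarse-graining,
    where out-of-sample queries get coarser, more uncertain answers."""
    while True:
        i = min(np.searchsorted(xe, x_query, side="right") - 1,
                hist.shape[0] - 1)
        row = hist[max(i, 0)]
        if row.sum() > 0:
            return row / row.sum()            # populated bin: normalize
        if hist.shape[0] == 1:
            return np.full(hist.shape[1], 1.0 / hist.shape[1])  # uniform
        m = hist.shape[0] // 2 * 2            # coarsen: merge bin pairs
        hist = hist[:m].reshape(-1, 2, hist.shape[1]).sum(axis=1)
        xe = xe[::2]

# Ideal spring, F = -k x, learned purely from noisy data.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 1000)
y = -2.0 * x + rng.normal(0, 0.05, 1000)
hist, xe, ye = fit_dpd(x, y, bins=8)
p = predict(hist, xe, ye, 0.9)
print(p.sum())  # a valid probability distribution over y-bins
```

In-sample queries return a sharp conditional distribution; queries far outside the training range degrade gracefully toward a broad (ultimately uniform) distribution rather than an overconfident extrapolation.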

