High Dimensional Classification of Structural MRI Alzheimer's Disease Data Based on Large Scale Regularization

Author(s):  
Ramon Casanova ◽  
Christopher T. Whitlow ◽  
Benjamin Wagner ◽  
Jeff Williamson ◽  
Sally A. Shumaker ◽  
...  
2004 ◽  
Vol 3 (1) ◽  
pp. 1-24 ◽  
Author(s):  
Markus Ruschhaupt ◽  
Wolfgang Huber ◽  
Annemarie Poustka ◽  
Ulrich Mansmann

We demonstrate a concept and implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: how much does the discriminatory power of a classifier depend on the choice of the algorithm used to identify it; what alternative classifiers could be used just as well; and how robust is the result? The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by examining these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).


Author(s):  
Pasi Luukka ◽  
Jouni Sampo

We have compared differential evolution and genetic algorithms in a study of weight optimization for different similarity measures in a classification task. With high-dimensional data, the weighting of similarity measures becomes very important, and suitable optimizers need to be studied. In this article we study proper weighting of similarity measures in the classification of high-dimensional, large-scale data. We show that in most cases the differential evolution algorithm should be used to find the weights instead of the genetic algorithm.
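The abstract above does not specify the authors' similarity measures or evolution parameters, so the following is only an illustrative sketch of the general idea: a classic DE/rand/1/bin loop searching for per-feature weights that maximize the leave-one-out accuracy of a feature-weighted 1-NN classifier on synthetic data. All names, the fitness function, and the parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny synthetic data set: two informative features plus two noise features.
n, d = 60, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def weighted_accuracy(w, X, y):
    """Leave-one-out accuracy of a feature-weighted 1-NN classifier."""
    correct = 0
    for i in range(len(X)):
        diff = (X - X[i]) * w              # apply feature weights
        dist = (diff ** 2).sum(axis=1)     # squared weighted distance
        dist[i] = np.inf                   # exclude the query point itself
        correct += int(y[dist.argmin()] == y[i])
    return correct / len(X)

def de_optimize(fitness, d, pop_size=20, F=0.8, CR=0.9, gens=30):
    """Classic DE/rand/1/bin, maximizing `fitness` over [0, 1]^d."""
    pop = rng.random((pop_size, d))
    fit = np.array([fitness(ind) for ind in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True  # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = fitness(trial)
            if f_trial >= fit[i]:          # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmax(fit))
    return pop[best], fit[best]

w_best, acc = de_optimize(lambda w: weighted_accuracy(w, X, y), d)
print("best leave-one-out accuracy:", acc)
```

A genetic algorithm would replace the mutation/crossover step with tournament selection and recombination of parent pairs; the fitness function and weight encoding stay the same, which is what makes the two optimizers directly comparable.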


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Ge Song ◽  
Yunming Ye

Textual stream classification has become a realistic and challenging issue, since large-scale, high-dimensional, and non-stationary streams with class imbalance are widely used in various real-life applications. Given the characteristics of textual streams, their classification is technically difficult, especially in an imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from imbalanced textual streams with concept drift (CFIM). The CFIM is based on ensemble learning by integrating a set of clustering trees (CTs). An adaptive selection method, which flexibly chooses the useful CTs according to the properties of the stream, is presented in CFIM. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Compared to most existing approaches, it is worth pointing out that our approach assumes that both the majority class and the rare class may suffer from concept drift; thus the distribution of resampled instances is similar to the current concept. The effectiveness of CFIM is examined on five real-world textual streams under an imbalanced non-stationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models.
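The CFIM paper's clustering trees and adaptive selection are not detailed in the abstract, but the resampling idea it describes, collecting rare-class and misclassified instances from historical chunks and replaying them into the current chunk, can be sketched on its own. The class name, buffer policy, and demo data below are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class RareClassBuffer:
    """Bounded buffer of rare-class and misclassified instances collected
    from historical chunks, replayed into the current training chunk."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self.X, self.y = [], []

    def update(self, X_chunk, y_true, y_pred, rare_label):
        # Keep instances that belong to the rare class or were misclassified.
        keep = (y_true == rare_label) | (y_pred != y_true)
        self.X.extend(X_chunk[keep])
        self.y.extend(y_true[keep])
        # FIFO eviction once the buffer exceeds its capacity.
        self.X = self.X[-self.capacity:]
        self.y = self.y[-self.capacity:]

    def augment(self, X_chunk, y_chunk):
        """Append the buffered instances to the current chunk."""
        if not self.X:
            return X_chunk, y_chunk
        return (np.vstack([X_chunk, np.array(self.X)]),
                np.concatenate([y_chunk, np.array(self.y)]))

# One chunk of a heavily imbalanced stream (~10% rare class).
X = rng.normal(size=(100, 5))
y = (rng.random(100) < 0.1).astype(int)
y_pred = np.zeros(100, dtype=int)          # a naive all-majority predictor

buf = RareClassBuffer()
buf.update(X, y, y_pred, rare_label=1)
X_aug, y_aug = buf.augment(X, y)
print(len(X), "->", len(X_aug), "rare count:", int((y_aug == 1).sum()))
```

Because the buffer only holds recent rare-class and misclassified instances, its contents track the current concept rather than the stream's full history, which matches the abstract's point that the resampled distribution should resemble the current concept.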


2015 ◽  
Vol 112 (8) ◽  
pp. 2479-2484 ◽  
Author(s):  
Tian Ge ◽  
Thomas E. Nichols ◽  
Phil H. Lee ◽  
Avram J. Holmes ◽  
Joshua L. Roffman ◽  
...  

The discovery and prioritization of heritable phenotypes is a computational challenge in a variety of settings, including neuroimaging genetics and analyses of the vast phenotypic repositories in electronic health record systems and population-based biobanks. Classical estimates of heritability require twin or pedigree data, which can be costly and difficult to acquire. Genome-wide complex trait analysis is an alternative tool to compute heritability estimates from unrelated individuals, using genome-wide data that are increasingly ubiquitous, but it is computationally demanding and becomes difficult to apply when evaluating very large numbers of phenotypes. Here we present a fast and accurate statistical method for high-dimensional heritability analysis using genome-wide SNP data from unrelated individuals, termed massively expedited genome-wide heritability analysis (MEGHA), together with accompanying nonparametric sampling techniques that enable flexible inferences for arbitrary statistics of interest. MEGHA produces estimates and significance measures of heritability with several orders of magnitude less computational time than existing methods, making heritability-based prioritization of millions of phenotypes based on data from unrelated individuals tractable for the first time to our knowledge. As a demonstration of application, we conducted heritability analyses on global and local morphometric measurements derived from brain structural MRI scans, using genome-wide SNP data from 1,320 unrelated young healthy adults of non-Hispanic European ancestry. We also computed surface maps of heritability for cortical thickness measures and empirically localized cortical regions where thickness measures were significantly heritable. Our analyses demonstrate the unique capability of MEGHA for large-scale heritability-based screening and high-dimensional heritability profile construction.
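MEGHA itself is not specified in the abstract, but SNP-based heritability methods of this family start from a genetic relationship matrix (GRM) built from standardized genotypes. The sketch below shows only that shared preprocessing step on toy data; the sample sizes, allele frequencies, and standardization at the known frequency are illustrative assumptions, not MEGHA's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy genotypes: n unrelated individuals x m SNPs, coded as 0/1/2
# minor-allele counts drawn at known per-SNP allele frequencies.
n, m = 50, 500
p = rng.uniform(0.1, 0.5, size=m)          # per-SNP minor-allele frequency
G = rng.binomial(2, p, size=(n, m)).astype(float)

# Standardize each SNP: z = (g - 2p) / sqrt(2p(1 - p)),
# so each column has mean ~0 and variance ~1.
Z = (G - 2 * p) / np.sqrt(2 * p * (1 - p))

# Empirical genetic relationship matrix, averaged over SNPs.
A = Z @ Z.T / m

print("GRM shape:", A.shape, "mean diagonal:", round(A.diagonal().mean(), 3))
```

For unrelated individuals the off-diagonal entries of the GRM scatter around zero and the diagonal around one; variance-component methods then estimate how much phenotypic covariance aligns with this matrix, which is the quantity that becomes expensive to recompute across millions of phenotypes.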

