High Dimensional Brushing for Interactive Exploration of Multivariate Data

Multiple imputations (MI) are predominantly applied in such processes that are involved in the transaction of huge chunks of missing data. Multivariate data that follow traditional statistical models undergoes great suffering for the inadequate availability of pertinent data. The field of distributed computing research faces the biggest hurdle in the form of insufficient high dimensional multivariate data. It mainly deals with the analysis of parallel input problems found in the cloud computing network in general and evaluation of high-performance computing in particular. In fact, it is a tough task to utilize parallel multiple input methods for accomplishing remarkable performance as well as allowing huge datasets achieves scale. In this regard, it is essential that a credible data system is developed and a decomposition strategy is used to partition workload in the entire process for minimum data dependence. Subsequently, a moderate synchronization and/or meager communication liability is followed for placing parallel impute methods for achieving scale as well as more processes. The present article proposes many novel applications for better efficiency. As the first step, this article suggests distributed-oriented serial regression multiple imputation for enhancing the efficiency of imputation task in high dimensional multivariate normal data. As the next step, the processes done in three diverse with parallel back ends viz. Multiple imputation that used the socket method to serve serial regression and the Fork Method to distribute work over workers, and also same work experiments in dynamic structure with a load balance mechanism. In the end, the set of distributed MI methods are used to experimentally analyze amplitude of imputation scores spanning across three probable scenarios in the range of 1:500. Further, the study makes an important observation that due to the efficiency of numerous imputation methods, the data is arranged proportionately in a missing range of 10% to 50%, low to high, while dealing with data between 1000 and 100,000 samples. The experiments are done in a cloud environment and demonstrate that it is possible to generate a decent speed by lessening the repetitive communication between processors.

Download Full-text

Tests of Covariance Matrices for High Dimensional Multivariate Data Under Non Normality

Communication in Statistics- Theory and Methods ◽

10.1080/03610926.2013.770533 ◽

2013 ◽

Vol 44 (7) ◽

pp. 1387-1398 ◽

Cited By ~ 1

Author(s):

M. Rauf Ahmad ◽

Dietrich Von Rosen

Keyword(s):

Multivariate Data ◽

Covariance Matrices ◽

High Dimensional

Download Full-text

Systematically Exploring Associations among Multivariate Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6158 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6786-6794

Author(s):

Lifeng Zhang

Keyword(s):

Interaction Effect ◽

Functional Relationship ◽

Multivariate Data ◽

Coefficient Of Determination ◽

High Dimensional ◽

Data Sets ◽

Statistical Tool ◽

Wide Range ◽

Main Effect ◽

Data Points

Detecting relationships among multivariate data is often of great importance in the analysis of high-dimensional data sets, and has received growing attention for decades from both academic and industrial fields. In this study, we propose a statistical tool named the neighbor correlation coefficient (nCor), which is based on a new idea that measures the local continuity of the reordered data points to quantify the strength of the global association between variables. With sufficient sample size, the new method is able to capture a wide range of functional relationship, whether it is linear or nonlinear, bivariate or multivariate, main effect or interaction. The score of nCor roughly approximates the coefficient of determination (R2) of the data which implies the proportion of variance in one variable that is predictable from one or more other variables. On this basis, three nCor based statistics are also proposed here to further characterize the intra and inter structures of the associations from the aspects of nonlinearity, interaction effect, and variable redundancy. The mechanisms of these measures are proved in theory and demonstrated with numerical analyses.

Download Full-text

Canonical correlation analysis in high dimensions with structured regularization

Statistical Modelling ◽

10.1177/1471082x211041033 ◽

2021 ◽

pp. 1471082X2110410

Author(s):

Elena Tuzhilina ◽

Leonardo Tozzi ◽

Trevor Hastie

Keyword(s):

Data Structure ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Multivariate Data ◽

High Dimensional Data ◽

High Dimensional ◽

High Dimensions ◽

Structured Regularization ◽

Data Matrices

Canonical correlation analysis (CCA) is a technique for measuring the association between two multivariate data matrices. A regularized modification of canonical correlation analysis (RCCA) which imposes an [Formula: see text] penalty on the CCA coefficients is widely used in applications with high-dimensional data. One limitation of such regularization is that it ignores any data structure, treating all the features equally, which can be ill-suited for some applications. In this article we introduce several approaches to regularizing CCA that take the underlying data structure into account. In particular, the proposed group regularized canonical correlation analysis (GRCCA) is useful when the variables are correlated in groups. We illustrate some computational strategies to avoid excessive computations with regularized CCA in high dimensions. We demonstrate the application of these methods in our motivating application from neuroscience, as well as in a small simulation example.

Download Full-text

Interactive Glyph Graphics of Multivariate Data in Psychometrics

Methodology ◽

10.1027/1614-2241/a000031 ◽

2011 ◽

Vol 7 (4) ◽

pp. 134-144 ◽

Cited By ~ 3

Author(s):

Ali Ünlü ◽

Waqas Ahmed Malik

Keyword(s):

Software Package ◽

International Student ◽

Statistical Data ◽

Student Assessment ◽

Multivariate Data ◽

High Dimensional ◽

International Student Assessment ◽

Visualization Software ◽

Data Points ◽

Key Features

Gauguin (Grouping And Using Glyphs Uncovering Individual Nuances) is statistical data visualization software for the interactive graphical exploration of multivariate data using glyph representations. Glyphs are defined as geometric shapes scaled by the values of multivariate data. Each glyph represents one high-dimensional data point or the prototype (average) of a group or cluster of data points. This paper reviews the capabilities, functionality, and interactive properties of this software package. Key features of Gauguin are illustrated with data from the Programme for International Student Assessment.

Download Full-text

A note on mean testing for high dimensional multivariate data under non-normality

Statistica Neerlandica ◽

10.1111/j.1467-9574.2012.00533.x ◽

2012 ◽

Vol 67 (1) ◽

pp. 81-99 ◽

Cited By ~ 4

Author(s):

M. Rauf Ahmad ◽

Dietrich von Rosen ◽

Martin Singull

Keyword(s):

Multivariate Data ◽

High Dimensional

Download Full-text

High-Dimensional Depth-Cuing for Guided Tours of Multivariate Data

Computing and Graphics in Statistics - The IMA Volumes in Mathematics and its Applications ◽

10.1007/978-1-4613-9154-8_16 ◽

1991 ◽

pp. 239-252 ◽

Cited By ~ 1

Author(s):

Forrest W. Young ◽

Penny Rheingans

Keyword(s):

Multivariate Data ◽

High Dimensional ◽

Guided Tours

Download Full-text

Supervised classification in high-dimensional space: geometrical, statistical, and asymptotical properties of multivariate data

IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews) ◽

10.1109/5326.661089 ◽

1998 ◽

Vol 28 (1) ◽

pp. 39-54 ◽

Cited By ~ 270

Author(s):

L.O. Jimenez ◽

D.A. Landgrebe

Keyword(s):

Supervised Classification ◽

Dimensional Space ◽

Multivariate Data ◽

High Dimensional ◽

High Dimensional Space

Download Full-text

Nonparametric relevance-shifted multiple testing procedures for the analysis of high-dimensional multivariate data with small sample sizes

BMC Bioinformatics ◽

10.1186/1471-2105-9-54 ◽

2008 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Cornelia Frömke ◽

Ludwig A Hothorn ◽

Siegfried Kropf

Keyword(s):

Multiple Testing ◽

Multivariate Data ◽

Small Sample ◽

High Dimensional ◽

Sample Sizes ◽

Testing Procedures ◽

Multiple Testing Procedures ◽

Small Sample Sizes

Download Full-text

Distortion-Guided Structure-Driven Interactive Exploration of High-Dimensional Data

Computer Graphics Forum ◽

10.1111/cgf.12366 ◽

2014 ◽

Vol 33 (3) ◽

pp. 101-110 ◽

Cited By ~ 16

Author(s):

S. Liu ◽

B. Wang ◽

P.-T. Bremer ◽

V. Pascucci

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Interactive Exploration

Download Full-text