Topological visual analysis of clusterings in high-dimensional information spaces

Patrick Oesterling; Patrick Jähnichen; Gerhard Heyer; Gerik Scheuermann

doi:10.1515/itit-2014-1073

Topological visual analysis of clusterings in high-dimensional information spaces

it - Information Technology ◽

10.1515/itit-2014-1073 ◽

2015 ◽

Vol 57 (1) ◽

Author(s):

Patrick Oesterling ◽

Patrick Jähnichen ◽

Gerhard Heyer ◽

Gerik Scheuermann

Keyword(s):

Visual Analysis ◽

Noisy Data ◽

Information Loss ◽

Local Analysis ◽

High Dimensional ◽

Information Spaces ◽

Domain Specific ◽

Analysis Process ◽

Visual Clutter ◽

Case Structure

In many applications, domain-specific entities are easily compared and categorized if they are represented as high-dimensional feature vectors. To detect object similarities and to quantify coherent groups, analysts often visualize the vectors directly, aiming to identify clusters visually. However, common visualizations for high-dimensional data often suffer from information loss, occlusions and visual clutter for large and noisy data. In this case, structure is misleading and false insights are derived. We use topological concepts to provide a structural view of the points. We analyze them in their original space and depict their clustering structure using intuitive landscapes. We describe the visual analysis process to define and simplify the structural view and to perform local analysis by linking individual features to other visualizations.

Learning from high-dimensional noisy data via projections onto multi-dimensional ellipsoids

2010 IEEE International Conference on Acoustics, Speech and Signal Processing ◽

10.1109/icassp.2010.5495284 ◽

2010 ◽

Author(s):

Liuling Gong ◽

Dan Schonfeld

Keyword(s):

Noisy Data ◽

High Dimensional

Visual analysis to compare academic performances of quota and non-quota students from computer-related programs

10.5753/wei.2020.11154 ◽

2020 ◽

Author(s):

Pedro B. Pio ◽

Igor C. Sodré ◽

Vinicius R. P. Borges

Keyword(s):

Academic Performance ◽

Public Universities ◽

Visual Analysis ◽

Dropout Rates ◽

Quota System ◽

Brazilian Society ◽

Analysis Process

The implementation of affirmative action in public universities is a topic of debate within Brazilian society, especially regarding the academic performance of students who have been admitted through the quota system. This paper describes a visual analysis process to explore and compare the academic performances of quota and non-quota students from computer-related programs in a public Brazilian university. The results reveal that both failure and dropout rates for quota students are slightly higher than those of non-quota students in the first terms, but tend to become similar in the final terms.

Geometric classifiers for high-dimensional noisy data

Journal of Multivariate Analysis ◽

10.1016/j.jmva.2021.104850 ◽

2021 ◽

pp. 104850

Author(s):

Aki Ishii ◽

Kazuyoshi Yata ◽

Makoto Aoshima

Keyword(s):

Noisy Data ◽

High Dimensional

A systematic view on data descriptors for the visual analysis of tabular data

Information Visualization ◽

10.1177/1473871616667767 ◽

2016 ◽

Vol 16 (3) ◽

pp. 232-256 ◽

Cited By ~ 5

Author(s):

Hans-Jörg Schulz ◽

Thomas Nocke ◽

Magnus Heitzler ◽

Heidrun Schumann

Keyword(s):

Data Analysis ◽

Visual Analysis ◽

Data Gathering ◽

Climate Impact ◽

General Concept ◽

Basic Knowledge ◽

Data Types ◽

Process Data ◽

Tabular Data ◽

Analysis Process

Visualization has become an important ingredient of data analysis, supporting users in exploring data and confirming hypotheses. At the beginning of a visual data analysis process, data characteristics are often assessed in an initial data profiling step. These include, for example, statistical properties of the data and information on the data’s well-formedness, which can be used during the subsequent analysis to adequately parametrize views and to highlight or exclude data items. We term this information data descriptors, which can span such diverse aspects as the data’s provenance, its storage schema, or its uncertainties. Gathered descriptors encapsulate basic knowledge about the data and can thus be used as objective starting points for the visual analysis process. In this article, we bring together these different aspects in a systematic form that describes the data itself (e.g. its content and context) and its relation to the larger data gathering and visual analysis process (e.g. its provenance and its utility). Once established in general, we further detail the concept of data descriptors specifically for tabular data as the most common form of structured data today. Finally, we utilize these data descriptors for tabular data to capture domain-specific data characteristics in the field of climate impact research. This procedure from the general concept via the concrete data type to the specific application domain effectively provides a blueprint for instantiating data descriptors for other data types and domains in the future.

Application of Sentiment Analysis in Movie reviews

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch006 ◽

2019 ◽

pp. 77-90 ◽

Cited By ~ 5

Author(s):

ThippaReddy Gadekallu ◽

Akshat Soni ◽

Deeptanu Sarkar ◽

Lakshmanna Kuruva

Keyword(s):

Sentiment Analysis ◽

Experimental Work ◽

Opinion Mining ◽

Rating Systems ◽

Domain Specific ◽

Analysis Process ◽

The People ◽

Textual Data ◽

Feature Based ◽

The Web

Sentiment analysis is a sub-domain of opinion mining in which the analysis focuses on extracting people's emotions and opinions about a particular topic from structured, semi-structured, or unstructured textual data. In this chapter, the authors focus the task of sentiment analysis on the IMDB movie review database. The chapter presents experimental work on a new kind of domain-specific, feature-based heuristic for aspect-level sentiment analysis of movie reviews. The authors have devised an aspect-oriented scheme that analyzes the textual reviews of a movie and assigns it a sentiment label for each aspect. Finally, the authors conclude that incorporating syntactical information in the models is vital to the sentiment analysis process. They also conclude that the proposed approach to sentiment classification supplements the existing movie rating systems used across the web and will serve as a basis for future research in this domain.

A New Visual Analysis Approach to the High Dimensional Data

Journal of Physics Conference Series ◽

10.1088/1742-6596/1098/1/012008 ◽

2018 ◽

Vol 1098 ◽

pp. 012008

Author(s):

Meina Song ◽

Zhihui Pang ◽

E Haihong

Keyword(s):

Visual Analysis ◽

High Dimensional Data ◽

Analysis Approach ◽

High Dimensional

Double-Arc Parallel Coordinates and its Axes re-Ordering Methods

Mobile Networks and Applications ◽

10.1007/s11036-019-01455-9 ◽

2020 ◽

Vol 25 (4) ◽

pp. 1376-1391

Author(s):

Liangfu Lu ◽

Wenbo Wang ◽

Zhiyuan Tan

Keyword(s):

High Dimensional Data ◽

Two Dimensions ◽

High Dimensional ◽

Data Sets ◽

Parallel Coordinates ◽

Practical Applications ◽

Data Volume ◽

Visual Clutter ◽

Correlation Information ◽

Value Decomposition

The Parallel Coordinates Plot (PCP) is a popular technique for the exploration of high-dimensional data. In many cases, researchers apply it as an effective method to analyze and mine data. However, as data volumes grow, visual clutter and data clarity become two of the main challenges in parallel coordinates plots. Although the Arc Coordinates Plot (ACP) is a popular approach to address these challenges, few optimizations and improvements have been made to it. In this paper, we make three main contributions to state-of-the-art PCP methods. One is an improvement of the visual method itself. The other two mainly improve perceptual scalability when the scale or the dimensionality of the data becomes large in some mobile and wireless practical applications. 1) We present an improved visualization method based on ACP, termed the double arc coordinates plot (DACP). It not only reduces the visual clutter in ACP, but also uses a dimension-based bundling method with further optimization to deal with the issues of the conventional PCP. 2) To reduce the clutter caused by the order of the axes and reveal patterns hidden in the data sets, we propose our first dimensional reordering method, a contribution-based method in DACP, which is based on the singular value decomposition (SVD) algorithm. The approach computes the importance score of the attributes (dimensions) of the data using SVD and visualizes the dimensions from left to right in DACP according to that score. 3) Moreover, a similarity-based method, which is based on the combination of a nonlinear correlation coefficient and the SVD algorithm, is proposed as well. To measure the correlation between two dimensions and explain how the two dimensions interact with each other, we propose a reordering method based on nonlinear correlation information measurements. We mainly use mutual information to calculate the partial similarity of dimensions in high-dimensional data visualization, and SVD is used to measure global data. Lastly, we use five case scenarios to evaluate the effectiveness of DACP, and the results show that our approaches not only do well in visualizing multivariate datasets, but also effectively alleviate the visual clutter in the conventional PCP, which gives users a better visual experience.

VafusQ: A methodology to build visual analysis applications with data quality features

Information Visualization ◽

10.1177/1473871618821255 ◽

2018 ◽

Vol 18 (4) ◽

pp. 384-404

Author(s):

Jhon Alejandro Triana ◽

Dirk Zeckzer ◽

Hans Hagen ◽

Jose Tiberio Hernandez

Keyword(s):

Data Quality ◽

Visual Analysis ◽

New Technologies ◽

The Public ◽

Interactive Applications ◽

Public Transport System ◽

Analysis Process ◽

Interactive Visualizations ◽

Quality Features

The use of interactive applications to support the decision-making process is more common every day. However, a huge amount of data is required in order to make more informed decisions. Fortunately, with the arrival of new technologies there are many data sources available. This requirement for data causes heterogeneity and data quality problems. Some data quality problems are reduced in the preprocessing stage. However, many data quality issues persist after the preprocessing stage. For this reason, we propose a methodology that captures the data quality problems, represents them, and simultaneously supports the analysis process. In addition, an application is developed as a use case of the methodology by analyzing the public transport system in Bogotá. Furthermore, a case study is performed to test the usefulness of the developed application. As a result, the methodology made possible the development of interactive visualizations that constitute an application that is useful for achieving the analysis tasks by including data quality features.
