Topological visual analysis of clusterings in high-dimensional information spaces

2015 ◽  
Vol 57 (1) ◽  
Author(s):  
Patrick Oesterling ◽  
Patrick Jähnichen ◽  
Gerhard Heyer ◽  
Gerik Scheuermann

In many applications, domain-specific entities are easily compared and categorized when they are represented as high-dimensional feature vectors. To detect object similarities and to quantify coherent groups, analysts often visualize the vectors directly, aiming to identify clusters visually. However, common visualizations for high-dimensional data often suffer from information loss, occlusion, and visual clutter when the data are large and noisy. In such cases, the depicted structure can be misleading and false insights may be derived. We use topological concepts to provide a structural view of the points. We analyze them in their original space and depict their clustering structure using intuitive landscapes. We describe the visual analysis process used to define and simplify the structural view and to perform local analysis by linking individual features to other visualizations.
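The abstract does not detail how the landscapes are constructed. As an illustration of one standard topological notion of clustering structure computed in the original space, the sketch below uses 0-dimensional persistence of a distance-based (Vietoris-Rips) filtration, evaluated with Kruskal's algorithm and union-find. This is a simpler, swapped-in technique rather than the authors' method, and the names `zero_dim_persistence` and `clusters_at` are hypothetical.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Return the distances at which connected components merge.

    0-dimensional persistence of the Vietoris-Rips filtration: every point
    is born at distance 0, and a component dies when it merges into another.
    Computed with Kruskal's algorithm over all pairwise distances.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two distinct components
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 merge distances; one component never dies


def clusters_at(deaths, threshold):
    """Number of components still separate at the given distance scale."""
    return 1 + sum(1 for d in deaths if d > threshold)
```

For two tight pairs of 4-dimensional points far apart, the two small merge distances indicate within-cluster links and the single large one the gap between clusters; long-lived components are what a landscape view would show as separate peaks.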

2020 ◽  
Author(s):  
Pedro B. Pio ◽  
Igor C. Sodré ◽  
Vinicius R. P. Borges

The implementation of affirmative action in public universities is a topic of debate within Brazilian society, especially regarding the academic performance of students admitted through the quota system. This paper describes a visual analysis process to explore and compare the academic performance of quota and non-quota students in computer-related programs at a public Brazilian university. The results revealed that both failure and dropout rates for quota students are slightly higher than those of non-quota students in the first terms, but tend to become similar in the final terms.


2021 ◽  
pp. 104850
Author(s):  
Aki Ishii ◽  
Kazuyoshi Yata ◽  
Makoto Aoshima

2016 ◽  
Vol 16 (3) ◽  
pp. 232-256 ◽  
Author(s):  
Hans-Jörg Schulz ◽  
Thomas Nocke ◽  
Magnus Heitzler ◽  
Heidrun Schumann

Visualization has become an important ingredient of data analysis, supporting users in exploring data and confirming hypotheses. At the beginning of a visual data analysis process, data characteristics are often assessed in an initial data profiling step. These include, for example, statistical properties of the data and information on the data’s well-formedness, which can be used during the subsequent analysis to adequately parametrize views and to highlight or exclude data items. We term this information data descriptors, which can span such diverse aspects as the data’s provenance, its storage schema, or its uncertainties. Gathered descriptors encapsulate basic knowledge about the data and can thus be used as objective starting points for the visual analysis process. In this article, we bring together these different aspects in a systematic form that describes the data itself (e.g. its content and context) and its relation to the larger data gathering and visual analysis process (e.g. its provenance and its utility). Having established the concept in general, we further detail data descriptors specifically for tabular data, the most common form of structured data today. Finally, we utilize these data descriptors for tabular data to capture domain-specific data characteristics in the field of climate impact research. This progression from the general concept, via a concrete data type, to a specific application domain provides a blueprint for instantiating data descriptors for other data types and domains in the future.
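To make the idea of descriptors for tabular data concrete, here is a minimal profiling sketch, assuming the table arrives as a list of row dictionaries (for instance from `csv.DictReader`). The descriptor set (missing count, distinct count, numeric flag, min/max) and the function name `describe_columns` are illustrative assumptions, not the article's descriptor schema.

```python
def describe_columns(rows):
    """Compute simple per-column descriptors for a table given as dicts.

    Hypothetical descriptor set: number of missing values, number of
    distinct present values, whether every present value parses as a
    number, and min/max for numeric columns.
    """
    columns = {key for row in rows for key in row}
    descriptors = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        # Treat absent keys and empty strings as missing (well-formedness).
        present = [v for v in values if v not in (None, "")]
        numeric = []
        for v in present:
            try:
                numeric.append(float(v))
            except (TypeError, ValueError):
                break  # one non-numeric value makes the column non-numeric
        is_numeric = bool(present) and len(numeric) == len(present)
        desc = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "numeric": is_numeric,
        }
        if is_numeric:
            # Statistical properties usable to parametrize view axes.
            desc["min"] = min(numeric)
            desc["max"] = max(numeric)
        descriptors[col] = desc
    return descriptors
```

Descriptors such as the missing count could then drive the highlighting or exclusion of data items that the abstract mentions, while min/max could set axis ranges.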


Author(s):  
ThippaReddy Gadekallu ◽  
Akshat Soni ◽  
Deeptanu Sarkar ◽  
Lakshmanna Kuruva

Sentiment analysis is a subdomain of opinion mining that focuses on extracting people's emotions and opinions about a particular topic from structured, semi-structured, or unstructured textual data. In this chapter, the authors apply sentiment analysis to the IMDB movie review database. The chapter presents experimental work on a new domain-specific, feature-based heuristic for aspect-level sentiment analysis of movie reviews. The authors devise an aspect-oriented scheme that analyzes the textual reviews of a movie and assigns a sentiment label to each aspect. Finally, the authors conclude that incorporating syntactic information into the models is vital to the sentiment analysis process, and that the proposed approach to sentiment classification supplements the existing movie rating systems used across the web and will serve as a basis for future research in this domain.
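The chapter's actual features and heuristic are not given in the abstract. The sketch below only illustrates what an aspect-level labeling step can look like, using a tiny hypothetical aspect vocabulary (`ASPECTS`), polarity lexicon (`LEXICON`), and negation list; it scores each sentence and credits that score to every aspect mentioned in it. It uses no syntactic information, which the authors report to be vital, so it is a deliberately simplified baseline rather than their method.

```python
import re

# Hypothetical aspect keywords and polarity lexicon, for illustration only.
ASPECTS = {
    "acting": {"acting", "actor", "actors", "cast", "performance"},
    "plot": {"plot", "story", "script", "storyline"},
    "music": {"music", "soundtrack", "score"},
}
LEXICON = {"great": 1, "brilliant": 1, "good": 1, "moving": 1,
           "bad": -1, "boring": -1, "weak": -1, "awful": -1}
NEGATIONS = {"not", "never", "no"}


def aspect_sentiment(review):
    """Label each aspect mentioned in a review as positive/negative/neutral."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        # Sentence polarity; a negation flips the next sentiment word.
        polarity, negate = 0, False
        for tok in tokens:
            if tok in NEGATIONS:
                negate = True
            elif tok in LEXICON:
                polarity += -LEXICON[tok] if negate else LEXICON[tok]
                negate = False
        # Credit the sentence polarity to every aspect it mentions.
        for aspect, keywords in ASPECTS.items():
            if keywords.intersection(tokens):
                scores[aspect] = scores.get(aspect, 0) + polarity
    return {a: "positive" if s > 0 else "negative" if s < 0 else "neutral"
            for a, s in scores.items()}
```

Because polarity is pooled per sentence, a review such as "Great cast, awful script." yields neutral labels for both aspects; attaching opinion words to their syntactic targets is exactly the kind of information this baseline lacks.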


2020 ◽  
Vol 25 (4) ◽  
pp. 1376-1391
Author(s):  
Liangfu Lu ◽  
Wenbo Wang ◽  
Zhiyuan Tan

The Parallel Coordinates Plot (PCP) is a popular technique for exploring high-dimensional data, and researchers often apply it as an effective method to analyze and mine data. However, as data volumes grow, visual clutter and data clarity have become two of the main challenges for parallel coordinates plots. Although the Arc Coordinates Plot (ACP) is a popular approach to address these challenges, few optimizations or improvements have been made to it. In this paper, we make three main contributions to state-of-the-art PCP methods. The first improves the visual method itself; the other two mainly improve perceptual scalability when the size or dimensionality of the data becomes large, as in some practical mobile and wireless applications. 1) We present an improved visualization method based on ACP, termed the double arc coordinates plot (DACP). It not only reduces the visual clutter in ACP but also uses a dimension-based bundling method, with further optimization, to deal with the issues of the conventional PCP. 2) To reduce the clutter caused by the order of the axes and to reveal patterns hidden in the data sets, we propose our first dimension reordering method, a contribution-based method in DACP that relies on the singular value decomposition (SVD). The approach computes an importance score for each attribute (dimension) of the data using SVD and arranges the dimensions from left to right in DACP according to this score. 3) In addition, we propose a similarity-based method that combines a nonlinear correlation coefficient with SVD. To measure the correlation between two dimensions and explain how they interact, this reordering method relies on nonlinear correlation information measures.
We mainly use mutual information to calculate the partial similarity of dimensions in high-dimensional data visualization, while SVD is used to measure the global structure of the data. Lastly, we use five case scenarios to evaluate the effectiveness of DACP, and the results show that our approaches not only visualize multivariate datasets well but also effectively alleviate the visual clutter of the conventional PCP, giving users a better visual experience.
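The abstract does not define the SVD importance score or how mutual information and SVD are combined, so the sketch below fills those gaps with stated assumptions. The contribution score of a standardized column is taken as its share of the variance captured by the top-k singular directions; mutual information uses a simple histogram (plug-in) estimator; and `similarity_order` is a hypothetical greedy rule that starts from the top-scored axis and keeps appending the remaining axis most informative about the last one. All function names are illustrative, and no arc drawing is attempted.

```python
import numpy as np


def svd_contribution_order(data, k=1):
    """Order columns by an assumed SVD contribution score, highest first.

    Columns are standardized; column j then scores
    sum_{i<k} s_i**2 * V[j, i]**2, its share of the variance captured by
    the top-k singular directions. Keeping all components would give every
    standardized column the same score, hence the truncation to k.
    """
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    k = min(k, len(s))
    scores = (s[:k] ** 2) @ (vt[:k] ** 2)  # shape: (n_columns,)
    return [int(j) for j in np.argsort(-scores)], scores


def mutual_information(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0  # empty cells contribute 0 to the sum
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def similarity_order(data, k=1, bins=16):
    """Hypothetical greedy axis order combining both measures.

    Start with the top SVD-scored column, then repeatedly append the
    unplaced column sharing the most mutual information with the last
    placed one, so that related axes end up adjacent.
    """
    order, _ = svd_contribution_order(data, k)
    placed, remaining = [order[0]], order[1:]
    while remaining:
        last = data[:, placed[-1]]
        nxt = max(remaining,
                  key=lambda j: mutual_information(last, data[:, j], bins))
        placed.append(nxt)
        remaining.remove(nxt)
    return placed
```

For example, with two strongly correlated columns and one independent noise column, both orderings should place the noise column last. The histogram estimator is biased upward for small samples, so a finer binning or a nearest-neighbor estimator may be preferable in practice.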


2018 ◽  
Vol 18 (4) ◽  
pp. 384-404
Author(s):  
Jhon Alejandro Triana ◽  
Dirk Zeckzer ◽  
Hans Hagen ◽  
Jose Tiberio Hernandez

The use of interactive applications to support decision-making is becoming more common every day. However, making better-informed decisions requires a large amount of data. Fortunately, with the arrival of new technologies, many data sources are available. This demand for data, however, introduces heterogeneity and data quality problems. Some data quality problems are reduced in the preprocessing stage, but many persist after it. For this reason, we propose a methodology to capture data quality problems, represent them, and simultaneously support the analysis process. In addition, we develop an application as a use case of the methodology by analyzing the public transport system in Bogotá, and we perform a case study to test the usefulness of the developed application. As a result, the methodology enabled the development of interactive visualizations that together form an application useful for achieving the analysis tasks by including data quality features.

