Mapping texts through dimensionality reduction and visualization techniques for interactive exploration of document collections

2006
Author(s): Alneu de Andrade Lopes, Rosane Minghim, Vinícius Melo, Fernando V. Paulovich

Entropy, 2019, Vol 21 (7), pp. 669
Author(s): António M. Lopes, J. A. Tenreiro Machado

This paper considers several distinct mathematical and computational tools, namely complexity, dimensionality-reduction, clustering, and visualization techniques, for characterizing music. Digital representations of musical works of four artists are analyzed by means of distinct indices and visualized using the multidimensional scaling technique. The results are then correlated with the artists’ musical production. The patterns found in the data demonstrate the effectiveness of the approach for assessing the complexity of musical information.


2017, Vol 869, pp. 212-225
Author(s): Diana Fernandez-Prieto, Hans Hagen

For decades, multiple lighting simulation software packages and plugins for commercial software have been developed in an effort to ease the usage and integration of simulation into the lighting design process. In this effort, one of the main challenges is to provide lighting designers with easy and comprehensive access to simulation results. Visualization is used as a means to achieve this goal. In this paper, we explore two of the most widely used free lighting simulation packages to identify visualization techniques that facilitate access to the simulation results, as well as opportunities for enhancing simulation-assisted lighting design processes. A test case of a metal workshop illustrates the output produced by both software packages. Based on this exploration, we identified an open gap regarding three main aspects: interactive exploration of simulation results, visualization of compliance with lighting standards, and visual comparison of lighting solutions. We discuss how approaches from other domains can be applied to close this gap.


2015, Vol 15 (2), pp. 154-172
Author(s): Danilo B Coimbra, Rafael M Martins, Tácito TAT Neves, Alexandru C Telea, Fernando V Paulovich

Understanding three-dimensional projections created by dimensionality reduction from high-variate datasets is very challenging. In particular, classical three-dimensional scatterplots used to display such projections do not explicitly show the relations between the projected points, the viewpoint used to visualize the projection, and the original data variables. To explore and explain such relations, we propose a set of interactive visualization techniques. First, we adapt and enhance biplots to show the data variables in the projected three-dimensional space. Next, we use a set of interactive bar chart legends to show variables that are visible from a given viewpoint and also assist users to select an optimal viewpoint to examine a desired set of variables. Finally, we propose an interactive viewpoint legend that provides an overview of the information visible in a given three-dimensional projection from all possible viewpoints. Our techniques are simple to implement and can be applied to any dimensionality reduction technique. We demonstrate our techniques on the exploration of several real-world high-dimensional datasets.


Author(s): Alison Smith, Tak Yeon Lee, Forough Poursabzi-Sangdeh, Jordan Boyd-Graber, Niklas Elmqvist, ...

Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques—word lists, word lists with bars, word clouds, and network graphs—against each other and against automatically generated labels. Our basis of comparison is participant ratings of how well labels describe documents from the topic. Our study has two phases: a labeling phase where participants label visualized topics and a validation phase where different participants select which labels best describe the topics’ documents. Although all visualizations produce similar quality labels, simple visualizations such as word lists allow participants to quickly understand topics, while complex visualizations take longer but expose multi-word expressions that simpler visualizations obscure. Automatic labels lag behind user-created labels, but our dataset of manually labeled topics highlights linguistic patterns (e.g., hypernyms, phrases) that can be used to improve automatic topic labeling algorithms.
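
As a rough illustration of the "word lists with bars" style of topic visualization mentioned above, the following sketch renders a topic's most probable words as a plain-text list with proportional bars. The function name `word_list_with_bars`, its parameters, and the example topic probabilities are all hypothetical; this is not the interface or data used in the study.

```python
def word_list_with_bars(topic, top_n=5, width=20):
    """Render the top_n words of a topic as text lines with proportional bars.

    topic: dict mapping word -> probability (illustrative input format).
    The most probable word gets a bar of `width` characters; the others are
    scaled relative to it.
    """
    top = sorted(topic.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not top or top[0][1] <= 0:
        return []
    longest = max(len(word) for word, _ in top)
    peak = top[0][1]
    lines = []
    for word, prob in top:
        bar = "#" * max(1, round(width * prob / peak))
        lines.append(f"{word.ljust(longest)} {bar} {prob:.2f}")
    return lines


# Made-up topic, for demonstration only.
example_topic = {"model": 0.20, "topic": 0.10, "word": 0.05, "label": 0.02}
rendered = word_list_with_bars(example_topic)
print("\n".join(rendered))
```

A plain word list is the same output without the bar column; the bar adds the relative weight of each word at the cost of a little more visual complexity.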


2009, Vol 6 (2), pp. 217-227
Author(s): Aswani Kumar

Domains such as text and images contain large amounts of redundancy and ambiguity among their attributes, which result in considerable noise (i.e., the data are high-dimensional). Retrieving data from high-dimensional datasets is a major challenge. Dimensionality reduction techniques have been a successful avenue for automatically extracting latent concepts by removing noise and reducing the complexity of processing high-dimensional data. In this paper, we conduct a systematic comparison of unsupervised dimensionality reduction techniques for the text retrieval task. We analyze these techniques in terms of complexity, approximation error, and retrieval quality, with experiments on four test document collections.


Author(s): P. Alagambigai, K. Thangavel

Visualization techniques could enhance existing methods for knowledge and data discovery by increasing user involvement in the interactive process. VISTA, an interactive visual cluster rendering system, is known to be an effective model that allows the user to interactively observe clusters in a series of continuously changing visualizations through visual tuning. Identifying the dominating dimensions for visual tuning, and computing visual distances, becomes tedious as the dimensionality of the dataset increases. One common approach to this problem is dimensionality reduction. This chapter compares the performance of three proposed feature selection methods, namely Entropy Weighting Feature Selection, Outlier Score Based Feature Selection, and Contribution to the Entropy Based Feature Selection, for an interactive visual clustering system. The cluster quality of the three feature selection methods is also compared. The experiments are carried out on various datasets from the University of California, Irvine (UCI) machine learning data repository.


2005, Vol 4 (2), pp. 96-113
Author(s): Jinwook Seo, Ben Shneiderman

Interactive exploration of multidimensional data sets is challenging because: (1) it is difficult to comprehend patterns in more than three dimensions, and (2) current systems are often a patchwork of graphical and statistical methods, leaving many researchers uncertain about how to explore their data in an orderly manner. We offer a set of principles and a novel rank-by-feature framework that could enable users to better understand distributions in one (1D) or two dimensions (2D), and then discover relationships, clusters, gaps, outliers, and other features. Users of our framework can view graphical presentations (histograms, boxplots, and scatterplots), and then choose a feature detection criterion to rank 1D or 2D axis-parallel projections. By combining information visualization techniques (overview, coordination, and dynamic query) with summaries and statistical methods, users can systematically examine the most important 1D and 2D axis-parallel projections. We summarize our Graphics, Ranking, and Interaction for Discovery (GRID) principles as: (1) study 1D, study 2D, then find features; (2) ranking guides insight, statistics confirm. We implemented the rank-by-feature framework in the Hierarchical Clustering Explorer, but the same data exploration principles could enable users to organize their discovery process so as to produce more thorough analyses and extract deeper insights in any multidimensional data application, such as spreadsheets, statistical packages, or information visualization tools.
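
The idea of ranking 2D axis-parallel projections by a chosen criterion can be sketched in a few lines. The example below assumes Pearson correlation as the ranking criterion and uses a small made-up table; the function names `pearson` and `rank_2d_projections` are hypothetical and are not taken from the Hierarchical Clustering Explorer.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_x * var_y)


def rank_2d_projections(columns, criterion=pearson):
    """Score every pair of columns and sort by absolute score, strongest first.

    columns: dict mapping column name -> list of values (assumed format).
    """
    scored = [
        ((a, b), criterion(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Made-up data, for demonstration only.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 69, 80, 92],
    "shoe": [36, 41, 38, 44, 40],
}
ranking = rank_2d_projections(data)
for (a, b), score in ranking:
    print(f"{a} vs {b}: {score:+.3f}")
```

Because the criterion is passed in as a function, other feature detection criteria (for example, a count of outliers) could be substituted without changing the ranking step.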


Author(s): Antonio M. Rinaldi

The need to manage electronic documents is an open issue in the digital era. It becomes a challenging problem on the internet, where the large amount of data calls for ever more efficient and effective methods and techniques for mining and representing information. In this context, document summarization, browsing processes, and visualization techniques have had a great impact on several dimensions of user information perception. The use of ontologies for knowledge representation has also grown rapidly in recent years in several application domains, together with social-based techniques such as tag clouds. This form of visualization tool is becoming particularly useful in the interaction between users and social applications, where a huge amount of data requires effective and efficient interfaces. In this article, the authors propose a novel methodology based on a combination of ontologies and tag clouds for browsing and summarizing web document collections. They call this tool the Semantic Tag Cloud.
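
For readers unfamiliar with tag clouds, the sketch below shows only the basic frequency-to-font-size mapping behind a plain tag cloud. It does not include the ontology-based semantic step the article proposes; the function name `tag_cloud`, the stopword list, and the sample documents are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def tag_cloud(documents, top_n=5, min_size=10, max_size=32):
    """Return (term, font_size) pairs for the top_n most frequent terms.

    Font sizes are scaled linearly between min_size and max_size according
    to term frequency among the selected terms.
    """
    counts = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z]+", doc.lower())
        if word not in STOPWORDS
    )
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = highest - lowest
    cloud = []
    for term, count in top:
        scale = (count - lowest) / span if span else 1.0
        cloud.append((term, round(min_size + scale * (max_size - min_size))))
    return cloud


# Made-up documents, for demonstration only.
docs = [
    "Ontology tools for web documents",
    "Web documents and ontology browsing",
    "Browsing the web",
]
cloud = tag_cloud(docs, top_n=3)
print(cloud)
```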

