MCLEAN: Multilevel Clustering Exploration As Network

10.7287/peerj.preprints.3448 ◽

2017 ◽

Author(s):

Daniel Alcaide ◽

Jan Aerts

Keyword(s):

Data Analysis ◽

Visual Analytics ◽

User Study ◽

R Package ◽

Agglomerative Hierarchical Clustering ◽

Levels Of Detail ◽

Heterogeneous Datasets ◽

Community Finding ◽

Multilevel Representations ◽

Different Levels

Finding useful patterns in datasets has attracted considerable interest in the field of visual analytics. One of the most common tasks is the identification and representation of clusters. However, this is non-trivial in heterogeneous datasets since the data needs to be analyzed from different perspectives. Indeed, highly variable patterns may mask underlying trends in the dataset. Dendrograms are graphical representations resulting from agglomerative hierarchical clustering and provide a framework for viewing the clustering at different levels of detail. However, dendrograms become cluttered when the dataset gets large, and the single cut of the dendrogram to demarcate different clusters can be insufficient in heterogeneous datasets. In this work, we propose a visual analytics methodology called MCLEAN that offers a general approach for guiding the user through the exploration and detection of clusters. Powered by a graph-based transformation of the relational data, it supports a scalable environment for representation of heterogeneous datasets by changing the spatialization. We thereby combine multilevel representations of the clustered dataset with community finding algorithms. Our approach entails displaying the results of the heuristics to users, providing a setting from which to start the exploration and data analysis. To evaluate our proposed approach, we conduct a qualitative user study, where participants are asked to explore a heterogeneous dataset, comparing the results obtained by MCLEAN with the dendrogram. These qualitative results reveal that MCLEAN is an effective way of aiding users in the detection of clusters in heterogeneous datasets. The proposed methodology is implemented in an R package available at https://bitbucket.org/vda-lab/mclean
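MCLEAN itself is an R package and its pipeline is not reproduced here. As a rough, hypothetical illustration of viewing one clustering at several levels of detail through a graph, the Python sketch below links every pair of points whose Euclidean distance is at most a threshold `h` and takes the connected components. For single-linkage clustering this corresponds to cutting the dendrogram at height `h`. The function names, the Euclidean metric and the use of connected components in place of a proper community-finding algorithm are all assumptions of this sketch.

```python
import math
from itertools import combinations


def threshold_graph(points, h):
    """Adjacency sets linking points whose Euclidean distance is <= h."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= h:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def components(adj):
    """Connected components via iterative depth-first search."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


def multilevel_clusters(points, heights):
    """Single-linkage clusters at several cut heights, one level of detail each."""
    return {h: components(threshold_graph(points, h)) for h in heights}


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0)]
    for h, groups in multilevel_clusters(pts, [0.5, 1.5, 6.0, 25.0]).items():
        print(h, groups)
```

At a small `h` every point is its own cluster; raising `h` merges them step by step, which is the multilevel view that a single dendrogram cut collapses into one partition.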

DataSite: Proactive visual data exploration with computation of insight-based recommendations

Information Visualization ◽

10.1177/1473871618806555 ◽

2018 ◽

Vol 18 (2) ◽

pp. 251-267 ◽

Cited By ~ 10

Author(s):

Zhe Cui ◽

Sriram Karthik Badam ◽

M Adil Yalçin ◽

Niklas Elmqvist

Keyword(s):

Data Analysis ◽

Visual Analytics ◽

Domain Knowledge ◽

Recommendation System ◽

User Study ◽

Data Exploration ◽

Server Side ◽

Knowledge Requirements ◽

High Knowledge ◽

Visual Data Exploration

Effective data analysis ideally requires the analyst to have high expertise as well as high knowledge of the data. Even with such familiarity, manually pursuing all potential hypotheses and exploring all possible views is impractical. We present DataSite, a proactive visual analytics system where the burden of selecting and executing appropriate computations is shared by an automatic server-side computation engine. Salient features identified by these automatic background processes are surfaced as notifications in a feed timeline. DataSite effectively turns data analysis into a conversation between analyst and computer, thereby reducing the cognitive load and domain knowledge requirements. We validate the system with a user study comparing it to a recent visualization recommendation system, yielding significant improvement, particularly for complex analyses that existing analytics systems do not support well.
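The abstract does not specify which computations DataSite runs. As a hypothetical example of one background check such an engine could perform, the sketch below computes Pearson correlations between all pairs of numeric columns and turns the strong ones into feed-style notification strings. The function names, the message format and the 0.8 threshold are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; report 0
    return sxy / math.sqrt(sxx * syy)


def correlation_notifications(table, threshold=0.8):
    """Return (message, |r|) pairs for strongly correlated columns, strongest first."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((f"{a} and {b} are strongly correlated (r = {r:.2f})", abs(r)))
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, `correlation_notifications({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]})` yields a single notification for columns `a` and `b`.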

suddengains: An R package to identify sudden gains in longitudinal data

10.31234/osf.io/2wa84 ◽

2019 ◽

Author(s):

Milan Wiedemann ◽

Graham R Thew ◽

Richard Stott ◽

Anke Ehlers

Keyword(s):

Longitudinal Data ◽

Psychological Intervention ◽

R Package ◽

Outcome Variable ◽

Individual Change ◽

Exact Methods ◽

Sudden Gains ◽

Levels Of Detail ◽

Data Files ◽

Different Levels

Sudden gains are large and stable improvements in an outcome variable between consecutive measurements, for example during a psychological intervention with multiple assessments. Researching these occurrences could help researchers understand individual change processes in longitudinal data. Three criteria are generally used to identify sudden gains in psychological interventions. However, applying these criteria can be time-consuming and prone to errors if not fully automated. Adaptations to these criteria and methodological decisions, such as how multiple gains are handled, vary across studies and are reported with different levels of detail. These problems limit the comparability of individual studies and make it hard to understand or replicate the exact methods used. The R package suddengains provides a set of tools to facilitate sudden gains research. This article illustrates how to use the package to identify sudden gains or sudden losses and how to extract descriptive statistics as well as exportable data files for further analysis. It also outlines how these analyses can be customised to apply adaptations of the standard criteria. The suddengains package therefore offers significant scope to improve the efficiency, reporting, and reproducibility of sudden gains research.
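The abstract does not list the three criteria. The sketch below follows the formulation commonly attributed to Tang and DeRubeis (1999): an absolute cutoff on the gain, a gain of at least 25% of the pregain score, and a significant difference between the mean of the three sessions before and the three sessions after the gain. The 7-point cutoff, the pooled two-sample t form of the third criterion (critical value 2.776 for df = 4) and all function names are assumptions; this is not the suddengains package's code, which also handles missing values and other adaptations.

```python
import math
from statistics import mean, variance


def is_sudden_gain(scores, i, cutoff=7.0, relative=0.25, t_crit=2.776):
    """Check whether the change from session i to session i + 1 is a sudden gain.

    Lower scores mean improvement (e.g. a symptom questionnaire). Requires three
    sessions up to and including i and three sessions from i + 1 onwards.
    """
    if i < 2 or i + 3 >= len(scores):
        return False
    gain = scores[i] - scores[i + 1]
    # Criterion 1: the gain is large in absolute terms.
    if gain < cutoff:
        return False
    # Criterion 2: the gain is at least `relative` (25%) of the pregain score.
    if scores[i] <= 0 or gain / scores[i] < relative:
        return False
    # Criterion 3: pre- and post-gain means differ by more than t_crit standard errors
    # (assumed here: pooled-variance two-sample t comparison).
    pre, post = scores[i - 2:i + 1], scores[i + 1:i + 4]
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean(pre) - mean(post) > t_crit * se


def find_sudden_gains(scores, **kwargs):
    """Indices i of pregain sessions at which a sudden gain occurs."""
    return [i for i in range(len(scores) - 1) if is_sudden_gain(scores, i, **kwargs)]
```

With `scores = [30, 29, 31, 18, 17, 19, 18]`, only the drop from session index 2 to 3 passes all three checks.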

Change Detection for Building Footprints with Different Levels of Detail Using Combined Shape and Pattern Analysis

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7100406 ◽

2018 ◽

Vol 7 (10) ◽

pp. 406 ◽

Cited By ~ 1

Author(s):

Xiaodong Zhou ◽

Zhe Chen ◽

Xiang Zhang ◽

Tinghua Ai

Keyword(s):

Change Detection ◽

Urban Studies ◽

Spatial Databases ◽

Data Matching ◽

Levels Of Detail ◽

Heterogeneous Datasets ◽

Rule Sets ◽

The Many ◽

Measurable Factors ◽

Different Levels

Crowd-sourced geographic information is becoming increasingly available, providing diverse and timely sources for updating existing spatial databases to facilitate urban studies, geoinformatics, and real estate practices. However, discrepancies between heterogeneous datasets present challenges for automated change detection. In this paper, we identify important measurable factors that account for issues such as boundary mismatch, large offsets, and discrepancies in the levels of detail between the more current and the to-be-updated datasets. These factors are organized into rule sets that include data matching, merging of many-to-many correspondences, controlled displacement, shape similarity, morphology of difference parts, and the building pattern constraint. We tested our approach against OpenStreetMap and a Dutch topographic dataset (TOP10NL). Experiments in which individual components were removed or added show that our approach (accuracy = 0.90) significantly outperformed a basic geometric method (0.77) commonly used in previous studies, implying more reliable change detection in realistic update scenarios. We further found that distinguishing between small and large buildings was a useful heuristic in creating the rules.
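The abstract does not spell out the rule sets. As a hypothetical illustration of two ingredients it mentions, measurable geometric factors and separate treatment of small and large buildings, the sketch below compares an already matched pair of footprints by centroid offset and area ratio, allowing a larger offset for large buildings. The thresholds, names and the specific rule are assumptions, not the authors' method.

```python
import math


def area_and_centroid(poly):
    """Area and centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    a = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a /= 2.0
    # Dividing by the signed area keeps the centroid correct for either vertex order.
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def classify_pair(old, new, small_area=150.0, tol_small=2.0, tol_large=4.0, min_ratio=0.8):
    """Label a matched pair of footprints 'unchanged' or 'changed' (hypothetical rule)."""
    a_old, c_old = area_and_centroid(old)
    a_new, c_new = area_and_centroid(new)
    offset = math.dist(c_old, c_new)
    ratio = min(a_old, a_new) / max(a_old, a_new)
    # Small and large buildings get different offset tolerances.
    tol = tol_small if max(a_old, a_new) < small_area else tol_large
    return "unchanged" if offset <= tol and ratio >= min_ratio else "changed"


def square(x, y, s):
    """Axis-aligned square footprint with lower-left corner (x, y) and side s."""
    return [(x, y), (x + s, y), (x + s, y + s), (x, y + s)]
```

Under these assumed thresholds, a 3-unit shift is tolerated for a 20 × 20 footprint but flagged as a change for a 10 × 10 one.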

Toward a deeper understanding of the relationship between interaction constraints and visual isomorphs

Information Visualization ◽

10.1177/1473871611433712 ◽

2012 ◽

Vol 11 (3) ◽

pp. 222-236 ◽

Cited By ~ 2

Author(s):

Wenwen Dou ◽

Caroline Ziemkiewicz ◽

Lane Harrison ◽

Dong Hyun Jeong ◽

William Ribarsky ◽

...

Keyword(s):

Problem Solving ◽

Cognitive Science ◽

Visual Analytics ◽

User Study ◽

Critical Role ◽

Science Literature ◽

Manual Manipulation ◽

Interaction Constraints ◽

The Relationship ◽

Different Levels

Interaction and manual manipulation have been shown in the cognitive science literature to play a critical role in problem solving. Given different types of interactions or constraints on interactions, a problem can appear to have different degrees of difficulty. While this relationship between interaction and problem solving has been well studied in the cognitive science literature, the visual analytics community has yet to exploit this understanding for analytical problem solving. In this paper, we hypothesize that constraints on interactions and constraints encoded in visual representations can lead to strategies of varying effectiveness during problem solving. To test our hypothesis, we conducted a user study in which participants were given different levels of interaction constraints when solving a simple mathematical game called number scrabble. Number scrabble is known to have an optimal visual problem isomorph, and the goal of this study is to learn whether and how participants could derive the isomorph and to analyze the strategies they use in solving the problem. Our results indicate that constraints on interactions do affect problem solving, and that although the optimal visual isomorph is difficult to derive, certain interaction constraints can lead to a higher chance of deriving it.
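The abstract does not name the isomorph, but number scrabble is classically known to be isomorphic to tic-tac-toe played on a 3 × 3 magic square: choosing numbers from 1 to 9 so that three of your picks sum to 15 is the same as completing a row, column or diagonal of the square. The short check below verifies that correspondence; it illustrates the game only, not the study's procedure.

```python
from itertools import combinations

# Lo Shu magic square: every row, column and diagonal sums to 15.
MAGIC = [[2, 7, 6],
         [9, 5, 1],
         [4, 3, 8]]


def winning_triples():
    """All sets of three distinct numbers from 1..9 summing to 15 (number scrabble wins)."""
    return {frozenset(c) for c in combinations(range(1, 10), 3) if sum(c) == 15}


def magic_square_lines():
    """The eight tic-tac-toe lines of the magic square, as sets of numbers."""
    rows = [frozenset(r) for r in MAGIC]
    cols = [frozenset(MAGIC[r][c] for r in range(3)) for c in range(3)]
    diags = [frozenset(MAGIC[i][i] for i in range(3)),
             frozenset(MAGIC[i][2 - i] for i in range(3))]
    return set(rows + cols + diags)


if __name__ == "__main__":
    print(len(winning_triples()), winning_triples() == magic_square_lines())  # 8 True
```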

Development of a Simulation-Based Process Chain – Strategy for Different Levels of Detail for the Preprocessing Definitions

SNE Simulation Notes Europe ◽

10.11128/sne.21.tn.10081 ◽

2011 ◽

Vol 21 (3-4) ◽

pp. 135-140 ◽

Cited By ~ 5

Author(s):

Toni A. Krol ◽

Sebastian Westhäuser ◽

M. F. Zäh ◽

Johannes Schilp ◽

G. Groth

Keyword(s):

Process Chain ◽

Levels Of Detail ◽

Simulation Based ◽

Different Levels

Preprocessing Profiling Model for Visual Analytics

10.5753/sibgrapi.est.2020.12991 ◽

2020 ◽

Author(s):

Alessandra Maciel Paz Milani ◽

Fernando V. Paulovich ◽

Isabel Harb Manssour

Keyword(s):

Data Mining ◽

Data Analysis ◽

Visual Analytics ◽

Data Preprocessing ◽

Interview Study ◽

Raw Data ◽

Important Stage ◽

Analysis Process

Analyzing and managing raw data remain a challenging part of the data analysis process, particularly data preprocessing. Although there are studies proposing design implications or recommendations for visualization solutions in data analysis, they do not focus on challenges during the preprocessing phase. Likewise, current Visual Analytics processes do not treat preprocessing as an equally important stage. Thus, with this study, we aim to contribute to the discussion of how methods of visualization and data mining can be used and combined to assist data analysts during preprocessing activities. To achieve that, we introduce the Preprocessing Profiling Model for Visual Analytics, which comprises a set of features intended to inspire the implementation of new solutions. These features were designed based on insights obtained from an interview study with thirteen data analysts. In summary, our contribution offers resources to promote a shift toward visual preprocessing.
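The abstract does not enumerate the model's features. As a hypothetical example of the kind of profile a tool could surface before preprocessing, the sketch below summarises each column of a list of records: missing values, distinct values and the value types present (mixed types often signal a cleaning task). The function name, the treatment of `None` and empty strings as missing, and the assumption that values are hashable are choices made for this sketch, not features taken from the paper.

```python
def profile_columns(rows):
    """Per-column profile of a list of dict records: missing count, distinct count, types."""
    columns = sorted({key for row in rows for key in row})
    profile = {}
    for col in columns:
        # A key absent from a record is treated like an explicit None.
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        profile[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "types": sorted({type(v).__name__ for v in present}),
        }
    return profile
```

For instance, a column holding both `31` and `"40"` is reported with types `["int", "str"]`, flagging an inconsistent encoding.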

archivist: An R Package for Managing, Recording and Restoring Data Analysis Results

Journal of Statistical Software ◽

10.18637/jss.v082.i11 ◽

2017 ◽

Vol 82 (11) ◽

Cited By ~ 4

Author(s):

Przemysław Biecek ◽

Marcin Kosinski

Keyword(s):

Data Analysis ◽

R Package

Functional data analysis techniques to improve the generalizability of near-infrared spectral data for monitoring mosquito populations

10.1101/2020.04.28.058495 ◽

2020 ◽

Cited By ~ 1

Author(s):

Pedro M. Esperança ◽

Dari F. Da ◽

Ben Lambert ◽

Roch K. Dabiré ◽

Thomas S. Churcher

Keyword(s):

Data Analysis ◽

Functional Data Analysis ◽

Functional Data ◽

Near Infrared ◽

R Package ◽

Mosquito Vector ◽

Modelling Framework ◽

Functional Representation ◽

Infrared Spectral ◽

Generalised Linear Modelling

Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an obstacle to generalising predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representations of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes (the most epidemiologically important vector species on the planet) at two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species at another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can alternatively handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package, mlevcm. All data used are also publicly available.
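The abstract does not describe the mlevcm interface, so the sketch below does not use it. Instead it illustrates, in plain Python, the general idea of a penalised functional logistic model. Each spectrum is taken as values on a wavelength grid, the coefficient function has one weight per grid point, and a second-difference (roughness) penalty pushes that weight curve towards smoothness. The grid representation, the gradient-descent fit, the penalty form and every parameter value are assumptions; the paper's framework uses its own functional representation and cross-validation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def roughness_grad(beta):
    """Gradient of the penalty sum_j (beta[j] - 2*beta[j+1] + beta[j+2])**2."""
    g = [0.0] * len(beta)
    for j in range(len(beta) - 2):
        d = beta[j] - 2 * beta[j + 1] + beta[j + 2]
        g[j] += 2 * d
        g[j + 1] -= 4 * d
        g[j + 2] += 2 * d
    return g


def fit(spectra, labels, lam=0.1, lr=0.1, n_iter=5000):
    """Gradient descent on mean logistic loss + lam * roughness penalty."""
    n, p = len(spectra), len(spectra[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for x, y in zip(spectra, labels):
            r = sigmoid(b0 + sum(b * v for b, v in zip(beta, x))) - y
            g0 += r
            for j in range(p):
                g[j] += r * x[j]
        pen = roughness_grad(beta)
        b0 -= lr * g0 / n
        beta = [b - lr * (gj / n + lam * pj) for b, gj, pj in zip(beta, g, pen)]
    return b0, beta


def predict_proba(model, x):
    """Probability of class 1 for one spectrum on the same grid."""
    b0, beta = model
    return sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))
```

Raising `lam` makes the fitted weight curve smoother; in the paper, the amount of smoothing is chosen by cross-validation rather than fixed by hand.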
