A Rank-by-Feature Framework for Unsupervised Multidimensional Data Exploration Using Low Dimensional Projections

Author(s):  
Jinwook Seo ◽  
B. Shneiderman
Informatics ◽  
2017 ◽  
Vol 4 (3) ◽  
pp. 26 ◽  
Author(s):  
Johannes Kruiger ◽  
Almoctar Hassoumi ◽  
Hans-Jörg Schulz ◽  
Alexandru C. Telea ◽  
Christophe Hurter

2004 ◽  
Vol 14 (02) ◽  
pp. 653-666 ◽  
Author(s):  
AXEL HUTT

The present work briefly reviews a segmentation method and a modeling approach for multivariate quasi-stationary data. Combining the two allows low-dimensional models to be extracted from multidimensional data. The segmentation method is applied to event-related potentials and fields and to early auditory evoked potentials. It extracts ERP and ERF components and early auditory waves objectively and independently of the number of segments. Additionally, the early auditory wave Pa is modeled by a two-dimensional system of ordinary differential equations. We find a common topology of wave Pa, which leads us to conjecture that intrinsic low-dimensional attractors underlie the corresponding neuronal dynamics.


2009 ◽  
Vol 180 (10) ◽  
pp. 2002-2012 ◽  
Author(s):  
Sergei Manzhos ◽  
Koichi Yamashita ◽  
Tucker Carrington

Author(s):  
Alyssa Long ◽  
Alexander Glogowski ◽  
Matthew Meppiel ◽  
Lisa De Vito ◽  
Eric Engle ◽  
...  

Abstract Objective Clinical research informatics tools are necessary to support comprehensive studies of infectious diseases. The National Institute of Allergy and Infectious Diseases (NIAID) developed the publicly accessible Tuberculosis Data Exploration Portal (TB DEPOT) to address the complex etiology of tuberculosis (TB). Materials and Methods TB DEPOT displays deidentified patient case data and facilitates analyses across a wide range of clinical, socioeconomic, genomic, and radiological factors. The solution is built using Amazon Web Services cloud-based infrastructure, .NET Core, Angular, Highcharts, R, PLINK, and other custom-developed services. Structured patient data, pathogen genomic variants, and medical images are integrated into the solution to allow seamless filtering across data domains. Results Researchers can use TB DEPOT to query TB patient cases, create and save patient cohorts, and execute comparative statistical analyses on demand. The tool supports user-driven data exploration and fulfills the National Institutes of Health’s Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Discussion TB DEPOT is the first tool of its kind in the field of TB research to integrate multidimensional data from TB patient cases. Its scalable and flexible architectural design has accommodated growth in the data, organizations, types of data, feature requests, and usage. Using client-side rather than server-side technologies and prioritizing maintenance have been important lessons learned. Future directions are dynamically prioritized, and key functionality is shared through an application programming interface. Conclusion This paper describes the platform development methodology, resulting functionality, benefits, and technical considerations of a clinical research informatics application to support increased understanding of TB.


Author(s):  
V. Kamp

Applications like environmental information systems define various scenarios of multidimensional data analysis, and they require special efforts concerning data representation, storage and processing. The project CARLOS (Cancer Registry Lower-Saxony) developed the Epidemiological and Statistical Data Exploration System (CARESS) to support multidimensional analysis of health data. The system is based on an architecture that focuses on extensive interoperability between a database management system and several analysis and visualisation tools. As spatial and statistical aspects of the data play an important role, CARESS provides special support for the integration of both.


Author(s):  
Alfredo Cuzzocrea

Data Warehousing (DW) systems store materialized views, data marts and data cubes, and provide convenient data exploration and analysis interfaces via OnLine Analytical Processing (OLAP) (Gray et al., 1997) and Data Mining (DM) tools and algorithms. OnLine Analytical Mining (OLAM) (Han, 1997) further integrates these knowledge discovery methodologies and offers a meaningful convergence between OLAP and DM, significantly augmenting the data exploration and analysis capabilities of knowledge workers. At the storage layer, these methodologies share the problem of efficiently accessing, querying and processing multidimensional data, which in turn heavily affects the performance of knowledge discovery processes at the application layer. Because OLAP and OLAM directly process data cubes and marts, and DM increasingly encompasses methodologies that target multidimensional data, the problem of efficiently representing data cubes by means of a carefully selected set of views has become of considerable interest to the Data Warehousing and OLAP research community. This problem is directly related to the analogous problem of efficiently computing the data cube from a given relational data source (Harinarayan et al., 1996; Agarwal et al., 1996; Sarawagi et al., 1996; Zhao et al., 1997).
Given a relational data source R and a target data cube schema W, the view selection problem in OLAP asks how to select and materialize views from R in order to compute the data cube A defined by W while optimizing both the query processing time TQ, which models the time required to answer a reference query workload on the materialized view set, and the view maintenance time TM, which models the time required to maintain the materialized view set when updates occur. The optimization is subject to a set of constraints I which, without loss of generality, can be represented by a space bound B limiting the total size of the views to be materialized (i.e., I = {B}). This problem has been shown to be NP-hard (Gupta, 1997; Gupta & Mumick, 2005), so heuristic schemes are necessary; these heuristics are in turn implemented as greedy algorithms (Yang et al., 1997; Kalnis et al., 2002). In this article, we focus on state-of-the-art methods for the view selection problem in Data Warehousing and OLAP, and complement our analytical contribution with a theoretical analysis of these proposals under selected properties that model the spatial and temporal complexity of the investigated problem.
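To make the greedy approach to view selection under a space bound B concrete, here is a minimal sketch of a benefit-per-unit-space greedy heuristic in the spirit of Harinarayan et al. (1996). It is not the method of any specific paper surveyed here. It assumes a simple linear cost model: answering a query on view v costs the size of the smallest materialized view from which v can be derived. The top view (the base cuboid) is always materialized and not charged against B, and the maintenance time TM is ignored. The function name, the lattice, and the view sizes are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_view_selection(
    sizes: Dict[str, int],
    ancestors: Dict[str, Set[str]],
    top: str,
    space_bound: int,
) -> List[str]:
    """Greedily pick views with the best query-cost benefit per unit of space.

    sizes       -- estimated size of each view in the cube lattice (hypothetical data)
    ancestors   -- for each view, the set of views it can be computed from
    top         -- the base view, always materialized, not charged against the bound
    space_bound -- the space bound B on the extra materialized views
    """
    selected = [top]
    used = 0

    def cost(v: str) -> int:
        # Linear cost model (assumption): size of the cheapest materialized source.
        return min(sizes[w] for w in selected if w == v or w in ancestors[v])

    while True:
        best: Optional[str] = None
        best_ratio = 0.0
        for cand in sizes:
            if cand in selected or used + sizes[cand] > space_bound:
                continue
            # Benefit: total cost reduction over all views derivable from cand.
            benefit = sum(
                max(0, cost(v) - sizes[cand])
                for v in sizes
                if v == cand or cand in ancestors[v]
            )
            ratio = benefit / sizes[cand]
            if ratio > best_ratio:
                best, best_ratio = cand, ratio
        if best is None:
            return selected
        selected.append(best)
        used += sizes[best]


# Hypothetical 3-attribute lattice: "abc" is the base cuboid.
SIZES = {"abc": 100, "ab": 50, "ac": 75, "bc": 20, "a": 30, "b": 10, "c": 15}
ANCESTORS = {
    "abc": set(),
    "ab": {"abc"}, "ac": {"abc"}, "bc": {"abc"},
    "a": {"abc", "ab", "ac"}, "b": {"abc", "ab", "bc"}, "c": {"abc", "ac", "bc"},
}

print(greedy_view_selection(SIZES, ANCESTORS, "abc", 60))  # ['abc', 'bc', 'a', 'b']
```

Normalizing by view size makes small, broadly useful views such as "bc" win early. With a pure benefit criterion the order could differ, which is one reason the surveyed heuristics are compared by their complexity and quality properties.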


Author(s):  
Raghunath Kar ◽  
Susanta Kumar Das

In practice, clustering high-dimensional data is a major problem, and finding dense regions as the number of dimensions grows is one aspect of it. We have already studied clustering techniques for low-dimensional data sets, such as k-means, k-medoids, BIRCH, CLARANS, CURE, DBSCAN and PAM. A region is dense if it contains at least the minimum number of data points given by the input parameter ø; otherwise it is not taken into the clustering. In this approach we have therefore implemented CLIQUE to find clusters in multidimensional data sets. In dimension-growth subspace clustering, the clustering process starts in one-dimensional subspaces and grows upward to higher-dimensional ones. CLIQUE is a grid-based partitioning method: each dimension is divided into intervals, forming a grid of cells (units) that contain the data points. Dense units are identified in this structure by applying different algorithms. Finally, the clusters are formed from the high-dimensional data sets.
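The dimension-growth search for dense units can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes coordinates normalized to [0, 1] and equal-width intervals, and the parameter names `xi` (intervals per dimension) and `tau` (the density threshold called ø above) are hypothetical. Only dense-unit discovery is shown; connecting adjacent dense units into clusters, as full CLIQUE does, is omitted.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A unit is a set of (dimension, interval index) pairs, one pair per dimension.
Unit = FrozenSet[Tuple[int, int]]


def cell_of(point: Sequence[float], dim: int, xi: int) -> int:
    """Interval index of a coordinate assumed to lie in [0, 1]; 1.0 maps to the last interval."""
    return min(int(point[dim] * xi), xi - 1)


def dense_units(points: List[Sequence[float]], xi: int, tau: int) -> Dict[int, List[Unit]]:
    """Return dense units grouped by subspace dimensionality, grown bottom-up.

    xi  -- number of equal-width intervals per dimension (hypothetical name)
    tau -- minimum number of points for a unit to count as dense
    """
    n_dims = len(points[0])
    # Level 1: count the points in each interval of each single dimension.
    counts = Counter(
        frozenset([(d, cell_of(p, d, xi))]) for p in points for d in range(n_dims)
    )
    level = [u for u, c in counts.items() if c >= tau]
    result: Dict[int, List[Unit]] = {}
    k = 1
    while level:
        result[k] = sorted(level, key=sorted)
        level_set = set(level)
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            u = a | b
            # Join two units sharing k-2 pairs and differing in two distinct dimensions.
            if len(u) == k and len({d for d, _ in u}) == k:
                # Monotonicity pruning: every (k-1)-dim projection must itself be dense.
                if all(u - {e} in level_set for e in u):
                    candidates.add(u)
        # Keep candidates that actually contain at least tau points.
        level = [
            u for u in candidates
            if sum(all(cell_of(p, d, xi) == i for d, i in u) for p in points) >= tau
        ]
    return result
```

For example, `dense_units([(0.1, 0.1), (0.12, 0.15), (0.15, 0.11), (0.9, 0.5), (0.5, 0.9)], xi=4, tau=3)` finds the lower-left interval dense in each single dimension and then the two-dimensional unit they span, while the two isolated points never reach the threshold.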


2006 ◽  
Vol 12 (4) ◽  
pp. 353-359 ◽  
Author(s):  
Antanas Žilinskas ◽  
Julius Žilinskas

Multidimensional scaling addresses the problem of representing objects, specified by proximity data, as points in a low-dimensional embedding space. The problem is reduced to optimizing a measure of how accurately the distances between the respective points fit the proximity data. A three-dimensional embedding space is considered in the present paper. Images of data of different dimensionality are discussed, as is the dependence of visualization accuracy on the dimensionality of the embedding space and on the complexity of the data.
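As an illustration of the fit measure, the sketch below takes it to be raw stress, the sum over pairs of (d_ij(X) − δ_ij)², and minimizes it in a three-dimensional embedding space with SMACOF (stress majorization via the Guttman transform, unit weights). SMACOF is a standard local method chosen here for illustration. The abstract does not say which optimizer the authors use, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def raw_stress(X: Sequence[Point], delta: Sequence[Sequence[float]]) -> float:
    """Sum over pairs i < j of (d_ij(X) - delta_ij)^2."""
    n = len(X)
    return sum(
        (math.dist(X[i], X[j]) - delta[i][j]) ** 2
        for i in range(n) for j in range(i + 1, n)
    )


def smacof(
    delta: Sequence[Sequence[float]], dim: int = 3, iters: int = 200, seed: int = 0
) -> Tuple[List[Point], List[float]]:
    """Embed n objects with dissimilarities delta into `dim` dimensions.

    Returns the final configuration and the stress after each iteration.
    """
    n = len(delta)
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    history = [raw_stress(X, delta)]
    for _ in range(iters):
        # B(X): off-diagonal -delta_ij / d_ij (0 if points coincide); rows sum to 0.
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = math.dist(X[i], X[j])
                    B[i][j] = -delta[i][j] / d if d > 0 else 0.0
            B[i][i] = -sum(B[i][j] for j in range(n) if j != i)
        # Guttman transform with unit weights: X_new = B(X) X / n.
        X = [
            [sum(B[i][j] * X[j][k] for j in range(n)) / n for k in range(dim)]
            for i in range(n)
        ]
        history.append(raw_stress(X, delta))
    return X, history


if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    delta = [[math.dist(p, q) for q in pts] for p in pts]
    _, hist = smacof(delta)
    print(f"stress: {hist[0]:.4f} -> {hist[-1]:.6f}")
```

With unit weights, each Guttman transform never increases raw stress, which makes the method easy to monitor. Like any local method, however, it can stop in a local minimum, so the result depends on the starting configuration.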


2008 ◽  
Vol 65 (6) ◽  
pp. 1941-1954 ◽  
Author(s):  
Illia Horenko

Abstract The problem of simultaneous dimension reduction and identification of hidden attractive manifolds in noisy multidimensional data is considered. The problem is approached in two consecutive steps: (i) embedding the original data in a sufficiently high-dimensional extended space, as proposed by Takens in his embedding theorem, followed by (ii) minimization of a residual functional. The residual functional is constructed to measure the distance between the original data in the extended space and their reconstruction from a low-dimensional description. The reduced representation of the analyzed data results from projection onto a fixed number of unknown low-dimensional manifolds. Two specific forms of the residual functional are proposed, defining two different types of essential coordinates: (i) localized essential orthogonal functions (EOFs) and (ii) localized functions called principal original components (POCs). The framework is illustrated both on a Lorenz attractor model with measurement noise and on historical air temperature data. It is demonstrated how the new method can be used to eliminate noise and to identify the seasonal low-frequency components in meteorological data. An application of the proposed POCs to the construction of low-dimensional predictive models is presented.
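Step (i), the Takens-style embedding into an extended space, can be sketched as a time-delay embedding of a (possibly vector-valued) time series: each row concatenates the observations at times t, t + lag, …, t + (dim − 1)·lag. The function name and the parameters `dim` and `lag` are assumptions for illustration. The abstract does not give the paper's exact construction, and step (ii), the residual minimization, is not shown.

```python
from typing import List, Sequence


def delay_embed(
    series: Sequence[Sequence[float]], dim: int, lag: int
) -> List[List[float]]:
    """Time-delay embedding of a multivariate series.

    Row t is x_t ++ x_{t+lag} ++ ... ++ x_{t+(dim-1)*lag}, so a series with
    m components yields rows of length m * dim.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive")
    span = (dim - 1) * lag  # how far ahead each row reaches
    return [
        [value for k in range(dim) for value in series[t + k * lag]]
        for t in range(len(series) - span)
    ]
```

For a two-component series `[[0, 10], [1, 11], [2, 12], [3, 13]]` with `dim=2, lag=1`, the extended space has four coordinates and three rows, starting with `[0, 10, 1, 11]`. Series shorter than the embedding window yield no rows.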


2013 ◽  
Author(s):  
Christopher Beaumont ◽  
Thomas Robitaille ◽  
Alyssa Goodman ◽  
Michelle Borkin
