Multivariate Pointwise Information-Driven Data Sampling and Visualization

Soumya Dutta; Ayan Biswas; James Ahrens

doi:10.3390/e21070699

Multivariate Pointwise Information-Driven Data Sampling and Visualization

Entropy ◽

10.3390/e21070699 ◽

2019 ◽

Vol 21 (7) ◽

pp. 699

Author(s):

Soumya Dutta ◽

Ayan Biswas ◽

James Ahrens

Keyword(s):

Large Scale ◽

Scientific Data ◽

Data Sets ◽

Data Sampling ◽

Sampled Data ◽

Statistical Association ◽

Sampling Algorithm ◽

Data Summarization ◽

Multiple Variables ◽

Reduced Data

With increasing computing capabilities of modern supercomputers, the size of the data generated from the scientific simulations is growing rapidly. As a result, application scientists need effective data summarization techniques that can reduce large-scale multivariate spatiotemporal data sets while preserving the important data properties so that the reduced data can answer domain-specific queries involving multiple variables with sufficient accuracy. While analyzing complex scientific events, domain experts often analyze and visualize two or more variables together to obtain a better understanding of the characteristics of the data features. Therefore, data summarization techniques are required to analyze multi-variable relationships in detail and then perform data reduction such that the important features involving multiple variables are preserved in the reduced data. To achieve this, in this work, we propose a data sub-sampling algorithm for performing statistical data summarization that leverages pointwise information theoretic measures to quantify the statistical association of data points considering multiple variables and generates a sub-sampled data that preserves the statistical association among multi-variables. Using such reduced sampled data, we show that multivariate feature query and analysis can be done effectively. The efficacy of the proposed multivariate association driven sampling algorithm is presented by applying it on several scientific data sets.

Download Full-text

Exploring Imputation Techniques for Missing Data in Transportation Management Systems

Transportation Research Record Journal of the Transportation Research Board ◽

10.3141/1836-17 ◽

2003 ◽

Vol 1836 (1) ◽

pp. 132-142 ◽

Cited By ~ 61

Author(s):

Brian L. Smith ◽

William T. Scherer ◽

James H. Conklin

Keyword(s):

Missing Data ◽

Urban Areas ◽

Large Scale ◽

Management Systems ◽

Data Sets ◽

Transportation Industry ◽

Transportation Management ◽

Reduced Data ◽

Natural Characteristics

Many states have implemented large-scale transportation management systems to improve mobility in urban areas. These systems are highly prone to missing and erroneous data, which results in drastically reduced data sets for analysis and real-time operations. Imputation is the practice of filling in missing data with estimated values. Currently, the transportation industry generally does not use imputation as a means for handling missing data. Other disciplines have recognized the importance of addressing missing data and, as a result, methods and software for imputing missing data are becoming widely available. The feasibility and applicability of imputing missing traffic data are addressed, and a preliminary analysis of several heuristic and statistical imputation techniques is performed. Preliminary results produced excellent performance in the case study and indicate that the statistical techniques are more accurate while maintaining the natural characteristics of the data.

Download Full-text

A new bed elevation dataset for Greenland

The Cryosphere Discussions ◽

10.5194/tcd-6-4829-2012 ◽

2012 ◽

Vol 6 (6) ◽

pp. 4829-4860 ◽

Cited By ~ 5

Author(s):

J. A. Griggs ◽

J. L. Bamber ◽

R. T. W. L. , Hurkmans ◽

J. A. Dowdesewell ◽

S. P. Gogineni ◽

...

Keyword(s):

Large Scale ◽

Ice Sheet ◽

The Arctic ◽

Ice Thickness ◽

Data Sets ◽

Data Sampling ◽

The North ◽

Bed Elevation ◽

Arctic Ice ◽

Elevation Model

Abstract. We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2011. Around 344 000 line kilometres of airborne data were used, with the majority of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated/ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice shelf thickness was determined where a floating tongue exists, in particular in the north. The across-track spacing between flight lines warranted interpolation at 1 km postings near the ice sheet margin and 2.5 km in the interior. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±6 m to about ±200 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new data sets, particularly along the ice sheet margin, where ice velocity is highest and changes most marked. We use the new bed and surface DEMs to calculate the hydraulic potential for subglacial flow and present the large scale pattern of water routing. We estimate that the volume of ice included in our land/ice mask would raise eustatic sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

Download Full-text

Faculty Opinions recommendation of Comparative assessment of large-scale data sets of protein-protein interactions.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1006598.82257 ◽

2002 ◽

Author(s):

Rob Russell

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Comparative Assessment ◽

Data Sets ◽

Protein Protein Interactions ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

Download Full-text

Galaxy spin direction distribution in HST and SDSS show similar large-scale asymmetry

Publications of the Astronomical Society of Australia ◽

10.1017/pasa.2020.46 ◽

2020 ◽

Vol 37 ◽

Author(s):

Lior Shamir

Keyword(s):

Large Scale ◽

Spiral Galaxies ◽

Hubble Space Telescope ◽

Gravitational Interaction ◽

Large Data ◽

Sloan Digital Sky Survey ◽

Data Sets ◽

Dipole Axis ◽

Data Set ◽

The Asymmetry

Abstract Several recent observations using large data sets of galaxies showed non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey. The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with probabilities of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^{\rm o},\delta=47^{\rm o})$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$ , identified at $(\alpha=71^{\rm o},\delta=61^{\rm o})$ .

Download Full-text

Accelerating In-Transit Co-Processing for Scientific Simulations Using Region-Based Data-Driven Analysis

Algorithms ◽

10.3390/a14050154 ◽

2021 ◽

Vol 14 (5) ◽

pp. 154

Author(s):

Marcus Walldén ◽

Masao Okita ◽

Fumihiko Ino ◽

Dimitris Drikakis ◽

Ioannis Kokkinakis

Keyword(s):

Large Scale ◽

Data Driven ◽

Data Sets ◽

Output Constraints ◽

Data Driven Approach ◽

Scientific Simulations ◽

Multiple Metrics ◽

In Transit ◽

Multiple Compression ◽

Large Scale Simulations

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method’s efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method expeditiously can identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.

Download Full-text

ECMO Therapy in Acute Chest Syndrome for Patients with Sickle Cell Disease: a Case Report and Literature Review

SN Comprehensive Clinical Medicine ◽

10.1007/s42399-021-00987-0 ◽

2021 ◽

Author(s):

Soi Avgeridou ◽

Ilija Djordjevic ◽

Anton Sabashnikov ◽

Kaveh Eghbalzadeh ◽

Laura Suhr ◽

...

Keyword(s):

Sickle Cell Disease ◽

Sickle Cell ◽

Scientific Data ◽

Acute Chest Syndrome ◽

Data Sets ◽

Limited Data ◽

Cell Disease ◽

Life Saving ◽

Institutional Experience ◽

Ecmo Therapy

AbstractExtracorporeal membrane oxygenation (ECMO) plays an important role as a life-saving tool for patients with therapy-refractory cardio-respiratory failure. Especially, for rare and infrequent indications, scientific data is scarce. The conducted paper focuses primarily on our institutional experience with a 19-year-old patient suffering an acute chest syndrome, a pathognomonic pulmonary condition presented by patients with sickle cell disease. After implementation of awake ECMO therapy, the patient was successfully weaned off support and discharged home 22 days after initiation of the extracorporeal circulation. In addition to limited data and current literature, further and larger data sets are necessary to determine the outcome after ECMO therapy for this rare indication.

Download Full-text

Prevention of Mountain Disasters and Maintenance of Residential Area through Real-Time Terrain Rendering

Sustainability ◽

10.3390/su13052950 ◽

2021 ◽

Vol 13 (5) ◽

pp. 2950

Author(s):

Su-Kyung Sung ◽

Eun-Seok Lee ◽

Byeong-Seok Shin

Keyword(s):

Real Time ◽

Graphics Processing Units ◽

High Speed ◽

Large Scale ◽

Civil Engineering ◽

Three Dimensional ◽

Sampled Data ◽

Residential Areas ◽

Terrain Rendering ◽

Mountain Disasters

Climate change increases the frequency of localized heavy rains and typhoons. As a result, mountain disasters, such as landslides and earthworks, continue to occur, causing damage to roads and residential areas downstream. Moreover, large-scale civil engineering works, including dam construction, cause rapid changes in the terrain, which harm the stability of residential areas. Disasters, such as landslides and earthenware, occur extensively, and there are limitations in the field of investigation; thus, there are many studies being conducted to model terrain geometrically and to observe changes in terrain according to external factors. However, conventional topography methods are expressed in a way that can only be interpreted by people with specialized knowledge. Therefore, there is a lack of consideration for three-dimensional visualization that helps non-experts understand. We need a way to express changes in terrain in real time and to make it intuitive for non-experts to understand. In conventional height-based terrain modeling and simulation, there is a problem in which some of the sampled data are irregularly distorted and do not show the exact terrain shape. The proposed method utilizes a hierarchical vertex cohesion map to correct inaccurately modeled terrain caused by uniform height sampling, and to compensate for geometric errors using Hausdorff distances, while not considering only the elevation difference of the terrain. The mesh reconstruction, which triangulates the three-vertex placed at each location and makes it the smallest unit of 3D model data, can be done at high speed on graphics processing units (GPUs). Our experiments confirm that it is possible to express changes in terrain accurately and quickly compared with existing methods. These functions can improve the sustainability of residential spaces by predicting the damage caused by mountainous disasters or civil engineering works around the city and make it easy for non-experts to understand.

Download Full-text

Compartment and hub definitions tune metabolic networks for metabolomic interpretations

GigaScience ◽

10.1093/gigascience/giz137 ◽

2020 ◽

Vol 9 (1) ◽

Cited By ~ 3

Author(s):

T Cameron Waller ◽

Jordan A Berg ◽

Alexander Lex ◽

Brian E Chapman ◽

Jared Rutter

Keyword(s):

Large Scale ◽

Metabolic Networks ◽

Shortest Paths ◽

Large Data ◽

Differential Regulation ◽

Large Data Sets ◽

Data Sets ◽

Human Metabolism ◽

Experimental Conditions ◽

Systemic Model

Abstract Background Metabolic networks represent all chemical reactions that occur between molecular metabolites in an organism’s cells. They offer biological context in which to integrate, analyze, and interpret omic measurements, but their large scale and extensive connectivity present unique challenges. While it is practical to simplify these networks by placing constraints on compartments and hubs, it is unclear how these simplifications alter the structure of metabolic networks and the interpretation of metabolomic experiments. Results We curated and adapted the latest systemic model of human metabolism and developed customizable tools to define metabolic networks with and without compartmentalization in subcellular organelles and with or without inclusion of prolific metabolite hubs. Compartmentalization made networks larger, less dense, and more modular, whereas hubs made networks larger, more dense, and less modular. When present, these hubs also dominated shortest paths in the network, yet their exclusion exposed the subtler prominence of other metabolites that are typically more relevant to metabolomic experiments. We applied the non-compartmental network without metabolite hubs in a retrospective, exploratory analysis of metabolomic measurements from 5 studies on human tissues. Network clusters identified individual reactions that might experience differential regulation between experimental conditions, several of which were not apparent in the original publications. Conclusions Exclusion of specific metabolite hubs exposes modularity in both compartmental and non-compartmental metabolic networks, improving detection of relevant clusters in omic measurements. Better computational detection of metabolic network clusters in large data sets has potential to identify differential regulation of individual genes, transcripts, and proteins.

Download Full-text

High Performance Computational Analysis of Large-scale Proteome Data Sets to Assess Incremental Contribution to Coverage of the Human Genome

Journal of Proteome Research ◽

10.1021/pr400181q ◽

2013 ◽

Vol 12 (6) ◽

pp. 2858-2868 ◽

Cited By ~ 29

Author(s):

Nadin Neuhauser ◽

Nagarjuna Nagaraj ◽

Peter McHardy ◽

Sara Zanivan ◽

Richard Scheltema ◽

...

Keyword(s):

Human Genome ◽

High Performance ◽

Large Scale ◽

Computational Analysis ◽

Data Sets

Download Full-text

ON GENERATING DIGITAL ELEVATION MODELS FROM LIDAR DATA – RESOLUTION VERSUS ACCURACY AND TOPOGRAPHIC WETNESS INDEX INDICES IN NORTHERN PEATLANDS

Geodesy and Cartography ◽

10.3846/20296991.2012.702983 ◽

2012 ◽

Vol 38 (2) ◽

pp. 57-69 ◽

Cited By ~ 12

Author(s):

Abdulghani Hasan ◽

Petter Pilesjö ◽

Andreas Persson

Keyword(s):

Large Scale ◽

Drainage Area ◽

Data Sets ◽

Topographic Wetness Index ◽

Absolute Deviation ◽

Digital Elevation ◽

Elevation Data ◽

Scale Modelling ◽

Data Points ◽

Emission Modelling

Global change and GHG emission modelling are dependent on accurate wetness estimations for predictions of e.g. methane emissions. This study aims to quantify how the slope, drainage area and the TWI vary with the resolution of DEMs for a flat peatland area. Six DEMs with spatial resolutions from 0.5 to 90 m were interpolated with four different search radiuses. The relationship between accuracy of the DEM and the slope was tested. The LiDAR elevation data was divided into two data sets. The number of data points facilitated an evaluation dataset with data points not more than 10 mm away from the cell centre points in the interpolation dataset. The DEM was evaluated using a quantile-quantile test and the normalized median absolute deviation. It showed independence of the resolution when using the same search radius. The accuracy of the estimated elevation for different slopes was tested using the 0.5 meter DEM and it showed a higher deviation from evaluation data for steep areas. The slope estimations between resolutions showed differences with values that exceeded 50%. Drainage areas were tested for three resolutions, with coinciding evaluation points. The model ability to generate drainage area at each resolution was tested by pair wise comparison of three data subsets and showed differences of more than 50% in 25% of the evaluated points. The results show that consideration of DEM resolution is a necessity for the use of slope, drainage area and TWI data in large scale modelling.

Download Full-text