Assessing data change in scientific datasets

Author(s):  
Juliane Müller ◽  
Boris Faybishenko ◽  
Deborah Agarwal ◽  
Stephen Bailey ◽  
Chongya Jiang ◽  
...  
2015 ◽  
Vol 28 (8) ◽  
pp. 2546-2563 ◽  
Author(s):  
Cameron Tolooee ◽  
Matthew Malensek ◽  
Sangmi Lee Pallickara

2008 ◽  
pp. 1250-1268
Author(s):  
Cyrus Shahabi ◽  
Mehrdad Jahangiri ◽  
Dimitris Sacharidis

Data analysis systems require range-aggregate query answering over large multidimensional datasets. We provide the framework needed to build a retrieval system that delivers fast answers with progressively increasing accuracy for range-aggregate queries. In addition, with error forecasting, we provide estimates of the accuracy of the generated approximate results. Our framework uses the wavelet transformation of query and data hypercubes. While prior work focused on ordering either the query or the data coefficients, we propose a class of hybrid ordering techniques that exploits both query and data wavelets to answer queries progressively. This work effectively subsumes and extends most current work in which wavelets are used as a tool for approximate or progressive query evaluation. Our experimental results show that, independent of the characteristics of the dataset, data coefficient ordering is, contrary to common belief, the inferior approach. Hybrid ordering, on the other hand, performs best for scientific datasets that are inter-correlated. For an entirely random dataset with no inter-correlation, query ordering is the superior approach.
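The orderings compared above all build on the same primitive: a Haar wavelet decomposition of the data, from which a range-aggregate can be answered approximately by reconstructing from only the largest coefficients. A minimal 1-D sketch of data-coefficient ordering for a range-sum query (an illustration of the general technique, not the authors' implementation):

```python
import numpy as np

def haar_transform(x):
    # Orthonormal 1-D Haar wavelet transform (length must be a power of two).
    out = np.asarray(x, dtype=float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        a = (out[:n:2] + out[1:n:2]) / np.sqrt(2)  # averages
        d = (out[:n:2] - out[1:n:2]) / np.sqrt(2)  # details
        out[:half], out[half:n] = a, d
        n = half
    return out

def inverse_haar(w):
    # Invert the transform level by level.
    w = np.asarray(w, dtype=float).copy()
    n, N = 1, len(w)
    while n < N:
        a, d = w[:n].copy(), w[n:2 * n].copy()
        rec = np.empty(2 * n)
        rec[0::2] = (a + d) / np.sqrt(2)
        rec[1::2] = (a - d) / np.sqrt(2)
        w[:2 * n] = rec
        n *= 2
    return w

data = np.array([2., 2., 0., 2., 3., 5., 4., 4.])
coeffs = haar_transform(data)
order = np.argsort(-np.abs(coeffs))  # data ordering: largest magnitude first

def approx_range_sum(lo, hi, k):
    # Progressive answer: keep the k largest coefficients, reconstruct, sum.
    kept = np.zeros_like(coeffs)
    kept[order[:k]] = coeffs[order[:k]]
    return inverse_haar(kept)[lo:hi].sum()

exact = data[2:7].sum()  # 14.0
```

Query ordering would instead transform the query's characteristic vector and retain its largest coefficients; hybrid ordering draws on both rankings.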


Mathematics ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 956 ◽  
Author(s):  
Shahryar Rahnamayan ◽  
Sedigheh Mahdavi ◽  
Kalyanmoy Deb ◽  
Azam Asilian Bidgoli

The ranking of multi-metric scientific achievements is a challenging task. For example, the scientific ranking of researchers relies on two major types of indicators: publication counts and citation counts. Existing approaches focus on how to select proper indicators, considering either a single indicator or a combination of them. The majority of ranking methods combine several indicators, but these methods face a challenging concern: the assignment of suitable/optimal weights to the targeted indicators. Pareto optimality is a measure of efficiency in multi-objective optimization that seeks optimal solutions by considering multiple criteria/objectives simultaneously. The performance of the basic Pareto dominance depth ranking strategy degrades as the number of criteria increases (generally speaking, beyond three criteria). In this paper, a new, modified Pareto dominance depth ranking strategy is proposed that uses dominance metrics obtained from the basic Pareto dominance depth ranking together with sorted statistical metrics to rank scientific achievements. It attempts to find clusters in the compared data by using all indicators simultaneously. Furthermore, we apply the proposed method to the multi-source ranking resolution problem, which is very common today; for example, several worldwide institutions rank the world's universities every year, but their rankings are not consistent. As case studies, the proposed method was used to rank several scientific datasets (i.e., researchers, universities, and countries) as a proof of concept.
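The basic Pareto dominance depth ranking referred to above is the non-dominated sorting used in multi-objective optimization: front 0 holds all non-dominated items, front 1 the items that become non-dominated once front 0 is removed, and so on. A minimal sketch of that baseline (generic non-dominated sorting under the assumption that larger indicator values are better; not the modified strategy the paper proposes):

```python
def dominates(a, b):
    # a dominates b if a is at least as good in every criterion
    # and strictly better in at least one (larger is better here).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_depth_ranking(points):
    # Assign each point its dominance depth (front index).
    remaining = dict(enumerate(points))
    depth, level = {}, 0
    while remaining:
        # Points not dominated by any other remaining point form the next front.
        front = [i for i in remaining
                 if not any(dominates(remaining[j], remaining[i])
                            for j in remaining if j != i)]
        for i in front:
            depth[i] = level
        for i in front:
            del remaining[i]
        level += 1
    return depth

# Hypothetical researchers scored by (publications, citations).
scores = [(50, 1200), (30, 2000), (40, 800), (20, 500)]
depths = pareto_depth_ranking(scores)
```

Here the first two researchers are mutually non-dominated (each is better on one criterion), so both land in front 0; the paper's contribution layers additional dominance and sorted statistical metrics on top of depths like these to discriminate as the number of criteria grows.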


2018 ◽  
Author(s):  
Xavier Delaunay ◽  
Aurélie Courtois ◽  
Flavien Gouillon

Abstract. The increasing volume of scientific datasets necessitates compression to reduce data storage and transmission costs, particularly for the oceanographic and meteorological datasets generated by Earth observation mission ground segments. These data are mostly produced as NetCDF files. Indeed, the NetCDF-4/HDF5 file formats are widespread in the global scientific community because of the useful features they offer. In particular, HDF5 provides a dynamically loaded filter plugin mechanism that allows users to write filters, such as compression/decompression filters, to process data before it is written to or read from disk. In this work, we evaluate the performance of lossy and lossless compression/decompression methods through NetCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets. We also introduce the Digit Rounding algorithm, a new relative-error-bounded data reduction method inspired by the Bit Grooming algorithm. The Digit Rounding algorithm achieves a high compression ratio while preserving a given number of significant digits in the dataset. It achieves a higher compression ratio than the Bit Grooming algorithm while maintaining similar compression speed.
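To illustrate the general idea behind relative-error-bounded reduction (a simplified sketch, not the published Digit Rounding algorithm): to preserve roughly nsd significant decimal digits, each value's mantissa can be quantized to about nsd·log2(10) bits. This zeroes the trailing mantissa bits, leaving the array far more compressible by a lossless back end such as the HDF5 deflate filter.

```python
import math
import numpy as np

def digit_round(values, nsd):
    # Simplified sketch: quantize each value's mantissa so that roughly
    # `nsd` significant decimal digits survive. The zeroed tail bits make
    # the data much more compressible by a subsequent lossless codec.
    keep_bits = math.ceil(nsd * math.log2(10))  # decimal digits -> binary bits
    out = np.empty(len(values), dtype=np.float64)
    for i, v in enumerate(values):
        if v == 0:
            out[i] = 0.0
            continue
        exp = math.floor(math.log2(abs(v)))     # binary exponent of the value
        scale = 2.0 ** (exp - keep_bits)        # quantization step
        out[i] = round(v / scale) * scale       # round to the nearest step
    return out

x = np.array([3.14159265, 0.00012345678, 12345.6789])
y = digit_round(x, 4)
# Relative error is bounded by about 2**(-keep_bits) per value.
```

Rounding in binary rather than decimal is what makes the tail bits exactly zero, which is the property both Bit Grooming and Digit Rounding exploit for compression.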


Author(s):  
Mojgan Ghanavati ◽  
Raymond K. Wong ◽  
Fang Chen ◽  
Yang Wang

2002 ◽  
Vol 32 (2) ◽  
pp. 165-190 ◽  
Author(s):  
Joan Slottow ◽  
Ali Shahriari ◽  
Michael Stein ◽  
Xiao Chen ◽  
Chris Thomas ◽  
...  
