aggregation queries Latest Research Papers

Accelerating approximate aggregation queries with expensive predicates

Proceedings of the VLDB Endowment ◽

10.14778/3476249.3476285 ◽

2021 ◽

Vol 14 (11) ◽

pp. 2341-2354

Author(s):

Daniel Kang ◽

John Guibas ◽

Peter Bailis ◽

Tatsunori Hashimoto ◽

Yi Sun ◽

...

Keyword(s):

Neural Networks ◽

Query Processing ◽

Deep Neural Networks ◽

Optimal Allocation ◽

Approximate Query Processing ◽

Approximate Aggregation ◽

Approximate Query ◽

Real World Datasets ◽

Aggregation Queries ◽

Processing Techniques

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are expensive and because many applications can tolerate approximate answers, analysts are interested in accelerating these queries via approximations. Unfortunately, standard approximate query processing techniques to accelerate such queries are not applicable because they assume the result of the predicates are available ahead of time. Furthermore, recent work using cheap approximations (i.e., proxies) do not support aggregation queries with predicates. To accelerate aggregation queries with expensive predicates, we develop and analyze a query processing algorithm that leverages proxies (ABAE). ABAE must account for the key challenge that it may sample records that do not satisfy the predicate. To address this challenge, we first use the proxy to group records into strata so that records satisfying the predicate are ideally grouped into few strata. Given these strata, ABAE uses pilot sampling and plugin estimates to sample according to the optimal allocation. We show that ABAE converges at an optimal rate in a novel analysis of stratified sampling with draws that may not satisfy the predicate. We further show that ABAE outperforms on baselines on six real-world datasets, reducing labeling costs by up to 2.3X.

Download Full-text

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Proceedings of the 2021 International Conference on Management of Data ◽

10.1145/3448016.3452749 ◽

2021 ◽

Author(s):

Akhil A. Dixit ◽

Phokion G. Kolaitis

Keyword(s):

Sat Solving ◽

Inconsistent Databases ◽

Aggregation Queries

Download Full-text

Scotty

ACM Transactions on Database Systems ◽

10.1145/3433675 ◽

2021 ◽

Vol 46 (1) ◽

pp. 1-46

Author(s):

Jonas Traub ◽

Philipp Marian Grulich ◽

Alejandro Rodríguez Cuéllar ◽

Sebastian Breß ◽

Asterios Katsifodimos ◽

...

Keyword(s):

Data Stream ◽

Stream Processing ◽

Sliding Window ◽

Improve Performance ◽

General Applicability ◽

Aggregation Functions ◽

Aggregation Techniques ◽

Aggregation Queries ◽

Alternative Solutions ◽

Apache Storm

Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, or minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics, such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. In this article, we present Scotty , an efficient and general open-source operator for sliding-window aggregation in stream processing systems, such as Apache Flink, Apache Beam, Apache Samza, Apache Kafka, Apache Spark, and Apache Storm. One can easily extend Scotty with user-defined aggregation functions and window types. Scotty implements the concept of general stream slicing and derives workload characteristics from aggregation queries to improve performance without sacrificing its general applicability. We provide an in-depth view on the algorithms of the general stream slicing approach. Our experiments show that Scotty outperforms alternative solutions.

Download Full-text

From natural language processing to neural databases

Proceedings of the VLDB Endowment ◽

10.14778/3447689.3447706 ◽

2021 ◽

Vol 14 (6) ◽

pp. 1033-1039

Author(s):

James Thorne ◽

Majid Yazdani ◽

Marzieh Saeidi ◽

Fabrizio Silvestri ◽

Sebastian Riedel ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Neural Nets ◽

Join Queries ◽

The Rich ◽

New Research ◽

Aggregation Queries ◽

Performance Gains ◽

Text Images

In recent years, neural networks have shown impressive performance gains on long-standing AI problems, such as answering queries from text and machine translation. These advances raise the question of whether neural nets can be used at the core of query processing to derive answers from facts, even when the facts are expressed in natural language. If so, it is conceivable that we could relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema. Furthermore, such technology would enable combining information from text, images, and structured data seamlessly. This paper introduces neural databases , a class of systems that use NLP transformers as localized answer derivation engines. We ground the vision in NeuralDB, a system for querying facts represented as short natural language sentences. We demonstrate that recent natural language processing models, specifically transformers, can answer select-project-join queries if they are given a set of relevant facts. However, they cannot scale to non-trivial databases nor answer set-based and aggregation queries. Based on these insights, we identify specific research challenges that are needed to build neural databases. Some of the challenges require drawing upon the rich literature in data management, and others pose new research opportunities to the NLP community. Finally, we show that with preliminary solutions, NeuralDB can already answer queries over thousands of sentences with very high accuracy.

Download Full-text

Efficient Algorithms for Kernel Aggregation Queries

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2020.3018376 ◽

2020 ◽

pp. 1-1

Author(s):

Tsz Nam Chan ◽

Leong Hou U ◽

Reynold Cheng ◽

Man Lung Yiu ◽

Shivansh Mittal

Keyword(s):

Efficient Algorithms ◽

Aggregation Queries

Download Full-text

Obscure: Information-Theoretically Secure, Oblivious, and Verifiable Aggregation Queries on Secret-Shared Outsourced Data

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2020.2983932 ◽

2020 ◽

pp. 1-1

Author(s):

Peeyush Gupta ◽

Yin Li ◽

Sharad Mehrotra ◽

Nisha Panwar ◽

Shantanu Sharma ◽

...

Keyword(s):

Outsourced Data ◽

Aggregation Queries

Download Full-text

AQapprox: Aggregation Queries Approximation with Distribution-Aware Online Sampling

Web Information Systems Engineering – WISE 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-62008-0_28 ◽

2020 ◽

pp. 404-416

Author(s):

Han Wu ◽

Xiaoling Wang ◽

Xingjian Lu

Keyword(s):

Aggregation Queries ◽

Online Sampling

Download Full-text

STASH : Fast Hierarchical Aggregation Queries for Effective Visual Spatiotemporal Explorations

2019 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster.2019.8891029 ◽

2019 ◽

Cited By ~ 1

Author(s):

Saptashwa Mitra ◽

Paahuni Khandelwal ◽

Shrideep Pallickara ◽

Sangmi Lee Pallickara

Keyword(s):

Hierarchical Aggregation ◽

Aggregation Queries

Download Full-text

Generating multidimensional schemata from relational aggregation queries

World Wide Web ◽

10.1007/s11280-019-00706-9 ◽

2019 ◽

Vol 23 (1) ◽

pp. 337-359

Author(s):

Zheng Huo ◽

Kerry Taylor ◽

Xiuzhen Zhang ◽

Suzhen Wang ◽

Chaoyi Pang

Keyword(s):

Aggregation Queries

Download Full-text

A Data Structure for Real-Time Aggregation Queries of Big Brain Networks

Neuroinformatics ◽

10.1007/s12021-019-09428-9 ◽

2019 ◽

Vol 18 (1) ◽

pp. 131-149

Author(s):

Florian Johann Ganglberger ◽

Joanna Kaczanowska ◽

Wulf Haubensak ◽

Katja Bühler

Keyword(s):

Data Structure ◽

Real Time ◽

Brain Networks ◽

Time Aggregation ◽

Aggregation Queries

Download Full-text

aggregation queries
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Accelerating approximate aggregation queries with expensive predicates

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Scotty

From natural language processing to neural databases

Efficient Algorithms for Kernel Aggregation Queries

Obscure: Information-Theoretically Secure, Oblivious, and Verifiable Aggregation Queries on Secret-Shared Outsourced Data

AQapprox: Aggregation Queries Approximation with Distribution-Aware Online Sampling

STASH : Fast Hierarchical Aggregation Queries for Effective Visual Spatiotemporal Explorations

Generating multidimensional schemata from relational aggregation queries

A Data Structure for Real-Time Aggregation Queries of Big Brain Networks

Export Citation Format

aggregation queriesRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Accelerating approximate aggregation queries with expensive predicates

CAvSAT: Answering Aggregation Queries over Inconsistent Databases via SAT Solving

Scotty

From natural language processing to neural databases

Efficient Algorithms for Kernel Aggregation Queries

Obscure: Information-Theoretically Secure, Oblivious, and Verifiable Aggregation Queries on Secret-Shared Outsourced Data

AQapprox: Aggregation Queries Approximation with Distribution-Aware Online Sampling

STASH : Fast Hierarchical Aggregation Queries for Effective Visual Spatiotemporal Explorations

Generating multidimensional schemata from relational aggregation queries

A Data Structure for Real-Time Aggregation Queries of Big Brain Networks

aggregation queries
Recently Published Documents