approximate query processing
Recently Published Documents


TOTAL DOCUMENTS

74
(FIVE YEARS 9)

H-INDEX

10
(FIVE YEARS 0)

2021 ◽  
Vol 14 (11) ◽  
pp. 2341-2354
Author(s):  
Daniel Kang ◽  
John Guibas ◽  
Peter Bailis ◽  
Tatsunori Hashimoto ◽  
Yi Sun ◽  
...  

Researchers and industry analysts are increasingly interested in computing aggregation queries over large, unstructured datasets with selective predicates that are computed using expensive deep neural networks (DNNs). As these DNNs are expensive and because many applications can tolerate approximate answers, analysts are interested in accelerating these queries via approximations. Unfortunately, standard approximate query processing techniques to accelerate such queries are not applicable because they assume the result of the predicates are available ahead of time. Furthermore, recent work using cheap approximations (i.e., proxies) do not support aggregation queries with predicates. To accelerate aggregation queries with expensive predicates, we develop and analyze a query processing algorithm that leverages proxies (ABAE). ABAE must account for the key challenge that it may sample records that do not satisfy the predicate. To address this challenge, we first use the proxy to group records into strata so that records satisfying the predicate are ideally grouped into few strata. Given these strata, ABAE uses pilot sampling and plugin estimates to sample according to the optimal allocation. We show that ABAE converges at an optimal rate in a novel analysis of stratified sampling with draws that may not satisfy the predicate. We further show that ABAE outperforms on baselines on six real-world datasets, reducing labeling costs by up to 2.3X.


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4160
Author(s):  
Isam Mashhour Al Jawarneh ◽  
Paolo Bellavista ◽  
Antonio Corradi ◽  
Luca Foschini ◽  
Rebecca Montanari

Large amounts of georeferenced data streams arrive daily to stream processing systems. This is attributable to the overabundance of affordable IoT devices. In addition, interested practitioners desire to exploit Internet of Things (IoT) data streams for strategic decision-making purposes. However, mobility data are highly skewed and their arrival rates fluctuate. This nature poses an extra challenge on data stream processing systems, which are required in order to achieve pre-specified latency and accuracy goals. In this paper, we propose ApproxSSPS, which is a system for approximate processing of geo-referenced mobility data, at scale with quality of service guarantees. We focus on stateful aggregations (e.g., means, counts) and top-N queries. ApproxSSPS features a controller that interactively learns the latency statistics and calculates proper sampling rates to meet latency or/and accuracy targets. An overarching trait of ApproxSSPS is its ability to strike a plausible balance between latency and accuracy targets. We evaluate ApproxSSPS on Apache Spark Structured Streaming with real mobility data. We also compared ApproxSSPS against a state-of-the-art online adaptive processing system. Our extensive experiments prove that ApproxSSPS can fulfill latency and accuracy targets with varying sets of parameter configurations and load intensities (i.e., transient peaks in data loads versus slow arriving streams). Moreover, our results show that ApproxSSPS outperforms the baseline counterpart by significant magnitudes. In short, ApproxSSPS is a novel spatial data stream processing system that can deliver real accurate results in a timely manner, by dynamically specifying the limits on data samples.


Author(s):  
Ran Ben Basat ◽  
Seungbum Jo ◽  
Srinivasa Rao Satti ◽  
Shubham Ugare

2021 ◽  
Vol 14 (8) ◽  
pp. 1365-1377
Author(s):  
Tiantian Liu ◽  
Huan Li ◽  
Hua Lu ◽  
Muhammad Aamir Cheema ◽  
Lidan Shou

Indoor venues accommodate many people who collectively form crowds. Such crowds in turn influence people's routing choices, e.g., people may prefer to avoid crowded rooms when walking from A to B. This paper studies two types of crowd-aware indoor path planning queries. The Indoor Crowd-Aware Fastest Path Query (FPQ) finds a path with the shortest travel time in the presence of crowds, whereas the Indoor Least Crowded Path Query (LCPQ) finds a path encountering the least objects en route. To process the queries, we design a unified framework with three major components. First, an indoor crowd model organizes indoor topology and captures object flows between rooms. Second, a time-evolving population estimator derives room populations for a future timestamp to support crowd-aware routing cost computations in query processing. Third, two exact and two approximate query processing algorithms process each type of query. All algorithms are based on graph traversal over the indoor crowd model and use the same search framework with different strategies of updating the populations during the search process. All proposals are evaluated experimentally on synthetic and real data. The experimental results demonstrate the efficiency and scalability of our framework and query processing algorithms.


2021 ◽  
Vol 546 ◽  
pp. 1113-1134
Author(s):  
Meifan Zhang ◽  
Hongzhi Wang

2021 ◽  
pp. 1-8
Author(s):  
Shuai Ma ◽  
Jinpeng Huai

Over the past a few years, research and development has made significant progresses on big data analytics. A fundamental issue for big data analytics is the efficiency. If the optimal solution is unable to attain or unnecessary or has a price to high to pay, it is reasonable to sacrifice optimality with a "good" feasible solution that can be computed efficiently. Existing approximation techniques can be in general classified into approximation algorithms, approximate query processing for aggregate SQL queries and approximation computing for multiple layers of the system stack. In this article, we systematically introduce approximate computation, i.e. , query approximation and data approximation, for efficient and effective big data analytics. We explain the ideas and rationales behind query and data approximation, and show efficiency can be obtained with high effectiveness, and even without sacrificing for effectiveness, for certain data analytic tasks.


Sign in / Sign up

Export Citation Format

Share Document