An Overview of Large-Scale Stream Processing Engines

A Containerized Approach for Allocating Distributed Stream Queries to Fog Nodes

10.36227/techrxiv.14151650.v1 ◽

2021 ◽

Author(s):

Hamed Hasibi ◽

Saeed Sedighian Kashi

Keyword(s):

Fog Computing ◽

Stream Processing ◽

Stream Data ◽

Process Data ◽

Stream Query Processing ◽

Tremendous Amount ◽

Stream Processing Engines ◽

Iot Devices ◽

Distributed Stream Processing

Fog computing brings cloud capabilities closer to Internet of Things (IoT) devices. IoT devices send a tremendous amount of stream data toward the cloud through hierarchical fog nodes. Many Stream Processing Engines (SPEs) have been developed to process such data streams. Without a fog layer, stream queries are processed on the cloud, so much of the traffic must be forwarded to the cloud. When a hierarchical fog layer is available, distributed stream processing can divide a complex query into simple queries that run on fog nodes. In this paper, we propose an approach, named Stream Queries Placement in Fog (SQPF), that assigns stream queries to fog nodes using container technology. Our goal is to minimize end-to-end delay to achieve a better quality of service. First, in the emulation step, we create Docker container instances of SPEs and evaluate their processing delay and throughput under different resource configurations and for queries with varying input rates. Then, in the placement step, we assign queries to fog nodes using a genetic algorithm. The practical approach used in SQPF achieves a near-best assignment with respect to the lowest application deadline in real scenarios, and the evaluation results support this claim.
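
The placement step can be pictured with a small genetic algorithm. The Python sketch below is a generic textbook GA (truncation selection, one-point crossover, random-reset mutation), not SQPF's exact operators or objective. The inputs proc_delay, net_delay and capacity, the capacity penalty, and the use of summed delay as the fitness are all hypothetical simplifications; the paper ties its objective to application deadlines.

```python
import random

# Hypothetical inputs (not from the paper):
#   proc_delay[q][n]: profiled processing delay (ms) of query q on fog node n,
#                     e.g. as measured in a container emulation step
#   net_delay[n]:     network delay (ms) from fog node n toward the sink
#   capacity[n]:      maximum number of queries fog node n may host


def total_delay(assignment, proc_delay, net_delay, capacity, penalty=1000.0):
    """Fitness: summed per-query delay plus a penalty per query over capacity."""
    cost = 0.0
    load = [0] * len(net_delay)
    for q, n in enumerate(assignment):
        cost += proc_delay[q][n] + net_delay[n]
        load[n] += 1
    cost += penalty * sum(max(0, used - cap) for used, cap in zip(load, capacity))
    return cost


def genetic_placement(proc_delay, net_delay, capacity,
                      pop_size=30, generations=100, mutation_rate=0.1, seed=0):
    """Textbook GA: a chromosome maps each query index to a fog node index.
    pop_size must be at least 4 so that two parents can be sampled."""
    rng = random.Random(seed)
    n_queries, n_nodes = len(proc_delay), len(net_delay)

    def fitness(chromosome):
        return total_delay(chromosome, proc_delay, net_delay, capacity)

    population = [[rng.randrange(n_nodes) for _ in range(n_queries)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_queries) if n_queries > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            for q in range(n_queries):
                if rng.random() < mutation_rate:
                    child[q] = rng.randrange(n_nodes)  # random-reset mutation
            children.append(child)
        population = survivors + children
    best = min(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    proc = [[5, 9, 7], [8, 4, 6], [6, 7, 3], [9, 5, 8]]  # 4 queries x 3 nodes
    net = [2, 3, 1]
    cap = [2, 2, 2]
    print(genetic_placement(proc, net, cap))
```

On this toy instance each query's cheapest node still respects capacity, so the best achievable total is 26 ms; a small elitist GA like this usually reaches it, but is not guaranteed to.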

Benchmarking Tool for Modern Distributed Stream Processing Engines

2019 International Conference on Information Networking (ICOIN) ◽

10.1109/icoin.2019.8718106 ◽

2019 ◽

Author(s):

Muhammad Hanif ◽

Hyeongdeok Yoon ◽

Choonhwa Lee

Keyword(s):

Stream Processing ◽

Stream Processing Engines ◽

Distributed Stream Processing

StreamMine3G: Elastic and Fault Tolerant Large Scale Stream Processing

Encyclopedia of Big Data Technologies ◽

10.1007/978-3-319-63962-8_145-1 ◽

2018 ◽

pp. 1-10

Author(s):

André Martin ◽

Andrey Brito ◽

Christof Fetzer

Keyword(s):

Large Scale ◽

Fault Tolerant ◽

Stream Processing

QoS-aware resource allocation for stream processing engines using priority channels

2017 IEEE 16th International Symposium on Network Computing and Applications (NCA) ◽

10.1109/nca.2017.8171365 ◽

2017 ◽

Author(s):

Yidan Wang ◽

Zahir Tari ◽

M. Reza HoseinyFarahabady ◽

Albert Y. Zomaya

Keyword(s):

Resource Allocation ◽

Stream Processing ◽

Stream Processing Engines

Towards large-scale graph stream processing platform

Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion ◽

10.1145/2567948.2580051 ◽

2014 ◽

Cited By ~ 9

Author(s):

Toyotaro Suzumura ◽

Shunsuke Nishii ◽

Masaru Ganse

Keyword(s):

Large Scale ◽

Stream Processing ◽

Processing Platform

Visualizing Large-Scale Streaming Applications

Information Visualization ◽

10.1057/ivs.2009.5 ◽

2009 ◽

Vol 8 (2) ◽

pp. 87-106 ◽

Cited By ~ 6

Author(s):

Wim De Pauw ◽

Henrique Andrade

Keyword(s):

Performance Optimization ◽

Capacity Planning ◽

Large Scale ◽

Stream Processing ◽

Trading Systems ◽

Streaming Applications ◽

Computing Paradigm ◽

Management Of Resources ◽

Real Time Visualization ◽

Adaptive Nature

Stream processing is a new and important computing paradigm. Innovative streaming applications are being developed in areas ranging from scientific applications (for example, environment monitoring), to business intelligence (for example, fraud detection and trend analysis), to financial markets (for example, algorithmic trading systems). In this paper we describe Streamsight, a new visualization tool built to examine, monitor and help understand the dynamic behavior of streaming applications. Streamsight can handle the complex, distributed and large-scale nature of stream processing applications by using hierarchical graphs, multi-perspective visualizations, and de-cluttering strategies. To address the dynamic and adaptive nature of these applications, Streamsight also provides real-time visualization as well as the capability to record and replay. All these features are used for debugging, for performance optimization, and for management of resources, including capacity planning. More than 100 developers, both inside and outside IBM, have been using Streamsight to help design and implement large-scale stream processing applications.

Alovera: A Fast Stream Processing System for Large-Scale Data

2013 8th ChinaGrid Annual Conference ◽

10.1109/chinagrid.2013.9 ◽

2013 ◽

Author(s):

Zhen'an Zhang ◽

Dongjie Zhang ◽

Xiaopeng Yu ◽

Jing Wang ◽

Chunjiang He ◽

...

Keyword(s):

Large Scale ◽

Stream Processing ◽

Processing System ◽

Large Scale Data ◽

Scale Data

An adaptive SLA-based data flow mechanism for stream processing engines

2017 International Conference on Information and Communication Technology Convergence (ICTC) ◽

10.1109/ictc.2017.8190947 ◽

2017 ◽

Cited By ~ 5

Author(s):

Muhammad Hanif ◽

Hyungduk Yoon ◽

Sunglim Jang ◽

Choonhwa Lee

Keyword(s):

Data Flow ◽

Stream Processing ◽

Flow Mechanism ◽

Stream Processing Engines

Locality/Fairness-Aware Job Scheduling in Distributed Stream Processing Engines

Electronics ◽

10.3390/electronics9111857 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1857

Author(s):

Siwoon Son ◽

Yang-Sae Moon

Keyword(s):

Real World ◽

Data Streams ◽

Job Scheduling ◽

Stream Processing ◽

Process Data ◽

Parallel Tasks ◽

Stream Processing Engines ◽

Job Scheduler ◽

Distributed Stream Processing ◽

Apache Storm

Distributed stream processing engines (DSPEs) deploy multiple tasks on distributed servers to process data streams in real time. Many DSPEs provide locality-aware stream partitioning (LSP) methods to reduce network communication costs. However, the even job scheduler provided by DSPEs deploys tasks far away from each other on the distributed servers, so LSP cannot be used properly. In this paper, we propose a Locality/Fairness-aware job scheduler (L/F job scheduler) that considers locality together with fairness, solving the problems of the even job scheduler, which considers only fairness. First, for locality, the L/F job scheduler increases the cohesion of contiguous tasks that require message transmissions. At the same time, for fairness, it reduces the coupling of parallel tasks that do not require message transmissions. Next, we connect the contiguous tasks into a stream pipeline and evenly deploy stream pipelines to the distributed servers so that the L/F job scheduler achieves high cohesion and low coupling. Finally, we implement the proposed L/F job scheduler in Apache Storm, a representative DSPE, and evaluate it on both synthetic and real-world workloads. Experimental results show that the L/F job scheduler achieves throughput similar to that of the even job scheduler, while significantly improving latency by up to 139.2% for the LSP applications and by up to 140.7% even for the non-LSP applications. The L/F job scheduler also improves latency by 19.58% and 12.13%, respectively, in two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for DSPE applications.
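
The cohesion/coupling idea can be made concrete with a toy Python model. This is an assumption-laden sketch, not the authors' Storm implementation: it assumes a linear topology of num_ops operators with equal parallelism and a locality-aware partitioner that routes the output of task i only to task i of the next operator. The baseline staggers tasks so that contiguous ones land on different servers, mimicking the abstract's description of the even scheduler rather than Storm's actual algorithm. The L/F-style placement groups task i of every operator into pipeline i and spreads whole pipelines round-robin. All function names are hypothetical.

```python
from collections import Counter


def staggered_even_placement(num_ops, parallelism, num_servers):
    """Toy fairness-only baseline: balances task counts but puts
    contiguous tasks (op, i) and (op + 1, i) on different servers."""
    return {(op, i): (op + i) % num_servers
            for op in range(num_ops) for i in range(parallelism)}


def pipeline_placement(num_ops, parallelism, num_servers):
    """Toy L/F-style placement: task i of every operator forms pipeline i,
    and whole pipelines are spread round-robin over the servers."""
    return {(op, i): i % num_servers
            for op in range(num_ops) for i in range(parallelism)}


def remote_edges(placement, num_ops, parallelism):
    """Count links (op, i) -> (op + 1, i) whose endpoints sit on different
    servers, i.e. messages that must cross the network."""
    return sum(placement[(op, i)] != placement[(op + 1, i)]
               for op in range(num_ops - 1) for i in range(parallelism))


def server_load(placement):
    """Number of tasks hosted on each server (a simple fairness measure)."""
    return dict(Counter(placement.values()))


if __name__ == "__main__":
    even = staggered_even_placement(3, 4, 2)
    lf = pipeline_placement(3, 4, 2)
    print("baseline:", remote_edges(even, 3, 4), server_load(even))  # 8 {0: 6, 1: 6}
    print("pipelines:", remote_edges(lf, 3, 4), server_load(lf))     # 0 {0: 6, 1: 6}
```

With three operators of parallelism 4 on two servers, both placements hold six tasks per server, but only the pipeline placement keeps every downstream hop local. When the parallelism is not a multiple of the server count, grouping whole pipelines can leave some imbalance in this toy model.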

A Flexible IoT Stream Processing Architecture Based on Microservices

Information ◽

10.3390/info11120565 ◽

2020 ◽

Vol 11 (12) ◽

pp. 565

Author(s):

Luca Bixio ◽

Giorgio Delzanno ◽

Stefano Rebora ◽

Matteo Rulli

Keyword(s):

Real Time ◽

Query Language ◽

Stream Processing ◽

Heterogeneous Data ◽

Core Level ◽

Processing Unit ◽

Reference Architecture ◽

Real Time Processing ◽

Stream Processing Engines ◽

Processing Engine

The Internet of Things (IoT) has created new and challenging opportunities for data analytics. The IoT represents an infinite source of massive and heterogeneous data, whose real-time processing is an increasingly important issue. IoT applications usually consist of multiple technological layers connecting ‘things’ to a remote cloud core. These layers are generally grouped into two macro levels: the edge level (consisting of the devices at the boundary of the network, near the devices that produce the data) and the core level (consisting of the remote cloud components of the application). The aim of this work is to propose an adaptive microservices architecture for IoT platforms that provides real-time stream processing functionalities able to run seamlessly at both the edge level and the cloud level. In more detail, we introduce the notion of μ-service, a stream processing unit that can be allocated to either the edge or the core level, and a Reference Architecture that provides all the necessary services (namely Proxy, Adapter and Data Processing μ-services) for dealing with real-time stream processing in a very flexible way. Furthermore, in order to abstract away from the underlying stream processing engine and IoT layers (edge/cloud), we propose: (1) a service definition language consisting of a configuration language based on JSON objects (interoperability), (2) a rule-based query language with basic filter operations that can be compiled to most of the existing stream processing engines (portability), and (3) a combinator language to build pipelines of filter definitions (compositionality). Although our proposal has been designed to extend the Senseioty platform, a proprietary IoT platform developed by FlairBit, it could be adapted to any platform based on similar technologies. As a proof of concept, we provide details of a preliminary prototype based on the Java OSGi framework.
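
The three languages (JSON service definitions, rule-based filters, and a combinator for pipelines) can be pictured with a short Python sketch. The JSON schema ("filters", "rules", "field", "op", "value"), the operator set, and every function name are hypothetical. The paper compiles its rules to existing stream processing engines; here filters are compiled to plain Python generator functions purely for illustration.

```python
import json
import operator

# Hypothetical comparison operators a basic rule language might support.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def compile_rule(rule):
    """Compile one rule such as {"field": "temp", "op": ">", "value": 30}
    into a predicate over a single event (a dict). Events lacking the
    field are rejected."""
    cmp = OPS[rule["op"]]
    field, value = rule["field"], rule["value"]
    return lambda event: field in event and cmp(event[field], value)


def filter_stage(rules):
    """A filter stage keeps only the events that satisfy all of its rules."""
    preds = [compile_rule(r) for r in rules]
    return lambda events: (e for e in events if all(p(e) for p in preds))


def pipeline(*stages):
    """Combinator: chain stages left to right into a single stage."""
    def run(events):
        for stage in stages:
            events = stage(events)
        return events
    return run


def from_definition(text):
    """Build a pipeline from a JSON service definition (hypothetical schema)."""
    spec = json.loads(text)
    return pipeline(*(filter_stage(f["rules"]) for f in spec["filters"]))


if __name__ == "__main__":
    definition = """{"service": "hot-sensors", "filters": [
        {"rules": [{"field": "temp", "op": ">", "value": 30}]},
        {"rules": [{"field": "site", "op": "==", "value": "edge-1"}]}]}"""
    events = [{"temp": 35, "site": "edge-1"}, {"temp": 25, "site": "edge-1"},
              {"temp": 40, "site": "cloud"}, {"site": "edge-1"}]
    print(list(from_definition(definition)(events)))  # [{'temp': 35, 'site': 'edge-1'}]
```

Because each stage is lazy, the same composed pipeline could in principle be fed from a live source; swapping the Python "backend" for a real engine is where the paper's portability claim would come in.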