An Overview of Large-Scale Stream Processing Engines

2014 ◽  
pp. 404-423 ◽  
2021 ◽  
Author(s):  
Hamed Hasibi ◽  
Saeed Sedighian Kashi

Fog computing brings cloud capabilities closer to Internet of Things (IoT) devices. IoT devices send a tremendous amount of stream data toward the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without a fog layer, stream queries execute in the cloud, which forwards a large volume of traffic to the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries that run on fog nodes using distributed stream processing. In this paper, we propose an approach that assigns stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. First, in the emulation step, we create Docker container instances of SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then, in the placement step, we assign queries to fog nodes using a genetic algorithm. The practical approach used in SQPF achieves a near-optimal assignment based on the lowest application deadline in real scenarios, and the evaluation results support this.
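To make the placement step concrete, the following is a minimal sketch of a genetic algorithm that assigns queries to fog nodes. It is not the SQPF implementation: the delay table `PROC_DELAY`, the per-node `CAPACITY`, the `OVERLOAD_PENALTY`, and the cost function (sum of per-query processing delays rather than a full end-to-end delay model) are all assumptions made for illustration.

```python
import random

# Hypothetical measurements, standing in for the emulation step:
# PROC_DELAY[q][n] is the processing delay (ms) of query q on fog node n.
PROC_DELAY = [
    [5.0, 9.0, 14.0],
    [7.0, 4.0, 12.0],
    [6.0, 8.0, 3.0],
    [9.0, 5.0, 6.0],
]
CAPACITY = [2, 2, 2]        # assumed maximum number of queries per node
OVERLOAD_PENALTY = 100.0    # assumed penalty per query above capacity


def cost(assignment):
    """Total delay of an assignment (assignment[q] = node index), plus penalties."""
    total = sum(PROC_DELAY[q][n] for q, n in enumerate(assignment))
    overload = sum(
        max(0, assignment.count(n) - cap) for n, cap in enumerate(CAPACITY)
    )
    return total + OVERLOAD_PENALTY * overload


def evolve(generations=200, pop_size=30, mutation_rate=0.2, seed=0):
    """Elitist GA: keep the better half, refill with crossover and mutation."""
    rng = random.Random(seed)
    n_queries, n_nodes = len(PROC_DELAY), len(CAPACITY)
    pop = [
        [rng.randrange(n_nodes) for _ in range(n_queries)]
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_queries)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:       # move one query to a random node
                child[rng.randrange(n_queries)] = rng.randrange(n_nodes)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)


best = evolve()
print("assignment:", best, "cost:", cost(best))
```

In a fuller model, the cost function would also account for network delay between hierarchical fog levels and for application deadlines, which this sketch leaves out.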


2009 ◽  
Vol 8 (2) ◽  
pp. 87-106 ◽  
Author(s):  
Wim De Pauw ◽  
Henrique Andrade

Stream processing is a new and important computing paradigm. Innovative streaming applications are being developed in areas ranging from scientific applications (for example, environment monitoring), to business intelligence (for example, fraud detection and trend analysis), to financial markets (for example, algorithmic trading systems). In this paper we describe Streamsight, a new visualization tool built to examine, monitor and help understand the dynamic behavior of streaming applications. Streamsight can handle the complex, distributed and large-scale nature of stream processing applications by using hierarchical graphs, multi-perspective visualizations, and de-cluttering strategies. To address the dynamic and adaptive nature of these applications, Streamsight also provides real-time visualization as well as the capability to record and replay. All these features are used for debugging, for performance optimization, and for management of resources, including capacity planning. More than 100 developers, both inside and outside IBM, have been using Streamsight to help design and implement large-scale stream processing applications.


Author(s):  
Zhen'an Zhang ◽  
Dongjie Zhang ◽  
Xiaopeng Yu ◽  
Jing Wang ◽  
Chunjiang He ◽  
...  

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1857
Author(s):  
Siwoon Son ◽  
Yang-Sae Moon

Distributed stream processing engines (DSPEs) deploy multiple tasks on distributed servers to process data streams in real time. Many DSPEs provide locality-aware stream partitioning (LSP) methods to reduce network communication costs. However, the even job scheduler provided by DSPEs deploys tasks far away from each other on the distributed servers, so the LSP cannot be used properly. In this paper, we propose a Locality/Fairness-aware job scheduler (L/F job scheduler) that considers locality together with fairness, to solve the problems of the even job scheduler, which considers only fairness. First, for locality, the L/F job scheduler increases the cohesion of contiguous tasks that require message transmissions. At the same time, for fairness, it reduces the coupling of parallel tasks that do not require message transmissions. Next, we connect the contiguous tasks into a stream pipeline and evenly deploy stream pipelines to the distributed servers so that the L/F job scheduler achieves high cohesion and low coupling. Finally, we implement the proposed L/F job scheduler in Apache Storm, a representative DSPE, and evaluate it on both synthetic and real-world workloads. Experimental results show that the L/F job scheduler achieves throughput similar to that of the even job scheduler, while latency is significantly improved by up to 139.2% for the LSP applications and by up to 140.7% even for the non-LSP applications. The L/F job scheduler also improves latency by 19.58% and 12.13%, respectively, in two real-world workloads. These results indicate that our L/F job scheduler provides superior processing performance for DSPE applications.
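The pipeline idea can be illustrated with a small sketch. This is not the authors' Apache Storm scheduler and uses no Storm API; the stage names, the `build_pipelines` and `deploy_evenly` helpers, and the round-robin policy are assumptions chosen only to show contiguous tasks kept together (cohesion) while parallel pipelines are spread across servers (fairness).

```python
from collections import defaultdict


def build_pipelines(stages, parallelism):
    """Connect the i-th task of every contiguous stage into one stream pipeline."""
    return [[f"{stage}-{i}" for stage in stages] for i in range(parallelism)]


def deploy_evenly(pipelines, servers):
    """Place whole pipelines round-robin, so each server gets an even share."""
    placement = defaultdict(list)
    for i, pipeline in enumerate(pipelines):
        placement[servers[i % len(servers)]].extend(pipeline)
    return dict(placement)


# Hypothetical word-count style topology with three stages and parallelism 4.
pipelines = build_pipelines(["spout", "split", "count"], parallelism=4)
placement = deploy_evenly(pipelines, ["server-a", "server-b"])

for server, tasks in placement.items():
    print(server, tasks)
```

Because each pipeline lands on a single server, messages between its contiguous tasks stay local, while the round-robin step keeps the number of tasks per server balanced.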


Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 565
Author(s):  
Luca Bixio ◽  
Giorgio Delzanno ◽  
Stefano Rebora ◽  
Matteo Rulli

The Internet of Things (IoT) has created new and challenging opportunities for data analytics. The IoT represents an infinite source of massive and heterogeneous data, whose real-time processing is an increasingly important issue. IoT applications usually consist of multiple technological layers connecting ‘things’ to a remote cloud core. These layers are generally grouped into two macro levels: the edge level (consisting of the devices at the boundary of the network, near the devices that produce the data) and the core level (consisting of the remote cloud components of the application). The aim of this work is to propose an adaptive microservices architecture for IoT platforms that provides real-time stream processing functionalities that can run seamlessly at both the edge level and the cloud level. In more detail, we introduce the notion of μ-service, a stream processing unit that can be allocated on either the edge or the core level, and a Reference Architecture that provides all necessary services (namely Proxy, Adapter and Data Processing μ-services) for dealing with real-time stream processing in a very flexible way. Furthermore, in order to abstract away from the underlying stream processing engine and IoT layers (edge/cloud), we propose: (1) a service definition language consisting of a configuration language based on JSON objects (interoperability), (2) a rule-based query language with basic filter operations that can be compiled to most of the existing stream processing engines (portability), and (3) a combinator language to build pipelines of filter definitions (compositionality). Although our proposal has been designed to extend the Senseioty platform, a proprietary IoT platform developed by FlairBit, it could be adapted to any platform based on similar technologies. As a proof of concept, we provide details of a preliminary prototype based on the Java OSGi framework.
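As a rough illustration of how JSON-based filter definitions might be compiled and combined into a pipeline, consider the sketch below. The rule schema (`field`, `op`, `value`), the `compile_filter` and `pipeline` helpers, and the sample events are all hypothetical; the abstract does not specify the actual Senseioty languages, and the prototype itself is based on Java OSGi rather than Python.

```python
import json
import operator

# Assumed set of basic filter operations.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def compile_filter(rule):
    """Turn one JSON rule object into a predicate over event dictionaries."""
    compare = OPS[rule["op"]]
    field, value = rule["field"], rule["value"]
    return lambda event: compare(event[field], value)


def pipeline(*predicates):
    """Combinator: keep only the events that pass every filter, in order."""
    return lambda events: [e for e in events if all(p(e) for p in predicates)]


# Hypothetical filter definitions written as JSON objects.
rules = json.loads(
    '[{"field": "temp", "op": ">", "value": 30},'
    ' {"field": "unit", "op": "==", "value": "C"}]'
)
hot_celsius = pipeline(*(compile_filter(r) for r in rules))

events = [
    {"temp": 35, "unit": "C"},
    {"temp": 20, "unit": "C"},
    {"temp": 95, "unit": "F"},
]
print(hot_celsius(events))
```

Since the rules are plain data, the same definitions could in principle be shipped to an edge node or a cloud node and compiled there for whichever engine is available.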

