An efficient approach for low latency processing in stream data

PeerJ Computer Science ◽

10.7717/peerj-cs.426 ◽

2021 ◽

Vol 7 ◽

pp. e426

Author(s):

Nirav Bhatt ◽

Amit Thakkar

Keyword(s):

Stock Market ◽

Stream Processing ◽

Big Data Analytics ◽

Processing System ◽

Window Size ◽

Arrival Rate ◽

Low Latency ◽

Stream Data ◽

Distributed Environment ◽

Real World Application

Stream data is data generated continuously from different sources, ideally defined as data with no discrete beginning or end. Processing stream data is a part of big data analytics that aims at querying continuously arriving data and extracting meaningful information from the stream. Although such streams were earlier processed with batch analytics, applications such as the stock market, patient monitoring, and traffic analysis are now seriously affected if output takes minutes or hours to generate. The primary goal of any real-time stream processing system is to process stream data as soon as it arrives. Analytics of stream data must also take surrounding dependent data into account. For example, stock market analytics results are often useless if the associated or dependent parameters that affect the result are ignored. In a real-world application, these dependent streams usually arrive from a distributed environment, so the stream processing system must be designed to deal with delays in the arrival of data from distributed sources. We have designed a stream processing model that deals with the possible sources of latency and provides an end-to-end low-latency system. We performed stock market prediction by considering affecting parameters, such as USD, oil price, and gold price, with an equal arrival rate. We calculated the Normalized Root Mean Square Error (NRMSE), which simplifies comparison among models with different scales. A comparative analysis of the experiments presented in the report shows a significant improvement in the result when the affecting parameters are considered. In this work, we used a statistical approach to forecast the probability of possible latency in data arriving from distributed sources. Moreover, we performed preprocessing of stream data to ensure at-least-once delivery semantics. To further reduce processing latency, we also implemented exactly-once processing semantics. Extensive experiments were performed with varying window sizes and data arrival rates. We conclude that system latency can be reduced when the window size is equal to the data arrival rate.
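The abstract above relies on NRMSE to compare models whose targets sit on different scales. As a minimal sketch of that idea, the Python below divides the RMSE by the range of the observed values. The abstract does not say which normalization the authors used, so range normalization is an assumption here, and the function name `nrmse` and the sample numbers are purely illustrative.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the range of the observed values.

    Range normalization is an assumption; dividing by the mean or the
    standard deviation of the observations are other common choices.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / value_range


# Illustrative (made-up) observations and forecasts.
prices = [100.0, 102.0, 104.0, 110.0]
forecast = [101.0, 101.0, 105.0, 109.0]
score = nrmse(prices, forecast)

# Rescaling both series leaves the score unchanged, which is what makes
# the metric usable across series with different units or magnitudes.
scaled_score = nrmse([p * 1000 for p in prices], [f * 1000 for f in forecast])
```

Because the error and the range scale together, the two scores agree, so a model forecasting an index in the thousands can be compared with one forecasting a price in single digits.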


s2p: Provenance Research for Stream Processing System

Applied Sciences ◽

10.3390/app11125523 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5523

Author(s):

Qian Ye ◽

Minyan Lu

Keyword(s):

Data Storage ◽

Stream Processing ◽

Processing System ◽

Coarse Grained ◽

Stream Data ◽

Related Data ◽

DSP System ◽

Provenance Research ◽

DSP Systems ◽

Abnormal Results

The main purpose of our provenance research for DSP (distributed stream processing) systems is to analyze abnormal results. Provenance for these systems is nontrivial because of the ephemerality of stream data and the instant data processing mode of modern DSP systems. Challenges include, but are not limited to, avoiding excessive runtime overhead, reducing provenance-related data storage, and providing provenance in an easy-to-use fashion. Without prior knowledge of which kinds of data may eventually lead to abnormal results, we have to track all transformations in detail, which can place a heavy burden on the system. This paper proposes s2p (Stream Process Provenance), which mainly consists of online provenance and offline provenance, to provide fine- and coarse-grained provenance at different levels of precision. We base the design of s2p on the fact that, for a mature online DSP system, abnormal results are rare, and results that require detailed analysis are even rarer. We also consider state transitions in our provenance explanation. We implement s2p on Apache Flink, as s2p-flink, and conduct three experiments to evaluate its scalability, efficiency, and overhead in terms of end-to-end cost, throughput, and space overhead. Our evaluation shows that s2p-flink incurs a 13% to 32% cost overhead, an 11% to 24% decline in throughput, and few additional space costs in the online provenance phase. The experiments also demonstrate that s2p-flink scales well. A case study is presented to demonstrate the feasibility of the whole s2p solution.


Load adaptive and fault tolerant distributed stream processing system for explosive stream data

2016 18th International Conference on Advanced Communication Technology (ICACT) ◽

10.1109/icact.2016.7423612 ◽

2016 ◽

Author(s):

Myungcheol Lee ◽

Miyoung Lee ◽

Sung Jin Hur ◽

Ikkyun Kim

Keyword(s):

Fault Tolerant ◽

Stream Processing ◽

Processing System ◽

Stream Data ◽

Distributed Stream Processing


Load adaptive distributed stream processing system for explosive stream data

2015 17th International Conference on Advanced Communication Technology (ICACT) ◽

10.1109/icact.2015.7224896 ◽

2015 ◽

Cited By ~ 1

Author(s):

Myungcheol Lee ◽

Miyoung Lee ◽

Sung Jin Hur ◽

Ikkyun Kim

Keyword(s):

Stream Processing ◽

Processing System ◽

Stream Data ◽

Distributed Stream Processing


A QoS-Latency Aware Event Stream Processing with Elastic-FaaS

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9965.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3756-3762

Keyword(s):

Real Time ◽

Virtual Machines ◽

Stream Processing ◽

Processing System ◽

Low Latency ◽

Event Stream ◽

Huge Impact ◽

Time Required ◽

The Right ◽

Cloud Technologies

Stream processing systems need to be elastically scalable so that they can process and respond to unpredictable, massive load spikes in real time with high throughput and low latency. Although modern cloud technologies can help provision the required computing resources elastically and on the fly, the right point in time to do so varies among systems according to their expected QoS characteristics. The latency sensitivity of stream processing applications varies with their nature and pre-set requirements. For a few applications, even a small latency in the response has a huge impact, whereas for others it does not matter much. For the former, the processing systems are expected to be highly available, elastically scalable, and fast enough to perform whenever there is a spike. The time required to elastically provision the systems under FaaS is very high compared with provisioning virtual machines and containers. However, current FaaS systems have some limitations that need to be overcome to handle unexpected spikes in real time. This paper proposes a new algorithm, called Elastic-FaaS, on top of the existing FaaS to overcome this QoS latency issue. Whenever there is demand, the proposed algorithm provisions the required number of FaaS container instances, beyond what a typical FaaS can normally provision, to avoid the latency issue. We evaluated our algorithm with an event stream processing system, and the results show that the proposed Elastic-FaaS algorithm performs better than a typical FaaS by improving throughput in a way that meets the high accuracy and low latency requirements.


Load adaptive and fault tolerant distributed stream processing system for explosive stream data

2016 18th International Conference on Advanced Communication Technology (ICACT) ◽

10.1109/icact.2016.7423613 ◽

2016 ◽

Author(s):

Myungcheol Lee ◽

Miyoung Lee ◽

Sung Jin Hur ◽

Ikkyun Kim

Keyword(s):

Fault Tolerant ◽

Stream Processing ◽

Processing System ◽

Stream Data ◽

Distributed Stream Processing


Deep Learning Based Forecasting in Stock Market with Big Data Analytics

2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT) ◽

10.1109/ebbt.2019.8741818 ◽

2019 ◽

Cited By ~ 2

Author(s):

Gozde Sismanoglu ◽

Mehmet Ali Onde ◽

Furkan Kocer ◽

Ozgur Koray Sahingoz

Keyword(s):

Big Data ◽

Deep Learning ◽

Stock Market ◽

Data Analytics ◽

Big Data Analytics


Architecture of a stream processing system

Fundamentals of Stream Processing ◽

10.1017/cbo9781139058940.009 ◽

2014 ◽

pp. 203-217

Author(s):

Henrique Andrade ◽

Bugra Gedik ◽

Deepak Turaga

Keyword(s):

Stream Processing ◽

Processing System


Integrating fault-tolerance and elasticity in a distributed data stream processing system

Proceedings of the 26th International Conference on Scientific and Statistical Database Management - SSDBM '14 ◽

10.1145/2618243.2618288 ◽

2014 ◽

Cited By ~ 7

Author(s):

Kasper Grud Skat Madsen ◽

Philip Thyssen ◽

Yongluan Zhou

Keyword(s):

Fault Tolerance ◽

Data Stream ◽

Stream Processing ◽

Processing System ◽

Distributed Data ◽

Data Stream Processing


Efficient Sensor Stream Data Processing System to use Cache Technique for Ubiquitous Sensor Network Application Service

Journal of Computer Science ◽

10.3844/jcssp.2012.333.336 ◽

2012 ◽

Vol 8 (3) ◽

pp. 333-336 ◽

Cited By ~ 2

Author(s):

Keyword(s):

Data Processing ◽

Sensor Network ◽

Processing System ◽

Stream Data ◽

Data Processing System ◽

Application Service ◽

Network Application ◽

Stream Data Processing


A Containerized Approach for Allocating Distributed Stream Queries to Fog Nodes

10.36227/techrxiv.14151650.v1 ◽

2021 ◽

Author(s):

Hamed Hasibi ◽

Saeed Sedighian Kashi

Keyword(s):

Fog Computing ◽

Stream Processing ◽

Stream Data ◽

Process Data ◽

Stream Query Processing ◽

Tremendous Amount ◽

Stream Processing Engines ◽

IoT Devices ◽

Distributed Stream Processing

Fog computing brings cloud capabilities closer to Internet of Things (IoT) devices. IoT devices generate a tremendous amount of stream data that travels toward the cloud via hierarchical fog nodes. To process data streams, many Stream Processing Engines (SPEs) have been developed. Without the fog layer, stream query processing executes in the cloud, which sends a large volume of traffic toward the cloud. When a hierarchical fog layer is available, a complex query can be divided into simple queries that run on fog nodes by using distributed stream processing. In this paper, we propose an approach for assigning stream queries to fog nodes using container technology. We name this approach Stream Queries Placement in Fog (SQPF). Our goal is to minimize end-to-end delay to achieve a better quality of service. First, in the emulation step, we create Docker container instances from SPEs and evaluate their processing delay and throughput under different resource configurations and queries with varying input rates. Then, in the placement step, we assign queries to fog nodes by using a genetic algorithm. The practical approach used in SQPF achieves a near-best assignment based on the lowest application deadline in real scenarios, and the evaluation results support this.
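The placement step described above can be illustrated with a small genetic algorithm. The sketch below is not the SQPF implementation: the cost function (the largest per-node sum of query delays, used as a stand-in for end-to-end delay), the operators (one-point crossover, per-gene mutation, keeping the better half), the function names, and the delay table are all assumptions made for illustration.

```python
import random


def cost(assignment, delay):
    """Largest per-node total delay (hypothetical stand-in for end-to-end delay)."""
    load = [0.0] * len(delay[0])
    for query, node in enumerate(assignment):
        load[node] += delay[query][node]
    return max(load)


def place_queries(delay, generations=200, pop_size=30, mutation_rate=0.1, seed=0):
    """Search for a query-to-node assignment with low cost.

    delay[q][n] is the measured delay of query q on node n.
    Assumes at least two queries and pop_size >= 4.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_queries, n_nodes = len(delay), len(delay[0])
    population = [[rng.randrange(n_nodes) for _ in range(n_queries)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda a: cost(a, delay))
        survivors = population[:pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_queries)  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n_queries):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)  # random reassignment
            children.append(child)
        population = survivors + children
    return min(population, key=lambda a: cost(a, delay))


# Made-up delays (e.g. milliseconds) for 4 queries on 3 fog nodes.
delay = [
    [4.0, 6.0, 5.0],
    [3.0, 2.0, 4.0],
    [5.0, 5.0, 1.0],
    [2.0, 3.0, 3.0],
]
best = place_queries(delay)
best_cost = cost(best, delay)
```

In this sketch the delay table plays the role of the measurements gathered in the emulation step; a realistic objective would also need to account for network hops between fog levels and per-node resource limits, which are omitted here.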
