Probabilistic Hesitant Fuzzy Methods for Prioritizing Distributed Stream Processing Frameworks for IoT Applications

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zhimin Lin ◽  
Chao Huang ◽  
Mingwei Lin

Distributed stream processing frameworks (DSPFs) are the vital engines that handle real-time data processing and analytics for IoT applications. How to prioritize DSPFs and select the most suitable one for a specific IoT application is an open issue. To help developers of IoT applications solve this complex issue, a novel probabilistic hesitant fuzzy multicriteria decision making (MCDM) model is put forward in this paper. To characterize the requirements of large-scale IoT data stream processing, a novel evaluation criteria system comprising qualitative and quantitative criteria is established. To accurately model the collective opinions of skilled developers and account for their psychological distance, probabilistic hesitant fuzzy sets (PHFSs) are used. To derive the importance degrees of the criteria, a novel probabilistic hesitant fuzzy best-worst (PHFBW) method based on the score value is proposed. To prioritize the DSPFs and choose the most suitable one, a novel probabilistic hesitant fuzzy MULTIMOORA method is put forward. Finally, a practical case comprising four Apache stream processing frameworks, namely Storm, Flink, Spark, and Samza, is studied. The obtained results indicate that throughput, latency, and reliability are the three most important criteria, and that Flink is the most suitable stream processing framework.
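To make the score-based ranking idea concrete, here is a minimal illustrative sketch (not the authors' exact method): each alternative is evaluated on each criterion by a probabilistic hesitant fuzzy element (PHFE), scored with the common formula s(h) = Σᵢ γᵢ·pᵢ, and ranked by the weighted sum of scores. All evaluations and weights below are hypothetical.

```python
# Illustrative sketch of score-based ranking with probabilistic hesitant
# fuzzy elements (PHFEs). A PHFE is a list of (membership, probability)
# pairs; its score is the probability-weighted membership value.
# The evaluations and criterion weights here are made-up examples,
# not data from the paper.

def phfe_score(phfe):
    """Score of a PHFE given as [(membership, probability), ...]."""
    return sum(g * p for g, p in phfe)

# Hypothetical expert evaluations of two frameworks on three criteria
# (throughput, latency, reliability), each expressed as a PHFE.
evaluations = {
    "Flink": [[(0.8, 0.6), (0.9, 0.4)], [(0.7, 1.0)], [(0.8, 0.5), (0.7, 0.5)]],
    "Storm": [[(0.6, 0.7), (0.7, 0.3)], [(0.5, 1.0)], [(0.6, 1.0)]],
}
weights = [0.4, 0.35, 0.25]  # assumed criterion importance degrees

def overall(alternative):
    # Weighted sum of per-criterion PHFE scores.
    return sum(w * phfe_score(h) for w, h in zip(weights, evaluations[alternative]))

ranking = sorted(evaluations, key=overall, reverse=True)
print(ranking)  # ['Flink', 'Storm']
```

The actual PHFBW and MULTIMOORA methods in the paper refine this idea (deriving the weights from best-worst comparisons and aggregating with multiple ranking subsystems), but the score function above is the basic building block.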

Author(s):  
Mario José Diván ◽  
María Laura Sánchez-Reynoso

The Internet of Things (IoT) has emerged as a way to connect heterogeneous pieces of technology and foster distributed data collection. Measurement projects and real-time data processing are articulated to take advantage of this environment, fostering sustainable data-driven decision making. The Data Stream Processing Strategy (DSPS) is a stream processing engine focused on measurement projects, where each concept is agreed upon in advance through a measurement framework. The Measurement Adapter (MA) is a component responsible for pairing each metric's definition from the measurement project with data sensors, so that data (i.e., measures) are transmitted together with metadata (i.e., tags indicating the data's meaning). The Gathering Function (GF) receives data from each MA and forwards it for processing, while implementing metadata-based load-shedding (LS) techniques to avoid a processing collapse when all MAs report jointly and frequently. Here, a metadata- and Z-score-based load-shedding technique implemented locally in the MA is proposed. Thus, load shedding happens at the data source itself, avoiding data transmission and saving resources. In addition, incremental estimates of averages, deviations, covariances, and correlations are computed and used to calculate the Z-scores and to selectively retain or discard data. Four discrete simulations were designed and performed to analyze the proposal. The results indicate that local LS required only 24% of the original data transmissions, with a minimum data lifespan of 18.61 ms, while consuming 890.26 KB. As future work, other kinds of dependency analysis will be studied to provide local alternatives for LS.
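A minimal sketch of the local Z-score load-shedding idea (not the paper's exact algorithm): the running mean and variance are maintained incrementally (Welford's method, a standard incremental estimator), and a reading is transmitted only when its |z| exceeds a threshold, so only "surprising" measures leave the adapter. The threshold and warm-up length are assumed parameters.

```python
# Sketch of Z-score load shedding at the data source: maintain running
# mean/variance incrementally and transmit a reading only if it deviates
# enough from what has been seen so far. Threshold and warm-up are
# illustrative parameters, not values from the paper.
import math

class ZScoreShedder:
    def __init__(self, threshold=1.5, warmup=5):
        self.threshold, self.warmup = threshold, warmup
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Welford's incremental update of mean and sum of squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def should_transmit(self, x):
        if self.n < self.warmup:            # always send during warm-up
            self.update(x)
            return True
        std = math.sqrt(self.m2 / (self.n - 1))
        z = 0.0 if std == 0 else (x - self.mean) / std
        self.update(x)
        return abs(z) >= self.threshold     # shed "unsurprising" readings

shedder = ZScoreShedder()
stream = [20.0, 20.1, 19.9, 20.0, 20.2, 20.1, 20.0, 27.5, 20.1]
sent = [x for x in stream if shedder.should_transmit(x)]
# Stable readings after warm-up are shed; the outlier 27.5 is transmitted.
```

Placing this filter inside the MA, as the abstract describes, means discarded readings never consume network bandwidth at all, which is where the reported 76% reduction in transmissions comes from.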


2017 ◽  
Author(s):  
Francesco Versaci ◽  
Luca Pireddu ◽  
Gianluigi Zanetti

Personalized medicine is in great part enabled by progress in data acquisition technologies for modern biology, such as next-generation sequencing (NGS). Conventional NGS processing workflows are composed of independent tools that implement shared-memory parallelism and communicate through intermediate files. With increasing data sizes, this approach shows limited scalability and robustness, problems that make it unsuitable for large-scale, population-wide personalized medicine applications. In this work we propose adopting a stream computing architecture to make the genomics pipeline more scalable and fault-tolerant. We implemented the first processing phases for Illumina sequencing data (from raw data to alignment) using the Apache Flink distributed stream processing framework and Apache Kafka. The new pipeline was tested by processing the raw output of an Illumina HiSeq3000 sequencer and producing aligned reads in CRAM format. The results show near-optimal scalability on experiments from 1 to 12 computing nodes, with a speed-up of 9.5x over the conventional solution (which cannot automatically run on multiple nodes). This result is particularly positive given that the experiment's very short runtime, less than 15 minutes, makes the constant overheads of the frameworks significant.
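To put the reported figures in perspective, the standard parallel-efficiency calculation (speed-up divided by node count, not anything specific to this pipeline) can be applied to the numbers above:

```python
# Back-of-the-envelope parallel efficiency from the figures reported in
# the abstract: a 9.5x speed-up on 12 nodes. efficiency = speedup / nodes
# is the standard definition, nothing pipeline-specific.
speedup, nodes = 9.5, 12
efficiency = speedup / nodes
print(f"parallel efficiency: {efficiency:.0%}")  # parallel efficiency: 79%
```

An efficiency near 79% on a sub-15-minute run is indeed strong, since fixed framework start-up and coordination costs are amortized over very little compute time.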

