A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

Adeyinka Akanbi; Muthoni Masinde

doi:10.3390/s20113166

Sensors ◽

10.3390/s20113166 ◽

2020 ◽

Vol 20 (11) ◽

pp. 3166

Author(s):

Adeyinka Akanbi ◽

Muthoni Masinde

Keyword(s):

Big Data ◽

Environmental Monitoring ◽

Real Time ◽

Stream Processing ◽

Heterogeneous Data ◽

Legacy Systems ◽

Time Analysis ◽

Real Time Analysis ◽

Distributed Stream Processing ◽

Processing Engine

In recent years, the wide adoption of Internet of Things (IoT)-based technologies has increased the proliferation of monitoring systems, which in turn has exponentially increased the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and practice is gradually moving from classical ‘batch’ processing, that is, the extract, transform, load (ETL) technique, to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicate real-time analysis of the essential data and integration with big data platforms, and which perpetuate the reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics using Kafka Connect APIs, for processing by the Kafka stream processing engine. The stream processing engine executes the predictive numerical models and algorithms, represented in event processing (EP) languages, for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. Finally, the performance of the distributed stream processing middleware infrastructure is evaluated to determine the real-time effectiveness of the framework.
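The abstract does not give the EDI formula or the event processing code, so the following is only a minimal sketch of how a drought index might be evaluated continuously over a stream of daily rainfall readings. It assumes the commonly cited definition of effective precipitation attributed to Byun and Wilhite, EP = Σ_{n=1..i} (Σ_{m=1..n} P_m) / n, where P_m is the precipitation m days before the current day. The function names, the short window, and the climatological constants `mean_ep` and `sd_dep` are hypothetical, and a plain Python iterable stands in for a Kafka topic.

```python
from collections import deque


def effective_precipitation(window):
    """Effective precipitation for a window ordered most-recent-first.

    Implements EP = sum over n of (sum of the n most recent readings) / n.
    """
    ep = 0.0
    running_total = 0.0
    for n, precipitation in enumerate(window, start=1):
        running_total += precipitation
        ep += running_total / n
    return ep


def edi_stream(readings, duration, mean_ep, sd_dep):
    """Yield one index value per reading once the window is full.

    mean_ep and sd_dep are assumed to come from historical data;
    here they are illustrative placeholders, not values from the paper.
    """
    window = deque(maxlen=duration)
    for precipitation in readings:
        # Newest reading goes first; the oldest is dropped automatically.
        window.appendleft(precipitation)
        if len(window) == duration:
            deviation = effective_precipitation(window) - mean_ep
            yield deviation / sd_dep


if __name__ == "__main__":
    # Hypothetical daily rainfall in mm, with a 3-day window for brevity.
    for value in edi_stream([6.0, 0.0, 3.0, 0.0], duration=3,
                            mean_ep=5.5, sd_dep=2.0):
        print(value)
```

In practice the summation duration is usually much longer than three days, and negative values indicate drier-than-normal conditions. A real deployment would replace the list with a stream consumer and derive the constants from the historical dataset.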

Real Time Analysis System of User Behavior from the Perspective of Big Data

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/750/1/012055 ◽

2020 ◽

Vol 750 ◽

pp. 012055

Author(s):

Zhiqiang Cai ◽

Jiaai Zhang

Keyword(s):

Big Data ◽

Real Time ◽

User Behavior ◽

Time Analysis ◽

Real Time Analysis ◽

Analysis System

A scalable machine learning online service for big data real-time analysis

2014 IEEE Symposium on Computational Intelligence in Big Data (CIBD) ◽

10.1109/cibd.2014.7011537 ◽

2014 ◽

Cited By ~ 13

Author(s):

Alejandro Baldominos ◽

Esperanza Albacete ◽

Yago Saez ◽

Pedro Isasi

Keyword(s):

Machine Learning ◽

Big Data ◽

Real Time ◽

Time Analysis ◽

Online Service ◽

Real Time Analysis

A Flexible IoT Stream Processing Architecture Based on Microservices

Information ◽

10.3390/info11120565 ◽

2020 ◽

Vol 11 (12) ◽

pp. 565

Author(s):

Luca Bixio ◽

Giorgio Delzanno ◽

Stefano Rebora ◽

Matteo Rulli

Keyword(s):

Real Time ◽

Query Language ◽

Stream Processing ◽

Heterogeneous Data ◽

Core Level ◽

Processing Unit ◽

Reference Architecture ◽

Real Time Processing ◽

Stream Processing Engines ◽

Processing Engine

The Internet of Things (IoT) has created new and challenging opportunities for data analytics. The IoT represents an infinite source of massive and heterogeneous data, whose real-time processing is an increasingly important issue. IoT applications usually consist of multiple technological layers connecting ‘things’ to a remote cloud core. These layers are generally grouped into two macro levels: the edge level (consisting of the devices at the boundary of the network, near the devices that produce the data) and the core level (consisting of the remote cloud components of the application). The aim of this work is to propose an adaptive microservices architecture for IoT platforms which provides real-time stream processing functionalities that can run seamlessly at both the edge level and the cloud level. In more detail, we introduce the notion of μ-service, a stream processing unit that can be allocated interchangeably on the edge or core level, and a Reference Architecture that provides all necessary services (namely Proxy, Adapter and Data Processing μ-services) for dealing with real-time stream processing in a very flexible way. Furthermore, in order to abstract away from the underlying stream processing engine and IoT layers (edge/cloud), we propose: (1) a service definition language consisting of a configuration language based on JSON objects (interoperability), (2) a rule-based query language with basic filter operations that can be compiled to most of the existing stream processing engines (portability), and (3) a combinator language to build pipelines of filter definitions (compositionality). Although our proposal has been designed to extend the Senseioty platform, a proprietary IoT platform developed by FlairBit, it could be adapted to any platform based on similar technologies. As a proof of concept, we provide details of a preliminary prototype based on the Java OSGi framework.
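The three languages named in the abstract (JSON-based definitions, rule-based filters, and a combinator for pipelines) can be illustrated with a small sketch. The abstract does not show the actual syntax, so the JSON schema (`field`, `op`, `value`), the operator names, and the functions `compile_rule` and `pipeline` below are all hypothetical and are not the Senseioty or FlairBit API.

```python
import json
import operator

# Hypothetical operator vocabulary for the rule language.
OPERATORS = {"gt": operator.gt, "lt": operator.lt, "eq": operator.eq}


def compile_rule(rule):
    """Turn one JSON rule object into a predicate over event dicts."""
    compare = OPERATORS[rule["op"]]
    field, value = rule["field"], rule["value"]
    # Events lacking the field are rejected rather than raising an error.
    return lambda event: field in event and compare(event[field], value)


def pipeline(*predicates):
    """Combinator: chain filters so an event must pass every one."""
    def run(events):
        for event in events:
            if all(predicate(event) for predicate in predicates):
                yield event
    return run


if __name__ == "__main__":
    definition = json.loads(
        '[{"field": "temperature", "op": "gt", "value": 30},'
        ' {"field": "sensor", "op": "eq", "value": "edge-1"}]'
    )
    hot_on_edge = pipeline(*(compile_rule(rule) for rule in definition))
    events = [
        {"sensor": "edge-1", "temperature": 35},
        {"sensor": "edge-1", "temperature": 20},
        {"sensor": "core-7", "temperature": 40},
    ]
    print(list(hot_on_edge(events)))
```

Because the rules are plain data, the same definition could in principle be translated to another engine's filter syntax instead of to Python closures, which is the portability argument the abstract makes.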

Visualized Analysis of Tourism Big Data based on Real-Time Analysis and Complexity Measurement

2021 6th International Conference on Inventive Computation Technologies (ICICT) ◽

10.1109/icict50816.2021.9358758 ◽

2021 ◽

Author(s):

Ling Liu

Keyword(s):

Big Data ◽

Real Time ◽

Time Analysis ◽

Real Time Analysis ◽

Complexity Measurement

Real-time analysis of healthcare using big data analytics

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/263/4/042056 ◽

2017 ◽

Vol 263 ◽

pp. 042056 ◽

Cited By ~ 3

Author(s):

J Antony Basco ◽

N C Senthilkumar

Keyword(s):

Big Data ◽

Real Time ◽

Data Analytics ◽

Big Data Analytics ◽

Time Analysis ◽

Real Time Analysis

Real-time Analysis and Visualization for Big Data of Energy Consumption

Proceedings of the 2017 International Conference on Software and e-Business - ICSEB 2017 ◽

10.1145/3178212.3178229 ◽

2017 ◽

Author(s):

Jiaxue Li ◽

Wei Song ◽

Simon Fong

Keyword(s):

Big Data ◽

Energy Consumption ◽

Real Time ◽

Time Analysis ◽

Real Time Analysis

The Metamorphosis (of RAM3S)

Applied Sciences ◽

10.3390/app112411584 ◽

2021 ◽

Vol 11 (24) ◽

pp. 11584

Author(s):

Ilaria Bartolini ◽

Marco Patella

Keyword(s):

Big Data ◽

Real Time ◽

Data Streams ◽

Multimedia Data ◽

Data Streaming ◽

Time Analysis ◽

Real Time Analysis ◽

Big Data Technologies ◽

Big Data Streams ◽

The One

The real-time analysis of Big Data streams is a terrific resource for transforming data into value. To this end, Big Data technologies for smart processing of massive data streams are available, but the facilities they offer are often too raw to be effectively exploited by analysts. RAM3S (Real-time Analysis of Massive MultiMedia Streams) is a framework that acts as a middleware software layer between multimedia stream analysis techniques and Big Data streaming platforms, so as to facilitate the implementation of the former on top of the latter. RAM3S has proven helpful in simplifying the deployment of non-parallel techniques to streaming platforms, such as Apache Storm or Apache Flink. In this paper, we show how RAM3S has been updated to incorporate novel stream processing platforms, such as Apache Samza, and to communicate with different message brokers, such as Apache Kafka. Abstracting from the message broker also makes it possible to pipeline several RAM3S instances, each of which can perform a different processing task. This represents a richer model for stream analysis than the one available in the original RAM3S version. The generality of this new RAM3S version is demonstrated through experiments conducted on three different multimedia applications, proving that RAM3S is a formidable asset for enabling efficient and effective Data Mining and Machine Learning on multimedia data streams.
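The idea of abstracting from the message broker so that several processing instances can be chained may be sketched as follows. This is not RAM3S code: the `Broker` interface, the `InMemoryBroker` stand-in, the `run_stage` helper, and the topic names are hypothetical, and an in-memory queue replaces a real broker such as Apache Kafka.

```python
from collections import defaultdict, deque
from typing import Callable, Iterator, Protocol


class Broker(Protocol):
    """Minimal interface a stage needs from any message broker."""

    def publish(self, topic: str, message: object) -> None: ...

    def consume(self, topic: str) -> Iterator[object]: ...


class InMemoryBroker:
    """Stand-in broker backed by one FIFO queue per topic."""

    def __init__(self) -> None:
        self._topics = defaultdict(deque)

    def publish(self, topic: str, message: object) -> None:
        self._topics[topic].append(message)

    def consume(self, topic: str) -> Iterator[object]:
        queue = self._topics[topic]
        while queue:
            yield queue.popleft()


def run_stage(broker: Broker, source: str, sink: str,
              analyse: Callable[[object], object]) -> None:
    """Read every pending message from source, analyse it, write to sink."""
    for message in broker.consume(source):
        broker.publish(sink, analyse(message))


if __name__ == "__main__":
    broker = InMemoryBroker()
    for frame in ["aaaa", "bb", "cccccc"]:
        broker.publish("frames", frame)
    # Stage 1 extracts a feature; stage 2 classifies it.
    run_stage(broker, "frames", "features", len)
    run_stage(broker, "features", "alerts", lambda size: size > 3)
    print(list(broker.consume("alerts")))
```

Each stage depends only on the `Broker` interface, so swapping the in-memory stand-in for a client of a real broker would not require changes to the stages themselves.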

RDF stream processing with CQELS framework for real-time analysis

Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems - DEBS '15 ◽

10.1145/2675743.2772586 ◽

2015 ◽

Cited By ~ 1

Author(s):

Danh Le Phuoc ◽

Minh Dao-Tran ◽

Anh Le Tuan ◽

Manh Nguyen Duc ◽

Manfred Hauswirth

Keyword(s):

Real Time ◽

Stream Processing ◽

Time Analysis ◽

Real Time Analysis

Efficient Foreground Extraction From HEVC Compressed Video for Application to Real-Time Analysis of Surveillance ‘Big’ Data

IEEE Transactions on Image Processing ◽

10.1109/tip.2015.2445631 ◽

2015 ◽

Vol 24 (11) ◽

pp. 3574-3585 ◽

Cited By ~ 14

Author(s):

Bhaskar Dey ◽

Malay K. Kundu

Keyword(s):

Big Data ◽

Real Time ◽

Compressed Video ◽

Time Analysis ◽

Foreground Extraction ◽

Real Time Analysis

Research on Real-time Analysis and Hybrid Encryption of Big Data

2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD) ◽

10.1109/icaibd.2019.8836992 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yang Hui ◽

Li Zesong

Keyword(s):

Big Data ◽

Real Time ◽

Time Analysis ◽

Hybrid Encryption ◽

Real Time Analysis
