Monitoring Data Streams at Process Level in Scientific Big Data Batch Clusters

Author(s): Eileen Kuehn, Max Fischer, Christopher Jung, Andreas Petzold, Achim Streit


Entropy, 2021, Vol. 23 (7), p. 859
Author(s): Abdulaziz O. AlQabbany, Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue when learning from data streams, as it requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical, non-streaming machine learning applications. The Adaptive Random Forest (ARF), in contrast, is a stream-learning algorithm that has shown promising results in terms of accuracy and its ability to deal with various types of drift. The continuity of incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six synthetic data sets, each exhibiting a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yields considerable improvement in most situations.
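The Poisson approximation in the abstract comes from online bagging: each instance of a length-N bootstrap sample is drawn Binomial(N, 1/N) times, which tends to Poisson(1) as N grows, so each ensemble member can train on a streamed instance with an independent Poisson weight, and λ controls resampling intensity. A minimal sketch of that idea, not the paper's ARF implementation: the `MajorityClassLearner` base learner and the Knuth-method Poisson sampler are illustrative stand-ins.

```python
import math
import random

def poisson(lam, rng=random):
    """Draw from Poisson(lam) via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

class MajorityClassLearner:
    """Trivial base learner: predicts the most frequently seen label."""
    def __init__(self):
        self.counts = {}
    def learn(self, x, y, weight=1):
        self.counts[y] = self.counts.get(y, 0) + weight
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

def online_bagging_step(ensemble, x, y, lam=1.0):
    """One stream step: each member trains on (x, y) with weight k ~ Poisson(lam).

    lam = 1 approximates classical bootstrap resampling; larger values
    (ARF's default is Poisson(6)) intensify resampling at the cost of
    more training work per instance.
    """
    total_work = 0
    for member in ensemble:
        k = poisson(lam)
        if k > 0:
            member.learn(x, y, weight=k)
        total_work += k
    return total_work  # proportional to per-instance training cost
```

Tuning λ is exactly the accuracy-versus-execution-time trade-off that ρ balances: the expected training work per instance grows as λ times the ensemble size.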


2017, Vol. 21 (3), pp. 592-632
Author(s): Margaret M. Luciano, John E. Mathieu, Semin Park, Scott I. Tannenbaum

Many phenomena of interest to management and psychology scholars are dynamic and change over time. One of the primary impediments to the examination of dynamic phenomena has been challenges associated with collecting data at a sufficient frequency and duration to accurately model such changes. Emerging technologies that produce nearly continuous streams of big data offer great promise to address those challenges; however, they introduce new methodological challenges and construct validity concerns. We seek to integrate the emerging big data technologies into the existing repertoire of measurement techniques and advance an iterative process to enhance their measurement fit. First, we provide an overview of dynamic constructs and temporal frameworks, highlighting their measurement implications. Second, we discuss different data streams and feature emerging technologies that leverage big data as a means to index dynamic constructs. Third, we integrate the previous sections and advance an iterative approach to achieving measurement fit, highlighting factors that make some measurement choices more suitable and viable than others. In so doing, we hope to accelerate the advancement of dynamic theories and methods.


2018, Vol. 7 (3.1), p. 63
Author(s): R Revathy, R Aroul Canessane

Data are vital to decision making. If data have low veracity, decisions are unlikely to be sound. Internet of Things (IoT) big data is characterized by error, inconsistency, incompleteness, deception, and model approximation. Improving data veracity is critical to addressing these difficulties. In this article, we summarize the key characteristics and challenges of IoT that affect data processing and decision making. We review the landscape of measuring and improving data veracity and of mining uncertain data streams. We also propose five recommendations for the future development of veracious big IoT data analytics, relating to the heterogeneous and distributed nature of IoT data; autonomous decision making; context-aware and domain-optimized methodologies; data cleaning and processing techniques for IoT edge devices; and privacy-preserving, personalized, and secure data management.

