Big Data Clustering using Data Streams Approach

In the healthcare industry, the ability to monitor patients via biomedical signals assists healthcare professionals in detecting early signs of conditions such as blocked arteries and abnormal heart rhythms. Using data clustering, it is possible to interpret these signals to look for patterns that may indicate emerging or developing conditions. This can be accomplished by basing monitoring systems on a fast clustering algorithm that processes fast-paced streams of raw data effectively. This paper presents a clustering method, POD-Clus, which can be useful in computer-aided diagnosis. The proposed method clusters data streams in linear time and outperforms a competing algorithm in capturing changes of clusters in data streams.
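As a rough illustration of what single-pass stream clustering looks like in general, the Python sketch below uses sequential k-means, a textbook technique swapped in for illustration. It is not POD-Clus, whose internals are not described in this abstract. The function name `sequential_kmeans`, the choice of seeding centroids with the first k points, and the sample readings are all assumptions made for the example.

```python
import math


def sequential_kmeans(stream, k):
    """Cluster a stream in one pass, keeping only k centroids and k counts.

    Hypothetical helper for illustration; NOT the POD-Clus algorithm.
    Each point costs O(k * d) work, so total time grows linearly with
    the number of points seen.
    """
    centroids = []  # current centre of each cluster
    counts = []     # number of points absorbed by each cluster
    for point in stream:
        point = list(point)
        # Assumption: the first k points seed the k clusters.
        if len(centroids) < k:
            centroids.append(point)
            counts.append(1)
            continue
        # Assign the point to its nearest centroid (Euclidean distance).
        j = min(range(k), key=lambda i: math.dist(centroids[i], point))
        counts[j] += 1
        # Running-mean update: move the centroid 1/count of the way to the point.
        step = 1.0 / counts[j]
        centroids[j] = [c + step * (x - c) for c, x in zip(centroids[j], point)]
    return centroids, counts


# Made-up two-dimensional readings forming two well-separated groups.
readings = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.0),
            (10.0, 10.2), (0.1, 0.3), (9.8, 10.0)]
centroids, counts = sequential_kmeans(readings, k=2)
```

Because the sketch stores only the centroids and counts, memory use does not grow with the stream. It has no mechanism for detecting changes in cluster structure over time, which is the aspect the abstract highlights for POD-Clus.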

Linear Scheduling of Big Data Streams on Multiprocessor Sets in the Cloud

IEEE/WIC/ACM International Conference on Web Intelligence - WI '19 ◽

10.1145/3350546.3352507 ◽

2019 ◽

Cited By ~ 1

Author(s):

Nicoleta Tantalaki ◽

Stavros Souravlas ◽

Manos Roumeliotis ◽

Stefanos Katsavounis

Keyword(s):

Big Data ◽

Data Streams ◽

Linear Scheduling ◽

Big Data Streams

Ensembled Adaptive Fuzzy K-Means With Stochastic Extreme Gradient Boost Big Data Clustering on Geo-Social Networks

2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) ◽

10.1109/icacite51222.2021.9404574 ◽

2021 ◽

Author(s):

M. Anoop ◽

P. Sripriya

Keyword(s):

Social Networks ◽

Big Data ◽

Data Clustering ◽

Adaptive Fuzzy

Social influence determination on big data streams in an online social network

Multimedia Tools and Applications ◽

10.1007/s11042-017-4890-8 ◽

2017 ◽

Vol 76 (21) ◽

pp. 22133-22167

Author(s):

Kumaran P. ◽

Chitrakala S.

Keyword(s):

Big Data ◽

Social Network ◽

Social Influence ◽

Data Streams ◽

Online Social Network ◽

Big Data Streams

Boosting heapsort performance of processing Big Data streams

SoutheastCon 2016 ◽

10.1109/secon.2016.7506674 ◽

2016 ◽

Author(s):

Usamah Algemili ◽

Adi Alhudhaif

Keyword(s):

Big Data ◽

Data Streams ◽

Big Data Streams

Efficient prediction of concentrating solar power plant productivity using data clustering

Solar Energy ◽

10.1016/j.solener.2021.06.002 ◽

2021 ◽

Vol 224 ◽

pp. 730-741

Author(s):

Janna Martinek ◽

Michael J. Wagner

Keyword(s):

Power Plant ◽

Data Clustering ◽

Solar Power ◽

Plant Productivity ◽

Solar Power Plant ◽

Concentrating Solar Power ◽

Efficient Prediction ◽

Using Data

A nonparametric adaptive sampling strategy for online monitoring of big data streams

2017 13th IEEE Conference on Automation Science and Engineering (CASE) ◽

10.1109/coase.2017.8256208 ◽

2017 ◽

Author(s):

Xiaochen Xian ◽

Andi Wang ◽

Kaibo Liu

Keyword(s):

Big Data ◽

Data Streams ◽

Adaptive Sampling ◽

Online Monitoring ◽

Sampling Strategy ◽

Big Data Streams

Kernel-based low-rank feature extraction on a budget for big data streams

2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) ◽

10.1109/globalsip.2015.7418333 ◽

2015 ◽

Cited By ~ 3

Author(s):

Fatemeh Sheikholeslami ◽

Dimitris Berberidis ◽

Georgios B. Giannakis

Keyword(s):

Feature Extraction ◽

Big Data ◽

Data Streams ◽

Low Rank ◽

Big Data Streams

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

Entropy ◽

10.3390/e23070859 ◽

2021 ◽

Vol 23 (7) ◽

pp. 859

Author(s):

Abdulaziz O. AlQabbany ◽

Aqil M. Azmi

Keyword(s):

Big Data ◽

Random Forest ◽

Real Time ◽

Data Streams ◽

Learning Algorithm ◽

Concept Drift ◽

The United States ◽

Careful Consideration ◽

Data Sets ◽

Stream Data

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), by contrast, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. Because instances arrive continuously, their binomial distribution can be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
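The binomial-to-Poisson approximation mentioned in this abstract can be checked numerically. The Python sketch below compares Binomial(n, 1/n) with Poisson(1) as n grows, then draws Poisson(λ) resampling weights for incoming instances in the style of online bagging. The function names, the Knuth-style sampler, and the value λ = 6.0 are illustrative assumptions rather than details taken from the paper, and the measure ρ is not reproduced because the abstract does not define it.

```python
import math
import random


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n, k_max=10):
    """Largest gap between Binomial(n, 1/n) and Poisson(1) for k = 0..k_max."""
    return max(
        abs(binomial_pmf(k, n, 1.0 / n) - poisson_pmf(k, 1.0))
        for k in range(k_max + 1)
    )


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def resample_weights(n_instances, lam, seed=0):
    """Hypothetical helper: one Poisson(lam) weight per incoming instance.

    A weight w means a base learner would train on that instance w times;
    w = 0 means the learner skips it.
    """
    rng = random.Random(seed)
    return [poisson_draw(lam, rng) for _ in range(n_instances)]


# The approximation tightens as the number of instances n grows.
gap_small, gap_large = max_pmf_gap(10), max_pmf_gap(1000)

# lam = 6.0 is only an illustrative setting for this sketch.
weights = resample_weights(10_000, lam=6.0)
mean_weight = sum(weights) / len(weights)
```

A larger λ makes each learner train on each instance more often on average, which raises the training cost per instance. That is the accuracy versus execution-time trade-off the abstract says ρ is meant to capture.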

Big data clustering considering chaotic correlation dimension feature extraction

10.1109/icvris51417.2020.00130 ◽

2020 ◽

Author(s):

Shanshan Liu

Keyword(s):

Feature Extraction ◽

Big Data ◽

Data Clustering ◽

Correlation Dimension
