PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Xia;  Huang;  Li;  Zhou;  Zhang

doi:10.3390/s19153438

Sensors ◽

10.3390/s19153438 ◽

2019 ◽

Vol 19 (15) ◽

pp. 3438 ◽

Cited By ~ 3

Author(s):

Xia ◽

Huang ◽

Li ◽

Zhou ◽

Zhang

Keyword(s):

Remote Sensing ◽

Big Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Image Data ◽

Data Partitioning ◽

Data Mining Technique ◽

Mining Technique ◽

Hadoop Platform ◽

Parallel Clustering

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to RSBD problems, they are generally too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving a notable performance bottleneck of existing parallel clustering algorithms: they must perform numerous repeated calculations to reach a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.
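The abstract does not spell out the three steps, so the following is only a rough single-machine sketch of the general idea of subsample, cluster locally, then merge. It assumes plain k-means as the local clusterer and a second k-means pass over the local centroids as the merge step. The names `subsample_partitions` and `cluster_in_three_steps` are hypothetical, and the centroid filtering algorithm (CFA) is not reproduced because the abstract gives no details of it.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[j].append(p)
        # Keep the old centroid if a cluster received no points.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def subsample_partitions(points, n_parts, seed=0):
    """Assumed partitioning: shuffle, then deal points into n_parts subsamples."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]

def cluster_in_three_steps(points, k, n_parts=4, seed=0):
    # Step 1: split the data into random subsamples.
    parts = subsample_partitions(points, n_parts, seed)
    # Step 2: cluster each subsample independently (the map-like, parallelizable part).
    local = [c for part in parts for c in kmeans(part, k, seed=seed)]
    # Step 3: merge the local centroids into k global centroids (the reduce-like part).
    return kmeans(local, k, seed=seed)
```

Because each subsample is clustered independently, step 2 is the part that would map naturally onto parallel workers; only the small set of local centroids has to be gathered for the merge.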

Research of Remote Sensing Image Data Mining Technique Based on Web

2009 Asia-Pacific Conference on Information Processing ◽

10.1109/apcip.2009.82 ◽

2009 ◽

Author(s):

Du Ping ◽

He YueShun

Keyword(s):

Remote Sensing ◽

Data Mining ◽

Image Data ◽

Remote Sensing Image ◽

Data Mining Technique ◽

Mining Technique

Retrieving Information and Discovering Knowledge from Unstructured Data Using Big Data Mining Technique: Heavy Oil Fields Example

10.2523/17805-ms ◽

2014 ◽

Cited By ~ 1

Author(s):

Wenkuang Wu ◽

Xiaoguang Lu ◽

Ben Cox ◽

Guoqiang Li ◽

Lihua Lin ◽

...

Keyword(s):

Data Mining ◽

Big Data ◽

Heavy Oil ◽

Oil Fields ◽

Unstructured Data ◽

Data Mining Technique ◽

Big Data Mining ◽

Mining Technique

A Parallel Clustering Algorithm for Power Big Data Analysis

Communications in Computer and Information Science - Parallel Architecture, Algorithm and Programming ◽

10.1007/978-981-10-6442-5_51 ◽

2017 ◽

pp. 533-540

Author(s):

Xiangjun Meng ◽

Liang Chen ◽

Yidong Li

Keyword(s):

Big Data ◽

Data Analysis ◽

Clustering Algorithm ◽

Big Data Analysis ◽

Parallel Clustering

A survey on parallel clustering algorithms for Big Data

Artificial Intelligence Review ◽

10.1007/s10462-020-09918-2 ◽

2020 ◽

Author(s):

Zineb Dafir ◽

Yasmine Lamari ◽

Said Chah Slaoui

Keyword(s):

Big Data ◽

Clustering Algorithms ◽

Parallel Clustering

A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.2019040102 ◽

2019 ◽

Vol 9 (2) ◽

pp. 16-30 ◽

Cited By ~ 1

Author(s):

Hind Bangui ◽

Mouzhi Ge ◽

Barbora Buhnova

Keyword(s):

Big Data ◽

Internet Of Things ◽

Mobile Networks ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Future Internet ◽

Research Challenges ◽

Initial Stage ◽

Big Data Technologies

Due to the massive increase in data across Internet of Things (IoT) domains such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical analytics tools for analyzing IoT data. Among the Big Data technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage for different IoT domains, it is valuable to propose and structure the research challenges between Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to a broader IoT context such as IoT dynamics and IoT mobile networks. Finally, this article identifies a set of research challenges that yield a research roadmap for Big Data research in IoT domains. The proposed research roadmap aims at bridging the research gaps between Big Data and various IoT contexts.

The fast clustering algorithm for the big data based on K-means

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500538 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050053

Author(s):

Ting Xie ◽

Taiping Zhang

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Feature Space ◽

Data Sets ◽

Benchmark Data ◽

Clustering Model ◽

Alternating Direction ◽

Learning Technique ◽

Noise Data

As a powerful unsupervised learning technique, clustering is a fundamental task of big data analysis. However, many traditional clustering algorithms perform poorly on big data, which is typically high-dimensional, sparse, and noisy, in terms of both computational efficiency and clustering accuracy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Method of Multipliers (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space and prove the convergence of its iterative algorithm. Computationally, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to the state-of-the-art algorithm in big data clustering.

Research on Parallel DBSCAN Algorithm Design Based on MapReduce

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.301-303.1133 ◽

2011 ◽

Vol 301-303 ◽

pp. 1133-1138 ◽

Cited By ~ 17

Author(s):

Yan Xiang Fu ◽

Wei Zhong Zhao ◽

Hui Fang Ma

Keyword(s):

Data Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Algorithm Design ◽

Document Retrieval ◽

Commodity Hardware ◽

DBSCAN Clustering ◽

DBSCAN Algorithm ◽

Parallel Clustering

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation and pattern classification. The growing volume of information produced by technological progress makes clustering very large-scale data a challenging task. To deal with this problem, more researchers are trying to design efficient parallel clustering algorithms. In this paper, we propose a parallel DBSCAN clustering algorithm based on Hadoop, which is a simple yet powerful parallel programming platform. The experimental results demonstrate that the proposed algorithm can scale well and efficiently process large datasets on commodity hardware.

An improved ACS algorithm for data clustering

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v17.i3.pp1506-1515 ◽

2020 ◽

Vol 17 (3) ◽

pp. 1506

Author(s):

Ayad Mohammed Jabbar ◽

Ku Ruhana Ku-Mahamud ◽

Rafid Sagban

Keyword(s):

Data Clustering ◽

Foraging Behaviour ◽

Clustering Algorithms ◽

Data Mining Technique ◽

Mining Technique ◽

Algorithm Comparison ◽

Hidden Patterns ◽

Real World Datasets ◽

ACS Algorithm ◽

Modification Rate

Data clustering is a data mining technique that discovers hidden patterns by creating groups (clusters) of objects. Each object in every cluster exhibits sufficient similarity to its neighbourhood, whereas objects with insufficient similarity are found in other clusters. Data clustering techniques maximise intra-cluster similarity within each cluster and inter-cluster dissimilarity amongst different clusters. Ant colony optimisation for clustering (ACOC) is a swarm algorithm inspired by the foraging behaviour of ants. This algorithm minimises deterministic imperfections in which clustering is considered an optimisation problem. However, ACOC suffers from high diversification, in which the algorithm cannot search for the best solutions in the local neighbourhood. To improve the ACOC, this study proposes a modified ACOC, called M-ACOC, which has a modification rate parameter that controls the convergence of the algorithm. Comparison of the performance of several common clustering algorithms using real-world datasets shows that the accuracy of the proposed algorithm surpasses that of the other algorithms.

Visualizing Fuzzy Relationship in Bibliographic Big Data Using Hybrid Approach Combining Fuzzy c-Means and Newman-Girvan Algorithm

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2014.p0896 ◽

2014 ◽

Vol 18 (6) ◽

pp. 896-907 ◽

Cited By ~ 2

Author(s):

Maslina Zolkepli ◽

Fangyan Dong ◽

Kaoru Hirota

Keyword(s):

Big Data ◽

Biological Networks ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Hybrid Approach ◽

Citation Network ◽

The Public ◽

Big Data Visualization ◽

Similar Cluster ◽

Fuzzy Techniques

A bibliographic big data visualization method is proposed that combines fuzzy c-means clustering with the Newman-Girvan clustering algorithm; clustered results are displayed in a network view by grouping objects with similar cluster memberships. As current bibliographic visualizations focus on crisp relationships among data, fuzzy analysis and visualization may offer insights into bibliographic big data, enabling faster decision making by improving the precision of displayed information. The proposed method is applied to the DBLP citation network dataset. Results show that merging two clustering algorithms and visualizing with fuzzy techniques enables the user to converge on a few target papers within an average of 5 minutes from 1.5 million papers stored in the DBLP. Users targeted for the proposed method include researchers, educators, and students who hope to use real-world social and biological networks. The proposal is planned to be opened to the public through the Internet.
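To make "cluster memberships" concrete, here is a minimal sketch of the textbook fuzzy c-means membership rule, in which a point's degree of membership in each cluster depends on its relative distance to every cluster center. The function name `fcm_memberships` and the default fuzzifier `m=2.0` are assumptions for illustration; the abstract does not state the paper's own parameters or implementation.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    Standard rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the Euclidean distance from the point to center i
    and m > 1 is the fuzzifier (an assumed default here).
    """
    dists = [
        sum((x - c) ** 2 for x, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point sitting exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [d == 0.0 for d in dists]
        n = sum(hits)
        return [1.0 / n if h else 0.0 for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]

# A point at distance 1 from the first center and 2 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

With these inputs the memberships come out as 0.8 and 0.2, and they always sum to 1. Unlike a crisp assignment, the point keeps a partial link to the farther cluster, which is the kind of graded relationship a fuzzy visualization can display.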

Research on Execution of Civil Servants and Professional Ethics based on Data Mining Technique and Joint Modeling Analysis of Multiple Factors under Big Data Environment

International Journal of u- and e- Service Science and Technology ◽

10.14257/ijunesst.2016.9.5.23 ◽

2016 ◽

Vol 9 (5) ◽

pp. 257-270

Author(s):

Yang Du ◽

Hongwei Wang

Keyword(s):

Data Mining ◽

Big Data ◽

Professional Ethics ◽

Joint Modeling ◽

Civil Servants ◽

Data Mining Technique ◽

Multiple Factors ◽

Mining Technique ◽

Modeling Analysis ◽

Data Environment
