PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Sensors, 2019, Vol 19 (15), pp. 3438
Author(s): Xia, Huang, Li, Zhou, Zhang

Remote sensing big data (RSBD) is generally characterized by huge volume, diversity, and high dimensionality. Mining hidden information from RSBD for different applications poses significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets; when applied to RSBD, they are generally too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method that improves both the efficiency and the accuracy of RSBD clustering. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively removing a notable performance bottleneck of existing parallel clustering algorithms, namely that they must perform numerous repeated calculations to obtain a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms on larger image data; (2) achieved notable scalability as computing nodes were added; and (3) took much less time than the existing parallel clustering algorithm in handling RSBD.
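The abstract does not spell out how SubDP partitions the data or how CFA filters centroids. As a rough illustration of the general subsample-then-merge idea only, the sketch below runs k-means independently on several random subsamples (the step a MapReduce job could spread across mappers), pools the local centroids, and clusters that pool into k final centroids. The names `kmeans` and `subsample_cluster` are hypothetical, and the plain k-means merge step is an assumed stand-in: it is not PARSUC's method and performs no centroid filtering.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points: List[Point], k: int, iters: int, rng: random.Random) -> List[Point]:
    """Plain Lloyd's k-means with initial centroids drawn from the data."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def subsample_cluster(points: List[Point], k: int, n_subsamples: int,
                      sample_size: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Hypothetical subsample-then-merge clustering (not PARSUC's SubDP/CFA)."""
    rng = random.Random(seed)
    pooled: List[Point] = []
    # Local step: cluster each random subsample independently; in a
    # MapReduce setting these runs could execute in parallel mappers.
    for _ in range(n_subsamples):
        sample = rng.sample(points, sample_size)
        pooled.extend(kmeans(sample, k, iters, rng))
    # Global step: cluster the pooled local centroids into k final centroids.
    return kmeans(pooled, k, iters, rng)
```

Each local run touches only `sample_size` points, so it is cheap; the trade-off is that a small subsample can miss or distort a cluster, which is the kind of subsampling error the abstract says CFA is meant to eliminate.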

Author(s): Hind Bangui, Mouzhi Ge, Barbora Buhnova

Due to the massive growth of data in different Internet of Things (IoT) domains, such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical tools for analyzing IoT data. Among these technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technology is still at an early stage in many IoT domains, it is valuable to identify and structure the research challenges at the intersection of Big Data and IoT. Therefore, this article starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to broader IoT contexts such as IoT dynamics and IoT mobile networks. Finally, this article identifies a set of research challenges that form a research roadmap for Big Data research in IoT domains. The proposed roadmap aims to bridge the research gaps between Big Data and various IoT contexts.


Author(s): Ting Xie, Taiping Zhang

As a powerful unsupervised learning technique, clustering is a fundamental task in big data analysis. However, big data is typically high-dimensional, sparse, and noisy, and many traditional clustering algorithms perform poorly on it in terms of both computational efficiency and clustering accuracy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Method of Multipliers (ADMM). We show the equivalence of the Feature K-means model in the original space and in the feature space, and we prove the convergence of its iterative algorithm. Experimentally, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark datasets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to state-of-the-art algorithms for big data clustering.
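The abstract names ADMM but does not state how Feature K-means splits its objective. For background only, the standard scaled-form ADMM iteration for a problem $\min_{x,z} f(x) + g(z)$ subject to $Ax + Bz = c$ is shown below, with penalty parameter $\rho > 0$ and scaled dual variable $u$; mapping the Feature K-means model onto $f$, $g$, $A$, $B$, and $c$ is not given here, so this is not the paper's algorithm.

```latex
\begin{aligned}
x^{k+1} &= \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\,\bigl\lVert Ax + Bz^{k} - c + u^{k} \bigr\rVert_2^2 \\
z^{k+1} &= \arg\min_{z}\; g(z) + \tfrac{\rho}{2}\,\bigl\lVert Ax^{k+1} + Bz - c + u^{k} \bigr\rVert_2^2 \\
u^{k+1} &= u^{k} + Ax^{k+1} + Bz^{k+1} - c
\end{aligned}
```

The appeal of this splitting is that each of the $x$ and $z$ subproblems involves only one of the two terms of the objective, which is often what makes an ADMM-based solver fast.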

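Spherical K-means, one of the baselines named in the abstract, is k-means on unit-normalized vectors with cosine similarity: each point joins the centroid with the largest dot product, and each centroid becomes the normalized sum of its members. A minimal pure-Python sketch under those assumptions follows; the function names are hypothetical and this is neither the paper's code nor its Feature K-means model.

```python
import math
import random
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(data: Sequence[Sequence[float]], k: int, iters: int = 20,
                     seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    xs = [normalize(v) for v in data]  # project every point onto the unit sphere
    centroids = [list(c) for c in rng.sample(xs, k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Cosine similarity reduces to a dot product for unit vectors.
        labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in xs]
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                # New centroid: normalized sum (mean direction) of its members.
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids
```

Because only directions matter, vectors that differ in magnitude but point the same way land in the same cluster, which is why this variant is popular for sparse, high-dimensional data such as text.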

2011, Vol 301-303, pp. 1133-1138
Author(s): Yan Xiang Fu, Wei Zhong Zhao, Hui Fang Ma

Data clustering has received considerable attention in many applications, such as data mining, document retrieval, image segmentation, and pattern classification. The growing volume of information produced by technological progress makes clustering very large datasets a challenging task. To deal with this problem, a growing number of researchers are designing efficient parallel clustering algorithms. In this paper, we propose a parallel DBSCAN clustering algorithm based on Hadoop, a simple yet powerful parallel programming platform. The experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets on commodity hardware.
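The abstract does not describe how the data are partitioned across Hadoop nodes or how partial clusters are merged. For reference, the sequential DBSCAN procedure that such parallel versions distribute is sketched below; `eps` and `min_pts` are the standard DBSCAN parameters, the brute-force O(n²) neighbour search is a simplification, and the function names are illustrative rather than taken from the paper.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
    # Every point has been visited by now; this only narrows the type.
    return [NOISE if lab is None else lab for lab in labels]
```

The expensive part is `region_query`, which is why parallel DBSCAN variants typically split space into partitions that can be processed independently and then reconcile clusters that cross partition boundaries.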


Author(s): Ayad Mohammed Jabbar, Ku Ruhana Ku-Mahamud, Rafid Sagban

Data clustering is a data mining technique that discovers hidden patterns by creating groups (clusters) of objects. Each object in a cluster is sufficiently similar to its neighbours, whereas objects with insufficient similarity are placed in other clusters. Data clustering techniques maximise intra-cluster similarity within each cluster and inter-cluster dissimilarity between different clusters. Ant colony optimisation for clustering (ACOC) is a swarm algorithm inspired by the foraging behaviour of ants. It treats clustering as an optimisation problem and mitigates the imperfections of deterministic approaches. However, ACOC suffers from excessive diversification, so the algorithm fails to search for the best solutions in the local neighbourhood. To improve ACOC, this study proposes a modified ACOC, called M-ACOC, which introduces a modification rate parameter that controls the convergence of the algorithm. A comparison with several common clustering algorithms on real-world datasets shows that the proposed algorithm achieves higher accuracy than the other algorithms.
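The abstract does not give ACOC's update rules or define M-ACOC's modification rate. As a hedged illustration of the general ant-colony approach to clustering, the sketch below assumes a common formulation: each ant builds a complete assignment of objects to clusters by sampling from a pheromone matrix, solutions are scored by the within-cluster sum of squared distances, and pheromone evaporates at rate `rho` before being reinforced along the best solution found so far. `rho`, `n_ants`, and the reinforcement amount are illustrative assumptions, and there is no modification-rate parameter here.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sse(points: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Within-cluster sum of squared distances to each cluster's mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - mu) ** 2 for x, mu in zip(p, center)) for p in members)
    return total


def aco_cluster(points: Sequence[Sequence[float]], k: int, n_ants: int = 10,
                iters: int = 50, rho: float = 0.1,
                seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(points)
    # tau[i][c]: pheromone favouring the assignment of object i to cluster c.
    tau = [[1.0] * k for _ in range(n)]
    best_labels: Optional[List[int]] = None
    best_cost = float("inf")
    for _ in range(iters):
        for _ in range(n_ants):
            # Each ant assigns every object to a cluster with probability
            # proportional to the pheromone on that (object, cluster) pair.
            labels = [rng.choices(range(k), weights=tau[i])[0] for i in range(n)]
            cost = sse(points, labels, k)
            if cost < best_cost:
                best_labels, best_cost = labels, cost
        assert best_labels is not None  # holds whenever n_ants >= 1
        # Evaporate all trails, then reinforce the best solution found so far.
        for i in range(n):
            for c in range(k):
                tau[i][c] *= 1.0 - rho
            tau[i][best_labels[i]] += 1.0 / (1.0 + best_cost)
    assert best_labels is not None
    return best_labels, best_cost
```

The balance between evaporation (exploration) and reinforcement of the best solution (exploitation) is exactly the diversification trade-off the abstract refers to; a parameter such as M-ACOC's modification rate would act on that balance, but its exact form is not given here.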


Author(s): Maslina Zolkepli, [name missing], Fangyan Dong, Kaoru Hirota

A bibliographic big data visualization method is proposed that combines fuzzy c-means clustering with the Newman-Girvan clustering algorithm; the clustered results are displayed in a network view that groups objects with similar cluster memberships. Because current bibliographic visualizations focus on crisp relationships among data, fuzzy analysis and visualization may offer new insights into bibliographic big data, enabling faster decision making by improving the precision of the displayed information. The proposed method is applied to the DBLP citation network dataset. Results show that merging the two clustering algorithms and visualizing the results with fuzzy techniques enables users to narrow down to a few target papers, out of the 1.5 million papers stored in DBLP, in an average of 5 minutes. The intended users of the proposed method include researchers, educators, and students who wish to work with real-world social and biological networks. The method is planned to be made publicly available on the Internet.
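For readers unfamiliar with the first component, textbook fuzzy c-means (FCM) alternates between a center update, where each center is the mean of all points weighted by $u_{ij}^m$, and a membership update $u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}$ with fuzzifier $m > 1$. The pure-Python sketch below implements only those standard updates; it does not include the Newman-Girvan step or the paper's visualization, and the defaults (m = 2, 100 iterations) are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def fuzzy_cmeans(points: Sequence[Sequence[float]], c: int, m: float = 2.0,
                 iters: int = 100, seed: int = 0,
                 eps: float = 1e-12) -> Tuple[List[List[float]], List[List[float]]]:
    """Textbook fuzzy c-means; returns (memberships u[i][j], centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u: List[List[float]] = []
    for _ in range(n):
        row = [rng.random() + eps for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [[0.0] * dim for _ in range(c)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of all points weighted by u_ij^m.
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers[j] = [sum(w[i] * points[i][d] for i in range(n)) / tw
                          for d in range(dim)]
        # Membership update; eps avoids division by zero when a point sits on a center.
        for i in range(n):
            dists = [max(eps, sum((a - b) ** 2 for a, b in zip(points[i], centers[j])) ** 0.5)
                     for j in range(c)]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dists[q]) ** exponent for q in range(c))
    return u, centers
```

Unlike crisp k-means labels, each row of `u` is a distribution over clusters, which is the kind of graded cluster membership the abstract's network view groups objects by.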

