An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

Discrete Dynamics in Nature and Society ◽

10.1155/2015/793010 ◽

2015 ◽

Vol 2015 ◽

pp. 1-18 ◽

Cited By ~ 7

Author(s):

Dawen Xia ◽

Binfeng Wang ◽

Yantao Li ◽

Zhuobo Rong ◽

Zili Zhang

Keyword(s):

Intelligent Transportation Systems ◽

Large Scale ◽

Clustering Algorithm ◽

Transportation Systems ◽

Division Problem ◽

Data Sets ◽

Trajectory Data ◽

Computing Platform ◽

Distributed Computing Platform ◽

Parallel Clustering

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM) algorithm for solving the traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide the traffic subareas of Beijing based on real-world trajectory data sets generated by 12,000 taxis over a period of one month using the proposed approach. Experimental evaluation results indicate that, compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, higher accuracy, and better scalability and can effectively divide traffic subareas with big taxi trajectory data.
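
The abstract above describes redesigning K-Means around a MapReduce paradigm. As a rough illustration of that general pattern only, one iteration can be split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). This is a minimal single-process sketch, not the Par3PKM algorithm: it uses plain Euclidean distance and fixed starting centroids, and the function names and sample points are hypothetical.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid that received no points keeps its previous position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_mapreduce(points, centroids, iterations=10):
    """Repeat map + reduce until the centroids stop moving."""
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical toy data: two well-separated groups of 2D points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = kmeans_mapreduce(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # [(0.0, 1.0), (10.0, 11.0)]
```

On a real Hadoop cluster the map and reduce steps would run as distributed tasks over partitions of the data; here they are ordinary Python functions so the data flow is easy to follow.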

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

PLoS ONE ◽

10.1371/journal.pone.0091315 ◽

2014 ◽

Vol 9 (4) ◽

pp. e91315 ◽

Cited By ~ 13

Author(s):

Minchao Wang ◽

Wu Zhang ◽

Wang Ding ◽

Dongbo Dai ◽

Huiran Zhang ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Biological Data ◽

Data Sets ◽

Parallel Clustering

A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Complexity ◽

10.1155/2018/2818251 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 7

Author(s):

Dawen Xia ◽

Xiaonan Lu ◽

Huaqing Li ◽

Wendong Wang ◽

Yantao Li ◽

...

Keyword(s):

Big Data ◽

Association Analysis ◽

Intelligent Transportation Systems ◽

Large Scale ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Transportation Systems ◽

Frequent Pattern ◽

Trajectory Data ◽

Pattern Growth

Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome the inherent defects of Hadoop in coping with taxi trajectory big data that includes massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm to analyze the spatiotemporal characteristics of taxi operations using large-scale taxi trajectories with massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, that is, Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and then implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operations in both spatial and temporal dimensions by MR-PFP in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
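
FP-growth implementations generally begin by counting item support and reordering each transaction by descending support before the tree is built. The sketch below shows only that first counting pass in a map/reduce style. It is not the MR-PFP algorithm: the small-file handling and tree construction are omitted, and the zone and time labels are made-up placeholders.

```python
from collections import Counter


def map_count(transactions):
    """Map: emit (item, 1) once for each transaction that contains the item."""
    for t in transactions:
        for item in set(t):
            yield item, 1


def reduce_count(pairs):
    """Reduce: sum the emitted counts per item to obtain its support."""
    support = Counter()
    for item, n in pairs:
        support[item] += n
    return support


def frequent_ordered(transactions, min_support):
    """Drop infrequent items and sort each transaction by descending support."""
    support = reduce_count(map_count(transactions))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    ordered = [
        sorted((i for i in set(t) if i in frequent),
               key=lambda i: (-frequent[i], i))  # ties broken alphabetically
        for t in transactions
    ]
    return ordered, frequent


# Hypothetical records: each transaction mixes a pickup zone and a time slot.
transactions = [
    ["zoneA", "peak"],
    ["zoneA", "offpeak"],
    ["zoneA", "peak", "zoneB"],
]
ordered, frequent = frequent_ordered(transactions, min_support=2)
print(frequent)  # {'zoneA': 3, 'peak': 2}
print(ordered)   # [['zoneA', 'peak'], ['zoneA'], ['zoneA', 'peak']]
```

The reordered transactions are what a subsequent FP-tree construction step would consume; that step is left out here.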

GNSS RUMS: GNSS Realistic Urban Multiagent Simulator for Collaborative Positioning Research

Remote Sensing ◽

10.3390/rs13040544 ◽

2021 ◽

Vol 13 (4) ◽

pp. 544

Author(s):

Guohao Zhang ◽

Bing Xu ◽

Hoi-Fung Ng ◽

Li-Ta Hsu

Keyword(s):

Urban Area ◽

Intelligent Transportation Systems ◽

Urban Areas ◽

Large Scale ◽

Transportation Systems ◽

Accurate Localization ◽

Gnss Receivers ◽

Multipath Signal ◽

Scale Test ◽

Error Behavior

Accurate localization of road agents (GNSS receivers) is the basis of intelligent transportation systems, but it is still difficult to achieve with GNSS positioning in urban areas due to signal interference from buildings. Various collaborative positioning techniques were recently developed to improve positioning performance with aid from neighboring agents. However, it is still challenging to study their performance comprehensively. The GNSS measurement error behavior is complicated in urban areas and cannot be represented by naive models. On the other hand, real experiments requiring many devices are difficult to conduct, especially for a large-scale test. Therefore, a GNSS realistic urban measurement simulator is developed to provide measurements for collaborative positioning studies. The proposed simulator employs a ray-tracing technique to search for all possible interference in the urban area. Then, it categorizes the signals into direct, reflected, diffracted, and multipath signals to simulate the pseudorange, C/N0, and Doppler shift measurements correspondingly. The performance of the proposed simulator is validated through real experimental comparisons with different scenarios based on commercial-grade receivers. The proposed simulator is also applied with different positioning algorithms, which verifies that it is sophisticated enough for collaborative positioning studies in the urban area.
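
To make the idea of a reflected signal concrete, the extra path length of a single specular reflection off a flat wall can be computed with the classic image method: mirror the receiver across the wall plane and measure the straight-line distance from the mirrored point to the transmitter. This is a toy geometric sketch under strong assumptions (one infinite vertical wall at x = wall_x, no diffraction, no visibility checks, small made-up coordinates) and is not the simulator described in the abstract.

```python
from math import dist


def mirror_across_wall(point, wall_x):
    """Mirror a 3D point across the vertical plane x = wall_x."""
    x, y, z = point
    return (2 * wall_x - x, y, z)


def reflection_extra_path(receiver, transmitter, wall_x):
    """Extra path length of the single-bounce reflection versus the direct path."""
    direct = dist(receiver, transmitter)
    reflected = dist(mirror_across_wall(receiver, wall_x), transmitter)
    return reflected - direct


# Toy geometry: receiver at the origin, transmitter 8 units above it,
# wall 3 units to the side (all values hypothetical).
extra = reflection_extra_path((0.0, 0.0, 0.0), (0.0, 0.0, 8.0), wall_x=3.0)
print(extra)  # 2.0
```

In a pseudorange model, an extra path of this kind would appear as a positive range error on a reflected signal; how the actual simulator combines such terms is not stated in the abstract.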

Self-Adaptive K-Means Based on a Covering Algorithm

Complexity ◽

10.1155/2018/7698274 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yiwen Zhang ◽

Yuanyuan Zhou ◽

Xing Guo ◽

Jintao Wu ◽

Qiang He ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Real Data ◽

Second Phase ◽

Data Sets ◽

Number Of Clusters ◽

Large Scale Data ◽

Long Time ◽

Two Phases ◽

Selection Of

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of the existing algorithms under both sequential and parallel conditions.
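
The two-phase structure described above (an initialization that discovers k, followed by Lloyd iterations) can be sketched as follows. The abstract does not specify how the covering algorithm works, so the first phase here is a simple greedy radius-based stand-in chosen purely for illustration; the `radius` parameter and the sample points are assumptions, and this is not the C-K-means algorithm itself.

```python
from math import dist


def covering_init(points, radius):
    """Stand-in for phase one: greedily pick centers so that every point
    lies within `radius` of some center; k is the number of centers found."""
    centers = []
    for p in points:
        if all(dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers


def lloyd(points, centers, iterations=20):
    """Phase two: standard Lloyd iteration starting from the given centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


# Hypothetical toy data with two obvious groups; k is not given in advance.
points = [(0.0, 0.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.0)]
initial = covering_init(points, radius=3.0)
final = lloyd(points, initial)
print(len(initial))  # 2
print(final)         # [(0.5, 0.0), (8.5, 8.0)]
```

The stand-in still needs a radius, so it only moves the tuning burden from k to a distance scale; the abstract suggests the actual covering algorithm derives this from similarities in the data.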

Deep Autoencoder Neural Networks for Short-Term Traffic Congestion Prediction of Transportation Networks

Sensors ◽

10.3390/s19102229 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2229 ◽

Cited By ~ 12

Author(s):

Sen Zhang ◽

Yong Yao ◽

Jie Hu ◽

Yong Zhao ◽

Shaobo Li ◽

...

Keyword(s):

Neural Network ◽

Intelligent Transportation Systems ◽

Traffic Congestion ◽

Large Scale ◽

Transportation Networks ◽

Transportation Network ◽

Transportation Systems ◽

Neural Network Models ◽

Computation Efficiency ◽

Congestion Prediction

Traffic congestion prediction is critical for implementing intelligent transportation systems that improve the efficiency and capacity of transportation networks. However, despite its importance, traffic congestion prediction is far less investigated than traffic flow prediction, which is partially due to the severe lack of large-scale high-quality traffic congestion data and advanced algorithms. This paper proposes an accessible and general workflow to acquire large-scale traffic congestion data and to create traffic congestion datasets based on image analysis. With this workflow we create a dataset named Seattle Area Traffic Congestion Status (SATCS) based on traffic congestion map snapshots from a publicly available online traffic service provider, the Washington State Department of Transportation. We then propose a deep autoencoder-based neural network model with symmetrical layers for the encoder and the decoder to learn temporal correlations of a transportation network and predict traffic congestion. Our experimental results on the SATCS dataset show that the proposed DCPN model can efficiently and effectively learn temporal relationships of congestion levels of the transportation network for traffic congestion forecasting. Our method outperforms two other state-of-the-art neural network models in prediction performance, generalization capability, and computation efficiency.

Blockchain for the Internet of Vehicles: A Decentralized IoT Solution for Vehicles Communication Using Ethereum

Sensors ◽

10.3390/s20143928 ◽

2020 ◽

Vol 20 (14) ◽

pp. 3928 ◽

Cited By ~ 1

Author(s):

Rateb Jabbar ◽

Mohamed Kharbeche ◽

Khalifa Al-Khalifa ◽

Moez Krichen ◽

Kamel Barkaoui

Keyword(s):

Intelligent Transportation Systems ◽

Secure Communication ◽

Data Exchange ◽

Smart Cities ◽

Transportation Systems ◽

Smart Devices ◽

The Internet ◽

Internet Of Vehicles ◽

Computing Platform ◽

Blockchain Technology

The concept of smart cities has become prominent in modern metropolises due to the emergence of embedded and connected smart devices, systems, and technologies. They have enabled the connection of every “thing” to the Internet. Therefore, in the upcoming era of the Internet of Things, the Internet of Vehicles (IoV) will play a crucial role in newly developed smart cities. The IoV has the potential to solve various traffic and road safety problems effectively in order to prevent fatal crashes. However, a particular challenge in the IoV, especially in Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communications, is to ensure fast, secure transmission and accurate recording of the data. To overcome these challenges, this work adapts Blockchain technology for real-time application (RTA) to solve Vehicle-to-Everything (V2X) communication problems. The main novelty of this paper is the development of a Blockchain-based IoT system that establishes secure communication and creates an entirely decentralized cloud computing platform. Moreover, the authors qualitatively tested the performance and resilience of the proposed system against common security attacks. Computational tests showed that the proposed solution solved the main challenges of V2X communications, such as security, centralization, and lack of privacy. In addition, it guaranteed an easy data exchange between different actors of intelligent transportation systems.

An efficient trajectory-clustering algorithm based on an index tree

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331211423284 ◽

2011 ◽

Vol 34 (7) ◽

pp. 850-861 ◽

Cited By ~ 15

Author(s):

Guan Yuan ◽

Shixiong Xia ◽

Lei Zhang ◽

Yong Zhou ◽

Cheng Ji

Keyword(s):

Radio Frequency Identification ◽

Clustering Algorithm ◽

Real Data ◽

Structural Similarity ◽

Location Based Services ◽

Similarity Function ◽

Data Sets ◽

Trajectory Clustering ◽

Trajectory Data ◽

Index Tree

With the development of location-based services, such as the Global Positioning System and Radio Frequency Identification, a great deal of trajectory data can be collected. Therefore, how to mine knowledge from these data has become an attractive topic. In this paper, we propose an efficient trajectory-clustering algorithm based on an index tree. Firstly, an index tree is proposed to store trajectories and their similarity matrix, with which trajectories can be retrieved efficiently; secondly, a new concept of trajectory structure is introduced to analyse both the internal and external features of trajectories; then, trajectories are partitioned into trajectory segments according to their corners; furthermore, the similarity between every pair of trajectory segments is computed using the structural similarity function; finally, trajectory segments are grouped into different clusters according to their location in the different levels of the index tree. Experimental results on real data sets demonstrate not only the efficiency and effectiveness of our algorithm, but also its flexibility, since feature sensitivity can be adjusted through different parameters, and show that the cluster results are more practically significant.
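
One step in the abstract above, partitioning trajectories into segments at their corners, can be illustrated with a simple turning-angle rule: split wherever the heading changes by at least a threshold. The abstract does not say how corners are detected, so the threshold, the function names, and the sample trajectory below are assumptions made for this sketch.

```python
from math import atan2, degrees


def turning_angle(a, b, c):
    """Absolute heading change in degrees at point b on the path a -> b -> c."""
    h1 = atan2(b[1] - a[1], b[0] - a[0])
    h2 = atan2(c[1] - b[1], c[0] - b[0])
    diff = abs(degrees(h2 - h1)) % 360
    return min(diff, 360 - diff)


def split_at_corners(trajectory, threshold_deg=45.0):
    """Cut the trajectory at every interior point whose turning angle
    reaches the threshold; the corner point is shared by both segments."""
    if len(trajectory) < 2:
        return [list(trajectory)]
    segments, current = [], [trajectory[0]]
    for i in range(1, len(trajectory) - 1):
        current.append(trajectory[i])
        angle = turning_angle(trajectory[i - 1], trajectory[i], trajectory[i + 1])
        if angle >= threshold_deg:
            segments.append(current)
            current = [trajectory[i]]
    current.append(trajectory[-1])
    segments.append(current)
    return segments


# Hypothetical L-shaped trajectory: east for two steps, then north for two.
trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
segments = split_at_corners(trajectory)
print(segments)  # [[(0, 0), (1, 0), (2, 0)], [(2, 0), (2, 1), (2, 2)]]
```

Lowering the threshold makes the partition more sensitive to small heading changes, which is one plausible reading of the adjustable feature sensitivity mentioned in the abstract.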

dSimpleGraph: A Novel Distributed Clustering Algorithm for Exploring Very Large Scale Unknown Data Sets

2010 IEEE International Conference on Data Mining Workshops ◽

10.1109/icdmw.2010.12 ◽

2010 ◽

Cited By ~ 3

Author(s):

Li Lu ◽

Yunhong Gu ◽

Robert Grossman

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Data Sets ◽

Distributed Clustering

Efficient and Reliable Cluster-Based Data Transmission for Vehicular Ad Hoc Networks

Mobile Information Systems ◽

10.1155/2018/9826782 ◽

2018 ◽

Vol 2018 ◽

pp. 1-15 ◽

Cited By ~ 4

Author(s):

Xiang Ji ◽

Huiqun Yu ◽

Guisheng Fan ◽

Huaiying Sun ◽

Liqiong Chen

Keyword(s):

Routing Protocol ◽

Data Transmission ◽

Intelligent Transportation Systems ◽

Clustering Algorithm ◽

Vehicular Ad Hoc Networks ◽

Ad Hoc ◽

Cluster Formation ◽

Cluster Head ◽

Sampling Strategy ◽

Transportation Systems

The vehicular ad hoc network (VANET) is an emerging technology for future intelligent transportation systems (ITSs). Current research focuses intensely on the problems of routing protocol reliability and scalability across urban VANETs. Vehicle clustering has been shown to be a promising approach to improve routing reliability and scalability by grouping vehicles together to serve as the foundation for ITS applications. However, some prominent characteristics, like high mobility and uneven spatial distribution of vehicles, may affect the clustering performance. Therefore, how to establish and maintain stable clusters has become a challenging problem in VANETs. This paper proposes a link reliability-based clustering algorithm (LRCA) to provide efficient and reliable data transmission in VANETs. Before clustering, a novel link lifetime-based (LLT-based) neighbor sampling strategy is put forward to filter out the redundant unstable neighbors. The proposed clustering scheme mainly consists of three parts: cluster head selection, cluster formation, and cluster maintenance. Furthermore, we propose a routing protocol of LRCA to serve the infotainment applications in VANETs. To make appropriate routing decisions, we nominate special nodes at intersections to evaluate the network condition by assigning weights to the road segments. Routes with the lowest weights are then selected as the optimal data forwarding paths. We evaluate the clustering stability and routing performance of the proposed approach by comparing it with some existing schemes. The extensive simulation results show that our approach outperforms them in both cluster stability and data transmission.
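
The routing idea in the abstract above, choosing the route with the lowest weight over weighted road segments, resembles a shortest-path search. The sketch below uses Dijkstra's algorithm over a small made-up intersection graph and assumes that a route's weight is the sum of its segment weights. The abstract does not state how weights are computed or combined, so both the graph and that summation rule are assumptions, and this is not the LRCA routing protocol.

```python
import heapq


def lowest_weight_route(graph, source, target):
    """Dijkstra's algorithm: return (path, total weight), or (None, inf)
    when the target cannot be reached from the source."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical intersections A-D; each value is a road-segment weight.
roads = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}
route, total = lowest_weight_route(roads, "A", "D")
print(route, total)  # ['A', 'B', 'C', 'D'] 4.0
```

Dijkstra's algorithm requires non-negative weights, which fits a setting where each weight encodes a cost such as poor connectivity on a road segment.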

An Efficient Parallel Clustering Algorithm for Large Scale Database

Journal of Software ◽

10.4304/jsw.4.10.1119-1126 ◽

2009 ◽

Vol 4 (10) ◽

Author(s):

Jianfeng Yang ◽

Puliu Yan ◽

Yinbo Xie ◽

Qing Geng ◽

Jolly Wang ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Parallel Clustering
