Ship-handling behavior pattern recognition using AIS sub-trajectory clustering analysis based on the T-SNE and spectral clustering algorithms

2020 ◽  
Vol 205 ◽  
pp. 106919 ◽  
Author(s):  
Miao Gao ◽  
Guo-You Shi
2021 ◽  
Vol 11 (8) ◽  
pp. 3693
Author(s):  
Alberto Blazquez-Herranz ◽  
Juan-Ignacio Caballero-Garzon ◽  
Albert Zilverberg ◽  
Christian Wolff ◽  
Alejandro Rodríguez-Gonzalez ◽  
...  

Mobile devices equipped with sensors generate large amounts of geospatial data that, properly analyzed, can power future applications. In particular, the ability to identify similar trajectories is crucial for analyzing events at points common to those trajectories. CROSS-CPP is a European project whose main aim is to provide tools to store data in a data market and a toolbox to analyze those data. As part of these analytic tools, a set of functionalities has been developed to cluster trajectories. Building on previous work on clustering algorithms, we present in this paper an adaptation of the QuickBundles algorithm to trajectory clustering. Experiments using different distance measures show that QuickBundles outperforms spectral clustering, with the WGS84 geodesic distance providing the best results.
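The abstract does not spell out the QuickBundles adaptation, so as a rough illustration of the idea, here is a minimal single-pass QuickBundles-style sketch for trajectories resampled to equal length. A haversine great-circle distance stands in for the WGS84 geodesic, and the function names and threshold are illustrative assumptions, not the project's actual code.

```python
import math

def haversine_km(p, q):
    # Great-circle distance between two (lat, lon) points in km;
    # an approximation of the WGS84 geodesic used in the paper.
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def traj_dist(t1, t2):
    # Mean pointwise distance between two equal-length trajectories.
    return sum(haversine_km(p, q) for p, q in zip(t1, t2)) / len(t1)

def quickbundles(trajectories, threshold_km):
    # Single pass: assign each trajectory to the nearest cluster
    # centroid if within the threshold, otherwise start a new cluster.
    clusters = []  # each: {"centroid": [[lat, lon], ...], "members": [...]}
    for idx, t in enumerate(trajectories):
        best, best_d = None, threshold_km
        for c in clusters:
            d = traj_dist(t, c["centroid"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append({"centroid": [list(p) for p in t],
                             "members": [idx]})
        else:
            n = len(best["members"])
            # Incrementally update the running-mean centroid.
            for cp, p in zip(best["centroid"], t):
                cp[0] = (cp[0] * n + p[0]) / (n + 1)
                cp[1] = (cp[1] * n + p[1]) / (n + 1)
            best["members"].append(idx)
    return clusters
```

The single pass over the data is what gives QuickBundles its speed advantage over spectral clustering, which needs the full pairwise distance matrix.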


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using a modified spectral clustering (SC) method. The similarity measure plays a pivotal role in clustering industrial data for accurate churn prediction; here, the linear Euclidean distance of traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several properties of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial, and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also applied to the above-mentioned 15 databases. The empirical results show that the proposed clustering algorithm outperforms the three existing algorithms in terms of Jaccard index, f-score, recall, precision, and accuracy. Finally, the significance of the clustering results is tested using the Wilcoxon signed-rank test, the Wilcoxon rank-sum test, and the sign test. The comparative study shows that the proposed algorithm performs notably well, especially on clusters of arbitrary shape.
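The core move here, swapping the Euclidean distance inside spectral clustering for a non-linear distance, can be sketched generically: compute a distance matrix with any custom metric, turn it into a Gaussian affinity, and hand it to a spectral clustering implementation that accepts precomputed affinities. The `custom_distance` below is a hypothetical placeholder, not the paper's actual S-distance.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def custom_distance(x, y):
    # Placeholder for a non-linear distance such as Sd; the paper's
    # exact S-divergence-based formula is not reproduced here.
    return np.sqrt(np.sum((np.log1p(np.abs(x)) - np.log1p(np.abs(y))) ** 2))

def spectral_with_custom_distance(X, n_clusters, sigma=1.0):
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = custom_distance(X[i], X[j])
    # Gaussian affinity built from the custom distance matrix.
    A = np.exp(-D ** 2 / (2 * sigma ** 2))
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=0)
    return sc.fit_predict(A)
```

Because only the distance function changes, the rest of the SC pipeline (Laplacian, eigendecomposition, final k-means step) stays untouched, which is what makes this kind of modification cheap to experiment with.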


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering is a clustering algorithm based on spectral graph theory. Because it has a deep theoretical foundation and handles non-convex distributions well, it has received much attention in the machine learning and data mining communities. The algorithm is easy to implement and outperforms traditional clustering algorithms such as k-means. This paper aims to provide some intuition about spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, the clustering steps, and related topics. Finally, to address the shortcomings of spectral clustering, some improvements are briefly introduced.
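The clustering steps mentioned above (similarity graph, graph Laplacian, eigenvector embedding) can be sketched for the two-cluster case, where the sign of the Fiedler vector of the normalized Laplacian directly yields the partition. This is a minimal illustration of the standard recipe, with a Gaussian-kernel graph as an assumed choice.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    # 1. Fully connected similarity graph with a Gaussian kernel.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Normalized graph Laplacian  L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # 3. The eigenvector of the second-smallest eigenvalue (the
    #    Fiedler vector) embeds the points; its sign splits them
    #    into two clusters.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

For k > 2 clusters, the usual generalization keeps the first k eigenvectors as an embedding and runs k-means on the embedded rows.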


2017 ◽  
Author(s):  
Joe R. Melton ◽  
Reinel Sospedra-Alfonso ◽  
Kelly E. McCusker

Abstract. We investigate the application of clustering algorithms to represent sub-grid-scale variability in soil texture for use in a global-scale terrestrial ecosystem model. Our model, the coupled Canadian Land Surface Scheme – Canadian Terrestrial Ecosystem Model (CLASS-CTEM), is typically implemented at a coarse spatial resolution (ca. 2.8° × 2.8°) due to its use as the land surface component of the Canadian Earth System Model (CanESM). CLASS-CTEM can, however, be run with tiling of the land surface as a means to represent sub-grid heterogeneity. Before attempting to cluster soil textures globally, we first determined via an idealized test case that the model is sensitive to tiling of the soil textures. To cluster a high-resolution soil texture dataset onto our coarse model grid, we use two linked algorithms, OPTICS (Ankerst et al., 1999; Daszykowski et al., 2002) and the cluster-extraction method of Sander et al. (2003), to provide tiles of representative soil textures for use as CLASS-CTEM inputs. The clustering process results in, on average, about three tiles per CLASS-CTEM grid cell, with most cells having four or fewer tiles. Results from CLASS-CTEM simulations conducted with the tiled inputs (Cluster) versus those using a simple grid-mean soil texture (Gridmean) show that CLASS-CTEM is, at least on a global scale, relatively insensitive to the tiled soil textures; however, differences can be large in arid or peatland regions. The Cluster simulation has generally lower soil moisture and lower overall vegetation productivity than the Gridmean simulation, except in arid regions where plant productivity increases. In these dry regions, the influence of the tiling is stronger because the vegetation is generally moisture-stressed, which allows a single tile, whose soil texture retains more plant-available water, to yield much higher productivity.
Although clustering analysis appears promising as a means to represent sub-grid heterogeneity, soil textures appear to be reasonably represented for global-scale simulations using a simple grid-mean value.
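As a hedged illustration of the clustering step only, the sketch below groups hypothetical sand/silt/clay fractions within one grid cell using scikit-learn's OPTICS and takes each cluster mean as a representative tile texture. The parameter values are assumptions, and the study's actual pipeline (including the Sander et al. (2003) extraction step) is not reproduced.

```python
import numpy as np
from sklearn.cluster import OPTICS

def representative_tiles(textures, min_samples=5, eps=0.1):
    # textures: (n, 3) array of [sand, silt, clay] fractions from the
    # high-resolution points falling inside one coarse grid cell.
    labels = OPTICS(min_samples=min_samples, cluster_method="dbscan",
                    eps=eps).fit_predict(textures)
    tiles = []
    for lab in sorted(set(labels) - {-1}):  # -1 marks noise points
        members = textures[labels == lab]
        # One representative soil texture per cluster, paired with the
        # cluster's share of the grid cell (its tile fraction).
        tiles.append((members.mean(axis=0), len(members) / len(textures)))
    return tiles
```

Each returned tile (a mean texture plus an areal fraction) corresponds to one sub-grid tile that a land surface scheme could be run on.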


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects within a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well-established. A recent research focus in clustering analysis is understanding the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified data characteristics that may strongly affect clustering analysis, including high dimensionality, sparseness, large size, noise, the types of attributes and data sets, and the scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms? 2. How does the distribution of the “true” cluster sizes affect the performance of clustering algorithms? 3. How should an appropriate clustering algorithm be chosen in practice? The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy since, 1) in theory, it has seldom been recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter takes a first step toward filling this void.
To this end, we select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. We first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, behaves in the opposite way: it tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations in the resultant cluster sizes from K-means and UPGMA, measured by the Coefficient of Variation (CV), fall within specific intervals, roughly [0.3, 1.0] and [1.0, 2.5], respectively. Finally, we compare K-means and UPGMA directly and propose some rules for choosing a clustering scheme from the data-distribution point of view.
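The CV statistic used above is simply the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows; the interval endpoints 0.3 and 1.0 come from the chapter, while the example cluster sizes are invented for illustration.

```python
def coefficient_of_variation(cluster_sizes):
    # CV = (population) standard deviation / mean of the cluster sizes.
    n = len(cluster_sizes)
    mean = sum(cluster_sizes) / n
    std = (sum((s - mean) ** 2 for s in cluster_sizes) / n) ** 0.5
    return std / mean

# A near-uniform partition (K-means-like) has a low CV, while a highly
# skewed partition (UPGMA-like) pushes the CV above 1.0.
```

For example, `coefficient_of_variation([90, 110, 100])` is about 0.08, whereas `coefficient_of_variation([10, 10, 280])` is about 1.27, on opposite sides of the K-means and UPGMA intervals quoted above.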


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The goal of cluster analysis is to group a set of data points into clusters based on similarity or distance. In numerous traditional clustering algorithms, this similarity or distance is a scalar. A vector, however, such as the data gravitational force, carries more information than a scalar and can be applied in clustering analysis to improve clustering performance. This paper therefore proposes a three-stage hierarchical clustering approach called GHC, which exploits the vector character of the data gravitational force, inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed from the top k data gravitations between each data point and its neighbors in the local region. The sparse graph is then partitioned into many subgraphs according to the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments are conducted on synthetic and real-world data sets; the results show that GHC outperforms the existing clustering algorithms it is compared against.
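As a hedged sketch of the first stage, the snippet below computes pairwise "gravitational" magnitudes between unit-mass points (Newton's inverse-square law with G = m = 1 is an assumption; the paper's exact gravitation formula and influence coefficient are not reproduced) and keeps only each point's k strongest attractions, yielding a sparse graph.

```python
import numpy as np

def topk_gravity_graph(X, k=3):
    # Pairwise "gravitational" magnitudes with unit masses:
    # |F_ij| = 1 / ||x_i - x_j||^2  (Newton's law with G = m = 1).
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)   # no self-attraction
    F = 1.0 / dist ** 2
    # Keep only the k strongest attractions per point -> sparse graph.
    edges = set()
    for i in range(n):
        for j in np.argsort(-F[i])[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

Because gravitation decays with the square of the distance, the surviving edges overwhelmingly connect points within the same dense region, which is what makes the subsequent subgraph partitioning and merging stages feasible.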


Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

Abstract The similarity matrix has a significant effect on the performance of spectral clustering, and effectively determining the neighborhood in the similarity matrix is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task using a multiobjective evolutionary algorithm (MOEA). The procedure is divided into two phases: Phase I determines the nonzero entries of the similarity matrix, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem that optimizes diversity and similarity at the same time. Each individual determines one nonzero entry for each sample, and the encoding length decreases to O(N), in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experimental section.


Fluids ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 184
Author(s):  
Guilherme S. Vieira ◽  
Irina I. Rypina ◽  
Michael R. Allshouse

Partitioning ocean flows into regions dynamically distinct from their surroundings based on material transport can assist search-and-rescue planning by reducing the search domain. The spectral clustering method partitions the domain by identifying fluid particle trajectories that are similar. The validity of the partition depends on the accuracy of the ocean forecast, which is subject to several sources of uncertainty: model initialization, limited knowledge of the physical processes, boundary conditions, and forcing terms. Instead of a single model output, multiple realizations are produced spanning a range of potential outcomes, and trajectory clustering is used to identify robust features and quantify the uncertainty of the ensemble-averaged results. First, ensemble statistics are used to investigate the sensitivity of the clusters to the free parameters of the spectral clustering method and to the forecast parameters for the analytic Bickley jet, a geostrophic flow model. We then analyze an operational coastal ocean ensemble forecast and compare the clustering results to drifter trajectories south of Martha's Vineyard. This approach identifies regions of low uncertainty, where drifters released within a cluster predominantly remain there throughout the window of analysis. Drifters released in regions of high uncertainty tend either to enter neighboring clusters or to deviate from all predicted outcomes.

