Ship-handling behavior pattern recognition using AIS sub-trajectory clustering analysis based on the T-SNE and spectral clustering algorithms

2020 ◽  
Vol 205 ◽  
pp. 106919 ◽  
Author(s):  
Miao Gao ◽  
Guo-You Shi
2021 ◽  
Vol 11 (8) ◽  
pp. 3693
Author(s):  
Alberto Blazquez-Herranz ◽  
Juan-Ignacio Caballero-Garzon ◽  
Albert Zilverberg ◽  
Christian Wolff ◽  
Alejandro Rodríguez-Gonzalez ◽  
...  

Mobile devices equipped with sensors generate large amounts of geospatial data that, properly analyzed, can power future applications. In particular, the ability to identify similar trajectories is crucial for analyzing events at points common to those trajectories. CROSS-CPP is a European project whose main aim is to provide tools to store data in a data market and a toolbox to analyze those data. As part of these analytic tools, a set of functionalities has been developed to cluster trajectories. Building on previous work on clustering algorithms, we present in this paper an adaptation of the QuickBundles algorithm to trajectory clustering. Experiments using different distance measures show that QuickBundles outperforms spectral clustering, with the WGS84 geodesic distance providing the best results.
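The abstract does not spell out the QuickBundles adaptation, so as a rough illustration of the idea, here is a minimal single-pass QuickBundles-style sketch for trajectories resampled to equal length. A haversine great-circle distance stands in for the WGS84 geodesic, and the function names and threshold are illustrative assumptions, not the project's actual code.

```python
import math

def haversine_km(p, q):
    # Great-circle distance between two (lat, lon) points in km;
    # an approximation of the WGS84 geodesic used in the paper.
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def traj_dist(t1, t2):
    # Mean pointwise distance between two equal-length trajectories.
    return sum(haversine_km(p, q) for p, q in zip(t1, t2)) / len(t1)

def quickbundles(trajectories, threshold_km):
    # Single pass: assign each trajectory to the nearest cluster
    # centroid if within the threshold, otherwise start a new cluster.
    clusters = []  # each: {"centroid": [[lat, lon], ...], "members": [...]}
    for idx, t in enumerate(trajectories):
        best, best_d = None, threshold_km
        for c in clusters:
            d = traj_dist(t, c["centroid"])
            if d < best_d:
                best, best_d = c, d
        if best is None:
            clusters.append({"centroid": [list(p) for p in t],
                             "members": [idx]})
        else:
            n = len(best["members"])
            # Incrementally update the running-mean centroid.
            for cp, p in zip(best["centroid"], t):
                cp[0] = (cp[0] * n + p[0]) / (n + 1)
                cp[1] = (cp[1] * n + p[1]) / (n + 1)
            best["members"].append(idx)
    return clusters
```

The single pass over the data is what gives QuickBundles its speed advantage over spectral clustering, which needs the full pairwise distance matrix.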


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using a modified spectral clustering (SC) method. The similarity measure plays a pivotal role in clustering industrial data for accurate churn prediction; here, the linear Euclidean distance of traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several properties of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial, and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also applied to the above-mentioned 15 databases. The empirical results show that the proposed clustering algorithm outperforms the three existing algorithms in terms of Jaccard index, f-score, recall, precision, and accuracy. Finally, the significance of the clustering results is tested using the Wilcoxon signed-rank test, the Wilcoxon rank-sum test, and the sign test. The comparative study shows that the proposed algorithm performs notably well, especially on clusters of arbitrary shape.
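The core move here, swapping the Euclidean distance inside spectral clustering for a non-linear distance, can be sketched generically: compute a distance matrix with any custom metric, turn it into a Gaussian affinity, and hand it to a spectral clustering implementation that accepts precomputed affinities. The `custom_distance` below is a hypothetical placeholder, not the paper's actual S-distance.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def custom_distance(x, y):
    # Placeholder for a non-linear distance such as Sd; the paper's
    # exact S-divergence-based formula is not reproduced here.
    return np.sqrt(np.sum((np.log1p(np.abs(x)) - np.log1p(np.abs(y))) ** 2))

def spectral_with_custom_distance(X, n_clusters, sigma=1.0):
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = custom_distance(X[i], X[j])
    # Gaussian affinity built from the custom distance matrix.
    A = np.exp(-D ** 2 / (2 * sigma ** 2))
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                            random_state=0)
    return sc.fit_predict(A)
```

Because only the distance function changes, the rest of the SC pipeline (Laplacian, eigendecomposition, final k-means step) stays untouched, which is what makes this kind of modification cheap to experiment with.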


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering is a clustering algorithm based on spectral graph theory. Because it has a deep theoretical foundation and handles non-convex distributions well, it has received much attention in the machine learning and data mining communities. The algorithm is easy to implement and outperforms traditional clustering algorithms such as k-means. This paper aims to provide some intuition about spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, the clustering steps, and related topics. Finally, to address the shortcomings of spectral clustering, some improvements are briefly introduced.
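The clustering steps mentioned above (similarity graph, graph Laplacian, eigenvector embedding) can be sketched for the two-cluster case, where the sign of the Fiedler vector of the normalized Laplacian directly yields the partition. This is a minimal illustration of the standard recipe, with a Gaussian-kernel graph as an assumed choice.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    # 1. Fully connected similarity graph with a Gaussian kernel.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Normalized graph Laplacian  L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # 3. The eigenvector of the second-smallest eigenvalue (the
    #    Fiedler vector) embeds the points; its sign splits them
    #    into two clusters.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)
```

For k > 2 clusters, the usual generalization keeps the first k eigenvectors as an embedding and runs k-means on the embedded rows.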


2017 ◽  
Author(s):  
Joe R. Melton ◽  
Reinel Sospedra-Alfonso ◽  
Kelly E. McCusker

Abstract. We investigate the application of clustering algorithms to represent sub-grid-scale variability in soil texture for use in a global-scale terrestrial ecosystem model. Our model, the coupled Canadian Land Surface Scheme – Canadian Terrestrial Ecosystem Model (CLASS-CTEM), is typically implemented at a coarse spatial resolution (ca. 2.8° × 2.8°) due to its use as the land surface component of the Canadian Earth System Model (CanESM). CLASS-CTEM can, however, be run with tiling of the land surface as a means to represent sub-grid heterogeneity. Before attempting to cluster soil textures globally, we first determined via an idealized test case that the model is sensitive to tiling of the soil textures. To cluster a high-resolution soil texture dataset onto our coarse model grid, we use two linked algorithms, OPTICS (Ankerst et al., 1999; Daszykowski et al., 2002) and the cluster-extraction method of Sander et al. (2003), to provide tiles of representative soil textures for use as CLASS-CTEM inputs. The clustering process results in, on average, about three tiles per CLASS-CTEM grid cell, with most cells having four or fewer tiles. Results from CLASS-CTEM simulations conducted with the tiled inputs (Cluster) versus those using a simple grid-mean soil texture (Gridmean) show that CLASS-CTEM is, at least on a global scale, relatively insensitive to the tiled soil textures; however, differences can be large in arid or peatland regions. The Cluster simulation has generally lower soil moisture and lower overall vegetation productivity than the Gridmean simulation, except in arid regions where plant productivity increases. In these dry regions, the influence of the tiling is stronger because the vegetation is generally moisture-stressed, which allows a single tile, whose soil texture retains more plant-available water, to yield much higher productivity.
Although clustering analysis appears promising as a means to represent sub-grid heterogeneity, soil textures appear to be reasonably represented for global-scale simulations using a simple grid-mean value.
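As a hedged illustration of the clustering step only, the sketch below groups hypothetical sand/silt/clay fractions within one grid cell using scikit-learn's OPTICS and takes each cluster mean as a representative tile texture. The parameter values are assumptions, and the study's actual pipeline (including the Sander et al. (2003) extraction step) is not reproduced.

```python
import numpy as np
from sklearn.cluster import OPTICS

def representative_tiles(textures, min_samples=5, eps=0.1):
    # textures: (n, 3) array of [sand, silt, clay] fractions from the
    # high-resolution points falling inside one coarse grid cell.
    labels = OPTICS(min_samples=min_samples, cluster_method="dbscan",
                    eps=eps).fit_predict(textures)
    tiles = []
    for lab in sorted(set(labels) - {-1}):  # -1 marks noise points
        members = textures[labels == lab]
        # One representative soil texture per cluster, paired with the
        # cluster's share of the grid cell (its tile fraction).
        tiles.append((members.mean(axis=0), len(members) / len(textures)))
    return tiles
```

Each returned tile (a mean texture plus an areal fraction) corresponds to one sub-grid tile that a land surface scheme could be run on.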


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing objects into groups (clusters), such that objects within a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well-established. A recent research focus in clustering analysis is understanding the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified data characteristics that may strongly affect clustering analysis, including high dimensionality, sparseness, large size, noise, the types of attributes and data sets, and the scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: 1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms? 2. How does the distribution of the “true” cluster sizes affect the performance of clustering algorithms? 3. How should an appropriate clustering algorithm be chosen in practice? The answers to these questions can guide us toward a better understanding and use of clustering methods. This is noteworthy since, 1) in theory, it has seldom been recognized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter takes a first step toward filling this void.
To this end, we select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. We first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. We then demonstrate that UPGMA, one of the robust AHC methods, behaves in the opposite way: it tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations in the resultant cluster sizes from K-means and UPGMA, measured by the Coefficient of Variation (CV), fall within specific intervals, roughly [0.3, 1.0] and [1.0, 2.5], respectively. Finally, we compare K-means and UPGMA directly and propose some rules for choosing a clustering scheme from the data-distribution point of view.
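The CV statistic used above is simply the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows; the interval endpoints 0.3 and 1.0 come from the chapter, while the example cluster sizes are invented for illustration.

```python
def coefficient_of_variation(cluster_sizes):
    # CV = (population) standard deviation / mean of the cluster sizes.
    n = len(cluster_sizes)
    mean = sum(cluster_sizes) / n
    std = (sum((s - mean) ** 2 for s in cluster_sizes) / n) ** 0.5
    return std / mean

# A near-uniform partition (K-means-like) has a low CV, while a highly
# skewed partition (UPGMA-like) pushes the CV above 1.0.
```

For example, `coefficient_of_variation([90, 110, 100])` is about 0.08, whereas `coefficient_of_variation([10, 10, 280])` is about 1.27, on opposite sides of the K-means and UPGMA intervals quoted above.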


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The goal of cluster analysis is to group a set of data points into clusters based on similarity or distance. In numerous traditional clustering algorithms, this similarity or distance is a scalar. A vector, however, such as the data gravitational force, carries more information than a scalar and can be applied in clustering analysis to improve clustering performance. This paper therefore proposes a three-stage hierarchical clustering approach called GHC, which exploits the vector character of the data gravitational force, inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed from the top k data gravitations between each data point and its neighbors in the local region. The sparse graph is then partitioned into many subgraphs according to the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments are conducted on synthetic and real-world data sets; the results show that GHC outperforms the existing clustering algorithms it is compared against.
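As a hedged sketch of the first stage, the snippet below computes pairwise "gravitational" magnitudes between unit-mass points (Newton's inverse-square law with G = m = 1 is an assumption; the paper's exact gravitation formula and influence coefficient are not reproduced) and keeps only each point's k strongest attractions, yielding a sparse graph.

```python
import numpy as np

def topk_gravity_graph(X, k=3):
    # Pairwise "gravitational" magnitudes with unit masses:
    # |F_ij| = 1 / ||x_i - x_j||^2  (Newton's law with G = m = 1).
    n = len(X)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)   # no self-attraction
    F = 1.0 / dist ** 2
    # Keep only the k strongest attractions per point -> sparse graph.
    edges = set()
    for i in range(n):
        for j in np.argsort(-F[i])[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```

Because gravitation decays with the square of the distance, the surviving edges overwhelmingly connect points within the same dense region, which is what makes the subsequent subgraph partitioning and merging stages feasible.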


Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

Abstract The similarity matrix has a significant effect on the performance of spectral clustering, and effectively determining the neighborhood in the similarity matrix is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task using a multiobjective evolutionary algorithm (MOEA). The procedure is divided into two phases: Phase I determines the nonzero entries of the similarity matrix, and Phase II determines the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem that optimizes diversity and similarity at the same time. Each individual determines one nonzero entry for each sample, and the encoding length decreases to O(N), in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experimental section.


Fluids ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 184
Author(s):  
Guilherme S. Vieira ◽  
Irina I. Rypina ◽  
Michael R. Allshouse

Partitioning ocean flows into regions dynamically distinct from their surroundings based on material transport can assist search-and-rescue planning by reducing the search domain. The spectral clustering method partitions the domain by identifying fluid particle trajectories that are similar. The validity of the partition depends on the accuracy of the ocean forecast, which is subject to several sources of uncertainty: model initialization, limited knowledge of the physical processes, boundary conditions, and forcing terms. Instead of a single model output, multiple realizations are produced spanning a range of potential outcomes, and trajectory clustering is used to identify robust features and quantify the uncertainty of the ensemble-averaged results. First, ensemble statistics are used to investigate the sensitivity of the clusters to the free parameters of the spectral clustering method and to the forecast parameters for the analytic Bickley jet, a geostrophic flow model. We then analyze an operational coastal ocean ensemble forecast and compare the clustering results to drifter trajectories south of Martha's Vineyard. This approach identifies regions of low uncertainty, where drifters released within a cluster predominantly remain there throughout the window of analysis. Drifters released in regions of high uncertainty tend either to enter neighboring clusters or to deviate from all predicted outcomes.

