scholarly journals An OD Flow Clustering Method Based on Vector Constraints: A Case Study for Beijing Taxi Origin-Destination Data

2020 ◽  
Vol 9 (2) ◽  
pp. 128
Author(s):  
Xiaogang Guo ◽  
Zhijie Xu ◽  
Jianqin Zhang ◽  
Jian Lu ◽  
Hao Zhang

Origin-destination (OD) flow pattern mining is an important research method of urban dynamics, in which OD flow clustering analysis discovers the activity patterns of urban residents and mine the coupling relationship of urban subspace and dynamic causes. The existing flow clustering methods are limited by the spatial constraints of OD points, rely on the spatial similarity of geographical points, and lack in-depth analysis of high-dimensional flow characteristics, and therefore it is difficult to find irregular flow clusters. In this paper, we propose an OD flow clustering method based on vector constraints (ODFCVC), which defines OD flow event point and OD flow vector to express the spatial location relationship and geometric flow behavior characteristics of OD flow. First, the OD flow vector coordinate system is normalized by the Euclidean distance-based OD flow event point spatial clustering, and then the OD flow clusters with similar flow patterns are mined using adjusted cosine similarity-based OD flow vector feature clustering. The transformation of OD data from point set space to vector space is realized by constraining the vector coordinate system and vector similarity through two-step clustering, which simplifies the calculation of high-dimensional similarity of OD flow and helps mining representative OD flow clusters in flow space. Due to the OD flow cluster property, the k-means algorithm is selected as the basic clustering logic in the two-step clustering method, and a sum of squared error perceptually important points algorithm considering silhouette coefficients (SSEPIP) is adopted to automatically extract the optimal cluster number without defining any parameters. Tested by origin-destination flow data in Beijing, China, new traffic flow communities based on traffic hubs are obtained by using the ODFCVC method, and irregular traffic flow clusters (including cluster mode, divergence mode, and convergence mode) with representative travel trends are found.

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Huangyue Chen ◽  
Lingchen Kong ◽  
Yan Li

Clustering is an important ingredient of unsupervised learning; classical clustering methods include K-means clustering and hierarchical clustering. These methods may suffer from instability because of their tendency prone to sink into the local optimal solutions of the nonconvex optimization model. In this paper, we propose a new convex clustering method for high-dimensional data based on the sparse group lasso penalty, which can simultaneously group observations and eliminate noninformative features. In this method, the number of clusters can be learned from the data instead of being given in advance as a parameter. We theoretically prove that the proposed method has desirable statistical properties, including a finite sample error bound and feature screening consistency. Furthermore, the semiproximal alternating direction method of multipliers is designed to solve the sparse group lasso convex clustering model, and its convergence analysis is established without any conditions. Finally, the effectiveness of the proposed method is thoroughly demonstrated through simulated experiments and real applications.


Aerospace ◽  
2021 ◽  
Vol 8 (9) ◽  
pp. 266
Author(s):  
Weili Zeng ◽  
Zhengfeng Xu ◽  
Zhipeng Cai ◽  
Xiao Chu ◽  
Xiaobo Lu

The aircraft trajectory clustering analysis in the terminal airspace is conducive to determining the representative route structure of the arrival and departure trajectory and extracting their typical patterns, which is important for air traffic management such as airspace structure optimization, trajectory planning, and trajectory prediction. However, the current clustering methods perform poorly due to the large flight traffic, high density, and complex airspace structure in the terminal airspace. In recent years, the continuous development of Deep Learning has demonstrated its powerful ability to extract internal potential features of large dataset. Therefore, this paper mainly tries a deep trajectory clustering method based on deep autoencoder (DAE). To this end, this paper proposes a trajectory clustering method based on deep autoencoder (DAE) and Gaussian mixture model (GMM) to mine the prevailing traffic flow patterns in the terminal airspace. The DAE is trained to extract feature representations from historical high-dimensional trajectory data. Subsequently, the output of DAE is input into GMM for clustering. This paper takes the terminal airspace of Guangzhou Baiyun International Airport in China as a case to verify the proposed method. Through the direct visualization and dimensionality reduction visualization of the clustering results, it is found that the traffic flow patterns identified by the clustering method in this paper are intuitive and separable.


2012 ◽  
Vol 532-533 ◽  
pp. 939-943
Author(s):  
Yi Ding ◽  
Xian Fu

Text clustering typically involves clustering in a high dimensional space, which appears difficult with regard to virtually all practical settings. In addition, given a particular clustering result it is typically very hard to come up with a good explanation of why the text clusters have been constructed the way they are. . To solve these problems, based on topic concept clustering, this paper proposes a method for Chinese document clustering. In this paper, we introduce a novel topical document clustering method called Document Features Indexing Clustering (DFIC), which can identify topics accurately and cluster documents according to these topics. In DFIC, “topic elements” are defined and extracted for indexing base clusters. Additionally, document features are investigated and exploited. Experimental results show that DFIC can gain a higher precision (92.76%) than some widely used traditional clustering methods.


2021 ◽  
Vol 10 (3) ◽  
pp. 161
Author(s):  
Hao-xuan Chen ◽  
Fei Tao ◽  
Pei-long Ma ◽  
Li-na Gao ◽  
Tong Zhou

Spatial analysis is an important means of mining floating car trajectory information, and clustering method and density analysis are common methods among them. The choice of the clustering method affects the accuracy and time efficiency of the analysis results. Therefore, clarifying the principles and characteristics of each method is the primary prerequisite for problem solving. Taking four representative spatial analysis methods—KMeans, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering by Fast Search and Find of Density Peaks (CFSFDP), and Kernel Density Estimation (KDE)—as examples, combined with the hotspot spatiotemporal mining problem of taxi trajectory, through quantitative analysis and experimental verification, it is found that DBSCAN and KDE algorithms have strong hotspot discovery capabilities, but the heat regions’ shape of DBSCAN is found to be relatively more robust. DBSCAN and CFSFDP can achieve high spatial accuracy in calculating the entrance and exit position of a Point of Interest (POI). KDE and DBSCAN are more suitable for the classification of heat index. When the dataset scale is similar, KMeans has the highest operating efficiency, while CFSFDP and KDE are inferior. This paper resolves to a certain extent the lack of scientific basis for selecting spatial analysis methods in current research. The conclusions drawn in this paper can provide technical support and act as a reference for the selection of methods to solve the taxi trajectory mining problem.


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset of data is to use it to build prediction models (of the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in exploratory analysis stages of studies that involve resource consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method, which requires only a single parameter to be specified, yet it is shown to be as effective as the SCM. A method to find suitable values for the parameter is also proposed. Due to having only a single parameter, using the proposed clustering method is shown to be orders of magnitudes more efficient than using SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied for data extraction are identified, and the proposed method is shown to be a solution for them.


2021 ◽  
Vol 15 ◽  
Author(s):  
Justin L. Balsor ◽  
Keon Arbabi ◽  
Desmond Singh ◽  
Rachel Kwan ◽  
Jonathan Zaslavsky ◽  
...  

Studying the molecular development of the human brain presents unique challenges for selecting a data analysis approach. The rare and valuable nature of human postmortem brain tissue, especially for developmental studies, means the sample sizes are small (n), but the use of high throughput genomic and proteomic methods measure the expression levels for hundreds or thousands of variables [e.g., genes or proteins (p)] for each sample. This leads to a data structure that is high dimensional (p ≫ n) and introduces the curse of dimensionality, which poses a challenge for traditional statistical approaches. In contrast, high dimensional analyses, especially cluster analyses developed for sparse data, have worked well for analyzing genomic datasets where p ≫ n. Here we explore applying a lasso-based clustering method developed for high dimensional genomic data with small sample sizes. Using protein and gene data from the developing human visual cortex, we compared clustering methods. We identified an application of sparse k-means clustering [robust sparse k-means clustering (RSKC)] that partitioned samples into age-related clusters that reflect lifespan stages from birth to aging. RSKC adaptively selects a subset of the genes or proteins contributing to partitioning samples into age-related clusters that progress across the lifespan. This approach addresses a problem in current studies that could not identify multiple postnatal clusters. Moreover, clusters encompassed a range of ages like a series of overlapping waves illustrating that chronological- and brain-age have a complex relationship. In addition, a recently developed workflow to create plasticity phenotypes (Balsor et al., 2020) was applied to the clusters and revealed neurobiologically relevant features that identified how the human visual cortex changes across the lifespan. These methods can help address the growing demand for multimodal integration, from molecular machinery to brain imaging signals, to understand the human brain’s development.


2021 ◽  
Author(s):  
Feiyang Ren ◽  
Yi Han ◽  
Shaohan Wang ◽  
He Jiang

Abstract A novel marine transportation network based on high-dimensional AIS data with a multi-level clustering algorithm is proposed to discover important waypoints in trajectories based on selected navigation features. This network contains two parts: the calculation of major nodes with CLIQUE and BIRCH clustering methods and navigation network construction with edge construction theory. Unlike the state-of-art work for navigation clustering with only ship coordinate, the proposed method contains more high-dimensional features such as drafting, weather, and fuel consumption. By comparing the historical AIS data, more than 220,133 lines of data in 30 days were used to extract 440 major nodal points in less than 4 minutes with ordinary PC specs (i5 processer). The proposed method can be performed on more dimensional data for better ship path planning or even national economic analysis. Current work has shown good performance on complex ship trajectories distinction and great potential for future shipping transportation market analytical predictions.


2018 ◽  
Vol 6 (2) ◽  
Author(s):  
Elly Muningsih - AMIK BSI Yogyakarta

Abstract ~ The K-Means method is one of the clustering methods that is widely used in data clustering research. While the K-Medoids method is an efficient method used for processing small data. This study aims to compare two clustering methods by grouping customers into 3 clusters according to their characteristics, namely very potential (loyal) customers, potential customers and non potential customers. The method used in this study is the K-Means clustering method and the K-Medoids method. The data used is online sales transaction. The clustering method testing is done by using a Fuzzy RFM (Recency, Frequenty and Monetary) model where the average (mean) of the third value is taken. From the data testing is known that the K-Means method is better than the K-Medoids method with an accuracy value of 90.47%. Whereas from the data processing carried out is known that cluster 1 has 16 members (customers), cluster 2 has 11 members and cluster 3 has 15 members. Keywords : clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model. Abstrak ~ Metode K-Means merupakan salah satu metode clustering yang banyak digunakan dalam penelitian pengelompokan data. Sedangkan metode K-Medoids merupakan metode yang efisien digunakan untuk pengolahan data yang kecil. Penelitian ini bertujuan untuk membandingkan atau mengkomparasi dua metode clustering dengan cara mengelompokkan pelanggan menjadi 3 cluster sesuai dengan karakteristiknya, yaitu pelanggan sangat potensial (loyal), pelanggan potensial dan pelanggan kurang (tidak) potensial. Metode yang digunakan dalam penelitian ini adalah metode clustering K-Means dan metode K-Medoids. Data yang digunakan adalah data transaksi penjualan online. Pengujian metode clustering yang dilakukan adalah dengan menggunakan model Fuzzy RFM (Recency, Frequenty dan Monetary) dimana diambil rata-rata (mean) dari nilai ketiga tersebut. Dari pengujian data diketahui bahwa metode K-Means lebih baik dari metode K-Medoids dengan nilai akurasi 90,47%. Sedangkan dari pengolahan data yang dilakukan diketahui bahwa cluster 1 memiliki 16 anggota (pelanggan), cluster 2 memiliki 11 anggota dan cluster 3 memiliki 15 anggota. Kata kunci : clustering, metode K-Means, metode K-Medoids, pelanggan, model Fuzzy RFM.


Sign in / Sign up

Export Citation Format

Share Document