Trajectory Clustering and k-NN for Robust Privacy Preserving Spatiotemporal Databases

Elias Dritsas; Maria Trigka; Panagiotis Gerolymatos; Spyros Sioutas

doi:10.3390/a11120207

Algorithms ◽

10.3390/a11120207 ◽

2018 ◽

Vol 11 (12) ◽

pp. 207 ◽

Cited By ~ 2

Author(s):

Elias Dritsas ◽

Maria Trigka ◽

Panagiotis Gerolymatos ◽

Spyros Sioutas

Keyword(s):

Nearest Neighbor ◽

Dimensional Space ◽

Motion Vector ◽

Research Work ◽

Privacy Preserving ◽

Mobile Users ◽

Trajectory Clustering ◽

K Nearest Neighbor ◽

Trajectory Data ◽

Spatiotemporal Databases

In the context of this research work, we studied the problem of privacy preservation in spatiotemporal databases. In particular, we investigated the k-anonymity of mobile users based on real trajectory data. The k-anonymity set consists of the k nearest neighbors. We constructed a motion vector of the form (x, y, g, v), where x and y are the spatial coordinates, g is the direction angle, and v is the velocity of a mobile user, and studied the problem in four-dimensional space. We followed two approaches. The former applied only the k-Nearest Neighbor (k-NN) algorithm to the whole dataset, while the latter combined trajectory clustering, based on K-means, with k-NN; specifically, it applied k-NN inside a cluster of mobile users with a similar motion pattern (g, v). We defined a metric, called vulnerability, that measures the rate at which the k-NNs vary. This metric ranges from 1/k (high robustness) to 1 (low robustness) and represents the probability that a potential attacker discovers the real identity of a mobile user. The aim of this work was to show that, with high probability, this rate tends to a value very close to 1/k in the clustering method, which means that k-anonymity is highly preserved. Through experiments on real spatial datasets, we evaluated the anonymity robustness, the so-called vulnerability, of the proposed method.
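The two approaches and the vulnerability metric can be sketched in plain Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: Euclidean distance over unnormalized (x, y, g, v) vectors (which also ignores that the angle g wraps around), plain K-means on (g, v) with farthest-first initialization, and vulnerability read as 1/|S_1 ∩ S_2 ∩ ... ∩ S_T|, where S_t is the user's anonymity set at timestamp t and contains the user itself. That reading is one interpretation consistent with the stated range: identical sets over time give 1/k, completely changing sets give 1. All function names are hypothetical.

```python
import math


def knn_set(users, uid, k):
    """Anonymity set of `uid`: the user itself plus its k-1 nearest neighbors.

    users: {user_id: (x, y, g, v)}; distance is plain Euclidean (an assumption).
    """
    q = users[uid]
    others = sorted((u for u in users if u != uid),
                    key=lambda u: (math.dist(users[u], q), u))
    return frozenset([uid, *others[:k - 1]])


def kmeans(points, n_clusters, iters=20):
    """Plain K-means (Lloyd's algorithm) with farthest-first initialization.

    points: {user_id: (g, v)}. Returns {user_id: cluster_index}.
    """
    ids = sorted(points)
    centers = [points[ids[0]]]
    while len(centers) < n_clusters:
        # Next center: the point farthest from all centers chosen so far.
        far = max(ids, key=lambda i: min(math.dist(points[i], c) for c in centers))
        centers.append(points[far])
    labels = {}
    for _ in range(iters):
        # Assignment step: each user joins its nearest center.
        labels = {i: min(range(n_clusters), key=lambda c: math.dist(points[i], centers[c]))
                  for i in ids}
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(n_clusters):
            members = [points[i] for i in ids if labels[i] == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def clustered_knn_set(users, uid, k, labels):
    """Second approach: run k-NN only among users in the same (g, v) cluster."""
    same = {u: p for u, p in users.items() if labels[u] == labels[uid]}
    return knn_set(same, uid, k)


def vulnerability(anonymity_sets):
    """1 / size of the users common to all of one user's anonymity sets over time."""
    common = frozenset.intersection(*anonymity_sets)
    return 1.0 / len(common)
```

To measure one user over time, call `knn_set` (or `clustered_knn_set`) on each snapshot of the motion vectors and pass the resulting list of sets to `vulnerability`. If a cluster holds fewer than k users, this sketch simply returns a smaller set.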

Trajectory Clustering and k-NN for Robust Privacy Preserving k-NN Query Processing in GeoSpark

Algorithms ◽

10.3390/a13080182 ◽

2020 ◽

Vol 13 (8) ◽

pp. 182

Author(s):

Elias Dritsas ◽

Andreas Kanavos ◽

Maria Trigka ◽

Gerasimos Vonitsanos ◽

Spyros Sioutas ◽

...

Keyword(s):

Big Data ◽

Spatial Data ◽

Privacy Preservation ◽

Nearest Neighbor ◽

Data Representation ◽

Privacy Preserving ◽

Temporal Data ◽

K Nearest Neighbor ◽

Trajectory Data ◽

Spatio Temporal

Privacy preservation and anonymity have gained significant attention from the big data perspective. We expect that forthcoming frameworks and theories will establish several solutions for privacy protection. k-anonymity is considered a key solution that has been widely employed to prevent data re-identification, and it is the focus of this work. Data modeling has also gained significant attention from the big data perspective. It is believed that advancing distributed environments will provide users with several solutions for efficient spatio-temporal data management. GeoSpark is used in the current work, as it is a key solution that has been widely employed for spatial data. Specifically, it works on top of Apache Spark, the main framework leveraged by the research community and organizations for big data transformation, processing, and visualization. To this end, we focused on representing trajectory data so that it is applicable to the GeoSpark environment, and we designed a GeoSpark-based approach for the efficient management of real spatio-temporal data. The next step is to gain a deeper understanding of the data by applying k nearest neighbor (k-NN) queries, with or without indexing methods. The k-anonymity set computation, which is the main component of privacy preservation evaluation and the main issue of our previous works, is evaluated in the GeoSpark environment. More specifically, the focus here is on the time cost of k-anonymity set computation along with vulnerability measurement. The results are presented in tables and figures for visual inspection.
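GeoSpark's own k-NN query API is not reproduced here. As a plain-Python illustration of why an index can cut the cost of a k-NN query, the sketch below compares a brute-force scan with a hypothetical uniform grid index over 2-D points. The grid visits cells in rings of growing Chebyshev distance r around the query cell and stops once the current k-th distance is no larger than r × cell, which is a lower bound on the distance to any point in a ring not yet visited. The `cell` size is an assumed tuning parameter.

```python
import math
from collections import defaultdict


def knn_brute(points, q, k):
    """Baseline: scan every point, return the k smallest (distance, id) pairs."""
    return sorted((math.dist(p, q), pid) for pid, p in points.items())[:k]


class GridIndex:
    """Hypothetical uniform grid index over 2-D points {id: (x, y)}."""

    def __init__(self, points, cell):
        self.points = points
        self.cell = cell
        self.buckets = defaultdict(list)
        for pid, (x, y) in points.items():
            self.buckets[self._key(x, y)].append(pid)

    def _key(self, x, y):
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def knn(self, q, k):
        cx, cy = self._key(*q)
        found, seen, r = [], 0, 0
        while True:
            # Visit only the ring of cells at Chebyshev distance r from the query cell.
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    if max(abs(dx), abs(dy)) != r:
                        continue
                    for pid in self.buckets.get((cx + dx, cy + dy), ()):
                        found.append((math.dist(self.points[pid], q), pid))
                        seen += 1
            found.sort()
            if seen == len(self.points):
                break  # every point has been examined
            # Any point in ring r + 1 or beyond is at least r * cell away.
            if len(found) >= k and found[k - 1][0] <= r * self.cell:
                break
            r += 1
        return found[:k]
```

Both functions return the same k smallest distances; the grid version usually touches only a few cells near the query instead of the whole dataset. In GeoSpark itself, the analogous choice is whether to build a spatial index before running the query; the grid above only illustrates the pruning idea.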

Parallel kNN Queries for Big Data Based on Voronoi Diagram Using MapReduce

Advances in Data Mining and Database Management - Handbook of Research on Innovative Database Query Processing Techniques ◽

10.4018/978-1-4666-8767-7.ch014 ◽

2015 ◽

pp. 392-414

Author(s):

Wei Yan

Keyword(s):

Big Data ◽

Voronoi Diagram ◽

Spatial Databases ◽

Nearest Neighbor ◽

Programming Model ◽

Dimensional Space ◽

Data Sets ◽

Two Dimensional ◽

K Nearest Neighbor ◽

K Nearest Neighbors

In cloud computing environments, parallel kNN queries for big data are an important issue. The k nearest neighbor query (kNN query), which finds the k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications, including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method for kNN queries over big data using the MapReduce programming model. First, it proposes an approximate algorithm that maps multi-dimensional data sets into two-dimensional data sets and transforms kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space, it proposes a partitioning method based on the Voronoi diagram, which incorporates the Voronoi diagram into an R-tree. Furthermore, it proposes an efficient algorithm for processing kNN queries based on the R-tree using the MapReduce programming model. Finally, it presents the results of extensive experimental evaluations, which indicate the efficiency of the proposed approach.
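A simplified, single-process sketch of a Voronoi-partitioned kNN join in MapReduce style follows: every object is mapped to the Voronoi cell of its nearest pivot, and each S-point is replicated only to the cells where it could still be a neighbor. This is not the chapter's algorithm (it omits the 2-D mapping and the R-tree); the pivot set is an assumed input, and the map and reduce phases are simulated with dictionaries. The replication test is exact by the triangle inequality: for r in cell i, every true neighbor s satisfies d(s, p_i) ≤ d(s, r) + d(r, p_i) ≤ θ_i + U_i, where U_i is the largest distance from a cell-i R-point to pivot p_i and θ_i is the worst k-th neighbor distance computed with cell i's own S-points (infinite if there are fewer than k).

```python
import math
from collections import defaultdict

EPS = 1e-9  # slack so floating-point rounding never drops a valid candidate


def nearest_pivot(p, pivots):
    """Index of the pivot whose Voronoi cell contains p."""
    return min(range(len(pivots)), key=lambda i: math.dist(p, pivots[i]))


def knn_join(R, S, pivots, k):
    """For every r in R, return its k nearest points of S as (distance, s_id)."""
    # Map phase 1: partition both datasets by Voronoi cell.
    r_cells, s_cells = defaultdict(list), defaultdict(list)
    for rid, r in R.items():
        r_cells[nearest_pivot(r, pivots)].append(rid)
    for sid, s in S.items():
        s_cells[nearest_pivot(s, pivots)].append(sid)

    # Per-cell replication bound U_i + theta_i.
    bound = {}
    for i, rids in r_cells.items():
        u = max(math.dist(R[rid], pivots[i]) for rid in rids)
        local = s_cells.get(i, [])
        if len(local) < k:
            theta = math.inf  # not enough local S-points: replicate everything
        else:
            theta = max(sorted(math.dist(R[rid], S[sid]) for sid in local)[k - 1]
                        for rid in rids)
        bound[i] = u + theta

    # Map phase 2: send each S-point to every cell whose bound it falls within.
    candidates = defaultdict(list)
    for sid, s in S.items():
        for i, b in bound.items():
            if math.dist(s, pivots[i]) <= b + EPS:
                candidates[i].append(sid)

    # Reduce phase: exact kNN of each R-point against its cell's candidates.
    result = {}
    for i, rids in r_cells.items():
        for rid in rids:
            result[rid] = sorted((math.dist(R[rid], S[sid]), sid)
                                 for sid in candidates[i])[:k]
    return result
```

Cells whose R-points lie close to their pivot and have enough local S-points get a tight bound, so most S-points are replicated to only a few reducers; a poor pivot choice degrades toward broadcasting S to every cell.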

Improving k-Nearest Neighbor Pattern Recognition Models for Privacy-Preserving Data Analysis

2019 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata47090.2019.9006281 ◽

2019 ◽

Author(s):

Walisa Romsaiyud ◽

Henning Schnoor ◽

Wilhelm Hasselbring

Keyword(s):

Pattern Recognition ◽

Data Analysis ◽

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor

Storage Efficient Trajectory Clustering and k-NN for Robust Privacy Preserving Spatio-Temporal Databases

Algorithms ◽

10.3390/a12120266 ◽

2019 ◽

Vol 12 (12) ◽

pp. 266 ◽

Cited By ~ 2

Author(s):

Elias Dritsas ◽

Andreas Kanavos ◽

Maria Trigka ◽

Spyros Sioutas ◽

Athanasios Tsakalidis

Keyword(s):

Privacy Preservation ◽

Dimensional Space ◽

Research Work ◽

Temporal Databases ◽

Mobile Object ◽

Linear Component ◽

Trajectory Data ◽

Spatio Temporal ◽

Storage Problem ◽

The One

Storing massive volumes of spatio-temporal data has become a difficult task as GPS capabilities and wireless communication technologies have become prevalent in modern mobile devices. As a result, massive trajectory data are produced, incurring expensive costs for storage, transmission, and query processing. A number of algorithms for compressing trajectory data have been proposed to overcome these difficulties. These algorithms try to reduce the size of trajectory data while preserving the quality of the information. In the context of this research work, we focus on both the privacy preservation and the storage problem of spatio-temporal databases. To address these issues, we propose an efficient framework for trajectory representation, entitled DUST (DUal-based Spatio-temporal Trajectory), by which a raw trajectory is split into a number of linear sub-trajectories that are subjected to a dual transformation, which yields the representatives of each linear component of the initial trajectory; thus, the compressed trajectory achieves a compression ratio of M:1. To our knowledge, we are the first to study and address k-NN queries on nonlinear moving object trajectories represented in dual dimensional space. Additionally, the proposed approach is expected to reinforce the privacy protection of such data. Specifically, even if an intruder has access to the dual points of the trajectory data and tries to reproduce the native points that fit a specific component of the initial trajectory, the identity of the mobile object will remain secure with high probability. In this way, the privacy of the k-anonymity method is reinforced. Through experiments on real spatial datasets, we evaluate the robustness of the new approach and compare it with the one studied in our previous work.
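The dual-transformation idea can be sketched as follows, assuming a Hough-X-style dual in which a linear motion x(t) = v·t + a is represented by the point (v, a). The abstract does not say how DUST chooses split points, so this sketch simply cuts the trajectory into chunks of M samples, fits x(t) and y(t) by least squares in each chunk, and stores one record (t_start, t_end, v_x, a_x, v_y, a_y) per chunk; treating M as the chunk length is one reading of the M:1 ratio. Function names are hypothetical.

```python
def fit_line(ts, vals):
    """Least-squares slope v and intercept a of vals ≈ v * t + a."""
    n = len(ts)
    mt = sum(ts) / n
    mv = sum(vals) / n
    var = sum((t - mt) ** 2 for t in ts)
    if var == 0:  # single sample or repeated timestamps: no slope information
        return 0.0, mv
    v = sum((t - mt) * (x - mv) for t, x in zip(ts, vals)) / var
    return v, mv - v * mt


def to_dual(traj, M):
    """traj: list of (t, x, y) sorted by t. Returns one dual record per M samples."""
    duals = []
    for start in range(0, len(traj), M):
        chunk = traj[start:start + M]
        ts = [p[0] for p in chunk]
        vx, ax = fit_line(ts, [p[1] for p in chunk])
        vy, ay = fit_line(ts, [p[2] for p in chunk])
        duals.append((ts[0], ts[-1], vx, ax, vy, ay))
    return duals


def position(duals, t):
    """Approximate (x, y) at time t from the last record starting at or before t."""
    chosen = None
    for rec in duals:  # records are in time order
        if rec[0] <= t:
            chosen = rec
    if chosen is None:
        raise ValueError("t precedes the trajectory")
    _, _, vx, ax, vy, ay = chosen
    return (vx * t + ax, vy * t + ay)
```

Sub-trajectories that are far from linear reconstruct poorly under fixed-length chunking; an adaptive split, for example cutting wherever the fit error exceeds a tolerance, is a natural refinement.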

A Privacy Preserving Cloud-Based K-NN Search Scheme with Lightweight User Loads

Computers ◽

10.3390/computers9010001 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Yeong-Cherng Hsu ◽

Chih-Hsin Hsueh ◽

Ja-Ling Wu

Keyword(s):

Data Privacy ◽

Nearest Neighbor ◽

Search Algorithm ◽

Data Access ◽

Privacy Preserving ◽

Secret Key ◽

K Nearest Neighbor ◽

Sensitive Data ◽

Cloud Data ◽

Cloud Server

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources of the cloud, data owners can also provide a platform for users to make query requests. However, due to privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy-preserving K-nearest neighbor (K-NN) search scheme over an encrypted outsourced cloud dataset is proposed. The problem is to let the cloud server find the K nearest points to an encrypted query in the encrypted dataset, which was outsourced by data owners, and return the results to the querying user. Compared with other existing methods, our approach makes greater use of cloud resources by shifting most of the required computational load from data owners and query users to the cloud server. In addition, data owners do not need to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, according to our performance analysis, by using a tree structure the proposed scheme can answer query requests in sub-linear time. Finally, experimental results demonstrate the practicability and efficiency of our method.
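The abstract does not detail the encryption, so the sketch below swaps in a different, well-known idea to show how a server can rank points it cannot read: asymmetric scalar-product-preserving encoding (ASPE). A point p becomes Mᵀ·(p, −½‖p‖²) and a query q becomes M⁻¹·r·(q, 1) for a secret invertible matrix M and a random r > 0, so the encrypted dot product equals r(p·q − ½‖p‖²), which is larger exactly when p is closer to q. This is a toy: it is not the authors' attribute-wise scheme, has no tree index or access-pattern hiding, and ASPE should not be treated as secure in practice. Class and function names are hypothetical.

```python
import random


def mat_inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


class ToyASPE:
    """Data owner side: holds the secret matrix M (dimension d + 1)."""

    def __init__(self, dim, seed=None):
        self.rng = random.Random(seed)
        n = dim + 1
        self.key = [[self.rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
        self.key_t = [list(col) for col in zip(*self.key)]  # M transposed
        self.key_inv = mat_inverse(self.key)

    def encrypt_point(self, p):
        p_hat = list(p) + [-0.5 * dot(p, p)]
        return matvec(self.key_t, p_hat)  # M^T p_hat

    def encrypt_query(self, q):
        r = self.rng.uniform(0.5, 2.0)  # random positive scale per query
        q_hat = [r * x for x in q] + [r]
        return matvec(self.key_inv, q_hat)  # M^{-1} q_hat


def server_knn(enc_points, enc_query, k):
    """Cloud side: rank encrypted points by encrypted dot product, largest first."""
    return sorted(enc_points, key=lambda pid: -dot(enc_points[pid], enc_query))[:k]
```

The server only ever sees the transformed vectors and the dot products between them, yet the ranking it returns matches a plaintext k-NN ranking (up to floating-point rounding on near-ties).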

Classification of Thyroid Carcinoma in FNAB Cytological Microscopic Images

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/jhisi.2010040107 ◽

2010 ◽

Vol 5 (2) ◽

pp. 60-72 ◽

Cited By ~ 3

Author(s):

B. Gopinath ◽

B. R. Gupta

Keyword(s):

Thyroid Carcinoma ◽

Thyroid Nodules ◽

Nearest Neighbor ◽

Research Work ◽

Medullary Carcinoma ◽

Needle Aspiration ◽

Future Research ◽

Discrete Wavelet ◽

K Nearest Neighbor ◽

Classification Rate

This paper investigates an image classification method for classifying thyroid carcinoma in Fine Needle Aspiration Biopsy (FNAB) cytological images of thyroid nodules under noise and varying staining conditions. The segmentation method combines two image processing techniques: thresholding and mathematical morphology. Feature extraction is carried out by the discrete wavelet transform, and classification by a k-nearest neighbor classifier based on Euclidean distance. The classification methodology was successfully tested on Papillary carcinoma and Medullary carcinoma cytological images of thyroid nodules, showing promising results that encourage future research. Maximum and minimum classification rates of 95.84% and 79.17%, respectively, were reported for various testing sets of FNAB cytological images of thyroid nodules.
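A toy version of the feature-extraction and classification stages: a one-level 2-D Haar transform (the simplest discrete wavelet transform) produces four sub-bands, their mean energies form the feature vector, and a Euclidean k-nearest neighbor classifier takes a majority vote. The paper's wavelet family, decomposition depth, features, and k are not given in the abstract; these choices, the sub-band naming (conventions vary), and the requirement of even image height and width are assumptions for illustration only.

```python
import math
from collections import Counter


def haar2d(img):
    """One-level 2-D Haar transform of an image (list of rows) with even sides.

    Returns the (LL, LH, HL, HH) sub-bands, each half the size in both axes.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 2  # local average
            LH[i // 2][j // 2] = (a + b - c - d) / 2  # row-to-row difference
            HL[i // 2][j // 2] = (a - b + c - d) / 2  # column-to-column difference
            HH[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return LL, LH, HL, HH


def features(img):
    """Mean energy of each Haar sub-band."""
    return [sum(x * x for row in band for x in row) / (len(band) * len(band[0]))
            for band in haar2d(img)]


def knn_classify(train, query_vec, k):
    """train: list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query_vec))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

For example, a constant image concentrates all of its energy in LL, while a checkerboard splits it between LL and HH, so these two kinds of texture separate cleanly in this feature space.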

Efficient Privacy-Preserving k-Nearest Neighbor Search

10.1109/icdcs.2008.79 ◽

2008 ◽

Cited By ~ 47

Author(s):

Yinian Qi ◽

Mikhail J. Atallah

Keyword(s):

Nearest Neighbor ◽

Privacy Preserving ◽

Nearest Neighbor Search ◽

K Nearest Neighbor ◽

Neighbor Search ◽

K Nearest Neighbor Search

Updatable privacy-preserving K-nearest neighbor query in location-based service

Peer-to-Peer Networking and Applications ◽

10.1007/s12083-021-01290-4 ◽

2022 ◽

Author(s):

Songyang Wu ◽

Wenju Xu ◽

Zhiyong Hong ◽

Pu Duan ◽

Benyu Zhang ◽

...

Keyword(s):

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor ◽

Nearest Neighbor Query

Identifying Modes of Driving Railway Trains from GPS Trajectory Data: An Ensemble Classifier-Based Approach

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7080308 ◽

2018 ◽

Vol 7 (8) ◽

pp. 308 ◽

Cited By ~ 4

Author(s):

Han Zheng ◽

Zanyang Cui ◽

Xingchen Zhang

Keyword(s):

Nearest Neighbor ◽

Capacity Utilization ◽

Real Data ◽

Parameter Tuning ◽

Integrated Approach ◽

Ensemble Classifier ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Trajectory Data

Recognizing Modes of Driving Railway Trains (MDRT) can help solve railway freight transportation problems in driver behavior research, auto-driving system design, and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to identify MDRT automatically and effectively in the context of big data. In this study, we propose an integrated approach, including data preprocessing, feature extraction, classifier modeling, training and parameter tuning, and model evaluation, to infer MDRT from GPS data. The highlights of this study are as follows. First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF), combined with classical features, to improve identification performance. Second, we identify the most suitable classifier for MDRT by comparing the performance of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment, we conclude that: (i) the ensemble classifier XGBoost produces the best performance, with an accuracy of 92.70%; (ii) the DSSDF group plays an important role in identifying MDRT, with an accuracy improvement of 11.2% (using XGBoost). The proposed approach has been applied to capacity utilization optimization and new driver training for the Baoshen Railway.
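The abstract names Driving Segmented Standard Deviation Features (DSSDF) but does not define them. One plausible reading, sketched here purely as an assumption, splits a train's speed series into contiguous segments and uses the standard deviation of each segment as a feature, appended to a few classical features (mean, maximum, overall standard deviation). The segment count and helper names are hypothetical; the resulting vectors would then be fed to classifiers such as XGBoost, which are not shown.

```python
import statistics


def segmented_std_features(speeds, n_segments):
    """Population standard deviation of each of n_segments contiguous pieces.

    Pieces differ in length by at most one sample (hypothetical DSSDF reading).
    """
    if n_segments <= 0 or len(speeds) < n_segments:
        raise ValueError("need at least one sample per segment")
    size, extra = divmod(len(speeds), n_segments)
    feats, start = [], 0
    for s in range(n_segments):
        end = start + size + (1 if s < extra else 0)
        feats.append(statistics.pstdev(speeds[start:end]))
        start = end
    return feats


def feature_vector(speeds, n_segments=4):
    """Classical speed features followed by the segmented standard deviations."""
    classical = [statistics.fmean(speeds), max(speeds), statistics.pstdev(speeds)]
    return classical + segmented_std_features(speeds, n_segments)
```

The segmented values capture where in a run the speed fluctuates, which a single overall standard deviation cannot: a steady first half followed by an oscillating second half yields a low and a high segment value.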

Privacy Preserving Reverse k-Nearest Neighbor Queries

2018 19th IEEE International Conference on Mobile Data Management (MDM) ◽

10.1109/mdm.2018.00035 ◽

2018 ◽

Author(s):

Layla Pournajaf ◽

Farnaz Tahmasebian ◽

Li Xiong ◽

Vaidy Sunderam ◽

Cyrus Shahabi

Keyword(s):

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor ◽

Nearest Neighbor Queries