scholarly journals Trajectory Clustering and k-NN for Robust Privacy Preserving Spatiotemporal Databases

Algorithms ◽  
2018 ◽  
Vol 11 (12) ◽  
pp. 207 ◽  
Author(s):  
Elias Dritsas ◽  
Maria Trigka ◽  
Panagiotis Gerolymatos ◽  
Spyros Sioutas 

In the context of this research work, we studied the problem of privacy preserving on spatiotemporal databases. In particular, we investigated the k-anonymity of mobile users based on real trajectory data. The k-anonymity set consists of the k nearest neighbors. We constructed a motion vector of the form (x,y,g,v) where x and y are the spatial coordinates, g is the angle direction, and v is the velocity of mobile users, and studied the problem in four-dimensional space. We followed two approaches. The former applied only k-Nearest Neighbor (k-NN) algorithm on the whole dataset, while the latter combined trajectory clustering, based on K-means, with k-NN. Actually, it applied k-NN inside a cluster of mobile users with similar motion pattern (g,v). We defined a metric, called vulnerability, that measures the rate at which k-NNs are varying. This metric varies from 1 k (high robustness) to 1 (low robustness) and represents the probability the real identity of a mobile user being discovered from a potential attacker. The aim of this work was to prove that, with high probability, the above rate tends to a number very close to 1 k in clustering method, which means that the k-anonymity is highly preserved. Through experiments on real spatial datasets, we evaluated the anonymity robustness, the so-called vulnerability, of the proposed method.

Algorithms ◽  
2020 ◽  
Vol 13 (8) ◽  
pp. 182
Author(s):  
Elias Dritsas ◽  
Andreas Kanavos ◽  
Maria Trigka ◽  
Gerasimos Vonitsanos ◽  
Spyros Sioutas ◽  
...  

Privacy Preserving and Anonymity have gained significant concern from the big data perspective. We have the view that the forthcoming frameworks and theories will establish several solutions for privacy protection. The k-anonymity is considered a key solution that has been widely employed to prevent data re-identifcation and concerns us in the context of this work. Data modeling has also gained significant attention from the big data perspective. It is believed that the advancing distributed environments will provide users with several solutions for efficient spatio-temporal data management. GeoSpark will be utilized in the current work as it is a key solution that has been widely employed for spatial data. Specifically, it works on the top of Apache Spark, the main framework leveraged from the research community and organizations for big data transformation, processing and visualization. To this end, we focused on trajectory data representation so as to be applicable to the GeoSpark environment, and a GeoSpark-based approach is designed for the efficient management of real spatio-temporal data. Th next step is to gain deeper understanding of the data through the application of k nearest neighbor (k-NN) queries either using indexing methods or otherwise. The k-anonymity set computation, which is the main component for privacy preservation evaluation and the main issue of our previous works, is evaluated in the GeoSpark environment. More to the point, the focus here is on the time cost of k-anonymity set computation along with vulnerability measurement. The extracted results are presented into tables and figures for visual inspection.


Author(s):  
Wei Yan

In cloud computing environments parallel kNN queries for big data is an important issue. The k nearest neighbor queries (kNN queries), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operator widely adopted by many applications including knowledge discovery, data mining, and spatial databases. This chapter proposes a parallel method of kNN queries for big data using MapReduce programming model. Firstly, this chapter proposes an approximate algorithm that is based on mapping multi-dimensional data sets into two-dimensional data sets, and transforming kNN queries into a sequence of two-dimensional point searches. Then, in two-dimensional space this chapter proposes a partitioning method using Voronoi diagram, which incorporates the Voronoi diagram into R-tree. Furthermore, this chapter proposes an efficient algorithm for processing kNN queries based on R-tree using MapReduce programming model. Finally, this chapter presents the results of extensive experimental evaluations which indicate efficiency of the proposed approach.


Algorithms ◽  
2019 ◽  
Vol 12 (12) ◽  
pp. 266 ◽  
Author(s):  
Elias Dritsas ◽  
Andreas Kanavos ◽  
Maria Trigka ◽  
Spyros Sioutas ◽  
Athanasios Tsakalidis

The need to store massive volumes of spatio-temporal data has become a difficult task as GPS capabilities and wireless communication technologies have become prevalent to modern mobile devices. As a result, massive trajectory data are produced, incurring expensive costs for storage, transmission, as well as query processing. A number of algorithms for compressing trajectory data have been proposed in order to overcome these difficulties. These algorithms try to reduce the size of trajectory data, while preserving the quality of the information. In the context of this research work, we focus on both the privacy preservation and storage problem of spatio-temporal databases. To alleviate this issue, we propose an efficient framework for trajectories representation, entitled DUST (DUal-based Spatio-temporal Trajectory), by which a raw trajectory is split into a number of linear sub-trajectories which are subjected to dual transformation that formulates the representatives of each linear component of initial trajectory; thus, the compressed trajectory achieves compression ratio equal to M : 1 . To our knowledge, we are the first to study and address k-NN queries on nonlinear moving object trajectories that are represented in dual dimensional space. Additionally, the proposed approach is expected to reinforce the privacy protection of such data. Specifically, even in case that an intruder has access to the dual points of trajectory data and try to reproduce the native points that fit a specific component of the initial trajectory, the identity of the mobile object will remain secure with high probability. In this way, the privacy of the k-anonymity method is reinforced. Through experiments on real spatial datasets, we evaluate the robustness of the new approach and compare it with the one studied in our previous work.


Computers ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 1 ◽  
Author(s):  
Yeong-Cherng Hsu ◽  
Chih-Hsin Hsueh ◽  
Ja-Ling Wu

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources in cloud, data owners can also provide a platform for users to make query requests. However, due to the privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy preserving K-nearest neighbor (K-NN) search scheme over the encrypted outsourced cloud dataset is proposed. The problem is about letting the cloud server find K nearest points with respect to an encrypted query on the encrypted dataset, which was outsourced by data owners, and return the searched results to the querying user. Comparing with other existing methods, our approach leverages the resources of the cloud more by shifting most of the required computational loads, from data owners and query users, to the cloud server. In addition, there is no need for data owners to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves the data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme could accomplish query requests in sub-liner time, according to our performance analysis. Finally, experimental results demonstrate the practicability and the efficiency of our method.


Author(s):  
B. Gopinath ◽  
B. R. Gupta

This paper investigates an image classification method performing thyroid carcinoma classification in Fine Needle Aspiration Biopsy cytological images of thyroid nodules under noise conditions and varying staining conditions. The segmentation method combines the image processing techniques thresholding and mathematical morphology. Feature extraction and classification are carried out by discrete wavelet transform and Euclidean distance based on k-nearest neighbor classifier, respectively. The classification methodology is successfully tested for Papillary carcinoma and Medullary carcinoma cytological images of thyroid nodules, showing promising results, encouraging future research work. The maximum classification rate of 95.84% and minimum classification rate of 79.17% have been reported for various testing sets of FNAB cytological images of thyroid nodules.


2018 ◽  
Vol 7 (8) ◽  
pp. 308 ◽  
Author(s):  
Han Zheng ◽  
Zanyang Cui ◽  
Xingchen Zhang

Recognizing Modes of Driving Railway Trains (MDRT) can help to solve railway freight transportation problems in driver behavior research, auto-driving system design and capacity utilization optimization. Previous studies have focused on analyses and applications of MDRT, but there is currently no approach to automatically and effectively identify MDRT in the context of big data. In this study, we propose an integrated approach including data preprocessing, feature extraction, classifiers modeling, training and parameter tuning, and model evaluation to infer MDRT using GPS data. The highlights of this study are as follows: First, we propose methods for extracting Driving Segmented Standard Deviation Features (DSSDF) combined with classical features for the purpose of improving identification performances. Second, we find the most suitable classifier for identifying MDRT based on a comparison of performances of K-Nearest Neighbor, Support Vector Machines, AdaBoost, Random Forest, Gradient Boosting Decision Tree, and XGBoost. From the real-data experiment, we conclude that: (i) The ensemble classifier XGBoost produces the best performance with an accuracy of 92.70%; (ii) The group of DSSDF plays an important role in identifying MDRT with an accuracy improvement of 11.2% (using XGBoost). The proposed approach has been applied in capacity utilization optimization and new driver training for the Baoshen Railway.


Author(s):  
Layla Pournajaf ◽  
Farnaz Tahmasebian ◽  
Li Xiong ◽  
Vaidy Sunderam ◽  
Cyrus Shahabi

Sign in / Sign up

Export Citation Format

Share Document