Privacy Preserving Outlier Detection Using Locality Sensitive Hashing

Efficient and Privacy-Preserving Multi-User Outsourced K-Means Clustering

Computer and Information Science ◽

10.5539/cis.v14n2p26 ◽

2021 ◽

Vol 14 (2) ◽

pp. 26

Author(s):

Na Li ◽

Lianguan Huang ◽

Yanling Li ◽

Meng Sun

Keyword(s):

Data Mining ◽

Big Data ◽

Clustering Algorithm ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Sensitive Information ◽

The Public ◽

Big Data Mining ◽

Euclidean Distances ◽

Computational Resources

In recent years, with the development of the Internet, the volume of data on the network has grown explosively. Big data mining aims to obtain useful information through data processing such as clustering and classification. Clustering is an important branch of big data mining, and it is popular because of its simplicity. A new trend for clients who lack storage and computational resources is to outsource their data and the clustering task to public cloud platforms. However, because datasets used for clustering may contain sensitive information (e.g., identity information, health information), simply outsourcing them to cloud platforms cannot protect privacy, so clients tend to encrypt their databases before uploading them to the cloud for clustering. In this paper, we focus on privacy protection and efficiency with respect to k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm based on locality sensitive hashing (LSH). The algorithm encrypts databases with the Paillier cryptosystem and uses LSH to prune unnecessary computations during clustering; that is, we do not need to compute the Euclidean distance between every data record and every cluster center. Finally, theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering algorithms.
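The pruning idea in this abstract can be illustrated with a minimal plaintext sketch. It is not the authors' protocol: there is no Paillier encryption and no multi-user setting here, and the function names (`make_hash`, `assign_with_pruning`) and the choice of a Gaussian p-stable hash with bucket width `width` are assumptions made for illustration. A point is compared only with the centers that share its LSH bucket, falling back to all centers when the bucket holds none. The result is approximate, since a point's true nearest center can land in a different bucket.

```python
import math
import random


def make_hash(dim, width, rng):
    """Return one p-stable (Gaussian) LSH function h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda x: math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)


def assign_with_pruning(points, centers, hashes):
    """Assign each point to a center, skipping centers that fall in other buckets.

    Returns (labels, number_of_distance_computations).
    """
    def bucket_key(x):
        return tuple(h(x) for h in hashes)

    # Index the centers by their LSH bucket.
    buckets = {}
    for j, c in enumerate(centers):
        buckets.setdefault(bucket_key(c), []).append(j)

    labels, computed = [], 0
    for p in points:
        # Candidates: centers sharing the point's bucket; otherwise all centers.
        candidates = buckets.get(bucket_key(p)) or range(len(centers))
        labels.append(min(candidates, key=lambda j: math.dist(p, centers[j])))
        computed += len(candidates)
    return labels, computed


if __name__ == "__main__":
    rng = random.Random(0)
    centers = [(0.0, 0.0), (100.0, 100.0)]
    points = [(1.0, -1.0), (99.0, 101.0), (0.5, 0.5)]
    hashes = [make_hash(2, 50.0, rng) for _ in range(2)]
    print(assign_with_pruning(points, centers, hashes))
```

With an empty `hashes` list every point and center shares one bucket, so the function reduces to exact nearest-center assignment; adding hash functions trades accuracy for fewer distance computations.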

Privacy-Preserving Outlier Detection for Data Streams

Data and Applications Security and Privacy XXXI - Lecture Notes in Computer Science ◽

10.1007/978-3-319-61176-1_12 ◽

2017 ◽

pp. 225-238 ◽

Cited By ~ 1

Author(s):

Jonas Böhler ◽

Daniel Bernau ◽

Florian Kerschbaum

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Privacy Preserving

Privacy Preserving Density-Based Outlier Detection

2010 International Conference on Communications and Mobile Computing ◽

10.1109/cmc.2010.274 ◽

2010 ◽

Cited By ~ 3

Author(s):

Zaisheng Dai ◽

Liusheng Huang ◽

Youwen Zhu ◽

Wei Yang

Keyword(s):

Outlier Detection ◽

Privacy Preserving

Privacy-Preserving Outlier Detection

Fourth IEEE International Conference on Data Mining (ICDM'04) ◽

10.1109/icdm.2004.10081 ◽

2005 ◽

Cited By ~ 35

Author(s):

J. Vaidya ◽

C. Clifton

Keyword(s):

Outlier Detection ◽

Privacy Preserving

Privacy Preserving Spatial Outlier Detection

10.1109/icycs.2008.345 ◽

2008 ◽

Cited By ~ 3

Author(s):

Anrong Xue ◽

Xiqiang Duan ◽

Handa Ma ◽

Weihe Chen ◽

Shiguang Ju

Keyword(s):

Outlier Detection ◽

Privacy Preserving ◽

Spatial Outlier

Privacy Preserving Outlier Detection over Vertically Partitioned Data

2009 International Conference on E-Business and Information System Security ◽

10.1109/ebiss.2009.5138025 ◽

2009 ◽

Cited By ~ 2

Author(s):

Zhengyou Zhou ◽

Liusheng Huang ◽

Yang Wei ◽

Ye Yun

Keyword(s):

Outlier Detection ◽

Privacy Preserving ◽

Partitioned Data ◽

Vertically Partitioned Data

Splitting chemical structure data sets for federated privacy-preserving machine learning

Journal of Cheminformatics ◽

10.1186/s13321-021-00576-2 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Jaak Simm ◽

Lina Humbeck ◽

Adam Zalewski ◽

Noe Sturm ◽

Wouter Heyndrickx ◽

...

Keyword(s):

Machine Learning ◽

Quality Criteria ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Data Sets ◽

Data Set ◽

Test Set ◽

Chemical Structures ◽

Multiple Partners ◽

Applications Of Machine Learning

Abstract: With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal is a realistic split of chemical structures (compounds) between training, validation and test sets, such that performance on the test set is a meaningful indicator of performance in a prospective application. This challenge is interesting and relevant on its own, but it becomes even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions and chemical structures must not be shared between the participating parties. In this work we discuss three methods that split a data set and are applicable in a federated privacy-preserving setting: (a) locality-sensitive hashing (LSH), (b) sphere exclusion clustering, and (c) scaffold-based binning (scaffold network). To evaluate these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, and similarity distance between the test and training set compounds. The main findings of the paper are that (a) both sphere exclusion clustering and scaffold-based binning result in high-quality splitting of the data sets, and (b) in terms of compute costs, sphere exclusion clustering is very expensive in the federated privacy-preserving setting.
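One way an LSH-based split can work without sharing structures is sketched below. The abstract does not give the paper's exact procedure, so everything here is an assumption for illustration: compounds are represented as binary fingerprints, all partners agree on a shared seed, and the helper names `pick_bits` and `lsh_fold` are hypothetical. Each partner samples the same bit positions locally (bit-sampling LSH) and hashes the resulting signature to a fold, so fingerprints that agree on the sampled bits land in the same fold at every partner.

```python
import hashlib
import random


def pick_bits(n_bits, n_sampled, seed):
    """Choose the bit positions every partner samples (derived from a shared seed)."""
    return sorted(random.Random(seed).sample(range(n_bits), n_sampled))


def lsh_fold(fingerprint, bit_positions, n_folds):
    """Map a binary fingerprint (sequence of 0/1) to a fold via bit-sampling LSH."""
    signature = "".join(str(fingerprint[i]) for i in bit_positions)
    digest = hashlib.sha256(signature.encode("ascii")).digest()
    # Use the first 8 bytes of the digest as a stable integer, then reduce to a fold.
    return int.from_bytes(digest[:8], "big") % n_folds


if __name__ == "__main__":
    bits = pick_bits(n_bits=16, n_sampled=4, seed=42)
    compound = [1, 0] * 8
    print(bits, lsh_fold(compound, bits, n_folds=5))
```

Because the fold depends only on the fingerprint and the shared seed, no partner needs to reveal a structure to obtain a consistent assignment.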

Amplified Locality-Sensitive Hashing for Privacy-Preserving Distributed Service Recommendation

Security, Privacy, and Anonymity in Computation, Communication, and Storage - Lecture Notes in Computer Science ◽

10.1007/978-3-319-72389-1_23 ◽

2017 ◽

pp. 280-297 ◽

Cited By ~ 1

Author(s):

Lianyong Qi ◽

Wanchun Dou ◽

Xuyun Zhang ◽

Shui Yu

Keyword(s):

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Service Recommendation ◽

Distributed Service

Privacy-Preserving Approximate k-Nearest-Neighbors Search that Hides Access, Query and Volume Patterns

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0084 ◽

2021 ◽

Vol 2021 (4) ◽

pp. 549-574

Author(s):

Alexandra Boldyreva ◽

Tianxin Tang

Keyword(s):

Nearest Neighbors ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Security Model ◽

Symmetric Encryption ◽

K Nearest Neighbors ◽

Encrypted Data ◽

Generic Construction ◽

Knn Search ◽

Security Guarantees

Abstract: We study the problem of privacy-preserving approximate kNN search in an outsourced environment: the client sends encrypted data to an untrusted server and can later perform secure approximate kNN searches and updates. We design a security model and propose a generic construction based on locality-sensitive hashing, symmetric encryption, and an oblivious map. The construction provides very strong security guarantees, hiding not only information about the data but also the access, query, and volume patterns. We implement two concrete schemes, based on an oblivious AVL tree and an oblivious BSkiplist, evaluate their efficiency, and compare their performance.
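The LSH layer of such a construction can be pictured with a plaintext index. This sketch covers only the approximate-kNN part: it has no encryption and no oblivious map, so it hides nothing, and the class name `LshIndex`, its parameters, and the use of random-hyperplane signatures are assumptions rather than the authors' design. Each table maps a signature to the ids of the points stored under it; a query collects the ids from its own bucket in every table and ranks them by exact distance.

```python
import math
import random


class LshIndex:
    """Plaintext multi-table LSH index using random-hyperplane signatures."""

    def __init__(self, dim, n_tables=4, n_planes=6, seed=0):
        rng = random.Random(seed)
        # One independent set of hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
            for _ in range(n_tables)
        ]
        self.tables = [{} for _ in range(n_tables)]
        self.points = []

    def _signature(self, table, x):
        # One boolean per hyperplane: which side of the plane x lies on.
        return tuple(
            sum(w * xi for w, xi in zip(plane, x)) >= 0.0
            for plane in self.planes[table]
        )

    def add(self, x):
        """Store a point and return its id."""
        idx = len(self.points)
        self.points.append(tuple(x))
        for t, table in enumerate(self.tables):
            table.setdefault(self._signature(t, x), []).append(idx)
        return idx

    def query(self, q, k):
        """Return up to k ids of candidate points, nearest first."""
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._signature(t, q), []))
        ranked = sorted(candidates, key=lambda i: math.dist(q, self.points[i]))
        return ranked[:k]


if __name__ == "__main__":
    index = LshIndex(dim=2)
    for p in [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]:
        index.add(p)
    print(index.query((0.0, 1.0), k=1))
```

A query may return fewer than k ids when its buckets are sparse; using more tables raises recall at the cost of more candidates to rank.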

Outlier Detection by Privacy-Preserving Ensemble Decision Tree Using Homomorphic Encryption

10.1109/ijcnn52387.2021.9534464 ◽

2021 ◽

Author(s):

Kengo Itokazu ◽

Lihua Wang ◽

Seiichi Ozawa

Keyword(s):

Decision Tree ◽

Outlier Detection ◽

Homomorphic Encryption ◽

Privacy Preserving
