Privacy Preserving Outlier Detection Using Locality Sensitive Hashing

Author(s):  
Nisarg Raval ◽  
Madhuchand Rushi Pillutla ◽  
Piyush Bansal ◽  
Kannan Srinathan ◽  
C.V. Jawahar
2021 ◽  
Vol 14 (2) ◽  
pp. 26
Author(s):  
Na Li ◽  
Lianguan Huang ◽  
Yanling Li ◽  
Meng Sun

In recent years, with the development of the Internet, the data on the network presents an outbreak trend. Big data mining aims at obtaining useful information through data processing, such as clustering, clarifying and so on. Clustering is an important branch of big data mining and it is popular because of its simplicity. A new trend for clients who lack of storage and computational resources is to outsource the data and clustering task to the public cloud platforms. However, as datasets used for clustering may contain some sensitive information (e.g., identity information, health information), simply outsourcing them to the cloud platforms can't protect the privacy. So clients tend to encrypt their databases before uploading to the cloud for clustering. In this paper, we focus on privacy protection and efficiency promotion with respect to k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm which is based on locality sensitive hashing (LSH). In this algorithm, we use a Paillier cryptosystem encrypting databases, and combine LSH to prune off some unnecessary computations during the clustering. That is, we don't need to compute the Euclidean distances between each data record and each clustering center. Finally, the theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering.


Author(s):  
Anrong Xue ◽  
Xiqiang Duan ◽  
Handa Ma ◽  
Weihe Chen ◽  
Shiguang Ju

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jaak Simm ◽  
Lina Humbeck ◽  
Adam Zalewski ◽  
Noe Sturm ◽  
Wouter Heyndrickx ◽  
...  

AbstractWith the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is by its own very interesting and relevant, but is even more complex in a federated machine learning approach where multiple partners jointly train a model under privacy-preserving conditions where chemical structures must not be shared between the different participating parties. In this work we discuss three methods which provide a splitting of a data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, similarity distance between the test and training set compounds. The main findings of the paper are a. both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, b. in terms of compute costs sphere exclusion clustering is very expensive in the case of federated privacy-preserving setting.


2021 ◽  
Vol 2021 (4) ◽  
pp. 549-574
Author(s):  
Alexandra Boldyreva ◽  
Tianxin Tang

Abstract We study the problem of privacy-preserving approximate kNN search in an outsourced environment — the client sends the encrypted data to an untrusted server and later can perform secure approximate kNN search and updates. We design a security model and propose a generic construction based on locality-sensitive hashing, symmetric encryption, and an oblivious map. The construction provides very strong security guarantees, not only hiding the information about the data, but also the access, query, and volume patterns. We implement, evaluate efficiency, and compare the performance of two concrete schemes based on an oblivious AVL tree and an oblivious BSkiplist.


Sign in / Sign up

Export Citation Format

Share Document