BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model

Brendan Avent; Aleksandra Korolova; David Zeber; Torgeir Hovden; Benjamin Livshits

doi:10.29012/jpc.680

Research on improved privacy publishing algorithm based on set cover

Computer Science and Information Systems ◽

10.2298/csis180915023l ◽

2019 ◽

Vol 16 (3) ◽

pp. 705-731

Author(s):

Haoze Lv ◽

Zhaobin Liu ◽

Zhonglian Hu ◽

Lihai Nie ◽

Weijiang Liu ◽

...

Keyword(s):

Privacy Protection ◽

Data Privacy ◽

Differential Privacy ◽

Main Idea ◽

Data Availability ◽

Set Cover ◽

Data Sets ◽

Data Set ◽

Query Cover ◽

Privacy Model

With the advent of the big data era, data release has become a hot topic in the database community. Meanwhile, data privacy has also drawn users' attention. Among the privacy protection models proposed so far, the differential privacy model is widely used because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. To address this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets that protect privacy while improving availability. The main idea is to reduce the dimensionality of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that accounts for the effectiveness of query cover combinations, and obtain a regular marginal table cover set that is smaller yet offers higher data availability. Then, a differential privacy model with irregular marginal tables is proposed for application scenarios with low data availability and a high cover rate. Next, through analysis, we derive an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint. This achieves a balance between privacy protection and data availability. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
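
A minimal Python sketch of the release step the abstract describes, publishing a chosen set of low-dimensional marginal tables with Laplace noise instead of the full high-dimensional table, is given below. It does not implement the authors' frequent-item cover selection: the cover is simply passed in, and the function names, toy attributes, and the even split of ε across marginals are illustrative assumptions. The noise scale follows from the fact that each record falls into exactly one cell of a marginal, so adding or removing one record changes that marginal's counts by 1 in L1 norm.

```python
import random
from collections import Counter
from itertools import product


def laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two Exp(1) samples."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def marginal(records, attrs, domains):
    """Exact counts for every cell of the marginal table on `attrs` (empty cells included)."""
    counts = Counter(tuple(r[a] for a in attrs) for r in records)
    return {cell: counts.get(cell, 0) for cell in product(*(domains[a] for a in attrs))}


def noisy_marginals(records, cover, domains, epsilon, seed=0):
    """Release every marginal in the (hypothetical) cover set under epsilon-DP.

    Each marginal has L1 sensitivity 1; with k marginals released together,
    sequential composition gives each one epsilon / k, i.e. Laplace scale k / epsilon.
    """
    rng = random.Random(seed)
    scale = len(cover) / epsilon
    return {
        attrs: {cell: c + laplace(scale, rng)
                for cell, c in marginal(records, attrs, domains).items()}
        for attrs in cover
    }


if __name__ == "__main__":
    records = [
        {"age": "young", "smoker": "yes", "region": "N"},
        {"age": "old", "smoker": "no", "region": "S"},
        {"age": "old", "smoker": "yes", "region": "N"},
    ]
    domains = {"age": ["young", "old"], "smoker": ["yes", "no"], "region": ["N", "S"]}
    cover = [("age", "smoker"), ("smoker", "region")]  # hypothetical cover set
    for attrs, table in noisy_marginals(records, cover, domains, epsilon=1.0).items():
        print(attrs, {cell: round(v, 2) for cell, v in table.items()})
```

Releasing two 2-way marginals here costs far less noise per cell than perturbing the full 3-way table would once the number of attributes grows, which is the dimensionality argument the abstract makes.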

Personalized Differentially Private Location Collection Method with Adaptive GPS Discretization

Communications in Computer and Information Science - Cyber Security ◽

10.1007/978-981-33-4922-3_13 ◽

2020 ◽

pp. 175-190

Author(s):

Huichuan Liu ◽

Yong Zeng ◽

Jiale Liu ◽

Zhihong Liu ◽

Jianfeng Ma ◽

...

Keyword(s):

Differential Privacy ◽

Geographic Location ◽

Real Data ◽

User Profile ◽

Data Sets ◽

User Privacy ◽

Mobile Terminals ◽

Data Collection Process ◽

Private Location ◽

Privacy Budget

In recent years, with the development of mobile terminals, geographic location has attracted the attention of many researchers because it is convenient to collect and reflects user profiles. To protect user privacy, researchers have adopted local differential privacy in the data collection process. However, most existing methods assume that locations have already been discretized, which, we found, can introduce large noise if not done carefully, lowering the utility of the collected results. In this paper, we therefore design a differentially private location division module that automatically discretizes locations according to the access density of each region. However, because the number of discretized regions may be large, directly applying existing attribute-collection methods based on local differential privacy may completely destroy the overall utility of the collected results. We therefore further improve the optimized binary local hash method, based on personalized differential privacy, to collect the visit frequency of each discretized region. This solution improves the accuracy of the collected results while protecting the privacy of users' geographic locations. Experiments on synthetic and real data sets show that the proposed method achieves higher accuracy than the best known method under the same privacy budget.
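
The collection step can be illustrated with standard optimized local hashing (OLH), which we assume is the family of methods the abstract means by the optimized binary local hash. The Python sketch below is not the paper's personalized variant: it uses one global ε for all users, assumes the regions have already been discretized and numbered 0 to d-1, and uses a SHA-256-based hash family as a hypothetical stand-in for whatever hash functions a real deployment would choose. Each user reports a random hash seed together with a randomized bucket; the server counts, for every region, how many reports that region hashes to, and then debiases the count.

```python
import hashlib
import math
import random


def _hash(seed: int, value: int, g: int) -> int:
    """Hypothetical hash family: map `value` into {0, ..., g-1}, indexed by `seed`."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % g


def olh_params(epsilon: float):
    """OLH picks g close to e^epsilon + 1 and keeps the true bucket with probability p."""
    g = max(2, round(math.exp(epsilon) + 1))
    p = math.exp(epsilon) / (math.exp(epsilon) + g - 1)
    return g, p


def olh_report(value: int, epsilon: float, rng: random.Random):
    """Client side: hash the region id with a fresh random seed, then randomize the bucket."""
    g, p = olh_params(epsilon)
    seed = rng.getrandbits(32)
    x = _hash(seed, value, g)
    if rng.random() >= p:
        # report one of the other g - 1 buckets uniformly at random
        other = rng.randrange(g - 1)
        x = other if other < x else other + 1
    return seed, x


def olh_estimate(reports, domain_size: int, epsilon: float):
    """Server side: unbiased visit-count estimate for every region id."""
    g, p = olh_params(epsilon)
    n = len(reports)
    support = [0] * domain_size
    for seed, x in reports:
        for v in range(domain_size):
            if _hash(seed, v, g) == x:
                support[v] += 1
    # E[support[v]] = f_v * p + (n - f_v) / g, so solve for f_v
    return [(c - n / g) / (p - 1 / g) for c in support]
```

Choosing g close to e^ε + 1 keeps OLH's estimation variance roughly independent of the number of regions, which is why local-hashing methods stay usable when the discretization produces many regions.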

Privacy-Preserving Hybrid K-Means

Censorship, Surveillance, and Privacy ◽

10.4018/978-1-5225-7113-1.ch049 ◽

2019 ◽

pp. 1009-1026

Author(s):

Zhiqiang Gao ◽

Yixiao Sun ◽

Xiaolong Cui ◽

Yutao Wang ◽

Yanyu Duan ◽

...

Keyword(s):

Data Mining ◽

Differential Privacy ◽

Privacy Preserving ◽

Local Optimum ◽

Data Sets ◽

Swarm Optimization ◽

Second Stage ◽

Private Data ◽

Privacy Budget ◽

Selection Of

This article describes how k-means, the most widely used clustering algorithm, is prone to falling into local optima. Notably, traditional clustering approaches operate directly on private data and cannot withstand malicious attacks by adversaries with arbitrary background knowledge in massive data mining tasks. This can violate individuals' privacy and leak information through system resources and clustering outputs. To address these issues, the authors propose an efficient privacy-preserving hybrid k-means on Spark. In the first stage, particle swarm optimization is executed on resilient distributed datasets to select the initial clustering centroids for k-means on Spark. In the second stage, k-means is executed with a privacy budget of ε/2t and Laplace noise added in each iteration. Extensive experiments on public UCI data sets show that, while guaranteeing the utility of private data and scalability, the approach outperforms state-of-the-art variants of k-means by exploiting swarm intelligence and the rigorous paradigm of differential privacy.
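
The second stage can be sketched in Python as a differentially private Lloyd iteration. The abstract only states that the budget is ε/2t with Laplace noise in each iteration; the sketch assumes, as one plausible reading, that t is the number of iterations and that ε/2t is spent once on the per-cluster counts and once on the per-cluster coordinate sums, with all points normalized to [0, 1]^d. The particle swarm initialization and the Spark/RDD execution are omitted, so initial centroids are passed in directly; all names are hypothetical.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two Exp(1) samples."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_kmeans(points, centroids, epsilon, t, seed=0):
    """Run t noisy Lloyd iterations on points assumed to lie in [0, 1]^d.

    Each iteration spends epsilon / (2t) on the cluster counts (L1 sensitivity 1)
    and epsilon / (2t) on the cluster coordinate sums (L1 sensitivity d), so the
    whole run is epsilon-DP by sequential composition.
    """
    rng = random.Random(seed)
    d = len(points[0])
    eps_q = epsilon / (2 * t)
    centroids = [list(c) for c in centroids]
    for _ in range(t):
        sums = [[0.0] * d for _ in centroids]
        counts = [0] * len(centroids)
        for p in points:
            # assign p to its nearest current (already noisy) centroid
            j = min(range(len(centroids)),
                    key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            counts[j] += 1
            for i in range(d):
                sums[j][i] += p[i]
        for j in range(len(centroids)):
            noisy_count = max(1.0, counts[j] + laplace(1.0 / eps_q, rng))
            # clamping back into [0, 1] is post-processing and costs no budget
            centroids[j] = [
                min(1.0, max(0.0, (sums[j][i] + laplace(d / eps_q, rng)) / noisy_count))
                for i in range(d)
            ]
    return centroids
```

Dividing the budget by t is what the per-iteration ε/2t expresses: more iterations mean more noise per iteration, which is one reason a good initialization (the paper's PSO stage) matters for a private k-means.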

Privacy-Preserving Hybrid K-Means

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2018040101 ◽

2018 ◽

Vol 14 (2) ◽

pp. 1-17 ◽

Cited By ~ 4

Author(s):

Zhiqiang Gao ◽

Yixiao Sun ◽

Xiaolong Cui ◽

Yutao Wang ◽

Yanyu Duan ◽

...

Keyword(s):

Differential Privacy ◽

State Of The Art ◽

Privacy Preserving ◽

Local Optimum ◽

Massive Data ◽

Data Sets ◽

Second Stage ◽

Private Data ◽

Privacy Budget ◽

Selection Of

On Sparse Linear Regression in the Local Differential Privacy Model

IEEE Transactions on Information Theory ◽

10.1109/tit.2020.3040406 ◽

2020 ◽

pp. 1-1

Author(s):

Di Wang ◽

Jinhui Xu

Keyword(s):

Linear Regression ◽

Differential Privacy ◽

Privacy Model

Hybrid Neural Models For Rice Yields Times Forecasting

Jurnal Teknologi ◽

10.11113/jt.v52.128 ◽

2012 ◽

Author(s):

Ruhaidah Samsudin ◽

Puteh Saad ◽

Ani Shabri

Keyword(s):

Time Series ◽

Hybrid Model ◽

Moving Average ◽

Time Series Prediction ◽

Arima Model ◽

Data Sets ◽

Ann Model ◽

Yield Data ◽

Rice Yields ◽

Artificial Neural Network Ann

In this paper, time series prediction is treated as a missing-value problem. A model for determining the missing time series value is presented. A hybrid model integrating the autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models is developed to solve this problem. The developed model attempts to combine the linear characteristics of ARIMA with the nonlinear patterns captured by the ANN. In this study, rice yield time series from the Muda Irrigation area, Malaysia, from 1995 to 2003 are modeled. Experimental results on the rice yield data sets indicate that the hybrid model improves forecasting performance over either model used separately. Key words: ARIMA; Box and Jenkins; neural networks; rice yields; hybrid ANN model
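
One common way to combine the two components is an additive decomposition in which ARIMA models the linear part of the series and an ANN models what ARIMA leaves behind. The abstract does not give its exact formulation, so the equations below are an assumed, generic version of that scheme rather than the authors' model.

```latex
\begin{aligned}
y_t       &= L_t + N_t,                               \\
e_t       &= y_t - \hat{L}_t,                         \\
\hat{N}_t &= f\left(e_{t-1}, e_{t-2}, \dots, e_{t-n}\right), \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t.
\end{aligned}
```

Here \hat{L}_t is the ARIMA forecast of the linear component, e_t is the ARIMA residual, f is the neural network trained on the previous n residuals, and \hat{y}_t is the hybrid forecast. Because f only sees the residuals, any nonlinear structure ARIMA misses can still be recovered, which is the intuition behind the hybrid outperforming either model alone.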

Optimal Distribution of Privacy Budget in Differential Privacy

Lecture Notes in Computer Science - Risks and Security of Internet and Systems ◽

10.1007/978-3-030-12143-3_18 ◽

2019 ◽

pp. 222-236

Author(s):

Anis Bkakria ◽

Aimilia Tasidou ◽

Nora Cuppens-Boulahia ◽

Frédéric Cuppens ◽

Fatma Bouattour ◽

...

Keyword(s):

Differential Privacy ◽

Optimal Distribution ◽

Privacy Budget

Privacy-Preserving Gradient Boosting Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5422 ◽

2020 ◽

Vol 34 (01) ◽

pp. 784-791 ◽

Cited By ~ 1

Author(s):

Qinbin Li ◽

Zhaomin Wu ◽

Zeyi Wen ◽

Bingsheng He

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Training Data ◽

Gradient Boosting ◽

Training Algorithm ◽

Model Accuracy ◽

Machine Learning Model ◽

Improve Model ◽

Privacy Budget ◽

Privacy Level

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to achieve a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the properties of gradients and the contribution of each tree in GBDTs, we propose adaptively controlling the gradients of the training data in each iteration and clipping leaf nodes in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework that allocates the privacy budget between trees to further reduce the accuracy loss. Our experiments show that our approach achieves much better model accuracy than other baselines.
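
The link between gradient clipping and sensitivity can be made concrete for a single leaf. The Python sketch below assumes square loss, where every hessian equals 1, so a leaf's value is -Σg/(n + λ). With every gradient clipped to [-g_max, g_max] and λ > 0, a short calculation shows that adding or removing one instance changes that value by at most g_max/(1 + λ), so Laplace noise of scale (g_max/(1 + λ))/ε_leaf makes the leaf release ε_leaf-differentially private. The names g_max, lam and epsilon_leaf are hypothetical, and the sketch does not reproduce the paper's adaptive gradient control, its leaf clipping, or its allocation of budget between trees.

```python
import random


def laplace(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two Exp(1) samples."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def clip(g: float, g_max: float) -> float:
    """Clip one gradient into [-g_max, g_max]."""
    return max(-g_max, min(g_max, g))


def leaf_value(gradients, lam: float) -> float:
    """Square-loss leaf value -sum(g) / (n + lam); every hessian is 1 (lam > 0 assumed)."""
    return -sum(gradients) / (len(gradients) + lam)


def noisy_leaf_value(gradients, g_max, lam, epsilon_leaf, rng):
    """Clip the gradients, compute the leaf value, and add Laplace noise.

    Under the square-loss assumption the leaf value's sensitivity is
    g_max / (1 + lam), so the noise scale is that bound divided by epsilon_leaf.
    """
    clipped = [clip(g, g_max) for g in gradients]
    sensitivity = g_max / (1 + lam)
    return leaf_value(clipped, lam) + laplace(sensitivity / epsilon_leaf, rng)
```

A tighter bound on this sensitivity translates directly into less noise per leaf for the same ε_leaf, which is the accuracy lever the abstract describes.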

A Hybrid Model of Cross-Domain Authentication for Password Synchronization

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.474-476.729 ◽

2011 ◽

Vol 474-476 ◽

pp. 729-734

Author(s):

Qiu Yu Zhang ◽

Zhi Peng Cai ◽

Zhan Ting Yuan ◽

Feng Man Miao

Keyword(s):

Distributed Computing ◽

Theoretical Analysis ◽

Hybrid Model ◽

Transport Protocols ◽

Key Technology ◽

Cross Domain ◽

New Type ◽

Hybrid Cross

Cross-domain authentication is a key technology in distributed computing, but it is not perfect. In this paper, a new hybrid cross-domain authentication model is proposed to address its shortcomings in security, scalability and password synchronization. The model combines the advantages of Kerberos and SAML in the cross-domain authentication process and adopts mixed password transport protocols to achieve password synchronization. Theoretical analysis shows that the model enhances the security and scalability of cross-domain authentication, and that achieving password synchronization also improves its efficiency.

Analysis of Different Evolutionary Techniques on Fuzzy Rule Base Generation

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.8286 ◽

2019 ◽

Vol 16 (9) ◽

pp. 4008-4014

Author(s):

Savita Wadhawan ◽

Gautam Kumar ◽

Vivek Bhatnagar

Keyword(s):

Local Search ◽

Fuzzy Rule ◽

Numerical Data ◽

Memetic Algorithms ◽

Population Based ◽

Rule Base ◽

Data Sets ◽

Battery Charger ◽

Data Set ◽

Key Issues

This paper analyzes different population-based algorithms for generating rule bases from numerical data sets, since fuzzy rule base generation is one of the key issues in fuzzy modeling. The algorithms are applied to a rapid Ni–Cd battery charger data set. We compare the efficiency of the different algorithms and conclude that SCA algorithms with local search are markedly more efficient than SCA algorithms alone. We also found that the efficiency of SCA with local search is comparable to that of memetic algorithms.
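
Assuming SCA refers to the sine cosine algorithm (the abstract does not expand the acronym), the comparison between SCA alone and SCA with local search can be sketched in Python as follows. The objective is a toy sphere function standing in for the rule-base fitness on the battery charger data, which is not reproduced here, and the local search is a simple Gaussian hill climber applied to the best solution after each iteration; both are illustrative assumptions rather than the paper's setup.

```python
import math
import random


def sphere(x):
    """Toy objective (minimum 0 at the origin) standing in for rule-base fitness."""
    return sum(v * v for v in x)


def sca(f, dim, bounds, agents=30, iters=200, a=2.0, local_search=False, seed=0):
    """Sine cosine algorithm, optionally followed by hill climbing around the best point."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(agents)]
    best = min(pop, key=f)[:]
    best_f = f(best)
    for t in range(iters):
        r1 = a - t * a / iters  # shrinks linearly: exploration -> exploitation
        for x in pop:
            for i in range(dim):
                r2 = rng.uniform(0, 2 * math.pi)
                r3 = rng.uniform(0, 2)
                step = abs(r3 * best[i] - x[i])
                if rng.random() < 0.5:
                    x[i] += r1 * math.sin(r2) * step
                else:
                    x[i] += r1 * math.cos(r2) * step
                x[i] = min(hi, max(lo, x[i]))
            fx = f(x)
            if fx < best_f:
                best, best_f = x[:], fx
        if local_search:
            # hypothetical local search: Gaussian perturbations, accept only improvements
            for _ in range(10):
                cand = [min(hi, max(lo, v + rng.gauss(0, 0.1 * r1 + 1e-3))) for v in best]
                fc = f(cand)
                if fc < best_f:
                    best, best_f = cand, fc
    return best, best_f
```

Pairing a population-based search with a local refinement step in this way is the same idea memetic algorithms are built on, which is consistent with the paper finding the two comparable.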
