Algorithms for synthetic data release under differential privacy

Abstract: A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
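
The two building blocks named in the abstract, a random orthonormal projection followed by a Gaussian generative model, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the QR-based construction of the projection, and the toy data are all assumptions made for illustration, and the noise step that would provide differential privacy is deliberately omitted, so the output below is NOT private.

```python
import numpy as np


def ron_projection(d, p, rng):
    """Draw a d x p matrix with orthonormal columns (hypothetical helper).

    Assumption: orthonormalise a Gaussian matrix with a reduced QR decomposition.
    """
    gaussian = rng.standard_normal((d, p))
    q, _ = np.linalg.qr(gaussian)  # reduced mode: q has shape (d, p)
    return q


def gaussian_synthesize(x, p, n_samples, rng):
    """Project centred data to p dimensions, fit a Gaussian, and sample from it."""
    w = ron_projection(x.shape[1], p, rng)
    z = (x - x.mean(axis=0)) @ w           # projected, mean-centred data
    cov = z.T @ z / z.shape[0]             # empirical covariance in p dimensions
    # A private variant would perturb `cov` here; that calibration is
    # not reproduced in this sketch.
    samples = rng.multivariate_normal(np.zeros(p), cov, size=n_samples)
    return samples, w


rng = np.random.default_rng(0)
x = rng.standard_normal((500, 20))         # toy stand-in for a real dataset
synthetic, w = gaussian_synthesize(x, p=4, n_samples=100, rng=rng)
```

Here `synthetic` holds 100 four-dimensional samples, and `w.T @ w` is numerically the identity, which is the orthonormality property the projection relies on.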

Spatial Statistic Data Release Based on Differential Privacy

KSII Transactions on Internet and Information Systems ◽

10.3837/tiis.2019.10.023 ◽

2019 ◽

Vol 13 (10) ◽

Keyword(s):

Differential Privacy ◽

Spatial Statistic ◽

Data Release

Hierarchical Bayesian model to infer PL(Z) relations using Gaia parallaxes

Astronomy and Astrophysics ◽

10.1051/0004-6361/201832945 ◽

2019 ◽

Vol 623 ◽

pp. A156 ◽

Cited By ~ 3

Author(s):

H. E. Delgado ◽

L. M. Sarro ◽

G. Clementini ◽

T. Muraveva ◽

A. Garofalo

Keyword(s):

Bayesian Model ◽

Probability Distributions ◽

Synthetic Data ◽

Full Description ◽

Hierarchical Bayesian ◽

Hierarchical Bayesian Model ◽

RR Lyrae Stars ◽

RR Lyrae ◽

Data Release ◽

Lyrae Stars

In a recent study we analysed period–luminosity–metallicity (PLZ) relations for RR Lyrae stars using the Gaia Data Release 2 (DR2) parallaxes. It built on a previous work that was based on the first Gaia Data Release (DR1), and also included period–luminosity (PL) relations for Cepheids and RR Lyrae stars. The method used to infer the relations from Gaia DR2 data and one of the methods used for Gaia DR1 data was based on a Bayesian model, the full description of which was deferred to a subsequent publication. This paper presents the Bayesian method for the inference of the parameters of PL(Z) relations used in those studies, the main feature of which is to manage the uncertainties on observables in a rigorous and well-founded way. The method encodes the probability relationships between the variables of the problem in a hierarchical Bayesian model and infers the posterior probability distributions of the PL(Z) relationship coefficients using Markov chain Monte Carlo simulation techniques. We evaluate the method with several semi-synthetic data sets and apply it to a sample of 200 fundamental and first-overtone RR Lyrae stars for which Gaia DR1 parallaxes and literature Ks-band mean magnitudes are available. We define and test several hyperprior probabilities to verify their adequacy and check the sensitivity of the solution with respect to the prior choice. The main conclusion of this work, based on the test with semi-synthetic Gaia DR1 parallaxes, is the absolute necessity of incorporating the existing correlations between the period, metallicity, and parallax measurements in the form of model priors in order to avoid systematically biased results, especially in the case of non-negligible uncertainties in the parallaxes. The relation coefficients obtained here have been superseded by those presented in our recent paper that incorporates the findings of this work and the more recent Gaia DR2 measurements.

An Enhanced Algorithm for Dynamic Data Release Based on Differential Privacy

Procedia Computer Science ◽

10.1016/j.procs.2020.06.050 ◽

2020 ◽

Vol 174 ◽

pp. 15-21

Author(s):

H.Y. Kang ◽

Y.L. Ma ◽

X.M. Si

Keyword(s):

Differential Privacy ◽

Dynamic Data ◽

Data Release

Secure and efficient outsourcing differential privacy data release scheme in Cyber–physical system

Future Generation Computer Systems ◽

10.1016/j.future.2018.03.034 ◽

2020 ◽

Vol 108 ◽

pp. 1314-1323 ◽

Cited By ~ 6

Author(s):

Heng Ye ◽

Jiqiang Liu ◽

Wei Wang ◽

Ping Li ◽

Tong Li ◽

...

Keyword(s):

Physical System ◽

Differential Privacy ◽

Cyber Physical System ◽

Data Release

Optimizing fitness-for-use of differentially private linear queries

Proceedings of the VLDB Endowment ◽

10.14778/3467861.3467864 ◽

2021 ◽

Vol 14 (10) ◽

pp. 1730-1742

Author(s):

Yingtai Xiao ◽

Zeyu Ding ◽

Yuxin Wang ◽

Danfeng Zhang ◽

Daniel Kifer

Keyword(s):

Gaussian Noise ◽

Differential Privacy ◽

Covariance Structure ◽

Fine Grained ◽

Private Data ◽

Data Release ◽

The Matrix ◽

The Cost ◽

Matrix Mechanism ◽

Accuracy Constraints

In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets target accuracy requirements for each application. In this paper, we consider the problem of answering linear queries under differential privacy subject to per-query accuracy constraints. Existing practical frameworks like the matrix mechanism do not provide such fine-grained control (they optimize total error, which allows some query answers to be more accurate than necessary, at the expense of other queries that become no longer useful). Thus, we design a fitness-for-use strategy that adds privacy-preserving Gaussian noise to query answers. The covariance structure of the noise is optimized to meet the fine-grained accuracy requirements while minimizing the cost to privacy.
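
The idea of answering linear queries with correlated Gaussian noise and checking per-query accuracy targets can be sketched as follows. This is an illustration only, not the paper's optimizer: the workload matrix, the hand-picked covariance, the variance targets, and the function names are hypothetical, and the covariance is not calibrated to any privacy budget.

```python
import numpy as np


def noisy_answers(workload, data, cov, rng):
    """Answer linear queries `workload @ data` with zero-mean Gaussian noise of covariance `cov`."""
    true_answers = workload @ data
    noise = rng.multivariate_normal(np.zeros(len(true_answers)), cov)
    return true_answers + noise


def meets_targets(cov, max_variance):
    """Check the per-query constraint: each query's noise variance is within its target."""
    return bool(np.all(np.diag(cov) <= max_variance))


rng = np.random.default_rng(0)
data = np.array([10.0, 20.0, 30.0, 40.0])           # toy histogram of counts
workload = np.array([[1.0, 1.0, 1.0, 1.0],          # total count
                     [1.0, 1.0, 0.0, 0.0],          # first two bins
                     [0.0, 0.0, 0.0, 1.0]])         # last bin
cov = np.diag([1.0, 4.0, 0.25])                     # assumed covariance, chosen by hand
targets = np.array([1.0, 5.0, 0.5])                 # assumed per-query variance limits

answers = noisy_answers(workload, data, cov, rng)
ok = meets_targets(cov, targets)
```

In the setting the abstract describes, the covariance would be the output of an optimization that minimizes privacy cost subject to these per-query limits; here it is simply fixed so that the constraint check can be shown.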

Privacy Preserving in Location Data Release: A Differential Privacy Approach

Lecture Notes in Computer Science - PRICAI 2014: Trends in Artificial Intelligence ◽

10.1007/978-3-319-13560-1_15 ◽

2014 ◽

pp. 183-195 ◽

Cited By ~ 2

Author(s):

Ping Xiong ◽

Tianqing Zhu ◽

Lei Pan ◽

Wenjia Niu ◽

Gang Li

Keyword(s):

Differential Privacy ◽

Privacy Preserving ◽

Location Data ◽

Data Release

Private FL-GAN: Differential Privacy Synthetic Data Generation Based on Federated Learning

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp40776.2020.9054559 ◽

2020 ◽

Cited By ~ 1

Author(s):

Bangzhou Xin ◽

Wei Yang ◽

Yangyang Geng ◽

Sheng Chen ◽

Shaowei Wang ◽

...

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Data Generation ◽

Synthetic Data Generation
