scholarly journals Algorithms for synthetic data release under differential privacy

2016 ◽  
Author(s):  
Jun Zhang
2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.


2019 ◽  
Vol 623 ◽  
pp. A156 ◽  
Author(s):  
H. E. Delgado ◽  
L. M. Sarro ◽  
G. Clementini ◽  
T. Muraveva ◽  
A. Garofalo

In a recent study we analysed period–luminosity–metallicity (PLZ) relations for RR Lyrae stars using theGaiaData Release 2 (DR2) parallaxes. It built on a previous work that was based on the firstGaiaData Release (DR1), and also included period–luminosity (PL) relations for Cepheids and RR Lyrae stars. The method used to infer the relations fromGaiaDR2 data and one of the methods used forGaiaDR1 data was based on a Bayesian model, the full description of which was deferred to a subsequent publication. This paper presents the Bayesian method for the inference of the parameters ofPL(Z) relations used in those studies, the main feature of which is to manage the uncertainties on observables in a rigorous and well-founded way. The method encodes the probability relationships between the variables of the problem in a hierarchical Bayesian model and infers the posterior probability distributions of thePL(Z) relationship coefficients using Markov chain Monte Carlo simulation techniques. We evaluate the method with several semi-synthetic data sets and apply it to a sample of 200 fundamental and first-overtone RR Lyrae stars for whichGaiaDR1 parallaxes and literatureKs-band mean magnitudes are available. We define and test several hyperprior probabilities to verify their adequacy and check the sensitivity of the solution with respect to the prior choice. The main conclusion of this work, based on the test with semi-syntheticGaiaDR1 parallaxes, is the absolute necessity of incorporating the existing correlations between the period, metallicity, and parallax measurements in the form of model priors in order to avoid systematically biased results, especially in the case of non-negligible uncertainties in the parallaxes. The relation coefficients obtained here have been superseded by those presented in our recent paper that incorporates the findings of this work and the more recentGaiaDR2 measurements.


2020 ◽  
Vol 108 ◽  
pp. 1314-1323 ◽  
Author(s):  
Heng Ye ◽  
Jiqiang Liu ◽  
Wei Wang ◽  
Ping Li ◽  
Tong Li ◽  
...  

2021 ◽  
Vol 14 (10) ◽  
pp. 1730-1742
Author(s):  
Yingtai Xiao ◽  
Zeyu Ding ◽  
Yuxin Wang ◽  
Danfeng Zhang ◽  
Daniel Kifer

In practice, differentially private data releases are designed to support a variety of applications. A data release is fit for use if it meets target accuracy requirements for each application. In this paper, we consider the problem of answering linear queries under differential privacy subject to per-query accuracy constraints. Existing practical frameworks like the matrix mechanism do not provide such fine-grained control (they optimize total error, which allows some query answers to be more accurate than necessary, at the expense of other queries that become no longer useful). Thus, we design a fitness-for-use strategy that adds privacy-preserving Gaussian noise to query answers. The covariance structure of the noise is optimized to meet the fine-grained accuracy requirements while minimizing the cost to privacy.


Sign in / Sign up

Export Citation Format

Share Document