Calibrating Noise to Sensitivity in Private Data Analysis

2017 ◽  
Vol 7 (3) ◽  
pp. 17-51 ◽  
Author(s):  
Cynthia Dwork ◽  
Frank McSherry ◽  
Kobbi Nissim ◽  
Adam Smith

We continue a line of research initiated in Dinur and Nissim (2003); Dwork and Nissim (2004); and Blum et al. (2005) on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function $f$ mapping databases to reals, the so-called {\em true answer} is the result of applying $f$ to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which $f = \sum_i g(x_i)$, where $x_i$ denotes the $i$th row of the database and $g$ maps database rows to $[0,1]$. We extend the study to general functions $f$, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the {\em sensitivity} of the function $f$. Roughly speaking, this is the amount that any single argument to $f$ can change its output. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean definition of privacy---now known as differential privacy---and a measure of its loss. We also provide a set of tools for designing and combining differentially private algorithms, permitting the construction of complex differentially private analytical tools from simple differentially private primitives. Finally, we obtain separation results showing the increased value of interactive statistical release mechanisms over non-interactive ones.
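As a minimal illustration of the mechanism the paper analyzes, the sketch below perturbs a query's true answer with Laplace noise whose scale is calibrated to the function's sensitivity divided by the privacy parameter ε; the toy database and parameter values are illustrative:

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Return the true answer perturbed with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query has sensitivity 1, because changing any
# single row changes the count by at most 1.
database = [0, 1, 1, 0, 1]    # toy database of 0/1 rows
true_count = sum(database)    # the "true answer"
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1)
```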

2021 ◽  
Vol 14 (10) ◽  
pp. 1886-1899 ◽
Author(s):  
Chang Ge ◽  
Shubhankar Mohapatra ◽  
Xi He ◽  
Ihab F. Ilyas

Organizations are increasingly relying on data to support decisions. When data contains private and sensitive information, the data owner often wishes to publish a synthetic database instance that is as useful as the true data, while ensuring the privacy of individual data records. Existing differentially private data synthesis methods aim to generate data that is useful for particular applications, but they fail to preserve one of the most fundamental properties of structured data: the underlying correlations and dependencies among tuples and attributes (i.e., the structure of the data). This structure is often expressed as integrity and schema constraints, or with a probabilistic generative process. As a result, the synthesized data is not useful for any downstream task that requires this structure to be preserved. This work presents KAMINO, a data synthesis system that ensures differential privacy and preserves the structure and correlations present in the original dataset. KAMINO takes as input a database instance, along with its schema (including integrity constraints), and produces a synthetic database instance with differential privacy and structure preservation guarantees. We empirically show that, while preserving the structure of the data, KAMINO achieves usefulness comparable to, and sometimes better than, the state-of-the-art methods of differentially private data synthesis in applications such as training classification models and answering marginal queries.
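KAMINO's internals are not described in the abstract; as a hypothetical illustration of the kind of structural property at stake, the sketch below checks whether a synthesized table violates a functional dependency such as zip → city (the pandas schema and column names are assumptions, not KAMINO's API):

```python
import pandas as pd

def violates_fd(df: pd.DataFrame, lhs: str, rhs: str) -> pd.DataFrame:
    """Return rows participating in violations of the functional dependency lhs -> rhs."""
    # Groups where the determinant maps to more than one dependent value break the FD.
    counts = df.groupby(lhs)[rhs].nunique()
    bad_keys = counts[counts > 1].index
    return df[df[lhs].isin(bad_keys)]

# A toy synthesized table: zip 10001 maps to two cities, breaking zip -> city.
synthetic = pd.DataFrame({"zip": ["10001", "10001", "94110"],
                          "city": ["New York", "Newark", "San Francisco"]})
print(violates_fd(synthetic, "zip", "city"))
```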


2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
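As a simplified sketch of the two ingredients named above, the snippet below projects data onto random orthonormal directions obtained via QR decomposition and then fits a Gaussian model whose statistics are perturbed with Laplace noise; the noise calibration and sensitivity bound are assumptions, not the paper's exact algorithms:

```python
import numpy as np

def ron_project(X: np.ndarray, p: int, rng: np.random.Generator) -> np.ndarray:
    """Project n x d data onto p random orthonormal directions (p < d)."""
    d = X.shape[1]
    W, _ = np.linalg.qr(rng.standard_normal((d, p)))  # columns of W are orthonormal
    return X @ W

def dp_gaussian_model(Z: np.ndarray, epsilon: float, sensitivity: float,
                      rng: np.random.Generator):
    """Fit a Gaussian to projected data, perturbing its statistics with Laplace
    noise; `sensitivity` is an assumed bound on one record's influence."""
    p = Z.shape[1]
    mu = Z.mean(axis=0) + rng.laplace(0.0, sensitivity / epsilon, size=p)
    cov = np.cov(Z, rowvar=False) + rng.laplace(0.0, sensitivity / epsilon, size=(p, p))
    cov = (cov + cov.T) / 2.0                  # re-symmetrize after noise
    w, V = np.linalg.eigh(cov)
    cov = (V * np.clip(w, 1e-6, None)) @ V.T   # clip eigenvalues so cov stays PSD
    return mu, cov

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))            # stand-in for sensitive data
Z = ron_project(X, p=5, rng=rng)
mu, cov = dp_gaussian_model(Z, epsilon=1.0, sensitivity=0.1, rng=rng)
synthetic = rng.multivariate_normal(mu, cov, size=1000)  # synthetic release
```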


2013 ◽  
Vol 24 (2) ◽  
pp. 211-238 ◽  
Author(s):  
Predrag Pavlicevic

This article presents a model for the scientific description of styles of political leadership in Serbia from 1990 to the present; more precisely, it points out the basic elements of the concept developed by the author in the study "The Style of Political Leaders in Serbia in the Period 1990-2006" (2010). For the evaluation, the author uses analytical tools that include the aforementioned concept, while also indicating related theoretical approaches that the study did not examine but that may be important for research on political elites in Serbia. This contributes to the epistemological part of the method, which is registered in the definition of the style of political leadership as a term and in the categorical apparatus that follows, understood from the aspect of political style: the style of building political power, the style of political communication, the style of building one's legitimacy, the ideological style, and the styles of political language, symbolism and rituals, non-verbal communication, and expressing patriotism. Starting from the fact that political styles are related to the characteristics of political cultures, and that it is necessary to construct ideal-typical models of styles focused on political subjects, the article identifies a typology of political leadership styles related to the specific conduct of political leaders in Serbia: authoritarian, republican, realistic, populist, conformist, revolutionary, and the style of a politician-rebel.


2021 ◽  
Author(s):  
Jude TCHAYE-KONDI ◽  
Yanlong Zhai ◽  
Liehuang Zhu

We address privacy and latency issues in the edge/cloud computing environment while training a centralized AI model. In our particular case, the edge devices are the only data source for the model trained on the central server. Current solutions for preserving privacy and reducing network latency rely on a pre-trained feature extractor deployed on the devices to extract only the important features from the sensitive dataset. However, finding a pre-trained model or a public dataset with which to build a feature extractor for a given task can be very challenging. With the large amount of data generated by edge devices, the edge environment does not really lack data, but improper access to it may raise privacy concerns. In this paper, we present DeepGuess, a new privacy-preserving and latency-aware deep learning framework. DeepGuess uses a new learning mechanism enabled by the AutoEncoder (AE) architecture, called inductive learning, which makes it possible to train a central neural network using the data produced by end devices while preserving their privacy. With inductive learning, sensitive data remains on the devices and is not explicitly involved in any backpropagation process. The AE's encoder is deployed on the devices to extract and transfer important features to the server. To further enhance privacy, we propose a new local differentially private algorithm that allows the edge devices to apply random noise to the features extracted from their sensitive data before transferring them to an untrusted server. The experimental evaluation of DeepGuess demonstrates its effectiveness and ability to converge on a series of experiments.
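The abstract does not spell out DeepGuess's local algorithm; a minimal sketch, assuming L1-norm clipping to bound sensitivity followed by on-device Laplace perturbation of the encoder's output, might look as follows (the clip bound and ε are illustrative):

```python
import numpy as np

def privatize_features(features: np.ndarray, clip: float, epsilon: float,
                       rng: np.random.Generator) -> np.ndarray:
    """Clip a feature vector's L1 norm, then add Laplace noise on-device."""
    norm = np.linalg.norm(features, ord=1)
    if norm > clip:
        # Any two clipped vectors differ by at most 2*clip in L1 distance.
        features = features * (clip / norm)
    scale = 2.0 * clip / epsilon   # Laplace scale for epsilon-local-DP
    return features + rng.laplace(0.0, scale, size=features.shape)

rng = np.random.default_rng(0)
z = rng.standard_normal(64)        # stand-in for an encoder's output
z_private = privatize_features(z, clip=1.0, epsilon=2.0, rng=rng)  # sent to server
```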


Author(s):  
Cynthia Dwork ◽  
Adam Smith

We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.


Author(s):  
Poushali Sengupta ◽  
Sudipta Paul ◽  
Subhankar Mishra

The leakage of data might have an extreme effect at the personal level if it contains sensitive information. Common prevention methods like encryption-decryption, endpoint protection, and intrusion detection systems are themselves prone to leakage. Differential privacy comes to the rescue with a proper promise of protection against leakage, as it uses a randomized response technique at the time of data collection, which promises strong privacy with better utility. Differential privacy allows one to access the forest of data by describing its pattern of groups without disclosing any individual trees. The recent adoption of differential privacy by leading tech companies and academia encouraged the authors to explore the topic in detail. We discuss the different aspects of differential privacy, its application to privacy protection and information leakage, a comparison of the current research approaches in this field, and its real-world utility as well as its trade-offs.
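Randomized response, mentioned above as the collection-time technique, can be sketched in a few lines; the ε value and survey setup are illustrative:

```python
import math
import random

def randomized_response(truth: bool, epsilon: float) -> bool:
    """Each respondent reports the truth with probability e^eps / (e^eps + 1),
    and the flipped answer otherwise, satisfying epsilon-local-DP."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return truth if random.random() < p else not truth

# Aggregate debiasing: with truth probability p, an observed "yes" rate r
# estimates the true rate as (r - (1 - p)) / (2p - 1).
eps = 1.0
p = math.exp(eps) / (math.exp(eps) + 1.0)
reports = [randomized_response(True, eps) for _ in range(10_000)]
r = sum(reports) / len(reports)
estimate = (r - (1 - p)) / (2 * p - 1)   # should be close to 1.0
```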


Author(s):  
Shuo Han ◽  
George J. Pappas

Many modern dynamical systems, such as smart grids and traffic networks, rely on user data for efficient operation. These data often contain sensitive information that the participating users do not wish to reveal to the public. One major challenge is to protect the privacy of participating users when utilizing user data. Over the past decade, differential privacy has emerged as a mathematically rigorous approach that provides strong privacy guarantees. In particular, differential privacy has several useful properties, including resistance to both postprocessing and the use of side information by adversaries. Although differential privacy was first proposed for static-database applications, this review focuses on its use in the context of control systems, in which the data under processing often take the form of data streams. Through two major applications—filtering and optimization algorithms—we illustrate the use of mathematical tools from control and optimization to convert a nonprivate algorithm to its private counterpart. These tools also enable us to quantify the trade-offs between privacy and system performance.


Web Services ◽  
2019 ◽  
pp. 314-331 ◽  
Author(s):  
Sema A. Kalaian ◽  
Rafa M. Kasim ◽  
Nabeel R. Kasim

Data analytics and modeling are powerful analytical tools for knowledge discovery through examining and capturing the complex and hidden relationships and patterns among the quantitative variables in existing massive structured Big Data, in an effort to predict future enterprise performance. The main purpose of this chapter is to present a conceptual and practical overview of some of the basic and advanced analytical tools for analyzing structured Big Data. The chapter covers descriptive and predictive analytical methods. Descriptive analytical tools such as the mean, median, mode, variance, standard deviation, and data visualization methods (e.g., histograms, line charts) are covered. Predictive analytical tools for analyzing Big Data, such as correlation and simple and multiple linear regression, are also covered in the chapter.
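A minimal sketch of the kinds of descriptive and predictive computations the chapter surveys, using a toy dataset (all values illustrative):

```python
import numpy as np

# Descriptive statistics on a toy quantitative variable.
data = np.array([2.0, 3.5, 3.5, 4.0, 10.0])
print(np.mean(data), np.median(data), np.var(data), np.std(data))

# Predictive analytics: simple linear regression y = a + b*x via least squares.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
b, a = np.polyfit(x, y, deg=1)   # returns [slope, intercept] for deg=1
print(f"y ~ {a:.2f} + {b:.2f} * x")
```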


Author(s):  
Divya Asok ◽  
Chitra P. ◽  
Bharathiraja Muthurajan

In the past years, the usage of the internet and the quantity of digital data generated by large organizations, firms, and governments have paved the way for researchers to focus on the security of private data. This collected data is usually tied to a definite necessity. For example, in the medical field, health record systems are used for the exchange of medical data. In addition to services based on users' current location, many potential services rely on users' location history or their spatial-temporal provenance. However, most of the collected data contains information identifying individuals, which is sensitive. As machine learning applications reach every corner of society, privacy-preserving techniques could significantly contribute to protecting the privacy of both individuals and institutions. This chapter gives a wider perspective on the current literature on privacy-preserving machine learning and deep learning techniques, along with the non-cryptographic differential privacy approach for ensuring the privacy of sensitive data.


2018 ◽  
Vol 8 (11) ◽  
pp. 2081 ◽  
Author(s):  
Hai Liu ◽  
Zhenqiang Wu ◽  
Yihui Zhou ◽  
Changgen Peng ◽  
Feng Tian ◽  
...  

Differential privacy mechanisms can offer a trade-off between privacy and utility, measured by privacy metrics and utility metrics respectively. The trade-off means that, in terms of these metrics, as one quantity increases the other decreases. However, there is no unified measurement of this trade-off across differential privacy mechanisms. To this end, we propose the definition of privacy-preserving monotonicity of differential privacy, which measures the trade-off between privacy and utility. First, to formulate the trade-off, we present the definition of privacy-preserving monotonicity based on computational indistinguishability. Second, building on the privacy metrics of expected estimation error and entropy, we theoretically and numerically show the privacy-preserving monotonicity of the Laplace mechanism, the Gaussian mechanism, the exponential mechanism, and the randomized response mechanism. In addition, we theoretically and numerically analyze the utility monotonicity of these mechanisms based on the utility metrics of the modulus of the characteristic function and a variant of normalized entropy. Third, according to the privacy-preserving monotonicity of differential privacy, we present a method to seek a trade-off under a semi-honest model and analyze the unilateral trade-off under a rational model. Therefore, privacy-preserving monotonicity can be used as a criterion to evaluate the trade-off between privacy and utility in differential privacy mechanisms under the semi-honest model. Under the rational model, however, privacy-preserving monotonicity results in a unilateral trade-off, which can lead to severe consequences.
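The monotone trade-off described above can be observed numerically for the Laplace mechanism, whose expected absolute error is exactly sensitivity/ε, so raising ε (weaker privacy) lowers the error (higher utility); this sketch is illustrative and not the paper's measurement:

```python
import numpy as np

rng = np.random.default_rng(0)
true_answer, sensitivity = 100.0, 1.0

# As epsilon grows, privacy weakens but the expected absolute error shrinks:
# for Laplace noise of scale b = sensitivity/epsilon, E|error| = b exactly.
for eps in [0.1, 0.5, 1.0, 2.0]:
    noisy = true_answer + rng.laplace(0.0, sensitivity / eps, size=100_000)
    print(f"eps={eps:>4}: mean |error| ~ {np.abs(noisy - true_answer).mean():.2f}",
          f"(theory: {sensitivity / eps:.2f})")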

