Beyond Differential Privacy: Synthetic Micro-Data Generation with Deep Generative Neural Networks

Author(s):  
Ofer Mendelevitch ◽  
Michael D. Lesh
2021 ◽  
Vol 25 (1) ◽  
pp. 138-161
Author(s):  
O. G. Bondar ◽  
E. O. Brezhneva ◽  
O. G. Dobroserdov ◽  
K. G. Andreev ◽  
N. V. Polyakov

Purpose of research: search for and analysis of existing models of gas-sensitive sensors; development of mathematical models of gas-sensitive sensors of various types (semiconductor, thermocatalytic, optical, electrochemical) for subsequent use in training artificial neural networks (ANNs); investigation of the main physicochemical patterns underlying the principles of sensor operation, considering the influence of environmental factors and cross-sensitivity on the sensor output signal; comparison of simulation results with the actual characteristics of sensors produced by industry. The concept of creating the mathematical models is described, and their parameterization, investigation, and assessment of adequacy are carried out.

Methods. Numerical methods, computer modeling, electrical circuit theory, the theory of chemisorption and heterogeneous catalysis, the Freundlich and Langmuir equations, the Bouguer-Lambert-Beer law, and the fundamentals of electrochemistry were used in creating the mathematical models. Root-mean-square (RMS) and relative errors were calculated to assess the adequacy of the models.

Results. The concept of creating mathematical models of sensors based on physicochemical patterns is described. This concept allows the generation of training data for the artificial neural networks used in multi-component gas analyzers for joint information processing to be automated. Models of semiconductor, thermocatalytic, optical, and electrochemical sensors were obtained and refined to account for the influence of additional factors on the sensor signal. Parameterization and assessment of the adequacy and extrapolation properties of the models were carried out against the graphical dependencies presented in the sensors' technical documentation. The relative and RMS errors between real data and the simulation results for the basic parameters of the gas-sensitive sensors were determined. The RMS error in reproducing the main characteristics of the sensors did not exceed 0.5%.

Conclusion. Multivariable mathematical models of gas-sensitive sensors were synthesized that account for the influence of the target gas and external factors (pressure, temperature, humidity, cross-sensitivity) on the output signal and allow training data to be generated for sensors of various types.
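As a hedged illustration of the kind of physicochemical model the abstract describes, the sketch below generates synthetic training data for an optical (NDIR-style) sensor from the Bouguer-Lambert-Beer law. The absorption coefficient, path length, temperature-drift model, and noise level are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

# Hypothetical parameters for an NDIR-style optical sensor (assumed, illustrative only).
EPSILON = 0.35   # effective absorption coefficient, 1/(vol% * cm)
PATH_CM = 5.0    # optical path length, cm
I0 = 1.0         # normalized source intensity

def optical_sensor_response(concentration, temperature_c=25.0):
    """Bouguer-Lambert-Beer law: transmitted intensity decays
    exponentially with gas concentration; a small assumed temperature
    drift mimics the environmental factors the paper models."""
    drift = 1.0 + 0.002 * (temperature_c - 25.0)  # assumed drift model
    return I0 * np.exp(-EPSILON * concentration * PATH_CM) * drift

# Generate a synthetic training set: random gas concentrations and
# temperatures mapped to noisy sensor readings.
rng = np.random.default_rng(0)
conc = rng.uniform(0.0, 5.0, size=10_000)        # vol %
temp = rng.uniform(-10.0, 50.0, size=10_000)     # deg C
signal = optical_sensor_response(conc, temp)
signal += rng.normal(0.0, 0.002, size=signal.shape)  # measurement noise

X = np.column_stack([signal, temp])  # ANN inputs: reading + temperature
y = conc                             # ANN target: gas concentration
```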


Author(s):  
S Thivaharan ◽  
G Srivatsun

The amount of data generated by modern communication devices is enormous, reaching petabytes, and the rate of data generation is increasing at an unprecedented pace. Although modern technology supports storage at massive scale, industry is reluctant to retain data with the following characteristics: redundancy, unformatted records with outdated information, data that misleads prediction, and data with no impact on class prediction. Among these sources, social media plays a significant role in data generation; compared to other generators, the rate at which social media produces data is considerably higher. Industry and governments are both worried about the circulation of malicious or objectionable content, as such platforms are highly susceptible to exploitation by criminals. It is therefore time to develop a model that classifies social media content as fair or unfair, with high accuracy in predicting the class of the content. In this article, TensorFlow-based deep neural networks are deployed with a fixed epoch count of 15 in order to attain 25% higher accuracy than other existing models. Activation functions such as ReLU and sigmoid, available in the TensorFlow platform, help attain the improved prediction accuracy.
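A minimal sketch of the kind of setup the article describes, assuming a TensorFlow/Keras binary classifier over pre-vectorized post features. The layer sizes and input dimension are illustrative assumptions; the fixed epoch count of 15 and the ReLU/sigmoid activations come from the article.

```python
import tensorflow as tf

# Illustrative input dimension for pre-vectorized social media posts (assumed).
INPUT_DIM = 512

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(INPUT_DIM,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    # Sigmoid output for the binary fair/unfair decision.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Fixed epoch count of 15, as in the article; x_train / y_train are
# assumed feature vectors and 0/1 (fair/unfair) labels.
# model.fit(x_train, y_train, epochs=15, validation_split=0.1)
```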


2020 ◽  
Author(s):  
Ishika Singh ◽  
Haoyi Zhou ◽  
Kunlin Yang ◽  
Meng Ding ◽  
Bill Lin ◽  
...  

Neural architecture search, which aims to automatically search for architectures (e.g., convolution, max pooling) of neural networks that maximize validation performance, has achieved remarkable progress recently. In many application scenarios, several parties would like to collaboratively search for a shared neural architecture by leveraging data from all parties. However, due to privacy concerns, no party wants its data to be seen by the others. To address this problem, we propose federated neural architecture search (FNAS), in which different parties collectively search for a differentiable architecture by exchanging gradients of architecture variables without exposing their data to other parties. To further preserve privacy, we study differentially private FNAS (DP-FNAS), which adds random noise to the gradients of the architecture variables. We provide theoretical guarantees that DP-FNAS achieves differential privacy. Experiments show that DP-FNAS can search for highly performant neural architectures while protecting the privacy of individual parties. The code is available at https://github.com/UCSD-AI4H/DP-FNAS
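The core privacy step the abstract describes, perturbing the gradients of architecture variables before they are shared, can be sketched as below. The clipping norm and noise multiplier are illustrative assumptions, and the update follows the standard clip-then-add-Gaussian-noise recipe rather than the paper's exact procedure.

```python
import numpy as np

def privatize_gradient(grad, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an architecture-variable gradient to a maximum L2 norm,
    then add Gaussian noise scaled to that norm (standard DP-SGD
    recipe; the constants here are assumptions, not the paper's)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

# Each party would apply this to its local gradient of the
# architecture variables before exchanging it with the others.
local_grad = np.random.randn(64)          # stand-in architecture gradient
shared_grad = privatize_gradient(local_grad)
```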


2021 ◽  
Author(s):  
Ali Hatamizadeh ◽  
Hongxu Yin ◽  
Pavlo Molchanov ◽  
Andriy Myronenko ◽  
Wenqi Li ◽  
...  

Federated learning (FL) allows the collaborative training of AI models without the need to share raw data. This capability makes it especially interesting for healthcare applications, where patient and data privacy are of utmost concern. However, recent work on the inversion of deep neural networks from model gradients has raised concerns about the ability of FL to prevent the leakage of training data. In this work, we show that the attacks presented in the literature are impractical in real FL use cases, and we provide a new baseline attack that works in more realistic scenarios where the clients' training involves updating the Batch Normalization (BN) statistics. Furthermore, we present new ways to measure and visualize potential data leakage in FL. Our work is a step towards establishing reproducible methods of measuring data leakage in FL and could help determine the optimal tradeoffs between privacy-preserving techniques, such as differential privacy, and model accuracy based on quantifiable metrics.
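To make the BN point concrete, here is a minimal sketch, under assumed names and a toy architecture, of what a realistic FL client shares: not just learnable parameters but also the BatchNorm running statistics accumulated as a side effect of local training, which is the extra state the scenario above involves.

```python
import torch
import torch.nn as nn

# Toy client model with a BatchNorm layer (architecture is assumed).
model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(),
                      nn.Linear(32, 2))

def client_update_payload(model):
    """What a realistic client sends to the server: learnable
    parameters plus BN running statistics, which are updated during
    local training and can carry information about the local data."""
    payload = {k: v.clone() for k, v in model.state_dict().items()}
    bn_stats = {k: v for k, v in payload.items()
                if "running_mean" in k or "running_var" in k}
    return payload, bn_stats

# One forward pass in training mode updates the BN running stats.
x = torch.randn(8, 16)
model.train()
_ = model(x)
payload, bn_stats = client_update_payload(model)
```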


2021 ◽  
Vol 118 (15) ◽  
pp. e2101344118
Author(s):  
Qiao Liu ◽  
Jiaze Xu ◽  
Rui Jiang ◽  
Wing Hung Wong

Density estimation is one of the fundamental problems in both statistics and machine learning. In this study, we propose Roundtrip, a computational framework for general-purpose density estimation based on deep generative neural networks. Roundtrip retains the generative power of deep generative models, such as generative adversarial networks (GANs), while also providing estimates of density values, thus supporting both data generation and density estimation. Unlike previous neural density estimators that impose stringent conditions on the transformation from the latent space to the data space, Roundtrip enables the use of much more general mappings, where the target density is modeled by learning a manifold induced from a base density (e.g., a Gaussian distribution). Roundtrip provides a statistical framework for GAN models in which an explicit evaluation of density values is feasible. In numerical experiments, Roundtrip exceeds state-of-the-art performance on a diverse range of density estimation tasks.
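A toy sketch of the roundtrip idea under strong simplifying assumptions: with a forward map G from a Gaussian base density and an approximate inverse map H back to the latent space (both linear stand-ins here, not trained networks), the density at a point can be estimated by importance sampling around H(x), treating x = G(z) + Gaussian noise. This is an illustration of the principle, not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

# Linear stand-ins for the trained networks (assumed, for illustration):
# G maps latent z to data x; H approximately inverts it.
G = lambda z: 2.0 * z + 1.0
H = lambda x: (x - 1.0) / 2.0
SIGMA_X = 0.1  # assumed observation noise in x = G(z) + eps

def roundtrip_density(x, n_samples=10_000, proposal_scale=0.5, rng=None):
    """Estimate p(x) = E_{z ~ N(0,1)}[ N(x; G(z), SIGMA_X^2) ] by
    importance sampling with a proposal centered at H(x)."""
    rng = rng or np.random.default_rng(0)
    mu = H(x)
    z = rng.normal(mu, proposal_scale, size=n_samples)
    base = norm.pdf(z)                          # base density pi(z)
    lik = norm.pdf(x, loc=G(z), scale=SIGMA_X)  # p(x | z)
    proposal = norm.pdf(z, loc=mu, scale=proposal_scale)
    return np.mean(base * lik / proposal)

# Under this linear G, x ~ N(1, 4 + SIGMA_X^2), so the estimate at
# x = 1 should be close to norm.pdf(1, loc=1, scale=np.sqrt(4.01)).
print(roundtrip_density(1.0))
```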


2021 ◽  
Vol 2021 (1) ◽  
pp. 64-84
Author(s):  
Ashish Dandekar ◽  
Debabrota Basu ◽  
Stéphane Bressan

The calibration of noise for a privacy-preserving mechanism depends on the sensitivity of the query and the prescribed privacy level. A data steward must make the non-trivial choice of a privacy level that balances the requirements of users and the monetary constraints of the business entity.

Firstly, we analyse the roles of the sources of randomness involved in the design of a privacy-preserving mechanism, namely the explicit randomness induced by the noise distribution and the implicit randomness induced by the data-generation distribution. This finer analysis enables us to provide stronger privacy guarantees with quantifiable risks. Thus, we propose privacy at risk, a probabilistic calibration of privacy-preserving mechanisms. We provide a composition theorem that leverages privacy at risk, and we instantiate the probabilistic calibration for the Laplace mechanism by providing analytical results.

Secondly, we propose a cost model that bridges the gap between the privacy level and the compensation budget estimated by a GDPR-compliant business entity. The convexity of the proposed cost model leads to a unique fine-tuning of the privacy level that minimises the compensation budget. We show its effectiveness by illustrating a realistic scenario that avoids overestimation of the compensation budget by using privacy at risk for the Laplace mechanism. We quantitatively show that composition using the cost-optimal privacy at risk provides a stronger privacy guarantee than the classical advanced composition. Although the illustration is specific to the chosen cost model, it naturally extends to any convex cost model. We also provide realistic illustrations of how a data steward uses privacy at risk to balance the trade-off between utility and privacy.
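As a concrete anchor for the calibration the abstract discusses, here is a minimal sketch of the classical Laplace mechanism, where the noise scale is the query sensitivity divided by the privacy level epsilon. The query and its sensitivity here are illustrative (a counting query with sensitivity 1); this is the classical mechanism, not the paper's probabilistic "privacy at risk" calibration itself.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Classical epsilon-DP Laplace mechanism: add Laplace noise with
    scale b = sensitivity / epsilon to the query answer."""
    rng = rng or np.random.default_rng()
    b = sensitivity / epsilon
    return true_answer + rng.laplace(0.0, b)

# Illustrative counting query (sensitivity 1): how many records
# exceed a threshold. A smaller epsilon means more noise.
data = np.array([3.2, 7.1, 5.5, 9.0, 4.4])
count = float(np.sum(data > 5.0))
for eps in (0.1, 1.0, 10.0):
    print(eps, laplace_mechanism(count, sensitivity=1.0, epsilon=eps))
```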


Author(s):  
George Leal Jamil ◽  
Alexis Rocha da Silva

Users' personal, highly sensitive data, such as photos and voice recordings, are kept indefinitely by the companies that collect them; users can neither delete these data nor restrict the purposes for which they are used. By learning how to do machine learning in a way that protects privacy, we can make a real difference in solving many social problems, such as curing disease. Deep neural networks are susceptible to various inference attacks because they remember information about their training data. In this chapter, the authors introduce differential privacy, which ensures that different kinds of statistical analysis do not compromise privacy, and federated learning, which trains a machine learning model on data to which we do not have access.
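A minimal sketch of the federated learning idea the chapter introduces: each party trains locally and only model parameters are averaged, so the raw data never leaves its owner. The linear model and the two-client setup are illustrative assumptions.

```python
import numpy as np

def local_sgd(weights, X, y, lr=0.1, steps=20):
    """Local linear-regression SGD on one client's private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0])
clients = []
for _ in range(2):  # two parties, each holding private data
    X = rng.normal(size=(100, 2))
    y = X @ w_true + rng.normal(0.0, 0.1, size=100)
    clients.append((X, y))

# Federated averaging: the server only ever sees model weights.
w_global = np.zeros(2)
for _round in range(10):
    updates = [local_sgd(w_global, X, y) for X, y in clients]
    w_global = np.mean(updates, axis=0)
print(w_global)  # approaches w_true without sharing any raw data
```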

