Privacy Protection from Sampling and Perturbation in Survey Microdata

Natalie Shlomo; Chris J. Skinner

doi:10.29012/jpc.v4i1.615

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v4i1.615 ◽

2012 ◽

Vol 4 (1) ◽

Cited By ~ 3

Author(s):

Natalie Shlomo ◽

Chris J. Skinner

Keyword(s):

Computer Science ◽

Privacy Protection ◽

Differential Privacy ◽

Science Literature ◽

Confidential Information ◽

Social Surveys ◽

Statistical Disclosure Limitation ◽

Statistical Disclosure ◽

Key Variables ◽

Statistical Agencies

Statistical agencies release microdata from social surveys as public-use files after applying statistical disclosure limitation (SDL) techniques. Disclosure risk is typically assessed in terms of identification risk, where it is supposed that small counts on cross-classified identifying key variables, i.e. a key, could be used to make an identification and confidential information may be learnt. In this paper we explore the application of definitions of privacy from the computer science literature to the same problem, with a focus on sampling and a form of perturbation which can be represented as misclassification. We consider two privacy definitions: differential privacy and probabilistic differential privacy. Chaudhuri and Mishra (2006) have shown that sampling does not guarantee differential privacy, but that, under certain conditions, it may ensure probabilistic differential privacy. We discuss these definitions and conditions in the context of survey microdata. We then extend this discussion to the case of perturbation. We show that differential privacy can be ensured if and only if the perturbation employs a misclassification matrix with no zero entries. We also show that probabilistic differential privacy is a viable alternative to differential privacy when there are zeros in the misclassification matrix. We discuss some common examples of SDL methods where in some cases zeros may be prevalent in the misclassification matrix.
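The link between zeros in the misclassification matrix and differential privacy can be illustrated with a minimal sketch. It assumes a simplified per-record reading of the definition: `matrix[j][k]` is taken to be the probability that a record with true category `j` is released as category `k`, and the function looks for the smallest ε such that no released category is more than a factor of e^ε more likely under one true category than under another. The function name and the two example matrices are hypothetical and are not taken from the paper.

```python
import math

def epsilon_of_misclassification(matrix):
    """Smallest eps with M[j][k] <= exp(eps) * M[i][k] for all rows i, j and columns k.

    matrix[j][k] is assumed to be P(released category = k | true category = j).
    Returns math.inf when a released category is possible for one true
    category but impossible for another (a zero next to a non-zero entry).
    """
    eps = 0.0
    for k in range(len(matrix[0])):
        column = [row[k] for row in matrix]
        hi, lo = max(column), min(column)
        if hi == 0:
            continue  # category k is never released, so it reveals nothing
        if lo == 0:
            return math.inf  # a zero entry makes the likelihood ratio unbounded
        eps = max(eps, math.log(hi / lo))
    return eps

# Hypothetical matrices: one with no zero entries, one with zeros.
dense = [[0.8, 0.1, 0.1],
         [0.1, 0.8, 0.1],
         [0.1, 0.1, 0.8]]
sparse = [[0.9, 0.1, 0.0],
          [0.1, 0.8, 0.1],
          [0.0, 0.1, 0.9]]

print(epsilon_of_misclassification(dense))   # finite, about 2.08
print(epsilon_of_misclassification(sparse))  # inf
```

Under this reading, the dense matrix yields a finite ε while the matrix with zeros does not, which mirrors the "no zero entries" condition stated in the abstract.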

Statistical Disclosure Limitation: New Directions and Challenges

Journal of Privacy and Confidentiality ◽

10.29012/jpc.684 ◽

2018 ◽

Vol 8 (1) ◽

Author(s):

Natalie Shlomo

Keyword(s):

Data Dissemination ◽

Differential Privacy ◽

Synthetic Data ◽

Remote Access ◽

Disclosure Limitation ◽

Disclosure Risk ◽

Statistical Disclosure Limitation ◽

Statistical Disclosure ◽

Statistical Agencies ◽

Definition Of

An overview of traditional types of data dissemination at statistical agencies is provided including definitions of disclosure risks, the quantification of disclosure risk and data utility and common statistical disclosure limitation (SDL) methods. However, with technological advancements and the increasing push by governments for open and accessible data, new forms of data dissemination are currently being explored. We focus on web-based applications such as flexible table builders and remote analysis servers, synthetic data and remote access. Many of these applications introduce new challenges for statistical agencies as they are gradually relinquishing some of their control on what data is released. There is now more recognition of the need for perturbative methods to protect the confidentiality of data subjects. These new forms of data dissemination are changing the landscape of how disclosure risks are conceptualized and the types of SDL methods that need to be applied to protect the data. In particular, inferential disclosure is the main disclosure risk of concern and encompasses the traditional types of disclosure risks based on identity and attribute disclosures. These challenges have led to statisticians exploring the computer science definition of differential privacy and privacy-by-design applications. We explore how differential privacy can be a useful addition to the current SDL framework within statistical agencies.

How Will Statistical Agencies Operate When All Data Are Private?

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v7i3.404 ◽

2017 ◽

Vol 7 (3) ◽

Cited By ~ 3

Author(s):

John M Abowd

Keyword(s):

Big Data ◽

Privacy Protection ◽

Disclosure Limitation ◽

Dual Problems ◽

Statistical Disclosure Limitation ◽

Statistical Disclosure ◽

Access Controls ◽

Statistical Agencies ◽

Paradigm Shifting ◽

Two Sides

The dual problems of respecting citizen privacy and protecting the confidentiality of their data have become hopelessly conflated in the “Big Data” era. There are orders of magnitude more data outside an agency’s firewall than inside it—compromising the integrity of traditional statistical disclosure limitation methods. And increasingly the information processed by the agency was “asked” in a context wholly outside the agency’s operations—blurring the distinction between what was asked and what is published. Already, private businesses like Microsoft, Google and Apple recognize that cybersecurity (safeguarding the integrity and access controls for internal data) and privacy protection (ensuring that what is published does not reveal too much about any person or business) are two sides of the same coin. This is a paradigm-shifting moment for statistical agencies.

Big data, differential privacy and national statistical organisations

Statistical Journal of the IAOS ◽

10.3233/sji-200685 ◽

2020 ◽

Vol 36 (4) ◽

pp. 1067-1074

Author(s):

James Bailie

Keyword(s):

Big Data ◽

Differential Privacy ◽

Risk Measure ◽

Science Literature ◽

Privacy Risk ◽

Research Areas ◽

Frequency Table ◽

Statistical Disclosure ◽

A Cell ◽

The Impact

Differential privacy (DP) has emerged in the computer science literature as a measure of the impact on an individual’s privacy resulting from the publication of a statistical output such as a frequency table. This paper provides an introduction to DP for official statisticians and discusses its relevance, benefits and challenges from a National Statistical Organisation (NSO) perspective. We motivate our study by examining how privacy is evolving in the era of big data and how this might prompt a shift from traditional statistical disclosure techniques used in official statistics – which are generally applied on a cell-by-cell or table-by-table basis – to formal privacy methods, like DP, which are applied from a perspective encompassing the totality of the outputs generated from a given dataset. We identify an important interplay between DP’s holistic privacy risk measure and the difficulty for NSOs in implementing DP, showing that DP’s major advantage is also DP’s major challenge. This paper provides new work addressing two key DP research areas for NSOs: DP’s application to survey data and its incorporation within the Five Safes framework.

The composition and formation of effective teams: computer science meets organizational psychology

The Knowledge Engineering Review ◽

10.1017/s026988891800019x ◽

2018 ◽

Vol 33 ◽

Cited By ~ 2

Author(s):

Ewa Andrejczuk ◽

Rita Berger ◽

Juan A. Rodriguez-Aguilar ◽

Carles Sierra ◽

Víctor Marín-Puchades

Keyword(s):

Computer Science ◽

Team Performance ◽

Organizational Psychology ◽

Team Formation ◽

Science Literature ◽

Effective Teams ◽

Human Agent ◽

Wide Range ◽

Psychology Literature ◽

Cross Fertilization

Nowadays the composition and formation of effective teams is highly important both for companies to assure their competitiveness and for a wide range of emerging applications exploiting multiagent collaboration (e.g. crowdsourcing, human-agent collaborations). The aim of this article is to provide an integrative perspective on team composition, team formation, and their relationship with team performance. Thus, we review the contributions in both the computer science literature and the organizational psychology literature dealing with these topics. Our purpose is twofold. First, we aim at identifying the strengths and weaknesses of the contributions made by these two diverse bodies of research. Second, we aim at identifying cross-fertilization opportunities that help both disciplines benefit from one another. Given the volume of existing literature, our review is not intended to be exhaustive. Instead, we have preferred to focus on the most significant contributions in both fields together with recent contributions that break new ground to spur innovative research.

Correlated Differential Privacy Protection for Big Data

2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA) ◽

10.1109/aina.2018.00147 ◽

2018 ◽

Author(s):

Denglong Lv ◽

Shibing Zhu

Keyword(s):

Big Data ◽

Privacy Protection ◽

Differential Privacy

Some problems about English-Spanish translations in computer science literature

ACM SIGCSE Bulletin ◽

10.1145/187387.187395 ◽

1994 ◽

Vol 26 (3) ◽

pp. 15

Author(s):

Carlos Iván Chesñevar

Keyword(s):

Computer Science ◽

Science Literature

Integrating Differential Privacy in the Statistical Disclosure Control Tool-Kit for Synthetic Data Production

Privacy in Statistical Databases - Lecture Notes in Computer Science ◽

10.1007/978-3-030-57521-2_19 ◽

2020 ◽

pp. 271-280

Author(s):

Natalie Shlomo

Keyword(s):

Differential Privacy ◽

Synthetic Data ◽

Statistical Disclosure Control ◽

Disclosure Control ◽

Data Production ◽

Statistical Disclosure ◽

Control Tool

The issues connected with the anonymization of medical data. Part 2. Advanced anonymization and anonymization controlled by owner of protected sensitive data

HIGHER SCHOOL’S PULSE ◽

10.5604/01.3001.0003.3161 ◽

2014 ◽

Vol 8 (2) ◽

pp. 13-24 ◽

Cited By ~ 1

Author(s):

Arkadiusz Liber

Keyword(s):

Privacy Protection ◽

Differential Privacy ◽

Data Access ◽

Medical Data ◽

Medical Documentation ◽

Research Review ◽

Sensitive Data ◽

Data Anonymization ◽

Data Access Control ◽

Anonymized Data

Introduction: Medical documentation ought to be accessible with the preservation of its integrity as well as the protection of personal data. One of the manners of its protection against disclosure is anonymization. Contemporary methods ensure anonymity without the possibility of sensitive data access control. It seems that the future of sensitive data processing systems belongs to the personalized method. In the first part of the paper, the k-Anonymity, (X,Y)-Anonymity, (α,k)-Anonymity, and (k,e)-Anonymity methods were discussed. These methods belong to the well-known elementary methods which are the subject of a significant number of publications. As the source papers for this part, the works of Samarati, Sweeney, Wang, Wong and Zhang were used. The selection of these publications is justified by the wider research reviews conducted, for instance, by Fung, Wang, Fu and Y. However, it should be noted that the methods of anonymization derive from methods of statistical database protection dating from the 1970s. Due to the interrelated content and literature references, the first and the second part of this article constitute an integral whole.

Aim of the study: The analysis of the methods of anonymization, the analysis of the methods of protection of anonymized data, and the study of a new type of privacy protection that enables the entity which the data concerns to control the disclosure of sensitive data.

Material and methods: Analytical methods, algebraic methods.

Results: Delivering material supporting the choice and analysis of the ways of anonymization of medical data, and developing a new privacy protection solution enabling the control of sensitive data by the entities which the data concerns.

Conclusions: In the paper, an analysis of solutions for data anonymization, to ensure privacy protection in medical data sets, was conducted. The methods of k-Anonymity, (X,Y)-Anonymity, (α,k)-Anonymity, (k,e)-Anonymity, (X,Y)-Privacy, LKC-Privacy, l-Diversity, (X,Y)-Linkability, t-Closeness, Confidence Bounding and Personalized Privacy were described, explained and analyzed. The analysis of solutions for controlling sensitive data by their owner was also conducted. Apart from the existing methods of anonymization, an analysis of methods for the protection of anonymized data was included. In particular, the methods of δ-Presence, ε-Differential Privacy, (d,γ)-Privacy, (α,β)-Distributing Privacy and protections against (c,t)-isolation were analyzed. Moreover, the author introduced a new solution for the controlled protection of privacy. The solution is based on marking a protected field and on multi-key encryption of the sensitive value. The suggested way of marking the fields is in accordance with the XML standard. For the encryption, an (n,p) different-keys cipher was selected. To decipher the content, p of the n keys are used. The proposed solution makes it possible to apply brand new methods to control the privacy of disclosed sensitive data.
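As an illustration of the simplest method named above, the following minimal sketch checks k-Anonymity in its usual textbook form: every combination of quasi-identifier values must be shared by at least k records. The function name, the field names and the toy records are hypothetical and do not come from the paper.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())

# Toy records with hypothetical, already generalized quasi-identifiers.
records = [
    {"age_band": "30-39", "postcode": "50*", "diagnosis": "asthma"},
    {"age_band": "30-39", "postcode": "50*", "diagnosis": "diabetes"},
    {"age_band": "40-49", "postcode": "51*", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "51*", "diagnosis": "flu"},
]

print(is_k_anonymous(records, ["age_band", "postcode"], 2))  # True
print(is_k_anonymous(records, ["age_band", "postcode"], 3))  # False
```

The sensitive attribute (`diagnosis` here) plays no part in the check, which is one reason refinements such as l-Diversity and t-Closeness were proposed.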

Differentially Private Autocorrelation Time-Series Data Publishing Based on Sliding Window

Security and Communication Networks ◽

10.1155/2021/6665984 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Jing Zhao ◽

Shubo Liu ◽

Xingxing Xiong ◽

Zhaohui Cai

Keyword(s):

Time Series ◽

Privacy Protection ◽

Large Scale ◽

Differential Privacy ◽

Time Series Data ◽

Sliding Window ◽

Data Publishing ◽

Series Data ◽

Data Publication ◽

Autocorrelation Time

Privacy protection is one of the major obstacles to data sharing. Time-series data are characterized by autocorrelation, continuity, and large scale. Current research on time-series data publication largely overlooks the correlation of time-series data and the lack of privacy protection. In this paper, we study the problem of correlated time-series data publication and propose a sliding window-based autocorrelation time-series data publication algorithm, called SW-ATS. Instead of the global sensitivity used in traditional differential privacy mechanisms, we propose periodic sensitivity to provide a stronger privacy guarantee. SW-ATS introduces a sliding window mechanism, with the correlation between the noise-adding sequence and the original time-series data guaranteed by sequence indistinguishability, to protect the privacy of the latest data. We prove that SW-ATS satisfies ε-differential privacy. Compared with the state-of-the-art algorithm, SW-ATS reduces the MAE by about 25%, improves the utility of the data, and provides stronger privacy protection.
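For orientation, the traditional baseline that the abstract contrasts itself with can be sketched as the standard Laplace mechanism, which adds noise of scale sensitivity/ε to each published value. This is not SW-ATS: the periodic sensitivity and the sliding window are not specified in the abstract and are not reproduced here. The function names and the sample series are hypothetical, ε is applied per published value, and correlation between time points is ignored, which is the limitation the abstract points to.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Perturb each value independently with Laplace noise of scale sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]

# Hypothetical series; a seeded generator keeps the sketch reproducible.
series = [10.0, 12.0, 11.0, 13.0, 12.5]
noisy = laplace_mechanism(series, sensitivity=1.0, epsilon=0.5, rng=random.Random(0))
print(noisy)
```

A smaller ε gives a larger noise scale, so stronger per-value protection is paid for with lower accuracy.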

Learning With Differential Privacy

Handbook of Research on Cyber Crime and Information Privacy - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-7998-5728-0.ch019 ◽

2021 ◽

pp. 372-395

Author(s):

Poushali Sengupta ◽

Sudipta Paul ◽

Subhankar Mishra

Keyword(s):

Privacy Protection ◽

Differential Privacy ◽

Intrusion Detection Systems ◽

Sensitive Information ◽

Detection Systems ◽

Trade Offs ◽

Personal Level ◽

Encryption Decryption ◽

Individual Trees ◽

Prevention Methods

The leakage of data can have an extreme effect at the personal level if the data contain sensitive information. Common prevention methods such as encryption-decryption, endpoint protection, and intrusion detection systems are prone to leakage. Differential privacy comes to the rescue with a proper promise of protection against leakage, as it uses a randomized response technique at the time of data collection, which promises strong privacy with better utility. Differential privacy allows one to access the forest of data by describing the patterns of groups without disclosing any individual trees. The current adoption of differential privacy by leading tech companies and academia encourages the authors to explore the topic in detail. The different aspects of differential privacy, its application in privacy protection and leakage of information, a comparative discussion of the current research approaches in this field, its utility in the real world, as well as the trade-offs will be discussed.
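The randomized response technique mentioned above can be sketched in its classic binary form, assuming each respondent reports the true yes/no answer with probability `p_truth` and the opposite answer otherwise. With 0.5 < `p_truth` < 1 this satisfies ε-differential privacy for ε = ln(p_truth / (1 - p_truth)), and the population proportion can still be estimated without bias. All names below are hypothetical, and the chapter itself may use a different variant.

```python
import math
import random

def randomized_response(true_answer, p_truth, rng):
    """Report the true bit with probability p_truth, otherwise its negation."""
    return true_answer if rng.random() < p_truth else not true_answer

def rr_epsilon(p_truth):
    """Privacy parameter of binary randomized response, for 0.5 < p_truth < 1."""
    return math.log(p_truth / (1.0 - p_truth))

def estimate_proportion(reports, p_truth):
    """Unbiased estimate of the true 'yes' share from the noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

# Hypothetical population in which 30% of the true answers are 'yes'.
rng = random.Random(1)
truth = [i < 3000 for i in range(10000)]
reports = [randomized_response(t, 0.75, rng) for t in truth]

print(rr_epsilon(0.75))                    # ln(3), about 1.10
print(estimate_proportion(reports, 0.75))  # close to 0.3, varies with the seed
```

No individual report can be trusted, yet the aggregate remains estimable, which matches the abstract's image of describing the forest without disclosing any individual tree.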
