The Role of Information Theory in the Field of Big Data Privacy

A Comprehensive Survey on Local Differential Privacy

Security and Communication Networks ◽

10.1155/2020/8829523 ◽

2020 ◽

Vol 2020 ◽

pp. 1-29 ◽

Cited By ~ 1

Author(s):

Xingxing Xiong ◽

Shubo Liu ◽

Dan Li ◽

Zhaohui Cai ◽

Xiaoguang Niu

Keyword(s):

Big Data ◽

Data Analysis ◽

Statistical Learning ◽

Data Privacy ◽

Statistical Estimation ◽

Differential Privacy ◽

Future Research ◽

Reference Source ◽

Complex Data ◽

Comprehensive Survey

With the advent of the era of big data, privacy issues have been becoming a hot topic in public. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that allows to perform big data analysis (e.g., statistical estimation, statistical learning, and data mining) while guaranteeing each individual participant’s privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview on the fundamental knowledge of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of frequency oracle and give insights into recent studied on private basic statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present current research circumstances on LDP including the private statistical learning/inferencing, private statistical data analysis, privacy amplification techniques for LDP, and some application fields under LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a good reference source for the research of LDP to deal with various privacy-related scenarios to be encountered in practice.

Download Full-text

Bought and Sold: Exploring the Effects of Big Data on User Agency and Commodification

10.32920/ryerson.14657883.v1 ◽

2021 ◽

Author(s):

Kristia M. Pavlakos

Keyword(s):

Social Sciences ◽

Big Data ◽

Data Privacy ◽

Large Data ◽

Data Sets ◽

Privacy Regulation ◽

Scholarly Literature ◽

User Interests ◽

The Social ◽

The Relationship

Big Data1is a phenomenon that has been increasingly studied in the academy in recent years, especially in technological and scientific contexts. However, it is still a relatively new field of academic study; because it has been previously considered in mainly technological contexts, more attention needs to be drawn to the contributions made in Big Data scholarship in the social sciences by scholars like Omar Tene and Jules Polonetsky, Bart Custers, Kate Crawford, Nick Couldry, and Jose van Dijk. The purpose of this Major Research Paper is to gain insight into the issues surrounding privacy and user rights, roles, and commodification in relation to Big Data in a social sciences context. The term “Big Data” describes the collection, aggregation, and analysis of large data sets. While corporations are usually responsible for the analysis and dissemination of the data, most of this data is user generated, and there must be considerations regarding the user’s rights and roles. In this paper, I raise three main issues that shape the discussion: how users can be more active agents in data ownership, how consent measures can be made to actively reflect user interests instead of focusing on benefitting corporations, and how user agency can be preserved. Through an analysis of social sciences scholarly literature on Big Data, privacy, and user commodification, I wish to determine how these concepts are being discussed, where there have been advancements in privacy regulation and the prevention of user commodification, and where there is a need to improve these measures. In doing this, I hope to discover a way to better facilitate the relationship between data collectors and analysts, and user-generators. 1 While there is no definitive resolution as to whether or not to capitalize the term “Big Data”, in capitalizing it I chose to conform with such authors as boyd and Crawford (2012), Couldry and Turow (2014), and Dalton and Thatcher (2015), who do so in the scholarly literature.

Download Full-text

Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch008 ◽

2017 ◽

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar ◽

Ajay R. Dani

Keyword(s):

Data Mining ◽

Big Data ◽

Structure Data ◽

Data Privacy ◽

Differential Privacy ◽

Unstructured Data ◽

Map Reduce ◽

Individual Data ◽

Data Set ◽

Privacy Preserving Data Mining

As Big Data is group of structured, unstructured and semi-structure data collected from various sources, it is important to mine and provide privacy to individual data. Differential Privacy is one the best measure which provides strong privacy guarantee. The chapter proposed differentially private frequent item set mining using map reduce requires less time for privately mining large dataset. The chapter discussed problem of preserving data privacy, different challenges to preserving data privacy in big data environment, Data privacy techniques and their applications to unstructured data. The analyses of experimental results on structured and unstructured data set are also presented.

Download Full-text

When Relational-Based Applications Go to NoSQL Databases: A Survey

Information ◽

10.3390/info10070241 ◽

2019 ◽

Vol 10 (7) ◽

pp. 241

Author(s):

Geomar A. Schreiner ◽

Denio Duarte ◽

Ronaldo dos S. Melo

Keyword(s):

Big Data ◽

Comparative Analysis ◽

Relational Databases ◽

Research Area ◽

Relational Data ◽

Data Sets ◽

System Architectures ◽

Nosql Databases ◽

State Of Art ◽

Open Issues

Several data-centric applications today produce and manipulate a large volume of data, the so-called Big Data. Traditional databases, in particular, relational databases, are not suitable for Big Data management. As a consequence, some approaches that allow the definition and manipulation of large relational data sets stored in NoSQL databases through an SQL interface have been proposed, focusing on scalability and availability. This paper presents a comparative analysis of these approaches based on an architectural classification that organizes them according to their system architectures. Our motivation is that wrapping is a relevant strategy for relational-based applications that intend to move relational data to NoSQL databases (usually maintained in the cloud). We also claim that this research area has some open issues, given that most approaches deal with only a subset of SQL operations or give support to specific target NoSQL databases. Our intention with this survey is, therefore, to contribute to the state-of-art in this research area and also provide a basis for choosing or even designing a relational-to-NoSQL data wrapping solution.

Download Full-text

Data Privacy Preservation and Security Approaches for Sensitive Data in Big Data

10.3233/apc210221 ◽

2021 ◽

Author(s):

Rohit Ravindra Nikam ◽

Rekha Shahapurkar

Keyword(s):

Data Mining ◽

Data Analytics ◽

Data Privacy ◽

Privacy Preservation ◽

Large Data ◽

Research Area ◽

Data Sets ◽

Sensitive Information ◽

Sensitive Data ◽

Data Mining Techniques

Data mining is a technique that explores the necessary data is extracted from large data sets. Privacy protection of data mining is about hiding the sensitive information or identity of breach security or without losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments who must not agree upon before sharing or publishing his privacy data. Conserving data mining privacy has become a critical research area. Various evaluation metrics such as performance in terms of time efficiency, data utility, and degree of complexity or resistance to data mining techniques are used to estimate the privacy preservation of data mining techniques. Social media and smart phones produce tons of data every minute. To decision making, the voluminous data produced from the different sources can be processed and analyzed. But data analytics are vulnerable to breaches of privacy. One of the data analytics frameworks is recommendation systems commonly used by e-commerce sites such as Amazon, Flip Kart to recommend items to customers based on their purchasing habits that lead to characterized. This paper presents various techniques of privacy conservation, such as data anonymization, data randomization, generalization, data permutation, etc. such techniques which existing researchers use. We also analyze the gap between various processes and privacy preservation methods and illustrate how to overcome such issues with new innovative methods. Finally, our research describes the outcome summary of the entire literature.

Download Full-text

Research on improved privacy publishing algorithm based on set cover

Computer Science and Information Systems ◽

10.2298/csis180915023l ◽

2019 ◽

Vol 16 (3) ◽

pp. 705-731

Author(s):

Haoze Lv ◽

Zhaobin Liu ◽

Zhonglian Hu ◽

Lihai Nie ◽

Weijiang Liu ◽

...

Keyword(s):

Privacy Protection ◽

Data Privacy ◽

Differential Privacy ◽

Main Idea ◽

Data Availability ◽

Set Cover ◽

Data Sets ◽

Data Set ◽

Query Cover ◽

Privacy Model

With the invention of big data era, data releasing is becoming a hot topic in database community. Meanwhile, data privacy also raises the attention of users. As far as the privacy protection models that have been proposed, the differential privacy model is widely utilized because of its many advantages over other models. However, for the private releasing of multi-dimensional data sets, the existing algorithms are publishing data usually with low availability. The reason is that the noise in the released data is rapidly grown as the increasing of the dimensions. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and promote availability. The main idea is to reduce the dimension of the data set, and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items by considering the effectiveness of query cover combination, and then obtain a regular marginal table cover set with smaller size but higher data availability. Then, a differential privacy model with irregular marginal table is proposed in the application scenario with low data availability and high cover rate. Next, we obtain the approximate optimal marginal table cover algorithm by our analysis to get the query cover set which satisfies the multi-level query policy constraint. Thus, the balance between privacy protection and data availability is achieved. Finally, extensive experiments have been done on synthetic and real databases, demonstrating that the proposed method preforms better than state-of-the-art methods in most cases.

Download Full-text

Bought and Sold: Exploring the Effects of Big Data on User Agency and Commodification

10.32920/ryerson.14657883 ◽

2021 ◽

Author(s):

Kristia M. Pavlakos

Keyword(s):

Social Sciences ◽

Big Data ◽

Data Privacy ◽

Large Data ◽

Data Sets ◽

Privacy Regulation ◽

Scholarly Literature ◽

User Interests ◽

The Social ◽

The Relationship

Big Data1is a phenomenon that has been increasingly studied in the academy in recent years, especially in technological and scientific contexts. However, it is still a relatively new field of academic study; because it has been previously considered in mainly technological contexts, more attention needs to be drawn to the contributions made in Big Data scholarship in the social sciences by scholars like Omar Tene and Jules Polonetsky, Bart Custers, Kate Crawford, Nick Couldry, and Jose van Dijk. The purpose of this Major Research Paper is to gain insight into the issues surrounding privacy and user rights, roles, and commodification in relation to Big Data in a social sciences context. The term “Big Data” describes the collection, aggregation, and analysis of large data sets. While corporations are usually responsible for the analysis and dissemination of the data, most of this data is user generated, and there must be considerations regarding the user’s rights and roles. In this paper, I raise three main issues that shape the discussion: how users can be more active agents in data ownership, how consent measures can be made to actively reflect user interests instead of focusing on benefitting corporations, and how user agency can be preserved. Through an analysis of social sciences scholarly literature on Big Data, privacy, and user commodification, I wish to determine how these concepts are being discussed, where there have been advancements in privacy regulation and the prevention of user commodification, and where there is a need to improve these measures. In doing this, I hope to discover a way to better facilitate the relationship between data collectors and analysts, and user-generators. 1 While there is no definitive resolution as to whether or not to capitalize the term “Big Data”, in capitalizing it I chose to conform with such authors as boyd and Crawford (2012), Couldry and Turow (2014), and Dalton and Thatcher (2015), who do so in the scholarly literature.

Download Full-text

Differential Privacy Approach for Big Data Privacy in Healthcare

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch009 ◽

2017 ◽

pp. 191-213 ◽

Cited By ~ 1

Author(s):

Marmar Moussa ◽

Steven A. Demurjian

Keyword(s):

Big Data ◽

Data Sharing ◽

Data Privacy ◽

Large Scale ◽

Differential Privacy ◽

Security And Privacy ◽

Instrumental Case Study ◽

Large Scale Data ◽

Private Data

This chapter presents a survey of the most important security and privacy issues related to large-scale data sharing and mining in big data with focus on differential privacy as a promising approach for achieving privacy especially in statistical databases often used in healthcare. A case study is presented utilizing differential privacy in healthcare domain, the chapter analyzes and compares the major differentially private data release strategies and noise mechanisms such as the Laplace and the exponential mechanisms. The background section discusses several security and privacy approaches in big data including authentication and encryption protocols, and privacy preserving techniques such as k-anonymity. Next, the chapter introduces the differential privacy concepts used in the interactive and non-interactive data sharing models and the various noise mechanisms used. An instrumental case study is then presented to examine the effect of applying differential privacy in analytics. The chapter then explores the future trends and finally, provides a conclusion.

Download Full-text

Epilogue

Topology: A Very Short Introduction ◽

10.1093/actrade/9780198832683.003.0007 ◽

2019 ◽

pp. 128-130

Author(s):

Richard Earl

Keyword(s):

Big Data ◽

Data Analysis ◽

Research Area ◽

General Topology ◽

Topological Data Analysis ◽

Data Sets ◽

Current Interest ◽

And Topology ◽

Active Research ◽

Active Research Area

Topology remains a large, active research area in mathematics. Unsurprisingly its character has changed over the last century—there is considerably less current interest in general topology, but whole new areas have emerged, such as topological data analysis to help analyze big data sets. The Epilogue concludes that the interfaces of topology with other areas have remained rich and numerous, and it can be hard telling where topology stops and geometry or algebra or analysis or physics begin. Often that richness comes from studying structures that have interconnected flavours of algebra, geometry, and topology, but sometimes a result, seemingly of an entirely algebraic nature say, can be proved by purely topological means.

Download Full-text

Privacy and Data-Based Research

The Journal of Economic Perspectives ◽

10.1257/jep.28.2.75 ◽

2014 ◽

Vol 28 (2) ◽

pp. 75-98 ◽

Cited By ~ 24

Author(s):

Ori Heffetz ◽

Katrina Ligett

Keyword(s):

Big Data ◽

Computer Science ◽

Data Privacy ◽

Differential Privacy ◽

Statistical Databases

What can we, as users of microdata, formally guarantee to the individuals (or firms) in our dataset, regarding their privacy? We retell a few stories, well-known in data-privacy circles, of failed anonymization attempts in publicly released datasets. We then provide a mostly informal introduction to several ideas from the literature on differential privacy, an active literature in computer science that studies formal approaches to preserving the privacy of individuals in statistical databases. We apply some of its insights to situations routinely faced by applied economists, emphasizing big-data contexts.

Download Full-text