Research on improved privacy publishing algorithm based on set cover

Haoze Lv; Zhaobin Liu; Zhonglian Hu; Lihai Nie; Weijiang Liu; Xinfeng Ye

doi:10.2298/csis180915023l

Computer Science and Information Systems

2019

Vol 16 (3)

pp. 705-731

Keyword(s):

Privacy Protection

Data Privacy

Differential Privacy

Main Idea

Data Availability

Set Cover

Data Sets

Data Set

Query Cover

Privacy Model

With the advent of the big data era, data release has become a hot topic in the database community. Meanwhile, data privacy has attracted growing attention from users. Among the privacy protection models proposed so far, the differential privacy model is widely used because of its many advantages over other models. However, when multi-dimensional data sets are released privately, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. To address this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and improve availability. The main idea is to reduce the dimensionality of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, and obtain a regular marginal table cover set that is smaller but offers higher data availability. Then, we propose a differential privacy model with irregular marginal tables for application scenarios with low data availability and a high cover rate. Next, through our analysis, we obtain an approximately optimal marginal table cover algorithm that yields a query cover set satisfying the multi-level query policy constraint. In this way, a balance between privacy protection and data availability is achieved. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
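
The abstract combines two steps: choosing a small set of marginal tables that covers the query workload, and releasing those tables with Laplace noise. A minimal sketch of both follows, assuming plain greedy set cover over attribute sets and sequential composition across the released tables. The function names, the greedy rule and the even budget split are illustrative assumptions, not the authors' frequent-item-based or approximately optimal cover algorithm.

```python
import random
from collections import Counter
from itertools import product


def greedy_marginal_cover(queries, candidates):
    """Greedy set cover (assumed strategy): repeatedly pick the candidate
    marginal (a set of attributes) that contains the most uncovered queries."""
    uncovered = [frozenset(q) for q in queries]
    candidates = [frozenset(c) for c in candidates]
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda m: sum(q <= m for q in uncovered))
        if not any(q <= best for q in uncovered):
            raise ValueError("some query is not contained in any candidate marginal")
        chosen.append(best)
        uncovered = [q for q in uncovered if not q <= best]
    return chosen


def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_marginals(records, marginals, domains, epsilon, rng=None):
    """Release each chosen marginal table with Laplace noise.

    Adding or removing one record changes exactly one cell of every table
    by 1, so k tables have L1 sensitivity k; each cell gets scale k / epsilon.
    All cells of the attribute domain are noised, including empty ones.
    """
    rng = rng or random.Random()
    scale = len(marginals) / epsilon
    released = {}
    for m in marginals:
        attrs = sorted(m)
        counts = Counter(tuple(r[a] for a in attrs) for r in records)
        released[tuple(attrs)] = {
            cell: counts[cell] + laplace(scale, rng)
            for cell in product(*(domains[a] for a in attrs))
        }
    return released
```

For example, with queries {a, b}, {b, c}, {a} and candidate marginals {a, b}, {b, c}, the greedy step first picks {a, b} (it covers two queries) and then {b, c}. Noising every cell of the public domain, not only cells that occur in the data, avoids revealing which attribute combinations are present.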

A trajectory data publishing algorithm satisfying local suppression

International Journal of Distributed Sensor Networks

10.1177/1550147721993402

2021

Vol 17 (2)

pp. 155014772199340

Author(s):

Xiaohui Li

Yuliang Bai

Yajun Wang

Bo Li

Keyword(s):

Privacy Protection

Loss Rate

Differential Privacy

Classification Tree

Data Availability

Trajectory Data

User Privacy

Data Set

Privacy Leakage

Protection Method

Suppressing the trajectory data to be released can effectively reduce the risk of user privacy leakage. However, globally suppressing the data set to satisfy a traditional privacy model reduces the availability of trajectory data. Therefore, we propose a differential privacy protection algorithm for trajectory data based on local suppression, called trajectory privacy protection based on local suppression (TPLS), which gives the user the ability and flexibility to protect data through local suppression. The main contributions of this article are as follows: (1) introducing a privacy protection method into trajectory data release, (2) performing effective local suppression judgments on the points in the minimum violation sequences of the trajectory data set, and (3) proposing a differential privacy protection algorithm based on local suppression. In the algorithm, we reduce the maximal frequent sequence (MFS) loss rate of the trajectory data set through effective local suppression judgments and updates to the minimum violation sequence set, and then build a classification tree and add noise to its leaf nodes to improve the security of the data to be published. Simulation results show that the proposed algorithm is effective: it reduces the data loss rate and improves data availability while lowering the risk of user privacy leakage.
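
The local suppression step can be illustrated with a small sketch. It assumes an LKC-privacy-style definition that the abstract does not spell out: a sequence of at most `max_len` points is violating if it occurs in fewer than `k` trajectories, and it is minimal if none of its proper subsequences is violating. The victim-selection rule (suppress the point that appears in the most minimal violating sequences) and all names are hypothetical, and the sketch omits both the MFS bookkeeping and the noisy classification tree.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(q, t):
    """True if the points of q appear in t in the same order (gaps allowed)."""
    it = iter(t)
    return all(p in it for p in q)


def support(q, trajectories):
    """Number of trajectories that contain q as a subsequence."""
    return sum(is_subsequence(q, t) for t in trajectories)


def minimal_violating_sequences(trajectories, k, max_len=2):
    """Sequences of length <= max_len with support below k whose proper
    subsequences all have support >= k (assumed LKC-style definition)."""
    seqs = {c for t in trajectories for n in range(1, max_len + 1)
            for c in combinations(t, n)}
    violating = {q for q in seqs if support(q, trajectories) < k}
    return sorted(q for q in violating
                  if not any(s in violating
                             for n in range(1, len(q))
                             for s in combinations(q, n)))


def local_suppression(trajectories, k, max_len=2):
    """Remove a point only from the trajectories that contain a minimal
    violating sequence, instead of deleting it from the whole data set."""
    trajs = [tuple(t) for t in trajectories]
    while True:
        mvs = minimal_violating_sequences(trajs, k, max_len)
        if not mvs:
            return trajs
        freq = Counter(p for q in mvs for p in set(q))
        q = mvs[0]
        victim = max(q, key=lambda p: freq[p])  # hypothetical greedy rule
        trajs = [tuple(x for x in t if x != victim) if is_subsequence(q, t) else t
                 for t in trajs]
```

For example, with trajectories ("a", "b"), ("a", "b"), ("b", "a") and k = 2, only ("b", "a") is violating, so "b" is removed from the third trajectory alone, giving ("a", "b"), ("a", "b"), ("a",); global suppression of that point would have deleted it from every trajectory. Each round removes at least one point occurrence, so the loop terminates.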

Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics

10.4018/978-1-5225-2486-1.ch008

2017

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar

Ajay R. Dani

Keyword(s):

Data Mining

Big Data

Structure Data

Data Privacy

Differential Privacy

Unstructured Data

Map Reduce

Individual Data

Data Set

Privacy Preserving Data Mining

As big data is a collection of structured, unstructured and semi-structured data gathered from various sources, it is important both to mine it and to protect the privacy of individual data. Differential privacy is one of the best such measures because it provides a strong privacy guarantee. The differentially private frequent itemset mining approach proposed in this chapter uses MapReduce and requires less time to privately mine large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their application to unstructured data. An analysis of experimental results on structured and unstructured data sets is also presented.
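
A minimal single-machine sketch of differentially private item counting in the map/reduce style follows. It is restricted to single items rather than full itemsets, and it assumes that each transaction is truncated to at most `max_items` distinct items, so one transaction changes the count vector by at most `max_items` in L1 norm and the Laplace scale is `max_items / epsilon`. The truncation rule, the threshold and all names are illustrative assumptions, not the chapter's algorithm.

```python
import random
from collections import defaultdict


def map_phase(transaction, max_items):
    """Emit (item, 1) for at most max_items distinct items of one transaction,
    bounding how much a single transaction can change the counts."""
    items = sorted(set(transaction))[:max_items]
    return [(item, 1) for item in items]


def reduce_phase(pairs):
    """Sum the emitted counts per item, as a reducer would."""
    counts = defaultdict(int)
    for item, c in pairs:
        counts[item] += c
    return counts


def dp_frequent_items(transactions, universe, epsilon, threshold, max_items, rng=None):
    """Add Laplace(max_items / epsilon) noise to every item of the public
    universe, then keep the items whose noisy count reaches the threshold."""
    rng = rng or random.Random()
    pairs = [kv for t in transactions for kv in map_phase(t, max_items)]
    counts = reduce_phase(pairs)
    scale = max_items / epsilon
    noisy = {i: counts[i] + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
             for i in universe}
    return {i: c for i, c in noisy.items() if c >= threshold}
```

Thresholding the noisy counts is post-processing and consumes no extra privacy budget; noising every item of a public universe, rather than only items that occur, avoids leaking which items are present.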

Research on Data Privacy Protection of Internet of Vehicles Based on Differential Privacy

IOP Conference Series: Earth and Environmental Science

10.1088/1755-1315/428/1/012007

2020

Vol 428

pp. 012007

Author(s):

Xue Luo

Juan Wang

Jing Xu

Mengting Shen

Keyword(s):

Privacy Protection

Data Privacy

Differential Privacy

Internet Of Vehicles

Data Privacy Protection

Data Privacy Protection Based on Micro Aggregation with Dynamic Sensitive Attribute Updating

Sensors

10.3390/s18072307

2018

Vol 18 (7)

pp. 2307

Author(s):

Yancheng Shi

Zhenjiang Zhang

Han-Chieh Chao

Bo Shen

Keyword(s):

Privacy Protection

Data Privacy

Large Scale

Data Centers

Personal Information

Rapid Development

Data Availability

Personal Privacy

Data Anonymization

Data Privacy Protection

With the rapid development of information technology, large-scale personal data, including data collected by sensors or IoT devices, is stored in the cloud or in data centers. In some cases, the owners of the cloud or data centers need to publish the data. Therefore, how to make the best use of the data despite the risk of personal information leakage has become a popular research topic. The most common method of data privacy protection is data anonymization, which has two main problems: (1) the availability of information is reduced after clustering and cannot be adjusted flexibly; (2) most methods are static, so releasing the data multiple times causes personal privacy leakage. To solve these problems, this article makes two contributions. The first is a new method based on micro-aggregation to complete the clustering process; by considering the concepts of distance and information entropy, it allows data availability and privacy protection to be adjusted flexibly. The second is a dynamic update mechanism that guarantees that individual privacy is not compromised after the data has been released multiple times, while minimizing information loss. At the end of the article, the algorithm is simulated with real data sets, and the availability and advantages of the method are demonstrated by measuring the running time, the average information loss and the amount of forged data.
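
Micro-aggregation replaces each record's quasi-identifiers with the centroid of a group of at least k similar records. A minimal sketch follows, using a simplified MDAV-style heuristic on numeric tuples: repeatedly take the record farthest from the centroid of the remaining records and group it with its k-1 nearest neighbours. This is a standard baseline shown for illustration; it does not implement the article's distance-and-entropy criterion or its dynamic update mechanism. It relies on `math.dist` (Python 3.8+).

```python
import math


def _centroid(points):
    """Component-wise mean of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def microaggregate(records, k):
    """Return one centroid per record; every group has between k and 2k-1 members."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    released = [None] * len(records)
    while len(remaining) >= 2 * k:
        center = _centroid([records[i] for i in remaining])
        # The record farthest from the current centroid seeds the next group.
        far = max(remaining, key=lambda i: math.dist(records[i], center))
        group = sorted(remaining, key=lambda i: math.dist(records[i], records[far]))[:k]
        c = _centroid([records[i] for i in group])
        for i in group:
            released[i] = c
        grouped = set(group)
        remaining = [i for i in remaining if i not in grouped]
    # Fewer than 2k records are left: they form the last group.
    c = _centroid([records[i] for i in remaining])
    for i in remaining:
        released[i] = c
    return released
```

For example, `microaggregate([(1,), (2,), (3,), (10,), (11,), (12,)], 3)` releases (2.0,) for the first three records and (11.0,) for the last three. A larger k strengthens protection at the cost of more information loss, which is the trade-off the article makes adjustable.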

A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT)

10.5194/essd-2016-15

2016

Author(s):

Dorothee C. E. Bakker

Benjamin Pfeil

Camilla S. Landa

Nicolas Metzl

Kevin M. O'Brien

...

Keyword(s):

Carbon Dioxide

Surface Water

Data Collection

Data Availability

Data Sets

Science Data

High Quality

Data Set

Surface Ocean

Biogeochemical Models

Abstract. The Surface Ocean CO2 Atlas (SOCAT) is a synthesis of quality-controlled fCO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas with regular updates. Version 3 of SOCAT has 14.5 million fCO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.4 million fCO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water fCO2 values per year for the years 2006 to 2012. The quality and documentation of the data have improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water fCO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) "Living Data" publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

10.24963/ijcai.2019/571

2019

Author(s):

Quanming Yao

Xiawei Guo

James Kwok

Weiwei Tu

Yuqiang Chen

...

Keyword(s):

Transfer Learning

Differential Privacy

Original Data

Privacy Preserving

Data Sets

Data Set

Predicting Performance

Empirical Performance

Feature Based

Diabetes Prediction

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably degrades the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done with either sample-based or feature-based partitioning. However, we prove that when the privacy budgets are the same, feature-based partitioning requires fewer samples than sample-based partitioning, and thus is likely to have better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction on the RUIJIN data set, where privacy is a significant concern.

Providing Data With High Utility And No Disclosure Risk For The Public and Researchers: An Evaluation By Advanced Statistical Disclosure Risk Methods

Austrian Journal of Statistics

10.17713/ajs.v43i4.43

2014

Vol 43 (4)

pp. 247-254

Author(s):

Matthias Templ

Keyword(s):

Data Privacy

Data Sets

Real World Data

Data Set

The Public

Disclosure Control

High Data

Disclosure Risk

Statistical Disclosure

High Utility

The demand for data from surveys, registers or other data sets containing sensitive information on people or enterprises has increased significantly over the last years. However, before data are provided to the public or to researchers, confidentiality has to be respected for any data set containing sensitive individual information. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data. Research on SDC methods has become more and more important in recent years because of an increased awareness of data privacy and because more and more data are provided to the public or to researchers. However, for legal reasons this is only feasible when the released data has (very) low disclosure risk. In this contribution, existing disclosure risk methods are reviewed and summarized. These methods are finally applied to a popular real-world data set, the Structural Earnings Survey (SES) of Austria. It is shown that the application of a few selected anonymisation methods leads to well-protected anonymised data with high data utility and low information loss.
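
A common starting point for disclosure risk assessment is the sample frequency count of each combination of key (quasi-identifying) variables: records whose combination is shared by fewer than k records are at risk of re-identification. A minimal sketch follows, assuming categorical key variables; the variable names and the default k = 3 are illustrative, and the sketch does not reproduce the more advanced risk measures applied to the SES data.

```python
from collections import Counter


def frequency_counts(records, key_vars):
    """For each record, how many records share its combination of key variables."""
    keys = [tuple(r[v] for v in key_vars) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def records_at_risk(records, key_vars, k=3):
    """Indices of records whose key combination occurs fewer than k times,
    i.e. records that violate k-anonymity on these key variables."""
    return [i for i, f in enumerate(frequency_counts(records, key_vars)) if f < k]
```

Anonymisation methods such as global recoding (for example, coarser age bands) or local suppression of key values would then be applied until no record is flagged, with information loss measured on the result.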

BLENDER: Enabling Local Search with a Hybrid Differential Privacy Model

Journal of Privacy and Confidentiality

10.29012/jpc.680

2019

Vol 9 (2)

Author(s):

Brendan Avent

Aleksandra Korolova

David Zeber

Torgeir Hovden

Benjamin Livshits

Keyword(s):

Local Search

Hybrid Model

Differential Privacy

Data Sets

Privacy Model

New Type

Privacy Budget

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees. We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work. Specifically, on two large search click data sets, comprising 1.75 and 16 GB, respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.
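
The benefit of blending can be illustrated with two unbiased estimates of the same query frequency: one from opt-in users under the trusted curator model (a Laplace-noised count) and one from the remaining users under the local model (binary randomized response). A minimal sketch combines them by inverse-variance weighting, the minimum-variance weighting for two independent unbiased estimates. The worst-case variance bounds, the binary randomized response and all names are simplifying assumptions; this is not BLENDER's actual estimator for the head of the search log.

```python
import math
import random


def central_estimate(opt_in_bits, epsilon, rng):
    """Trusted-curator estimate: Laplace noise (sensitivity 1) on the count."""
    n = len(opt_in_bits)
    b = 1 / epsilon
    noisy = sum(opt_in_bits) + rng.expovariate(1 / b) - rng.expovariate(1 / b)
    # Worst-case sampling variance 1/(4n) plus Laplace variance 2b^2/n^2.
    return noisy / n, 1 / (4 * n) + 2 * b * b / (n * n)


def rr_report(bit, epsilon, rng):
    """Local model: keep the true bit with probability e^eps / (1 + e^eps)."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if rng.random() < p else 1 - bit


def local_estimate(reports, epsilon):
    """Unbiased frequency estimate from randomized-response reports,
    with a worst-case variance bound."""
    n = len(reports)
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    y = sum(reports) / n
    return (y - (1 - p)) / (2 * p - 1), 1 / (4 * n * (2 * p - 1) ** 2)


def blend(est_a, var_a, est_b, var_b):
    """Inverse-variance weighted average of two independent estimates."""
    w_a, w_b = 1 / var_a, 1 / var_b
    return (w_a * est_a + w_b * est_b) / (w_a + w_b)
```

Because randomized response inflates variance by the factor 1/(2p-1)^2, even a small opt-in group can receive a large weight, which is the intuition behind combining the two user populations.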

Constructing a Lightweight Key-Value Store Based on the Windows Native Features

Applied Sciences

10.3390/app9183801

2019

Vol 9 (18)

pp. 3801

Author(s):

Hyuk-Yoon Kwon

Keyword(s):

State Of The Art

Main Idea

Real Data

Data Sets

Parameter Setting

Data Set

Multi Level

Windows Registry

Best Parameter

Better Than

In this paper, we propose a method to construct a lightweight key-value store based on the Windows native features. The main idea is to provide a thin wrapper for the key-value store on top of a built-in storage in Windows, called the Windows registry. First, we define a mapping of the components in the key-value store onto the components in the Windows registry. Second, we present a hash-based multi-level registry index to distribute the key-value data evenly and access it efficiently. Third, we implement the basic operations of the key-value store (i.e., Get, Put, and Delete) by manipulating the Windows registry using the Windows native APIs. We call the proposed key-value store WR-Store. Finally, we propose an efficient ETL (Extract-Transform-Load) method to migrate data stored in WR-Store into any other environment that supports existing key-value stores. Because the performance of the Windows registry has not been studied much, we perform an empirical study to understand the characteristics of WR-Store and then tune the performance of WR-Store to find the best parameter setting. Through extensive experiments using synthetic and real data sets, we show that the performance of WR-Store is comparable to or even better than that of the state-of-the-art systems (i.e., RocksDB, BerkeleyDB, and LevelDB). In particular, we show the scalability of WR-Store: it becomes much more efficient than the other key-value stores as the size of the data set increases. In addition, we show that the performance of WR-Store is maintained even under intensive registry workloads in which 1000 processes actively accessing the registry run concurrently.
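
The hash-based multi-level registry index can be sketched as a function that maps each key to a registry subkey path, so that keys spread evenly over a fixed fan-out tree of subkeys. The root path, the use of SHA-1, two levels of two hex digits (256 subkeys per level) and the in-memory dictionary standing in for the registry are all hypothetical choices; the sketch does not call the Windows registry APIs and is not WR-Store's actual layout.

```python
import hashlib


class RegistryIndexSketch:
    """Hypothetical in-memory model of a hash-based multi-level registry index.

    A dict keyed by (subkey path, value name) stands in for the registry.
    """

    def __init__(self, root=r"Software\WRStoreSketch", levels=2, digits=2):
        self.root, self.levels, self.digits = root, levels, digits
        self._store = {}

    def subkey_for(self, key):
        """Each level uses the next `digits` hex characters of the key's hash,
        giving 16**digits subkeys per level."""
        h = hashlib.sha1(key.encode("utf-8")).hexdigest()
        parts = [h[i * self.digits:(i + 1) * self.digits] for i in range(self.levels)]
        return "\\".join([self.root, *parts])

    def put(self, key, value):
        self._store[(self.subkey_for(key), key)] = value

    def get(self, key):
        return self._store.get((self.subkey_for(key), key))

    def delete(self, key):
        """Return True if the key existed and was removed."""
        return self._store.pop((self.subkey_for(key), key), None) is not None
```

On Windows, the operations could instead be backed by Python's standard `winreg` module (`CreateKey`, `SetValueEx`, `QueryValueEx`, `DeleteValue`), storing each key as a value name under its computed subkey; hashing keeps the number of values per subkey small as the data set grows.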

Trajectory data privacy protection based on differential privacy mechanism

IOP Conference Series: Materials Science and Engineering

10.1088/1757-899x/351/1/012017

2018

Vol 351

pp. 012017

Author(s):

Ke Gu

Lihao Yang

Yongzhi Liu

Niandong Liao

Keyword(s):

Privacy Protection

Data Privacy

Differential Privacy

Trajectory Data

Data Privacy Protection
