scholarly journals Research on improved privacy publishing algorithm based on set cover

2019 ◽  
Vol 16 (3) ◽  
pp. 705-731
Author(s):  
Haoze Lv ◽  
Zhaobin Liu ◽  
Zhonglian Hu ◽  
Lihai Nie ◽  
Weijiang Liu ◽  
...  

With the invention of big data era, data releasing is becoming a hot topic in database community. Meanwhile, data privacy also raises the attention of users. As far as the privacy protection models that have been proposed, the differential privacy model is widely utilized because of its many advantages over other models. However, for the private releasing of multi-dimensional data sets, the existing algorithms are publishing data usually with low availability. The reason is that the noise in the released data is rapidly grown as the increasing of the dimensions. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and promote availability. The main idea is to reduce the dimension of the data set, and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items by considering the effectiveness of query cover combination, and then obtain a regular marginal table cover set with smaller size but higher data availability. Then, a differential privacy model with irregular marginal table is proposed in the application scenario with low data availability and high cover rate. Next, we obtain the approximate optimal marginal table cover algorithm by our analysis to get the query cover set which satisfies the multi-level query policy constraint. Thus, the balance between privacy protection and data availability is achieved. Finally, extensive experiments have been done on synthetic and real databases, demonstrating that the proposed method preforms better than state-of-the-art methods in most cases.

2021 ◽  
Vol 17 (2) ◽  
pp. 155014772199340
Author(s):  
Xiaohui Li ◽  
Yuliang Bai ◽  
Yajun Wang ◽  
Bo Li

Suppressing the trajectory data to be released can effectively reduce the risk of user privacy leakage. However, the global suppression of the data set to meet the traditional privacy model method reduces the availability of trajectory data. Therefore, we propose a trajectory data differential privacy protection algorithm based on local suppression Trajectory privacy protection based on local suppression (TPLS) to provide the user with the ability and flexibility of protecting data through local suppression. The main contributions of this article include as follows: (1) introducing privacy protection method in trajectory data release, (2) performing effective local suppression judgment on the points in the minimum violation sequence of the trajectory data set, and (3) proposing a differential privacy protection algorithm based on local suppression. In the algorithm, we achieve the purpose Maximal frequent sequence (MFS) sequence loss rate in the trajectory data set by effective local inhibition judgment and updating the minimum violation sequence set, and then establish a classification tree and add noise to the leaf nodes to improve the security of the data to be published. Simulation results show that the proposed algorithm is effective, which can reduce the data loss rate and improve data availability while reducing the risk of user privacy leakage.


Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

As Big Data is group of structured, unstructured and semi-structure data collected from various sources, it is important to mine and provide privacy to individual data. Differential Privacy is one the best measure which provides strong privacy guarantee. The chapter proposed differentially private frequent item set mining using map reduce requires less time for privately mining large dataset. The chapter discussed problem of preserving data privacy, different challenges to preserving data privacy in big data environment, Data privacy techniques and their applications to unstructured data. The analyses of experimental results on structured and unstructured data set are also presented.


Sensors ◽  
2018 ◽  
Vol 18 (7) ◽  
pp. 2307 ◽  
Author(s):  
Yancheng Shi ◽  
Zhenjiang Zhang ◽  
Han-Chieh Chao ◽  
Bo Shen

With the rapid development of information technology, large-scale personal data, including those collected by sensors or IoT devices, is stored in the cloud or data centers. In some cases, the owners of the cloud or data centers need to publish the data. Therefore, how to make the best use of the data in the risk of personal information leakage has become a popular research topic. The most common method of data privacy protection is the data anonymization, which has two main problems: (1) The availability of information after clustering will be reduced, and it cannot be flexibly adjusted. (2) Most methods are static. When the data is released multiple times, it will cause personal privacy leakage. To solve the problems, this article has two contributions. The first one is to propose a new method based on micro-aggregation to complete the process of clustering. In this way, the data availability and the privacy protection can be adjusted flexibly by considering the concepts of distance and information entropy. The second contribution of this article is to propose a dynamic update mechanism that guarantees that the individual privacy is not compromised after the data has been subjected to multiple releases, and minimizes the loss of information. At the end of the article, the algorithm is simulated with real data sets. The availability and advantages of the method are demonstrated by calculating the time, the average information loss and the number of forged data.


2016 ◽  
Author(s):  
Dorothee C. E. Bakker ◽  
Benjamin Pfeil ◽  
Camilla S. Landa ◽  
Nicolas Metzl ◽  
Kevin M. O'Brien ◽  
...  

Abstract. The Surface Ocean CO2 Atlas (SOCAT) is a synthesis of quality-controlled fCO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas with regular updates. Version 3 of SOCAT has 14.5 million fCO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.4 million fCO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water fCO2 values per year for the years 2006 to 2012. Quality and documentation of the data has improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water fCO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) "Living Data" publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).


Author(s):  
Quanming Yao ◽  
Xiawei Guo ◽  
James Kwok ◽  
Weiwei Tu ◽  
Yuqiang Chen ◽  
...  

To meet the standard of differential privacy, noise is usually added into the original data, which inevitably deteriorates the predicting performance of subsequent learning algorithms. In this paper, motivated by the success of improving predicting performance by ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done either by sample-based or feature-based partitioning. However, we prove that when privacy-budgets are the same, feature-based partitioning requires fewer samples than sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to be integrated with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it into a real application of cross-organizational diabetes prediction from RUIJIN data set, where privacy is of a significant concern.


2014 ◽  
Vol 43 (4) ◽  
pp. 247-254
Author(s):  
Matthias Templ

The demand of data from surveys, registers or other data sets containing sensibleinformation on people or enterprises have been increased significantly over the last years.However, before providing data to the public or to researchers, confidentiality has to berespected for any data set containing sensible individual information. Confidentiality canbe achieved by applying statistical disclosure control (SDC) methods to the data. Theresearch on SDC methods becomes more and more important in the last years because ofan increase of the awareness on data privacy and because of the fact that more and moredata are provided to the public or to researchers. However, for legal reasons this is onlyvisible when the released data has (very) low disclosure risk.In this contribution existing disclosure risk methods are review and summarized. Thesemethods are finally applied on a popular real-world data set - the Structural EarningsSurvey (SES) of Austria. It is shown that the application of few selected anonymisationmethods leads to well-protected anonymised data with high data utility and low informationloss.


2019 ◽  
Vol 9 (2) ◽  
Author(s):  
Brendan Avent ◽  
Aleksandra Korolova ◽  
David Zeber ◽  
Torgeir Hovden ◽  
Benjamin Livshits

We propose a hybrid model of differential privacy that considers a combination of regular and opt-in users who desire the differential privacy guarantees of the local privacy model and the trusted curator model, respectively. We demonstrate that within this model, it is possible to design a new type of blended algorithm that improves the utility of obtained data, while providing users with their desired privacy guarantees. We apply this algorithm to the task of privately computing the head of the search log and show that the blended approach provides significant improvements in the utility of the data compared to related work. Specifically, on two large search click data sets, comprising 1.75 and 16 GB, respectively, our approach attains NDCG values exceeding 95% across a range of privacy budget values.


2019 ◽  
Vol 9 (18) ◽  
pp. 3801 ◽  
Author(s):  
Hyuk-Yoon Kwon

In this paper, we propose a method to construct a lightweight key-value store based on the Windows native features. The main idea is providing a thin wrapper for the key-value store on top of a built-in storage in Windows, called Windows registry. First, we define a mapping of the components in the key-value store onto the components in the Windows registry. Then, we present a hash-based multi-level registry index so as to distribute the key-value data balanced and to efficiently access them. Third, we implement basic operations of the key-value store (i.e., Get, Put, and Delete) by manipulating the Windows registry using the Windows native APIs. We call the proposed key-value store WR-Store. Finally, we propose an efficient ETL (Extract-Transform-Load) method to migrate data stored in WR-Store into any other environments that support existing key-value stores. Because the performance of the Windows registry has not been studied much, we perform the empirical study to understand the characteristics of WR-Store, and then, tune the performance of WR-Store to find the best parameter setting. Through extensive experiments using synthetic and real data sets, we show that the performance of WR-Store is comparable to or even better than the state-of-the-art systems (i.e., RocksDB, BerkeleyDB, and LevelDB). Especially, we show the scalability of WR-Store. That is, WR-Store becomes much more efficient than the other key-value stores as the size of data set increases. In addition, we show that the performance of WR-Store is maintained even in the case of intensive registry workloads where 1000 processes accessing to the registry actively are concurrently running.


Sign in / Sign up

Export Citation Format

Share Document