An Extended Mondrian Algorithm – XMondrian to Protect Identity Disclosure

Mapping Intimacies ◽

10.3233/apc210088 ◽

2021 ◽

Author(s):

R. Padmaja ◽

V. Santhi

Keyword(s):

Research Area ◽

Data Encryption ◽

Vital Role ◽

Data Publishing ◽

Extended Version ◽

Data Anonymization ◽

Identity Disclosure ◽

Categorical Attributes ◽

Privacy Preserving Data Publishing ◽

Day By Day

Privacy-Preserving Data Publishing (PPDP) has become a vital research area because the volume of data published on the Internet grows day by day. Organizations often need to publish their data on the Internet for research and analysis, but there is no guarantee that the data will be used only for ethical purposes. Data anonymization therefore plays a vital role in preventing identity disclosure, and it also restricts the amount of data that external users can see or use. Among PPDP techniques (data encryption, data anonymization, and data perturbation), anonymization is extensively used. Mondrian is one such data anonymization technique; it has outperformed many other anonymization algorithms because it is fast and scalable. However, the algorithm requires categorical values to be encoded as numerical values, and decoded again afterwards, in order to generalize the data. To overcome this problem, an extended version of the Mondrian algorithm, called XMondrian, is proposed. The proposed algorithm handles both numerical and categorical attributes without encoding or decoding the categorical values. Its effectiveness was analysed through an experimental study, which found that XMondrian outperforms the existing Mondrian algorithm in terms of anonymization time and Cavg, a metric used to quantify the utility of data.
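The abstract does not spell out how Mondrian partitions data or how Cavg is computed, so the following is only a minimal sketch of the baseline idea, not the XMondrian method. It assumes numeric quasi-identifiers only, the common median-split (strict) Mondrian variant, and the usual reading of Cavg as the normalized average equivalence class size, (total records / number of classes) / k. The names `mondrian`, `c_avg`, and the sample records are hypothetical.

```python
def mondrian(records, quasi_identifiers, k):
    """Strict Mondrian sketch: recursively split at the median of the
    widest numeric attribute while both halves keep at least k records."""

    def span(part, attr):
        values = [r[attr] for r in part]
        # Assumption: raw range is used; implementations often normalize
        # by the attribute's range over the whole table.
        return max(values) - min(values)

    def split(part):
        # Try attributes from widest to narrowest range.
        for attr in sorted(quasi_identifiers, key=lambda a: span(part, a), reverse=True):
            values = sorted(r[attr] for r in part)
            median = values[len(values) // 2]
            left = [r for r in part if r[attr] < median]
            right = [r for r in part if r[attr] >= median]
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        return [part]  # no allowable cut: this partition is one equivalence class

    return split(list(records))


def c_avg(partitions, k):
    """Normalized average equivalence class size; 1.0 is the ideal value."""
    total = sum(len(p) for p in partitions)
    return (total / len(partitions)) / k


# Hypothetical sample data: two numeric quasi-identifiers.
records = [
    {"age": 20, "zip": 100}, {"age": 21, "zip": 100},
    {"age": 30, "zip": 100}, {"age": 31, "zip": 100},
    {"age": 40, "zip": 200}, {"age": 41, "zip": 200},
    {"age": 50, "zip": 200}, {"age": 51, "zip": 200},
]
partitions = mondrian(records, ["age", "zip"], k=2)
score = c_avg(partitions, k=2)
print(len(partitions), score)
```

A Cavg close to 1 means the equivalence classes are close to the minimum size k, so less generalization was needed. Handling categorical attributes without encoding, which is what the abstract attributes to XMondrian, is not covered by this sketch.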

f-Slip: An Efficient Privacy-Preserving Data Publishing Framework for 1: M Microdata with Multiple Sensitive Attributes.

10.21203/rs.3.rs-660451/v1 ◽

2021 ◽

Author(s):

Jayapradha J ◽

Prakash M

Keyword(s):

Privacy Preserving ◽

Vital Role ◽

Data Publishing ◽

Slip Model ◽

Correlation Attack ◽

Sensitive Attribute ◽

Utility Loss ◽

Privacy Preserving Data Publishing ◽

Loss Efficiency ◽

Attribute Correlation

Abstract: The privacy of individuals plays a vital role when a dataset is disclosed to the public. Privacy-preserving data publishing is the process of releasing an anonymized dataset for analysis and research. The data to be published contain several sensitive attributes such as diseases, salary, and symptoms. Earlier researchers assumed that a dataset contains only one record per individual [1:1 dataset], an assumption that is too rigid for many applications. Later, many researchers concentrated on datasets in which an individual has multiple records [1:M dataset]. This paper proposes a model, f-slip, that addresses several attacks on 1:M datasets, namely the Background Knowledge (bk) attack, the Multiple Sensitive attribute correlation attack (MSAcorr), the Quasi-identifier correlation attack (QIcorr), the Non-membership correlation attack (NMcorr), and the Membership correlation attack (Mcorr), and presents solutions for them. In f-slip, anatomization divides the table into two subtables consisting of (i) quasi-identifiers and (ii) sensitive attributes. The correlation of the sensitive attributes is computed so that they can be anonymized without breaking the linking relationship. The quasi-identifier table is then divided further, and k-anonymity is applied to it. An efficient anonymization technique, frequency-slicing (f-slicing), was also developed to anonymize the sensitive attributes. The f-slip model remains consistent as the number of records increases. Extensive experiments on the real-world Informs dataset showed that the f-slip model outperforms state-of-the-art techniques in terms of utility loss and efficiency, and achieves an optimal balance between privacy and utility.
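The anatomization step described above can be illustrated with a generic sketch. This is not the f-slip procedure itself: the grouping rule (fixed-size chunks), the function name `anatomize`, and the sample records are assumptions made for illustration. It only shows the general idea of publishing a quasi-identifier table and a sensitive-attribute table that are linked through a group ID rather than row by row.

```python
from collections import Counter


def anatomize(records, quasi_identifiers, sensitive_attrs, group_size):
    """Split records into a quasi-identifier table and a sensitive table.

    Assumption: groups are formed by chunking the input in order; real
    schemes choose groups so that sensitive values within each are diverse.
    """
    qi_table = []
    counts = Counter()
    for index, record in enumerate(records):
        group_id = index // group_size
        row = {"group_id": group_id}
        row.update({attr: record[attr] for attr in quasi_identifiers})
        qi_table.append(row)
        for attr in sensitive_attrs:
            counts[(group_id, attr, record[attr])] += 1

    # Sensitive values are published per group with counts, so a row in the
    # quasi-identifier table cannot be tied to one specific sensitive value.
    sensitive_table = [
        {"group_id": g, "attribute": a, "value": v, "count": c}
        for (g, a, v), c in counts.items()
    ]
    return qi_table, sensitive_table


# Hypothetical sample data.
records = [
    {"age": 23, "zip": "47677", "disease": "flu", "salary": "low"},
    {"age": 27, "zip": "47602", "disease": "asthma", "salary": "high"},
    {"age": 35, "zip": "47905", "disease": "flu", "salary": "high"},
    {"age": 39, "zip": "47909", "disease": "diabetes", "salary": "low"},
]
qi_table, sensitive_table = anatomize(
    records, ["age", "zip"], ["disease", "salary"], group_size=2
)
for row in qi_table:
    print(row)
for row in sensitive_table:
    print(row)
```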

Efficiently Supporting Online Privacy-Preserving Data Publishing in a Distributed Computing Environment

Applied Sciences ◽

10.3390/app112210740 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10740

Author(s):

Jong Kim

Keyword(s):

Personal Information ◽

Privacy Preserving ◽

Online Privacy ◽

Data Publishing ◽

Sensitive Information ◽

Data Anonymization ◽

Query Result ◽

Individual Entity ◽

Privacy Preserving Data Publishing ◽

Increasing Demand

There has recently been an increasing need for the collection and sharing of microdata containing information regarding an individual entity. Because microdata typically contain sensitive information on an individual, releasing it directly for public use may violate existing privacy requirements. Thus, extensive studies have been conducted on privacy-preserving data publishing (PPDP), which ensures that any microdata released satisfy the privacy policy requirements. Most existing privacy-preserving data publishing algorithms consider a scenario in which a data publisher, receiving a request for the release of data containing personal information, anonymizes the data prior to publishing, a process that is usually conducted offline. However, with the increasing demand for the sharing of data among various parties, it is more desirable to integrate the data anonymization functionality into existing systems that are capable of supporting online query processing. Thus, we developed a novel scheme that efficiently anonymizes query results on the fly and thereby supports efficient online privacy-preserving data publishing. In particular, given a user’s query, the proposed approach estimates the generalization level of each quasi-identifier attribute from statistical information, thereby achieving the k-anonymity property in the query result datasets without applying k-anonymity to all actual datasets, which is a costly procedure. The experimental results show that the proposed method achieves significant gains in processing time.
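The idea of estimating a generalization level from statistics rather than from the records themselves can be sketched for a single numeric attribute. This is a simplified illustration under stated assumptions, not the scheme from the paper: it assumes a precomputed value histogram, a fixed list of candidate bucket widths, and a hypothetical function name `estimate_level`.

```python
from collections import Counter


def estimate_level(histogram, widths, k):
    """Return the index of the smallest bucket width at which every
    non-empty bucket holds at least k records, or None if none qualifies.

    histogram: maps an attribute value to its record count (the only
    statistic consulted; the raw records are never scanned here).
    widths: candidate generalization widths, from finest to coarsest.
    """
    for level, width in enumerate(widths):
        buckets = Counter()
        for value, count in histogram.items():
            buckets[value // width] += count
        if all(count >= k for count in buckets.values()):
            return level
    return None


# Hypothetical statistics for an "age" attribute in a query result.
age_histogram = {21: 1, 23: 1, 27: 2, 34: 3, 38: 1}
level = estimate_level(age_histogram, widths=[1, 5, 10], k=2)
print(level)
```

With these sample counts, exact ages and 5-year bands both leave a bucket with a single record, so the 10-year band is the first level that satisfies k = 2. A multi-attribute version would have to account for combinations of quasi-identifiers, which this sketch does not attempt.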

EDAMS: Efficient Data Anonymization Model Selector for Privacy-Preserving Data Publishing

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.3374 ◽

2020 ◽

Vol 10 (2) ◽

pp. 5423-5427

Author(s):

T. Qamar ◽

N. Z. Bawany ◽

N. A. Khan

Keyword(s):

Private Information ◽

Privacy Preserving ◽

Data Publishing ◽

Data Anonymization ◽

Data Collection Process ◽

Drastic Increase ◽

Efficient Data ◽

Privacy Preserving Data Publishing ◽

Optimum Model ◽

Minimal Effort

The evolution of the Internet into the Internet of Things (IoT) has caused an exponential rise in data collection. This drastic increase in the collection of a person’s private information represents a serious threat to his/her privacy. Privacy-Preserving Data Publishing (PPDP) is an area that provides a way of sharing data in an anonymized version, i.e. keeping the identity of a person undisclosed. Various anonymization models are available in the area of PPDP that guard privacy against numerous attacks. However, selecting the optimum model, one that balances utility and privacy, is a challenging process. This study proposes the Efficient Data Anonymization Model Selector (EDAMS) for PPDP, which generates an anonymized dataset optimized in terms of privacy and utility. EDAMS takes the dataset and the required parameters as input and produces an anonymized version by applying PPDP techniques while balancing utility and privacy. EDAMS currently incorporates three PPDP techniques, namely k-anonymity, l-diversity, and t-closeness. It was tested against different variations of three datasets, and the results were validated by testing each variation explicitly with the stated techniques. The results show that EDAMS is effective at selecting the optimum model with minimal effort.
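The three privacy models named above can each be expressed as a check over equivalence classes. The sketch below is illustrative only and says nothing about how EDAMS selects a model. It assumes distinct l-diversity, a categorical sensitive attribute, and the equal-distance form of t-closeness, where the distance between two distributions is half the sum of absolute differences. All function names and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        classes[tuple(record[a] for a in quasi_identifiers)].append(record)
    return list(classes.values())


def is_k_anonymous(classes, k):
    """Every equivalence class contains at least k records."""
    return all(len(c) >= k for c in classes)


def is_l_diverse(classes, sensitive, l):
    """Distinct l-diversity: at least l different sensitive values per class."""
    return all(len({r[sensitive] for r in c}) >= l for c in classes)


def max_closeness_distance(classes, sensitive):
    """Largest distance between a class's sensitive-value distribution and
    the overall one; the table is t-close if this value is at most t."""
    everyone = [r for c in classes for r in c]
    overall = Counter(r[sensitive] for r in everyone)
    worst = 0.0
    for c in classes:
        local = Counter(r[sensitive] for r in c)
        distance = 0.5 * sum(
            abs(local[v] / len(c) - overall[v] / len(everyone)) for v in overall
        )
        worst = max(worst, distance)
    return worst


# Hypothetical, already generalized records.
records = [
    {"age": "20-29", "zip": "100**", "disease": "flu"},
    {"age": "20-29", "zip": "100**", "disease": "cold"},
    {"age": "30-39", "zip": "200**", "disease": "flu"},
    {"age": "30-39", "zip": "200**", "disease": "flu"},
]
classes = equivalence_classes(records, ["age", "zip"])
print(is_k_anonymous(classes, 2))
print(is_l_diverse(classes, "disease", 2))
print(max_closeness_distance(classes, "disease"))
```

The sample table is 2-anonymous but not 2-diverse, because the second class contains only one disease value. This is the kind of gap between models that motivates choosing among them.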

Privacy preserving data publishing and data anonymization approaches: A review

2017 International Conference on Computing, Communication and Automation (ICCCA) ◽

10.1109/ccaa.2017.8229787 ◽

2017 ◽

Cited By ~ 7

Author(s):

Puneet Goswami ◽

Suman Madan

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

Data Anonymization ◽

Privacy Preserving Data Publishing

K-Anonymity Versus L-Diversity: A Comparative Analysis on Data Anonymization Techniques

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.4.14669 ◽

2018 ◽

Vol 7 (3.4) ◽

pp. 24

Author(s):

Dr Sowmyarani C N ◽

Dr Dayananda P

Keyword(s):

Decision Making ◽

Comparative Analysis ◽

Privacy Preserving ◽

Data Publishing ◽

Original Form ◽

Specific Information ◽

Specific Data ◽

Data Anonymization ◽

Privacy Preserving Data Publishing

The main aim of data publishing is to make data available to researchers, scientists, and data analysts, who process it with analytics and statistics that are in turn useful for decision making. In its original form, this data may contain person-specific information that should not be disclosed when the data is published, so the privacy of the individuals concerned should be preserved. Privacy-preserving data publishing therefore plays a major role in protecting person-specific data. The data should be published in such a way that an adversary has no technical means of inferring information about specific individuals. This paper provides an overview of popular privacy-preserving techniques. The concepts behind these techniques are analyzed and justified with suitable examples, and their drawbacks and vulnerabilities to privacy attacks are described.

An optimal dynamic KCi-slice model for privacy preserving data publishing of multiple sensitive attributes adopting various sensitivity thresholds

International Journal of Data Science ◽

10.1504/ijds.2019.105264 ◽

2019 ◽

Vol 4 (4) ◽

pp. 320

Author(s):

N.V.S. Lakshmipathi Raju ◽

M.N. Seetaramanath ◽

P. Srinivasa Rao

Keyword(s):

Privacy Preserving ◽

Data Publishing ◽

Optimal Dynamic ◽

Privacy Preserving Data Publishing

Privacy preserving data publishing of categorical data through k‐anonymity and feature selection

Healthcare Technology Letters ◽

10.1049/htl.2015.0050 ◽

2016 ◽

Vol 3 (1) ◽

pp. 16-21 ◽

Cited By ~ 10

Author(s):

Aristos Aristodimou ◽

Athos Antoniades ◽

Constantinos S. Pattichis

Keyword(s):

Feature Selection ◽

Categorical Data ◽

Privacy Preserving ◽

Data Publishing ◽

Privacy Preserving Data Publishing

Anonymization Based on Improved Bucketization (AIB): A Privacy-Preserving Data Publishing Technique for Improving Data Utility in Healthcare Data

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3901 ◽

2021 ◽

Vol 11 (12) ◽

pp. 3164-3173

Author(s):

R. Indhumathi ◽

S. Sathiya Devi

Keyword(s):

Medical Information ◽

Threshold Value ◽

Privacy Preserving ◽

Data Publishing ◽

Published Data ◽

Sensitive Information ◽

Data Utility ◽

Healthcare Data ◽

Privacy Preserving Data Publishing ◽

Horizontal Partitioning

Data sharing is essential in present-day biomedical research. A large quantity of medical information is gathered for different objectives of analysis and study. Because so much is collected, anonymity is essential, and it is important to preserve privacy and prevent leakage of patients' sensitive information. Most anonymization methods, such as generalisation, suppression, and perturbation, are proposed to prevent information leakage, but they degrade the utility of the collected data: during data sanitization, utility is inevitably diminished. The main challenge in Privacy Preserving Data Publishing is maintaining the trade-off between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The bucketization technique is used in this paper in combination with a clustering method. The proposed work is divided into four stages: (i) vertical and horizontal partitioning, (ii) assigning a sensitive index to the attributes in each cluster, (iii) verifying each cluster against the privacy threshold, and (iv) examining the Quasi Identifier (QI) for privacy breaches. To increase the utility of published data, the threshold value is determined based on the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI element. As a result, data utility is improved. Finally, the evaluation results validated the proposed design and demonstrated that it is effective in improving data utility.
