Frequent Symptom Sets Identification from Uncertain Medical Data in Differentially Private Way

Scientific Programming ◽

10.1155/2017/7545347 ◽

2017 ◽

Vol 2017 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Zhe Ding ◽

Zhen Qin ◽

Zhiguang Qin

Keyword(s):

Data Mining ◽

Data Privacy ◽

Differential Privacy ◽

Uncertain Data ◽

Frequent Itemsets ◽

Medical Data ◽

Patient Data ◽

Background Information ◽

Sensitive Information ◽

Symptom Association Probability

Data mining techniques are applied to identify hidden patterns in large amounts of patient data. These patterns can assist physicians in making more accurate diagnoses. Because patients' physical conditions differ, the same physiological index corresponds to a different symptom association probability for each patient. Data mining technologies designed for certain (deterministic) data therefore cannot be directly applied to these patients' data. Patient data are also sensitive: an adversary with sufficient background information can use the patterns mined from uncertain medical data to obtain sensitive information about patients. In this paper, a new algorithm is presented to determine the top K most frequent itemsets from uncertain medical data while protecting data privacy. Building on traditional algorithms for mining frequent itemsets from uncertain data, our algorithm applies the sparse vector algorithm and the Laplace mechanism to ensure differential privacy for the top K most frequent itemsets in uncertain medical data and for the expected supports of these frequent itemsets. We prove that our algorithm guarantees differential privacy in theory. Moreover, we carry out experiments with four real-world scenario datasets and two synthetic datasets. The experimental results demonstrate the performance of our algorithm.
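
As a rough illustration of two ingredients the abstract names, expected support over uncertain data and the Laplace mechanism, the following Python sketch may help. It is not the authors' algorithm: it omits the sparse vector step and the top-K selection entirely. The function names, the dictionary representation of a patient record, the independence of symptoms within a record, and the sensitivity bound of 1 (one record added or removed, probabilities in [0, 1]) are all assumptions made for this example.

```python
import random
from math import prod


def expected_support(itemset, records):
    """Sum, over records, of the probability that the whole itemset is present.

    Each record maps a symptom to its existence probability. Symptoms are
    assumed independent within a record, so the joint probability is a product.
    """
    return sum(prod(rec.get(item, 0.0) for item in itemset) for rec in records)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_expected_support(itemset, records, epsilon, rng=None):
    """Return the expected support perturbed with Laplace(sensitivity / epsilon).

    Assumption: one record changes the expected support by at most 1.
    random.Random is used for readability only; it is not a secure noise source.
    """
    rng = rng or random.Random()
    sensitivity = 1.0
    return expected_support(itemset, records) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical uncertain records: symptom -> association probability.
records = [
    {"fever": 0.9, "cough": 0.5},
    {"fever": 0.4},
    {"fever": 1.0, "cough": 1.0},
]
exact = expected_support({"fever", "cough"}, records)  # 0.9*0.5 + 0 + 1.0*1.0
noisy = noisy_expected_support({"fever", "cough"}, records, epsilon=1.0)
```

A smaller epsilon gives a larger noise scale and hence stronger privacy at the cost of accuracy. Releasing several itemsets this way would require splitting the privacy budget among them, which this sketch does not attempt.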


The issues connected with the anonymization of medical data. Part 1. The introduction to the anonymization of medical data. Ensuring the protection of sensitive information with the use of such methods as f(a) and f(a,b)

HIGHER SCHOOL’S PULSE ◽

10.5604/01.3001.0003.3155 ◽

2014 ◽

Vol 8 (1) ◽

pp. 13-21 ◽

Cited By ~ 1

Author(s):

ARKADIUSZ LIBER

Keyword(s):

Privacy Protection ◽

Data Privacy ◽

Differential Privacy ◽

Medical Data ◽

Medical Documentation ◽

Sensitive Information ◽

Sensitive Data ◽

New Methods ◽

Anonymized Data ◽

Data Privacy Protection

Introduction: Medical documentation must be protected against damage or loss, with its integrity and credibility preserved, permanent access ensured for authorized staff, and access by unauthorized persons prevented. Anonymization is one of the methods of safeguarding data against disclosure. Aim of the study: The study aims to analyze methods of anonymization, to analyze methods of protecting anonymized data, and to study a new type of privacy protection that enables sensitive data to be controlled by the entity the data concerns. Material and methods: Analytical and algebraic methods were used. Results: The study is intended to deliver materials supporting the choice and analysis of ways of anonymizing medical data, and to develop a new privacy protection solution enabling the control of sensitive data by the entities the data concerns. Conclusions: The paper analyzes data anonymization solutions used for medical data privacy protection. Methods such as k-Anonymity, (X,y)-Anonymity, (a,k)-Anonymity, (k,e)-Anonymity, (X,y)-Privacy, LKC-Privacy, l-Diversity, (X,y)-Linkability, t-Closeness, Confidence Bounding and Personalized Privacy are described, explained and analyzed. Solutions that let owners control their sensitive data are also analyzed. Apart from the existing anonymization methods, methods of protecting anonymized data are analyzed, in particular d-Presence, e-Differential Privacy, (d,g)-Privacy, (a,b)-Distributing Privacy and protections against (c,t)-Isolation. The author introduces a new solution for the controlled protection of privacy. The solution is based on marking a protected field and multi-key encryption of the sensitive value. The suggested way of marking fields is in accordance with the XML standard. An (n,p) different-key cipher was selected for the encryption: p of the n keys are used to decipher the content. The proposed solution makes it possible to apply new methods for controlling the disclosure of sensitive data.
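
The abstract does not specify how the (n,p) cipher works. To make the "p of n keys" idea concrete, the sketch below uses Shamir secret sharing, a well-known threshold scheme swapped in purely for illustration and not claimed to be the author's construction. A secret (for example, a key protecting the sensitive field) is split into n shares so that any p of them recover it. The function names and the choice of prime are assumptions of this example.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1 (an arbitrary choice for this sketch).
PRIME = 2**127 - 1


def split_secret(secret, n, p):
    """Split `secret` into n shares; any p of them suffice to recover it."""
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree p - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(p - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, n=5, p=3)
recovered = recover_secret(shares[:3])  # any three distinct shares work
```

Fewer than p shares leave the secret undetermined, which is the property that lets the data subject decide how many key holders must cooperate before a protected value is disclosed.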


Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch008 ◽

2017 ◽

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar ◽

Ajay R. Dani

Keyword(s):

Data Mining ◽

Big Data ◽

Structure Data ◽

Data Privacy ◽

Differential Privacy ◽

Unstructured Data ◽

Map Reduce ◽

Individual Data ◽

Data Set ◽

Privacy Preserving Data Mining

Because Big Data is a collection of structured, unstructured and semi-structured data gathered from various sources, it is important to mine it while protecting the privacy of individual data. Differential privacy is one of the best measures for providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time for privately mining large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. Analyses of experimental results on structured and unstructured datasets are also presented.


Towards Distributed Association Rule Mining Privacy

Application of Agents and Intelligent Information Technologies - Advances in Intelligent Information Technologies ◽

10.4018/978-1-59904-265-7.ch011 ◽

2011 ◽

pp. 245-271

Author(s):

Mafruz Ashrafi ◽

David Taniar ◽

Kate Smith

Keyword(s):

Data Mining ◽

Data Privacy ◽

Large Data ◽

Digital Data ◽

Sensitive Information ◽

Distributed Data ◽

Data Repositories ◽

Actionable Knowledge ◽

The Cost ◽

Network Technologies

With the advancement of storage, retrieval, and network technologies today, the amount of information available to each organization is literally exploding. However, it is widely recognized that data as an organizational asset often becomes a liability, because the cost to acquire and manage those data is far more than the value derived from them. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful actionable knowledge from it. To explore and analyze large data repositories and discover useful actionable knowledge from them, modern organizations have used a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in them. However, the discovery of hidden patterns has statistical meaning and may often disclose some sensitive information. As a result, privacy has become one of the prime concerns in the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaches of data privacy happen more often than in centralized environments.


Privacy-Preserving Process Mining in Healthcare

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17051612 ◽

2020 ◽

Vol 17 (5) ◽

pp. 1612 ◽

Cited By ~ 4

Author(s):

Anastasiia Pika ◽

Moe T. Wynn ◽

Stephanus Budiono ◽

Arthur H.M. ter Hofstede ◽

Wil M.P. van der Aalst ◽

...

Keyword(s):

Data Mining ◽

Data Privacy ◽

Process Mining ◽

Personal Data ◽

Data Transformation ◽

Privacy Preserving ◽

Sensitive Information ◽

Process Data ◽

Mining Community ◽

Healthcare Process

Process mining has been successfully applied in the healthcare domain and has helped to uncover various insights for improving healthcare processes. While the benefits of process mining are widely acknowledged, many people rightfully have concerns about irresponsible uses of personal data. Healthcare information systems contain highly sensitive information and healthcare regulations often require protection of data privacy. The need to comply with strict privacy requirements may result in a decreased data utility for analysis. Until recently, data privacy issues did not get much attention in the process mining community; however, several privacy-preserving data transformation techniques have been proposed in the data mining community. Many similarities between data mining and process mining exist, but there are key differences that make privacy-preserving data mining techniques unsuitable to anonymise process data (without adaptations). In this article, we analyse data privacy and utility requirements for healthcare process data and assess the suitability of privacy-preserving data transformation methods to anonymise healthcare data. We demonstrate how some of these anonymisation methods affect various process mining results using three publicly available healthcare event logs. We describe a framework for privacy-preserving process mining that can support healthcare process mining analyses. We also advocate the recording of privacy metadata to capture information about privacy-preserving transformations performed on an event log.


Differential privacy based classification model for mining medical data stream using adaptive random forest

Acta Universitatis Sapientiae Informatica ◽

10.2478/ausi-2021-0001 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-20

Author(s):

Hayder K. Fatlawi ◽

Attila Kiss

Keyword(s):

Data Mining ◽

Random Forest ◽

Data Stream ◽

Differential Privacy ◽

Medical Data ◽

The Other ◽

Classification Model ◽

Mining Operations ◽

Typical Data ◽

Stable Performance

Most typical data mining techniques are developed for training on batch data, which makes mining data streams a significant challenge. On the other hand, providing a mechanism to perform data mining operations without revealing the patient's identity is of increasing importance in the data mining field. In this work, a classification model with differential privacy is proposed for mining medical data streams using Adaptive Random Forest (ARF). The experimental results of applying the proposed model to four medical datasets show that ARF mostly has a more stable performance than the other six techniques.


Data Privacy Preservation and Security Approaches for Sensitive Data in Big Data

10.3233/apc210221 ◽

2021 ◽

Author(s):

Rohit Ravindra Nikam ◽

Rekha Shahapurkar

Keyword(s):

Data Mining ◽

Data Analytics ◽

Data Privacy ◽

Privacy Preservation ◽

Large Data ◽

Research Area ◽

Data Sets ◽

Sensitive Information ◽

Sensitive Data ◽

Data Mining Techniques

Data mining is a technique for extracting the necessary data from large data sets. Privacy protection in data mining is about hiding sensitive information or identity without breaching security or losing data usability. Sensitive data contain confidential information about individuals, businesses, and governments that must not be shared or published without their agreement. Preserving privacy in data mining has become a critical research area. Various evaluation metrics, such as time efficiency, data utility, and degree of complexity or resistance to data mining techniques, are used to assess the privacy preservation of data mining techniques. Social media and smartphones produce tons of data every minute. The voluminous data produced by these different sources can be processed and analyzed to support decision making, but data analytics are vulnerable to breaches of privacy. One family of data analytics frameworks is recommendation systems, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which leads to customers being characterized. This paper presents various privacy preservation techniques that existing researchers use, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gap between various processes and privacy preservation methods and illustrate how to overcome such issues with new methods. Finally, we summarize the outcomes of the surveyed literature.


A Differential Privacy Protection Query Language for Medical Data: Study Design (Preprint)

10.2196/preprints.23073 ◽

2020 ◽

Author(s):

Huanhuan Wang ◽

Xiang Wu ◽

Yongqi Tan ◽

Hongsheng Yin ◽

Xiaochun Cheng ◽

...

Keyword(s):

Data Mining ◽

Big Data ◽

Privacy Protection ◽

Modular Design ◽

Differential Privacy ◽

Query Language ◽

Medical Data ◽

Medical Data Mining ◽

Medical Big Data ◽

Mining Algorithms

BACKGROUND Medical data mining and sharing is an important process for realizing the value of medical big data in E-Health applications. However, medical data contain a large amount of patients' personal private information, so there is a risk of privacy disclosure when sharing and mining them. Therefore, how to ensure the security of medical big data in the process of publishing, sharing and mining has become a focus of current research. OBJECTIVE The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy Protection Query Language (PQL) that can integrate multiple machine mining methods and provide secure sharing functions for medical data. METHODS This paper adopts a modular design method with three sub-modules: a parsing module, a mining module and a noising module. Each module encapsulates different computing devices, such as a composite parser and a noise jammer. In the PQL framework, we apply the differential privacy mechanism to the results of the modules' collaborative calculation to optimize the security of various mining algorithms. These computing devices operate independently, but the mining results depend on their cooperation. RESULTS We designed and developed a query language framework that provides medical data mining, sharing and privacy-preserving functions. We theoretically proved the performance of the PQL framework. The experimental results showed that the PQL framework can ensure the security of each mining result, and the average usefulness of the output results is above 97%. CONCLUSIONS We presented a security framework that enables medical data providers to securely share health data or treatment data, and developed a usable query language based on a differential privacy mechanism that enables researchers to mine potential information securely using data mining algorithms.


Overview of Privacy Protection Technology Based on Database Application

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.2816 ◽

2012 ◽

Vol 241-244 ◽

pp. 2816-2821 ◽

Cited By ~ 2

Author(s):

Hai Fang Wei ◽

Bei Zhan Wang ◽

Xiang Deng ◽

Ai Hua Wu

Keyword(s):

Data Mining ◽

Privacy Protection ◽

Data Privacy ◽

Sensitive Information ◽

Basic Principles ◽

Database Application ◽

Research Results ◽

Depth Analysis ◽

Protection Technology ◽

Future Direction

With the emergence and development of data applications such as database and data mining, how to protect data privacy and prevent disclosure of sensitive information has become one of the major challenges we are facing now. Privacy protection technologies need to protect data privacy without compromising data applications. The research results of privacy protection field are summarized, and the basic principles and features of various types of privacy protection technologies are described. After the in-depth analysis and comparison of existing technologies, this paper points out the future direction of the privacy protection technology.


Top-k Frequent Itemsets Publication of Uncertain Data Based on Differential Privacy

Web Information Systems and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60029-7_49 ◽

2020 ◽

pp. 547-558

Author(s):

Yunfeng Zou ◽

Xiaohan Bao ◽

Chao Xu ◽

Weiwei Ni

Keyword(s):

Differential Privacy ◽

Uncertain Data ◽

Frequent Itemsets


Stochastic Channel-Based Federated Learning With Neural Network Pruning for Medical Data Privacy Preservation: Model Development and Experimental Validation (Preprint)

10.2196/preprints.17265 ◽

2019 ◽

Author(s):

Rulin Shao ◽

Hongyu He ◽

Ziwei Chen ◽

Hui Liu ◽

Dianbo Liu

Keyword(s):

Distributed System ◽

Data Privacy ◽

High Performance ◽

Privacy Preservation ◽

Characteristic Curve ◽

Model Development ◽

Medical Data ◽

Performance Model ◽

Security And Privacy ◽

Sensitive Information

BACKGROUND Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to take control over their sensitive information during both the training and using processes. OBJECTIVE To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. METHODS We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm will select the channels with regard to the most active features in a training loop, and then upload them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. RESULTS We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and 0.09695 higher AUC-PR. In addition, our experiment showed that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUC-ROC performance and a reduction of 0.0068 in AUC-PR performance. CONCLUSIONS In this experiment, our model demonstrated better performance and a higher saturating speed than the federated averaging method, which reveals all of the parameters of local models to the server. The saturation rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.
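
The abstract says that only the most active channels (10% in the reported trials) are shared with the server, but it does not define how activity is measured. The Python sketch below shows one plausible reading under stated assumptions: activity is taken to be the mean absolute value of a channel's local update, and a fixed fraction of channels is kept deterministically. The function names, the activity score, and the data layout are hypothetical, and the stochastic aspect of SCBFL is not modeled.

```python
from math import ceil


def channel_activity(update):
    """Score each channel by the mean absolute value of its update (assumed metric)."""
    return [sum(abs(w) for w in channel) / len(channel) for channel in update]


def select_active_channels(update, fraction=0.10):
    """Keep the top `fraction` of channels by activity; return {index: values}."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    scores = channel_activity(update)
    keep = max(1, ceil(fraction * len(update)))
    ranked = sorted(range(len(update)), key=lambda i: scores[i], reverse=True)
    return {i: update[i] for i in sorted(ranked[:keep])}


# Hypothetical local update: 10 channels with 3 weights each; channel 7 is most active.
local_update = [[0.01, -0.02, 0.01] for _ in range(10)]
local_update[7] = [0.9, -1.1, 0.8]
shared = select_active_channels(local_update)  # only channel 7 would be uploaded
```

Only the selected entries would leave the client in such a scheme, which is the contrast the abstract draws with federated averaging, where all local parameters are revealed to the server.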
