scholarly journals Sensitivity Levels: Optimizing the Performance of Privacy Preserving DNA Alignment

2018 ◽  
Author(s):  
Maria Fernandes ◽  
Jérémie Decouchant ◽  
Marcus Völp ◽  
Francisco M Couto ◽  
Paulo Esteves-Veríssimo

AbstractThe advent of high throughput next-generation sequencing (NGS) machines made DNA sequencing cheaper, but also put pressure on the genomic life-cycle, which includes aligning millions of short DNA sequences, called reads, to a reference genome. On the performance side, efficient algorithms have been developed, and parallelized on public clouds. On the privacy side, since genomic data are utterly sensitive, several cryptographic mechanisms have been proposed to align reads securely, with a lower performance than the former, which in turn are not secure. This manuscript proposes a novel contribution to improving the privacy performance product in current genomic studies. Building on recent works that argue that genomics data needs to be × treated according to a threat-risk analysis, we introduce a multi-level sensitivity classification of genomic variations. Our classification prevents the amplification of possible privacy attacks, thanks to promoting and partitioning mechanisms among sensitivity levels. Thanks to this classification, reads can be aligned, stored, and later accessed, using different security levels. We then extend a recent filter, which detects the reads that carry sensitive information, to classify reads into sensitivity levels. Finally, based on a review of the existing alignment methods, we show that adapting alignment algorithms to reads sensitivity allows high performance gains, whilst enforcing high privacy levels. Our results indicate that using sensitivity levels is feasible to optimize the performance of privacy preserving alignment, if one combines the advantages of private and public clouds.

Author(s):  
Priya Ranjan ◽  
Raj Kumar Paul

With the increase of digital data on servers different approach of data mining is applied for the retrieval of interesting information in decision making. A major social concern of data mining is the issue of privacy and data security. So privacy preserving mining come in existence, as it validates those data mining algorithms that do not disclose sensitive information. This work provides privacy for sensitive rules that discriminate data on the basis of community, gender, country, etc. Rules are obtained by aprior algorithm of association rule mining. Those rules which contain sensitive item set with minimum threshold value are considered as sensitive. Perturbation technique is used for the hiding of sensitive rules. The age of large database is now a big issue. So researchers try to develop a high performance platform to efficiently secure these kind of data before publishing. Here proposed work has resolve this issue of digital data security by finding the relation between the columns of the dataset which is based on the highly relative association patterns. Here use of super modularity is also done which balance the risk and utilization of the data. Experiment is done on large dataset which have all kind of attribute for implementing proposed work features. The experiments showed that the proposed algorithms perform well on large databases. It work better as the Maximum lost pattern percentage is zero a certain value of support.


2019 ◽  
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Hui Liu ◽  
Dianbo Liu

BACKGROUND Artificial neural network has achieved unprecedented success in a wide variety of domains such as classifying, predicting and recognizing objects. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns and people want to take control over their sensitive information during both training and using processes. OBJECTIVE To address this problem, we propose a privacy-preserving method for the distributed system. The proposed method, Stochastic Channel-Based Federated Learning (SCBF), enables the participants to train a high-performance model cooperatively without sharing their inputs. METHODS Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system. The update algorithm will select the channels with regard to the most active features in a training loop and upload them as learned information from local datasets. A pruning process, which serves as a model accelerator, is applied to the algorithm based on the validation set. RESULTS We construct a distributed system consisting of 5 clients and 1 server. Our trials show that the Stochastic Channel-Based Federated Learning method can achieve an AUCROC of 0.9776 and an AUCPR of 0.9695 with 10% channels shared with the server. Compared with Federated Averaging algorithm, the proposed method achieves 0.05388 higher in AUCROC and 0.09695 higher in AUCPR. In addition, our experiment shows that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUCROC performance and a reduction of 0.0068 in AUCPR. CONCLUSIONS In the experiment, our model presents better performances and higher saturating speed than the Federated Averaging method, which reveals all the parameters of local models to the server. We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.


2019 ◽  
Author(s):  
Nour Almadhoun ◽  
Erman Ayday ◽  
Özgür Ulusoy

Abstract Motivation The rapid progress in genome sequencing has led to high availability of genomic data. However, due to growing privacy concerns about the participant’s sensitive information, accessing results and data of genomic studies is restricted to only trusted individuals. On the other hand, paving the way to biomedical discoveries requires granting open access to genomic databases. Privacy-preserving mechanisms can be a solution for granting wider access to such data while protecting their owners. In particular, there has been growing interest in applying the concept of differential privacy (DP) while sharing summary statistics about genomic data. DP provides a mathematically rigorous approach but it does not consider the dependence between tuples in a database, which may degrade the privacy guarantees offered by the DP. Results In this work, focusing on genomic databases, we show this drawback of DP and we propose techniques to mitigate it. First, using a real-world genomic dataset, we demonstrate the feasibility of an inference attack on differentially private query results by utilizing the correlations between the tuples in the dataset. The results show that the adversary can infer sensitive genomic data about a user from the differentially private query results by exploiting correlations between genomes of family members. Second, we propose a mechanism for privacy-preserving sharing of statistics from genomic datasets to attain privacy guarantees while taking into consideration the dependence between tuples. By evaluating our mechanism on different genomic datasets, we empirically demonstrate that our proposed mechanism can achieve up to 50% better privacy than traditional DP-based solutions. Availability https://github.com/nourmadhoun/Differential-privacy-genomic-inference-attack. Supplementary information Supplementary data are available at Bioinformatics online.


2008 ◽  
Vol 59 (7) ◽  
Author(s):  
Corina Samoila ◽  
Alfa Xenia Lupea ◽  
Andrei Anghel ◽  
Marilena Motoc ◽  
Gabriela Otiman ◽  
...  

Denaturing High Performance Liquid Chromatography (DHPLC) is a relatively new method used for screening DNA sequences, characterized by high capacity to detect mutations/polymorphisms. This study is focused on the Transgenomic WAVETM DNA Fragment Analysis (based on DHPLC separation method) of a 485 bp fragment from human EC-SOD gene promoter in order to detect single nucleotide polymorphism (SNPs) associated with atherosclerosis and risk factors of cardiovascular disease. The fragment of interest was amplified by PCR reaction and analyzed by DHPLC in 100 healthy subjects and 70 patients characterized by atheroma. No different melting profiles were detected for the analyzed DNA samples. A combination of computational methods was used to predict putative transcription factors in the fragment of interest. Several putative transcription factors binding sites from the Ets-1 oncogene family: ETS member Elk-1, polyomavirus enhancer activator-3 (PEA3), protein C-Ets-1 (Ets-1), GABP: GA binding protein (GABP), Spi-1 and Spi-B/PU.1 related transcription factors, from the Krueppel-like family: Gut-enriched Krueppel-like factor (GKLF), Erythroid Krueppel-like factor (EKLF), Basic Krueppel-like factor (BKLF), GC box and myeloid zinc finger protein MZF-1 were identified in the evolutionary conserved regions. The bioinformatics results need to be investigated further in others studies by experimental approaches.


2014 ◽  
Vol 28 (07) ◽  
pp. 1450056 ◽  
Author(s):  
Hua-Lin Cai ◽  
Yi Yang ◽  
Yi-Han Zhang ◽  
Chang-Jian Zhou ◽  
Cang-Ran Guo ◽  
...  

In this paper, a surface acoustic wave (SAW) biosensor with gold delay area on LiNbO 3 substrate detecting DNA sequences is proposed. By well-designed device parameters of the SAW sensor, it achieves a high performance for highly sensitive detection of target DNA. In addition, an effective biological treatment method for DNA immobilization and abundant experimental verification of the sensing effect have made it a reliable device in DNA detection. The loading mass of the probe and target DNA sequences is obtained from the frequency shifts, which are big enough in this work due to an effective biological treatment. The experimental results show that the biosensor has a high sensitivity of 1.2 pg/ml/Hz and high selectivity characteristic is also verified by the few responses of other substances. In combination with wireless transceiver, we develop a wireless receiving and processing system that can directly display the detection results.


2021 ◽  
Vol 11 (12) ◽  
pp. 3164-3173
Author(s):  
R. Indhumathi ◽  
S. Sathiya Devi

Data sharing is essential in present biomedical research. A large quantity of medical information is gathered and for different objectives of analysis and study. Because of its large collection, anonymity is essential. Thus, it is quite important to preserve privacy and prevent leakage of sensitive information of patients. Most of the Anonymization methods such as generalisation, suppression and perturbation are proposed to overcome the information leak which degrades the utility of the collected data. During data sanitization, the utility is automatically diminished. Privacy Preserving Data Publishing faces the main drawback of maintaining tradeoff between privacy and data utility. To address this issue, an efficient algorithm called Anonymization based on Improved Bucketization (AIB) is proposed, which increases the utility of published data while maintaining privacy. The Bucketization technique is used in this paper with the intervention of the clustering method. The proposed work is divided into three stages: (i) Vertical and Horizontal partitioning (ii) Assigning Sensitive index to attributes in the cluster (iii) Verifying each cluster against privacy threshold (iv) Examining for privacy breach in Quasi Identifier (QI). To increase the utility of published data, the threshold value is determined based on the distribution of elements in each attribute, and the anonymization method is applied only to the specific QI element. As a result, the data utility has been improved. Finally, the evaluation results validated the design of paper and demonstrated that our design is effective in improving data utility.


Sign in / Sign up

Export Citation Format

Share Document