Beyond Genes: Re-Identifiability of Proteomic Data and Its Implications for Personalized Medicine

Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 682 ◽  
Author(s):  
Kurt Boonen ◽  
Kristien Hens ◽  
Gerben Menschaert ◽  
Geert Baggerman ◽  
Dirk Valkenborg ◽  
...  

The increasing availability of high-throughput proteomics data presents opportunities as well as new ethical challenges regarding data privacy and the re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further complicates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we differentiate proteomics data from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.

Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems sometimes go unnoticed until the input data are published in a cloud environment. Data privacy preservation in Hadoop deals with hiding input datasets before publishing them to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, we introduce a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. As the input big data for anonymization, 45,222 records of adult census information with 15 attributes were used. Using multidimensional anonymization in the MapReduce framework, we implemented the proposed TPTDS algorithm in Hadoop, which increases the efficiency of the big data processing system. Experiments running the algorithm in both one-dimensional and multidimensional MapReduce frameworks on Hadoop showed better results for multidimensional anonymization on the adult dataset. Datasets are generalized in a top-down manner, and the multidimensional MapReduce framework produced the better IGPL (information gain per privacy loss) values. Anonymization is performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter, and decreases the execution time of big data privacy preservation compared with the existing algorithm. These results suggest the approach will transfer well to distributed environments.
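As a rough illustration of the top-down specialization idea the abstract describes, the sketch below starts from the most general value of a single attribute and greedily reveals taxonomy children while k-anonymity still holds. The tiny job taxonomy, the greedy acceptance rule, and all names here are illustrative stand-ins; they are not the paper's TPTDS algorithm, its IGPL scoring, or its Hadoop implementation.

```python
# Toy top-down specialization for k-anonymity on one categorical attribute.
# The taxonomy and data are illustrative only, not the paper's setup.

# Parent links of a tiny job taxonomy; "Any" is the most general value.
PARENT = {"Engineer": "White-Collar", "Lawyer": "White-Collar",
          "Driver": "Blue-Collar", "Welder": "Blue-Collar",
          "White-Collar": "Any", "Blue-Collar": "Any"}

def ancestors(v):
    """Path from v up to the root, e.g. Engineer -> White-Collar -> Any."""
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def cut_value(orig, cut):
    """Report the most specific ancestor of `orig` that lies in the cut."""
    for a in ancestors(orig):
        if a in cut:
            return a
    return orig

def is_k_anonymous(values, cut, k):
    """True if every equivalence class under the cut has at least k members."""
    counts = {}
    for v in values:
        g = cut_value(v, cut)
        counts[g] = counts.get(g, 0) + 1
    return all(c >= k for c in counts.values())

def top_down_specialize(values, k):
    """Start from the fully generalized cut {"Any"} and specialize greedily
    while every equivalence class still has at least k members."""
    cut = {"Any"}
    changed = True
    while changed:
        changed = False
        for g in sorted(cut):
            children = [c for c, p in PARENT.items() if p == g]
            if not children:
                continue
            trial = (cut - {g}) | set(children)
            if is_k_anonymous(values, trial, k):
                cut = trial
                changed = True
                break
    return [cut_value(v, cut) for v in values]
```

A real TDS implementation would rank candidate specializations by a score such as IGPL rather than accepting the first valid one, and TPTDS additionally splits the work into two MapReduce phases over data partitions.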


Author(s):  
Djuro Josić ◽  
Tamara Martinović ◽  
Urh Černigoj ◽  
Jana Vidič ◽  
Krešimir Pavelić

2015 ◽  
Vol 38 (4) ◽  
Author(s):  
Eva C. Winkler ◽  
Christoph Schickhardt

Abstract: The use of whole genome sequencing in translational research not only holds promise for finding new targeted therapies but also raises several ethical and legal questions. The four main ethical and legal challenges are as follows: (1) the handling of additional or incidental findings stemming from whole genome sequencing in research contexts; (2) the compatibility and balancing of data protection and research that is based on broad data sharing; (3) the responsibility of researchers, particularly of non-physician researchers, working in the field of genome sequencing; and (4) the process of informing and asking patients or research subjects for informed consent to the sequencing of their genome. In this paper, first, these four challenges are illustrated and, second, concrete solutions are proposed, as elaborated by the interdisciplinary Heidelberg EURAT project group, as guidelines for the use of genome sequencing in translational research and therapy in Heidelberg.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 741 ◽  
Author(s):  
Kevin Rue-Albrecht ◽  
Federico Marini ◽  
Charlotte Soneson ◽  
Aaron T.L. Lun

Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots, and code tracking for reproducibility. We demonstrate the utility and flexibility of iSEE by applying it to explore a range of real transcriptomics and proteomics data sets.


2015 ◽  
Author(s):  
Lisa M. Breckels ◽  
Sean Holden ◽  
David Wojnar ◽  
Claire M. Mulvey ◽  
Andy Christoforou ◽  
...  

Abstract: Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve the quantity and quality of sub-cellular protein assignments. We demonstrate the utility of our algorithms through evaluation of five experimental datasets from four different species, in conjunction with four different auxiliary data sources, to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high-resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.

Abbreviations: LOPIT, Localisation of Organelle Proteins by Isotope Tagging; PCP, Protein Correlation Profiling; ML, Machine learning; TL, Transfer learning; SVM, Support vector machine; PCA, Principal component analysis; GO, Gene Ontology; CC, Cellular compartment; iTRAQ, Isobaric tags for relative and absolute quantitation; TMT, Tandem mass tags; MS, Mass spectrometry.
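The core transfer-learning idea of combining a primary MS-profile space with an auxiliary source can be sketched as a weighted nearest-neighbour vote. The sketch below is a minimal illustration under assumed toy data; the feature spaces, the fixed weight `theta`, and all names are hypothetical, not the pRoloc implementation (which learns per-class weights and also offers an SVM variant).

```python
# Weighted nearest-neighbour transfer learning, illustrative only:
# combine class votes from a primary (e.g. MS profile) space and an
# auxiliary (e.g. annotation-derived) space with a weight theta.
import math
from collections import Counter

def knn_votes(train, labels, query, k):
    """Class vote fractions of the k nearest training points (Euclidean)."""
    idx = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in idx)
    return {c: n / k for c, n in votes.items()}

def transfer_knn(primary, auxiliary, labels, q_primary, q_auxiliary,
                 k=3, theta=0.7):
    """Weight primary-space votes by theta and auxiliary-space votes by
    (1 - theta), then assign the class with the largest combined score."""
    vp = knn_votes(primary, labels, q_primary, k)
    va = knn_votes(auxiliary, labels, q_auxiliary, k)
    classes = set(vp) | set(va)
    return max(classes,
               key=lambda c: theta * vp.get(c, 0) + (1 - theta) * va.get(c, 0))
```

Setting `theta = 1` recovers a plain kNN on the primary data, so the auxiliary source can only help where it is given weight; choosing that weight per class via cross-validation is what lets the real framework avoid being misled by poor auxiliary data.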


2020 ◽  
Author(s):  
Thierry Balliau ◽  
Harold Duruflé ◽  
Nicolas Blanchet ◽  
Mélisande Blein-Nicolas ◽  
Nicolas B. Langlade ◽  
...  

Abstract: This article describes how proteomic data were produced from sunflower plants subjected to water deficit. Twenty-four sunflower genotypes were selected to represent the genetic diversity within cultivated sunflower; they included both inbred lines and their hybrids. Water deficit was applied to plants in pots at the vegetative stage using the high-throughput phenotyping platform Heliaphen. Here, we provide proteomic data from sunflower leaves corresponding to the identification of 3062 proteins and the quantification of 1211 of them in these 24 genotypes grown under two watering conditions. These data differentiate both the treatments and the genotypes, and constitute a valuable resource for the community to study the adaptation of crops to drought and the molecular basis of heterosis.


2021 ◽  
Author(s):  
Oliver M. Crook ◽  
Colin T. R. Davies ◽  
Laurent Gatto ◽  
Paul D.W. Kirk ◽  
Kathryn S. Lilley

Abstract: The steady-state localisation of proteins provides vital insight into their function. These localisations are context specific, with proteins translocating between different sub-cellular niches upon perturbation of the subcellular environment. Differential localisation provides a step towards mechanistic insight into subcellular protein dynamics. Aberrant localisation has been implicated in a number of pathologies; thus, differential localisation may help characterise disease states and facilitate rational drug discovery by suggesting novel targets. High-accuracy, high-throughput mass spectrometry-based methods now exist to map the steady-state localisation and re-localisation of proteins. Here, we propose a principled Bayesian approach, BANDLE, that uses these data to compute the probability that a protein differentially localises upon cellular perturbation, as well as quantifying the uncertainty in these estimates. Furthermore, BANDLE allows information to be shared across spatial proteomics datasets to improve statistical power. Extensive simulation studies demonstrate that BANDLE reduces the number of both type I and type II errors compared to existing approaches. Application of BANDLE to datasets studying EGF stimulation and AP-4-dependent localisation recovers well-studied translocations, using only two-thirds of the provided data. Moreover, we implicate TMEM199 in AP-4-dependent localisation. In an application to cytomegalovirus infection, we obtain novel insights into the rewiring of the host proteome. Integration of high-throughput transcriptomic and proteomic data, along with degradation assays, acetylation experiments and a cytomegalovirus interactome, allows us to provide the functional context of these data.
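The quantity BANDLE reports, the probability that a protein differentially localises, can be illustrated with a deliberately simplified calculation: if a sampler yields posterior allocation probabilities over compartments for a protein in each condition, and the two conditions are treated as independent, then the differential-localisation probability is one minus the probability that both conditions allocate the protein to the same compartment. This is only a back-of-envelope sketch; BANDLE's actual joint model shares information across conditions and datasets rather than assuming independence.

```python
# Simplified differential-localisation probability (not BANDLE itself).
# Assumes independent posterior allocations in the two conditions.

def p_differential(post_ctrl, post_treat):
    """post_ctrl, post_treat: dicts mapping compartment -> posterior
    allocation probability for one protein in each condition.
    Returns P(allocations differ) = 1 - sum_c P_ctrl(c) * P_treat(c)."""
    same = sum(p * post_treat.get(c, 0.0) for c, p in post_ctrl.items())
    return 1.0 - same
```

Reporting this probability per protein, rather than a hard yes/no call, is what lets downstream analyses trade off type I against type II errors explicitly.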


2018 ◽  
Vol 17 (10) ◽  
pp. 3431-3444 ◽  
Author(s):  
Becky C. Carlyle ◽  
Robert R. Kitchen ◽  
Jing Zhang ◽  
Rashaun S. Wilson ◽  
Tukiet T. Lam ◽  
...  

Lab on a Chip ◽  
2021 ◽  
Author(s):  
Kunpeng Cai ◽  
Shruti Mankar ◽  
Taiga Ajiri ◽  
Kentaro Shirai ◽  
Tasuku Yotoriyama

There is an increasing need for the enrichment of rare cells in the clinical environments of precision medicine, personalized medicine, and regenerative medicine. With the possibility of becoming the next-generation...


2022 ◽  
Vol 25 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their support for scalability and parallelism. Our critical analysis of overheads shows that neither existing iteration-based nor recursion-based approaches provide effective mechanisms for creating the optimal number and relative sizes of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and high-performance. Our hybrid approach provides a mechanism to create far fewer RDDs, each with smaller partitions, than existing approaches. This optimal approach to RDD creation and operations is critical for the many multi-dimensional data anonymization applications that incur tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
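The Mondrian strategy that this abstract builds on can be sketched in a few lines: recursively split the data on the attribute with the widest value range, at its median, until any further split would leave a side with fewer than k records. The single-machine sketch below is illustrative only; the paper's contribution concerns how such recursive splits are mapped onto Spark RDDs and partitions, which is out of scope here.

```python
# Minimal single-machine sketch of Mondrian-style multi-dimensional
# partitioning for k-anonymity. Records are numeric tuples; each returned
# partition would then be generalized to its attribute ranges.

def mondrian(records, k):
    """Recursively split on the attribute with the widest range, at its
    median; stop when a split would leave a side with fewer than k records."""
    if not records:
        return []
    dims = range(len(records[0]))
    # Pick the dimension with the widest value range in this partition.
    d = max(dims,
            key=lambda j: max(r[j] for r in records) - min(r[j] for r in records))
    vals = sorted(r[d] for r in records)
    median = vals[len(vals) // 2]
    left = [r for r in records if r[d] < median]
    right = [r for r in records if r[d] >= median]
    if len(left) < k or len(right) < k:
        return [records]            # cannot split further; emit this partition
    return mondrian(left, k) + mondrian(right, k)
```

In a distributed setting, each recursive call is a candidate unit of parallel work, which is exactly why the number and size of the RDDs created per level dominate the overheads the paper analyses.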

