Beyond Genes: Re-Identifiability of Proteomic Data and Its Implications for Personalized Medicine

Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 682 ◽  
Author(s):  
Kurt Boonen ◽  
Kristien Hens ◽  
Gerben Menschaert ◽  
Geert Baggerman ◽  
Dirk Valkenborg ◽  
...  

The increasing availability of high-throughput proteomics data presents opportunities as well as new ethical challenges regarding data privacy and the re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further complicates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we differentiate proteomics data from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.

Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most pressing issues in industry today. Data privacy problems sometimes go unnoticed until the input data are published in a cloud environment. Data privacy preservation in Hadoop deals with hiding input datasets before publishing them to the distributed environment. This paper investigates the problem of big data anonymization for privacy preservation from the perspectives of scalability and execution time. At present, many cloud applications that anonymize big data face the same kinds of problems. To address them, we introduce a data anonymization algorithm called Two-Phase Top-Down Specialization (TPTDS), implemented in Hadoop. As the input big data for anonymization, 45,222 records of adult census information with 15 attributes were used. Using multidimensional anonymization in the MapReduce framework, we implemented the proposed TPTDS algorithm in Hadoop, which increases the efficiency of the big data processing system. Experiments running the algorithm in both one-dimensional and multidimensional MapReduce frameworks on Hadoop showed better results for multidimensional anonymization on the adult dataset. Datasets are generalized in a top-down manner, and the multidimensional MapReduce framework produced the better IGPL (information gain per privacy loss) values. Anonymization is performed with specialization operations on a taxonomy tree. The experiments show that the solution improves the IGPL values and the anonymity parameter, and decreases the execution time of big data privacy preservation compared with the existing algorithm. These results suggest the approach will transfer well to distributed environments.
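As a rough illustration of the top-down specialization idea the abstract describes, the sketch below starts from the most general value of a single attribute and greedily reveals taxonomy children while k-anonymity still holds. The tiny job taxonomy, the greedy acceptance rule, and all names here are illustrative stand-ins; they are not the paper's TPTDS algorithm, its IGPL scoring, or its Hadoop implementation.

```python
# Toy top-down specialization for k-anonymity on one categorical attribute.
# The taxonomy and data are illustrative only, not the paper's setup.

# Parent links of a tiny job taxonomy; "Any" is the most general value.
PARENT = {"Engineer": "White-Collar", "Lawyer": "White-Collar",
          "Driver": "Blue-Collar", "Welder": "Blue-Collar",
          "White-Collar": "Any", "Blue-Collar": "Any"}

def ancestors(v):
    """Path from v up to the root, e.g. Engineer -> White-Collar -> Any."""
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def cut_value(orig, cut):
    """Report the most specific ancestor of `orig` that lies in the cut."""
    for a in ancestors(orig):
        if a in cut:
            return a
    return orig

def is_k_anonymous(values, cut, k):
    """True if every equivalence class under the cut has at least k members."""
    counts = {}
    for v in values:
        g = cut_value(v, cut)
        counts[g] = counts.get(g, 0) + 1
    return all(c >= k for c in counts.values())

def top_down_specialize(values, k):
    """Start from the fully generalized cut {"Any"} and specialize greedily
    while every equivalence class still has at least k members."""
    cut = {"Any"}
    changed = True
    while changed:
        changed = False
        for g in sorted(cut):
            children = [c for c, p in PARENT.items() if p == g]
            if not children:
                continue
            trial = (cut - {g}) | set(children)
            if is_k_anonymous(values, trial, k):
                cut = trial
                changed = True
                break
    return [cut_value(v, cut) for v in values]
```

A real TDS implementation would rank candidate specializations by a score such as IGPL rather than accepting the first valid one, and TPTDS additionally splits the work into two MapReduce phases over data partitions.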


Author(s):  
Djuro Josić ◽  
Tamara Martinović ◽  
Urh Černigoj ◽  
Jana Vidič ◽  
Krešimir Pavelić

2015 ◽  
Vol 38 (4) ◽  
Author(s):  
Eva C. Winkler ◽  
Christoph Schickhardt

Abstract: The use of whole genome sequencing in translational research not only holds promise for finding new targeted therapies but also raises several ethical and legal questions. The four main ethical and legal challenges are as follows: (1) the handling of additional or incidental findings stemming from whole genome sequencing in research contexts; (2) the compatibility and balancing of data protection and research that is based on broad data sharing; (3) the responsibility of researchers, particularly of non-physician researchers, working in the field of genome sequencing; and (4) the process of informing and asking patients or research subjects for informed consent to the sequencing of their genome. In this paper, first, these four challenges are illustrated and, second, concrete solutions are proposed, as elaborated by the interdisciplinary Heidelberg EURAT project group, as guidelines for the use of genome sequencing in translational research and therapy in Heidelberg.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 741 ◽  
Author(s):  
Kevin Rue-Albrecht ◽  
Federico Marini ◽  
Charlotte Soneson ◽  
Aaron T.L. Lun

Data exploration is critical to the comprehension of large biological data sets generated by high-throughput assays such as sequencing. However, most existing tools for interactive visualisation are limited to specific assays or analyses. Here, we present the iSEE (Interactive SummarizedExperiment Explorer) software package, which provides a general visual interface for exploring data in a SummarizedExperiment object. iSEE is directly compatible with many existing R/Bioconductor packages for analysing high-throughput biological data, and provides useful features such as simultaneous examination of (meta)data and analysis results, dynamic linking between plots, and code tracking for reproducibility. We demonstrate the utility and flexibility of iSEE by applying it to explore a range of real transcriptomics and proteomics data sets.


2015 ◽  
Author(s):  
Lisa M. Breckels ◽  
Sean Holden ◽  
David Wojnar ◽  
Claire M. Mulvey ◽  
Andy Christoforou ◽  
...  

Abstract: Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve the quantity and quality of sub-cellular protein assignments. We demonstrate the utility of our algorithms through evaluation of five experimental datasets from four different species, in conjunction with four different auxiliary data sources, to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high-resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.

Abbreviations: LOPIT, Localisation of Organelle Proteins by Isotope Tagging; PCP, Protein Correlation Profiling; ML, Machine learning; TL, Transfer learning; SVM, Support vector machine; PCA, Principal component analysis; GO, Gene Ontology; CC, Cellular compartment; iTRAQ, Isobaric tags for relative and absolute quantitation; TMT, Tandem mass tags; MS, Mass spectrometry.
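The core transfer-learning idea of combining a primary MS-profile space with an auxiliary source can be sketched as a weighted nearest-neighbour vote. The sketch below is a minimal illustration under assumed toy data; the feature spaces, the fixed weight `theta`, and all names are hypothetical, not the pRoloc implementation (which learns per-class weights and also offers an SVM variant).

```python
# Weighted nearest-neighbour transfer learning, illustrative only:
# combine class votes from a primary (e.g. MS profile) space and an
# auxiliary (e.g. annotation-derived) space with a weight theta.
import math
from collections import Counter

def knn_votes(train, labels, query, k):
    """Class vote fractions of the k nearest training points (Euclidean)."""
    idx = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))[:k]
    votes = Counter(labels[i] for i in idx)
    return {c: n / k for c, n in votes.items()}

def transfer_knn(primary, auxiliary, labels, q_primary, q_auxiliary,
                 k=3, theta=0.7):
    """Weight primary-space votes by theta and auxiliary-space votes by
    (1 - theta), then assign the class with the largest combined score."""
    vp = knn_votes(primary, labels, q_primary, k)
    va = knn_votes(auxiliary, labels, q_auxiliary, k)
    classes = set(vp) | set(va)
    return max(classes,
               key=lambda c: theta * vp.get(c, 0) + (1 - theta) * va.get(c, 0))
```

Setting `theta = 1` recovers a plain kNN on the primary data, so the auxiliary source can only help where it is given weight; choosing that weight per class via cross-validation is what lets the real framework avoid being misled by poor auxiliary data.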


2020 ◽  
Author(s):  
Thierry Balliau ◽  
Harold Duruflé ◽  
Nicolas Blanchet ◽  
Mélisande Blein-Nicolas ◽  
Nicolas B. Langlade ◽  
...  

Abstract: This article describes how proteomic data were produced from sunflower plants subjected to water deficit. Twenty-four sunflower genotypes were selected to represent the genetic diversity within cultivated sunflower; they included both inbred lines and their hybrids. Water deficit was applied to plants in pots at the vegetative stage using the high-throughput phenotyping platform Heliaphen. Here, we provide proteomic data from sunflower leaves corresponding to the identification of 3062 proteins and the quantification of 1211 of them in these 24 genotypes grown under two watering conditions. These data differentiate both the treatments and the genotypes, and constitute a valuable resource for the community to study the adaptation of crops to drought and the molecular basis of heterosis.


2021 ◽  
Author(s):  
Oliver M. Crook ◽  
Colin T. R. Davies ◽  
Laurent Gatto ◽  
Paul D.W. Kirk ◽  
Kathryn S. Lilley

Abstract: The steady-state localisation of proteins provides vital insight into their function. These localisations are context specific, with proteins translocating between different sub-cellular niches upon perturbation of the subcellular environment. Differential localisation provides a step towards mechanistic insight into subcellular protein dynamics. Aberrant localisation has been implicated in a number of pathologies; thus, differential localisation may help characterise disease states and facilitate rational drug discovery by suggesting novel targets. High-accuracy, high-throughput mass spectrometry-based methods now exist to map the steady-state localisation and re-localisation of proteins. Here, we propose a principled Bayesian approach, BANDLE, that uses these data to compute the probability that a protein differentially localises upon cellular perturbation, as well as quantifying the uncertainty in these estimates. Furthermore, BANDLE allows information to be shared across spatial proteomics datasets to improve statistical power. Extensive simulation studies demonstrate that BANDLE reduces the number of both type I and type II errors compared to existing approaches. Application of BANDLE to datasets studying EGF stimulation and AP-4-dependent localisation recovers well-studied translocations, using only two-thirds of the provided data. Moreover, we implicate TMEM199 in AP-4-dependent localisation. In an application to cytomegalovirus infection, we obtain novel insights into the rewiring of the host proteome. Integration of high-throughput transcriptomic and proteomic data, along with degradation assays, acetylation experiments and a cytomegalovirus interactome, allows us to provide the functional context of these data.
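The quantity BANDLE reports, the probability that a protein differentially localises, can be illustrated with a deliberately simplified calculation: if a sampler yields posterior allocation probabilities over compartments for a protein in each condition, and the two conditions are treated as independent, then the differential-localisation probability is one minus the probability that both conditions allocate the protein to the same compartment. This is only a back-of-envelope sketch; BANDLE's actual joint model shares information across conditions and datasets rather than assuming independence.

```python
# Simplified differential-localisation probability (not BANDLE itself).
# Assumes independent posterior allocations in the two conditions.

def p_differential(post_ctrl, post_treat):
    """post_ctrl, post_treat: dicts mapping compartment -> posterior
    allocation probability for one protein in each condition.
    Returns P(allocations differ) = 1 - sum_c P_ctrl(c) * P_treat(c)."""
    same = sum(p * post_treat.get(c, 0.0) for c, p in post_ctrl.items())
    return 1.0 - same
```

Reporting this probability per protein, rather than a hard yes/no call, is what lets downstream analyses trade off type I against type II errors explicitly.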


2018 ◽  
Vol 17 (10) ◽  
pp. 3431-3444 ◽  
Author(s):  
Becky C. Carlyle ◽  
Robert R. Kitchen ◽  
Jing Zhang ◽  
Rashaun S. Wilson ◽  
Tukiet T. Lam ◽  
...  

Lab on a Chip ◽  
2021 ◽  
Author(s):  
Kunpeng Cai ◽  
Shruti Mankar ◽  
Taiga Ajiri ◽  
Kentaro Shirai ◽  
Tasuku Yotoriyama

There is an increasing need for the enrichment of rare cells in the clinical environments of precision medicine, personalized medicine, and regenerative medicine. With the possibility of becoming the next-generation...


2022 ◽  
Vol 25 (1) ◽  
pp. 1-25 ◽  
Author(s):  
Sibghat Ullah Bazai ◽  
Julian Jang-Jaccard ◽  
Hooman Alavizadeh

Multi-dimensional data anonymization approaches (e.g., Mondrian) ensure more fine-grained data privacy by applying a different anonymization strategy to each attribute. Many variations of multi-dimensional anonymization have been implemented on distributed processing platforms (e.g., MapReduce, Spark) to take advantage of their support for scalability and parallelism. Our critical analysis of overheads shows that neither existing iteration-based nor recursion-based approaches provide effective mechanisms for creating the optimal number and relative sizes of resilient distributed datasets (RDDs), and thus they suffer heavily from performance overheads. To solve this issue, we propose a novel hybrid approach for effectively implementing a multi-dimensional data anonymization strategy (e.g., Mondrian) that is scalable and high-performance. Our hybrid approach provides a mechanism to create far fewer RDDs, each with smaller partitions, than existing approaches. This optimal approach to RDD creation and operations is critical for the many multi-dimensional data anonymization applications that incur tremendous execution complexity. The new mechanism in our proposed hybrid approach can dramatically reduce the critical overheads involved in re-computation cost, shuffle operations, message exchange, and cache management.
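The Mondrian strategy that this abstract builds on can be sketched in a few lines: recursively split the data on the attribute with the widest value range, at its median, until any further split would leave a side with fewer than k records. The single-machine sketch below is illustrative only; the paper's contribution concerns how such recursive splits are mapped onto Spark RDDs and partitions, which is out of scope here.

```python
# Minimal single-machine sketch of Mondrian-style multi-dimensional
# partitioning for k-anonymity. Records are numeric tuples; each returned
# partition would then be generalized to its attribute ranges.

def mondrian(records, k):
    """Recursively split on the attribute with the widest range, at its
    median; stop when a split would leave a side with fewer than k records."""
    if not records:
        return []
    dims = range(len(records[0]))
    # Pick the dimension with the widest value range in this partition.
    d = max(dims,
            key=lambda j: max(r[j] for r in records) - min(r[j] for r in records))
    vals = sorted(r[d] for r in records)
    median = vals[len(vals) // 2]
    left = [r for r in records if r[d] < median]
    right = [r for r in records if r[d] >= median]
    if len(left) < k or len(right) < k:
        return [records]            # cannot split further; emit this partition
    return mondrian(left, k) + mondrian(right, k)
```

In a distributed setting, each recursive call is a candidate unit of parallel work, which is exactly why the number and size of the RDDs created per level dominate the overheads the paper analyses.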

