Private Genomes and Public SNPs: Homomorphic Encryption of Genotypes and Phenotypes for Shared Quantitative Genetics

Richard Mott; Christian Fischer; Pjotr Prins; Robert William Davies

doi:10.1534/genetics.120.303153

Genetics ◽

10.1534/genetics.120.303153 ◽

2020 ◽

Vol 215 (2) ◽

pp. 359-372

Author(s):

Richard Mott ◽

Christian Fischer ◽

Pjotr Prins ◽

Robert William Davies

Keyword(s):

Mixed Models ◽

Homomorphic Encryption ◽

Orthogonal Transformation ◽

High Dimensional ◽

Brute Force ◽

Genetic Associations ◽

Privacy Concerns ◽

Quantitative Genetic ◽

Heritability Estimation ◽

Phenotype Data

Sharing human genotype and phenotype data is essential to discover otherwise inaccessible genetic associations, but is a challenge because of privacy concerns. Here, we present a method of homomorphic encryption that obscures individuals’ genotypes and phenotypes, and is suited to quantitative genetic association analysis. Encrypted ciphertext and unencrypted plaintext are analytically interchangeable. The encryption uses a high-dimensional random linear orthogonal transformation key that leaves the likelihood of quantitative trait data unchanged under a linear model with normally distributed errors. It also preserves linkage disequilibrium between genetic variants and associations between variants and phenotypes. It scrambles relationships between individuals: encrypted genotype dosages closely resemble Gaussian deviates, and can be replaced by quantiles from a Gaussian with negligible effects on accuracy. Likelihood-based inferences are unaffected by orthogonal encryption. These include linear mixed models to control for unequal relatedness between individuals, heritability estimation, and including covariates when testing association. Orthogonal transformations can be applied in a modular fashion for multiparty federated mega-analyses where the parties first agree to share a common set of genotype sites and covariates prior to encryption. Each then privately encrypts and shares their own ciphertext, and analyses all parties’ ciphertexts. In the absence of private variants, or knowledge of the key, we show that it is infeasible to decrypt ciphertext using existing brute-force or noise-reduction attacks. We present the method as a challenge to the community to determine its security.
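
The invariance claimed in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the key is a random orthogonal matrix `Q` (built here by QR decomposition of a Gaussian matrix, an illustrative choice) applied to both the phenotype vector and the design matrix, and it uses simulated toy data. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5  # toy sizes: individuals, variants

# Simulated plaintext: genotype dosages (0/1/2), an intercept, and a phenotype.
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
X = np.column_stack([np.ones(n), G])
beta_true = rng.normal(size=p + 1)
y = X @ beta_true + rng.normal(size=n)

# Assumed key: a random n x n orthogonal matrix, so Q.T @ Q = I.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))

# "Encrypt" by rotating individuals; variants (columns) stay in place.
X_enc = Q @ X
y_enc = Q @ y


def ols(design, response):
    """Return least-squares coefficients and the residual sum of squares."""
    coef = np.linalg.lstsq(design, response, rcond=None)[0]
    resid = response - design @ coef
    return coef, float(resid @ resid)


b_plain, rss_plain = ols(X, y)
b_enc, rss_enc = ols(X_enc, y_enc)

# Coefficients and residual sum of squares agree up to floating-point error,
# so the Gaussian likelihood is the same for plaintext and ciphertext.
print(np.max(np.abs(b_plain - b_enc)), abs(rss_plain - rss_enc))
```

Because `Q.T @ Q` is the identity, the cross-products `X.T @ X` and `X.T @ y` are unchanged, which is why variant-variant and variant-phenotype associations survive while the rows (individuals) are scrambled.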

Private Genomes and Public SNPs: Homomorphic encryption of genotypes and phenotypes for shared quantitative genetics

10.1101/2020.04.02.021865 ◽

2020 ◽

Author(s):

Richard Mott ◽

Christian Fischer ◽

Pjotr Prins ◽

Robert William Davies

Keyword(s):

Linear Models ◽

Homomorphic Encryption ◽

Orthogonal Transformation ◽

Genetic Associations ◽

Mixed Linear Models ◽

Privacy Concerns ◽

Analytical Perspective ◽

Share Data ◽

Encryption Method ◽

Number Of Parties

Sharing human genotype and phenotype data presents a challenge because of privacy concerns, but is essential in order to discover otherwise inaccessible genetic associations. Here we present a method of homomorphic encryption that obscures individuals’ genotypes and phenotypes and is suited to quantitative genetic association analysis. Encrypted ciphertext and unencrypted plaintext are interchangeable from an analytical perspective. This allows one to store ciphertext on public web services and share data across multiple studies, while maintaining privacy. The encryption method uses as its key a high-dimensional random linear orthogonal transformation that leaves the likelihood of quantitative trait data unchanged under a linear model with normally distributed errors. It also preserves linkage disequilibrium between genetic variants and associations between variants and phenotypes. It scrambles relationships between individuals: encrypted genotype dosages closely resemble Gaussian deviates, and in fact can be replaced by quantiles from a Gaussian with only negligible effects on accuracy. Standard likelihood-based inferences are unaffected by orthogonal encryption. These include the use of mixed linear models to control for unequal relatedness between individuals, the estimation of heritability, and the inclusion of covariates when testing for association. Orthogonal transformations can also be applied in a modular fashion that permits multi-party federated mega-analyses. Under this scheme any number of parties first agree to share a common set of genotype sites and covariates prior to encryption. Each party then privately encrypts and shares their own ciphertext, and analyses the other parties’ ciphertexts. In the absence of private variants, or knowledge of the key, we show that it is infeasible to decrypt ciphertext using existing brute-force or noise reduction attacks. Therefore, we present the method as a challenge to the community to determine its security.
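
The multi-party scheme described in this abstract can be illustrated with a small simulation. This is a hedged sketch under stated assumptions, not the authors' protocol: each party is assumed to hold its own random orthogonal key (generated here by QR decomposition, an illustrative choice), all parties share the same columns (an intercept plus the agreed variant sites), and the data are simulated. The function names `random_orthogonal` and `make_party` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_orthogonal(n, rng):
    """Assumed key generator: orthogonal factor of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q


def make_party(n, beta, rng):
    """Simulate one party's plaintext design matrix and phenotype."""
    G = rng.binomial(2, 0.3, size=(n, len(beta) - 1)).astype(float)
    X = np.column_stack([np.ones(n), G])  # shared covariate: intercept
    y = X @ beta + rng.normal(size=n)
    return X, y


p = 4  # agreed set of variant sites
beta = rng.normal(size=p + 1)
parties = [make_party(n, beta, rng) for n in (60, 80, 100)]

# Each party encrypts privately with its own key and shares only ciphertext.
encrypted = []
for X, y in parties:
    Q = random_orthogonal(X.shape[0], rng)
    encrypted.append((Q @ X, Q @ y))

# Pooled analysis on plaintext (never shared in practice) versus ciphertext.
X_pool = np.vstack([X for X, _ in parties])
y_pool = np.concatenate([y for _, y in parties])
X_enc = np.vstack([X for X, _ in encrypted])
y_enc = np.concatenate([y for _, y in encrypted])

b_plain = np.linalg.lstsq(X_pool, y_pool, rcond=None)[0]
b_enc = np.linalg.lstsq(X_enc, y_enc, rcond=None)[0]
print(np.max(np.abs(b_plain - b_enc)))
```

Stacking the per-party keys gives a block-diagonal matrix, which is itself orthogonal, so the pooled ciphertext yields the same least-squares estimates as the pooled plaintext without any party revealing its key.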

Improving heritability estimation by a variable selection approach in sparse high dimensional linear mixed models

Journal of the Royal Statistical Society Series C (Applied Statistics) ◽

10.1111/rssc.12261 ◽

2018 ◽

Vol 67 (4) ◽

pp. 813-839 ◽

Cited By ~ 3

Author(s):

Anna Bonnet ◽

Céline Lévy‐Leduc ◽

Elisabeth Gassiat ◽

Roberto Toro ◽

Thomas Bourgeron

Keyword(s):

Variable Selection ◽

Mixed Models ◽

Linear Mixed Models ◽

High Dimensional ◽

Heritability Estimation ◽

Selection Approach

Heritability estimation in high dimensional sparse linear mixed models

Electronic Journal of Statistics ◽

10.1214/15-ejs1069 ◽

2015 ◽

Vol 9 (2) ◽

pp. 2099-2129 ◽

Cited By ~ 5

Author(s):

Anna Bonnet ◽

Elisabeth Gassiat ◽

Céline Lévy-Leduc

Keyword(s):

Mixed Models ◽

Linear Mixed Models ◽

High Dimensional ◽

Heritability Estimation

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

F1000Research ◽

10.12688/f1000research.11622.3 ◽

2019 ◽

Vol 6 ◽

pp. 748 ◽

Cited By ~ 11

Author(s):

Malgorzata Nowicka ◽

Carsten Krieg ◽

Helena L. Crowell ◽

Lukas M. Weber ◽

Felix J. Hartmann ◽

...

Keyword(s):

Cell Population ◽

High Throughput ◽

Mixed Models ◽

Exploratory Data Analysis ◽

Linear Mixed Models ◽

High Dimensional ◽

Cell Populations ◽

Dimensional Scaling ◽

Exploratory Data

High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals).

R/qtlcharts: interactive graphics for quantitative trait locus mapping

10.1101/011437 ◽

2014 ◽

Cited By ~ 1

Author(s):

Karl W Broman

Keyword(s):

Quantitative Trait Locus ◽

Quantitative Trait Locus Mapping ◽

Quantitative Trait ◽

Quantitative Traits ◽

R Package ◽

High Dimensional ◽

Interactive Graphics ◽

Phenotype Data ◽

Trait Locus ◽

Locus Mapping

Every data visualization can be improved with some level of interactivity. Interactive graphics hold particular promise for the exploration of high-dimensional data. R/qtlcharts is an R package to create interactive graphics for experiments to map quantitative trait loci (QTL; genetic loci that influence quantitative traits). R/qtlcharts serves as a companion to the R/qtl package, providing interactive versions of R/qtl's static graphs, as well as additional interactive graphs for the exploration of high-dimensional genotype and phenotype data.

General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models

Genetics ◽

10.1534/genetics.115.186536 ◽

2016 ◽

Vol 204 (3) ◽

pp. 1281-1294 ◽

Cited By ~ 70

Author(s):

Pierre de Villemereuil ◽

Holger Schielzeth ◽

Shinichi Nakagawa ◽

Michael Morrissey

Keyword(s):

Mixed Models ◽

Quantitative Genetic ◽

Genetic Inference

Bayesian adaptive lasso with variational Bayes for variable selection in high-dimensional generalized linear mixed models

Communications in Statistics - Simulation and Computation ◽

10.1080/03610918.2017.1387663 ◽

2018 ◽

Vol 48 (2) ◽

pp. 530-543 ◽

Cited By ~ 2

Author(s):

Dao Thanh Tung ◽

Minh-Ngoc Tran ◽

Tran Manh Cuong

Keyword(s):

Variable Selection ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Variational Bayes ◽

Adaptive Lasso ◽

High Dimensional

Refined UNet V2: End-to-End Patch-Wise Network for Noise-Free Cloud and Shadow Segmentation

Remote Sensing ◽

10.3390/rs12213530 ◽

2020 ◽

Vol 12 (21) ◽

pp. 3530

Author(s):

Libin Jiao ◽

Lianzhi Huo ◽

Changmiao Hu ◽

Ping Tang

Keyword(s):

Message Passing ◽

Conditional Random Field ◽

High Dimensional ◽

Gaussian Filter ◽

Brute Force ◽

Time Consumption ◽

End To End ◽

Joint Prediction ◽

Essential Prerequisite ◽

Potential Efficiency

Cloud and shadow detection is an essential prerequisite for further remote sensing processing, whereas edge-precise segmentation remains a challenging issue. In Refined UNet, we considered the aforementioned task and proposed a two-stage pipeline to achieve edge-precise segmentation. The isolated segmentation regions in Refined UNet, however, degrade the visual quality and should be eliminated. Moreover, an end-to-end model is also expected to jointly predict and refine the segmentation results. In this paper, we propose the end-to-end Refined UNet v2 to achieve joint prediction and refinement of cloud and shadow segmentation, which is capable of visually neutralizing redundant segmentation pixels or regions. To this end, we inherit the pipeline of Refined UNet, revisit the bilateral message passing in the inference of conditional random field (CRF), and then develop a novel bilateral strategy derived from the Guided Gaussian filter. Derived from a local linear model of denoising, our v2 can remove a considerable number of isolated segmentation pixels or regions and thus yields “cleaner” results. Compared to the high-dimensional Gaussian filter, the Guided Gaussian filter-based message-passing strategy is quite straightforward and easy to implement so that a brute-force implementation can be easily given in GPU frameworks, which is potentially efficient and facilitates embedding. Moreover, we prove that Guided Gaussian filter-based message passing is highly relevant to the Gaussian bilateral term in Dense CRF. Experiments and results demonstrate that our v2 is quantitatively comparable to Refined UNet, but visually outperforms it from the noise-free segmentation perspective. The comparison of time consumption also supports the potential efficiency of our v2.

An Efficient Fully Homomorphic Encryption Scheme for Private Information Retrieval in the Cloud

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001420550083 ◽

2019 ◽

Vol 34 (04) ◽

pp. 2055008

Author(s):

Xun Wang ◽

Tao Luo ◽

Jianfeng Li

Keyword(s):

Information Retrieval ◽

Theoretical Analysis ◽

Private Information ◽

Homomorphic Encryption ◽

Private Information Retrieval ◽

Encryption Scheme ◽

Fully Homomorphic Encryption ◽

Encrypted Data ◽

Privacy Concerns ◽

Simulation Results

Information retrieval in the cloud is common and convenient. Nevertheless, privacy concerns should not be ignored as the cloud is not fully trustworthy. Fully Homomorphic Encryption (FHE) allows arbitrary operations to be performed on encrypted data, where the decryption of the result of a ciphertext operation equals the result of the corresponding plaintext operation. Thus, FHE schemes can be utilized for private information retrieval (PIR) on encrypted data. In the FHE scheme proposed by Ducas and Micciancio (DM), only a single homomorphic NOT AND (NAND) operation is allowed between consecutive ciphertext refreshings. To address this problem, an improved FHE scheme is proposed for efficient PIR where homomorphic additions and multiplications are based on linear operations on ciphertext vectors. Theoretical analysis shows that, in contrast to the DM scheme, the proposed scheme allows multiple homomorphic additions and a single homomorphic multiplication to be performed. The number of allowed homomorphic additions is determined by the ratio of the ciphertext modulus to the upper bound of initial ciphertext noise. Moreover, simulation results show that the proposed scheme is significantly faster than the DM scheme in the homomorphic evaluation for a series of algorithms.

LiMM‐PCA: Combining ASCA+ and linear mixed models to analyse high‐dimensional designed data

Journal of Chemometrics ◽

10.1002/cem.3232 ◽

2020 ◽

Vol 34 (6) ◽

Cited By ~ 1

Author(s):

Manon Martin ◽

Bernadette Govaerts

Keyword(s):

Mixed Models ◽

Linear Mixed Models ◽

High Dimensional
