Increasing the Efficiency of Genome-wide Association Mapping via Hidden Markov Models

Mapping Intimacies ◽

10.1101/039099 ◽

2016 ◽

Author(s):

Hong Gao ◽

Hua Tang ◽

Carlos Bustamante

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Large Scale ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Trend Test ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

With the rapid production of high-dimensional genetic data, one major challenge in genome-wide association studies is to develop effective and efficient statistical tools that overcome the low power to detect causal SNPs with low to moderate effects, whose signals are often obscured by substantial background noise. Here we present a novel method for reducing background noise and improving detection power in genome-wide association studies. The approach uses a hidden Markov model and its derivative, the Markov hidden Markov model, to estimate the posterior probability of each marker being in an associated state. We conducted extensive simulations based on human whole-genome genotype data from the GlaxoSmithKline-POPRES project to calibrate the sensitivity and specificity of our method, and compared it with many popular approaches for detecting positive signals, including the χ² test for association and the Cochran-Armitage trend test. Our simulation results suggest that at very low false positive rates (<10^-6), our method reaches a power of 0.9 and is more powerful than the other approaches compared when the allelic effect of the causal variant is non-additive or unknown. Application of our method to the data set generated by the Wellcome Trust Case Control Consortium, comprising 14,000 cases and 3,000 controls, confirmed its power and efficiency in the context of large-scale genome-wide association studies.
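The central step described above, turning per-marker association statistics into a posterior probability that each marker sits in an associated state, can be sketched with a generic two-state hidden Markov model and the forward-backward algorithm. This is a minimal illustration under assumptions not taken from the paper: markers are ordered along the chromosome, each carries a test statistic z, emissions are Gaussian (mean 0 in the null state, an assumed mean `mu_assoc` in the associated state), and `p_stay`, `pi_assoc` and `mu_assoc` are hypothetical fixed values rather than estimated ones. It does not implement the paper's Markov hidden Markov model variant.

```python
import numpy as np

def posterior_associated(z, p_stay=0.99, pi_assoc=0.01, mu_assoc=3.0, sd=1.0):
    """Posterior P(associated) per marker under a two-state HMM.

    State 0 = null, state 1 = associated (hypothetical parameterization).
    Emissions: z ~ N(0, sd^2) in state 0 and N(mu_assoc, sd^2) in state 1.
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    # Each hidden state persists to the next marker with probability p_stay
    A = np.array([[p_stay, 1 - p_stay],
                  [1 - p_stay, p_stay]])
    init = np.array([1 - pi_assoc, pi_assoc])

    def density(x, mu):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    B = np.column_stack([density(z, 0.0), density(z, mu_assoc)])  # shape (n, 2)

    # Forward pass, rescaled at every marker to avoid underflow
    alpha = np.zeros((n, 2))
    alpha[0] = init * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also rescaled per marker (scales cancel in the posterior)
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma[:, 1]

# Usage: a short run of large statistics at markers 3 to 5
z = [0.1, -0.3, 0.2, 3.5, 4.0, 3.2, 0.0, -0.5]
print(np.round(posterior_associated(z), 3))
```

The clustered large statistics receive posteriors close to 1 while isolated small ones stay close to 0; borrowing strength from neighbouring markers in this way is what lets an HMM separate correlated signals from background noise. A fuller version would estimate the parameters (for example by EM) and work in log space so that extreme statistics cannot underflow both emission densities.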

A Novel Hidden Markov Model for Genome-Wide Association Studies

2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C) ◽

10.1109/qrs-c.2017.86 ◽

2017 ◽

Author(s):

Junli Yang ◽

Bo Song ◽

Bing Yan ◽

Guoqiang Li

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Large-scale multiple testing in genome-wide association studies via region-specific hidden Markov models

BMC Bioinformatics ◽

10.1186/1471-2105-14-282 ◽

2013 ◽

Vol 14 (1) ◽

pp. 282 ◽

Cited By ~ 4

Author(s):

Jian Xiao ◽

Wensheng Zhu ◽

Jianhua Guo

Keyword(s):

Hidden Markov Models ◽

Multiple Testing ◽

Large Scale ◽

Markov Models ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Summary: We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to analyze large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, through which experimental designs that may include replications, treatments, locations and times can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data sets of up to 10 million markers and 10,000 samples from replicable lines or hybrids, and provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.
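The design-matrix idea can be sketched in a few lines: categorical factors such as treatment and location are dummy-coded into columns, the marker's genotype dosage is appended, and the marker effect is estimated jointly with the design effects. This is a hypothetical simplification, not GWASpro's implementation: it fits ordinary least squares with fixed effects only, whereas the abstract describes a linear mixed model, which would additionally include random effects. The function names `design_matrix` and `marker_test` are illustrative.

```python
import numpy as np

def design_matrix(factors, dosage):
    """Intercept + dummy columns per categorical factor (first level as reference) + dosage."""
    n = len(dosage)
    cols = [np.ones(n)]
    for levels in factors:
        levels = np.asarray(levels)
        for level in np.unique(levels)[1:]:  # drop the reference level to avoid collinearity
            cols.append((levels == level).astype(float))
    cols.append(np.asarray(dosage, dtype=float))  # marker genotype coded 0/1/2
    return np.column_stack(cols)

def marker_test(y, X):
    """OLS estimate and t statistic for the last column (the marker)."""
    y = np.asarray(y, dtype=float)
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - rank)
    se = np.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[-1, -1])
    return beta[-1], beta[-1] / se

# Usage with simulated replicated data (2 treatments x 3 locations, 10 replicates each)
rng = np.random.default_rng(0)
treatment = np.repeat(["control", "drought"], 30)
location = np.tile(np.repeat(["L1", "L2", "L3"], 10), 2)
dosage = rng.integers(0, 3, size=60)
y = (1.0 + 0.5 * (treatment == "drought") + 0.3 * (location == "L2")
     + 0.8 * dosage + rng.normal(0, 0.1, size=60))
X = design_matrix([treatment, location], dosage)
effect, t_stat = marker_test(y, X)
print(f"marker effect ~ {effect:.2f}, t = {t_stat:.1f}")
```

Adding the design factors as columns keeps treatment and location differences from being absorbed into the marker effect; in a mixed-model setting, the same fixed-effect matrix would be combined with a random-effect structure.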

Detection of structural variations in densely-labelled optical DNA barcodes: A hidden Markov model approach

PLoS ONE ◽

10.1371/journal.pone.0259670 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0259670

Author(s):

Albertas Dvirnas ◽

Callum Stewart ◽

Vilhelm Müller ◽

Santosh Kumar Bikkarolla ◽

Karolin Frykholm ◽

...

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Large Scale ◽

Hidden Markov ◽

Sequence Information ◽

True Positive ◽

Dna Barcodes ◽

Structural Variations ◽

Genomic Alterations ◽

Data Set

Large-scale genomic alterations play an important role in disease, gene expression, and chromosome evolution. Optical DNA mapping (ODM), commonly categorized into sparsely-labelled ODM and densely-labelled ODM, provides sequence-specific continuous intensity profiles (DNA barcodes) along single DNA molecules and is a technique well-suited for detecting such alterations. For sparsely-labelled barcodes, the ability to detect large genomic alterations has been investigated extensively, while densely-labelled barcodes have received less attention. In this work, we introduce HMMSV, a hidden Markov model (HMM) based algorithm for detecting structural variations (SVs) directly in densely-labelled barcodes without access to sequence information. We evaluate our approach using simulated data sets with 5 different types of SVs, and combinations thereof, and demonstrate that the method reaches a true positive rate greater than 80% for randomly generated barcodes with single variations of size 25 kilobases (kb). Longer SVs lead to even higher true positive rates. For a real data set with experimental barcodes on bacterial plasmids, we successfully detect matching barcode pairs and SVs without any particular assumption about the types of SVs present; instead, our method effectively goes through all possible combinations of SVs. Since ODM works on length scales typically not reachable with other techniques, our methodology is a promising tool for identifying arbitrary combinations of genomic alterations.

Replicability analysis in genome-wide association studies via Cartesian hidden Markov models

BMC Bioinformatics ◽

10.1186/s12859-019-2707-7 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Pengfei Wang ◽

Wensheng Zhu

Keyword(s):

Hidden Markov Models ◽

Markov Models ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Genome-wide association study of agronomic traits in bread wheat reveals novel putative alleles for future breeding programs

BMC Plant Biology ◽

10.1186/s12870-019-2165-4 ◽

2019 ◽

Vol 19 (1) ◽

Cited By ~ 11

Author(s):

Yousef Rahimi ◽

Mohammad Reza Bihamta ◽

Alireza Taleei ◽

Hadi Alipour ◽

Pär K. Ingvarsson

Keyword(s):

Bread Wheat ◽

Genome Wide Association Study ◽

Agronomic Traits ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Wheat Varieties ◽

Data Set ◽

Protein Coding ◽

Genome Wide

Background: Identification of loci for agronomic traits and characterization of their genetic architecture are crucial in marker-assisted selection (MAS). Genome-wide association studies (GWAS) have increasingly been used as potent tools for identifying marker-trait associations (MTAs). Introducing new adaptive alleles into diverse genetic backgrounds may help to improve the grain yield of old or newly developed wheat varieties and so balance supply and demand throughout the world. Landraces collected from different climate zones can be an invaluable resource for such adaptive alleles.
Results: GWAS was performed on a collection of 298 Iranian bread wheat varieties and landraces to explore the genetic basis of agronomic traits during the 2016–2018 cropping seasons under normal (well-watered) and stressed (rain-fed) conditions. A high-quality genotyping-by-sequencing (GBS) dataset was obtained using either all original single-nucleotide polymorphisms (SNPs; 10,938 SNPs) or additional imputation (46,862 SNPs) based on the W7984 reference genome. The results confirm that the B genome carries the highest number of significant marker pairs in both varieties (49,880, 27.37%) and landraces (55,086, 28.99%). The strongest linkage disequilibrium (LD) between pairs of markers was observed on chromosome 2D (0.296). LD decay was lower in the D genome than in the A and B genomes. Association mapping under the two tested environments yielded a total of 313 and 394 significant (−log10P > 3) MTAs for the original and imputed SNP data sets, respectively. Gene ontology results showed that 27 and 27.5% of MTAs of SNPs in the original set were located in protein-coding regions for the well-watered and rain-fed conditions, respectively, whereas for the imputed data set 22.6 and 16.6% of MTAs were located in protein-coding genes for the well-watered and rain-fed conditions, respectively.
Conclusions: Our findings suggest that Iranian bread wheat landraces harbor valuable alleles that are adaptive under drought stress conditions. MTAs located within coding genes can be utilized in genome-based breeding of new wheat varieties. Although imputation of missing data increased the number of MTAs, the fraction of these MTAs located in coding genes decreased across the different sub-genomes.
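Two quantities reported above, pairwise LD between markers and the −log10P > 3 significance cut-off for MTAs, can be computed as in the sketch below. The abstract does not say which LD estimator was used; this sketch assumes the squared Pearson correlation between unphased genotype dosages (0/1/2), a common approximation to haplotype-based r². The function names are hypothetical.

```python
import numpy as np

def ld_r2(g1, g2):
    """Squared correlation of two genotype dosage vectors (undefined if either is monomorphic)."""
    r = np.corrcoef(np.asarray(g1, float), np.asarray(g2, float))[0, 1]
    return r * r

def significant_mtas(pvalues, threshold=3.0):
    """Boolean mask of marker-trait associations with -log10(P) above the threshold."""
    return -np.log10(np.asarray(pvalues, float)) > threshold

# Usage with hypothetical dosages and p-values
g1 = [0, 1, 2, 2, 0, 1, 2, 0]
g2 = [0, 1, 2, 1, 0, 1, 2, 0]
print(round(ld_r2(g1, g2), 3))
print(significant_mtas([2e-5, 4e-4, 0.003, 0.2]))
```

A marker and its allele-flipped copy (2 − dosage) give r² = 1, since the sign of the correlation is discarded; LD decay is then usually summarized by plotting these r² values against the physical distance between marker pairs.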

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our framework is more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, while the accuracy of the two solutions is similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
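The property that makes such analyses possible, computing on ciphertexts without decrypting them, can be illustrated with a toy additively homomorphic scheme. This sketch uses textbook Paillier encryption with tiny, insecure primes to sum encrypted genotype dosages; it is not the encryption scheme or GWAS pipeline used in the paper, which evaluates far more complex statistics, and the function names are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure; illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer 0 <= m < n as (n + 1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m reduces to 1 + m*n modulo n^2
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(n, key, c):
    lam, mu = key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

# Usage: a server sums encrypted dosages without seeing any individual value
rng = random.Random(42)
n, key = keygen()
dosages = [0, 1, 2, 1, 0, 2, 2, 1]  # hypothetical genotypes, one per individual
total = encrypt(n, 0, rng)
for d in dosages:
    total = add_encrypted(n, total, encrypt(n, d, rng))
print(decrypt(n, key, total))  # prints 9
```

Only the key holder can decrypt the aggregate; the party doing the additions handles ciphertexts throughout. Real deployments need keys thousands of bits long and, for regression-style GWAS statistics, schemes that also support multiplication.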

A critical evaluation of results from genome-wide association studies of micronutrient status and their utility in the practice of precision nutrition

British Journal Of Nutrition ◽

10.1017/s0007114519001119 ◽

2019 ◽

Vol 122 (2) ◽

pp. 121-130 ◽

Cited By ~ 2

Author(s):

Marie-Joe Dib ◽

Ruan Elliott ◽

Kourosh R. Ahmadi

Keyword(s):

Large Scale ◽

Association Studies ◽

Critical Evaluation ◽

Water Soluble ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Micronutrient Deficiencies ◽

Micronutrient Status ◽

Genome Wide ◽

Fat Soluble Vitamins

Rapid advances in ‘omics’ technologies have paved the way to an era in which more ‘precise’ approaches (‘precision’ nutrition), which leverage data on genetic variability alongside the traditional indices, have been put forth as the state-of-the-art solution to redress the effects of malnutrition across the life course. We contend that this inference is premature and that it is imperative first to review and critique the existing evidence from large-scale epidemiological findings. We set out to provide a critical evaluation of findings from genome-wide association studies (GWAS) on the roadmap to precision nutrition, focusing on GWAS of micronutrient disposition. We found that a large number of loci associated with biomarkers of micronutrient status have been identified. Mean estimates of heritability of micronutrient status ranged between 20 and 35 % for minerals, 56–59 % for water-soluble vitamins and 30–70 % for fat-soluble vitamins. With some exceptions, the majority of the identified genetic variants explained little of the overall variance in status for each micronutrient, ranging between 1·3 and 8 % (minerals), <0·1–12 % (water-soluble vitamins) and 1·7–2·3 % (fat-soluble vitamins). However, GWAS have provided some novel insight into mechanisms that underpin variability in micronutrient status. Our findings highlight obvious gaps that need to be addressed if the full scope of precision nutrition is ever to be realised, including research aimed at (i) dissecting the genetic basis of micronutrient deficiencies or ‘response’ to intake/supplementation; (ii) identifying trans-ethnic and ethnic-specific effects; and (iii) identifying gene–nutrient interactions for the purpose of unravelling molecular ‘behaviour’ in a range of environmental contexts.
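The small per-variant percentages above reflect how much variance a single additive SNP can contribute. Under Hardy-Weinberg equilibrium and an additive model, a biallelic SNP with minor allele frequency p and per-allele effect β explains 2p(1 − p)β² of the trait variance; the sketch below applies this standard formula. The example values are hypothetical and not taken from the review.

```python
def variance_explained(maf: float, beta: float, trait_var: float = 1.0) -> float:
    """Fraction of trait variance explained by one additive SNP under HWE: 2p(1-p)beta^2 / Var(y)."""
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_var

# Hypothetical SNP: MAF 0.3, effect of 0.1 trait standard deviations per allele
print(f"{100 * variance_explained(0.3, 0.1):.2f}% of variance")  # prints 0.42% of variance
```

Even a per-allele effect of a tenth of a standard deviation explains well under one percent of the variance, which is why many loci together can still leave most of the heritability unaccounted for.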

Multiple testing in genome-wide association studies via hidden Markov models

Bioinformatics ◽

10.1093/bioinformatics/btp476 ◽

2009 ◽

Vol 25 (21) ◽

pp. 2802-2808 ◽

Cited By ~ 31

Author(s):

Zhi Wei ◽

Wenguang Sun ◽

Kai Wang ◽

Hakon Hakonarson

Keyword(s):

Hidden Markov Models ◽

Multiple Testing ◽

Markov Models ◽

Hidden Markov ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Large-Scale Development of Gene-Associated Single-Nucleotide Polymorphism Markers for Molluscan Population Genomic, Comparative Genomic, and Genome-Wide Association Studies

DNA Research ◽

10.1093/dnares/dst048 ◽

2013 ◽

Vol 21 (2) ◽

pp. 183-193 ◽

Cited By ~ 10

Author(s):

W. Jiao ◽

X. Fu ◽

J. Li ◽

L. Li ◽

L. Feng ◽

...

Keyword(s):

Single Nucleotide Polymorphism ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association ◽

Comparative Genomic ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphism ◽

Single Nucleotide ◽

Population Genomic ◽

Genome Wide
