Sparse Bayesian learning for predicting phenotypes and ranking influential markers in yeast

2018 ◽  
Author(s):  
Maryam Ayat ◽  
Michael Domaratzki

Genomic selection and genome-wide association studies are two related problems that can be applied to the plant breeding industry. Genomic selection is a method to predict phenotypes (i.e., traits) such as yield and drought resistance in crops from high-density markers positioned throughout the genome of the varieties. In this paper, we employ sparse Bayesian learning as a technique for genomic selection and for ranking markers based on their relevance to a trait, which can aid in genome-wide association studies. We define and explore two different forms of sparse Bayesian learning for predicting phenotypes and identifying the most influential markers of a trait, respectively. In particular, we introduce a new framework based on sparse Bayesian and ensemble learning for ranking influential markers of a trait. Then, we apply our methods to a real-world Saccharomyces cerevisiae dataset, and analyse our results with respect to existing related works, trait heritability, and the accuracies obtained from the use of different kernel functions, including linear, Gaussian, and string kernels. We find that sparse Bayesian methods are not only as good as other machine learning methods in predicting yeast growth in different environments, but are also capable of identifying the most important markers, including those with both positive and negative effects on growth, from which biologists can gain insight. This attribute can make our proposed ensemble of sparse Bayesian learners favourable for ranking markers based on their relevance to a trait.
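The linear and Gaussian kernels named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the marker coding (+1/-1), the `width` parameter, and the function names are assumptions made for illustration, and the string kernel is omitted.

```python
import math


def linear_kernel(a, b):
    """Inner product of two marker vectors."""
    return float(sum(x * y for x, y in zip(a, b)))


def gaussian_kernel(a, b, width=1.0):
    """Gaussian (RBF) kernel; `width` is a hypothetical bandwidth parameter."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def kernel_matrix(rows, kernel):
    """Pairwise kernel values between all individuals."""
    return [[kernel(r, s) for s in rows] for r in rows]


# Three hypothetical strains genotyped at four markers (assumed +1/-1 coding).
strains = [
    [1, -1, 1, 1],
    [1, 1, -1, 1],
    [-1, -1, 1, -1],
]
K_lin = kernel_matrix(strains, linear_kernel)
K_rbf = kernel_matrix(strains, lambda a, b: gaussian_kernel(a, b, width=2.0))
```

A kernel-based learner then works with `K_lin` or `K_rbf` in place of the raw marker matrix, which is how the choice of kernel can change prediction accuracy.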

2019 ◽  
Vol 35 (21) ◽  
pp. 4327-4335
Author(s):  
Meiyue Wang ◽  
Shizhong Xu

Motivation: Genomic scanning approaches that detect one locus at a time are subject to many problems in genome-wide association studies and quantitative trait locus mapping. The problems include large matrix inversion, over-conservativeness of tests after Bonferroni correction and difficulty in evaluating the total genetic contribution to a trait’s variance. Targeting these problems, we take a further step and investigate a multiple-locus model that detects all markers simultaneously in a single model. Results: We developed a sparse Bayesian learning (SBL) method for quantitative trait locus mapping and genome-wide association studies. This new method adopts a coordinate descent algorithm to estimate parameters (marker effects) by updating one parameter at a time conditional on the current values of all other parameters. It uses an L2 type of penalty that allows the method to handle extremely large sample sizes (>100 000). Simulation studies show that SBL often has higher statistical power and that the simulated true loci are often detected with extremely small P-values, indicating that SBL is insensitive to stringent thresholds in significance testing. Availability and implementation: An R package (sbl) is available on the Comprehensive R Archive Network (CRAN) and https://github.com/MeiyueComputBio/sbl/tree/master/R%20packge. Supplementary information: Supplementary data are available at Bioinformatics online.
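The idea of updating one marker effect at a time under an L2-type penalty can be sketched in a few lines of Python. This is a minimal illustration, not the sbl package: it holds the penalty fixed, so it is plain ridge regression solved by coordinate descent, and the function name, the `penalty` and `sweeps` parameters, and the toy data are all assumptions.

```python
def ridge_coordinate_descent(X, y, penalty=1.0, sweeps=200):
    """Minimise 0.5*||y - X b||^2 + 0.5*penalty*||b||^2 one coefficient at a time."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_ss = [sum(X[i][k] ** 2 for i in range(n)) for k in range(m)]
    for _ in range(sweeps):
        for k in range(m):
            # Inner product of marker k with the residual that excludes marker k.
            rho = sum(X[i][k] * (residual[i] + X[i][k] * beta[k]) for i in range(n))
            new_beta = rho / (col_ss[k] + penalty)
            delta = new_beta - beta[k]
            # Keep the residual in sync with the updated coefficient.
            for i in range(n):
                residual[i] -= X[i][k] * delta
            beta[k] = new_beta
    return beta


# Hypothetical toy data: five individuals, three markers.
X = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
]
y = [1.0, 2.0, 1.5, 2.5, 3.0]
beta = ridge_coordinate_descent(X, y, penalty=1.0)
```

Each update needs only one marker column and the current residual, so no large matrix is ever inverted, which is the property the abstract highlights for large sample sizes.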


Genome ◽  
2010 ◽  
Vol 53 (11) ◽  
pp. 876-883 ◽  
Author(s):  
Ben Hayes ◽  
Mike Goddard

Results from genome-wide association studies in livestock and humans have led to the conclusion that the effects of individual quantitative trait loci (QTL) on complex traits, such as yield, are likely to be small; therefore, a large number of QTL are necessary to explain genetic variation in these traits. Given this genetic architecture, gains from marker-assisted selection (MAS) programs using only a small number of DNA markers to trace a limited number of QTL are likely to be small. This has led to the development of an alternative technology for using the available dense single nucleotide polymorphism (SNP) information, called genomic selection. Genomic selection uses a genome-wide panel of dense markers so that all QTL are likely to be in linkage disequilibrium with at least one SNP. The genomic breeding values are predicted to be the sum of the effects of these SNPs across the entire genome. In dairy cattle breeding, the accuracy of genomic estimated breeding values (GEBV) that can be achieved and the fact that these are available early in life have led to rapid adoption of the technology. Here, we discuss the design of experiments necessary to achieve accurate prediction of GEBV in future generations in terms of the number of markers necessary and the size of the reference population where marker effects are estimated. We also present a simple method for implementing genomic selection using a genomic relationship matrix. Future challenges discussed include using whole genome sequence data to improve the accuracy of genomic selection and management of inbreeding through genomic relationships.
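A genomic relationship matrix can be built from SNP genotypes as in the Python sketch below. The abstract does not give a formula, so the sketch assumes one widely used construction: genotypes coded as 0/1/2 allele counts, centred by twice the sample allele frequency, and scaled by 2Σp(1-p). The function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """genotypes: one list of 0/1/2 allele counts per individual."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker, estimated from the sample.
    freqs = [sum(ind[k] for ind in genotypes) / (2.0 * n) for k in range(m)]
    # Centre each marker column by twice its allele frequency.
    z = [[ind[k] - 2.0 * freqs[k] for k in range(m)] for ind in genotypes]
    # Scaling term so that relationships are on a pedigree-like scale.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
        for i in range(n)
    ]


# Four hypothetical individuals genotyped at four SNPs.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
    [0, 2, 1, 0],
]
G = genomic_relationship_matrix(genotypes)
```

In a GBLUP-style analysis, `G` would take the place of the pedigree relationship matrix in the mixed-model equations; that step is left out to keep the sketch short.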


2017 ◽  
Author(s):  
Agustín Barría ◽  
Kris A. Christensen ◽  
Katharina Correa ◽  
Ana Jedlicki ◽  
Jean P. Lhorente ◽  
...  

Piscirickettsia salmonis is one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming. Current treatments have been ineffective for the control of the disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome-wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible), which were experimentally challenged with P. salmonis, were sequenced using ddRAD sequencing. A total of 4,174 SNP markers were identified in the population. These markers were used to perform a GWAS and to test genomic selection models. One SNP related to iron availability was genome-wide significantly associated with resistance to P. salmonis, defined as day of death. Genomic selection models showed accuracies and predictive abilities similar to those of the traditional pedigree-based best linear unbiased prediction (PBLUP) method.


2020 ◽  
Vol 10 ◽  
Author(s):  
Rakesh K. Srivastava ◽  
Ram B. Singh ◽  
Vijaya Lakshmi Pujarula ◽  
Srikanth Bollam ◽  
Madhu Pusuluri ◽  
...  

Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 995
Author(s):  
R. Calderón-Chagoya ◽  
J.H. Hernandez-Medrano ◽  
F.J. Ruiz-López ◽  
A. Garcia-Ruiz ◽  
V.E. Vega-Murillo ◽  
...  

Genomic selection has been proposed for the mitigation of methane (CH4) emissions by cattle because there is considerable variability in CH4 emissions between individuals fed the same diet. The genome-wide association study (GWAS) represents an important tool for the detection of candidate genes, haplotypes or single nucleotide polymorphism (SNP) markers related to characteristics of economic interest. The present study included information for 280 cows in three dairy production systems in Mexico: 1) Dual Purpose (n = 100), 2) Specialized Tropical Dairy (n = 76), 3) Familiar Production System (n = 104). Concentrations of CH4 in the breath of individual cows at the time of milking (MEIm) were estimated through a system of infrared sensors. After quality control analyses, 21,958 SNPs were included. Associations of markers were tested using a linear regression model, corrected with principal component analyses. In total, 46 SNPs were identified as significant for CH4 production. Several SNPs associated with CH4 production were found in regions previously described for quantitative trait loci of composition characteristics of meat, milk fatty acids and characteristics related to feed intake. It was concluded that the SNPs identified could be used in genomic selection programs in developing countries and combined with other datasets for global selection.


2017 ◽  
Vol 2017 ◽  
pp. 1-17 ◽  
Author(s):  
Stefanie Friedrichs ◽  
Juliane Manitz ◽  
Patricia Burger ◽  
Christopher I. Amos ◽  
Angela Risch ◽  
...  

The analysis of genome-wide association studies (GWAS) benefits from the investigation of biologically meaningful gene sets, such as gene-interaction networks (pathways). We propose an extension to a successful kernel-based pathway analysis approach by integrating kernel functions into a powerful algorithmic framework for variable selection, to enable investigation of multiple pathways simultaneously. We employ genetic similarity kernels from the logistic kernel machine test (LKMT) as base-learners in a boosting algorithm. A model to explain case-control status is created iteratively by selecting pathways that improve its prediction ability. We evaluated our method in simulation studies adopting 50 pathways for different sample sizes and genetic effect strengths. Additionally, we included an exemplary application of kernel boosting to a rheumatoid arthritis and a lung cancer dataset. Simulations indicate that kernel boosting outperforms the LKMT in certain genetic scenarios. Applications to GWAS data on rheumatoid arthritis and lung cancer resulted in sparse models which were based on pathways interpretable in a clinical sense. Kernel boosting is highly flexible in terms of considered variables and overcomes the problem of multiple testing. Additionally, it enables the prediction of clinical outcomes. Thus, kernel boosting constitutes a new, powerful tool in the analysis of GWAS data and towards the understanding of biological processes involved in disease susceptibility.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Dachang Dou ◽  
Linyong Shen ◽  
Jiamei Zhou ◽  
Zhiping Cao ◽  
Peng Luan ◽  
...  

Background: The identification of markers and genes for growth traits may not only benefit marker-assisted selection/genomic selection but also provide important information for understanding the genetic foundation of growth traits in broilers. Results: In the current study, we estimated the genetic parameters of eight growth traits in broilers and carried out genome-wide association studies for these growth traits. A total of 113 QTNs were discovered by multiple methods together, and some genes, including ACTA1, IGF2BP1, TAPT1, LDB2, PRKCA, TGFBR2, GLI3, SLC16A7, INHBA, BAMBI, APCDD1, GPR39, and GATA4, were identified as important candidate genes for rapid growth in broilers. Conclusions: The results of this study will provide important information for understanding the genetic foundation of growth traits in broilers.


Crop Science ◽  
2019 ◽  
Vol 59 (6) ◽  
pp. 2572-2584 ◽  
Author(s):  
Tigist Mideksa Damesa ◽  
Jens Hartung ◽  
Manje Gowda ◽  
Yoseph Beyene ◽  
Biswanath Das ◽  
...  

Forests ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 239 ◽  
Author(s):  
Sawitri ◽  
Naoki Tani ◽  
Mohammad Na’iem ◽  
Widiyatno ◽  
Sapto Indrioko ◽  
...  

Shorea platyclados (Dark Red Meranti) is a commercially important timber tree species in Southeast Asia. However, its stocks have dramatically declined due, inter alia, to excessive logging, insufficient natural regeneration and a slow recovery rate. Thus, there is a need to promote enrichment planting and develop effective techniques to support its rehabilitation and improve timber production through implementation of Genome-Wide Association Studies (GWAS) and Genomic Selection (GS). To assist such efforts, plant materials were collected from a half-sib progeny population in Sari Bumi Kusuma forest concession, Kalimantan, Indonesia. Using 5900 markers in sequences obtained from 356 individuals, we detected high linkage disequilibrium (LD) extending up to >145 kb, suggesting that associations between phenotypic traits and markers in LD can be more easily and feasibly detected with GWAS than with analysis of quantitative trait loci (QTLs). However, the detection power of GWAS seems low, since few single nucleotide polymorphisms linked to any focal traits were detected with a stringent false discovery rate, indicating that the species’ phenotypic traits are mostly under polygenic quantitative control. Furthermore, machine learning provided higher prediction accuracies than Bayesian methods. We also found that stem diameter, branch diameter ratio and wood density were more predictable than height, clear bole, branch angle and wood stiffness traits. Our study suggests that GS has potential for improving the productivity and quality of S. platyclados, and our genomic heritability estimates may improve the selection of traits to target in future breeding of this species.

