Genomic predictive ability for foliar nutritive traits in perennial ryegrass

Mapping Intimacies

10.1101/727958

2019

Author(s): Sai Krishna Arojju, Mingshu Cao, M. Z. Zulfi Jahufer, Brent A Barrett, Marty J Faville

Keyword(s): Genomic Selection, Genomic Prediction, Nutritive Value, Prediction Models, Genotypic Variation, Genetic Correlations, Predictive Ability, Water Soluble, Training Set, Sib Families

Abstract Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits, with values ranging from r = 0.16 to 0.45 using phenotypes combined across both environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45), and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000.
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.
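The core evaluation described above uses GBLUP with 10-fold cross-validation and reports predictive ability as the correlation r between predicted and observed values. It can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the authors' pipeline. It uses simulated 0/1/2 dosages rather than GBS data, and it fixes the variance ratio from an assumed h2 instead of estimating it by REML. It computes r over pooled out-of-fold predictions, although per-fold averaging is also common. KGD-GBLUP (depth-adjusted relationships) and BayesC are not shown. All function names are hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 allele dosages."""
    p = M.mean(axis=0) / 2.0                      # allele frequency per marker
    Z = M - 2.0 * p                               # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, h2):
    """Predict genomic values of `test` from phenotypes on `train`, treating h2 as known."""
    lam = (1.0 - h2) / h2                         # residual / genetic variance ratio
    mu = y[train].mean()                          # simplification: sample mean, not the GLS estimate
    Gtt = G[np.ix_(train, train)]
    Gvt = G[np.ix_(test, train)]
    alpha = np.linalg.solve(Gtt + lam * np.eye(len(train)), y[train] - mu)
    return mu + Gvt @ alpha

def kfold_predictive_ability(G, y, k=10, h2=0.5, seed=1):
    """Pearson r between pooled out-of-fold GBLUP predictions and observed phenotypes."""
    rng = np.random.default_rng(seed)
    pred = np.empty_like(y)
    for test in np.array_split(rng.permutation(len(y)), k):
        train = np.setdiff1d(np.arange(len(y)), test)
        pred[test] = gblup_predict(G, y, train, test, h2)
    return np.corrcoef(pred, y)[0, 1]

# Toy data (assumption): 300 individuals, 500 unlinked SNPs, the first 50 causal.
rng = np.random.default_rng(42)
freq = rng.uniform(0.1, 0.5, 500)
M = rng.binomial(2, freq, size=(300, 500)).astype(float)
g = M[:, :50] @ rng.normal(size=50)
y = g + rng.normal(scale=np.sqrt(g.var() * (1 - 0.6) / 0.6), size=300)  # simulated h2 = 0.6

r = kfold_predictive_ability(vanraden_grm(M), y, k=10, h2=0.6)
print(f"10-fold predictive ability r = {r:.2f}")
```

Fixing the variance ratio keeps the sketch short. In practice the genetic and residual variances would be estimated from the training data in each fold.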


Genomic Predictive Ability for Foliar Nutritive Traits in Perennial Ryegrass

G3 Genes|Genome|Genetics

10.1534/g3.119.400880

2019

Vol 10 (2)

pp. 695-708

Cited By ~ 6

Author(s): Sai Krishna Arojju, Mingshu Cao, M. Z. Zulfi Jahufer, Brent A. Barrett, Marty J. Faville

Keyword(s): Genomic Selection, Perennial Ryegrass, Nutritive Value, Prediction Models, Predictive Ability, Genotyping By Sequencing, Water Soluble, Soluble Carbohydrate, Training Set, Sib Families

Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits in perennial ryegrass has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of family, location and family-by-location variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) family variance was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). Family-by-location interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesCπ genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits, with values ranging from r = 0.16 to 0.45 using phenotypes combined across both locations. High predictive ability was observed for the mineral traits sulfur (0.44), sodium (0.45) and magnesium (0.45), and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from one million to as few as 50,000.
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined.
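The finding that predictive ability was retained when markers were thinned from one million to 50,000 is consistent with one explanation. When relatedness dominates, as with half-sib family structure, a random marker subset can still estimate the genomic relationship matrix well. The sketch below illustrates that idea at toy scale and is not the study's method. It simulates 20,000 unlinked SNPs as a stand-in for the full marker set. It then correlates off-diagonal relationships from all markers with those from random subsets. The simulation design and all function names are hypothetical.

```python
import numpy as np

def simulate_half_sibs(n_fam, fam_size, m, rng):
    """Toy half-sib families: each offspring gets one allele from its family's dam, one from a random sire."""
    p = rng.uniform(0.05, 0.5, m)
    dams = rng.binomial(1, p, size=(n_fam, 2, m))        # two haplotypes per dam
    rows = []
    for f in range(n_fam):
        for _ in range(fam_size):
            pick = rng.integers(0, 2, m)                  # which dam haplotype at each (unlinked) locus
            maternal = dams[f, pick, np.arange(m)]
            paternal = rng.binomial(1, p)                 # allele from an unrelated sire
            rows.append(maternal + paternal)
    return np.array(rows, dtype=float)

def grm(M):
    """VanRaden-style genomic relationship matrix, dropping monomorphic markers."""
    p = M.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    Z = M[:, keep] - 2.0 * p[keep]
    return Z @ Z.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))

def subset_agreement(M, n_sub, rng):
    """Correlation of off-diagonal relationships: all markers vs a random subset of n_sub markers."""
    cols = rng.choice(M.shape[1], size=n_sub, replace=False)
    iu = np.triu_indices(M.shape[0], k=1)
    return np.corrcoef(grm(M)[iu], grm(M[:, cols])[iu])[0, 1]

rng = np.random.default_rng(7)
M = simulate_half_sibs(n_fam=20, fam_size=10, m=20_000, rng=rng)
for n_sub in (200, 1_000, 5_000):
    print(n_sub, round(subset_agreement(M, n_sub, rng), 3))
```

Agreement rises with subset size and levels off once the family signal outweighs marker-sampling noise. That pattern is consistent with, but does not by itself prove, the retention of predictive ability reported above.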


Divergent Genomic Selection for Herbage Accumulation and Days-To-Heading in Perennial Ryegrass

Agronomy

10.3390/agronomy10030340

2020

Vol 10 (3)

pp. 340

Author(s): Marty Faville, Mingshu Cao, Jana Schmidt, Douglas Ryan, Siva Ganesh, ...

Keyword(s): Genomic Selection, Perennial Ryegrass, Genetic Gain, Genomic Prediction, Prediction Models, Selection Response, Training Set, Days To Heading, Selection For, Target Environment

Increasing the rate of genetic gain for dry matter (DM) yield in perennial ryegrass (Lolium perenne L.), which is a key source of nutrition for ruminants in temperate environments, is an important goal for breeders. Genomic selection (GS) is a strategy used to improve genetic gain by using molecular marker information to predict breeding values in selection candidates. An empirical assessment of GS for herbage accumulation (HA; proxy for DM yield) and days-to-heading (DTH) was completed by using existing genomic prediction models to conduct one cycle of divergent GS in four selection populations (Pop I G1 and G3; Pop III G1 and G3), for each trait. G1 populations were the offspring of the training set and G3 populations were two generations further on from that. The HA of the High GEBV selection group (SG) progenies, averaged across all four populations, was 28% higher (p < 0.05) than Low GEBV SGs when assessed in the target environment, while it did not differ significantly in a second environment. Divergence was greater in Pop I (43%–65%) than Pop III (10%–16%) and the selection response was higher in G1 than in G3. Divergent GS for DTH also produced significant (p < 0.05) differences between High and Low GEBV SGs in G1 populations (+6.3 to 9.1 days; 31%–61%) and smaller, non-significant (p > 0.05) responses in G3. This study shows that genomic prediction models, trained from a small, composite reference set, can be used to improve traits with contrasting genetic architectures in perennial ryegrass. The results highlight the importance of target environment selection for training models, as well as the influence of relatedness between the training set and selection populations.
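The divergent selection step and the percentage responses reported above can be expressed in a few lines. This sketch makes two assumptions. First, the High and Low selection groups are the top and bottom fractions of candidates ranked by GEBV. Second, divergence is expressed relative to the Low-group progeny mean. The selection fraction and function names are hypothetical, and the numbers in the usage lines are illustrative, not study data.

```python
def divergent_groups(gebv, frac):
    """Return (high, low) index lists: top and bottom `frac` of candidates ranked by GEBV."""
    order = sorted(range(len(gebv)), key=lambda i: gebv[i])  # ascending GEBV
    k = max(1, round(frac * len(gebv)))                        # group size, at least one
    return order[-k:], order[:k]

def percent_divergence(high_mean, low_mean):
    """Response of the High over the Low selection group, as a percentage of the Low-group mean."""
    return 100.0 * (high_mean - low_mean) / low_mean

high, low = divergent_groups([0.1, -0.3, 0.5, 0.2, -0.1], frac=0.4)
print(high, low)                          # indices of the two selection groups
print(percent_divergence(128.0, 100.0))   # a High-group mean 28% above the Low group
```

In the study the group means would come from progeny evaluated in field trials. The sketch only shows how candidates are partitioned and how a relative response such as "28% higher" is computed.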


BWGS: a R package for genomic selection and its application to a wheat breeding programme

10.1101/763037

2019

Author(s): Gilles Charmet, Louis Gautier Tran, Jérôme Auzanneau, Renaud Rincent, Sophie Bouchet

Keyword(s): Missing Data, Genomic Selection, Prediction Models, Predictive Accuracy, Predictive Ability, Breeding Programme, Training Set, Desktop Computer, Marker Selection, Breeding Programmes

Abstract We developed an integrated R library called BWGS to enable easy computation of Genomic Estimates of Breeding Values (GEBV) for genomic selection. BWGS relies on existing R libraries, all freely available from CRAN servers. The two main functions enable users to run 1) replicated random cross-validations within a training set of genotyped and phenotyped lines and 2) GEBV prediction for a set of genotyped-only lines. Options are available for 1) missing data imputation, 2) marker and training set selection and 3) genomic prediction with 15 different methods, either parametric or semi-parametric. The usefulness and efficiency of BWGS are illustrated using a population of wheat lines from a real breeding programme. Adjusted yield data from historical trials (highly unbalanced design) were used for testing the options of BWGS. In total, 760 candidate lines with adjusted phenotypes and genotypes for 47,839 robust SNPs were used. With a simple desktop computer, we obtained results comparable with previously published results on wheat genomic selection. As predicted by theory, the factors that most influence predictive ability, for a given trait of moderate heritability, are the size of the training population and a minimum number of markers to capture the information at every QTL. Missing data up to 40%, if randomly distributed, do not degrade predictive ability once imputed, and up to 80% randomly distributed missing data are still acceptable once imputed with the Expectation-Maximization method of package rrBLUP. It is worth noting that selecting the markers most associated with the trait does improve predictive ability compared with the whole set of markers, but only when marker selection is made on the whole population. When marker selection is made only on the sampled training set, this advantage nearly disappears, as it was clearly due to overfitting. Few differences are observed between the 15 prediction models with this dataset.
Although non-parametric methods that are supposed to capture non-additive effects have slightly better predictive accuracy, differences remain small. Finally, the GEBV from the 15 prediction models are all highly correlated with each other. These results are encouraging for an efficient use of genomic selection in applied breeding programmes, and BWGS is a simple and powerful toolbox to apply in breeding programmes or training activities.
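The overfitting point about marker selection can be reproduced with a small simulation. This sketch is written in Python/NumPy and does not use BWGS itself, which is an R package. In the simulation the phenotype is pure noise. Markers are ranked by absolute marginal correlation with the phenotype, and a ridge regression (similar in spirit to rrBLUP) is fitted on the top markers. Selecting markers on the whole population leaks validation phenotypes into training and inflates r. Repeating the selection inside each training fold removes that leak. The ranking rule, penalty, marker counts and all function names are assumptions.

```python
import numpy as np

def top_markers(M, y, n_top):
    """Indices of the n_top markers with the largest absolute marginal correlation with y."""
    Mc = M - M.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Mc.T @ yc) / (np.linalg.norm(Mc, axis=0) + 1e-12)
    return np.argsort(score)[-n_top:]

def ridge_fit_predict(Xtr, ytr, Xte, lam):
    """Ridge regression on centred markers (rrBLUP-like shrinkage with a fixed penalty)."""
    mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
    Xc = Xtr - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]), Xc.T @ (ytr - mu_y))
    return mu_y + (Xte - mu_x) @ beta

def cv_r(M, y, n_top, select_inside, k=5, lam=10.0, seed=0):
    """K-fold predictive ability with marker selection outside (leaky) or inside each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    global_sel = None if select_inside else top_markers(M, y, n_top)  # uses validation phenotypes
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        sel = top_markers(M[train], y[train], n_top) if select_inside else global_sel
        pred[test] = ridge_fit_predict(M[np.ix_(train, sel)], y[train], M[np.ix_(test, sel)], lam)
    return np.corrcoef(pred, y)[0, 1]

rng = np.random.default_rng(3)
M = rng.binomial(2, 0.3, size=(200, 5000)).astype(float)
y = rng.normal(size=200)                       # no genetic signal at all
r_leaky = cv_r(M, y, n_top=50, select_inside=False)
r_honest = cv_r(M, y, n_top=50, select_inside=True)
print(f"selection on whole set: r = {r_leaky:.2f}; inside folds: r = {r_honest:.2f}")
```

Because the simulated phenotype carries no signal, any clearly positive r from the leaky scheme comes entirely from selection bias.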


A review of deep learning applications for genomic selection

BMC Genomics

10.1186/s12864-020-07319-x

2021

Vol 22 (1)

Author(s): Osval Antonio Montesinos-López, Abelardo Montesinos-López, Paulino Pérez-Rodríguez, José Alberto Barrón-López, Johannes W. R. Martini, ...

Keyword(s): Deep Learning, Plant Breeding, Genomic Selection, Genomic Prediction, Mixed Model, Prediction Models, Genetic Effect, Training Data, Additive Genetic Effect, Main Body

Abstract Background Several conventional genomic Bayesian (or non-Bayesian) prediction methods have been proposed, including the standard additive genetic effect model, for which the variance components are estimated with mixed model equations. In recent years, deep learning (DL) methods have been considered in the context of genomic prediction. DL methods are nonparametric models that provide the flexibility to adapt to complicated associations between data and output, including very complex patterns. Main body We review the applications of DL methods in genomic selection (GS) to obtain a meta-picture of GS performance and highlight how these tools can help solve challenging plant breeding problems. We also provide general guidance for the effective use of DL methods, including the fundamentals of DL and the requirements for its appropriate use. We discuss the pros and cons of this technique compared to traditional genomic prediction approaches, as well as current trends in DL applications. Conclusions The main requirement for using DL is sufficiently large, high-quality training data. Based on the current literature on GS in plant and animal breeding, we did not find clear superiority of DL in terms of prediction power compared to conventional genome-based prediction models. Nevertheless, there is clear evidence that DL algorithms capture nonlinear patterns more efficiently than conventional genome-based models. DL algorithms are able to integrate data from different sources, as is usually needed in GS-assisted breeding, and they show potential for improving prediction accuracy with large plant breeding data sets. It is important to apply DL to large training-testing data sets.


Multi-Trait Genomic Prediction Improves Predictive Ability for Dry Matter Yield and Water-Soluble Carbohydrates in Perennial Ryegrass

Frontiers in Plant Science

10.3389/fpls.2020.01197

2020

Vol 11

Author(s): Sai Krishna Arojju, Mingshu Cao, Michael Trolove, Brent A. Barrett, Courtney Inch, ...

Keyword(s): Perennial Ryegrass, Genomic Prediction, Dry Matter, Predictive Ability, Water Soluble, Soluble Carbohydrates, Dry Matter Yield, Water Soluble Carbohydrates


Genomic Prediction and Genetic Correlation of Agronomic, Blackleg Disease, and Seed Quality Traits in Canola (Brassica napus L.)

Plants

10.3390/plants9060719

2020

Vol 9 (6)

pp. 719

Author(s): Mulusew Fikere, Denise M. Barbulescu, M. Michelle Malmberg, Pankaj Maharjan, Phillip A. Salisbury, ...

Keyword(s): Genomic Selection, Genomic Prediction, Prediction Accuracy, Seed Quality, Agronomic Traits, Genetic Correlations, Quality Traits, Blackleg Disease, Genetic Progress, Seed Quality Traits

Genomic selection accelerates genetic progress in crop breeding through the prediction of future phenotypes of selection candidates based only on their genomic information. Here we report genetic correlations and genomic prediction accuracies for 22 agronomic, disease, and seed quality traits measured across multiple years (2015–2017) in replicated trials under rain-fed and irrigated conditions in Victoria, Australia. Two hundred and two spring canola lines were genotyped for 62,082 single nucleotide polymorphisms (SNPs) using transcriptomic genotyping-by-sequencing (GBSt). Traits were evaluated with single-trait and bivariate genomic best linear unbiased prediction (GBLUP) models and cross-validation. The GBLUP models were also extended to include genotype-by-environment (G × E) interactions. Genomic heritability varied from 0.31 to 0.66. Genetic correlations for the same trait across locations and years were highly positive. Oil content was positively correlated with most agronomic traits. Strong, previously undocumented negative correlations were observed between average internal infection (a measure of blackleg disease) and arachidic and stearic acids. The genetic correlations between fatty acid traits followed the expected patterns based on oil biosynthesis pathways. Genomic prediction accuracy ranged from 0.29 for emergence count to 0.69 for seed yield. Incorporating G × E improved prediction accuracy by up to 6%. The genomic prediction accuracies achieved indicate that genomic selection is ready for application in canola breeding.
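One common way to extend GBLUP with G × E is to add an interaction kernel. That kernel is the element-wise (Hadamard) product of the genomic and environment kernels at the observation level. As a result, genomic covariance enters the interaction term only between records from the same environment. The abstract does not specify the exact canola model, so treat this as an illustrative assumption. The function name is hypothetical, and variance-component estimation (for example by REML) is not shown.

```python
import numpy as np

def gxe_kernels(G, geno_idx, env_idx, n_env):
    """Observation-level kernels for y = mu + g + env + (g x env) + residual.

    G        : genomic relationship matrix among lines
    geno_idx : line index of each observation
    env_idx  : environment index of each observation
    """
    Zg = np.eye(G.shape[0])[geno_idx]      # observation -> line incidence matrix
    Ze = np.eye(n_env)[env_idx]            # observation -> environment incidence matrix
    K_g = Zg @ G @ Zg.T                    # genomic main-effect covariance
    K_env = Ze @ Ze.T                      # 1 if two records share an environment, else 0
    K_gxe = K_g * K_env                    # Hadamard product: G x E covariance within environments only
    return K_g, K_env, K_gxe

# Two lines, each observed in two environments (records ordered line 0/env 0, line 1/env 0, ...).
G = np.array([[1.0, 0.5], [0.5, 1.0]])
K_g, K_env, K_gxe = gxe_kernels(G, geno_idx=[0, 1, 0, 1], env_idx=[0, 0, 1, 1], n_env=2)
print(K_gxe)
```

The same line observed in two environments shares its main genomic effect (K_g is 1 for records 0 and 2) but not the interaction term (K_gxe is 0). Environment-specific deviations are therefore modelled separately from the effect common to both environments.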


Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution

10.1186/s12711-021-00661-y

2021

Vol 53 (1)

Author(s): Cheng Bian, Dzianis Prakapenka, Cheng Tan, Ruifei Yang, Di Zhu, ...

Keyword(s): Genomic Selection, Genomic Prediction, Prediction Accuracy, Prediction Models, Average Daily Gain, Live Weight, Feed Conversion, Muscle Area, Haplotype Blocks, Low Coverage

Abstract Background Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction. Methods We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold cross-validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries. Results Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Compared to estimates of SNP additive heritability, estimates of haplotype epistasis heritability were strongly correlated with the increase in prediction accuracy from haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR.
SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR. Conclusions Haplotype prediction models improved the accuracy for genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
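The fixed-distance haplotype blocks and their multi-allelic coding can be sketched as below. The sketch assumes phased haplotypes on a single chromosome and integer base-pair positions. Each distinct allele sequence within a block becomes one column, counted 0, 1 or 2 per individual, which is one way to build the design matrix of a haplotype model. Gene-boundary blocks would replace `fixed_distance_blocks` with a lookup against gene intervals, which is not shown. The names are hypothetical, and this is not the authors' software.

```python
from collections import defaultdict

def fixed_distance_blocks(positions, block_bp):
    """Assign each SNP (base-pair position on one chromosome) to a window of fixed width."""
    return [pos // block_bp for pos in positions]

def haplotype_design(hap_pairs, block_ids):
    """Multi-allelic haplotype coding.

    hap_pairs : list of (hap1, hap2) per individual; each hap is a tuple of 0/1 alleles at the SNPs
    block_ids : block label of each SNP
    Returns (labels, X): labels[c] = (block, allele sequence); X[i][c] = copies (0, 1, 2) in individual i.
    """
    blocks = defaultdict(list)
    for snp, b in enumerate(block_ids):
        blocks[b].append(snp)
    labels, column_of = [], {}
    counts = [dict() for _ in hap_pairs]
    for b, snps in sorted(blocks.items()):
        for i, (h1, h2) in enumerate(hap_pairs):
            for hap in (h1, h2):
                allele = (b, tuple(hap[s] for s in snps))
                if allele not in column_of:          # first time this haplotype allele is seen
                    column_of[allele] = len(labels)
                    labels.append(allele)
                c = column_of[allele]
                counts[i][c] = counts[i].get(c, 0) + 1
    X = [[row.get(c, 0) for c in range(len(labels))] for row in counts]
    return labels, X

block_ids = fixed_distance_blocks([100, 900, 1500, 2100], block_bp=1000)   # -> blocks 0, 0, 1, 2
hap_pairs = [((0, 1, 0, 1), (0, 1, 1, 1)), ((1, 1, 0, 1), (0, 1, 0, 0))]
labels, X = haplotype_design(hap_pairs, block_ids)
print(labels)
print(X)
```

Each row of X sums to twice the number of blocks, because every individual carries two haplotype alleles per block. The columns of X can then replace single-SNP dosages in a GBLUP-type or ridge model.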


Combining genetic resources and elite material populations to improve the accuracy of genomic prediction in apple

G3 Genes|Genome|Genetics

10.1093/g3journal/jkab420

2021

Author(s): Xabi Cazenave, Bernard Petit, Marc Lateur, Hilde Nybom, Jiri Sedlak, ...

Keyword(s): Genetic Resources, Genomic Selection, Predictive Ability, Practical Implementation, Specific Marker, Training Set, High Genetic Diversity, Breeding Programs, Breeding Cycles, Two Populations

Abstract Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e., genetic resources and elite material, in order to obtain a large training set with high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high depending on the trait, and small increases in predictive ability could be obtained for some traits when the two populations were combined into a single training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.
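The abstract reports that choosing training genotypes with an optimization algorithm outperformed random choice, but it does not name the algorithm. As an illustrative stand-in only, the sketch below greedily adds the candidate that most increases the mean reliability of GBLUP predictions for a target set. Reliability is taken as diag(G_vt (G_tt + λI)^-1 G_tv) / diag(G_vv), a simplified coefficient-of-determination criterion without the contrasts and fixed effects used in CDmean-type criteria. The value of λ, the toy relationship matrix and all function names are assumptions.

```python
import numpy as np

def mean_reliability(G, train, target, lam):
    """Mean over the target set of diag(Gvt (Gtt + lam I)^-1 Gtv) / diag(Gvv)."""
    Gtt = G[np.ix_(train, train)]
    Gvt = G[np.ix_(target, train)]
    A = np.linalg.solve(Gtt + lam * np.eye(len(train)), Gvt.T)   # shape (n_train, n_target)
    explained = np.sum(Gvt * A.T, axis=1)                         # diagonal of Gvt @ A
    return float(np.mean(explained / np.diag(G)[target]))

def greedy_training_set(G, candidates, target, size, lam=1.0):
    """Forward selection: repeatedly add the candidate that most raises mean reliability."""
    chosen, remaining = [], list(candidates)
    for _ in range(size):
        best = max(remaining, key=lambda c: mean_reliability(G, chosen + [c], target, lam))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy relationships (assumption): lines 0-3 form one group and lines 4-9 another,
# with relationship 0.5 within a group and 0 between groups.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
G = np.where(group[:, None] == group[None, :], 0.5, 0.0) + 0.5 * np.eye(10)
chosen = greedy_training_set(G, candidates=range(8), target=[8, 9], size=2)
print(chosen)
```

The greedy rule selects candidates related to the target lines and ignores the unrelated group. That mirrors the intuition behind the reported gain over random sampling, though it is not necessarily the criterion used in the study.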


CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction

10.1101/2020.11.11.376343

2020

Author(s): Rafael Massahiro Yassue, José Felipe Gonzaga Sabadin, Giovanni Galli, Filipe Couto Alves, Roberto Fritsche-Neto

Keyword(s): Genomic Prediction, Cross Validation, Prediction Models, Mean Squared Error, Predictive Ability, Proof Of Concept, Squared Error, High Effect, The Mean, Fold Cross Validation

Abstract Usually, genomic prediction models are compared using validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how, and how subjectively, models are compared. These procedures overlap across replicates, which might overestimate performance and, because of resampling, violate the independence of residuals, leading to less accurate results. Furthermore, post hoc tests such as ANOVA are not recommended, because the assumption of independent residuals is not fulfilled. Thus, we propose a new way to sample observations to build training and validation sets based on a cross-validation alpha-based design (CV-α). CV-α was meant to create several validation scenarios (replicates × folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as a proof of concept, we could use ANOVA to compare the proposed methodology to RRS and K-fold, applying four genomic prediction models to a simulated and a real dataset. Concerning predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
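The overlap problem that CV-α addresses can be made concrete by counting genotype pairs that land in the same fold in more than one replicate. This sketch compares repeated random K-fold assignments with a lattice-type construction for the special case of n = p² genotypes, where p is prime. In replicate r, genotype (a, b) is placed in fold (b + r·a) mod p. With this rule, any two genotypes share a fold in at most one replicate, for up to p replicates. The construction is a simplification chosen for illustration, not the authors' alpha-design algorithm. The overlap count and all function names are hypothetical.

```python
import numpy as np

def random_kfold(n, k, reps, seed=0):
    """Fold labels (reps x n) from independently shuffled K-fold splits."""
    rng = np.random.default_rng(seed)
    out = np.empty((reps, n), dtype=int)
    for r in range(reps):
        for f, idx in enumerate(np.array_split(rng.permutation(n), k)):
            out[r, idx] = f
    return out

def lattice_folds(p, reps):
    """Fold labels for n = p*p genotypes; assumes p is prime and reps <= p."""
    a, b = np.divmod(np.arange(p * p), p)        # genotype i <-> grid cell (a, b)
    return np.array([(b + r * a) % p for r in range(reps)])

def pairs_sharing_fold_repeatedly(folds):
    """Count genotype pairs placed in the same fold in more than one replicate."""
    n = folds.shape[1]
    together = np.zeros((n, n), dtype=int)
    for labels in folds:
        together += labels[:, None] == labels[None, :]
    iu = np.triu_indices(n, k=1)
    return int(np.sum(together[iu] > 1))

p, reps = 7, 4                                   # 49 genotypes, 7 folds of 7, 4 replicates
print("random K-fold:", pairs_sharing_fold_repeatedly(random_kfold(p * p, p, reps)))
print("lattice-type :", pairs_sharing_fold_repeatedly(lattice_folds(p, reps)))
```

Repeated random splits typically place some pairs of genotypes together in several replicates, which couples the replicate results. The lattice-type assignment avoids this by construction, which is the property the abstract links to higher residual independence.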


Combining genetic resources and elite material populations to improve the accuracy of genomic prediction in apple

10.1101/2021.08.27.457920

2021

Author(s): Xabi Cazenave, Bernard Petit, Francois Laurens, Charles-Eric Durel, Helene Muranty

Keyword(s): Genetic Resources, Genomic Selection, Predictive Ability, Practical Implementation, Specific Marker, Training Set, High Genetic Diversity, Breeding Programs, Breeding Cycles, Two Populations

Genomic selection is an attractive strategy for apple breeding that could reduce the length of breeding cycles. A possible limitation to the practical implementation of this approach lies in the creation of a training set large and diverse enough to ensure accurate predictions. In this study, we investigated the potential of combining two available populations, i.e., genetic resources and elite material, in order to obtain a large training set with high genetic diversity. We compared the predictive ability of genomic predictions within-population, across-population or when combining both populations, and tested a model accounting for population-specific marker effects in this last case. The obtained predictive abilities were moderate to high depending on the trait, and were always highest when the two populations were combined into a single training set. We also investigated the potential of such a training set to predict hybrids resulting from crosses between the two populations, with a focus on the method to design the training set and the best proportion of each population to optimize predictions. The measured predictive abilities were very similar for all the proportions, except for the extreme cases where only one of the two populations was used in the training set, in which case predictive abilities could be lower than when using both populations. Using an optimization algorithm to choose the genotypes in the training set also led to higher predictive abilities than when the genotypes were chosen at random. Our results provide guidelines to initiate breeding programs that use genomic selection when the implementation of the training set is a limitation.
