Prospects for genomic selection in cassava breeding

Mapping Intimacies

10.1101/108662

2017

Cited By ~ 2

Author(s):

Marnin D. Wolfe

Dunia Pino Del Carpio

Olumide Alabi

Chiedozie Egesi

Lydia C. Ezenwaka

...

Keyword(s):

Genomic Selection

Prediction Accuracy

Cross Validation

Prediction Models

Mosaic Disease

Cassava Mosaic Disease

Training Population

Breeding Programs

Population Prediction

Selection Of

ABSTRACT Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) reduces selection cycle times by predicting the breeding values of unevaluated lines from genome-wide marker data. GS has been implemented at three breeding programs in sub-Saharan Africa. Initial studies provided promising estimates of predictive ability in single populations using standard prediction models and scenarios. In the present study we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: (1) cross-validation within each population, (2) cross-population prediction and (3) cross-generation prediction. We also evaluated the impact of increasing training population size by phenotyping progenies selected either at random or using a genetic algorithm. Cross-validation results were mostly consistent across breeding programs, with non-additive models like RKHS predicting an average of 10% more accurately. Accuracy was generally associated with heritability. Cross-population prediction accuracy was generally low (mean 0.18 across traits and models), but prediction of cassava mosaic disease severity increased by up to 57% in one Nigerian population when data from another related population were combined. As expected, accuracy across generations was poorer than within generations (cross-validation), but it indicated that accuracy should be sufficient for rapid-cycling GS on several traits. The choice of prediction model made some difference across generations, but increasing training population (TP) size was more important. In some cases, selecting one third of the progeny with a genetic algorithm could achieve accuracy equivalent to phenotyping all progeny. Based on the datasets analyzed in this study, it was apparent that TP size has a significant impact on prediction accuracy for most traits.
We are still in the early stages of GS in this crop, but results are promising, at least for some traits. The TPs need to continue to grow, and quality phenotyping is more critical than ever. General guidelines for successful GS are emerging. Phenotyping can be done on fewer, cleverly selected individuals, making for trials that are more focused on the quality of the data collected.
Abbreviations: GS, genomic selection; GBS, genotyping-by-sequencing; IITA, International Institute of Tropical Agriculture; NRCRI, National Root Crops Research Institute; NaCRRI, National Crops Resources Research Institute; GEBVs, genomic estimated breeding values; TP, training population; RTWT, fresh root weight; RTNO, root number; SHTWT, fresh shoot weight; HI, harvest index; DM, dry matter content; CMD, cassava mosaic disease; MCMDS, mean CMD severity; VIGOR, early vigor.
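The within-population cross-validation scenario above can be sketched in a few lines. This is a minimal, stdlib-only Python sketch assuming simulated 0/1/2 marker data and a single additive trait; it estimates predictive ability as the mean fold-wise Pearson correlation between predicted and observed values, using kernel ridge regression with a linear kernel (GBLUP-like) and a Gaussian kernel (RKHS-like). The function names, the toy data, and the `lam` and `h` values are illustrative assumptions, not the models or settings used in the study.

```python
import math
import random


def solve(a, b):
    """Solve a.x = b for a square matrix by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def linear_kernel(x, z):
    # Additive (GBLUP-like) similarity: inner product of marker codes.
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, h=1.0):
    # RKHS-style Gaussian kernel; squared distance is scaled by the marker count.
    d2 = sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)
    return math.exp(-h * d2)


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def kernel_ridge_predict(x_train, y_train, x_test, kernel, lam):
    # Dual-form ridge: alpha = (K + lam*I)^-1 (y - mean), prediction = mean + k'alpha.
    mean_y = sum(y_train) / len(y_train)
    k = [[kernel(a, b) + (lam if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    alpha = solve(k, [y - mean_y for y in y_train])
    return [mean_y + sum(al * kernel(t, xt) for al, xt in zip(alpha, x_train))
            for t in x_test]


def kfold_predictive_ability(x, y, kernel, lam=1.0, k=5, seed=1):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred = kernel_ridge_predict([x[i] for i in train], [y[i] for i in train],
                                    [x[i] for i in test], kernel, lam)
        accs.append(pearson(pred, [y[i] for i in test]))
    return sum(accs) / k


# Toy data (hypothetical): 100 clones, 40 markers coded 0/1/2, purely additive trait.
rng = random.Random(42)
geno = [[rng.choice([0, 1, 2]) for _ in range(40)] for _ in range(100)]
effects = [rng.gauss(0, 1) for _ in range(40)]
pheno = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 2) for row in geno]

acc_gblup = kfold_predictive_ability(geno, pheno, linear_kernel, lam=5.0)
acc_rkhs = kfold_predictive_ability(geno, pheno, gaussian_kernel, lam=0.5)
print(f"linear kernel: {acc_gblup:.2f}, Gaussian kernel: {acc_rkhs:.2f}")
```

Because predictive ability is measured against phenotypes rather than true breeding values, its expected value is bounded above by the square root of heritability, which is one reason accuracy tracks heritability across traits.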

How Population Structure Impacts Genomic Selection Accuracy in Cross-Validation: Implications for Practical Breeding

Frontiers in Plant Science

10.3389/fpls.2020.592977

2020

Vol 11

Author(s):

Christian R. Werner

R. Chris Gaynor

Gregor Gorjanc

John M. Hickey

Tobias Kox

...

Keyword(s):

Family Structure

Genomic Selection

Genomic Prediction

Prediction Accuracy

Cross Validation

Careful Analysis

Critical Approach

Crop Species

Breeding Programs

Mendelian Sampling

Over the last two decades, the application of genomic selection has been extensively studied in various crop species, and it has become common practice to report prediction accuracies using cross validation. However, genomic prediction accuracies obtained from random cross validation can be strongly inflated by population or family structure, a characteristic shared by many breeding populations. An understanding of the effect of population and family structure on prediction accuracy is essential for the successful application of genomic selection in plant breeding programs. The objective of this study was to make this effect and its implications for practical breeding programs comprehensible to breeders and scientists with a limited background in quantitative genetics and genomic selection theory. We therefore compared genomic prediction accuracies obtained from different random cross validation approaches and within-family prediction in three different prediction scenarios. We used a highly structured population of 940 Brassica napus hybrids coming from 46 testcross families and two subpopulations. Our demonstrations show how genomic prediction accuracies obtained from among-family predictions in random cross validation and within-family predictions capture different measures of prediction accuracy. While among-family prediction accuracy measures prediction accuracy of both the parent average component and the Mendelian sampling term, within-family prediction only measures how accurately the Mendelian sampling term can be predicted. With this paper we aim to foster a critical approach to different measures of genomic prediction accuracy and a careful analysis of values observed in genomic selection experiments and reported in the literature.
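The distinction between the parent average and the Mendelian sampling term can be made concrete with a small simulation. The sketch below (stdlib Python) is an illustration under assumed settings: all variances, family sizes, and the hypothetical predictor are chosen for demonstration and are not taken from the study. True breeding values are built as PA + MS, and the predictor knows PA exactly but tracks MS only weakly. Pooling all individuals mimics among-family accuracy from random cross validation, while averaging correlations computed inside each family isolates the Mendelian sampling component.

```python
import math
import random


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def simulate_families(n_fam=40, n_per=25, ms_signal=0.3, seed=7):
    """True breeding value = parent average (PA) + Mendelian sampling term (MS).

    The hypothetical predictor knows each family's PA exactly but captures
    only a weak, noisy fraction of each individual's MS.
    """
    rng = random.Random(seed)
    families = []
    for _ in range(n_fam):
        pa = rng.gauss(0, 1)  # family parent average, variance 1
        members = []
        for _ in range(n_per):
            ms = rng.gauss(0, 1)  # within-family deviation, variance 1
            pred = pa + ms_signal * ms + rng.gauss(0, 1)
            members.append((pa + ms, pred))  # (true BV, prediction)
        families.append(members)
    return families


fams = simulate_families()
pooled = [m for fam in fams for m in fam]
# Among-family (random-CV style): one correlation over all individuals pooled.
among = pearson([p for _, p in pooled], [bv for bv, _ in pooled])
# Within-family: average of the correlations computed inside each family.
within = sum(pearson([p for _, p in fam], [bv for bv, _ in fam]) for fam in fams) / len(fams)
print(f"among-family r = {among:.2f}, mean within-family r = {within:.2f}")
```

With these settings the expected pooled correlation is about 1.3 / sqrt(2 × 2.09) ≈ 0.64, whereas the expected within-family correlation is 0.3 / sqrt(1.09) ≈ 0.29, so a predictor can look good in random cross validation while ranking full-sibs poorly.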

Scalable Sparse Testing Genomic Selection Strategy for Early Yield Testing Stage

Frontiers in Plant Science

10.3389/fpls.2021.658978

2021

Vol 12

Author(s):

Sikiru Adeniyi Atanda

Michael Olsen

Jose Crossa

Juan Burgueño

Renaud Rincent

...

Keyword(s):

Genomic Selection

Prediction Accuracy

Cross Validation

Prediction Models

Historical Data

Coefficient Of Determination

Selection Strategy

Environment Interaction

Genetic Merit

Testing Stage

To enable a scalable sparse testing genomic selection (GS) strategy at preliminary yield trials in the CIMMYT maize breeding program, optimal approaches to incorporate genotype by environment interaction (GEI) into genomic prediction models are explored. Two cross-validation schemes were evaluated: CV1, predicting the genetic merit of new bi-parental populations that have been evaluated in some environments and not others; and CV2, predicting the genetic merit of half of a bi-parental population that has been phenotyped in some environments and not others, using the coefficient of determination (CDmean) to determine the optimized subsets of each full-sib family to be evaluated in each environment. We report similar prediction accuracies for CV1 and CV2; however, CV2 has an intuitive appeal in that all bi-parental populations are represented across environments, allowing efficient use of information across environments. It is also ideal for building robust historical data because all individuals of a full-sib family have phenotypic data, albeit in different environments. Results show that grouping environments according to similar growing/management conditions improved prediction accuracy and reduced computational requirements, providing a scalable, parsimonious approach to multi-environment trials and GS in early testing stages. We further demonstrate that complementing the full-sib calibration set with optimized historical data improves prediction accuracy for both cross-validation schemes.

Genomic Selection for Any Dairy Breeding Program via Optimized Investment in Phenotyping and Genotyping

Frontiers in Genetics

10.3389/fgene.2021.637017

2021

Vol 12

Author(s):

Jana Obšteter

Janez Jenko

Gregor Gorjanc

Keyword(s):

Genomic Selection

Genetic Gain

Breeding Program

Progeny Testing

Initial Training

Training Population

Breeding Programs

Use Of Resources

Dairy Cattle Breeding

Close Relatives

This paper evaluates the potential of maximizing genetic gain in dairy cattle breeding by optimizing investment into phenotyping and genotyping. Conventional breeding focuses on phenotyping selection candidates or their close relatives to maximize selection accuracy for breeders and quality assurance for producers. Genomic selection decoupled phenotyping and selection and, through this, increased genetic gain per year compared with conventional selection. Although genomic selection is established in well-resourced breeding programs, small populations and developing countries still struggle with its implementation. The main issues include the lack of training animals and the lack of financial resources. To address this, we simulated a case study of a small dairy population with a number of scenarios that had equal available resources but differed in how those resources were split between phenotyping and genotyping. The conventional progeny testing scenario collected 11 phenotypic records per lactation. In the genomic selection scenarios, we reduced phenotyping to between 10 and 1 phenotypic records per lactation and invested the saved resources into genotyping. We tested these scenarios at different relative prices of phenotyping to genotyping, with and without an initial training population for genomic selection. Reallocating part of the phenotyping resources for repeated milk records to genotyping increased genetic gain compared with the conventional selection scenario, regardless of the amount and relative cost of phenotyping and of the availability of an initial training population. Genetic gain increased with increasing genotyping, despite reduced phenotyping. High-genotyping scenarios even saved resources. As expected, genomic selection scenarios increased accuracy for young non-phenotyped candidate males and females, but they also increased it for proven females. This study shows that breeding programs should optimize investment into phenotyping and genotyping to maximize return on investment.
Our results suggest that any dairy breeding program using conventional progeny testing with repeated milk records can implement genomic selection without increasing the level of investment.
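The reallocation of a fixed budget from repeated milk records to genotypes is, at its core, simple arithmetic. The sketch below assumes a hypothetical herd size and expresses the genotype price in units of the price of one phenotypic record; it ignores all other costs and is not the cost model used in the study.

```python
import math


def genotypes_from_saved_phenotyping(n_cows, full_records, reduced_records,
                                     genotype_cost_in_records):
    """Number of genotypes affordable after cutting milk records per lactation.

    genotype_cost_in_records: price of one genotype expressed in units of the
    price of one phenotypic record (a hypothetical relative-price parameter).
    """
    if not 1 <= reduced_records <= full_records:
        raise ValueError("reduced_records must be between 1 and full_records")
    saved_records = n_cows * (full_records - reduced_records)
    return math.floor(saved_records / genotype_cost_in_records)


# Hypothetical example: 1,000 recorded cows, 11 -> 4 records per lactation,
# one genotype assumed to cost as much as 20 phenotypic records.
print(genotypes_from_saved_phenotyping(1000, 11, 4, 20))
```

Here 1,000 cows × 7 dropped records frees 7,000 record-equivalents, which covers 350 genotypes at a price ratio of 20; a lower relative genotyping price buys proportionally more genotypes from the same savings.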

Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods

10.1101/595397

2019

Author(s):

Daniel Runcie

Hao Cheng

Keyword(s):

Genomic Prediction

Prediction Accuracy

Cross Validation

Prediction Models

Selection Index

Parametric Method

Multiple Traits

Gold Standard Method

Secondary Traits

Validation Strategy

ABSTRACT Incorporating measurements on correlated traits into genomic prediction models can increase prediction accuracy and selection gain. However, multi-trait genomic prediction models are complex and prone to overfitting, which may result in a loss of prediction accuracy relative to single-trait genomic prediction. Cross-validation is considered the gold standard method for selecting and tuning models for genomic prediction in both plant and animal breeding. When used appropriately, cross-validation gives an accurate estimate of the prediction accuracy of a genomic prediction model and can effectively choose among disparate models based on their expected performance in real data. However, we show that a naive cross-validation strategy applied to the multi-trait prediction problem can be severely biased and lead to sub-optimal choices between single- and multi-trait models when secondary traits are used to aid in the prediction of focal traits and these secondary traits are measured on the individuals to be tested. We use simulations to demonstrate the extent of the problem and propose three partial solutions: 1) a parametric solution from selection index theory, 2) a semi-parametric method for correcting the cross-validation estimates of prediction accuracy, and 3) a fully non-parametric method, which we call CV2*, that validates model predictions against focal trait measurements from genetically related individuals. The current excitement over high-throughput phenotyping suggests that more comprehensive phenotype measurements will be useful for accelerating breeding programs. Using an appropriate cross-validation strategy should more reliably determine if and when combining information across multiple traits is useful.
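Why naive cross-validation can mislead when secondary traits are recorded on the test individuals can be seen in a toy simulation. It assumes zero genetic correlation between the focal and secondary traits but a residual correlation of 0.8, both traits with heritability 0.5, and it uses the test individual's own secondary record as the "prediction". This illustrates only the source of the bias; it is not an implementation of the selection-index correction, the semi-parametric correction, or CV2*.

```python
import math
import random


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def simulate(n=2000, rho_e=0.8, seed=3):
    """Focal and secondary traits with zero genetic correlation but residual
    correlation rho_e; genetic and residual variances are both 1."""
    rng = random.Random(seed)
    g_focal, y_focal, y_secondary = [], [], []
    for _ in range(n):
        g_f = rng.gauss(0, 1)  # focal genetic value
        g_s = rng.gauss(0, 1)  # secondary genetic value, independent of g_f
        e_f = rng.gauss(0, 1)
        e_s = rho_e * e_f + math.sqrt(1 - rho_e ** 2) * rng.gauss(0, 1)
        g_focal.append(g_f)
        y_focal.append(g_f + e_f)
        y_secondary.append(g_s + e_s)
    return g_focal, y_focal, y_secondary


g, y, s = simulate()
pred = s                      # "model" = the test individual's own secondary record
naive = pearson(pred, y)      # what naive cross-validation would report
true_acc = pearson(pred, g)   # correlation with the focal genetic value
print(f"naive r = {naive:.2f}, r with focal genetic value = {true_acc:.2f}")
```

In expectation the naive correlation is 0.8 / (√2 × √2) = 0.4, driven entirely by shared residuals, while the correlation with the focal genetic value is 0; validating against relatives' focal records, as CV2* does, avoids sharing residuals with the predictor.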

Maximizing efficiency of genomic selection in CIMMYT’s tropical maize breeding program

Theoretical and Applied Genetics

10.1007/s00122-020-03696-9

2020

Author(s):

Sikiru Adeniyi Atanda

Michael Olsen

Juan Burgueño

Jose Crossa

Daniel Dzidzienyo

...

Keyword(s):

Genomic Selection

Prediction Accuracy

Large Scale

Primary Objective

Breeding Program

Breeding Cycle

Training Set

Maize Breeding

Phenotypic Data

Breeding Programs

Key message: Historical data from breeding programs can be efficiently used to improve genomic selection accuracy, especially when the training set is optimized to subset individuals most informative of the target testing set.
Abstract: The current strategy for large-scale implementation of genomic selection (GS) at the International Maize and Wheat Improvement Center (CIMMYT) global maize breeding program has been to train models using information from full-sibs in a “test-half-predict-half approach.” Although effective, this approach has limitations, as it requires large full-sib populations and limits the ability to shorten variety testing and breeding cycle times. The primary objective of this study was to identify optimal experimental and training set designs to maximize prediction accuracy of GS in CIMMYT’s maize breeding programs. Training set (TS) design strategies were evaluated to determine the most efficient use of phenotypic data collected on relatives for genomic prediction (GP) using datasets containing 849 (DS1) and 1389 (DS2) DH lines evaluated as testcrosses in 2017 and 2018, respectively. Our results show there is merit in the use of multiple bi-parental populations as TS when selected using algorithms to maximize relatedness between the training and prediction sets. In a breeding program where relevant past breeding information is not readily available, the phenotyping expenditure can be spread across connected bi-parental populations by phenotyping only a small number of lines from each population. This significantly improves prediction accuracy compared with within-population prediction, especially when the TS for within-full-sib prediction is small. Finally, we demonstrate that prediction accuracy in either sparse testing or “test-half-predict-half” can be further improved by optimizing which lines are planted for phenotyping and which lines are only genotyped for advancement based on GP.
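Selecting training lines that are most related to the prediction set can be sketched with a simple ranking heuristic. The sketch below assumes 0/1/2 genotype vectors, uses identity-by-state similarity, and keeps the k candidates with the highest mean similarity to the test set; the optimization algorithms used in the study may differ, and the function names and toy genotypes are hypothetical.

```python
def ibs_similarity(a, b):
    """Identity-by-state similarity of two 0/1/2 genotype vectors (1 = identical)."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / (2.0 * len(a))


def select_training_set(candidates, test_set, k):
    """Rank candidate lines by mean similarity to the test set and keep the top k."""
    score = {
        name: sum(ibs_similarity(g, t) for t in test_set.values()) / len(test_set)
        for name, g in candidates.items()
    }
    # Ties are broken by name so the result is deterministic.
    return sorted(score, key=lambda name: (-score[name], name))[:k]


candidates = {"a": [0, 1, 2, 2], "b": [2, 1, 0, 0], "c": [0, 1, 2, 1]}
test_set = {"t1": [0, 1, 2, 2]}
print(select_training_set(candidates, test_set, 2))  # prints ['a', 'c']
```

Ranking by mean similarity is the simplest possible criterion; it can favor many near-duplicates of one test family, which is why more elaborate algorithms also weigh diversity within the training set.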

Haplotype genomic prediction of phenotypic values based on chromosome distance and gene boundaries using low-coverage sequencing in Duroc pigs

Genetics Selection Evolution

10.1186/s12711-021-00661-y

2021

Vol 53 (1)

Author(s):

Cheng Bian

Dzianis Prakapenka

Cheng Tan

Ruifei Yang

Di Zhu

...

Keyword(s):

Genomic Selection

Genomic Prediction

Prediction Accuracy

Prediction Models

Average Daily Gain

Live Weight

Feed Conversion

Muscle Area

Haplotype Blocks

Low Coverage

Abstract
Background: Genomic selection using single nucleotide polymorphism (SNP) markers has been widely used for genetic improvement of livestock, but most current methods of genomic selection are based on SNP models. In this study, we investigated the prediction accuracies of haplotype models based on fixed chromosome distances and gene boundaries compared to those of SNP models for genomic prediction of phenotypic values. We also examined the reasons for the successes and failures of haplotype genomic prediction.
Methods: We analyzed a swine population of 3195 Duroc boars with records on eight traits: body judging score (BJS), teat number (TN), age (AGW), loin muscle area (LMA), loin muscle depth (LMD) and back fat thickness (BF) at 100 kg live weight, and average daily gain (ADG) and feed conversion rate (FCR) from 30 to 100 kg live weight. Ten-fold cross-validation was used to evaluate the prediction accuracy of each SNP model and each multi-allelic haplotype model based on 488,124 autosomal SNPs from low-coverage sequencing. Haplotype blocks were defined using fixed chromosome distances or gene boundaries.
Results: Compared to the best SNP model, the accuracy of predicting phenotypic values using a haplotype model was greater by 7.4% for BJS, 7.1% for AGW, 6.6% for ADG, 4.9% for FCR, 2.7% for LMA, 1.9% for LMD, 1.4% for BF, and 0.3% for TN. The use of gene-based haplotype blocks resulted in the best prediction accuracy for LMA, LMD, and TN. Estimates of haplotype epistasis heritability, more than estimates of SNP additive heritability, were strongly correlated with the increase in prediction accuracy by haplotype models. The increase in prediction accuracy was largest for BJS, AGW, ADG, and FCR, which also had the largest estimates of haplotype epistasis heritability: 24.4% for BJS, 14.3% for AGW, 14.5% for ADG, and 17.7% for FCR. SNP and haplotype heritability profiles across the genome identified several genes with large genetic contributions to phenotypes: NUDT3 for LMA, LMD and BF, VRTN for TN, COL5A2 for BJS, BSND for ADG, and CARTPT for FCR.
Conclusions: Haplotype prediction models improved the accuracy of genomic prediction of phenotypes in Duroc pigs. For some traits, the best prediction accuracy was obtained with haplotypes defined using gene regions, which provides evidence that functional genomic information can improve the accuracy of haplotype genomic prediction for certain traits.
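Defining haplotype blocks by fixed chromosome distance, and coding the multi-allelic haplotypes inside a block, can be sketched as follows. The sketch assumes phased haplotypes given as allele strings and a hypothetical block width in base pairs; it does not cover the gene-boundary blocks or the prediction models used in the study.

```python
from collections import Counter, defaultdict


def blocks_by_distance(snps, width_bp):
    """Group SNPs into haplotype blocks of fixed physical width.

    snps: list of (chromosome, position_bp). Returns {(chrom, block_no): [snp indices]}.
    """
    blocks = defaultdict(list)
    for i, (chrom, pos) in enumerate(snps):
        blocks[(chrom, pos // width_bp)].append(i)
    return dict(blocks)


def block_haplotypes(hap1, hap2, snp_idx):
    """Count the multi-allelic haplotype alleles a phased diploid carries in one block."""
    allele1 = "".join(hap1[i] for i in snp_idx)
    allele2 = "".join(hap2[i] for i in snp_idx)
    return Counter([allele1, allele2])


# Hypothetical SNP map and one phased individual.
snps = [("1", 100), ("1", 900), ("1", 1500), ("2", 50)]
blocks = blocks_by_distance(snps, 1000)
print(blocks)
print(block_haplotypes("ACGT", "AGGT", blocks[("1", 0)]))
```

Each distinct allele string within a block then acts as one allele of a multi-allelic locus, with copy numbers of 0, 1, or 2 per individual.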

Genomic selection for lentil breeding: empirical evidence

10.1101/608406

2019

Cited By ~ 2

Author(s):

Teketel A. Haile

Taryn Heidecker

Derek Wright

Sandesh Neupane

Larissa Ramsay

...

Keyword(s):

Genomic Selection

Statistical Models

Prediction Accuracy

Recombinant Inbred Lines

Genotype By Environment Interaction

Exome Capture

Environment Interaction

Multiple Trait

Similar Accuracy

Population Prediction

Abstract Genomic selection (GS) is a type of marker-based selection that was initially suggested for livestock breeding and is now being encouraged for crop breeding. Several statistical models and approaches have been developed to implement GS; however, none of these methods have been tested for use in lentil breeding. This study was conducted to evaluate different GS models and prediction scenarios based on empirical data and to make recommendations for designing genomic selection strategies for lentil breeding. We evaluated nine single-trait models, two multiple-trait models, and models that account for population structure and genotype-by-environment interaction (GEI) using a lentil diversity panel and two recombinant inbred line (RIL) populations that were genotyped using a custom exome capture assay. Within-population, across-population and across-environment predictions were made for five phenology traits. Prediction accuracy varied among the evaluated models, populations, prediction scenarios, and traits. Single-trait models showed similar accuracy for each trait in the absence of large-effect QTL, but BayesB outperformed all other models when there were QTL with relatively large effects. Models that accounted for GEI and multiple-trait (MT) models increased prediction accuracy for a low-heritability trait by up to 66% and 14%, respectively, but accuracy did not improve for traits of high heritability. Moderate to high accuracies were obtained for within-population and across-environment predictions, but across-population prediction accuracy was very low. This suggests that GS can be implemented in lentil to make predictions within populations and across environments, but across-population prediction should not be considered when the population size is small.

Optimizing Low-Cost Genotyping and Imputation Strategies for Genomic Selection in Atlantic Salmon

G3 Genes|Genome|Genetics

10.1534/g3.119.400800

2019

Vol 10 (2)

pp. 581-590

Cited By ~ 4

Author(s):

Smaragda Tsairidou

Alastair Hamilton

Diego Robledo

James E. Bron

Ross D. Houston

Keyword(s):

Atlantic Salmon

Genomic Selection

Environmental Sustainability

Genomic Prediction

Prediction Accuracy

Imputation Accuracy

Cost Effective

High Density

Genotype Imputation

Breeding Programs

Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs, which are characterized by large full-sibling families, has yet to be fully assessed. The aim of this study was to optimize the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. Genomic prediction accuracy was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel and offspring were genotyped for a range of lower-density imputation panels. Reducing SNP density had little impact on prediction accuracy down to 5,000 SNPs, below which accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs and parents for 5,000 SNPs was 0.53. This accuracy was similar to that obtained with the full high-density and optimal-density datasets, and markedly higher than that obtained with 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.
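GBLUP predictions rest on a genomic relationship matrix built from the marker genotypes. A minimal sketch, assuming the widely used VanRaden "method 1" construction and allele frequencies estimated from the genotyped sample itself (the study's exact construction is not stated here), is:

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes.

    genotypes: list of individuals, each a list of allele counts per SNP.
    Allele frequencies are estimated from the genotyped sample itself.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each marker on twice its allele frequency.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zj)) / denom for zj in z] for zi in z]


print(vanraden_g([[0, 2], [2, 0]]))  # prints [[2.0, -2.0], [-2.0, 2.0]]
```

The same function applies whether the genotypes come from the full panel or were imputed from a low-density panel; the imputation step itself is not shown.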

Improving Prediction Accuracy Using Multi-allelic Haplotype Prediction and Training Population Optimization in Wheat

G3 Genes|Genome|Genetics

10.1534/g3.120.401165

2020

Vol 10 (7)

pp. 2265-2273

Cited By ~ 1

Author(s):

Ahmad H. Sallam

Emily Conley

Dzianis Prakapenka

Yang Da

James A. Anderson

Keyword(s):

Population Structure

Protein Content

Prediction Accuracy

Cross Validation

Predictive Ability

Training Population

Percentage Points

And Training

Fold Cross Validation

Single Snps

The use of haplotypes may improve the accuracy of genomic prediction over single SNPs because haplotypes can better capture linkage disequilibrium and genomic similarity in different lines and may capture local high-order allelic interactions. Additionally, prediction accuracy could be improved by representing population structure in the calibration set. A set of 383 advanced lines and cultivars that represent the diversity of the University of Minnesota wheat breeding program was phenotyped for yield, test weight, and protein content and genotyped using the Illumina 90K SNP Assay. Population structure was confirmed using single SNPs. Haplotype blocks of 5, 10, 15, and 20 adjacent markers were constructed for all chromosomes. A multi-allelic haplotype prediction algorithm was implemented and compared with single SNPs using both k-fold cross validation and stratified sampling optimization. After population structure was confirmed, stratified sampling improved predictive ability compared with k-fold cross validation for yield and protein content, but reduced it for test weight. In all cases, haplotype predictions outperformed single SNPs. Haplotypes of 15 adjacent markers showed the best improvement in accuracy for all traits, although the improvement was more pronounced for yield and protein content. The combined use of haplotypes of 15 adjacent markers and training population optimization significantly improved predictive ability for yield and protein content by 14.3% (four percentage points) and 16.8% (seven percentage points), respectively, compared with using single SNPs and k-fold cross validation. These results emphasize the effectiveness of using haplotypes in genomic selection to increase genetic gain in self-fertilized crops.
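Coding lines by haplotypes of k adjacent markers turns each block into a multi-allelic locus. The sketch below assumes homozygous inbred lines written as allele strings and produces a 0/1 design matrix with one column per observed haplotype per block, which could then feed a ridge-type model. It is an illustrative coding step with hypothetical names and toy data, not the study's prediction algorithm.

```python
def adjacent_blocks(n_markers, k):
    """Split marker indices into consecutive blocks of k (the last may be shorter)."""
    return [list(range(s, min(s + k, n_markers))) for s in range(0, n_markers, k)]


def haplotype_design(lines, k):
    """lines: marker strings for homozygous inbred lines.

    Returns (labels, x): labels are (block, haplotype) pairs and x is a 0/1
    matrix with one row per line and one column per observed haplotype.
    """
    blocks = adjacent_blocks(len(lines[0]), k)
    labels = []
    for b, idx in enumerate(blocks):
        alleles = sorted({"".join(line[i] for i in idx) for line in lines})
        labels.extend((b, a) for a in alleles)
    col = {lab: j for j, lab in enumerate(labels)}
    x = []
    for line in lines:
        row = [0] * len(labels)
        for b, idx in enumerate(blocks):
            row[col[(b, "".join(line[i] for i in idx))]] = 1
        x.append(row)
    return labels, x


labels, x = haplotype_design(["AACC", "AACG", "TACC"], 2)
print(labels)  # prints [(0, 'AA'), (0, 'TA'), (1, 'CC'), (1, 'CG')]
print(x)       # prints [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0]]
```

Larger k yields more distinct haplotypes per block and therefore more, sparser columns, which is one trade-off behind comparing block sizes of 5, 10, 15, and 20 markers.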

Genomic Selection in Winter Wheat Breeding Using a Recommender Approach

Genes

10.3390/genes11070779

2020

Vol 11 (7)

pp. 779

Author(s):

Dennis N. Lozada

Arron H. Carter

Keyword(s):

Winter Wheat

Genomic Selection

Prediction Models

Heading Date

Predictive Ability

Wheat Breeding

SNP Markers

Bayesian Regression

Phenotypic Trait

Breeding Programs

Achieving optimal predictive ability is key to increasing the relevance of genomic selection (GS) approaches in plant breeding programs. This study explored the potential of an item-based collaborative filtering (IBCF) recommender system in the context of multi-trait, multi-environment GS. Different GS scenarios for IBCF were evaluated for a diverse population of winter wheat lines adapted to the Pacific Northwest region of the US. Cross-validated predictions across years improved predictive ability when the correlation between environments was high. Using multiple spectral traits collected from high-throughput phenotyping resulted in better GS accuracies for grain yield (GY) than using single traits alone. Among the GS scenarios evaluated, trait adjustments through various Bayesian regression models using genomic information from SNP markers were the most effective in improving accuracies for GY, heading date, and plant height. Bayesian LASSO had the highest predictive ability of the models used for phenotypic trait adjustments. IBCF gave accuracies competitive with those of a genomic best linear unbiased predictor (GBLUP) model for predicting different traits. Overall, an IBCF approach could be used as an alternative to traditional prediction models for important target traits in wheat breeding programs.
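Item-based collaborative filtering treats each trait-environment combination as an "item" and predicts a missing value from the same line's observed items, weighted by item-item similarity. The sketch below uses an adjusted-cosine similarity (items centred on their means) computed over lines observed for both items; this choice, the toy data, and the function names are assumptions, and the study's exact similarity measure and data standardization may differ. Each item needs at least one observed value.

```python
import math


def item_means(r):
    """Mean of the observed values for each item (trait-environment column)."""
    means = []
    for j in range(len(r[0])):
        vals = [row[j] for row in r if row[j] is not None]
        means.append(sum(vals) / len(vals))
    return means


def item_similarity(r, means, i, j):
    """Cosine similarity of mean-centred items i and j over lines observed for both."""
    num = ss_i = ss_j = 0.0
    for row in r:
        if row[i] is not None and row[j] is not None:
            a, b = row[i] - means[i], row[j] - means[j]
            num += a * b
            ss_i += a * a
            ss_j += b * b
    return num / math.sqrt(ss_i * ss_j) if ss_i > 0 and ss_j > 0 else 0.0


def ibcf_predict(r, line, item):
    """Predict r[line][item] from the same line's other observed items."""
    means = item_means(r)
    num = den = 0.0
    for j, value in enumerate(r[line]):
        if j != item and value is not None:
            s = item_similarity(r, means, item, j)
            num += s * (value - means[j])
            den += abs(s)
    return means[item] + (num / den if den > 0 else 0.0)


# Rows = wheat lines, columns = hypothetical trait-environment combinations;
# None marks a missing record.
ratings = [[1.0, 2.0], [3.0, 4.0], [5.0, None]]
print(ibcf_predict(ratings, 2, 1))  # prints 5.0
```

The third line sits two units above the first item's mean, and because the two items are positively similar, the prediction for the missing item is shifted by the same two units above that item's mean.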
