Selection of trait-specific markers and multi-environment models improve genomic predictive ability in rice

Mapping Intimacies ◽ 10.1101/482109 ◽ 2018

Author(s): Aditi Bhandari ◽ Jérôme Bartholomé ◽ Tuong-Vi Cao ◽ Nilima Kumari ◽ Julien Frouin ◽ ...

Keyword(s): Drought Stress ◽ Genomic Prediction ◽ Complex Traits ◽ Prediction Models ◽ Predictive Ability ◽ Reference Population ◽ SNP Markers ◽ Selection Strategy ◽ Specific Marker ◽ Marker Selection

Abstract: Developing high-yielding rice varieties that are tolerant to drought stress is crucial for the sustainable livelihood of rice farmers in rainfed rice cropping ecosystems. Genomic selection (GS) promises to be an effective breeding option for these complex traits. We evaluated the effectiveness of two rather new options in the implementation of GS: trait- and environment-specific marker selection and the use of multi-environment prediction models. A reference population of 280 rainfed lowland accessions genotyped with 215k SNP markers was phenotyped under one favorable and two managed drought environments. Trait-specific SNP subsets (28k) were selected for each trait under each environment, using results of GWAS performed with the complete genotype dataset. Performances of single-environment and multi-environment genomic prediction models were compared using kernel regression-based methods (GBLUP and RKHS) under two cross-validation scenarios: availability (CV2) or not (CV1) of phenotypic data for the validation set in one of the environments. With the most realistic trait-specific marker selection strategy, the predictive ability (PA) of genomic prediction was up to 22% higher than with markers selected on the basis of neutral linkage disequilibrium (LD). Tolerance to drought stress was up to 32% better predicted by multi-environment models (especially RKHS-based models) under the CV2 strategy. Under the less favorable CV1 strategy, the multi-environment models achieved PA similar to that of the single-environment predictions. We also showed that reasonable PA could be obtained with as few as 3,000 SNP markers, even in a population of low LD extent, provided marker selection is based on pairwise LD. The implications of these findings for breeding for drought tolerance are discussed. The most resource-sparing option would be accurate phenotyping of the reference population in a favorable environment and under a managed drought, while the candidate population would be phenotyped only under one of those environments.
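As a rough illustration of how a predictive ability like the one reported above is typically estimated, the sketch below runs GBLUP in its kernel ridge form inside a random k-fold cross-validation. Everything in it is an assumption made for illustration: the function names, the toy simulated genotypes, and the fixed shrinkage parameter `lam` (in practice the variance ratio is usually estimated, for example by REML). It is not the authors' pipeline and does not reproduce their CV1/CV2 multi-environment design.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from a 0/1/2 marker matrix (lines x markers)."""
    p = M.mean(axis=0) / 2.0                      # allele frequency per marker
    Z = M - 2.0 * p                               # centre each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train, test, lam=1.0):
    """Kernel ridge form of GBLUP: predict test lines from training phenotypes.

    lam is a fixed, hypothetical shrinkage value standing in for the
    residual-to-genetic variance ratio.
    """
    mu = y[train].mean()
    K_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha

def predictive_ability(M, y, k=5, lam=1.0, seed=0):
    """Mean Pearson correlation between observed and predicted values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    G = vanraden_grm(M)
    cors = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = gblup_predict(G, y, train, test, lam)
        cors.append(np.corrcoef(y[test], pred)[0, 1])
    return float(np.mean(cors))

# Toy simulated data (hypothetical sizes, unrelated to the rice panel).
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(200, 500)).astype(float)
g = M @ rng.normal(0.0, 1.0, 500)                 # simulated genetic values
y = g + rng.normal(0.0, g.std(), 200)             # add noise of similar scale
G = vanraden_grm(M)
pa = predictive_ability(M, y)
```

Marker subsets chosen by GWAS or by LD pruning could be compared by passing different column subsets of `M` to `predictive_ability`; to avoid optimistic estimates, any phenotype-based selection would have to be repeated inside each training fold.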

Harnessing Genetic Diversity in the USDA Pea Germplasm Collection Through Genomic Prediction

Frontiers in Genetics ◽ 10.3389/fgene.2021.707754 ◽ 2021 ◽ Vol 12

Author(s): Md. Abdullah Al Bari ◽ Ping Zheng ◽ Indalecio Viera ◽ Hannah Worral ◽ Stephen Szwiec ◽ ...

Keyword(s): Genetic Diversity ◽ Seed Yield ◽ Genomic Prediction ◽ Complex Traits ◽ Prediction Models ◽ Germplasm Collection ◽ Predictive Ability ◽ SNP Markers ◽ Breeding Values ◽ Germplasm Collections

Phenotypic evaluation and efficient utilization of germplasm collections can be time-intensive, laborious, and expensive. However, with the plummeting costs of next-generation sequencing and the addition of genomic selection to the plant breeder’s toolbox, we can now tap the genetic diversity within large germplasm collections more efficiently. In this study, we evaluated the potential of genomic prediction in a set of 482 pea (Pisum sativum L.) accessions, genotyped with 30,600 single nucleotide polymorphism (SNP) markers and phenotyped for seed yield and yield-related components, to enhance selection of accessions from the USDA Pea Germplasm Collection. Genomic prediction models and several factors affecting predictive ability were evaluated in a series of cross-validation schemes across complex traits. Different genomic prediction models gave similar results, with predictive ability across traits ranging from 0.23 to 0.60 and no model working best across all traits. Increasing the training population size improved the predictive ability for most traits, including seed yield. Predictive abilities increased and then reached a plateau as the number of markers increased, presumably due to extensive linkage disequilibrium in the pea genome. Accounting for population structure effects did not significantly boost predictive ability, but we observed a slight improvement for seed yield. By applying the best genomic prediction model (e.g., RR-BLUP), we then examined the distribution of genotyped but nonphenotyped accessions and the reliability of genomic estimated breeding values (GEBV). The distribution of GEBV suggested that none of the nonphenotyped accessions were expected to perform outside the range of the phenotyped accessions. Desirable breeding values with higher reliability can be used to identify and screen favorable germplasm accessions. Expanding the training set and incorporating additional orthogonal information (e.g., transcriptomics, metabolomics, physiological traits, etc.) into the genomic prediction framework can enhance prediction accuracy.

Multi-trait Genomic Prediction Model Increased the Predictive Ability for Agronomic and Malting Quality Traits in Barley (Hordeum vulgare L.)

G3 Genes|Genome|Genetics ◽ 10.1534/g3.119.400968 ◽ 2020 ◽ Vol 10 (3) ◽ pp. 1113-1124 ◽ Cited By ~ 8

Author(s): Madhav Bhatta ◽ Lucia Gutierrez ◽ Lorena Cammarota ◽ Fernanda Cardozo ◽ Silvia Germán ◽ ...

Keyword(s): Prediction Model ◽ Genomic Prediction ◽ Complex Traits ◽ Prediction Models ◽ Agronomic Traits ◽ Predictive Ability ◽ Malting Quality ◽ Quality Traits ◽ Multiple Traits ◽ Correlated Traits

Plant breeders regularly evaluate multiple traits across multiple environments, which opens an avenue for using multiple traits in genomic prediction models. We assessed the potential of a multi-trait (MT) genomic prediction model by evaluating several strategies for incorporating multiple traits (eight agronomic and malting quality traits) into the prediction models with two cross-validation schemes in barley: CV1, predicting new lines with genotypic information only, and CV2, predicting partially phenotyped lines using both genotypic and phenotypic information from correlated traits. The predictive ability was similar for single-trait (ST-CV1) and multi-trait (MT-CV1) models when predicting new lines. However, the predictive ability for agronomic traits increased considerably when partially phenotyped lines (MT-CV2) were used. For grain yield, the MT-CV2 model with other agronomic traits gave 57% and 61% higher predictive ability than the ST-CV1 and MT-CV1 models, respectively. Therefore, complex traits such as grain yield are better predicted when correlated traits are used. Similarly, a considerable increase in the predictive ability for malting quality traits was observed when correlated traits were used. For grain protein content, the MT-CV2 model with both agronomic and malting traits gave a 76% higher predictive ability than the ST-CV1 and MT-CV1 models. Additionally, higher predictive ability for new environments was obtained for all traits using the MT-CV2 model compared to the MT-CV1 model. This study showed the potential of improving the genomic prediction of complex traits by incorporating information from multiple traits (cost-friendly and easy-to-measure traits) collected throughout breeding programs, which could assist in speeding up breeding cycles.
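The difference between the CV1 and CV2 schemes described above comes down to which phenotypes are hidden from the model for the validation lines. A minimal sketch of that masking step follows, assuming a lines-by-traits phenotype matrix; the function name and the toy matrix are hypothetical and the multi-trait model itself is not shown.

```python
import numpy as np

def mask_phenotypes(Y, test_lines, target_trait, scheme):
    """Return a copy of Y (lines x traits) with NaN where values are hidden from training."""
    Y_masked = np.array(Y, dtype=float)           # copy so the input is untouched
    if scheme == "CV1":
        # New lines: no phenotype of any trait is available.
        Y_masked[test_lines, :] = np.nan
    elif scheme == "CV2":
        # Partially phenotyped lines: only the target trait is hidden,
        # correlated traits stay visible to the model.
        Y_masked[test_lines, target_trait] = np.nan
    else:
        raise ValueError("scheme must be 'CV1' or 'CV2'")
    return Y_masked

# Toy matrix: 4 lines x 3 traits, trait 0 is the prediction target.
Y = np.arange(12, dtype=float).reshape(4, 3)
cv1 = mask_phenotypes(Y, [2, 3], 0, "CV1")
cv2 = mask_phenotypes(Y, [2, 3], 0, "CV2")
```

Under CV2 the validation lines keep their records for the non-target traits, which is the extra information a multi-trait model can exploit.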

Genetic architecture and genomic prediction accuracy of apple quantitative traits across environments

10.1101/2021.11.29.470309 ◽ 2021

Author(s): Michaela Jung ◽ Beat Keller ◽ Morgane Roth ◽ Maria Jose Aranzana ◽ Annemarie Auwerkerken ◽ ...

Keyword(s): Genomic Prediction ◽ Prediction Accuracy ◽ Genetic Architecture ◽ Quantitative Traits ◽ Prediction Models ◽ Phenotypic Variability ◽ Reference Population ◽ Genomic Study ◽ Genomic Tools ◽ Breeding Efficiency

Implementation of genomic tools is desirable to increase the efficiency of apple breeding. The apple reference population (apple REFPOP) proved useful for rediscovering loci, estimating genomic prediction accuracy, and studying genotype by environment interactions (GxE). Here we show contrasting genetic architecture and genomic prediction accuracies for 30 quantitative traits across up to six European locations using the apple REFPOP. A total of 59 stable and 277 location-specific associations were found using GWAS, 69.2% of which are novel when compared with 41 reviewed publications. Average genomic prediction accuracies of 0.18-0.88 were estimated using single-environment univariate, single-environment multivariate, multi-environment univariate, and multi-environment multivariate models. The GxE accounted for up to 24% of the phenotypic variability. This most comprehensive genomic study in apple in terms of trait-environment combinations provided knowledge of trait biology and prediction models that can be readily applied for marker-assisted or genomic selection, thus facilitating increased breeding efficiency.

Dissection of the impact of prioritized QTL-linked and -unlinked SNP markers on the accuracy of genomic selection

BMC Genomic Data ◽ 10.1186/s12863-021-00979-y ◽ 2021 ◽ Vol 22 (1)

Author(s): Ashley S. Ling ◽ El Hamidi Hay ◽ Samuel E. Aggrey ◽ Romdhane Rekaya

Keyword(s): Genomic Prediction ◽ Complex Traits ◽ Negative Impact ◽ Pearson Correlation ◽ Genetic Selection ◽ SNP Markers ◽ False Positives ◽ Population Statistic ◽ The Impact ◽ Genomic Predictions

Abstract: Background. Use of genomic information has resulted in an undeniable improvement in prediction accuracies and an increase in genetic gain in animal and plant genetic selection programs in spite of oversimplified assumptions about the true biological processes. Even for complex traits, a large portion of markers do not segregate with or effectively track genomic regions contributing to trait variation; yet it is not clear how genomic prediction accuracies are impacted by such potentially nonrelevant markers. In this study, a simulation was carried out to evaluate genomic predictions in the presence of markers unlinked with trait-relevant QTL. Further, we compared the ability of the population statistic FST and absolute estimated marker effect as preselection statistics to discriminate between linked and unlinked markers and the corresponding impact on accuracy. Results. We found that the accuracy of genomic predictions decreased as the proportion of unlinked markers used to calculate the genomic relationships increased. Using all, only linked, and only unlinked marker sets yielded prediction accuracies of 0.62, 0.89, and 0.22, respectively. Furthermore, it was found that prediction accuracies are severely impacted by unlinked markers with large spurious associations. FST-preselected marker sets of 10 k and larger yielded accuracies 8.97 to 17.91% higher than those achieved using preselection by absolute estimated marker effects, despite selecting 5.1 to 37.7% more unlinked markers and explaining 2.4 to 5.0% less of the genetic variance. This was attributed to false positives selected by absolute estimated marker effects having a larger spurious association with the trait of interest and more negative impact on predictions. The Pearson correlation between FST scores and absolute estimated marker effects was 0.77 and 0.27 among only linked and only unlinked markers, respectively. The sensitivity of FST scores to detect truly linked markers is comparable to absolute estimated marker effects but the consistency between the two statistics regarding false positives is weak. Conclusion. Identification and exclusion of markers that have little to no relevance to the trait of interest may significantly increase genomic prediction accuracies. The population statistic FST presents an efficient and effective tool for preselection of trait-relevant markers.
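To make the idea of FST-based marker preselection concrete, the sketch below computes a per-marker FST as (H_T - H_S) / H_T from 0/1/2 genotypes and keeps the top-ranked markers. This is only one common textbook formulation; the abstract does not state which FST estimator was used or how the subpopulations were defined, so the group labels, function names, and toy data here are assumptions.

```python
import numpy as np

def fst_per_marker(M, groups):
    """Per-marker FST = (H_T - H_S) / H_T from a 0/1/2 matrix and group labels."""
    groups = np.asarray(groups)
    p_total = M.mean(axis=0) / 2.0
    h_t = 2.0 * p_total * (1.0 - p_total)         # expected heterozygosity, pooled
    h_s = np.zeros(M.shape[1])
    for label in np.unique(groups):
        sub = M[groups == label]
        p = sub.mean(axis=0) / 2.0
        # Weight each group's heterozygosity by its share of the sample.
        h_s += (len(sub) / len(M)) * 2.0 * p * (1.0 - p)
    fst = np.zeros(M.shape[1])
    # Monomorphic markers (H_T == 0) are left at FST = 0.
    np.divide(h_t - h_s, h_t, out=fst, where=h_t > 0)
    return fst

def preselect(M, groups, n_keep):
    """Indices of the n_keep markers with the largest FST."""
    return np.argsort(fst_per_marker(M, groups))[::-1][:n_keep]

# Toy data: marker 0 is fixed for opposite alleles in the two groups,
# marker 1 has identical allele frequencies in both.
M = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]])
groups = [0, 0, 1, 1]
fst = fst_per_marker(M, groups)
kept = preselect(M, groups, 1)
```

In this toy case the first marker gets FST = 1 and the second FST = 0, so only the first is kept.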

CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction

10.1101/2020.11.11.376343 ◽ 2020

Author(s): Rafael Massahiro Yassue ◽ José Felipe Gonzaga Sabadin ◽ Giovanni Galli ◽ Filipe Couto Alves ◽ Roberto Fritsche-Neto

Keyword(s): Genomic Prediction ◽ Cross Validation ◽ Prediction Models ◽ Mean Squared Error ◽ Predictive Ability ◽ Proof Of Concept ◽ Squared Error ◽ High Effect ◽ The Mean ◽ Fold Cross Validation

Abstract: Usually, the comparison among genomic prediction models is based on validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. Nevertheless, the design of training and validation sets strongly affects how, and how subjectively, models are compared. The procedures cited above overlap across replicates, which can lead to overestimation and a lack of residual independence due to resampling, and therefore to less accurate results. Furthermore, post hoc tests following, for example, ANOVA are not recommended because the assumption of independent residuals is not met. Thus, we propose a new way to sample observations to build training and validation sets, based on a cross-validation alpha-based design (CV-α). The CV-α was meant to create several scenarios of validation (replicates x folds), regardless of the number of treatments. Using CV-α, the number of genotypes in the same fold across replicates was much lower than with K-fold, indicating higher residual independence. Therefore, based on the CV-α results, as proof of concept, via ANOVA, we could compare the proposed methodology to RRS and K-fold, applying four genomic prediction models to a simulated and a real dataset. Concerning the predictive ability and bias, all validation methods showed similar performance. However, regarding the mean squared error and coefficient of variation, the CV-α method presented the best performance under the evaluated scenarios. Moreover, as it adds neither cost nor complexity, it is more reliable and allows the use of non-subjective methods to compare models and factors. Therefore, CV-α can be considered a more precise validation methodology for model selection.
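The overlap the abstract refers to can be quantified by counting how often two genotypes land in the same fold across replicates. The sketch below does this for ordinary repeated random K-fold only; it does not construct the CV-α design, whose details are not given in the abstract. Function names and the toy sizes are hypothetical.

```python
import numpy as np

def kfold_assignments(n, k, reps, seed=0):
    """Fold label of each genotype for `reps` independent random K-fold splits."""
    rng = np.random.default_rng(seed)
    base = np.arange(n) % k                       # balanced fold labels
    return np.stack([rng.permutation(base) for _ in range(reps)])

def pair_cooccurrence(assign):
    """For every genotype pair, count the replicates in which both share a fold."""
    reps, n = assign.shape
    counts = np.zeros((n, n), dtype=int)
    for r in range(reps):
        same_fold = assign[r][:, None] == assign[r][None, :]
        counts += same_fold.astype(int)
    np.fill_diagonal(counts, 0)                   # ignore a genotype paired with itself
    return counts

# Toy setup: 20 genotypes, 5 folds, 4 replicates.
assign = kfold_assignments(n=20, k=5, reps=4)
co = pair_cooccurrence(assign)
max_shared = int(co.max())                        # worst-case pair overlap
```

A design that spreads genotypes more evenly across folds between replicates would show a lower `max_shared` and a narrower distribution of the off-diagonal counts.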

Genomic Prediction Using Canopy Coverage Image and Genotypic Information in Soybean via a Hybrid Model

Evolutionary Bioinformatics ◽ 10.1177/1176934319840026 ◽ 2019 ◽ Vol 15 ◽ pp. 117693431984002 ◽ Cited By ~ 2

Author(s): Reka Howard ◽ Diego Jarquin

Keyword(s): Molecular Marker ◽ Genomic Prediction ◽ Prediction Models ◽ Predictive Ability ◽ Image Data ◽ Nested Association Mapping ◽ Canopy Coverage ◽ Prediction Techniques ◽ And Training ◽ Marker Information

Prediction techniques are important in plant breeding as they provide a tool for selection that is more efficient and economical than traditional phenotypic and pedigree-based selection. Conventional genomic prediction models use molecular marker information to predict the phenotype. With the development of new phenomics techniques, we have the opportunity to collect image data on the plants and to extend the traditional genomic prediction models by incorporating a diverse set of information collected on the plants. In our research, we developed a hybrid matrix model that incorporates molecular marker and canopy coverage information as a weighted linear combination to predict grain yield for the soybean nested association mapping (SoyNAM) panel. To obtain the testing and training sets, we clustered the individuals based on their marker and canopy information using 2 different clustering techniques, and we compared 5 different cross-validation schemes. The results showed that the predictive ability of the models was highest when both the canopy and marker information were included, and lowest when only the canopy information was included.
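One way to read "a weighted linear combination" of marker and canopy information is as a convex combination of two similarity matrices, K = w * K_marker + (1 - w) * K_canopy. The sketch below assumes that reading; the way each matrix is built (a linear kernel on standardised features), the weight value, and all names and toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def linear_kernel(X):
    """Similarity matrix from column-standardised features (lines x features)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # assumes no constant columns
    return Z @ Z.T / X.shape[1]

def hybrid_kernel(K_marker, K_canopy, w):
    """Weighted combination of two kernels; w = 1 uses markers only, w = 0 canopy only."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * K_marker + (1.0 - w) * K_canopy

# Toy data: 30 lines, 100 markers coded 0/1/2, 8 canopy-coverage time points.
rng = np.random.default_rng(0)
markers = rng.integers(0, 3, size=(30, 100)).astype(float)
canopy = rng.normal(size=(30, 8))
K_m = linear_kernel(markers)
K_c = linear_kernel(canopy)
K_h = hybrid_kernel(K_m, K_c, w=0.7)              # 0.7 is an arbitrary example weight
```

The combined matrix can then take the place of the genomic relationship matrix in a GBLUP-type model, with `w` chosen by cross-validation.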

Genomic Selection in Winter Wheat Breeding Using a Recommender Approach

Genes ◽ 10.3390/genes11070779 ◽ 2020 ◽ Vol 11 (7) ◽ pp. 779

Author(s): Dennis N. Lozada ◽ Arron H. Carter

Keyword(s): Winter Wheat ◽ Genomic Selection ◽ Prediction Models ◽ Heading Date ◽ Predictive Ability ◽ Wheat Breeding ◽ SNP Markers ◽ Bayesian Regression ◽ Phenotypic Trait ◽ Breeding Programs

Achieving optimal predictive ability is key to increasing the relevance of implementing genomic selection (GS) approaches in plant breeding programs. The potential of an item-based collaborative filtering (IBCF) recommender system in the context of multi-trait, multi-environment GS has been explored. Different GS scenarios for IBCF were evaluated for a diverse population of winter wheat lines adapted to the Pacific Northwest region of the US. Predictions across years through cross-validations resulted in improved predictive ability when there was a high correlation between environments. Using multiple spectral traits collected from high-throughput phenotyping resulted in better GS accuracies for grain yield (GY) compared to using only single traits for predictions. Trait adjustments through various Bayesian regression models using genomic information from SNP markers were the most effective in achieving improved accuracies for GY, heading date, and plant height among the GS scenarios evaluated. Bayesian LASSO had the highest predictive ability among the models used for phenotypic trait adjustments. IBCF gave competitive accuracies compared to a genomic best linear unbiased predictor (GBLUP) model for predicting different traits. Overall, an IBCF approach could be used as an alternative to traditional prediction models for important target traits in wheat breeding programs.

Perspectives on Applications of Hierarchical Gene-To-Phenotype (G2P) Maps to Capture Non-stationary Effects of Alleles in Genomic Prediction

Frontiers in Plant Science ◽ 10.3389/fpls.2021.663565 ◽ 2021 ◽ Vol 12

Author(s): Owen M. Powell ◽ Kai P. Voss-Fels ◽ David R. Jordan ◽ Graeme Hammer ◽ Mark Cooper

Keyword(s): Plant Breeding ◽ Genomic Prediction ◽ Complex Traits ◽ Prediction Accuracy ◽ Predictive Ability ◽ Complex Trait ◽ Substitution Effects ◽ Term Prediction ◽ GxE Interactions

Genomic prediction of complex traits across environments, breeding cycles, and populations remains a challenge for plant breeding. A potential explanation for this is that underlying non-additive genetic (GxG) and genotype-by-environment (GxE) interactions generate allele substitution effects that are non-stationary across different contexts. Such non-stationary effects of alleles are either ignored or assumed to be implicitly captured by most gene-to-phenotype (G2P) maps used in genomic prediction. The implicit capture of non-stationary effects of alleles requires the G2P map to be re-estimated across different contexts. We discuss the development and application of hierarchical G2P maps that explicitly capture non-stationary effects of alleles and have successfully increased short-term prediction accuracy in plant breeding. These hierarchical G2P maps achieve increases in prediction accuracy by allowing intermediate processes such as other traits and environmental factors and their interactions to contribute to complex trait variation. However, long-term prediction remains a challenge. The plant breeding community should undertake complementary simulation and empirical experiments to interrogate various hierarchical G2P maps that connect GxG and GxE interactions simultaneously. The existing genetic correlation framework can be used to assess the magnitude of non-stationary effects of alleles and the predictive ability of these hierarchical G2P maps in long-term, multi-context genomic predictions of complex traits in plant breeding.

Utility of Climatic Information via Combining Ability Models to Improve Genomic Prediction for Yield Within the Genomes to Fields Maize Project

Frontiers in Genetics ◽ 10.3389/fgene.2020.592769 ◽ 2021 ◽ Vol 11

Author(s): Diego Jarquin ◽ Natalia de Leon ◽ Cinta Romay ◽ Martin Bohn ◽ Edward S. Buckler ◽ ...

Keyword(s): Genomic Prediction ◽ Combining Ability ◽ Prediction Models ◽ Predictive Ability ◽ Weather Data ◽ Genomic Relationship Matrix ◽ Relationship Matrix ◽ Environment Interaction ◽ Environmental Covariates ◽ Genotype By Environment

Genomic prediction provides an efficient alternative to conventional phenotypic selection for developing improved cultivars with desirable characteristics. New and improved methods for genomic prediction are continually being developed that attempt to deal with the integration of data types beyond genomic information. Modern automated weather systems offer the opportunity to capture continuous data on a range of environmental parameters at specific field locations. In principle, this information could characterize training and target environments and enhance predictive ability by incorporating weather characteristics as part of the genotype-by-environment (G×E) interaction component in prediction models. We assessed the usefulness of including weather data variables in genomic prediction models using a naïve environmental kinship model across 30 environments comprising the Genomes to Fields (G2F) initiative in 2014 and 2015. Specifically, four different prediction scenarios were evaluated: (i) tested genotypes in observed environments; (ii) untested genotypes in observed environments; (iii) tested genotypes in unobserved environments; and (iv) untested genotypes in unobserved environments. A set of 1,481 unique hybrids was evaluated for grain yield. Evaluations were conducted using five different models including the main effect of environments; general combining ability (GCA) effects of the maternal and paternal parents modeled using the genomic relationship matrix; specific combining ability (SCA) effects between maternal and paternal parents; interactions between genetic (GCA and SCA) effects and environmental effects; and finally interactions between the genetic effects and environmental covariates. Incorporation of the genotype-by-environment interaction term improved predictive ability across all scenarios. However, predictive ability was not improved through inclusion of naïve environmental covariates in G×E models. More research should be conducted to link the observed weather conditions with important physiological aspects in plant development to improve predictive ability through the inclusion of weather data.
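The four prediction scenarios listed above depend only on whether a validation record's genotype and environment also appear in the training data. A small sketch of that classification follows; the function name, the return labels, and the toy hybrid and environment names are hypothetical.

```python
def prediction_scenario(genotype, environment, train_genotypes, train_environments):
    """Label a validation record with scenario (i)-(iv) from the abstract."""
    tested = genotype in train_genotypes          # genotype seen in training data
    observed = environment in train_environments  # environment seen in training data
    if tested and observed:
        return "i"    # tested genotype, observed environment
    if not tested and observed:
        return "ii"   # untested genotype, observed environment
    if tested and not observed:
        return "iii"  # tested genotype, unobserved environment
    return "iv"       # untested genotype, unobserved environment

# Toy training sets (made-up identifiers).
train_genotypes = {"H1", "H2"}
train_environments = {"E1", "E2"}
labels = [
    prediction_scenario("H1", "E1", train_genotypes, train_environments),
    prediction_scenario("H9", "E1", train_genotypes, train_environments),
    prediction_scenario("H1", "E9", train_genotypes, train_environments),
    prediction_scenario("H9", "E9", train_genotypes, train_environments),
]
```

Grouping validation records by this label lets predictive ability be reported separately for each scenario.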

Genomic predictive ability for foliar nutritive traits in perennial ryegrass

10.1101/727958 ◽ 2019

Author(s): Sai Krishna Arojju ◽ Mingshu Cao ◽ M. Z. Zulfi Jahufer ◽ Brent A Barrett ◽ Marty J Faville

Keyword(s): Genomic Selection ◽ Genomic Prediction ◽ Nutritive Value ◽ Prediction Models ◽ Genotypic Variation ◽ Genetic Correlations ◽ Predictive Ability ◽ Water Soluble ◽ Training Set ◽ Sib Families

Abstract: Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P<0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h2g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross-validation, for all nutritive traits, with values ranging from r = 0.16 to 0.45 using phenotypes from across two environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45), and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000. The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.
