Training set design in genomic prediction with multiple biparental families

Neural network-based software sensor: training set design and application to a continuous pulp digester

Control Engineering Practice ◽

10.1016/j.conengprac.2004.02.013 ◽

2005 ◽

Vol 13 (2) ◽

pp. 135-143 ◽

Cited By ~ 27

Author(s):

Pascal Dufour ◽

Sharad Bhartiya ◽

Prasad S. Dhurjati ◽

Francis J. Doyle III

Keyword(s):

Neural Network ◽

Set Design ◽

Training Set ◽

Software Sensor ◽

Pulp Digester ◽

Design And Application

Genomic Prediction of Yield Traits in Single-Cross Hybrid Rice (Oryza sativa L.)

Frontiers in Genetics ◽

10.3389/fgene.2021.692870 ◽

2021 ◽

Vol 12 ◽

Author(s):

Marlee R. Labroo ◽

Jauhar Ali ◽

M. Umair Aslam ◽

Erik Jon de Asis ◽

Madonna A. dela Paz ◽

...

Keyword(s):

Genetic Distance ◽

Genomic Prediction ◽

Hybrid Rice ◽

Phenotypic Selection ◽

Production Costs ◽

Breeding Cycle ◽

Male Sterile ◽

Training Set ◽

Combining Abilities ◽

Genomic Predictions

Hybrid rice varieties can outyield the best inbred varieties by 15 – 30% with appropriate management. However, hybrid rice requires more inputs and management than inbred rice to realize a yield advantage in high-yielding environments. The development of stress-tolerant hybrid rice with lowered input requirements could increase hybrid rice yield relative to production costs. We used genomic prediction to evaluate the combining abilities of 564 stress-tolerant lines used to develop Green Super Rice with 13 male sterile lines of the International Rice Research Institute for yield-related traits. We also evaluated the performance of their F1 hybrids. We identified male sterile lines with good combining ability as well as F1 hybrids with potential further use in product development. For yield per plant, accuracies of genomic predictions of hybrid genetic values ranged from 0.490 to 0.822 in cross-validation if neither parent or up to both parents were included in the training set, and both general and specific combining abilities were modeled. The accuracy of phenotypic selection for hybrid yield per plant was 0.682. The accuracy of genomic predictions of male GCA for yield per plant was 0.241, while the accuracy of phenotypic selection was 0.562. At the observed accuracies, genomic prediction of hybrid genetic value could allow improved identification of high-performing single crosses. In a reciprocal recurrent genomic selection program with an accelerated breeding cycle, observed male GCA genomic prediction accuracies would lead to similar rates of genetic gain as phenotypic selection. It is likely that prediction accuracies of male GCA could be improved further by targeted expansion of the training set. Additionally, we tested the correlation of parental genetic distance with mid-parent heterosis in the phenotyped hybrids. We found the average mid-parent heterosis for yield per plant to be consistent with existing literature values at 32.0%. 
In the overall population of study, parental genetic distance was significantly negatively correlated with mid-parent heterosis for yield per plant (r = −0.131) and potential yield (r = −0.092), but within female families the correlations were non-significant and near zero. As such, positive parental genetic distance was not reliably associated with positive mid-parent heterosis.
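
For reference, mid-parent heterosis (MPH) is conventionally expressed as the percentage deviation of the hybrid from the mean of its two parents. The symbols below are illustrative assumptions and are not notation taken from the abstract.

```latex
\mathrm{MPH}\,(\%) =
  \frac{\bar{y}_{F_1} - \tfrac{1}{2}\left(\bar{y}_{P_1} + \bar{y}_{P_2}\right)}
       {\tfrac{1}{2}\left(\bar{y}_{P_1} + \bar{y}_{P_2}\right)} \times 100
```

As a purely hypothetical example, parents yielding 20 and 30 g per plant have a mid-parent value of 25 g, so a hybrid yielding 33 g per plant shows an MPH of (33 − 25) / 25 × 100 = 32%.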

Informed training set design enables efficient machine learning-assisted directed protein evolution

Cell Systems ◽

10.1016/j.cels.2021.07.008 ◽

2021 ◽

Cited By ~ 2

Author(s):

Bruce J. Wittmann ◽

Yisong Yue ◽

Frances H. Arnold

Keyword(s):

Machine Learning ◽

Protein Evolution ◽

Set Design ◽

Training Set ◽

Efficient Machine

Facing the Cover-Source Mismatch on JPHide using Training-Set Design

Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security - IH&MMSec '18 ◽

10.1145/3206004.3206021 ◽

2018 ◽

Cited By ~ 2

Author(s):

Dirk Borghys ◽

Patrick Bas ◽

Helena Bruyninckx

Keyword(s):

Set Design ◽

Training Set

Enviromic Assembly Increases Accuracy and Reduces Costs of the Genomic Prediction for Yield Plasticity in Maize

Frontiers in Plant Science ◽

10.3389/fpls.2021.717552 ◽

2021 ◽

Vol 12 ◽

Author(s):

Germano Costa-Neto ◽

Jose Crossa ◽

Roberto Fritsche-Neto

Keyword(s):

Phenotypic Variation ◽

Genomic Prediction ◽

Proof Of Concept ◽

Tropical Maize ◽

Training Set ◽

Early Screening ◽

Environmental Similarity ◽

Genetic And Environmental Factors ◽

Growing Conditions

Quantitative genetics states that phenotypic variation is a consequence of the interaction between genetic and environmental factors. Predictive breeding is based on this statement, and because of this, ways of modeling genetic effects are still evolving. At the same time, the same refinement must be used for processing environmental information. Here, we present an “enviromic assembly approach,” which includes using ecophysiology knowledge in shaping environmental relatedness into whole-genome predictions (GP) for plant breeding (referred to as enviromic-aided genomic prediction, E-GP). We propose that the quality of an environment is defined by the core of environmental typologies and their frequencies, which describe different zones of plant adaptation. From this, we derived markers of environmental similarity cost-effectively. Combined with the traditional additive and non-additive effects, this approach may better represent the putative phenotypic variation observed across diverse growing conditions (i.e., phenotypic plasticity). Then, we designed optimized multi-environment trials coupling genetic algorithms, enviromic assembly, and genomic kinships capable of providing in-silico realization of the genotype-environment combinations that must be phenotyped in the field. As proof of concept, we highlighted two E-GP applications: (1) managing the lack of phenotypic information in training accurate GP models across diverse environments and (2) guiding an early screening for yield plasticity exerting optimized phenotyping efforts. Our approach was tested using two tropical maize sets, two types of enviromics assembly, six experimental network sizes, and two types of optimized training set across environments. We observed that E-GP outperforms benchmark GP in all scenarios, especially when considering smaller training sets. The representativeness of genotype-environment combinations is more critical than the size of multi-environment trials (METs). 
The conventional genomic best linear unbiased prediction (GBLUP) is inefficient in predicting the quality of a yet-to-be-seen environment, while enviromic assembly enabled it by increasing the accuracy of yield plasticity predictions. Furthermore, we discussed theoretical backgrounds underlying how intrinsic envirotype-phenotype covariances within the phenotypic records can impact the accuracy of GP. The E-GP is an efficient approach to better use environmental databases to deliver climate-smart solutions, reduce field costs, and anticipate future scenarios.

Training Set Design for Test Removal Classification in IC Test

10.15760/etd.2028 ◽

2000 ◽

Author(s):

Nagarjun Hassan Ranganath

Keyword(s):

Set Design ◽

Design For Test ◽

Training Set ◽

IC Test

Genomic Prediction of Tropical Maize Resistance to Fall Armyworm and Weevils: Genomic Selection Should Focus on Effective Training Set Determination

10.20944/preprints202007.0336.v1 ◽

2020 ◽

Author(s):

Arfang Badji ◽

Lewis Machida ◽

Daniel Bomet Kwemoi ◽

Frank Kumi ◽

Dennis Okii ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Insect Pests ◽

Fall Armyworm ◽

Sub Saharan Africa ◽

Tropical Maize ◽

Training Set ◽

Maize Weevil ◽

Maize Resistance ◽

Sub Saharan

Genomic selection (GS) can accelerate variety release by shortening the variety development phase when factors that influence prediction accuracies (PA) of genomic prediction (GP) models, such as training set (TS) size and relationship with the breeding set (BS), are optimized beforehand. In this study, PAs for the resistance to fall armyworm (FAW) and maize weevil (MW) in a diverse tropical maize panel composed of 341 double haploid and inbred lines were estimated. Both phenotypic best linear unbiased predictors (BLUPs) and estimators (BLUEs) were predicted using 17 parametric, semi-parametric, and nonparametric algorithms with a 10-fold, 5-repetition cross-validation strategy. For both MW and FAW resistance datasets with an RBTS of 37%, PAs achieved with BLUPs were at least twice as high as those realized with BLUEs. The PAs achieved with BLUPs for the MW resistance traits grain weight loss (GWL), adult progeny emergence (AP), and number of affected kernels (AK) varied from 0.66 to 0.82. The PAs were also high for FAW resistance RBTS datasets, varying from 0.694 to 0.714 (for an RBTS of 37%) up to 0.843 to 0.844 (for an RBTS of 85%). The PAs for FAW resistance with PBTS were generally high, varying from 0.83 to 0.86, except for one dataset that had PAs ranging from 0.11 to 0.75. GP models showed generally similar predictive abilities for each trait, while the TS designation was determinant. There was a highly positive correlation (R=0.92***) between TS size and PAs for the RBTS approach while, for the PBTS, these parameters were highly negatively correlated (R=-0.44***), indicating the importance of the degree of kinship between the TS and the BS, with the smallest TS (31%) achieving the highest PAs (0.86). This study paves the way towards the use of GS for maize resistance to insect pests in sub-Saharan Africa.
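
The cross-validation strategy mentioned in this abstract (10 folds, 5 repetitions) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: ridge regression on marker codes stands in for one of the 17 algorithms, prediction accuracy is taken to be the Pearson correlation between predicted and observed values, and the marker matrix and phenotypes are simulated. All function and variable names are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Ridge regression on marker codes (assumed stand-in for a GP model)."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Xc = X_train - col_means                      # centre markers on the training set
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - mu))
    return mu + (X_test - col_means) @ beta

def repeated_kfold_accuracy(X, y, k=10, reps=5, lam=1.0, seed=0):
    """Return one prediction accuracy (Pearson r) per repetition of k-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accuracies = []
    for _ in range(reps):
        order = rng.permutation(n)                # new random fold assignment per repetition
        preds = np.empty(n)
        for test_idx in np.array_split(order, k):
            train_idx = np.setdiff1d(order, test_idx)
            preds[test_idx] = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accuracies.append(np.corrcoef(preds, y)[0, 1])
    return np.array(accuracies)

# Simulated data only: 341 lines (the panel size in the abstract), 200 hypothetical markers.
sim = np.random.default_rng(1)
X = sim.integers(0, 3, size=(341, 200)).astype(float)   # marker codes 0/1/2
marker_effects = sim.normal(0.0, 0.1, 200)
y = X @ marker_effects + sim.normal(0.0, 1.0, 341)

acc = repeated_kfold_accuracy(X, y, k=10, reps=5, lam=50.0)
print("mean accuracy over repetitions:", acc.mean())
```

Averaging over repetitions reduces the dependence of the estimate on any single random fold assignment; the ridge penalty `lam` here is an arbitrary illustrative value and would normally be tuned or derived from variance components.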

The effects of training population design on genomic prediction accuracy in wheat

10.1101/443267 ◽

2018 ◽

Cited By ~ 1

Author(s):

Stefan McKinnon Edwards ◽

Jaap B. Buntjer ◽

Robert Jackson ◽

Alison R. Bentley ◽

Jacob Lage ◽

...

Keyword(s):

Genomic Selection ◽

Genetic Gain ◽

Genomic Prediction ◽

Genetic Material ◽

Breeding Value ◽

Training Set ◽

Or Efficiency ◽

Genomic Breeding Value ◽

Close Relatives ◽

Training Sets

Genomic selection offers several routes for increasing genetic gain or efficiency of plant breeding programs. In various species of livestock there is empirical evidence of increased rates of genetic gain from the use of genomic selection to target different aspects of the breeder’s equation. Accurate predictions of genomic breeding value are central to this and the design of training sets is in turn central to achieving sufficient levels of accuracy. In summary, small numbers of close relatives and very large numbers of distant relatives are expected to enable accurate predictions. To quantify the effect of some of the properties of training sets on the accuracy of genomic selection in crops we performed an extensive field-based winter wheat trial. In summary, this trial involved the construction of 44 F2:4 bi- and triparental populations, from which 2992 lines were grown on four field locations and yield was measured. For each line, genotype data were generated for 25,000 segregating single nucleotide polymorphism markers. The overall heritability of yield was estimated at 0.65, and estimates within individual families ranged between 0.10 and 0.85. Within-cross genomic prediction accuracies of yield BLUEs were 0.125 – 0.127 using two different cross-validation approaches, and generally increased with training set size. Using related crosses in training and validation sets generally resulted in higher prediction accuracies than using unrelated crosses. The results of this study emphasize the importance of the training set design in relation to the genetic material to which the resulting prediction model is to be applied.

A Function Accounting for Training Set Size and Marker Density to Model the Average Accuracy of Genomic Prediction

PLoS ONE ◽

10.1371/journal.pone.0081046 ◽

2013 ◽

Vol 8 (12) ◽

pp. e81046 ◽

Cited By ~ 33

Author(s):

Malena Erbe ◽

Birgit Gredler ◽

Franz Reinhold Seefried ◽

Beat Bapst ◽

Henner Simianer

Keyword(s):

Genomic Prediction ◽

Marker Density ◽

Training Set ◽

Set Size ◽

Average Accuracy

Genomic predictive ability for foliar nutritive traits in perennial ryegrass

10.1101/727958 ◽

2019 ◽

Author(s):

Sai Krishna Arojju ◽

Mingshu Cao ◽

M. Z. Zulfi Jahufer ◽

Brent A Barrett ◽

Marty J Faville

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Nutritive Value ◽

Prediction Models ◽

Genotypic Variation ◽

Genetic Correlations ◽

Predictive Ability ◽

Water Soluble ◽

Training Set ◽

Sib Families

Forage nutritive value impacts animal nutrition, which underpins livestock productivity, reproduction and health. Genetic improvement for nutritive traits has been limited, as they are typically expensive and time-consuming to measure through conventional methods. Genomic selection is appropriate for such complex and expensive traits, enabling cost-effective prediction of breeding values using genome-wide markers. The aims of the present study were to assess the potential of genomic selection for a range of nutritive traits in a multi-population training set, and to quantify contributions of genotypic, environmental and genotype-by-environment (G × E) variance components to trait variation and heritability for nutritive traits. The training set consisted of a total of 517 half-sibling (half-sib) families, from five advanced breeding populations, evaluated in two distinct New Zealand grazing environments. Autumn-harvested samples were analyzed for 18 nutritive traits and maternal parents of the half-sib families were genotyped using genotyping-by-sequencing. Significant (P < 0.05) genotypic variation was detected for all nutritive traits and genomic heritability (h²g) was moderate to high (0.20 to 0.74). G × E interactions were significant and particularly large for water soluble carbohydrate (WSC), crude fat, phosphorus (P) and crude protein. GBLUP, KGD-GBLUP and BayesC genomic prediction models displayed similar predictive ability, estimated by 10-fold cross validation, for all nutritive traits with values ranging from r = 0.16 to 0.45 using phenotypes from across two environments. High predictive ability was observed for the mineral traits sulphur (0.44), sodium (0.45) and magnesium (0.45) and the lowest values were observed for P (0.16), digestibility (0.22) and high molecular weight WSC (0.23). Predictive ability estimates for most nutritive traits were retained when marker number was reduced from 1 million to as few as 50,000.
The moderate to high predictive abilities observed suggest that implementation of genomic selection is feasible for most of the nutritive traits examined. For traits with lower predictive ability, multi-trait genomic prediction approaches that exploit the strong genetic correlations observed amongst some nutritive traits may be useful. This appears to be particularly important for WSC, considered one of the primary constituents of nutritive value for forages.
