Evaluating Methods of Updating Training Data in Long-Term Genomewide Selection

Mapping Intimacies ◽

10.1101/087163 ◽

2016 ◽

Author(s):

Jeffrey L. Neyhart ◽

Tyler Tiede ◽

Aaron J. Lorenz ◽

Kevin P. Smith

Keyword(s):

Genetic Gain ◽

Prediction Accuracy ◽

Training Data ◽

Training Population ◽

Optimal Method ◽

Phenotypic Data ◽

Breeding Cycles ◽

Genomewide Selection ◽

The Impact

Abstract: Genomewide selection is hailed for its ability to facilitate greater genetic gains per unit time. Over breeding cycles, the requisite linkage disequilibrium (LD) between quantitative trait loci (QTL) and markers is expected to change as a result of recombination, selection, and drift, leading to a decay in prediction accuracy. Previous research has identified the need to update the training population using data that may capture new LD generated over breeding cycles; however, optimal methods of updating have not been explored. In a barley (Hordeum vulgare L.) breeding simulation experiment, we examined prediction accuracy and response to selection when updating the training population each cycle with the best predicted lines, the worst predicted lines, both the best and worst predicted lines, random lines, criterion-selected lines, or no lines. In the short term, we found that updating with the best predicted lines or the best and worst predicted lines resulted in high prediction accuracy and genetic gain, but in the long term, all methods (besides not updating) performed similarly. We also examined the impact of including all data in the training population or only the most recent data. Though patterns among update methods were similar, using a smaller but more recent training population provided a slight advantage in prediction accuracy and genetic gain. In an actual breeding program, a breeder might desire to gather phenotypic data on lines predicted to be the best, perhaps to evaluate possible cultivars. Therefore, our results suggest that an optimal method of updating the training population is also very practical.
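
The update methods compared in this abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function name `choose_update_lines`, the dictionary input, and the even split used for the best-and-worst method are assumptions made for illustration. The criterion-selected method is left out because the abstract does not say which criterion was used.

```python
import random


def choose_update_lines(predicted, method, n, seed=0):
    """Return the IDs of n candidate lines to add to the training population.

    predicted: dict mapping a line ID to its predicted genotypic value
               (hypothetical input format).
    method:    "best", "worst", "best_worst", "random", or "none".
    Assumes n does not exceed the number of candidate lines.
    """
    if n <= 0 or method == "none":
        return []
    # Rank candidates from highest to lowest predicted value.
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    if method == "best":
        return ranked[:n]
    if method == "worst":
        return ranked[-n:]
    if method == "best_worst":
        # Assumption: split n as evenly as possible between the two tails.
        half = n // 2
        return ranked[:half] + ranked[-(n - half):]
    if method == "random":
        return random.Random(seed).sample(ranked, n)
    raise ValueError(f"unknown update method: {method}")


if __name__ == "__main__":
    preds = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
    for m in ("best", "worst", "best_worst", "random", "none"):
        print(m, choose_update_lines(preds, m, 2))
```

In each simulated cycle, the chosen lines would be phenotyped and appended to the training data before the prediction model is refit, either alongside all earlier data or in place of the oldest records.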


New cycle, same old mistakes? Overlapping vs. discrete generations in long-term recurrent selection

10.1101/2021.10.12.464059 ◽

2021 ◽

Author(s):

Marlee R. Labroo ◽

Jessica E. Rutkoski

Keyword(s):

Genetic Gain ◽

Overlapping Generations ◽

Recurrent Selection ◽

Phenotypic Selection ◽

Breeding Value ◽

Breeding Values ◽

Truncation Selection ◽

Breeding Cycles ◽

Error Bias

Background: Recurrent selection is a foundational breeding method for quantitative trait improvement. It typically features rapid breeding cycles that can lead to high rates of genetic gain. In recurrent phenotypic selection, generations do not overlap, which means that breeding candidates are evaluated and considered for selection for only one cycle. With recurrent genomic selection, candidates can be evaluated based on genomic estimated breeding values indefinitely, therefore facilitating overlapping generations. Candidates with true high breeding values that were discarded in one cycle due to underestimation of breeding value could be identified and selected in subsequent cycles. The consequences of allowing generations to overlap in recurrent selection are unknown. We assessed whether maintaining overlapping and discrete generations led to differences in genetic gain for phenotypic, genomic truncation, and genomic optimum contribution recurrent selection by simulation of traits with various heritabilities and genetic architectures across fifty breeding cycles. We also assessed differences of overlapping and discrete generations in a conventional breeding scheme with multiple stages and cohorts. Results: With phenotypic selection, overlapping generations led to decreased genetic gain compared to discrete generations due to increased selection error bias. Selected individuals, which were in the upper tail of the distribution of phenotypic values, tended to also have high absolute error relative to their true breeding value compared to the overall population. Without repeated phenotyping, these erroneously outlying individuals were repeatedly selected across cycles, leading to decreased genetic gain. With genomic truncation selection, overlapping and discrete generations performed similarly as updating breeding values precluded repeatedly selecting individuals with inaccurately high estimates of breeding values in subsequent cycles. 
Overlapping generations did not outperform discrete generations in the presence of a positive genetic trend with genomic truncation selection, as past generations had lower mean genetic values than the current generation of selection candidates. With genomic optimum contribution selection, overlapping and discrete generations performed similarly, but overlapping generations slightly outperformed discrete generations in the long term if the targeted inbreeding rate was extremely low. Conclusions: Maintaining discrete generations in recurrent phenotypic mass selection leads to increased genetic gain, especially at low heritabilities, by preventing selection error bias. With genomic truncation selection and genomic optimum contribution selection, genetic gain does not differ between discrete and overlapping generations assuming non-genetic effects are not present. Overlapping generations may increase genetic gain in the long term with very low targeted rates of inbreeding in genomic optimum contribution selection.


Evaluating Methods of Updating Training Data in Long-Term Genomewide Selection

G3 Genes|Genome|Genetics ◽

10.1534/g3.117.040550 ◽

2017 ◽

Vol 7 (5) ◽

pp. 1499-1510 ◽

Cited By ~ 16

Author(s):

Jeffrey L. Neyhart ◽

Tyler Tiede ◽

Aaron J. Lorenz ◽

Kevin P. Smith

Keyword(s):

Training Data ◽

Genomewide Selection ◽

Evaluating Methods


Genomic prediction for malting quality traits in practical barley breeding programs

10.1101/2020.07.30.228007 ◽

2020 ◽

Cited By ~ 1

Author(s):

Pernille Sarup ◽

Vahid Edriss ◽

Nanna Hellum Kristensen ◽

Jens Due Jensen ◽

Jihad Orabi ◽

...

Keyword(s):

Genomic Prediction ◽

Prediction Accuracy ◽

Cross Validation ◽

Spring Barley ◽

Malting Quality ◽

Breeding Cycle ◽

Quality Traits ◽

Training Population ◽

Barley Breeding ◽

Breeding Cycles

Abstract: Genomic prediction can be advantageous in barley breeding for traits such as yield and malting quality to increase selection accuracy and minimize expensive phenotyping. In this paper, we investigate the possibilities of genomic selection for malting quality traits using a limited training population. The size of the training population is an important factor in determining the prediction accuracy of a trait. We investigated the potential for genomic prediction of malting quality within breeding cycles with leave-one-out (LOO) cross-validation, and across breeding cycles with leave-set-out (LSO) cross-validation. In addition, we investigated the effect of training population size on prediction accuracy by random two-, four-, and ten-fold cross-validation. The material used in this study was a population of 1329 spring barley lines from four breeding cycles. We found medium to high narrow-sense heritabilities of the malting traits (0.31 to 0.65). Accuracies of predicting breeding values from LOO tests ranged from 0.6 to 0.9, making it worth the effort to use genomic prediction within breeding cycles. Accuracies from LSO tests ranged from 0.39 to 0.70, showing that genomic prediction across breeding cycles was possible as well. Accuracy of prediction increased when the size of the training population increased. Therefore, prediction accuracy might be increased both within and across breeding cycles by increasing the size of the training population.
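
The difference between the two cross-validation schemes can be shown with a short Python sketch of how the train/test splits are built. It is an illustration only: the function names are hypothetical, and treating each held-out "set" as one whole breeding cycle is an assumption based on the abstract's description of LSO as validation across breeding cycles.

```python
def leave_one_out(n_lines):
    """Yield (train_idx, test_idx) pairs with a single line held out each time."""
    for i in range(n_lines):
        train = [j for j in range(n_lines) if j != i]
        yield train, [i]


def leave_set_out(cycle_labels):
    """Yield (held_out_cycle, train_idx, test_idx), holding out one cycle at a time.

    cycle_labels: one breeding-cycle label per line (hypothetical input format).
    """
    for cycle in sorted(set(cycle_labels)):
        test = [i for i, c in enumerate(cycle_labels) if c == cycle]
        train = [i for i, c in enumerate(cycle_labels) if c != cycle]
        yield cycle, train, test


if __name__ == "__main__":
    cycles = [1, 1, 2, 3, 3]  # toy data: five lines from three cycles
    for cycle, train, test in leave_set_out(cycles):
        print("held-out cycle", cycle, "train", train, "test", test)
```

Under LOO the model always sees relatives from the same cycle as the held-out line, whereas under LSO it must predict a cycle it has never seen, which is consistent with the lower LSO accuracies reported above.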


Test Oracle Generation Based on BPNN by Using the Values of Variables at Different Breakpoints for Programs

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500492 ◽

2021 ◽

Vol 31 (10) ◽

pp. 1469-1494

Author(s):

Chunyan Ma ◽

Shaoying Liu ◽

Jinglan Fu ◽

Tao Zhang

Keyword(s):

Prediction Accuracy ◽

Recall Rate ◽

Training Data ◽

New Method ◽

Test Case ◽

Test Cases ◽

Test Oracles ◽

Test Input ◽

Test Oracle ◽

The Impact

Automatic test oracle generation is a bottleneck in realizing full automation of the entire software testing process. This study proposes a new method for automatically generating a test oracle for a new test input on the basis of several historical test cases by using a backpropagation neural network (BPNN) model. The new method is different from existing test oracle techniques. Specifically, our method has two steps. First, the values of variables are collected as training data when several historical test inputs are used to execute the program at different breakpoints. The test oracles (pass or fail) of these test cases are utilized to classify and label the training data. Second, a new test input is used to execute the program at different breakpoints, where the trained BPNN prediction model automatically generates its test oracle on the basis of the collected values of the variables involved. We conduct an experiment to validate our method. In the experiment, 113 faulty versions of seven types of programs are used as experimental objects. Results show that the average prediction accuracy rate of 74,651 test oracles is 95.8%. Although the failed test cases in the training data account for less than 5%, the overall average recall rate (prediction accuracy of test case execution failure) of all programs is 78.9%. Furthermore, the trained BPNN can reveal not only the impact of the values of variables but also the impact of the logical correspondence between variables in test oracle generation.


Long-Term Glucose Forecasting Using a Physiological Model and Deconvolution of the Continuous Glucose Monitoring Signal

Sensors ◽

10.3390/s19194338 ◽

2019 ◽

Vol 19 (19) ◽

pp. 4338 ◽

Cited By ~ 2

Author(s):

Chengyuan Liu ◽

Josep Vehí ◽

Parizad Avari ◽

Monika Reddy ◽

Nick Oliver ◽

...

Keyword(s):

Blood Glucose ◽

Continuous Glucose Monitoring ◽

Latent Variable ◽

Prediction Accuracy ◽

Insulin Dose ◽

Artificial Pancreas ◽

Glucose Monitoring ◽

Physiological Model ◽

The Impact

(1) Objective: Blood glucose forecasting in type 1 diabetes (T1D) management is a maturing field with numerous algorithms being published and a few of them having reached the commercialisation stage. However, accurate long-term glucose predictions (e.g., >60 min), which are usually needed in applications such as precision insulin dosing (e.g., an artificial pancreas), still remain a challenge. In this paper, we present a novel glucose forecasting algorithm that is well-suited for long-term prediction horizons. The proposed algorithm is currently being used as the core component of a modular safety system for an insulin dose recommender developed within the EU-funded PEPPER (Patient Empowerment through Predictive PERsonalised decision support) project. (2) Methods: The proposed blood glucose forecasting algorithm is based on a compartmental composite model of glucose–insulin dynamics, which uses a deconvolution technique applied to the continuous glucose monitoring (CGM) signal for state estimation. In addition to commonly employed inputs by glucose forecasting methods (i.e., CGM data, insulin, carbohydrates), the proposed algorithm allows the optional input of meal absorption information to enhance prediction accuracy. Clinical data corresponding to 10 adult subjects with T1D were used for evaluation purposes. In addition, in silico data obtained with a modified version of the UVa-Padova simulator was used to further evaluate the impact of accounting for meal absorption information on prediction accuracy. Finally, a comparison with two well-established glucose forecasting algorithms, the autoregressive exogenous (ARX) model and the latent variable-based statistical (LVX) model, was carried out. (3) Results: For prediction horizons beyond 60 min, the performance of the proposed physiological model-based (PM) algorithm is superior to that of the LVX and ARX algorithms. 
When comparing the performance of PM against the second-ranked method (ARX) on a 120 min prediction horizon, the percentage improvement in prediction accuracy measured with the root mean square error, A-region of error grid analysis (EGA), and hypoglycaemia prediction calculated by the Matthews correlation coefficient was 18.8%, 17.9%, and 80.9%, respectively. Although showing a trend towards improvement, the addition of meal absorption information did not provide clinically significant improvements. (4) Conclusion: The proposed glucose forecasting algorithm is potentially well-suited for T1D management applications which require long-term glucose predictions.
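
Two of the metrics named in the results, the root mean square error and the Matthews correlation coefficient, have standard definitions that can be sketched in a few lines of Python. The helper `pct_improvement` is an assumption: the abstract does not state how the percentage improvement was computed, so a simple relative change against the baseline is used here for illustration.

```python
import math


def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def pct_improvement(new, baseline, lower_is_better=False):
    """Relative improvement of `new` over `baseline`, in percent (assumed formula)."""
    change = baseline - new if lower_is_better else new - baseline
    return 100.0 * change / abs(baseline)


if __name__ == "__main__":
    print(rmse([100, 110], [103, 114]))                       # error metric: lower is better
    print(mcc(tp=5, tn=5, fp=0, fn=0))                        # perfect classifier
    print(pct_improvement(8.0, 10.0, lower_is_better=True))   # RMSE-style comparison
```

For an error metric such as RMSE a lower value is better, so the sign of the change is flipped before the improvement is expressed as a percentage.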


Physics-Guided Machine Learning for Scientific Discovery: An Application in Simulating Lake Temperature Profiles

ACM/IMS Transactions on Data Science ◽

10.1145/3447814 ◽

2021 ◽

Vol 2 (3) ◽

pp. 1-26

Author(s):

Xiaowei Jia ◽

Jared Willard ◽

Anuj Karpatne ◽

Jordan S. Read ◽

Jacob A. Zwart ◽

...

Keyword(s):

Machine Learning ◽

Environmental Sustainability ◽

Prediction Accuracy ◽

State Of The Art ◽

Scientific Discovery ◽

Human Life ◽

Training Data ◽

Physical Processes ◽

Aquatic Resource ◽

The Impact

Physics-based models are often used to study engineering and environmental systems. The ability to model these systems is the key to achieving our future environmental sustainability and improving the quality of human life. This article focuses on simulating lake water temperature, which is critical for understanding the impact of changing climate on aquatic ecosystems and assisting in aquatic resource management decisions. The General Lake Model (GLM) is a state-of-the-art physics-based model used for addressing such problems. However, like other physics-based models used for studying scientific and engineering systems, it has several well-known limitations due to simplified representations of the physical processes being modeled or challenges in selecting appropriate parameters. While state-of-the-art machine learning models can sometimes outperform physics-based models given an ample amount of training data, they can produce results that are physically inconsistent. This article proposes a physics-guided recurrent neural network model (PGRNN) that combines RNNs and physics-based models to leverage their complementary strengths and improve the modeling of physical processes. Specifically, we show that a PGRNN can improve prediction accuracy over that of physics-based models (by over 20% even with very little training data), while generating outputs consistent with physical laws. An important aspect of our PGRNN approach lies in its ability to incorporate the knowledge encoded in physics-based models. This allows training the PGRNN model using very few true observed data while also ensuring high prediction accuracy. Although we present and evaluate this methodology in the context of modeling the dynamics of temperature in lakes, it is applicable more widely to a range of scientific and engineering disciplines where physics-based (also known as mechanistic) models are used.


Suppressed, but Not Forgotten

Swiss Journal of Psychology ◽

10.1024/1421-0185/a000033 ◽

2011 ◽

Vol 70 (1) ◽

pp. 5-11 ◽

Cited By ~ 8

Author(s):

Beat Meier ◽

Anja König ◽

Samuel Parak ◽

Katharina Henke

Keyword(s):

Memory Trace ◽

Thought Suppression ◽

Test Phase ◽

Indirect Test ◽

Final Test ◽

Test Experiment ◽

And Control ◽

Cue Words ◽

The Impact

This study investigates the impact of thought suppression over a 1-week interval. In two experiments with 80 university students each, we used the think/no-think paradigm in which participants initially learn a list of word pairs (cue-target associations). They were then presented with some of the cue words again and asked either to respond with the target word or to avoid thinking about it. In the final test phase, their memory for the initially learned cue-target pairs was tested. In Experiment 1, type of memory test was manipulated (i.e., direct vs. indirect). In Experiment 2, type of no-think instructions was manipulated (i.e., suppress vs. substitute). Overall, our results showed poorer memory for no-think and control items compared to think items across all experiments and conditions. Critically, however, more no-think than control items were remembered after the 1-week interval in the direct, but not in the indirect test (Experiment 1) and with thought suppression, but not thought substitution instructions (Experiment 2). We suggest that during thought suppression a brief reactivation of the learned association may lead to reconsolidation of the memory trace and hence to better retrieval of suppressed than control items in the long term.
