Metafounders are Fst fixation indices and reduce bias in single step genomic evaluations

Mapping Intimacies ◽

10.1101/083675 ◽

2016 ◽

Author(s):

Carolina Andrea Garcia-Baccino ◽

Andres Legarra ◽

Ole F Christensen ◽

Ignacy Misztal ◽

Ivan Pocrnic ◽

...

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Method Of Moments ◽

Generalized Least Squares ◽

Single Step ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genetic Trend ◽

Allelic Frequencies ◽

True Value

Abstract Background Metafounders are pseudo-individuals that condense the genetic heterozygosity and relationships within and across base pedigree populations, i.e. ancestral populations. This work addresses estimation and usefulness of metafounder relationships in Single Step GBLUP. Results We show that the ancestral relationship parameters are proportional to standardized covariances of base allelic frequencies across populations, like Fst fixation indexes. These covariances of base allelic frequencies can be estimated from pedigree and from marker genotypes of related recent individuals. Simple methods for estimation include naïve computation of allele frequencies from marker genotypes or a method of moments equating average pedigree-based and marker-based relationships. Complex methods include generalized least squares or maximum likelihood based on pedigree relationships. To our knowledge, methods to infer Fst coefficients and Fst differentiation have not been developed for related populations. A compatible genomic relationship matrix constructed as a crossproduct of {−1,0,1} codes, and equivalent (up to scale factors) to an identity by state relationship matrix at the markers, is derived. Using a simulation with a single population under selection, in which only males and youngest animals were genotyped, we observed that generalized least squares or maximum likelihood gave accurate and unbiased estimates of the ancestral relationship parameter (true value: 0.40) whereas the other two (naïve and method of moments) were biased (estimates of 0.43 and 0.35). We also observed that genomic evaluation by Single Step GBLUP using metafounders was less biased for genetic trend (bias of 0.01 instead of 0.12), slightly overdispersed (0.94 instead of 0.99) and as accurate (0.74) as the regular Single Step GBLUP. 
Single Step GBLUP using metafounders also provided consistent estimates of heritability. Conclusions Estimation of metafounder relationships can be achieved using BLUP-like methods with pedigree and markers. Inclusion of metafounder relationships reduces bias of genomic predictions with no loss in accuracy.
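
The two simplest quantities named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The abstract gives only the {−1,0,1} coding and says the ancestral relationship is "proportional to" the variance of base allele frequencies. The divisor m/2 (m markers) and the proportionality constant 8 used below are assumptions, and the function names `metafounder_g` and `naive_gamma` are hypothetical.

```python
import numpy as np

def metafounder_g(M):
    """Genomic relationship matrix as a crossproduct of {-1, 0, 1} codes.

    M holds genotypes coded 0/1/2 (individuals x markers).
    The divisor m/2 is an assumed scale factor, not given in the abstract.
    """
    M = np.asarray(M, dtype=float)
    Z = M - 1.0                      # recode 0/1/2 as -1/0/1
    m = M.shape[1]                   # number of markers
    return Z @ Z.T / (m / 2.0)

def naive_gamma(M):
    """Naive estimate for a single population: allele frequencies are
    computed directly from the genotypes, and their variance across
    markers is scaled by 8 (assumed proportionality constant)."""
    p = np.asarray(M, dtype=float).mean(axis=0) / 2.0
    return 8.0 * np.var(p)

# Toy genotypes: 3 individuals, 4 markers
M = np.array([[0, 2, 0, 2],
              [1, 1, 1, 1],
              [2, 2, 0, 0]])
G = metafounder_g(M)
gamma_hat = naive_gamma(M)
```

Under this scaling an individual homozygous at every marker has a diagonal element of 2 and an individual heterozygous at every marker has a diagonal element of 0, which is the identity-by-state behaviour the abstract alludes to.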

Application of single-step GBLUP in New Zealand Romney sheep

Animal Production Science ◽

10.1071/an19315 ◽

2020 ◽

Vol 60 (9) ◽

pp. 1136

Author(s):

M. A. Nilforooshan

Keyword(s):

New Zealand ◽

Single Step ◽

Pedigree Information ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Genetic Trend ◽

Weaning Weight ◽

Sheep Population ◽

Pedigree Relationship

Context In New Zealand, Romney is the predominant breed and is reared as a dual-purpose sheep. The number of genotypes is rapidly increasing in the sheep population, and making use of both genotypes and pedigree information is important for genetic evaluations. Single-step genomic best linear unbiased prediction (ssGBLUP) is a method for simultaneous prediction of genetic merits for genotyped and non-genotyped animals. The combination and the compatibility of the genomic relationship matrix (G) and the pedigree relationship matrix for genotyped animals (A22) are important for unbiased ssGBLUP. Aims The aim of the present study was to find an optimum genetic relationship matrix for ssGBLUP weaning-weight evaluation of Romney sheep in New Zealand. Methods Data consisted of adjusted weaning weights for 2,422,011 sheep, 50K single-nucleotide polymorphism genotypes for 13,304 animals and 3,028,688 animals in the pedigree. Blending of G and A22 was tested with weights (k) ranging from 0.2 to 0.99 (kG + (1 – k)A22), followed by either no tuning or one of three methods of tuning G to A22. Key results The averages of G and A22 were close to each other for overall, diagonal and off-diagonal elements. Therefore, differently tuned G performed similarly. However, elements of G showed larger variation than did the elements of A22 and, on average, genotyped animals were less related in G than in A22. Correlations between genomic estimated breeding values (GEBV) for the top 500 genotyped animals, as well as the rank correlations, were almost 1 among ssGBLUP evaluations using tuned G. The corresponding correlations with BLUP evaluations were increased by blending G with a larger proportion of A22, and were further increased by tuning G, indicating improved compatibility between G and A22. Blending and tuning G suppressed the inflation and bias of GEBV and moved the genetic trend closer to that obtained from BLUP. 
Conclusions A combination of blending and tuning G to A22, with a blending rate of 0.5 at most, is recommended for weaning weight of Romney sheep in New Zealand. Failure to do that resulted in inflated GEBV that can reduce the accuracy of selection, especially for genotyped animals. Implications There is a growing interest in the single-step GBLUP method for simultaneous genetic evaluation of genotyped and non-genotyped animals, in which genomic and pedigree relationship matrices are admixed. Using data from New Zealand Romney sheep, we have shown that adjustment of the genomic relationship matrix on the basis of the pedigree relationship matrix is necessary to avoid inflated evaluations. Improving the compatibility between genomic and pedigree relationship matrices is important for obtaining accurate and unbiased single-step GBLUP evaluations.
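
The blending step described in the Methods, kG + (1 – k)A22, can be sketched as follows. This is an illustrative sketch only: the function name `blend` and the toy matrices are hypothetical, and the three tuning methods are not specified in the abstract, so they are not reproduced here.

```python
import numpy as np

def blend(G, A22, k):
    """Return k*G + (1 - k)*A22 for a blending weight k in [0, 1]."""
    G = np.asarray(G, dtype=float)
    A22 = np.asarray(A22, dtype=float)
    if G.shape != A22.shape:
        raise ValueError("G and A22 must have the same shape")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie between 0 and 1")
    return k * G + (1.0 - k) * A22

# Toy 2 x 2 matrices (made-up values, for illustration only)
G = np.array([[1.1, 0.4],
              [0.4, 0.9]])
A22 = np.array([[1.0, 0.5],
                [0.5, 1.0]])
G_blended = blend(G, A22, k=0.5)   # the largest blending rate recommended above
```

With k = 0.5 each element of the blended matrix is the simple average of the corresponding elements of G and A22.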

Estimating an Autoregressive Current Effects Model of Sales Response when Observations are Aggregated over Time: Least Squares versus Maximum Likelihood

Journal of Marketing Research ◽

10.1177/002224378802500308 ◽

1988 ◽

Vol 25 (3) ◽

pp. 301-307

Author(s):

Wilfried R. Vanhonacker

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Serial Correlation ◽

Mean Squared Error ◽

Generalized Least Squares ◽

Parameter Estimates ◽

Squared Error ◽

Positive Serial Correlation ◽

Serial Correlation Coefficient ◽

Over Time

Estimating autoregressive current effects models is not straightforward when observations are aggregated over time. The author evaluates a familiar iterative generalized least squares (IGLS) approach and contrasts it to a maximum likelihood (ML) approach. Analytic and numerical results suggest that (1) IGLS and ML provide good estimates for the response parameters in instances of positive serial correlation, (2) ML provides superior (in mean squared error) estimates for the serial correlation coefficient, and (3) IGLS might have difficulty in deriving parameter estimates in instances of negative serial correlation.

Generalized least squares and maximum likelihood estimations of multivariate polychoric correlations

Acta Mathematicae Applicatae Sinica English Series ◽

10.1007/bf02008373 ◽

1987 ◽

Vol 3 (4) ◽

pp. 351-357 ◽

Cited By ~ 1

Author(s):

Xiqin Li ◽

Weixian Pan

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Generalized Least Squares ◽

Polychoric Correlations ◽

Maximum Likelihood Estimations

Comparative Analysis of Some Structural Equation Model Estimation Methods with Application to Coronary Heart Disease Risk

Journal of Probability and Statistics ◽

10.1155/2020/4181426 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

David Adedia ◽

Atinuke O. Adebanji ◽

Simon Kojo Appiah

Keyword(s):

Coronary Heart Disease ◽

Maximum Likelihood ◽

Least Squares ◽

Asymptotic Distribution ◽

Structural Equation ◽

Generalized Least Squares ◽

Least Squares Estimator ◽

Coronary Risk ◽

Distribution Free ◽

Better Than

This study compared a ridge maximum likelihood estimator to Yuan and Chan (2008) ridge maximum likelihood, maximum likelihood, unweighted least squares, generalized least squares, and asymptotic distribution-free estimators in fitting six models that show relationships in some noncommunicable diseases. Uncontrolled hypertension has been shown to be a leading cause of coronary heart disease, kidney dysfunction, and other negative health outcomes. It poses equal danger when asymptomatic and undetected. Research has also shown that it tends to coexist with diabetes mellitus (DM), with the presence of DM doubling the risk of hypertension. The study assessed the effect of obesity, type II diabetes, and hypertension on coronary risk and also the existence of converse relationship with structural equation modelling (SEM). The results showed that the two ridge estimators did better than other estimators. Nonconvergence occurred for most of the models for asymptotic distribution-free estimator and unweighted least squares estimator whilst generalized least squares estimator had one nonconvergence of results. Other estimators provided competing outputs, but unweighted least squares estimator reported unreliable parameter estimates such as large chi-square test statistic and root mean square error of approximation for Model 3. The maximum likelihood family of estimators did better than others like asymptotic distribution-free estimator in terms of overall model fit and parameter estimation. Also, the study found that increase in obesity could result in a significant increase in both hypertension and coronary risk. Diastolic blood pressure and diabetes have significant converse effects on each other. This implies those who are hypertensive can develop diabetes and vice versa.

Comparison of Maximum Likelihood, Generalized Least Squares, Ordinary Least Squares, and Asymptotically Distribution Free Parameter Estimates in Drug Abuse Latent Variable Causal Models

Journal of Drug Education ◽

10.2190/bjf9-xcv5-ewnn-pbgy ◽

1983 ◽

Vol 13 (4) ◽

pp. 387-404 ◽

Cited By ~ 11

Author(s):

G. J. Huba ◽

L. L. Harlow

Keyword(s):

Drug Abuse ◽

Maximum Likelihood ◽

Least Squares ◽

Latent Variable ◽

Generalized Least Squares ◽

Causal Models ◽

Multivariate Normal ◽

Distribution Free ◽

Asymptotically Distribution Free ◽

Normally Distributed

Latent variable causal modeling techniques are sometimes criticized when applied to drug abuse data because the commonly-employed maximum likelihood parameter estimation method requires that the data be normally distributed for the statistical tests to be accurate. In this article, four estimators for the parameters in two large latent variable causal models are compared in real drug abuse datasets. One estimator does not require that the data be multivariate normal and does, in fact, correct for data non-normality. Specifically, maximum likelihood and generalized least squares estimators for normally-distributed variables are compared with Browne's asymptotically distribution free techniques for continuous non-normally distributed data. Additionally, ordinary (unweighted) least squares estimates are used. Descriptions of the techniques are given and actual results in two “real” datasets are provided. It is concluded that the distribution free technique provides results which are generally comparable to those obtained with maximum likelihood estimation for datasets which depart in typical ways from the ideal of the multivariate normal distribution.

MAXIMUM LIKELIHOOD ESTIMATORS IN THE MULTIVARIATE AUTOREGRESSIVE MOVING-AVERAGE MODEL FROM A GENERALIZED LEAST SQUARES VIEWPOINT

Journal of Time Series Analysis ◽

10.1111/j.1467-9892.1992.tb00099.x ◽

1992 ◽

Vol 13 (2) ◽

pp. 133-145 ◽

Cited By ~ 19

Author(s):

Gregory C. Reinsel ◽

Sabyasachi Basu ◽

Sook Fwe Yap

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Moving Average ◽

Maximum Likelihood Estimators ◽

Generalized Least Squares ◽

Autoregressive Moving Average ◽

Average Model ◽

Autoregressive Moving Average Model ◽

Moving Average Model ◽

Multivariate Autoregressive

Maximum likelihood and generalized least squares analyses of two-level structural equation models

Statistics & Probability Letters ◽

10.1016/0167-7152(92)90206-k ◽

1992 ◽

Vol 14 (1) ◽

pp. 25-30 ◽

Cited By ~ 7

Author(s):

Wai-Yin Poon ◽

Sik-Yum Lee

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Structural Equation ◽

Structural Equation Models ◽

Generalized Least Squares

A Comparison between Maximum Likelihood and Generalized Least Squares in a Heteroscedastic Linear Model

Journal of the American Statistical Association ◽

10.1080/01621459.1982.10477901 ◽

1982 ◽

Vol 77 (380) ◽

pp. 878-882 ◽

Cited By ~ 57

Author(s):

R. J. Carroll ◽

David Ruppert

Keyword(s):

Maximum Likelihood ◽

Least Squares ◽

Linear Model ◽

Generalized Least Squares

Level-biases in estimated breeding values due to the use of different SNP panels over time in ssGBLUP

Genetics Selection Evolution ◽

10.1186/s12711-019-0517-z ◽

2019 ◽

Vol 51 (1) ◽

Cited By ~ 1

Author(s):

Øyvind Nordbø ◽

Arne B. Gjuvsland ◽

Leiv Sigbjørn Eikje ◽

Theo Meuwissen

Keyword(s):

Value Added ◽

Single Step ◽

Fine Tuning ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Optimal Selection ◽

Breeding Values ◽

Estimated Breeding Values ◽

Snp Panels ◽

Genomic Predictions

Abstract Background The main aim of single-step genomic predictions was to facilitate optimal selection in populations consisting of both genotyped and non-genotyped individuals. However, in spite of intensive research, biases still occur, which make it difficult to perform optimal selection across groups of animals. The objective of this study was to investigate whether incomplete genotype datasets with errors could be a potential source of level-bias between genotyped and non-genotyped animals and between animals genotyped on different single nucleotide polymorphism (SNP) panels in single-step genomic predictions. Results Incomplete and erroneous genotypes of young animals caused biases in breeding values between groups of animals. Systematic noise or missing data for less than 1% of the SNPs in the genotype data had substantial effects on the differences in breeding values between genotyped and non-genotyped animals, and between animals genotyped on different chips. The breeding values of young genotyped individuals were biased upward, and the magnitude was up to 0.8 genetic standard deviations, compared with breeding values of non-genotyped individuals. Similarly, the magnitude of a small value added to the diagonal of the genomic relationship matrix affected the level of average breeding values between groups of genotyped and non-genotyped animals. Cross-validation accuracies and regression coefficients were not sensitive to these factors. Conclusions Because, historically, different SNP chips have been used for genotyping different parts of a population, fine-tuning of imputation within and across SNP chips and handling of missing genotypes are crucial for reducing bias. Although all the SNPs used for estimating breeding values are present on the chip used for genotyping young animals, incompleteness and some genotype errors might lead to level-biases in breeding values.
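
The "small value added to the diagonal of the genomic relationship matrix" can be illustrated with a toy example. This is a sketch under stated assumptions: the abstract does not say how G was built or which value was added, so the VanRaden-style centering by observed allele frequencies, the value 0.01 and the function name `add_to_diagonal` are all hypothetical choices.

```python
import numpy as np

def add_to_diagonal(G, eps):
    """Return G + eps * I, i.e. G with a small constant added to its diagonal."""
    G = np.asarray(G, dtype=float)
    return G + eps * np.eye(G.shape[0])

# Toy genotypes coded 0/1/2: 3 animals, 4 SNPs (made-up values)
M = np.array([[0, 1, 2, 1],
              [2, 1, 0, 1],
              [1, 2, 1, 0]], dtype=float)
p = M.mean(axis=0) / 2.0                       # observed allele frequencies
Z = M - 2.0 * p                                # centered genotypes (assumed construction)
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))    # assumed VanRaden-style scaling

G_eps = add_to_diagonal(G, 0.01)               # 0.01 is an arbitrary illustrative value
```

Because the genotypes are centered by the observed frequencies, G built this way is singular, and adding a positive constant to the diagonal makes it invertible. The same constant also raises every diagonal element, which is one way such a choice could shift the level of the resulting evaluations.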

335 Genomic predictions with a multi-breed genomic relationship matrix

Journal of Animal Science ◽

10.1093/jas/skz258.099 ◽

2019 ◽

Vol 97 (Supplement_3) ◽

pp. 49-50

Author(s):

Yvette Steyn ◽

Daniela Lourenco ◽

Ignacy Misztal

Keyword(s):

Prediction Accuracy ◽

Negative Impact ◽

Reference Population ◽

Single Step ◽

Genomic Relationship Matrix ◽

Relationship Matrix ◽

Genomic Relationship ◽

Effective Population ◽

Specific Allele ◽

Missing Genotypes

Abstract Multi-breed evaluations have the advantage of increasing the size of the reference population for genomic evaluations and are quite simple; however, combining breeds usually has a negative impact on prediction accuracy. The aim of this study was to evaluate the use of a multi-breed genomic relationship matrix (G), where SNP for each breed are non-shared. The multi-breed G is set assuming known genotypes for one breed and missing genotypes for the remaining breeds. This setup may avoid spurious IBS relationships between breeds and considers breed-specific allele frequencies. This scenario was contrasted to multi-breed evaluations where all SNP are shared, i.e., the same SNP, and to single-breed evaluations. Different SNP densities, namely 9k and 45k, and different effective population sizes (Ne) were tested. Five breeds mimicking recent beef cattle populations that diverged from the same historical population were simulated using different selection criteria. It was assumed that QTL effects were the same over all breeds. For the recent population, generations 1 to 9 had approximately half of the animals genotyped, whereas all 1200 animals were genotyped in generation 10. Genotyped animals in generation 10 were set as validation; therefore, each breed had a validation set. Analyses were performed using single-step GBLUP (ssGBLUP). Prediction accuracy was calculated as the correlation between true (T) and genomic estimated (GE) breeding values (BV). Accuracies of GEBV were lower for the larger Ne and low SNP density. All three scenarios using 45K resulted in similar accuracies, suggesting that the marker density is high enough to account for relationships and linkage disequilibrium with QTL. A shared multi-breed evaluation using 9K resulted in a decrease of accuracy of 0.08 for a smaller Ne and 0.11 for a larger Ne. This loss was mostly avoided when markers were treated as non-shared within the same genomic relationship matrix.
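
One way to picture the non-shared SNP layout is sketched below for two breeds. This is an illustration under assumptions, not the authors' code. Each breed gets its own copy of the SNP columns, centered by that breed's allele frequencies, and the entries for the other breed (the "missing genotypes") are set to zero after centering. That treatment of missing genotypes, the VanRaden-style scaling and the function names are all hypothetical.

```python
import numpy as np

def center_by_breed(M):
    """Center 0/1/2 genotypes by twice the breed-specific allele frequency."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0
    return M - 2.0 * p, p

def nonshared_multibreed_g(M_a, M_b):
    """Two-breed G in which the SNP of each breed occupy separate columns."""
    Z_a, p_a = center_by_breed(M_a)
    Z_b, p_b = center_by_breed(M_b)
    n_a, m_a = Z_a.shape
    n_b, m_b = Z_b.shape
    # Missing genotypes for the other breed are taken as zero after
    # centering (an assumption made for this sketch).
    Z = np.block([[Z_a, np.zeros((n_a, m_b))],
                  [np.zeros((n_b, m_a)), Z_b]])
    scale = 2.0 * (np.sum(p_a * (1.0 - p_a)) + np.sum(p_b * (1.0 - p_b)))
    return Z @ Z.T / scale

# Toy genotypes: 3 animals of breed A and 2 of breed B at the same 2 SNP
M_a = np.array([[0, 1], [2, 1], [1, 2]])
M_b = np.array([[2, 0], [0, 2]])
G_multi = nonshared_multibreed_g(M_a, M_b)
```

Under these assumptions the between-breed block of the matrix is exactly zero, which is the sense in which such a layout would avoid spurious IBS relationships between breeds.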
