Coheritability and Coenvironmentability as Concepts for Partitioning the Phenotypic Correlation

Mapping Intimacies ◽

10.1101/598623 ◽

2019 ◽

Author(s):

Jorge Vasquez-Kool

Keyword(s):

Genetic Parameter ◽

Three Dimensional ◽

Large Data ◽

Phenotypic Correlation ◽

Shared Environment ◽

Parameter Estimates ◽

Extensive Literature ◽

Data Set ◽

Phenotypic Characters ◽

Relative Contribution

Abstract Central to the study of joint inheritance of quantitative traits is determining the degree of association between two phenotypic characters and quantifying the relative contribution of the shared genetic and environmental components that influence that relationship. One approach builds on classical quantitative genetics theory, in which the phenotypic correlation between two traits is modelled as the sum of a genetic component, the coheritability (hx,y), which reflects the degree of shared genetics influencing the phenotypic correlation, and an environmental component, the coenvironmentability (ex,y), which accounts for all other factors that influence the observed trait-trait association. Here a mathematical and statistical framework is presented for partitioning the phenotypic correlation into these components. I describe visualization tools to analyze hx,y and ex,y concurrently, in the form of a three-dimensional plot (3DHER-plane) and a two-dimensional plot (2DHER-field). A large data set of genetic parameter estimates (heritabilities, genetic and phenotypic correlations) was compiled from an extensive literature review, from which coheritability and coenvironmentability were derived in order to observe patterns of distribution and tendency. Illustrative examples from a diverse set of published studies show the value of applying this partition to generate hypotheses about the differential contribution of shared genetics and shared environment to an observed phenotypic relationship between traits.
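
The partition described in this abstract can be sketched numerically. The snippet below is a minimal illustration, assuming the classical decomposition r_P = h_x h_y r_G + e_x e_y r_E, so that the coheritability is taken as h_x h_y r_G and the coenvironmentability as the remainder r_P minus the coheritability. The function name and the input values are hypothetical and do not come from the paper.

```python
import math


def partition_phenotypic_correlation(h2_x, h2_y, r_g, r_p):
    """Split a phenotypic correlation into two additive parts.

    h2_x, h2_y : narrow-sense heritabilities of traits x and y (0..1)
    r_g        : genetic correlation between the traits
    r_p        : phenotypic correlation between the traits

    Assumes the classical model r_p = h_x * h_y * r_g + e_x * e_y * r_e.
    Returns (coheritability, coenvironmentability).
    """
    if not (0.0 <= h2_x <= 1.0 and 0.0 <= h2_y <= 1.0):
        raise ValueError("heritabilities must lie in [0, 1]")
    # h_x and h_y are the square roots of the heritabilities.
    coheritability = math.sqrt(h2_x) * math.sqrt(h2_y) * r_g
    # Everything not attributed to shared genetics is the remainder.
    coenvironmentability = r_p - coheritability
    return coheritability, coenvironmentability


# Hypothetical inputs chosen only for illustration.
h_xy, e_xy = partition_phenotypic_correlation(0.36, 0.25, r_g=0.5, r_p=0.4)
print(f"coheritability = {h_xy:.3f}, coenvironmentability = {e_xy:.3f}")
```

Because the two parts are defined to sum to r_P, a small coheritability next to a large coenvironmentability would suggest, under these assumptions, that the observed association owes more to non-genetic factors.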

Realized Sampling Variances of Estimates of Genetic Parameters and the Difference Between Genetic and Phenotypic Correlations

Genetics ◽

10.1093/genetics/143.3.1409 ◽

1996 ◽

Vol 143 (3) ◽

pp. 1409-1416 ◽

Cited By ~ 1

Author(s):

Kenneth R Koots ◽

John P Gibson

Keyword(s):

Genetic Correlation ◽

Genetic Parameters ◽

Genetic Parameter ◽

Genetic Correlations ◽

Phenotypic Correlation ◽

Parameter Estimates ◽

Sampling Variance ◽

Phenotypic Correlations ◽

Genetic And Phenotypic Correlations ◽

Heritability Estimates

Abstract A data set of 1572 heritability estimates and 1015 pairs of genetic and phenotypic correlation estimates, constructed from a survey of published beef cattle genetic parameter estimates, provided a rare opportunity to study realized sampling variances of genetic parameter estimates. The distribution of both heritability estimates and genetic correlation estimates, when plotted against estimated accuracy, was consistent with random error variance being some three times the sampling variance predicted from standard formulae. This result was consistent with the observation that the variance of estimates of heritabilities and genetic correlations between populations was about four times the predicted sampling variance, suggesting few real differences in genetic parameters between populations. Except where there was a strong biological or statistical expectation of a difference, there was little evidence for differences between genetic and phenotypic correlations for most trait combinations or for differences in genetic correlations between populations. These results suggest that, even for controlled populations, estimating genetic parameters specific to a given population is less useful than commonly believed. A serendipitous discovery was that, in the standard formula for the theoretical standard error of a genetic correlation estimate, the heritabilities refer to the estimated values and not, as seems generally assumed, the true population values.

Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2020.626363 ◽

2021 ◽

Vol 7 ◽

Author(s):

Castrense Savojardo ◽

Matteo Manfredi ◽

Pier Luigi Martelli ◽

Rita Casadio

Keyword(s):

Solvent Accessibility ◽

Protein Structures ◽

Three Dimensional ◽

Protein Sequences ◽

Large Data ◽

Human Protein ◽

Dimensional Structure ◽

Wild Type ◽

Solvent Exposure ◽

Data Set

Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of a variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations on a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease-related SRVs largely increases when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on a deep learning procedure and performing at the state of the art, we are able to confirm the above tendency by analyzing a large data set of residues subjected to variations and occurring in some 12,494 human protein sequences still lacking three-dimensional structure (derived from HUMSAVAR). Our data support the notion that surface accessible area is a distinguishing property of residues that undergo variation and that pathogenicity is more frequently associated with the buried property than with the exposed one.

Interactive geovisualization of activity-travel patterns using three-dimensional geographical information systems: a methodological exploration with a large data set

Transportation Research Part C Emerging Technologies ◽

10.1016/s0968-090x(00)00017-6 ◽

2000 ◽

Vol 8 (1-6) ◽

pp. 185-203 ◽

Cited By ~ 292

Author(s):

Mei-Po Kwan

Keyword(s):

Information Systems ◽

Geographical Information Systems ◽

Three Dimensional ◽

Large Data ◽

Geographical Information ◽

Travel Patterns ◽

Data Set ◽

Large Data Set

FRACTAL ASPECTS OF PROTEIN STRUCTURE AND DYNAMICS

Fractals ◽

10.1142/s0218348x93000198 ◽

1993 ◽

Vol 01 (02) ◽

pp. 179-189 ◽

Cited By ~ 6

Author(s):

T. GREGORY DEWEY

Keyword(s):

Dynamic Properties ◽

Linear Chain ◽

Protein Structures ◽

Three Dimensional ◽

Alpha Helix ◽

Large Data ◽

Thermal Fluctuations ◽

Radius Of Gyration ◽

Data Set ◽

Long Range Correlations

Proteins have well-defined three-dimensional structures which are dictated by their amino acid sequence. Despite this great specificity, general structural and dynamic properties exist. Scaling relationships for the radius of gyration and surface area of a large data set of proteins are demonstrated in this work. These results show that proteins scale as collapsed polymers. Thermal fluctuations are examined for two different proteins by an analysis of the Debye-Waller factors derived from X-ray crystallographic data. Long-range correlations exist between fluctuations along the backbone. A disordered Ising model is presented that gives similar correlations. To further examine the role of multiple connectivity in protein structures, the vibrational spectrum for an alpha helix (linear chain with H-bonds) is analyzed from recursive relationships derived using a decimation technique.
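
The scaling analysis mentioned in this abstract can be illustrated with a short sketch. It assumes equal-mass points for the radius of gyration and a power law R_g = c N^ν fitted by least squares on log-log axes; a collapsed (compact) polymer is commonly described by ν close to 1/3. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equal-mass points from their centroid."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - centroid[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def scaling_exponent(chain_lengths, radii):
    """Least-squares slope of log(radius) against log(chain length)."""
    xs = [math.log(n) for n in chain_lengths]
    ys = [math.log(r) for r in radii]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic radii generated from an exact N^(1/3) law, for illustration only.
lengths = [50, 100, 200, 400]
radii = [2.0 * n ** (1.0 / 3.0) for n in lengths]
print(f"fitted exponent = {scaling_exponent(lengths, radii):.3f}")
```

With real structures, `radius_of_gyration` would be applied to atomic coordinates for each protein before fitting the exponent.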

Genetic parameters for clinical mastitis in Holstein-Friesians in the United Kingdom: a Bayesian analysis

Animal Science ◽

10.1017/s1357729800058203 ◽

2001 ◽

Vol 73 (2) ◽

pp. 229-240 ◽

Cited By ~ 27

Author(s):

H. N. Kadarmideen ◽

R. Rekaya ◽

D. Gianola

Keyword(s):

Genetic Parameters ◽

Genetic Parameter ◽

Mixed Linear Model ◽

Clinical Mastitis ◽

Parameter Estimates ◽

Data Sets ◽

Data Sampling ◽

Environmental Variance ◽

Data Set ◽

Monte Carlo Techniques

Abstract A Bayesian threshold-liability model with Markov chain Monte Carlo techniques was used to infer genetic parameters for clinical mastitis records collected on Holstein-Friesian cows by one of the United Kingdom’s national recording schemes. Four data sets were created to investigate the effect of data sampling methods on genetic parameter estimates for first and multi-lactation cows, separately. The data sets were: (1) cows with complete first lactations only (8671 cows); (2) all cows, with first lactations whether complete or incomplete (10 967 cows); (3) cows with complete multi-lactations (32 948 records); and (4) all cows with multiple lactations whether complete or incomplete (44 268 records). A Gaussian mixed linear model with sire effects was adopted for liability. Explanatory variables included in the model varied for each data set. Analyses were conducted using Gibbs sampling and estimates were on the liability scale. Posterior means of heritability for clinical mastitis were higher for first lactations (0·11 and 0·10 for data sets 1 and 2, respectively) than for multiple lactations (0·09 and 0·07, for data sets 3 and 4, respectively). For multiple lactations, estimates of permanent environmental variance were higher for complete than incomplete lactations. Repeatability was 0·21 and 0·17 for data sets 3 and 4, respectively. This suggests the existence of effects, other than additive genetic effects, on susceptibility to mastitis that are common to all lactations. In first or multi-lactation data sets, heritability was proportionately 0·10 to 0·19 lower for data sets with all records (in which case the models had days in milk as a covariate) than for data with only complete lactation records (models without days in milk as a covariate). This suggests an effect of data sampling on genetic parameter estimates. The regression of liability on days in milk differed from zero, indicating that the probability of mastitis is higher for longer lactations, as expected. Results also indicated that a regression on days in milk should be included in a model for genetic evaluation of sires for mastitis resistance based on records in progress.

Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE

Symmetry ◽

10.3390/sym11010107 ◽

2019 ◽

Vol 11 (1) ◽

pp. 107 ◽

Cited By ~ 6

Author(s):

Mujtaba Husnain ◽

Malik Missen ◽

Shahzad Mumtaz ◽

Muhammad Luqman ◽

Mickaël Coustaty ◽

...

Keyword(s):

Local Structure ◽

High Dimensional Data ◽

Three Dimensional ◽

Principal Component ◽

Large Data ◽

High Dimensional ◽

Data Set ◽

Novel Approach ◽

Critical Issues ◽

Low Dimensional

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals. The data set was created by inviting authors from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting local and global structures of the large data set at different scales. The global structure consists of geometrical features and local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that allows the fusion of these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two (or three-) dimensional map in a very different way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping to a low-dimensional plane. The visualizations produced by t-SNE outperformed other classical techniques like principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.

Experimental duration and predator satiation levels systematically affect functional response parameters

10.1101/108886 ◽

2017 ◽

Author(s):

Yuanheng Li ◽

Björn C. Rall ◽

Gregor Kalinkat

Keyword(s):

Food Web ◽

Functional Response ◽

Large Data ◽

Handling Time ◽

Set Covering ◽

Parameter Estimates ◽

Functional Responses ◽

Predator Satiation ◽

Data Set ◽

Wide Range

Abstract Empirical feeding studies where density-dependent consumption rates are fitted to functional response models are often used to parametrize the interaction strengths in models of population or food-web dynamics. However, the relationship between functional response parameter estimates from short-term feeding studies and real-world, long-term, trophic interaction strengths remains largely untested. In a critical first step to address this void, we tested for systematic effects of experimental duration and predator satiation on the estimation of functional response parameters, namely attack rate and handling time. Analyzing a large data set covering a wide range of predator taxonomies and body sizes, we show that attack rates decrease with increasing experimental duration, and that handling times of starved predators are consistently shorter than those of satiated predators. Therefore, both the experimental duration and the predator satiation level have a strong and systematic impact on the predictions of population dynamics and food-web stability. Our study highlights potential pitfalls at the intersection of empirical and theoretical applications of functional responses. We conclude our study with some practical suggestions on how these implications should be addressed in the future to improve predictive abilities and realism in models of predator-prey interactions.
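
To make the two parameters named in this abstract concrete, the sketch below evaluates a Holling type II functional response, N_e = a N T / (1 + a h N), where a is the attack rate, h the handling time, N the prey density and T the experimental duration. The abstract does not say which functional response model was fitted, so the choice of the type II form is an assumption, and the parameter values are hypothetical.

```python
def holling_type2(prey_density, attack_rate, handling_time, duration=1.0):
    """Prey consumed per predator under a Holling type II response.

    prey_density  : N, prey available per unit area or volume
    attack_rate   : a, search efficiency of the predator
    handling_time : h, time spent handling one prey item
    duration      : T, length of the feeding trial
    """
    numerator = attack_rate * prey_density * duration
    denominator = 1.0 + attack_rate * handling_time * prey_density
    return numerator / denominator


# Hypothetical parameters: consumption rises with density and levels off
# below the ceiling duration / handling_time.
for density in (1, 10, 100, 1000):
    eaten = holling_type2(density, attack_rate=0.5, handling_time=0.1)
    print(f"N = {density:>4}: consumed = {eaten:.2f}")
```

In this form a lower attack rate flattens the initial rise of the curve, while a shorter handling time raises the ceiling, which is why shifts in either estimate change predicted interaction strengths.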

Growth differences and genetic parameter estimates of 15 teak (Tectona grandis L.f.) genotypes of various ages clonally propagated by microcuttings and planted under humid tropical conditions

Silvae Genetica ◽

10.1515/sg-2013-0024 ◽

2013 ◽

Vol 62 (1-6) ◽

pp. 196-206 ◽

Cited By ~ 3

Author(s):

D. K. S. Goh ◽

Y. Japarudin ◽

A. Alwi ◽

M. Lapammu ◽

A. Flori ◽

...

Keyword(s):

Growth Traits ◽

Genetic Parameter ◽

Tectona Grandis ◽

Annual Rainfall ◽

Phenotypic Correlation ◽

Parameter Estimates ◽

Age Related ◽

Tropical Conditions ◽

Growth Differences ◽

Clonal Test

Abstract Fifteen clones of teak (Tectona grandis) produced by micropropagation from 0.5 to more than 60 yr-old selected ortets were established in a clonal test in Sabah (East Malaysia) under 2500 mm of annual rainfall to compare their growth performances during the first 7 years of development. Field establishment was good with average mortality less than 10%. The clones developed rapidly true-to-type with significant between-clone differences in growth. Ranges of clone means were 13.6 to 19.3 m in height, 16.3 to 23.4 cm in diameter at breast height (DBH) and 129 to 264 dm³ in volume. Broad sense heritability estimates for these growth traits were lower overall for single trees (H2i) than for clone means (H2c) (H2i ≤ 0.257 vs H2c ≤ 0.634 for height, H2i ≤ 0.120 vs H2c ≤ 0.383 for DBH and H2i ≤ 0.125 vs H2c ≤ 0.364 for volume). The highest genetic gain that could be expected from the best three clones out of the fifteen compared was at age 2 for height (+0.66 m, or +11.7%), and age 3 for DBH (+0.87 cm, or +10.4%) and volume (+4.65 dm³, or +15.7%). Age-related phenotypic correlation values were reliably (P < 0.0001) higher and more consistent for DBH (rP ≥ 0.61) than for height (0.37 ≤ rP ≤ 0.69), or than between DBH and height, except for height at 3 (0.51 ≤ rP ≤ 0.63) and 6 (0.55 ≤ rP ≤ 0.69) years. Height and DBH were moderately to highly genetically correlated (0.54 ≤ rG ≤ 0.90).

A deep learning approach for staging embryonic tissue isolates with small data

PLoS ONE ◽

10.1371/journal.pone.0244151 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0244151

Author(s):

Adam Joseph Ronald Pond ◽

Seongwon Hwang ◽

Berta Verd ◽

Benjamin Steventon

Keyword(s):

Machine Learning ◽

Three Dimensional ◽

3D Culture ◽

Large Data ◽

Test Accuracy ◽

Small Data ◽

Neural Net ◽

Learning Approaches ◽

Data Set

Machine learning approaches are becoming increasingly widespread and are now present in most areas of research. Their recent surge can be explained in part by our ability to generate and store enormous amounts of data with which to train these models. The requirement for large training sets is also responsible for limiting further potential applications of machine learning, particularly in fields where data tend to be scarce, such as developmental biology. However, recent research seems to indicate that machine learning and Big Data can sometimes be decoupled to train models with modest amounts of data. In this work we set out to train a CNN-based classifier to stage zebrafish tail buds at four different stages of development using small information-rich data sets. Our results show that two- and three-dimensional convolutional neural networks can be trained to stage developing zebrafish tail buds based on both morphological and gene expression confocal microscopy images, achieving in each case up to 100% test accuracy scores. Importantly, we show that high accuracy can be achieved with data set sizes of under 100 images, much smaller than the typical training set size for a convolutional neural net. Furthermore, our classifier shows that it is possible to stage isolated embryonic structures without the need to refer to classic developmental landmarks in the whole embryo, which will be particularly useful to stage 3D culture in vitro systems such as organoids. We hope that this work will provide a proof of principle that will help dispel the myth that large data set sizes are always required to train CNNs, and encourage researchers in fields where data are scarce to also apply ML approaches.

PIQMEE: Bayesian Phylodynamic Method for Analysis of Large Data Sets with Duplicate Sequences

Molecular Biology and Evolution ◽

10.1093/molbev/msaa136 ◽

2020 ◽

Vol 37 (10) ◽

pp. 3061-3075 ◽

Cited By ~ 2

Author(s):

Veronika Boskova ◽

Tanja Stadler

Keyword(s):

Large Data ◽

Large Data Sets ◽

Parameter Estimates ◽

Data Sets ◽

Sequencing Data ◽

Full Data ◽

Data Set ◽

Reliable Parameter ◽

Phylodynamic Analysis ◽

Speed And Accuracy

Abstract Next-generation sequencing of pathogen quasispecies within a host yields data sets of tens to hundreds of unique sequences. However, the full data set often contains thousands of sequences, because many of those unique sequences have multiple identical copies. Data sets of this size represent a computational challenge for currently available Bayesian phylogenetic and phylodynamic methods. Through simulations, we explore how large data sets with duplicate sequences affect the speed and accuracy of phylogenetic and phylodynamic analysis within BEAST 2. We show that using unique sequences only leads to biases, and using a random subset of sequences yields imprecise parameter estimates. To overcome these shortcomings, we introduce PIQMEE, a BEAST 2 add-on that produces reliable parameter estimates from full data sets with increased computational efficiency as compared with the currently available methods within BEAST 2. The principle behind PIQMEE is to resolve the tree structure of the unique sequences only, while simultaneously estimating the branching times of the duplicate sequences. Distinguishing between unique and duplicate sequences allows our method to perform well even for very large data sets. Although the classic method converges poorly for data sets of 6,000 sequences when allowed to run for 7 days, our method converges in slightly more than 1 day. In fact, PIQMEE can handle data sets of around 21,000 sequences with 20 unique sequences in 14 days. Finally, we apply the method to a real, within-host HIV sequencing data set with several thousand sequences per patient.
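
The distinction this abstract draws between unique and duplicate sequences can be shown with a small data-structure sketch. It only collapses a list of sequences into unique sequences with copy counts; it is not PIQMEE's implementation and does not use the BEAST 2 API. The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def collapse_duplicates(sequences):
    """Return (sequence, copy_count) pairs, most abundant first.

    Ties in copy count are broken alphabetically so the order is stable.
    """
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy reads for illustration only.
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TTTT"]
unique = collapse_duplicates(reads)
print(f"{len(reads)} sequences collapse to {len(unique)} unique sequences")
for sequence, copies in unique:
    print(sequence, copies)
```

Keeping the copy counts, rather than discarding duplicates, preserves the information that the abstract reports is lost when only unique sequences are analysed.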
