Mapping Quantitative Trait Loci in F2 Incorporating Phenotypes of F3 Progeny

Genetics ◽  
2004 ◽  
Vol 166 (4) ◽  
pp. 1981-1993 ◽  
Author(s):  
Yuan-Ming Zhang ◽  
Shizhong Xu

Abstract
In plants and laboratory animals, QTL mapping is commonly performed using F2 or BC individuals derived from the cross of two inbred lines. Typical QTL mapping statistics assume that each F2 individual is genotyped for the markers and phenotyped for the trait. For plant traits with low heritability, it has been suggested to use the average phenotypic values of F3 progeny derived from selfing F2 plants in place of the F2 phenotype itself. All F3 progeny derived from the same F2 plant belong to the same family, denoted F2:3. If the size of each F2:3 family (the number of F3 progeny) is sufficiently large, the average value of the family will represent the genotypic value of the F2 plant, and thus the power of QTL mapping may be significantly increased. The strategy of using F2 marker genotypes and F3 average phenotypes for QTL mapping in plants is quite similar to the daughter design of QTL mapping in dairy cattle. We study the fundamental principle of the plant version of the daughter design and develop a new statistical method to map QTL under this F2:3 strategy. We also propose to combine both the F2 phenotypes and the F2:3 average phenotypes to further increase the power of QTL mapping. The statistical method developed in this study differs from published ones in that the new method fully takes advantage of the mixture distribution of F2:3 families derived from heterozygous F2 plants. Incorporating this new information significantly increases the statistical power of QTL detection relative to the classical F2 design, even if only a single F3 progeny is collected from each F2:3 family. The mixture model is developed on the basis of a single-QTL model and implemented via the EM algorithm. Extensive computer simulations were conducted to demonstrate the improved efficiency of the mixture model. Extension of the mixture model to multiple-QTL analysis is developed using a Bayesian approach. The computer program performing the Bayesian analysis of the simulated data is available to users for real data analysis.
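To make the mixture machinery concrete, here is a minimal sketch of an EM fit for family means under a single QTL. It is a simplification, not the paper's exact model: the Mendelian proportions (1/4, 1/2, 1/4) are used as fixed mixing weights for the unknown F2 QTL genotypes, and the within-family mixture for heterozygous parents, flanking-marker information, and combining F2 with F2:3 data are all omitted.

```python
import numpy as np

def em_qtl_mixture(y, n_iter=200, tol=1e-8):
    """Minimal EM for a 3-component normal mixture with fixed Mendelian
    mixing proportions (1/4, 1/2, 1/4), standing in for the unknown QTL
    genotypes QQ, Qq, qq of the F2 parents. Illustrative only: the
    published model additionally mixes over the genotypes segregating
    *within* each heterozygous F2:3 family."""
    w = np.array([0.25, 0.5, 0.25])           # Mendelian priors
    mu = np.quantile(y, [0.25, 0.5, 0.75])    # crude initial component means
    sigma2 = np.var(y)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each genotype for each family
        dens = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2))
        dens /= np.sqrt(2 * np.pi * sigma2)
        num = w * dens
        post = num / num.sum(axis=1, keepdims=True)
        # M-step: weighted component means and pooled residual variance
        mu = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
        sigma2 = (post * (y[:, None] - mu) ** 2).sum() / len(y)
        ll = np.log(num.sum(axis=1)).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return mu, sigma2, post
```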

2020 ◽  
Author(s):  
Fanny Mollandin ◽  
Andrea Rau ◽  
Pascal Croiseau

Abstract
Technological advances and decreasing costs have led to the rise of increasingly dense genotyping data, making feasible the identification of potential causal markers. Custom genotyping chips, which combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially yield improved accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes. BayesR has been shown to yield accurate predictions and promise for quantitative trait loci (QTL) mapping in real data applications, but an extensive benchmarking on simulated data is currently lacking. Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures and phenotype heritabilities, and we evaluated the impact of excluding or including causal markers among the genotypes. We define several statistical criteria for QTL mapping, including several based on sliding windows to account for linkage disequilibrium, and we compare and contrast these statistics and their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR in moderately to highly heritable traits, particularly for 50k custom data. In cases of low heritability, or of weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge regardless of the criterion used. BayesR is thus a promising approach for simultaneously obtaining accurate predictions and interpretable classifications of SNPs into effect size classes. We illustrate the performance of BayesR in a variety of simulation scenarios and compare the advantages and limitations of each criterion.
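As one concrete reading of a sliding-window criterion, the sketch below scores each SNP by summing posterior non-zero-effect probabilities (the kind of per-marker output a BayesR-type model produces) over a surrounding window. The function name, window size, and inputs are illustrative, not the paper's exact definitions.

```python
import numpy as np

def window_scores(pos, p_nonzero, window_kb=200):
    """Hypothetical sliding-window QTL criterion: for each SNP, sum the
    posterior probability of a non-zero effect class over all SNPs within
    +/- window_kb/2 of its position (positions in base pairs)."""
    half = window_kb * 500  # half-window converted from kb to bp
    scores = np.empty(len(pos))
    for i, p in enumerate(pos):
        mask = np.abs(pos - p) <= half
        scores[i] = p_nonzero[mask].sum()
    return scores
```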


2021 ◽  
Author(s):  
Victoria M Cox ◽  
Megan M O'Driscoll ◽  
Natsuko Imai ◽  
Ari Prayitno ◽  
Sri Rezeki Hadinegoro ◽  
...  

Background. Dengue virus (DENV) infection is a global health concern of increasing magnitude. To target intervention strategies, accurate estimates of the force of infection (FOI) are necessary. Catalytic models have been widely used to estimate DENV FOI and rely on a binary classification of serostatus as seropositive or seronegative, according to pre-defined antibody thresholds. Previous work has demonstrated that the use of thresholds can cause serostatus misclassification and biased estimates. In contrast, mixture models do not rely on thresholds and use the full distribution of antibody titres. To date, there has been limited application of mixture models to estimate DENV FOI.

Methods. We compare the application of mixture models, and of time-constant and time-varying catalytic models, to simulated data and to serological data collected in Vietnam from 2004 to 2009 (N ≥ 2178) and in Indonesia in 2014 (N = 3194).

Results. The simulation study showed greater estimate bias from the time-constant and time-varying catalytic models (FOI bias = 1.3% (0.05%, 4.6%) and 2.3% (0.06%, 7.8%); seroprevalence bias = 3.1% (0.25%, 9.4%) and 2.9% (0.26%, 8.7%), respectively) than from the mixture model (FOI bias = 0.41% (95% CI 0.02%, 2.7%); seroprevalence bias = 0.11% (0.01%, 3.6%)). When applied to real data from Vietnam, the mixture model frequently produced higher FOI and seroprevalence estimates than the catalytic models.

Conclusions. Our results suggest that mixture models are valid, potentially less biased, alternatives to catalytic models, which could be particularly useful when estimating FOI and seroprevalence in low-transmission settings, where serostatus misclassification tends to be higher.
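The time-constant catalytic model has a simple closed form: the probability of being seropositive by age a is P(a) = 1 − exp(−λa), where λ is the FOI. Below is a minimal sketch of estimating λ from age-binned seropositivity counts by maximizing the binomial likelihood; the data values are invented purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def catalytic_loglik(lam, ages, n_pos, n_tot):
    """Binomial log-likelihood of a time-constant catalytic model:
    P(seropositive at age a) = 1 - exp(-lam * a)."""
    p = np.clip(1.0 - np.exp(-lam * ages), 1e-12, 1 - 1e-12)
    return np.sum(n_pos * np.log(p) + (n_tot - n_pos) * np.log(1 - p))

# Illustrative (invented) data: age-bin midpoints, seropositive counts, totals.
ages  = np.array([3.0, 7.0, 12.0, 17.0])
n_pos = np.array([10, 40, 80, 110])
n_tot = np.array([100, 100, 120, 130])

res = minimize_scalar(lambda l: -catalytic_loglik(l, ages, n_pos, n_tot),
                      bounds=(1e-6, 2.0), method="bounded")
print("estimated FOI:", res.x)
```

A mixture model would instead fit the full titre distribution as a weighted sum of seronegative and seropositive components, avoiding the threshold step entirely.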


2018 ◽  
Vol 28 (12) ◽  
pp. 3769-3784
Author(s):  
Zihang Lu ◽  
Wendy Lou

In longitudinal studies, it is often of great interest to cluster individual trajectories based on repeated measurements taken over time. Non-linear growth trajectories are common in practice, and individual data may be measured sparsely and at irregular time points, which complicates the modeling process. Motivated by a study of hormone profiles in pregnant women, we propose a shape invariant growth mixture model for clustering non-linear growth trajectories. Bayesian inference via Markov chain Monte Carlo (MCMC) is employed to estimate the parameters of interest. We compare our model to the commonly used growth mixture model and to a functional clustering approach in simulation studies, and we present and discuss results from analyzing both the real and the simulated data.
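A common form of shape-invariant model lets each subject shift and scale a shared template curve. The parameterisation below is one plausible version under that assumption (the paper's exact specification may differ), with tanh standing in for the common shape function.

```python
import numpy as np

def sim_mean(t, alpha, beta, gamma, delta, f=np.tanh):
    """Shape-invariant model (SIM) mean curve: every subject shifts and
    scales a common template f. One plausible parameterisation:
        mu_i(t) = alpha_i + beta_i * f((t - gamma_i) / delta_i)
    In a growth *mixture* version, each latent cluster carries its own
    template parameters and subjects receive cluster memberships."""
    return alpha + beta * f((t - gamma) / delta)

# Sparse, irregular observation times pose no problem: the mean is simply
# evaluated wherever a subject happens to be measured.
t_obs = np.array([2.5, 7.0, 19.5])  # hypothetical gestational weeks
print(sim_mean(t_obs, alpha=1.0, beta=2.0, gamma=10.0, delta=5.0))
```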


2019 ◽  
Author(s):  
Yi Jiang ◽  
Gina Giase ◽  
Kay Grennan ◽  
Annie W. Shieh ◽  
Yan Xia ◽  
...  

Abstract
Studies of complex disorders benefit from integrative analyses of multiple omics data. Yet sample mix-ups frequently occur in multi-omics studies, weakening statistical power and risking false findings. Accurately aligning sample information, genotypes, and the corresponding omics data is critical for integrative analyses. We developed DRAMS (https://github.com/Yi-Jiang/DRAMS) to Detect and Re-Align Mixed-up Samples. It uses a logistic regression model followed by a modified topological sorting algorithm to identify potential true IDs based on the data relationships across omics. In tests on simulated data, DRAMS performed better the more omics data types were used and the smaller the proportion of mix-ups. Applying DRAMS to real data from the PsychENCODE BrainGVEX project, we detected and corrected 201 mix-ups (12.5% of the total data generated). Of the 21 mix-ups involving errors of racial identity, DRAMS re-assigned all samples to the correct racial cluster in the 1000 Genomes project. In doing so, the number of quantitative trait loci (QTL) detected (FDR < 0.01) increased by an average of 1.62-fold. The use of DRAMS in multi-omics studies will strengthen statistical power and improve the quality of results. Although few studies currently have multi-omics data in place, we expect such data to accumulate quickly, and with them the need for DRAMS.

Author summary
Sample mix-ups happen inevitably during sample collection, processing, and data management, reducing statistical power and sometimes producing false findings. It is therefore important to correct mixed-up samples before conducting any downstream analyses. We developed DRAMS to detect and re-align mixed-up samples in multi-omics studies. The basic idea of DRAMS is to align the data and labels for each sample by leveraging the genetic information shared across omics data. DRAMS corrects sample IDs in two steps. First, it estimates pairwise genetic relatedness among all data generated from all individuals; because different data generated from the same individual should share the same genetics, highly related data can be clustered, and the data in one cluster are assumed to carry a single potential ID. Second, a "majority vote" strategy infers the potential ID for each cluster. Other information, such as the match between genetics-based and reported sex and the omics priority, is also used to guide the identification of potential IDs. DRAMS performed well on both the simulated data and the PsychENCODE BrainGVEX multi-omics data.
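The majority-vote step can be pictured with the toy sketch below: each relatedness cluster receives the sample ID carried by most of its member data files, with ties left unresolved. This is illustrative only; real DRAMS additionally weights votes by omics priority and checks genetics-based versus reported sex.

```python
from collections import Counter

def majority_vote_ids(clusters):
    """Assign each cluster of genetically matched data files the sample ID
    carried by most of its members; ties are left unresolved (None)."""
    assigned = {}
    for cluster, labels in clusters.items():
        counts = Counter(labels).most_common(2)
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            assigned[cluster] = None          # tie: ambiguous cluster
        else:
            assigned[cluster] = counts[0][0]  # majority ID wins
    return assigned

# Example with hypothetical IDs: data files clustered by relatedness,
# labelled with their recorded sample IDs.
print(majority_vote_ids({"c1": ["S01", "S01", "S02"], "c2": ["S03"]}))
# -> {'c1': 'S01', 'c2': 'S03'}
```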


Metabolites ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 214
Author(s):  
Aneta Sawikowska ◽  
Anna Piasecka ◽  
Piotr Kachlicki ◽  
Paweł Krajewski

Peak overlapping is a common problem in chromatography, especially for complex biological mixtures such as metabolite samples. Because different compounds with similar chromatographic properties co-elute, peak separation becomes challenging. In this paper, two computational methods for separating peaks, applied for the first time to large chromatographic datasets, are described, compared, and experimentally validated. The methods lead from raw observations to data that can form inputs for statistical analysis. First, in both methods, data are normalized by the mass of the sample, the baseline is removed, retention time alignment is conducted, and peaks are detected. Then, in the first method, clustering is used to separate overlapping peaks, whereas in the second method, functional principal component analysis (FPCA) is applied for the same purpose. Simulated data and experimental results are used as examples to present and compare both methods. The real data were obtained in a study of metabolomic changes in barley (Hordeum vulgare) leaves under drought stress. The results suggest that both methods are suitable for the separation of overlapping peaks, but an additional advantage of FPCA is the possibility of assessing the variability of individual compounds present within the same peaks of different chromatograms.
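On a dense common retention-time grid, FPCA essentially reduces to a PCA of the aligned curves. The sketch below computes per-chromatogram scores and eigenfunctions by SVD after centering; the smoothing and penalisation used in practice are omitted, so this is a crude stand-in rather than the paper's pipeline.

```python
import numpy as np

def fpca_scores(traces, n_comp=2):
    """Crude functional PCA on a matrix of aligned chromatogram segments
    (rows = chromatograms, columns = retention-time grid points)."""
    X = traces - traces.mean(axis=0)        # centre the curves
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]     # per-chromatogram scores
    components = Vt[:n_comp]                # eigenfunctions on the grid
    return scores, components
```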


2021 ◽  
Vol 10 (7) ◽  
pp. 435
Author(s):  
Yongbo Wang ◽  
Nanshan Zheng ◽  
Zhengfu Bian

Since pairwise registration is a necessary step for the seamless fusion of point clouds from neighboring stations, a closed-form solution to planar feature-based registration of LiDAR (Light Detection and Ranging) point clouds is proposed in this paper. Based on the Plücker coordinate-based representation of linear features in three-dimensional space, a quad-tuple-based representation of planar features is introduced, which makes it possible to directly determine the difference between any two planar features. Dual quaternions are employed to represent the spatial transformation, and operations between dual quaternions and the quad-tuple-based representation of planar features are given, from which an error norm is constructed. Based on L2-norm minimization, detailed derivations of the proposed solution are explained step by step. Two experiments were designed in which both simulated and real data were used to verify the correctness and feasibility of the proposed solution. With the simulated data, the calculated registration results were consistent with the pre-established parameters, which verifies the correctness of the presented solution. With the real data, the calculated registration results were consistent with those calculated by iterative methods. Two conclusions can be drawn from the experiments: (1) the proposed solution does not require any initial estimates of the unknown parameters, which assures its stability and robustness; (2) using dual quaternions to represent the spatial transformation greatly reduces the additional constraints in the estimation process.
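Although the paper derives a closed-form dual-quaternion solution, the underlying geometry can be illustrated with an iterative least-squares stand-in: under x' = Rx + t, a plane (n, d) with n·x = d maps to (Rn, d + (Rn)·t), and registration minimizes the discrepancy between corresponding planes from the two stations. The sketch below is not the paper's method, only a way to see what the error norm measures.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def plane_residuals(x, planes_src, planes_dst):
    """Residuals between corresponding planes (n, d), with n . x = d,
    after applying the rigid transform encoded in x = (rotvec, t).
    Under x' = R x + t a plane maps to (R n, d + (R n) . t)."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for (n, d), (n2, d2) in zip(planes_src, planes_dst):
        n_t = R @ n
        res.extend(n_t - n2)            # normal-direction mismatch
        res.append(d + n_t @ t - d2)    # offset mismatch
    return np.array(res)

# Usage (planes as (unit normal, offset) pairs from each station):
# sol = least_squares(plane_residuals, np.zeros(6),
#                     args=(planes_src, planes_dst))
```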


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilo Broc ◽  
Therese Truong ◽  
Benoit Liquet

Abstract
Background. The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying certain diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose novel gene- and pathway-level approaches for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) that takes into account groups of variables, with a lasso penalization linking all of the independent data sets. This method, called joint-sgPLS, convincingly detects signal at both the variable level and the group level.

Results. Our method has the advantage of producing a global, readable model while respecting the architecture of the data. It can outperform traditional methods and exploits a priori information for wider insight. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application to real data, with the aim of highlighting common susceptibility variants for breast and thyroid cancers.

Conclusion. joint-sgPLS shows interesting properties for detecting signal. As an extension of PLS, the method is suited to data with a large number of variables. The lasso penalization accommodates group structures among variables and across observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a large number of variables and a known a priori group structure, in other application fields as well.
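The flavor of the penalization can be seen in a generic sparse-group penalty, which combines a lasso term (variable-level sparsity) with a group-lasso term (group-level sparsity). The sketch below is illustrative of that family of penalties, not the exact joint-sgPLS objective.

```python
import numpy as np

def sparse_group_penalty(w, groups, alpha=0.5, lam=1.0):
    """Generic sparse-group penalty of the kind underlying sgPLS-style
    methods. `groups` maps group name -> index array into w; alpha
    trades off variable-level (lasso) vs group-level (group-lasso)
    sparsity, lam scales the overall penalty."""
    l1 = np.abs(w).sum()
    l2_groups = sum(np.sqrt(len(ix)) * np.linalg.norm(w[ix])
                    for ix in groups.values())
    return lam * (alpha * l1 + (1 - alpha) * l2_groups)

# Example with a hypothetical weight vector split into two gene groups.
w = np.array([0.0, 0.3, -0.1, 0.0, 0.0, 0.8])
groups = {"geneA": np.array([0, 1, 2]), "geneB": np.array([3, 4, 5])}
print(sparse_group_penalty(w, groups))
```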


2021 ◽  
Vol 11 (2) ◽  
pp. 582
Author(s):  
Zean Bu ◽  
Changku Sun ◽  
Peng Wang ◽  
Hang Dong

Calibration between multiple sensors is a fundamental procedure for data fusion. To address the problems of large errors and tedious operation, we present a novel method for calibration between light detection and ranging (LiDAR) and a camera. We designed a calibration target: an arbitrary triangular pyramid with a chessboard pattern on each of its three planes. The target contains both 3D and 2D information, which can be utilized to obtain the intrinsic parameters of the camera and the extrinsic parameters of the system. In the proposed method, the world coordinate system is established through the triangular pyramid. We extract the equations of the triangular pyramid planes to find the relative transformation between the two sensors. A single capture from the camera and the LiDAR is sufficient for calibration, and errors are reduced by minimizing the distance between points and planes; accuracy can be increased further with more captures. We carried out experiments on simulated data with varying degrees of noise and numbers of frames. Finally, the calibration results were verified on real data through incremental validation and analysis of the root mean square error (RMSE), demonstrating that our calibration method is robust and provides state-of-the-art performance.
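The "minimizing the distance between points and planes" step can be sketched as a point-to-plane least-squares problem: transform the LiDAR points into the camera frame with the sought extrinsics and penalize their signed distances to the associated chessboard planes. The parameterization and names below are illustrative, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def point_to_plane_residuals(x, lidar_pts, plane_params):
    """Signed point-to-plane distances after mapping LiDAR points into
    the camera frame with (R, t) from x = (rotvec, t). Each entry of
    lidar_pts is an (k, 3) array of points lying on the plane (n, d)
    with n . p = d in camera coordinates."""
    R = Rotation.from_rotvec(x[:3]).as_matrix()
    t = x[3:]
    res = []
    for pts, (n, d) in zip(lidar_pts, plane_params):
        p_cam = pts @ R.T + t          # LiDAR -> camera frame
        res.extend(p_cam @ n - d)      # signed point-plane distances
    return np.array(res)

# Usage: sol = least_squares(point_to_plane_residuals, np.zeros(6),
#                            args=(lidar_pts, plane_params))
```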


2021 ◽  
Vol 13 (5) ◽  
pp. 2426
Author(s):  
David Bienvenido-Huertas ◽  
Jesús A. Pulido-Arcas ◽  
Carlos Rubio-Bellido ◽  
Alexis Pérez-Fargallo

In recent years, studies of the accuracy of algorithms for predicting different aspects of energy use in the building sector have flourished, with energy poverty being one of the issues that has received considerable critical attention. Previous studies in this field have characterized energy poverty using different indicators, but they have failed to develop instruments that predict the risk of low-income households falling into energy poverty. This research explores how accurately six regression algorithms can forecast the risk of energy poverty by means of the fuel poverty potential risk index. Using data from the national survey of socioeconomic conditions of Chilean households and generating data for different typologies of social dwellings (e.g., form ratio or roof surface area), this study simulated 38,880 cases and compared the accuracy of the six algorithms. Multilayer perceptron, M5P, and support vector regression delivered the best accuracy, with correlation coefficients over 99.5%. In terms of computing time, M5P outperforms the rest. Although these results suggest that energy poverty can be accurately predicted using simulated data, it remains necessary to test the algorithms against real data. These results can be useful in devising policies to tackle energy poverty in advance.
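A comparison of this kind can be sketched with off-the-shelf regressors. The snippet below is a generic cross-validated benchmark, not the paper's protocol: M5P has no scikit-learn implementation, so a decision tree stands in for it, and the feature matrix X and target y (the fuel poverty index over simulated cases) are placeholders.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Illustrative model zoo; hyperparameters are not tuned to the study.
models = {
    "MLP": MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000),
    "SVR": SVR(kernel="rbf"),
    "Tree (M5P stand-in)": DecisionTreeRegressor(min_samples_leaf=5),
}

def compare(models, X, y, cv=5):
    """Rank regressors by mean cross-validated R^2 on dwelling features X
    and fuel-poverty risk index y."""
    return {name: cross_val_score(m, X, y, cv=cv, scoring="r2").mean()
            for name, m in models.items()}
```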

