Missing phenotype data imputation in pedigree data analysis

2008 ◽  
Vol 32 (1) ◽  
pp. 52-60 ◽  
Author(s):  
Brooke L. Fridley ◽  
Mariza de Andrade

Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1561-1566 ◽  
Author(s):  
Sharon Browning

We propose a new method for calculating probabilities for pedigree genetic data that incorporates crossover interference using chi-square models. Applications include relationship inference, genetic map construction, and linkage analysis. The method is based on importance sampling of unobserved inheritance patterns conditional on the observed genotype data; it takes advantage of fast algorithms for no-interference models while using reweighting to allow for interference. We show that the method is effective for arbitrarily many markers on small pedigrees.
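The importance-sampling idea in the abstract can be sketched generically: draw inheritance patterns from an easy no-interference proposal, then reweight each draw by the ratio of the interference model's probability to the proposal's. The sketch below uses a toy independent-recombination proposal and an adjacent-recombination penalty as stand-ins; it is not the paper's chi-square interference model, and every function name here is illustrative.

```python
import random

def proposal_prob(pattern, p=0.1):
    # No-interference proposal: independent recombination events,
    # each occurring with probability p per marker interval.
    prob = 1.0
    for x in pattern:
        prob *= p if x == 1 else (1.0 - p)
    return prob

def target_prob(pattern, p=0.1, penalty=0.5):
    # Toy interference model (stand-in for chi-square models):
    # adjacent recombinations are down-weighted by a penalty factor.
    prob = proposal_prob(pattern, p)
    for a, b in zip(pattern, pattern[1:]):
        if a == 1 and b == 1:
            prob *= penalty
    return prob

def importance_estimate(f, n_intervals=5, n_samples=20000, p=0.1, seed=1):
    # Self-normalized importance sampling: estimate E_target[f] by
    # sampling patterns from the cheap proposal and reweighting by
    # target_prob / proposal_prob.
    rng = random.Random(seed)
    total = 0.0
    weight_sum = 0.0
    for _ in range(n_samples):
        pattern = [1 if rng.random() < p else 0 for _ in range(n_intervals)]
        w = target_prob(pattern, p) / proposal_prob(pattern, p)
        total += w * f(pattern)
        weight_sum += w
    return total / weight_sum

# Expected number of recombinations under the toy interference model;
# it comes out slightly below the no-interference expectation of 0.5.
mean_recomb = importance_estimate(sum)
```

The design point mirrored here is the one the abstract names: all sampling happens under the fast no-interference model, and interference enters only through the per-sample weights.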


2002 ◽  
Vol 22 (3) ◽  
pp. 221-232 ◽  
Author(s):  
Mariza de Andrade ◽  
René Guéguen ◽  
Sophie Visvikis ◽  
Catherine Sass ◽  
Gérard Siest ◽  
...  

Genome ◽  
2020 ◽  
Vol 63 (11) ◽  
pp. 577-581 ◽  
Author(s):  
Davoud Torkamaneh ◽  
Jérôme Laroche ◽  
François Belzile

Genotyping-by-sequencing (GBS) is a rapid, flexible, low-cost, and robust genotyping method that simultaneously discovers variants and calls genotypes within a broad range of samples. These characteristics make GBS an excellent tool for many applications and research questions, from conservation biology to functional genomics, in both model and non-model species. Continued improvement of GBS relies on a more comprehensive understanding of data analysis, development of fast and efficient bioinformatics pipelines, accurate missing-data imputation, and active post-release support. Here, we present the second generation of Fast-GBS (v2.0), which offers several new options (e.g., processing paired-end reads and imputation of missing data) and features (e.g., summary statistics of genotypes) to improve the GBS data analysis process. A performance assessment showed that Fast-GBS v2.0 outperformed other available analytical pipelines, such as GBS-SNP-CROP and Gb-eaSy. Fast-GBS v2.0 provides an analysis platform that can be run with different types of sequencing data and modest computational resources, and that allows missing-data imputation for various species in different contexts.
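As a minimal illustration of the kind of missing-data imputation a GBS pipeline performs, the sketch below fills missing genotype calls per marker with the most frequent observed genotype. This is a deliberately simple stand-in, not Fast-GBS v2.0's actual imputation step, and the 0/1/2-with-None encoding is an assumption.

```python
from collections import Counter

def impute_marker(calls):
    """Replace missing calls (None) with the marker's most frequent
    observed genotype (genotypes coded 0/1/2)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return calls  # nothing observed, nothing to impute from
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if g is None else g for g in calls]

def impute_matrix(genotypes):
    """Impute each marker (row of a markers-by-samples matrix) independently."""
    return [impute_marker(row) for row in genotypes]

# Two markers across five samples, with three missing calls.
matrix = [
    [0, 0, None, 2, 0],
    [1, None, 1, 1, None],
]
filled = impute_matrix(matrix)
# filled == [[0, 0, 0, 2, 0], [1, 1, 1, 1, 1]]
```

Real pipelines replace the mode rule with models that borrow information across linked markers, but the input/output shape of the step is the same.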


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29473 ◽  
Author(s):  
William Seffens ◽  
Chad Evans

Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data from a major clinical study, the Minority Health Genomics and Translational Research Repository Database, composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analyses of hypertension, even those designed to focus on rare variants, have yielded marginal genome-wide results across many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.
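The study trained neural networks on phenotype data to impute missing values; as a simple stand-in for that model-based idea, the sketch below fits an ordinary least-squares line on complete records and uses it to fill gaps in one phenotype column. The variable names and the age/blood-pressure toy data are illustrative, not from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

def impute_column(rows, target, predictor):
    """Fill missing values in rows[i][target] by regressing the target
    phenotype on rows[i][predictor] over the complete rows."""
    complete = [(r[predictor], r[target]) for r in rows
                if r[target] is not None and r[predictor] is not None]
    a, b = fit_line([x for x, _ in complete], [y for _, y in complete])
    for r in rows:
        if r[target] is None and r[predictor] is not None:
            r[target] = a * r[predictor] + b
    return rows

# Toy example: systolic blood pressure (column 1) predicted from age
# (column 0); the last record is missing its blood-pressure value.
data = [[40, 120.0], [50, 130.0], [60, 140.0], [55, None]]
impute_column(data, target=1, predictor=0)
# data[3][1] is now 135.0
```

A neural network generalizes this by learning a nonlinear predictor from many phenotype variables at once, but the workflow (train on complete records, predict the gaps, then analyze the expanded data set) is the same.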


2019 ◽  
Vol 12 (5) ◽  
pp. 404-411 ◽  
Author(s):  
Mahdi Akbarzadeh ◽  
Abbas Moghimbeigi ◽  
Nathan Morris ◽  
Maryam S. Daneshpour ◽  
Hossein Mahjub ◽  
...  

2015 ◽  
Vol 6 ◽  
Author(s):  
Md. Matiur Rahaman ◽  
Dijun Chen ◽  
Zeeshan Gillani ◽  
Christian Klukas ◽  
Ming Chen

2001 ◽  
Vol 28 (2) ◽  
pp. 313 ◽  
Author(s):  
Douglas A. Wolf

Microsimulation is well known as a tool for static analysis of tax and transfer policies, for the generation of programmatic cost estimates, and for dynamic analyses of socio-economic and demographic systems. However, microsimulation also has the potential to contribute to longitudinal data analysis in several ways, including extending the range of outputs generated by a model, addressing several defective-data problems, and serving as a vehicle for missing-data imputation. This paper discusses microsimulation procedures suitable for several commonly used statistical models applied to longitudinal data. It also addresses the unique role that microsimulation can play in longitudinal data analysis, and the problem of accounting for the several sources of variability associated with microsimulation procedures.
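One way to picture microsimulation as a vehicle for missing-data imputation is to simulate an individual's unobserved waves forward from their last observed state. The sketch below assumes a toy two-state health status (0 = healthy, 1 = disabled) evolving under made-up Markov transition probabilities; it illustrates the role the paper describes rather than any procedure from the paper.

```python
import random

# Hypothetical one-period transition probabilities between states.
TRANSITION = {0: [(0, 0.9), (1, 0.1)],
              1: [(0, 0.2), (1, 0.8)]}

def step(state, rng):
    # Draw the next state from the transition distribution.
    u = rng.random()
    cum = 0.0
    for nxt, p in TRANSITION[state]:
        cum += p
        if u < cum:
            return nxt
    return state

def impute_trajectory(observed, rng):
    """Fill None entries in a longitudinal record by simulating forward
    from the most recent known (or previously simulated) state."""
    filled = list(observed)
    for t in range(1, len(filled)):
        if filled[t] is None and filled[t - 1] is not None:
            filled[t] = step(filled[t - 1], rng)
    return filled

# An individual observed at waves 0 and 3, missing at waves 1, 2, and 4.
rng = random.Random(7)
completed = impute_trajectory([0, None, None, 1, None], rng)
```

Because the fill-in is stochastic, repeating the simulation yields multiple plausible completed records, which connects directly to the paper's point about accounting for the extra variability microsimulation introduces.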


2013 ◽  
Vol 1 (2) ◽  
Author(s):  
Alma Molytė ◽  
Vaidutis Kučinskas ◽  
Aušra Matulevičienė ◽  
Eglė Preikšaitienė

2005 ◽  
Vol 29 (1) ◽  
pp. 12-22 ◽  
Author(s):  
Tao Wang ◽  
Robert C. Elston
