Resolving microsatellite genotype ambiguity in populations of allopolyploid and diploidized autopolyploid organisms using negative correlations between allelic variables

2015 ◽  
Author(s):  
Lindsay V. Clark ◽  
Andrea Drauch Schreier

Abstract A major limitation in the analysis of genetic marker data from polyploid organisms is non-Mendelian segregation, particularly when a single marker yields allelic signals from multiple, independently segregating loci (isoloci). However, with markers such as microsatellites that detect more than two alleles, it is sometimes possible to deduce which alleles belong to which isoloci. Here we describe a novel mathematical property of codominant marker data when it is recoded as binary (presence/absence) allelic variables: under random mating in an infinite population, two allelic variables will be negatively correlated if they belong to the same locus, but uncorrelated if they belong to different loci. We present an algorithm to take advantage of this mathematical property, sorting alleles into isoloci based on correlations, then refining the allele assignments after checking for consistency with individual genotypes. We demonstrate the utility of our method on simulated data, as well as a real microsatellite dataset from a natural population of octoploid white sturgeon (Acipenser transmontanus). Our methodology is implemented in the R package polysat version 1.5.
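The correlation property stated in this abstract can be checked with a small simulation. The sketch below is not the polysat algorithm. It assumes a hypothetical allotetraploid with two diploid isoloci, made-up allele names and frequencies, and random mating approximated by drawing each gene copy independently. It only shows that presence/absence variables for alleles at the same isolocus come out negatively correlated, while those at different isoloci are close to uncorrelated.

```python
import random

def simulate_presence(n_individuals, isoloci, seed=0):
    """Return a dict mapping allele name -> list of 0/1 presence values."""
    rng = random.Random(seed)
    alleles = [a for locus in isoloci for a in locus]
    presence = {a: [] for a in alleles}
    for _ in range(n_individuals):
        carried = set()
        for locus in isoloci:
            names = list(locus)
            weights = [locus[a] for a in names]
            # Two gene copies per diploid isolocus, drawn independently
            # (stand-in for random mating in a very large population).
            carried.update(rng.choices(names, weights=weights, k=2))
        for a in alleles:
            presence[a].append(1 if a in carried else 0)
    return presence

def pearson(x, y):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical allele frequencies; each isolocus sums to 1.
isoloci = [{"A1": 0.5, "A2": 0.3, "A3": 0.2}, {"B1": 0.6, "B2": 0.4}]
presence = simulate_presence(20000, isoloci)

r_same = pearson(presence["A1"], presence["A2"])  # same isolocus
r_diff = pearson(presence["A1"], presence["B1"])  # different isoloci
print(f"same isolocus: {r_same:.3f}, different isoloci: {r_diff:.3f}")
```

With these assumed frequencies the same-isolocus correlation is clearly negative, because the two gene copies at a locus are a limited resource: carrying A1 leaves less room for A2. Alleles at separate isoloci do not compete in this way.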

2001 ◽  
Vol 78 (2) ◽  
pp. 163-170 ◽  
Author(s):  
A. C. FIUMERA ◽  
M. A. ASMUSSEN

Parentage studies often estimate the number of parents contributing to half-sib progeny arrays by counting the number of alleles attributed to unshared parents. This approach is compromised when an offspring has the same heterozygous genotype as the shared parent, for then the contribution of the unshared parent cannot be unambiguously deduced. To determine how often such cases occur, formulae for co-dominant markers with n alleles are derived here for Ph, the probability that a given heterozygous parent has an offspring with the same heterozygous genotype, and Pa, the probability that a randomly chosen offspring has the same heterozygous genotype as the shared parent. These formulae have been derived assuming Mendelian segregation with either (1) an arbitrary mating system, (2) random mating or (3) mixed mating. The maximum value of Pa under random mating is 0·25 and occurs with any two alleles each at a frequency of 0·5. The behaviour with partial selfing (where reproduction is by selfing with probability s, and random mating otherwise) is more complex. For n ≤ 3 alleles, the maximum value of Pa occurs with any two alleles each at a frequency of 0·5 if s < 0·25, and with three equally frequent alleles otherwise. Numerically, the maximum value of Pa for n ≥ 4 alleles occurs with n* ≤ n alleles at equal frequencies, where the maximizing number of alleles n* is an increasing function of the selfing rate. Analytically, the maximum occurs with all n alleles present and equally frequent if s ≥ 2/3. In addition, the potential applicability of these formulae for evolutionary studies is briefly discussed.
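The random-mating case can be illustrated numerically. The sketch below is a reconstruction from Mendelian segregation and Hardy-Weinberg proportions, not the paper's own formulae. It assumes the shared parent is drawn from a population in Hardy-Weinberg proportions and that the unshared parent contributes an allele at random according to population frequencies. The function names `p_h` and `p_a` are illustrative.

```python
from itertools import combinations

def p_h(p_i, p_j):
    """P(offspring of an AiAj parent is also AiAj) under random mating.

    The parent transmits Ai with probability 1/2 and the other gamete must
    then be Aj (probability p_j), or the reverse.
    """
    return 0.5 * p_j + 0.5 * p_i

def p_a(freqs):
    """Sum over heterozygous parental genotypes of frequency times p_h.

    Assumes Hardy-Weinberg genotype frequencies 2 * p_i * p_j for the parent.
    """
    return sum(2 * pi * pj * p_h(pi, pj) for pi, pj in combinations(freqs, 2))

two_alleles = p_a([0.5, 0.5])          # two alleles, each at frequency 0.5
three_alleles = p_a([1 / 3] * 3)       # three equally frequent alleles
print(two_alleles, three_alleles)
```

Under these assumptions, two alleles at frequency 0.5 give 0.25, which agrees with the maximum quoted in the abstract, and three equally frequent alleles give a smaller value (2/9).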


Author(s):  
Zachary R. McCaw ◽  
Hanna Julienne ◽  
Hugues Aschard

Abstract Although missing data are prevalent in applications, existing implementations of Gaussian mixture models (GMMs) require complete data. Standard practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. Here we present MGMM, an R package for fitting GMMs in the presence of missing data. Using three case studies on real and simulated data sets, we demonstrate that, when the underlying distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than state-of-the-art imputation followed by standard GMM. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty even when the generative distribution is not a GMM. This assessment may be used to identify unassignable observations. MGMM is available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.


1966 ◽  
Vol 3 (01) ◽  
pp. 94-114 ◽  
Author(s):  
B. E. Ellison

This paper is concerned with the distribution of “types” of individuals in an infinite population after indefinitely many nonoverlapping generations of random mating. The absence of selection and mutation is assumed. The probabilistic law which governs the production of an offspring may be asymmetrical with respect to the “sexes” of the two parents, but the law is assumed to apply independently of the “sex” of the offspring. The question of the existence of a limit distribution of types, the rate at which a limit distribution is approached, and properties of limit distributions are treated.


2020 ◽  
Vol 37 (7) ◽  
pp. 2124-2136
Author(s):  
Paul D Blischak ◽  
Michael S Barker ◽  
Ryan N Gutenkunst

Abstract Demographic inference using the site frequency spectrum (SFS) is a common way to understand historical events affecting genetic variation. However, most methods for estimating demography from the SFS assume random mating within populations, precluding these types of analyses in inbred populations. To address this issue, we developed a model for the expected SFS that includes inbreeding by parameterizing individual genotypes using beta-binomial distributions. We then take the convolution of these genotype probabilities to calculate the expected frequency of biallelic variants in the population. Using simulations, we evaluated the model’s ability to coestimate demography and inbreeding using one- and two-population models across a range of inbreeding levels. We also applied our method to two empirical examples, American pumas (Puma concolor) and domesticated cabbage (Brassica oleracea var. capitata), inferring models both with and without inbreeding to compare parameter estimates and model fit. Our simulations showed that we are able to accurately coestimate demographic parameters and inbreeding even for highly inbred populations (F = 0.9). In contrast, failing to include inbreeding generally resulted in inaccurate parameter estimates in simulated data and led to poor model fit in our empirical analyses. These results show that inbreeding can have a strong effect on demographic inference, a pattern that was especially noticeable for parameters involving changes in population size. Given the importance of these estimates for informing practices in conservation, agriculture, and elsewhere, our method provides an important advancement for accurately estimating the demographic histories of these species.
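The beta-binomial genotype model mentioned in this abstract can be sketched for a single site. The parameterization below (alpha = p(1 - F)/F, beta = (1 - p)(1 - F)/F) is an assumption made for illustration and is not quoted from the abstract. The code compares it with the classical inbreeding genotype frequencies for a diploid and requires 0 < p < 1 and 0 < F < 1.

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shape (alpha, beta)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))

def genotype_probs(p, F, ploidy=2):
    """Probabilities of carrying 0..ploidy copies of an allele at frequency p.

    Assumed parameterization: alpha = p(1-F)/F, beta = (1-p)(1-F)/F.
    """
    alpha = p * (1 - F) / F
    beta = (1 - p) * (1 - F) / F
    return [beta_binomial_pmf(k, ploidy, alpha, beta) for k in range(ploidy + 1)]

def classical_probs(p, F):
    """Classical diploid genotype frequencies with inbreeding coefficient F."""
    q = 1 - p
    return [q * q + F * p * q, 2 * p * q * (1 - F), p * p + F * p * q]

bb = genotype_probs(0.3, 0.5)
classic = classical_probs(0.3, 0.5)
print(bb)
print(classic)
```

For a diploid the two sets of probabilities coincide under this parameterization, so inbreeding shows up as a deficit of heterozygotes relative to Hardy-Weinberg expectations.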


1973 ◽  
Vol 21 (3) ◽  
pp. 247-262 ◽  
Author(s):  
B. S. Weir ◽  
C. Clark Cockerham

SUMMARY An infinite population practising a constant amount of selfing and random mating is studied. The effects of the mating system on two linked loci with an arbitrary number of neutral alleles are determined. Expressions are obtained for the two-locus descent measure, and hence genotypic frequencies and disequilibria functions, in any generation. The nature of the equilibrium population is deduced. The special cases of pure selfing or pure random mating and completely linked or completely unlinked loci are considered separately.


2018 ◽  
Vol 35 (10) ◽  
pp. 1797-1798 ◽  
Author(s):  
Han Cao ◽  
Jiayu Zhou ◽  
Emanuel Schwarz

Abstract Motivation: Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results: We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation: The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (11) ◽  
pp. 3466-3473
Author(s):  
Maya Levy ◽  
Amit Frishberg ◽  
Irit Gat-Viks

Abstract Motivation: Cell-to-cell variation has uncovered associations between cellular phenotypes. However, it remains challenging to address the cellular diversity of such associations. Results: Here, we do not rely on the conventional assumption that the same association holds throughout the entire cell population. Instead, we assume that associations may exist in a certain subset of the cells. We developed CEllular Niche Association (CENA) to reliably predict pairwise associations together with the cell subsets in which the associations are detected. CENA does not rely on predefined subsets but only requires that the cells of each predicted subset share a certain characteristic state. CENA may therefore reveal dynamic modulation of dependencies along cellular trajectories of temporally evolving states. Using simulated data, we show the advantage of CENA over existing methods and its scalability to a large number of cells. Application of CENA to real biological data demonstrates dynamic changes in associations that would be otherwise masked. Availability and implementation: CENA is available as an R package at GitHub: https://github.com/mayalevy/CENA and is accompanied by a complete set of documentation and instructions. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


1996 ◽  
Vol 74 (11) ◽  
pp. 1852-1859 ◽  
Author(s):  
Matthew A. Gitzendanner ◽  
Gayle E. Dupper ◽  
Eleanor E. White ◽  
Brett M. Foord ◽  
Paul D. Hodgskiss ◽  
...  

Lack of genetic markers has hindered the study of the mating system of Cronartium ribicola, an exotic forest pathogen infecting natural and cultivated white pines throughout North America. Isozymes, randomly amplified polymorphic DNA (RAPDs), and restriction fragment length polymorphisms (RFLPs) were used to study the mating system of this rust. Heterozygosity (outcrossing) in diploid telia was demonstrated by analysis of cultures derived from the meiotic products (basidiospores) of individual telia. Families of basidiospores cultured from single telia were used to test for Mendelian segregation and for conformance of loci to Hardy–Weinberg equilibrium. A total of 18 polymorphic loci were identified with the three marker systems. All except for three RAPD loci showed Mendelian segregation in the single-telium families. To quantify the level of outcrossing, gene and genotype frequencies were calculated for families from a single population. Up to 24 families were surveyed with isozymes, 14 with RAPDs, and 18 with RFLPs. Except for one isozyme locus (MPI) in one sample, all 14 loci tested with these families were in Hardy–Weinberg equilibrium, indicating random mating. Further studies, with a different sample from the same population, showed all three isozyme loci to be in Hardy–Weinberg equilibrium. The three marker systems were consistent as to the amount of variation detected. Resistance selection and breeding programs must consider the implications of genetic recombination that outcrossing affords the rust. Keywords: isozymes, RAPDs, RFLPs, Hardy–Weinberg equilibrium, white pine blister rust.
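A conformance check to Hardy–Weinberg equilibrium of the kind mentioned in this abstract is often done with a chi-square goodness-of-fit test. The sketch below is a generic textbook version for one biallelic codominant locus, not the procedure used in the study. The genotype counts are invented for illustration, and the function assumes both alleles are present in the sample.

```python
def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square statistic comparing genotype counts with HW expectations.

    Assumes a biallelic codominant locus with both alleles present, so that
    no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # estimated frequency of allele A
    q = 1 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value of chi-square with 1 degree of freedom at the 0.05 level.
CRITICAL_1DF_05 = 3.841

# Hypothetical counts: one sample in exact HW proportions, one with a
# strong heterozygote deficit.
chi_fit = hardy_weinberg_chi_square(25, 50, 25)
chi_deficit = hardy_weinberg_chi_square(40, 20, 40)
print(chi_fit, chi_fit > CRITICAL_1DF_05)
print(chi_deficit, chi_deficit > CRITICAL_1DF_05)
```

One degree of freedom applies here because there are three genotype classes, one constraint from the total count, and one allele frequency estimated from the data. With small expected counts an exact test would be a safer choice than the chi-square approximation.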


2014 ◽  
Vol 33 (2) ◽  
pp. 27
Author(s):  
Maria Angeles Gallego ◽  
Maria Victoria Ibanez ◽  
Amelia Simó

Many medical and biological problems require extracting information from microscopical images. Boolean models have been extensively used to analyze binary images of random clumps in many scientific fields. In this paper, a particular type of Boolean model with an underlying non-stationary point process is considered. The intensity of the underlying point process is formulated as a fixed function of the distance to a region of interest. A method to estimate the parameters of this Boolean model is introduced, and its performance is checked in two different settings. Firstly, a comparative study with other existing methods is done using simulated data. Secondly, the method is applied to analyze the longleaf data set, which is a very popular data set in the context of point processes included in the R package spatstat. The results show that the new method provides estimates as accurate as those obtained with more complex methods developed for the general case. Finally, to illustrate the application of this model and this method, a particular type of phytopathological image is analyzed. These images show callose depositions in leaves of Arabidopsis plants. The analysis of callose depositions is very popular in the phytopathological literature to quantify activity of plant immunity.


2014 ◽  
Vol 42 (15) ◽  
pp. e121-e121 ◽  
Author(s):  
Hari Krishna Yalamanchili ◽  
Zhaoyuan Li ◽  
Panwen Wang ◽  
Maria P. Wong ◽  
Jianfeng Yao ◽  
...  

Abstract Conventionally, overall gene expressions from microarrays are used to infer gene networks, but it is challenging to account for splicing isoforms. High-throughput RNA Sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expressions is not well explored in inferring gene networks. This study demonstrates SpliceNet, a method to infer isoform-specific co-expression networks from exon-level RNA-Seq data, using large dimensional trace. It goes beyond differentially expressed genes and infers splicing isoform network changes between normal and diseased samples. It eases the sample size bottleneck; evaluations on simulated data and lung cancer-specific ERBB2 and MAPK signaling pathways, with varying numbers of samples, evince the merit in handling datasets with a high exon-to-sample-size ratio. Inferred network rewiring of well established Bcl-x and EGFR centered networks from lung adenocarcinoma expression data is in good agreement with literature. Gene level evaluations demonstrate a substantial performance advantage of SpliceNet over canonical correlation analysis, a method that is currently applied to exon level RNA-Seq data. SpliceNet can also be applied to exon array data. SpliceNet is distributed as an R package available at http://www.jjwanglab.org/SpliceNet.

