Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A.

Mapping Intimacies ◽

10.1101/622373 ◽

2019 ◽

Cited By ~ 2

Author(s):

Thierry De Meeûs ◽

Cynthia T. Chan ◽

John M. Ludwig ◽

Jean I. Tsao ◽

Jaymin Patel ◽

...

Keyword(s):

Ixodes Scapularis ◽

Null Alleles ◽

Population Subdivision ◽

Microsatellite Data ◽

Significant Excess ◽

Data Set ◽

Short Allele ◽

False Discovery ◽

Subdivided Populations ◽

Reasonable Proportion

Abstract Null alleles, short allele dominance (SAD), and stuttering increase the perceived relative inbreeding of individuals and subpopulations as measured by Wright’s FIS and FST. Ascertainment bias due to such amplification problems is usually caused by inaccurate primer design (if primers were developed from a different species or a distant population), poor DNA quality, low DNA concentration, or a combination of some or all of these sources of inaccuracy. When combined, these issues can increase the correlation between polymorphisms at the affected loci and, consequently, the linkage disequilibrium (LD) between them. In this note, we studied an original microsatellite data set generated by analyzing nine loci in Ixodes scapularis ticks from the eastern U.S.A. To detect null alleles and SAD, we used correlation methods and variation measures. To detect stuttering, we evaluated the heterozygote deficit between alleles displaying a single repeat difference. We demonstrated that a substantial proportion of loci affected by amplification problems (one with null alleles, two with SAD and three with stuttering) led to highly significant heterozygote deficits (FIS=0.1, p-value<0.0001). This occurred together with a substantial proportion (22%) of pairs of loci in significant LD, two of which were still significant after a false discovery rate (FDR) correction, and some variation in the measurement of population subdivision across loci (Wright’s FST). This suggested a strong Wahlund effect and/or selection at several loci. By finding small peaks corresponding to previously disregarded larger alleles in some homozygous profiles for loci with SAD and by pooling alleles close in size for loci with stuttering, we generated an amended dataset. Except for one locus with null alleles and another still displaying a modest SAD, the analyses of the corrected dataset revealed a significant excess of heterozygotes (FIS=-0.07), as expected in dioecious and strongly subdivided populations, with a more reasonable proportion (19%) of pairs of loci characterized by significant LD, none of which stayed significant after the FDR procedure. Strong subdivision was also confirmed by the standardized FST’ corrected for null alleles (FST’=0.19) and small effective subpopulation sizes (Ne=7).
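To make the effect of short allele dominance on FIS concrete, here is a minimal sketch. It assumes the simple textbook form FIS = 1 - Ho/He for one locus in one subsample, which is not necessarily the estimator used by the authors. The function name `f_is` and the genotype lists are hypothetical and serve only as illustration.

```python
from collections import Counter

def f_is(genotypes):
    """Textbook single-locus F_IS = 1 - Ho/He (no small-sample correction).

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n = len(genotypes)
    # Allele counts over the 2n gene copies in the sample.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    # Expected heterozygosity under random mating within the subsample.
    h_exp = 1 - sum((c / (2 * n)) ** 2 for c in counts.values())
    # Observed proportion of heterozygous individuals.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    return 1 - h_obs / h_exp

# Hypothetical data: alleles are fragment sizes in base pairs.
true_genotypes = [(100, 104)] * 6 + [(100, 100)] * 2 + [(104, 104)] * 2
# Assumed SAD artefact: three heterozygotes are scored as short-allele homozygotes.
scored_genotypes = [(100, 104)] * 3 + [(100, 100)] * 5 + [(104, 104)] * 2

print(f_is(true_genotypes), f_is(scored_genotypes))
```

In this toy case the true genotypes show a heterozygote excess, while the mis-scored genotypes show an apparent heterozygote deficit, which is the direction of bias the abstract describes.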

Population Subdivision and Molecular Sequence Variation: Theory and Analysis of Drosophila ananassae Data

Genetics ◽

10.1093/genetics/165.3.1385 ◽

2003 ◽

Vol 165 (3) ◽

pp. 1385-1395

Author(s):

Claus Vogl ◽

Aparup Das ◽

Mark Beaumont ◽

Sujata Mohanty ◽

Wolfgang Stephan

Keyword(s):

Sequence Data ◽

Isolation By Distance ◽

Allele Frequencies ◽

Drosophila Ananassae ◽

Population Subdivision ◽

Variation Theory ◽

Peripheral Populations ◽

Molecular Variation ◽

Data Set ◽

Evolutionary Forces

Abstract Population subdivision complicates analysis of molecular variation. Even if neutrality is assumed, three evolutionary forces need to be considered: migration, mutation, and drift. Simplification can be achieved by assuming that the process of migration among and drift within subpopulations is occurring fast compared to mutation and drift in the entire population. This allows a two-step approach in the analysis: (i) analysis of population subdivision and (ii) analysis of molecular variation in the migrant pool. We model population subdivision using an infinite island model, where we allow the migration/drift parameter Θ to vary among populations. Thus, central and peripheral populations can be differentiated. For inference of Θ, we use a coalescence approach, implemented via a Markov chain Monte Carlo (MCMC) integration method that allows estimation of allele frequencies in the migrant pool. The second step of this approach (analysis of molecular variation in the migrant pool) uses the estimated allele frequencies in the migrant pool for the study of molecular variation. We apply this method to a Drosophila ananassae sequence data set. We find little indication of isolation by distance, but large differences in the migration parameter among populations. The population as a whole seems to be expanding. A population from Bogor (Java, Indonesia) shows the highest variation and seems closest to the species center.

Detecting Wahlund effects together with amplification problems: Cryptic species, null alleles and short allele dominance in Glossina pallidipes populations from Tanzania

Molecular Ecology Resources ◽

10.1111/1755-0998.12989 ◽

2019 ◽

Vol 19 (3) ◽

pp. 757-772 ◽

Cited By ~ 9

Author(s):

Oliver Manangwa ◽

Thierry De Meeûs ◽

Pascal Grébaut ◽

Adeline Ségard ◽

Mechtilda Byamungu ◽

...

Keyword(s):

Cryptic Species ◽

Null Alleles ◽

Short Allele

Weighted mining of massive collections of P-values by convex optimization

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iax013 ◽

2017 ◽

Vol 7 (2) ◽

pp. 251-275

Author(s):

Edgar Dobriban

Keyword(s):

Convex Optimization ◽

Multiple Testing ◽

Observational Cosmology ◽

Data Sets ◽

Data Set ◽

P Values ◽

False Discovery ◽

Massive Data Set ◽

Optimal Weighting ◽

Weighting Problem

Abstract Researchers in data-rich disciplines—think of computational genomics and observational cosmology—often wish to mine large bodies of $P$-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the $P$-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous ‘standard’ methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
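The idea of prioritizing hypotheses by upweighting can be illustrated with a generic weighted Benjamini-Hochberg procedure. This is a minimal sketch of that standard technique, not the Princessp method itself, whose weights come from a constrained convex optimization. The function name `weighted_bh` and the example values are hypothetical, and the weights are assumed to be strictly positive.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg: return sorted indices of rejected hypotheses.

    Weights are rescaled to average 1, each p-value is divided by its weight,
    and the usual BH step-up rule is applied to the weighted p-values.
    """
    m = len(pvalues)
    mean_weight = sum(weights) / m
    adjusted = [p / (w / mean_weight) for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    # Largest rank whose weighted p-value falls under the BH line alpha * rank / m.
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if adjusted[index] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical p-values; the third hypothesis is upweighted in the second call.
pvals = [0.001, 0.04, 0.03, 0.8]
print(weighted_bh(pvals, [1, 1, 1, 1]))
print(weighted_bh(pvals, [1, 1, 3, 1]))
```

With equal weights the sketch reduces to the ordinary Benjamini-Hochberg procedure; upweighting a hypothesis makes it easier to reject at the cost of the others.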

“Riverscape” genetics: river characteristics influence the genetic structure and diversity of anadromous and freshwater Atlantic salmon (Salmo salar) populations in northwest Russia

Canadian Journal of Fisheries and Aquatic Sciences ◽

10.1139/f2012-114 ◽

2012 ◽

Vol 69 (12) ◽

pp. 1947-1958 ◽

Cited By ~ 32

Author(s):

Mikhail Yu. Ozerov ◽

Alexey E. Veselov ◽

Jaakko Lumme ◽

Craig R. Primmer

Keyword(s):

Genetic Diversity ◽

Atlantic Salmon ◽

Salmo Salar ◽

Carrying Capacity ◽

Salmonid Fishes ◽

Microsatellite Data ◽

Data Set ◽

Atlantic Salmon Salmo Salar ◽

Genetic Diversity And Structure ◽

Genetic Structure And Diversity

Combining population genetic and landscape ecology approaches provides an understanding of how environmental factors affect individual dispersal, population size, and structure. We first generated a set of predictions of the expected effect of “riverscape” characteristics on salmonid genetic diversity and divergence, based on the results of earlier research on this topic in salmonid fishes. We then tested these predictions in a data set consisting of the microsatellite data and riverscape characteristics of 39 Atlantic salmon (Salmo salar) populations from northwest Russia. The carrying capacity of the river was an important factor shaping the genetic diversity and differentiation of Atlantic salmon populations in the region: salmon in rivers with a larger carrying capacity tended to have higher genetic diversity and lower genetic differentiation. The importance of other riverscape characteristics often varied between anadromous and freshwater populations. Taken together, these associations demonstrate a high and complex level of river landscape influence on the genetic diversity and structure of Atlantic salmon populations and highlight the importance of spawning and nursery area maintenance for the conservation of salmonids.

Breakdown of gametophytic self-incompatibility in subdivided populations

10.1101/444125 ◽

2018 ◽

Author(s):

Thomas Brom ◽

Vincent Castric ◽

Sylvain Billiard

Keyword(s):

Inbreeding Depression ◽

Natural Populations ◽

Simulation Models ◽

Population Subdivision ◽

Self Incompatibility ◽

S Locus ◽

Isolated Populations ◽

Data Accessibility ◽

Local Diversity ◽

Subdivided Populations

Abstract Many hermaphroditic flowering plant species possess a genetic self-incompatibility (SI) system that prevents self-fertilization and is typically controlled by a single multiallelic locus, the S-locus. The conditions under which SI can be stably maintained in single isolated populations are well known and depend chiefly on the level of inbreeding depression and the number of SI alleles segregating at the S-locus. However, while both the number of SI alleles and the level of inbreeding depression are potentially affected by population subdivision, the conditions for the maintenance of SI in subdivided populations remain to be studied. In this paper, we combine analytical predictions and two different individual-based simulation models to show that population subdivision can severely compromise the maintenance of SI. Under the conditions we explored, this effect is mainly driven by the decrease of the local diversity of SI alleles rather than by a change in the dynamics of inbreeding depression. We discuss the implications of our results for the interpretation of empirical data on the loss of SI in natural populations. Data accessibility statement: No data to be archived.

Behind Taxonomic Variability: The Functional Redundancy in the Tick Microbiome

Microorganisms ◽

10.3390/microorganisms8111829 ◽

2020 ◽

Vol 8 (11) ◽

pp. 1829

Author(s):

Agustín Estrada-Peña ◽

Alejandro Cabezas-Cruz ◽

Dasiel Obregón

Keyword(s):

Metabolic Pathways ◽

Developmental Stages ◽

Sequence Data ◽

Ixodes Scapularis ◽

Functional Redundancy ◽

Taxonomic Composition ◽

Functional Similarity ◽

Data Set ◽

Taxonomic Profiling ◽

Tick Microbiome

The taxonomic composition and diversity of tick midgut microbiota have been extensively studied in different species of the genera Rhipicephalus, Ixodes, Amblyomma, Haemaphysalis, Hyalomma, Dermacentor, Argas and Ornithodoros, while the functional significance of bacterial diversity has been proportionally less explored. In this study, we used previously published 16S amplicon sequence data sets from three Ixodes scapularis cohorts, two of uninfected nymphs, and one of larvae experimentally infected with Borrelia burgdorferi, to test the functional redundancy of the tick microbiome. We predicted the metabolic profiling of each sample using the state-of-the-art metagenomics tool PICRUSt2. The results showed that the microbiomes of all I. scapularis samples share only 80 taxa (24.6%, total 324), while out of the 342 metabolic pathways predicted, 82.7% were shared by all the ticks. Borrelia-infected larvae lack 15.4% of pathways found in the microbiome of uninfected nymphs. Taxa contribution analysis showed that the functional microbiome of uninfected ticks was highly redundant, with, in some cases, up to 198 bacterial taxa contributing to a single pathway. However, Borrelia-infected larvae had a lower redundancy, with 6.7% of pathways provided by more than 100 genera, while 15.7–19.2% of pathways were provided by more than 100 genera in the two cohorts of uninfected ticks. In addition, we compared the functional profiles of three microbial communities from each data set, identified through a network-based approach, and we observed functional similarity between them. Based on the functional redundancy and functional similarity of the microbiome of ticks in different developmental stages and infection status, we concluded that the tick gut microbiota is a self-regulating community of very diverse bacteria contributing to a defined set of metabolic pathways and functions with yet unexplored relevance for tick fitness and/or bacterial community stability. We propose a change of focus in which the tick microbiome must be analyzed in all dimensions, highlighting their functional traits, instead of the conventional taxonomic profiling.

Measuring and Testing Genetic Differentiation With Ordered Versus Unordered Alleles

Genetics ◽

10.1093/genetics/144.3.1237 ◽

1996 ◽

Vol 144 (3) ◽

pp. 1237-1245 ◽

Cited By ~ 38

Author(s):

O Pons ◽

R J Petit

Keyword(s):

Dna Sequences ◽

Nuclear Dna ◽

Mutation Rates ◽

Phylogeographic Structure ◽

Published Data ◽

Data Set ◽

Subdivided Populations ◽

Two Measures ◽

The Difference ◽

Single Data

Abstract Estimates and variances of diversity and differentiation measures in subdivided populations are proposed that can be applied to haplotypes (ordered alleles such as DNA sequences, which may contain a record of their own histories). Hence, two measures of differentiation can be compared for a single data set: one (GST) that makes use only of the allelic frequencies and the other (NST) for which similarities between the haplotypes are taken into account in addition. Tests are proposed to compare NST and GST with zero and with each other. The difference between NST and GST can be caused by several factors, including sampling artefacts, unequal effect of mutation rates and phylogeographic structure. The method presented is applied to a published data set where a nuclear DNA sequence had been determined from individuals of a grasshopper distributed in 24 regions of Europe. Additional insights into the genetic subdivision of these populations are obtained by progressively combining related haplotypes and reanalyzing the data each time.
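The contrast between the two measures can be sketched from population haplotype frequencies. The code below assumes the usual form (total diversity minus mean within-population diversity) divided by total diversity, with diversity computed as a distance-weighted sum over pairs of haplotypes. It ignores the sampling corrections and variance estimates that the paper proposes, and the function name, frequencies and distances are hypothetical.

```python
def differentiation(freqs, dist=None):
    """Differentiation among populations from haplotype frequencies.

    freqs: one list of haplotype frequencies per population (each sums to 1).
    dist:  symmetric matrix of distances between haplotypes. With dist=None,
           every pair of distinct haplotypes is at distance 1 (unordered
           alleles, a GST-like measure); otherwise similarities between
           haplotypes are taken into account (an NST-like measure).
    """
    k = len(freqs[0])
    if dist is None:
        dist = [[0.0 if i == j else 1.0 for j in range(k)] for i in range(k)]

    def diversity(x):
        # Expected distance between two haplotypes drawn at random.
        return sum(dist[i][j] * x[i] * x[j] for i in range(k) for j in range(k))

    mean_freq = [sum(pop[i] for pop in freqs) / len(freqs) for i in range(k)]
    within = sum(diversity(pop) for pop in freqs) / len(freqs)
    total = diversity(mean_freq)
    return (total - within) / total

# Hypothetical example: haplotypes 0 and 1 are closely related, 2 is distant.
freqs = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
dist = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
gst_like = differentiation(freqs)
nst_like = differentiation(freqs, dist)
print(gst_like, nst_like)
```

Here the related haplotypes occur together in the same population, so the distance-aware measure exceeds the frequency-only one, the pattern the abstract associates with phylogeographic structure.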

Detection of Lane-Change Events in Naturalistic Driving Videos

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418500301 ◽

2018 ◽

Vol 32 (10) ◽

pp. 1850030 ◽

Cited By ~ 3

Author(s):

Shuhang Wang ◽

Brian R. Ott ◽

Gang Luo

Keyword(s):

Data Reduction ◽

Reduction Rate ◽

Lane Change ◽

Data Set ◽

Lane Changes ◽

Automated Method ◽

Temporal Domain ◽

False Discovery ◽

Change Events ◽

Naturalistic Driving

Lane changes are important behaviors to study in driving research. Automated detection of lane-change events is required to address the need for data reduction of a vast amount of naturalistic driving videos. This paper presents a method to deal with weak lane-marker patterns as small as a couple of pixels wide. The proposed method is novel in its approach to detecting lane-change events by accumulating lane-marker candidates over time. Since the proposed method tracks lane markers in the temporal domain, it is robust to low resolution and many different kinds of interference. The proposed technique was tested using 490 h of naturalistic driving videos collected from 63 drivers. The lane-change events in a 10-h video set were first manually coded and compared with the outcome of the automated method. The method’s sensitivity was 94.8% and the data reduction rate was 93.6%. The automated procedure was further evaluated using the remaining 480-h driving videos. The data reduction rate was 97.4%. All 4971 detected events were manually reviewed and classified as either true or false lane-change events. Bootstrapping showed that the false discovery rate from the larger data set was not significantly different from that of the 10-h manually coded data set. This study demonstrated that the temporal processing of lane markers is an efficient strategy for detecting lane-change events involving weak lane-marker patterns in naturalistic driving.

MIXTURE MODELS FOR DETECTING DIFFERENTIALLY EXPRESSED GENES IN MICROARRAYS

International Journal of Neural Systems ◽

10.1142/s0129065706000755 ◽

2006 ◽

Vol 16 (05) ◽

pp. 353-362 ◽

Cited By ~ 2

Author(s):

LIAT BEN-TOVIM JONES ◽

RICHARD BEAN ◽

GEOFFREY J. MCLACHLAN ◽

JUSTIN XI ZHU

Keyword(s):

Mixture Models ◽

False Negative ◽

False Negative Rate ◽

Multiple Hypothesis Testing ◽

Differentially Expressed ◽

Data Set ◽

False Discovery ◽

Mixture Model Approach ◽

Number Of Classes ◽

Selection Of

An important and common problem in microarray experiments is the detection of genes that are differentially expressed in a given number of classes. As this problem concerns the selection of significant genes from a large pool of candidate genes, it needs to be carried out within the framework of multiple hypothesis testing. In this paper, we focus on the use of mixture models to handle the multiplicity issue. With this approach, a measure of the local FDR (false discovery rate) is provided for each gene. An attractive feature of the mixture model approach is that it provides a framework for the estimation of the prior probability that a gene is not differentially expressed, and this probability can subsequently be used in forming a decision rule. The rule can also be formed to take the false negative rate into account. We apply this approach to a well-known publicly available data set on breast cancer, and discuss our findings with reference to other approaches.
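The per-gene local FDR described here can be sketched for a two-component mixture. The example assumes a standard normal density for null genes and a single normal component for differentially expressed genes, with all parameters given rather than fitted; the paper's actual model and fitting procedure may differ. The names `local_fdr`, `pi0`, `mu1` and `sigma1` are hypothetical.

```python
import math

def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def local_fdr(z, pi0, mu1, sigma1):
    """Posterior probability that a gene with score z is NOT differentially expressed.

    pi0 is the prior probability of the null; null scores are assumed N(0, 1)
    and non-null scores N(mu1, sigma1^2).
    """
    null_part = pi0 * normal_pdf(z)
    alt_part = (1 - pi0) * normal_pdf(z, mu1, sigma1)
    return null_part / (null_part + alt_part)

# Assumed parameters: 90% null genes, non-null scores centred at 3.
for score in (0.0, 2.0, 4.0):
    print(score, local_fdr(score, pi0=0.9, mu1=3.0, sigma1=1.0))
```

A decision rule then follows by calling a gene differentially expressed when its local FDR falls below a chosen threshold.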

A sampling theory of selectively neutral alleles in a subdivided population.

Genetics ◽

10.1093/genetics/119.3.721 ◽

1988 ◽

Vol 119 (3) ◽

pp. 721-729

Author(s):

E R Tillier ◽

G B Golding

Keyword(s):

Population Structure ◽

Population Subdivision ◽

Continuous Approximation ◽

Sample Distribution ◽

Migration Rates ◽

Practical Possibility ◽

Structured Population ◽

Subdivided Population ◽

Subdivided Populations ◽

Actual Size

Abstract Ewens' sampling distribution is investigated for a structured population. Samples are assumed to be taken from a single subpopulation that exchanges migrants with other subpopulations. A complete description of the probability distribution for such samples is not a practical possibility but an equilibrium approximation can be found. This approximation extracts the information necessary for constructing a continuous approximation to the complete distribution using known values of the distribution and its derivatives in randomly mating populations. It is shown that this approximation is as complete a description of a single biologically realistic subpopulation as is possible given standard uncertainties about the actual size of the migration rates, relative sizes of each of the subpopulations and other factors that might affect the genetic structure of a subpopulation. Any further information must be gained at the expense of generality. This approximation is used to investigate the effect of population subdivision on Watterson's test of neutrality. It is known that the infinite allele, sample distribution is independent of mutation rate when made conditional on the number of alleles in the sample. It is shown that the conditional, infinite allele, sample distribution from this approximation is also independent of population structure and hence Watterson's test is still approximately valid for subdivided populations.
