dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

PeerJ ◽  
2014 ◽  
Vol 2 ◽  
pp. e431 ◽  
Author(s):  
Jonathan B. Puritz ◽  
Christopher M. Hollenbeck ◽  
John R. Gold

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).
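The two comparison criteria named in the abstract, how many individuals share a SNP and how deeply it is covered, can be illustrated with a minimal sketch. This is not dDocent's or Stacks' code: the function names, the depth-table layout, and the thresholds (`min_call_rate`, `min_mean_cov`) are hypothetical and chosen only for illustration.

```python
from statistics import mean


def summarize_snp(depths, min_depth=1):
    """Return (call_rate, mean_coverage) for one SNP.

    depths: per-individual read depths at the SNP; a depth below
    min_depth is treated as "not genotyped" (an assumption of this sketch).
    """
    called = [d for d in depths if d >= min_depth]
    call_rate = len(called) / len(depths) if depths else 0.0
    mean_cov = mean(called) if called else 0.0
    return call_rate, mean_cov


def filter_snps(depth_table, min_call_rate=0.9, min_mean_cov=10.0):
    """Keep SNP ids genotyped in enough individuals at sufficient coverage.

    depth_table: dict mapping a SNP id to a list of per-individual depths.
    Both thresholds are illustrative defaults, not values from the paper.
    """
    kept = []
    for snp_id, depths in depth_table.items():
        call_rate, mean_cov = summarize_snp(depths)
        if call_rate >= min_call_rate and mean_cov >= min_mean_cov:
            kept.append(snp_id)
    return kept


if __name__ == "__main__":
    # Made-up depths for three SNPs across four individuals.
    table = {
        "locus1_42": [12, 15, 20, 11],  # shared by all, well covered
        "locus2_7": [0, 0, 30, 25],     # missing in half the individuals
        "locus3_13": [3, 2, 4, 5],      # shared by all, low coverage
    }
    print(filter_snps(table))
```

A filter of this kind is the usual way "SNPs shared across greater numbers of individuals" is quantified; the thresholds a real study uses depend on its sampling design and sequencing depth.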


2015 ◽  
Author(s):  
Thomas F Cooke ◽  
Muh-Ching Yee ◽  
Marina Muzzio ◽  
Alexandra Sockell ◽  
Ryan Bell ◽  
...  

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
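For context, the Hardy-Weinberg equilibrium filter that the abstract uses as a point of comparison can be sketched as a standard chi-square goodness-of-fit test on genotype counts. This is the generic textbook test, not the GBStools method, and the function names are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at a
    biallelic site.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value from the chi-square distribution with 1 degree of freedom.

    For 1 df the survival function is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(hwe_chi_square(n_aa, n_ab, n_bb) / 2.0))


if __name__ == "__main__":
    # Genotype counts in exact Hardy-Weinberg proportions.
    print(hwe_p_value(25, 50, 25))
    # Complete absence of heterozygotes, as allelic dropout might produce.
    print(hwe_p_value(50, 0, 50))
```

A site would typically be discarded when its p-value falls below a chosen cutoff. The chi-square approximation is unreliable for small samples or rare alleles, where an exact test is usually preferred.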


2017 ◽  
Author(s):  
Audrey Rohfritsch ◽  
Maxime Galan ◽  
Mathieu Gautier ◽  
Karim Gharbi ◽  
Gert Olsson ◽  
...  

Infectious pathogens are major selective forces acting on individuals. The recent advent of high-throughput sequencing technologies now makes it possible to investigate the genetic bases of resistance and susceptibility to infections in non-model organisms. From an evolutionary perspective, the analysis of the genetic diversity observed at these genes in natural populations provides insight into the mechanisms maintaining polymorphism and their epidemiological consequences. We explored these questions in the context of the interactions between Puumala hantavirus (PUUV) and its reservoir host, the bank vole Myodes glareolus. Despite the continuous spatial distribution of M. glareolus in Europe, PUUV distribution is strongly heterogeneous. Different defence strategies might have evolved in bank voles as a result of co-adaptation with PUUV, which may in turn reinforce spatial heterogeneity in PUUV distribution. We performed a genome scan study of six bank vole populations sampled along a North/South transect in Sweden, including PUUV endemic and non-endemic areas. We combined candidate gene analyses (Tlr4, Tlr7, Mx2 genes) and high-throughput sequencing of RAD (Restriction-site Associated DNA) markers. We found evidence for outlier loci showing high levels of genetic differentiation. Ten outliers among the 52 that matched mouse protein-coding genes corresponded to immune-related genes and were detected using ecological associations with variations in PUUV prevalence. One third of the enriched pathways concerned immune processes, including platelet activation and the TLR pathway. In the future, functional experiments should make it possible to confirm the role of these immune-related genes in the interactions between M. glareolus and PUUV.


2016 ◽  
Vol 82 (10) ◽  
pp. 3070-3081 ◽  
Author(s):  
Changyi Zhang ◽  
Qunxin She ◽  
Hongkai Bi ◽  
Rachel J. Whitaker

Sulfolobus islandicus serves as a model for studying archaeal biology as well as linking novel biology to evolutionary ecology using functional population genomics. In the present study, we developed a new counterselectable genetic marker in S. islandicus to expand the genetic toolbox for this species. We show that resistance to the purine analog 6-methylpurine (6-MP) in S. islandicus M.16.4 is due to the inactivation of a putative adenine phosphoribosyltransferase encoded by M164_0158 (apt). The application of the apt gene as a novel counterselectable marker was first illustrated by constructing an unmarked α-amylase deletion mutant. Furthermore, the 6-MP counterselection feature was employed in a forward (loss-of-function) mutation assay to reveal the profile of spontaneous mutations in S. islandicus M.16.4 at the apt locus. Moreover, the general conservation of apt genes in the crenarchaea suggests that the same strategy can be broadly applied to other crenarchaeal model organisms. These results demonstrate that the apt locus represents a new tool for genetic manipulation and sequence analysis of the hyperthermophilic crenarchaeon S. islandicus.

IMPORTANCE: Currently, the pyrEF/5-fluoroorotic acid (5-FOA) counterselection system remains the sole counterselection marker in crenarchaeal genetics. Since most Sulfolobus mutants constructed by the research community were derived from genetic hosts lacking the pyrEF genes, the pyrEF/5-FOA system is no longer available for use in forward mutation assays. Demonstration of the apt/6-MP counterselection system for the Sulfolobus model renders it possible to again study the mutation profiles in mutants that have already been constructed by the use of strains with a pyrEF-deficient background. Furthermore, additional counterselectable markers will allow us to conduct more sophisticated genetic studies, i.e., investigate mechanisms of chromosomal DNA transfer and quantify recombination frequencies among S. islandicus strains.


2020 ◽  
Author(s):  
Brandon T. Sinn ◽  
Sandra J. Simon ◽  
Mathilda V. Santee ◽  
Stephen P. DiFazio ◽  
Nicole M. Fama ◽  
...  

The capability to generate densely sampled single nucleotide polymorphism (SNP) data is essential in diverse subdisciplines of biology, including crop breeding, pathology, forensics, forestry, ecology, evolution, and conservation. However, access to the expensive equipment and bioinformatics infrastructure required for genome-scale sequencing is still a limiting factor in the developing world and for institutions with limited resources. Here we present ISSRseq, a PCR-based method for reduced representation of genomic variation using simple sequence repeats as priming sites to sequence inter-simple sequence repeat (ISSR) regions. Briefly, ISSR regions are amplified with single primers, pooled, and used to construct sequencing libraries with a low-cost, efficient commercial kit, and sequenced on the Illumina platform. We also present a flexible bioinformatic pipeline that assembles ISSR loci, calls and hard filters variants, outputs data matrices in common formats, and conducts population analyses using R. Using three angiosperm species as case studies, we demonstrate that ISSRseq is highly repeatable, necessitates only simple wet-lab skills and commonplace instrumentation, is flexible in terms of the number of single primers used, is low-cost, and can generate genomic-scale variant discovery on par with existing RRS methods that require high sample integrity and concentration. ISSRseq represents a straightforward approach to SNP genotyping in any organism, and we predict that this method will be particularly useful for those studying population genomics and phylogeography of non-model organisms. Furthermore, the ease of ISSRseq relative to other RRS methods should prove useful for those conducting research in undergraduate and graduate environments, and more broadly by those lacking access to expensive instrumentation or expertise in bioinformatics.

