Estimating and accounting for genotyping errors in RAD-seq experiments

2019 ◽  
Author(s):  
Luisa Bresadola ◽  
Vivian Link ◽  
C. Alex Buerkle ◽  
Christian Lexer ◽  
Daniel Wegmann

Abstract. In non-model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques because of their low cost and ease of use, and because they do not require genomic resources such as a reference genome. However, evidence is accumulating that such techniques may be affected by specific biases, calling into question the accuracy of the obtained genotypes and, as a consequence, their usefulness in evolutionary studies. Here we introduce three strategies to estimate genotyping error rates from such data: through comparison to high-quality genotypes obtained with a different technique, from individual replicates, or from a population sample when assuming Hardy-Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD-seq), arguably the most popular reduced representation sequencing technique, revealed per-allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome-wide and local ancestry of well-characterized hybrids of two Eurasian poplar (Populus) species, such high error rates may lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD-seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids. Based on these findings, we strongly recommend carefully assessing genotyping error rates in reduced representation sequencing experiments and properly accounting for them in downstream analyses, for instance using the tools presented here.

2015 ◽  
Author(s):  
Thomas F Cooke ◽  
Muh-Ching Yee ◽  
Marina Muzzio ◽  
Alexandra Sockell ◽  
Ryan Bell ◽  
...  

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.


2020 ◽  
Author(s):  
Brendan N. Reid ◽  
Rachel L. Moran ◽  
Christopher J. Kopack ◽  
Sarah W. Fitzpatrick

Abstract. Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.


2020 ◽  
Author(s):  
Brandon T. Sinn ◽  
Sandra J. Simon ◽  
Mathilda V. Santee ◽  
Stephen P. DiFazio ◽  
Nicole M. Fama ◽  
...  

Abstract. The capability to generate densely sampled single nucleotide polymorphism (SNP) data is essential in diverse subdisciplines of biology, including crop breeding, pathology, forensics, forestry, ecology, evolution, and conservation. However, access to the expensive equipment and bioinformatics infrastructure required for genome-scale sequencing is still a limiting factor in the developing world and for institutions with limited resources. Here we present ISSRseq, a PCR-based method for reduced representation of genomic variation using simple sequence repeats as priming sites to sequence inter-simple sequence repeat (ISSR) regions. Briefly, ISSR regions are amplified with single primers, pooled, and used to construct sequencing libraries with a low-cost, efficient commercial kit, and sequenced on the Illumina platform. We also present a flexible bioinformatic pipeline that assembles ISSR loci, calls and hard filters variants, outputs data matrices in common formats, and conducts population analyses using R. Using three angiosperm species as case studies, we demonstrate that ISSRseq is highly repeatable, necessitates only simple wet-lab skills and commonplace instrumentation, is flexible in terms of the number of single primers used, is low-cost, and can generate genomic-scale variant discovery on par with existing RRS methods that require high sample integrity and concentration. ISSRseq represents a straightforward approach to SNP genotyping in any organism, and we predict that this method will be particularly useful for those studying population genomics and phylogeography of non-model organisms. Furthermore, the ease of ISSRseq relative to other RRS methods should prove useful for those conducting research in undergraduate and graduate environments, and more broadly by those lacking access to expensive instrumentation or expertise in bioinformatics.


2019 ◽  
Author(s):  
Melanie K. Hess ◽  
Suzanne J. Rowe ◽  
Tracey C. Van Stijn ◽  
Hannah M. Henry ◽  
Sharon M. Hickey ◽  
...  

Abstract. Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock; however, these profiles can be difficult and expensive to obtain for thousands of samples. The objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, and two bioinformatic pipelines (reference-based and reference-free) were compared to 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n=236). DNA was extracted from freeze-dried rumen samples using a phenol chloroform and bead-beating protocol prior to sequencing. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of host genetics, laboratory and analytical method, and the genetic and phenotypic correlations with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach via a correspondence analysis, with estimates for repeatability of 0.62±0.06, heritability 0.31±0.29, and genetic and phenotypic correlation with methane emissions of 0.88±0.25 and 0.64±0.05 respectively for the first component of correspondence analysis. The reference-free approach assigned 62.0±5.7% of reads to common 65 bp tags, much higher than the reference-based approach of 6.8±1.8% of reads assigned. Sensitivity studies suggested approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500; therefore, the current work of 118 samples/lane and future proposed 384 samples/lane are well within that threshold. Our approach is now being used to investigate host factors affecting the rumen and its association with a variety of production and environmental traits. With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.


2021 ◽  
pp. gr.275579.121
Author(s):  
Daniel P Cooke ◽  
David C Wedge ◽  
Gerton Lunter

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method - Octopus - that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.


2021 ◽  
Author(s):  
Henrik Christiansen ◽  
Franz M. Heindler ◽  
Bart Hellemans ◽  
Quentin Jossart ◽  
Francesca Pasotti ◽  
...  

Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS. In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Henrik Christiansen ◽  
Franz M. Heindler ◽  
Bart Hellemans ◽  
Quentin Jossart ◽  
Francesca Pasotti ◽  
...  

Abstract. Background: Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS. Results: In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Conclusions: Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.


PLoS ONE ◽  
2020 ◽  
Vol 15 (4) ◽  
pp. e0219882 ◽  
Author(s):  
Melanie K. Hess ◽  
Suzanne J. Rowe ◽  
Tracey C. Van Stijn ◽  
Hannah M. Henry ◽  
Sharon M. Hickey ◽  
...  

2020 ◽  
Vol 34 (03) ◽  
pp. 145-151
Author(s):  
Shimpei Ono ◽  
Hiroyuki Ohi ◽  
Rei Ogawa

Abstract. Since propeller flaps are elevated as island flaps and most often nourished by a single perforator near the defect, it is challenging to change the flap design intraoperatively when a reliable perforator cannot be found where it is expected. Thus, accurate preoperative mapping of perforators is essential in the safe planning of propeller flaps. Various methods have been reported so far: (1) handheld acoustic Doppler sonography (ADS), (2) color duplex sonography (CDS), (3) perforator computed tomographic angiography (P-CTA), and (4) magnetic resonance angiography (MRA). To facilitate the preoperative perforator assessment, P-CTA is currently considered the gold standard imaging tool for revealing the three-dimensional anatomical details of perforators precisely. Nevertheless, ADS remains the most widely used tool due to its low cost, faster learning, and ease of use despite an undesirable number of false-positive results. CDS can provide hemodynamic characteristics of the perforator and is a valid and safer alternative, particularly in patients in whom ionizing radiation and/or contrast exposure should be limited. Although MRA is currently less accurate in detecting smaller perforators of caliber less than 1.0 mm and the intramuscular course of perforators, MRA is expected to improve in the future due to recent developments in technology, making it as accurate as P-CTA. Moreover, it provides the advantage of being radiation-free with fewer contrast reactions.

