Estimating and accounting for genotyping errors in RAD-seq experiments

Mapping Intimacies ◽

10.1101/587428 ◽

2019 ◽

Cited By ~ 2

Author(s):

Luisa Bresadola ◽

Vivian Link ◽

C. Alex Buerkle ◽

Christian Lexer ◽

Daniel Wegmann

Keyword(s):

Low Cost ◽

Population Sample ◽

Error Rates ◽

Ease Of Use ◽

Model Organisms ◽

Sequencing Error ◽

Genotyping Error ◽

Genotyping Errors ◽

Reduced Representation ◽

Reduced Representation Sequencing

Abstract: In non-model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques because of their low cost and ease of use, and because they do not require genomic resources such as a reference genome. However, evidence is accumulating that such techniques may be affected by specific biases, calling into question the accuracy of the obtained genotypes and, as a consequence, their usefulness in evolutionary studies. Here we introduce three strategies to estimate genotyping error rates from such data: through comparison to high-quality genotypes obtained with a different technique, from individual replicates, or from a population sample when assuming Hardy-Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD-seq), arguably the most popular reduced representation sequencing technique, revealed per-allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome-wide and local ancestry of well-characterized hybrids of two Eurasian poplar (Populus) species, such high error rates may lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD-seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids. Based on these findings, we strongly recommend carefully assessing genotyping error rates in reduced representation sequencing experiments and properly accounting for them in downstream analyses, for instance using the tools presented here.
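As a rough illustration of the third strategy (a population sample under Hardy-Weinberg equilibrium), the sketch below shows how miscalling heterozygotes as homozygotes distorts single-locus genotype frequencies, and how a simple moment estimator could recover the miscall rate. This is not the authors' estimator. The function names, the single-locus setting, and the assumption that a miscalled heterozygote is equally likely to become either homozygote are all illustrative assumptions.

```python
def expected_genotype_freqs(p, het_dropout):
    """Expected (hom-ref, het, hom-alt) frequencies at one biallelic locus.

    p           -- alternate allele frequency (HWE assumed)
    het_dropout -- hypothetical probability that a true heterozygote is
                   called homozygous, split evenly between both homozygotes
    """
    q = 1.0 - p
    het = 2.0 * p * q * (1.0 - het_dropout)
    moved = p * q * het_dropout  # half of the miscalled hets go to each side
    return (q * q + moved, het, p * p + moved)


def estimate_het_dropout(n_hom_ref, n_het, n_hom_alt):
    """Moment estimate of het_dropout from observed genotype counts.

    Symmetric dropout leaves the allele frequency unchanged, so the
    deficit of heterozygotes relative to 2pq estimates the miscall rate.
    """
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_alt + n_het) / (2 * n)
    expected_het = 2 * p * (1 - p) * n
    if expected_het == 0:
        raise ValueError("locus is monomorphic; dropout rate is not identifiable")
    return max(0.0, 1.0 - n_het / expected_het)
```

In real data a heterozygote deficit can also arise from inbreeding or population structure, so an estimate of this kind is only meaningful when the equilibrium assumption is defensible.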


GBStools: A Unified Approach for Reduced Representation Sequencing and Genotyping

10.1101/030494 ◽

2015 ◽

Author(s):

Thomas F Cooke ◽

Muh-Ching Yee ◽

Marina Muzzio ◽

Alexandra Sockell ◽

Ryan Bell ◽

...

Keyword(s):

Restriction Site ◽

Variant Calling ◽

Simulated Data ◽

Error Rates ◽

Genomic Diversity ◽

Model Organisms ◽

Data Sets ◽

Reduced Representation ◽

Restriction Site Polymorphisms ◽

Reduced Representation Sequencing

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.


Rapture-ready darters: choice of reference genome and genotyping method (whole-genome or sequence capture) influence population genomic inference in Etheostoma

10.1101/2020.05.21.108274 ◽

2020 ◽

Author(s):

Brendan N. Reid ◽

Rachel L. Moran ◽

Christopher J. Kopack ◽

Sarah W. Fitzpatrick

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Low Cost ◽

Read Depth ◽

Model Organisms ◽

Whole Genome ◽

Reduced Representation ◽

Sequence Capture ◽

Population Genomic ◽

The Impact

Abstract: Researchers studying non-model organisms have an increasing number of methods available for generating genomic data. However, the applicability of different methods across species, as well as the effect of reference genome choice on population genomic inference, are still difficult to predict in many cases. We evaluated the impact of data type (whole-genome vs. reduced representation) and reference genome choice on data quality and on population genomic and phylogenomic inference across several species of darters (subfamily Etheostomatinae), a highly diverse radiation of freshwater fish. We generated a high-quality reference genome and developed a hybrid RADseq/sequence capture (Rapture) protocol for the Arkansas darter (Etheostoma cragini). Rapture data from 1900 individuals spanning four darter species showed recovery of most loci across darter species at high depth and consistent estimates of heterozygosity regardless of reference genome choice. Loci with baits spanning both sides of the restriction enzyme cut site performed especially well across species. For low-coverage whole-genome data, choice of reference genome affected read depth and inferred heterozygosity. For similar amounts of sequence data, Rapture performed better at identifying fine-scale genetic structure compared to whole-genome sequencing. Rapture loci also recovered an accurate phylogeny for the study species and demonstrated high phylogenetic informativeness across the evolutionary history of the genus Etheostoma. Low cost and high cross-species effectiveness regardless of reference genome suggest that Rapture and similar sequence capture methods may be worthwhile choices for studies of diverse species radiations.


ISSRseq: an extensible, low-cost, and efficient method for reduced representation sequencing

10.1101/2020.12.21.423774 ◽

2020 ◽

Author(s):

Brandon T. Sinn ◽

Sandra J. Simon ◽

Mathilda V. Santee ◽

Stephen P. DiFazio ◽

Nicole M. Fama ◽

...

Keyword(s):

Population Genomics ◽

Low Cost ◽

Inter Simple Sequence Repeat ◽

Genomic Variation ◽

Model Organisms ◽

Limiting Factor ◽

List Type ◽

Illumina Platform ◽

Reduced Representation ◽

Simple Sequence

Abstract: The capability to generate densely sampled single nucleotide polymorphism (SNP) data is essential in diverse subdisciplines of biology, including crop breeding, pathology, forensics, forestry, ecology, evolution, and conservation. However, access to the expensive equipment and bioinformatics infrastructure required for genome-scale sequencing is still a limiting factor in the developing world and for institutions with limited resources. Here we present ISSRseq, a PCR-based method for reduced representation of genomic variation using simple sequence repeats as priming sites to sequence inter-simple sequence repeat (ISSR) regions. Briefly, ISSR regions are amplified with single primers, pooled, and used to construct sequencing libraries with a low-cost, efficient commercial kit, and sequenced on the Illumina platform. We also present a flexible bioinformatic pipeline that assembles ISSR loci, calls and hard filters variants, outputs data matrices in common formats, and conducts population analyses using R. Using three angiosperm species as case studies, we demonstrate that ISSRseq is highly repeatable, necessitates only simple wet-lab skills and commonplace instrumentation, is flexible in terms of the number of single primers used, is low-cost, and can generate genomic-scale variant discovery on par with existing RRS methods that require high sample integrity and concentration. ISSRseq represents a straightforward approach to SNP genotyping in any organism, and we predict that this method will be particularly useful for those studying population genomics and phylogeography of non-model organisms. Furthermore, the ease of ISSRseq relative to other RRS methods should prove useful for those conducting research in undergraduate and graduate environments, and more broadly by those lacking access to expensive instrumentation or expertise in bioinformatics.


A restriction enzyme reduced representation sequencing approach for low-cost, high-throughput metagenome profiling

10.1101/694133 ◽

2019 ◽

Author(s):

Melanie K. Hess ◽

Suzanne J. Rowe ◽

Tracey C. Van Stijn ◽

Hannah M. Henry ◽

Sharon M. Hickey ◽

...

Keyword(s):

Microbial Community ◽

Correspondence Analysis ◽

High Throughput ◽

Restriction Enzyme ◽

Low Cost ◽

Methane Emissions ◽

Phenotypic Correlation ◽

Reduced Representation ◽

Freeze Dried ◽

Reduced Representation Sequencing

Abstract: Microbial community profiles have been associated with a variety of traits, including methane emissions in livestock; however, these profiles can be difficult and expensive to obtain for thousands of samples. The objective of this work was to develop a low-cost, high-throughput approach to capture the diversity of the rumen microbiome. Restriction enzyme reduced representation sequencing (RE-RRS) using ApeKI or PstI, and two bioinformatic pipelines (reference-based and reference-free) were compared to 16S rRNA gene sequencing using repeated samples collected two weeks apart from 118 sheep that were phenotypically extreme (60 high and 58 low) for methane emitted per kg dry matter intake (n=236). DNA was extracted from freeze-dried rumen samples using a phenol chloroform and bead-beating protocol prior to sequencing. The resulting sequences were used to investigate the repeatability of the rumen microbial community profiles, the effect of host genetics, laboratory and analytical method, and the genetic and phenotypic correlations with methane production. The results suggested that the best method was PstI RE-RRS analyzed with the reference-free approach via a correspondence analysis, with estimates for repeatability of 0.62±0.06, heritability 0.31±0.29, and genetic and phenotypic correlation with methane emissions of 0.88±0.25 and 0.64±0.05, respectively, for the first component of correspondence analysis. The reference-free approach assigned 62.0±5.7% of reads to common 65 bp tags, much higher than the reference-based approach of 6.8±1.8% of reads assigned. Sensitivity studies suggested approximately 2000 samples could be sequenced in a single lane on an Illumina HiSeq 2500; therefore, the current work of 118 samples/lane and future proposed 384 samples/lane are well within that threshold. Our approach is now being used to investigate host factors affecting the rumen and its association with a variety of production and environmental traits. With minor adaptations, our approach could be used to obtain microbial profiles from other metagenomic samples.


Benchmarking small-variant genotyping in polyploids

Genome Research ◽

10.1101/gr.275579.121 ◽

2021 ◽

pp. gr.275579.121

Author(s):

Daniel P Cooke ◽

David C Wedge ◽

Gerton Lunter

Keyword(s):

In Silico ◽

Molecular Breeding ◽

Somatic Mutations ◽

Variant Calling ◽

Error Rates ◽

Whole Genome ◽

Genotyping Error ◽

Genotyping Errors ◽

Diploid Genome ◽

Concordance Analysis

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method - Octopus - that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle (GIAB) samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.


Facilitating population genomics of non-model organisms through optimized experimental design for reduced representation sequencing

10.1101/2021.03.30.437642 ◽

2021 ◽

Author(s):

Henrik Christiansen ◽

Franz M. Heindler ◽

Bart Hellemans ◽

Quentin Jossart ◽

Francesca Pasotti ◽

...

Keyword(s):

Genome Size ◽

Fine Tuning ◽

Alternative Methods ◽

Model Organisms ◽

Restriction Enzyme Digestion ◽

Marker Density ◽

Reduced Representation ◽

A Genome ◽

High Marker Density ◽

Reduced Representation Sequencing

Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS. In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.
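The requirement that fragment number and library setup "must match to achieve sufficient sequencing coverage per locus" comes down to simple arithmetic, sketched below. The function names and all numbers are hypothetical, and the calculation assumes reads are spread evenly across individuals and fragments, which real libraries do not achieve. It is not the workflow from the paper.

```python
def mean_depth_per_locus(total_reads, n_individuals, n_fragments):
    """Idealized mean read depth per locus per individual.

    Assumes every read maps to one of n_fragments size-selected
    fragments and that reads are distributed evenly.
    """
    return total_reads / (n_individuals * n_fragments)


def max_individuals(total_reads, n_fragments, target_depth):
    """Largest number of individuals that still reaches target_depth."""
    return int(total_reads // (n_fragments * target_depth))


# Hypothetical sequencing run: 300 million reads, 100,000 fragments
# retained after digestion and size selection.
depth = mean_depth_per_locus(300_000_000, 96, 100_000)
pool_size = max_individuals(300_000_000, 100_000, 20)
```

Because the depth scales inversely with the number of fragments, a larger genome or a more frequent cutter lowers the number of individuals that can be pooled at a given target depth.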


Facilitating population genomics of non-model organisms through optimized experimental design for reduced representation sequencing

BMC Genomics ◽

10.1186/s12864-021-07917-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Henrik Christiansen ◽

Franz M. Heindler ◽

Bart Hellemans ◽

Quentin Jossart ◽

Francesca Pasotti ◽

...

Keyword(s):

Genome Size ◽

Fine Tuning ◽

Alternative Methods ◽

Model Organisms ◽

Restriction Enzyme Digestion ◽

Marker Density ◽

Reduced Representation ◽

A Genome ◽

High Marker Density ◽

Reduced Representation Sequencing

Abstract: Background: Genome-wide data are invaluable to characterize differentiation and adaptation of natural populations. Reduced representation sequencing (RRS) subsamples a genome repeatedly across many individuals. However, RRS requires careful optimization and fine-tuning to deliver high marker density while being cost-efficient. The number of genomic fragments created through restriction enzyme digestion and the sequencing library setup must match to achieve sufficient sequencing coverage per locus. Here, we present a workflow based on published information and computational and experimental procedures to investigate and streamline the applicability of RRS. Results: In an iterative process genome size estimates, restriction enzymes and size selection windows were tested and scaled in six classes of Antarctic animals (Ostracoda, Malacostraca, Bivalvia, Asteroidea, Actinopterygii, Aves). Achieving high marker density would be expensive in amphipods, the malacostracan target taxon, due to the large genome size. We propose alternative approaches such as mitogenome or target capture sequencing for this group. Pilot libraries were sequenced for all other target taxa. Ostracods, bivalves, sea stars, and fish showed overall good coverage and marker numbers for downstream population genomic analyses. In contrast, the bird test library produced low coverage and few polymorphic loci, likely due to degraded DNA. Conclusions: Prior testing and optimization are important to identify which groups are amenable for RRS and where alternative methods may currently offer better cost-benefit ratios. The steps outlined here are easy to follow for other non-model taxa with little genomic resources, thus stimulating efficient resource use for the many pressing research questions in molecular ecology.


A restriction enzyme reduced representation sequencing approach for low-cost, high-throughput metagenome profiling

PLoS ONE ◽

10.1371/journal.pone.0219882 ◽

2020 ◽

Vol 15 (4) ◽

pp. e0219882 ◽

Cited By ~ 2

Author(s):

Melanie K. Hess ◽

Suzanne J. Rowe ◽

Tracey C. Van Stijn ◽

Hannah M. Henry ◽

Sharon M. Hickey ◽

...

Keyword(s):

High Throughput ◽

Restriction Enzyme ◽

Low Cost ◽

Reduced Representation ◽

Reduced Representation Sequencing


Benchmarking small-variant genotyping in polyploids

10.1101/2021.03.29.436766 ◽

2021 ◽

Author(s):

Daniel P Cooke ◽

David C Wedge ◽

Gerton Lunter

Keyword(s):

In Silico ◽

Molecular Breeding ◽

Somatic Mutations ◽

Variant Calling ◽

Error Rates ◽

Whole Genome ◽

Genotyping Error ◽

Genotyping Errors ◽

Diploid Genome ◽

Concordance Analysis

Genotyping from sequencing is the basis of emerging strategies in the molecular breeding of polyploid plants. However, compared with the situation for diploids, where genotyping accuracies are confidently determined with comprehensive benchmarks, polyploids have been neglected; there are no benchmarks measuring genotyping error rates for small variants using real sequencing reads. We previously introduced a variant calling method – Octopus – that accurately calls germline variants in diploids and somatic mutations in tumors. Here, we evaluate Octopus and other popular tools on whole-genome tetraploid and hexaploid datasets created using in silico mixtures of diploid Genome In a Bottle samples. We find that genotyping errors are abundant for typical sequencing depths, but that Octopus makes 25% fewer errors than other methods on average. We supplement our benchmarks with concordance analysis in real autotriploid banana datasets.


Imaging in Propeller Flap Surgery

Seminars in Plastic Surgery ◽

10.1055/s-0040-1715159 ◽

2020 ◽

Vol 34 (03) ◽

pp. 145-151

Author(s):

Shimpei Ono ◽

Hiroyuki Ohi ◽

Rei Ogawa

Keyword(s):

Low Cost ◽

Three Dimensional ◽

Duplex Sonography ◽

Computed Tomographic Angiography ◽

Ease Of Use ◽

Computed Tomographic ◽

Imaging Tool ◽

Preoperative Mapping ◽

Flap Design ◽

Tomographic Angiography

Abstract: Since propeller flaps are elevated as island flaps and most often nourished by a single perforator nearby the defect, it is challenging to change the flap design intraoperatively when a reliable perforator cannot be found where expected to exist. Thus, accurate preoperative mapping of perforators is essential in the safe planning of propeller flaps. Various methods have been reported so far: (1) handheld acoustic Doppler sonography (ADS), (2) color duplex sonography (CDS), (3) perforator computed tomographic angiography (P-CTA), and (4) magnetic resonance angiography (MRA). To facilitate the preoperative perforator assessment, P-CTA is currently considered as the gold standard imaging tool in revealing the three-dimensional anatomical details of perforators precisely. Nevertheless, ADS remains the most widely used tool due to its low cost, faster learning, and ease of use despite an undesirable number of false-positive results. CDS can provide hemodynamic characteristics of the perforator and is a valid and safer alternative particularly in patients in whom ionizing radiation and/or contrast exposure should be limited. Although MRA is less accurate in detecting smaller perforators of caliber less than 1.0 mm and the intramuscular course of perforators at the present time, MRA is expected to improve in the future due to the recent developments in technology, making it as accurate as P-CTA. Moreover, it provides the advantage of being radiation-free with fewer contrast reactions.
