Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data

Mapping Intimacies ◽

10.1101/235937 ◽

2017 ◽

Cited By ~ 1

Author(s):

Timothy P. Bilton ◽

John C. McEwan ◽

Shannon M. Clarke ◽

Rudiger Brauning ◽

Tracey C. van Stijn ◽

...

Keyword(s):

Linkage Disequilibrium ◽

High Throughput ◽

High Throughput Sequencing ◽

Cost Effective ◽

Likelihood Method ◽

Sequencing Data ◽

Diverse Range ◽

Pairwise Linkage Disequilibrium ◽

Large Populations ◽

Low Coverage

Abstract:

High-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly being utilized in population genetic studies across a diverse range of species. One side-effect of these methods, however, is that one or more alleles at a particular locus may not be sequenced, particularly when the sequencing depth is low, resulting in some heterozygous genotypes being called as homozygous. Under-called heterozygous genotypes have a profound effect on the estimation of linkage disequilibrium and, if not taken into account, lead to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise linkage disequilibrium using low coverage sequencing data that accounts for under-called heterozygous genotypes. Our findings show that accurate estimates were obtained using GUS-LD on low coverage sequencing data, whereas underestimation of linkage disequilibrium results if no adjustment is made for under-called heterozygotes.
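To illustrate why low depth causes heterozygotes to be under-called, the sketch below computes the chance that every read at a heterozygous locus carries the same allele. It is not the GUS-LD likelihood. It assumes each read samples either allele with probability 1/2, independently, with no sequencing error, and the function name `prob_het_called_hom` is hypothetical.

```python
def prob_het_called_hom(depth: int) -> float:
    """Probability that a true heterozygote shows only one allele.

    Assumption (not from the abstract): each of `depth` reads samples
    either allele with probability 1/2, independently, without error.
    All reads match allele A with probability 0.5**depth, and likewise
    for allele B, giving 2 * 0.5**depth = 0.5**(depth - 1).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)


if __name__ == "__main__":
    # Show how quickly the under-calling probability falls with depth.
    for k in (1, 2, 4, 8):
        print(f"depth {k}: P(het seen as hom) = {prob_het_called_hom(k):.4f}")
```

Under these assumptions a heterozygote covered by a single read is always seen as homozygous, and one covered by two reads is mis-called half the time, which is the kind of bias an LD estimator for low coverage data has to model.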

Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data

Genetics ◽

10.1534/genetics.118.300831 ◽

2018 ◽

Vol 209 (2) ◽

pp. 389-400 ◽

Cited By ~ 16

Author(s):

Timothy P. Bilton ◽

John C. McEwan ◽

Shannon M. Clarke ◽

Rudiger Brauning ◽

Tracey C. van Stijn ◽

...

Keyword(s):

Linkage Disequilibrium ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Low Coverage

Genome-Wide Estimation of Linkage Disequilibrium from Population-Level High-Throughput Sequencing Data

Genetics ◽

10.1534/genetics.114.165514 ◽

2014 ◽

Vol 197 (4) ◽

pp. 1303-1313 ◽

Cited By ~ 16

Author(s):

Takahiro Maruki ◽

Michael Lynch

Keyword(s):

Linkage Disequilibrium ◽

High Throughput ◽

High Throughput Sequencing ◽

Population Level ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Genome Wide

sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing

10.1101/333815 ◽

2018 ◽

Cited By ~ 1

Author(s):

Quinn K. Langdon ◽

David Peris ◽

Brian Kyle ◽

Chris Todd Hittinger

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Rapid Identification ◽

Sequencing Data ◽

Pure Species ◽

High Throughput Sequencing Data ◽

Interspecies Hybrids ◽

Evolutionary Trajectories ◽

Low Coverage ◽

Reference Genomes

Abstract:

The genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species, that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer).

Accounting for Errors in Low Coverage High-Throughput Sequencing Data when Constructing Genetic Maps using Biparental Outcrossed Populations

10.1101/249722 ◽

2018 ◽

Author(s):

Timothy P. Bilton ◽

Matthew R. Schofield ◽

Michael A. Black ◽

David Chagné ◽

Phillip L. Wilcox ◽

...

Keyword(s):

High Throughput ◽

Genetic Linkage ◽

High Throughput Sequencing ◽

Diploid Species ◽

Genotyping By Sequencing ◽

Genetic Maps ◽

Linkage Maps ◽

Sequencing Data ◽

Genetic Linkage Maps ◽

Low Coverage

Abstract:

Next generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high density genetic linkage maps, which facilitate the development of non-model species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sib family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sib populations of diploid species, implemented in a package called GUSMap. Our model is based on an extension of the Lander-Green hidden Markov model that accounts for errors present in sequencing data. Results show that GUSMap was able to give accurate estimates of the recombination fractions and overall map distance, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model.

Handling of targeted amplicon sequencing data focusing on index hopping and demultiplexing using a nested metabarcoding approach in ecology

Scientific Reports ◽

10.1038/s41598-021-98018-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yasemin Guenay-Greunke ◽

David A. Bohan ◽

Michael Traugott ◽

Corinna Wallinger

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Cost Effective ◽

Amplicon Sequencing ◽

Sequencing Depth ◽

Sequencing Error ◽

Sequencing Data ◽

Large Sample ◽

Sequencing Errors ◽

Plant Feeding

Abstract:

High-throughput sequencing platforms are increasingly being used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and taking sequencing errors or index hopping into account. However, the potential biases these factors introduce to high-throughput amplicon sequencing data sets and how they may be overcome have rarely been addressed. Using the example of a nested metabarcoding analysis of 1920 carabid beetle regurgitates to assess plant feeding, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and its consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that, despite library quantification, large variation in read counts and sequencing depth occurred among samples and that the sequencing error rate set in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.
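One way to picture an index hopping threshold is as a minimum share of reads that a sample must hold before a sequence is assigned to it. The sketch below is only an illustration: the proportional rule, the 2% cut-off in the usage example, and the function name `apply_hopping_threshold` are all assumptions, not the procedure used in the paper.

```python
def apply_hopping_threshold(counts: dict[str, int], threshold: float) -> dict[str, int]:
    """Keep only samples holding at least `threshold` of a sequence's reads.

    `counts` maps sample name -> read count for one sequence variant.
    The proportional rule is a hypothetical choice for illustration;
    low-share assignments are treated as likely index-hopping artefacts.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction between 0 and 1")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        sample: n
        for sample, n in counts.items()
        if n / total >= threshold
    }


if __name__ == "__main__":
    # Hypothetical read counts for one plant sequence across three samples.
    reads = {"beetle_01": 990, "beetle_02": 10, "beetle_03": 0}
    print(apply_hopping_threshold(reads, threshold=0.02))
```

In this toy example only `beetle_01` survives the filter, because the other samples hold less than 2% of the reads for that sequence. A real threshold would need to be calibrated, for instance against negative controls or unused index combinations.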

Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data

DNA Research ◽

10.1093/dnares/dsx012 ◽

2017 ◽

Vol 24 (4) ◽

pp. 397-405 ◽

Cited By ~ 6

Author(s):

Masaaki Kobayashi ◽

Hajime Ohyanagi ◽

Hideki Takanashi ◽

Satomi Asano ◽

Toru Kudo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Snp Detection ◽

Detection Tool ◽

High Throughput Sequencing Data ◽

Highly Sensitive ◽

Low Coverage

Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations

Genetics ◽

10.1534/genetics.117.300627 ◽

2018 ◽

Vol 209 (1) ◽

pp. 65-76 ◽

Cited By ~ 18

Author(s):

Timothy P. Bilton ◽

Matthew R. Schofield ◽

Michael A. Black ◽

David Chagné ◽

Phillip L. Wilcox ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Genetic Maps ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Low Coverage

Detecting Selection in Low-Coverage High-Throughput Sequencing Data using Principal Component Analysis

10.1101/2021.03.01.432540 ◽

2021 ◽

Author(s):

Jonas Meisner ◽

Anders Albrechtsen ◽

Kristian Hanghøj

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

False Positive Rate ◽

Principal Component ◽

Human Populations ◽

Population Genetic Study ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Positive Rate ◽

Low Coverage

Abstract:

Identification of selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has led to the need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal component analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data. Here, we present two selection statistics, which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes. Moreover, we show that PCAngsd outperforms selection statistics obtained from called genotypes from low-coverage sequencing data.

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

Hiv Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data
