scholarly journals Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data

2017 ◽  
Author(s):  
Timothy P. Bilton ◽  
John C. McEwan ◽  
Shannon M. Clarke ◽  
Rudiger Brauning ◽  
Tracey C. van Stijn ◽  
...  

AbstractHigh-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly being utilized in population genetic studies across a diverse range of species. One side-effect of these methods, however, is that one or more alleles at a particular locus may not be sequenced, particularly when the sequencing depth is low, resulting in some heterozygous genotypes being called as homozygous. Under-called heterozygous genotypes have a profound effect on the estimation of linkage disequilibrium and, if not taken into account, leads to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise linkage disequilibrium using low coverage sequencing data that accounts for under-called heterozygous genotypes. Our findings show that accurate estimates were obtained using GUS-LD on low coverage sequencing data, whereas underestimation of linkage disequilibrium results if no adjustment is made for under-called heterozygotes.

Genetics ◽  
2018 ◽  
Vol 209 (2) ◽  
pp. 389-400 ◽  
Author(s):  
Timothy P. Bilton ◽  
John C. McEwan ◽  
Shannon M. Clarke ◽  
Rudiger Brauning ◽  
Tracey C. van Stijn ◽  
...  

2018 ◽  
Author(s):  
Quinn K. Langdon ◽  
David Peris ◽  
Brian Kyle ◽  
Chris Todd Hittinger

AbstractThe genomics era has expanded our knowledge about the diversity of the living world, yet harnessing high-throughput sequencing data to investigate alternative evolutionary trajectories, such as hybridization, is still challenging. Here we present sppIDer, a pipeline for the characterization of interspecies hybrids and pure species,that illuminates the complete composition of genomes. sppIDer maps short-read sequencing data to a combination genome built from reference genomes of several species of interest and assesses the genomic contribution and relative ploidy of each parental species, producing a series of colorful graphical outputs ready for publication. As a proof-of-concept, we use the genus Saccharomyces to detect and visualize both interspecies hybrids and pure strains, even with missing parental reference genomes. Through simulation, we show that sppIDer is robust to variable reference genome qualities and performs well with low-coverage data. We further demonstrate the power of this approach in plants, animals, and other fungi. sppIDer is robust to many different inputs and provides visually intuitive insight into genome composition that enables the rapid identification of species and their interspecies hybrids. sppIDer exists as a Docker image, which is a reusable, reproducible, transparent, and simple-to-run package that automates the pipeline and installation of the required dependencies (https://github.com/GLBRC/sppIDer).


2018 ◽  
Author(s):  
Timothy P. Bilton ◽  
Matthew R. Schofield ◽  
Michael A. Black ◽  
David Chagné ◽  
Phillip L. Wilcox ◽  
...  

ABSTRACTNext generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high density genetic linkage maps, which facilitate the development of non-model species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sib family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sib populations of diploid species, implemented in a package called GUSMap. Our model is based on an extension of the Lander-Green hidden Markov model that accounts for errors present in sequencing data. Results show that GUSMap was able to give accurate estimates of the recombination fractions and overall map distance, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yasemin Guenay-Greunke ◽  
David A. Bohan ◽  
Michael Traugott ◽  
Corinna Wallinger

AbstractHigh-throughput sequencing platforms are increasingly being used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and taking sequencing errors or index hopping into account. However, the potential biases these factors introduce to high-throughput amplicon sequencing data sets and how they may be overcome have rarely been addressed. On the example of a nested metabarcoding analysis of 1920 carabid beetle regurgitates to assess plant feeding, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and its consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that despite library quantification, large variation in read counts and sequencing depth occurred among samples and that the sequencing error rate in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.


DNA Research ◽  
2017 ◽  
Vol 24 (4) ◽  
pp. 397-405 ◽  
Author(s):  
Masaaki Kobayashi ◽  
Hajime Ohyanagi ◽  
Hideki Takanashi ◽  
Satomi Asano ◽  
Toru Kudo ◽  
...  

Genetics ◽  
2018 ◽  
Vol 209 (1) ◽  
pp. 65-76 ◽  
Author(s):  
Timothy P. Bilton ◽  
Matthew R. Schofield ◽  
Michael A. Black ◽  
David Chagné ◽  
Phillip L. Wilcox ◽  
...  

2021 ◽  
Author(s):  
Jonas Meisner ◽  
Anders Albrechtsen ◽  
Kristian Hanghøj

1AbstractIdentification of selection signatures between populations is often an important part of a population genetic study. Leveraging high-throughput DNA sequencing larger sample sizes of populations with similar ancestries has become increasingly common. This has led to the need of methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units which is why existing methods rely on principal components analysis for inference of the selection signals. These existing methods require called genotypes as input which is problematic for studies based on low-coverage sequencing data. Here, we present two selections statistics which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, opening for the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes. Moreover, we show that PCAngsd outperform selection statistics obtained from called genotypes from low-coverage sequencing data.


Sign in / Sign up

Export Citation Format

Share Document