Nuclear Orthologs Derived from Whole Genome Sequencing Indicate Cryptic Diversity in the Bemisia tabaci (Insecta: Aleyrodidae) Complex of Whiteflies

Robert S. de Moya; Judith K. Brown; Andrew D. Sweet; Kimberly K. O. Walden; Jorge R. Paredes-Montero; Robert M. Waterhouse; Kevin P. Johnson

doi:10.3390/d11090151

Batch effects in population genomic studies with low‐coverage whole genome sequencing data: causes, detection, and mitigation

Molecular Ecology Resources ◽

10.1111/1755-0998.13559 ◽

2021 ◽

Author(s):

Runyang Nicolas Lou ◽

Nina Overgaard Therkildsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Batch Effects ◽

Sequencing Data ◽

Population Genomic ◽

Genomic Studies ◽

Low Coverage


Extensive global movement of multidrug-resistant M. tuberculosis strains revealed by whole-genome analysis

Thorax ◽

10.1136/thoraxjnl-2018-211616 ◽

2019 ◽

Vol 74 (9) ◽

pp. 882-889 ◽

Cited By ~ 8

Author(s):

Keira A Cohen ◽

Abigail L Manson ◽

Thomas Abeel ◽

Christopher A Desjardins ◽

Sinead B Chapman ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Data Sharing ◽

Genome Sequencing ◽

Multidrug Resistant ◽

Whole Genome ◽

Whole Genome Analysis ◽

Global Spread ◽

Mdr Tb ◽

Global Movement ◽

Genomic Studies

Background: While the international spread of multidrug-resistant (MDR) Mycobacterium tuberculosis strains is an acknowledged public health threat, a broad and more comprehensive examination of the global spread of MDR-tuberculosis (TB) using whole-genome sequencing has not yet been performed. Methods: In a global dataset of 5310 M. tuberculosis whole-genome sequences isolated from five continents, we performed a phylogenetic analysis to identify and characterise clades of MDR-TB with respect to geographic dispersion. Results: Extensive international dissemination of MDR-TB was observed, with identification of 32 migrant MDR-TB clades with descendants isolated in 17 unique countries. Relatively recent movement of strains from both Beijing and non-Beijing lineages indicated successful global spread of varied genetic backgrounds. Migrant MDR-TB clade members shared relatively recent common ancestry, with a median estimate of divergence of 13–27 years. Migrant extensively drug-resistant (XDR)-TB clades were not observed, although development of XDR-TB within migratory MDR-TB clades was common. Conclusions: Application of genomic techniques to investigate global MDR migration patterns revealed extensive global spread of MDR clades between countries of varying TB burden. Further expansion of genomic studies to incorporate isolates from diverse global settings into a single analysis, as well as data sharing platforms that facilitate genomic data sharing across country lines, may allow for future epidemiological analyses to monitor for international transmission of MDR-TB. In addition, efforts to perform routine whole-genome sequencing on all newly identified M. tuberculosis, like in England, will serve to better our understanding of the transmission dynamics of MDR-TB globally.


Abundant sequence divergence in the native Japanese cattle Mishima-Ushi ( Bos taurus ) detected using whole-genome sequencing

Genomics ◽

10.1016/j.ygeno.2013.08.002 ◽

2013 ◽

Vol 102 (4) ◽

pp. 372-378 ◽

Cited By ~ 10

Author(s):

Kaoru Tsuda ◽

Ryouka Kawahara-Miki ◽

Satoshi Sano ◽

Misaki Imai ◽

Tatsuo Noguchi ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Bos Taurus ◽

Sequence Divergence ◽

Whole Genome


Rare and common variant discovery by whole-genome sequencing of 101 Thoroughbred racehorses

Scientific Reports ◽

10.1038/s41598-021-95669-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Teruaki Tozaki ◽

Aoi Ohnuma ◽

Mio Kikuchi ◽

Taichiro Ishige ◽

Hironaga Kakoi ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Rare Variants ◽

Industrial Applications ◽

Whole Genome ◽

X Chromosomes ◽

Variant Discovery ◽

Functional Variants ◽

Thoroughbred Racehorses ◽

Using Data

The Thoroughbred breed was formed by crossing Oriental horse breeds and British native horses and is currently used in horseracing worldwide. In this study, we constructed a single-nucleotide variant (SNV) database using data from 101 Thoroughbred racehorses. Whole genome sequencing (WGS) revealed 11,570,312 and 602,756 SNVs in autosomal (1–31) and X chromosomes, respectively, yielding a total of 12,173,068 SNVs. About 6.9% of identified SNVs were rare variants observed only in one allele in 101 horses. The number of SNVs detected in individual horses ranged from 4.8 to 5.3 million. Individual horses had a maximum of 25,554 rare variants; several of these were functional variants, such as non-synonymous substitutions, start-gained, start-lost, stop-gained, and stop-lost variants. Therefore, these rare variants may affect differences in traits and phenotypes among individuals. When observing the distribution of rare variants among horses, one breeding stallion had a smaller number of rare variants compared to other horses, suggesting that the frequency of rare variants in the Japanese Thoroughbred population increases through breeding. In addition, our variant database may provide useful basic information for industrial applications, such as the detection of genetically modified racehorses in gene-doping control and pedigree-registration of racehorses using SNVs as markers.


Batch effects in population genomic studies with low-coverage whole genome sequencing data: causes, detection, and mitigation

10.22541/au.162791857.78788821/v1 ◽

2021 ◽

Author(s):

Runyang Nicolas Lou ◽

Nina Overgaard Therkildsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Read Length ◽

Whole Genome ◽

Batch Effects ◽

Sequencing Data ◽

Population Genomic ◽

Spatial Coverage ◽

Genomic Studies ◽

Low Coverage

Over the past few decades, the rapid democratization of high-throughput sequencing and the growing emphasis on open science practices have resulted in an explosion in the amount of publicly available sequencing data. This opens new opportunities for combining datasets to achieve unprecedented sample sizes, spatial coverage, or temporal replication in population genomic studies. However, a common concern is that non-biological differences between datasets may generate batch effects that can confound real biological patterns. Despite general awareness about the risk of batch effects, few studies have examined empirically how they manifest in real datasets, and it remains unclear what factors cause batch effects and how to best detect and mitigate their impact bioinformatically. In this paper, we compare two batches of low-coverage whole genome sequencing (lcWGS) data generated from the same populations of Atlantic cod (Gadus morhua). First, we show that with a “batch-effect-naive” bioinformatic pipeline, batch effects severely biased our genetic diversity estimates, population structure inference, and selection scan. We then demonstrate that these batch effects resulted from multiple technical differences between our datasets, including the sequencing instrument model/chemistry, read type, read length, DNA degradation level, and sequencing depth, but their impact can be detected and substantially mitigated with simple bioinformatic approaches. We conclude that combining datasets remains a powerful approach as long as batch effects are explicitly accounted for. We focus on lcWGS data in this paper, which may be particularly vulnerable to certain causes of batch effects, but many of our conclusions also apply to other sequencing strategies.


An Amplicon-Based Approach for the Whole-Genome Sequencing of Human Metapneumovirus

Viruses ◽

10.3390/v13030499 ◽

2021 ◽

Vol 13 (3) ◽

pp. 499

Author(s):

Rachel L. Tulloch ◽

Jen Kok ◽

Ian Carter ◽

Dominic E. Dwyer ◽

John-Sebastian Eden

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Epidemiological Studies ◽

Human Metapneumovirus ◽

Clinical Samples ◽

Whole Genome ◽

Genomic Studies ◽

Primer Sets ◽

Tract Disease ◽

Sequencing Platforms

Human metapneumovirus (HMPV) is an important cause of upper and lower respiratory tract disease in individuals of all ages. It is estimated that most individuals will be infected by HMPV by the age of five years. Despite this burden of disease, there remain gaps in our knowledge of global genetic diversity due to a lack of HMPV sequencing, particularly at the whole-genome scale. The purpose of this study was to create a simple and robust approach for HMPV whole-genome sequencing to be used for genomic epidemiological studies. To design our assay, all available HMPV full-length genome sequences were downloaded from the National Center for Biotechnology Information (NCBI) GenBank database and used to design four primer sets to amplify long, overlapping amplicons spanning the viral genome and, importantly, specific to all known HMPV subtypes. These amplicons were then pooled and sequenced on an Illumina iSeq 100 (Illumina, San Diego, CA, USA); however, the approach is suitable for other common sequencing platforms. We demonstrate the utility of this method using a representative subset of clinical samples and examine these sequences using a phylogenetic approach. Here we present an amplicon-based method for the whole-genome sequencing of HMPV from clinical extracts that can be used to better inform genomic studies of HMPV epidemiology and evolution.


A beginner's guide to low-coverage whole genome sequencing for population genomics

10.22541/au.160689616.68843086/v3 ◽

2021 ◽

Author(s):

Runyang Nicolas Lou ◽

Arne Jacobs ◽

Aryn Wilder ◽

Nina Overgaard Therkildsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Population Genomics ◽

Cost Effective ◽

Whole Genome ◽

Model Species ◽

Population Genomic ◽

Genomic Studies ◽

Lower Depth ◽

Low Coverage

Low-coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost-effective approach for population genomic studies in both model and non-model species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analyzed and per-sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per-sample cost for lcWGS is now comparable to RAD-seq and Pool-seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency and genetic diversity estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in non-model species, and discuss current limitations and future perspectives for lcWGS-based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.


A beginner's guide to low-coverage whole genome sequencing for population genomics

10.22541/au.160689616.68843086/v4 ◽

2021 ◽

Author(s):

Runyang Nicolas Lou ◽

Arne Jacobs ◽

Aryn Wilder ◽

Nina Overgaard Therkildsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Population Genomics ◽

Cost Effective ◽

Whole Genome ◽

Model Species ◽

Population Genomic ◽

Genomic Studies ◽

Lower Depth ◽

Low Coverage



Whole-Genome Sequencing of Clostridium sp. Strain FP2, Isolated from Spoiled Venison

Microbiology Resource Announcements ◽

10.1128/mra.00334-20 ◽

2020 ◽

Vol 9 (18) ◽

Author(s):

Nikola Palevich ◽

Faith P. Palevich ◽

Paul H. Maclean ◽

Ruy Jauregui ◽

Eric Altermann ◽

...

Keyword(s):

New Zealand ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Sequence ◽

Draft Genome ◽

Draft Genome Sequence ◽

Functional Genomic ◽

Whole Genome ◽

Genomic Studies ◽

Red Meats

Clostridium sp. strain FP2 was isolated from vacuum-packaged refrigerated spoiled venison in New Zealand. This report describes the generation and annotation of the 5.6-Mb draft genome sequence of Clostridium sp. FP2, which will facilitate future functional genomic studies to improve our understanding of premature spoilage of red meats.


Whole-genome sequencing of 1,171 elderly admixed individuals from the largest Latin American metropolis (São Paulo, Brazil)

10.21203/rs.3.rs-85969/v1 ◽

2020 ◽

Author(s):

Michel Naslavsky ◽

Marilia Scliar ◽

Guilherme Yamamoto ◽

Jaqueline Wang ◽

Stepanka Zverinova ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Latin American ◽

Population Genomics ◽

Mobile Element ◽

Whole Genome ◽

High Coverage ◽

Genomic Studies ◽

Novel Alleles ◽

Recessive Disorders

As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enabled identifying ~2,000 novel mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 novel alleles from HLA genes. We reclassified and curated the pathogenicity assertions of nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculated the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observed that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS.
