Indexcov: fast coverage quality control for whole-genome sequencing

10.1101/148296 ◽

2017 ◽

Author(s):

Brent S. Pedersen ◽

Ryan L. Collins ◽

Michael E. Talkowski ◽

Aaron R. Quinlan

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Genomic Region ◽

Chromosomal Anomalies ◽

Whole Genome ◽

Sequence Alignments ◽

Linear Index ◽

Coverage Quality ◽

Genomic Regions

Abstract. The BAM [1] and CRAM [2] formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license.
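
The idea of comparing consecutive index entries can be illustrated with a minimal Python sketch. It is not indexcov's actual implementation (the tool is part of the Go project linked above). It assumes the linear-index entries for one chromosome have already been read into a list of BGZF virtual offsets, one per 16,384-base tile, and it uses the number of compressed bytes between neighbouring entries as the depth proxy. The function names and the median scaling are illustrative choices, not taken from the abstract.

```python
from statistics import median

TILE_WIDTH = 16_384  # bases spanned by each linear-index entry in a BAM index


def compressed_offset(virtual_offset: int) -> int:
    """Return the upper 48 bits of a BGZF virtual offset (start of the compressed block)."""
    return virtual_offset >> 16


def scaled_depth(virtual_offsets: list[int]) -> list[float]:
    """Depth proxy per tile: compressed bytes between consecutive entries, over the median.

    Assumption for illustration: more bytes between two entries means more
    alignment records in that tile, so values near 1.0 indicate typical coverage.
    """
    blocks = [compressed_offset(v) for v in virtual_offsets]
    sizes = [later - earlier for earlier, later in zip(blocks, blocks[1:])]
    nonzero = [s for s in sizes if s > 0]
    if not nonzero:
        # No data between any pair of entries: report zero depth everywhere.
        return [0.0] * len(sizes)
    typical = median(nonzero)
    return [s / typical for s in sizes]


# Hypothetical offsets: the last tile holds twice as much data as the others.
example = [c << 16 for c in (0, 100, 200, 300, 500)]
print(scaled_depth(example))
```

In this toy input the final tile has a scaled value of 2.0, the kind of signal that could flag a duplicated region if it persisted across many adjacent tiles.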

0306 Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

Journal of Animal Science ◽

10.2527/jam2016-0306 ◽

2016 ◽

Vol 94 (suppl_5) ◽

pp. 146-146

Author(s):

D. M. Bickhart ◽

L. Xu ◽

J. L. Hutchison ◽

J. B. Cole ◽

D. J. Null ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genetic Markers ◽

Genome Sequencing ◽

Copy Number ◽

Large Scale ◽

Copy Number Variants ◽

Whole Genome

Plasmids or no plasmids? A comparison between the Agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

Assessing genomic diversity and signatures of selection in Jiaxian Red cattle using whole-genome sequencing data

BMC Genomics ◽

10.1186/s12864-020-07340-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Xiaoting Xia ◽

Shunjin Zhang ◽

Huaju Zhang ◽

Zijing Zhang ◽

Ningbo Chen ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Variation ◽

Genomic Diversity ◽

System Response ◽

Whole Genome ◽

Population Structure Analysis ◽

Native Cattle ◽

Genomic Regions

Abstract. Background: Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to local environment and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results: The population structure analysis revealed that Jiaxian Red cattle harboured the ancestry with East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied the relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for the candidate signatures of positive selection in Jiaxian Red cattle. A total number of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion: We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. These results provide a basis for further resource protection and breeding improvement of this breed.

A large-scale whole-genome sequencing analysis reveals false positives of bacterial essential genes

Applied Microbiology and Biotechnology ◽

10.1007/s00253-021-11702-3 ◽

2021 ◽

Author(s):

Yuanhao Li ◽

Bo Jiang ◽

Weijun Dai

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

False Positives ◽

Essential Genes ◽

Whole Genome ◽

Sequencing Analysis

Improving tuberculosis surveillance by detecting international transmission using publicly available whole-genome sequencing data

10.1101/834150 ◽

2019 ◽

Author(s):

Andrea Sanchini ◽

Christine Jandrasits ◽

Julius Tembrockhaus ◽

Thomas Andreas Kohl ◽

Christian Utpatel ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Added Value ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

International Transmission ◽

The Public ◽

Public Dataset ◽

Public Repositories

Abstract. Introduction: Improving the surveillance of tuberculosis (TB) is especially important for multidrug-resistant (MDR) and extensively drug-resistant (XDR)-TB. The large amount of publicly available whole-genome sequencing (WGS) data for TB gives us the chance to re-use data and to perform additional analysis at a large scale. Aim: We assessed the usefulness of raw WGS data of global MDR/XDR-TB isolates available from public repositories to improve TB surveillance. Methods: We extracted raw WGS data and the related metadata of Mycobacterium tuberculosis isolates available from the Sequence Read Archive. We compared this public dataset with WGS data and metadata of 131 MDR- and XDR-TB isolates from Germany in 2012-2013. Results: We aggregated a dataset that includes 1,081 MDR and 250 XDR isolates among which we identified 133 molecular clusters. In 16 clusters, the isolates were from at least two different countries. For example, cluster2 included 56 MDR/XDR isolates from Moldova, Georgia, and Germany. By comparing the WGS data from Germany and the public dataset, we found that 11 clusters contained at least one isolate from Germany and at least one isolate from another country. We could, therefore, connect TB cases despite missing epidemiological information. Conclusion: We demonstrated the added value of using WGS raw data from public repositories to contribute to TB surveillance. By comparing the German and the public dataset, we identified potential international transmission events. Thus, using this approach might support the interpretation of national surveillance results in an international context.

Fine mapping of genomic regions associated with female fertility in Nellore beef cattle based on sequence variants from segregating sires

Journal of Animal Science and Biotechnology ◽

10.1186/s40104-019-0403-0 ◽

2019 ◽

Vol 10 (1) ◽

Author(s):

Gerson A. Oliveira Júnior ◽

Daniel J. A. Santos ◽

Aline S. M. Cesar ◽

Solomon A. Boison ◽

Ricardo V. Ventura ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Fine Mapping ◽

Reproductive Hormones ◽

Sequence Variants ◽

Whole Genome ◽

Biological Processes ◽

Snp Chip ◽

Mesh Terms ◽

Genomic Regions

Abstract. Background: Impaired fertility in cattle limits the efficiency of livestock production systems. Unraveling the genetic architecture of fertility traits would facilitate their improvement by selection. In this study, we characterized SNP chip haplotypes at QTL blocks then used whole-genome sequencing to fine map genomic regions associated with reproduction in a population of Nellore (Bos indicus) heifers. Methods: The dataset comprised 1337 heifers genotyped using a GeneSeek® Genomic Profiler panel (74677 SNPs), representing the daughters from 78 sires. After performing marker quality control, 64800 SNPs were retained. Haplotypes carried by each sire at six previously identified QTL on BTAs 5, 14 and 18 for heifer pregnancy and BTAs 8, 11 and 22 for antral follicle count were constructed using findhap software. The significance of the contrasts between the effects of every two paternally-inherited haplotype alleles was used to identify sires that were heterozygous at each QTL. Whole-genome sequencing data localized to the haplotypes from six sires and 20 other ancestors were used to identify sequence variants that were concordant with the haplotype contrasts. Enrichment analyses were applied to these variants using KEGG and MeSH libraries. Results: A total of six (BTA 5), six (BTA 14) and five (BTA 18) sires were heterozygous for heifer pregnancy QTL whereas six (BTA 8), fourteen (BTA 11), and five (BTA 22) sires were heterozygous for number of antral follicles’ QTL. Due to inadequate representation of many haplotype alleles in the sequenced animals, fine mapping analysis could only be reliably performed for the QTL on BTA 5 and 14, which had 641 and 3733 concordant candidate sequence variants, respectively. The KEGG “Circadian rhythm” and “Neurotrophin signaling pathway” were significantly associated with the genes in the QTL on BTA 5 whereas 32 MeSH terms were associated with the QTL on BTA 14. Among the concordant sequence variants, 0.2% and 0.3% were classified as missense variants for BTAs 5 and 14, respectively, highlighting the genes MTERF2, RTMB, ENSBTAG00000037306 (miRNA), ENSBTAG00000040351, PRKDC, and RGS20. The potential causal mutations found in the present study were associated with biological processes such as oocyte maturation, embryo development, placenta development and response to reproductive hormones. Conclusions: The identification of heterozygous sires by positionally phasing SNP chip data and contrasting haplotype effects for previously detected QTL can be used for fine mapping to identify potential causal mutations and candidate genes. Genomic variants on genes MTERF2, RTBC, miRNA ENSBTAG00000037306, ENSBTAG00000040351, PRKDC, and RGS20, which are known to have influence on reproductive biological processes, were detected.

Recombinational Switching of the Clostridium difficile S-Layer and a Novel Glycosylation Gene Cluster Revealed by Large-Scale Whole-Genome Sequencing

The Journal of Infectious Diseases ◽

10.1093/infdis/jis734 ◽

2012 ◽

Vol 207 (4) ◽

pp. 675-686 ◽

Cited By ~ 58

Author(s):

Kate E. Dingle ◽

Xavier Didelot ◽

M. Azim Ansari ◽

David W. Eyre ◽

Alison Vaughan ◽

...

Keyword(s):

Clostridium Difficile ◽

Whole Genome Sequencing ◽

Gene Cluster ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome

Whole-Genome Sequencing for Routine Pathogen Surveillance in Public Health: a Population Snapshot of Invasive Staphylococcus aureus in Europe

mBio ◽

10.1128/mbio.00444-16 ◽

2016 ◽

Vol 7 (3) ◽

Cited By ~ 123

Author(s):

David M. Aanensen ◽

Edward J. Feil ◽

Matthew T. G. Holden ◽

Janina Dordel ◽

Corin A. Yeats ◽

...

Keyword(s):

Public Health ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Bacterial Pathogens ◽

Epidemiological Surveillance ◽

Data Sets ◽

Whole Genome ◽

Bioinformatic Tools ◽

Road Map

Abstract. The implementation of routine whole-genome sequencing (WGS) promises to transform our ability to monitor the emergence and spread of bacterial pathogens. Here we combined WGS data from 308 invasive Staphylococcus aureus isolates corresponding to a pan-European population snapshot, with epidemiological and resistance data. Geospatial visualization of the data is made possible by a generic software tool designed for public health purposes that is available at the project URL (http://www.microreact.org/project/EkUvg9uY?tt=rc). Our analysis demonstrates that high-risk clones can be identified on the basis of population level properties such as clonal relatedness, abundance, and spatial structuring and by inferring virulence and resistance properties on the basis of gene content. We also show that in silico predictions of antibiotic resistance profiles are at least as reliable as phenotypic testing. We argue that this work provides a comprehensive road map illustrating the three vital components for future molecular epidemiological surveillance: (i) large-scale structured surveys, (ii) WGS, and (iii) community-oriented database infrastructure and analysis tools. Importance: The spread of antibiotic-resistant bacteria is a public health emergency of global concern, threatening medical intervention at every level of health care delivery. Several recent studies have demonstrated the promise of routine whole-genome sequencing (WGS) of bacterial pathogens for epidemiological surveillance, outbreak detection, and infection control. However, as this technology becomes more widely adopted, the key challenges of generating representative national and international data sets and the development of bioinformatic tools to manage and interpret the data become increasingly pertinent. This study provides a road map for the integration of WGS data into routine pathogen surveillance. We emphasize the importance of large-scale routine surveys to provide the population context for more targeted or localized investigation and the development of open-access bioinformatic tools to provide the means to combine and compare independently generated data with publicly available data sets.

Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework

Nucleic Acids Research ◽

10.1093/nar/gkx019 ◽

2017 ◽

pp. gkx019 ◽

Cited By ~ 6

Author(s):

Miaoxin Li ◽

Jiang Li ◽

Mulin Jun Li ◽

Zhicheng Pan ◽

Jacob Shujui Hsu ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome ◽

Integrative Framework ◽

Downstream Analysis

Fast and inexpensive whole genome sequencing library preparation from intact yeast cells

10.1101/2020.09.03.280990 ◽

2020 ◽

Author(s):

Sibylle C Vonesch ◽

Shengdi Li ◽

Chelsea Szu Tu ◽

Bianca P Hennig ◽

Nikolay Dobrev ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Dna ◽

Large Scale ◽

Massively Parallel Sequencing ◽

Yeast Cells ◽

Whole Genome ◽

High Quality ◽

Rapid Preparation ◽

Yeast Cultures

Abstract. Through the increase in the capacity of sequencing machines, massively parallel sequencing of thousands of samples in a single run is now possible. With the improved throughput and resulting drop in the price of sequencing, the cost and time for preparation of sequencing libraries have become the major bottleneck in large-scale experiments. Methods using a hyperactive variant of the Tn5 transposase efficiently generate libraries starting from cDNA or genomic DNA in a few hours and are highly scalable. For genome sequencing, however, the time and effort spent on genomic DNA isolation limits the practicability of sequencing large numbers of samples. Here, we describe a highly scalable method for preparing high-quality whole-genome sequencing libraries directly from yeast cultures in less than three hours at 34 cents per sample. We skip the rate-limiting step of genomic DNA extraction by directly tagmenting yeast spheroplasts and add a nucleosome release step prior to enrichment PCR to improve the evenness of genomic coverage. Resulting libraries do not show any GC-bias and are comparable in quality to libraries processed from genomic DNA with a commercially available Tn5-based kit. We use our protocol to investigate CRISPR/Cas9 on- and off-target edits and reliably detect edited variants and shared polymorphisms between strains. Our protocol enables rapid preparation of unbiased and high-quality, sequencing-ready indexed libraries for hundreds of yeast strains in a single day at a low price. By adjusting individual steps of our workflow we expect that our protocol can be adapted to other organisms.
