First de novo whole genome sequencing and assembly of the bar-headed goose

PeerJ ◽

10.7717/peerj.8914 ◽

2020 ◽

Vol 8 ◽

pp. e8914 ◽

Cited By ~ 1

Author(s):

Wen Wang ◽

Fang Wang ◽

Rongkai Hao ◽

Aizhen Wang ◽

Kirill Sharshov ◽

...

Keyword(s):

High Altitude ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Gene Prediction ◽

Repetitive Sequences ◽

Gene Families ◽

Whole Genome ◽

Sequencing Data

Background: The bar-headed goose (Anser indicus) mainly inhabits the plateau wetlands of Asia. As a specialized high-altitude species, it migrates between South and Central Asia, crossing the Himalayas twice a year along the Central Asian flyway. The physiological, biochemical and behavioral adaptations of bar-headed geese to living and flying at high altitude have attracted much interest. However, no genome assembly for the bar-headed goose has been publicly available to date. Methods: In this study, we present the first de novo whole genome sequencing and assembly of the bar-headed goose, along with gene prediction and annotation. Results: 10X Genomics sequencing produced a total of 124 Gb of sequencing data, covering the estimated genome size of the bar-headed goose 103 times (average coverage). The genome assembly comprised 10,528 scaffolds, with a total length of 1.143 Gb and a scaffold N50 of 10.09 Mb. Annotation of the assembly identified a total of 102 Mb (8.9%) of repetitive sequences, 16,428 protein-coding genes, and 282 tRNAs. In total, we identified 63 expanded and 20 contracted gene families in the bar-headed goose compared with the other 15 vertebrates. We also performed a positive selection analysis between the bar-headed goose and a closely related low-altitude goose, the swan goose (Anser cygnoides), to uncover its genetic adaptations to the Qinghai-Tibetan Plateau. Conclusion: We report the most complete genome sequence of the bar-headed goose to date. Our assembly provides a valuable resource for further studies of gene function in the bar-headed goose. The data will also facilitate studies of the evolution, population genetics and high-altitude adaptations of the bar-headed goose at the genomic level.
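The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the quoted numbers, not the authors' pipeline; the function and variable names are illustrative assumptions.

```python
# Figures quoted in the abstract above.
sequencing_data_gb = 124.0   # total 10X Genomics sequencing data, in Gb
average_coverage = 103.0     # reported average coverage
assembly_length_gb = 1.143   # total scaffold length, in Gb
repeats_mb = 102.0           # annotated repetitive sequence, in Mb


def implied_genome_size_gb(data_gb, coverage):
    """Genome size implied by total data divided by average coverage."""
    return data_gb / coverage


def percent_of_assembly(feature_mb, assembly_gb):
    """Share of the assembly (in percent) occupied by a feature given in Mb."""
    return 100.0 * feature_mb / (assembly_gb * 1000.0)


genome_size_gb = implied_genome_size_gb(sequencing_data_gb, average_coverage)
repeat_pct = percent_of_assembly(repeats_mb, assembly_length_gb)

print(f"implied genome size: {genome_size_gb:.2f} Gb")        # 1.20 Gb
print(f"repeat content: {repeat_pct:.1f}% of the assembly")   # 8.9%
```

The implied genome size of about 1.20 Gb is slightly larger than the 1.143 Gb assembly, and the repeat share reproduces the 8.9% stated in the abstract.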


Norgal: extraction and de novo assembly of mitochondrial DNA from whole-genome sequencing data

BMC Bioinformatics ◽

10.1186/s12859-017-1927-y ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Kosai Al-Nakeeb ◽

Thomas Nordahl Petersen ◽

Thomas Sicheritz-Pontén

Keyword(s):

Mitochondrial DNA ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data


De novo indels within introns contribute to ASD incidence

10.1101/137471 ◽

2017 ◽

Cited By ~ 2

Author(s):

Adriana Munoz ◽

Boris Yamrom ◽

Yoon-ha Lee ◽

Peter Andrews ◽

Steven Marks ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Target Genes ◽

De Novo ◽

Whole Genome Sequencing Data ◽

P Value ◽

Whole Genome ◽

Sequencing Data ◽

Control Sets ◽

The Difference

Abstract: Copy number profiling and whole-exome sequencing have allowed us to make remarkable progress in our understanding of the genetics of autism over the past ten years, but major aspects of the genetics remain unresolved. Whole-genome sequencing makes additional types of genetic variants observable. These variants are abundant, and determining which are functional is challenging. We have analyzed whole-genome sequencing data from 510 of the Simons Simplex Collection quad families and focused our attention on intronic variants. Within the introns of 546 high-quality autism target genes, we identified 63 de novo indels in the affected and only 37 in the unaffected siblings. The difference of 26 events is significantly larger than expected (p-value = 0.01), and a reasonable extrapolation shows that de novo intronic indels can contribute to at least 10% of simplex autism. The significance increases if we restrict the analysis to the half of the autism targets that are intolerant to damaging variants in the normal human population, a half we expect to be even more enriched for autism genes. For these 273 targets we observe 43 and 20 events in affected and unaffected siblings, respectively (p-value = 0.005). There was no significant signal in the number of de novo intronic indels in any of the control sets of genes analyzed. We see no signal from de novo substitutions in the introns of target genes.
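One way to get a feel for the reported event counts is an exact two-sided binomial (sign) test, under the assumption that a de novo indel is equally likely to fall in the affected or the unaffected sibling. This is an illustrative sketch only: the abstract does not state which test the authors used, and the function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided_sign_test(affected, unaffected):
    """Two-sided exact test of an equal split of events between siblings."""
    n = affected + unaffected
    k = max(affected, unaffected)
    # With p = 0.5 the distribution is symmetric, so doubling one tail is exact.
    return min(1.0, 2.0 * binomial_upper_tail(k, n))


p_all_targets = two_sided_sign_test(63, 37)   # all 546 target genes
p_intolerant = two_sided_sign_test(43, 20)    # the 273 intolerant targets

print(f"all targets:        p = {p_all_targets:.4f}")
print(f"intolerant targets: p = {p_intolerant:.4f}")
```

Under this assumed null model both p-values fall below 0.05, and the restricted gene set gives the smaller value, in line with the direction of the effect described in the abstract.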


Harmonization of whole-genome sequencing for outbreak surveillance of Enterobacteriaceae and Enterococci

Microbial Genomics ◽

10.1099/mgen.0.000567 ◽

2021 ◽

Vol 7 (7) ◽

Author(s):

Casper Jamin ◽

Sien De Koster ◽

Stefanie van Koeveringe ◽

Dieter De Coninck ◽

Klaas Mensaert ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Type Species ◽

De Novo ◽

Whole Genome ◽

Data Generation ◽

Sequencing Data ◽

Content Type ◽

Link Type ◽

Antimicrobial Resistance Genes

Whole-genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. However, interoperability for WGS of bacterial outbreaks is poorly understood. We hypothesized that harmonization of WGS for outbreak surveillance is achievable through the use of identical protocols for both data generation and data analysis. A set of 30 bacterial isolates, comprising various species belonging to the family Enterobacteriaceae and the genus Enterococcus, were selected and sequenced using the same protocol on the Illumina MiSeq platform in each individual centre. All generated sequencing data were analysed by one centre using BioNumerics (6.7.3) for (i) genotyping of origins of replication and antimicrobial resistance genes, and (ii) core-genome multi-locus sequence typing (cgMLST) for Escherichia coli and Klebsiella pneumoniae and whole-genome multi-locus sequence typing (wgMLST) for all species. Additionally, a split k-mer analysis was performed to determine the number of SNPs between samples. A precision of 99.0% and an accuracy of 99.2% were achieved for genotyping. Based on cgMLST, a discrepant allele was called in only 2/27 and 3/15 comparisons between two genomes for E. coli and K. pneumoniae, respectively. Based on wgMLST, the number of discrepant alleles ranged from 0 to 7 (average 1.6). For SNPs, this ranged from 0 to 11 SNPs (average 3.4). Furthermore, we demonstrate that using different de novo assemblers to analyse the same dataset introduces up to 150 SNPs, which surpasses most thresholds for bacterial outbreaks. This shows the importance of harmonizing data processing for surveillance of bacterial outbreaks. In summary, multi-centre WGS for bacterial surveillance is achievable, but only if protocols are harmonized.
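The notion of a "discrepant allele" between two genomes can be made concrete with a small sketch. It assumes each genome is represented as a mapping from locus name to called allele number, with missing calls recorded as None and skipped; the locus names, allele numbers and function name are hypothetical and do not come from the study or from BioNumerics.

```python
def discrepant_alleles(profile_a, profile_b):
    """Count loci called in both profiles whose allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical cgMLST profiles of one isolate sequenced at two centres.
centre_1 = {"locus_001": 4, "locus_002": 17, "locus_003": 2, "locus_004": None}
centre_2 = {"locus_001": 4, "locus_002": 18, "locus_003": 2, "locus_004": 9}

print(discrepant_alleles(centre_1, centre_2))  # 1 (only locus_002 differs)
```

Ignoring loci with a missing call is one common design choice; counting them as differences would inflate the distance between technical replicates.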


Whole-genome sequencing of 182 Bursaphelenchus xylophilus strains generates first long read based de novo genome assembly and reveals temperature associated population structure

10.22541/au.159352211.19983305 ◽

2020 ◽

Author(s):

Xiaolei Ding ◽

Yunfei Guo ◽

Jianren Ye ◽

Xiaoqin Wu ◽

Sixi Lin ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Bursaphelenchus Xylophilus ◽

Whole Genome ◽

De Novo Genome Assembly ◽

Long Read


Genome Wide Variant Analysis of Simplex Autism Families with an Integrative Clinical-Bioinformatics Pipeline

10.1101/019208 ◽

2015 ◽

Author(s):

Laura T Jiménez-Barrón ◽

Jason A O'Rawe ◽

Yiyang Wu ◽

Margaret Yoon ◽

Han Fang ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Autism Spectrum ◽

Repetitive Behaviors ◽

Whole Genome ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Bioinformatics Tools ◽

Genome Wide

Autism spectrum disorders (ASD) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. A large body of evidence now suggests a complex role of genetics in ASD, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, they generally analyze a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and applied an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on three simplex autism families, each consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and extensively phenotyped. Genotyping arrays and whole genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that segregated according to de novo, autosomal recessive, X-linked, mitochondrial and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous CNVs, a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole genome sequencing data can generate reliable results for use in downstream investigations. We are moving to implement our framework for the analysis and study of larger cohorts of families, where statistical rigor can accompany genetic findings.
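Two of the transmission models named above can be illustrated with a deliberately simplified genotype check. The sketch assumes genotypes are encoded as alternate-allele counts (0, 1 or 2) at an autosomal site and ignores genotype quality, X-linked, mitochondrial and compound heterozygote models; it is not the pipeline described in the abstract, and the variant names are hypothetical.

```python
def classify_transmission(child, mother, father):
    """Classify one autosomal variant from alternate-allele counts (0, 1, 2)."""
    if child >= 1 and mother == 0 and father == 0:
        # Alternate allele seen in the child but in neither parent.
        return "de novo"
    if child == 2 and mother == 1 and father == 1:
        # Child homozygous, both parents heterozygous carriers.
        return "autosomal recessive"
    return "other"


# Hypothetical trio calls: (child, mother, father).
trio_calls = {
    "variant_1": (1, 0, 0),
    "variant_2": (2, 1, 1),
    "variant_3": (1, 1, 0),
}

labels = {name: classify_transmission(*gt) for name, gt in trio_calls.items()}
print(labels)
```

In practice, apparent de novo calls are dominated by genotyping errors unless read depth and genotype quality in all three family members are also filtered.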


Whole genome sequencing reveals high complexity of copy number variation at insecticide resistance loci in malaria mosquitoes

10.1101/399568 ◽

2018 ◽

Cited By ~ 2

Author(s):

Eric R. Lucas ◽

Alistair Miles ◽

Nicholas J. Harding ◽

Chris S. Clarkson ◽

Mara K. N. Lawniczak ◽

...

Keyword(s):

Insecticide Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Gene Families ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Metabolic Resistance ◽

Extended Haplotype

Abstract Background: Polymorphisms in the copy number of a genetic region can influence gene expression, coding sequence and zygosity, making them powerful actors in the evolutionary process. Copy number variants (CNVs) are however understudied, being more difficult to detect than single nucleotide polymorphisms. We take advantage of the intense selective pressures on the major malaria vector Anopheles gambiae, caused by the widespread use of insecticides for malaria control, to investigate the role of CNVs in the evolution of insecticide resistance. Results: Using the whole-genome sequencing data from 1142 samples in the An. gambiae 1000 genomes project, we identified 1557 independent increases in copy number, encompassing a total of 267 genes, which were enriched for gene families linked to metabolic insecticide resistance. The five major candidate genes for metabolic resistance were all found in at least one CNV, and were often the target of multiple independent CNVs, reaching as many as 16 CNVs in Cyp9k1. These CNVs have furthermore been spreading due to positive selection, indicated by high local CNV frequencies and extended haplotype homozygosity. Conclusions: Our results demonstrate the importance of CNVs in the response to selection, with CNVs being closely associated with genes involved in the evolution of resistance to insecticides, highlighting the urgent need to identify their relative contributions to resistance and to track their spread as the application of insecticide in malaria endemic countries intensifies. Our detailed descriptions of CNVs found across the species range provide the tools to do so.


Whole genome sequencing and assembly of a Caenorhabditis elegans genome with complex genomic rearrangements using the MinION sequencing device

10.1101/099143 ◽

2017 ◽

Cited By ~ 12

Author(s):

JR Tyson ◽

NJ O’Neil ◽

M Jain ◽

HE Olsen ◽

P Hieter ◽

...

Keyword(s):

Caenorhabditis Elegans ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Assembly ◽

De Novo ◽

Sequence Data ◽

Genomic Rearrangements ◽

Whole Genome ◽

C Elegans ◽

Long Reads

Abstract: Advances in third-generation sequencing have opened new possibilities for ‘benchtop’ whole genome sequencing. The MinION is a portable device that uses nanopore technology and can sequence long DNA molecules. MinION long reads are well suited for sequencing and de novo assembly of complex genomes with large repetitive elements. Long reads also facilitate the identification of complex genomic rearrangements such as those observed in tumor genomes. To assess the feasibility of the de novo assembly of large complex genomes using both MinION and Illumina platforms, we sequenced the genome of a Caenorhabditis elegans strain that contains a complex acetaldehyde-induced rearrangement and a biolistic bombardment-mediated insertion of a GFP-containing plasmid. Using ∼5.8 gigabases of MinION sequence data, we were able to assemble a C. elegans genome containing 145 contigs (N50 contig length = 1.22 Mb) that covered >99% of the 100,286,401 bp reference genome. In contrast, using ∼8.04 gigabases of Illumina sequence data, we were able to assemble a C. elegans genome in 38,645 contigs (N50 contig length = ∼26 kb) containing 117 Mb. From the MinION genome assembly we identified the complex structures of both the acetaldehyde-induced mutation and the biolistic-mediated insertion. To date, this is the largest genome to be assembled exclusively from MinION data and is the first demonstration that the long reads of MinION sequencing can be used for whole genome assembly of large (100 Mb) genomes and the elucidation of complex genomic rearrangements.
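The abstract compares assemblies by N50 contig length. As a reminder of what that statistic measures, here is a minimal sketch using the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly). The toy contig lengths are made up; only the data volumes and the reference length come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly: total length 30, so half is 15; 10 + 8 = 18 reaches it.
toy_contigs = [2, 4, 6, 8, 10]
print(n50(toy_contigs))  # 8

# Approximate read depth implied by the figures quoted in the abstract.
reference_bp = 100_286_401
minion_depth = 5.8e9 / reference_bp
illumina_depth = 8.04e9 / reference_bp
print(f"MinION ~{minion_depth:.0f}x, Illumina ~{illumina_depth:.0f}x")
```

The depth figures suggest that the far higher contiguity of the MinION assembly reflects read length rather than sequencing depth, since the Illumina data set was the deeper of the two.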


De novo whole genome sequencing data of two mangrove-isolated microalgae from Terengganu coastal waters

Data in Brief ◽

10.1016/j.dib.2019.104680 ◽

2019 ◽

Vol 27 ◽

pp. 104680 ◽

Cited By ~ 2

Author(s):

Kit Yinn Teh ◽

C.L.Wan Afifudeen ◽

Ahmad Aziz ◽

Li Lian Wong ◽

Saw Hong Loh ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Coastal Waters ◽

De Novo ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data


Harmonization of whole genome sequencing for outbreak surveillance of Enterobacteriaceae and Enterococci

10.1101/2020.11.20.392399 ◽

2020 ◽

Author(s):

Casper Jamin ◽

Sien de Koster ◽

Stefanie van Koeveringe ◽

Dieter de Coninck ◽

Klaas Mensaert ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Illumina MiSeq ◽

Whole Genome ◽

Sequencing Data ◽

Bacterial Typing ◽

E Coli ◽

Antimicrobial Resistance Genes ◽

MiSeq Platform

Abstract: Whole genome sequencing (WGS) is becoming the de facto standard for bacterial typing and outbreak surveillance of resistant bacterial pathogens. We performed a three-center ring trial to assess whether inter-laboratory harmonization of WGS is achievable for this purpose. To this end, a set of 30 bacterial isolates comprising various species belonging to the family Enterobacteriaceae and the genus Enterococcus were selected and sequenced using the same protocol on the Illumina MiSeq platform in each individual centre. All generated sequencing data were analysed by one centre using BioNumerics (6.7.3) for (i) genotyping of origins of replication and antimicrobial resistance genes, and (ii) core-genome multi-locus sequence typing (cgMLST) for E. coli and K. pneumoniae and whole-genome multi-locus sequence typing (wgMLST) for all species. Additionally, a split k-mer analysis was performed to determine the number of SNPs between samples. A precision of 99.0% and an accuracy of 99.2% were achieved for genotyping. Based on cgMLST, a discrepant allele was called in only 2/27 and 3/15 comparisons between two genomes for E. coli and K. pneumoniae, respectively. Based on wgMLST, the number of discrepant alleles ranged from 0 to 7 (average 1.6). For SNPs, this ranged from 0 to 11 SNPs (average 3.4). Furthermore, we demonstrate that using different de novo assemblers to analyse the same dataset introduces up to 150 SNPs, which surpasses most thresholds for bacterial outbreaks. This shows the importance of harmonising data processing for surveillance of bacterial outbreaks. In summary, multi-center WGS for bacterial surveillance is achievable, but only if protocols are harmonized.
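The genotyping results above are summarized as precision and accuracy. The sketch below shows the standard definitions of both metrics from confusion-matrix counts; the counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying numbers or state exactly how the metrics were computed.

```python
def precision(true_pos, false_pos):
    """Fraction of positive calls that are correct."""
    return true_pos / (true_pos + false_pos)


def accuracy(true_pos, true_neg, false_pos, false_neg):
    """Fraction of all calls, positive and negative, that are correct."""
    total = true_pos + true_neg + false_pos + false_neg
    return (true_pos + true_neg) / total


# Hypothetical gene presence/absence calls compared against a reference call set.
tp, tn, fp, fn = 99, 24, 1, 1

example_precision = precision(tp, fp)
example_accuracy = accuracy(tp, tn, fp, fn)
print(f"precision = {example_precision:.3f}, accuracy = {example_accuracy:.3f}")
```

Precision ignores true negatives entirely, so the two figures can diverge when most genes are absent from most isolates.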


De novo ZIC2 frameshift variant associated with frontonasal dysplasia in a Limousin calf

BMC Genomics ◽

10.1186/s12864-020-07350-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Marina Braun ◽

Annika Lehmbecker ◽

Deborah Eikelberg ◽

Maren Hellige ◽

Andreas Beineke ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

De Novo Mutation ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Craniofacial Malformations ◽

Frontonasal Dysplasia ◽

Affected Calf

Abstract Background: Bovine frontonasal dysplasias like arhinencephaly, synophthalmia, cyclopia and anophthalmia are sporadic congenital facial malformations. In this study, computed tomography, necropsy, histopathological examinations and whole genome sequencing on an Illumina NextSeq500 were performed to characterize a stillborn Limousin calf with frontonasal dysplasia. To identify private genetic and structural variants, we screened whole genome sequencing data of the affected calf and unaffected relatives, including the parents, a maternal half-sibling and a paternal half-sibling. Results: The stillborn calf exhibited severe craniofacial malformations. Nose and maxilla were absent, mandibles were upwardly curved and a median cleft palate was evident. Eyes, optic nerve and orbital cavities were not developed and the rudimentary orbita showed hypotelorism. A central defect in the front skull, covered with a membrane, extended into the intracranial cavity. Aprosencephaly affected telencephalic and diencephalic structures and cerebellum. In addition, a shortened tail was seen. Filtering whole genome sequencing data revealed a private frameshift variant within the candidate gene ZIC2 in the affected calf. This variant was heterozygous mutant in this case and homozygous wild type in parents, half-siblings and controls. Conclusions: We found a novel ZIC2 frameshift mutation in an aprosencephalic Limousin calf. This variant most likely originated from a de novo mutation in the germline of one parent or during very early embryonic development. To the authors’ best knowledge, this is the first identified mutation in cattle associated with bovine frontonasal dysplasia.
