Analysis of Genomic Sequence Data Reveals the Origin and Evolutionary Separation of Hawaiian Hoary Bat Populations

Corinna A Pinzari; Lin Kang; Pawel Michalak; Lars S Jermiin; Donald K Price; Frank J Bonaccorso

doi:10.1093/gbe/evaa137

Genome Biology and Evolution ◽

10.1093/gbe/evaa137 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1504-1514

Author(s):

Corinna A Pinzari ◽

Lin Kang ◽

Pawel Michalak ◽

Lars S Jermiin ◽

Donald K Price ◽

...

Keyword(s):

De Novo ◽

Genomic Sequence ◽

Sequence Data ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Whole Genome Analysis ◽

Lasiurus Cinereus ◽

Conservation Concern ◽

Genome Profiles ◽

Hoary Bats

We examine the genetic history and population status of Hawaiian hoary bats (Lasiurus semotus), the most isolated bats on Earth, and their relationship to northern hoary bats (Lasiurus cinereus), through whole-genome analysis of single-nucleotide polymorphisms mapped to a de novo-assembled reference genome. Profiles of genomic diversity and divergence indicate that Hawaiian hoary bats are distinct from northern hoary bats, and form a monophyletic group, indicating a single ancestral colonization event 1.34 Ma, followed by substantial divergence between islands beginning 0.51 Ma. Phylogenetic analysis indicates Maui is central to the radiation across the archipelago, with southward expansion to Hawai‘i and westward expansion to O‘ahu and Kaua‘i. Because this endangered species is of conservation concern, a clearer understanding of the population genetic structure of this bat in the Hawaiian Islands is of timely importance.

Defects in pyruvate kinase cause a conditional increase of thiamine synthesis in Salmonella typhimurium

Canadian Journal of Microbiology ◽

10.1139/w99-042 ◽

1999 ◽

Vol 45 (7) ◽

pp. 565-572 ◽

Cited By ~ 7

Author(s):

Todd Christian ◽

Diana M Downs

Keyword(s):

Pyruvate Kinase ◽

De Novo ◽

Genomic Sequence ◽

Sequence Data ◽

Purine Biosynthesis ◽

Phenotypic Analysis ◽

Metabolic Defects ◽

Genes Encoding ◽

Metabolic Interactions

As genomic sequence data become more prevalent, the challenges in microbial physiology shift from identifying biochemical pathways to understanding the interactions that occur between them to create a robust but responsive metabolism. One of the most powerful methods to identify such interactions is in vivo phenotypic analysis. We have utilized thiamine synthesis as a model to detect subtle metabolic interactions due to the sensitivity allowed by the small cellular requirement for this vitamin. Although purine biosynthesis produces an intermediate in thiamine synthesis, mutants blocked in the first step of de novo purine biosynthesis (PurF) are able to grow in the absence of thiamine owing to an alternative synthesis. A number of general metabolic defects have been found to prevent PurF-independent thiamine synthesis. Here we report stimulation of thiamine-independent growth caused by a mutation in one or both genes encoding the pyruvate kinase isozymes. The results presented herein represent the first phenotype described for mutants defective in pykA or pykF, and thus identify metabolic interactions that exist in vivo. Key words: thiamine synthesis, metabolic integration.

One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008678 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008678

Author(s):

Carlos Valiente-Mullor ◽

Beatriz Beamud ◽

Iván Ansari ◽

Carlos Francés-Cuesta ◽

Neris García-González ◽

...

Keyword(s):

Legionella Pneumophila ◽

Phylogenetic Trees ◽

High Throughput Sequencing ◽

Reference Genome ◽

Sequence Data ◽

Genetic Distances ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Recombination Rates ◽

Almost All

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species were selected as reference sequences, and read alignment statistics, SNP calling, recombination rates, dN/dS ratios, and phylogenetic trees were evaluated depending on the mapping reference. The choice of different reference genomes proved to have an impact on almost all the parameters considered in the five species. In addition, these biases had potential epidemiological implications such as including/excluding isolates of particular clades and the estimation of genetic distances. These findings suggest that the single reference approach might introduce systematic errors during mapping that affect subsequent analyses, particularly for data sets with isolates from genetically diverse backgrounds. In any case, exploring the effects of different references on the final conclusions is highly recommended.

Metagenome-Derived Draft Genome Sequence of Acidithiobacillus ferrooxidans RV1 from an Abandoned Gold Tailing in Neuquén, Argentina

Solid State Phenomena ◽

10.4028/www.scientific.net/ssp.262.439 ◽

2017 ◽

Vol 262 ◽

pp. 439-442 ◽

Cited By ~ 1

Author(s):

Ricardo Ulloa ◽

Ana Moya-Beltrán ◽

Francisco Issotta ◽

Harold Nuñez ◽

Paulo C. Covarrubias ◽

...

Keyword(s):

Dna Hybridization ◽

Microbial Consortium ◽

De Novo ◽

Genomic Sequence ◽

Enrichment Culture ◽

Draft Genome ◽

Genomic Diversity ◽

Average Nucleotide Identity ◽

Lime Treatment ◽

Genetic Traits

In this work we report the metagenome-derived draft genomic sequence of an enrichment culture dominated by A. ferrooxidans obtained from an airlift bioreactor inoculated with the microbial consortium recovered from the “Relave Viejo” tailing. The genome of this culture was assembled de novo and by reference, generating a consensus assembly of 3.0 Mb. On the basis of 16S rRNA (100% identity), average nucleotide identity analysis (99.33% identity) and in silico DNA-DNA hybridization against A. ferrooxidans ATCC 23270T (97.9%), the recovered genome is confirmed to belong to the species A. ferrooxidans. Comparative genomics results are presented to uncover the genetic traits of the variant surviving lime treatment and to further explore the genomic diversity of this model iron-oxidizing species.

New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

10.1101/003764 ◽

2014 ◽

Cited By ~ 4

Author(s):

Michael C Schatz ◽

Lyza G Maron ◽

Joshua C Stein ◽

Alejandro Hernandez Wences ◽

James Gurtowski ◽

...

Keyword(s):

Structural Variation ◽

De Novo ◽

Sequence Data ◽

Biological Properties ◽

Genomic Diversity ◽

Reference Sequence ◽

Human Populations ◽

Whole Genome ◽

Rice Varieties ◽

Assembly Technology

The use of high-throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

A hybrid pipeline for reconstruction and analysis of viral genomes at multi-organ level

GigaScience ◽

10.1093/gigascience/giaa086 ◽

2020 ◽

Vol 9 (8) ◽

Cited By ~ 4

Author(s):

Diogo Pratas ◽

Mari Toppinen ◽

Lari Pyöriä ◽

Klaus Hedman ◽

Antti Sajantila ◽

...

Keyword(s):

De Novo ◽

Sequence Data ◽

Ex Vivo ◽

Genomic Variation ◽

Nucleotide Polymorphisms ◽

Sensitive Data ◽

Viral Genomes ◽

Sequencing Technologies ◽

Multiple Organs ◽

Research Perspectives

Background: Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical, yet the complexity of such analysis calls for a dedicated pipeline. We provide an automatic and efficient pipeline for identification, assembly, and analysis of viral genomes that combines the DNA sequence data from multiple organs. TRACESPipe relies on cooperation among 3 modalities: compression-based prediction, sequence alignment, and de novo assembly. The pipeline is ultra-fast and provides, additionally, secure transmission and storage of sensitive data. Findings: TRACESPipe performed outstandingly when tested on synthetic and ex vivo datasets, identifying and reconstructing all the viral genomes, including those with high levels of single-nucleotide polymorphisms. It also detected minimal levels of genomic variation between different organs. Conclusions: TRACESPipe’s unique ability to simultaneously process and analyze samples from different sources enables the evaluation of within-host variability. This opens up the possibility to investigate viral tissue tropism, evolution, fitness, and disease associations. Moreover, additional features such as DNA damage estimation and mitochondrial DNA reconstruction and analysis, as well as exogenous-source controls, expand the utility of this pipeline to other fields such as forensics and ancient DNA studies. TRACESPipe is released under GPLv3 and is available for free download at https://github.com/viromelab/tracespipe.

Complete Genome Sequences from Three Genetically Distinct Strains Reveal High Intraspecies Genetic Diversity in the Microsporidian Encephalitozoon cuniculi

Eukaryotic Cell ◽

10.1128/ec.00312-12 ◽

2013 ◽

Vol 12 (4) ◽

pp. 503-511 ◽

Cited By ~ 33

Author(s):

Jean-François Pombert ◽

Jinshan Xu ◽

David R. Smith ◽

David Heiman ◽

Sarah Young ◽

...

Keyword(s):

Genetic Diversity ◽

De Novo ◽

Genomic Diversity ◽

Nucleotide Polymorphisms ◽

Entire Genome ◽

Content Type ◽

Large Numbers ◽

Intracellular Parasites ◽

Intergenic Regions ◽

Identical Gene

Microsporidia from the Encephalitozoonidae are obligate intracellular parasites with highly conserved and compacted nuclear genomes: they have few introns, short intergenic regions, and almost identical gene complements and chromosome arrangements. Comparative genomics of Encephalitozoon and microsporidia in general have focused largely on the genomic diversity between different species, and we know very little about the levels of genetic diversity within species. Polymorphism studies with Encephalitozoon are so far restricted to a small number of genes, and a few genetically distinct strains have been identified; most notably, three genotypes (ECI, ECII, and ECIII) of the model species E. cuniculi have been identified based on variable repeats in the rRNA internal transcribed spacer (ITS). To determine if E. cuniculi genotypes are genetically distinct lineages across the entire genome and at the same time to examine the question of intraspecies genetic diversity in microsporidia in general, we sequenced de novo genomes from each of the three genotypes and analyzed patterns of single nucleotide polymorphisms (SNPs) and insertions/deletions across the genomes. Although the strains have almost identical gene contents, they harbor large numbers of SNPs, including numerous nonsynonymous changes, indicating massive intraspecies variation within the Encephalitozoonidae. Based on this diversity, we conclude that the recognized genotypes are genetically distinct and propose new molecular markers for microsporidian genotyping.

Single Nucleotide Polymorphisms Caused by Assembly Errors

Genomics Insights ◽

10.4137/gei.s3653 ◽

2010 ◽

Vol 3 ◽

pp. GEI.S3653

Author(s):

Jürgen Kleffe ◽

Robert Weißmann ◽

Florian F. Schmitzberger

Keyword(s):

Single Nucleotide Polymorphisms ◽

De Novo ◽

Sequence Data ◽

Bacterial Genome ◽

Sequence Assembly ◽

Nucleotide Polymorphisms ◽

Base Pairs ◽

Single Nucleotide ◽

Current Sequence ◽

Assembly Algorithms

We compare the results of three different assembler programs, Celera, Phrap and Mira2, for the same set of about a hundred thousand Sanger reads derived from an unknown bacterial genome. In contrast to previous assembly comparisons, we do not focus on speed of computation and numbers of assembled contigs but on how the different sequence assemblies agree by content. Threefold consistently assembled genome regions are identified in order to estimate a lower bound of erroneously identified single nucleotide polymorphisms (SNPs) caused by nothing but the process of mathematical sequence assembly. We identified 509 sequence triplets common to all three de novo assemblies spanning only 34% (3.3 Mb) of the bacterial genome, with 175 of these regions (~1.5 Mb) including erroneous SNPs and insertions/deletions. Within these triplets this leads on average to one error per 7,155 base pairs. Replacing the assembler Mira2 by the most recent version Mira3, the latter number even drops to 5,923. Our results therefore suggest that a considerably high number of erroneous SNPs may be present in current sequence data and mathematicians should urgently take up research on numerical stability of sequence assembly algorithms. Furthermore, even the latest versions of currently used assemblers produce erroneous SNPs that depend on the order in which reads are supplied as input. Such errors will severely hamper molecular diagnostics as well as relating genome variation and disease. This issue needs to be addressed urgently as the field is moving fast into clinical applications.

Mycobacterium tuberculosis complex lineage 5 exhibits high levels of within-lineage genomic diversity and differing gene content compared to the type strain H37Rv

10.1101/2020.06.22.164186 ◽

2020 ◽

Author(s):

C. N’Dira Sanoussi ◽

Mireia Coscolla ◽

Boatema Ofori-Anyinam ◽

Isaac Darko Otchere ◽

Martin Antonio ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Reference Genome ◽

De Novo ◽

Genome Mapping ◽

Sequence Data ◽

Gene Diversity ◽

Genomic Analysis ◽

Genomic Diversity ◽

Gene Content ◽

Link Type

Pathogens of the Mycobacterium tuberculosis complex (MTBC) are considered monomorphic, with little gene content variation between strains. Nevertheless, several genotypic and phenotypic factors separate the different MTBC lineages (L), especially L5 and L6 (traditionally termed Mycobacterium africanum), from each other. However, genome variability and gene content especially of L5 and L6 strains have not been fully explored and may be potentially important for pathobiology and current approaches for genomic analysis of MTBC isolates, including transmission studies. We compared the genomes of 358 L5 clinical isolates (including 3 completed genomes and 355 Illumina WGS (whole genome sequenced) isolates) to the L5 complete genomes and H37Rv, and identified multiple genes differentially present or absent between H37Rv and L5 strains. Additionally, considerable gene content variability was found across L5 strains, including a split in the L5.3 sublineage into L5.3.1 and L5.3.2. These gene content differences had a small knock-on effect on transmission cluster estimation, with clustering rates influenced by the selection of reference genome, and with potential over-estimation of recent transmission when using H37Rv as the reference genome. Our data show that the use of H37Rv as reference genome results in missing SNPs in genes unique for L5 strains. This potentially leads to an underestimation of the diversity present in the genome of L5 strains and in turn affects the transmission clustering rates. As such, a full capture of the gene diversity, especially for high resolution outbreak analysis, requires a variation of the single H37Rv-centric reference genome mapping approach currently used in most WGS data analysis pipelines. Moreover, the high within-lineage gene content variability suggests that the pan-genome of M. tuberculosis is at least several kilobases larger than previously thought, implying a concatenated or reference-free genome assembly (de novo) approach may be needed for particular questions. Data summary: Sequence data for the Illumina dataset are available at European Genome-phenome Archive (EGA; https://www.ebi.ac.uk/ega/) under the study accession numbers PRJEB38317 and PRJEB38656. Individual runs accession numbers are indicated in Table S8. PacBio raw reads for the L5 Benin genome are available on the ENA accession SAME3170744. The assembled L5 Benin genome is available on NCBI with accession PRJNA641267. To ensure naming conventions of the genes in the three L5 genomes can be followed, we have uploaded these annotated GFF files to figshare at https://doi.org/10.6084/m9.figshare.12911849.v1. Custom Python scripts used in this analysis can be found at https://github.com/conmeehan/pathophy.

Investigating the likely association between genetic ancestry and COVID-19 manifestation (Preprint)

10.2196/preprints.19312 ◽

2020 ◽

Author(s):

Ranajit Das ◽

Sudeep D Ghate

Keyword(s):

Single Nucleotide Polymorphisms ◽

Genome Wide Association Study ◽

Genomic Sequence ◽

Sequence Data ◽

Antiviral Response ◽

Genetic Ancestry ◽

Recovery Ratio ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

A Genome

The novel coronavirus 2019-nCoV/SARS-CoV-2 infection has shown discernible variability across the globe. While in some countries people are recovering relatively quickly, in others, recovery times have been comparatively longer and the number of deaths has been high. In this study, we aimed to evaluate the likely association between an individual’s ancestry and the extent of COVID-19 manifestation, employing Europeans as the case study. We employed 10,215 ancient and modern genomes across the globe, assessing 597,573 single nucleotide polymorphisms (SNPs). Pearson’s correlation coefficient (r) between various ancestry proportions of European genomes and COVID-19 death/recovery ratio was calculated and its significance was statistically evaluated. We found a significant positive correlation (p=0.03) between European Mesolithic hunter gatherers (WHG) ancestral fractions and COVID-19 death/recovery ratio and a marginally significant negative correlation (p=0.06) between Neolithic Iranian ancestry fractions and COVID-19 death/recovery ratio. We further identified 404 immune response related SNPs by comparing 753 publicly available genomes from various European countries against 838 genomes from various Eastern Asian countries in a genome wide association study (GWAS). Prominently, we identified that SNPs associated with Interferon stimulated antiviral response, Interferon-stimulated gene 15 mediated antiviral mechanism and 2′-5′ oligoadenylate synthase mediated antiviral response show large differences in allele frequencies between Europeans and East Asians. Overall, to the best of our knowledge, this is the first study evaluating the likely association between genetic ancestry and COVID-19 manifestation. While our current findings improve our overall understanding of COVID-19, we note that the development of effective therapeutics will benefit immensely from more detailed analyses of individual genomic sequence data from COVID-19 patients of varied ancestries.
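The correlation step mentioned in this abstract can be illustrated with a minimal sketch. It computes Pearson's r for two equal-length lists of numbers, such as an ancestry fraction and a death/recovery ratio per country. The function name `pearson_r` and the sample values in the usage note are hypothetical and are not data from the study. The significance test reported in the abstract is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r([0.10, 0.25, 0.40], [0.02, 0.05, 0.08])` takes made-up ancestry fractions and made-up ratios and returns a value between -1 and 1.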

Genetic Characterization and Variation of African Swine Fever Virus China/GD/2019 Strain in Domestic Pigs

Pathogens ◽

10.3390/pathogens11010097 ◽

2022 ◽

Vol 11 (1) ◽

pp. 97

Author(s):

Xun Wang ◽

Xiaoying Wang ◽

Xiaoxiao Zhang ◽

Sheng He ◽

Yaosheng Chen ◽

...

Keyword(s):

Genomic Sequence ◽

African Swine Fever Virus ◽

African Swine Fever ◽

Genomic Diversity ◽

Whole Genome ◽

Synonymous Mutation ◽

Nucleotide Polymorphisms ◽

Mutation Site ◽

New Variant ◽

Genotype Ii

African swine fever (ASF) was first introduced into Northern China in 2018 and has spread through China since then. Here, we extracted the viral DNA from blood samples from an ASF outbreak farm in Guangdong province, China, and sequenced the whole genome. We assembled the full-length genomic sequence of this strain, named China/GD/2019. The whole genome was 188,642 bp long (terminal inverted repeats and loops were not sequenced), encoding 175 open reading frames (ORFs). The China/GD/2019 strain belonged to p72 genotype II and p54 genotype IIa. Phylogenetic relationships based on single nucleotide polymorphisms (SNPs) also demonstrated that it grouped into genotype II. A certain number of ORFs, mainly belonging to multigene families (MGFs), were absent in the China/GD/2019 strain in comparison to the China/ASFV/SY-18 strain. A deletion of approximately 1 kb was found in the China/GD/2019 genome, located at the EP153R and EP402R genes, in comparison to the China/2018/AnhuiXCGQ strain. We revealed a synonymous mutation site at gene F317L and a non-synonymous mutation site at gene MGF_360-6L in China/GD/2019 compared with three known Chinese strains. Pair-wise comparison revealed 165 SNP sites in MGF_360-1L between Estonia 2014 and the China/GD/2019 strain. Compared with China/GD/2019, we identified a base deletion located at gene D1133L in China/Pig/HLJ/2018 and China/DB/LN/2018, which results in a frameshift mutation that alters the encoded protein. Our findings indicate that China/GD/2019 is a new variant with certain deletions and mutations. This study deepens our understanding of the genomic diversity and genetic variation of ASFV.
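The pair-wise SNP comparison mentioned in this abstract can be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes the two sequences have already been aligned to equal length by some other tool, and it skips gap characters and ambiguous `N` bases, since those are not substitutions. The function name `count_snp_sites` and the `gap` parameter are hypothetical.

```python
def count_snp_sites(seq_a, seq_b, gap="-"):
    """Count mismatching positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip alignment gaps (indels) and ambiguous bases.
        if a == gap or b == gap or a == "N" or b == "N":
            continue
        if a != b:
            count += 1
    return count
```

Gaps are skipped rather than counted because the abstract reports deletions separately from SNP sites.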
