Comparative Genomics of Clinical Isolates of the Emerging Tick-Borne Pathogen Neoehrlichia mikurensis

Anna Grankvist; Daniel Jaén-Luchoro; Linda Wass; Per Sikora; Christine Wennerås

doi:10.3390/microorganisms9071488

Microorganisms ◽

10.3390/microorganisms9071488 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1488

Author(s):

Anna Grankvist ◽

Daniel Jaén-Luchoro ◽

Linda Wass ◽

Per Sikora ◽

Christine Wennerås

Keyword(s):

Vascular Endothelium ◽

De Novo ◽

Phylogenetic Analyses ◽

Geographic Origin ◽

Comparative Genomic ◽

Whole Genome ◽

Illumina Hiseq ◽

Protein Coding ◽

Ehrlichia Ruminantium ◽

Protein Coding Genes

Tick-borne ‘Neoehrlichia (N.) mikurensis’ is the cause of neoehrlichiosis, an infectious vasculitis of humans. This strict intracellular pathogen is a member of the family Anaplasmataceae and was unculturable until recently. The only available genetic data on this new pathogen are six partially sequenced housekeeping genes. The aim of this study was to advance knowledge of the genomic relatedness of ‘N. mikurensis’ to other Anaplasmataceae members, its intra-species genotypic variability, and potential virulence factors explaining its tropism for vascular endothelium. Here, we present the de novo whole-genome sequences of three ‘N. mikurensis’ strains derived from Swedish patients diagnosed with neoehrlichiosis. The genomes were obtained by extraction of DNA from patient plasma, library preparation using 10x Chromium technology, and sequencing by Illumina Hiseq-4500. ‘N. mikurensis’ was found to have the second-smallest genome of the Anaplasmataceae family (1.1 Mbp with 27% GC content), consisting of 845 protein-coding genes, a third of which have unknown function. Comparative genomic analyses revealed that ‘N. mikurensis’ was more closely related to Ehrlichia chaffeensis than to Ehrlichia ruminantium, the opposite of what 16S rRNA sequence-based phylogenetic analyses determined. The genetic variability of the three whole-genome-sequenced ‘N. mikurensis’ strains was extremely low, between 0.14 and 0.22‰, a variation that was associated with geographic origin. No protein-coding genes exclusively shared by ‘N. mikurensis’ and E. ruminantium were identified to explain their common tropism for vascular endothelium.
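To give a sense of scale for the per-mille variability quoted in this abstract, the short sketch below converts it into an approximate number of differing positions. It assumes, which the abstract does not state, that the per-mille figure is a fraction of nucleotide positions across the whole 1.1 Mbp genome; the function name `permille_to_sites` is illustrative only.

```python
def permille_to_sites(rate_permille: float, genome_bp: int) -> int:
    """Convert a per-mille divergence into an approximate count of differing sites.

    Assumption: the rate is a fraction of all nucleotide positions in the genome.
    """
    return round(rate_permille / 1000 * genome_bp)


GENOME_BP = 1_100_000  # 1.1 Mbp, the rounded genome size quoted in the abstract

low = permille_to_sites(0.14, GENOME_BP)
high = permille_to_sites(0.22, GENOME_BP)
print(f"roughly {low} to {high} differing positions")
```

Under that assumption, the strains would differ at only a few hundred positions genome-wide, which is consistent with the abstract's description of the variability as extremely low.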

Phylogenetic relationships and taxonomic position of genus Hyperacrius (Rodentia: Arvicolinae) from Kashmir based on evidences from analysis of mitochondrial genome and study of skull morphology

PeerJ ◽

10.7717/peerj.10364 ◽

2020 ◽

Vol 8 ◽

pp. e10364

Author(s):

Natalia I. Abramson ◽

Fedor N. Golenishchev ◽

Semen Yu. Bodrov ◽

Olga V. Bondareva ◽

Evgeny A. Genelt-Yanovskiy ◽

...

Keyword(s):

Mitochondrial Genome ◽

De Novo ◽

Phylogenetic Analyses ◽

Complete Mitochondrial Genome ◽

Morphological Characters ◽

Molecular Data ◽

Phylogenetic Position ◽

Skull Morphology ◽

Protein Coding ◽

Protein Coding Genes

In this article, we present the nearly complete mitochondrial genome of the Subalpine Kashmir vole Hyperacrius fertilis (Arvicolinae, Cricetidae, Rodentia), assembled using data from Illumina next-generation sequencing (NGS) of the DNA from a century-old museum specimen. The de novo assembly consisted of 16,341 bp and included all mitogenome protein-coding genes as well as the 12S and 16S rRNAs, tRNAs and D-loop. Using the alignment of protein-coding genes of 14 previously published Arvicolini tribe mitogenomes, seven Clethrionomyini mitogenomes, and also Ondatra and Dicrostonyx outgroups, we conducted phylogenetic reconstructions based on a dataset of 13 protein-coding genes (PCGs) under maximum likelihood and Bayesian inference. Phylogenetic analyses robustly supported the phylogenetic position of this species within the tribe Arvicolini. Among the Arvicolini, Hyperacrius represents one of the early-diverged lineages. This result altered the conventional view of the phylogenetic relatedness between Hyperacrius and Alticola and prompted a revision of the morphological characters underlying the former assumption. The morphological analysis performed here confirmed the molecular data and provided additional evidence for the taxonomic transfer of the genus Hyperacrius from the tribe Clethrionomyini to the tribe Arvicolini.

A high-quality chromosomal genome assembly of Diospyros oleifera Cheng

GigaScience ◽

10.1093/gigascience/giz164 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Yujing Suo ◽

Peng Sun ◽

Huihui Cheng ◽

Weijuan Han ◽

Songfeng Diao ◽

...

Keyword(s):

Molecular Mechanisms ◽

De Novo ◽

Phylogenetic Analyses ◽

Draft Genome ◽

Diospyros Kaki ◽

High Quality ◽

Phylogenetic Tree Analysis ◽

Protein Coding ◽

Protein Coding Genes ◽

Anthocyanin Pathway

Background: Diospyros oleifera Cheng, of the family Ebenaceae, is an economically important tree. Phylogenetic analyses indicate that D. oleifera is closely related to Diospyros kaki Thunb. and could be used as a model plant for studies of D. kaki. Therefore, development of genomic resources of D. oleifera will facilitate auxiliary assembly of the hexaploid persimmon genome and elucidate the molecular mechanisms of important traits. Findings: The D. oleifera genome was assembled with 443.6 Gb of raw reads using the Pacific Biosciences Sequel and Illumina HiSeq X Ten platforms. The final draft genome was ∼812.3 Mb and had a high level of continuity, with an N50 of 3.36 Mb. Fifteen scaffolds corresponding to the 15 chromosomes were assembled to a final size of 721.5 Mb using 332 scaffolds, accounting for 88.81% of the genome. Repeat sequences accounted for 54.8% of the genome. By de novo sequencing and analysis of homology with other plant species, 30,530 protein-coding genes with an average transcript size of 7,105.40 bp were annotated; of these, 28,580 protein-coding genes (93.61%) had conserved functional motifs or terms. In addition, 171 candidate genes involved in tannin synthesis and deastringency in persimmon were identified; of these, chalcone synthase (CHS) genes were expanded in the D. oleifera genome compared with Diospyros lotus, Camellia sinensis, and Vitis vinifera. Moreover, 186 positively selected genes were identified, including the chalcone isomerase (CHI) gene, which encodes a key enzyme in the flavonoid-anthocyanin pathway. Phylogenetic tree analysis indicated that the split of D. oleifera and D. lotus likely occurred 9.0 million years ago. In addition to the ancient γ event, a second whole-genome duplication event occurred in D. oleifera and D. lotus. Conclusions: We generated a high-quality chromosome-level draft genome for D. oleifera, which will facilitate assembly of the hexaploid persimmon genome and further studies of major economic traits in the genus Diospyros.
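The continuity statistic quoted in this abstract, N50, is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of the standard calculation follows; the scaffold lengths are made-up toy values, not data from the D. oleifera assembly, and the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are taken from longest to shortest until their running
    total reaches at least half of the summed assembly length; the
    length of the last sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths (hypothetical, for illustration only).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

A higher N50 relative to total assembly size indicates that more of the genome sits in long, contiguous pieces.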

The draft genome sequence of Eucalyptus polybractea based on hybrid assembly with short- and long-reads reads

10.1101/2021.05.18.444652 ◽

2021 ◽

Author(s):

Teng Li ◽

David Kainer ◽

William J Foley ◽

Allen Rodrigo ◽

Carsten Kuelheim

Keyword(s):

Population Genomics ◽

De Novo ◽

Draft Genome ◽

Hybrid Assembly ◽

Illumina Hiseq ◽

Protein Coding ◽

Genome Coverage ◽

Protein Coding Genes ◽

Long Reads ◽

Long Read

Eucalyptus polybractea is a small, multi-stemmed tree, which is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome utilizing both short- and long-read technology. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83 and 15 times genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with an accumulated length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes detected by combining homology-based and de novo approaches. We have provided the first assembled genome based on hybrid sequences from the highly diverse Eucalyptus subgenus Symphyomyrtus, and revealed the value of including long-reads from Nanopore technology for enhancing the contiguity of the assembled genome, as well as for improving its completeness. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
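The coverage figures in this abstract follow from dividing sequenced bases by genome size. The sketch below repeats that arithmetic using the 523 Mb assembled length as a stand-in for genome size. That substitution is an assumption made here for illustration: the abstract does not say which genome-size estimate the authors used, so the results differ slightly from the reported values of roughly 83 and 15.

```python
def fold_coverage(sequenced_bases: int, genome_size_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return sequenced_bases / genome_size_bp


# Assumption: the assembled length (523 Mb) approximates the true genome size.
ASSEMBLY_BP = 523_000_000

short_cov = fold_coverage(44_000_000_000, ASSEMBLY_BP)  # 44 Gb of Illumina reads
long_cov = fold_coverage(8_000_000_000, ASSEMBLY_BP)    # 8 Gb of Nanopore reads
print(f"short reads: {short_cov:.1f}x, long reads: {long_cov:.1f}x")
```

With these inputs the short-read depth comes out near 84x and the long-read depth near 15x, close to the abstract's approximate figures.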

Whole Genome Sequence of the Commercially Relevant Mushroom Strain Agaricus bisporus var. bisporus ARP23

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400563 ◽

2019 ◽

Vol 9 (10) ◽

pp. 3057-3066 ◽

Cited By ~ 2

Author(s):

Eoin O’Connor ◽

Jamie McGowan ◽

Charley G. P. McCarthy ◽

Aniça Amini ◽

Helen Grogan ◽

...

Keyword(s):

Genome Sequence ◽

Agaricus Bisporus ◽

Genomic Analysis ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Protein Coding ◽

Single Strain ◽

Protein Coding Genes ◽

Starting Point

Agaricus bisporus is an extensively cultivated edible mushroom. Demand for cultivation is continuously growing, and difficulties associated with breeding programs now mean that strains are effectively considered a monoculture. While commercial growing practices are highly efficient and tightly controlled, the over-use of a single strain has led to a variety of disease outbreaks from a range of pathogens including bacteria, fungi and viruses. To address this, the Agaricus Resource Program (ARP) was set up to collect wild isolates from diverse geographical locations through a bounty-driven scheme to create a repository of wild Agaricus germplasm. One of the strains collected, Agaricus bisporus var. bisporus ARP23, has been crossed extensively with white commercial varieties, leading to the generation of a novel hybrid with a dark brown pileus commonly referred to as ‘Heirloom’. Heirloom has been successfully implemented into commercial mushroom cultivation. In this study, the whole genome of Agaricus bisporus var. bisporus ARP23 was sequenced and assembled with Illumina and PacBio sequencing technology. The final genome was found to be 33.49 Mb in length and to have significant levels of synteny to other sequenced Agaricus bisporus strains. Overall, 13,030 putative protein-coding genes were located and annotated. Relative to the other A. bisporus genomes that are currently available, Agaricus bisporus var. bisporus ARP23 is the largest A. bisporus strain in terms of gene number and genetic content sequenced to date. Comparative genomic analysis shows that the A. bisporus mating locus is unifactorial and, unsurprisingly, highly conserved between strains. The lignocellulolytic gene content of all A. bisporus strains compared is also very similar. Our results show that the pangenome structure of A. bisporus is quite diverse, with 60–70% of the total protein-coding genes per strain considered orthologous and syntenically conserved. These analyses and the genome sequence described herein are the starting point for more detailed molecular analyses into the growth and phenotypical responses of Agaricus bisporus var. bisporus ARP23 when challenged with economically important mycoviruses.

Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1501764112 ◽

2015 ◽

Vol 112 (11) ◽

pp. E1257-E1262 ◽

Cited By ~ 103

Author(s):

Yan-Bo Sun ◽

Zi-Jun Xiong ◽

Xue-Yan Xiang ◽

Shi-Ping Liu ◽

Wei-Wei Zhou ◽

...

Keyword(s):

De Novo ◽

Structural Evolution ◽

Whole Genome Sequence ◽

Comparative Genomic ◽

Whole Genome ◽

Protein Coding ◽

Comparable Rate ◽

Evolutionary Studies ◽

Genomic Studies ◽

The Difference

The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of the evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

Comparative genomic analyses highlight the contribution of pseudogenized protein-coding genes to human lincRNAs

10.1101/163626 ◽

2017 ◽

Author(s):

Wan-Hsin Liu ◽

Zing Tsung-Yeh Tsai ◽

Huai-Kuang Tsai

Keyword(s):

Human Genome ◽

Noncoding RNA ◽

De Novo ◽

Systematic Investigation ◽

Comparative Genomic ◽

Protein Coding ◽

Protein Coding Genes ◽

Competing Endogenous RNAs ◽

Intergenic Regions ◽

The Relationship

Background: The regulatory roles of long intergenic noncoding RNAs (lincRNAs) in humans have been revealed through the use of advanced sequencing technology. Recently, three possible scenarios of lincRNA origin have been proposed: de novo origination from intergenic regions, duplication from long noncoding RNA, and pseudogenization from protein. The first two scenarios are largely studied and supported, yet few studies have focused on the evolution from pseudogenized protein-coding sequence to lincRNA. Because these three scenarios are not mutually exclusive, and because lincRNA origination needs systematic investigation, we conducted a comparative genomics study to investigate the evolution of human lincRNAs. Results: Combining syntenic analysis with a stringent BLASTN e-value cutoff, we found that the majority of lincRNAs align to the intergenic regions of other species. Interestingly, 193 human lincRNAs could have protein-coding orthologs in at least two of nine vertebrates. Transposable elements in these conserved regions of the human genome are much fewer than expected. Moreover, 19% of these lincRNAs overlap with or are close to pseudogenes in the human genome. Conclusions: We suggest that a notable portion of lincRNAs could be derived from pseudogenized protein-coding genes. Furthermore, based on our computational analysis, we hypothesize that a subset of these lincRNAs could have the potential to regulate their paralogs by functioning as competing endogenous RNAs. Our results provide evolutionary evidence of the relationship between human lincRNAs and protein-coding genes.

Mitogenomics and Evolutionary History of Rodent Whipworms (Trichuris spp.) Originating from Three Biogeographic Regions

Life ◽

10.3390/life11060540 ◽

2021 ◽

Vol 11 (6) ◽

pp. 540

Author(s):

Jan Petružela ◽

Alexis Ribas ◽

Joëlle Goüy de Bellocq

Keyword(s):

Phylogenetic Analyses ◽

Eastern Africa ◽

Comparative Genomic ◽

tRNA Genes ◽

Whole Genome ◽

Coding Region ◽

Trichuris Muris ◽

Protein Coding ◽

Wide Range ◽

Biogeographic Regions

Trichuris spp. are widespread nematodes that parasitize a wide range of mammalian hosts including rodents, the most diverse mammalian order. However, genetic data on rodent whipworms are still scarce, with only one published whole genome (Trichuris muris) despite an increasing demand for whole-genome data. We sequenced the whipworm mitogenomes from seven rodent hosts belonging to three biogeographic regions (Palearctic, Afrotropical, and Indomalayan), including three previously described species: Trichuris cossoni, Trichuris arvicolae, and Trichuris mastomysi. We assembled and annotated two complete and five almost complete mitogenomes (lacking only the long non-coding region) and performed comparative genomic and phylogenetic analyses. All the mitogenomes are circular, have the same organisation, and consist of 13 protein-coding, 2 rRNA, and 22 tRNA genes. The phylogenetic analysis supports geographical clustering of whipworm species and indicates that T. mastomysi found in Eastern Africa is able to infect multiple closely related rodent hosts. Our results are informative for species delimitation based on mitochondrial markers and could be further used in studies on phylogeny, phylogeography, and population genetics of rodent whipworms.

Sequence Variation in Two Protein-Coding Genes Correlates with Mycelial Compatibility Groupings in Sclerotium rolfsii

Phytopathology ◽

10.1094/phyto-07-12-0151-r ◽

2013 ◽

Vol 103 (5) ◽

pp. 479-487 ◽

Cited By ~ 6

Author(s):

Efrén Remesal ◽

Blanca B. Landa ◽

María del Mar Jiménez-Gasco ◽

Juan A. Navas-Cortés

Keyword(s):

RNA Polymerase II ◽

Sequence Variation ◽

Geographic Origin ◽

Sclerotium Rolfsii ◽

Nuclear Ribosomal DNA ◽

Causal Organism ◽

Protein Coding ◽

Protein Coding Genes ◽

Wide Range ◽

Mycelial Compatibility

Populations of Sclerotium rolfsii, the causal organism of Sclerotium root-rot on a wide range of hosts, can be placed into mycelial compatibility groups (MCGs). In this study, we evaluated three different molecular approaches to unequivocally identify each of 12 previously identified MCGs. These included restriction fragment length polymorphism (RFLP) patterns of the internal transcribed spacer (ITS) region of nuclear ribosomal DNA (rDNA) and sequence analysis of two protein-coding genes: translation elongation factor 1α (EF1α) and RNA polymerase II subunit two (RPB2). A collection of 238 single-sclerotial isolates representing 12 MCGs of S. rolfsii was obtained from diseased sugar beet plants from Chile, Italy, Portugal, and Spain. ITS-RFLP analysis using four restriction enzymes (AluI, HpaII, RsaI, and MboI) displayed a low degree of variability among MCGs. Only three different restriction profiles were identified among S. rolfsii isolates, with no correlation to MCG or to geographic origin. Based on nucleotide polymorphisms, the RPB2 gene was more variable among MCGs than the EF1α gene. Thus, 10 of 12 MCGs could be characterized using the RPB2 region only, while the EF1α region resolved 7 MCGs. However, the analysis of combined partial sequences of the EF1α and RPB2 genes allowed discrimination among all 12 MCGs. All isolates belonging to the same MCG showed identical nucleotide sequences that differed by at least one nucleotide from those of a different MCG. The consistency with which the combined EF1α and RPB2 sequences identify the MCG of a given S. rolfsii isolate was confirmed using blind trials. Our study demonstrates that sequence variation in the protein-coding genes EF1α and RPB2 may be exploited as a diagnostic tool for MCG typing in S. rolfsii as well as to identify previously undescribed MCGs.

Mitogenome of Alaudala cheleensis (Passeriformes: Alaudidae) and comparative analyses of Sylvioidea mitogenomes

Zootaxa ◽

10.11646/zootaxa.4952.2.7 ◽

2021 ◽

Vol 4952 (2) ◽

pp. 331-353

Author(s):

Chao Yang ◽

Le Zhao ◽

Qingxiong Wang ◽

Hao Yuan ◽

Xuejuan Li ◽

...

Keyword(s):

Secondary Structure ◽

Gene Order ◽

Phylogenetic Relationships ◽

Gene Rearrangement ◽

Phylogenetic Analyses ◽

Protein Coding ◽

Comparative Analyses ◽

Protein Coding Genes ◽

Basal Position

To gain a better understanding of mitogenome features and phylogenetic relationships in Sylvioidea, a superfamily of Passerida, suborder Passeri, Passeriformes, the whole mitogenome of Alaudala cheleensis Swinhoe (Alaudidae) was sequenced, a comparative mitogenomic analysis of 18 Sylvioidea species was carried out, and finally, a phylogeny was reconstructed based on the mitochondrial dataset. The gene order of the A. cheleensis mitogenome was similar to that of other Sylvioidea species, including the gene rearrangement of cytb-trnT-CR1-trnP-nad6-trnE-remnant CR2-trnF-rrnS. The A+T content of the mitogenome was slightly higher than the G+C content, with an obvious C skew. The ATG codon initiated all protein-coding genes, while six terminating codons were used. The secondary structure of rrnS contained three domains and 47 helices, whereas rrnL included six domains and 60 helices. All tRNAs could be folded into a classic clover-leaf secondary structure except for trnS (AGY). The CR1 could be divided into three domains, including several conserved boxes (C-string, F, E, D, C and B-box, Bird similarity box, CSB1). Comparative analyses within Sylvioidea mitogenomes showed that most mitochondrial features were consistent with those of the A. cheleensis mitogenome. The basal position of the Alaudidae within the Sylvioidea in our phylogenetic analyses is consistent with other recent studies.
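The composition statistics mentioned in this abstract (A+T content and a C skew) are commonly computed from base counts, with GC skew defined as (G − C) / (G + C); a negative value indicates an excess of C over G on the strand analysed. The sketch below shows that calculation on a short made-up sequence, not on the A. cheleensis mitogenome, and the function name `base_composition` is illustrative.

```python
from collections import Counter


def base_composition(seq: str):
    """Return (A+T fraction, GC skew) for a DNA sequence.

    GC skew is (G - C) / (G + C); negative values mean more C than G.
    Characters other than A, C, G and T are ignored.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t
    if total == 0 or g + c == 0:
        raise ValueError("sequence needs at least one G or C")
    at_fraction = (a + t) / total
    gc_skew = (g - c) / (g + c)
    return at_fraction, gc_skew


# Toy sequence (hypothetical): 4 A, 4 C, 1 G, 3 T.
at_fraction, gc_skew = base_composition("AACCTACCGTAT")
print(f"A+T = {at_fraction:.3f}, GC skew = {gc_skew:.2f}")
```

For the toy sequence, A+T exceeds one half and the skew is negative, the same qualitative pattern the abstract describes.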

From de novo to ‘de nono’: The majority of novel protein coding genes identified with phylostratigraphy are old genes or recent duplicates

Genome Biology and Evolution ◽

10.1093/gbe/evy231 ◽

2018 ◽

Cited By ~ 2

Author(s):

Claudio Casola

Keyword(s):

De Novo ◽

Protein Coding ◽

Protein Coding Genes ◽

Novel Protein
