Pan-Genome Analyses of Geobacillus spp. Reveal Genetic Characteristics and Composting Potential

Mengmeng Wang; Han Zhu; Zhijian Kong; Tuo Li; Lei Ma; Dongyang Liu; Qirong Shen

doi:10.3390/ijms21093393

International Journal of Molecular Sciences ◽

10.3390/ijms21093393 ◽

2020 ◽

Vol 21 (9) ◽

pp. 3393

Author(s):

Mengmeng Wang ◽

Han Zhu ◽

Zhijian Kong ◽

Tuo Li ◽

Lei Ma ◽

...

Keyword(s):

Core Genome ◽

Agricultural Waste ◽

Environmental Parameters ◽

Housekeeping Genes ◽

Ecological Diversity ◽

Thermostable Enzymes ◽

Genetic Characteristics ◽

Evolutionary Mechanism ◽

The Core ◽

Pan Genome

The genus Geobacillus is ecologically diverse and is well known as a source of thermostable enzymes. Although it is now clear that Geobacillus evolved from Bacillus, relatively little is known about its evolutionary mechanism, which might also contribute to its ecological diversity and biotechnological potential. Here, a statistical comparison of thirty-two Geobacillus genomes was performed with a specific focus on pan- and core genomes. The pan-genome of this set of Geobacillus strains contained 14,913 genes, and the core genome contained 940 genes. Clusters of Orthologous Groups (COG) and Carbohydrate-Active Enzymes (CAZymes) analyses revealed that the Geobacillus strains have considerable potential for industrial application in composting for agricultural waste management. Detailed comparative analyses showed that basic functional classes and housekeeping genes were conserved in the core genome, while genes associated with environmental interaction or energy metabolism were more enriched in the pan-genome. Therefore, the evolution of Geobacillus seems to be guided by environmental parameters. In addition, horizontal gene transfer (HGT) events among different Geobacillus species were detected. Altogether, pan-genome analysis proved a useful method for investigating the evolutionary mechanism, and the evolution of Geobacillus appears to have been directed by the environment and HGT events.

First Steps in the Analysis of Prokaryotic Pan-Genomes

Bioinformatics and Biology Insights ◽

10.1177/1177932220938064 ◽

2020 ◽

Vol 14 ◽

pp. 117793222093806

Author(s):

Sávio Souza Costa ◽

Luís Carlos Guimarães ◽

Artur Silva ◽

Siomar Castro Soares ◽

Rafael Azevedo Baraúna

Keyword(s):

Genome Analysis ◽

Core Genome ◽

Bacterial Species ◽

Genomic Analysis ◽

Gene Families ◽

Specific Group ◽

The Core ◽

Pan Genome ◽

Research Areas ◽

Key Concepts

The pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, while in a closed pan-genome it does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, one of several software tools, such as BPGA, GET_HOMOLOGUES, or PGAP, is used to calculate the pan-genome. This review presents these initial steps for those who want to perform a pan-genome analysis and explains key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species, those with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we discuss the perspectives of several research areas where pan-genome analysis can be used to answer important questions.
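To make the open/closed criterion concrete, here is a minimal sketch. It assumes the power-law form commonly used in pan-genome studies, in which the number of new gene families found when the N-th genome is added behaves as n(N) = kappa * N^(-alpha), with alpha <= 1 read as an open pan-genome and alpha > 1 as closed. The function names and the toy counts are hypothetical and are not taken from the review; in practice the counts are usually averaged over many random orderings of the genomes before fitting.

```python
import math


def fit_heaps_alpha(new_genes_per_genome):
    """Fit n(N) = kappa * N**(-alpha) by least squares in log-log space.

    new_genes_per_genome[i] is the number of new gene families observed
    when genome number i + 2 is added (so N starts at 2).
    Returns the tuple (kappa, alpha).
    """
    xs, ys = [], []
    for i, n_new in enumerate(new_genes_per_genome):
        if n_new > 0:  # log is undefined for zero counts, so skip them
            xs.append(math.log(i + 2))
            ys.append(math.log(n_new))
    if len(xs) < 2:
        raise ValueError("need at least two non-zero counts to fit")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def classify_pan_genome(alpha):
    """Assumed convention: alpha <= 1 means open, alpha > 1 means closed."""
    return "open" if alpha <= 1.0 else "closed"


# Hypothetical, noise-free toy curves for genomes 2..11.
toy_open = [500 * n ** -0.5 for n in range(2, 12)]
toy_closed = [500 * n ** -1.8 for n in range(2, 12)]

for label, counts in (("toy_open", toy_open), ("toy_closed", toy_closed)):
    kappa, alpha = fit_heaps_alpha(counts)
    print(label, round(alpha, 3), classify_pan_genome(alpha))
```

A log-log linear fit is used here only because it needs nothing beyond the standard library; a non-linear fit on the raw counts would weight the points differently.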

Virulence and antibiotic resistance plasticity of Arcobacter butzleri: insights on the genomic diversity of an emerging human pathogen

10.1101/775932 ◽

2019 ◽

Author(s):

Joana Isidro ◽

Susana Ferreira ◽

Miguel Pinto ◽

Fernanda Domingues ◽

Mónica Oleastro ◽

...

Keyword(s):

Antibiotic Resistance ◽

Comparative Genomics ◽

Core Genome ◽

Human Pathogen ◽

Genome Diversity ◽

Pathogenic Potential ◽

The Core ◽

Pan Genome ◽

Arcobacter Butzleri ◽

Genome Scale

Abstract

Arcobacter butzleri is a food and waterborne bacteria and an emerging human pathogen, frequently displaying a multidrug resistant character. Still, no comprehensive genome-scale comparative analysis has been performed so far, which has limited our knowledge on A. butzleri diversification and pathogenicity. Here, we performed a deep genome analysis of A. butzleri focused on decoding its core- and pan-genome diversity and specific genetic traits underlying its pathogenic potential and diverse ecology. In total, 49 A. butzleri strains (collected from human, animal, food and environmental sources) were screened.

A. butzleri (genome size 2.07-2.58 Mbp) revealed a large open pan-genome with 7474 genes (about 50% being singletons) and a small core-genome with 1165 genes. The core-genome is highly diverse (≥55% of the core genes presenting at least 40/49 alleles), being enriched with genes associated with housekeeping functions. In contrast, the accessory genome presented a high proportion of loci with an unknown function, also being particularly overrepresented by genes associated with defence mechanisms. A. butzleri revealed a plastic virulome (including newly identified determinants), marked by the differential presence of multiple adaptation-related virulence factors, such as the urease cluster ureD(AB)CEFG (phenotypically confirmed), the hypervariable hemagglutinin-encoding hecA, a putative type I secretion system (T1SS) harboring another agglutinin potentially related to adherence and a novel VirB/D4 T4SS likely linked to interbacterial competition and cytotoxicity. In addition, A. butzleri harbors a large repertoire of efflux pumps (EPs) (ten “core” and nine differentially present) and other antibiotic resistant determinants. We provide the first description of a genetic determinant of macrolides resistance in A. butzleri, by associating the inactivation of a TetR repressor (likely regulating an EP) with erythromycin resistance. Fluoroquinolones resistance correlated with the Thr-85-Ile substitution in GyrA and ampicillin resistance was linked to an OXA-15-like β-lactamase. Remarkably, by decoding the polymorphism pattern of the porin- and adhesin-encoding main antigen PorA, this study strongly supports that this pathogen is able to exchange porA as a whole and/or hypervariable epitope-encoding regions separately, leading to a multitude of chimeric PorA presentations that can impact pathogen-host interaction during infection. Ultimately, our unprecedented screening of short sequence repeats detected potential phase-variable genes related to adaptation and host/environment interaction, such as lipopolysaccharide modification and motility/chemotaxis, suggesting that phase variation likely modulate A. butzleri key adaptive functions.

In summary, this study constitutes a turning point on A. butzleri comparative genomics revealing that this human gastrointestinal pathogen is equipped with vast virulence and antibiotic resistance arsenals, which, coupled with its remarkable core- and pan-genome diversity, opens a multitude of phenotypic fingerprints for environmental/host adaptation and pathogenicity.

IMPACT STATEMENT

Diarrhoeal diseases are the most common cause of human illness caused by foodborne hazards, but the surveillance of diarrhoeal diseases is biased towards the most commonly searched infectious agents (namely Campylobacter jejuni and C. coli). In fact, other less studied pathogens are frequently found as the etiological agent when refined non-selective culture conditions are applied. A hallmark example is the diarrhoeal-causing Arcobacter butzleri which, despite being also associated with extra-intestinal diseases, such as bacteremia in humans and mastitis in animals, and displaying high rates of antibiotic resistance, has not yet been profoundly investigated regarding its epidemiology, diversity and pathogenicity. To overcome the general lack of knowledge on A. butzleri comparative genomics, we provide the first comprehensive genome-scale analysis of A. butzleri focused on exploring the intraspecies virulome content and diversity, resistance determinants, as well as how this pathogen shapes its genome towards ecological adaptation and host invasion. The unveiled scenario of A. butzleri rampant diversity and plasticity reinforces the pathogenic potential of this food and waterborne hazard, while opening multiple research lines that will certainly contribute to the future development of more robust species-oriented diagnostics and molecular surveillance of A. butzleri.

DATA SUMMARY

A. butzleri raw sequence reads generated in the present study were deposited in the European Nucleotide Archive (ENA) (BioProject PRJEB34441). The assembled contigs (.fasta and .gbk files), the nucleotide sequences of the predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) (.ffn files) and the respective amino acid sequences of the translated CDS sequences (.faa files) are available at http://doi.org/10.5281/zenodo.3434222. Detailed ENA accession numbers, as well as the draft genome statistics are described in Table S1.

Heterogeneity among estimates of the core genome and pan-genome in different pneumococcal populations

10.1101/133991 ◽

2017 ◽

Cited By ~ 5

Author(s):

Andries J van Tonder ◽

James E Bray ◽

Keith A Jolley ◽

Sigríður J Quirk ◽

Gunnsteinn Haraldsson ◽

...

Keyword(s):

Bacterial Population ◽

Core Genome ◽

Bacterial Species ◽

Essential Point ◽

Genetic Lineages ◽

The Core ◽

Pan Genome ◽

Single Dataset ◽

Genomic Regions ◽

Core Genes

Abstract

Background: Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, or which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome varies by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed.

Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 Kb sequence regions that were homologous to genomic regions found in other bacterial species.

Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.
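The relationship between a per-population core genome and the 'supercore' can be illustrated with a short sketch. It assumes each genome has already been reduced to a set of gene identifiers; the function names, the site labels, the gene names, and the `min_fraction` parameter (which models "all or nearly all strains") are hypothetical and do not reflect the pipeline used in the study.

```python
def core_genome(genomes, min_fraction=1.0):
    """Return genes present in at least min_fraction of the genomes.

    genomes maps a strain name to the set of gene identifiers it carries.
    min_fraction=1.0 gives the strict core (present in every strain).
    """
    counts = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    needed = min_fraction * len(genomes)
    return {gene for gene, count in counts.items() if count >= needed}


def supercore(populations):
    """Genes present in every strain of every population."""
    merged = {}
    for site, genomes in populations.items():
        for strain, genes in genomes.items():
            merged[(site, strain)] = genes
    return core_genome(merged, 1.0)


# Hypothetical toy populations; site_C is deliberately divergent.
populations = {
    "site_A": {"a1": {"g1", "g2", "g3"}, "a2": {"g1", "g2", "g3"}},
    "site_B": {"b1": {"g1", "g2", "g3", "g4"}, "b2": {"g1", "g2", "g4"}},
    "site_C": {"c1": {"g1", "g5"}, "c2": {"g1", "g2", "g5"}},
}

per_site_core = {site: core_genome(g) for site, g in populations.items()}
shared_without_c = per_site_core["site_A"] & per_site_core["site_B"]
print(sorted(supercore(populations)), sorted(shared_without_c))
```

In this toy data the supercore is a single gene, while the two similar sites share two core genes, which mirrors in miniature how one divergent collection can shrink the estimate.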

Pan-genome of Novel Pantoea stewartii subsp. indologenes Reveal Genes Involved in Onion Pathogenicity and Evidence of Lateral Gene Transfer

10.20944/preprints202107.0400.v1 ◽

2021 ◽

Author(s):

Gaurav Agarwal ◽

Ronald D. Gitaitis ◽

Bhabesh Dutta

Keyword(s):

Gene Transfer ◽

Core Genome ◽

Foxtail Millet ◽

Evaluation Study ◽

Full Spectrum ◽

The Core ◽

Pan Genome ◽

Pantoea Stewartii ◽

Comparative Phylogenetic Analysis ◽

Core Genes

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars: P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., which are pathogenic on onion and millets or on millets only, respectively. In the current study we developed a pan-genome using whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in genomes of all 17 strains) were the core genes, a subset of the 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using the PAVs/wgMLST approach in comparison to core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions that are found to be chromosomally located.
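The compartment thresholds quoted in this abstract can be written down as a small classifier. This is a sketch under stated assumptions: the cut-offs for 17 strains (soft-core at 16 or more strains, cloud at 2 or fewer) come from the abstract, the shell is assumed to be everything in between, and the function name, parameter names, and toy presence counts are hypothetical. Unlike the abstract, which counts the core as a subset of the soft-core, the function reports "core" as its own label.

```python
from collections import Counter


def classify_gene(n_present, n_strains, soft_core_missing=1, cloud_max=2):
    """Label a gene family by the number of strains that carry it."""
    if not 0 < n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    if n_present == n_strains:
        return "core"
    if n_present >= n_strains - soft_core_missing:
        return "soft-core"
    if n_present <= cloud_max:
        return "cloud"
    return "shell"  # assumed: everything between cloud and soft-core


# Hypothetical presence counts across 17 strains.
toy_counts = {"geneA": 17, "geneB": 16, "geneC": 9, "geneD": 2, "geneE": 1}
tally = Counter(classify_gene(n, 17) for n in toy_counts.values())
print(dict(tally))

# Consistency check on the figures reported in the abstract:
# soft-core (which includes the core) + shell + cloud = pan-genome size.
assert 3682 + 1308 + 2040 == 7030
```

The final assertion only confirms that the reported compartment sizes add up to the reported pan-genome size of 7,030 genes.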

New insights into homoeologous copy number variations in the hexaploid wheat genome

10.1101/2020.09.09.289447 ◽

2020 ◽

Author(s):

Caroline Juery ◽

Lorenzo Concia ◽

Romain De Oliveira ◽

Nathan Papon ◽

Ricardo Ramírez-González ◽

...

Keyword(s):

Hexaploid Wheat ◽

Core Genome ◽

Diploid Species ◽

Housekeeping Genes ◽

Copy Number Variations ◽

List Type ◽

The Core ◽

History Of ◽

Function Expression ◽

Dispensable Genome

Abstract

Bread wheat is an allohexaploid species originating from two successive and recent rounds of hybridization between three diploid species that were very similar in terms of chromosome number, genome size, TE content, gene content and synteny. As a result, it has long been considered that most of the genes were in three pairs of homoeologous copies. However, these so-called triads represent only one half of wheat genes, while the remaining half belong to homoeologous groups with various number of copies across subgenomes. In this study, we examined and compared the distribution, conservation, function, expression and epigenetic profiles of triads with homoeologous groups having undergone a deletion (dyads) or a duplication (tetrads) in one subgenome. We show that dyads and tetrads are mostly located in distal regions and have lower expression level and breadth than triads. Moreover, they are enriched in functions related to adaptation and more associated with the repressive H3K27me3 modification. Altogether, these results suggest that triads mainly correspond to housekeeping genes and are part of the core genome, while dyads and tetrads belong to the Triticeae dispensable genome. In addition, by comparing the different categories of dyads and tetrads, we hypothesize that, unlike most of the allopolyploid species, subgenome dominance and biased fractionation are absent in hexaploid wheat. Differences observed between the three subgenomes are more likely related to two successive and ongoing waves of post-polyploid diploidization, that had impacted A and B more significantly than D, as a result of the evolutionary history of hexaploid wheat.

Core ideas

Only one half of hexaploid wheat genes are in triads, i.e. in a 1:1:1 ratio across subgenomes
Triads are likely part of the core genome; dyads and tetrads belong to the dispensable genome
Subgenome dominance and biased fractionation are absent in hexaploid wheat
Subgenome differences are related to two successive waves of post-polyploid diploidization

Analysis of the Core Genome and Pan-Genome of Autotrophic Acetogenic Bacteria

Frontiers in Microbiology ◽

10.3389/fmicb.2016.01531 ◽

2016 ◽

Vol 7 ◽

Cited By ~ 25

Author(s):

Jongoh Shin ◽

Yoseb Song ◽

Yujin Jeong ◽

Byung-Kwan Cho

Keyword(s):

Core Genome ◽

The Core ◽

Pan Genome ◽

Acetogenic Bacteria

Evolution of the Core Genome of Pseudomonas syringae, a Highly Clonal, Endemic Plant Pathogen

Applied and Environmental Microbiology ◽

10.1128/aem.70.4.1999-2012.2004 ◽

2004 ◽

Vol 70 (4) ◽

pp. 1999-2012 ◽

Cited By ~ 291

Author(s):

Sara F. Sarkar ◽

David S. Guttman

Keyword(s):

Genetic Variation ◽

Pseudomonas Syringae ◽

Core Genome ◽

Demographic History ◽

Housekeeping Genes ◽

The United States ◽

Plant Diseases ◽

Endemic Plant ◽

Host Association ◽

The Core

ABSTRACT Pseudomonas syringae is a common foliar bacterium responsible for many important plant diseases. We studied the population structure and dynamics of the core genome of P. syringae via multilocus sequencing typing (MLST) of 60 strains, representing 21 pathovars and 2 nonpathogens, isolated from a variety of plant hosts. Seven housekeeping genes, dispersed around the P. syringae genome, were sequenced to obtain 400 to 500 nucleotides per gene. Forty unique sequence types were identified, with most strains falling into one of four major clades. Phylogenetic and maximum-likelihood analyses revealed a remarkable degree of congruence among the seven genes, indicating a common evolutionary history for the seven loci. MLST and population genetic analyses also found a very low level of recombination. Overall, mutation was found to be approximately four times more likely than recombination to change any single nucleotide. A skyline plot was used to study the demographic history of P. syringae. The species was found to have maintained a constant population size over time. Strains were also found to remain genetically homogeneous over many years, and when isolated from sites as widespread as the United States and Japan. An analysis of molecular variance found that host association explains only a small proportion of the total genetic variation in the sample. These analyses reveal that with respect to the core genome, P. syringae is a highly clonal and stable species that is endemic within plant populations, yet the genetic variation seen in these genes only weakly predicts host association.
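The step from sequenced housekeeping loci to "unique sequence types" can be sketched as follows. The sketch assumes that a sequence type is simply a distinct combination of allele sequences across the typed loci, numbered in order of first appearance; the function name, strain names, and sequences are hypothetical, and the toy example uses three loci rather than the seven sequenced in the study.

```python
def assign_sequence_types(profiles):
    """Map each strain to a sequence type (ST) number.

    profiles maps a strain name to a tuple of allele sequences, one per
    locus, always in the same locus order. Strains with identical allele
    combinations receive the same ST; STs are numbered from 1 in order
    of first appearance.
    """
    st_of_profile = {}
    st_of_strain = {}
    for strain, alleles in profiles.items():
        key = tuple(alleles)
        if key not in st_of_profile:
            st_of_profile[key] = len(st_of_profile) + 1
        st_of_strain[strain] = st_of_profile[key]
    return st_of_strain


# Hypothetical toy profiles over three loci.
toy_profiles = {
    "strain_1": ("ATGC", "GGCA", "TTAG"),
    "strain_2": ("ATGC", "GGCA", "TTAG"),  # identical to strain_1
    "strain_3": ("ATGC", "GGCT", "TTAG"),  # differs at the second locus
    "strain_4": ("ATGA", "GGCA", "TTAG"),  # differs at the first locus
}

sequence_types = assign_sequence_types(toy_profiles)
print(sequence_types, len(set(sequence_types.values())))
```

Counting the distinct values of the result gives the number of unique sequence types, the quantity the abstract reports as forty for its 60 strains.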

Exploration of Survival Traits, Probiotic Determinants, Host Interactions, and Functional Evolution of Bifidobacterial Genomes Using Comparative Genomics

Genes ◽

10.3390/genes9100477 ◽

2018 ◽

Vol 9 (10) ◽

pp. 477 ◽

Cited By ~ 5

Author(s):

Vikas Sharma ◽

Fauzul Mobeen ◽

Tulika Prakash

Keyword(s):

Core Genome ◽

Size Variation ◽

Genomic Islands ◽

Genome Size Variation ◽

Host Interactions ◽

The Core ◽

Pan Genome ◽

Wide Range ◽

Insertion Elements ◽

Open Nature

Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. Besides, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory and unique components of this pan-genome were found to be under Darwinian selection pressure. Further, their genome-size variation was predicted to be attributable to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and the Sec secretion system, whereas the core probiotic traits are imparted by factors involved in carbohydrate and protein metabolism and host immunomodulation.

Comparative Genomics and CAZyme Genome Repertoires of Marine Zobellia amurskyensis KMM 3526T and Zobellia laminariae KMM 3676T

Marine Drugs ◽

10.3390/md17120661 ◽

2019 ◽

Vol 17 (12) ◽

pp. 661 ◽

Cited By ~ 5

Author(s):

Nadezhda Chernysheva ◽

Evgeniya Bystritskaya ◽

Anna Stenkova ◽

Ilya Golovkin ◽

Olga Nedashkovskaya ◽

...

Keyword(s):

Comparative Genomics ◽

Core Genome ◽

Genomic Analysis ◽

Comparative Genomic Analysis ◽

Active Enzyme ◽

Comparative Genomic ◽

The Core ◽

Pan Genome ◽

Carbohydrate Active Enzyme ◽

Pharmaceutical Industries

We obtained two novel draft genomes of Zobellia type strains with estimated genome sizes of 5.14 Mb for Z. amurskyensis KMM 3526T and 5.16 Mb for Z. laminariae KMM 3676T. Comparative genomic analysis was carried out between the obtained and known genomes of Zobellia representatives. The pan-genome of the genus Zobellia is composed of 4853 orthologous clusters, and the core genome was estimated at 2963 clusters. The genus CAZome was represented by 775 GHs classified into 62 families, 297 GTs of 16 families, 100 PLs of 13 families, 112 CEs of 13 families, 186 CBMs of 18 families and 42 AAs of six families. A closer inspection of the carbohydrate-active enzyme (CAZyme) genomic repertoires revealed members of new putative subfamilies of GH16 and GH117, which can be biotechnologically promising for production of oligosaccharides and rare monomers with different bioactivities. We analyzed AA3s, among them putative FAD-dependent glycoside oxidoreductases (FAD-GOs), which are of particular interest as promising biocatalysts for glycoside deglycosylation in the food and pharmaceutical industries.

Diversity unearthed by the estimated molecular phylogeny and ecologically quantitative characteristics of uncultured Ehrlichia bacteria in Haemaphysalis ticks, Japan

Scientific Reports ◽

10.1038/s41598-020-80690-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hongru Su ◽

Eri Onoda ◽

Hitoshi Tai ◽

Hiromi Fujita ◽

Shigetoshi Sakabe ◽

...

Keyword(s):

Phylogenetic Analysis ◽

16S Rrna ◽

Core Genome ◽

Phylogenetic Analyses ◽

Taxonomic Status ◽

Housekeeping Genes ◽

Pcr Screening ◽

Taxonomic Profiling ◽

Quantitative Characteristics

Abstract

Ehrlichia species are obligatory intracellular bacteria transmitted by arthropods, and some of these species cause febrile diseases in humans and livestock. Genome sequencing has only been performed with cultured Ehrlichia species, and the taxonomic status of such ehrlichiae has been estimated by core genome-based phylogenetic analysis. However, many uncultured ehrlichiae exist in nature throughout the world, including Japan. This study aimed to conduct a molecular-based taxonomic and ecological characterization of uncultured Ehrlichia species or genotypes from ticks in Japan. We first surveyed 616 Haemaphysalis ticks by p28-PCR screening and analyzed five additional housekeeping genes (16S rRNA, groEL, gltA, ftsZ, and rpoB) from 11 p28-PCR-positive ticks. Phylogenetic analyses of the respective genes showed similar trees but with some differences. Furthermore, we found that V1 in the V1–V9 regions of Ehrlichia 16S rRNA exhibited the greatest variability. From an ecological viewpoint, the amounts of ehrlichiae in a single tick were found to equal approx. 6.3E+3 to 2.0E+6. Subsequently, core-partial-RGGFR-based phylogenetic analysis based on the concatenated sequences of the five housekeeping loci revealed six Ehrlichia genotypes, which included potentially new Ehrlichia species. Thus, our approach contributes to the taxonomic profiling and ecological quantitative analysis of uncultured or unidentified Ehrlichia species or genotypes worldwide.
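The concatenation step mentioned in this abstract can be sketched in a few lines. The sketch assumes each locus has already been aligned so that all samples have sequences of equal length within a locus; it then joins the loci per sample and computes a simple pairwise p-distance (the fraction of differing sites). The function names, sample names, and sequences are hypothetical, only two of the five loci are shown, and the p-distance is an illustrative stand-in rather than the phylogenetic method used by the authors.

```python
def concatenate_loci(alignments, locus_order):
    """Join per-locus aligned sequences into one sequence per sample.

    alignments maps a locus name to a dict of sample -> aligned sequence.
    Only samples present for every locus in locus_order are kept.
    """
    if not locus_order:
        raise ValueError("locus_order must not be empty")
    shared = set.intersection(*(set(alignments[l]) for l in locus_order))
    return {
        sample: "".join(alignments[locus][sample] for locus in locus_order)
        for sample in sorted(shared)
    }


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, ignoring positions with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical toy alignments for two of the loci named in the abstract.
toy_alignments = {
    "groEL": {"tick_1": "ATGC", "tick_2": "ATGA"},
    "gltA": {"tick_1": "GGCC", "tick_2": "GGCC"},
}

concatenated = concatenate_loci(toy_alignments, ["groEL", "gltA"])
print(concatenated, p_distance(concatenated["tick_1"], concatenated["tick_2"]))
```

Keeping a fixed `locus_order` matters because the concatenated positions must line up across samples for any site-by-site comparison to be meaningful.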
