Pan-Genome Analyses of Geobacillus spp. Reveal Genetic Characteristics and Composting Potential

2020 ◽  
Vol 21 (9) ◽  
pp. 3393
Author(s):  
Mengmeng Wang ◽  
Han Zhu ◽  
Zhijian Kong ◽  
Tuo Li ◽  
Lei Ma ◽  
...  

The genus Geobacillus is abundant in ecological diversity and is also well-known as an authoritative source for producing various thermostable enzymes. Although it is clear now that Geobacillus evolved from Bacillus, relatively little knowledge has been obtained regarding its evolutionary mechanism, which might also contribute to its ecological diversity and biotechnology potential. Here, a statistical comparison of thirty-two Geobacillus genomes was performed with a specific focus on pan- and core genomes. The pan-genome of this set of Geobacillus strains contained 14,913 genes, and the core genome contained 940 genes. The Clusters of Orthologous Groups (COG) and Carbohydrate-Active Enzymes (CAZymes) analysis revealed that the Geobacillus strains had huge potential industrial application in composting for agricultural waste management. Detailed comparative analyses showed that basic functional classes and housekeeping genes were conserved in the core genome, while genes associated with environmental interaction or energy metabolism were more enriched in the pan-genome. Therefore, the evolution of Geobacillus seems to be guided by environmental parameters. In addition, horizontal gene transfer (HGT) events among different Geobacillus species were detected. Altogether, pan-genome analysis was a useful method for detecting the evolutionary mechanism, and Geobacillus’ evolution was directed by the environment and HGT events.

2020 ◽  
Vol 14 ◽  
pp. 117793222093806
Author(s):  
Sávio Souza Costa ◽  
Luís Carlos Guimarães ◽  
Artur Silva ◽  
Siomar Castro Soares ◽  
Rafael Azevedo Baraúna

A pan-genome is defined as the set of orthologous and unique genes of a specific group of organisms. It is composed of the core genome, the accessory genome, and species- or strain-specific genes. A pan-genome is considered open or closed based on the alpha value of Heaps' law. In an open pan-genome, the number of gene families continues to increase as new genomes are added to the analysis, whereas in a closed pan-genome the number of gene families does not increase considerably. The first step of a pan-genome analysis is the homogenization of genome annotation: the same software, such as GeneMark or RAST, should be used to annotate all genomes. Subsequently, the pan-genome is calculated with software such as BPGA, GET_HOMOLOGUES, or PGAP, among others. This review presents these initial steps for those who want to perform a pan-genome analysis and explains key concepts of the area. Furthermore, we present the pan-genomic analysis of 9 bacterial species, which are the species with the highest number of genomes deposited in GenBank. We also show the influence of the identity and coverage parameters on the prediction of orthologous and paralogous genes. Finally, we cite the perspectives of several research areas where pan-genome analysis can be used to answer important questions.
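The open/closed criterion can be illustrated with a minimal sketch. It assumes one common formulation of Heaps' law in pan-genomics, in which the number of new gene families found when the N-th genome is added follows n(N) = kappa * N^(-alpha), with alpha <= 1 read as open and alpha > 1 as closed. The abstract does not state which formulation or threshold the review uses, so the convention, the function names, and the synthetic data below are all assumptions made for illustration.

```python
import math

def fit_heaps_alpha(points):
    """Fit n(N) = kappa * N**(-alpha) by ordinary least squares in log-log space.

    points: iterable of (n_genomes, new_gene_families) pairs, all values > 0.
    Returns (kappa, alpha). Hypothetical helper, not taken from any tool.
    """
    xs = [math.log(n_genomes) for n_genomes, _ in points]
    ys = [math.log(new_genes) for _, new_genes in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # The slope of the log-log line is -alpha; the intercept is log(kappa).
    return math.exp(intercept), -slope

def classify_pan_genome(alpha):
    """Assumed convention: alpha <= 1 means open, alpha > 1 means closed."""
    return "open" if alpha <= 1.0 else "closed"

# Synthetic curves (not real data) generated from known exponents.
open_curve = [(n, 1000.0 * n ** -0.5) for n in range(2, 20)]
closed_curve = [(n, 1000.0 * n ** -1.5) for n in range(2, 20)]

kappa_open, alpha_open = fit_heaps_alpha(open_curve)
kappa_closed, alpha_closed = fit_heaps_alpha(closed_curve)
```

In practice the (N, new genes) pairs would come from averaging over many random genome orderings, and tools such as those named above perform their own curve fitting; this sketch only shows the shape of the calculation.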


2019 ◽  
Author(s):  
Joana Isidro ◽  
Susana Ferreira ◽  
Miguel Pinto ◽  
Fernanda Domingues ◽  
Mónica Oleastro ◽  
...  

Abstract. Arcobacter butzleri is a food- and waterborne bacterium and an emerging human pathogen, frequently displaying a multidrug-resistant character. Still, no comprehensive genome-scale comparative analysis has been performed so far, which has limited our knowledge of A. butzleri diversification and pathogenicity. Here, we performed a deep genome analysis of A. butzleri focused on decoding its core- and pan-genome diversity and specific genetic traits underlying its pathogenic potential and diverse ecology. In total, 49 A. butzleri strains (collected from human, animal, food and environmental sources) were screened. A. butzleri (genome size 2.07-2.58 Mbp) revealed a large open pan-genome with 7474 genes (about 50% being singletons) and a small core-genome with 1165 genes. The core-genome is highly diverse (≥55% of the core genes presenting at least 40/49 alleles), being enriched with genes associated with housekeeping functions. In contrast, the accessory genome presented a high proportion of loci with an unknown function, also being particularly overrepresented by genes associated with defence mechanisms. A. butzleri revealed a plastic virulome (including newly identified determinants), marked by the differential presence of multiple adaptation-related virulence factors, such as the urease cluster ureD(AB)CEFG (phenotypically confirmed), the hypervariable hemagglutinin-encoding hecA, a putative type I secretion system (T1SS) harboring another agglutinin potentially related to adherence and a novel VirB/D4 T4SS likely linked to interbacterial competition and cytotoxicity. In addition, A. butzleri harbors a large repertoire of efflux pumps (EPs) (ten “core” and nine differentially present) and other antibiotic resistance determinants. We provide the first description of a genetic determinant of macrolide resistance in A. butzleri, by associating the inactivation of a TetR repressor (likely regulating an EP) with erythromycin resistance.
Fluoroquinolone resistance correlated with the Thr-85-Ile substitution in GyrA, and ampicillin resistance was linked to an OXA-15-like β-lactamase. Remarkably, by decoding the polymorphism pattern of the porin- and adhesin-encoding main antigen PorA, this study strongly supports that this pathogen is able to exchange porA as a whole and/or hypervariable epitope-encoding regions separately, leading to a multitude of chimeric PorA presentations that can impact pathogen-host interaction during infection. Ultimately, our unprecedented screening of short sequence repeats detected potential phase-variable genes related to adaptation and host/environment interaction, such as lipopolysaccharide modification and motility/chemotaxis, suggesting that phase variation likely modulates A. butzleri key adaptive functions. In summary, this study constitutes a turning point in A. butzleri comparative genomics, revealing that this human gastrointestinal pathogen is equipped with vast virulence and antibiotic resistance arsenals, which, coupled with its remarkable core- and pan-genome diversity, opens a multitude of phenotypic fingerprints for environmental/host adaptation and pathogenicity.
Impact statement. Diarrhoeal diseases are the most common cause of human illness caused by foodborne hazards, but the surveillance of diarrhoeal diseases is biased towards the most commonly searched infectious agents (namely Campylobacter jejuni and C. coli). In fact, other less studied pathogens are frequently found as the etiological agent when refined non-selective culture conditions are applied. A hallmark example is the diarrhoea-causing Arcobacter butzleri which, despite also being associated with extra-intestinal diseases, such as bacteremia in humans and mastitis in animals, and displaying high rates of antibiotic resistance, has not yet been thoroughly investigated regarding its epidemiology, diversity and pathogenicity. To overcome the general lack of knowledge on A. butzleri comparative genomics, we provide the first comprehensive genome-scale analysis of A. butzleri focused on exploring the intraspecies virulome content and diversity, resistance determinants, as well as how this pathogen shapes its genome towards ecological adaptation and host invasion. The unveiled scenario of A. butzleri rampant diversity and plasticity reinforces the pathogenic potential of this food- and waterborne hazard, while opening multiple research lines that will certainly contribute to the future development of more robust species-oriented diagnostics and molecular surveillance of A. butzleri.
Data summary. A. butzleri raw sequence reads generated in the present study were deposited in the European Nucleotide Archive (ENA) (BioProject PRJEB34441). The assembled contigs (.fasta and .gbk files), the nucleotide sequences of the predicted transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) (.ffn files) and the respective amino acid sequences of the translated CDS sequences (.faa files) are available at http://doi.org/10.5281/zenodo.3434222. Detailed ENA accession numbers, as well as the draft genome statistics, are described in Table S1.


2017 ◽  
Author(s):  
Andries J van Tonder ◽  
James E Bray ◽  
Keith A Jolley ◽  
Sigríður J Quirk ◽  
Gunnsteinn Haraldsson ◽  
...  

Abstract.
Background: Understanding the structure of a bacterial population is essential in order to understand bacterial evolution, which genetic lineages cause disease, or the consequences of perturbations to the bacterial population. Estimating the core genome, the genes common to all or nearly all strains of a species, is an essential component of such analyses. The size and composition of the core genome vary by dataset, but our hypothesis was that variation between different collections of the same bacterial species should be minimal. To test this, the genome sequences of 3,121 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (USA) and Maela (Thailand) were analysed.
Results: The analyses revealed a ‘supercore’ genome (genes shared by all 3,121 pneumococci) of only 303 genes, although 461 additional core genes were shared by pneumococci from Reykjavik, Southampton and Boston. Overall, the size and composition of the core genomes and pan-genomes among pneumococci recovered in Reykjavik, Southampton and Boston were very similar, but pneumococci from Maela were distinctly different. Inspection of the pan-genome of Maela pneumococci revealed several >25 Kb sequence regions that were homologous to genomic regions found in other bacterial species.
Conclusions: Some subsets of the global pneumococcal population are highly heterogeneous and thus our hypothesis was rejected. This is an essential point of consideration before generalising the findings from a single dataset to the wider pneumococcal population.
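The relationship between per-collection core genomes and a 'supercore' can be sketched with plain set operations. This is an illustrative toy only: it uses a strict intersection (genes present in every genome), whereas the abstract defines the core as genes common to "all or nearly all" strains, and the site names, gene labels, and helper function are hypothetical.

```python
def core_genome(genomes):
    """Return the genes present in every genome (strict intersection)."""
    gene_sets = [set(genes) for genes in genomes]
    return set.intersection(*gene_sets)

# Toy collections: each genome is represented as a set of gene identifiers.
collections = {
    "site_A": [{"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"}],
    "site_B": [{"g1", "g2", "g3", "g5"}, {"g1", "g2", "g3", "g4"}],
    "site_C": [{"g1", "g2", "g6"}, {"g1", "g2", "g7"}],
}

# Core genome of each collection taken on its own.
per_site_core = {site: core_genome(gs) for site, gs in collections.items()}

# Supercore: genes shared by every genome in every collection.
supercore = set.intersection(*per_site_core.values())

# Core genes shared by sites A and B but lost once site C is included.
shared_ab_only = (per_site_core["site_A"] & per_site_core["site_B"]) - supercore
```

Here one divergent collection (site_C) shrinks the supercore, which mirrors the pattern the abstract reports for the Maela isolates.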


Author(s):  
Gaurav Agarwal ◽  
Ronald D. Gitaitis ◽  
Bhabesh Dutta

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., that are pathogenic on onion and millets or on millets only, respectively. In the current study we developed a pan-genome using whole-genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in the genomes of all 17 strains) were core genes, a subset of the 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in the phylogenetic clustering of Pantoea spp. using the PAVs/wgMLST approach in comparison to core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions that are found to be chromosomally located.
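The gene categories reported above can be reproduced on toy data with a short sketch. It assumes the thresholds stated in the abstract (core: all 17 strains; soft-core: at least 16; cloud: at most 2; shell: everything in between) and uses hypothetical strain and gene names. Note that the abstract counts core genes inside the soft-core total, so that 3,682 soft-core + 1,308 shell + 2,040 cloud = 7,030 genes; the sketch labels core and soft-core separately and leaves that summation to the reader.

```python
def classify_genes(presence, n_strains, soft_core_min, cloud_max):
    """Assign each gene to core, soft-core, shell, or cloud.

    presence: dict mapping gene name -> set of strains that carry the gene.
    Thresholds are passed in explicitly because they differ between studies.
    """
    categories = {}
    for gene, strains in presence.items():
        count = len(strains)
        if count == n_strains:
            categories[gene] = "core"
        elif count >= soft_core_min:
            categories[gene] = "soft-core"
        elif count <= cloud_max:
            categories[gene] = "cloud"
        else:
            categories[gene] = "shell"
    return categories

# Hypothetical presence/absence data for 17 strains.
strains = [f"s{i}" for i in range(1, 18)]
presence = {
    "geneA": set(strains),        # present in all 17 strains
    "geneB": set(strains[:16]),   # present in 16 strains
    "geneC": set(strains[:5]),    # present in 5 strains
    "geneD": {"s1"},              # present in a single strain
}

categories = classify_genes(presence, n_strains=17, soft_core_min=16, cloud_max=2)
```

In a real analysis the presence/absence matrix would come from an ortholog-clustering tool rather than being written by hand.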


2020 ◽  
Author(s):  
Caroline Juery ◽  
Lorenzo Concia ◽  
Romain De Oliveira ◽  
Nathan Papon ◽  
Ricardo Ramírez-González ◽  
...  

Abstract. Bread wheat is an allohexaploid species originating from two successive and recent rounds of hybridization between three diploid species that were very similar in terms of chromosome number, genome size, TE content, gene content and synteny. As a result, it has long been considered that most of the genes were in three pairs of homoeologous copies. However, these so-called triads represent only one half of wheat genes, while the remaining half belong to homoeologous groups with varying numbers of copies across subgenomes. In this study, we examined and compared the distribution, conservation, function, expression and epigenetic profiles of triads with homoeologous groups having undergone a deletion (dyads) or a duplication (tetrads) in one subgenome. We show that dyads and tetrads are mostly located in distal regions and have lower expression level and breadth than triads. Moreover, they are enriched in functions related to adaptation and more associated with the repressive H3K27me3 modification. Altogether, these results suggest that triads mainly correspond to housekeeping genes and are part of the core genome, while dyads and tetrads belong to the Triticeae dispensable genome. In addition, by comparing the different categories of dyads and tetrads, we hypothesize that, unlike most of the allopolyploid species, subgenome dominance and biased fractionation are absent in hexaploid wheat. Differences observed between the three subgenomes are more likely related to two successive and ongoing waves of post-polyploid diploidization, that had impacted A and B more significantly than D, as a result of the evolutionary history of hexaploid wheat.
Core ideas:
Only one half of hexaploid wheat genes are in triads, i.e. in a 1:1:1 ratio across subgenomes.
Triads are likely part of the core genome; dyads and tetrads belong to the dispensable genome.
Subgenome dominance and biased fractionation are absent in hexaploid wheat.
Subgenome differences are related to two successive waves of post-polyploid diploidization.


2016 ◽  
Vol 7 ◽  
Author(s):  
Jongoh Shin ◽  
Yoseb Song ◽  
Yujin Jeong ◽  
Byung-Kwan Cho

2004 ◽  
Vol 70 (4) ◽  
pp. 1999-2012 ◽  
Author(s):  
Sara F. Sarkar ◽  
David S. Guttman

ABSTRACT Pseudomonas syringae is a common foliar bacterium responsible for many important plant diseases. We studied the population structure and dynamics of the core genome of P. syringae via multilocus sequence typing (MLST) of 60 strains, representing 21 pathovars and 2 nonpathogens, isolated from a variety of plant hosts. Seven housekeeping genes, dispersed around the P. syringae genome, were sequenced to obtain 400 to 500 nucleotides per gene. Forty unique sequence types were identified, with most strains falling into one of four major clades. Phylogenetic and maximum-likelihood analyses revealed a remarkable degree of congruence among the seven genes, indicating a common evolutionary history for the seven loci. MLST and population genetic analyses also found a very low level of recombination. Overall, mutation was found to be approximately four times more likely than recombination to change any single nucleotide. A skyline plot was used to study the demographic history of P. syringae. The species was found to have maintained a constant population size over time. Strains were also found to remain genetically homogeneous over many years, even when isolated from sites as widespread as the United States and Japan. An analysis of molecular variance found that host association explains only a small proportion of the total genetic variation in the sample. These analyses reveal that, with respect to the core genome, P. syringae is a highly clonal and stable species that is endemic within plant populations, yet the genetic variation seen in these genes only weakly predicts host association.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 477 ◽  
Author(s):  
Vikas Sharma ◽  
Fauzul Mobeen ◽  
Tulika Prakash

Members of the genus Bifidobacterium are found in a wide range of habitats and are used as important probiotics. Thus, exploration of their functional traits at the genus level is of utmost significance. In addition, this genus has been demonstrated to exhibit an open pan-genome based on the limited number of genomes used in earlier studies. However, the number of genomes is a crucial factor for pan-genome calculations. We have analyzed the pan-genome of a comparatively larger dataset of 215 members of the genus Bifidobacterium belonging to different habitats, which revealed an open nature. The pan-genome for the 56 probiotic and human-gut strains of this genus was also found to be open. The accessory and unique components of this pan-genome were found to be under Darwinian selection pressure. Further, their genome-size variation was predicted to be attributable to the abundance of certain functions carried by genomic islands, which are facilitated by insertion elements and prophages. In silico functional and host-microbe interaction analyses of their core genome revealed significant genomic factors for niche-specific adaptations and probiotic traits. The core survival traits include stress tolerance, biofilm formation, nutrient transport, and the Sec secretion system, whereas the core probiotic traits are imparted by the factors involved in carbohydrate and protein metabolism and host immunomodulation.


Marine Drugs ◽  
2019 ◽  
Vol 17 (12) ◽  
pp. 661 ◽  
Author(s):  
Nadezhda Chernysheva ◽  
Evgeniya Bystritskaya ◽  
Anna Stenkova ◽  
Ilya Golovkin ◽  
Olga Nedashkovskaya ◽  
...  

We obtained two novel draft genomes of Zobellia type strains, with estimated genome sizes of 5.14 Mb for Z. amurskyensis KMM 3526T and 5.16 Mb for Z. laminariae KMM 3676T. A comparative genomic analysis was carried out between the newly obtained and previously known genomes of Zobellia representatives. The pan-genome of the genus Zobellia is composed of 4853 orthologous clusters, and the core genome was estimated at 2963 clusters. The genus CAZome was represented by 775 GHs classified into 62 families, 297 GTs of 16 families, 100 PLs of 13 families, 112 CEs of 13 families, 186 CBMs of 18 families and 42 AAs of six families. A closer inspection of the carbohydrate-active enzyme (CAZyme) genomic repertoires revealed members of new putative subfamilies of GH16 and GH117, which may be biotechnologically promising for the production of oligosaccharides and rare monomers with different bioactivities. We analyzed AA3s, among which putative FAD-dependent glycoside oxidoreductases (FAD-GOs) are of particular interest as promising biocatalysts for glycoside deglycosylation in the food and pharmaceutical industries.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hongru Su ◽  
Eri Onoda ◽  
Hitoshi Tai ◽  
Hiromi Fujita ◽  
Shigetoshi Sakabe ◽  
...  

Abstract. Ehrlichia species are obligatory intracellular bacteria transmitted by arthropods, and some of these species cause febrile diseases in humans and livestock. Genome sequencing has only been performed with cultured Ehrlichia species, and the taxonomic status of such ehrlichiae has been estimated by core genome-based phylogenetic analysis. However, many uncultured ehrlichiae exist in nature throughout the world, including Japan. This study aimed to conduct a molecular-based taxonomic and ecological characterization of uncultured Ehrlichia species or genotypes from ticks in Japan. We first surveyed 616 Haemaphysalis ticks by p28-PCR screening and analyzed five additional housekeeping genes (16S rRNA, groEL, gltA, ftsZ, and rpoB) from 11 p28-PCR-positive ticks. Phylogenetic analyses of the respective genes showed similar trees but with some differences. Furthermore, we found that V1 in the V1–V9 regions of Ehrlichia 16S rRNA exhibited the greatest variability. From an ecological viewpoint, the amounts of ehrlichiae in a single tick were found to equal approximately 6.3E+3 to 2.0E+6. Subsequently, core-partial-RGGFR-based phylogenetic analysis based on the concatenated sequences of the five housekeeping loci revealed six Ehrlichia genotypes, which included potentially new Ehrlichia species. Thus, our approach contributes to the taxonomic profiling and ecological quantitative analysis of uncultured or unidentified Ehrlichia species or genotypes worldwide.

