panX: pan-genome analysis and exploration

2016 ◽  
Author(s):  
Wei Ding ◽  
Franz Baumdicker ◽  
Richard A. Neher

Horizontal transfer, gene loss, and duplication result in dynamic bacterial genomes shaped by a complex mixture of different modes of evolution. Closely related strains can differ in the presence or absence of many genes, and the total number of distinct genes found in a set of related isolates – the pan-genome – is often many times larger than the genome of individual isolates. We have developed a pipeline that efficiently identifies orthologous gene clusters in the pan-genome. This pipeline is coupled to a powerful yet easy-to-use web-based visualization software for interactive exploration of the pan-genome. The visualization consists of connected components that allow rapid filtering and searching of genes and inspection of their evolutionary history. For each gene cluster, panX displays an alignment, a phylogenetic tree, maps mutations within that cluster to the branches of the tree and infers gain and loss of genes on the core-genome phylogeny. PanX is available at pangenome.de. Custom pan-genomes can be visualized either using a webserver or by serving panX locally as a browser-based application.

2021 ◽  
Vol 9 (8) ◽  
pp. 1761
Author(s):  
Gaurav Agarwal ◽  
Ronald D. Gitaitis ◽  
Bhabesh Dutta

Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot on foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onions. Our recent host range evaluation study identified two pathovars: P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., which are pathogenic on onions and millets or on millets only, respectively. In the current study, we developed a pan-genome using whole-genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7030 genes. Among these, 3546 (present in genomes of all 17 strains) were the core genes that were a subset of 3682 soft-core genes (present in ≥16 strains). The accessory genome included 1308 shell genes and 2040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6000 genes, suggesting that the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using a PAV/wgMLST approach in comparison with core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study using Psi strains from both pathovars along with strains from other Pantoea species, namely, P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfer events occurring between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven gene-clusters, which were associated with the pathogenicity phenotype (necrosis on seedling) on onions. One of the gene-clusters contained 11 genes with known functions and was found to be chromosomally located.
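
The core, soft-core, shell, and cloud categories described above can be illustrated with a minimal sketch. It assumes that each gene is summarized by the number of strains carrying it. The function name `classify_gene`, the toy gene names, and the shell range (3 to 15 strains, inferred from the stated thresholds) are illustrative assumptions and not the authors' pipeline.

```python
from collections import Counter

N_STRAINS = 17  # number of Psi genomes reported in the abstract


def classify_gene(n_present: int, n_strains: int = N_STRAINS) -> str:
    """Assign a pan-genome category from the number of strains carrying a gene."""
    if n_present == n_strains:
        return "core"  # present in every strain
    if n_present >= n_strains - 1:
        # Present in 16 strains; the abstract counts core genes as a
        # subset of soft-core, here they are reported separately.
        return "soft-core"
    if n_present <= 2:
        return "cloud"
    return "shell"  # assumed range: 3..15 strains


# Hypothetical toy data: gene -> number of strains in which it was found.
presence_counts = {"geneA": 17, "geneB": 16, "geneC": 9, "geneD": 1}
categories = {gene: classify_gene(n) for gene, n in presence_counts.items()}
summary = Counter(categories.values())
print(categories)
print(summary)
```

As a consistency check on the reported figures, the soft-core (3682), shell (1308), and cloud (2040) counts sum to the stated pan-genome size of 7030 genes.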


2021 ◽  
Vol 7 (5) ◽  
pp. 337
Author(s):  
Daniel Peterson ◽  
Tang Li ◽  
Ana M. Calvo ◽  
Yanbin Yin

Phytopathogenic Ascomycota are responsible for substantial economic losses each year, destroying valuable crops. The present study aims to provide new insights into phytopathogenicity in Ascomycota from a comparative genomic perspective. This has been achieved by categorizing orthologous gene groups (orthogroups) from 68 phytopathogenic and 24 non-phytopathogenic Ascomycota genomes into three classes: core, (pathogen or non-pathogen) group-specific, and genome-specific accessory orthogroups. We found that (i) ~20% of orthogroups are group-specific and accessory in the 92 Ascomycota genomes, (ii) phytopathogenicity is not phylogenetically determined, (iii) group-specific orthogroups have more enriched functional terms than accessory orthogroups and this trend is particularly evident in phytopathogenic fungi, (iv) secreted proteins with signal peptides and horizontal gene transfers (HGTs) are the two functional terms that show the highest occurrence and significance in group-specific orthogroups, and (v) a number of other functional terms also show higher significance and occurrence in group-specific orthogroups. Overall, our comparative genomics analysis identified positive enrichment between orthogroup classes and suggested which genomic characteristics make an Ascomycete phytopathogenic. We conclude that genes shared by multiple phytopathogenic genomes are more important for phytopathogenicity than those that are unique to each genome.


2021 ◽  
Vol 9 (4) ◽  
pp. 768
Author(s):  
Karel Kopejtka ◽  
Yonghui Zeng ◽  
David Kaftan ◽  
Vadim Selyanin ◽  
Zdenko Gardian ◽  
...  

An aerobic, yellow-pigmented, bacteriochlorophyll a-producing strain, designated AAP5 (=DSM 111157=CCUG 74776), was isolated from the alpine lake Gossenköllesee located in the Tyrolean Alps, Austria. Here, we report its description and polyphasic characterization. Phylogenetic analysis of the 16S rRNA gene showed that strain AAP5 belongs to the bacterial genus Sphingomonas and has the highest pairwise 16S rRNA gene sequence similarity with Sphingomonas glacialis (98.3%), Sphingomonas psychrolutea (96.8%), and Sphingomonas melonis (96.5%). Its genomic DNA G + C content is 65.9%. Further, in silico DNA-DNA hybridization and calculation of the average nucleotide identity support a close phylogenetic relationship between AAP5 and Sphingomonas glacialis. The high percentage (76.2%) of shared orthologous gene clusters between strain AAP5 and Sphingomonas paucimobilis NCTC 11030T, the type species of the genus, supports the classification of the two strains into the same genus. Strain AAP5 was found to contain C18:1ω7c (64.6%) as a predominant fatty acid (>10%) and the polar lipid profile contained phosphatidylglycerol, diphosphatidylglycerol, phosphatidylethanolamine, sphingoglycolipid, six unidentified glycolipids, one unidentified phospholipid, and two unidentified lipids. The main respiratory quinone was ubiquinone-10. Strain AAP5 is a facultative photoheterotroph containing type-2 photosynthetic reaction centers and, in addition, contains a xanthorhodopsin gene. No CO2-fixation pathways were found.


2015 ◽  
Vol 112 (29) ◽  
pp. 9070-9075 ◽  
Author(s):  
Purushottam D. Dixit ◽  
Tin Yau Pang ◽  
F. William Studier ◽  
Sergei Maslov

An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40–115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4–1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.
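
The pair count and the per-pair SNP percentages mentioned above can be illustrated with a short sketch. It assumes sequences that are already aligned to equal length, with `-` marking gaps. The function name `snp_fraction` and the toy sequences are hypothetical, and the code does not reproduce the filtering or the recombination model used in the study.

```python
from itertools import combinations
from math import comb


def snp_fraction(a: str, b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip positions with a gap in either sequence
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


# 32 genomes give 32 * 31 / 2 = 496 unordered pairs, as stated in the abstract.
n_pairs = comb(32, 2)

# Hypothetical toy alignment of three strains.
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "ACG-ACGAAT"}
pairwise = {
    (p, q): snp_fraction(toy[p], toy[q])
    for p, q in combinations(sorted(toy), 2)
}
print(n_pairs)
print(pairwise)
```

A windowed version of `snp_fraction` along the alignment would be one way to look for the clonal versus recombinant segments described above, but that extension is not shown here.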


1997 ◽  
Vol 45 (5) ◽  
pp. 467-472 ◽  
Author(s):  
Janet L. Siefert ◽  
Kirt A. Martin ◽  
Fadi Abdi ◽  
William R. Widger ◽  
George E. Fox

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Rachel M. Colquhoun ◽  
Michael B. Hall ◽  
Leandro Lima ◽  
Leah W. Roberts ◽  
Kerri M. Malone ◽  
...  

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.


2020 ◽  
Author(s):  
Sandra A. C. Figueiredo ◽  
Marco Preto ◽  
Gabriela Moreira ◽  
Teresa P. Martins ◽  
Kathleen Abt ◽  
...  

Natural products have an important role in several human activities, most notably as sources of new drugs. In recent years, massive sequencing and annotation of bacterial genomes has revealed an unexpectedly large number of secondary metabolite biosynthetic gene clusters whose products are yet to be discovered. For example, cyanobacterial genomes contain a large number of gene clusters that likely incorporate fatty acid-derived moieties, but for most cases we lack the knowledge and tools to effectively predict or detect the encoded natural products. Here, we exploit the apparent lack of a functional beta-oxidation pathway in cyanobacteria to achieve efficient stable-isotope labeling of their fatty acid-derived lipidome. We show that supplementation of cyanobacterial cultures with deuterated fatty acids can be used to easily detect natural product signatures in individual strains. The utility of this strategy is demonstrated in two cultured cyanobacteria by uncovering analogues of the multidrug-resistance reverting hapalosin, and novel, cytotoxic, lactylate-nocuolin A hybrids – the nocuolactylates.


2021 ◽  
Author(s):  
Pradeep Ruperao ◽  
Nepolean Thirunavukkarasu ◽  
Prasad Gandham ◽  
Sivasubramani S. ◽  
Govindaraj M ◽  
...  

Sorghum (Sorghum bicolor L.) is one of the most important food crops in the arid and rainfed production ecologies. It is a part of resilient farming and is projected as a smart crop to overcome the food and nutritional challenges in the developing world. The development and characterisation of the sorghum pan-genome will provide insight into genome diversity and functionality, supporting sorghum improvement. We built a sorghum pan-genome using reference genomes as well as 354 genetically diverse sorghum accessions belonging to different races. We explored the structural and functional characteristics of the pan-genome and explain its utility in supporting genetic gain. The newly-developed pan-genome has a total of 35,719 genes, a core genome of 16,821 genes and an average of 32,795 genes in each cultivar. The variable genes are enriched with environment responsive genes and classify the sorghum accessions according to their race. We show that 53% of genes display presence-absence variation, and some of these variable genes are predicted to be functionally associated with drought traits. Using more than two million SNPs from the pan-genome, association analysis identified 398 SNPs significantly associated with important agronomic traits, of which, 92 were in genes. Drought gene expression analysis identified 1,788 genes that are functionally linked to different conditions, of which 79 were absent from the reference genome assembly. This study provides comprehensive genomic diversity resources in sorghum which can be used in genome assisted crop improvement.
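
The gene counts reported above can be related to each other with a short worked calculation. It assumes that the variable genes are the pan-genome genes outside the core genome. The abstract does not state how the 53% figure was derived, so this reading is an assumption, and the variable names are illustrative.

```python
# Figures taken from the abstract.
pan_genes = 35_719          # total genes in the pan-genome
core_genes = 16_821         # genes in the core genome
avg_genes_per_cultivar = 32_795

# Assumption: variable genes = pan-genome genes that are not core.
variable_genes = pan_genes - core_genes
variable_share = variable_genes / pan_genes

# Share of the pan-genome carried by an average cultivar.
avg_cultivar_share = avg_genes_per_cultivar / pan_genes

print(f"variable genes: {variable_genes} ({variable_share:.1%})")
print(f"average cultivar carries {avg_cultivar_share:.1%} of the pan-genome")
```

Under this assumption the variable fraction rounds to 53%, which matches the proportion of genes reported to display presence-absence variation.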


2021 ◽  
Vol 12 ◽  
Author(s):  
Carlos Caicedo-Montoya ◽  
Monserrat Manzo-Ruiz ◽  
Rigoberto Ríos-Estepa

Species of the genus Streptomyces are known for their ability to produce multiple secondary metabolites; their genomes have been extensively explored to discover new bioactive compounds. The richness of genomic data currently available allows filtering for high quality genomes, which in turn permits reliable comparative genomics studies and an improved prediction of biosynthetic gene clusters (BGCs) through genome mining approaches. In this work, we used 121 genome sequences of the genus Streptomyces in a comparative genomics study with the aim of estimating the genomic diversity by protein domain content, sequence similarity of proteins and conservation of Intergenic Regions (IGRs). We also searched for BGCs, prioritizing those with potential antibiotic activity. Our analysis revealed that the pan-genome of the genus Streptomyces is clearly open, with a high quantity of unique gene families across the different species, and that the IGRs are rarely conserved. We also described the phylogenetic relationships of the analyzed genomes using multiple markers, obtaining a trustworthy tree whose relationships were further validated by Average Nucleotide Identity (ANI) calculations. Finally, 33 biosynthetic gene clusters were detected to have potential antibiotic activity and a predicted mode of action, which might serve as a guide for the formulation of related experimental studies.


2017 ◽  
Author(s):  
Christian Munck ◽  
Mostafa M. Hashim Ellabaan ◽  
Michael Schantz Klausen ◽  
Morten O.A. Sommer

Genes capable of conferring resistance to clinically used antibiotics have been found in many different natural environments. However, a concise overview of the resistance genes found in common human bacterial pathogens is lacking, which complicates risk ranking of environmental reservoirs. Here, we present an analysis of potential antibiotic resistance genes in the 17 most common bacterial pathogens isolated from humans. We analyzed more than 20,000 bacterial genomes and defined a clinical resistome as the set of resistance genes found across these genomes. Using this database, we uncovered the co-occurrence frequencies of the resistance gene clusters within each species, enabling identification of co-dissemination and co-selection patterns. The resistance genes identified in this study represent the subset of the environmental resistome that is clinically relevant, and the dataset and approach provide a baseline for further investigations into the abundance of clinically relevant resistance genes across different environments. To facilitate an easy overview, the data is presented at the species level at www.resistome.biosustain.dtu.dk.
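
The notion of co-occurrence frequency used above can be sketched as follows. The example assumes that each genome of one species is represented by the set of resistance genes detected in it. The function name `cooccurrence_frequencies` and the toy genome contents are hypothetical and do not come from the study's dataset or its actual clustering procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequencies(
    genomes: dict[str, set[str]],
) -> dict[tuple[str, str], float]:
    """Fraction of genomes in which each unordered pair of genes occurs together."""
    pair_counts: Counter = Counter()
    for genes in genomes.values():
        # Sort so that each pair is always stored in the same order.
        pair_counts.update(combinations(sorted(genes), 2))
    n_genomes = len(genomes)
    return {pair: count / n_genomes for pair, count in pair_counts.items()}


# Hypothetical toy data for a single species: genome -> detected resistance genes.
toy = {
    "g1": {"blaTEM", "tetA", "sul1"},
    "g2": {"blaTEM", "sul1"},
    "g3": {"tetA"},
    "g4": {"blaTEM", "sul1", "tetA"},
}
freqs = cooccurrence_frequencies(toy)
for pair, freq in sorted(freqs.items()):
    print(pair, freq)
```

Pairs with a frequency close to the frequency of each individual gene are candidates for co-dissemination, although distinguishing that from shared ancestry would need the kind of species-level analysis the abstract describes.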

