CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets

Molecular Biology and Evolution ◽

10.1093/molbev/msaa224 ◽

2020 ◽

Author(s):

Connor D Harris ◽

Ellis L Torrance ◽

Kasie Raymann ◽

Louis-Marie Bobay

Keyword(s):

Core Genome ◽

Genomic Data ◽

Data Sets ◽

The Core ◽

Genomic Analyses ◽

Massive Accumulation ◽

Genome Comparisons

Abstract The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on the comparison of all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
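The definition in the abstract above (genes shared by all, or nearly all, strains) can be illustrated with a minimal sketch. This is not CoreCruncher's algorithm: it assumes orthologous gene families have already been assigned, and the function name `core_genome`, the `min_fraction` parameter, and the strain and gene names are hypothetical.

```python
import math
from collections import Counter


def core_genome(presence, min_fraction=1.0):
    """Return gene families present in at least min_fraction of the genomes.

    presence: dict mapping a genome name to the set of gene-family
    identifiers found in that genome (hypothetical input format).
    """
    if not presence:
        return set()
    # Smallest number of genomes that satisfies the requested fraction;
    # min_fraction=1.0 gives the strict core (present in every genome).
    needed = math.ceil(min_fraction * len(presence))
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in presence.values() for fam in set(fams))
    return {fam for fam, n in counts.items() if n >= needed}


# Toy data: three hypothetical strains with made-up gene content.
genomes = {
    "strainA": {"dnaA", "gyrB", "recA", "tetM"},
    "strainB": {"dnaA", "gyrB", "recA"},
    "strainC": {"dnaA", "gyrB", "blaZ"},
}
strict_core = core_genome(genomes)                      # shared by all strains
relaxed_core = core_genome(genomes, min_fraction=0.6)   # shared by "nearly all"
```

Relaxing the threshold admits families missing from a minority of genomes, which is one common reading of "nearly all"; the hard part that tools such as CoreCruncher address, deciding which sequences are true orthologs, is outside this sketch.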


The Sinorhizobium fredii HH103 Genome: A Comparative Analysis With S. fredii Strains Differing in Their Symbiotic Behavior With Soybean

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-12-14-0397-fi ◽

2015 ◽

Vol 28 (7) ◽

pp. 811-824 ◽

Cited By ~ 33

Author(s):

José-María Vinardell ◽

Sebastián Acosta-Jurado ◽

Susanne Zehner ◽

Michael Göttfert ◽

Anke Becker ◽

...

Keyword(s):

Core Genome ◽

Comparative Genomic ◽

Secretion Systems ◽

Sinorhizobium Fredii ◽

The Core ◽

Sensing Systems ◽

Plot Analysis ◽

Genomic Analyses ◽

Surface Polysaccharide ◽

Protein Secretion Systems

Sinorhizobium fredii HH103 is a fast-growing rhizobial strain infecting a broad range of legumes including both American and Asiatic soybeans. In this work, we present the sequencing and annotation of the HH103 genome (7.25 Mb), consisting of one chromosome and six plasmids and representing the structurally most complex sinorhizobial genome sequenced so far. Comparative genomic analyses of S. fredii HH103 with strains USDA257 and NGR234 showed that the core genome of these three strains contains 4,212 genes (61.7% of the HH103 genes). Synteny plot analysis revealed that the much larger chromosome of USDA257 (6.48 Mb) is colinear to the HH103 (4.3 Mb) and NGR234 (3.9 Mb) chromosomes. An additional region of the USDA257 chromosome of about 2 Mb displays similarity to plasmid pSfHH103e. Remarkable differences exist between HH103 and NGR234 concerning nod genes, flavonoid effect on surface polysaccharide production, and quorum-sensing systems. Furthermore, a number of protein secretion systems have been found. Two genes coding for putative type III–secreted effectors not previously described in S. fredii, nopI and gunA, have been located on the HH103 genome. These differences could be important for understanding the different symbiotic behavior of S. fredii strains HH103, USDA257, and NGR234 with soybean.


Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen Listeria monocytogenes reveal new insights into the core genome components of this species

Nucleic Acids Research ◽

10.1093/nar/gkh562 ◽

2004 ◽

Vol 32 (8) ◽

pp. 2386-2395 ◽

Cited By ~ 368

Author(s):

K. E. Nelson

Keyword(s):

Listeria Monocytogenes ◽

Core Genome ◽

Whole Genome ◽

The Core ◽

Food Borne ◽

Serotype 4B ◽

Genome Comparisons


Comparative genomics reveals new functional insights in uncultured MAST species

The ISME Journal ◽

10.1038/s41396-020-00885-8 ◽

2021 ◽

Author(s):

Aurelie Labarre ◽

David López-Escardó ◽

Francisco Latorre ◽

Guy Leonard ◽

François Bucchini ◽

...

Keyword(s):

Energy Metabolism ◽

Genomic Data ◽

Proton Pumps ◽

Functional Annotations ◽

The Core ◽

Unicellular Eukaryotes ◽

Microbial Eukaryotes ◽

Core Energy ◽

Marine Surface ◽

Environmental Sequences

Abstract Heterotrophic lineages of stramenopiles exhibit enormous diversity in morphology, lifestyle, and habitat. Among them, the marine stramenopiles (MASTs) represent numerous independent lineages that are only known from environmental sequences retrieved from marine samples. The core energy metabolism characterizing these unicellular eukaryotes is poorly understood. Here, we used single-cell genomics to retrieve, annotate, and compare the genomes of 15 MAST species, obtained by coassembling sequences from 140 individual cells sampled from the marine surface plankton. Functional annotations from their gene repertoires are compatible with all of them being phagocytotic. The unique presence of rhodopsin genes in MAST species, together with their widespread expression in oceanic waters, supports the idea that MASTs may be capable of using sunlight to thrive in the photic ocean. Additional subsets of genes used in phagocytosis, such as proton pumps for vacuole acidification and peptidases for prey digestion, did not reveal particular trends in MAST genomes as compared with nonphagocytotic stramenopiles, except a larger presence and diversity of V-PPase genes. Our analysis reflects the complexity of phagocytosis machinery in microbial eukaryotes, which contrasts with the well-defined set of genes for photosynthesis. These new genomic data provide the essential framework to study ecophysiology of uncultured species and to gain better understanding of the function of rhodopsins and related carotenoids in stramenopiles.


Perceptions of ‘Precision’ and ‘Personalised’ Medicine in Singapore and Associated Ethical Issues

Asian Bioethics Review ◽

10.1007/s41649-021-00165-3 ◽

2021 ◽

Vol 13 (2) ◽

pp. 179-194

Author(s):

Serene Ong ◽

Jeffrey Ling ◽

Angela Ballantyne ◽

Tamra Lysaght ◽

Vicki Xafis

Keyword(s):

Precision Medicine ◽

Ethical Issues ◽

Personalised Medicine ◽

Large Population ◽

Data Sets ◽

Visual Aids ◽

Policy Makers ◽

The Social ◽

Genomic Analyses ◽

Common Understanding

Abstract Governments are investing in precision medicine (PM) with the aim of improving healthcare through the use of genomic analyses and data analytics to develop tailored treatment approaches for individual patients. The success of PM is contingent upon clear public communications that engender trust and secure the social licence to collect and share large population-wide data sets because specific consent for each data re-use is impractical. Variation in the terminology that different programmes use to describe PM may hinder clear communication and threaten trust. Language is used to create common understanding and expectations regarding precision medicine between researchers, clinicians and the volunteers. There is a need to better understand public interpretations of PM-related terminology. This paper reports on a qualitative study involving 24 focus group participants in the multi-lingual context of Singapore. The study explored how Singaporeans interpret and understand the terms ‘precision medicine’ and ‘personalised medicine’, and which term they felt more aptly communicates the concept and goals of PM. Results suggest that participants were unable to readily link the terms with this area of medicine and initially displayed preferences for the more familiar term of ‘personalised’. The use of visual aids to convey key concepts resonated with participants, some of whom then indicated preferences for the term ‘precision’ as being a more accurate description of PM research. These aids helped to facilitate dialogue around the ethical and social value, as well as the risks, of PM. Implications for programme developers and policy makers are discussed.


A Genomic Distance Based on MUM Indicates Discontinuity between Most Bacterial Species and Genera

Journal of Bacteriology ◽

10.1128/jb.01202-08 ◽

2008 ◽

Vol 191 (1) ◽

pp. 91-99 ◽

Cited By ~ 115

Author(s):

Marc Deloger ◽

Meriem El Karoui ◽

Marie-Agnès Petit

Keyword(s):

Dna Sequences ◽

Dna Content ◽

Core Genome ◽

Biological Diversity ◽

Bacterial Species ◽

Genomic Distance ◽

The Core ◽

Intraspecies Diversity ◽

Genome Level ◽

Definition Of

ABSTRACT The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes.
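The abstract above does not state how the MUM index is computed. As a rough illustration only, the sketch below assumes the index takes the form MUMi = 1 - L_mum / L_av, where L_mum is the summed length of non-overlapping maximal unique matches and L_av is the mean length of the two genomes. That form is an assumption here, not something given in the abstract, and the function name, inputs, and numbers are hypothetical.

```python
def mumi(mum_lengths, genome_len_a, genome_len_b):
    """Distance in [0, 1] under the assumed form 1 - L_mum / L_av.

    mum_lengths: lengths (bp) of non-overlapping maximal unique matches
    shared by the two genomes, e.g. as reported by a MUM finder
    (hypothetical input; overlap trimming is assumed to be done already).
    """
    l_mum = sum(mum_lengths)                   # total matched length
    l_av = (genome_len_a + genome_len_b) / 2   # average genome length
    return 1 - l_mum / l_av


# Toy numbers: 1 Mb of unique matches between two 2 Mb genomes.
distance = mumi([400_000, 350_000, 250_000], 2_000_000, 2_000_000)
```

Under this assumed form, identical genomes fully covered by matches give a value near 0, and genomes sharing no unique matches give 1, which fits the abstract's description of the index as a genomic distance.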


To catch a hijacker: abundance, evolution and genetic diversity of P4-like bacteriophage satellites

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2020.0475 ◽

2021 ◽

Vol 377 (1842) ◽

Author(s):

Jorge A. Moura de Sousa ◽

Eduardo P. C. Rocha

Keyword(s):

Escherichia Coli ◽

Molecular Mechanisms ◽

Core Genome ◽

Mobile Genetic Elements ◽

Abundance Distribution ◽

Theme Issue ◽

Genetic Elements ◽

The Core ◽

Antagonistic Interactions ◽

Variable Genes

Bacteriophages (phages) are bacterial parasites that can themselves be parasitized by phage satellites. The molecular mechanisms used by satellites to hijack phages are sometimes understood in great detail, but the origins, abundance, distribution and composition of these elements are poorly known. Here, we show that P4-like elements are present in more than 30% of the genomes of Enterobacterales, and in almost half of those of Escherichia coli, sometimes in multiple distinct copies. We identified over 1000 P4-like elements with very conserved genetic organization of the core genome and a few hotspots with highly variable genes. These elements are never found in plasmids and have very little homology to known phages, suggesting an independent evolutionary origin. Instead, they are scattered across chromosomes, possibly because their integrases are often exchanged with other elements. The rooted phylogenies of hijacking functions are correlated and suggest longstanding coevolution. They also reveal broad host ranges in P4-like elements, as almost identical elements can be found in distinct bacterial genera. Our results show that P4-like phage satellites constitute a very distinct, widespread and ancient family of mobile genetic elements. They pave the way for studying the molecular evolution of antagonistic interactions between phages and their satellites. This article is part of the theme issue ‘The secret lives of microbial mobile genetic elements’.


The core genome multi-locus sequence typing of Mycoplasma anserisalpingitidis

BMC Genomics ◽

10.1186/s12864-020-06817-2 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Áron B. Kovács ◽

Zsuzsa Kreizinger ◽

Barbara Forró ◽

Dénes Grózner ◽

Alexa Mitter ◽

...

Keyword(s):

Core Genome ◽

Multi Locus Sequence Typing ◽

The Core


Revisiting Bap Multidomain Protein: More Than Sticking Bacteria Together

Frontiers in Microbiology ◽

10.3389/fmicb.2020.613581 ◽

2020 ◽

Vol 11 ◽

Author(s):

Jaione Valle ◽

Xianyang Fang ◽

Iñigo Lasa

Keyword(s):

Biofilm Formation ◽

Core Genome ◽

Current Knowledge ◽

Surface Proteins ◽

Structure And Properties ◽

Diverse Range ◽

The Core ◽

Self Assembling ◽

Abiotic Surfaces ◽

Range Of Functions

One of the major components of the staphylococcal biofilm is surface proteins that assemble as scaffold components of the biofilm matrix. Among the different surface proteins able to contribute to biofilm formation, this review is dedicated to the Biofilm Associated Protein (Bap). Bap is part of the accessory genome of Staphylococcus aureus, but orthologs of Bap in other staphylococcal species belong to the core genome. When present, Bap promotes adhesion to abiotic surfaces and induces strong intercellular adhesion by self-assembling into amyloid-like aggregates in response to the levels of calcium and the pH in the environment. During infection, Bap enhances the adhesion to epithelial cells, where it binds directly to the host receptor Gp96 and inhibits the entry of the bacteria into the cells. To perform such a diverse range of functions, Bap comprises several domains, and some of them include several motifs associated with distinct functions. Based on the knowledge accumulated with the Bap protein of S. aureus, this review aims to summarize the current knowledge of the structure and properties of each domain of Bap and their contribution to Bap functionality.


The Religious Divide in Voting Preferences and Attitudes in the 2019 Election

Studies in Indian Politics ◽

10.1177/2321023019874892 ◽

2019 ◽

Vol 7 (2) ◽

pp. 161-175 ◽

Cited By ~ 1

Author(s):

Shreyas Sardesai

Keyword(s):

Sufficient Evidence ◽

Data Sets ◽

Election Study ◽

Large Measure ◽

National Elections ◽

Election Result ◽

The Core ◽

National Election Study ◽

Bharatiya Janata Party ◽

The Government

This article attempts to empirically test the claims made by several commentators that religious polarization was at the core of the 2019 Lok Sabha election verdict. Relying heavily on the National Election Study (NES) data sets, it finds that the election result was in large measure an outcome of massive vote consolidation on religious lines, with the majority Hindu community preferring the Bharatiya Janata Party (BJP)-led National Democratic Alliance (NDA) in unprecedented proportion and the main religious minorities largely staying away from it, although there were some exceptions. It shows that, for two national elections in a row, the Narendra Modi- and Amit Shah-led BJP has been able to overcome the caste hierarchies among Hindus and systematically construct a Hindu category of voters versus others. This chasm between Hindus and the minorities is also seen with respect to their attitudes regarding the government, its leadership and contentious issues like the Ayodhya dispute. This article, however, does not find sufficient evidence with regard to the claims that a large part of the Hindu support for the BJP-led alliance may have been on account of anti-minority sentiments.


Why? – Successful Pseudomonas aeruginosa clones with a focus on clone C

FEMS Microbiology Reviews ◽

10.1093/femsre/fuaa029 ◽

2020 ◽

Vol 44 (6) ◽

pp. 740-762

Author(s):

Changhan Lee ◽

Jens Klockgether ◽

Sebastian Fischer ◽

Janja Trcek ◽

Burkhard Tümmler ◽

...

Keyword(s):

Pseudomonas Aeruginosa ◽

Core Genome ◽

Genomic Island ◽

Protein Quality ◽

Temperature Tolerance ◽

Protein Homeostasis ◽

Gene Products ◽

Accessory Genome ◽

The Core ◽

Conserved Core

ABSTRACT The environmental species Pseudomonas aeruginosa thrives in a variety of habitats. Within the epidemic population structure of P. aeruginosa, occasionally highly successful clones arise that are equally capable of succeeding in the environment and the human host. Framed by a highly conserved core genome, individual members of successful clones are characterized by a high variability in their accessory genome. The abundance of successful clones might be founded in specific features of the core genome or, although not mutually exclusive, in the variability of the accessory genome. In clone C, one of the most predominant clones, the plasmid pKLC102 and the PACGI-1 genomic island are two ubiquitous accessory genetic elements. The conserved transmissible locus of protein quality control (TLPQC) at the border of PACGI-1 is a unique horizontally transferred compository element, which codes predominantly for stress-related cargo gene products such as those involved in protein homeostasis. As a hallmark, most TLPQC xenologues possess a core genome equivalent. With elevated temperature tolerance as a characteristic of clone C strains, the unique P. aeruginosa and clone C specific disaggregase ClpG is a major contributor to tolerance. As other successful clones, such as PA14, do not encode the TLPQC locus, ubiquitous denominators of success, if existing, need to be identified.
