CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets

Author(s): Connor D Harris, Ellis L Torrance, Kasie Raymann, Louis-Marie Bobay

Abstract The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on comparing all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons; instead, it uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Our results indicate that, besides being much faster than current methods, our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from https://github.com/lbobay/CoreCruncher. It is written in Python 3.7, can also run on Python 2.7 without modification, and requires the Python library NumPy and either USEARCH or BLAST. Certain options also require MUSCLE or MAFFT.
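The idea of avoiding all-pairs comparisons while screening out paralogs/xenologs by their identity scores can be sketched as follows. This is a minimal illustration under stated assumptions, not CoreCruncher's code: it assumes a single pivot genome has already been searched against every other genome (for example with USEARCH or BLAST), and the outlier rule (median minus k times the median absolute deviation) together with the parameters `k`, `mad_floor` and `min_fraction` are hypothetical stand-ins for the heuristic described in the abstract.

```python
import statistics


def low_identity_outliers(identities, k=3.0, mad_floor=1.0):
    """Return gene IDs whose best-hit identity is unusually low for this genome pair.

    identities: {pivot_gene_id: percent identity of its best hit in one other genome}.
    Hypothetical rule: flag hits below median - k * MAD (MAD floored at mad_floor).
    """
    values = list(identities.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    cutoff = med - k * max(mad, mad_floor)
    return {gene for gene, ident in identities.items() if ident < cutoff}


def core_from_pivot(hits_by_genome, pivot_genes, min_fraction=0.9, k=3.0):
    """Call a pivot gene 'core' if it has an accepted hit in >= min_fraction of genomes.

    hits_by_genome: {genome_id: {pivot_gene_id: percent identity}}; a missing key
    means no homolog was found in that genome.
    """
    support = dict.fromkeys(pivot_genes, 0)
    for hits in hits_by_genome.values():
        rejected = low_identity_outliers(hits, k=k)  # putative paralogs/xenologs
        for gene in hits:
            if gene in support and gene not in rejected:
                support[gene] += 1
    needed = min_fraction * len(hits_by_genome)
    return {gene for gene, count in support.items() if count >= needed}


def searches_needed(n_genomes):
    """(pivot-vs-rest searches, all-pairs comparisons) for n genomes."""
    return n_genomes - 1, n_genomes * (n_genomes - 1) // 2


if __name__ == "__main__":
    # Toy identity scores (hypothetical): pivot genes a-e searched against G1-G4.
    hits = {
        "G1": {"a": 98, "b": 97, "c": 99, "d": 60, "e": 98},
        "G2": {"a": 97, "b": 96, "c": 98, "e": 97},
        "G3": {"a": 99, "b": 98, "c": 97, "d": 96, "e": 70},
        "G4": {"a": 98, "b": 99, "c": 97, "d": 98, "e": 97},
    }
    print(sorted(core_from_pivot(hits, ["a", "b", "c", "d", "e"])))
    print(searches_needed(1000))
```

With the toy data, gene `d` is rejected in G1 as a low-identity outlier and missing from G2, and gene `e` is rejected in G3, so only `a`, `b` and `c` pass the 90% threshold. The second call contrasts 999 pivot searches with 499,500 pairwise comparisons for 1,000 genomes, which is the scaling problem the abstract describes.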

2015, Vol 28 (7), pp. 811-824
Author(s): José-María Vinardell, Sebastián Acosta-Jurado, Susanne Zehner, Michael Göttfert, Anke Becker, ...

Sinorhizobium fredii HH103 is a fast-growing rhizobial strain that infects a broad range of legumes, including both American and Asiatic soybeans. In this work, we present the sequencing and annotation of the HH103 genome (7.25 Mb), which consists of one chromosome and six plasmids and is the structurally most complex sinorhizobial genome sequenced so far. Comparative genomic analyses of S. fredii HH103 with strains USDA257 and NGR234 showed that the core genome of these three strains contains 4,212 genes (61.7% of the HH103 genes). Synteny plot analysis revealed that the much larger chromosome of USDA257 (6.48 Mb) is collinear with the HH103 (4.3 Mb) and NGR234 (3.9 Mb) chromosomes. An additional region of about 2 Mb in the USDA257 chromosome displays similarity to plasmid pSfHH103e. Remarkable differences exist between HH103 and NGR234 concerning nod genes, the effect of flavonoids on surface polysaccharide production, and quorum-sensing systems. Furthermore, a number of protein secretion systems were found. Two genes coding for putative type III–secreted effectors not previously described in S. fredii, nopI and gunA, were located in the HH103 genome. These differences could be important for understanding the different symbiotic behaviors of S. fredii strains HH103, USDA257, and NGR234 with soybean.
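A strict core genome of a few strains can be thought of as the intersection of their orthologous gene families, and a figure such as "61.7% of the HH103 genes" is the core size divided by the reference strain's gene count. The sketch below assumes orthology has already been assigned and treats each family as one gene (paralogs ignored); the family IDs and strain contents are made-up toy data, not the actual HH103, USDA257 or NGR234 annotations.

```python
def core_genome(families_by_strain: dict[str, set[str]]) -> set[str]:
    """Strict core: orthologous families present in every strain."""
    return set.intersection(*families_by_strain.values())


def core_fraction(core: set[str], reference: set[str]) -> float:
    """Percentage of the reference strain's families that fall in the core."""
    return 100.0 * len(core & reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical toy data: strain -> set of orthologous-family IDs it carries.
    toy = {
        "HH103": {"f1", "f2", "f3", "f4", "f5", "f6"},
        "USDA257": {"f1", "f2", "f3", "f4", "f7"},
        "NGR234": {"f1", "f2", "f3", "f4", "f8"},
    }
    core = core_genome(toy)
    print(sorted(core))  # ['f1', 'f2', 'f3', 'f4']
    print(round(core_fraction(core, toy["HH103"]), 1))  # 66.7
```

Real pipelines must first cluster genes into families and decide how to count multi-copy families, which is where most of the difficulty lies.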


2021
Author(s): Aurelie Labarre, David López-Escardó, Francisco Latorre, Guy Leonard, François Bucchini, ...

Abstract Heterotrophic lineages of stramenopiles exhibit enormous diversity in morphology, lifestyle, and habitat. Among them, the marine stramenopiles (MASTs) represent numerous independent lineages that are known only from environmental sequences retrieved from marine samples. The core energy metabolism of these unicellular eukaryotes is poorly understood. Here, we used single-cell genomics to retrieve, annotate, and compare the genomes of 15 MAST species, obtained by coassembling sequences from 140 individual cells sampled from marine surface plankton. Functional annotations of their gene repertoires are consistent with all of them being phagocytotic. The unique presence of rhodopsin genes in MAST species, together with their widespread expression in oceanic waters, supports the idea that MASTs may be capable of using sunlight to thrive in the photic ocean. Additional subsets of genes used in phagocytosis, such as proton pumps for vacuole acidification and peptidases for prey digestion, did not reveal particular trends in MAST genomes compared with nonphagocytotic stramenopiles, except for a greater presence and diversity of V-PPase genes. Our analysis reflects the complexity of the phagocytosis machinery in microbial eukaryotes, which contrasts with the well-defined set of genes for photosynthesis. These new genomic data provide an essential framework for studying the ecophysiology of uncultured species and for better understanding the function of rhodopsins and related carotenoids in stramenopiles.


2021, Vol 13 (2), pp. 179-194
Author(s): Serene Ong, Jeffrey Ling, Angela Ballantyne, Tamra Lysaght, Vicki Xafis

Abstract Governments are investing in precision medicine (PM) with the aim of improving healthcare through the use of genomic analyses and data analytics to develop tailored treatment approaches for individual patients. The success of PM is contingent upon clear public communications that engender trust and secure the social licence to collect and share large population-wide data sets, because specific consent for each data re-use is impractical. Variation in the terminology that different programmes use to describe PM may hinder clear communication and threaten trust. Language is used to create common understanding and expectations regarding precision medicine among researchers, clinicians and volunteers, so there is a need to better understand public interpretations of PM-related terminology. This paper reports on a qualitative study involving 24 focus group participants in the multilingual context of Singapore. The study explored how Singaporeans interpret and understand the terms ‘precision medicine’ and ‘personalised medicine’, and which term they felt more aptly communicates the concept and goals of PM. Results suggest that participants were unable to readily link the terms with this area of medicine and initially displayed a preference for the more familiar term ‘personalised’. The use of visual aids to convey key concepts resonated with participants, some of whom then indicated a preference for the term ‘precision’ as a more accurate description of PM research. These aids helped to facilitate dialogue around the ethical and social value, as well as the risks, of PM. Implications for programme developers and policy makers are discussed.


2008, Vol 191 (1), pp. 91-99
Author(s): Marc Deloger, Meriem El Karoui, Marie-Agnès Petit

ABSTRACT The fundamental unit of biological diversity is the species. However, genome sequencing has revealed a remarkable extent of intraspecies diversity in bacteria, which highlights the need for clear criteria to group strains within a species. Two main types of analysis are used to quantify intraspecies variation at the genome level: the average nucleotide identity (ANI), which measures the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates rely on BLAST alignments to define the DNA sequences common to a genome pair. Interestingly, however, the results of these two methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index that takes both diversity criteria into account and is based on the DNA maximal unique matches (MUMs) shared by two genomes. The values, called MUMi (for MUM index), correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. Because the MUMi is fast to calculate, it offers the possibility of measuring genome distances across the whole database of available genomes.
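As a rough illustration of how an index built on MUMs can be computed, the sketch below assumes the form MUMi = 1 - L_MUM / L_av, where L_MUM is the total length covered by MUMs and L_av is the average length of the two genomes, so identical genomes score 0 and genomes sharing no MUMs score 1. The MUM coordinates are assumed to come from an external MUM finder as hypothetical (start_a, start_b, length) tuples; counting overlapping MUMs once by merging them on genome A is a simplification of this sketch, and the paper should be consulted for the exact treatment of overlaps.

```python
def covered_length(intervals):
    """Total length of the union of half-open intervals [start, end)."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def mumi(mums, len_a, len_b):
    """Assumed MUMi = 1 - L_MUM / L_av (0 = identical, 1 = no shared MUMs).

    mums: iterable of (start_a, start_b, length) tuples for genome pair A/B.
    L_MUM is measured as the MUM coverage on genome A (overlaps counted once).
    """
    l_mum = covered_length((start_a, start_a + length) for start_a, _, length in mums)
    l_av = (len_a + len_b) / 2
    return 1 - l_mum / l_av


if __name__ == "__main__":
    # Hypothetical toy MUMs for two 1,000-bp sequences.
    print(mumi([(0, 0, 500), (400, 450, 200)], 1000, 1000))
    print(mumi([(0, 0, 1000)], 1000, 1000))
    print(mumi([], 1000, 1000))
```

In the first toy pair the two MUMs overlap on genome A, so they cover 600 bp rather than 700 bp, giving an index of about 0.4; the other two calls show the identical (0) and unrelated (1) extremes.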


Author(s): Jorge A. Moura de Sousa, Eduardo P. C. Rocha

Bacteriophages (phages) are bacterial parasites that can themselves be parasitized by phage satellites. The molecular mechanisms used by satellites to hijack phages are sometimes understood in great detail, but the origins, abundance, distribution, and composition of these elements are poorly known. Here, we show that P4-like elements are present in more than 30% of the genomes of Enterobacterales, and in almost half of those of Escherichia coli, sometimes in multiple distinct copies. We identified over 1000 P4-like elements with a highly conserved genetic organization of the core genome and a few hotspots with highly variable genes. These elements are never found in plasmids and have very little homology to known phages, suggesting an independent evolutionary origin. Instead, they are scattered across chromosomes, possibly because their integrases are often exchanged with other elements. The rooted phylogenies of hijacking functions are correlated and suggest longstanding coevolution. They also reveal broad host ranges in P4-like elements, as almost identical elements can be found in distinct bacterial genera. Our results show that P4-like phage satellites constitute a very distinct, widespread, and ancient family of mobile genetic elements. They pave the way for studying the molecular evolution of antagonistic interactions between phages and their satellites. This article is part of the theme issue ‘The secret lives of microbial mobile genetic elements’.


BMC Genomics, 2020, Vol 21 (1)
Author(s): Áron B. Kovács, Zsuzsa Kreizinger, Barbara Forró, Dénes Grózner, Alexa Mitter, ...

2020, Vol 11
Author(s): Jaione Valle, Xianyang Fang, Iñigo Lasa

Surface proteins that assemble as scaffold components of the biofilm matrix are among the major components of the staphylococcal biofilm. Of the different surface proteins able to contribute to biofilm formation, this review focuses on the Biofilm Associated Protein (Bap). Bap is part of the accessory genome of Staphylococcus aureus, but orthologs of Bap in other staphylococcal species belong to the core genome. When present, Bap promotes adhesion to abiotic surfaces and induces strong intercellular adhesion by self-assembling into amyloid-like aggregates in response to the calcium levels and pH of the environment. During infection, Bap enhances adhesion to epithelial cells, where it binds directly to the host receptor Gp96 and inhibits the entry of the bacteria into the cells. To perform such a diverse range of functions, Bap comprises several domains, some of which include several motifs associated with distinct functions. Based on the knowledge accumulated on the Bap protein of S. aureus, this review aims to summarize current knowledge of the structure and properties of each domain of Bap and their contribution to Bap functionality.


2019, Vol 7 (2), pp. 161-175
Author(s): Shreyas Sardesai

This article attempts to empirically test the claims made by several commentators that religious polarization was at the core of the 2019 Lok Sabha election verdict. Relying heavily on the National Election Study (NES) data sets, it finds that the election result was in large measure an outcome of massive vote consolidation along religious lines, with the majority Hindu community preferring the Bharatiya Janata Party (BJP)-led National Democratic Alliance (NDA) in unprecedented proportions and the main religious minorities largely staying away from it, although there were some exceptions. It shows that, for two national elections in a row, the BJP led by Narendra Modi and Amit Shah has been able to overcome caste hierarchies among Hindus and systematically construct a Hindu category of voters versus others. This chasm between Hindus and the minorities is also seen in their attitudes towards the government, its leadership and contentious issues such as the Ayodhya dispute. This article, however, does not find sufficient evidence for the claims that a large part of the Hindu support for the BJP-led alliance may have been on account of anti-minority sentiments.


2020, Vol 44 (6), pp. 740-762
Author(s): Changhan Lee, Jens Klockgether, Sebastian Fischer, Janja Trcek, Burkhard Tümmler, ...

ABSTRACT The environmental species Pseudomonas aeruginosa thrives in a variety of habitats. Within the epidemic population structure of P. aeruginosa, highly successful clones that are equally capable of succeeding in the environment and in the human host occasionally arise. Framed by a highly conserved core genome, individual members of successful clones are characterized by high variability in their accessory genome. The abundance of successful clones might be rooted in specific features of the core genome or, although not mutually exclusive, in the variability of the accessory genome. In clone C, one of the most prevalent clones, the plasmid pKLC102 and the PACGI-1 genomic island are two ubiquitous accessory genetic elements. The conserved transmissible locus of protein quality control (TLPQC) at the border of PACGI-1 is a unique, horizontally transferred composite element that codes predominantly for stress-related cargo gene products, such as those involved in protein homeostasis. As a hallmark, most TLPQC xenologues possess a core genome equivalent. Elevated temperature tolerance is characteristic of clone C strains, and the disaggregase ClpG, which is unique to P. aeruginosa and specific to clone C, is a major contributor to this tolerance. As other successful clones, such as PA14, do not encode the TLPQC locus, ubiquitous denominators of success, if they exist, remain to be identified.

