Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy

Mapping Intimacies ◽

10.1101/771964 ◽

2019 ◽

Cited By ~ 13

Author(s):

Donovan H. Parks ◽

Maria Chuvochina ◽

Pierre-Alain Chaumeil ◽

Christian Rinke ◽

Aaron J. Mussig ◽

...

Keyword(s):

De Novo ◽

Reference Tree ◽

Archaeal Species ◽

Accepted Average ◽

Scientific Results ◽

Public Repositories ◽

Taxonomic Framework ◽

Taxonomic Assignments ◽

Type Strains ◽

Selection Of

Abstract: We recently introduced the Genome Taxonomy Database (GTDB), a phylogenetically consistent, genome-based taxonomy providing rank normalized classifications for nearly 150,000 genomes from domain to genus. However, nearly 40% of the genomes used to infer the GTDB reference tree lack a species name, reflecting the large number of genomes in public repositories without complete taxonomic assignments. Here we address this limitation by proposing 24,706 species clusters which encompass all publicly available bacterial and archaeal genomes when using commonly accepted average nucleotide identity (ANI) criteria for circumscribing species. In contrast to previous ANI studies, we selected a single representative genome to serve as the nomenclatural type for circumscribing each species with type strains used where available. We complemented the 8,792 species clusters with validly or effectively published names with 15,914 de novo species clusters in order to assign placeholder names to the growing number of genomes from uncultivated species. This provides the first complete domain to species taxonomic framework which will improve communication of scientific results.
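
As an illustration of the representative-based clustering described in this abstract, the following Python sketch assigns each genome to the closest existing representative when their ANI meets a threshold, and otherwise opens a new cluster with that genome as its representative. It is a minimal sketch and not the GTDB procedure: the 95% threshold, the precomputed `ani` lookup, the function names and the genome identifiers are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def pair_ani(ani: Dict[FrozenSet[str], float], a: str, b: str) -> float:
    """Look up a symmetric ANI value; pairs missing from the table count as 0 (assumption)."""
    return ani.get(frozenset((a, b)), 0.0)


def cluster_by_ani(
    genomes: Iterable[str],
    type_strains: Set[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,  # assumed species-level ANI cutoff, not taken from the abstract
) -> Dict[str, List[str]]:
    """Greedy clustering: map each representative genome to the members of its cluster."""
    # Type-strain genomes are visited first so that they become representatives
    # wherever they are available; ties are broken by genome name.
    ordered = sorted(genomes, key=lambda g: (g not in type_strains, g))
    clusters: Dict[str, List[str]] = {}
    for genome in ordered:
        best_rep, best_ani = None, threshold
        for rep in clusters:
            value = pair_ani(ani, genome, rep)
            if value >= best_ani:
                best_rep, best_ani = rep, value
        if best_rep is None:
            # No representative is close enough: start a new (de novo) cluster.
            clusters[genome] = [genome]
        else:
            clusters[best_rep].append(genome)
    return clusters
```

With a hypothetical table in which g1 and g2 share 97.1% ANI and g3 and g4 share 96.0%, and with g2 marked as a type strain, the sketch yields one cluster represented by g2 and one de novo cluster represented by g3.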


Posttransplant CMV infection and the role of immunosuppression (Sponsor: Novartis Pharma GmbH, Nürnberg)

Therapieforum Transplant ◽

10.1055/a-0711-9047 ◽

2016 ◽

Vol 04 (01) ◽

pp. 4-10

Keyword(s):

Lower Incidence ◽

Paradigm Shift ◽

Mtor Inhibitor ◽

De Novo ◽

Expert Panel ◽

Risk Of Infection ◽

Cmv Infection ◽

Expert Meeting ◽

Selection Of

Abstract: Immunosuppression permits graft survival after transplantation and consequently a longer and better life. On the other hand, it increases the risk of infection, for instance with cytomegalovirus (CMV). However, the various available immunosuppressive therapies differ in this regard. One of the first clinical trials using de novo everolimus after kidney transplantation [1] already revealed a considerably lower incidence of CMV infection in the everolimus arms than in the mycophenolate mofetil (MMF) arm. This result was repeatedly confirmed in later studies [2–4]. Everolimus is now considered a substance with antiviral properties. This article is based on the expert meeting “Posttransplant CMV infection and the role of immunosuppression”. The expert panel called for a paradigm shift: In a CMV prevention strategy the targeted selection of the immunosuppressive therapy is also a key element. For patients with elevated risk of CMV, mTOR inhibitor-based immunosuppression is advantageous as it is associated with a significantly lower incidence of CMV events.


Selection of Tree Nut Allergen Peptide Markers: A Need for Improved Protein Sequence Databases

Journal of AOAC International ◽

10.1093/jaoac/102.5.1263 ◽

2019 ◽

Vol 102 (5) ◽

pp. 1263-1270 ◽

Cited By ~ 1

Author(s):

Weili Xiong ◽

Melinda A McFarland ◽

Cary Pirone ◽

Christine H Parker

Keyword(s):

Food Allergen ◽

Protein Sequence ◽

Sequence Information ◽

Sequencing Data ◽

Reference Tree ◽

Candidate Peptide ◽

Tree Nut ◽

Allergen Detection ◽

Sequence Databases ◽

Selection Of

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.


Cydrasil 3, a curated 16S rRNA gene reference package and web app for cyanobacterial phylogenetic placement

Scientific Data ◽

10.1038/s41597-021-01015-5 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Daniel Roush ◽

Ana Giraldo-Silva ◽

Ferran Garcia-Pichel

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Web Application ◽

De Novo ◽

Reference Sequence ◽

Rrna Gene ◽

Automated Classification ◽

Reference Tree ◽

Phylogenetic Placement ◽

Routine Identification

Abstract: Cyanobacteria are a widespread and important bacterial phylum, responsible for a significant portion of global carbon and nitrogen fixation. Unfortunately, reliable and accurate automated classification of cyanobacterial 16S rRNA gene sequences is muddled by conflicting systematic frameworks, inconsistent taxonomic definitions (including the phylum itself), and database errors. To address this, we introduce Cydrasil 3 (https://www.cydrasil.org), a curated 16S rRNA gene reference package, database, and web application designed to provide a full phylogenetic perspective for cyanobacterial systematics and routine identification. Cydrasil 3 contains over 1300 manually curated sequences longer than 1100 base pairs and can be used for phylogenetic placement or as a reference sequence set for de novo phylogenetic reconstructions. The web application (utilizing PaPaRA and EPA-ng) can place thousands of sequences into the reference tree and has detailed instructions on how to analyze results. Although the Cydrasil web application offers no taxonomic assignments, it provides phylogenetic placement, a searchable database with curation notes and metadata, and a mechanism for community feedback.


Frequency distribution of journalistic attention for scientific studies and scientific sources: An input – output analysis

10.31235/osf.io/jszt7 ◽

2020 ◽

Author(s):

Markus Lehmkuhl ◽

Nikolai Promies

Keyword(s):

Power Law ◽

Media Coverage ◽

Social Impact ◽

News Coverage ◽

Input Output ◽

Output Analysis ◽

Input Output Analysis ◽

Study Results ◽

Scientific Results ◽

Selection Of

Based on the decision-theoretical conditions underlying the selection of events for news coverage in science journalism, this article uses a novel input-output analysis to investigate which of the more than eight million scientific study results published between August 2014 and July 2018 have been selected by global journalism to a relevant degree. We are interested in two different structures in the media coverage of scientific results. Firstly, the structure of sources that journalists use, i.e. scientific journals, and secondly, the congruence of the journalistic selection of single results. Previous research suggests that the selection of sources and results follows a certain heavy-tailed distribution, a power law. Mathematically, this distribution can be described with a function of the form C*x^(-α). We argue that the exponent of such power law distributions can potentially be an indicator to describe selectivity in journalism on a high aggregation level. In our input-output analysis, we look for such patterns in the coverage of all scientific results published in the database Scopus over four years. To get an estimate of the coverage of these results, we use data from the altmetrics provider Altmetric, more precisely their Mainstream-Media-Score (MSM-Score). Based on exploratory analyses, we define papers with a score of 50 or above as Social Impact Papers (SIPs). Over our study period, we identified 5,833 SIPs published in 1,236 journals. We consider a power law fit with an exponent of about -2 to be plausible for the distribution of the source selection but cannot confirm the power law hypothesis for the distribution of the selection of single results. In this case, an exponentially truncated power law seems to be the better fit.
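
To make the power-law form in this abstract concrete, the Python sketch below estimates the exponent α of a density proportional to x^(-α) from a sample, using the standard continuous maximum-likelihood estimator α = 1 + n / Σ ln(x_i / x_min). This is an illustration only and not necessarily the fitting procedure the authors used: the function names, the fixed lower cutoff `x_min`, the assumption α > 1 and the synthetic data are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence


def powerlaw_alpha_mle(values: Sequence[float], x_min: float) -> float:
    """Continuous maximum-likelihood estimate of alpha for p(x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all observations equal x_min; alpha is not identifiable")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n: int, alpha: float, x_min: float, seed: int = 0) -> List[float]:
    """Draw n synthetic values by inverse-transform sampling (illustrative data only)."""
    rng = random.Random(seed)
    # 1 - u lies in (0, 1], so the power below is always finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

Applied to synthetic data drawn with α = 2, the estimator should return a value close to 2, which in the C*x^(-α) form matches the reported exponent of about -2.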


DEVELOPMENT OF TOURIST ORIENTED FARMS OF LVIV REGION

GEOGRAPHY AND TOURISM ◽

10.17721/2308-135x.2020.58.36-42 ◽

2020 ◽

pp. 36-42

Author(s):

Olga Tsymbala ◽

Julia Dorosh

Keyword(s):

Dairy Products ◽

Raw Materials ◽

Relevant Information ◽

Educational Process ◽

Practical Significance ◽

Dominant Group ◽

Current State ◽

Tourism Sector ◽

Scientific Results ◽

Selection Of

Purpose. To characterize development trends in the network of tourist-oriented farms operating in the Lviv region; to systematize information about their specialization and key areas of work; to identify and describe the main groups of farms in the region according to their priority activities; and to describe the tourist offers available to visitors and tourist groups. Methods. The literature and information sources on the topic were studied using the method of analysis; scientific systematization was applied to select the tourist-oriented farms of the Lviv region and characterize their key areas of activity; the cartographic method was used to visualize the location of the studied farms by producing a map of the Lviv region showing the settlements where tourist-oriented farms operate. Results. The role, place and significance of tourist-oriented farms for the development of rural tourism are outlined. The farms of the Lviv region that are involved in the tourism sector, focused on receiving tourists and represented in the information space are highlighted. The dominant group of tourist-oriented farms in the region consists of farms that raise cows, goats and sheep and specialize in the manufacture and sale of dairy products, especially various types of cheese. In addition, a number of cheese factories operating on purchased raw materials have been established in the region. A separate group includes honey eco-farms, berry farms, snail farms, ostrich farms, etc. Systematized data on the specifics and development of the studied farms are presented in a table, and the main products and tourist services they offer on the market are highlighted. A map of the Lviv region showing the geographic location of tourist-oriented farms was developed.
The scientific novelty of the obtained results lies in the generalization of information about the existing tourist-oriented farms of the Lviv region and the analysis of their offer on the tourist market. The practical significance lies in the systematization of relevant information about the current state of development of tourist-oriented farms in the Lviv region. The obtained scientific results can be used in designing tours in the Lviv region as well as in the educational process for training future specialists in the specialty «Tourism».


Phylogeny of the family Pasteurellaceae based on rpoB sequences

INTERNATIONAL JOURNAL OF SYSTEMATIC AND EVOLUTIONARY MICROBIOLOGY ◽

10.1099/ijs.0.03043-0 ◽

2004 ◽

Vol 54 (4) ◽

pp. 1393-1399 ◽

Cited By ~ 153

Author(s):

Bożena Korczak ◽

Henrik Christensen ◽

Stefan Emler ◽

Joachim Frey ◽

Peter Kuhnert

Keyword(s):

16S Rdna ◽

Dna Hybridization ◽

Gene Encoding ◽

Phylogenetic Studies ◽

Study Selection ◽

Gene Sequence Analysis ◽

Rdna Sequencing ◽

The Family ◽

Type Strains ◽

Selection Of

Sequences of the gene encoding the β-subunit of the RNA polymerase (rpoB) were used to delineate the phylogeny of the family Pasteurellaceae. A total of 72 strains, including the type strains of the major described species as well as selected field isolates, were included in the study. Selection of universal rpoB-derived primers for the family allowed straightforward amplification and sequencing of a 560 bp fragment of the rpoB gene. In parallel, 16S rDNA was sequenced from all strains. The phylogenetic tree obtained with the rpoB sequences reflected the major branches of the tree obtained with the 16S rDNA, especially at the genus level. Only a few discrepancies between the trees were observed. In certain cases the rpoB phylogeny was in better agreement with DNA–DNA hybridization studies than the phylogeny derived from 16S rDNA. The rpoB gene is strongly conserved within the various species of the family of Pasteurellaceae. Hence, rpoB gene sequence analysis in conjunction with 16S rDNA sequencing is a valuable tool for phylogenetic studies of the Pasteurellaceae and may also prove useful for reorganizing the current taxonomy of this bacterial family.


Detecting and correcting misclassified sequences in the large-scale public databases

Bioinformatics ◽

10.1093/bioinformatics/btaa586 ◽

2020 ◽

Vol 36 (18) ◽

pp. 4699-4705

Author(s):

Hamid Bagheri ◽

Andrew J Severin ◽

Hridesh Rajan

Keyword(s):

Large Scale ◽

Sequence Similarity ◽

Heuristic Method ◽

Simulated Data ◽

Supplementary Information ◽

Small Subset ◽

Taxonomic Assignment ◽

User Input ◽

Public Repositories ◽

Taxonomic Assignments

Abstract Motivation: As the cost of sequencing decreases, the amount of data being deposited into public repositories is increasing rapidly. Public databases rely on the user to provide metadata for each submission that is prone to user error. Unfortunately, most public databases, such as non-redundant (NR), rely on user input and do not have methods for identifying errors in the provided metadata, leading to the potential for error propagation. Previous research on a small subset of the NR database analyzed misclassification based on sequence similarity. To the best of our knowledge, the amount of misclassification in the entire database has not been quantified. We propose a heuristic method to detect potentially misclassified taxonomic assignments in the NR database. We applied a curation technique and quality control to find the most probable taxonomic assignment. Our method incorporates provenance and frequency of each annotation from manually and computationally created databases and clustering information at 95% similarity. Results: We found more than two million potentially taxonomically misclassified proteins in the NR database. Using simulated data, we show a high precision of 97% and a recall of 87% for detecting taxonomically misclassified proteins. The proposed approach and findings could also be applied to other databases. Availability and implementation: Source code, dataset, documentation, Jupyter notebooks and Docker container are available at https://github.com/boalang/nr. Supplementary information: Supplementary data are available at Bioinformatics online.
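
For readers less familiar with the two evaluation metrics quoted in this abstract, the short Python sketch below shows how precision and recall are computed from counts of true positives, false positives and false negatives. The function name and the counts in the usage note are hypothetical; they are chosen only so that the result rounds to the reported 97% and 87%, and they are not data from the paper.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for a detector of misclassified entries.

    tp: misclassified entries that were flagged
    fp: correctly classified entries that were flagged by mistake
    fn: misclassified entries that were missed
    """
    if tp + fp == 0 or tp + fn == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = tp / (tp + fp)  # share of flagged entries that are truly misclassified
    recall = tp / (tp + fn)     # share of truly misclassified entries that were flagged
    return precision, recall
```

For example, with the hypothetical counts tp=87, fp=3 and fn=13, precision is 87/90 (about 0.97) and recall is 87/100 (0.87).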


Guard Cell Microfilament Analyzer Facilitates the Analysis of the Organization and Dynamics of Actin Filaments in Arabidopsis Guard Cells

International Journal of Molecular Sciences ◽

10.3390/ijms20112753 ◽

2019 ◽

Vol 20 (11) ◽

pp. 2753

Author(s):

Xin Li ◽

Min Diao ◽

Yanan Zhang ◽

Guanlin Chen ◽

Shanjin Huang ◽

...

Keyword(s):

Actin Cytoskeleton ◽

Dynamic Behavior ◽

Guard Cell ◽

Actin Filaments ◽

De Novo ◽

Guard Cells ◽

Stomatal Movement ◽

Selection Of

The actin cytoskeleton is involved in regulating stomatal movement, which forms distinct actin arrays within guard cells of stomata with different apertures. How those actin arrays are formed and maintained remains largely unexplored. Elucidation of the dynamic behavior of differently oriented actin filaments in guard cells will enhance our understanding in this regard. Here, we initially developed a program called ‘guard cell microfilament analyzer’ (GCMA) that enables the selection of individual actin filaments and analysis of their orientations semiautomatically in guard cells. We next traced the dynamics of individual actin filaments and performed careful quantification in open and closed stomata. We found that de novo nucleation of actin filaments occurs at both dorsal and ventral sides of guard cells from open and closed stomata. Interestingly, most of the nucleated actin filaments elongate radially and longitudinally in open and closed stomata, respectively. Strikingly, radial filaments tend to form bundles whereas longitudinal filaments tend to be removed by severing and depolymerization in open stomata. By contrast, longitudinal filaments tend to form bundles that are severed less frequently in closed stomata. These observations provide insights into the formation and maintenance of distinct actin arrays in guard cells in stomata of different apertures.


A genome-based species taxonomy of the Lactobacillus Genus Complex

10.1101/537084 ◽

2019 ◽

Cited By ~ 1

Author(s):

Stijn Wittouck ◽

Sander Wuyts ◽

Conor J Meehan ◽

Vera van Noort ◽

Sarah Lebeer

Keyword(s):

De Novo ◽

Sequence Divergence ◽

Rrna Sequence ◽

16S Rrna Sequence ◽

Genome Data ◽

Current State ◽

A Genome ◽

Identity Threshold ◽

Species Taxonomy ◽

Type Strains

Abstract Background: There are over 200 published species within the Lactobacillus Genus Complex (LGC), the majority of which have sequenced type strain genomes available. Although gold standard, genome-based species delimitation cutoffs are accepted by the community, they are seldom checked against currently available genome data. In addition, there are many species-level misclassification issues within the LGC. We constructed a de novo species taxonomy for the LGC based on 2,459 publicly available, decent-quality genomes and using a 94% core nucleotide identity threshold. We reconciled these de novo species with published species and subspecies names by (i) identifying genomes of type strains in our dataset and (ii) performing comparisons based on 16S rRNA sequence identity against type strains. Results: We found that genomes within the LGC could be divided into 239 clusters (de novo species) that were discontinuous and exclusive. Comparison of these de novo species to published species led to the identification of ten sets of published species that can be merged and one species that can be split. Further, we found at least eight genome clusters that constitute new species. Finally, we were able to accurately classify 98 unclassified genomes and reclassify 74 wrongly classified genomes. Conclusions: The current state of LGC species taxonomy is largely consistent with genome data, but there are some inconsistencies as well as genome misclassifications. These inconsistencies should be resolved to evolve towards a meaningful taxonomy where species have a consistent size in terms of sequence divergence.


Epstein Barr virus epitope/MHC interaction combined with convergent recombination drive selection of diverse T cell receptor α and β repertoires

10.1101/2020.02.06.938241 ◽

2020 ◽

Author(s):

Anna Gil ◽

Larisa Kamga ◽

Ramakanth Chirravuri-Venkata ◽

Nuray Aslan ◽

Fransenio Clark ◽

...

Keyword(s):

T Cell ◽

Viral Infections ◽

De Novo ◽

Dominant Role ◽

Treatment Strategies ◽

Optimal Strategies ◽

Cd8 T Cell ◽

Human Virus ◽

Tcr Repertoire ◽

Selection Of

Abstract: Recognition modes of individual T cell receptors (TCR) are well studied, but factors driving the selection of TCR repertoires from primary through persistent human virus infections are less well understood. Using deep sequencing, we demonstrate a high degree of diversity of EBV-specific clonotypes in acute infectious mononucleosis (AIM). Only 9% of unique clonotypes detected in AIM persisted into convalescence; the majority (91%) of unique clonotypes detected in AIM were not detected in convalescence and were seemingly replaced by equally diverse “de novo” clonotypes. The persistent clonotypes had a greater probability of being generated than non-persistent ones due to convergent recombination of multiple nucleotide sequences to encode the same amino acid sequence, as well as the use of shorter CDR3 regions with fewer nucleotide additions (i.e. sequences closer to germline). Moreover, the two most immunodominant HLA-A2-restricted EBV epitopes, BRLF1(109) and BMLF1(280), show highly distinct antigen-specific public (i.e. shared between individuals) features. In fact, TCRα CDR3 motifs played a dominant role, while TCRβ played a minimal role, in the selection of the TCR repertoire to an immunodominant EBV epitope, BRLF1. This contrasts with the majority of previously reported repertoires, which appear to be selected either on TCRβ CDR3 interactions with peptide/MHC or in combination with TCRα CDR3. Understanding of how TCR/peptide/MHC complex interactions drive repertoire selection can be used to develop optimal strategies for vaccine design or generation of appropriate adoptive immunotherapies for viral infections in transplant settings or for cancer.
Importance: Several lines of evidence suggest that TCRα and β repertoires play a role in disease outcomes and treatment strategies during viral infections in transplant patients, and in cancer and autoimmune disease therapy. Our data suggest that it is essential that we understand the basic principles of how to drive optimum repertoires for both TCR chains, α and β. We address this important issue by characterizing the CD8 TCR repertoire to a common persistent human viral infection (EBV), which is controlled by appropriate CD8 T cell responses. The ultimate goal would be to determine if the individuals who are infected asymptomatically develop a different TCR repertoire than those who develop the immunopathology of AIM. Here, we begin by doing an in-depth characterization of both CD8 T cell TCRα and β repertoires to two immunodominant EBV epitopes over the course of AIM, identifying potential factors that may be driving their selection.
