Comparison of three clustering approaches for detecting novel environmental microbial diversity

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1692 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing studies is an important aspect of environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on the discovery of novel diversity, we clustered an environmental marine high-throughput sequencing dataset of protist amplicons together with reference sequences from the taxonomically curated Protist Ribosomal Reference (PR2) database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The potentially novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and in the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as potentially novel by USEARCH and Swarm were more than 97% similar to references of PR2. Using shortest path analyses on sequence similarity network OTUs and Swarm OTUs, we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. These results demonstrate that graph theory provides powerful tools for microbial ecology and the analysis of environmental high-throughput sequencing datasets. Furthermore, sequence similarity networks were most accurate in delineating novel diversity from previously discovered diversity.
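The network idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `identity` function is a toy stand-in for global pairwise alignment (it only handles equal-length sequences), the 0.9 threshold and all sequence names are hypothetical, and the hop count to the nearest reference is just one way a shortest path analysis might flag amplicons that sit far from any reference inside an OTU.

```python
from collections import deque
from itertools import combinations


def identity(a, b):
    """Toy similarity: fraction of matching positions (equal-length only).

    Assumption: stands in for the global pairwise alignment identity.
    """
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Link two sequences when their identity reaches the threshold."""
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def hops_to_nearest_reference(graph, references):
    """Multi-source BFS: edges from each node to its closest reference.

    Nodes absent from the result lie in components with no reference at all,
    i.e. candidate novel-diversity OTUs.
    """
    dist = {r: 0 for r in references}
    queue = deque(references)
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical toy data: one reference and four environmental amplicons.
seqs = {
    "ref1": "AAAAAAAAAA",
    "env1": "AAAAAAAAAC",  # one mismatch to ref1
    "env2": "AAAAAAAACC",  # one mismatch to env1, two to ref1
    "env3": "GGGGGGGGGG",
    "env4": "GGGGGGGGGT",
}
graph = build_network(seqs, threshold=0.9)
hops = hops_to_nearest_reference(graph, references=["ref1"])
novel = sorted(set(seqs) - set(hops))
```

In this toy network `env2` shares a component with `ref1` but reaches it only through `env1`, while `env3` and `env4` form a component that contains no reference.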

2015 ◽  
Author(s):  
Dominik Forster ◽  
Micah Dunthorn ◽  
Thorsten Stoeck ◽  
Frédéric Mahé

Discovery of novel diversity in high-throughput sequencing (HTS) studies is a central task in environmental microbial ecology. To evaluate the effects that amplicon clustering methods have on novel diversity discovery, we clustered an environmental marine HTS dataset of protist reads together with accessions from the taxonomically curated PR2 reference database using three de novo approaches: sequence similarity networks, USEARCH, and Swarm. The novel diversity uncovered by each clustering approach differed drastically in the number of operational taxonomic units (OTUs) and the number of environmental amplicons in these novel diversity OTUs. Global pairwise alignment comparisons revealed that numerous amplicons classified as novel by USEARCH and Swarm were actually highly similar to reference accessions. Using graph theory, we found additional novel diversity within OTUs that would have gone unnoticed without further exploiting their underlying network topologies. Our results suggest that novel diversity inferred from clustering approaches requires further validation, whereas graph theory provides a powerful tool for microbial ecology and the analysis of environmental HTS datasets.


2021 ◽  
Vol 4 ◽  
Author(s):  
Kálmán Tapolczai ◽  
François Keck ◽  
Valentin Vasselon ◽  
Géza Selmeczy ◽  
Maria Kahlert ◽  
...  

Diatom biomonitoring and ecological studies can greatly benefit from DNA metabarcoding compared to conventional microscopical analysis by potentially providing more reliable and accurate data in a cost- and time-efficient way. A conventional strategy for the bioinformatic treatment of sequencing data involves the clustering of quality-filtered sequences into Operational Taxonomic Units (OTUs) based on a global sequence similarity, and their assignment to taxonomy using a reference library. Then, the obtained species lists of the successfully assigned taxa are used for subsequent analyses or quality index calculation. However, the high diversity of bioinformatic methods and parameters makes inter-study comparison difficult, especially because OTUs are specific to a given study. Clustering sequences into OTUs aims to reduce the biasing effect of sequencing artefacts and to reach an approximate species-level delimitation at the price of potentially grouping together sequences with different ecology. A similar bias occurs when sequences that differ from each other in their ecological preference are assigned to the same taxon. The incompleteness of reference libraries can further introduce a bias by not taking into account unassigned sequences, thus losing the ecological information they possess. In order to overcome these biases, our studies tested new approaches on de novo developed diatom indices based on periphytic samples collected from streams in France and Hungary. Index development was performed with the leave-one-out cross validation (LOOCV) technique by building a model on a training dataset containing n-1 samples and testing it on the remaining test sample. Test values were correlated with a reference environmental gradient. The model was based on the calculation of optimum and tolerance of taxonomic units along the reference gradient and a modified Zelinka-Marvan diatom index equation.
Taxonomic units tested in the studies were morphospecies, OTUs (95% similarity threshold), Individual Sequence Units (ISUs, via minimal bioinformatic quality filtering) and Exact Sequence Variants (ESVs, via the DADA2 denoising algorithm). The “clustering-free” approach (ISU- and ESV-based indices) performed better than the OTU-based one, providing a fine taxonomic resolution where the ecological differences between genetically close sequence variants could be detected. Thus, these indices are better suited to standardized and comparable routine bioassessment. The “taxonomy-free” approach revealed the ecological preferences of those molecular taxonomic units (ISUs/ESVs) that otherwise either (i) would have been assigned to the same taxon due to genetic similarity, or (ii) would not have been recognized because of their absence from the reference libraries. However, we also found that taxonomic information cannot be neglected in ecological studies when the presence of organisms under particular environmental conditions is to be explained or interpreted, e.g. via the traits they possess. New types of clustering methods are welcome in the future of biomonitoring, where the delimitation of taxonomic units should be refined with a greater emphasis on their ecology rather than on morphological or genetic criteria.
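The index-building procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' model: the abstract mentions a modified Zelinka-Marvan equation without giving the modification, so the classical form (sum of abundance × optimum × weight divided by sum of abundance × weight) is used instead, the weight is assumed to be the inverse of the tolerance, and the sample data, unit names and `min_tol` floor are hypothetical.

```python
import math


def optimum_and_tolerance(abundances, gradient):
    """Abundance-weighted mean (optimum) and standard deviation (tolerance)."""
    total = sum(abundances)
    opt = sum(a * g for a, g in zip(abundances, gradient)) / total
    var = sum(a * (g - opt) ** 2 for a, g in zip(abundances, gradient)) / total
    return opt, math.sqrt(var)


def zelinka_marvan(sample, optima, weights):
    """Classical form: sum(a * s * v) / sum(a * v) over units known to the model."""
    num = den = 0.0
    for unit, abundance in sample.items():
        if unit in optima:
            num += abundance * optima[unit] * weights[unit]
            den += abundance * weights[unit]
    return num / den if den else float("nan")


def loocv_predictions(samples, gradient, min_tol=0.1):
    """Hold each sample out, fit on the other n-1, score the held-out one."""
    units = sorted({u for s in samples for u in s})
    predictions = []
    for i in range(len(samples)):
        train = [s for j, s in enumerate(samples) if j != i]
        train_grad = [g for j, g in enumerate(gradient) if j != i]
        optima, weights = {}, {}
        for u in units:
            ab = [s.get(u, 0.0) for s in train]
            if sum(ab) == 0:
                continue  # unit unseen in the training samples
            opt, tol = optimum_and_tolerance(ab, train_grad)
            optima[u] = opt
            # Assumption: narrow-tolerance units get more weight.
            weights[u] = 1.0 / max(tol, min_tol)
        predictions.append(zelinka_marvan(samples[i], optima, weights))
    return predictions


# Hypothetical toy data: read counts of two sequence variants in four samples.
samples = [
    {"esv_a": 9, "esv_b": 1},
    {"esv_a": 7, "esv_b": 3},
    {"esv_a": 3, "esv_b": 7},
    {"esv_a": 1, "esv_b": 9},
]
gradient = [1.0, 2.0, 3.0, 4.0]
predicted = loocv_predictions(samples, gradient)
```

The held-out predictions in `predicted` would then be correlated with `gradient`; four samples are far too few for that correlation to mean anything, so the toy data only shows the mechanics.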


2020 ◽  
Vol 401 (12) ◽  
pp. 1389-1405
Author(s):  
Lars-Oliver Essen ◽  
Marian Samuel Vogt ◽  
Hans-Ulrich Mösch

Selective adhesion of fungal cells to one another and to foreign surfaces is fundamental for the development of multicellular growth forms and the successful colonization of substrates and host organisms. Accordingly, fungi possess diverse cell wall-associated adhesins, mostly large glycoproteins, which present N-terminal adhesion domains at the cell surface for ligand recognition and binding. In order to function as robust adhesins, these glycoproteins must be covalently linked to the cell wall via C-terminal glycosylphosphatidylinositol (GPI) anchors by transglycosylation. In this review, we summarize the current knowledge on the structural and functional diversity of the protein families of adhesion domains characterized so far and place it in a broader context through an in-depth bioinformatics analysis using sequence similarity networks. In addition, we discuss possible mechanisms for the membrane-to-cell wall transfer of fungal adhesins by membrane-anchored Dfg5 transglycosidases.


Plants ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 753
Author(s):  
Miroslav Glasa ◽  
Richard Hančinský ◽  
Katarína Šoltys ◽  
Lukáš Predajňa ◽  
Jana Tomašechová ◽  
...  

In recent years, high-throughput sequencing (HTS) has brought new possibilities to the study of the diversity and complexity of plant viromes. Mixed infection of a single plant with several viruses is frequently observed in such studies. We analyzed the virome of 10 tomato and sweet pepper samples from Slovakia, all showing the presence of potato virus Y (PVY) infection. Most datasets allowed the determination of the nearly complete sequence of a single-variant PVY genome, belonging to one of the PVY recombinant strains (N-Wi, NTNa, or NTNb). However, in three tomato samples (T1, T40, and T62) the presence of N-type and O-type sequences spanning the same genome region was documented, indicative of mixed infections involving different PVY strain variants, which hampered the automated assembly of the PVY genomes present in the sample. The N- and O-type in silico data were further confirmed by specific RT-PCR assays targeting UTR-P1 and NIa genomic parts. Although full genomes could not be de novo assembled directly in this situation, their deep coverage by relatively long paired reads allowed their manual re-assembly using very stringent mapping parameters. These results highlight the complexity of PVY infection of some host plants and the challenges that can be met when trying to precisely identify the PVY isolates involved in mixed infection.


2017 ◽  
Vol 83 (17) ◽  
Author(s):  
Francesca De Filippis ◽  
Manolo Laiola ◽  
Giuseppe Blaiotta ◽  
Danilo Ercolini

ABSTRACT Target-gene amplicon sequencing is the most exploited high-throughput sequencing application in microbial ecology. The targets are taxonomically relevant genes, with 16S rRNA being the gold standard for bacteria. As for fungi, the most commonly used target is the internal transcribed spacer (ITS). However, the uneven ITS length among species may promote preferential amplification and sequencing and incorrect estimation of their abundance. Therefore, the use of different targets is desirable. We evaluated the use of three different target amplicons for the characterization of fungal diversity. After an in silico primer evaluation, we compared three amplicons (the ITS1-ITS2 region [ITS1-2], 18S ribosomal small subunit RNA, and the D1/D2 domain of the 26S ribosomal large subunit RNA), using biological samples and a mock community of common fungal species. All three targets allowed for accurate identification of the species present. Nevertheless, high heterogeneity in ITS1-2 length was found, and this caused an overestimation of the abundance of species with a shorter ITS, while both 18S and 26S amplicons allowed for more reliable quantification. We demonstrated that ITS1-2 amplicon sequencing, although widely used, may lead to an incorrect evaluation of fungal communities, and efforts should be made to promote the use of different targets in sequencing-based microbial ecology studies. IMPORTANCE Amplicon-sequencing approaches for fungi may rely on different targets affecting the diversity and abundance of the fungal species. An increasing number of studies will address fungal diversity by high-throughput amplicon sequencing. The description of the communities must be accurate and reliable in order to draw useful insights and to address both ecological and biological questions. 
By analyzing a mock community and several biological samples, we demonstrate that using different amplicon targets may change the results of fungal microbiota analysis, and we highlight how a careful choice of the target is fundamental for a thorough description of the fungal communities.


2020 ◽  
Author(s):  
Emily N. Junkins ◽  
Bradley S. Stevenson

Molecular techniques continue to reveal a growing disparity between the immense diversity of microbial life and the small proportion that is in pure culture. The disparity, originally dubbed “the great plate count anomaly” by Staley and Konopka, has become even more vexing given our increased understanding of the importance of microbiomes to a host and the role of microorganisms in the vital biogeochemical functions of our biosphere. Searching for novel antimicrobial drug targets often focuses on screening a broad diversity of microorganisms. If diverse microorganisms are to be screened, they need to be cultivated. Recent innovative research has used molecular techniques to assess the efficacy of cultivation efforts, providing invaluable feedback to cultivation strategies for isolating targeted and/or novel microorganisms. Here, we aimed to determine the efficiency of cultivating representative microorganisms from a non-human, mammalian microbiome, identify those microorganisms, and determine the bioactivity of isolates. Molecular methods indicated that around 57% of the ASVs detected in the original inoculum were cultivated in our experiments, but nearly 53% of the total ASVs that were present in our cultivation experiments were not detected in the original inoculum. In light of our controls, our data suggest that when molecular tools were used to characterize our cultivation efforts, they provided a more complete, albeit more complex, understanding of which organisms were present compared to what was eventually cultivated. Lastly, about 3% of the isolates collected from our cultivation experiments showed inhibitory bioactivity against a multidrug-resistant pathogen panel, further highlighting the importance of informing and directing future cultivation efforts with molecular tools. Importance: Cultivation is the definitive tool to understand a microorganism’s physiology, metabolism, and ecological role(s).
Despite continuous efforts to hone this skill, researchers are still observing yet-to-be cultivated organisms through high-throughput sequencing studies. Here, we use the very same tool that highlights biodiversity to assess cultivation efficiency. When applied to drug discovery, where screening a vast number of isolates for bioactive metabolites is common, cultivating redundant organisms is a hindrance. However, we observed that cultivating in combination with molecular tools can expand the observed diversity of an environment and its community, potentially increasing the number of microorganisms to be screened for natural products.
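The two percentages quoted in this abstract use different denominators (ASVs in the inoculum versus ASVs in the cultivation experiments), which a short sketch can make explicit. The function name, dictionary keys and ASV identifiers below are hypothetical, and the toy sets are not the study's data.

```python
def cultivation_overlap(inoculum_asvs, cultivated_asvs):
    """Return the two fractions, each relative to its own denominator."""
    inoculum, cultivated = set(inoculum_asvs), set(cultivated_asvs)
    shared = inoculum & cultivated
    return {
        # share of inoculum ASVs that also appear in cultivation
        "recovered_from_inoculum": len(shared) / len(inoculum),
        # share of cultivation ASVs never seen in the inoculum
        "new_in_cultivation": len(cultivated - inoculum) / len(cultivated),
    }


# Hypothetical toy sets of ASV identifiers.
result = cultivation_overlap(
    inoculum_asvs={"asv1", "asv2", "asv3", "asv4"},
    cultivated_asvs={"asv1", "asv2", "asv5", "asv6", "asv7"},
)
```

Because the denominators differ, the two fractions need not sum to one, so both can exceed one half at the same time.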


PLoS ONE ◽  
2017 ◽  
Vol 12 (7) ◽  
pp. e0178650
Author(s):  
Janamejaya Chowdhary ◽  
Frank E. Löffler ◽  
Jeremy C. Smith

Author(s):  
Yuansheng Liu ◽  
Xiaocai Zhang ◽  
Quan Zou ◽  
Xiangxiang Zeng

Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources needed in downstream applications. Here we develop minirmd, a de novo tool to remove duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand. Availability and implementation: https://github.com/yuansliu/minirmd. Supplementary information: Supplementary data are available at Bioinformatics online.
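As a rough illustration of minimizer-based bucketing (not minirmd's actual implementation, which is not described in this abstract), the sketch below groups reads by a strand-independent minimizer and compares only reads that share a bucket. The function names, the k-mer length of 4 and the one-mismatch tolerance are all assumptions chosen for the toy example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_minimizer(seq, k):
    """Smallest k-mer (lexicographic) over the read and its reverse complement."""
    rc = reverse_complement(seq)
    return min(s[i:i + k] for s in (seq, rc) for i in range(len(s) - k + 1))


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads, k=4, max_mismatches=1):
    """Keep the first read of each near-duplicate group, on either strand."""
    buckets = defaultdict(list)  # minimizer -> reads already kept
    kept = []
    for read in reads:
        candidates = buckets[canonical_minimizer(read, k)]
        rc = reverse_complement(read)
        duplicate = any(
            len(read) == len(other)
            and min(hamming(read, other), hamming(rc, other)) <= max_mismatches
            for other in candidates
        )
        if not duplicate:
            candidates.append(read)
            kept.append(read)
    return kept


# Hypothetical toy reads.
reads = [
    "ACGTTGCAAC",
    "ACGTTGCAAT",  # one mismatch to the first read
    "GTTGCAACGT",  # reverse complement of the first read
    "GGGGCCCCGG",  # unrelated read
]
unique_reads = remove_near_duplicates(reads)
```

A single round like this misses near-duplicates whose mismatch falls inside the minimizer itself, since the two reads then land in different buckets; repeating the pass with other minimizer lengths, as the abstract describes, is one plausible way to catch such pairs.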

