Unsupervised Two-Way Clustering of Metagenomic Sequences

2012 ◽  
Vol 2012 ◽  
pp. 1-11 ◽  
Author(s):  
Shruthi Prabhakara ◽  
Raj Acharya

A major challenge facing metagenomics is the development of tools for characterizing the functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length, and the relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer, rare words. We employ either a mixture of Gaussians or a mixture of Poissons to model reads within each bin. Further, we handle the high dimensionality and sparsity of the data by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bp and is robust to varying abundances, divergences, and read lengths.
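The mixture-of-Poissons idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each read is reduced to a vector of k-mer (word) counts, treats the counts as independent Poissons within each species (the naive Bayes assumption), and fits the mixture by plain EM. It leaves out the Gaussian variant and the word-grouping step that makes the published model "two-way". The function names `kmer_counts`, `farthest_first`, and `poisson_mixture_em`, the farthest-first initialisation, and the small smoothing constants are all hypothetical choices made for this example.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(read: str, k: int) -> list[int]:
    """Count every k-mer over ACGT in a read, in fixed lexicographic order."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def farthest_first(X, n_clusters):
    """Pick seed reads: read 0, then repeatedly the read farthest from all seeds."""
    centers = [0]
    while len(centers) < n_clusters:
        def gap(i):
            return min(sum((a - b) ** 2 for a, b in zip(X[i], X[c])) for c in centers)
        centers.append(max(range(len(X)), key=gap))
    return centers


def poisson_mixture_em(X, n_clusters, n_iter=50):
    """Fit a naive Bayes mixture of Poissons by EM; return (labels, weights)."""
    n, d = len(X), len(X[0])
    # Assumed initialisation: seed each cluster's rates from one read (+0.5 pseudocount).
    rates = [[x + 0.5 for x in X[i]] for i in farthest_first(X, n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities per read, computed in log space.
        # The log(x!) term is the same for every cluster, so it is omitted.
        resp = []
        for x in X:
            logp = [
                math.log(weights[c])
                + sum(xj * math.log(rates[c][j]) - rates[c][j] for j, xj in enumerate(x))
                for c in range(n_clusters)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: mixing weights estimate relative abundance; rates are weighted means.
        for c in range(n_clusters):
            mass = sum(r[c] for r in resp)
            weights[c] = (mass + 1e-6) / (n + n_clusters * 1e-6)
            rates[c] = [
                (sum(resp[i][c] * X[i][j] for i in range(n)) + 1e-3) / (mass + 1e-3)
                for j in range(d)
            ]
    labels = [max(range(n_clusters), key=lambda c: r[c]) for r in resp]
    return labels, weights
```

Calling `poisson_mixture_em([kmer_counts(r, 2) for r in reads], n_clusters=2)` on a list of read strings returns one cluster label per read and the estimated mixing weights, which play the role of the per-species abundance estimate in this simplified setting. The number of clusters is supplied by the caller here; how the published method chooses it is not stated in the abstract.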

2021 ◽  
Author(s):  
Polina Suter ◽  
Eva Dazert ◽  
Jack Kuipers ◽  
Charlotte K.Y. Ng ◽  
Tuyana Boldanova ◽  
...  

Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.


2020 ◽  
Vol 21 (6) ◽  
pp. 2096
Author(s):  
Przemyslaw Decewicz ◽  
Piotr Golec ◽  
Mateusz Szymczak ◽  
Monika Radlinska ◽  
Lukasz Dziewit

The Ochrobactrum genus comprises an extensive repertoire of biotechnologically valuable bacterial strains but also opportunistic pathogens. In our previous study, a novel strain, Ochrobactrum sp. POC9, which enhances biogas production in wastewater treatment plants (WWTPs), was identified and thoroughly characterized. Despite an insightful analysis of that bacterium, its susceptibility to bacteriophages present in WWTPs has not been evaluated. Using a raw sewage sample from a WWTP and applying the enrichment method, two virulent phages, vB_OspM_OC and vB_OspP_OH, which infect the POC9 strain, were isolated. These are the first virulent phages infecting Ochrobactrum spp. identified so far. Both phages were subjected to thorough functional and genomic analyses, which allowed classification of the vB_OspM_OC virus as a novel jumbo phage, with a genome size of over 227 kb. This phage encodes a DNA methyltransferase that mimics the specificity of the cell cycle-regulated CcrM methylase, a component of the epigenetic regulatory circuits in Alphaproteobacteria. In this study, an analysis of the overall diversity of Ochrobactrum-specific (pro)phages retrieved from databases and extracted in silico from bacterial genomes was also performed. Complex genome mining allowed us to build similarity networks to compare 281 Ochrobactrum-specific viruses. Analyses of the obtained networks revealed a high diversity of Ochrobactrum phages and their dissimilarity to the viruses infecting other bacteria.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Shengquan Chen ◽  
Guanao Yan ◽  
Wenyu Zhang ◽  
Jinzhao Li ◽  
Rui Jiang ◽  
...  

The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.


2016 ◽  
Vol 2016 ◽  
pp. 1-10
Author(s):  
Yunjie Chen ◽  
Tianming Zhan ◽  
Ji Zhang ◽  
Hongyuan Wang

We propose a novel segmentation method based on regional and nonlocal information to overcome the impact of image intensity inhomogeneities and noise in human brain magnetic resonance images. By taking into account the spatial distribution of different tissues in brain images, our method does not need preestimation or precorrection procedures for intensity inhomogeneities and noise. A nonlocal information based Gaussian mixture model (NGMM) is proposed to reduce the effect of noise. To reduce the effect of intensity inhomogeneity, the multigrid nonlocal Gaussian mixture model (MNGMM) is proposed to segment brain MR images in each nonoverlapping multigrid generated by using a new multigrid generation method. Therefore, the proposed model can simultaneously overcome the impact of noise and intensity inhomogeneity and automatically classify 2D and 3D MR data into white matter, gray matter, and cerebrospinal fluid. To maintain the statistical reliability and spatial continuity of the segmentation, a fusion strategy is adopted to integrate the clustering results from different grids. Experiments on synthetic and clinical brain MR images demonstrate the superior performance of the proposed model compared with several state-of-the-art algorithms.


2006 ◽  
Vol 57 (3) ◽  
pp. 452-469 ◽  
Author(s):  
Hélène Moussard ◽  
David Moreira ◽  
Marie-Anne Cambon-Bonavita ◽  
Purificación López-García ◽  
Christian Jeanthon

PLoS ONE ◽  
2016 ◽  
Vol 11 (2) ◽  
pp. e0149564 ◽  
Author(s):  
Sandrine Louis ◽  
Rewati-Mukund Tappu ◽  
Antje Damms-Machado ◽  
Daniel H. Huson ◽  
Stephan C. Bischoff

2006 ◽  
Vol 394 (3) ◽  
pp. 575-579 ◽  
Author(s):  
Sergey V. Novoselov ◽  
Deame Hua ◽  
Alexey V. Lobanov ◽  
Vadim N. Gladyshev

Sec (selenocysteine) is a rare amino acid in proteins. It is co-translationally inserted into proteins at UGA codons with the help of SECIS (Sec insertion sequence) elements. A full set of selenoproteins within a genome, known as the selenoproteome, is highly variable in different organisms. However, most of the known eukaryotic selenoproteins are represented in the mammalian selenoproteome. In addition, many of these selenoproteins have cysteine orthologues. Here, we describe a new selenoprotein, designated Fep15, which is distantly related to members of the 15 kDa selenoprotein (Sep15) family. Fep15 is absent in mammals, can be detected only in fish and is present in these organisms only in the selenoprotein form. In contrast with other members of the Sep15 family, which contain a putative active site composed of Sec and cysteine, Fep15 has only Sec. When transiently expressed in mammalian cells, Fep15 incorporated Sec in an SECIS- and SBP2 (SECIS-binding protein 2)-dependent manner and was targeted to the endoplasmic reticulum by its N-terminal signal peptide. Phylogenetic analyses of Sep15 family members suggest that Fep15 evolved by gene duplication.


2010 ◽  
Vol 101 (2) ◽  
pp. 491-500 ◽  
Author(s):  
Thomas F. Ducey ◽  
Matias B. Vanotti ◽  
Anthony D. Shriner ◽  
Ariel A. Szogi ◽  
Aprel Q. Ellison
