Unsupervised Two-Way Clustering of Metagenomic Sequences

Journal of Biomedicine and Biotechnology ◽

10.1155/2012/153647 ◽

2012 ◽

Vol 2012 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Shruthi Prabhakara ◽

Raj Acharya

Keyword(s):

Microbial Community ◽

Mixture Model ◽

High Dimensionality ◽

Read Length ◽

Clustering Methods ◽

Mixture Of Gaussians ◽

Proposed Model ◽

A Genome ◽

Relative Abundances

A major challenge facing metagenomics is the development of tools for the characterization of functional and taxonomic content of vast amounts of short metagenome reads. The efficacy of clustering methods depends on the number of reads in the dataset, the read length, and the relative abundances of source genomes in the microbial community. In this paper, we formulate an unsupervised naive Bayes multispecies, multidimensional mixture model for reads from a metagenome. We use the proposed model to cluster metagenomic reads by their species of origin and to characterize the abundance of each species. We model the distribution of word counts along a genome as a Gaussian for shorter, frequent words and as a Poisson for longer words that are rare. We employ either a mixture of Gaussians or a mixture of Poissons to model reads within each bin. Further, we handle the high dimensionality and sparsity of the data by grouping the set of words comprising the reads, resulting in a two-way mixture model. Finally, we demonstrate the accuracy and applicability of this method on simulated and real metagenomes. Our method can accurately cluster reads as short as 100 bp and is robust to varying abundances, divergences and read lengths.
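
The core idea in this abstract, clustering reads by their word (k-mer) counts with a mixture of Poissons, can be illustrated with a small sketch. This is not the authors' implementation: it omits the Gaussian variant and the two-way grouping of words, and the function names `kmer_counts` and `poisson_mixture_em`, the word length `k=2`, the smoothing constant `eps`, and the deterministic initialisation are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(read, k=2):
    """Count overlapping k-mers ("words") in a read, in fixed lexicographic order."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    return [counts[w] for w in vocab]


def poisson_mixture_em(X, n_clusters, n_iter=50, eps=1e-3):
    """Fit a naive Bayes mixture of Poissons by EM (illustrative sketch).

    Returns (labels, weights): a hard cluster label per read and the mixing
    weights, which stand in for relative abundances in this sketch.
    """
    n, d = len(X), len(X[0])
    # Assumed deterministic initialisation: seed each cluster from one read.
    rates = [[X[(c * n) // n_clusters][j] + 0.5 for j in range(d)]
             for c in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each read.
        # The log(x!) term is identical across clusters, so it is omitted.
        resp = []
        for x in X:
            logp = [math.log(weights[c])
                    + sum(x[j] * math.log(rates[c][j]) - rates[c][j]
                          for j in range(d))
                    for c in range(n_clusters)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate mixing weights and per-word Poisson rates.
        for c in range(n_clusters):
            nc = sum(r[c] for r in resp)
            weights[c] = max(nc / n, 1e-12)
            # eps keeps every rate strictly positive so its log is defined.
            rates[c] = [(sum(resp[i][c] * X[i][j] for i in range(n)) + eps)
                        / (nc + eps)
                        for j in range(d)]
    labels = [max(range(n_clusters), key=lambda c: r[c]) for r in resp]
    return labels, weights
```

Calling `poisson_mixture_em([kmer_counts(r) for r in reads], n_clusters=2)` on a list of read strings returns one label per read plus the estimated mixing weights. Real metagenomic data would need longer words, many more reads, and a more careful initialisation than this toy setup.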


Multi-omics subtyping of hepatocellular carcinoma patients using a Bayesian network mixture model

10.1101/2021.12.16.473083 ◽

2021 ◽

Author(s):

Polina Suter ◽

Eva Dazert ◽

Jack Kuipers ◽

Charlotte K.Y. Ng ◽

Tuyana Boldanova ◽

...

Keyword(s):

Hepatocellular Carcinoma ◽

Bayesian Network ◽

Molecular Characterization ◽

Mixture Model ◽

Clinical Stage ◽

Molecular Characteristics ◽

Clustering Methods ◽

Cancer Subtypes ◽

Molecular Phenotypes

Comprehensive molecular characterization of cancer subtypes is essential for predicting clinical outcomes and searching for personalized treatments. We present bnClustOmics, a statistical model and computational tool for multi-omics unsupervised clustering, which serves a dual purpose: Clustering patient samples based on a Bayesian network mixture model and learning the networks of omics variables representing these clusters. The discovered networks encode interactions among all omics variables and provide a molecular characterization of each patient subgroup. We conducted simulation studies that demonstrated the advantages of our approach compared to other clustering methods in the case where the generative model is a mixture of Bayesian networks. We applied bnClustOmics to a hepatocellular carcinoma (HCC) dataset comprising genome (mutation and copy number), transcriptome, proteome, and phosphoproteome data. We identified three main HCC subtypes together with molecular characteristics, some of which are associated with survival even when adjusting for the clinical stage. Cluster-specific networks shed light on the links between genotypes and molecular phenotypes of samples within their respective clusters and suggest targets for personalized treatments.


Faculty Opinions recommendation of Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1112029.567996 ◽

2008 ◽

Author(s):

John Pemberton ◽

Stefan Holubar

Keyword(s):

Microbial Community ◽

Inflammatory Bowel Diseases ◽

Phylogenetic Characterization ◽

Bowel Diseases ◽

Molecular Phylogenetic ◽

Inflammatory Bowel


Identification and Characterization of the First Virulent Phages, Including a Novel Jumbo Virus, Infecting Ochrobactrum spp.

International Journal of Molecular Sciences ◽

10.3390/ijms21062096 ◽

2020 ◽

Vol 21 (6) ◽

pp. 2096

Author(s):

Przemyslaw Decewicz ◽

Piotr Golec ◽

Mateusz Szymczak ◽

Monika Radlinska ◽

Lukasz Dziewit

Keyword(s):

Dna Methyltransferase ◽

Wastewater Treatment Plants ◽

Biogas Production ◽

Genome Mining ◽

Sewage Sample ◽

Bacterial Strains ◽

Regulatory Circuits ◽

A Genome ◽

Insightful Analysis

The Ochrobactrum genus consists of an extensive repertoire of biotechnologically valuable bacterial strains but also opportunistic pathogens. In our previous study, a novel strain, Ochrobactrum sp. POC9, which enhances biogas production in wastewater treatment plants (WWTPs), was identified and thoroughly characterized. Despite an insightful analysis of that bacterium, its susceptibility to bacteriophages present in WWTPs has not been evaluated. Using a raw sewage sample from a WWTP and applying the enrichment method, two virulent phages, vB_OspM_OC and vB_OspP_OH, which infect the POC9 strain, were isolated. These are the first virulent phages infecting Ochrobactrum spp. identified so far. Both phages were subjected to thorough functional and genomic analyses, which allowed classification of the vB_OspM_OC virus as a novel jumbo phage, with a genome size of over 227 kb. This phage encodes a DNA methyltransferase that mimics the specificity of the cell cycle regulated CcrM methylase, a component of the epigenetic regulatory circuits in Alphaproteobacteria. In this study, an analysis of the overall diversity of Ochrobactrum-specific (pro)phages retrieved from databases and extracted in silico from bacterial genomes was also performed. Complex genome mining allowed us to build similarity networks to compare 281 Ochrobactrum-specific viruses. Analyses of the obtained networks revealed a high diversity of Ochrobactrum phages and their dissimilarity to the viruses infecting other bacteria.


RA3 is a reference-guided approach for epigenetic characterization of single cells

Nature Communications ◽

10.1038/s41467-021-22495-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Shengquan Chen ◽

Guanao Yan ◽

Wenyu Zhang ◽

Jinzhao Li ◽

Rui Jiang ◽

...

Keyword(s):

Single Cell ◽

Computational Analysis ◽

Reference Data ◽

Single Cells ◽

Chromatin Accessibility ◽

Biological Variation ◽

Superior Performance ◽

High Dimensionality ◽

High Degree

The recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling the epigenetic landscapes for thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation among scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.


Multigrid Nonlocal Gaussian Mixture Model for Segmentation of Brain Tissues in Magnetic Resonance Images

BioMed Research International ◽

10.1155/2016/6727290 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10

Author(s):

Yunjie Chen ◽

Tianming Zhan ◽

Ji Zhang ◽

Hongyuan Wang

Keyword(s):

Magnetic Resonance ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Gaussian Mixture ◽

Magnetic Resonance Images ◽

Intensity Inhomogeneity ◽

Mr Images ◽

Proposed Model ◽

Brain Mr Images ◽

The Impact

We propose a novel segmentation method based on regional and nonlocal information to overcome the impact of image intensity inhomogeneities and noise in human brain magnetic resonance images. By taking into account the spatial distribution of different tissues in brain images, our method does not need preestimation or precorrection procedures for intensity inhomogeneities and noise. A nonlocal-information-based Gaussian mixture model (NGMM) is proposed to reduce the effect of noise. To reduce the effect of intensity inhomogeneity, the multigrid nonlocal Gaussian mixture model (MNGMM) is proposed to segment brain MR images in each nonoverlapping multigrid generated by a new multigrid generation method. The proposed model can therefore simultaneously overcome the impact of noise and intensity inhomogeneity and automatically classify 2D and 3D MR data into white matter, gray matter, and cerebrospinal fluid. To maintain the statistical reliability and spatial continuity of the segmentation, a fusion strategy is adopted to integrate the clustering results from different grids. Experiments on synthetic and clinical brain MR images demonstrate the superior performance of the proposed model compared with several state-of-the-art algorithms.
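
As background for this abstract, the plain Gaussian mixture model that the NGMM and MNGMM build on can be sketched for scalar intensities. The sketch below does not include the nonlocal weighting, the multigrid generation, or the fusion strategy described by the authors. The function name `gmm_1d_em`, the evenly spaced initial means, and the variance floor are assumptions for illustration only.

```python
import math


def gmm_1d_em(values, n_classes=3, n_iter=100, var_floor=1e-6):
    """Fit a 1D Gaussian mixture to intensities by EM (illustrative sketch).

    Returns (labels, means): a hard class label per value and the fitted
    class means.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    # Assumed initialisation: evenly spaced means, equal weights and variances.
    means = [lo + (c + 0.5) * span / n_classes for c in range(n_classes)]
    variances = [(span / (2 * n_classes)) ** 2] * n_classes
    weights = [1.0 / n_classes] * n_classes
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each class for each intensity value.
        resp = []
        for v in values:
            logp = [math.log(weights[c])
                    - 0.5 * math.log(2 * math.pi * variances[c])
                    - (v - means[c]) ** 2 / (2 * variances[c])
                    for c in range(n_classes)]
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            total = sum(p)
            resp.append([x / total for x in p])
        # M-step: update weight, mean, and variance of each class.
        for c in range(n_classes):
            nc = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = nc / n
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / nc
            variances[c] = max(
                sum(r[c] * (v - means[c]) ** 2
                    for r, v in zip(resp, values)) / nc,
                var_floor)
    labels = [max(range(n_classes), key=lambda c: r[c]) for r in resp]
    return labels, means
```

With `n_classes=3`, the three components loosely correspond to the three tissue classes named in the abstract. Because each intensity is treated independently, this baseline has none of the robustness to noise and intensity inhomogeneity that the proposed model targets.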


Uncultured Archaea in a hydrothermal microbial assemblage: phylogenetic diversity and characterization of a genome fragment from a euryarchaeote

FEMS Microbiology Ecology ◽

10.1111/j.1574-6941.2006.00128.x ◽

2006 ◽

Vol 57 (3) ◽

pp. 452-469 ◽

Cited By ~ 14

Author(s):

Hélène Moussard ◽

David Moreira ◽

Marie-Anne Cambon-Bonavita ◽

Purificación López-García ◽

Christian Jeanthon

Keyword(s):

Phylogenetic Diversity ◽

Genome Fragment ◽

Microbial Assemblage ◽

A Genome ◽

Uncultured Archaea


Characterization of the Microbial Community in a Partial Nitrifying Sequencing Batch Biofilm Reactor

Current Microbiology ◽

10.1007/s00284-011-0019-x ◽

2011 ◽

Vol 63 (6) ◽

pp. 543-550 ◽

Cited By ~ 10

Author(s):

Taotao Zeng ◽

Dong Li ◽

Jie Zhang

Keyword(s):

Microbial Community ◽

Biofilm Reactor ◽

Sequencing Batch Biofilm Reactor


Characterization of the Gut Microbial Community of Obese Patients Following a Weight-Loss Intervention Using Whole Metagenome Shotgun Sequencing

PLoS ONE ◽

10.1371/journal.pone.0149564 ◽

2016 ◽

Vol 11 (2) ◽

pp. e0149564 ◽

Cited By ~ 90

Author(s):

Sandrine Louis ◽

Rewati-Mukund Tappu ◽

Antje Damms-Machado ◽

Daniel H. Huson ◽

Stephan C. Bischoff

Keyword(s):

Weight Loss ◽

Microbial Community ◽

Shotgun Sequencing ◽

Obese Patients ◽

Weight Loss Intervention ◽

Gut Microbial Community


Identification and characterization of Fep15, a new selenocysteine-containing member of the Sep15 protein family

Biochemical Journal ◽

10.1042/bj20051569 ◽

2006 ◽

Vol 394 (3) ◽

pp. 575-579 ◽

Cited By ~ 28

Author(s):

Sergey V. Novoselov ◽

Deame Hua ◽

Alexey V. Lobanov ◽

Vadim N. Gladyshev

Keyword(s):

Mammalian Cells ◽

Insertion Sequence ◽

Phylogenetic Analyses ◽

Dependent Manner ◽

Putative Active Site ◽

A Genome ◽

Sequence Elements ◽

Identification And Characterization ◽

Insertion Sequence Elements

Sec (selenocysteine) is a rare amino acid in proteins. It is co-translationally inserted into proteins at UGA codons with the help of SECIS (Sec insertion sequence) elements. A full set of selenoproteins within a genome, known as the selenoproteome, is highly variable in different organisms. However, most of the known eukaryotic selenoproteins are represented in the mammalian selenoproteome. In addition, many of these selenoproteins have cysteine orthologues. Here, we describe a new selenoprotein, designated Fep15, which is distantly related to members of the 15 kDa selenoprotein (Sep15) family. Fep15 is absent in mammals, can be detected only in fish and is present in these organisms only in the selenoprotein form. In contrast with other members of the Sep15 family, which contain a putative active site composed of Sec and cysteine, Fep15 has only Sec. When transiently expressed in mammalian cells, Fep15 incorporated Sec in an SECIS- and SBP2 (SECIS-binding protein 2)-dependent manner and was targeted to the endoplasmic reticulum by its N-terminal signal peptide. Phylogenetic analyses of Sep15 family members suggest that Fep15 evolved by gene duplication.


Characterization of a microbial community capable of nitrification at cold temperature

Bioresource Technology ◽

10.1016/j.biortech.2009.07.091 ◽

2010 ◽

Vol 101 (2) ◽

pp. 491-500 ◽

Cited By ~ 55

Author(s):

Thomas F. Ducey ◽

Matias B. Vanotti ◽

Anthony D. Shriner ◽

Ariel A. Szogi ◽

Aprel Q. Ellison

Keyword(s):

Microbial Community ◽

Cold Temperature
