BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3035 ◽  
Author(s):  
Elaina D. Graham ◽  
John F. Heidelberg ◽  
Benjamin J. Tully

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low-coverage/abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP) to cluster assemblies using coverage, with composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition- and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index than five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high-completion, low-redundancy bins corresponding with the published metagenome-assembled genomes.
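A minimal sketch of the coverage-based clustering step described above, using scikit-learn's AffinityPropagation on a synthetic contig-coverage matrix. The coverage profiles, the `preference` value, and the cluster count are illustrative assumptions, not BinSanity's actual data or defaults.

```python
# Toy illustration: cluster contigs by their coverage profiles across
# samples with affinity propagation, as the coverage-based step does.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)

# 60 contigs from 3 source genomes; each genome has a distinct coverage
# profile across 4 samples (rows = contigs, columns = samples).
profiles = np.array([[50.0, 5.0, 1.0, 2.0],
                     [2.0, 40.0, 30.0, 1.0],
                     [5.0, 5.0, 5.0, 60.0]])
coverage = np.vstack([p + rng.normal(0, 1.0, size=(20, 4)) for p in profiles])

# Affinity propagation picks exemplars automatically; the "preference"
# parameter steers how many clusters emerge (lower -> fewer clusters).
ap = AffinityPropagation(preference=-200, random_state=0).fit(coverage)
labels = ap.labels_
print(len(set(labels)))  # number of recovered bins
```

In practice the preference would be tuned to the coverage scale of the assembly; affinity propagation's appeal here is that the number of bins is not fixed in advance.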

2016 ◽  
Author(s):  
Elaina Graham ◽  
John Heidelberg ◽  
Benjamin Tully

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low-coverage/abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP) to cluster assemblies using coverage alone, removing potential composition-based biases in clustering contigs, though it requires a minimum of two samples. To increase fidelity, a refinement script was developed that uses composition data (tetranucleotide frequency and %G+C content) to refine bins containing multiple source organisms. This separation of composition- and coverage-based signatures reduces clustering bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that this implementation of AP led to higher precision, recall, and Adjusted Rand Index than five commonly implemented methods. When tested on a previously published infant gut metagenome, BinSanity generated high-completion, low-redundancy bins corresponding with the published metagenome-assembled genomes.


2013 ◽  
Vol 300-301 ◽  
pp. 1058-1061
Author(s):  
Tong He

By extending the classical spectral clustering algorithm, a new clustering algorithm for uncertain objects is proposed in this paper. In the algorithm, each uncertain object is represented as a Gaussian mixture model, and Kullback-Leibler divergence and Bayesian probability are used as similarity measures between Gaussian mixture models. In an extensive experimental evaluation, we not only show the effectiveness and efficiency of the new algorithm but also compare it with the CLARANS algorithm for uncertain objects.
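A hedged sketch of the similarity measure described above: the KL divergence between Gaussians, symmetrized and converted into an affinity for spectral clustering. KL divergence between full mixtures has no closed form and is usually approximated, so this sketch models each uncertain object as a single Gaussian; the data and the affinity kernel are illustrative assumptions, not the paper's setup.

```python
# Closed-form KL divergence between multivariate Gaussians, used to build
# a precomputed affinity matrix for spectral clustering of "uncertain
# objects" represented as distributions.
import numpy as np
from sklearn.cluster import SpectralClustering

def kl_gaussian(m0, S0, m1, S1):
    """Closed-form KL( N(m0,S0) || N(m1,S1) ) for d-dimensional Gaussians."""
    d = len(m0)
    S1_inv = np.linalg.inv(S1)
    diff = m1 - m0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

rng = np.random.default_rng(1)
# Toy uncertain objects: Gaussians centred near (0,0) or (10,10).
means = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
                   rng.normal(10, 0.5, size=(10, 2))])
covs = [np.eye(2) for _ in means]

n = len(means)
affinity = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # Symmetrize KL (it is not symmetric) before building the affinity.
        skl = (kl_gaussian(means[i], covs[i], means[j], covs[j])
               + kl_gaussian(means[j], covs[j], means[i], covs[i]))
        affinity[i, j] = np.exp(-0.5 * skl)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
```

The exponential kernel over symmetrized KL is one common way to turn a divergence into the non-negative affinity that spectral clustering requires.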


2021 ◽  
Author(s):  
John B. Lemos ◽  
Matheus R. S. Barbosa ◽  
Edric B. Troccoli ◽  
Alexsandro G. Cerqueira

This work aims to delimit the Direct Hydrocarbon Indicator (DHI) zones using the Gaussian Mixture Models (GMM) algorithm, an unsupervised machine learning method, over the FS8 seismic horizon in the seismic data of the Dutch F3 Field. The dataset used to perform the cluster analysis was extracted from the 3D seismic dataset. It comprises the following seismic attributes: Sweetness, Spectral Decomposition, Acoustic Impedance, Coherence, and Instantaneous Amplitude. The Principal Component Analysis (PCA) algorithm was applied to the original dataset for dimensionality reduction and noise filtering, and we chose the first three principal components as the input of the clustering algorithm. The cluster analysis using the Gaussian Mixture Models was performed by varying the number of groups from 2 to 20. The Elbow Method suggested a smaller number of groups than needed to isolate the DHI zones; instead, we observed that four is the optimal number of clusters to highlight this seismic feature. Furthermore, it was possible to interpret other clusters related to the lithology through geophysical well log data.
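A minimal sketch of the workflow described above: PCA for dimensionality reduction, followed by Gaussian mixtures fitted over a range of cluster counts. The synthetic attribute matrix stands in for the real seismic attributes, and BIC is used here as one common alternative to the elbow heuristic for choosing the cluster count.

```python
# PCA -> GMM pipeline on a toy stand-in for seismic attribute data:
# 4 underlying facies, 5 correlated attributes per horizon sample.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
centers = rng.normal(0, 5, size=(4, 5))
X = np.vstack([c + rng.normal(0, 0.4, size=(200, 5)) for c in centers])

# Keep the first three principal components, as in the study.
X3 = PCA(n_components=3).fit_transform(X)

# Fit GMMs for k = 2..8 and pick the k minimizing BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X3).bic(X3)
        for k in range(2, 9)}
best_k = min(bics, key=bics.get)
```

On real attribute data the BIC curve is rarely this clean, which is consistent with the authors' observation that the elbow criterion alone under-segmented the DHI zones.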


2018 ◽  
Vol 5 (2) ◽  
pp. 83-100
Author(s):  
María Dolores Luquín-García ◽  
Edith Cecilia Macedo Ruíz ◽  
Omar Rojas-Altamirano ◽  
Carlos López-Hernández

The aim of this article is to determine the socioeconomic level (SEL) at the disaggregation of the Basic Statistical Area (BSA) in the Mexican Republic. The methodology used is the one established by the Mexican Association of Market Research Agencies (AMAI) along with the National Institute of Statistics and Geography (INEGI). The clustering of the BSAs was carried out according to variables contained in the Population and Housing Census of 2010, through Gaussian mixture models and learning neural networks, and finally by defining the labels corresponding to each SEL. We found a representative SEL for each BSA. In addition, the definition of each socioeconomic level shows good results, with an average of 90.86% correctly labeled elements.
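A hedged sketch of the clustering-then-labeling step described above: fit a Gaussian mixture to census-style variables, assign each cluster the majority reference label, and measure the share of correctly labeled units. The data, the three-level scheme, and the variables are illustrative assumptions, not the AMAI/INEGI variables or levels.

```python
# Cluster toy BSA feature vectors with a GMM, then map clusters to
# socioeconomic labels by majority vote and score the labeling.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Toy BSAs: 3 socioeconomic levels, 4 census-style variables each.
levels = np.array([[0.2, 0.1, 0.3, 0.2],
                   [0.5, 0.5, 0.5, 0.5],
                   [0.9, 0.8, 0.7, 0.9]])
X = np.vstack([lv + rng.normal(0, 0.05, size=(100, 4)) for lv in levels])
true = np.repeat([0, 1, 2], 100)

clusters = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Majority-vote mapping from cluster id to SEL label.
mapping = {c: np.bincount(true[clusters == c]).argmax()
           for c in np.unique(clusters)}
pred = np.array([mapping[c] for c in clusters])
accuracy = (pred == true).mean()
```

Because GMM cluster ids are arbitrary, some mapping from clusters to reference labels (majority vote here) is needed before a "percent correctly labeled" figure like the 90.86% above can be computed.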


2012 ◽  
Vol 45 (11) ◽  
pp. 3950-3961 ◽  
Author(s):  
Miin-Shen Yang ◽  
Chien-Yo Lai ◽  
Chih-Ying Lin
