Gaussian Mixture Models and Probabilistic Decision-Based Neural Networks for Pattern Classification: A Comparative Study

1999 ◽  
Vol 8 (3) ◽  
pp. 235-245 ◽  
Author(s):  
K. K. Yiu ◽  
M. W. Mak ◽  
C. K. Li


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3035 ◽  
Author(s):  
Elaina D. Graham ◽  
John F. Heidelberg ◽  
Benjamin J. Tully

Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low-coverage/low-abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that uses the affinity propagation (AP) clustering algorithm to cluster assemblies by coverage, followed by composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition-based and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index than five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high-completion, low-redundancy bins corresponding to the published metagenome-assembled genomes.
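
The two compositional features named in this abstract, percent GC content and tetranucleotide frequency, can be illustrated with a minimal sketch. This is not BinSanity's own code: the function names `gc_percent` and `tetranucleotide_frequencies`, the example contig, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGT"


def gc_percent(seq: str) -> float:
    """Percent of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in BASES)
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each of the 256 possible 4-mers.

    Uses overlapping windows and skips any window that contains a
    character other than A, C, G, or T (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set(BASES)
    )
    total = sum(counts.values())
    kmers = ("".join(p) for p in product(BASES, repeat=4))
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Hypothetical contig, made up for this example.
contig = "ATGCGCGATATGCGC"
profile = tetranucleotide_frequencies(contig)
print(gc_percent(contig), len(profile), profile["ATGC"])
```

Each contig then becomes a fixed-length vector (256 frequencies plus one GC value) that a clustering algorithm can compare across contigs, independently of coverage.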


2020 ◽  
Vol 25 (4) ◽  
pp. 622-641
Author(s):  
Jurgita Arnastauskaitė ◽  
Tomas Ruzgas

Flexible and reliable probability density estimation is fundamental in unsupervised learning and classification. Finite Gaussian mixture models are commonly used for this purpose. However, the parametric form of the distribution is not always known. In this case, non-parametric density estimation methods are used. Usually, these methods become computationally demanding as the number of components increases. In this paper, a comparative study of the accuracy of some non-parametric density estimators is made by means of simulation. The following approaches have been considered: an adaptive bandwidth kernel estimator, a projection pursuit estimator, a logspline estimator, and a k-nearest neighbor estimator. It was concluded that data clustering as a pre-processing step improves the estimation of mixture densities. However, when the data do not have clearly defined clusters, the pre-processing step offers little advantage. The application of density estimators is illustrated using municipal solid waste data collected in Kaunas (Lithuania). The data distribution is similar (i.e., with kurtotic unimodal density) to the benchmark distribution introduced by Marron and Wand. Based on the homogeneity tests, it can be concluded that the distributions of the municipal solid waste fractions in Kutaisi (Georgia), Saint-Petersburg (Russia), and Boryspil (Ukraine) do not differ statistically from the distribution of waste fractions in Kaunas. The distribution of waste data collected in Kaunas (Lithuania) follows the general observations introduced by Marron and Wand (i.e., has one mode and certain kurtosis).
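
Of the estimators listed in this abstract, the k-nearest neighbor estimator is the simplest to show. The sketch below is a one-dimensional version under a common textbook convention, f(x) ≈ k / (2 n d_k(x)), where d_k(x) is the distance from x to its k-th nearest sample point. The paper's exact variant is not given in the abstract, so the formula choice, the function name `knn_density`, and the sample values are assumptions.

```python
from typing import Sequence


def knn_density(x: float, sample: Sequence[float], k: int) -> float:
    """One-dimensional k-nearest-neighbor density estimate at x.

    Implements f(x) = k / (2 * n * d_k), where d_k is the distance from x
    to its k-th nearest sample point and 2 * d_k is the length of the
    interval that just contains those k points.
    """
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    distances = sorted(abs(x - xi) for xi in sample)
    radius = distances[k - 1]
    if radius == 0.0:
        # The estimate is unbounded when the k-th neighbor coincides with x.
        raise ValueError("k-th neighbor is at distance 0; use a larger k")
    return k / (2.0 * n * radius)


# Hypothetical sample, made up for this example.
sample = [0.0, 1.0, 2.0, 3.0, 4.0]
estimate = knn_density(2.0, sample, k=3)
print(estimate)
```

The window width adapts to the data: it narrows where points are dense and widens where they are sparse, which is the same motivation behind adaptive bandwidth kernel estimators.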


2018 ◽  
Vol 5 (2) ◽  
pp. 83-100
Author(s):  
María Dolores Luquín-García ◽  
Edith Cecilia Macedo Ruíz ◽  
Omar Rojas-Altamirano ◽  
Carlos López-Hernández

The aim of this article is to determine the socioeconomic level (SEL) of the Mexican Republic, disaggregated by Basic Statistical Area (BSA). The methodology used is the one established by the Mexican Association of Market Research Agencies (AMAI) along with the National Institute of Statistics and Geography (INEGI). The clustering of the BSAs was carried out according to variables contained in the Population and Housing Census of 2010, through Gaussian mixture models, neural network learning, and, finally, by defining the labels corresponding to each SEL. We found the existence of a representative SEL for each BSA. In addition, the definition of each socioeconomic level shows good results, with an average of 90.86% of correctly labeled elements.
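
Clustering with a Gaussian mixture model, as mentioned in this abstract, can be sketched in one dimension with the expectation-maximization (EM) algorithm. This is an illustration only, not the authors' pipeline: the function names, the starting means, the fixed iteration count, and the data values are all hypothetical, and real census variables would be multivariate.

```python
import math
from typing import List, Sequence, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(
    data: Sequence[float],
    init_means: Sequence[float],
    n_iter: int = 50,
    min_var: float = 1e-6,
) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Fit a 1-D Gaussian mixture by EM from the given starting means.

    Returns weights, means, variances, and the responsibilities computed
    in the last E-step (one row per data point, one column per component).
    """
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variance
    weights = [1.0 / k] * k        # start with equal mixing weights
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed component
    return weights, means, variances, resp


# Hypothetical data with two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = fit_gmm_1d(data, init_means=[0.0, 6.0])
labels = [row.index(max(row)) for row in resp]  # hard cluster assignment
print(means, labels)
```

Assigning each point to its most probable component gives the cluster labels; in a study like the one described, a separate step would then map each cluster to a named level.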

