CRPClustering: An R Package for Bayesian Nonparametric Chinese Restaurant Process Clustering with Entropy

2018 ◽  
Author(s):  
Masashi Okada

Clustering is a method for finding groups (clusters) in data, and many related methods have been studied for a long time. Bayesian nonparametrics is a branch of statistics that handles models with an infinite number of parameters. The Chinese restaurant process is used to construct the Dirichlet process. Clustering based on the Chinese restaurant process does not require the number of clusters to be fixed in advance, because the algorithm adjusts it automatically. In addition to computing clusters, this package calculates entropy as a measure of the ambiguity of the clusters.
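
The abstract does not spell out the sampler or the exact entropy definition. The following is a minimal sketch of how a CRP prior lets the number of clusters change during inference. It uses a collapsed Gibbs sampler and makes several assumptions of our own, not necessarily those of CRPClustering: one-dimensional data, a known within-cluster variance `sigma2`, and a normal prior N(`mu0`, `tau2`) on cluster means. Reading "entropy as the ambiguity of clusters" as the Shannon entropy of each point's assignment probabilities is also our assumption. `crp_gibbs` and its parameters are hypothetical and are not the package's R API.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def crp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D CRP (Dirichlet process) Gaussian mixture.

    ASSUMPTIONS (illustrative, not CRPClustering's model): known within-cluster
    variance sigma2 and a N(mu0, tau2) prior on each cluster mean.
    Returns the final labels and, for each point, the Shannon entropy of its
    assignment probabilities in the last sweep (our reading of "ambiguity").
    """
    rng = random.Random(seed)
    labels = [0] * len(data)          # start with every point in one cluster
    counts = {0: len(data)}           # cluster label -> number of members
    sums = {0: float(sum(data))}      # cluster label -> sum of member values
    next_label = 1
    entropies = [0.0] * len(data)
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster; drop the cluster if empty.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            options, weights = [], []
            # Existing cluster c: weight n_c * posterior predictive density of x.
            for c, n_c in counts.items():
                prec = 1.0 / tau2 + n_c / sigma2
                post_mean = (mu0 / tau2 + sums[c] / sigma2) / prec
                options.append(c)
                weights.append(n_c * normal_pdf(x, post_mean, sigma2 + 1.0 / prec))
            # New cluster: weight alpha * prior predictive density of x.
            options.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, sigma2 + tau2))
            total = sum(weights)
            probs = [w / total for w in weights]
            entropies[i] = -sum(p * math.log(p) for p in probs if p > 0)
            choice = rng.choices(options, weights=probs)[0]
            if choice == next_label:      # open a new table
                counts[choice], sums[choice] = 0, 0.0
                next_label += 1
            labels[i] = choice
            counts[choice] += 1
            sums[choice] += x
    return labels, entropies


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(-10, 1) for _ in range(20)] + [rng.gauss(10, 1) for _ in range(20)]
    labels, ent = crp_gibbs(data)
    print(len(set(labels)), "clusters; max entropy", round(max(ent), 3))
```

No cluster count is supplied. A new cluster is proposed with weight `alpha` times the prior predictive density, so larger `alpha` tends to produce more clusters. Points with entropy near zero are assigned unambiguously, while larger values flag points lying between clusters.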


2019 ◽  
Vol 7 (1) ◽  
pp. 45-52
Author(s):  
Caroline Lawless ◽  
Julyan Arbel

For a long time, the Dirichlet process has been the gold standard discrete random measure in Bayesian nonparametrics. The Pitman-Yor process provides a simple and mathematically tractable generalization, allowing very flexible control of the clustering behaviour. Two commonly used representations of the Pitman-Yor process are the stick-breaking process and the Chinese restaurant process. The former is a constructive representation of the process that turns out to be very handy for practical implementation, while the latter describes the induced partition distribution. Obtaining one from the other is usually done indirectly, by means of measure theory. In contrast, we propose here an elementary proof of Pitman-Yor's Chinese restaurant process from its stick-breaking representation.
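
The following is a minimal Python sketch of the two representations; the helper names are ours. `py_crp_probs` implements the Pitman-Yor Chinese restaurant seating rule with discount `d` and concentration `theta`. `py_stick_breaking` draws the first `K` weights using V_k ~ Beta(1 - d, theta + k d). The sketch also runs a Monte Carlo check of one consequence of their equivalence. Under the CRP, the second customer joins the first table with probability (1 - d)/(1 + theta), and this should match E[sum_k p_k^2] under stick-breaking. This is a numerical sanity check, not the paper's proof.

```python
import random


def py_crp_probs(counts, d, theta):
    """Pitman-Yor CRP seating rule (0 <= d < 1, theta > -d).

    counts[k] is the number of customers at table k. Returns the probability
    of joining each existing table, followed by the probability of a new table.
    """
    n, num_tables = sum(counts), len(counts)
    denom = theta + n
    probs = [(n_k - d) / denom for n_k in counts]
    probs.append((theta + d * num_tables) / denom)
    return probs


def py_stick_breaking(d, theta, K, rng):
    """First K Pitman-Yor stick-breaking weights:
    V_k ~ Beta(1 - d, theta + k*d), p_k = V_k * prod_{j<k} (1 - V_j)."""
    weights, remaining = [], 1.0
    for k in range(1, K + 1):
        v = rng.betavariate(1 - d, theta + k * d)
        weights.append(v * remaining)
        remaining *= 1 - v
    return weights


def same_table_prob_mc(d, theta, K=200, samples=2000, seed=0):
    """Monte Carlo estimate of E[sum_k p_k^2] (probability that two draws share
    an atom) from truncated stick-breaking; truncation slightly underestimates it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += sum(p * p for p in py_stick_breaking(d, theta, K, rng))
    return total / samples


if __name__ == "__main__":
    d, theta = 0.5, 1.0
    print("CRP:", py_crp_probs([1], d, theta)[0])          # (1 - d) / (1 + theta) = 0.25
    print("stick-breaking:", same_table_prob_mc(d, theta))  # should be close to 0.25
```

Setting `d = 0` recovers the Dirichlet process. In that case the new-table probability reduces to theta / (theta + n), and the stick proportions become Beta(1, theta).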


Inventions ◽  
2018 ◽  
Vol 3 (4) ◽  
pp. 80 ◽  
Author(s):  
Georgios Palaiokrassas ◽  
Athanasios Voulodimos ◽  
Antonios Litke ◽  
Athanasios Papaoikonomou ◽  
Theodora Varvarigou

In this paper, we propose a method for event detection on social media that clusters media items into groups of events based on their textual information as well as available metadata. Our approach is based on the distance-dependent Chinese Restaurant Process (ddCRP), a clustering approach resembling Dirichlet process algorithms. Furthermore, we scrutinize the effectiveness of a series of pre-processing steps in improving detection performance. We experimentally evaluated our method on the Social Event Detection (SED) dataset of the MediaEval 2013 benchmarking workshop, which pertains to the discovery of social events and their grouping into event-specific clusters. The results indicate that the proposed method performs very well compared with existing approaches.
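
The abstract does not give the ddCRP details. In the standard formulation, each item links to another item j with probability proportional to a decay function of their distance. It links to itself with probability proportional to alpha. Clusters are the connected components of the resulting links. The sketch below samples a partition from this prior. It assumes an exponential decay `exp(-d / decay_scale)` and a precomputed distance matrix. For event detection such a matrix might come from text or metadata similarity, but that choice is hypothetical here, as is the name `ddcrp_sample`.

```python
import math
import random


def ddcrp_sample(dist, alpha, decay_scale, seed=0):
    """Draw one partition from a distance-dependent CRP prior.

    Item i links to item j != i with probability proportional to
    exp(-dist[i][j] / decay_scale) (ASSUMED decay), or to itself with
    probability proportional to alpha. Clusters are connected components.
    """
    rng = random.Random(seed)
    n = len(dist)
    parent = list(range(n))  # union-find forest over items

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(n):
        weights = [alpha if j == i else math.exp(-dist[i][j] / decay_scale)
                   for j in range(n)]
        j = rng.choices(range(n), weights=weights)[0]
        parent[find(i)] = find(j)  # merge the components of i and its link target
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots, labels = {}, []
    for i in range(n):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels


if __name__ == "__main__":
    pos = [0.0, 0.5, 1.0, 100.0, 100.5, 101.0]
    dist = [[abs(a - b) for b in pos] for a in pos]
    print(ddcrp_sample(dist, alpha=1.0, decay_scale=1.0))
```

Items link mostly to nearby items, so distant items rarely share a cluster. The number of clusters is never fixed beforehand.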


Author(s):  
Dongming Li ◽  
Changming Sun ◽  
Su Wei ◽  
Yue Yu ◽  
Jinhua Yang ◽  
...  

In this paper, a segmentation method for cell images using a Markov random field (MRF) based on a Chinese restaurant process model (CRPM) is proposed. First, the cell images are preprocessed; we then focus on cell image segmentation using an MRF based on a CRPM under a maximum a posteriori (MAP) criterion. The CRPM can be used to estimate the number of clusters in advance, adjusting it automatically according to the size of the data. Finally, the conditional iteration mode (CIM) method is used to implement the MRF-based cell image segmentation process. To validate the proposed method, segmentation experiments are performed on oral mucosal cell images. The segmentation results were compared with those of other methods, using precision, Dice, and mean square error (MSE) as the objective evaluation criteria. The experimental results show that our method produces accurate cell image segmentation results and can effectively improve segmentation of the nucleus, binuclear cells, and micronucleus cells. This work will play an important role in cell image recognition and analysis.
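
The MAP-MRF step can be illustrated with the textbook iterated conditional modes (ICM) algorithm on a Potts prior. We assume this is close to what the abstract calls conditional iteration mode (CIM), but the paper's exact energy may differ. In this sketch the number of classes and their means are fixed inputs, whereas the paper obtains the number of clusters from the CRPM. `icm_segment`, `sigma`, and `beta` are hypothetical names.

```python
def icm_segment(image, means, sigma, beta, max_iters=10):
    """Iterated conditional modes for a Potts-model MRF with Gaussian likelihood.

    Each pixel takes the label k minimizing
        (x - means[k])^2 / (2 sigma^2) + beta * (# 4-neighbours with a different label),
    a local MAP update; sweeps repeat until no label changes.
    ASSUMPTION: class count and means are given (the paper derives them via the CRPM).
    """
    h, w = len(image), len(image[0])
    K = len(means)

    def data_cost(x, k):
        return (x - means[k]) ** 2 / (2 * sigma ** 2)

    # Initialise with the maximum-likelihood label (nearest mean), ignoring the prior.
    labels = [[min(range(K), key=lambda k: data_cost(image[r][c], k)) for c in range(w)]
              for r in range(h)]
    for _ in range(max_iters):
        changed = False
        for r in range(h):
            for c in range(w):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < h and 0 <= cc < w]
                best = min(range(K), key=lambda k: data_cost(image[r][c], k)
                           + beta * sum(1 for nb in neighbours if nb != k))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    img = [[0.0] * 4 for _ in range(4)]
    img[1][1] = 0.6  # a noisy pixel the likelihood alone would assign to class 1
    print(icm_segment(img, means=[0.0, 1.0], sigma=0.5, beta=1.0))
```

With `beta = 0` the prior is switched off and the result equals the pixel-wise maximum-likelihood labelling. Increasing `beta` smooths isolated mislabelled pixels into their surroundings.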


2019 ◽  
Vol 5 ◽  
pp. e206
Author(s):  
Reza Arfa ◽  
Rubiyah Yusof ◽  
Parvaneh Shabanzadeh

Trajectory clustering and path modelling are two core tasks in intelligent transport systems, with a wide range of applications from modeling drivers' behavior to traffic monitoring at road intersections. Traditional trajectory analysis treats them as separate tasks: the system first clusters the trajectories into a known number of clusters and then models the path taken in each cluster. However, such a hierarchy does not allow knowledge of the path model to be used to improve trajectory clustering. Based on the distance dependent Chinese restaurant process (DDCRP), a trajectory analysis system that simultaneously performs trajectory clustering and path modelling is proposed. Unlike most traditional approaches, in which the number of clusters must be known in advance, the proposed method determines the number of clusters automatically. The proposed algorithm was tested on two publicly available trajectory datasets. On both datasets, the experimental results showed better performance and considerable improvement in trajectory clustering compared with traditional approaches. The study shows that the proposed method is an appropriate candidate for trajectory clustering and path modelling.
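
In a DDCRP, the prior probability that one trajectory links to another depends on a distance between the two trajectories. The sketch below computes the link prior for a single trajectory. It assumes the symmetric Hausdorff distance and an exponential decay; the paper may use a different trajectory distance and decay function. The helper names `hausdorff` and `ddcrp_link_probs` are hypothetical.

```python
import math


def hausdorff(traj_a, traj_b):
    """Symmetric Hausdorff distance between trajectories given as lists of (x, y) points."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(a, b) for b in q) for a in p)
    return max(directed(traj_a, traj_b), directed(traj_b, traj_a))


def ddcrp_link_probs(i, trajectories, alpha, decay_scale):
    """DDCRP prior over the trajectory that trajectory i links to.

    Index i itself (self-link, i.e. starting its own cluster) gets weight alpha;
    any other j gets exp(-hausdorff(i, j) / decay_scale) (ASSUMED decay).
    """
    weights = [alpha if j == i else
               math.exp(-hausdorff(trajectories[i], t) / decay_scale)
               for j, t in enumerate(trajectories)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    a = [(0, 0), (1, 0), (2, 0)]
    b = [(0, 1), (1, 1), (2, 1)]   # a parallel path one unit away
    c = [(0, 10), (1, 10)]         # a distant path
    print(ddcrp_link_probs(0, [a, b, c], alpha=1.0, decay_scale=1.0))
```

Sampling a link for every trajectory from these probabilities, then taking connected components, yields a partition whose number of clusters is not set in advance.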


2018 ◽  
Vol 11 (3) ◽  
pp. 52 ◽  
Author(s):  
Mark Jensen ◽  
John Maheu

In this paper, we let the data speak for themselves about the existence of volatility feedback and the often-debated risk–return relationship. We do this by modeling the contemporaneous relationship between market excess returns and log-realized variances with a nonparametric, infinitely-ordered mixture representation of the observables' joint distribution. Our nonparametric estimator allows for deviation from conditional Gaussianity through non-zero higher-order moments, such as asymmetric, fat-tailed behavior, along with smooth, nonlinear risk–return relationships. We use the parsimonious and relatively uninformative Bayesian Dirichlet process prior to overcome the problem of having too many unknowns and not enough observations. Applying our Bayesian nonparametric model to more than a century's worth of monthly US stock market returns and realized variances, we find strong, robust evidence of volatility feedback. Once volatility feedback is accounted for, we find an unambiguous positive, nonlinear relationship between expected excess returns and expected log-realized variance. In addition to the conditional mean, volatility feedback impacts the entire joint distribution.
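
A mixture of bivariate normals can yield a nonlinear conditional mean even though each component is linear. E[r | v] is a weighted average of per-component regressions, and the weights w_k(v), proportional to pi_k N(v; mu_v,k, s_vv,k), depend on v. The sketch below evaluates this for a finite set of components over (r, v), where r stands for excess return and v for log-realized variance. Such a set could be one truncated draw from a Dirichlet process mixture posterior. The component parameters and function name are hypothetical, and this is not the authors' estimator.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_conditional_mean(v, weights, means, covs):
    """E[r | v] for a finite mixture of bivariate normals over (r, v).

    weights[k] = pi_k, means[k] = (mu_r, mu_v),
    covs[k] = ((s_rr, s_rv), (s_rv, s_vv)).
    """
    dens, lines = [], []
    for pi_k, (mu_r, mu_v), ((_, s_rv), (_, s_vv)) in zip(weights, means, covs):
        dens.append(pi_k * normal_pdf(v, mu_v, s_vv))        # unnormalised w_k(v)
        lines.append(mu_r + s_rv / s_vv * (v - mu_v))        # component regression of r on v
    return sum(d * line for d, line in zip(dens, lines)) / sum(dens)


if __name__ == "__main__":
    # Two hypothetical components with flat within-component slopes (s_rv = 0).
    w, m = [0.5, 0.5], [(0.0, -1.0), (1.0, 1.0)]
    cov = [((1.0, 0.0), (0.0, 1.0)), ((1.0, 0.0), (0.0, 1.0))]
    for v in (-3.0, 0.0, 3.0):
        print(v, round(mixture_conditional_mean(v, w, m, cov), 4))
```

Even with zero slope inside each component, the conditional mean moves smoothly and nonlinearly from about 0 to about 1 as v increases, because the component weights shift with v.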

