Parameter Choice, Stability and Validity for Robust Cluster Weighted Modeling

Stats ◽  
2021 ◽  
Vol 4 (3) ◽  
pp. 602-615
Author(s):  
Andrea Cappozzo ◽  
Luis Angel García-Escudero ◽  
Francesca Greselin ◽  
Agustín Mayo-Iscar

Statistical inference based on the cluster weighted model often requires some subjective judgment from the modeler. Many features influence the final solution, such as the number of mixture components, the shape of the clusters in the explanatory variables, and the degree of heteroscedasticity of the errors around the regression lines. Moreover, to deal with outliers and contamination that may appear in the data, hyper-parameter values ensuring robust estimation are also needed. In principle, this freedom gives rise to a variety of “legitimate” solutions, each derived from a specific set of choices and their implications in modeling. Here we introduce a method for identifying a “set of good models” for clustering a dataset, considering the whole panorama of choices. In this way, we enable the practitioner, or the scientist who needs to cluster the data, to make an educated choice: they will be able to identify the most appropriate solutions for the purposes of their own analysis, in light of their stability and validity.
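The “panorama of choices” the abstract describes can be pictured with a toy sketch. The snippet below is not the authors' estimator (they work with robust cluster weighted models); it merely illustrates the idea of sweeping a grid of hyper-parameters — number of components k and trimming level alpha — and collecting the candidate solutions for inspection, using scikit-learn's GaussianMixture with a crude likelihood-based trimming step:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two tight clusters plus a few uniform outliers.
X = np.vstack([
    rng.normal([0, 0], 0.5, size=(100, 2)),
    rng.normal([4, 4], 0.5, size=(100, 2)),
    rng.uniform(-8, 12, size=(10, 2)),          # contamination
])

def trimmed_fit(X, k, alpha, n_refine=5):
    """Fit a k-component mixture, iteratively discarding the alpha
    fraction of points with lowest likelihood (a crude impartial-
    trimming step, for illustration only)."""
    keep = np.ones(len(X), dtype=bool)
    gm = None
    for _ in range(n_refine):
        gm = GaussianMixture(k, random_state=0).fit(X[keep])
        ll = gm.score_samples(X)
        keep = ll >= np.quantile(ll, alpha)
    return gm, keep

# Panorama of hyper-parameter choices: each (k, alpha) pair yields
# one candidate solution; nearby scores form a "set of good models"
# for the practitioner to compare by stability and validity.
candidates = []
for k in (1, 2, 3, 4):
    for alpha in (0.0, 0.05, 0.1):
        gm, keep = trimmed_fit(X, k, alpha)
        candidates.append((gm.bic(X[keep]), k, alpha))
for bic, k, alpha in sorted(candidates)[:3]:
    print(f"k={k} alpha={alpha:.2f} BIC={bic:.1f}")
```

Note that BIC values computed on differently trimmed subsets are not strictly comparable; the sketch only conveys the grid-of-choices idea, not a principled selection rule.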

Econometrica ◽  
2020 ◽  
Vol 88 (3) ◽  
pp. 1007-1029
Author(s):  
Bo E. Honoré ◽  
Luojia Hu

It is well understood that classical sample selection models are not semiparametrically identified without exclusion restrictions. Lee (2009) developed bounds for the parameters in a model that nests the semiparametric sample selection model. These bounds can be wide. In this paper, we investigate bounds that impose the full structure of a sample selection model with errors that are independent of the explanatory variables but have unknown distribution. The additional structure can significantly reduce the identified set for the parameters of interest. Specifically, we construct the identified set for the parameter vector of interest. It is a one‐dimensional line segment in the parameter space, and we demonstrate that this line segment can be short in practice. We show that the identified set is sharp when the model is correct and empty when there exist no parameter values that make the sample selection model consistent with the data. We also provide non‐sharp bounds under the assumption that the model is correct. These are easier to compute and associated with lower statistical uncertainty than the sharp bounds. Throughout the paper, we illustrate our approach by estimating a standard sample selection model for wages.


1994 ◽  
Vol 88 (2) ◽  
pp. 412-423 ◽  
Author(s):  
Bruce Western ◽  
Simon Jackman

Regression analysis in comparative research suffers from two distinct problems of statistical inference. First, because the data constitute all the available observations from a population, conventional inference based on the long-run behavior of a repeatable data mechanism is not appropriate. Second, the small and collinear data sets of comparative research yield imprecise estimates of the effects of explanatory variables. We describe a Bayesian approach to statistical inference that provides a unified solution to these two problems. This approach is illustrated in a comparative analysis of unionization.
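The stabilising effect of informative priors on small, collinear datasets can be sketched with a minimal conjugate-normal example (this is a generic illustration of the Bayesian shrinkage idea, not the authors' unionization analysis; the error and prior variances are assumed known here for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
# Small, collinear "comparative" dataset: 18 units, two nearly
# identical explanatory variables.
n = 18
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # near-collinear with x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

sigma2 = 0.25     # error variance (assumed known for this sketch)
tau2 = 1.0        # prior variance on each coefficient, prior mean 0

# OLS: unstable under collinearity.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Conjugate Bayesian posterior mean with a N(0, tau2 I) prior:
# shrinks and stabilises the collinear coefficients.
A = X.T @ X / sigma2 + np.eye(3) / tau2
beta_bayes = np.linalg.solve(A, X.T @ y / sigma2)

print("OLS:  ", np.round(beta_ols, 2))
print("Bayes:", np.round(beta_bayes, 2))
```

With a zero-mean prior this posterior mean coincides with a ridge estimate, so its norm never exceeds that of the OLS solution — exactly the stabilisation that matters when the data are few and collinear.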


Genetics ◽  
2002 ◽  
Vol 162 (4) ◽  
pp. 2025-2035 ◽  
Author(s):  
Mark A Beaumont ◽  
Wenyang Zhang ◽  
David J Balding

We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
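The core mechanics — simulate from the prior, accept draws whose summaries fall near the observed one, then regression-adjust the accepted draws — can be sketched on a deliberately simple problem (estimating a normal mean with the sample mean as summary; the authors' method additionally uses kernel weights, omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data: n draws from Normal(mu=2, sd=1); summary = sample mean.
n = 50
obs = rng.normal(2.0, 1.0, size=n)
s_obs = obs.mean()

# 1. Simulate parameters from the prior, and the summary statistic of
#    each simulated dataset (the sample mean has sd 1/sqrt(n), so we
#    draw it directly instead of simulating n points).
mu_sim = rng.uniform(-5, 5, size=20000)          # flat prior on mu
s_sim = rng.normal(mu_sim, 1.0 / np.sqrt(n))

# 2. Rejection step: keep simulations whose summaries are closest
#    to the observed summary.
d = np.abs(s_sim - s_obs)
keep = d <= np.quantile(d, 0.05)                 # tolerance = 5% quantile

# 3. Local-linear regression adjustment: regress accepted mu on
#    (s - s_obs) and project each accepted draw to s = s_obs.
s_k, mu_k = s_sim[keep], mu_sim[keep]
b = np.polyfit(s_k - s_obs, mu_k, 1)[0]          # local regression slope
mu_adj = mu_k - b * (s_k - s_obs)

print("posterior mean (rejection only):", round(mu_k.mean(), 2))
print("posterior mean (adjusted):      ", round(mu_adj.mean(), 2))
```

The adjustment tightens the rejection sample around what the posterior would be at exactly s = s_obs, which is what allows the method to tolerate a relatively loose acceptance tolerance.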


2020 ◽  
Vol 20 (1) ◽  
pp. 27
Author(s):  
Florencia Wahyu Ganda Fismaya ◽  
Abduh Riski ◽  
Ahmad Kamsyakawuni

Selling and trading in the Industry 4.0 era can now be done by opening a shop online, and shopping can likewise be done online; online shop owners whose orders do not allow Cash on Delivery (COD) transactions therefore rely on package delivery services. This research discusses finding a good shipping solution, with minimum total mileage for several couriers, at PT. Titipan Kilat in Banyuwangi District, using the artificial fish swarm algorithm (AFSA). The experiments were carried out with several parameter values to determine which parameters affect the final solution. Each parameter setting was tested with a maximum of 1000 iterations; the best results were then tested again with maximum iterations of 2000 and 5000, and compared with the distance originally traveled by the couriers. The final solution is a delivery route for three couriers with a combined total distance (Z) of 87.28 km, the smallest run reaching its local minimum at iteration 1169. Keywords: artificial fish swarm algorithm (AFSA), multiple travelling salesman problem (m-TSP), route, total mileage.
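The m-TSP objective being minimised — the combined mileage Z over several couriers' depot-to-depot tours — can be made concrete with a small sketch. This is only the objective evaluation for one hypothetical candidate (random coordinates, random split of stops among three couriers), not an AFSA implementation; the metaheuristic searches over such assignments and orderings:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical delivery points (km coordinates) and a single depot.
depot = np.array([0.0, 0.0])
points = rng.uniform(0, 10, size=(12, 2))

def route_length(route):
    """Depot -> stops in the given order -> depot, Euclidean distance."""
    path = np.vstack([depot, points[route], depot])
    return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))

# m-TSP objective: split the stops among 3 couriers and sum the
# per-courier tour lengths. AFSA (or any metaheuristic) searches
# over such assignments/orderings to minimise Z.
assignment = np.array_split(rng.permutation(12), 3)
Z = sum(route_length(r) for r in assignment)
print("total mileage Z for this candidate:", round(Z, 2))
```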


1987 ◽  
Vol 19 (2) ◽  
pp. 173-186 ◽  
Author(s):  
C M Guy

A common problem in the use of singly-constrained spatial interaction shopping models has been that of finding optimal parameter values. This problem has been exacerbated where improvements to the model have involved extra parameters to be estimated. In this paper it is shown that calibration of quite complex models can be achieved through modification of the conventional ‘gravity’ model to a generalised linear model with Poisson error structure and logarithmic link function. Data on observed trips between fifteen residential zones and eighty-three shopping destinations in Cardiff are used to test several models through application of the GLIM computing package. Models involving extra explanatory variables, origin-specific distance-decay parameters, and competing-destinations terms are all shown to offer worthwhile improvements in performance over the conventional singly-constrained model. An individual-specific model is also tested for a small sample of shoppers. Finally, some comments are made concerning the relevance of the Cardiff findings and the wider significance of these methodological advances.
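The reformulation the paper exploits — a gravity model calibrated as a generalised linear model with Poisson errors and a log link — can be sketched on simulated trip data. This uses scikit-learn's `PoissonRegressor` rather than the GLIM package, and a toy 5 × 8 zone system in place of the Cardiff data (15 origins × 83 destinations); origin dummies play the role of the balancing factors:

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)
n_orig, n_dest = 5, 8                             # toy zone system
d = rng.uniform(1, 10, size=(n_orig, n_dest))     # zone-to-zone distances (km)
logW = rng.normal(1.0, 0.5, size=n_dest)          # log destination attractiveness
a = rng.normal(3.0, 0.3, size=n_orig)             # origin-specific terms

beta_true = -0.3                                  # distance-decay parameter
mu = np.exp(a[:, None] + logW[None, :] + beta_true * d)
T = rng.poisson(mu)                               # observed trip counts

# Design matrix: origin dummies + log attractiveness + distance.
rows = []
for i in range(n_orig):
    for j in range(n_dest):
        x = np.zeros(n_orig + 2)
        x[i] = 1.0
        x[-2] = logW[j]
        x[-1] = d[i, j]
        rows.append(x)
X, y = np.array(rows), T.ravel()

# Poisson GLM with log link: the generalised-linear-model
# reformulation of the singly-constrained gravity model.
glm = PoissonRegressor(alpha=1e-8, fit_intercept=False, max_iter=500).fit(X, y)
print("estimated distance-decay:", round(glm.coef_[-1], 3))
```

Adding extra explanatory variables, origin-specific decay parameters, or competing-destinations terms then amounts to adding columns to the design matrix, which is precisely why the GLM framing makes the more complex models easy to calibrate.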


2019 ◽  
Author(s):  
Bahman Nasseroleslami ◽  
Stefan Dukic ◽  
Teresa Buxo ◽  
Amina Coffey ◽  
Roisin McMackin ◽  
...  

Despite advances in multivariate spectral analysis of neural signals, the statistical inference of measures such as spectral power and coherence in practical, real-life scenarios remains a challenge. The non-normal distribution of the neural signals and the presence of artefactual components make it difficult to use parametric methods for robust estimation of these measures, or to infer the presence of specific spectral components above chance level. Furthermore, the bias of coherence measures and their complex statistical distributions are impediments to robust statistical comparisons between two different levels of coherence. Non-parametric methods based on the median of auto-/cross-spectra have shown promise for robust estimation of spectral power and coherence, but statistical inference based on these non-parametric estimates remains to be formulated and tested. In this report, a set of methods based on non-parametric rank statistics for 1-sample and 2-sample testing of spectral power and coherence is provided. The proposed methods were demonstrated and tested using simulated neural signals in different conditions. The results show that non-parametric methods provide robustness against artefactual components. Moreover, they provide new possibilities for robust 1-sample and 2-sample testing of the complex coherency function, including both magnitude and phase, where existing methods fall short. The utility of the methods was further demonstrated on experimental neural data. The proposed approach provides a new framework for non-parametric spectral analysis of digital signals, especially suited to neuroscience and neural engineering applications given its minimal distributional assumptions, statistical robustness, and the diverse testing scenarios it affords.
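The motivation for median-based, rank-based spectral inference can be illustrated with a small simulation. The sketch below is not the authors' test statistics: it simply computes per-segment power at a target frequency for two conditions, a few segments contaminated by large artefacts, and shows that the across-segment median and a standard rank test (Mann–Whitney, via SciPy) behave sensibly where mean-based estimates would be dragged by the outliers:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
fs, nseg, seglen = 256, 40, 256          # sampling rate, segments, samples/segment
t = np.arange(seglen) / fs

def segment_power(amp10, n, artefacts=0):
    """Per-segment power at 10 Hz for n one-second segments of a noisy
    10 Hz oscillation; the first few segments get large artefacts."""
    p = []
    for k in range(n):
        x = amp10 * np.sin(2 * np.pi * 10 * t) + rng.normal(size=seglen)
        if k < artefacts:
            x += 20 * rng.normal(size=seglen)    # artefactual segment
        f = np.fft.rfft(x)
        bin10 = int(10 * seglen / fs)            # FFT bin at 10 Hz
        p.append(np.abs(f[bin10]) ** 2)
    return np.array(p)

pa = segment_power(1.0, nseg, artefacts=3)   # condition A, contaminated
pb = segment_power(0.3, nseg, artefacts=3)   # condition B, weaker 10 Hz

# Median across segments: robust to the artefactual segments, unlike
# the mean used in classical Welch-style averaging.
print("median power A:", round(np.median(pa), 1))
print("median power B:", round(np.median(pb), 1))

# 2-sample rank test on per-segment power: a non-parametric comparison
# in the spirit of the rank statistics described above.
u, pval = mannwhitneyu(pa, pb, alternative="greater")
print("Mann-Whitney p-value:", pval)
```

The rank test needs no distributional assumptions on the per-segment power values, which is the property that makes this family of methods attractive for artefact-prone neural recordings.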

