Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Guoqi Qian ◽  
Yuehua Wu ◽  
Davide Ferrari ◽  
Puxue Qiao ◽  
Frédéric Hollande

Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes, and supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires a means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data, and we develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented to assess the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
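
The core of the method is an alternating loop: assign each observation to the regression hyperplane that fits it best, then refit each hyperplane on its current cluster. The Python sketch below illustrates this generic partition-and-regression iteration under a least-squares fit; it is not the authors' implementation, names such as `kplane_regression` are hypothetical, and the model-selection step for choosing the number of clusters is omitted.

```python
import numpy as np

def kplane_regression(X, y, k, n_iter=100, seed=0):
    """Generic iterative partition-and-regression (clusterwise linear regression)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept
    labels = rng.integers(0, k, size=n)             # random initial partition
    betas = np.zeros((k, Xd.shape[1]))
    for _ in range(n_iter):
        # Regression step: least-squares fit within each current cluster
        for j in range(k):
            idx = labels == j
            if idx.sum() >= Xd.shape[1]:            # skip (near-)empty clusters
                betas[j] = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)[0]
        # Partition step: reassign each point to the hyperplane with the
        # smallest squared residual
        new_labels = ((y[:, None] - Xd @ betas.T) ** 2).argmin(axis=1)
        if np.array_equal(new_labels, labels):      # labels stabilized
            break
        labels = new_labels
    return labels, betas
```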

2020 ◽  
Vol 44 (5) ◽  
pp. 362-375
Author(s):  
Tyler Strachan ◽  
Edward Ip ◽  
Yanyan Fu ◽  
Terry Ackerman ◽  
Shyh-Huei Chen ◽  
...  

As a method to derive a "purified" measure along a dimension of interest from response data that are potentially multidimensional in nature, the projective item response theory (PIRT) approach requires first fitting a multidimensional item response theory (MIRT) model to the data and then projecting onto the dimension of interest. This study explores how accurate the PIRT results are when the estimated MIRT model is misspecified. Specifically, we focus on using a (potentially misspecified) two-dimensional (2D) MIRT for projection because of its advantages over higher dimensional models, including interpretability, identifiability, and computational stability. Two large simulation studies (I and II) examined whether fitting a 2D-MIRT is sufficient to recover the PIRT parameters when multiple nuisance dimensions exist in the test items; the item responses were generated under compensatory MIRT models in Study I and bifactor models in Study II. Various factors were manipulated, including sample size, test length, latent factor correlation, and number of nuisance dimensions. The results of Studies I and II showed that the PIRT approach was overall robust to a misspecified 2D-MIRT. Two smaller simulation studies evaluated recovery of the PIRT model parameters when the correctly specified higher dimensional MIRT or bifactor model was fitted to the response data. In addition, a real data set was used to illustrate the robustness of PIRT.
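
To make the projection idea concrete, the following is a hedged illustration for a two-dimensional normal-ogive compensatory item with a single nuisance dimension; the parameterization actually used in the article may differ.

$$P(X_{ij}=1 \mid \theta_1,\theta_2) = \Phi\bigl(a_{1}\theta_1 + a_{2}\theta_2 + d\bigr), \qquad \theta_2 \mid \theta_1 \sim N\bigl(\rho\theta_1,\ 1-\rho^2\bigr).$$

Integrating out the nuisance dimension $\theta_2$ yields a unidimensional (projected) item response function on the dimension of interest,

$$P(X_{ij}=1 \mid \theta_1) = \Phi\!\left(\frac{(a_{1}+\rho a_{2})\,\theta_1 + d}{\sqrt{1 + a_{2}^{2}\,(1-\rho^{2})}}\right),$$

so the projected discrimination and intercept are $a^{*}=(a_{1}+\rho a_{2})/\sqrt{1+a_{2}^{2}(1-\rho^{2})}$ and $d^{*}=d/\sqrt{1+a_{2}^{2}(1-\rho^{2})}$.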


2020 ◽  
Author(s):  
Edlin J. Guerra-Castro ◽  
Juan Carlos Cajas ◽  
Nuno Simões ◽  
Juan J Cruz-Motta ◽  
Maite Mascaró

SSP (simulation-based sampling protocol) is an R package that uses simulation of ecological data and the dissimilarity-based multivariate standard error (MultSE) as an estimator of precision to evaluate the adequacy of different sampling efforts for studies that will test hypotheses using permutational multivariate analysis of variance. The procedure consists of simulating several extensive data matrices that mimic some of the relevant ecological features of the community of interest using a pilot data set. For each simulated data set, several sampling efforts are repeatedly executed and the MultSE is calculated. The mean value and the 0.025 and 0.975 quantiles of the MultSE for each sampling effort across all simulated data sets are then estimated and standardized with respect to the lowest sampling effort. The optimal sampling effort is identified as the one beyond which increasing the sampling effort no longer improves precision beyond a threshold value (e.g., 2.5%). The performance of SSP was validated using real data, and in all examples the simulated data mimicked the real data well, allowing the MultSE-n relationship to be evaluated beyond the sample sizes of the pilot studies. SSP can be used to estimate sample size in a wide range of situations, ranging from simple (e.g., a single site) to more complex (e.g., several sites in different habitats) experimental designs. The latter constitutes an important advantage, since it offers new possibilities for complex sampling designs, as has been advised for multi-scale studies in ecology.
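
For orientation, the quantity SSP tracks can be sketched in Python as follows (the package itself is written in R, and its internals may differ): for a sample of size n, the dissimilarity-based multivariate standard error is sqrt(V/n), with V the pseudo multivariate variance derived from the among-sample dissimilarities. The function name `mult_se`, the Bray-Curtis choice, and the resampling defaults are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def mult_se(community, n, n_resamples=200, seed=0):
    """Mean and 95% quantile interval of MultSE for sampling effort n,
    estimated by repeated subsampling of a (simulated) community matrix."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_resamples):
        idx = rng.choice(community.shape[0], size=n, replace=False)
        d = pdist(community[idx], metric="braycurtis")  # among-sample dissimilarities
        ss = np.sum(d ** 2) / n                         # total sum of squares
        v = ss / (n - 1)                                # pseudo multivariate variance
        values.append(np.sqrt(v / n))                   # MultSE for this resample
    values = np.asarray(values)
    return values.mean(), np.quantile(values, [0.025, 0.975])
```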


2020 ◽  
Vol 8 (2) ◽  
pp. B35-B43
Author(s):  
Julio Cesar S. O. Lyrio ◽  
Paulo T. L. Menezes ◽  
Jorlivan L. Correa ◽  
Adriano R. Viana

When collecting and processing geophysical data for exploration, the same geologic feature can generate a different response for each rock property being targeted. Typically, the units of these responses may differ by several orders of magnitude; therefore, combining geophysical data in integrated interpretation is not a straightforward process and cannot be performed by visual inspection alone. The multiphysics anomaly map (MAM) that we have developed is a data fusion solution that consists of a spatial representation of the correlation between anomalies detected with different geophysical methods. In the MAM, we mathematically process geophysical data such as seismic attributes, gravity, magnetic, and resistivity before combining them in a single map. In each data set, anomalous regions of interest, which are problem-dependent, are selected by the interpreter. Selected anomalies are highlighted through the use of a logistic function, which is specially designed to clip large magnitudes and rescale the range of values, increasing the discrimination of anomalies. The resulting anomalies, named logistic anomalies, represent regions with a high probability of target occurrence. This new solution highlights areas where individual interpretations of different geophysical methods correlate, increasing confidence in the interpretation. We demonstrate the effectiveness of our MAM with applications to real data from onshore and offshore Brazil. In the onshore Recôncavo Basin, the MAM allows the interpreter to identify a channel where a drilled well found the largest sandstone thickness in the area. In a second example, from the offshore Sergipe-Alagoas Basin, the MAM helps differentiate between a dry channel and an oil-bearing channel previously outlined in seismic data. These outcomes indicate that the MAM is a valid interpretation tool that we believe can be applied to a wide range of geologic problems.
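
The sketch below illustrates, in Python, the kind of logistic rescaling described above: each gridded attribute is mapped to (0, 1) so that large magnitudes are clipped before the rescaled anomalies are combined on a single map. The parameter choices (centering on the median, scaling by the standard deviation) and the simple product used for fusion are illustrative assumptions, not the authors' exact recipe.

```python
import numpy as np

def logistic_anomaly(grid, x0=None, k=1.0):
    """Rescale one gridded attribute to (0, 1) with a logistic function."""
    x0 = np.nanmedian(grid) if x0 is None else x0   # centre of the transition
    scale = k / np.nanstd(grid)                     # steepness in data units
    return 1.0 / (1.0 + np.exp(-scale * (grid - x0)))

def multiphysics_anomaly_map(grids):
    """Combine several logistic anomalies; high values flag regions where methods agree."""
    return np.prod([logistic_anomaly(g) for g in grids], axis=0)
```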


2019 ◽  
Vol 09 (04) ◽  
pp. 2050017
Author(s):  
Zhiqiang Jiang ◽  
Zhensheng Huang ◽  
Guoliang Fan

This paper considers empirical likelihood inference for a high-dimensional partially functional linear model. An empirical log-likelihood ratio statistic is constructed for the regression coefficients of non-functional predictors and proved to be asymptotically normally distributed under some regularity conditions. Moreover, maximum empirical likelihood estimators of the regression coefficients of non-functional predictors are proposed and their asymptotic properties are obtained. Simulation studies are conducted to demonstrate the performance of the proposed procedure and a real data set is analyzed for illustration.
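
As a point of reference, one standard way to write a partially functional linear model and the empirical log-likelihood ratio for the non-functional coefficients is sketched below; the article's exact estimating equations and high-dimensional adjustments may differ.

$$Y_i = \boldsymbol{X}_i^{\top}\boldsymbol{\beta} + \int_{0}^{1} Z_i(t)\,\gamma(t)\,dt + \varepsilon_i, \qquad i = 1,\dots,n,$$

where $\boldsymbol{X}_i$ collects the non-functional (scalar) predictors, $Z_i(t)$ is a functional predictor, and $\gamma(t)$ is an unknown slope function. Given estimating functions $\boldsymbol{\eta}_i(\boldsymbol{\beta})$ with zero mean at the true $\boldsymbol{\beta}$ (for instance, residual scores after estimating $\gamma$), the empirical log-likelihood ratio is

$$\ell(\boldsymbol{\beta}) = -2\,\max\left\{\sum_{i=1}^{n}\log\left(n p_i\right) \;:\; p_i \ge 0,\ \sum_{i=1}^{n} p_i = 1,\ \sum_{i=1}^{n} p_i\,\boldsymbol{\eta}_i(\boldsymbol{\beta}) = \boldsymbol{0}\right\}.$$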


2021 ◽  
Vol 2 ◽  
pp. 1
Author(s):  
Haitham M. Yousof ◽  
Mustafa C. Korkmaz ◽  
G.G. Hamedani ◽  
Mohamed Ibrahim

In this work, we derive a novel extension of the Chen distribution. Some statistical properties of the new model are derived, and numerical analysis of the mean, variance, skewness, and kurtosis is presented. Some characterizations of the proposed distribution are also given. Different classical estimation methods under uncensored schemes, namely the maximum likelihood, Anderson-Darling, weighted least squares, and right-tail Anderson-Darling methods, are considered. Simulation studies are performed in order to compare and assess these estimation methods. To compare the applicability of the four classical methods, two applications to real data sets are analyzed.
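
As a hedged illustration of the estimation workflow, two of the classical criteria can be set up in Python as follows, applied here to the baseline Chen (2000) distribution with CDF F(x) = 1 - exp{lambda(1 - exp(x^b))}, since the abstract does not give the form of the new extension. Function names, weights, and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def chen_cdf(x, lam, b):
    return 1.0 - np.exp(lam * (1.0 - np.exp(x ** b)))

def neg_loglik(params, x):                      # maximum likelihood criterion
    lam, b = params
    if lam <= 0 or b <= 0:
        return np.inf
    return -np.sum(np.log(lam * b) + (b - 1) * np.log(x) + x ** b
                   + lam * (1.0 - np.exp(x ** b)))

def wls_criterion(params, x):                   # weighted least squares on the CDF
    lam, b = params
    if lam <= 0 or b <= 0:
        return np.inf
    n = len(x)
    i = np.arange(1, n + 1)
    w = (n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
    return np.sum(w * (chen_cdf(np.sort(x), lam, b) - i / (n + 1)) ** 2)

# Usage on a positive sample x:
# mle = minimize(neg_loglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
# wls = minimize(wls_criterion, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
```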


2018 ◽  
Vol 33 (1) ◽  
pp. 31-43
Author(s):  
Bol A. M. Atem ◽  
Suleman Nasiru ◽  
Kwara Nantomah

Abstract This article studies the properties of the Topp–Leone linear exponential distribution. The parameters of the new model are estimated using maximum likelihood estimation, and simulation studies are performed to examine the finite sample properties of the parameters. An application of the model is demonstrated using a real data set. Finally, a bivariate extension of the model is proposed.
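
For context, one common way to build such a model is via the Topp-Leone generated family applied to a linear exponential (linear failure rate) baseline; the following is a hedged sketch of that construction, and the article's exact parameterization may differ.

$$G(x) = 1 - \exp\!\left(-\theta x - \tfrac{\gamma}{2}x^{2}\right), \qquad x > 0,\ \theta,\gamma \ge 0,$$

$$F(x) = \bigl[1 - \{1 - G(x)\}^{2}\bigr]^{\alpha} = \bigl[1 - \exp\!\left(-2\theta x - \gamma x^{2}\right)\bigr]^{\alpha}, \qquad \alpha > 0.$$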


2019 ◽  
Vol 13 (4) ◽  
pp. 375-385
Author(s):  
Saeed Mirzadeh ◽  
Anis Iranmanesh

Abstract In this study, the researchers introduce a new class of the logistic distribution that can be used to model unimodal data with some skewness present. The new generalization, called the truncated-exponential skew-logistic (TESL) distribution, is carried out using the basic idea of Nadarajah (Statistics 48(4):872–895, 2014). The TESL distribution is a member of the exponential family; therefore, the skewness parameter can be derived more easily. Some important statistical characteristics are presented, and a real data set and simulation studies are used to evaluate the results. The TESL distribution is also compared to at least five other skew-logistic distributions.
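
A hedged sketch of the construction, assuming the truncated-exponential skew family applied to a logistic baseline; the article's parameterization may differ:

$$F(x) = \frac{1 - \exp\{-\lambda\,G(x)\}}{1 - \exp(-\lambda)}, \qquad \lambda \neq 0, \qquad
G(x) = \frac{1}{1 + \exp\{-(x-\mu)/\sigma\}},$$

where $\lambda$ controls the skewness and the baseline logistic distribution is recovered in the limit $\lambda \to 0$.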


Psych ◽  
2020 ◽  
Vol 2 (4) ◽  
pp. 269-278
Author(s):  
Michela Battauz

The four-parameter logistic model is an item response theory model for dichotomous items that limits the probability of giving a positive response to an item to a restricted range, so that even people at the extremes of the latent trait do not have a probability close to zero or one. Although the literature acknowledges the usefulness of this model in certain contexts, the difficulty of estimating the item parameters has limited its use in practice. In this paper we propose a regularized approach to the estimation of the item parameters, based on the inclusion of a penalty term in the log-likelihood function. Simulation studies show the good performance of the proposal, which is further illustrated through an application to a real data set.
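
The sketch below shows the standard four-parameter logistic item response function and a penalized negative log-likelihood for one item, treating the latent trait values as known for simplicity. The specific penalty (a quadratic term shrinking the lower asymptote toward 0 and the upper asymptote toward 1) is only a stand-in, since the abstract does not give the form of the penalty actually used.

```python
import numpy as np

def p_4pl(theta, a, b, c, d):
    """4PL probability of a positive response: lower asymptote c, upper asymptote d."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

def penalized_negloglik(params, theta, x, lam=1.0):
    """Negative Bernoulli log-likelihood for one item plus a stand-in penalty."""
    a, b, c, d = params
    p = np.clip(p_4pl(theta, a, b, c, d), 1e-10, 1 - 1e-10)
    loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    penalty = lam * (c ** 2 + (1.0 - d) ** 2)   # shrink c toward 0 and d toward 1
    return -loglik + penalty
```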


2021 ◽  
Author(s):  
Suha Naser-Khdour ◽  
Rob Lanfear ◽  
Bui Quang Minh

Phylogenetic inference typically assumes that the data have evolved under Stationary, Reversible, and Homogeneous (SRH) conditions. Many empirical and simulation studies have shown that assuming SRH conditions can lead to significant errors in phylogenetic inference when the data violate these assumptions. Yet many simulation studies have focused on extreme non-SRH conditions that represent worst-case scenarios rather than the average empirical dataset. In this study, we simulate datasets under various degrees of non-SRH conditions using empirically derived parameters to mimic real data, and we examine the effects of incorrectly assuming SRH conditions on the inferred phylogenies. Our results show that maximum likelihood inference is generally quite robust to a wide range of SRH model violations but is inaccurate under extreme convergent evolution.


2009 ◽  
Vol 2 (1) ◽  
pp. 79-90
Author(s):  
Michael T. Marsh

Regardless of the related discipline, students in statistics courses invariably have difficulty understanding the connection between the numerical values calculated for end-of-chapter exercises and their usefulness in decision making. This disconnect is, in part, due to the lack of time and opportunity to actually design the experiments and collect the data. The prototypes proposed in this project were developed to allow students to design experiments and collect data in relevant settings without the impediments of real data collection. The virtual environments attempt to replicate real situations of interest in which students can design and run experiments, devise alternative sampling strategies, analyze the results of experiments, and relate the results to the original experiment. The setting and underlying data set detailed in this paper were developed to allow students to experience a wide range of statistical concepts typically found in introductory statistics courses, such as basic descriptive statistics, estimation, hypothesis testing, ANOVA, and regression. Assessments of student knowledge after using this approach have shown marked increases in students' understanding of statistical concepts, especially confidence intervals and hypothesis testing. Specific details about the data set are provided, as are suggestions for using it in an introductory statistics class. Potential uses and examples for a variety of disciplines are also included.

