Selection of the number of clusters via the bootstrap method

2012 ◽  
Vol 56 (3) ◽  
pp. 468-477 ◽  
Author(s):  
Yixin Fang ◽  
Junhui Wang


2021 ◽  
Vol 53 (2) ◽  
pp. 1-10
Author(s):  
Aparecido De Moraes ◽  
Matheus Henrique Silveira Mendes ◽  
Mauro Sérgio de Oliveira Leite ◽  
Regis De Castro Carvalho ◽  
Flávia Maria Avelar Gonçalves

The purpose of this study was to identify the ideal sample size to represent a family's potential, to identify superior families, and, in parallel, to determine which spatial arrangement gives better accuracy in the selection of new sugarcane varieties. For this purpose, five full-sib families were evaluated, each with 360 individuals, in a randomized block design with three replications, at three different spacings between plants within the row (50 cm, 75 cm, and 100 cm) and 150 cm between rows. The bootstrap method was adopted to determine the ideal sample size as well as the best spacing for evaluation. The 100-cm spacing provided the best averages for stalk number, stalk diameter, and estimated weight of stalks in the stool. The 75-cm spacing between plants gave the best power of discrimination among families for all characters evaluated. At this 75-cm spacing it was also possible to identify superior families with a sample of 30 plants per plot and three replications in the trial. Highlights: The bootstrap method was efficient for determining the ideal sample size as well as the best spacing for evaluation. The 75-cm spacing had the highest power of discrimination among families, indicating that it is the most efficient spacing for evaluating sugarcane families for selection purposes. Across all results, and taking selective accuracy as the guiding parameter for decision making, the highest values for number of stalks and weight of stalks in the stools were found at the 75-cm spacing.
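The sample-size search described above can be sketched with a simple bootstrap loop. The data here are synthetic stand-ins for the study's stalk-weight measurements, and the 5% coefficient-of-variation threshold is an illustrative stopping rule, not the paper's selective-accuracy criterion:

```python
import random
import statistics

random.seed(42)

# Hypothetical stalk weights (kg) for one 360-plant family; a synthetic
# stand-in for the real full-sib sugarcane data used in the study.
family = [random.gauss(1.8, 0.5) for _ in range(360)]

def bootstrap_mean_cv(data, n, reps=1000):
    """Coefficient of variation of the sample mean across bootstrap
    samples of size n drawn with replacement from the data."""
    means = [statistics.mean(random.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(means) / statistics.mean(means)

# Smallest plot sample size whose bootstrap CV of the mean drops below 5%
for n in range(10, 361, 10):
    if bootstrap_mean_cv(family, n) < 0.05:
        print(f"ideal sample size ~ {n} plants per plot")
        break
```

Larger bootstrap samples give steadier means, so the CV shrinks as n grows; the ideal size is the point where further plants buy little extra precision.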


1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

Abstract A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
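The criterion can be illustrated on toy multinomial count profiles. The data, the candidate partitions, and the helper names below are hypothetical; AIC is computed as -2 log L plus twice the number of free multinomial parameters per cluster:

```python
import math

def multinomial_loglik(counts_list):
    """Maximized log-likelihood of count vectors sharing one multinomial
    profile (constant multinomial coefficients omitted: they are the same
    for every candidate partition and cancel in comparisons)."""
    totals = [sum(col) for col in zip(*counts_list)]
    grand = sum(totals)
    ll = 0.0
    for counts in counts_list:
        for j, c in enumerate(counts):
            if c:
                ll += c * math.log(totals[j] / grand)
    return ll

def aic(partition, n_categories):
    """AIC = -2 log L + 2 * (free parameters); each cluster profile
    contributes n_categories - 1 free probabilities."""
    ll = sum(multinomial_loglik(cluster) for cluster in partition)
    return -2 * ll + 2 * len(partition) * (n_categories - 1)

# Hypothetical antigen-count profiles for six tumour samples (3 markers each)
profiles = [[8, 1, 1], [9, 2, 1], [7, 1, 2],   # marker-1 dominant
            [1, 1, 8], [2, 1, 9], [1, 2, 7]]   # marker-3 dominant

candidates = {
    1: [profiles],                          # everything in one cluster
    2: [profiles[:3], profiles[3:]],        # the two visible groups
    3: [profiles[:2], profiles[2:4], profiles[4:]],  # an over-split
}
best_k = min(candidates, key=lambda k: aic(candidates[k], 3))
print(best_k)  # → 2
```

The two-cluster partition fits far better than pooling everything, while the three-cluster split pays a parameter penalty without improving fit, so AIC picks k = 2.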


Universe ◽  
2021 ◽  
Vol 7 (1) ◽  
pp. 8
Author(s):  
Alessandro Montoli ◽  
Marco Antonelli ◽  
Brynmor Haskell ◽  
Pierre Pizzochero

A common way to calculate the glitch activity of a pulsar is an ordinary linear regression of the observed cumulative glitch history. This method, however, is likely to underestimate the errors on the activity, as it implicitly assumes a (long-term) linear dependence between glitch sizes and waiting times, as well as equal variance, i.e., homoscedasticity, in the fit residuals; neither assumption is well justified by pulsar data. In this paper, we review the extrapolation of the glitch activity parameter and explore two alternatives: relaxing the homoscedasticity hypothesis in the linear fit, and using the bootstrap technique. We find a larger uncertainty in the activity than that obtained by ordinary linear regression, especially for those objects in which the activity can be significantly affected by a single glitch. We discuss how this affects the theoretical upper bound on the moment of inertia associated with the region of a neutron star containing the superfluid reservoir of angular momentum released in a stationary sequence of glitches. We find that this upper bound is less tight if one considers the uncertainty on the activity estimated with the bootstrap method, and that it allows for models in which the superfluid reservoir is entirely in the crust.
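A case-resampling bootstrap of the activity (the slope of cumulative glitch size versus time) can be sketched as follows. The glitch record is invented for illustration, with one large glitch dominating the history, and the paper's actual data sets are not reproduced:

```python
import random
import statistics

random.seed(0)

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))

# Hypothetical glitch record: epochs (yr) and cumulative glitch size
# (arbitrary units), dominated by a single large glitch near t = 5.
t = [0.0, 1.2, 2.5, 4.1, 5.0, 6.8, 8.3]
s = [0.0, 0.1, 0.3, 0.4, 2.9, 3.0, 3.2]

activity = ols_slope(t, s)

# Case-resampling bootstrap: resample (t, s) pairs with replacement and
# refit; the spread of the refitted slopes estimates the uncertainty on
# the activity without assuming homoscedastic residuals.
pairs = list(zip(t, s))
slopes = []
for _ in range(2000):
    sample = [random.choice(pairs) for _ in pairs]
    xs, ys = zip(*sample)
    if len(set(xs)) > 1:          # guard against degenerate resamples
        slopes.append(ols_slope(xs, ys))

print(f"activity = {activity:.3f}, bootstrap sd = {statistics.stdev(slopes):.3f}")
```

Resamples that omit or duplicate the single large glitch change the slope noticeably, which is exactly how the bootstrap exposes the extra uncertainty the abstract describes.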


1998 ◽  
Vol 217 (1) ◽  
Author(s):  
Hans Schneeberger

Summary With Efron’s law-school example, the bootstrap method is compared with an alternative method called doubling. It is shown that the mean deviation of the estimator is always smaller for the doubling method.


1992 ◽  
Vol 82 (1) ◽  
pp. 104-119
Author(s):  
Michéle Lamarre ◽  
Brent Townshend ◽  
Haresh C. Shah

Abstract This paper describes a methodology for assessing the uncertainty in seismic hazard estimates at particular sites. A variant of the bootstrap statistical method is used to combine the uncertainty due to earthquake catalog incompleteness, earthquake magnitude, and the recurrence and attenuation models used. The uncertainty measure is provided in the form of a confidence interval. Comparisons of this method, applied to various sites in California, with previous studies confirm the validity of the method.
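A percentile bootstrap interval is one common way to turn bootstrap replicates into the kind of confidence interval the paper describes; a minimal sketch, with hypothetical inter-event times standing in for a real catalog:

```python
import random
import statistics

random.seed(1)

def percentile_ci(data, stat, reps=5000, alpha=0.10):
    """Percentile bootstrap confidence interval for stat(data):
    sort the resampled statistics and read off the alpha/2 and
    1 - alpha/2 quantiles."""
    boot = sorted(stat(random.choices(data, k=len(data)))
                  for _ in range(reps))
    lo = boot[int(alpha / 2 * reps)]
    hi = boot[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical inter-event times (yr) between damaging earthquakes at a site
waits = [12, 31, 7, 45, 22, 18, 60, 9, 27, 35]
lo, hi = percentile_ci(waits, statistics.mean)
print(f"90% CI for mean recurrence: [{lo:.1f}, {hi:.1f}] yr")
```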


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has long been studied by researchers in numerous fields. However, the value of the cluster number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
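The second (Lloyd) phase can be sketched in a few lines. The covering-algorithm initialization is specific to the paper and is not reproduced here, so the initial centers below simply stand in for its output:

```python
import math

def lloyd(points, centers, iters=50):
    """Lloyd iteration of K-means: assign each point to its nearest
    center, move each center to its cluster mean, repeat until stable."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda j: math.dist(p, centers[j]))
            clusters[j].append(p)
        new = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else ctr
               for pts, ctr in zip(clusters, centers)]
        if new == centers:      # assignments stable: converged
            break
        centers = new
    return centers

# Two well-separated 2-D groups; the initial centers stand in for the
# k and starting points the covering-algorithm phase would supply.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(lloyd(points, [(0.5, 0.5), (9.0, 9.0)]))
```

With these inputs the centers converge to the two group means after a single reassignment, which is the behavior the second phase relies on once the first phase has fixed k and the starting centers.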


2008 ◽  
Vol 33 (3) ◽  
pp. 257-278 ◽  
Author(s):  
Yuming Liu ◽  
E. Matthew Schulz ◽  
Lei Yu

A Markov chain Monte Carlo (MCMC) method and a bootstrap method were compared in the estimation of standard errors of item response theory (IRT) true score equating. Three test form relationships were examined: parallel, tau-equivalent, and congeneric. Data were simulated based on Reading Comprehension and Vocabulary tests of the Iowa Tests of Basic Skills®. For parallel and congeneric test forms within valid IRT true score ranges, the pattern and magnitude of standard errors of IRT true score equating estimated by the MCMC method were very close to those estimated by the bootstrap method. For tau-equivalent test forms, the pattern of standard errors estimated by the two methods was also similar. Bias and mean square errors of equating produced by the MCMC method were smaller than those produced by the bootstrap method; however, standard errors were larger. In educational testing, the MCMC method may be used as an additional or alternative procedure to the bootstrap method when evaluating the precision of equating results.
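The bootstrap side of the comparison, estimating a standard error as the standard deviation of a statistic over resamples, can be sketched as follows. The data are hypothetical equated-score differences; the full IRT calibration that the study bootstraps is not reproduced:

```python
import random
import statistics

random.seed(7)

def bootstrap_se(data, stat, reps=2000):
    """Bootstrap standard error: the standard deviation of the
    statistic recomputed on resamples drawn with replacement."""
    return statistics.stdev(stat(random.choices(data, k=len(data)))
                            for _ in range(reps))

# Hypothetical equated true-score differences for a sample of examinees
diffs = [0.8, 1.1, 0.5, 1.4, 0.9, 1.2, 0.7, 1.0, 1.3, 0.6]
print(round(bootstrap_se(diffs, statistics.mean), 3))
```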

