Statistical Approach on Grading the Student Achievement via Normal Mixture Modeling

2012 ◽  
Author(s):  
Zairul Nor Deana Md. Desa ◽  
Ismail Mohamad ◽  
Zarina Mohd. Khalid ◽  
Hanafiah Md. Zin

The purpose of this study is to compare results obtained from three methods of assigning letter grades to students' achievement. The conventional and most popular method of assigning grades is the Straight Scale method (SS). Statistical approaches using the Standard Deviation (GC) and conditional Bayesian methods are considered for assigning the grades. In the conditional Bayesian model, we assume the data follow a Normal Mixture distribution in which the grades are distinctly separated by the parameters: the means and proportions of the Normal Mixture distribution.
The problem lies in estimating the posterior density of the parameters, which is analytically intractable. A solution to this problem is to use the Markov Chain Monte Carlo approach, namely the Gibbs sampler algorithm. The Straight Scale, Standard Deviation and conditional Bayesian methods are applied to the examination raw scores of two sets of students. The performances of these methods are measured using the Neutral Class Loss, Lenient Class Loss and Coefficient of Determination. The results showed that the conditional Bayesian method outperformed the conventional methods of assigning grades. Key words: Grading methods, educational measurement, Straight Scale, Standard Deviation method, Normal Mixture, Markov Chain Monte Carlo, Gibbs sampling
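As a rough illustration of the approach described above, the sketch below runs a Gibbs sampler on a two-component normal mixture of synthetic "raw scores". The priors, the fixed component variance, and all numerical values are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "raw scores": two latent grade groups with different means.
scores = np.concatenate([rng.normal(55, 5, 80), rng.normal(80, 5, 40)])
n = len(scores)

# Gibbs sampler for a 2-component normal mixture with known variance.
# Assumed priors: component means ~ N(70, 20^2), proportion ~ Beta(1, 1).
sigma, tau, mu0 = 5.0, 20.0, 70.0
mu = np.array([50.0, 90.0])   # initial component means
pi = 0.5                      # initial proportion of component 0
draws = []
for it in range(2000):
    # 1. Sample component labels given the current parameters.
    logp0 = np.log(pi) - 0.5 * ((scores - mu[0]) / sigma) ** 2
    logp1 = np.log(1 - pi) - 0.5 * ((scores - mu[1]) / sigma) ** 2
    p0 = 1.0 / (1.0 + np.exp(logp1 - logp0))   # P(label = 0 | data, params)
    z = (rng.random(n) >= p0).astype(int)
    # 2. Sample each mean from its conjugate normal full conditional.
    for k in (0, 1):
        xk = scores[z == k]
        prec = len(xk) / sigma**2 + 1 / tau**2
        mean = (xk.sum() / sigma**2 + mu0 / tau**2) / prec
        mu[k] = rng.normal(mean, np.sqrt(1 / prec))
    # 3. Sample the mixing proportion from its Beta full conditional.
    pi = rng.beta(1 + np.sum(z == 0), 1 + np.sum(z == 1))
    if it >= 500:  # discard burn-in
        draws.append((mu[0], mu[1], pi))

post = np.mean(draws, axis=0)  # posterior means of (mu_0, mu_1, pi)
```

The posterior component memberships from step 1 are what would separate the students into grade classes.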

Geophysics ◽  
2019 ◽  
Vol 84 (6) ◽  
pp. R1003-R1020 ◽  
Author(s):  
Georgia K. Stuart ◽  
Susan E. Minkoff ◽  
Felipe Pereira

Bayesian methods for full-waveform inversion allow quantification of uncertainty in the solution, including determination of interval estimates and posterior distributions of the model unknowns. Markov chain Monte Carlo (MCMC) methods produce posterior distributions subject to fewer assumptions, such as normality, than deterministic Bayesian methods. However, MCMC is a computationally expensive process that requires repeated solution of the wave equation for different velocity samples, and ultimately a large proportion of these samples (often 40%–90%) is rejected. We have evaluated a two-stage MCMC algorithm that uses a coarse-grid filter to quickly reject unacceptable velocity proposals, thereby reducing the computational expense of solving the velocity inversion problem and quantifying uncertainty. Our filter stage uses operator upscaling, which provides near-perfect speedup in parallel with essentially no communication between processes and produces data that are highly correlated with those obtained from the full fine-grid solution. Four numerical experiments demonstrate the efficiency and accuracy of the method. The two-stage MCMC algorithm produces the same results (i.e., posterior distributions and uncertainty information, such as medians and highest posterior density intervals) as the Metropolis-Hastings MCMC. Thus, no information needed for uncertainty quantification is compromised when replacing the one-stage MCMC with the more computationally efficient two-stage MCMC. In four representative experiments, the two-stage method reduces the time spent on rejected models by one-third to one-half, which is important because most of the models tried during the course of the MCMC algorithm are rejected. Furthermore, the two-stage MCMC algorithm substantially reduces the overall time per trial, by as much as 40%, while increasing the acceptance rate from 9% to 90%.
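The two-stage idea can be sketched in miniature. Here the "fine" and "coarse" log-posteriors are toy one-dimensional functions standing in for the fine-grid and upscaled wave solvers; the corrected second-stage acceptance ratio follows the standard two-stage Metropolis-Hastings construction. This is a schematic, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# "fine" is the expensive log-posterior; "coarse" is a cheap surrogate
# (here just a biased version of fine, standing in for an upscaled solver).
def fine(v):    # expensive model: log N(v; 2, 0.5^2), up to a constant
    return -0.5 * ((v - 2.0) / 0.5) ** 2

def coarse(v):  # cheap filter: deliberately biased approximation
    return -0.5 * ((v - 2.1) / 0.6) ** 2

v, chain, fine_calls = 0.0, [], 0
for _ in range(5000):
    prop = v + rng.normal(0, 0.5)
    # Stage 1: screen the proposal with the coarse surrogate only.
    if np.log(rng.random()) >= coarse(prop) - coarse(v):
        chain.append(v)          # rejected cheaply: no fine solve needed
        continue
    # Stage 2: promote to the fine model with the corrected ratio,
    # which preserves the fine posterior as the stationary distribution.
    fine_calls += 1
    log_alpha = (fine(prop) - fine(v)) - (coarse(prop) - coarse(v))
    if np.log(rng.random()) < log_alpha:
        v = prop
    chain.append(v)

posterior_mean = np.mean(chain[1000:])
```

Because stage 1 filters most bad proposals, `fine_calls` stays well below the chain length, which is exactly where the computational savings come from.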


2011 ◽  
Vol 11 (7) ◽  
pp. 20051-20105 ◽  
Author(s):  
D. G. Partridge ◽  
J. A. Vrugt ◽  
P. Tunved ◽  
A. M. L. Ekman ◽  
H. Struthers ◽  
...  

Abstract. This paper presents a novel approach to investigate cloud-aerosol interactions by coupling a Markov Chain Monte Carlo (MCMC) algorithm to a pseudo-adiabatic cloud parcel model. Despite the number of numerical cloud-aerosol sensitivity studies previously conducted, few have used statistical analysis tools to investigate the sensitivity of a cloud model to input aerosol physicochemical parameters. Using synthetic data as observed values of the cloud droplet number concentration (CDNC) distribution, this inverse modelling framework is shown to successfully converge to the correct calibration parameters. The employed analysis method provides a new, integrative framework to evaluate the sensitivity of the derived CDNC distribution to the input parameters describing the lognormal properties of the accumulation mode and the particle chemistry. To a large extent, results from prior studies are confirmed, but the present study also provides some additional insightful findings. There is a clear transition from very clean marine Arctic conditions, where the aerosol parameters representing the mean radius and geometric standard deviation of the accumulation mode are found to be most important for determining the CDNC distribution, to very polluted continental environments (aerosol concentration in the accumulation mode >1000 cm−3), where particle chemistry is more important than both number concentration and size of the accumulation mode. The competition and compensation between the cloud model input parameters illustrate that if the soluble mass fraction is reduced, the number of particles, the geometric standard deviation, and the mean radius of the accumulation mode must all increase in order to achieve the same CDNC distribution. For more polluted aerosol conditions, with a reduction in soluble mass fraction the parameter correlation becomes weaker and more non-linear over the range of possible solutions (indicative of the sensitivity).
This indicates that for the cloud parcel model used herein, the relative importance of the soluble mass fraction appears to decrease if the number or geometric standard deviation of the accumulation mode is increased. This study demonstrates that inverse modelling provides a flexible, transparent and integrative method for efficiently exploring cloud-aerosol interactions with respect to parameter sensitivity and correlation.
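The compensation effect described above can be mimicked with a toy inverse-modelling run: a stand-in "cloud model" whose output depends only on the product of two parameters, so the MCMC posterior exhibits the negative parameter correlation discussed. All values and the model itself are illustrative assumptions; this is not the parcel model used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy forward model: output depends only on a*b, so the two parameters
# can compensate for one another (raise one, lower the other).
def model(a, b):
    return a * b  # stand-in for the parcel model's CDNC output

obs = model(2.0, 3.0)   # synthetic observation (noise-free target)
sigma = 0.2             # assumed observational error

a, b = 1.0, 1.0
samples = []
for _ in range(20000):
    an, bn = a + rng.normal(0, 0.1), b + rng.normal(0, 0.1)
    if an > 0 and bn > 0:   # reject unphysical proposals outright
        logr = (-0.5 * ((model(an, bn) - obs) / sigma) ** 2
                + 0.5 * ((model(a, b) - obs) / sigma) ** 2)
        if np.log(rng.random()) < logr:
            a, b = an, bn
    samples.append((a, b))

arr = np.array(samples[5000:])          # discard burn-in
corr = np.corrcoef(arr[:, 0], arr[:, 1])[0, 1]  # expected: negative
```

The posterior samples lie along the curve a*b ≈ 6, so the sampled `a` and `b` are strongly anticorrelated, mirroring the parameter compensation reported in the abstract.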


2019 ◽  
Vol 1 (1) ◽  
pp. 34
Author(s):  
Ulfa Destiarina ◽  
Mustika Hadijati ◽  
Desy Komalasari ◽  
Nurul Fitriyani

In parameter estimation, some problems require fitting a mixture distribution. This study aimed to apply parameter estimation for an exponential and Weibull mixture distribution to simulated data using the Bayesian Markov Chain Monte Carlo (MCMC) estimation method. The results indicate that the analytic calculations of the parameter estimates were more accurate than the calculations performed with the help of software, in terms of consistency with the theory and the integration process involved.
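For intuition, an exponential-Weibull mixture can be simulated and checked against its closed-form mean; the parameterization and all numerical values here are assumptions for illustration, not taken from the study.

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(3)

# Exponential-Weibull mixture (assumed parameterization): with probability
# p draw Exp(rate=lam), otherwise Weibull(shape=k, scale=c).
p, lam, k, c = 0.4, 1.0, 2.0, 3.0
n = 50000
comp = rng.random(n) < p
x = np.where(comp, rng.exponential(1 / lam, n), c * rng.weibull(k, n))

# Moment check against the closed-form mixture mean:
# E[X] = p/lam + (1-p) * c * Gamma(1 + 1/k)
true_mean = p / lam + (1 - p) * c * gamma(1 + 1 / k)
```

The mixture mean is just the proportion-weighted combination of the component means, which gives a quick sanity check on any fitted parameters.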


2020 ◽  
Vol 222 (1) ◽  
pp. 388-405
Author(s):  
F J Tilmann ◽  
H Sadeghisorkhani ◽  
A Mauerberger

SUMMARY In probabilistic Bayesian inversions, data uncertainty is a crucial parameter for quantifying the uncertainties and correlations of the resulting model parameters or, in transdimensional approaches, even the complexity of the model. However, in many geophysical inference problems it is poorly known. Therefore, it is common practice to allow the data uncertainty itself to be a parameter to be determined. Although in principle any arbitrary uncertainty distribution can be assumed, Gaussian distributions whose standard deviation is then the unknown parameter to be estimated are the usual choice. In this special case, we demonstrate that a simple analytical integration is sufficient to marginalise out this uncertainty parameter, reducing the complexity of the model space without compromising the accuracy of the posterior model probability distribution. However, it is well known that the distribution of geophysical measurement errors, although superficially similar to a Gaussian distribution, typically contains more frequent samples along the tail of the distribution, so-called outliers. In linearized inversions these are often removed in subsequent iterations based on some threshold criterion, but in Markov chain Monte Carlo (McMC) inversions this approach is not possible as they rely on the likelihood ratios, which cannot be formed if the number of data points varies between the steps of the Markov chain. The flexibility to define the data error probability distribution in McMC can be exploited in order to account for this pattern of uncertainties in a natural way, without having to make arbitrary choices regarding residual thresholds. In particular, we can regard the data uncertainty distribution as a mixture between a Gaussian distribution, which represents valid measurements with some measurement error, and a uniform distribution, which represents invalid measurements.
The relative balance between them is an unknown parameter to be estimated alongside the standard deviation of the Gaussian distribution. For each data point, the algorithm can then assign a probability of being an outlier, and the influence of each data point is effectively downweighted according to that probability. Furthermore, this assignment can change as the McMC search explores different parts of the model space. The approach is demonstrated with both synthetic and real tomography examples. In a synthetic test, the proposed mixed measurement error distribution allows recovery of the underlying model even in the presence of 6 per cent outliers, which completely destroy the ability of a regular McMC or linear search to provide a meaningful image. Applied to an actual ambient noise tomography study based on automatically picked dispersion curves, the resulting model is shown to be much more consistent across data sets that differ in the applied quality criteria, while retaining the ability to recover strong anomalies in selected parts of the model.
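The per-point outlier assignment can be sketched directly. Assuming a known noise level, outlier fraction, and uniform support (in the paper these are estimated within the McMC rather than fixed), the posterior probability that each residual is an outlier follows from Bayes' rule applied to the two mixture components.

```python
import numpy as np

rng = np.random.default_rng(4)

# Mixed error model: each residual comes from a Gaussian (valid
# measurement) or a uniform over the data range (invalid/outlier).
residuals = rng.normal(0, 0.1, 100)
residuals[:6] = np.array([1.8, -2.2, 2.5, -1.5, 2.0, -2.7])  # planted outliers

sigma, w = 0.1, 0.06   # Gaussian sd and outlier fraction (assumed known here)
lo, hi = -3.0, 3.0     # support of the uniform outlier component

# Component densities evaluated at each residual.
g = np.exp(-0.5 * (residuals / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
u = 1.0 / (hi - lo)

# Posterior probability that each data point is an outlier (Bayes' rule).
p_out = w * u / (w * u + (1 - w) * g)
```

Points with `p_out` near 1 are effectively removed from the likelihood without any hard residual threshold, which is the "natural downweighting" the abstract describes.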


2021 ◽  
Vol 11 (16) ◽  
pp. 7343
Author(s):  
Dwi Rantini ◽  
Nur Iriawan ◽  
Irhamah Irhamah

Data with a multimodal pattern can be analyzed using a mixture model. In a mixture model, the most important step is the determination of the number of mixture components, because finding the correct number of mixture components will reduce the error of the resulting model. In a Bayesian analysis, one method that can be used to determine the number of mixture components is the reversible jump Markov chain Monte Carlo (RJMCMC). The RJMCMC is used for distributions that have location and scale parameters, i.e., location-scale distributions, such as the Gaussian distribution family. In this research, we added an important step before beginning to use the RJMCMC method, namely the modification of the analyzed distribution into a location-scale distribution. We call this the non-Gaussian RJMCMC (NG-RJMCMC) algorithm. The remaining steps are the same as for the RJMCMC. In this study, we applied it to the Weibull distribution. This will help many researchers in the field of survival analysis, since survival times typically follow a Weibull distribution. We transformed the Weibull distribution into a location-scale distribution, namely the extreme value (EV) type 1 (Gumbel-type for minima) distribution. Thus, for the mixture analysis, we call this the EV-I mixture distribution. Based on the simulation results, we can conclude that the accuracy level is at least 95%. We also applied the EV-I mixture distribution and compared it with the Gaussian mixture distribution for the enzyme, acidity, and galaxy datasets. Based on the Kullback–Leibler divergence (KLD) and visual observation, the EV-I mixture distribution has higher coverage than the Gaussian mixture distribution. We also applied it to our dengue hemorrhagic fever (DHF) data from eastern Surabaya, East Java, Indonesia. The estimation results show that the number of mixture components in the data is four; we also obtained the estimation results of the other parameters and labels for each observation.
Based on the Kullback–Leibler divergence (KLD) and visual observation, for our data, the EV-I mixture distribution offers better coverage than the Gaussian mixture distribution.
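The transformation underlying the method is that the logarithm of a Weibull variable follows an extreme-value type 1 (Gumbel-for-minima) distribution, which is a location-scale family: Y = log X has location log(c) and scale 1/k. A quick numerical check of this fact (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# If X ~ Weibull(shape=k, scale=c), then Y = log(X) is EV type 1 (minima)
# with location log(c) and scale 1/k.
k, c = 2.0, 3.0
x = c * rng.weibull(k, 200000)
y = np.log(x)

euler_gamma = 0.5772156649
loc, scale = np.log(c), 1.0 / k
# EV-I (minima) moments: E[Y] = loc - gamma*scale, Var[Y] = (pi^2/6) * scale^2
mean_theory = loc - euler_gamma * scale
var_theory = np.pi**2 / 6 * scale**2
```

Because the transformed family is location-scale, the standard RJMCMC moves (birth, death, split, merge on location-scale parameters) apply unchanged, which is the point of the NG-RJMCMC preprocessing step.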


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Hassan M. Aljohani ◽  
Nada M. Alfaer

Censoring schemes have received much attention over the past decades. Hybrid censoring schemes, which mix type-I (T-1) and type-II (T-2) censoring, are among the most popular designs in life-testing and reliability experiments. The gamma distribution is widely used and is connected to many other distributions. Both mixture and single gamma distributions are studied here to estimate parameters under type-II hybrid censoring schemes (T-2HCS). We apply algorithms to compute the maximum likelihood (ML) estimators and Bayesian estimates using Markov chain Monte Carlo methods. Bayes estimators and the corresponding highest posterior density confidence intervals are tabulated. A Markov chain Monte Carlo simulation is also implemented to compare the performance of the different methods, and a real dataset is analyzed for illustrative purposes.
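A minimal sketch of the T-2HCS stopping rule, assuming gamma lifetimes: the experiment ends at the later of the r-th ordered failure and the fixed time T, which guarantees at least r observed failures. All parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Type-II hybrid censoring: put n units on test, stop at max(x_(r), T).
n, r, T = 30, 10, 2.0
shape, scale = 2.0, 1.0                  # assumed gamma lifetime parameters
x = np.sort(rng.gamma(shape, scale, n))  # ordered failure times

stop = max(x[r - 1], T)      # later of the r-th failure and the fixed time T
observed = x[x <= stop]      # uncensored lifetimes used in the likelihood
n_censored = n - len(observed)
```

The likelihood for the (mixture or single) gamma model is then built from the `observed` failures plus the censoring contribution of the `n_censored` surviving units at time `stop`.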


2018 ◽  
Vol 19 (4) ◽  
pp. 1129-1136
Author(s):  
Zhenxiang Xing ◽  
Han Zhang ◽  
Yi Ji ◽  
Gong Xinglong ◽  
Qiang Fu ◽  
...  

Abstract The reliability and validity of model predictions play a decisive role in water resource simulation and prediction. Among the many prediction models, the combined model (CM) is widely used because it can combine the prediction results of multiple single models and make full use of the information provided by various methods. The CM is an effective way to improve predictive accuracy, but estimating the weight of each single model is the key to the CM. Previous studies take errors as the objective function when calculating the weights, so the uncertainty of the individual model weights cannot be considered comprehensively. In order to account for this uncertainty and to improve the universal applicability of the CM, the authors use Markov chain Monte Carlo based on the adaptive Metropolis algorithm (AM-MCMC) to solve for the weight of each single model in the CM, obtaining the probability distribution of each weight and the joint probability density of all the weights, from which the optimal weight combination is derived. To test the validity of the established model, the authors applied it to the prediction of monthly groundwater levels. The two single models in the CM are a time series analysis model (TSAM) and a grey model (GM(1,1)). The case study showed that the uncertainty characteristics of the weights in the CM can be obtained by AM-MCMC. According to the results, the CM achieved the lowest average root mean square error (RMSE) of 0.85, a mean absolute percentage error (MAPE) of 8.61, and a coefficient of determination (R2) of 0.97 for the studied forecast period.
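The weight-estimation idea can be sketched with plain Metropolis sampling on a single combination weight for two synthetic "single models" (the paper uses the adaptive AM-MCMC variant and two real hydrological models); all data and numerical values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Combined model: forecast = w*m1 + (1-w)*m2, with w to be estimated.
y = np.sin(np.linspace(0, 6, 60))        # synthetic "groundwater level"
m1 = y + rng.normal(0, 0.10, 60)         # single model 1 (more accurate)
m2 = y + rng.normal(0, 0.30, 60)         # single model 2 (less accurate)

sigma = 0.15  # assumed residual scale

def loglik(w):
    resid = y - (w * m1 + (1 - w) * m2)
    return -0.5 * np.sum((resid / sigma) ** 2)

w, samples = 0.5, []
for _ in range(10000):
    wn = w + rng.normal(0, 0.05)         # random-walk proposal
    # Flat prior on [0, 1]; accept by the Metropolis ratio.
    if 0 <= wn <= 1 and np.log(rng.random()) < loglik(wn) - loglik(w):
        w = wn
    samples.append(w)

w_post = np.mean(samples[2000:])  # posterior mean weight for model 1
```

Unlike a point estimate from error minimization, the retained `samples` give the full posterior distribution of the weight, which is the uncertainty information the abstract emphasizes.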

