Big-But-Biased Data Analytics for Air Quality

Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1551
Author(s):  
Laura Borrajo ◽  
Ricardo Cao

Air pollution is one of the big concerns for smart cities. The problem of applying big data analytics to sampling bias in the context of urban air quality is studied in this paper. A nonparametric estimator that incorporates kernel density estimation is used. When ignoring the biasing weight function, a small-sized simple random sample of the real population is assumed to be additionally observed. The general parameter considered is the mean of a transformation of the random variable of interest. A new bootstrap algorithm is used to approximate the mean squared error of the new estimator. Its minimization leads to an automatic bandwidth selector. The method is applied to a real data set concerning the levels of different pollutants in the urban air of the city of A Coruña (Galicia, NW Spain). Estimations for the mean and the cumulative distribution function of the level of ozone and nitrogen dioxide when the temperature is greater than or equal to 30 °C based on 15 years of biased data are obtained.
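The idea of combining a large biased sample with a small simple random sample can be sketched as follows. This is only an illustration under stated assumptions, not the authors' estimator: the density ratio weights, the self-normalised form, the rule-of-thumb bandwidth (used here in place of the bootstrap selector the abstract describes) and all function names are hypothetical, and the data are simulated rather than taken from the A Coruña data set.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function x -> Gaussian kernel density estimate built from `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def rule_of_thumb_bandwidth(sample):
    # Normal-reference rule; an assumption standing in for a data-driven selector.
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def bias_corrected_mean(biased, srs, transform=lambda x: x):
    """Estimate E[transform(X)] by reweighting the biased sample.

    Each biased observation y gets weight f_hat(y) / g_hat(y), where f_hat is
    a KDE from the small simple random sample and g_hat a KDE from the big
    biased sample. The weights are normalised to sum to one.
    """
    f_hat = gaussian_kde(srs, rule_of_thumb_bandwidth(srs))
    g_hat = gaussian_kde(biased, rule_of_thumb_bandwidth(biased))
    weights = [f_hat(y) / g_hat(y) for y in biased]
    total = sum(weights)
    return sum(w * transform(y) for w, y in zip(weights, biased)) / total


# Simulated example: the true population is N(0, 1); a biasing weight
# proportional to exp(x) turns the observed distribution into N(1, 1).
rng = random.Random(0)
srs = [rng.gauss(0.0, 1.0) for _ in range(100)]       # small unbiased sample
biased = [rng.gauss(1.0, 1.0) for _ in range(1000)]   # large biased sample

naive = sum(biased) / len(biased)
corrected = bias_corrected_mean(biased, srs)
print(f"naive mean: {naive:.3f}, bias-corrected mean: {corrected:.3f}")
```

Passing a different `transform`, for example an indicator such as `lambda x: 1.0 if x <= t else 0.0` for a chosen threshold `t`, would give a corresponding estimate of the cumulative distribution function at `t`.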

Author(s):  
Parisa Torkaman

The generalized inverted exponential distribution is introduced as a lifetime model with good statistical properties. In this paper, the estimation of the probability density function and the cumulative distribution function of this distribution is considered using five different estimation methods: uniformly minimum variance unbiased (UMVU), maximum likelihood (ML), least squares (LS), weighted least squares (WLS) and percentile (PC) estimators. The performance of these estimation procedures is compared through numerical simulations based on the mean squared error (MSE). The simulation studies show that the UMVU estimator performs better than the others and that, when the sample size is large enough, the ML and UMVU estimators are almost equivalent and more efficient than the LS, WLS and PC estimators. Finally, the results for a real data set are analyzed.
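For orientation, the two functions being estimated can be written out in code. The sketch below assumes the commonly used parametrisation of the generalized inverted exponential distribution with shape `alpha` and scale `lam`, F(x) = 1 - (1 - exp(-lam/x))^alpha for x > 0; the abstract itself does not state the parametrisation, and the function names are hypothetical.

```python
import math


def gie_cdf(x, alpha, lam):
    """CDF under the assumed parametrisation: 1 - (1 - exp(-lam/x))**alpha."""
    return 1.0 - (1.0 - math.exp(-lam / x)) ** alpha


def gie_pdf(x, alpha, lam):
    """PDF obtained by differentiating the CDF above with respect to x."""
    e = math.exp(-lam / x)
    return alpha * lam / x ** 2 * e * (1.0 - e) ** (alpha - 1.0)


def gie_quantile(u, alpha, lam):
    """Inverse of the CDF, usable for inverse-transform sampling."""
    return -lam / math.log(1.0 - (1.0 - u) ** (1.0 / alpha))


alpha, lam = 2.0, 1.5            # illustrative parameter values
median = gie_quantile(0.5, alpha, lam)

# Numerical check that the PDF is the derivative of the CDF at the median.
step = 1e-6
slope = (gie_cdf(median + step, alpha, lam) - gie_cdf(median - step, alpha, lam)) / (2 * step)
print(median, slope, gie_pdf(median, alpha, lam))
```

Substituting parameter estimates for `alpha` and `lam` in `gie_pdf` and `gie_cdf` gives plug-in estimates of the two functions, which is how the ML versions are usually formed.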


2018 ◽  
Vol 52 (1) ◽  
pp. 43-59
Author(s):  
AMULYA KUMAR MAHTO ◽  
YOGESH MANI TRIPATH ◽  
SANKU DEY

The Burr type X distribution is one of the members of the Burr family, which was originally derived by Burr (1942) and can be used quite effectively in modelling strength data and also general lifetime data. In this article, we consider efficient estimation of the probability density function (PDF) and cumulative distribution function (CDF) of the Burr X distribution. Eight different estimation methods, namely maximum likelihood estimation, uniformly minimum variance unbiased estimation, least squares estimation, weighted least squares estimation, percentile estimation, maximum product estimation, Cramér-von Mises estimation and Anderson-Darling estimation, are considered. Analytic expressions for bias and mean squared error are derived. Monte Carlo simulations are performed to compare the performances of the proposed methods of estimation for both small and large samples. Finally, a real data set has been analyzed for illustrative purposes.


In this paper, we define a new two-parameter distribution, the new Lindley half Cauchy (NLHC) distribution, using the Lindley-G family of distributions, which accommodates increasing, decreasing and a variety of monotone failure rates. The statistical properties of the proposed distribution, such as the probability density function, cumulative distribution function, quantiles, and measures of skewness and kurtosis, are presented. We briefly describe three well-known estimation methods, namely the maximum likelihood (MLE), least-squares (LSE) and Cramér-von Mises (CVM) methods. All computations are performed in R. Using the maximum likelihood method, we construct asymptotic confidence intervals for the model parameters. We verify empirically the potential of the new distribution in modeling a real data set.


2016 ◽  
Vol 144 (4) ◽  
pp. 1649-1668 ◽  
Author(s):  
Daniel Hodyss ◽  
Elizabeth Satterfield ◽  
Justin McLay ◽  
Thomas M. Hamill ◽  
Michael Scheuerer

Ensemble postprocessing is frequently applied to correct biases and deficiencies in the spread of ensemble forecasts. Methods involving weighted, regression-corrected forecasts address the typical biases and underdispersion of ensembles through a regression correction of ensemble members followed by the generation of a probability density function (PDF) from the weighted sum of kernels fit around each corrected member. The weighting step accounts for the situation where the ensemble is constructed from different model forecasts or generated in some way that creates ensemble members that do not represent equally likely states. In the present work, it is shown that an overweighting of climatology in weighted, regression-corrected forecasts can occur when one first performs a regression-based correction before weighting each member. This overweighting of climatology results in an increase in the mean-squared error of the mean of the predicted PDF. The overweighting of climatology is illustrated in a simulation study and a real-data study, where the reference is generated through a direct application of Bayes’s rule. The real-data example is a comparison of a particular method referred to as Bayesian model averaging (BMA) and a direct application of Bayes’s rule for ocean wave heights using U.S. Navy and National Weather Service global deterministic forecasts. This direct application of Bayes’s rule is shown to not overweight climatology and may be a low-cost replacement for the generally more expensive weighted, regression-correction methods.
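The weighted, regression-corrected construction described above can be illustrated with a small sketch. Everything numeric here is hypothetical (member forecasts, weights, regression coefficients, kernel width), the Gaussian kernel is an assumption, and the code shows only the generic "correct each member, then mix weighted kernels" step, not the specific BMA fitting procedure or the Bayes's rule reference used in the paper.

```python
import math


def corrected_members(forecasts, intercept, slope):
    """Apply a linear regression correction to every ensemble member."""
    return [intercept + slope * f for f in forecasts]


def mixture_pdf(x, centers, weights, sigma):
    """Predictive PDF: weighted sum of Gaussian kernels around each corrected member."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(w * norm * math.exp(-0.5 * ((x - c) / sigma) ** 2)
               for c, w in zip(centers, weights))


def mixture_mean(centers, weights):
    """Mean of the predictive PDF (the kernels are symmetric about their centers)."""
    return sum(w * c for c, w in zip(centers, weights))


forecasts = [2.0, 2.5, 3.5]      # hypothetical wave-height forecasts (m)
weights = [0.5, 0.3, 0.2]        # hypothetical member weights, summing to one
centers = corrected_members(forecasts, intercept=0.4, slope=0.8)

predictive_mean = mixture_mean(centers, weights)

# Riemann-sum check that the mixture integrates to approximately one.
grid_step = 0.01
area = sum(mixture_pdf(i * grid_step, centers, weights, sigma=0.3) * grid_step
           for i in range(601))
print(predictive_mean, area)
```

With a slope below one, every corrected member is pulled toward a common value before the weights are applied, which gives a feel for how the ordering of correction and weighting can matter.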


2015 ◽  
Vol 2015 ◽  
pp. 1-5 ◽  
Author(s):  
Mursala Khan ◽  
Rajesh Singh

A chain ratio-type estimator is proposed for the estimation of the finite population mean under a systematic sampling scheme using two auxiliary variables. The mean square error of the proposed estimator is derived up to the first order of approximation and is compared with those of other relevant existing estimators. To illustrate the performance of the different estimators in comparison with the usual simple estimator, we have taken a real data set from the survey sampling literature.


Author(s):  
Asifa Mubeen ◽  
Nasir Jamal ◽  
Muhammad Hanif ◽  
Usman Shahzad

The main objective of the present study was to develop a new ridge regression estimator and to fit the ridge regression model to peanut production data from Pakistan. The data cover peanut production and its growth rate in Pakistan. The properties of the proposed estimator are discussed, and its mean square error is compared with that of some existing ridge regression estimators. The real data set of peanut production is used to assess the performance of the proposed and existing estimators. Numerical results on the real data set show that the proposed ridge regression estimator provides the best results compared with the reviewed ones.
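As background, the standard ridge regression estimator that such proposals build on is (X'X + kI)^(-1) X'y for a ridge parameter k >= 0. The sketch below implements only this textbook form; the abstract does not give the formula of the newly proposed estimator, so none is assumed here. The small data set is made up for illustration and is not the peanut production data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(X, y, k):
    """Textbook ridge estimator: solve (X'X + k I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up, noiseless data generated from y = 2*x1 + 3*x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [2.0 * x1 + 3.0 * x2 for x1, x2 in X]

ols = ridge(X, y, k=0.0)         # k = 0 reduces to ordinary least squares
shrunk = ridge(X, y, k=5.0)      # k > 0 shrinks the coefficient vector
print(ols, shrunk)
```

Comparing estimators by mean square error, as the study does, would then amount to repeating such fits over simulated or resampled data for each candidate choice of k.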


Author(s):  
Hani M. Samawi ◽  
Eman M. Tawalbeh

The performance of a regression estimator based on the double ranked set sample (DRSS) scheme, introduced by Al-Saleh and Al-Kadiri (2000), is investigated when the mean of the auxiliary variable X is unknown. Our primary analysis and simulation indicate that using the DRSS regression estimator for estimating the population mean substantially increases relative efficiency compared to using the regression estimator based on simple random sampling (SRS) or the ranked set sampling (RSS) regression estimator (Yu and Lam, 1997). Moreover, the regression estimator using DRSS is also more efficient than the naïve estimators of the population mean using SRS, RSS (when the correlation coefficient is at least 0.4) and DRSS for a high correlation coefficient (at least 0.91). The theory is illustrated using a real data set of trees.


2011 ◽  
Vol 83 (2) ◽  
pp. 357-373 ◽  
Author(s):  
Gauss M Cordeiro ◽  
Alexandre B Simas ◽  
Borko D Stošic

The beta Weibull distribution was first introduced by Famoye et al. (2005) and studied by these authors and Lee et al. (2007). However, they do not give explicit expressions for the moments. In this article, we derive explicit closed form expressions for the moments of this distribution, which generalize results available in the literature for some sub-models. We also obtain expansions for the cumulative distribution function and Rényi entropy. Further, we discuss maximum likelihood estimation and provide formulae for the elements of the expected information matrix. We also demonstrate the usefulness of this distribution on a real data set.


Author(s):  
Alisha Banga ◽  
Ravinder Ahuja ◽  
Subhash Chander Sharma

Background and Objective: With the increase in populations in urban areas, there is also an increase in pollution. Air pollution is one of the challenging environmental issues in smart cities. Real-time monitoring of air quality can help the administration to take appropriate decisions on time. Development in Internet of Things based sensors has changed the way air quality is monitored. Methods: In this paper, we have applied two-stage regression. In the first stage, ten regression algorithms (Decision Tree, Random Forest, Elastic Net, AdaBoost, Extra Tree, Linear Regression, Lasso, XGBoost, Light GBM, and Multi-Layer Perceptron) are applied, and in the second stage the best four algorithms are picked and a stacking ensemble algorithm is applied using Python to predict PM2.5 pollutant levels in air. Data sets from five Chinese cities (Beijing, Chengdu, Guangzhou, Shanghai, and Shenyang) are taken into consideration and compared based on MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and R2. Results and Conclusion: We observed that, out of the ten regression algorithms applied, the extra tree algorithm gives the highest performance on all five datasets, and stacking further improves the performance. Feature importance for Shenyang and Beijing is computed using three regression algorithms, and we found that the four most important features are humidity, wind speed, wind direction, and dew point.
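The three comparison metrics named above have standard definitions, sketched here in plain Python. The observed and predicted values are made-up numbers for illustration, not results from the study, and the function names are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """RMSE: square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """R2: one minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [10.0, 20.0, 30.0, 40.0]   # hypothetical observed PM2.5 values
y_pred = [12.0, 18.0, 33.0, 39.0]   # hypothetical model predictions

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_square_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mae, rmse, r2)
```

Lower MAE and RMSE and a higher R2 indicate a better fit, which is the basis on which the first-stage models would be ranked before stacking.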


Author(s):  
Amal Hassan ◽  
Salwa Assar ◽  
Kareem Ali

This paper proposes a new general class of continuous lifetime distributions, which is complementary to the Poisson-Lindley family proposed by Asgharzadeh et al. [3]. The new class is derived by compounding the maximum of a random number of independent and identically distributed continuous random variables with the Poisson-Lindley distribution. Several properties of the proposed class are discussed, including formal proofs of the probability density, cumulative distribution, reliability and hazard rate functions. The unknown parameters are estimated by the maximum likelihood method, and the elements of the Fisher information matrix are determined. Some sub-models of this class are investigated and studied in some detail. Finally, a real data set is analyzed to illustrate the performance of the new distributions.

