Applying Artificial Neural Networks. I. Estimating Nicotine in Tobacco from near Infrared Data

1995 ◽  
Vol 3 (3) ◽  
pp. 133-142 ◽  
Author(s):  
M. Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M. White ◽  
D.R. Bahler

Two artificial neural network models were used to estimate the nicotine in tobacco: (i) a back-propagation network and (ii) a linear network. The back-propagation network consisted of an input layer, an output layer and one hidden layer. The linear network consisted of an input layer and an output layer. Both networks used the generalised delta rule for learning. Performances of both networks were compared to the multiple linear regression method MLR of calibration. The nicotine content in tobacco samples was estimated for two different data sets. Data set A contained 110 near infrared (NIR) spectra each consisting of reflected energy at eight wavelengths. Data set B consisted of 200 NIR spectra with each spectrum having 840 spectral data points. The Fast Fourier transformation was applied to data set B in order to compress each spectrum into 13 Fourier coefficients. For data set A, the linear regression model gave better results followed by the back-propagation network which was followed by the linear network. The true performance of the linear regression model was better than the back-propagation and the linear networks by 14.0% and 18.1%, respectively. For data set B, the back-propagation network gave the best result followed by MLR and the linear network. Both the linear network and MLR models gave almost the same results. The true performance of the back-propagation network model was better than the MLR and linear network by 35.14%.

2020 ◽  
Vol 9 (11) ◽  
pp. 654
Author(s):  
Guanwei Zhao ◽  
Muzhuang Yang

Mapping population distribution at fine resolutions with high accuracy is crucial to urban planning and management. This paper takes Guangzhou city as the study area, illustrates the gridded population distribution map by using machine learning methods based on zoning strategy with multisource geospatial data such as night light remote sensing data, point of interest data, land use data, and so on. The street-level accuracy evaluation results show that the proposed approach achieved good overall accuracy, with determinant coefficient (R2) being 0.713 and root mean square error (RMSE) being 5512.9. Meanwhile, the goodness of fit for single linear regression (LR) model and random forest (RF) regression model are 0.0039 and 0.605, respectively. For dense area, the accuracy of the random forest model is better than the linear regression model, while for sparse area, the accuracy of the linear regression model is better than the random forest model. The results indicated that the proposed method has great potential in fine-scale population mapping. Therefore, it is advised that the zonal modeling strategy should be the primary choice for solving regional differences in the population distribution mapping research.


2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Adewale F. Lukman ◽  
B. M. Golam Kibria ◽  
Kayode Ayinde ◽  
Segun L. Jegede

Motivated by the ridge regression (Hoerl and Kennard, 1970) and Liu (1993) estimators, this paper proposes a modified Liu estimator to solve the multicollinearity problem for the linear regression model. This modification places this estimator in the class of the ridge and Liu estimators with a single biasing parameter. Theoretical comparisons, real-life application, and simulation results show that it consistently dominates the usual Liu estimator. Under some conditions, it performs better than the ridge regression estimators in the smaller MSE sense. Two real-life data are analyzed to illustrate the findings of the paper and the performances of the estimators assessed by MSE and the mean squared prediction error. The application result agrees with the theoretical and simulation results.


1997 ◽  
Vol 5 (1) ◽  
pp. 19-25 ◽  
Author(s):  
Maha Hana ◽  
W.F. McClure ◽  
T.B. Whitaker ◽  
M.W. White ◽  
D.R. Bahler

Classification of flue-cured and Burley tobacco types with artificial neural networks (ANNs) were studied. Burley tobacco was further classified as either grown in the USA or grown outside the USA. The input data were in the form of near infrared (NIR) spectra, each spectrum containing 19 points. The number of flue-cured and Burley samples were 654 and 959, respectively. The number of native and non-native tobacco samples were 266 and 267, respectively. The models selected for this research were a quadratic classifier, a back-propagation network and a linear network. The results of the calibration model and the true performance for classifying tobacco species were (100%, 100%), (99.38%, 99.39%) and (95.19%, 99.26%) for the quadratic classifier, back-propagation network and linear network, respectively. The identification of native tobacco and its true performance were (100%, 100%) using a quadratic classifier, (89.12%, 88.46%) using a back-propagation network and (80.68%, 79.62%) using a linear network.


Author(s):  
K. GANESAN ◽  
TAGHI M. KHOSHGOFTAAR ◽  
EDWARD B. ALLEN

Highly reliable software is becoming an essential ingredient in many systems. However, assuring reliability often entails time-consuming costly development processes. One cost-effective strategy is to target reliability-enhancement activities to those modules that are likely to have the most problems. Software quality prediction models can predict the number of faults expected in each module early enough for reliability enhancement to be effective. This paper introduces a case-based reasoning technique for the prediction of software quality factors. Case-based reasoning is a technique that seeks to answer new problems by identifying similar "cases" from the past. A case-based reasoning system can function as a software quality prediction model. To our knowledge, this study is the first to use case-based reasoning systems for predicting quantitative measures of software quality. A case study applied case-based reasoning to software quality modeling of a family of full-scale industrial software systems. The case-based reasoning system's accuracy was much better than a corresponding multiple linear regression model in predicting the number of design faults. When predicting faults in code, its accuracy was significantly better than a corresponding multiple linear regression model for two of three test data sets and statistically equivalent for the third.


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Wentao Hu ◽  
...  

Abstract Background The utilization of estimating-GFR equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that random forest regression(RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equationMethods 1732 participants were enrolled in this study totally (1333 in development data set from Tianhe District and 399 in external data set Luogang District). Recursive feature elimination (RFE) is applied to the development data to select important variables and build random forest models. Then same variables were used to develop the estimated GFR equation with linear regression as a comparison. The performances of these equations are measured by bias, 30% accuracy , precision and root mean square error(RMSE).Results Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid(UA), blood urea nitrogen(BUN), hematocrit(HCT) and apolipoprotein B(APOB) were selected by RFE method. The results revealed that the overall performance of random forest regression models ascended the revised regression models based on the same variables. In the 9-variable model, RF model was better than revised linear regression in term of bias, precision ,30%accuracy and RMSE(0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P<0.01 ). In the 4-variable model, random forest regression model showed an improvement in precision and RMSE compared with revised regression model. (20.82 vs 25.25, P<0.01, 19.08 vs 20.60, P<0.001). Bias and 30%accurancy were preferable, but the results were not statistically significant (0.34 vs 2.07, P=0.10, 0.8 vs 0.78, P=0.19, respectively).Conclusions The performances of random forest regression models are better than revised linear regression models when it comes to GFR estimation.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Qingqi Zhang

In this paper, the author first analyzes the major factors affecting housing prices with Spearman correlation coefficient, selects significant factors influencing general housing prices, and conducts a combined analysis algorithm. Then, the author establishes a multiple linear regression model for housing price prediction and applies the data set of real estate prices in Boston to test the method. Through the data analysis and test in this paper, it can be summarized that the multiple linear regression model can effectively predict and analyze the housing price to some extent, while the algorithm can still be improved through more advanced machine learning methods.


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Ming Li ◽  
...  

Abstract Background The utilization of estimating-GFR equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that random forest regression(RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equation Methods 1732 participants were enrolled in this study totally (1333 in development data set from Tianhe District and 399 in external data set Luogang District). Recursive feature elimination (RFE) is applied to the development data to select important variables and build random forest models. Then same variables were used to develop the estimated GFR equation with linear regression as a comparison. The performances of these equations are measured by bias, 30% accuracy, precision and root mean square error(RMSE). Results Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid(UA), blood urea nitrogen(BUN), hematocrit(HCT) and apolipoprotein B(APOB) were selected by RFE method. The results revealed that the overall performance of random forest regression models ascended the revised regression models based on the same variables. In the 9-variable model, RF model was better than revised linear regression in term of bias, precision ,30%accuracy and RMSE(0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01 ). In the 4-variable model, random forest regression model showed an improvement in precision and RMSE compared with revised regression model. (20.82 vs 25.25, P < 0.01, 19.08 vs 20.60, P < 0.001). Bias and 30%accurancy were preferable, but the results were not statistically significant (0.34 vs 2.07, P = 0.10, 0.8 vs 0.78, P = 0.19, respectively). Conclusions The performances of random forest regression models are better than revised linear regression models when it comes to GFR estimation.


Scientifica ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-16 ◽  
Author(s):  
B. M. Golam Kibria ◽  
Adewale F. Lukman

The ridge regression-type (Hoerl and Kennard, 1970) and Liu-type (Liu, 1993) estimators are consistently attractive shrinkage methods to reduce the effects of multicollinearity for both linear and nonlinear regression models. This paper proposes a new estimator to solve the multicollinearity problem for the linear regression model. Theory and simulation results show that, under some conditions, it performs better than both Liu and ridge regression estimators in the smaller MSE sense. Two real-life (chemical and economic) data are analyzed to illustrate the findings of the paper.


PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e8192 ◽  
Author(s):  
Gökhan Karakülah ◽  
Nazmiye Arslan ◽  
Cihangir Yandım ◽  
Aslı Suner

Introduction Recent studies highlight the crucial regulatory roles of transposable elements (TEs) on proximal gene expression in distinct biological contexts such as disease and development. However, computational tools extracting potential TE –proximal gene expression associations from RNA-sequencing data are still missing. Implementation Herein, we developed a novel R package, using a linear regression model, for studying the potential influence of TE species on proximal gene expression from a given RNA-sequencing data set. Our R package, namely TEffectR, makes use of publicly available RepeatMasker TE and Ensembl gene annotations as well as several functions of other R-packages. It calculates total read counts of TEs from sorted and indexed genome aligned BAM files provided by the user, and determines statistically significant relations between TE expression and the transcription of nearby genes under diverse biological conditions. Availability TEffectR is freely available at https://github.com/karakulahg/TEffectR along with a handy tutorial as exemplified by the analysis of RNA-sequencing data including normal and tumour tissue specimens obtained from breast cancer patients.


Sign in / Sign up

Export Citation Format

Share Document