Classifying Asian Rice Cultivars (Oryza sativa L.) into Indica and Japonica Using Logistic Regression Model with Publicly Available Phenotypic Data

2018 ◽  
Author(s):  
Bongsong Kim

Abstract: This article shows how to implement a logistic regression model (LRM) with phenotypic variables to classify Asian rice (Oryza sativa L.) cultivars into two pivotal subpopulations, indica and japonica. The study used publicly available data attached to a previous paper. Classification accuracy was assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Given 24 phenotypic variables for 280 indica/japonica accessions, LRMs were fitted with every possible combination of up to six phenotypic variables; the highest AUC was 0.9977, obtained with six variables: panicle number per plant, seed number per panicle, florets per panicle, panicle fertility, straighthead susceptibility and blast resistance. Overall, the AUC increased with the number of variables. The ultimate purpose of this study is to demonstrate the indica/japonica prediction ability of the LRM when applied to unclassified Asian rice cultivars. To estimate the indica/japonica prediction accuracy, ten-fold cross-validation was conducted 100 times with the 280 indica/japonica accessions using the LRM with the parameters that yielded the highest AUC. The resulting prediction accuracy was 0.9779. This suggests that the LRM is a promising and highly effective tool for indica/japonica prediction from phenotypic variables in Asian cultivated rice.
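The size of the model search and the AUC criterion described in this abstract can be illustrated with a minimal sketch. It assumes that "up to six" means subsets of one to six of the 24 variables, and it computes the AUC as the share of positive/negative pairs that are ranked correctly; the function name `roc_auc` and the toy labels and scores are illustrative and do not come from the paper.

```python
from math import comb

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Number of candidate models when choosing 1 to 6 of 24 variables
# (assumption: empty subsets are excluded).
n_models = sum(comb(24, k) for k in range(1, 7))
print(n_models)  # 190050

# Toy example: 1 = one class, 0 = the other; scores are made-up model outputs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

Under that assumption an exhaustive search fits roughly 190,000 small models, which is feasible for logistic regression on 280 accessions.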

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7259
Author(s):  
Bongsong Kim

In Oryza sativa, indica and japonica are pivotal subpopulations, and other subpopulations such as aus and aromatic are considered to be derived from indica or japonica. For this reason, Oryza sativa accessions are frequently viewed from the indica/japonica perspective. This study introduces a computational method for indica/japonica classification that applies phenotypic variables to the logistic regression model (LRM). The population used in this study included 413 Oryza sativa accessions, of which 280 were indica or japonica. Out of 24 phenotypic variables, a set of seven was identified that together gave the LRM fully accurate indica/japonica separation. The resulting parameters were used to define the customized LRM. Given the 280 indica/japonica accessions, the classification accuracy of the customized LRM with the set of seven phenotypic variables was estimated by 100 iterations of ten-fold cross-validation. A classification accuracy of 100% was achieved. This suggests that the LRM can be an effective tool for indica/japonica classification with phenotypic variables in Oryza sativa.
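The evaluation scheme mentioned here, 100 iterations of ten-fold cross-validation on 280 accessions, can be sketched as an index generator. This is only an illustration of the resampling layout, not the author's code; `repeated_kfold` and its parameters are hypothetical names, and the model fitting that would happen inside each split is omitted.

```python
import random

def repeated_kfold(n_samples, n_folds=10, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for r in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        # Fold f takes every n_folds-th shuffled index, so fold sizes differ by at most one.
        folds = [idx[f::n_folds] for f in range(n_folds)]
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx

splits = list(repeated_kfold(280))
print(len(splits))                           # 1000 train/test splits in total
print(len(splits[0][2]), len(splits[0][3]))  # 252 28
```

Each accession is held out exactly once per repeat, so averaging the held-out accuracy over all 1,000 splits gives the kind of estimate the abstract reports.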


2021 ◽  
Vol 9 ◽  
Author(s):  
Deliang Sun ◽  
Haijia Wen ◽  
Jiahui Xu ◽  
Yalan Zhang ◽  
Danzhou Wang ◽  
...  

This study aims to develop a logistic regression model of landslide susceptibility that uses GeoDetector for dominant-factor screening and 10-fold cross validation for training sample optimization. First, Fengjie county, a typical mountainous area, was selected as the study area since it experienced 1,522 landslides from 2001 to 2016. Second, 22 factors were selected as the initial conditioning factors, and a geospatial database was established on a grid with 30 m resolution. The factor detector of the geographic detector and the stepwise regression method included in logistic regression were used to screen out the dominant factors from the database. Then, based on the sample dataset with a 1:10 ratio of landslides and nonlandslides, 10-fold cross validation was used to select the optimized sample to train the logistic regression model of landslide susceptibility in the study area. Finally, the accuracy and efficiency of the two models before and after screening out the dominant factors were evaluated and compared. The results showed that the overall accuracy of both models exceeded 0.9 and the area under the receiver operating characteristic curve exceeded 0.8, indicating that the models before and after factor screening both had high reliability and good prediction ability. In addition, the screened factors played a leading role in the geospatial distribution of the historical landslides, indicating that each screened dominant factor is individually reasonable. By improving the geospatial agreement between landslide susceptibility and actual landslide-prone areas through the screening of dominant factors and the optimization of the training samples, a simple, efficient, and reliable logistic-regression–based landslide susceptibility model can be constructed.


2021 ◽  
Author(s):  
Wenhui Li ◽  
Quanli Xu ◽  
Junhua Yi ◽  
Jing Liu

Abstract: Establishing an effective forest fire forecasting mechanism is the premise of scientific planning and management of forest fires and forest fire prevention. In recent years, forest fire prediction has been one of the key areas of concern for government forestry management departments and forestry researchers. One widely used approach is logistic regression (LR), a probability model that has been applied frequently to forest fire prediction in China and abroad over the past few years. However, as research has deepened, it has been found that the logistic regression model fails to fully consider the spatially non-stationary relationship between forest fires and driving factors, which leads to a poor fit and low prediction accuracy. Its extended counterpart, the geographically weighted logistic regression (GWLR) model, takes into account the spatial correlation between model variables and effectively improves the fitting ability and prediction accuracy of the model. Therefore, this paper compares the fitting ability and prediction accuracy of the logistic regression model and the geographically weighted logistic regression model in order to assess how well the two models predict forest fires in Yunnan Province. The samples were divided into 60% training samples and 40% test samples, and repeated sampling was carried out 5 times for training. Variables that appeared in the training model 3 or more times were used to construct the final LR and GWLR models. Finally, the model with better fitting ability and higher prediction accuracy was used to classify the fire risks in Yunnan Province. The results show that the geographically weighted logistic regression model is superior to the logistic regression model in terms of fit and accuracy. The geographically weighted logistic regression model is more suitable for the data structure of forest fires in Yunnan Province and has better prediction ability. The AUC value of the geographically weighted logistic regression model was 0.902, and its prediction accuracy was 82.7%; the AUC value of the logistic regression model was 0.891, and its prediction accuracy was 80.1%. Fully considering the spatial heterogeneity among model variables can, to some extent, predict forest fires more accurately. The fitting of the two models shows that the causes of forest fires in Yunnan Province include meteorological factors (relative humidity, temperature, air pressure, sunshine hours, daily precipitation, wind speed, and others), vegetation type, terrain factors, and human activity factors such as population density and road network.
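The variable-screening rule described in this abstract (five repeated 60/40 splits, keeping variables that appear in at least three of the trained models) can be sketched as follows. The helper names, the sample count of 1,000, and the per-run variable lists are hypothetical placeholders; the abstract does not state how variables were selected within each run, so that step is represented only by its assumed output.

```python
import random
from collections import Counter

def split_60_40(n_samples, rng):
    """Randomly split sample indices into 60% training and 40% test."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = round(0.6 * n_samples)
    return idx[:cut], idx[cut:]

def stable_variables(selected_per_run, min_runs=3):
    """Keep variables that were selected in at least min_runs of the training runs."""
    counts = Counter(v for run in selected_per_run for v in set(run))
    return sorted(v for v, c in counts.items() if c >= min_runs)

rng = random.Random(0)
train, test = split_60_40(1000, rng)  # 1000 is an arbitrary illustrative sample size
print(len(train), len(test))  # 600 400

# Hypothetical result of five training runs (placeholder variable names).
runs = [
    ["temperature", "humidity", "wind_speed"],
    ["temperature", "humidity"],
    ["temperature", "road_density"],
    ["humidity", "wind_speed"],
    ["temperature", "road_density"],
]
print(stable_variables(runs))  # ['humidity', 'temperature']
```

Requiring a variable to recur across resamples is a simple guard against keeping predictors that are significant only for one particular random split.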


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Emre Altinkurt ◽  
Ozkan Avci ◽  
Orkun Muftuoglu ◽  
Adem Ugurlu ◽  
Zafer Cebeci ◽  
...  

Purpose. To diagnose keratoconus by establishing an effective logistic regression model from data obtained with a Scheimpflug-Placido cornea topographer. Methods. Topographical parameters of 125 eyes of 70 patients diagnosed with keratoconus by clinical or topographical findings were compared with 120 eyes of 63 patients who were defined as keratorefractive surgery candidates. Receiver operating characteristic (ROC) curve analysis was performed to determine the diagnostic ability of the topographic parameters. The parameters with an AUROC (area under the ROC curve) value greater than 0.9 were analyzed with logistic regression analysis (LRA) to determine the most predictive model that could diagnose keratoconus. A logit formula of the model was built, and the logit value of every eye in the study was calculated according to this formula. Then, an ROC analysis of the logit values was done. Results. Baiocchi Calossi Versaci front index (BCVf) had the highest AUROC value (0.976) in the study. The LRA model with the highest prediction ability had 97.5% accuracy, 96.8% sensitivity, and 99.2% specificity. The most significant parameters were found to be BCVf (p = 0.001), BCVb (Baiocchi Calossi Versaci back) (p = 0.002), posterior rf (apical radius of the flattest meridian of the aspherotoric surface in 4.5 mm diameter of the cornea) (p = 0.005), central corneal thickness (p = 0.072), and minimum corneal thickness (p = 0.494). Conclusions. The LRA model can distinguish keratoconus corneas from normal ones with high accuracy without the need for complex computer algorithms.
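The quantities reported in this abstract (a logit value per eye, then accuracy, sensitivity, and specificity) follow standard definitions, sketched below. The coefficients, feature values, and labels are made up for illustration; the abstract does not give the fitted logit formula, and the function names are hypothetical.

```python
import math

def logit_value(intercept, coefs, features):
    """Linear predictor of a logistic regression: b0 + sum(b_i * x_i)."""
    return intercept + sum(b * x for b, x in zip(coefs, features))

def probability(logit):
    """Map a logit value to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-logit))

def diagnostic_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)  # share of diseased eyes correctly flagged
    specificity = tn / (tn + fp)  # share of normal eyes correctly cleared
    return accuracy, sensitivity, specificity

# Made-up coefficients and features, only to show the mechanics.
z = logit_value(-1.0, [2.0, 0.5], [0.5, 0.0])
print(probability(z))  # 0.5, because the logit is exactly 0

# Made-up labels: 3 diseased and 2 normal eyes, one diseased eye missed.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0]
accuracy, sensitivity, specificity = diagnostic_metrics(y_true, y_pred)
print(accuracy, specificity)  # 0.8 1.0
```

Classifying an eye as keratoconus when the probability exceeds 0.5 is equivalent to thresholding the logit at 0; an ROC analysis of the logit values, as in the abstract, evaluates every possible threshold instead of just that one.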


Author(s):  
Osama EL-Ansary ◽  
Mohamed Saleh

Purpose – The main purpose of the study is to investigate an accurate prediction method for banking distress, applied to a set of Egyptian banks. Methodology – The researchers compared the prediction accuracy of discriminant analysis and the logistic regression model in order to choose the more appropriate one. The data were collected from the “Bank scope” database for the period 2002–2016. Findings – The results of the study revealed that the predictive accuracy of discriminant analysis outperformed that of the logistic regression model. Originality – The study adds value to the literature as it is one of the few studies concerned with predicting banking financial distress, especially in Egypt.

