scholarly journals Prevalence and predicting factors of perceived stress among Bangladeshi university students using machine learning algorithms

Author(s):  
Rumana Rois ◽  
Manik Ray ◽  
Atikur Rahman ◽  
Swapan. K. Roy

Abstract Background: Stress-related mental health problems are one of the most common causes of the burden in university students worldwide. Many studies have been conducted to predict the prevalence of stress among university students, however most of these analyses were predominantly performed using the basic logistic regression model. As an alternative, we used the advanced machine learning approaches for detecting significant risk factors and to predict the prevalence of stress among Bangladeshi university students.Methods: This prevalence study surveyed 355 students from twenty-eight different Bangladeshi universities using questions concerning anthropometric measurements, academic, lifestyles, and health-related information, which referred to the perceived stress status of the respondents (yes or no). Boruta algorithm was used in determining the significant prognostic factors of the prevalence of stress. Prediction models were built using decision tree (DT), random forest (RF), support vector machine (SVM), and logistic regression (LR), and their performances were evaluated using parameters of confusion matrix, ROC curves, and k-fold cross-validation techniques. Results: One-third of university students reported stress within the last 12 months. Students’ pulse rate, systolic and diastolic blood pressures, sleep status, smoking status, and academic background were selected as the important features for predicting the prevalence of stress. Evaluated performance revealed that the highest performance observed from RF (accuracy=0.8972, precision=0.9241, sensitivity=0.9250, specificity=0.8148, AUC=0.8715, k-fold accuracy=0.8983) and the lowest from LR (accuracy=0.7476, precision=0.8354, sensitivity=0.8250, specificity=0.5185, AUC=0.7822, k-fold accuracy=07713) and SVM with polynomial kernel of degree 2 (accuracy=0.7570, precision=0.7975, sensitivity=0.8630, specificity=0.5294, AUC=0.7717, k-fold accuracy=0.7855). The RF model perfectly predicted stress including individual and interaction effects of predictors.Conclusion: The machine learning framework can be detected the significant prognostic factors and predicted this psychological problem more accurately, thereby helping the policy-makers, stakeholders, and families to understand and prevent this serious crisis by improving policy-making strategies, mental health promotion, and establishing effective university counseling services.

2021 ◽  
Vol 40 (1) ◽  
Author(s):  
Rumana Rois ◽  
Manik Ray ◽  
Atikur Rahman ◽  
Swapan K. Roy

Abstract Background Stress-related mental health problems are one of the most common causes of the burden in university students worldwide. Many studies have been conducted to predict the prevalence of stress among university students, however most of these analyses were predominantly performed using the basic logistic regression (LR) model. As an alternative, we used the advanced machine learning (ML) approaches for detecting significant risk factors and to predict the prevalence of stress among Bangladeshi university students. Methods This prevalence study surveyed 355 students from twenty-eight different Bangladeshi universities using questions concerning anthropometric measurements, academic, lifestyles, and health-related information, which referred to the perceived stress status of the respondents (yes or no). Boruta algorithm was used in determining the significant prognostic factors of the prevalence of stress. Prediction models were built using decision tree (DT), random forest (RF), support vector machine (SVM), and LR, and their performances were evaluated using parameters of confusion matrix, receiver operating characteristics (ROC) curves, and k-fold cross-validation techniques. Results One-third of university students reported stress within the last 12 months. Students’ pulse rate, systolic and diastolic blood pressures, sleep status, smoking status, and academic background were selected as the important features for predicting the prevalence of stress. Evaluated performance revealed that the highest performance observed from RF (accuracy = 0.8972, precision = 0.9241, sensitivity = 0.9250, specificity = 0.8148, area under the ROC curve (AUC) = 0.8715, k-fold accuracy = 0.8983) and the lowest from LR (accuracy = 0.7476, precision = 0.8354, sensitivity = 0.8250, specificity = 0.5185, AUC = 0.7822, k-fold accuracy = 07713) and SVM with polynomial kernel of degree 2 (accuracy = 0.7570, precision = 0.7975, sensitivity = 0.8630, specificity = 0.5294, AUC = 0.7717, k-fold accuracy = 0.7855). Overall, the RF model performs better and authentically predicted stress compared with other ML techniques, including individual and interaction effects of predictors. Conclusion The machine learning framework can be detected the significant prognostic factors and predicted this psychological problem more accurately, thereby helping the policy-makers, stakeholders, and families to understand and prevent this serious crisis by improving policy-making strategies, mental health promotion, and establishing effective university counseling services.


Author(s):  
Cheng-Chien Lai ◽  
Wei-Hsin Huang ◽  
Betty Chia-Chen Chang ◽  
Lee-Ching Hwang

Predictors for success in smoking cessation have been studied, but a prediction model capable of providing a success rate for each patient attempting to quit smoking is still lacking. The aim of this study is to develop prediction models using machine learning algorithms to predict the outcome of smoking cessation. Data was acquired from patients underwent smoking cessation program at one medical center in Northern Taiwan. A total of 4875 enrollments fulfilled our inclusion criteria. Models with artificial neural network (ANN), support vector machine (SVM), random forest (RF), logistic regression (LoR), k-nearest neighbor (KNN), classification and regression tree (CART), and naïve Bayes (NB) were trained to predict the final smoking status of the patients in a six-month period. Sensitivity, specificity, accuracy, and area under receiver operating characteristic (ROC) curve (AUC or ROC value) were used to determine the performance of the models. We adopted the ANN model which reached a slightly better performance, with a sensitivity of 0.704, a specificity of 0.567, an accuracy of 0.640, and an ROC value of 0.660 (95% confidence interval (CI): 0.617–0.702) for prediction in smoking cessation outcome. A predictive model for smoking cessation was constructed. The model could aid in providing the predicted success rate for all smokers. It also had the potential to achieve personalized and precision medicine for treatment of smoking cessation.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric criseis need hospitalization are lacking and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression) to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. Target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC while GLM/logistic regression performed average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.


2021 ◽  
Author(s):  
Myeong Gyu Kim ◽  
Jae Hyun Kim ◽  
Kyungim Kim

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. Again, with the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation from vast tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation on garlic and COVID-19 in Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, and others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). A model with the highest accuracy in the training dataset (70% of overall dataset) was tested using a test dataset (30% of overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS SVM with the polynomial kernel model showed the highest accuracy of 0.670. The model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696 for misinformation. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. It could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.


2020 ◽  
Vol 19 ◽  
pp. 153303382090982
Author(s):  
Melek Akcay ◽  
Durmus Etiz ◽  
Ozer Celik ◽  
Alaattin Ozen

Background and Aim: Although the prognosis of nasopharyngeal cancer largely depends on a classification based on the tumor-lymph node metastasis staging system, patients at the same stage may have different clinical outcomes. This study aimed to evaluate the survival prognosis of nasopharyngeal cancer using machine learning. Settings and Design: Original, retrospective. Materials and Methods: A total of 72 patients with a diagnosis of nasopharyngeal cancer who received radiotherapy ± chemotherapy were included in the study. The contribution of patient, tumor, and treatment characteristics to the survival prognosis was evaluated by machine learning using the following techniques: logistic regression, artificial neural network, XGBoost, support-vector clustering, random forest, and Gaussian Naive Bayes. Results: In the analysis of the data set, correlation analysis, and binary logistic regression analyses were applied. Of the 18 independent variables, 10 were found to be effective in predicting nasopharyngeal cancer-related mortality: age, weight loss, initial neutrophil/lymphocyte ratio, initial lactate dehydrogenase, initial hemoglobin, radiotherapy duration, tumor diameter, number of concurrent chemotherapy cycles, and T and N stages. Gaussian Naive Bayes was determined as the best algorithm to evaluate the prognosis of machine learning techniques (accuracy rate: 88%, area under the curve score: 0.91, confidence interval: 0.68-1, sensitivity: 75%, specificity: 100%). Conclusion: Many factors affect prognosis in cancer, and machine learning algorithms can be used to determine which factors have a greater effect on survival prognosis, which then allows further research into these factors. In the current study, Gaussian Naive Bayes was identified as the best algorithm for the evaluation of prognosis of nasopharyngeal cancer.


2020 ◽  
Vol 10 (15) ◽  
pp. 5047 ◽  
Author(s):  
Viet-Ha Nhu ◽  
Danesh Zandi ◽  
Himan Shahabi ◽  
Kamran Chapi ◽  
Ataollah Shirzadi ◽  
...  

This paper aims to apply and compare the performance of the three machine learning algorithms–support vector machine (SVM), bayesian logistic regression (BLR), and alternating decision tree (ADTree)–to map landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. We identified 66 shallow landslide locations, based on field surveys, by recording the locations of the landslides by a global position System (GPS), Google Earth imagery and black-and-white aerial photographs (scale 1: 20,000) and 19 landslide conditioning factors, then tested these factors using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980), and ADTree (AUC = 0.977) algorithms. We observed that not only all three algorithms are useful and effective tools for identifying shallow landslide-prone areas but also the BLR algorithm can be used such as the SVM algorithm as a soft computing benchmark algorithm to check the performance of the models in future.


2019 ◽  
Vol 8 (2) ◽  
pp. 3697-3705 ◽  

Forest fires have become one of the most frequently occurring disasters in recent years. The effects of forest fires have a lasting impact on the environment as it lead to deforestation and global warming, which is also one of its major cause of occurrence. Forest fires are dealt by collecting the satellite images of forest and if there is any emergency caused by the fires then the authorities are notified to mitigate its effects. By the time the authorities get to know about it, the fires would have already caused a lot of damage. Data mining and machine learning techniques can provide an efficient prevention approach where data associated with forests can be used for predicting the eventuality of forest fires. This paper uses the dataset present in the UCI machine learning repository which consists of physical factors and climatic conditions of the Montesinho park situated in Portugal. Various algorithms like Logistic regression, Support Vector Machine, Random forest, K-Nearest neighbors in addition to Bagging and Boosting predictors are used, both with and without Principal Component Analysis (PCA). Among the models in which PCA was applied, Logistic Regression gave the highest F-1 score of 68.26 and among the models where PCA was absent, Gradient boosting gave the highest score of 68.36.


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1909
Author(s):  
Dougho Park ◽  
Eunhwan Jeong ◽  
Haejong Kim ◽  
Hae Wook Pyun ◽  
Haemin Kim ◽  
...  

Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Li Zhang ◽  
Xia Zhe ◽  
Min Tang ◽  
Jing Zhang ◽  
Jialiang Ren ◽  
...  

Purpose. This study aimed to investigate the value of biparametric magnetic resonance imaging (bp-MRI)-based radiomics signatures for the preoperative prediction of prostate cancer (PCa) grade compared with visual assessments by radiologists based on the Prostate Imaging Reporting and Data System Version 2.1 (PI-RADS V2.1) scores of multiparametric MRI (mp-MRI). Methods. This retrospective study included 142 consecutive patients with histologically confirmed PCa who were undergoing mp-MRI before surgery. MRI images were scored and evaluated by two independent radiologists using PI-RADS V2.1. The radiomics workflow was divided into five steps: (a) image selection and segmentation, (b) feature extraction, (c) feature selection, (d) model establishment, and (e) model evaluation. Three machine learning algorithms (random forest tree (RF), logistic regression, and support vector machine (SVM)) were constructed to differentiate high-grade from low-grade PCa. Receiver operating characteristic (ROC) analysis was used to compare the machine learning-based analysis of bp-MRI radiomics models with PI-RADS V2.1. Results. In all, 8 stable radiomics features out of 804 extracted features based on T2-weighted imaging (T2WI) and ADC sequences were selected. Radiomics signatures successfully categorized high-grade and low-grade PCa cases ( P < 0.05 ) in both the training and test datasets. The radiomics model-based RF method (area under the curve, AUC: 0.982; 0.918), logistic regression (AUC: 0.886; 0.886), and SVM (AUC: 0.943; 0.913) in both the training and test cohorts had better diagnostic performance than PI-RADS V2.1 (AUC: 0.767; 0.813) when predicting PCa grade. Conclusions. The results of this clinical study indicate that machine learning-based analysis of bp-MRI radiomic models may be helpful for distinguishing high-grade and low-grade PCa that outperformed the PI-RADS V2.1 scores based on mp-MRI. The machine learning algorithm RF model was slightly better.


Sign in / Sign up

Export Citation Format

Share Document