scholarly journals Predicting the Sincerity of a Question Asked

Author(s):  
Tuan Minh ◽  
PHAYUNG MEESAD

<div>The growth of applications in both scientific socialism and naturalism causes increasingly difficult to assess whether a question is sincere or not. It is mandatory for many marketing and financial companies. Many utilizations will be reconfigured beyond recognition, especially text and images, while others face potential extinction as a corollary of advances in technology and computer science in particular. Analyzing text and image data will be truly needed for understanding valuable insights. In this paper, we analyzed the Quora dataset obtained from Kaggle.com to filter insincere and spam content. We used different preprocessing algorithms and analysis models providing in PySpark. Besides, we analyzed the manner of users established in writing their posts via the proposed prediction models. Finally, we show the most accurate algorithm of the selected algorithms for classifying questions on Quora. The Gradient Boosted Tree was the best model for questions on Quora with accuracy that is 79.5%. Compared to other methods, the same building in Scikitlearn and machine learning LSTM+GRU, applying models in SpySpark could get the better answer in classifying questions on Quora.</div>

2021 ◽  
Author(s):  
Tuan Minh ◽  
PHAYUNG MEESAD

<div>The growth of applications in both scientific socialism and naturalism causes increasingly difficult to assess whether a question is sincere or not. It is mandatory for many marketing and financial companies. Many utilizations will be reconfigured beyond recognition, especially text and images, while others face potential extinction as a corollary of advances in technology and computer science in particular. Analyzing text and image data will be truly needed for understanding valuable insights. In this paper, we analyzed the Quora dataset obtained from Kaggle.com to filter insincere and spam content. We used different preprocessing algorithms and analysis models providing in PySpark. Besides, we analyzed the manner of users established in writing their posts via the proposed prediction models. Finally, we show the most accurate algorithm of the selected algorithms for classifying questions on Quora. The Gradient Boosted Tree was the best model for questions on Quora with accuracy that is 79.5%. Compared to other methods, the same building in Scikitlearn and machine learning LSTM+GRU, applying models in SpySpark could get the better answer in classifying questions on Quora.</div>


2021 ◽  
Vol 11 (11) ◽  
pp. 1127
Author(s):  
Arun Govindaiah ◽  
Abdul Baten ◽  
R. Theodore Smith ◽  
Siva Balasubramanian ◽  
Alauddin Bhuiyan

Age-related macular degeneration (AMD) is a leading cause of blindness in the developed world. In this study, we compare the performance of retinal fundus images and genetic-information-based machine learning models for the prediction of late AMD. Using data from the Age-related Eye Disease Study, we built machine learning models with various combinations of genetic, socio-demographic/clinical, and retinal image data to predict late AMD using its severity and category in a single visit, in 2, 5, and 10 years. We compared their performance in sensitivity, specificity, accuracy, and unweighted kappa. The 2-year model based on retinal image and socio-demographic (S-D) parameters achieved a sensitivity of 91.34%, specificity of 84.49% while the same for genetic and S-D-parameters-based model was 79.79% and 66.84%. For the 5-year model, the retinal image and S-D-parameters-based model also outperformed the genetic and S-D parameters-based model. The two 10-year models achieved similar sensitivities of 74.24% and 75.79%, respectively, but the retinal image and S-D-parameters-based model was otherwise superior. The retinal-image-based models were not further improved by adding genetic data. Retinal imaging and S-D data can build an excellent machine learning predictor of developing late AMD over 2–5 years; the retinal imaging model appears to be the preferred prognostic tool for efficient patient management.


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.


2020 ◽  
Author(s):  
Sina Faizollahzadeh Ardabili ◽  
Amir Mosavi ◽  
Pedram Ghamisi ◽  
Filip Ferdinand ◽  
Annamaria R. Varkonyi-Koczy ◽  
...  

Several outbreak prediction models for COVID-19 are being used by officials around the world to make informed-decisions and enforce relevant control measures. Among the standard models for COVID-19 global pandemic prediction, simple epidemiological and statistical models have received more attention by authorities, and they are popular in the media. Due to a high level of uncertainty and lack of essential data, standard models have shown low accuracy for long-term prediction. Although the literature includes several attempts to address this issue, the essential generalization and robustness abilities of existing models needs to be improved. This paper presents a comparative analysis of machine learning and soft computing models to predict the COVID-19 outbreak as an alternative to SIR and SEIR models. Among a wide range of machine learning models investigated, two models showed promising results (i.e., multi-layered perceptron, MLP, and adaptive network-based fuzzy inference system, ANFIS). Based on the results reported here, and due to the highly complex nature of the COVID-19 outbreak and variation in its behavior from nation-to-nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research. Paper further suggests that real novelty in outbreak prediction can be realized through integrating machine learning and SEIR models.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Taking advantage of finding non-intuitive regularities on high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM) were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods such as yrandomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is full of help for designing novel thrombin inhibitors.


2020 ◽  
Vol 16 ◽  
Author(s):  
Nitigya Sambyal ◽  
Poonam Saini ◽  
Rupali Syal

Background and Introduction: Diabetes mellitus is a metabolic disorder that has emerged as a serious public health issue worldwide. According to the World Health Organization (WHO), without interventions, the number of diabetic incidences is expected to be at least 629 million by 2045. Uncontrolled diabetes gradually leads to progressive damage to eyes, heart, kidneys, blood vessels and nerves. Method: The paper presents a critical review of existing statistical and Artificial Intelligence (AI) based machine learning techniques with respect to DM complications namely retinopathy, neuropathy and nephropathy. The statistical and machine learning analytic techniques are used to structure the subsequent content review. Result: It has been inferred that statistical analysis can help only in inferential and descriptive analysis whereas, AI based machine learning models can even provide actionable prediction models for faster and accurate diagnose of complications associated with DM. Conclusion: The integration of AI based analytics techniques like machine learning and deep learning in clinical medicine will result in improved disease management through faster disease detection and cost reduction for disease treatment.


2021 ◽  
Vol 15 ◽  
Author(s):  
Alhassan Alkuhlani ◽  
Walaa Gad ◽  
Mohamed Roushdy ◽  
Abdel-Badeeh M. Salem

Background: Glycosylation is one of the most common post-translation modifications (PTMs) in organism cells. It plays important roles in several biological processes including cell-cell interaction, protein folding, antigen’s recognition, and immune response. In addition, glycosylation is associated with many human diseases such as cancer, diabetes and coronaviruses. The experimental techniques for identifying glycosylation sites are time-consuming, extensive laboratory work, and expensive. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of the biotechnological (e.g., using artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. The computational intelligent techniques have shown efficient results for predicting N-linked, O-linked and C-linked glycosylation sites. In the last two decades, many studies have been conducted for glycosylation site prediction using these techniques. In this paper, we analyze and compare a wide range of intelligent techniques of these studies from multiple aspects. The current challenges and difficulties facing the software developers and knowledge engineers for predicting glycosylation sites are also included. Method: The comparison between these different studies is introduced including many criteria such as databases, feature extraction and selection, machine learning classification methods, evaluation measures and the performance results. Results and conclusions: Many challenges and problems are presented. Consequently, more efforts are needed to get more accurate prediction models for the three basic types of glycosylation sites.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.


2019 ◽  
Vol 41 (2) ◽  
pp. 284-287
Author(s):  
Pedro Guilherme Coelho Hannun ◽  
Luis Gustavo Modelli de Andrade

Abstract Introduction: The prediction of post transplantation outcomes is clinically important and involves several problems. The current prediction models based on standard statistics are very complex, difficult to validate and do not provide accurate prediction. Machine learning, a statistical technique that allows the computer to make future predictions using previous experiences, is beginning to be used in order to solve these issues. In the field of kidney transplantation, computational forecasting use has been reported in prediction of chronic allograft rejection, delayed graft function, and graft survival. This paper describes machine learning principles and steps to make a prediction and performs a brief analysis of the most recent applications of its application in literature. Discussion: There is compelling evidence that machine learning approaches based on donor and recipient data are better in providing improved prognosis of graft outcomes than traditional analysis. The immediate expectations that emerge from this new prediction modelling technique are that it will generate better clinical decisions based on dynamic and local practice data and optimize organ allocation as well as post transplantation care management. Despite the promising results, there is no substantial number of studies yet to determine feasibility of its application in a clinical setting. Conclusion: The way we deal with storage data in electronic health records will radically change in the coming years and machine learning will be part of clinical daily routine, whether to predict clinical outcomes or suggest diagnosis based on institutional experience.


Sign in / Sign up

Export Citation Format

Share Document