Twitter Analysis of the Nonmedical Use and Side Effects of Methylphenidate: Machine Learning Study

Myeong Gyu Kim; Jungu Kim; Su Cheol Kim; Jaegwon Jeong

doi:10.2196/16466

Journal of Medical Internet Research ◽

10.2196/16466 ◽

2020 ◽

Vol 22 (2) ◽

pp. e16466 ◽

Cited By ~ 2

Author(s):

Myeong Gyu Kim ◽

Jungu Kim ◽

Su Cheol Kim ◽

Jaegwon Jeong

Keyword(s):

Machine Learning ◽

Side Effects ◽

Brand Names ◽

Training Dataset ◽

Support Vector ◽

Test Dataset ◽

Hyperactivity Disorder ◽

Nonmedical Use ◽

Machine Learning Approach ◽

Twitter Analysis

Background Methylphenidate, a stimulant used to treat attention deficit hyperactivity disorder, has the potential to be used nonmedically, such as for studying and recreation. In an era when many people actively use social networking services, experience with the nonmedical use or side effects of methylphenidate might be shared on Twitter. Objective The purpose of this study was to analyze tweets about the nonmedical use and side effects of methylphenidate using a machine learning approach. Methods A total of 34,293 tweets mentioning methylphenidate from August 2018 to July 2019 were collected using searches for “methylphenidate” and its brand names. Tweets in a randomly selected training dataset (6860/34,293, 20.00%) were annotated as positive or negative for two dependent variables: nonmedical use and side effects. Features such as personal noun, nonmedical use terms, medical use terms, side effect terms, sentiment scores, and the presence of a URL were generated for supervised learning. Using the labeled training dataset and features, support vector machine (SVM) classifiers were built and their performance was evaluated using F1 scores. The classifiers were applied to the test dataset to determine the number of tweets about nonmedical use and side effects. Results Of the 6860 tweets in the training dataset, 5.19% (356/6860) and 5.52% (379/6860) were about nonmedical use and side effects, respectively. The F1 scores of the SVM classifiers for nonmedical use and side effects were 0.547 (precision: 0.926, recall: 0.388, and accuracy: 0.967) and 0.733 (precision: 0.920, recall: 0.609, and accuracy: 0.976), respectively. In the test dataset, the SVM classifiers identified 361 tweets (1.32%) about nonmedical use and 519 tweets (1.89%) about side effects. The proportion of tweets about nonmedical use was highest in May 2019 (46/2624, 1.75%) and December 2018 (36/2041, 1.76%).
Conclusions The SVM classifiers that were built in this study were highly precise and accurate and will help to automatically identify the nonmedical use and side effects of methylphenidate using Twitter.
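
The F1 scores quoted above follow directly from the reported precision and recall, since F1 is their harmonic mean. The short check below is only an illustrative sketch; the function name `f1_score` is chosen here and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
nonmedical_f1 = f1_score(0.926, 0.388)
side_effect_f1 = f1_score(0.920, 0.609)

print(round(nonmedical_f1, 3))   # 0.547
print(round(side_effect_f1, 3))  # 0.733
```

The gap between high precision and low recall for nonmedical use explains why its F1 score is much lower than its accuracy.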

Twitter Analysis of the Nonmedical Use and Side Effects of Methylphenidate, and User Sentiment about the Drug (Preprint)

10.2196/preprints.16466 ◽

2019 ◽

Author(s):

Jungu Kim ◽

Su Cheol Kim ◽

Jaegwon Jeong ◽

Myeong Gyu Kim

Keyword(s):

Side Effects ◽

Social Networking ◽

Brand Names ◽

Social Networking Services ◽

Hyperactivity Disorder ◽

Search Terms ◽

Nonmedical Use ◽

Twitter Analysis ◽

Sentiment Score ◽

The Cost

BACKGROUND Methylphenidate, a stimulant used to treat attention deficit hyperactivity disorder (ADHD), has the potential for nonmedical uses such as study and recreation. In the era of active use of social networking services (SNSs), experience with the nonmedical use or side effects of methylphenidate might be shared on Twitter. OBJECTIVE To analyze monthly tweets about methylphenidate, its nonmedical use and side effects, and user sentiments about methylphenidate. METHODS Tweets mentioning methylphenidate from August 2018 to July 2019 were collected using search terms for methylphenidate and its brand names. Only tweets written in English were included. The monthly number of tweets about methylphenidate and the number of tweets containing keywords related to the nonmedical use and side effects of methylphenidate were analyzed. Precision was calculated as the number of tweets that truly described nonmedical use or side effects divided by the number of tweets containing each keyword. Sentiment analysis was conducted using the text and emoji in tweets, and tweets were categorized as very negative (less than -3), negative (-3 to -1), neutral (0), positive (1 to 3), or very positive (more than 3), depending on the sentiment score. RESULTS A total of 4,169 tweets were ultimately selected for analysis. The number of tweets per month was lowest in August (n=264) and highest in May (n=435). There were 292 (7.0%) tweets about nonmedical uses of methylphenidate. Among those, 200 (4.8%) described use for studying, and 15 (0.4%) described use for recreation. In 91 (2.2%) tweets, snorting methylphenidate was mentioned. Side effects of methylphenidate, mainly poor appetite (n=74, 1.8%) and insomnia (n=54, 1.3%), were reported in 316 (7.6%) tweets. The average sentiment score was 0.027 ± 1.475, and neutral tweets were the most abundant (n=1,593, 38.2%).
CONCLUSIONS Tweets about methylphenidate were most abundant in May, mentioned nonmedical use for study or recreation, and contained information about side effects. Analysis of Twitter has the advantage of saving the cost and time needed to conduct a survey, and could help identify nonmedical uses and side effects of drugs.
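
The five sentiment categories described in the methods can be written as a small binning function. This is a minimal sketch that assumes whole-number sentiment scores, as the listed ranges suggest; how the study's tool handled fractional scores is not stated, so assigning them by sign is an assumption made here, and the function name is hypothetical.

```python
def sentiment_category(score: float) -> str:
    """Map a sentiment score to the categories listed in the abstract."""
    if score < -3:
        return "very negative"
    if score < 0:        # covers -3 to -1 for whole-number scores
        return "negative"
    if score == 0:
        return "neutral"
    if score <= 3:       # covers 1 to 3 for whole-number scores
        return "positive"
    return "very positive"


for example_score in (-5, -3, 0, 2, 4):
    print(example_score, sentiment_category(example_score))
```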

Classification of different skarn deposits based on the compositional variability of associated grandite garnets: a data science and Machine Learning approach

10.5194/egusphere-egu21-10537 ◽

2021 ◽

Author(s):

Urmi Ghosh ◽

Tuhin Chakraborty

Keyword(s):

Machine Learning ◽

Trace Element ◽

Data Science ◽

Training Dataset ◽

Support Vector ◽

Learning Approach ◽

Machine Learning Approach ◽

Skarn Deposits

Rapid technological improvements made in in-situ analysis techniques, including LA-ICPMS, have transformed the field of analytical geochemistry. This has a far-reaching impact for different petrogenetic and ore-genetic studies where minute major and trace element compositional changes between different mineral zones within a single crystal can now be demarcated. Minerals such as garnet, although robust, are quite sensitive to the changing P-T and fluid conditions during their formation. These minerals have become powerful tools to characterize mineralization types. Previously, Meinert (1992) used in-situ major element EPMA analysis results to classify different skarn deposits based on the end-member composition of hydrothermal garnets. Alternatively, Tian et al. (2019) used the garnet trace element composition for a similar purpose. However, these discrimination plots and classification schemes show major overlap between different skarn deposits, such as Fe, Cu, Zn, and Au. The present study is an attempt to use a machine learning approach on available garnet data to find a more potent classification scheme for skarn deposits, thus reaffirming garnet as a faithful indicator for hydrothermal ore deposits. We have meticulously collected major and trace element data of Ca-rich garnets associated with different skarn deposits worldwide from 40 publications. The collected data are then used to train a model for fingerprinting the skarn deposits. A stratified random sampling method has been used on the dataset, with 80% of the samples as test set and the remaining 20% as training dataset. We have used K-nearest neighbour (KNN), Support Vector Machine (SVM) and Random Forest algorithms on the data, using Python as a platform. These ML classification algorithms perform better than the earlier models available for classification of ore types based on garnet composition in skarn systems.
Factor importance is calculated to show which elements play a pivotal role in classification of the ore type. Our results show that multiple garnet-forming elements taken together can reliably be used to discriminate between different ore formation settings.
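
Stratified random sampling, as mentioned above, splits each class separately so that both subsets keep the original class proportions. The sketch below uses only the Python standard library; the deposit labels and class sizes are hypothetical placeholders, not the compiled garnet data, and the training fraction is left as a parameter (the call follows the abstract as written, with 20% for training).

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 50 Fe, 30 Cu and 20 Zn skarn samples.
labels = ["Fe"] * 50 + ["Cu"] * 30 + ["Zn"] * 20
train_idx, test_idx = stratified_split(labels, train_fraction=0.2)

train_counts = Counter(labels[i] for i in train_idx)
print(len(train_idx), len(test_idx), dict(train_counts))
```

Each class contributes the same share to the training subset, which matters when some deposit types are much rarer than others.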

Streamlining Quality Review of Mass Spectrometry Data in the Clinical Laboratory by Use of Machine Learning

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2018-0238-oa ◽

2019 ◽

Vol 143 (8) ◽

pp. 990-998 ◽

Cited By ~ 2

Author(s):

Min Yu ◽

Lindsay A. L. Bazydlo ◽

David E. Bruns ◽

James H. Harrison

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Turnaround Time ◽

Machine Learning Algorithms ◽

Classification Model ◽

Supervised Machine Learning ◽

Training Dataset ◽

Support Vector ◽

Test Dataset ◽

Manual Review

Context.— Turnaround time and productivity of clinical mass spectrometric (MS) testing are hampered by time-consuming manual review of the analytical quality of MS data before release of patient results. Objective.— To determine whether a classification model created by using standard machine learning algorithms can verify analytically acceptable MS results and thereby reduce manual review requirements. Design.— We obtained retrospective data from gas chromatography–MS analyses of 11-nor-9-carboxy-delta-9-tetrahydrocannabinol (THC-COOH) in 1267 urine samples. The data for each sample had been labeled previously as either analytically unacceptable or acceptable by manual review. The dataset was randomly split into training and test sets (848 and 419 samples, respectively), maintaining equal proportions of acceptable (90%) and unacceptable (10%) results in each set. We used stratified 10-fold cross-validation in assessing the abilities of 6 supervised machine learning algorithms to distinguish unacceptable from acceptable assay results in the training dataset. The classifier with the highest recall was used to build a final model, and its performance was evaluated against the test dataset. Results.— In comparison testing of the 6 classifiers, a model based on the Support Vector Machines algorithm yielded the highest recall and acceptable precision. After optimization, this model correctly identified all unacceptable results in the test dataset (100% recall) with a precision of 81%. Conclusions.— Automated data review identified all analytically unacceptable assays in the test dataset, while reducing the manual review requirement by about 87%. This automation strategy can focus manual review only on assays likely to be problematic, allowing improved throughput and turnaround time without reducing quality.
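
The reported reduction in manual review can be related to the recall and precision figures with a short calculation. The sketch below assumes that exactly 10% of test results were unacceptable, matching the stated class proportion; the function name is hypothetical and the calculation is an illustration, not the authors' own.

```python
def manual_review_reduction(prevalence: float, recall: float, precision: float) -> float:
    """Fraction of results that no longer need manual review.

    Flagged fraction = true positives / precision
                     = prevalence * recall / precision.
    """
    flagged_fraction = prevalence * recall / precision
    return 1.0 - flagged_fraction


reduction = manual_review_reduction(prevalence=0.10, recall=1.00, precision=0.81)
print(f"{reduction:.1%}")  # 87.7%
```

Under that assumption roughly 12% of results are flagged for review, which is consistent with the reduction of about 87% stated in the conclusions.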

Predicting Tumor Budding Status in Cervical Cancer Using MRI Radiomics: Linking Imaging Biomarkers to Histologic Characteristics

Cancers ◽

10.3390/cancers13205140 ◽

2021 ◽

Vol 13 (20) ◽

pp. 5140

Author(s):

Gun Oh Chong ◽

Shin-Hyung Park ◽

Nora Jee-Young Park ◽

Bong Kyung Bae ◽

Yoon Hee Lee ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Cervical Cancer ◽

Area Under The Curve ◽

Tumor Budding ◽

Training Dataset ◽

Imaging Biomarkers ◽

Support Vector ◽

Test Dataset ◽

Machine Learning Classifiers

Background: Our previous study demonstrated that tumor budding (TB) status was associated with inferior overall survival in cervical cancer. The purpose of this study is to evaluate whether radiomic features can predict TB status in cervical cancer patients. Methods: Seventy-four patients with cervical cancer who underwent preoperative MRI and radical hysterectomy from 2011 to 2015 at our institution were enrolled. The patients were randomly allocated to the training dataset (n = 48) and test dataset (n = 26). Tumors were segmented on axial gadolinium-enhanced T1- and T2-weighted images. A total of 2074 radiomic features were extracted. Four machine learning classifiers, including logistic regression (LR), random forest (RF), support vector machine (SVM), and neural network (NN), were used. The trained models were validated on the test dataset. Results: Twenty radiomic features were selected; all were features from filtered-images and 85% were texture-related features. The area under the curve values and accuracy of the models by LR, RF, SVM and NN were 0.742 and 0.769, 0.782 and 0.731, 0.849 and 0.885, and 0.891 and 0.731, respectively, in the test dataset. Conclusion: MRI-based radiomic features could predict TB status in patients with cervical cancer.
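
With a test dataset of only 26 patients, each reported accuracy corresponds to a whole number of correctly classified patients. The check below is a simple sketch that assumes accuracy was computed over all 26 test patients; the variable names are chosen here for illustration.

```python
TEST_SET_SIZE = 26

# Test-set accuracies as reported in the abstract.
reported_accuracy = {"LR": 0.769, "RF": 0.731, "SVM": 0.885, "NN": 0.731}


def implied_correct(accuracy: float, n: int) -> int:
    """Nearest whole number of correct predictions for a given accuracy."""
    return round(accuracy * n)


implied = {
    name: implied_correct(acc, TEST_SET_SIZE)
    for name, acc in reported_accuracy.items()
}
print(implied)  # {'LR': 20, 'RF': 19, 'SVM': 23, 'NN': 19}
```

One patient is worth about 3.8 percentage points of accuracy here, so the differences between the models amount to a handful of cases.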

Proposing a machine-learning based method to predict stillbirth before and during delivery and ranking the features: nationwide retrospective cross-sectional study

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-03658-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Toktam Khatibi ◽

Elham Hanifi ◽

Mohammad Mehdi Sepehri ◽

Leila Allahqoli

Keyword(s):

Machine Learning ◽

External Validation ◽

Fetal Loss ◽

Null Distribution ◽

Training Dataset ◽

Gradient Boosting ◽

Support Vector ◽

Cross Sectional ◽

Boosting Method ◽

Demographic Features

Abstract Background Stillbirth is defined by the WHO as fetal loss in pregnancy beyond 28 weeks. In this study, a machine-learning based method is proposed to predict stillbirth versus livebirth, discriminate stillbirth before delivery from stillbirth during delivery, and rank the features. Method A two-step stack ensemble (SE) classifier is proposed that classifies the instances into stillbirth and livebirth at the first step and then distinguishes stillbirth before delivery from stillbirth during labor at the second step. The proposed SE has two consecutive layers including the same classifiers. The base classifiers in each layer are decision tree, gradient boosting classifier, logistic regression, random forest and support vector machines, which are trained independently and aggregated based on the Vote boosting method. Moreover, a new feature ranking method is proposed in this study based on mean decrease accuracy, Gini Index and model coefficients to find high-ranked features. Results The IMAN registry dataset is used in this study, considering all births at or beyond the 28th gestational week from 2016/04/01 to 2017/01/01, including 1,415,623 live birth and 5502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors and socio-demographic features is considered. The experimental results show that the proposed SE outperforms the compared classifiers with an average accuracy of 90%, sensitivity of 91%, and specificity of 88%. The discrimination of the proposed SE is assessed, and an average AUC (±95% CI) of 90.51% ±1.08 and 90% ±1.12 is obtained on the training dataset for model development and the test dataset for external validation, respectively. The proposed SE is calibrated using the isotonic nonparametric calibration method with a score of 0.07. The process is repeated 10,000 times, and the AUC values of SE classifiers trained on different random training datasets are used as the null distribution.
The obtained p-value to assess the specificity of the proposed SE is 0.0126, which shows the significance of the proposed SE. Conclusions Gestational age and fetal height are the two most important features for discriminating livebirth from stillbirth. Moreover, hospital, province, delivery main cause, perinatal abnormality, miscarriage number and maternal age are the most important features for classifying stillbirth before and during delivery.
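
The two-step structure described in the method can be sketched as a cascade in which the second step runs only for records that the first step labels as stillbirth. In this minimal illustration, plain majority voting stands in for the Vote boosting aggregation named in the abstract, and the base "classifiers" are arbitrary placeholder rules on made-up features with no clinical meaning; they are not the trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def two_step_predict(record, step1_models, step2_models):
    """Step 1: livebirth vs stillbirth. Step 2 (stillbirth only): timing."""
    first = majority_vote([model(record) for model in step1_models])
    if first == "livebirth":
        return first
    return majority_vote([model(record) for model in step2_models])


# Hypothetical stand-in rules on placeholder features (illustration only).
step1 = [
    lambda r: "stillbirth" if r["feature_a"] < 0.5 else "livebirth",
    lambda r: "stillbirth" if r["feature_b"] < 0.5 else "livebirth",
    lambda r: "livebirth",
]
step2 = [
    lambda r: "before delivery" if r["feature_c"] > 0.5 else "during delivery",
    lambda r: "before delivery",
    lambda r: "during delivery",
]

record = {"feature_a": 0.2, "feature_b": 0.1, "feature_c": 0.9}
result = two_step_predict(record, step1, step2)
print(result)  # before delivery
```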

The Classification of Skateboarding Tricks : A Transfer Learning and Machine Learning Approach

Mekatronika ◽

10.15282/mekatronika.v2i2.6683 ◽

2020 ◽

Vol 2 (2) ◽

pp. 1-12

Author(s):

Muhammad Nur Aiman Shapiee ◽

Muhammad Ar Rahim Ibrahim ◽

Muhammad Amirul Abdullah ◽

Rabiu Muazu Musa ◽

Noor Azuan Abu Osman ◽

...

Keyword(s):

Machine Learning ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Olympic Games ◽

Learning Approach ◽

K Nearest Neighbor ◽

Test Dataset ◽

Machine Learning Approach ◽

Competitive Games

Skateboarding has reached new heights, particularly with its first appearance at the now-delayed Tokyo Summer Olympic Games. Given the scale of such competitive events, advanced and innovative assessment approaches have received increasing attention from relevant stakeholders, particularly out of interest in more objective evaluation. This study aims to classify skateboarding tricks, specifically Frontside 180, Kickflip, Ollie, Nollie Front Shove-it, and Pop Shove-it, by integrating image processing, Transfer Learning (TL) for feature extraction, and a traditional Machine Learning (ML) classifier. A male skateboarder performed five tricks, each type of trick repeatedly, and a YI Action camera captured the movement from a distance of 1.26 m. Features were then extracted from the image dataset by means of three TL models and subsequently classified with a k-Nearest Neighbor (k-NN) classifier. The initial experiments showed that MobileNet, NASNetMobile, and NASNetLarge coupled with optimized k-NN classifiers attained a classification accuracy (CA) of 95%, 92%, and 90%, respectively, on the test dataset. In addition, the robustness evaluation showed that the MobileNet+k-NN pipeline is more robust, as it provided a better average CA than the other pipelines. The proposed approach could classify the skateboard tricks sufficiently well and could, in the long run, support judges in making more objective decisions.
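
The final classification stage, a k-NN classifier applied to extracted feature vectors, can be illustrated in a few lines. This is a sketch under stated assumptions: the two-dimensional vectors below are toy values standing in for the much larger feature vectors a TL model would produce, and `k=3` is an arbitrary choice rather than the optimized setting from the study.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy feature vectors (hypothetical), two tricks only.
train_features = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
                  (0.9, 0.8), (1.0, 0.9), (0.8, 1.0)]
train_labels = ["Ollie", "Ollie", "Ollie",
                "Kickflip", "Kickflip", "Kickflip"]

prediction = knn_predict(train_features, train_labels, query=(0.85, 0.9), k=3)
print(prediction)  # Kickflip
```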

Distribution Grids Fault Location employing ST based Optimized Machine Learning Approach

Energies ◽

10.3390/en11092328 ◽

2018 ◽

Vol 11 (9) ◽

pp. 2328 ◽

Cited By ~ 12

Author(s):

Md Shafiullah ◽

M. Abido ◽

Taher Abdel-Fattah

Keyword(s):

Machine Learning ◽

Fault Location ◽

Percentage Error ◽

Support Vector ◽

Learning Approach ◽

Efficiency Coefficient ◽

Learning Tools ◽

Performance Indices ◽

Machine Learning Approach ◽

Distribution Grids

Precise information on fault location plays a vital role in expediting the restoration process after power distribution grids are subjected to any kind of fault. This paper proposed the Stockwell transform (ST) based optimized machine learning approach to locate the faults and to identify the faulty sections in the distribution grids. This research employed the ST to extract useful features from the recorded three-phase current signals and fed them as inputs to different machine learning tools (MLT), including the multilayer perceptron neural networks (MLP-NN), support vector machines (SVM), and extreme learning machines (ELM). The proposed approach employed the constriction-factor particle swarm optimization (CF-PSO) technique to optimize the parameters of the SVM and ELM for better generalization performance. It then compared the results obtained on the test datasets in terms of selected statistical performance indices, including the root mean squared error (RMSE), mean absolute percentage error (MAPE), percent bias (PBIAS), RMSE-observations to standard deviation ratio (RSR), coefficient of determination (R2), Willmott’s index of agreement (WIA), and Nash–Sutcliffe model efficiency coefficient (NSEC), to confirm the effectiveness of the developed fault location scheme. The satisfactory values of the statistical performance indices indicated the superiority of the optimized machine learning tools over the non-optimized tools in locating faults. In addition, this research confirmed the efficacy of the faulty section identification scheme based on overall accuracy. Furthermore, the presented results validated the robustness of the developed approach against measurement noise and uncertainties associated with pre-fault loading condition, fault resistance, and inception angle.
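
The performance indices listed above have standard textbook definitions, which the sketch below implements for paired observed and predicted values. The exact conventions used in the paper are not given in the abstract, so two points are assumptions: PBIAS is taken as observed minus predicted (sign conventions vary), and R2 is taken as the squared Pearson correlation. The fault distances are hypothetical values.

```python
import math


def performance_indices(observed, predicted):
    """Compute common regression indices for paired observed/predicted values."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sq_err = sum((o - p) ** 2 for o, p in pairs)
    sq_dev_obs = sum((o - mean_obs) ** 2 for o in observed)
    sq_dev_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum((o - mean_obs) * (p - mean_pred) for o, p in pairs)
    willmott_den = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in pairs
    )
    return {
        "RMSE": math.sqrt(sq_err / n),
        "MAPE": 100 / n * sum(abs((o - p) / o) for o, p in pairs),
        "PBIAS": 100 * sum(o - p for o, p in pairs) / sum(observed),
        "RSR": math.sqrt(sq_err / sq_dev_obs),
        "R2": covariance ** 2 / (sq_dev_obs * sq_dev_pred),
        "WIA": 1 - sq_err / willmott_den,
        "NSEC": 1 - sq_err / sq_dev_obs,
    }


# Hypothetical fault distances in km: actual versus estimated.
observed_km = [2.0, 4.0, 6.0, 8.0]
predicted_km = [2.5, 3.5, 6.5, 7.5]
indices = performance_indices(observed_km, predicted_km)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

Lower RMSE, MAPE, and RSR and values of R2, WIA, and NSEC closer to 1 indicate a better fit, which is how such indices are typically used to compare optimized and non-optimized models.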

Hybrid Machine Learning Approach for Skin Disease Detection Using Optimal Support Vector Machine

Intelligent Data Communication Technologies and Internet of Things - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-34080-3_73 ◽

2019 ◽

pp. 647-658

Author(s):

K. Melbin ◽

Y. Jacob Vetha Raj

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Skin Disease ◽

Support Vector ◽

Disease Detection ◽

Learning Approach ◽

Machine Learning Approach ◽

Hybrid Machine

KDClassifier: Urinary Proteomic Spectra Analysis Based on Machine Learning for Classification of Kidney Diseases

10.1101/2020.12.01.20242198 ◽

2020 ◽

Author(s):

Wanjun Zhao ◽

Yong Zhang ◽

Xinming Li ◽

Yonghong Mao ◽

Changwei Wu ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrum ◽

Kidney Disease ◽

Kidney Diseases ◽

Training Dataset ◽

Validation Dataset ◽

Support Vector ◽

Urinary Proteomics ◽

Diagnosis Model

Abstract Background By extracting the spectrum features from urinary proteomics based on an advanced mass spectrometer and machine learning algorithms, more accurate reporting results can be achieved for disease classification. We attempted to establish a novel diagnosis model of kidney diseases by combining machine learning with an extreme gradient boosting (XGBoost) algorithm with complete mass spectrum information from the urinary proteomics. Methods We enrolled 134 patients (including those with IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as a control, and for training and validation of the diagnostic model, applied a total of 610,102 mass spectra from their urinary proteomics produced using high-resolution mass spectrometry. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was directly used to create a diagnosis model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix. We also constructed the receiver operating-characteristic, Lorenz, and gain curves to evaluate the diagnosis model. Results Compared with RF, the SVM, and ANNs, the modified XGBoost model, called a Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the diagnostic XGBoost model was 96.03% (CI = 95.17%-96.77%; Kappa = 0.943; McNemar’s Test, P value = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions This study presents the first XGBoost diagnosis model, i.e., the KDClassifier, combined with complete mass spectrum information from the urinary proteomics for distinguishing different kidney diseases.
KDClassifier achieves a high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.
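
The Kappa value reported alongside accuracy is Cohen's kappa, which corrects the observed agreement in a confusion matrix for the agreement expected by chance. The sketch below shows the calculation on a small made-up two-class matrix; the counts are hypothetical and are not the study's results.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 85% observed agreement, 50% expected by chance.
confusion = [
    [45, 5],
    [10, 40],
]
kappa = cohens_kappa(confusion)
print(round(kappa, 3))  # 0.7
```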

A machine learning approach to predict pancreatic islet grafts rejection versus tolerance

PLoS ONE ◽

10.1371/journal.pone.0241925 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241925

Author(s):

Gerardo A. Ceballos ◽

Luis F. Hernandez ◽

Daniel Paredes ◽

Luis R. Betancourt ◽

Midhat H. Abdulreda

Keyword(s):

Machine Learning ◽

Pancreatic Islet ◽

Support Vector ◽

Medical Diagnoses ◽

Laser Induced Fluorescence Detection ◽

Classification Score ◽

New Information ◽

Islet Allografts ◽

Machine Learning Approach ◽

Positive Classification

The application of artificial intelligence (AI) and machine learning (ML) in biomedical research promises to unlock new information from the vast amounts of data being generated through the delivery of healthcare and the expanding high-throughput research applications. Such information can aid medical diagnoses and reveal various unique patterns of biochemical and immune features that can serve as early disease biomarkers. In this report, we demonstrate the feasibility of using an AI/ML approach in a relatively small dataset to discriminate among three categories of samples obtained from mice that either rejected or tolerated their pancreatic islet allografts following transplant in the anterior chamber of the eye, and from naïve controls. We created a locked software based on a support vector machine (SVM) technique for pattern recognition in electropherograms (EPGs) generated by micellar electrokinetic chromatography and laser induced fluorescence detection (MEKC-LIFD). Predictions were made based only on the aligned EPGs obtained in microliter-size aqueous humor samples representative of the immediate local microenvironment of the islet allografts. The analysis identified discriminative peaks in the EPGs of the three sample categories. Our classifier software was tested with targeted and untargeted peaks. Working with the patterns of untargeted peaks (i.e., based on the whole pattern of EPGs), it was able to achieve a 21 out of 22 positive classification score with a corresponding 95.45% prediction accuracy among the three sample categories, and 100% accuracy between the rejecting and tolerant recipients. These findings demonstrate the feasibility of AI/ML approaches to classify small numbers of samples and they warrant further studies to identify the analytes/biochemicals corresponding to discriminative features as potential biomarkers of islet allograft immune rejection and tolerance.
