A Classifier to Detect Informational vs. Non-Informational Heart Attack Tweets

Ola Karajeh; Dirar Darweesh; Omar Darwish; Noor Abu-El-Rub; Belal Alsinglawi; Nasser Alsaedi

doi:10.3390/fi13010019

Future Internet ◽

10.3390/fi13010019 ◽

2021 ◽

Vol 13 (1) ◽

pp. 19

Author(s):

Ola Karajeh ◽

Dirar Darweesh ◽

Omar Darwish ◽

Noor Abu-El-Rub ◽

Belal Alsinglawi ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Heart Attack ◽

Machine Learning Algorithms ◽

Support Vector ◽

Heart Attacks ◽

Learning Classifier ◽

Important Concern ◽

Starting Point ◽

First Time

Social media sites are considered one of the most important sources of data in many fields, such as health, education, and politics. While surveys provide explicit answers to specific questions, posts on social media contain the same answers implicitly in their text. This research aims to develop a method for extracting implicit answers from large tweet collections, and to demonstrate this method for an important concern: the problem of heart attacks. The approach is to collect tweets containing “heart attack” and then select from those the ones with useful information. Informational tweets are those which express real heart attack issues, e.g., “Yesterday morning, my grandfather had a heart attack while he was walking around the garden.” On the other hand, there are non-informational tweets such as “Dropped my iPhone for the first time and almost had a heart attack.” The starting point was to manually classify around 7000 tweets as either informational (11%) or non-informational (89%), thus yielding a labeled dataset to use in devising a machine learning classifier that can be applied to our large collection of over 20 million tweets. Tweets were cleaned and converted to a vector representation suitable for feeding into different machine learning algorithms: deep neural networks, support vector machine (SVM), J48 decision tree and naïve Bayes. Our experimentation aimed to find the best algorithm to use to build a high-quality classifier. This involved splitting the labeled dataset, with 2/3 used to train the classifier and 1/3 used for evaluation, in addition to cross-validation. The deep neural network (DNN) classifier obtained the highest accuracy (95.2%). It also obtained the highest F1-scores, 73.6% and 97.4%, for the informational and non-informational classes, respectively.
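The abstract reports per-class F1-scores alongside accuracy, which matters for an 11% / 89% class split. The following minimal Python sketch is not the authors' code; the labels and the always-majority baseline are hypothetical toy data. It shows how a trivial classifier can reach 89% accuracy on such data while scoring an F1 of zero on the informational class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels mirroring the 11% / 89% split in the abstract.
y_true = ["info"] * 11 + ["non"] * 89
# A trivial baseline that always predicts the majority class.
y_majority = ["non"] * 100

print("accuracy:", accuracy(y_true, y_majority))
print("F1 (informational):", per_class_f1(y_true, y_majority, "info"))
print("F1 (non-informational):", per_class_f1(y_true, y_majority, "non"))
```

Under these assumptions the accuracy alone looks strong, which is why the informational-class F1-score is the more telling figure for a minority class.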

Fault detection for air conditioning system using machine learning

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i1.pp109-116 ◽

2020 ◽

Vol 9 (1) ◽

pp. 109

Author(s):

Noor Asyikin Sulaiman ◽

Md Pauzi Abdullah ◽

Hayati Abdullah ◽

Muhammad Noorazlan Shah Zainudin ◽

Azdiana Md Yusop

Keyword(s):

Machine Learning ◽

Supervised Learning ◽

Air Conditioning ◽

Machine Learning Algorithms ◽

Coefficient Of Performance ◽

Support Vector ◽

Air Conditioning System ◽

Learning Classifier ◽

Negative Impacts ◽

The Impact

An air conditioning system is a complex system and consumes the most energy in a building. Any fault in the system’s operation, such as a faulty cooling tower fan, a compressor failure or a stuck damper, could lead to energy wastage and a reduction in the system’s coefficient of performance (COP). Due to the complexity of the air conditioning system, detecting those faults is hard, as it requires exhaustive inspections. This paper has two parts: i) to investigate the impact of different faults in the air conditioning system on COP and ii) to analyse the performance of machine learning algorithms in classifying those faults. Three supervised learning classifier models were developed: deep learning, support vector machine (SVM) and multi-layer perceptron (MLP). The performance of each classifier was investigated in terms of six different classes of faults. Results showed that different faults have different negative impacts on the COP. Also, the three supervised learning classifier models were able to classify all faults with more than 94% accuracy, and MLP produced the highest accuracy and precision of the three.
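For readers unfamiliar with COP, a cooling system's coefficient of performance is conventionally the cooling delivered divided by the work (here, electrical power) put in. The short Python sketch below uses hypothetical numbers, not values from the paper, to illustrate how a fault that raises power draw for the same cooling load lowers the COP.

```python
def coefficient_of_performance(cooling_kw, power_input_kw):
    """COP of a cooling system: cooling delivered per unit of power consumed."""
    if power_input_kw <= 0:
        raise ValueError("power input must be positive")
    return cooling_kw / power_input_kw


# Hypothetical operating points (illustrative only, not from the paper).
healthy_cop = coefficient_of_performance(cooling_kw=350.0, power_input_kw=100.0)
faulty_cop = coefficient_of_performance(cooling_kw=350.0, power_input_kw=125.0)

print("healthy COP:", healthy_cop)
print("faulty COP:", faulty_cop)
```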

Cyber Bullying Detection for Twitter Using ML Classification Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38701 ◽

2021 ◽

Vol 9 (11) ◽

pp. 24-29

Author(s):

Muskan Patidar

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Cyber Bullying ◽

Machine Learning Algorithms ◽

Support Vector ◽

Classification Algorithms

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes the form of hate messages sent through social media and emails. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. We propose a possible solution to this problem: our project aims to detect cyberbullying in tweets using ML classification algorithms such as Naïve Bayes, KNN, Decision Tree, Random Forest and Support Vector Machine, among others. We will also apply NLTK (Natural Language Toolkit) unigram, bigram, trigram and n-gram features to Naïve Bayes to check its accuracy. Finally, we will compare the results of the proposed and baseline features with other machine learning algorithms. The findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyber bullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit
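The unigram, bigram and trigram features mentioned above are contiguous runs of one, two or three tokens. The sketch below builds them in plain Python so it runs without dependencies; NLTK offers a comparable helper (`nltk.util.ngrams`), and the whitespace tokenizer and example sentence here are simplifying assumptions rather than the paper's preprocessing.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical example text; real tweets would need proper tokenization.
tokens = "nobody likes you at all".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```

Each tuple can then be counted and used as a feature for a classifier such as Naïve Bayes.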

Detection of misinformation on garlic and COVID-19 in Twitter: A machine learning-based approach (Preprint)

10.2196/preprints.33056 ◽

2021 ◽

Author(s):

Myeong Gyu Kim ◽

Jae Hyun Kim ◽

Kyungim Kim

Keyword(s):

Machine Learning ◽

Social Media ◽

Latent Dirichlet Allocation ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Polynomial Kernel ◽

Support Vector ◽

Accurate Information ◽

Probability Number

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. Again, with the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation in large volumes of tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation on garlic and COVID-19 on Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, and others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). A model with the highest accuracy in the training dataset (70% of overall dataset) was tested using a test dataset (30% of overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS SVM with the polynomial kernel model showed the highest accuracy of 0.670. The model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696 for misinformation. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. It could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
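As a quick check on the reported metrics, balanced accuracy for a single class is commonly defined as the mean of sensitivity and specificity. Assuming that definition applies here (the abstract does not state it), the reported sensitivity of 0.819 and specificity of 0.696 give (0.819 + 0.696) / 2 = 0.7575, in line with the reported 0.757 once rounding of the inputs is allowed for. A minimal Python sketch:

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity for one class (one-vs-rest)."""
    return (sensitivity + specificity) / 2


# Values reported in the abstract for the misinformation class.
value = balanced_accuracy(sensitivity=0.819, specificity=0.696)
print(value)
```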

MACHINE LEARNING ALGORITHMS FOR IDENTIFICATION OF ABNORMAL GLOW CURVES AND ASSOCIATED ABNORMALITY IN CaSO4:DY-BASED PERSONNEL MONITORING DOSIMETERS

Radiation Protection Dosimetry ◽

10.1093/rpd/ncaa108 ◽

2020 ◽

Vol 190 (3) ◽

pp. 342-351

Author(s):

Munir S Pathan ◽

S M Pradhan ◽

T Palani Selvam

Keyword(s):

Machine Learning ◽

Glow Curve ◽

Good Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Computationally Efficient ◽

Artificial Neural Network Ann ◽

First Time

Abstract In the present study, machine learning (ML) methods for the identification of abnormal glow curves (GC) of CaSO4:Dy-based thermoluminescence dosimeters in individual monitoring are presented. The classifier algorithms random forest (RF), artificial neural network (ANN) and support vector machine (SVM) are employed to identify not only the abnormal glow curve but also the type of abnormality. For the first time, a simple and computationally efficient algorithm based on RF is presented for GC classification. About 4000 GCs are used for the training and validation of the ML algorithms. The performance of all algorithms is compared using various parameters. Results show a fairly good accuracy of 99.05% for the classification of GCs by the RF algorithm, whereas 96.7% and 96.1% accuracy is achieved using ANN and SVM, respectively. The RF-based classifier is recommended for GC classification as well as for assisting fault determination in the TLD reader system.

Sentiment Analysis of Impact of Technology on Employment from Text on Twitter

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v14i07.10600 ◽

2020 ◽

Vol 14 (07) ◽

pp. 88

Author(s):

Shahzad Qaiser ◽

Nooraini Yusoff ◽

Farzana Kabir Ahmad ◽

Ramsha Ali

Keyword(s):

Machine Learning ◽

Social Media ◽

Social Issues ◽

Support Vector ◽

Ripple Effect ◽

Learning Classifier ◽

The People ◽

Impact Of Technology ◽

Negative Sentiment ◽

The Impact

Many studies are in progress to analyze the content created by users on social media because of its influence and social ripple effect. Content created on social media carries information and users’ sentiments about social issues. This study aims to analyze people’s sentiments about the impact of technology on employment and about advancements in technology, and to build a machine learning classifier to classify those sentiments. People are becoming nervous and depressed, and some even commit suicide, because of unemployment; hence, it is essential to explore this relatively new area of research. The study has two main objectives: 1) to preprocess text collected from Twitter concerning the impact of technology on employment and analyze its sentiment, and 2) to evaluate the performance of the Naïve Bayes (NB) machine learning classifier on the text. To achieve this, a methodology is proposed that includes 1) data collection and preprocessing, 2) sentiment analysis, 3) building a machine learning classifier and 4) comparing the performance of NB and support vector machine (SVM). NB and SVM achieved 87.18% and 82.05% accuracy, respectively. The study found that 65% of people hold a negative sentiment regarding the impact of technology on employment and technological advancements; hence, people must acquire new skills to minimize the effect of structural unemployment.

Detecting Spam Messages in Twitter Data by Machine learning Algorithms using Cross Validation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1913.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2941-2946

Keyword(s):

Machine Learning ◽

Social Media ◽

Cross Validation ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Human Relations ◽

Detection Model ◽

Social Media Networks ◽

Twitter Data

Nowadays, human relations are maintained through social media networks, and traditional relationships are now obsolete. We use social media networking sites such as Twitter, Facebook and LinkedIn to stay in touch, share ideas and exchange knowledge. Through Twitter, users share their opinions, interests and knowledge with others in messages. At the same time, some users mislead genuine users. The genuine users are also called solicited users, and the users who mislead them are called spammers. Spammers post unwanted information to non-spam users, who may retweet it to others and follow the spammers. To avoid these spam messages, we propose a methodology using machine learning algorithms. Our approach uses a set of content-based features. In the spam detection model, we used the support vector machine (SVM) algorithm and the naive Bayes classification algorithm. To measure the performance of our model, we used the precision, recall and F-measure metrics.
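The paper's title refers to cross-validation, which the abstract does not spell out. The Python sketch below shows one common variant, k-fold splitting over contiguous index blocks without shuffling. The function name and fold layout are illustrative assumptions, not the authors' procedure.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Each sample is used for testing exactly once across the k folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data would usually be shuffled (or stratified by class) before splitting, and precision, recall and F-measure would be averaged over the folds.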

Sentiment Analysis on Social Media using Machine Learning Approach

10.22541/au.163620143.37655829/v1 ◽

2021 ◽

Author(s):

Erick Omuya ◽

George Okeyo ◽

Michael Kimwele

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approach ◽

K Nearest Neighbor ◽

Machine Learning Approach

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos on Twitter, Facebook and other social media, which they then share. Social media therefore generates a lot of data that is rich in sentiments from these updates. Sentiment analysis has been used to determine the opinions of clients, for instance, relating to a particular product or company. The knowledge-based approach and the machine learning approach are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains and the size of the data used for training and testing. This research aims at developing a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing and performing sentiment analysis on data from social media. The model is tested using the Naïve Bayes, Support Vector Machine and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.

A Survey on Prediction of Suicidal Ideation Using Machine and Ensemble Learning

The Computer Journal ◽

10.1093/comjnl/bxz120 ◽

2019 ◽

Cited By ~ 1

Author(s):

Akshma Chadha ◽

Baijnath Kaushik

Keyword(s):

Machine Learning ◽

Social Media ◽

Suicidal Ideation ◽

Ensemble Learning ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Social Networking Site ◽

Machine Learning Algorithms ◽

Support Vector

Abstract Suicide is a major health issue nowadays and has become one of the leading causes of death. Many negative emotions, such as anxiety, depression and stress, can lead to suicide. By identifying individuals with suicidal ideation beforehand, the risk of them completing suicide can be reduced. Social media is increasingly becoming a powerful platform where people around the world share emotions and thoughts. Moreover, this platform in some ways acts as a catalyst for invoking and inciting suicidal ideation. The objective of this proposal is to use social media as a tool that can aid in preventing it. Data is collected from Twitter, a social networking site, using features that are related to suicidal ideation. The tweets are preprocessed according to the semantics of the identified features and then converted into probabilistic values so that they can be used by machine learning and ensemble learning algorithms. Different machine learning algorithms, such as Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Decision Tree, Logistic Regression and Support Vector Machine, were applied to the data to predict and identify trends of suicidal ideation. The proposed work is further evaluated with ensemble approaches such as Random Forest, AdaBoost and Voting Ensemble to measure the improvement.
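Of the ensemble approaches listed, a voting ensemble is the simplest to illustrate: each base classifier predicts a label and the most frequent label wins. The Python sketch below shows hard (majority) voting over hypothetical predictions from three classifiers. It is an illustration of the general technique, not the configuration used in the survey.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists (equal length) by hard majority vote."""
    voted = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count for this sample.
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers on four tweets
# (1 = suicidal ideation, 0 = no ideation).
naive_bayes = [1, 0, 1, 0]
decision_tree = [1, 1, 0, 0]
logistic_regression = [0, 1, 1, 0]

ensemble = majority_vote([naive_bayes, decision_tree, logistic_regression])
print(ensemble)
```

With an odd number of binary classifiers, as here, ties cannot occur; other setups would need an explicit tie-breaking rule.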

Redis-Based Messaging Queue and Cache-Enabled Parallel Processing Social Media Analytics Framework

The Computer Journal ◽

10.1093/comjnl/bxaa114 ◽

2020 ◽

Author(s):

Ravindra Kumar Singh ◽

Harsh Kumar Verma

Keyword(s):

Machine Learning ◽

Social Media ◽

Real Time ◽

Data Analytics ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Social Media Analytics ◽

Data Engineering ◽

Extreme Gradient Boosting

Abstract The extensive use of social media polarity analysis calls for real-time analytics and runtime outcomes on dashboards. In data analytics, only 30% of the time is spent on the modeling and evaluation stages, and 70% is spent on data engineering tasks. Many machine learning algorithms can achieve a desirable outcome from a prediction point of view, but they fall short in handling and transforming data, the so-called data engineering tasks, and reducing the time these tasks take remains challenging. The contribution of this research paper is to address these challenges by presenting a parallel, scalable, effective, responsive and fault-tolerant framework that performs end-to-end data analytics tasks in both real-time and batch-processing modes. An experimental analysis on Twitter posts supported these claims and demonstrated the benefits of parallelism among data processing units. This research highlighted the importance of processing the URLs mentioned and the images embedded in a post, along with the post content, to boost prediction efficiency. Furthermore, this research provided a comparison of naive Bayes, support vector machines, extreme gradient boosting and long short-term memory (LSTM) machine learning techniques for sentiment analysis on Twitter posts and concluded that LSTM is the most effective technique in this regard.
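The general pattern behind a message queue combined with a cache and parallel workers can be sketched in a few lines. The Python example below is only an in-process analogy: a standard-library `queue.Queue` stands in for the Redis message queue, a dictionary stands in for the Redis cache, and `score_post` is a hypothetical word-list scorer, not the paper's models or framework.

```python
import queue
import threading


def score_post(text):
    """Hypothetical stand-in for an expensive sentiment model call."""
    positive = {"good", "great", "love"}
    negative = {"bad", "awful", "hate"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def run_workers(posts, n_workers=4):
    """Score posts in parallel, reusing cached scores for repeated text."""
    tasks = queue.Queue()   # stands in for the message queue
    cache = {}              # stands in for the shared cache
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:        # sentinel: no more work for this worker
                break
            idx, text = item
            with lock:
                cached = cache.get(text)
            if cached is None:
                cached = score_post(text)
                with lock:
                    cache[text] = cached
            with lock:
                results[idx] = cached

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in enumerate(posts):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(posts))]


print(run_workers(["I love this", "awful bad day", "I love this"]))
```

A real deployment would replace the in-process queue and dictionary with networked services so that workers can run on separate machines.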

Hoax News Classification using Machine Learning Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3753.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3938-3944

Keyword(s):

Machine Learning ◽

Social Media ◽

Learning Algorithm ◽

Detection System ◽

Machine Learning Algorithms ◽

Training Data ◽

Stochastic Gradient Descent ◽

Support Vector ◽

The Impact ◽

F Measure

Hoax news on social media has had a dramatic effect on our society in recent years. The impact of hoax news is felt by many people as anxiety, financial loss and damage to reputation. Therefore, we need a detection system that can help reduce hoax news on social media. Hoax news classification is one of the stages in the construction of a hoax news detection system: an unsupervised learning algorithm is used as a method for creating hoax news datasets, machine learning tools are used for data processing, and text processing is used for detecting data. The system then classifies the input text as a hoax or not a hoax. Hoax news classification in this study uses the following algorithms: Support Vector Machine, Naïve Bayes, Decision Tree, Logistic Regression, Stochastic Gradient Descent, and Neural Network (MLP). These algorithms are compared to find the best one for detecting hoax news, that is, the one with the highest accuracy, F-measure, precision and recall. Testing of the classification algorithms shows that the NN-MLP algorithm averages 93% for accuracy, F-measure and precision, the highest compared to the five other algorithms, while the highest recall, 94%, is obtained by the SVM algorithm. The results of this experiment show that different classifiers produce different effects, and suggest that the more hoax data is used as training data, the more accurately the system classifies.
