Ensemble Learning Approach for Clickbait Detection Using Article Headline Features

10.28945/4319 ◽  
2019 ◽  

[This Proceedings paper was revised and published in the 2019 issue of the journal Informing Science: The International Journal of an Emerging Transdiscipline, Volume 22] Aim/Purpose: The aim of this paper is to propose an ensemble-learner-based classification model for distinguishing clickbaits from genuine article headlines. Background: Clickbaits are online articles with deliberately misleading titles designed to lure more and more readers into opening the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception. Methodology: A total of one hundred thousand article headlines, consisting of clickbait and authentic news headlines, were collected from news aggregator sites. The collected data samples were divided into five training sets of balanced and unbalanced data. Natural language processing techniques were used to extract 19 manually selected features from the article headlines. Contribution: Three ensemble learning techniques, including bagging, boosting, and random forests, were used to design a classifier model for classifying a given headline as clickbait or non-clickbait. The performance of the learners was evaluated using accuracy, precision, recall, and F-measure. Findings: It was observed that the random forest classifier detects clickbaits better than the other classifiers, with an accuracy of 91.16% and an overall precision, recall, and F-measure of 91%.

Author(s):  
Dilip Singh Sisodia
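The abstract above mentions 19 manually selected headline features extracted with NLP techniques but does not list them. A minimal sketch of the kind of hand-crafted features commonly used for clickbait detection; the feature names here are illustrative assumptions, not the paper's actual set:

```python
import re

# Hedged sketch: a few hand-crafted headline features of the kind the
# paper describes. The actual 19 features are not listed in the
# abstract, so these are illustrative assumptions only.
CLICKBAIT_PRONOUNS = {"you", "your", "i", "we", "this", "these"}

def headline_features(headline: str) -> dict:
    words = re.findall(r"[A-Za-z']+", headline.lower())
    return {
        "num_words": len(words),
        "starts_with_number": bool(re.match(r"\s*\d", headline)),
        "has_question_mark": "?" in headline,
        "pronoun_count": sum(w in CLICKBAIT_PRONOUNS for w in words),
        "all_caps_words": sum(t.isupper() and len(t) > 1 for t in headline.split()),
    }

print(headline_features("10 Things You Won't Believe About Cats"))
```

A feature dictionary like this would be vectorized and fed to the bagging, boosting, or random forest learners.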



2019 ◽  
Vol 8 (3) ◽  
pp. 6077-6081 ◽  

Plant disease identification and classification is a major area of research, as the majority of people in India depend on agriculture as their main source of income and food. Identifying diseases in crops is challenging, since the manual identification techniques currently in use rely on expert advice, which may not be efficient. Decisions about the variety of disease are made based on leaf features. In this paper an automated framework is introduced that can be used to detect and classify leaf diseases accurately. Leaf images are acquired using a digital camera. Pre-processing, segmentation, and feature extraction are performed on the acquired images. The extracted features are passed to the classifiers to classify the diseases. This work classifies and distinguishes leaf samples based on their features. The proposed work is carried out with Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naive Bayes classifiers to analyze the results. For the given dataset, the ANN performed better than the other two classifiers.
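The classifier comparison described above can be illustrated with a hedged scikit-learn sketch. The feature vectors here are purely synthetic stand-ins; real inputs would be features extracted from segmented leaf images:

```python
# Hedged sketch: comparing the three classifiers named in the abstract
# on synthetic feature vectors. The data below is generated, not real
# leaf-image features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, clf in [("ANN", MLPClassifier(max_iter=1000, random_state=0)),
                  ("SVM", SVC()),
                  ("Naive Bayes", GaussianNB())]:
    scores[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {scores[name]:.2f}")
```

On real leaf features the relative ranking would of course depend on the dataset, as the abstract's finding (ANN best) is specific to its data.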


2013 ◽  
pp. 1379-1394
Author(s):  
Xiangrong Zhou ◽  
Hiroshi Fujita

The location of an inner organ in a CT image is basic information required for medical image analysis tasks such as image segmentation, lesion detection, content-based image retrieval, and anatomical annotation. A general approach/scheme for localizing different inner organs, adaptable to various types of medical image formats, is required. However, this is a very challenging problem that can hardly be solved using traditional image processing techniques. This chapter introduces an ensemble-learning-based approach to solving organ localization problems. The approach can generate a fast and efficient organ-localization scheme from a limited number of training samples that include both original images and target locations. It has been used to localize five different human organs in CT images, and the accuracy, robustness, and computational efficiency of the designed scheme were validated by experiments.
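The abstract does not give the detector design, but one generic ingredient of ensemble-based localization is combining many weak detectors' location votes robustly. A minimal sketch, assuming each ensemble member votes for a candidate organ centre:

```python
import statistics

# Hedged sketch: robust (median) combination of candidate organ centres
# voted by many weak detectors. The chapter's actual detector design is
# more involved than this.
def combine_votes(votes):
    """votes: list of (x, y, z) candidate organ centres in voxel units."""
    return tuple(statistics.median(axis) for axis in zip(*votes))

# The outlier vote (50, 20, 31) barely shifts the median estimate.
print(combine_votes([(10, 20, 30), (11, 22, 29), (50, 20, 31)]))  # (11, 20, 30)
```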


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1322
Author(s):  
Wilfredo Graterol ◽  
Jose Diaz-Amado ◽  
Yudith Cardinale ◽  
Irvin Dongo ◽  
Edmundo Lopes-Silva ◽  
...  

For social robots, knowledge of human emotional states is essential for adapting their behavior or associating emotions with other entities. Robots gather the information from which emotions are detected via different media, such as text, speech, images, or videos. The multimedia content is then processed to recognize emotions/sentiments, for example, by analyzing faces and postures in images/videos with machine learning techniques, or by converting speech into text to perform emotion detection with natural language processing (NLP) techniques. Keeping this information in semantic repositories offers a wide range of possibilities for implementing smart applications. We propose a framework that allows social robots to detect emotions and store this information in a semantic repository based on EMONTO (an EMotion ONTOlogy), an ontology to represent emotions. As a proof of concept, we develop a first version of this framework focused on emotion detection in text, which can be obtained directly as text or by converting speech to text. We tested the implementation with a case study of tour-guide robots for museums that relies on a speech-to-text converter based on the Google Application Programming Interface (API) and a Python library, a neural network to label the emotions in texts based on NLP transformers, and EMONTO integrated with an ontology for museums; thus, it is possible to register the emotions that artworks produce in visitors. We evaluated the classification model, obtaining results equivalent to those of a state-of-the-art transformer-based model, with a clear roadmap for improvement.
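As a toy stand-in for the EMONTO-backed semantic repository, registering the emotions artworks evoke in visitors could be sketched as below. The class and method names are illustrative, not from the paper, and a real implementation would write RDF triples rather than in-memory counters:

```python
from collections import Counter, defaultdict

# Hedged sketch: an in-memory stand-in for the semantic repository that
# records which emotions each artwork produces in visitors.
class EmotionRepository:
    def __init__(self):
        self._records = defaultdict(Counter)

    def register(self, artwork: str, emotion: str) -> None:
        """Record one visitor's detected emotion for an artwork."""
        self._records[artwork][emotion] += 1

    def dominant_emotion(self, artwork: str):
        """Most frequently registered emotion, or None if unseen."""
        top = self._records[artwork].most_common(1)
        return top[0][0] if top else None

repo = EmotionRepository()
repo.register("Mona Lisa", "joy")
repo.register("Mona Lisa", "joy")
repo.register("Mona Lisa", "surprise")
print(repo.dominant_emotion("Mona Lisa"))  # joy
```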


Sentiment analysis combines natural language processing and text analysis to predict the sentiment of a text in terms of positive and negative comments. Nowadays, a tremendous volume of news originates from different web pages, and it is feasible to determine the opinion expressed in particular news items. This work evaluates various machine learning techniques for classifying the sentiment of news headlines. We propose the application of a Recurrent Neural Network with a Long Short-Term Memory unit (LSTM), focus on finding similar news headlines, and predict the opinion of news headlines from numerous sources. The main objective is to classify the sentiment of news headlines from various sources using a recurrent neural network. Interestingly, the proposed attention mechanism performs better than a more complex attention mechanism on a held-out set of articles.


2019 ◽  
pp. 016555151987182
Author(s):  
Abinash Pujahari ◽  
Dilip Singh Sisodia

Clickbaits are online articles with deliberately misleading titles designed to lure more and more readers into opening the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetise the landing page or to spread false news for sensationalisation. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Automatic detection of clickbait headlines among news headlines has been a challenging issue for the machine learning community, and many methods have been proposed for filtering out clickbait articles in the recent past. However, the recent techniques available for detecting clickbaits are not very robust. This article proposes a hybrid categorisation technique for separating clickbait and non-clickbait articles by integrating different features, sentence structure and clustering. During preliminary categorisation, the headlines are separated using 11 features. The headlines are then recategorised using sentence formality and syntactic similarity measures. In the last phase, the headlines are recategorised once more by applying clustering using word vector similarity based on the t-distributed stochastic neighbour embedding (t-SNE) approach. After categorisation of the headlines, machine learning models are applied to the dataset to evaluate machine learning algorithms. The experimental results indicate that the proposed hybrid model is more robust, reliable and efficient than any of the individual categorisation techniques on the dataset used.
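The abstract does not define its syntactic similarity measure. As a hedged stand-in, a simple token-overlap (Jaccard) similarity between two headlines might look like:

```python
# Hedged sketch: token-overlap (Jaccard) similarity as a simple
# stand-in for the paper's unspecified syntactic similarity measure.
def jaccard_similarity(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

print(jaccard_similarity("you won't believe this trick",
                         "this trick will shock you"))  # 3/7 ≈ 0.4286
```

A measure like this could feed the recategorisation phase, grouping headlines whose pairwise similarity exceeds a threshold.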


Author(s):  
Sushil Shrestha

This paper is a survey of work done in the field of web page content visualization. The paper starts with a summary of the semantic graph. It then describes the process of generating a semantic graph using natural language processing and machine learning techniques, followed by enriching text with RDF/OWL-encoded senses as an enhancement of the existing Enrycher text conversion, and ends with possible future directions and a conclusion. DOI: http://dx.doi.org/10.3126/kuset.v8i1.6052 KUSET 2012; 8(1): 125-133


Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Guillermo Hernández ◽  
Pablo Chamoso ◽  
Juan Manuel Corchado

Maintaining a healthy cyber society is a big challenge due to users’ freedom of expression and behavior. It can be addressed by monitoring and analyzing users’ behavior and taking proper actions towards them. This research presents a platform that monitors public content on Twitter by extracting tweet data. After collecting the data, users’ interactions are analyzed using graph analysis methods. Users’ behavioral patterns are then analyzed by applying metadata analysis, in which the timeline of each profile is obtained and the time-series behavioral features of users are investigated. Next, in the Abnormal Behavior Detection Filtering component, profiles of interest are selected for further examination. Finally, in the Contextual Analysis component, the content is analyzed using natural language processing techniques: a binary text classification model (SVM + TF-IDF, with 88.89% accuracy) detects whether a tweet is crime-related or not. A sentiment analysis method is then applied to the crime-related tweets to perform aspect-based sentiment analysis (DistilBERT + FFNN, with 80% accuracy), because sharing positive opinions about a crime-related topic can threaten society. This platform aims to provide the end user (police) with suggestions for controlling hate speech or terrorist propaganda.
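The TF-IDF weighting step behind the "SVM + TF-IDF" classifier can be sketched in plain Python. This is a minimal illustration with invented example documents, not the platform's implementation, and the SVM itself is omitted:

```python
import math
from collections import Counter

# Hedged sketch: TF-IDF weighting. tf is term frequency within a
# document; idf = log(n / document frequency) down-weights terms that
# appear in many documents.
def tfidf(docs):
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc.split()))
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        vectors.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["robbery reported downtown",
        "great day downtown",
        "robbery suspect arrested"]
vecs = tfidf(docs)
print(vecs[1]["great"] > vecs[0]["downtown"])  # rarer terms weigh more: True
```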


Sentiment analysis is a field that deals with assessing the sentiments or emotions of users towards products and services. It takes user comments as input and applies natural language processing techniques to identify the mood of the user. Usually a sentiment is deemed positive, negative, or neutral depending on the mood expressed in the comments or feedback. It is widely used by businesses to improve products and services and to present customers with a set of products and services based on their likes and dislikes. The state of the art indicates that many techniques have been applied in the past, such as linear regression and SVM models. Recurrent Neural Networks (RNNs) have improved the accuracy with which sentiment analysis can be done, but they suffer from a major drawback when applied to longer sentences. This paper proposes a sentiment analysis model using a Long Short-Term Memory (LSTM) based approach, a variant of RNNs. LSTMs are good at handling long sentence data. The model is applied to reviews collected from the IMDB dataset, a large dataset containing 50K reviews. Of the available reviews, 50% are used for training and 50% for testing. The model gives a training accuracy of 92% and a validation accuracy of 85%, which is neither an overfit nor an underfit. The overall accuracy of 85% appears better than some existing techniques, such as SVM with a linear kernel.
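The gating that lets LSTMs carry information across long sentences can be shown with a single LSTM cell step in scalar form. This is a didactic sketch of the standard cell equations, not the paper's model:

```python
import math

# Hedged sketch: one scalar LSTM cell step. The forget gate f decides
# how much of the old cell state c survives; the input gate i decides
# how much new candidate g is written; the output gate o controls what
# is exposed as the hidden state.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """W maps each gate name to (input weight, hidden weight, bias)."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate
    c_new = f * c + i * g         # cell state carries long-range memory
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# With all-zero weights every gate is 0.5, so the cell state is halved.
W0 = {gate: (0.0, 0.0, 0.0) for gate in ("i", "f", "o", "g")}
print(lstm_step(1.0, 0.0, 2.0, W0))
```

Because the cell state is updated additively (f * c + i * g) rather than through repeated squashing, gradients decay far more slowly over long sequences than in a plain RNN.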


2021 ◽  
Vol 33 (6) ◽  
pp. 0-0

Short text classification is a research focus in natural language processing (NLP), widely used in news classification, sentiment analysis, mail filtering, and other fields. In recent years, deep learning techniques have been applied to text classification and have made some progress. Unlike ordinary text classification, short text suffers from a small vocabulary and feature sparsity, which places higher demands on semantic feature representation. To address this issue, this paper proposes a feature fusion framework based on Bidirectional Encoder Representations from Transformers (BERT). In this hybrid method, BERT is used to train word vector representations and a convolutional neural network (CNN) captures static features. As a supplement, a bidirectional gated recurrent neural network (BiGRU) is adopted to capture contextual features. Furthermore, an attention mechanism is introduced to assign weights to salient words. The experimental results confirm that the proposed model significantly outperforms other state-of-the-art baseline methods.
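The attention step described above, assigning weights to salient words, can be sketched as a softmax over per-word scores followed by a weighted sum of the word feature vectors. A minimal illustration, not the paper's architecture:

```python
import math

# Hedged sketch of word-level attention: softmax turns per-word
# salience scores into weights that sum to 1, then the word vectors
# are combined as a weighted sum.
def softmax(scores):
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word_vectors, scores):
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
            for d in range(dim)]

# Equal scores give equal weights, so the result is the mean vector.
print(attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

In the paper's framework the scores themselves would be learned from the BERT/CNN/BiGRU features, so salient words receive larger weights.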

