Ensemble Learning Approach for Clickbait Detection Using Article Headline Features

10.28945/4319 ◽  
2019 ◽  

[This Proceedings paper was revised and published in the 2019 issue of the journal Informing Science: The International Journal of an Emerging Transdiscipline, Volume 22] Aim/Purpose: The aim of this paper is to propose an ensemble-learner-based classification model for distinguishing clickbaits from genuine article headlines. Background: Clickbaits are online articles with deliberately misleading titles designed to lure more and more readers into opening the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetize the landing page or to spread false news for sensationalization. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Therefore, it is essential to distinguish clickbaits from authentic headlines to mitigate their impact on readers’ perception. Methodology: A total of one hundred thousand article headlines, consisting of clickbait and authentic news headlines, were collected from news aggregator sites. The collected data samples were divided into five training sets of balanced and unbalanced data. Natural language processing techniques were used to extract 19 manually selected features from the article headlines. Contribution: Three ensemble learning techniques, including bagging, boosting, and random forests, were used to design a classifier model for classifying a given headline as clickbait or non-clickbait. The performance of the learners was evaluated using accuracy, precision, recall, and F-measure. Findings: It was observed that the random forest classifier detects clickbaits better than the other classifiers, with an accuracy of 91.16% and an overall precision, recall, and F-measure of 91%.

Author(s):  
Dilip Singh Sisodia
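The abstract above mentions 19 manually selected headline features extracted with NLP techniques but does not list them. A minimal sketch of the kind of hand-crafted features commonly used for clickbait detection; the feature names here are illustrative assumptions, not the paper's actual set:

```python
import re

# Hedged sketch: a few hand-crafted headline features of the kind the
# paper describes. The actual 19 features are not listed in the
# abstract, so these are illustrative assumptions only.
CLICKBAIT_PRONOUNS = {"you", "your", "i", "we", "this", "these"}

def headline_features(headline: str) -> dict:
    words = re.findall(r"[A-Za-z']+", headline.lower())
    return {
        "num_words": len(words),
        "starts_with_number": bool(re.match(r"\s*\d", headline)),
        "has_question_mark": "?" in headline,
        "pronoun_count": sum(w in CLICKBAIT_PRONOUNS for w in words),
        "all_caps_words": sum(t.isupper() and len(t) > 1 for t in headline.split()),
    }

print(headline_features("10 Things You Won't Believe About Cats"))
```

A feature dictionary like this would be vectorized and fed to the bagging, boosting, or random forest learners.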



2019 ◽  
Vol 8 (3) ◽  
pp. 6077-6081 ◽  

Plant disease identification and classification is a major area of research, as the majority of people in India depend on agriculture as their main source of income and food. Identifying diseases in crops is challenging, since the manual identification techniques currently in use rely on expert advice, which may not be efficient. Decisions about the variety of disease are made based on leaf features. In this paper an automated framework is introduced that can be used to detect and classify leaf diseases accurately. Leaf images are acquired using a digital camera. Pre-processing, segmentation, and feature extraction are performed on the acquired images. The extracted features are passed to the classifiers to classify the diseases. This work classifies and distinguishes leaf samples based on their features. The proposed work is carried out with Artificial Neural Network (ANN), Support Vector Machine (SVM), and Naive Bayes classifiers to analyze the results. For the given dataset, the ANN performed better than the other two classifiers.
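The classifier comparison described above can be illustrated with a hedged scikit-learn sketch. The feature vectors here are purely synthetic stand-ins; real inputs would be features extracted from segmented leaf images:

```python
# Hedged sketch: comparing the three classifiers named in the abstract
# on synthetic feature vectors. The data below is generated, not real
# leaf-image features.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for name, clf in [("ANN", MLPClassifier(max_iter=1000, random_state=0)),
                  ("SVM", SVC()),
                  ("Naive Bayes", GaussianNB())]:
    scores[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {scores[name]:.2f}")
```

On real leaf features the relative ranking would of course depend on the dataset, as the abstract's finding (ANN best) is specific to its data.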


2013 ◽  
pp. 1379-1394
Author(s):  
Xiangrong Zhou ◽  
Hiroshi Fujita

The location of an inner organ in a CT image is basic information required for medical image analysis tasks such as image segmentation, lesion detection, content-based image retrieval, and anatomical annotation. A general approach/scheme for localizing different inner organs, adaptable to various types of medical image formats, is required. However, this is a very challenging problem that can hardly be solved using traditional image processing techniques. This chapter introduces an ensemble-learning-based approach to solving organ localization problems. The approach can generate a fast and efficient organ-localization scheme from a limited number of training samples that include both original images and target locations. It has been used to localize five different human organs in CT images, and the accuracy, robustness, and computational efficiency of the designed scheme were validated by experiments.
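The abstract does not give the detector design, but one generic ingredient of ensemble-based localization is combining many weak detectors' location votes robustly. A minimal sketch, assuming each ensemble member votes for a candidate organ centre:

```python
import statistics

# Hedged sketch: robust (median) combination of candidate organ centres
# voted by many weak detectors. The chapter's actual detector design is
# more involved than this.
def combine_votes(votes):
    """votes: list of (x, y, z) candidate organ centres in voxel units."""
    return tuple(statistics.median(axis) for axis in zip(*votes))

# The outlier vote (50, 20, 31) barely shifts the median estimate.
print(combine_votes([(10, 20, 30), (11, 22, 29), (50, 20, 31)]))  # (11, 20, 30)
```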


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1322
Author(s):  
Wilfredo Graterol ◽  
Jose Diaz-Amado ◽  
Yudith Cardinale ◽  
Irvin Dongo ◽  
Edmundo Lopes-Silva ◽  
...  

For social robots, knowledge of human emotional states is essential for adapting their behavior or associating emotions with other entities. Robots gather the information from which emotions are detected via different media, such as text, speech, images, or videos. The multimedia content is then processed to recognize emotions/sentiments, for example, by analyzing faces and postures in images/videos with machine learning techniques, or by converting speech into text to perform emotion detection with natural language processing (NLP) techniques. Keeping this information in semantic repositories offers a wide range of possibilities for implementing smart applications. We propose a framework that allows social robots to detect emotions and store this information in a semantic repository based on EMONTO (an EMotion ONTOlogy), an ontology to represent emotions. As a proof of concept, we develop a first version of this framework focused on emotion detection in text, which can be obtained directly as text or by converting speech to text. We tested the implementation with a case study of tour-guide robots for museums that relies on a speech-to-text converter based on the Google Application Programming Interface (API) and a Python library, a neural network to label the emotions in texts based on NLP transformers, and EMONTO integrated with an ontology for museums; thus, it is possible to register the emotions that artworks produce in visitors. We evaluated the classification model, obtaining results equivalent to those of a state-of-the-art transformer-based model, with a clear roadmap for improvement.
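As a toy stand-in for the EMONTO-backed semantic repository, registering the emotions artworks evoke in visitors could be sketched as below. The class and method names are illustrative, not from the paper, and a real implementation would write RDF triples rather than in-memory counters:

```python
from collections import Counter, defaultdict

# Hedged sketch: an in-memory stand-in for the semantic repository that
# records which emotions each artwork produces in visitors.
class EmotionRepository:
    def __init__(self):
        self._records = defaultdict(Counter)

    def register(self, artwork: str, emotion: str) -> None:
        """Record one visitor's detected emotion for an artwork."""
        self._records[artwork][emotion] += 1

    def dominant_emotion(self, artwork: str):
        """Most frequently registered emotion, or None if unseen."""
        top = self._records[artwork].most_common(1)
        return top[0][0] if top else None

repo = EmotionRepository()
repo.register("Mona Lisa", "joy")
repo.register("Mona Lisa", "joy")
repo.register("Mona Lisa", "surprise")
print(repo.dominant_emotion("Mona Lisa"))  # joy
```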


Sentiment analysis combines natural language processing and text analysis to predict the sentiment of a text in terms of positive and negative comments. Nowadays, a tremendous volume of news originates from different web pages, and it is feasible to determine the opinion expressed in particular news items. This work evaluates various machine learning techniques for classifying the sentiment of news headlines. We propose the application of a Recurrent Neural Network with a Long Short-Term Memory unit (LSTM), focus on finding similar news headlines, and predict the opinion of news headlines from numerous sources. The main objective is to classify the sentiment of news headlines from various sources using a recurrent neural network. Interestingly, the proposed attention mechanism performs better than a more complex attention mechanism on a held-out set of articles.


2019 ◽  
pp. 016555151987182
Author(s):  
Abinash Pujahari ◽  
Dilip Singh Sisodia

Clickbaits are online articles with deliberately misleading titles designed to lure more and more readers into opening the intended web page. Clickbaits are used to tempt visitors to click on a particular link, either to monetise the landing page or to spread false news for sensationalisation. The presence of clickbaits on any news aggregator portal may lead to an unpleasant experience for readers. Automatic detection of clickbait headlines among news headlines has been a challenging issue for the machine learning community, and many methods have been proposed for filtering out clickbait articles in the recent past. However, the recent techniques available for detecting clickbaits are not very robust. This article proposes a hybrid categorisation technique for separating clickbait and non-clickbait articles by integrating different features, sentence structure and clustering. During preliminary categorisation, the headlines are separated using 11 features. The headlines are then recategorised using sentence formality and syntactic similarity measures. In the last phase, the headlines are recategorised once more by applying clustering using word vector similarity based on the t-distributed stochastic neighbour embedding (t-SNE) approach. After categorisation of the headlines, machine learning models are applied to the dataset to evaluate machine learning algorithms. The experimental results indicate that the proposed hybrid model is more robust, reliable and efficient than any of the individual categorisation techniques on the dataset used.
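The abstract does not define its syntactic similarity measure. As a hedged stand-in, a simple token-overlap (Jaccard) similarity between two headlines might look like:

```python
# Hedged sketch: token-overlap (Jaccard) similarity as a simple
# stand-in for the paper's unspecified syntactic similarity measure.
def jaccard_similarity(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

print(jaccard_similarity("you won't believe this trick",
                         "this trick will shock you"))  # 3/7 ≈ 0.4286
```

A measure like this could feed the recategorisation phase, grouping headlines whose pairwise similarity exceeds a threshold.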


Author(s):  
Sushil Shrestha

This paper is a survey of work done in the field of web page content visualization. The paper starts with a summary of the semantic graph. It then describes the process of generating a semantic graph using natural language processing and machine learning techniques, followed by enriching text with RDF/OWL-encoded senses as an enhancement of the existing Enrycher text conversion, and ends with possible future directions and a conclusion. DOI: http://dx.doi.org/10.3126/kuset.v8i1.6052 KUSET 2012; 8(1): 125-133


Author(s):  
Niloufar Shoeibi ◽  
Nastaran Shoeibi ◽  
Guillermo Hernández ◽  
Pablo Chamoso ◽  
Juan Manuel Corchado

Maintaining a healthy cyber society is a big challenge due to users’ freedom of expression and behavior. It can be addressed by monitoring and analyzing users’ behavior and taking proper actions towards them. This research presents a platform that monitors public content on Twitter by extracting tweet data. After collecting the data, users’ interactions are analyzed using graph analysis methods. Users’ behavioral patterns are then analyzed by applying metadata analysis, in which the timeline of each profile is obtained and the time-series behavioral features of users are investigated. Next, in the Abnormal Behavior Detection Filtering component, profiles of interest are selected for further examination. Finally, in the Contextual Analysis component, the content is analyzed using natural language processing techniques: a binary text classification model (SVM + TF-IDF, with 88.89% accuracy) detects whether a tweet is crime-related or not. A sentiment analysis method is then applied to the crime-related tweets to perform aspect-based sentiment analysis (DistilBERT + FFNN, with 80% accuracy), because sharing positive opinions about a crime-related topic can threaten society. This platform aims to provide the end user (police) with suggestions for controlling hate speech or terrorist propaganda.
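The TF-IDF weighting step behind the "SVM + TF-IDF" classifier can be sketched in plain Python. This is a minimal illustration with invented example documents, not the platform's implementation, and the SVM itself is omitted:

```python
import math
from collections import Counter

# Hedged sketch: TF-IDF weighting. tf is term frequency within a
# document; idf = log(n / document frequency) down-weights terms that
# appear in many documents.
def tfidf(docs):
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc.split()))
    n = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        vectors.append({t: (c / total) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["robbery reported downtown",
        "great day downtown",
        "robbery suspect arrested"]
vecs = tfidf(docs)
print(vecs[1]["great"] > vecs[0]["downtown"])  # rarer terms weigh more: True
```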


Sentiment analysis is a field that deals with assessing the sentiments or emotions of users towards products and services. It takes user comments as input and applies natural language processing techniques to identify the mood of the user. Usually a sentiment is deemed positive, negative, or neutral depending on the mood expressed in the comments or feedback. It is widely used by businesses to improve products and services and to present customers with a set of products and services based on their likes and dislikes. The state of the art indicates that many techniques have been applied in the past, such as linear regression and SVM models. Recurrent Neural Networks (RNNs) have improved the accuracy with which sentiment analysis can be done, but they suffer from a major drawback when applied to longer sentences. This paper proposes a sentiment analysis model using a Long Short-Term Memory (LSTM) based approach, a variant of RNNs. LSTMs are good at handling long sentence data. The model is applied to reviews collected from the IMDB dataset, a large dataset containing 50K reviews. Of the available reviews, 50% are used for training and 50% for testing. The model gives a training accuracy of 92% and a validation accuracy of 85%, which is neither an overfit nor an underfit. The overall accuracy of 85% appears better than some existing techniques, such as SVM with a linear kernel.
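The gating that lets LSTMs carry information across long sentences can be shown with a single LSTM cell step in scalar form. This is a didactic sketch of the standard cell equations, not the paper's model:

```python
import math

# Hedged sketch: one scalar LSTM cell step. The forget gate f decides
# how much of the old cell state c survives; the input gate i decides
# how much new candidate g is written; the output gate o controls what
# is exposed as the hidden state.
def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """W maps each gate name to (input weight, hidden weight, bias)."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate
    c_new = f * c + i * g         # cell state carries long-range memory
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# With all-zero weights every gate is 0.5, so the cell state is halved.
W0 = {gate: (0.0, 0.0, 0.0) for gate in ("i", "f", "o", "g")}
print(lstm_step(1.0, 0.0, 2.0, W0))
```

Because the cell state is updated additively (f * c + i * g) rather than through repeated squashing, gradients decay far more slowly over long sequences than in a plain RNN.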


2021 ◽  
Vol 33 (6) ◽  
pp. 0-0

Short text classification is a research focus in natural language processing (NLP), widely used in news classification, sentiment analysis, mail filtering, and other fields. In recent years, deep learning techniques have been applied to text classification and have made some progress. Unlike ordinary text classification, short text suffers from a small vocabulary and feature sparsity, which places higher demands on semantic feature representation. To address this issue, this paper proposes a feature fusion framework based on Bidirectional Encoder Representations from Transformers (BERT). In this hybrid method, BERT is used to train word vector representations and a convolutional neural network (CNN) captures static features. As a supplement, a bidirectional gated recurrent neural network (BiGRU) is adopted to capture contextual features. Furthermore, an attention mechanism is introduced to assign weights to salient words. The experimental results confirm that the proposed model significantly outperforms other state-of-the-art baseline methods.
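The attention step described above, assigning weights to salient words, can be sketched as a softmax over per-word scores followed by a weighted sum of the word feature vectors. A minimal illustration, not the paper's architecture:

```python
import math

# Hedged sketch of word-level attention: softmax turns per-word
# salience scores into weights that sum to 1, then the word vectors
# are combined as a weighted sum.
def softmax(scores):
    m = max(scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word_vectors, scores):
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, word_vectors))
            for d in range(dim)]

# Equal scores give equal weights, so the result is the mean vector.
print(attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

In the paper's framework the scores themselves would be learned from the BERT/CNN/BiGRU features, so salient words receive larger weights.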

