Fake news identification: a comparison of parts-of-speech and N-grams with neural networks

Author(s):  
Brandon Stoick ◽  
Nicholas Snell ◽  
Jeremy Straub

Author(s):
Madusha Prasanjith Thilakarathna ◽  
Vihanga Ashinsana Wijayasekara ◽  
Yasiru Gamage ◽  
Kavindi Hanshani Peiris ◽  
Chanuka Abeysinghe ◽  
...  

2021 ◽  
Author(s):  
Guilherme Zanini Moreira ◽  
Marcelo Romero ◽  
Manassés Ribeiro

After the advent of the Web, the number of people who have abandoned traditional media channels and receive news only through social media has increased. However, the ease of sharing information online has also increased the spread of fake news. The consequences are varied, one of the main ones being attempts to manipulate public opinion during elections or to promote movements that can damage the rule of law or the institutions that uphold it. The objective of this work is to perform fake news detection using distributed representations and Recurrent Neural Networks (RNNs). Although fake news detection with RNNs has already been explored in the literature, there is little research on the processing of texts in the Portuguese language, which is the focus of this work. For this purpose, distributed representations of texts are generated with three different algorithms (fastText, GloVe and word2vec) and used as input features for a Long Short-Term Memory network (LSTM). The approach is evaluated on a publicly available labelled news dataset. It shows promising results for all three distributed representation methods, with the combination word2vec+LSTM providing the best results. The proposed approach achieves better classification performance than simple architectures, and results similar to those of deeper architectures or more complex methods.
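The embeddings-into-LSTM pipeline described above can be sketched in miniature. The following is an illustrative NumPy hand-rolling of a single LSTM cell, not the authors' implementation: the embedding and hidden sizes, the random stand-in word vectors, and the logistic read-out are all assumptions chosen to show the data flow (sequence of word vectors in, final hidden state out, one fake/real score).

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 8, 4  # embedding and hidden sizes (illustrative, not the paper's)

# Random stand-ins for trained word2vec embeddings of a 3-word headline.
sequence = rng.normal(size=(3, EMB))

# LSTM parameters: one weight matrix per gate (input i, forget f, output o,
# candidate g), each acting on the concatenation [h_prev; x_t].
W = {g: rng.normal(scale=0.1, size=(HID, HID + EMB)) for g in "ifog"}
b = {g: np.zeros(HID) for g in "ifog"}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(seq):
    h, c = np.zeros(HID), np.zeros(HID)
    for x in seq:
        z = np.concatenate([h, x])
        i = sigmoid(W["i"] @ z + b["i"])   # input gate
        f = sigmoid(W["f"] @ z + b["f"])   # forget gate
        o = sigmoid(W["o"] @ z + b["o"])   # output gate
        g = np.tanh(W["g"] @ z + b["g"])   # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Logistic read-out on the final hidden state: a score in (0, 1).
w_out = rng.normal(scale=0.1, size=HID)
p_fake = sigmoid(w_out @ lstm_forward(sequence))
print(0.0 < p_fake < 1.0)
```

In practice the embeddings would come from a trained fastText, GloVe or word2vec model and the LSTM weights would be learned on the labelled dataset; here both are random to keep the sketch self-contained.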


2020 ◽  
Vol 1 (3) ◽  
Author(s):  
Aman Agarwal ◽  
Mamta Mittal ◽  
Akshat Pathak ◽  
Lalit Mohan Goyal

2021 ◽  
Vol 10 (34) ◽  
Author(s):  
A. N. Sak ◽  
E. V. Bessonova

When constructing machine translation systems, an important task is to represent data using graphs, where words act as vertices and the relations between words in a sentence act as edges. At the first stage of the analysis, words are classified as parts of speech; at the next stage, each word is assigned to a sentence-member class. The article discusses methods of parsing both on the basis of rules defined in advance by means of traditional object-oriented programming and on the basis of analysis by graph convolutional neural networks with subsequent training. Online dictionaries serve as the thesaurus.
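The words-as-vertices, relations-as-edges representation above can be made concrete with one graph-convolution layer. This is a minimal sketch, not the article's system: the four-word sentence, its hand-picked edges, and the random feature and weight matrices are assumptions; the layer computes the standard symmetric-normalized update H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

```python
import numpy as np

words = ["the", "cat", "chased", "mice"]
n = len(words)

# Adjacency matrix: an edge for each word-to-word relation (hand-picked).
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3)]:   # the-cat, cat-chased, chased-mice
    A[i, j] = A[j, i] = 1.0

A_hat = A + np.eye(n)                     # add self-loops
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization

rng = np.random.default_rng(1)
H = rng.normal(size=(n, 6))               # stand-in per-word input features
W = rng.normal(scale=0.1, size=(6, 3))    # 3 output channels, e.g. class scores

H_next = np.maximum(0.0, A_norm @ H @ W)  # ReLU(A_norm @ H @ W)
print(H_next.shape)  # one 3-dimensional representation per word
```

Stacking such layers and training W against part-of-speech or sentence-member labels is what turns this propagation rule into a classifier.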


Author(s):  
A. E. Bondarev ◽  
A. V. Bondarenko ◽  
V. A. Galaktionov

Abstract. The presented research considers the problem of studying the cluster structure of multidimensional data volumes. This paper presents the results of numerical experiments on data volumes consisting of the frequencies of joint use of words from different parts of speech, for instance “noun + verb” or “adjective + noun”. The data volumes are obtained from samples of text collections in Russian. The aim of the research is to analyze the cluster structure of the studied volume and the semantic proximity of words in clusters and subclusters. The working hypothesis is that words with similar meanings should occur in approximately the same contexts; in the feature space, they will therefore lie relatively close to each other, while dissimilar words will lie at a greater distance. The research is carried out using elastic maps, which are effective tools for visual analysis of multidimensional data. Constructing elastic maps and their extensions in the space of the first three principal components makes it possible to determine the cluster structure of the studied multidimensional data volumes. Such analysis can be useful in countering negative verbal influences such as fake news, hidden propaganda, recruitment into sects, and verbal manipulation. This approach can also be applied to text collections of medical origin.
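The projection step behind the abstract above, mapping word frequency vectors into the space of the first three principal components, can be sketched as follows. The co-occurrence matrix here is synthetic (two planted clusters standing in for semantically close word groups), not the paper's Russian-language data, and the elastic-map fitting itself is omitted; only the PCA projection in which such maps are built is shown.

```python
import numpy as np

rng = np.random.default_rng(2)

# 12 "words", each described by joint-use frequencies with 20 context words;
# two well-separated planted clusters mimic two semantic word groups.
cluster_a = rng.normal(loc=0.0, scale=0.3, size=(6, 20))
cluster_b = rng.normal(loc=3.0, scale=0.3, size=(6, 20))
X = np.vstack([cluster_a, cluster_b])

Xc = X - X.mean(axis=0)                  # center the features
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords3 = Xc @ Vt[:3].T                  # coordinates in the first 3 PCs

# Words from the same planted cluster should sit close together in PC space,
# matching the paper's hypothesis about semantically similar words.
dist_within = np.linalg.norm(coords3[0] - coords3[1])
dist_between = np.linalg.norm(coords3[0] - coords3[6])
print(dist_within < dist_between)
```

On real joint-use frequency data the clusters are not planted, and the elastic map is then fitted and unfolded in this 3-D principal-component space to expose them visually.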

