text representation
Recently Published Documents


TOTAL DOCUMENTS: 338 (five years: 59)

H-INDEX: 15 (five years: 0)

2022 ◽  
Vol 19 (3) ◽  
pp. 2671-2699
Author(s):  
Huan Rong ◽  
◽  
Tinghuai Ma ◽  
Xinyu Cao ◽  
Xin Yu ◽  
...  

<abstract> <p>With the rapid development of online social networks, text communication has become an indispensable part of daily life. Mining the emotion hidden behind conversation text is of prime significance and application value for government public-opinion supervision, enterprise decision-making, etc. Therefore, in this paper we propose a text emotion prediction model for a multi-participant text-conversation scenario, which aims to effectively predict the emotion of the text to be posted by the target speaker in the future. Specifically, first, an <italic>affective space mapping</italic> is constructed, which represents the original conversation text as an n-dimensional <italic>affective vector</italic> so as to obtain the text representation over different emotion categories. Second, a similar-scene search mechanism is adopted to find sub-sequences whose emotion-shift tendencies resemble that of the current conversation scene. Finally, the text emotion prediction model is built as a two-layer encoder-decoder structure, with emotion fusion and a hybrid attention mechanism introduced at the encoder and decoder sides respectively. According to the experimental results, our proposed model achieves the best overall performance on emotion prediction, owing to the auxiliary features extracted from similar scenes and the adoption of emotion fusion together with the hybrid attention mechanism, while prediction efficiency remains at an acceptable level.</p> </abstract>
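The affective space mapping can be illustrated with a toy lexicon-based sketch. The lexicon, the three emotion categories, and the count normalization below are assumptions for illustration only; the paper's actual mapping is learned from data.

```python
from collections import Counter

# Hypothetical emotion lexicon (assumption): word -> emotion category.
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy",
    "angry": "anger", "hate": "anger",
    "sad": "sadness", "cry": "sadness",
}
EMOTIONS = ["joy", "anger", "sadness"]

def affective_vector(text):
    """Map an utterance to an n-dimensional affective vector,
    one component per emotion category (normalized word counts)."""
    words = text.lower().split()
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[e] / total for e in EMOTIONS]

vec = affective_vector("I am so happy and great today")
# both matched words are "joy" words -> [1.0, 0.0, 0.0]
```

A sequence of such vectors, one per message, is what the downstream scene-search and encoder-decoder stages would consume.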


2022 ◽  
Vol 187 ◽  
pp. 115905
Author(s):  
Yinglong Ma ◽  
Xiaofeng Liu ◽  
Lijiao Zhao ◽  
Yue Liang ◽  
Peng Zhang ◽  
...  

2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra of the lexical, syntactic, and semantic components of text, producing an abstract image that can be treated by both text and image learning algorithms. These components come from feature vectors of the text. To demonstrate the merit of our proposal, we tested it on text classification and reading-complexity score prediction tasks, obtaining promising results.
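The integration of the three component vectors into an "abstract image" can be sketched as row-wise stacking, a toy stand-in: the paper's actual spectra are richer feature vectors of real text components.

```python
def content_image(lexical, syntactic, semantic):
    """Stack the three per-component feature vectors row-wise.
    Each row is one 'spectrum'; the resulting 2D array can be fed
    to image-based learners as well as vector-based ones."""
    assert len(lexical) == len(syntactic) == len(semantic), \
        "component vectors must share a dimension"
    return [list(lexical), list(syntactic), list(semantic)]

# Illustrative 2-dimensional component vectors (made-up numbers).
img = content_image([0.1, 0.4], [0.0, 0.9], [0.7, 0.2])
# a 3 x 2 matrix: one row per text component
```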


2021 ◽  
Author(s):  
Rami Mohawesh ◽  
Shuxiang Xu ◽  
Matthew Springer ◽  
Muna Al-Hawawreh ◽  
Sumbal Maqsood

Online reviews have a significant influence on customers' purchasing decisions for any product or service. However, fake reviews can mislead both consumers and companies. Several models have been developed to detect fake reviews using machine learning approaches, but many have limitations that result in low accuracy when distinguishing between fake and genuine reviews. These models focus only on linguistic features and fail to capture the semantic meaning of the reviews. To address this, this paper proposes a new ensemble model that employs a transformer architecture to discover the hidden patterns in sequences of fake reviews and detect them precisely. The proposed approach combines three transformer models to improve the robustness of profiling and modelling fake and genuine behaviour for fake review detection. Experimental results on semi-real benchmark datasets show the superiority of the proposed model over state-of-the-art models.
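The ensemble idea of combining three transformer classifiers can be sketched as majority voting over their predicted labels. The stand-in classifiers below are hypothetical keyword rules; in the paper they would be fine-tuned transformer models.

```python
from collections import Counter

def ensemble_predict(texts, models):
    """Majority vote over per-model labels (0 = genuine, 1 = fake).
    `models` are callables text -> label, stand-ins for the three
    transformer models combined in the paper."""
    preds = []
    for t in texts:
        votes = Counter(m(t) for m in models)
        preds.append(votes.most_common(1)[0][0])
    return preds

# Toy stand-in classifiers (illustrative assumptions only).
m1 = lambda t: 1 if "!!!" in t else 0
m2 = lambda t: 1 if "best ever" in t.lower() else 0
m3 = lambda t: 0
labels = ensemble_predict(["Best ever!!!", "Decent product."], [m1, m2, m3])
# -> [1, 0]: two of three models flag the first review as fake
```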


2021 ◽  
Author(s):  
Sebastian Schmidt ◽  
Shahbaz Khan ◽  
Jarno Alanko ◽  
Alexandru I. Tomescu

K-mer-based methods are widely used in bioinformatics, which raises the question of the smallest practically usable (i.e. plain-text) representation of a set of k-mers. We propose a polynomial-time algorithm computing a minimum such representation (previously posed as a potentially NP-hard open problem), as well as an efficient near-minimum greedy heuristic. When compressing genomes of large model organisms, read sets thereof, or bacterial pangenomes, with only a minor runtime increase we decrease the size of the representation by up to 60% over unitigs and 27% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 91% over previous work. Finally, we show that a small representation has advantages in downstream applications, as it speeds up queries in the popular k-mer indexing tool Bifrost by 1.66x over unitigs and 1.29x over previous work.
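The core idea behind a greedy heuristic of this kind, chaining k-mers that overlap by k-1 characters into fewer, longer plain-text strings, can be sketched as follows. This is a toy version for illustration; the paper's minimum algorithm and near-minimum heuristic are substantially more involved.

```python
def greedy_superstrings(kmers, k):
    """Greedily chain k-mers sharing (k-1)-overlaps into longer strings,
    so the set is spelled with fewer characters and fewer strings."""
    kmers = set(kmers)
    out = []
    while kmers:
        s = kmers.pop()
        changed = True
        while changed:
            changed = False
            for nxt in list(kmers):          # extend to the right
                if nxt[:-1] == s[-(k - 1):]:
                    s += nxt[-1]
                    kmers.remove(nxt)
                    changed = True
            for prv in list(kmers):          # extend to the left
                if prv[1:] == s[:k - 1]:
                    s = prv[0] + s
                    kmers.remove(prv)
                    changed = True
        out.append(s)
    return out

strings = greedy_superstrings({"ACG", "CGT", "GTT"}, 3)
# the three 3-mers chain into the single string "ACGTT"
```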


Author(s):  
Tham Vo

Recently, advanced deep learning techniques such as recurrent neural networks (GRU, LSTM and Bi-LSTM) and auto-encoding (attention-based transformer and BERT) have achieved great success in multiple application domains, including text summarization. Recent state-of-the-art encoding-based text summarization models such as BertSum, PreSum and DiscoBert have demonstrated significant improvements on extractive text summarization tasks. However, recent models still encounter common problems related to language-specific dependency, which requires the support of external NLP tools. Besides that, recent advanced text representation methods, such as BERT as a sentence-level textual encoder, also fail to fully capture the representation of a full-length document. To address these challenges, in this paper we propose a novel semantic-aware embedding approach for extractive text summarization, called SE4ExSum. Our proposed SE4ExSum integrates a feature graph-of-words (FGOW) with a BERT-based encoder to effectively learn the word/sentence-level representations of a given document. Then, a graph convolutional network (GCN) based encoder is applied to learn the global document representation, which is then used to facilitate the text summarization task. Extensive experiments on benchmark datasets show the effectiveness of our proposed model compared with recent state-of-the-art text summarization models.
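A graph-of-words construction, the kind of structure FGOW builds on, can be sketched as a sliding-window co-occurrence graph. The window size and whitespace tokenization below are assumptions for illustration; the paper's feature graph carries richer node/edge features.

```python
def graph_of_words(tokens, window=2):
    """Build an undirected co-occurrence graph: nodes are word types,
    edges link words appearing within `window` positions of each other."""
    edges = set()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if w != tokens[j]:
                # store each edge once, in sorted order
                edges.add(tuple(sorted((w, tokens[j]))))
    return sorted(edges)

g = graph_of_words("text summarization with graph of words".split(), window=2)
# with window=2 only adjacent words are linked -> 5 edges
```

Such a graph is the natural input for a GCN-based encoder: node features flow along co-occurrence edges.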


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 491
Author(s):  
Erjon Skenderi ◽  
Jukka Huhtamäki ◽  
Kostas Stefanidis

In this paper, we consider the task of assigning relevant labels to studies in the social science domain. Manual labelling is an expensive process and prone to human error. Various multi-label text classification machine learning approaches have been proposed to resolve this problem. We introduce a dataset obtained from the Finnish Social Science Archive, comprising the metadata of 2968 research studies. The metadata of each study includes attributes such as the “abstract” and the “set of labels”. We used Bag of Words (BoW), TF-IDF term weighting and pretrained word embeddings obtained from FastText and BERT models to generate the text representations for each study’s abstract field. Our selection of multi-label classification methods includes a naive approach, Multi-label k-Nearest Neighbours (ML-kNN), Multi-Label Random Forest (ML-RF), X-BERT and Parabel. The methods were combined with the text representation techniques and their performance was evaluated on our dataset. We measured the classification accuracy of the combinations using Precision, Recall and F1 metrics, and used Normalized Discounted Cumulative Gain to measure the label-ranking performance of the selected methods combined with the text representation techniques. The results showed that the ML-RF model achieved higher classification accuracy with the TF-IDF features and, based on the ranking score, the Parabel model outperformed the other methods.
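TF-IDF term weighting, one of the representations compared above, follows the standard scheme tf × log(N/df). A minimal stdlib sketch (the study would presumably use a library implementation with smoothing options):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF over tokenized documents: term frequency within a
    document, damped by how many documents contain the term."""
    n = len(docs)
    # document frequency: in how many docs each word appears
    df = Counter(w for d in docs for w in set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return out

weights = tfidf([["label", "study"], ["label", "archive"]])
# "label" occurs in every document, so its idf is log(1) = 0
```

Words shared by all documents get zero weight, which is exactly why TF-IDF features often separate classes better than raw counts.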


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
José de Jesús Titla-Tlatelpa ◽  
Rosa María Ortega-Mendoza ◽  
Manuel Montes-y-Gómez ◽  
Luis Villaseñor-Pineda

Abstract
Depression is a severe mental health problem. Due to its relevance, the development of computational tools for its detection has attracted increasing attention in recent years. In this context, several research works have addressed the problem using word-based approaches (e.g., a bag of words). This type of representation has proven useful, indicating that words act as linguistic markers of depression. However, we believe that in addition to words, their contexts implicitly contain valuable information that could be inferred and exploited to enhance the detection of signs of depression. Specifically, we explore the use of the user’s characteristics and the sentiments expressed in the messages as context insights. The main idea is that a word’s discriminative value depends on the characteristics of the person who is writing and on the polarity of the messages where it occurs. Hence, this paper introduces a new approach based on specializing the classification framework to user profiles (e.g., men or women) and considering the sentiments expressed in the messages through a new text representation that captures their polarity (e.g., positive or negative). The proposed approach was evaluated on benchmark datasets from social media; the results achieved are encouraging, since they outperform those of computationally more expensive state-of-the-art methods.
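The context-specialization idea, making the same word a distinct feature depending on who wrote it and with what polarity, can be sketched by tagging tokens. The profile and polarity labels below are illustrative assumptions, not the paper's exact feature scheme.

```python
def polarity_aware_tokens(tokens, profile, polarity):
    """Specialize each word by the author's profile and the message
    polarity, so 'tired' written in a negative message by one profile
    becomes a different feature than the same word in other contexts."""
    return [f"{profile}|{polarity}|{w}" for w in tokens]

feats = polarity_aware_tokens(["tired", "alone"], "male", "negative")
# -> ["male|negative|tired", "male|negative|alone"]
```

Feeding these specialized tokens to an ordinary bag-of-words classifier lets it learn per-context word weights without changing the learning algorithm itself.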


2021 ◽  
Vol 58 (6) ◽  
pp. 102723
Author(s):  
Mohammadreza Samadi ◽  
Maryam Mousavian ◽  
Saeedeh Momtazi
