text representation
Recently Published Documents


TOTAL DOCUMENTS: 338 (five years: 59)

H-INDEX: 15 (five years: 0)

2022 ◽  
Vol 19 (3) ◽  
pp. 2671-2699
Author(s):  
Huan Rong ◽  
◽  
Tinghuai Ma ◽  
Xinyu Cao ◽  
Xin Yu ◽  
...  

<abstract> <p>With the rapid development of online social networks, text communication has become an indispensable part of daily life. Mining the emotion hidden behind conversation text is of prime significance and application value for government public-opinion supervision, enterprise decision-making, etc. Therefore, in this paper we propose a text emotion prediction model for a multi-participant text-conversation scenario, which aims to effectively predict the emotion of the text to be posted by the target speaker in the future. Specifically, first, an <italic>affective space mapping</italic> is constructed, which represents the original conversation text as an n-dimensional <italic>affective vector</italic> so as to obtain the text representation over different emotion categories. Second, a similar-scene search mechanism is adopted to find sub-sequences whose emotion-shift tendencies resemble that of the current conversation scene. Finally, the text emotion prediction model is built as a two-layer encoder-decoder structure, with emotion fusion and a hybrid attention mechanism introduced at the encoder and decoder sides respectively. According to the experimental results, our proposed model achieves the best overall performance on emotion prediction, owing to the auxiliary features extracted from similar scenes and the adoption of emotion fusion together with the hybrid attention mechanism, while prediction efficiency remains at an acceptable level.</p> </abstract>
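The affective space mapping can be illustrated with a toy lexicon-based sketch. The lexicon, the three emotion categories, and the count normalization below are assumptions for illustration only; the paper's actual mapping is learned from data.

```python
from collections import Counter

# Hypothetical emotion lexicon (assumption): word -> emotion category.
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy",
    "angry": "anger", "hate": "anger",
    "sad": "sadness", "cry": "sadness",
}
EMOTIONS = ["joy", "anger", "sadness"]

def affective_vector(text):
    """Map an utterance to an n-dimensional affective vector,
    one component per emotion category (normalized word counts)."""
    words = text.lower().split()
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[e] / total for e in EMOTIONS]

vec = affective_vector("I am so happy and great today")
# both matched words are "joy" words -> [1.0, 0.0, 0.0]
```

A sequence of such vectors, one per message, is what the downstream scene-search and encoder-decoder stages would consume.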


2022 ◽  
Vol 187 ◽  
pp. 115905
Author(s):  
Yinglong Ma ◽  
Xiaofeng Liu ◽  
Lijiao Zhao ◽  
Yue Liang ◽  
Peng Zhang ◽  
...  

2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has become a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra of the lexical, syntactic, and semantic components of text, producing an abstract image that can be treated by both text and image learning algorithms. These components come from feature vectors of the text. To demonstrate the merit of our proposal, we tested it on text classification and reading-complexity score prediction tasks, obtaining promising results.
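The integration of the three component vectors into an "abstract image" can be sketched as row-wise stacking, a toy stand-in: the paper's actual spectra are richer feature vectors of real text components.

```python
def content_image(lexical, syntactic, semantic):
    """Stack the three per-component feature vectors row-wise.
    Each row is one 'spectrum'; the resulting 2D array can be fed
    to image-based learners as well as vector-based ones."""
    assert len(lexical) == len(syntactic) == len(semantic), \
        "component vectors must share a dimension"
    return [list(lexical), list(syntactic), list(semantic)]

# Illustrative 2-dimensional component vectors (made-up numbers).
img = content_image([0.1, 0.4], [0.0, 0.9], [0.7, 0.2])
# a 3 x 2 matrix: one row per text component
```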


2021 ◽  
Author(s):  
Rami Mohawesh ◽  
Shuxiang Xu ◽  
Matthew Springer ◽  
Muna Al-Hawawreh ◽  
Sumbal Maqsood

Online reviews have a significant influence on customers' purchasing decisions for any product or service. However, fake reviews can mislead both consumers and companies. Several models have been developed to detect fake reviews using machine learning approaches, but many have limitations that result in low accuracy when distinguishing between fake and genuine reviews. These models focus only on linguistic features and fail to capture the semantic meaning of the reviews. To address this, this paper proposes a new ensemble model that employs a transformer architecture to discover the hidden patterns in sequences of fake reviews and detect them precisely. The proposed approach combines three transformer models to improve the robustness of profiling and modelling fake and genuine behaviour for fake review detection. Experimental results on semi-real benchmark datasets show the superiority of the proposed model over state-of-the-art models.
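The ensemble idea of combining three transformer classifiers can be sketched as majority voting over their predicted labels. The stand-in classifiers below are hypothetical keyword rules; in the paper they would be fine-tuned transformer models.

```python
from collections import Counter

def ensemble_predict(texts, models):
    """Majority vote over per-model labels (0 = genuine, 1 = fake).
    `models` are callables text -> label, stand-ins for the three
    transformer models combined in the paper."""
    preds = []
    for t in texts:
        votes = Counter(m(t) for m in models)
        preds.append(votes.most_common(1)[0][0])
    return preds

# Toy stand-in classifiers (illustrative assumptions only).
m1 = lambda t: 1 if "!!!" in t else 0
m2 = lambda t: 1 if "best ever" in t.lower() else 0
m3 = lambda t: 0
labels = ensemble_predict(["Best ever!!!", "Decent product."], [m1, m2, m3])
# -> [1, 0]: two of three models flag the first review as fake
```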


2021 ◽  
Author(s):  
Sebastian Schmidt ◽  
Shahbaz Khan ◽  
Jarno Alanko ◽  
Alexandru I. Tomescu

K-mer-based methods are widely used in bioinformatics, which raises the question of the smallest practically usable (i.e. plain-text) representation of a set of k-mers. We propose a polynomial-time algorithm computing a minimum such representation (previously posed as a potentially NP-hard open problem), as well as an efficient near-minimum greedy heuristic. When compressing genomes of large model organisms, read sets thereof, or bacterial pangenomes, with only a minor runtime increase we decrease the size of the representation by up to 60% over unitigs and 27% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 91% over previous work. Finally, we show that a small representation has advantages in downstream applications, as it speeds up queries in the popular k-mer indexing tool Bifrost by 1.66x over unitigs and 1.29x over previous work.
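The core idea behind a greedy heuristic of this kind, chaining k-mers that overlap by k-1 characters into fewer, longer plain-text strings, can be sketched as follows. This is a toy version for illustration; the paper's minimum algorithm and near-minimum heuristic are substantially more involved.

```python
def greedy_superstrings(kmers, k):
    """Greedily chain k-mers sharing (k-1)-overlaps into longer strings,
    so the set is spelled with fewer characters and fewer strings."""
    kmers = set(kmers)
    out = []
    while kmers:
        s = kmers.pop()
        changed = True
        while changed:
            changed = False
            for nxt in list(kmers):          # extend to the right
                if nxt[:-1] == s[-(k - 1):]:
                    s += nxt[-1]
                    kmers.remove(nxt)
                    changed = True
            for prv in list(kmers):          # extend to the left
                if prv[1:] == s[:k - 1]:
                    s = prv[0] + s
                    kmers.remove(prv)
                    changed = True
        out.append(s)
    return out

strings = greedy_superstrings({"ACG", "CGT", "GTT"}, 3)
# the three 3-mers chain into the single string "ACGTT"
```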


Author(s):  
Tham Vo

Recently, advanced deep learning techniques such as recurrent neural networks (GRU, LSTM and Bi-LSTM) and auto-encoding (attention-based transformer and BERT) have achieved great success in multiple application domains, including text summarization. Recent state-of-the-art encoding-based text summarization models such as BertSum, PreSum and DiscoBert have demonstrated significant improvements on extractive text summarization tasks. However, recent models still encounter common problems related to language-specific dependency, which requires the support of external NLP tools. Besides that, recent advanced text representation methods, such as BERT as a sentence-level textual encoder, also fail to fully capture the representation of a full-length document. To address these challenges, in this paper we propose a novel semantic-aware embedding approach for extractive text summarization, called SE4ExSum. Our proposed SE4ExSum integrates a feature graph-of-words (FGOW) with a BERT-based encoder to effectively learn the word/sentence-level representations of a given document. Then, a graph convolutional network (GCN) based encoder is applied to learn the global document representation, which is then used to facilitate the text summarization task. Extensive experiments on benchmark datasets show the effectiveness of our proposed model compared with recent state-of-the-art text summarization models.
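A graph-of-words construction, the kind of structure FGOW builds on, can be sketched as a sliding-window co-occurrence graph. The window size and whitespace tokenization below are assumptions for illustration; the paper's feature graph carries richer node/edge features.

```python
def graph_of_words(tokens, window=2):
    """Build an undirected co-occurrence graph: nodes are word types,
    edges link words appearing within `window` positions of each other."""
    edges = set()
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if w != tokens[j]:
                # store each edge once, in sorted order
                edges.add(tuple(sorted((w, tokens[j]))))
    return sorted(edges)

g = graph_of_words("text summarization with graph of words".split(), window=2)
# with window=2 only adjacent words are linked -> 5 edges
```

Such a graph is the natural input for a GCN-based encoder: node features flow along co-occurrence edges.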


Information ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 491
Author(s):  
Erjon Skenderi ◽  
Jukka Huhtamäki ◽  
Kostas Stefanidis

In this paper, we consider the task of assigning relevant labels to studies in the social science domain. Manual labelling is an expensive process and prone to human error. Various multi-label text classification machine learning approaches have been proposed to resolve this problem. We introduce a dataset obtained from the Finnish Social Science Archive, comprising the metadata of 2968 research studies. The metadata of each study includes attributes such as the “abstract” and the “set of labels”. We used Bag of Words (BoW), TF-IDF term weighting and pretrained word embeddings obtained from FastText and BERT models to generate the text representations for each study’s abstract field. Our selection of multi-label classification methods includes a naive approach, Multi-label k-Nearest Neighbours (ML-kNN), Multi-Label Random Forest (ML-RF), X-BERT and Parabel. The methods were combined with the text representation techniques and their performance was evaluated on our dataset. We measured the classification accuracy of the combinations using Precision, Recall and F1 metrics, and used Normalized Discounted Cumulative Gain to measure the label-ranking performance of the selected methods combined with the text representation techniques. The results showed that the ML-RF model achieved higher classification accuracy with the TF-IDF features and, based on the ranking score, the Parabel model outperformed the other methods.
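TF-IDF term weighting, one of the representations compared above, follows the standard scheme tf × log(N/df). A minimal stdlib sketch (the study would presumably use a library implementation with smoothing options):

```python
import math
from collections import Counter

def tfidf(docs):
    """Plain TF-IDF over tokenized documents: term frequency within a
    document, damped by how many documents contain the term."""
    n = len(docs)
    # document frequency: in how many docs each word appears
    df = Counter(w for d in docs for w in set(d))
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({w: (tf[w] / len(d)) * math.log(n / df[w]) for w in tf})
    return out

weights = tfidf([["label", "study"], ["label", "archive"]])
# "label" occurs in every document, so its idf is log(1) = 0
```

Words shared by all documents get zero weight, which is exactly why TF-IDF features often separate classes better than raw counts.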


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
José de Jesús Titla-Tlatelpa ◽  
Rosa María Ortega-Mendoza ◽  
Manuel Montes-y-Gómez ◽  
Luis Villaseñor-Pineda

Abstract
Depression is a severe mental health problem. Due to its relevance, the development of computational tools for its detection has attracted increasing attention in recent years. In this context, several research works have addressed the problem using word-based approaches (e.g., a bag of words). This type of representation has proven useful, indicating that words act as linguistic markers of depression. However, we believe that in addition to words, their contexts implicitly contain valuable information that could be inferred and exploited to enhance the detection of signs of depression. Specifically, we explore the use of the user’s characteristics and the sentiments expressed in the messages as context insights. The main idea is that a word’s discriminative value depends on the characteristics of the person who is writing and on the polarity of the messages where it occurs. Hence, this paper introduces a new approach based on specializing the classification framework to user profiles (e.g., men or women) and considering the sentiments expressed in the messages through a new text representation that captures their polarity (e.g., positive or negative). The proposed approach was evaluated on benchmark datasets from social media; the results achieved are encouraging, since they outperform those of computationally more expensive state-of-the-art methods.
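The context-specialization idea, making the same word a distinct feature depending on who wrote it and with what polarity, can be sketched by tagging tokens. The profile and polarity labels below are illustrative assumptions, not the paper's exact feature scheme.

```python
def polarity_aware_tokens(tokens, profile, polarity):
    """Specialize each word by the author's profile and the message
    polarity, so 'tired' written in a negative message by one profile
    becomes a different feature than the same word in other contexts."""
    return [f"{profile}|{polarity}|{w}" for w in tokens]

feats = polarity_aware_tokens(["tired", "alone"], "male", "negative")
# -> ["male|negative|tired", "male|negative|alone"]
```

Feeding these specialized tokens to an ordinary bag-of-words classifier lets it learn per-context word weights without changing the learning algorithm itself.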


2021 ◽  
Vol 58 (6) ◽  
pp. 102723
Author(s):  
Mohammadreza Samadi ◽  
Maryam Mousavian ◽  
Saeedeh Momtazi
