Statistical and Similarity Methods for Classifying Emotion in Suicide Notes

Using Ensemble Models to Classify the Sentiment Expressed in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8931 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8931 ◽

Cited By ~ 5

Author(s):

James A. McCart ◽

Dezon K. Finch ◽

Jay Jarman ◽

Edward Hickling ◽

Jason D. Lind ◽

...

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Regular Expression ◽

Shared Task ◽

Suicide Notes ◽

The Mean ◽

The U.S

In 2007, suicide was the tenth leading cause of death in the U.S. Given the significance of this problem, suicide was the focus of the 2011 Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing (NLP) shared task competition (track two). Specifically, the challenge concentrated on sentiment analysis, predicting the presence or absence of 15 emotions (labels) simultaneously in a collection of suicide notes spanning over 70 years. Our team explored multiple approaches combining regular expression-based rules, statistical text mining (STM), and an approach that applies weights to text while accounting for multiple labels. Our best submission used an ensemble of both rules and STM models to achieve a micro-averaged F1 score of 0.5023, slightly above the mean from the 26 teams that competed (0.4875).
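
The micro-averaged F1 score reported above pools true positives, false positives, and false negatives over all labels and notes before computing precision and recall. A minimal sketch of that calculation follows; the function name, the label names, and the toy data are illustrative assumptions, not the task's official scorer.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 for multi-label output.

    gold, predicted: lists of label sets, one set per instance
    (hypothetical representation chosen for this sketch).
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up annotations.
gold = [{"hopelessness", "love"}, {"guilt"}, set()]
pred = [{"hopelessness"}, {"guilt", "anger"}, {"love"}]
score = micro_f1(gold, pred)
print(round(score, 4))
```

Because counts are pooled before averaging, frequent labels dominate the score, which matters when a few emotions account for most annotations.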

Rule-based and Lightly Supervised Methods to Predict Emotions in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8953 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8953 ◽

Cited By ~ 3

Author(s):

Ted Pedersen

Keyword(s):

Sentiment Analysis ◽

Corpus Analysis ◽

Measures Of Association ◽

Rule Based ◽

The Third ◽

Suicide Notes ◽

Supervised Methods ◽

Rule Based Approach ◽

F Measure

This paper describes the Duluth systems that participated in the Sentiment Analysis track of the i2b2/VA/Cincinnati Children's 2011 Challenge. The top Duluth system was a rule-based approach derived through manual corpus analysis and the use of measures of association to identify significant ngrams. This performed in the median range of systems, attaining an F-measure of 0.45. The second system was automatically derived from the most frequent bigrams unique to one or two emotions. It achieved an F-measure of 0.36. The third system was the union of the first two, and reached an F-measure of 0.44.
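
One way to picture the second system's idea (frequent bigrams that occur with only one or two emotions) is the sketch below. It is an assumption-laden illustration, not the Duluth implementation: the function name, parameters, and toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def distinctive_bigrams(labeled_sentences, max_emotions=2, top_k=3):
    """Return, per emotion, its most frequent bigrams that occur
    with at most `max_emotions` different emotions."""
    counts = defaultdict(Counter)  # emotion -> bigram frequencies
    for tokens, emotion in labeled_sentences:
        counts[emotion].update(zip(tokens, tokens[1:]))
    # For each bigram, the number of distinct emotions it appears with.
    spread = Counter(bg for c in counts.values() for bg in c)
    return {
        emotion: [bg for bg, _ in c.most_common()
                  if spread[bg] <= max_emotions][:top_k]
        for emotion, c in counts.items()
    }


# Made-up training sentences, tokenized by whitespace.
data = [
    ("i am so sorry".split(), "guilt"),
    ("i am sorry".split(), "guilt"),
    ("i am so thankful".split(), "thankfulness"),
    ("i am tired of it".split(), "hopelessness"),
]
rules = distinctive_bigrams(data)
print(rules["guilt"])
```

Here the bigram ("i", "am") appears with three emotions and is discarded, while ("am", "so") appears with two and is kept.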

Knowledge Transfer for Entity Resolution with Siamese Neural Networks

Journal of Data and Information Quality ◽

10.1145/3410157 ◽

2021 ◽

Vol 13 (1) ◽

pp. 1-25

Author(s):

Michael Loster ◽

Ioannis Koumarelas ◽

Felix Naumann

Keyword(s):

Knowledge Transfer ◽

Similarity Measure ◽

State Of The Art ◽

Similarity Measures ◽

Engineering Process ◽

Domain Experts ◽

Multiple Datasets ◽

Multiple Data ◽

Domain Expertise ◽

F Measure

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity—duplicates—into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network, capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.

The Effects of Different Kernels in SVM Sentiment Analysis on Mass Social Distancing

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i02.p01 ◽

2020 ◽

Vol 9 (2) ◽

pp. 161

Author(s):

Komang Dhiyo Yonatha Wijaya ◽

Anak Agung Istri Ngurah Eka Karyawati

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Kernel Method ◽

Idle Time ◽

Linear Kernel ◽

Social Distancing ◽

The Social ◽

Negative Sentiment ◽

F Measure ◽

Kernel Yield

During this pandemic, social media has become a major means of communication. One of the social media platforms in use is Twitter, where messages are referred to as tweets. Indonesia is currently undergoing mass social distancing. During this time, most people use social media to spend their idle time. However, this sometimes results in negative sentiment that is used to insult and is aimed at an individual or group. To filter such tweets, sentiment analysis was performed with SVM and three different kernel methods. Tweets are labelled into three classes: positive, neutral, and negative. The experiments were conducted to determine which kernel performs best. In the sentiment analysis performed, the SVM with a linear kernel yielded the best score. Some experiments show that the precision of the linear kernel is 57%, recall is 50%, and F-measure is 44%.

Sentiment Analysis of Suicide Notes: A Shared Task

Biomedical Informatics Insights ◽

10.4137/bii.s9042 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S9042 ◽

Cited By ~ 47

Author(s):

John P. Pestian ◽

Pawel Matykiewicz ◽

Michelle Linn-Gust ◽

Brett South ◽

Ozlem Uzuner ◽

...

Keyword(s):

Preliminary Analysis ◽

The Other ◽

Future Research ◽

Large Set ◽

Biomedical Domain ◽

Shared Task ◽

Clinical Text ◽

Evaluation Measures ◽

Suicide Notes ◽

Data Production

This paper reports on a shared task involving the assignment of emotions to suicide notes. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in the corpus of fully anonymized clinical text and annotated suicide notes. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching the inter-coder agreement, suggesting that human-like performance on this task is within the reach of currently available technologies.

VLSP SHARED TASK: SENTIMENT ANALYSIS

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/34/4/13160 ◽

2019 ◽

Vol 34 (4) ◽

pp. 295-310 ◽

Cited By ~ 3

Author(s):

Huyen T M Nguyen ◽

Hung V Nguyen ◽

Quyen T Ngo ◽

Luong X Vu ◽

Vu Mai Tran ◽

...

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Objective Evaluation ◽

Shared Task ◽

Electronic Products ◽

Performance Quality ◽

Benchmark Datasets ◽

Active Research ◽

Evaluation Measurement

Sentiment analysis is a natural language processing (NLP) task of identifying or extracting the sentiment content of a text unit. This task has been an active research topic since the early 2000s. During the last two editions of the VLSP workshop series, a shared task on Sentiment Analysis (SA) for Vietnamese was organized in order to provide an objective evaluation of the performance (quality) of sentiment analysis tools, to encourage the development of Vietnamese sentiment analysis systems, and to provide benchmark datasets for this task. The first campaign in 2016 focused only on sentiment polarity classification, with a dataset containing reviews of electronic products. The second campaign in 2018 addressed the problem of Aspect Based Sentiment Analysis (ABSA) for Vietnamese, providing two datasets containing reviews in the restaurant and hotel domains. These data are accessible for research purposes via the VLSP website vlsp.org.vn/resources. This paper describes the datasets built as well as the evaluation results of the systems participating in these campaigns.

An Evolutionary-Based Sentiment Analysis Approach for Enhancing Government Decisions during COVID-19 Pandemic: The Case of Jordan

Applied Sciences ◽

10.3390/app11199080 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9080

Author(s):

Ruba Obiedat ◽

Osama Harfoushi ◽

Raneem Qaddoura ◽

Laila Al-Qaisi ◽

Ala’ M. Al-Zoubi

Keyword(s):

Decision Support ◽

Decision Support System ◽

Sentiment Analysis ◽

Support System ◽

Support Vector ◽

Whale Optimization ◽

Vector Machines ◽

Standard Classification ◽

The Government ◽

F Measure

The world has recently witnessed a global outbreak of coronavirus disease (COVID-19). This pandemic has affected many countries and has resulted in worldwide health concerns; thus, governments are attempting to reduce its spread and impact on different aspects of life such as health, economics, education, and politics by making emergent decisions and policies (e.g., lockdown and social distancing). These new regulations influenced people’s daily life and cast significant burdens, concerns, and disparities on various population groups. Wrong actions and bad decisions enforced by some countries result in an increased contagion rate and more catastrophic results. People start to post their opinions and feelings about their government’s decisions on different social media networks, and the data received through these platforms present a very useful source of information that affects how governments perceive and cope with the current pandemic. Jordan was one of the top affected countries. In this paper, we proposed a decision support system based on the sentiment analysis mechanism by combining support vector machines with a whale optimization algorithm for automatically tuning the hyperparameters and performing feature weighting. The work is based on a hybrid evolutionary approach that aims to perform sentiment analysis combined with a decision support system to study people’s posts on Facebook to investigate their attitudes and feelings toward the government’s decisions during the pandemic. The government regulations were divided into two periods: the first and latter regulations. Studying public sentiments during these periods allows decision-makers in the government to sense people’s feelings, alert them in case of possible threats, and help in making proactive actions if needed to better handle the current pandemic situation. Five different versions were generated for each of the two collected datasets. The results demonstrate the superiority of the proposed Whale Optimization Algorithm & Support Vector Machines (WOA-SVM) against other metaheuristic algorithms and standard classification models, as WOA-SVM has achieved 78.78% in terms of accuracy and 84.64% in terms of F-measure, while other standard classification models such as NB, k-NN, J84, and SVM achieved an accuracy of 69.25%, 69.78%, 70.17%, and 69.29%, respectively, with 64.15%, 62.90%, 60.51%, and 59.09% F-measure. Moreover, when comparing our proposed WOA-SVM approach with other metaheuristic algorithms, which are GA-SVM, PSO-SVM, and MVO-SVM, WOA-SVM proved to outperform the other approaches with results of 78.78% in terms of accuracy and 84.64% in terms of F-measure. Further, we investigate and analyze the most relevant features and their effect to improve the decision support system of government decisions.

An Efficient Framework for Vietnamese Sentiment Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200579 ◽

2020 ◽

Author(s):

Cuong V. Nguyen ◽

Khiem H. Le ◽

Anh M. Tran ◽

Binh T. Nguyen

Keyword(s):

Product Quality ◽

Sentiment Analysis ◽

New Products ◽

Classification Problem ◽

Research Community ◽

Sentiment Classification ◽

Experimental Results ◽

Data Sets ◽

Online Retailers

With the booming development of e-commerce platforms in many countries, there is a massive amount of customer review data on different products and services. Understanding customers’ feedback on both current and new products can give online retailers the possibility to improve product quality, meet customers’ expectations, and increase the corresponding revenue. In this paper, we investigate the Vietnamese sentiment classification problem on two datasets containing Vietnamese customers’ reviews. We propose eight different approaches, including Bi-LSTM, Bi-LSTM + Attention, Bi-GRU, Bi-GRU + Attention, Recurrent CNN, Residual CNN, Transformer, and PhoBERT, and conduct all experiments on two datasets, AIVIVN 2019 and our own dataset collected from multiple Vietnamese e-commerce websites. The experimental results show that all our proposed methods outperform the winning solution of the competition “AIVIVN 2019 Sentiment Champion” by a significant margin. In particular, Recurrent CNN has the best performance in comparison with the other algorithms in terms of both AUC (98.48%) and F1-score (93.42%) on this competition dataset and also surpasses the other techniques on our collected dataset. Finally, we aim to publish our code and these two datasets later to contribute to the research community in the field of sentiment analysis.

A Semantic Approach for News Recommendation

Business Intelligence Applications and the Web - Advances in Business Information Systems and Analytics ◽

10.4018/978-1-61350-038-5.ch005 ◽

2011 ◽

pp. 102-121 ◽

Cited By ~ 3

Author(s):

Flavius Frasincar ◽

Wouter IJntema ◽

Frank Goossen ◽

Frederik Hogenboom

Keyword(s):

Recommender Systems ◽

Similarity Measures ◽

Extraction Methods ◽

Decision Processes ◽

Semantic Approach ◽

Business Decision ◽

Term Extraction ◽

Cosine Similarity Measure ◽

News Recommendation ◽

F Measure

News items play an increasingly important role in current business decision processes. Due to the large amount of news published every day, it is difficult to find the news items of interest. One solution to this problem is based on employing recommender systems. Traditionally, these recommenders use term extraction methods like TF-IDF combined with the cosine similarity measure. In this chapter, we explore semantic approaches for recommending news items by employing several semantic similarity measures. We have used existing semantic similarities as well as proposed new solutions for computing semantic similarities. Both traditional and semantic recommender approaches, some new, have been implemented in Athena, an extension of the Hermes news personalization framework. Based on the performed evaluation, we conclude that semantic recommender systems in general outperform traditional recommender systems with respect to accuracy, precision, and recall, and that the new semantic recommenders have a better F-measure than existing semantic recommenders.
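
The traditional baseline mentioned above, TF-IDF weighting combined with cosine similarity, can be sketched as follows. This is a minimal standard-library illustration, not code from Athena or Hermes; the function names, the whitespace tokenization, and the sample texts are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document,
    using relative term frequency times log inverse document frequency."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up user profile and two candidate news items.
profile = "bank raises interest rates".split()
news_a = "central bank raises rates again".split()
news_b = "football club wins cup final".split()
vecs = tfidf_vectors([profile, news_a, news_b])
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

A recommender of this kind ranks unread items by their similarity to the profile vector. It matches only on shared surface terms, which is the limitation the semantic measures in the chapter aim to address.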

Implementation of n-gram Methodology for Rotten Tomatoes Review Dataset Sentiment Analysis

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2017010103 ◽

2017 ◽

Vol 7 (1) ◽

pp. 30-41 ◽

Cited By ~ 12

Author(s):

Prayag Tiwari ◽

Brojo Kishore Mishra ◽

Sachin Kumar ◽

Vivek Kumar

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Sentiment Analysis ◽

Maximum Entropy ◽

Learning Strategies ◽

Supervised Machine Learning ◽

Support Vector ◽

N Gram ◽

F Measure ◽

Blog Posts

Sentiment analysis aims to capture the underlying opinion of a piece of content, which may be anything that holds a subjective view, for example an online review, comments on blog posts, or a film rating. These reviews and blogs may be classified into different polarity groups, such as negative, positive, and neutral, in order to extract information from the input dataset. Supervised machine learning strategies classify these reviews. In this paper, three different machine learning algorithms, Support Vector Machine (SVM), Maximum Entropy (ME), and Naive Bayes (NB), have been considered for the classification of human opinions. The accuracy of the different strategies is critically examined in order to assess their performance on the basis of parameters such as precision, recall, F-measure, and accuracy.
