Three Hybrid Classifiers for the Detection of Emotions in Suicide Notes

2012 ◽  
Vol 5s1 ◽  
pp. BII.S8967 ◽  
Author(s):  
Maria Liakata ◽  
Jee-Hyub Kim ◽  
Shyamasree Saha ◽  
Janna Hastings ◽  
Dietrich Rebholz-Schuhmann

We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we have considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach considers the task as single label multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a Precision of 60.1% whereas our best Recall (43.6%) was obtained using the third system.
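The second approach above (one binary classifier per category, with the final result taken as the union of the classifiers) can be illustrated with a minimal sketch. The category names and the keyword-based stand-in classifiers below are hypothetical placeholders and are not the authors' trained SVM or CRF models; only the union step reflects what the abstract describes.

```python
from typing import Callable, Dict, List, Set

# A binary classifier is modelled here as any function: sentence -> bool.
# These are hypothetical stand-ins, not trained SVM/CRF models.
BinaryClassifier = Callable[[str], bool]


def union_of_predictions(
    sentence: str, classifier_groups: List[Dict[str, BinaryClassifier]]
) -> Set[str]:
    """Return every label that at least one binary classifier fires on."""
    labels: Set[str] = set()
    for group in classifier_groups:
        for label, classifier in group.items():
            if classifier(sentence):
                labels.add(label)
    return labels


# Toy keyword rules standing in for the two classifier families (assumption).
svm_like = {
    "love": lambda s: "love" in s.lower(),
    "guilt": lambda s: "sorry" in s.lower(),
}
crf_like = {
    "love": lambda s: "dear" in s.lower(),
    "hopelessness": lambda s: "no hope" in s.lower(),
}

predicted = union_of_predictions("I am sorry, my dear.", [svm_like, crf_like])
```

Taking the union lets a sentence receive a label when either classifier family detects it; this tends to favour recall and can admit more false positives.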

2012 ◽  
Vol 5s1 ◽  
pp. BII.S8933 ◽  
Author(s):  
Colin Cherry ◽  
Saif M. Mohammad ◽  
Berry De Bruijn

This paper describes the National Research Council of Canada's submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
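For readers unfamiliar with scoring a multi-label sentence classification task, the sketch below computes precision, recall, and F-measure over sets of labels. It assumes micro-averaging (pooling true positives, false positives, and false negatives across all sentences), which the abstract does not state explicitly; the label names and data are invented for illustration.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-sentence label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels correctly predicted
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Toy data: a sentence may carry zero, one, or several labels.
gold_labels = [{"love"}, {"love", "guilt"}, set()]
pred_labels = [{"love"}, {"guilt"}, {"anger"}]

precision, recall, f1 = micro_prf(gold_labels, pred_labels)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3.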


2021 ◽  
pp. 2-11
Author(s):  
David Aufreiter ◽  
Doris Ehrlinger ◽  
Christian Stadlmann ◽  
Margarethe Uberwimmer ◽  
Anna Biedersberger ◽  
...  

On the servitization journey, manufacturing companies complement their offerings with new industrial and knowledge-based services, which brings challenges of uncertainty and risk. In addition to the required adjustment of internal factors, the international selling of services is a major challenge. This paper presents the initial results of an international research project aimed at assisting advanced manufacturers in making decisions about exporting their service offerings to foreign markets. Within this project, a tool is developed to support managers in their service export decisions through the automated generation of market information based on Natural Language Processing and Machine Learning. The paper presents a roadmap for progressing towards an Artificial Intelligence-based market information solution. It describes the research process steps of analyzing problem statements of relevant industry partners, selecting target countries and markets, defining parameters for the scope of the tool, classifying different service offerings and their components into categories, and developing an annotation scheme for generating reliable and focused training data for the Artificial Intelligence solution. This paper demonstrates good practices in essential steps and highlights common pitfalls to avoid for researchers and managers working on future research projects supported by Artificial Intelligence. Finally, the paper aims to support and motivate researchers and managers to discover AI applications and research opportunities within the servitization field.


2016 ◽  
Vol 42 (3) ◽  
pp. 391-419 ◽  
Author(s):  
Weiwei Sun ◽  
Xiaojun Wan

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Although hybrid systems are effective at boosting accuracy, their computationally expensive parsers make them inappropriate for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per-token classification, but also serve well as a front-end to a parser.


2017 ◽  
Author(s):  
Marie Lachaize ◽  
Sylvie Le Hégarat-Mascle ◽  
Emanuel Aldea ◽  
Aude Maitrot ◽  
Roger Reynaud

Author(s):  
Uzma Batool ◽  
Mohd Ibrahim Shapiai ◽  
Nordinah Ismail ◽  
Hilman Fauzi ◽  
Syahrizal Salleh

Silicon wafer defect data collected from fabrication facilities is intrinsically imbalanced because of the variable frequencies of defect types. Frequently occurring types will have more influence on the classification predictions if a model is trained on such skewed data. A fair classifier for such imbalanced data requires a mechanism to deal with type imbalance in order to avoid biased results. This study proposes a convolutional neural network for wafer map defect classification, employing oversampling as an imbalance-addressing technique. To give all classes equal participation in the classifier’s training, data augmentation has been employed, generating more samples in minor classes. The proposed deep learning method has been evaluated on a real wafer map defect dataset, and its classification results on the test set returned a 97.91% accuracy. The results were compared with those of another deep-learning-based auto-encoder model, indicating that the proposed method is a potential approach for silicon wafer defect classification, although its robustness needs further investigation.
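The idea of oversampling minor classes until every class participates equally in training can be sketched as follows. This is only an illustration under stated assumptions: it duplicates existing samples at random rather than generating new ones through data augmentation as the study does, and the function name, class names, and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Randomly duplicate minority-class samples until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra copies (with replacement) to reach the target class size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy imbalanced data: five "none" maps, two "scratch" maps, one "ring" map.
wafer_maps = [f"map{i}" for i in range(8)]
defect_types = ["none"] * 5 + ["scratch"] * 2 + ["ring"]

balanced_maps, balanced_types = oversample(wafer_maps, defect_types)
class_counts = Counter(balanced_types)
```

After the call, each of the three toy classes has five samples. With image data, the duplicated samples would typically be replaced by transformed variants (for example rotations or flips) rather than exact copies.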


2003 ◽  
Vol 9 (2) ◽  
pp. 127-149 ◽  
Author(s):  
RIE KUBOTA ANDO ◽  
LILLIAN LEE

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.


2014 ◽  
Vol 926-930 ◽  
pp. 3373-3378 ◽  
Author(s):  
Dan Yang Qi ◽  
Zheng Jiang

To address the diversity of capsule defect types and the difficulty of classifying them in actual capsule defect detection, this paper extracts capsule defect features based on capsule texture, shape, and the capsule defect region found by an edge detector, and then applies hierarchical SVMs for multi-class classification. In order to resolve the problems of training data imbalance and hierarchical SVM error accumulation, an algorithm for constructing the hierarchical structure is proposed that takes the principle of dividing all sample data into two more imbalanced categories according to the length of training data, and then considers the significance of each capsule defect and the probability of its occurrence. The experimental results show that, compared with a BP neural network, the hierarchical SVMs achieve a better classification result.


Author(s):  
S. K. Gupta ◽  
M. Jhunjhunwalla ◽  
A. Bhardwaj ◽  
D. P. Shukla

Abstract. Machine learning methods such as artificial neural networks and support vector machines require a large amount of training data; however, the number of landslide occurrences in a study area is limited. The limited number of landslides leads to a small number of positive class pixels in the training data. In contrast, the number of non-landslide pixels (negative class pixels) is enormous. This under-represented data and severe class distribution skew create a data imbalance for learning algorithms and lead to suboptimal models, which are biased towards the majority class (non-landslide pixels) and have low performance on the minority class (landslide pixels). In this work, we have used two algorithms, namely EasyEnsemble and BalanceCascade, for balancing the data. This balanced data is used with feature selection methods such as Fisher discriminant analysis (FDA), logistic regression (LR) and artificial neural network (ANN) to generate LSZ maps. The results of the study show that ANN with balanced data yields major improvements in the preparation of susceptibility maps over imbalanced data, whereas the LR method is adversely affected by the data balancing algorithms. The FDA does not show significant changes between balanced and imbalanced data.
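The sampling idea behind EasyEnsemble-style balancing can be sketched as follows: several random subsets of the majority class are drawn, each the same size as the minority class, and each is paired with all minority samples. This sketch covers only the subset construction, not the training and combination of one learner per subset, and it is not the authors' implementation; the function name, pixel identifiers, and subset count are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balanced_subsets(
    minority: Sequence, majority: Sequence, n_subsets: int = 3, seed: int = 0
) -> List[List[Tuple[object, int]]]:
    """Build balanced training sets: all minority samples plus an equally
    sized random draw (without replacement) from the majority class."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_majority = rng.sample(list(majority), len(minority))
        subset = [(x, 1) for x in minority] + [(x, 0) for x in sampled_majority]
        subsets.append(subset)
    return subsets


# Toy data: 2 landslide pixels (positive) versus 10 non-landslide pixels.
landslide_pixels = ["p1", "p2"]
non_landslide_pixels = [f"n{i}" for i in range(10)]

subsets = balanced_subsets(landslide_pixels, non_landslide_pixels)
```

Each subset would then be used to train one model, and the models' predictions combined, so that the majority class is covered across subsets without any single model seeing skewed data.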

