Three Hybrid Classifiers for the Detection of Emotions in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8967 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8967 ◽

Cited By ~ 6

Author(s):

Maria Liakata ◽

Jee-Hyub Kim ◽

Shyamasree Saha ◽

Janna Hastings ◽

Dietrich Rebholz-Schuhmann

Keyword(s):

Imbalanced Data ◽

Training Data ◽

Annotation Scheme ◽

The Third ◽

Hybrid Approaches ◽

Suicide Notes ◽

Binary Classifiers ◽

Hybrid Classifiers ◽

Multi Class Classification ◽

Derived Rules

We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we have considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach treats the task as single-label multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers' predictions as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a Precision of 60.1%, whereas our best Recall (43.6%) was obtained using the third system.
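The union step of the second approach can be illustrated with a minimal sketch. It assumes each classifier returns a set of category labels per sentence; the function name, sentence ids, and labels below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def union_predictions(per_classifier: List[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Merge the label sets predicted by several classifiers, sentence by sentence."""
    merged: Dict[str, Set[str]] = {}
    for predictions in per_classifier:
        for sentence_id, labels in predictions.items():
            # A label is kept if at least one classifier predicted it.
            merged.setdefault(sentence_id, set()).update(labels)
    return merged


# Hypothetical outputs of the SVM-based and CRF-based binary classifiers.
svm_out = {"s1": {"hopelessness"}, "s2": set()}
crf_out = {"s1": {"love"}, "s2": {"instructions"}}

combined = union_predictions([svm_out, crf_out])
```

Taking the union can only add labels to a sentence, so it tends to favour recall over precision compared with either classifier alone.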


Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes

Biomedical Informatics Insights ◽

10.4137/bii.s8933 ◽

2012 ◽

Vol 5s1 ◽

pp. BII.S8933 ◽

Cited By ~ 18

Author(s):

Colin Cherry ◽

Saif M. Mohammad ◽

Berry De Bruijn

Keyword(s):

National Research Council ◽

Training Data ◽

Emotion Detection ◽

Suicide Notes ◽

Comparable Performance ◽

Fast Development ◽

Suicide Note ◽

Sentence Classification ◽

Binary Classifiers ◽

F Measure

This paper describes the National Research Council of Canada's submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
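Because each sentence can carry zero or more emotions, system output is naturally scored over sets of labels. The sketch below computes precision, recall, and F-measure under the assumption of micro-averaging (counts pooled over all sentences and labels); the function name and the example labels are hypothetical.

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for multi-label predictions."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Three sentences; the third has no gold emotion at all.
gold = [{"love"}, {"guilt", "sorrow"}, set()]
pred = [{"love"}, {"guilt"}, {"anger"}]

precision, recall, f1 = micro_prf(gold, pred)
```

In this toy case two of the three predicted labels are correct and two of the three gold labels are found, so precision, recall, and F1 all equal 2/3.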


Automatic information retrievement for exporting services: First project findings from the development of an AI based export decision supporting instrument

Marketing Science & Inspirations ◽

10.46286/msi.2021.16.2.1 ◽

2021 ◽

pp. 2-11

Author(s):

David Aufreiter ◽

Doris Ehrlinger ◽

Christian Stadlmann ◽

Margarethe Uberwimmer ◽

Anna Biedersberger ◽

...

Keyword(s):

Artificial Intelligence ◽

Language Processing ◽

Research Process ◽

Training Data ◽

Future Research ◽

Manufacturing Companies ◽

Market Information ◽

Annotation Scheme ◽

Export Decisions ◽

Decision Supporting

On the servitization journey, manufacturing companies complement their offerings with new industrial and knowledge-based services, which brings challenges of uncertainty and risk. In addition to the required adjustment of internal factors, the international selling of services is a major challenge. This paper presents the initial results of an international research project aimed at assisting advanced manufacturers in making decisions about exporting their service offerings to foreign markets. Within this project, a tool is being developed to support managers in their service export decisions through the automated generation of market information based on Natural Language Processing and Machine Learning. The paper presents a roadmap for progressing towards an Artificial Intelligence-based market information solution. It describes the steps of the research process: analyzing the problem statements of relevant industry partners, selecting target countries and markets, defining parameters for the scope of the tool, classifying different service offerings and their components into categories, and developing an annotation scheme for generating reliable and focused training data for the Artificial Intelligence solution. This paper demonstrates good practices in essential steps and highlights common pitfalls to avoid for researchers and managers working on future research projects supported by Artificial Intelligence. Finally, the paper aims to support and motivate researchers and managers to discover AI applications and research opportunities within the servitization field.


Towards Accurate and Efficient Chinese Part-of-Speech Tagging

Computational Linguistics ◽

10.1162/coli_a_00253 ◽

2016 ◽

Vol 42 (3) ◽

pp. 391-419 ◽

Cited By ~ 4

Author(s):

Weiwei Sun ◽

Xiaojun Wan

Keyword(s):

Hybrid Systems ◽

Language Processing ◽

Large Scale ◽

Unlabeled Data ◽

Training Data ◽

Test Time ◽

System Combination ◽

Pos Tagging ◽

Hybrid Approaches ◽

Lexical Relations

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical relations are implicitly captured by syntactic parsing in the constituency formalism, and are utilized via system combination. Experiments on the Penn Chinese Treebank demonstrate the importance of both paradigmatic and syntagmatic relations. Our linguistically motivated, hybrid approaches yield a relative error reduction of 18% in total over state-of-the-art baselines. Although they are effective at boosting accuracy, computationally expensive parsers make hybrid systems impractical for many realistic NLP applications. In this article, we are also concerned with improving tagging efficiency at test time. In particular, we explore unlabeled data to transfer the predictive power of hybrid models to simple sequence models. Specifically, hybrid systems are utilized to create large-scale pseudo training data for cheap models. Experimental results illustrate that the re-compiled models not only achieve high accuracy with respect to per-token classification, but also serve well as a front end to a parser.
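The idea of creating pseudo training data for a cheap model can be sketched as follows. This is a toy illustration only: the "teacher" is a hypothetical lookup table standing in for an expensive hybrid tagger, the "student" is a most-frequent-tag model rather than a sequence model, and the English tokens and tags are invented for readability.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def teacher_tag(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for an accurate but expensive hybrid tagger."""
    lexicon = {"the": "DT", "dog": "NN", "cat": "NN", "runs": "VB", "sleeps": "VB"}
    return [lexicon.get(token, "NN") for token in tokens]


def train_student(pseudo_corpus: List[Tuple[List[str], List[str]]]) -> Dict[str, str]:
    """Train a cheap model: remember the most frequent tag seen for each token."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, tags in pseudo_corpus:
        for token, tag in zip(tokens, tags):
            counts[token][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


# Step 1: the teacher labels unlabeled sentences, producing pseudo training data.
unlabeled = [["the", "dog", "runs"], ["the", "cat", "sleeps"]]
pseudo_corpus = [(sentence, teacher_tag(sentence)) for sentence in unlabeled]

# Step 2: the cheap student is trained on the pseudo-labeled data only.
student = train_student(pseudo_corpus)

# Step 3: at test time only the student runs; unseen tokens default to "NN".
student_tags = [student.get(token, "NN") for token in ["the", "cat", "runs"]]
```

The expensive component is needed only once, offline, to label data; the model used at test time never calls it.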


Evidential multi-class classification from binary classifiers: application to waste sorting quality control from hyperspectral data

10.1117/12.2266961 ◽

2017 ◽

Author(s):

Marie Lachaize ◽

Sylvie Le Hégarat-Mascle ◽

Emanuel Aldea ◽

Aude Maitrot ◽

Roger Reynaud

Keyword(s):

Quality Control ◽

Hyperspectral Data ◽

Binary Classifiers ◽

Multi Class Classification


A Novel Multi-class Classification Architecture Combining Population-based Sampling and Multi-expert Classifier for Imbalanced Data

10.1109/smc52423.2021.9659252 ◽

2021 ◽

Author(s):

Haochen Jiang ◽

Ziqi Wei ◽

Lin Liu ◽

Xiulong Yuan ◽

Jun Chen

Keyword(s):

Imbalanced Data ◽

Population Based ◽

Multi Class Classification


Oversampling Based on Data Augmentation in Convolutional Neural Network for Silicon Wafer Defect Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200547 ◽

2020 ◽

Author(s):

Uzma Batool ◽

Mohd Ibrahim Shapiai ◽

Nordinah Ismail ◽

Hilman Fauzi ◽

Syahrizal Salleh

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Silicon Wafer ◽

Data Augmentation ◽

Imbalanced Data ◽

Training Data ◽

Defect Classification ◽

Learning Method ◽

Test Set

Silicon wafer defect data collected from fabrication facilities is intrinsically imbalanced because of the variable frequencies of defect types. Frequently occurring types will have more influence on the classification predictions if a model is trained on such skewed data. A fair classifier for such imbalanced data requires a mechanism to deal with type imbalance in order to avoid biased results. This study proposes a convolutional neural network for wafer map defect classification, employing oversampling to address the imbalance. To give all classes equal participation in the classifier's training, data augmentation is employed to generate more samples in the minority classes. The proposed deep learning method was evaluated on a real wafer map defect dataset, and its classification results on the test set reached 97.91% accuracy. The results were compared with those of another deep learning based auto-encoder model, indicating that the proposed method is a potential approach for silicon wafer defect classification whose robustness needs to be investigated further.
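Oversampling through augmentation can be sketched in a few lines. The sketch assumes wafer maps are small 2D grids and that rotations and horizontal flips are acceptable label-preserving transforms; the abstract does not specify which augmentations were used, and the class names below are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import List, Tuple

Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def oversample(samples: List[Tuple[Grid, str]], seed: int = 0) -> List[Tuple[Grid, str]]:
    """Add augmented copies until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for grid, label in samples:
        by_class[label].append(grid)
    target = max(len(grids) for grids in by_class.values())
    balanced = list(samples)
    for label, grids in by_class.items():
        for _ in range(target - len(grids)):
            transform = rng.choice([rotate90, flip_horizontal])
            balanced.append((transform(rng.choice(grids)), label))
    return balanced


# Toy data: three maps of one defect type, one map of another.
samples = [
    ([[0, 1], [0, 0]], "center"),
    ([[1, 1], [0, 0]], "center"),
    ([[0, 0], [1, 1]], "center"),
    ([[1, 0], [0, 0]], "scratch"),
]
balanced = oversample(samples)
class_counts = Counter(label for _, label in balanced)
```

After balancing, both classes contribute three samples each, and the original samples are left unchanged.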


Multi-Class Classification using Covariance among Binary Classifiers and its Application to the Analysis of Tumor Microarrays

Computational Intelligence and Bioinformatics / 755: Modelling, Identification, and Simulation ◽

10.2316/p.2012.753-043 ◽

2012 ◽

Author(s):

Li-San Wang ◽

Yuk Yee Leung

Keyword(s):

Binary Classifiers ◽

Multi Class Classification


Mostly-unsupervised statistical segmentation of Japanese kanji sequences

Natural Language Engineering ◽

10.1017/s1351324902002954 ◽

2003 ◽

Vol 9 (2) ◽

pp. 127-149 ◽

Cited By ~ 9

Author(s):

Rie Kubota Ando ◽

Lillian Lee

Keyword(s):

State Of The Art ◽

Training Data ◽

Syntactic Analysis ◽

Annotation Scheme ◽

Error Metrics ◽

Japanese Word ◽

Robust Statistical Method ◽

Segmentation Algorithms ◽

Statistical Segmentation ◽

Multiple Granularities

Given the lack of word delimiters in written Japanese, word segmentation is generally considered a crucial first step in processing Japanese texts. Typical Japanese segmentation algorithms rely either on a lexicon and syntactic analysis or on pre-segmented data; but these are labor-intensive, and the lexico-syntactic techniques are vulnerable to the unknown word problem. In contrast, we introduce a novel, more robust statistical method utilizing unsegmented training data. Despite its simplicity, the algorithm yields performance on long kanji sequences comparable to and sometimes surpassing that of state-of-the-art morphological analyzers over a variety of error metrics. The algorithm also outperforms another mostly-unsupervised statistical algorithm previously proposed for Chinese. Additionally, we present a two-level annotation scheme for Japanese to incorporate multiple segmentation granularities, and introduce two novel evaluation metrics, both based on the notion of a compatible bracket, that can account for multiple granularities simultaneously.


Capsule Defects Classification Based on Hierarchical Support Vector Machines

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.3373 ◽

2014 ◽

Vol 926-930 ◽

pp. 3373-3378 ◽

Cited By ~ 1

Author(s):

Dan Yang Qi ◽

Zheng Jiang

Keyword(s):

Significant Degree ◽

Training Data ◽

Support Vector ◽

A Algorithm ◽

Vector Machines ◽

Sample Data ◽

Feature Based ◽

Error Accumulation ◽

Multi Class Classification ◽

Defects Classification

To address the diversity of capsule defect types and the difficulty of classifying them in practical capsule defect detection, this paper extracts capsule defect features based on capsule texture, shape and defect region using an edge detector, and then applies hierarchical SVM multi-class classification. To resolve the problems of training data imbalance and hierarchical SVM error accumulation, an algorithm for constructing the hierarchical structure is proposed. It follows the principle of dividing all sample data into two more imbalanced categories according to the length of the training data, and then considers the significance of each capsule defect and its probability of occurrence. The experimental results show that the hierarchical SVMs achieve a better classification result than a BP neural network.


DATA IMBALANCE IN LANDSLIDE SUSCEPTIBILITY ZONATION: UNDER-SAMPLING FOR CLASS-IMBALANCE LEARNING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w11-51-2020 ◽

2020 ◽

Vol XLII-3/W11 ◽

pp. 51-57

Author(s):

S. K. Gupta ◽

M. Jhunjhunwalla ◽

A. Bhardwaj ◽

D. P. Shukla

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Class Imbalance ◽

Imbalanced Data ◽

Training Data ◽

Support Vector ◽

Fisher Discriminant Analysis ◽

Minority Class ◽

Data Imbalance ◽

Artificial Neural

Machine learning methods such as artificial neural networks and support vector machines require a large amount of training data; however, the number of landslide occurrences in a study area is limited. The limited number of landslides leads to a small number of positive-class pixels in the training data. In contrast, the number of non-landslide pixels (negative-class pixels) is enormous. This under-representation and severe skew in the class distribution create a data imbalance for learning algorithms and lead to suboptimal models, which are biased towards the majority class (non-landslide pixels) and perform poorly on the minority class (landslide pixels). In this work, we have used two algorithms, namely EasyEnsemble and BalanceCascade, for balancing the data. This balanced data is used with feature selection methods such as Fisher discriminant analysis (FDA), logistic regression (LR) and artificial neural network (ANN) to generate LSZ maps. The results of the study show that ANN with balanced data yields major improvements in the preparation of susceptibility maps over imbalanced data, whereas the LR method is adversely affected by data balancing algorithms. The FDA does not show significant changes between balanced and imbalanced data.
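The under-sampling step behind EasyEnsemble can be sketched as below. This is a minimal illustration of the sampling only: it draws several balanced subsets, each pairing all minority samples with an equally sized random draw from the majority class. Training a classifier on each subset and combining them is omitted, BalanceCascade is not shown, and the function name and toy pixel data are hypothetical.

```python
import random
from typing import List, Tuple

Sample = Tuple[int, int]  # (pixel id, class label); 1 = landslide, 0 = non-landslide


def easy_ensemble_subsets(
    majority: List[Sample], minority: List[Sample], n_subsets: int, seed: int = 0
) -> List[List[Sample]]:
    """Build balanced training subsets by randomly under-sampling the majority class."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        # Sample without replacement; requires len(majority) >= len(minority).
        drawn = rng.sample(majority, len(minority))
        subsets.append(drawn + list(minority))
    return subsets


# Toy data: 20 non-landslide pixels versus 4 landslide pixels.
non_landslide = [(i, 0) for i in range(20)]
landslide = [(100 + i, 1) for i in range(4)]

subsets = easy_ensemble_subsets(non_landslide, landslide, n_subsets=3)
```

Each subset is balanced, and because the majority draws are independent, different subsets expose a learner to different parts of the majority class.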
