Dual Coordinate Descent Algorithms for Efficient Large Margin Structured Prediction

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00221 ◽

2013 ◽

Vol 1 ◽

pp. 207-218 ◽

Cited By ~ 7

Author(s):

Ming-Wei Chang ◽

Wen-tau Yih

Keyword(s):

Named Entity Recognition ◽

Weight Vector ◽

Coordinate Descent ◽

Structured Prediction ◽

Entity Recognition ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Modeling Tools ◽

Wide Range ◽

Dual Coordinate Descent

Due to the nature of complex NLP problems, structured prediction algorithms have been important modeling tools for a wide range of tasks. While there is evidence that the linear Structural Support Vector Machine (SSVM) algorithm performs better than the structured Perceptron, the SSVM algorithm is still less frequently chosen in the NLP community because of its relatively slow training speed. In this paper, we propose a fast and easy-to-implement dual coordinate descent algorithm for SSVMs. Unlike algorithms such as Perceptron and stochastic gradient descent, our method keeps track of dual variables and updates the weight vector more aggressively. As a result, the training process is as efficient as existing online learning methods, yet yields consistently better models, as evaluated on four benchmark NLP datasets for part-of-speech tagging, named-entity recognition and dependency parsing.
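To illustrate what "keeping track of dual variables" while updating the weight vector can look like, here is a minimal sketch of dual coordinate descent for a plain binary linear SVM with hinge loss. It is not the structured algorithm proposed in the paper: the function name `dcd_linear_svm`, the parameters `C`, `epochs` and `seed`, and the absence of a bias term are all assumptions made for illustration.

```python
import random


def dcd_linear_svm(X, y, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent for a binary hinge-loss linear SVM (no bias term).

    Illustrative sketch only; labels in y are assumed to be +1 or -1.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    alpha = [0.0] * n   # one dual variable per training example
    w = [0.0] * d       # primal weights, kept equal to sum_i alpha_i * y_i * x_i
    sq_norm = [sum(v * v for v in x) for x in X]
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            if sq_norm[i] == 0.0:
                continue  # an all-zero example cannot change w
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            grad = margin - 1.0  # partial derivative of the dual objective w.r.t. alpha_i
            # Exact one-variable minimiser, clipped to the box [0, C].
            new_alpha = min(max(alpha[i] - grad / sq_norm[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                alpha[i] = new_alpha
                for j in range(d):
                    w[j] += delta * y[i] * X[i][j]
    return w, alpha
```

Each step solves the one-variable subproblem exactly instead of taking a fixed-size gradient step, which is one way to read "more aggressively" in the abstract. Storing `alpha` is what lets the update be clipped to the feasible range.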

Hyper-parameter optimization for support vector machines using stochastic gradient descent and dual coordinate descent

EURO Journal on Computational Optimization ◽

10.1007/s13675-019-00115-7 ◽

2019 ◽

Vol 8 (1) ◽

pp. 85-101 ◽

Cited By ~ 3

Author(s):

Wei Jiang ◽

Sauleh Siddiqui

Keyword(s):

Support Vector Machines ◽

Parameter Optimization ◽

Gradient Descent ◽

Coordinate Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Vector Machines ◽

Dual Coordinate Descent

A Kernel-Based Approach for Biomedical Named Entity Recognition

The Scientific World JOURNAL ◽

10.1155/2013/950796 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 8

Author(s):

Rakesh Patra ◽

Sujan Kumar Saha

Keyword(s):

Kernel Function ◽

Text Processing ◽

Named Entity Recognition ◽

Kernel Functions ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Named Entity ◽

Tree Kernel

The support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of the SVM classifier largely depends on the appropriateness of the kernel function. In the last few years a number of task-specific kernel functions have been proposed and used in various text processing tasks, for example, the string kernel, graph kernel, tree kernel and so on. So far very little effort has been devoted to the development of NER-specific kernels. In the literature we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation. The conventional tree kernel is unable to execute the complete NER task on its own. In this paper we propose a kernel function, motivated by the tree kernel, which is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied the kernel function to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy.

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.

SCIENTIFIC NAMED ENTITY RECOGNITION WITH THE HELP OF MODERN METHODS

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2021-3.1728-7901.11 ◽

2021 ◽

Vol 75 (3) ◽

pp. 94-99

Author(s):

A.M. Yelenov ◽

A.B. Jaxylykova

Keyword(s):

Machine Learning ◽

Language Processing ◽

Named Entity Recognition ◽

Recognition Task ◽

Entity Recognition ◽

Support Vector ◽

Scientific Article ◽

Natural Languages ◽

Named Entity ◽

Learning Area

This research focuses on a comparative study of the Named Entity Recognition task for scientific article texts. Natural language processing can be considered one of the cornerstones of the machine learning area; it is devoted to problems connected with the understanding of different natural languages and with linguistic analysis. It has already been shown that current deep learning techniques achieve good performance and accuracy in areas such as image recognition, pattern recognition and computer vision, which suggests that such technology would probably be successful in the natural language processing area too, and this has led to a dramatic increase in research interest in the topic. For a very long time, quite simple algorithms were used in this area, such as support vector machines or various types of regression, together with basic encodings of text data, and these did not provide high results. The following dataset was used for the experimental models: Dataset Scientific Entity Relation Core. The algorithms used were Long Short-Term Memory, a Random Forest Classifier with Conditional Random Fields, and Named Entity Recognition with Bidirectional Encoder Representations from Transformers. In the findings, the metric scores of all models are compared with each other. This research is devoted to the processing of scientific articles concerning the machine learning area, because the subject has not been investigated sufficiently. Consideration of this task can help machines to understand natural languages better, so that they can solve other natural language processing tasks better, improving scores overall.

A clipping dual coordinate descent algorithm for solving support vector machines

Knowledge-Based Systems ◽

10.1016/j.knosys.2014.08.005 ◽

2014 ◽

Vol 71 ◽

pp. 266-278 ◽

Cited By ~ 12

Author(s):

Xinjun Peng ◽

Dongjing Chen ◽

Lingyan Kong

Keyword(s):

Support Vector Machines ◽

Coordinate Descent ◽

Support Vector ◽

Descent Algorithm ◽

Coordinate Descent Algorithm ◽

Vector Machines ◽

Dual Coordinate Descent

Tuning support vector machines for biomedical named entity recognition

10.3115/1118149.1118150 ◽

2002 ◽

Cited By ~ 82

Author(s):

Jun'ichi Kazama ◽

Takaki Makino ◽

Yoshihiro Ohta ◽

Jun'ichi Tsujii

Keyword(s):

Support Vector Machines ◽

Named Entity Recognition ◽

Entity Recognition ◽

Support Vector ◽

Named Entity ◽

Vector Machines ◽

Biomedical Named Entity Recognition

Detecting the Presence of Named Entities in Bengali: Corpus and Experiments

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128445 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Farzana Rashid ◽

Fahmida Hamid

Keyword(s):

Question Answering ◽

Short Term Memory ◽

Named Entity Recognition ◽

Entity Recognition ◽

Support Vector ◽

Named Entities ◽

Comparable Amount ◽

Relationship Extraction ◽

Simple Language ◽

Asian Languages

Named Entity Recognition (NER) belongs to the field of Information Extraction (IE) and Natural Language Processing (NLP). NER aims to find and categorize named entities present in the textual data into recognizable classes. Named entities play vital roles in other related fields like question-answering, relationship extraction, and machine translation. Researchers have done a significant amount of work (e.g., dataset construction and analysis) in this direction for several languages like English, Spanish, Chinese, Russian, Arabic, to name a few. We do not find a comparable amount of work for several South-Asian languages like Bengali/Bangla. Hence, as part of the initial phase, we have constructed a qualitative dataset in Bengali. In this paper, we identify the presence of Named Entities (NEs) in the Bengali text (sentences), classify them in standardized categories, and test whether an automatic detection of NE is possible. We present a new corpus and experimental results. Our dataset, annotated by multiple humans, shows promising results (F-measures ranging from 0.72 to 0.84) in different setups (support vector machine (SVM) setups with simple language features and Long-Short Term Memory (LSTM) setup with various word embedding).
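For readers unfamiliar with the F-measure figures quoted above, the sketch below shows one common way such a score is computed. The abstract does not state the evaluation granularity, so the exact-match comparison of `(start, end, label)` spans, the function name `f_measure` and the `beta` parameter are assumptions for illustration only.

```python
def f_measure(gold, predicted, beta=1.0):
    """F-measure over exact-match entity spans (illustrative assumption).

    gold and predicted are iterables of hashable items such as
    (start, end, label) tuples.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0  # also covers empty gold or empty predictions
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` this is the harmonic mean of precision and recall, so a system must do well on both to score highly.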

An Improved Word Representation for Deep Learning Based NER in Indian Languages

Information ◽

10.3390/info10060186 ◽

2019 ◽

Vol 10 (6) ◽

pp. 186 ◽

Cited By ~ 1

Author(s):

Ajees A P ◽

Manju K ◽

Sumam Mary Idicula

Keyword(s):

Deep Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Indian Languages ◽

Named Entity ◽

Text Document ◽

Learning Techniques ◽

Word Representation

Named Entity Recognition (NER) is the process of identifying the elementary units in a text document and classifying them into predefined categories such as person, location, organization and so forth. NER plays an important role in many Natural Language Processing applications like information retrieval, question answering, machine translation and so forth. Resolving the ambiguities of lexical items involved in a text document is a challenging task. NER in Indian languages is always a complex task due to their morphological richness and agglutinative nature. Even though different solutions have been proposed for NER, it is still an unsolved problem. Traditional approaches to Named Entity Recognition were based on the application of hand-crafted features to classical machine learning techniques such as Hidden Markov Model (HMM), Support Vector Machine (SVM), Conditional Random Field (CRF) and so forth. The introduction of deep learning techniques to the NER problem changed the scenario, and state-of-the-art results have since been achieved using deep learning architectures. In this paper, we address the problem of effective word representation for NER in Indian languages by capturing the syntactic, semantic and morphological information. We propose a deep learning based entity extraction system for Indian languages using a novel combined word representation, including character-level, word-level and affix-level embeddings. We have used ‘ARNEKT-IECSIL 2018’ shared data for training and testing. Our results highlight the improvement that we obtained over the existing pre-trained word representations.

NAMED ENTITY RECOGNITION IN BIOMEDICAL LITERATURE USING TWO-LAYER SUPPORT VECTOR MACHINES

Proceedings of the Ninth International Conference on Enterprise Information Systems ◽

10.5220/0002357300390045 ◽

2007 ◽

Keyword(s):

Support Vector Machines ◽

Named Entity Recognition ◽

Biomedical Literature ◽

Entity Recognition ◽

Support Vector ◽

Named Entity ◽

Vector Machines

Identifying interactions between chemical entities in biomedical text

Journal of Integrative Bioinformatics ◽

10.1515/jib-2014-247 ◽

2014 ◽

Vol 11 (3) ◽

pp. 1-16 ◽

Cited By ~ 6

Author(s):

Andre Lamurias ◽

João D. Ferreira ◽

Francisco M. Couto

Keyword(s):

Named Entity Recognition ◽

Relation Extraction ◽

Ensemble Classifier ◽

Entity Recognition ◽

Support Vector ◽

Biomedical Text ◽

Web Tool ◽

Named Entity ◽

Vector Machines ◽

Chemical Named Entity Recognition

Interactions between chemical compounds described in biomedical text can be of great importance to drug discovery and design, as well as pharmacovigilance. We developed a novel system, “Identifying Interactions between Chemical Entities” (IICE), to identify chemical interactions described in text. Kernel-based Support Vector Machines first identify the interactions and then an ensemble classifier validates and classifies the type of each interaction. This relation extraction module was evaluated with the corpus released for the DDI Extraction task of SemEval 2013, obtaining results comparable to state-of-the-art methods for this type of task. We integrated this module with our chemical named entity recognition module and made the whole system available as a web tool at www.lasige.di.fc.ul.pt/webtools/iice.
