Higher-order Lexical Semantic Models for Non-factoid Answer Reranking

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00133 ◽

2015 ◽

Vol 3 ◽

pp. 197-210 ◽

Cited By ~ 5

Author(s):

Daniel Fried ◽

Peter Jansen ◽

Gustave Hahn-Powell ◽

Mihai Surdeanu ◽

Peter Clark

Keyword(s):

Question Answering ◽

Direct Evidence ◽

Higher Order ◽

Language Models ◽

Lexical Semantic ◽

Semantic Models ◽

Question And Answer ◽

Network Language ◽

Relative Gains ◽

Direct Term

Lexical semantic models provide robust performance for question answering, but, in general, can only capitalize on direct evidence seen during training. For example, monolingual alignment models acquire term alignment probabilities from semi-structured data such as question-answer pairs; neural network language models learn term embeddings from unstructured text. All this knowledge is then used to estimate the semantic similarity between question and answer candidates. We introduce a higher-order formalism that allows all these lexical semantic models to chain direct evidence to construct indirect associations between question and answer texts, by casting the task as the traversal of graphs that encode direct term associations. Using a corpus of 10,000 questions from Yahoo! Answers, we experimentally demonstrate that higher-order methods are broadly applicable to alignment and language models, across both word and syntactic representations. We show that an important criterion for success is controlling for the semantic drift that accumulates during graph traversal. All in all, the proposed higher-order approach improves five out of the six lexical semantic models investigated, with relative gains of up to +13% over their first-order variants.
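One way to picture the chaining of direct evidence described in this abstract is as repeated multiplication of a term-association matrix. The following is a minimal sketch under that assumption, not the authors' implementation: the toy vocabulary, the probabilities, and the function names `matmul` and `higher_order` are all hypothetical.

```python
# Hypothetical toy vocabulary; none of these values come from the paper.
VOCAB = ["bake", "oven", "heat"]

# FIRST_ORDER[i][j]: assumed direct association probability from VOCAB[i]
# to VOCAB[j]. Each row sums to 1, so a row is a one-step distribution.
FIRST_ORDER = [
    [0.5, 0.5, 0.0],  # "bake" is directly associated with itself and "oven"
    [0.0, 0.5, 0.5],  # "oven" is directly associated with itself and "heat"
    [0.0, 0.0, 1.0],  # "heat" is directly associated only with itself
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def higher_order(matrix, order):
    """Return the association matrix after `order` traversal steps (order >= 1)."""
    result = matrix
    for _ in range(order - 1):
        result = matmul(result, matrix)
    return result


second_order = higher_order(FIRST_ORDER, 2)
bake, heat = VOCAB.index("bake"), VOCAB.index("heat")

# No direct evidence links "bake" and "heat" ...
print(FIRST_ORDER[bake][heat])
# ... but two traversal steps connect them through "oven".
print(second_order[bake][heat])
```

In this toy setting, each additional step spreads probability mass over more terms, which gives some intuition for why the abstract stresses controlling semantic drift during traversal.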

Factorised Hidden Layer Based Domain Adaptation for Recurrent Neural Network Language Models

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.23919/apsipa.2018.8659473 ◽

2018 ◽

Cited By ~ 1

Author(s):

Michael Hentschel ◽

Marc Delcroix ◽

Atsunori Ogawa ◽

Tomoharu Iwata ◽

Tomohiro Nakatani

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Domain Adaptation ◽

Language Models ◽

Hidden Layer ◽

Network Language

Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System

International Journal of Neural Systems ◽

10.1142/s0129065718500077 ◽

2018 ◽

Vol 28 (09) ◽

pp. 1850007

Author(s):

Francisco Zamora-Martinez ◽

Maria Jose Castro-Bleda

Keyword(s):

Neural Network ◽

Machine Translation ◽

Language Processing ◽

Traditional Approach ◽

Computational Cost ◽

Integrated Approach ◽

Language Models ◽

Translation System ◽

Neural Net ◽

Network Language

Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks such as Machine Translation. In this work we introduce a Statistical Machine Translation (SMT) system that fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on [Formula: see text]-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence translation quality more strongly. Computational issues were solved by using a novel idea based on memorization and smoothing of the softmax constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and [Formula: see text]-gram-based systems. The results suggest that the integrated approach is more promising for [Formula: see text]-gram-based systems, even with non-full-quality NNLMs.
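The idea of storing softmax constants instead of recomputing them can be illustrated with a small lookup table. This is only a sketch under stated assumptions, not the system described in the abstract: the class `ConstantCache`, the functions `log_partition` and `approx_log_prob`, the toy scores, and in particular the fallback to the mean stored constant (standing in for the smoothing the abstract mentions) are all hypothetical.

```python
import math


def log_partition(scores):
    """Exact log of the softmax normalization constant for one context."""
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


class ConstantCache:
    """Stores precomputed log-constants per context (hypothetical helper)."""

    def __init__(self):
        self.table = {}

    def memorize(self, context, scores):
        # The expensive sum over the vocabulary happens once, ahead of decoding.
        self.table[context] = log_partition(scores)

    def lookup(self, context):
        if context in self.table:
            return self.table[context]
        # Assumed smoothing: unseen contexts fall back to the mean stored
        # constant. Requires at least one memorized context.
        return sum(self.table.values()) / len(self.table)


def approx_log_prob(cache, context, word_score):
    """Unnormalized score minus a looked-up constant; no vocabulary sum."""
    return word_score - cache.lookup(context)


cache = ConstantCache()
cache.memorize(("the", "cat"), [0.0, 0.0])            # log Z = log 2
cache.memorize(("a", "dog"), [0.0, 0.0, 0.0, 0.0])    # log Z = log 4

seen = approx_log_prob(cache, ("the", "cat"), 0.0)    # exact for a stored context
unseen = approx_log_prob(cache, ("my", "bird"), 0.0)  # approximate for a new one
print(math.exp(seen), math.exp(unseen))
```

For stored contexts the result is exact; for unseen contexts it is only an approximation, which mirrors the trade-off between LM quality and computational cost that the abstract describes.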

Author Reputation Measurement on Question and Answer Sites by the Classification of Author-Generated Content

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194021500479 ◽

2021 ◽

Vol 31 (10) ◽

pp. 1421-1445

Author(s):

Erhan Sezerer ◽

Samet Tenekeci ◽

Ali Acar ◽

Bora Baloğlu ◽

Selma Tekir

Keyword(s):

Software Engineering ◽

Design Patterns ◽

Binary Classification ◽

Grey Literature ◽

Language Models ◽

Superior Performance ◽

Reputation Measurement ◽

Objective Quality ◽

Question And Answer ◽

Dataset Size

In the field of software engineering, practitioners’ share in the constructed knowledge should not be underestimated and is mostly in the form of grey literature (GL). GL is a valuable resource, though it is subjective and lacks an objective quality assurance methodology. In this paper, a quality assessment scheme is proposed for question and answer (Q&A) sites. In particular, we target Stack Overflow (SO) and Stack Exchange (SE) sites. We model the problem of author reputation measurement as a classification task on the author-provided answers. The authors’ mean, median, and total answer scores are used as inputs for class labeling. State-of-the-art language models (BERT and DistilBERT) with a softmax layer on top are used as classifiers and compared to SVM and random baselines. Our best model achieves [Formula: see text] accuracy in binary classification on the SO design patterns tag and [Formula: see text] accuracy on the SE software engineering category. The superior performance on SE software engineering can be explained by its larger dataset size. In addition to the quantitative evaluation, we provide qualitative evidence that the system’s predicted reputation labels match the quality of the provided answers.

Audio-aware Spoken Multiple-choice Question Answering with Pre-trained Language Models

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2021.3120638 ◽

2021 ◽

pp. 1-1

Author(s):

Chia-Chih Kuo ◽

Kuan-Yu Chen ◽

Shang-Bao Luo

Keyword(s):

Question Answering ◽

Multiple Choice ◽

Multiple Choice Question ◽

Language Models

Efficient Transfer Learning for Neural Network Language Models

2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) ◽

10.1109/asonam.2018.8508304 ◽

2018 ◽

Author(s):

Jacek Skryzalin ◽

Hamilton Link ◽

Jeremy Wendt ◽

Richard Field ◽

Samuel N. Richter

Keyword(s):

Neural Network ◽

Transfer Learning ◽

Language Models ◽

Network Language ◽

Efficient Transfer

Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461550 ◽

2018 ◽

Cited By ~ 4

Author(s):

Xunying Liu ◽

Shansong Liu ◽

Jinze Sha ◽

Jianwei Yu ◽

Zhiyuan Xu ◽

...

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Recurrent Neural Network ◽

Language Models ◽

Limited Memory ◽

Network Language

What Can We Learn from Almost a Decade of Food Tweets

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200622 ◽

2020 ◽

Author(s):

Uga Sproģis ◽

Matīss Rikters

Keyword(s):

Sentiment Analysis ◽

Question Answering ◽

Time Span ◽

Use Cases ◽

Specific Question ◽

Domain Specific ◽

Question And Answer ◽

Analysis Models ◽

Over Time

We present the Latvian Twitter Eater Corpus, a set of tweets in the narrow domain of food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets accompanied by additional useful data. We also separate out two sub-corpora: question-and-answer tweets and sentiment-annotated tweets. We analyse the contents of the corpus and demonstrate use cases for the sub-corpora by training domain-specific question-answering and sentiment-analysis models on data from the corpus.

Evaluating Commonsense in Pre-Trained Language Models

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6523 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9733-9740 ◽

Cited By ~ 1

Author(s):

Xuhui Zhou ◽

Yue Zhang ◽

Leyang Cui ◽

Dandan Huang

Keyword(s):

Reading Comprehension ◽

Question Answering ◽

Deep Level ◽

Language Models ◽

Future Research ◽

Correct Prediction ◽

Test Cases ◽

Word Sense ◽

Training Set ◽

Text Data

Contextualized representations trained over large raw text data have given remarkable improvements for NLP tasks including question answering and reading comprehension. Previous work has shown that syntactic, semantic and word sense knowledge is contained in such representations, which explains why they benefit such tasks. However, relatively little work has investigated the commonsense knowledge contained in contextualized representations, which is crucial for human question answering and reading comprehension. We study the commonsense ability of GPT, BERT, XLNet, and RoBERTa by testing them on seven challenging benchmarks, finding that language modeling and its variants are effective objectives for promoting models' commonsense ability, while bi-directional context and a larger training set are bonuses. We additionally find that current models do poorly on tasks that require more inference steps. Finally, we test the robustness of models by making dual test cases, which are correlated so that the correct prediction of one sample should lead to the correct prediction of the other. Interestingly, the models show confusion on these test cases, which suggests that they learn commonsense at the surface rather than the deep level. We publicly release a test set, named CATs, for future research.

Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Wireless Communications and Mobile Computing ◽

10.1155/2021/5375334 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Changchang Zeng ◽

Shaobo Li

Keyword(s):

Reading Comprehension ◽

Language Processing ◽

Question Answering ◽

Multiple Choice ◽

Length Distribution ◽

Research Field ◽

Evaluation Framework ◽

Language Models ◽

Training Objective ◽

Machine Reading

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. It has wide application potential in fields such as question-answering robots and human-computer interaction in mobile virtual reality systems. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. With the development of training objectives, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. In different MLMs, the length of the masked tokens is different. Similarly, in different machine reading comprehension tasks, the length of the answer is also different, and the answer is often a word, phrase, or sentence. Thus, in MRC tasks with different answer lengths, whether the masking length of MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us on how to pretrain the MLM with a relatively suitable mask length distribution for MRC tasks. In this paper, we try to uncover how much of MLM’s success in machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length in the MRC dataset. To address this issue, (1) we propose four MRC tasks with different answer length distributions, namely, the short span extraction task, long span extraction task, short multiple-choice cloze task, and long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis is true. On four different machine reading comprehension datasets, the model whose masking length distribution is correlated with the answer length surpasses the model without such correlation.

Discriminative method for recurrent neural network language models

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2015.7179000 ◽

2015 ◽

Cited By ~ 4

Author(s):

Yuuki Tachioka ◽

Shinji Watanabe

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Language Models ◽

Discriminative Method ◽

Network Language
