Narrative Question Answering with Cutting-Edge Open-Domain QA Techniques: A Comprehensive Study

Relevance-guided Supervision for OpenQA with ColBERT

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00405 ◽

2021 ◽

Vol 9 ◽

pp. 929-944

Author(s):

Omar Khattab ◽

Christopher Potts ◽

Matei Zaharia

Keyword(s):

Question Answering ◽

State Of The Art ◽

Training Data ◽

Coarse Grained ◽

Retrieval Model ◽

Open Domain ◽

Weak Supervision ◽

Fine Grained ◽

Vector Representations ◽

Large Corpus

Abstract Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.

Download Full-text

MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00433 ◽

2021 ◽

Vol 9 ◽

pp. 1389-1406

Author(s):

Shayne Longpre ◽

Yi Lu ◽

Joachim Daiber

Keyword(s):

Question Answering ◽

State Of The Art ◽

Linguistically Diverse ◽

Data Representation ◽

Independent Data ◽

Open Domain ◽

Low Resource ◽

Art Methods ◽

Questions And Answers ◽

Cross Lingual

Abstract Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open- domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on heavily curated, language- independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering. We benchmark a variety of state- of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.1

Download Full-text

A Yes/No Answer Generator Based on Sentiment-Word Scores in Biomedical Question Answering

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch005 ◽

2020 ◽

pp. 103-116

Author(s):

Mourad Sarrouti ◽

Said Ouatik El Alaoui

Keyword(s):

Question Answering ◽

State Of The Art ◽

Biomedical Domain ◽

Open Domain ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Current State ◽

Sentiment Score ◽

Speech Tagging ◽

Sentiment Word

Background and Objective: Yes/no question answering (QA) in open-domain is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in the biomedical domain. Yes/no QA aims at answering yes/no questions, which are seeking for a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use the Stanford CoreNLP for tokenization and part-of-speech tagging all relevant passages to a given yes/no question. We then assign a sentiment score based on SentiWordNet to each word of the passages. Finally, the decision on either the answers “yes” or “no” is based on the obtained sentiment-passages score: “yes” for a positive final sentiment-passages score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective as compared with the current state-of-the-art method, and significantly outperforms it by an average of 15.68% in terms of accuracy.

Download Full-text

A Yes/No Answer Generator Based on Sentiment-Word Scores in Biomedical Question Answering

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/ijhisi.2017070104 ◽

2017 ◽

Vol 12 (3) ◽

pp. 62-74 ◽

Cited By ~ 5

Author(s):

Mourad Sarrouti ◽

Said Ouatik El Alaoui

Keyword(s):

Question Answering ◽

State Of The Art ◽

Biomedical Domain ◽

Open Domain ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Current State ◽

Sentiment Score ◽

Speech Tagging ◽

Sentiment Word

Background and Objective: Yes/no question answering (QA) in open-domain is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in the biomedical domain. Yes/no QA aims at answering yes/no questions, which are seeking for a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use the Stanford CoreNLP for tokenization and part-of-speech tagging all relevant passages to a given yes/no question. We then assign a sentiment score based on SentiWordNet to each word of the passages. Finally, the decision on either the answers “yes” or “no” is based on the obtained sentiment-passages score: “yes” for a positive final sentiment-passages score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective as compared with the current state-of-the-art method, and significantly outperforms it by an average of 15.68% in terms of accuracy.

Download Full-text

JUST.ASK — A MULTI-PRONGED APPROACH TO QUESTION ANSWERING

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213012500364 ◽

2013 ◽

Vol 22 (01) ◽

pp. 1250036 ◽

Cited By ~ 5

Author(s):

ANA CRISTINA MENDES ◽

LUÍSA COHEUR ◽

JOÃO SILVA ◽

HUGO RODRIGUES

Keyword(s):

Scientific Community ◽

Question Answering ◽

State Of The Art ◽

Information Source ◽

Test Collection ◽

Open Domain ◽

Research Areas ◽

Detailed Evaluation ◽

Flexible Architecture ◽

Recent Developments

In the last decades, several research areas experienced key improvements due to the appearance of numerous tools made available to the scientific community. For instance, Moses plays an important role in recent developments in machine translation and Lucene is, with no doubt, a widespread tool in information retrieval. The existence of these systems allows an easy development of baselines and, therefore, researchers can focus on improving preliminary results, instead of spending time in developing software from scratch. In addition, the existence of appropriate test collections leads to a straightforward comparison of systems and of their specific components. In this paper we describe Just.Ask, a multi-pronged approach to open-domain question answering. Just.Ask combines rule- with machine learning-based components and implements several state-of-the-art strategies in question answering. Also, it has a flexible architecture that allows for further extensions. Moreover, in this paper we report a detailed evaluation of each one of Just.Ask components. The evaluation is split into two parts: in the first one, we use a manually built test collection — the GoldWebQA — that intends to evaluate Just.Ask performance when the information source in use is the Web, without having to deal with its constant changes; in the second one, we use a set of questions gathered from the TREC evaluation forum, having a closed text collection, locally indexed and stored, as information source. Therefore, this paper contributes with a benchmark for research on question answering, since both Just.Ask and the GoldWebQA corpus are freely available for the scientific community.

Download Full-text

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00415 ◽

2021 ◽

Vol 9 ◽

pp. 1098-1115

Author(s):

Patrick Lewis ◽

Yuxiang Wu ◽

Linqing Liu ◽

Pasquale Minervini ◽

Heinrich Küttler ◽

...

Keyword(s):

Question Answering ◽

State Of The Art ◽

High Accuracy ◽

Test Time ◽

Open Domain ◽

Combined System ◽

Text Corpora ◽

Degree Of Control ◽

Conventional Models ◽

High Degree

Abstract Open-domain Question Answering models that directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared with conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models fall short of the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) while retaining high accuracy. Lastly, we demonstrate RePAQ’s strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to “back-off” to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.

Download Full-text

Coarse-To-Careful: Seeking Semantic-Related Knowledge for Open-Domain Commonsense Question Answering

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413878 ◽

2021 ◽

Author(s):

Luxi Xing ◽

Yue Hu ◽

Jing Yu ◽

Yuqiang Xie ◽

Wei Peng

Keyword(s):

Question Answering ◽

Open Domain

Download Full-text

Question-aware memory network for multi-hop question answering in human–robot interaction

Complex & Intelligent Systems ◽

10.1007/s40747-021-00448-0 ◽

2021 ◽

Author(s):

Xinmeng Li ◽

Mamoun Alazab ◽

Qian Li ◽

Keping Yu ◽

Quanjun Yin

Keyword(s):

Question Answering ◽

State Of The Art ◽

Human Robot Interaction ◽

Knowledge Graph ◽

Robot Interaction ◽

Natural Language Question ◽

Memory Network ◽

The Given ◽

Fine Tune ◽

Language Question

AbstractKnowledge graph question answering is an important technology in intelligent human–robot interaction, which aims at automatically giving answer to human natural language question with the given knowledge graph. For the multi-relation question with higher variety and complexity, the tokens of the question have different priority for the triples selection in the reasoning steps. Most existing models take the question as a whole and ignore the priority information in it. To solve this problem, we propose question-aware memory network for multi-hop question answering, named QA2MN, to update the attention on question timely in the reasoning process. In addition, we incorporate graph context information into knowledge graph embedding model to increase the ability to represent entities and relations. We use it to initialize the QA2MN model and fine-tune it in the training process. We evaluate QA2MN on PathQuestion and WorldCup2014, two representative datasets for complex multi-hop question answering. The result demonstrates that QA2MN achieves state-of-the-art Hits@1 accuracy on the two datasets, which validates the effectiveness of our model.

Download Full-text

An Open Domain Question Answering System Based on Improved System Similarity Model

2006 International Conference on Machine Learning and Cybernetics ◽

10.1109/icmlc.2006.259170 ◽

2006 ◽

Cited By ~ 4

Author(s):

Yu-ming Zhao ◽

Zhi-ming Xu ◽

Yi Guan ◽

Xiao-long Wang

Keyword(s):

Question Answering ◽

Open Domain ◽

Question Answering System ◽

Similarity Model

Download Full-text

State of the Art in Information Extraction and Quantitative Analysis for Multimodality Biomolecular Imaging

Proceedings of the IEEE ◽

10.1109/jproc.2007.913556 ◽

2008 ◽

Vol 96 (3) ◽

pp. 512-531 ◽

Cited By ~ 7

Author(s):

W.M. Ahmed ◽

S.J. Leavesley ◽

B. Rajwa ◽

M.N. Ayyaz ◽

A. Ghafoor ◽

...

Keyword(s):

Quantitative Analysis ◽

Information Extraction ◽

State Of The Art ◽

Biomolecular Imaging

Download Full-text