scholarly journals MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering

2021 ◽  
Vol 9 ◽  
pp. 1389-1406
Author(s):  
Shayne Longpre ◽  
Yi Lu ◽  
Joachim Daiber

Abstract Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open- domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on heavily curated, language- independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering. We benchmark a variety of state- of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.1

2021 ◽  
Vol 9 ◽  
pp. 929-944
Author(s):  
Omar Khattab ◽  
Christopher Potts ◽  
Matei Zaharia

Abstract Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.


2019 ◽  
Author(s):  
Jiahua Liu ◽  
Yankai Lin ◽  
Zhiyuan Liu ◽  
Maosong Sun

2020 ◽  
Vol 34 (05) ◽  
pp. 9065-9072
Author(s):  
Luu Anh Tuan ◽  
Darsh Shah ◽  
Regina Barzilay

Automatic question generation can benefit many applications ranging from dialogue systems to reading comprehension. While questions are often asked with respect to long documents, there are many challenges with modeling such long documents. Many existing techniques generate questions by effectively looking at one sentence at a time, leading to questions that are easy and not reflective of the human process of question generation. Our goal is to incorporate interactions across multiple sentences to generate realistic questions for long documents. In order to link a broad document context to the target answer, we represent the relevant context via a multi-stage attention mechanism, which forms the foundation of a sequence to sequence model. We outperform state-of-the-art methods on question generation on three question-answering datasets - SQuAD, MS MARCO and NewsQA. 1


Author(s):  
Xilun Chen ◽  
Yu Sun ◽  
Ben Athiwaratkun ◽  
Claire Cardie ◽  
Kilian Weinberger

In recent years great success has been achieved in sentiment classification for English, thanks in part to the availability of copious annotated resources. Unfortunately, most languages do not enjoy such an abundance of labeled data. To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN 1 ) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exist. ADAN has two discriminative branches: a sentiment classifier and an adversarial language discriminator. Both branches take input from a shared feature extractor to learn hidden representations that are simultaneously indicative for the classification task and invariant across languages. Experiments on Chinese and Arabic sentiment classification demonstrate that ADAN significantly outperforms state-of-the-art systems.


2020 ◽  
Vol 8 ◽  
pp. 109-124
Author(s):  
Shuyan Zhou ◽  
Shruti Rijhwani ◽  
John Wieting ◽  
Jaime Carbonell ◽  
Graham Neubig

Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages, but these do not extend well to low-resource languages with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the low-resource languages by utilizing resources in closely related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple, but effective: We experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared with state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL. 1


2020 ◽  
Vol 34 (07) ◽  
pp. 11021-11028 ◽  
Author(s):  
Deng Huang ◽  
Peihao Chen ◽  
Runhao Zeng ◽  
Qing Du ◽  
Mingkui Tan ◽  
...  

We addressed the challenging task of video question answering, which requires machines to answer questions about videos in a natural language form. Previous state-of-the-art methods attempt to apply spatio-temporal attention mechanism on video frame features without explicitly modeling the location and relations among object interaction occurred in videos. However, the relations between object interaction and their location information are very critical for both action recognition and question reasoning. In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction. Here, each node is associated with an object represented by its appearance and location features. Based on the constructed graph, we propose to use graph convolution to infer both the category and temporal locations of an action. As the graph is built on objects, our method is able to focus on the foreground action contents for better video question answering. Lastly, we leverage an attention mechanism to combine the output of graph convolution and encoded question features for final answer reasoning. Extensive experiments demonstrate the effectiveness of the proposed methods. Specifically, our method significantly outperforms state-of-the-art methods on TGIF-QA, Youtube2Text-QA and MSVD-QA datasets.


2021 ◽  
Vol 9 ◽  
pp. 1032-1046
Author(s):  
Xiangyang Mou ◽  
Chenghao Yang ◽  
Mo Yu ◽  
Bingsheng Yao ◽  
Xiaoxiao Guo ◽  
...  

Abstract Recent advancements in open-domain question answering (ODQA), that is, finding answers from large open-domain corpus like Wikipedia, have led to human-level performance on many datasets. However, progress in QA over book stories (Book QA) lags despite its similar task formulation to ODQA. This work provides a comprehensive and quantitative analysis about the difficulty of Book QA: (1) We benchmark the research on the NarrativeQA dataset with extensive experiments with cutting-edge ODQA techniques. This quantifies the challenges Book QA poses, as well as advances the published state-of-the-art with a ∼7% absolute improvement on ROUGE-L. (2) We further analyze the detailed challenges in Book QA through human studies.1 Our findings indicate that the event-centric questions dominate this task, which exemplifies the inability of existing QA models to handle event-oriented scenarios.


Author(s):  
Toluwase Victor Asubiaro ◽  
Ebelechukwu Gloria Igwe

African languages, including those that are natives to Nigeria, are low-resource languages because they lack basic computing resources such as language-dependent hardware keyboard. Speakers of these low-resource languages are therefore unfairly deprived of information access on the internet. There is no information about the level of progress that has been made on the computation of Nigerian languages. Hence, this chapter presents a state-of-the-art review of Nigerian languages natural language processing. The review reveals that only four Nigerian languages; Hausa, Ibibio, Igbo, and Yoruba have been significantly studied in published NLP papers. Creating alternatives to hardware keyboard is one of the most popular research areas, and means such as automatic diacritics restoration, virtual keyboard, and optical character recognition have been explored. There was also an inclination towards speech and computational morphological analysis. Resource development and knowledge representation modeling of the languages using rapid resource development and cross-lingual methods are recommended.


2020 ◽  
pp. 103-116
Author(s):  
Mourad Sarrouti ◽  
Said Ouatik El Alaoui

Background and Objective: Yes/no question answering (QA) in open-domain is a longstanding challenge widely studied over the last decades. However, it still requires further efforts in the biomedical domain. Yes/no QA aims at answering yes/no questions, which are seeking for a clear “yes” or “no” answer. In this paper, we present a novel yes/no answer generator based on sentiment-word scores in biomedical QA. Methods: In the proposed method, we first use the Stanford CoreNLP for tokenization and part-of-speech tagging all relevant passages to a given yes/no question. We then assign a sentiment score based on SentiWordNet to each word of the passages. Finally, the decision on either the answers “yes” or “no” is based on the obtained sentiment-passages score: “yes” for a positive final sentiment-passages score and “no” for a negative one. Results: Experimental evaluations performed on BioASQ collections show that the proposed method is more effective as compared with the current state-of-the-art method, and significantly outperforms it by an average of 15.68% in terms of accuracy.


Sign in / Sign up

Export Citation Format

Share Document