Integrate Candidate Answer Extraction with Re-Ranking for Chinese Machine Reading Comprehension

Junjie Zeng; Xiaoya Sun; Qi Zhang; Xinmeng Li

doi:10.3390/e23030322

Integrate Candidate Answer Extraction with Re-Ranking for Chinese Machine Reading Comprehension

Entropy ◽

10.3390/e23030322 ◽

2021 ◽

Vol 23 (3) ◽

pp. 322

Author(s):

Junjie Zeng ◽

Xiaoya Sun ◽

Qi Zhang ◽

Xinmeng Li

Keyword(s):

Reading Comprehension ◽

Language Processing ◽

State Of The Art ◽

Language Model ◽

Pipeline System ◽

Challenging Problem ◽

Reading Methods ◽

Answer Extraction ◽

Word Level ◽

Machine Reading

Machine Reading Comprehension (MRC) research concerns how to endow machines with the ability to understand given passages and answer questions, which is a challenging problem in the field of natural language processing. To solve the Chinese MRC task efficiently, this paper proposes an Improved Extraction-based Reading Comprehension method with Answer Re-ranking (IERC-AR), consisting of a candidate answer extraction module and a re-ranking module. The candidate answer extraction module uses an improved pre-training language model, RoBERTa-WWM, to generate precise word representations, which can solve the problem of polysemy and is good for capturing Chinese word-level features. The re-ranking module re-evaluates candidate answers based on a self-attention mechanism, which can improve the accuracy of predicting answers. Traditional machine-reading methods generally integrate different modules into a pipeline system, which leads to re-encoding problems and inconsistent data distribution between the training and testing phases; therefore, this paper proposes an end-to-end model architecture for IERC-AR to reasonably integrate the candidate answer extraction and re-ranking modules. The experimental results on the Les MMRC dataset show that IERC-AR outperforms state-of-the-art MRC approaches.

Download Full-text

Read + Verify: Machine Reading Comprehension with Unanswerable Questions

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016529 ◽

2019 ◽

Vol 33 ◽

pp. 6529-6537 ◽

Cited By ~ 8

Author(s):

Minghao Hu ◽

Furu Wei ◽

Yuxing Peng ◽

Zhen Huang ◽

Nan Yang ◽

...

Keyword(s):

Reading Comprehension ◽

State Of The Art ◽

Test Set ◽

Answer Extraction ◽

Machine Reading

Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred. In addition to extract answers, previous works usually predict an additional “no-answer” probability to detect unanswerable cases. However, they fail to validate the answerability of the question by verifying the legitimacy of the predicted answer. To address this problem, we propose a novel read-then-verify system, which not only utilizes a neural reader to extract candidate answers and produce no-answer probabilities, but also leverages an answer verifier to decide whether the predicted answer is entailed by the input snippets. Moreover, we introduce two auxiliary losses to help the reader better handle answer extraction as well as no-answer detection, and investigate three different architectures for the answer verifier. Our experiments on the SQuAD 2.0 dataset show that our system obtains a score of 74.2 F1 on test set, achieving state-of-the-art results at the time of submission (Aug. 28th, 2018).

Download Full-text

Distill BERT to Traditional Models in Chinese Machine Reading Comprehension (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7223 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13901-13902

Author(s):

Xingkai Ren ◽

Ronghua Shi ◽

Fangfang Li

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Language Model ◽

Representation Learning ◽

Original Model ◽

Language Models ◽

Machine Reading

Recently, unsupervised representation learning has been extremely successful in the field of natural language processing. More and more pre-trained language models are proposed and achieved the most advanced results especially in machine reading comprehension. However, these proposed pre-trained language models are huge with hundreds of millions of parameters that have to be trained. It is quite time consuming to use them in actual industry. Thus we propose a method that employ a distillation traditional reading comprehension model to simplify the pre-trained language model so that the distillation model has faster reasoning speed and higher inference accuracy in the field of machine reading comprehension. We evaluate our proposed method on the Chinese machine reading comprehension dataset CMRC2018 and greatly improve the accuracy of the original model. To the best of our knowledge, we are the first to propose a method that employ the distillation pre-trained language model in Chinese machine reading comprehension.

Download Full-text

Keyword extraction method for machine reading comprehension based on natural language processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/1955/1/012072 ◽

2021 ◽

Vol 1955 (1) ◽

pp. 012072

Author(s):

Ruiheng Li ◽

Xuan Zhang ◽

Chengdong Li ◽

Zhongju Zheng ◽

Zihang Zhou ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Extraction Method ◽

Keyword Extraction ◽

Machine Reading

Download Full-text

Analyzing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets

Wireless Communications and Mobile Computing ◽

10.1155/2021/5375334 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Changchang Zeng ◽

Shaobo Li

Keyword(s):

Reading Comprehension ◽

Language Processing ◽

Question Answering ◽

Multiple Choice ◽

Length Distribution ◽

Research Field ◽

Evaluation Framework ◽

Language Models ◽

Training Objective ◽

Machine Reading

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task. It has a wide application potential in the fields of question answering robots, human-computer interactions in mobile virtual reality systems, etc. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. With the development of training objectives, many variants of MLM have been proposed, such as whole word masking, entity masking, phrase masking, and span masking. In different MLMs, the length of the masked tokens is different. Similarly, in different machine reading comprehension tasks, the length of the answer is also different, and the answer is often a word, phrase, or sentence. Thus, in MRC tasks with different answer lengths, whether the length of MLM is related to performance is a question worth studying. If this hypothesis is true, it can guide us on how to pretrain the MLM with a relatively suitable mask length distribution for MRC tasks. In this paper, we try to uncover how much of MLM’s success in the machine reading comprehension tasks comes from the correlation between masking length distribution and answer length in the MRC dataset. In order to address this issue, herein, (1) we propose four MRC tasks with different answer length distributions, namely, the short span extraction task, long span extraction task, short multiple-choice cloze task, and long multiple-choice cloze task; (2) four Chinese MRC datasets are created for these tasks; (3) we also have pretrained four masked language models according to the answer length distributions of these datasets; and (4) ablation experiments are conducted on the datasets to verify our hypothesis. The experimental results demonstrate that our hypothesis is true. On four different machine reading comprehension datasets, the performance of the model with correlation length distribution surpasses the model without correlation.

Download Full-text

SG-Net: Syntax-Guided Machine Reading Comprehension

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6511 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9636-9643

Author(s):

Zhuosheng Zhang ◽

Yuwei Wu ◽

Junru Zhou ◽

Sufeng Duan ◽

Hai Zhao ◽

...

Keyword(s):

Reading Comprehension ◽

Performance Improvement ◽

Language Model ◽

Attention Mechanism ◽

Linguistic Knowledge ◽

Attention Network ◽

Syntactic Constraints ◽

Syntactic Dependency ◽

Machine Reading

For machine reading comprehension, the capacity of effectively modeling the linguistic knowledge from the detail-riddled and lengthy passages and getting ride of the noises is essential to improve its performance. Traditional attentive models attend to all words without explicit constraint, which results in inaccurate concentration on some dispensable words. In this work, we propose using syntax to guide the text modeling by incorporating explicit syntactic constraints into attention mechanism for better linguistically motivated word representations. In detail, for self-attention network (SAN) sponsored Transformer-based encoder, we introduce syntactic dependency of interest (SDOI) design into the SAN to form an SDOI-SAN with syntax-guided self-attention. Syntax-guided network (SG-Net) is then composed of this extra SDOI-SAN and the SAN from the original Transformer encoder through a dual contextual architecture for better linguistics inspired representation. To verify its effectiveness, the proposed SG-Net is applied to typical pre-trained language model BERT which is right based on a Transformer encoder. Extensive experiments on popular benchmarks including SQuAD 2.0 and RACE show that the proposed SG-Net design helps achieve substantial performance improvement over strong baselines.

Download Full-text

Convolutional Spatial Attention Model for Reading Comprehension with Multiple-Choice Questions

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016276 ◽

2019 ◽

Vol 33 ◽

pp. 6276-6283 ◽

Cited By ~ 6

Author(s):

Zhipeng Chen ◽

Yiming Cui ◽

Wentao Ma ◽

Shijin Wang ◽

Guoping Hu

Keyword(s):

Reading Comprehension ◽

Mutual Information ◽

Spatial Attention ◽

State Of The Art ◽

Multiple Choice ◽

Multiple Choice Questions ◽

Attention Model ◽

Novel Approach ◽

Proposed Model ◽

Machine Reading

Machine Reading Comprehension (MRC) with multiplechoice questions requires the machine to read given passage and select the correct answer among several candidates. In this paper, we propose a novel approach called Convolutional Spatial Attention (CSA) model which can better handle the MRC with multiple-choice questions. The proposed model could fully extract the mutual information among the passage, question, and the candidates, to form the enriched representations. Furthermore, to merge various attention results, we propose to use convolutional operation to dynamically summarize the attention values within the different size of regions. Experimental results show that the proposed model could give substantial improvements over various state-of- the-art systems on both RACE and SemEval-2018 Task11 datasets.

Download Full-text

Improving the state-of-the-art in Thai semantic similarity using distributional semantics and ontological information

PLoS ONE ◽

10.1371/journal.pone.0246751 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246751

Author(s):

Ponrudee Netisopakul ◽

Gerhard Wohlgenannt ◽

Aleksei Pulich ◽

Zar Zar Hlaing

Keyword(s):

Semantic Similarity ◽

Language Processing ◽

English Language ◽

State Of The Art ◽

Word Sense Disambiguation ◽

Similarity Score ◽

The State ◽

Word Sense ◽

Word Level ◽

High Fraction

Research into semantic similarity has a long history in lexical semantics, and it has applications in many natural language processing (NLP) tasks like word sense disambiguation or machine translation. The task of calculating semantic similarity is usually presented in the form of datasets which contain word pairs and a human-assigned similarity score. Algorithms are then evaluated by their ability to approximate the gold standard similarity scores. Many such datasets, with different characteristics, have been created for English language. Recently, four of those were transformed to Thai language versions, namely WordSim-353, SimLex-999, SemEval-2017-500, and R&G-65. Given those four datasets, in this work we aim to improve the previous baseline evaluations for Thai semantic similarity and solve challenges of unsegmented Asian languages (particularly the high fraction of out-of-vocabulary (OOV) dataset terms). To this end we apply and integrate different strategies to compute similarity, including traditional word-level embeddings, subword-unit embeddings, and ontological or hybrid sources like WordNet and ConceptNet. With our best model, which combines self-trained fastText subword embeddings with ConceptNet Numberbatch, we managed to raise the state-of-the-art, measured with the harmonic mean of Pearson on Spearman ρ, by a large margin from 0.356 to 0.688 for TH-WordSim-353, from 0.286 to 0.769 for TH-SemEval-500, from 0.397 to 0.717 for TH-SimLex-999, and from 0.505 to 0.901 for TWS-65.

Download Full-text

Co-Attention Hierarchical Network: Generating Coherent Long Distractors for Reading Comprehension

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6522 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9725-9732

Author(s):

Xiaorui Zhou ◽

Senlin Luo ◽

Yunfang Wu

Keyword(s):

Reading Comprehension ◽

Semantic Similarity ◽

State Of The Art ◽

Deep Understanding ◽

Hierarchical Architecture ◽

Hierarchical Network ◽

Word Level ◽

Human Evaluation ◽

Sentence Level ◽

The Relationship

In reading comprehension, generating sentence-level distractors is a significant task, which requires a deep understanding of the article and question. The traditional entity-centered methods can only generate word-level or phrase-level distractors. Although recently proposed neural-based methods like sequence-to-sequence (Seq2Seq) model show great potential in generating creative text, the previous neural methods for distractor generation ignore two important aspects. First, they didn't model the interactions between the article and question, making the generated distractors tend to be too general or not relevant to question context. Second, they didn't emphasize the relationship between the distractor and article, making the generated distractors not semantically relevant to the article and thus fail to form a set of meaningful options. To solve the first problem, we propose a co-attention enhanced hierarchical architecture to better capture the interactions between the article and question, thus guide the decoder to generate more coherent distractors. To alleviate the second problem, we add an additional semantic similarity loss to push the generated distractors more relevant to the article. Experimental results show that our model outperforms several strong baselines on automatic metrics, achieving state-of-the-art performance. Further human evaluation indicates that our generated distractors are more coherent and more educative compared with those distractors generated by baselines.

Download Full-text

Semantics-Aware BERT for Language Understanding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6510 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9628-9635

Author(s):

Zhuosheng Zhang ◽

Yuwei Wu ◽

Hai Zhao ◽

Zuchao Li ◽

Shuailiang Zhang ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language ◽

Language Model ◽

Fine Tuning ◽

Semantic Role Labeling ◽

Language Understanding ◽

Context Sensitive ◽

Language Representation ◽

Model Training ◽

Machine Reading

The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of success especially in various machine reading comprehension and natural language inference tasks. However, the existing language representation models including ELMo, GPT and BERT only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor in a light fine-tuning way without substantial task-specific modifications. Compared with BERT, semantics-aware BERT is as simple in concept but more powerful. It obtains new state-of-the-art or substantially improves results on ten reading comprehension and language inference tasks.

Download Full-text

A text-based multi-span network for reading comprehension

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200581 ◽

2021 ◽

pp. 1-13

Author(s):

Deguang Chen ◽

Ziping Ma ◽

Lin Wei ◽

Yanbin Zhu ◽

Jinlin Ma ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Market Value ◽

The State ◽

Experimental Results ◽

Training Time ◽

The Public

Text-based reading comprehension models have great research significance and market value and are one of the main directions of natural language processing. Reading comprehension models of single-span answers have recently attracted more attention and achieved significant results. In contrast, multi-span answer models for reading comprehension have been less investigated and their performances need improvement. To address this issue, in this paper, we propose a text-based multi-span network for reading comprehension, ALBERT_SBoundary, and build a multi-span answer corpus, MultiSpan_NMU. We also conduct extensive experiments on the public multi-span corpus, MultiSpan_DROP, and our multi-span answer corpus, MultiSpan_NMU, and compare the proposed method with the state-of-the-art. The experimental results show that our proposed method achieves F1 scores of 84.10 and 92.88 on MultiSpan_DROP and MultiSpan_NMU datasets, respectively, while it also has fewer parameters and a shorter training time.

Download Full-text