TransBERT

Zhongyang Li; Xiao Ding; Ting Liu

doi:10.1145/3427669

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

2021 ◽

Vol 20 (1) ◽

pp. 1-20

Author(s):

Zhongyang Li ◽

Xiao Ding ◽

Ting Liu

Keyword(s):

Natural Language ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Fine Tuning ◽

Action Prediction ◽

Target Task ◽

Language Knowledge ◽

Previous State ◽

Transfer Tasks

Recent advances, such as GPT, BERT, and RoBERTa, have shown success in incorporating a pre-trained transformer language model and fine-tuning operations to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story-ending prediction as the target task to conduct experiments. The final results of 96.0% and 95.0% accuracy on two versions of Story Cloze Test datasets dramatically outperform previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT. Furthermore, experiments on six English and three Chinese datasets show that TransBERT generalizes well to other tasks, languages, and pre-trained models.
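The staged schedule described in this abstract (start from a pre-trained model, train further on a supervised transfer task, then fine-tune on the target task) can be illustrated with a deliberately tiny stand-in. The sketch below is not the authors' code and does not use BERT: it assumes a three-weight logistic regression in place of the model, an all-zero weight vector in place of the pre-trained checkpoint, and hypothetical toy datasets (transfer_data, target_data). It only shows how the weights learned on the transfer task become the initialization for the target task.

```python
import math

def predict(weights, features):
    """Sigmoid of the linear score; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def loss(weights, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for features, label in data:
        p = predict(weights, features)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)

def train(weights, data, epochs=200, lr=0.1):
    """Full-batch gradient descent that starts from the given weights."""
    weights = list(weights)  # copy so the caller's initialization is untouched
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in data:
            err = predict(weights, features) - label
            grad[0] += err
            for i, x in enumerate(features):
                grad[i + 1] += err * x
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: the transfer task is related to, but not the same as, the target task.
transfer_data = [([1.0, 0.9], 1), ([0.8, 1.0], 1), ([-1.0, -0.7], 0), ([-0.6, -1.0], 0)]
target_data = [([0.9, 0.2], 1), ([-0.8, -0.1], 0)]

pretrained = [0.0, 0.0, 0.0]                           # stage 1: stand-in for a pre-trained model
after_transfer = train(pretrained, transfer_data)      # stage 2: supervised transfer task
final = train(after_transfer, target_data, epochs=20)  # stage 3: fine-tune on the target task

print("target loss, pre-trained init:", loss(pretrained, target_data))
print("target loss, after transfer:  ", loss(after_transfer, target_data))
print("target loss, after fine-tune: ", loss(final, target_data))
```

In this toy setting the transfer stage already lowers the target loss before any target-task training, which is the sense in which a related supervised task can give a better starting point. Whether a given transfer task helps a real model is an empirical question, which the abstract says the comparative experiments address.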

Story Ending Prediction by Transferable BERT

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/249 ◽

2019 ◽

Cited By ~ 3

Author(s):

Zhongyang Li ◽

Xiao Ding ◽

Ting Liu

Keyword(s):

Natural Language ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Fine Tuning ◽

Action Prediction ◽

Target Task ◽

Language Knowledge ◽

Previous State ◽

Transfer Tasks

Recent advances, such as GPT and BERT, have shown success in incorporating a pre-trained transformer language model and fine-tuning operation to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story ending prediction as the target task to conduct experiments. The final result, an accuracy of 91.8%, dramatically outperforms previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT.

Extremely Low-Resource Text Simplification with Pre-trained Transformer Language Model

International Journal of Asian Language Processing ◽

10.1142/s2717554520500010 ◽

2020 ◽

Vol 30 (01) ◽

pp. 2050001

Author(s):

Takumi Maruyama ◽

Kazuhide Yamamoto

Keyword(s):

Machine Translation ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Fine Tuning ◽

Neural Machine Translation ◽

Low Resource ◽

Resource Setting ◽

Text Simplification ◽

Low Resource Setting

Inspired by the machine translation task, recent text simplification approaches treat simplification as monolingual text-to-text generation, and neural machine translation models have significantly improved performance on simplification tasks. Although such models require a large-scale parallel corpus, parallel corpora for text simplification are far fewer and smaller than those for machine translation. We therefore facilitate the training of simplification rewriting by pre-training on a large-scale monolingual corpus such as Wikipedia articles. In addition, we propose a translation language model that moves seamlessly from language model pre-training to fine-tuning for text simplification. The experimental results show that the translation language model substantially outperforms a state-of-the-art model in a low-resource setting. Moreover, a pre-trained translation language model with only 3,000 supervised examples achieves performance comparable to that of the state-of-the-art model trained on 30,000 supervised examples.

Large-scale Semantic Parsing without Question-Answer Pairs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00190 ◽

2014 ◽

Vol 2 ◽

pp. 377-392 ◽

Cited By ~ 40

Author(s):

Siva Reddy ◽

Mirella Lapata ◽

Mark Steedman

Keyword(s):

Natural Language ◽

Large Scale ◽

Graph Matching ◽

State Of The Art ◽

The State ◽

Semantic Parsing ◽

Matching Problem ◽

Weak Supervision ◽

Benchmark Datasets

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Detection of Manipulated Face Videos over Social Networks: A Large-Scale Study

Journal of Imaging ◽

10.3390/jimaging7100193 ◽

2021 ◽

Vol 7 (10) ◽

pp. 193

Author(s):

Federico Marcon ◽

Cecilia Pasquini ◽

Giulia Boato

Keyword(s):

Large Scale ◽

State Of The Art ◽

Forensic Analysis ◽

General Purpose ◽

Fine Tuning ◽

Specific Technique ◽

Multimedia Forensics ◽

Shared Data ◽

Social Media Platforms ◽

The Web

The detection of manipulated videos is a highly relevant problem in multimedia forensics and has been widely investigated in recent years. However, a common trait of published studies is that the forensic analysis is typically applied to data before their potential dissemination over the web. This work addresses the challenging scenario in which manipulated videos are first shared through social media platforms and only then subjected to forensic analysis. In this context, a large-scale performance evaluation has been carried out, involving general-purpose deep networks and state-of-the-art manipulated data and studying different effects. Results confirm that performance drops in every case when unseen shared data are tested by networks trained on non-shared data; however, fine-tuning operations can mitigate this problem. We also show that the output of differently trained networks can carry useful forensic information for identifying the specific technique used for visual manipulation, for both shared and non-shared data.

Semantics-Aware BERT for Language Understanding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6510 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9628-9635

Author(s):

Zhuosheng Zhang ◽

Yuwei Wu ◽

Hai Zhao ◽

Zuchao Li ◽

Shuailiang Zhang ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language ◽

Language Model ◽

Fine Tuning ◽

Semantic Role Labeling ◽

Language Understanding ◽

Context Sensitive ◽

Language Representation ◽

Model Training ◽

Machine Reading

The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially in machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT and BERT, only exploit plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor through light fine-tuning, without substantial task-specific modifications. Compared with BERT, SemBERT is equally simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves results on ten reading comprehension and language inference tasks.

Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports

Bioinformatics ◽

10.1093/bioinformatics/btaa668 ◽

2020 ◽

Author(s):

Keno K Bressem ◽

Lisa C Adams ◽

Robert A Gaudin ◽

Daniel Tröltzsch ◽

Bernd Hamm ◽

...

Keyword(s):

Natural Language ◽

Language Processing ◽

Language Model ◽

Fine Tuning ◽

Supplementary Information ◽

Free Text ◽

Clinical Workflow ◽

Text Data ◽

Unlabelled Data ◽

Medical Reports

Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has surpassed previous results on several Natural Language Processing (NLP) benchmarks. In radiology especially, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be of particular use for generating labels for machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology.

Supplementary information: Supplementary data are available at Bioinformatics online.
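For readers unfamiliar with the metric reported in this abstract, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name roc_auc and the four example scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Illustrative scores only: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(example_auc)
```

The quadratic loop is chosen for readability; for large evaluation sets a rank-based computation gives the same value more efficiently.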

Improving shift-reduce constituency parsing with large-scale unlabeled data

Natural Language Engineering ◽

10.1017/s1351324913000119 ◽

2013 ◽

Vol 21 (1) ◽

pp. 113-138 ◽

Cited By ~ 1

Author(s):

MUHUA ZHU ◽

JINGBO ZHU ◽

HUIZHEN WANG

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

State Of The Art ◽

Unlabeled Data ◽

Experimental Results ◽

Empirical Methods ◽

Part Of Speech

Shift-reduce parsing has been studied extensively for diverse grammars because of its simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers so that they provide better part-of-speech tags to subsequent shift-reduce parsers. Second, we enhance shift-reduce parsing models with novel features that are defined on lexical dependency information. Both stages depend on the use of large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data respectively. Moreover, the final parsing accuracies reach 90.9 percent and 82.2 percent respectively, which are comparable with the accuracy of state-of-the-art parsers.

Probing Natural Language Inference Models through Semantic Fragments

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6397 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8713-8721

Author(s):

Kyle Richardson ◽

Hai Hu ◽

Lawrence Moss ◽

Ashish Sabharwal

Keyword(s):

Natural Language ◽

State Of The Art ◽

Fine Tuning ◽

Language Understanding ◽

Inference Models ◽

Linguistic Understanding ◽

Benchmark Datasets ◽

Linguistic Behavior ◽

Linguistic Models

Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclear the extent to which they are captured in existing NLI benchmarks and effectively learned by models. To investigate this, we propose the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models. This approach to creating challenge datasets allows direct control over the semantic diversity and complexity of the targeted linguistic phenomena, and results in a more precise characterization of a model's linguistic behavior. Our experiments, using a library of 8 such semantic fragments, reveal two remarkable findings: (a) State-of-the-art models, including BERT, that are pre-trained on existing NLI benchmark datasets perform poorly on these new fragments, even though the phenomena probed here are central to the NLI task; (b) On the other hand, with only a few minutes of additional fine-tuning—with a carefully selected learning rate and a novel variation of “inoculation”—a BERT-based model can master all of these logic and monotonicity fragments while retaining its performance on established NLI benchmarks.

GlyFN: A Glyph-Aware Fusion Network for Distributed Chinese Event Detection

10.5121/csit.2021.110114 ◽

2021 ◽

Author(s):

Qi Zhai ◽

Zhigang Kan ◽

Linhui Feng ◽

Linbo Qiao ◽

Feng Liu

Keyword(s):

Event Detection ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Special Kind ◽

Detection Task ◽

Experimental Results ◽

Large Scale Data ◽

Unstructured Text ◽

Scale Data

Recently, Chinese event detection has attracted more and more attention. As a special kind of hieroglyphics, Chinese glyphs are semantically useful but still unexplored in this task. In this paper, we propose a novel Glyph-Aware Fusion Network, named GlyFN. It introduces glyph information into the pre-trained language model representation. To obtain a better representation, we design a Vector Linear Fusion mechanism to fuse the two. Specifically, it first uses max-pooling to capture salient information. Then, it uses linear operations on vectors to retain unique information. Moreover, for large-scale unstructured text, we distribute the data across different clusters in parallel. Finally, we conduct extensive experiments on ACE2005 and large-scale data. Experimental results show that GlyFN obtains increases of 7.48 (10.18%) and 6.17 (8.7%) in F1-score for trigger identification and classification, respectively, over the state-of-the-art methods. Furthermore, the event detection task for large-scale unstructured text can be accomplished efficiently through distribution.

Solving Analogies on Words based on Minimal Complexity Transformation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/256 ◽

2020 ◽

Author(s):

Pierre-Alexandre Murena ◽

Marie Al-Ghossein ◽

Jean-Louis Dessalles ◽

Antoine Cornuéjols

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Large Scale ◽

State Of The Art ◽

Benchmark Dataset ◽

Alternative Approach

Analogies are 4-ary relations of the form "A is to B as C is to D". When A, B and C are fixed, the problem of finding the correct D is called an analogical equation. A direct application domain is Natural Language Processing, where solving such equations has proved successful for word inflections such as conjugation or declension. While most approaches rely on the axioms of proportional analogy to solve these equations, these axioms are known to have limitations, in particular regarding the kinds of inflection they can handle. In this paper, we propose an alternative approach, based on the assumption that optimal word inflections are transformations of minimal complexity. We propose a rough estimation of complexity for word analogies and an algorithm to find the optimal transformations. We illustrate our method on a large-scale benchmark dataset and compare it with state-of-the-art approaches to demonstrate the value of using complexity to solve analogies on words.
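To make the notion of an analogical equation on words concrete, the sketch below solves "A is to B as C is to D" for D with a naive rule: strip the prefix that A and B share, treat what remains as a suffix rewrite, and apply that rewrite to C. This is a simple baseline chosen for illustration, not the minimal-complexity method the abstract proposes; the function name solve_analogy is hypothetical.

```python
def solve_analogy(a, b, c):
    """Return D with a : b :: c : D under a single suffix rewrite, or None if it does not apply."""
    # Find the longest common prefix of a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    old_suffix, new_suffix = a[i:], b[i:]
    # The rewrite only applies if c ends with the suffix that a loses.
    if not c.endswith(old_suffix):
        return None
    return c[: len(c) - len(old_suffix)] + new_suffix

print(solve_analogy("walk", "walked", "jump"))
print(solve_analogy("carry", "carried", "marry"))
print(solve_analogy("carry", "carried", "walk"))
```

The rule handles regular suffixation but fails on anything else: it returns None when C does not end in the suffix that A loses, and it overgeneralizes on irregular forms (for example, it would map "go" to "goed" given "walk" and "walked").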
