Cross-lingual Adaptation Using Universal Dependencies

Author(s):  
Nasrin Taghizadeh ◽  
Heshaam Faili

We describe a cross-lingual adaptation method based on syntactic parse trees obtained from Universal Dependencies (UD), which are consistent across languages, to develop classifiers in low-resource languages. The idea of UD parsing is to capture similarities as well as idiosyncrasies among typologically different languages. In this article, we show that models trained using UD parse trees for complex NLP tasks can characterize very different languages. We study paraphrase identification and relation extraction as case studies. Based on UD parse trees, we develop several models using tree kernels and show that these models, trained on English data, can correctly classify data from other languages, e.g., French, Farsi, and Arabic. The proposed approach opens up avenues for exploiting UD parsing in similar cross-lingual tasks, which is especially useful for languages with no available labeled data.
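The abstract rests on the observation that UD part-of-speech and dependency labels form a shared feature space across languages. A toy sketch of that idea (not the authors' actual tree-kernel implementation, which operates on full subtree structures) reduces each parse to a multiset of (head UPOS, deprel, dependent UPOS) triples and counts matches:

```python
# Illustrative simplification of a kernel over UD dependency trees: because UD
# labels are language-independent, an English and a French parse live in the
# same feature space. This is a sketch, not the paper's tree-kernel method.
from collections import Counter

def ud_triples(edges):
    """edges: list of (head_upos, deprel, dep_upos) tuples from a UD parse."""
    return Counter(edges)

def triple_kernel(tree_a, tree_b):
    """Count the (head, deprel, dependent) triples shared by two parses."""
    ca, cb = ud_triples(tree_a), ud_triples(tree_b)
    return sum(min(ca[t], cb[t]) for t in ca)

# English "She reads books" vs. French "Elle lit des livres" (hand-built parses)
en = [("VERB", "nsubj", "PRON"), ("VERB", "obj", "NOUN")]
fr = [("VERB", "nsubj", "PRON"), ("VERB", "obj", "NOUN"), ("NOUN", "det", "DET")]
print(triple_kernel(en, fr))  # 2 shared triples
```

A kernel of this shape can be plugged into an SVM trained on English parses and applied unchanged to parses of another language, which is the transfer setting the article studies.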

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Chuanming Yu ◽  
Haodong Xue ◽  
Manyi Wang ◽  
Lu An

Purpose
Owing to the uneven distribution of annotated corpora among different languages, it is necessary to bridge the gap between low-resource and high-resource languages. From the perspective of entity relation extraction, this paper aims to extend the knowledge acquisition task from a single-language context to a cross-lingual context, and to improve relation extraction performance for low-resource languages.

Design/methodology/approach
This paper proposes a cross-lingual adversarial relation extraction (CLARE) framework, which decomposes cross-lingual relation extraction into parallel corpus acquisition and adversarial adaptation relation extraction. Based on the proposed framework, this paper conducts extensive experiments on two tasks, i.e. English-to-Chinese and English-to-Arabic cross-lingual entity relation extraction.

Findings
The Macro-F1 values of the optimal models in the two tasks are 0.8801 and 0.7899, respectively, indicating that the proposed CLARE framework can significantly improve the effectiveness of entity relation extraction in low-resource languages. The experimental results suggest that the framework can effectively transfer the corpus as well as the annotated tags from English to Chinese and Arabic. This study reveals that the proposed approach is less labour-intensive and more effective in cross-lingual entity relation extraction than the manual method, and that it generalizes well across languages.

Originality/value
The research results are of great significance for improving the performance of cross-lingual knowledge acquisition. Cross-lingual transfer may greatly reduce the time and cost of manually constructing multi-lingual corpora. It sheds light on knowledge acquisition and organization from unstructured text in the era of big data.
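A common core of adversarial adaptation, as used in frameworks of this kind, is a gradient reversal layer: the forward pass is the identity, while the backward pass flips and scales the gradient, pushing the feature extractor toward language-invariant representations that a language discriminator cannot separate. The sketch below illustrates that general mechanism, not the exact CLARE implementation:

```python
# Hedged sketch of a gradient reversal layer (GRL), the standard building block
# of adversarial domain/language adaptation. Forward: identity. Backward: the
# incoming gradient is negated and scaled by a trade-off weight lambda.
import numpy as np

class GradientReversal:
    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight, often scheduled during training

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        # The discriminator's gradient is reversed before reaching the
        # feature extractor, so the extractor learns to confuse it.
        return -self.lam * grad_output

grl = GradientReversal(lam=0.5)
x = np.array([1.0, -2.0])
print(grl.forward(x))                      # unchanged features
print(grl.backward(np.array([0.4, 0.6])))  # [-0.2 -0.3]
```

In a full system, the relation classifier is trained on labeled English examples while the discriminator (behind the GRL) is trained to distinguish English from Chinese or Arabic inputs; at the optimum the shared encoder yields features useful for relation extraction in both languages.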


2021 ◽  
Author(s):  
Mengzhou Xia ◽  
Guoqing Zheng ◽  
Subhabrata Mukherjee ◽  
Milad Shokouhi ◽  
Graham Neubig ◽  
...  

2021 ◽  
Vol 1 (2) ◽  
Author(s):  
Garrett Nicolai ◽  
Edith Coates ◽  
Ming Zhang ◽  
Miika Silfverberg

We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains obtained through cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.
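Many-to-many multilingual translation of the kind described above is typically realized by training a single model on all language pairs, marking each source sentence with its target language. The tag format and function name below are illustrative, not taken from the paper:

```python
# Minimal sketch of the standard many-to-many NMT data preparation: prepend a
# target-language token to each source sentence so one shared model can
# translate between all pairs. Tag format "<2xxx>" is an assumption here.
def tag_for_target(src_sentence, tgt_lang):
    """Prefix the source with a target-language tag (ISO 639-3 code)."""
    return f"<2{tgt_lang}> {src_sentence}"

# Hypothetical training pairs into two Indigenous target languages
pairs = [("In the beginning", "crk"),   # Plains Cree, illustrative
         ("In the beginning", "iku")]   # Inuktitut, illustrative
for sent, lang in pairs:
    print(tag_for_target(sent, lang))
# <2crk> In the beginning
# <2iku> In the beginning
```

Training on pooled tagged data lets high-resource pairs share parameters with extremely low-resource ones, which is the mechanism behind the cross-lingual BLEU gains reported.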

