Joint Transition-Based Models for Morpho-Syntactic Parsing: Parsing Strategies for MRLs and a Case Study from Modern Hebrew

Author(s):  
Amir More ◽  
Amit Seker ◽  
Victoria Basmova ◽  
Reut Tsarfaty

In standard NLP pipelines, morphological analysis and disambiguation (MA&D) precedes syntactic and semantic downstream tasks. However, for languages with complex and ambiguous word-internal structure, known as morphologically rich languages (MRLs), it has been hypothesized that syntactic context may be crucial for accurate MA&D, and vice versa. In this work we empirically confirm this hypothesis for Modern Hebrew, an MRL with complex morphology and severe word-level ambiguity, in a novel transition-based framework. Specifically, we propose a joint morphosyntactic transition-based framework which formally unifies two distinct transition systems, morphological and syntactic, into a single transition-based system with joint training and joint inference. We empirically show that MA&D results obtained in the joint settings outperform MA&D results obtained by the respective standalone components, and that end-to-end parsing results obtained by our joint system present a new state of the art for Hebrew dependency parsing.
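To make the notion of a transition system concrete, here is a minimal sketch of the syntactic half only. It assumes an arc-standard action set (`SHIFT`, `LEFT-ARC`, `RIGHT-ARC`) and a hand-written action sequence. The abstract does not say which syntactic transition system is used, and the morphological transitions and the learned scoring model of the joint framework are not shown, so the function name and example below are illustrative assumptions rather than the authors' system.

```python
def arc_standard(words, actions):
    """Apply arc-standard actions and return (head, dependent) index pairs."""
    stack = []                          # partially processed word indices
    buffer = list(range(len(words)))    # word indices still to be read
    arcs = []
    for action in actions:
        if action == "SHIFT":
            # Move the next input word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Second-from-top becomes a dependent of the top item.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # Top item becomes a dependent of the item below it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Hypothetical example: a hand-written (not predicted) action sequence.
words = ["she", "reads", "books"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC"]
arcs = arc_standard(words, actions)
```

Here `arcs` holds the pairs `(1, 0)` and `(1, 2)`, meaning "reads" heads both "she" and "books". A joint system in the spirit of the abstract would interleave such actions with transitions that choose a morphological analysis for each token.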

2021 ◽  
Vol 13 (6) ◽  
pp. 3323
Author(s):  
Nishtman Karimi ◽  
Hossein Azadi ◽  
Kobe Boussauw

Continuously changing conditions of sociotechnical systems are the basis of structural changes in communities. Relationships between transition contexts and regime transformation processes and their driving factors in sociotechnical regimes are poorly understood. Moreover, not all changes in multilevel governance regimes are geared towards sustainability, as demonstrated by the case of the water management regime in Sanandaj county in the west of Iran between 1962 and 2018. The current study shows how the management regime of water resources in the case study has changed over time and identifies the institutional arrangements through a retrospective analysis. The analysis is based on three stages of data collection which included a discussion group, a Delphi survey, and a focus group survey among various types of stakeholders. The “Hybrid Transitions” framework is introduced in order to denote processes of regime change that take place in a range of different transition contexts. The findings do not identify a single transition pathway but show that a number of parallel transition pathways have occurred in the context of groundwater and surface water management and their respective institutional arrangements. The study provides a better understanding of the complexity of transition pathways that were devised at the management regime level.


Author(s):  
Rashmini Naranpanawa ◽  
Ravinga Perera ◽  
Thilakshi Fonseka ◽  
Uthayasanker Thayasivam

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when abundant parallel data are available. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Low-resource, morphologically rich languages such as Sinhala are therefore strongly affected by the out-of-vocabulary (OOV) and rare-word problems. Recent advances in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has the greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, even without a large parallel corpus.
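The abstract does not name the subword techniques it compares. As one widely used example, the sketch below shows a toy byte-pair encoding (BPE) learner, assuming a small word-frequency table as input. The function names, the `</w>` end-of-word marker, and the sample words are illustrative choices, not taken from the paper.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    vocab = {tuple(word) + ("</w>",): n for word, n in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, n in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += n
        if not pair_counts:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = {}
        for symbols, n in vocab.items():
            key = tuple(merge_pair(list(symbols), best))
            new_vocab[key] = new_vocab.get(key, 0) + n
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a possibly unseen word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "lowest": 3}, num_merges=2)
pieces = segment("lowly", merges)
```

With this toy table, the unseen word "lowly" is split into `low`, `l`, `y`, and the end-of-word marker. That is the sense in which subword segmentation gives an open vocabulary: no input word falls outside the symbol inventory as long as its characters are known.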


2003 ◽  
Vol 12 (1) ◽  
pp. 57-70 ◽  
Author(s):  
Yael Reshef

This article studies the relevance of an historical lexical analysis to the stylistic description of Modern Hebrew texts. The examination of the lexical make-up of two distinct genres - administrative language and folksong - reveals a correlation between the social functions of the corpora and their formal characteristics. The administrative corpus reflects the lexical structure of standard Modern Hebrew. The folksong, on the other hand, is influenced by literary and ideological considerations. Consequently, it gives expression to the cultural ties with the traditional Hebrew sources by an abundant use of inherited lexicon. The findings suggest that in text-oriented cultures such as Hebrew, stylistic description can benefit from an historical analysis. Such an analysis responds to an intrinsic socio-linguistic characteristic of the language, and complements the structural stylistic analysis. Following Sarfatti (1990), the lexical analysis is based on distinctions drawn within each lexical item between three elements - root, form and meaning. Such a distinction takes account of diachronic changes in the semantic value of lexical items. It pinpoints factors characterizing the corpora’s lexical composition and enables multi-level distinctions between different types of discourse. As a result, it sheds light on one aspect of genre differentiation in the language.


2018 ◽  
Vol 6 ◽  
pp. 451-465 ◽  
Author(s):  
Daniela Gerz ◽  
Ivan Vulić ◽  
Edoardo Ponti ◽  
Jason Naradowsky ◽  
Roi Reichart ◽  
...  

Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models should be particularly effective for morphologically-rich languages (MRLs) that exhibit high type-to-token ratios. In this work, we present a large-scale LM study on 50 typologically diverse languages covering a wide variety of morphological systems, and offer new LM benchmarks to the community, while considering subword-level information. The main technical contribution of our work is a novel method for injecting subword-level information into semantic word vectors, integrated into the neural language modeling training, to facilitate word-level prediction. We conduct experiments in the LM setting where the number of infrequent words is large, and demonstrate strong perplexity gains across our 50 languages, especially for morphologically-rich languages. Our code and data sets are publicly available.
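The type-to-token ratio mentioned above can be computed in a few lines. This is a minimal sketch assuming whitespace-tokenized input. The two toy samples are illustrative and far too small to support real measurements.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of running tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# English spells out case relations with separate function words ...
english = "the house in the house from the house".split()
# ... while Finnish marks them on the noun, so each form is a new type.
finnish = "talo talossa talosta".split()

ttr_english = type_token_ratio(english)
ttr_finnish = type_token_ratio(finnish)
```

On these samples the ratio is 0.5 for the English tokens and 1.0 for the Finnish ones. A higher ratio means more word forms are seen only rarely, which is why a closed word-level vocabulary is a poor fit for such languages.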


2021 ◽  
Author(s):  
Fatiha Imane Mahcar ◽  
Belkacem Takhi

Algeria has a rich urban and architectural heritage with distinct regional characteristics. The ksour were once a symbol of balance and harmony with their environment; today, unfortunately, they no longer reflect their former function. Ksourian architecture, including that of Laghouat, is a prestigious heritage of high value: it testifies to the ingenuity of its occupants and to their capacity to adapt to a difficult environment. Housing is the essential core of this architecture and makes up the entire composition of the ksar; its design is inspired by the immediate environment and respects ancestral social values. It is characterized by simple architecture and simple construction techniques based on load-bearing walls, using local materials that are highly resistant and less expensive. This study addresses the revaluation of heritage through the case of ksar El-Haouita, which has suffered neglect and depopulation due to several factors. Ksar El-Haouita is among the most famous ksour in the south of Algeria, in the region of Laghouat. It is built with simple materials and construction techniques, using local materials such as stone and lime found in the surroundings of the ksar. The aim of this study is to identify the major causes of the degradation of the ksar, to preserve ksar El-Haouita through specific operations, and to improve its tourist attractiveness in order to promote the heritage and bring it back into use through sustainable Saharan tourism.
Our study rests first on a theoretical framework covering the notions related to our theme, the research problem, and the intended objective. This is followed by a presentation of the ksar and a morphological analysis, together with an identification of the problems, in order to characterize the damage and disfigurement it has undergone. The last step addresses one aspect of the development of the ksar: the restitution of its defensive system (doors, ramparts, ramparts of houses, and towers) through a diagnosis and several operations such as rehabilitation and reconstruction. The study aims to show that the revaluation of the ksar is a very broad operation. It proposes interventions that allow the preservation of the ksar, identifies the elements that contribute to the success of such interventions, and sets out parameters to serve as reference elements and basic principles for operations on the ksar, one of which is the case studied here: the restitution of the defensive system of ksar El-Haouita.


2001 ◽  
Vol 32 (3) ◽  
pp. 182-195 ◽  
Author(s):  
Kenn Apel ◽  
Julie J. Masterson

Purpose: Current research and theory in spelling development and best practices for literacy instruction were reviewed to develop a set of theoretically guided assessment and intervention procedures. These procedures were applied to the case of a 13-year-old student with spelling difficulties. Method: The student was involved in an intensive group intervention program that focused on increasing foundational skills for spelling and on oral word-level reading. Assessment results led to an intervention program targeting phonemic and morphological awareness skills and orthographic knowledge. Results: The student demonstrated clinically significant growth in phonemic and morphological awareness, orthographic knowledge, spelling, and word-level reading. Conclusion: Results of the case study suggest that assessment and intervention procedures guided by theory and research can lead speech-language pathologists to effective participation in aspects of spelling remediation. Additionally, the case study may serve as a model for clinical services and evidence-based practice within clinical settings.


2013 ◽  
Vol 100 (1) ◽  
pp. 51-62 ◽  
Author(s):  
Eva Schlinger ◽  
Victor Chahuneau ◽  
Chris Dyer

We present morphogen, a tool for improving translation into morphologically rich languages with synthetic phrases. We approach the problem of translating into morphologically rich languages in two phases. First, an inflection model is learned to predict target word inflections from source-side context. Then this model is used to create additional sentence-specific translation phrases. These “synthetic phrases” augment the standard translation grammars, and decoding proceeds normally with a standard translation model. We present an open-source Python implementation of our method, as well as a method of obtaining an unsupervised morphological analysis of the target language when no supervised analyzer is available.


2008 ◽  
Vol 14 (2) ◽  
pp. 223-251 ◽  
Author(s):  
ROY BAR-HAIM ◽  
KHALIL SIMA'AN ◽  
YOAD WINTER

Words in Semitic texts often consist of a concatenation of word segments, each corresponding to a part-of-speech (POS) category. Semitic words may be ambiguous with regard to their segmentation as well as to the POS tags assigned to each segment. When designing POS taggers for Semitic languages, a major architectural decision concerns the choice of the atomic input tokens (terminal symbols). If the tokenization is at the word level, the output tags must be complex, and represent both the segmentation of the word and the POS tag assigned to each word segment. If the tokenization is at the segment level, the input itself must encode the different alternative segmentations of the words, while the output consists of standard POS tags. Comparing these two alternatives is not trivial, as the choice between them may have global effects on the grammatical model. Moreover, intermediate levels of tokenization between these two extremes are conceivable, and, as we aim to show, beneficial. To the best of our knowledge, the problem of tokenization for POS tagging of Semitic languages has not been addressed before in full generality. In this paper, we study this problem for the purpose of POS tagging of Modern Hebrew texts. After extensive error analysis of the two simple tokenization models, we propose a novel, linguistically motivated, intermediate tokenization model that gives better performance for Hebrew over the two initial architectures. Our study is based on the well-known hidden Markov models (HMMs). We start out from a manually devised morphological analyzer and a very small annotated corpus, and describe how to adapt an HMM-based POS tagger for both tokenization architectures. We present an effective technique for smoothing the lexical probabilities using an untagged corpus, and a novel transformation for casting the segment-level tagger in terms of a standard, word-level HMM implementation.
The results obtained using our model are on par with the best published results on Modern Standard Arabic, despite the much smaller annotated corpus available for Modern Hebrew.
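The two tokenization choices can be illustrated with a toy enumeration of segmentations. The sketch below assumes a tiny hand-made lexicon of prefixes and stems in an arbitrary transliteration. The entries, tag names, and function names are hypothetical, and neither the HMM tagger nor the smoothing technique from the paper is shown.

```python
# Hypothetical toy lexicon: prefix -> tag, stem -> possible tags.
PREFIXES = {"b": "PREP", "h": "DET"}
STEMS = {"bcl": ["NOUN"], "cl": ["NOUN"]}


def segmentations(word, prefixes=PREFIXES, stems=STEMS):
    """Return every (segments, tags) analysis the toy lexicon licenses."""
    analyses = []

    def expand(rest, segs, tags):
        # Option 1: the remaining string is a known stem.
        for tag in stems.get(rest, []):
            analyses.append((segs + [rest], tags + [tag]))
        # Option 2: peel off a known prefix and keep analysing the remainder.
        for prefix, tag in prefixes.items():
            if rest.startswith(prefix) and len(rest) > len(prefix):
                expand(rest[len(prefix):], segs + [prefix], tags + [tag])

    expand(word, [], [])
    return analyses


analyses = segmentations("bcl")

# Segment-level view: the input carries the alternative segmentations,
# and each segment receives a standard POS tag.
segment_level = [list(zip(segs, tags)) for segs, tags in analyses]

# Word-level view: one token, whose complex tag encodes the segmentation.
word_level = ["+".join(tags) for _, tags in analyses]
```

For the toy word `bcl`, the word-level tagger must choose between the complex tags `NOUN` and `PREP+NOUN`, while the segment-level tagger sees two competing paths, `bcl/NOUN` and `b/PREP cl/NOUN`, and assigns simple tags along whichever path it selects.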


2021 ◽  
Vol 13 (21) ◽  
pp. 12210
Author(s):  
Manel Elmsalmi ◽  
Wafik Hachicha ◽  
Awad M. Aljuaid

Supply chain risk management (SCRM) is critical for strategically supporting the continued success of firms. The SCRM process has at least three basic steps: risk identification, risk evaluation, and risk mitigation (treatment). In all cases, the main step is risk mitigation (RM), and sustainable RM in particular, since every risk must be eliminated or controlled as much as possible. The purpose of this paper is to elaborate and evaluate various RM scenarios starting from an initial risk identification and prioritization solution. The proposed scenario modeling technique is based on morphological analysis (MA) as an explorative scenario tool for RM. MA is used to develop a framework to proactively assess critical risk variables. MA is employed first to exhaustively create possible RM scenarios and second to assess the likelihood of each scenario. The proposed approach addresses the need for a basic rubric to help identify and choose RM approaches. A real case study from the food industry illustrates the application of the proposed approach. To handle all possible MA strategies, a dedicated MORPHOL software package is used. In addition, RM strategies are selected based on sustainability indicators. The case study results demonstrate that MA has considerable value for SCRM. They show that firms can adopt multiple robust strategies in the form of a scenario describing all stages of SCRM in an integrated representation.
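The exhaustive scenario-creation step of morphological analysis can be sketched as a filtered Cartesian product. The dimension names, their values, and the incompatibility rule below are hypothetical placeholders. This is not the MORPHOL package, and the likelihood assessment and sustainability indicators described in the abstract are not modeled.

```python
from itertools import product


def enumerate_scenarios(dimensions, incompatible):
    """Build every combination of one value per dimension, dropping
    combinations that contain a pair of values declared incompatible."""
    names = list(dimensions)
    scenarios = []
    for values in product(*(dimensions[name] for name in names)):
        scenario = dict(zip(names, values))
        chosen = set(scenario.items())
        # Skip the scenario if any forbidden pair is fully contained in it.
        if any(pair <= chosen for pair in incompatible):
            continue
        scenarios.append(scenario)
    return scenarios


# Hypothetical mitigation dimensions and one cross-consistency rule.
dimensions = {
    "supplier": ["single", "dual"],
    "inventory": ["lean", "buffered"],
    "contract": ["spot", "long_term"],
}
incompatible = [frozenset({("supplier", "single"), ("inventory", "lean")})]

scenarios = enumerate_scenarios(dimensions, incompatible)
```

The three two-valued dimensions give eight raw combinations, and the single rule removes the two that pair a single supplier with lean inventory, leaving six candidate scenarios. The space grows multiplicatively with each added dimension, so pruning inconsistent pairs early keeps it manageable.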

