Detailed Error Annotation for Morphologically Rich Languages: Latvian Use Case

Author(s):  
Roberts Darģis ◽  
Ilze Auziņa ◽  
Kristīne Levāne-Petrova ◽  
Inga Kaija

This paper presents a detailed error annotation schema for morphologically rich languages. The described approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". An advanced multi-token error annotation schema is not needed, because the error-annotated texts are written by beginner-level (A1 and A2) learners who use simple syntactic structures. The schema instead focuses on in-depth categorization of spelling and word formation errors. The annotation schema will work best for languages with relatively free word order and rich morphology.

Author(s):  
Yan Huang ◽  
Akira Murakami ◽  
Theodora Alexopoulou ◽  
Anna Korhonen

As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.


1994 ◽  
Vol 18 (1) ◽  
pp. 1-26 ◽  
Author(s):  
Philippe E. Barbaud

This study shows that the "category changing" property of morphological rules of conversion is unable to account for compound words, for formal and semantic reasons. Several convergent facts demonstrate that the compounding process is syntactic in nature. Consequently, it is argued that X-bar theory must be involved in compound word formation because of the "lexical function" of the syntax. Empirical data are mainly focused on French Noms Composés à base Verbale, or NCV, such as tire-bouchon (corkscrew), porte-parole (spokesperson), gagne-pain (job), etc., which are analyzed as base-generated "quasi-VPs" embedded in an NP. Thus, the NPWP exocentric dominance instantiates a "syntactic conversion" at the D-structure level. Such a categorial hierarchy is based on the "distribution changing" property of X-bar theory rather than on the "category changing" property of structuring morphological rules. Therefore, the high productivity of NCVs in French and other Romance languages is due to their morphology, which allows SPEC\HEAD agreement and VERB RAISING movement. The licensing of exocentric X-bar structures in grammar depends on several semantic principles of lexical interpretation, which are relevant to hyperonymy, hyponymy, meronymy, etc. Thus, the model dispenses with a superfluous component of "peripheral" rules of compounding. In conclusion, the exocentricity of syntactic structures leads the author to claim that the X-bar schema is primitive in grammar and that a given phrase is not the necessary projection of its head.


2019 ◽  
Vol 6 ◽  
pp. 67-104 ◽  
Author(s):  
Elena Volodina ◽  
Lena Granstedt ◽  
Arild Matsson ◽  
Beáta Megyesi ◽  
Ildikó Pilán ◽  
...  

The article presents a new language learner corpus for Swedish, SweLL, and the methodology behind it, from collection and pseudonymisation (to protect the personal information of learners) to annotation adapted to second language learning. The main aim is to deliver a well-annotated corpus of essays written by second language learners of Swedish and make it available for research through a browsable environment. To that end, a new annotation tool and a new project management tool have been implemented, both with the main purpose of ensuring the reliability and quality of the final corpus. In the article we discuss the reasoning behind metadata selection and the principles of gold corpus compilation, and argue for the separation of normalization from correction annotation.


2020 ◽  
Vol 4 (2) ◽  
pp. 102
Author(s):  
Muhammad Hendra Sunarya ◽  
Rusman Irwansyah ◽  
Muhammad Syahid Pebriadi

Monitoring and evaluating a project is necessary to ensure that the project runs smoothly. The problems that arise at this stage are obstacles to knowing the progress of the ongoing project. Other obstacles are the potential for file loss, duplication of data, human error, the need for fast processing, and the need for data integrity across two systems. This study aims to develop a project monitoring and evaluation information system that applies deviation calculations. This calculation is carried out to see the realization of the work of an ongoing project. The system design technique uses the use case diagram method, which describes the process of the system. The waterfall method is used to develop the monitoring and evaluation information system for this project. The research result is a project monitoring and evaluation information system that can provide information about the work realization of an ongoing project. Keywords: Monitoring, Evaluation, Deviation Calculation, Information System.


2020 ◽  
Vol 75 (1) ◽  
pp. 339-344
Author(s):  
M.A. Yermekbayev ◽  

The main result of the learning process in primary school depends on how many competencies the student has mastered. A student who has developed linguistic and cultural competencies is immersed in the grammatical structural secrets of the Kazakh language, understands the laws of applying certain grammatical rules in speech, and learns the mechanism of sentence formation. And without competence, the undocumented skills of theoretical complementarity are transformed into the preparation of a single plateau. On a conscious scale, students' language competencies become a necessary condition for organizing their language skills and abilities. Properly formed language and cultural competencies help students to understand the studied phonetic, lexical, and grammatical phenomena in depth, and to understand the mechanism of word formation, correct spelling rules, the presence of certain types of speech in the language, syntactic structures, etc.


Author(s):  
Nurit Melnik

Situations in which conflicting constraints clash can potentially provide linguists with insights into the architecture of grammar. This paper deals with such a case. When predicative modifiers of morphologically rich languages head relative clauses, they are involved in two, sometimes conflicting, agreement relationships. Different languages adopt different strategies in order to resolve situations of conflicting constraints. This paper focuses on Standard Arabic and the hybrid agreement strategy which it employs. It argues that the HPSG theory of agreement, which distinguishes between morphosyntactic and semantic agreement, constitutes an appropriate framework for accounting for the phenomenon. In addition, it shows that contrary to claims made by Doron and Reintges (2005), a non-derivational framework such as HPSG is adequate for accounting for this non-trivial agreement pattern. Moreover, with a constructional approach, whereby constraints can target syntactic structures above the lexical level, better empirical coverage is achieved.


2021 ◽  
Vol 7 (2) ◽  
pp. 275-289
Author(s):  
Nadine Herry-Bénit ◽  
Stéphanie Lopez ◽  
Takeki Kamiyama ◽  
Jeff Tennant

This article presents the IPCE-IPAC corpus, an ongoing project, which has been collected in France, Italy, Spain and China since 2014. The data is collected to investigate the acquisition of segmental and suprasegmental phenomena by L2 learners of English, with a focus on phonemes. The article discusses the methods for the compilation of this original spoken learner corpus, designed to study L2 “interphonology” (Detey, Racine, Kawaguchi, & Zay, 2016), or interlanguage phonology.


2021 ◽  
pp. 162-177
Author(s):  
Antra Kļavinska ◽  

Several text corpora have been created in Latvia, including learner corpora. One of the latest projects is the Latvian Language Learner Corpus (LaVA), which contains the works of international students studying in Latvian higher education institutions who are learning Latvian as a foreign language. The texts are morphologically tagged automatically, and learner errors are tagged manually. A sufficient scope of publications is available, which provides the theoretical basis for the creation of Latvian language learner corpora; however, there is a lack of studies or practical methodological guidelines concerning the opportunities for their application, and there is little data about the use of text corpora in language acquisition. The aim of this study is to explain from the theoretical perspective for what purposes learner corpus data may be used, as well as to illustrate the methodological groundwork with examples from the LaVA corpus. Analysis of theoretical literature has demonstrated the functions and meaning of learner corpora in research, and experience with the use of corpora in acquiring a foreign language has been analysed. Examples of the use of the LaVA corpus as a didactic resource have been prepared using Corpus Linguistics methods. The study was conducted within the state research programme project “The Latvian Language”. After studying the functions of learner corpora from the theoretical perspective, it was concluded that the target audience of the LaVA corpus mainly includes teachers of Latvian as a foreign language (LATS), authors of teaching materials, as well as Latvian language learners. To facilitate the use of the LaVA corpus, it is important to have basic knowledge of Corpus Linguistics, an understanding of the theory of language, as well as an understanding of foreign language teaching methodology. 
LATS teachers can use the LaVA corpus data in the creation of curricula and teaching materials, in the preparation of language proficiency tests, etc. Using the inductive approach in language acquisition, language learners can also become language researchers, can analyse the errors of other learners, etc. Undeniably, the LaVA corpus can be used in broader linguistic research, for example, in contrastive interlanguage analysis, comparing the data of language learners with the data of native speakers or the data of different groups of language learners.


2011 ◽  
Vol 11 (2) ◽  
pp. 97
Author(s):  
Ana Elina Martínez Insua

This paper is concerned with how there-constructions may have helped to achieve discourse coherence in the recent history of English. From the theoretical framework of Meta-Informative Centering Theory (MIC), the paper explores the possibility of establishing a relation between the syntactic structures under analysis and the distinction between 'smooth-shift' and 'rough-shift' transitions from one centre of attention to another (Brennan, Friedman & Pollard, 1987). This will help, ultimately, to investigate the interaction between centering and MIC theories, word order and information structure in a 'non-free' word order language such as English. A corpus-driven analysis of the behaviour of spoken and written there-constructions from late Middle English to Present Day English will show their capacity to function either as highly coherent structures that continue with the same local topic as the previous utterance(s), or as a means to shift the local focus of attention.

