Error Tagging in the Lithuanian Learner Corpus

Author(s): Jūratė Ruzaitė, Sigita Dereškevičiūtė, Viktorija Kavaliauskaitė-Vilkinienė, Eglė Krivickaitė-Leišienė

This paper is a work-in-progress report on error annotation in the Lithuanian Learner Corpus (LLC), which has been developed using the TEITOK environment. The LLC is the first electronic corpus of learner Lithuanian, and it represents learners from very diverse native language backgrounds and at different proficiency levels. The paper has two aims: first, to present the current structure of the corpus; and second, to describe the main principles, procedures, and challenges of error annotation in the LLC. The main types of errors tagged in the corpus and analysed in this paper are orthographic, lexical, and syntactic errors.

2015, Vol 1 (2), pp. 187-209
Author(s): Kristopher Kyle, Scott A. Crossley, YouJin Kim

This study evaluates the impact of writing proficiency on native language identification (NLI), a topic with important implications for the generalizability of NLI models and for detection-based arguments for cross-linguistic influence (CLI; Jarvis 2010, 2012). The study uses multinomial logistic regression to classify the first language (L1) group membership of essays at two proficiency levels, based on systematic lexical and phrasal choices made by members of five L1 groups. The results indicate that lower-proficiency essays are significantly easier to classify than higher-proficiency essays. This suggests that lower-proficiency writers who share an L1 make more similar lexical and phrasal choices than higher-proficiency writers who share an L1. A close analysis of the findings also indicates that the relationship between NLI accuracy and proficiency differed across L1 groups.


2003, Vol 9 (4), pp. 325-363
Author(s): Sylvain Delisle, Bernard Moulin, Terry Copeck

Most information systems that deal with natural language texts do not tolerate much deviation from their idealized and simplified model of language. Spoken dialog, however, is notoriously ungrammatical. Because the MAREDI project focuses in particular on the automatic analysis of scripted dialogs, we needed to develop a robust capacity to analyze transcribed spoken language. This paper summarizes the current state of our work. It presents the main elements of our approach, which exploits surface markers as the best route to the semantics of the modelled conversation. We highlight the foundations of our conversational model and give an overview of the MAREDI system. We then discuss its three key modules: a connectionist network that recognizes speech acts, a robust syntactic analyzer, and a semantic analyzer.


Author(s): Roberts Darģis, Ilze Auziņa, Kristīne Levāne-Petrova, Inga Kaija

This paper presents a detailed error annotation schema for morphologically rich languages. The approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". An advanced multi-token error annotation schema is not needed, because the error-annotated texts are written by beginner-level learners (A1 and A2) who use simple syntactic structures. The schema focuses on in-depth categorization of spelling and word formation errors. It will work best for languages with relatively free word order and rich morphology.


Author(s): N. V. Remnev

Native Language Identification (NLI) is the task of automatically recognizing an author's native language (L1) from texts the author has written in a non-native language. NLI has been studied in detail for English, including in two shared tasks held in 2013 and 2017, which used TOEFL English essays and essay samples as data. A small number of studies have addressed NLI for other languages; for Russian, the problem was investigated by Ladygina (2017) and Remnev (2019). This paper applies approaches that were well established in the NLI Shared Task 2013 and 2017 competitions both to recognizing the author's native language and to recognizing the type of speaker: learner of Russian or heritage Russian speaker. The NLI task is also approached using the types of errors specific to different languages. This data-driven study is made possible by the Russian Learner Corpus, developed by the Higher School of Economics (HSE) Learner Russian Research Group, which serves as the basis for the experiments.


Author(s): Laila Aghai

This qualitative study focuses on English language learners who are continuing their education in U.S. high schools and examines their translanguaging in the classroom. When students learn a second language, they draw on their linguistic repertoire, including their knowledge of English and of their native language, to negotiate meaning. To gain a better understanding of the students' translanguaging, one ESL teacher and 10 ESL students were interviewed and observed in a classroom. The ESL students spoke Arabic as their native language and had beginning to intermediate proficiency levels. The findings show that English language learners use various strategies to make content comprehensible by making connections between their knowledge of their L1 and L2.


Author(s): Trude Heift, Catherine Caws

This chapter discusses the cyclical process of collecting and recycling learner data within the E-Tutor CALL system and presents a study of student usage of its data-driven learning (DDL) tool. E-Tutor contains a static and a dynamic learner corpus for L2 learners of German. The static learner corpus was built from the submissions of approximately 5,000 learners who used the system over a period of five years and who provided millions of submissions across a variety of activity types. In addition, all concurrent E-Tutor users contribute data to a dynamic corpus, which allows them to compare their ongoing submissions with those in the static corpus. The authors conducted a study with 84 learners and recorded their interaction with the DDL tool of E-Tutor over one semester. The results suggest that examining sample input from a large, unknown user group may be less informative and less interesting to language learners than examining their own data. For the DDL tool to be useful at all proficiency levels, training and scaffolding must also be provided.


1974, Vol 61, p. 41
Author(s): J. L. Schombert

The observational programs for the Southern Reference Star Program are rapidly drawing to a close. Observations have been completed, except for a few at San Juan and a small percentage of zone −52° to −64° at the Cape. The Cape zone −64° to −90° has been cancelled. The current state of the observations is shown in Table I.


1989, Vol 20, pp. 71-75
Author(s): D. White

Trigeminated at more or less the same time as the First of September Revolution and the appearance of the first published report from The Society for Libyan Studies, the excavation phase of the extramural Sanctuary of Demeter and Persephone project at Cyrene ran until 1978 when it was stopped in order to begin work on final publication. The intervening years have seen a degree of progress, and this retrospective vicenary issue of Libyan Studies provides a welcome opportunity to take stock of what has been and is going on. The journal's readers will already have some familiarity with the broad outlines of the sanctuary project, since summary articles have been published in an earlier issue (White 1978) and elsewhere (Vickers and Reynolds 1972; Kane 1979; Humphrey 1980; White 1981). In addition reviews of the three published volumes of the final report (White 1984; Schaus 1985; Lowenstam et al. 1987) have appeared here with exemplary promptitude (Lloyd 1985; Boardman 1986; Fulford 1988), as well as externally (Brown 1986; Tomlinson 1986; Cook 1987). The present article's bibliographical citations list what has been written about the sanctuary, but omit the series of preliminary reports in Libya Antiqua (between Vols. 8 and 16) and American Journal of Archaeology (Vols. 78 and 80), whose inclusion would be redundant, as would be any attempt to minute the contents of the reviewed final study volumes. Instead my present intention is to give a short report on work in progress and to summarise the results of what has already been published in separate studies outside the framework of the final publication.


2020, Vol 6 (1), pp. 1-37
Author(s): Katrin Wisniewski

Little is known about the link between Second Language Acquisition (SLA) developmental stages (Pienemann 1998) and the Common European Framework of Reference (CEFR) proficiency levels. Therefore, this study examines the inversion (INV) and verb-end (VEND) stages in L2 German MERLIN texts rated B1 (N = 104) or A2 (N = 32). Two acquisition criteria are applied to individual texts (emergence, Pienemann 1998; 75%-correct, Ellis 1989) and combined with analyses of texts grouped at CEFR levels. Results suggest links between developmental stages and CEFR levels: VEND had emerged in 44% of B1 texts, while most A2 texts did not contain VEND. Many B1 texts (61.5%), but only 37.5% of A2 texts, showed emergence of INV. However, analyses also revealed persisting problems with INV accuracy at B1. More generally, the study points out shared challenges for Learner Corpus Research, SLA, and proficiency/assessment research related to the availability of linguistic evidence in learner texts.


2004, Vol 11 (5), pp. 542-549
Author(s): Michael S. Sabel, Cary S. Kaufman, Pat Whitworth, Helena Chang, Lewis H. Stocks, ...
