Error Tagging in the Lithuanian Learner Corpus

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200631 ◽

2020 ◽

Author(s):

Jūratė Ruzaitė ◽

Sigita Dereškevičiūtė ◽

Viktorija Kavaliauskaitė-Vilkinienė ◽

Eglė Krivickaitė-Leišienė

Keyword(s):

Native Language ◽

Progress Report ◽

Proficiency Levels ◽

Work In Progress ◽

Learner Corpus ◽

Current State ◽

Error Annotation

This paper is a work-in-progress report on error annotation in the Lithuanian Learner Corpus (LLC), which has been developed using the TEITOK environment. The LLC is the first electronic corpus of learner Lithuanian that represents learners of very diverse native language backgrounds and different proficiency levels. In this paper, we have a double aim: firstly, we present the structure of the corpus in its current state; and secondly, we describe the main principles, procedures, and challenges of error annotation in the LLC. The main types of errors that are tagged in this corpus and analysed in this paper are orthographic, lexical, and syntactic.

Native language identification and writing proficiency

International Journal of Learner Corpus Research ◽

10.1075/ijlcr.1.2.01kyl ◽

2015 ◽

Vol 1 (2) ◽

pp. 187-209 ◽

Cited By ~ 2

Author(s):

Kristopher Kyle ◽

Scott A. Crossley ◽

YouJin Kim

Keyword(s):

Group Membership ◽

Native Language ◽

Multinomial Logistic Regression ◽

First Language ◽

Language Identification ◽

Writing Proficiency ◽

Proficiency Levels ◽

Close Analysis ◽

The Impact ◽

The Relationship

This study evaluates the impact of writing proficiency on native language identification (NLI), a topic that has important implications for the generalizability of NLI models and detection-based arguments for cross-linguistic influence (CLI; Jarvis 2010, 2012). The study uses multinomial logistic regression to classify the first language (L1) group membership of essays at two proficiency levels based on systematic lexical and phrasal choices made by members of five L1 groups. The results indicate that lower proficiency essays are significantly easier to classify than higher proficiency essays, suggesting that lower proficiency writers who share an L1 make more similar lexical and phrasal choices than higher proficiency writers who share an L1. A close analysis of the findings also indicates that the relationship between NLI accuracy and proficiency differed across L1 groups.

Surface-marker-based dialog modelling: A progress report on the MAREDI project

Natural Language Engineering ◽

10.1017/s1351324903003231 ◽

2003 ◽

Vol 9 (4) ◽

pp. 325-363

Author(s):

Sylvain Delisle ◽

Bernard Moulin ◽

Terry Copeck

Keyword(s):

Information Systems ◽

Natural Language ◽

Speech Acts ◽

Spoken Language ◽

Progress Report ◽

Surface Marker ◽

Simplified Model ◽

Surface Markers ◽

Connectionist Network ◽

Current State

Most information systems that deal with natural language texts do not tolerate much deviation from their idealized and simplified model of language. Spoken dialog is notoriously ungrammatical, however. Because the MAREDI project focuses in particular on the automatic analysis of scripted dialogs, we needed to develop a robust capacity to analyze transcribed spoken language. This paper summarizes the current state of our work. It presents the main elements of our approach, which is based on exploiting surface markers as the best route to the semantics of the conversation modelled. We highlight the foundations of our particular conversational model and give an overview of the MAREDI system. We then discuss its three key modules: a connectionist network to recognise speech acts, a robust syntactic analyzer, and a semantic analyzer.

Detailed Error Annotation for Morphologically Rich Languages: Latvian Use Case

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200629 ◽

2020 ◽

Author(s):

Roberts Darģis ◽

Ilze Auziņa ◽

Kristīne Levāne-Petrova ◽

Inga Kaija

Keyword(s):

Word Formation ◽

Language Learner ◽

Use Case ◽

Project Development ◽

Syntactic Structures ◽

Learner Corpus ◽

Error Annotation ◽

Morphologically Rich Languages ◽

Free Word ◽

Ongoing Project

This paper presents a detailed error annotation schema for morphologically rich languages. The described approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". There is no need for an advanced multi-token error annotation schema, because the error-annotated texts are written by beginner-level (A1 and A2) learners who use simple syntactic structures. The schema focuses on in-depth categorization of spelling and word formation errors. It will work best for languages with relatively free word order and rich morphology.

Native Language Identification for Russian Using Errors Types

Computational Linguistics and Intellectual Technologies ◽

10.28995/2075-7182-2020-19-1123-1133 ◽

2020 ◽

Author(s):

N. V. Remnev ◽

Keyword(s):

Research Group ◽

English Language ◽

Native Language ◽

Language Identification ◽

Data Driven ◽

Shared Task ◽

Identification Task ◽

Learner Corpus ◽

Russian Speakers ◽

Russian Research

Native Language Identification (NLI) is the task of automatically recognizing an author's native language (L1) from texts written in a language that is not native to the author. The NLI task has been studied in detail for English, and two shared tasks were conducted in 2013 and 2017, where TOEFL English essays and essay samples were used as data. A small number of works have also addressed the NLI problem for other languages. For Russian, the problem was investigated by Ladygina (2017) and Remnev (2019). This paper discusses the use of approaches that are well established in the NLI Shared Task 2013 and 2017 competitions to recognize the author's native language, as well as to recognize the type of speaker: learners of Russian or heritage Russian speakers. The native language identification task is also solved based on the types of errors specific to different languages. This study is data-driven and is made possible by the Russian Learner Corpus, developed by the Higher School of Economics (HSE) Learner Russian Research Group, on the basis of which the experiments are being conducted.

What Teachers Need to Know About English Language Learners' Translanguaging in the Classroom

Handbook of Research on Assessment Practices and Pedagogical Models for Immigrant Students - Advances in Educational Technologies and Instructional Design ◽

10.4018/978-1-5225-9348-5.ch007 ◽

2019 ◽

pp. 109-126

Author(s):

Laila Aghai

Keyword(s):

English Language Learners ◽

Language Learners ◽

English Language ◽

Native Language ◽

Esl Students ◽

Negotiation Of Meaning ◽

Qualitative Research Study ◽

Proficiency Levels ◽

L1 And L2 ◽

The U.S

This qualitative research study focuses on English language learners who are continuing their education in U.S. high schools and examines their translanguaging in the classroom. When students are learning a second language, they draw on their linguistic repertoire and their knowledge of English and their native language to negotiate meaning. In order to gain a better understanding of the students' translanguaging, one ESL teacher and 10 ESL students were interviewed and observed in a classroom. The ESL students spoke Arabic as their native language and had beginning to intermediate proficiency levels. The findings of the study showed that English language learners use various strategies to make the content comprehensible by making connections between their knowledge in their L1 and L2.

Querying a Static and Dynamic Learner Corpus

Handbook of Research on Integrating Technology Into Contemporary Language Learning and Teaching - Advances in Educational Technologies and Instructional Design ◽

10.4018/978-1-5225-5140-9.ch024 ◽

2018 ◽

pp. 494-511

Author(s):

Trude Heift ◽

Catherine Caws

Keyword(s):

Language Learners ◽

Data Driven ◽

User Group ◽

Proficiency Levels ◽

Learner Corpus ◽

Study Results ◽

Call System ◽

L2 Learners ◽

Activity Types ◽

Cyclical Process

This chapter discusses the cyclical process of collecting and recycling learner data within the E-Tutor CALL system and presents a study on student usage of its data-driven learning (DDL) tool. E-Tutor consists of a static and dynamic learner corpus for L2 learners of German. The static learner corpus has been constructed from approximately 5000 learners who used the system over a period of five years. These learners provided millions of submissions from a variety of activity types. In addition, all concurrent E-Tutor users contribute data to a dynamic corpus, which allows them to compare and examine their ongoing system submissions to those contained in the static corpus. The authors conducted a study with 84 learners and recorded their interaction with the DDL tool of E-Tutor over one semester. Study results on student usage suggest that investigating sample input of a large, unknown user group might be less informative and of less interest to language learners than their own data. For the DDL tool to be useful for all proficiency levels, training and scaffolding must also be provided.

Southern Reference Star Program: Progress Report

Symposium - International Astronomical Union ◽

10.1017/s0074180900069485 ◽

1974 ◽

Vol 61 ◽

pp. 41-41

Author(s):

J. L. Schombert

Keyword(s):

Progress Report ◽

San Juan ◽

Reference Star ◽

Current State ◽

Star Program

The observational programs for the Southern Reference Star Program are rapidly drawing to a close. Observations have been completed, except for a few at San Juan and a small percentage of zone −52° to −64° at the Cape. The Cape zone −64° to −90° has been cancelled. The current state of the observations is shown in Table I.

The Pennsylvania University Museum's Demeter and Persephone Sanctuary Project at Cyrene: A Final Progress Report?

Libyan Studies ◽

10.1017/s0263718900006609 ◽

1989 ◽

Vol 20 ◽

pp. 71-75 ◽

Cited By ~ 1

Author(s):

D. White

Keyword(s):

Progress Report ◽

Final Report ◽

Short Report ◽

Work In Progress ◽

Final Study ◽

Published Report ◽

Sanctuary Of Demeter ◽

Demeter And Persephone

Trigeminated at more or less the same time as the First of September Revolution and the appearance of the first published report from The Society for Libyan Studies, the excavation phase of the extramural Sanctuary of Demeter and Persephone project at Cyrene ran until 1978 when it was stopped in order to begin work on final publication. The intervening years have seen a degree of progress, and this retrospective vicenary issue of Libyan Studies provides a welcome opportunity to take stock of what has been and is going on. The journal's readers will already have some familiarity with the broad outlines of the sanctuary project, since summary articles have been published in an earlier issue (White 1978) and elsewhere (Vickers and Reynolds 1972; Kane 1979; Humphrey 1980; White 1981). In addition reviews of the three published volumes of the final report (White 1984; Schaus 1985; Lowenstam et al. 1987) have appeared here with exemplary promptitude (Lloyd 1985; Boardman 1986; Fulford 1988), as well as externally (Brown 1986; Tomlinson 1986; Cook 1987). The present article's bibliographical citations list what has been written about the sanctuary, but omit the series of preliminary reports in Libya Antiqua (between Vols. 8 and 16) and American Journal of Archaeology (Vols. 78 and 80), whose inclusion would be redundant, as would be any attempt to minute the contents of the reviewed final study volumes. Instead my present intention is to give a short report on work in progress and to summarise the results of what has already been published in separate studies outside the framework of the final publication.

SLA developmental stages in the CEFR-related learner corpus MERLIN

International Journal of Learner Corpus Research ◽

10.1075/ijlcr.18008.wis ◽

2020 ◽

Vol 6 (1) ◽

pp. 1-37

Author(s):

Katrin Wisniewski

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Acquisition ◽

Developmental Stages ◽

Proficiency Levels ◽

Learner Corpus ◽

Linguistic Evidence ◽

Proficiency Assessment ◽

The Common ◽

Assessment Research

Little is known about the link between Second Language Acquisition (SLA) developmental stages (Pienemann 1998) and the Common European Framework of Reference (CEFR) proficiency levels. Therefore, this study examines the inversion (INV) and verb-end (VEND) stages in L2 German MERLIN texts rated B1 (N = 104) or A2 (N = 32). Two acquisition criteria are applied to individual texts (emergence, Pienemann 1998; 75%-correct, Ellis 1989) and combined with analyses of texts grouped at CEFR levels. Results suggest links between developmental stages and CEFR levels: VEND had emerged in 44% of B1 texts, while most A2 texts did not contain VEND. Many B1 texts (61.5%), but only 37.5% of A2 texts, showed emergence of INV. However, analyses also revealed persisting problems with INV accuracy at B1. More generally, the study points out shared challenges for Learner Corpus Research, SLA, and proficiency/assessment research related to the availability of linguistic evidence in learner texts.

Cryoablation of Early-Stage Breast Cancer: Work-in-Progress Report of a Multi-Institutional Trial

Annals of Surgical Oncology ◽

10.1245/aso.2004.08.003 ◽

2004 ◽

Vol 11 (5) ◽

pp. 542-549 ◽

Cited By ~ 125

Author(s):

Michael S. Sabel ◽

Cary S. Kaufman ◽

Pat Whitworth ◽

Helena Chang ◽

Lewis H. Stocks ◽

...

Keyword(s):

Breast Cancer ◽

Early Stage ◽

Progress Report ◽

Early Stage Breast Cancer ◽

Work In Progress
