Detailed Error Annotation for Morphologically Rich Languages: Latvian Use Case

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective ◽

10.3233/faia200629 ◽

2020 ◽

Author(s):

Roberts Darģis ◽

Ilze Auziņa ◽

Kristīne Levāne-Petrova ◽

Inga Kaija

Keyword(s):

Word Formation ◽

Language Learner ◽

Use Case ◽

Project Development ◽

Syntactic Structures ◽

Learner Corpus ◽

Error Annotation ◽

Morphologically Rich Languages ◽

Free Word ◽

Ongoing Project

This paper presents a detailed error annotation for morphologically rich languages. The described approach is used to create the Latvian Language Learner corpus (LaVA), which is part of the ongoing project "Development of Learner corpus of Latvian: methods, tools and applications". An advanced multi-token error annotation schema is not needed, because the error-annotated texts are written by beginner-level learners (A1 and A2) who use simple syntactic structures. Instead, the schema focuses on in-depth categorization of spelling and word formation errors. The annotation schema will work best for languages with relatively free word order and rich morphology.

Subcategorization frame identification for learner English

International Journal of Corpus Linguistics ◽

10.1075/ijcl.18097.hua ◽

2020 ◽

Author(s):

Yan Huang ◽

Akira Murakami ◽

Theodora Alexopoulou ◽

Anna Korhonen

Keyword(s):

Second Language ◽

Second Language Acquisition ◽

Language Processing ◽

Large Scale ◽

Structural Information ◽

Syntactic Structures ◽

Verb Classes ◽

Learner Corpus ◽

Wide Range ◽

Learner Language

As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP) technology is developed to provide rich linguistic annotations necessary for second language (L2) research. We present a system for automatically analyzing subcategorization frames (SCFs) for learner English. SCFs link lexis with morphosyntax, shedding light on the interplay between lexical and structural information in learner language. Meanwhile, SCFs are crucial to the study of a wide range of phenomena including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this diversity changes with L2 proficiency.

Conversion Syntaxique

Lingvisticae Investigationes ◽

10.1075/li.18.1.02bar ◽

1994 ◽

Vol 18 (1) ◽

pp. 1-26 ◽

Cited By ~ 1

Author(s):

Philippe E. Barbaud

Keyword(s):

Empirical Data ◽

Word Formation ◽

Compound Word ◽

Romance Languages ◽

Compound Words ◽

Syntactic Structures ◽

High Productivity ◽

Structure Level ◽

Lexical Function ◽

Verb Raising

In this study, it is shown that the "category changing" property of morphological rules of conversion is unable to account for compound words, for formal and semantic reasons. Several convergent facts demonstrate that the compounding process is syntactic in nature. Consequently, it is argued that X-bar theory must be involved in compound word formation because of the "lexical function" of the syntax. Empirical data are mainly focused on French Noms Composés à base Verbale, or NCV, such as tire-bouchon (corkscrew), porte-parole (spokesperson), gagne-pain (job), etc., which are analyzed as base generated "quasi-VPs" embedded in a NP. Thus, the NPWP exocentric dominance instantiates a "syntactic conversion" at the D-structure level. Such a categorial hierarchy is based on the "distribution changing" property of X-bar theory rather than on the "category changing" property of structuring morphological rules. Therefore, the high productivity of NCVs in French and other Romance languages is due to their morphology, which allows SPEC\HEAD agreement and VERB RAISING movement. The licensing of exocentric X-bar structures in grammar depends on several semantic principles of lexical interpretation, which are relevant to hyperonymy, hyponymy, meronymy, etc. Thus, the model dispenses with a superfluous component of "peripheral" rules of compounding. In conclusion, exocentricity of syntactic structures leads the author to claim that X-bar schema is primitive in grammar and that a given phrase is not the necessary projection of its head.

The SweLL Language Learner Corpus

Northern European Journal of Language Technology ◽

10.3384/nejlt.2000-1533.19667 ◽

2019 ◽

Vol 6 ◽

pp. 67-104 ◽

Cited By ~ 1

Author(s):

Elena Volodina ◽

Lena Granstedt ◽

Arild Matsson ◽

Beáta Megyesi ◽

Ildikó Pilán ◽

...

Keyword(s):

Second Language ◽

Language Learning ◽

Language Learners ◽

Second Language Learners ◽

Personal Information ◽

Language Learner ◽

Management Tool ◽

Learner Corpus ◽

Selection Principles

The article presents a new language learner corpus for Swedish, SweLL, and the methodology behind it, from collection and pseudonymisation (to protect the personal information of learners) to annotation adapted to second language learning. The main aim is to deliver a well-annotated corpus of essays written by second language learners of Swedish and make it available for research through a browsable environment. To that end, a new annotation tool and a new project management tool have been implemented, both with the main purpose of ensuring the reliability and quality of the final corpus. In the article we discuss the reasoning behind metadata selection and the principles of gold corpus compilation, and argue for the separation of normalization from correction annotation.

Sistem Informasi Monitoring dan Evaluasi Proyek dengan Menerapkan Perhitungan Deviasi

Jurnal JTIK (Jurnal Teknologi Informasi dan Komunikasi) ◽

10.35870/jtik.v4i2.146 ◽

2020 ◽

Vol 4 (2) ◽

pp. 102

Author(s):

Muhammad Hendra Sunarya ◽

Rusman Irwansyah ◽

Muhammad Syahid Pebriadi

Keyword(s):

Information System ◽

Research Result ◽

Monitoring And Evaluation ◽

Use Case ◽

Human Errors ◽

Diagram Method ◽

Fast Processing ◽

Project Monitoring ◽

Ongoing Project ◽

Evaluation Information

The process of monitoring and evaluating a project needs to be done to ensure the project runs smoothly. The problems that arise at this stage are obstacles to knowing the progress of the ongoing project. Other obstacles include the potential for file loss, duplication of data, human errors, the need for fast processing, and the need for data integrity between two systems. This study aims to develop a project monitoring and evaluation information system that applies deviation calculations. This calculation is carried out to see the realization of the work of an ongoing project. The system design technique uses the use case diagram method, which describes the process of the system. The waterfall method is used to develop the monitoring and evaluation information system for this project. The research result is a project monitoring and evaluation information system that can provide information about the work realization of an ongoing project.

Keywords: Monitoring, Evaluation, Deviation Calculation, Information System.
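The abstract does not define its deviation calculation. As a minimal sketch, assuming deviation means realized progress minus planned progress (in percentage points), the idea could look like the following. The `ProgressReport` class, its field names, and the sample figures are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProgressReport:
    """One reporting period for a project (all names here are hypothetical)."""
    period: str
    planned_pct: float   # cumulative planned progress, in percent
    realized_pct: float  # cumulative realized progress, in percent


def deviation(report: ProgressReport) -> float:
    """Assumed definition: realized minus planned progress, in percentage points."""
    return report.realized_pct - report.planned_pct


def status(report: ProgressReport) -> str:
    """Label a period from the sign of its deviation."""
    d = deviation(report)
    if d > 0:
        return "ahead"
    if d < 0:
        return "behind"
    return "on schedule"


# Illustrative data only; not results from the study.
reports = [
    ProgressReport("week 1", planned_pct=10.0, realized_pct=12.0),
    ProgressReport("week 2", planned_pct=25.0, realized_pct=20.0),
]
for r in reports:
    print(f"{r.period}: deviation {deviation(r):+.1f} points ({status(r)})")
```

Under this assumed definition, a negative value flags a period in which the realized work lags the plan, which is the kind of signal a monitoring system would surface.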

Error Annotation of the Arabic Learner Corpus

Language Processing and Knowledge in the Web - Lecture Notes in Computer Science ◽

10.1007/978-3-642-40722-2_2 ◽

2013 ◽

pp. 14-22 ◽

Cited By ~ 1

Author(s):

Abdullah Alfaifi ◽

Eric Atwell ◽

Ghazi Abuhakema

Keyword(s):

Learner Corpus ◽

Error Annotation

PEDAGOGICAL ASPECTS OF THE FORMATION LINGUISTIC AND CULTURAL COMPETENCE

BULLETIN Series of Philological Sciences ◽

10.51889/2021-1.1728-7804.58 ◽

2020 ◽

Vol 75 (1) ◽

pp. 339-344

Author(s):

М.А. Yermekbayev

Keyword(s):

Cultural Competence ◽

Primary School ◽

Learning Process ◽

Language Skills ◽

Word Formation ◽

Necessary Condition ◽

Syntactic Structures ◽

Cultural Competencies ◽

Correct Spelling ◽

Language Competencies

The main result of the learning process in primary school depends on how many competencies the student has mastered. A student who has developed linguistic and cultural competencies is immersed in the grammatical structural secrets of the Kazakh language, understands the laws of applying certain grammatical rules in speech, and learns the mechanism of sentence formation. And without competence, the undocumented skills of theoretical complementarity are transformed into the preparation of a single plateau. On a conscious scale, students' language competencies become a necessary condition for organizing their language skills and abilities. Properly formed language and cultural competencies help students to understand the studied phonetic, lexical, and grammatical phenomena in depth, and to understand the mechanism of word formation, correct spelling rules, the presence of certain types of speech in the language, syntactic structures, etc.

Hybrid agreement as a conflict resolution strategy

Proceedings of the International Conference on Head-Driven Phrase Structure Grammar ◽

10.21248/hpsg.2006.13 ◽

2006 ◽

Author(s):

Nurit Melnik

Keyword(s):

Conflict Resolution ◽

Relative Clauses ◽

Syntactic Structures ◽

Standard Arabic ◽

Empirical Coverage ◽

Conflict Resolution Strategy ◽

Resolution Strategy ◽

Morphologically Rich Languages ◽

Semantic Agreement

Situations in which conflicting constraints clash can potentially provide linguists with insights into the architecture of grammar. This paper deals with such a case. When predicative modifiers of morphologically rich languages head relative clauses, they are involved in two, sometimes conflicting, agreement relationships. Different languages adopt different strategies in order to resolve situations of conflicting constraints. This paper focuses on Standard Arabic and the hybrid agreement strategy which it employs. It argues that the HPSG theory of agreement, which distinguishes between morphosyntactic and semantic agreement, constitutes an appropriate framework for accounting for the phenomenon. In addition, it shows that contrary to claims made by Doron and Reintges (2005), a non-derivational framework such as HPSG is adequate for accounting for this non-trivial agreement pattern. Moreover, with a constructional approach, whereby constraints can target syntactic structures above the lexical level, better empirical coverage is achieved.

The interphonology of contemporary English corpus (IPCE-IPAC)

International Journal of Learner Corpus Research ◽

10.1075/ijlcr.20010.her ◽

2021 ◽

Vol 7 (2) ◽

pp. 275-289

Author(s):

Nadine Herry-Bénit ◽

Stéphanie Lopez ◽

Takeki Kamiyama ◽

Jeff Tennant

Keyword(s):

Learner Corpus ◽

L2 Learners ◽

Ongoing Project

This article presents the IPCE-IPAC corpus, an ongoing project, which has been collected in France, Italy, Spain and China since 2014. The data is collected to investigate the acquisition of segmental and suprasegmental phenomena by L2 learners of English, with a focus on phonemes. The article discusses the methods for the compilation of this original spoken learner corpus, designed to study L2 “interphonology” (Detey, Racine, Kawaguchi, & Zay, 2016), or interlanguage phonology.

Latviešu valodas apguvēju korpuss lietojumā: teorētisks un metodoloģisks ieskats

10.37384/lva.2021.162 ◽

2021 ◽

pp. 162-177

Author(s):

Antra Kļavinska

Keyword(s):

Language Acquisition ◽

Foreign Language ◽

Language Learners ◽

Corpus Linguistics ◽

Theoretical Perspective ◽

Language Learner ◽

Learner Corpus ◽

Text Corpora ◽

Learner Corpora ◽

Corpus Data

Several text corpora have been created in Latvia, including learner corpora. One of the latest projects is the Latvian Language Learner Corpus (LaVA), which contains the works of international students studying in Latvian higher education institutions who are learning Latvian as a foreign language. The texts are morphologically tagged automatically, and learner errors are tagged manually. A sufficient scope of publications is available, which provides the theoretical basis for the creation of Latvian language learner corpora; however, there is a lack of studies or practical methodological guidelines concerning the opportunities for their application, and there is little data about the use of text corpora in language acquisition. The aim of this study is to explain from the theoretical perspective for what purposes learner corpus data may be used, as well as to illustrate the methodological groundwork with examples from the LaVA corpus. Analysis of theoretical literature has demonstrated the functions and meaning of learner corpora in research, and experience with the use of corpora in acquiring a foreign language has been analysed. Examples of the use of the LaVA corpus as a didactic resource have been prepared using Corpus Linguistics methods. The study was conducted within the state research programme project “The Latvian Language”. After studying the functions of learner corpora from the theoretical perspective, it was concluded that the target audience of the LaVA corpus mainly includes teachers of Latvian as a foreign language (LATS), authors of teaching materials, as well as Latvian language learners. To facilitate the use of the LaVA corpus, it is important to have basic knowledge of Corpus Linguistics, an understanding of the theory of language, as well as an understanding of foreign language teaching methodology. 
LATS teachers can use the LaVA corpus data in the creation of curricula and teaching materials, in the preparation of language proficiency tests, etc. Using the inductive approach in language acquisition, language learners can also become language researchers, can analyse the errors of other learners, etc. Undeniably, the LaVA corpus can be used in broader linguistic research, for example, in contrastive interlanguage analysis, comparing the data of language learners with the data of native speakers or the data of different groups of language learners.

When discourse matches syntax: On meta-informative centering theory and discourse coherence in the recent history of English

International Journal of English Studies ◽

10.6018/ijes/2011/2/149661 ◽

2011 ◽

Vol 11 (2) ◽

pp. 97

Author(s):

Ana Elina Martínez Insua

Keyword(s):

Middle English ◽

Word Order ◽

Information Structure ◽

Recent History ◽

Syntactic Structures ◽

Discourse Coherence ◽

Order Language ◽

History Of ◽

Centering Theory ◽

Free Word

This paper is concerned with how there-constructions may have helped to achieve discourse coherence in the recent history of English. Working within the theoretical framework of Meta-Informative Centering Theory (MIC), the paper explores the possibility of establishing a relation between the syntactic structures under analysis and the distinction between 'smooth-shift' and 'rough-shift' transitions from one centre of attention to another (Brennan, Friedman & Pollard, 1987). This will ultimately help to investigate the interaction between centering and MIC theories, word order and information structure in a 'non-free' word order language such as English. A corpus-driven analysis of the behaviour of spoken and written there-constructions from late Middle English to Present Day English will show their capacity to function either as highly coherent structures that continue with the same local topic as the previous utterance(s), or as a means to shift the local focus of attention.