test languages
Recently Published Documents


TOTAL DOCUMENTS: 14 (FIVE YEARS: 0)
H-INDEX: 5 (FIVE YEARS: 0)

2019 ◽ Vol 10 (3) ◽ pp. 351-379
Author(s): Charlotte Gooskens, Vincent J. van Heuven

We measured mutual intelligibility of 16 closely related spoken languages in Europe. Intelligibility was determined for all 70 language combinations using the same uniform methodology (a cloze test). We analysed the results of 1833 listeners representing the mutual intelligibility between young, educated Europeans from the same 16 countries. Lexical, phonological, orthographic, morphological and syntactic distances were computed as linguistic variables. We also quantified non-linguistic variables (e.g. exposure, attitudes towards the test languages). Using stepwise regression analysis, we assessed the importance of linguistic and non-linguistic predictors for the mutual intelligibility in the 70 language pairs. Exposure to the test language was the most important variable, overriding all other variables. Then, limiting the analysis to the prediction of inherent intelligibility, we analysed the results for a subset of listeners with little or no previous exposure to the test language. In this subset, linguistic distances, especially lexical distance, explain a substantial part of the variance.


2018 ◽ Vol 6 ◽ pp. 667-685
Author(s): Dingquan Wang, Jason Eisner

We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work’s interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.7 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010).


2017 ◽ Vol 40 (2) ◽ pp. 123-147
Author(s): Charlotte Gooskens, Femke Swarte

We report on a large-scale investigation of the mutual intelligibility between five Germanic languages: Danish, Dutch, English, German and Swedish. We tested twenty language combinations using the same uniform methodology, making the results commensurable for the first time. We first tested both written and spoken language by means of cloze tests. Next, we calculated linguistic distance at the levels of lexicon, orthography, phonology, morphology and syntax. We also quantified exposure and attitudes towards the test languages. Finally, we carried out a regression analysis to determine the relative importance of these linguistic and extra-linguistic predictors for the mutual intelligibility between Germanic languages. The extra-linguistic predictor exposure was the most significant factor in predicting intelligibility in the Germanic language area. The effect of attitude was very small. Lexical, orthographic and phonetic distances were the most important linguistic predictors of intelligibility.


2017 ◽ Vol 7 (3-4) ◽ pp. 359-393
Author(s): Stanislava Antonijevic, Ruth Durham, Íde Ní Chonghaile

Currently there are no standardized language assessments for English-Irish bilingual school-age children that would test languages in a comparable way. There are also no standardized language assessments of Irish for this age group. The current study aimed to design comparable language assessments in both languages targeting structures known to be challenging for children with language impairments. A sentence repetition (SRep) task equivalent to the English SRep task (Marinis, Chiat, Armon-Lotem, Piper, & Roy, 2011) was designed for Irish. Twenty-four typically developing, sequential bilingual children immersed in Irish in the educational setting performed better on the English SRep task than on the Irish SRep task. Different patterns were observed in language performance across sentence types, with performance on relative clauses being particularly poor in Irish. Similarly, differences were observed in error patterns, with the highest number of omission errors in Irish and the highest number of substitution errors in English.


Author(s): Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, ...

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.


2015 ◽ Vol 12 (4) ◽ pp. 374-391
Author(s): Alexandru-Robert Guduvan, Hélène Waeselynck, Virginie Wiels, Guy Durrieu, Yann Fusero, ...

Author(s): Krister Lindén

Language software applications encounter new words, e.g., acronyms, technical terminology, loan words, names or compounds of such words. To add new words to a lexicon, we need to indicate their base form and inflectional paradigm. In this article, we evaluate a combination of corpus-based and lexicon-based methods for assigning the base form and inflectional paradigm to new words in Finnish, Swedish and English finite-state transducer lexicons. The methods have been implemented with the open-source Helsinki Finite-State Technology (Lindén et al., 2009). As an entry generator often produces numerous suggestions, it is important that the best suggestions be among the first few; otherwise it may become more efficient to create the entries by hand. By combining the probabilities calculated from corpus data and from lexical data, we get a more precise combined model. The combined method has 77-81% precision and 89-97% recall, i.e. the first correctly generated entry is on average found as the first or second candidate for the test languages. A further study demonstrated that a native speaker could revise suggestions from the entry generator at a speed of 300-400 entries per hour.
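
The idea of ranking generated entries by a combined score can be illustrated with a minimal Python sketch. The abstract does not say how the corpus-based and lexicon-based probabilities are combined, so the product rule used here is an assumption, as are the function names `rank_candidates` and `first_correct_rank`, the `floor` value for missing estimates, and the toy candidate labels and probabilities. None of this reflects the Helsinki Finite-State Technology API.

```python
from typing import Dict, List, Optional, Tuple


def rank_candidates(
    corpus_prob: Dict[str, float],
    lexicon_prob: Dict[str, float],
    floor: float = 1e-6,
) -> List[Tuple[str, float]]:
    """Rank candidate entries by the product of two probability estimates.

    Assumption: the two estimates are multiplied; a candidate missing from
    one source receives the small `floor` value instead of zero.
    """
    candidates = set(corpus_prob) | set(lexicon_prob)
    scored = [
        (c, corpus_prob.get(c, floor) * lexicon_prob.get(c, floor))
        for c in candidates
    ]
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


def first_correct_rank(
    ranked: List[Tuple[str, float]], correct: str
) -> Optional[int]:
    """Return the 1-based position of the correct entry, or None if absent."""
    for position, (candidate, _) in enumerate(ranked, start=1):
        if candidate == correct:
            return position
    return None


# Hypothetical candidate entries for one new word, with made-up probabilities.
corpus = {"entry_a": 0.5, "entry_b": 0.3, "entry_c": 0.2}
lexicon = {"entry_a": 0.2, "entry_b": 0.7, "entry_c": 0.1}

ranked = rank_candidates(corpus, lexicon)
for candidate, score in ranked:
    print(f"{candidate}\t{score:.3f}")
print("rank of entry_a:", first_correct_rank(ranked, "entry_a"))
```

Averaging `first_correct_rank` over a set of test words would give a figure comparable in kind to the "first or second candidate" result quoted above, although the article's own evaluation procedure may differ.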

