The Language Demographics of Amazon Mechanical Turk

Author(s):  
Ellie Pavlick ◽  
Matt Post ◽  
Ann Irvine ◽  
Dmitry Kachaev ◽  
Chris Callison-Burch

We present a large-scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk). We establish a methodology for determining the language skills of anonymous crowd workers that is more robust than simple surveying. We validate workers’ self-reported language skill claims by measuring their ability to correctly translate words, and by geolocating workers to see if they reside in countries where the languages are likely to be spoken. Rather than posting a one-off survey, we posted paid tasks consisting of 1,000 assignments to translate a total of 10,000 words in each of 100 languages. Our study ran for several months, and was highly visible on the MTurk crowdsourcing platform, increasing the chances that bilingual workers would complete it. Our study was useful both to create bilingual dictionaries and to act as a census of the bilingual speakers on MTurk. We use this data to recommend languages with the largest speaker populations as good candidates for other researchers who want to develop crowdsourced, multilingual technologies. To further demonstrate the value of creating data via crowdsourcing, we hire workers to create bilingual parallel corpora in six Indian languages, and use them to train statistical machine translation systems.
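The word-translation validation step described above can be sketched as follows; the gold dictionary, the worker's answers, and the 0.7 acceptance threshold are hypothetical illustrations, not the paper's actual data or cutoff.

```python
# Sketch: validate a worker's self-reported language skill by checking their
# word translations against a small gold-standard bilingual dictionary.
# All data here is hypothetical.
GOLD = {"perro": {"dog"}, "gato": {"cat"}, "casa": {"house", "home"}}

def accuracy(translations):
    """Fraction of a worker's translations that match the gold dictionary."""
    scored = [(src, tgt) for src, tgt in translations.items() if src in GOLD]
    if not scored:
        return 0.0
    correct = sum(tgt.lower() in GOLD[src] for src, tgt in scored)
    return correct / len(scored)

def is_validated(translations, threshold=0.7):
    """Accept a worker whose translation accuracy clears the threshold."""
    return accuracy(translations) >= threshold

worker = {"perro": "dog", "gato": "cat", "casa": "hose"}  # 2 of 3 correct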

2021 ◽  
Vol 111 (2) ◽  
pp. 687-719
Author(s):  
Erik Snowberg ◽  
Leeat Yariv

We leverage a large-scale incentivized survey eliciting behaviors from (almost) an entire undergraduate university student population, a representative sample of the US population, and Amazon Mechanical Turk (MTurk) to address concerns about the external validity of experiments with student participants. Behavior in the student population offers bounds on behaviors in other populations, and correlations between behaviors are similar across samples. Furthermore, non-student samples exhibit higher levels of noise. Adding historical lab participation data, we find a small set of attributes over which lab participants differ from non-lab participants. An additional set of lab experiments shows no evidence of observer effects. (JEL C83, D90, D91)


2019 ◽  
Author(s):  
Antonio Alonso Arechar ◽  
David Gertler Rand

We investigate whether experience playing the Dictator Game (DG) affects prosociality by aggregating data from 37 experiments run on Amazon Mechanical Turk over a six-year period. While prior evidence has shown a correlation between experience on Amazon Mechanical Turk and selfishness, it is unclear to what extent this is the result of selection versus learning. Examining a total of 27,266 decisions made by 17,791 unique individuals, our data shows evidence of significant negative effects of both selection and learning. First, people who participated in a greater total number of our experiments were more selfish, even in their first game – indicating that people who are more likely to select into our experiments are more selfish. Second, a given individual tends to transfer less money over successive experiments – indicating that experience with the DG leads to greater selfishness. These results provide clear evidence of learning even in this non-strategic social setting.
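The selection-versus-learning decomposition can be illustrated with a toy computation; the records below are fabricated for illustration and merely stand in for the study's 27,266 real decisions.

```python
# Illustrative (fabricated) records: (worker_id, game_number, amount_given).
# Selection: do workers who end up playing more games give less in their
# FIRST game? Learning: does a given worker give less over successive games?
from collections import defaultdict

records = [
    ("w1", 1, 5.0), ("w1", 2, 4.0), ("w1", 3, 3.0),
    ("w2", 1, 3.5), ("w2", 2, 3.0),
    ("w3", 1, 6.0),
]

by_worker = defaultdict(list)
for wid, game, amount in records:
    by_worker[wid].append((game, amount))

# Selection signal: first-game gift vs. total number of games played.
first_gifts = {wid: dict(games)[1] for wid, games in by_worker.items()}
total_games = {wid: len(games) for wid, games in by_worker.items()}

# Learning signal: mean within-worker change per successive game.
deltas = []
for games in by_worker.values():
    games = sorted(games)
    deltas += [b2 - b1 for (_, b1), (_, b2) in zip(games, games[1:])]
mean_learning = sum(deltas) / len(deltas) if deltas else 0.0
```

In this toy data the frequent player gives less even in game one (selection) and gifts fall across games (learning), mirroring the two effects the study separates.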


2013 ◽  
Vol 39 (4) ◽  
pp. 999-1023 ◽  
Author(s):  
Gennadi Lembersky ◽  
Noam Ordan ◽  
Shuly Wintner

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
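The first adaptation technique, a mixture model interpolating two phrase tables with weights chosen by minimizing perplexity, can be sketched as follows; the phrase probabilities and the held-out phrase pairs are hypothetical.

```python
# Sketch: mix "right"- and "wrong"-direction phrase tables as
#   p(e|f) = w * p_right(e|f) + (1 - w) * p_wrong(e|f),
# choosing w on a grid to minimise perplexity on a dev set of phrase pairs.
# All probabilities below are hypothetical.
import math

p_right = {("chien", "dog"): 0.8, ("chat", "cat"): 0.6}
p_wrong = {("chien", "dog"): 0.5, ("chat", "cat"): 0.9}

dev = [("chien", "dog"), ("chat", "cat")]  # held-out phrase pairs

def perplexity(w):
    log_sum = 0.0
    for f, e in dev:
        p = w * p_right.get((f, e), 1e-9) + (1 - w) * p_wrong.get((f, e), 1e-9)
        log_sum += math.log(p)
    return math.exp(-log_sum / len(dev))

# Grid search over interpolation weights in [0, 1].
best_w = min((i / 100 for i in range(101)), key=perplexity)
```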


Machine translation systems are still far from perfect; the concept of interactive machine translation (IMT) was introduced to improve their performance. This paper proposes an IMT system that uses statistical machine translation and a bilingual corpus, over which several evaluation metrics (word error rate, position-independent error rate, translation error rate, n-gram matching) are computed, to translate text from English to Indian languages. Experiments show that the proposed system improves both the speed and productivity of human translators.
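Two of the measures mentioned above, word error rate (WER) and position-independent error rate (PER), follow standard definitions and can be sketched directly:

```python
# WER: word-level Levenshtein distance divided by reference length.
# PER: like WER but ignoring word order (bag-of-words matching).
from collections import Counter

def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

def per(reference, hypothesis):
    r, h = Counter(reference.split()), Counter(hypothesis.split())
    matches = sum((r & h).values())  # multiset intersection
    n = sum(r.values())
    return 1 - (matches - max(0, sum(h.values()) - n)) / n
```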


2013 ◽  
Vol 99 (1) ◽  
pp. 17-38
Author(s):  
Matthias Huck ◽  
Erik Scharwächter ◽  
Hermann Ney

Abstract Standard phrase-based statistical machine translation systems generate translations based on an inventory of continuous bilingual phrases. In this work, we extend a phrase-based decoder with the ability to make use of phrases that are discontinuous in the source part. Our dynamic programming beam search algorithm supports separate pruning of coverage hypotheses per cardinality and of lexical hypotheses per coverage, as well as coverage constraints that impose restrictions on the possible reorderings. In addition to investigating these aspects, which are related to the decoding procedure, we also concentrate our attention on the question of how to obtain source-side discontinuous phrases from parallel training data. Two approaches (hierarchical and discontinuous extraction) are presented and compared. On a large-scale Chinese-to-English translation task, we conduct a thorough empirical evaluation in order to study a number of system configurations with source-side discontinuous phrases, and to compare them to setups which employ continuous phrases only.


Author(s):  
Ignatius Ikechukwu Ayogu ◽  
Adebayo Olusola Adetunmbi ◽  
Bolanle Adefowoke Ojokoh

The global demand for translation and translation tools currently surpasses the capacity of available solutions. Besides, there is no one-size-fits-all, off-the-shelf solution for all languages. Thus, the need and urgency to increase the scale of research for the development of translation tools and devices continue to grow, especially for languages suffering under the pressure of globalisation. This paper discusses our experiments on translation systems between English and two Nigerian languages: Igbo and Yorùbá. The study is set up to build parallel corpora, and to train and evaluate English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá phrase-based statistical machine translation systems. The systems were trained on parallel corpora that were created for each language pair, using text from the religious domain, in the course of this research. BLEU scores of 30.04, 29.01 and 18.72 were recorded for the English-to-Igbo, English-to-Yorùbá and Igbo-to-Yorùbá MT systems, respectively. An error analysis of the systems’ outputs was conducted using a linguistically motivated MT error analysis approach; it showed that errors occurred mostly at the lexical, grammatical and semantic levels. While the study reveals the potential of our corpora, it also shows that the size of the corpora remains an issue requiring further attention. Thus, an important target in the immediate future is to increase the quantity and quality of the data.
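The BLEU scores reported above follow the standard definition: a brevity penalty times the geometric mean of modified n-gram precisions. A minimal single-reference, sentence-level sketch (using a small floor on zero precisions rather than proper smoothing):

```python
# Minimal sentence-level BLEU sketch: geometric mean of 1..4-gram modified
# precisions, scaled by the brevity penalty. Single reference only.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=4):
    ref, hyp = reference.split(), hypothesis.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ng & ref_ng).values())  # clipped n-gram matches
        total = max(sum(hyp_ng.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp))) if hyp else 0.0
    return bp * math.exp(log_prec / max_n)
```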


2015 ◽  
Vol 103 (1) ◽  
pp. 65-84 ◽  
Author(s):  
Abdullah Alrajeh ◽  
Mahesan Niranjan

Abstract In state-of-the-art phrase-based statistical machine translation systems, modelling phrase reordering is important for enhancing the naturalness of the translated outputs, particularly when the grammatical structures of the language pairs differ significantly. Posing phrase movements as a classification problem, we exploit recent developments in solving large-scale multiclass support vector machines. Using dual coordinate descent methods for learning, we provide a mechanism to shrink the amount of training data required for each iteration, producing significant computational savings while preserving the accuracy of the models. Our approach is a couple of times faster than the maximum-entropy approach and more memory-efficient (50% reduction). Experiments were carried out on an Arabic-English corpus with more than a quarter of a billion words. We achieve BLEU score improvements on top of a strong baseline system with sparse reordering features.
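The paper trains large-scale multiclass SVMs with dual coordinate descent; as a lightweight stand-in, the sketch below frames phrase reordering as three-way orientation classification with a simple multiclass perceptron. The feature indices and training examples are hypothetical.

```python
# Stand-in sketch: phrase reordering as 3-way classification over orientation
# classes, trained with a multiclass perceptron (NOT the paper's SVM solver).
ORIENTATIONS = ["monotone", "swap", "discontinuous"]

def train(examples, n_features, epochs=5):
    """examples: list of (sparse_feature_id_list, gold_orientation_index)."""
    w = [[0.0] * n_features for _ in ORIENTATIONS]
    for _ in range(epochs):
        for feats, gold in examples:
            scores = [sum(w[k][f] for f in feats) for k in range(len(ORIENTATIONS))]
            pred = max(range(len(scores)), key=scores.__getitem__)
            if pred != gold:  # perceptron update on mistakes only
                for f in feats:
                    w[gold][f] += 1.0
                    w[pred][f] -= 1.0
    return w

def predict(w, feats):
    scores = [sum(wk[f] for f in feats) for wk in w]
    return ORIENTATIONS[max(range(len(scores)), key=scores.__getitem__)]

# Hypothetical sparse examples: feature ids -> orientation class index.
examples = [([0, 1], 0), ([2, 3], 1), ([1, 4], 2)]
w = train(examples, n_features=5)
```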


2015 ◽  
Vol 23 (1) ◽  
pp. 3-30 ◽  
Author(s):  
YVETTE GRAHAM ◽  
TIMOTHY BALDWIN ◽  
ALISTAIR MOFFAT ◽  
JUSTIN ZOBEL

Abstract Crowd-sourced assessments of machine translation quality allow evaluations to be carried out cheaply and on a large scale. It is essential, however, that the crowd's work be filtered to avoid contamination of results through the inclusion of false assessments. One method is to filter via agreement with experts, but even amongst experts agreement levels may not be high. In this paper, we present a new methodology for crowd-sourcing human assessments of translation quality, which allows individual workers to develop their own individual assessment strategy. Agreement with experts is no longer required, and a worker is deemed reliable if they are consistent relative to their own previous work. Individual translations are assessed in isolation from all others in the form of direct estimates of translation quality. This allows more meaningful statistics to be computed for systems and enables significance to be determined on smaller sets of assessments. We demonstrate the methodology's feasibility in large-scale human evaluation through replication of the human evaluation component of the Workshop on Statistical Machine Translation (WMT) shared translation task for two language pairs, Spanish-to-English and English-to-Spanish. Results for measurement based solely on crowd-sourced assessments show system rankings in line with those of the original evaluation. Comparison of results produced by the relative preference approach and the direct estimate method described here demonstrates that the direct estimate method has a substantially increased ability to identify significant differences between translation systems.
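The self-consistency criterion can be sketched as follows: a worker repeats some assessments, and is deemed reliable if the repeat scores correlate with the originals rather than with expert judgments. The scores and the 0.8 cutoff below are hypothetical.

```python
# Sketch: filter crowd workers by self-consistency on repeated items.
# All scores and the threshold are hypothetical.
def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# A worker's direct quality estimates (0-100) on items seen twice.
first  = [80, 65, 90, 40, 70]
repeat = [78, 60, 88, 45, 72]

keep_worker = pearson(first, repeat) > 0.8  # consistent with own past work
```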


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Rui Wang

Relying on large-scale parallel corpora, neural machine translation has achieved great success for certain language pairs. However, the acquisition of high-quality parallel corpora is one of the main difficulties in machine translation research. To address this problem, this paper proposes an unsupervised domain-adaptive neural machine translation method that can be trained using only two unrelated monolingual corpora and still obtain good translation results. The method first measures the matching degree of translation rules by adding relevant subject information to the rules and dynamically calculating the similarity between each translation rule and the document to be translated during decoding. Second, through the joint training of multiple tasks, the source language can learn useful semantic and structural information from the monolingual corpus of a third language, not parallel to the current two languages, during translation into the target language. Experimental results show that this method obtains better results than traditional statistical machine translation.
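The rule-to-document matching step can be illustrated with a cosine similarity over topic vectors; the topic distributions below are hypothetical stand-ins for whatever subject representation the system attaches to its rules.

```python
# Sketch: score a translation rule against the current document by the
# cosine similarity of their topic distributions. Vectors are hypothetical.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

doc_topics  = [0.7, 0.2, 0.1]  # document being translated
rule_topics = [0.6, 0.3, 0.1]  # on-topic rule: scored high during decoding
other_rule  = [0.1, 0.1, 0.8]  # off-topic rule: scored low
```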


2012 ◽  
Vol 279 (1742) ◽  
pp. 3556-3564 ◽  
Author(s):  
Nichola J. Raihani ◽  
Redouan Bshary

People often consider how their behaviour will be viewed by others, and may cooperate to avoid gaining a bad reputation. Sensitivity to reputation may be elicited by subtle social cues of being watched: previous studies have shown that people behave more cooperatively when they see images of eyes rather than control images. Here, we tested whether eye images enhance cooperation in a dictator game, using the online labour market Amazon Mechanical Turk (AMT). In contrast to our predictions and the results of most previous studies, dictators gave away more money when they saw images of flowers rather than eye images. Donations in response to eye images were not significantly different to donations under control treatments. Dictator donations varied significantly across cultures but there was no systematic variation in responses to different image types across cultures. Unlike most previous studies, players interacting via AMT may feel truly anonymous when making decisions and, as such, may not respond to subtle social cues of being watched. Nevertheless, dictators gave away similar amounts as in previous studies, so anonymity did not erase helpfulness. We suggest that eye images might only promote cooperative behaviour in relatively public settings and that people may ignore these cues when they know their behaviour is truly anonymous.

