Domains, text types, aspect marking and English-Chinese translation

1999 ◽  
Vol 2 (2) ◽  
pp. 211-229 ◽  
Author(s):  
Tony McEnery ◽  
Richard Xiao

This paper uses an English-Chinese parallel corpus, an L1 Chinese comparable corpus, and an L1 Chinese reference corpus to examine how aspectual meanings in English are translated into Chinese and to explore the effects of domains, text types and translation on aspect marking. We will show that while English and Chinese both mark aspect grammatically, the aspect systems of the two languages differ considerably. Even though Chinese, as an aspect language, is rich in aspect markers, covert marking (LVM) is a frequent and important strategy in Chinese discourse. The distribution of aspect markers varies significantly across domains and text types. The study also sheds new light on the translation effect by contrasting aspect marking in translated Chinese texts and L1 Chinese texts.

2011 ◽  
Vol 56 (2) ◽  
pp. 374-390 ◽  
Author(s):  
Lieve Macken ◽  
Orphée De Clercq ◽  
Hans Paulussen

This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright. We discuss the importance of parallel corpora in various research domains and contrast the Dutch Parallel Corpus with existing parallel corpora. The Dutch Parallel Corpus distinguishes itself from other parallel corpora by having a balanced composition and by its availability to the wide research community, thanks to its copyright clearance. All texts in the corpus are sentence-aligned and further enriched with basic linguistic annotations (lemmas and word class information). Approximately 25,000 words of the Dutch-English part have been manually aligned at the sub-sentential level. Rich metadata facilitates the navigability of the corpus and enables users to select the texts that satisfy their needs. The entire corpus is released as full texts in XML format and is also available via a web interface, which supports basic and complex search queries and presents the results as parallel concordances. The corpus will be distributed by the Flemish-Dutch Human Language Technology Agency (TST-Centrale).


Author(s):  
Ming-yueh Shen

This study aimed to determine whether text type and strategy use affect EFL learners’ lexical inferencing performance. The participants were 87 first-year English majors at a technical university. Data were collected from (1) a lexical inferencing test with excerpts of narrative and expository texts, for which multiple-choice and definition tasks were designed, respectively, and (2) the learners’ self-reported strategy use. The quantitative analyses showed that text type significantly affected the learners’ lexical inferencing performance: they performed better on the narrative excerpt than on the expository texts. However, no significant coefficients between strategy use and lexical inferencing performance were found in this study. The results further suggest that text structure and lexical inferencing strategies should be explicitly taught to EFL learners.


Target ◽  
2020 ◽  
Vol 32 (3) ◽  
pp. 420-455
Author(s):  
Shuangzi Pang ◽  
Kefei Wang

This article investigates the role of translations from English in language change in Chinese. It employs a new corpus, the Chinese Diachronic Composite Corpus (CDCC), which incorporates a parallel corpus and comparable corpus in three sampling periods in the twentieth century, and a reference corpus as a starting point in the timeframe. We examine whether explicitness in English–Chinese translations has exerted an impact on the target language, focusing on adversative conjunctions as a measure of explicitness. The results of the study demonstrate that: (1) translated Chinese texts have changed in step with original Chinese texts in the frequency of adversative conjunctions; (2) translated Chinese texts and original Chinese texts are interrelated throughout the three periods, but the correlation between them has changed perceptibly over the three sample points; and (3) source language interference found in translated Chinese texts increases over the three periods.


2013 ◽  
Vol 421 ◽  
pp. 725-730
Author(s):  
Song Bin Bao

The English used in the field of manufacturing systems belongs to ESP (English for Specific Purposes). To improve the effectiveness of ESP education in China, an English-Chinese parallel corpus is needed to aid ESP teaching and learning. This paper presents a novel method for creating a small-scale English-Chinese parallel corpus by means of a TMS (translation memory system). First, suitable English and Chinese texts are collected from online sources, publications and human translation; second, the English and Chinese texts are aligned and formatted using the relevant TMS functions; then the Chinese texts are split into words using ICWSS (Intelligent Chinese Word Segmentation System); finally, the English-Chinese corpus is stored in a cloud database. The resulting small-scale parallel corpus can be searched through ParaConc and meets the basic needs of ESP teaching and learning. Since the method requires neither a new algorithm nor a new software system, constructing the corpus is much easier and more flexible than building a general large-scale corpus.


Target ◽  
2012 ◽  
Vol 24 (2) ◽  
pp. 203-224 ◽  
Author(s):  
Isabelle Delaere ◽  
Gert De Sutter ◽  
Koen Plevoets

With this article, we seek to support the law of growing standardization by showing that texts translated into Belgian Dutch make more use of standard language than non-translated Belgian Dutch texts. Additionally, we want to examine whether the use of standard vs. non-standard language can be attributed to the variables text type and source language. In order to achieve that goal, we gathered a diverse set of linguistic variables and used a 10-million-word corpus that is parallel, comparable and bidirectional (the Dutch Parallel Corpus; Macken et al. 2011). The frequency counts for each of the variables are used to determine the differences in standard language use by means of profile-based correspondence analysis (Plevoets 2008). The results of our analysis show that (i) in general, there is indeed a standardizing trend among translations and (ii) text types with a lot of editorial control (fiction, non-fiction and journalistic texts) contain more standard language than the less edited text types (administrative texts and external communication), which adds support for the idea that the differences between translated and non-translated texts are text type dependent.


2017 ◽  
Vol 141 ◽  
pp. 235-244
Author(s):  
Juri Kijko

Fractality in German and Ukrainian news text types

The present paper offers a contrastive analysis, from a fractal perspective, of the structural principles of German and Ukrainian news text types, based on material from comparable quality daily newspapers. Depending on text size, two- or three-fractal structures can be distinguished in the text types under study. Short news items (Meldungen) have α- and ω-fractals; news articles and reports additionally have a φ-fractal. Furthermore, these three text types stand in a fractal relation to one another. It may therefore be assumed that self-similarity is a universal structural principle in news text types. Such a structure is conditioned above all by extralinguistic factors, among which shortage of time and space plays a decisive role.


Author(s):  
Finn Frandsen

The present paper gives a critical introduction to the theory of text types or text sequences elaborated by the French text linguist Jean-Michel Adam. The first part of the paper presents the overall theoretical framework for Adam’s research within stylistics and text linguistics. The second part of the paper gives a more detailed discussion of Adam’s answers to what may be defined as the four most crucial questions within text type research, that is: a) the number of text types which can be identified (the classification problem), b) the relation between text types within individual texts, c) the relation between text types and linguistic features and d) the relation between text types and their communicative function (the interaction between form and function).
The objective of text linguistics is simple: to pursue linguistic analysis beyond the complex sentence and mere pairs of sentences and, difficult as it may seem, to accept a position at the borders of the linguistic in order to account for the heterogeneity of every textual composition.


Mäetagused ◽  
2021 ◽  
Vol 81 ◽  
pp. 161-176
Author(s):  
Külli Prillop ◽  
Tiit Hennoste ◽  
Külli Habicht ◽  
Helle Metslang ◽  
...  

Within the project “Pragmatics above grammar: Subjectivity and intersubjectivity in Estonian registers and text types” (PRG341) we are studying the expression of subjectivity and intersubjectivity in different written and spoken registers of modern Estonian. We focus on adverbs that function as discourse markers (e.g. vist ‘maybe, probably’, ilmselt ‘apparently, obviously’, tegelikult ‘actually’), markers that develop from main clauses containing cognition verbs that take sentence complements (e.g. (ma) arvan ‘I think’, usun ‘I believe’, (mulle) tundub ‘it seems (to me), it appears (that)’) as well as modal and performative verbs (e.g. võib (juhtuda) ‘can (happen)’, peaks (tulema) ‘should (come)’; kinnitan/väidan (olevat) ‘I affirm/claim’). The analysis combines quantitative corpus-linguistic and qualitative pragmatic approaches, thus belonging to the field of corpus pragmatics. Unlike previous studies of related topics, the project systematically compares the usage of markers in different registers (spoken, online communication, print texts) and text types. The pilot studies performed thus far have revealed several problems with the existing Estonian corpora that are important for the study of pragmatics. Firstly, some text types are underrepresented or not represented at all, the text types cannot always be distinguished, and a particular text may not always correspond to its nominal text type (e.g. an academic text may contain quotes from texts of other types). All of this makes comparative statistical analysis of different text types difficult. Secondly, the markers under examination are multifunctional, and identifying their (inter)subjective function requires consideration of context broader than a single sentence. However, the public search systems for the existing corpora do not provide this context.
For instance, the discourse marker function of cognition verbs is indicated primarily by the fact that the topic of the conversation or text follows through the subordinate clause, not the main clause. Since the available search systems do not provide context larger than a single sentence, the identification of the topic of the discourse, and therefore of the potential discourse-marker function of the verb, is made more difficult. To avoid these problems, the project working group is developing a new “Pragmatics” corpus, being created in the SketchEngine environment. The corpus is made up of 10 subcorpora representing different text types and registers. Each subcorpus contains roughly 500,000 words.


Author(s):  
Kirill I. Semenov ◽  

This article considers phonetic and graphic transformations of Russian loanwords in Chinese. The study comprises an analysis of both proper and common nouns, as well as both loanwords included in dictionaries and those used on the Internet. The data considered make it possible to detect the main trends in the adaptation of Russian consonants in Chinese, as well as to localize the hypothetical influence of the Russian-Chinese pidgin on current loanword adaptation in Mandarin Chinese. It is noted that there is a dramatic discrepancy between the norms of transliteration prescribed by the PRC media and the usage on the Internet. Furthermore, a significant level of specificity of the hieroglyphic N-grams in the loanwords is revealed, compared to the reference corpus of the Chinese texts. The author expects that the results of the work will be useful for specialists both in phonetic typology and in NLP.


Author(s):  
Philip M. McCarthy ◽  
Shinobu Watanabe ◽  
Travis A. Lamkin

Natural language processing tools, such as Coh-Metrix (see Chapter 11, this volume) and LIWC (see Chapter 12, this volume), have been tremendously successful in offering insight into quantifiable differences between text types. Such quantitative assessments have certainly been highly informative in terms of evaluating theoretical linguistic and psychological categories that distinguish text types (e.g., referential overlap, lexical diversity, positive emotion words, and so forth). Although these identifications are extremely important in revealing ability deficiencies, knowledge gaps, comprehension failures, and underlying psychological phenomena, such assessments can be difficult to interpret because they do not explicitly inform readers and researchers as to which specific linguistic features are driving the text type identification (i.e., the words and word clusters of the text). For example, a tool such as Coh-Metrix informs us that expository texts are more cohesive than narrative texts in terms of sentential referential overlap (McNamara, Louwerse, & Graesser, in press; McCarthy, 2010), but it does not tell us which words (or word clusters) are driving that cohesion. That is, we do not learn which actual words tend to be indicative of the text type differences. These actual words may tend to cluster around certain psychological, cultural, or generic differences, and, as a result, researchers and materials designers who might wish to create or modify text, so as to better meet the needs of readers, are left somewhat in the dark as to which specific language to use. What is needed is a textual analysis tool that offers qualitative output (in addition to quantitative output) that researchers and materials designers might use as a guide to the lexical characteristics of the texts under analysis. The Gramulator is such a tool.

