Measuring grammatical status in Chinese through quantitative corpus analysis

Corpora ◽  
2020 ◽  
Vol 15 (3) ◽  
pp. 317-342
Author(s):  
Linlin Sun ◽  
David Correia Saavedra

This paper applies a quantitative model developed for measuring grammatical status, using data from the Lancaster Corpus of Mandarin Chinese (LCMC). The model takes into account four quantitative factors (token frequency, collocate diversity, colligate diversity and deviation of proportions) and uses them as predictors in a binary logistic regression in order to compute a score of grammatical status between ‘0’ (lexical/non-grammatical) and ‘1’ (highly grammatical) for each given element. The results of the LCMC model are then compared to those of a similar study of the British National Corpus (BNC). The comparison suggests that token frequency emerges as one of the most relevant parameters for quantifying degrees of grammatical status in both language models, together with the collocate diversity measure when using a broad window span. On the other hand, the colligational measures (left- or right-based) and the other collocate diversity measures using small spans (left- or right-based) contribute very differently to the two languages due to their typologically distinctive structures.
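The scoring step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "deviation of proportions" refers to the dispersion measure usually defined as half the summed absolute difference between observed and expected proportions across corpus parts, and the feature names, weights and intercept below are hypothetical placeholders rather than fitted values from the paper.

```python
import math


def deviation_of_proportions(part_freqs, part_sizes):
    """Dispersion of an item across corpus parts (assumed definition).

    0.0 means the item is spread exactly in proportion to part sizes;
    values approaching 1.0 mean it is concentrated in few parts.
    """
    total_freq = sum(part_freqs)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(part_freqs, part_sizes)
    )


def grammatical_score(features, weights, intercept):
    """Logistic-regression score in (0, 1) from named predictors.

    `weights` and `intercept` are hypothetical; in a real study they
    would be estimated from items labelled lexical vs. grammatical.
    """
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
EXAMPLE_WEIGHTS = {
    "log_token_frequency": 1.2,
    "collocate_diversity": 0.8,
    "colligate_diversity": -0.3,
    "deviation_of_proportions": -1.5,
}
```

A score close to 1 would then be read as "highly grammatical" and a score close to 0 as "lexical", matching the scale given in the abstract; which predictors carry the most weight is exactly what the paper reports as differing between the two corpora.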

2018 ◽  
Vol 10 (1) ◽  
pp. 1-37 ◽  
Author(s):  
Olli O. Silvennoinen

This paper discusses constructional variation in the domain of contrastive negation in English, using data from the British National Corpus. Contrastive negation refers to constructs with two parts, one negative and the other affirmative, such that the affirmative offers an alternative to the negative in the frame in question (e.g. shaken, not stirred; not once but twice; I don’t like it – I love it). The paper utilises multiple correspondence analysis to explore the degree of synonymy among the various constructional schemas of contrastive negation, finding that different schemas are associated with different semantic, pragmatic and extralinguistic contexts but also that certain schemas do not differ from each other in a significant way.


2021 ◽  
Vol 3 (1) ◽  
pp. 9-21
Author(s):  
Namkil Kang

The ultimate goal of this paper is to provide a comparative analysis of rely on and depend on in the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC). The COCA shows that rely on government is the expression most preferred by Americans, followed by rely on people and rely on data. The COCA further indicates that depend on slate is the most preferred by Americans, followed by depend on government and depend on people. The BNC shows, on the other hand, that rely on others is the expression most preferred by the British, followed by rely on people and rely on friends. The BNC further indicates that depend on factors and depend on others are the most preferred by the British, followed by depend on age and depend on food. Finally, in the COCA, the nouns government, luck, welfare, people, information, state, fossil, water, family, oil, food, and things are linked to both rely on and depend on, but many nouns are still not linked to both. In the BNC, by contrast, only the nouns state, chance, government, and others are linked to both rely on and depend on, and many nouns are still not linked to both. It can thus be inferred that rely on differs slightly from depend on in its use.


Author(s):  
Monserrat Martínez Vázquez

In this paper I present an empirical approach to the analysis of the way English speakers conceptualize the communicative process. Most linguistic expressions about language in English are surface manifestations of what Reddy termed the "conduit metaphor". Reddy's model implies several interrelated cognitive associations: words are conceived as containers in which speakers introduce their ideas and send them to listeners, who will take these ideas out of these containers. Central to this model is the metaphor words are containers. It has also been claimed that there are other ways of perspectivizing the language process apart from the notion of containment (Vanparys 1995). In fact, Reddy himself notes that approximately 30% of metalanguage is not based on the conduit metaphor. The pervasiveness of the container metaphor would reasonably be most directly tested in expressions with the lexeme word. In order to measure what falls inside and outside these containers I carry out a corpus analysis of the lexeme word excerpted from the British National Corpus (BNC). The systematic evidence obtained from a large but delimited corpus gives us more reliable information about the frequency and use of this metaphor than an intuition-based analysis or an arbitrary search in multi-source corpora.


Author(s):  
Hang Su ◽  
Susan Hunston

This study takes a lexical-grammatical approach to exploring the evaluation of human behaviour and/or character. It uses adjective complementation patterns as the starting point to examine the lexical-grammatical resources at risk in the appraisal system of judgement, aiming to explore the extent to which we can arrive at the same categorization of the resources realizing judgement if a formal or lexical-grammatical approach, rather than a discourse-semantic one, is taken. Using a corpus compiled of texts categorized as ‘Biography’ in the British National Corpus, the study, on the one hand, shows that most of the items identified can be very satisfactorily classified in terms posited in the judgement system, suggesting that the nomenclature from that model is useful. On the other hand, a considerable number of items have also been identified which construe attitudes towards emotional types of personality traits, leading to the proposal of a potentially useful new judgement category and further an adjusted system of judgement. The heuristic potential of aligning the lexical-grammatical and discourse-semantic approaches to appraisal is further discussed.


2001 ◽  
Vol 8 (2) ◽  
pp. 251-282 ◽  
Author(s):  
Angela Downing

A broad view of evidentiality is adopted, based on Chafe (1986) and Haviland (1987), which goes beyond the grammatical marking of the speaker’s or writer’s perceived sources of knowledge and reliability of these sources to encode not only what the speaker knows and how s/he knows it, but also what can be taken to be an addressee’s state of knowledge. According to this view, evidentials are contemplated as interactive devices or resources for redefining common ground between interlocutors. They go beyond referential content to signal such meanings as confrontation and contradictory assumptions. They are necessarily situated in social contexts and have an indexical function. They may also overlap with epistemic stances and with affect, ranging in the case of surely from surprise, disbelief, doubt and disapproval to persuasion and an invitation to share beliefs or to agree on future courses of action. Using data from the British National Corpus, I analyse a sample of concordances of surely with subject personal pronouns, with the aim of providing a preliminary characterisation of the range of interpersonal attitudes expressed by surely and the determining factors which trigger these apparently contradictory stances.


2019 ◽  
Vol 1 (2) ◽  
pp. 34
Author(s):  
Entusiastik -

This paper analysed the use of corpus and spoken language features in the English Language Teaching (ELT) coursebook “Touchstone”. The corpus analysis was carried out using the British National Corpus (BNC), which was chosen for its easy and free access. In doing the spoken language analysis, I refer to McCarthy and Carter’s (2015, p.5) argument, which takes the grammar of conversation as ‘the benchmark for a grammar of speaking’ by considering features such as ellipsis, heads and tails, lexical bundles, and vagueness. The analysis indicated that the language used in this coursebook signified a certain level of authentic and natural language, although areas of improvement were also found.


2020 ◽  
Vol 8 (4) ◽  
pp. p48
Author(s):  
Namkil Kang

The main goal of this paper is to provide a detailed frequency analysis of the five types it is imperative that, it is vital that, it is essential that, it is important that, and it is necessary that within the British National Corpus (100 million, British, 1980s-1993), the Corpus of Contemporary American English (1.0 billion, US, 1990-2019), the Corpus of Historical American English (400 million, US, 1810s-2000s), and the Hansard Corpus (1.6 billion, British Parliament). In this paper, we have examined the frequency of the five types and collected the data. A major point to note is that it is important that was the most preferred by British people, followed by it is essential that, it is vital that, it is imperative that, and it is necessary that, in that order. The BNC clearly shows, on the other hand, that it is important that was the most commonly used one in the spoken genre, magazine genre, newspaper genre, and academic genre. A further point to note is that it is important that was the most preferred by Americans, followed by it is imperative that, it is essential that, it is vital that, and it is necessary that, in that order. The COCA clearly indicates that it is important that was the most widely used one in the blog genre, web genre, spoken genre, fiction genre, magazine genre, newspaper genre, and academic genre. The reason why it is important that was the most preferred by Americans and British people in the academic genre may be that a moderate obligation is suitable for conveying factual information. With respect to the COHA, it is worth noting that it is necessary that was the most preferred by Americans from 1810 to 2000, followed by it is important that, it is essential that, it is imperative that, and it is vital that. As for the HC, it is important that was the most preferred by British politicians, followed by it is essential that, it is vital that, it is necessary that, and it is imperative that. 
It is worth noting that Americans and British politicians show a similar pattern in the ranking of the five types, in that Americans did not prefer a strong statement or the strongest statement, whereas British politicians did not prefer the strongest statement.


Author(s):  
Dr. Hamad Abdullah H Aldawsari

Many people use pause fillers such as um, erm, and er in order to signal to the other person that they have not finished speaking yet. This paper aims to investigate pause fillers and their relationship with the two sociolinguistic variables of age and gender. The data-driven analysis is based on the British National Corpus (BNC). The results show that the sociolinguistic variables of age and gender influence the use of pause fillers among British English speakers, which is proposed to be linked to the advancement of age and an improved fluency among female speakers.


2019 ◽  
Vol 24 (2) ◽  
pp. 333-363 ◽  
Author(s):  
MARIOS ANDREOU ◽  
ROCHELLE LIEBER

In this article we explore the range of aspectual and quantificational readings that are available to two kinds of deverbal nominalizations in English, conversion nouns and -ing nominals. Using data gathered from the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC), we examine the range of readings available for the conversion and -ing forms of 106 English verbs in context. We distinguish eventive versus referential readings, looking at instances of both count and mass quantification for the two kinds of nominalizations. Within the eventive readings we also distinguish bounded versus unbounded aspectual readings, and within bounded readings two types that we call ‘completive’ and ‘package’. We argue that the quantificational properties and aspectual interpretation of both conversion and -ing nominalizations are not rigidly or even loosely determined by the form of the nominalization, but that the lexical aspect of the base verb (state, activity, accomplishment, achievement, semelfactive) plays some role in circumscribing aspectual readings. We argue that the strongest role in determining quantificational and aspectual readings is played by factors arising from the context in which conversion forms and -ing nominalizations are deployed. The aspectual interpretation of conversion and -ing nominalizations can be influenced by the presence of temporal and quantificational modifiers, by surrounding tenses, as well as by encyclopedic knowledge. We conclude with a consideration of the theoretical implications of our findings.


2011 ◽  
Vol 15 (2) ◽  
pp. 363-385 ◽  
Author(s):  
JOHN PAYNE

English genitive noun-phrase coordinations follow two patterns. The first is the single genitive, in which exponence of the genitive case occurs solely on the final coordinate, e.g. Mary and Jane's; and the second is the multiple genitive, in which exponence of the genitive case occurs on all coordinates, e.g. Mary's and Jane's. When either of the coordinates is a personal pronoun, difficult choices have to be made about the form of the pronoun. These difficulties arise especially with the single genitive, which is judged to be totally ungrammatical in coordinations like *my wife and I's or *my wife and my. On the other hand, the alternative use of the multiple genitive, my wife's and my, conflicts with a preference for the single genitive when the coordinates are felt to constitute a single unit. In this article, we first conduct a corpus-based analysis for genitive coordinations with personal pronouns, based on the British National Corpus. This, supplemented by some non-standard examples from web-based sources, gives some insight into the choices actually made by native speakers. We then provide a theoretical account of the syntactic problems that genitive coordinations with pronouns create. This account is shown to be compatible solely with an analysis of the English ’s genitive as an inflectional affix.

