frequency word Latest Research Papers

Investigating vocabulary in academic spoken English: Corpora, teachers, and learners

10.26686/wgtn.17060051.v1 ◽

2021 ◽

Author(s):

Yen Dang

Keyword(s):

High Frequency ◽

Corpus Linguistics ◽

Word List ◽

Vocabulary Knowledge ◽

Frequency Word ◽

High Frequency Word ◽

Spoken English ◽

Foreign Language Learners ◽

L2 Learners ◽

High Frequency Words

Understanding academic spoken English is challenging for second language (L2) learners at English-medium universities. A lack of vocabulary is a major reason for this difficulty. To help these learners overcome this challenge, it is important to examine the nature of vocabulary in academic spoken English. This thesis presents three linked studies which were conducted to address this need. Study 1 examined the lexical coverage in nine spoken and nine written corpora of four well-known general high-frequency word lists: West’s (1953) General Service List (GSL), Nation’s (2006) BNC2000, Nation’s (2012) BNC/COCA2000, and Brezina and Gablasova’s (2015) New-GSL. Study 2 further compared the BNC/COCA2000 and the New-GSL, which had the highest coverage in Study 1. It involved 25 English first language (L1) teachers, 26 Vietnamese L1 teachers, 27 various L1 teachers, and 275 Vietnamese English as a Foreign Language learners. The teachers completed 10 surveys in which they rated the usefulness of 973 non-overlapping items between the BNC/COCA2000 and the New-GSL for their learners on a five-point Likert scale. The learners took the Vocabulary Levels Test (Nation, 1983, 1990; Schmitt, Schmitt, & Clapham, 2001), and 15 Yes/No tests which measured their knowledge of the 973 words. Study 3 involved compiling two academic spoken corpora, one academic written corpus, and one non-academic spoken corpus. Each contained approximately 13 million running words. The academic spoken corpora contained four equally-sized sub-corpora. From the first academic spoken corpus, 1,741 word families were selected for the Academic Spoken Word List (ASWL). The coverage of the ASWL and the BNC/COCA2000 in the four corpora and the potential coverage of the ASWL for learners of different vocabulary levels were determined. Six main findings were drawn from these studies. 
First, in the first academic spoken corpus, the ASWL and its levels had slightly higher coverage in certain disciplinary sub-corpora than in the others. Yet, the list provided around 90% coverage of each sub-corpus. It helps learners to achieve 92%-96% coverage of academic speech depending on their levels. Second, the BNC/COCA2000 is the most suitable general high-frequency word list for L2 learners from the perspectives of corpus linguistics, teachers, and learners. It provided higher coverage than the GSL and the BNC2000, and had more words known by learners and perceived as being useful by teachers than the New-GSL. Third, general high-frequency words, especially the most frequent 1,000 words, provided much higher coverage in spoken corpora than written corpora in both academic and non-academic discourse. Fourth, despite the importance of general high-frequency words, a reasonable proportion of the learners had insufficient knowledge of these words, which highlights the importance of a word list which is adaptable to learners’ proficiency like the ASWL. Fifth, lexical coverage had significant but small correlations with teacher perception of word usefulness and learner vocabulary knowledge. Sixth, the Vietnamese L1 teachers had the highest correlation between the teacher ratings of word usefulness and the learner vocabulary knowledge. Next came the various L1 teachers, and then the English L1 teachers. This thesis also provides theoretical, pedagogical, and methodological implications of these findings so that L2 learners can gain better support in their vocabulary development and achieve better comprehension of academic spoken English.
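
Lexical coverage, the measure used throughout the abstract above, is the share of running words (tokens) in a corpus that belong to a given word list. The following is a minimal sketch of that calculation, not the thesis's procedure: the function name `lexical_coverage`, the toy sentence, and the toy list are all assumptions made for illustration, and the sketch matches plain word forms, whereas the thesis counts word families.

```python
import re

def lexical_coverage(text, word_list):
    """Return the fraction of tokens in `text` that appear in `word_list`.

    Assumption: tokens are lowercase runs of letters and apostrophes, and
    matching is by exact word form (no lemmas or word families).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = {w.lower() for w in word_list}
    covered = sum(1 for t in tokens if t in known)
    return covered / len(tokens)

# Hypothetical toy data: 9 running words, 6 of them on the list.
sample = "the lecture covers the basic theory of the cell"
toy_list = ["the", "of", "basic", "lecture"]
coverage = lexical_coverage(sample, toy_list)
print(f"{coverage:.1%}")
```

With real corpora, the same ratio would be computed per corpus or sub-corpus and per word list, which is what allows coverage figures for different lists to be compared.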

Good-enough production: Selecting easier words instead of more accurate ones

10.31234/osf.io/q2h9d ◽

2021 ◽

Author(s):

Mark Koranda ◽

Martin Zettersten ◽

Maryellen MacDonald

Keyword(s):

High Frequency ◽

Low Frequency ◽

Word Meanings ◽

Frequency Word ◽

Trade Off ◽

Word Use ◽

Trade Offs ◽

Language Communication ◽

The Times ◽

High Frequency Words

While many implicit decisions are the result of a trade-off, trade-offs in word use, such as whether a producer meant to convey a message more aligned with kitten despite saying a more accessible word like cat, are difficult to measure. To test the trade-off between message alignment and accessibility, we designed an artificial lexicon where word meanings corresponded to angles on a compass. In a novel language communication game, participants trained on some words more than others (high- vs low-frequency), and then earned points by producing words, often requiring an implicit decision between a high- vs low-frequency word. A trade-off was observed across four experiments, such that high-frequency words were produced even when less aligned with messages. Since high-frequency words are more accessible, these results suggest that implicit decisions between words are impacted by accessibility. Of all the times that people have said cat, many times they likely meant kitten.

High Frequency Words Produced by Typically Developing Mandarin-Speaking Children Between 3 and 6 Years of Age

SAGE Open ◽

10.1177/21582440211036109 ◽

2021 ◽

Vol 11 (3) ◽

pp. 215824402110361

Author(s):

Shang-Yu Wu ◽

Shanju Lin ◽

Rei-Jane Huang ◽

I-Fang Tsai

Keyword(s):

High Frequency ◽

Age Groups ◽

Word List ◽

Free Play ◽

Typically Developing ◽

Frequency Word ◽

High Frequency Word ◽

Pos Tagging ◽

Word Lists ◽

High Frequency Words

The purpose of this study was to provide high frequency word lists for Mandarin-speaking children between 3 and 6 years of age and to explore the differences between each part of speech (POS) category among different age groups. Participants were 209 typically developing native Mandarin speakers aged between 3 and 6 years, born in Taiwan, and recruited from Mandarin-language preschools in Taipei, New Taipei City, and Miaoli. Language samples were collected through conversations, free play, and story retelling. The researchers then transcribed the samples, segmented the utterances and words, and tagged the POS corresponding to each word. The frequencies of word occurrences were then analyzed and ranked to generate a high frequency word list. The mean frequency of each POS category was calculated to identify significant differences between age groups. The results showed high frequency word lists, including the corresponding POS tagging. Significant differences were found in 10 of the 11 POS categories among age groups. The results of this study presented preliminary information concerning high frequency words produced by Mandarin-speaking children aged between 3 and 6 years and the development of their use of each POS category.
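
The counting-and-ranking step described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the samples are already segmented and tagged as `(word, POS)` pairs, and the function names, the tag labels, and the toy tokens are hypothetical.

```python
from collections import Counter

def rank_words(tagged_tokens, top_n=10):
    """Rank (word, POS) pairs by how often they occur, most frequent first."""
    return Counter(tagged_tokens).most_common(top_n)

def pos_totals(tagged_tokens):
    """Count how many tokens fall into each POS category."""
    return dict(Counter(pos for _, pos in tagged_tokens))

# Hypothetical, already segmented and tagged sample (tag names are assumed).
tokens = [
    ("我", "PRON"), ("要", "VERB"), ("我", "PRON"),
    ("球", "NOUN"), ("要", "VERB"), ("我", "PRON"),
]
top = rank_words(tokens, top_n=2)
totals = pos_totals(tokens)
print(top)
print(totals)
```

Running the same counts separately for each age group would give the per-group POS frequencies that a between-group comparison needs.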

The Effects Of Communication Mediums On The Evolution Of Modern And Contemporary Western Poetry

10.32920/ryerson.14665842.v1 ◽

2021 ◽

Author(s):

Sarah Ostapchuk

Keyword(s):

Word Frequency ◽

Past Century ◽

Change Over Time ◽

Frequency Word ◽

The Past ◽

Over Time

This study analyses how popular communication mediums over the past century have changed the form and content of poetry. A periodical and small magazine published in 1912 are assessed and compared, as well as an anthology and several poems from Instagram published in 2014. All poems are also briefly compared to get an understanding of change over time. Medium affordances are considered, especially with respect to multimodal capacities. By assessing vocabulary density, word frequency, word distinctiveness, and visual formatting, characteristics of poetry from specific mediums arise, leading to a conclusion that mediums have an effect on the evolution of poetry.

Predicting second language listening functor comprehension probability with usage-based and embodiment approaches

International Journal of Bilingualism ◽

10.1177/13670069211000851 ◽

2021 ◽

pp. 136700692110008

Author(s):

Allie Patterson

Keyword(s):

Second Language ◽

Word Length ◽

First Language ◽

The Body ◽

Frequency Word ◽

Independent Variables ◽

Second Language Listening ◽

L2 Listening ◽

Listening Task ◽

The Relationship

Aims and Objectives: Embodiment is a major paradigm of first language (L1) research but has not yet been widely adopted in second language (L2) research. The main objective of this research was to find evidence for the effects of sensorimotor embodiment on L2 listening functor comprehension rates. Research Hypothesis: Frequency, word length, and Minkowski3 sensorimotor norms are significantly predictive of functor comprehension probability in an L2 listening task. Methodology: 129 Japanese participants were administered a paused transcription test that contained twelve target phrases. Data and analysis: Transcription of functors was the dependent variable. The independent variables were frequency, word length, and Minkowski3 sensorimotor ratings. These variables were analyzed with logit mixed-effects regressions. Findings/conclusions: Greater frequency, longer word length, and higher Minkowski3 ratings were found to facilitate comprehension and significantly increase the probability that a functor was transcribed. Frequency rates derived from spontaneous L1 oration and conversations were found to be significant, whereas frequency derived from written texts was not significant despite being from a much larger corpus. Originality: No L2 study has used Minkowski3 sensorimotor ratings to predict L2 performance. Minkowski3 ratings quantify the relationship between language and the body. Few researchers have yet incorporated embodiment theories into models of L2 comprehension. Implications: Embodiment theories complement usage-based approaches and should be incorporated into existing L2 theories. Researchers should be aware of textual differences between corpora and choose corpora appropriate for their analyses.

A CORPUS-BASED STUDY ON THE EDUCATION-RELATED TED TALKS BETWEEN NATIVE AND NON-NATIVE SPEAKERS

European Journal of Foreign Language Teaching ◽

10.46827/ejfl.v5i3.3577 ◽

2021 ◽

Vol 5 (3) ◽

Author(s):

Hieu Manh Do

Keyword(s):

Data Collection ◽

Native Speakers ◽

Pivotal Role ◽

Frequency Word ◽

Word Lists ◽

Ted Talks ◽

Education Field

This study aims to identify frequency word lists in TED talks in the field of education and to compare the language used by native speakers (NS) and non-native speakers (NNS). The researcher collected four transcripts (two from NS and the other two from NNS) from TED talks. AntConc was the main software used to investigate the frequency word lists. Data collection included two steps: (1) collecting the four transcripts of TED talks and (2) listing the top 10, 20, and 100 frequency word lists of the TED talks corpus for NS and NNS separately. The findings showed that both groups of speakers use function words more often than content words. However, content words play a pivotal role in making a fully meaningful sentence.

N-Clustering of Text Documents Using Graph Mining Techniques

Encyclopedia of Information Science and Technology, Fifth Edition - Advances in Information Quality and Management ◽

10.4018/978-1-7998-3479-3.ch057 ◽

2021 ◽

pp. 828-846

Author(s):

Bapuji Rao

Keyword(s):

Word Frequency ◽

Graph Mining ◽

Incidence Matrix ◽

Input Word ◽

Text Documents ◽

Number Of Clusters ◽

Frequency Word ◽

N Input ◽

The Creation ◽

Selection Of

This chapter describes the clustering of m text documents based on n input words, using graph mining techniques. The author proposes an algorithm that first selects the documents with the extension “.txt” from the m documents, which may have various extensions. Once the n words are input for the selected “.txt” documents, the algorithm begins n-clustering of the documents based on the n input words. It does so by creating a document-word frequency matrix in memory. The frequency-word table is then converted into an un-oriented document-word incidence matrix by replacing all non-zero entries with 1s. Using this incidence matrix, the algorithm creates n clusters of text documents containing from 1 to n of the input words, respectively. Finally, these n clusters are formed word-wise as well as 1-to-n word-wise.
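
The abstract above gives the steps only in outline, so the following is a rough sketch of one possible reading rather than the chapter's algorithm. It assumes that cluster k holds the “.txt” documents containing exactly k of the n input words; the function names and the in-memory toy documents are hypothetical.

```python
import re

def frequency_matrix(docs, words):
    """Build a document-word frequency table for the ".txt" documents only."""
    matrix = {}
    for name, text in docs.items():
        if not name.endswith(".txt"):
            continue  # skip documents with other extensions
        tokens = re.findall(r"\w+", text.lower())
        matrix[name] = [tokens.count(w.lower()) for w in words]
    return matrix

def incidence_matrix(freq):
    """Replace every non-zero count with 1, keeping zeros as they are."""
    return {name: [1 if c else 0 for c in row] for name, row in freq.items()}

def cluster_by_word_count(inc, n):
    """Assumed reading: cluster k = documents containing exactly k input words."""
    clusters = {k: [] for k in range(1, n + 1)}
    for name, row in inc.items():
        k = sum(row)
        if k:
            clusters[k].append(name)
    return clusters

# Hypothetical toy documents keyed by file name.
docs = {
    "a.txt": "graph mining of graph data",
    "b.txt": "text mining",
    "c.txt": "graph text mining",
    "d.txt": "nothing relevant",
    "notes.pdf": "graph",
}
words = ["graph", "text", "mining"]
freq = frequency_matrix(docs, words)
inc = incidence_matrix(freq)
clusters = cluster_by_word_count(inc, len(words))
print(clusters)
```

Keeping the frequency table and the incidence matrix as separate steps mirrors the order given in the abstract; only the 0/1 incidence values are needed to assign documents to clusters.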

How to write medical abstracts? The rhetorical structure and phrases used in Epidemiology

Jornal Brasileiro de Doenças Sexualmente Transmissíveis ◽

10.5327/dst-2177-8264-20213317 ◽

2021 ◽

Vol 33 ◽

Author(s):

Ana Luiza Pires de Freitas ◽

Ana Eliza Pereira Bocorny

Keyword(s):

High Frequency ◽

Corpus Linguistics ◽

Building Blocks ◽

Formulaic Language ◽

Function Words ◽

Frequency Word ◽

Rhetorical Structure ◽

High Frequency Word ◽

Review Analysis ◽

And Function

Introduction: Abstracts are critical in medical contexts. They contain formulaic building blocks called Lexical Frames (LFs), which are high-frequency word sequences with variable slots that can be formed around collocation nodes. LFs are abundant in written academic discourse and, for this reason, have great importance for the production of abstracts. Extensive research has been conducted on formulaic language, especially on medical genres. Fewer studies, however, have focused on LFs from specialty-specific corpora (e.g., epidemiology) and their relationship with the rhetorical structure of abstracts. Objective: This study aims to fill this gap by describing the structure of epidemiology abstracts, presenting their rhetorical functions, and identifying the LFs that linguistically realize these functions to help researchers write more conventional abstracts. Methods: We put together three corpora of abstracts in the field, published in English in peer-reviewed journals, and combined genre analysis and Corpus Linguistics principles to identify the linguistic realizations of the rhetorical functions in the texts. First, the rhetorical structure was described; then, the LFs were identified and analyzed. Results: 92% of the texts follow a pre-established pattern, whose structure consists of five to nine sections. Eight saliently frequent nodes (study, result, method, conclusion, review, analysis, patients, and findings) around which the LFs are constructed were identified. Conclusion: Even though both the content and function words that make up the LFs show some variation, it is possible to notice that the LFs elicited typify the linguistic realizations of the corresponding sections' rhetorical functions and, thus, are suitable to the observation of a pattern. For that reason, the data obtained in this study were used to inform the creation of a support framework for the writing of specialty-specific medical abstracts.
