Use of syntactic context to produce term association lists for text retrieval

Author(s):  
Gregory Grefenstette
2020 ◽  
Vol 31 (2) ◽  
pp. 339-365
Author(s):  
Jakob Neels

Abstract This paper explores the added value of studying intra- and inter-speaker variation in grammaticalisation based on idiolect corpora. It analyses the usage patterns of the English let alone construction in a self-compiled William Faulkner corpus against the backdrop of aggregated community data. Vast individual differences (early Faulkner vs. late Faulkner vs. peers) in frequencies of use are observed, and these frequency differences correlate with different degrees of grammaticalisation as measured in terms of host-class and syntactic context expansion. The corpus findings inform general issues in current cognitive-functional research, such as the from-corpus-to-cognition issue and the cause/consequence issue of frequency. They lend support to the usage-based view of grammaticalisation as a lifelong, frequency-sensitive process of cognitive automation. To substantiate this view, this paper proposes a self-feeding cycle of constructional generalisation that is driven by the interplay of frequency, entrenchment, partial sanction and habituation.


2021 ◽  
Author(s):  
Sungkwon Choo ◽  
Seong Jong Ha ◽  
Joonsoo Lee

2020 ◽  
Vol 54 (s41) ◽  
pp. 37-65
Author(s):  
Julia Fernández-Cuesta ◽  
Nieves Rodríguez-Ledesma

Abstract One of the most characteristic features of the grammar of the Lindisfarne Gospel gloss is the absence of the etymological -e inflection in the dative singular in the paradigm of the strong masculine and neuter declension (a-stems). Ross (1960: 38) already noted that endingless forms of the nominative/accusative cases were quite frequent in contexts where a dative singular in -e would be expected, to the extent that he labeled the forms in -e ‘rudimentary dative.’ The aim of this article is to assess to what extent the dative singular is still found as a separate case in the paradigms of the masculine and neuter a-stems and root nouns. To this end a quantitative/statistical analysis of nouns belonging to these classes has been carried out in contexts where the Latin lemma is either accusative or dative. We have tried to determine whether variables such as syntactic context, noun class, and frequency condition the presence or absence of the -e inflection, and whether the distribution of the inflected and uninflected forms is different in the various demarcations that have been identified in the gloss. The data have been retrieved using the Dictionary of Old English Corpus. All tokens have been checked against the facsimile edition and the digitised manuscript in order to detect possible errors.


1988 ◽  
Vol 11 (1-2) ◽  
pp. 33-46
Author(s):  
Tove Fjeldvig ◽  
Anne Golden

The fact that a lexeme can appear in various forms causes problems in information retrieval. As a solution to this problem, we have developed methods for automatic root lemmatization, automatic truncation and automatic splitting of compound words. All the methods have as their basis a set of rules which contain information regarding inflected and derived forms of words – and not a dictionary. The methods have been tested on several collections of texts, and have produced very good results. By controlled experiments in text retrieval, we have studied the effects on search results. These results show that both the method of automatic root lemmatization and the method of automatic truncation make a considerable improvement on search quality. The experiments with splitting of compound words did not give quite the same improvement, however, but all the same this experiment showed that such a method could contribute to a richer and more complete search request.
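
The rule-based approach described in this abstract can be illustrated with a minimal sketch. The suffix rules below are hypothetical English examples chosen for readability, not the authors' rule set, and the function names `lemmatize` and `truncation_match` are assumptions introduced here. The sketch only shows the general idea of reducing word forms with rules instead of a dictionary, and then using the reduced form as a truncated search term.

```python
# Hypothetical suffix rules (suffix, replacement), listed longest first.
# These are illustrative English rules, not the rules used in the paper.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("es", ""),
    ("s", ""),
]

# Assumed lower bound on stem length, so short words are left alone.
MIN_STEM = 3


def lemmatize(word):
    """Reduce a word form by applying the first matching suffix rule."""
    w = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: len(w) - len(suffix)] + replacement
    return w


def truncation_match(query, index_terms):
    """Return index terms that begin with the reduced form of the query."""
    stem = lemmatize(query)
    return sorted(t for t in index_terms if t.lower().startswith(stem))


if __name__ == "__main__":
    terms = ["search", "searches", "searched", "research", "sea"]
    print(lemmatize("studies"))
    print(truncation_match("searching", terms))
```

Because the rules are applied in order and only the first match is used, their ordering matters: longer suffixes are listed before shorter ones so that "studies" is handled by the "ies" rule and not by the "s" rule.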

