Speech Pauses and Pronominal Anaphors

Frontiers in Computer Science ◽

10.3389/fcomp.2021.659539 ◽

2021 ◽

Vol 3 ◽

Author(s):

Costanza Navarretta

Keyword(s):

Machine Learning ◽

Speech Act ◽

Spoken Language ◽

Language Models ◽

Complex Processes ◽

Filled Pauses ◽

Abstract Entities ◽

Class Function ◽

Demonstrative Pronouns ◽

Third Person

This paper addresses the usefulness of speech pauses for determining whether third person neuter gender singular pronouns refer to individual or abstract entities in Danish spoken language. The annotations of dyadic map task dialogues and spontaneous first encounters are analyzed and used in machine learning experiments to automatically identify the anaphoric functions of pronouns and the type of abstract reference. The analysis of the data shows that, in Danish speech as in English, abstract reference is more often performed by marked (stressed or demonstrative) pronouns than by unmarked personal pronouns; this corrects previous studies of abstract reference in Danish. The data also show that silent and filled pauses significantly more often precede third person singular neuter gender pronouns when these refer to abstract entities than when they refer to individual entities. Since abstract entities are not the most salient ones and referring to them is cognitively harder than referring to individual entities, pauses signal this complex process. This is in line with perception studies, which connect pauses with the expression of abstract or complex concepts. We also found that unmarked pronouns referring to an entity type usually referred to by a marked pronoun are significantly more often preceded by a speech pause than marked pronouns with the same referent type. This indicates that speech pauses can also signal that the referent of a pronoun of a certain type is not the most expected one. Finally, language models were produced from the annotated map task and first encounter dialogues and used in machine learning experiments to predict the function of third person neuter gender singular pronouns, as a first step toward identifying their anaphoric antecedents.
The language models from the map task dialogues were also used to train classifiers that determine the referent type (speech act, event, fact, or proposition) of abstract anaphors. In all cases, the best results were obtained by a multilayer perceptron, with F1-scores between 0.52 and 0.67 for the three-class function prediction task and 0.73 for referent type prediction.
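The abstract reports that pauses precede abstract-reference pronouns significantly more often than individual-reference pronouns, but it does not say which statistical test was used. A common way to check this kind of association is Pearson's chi-square test on a 2×2 table (referent type × pause / no pause). The sketch below is an assumption for illustration, not the paper's procedure; the helper name chi_square_2x2 is hypothetical, and the counts in the usage line are placeholders, not data from the study.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) on the table

        [[a, b],    # abstract referent:   pause, no pause
         [c, d]]    # individual referent: pause, no pause

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    n = row1 + row2
    # Closed form for 2x2 tables: n * (ad - bc)^2 / (product of the four margins)
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df the statistic is a squared standard normal,
    # so P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical placeholder counts (NOT from the study)
stat, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 6.45, p is about 0.011
```

With 1 degree of freedom, 3.84 is the 5% critical value; tables with small expected cell counts would usually call for Fisher's exact test instead.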

Some Observations on Demonstrative Pronouns in the Tarnovo Edition of the Stishen Prologue

Vestnik NSU Series History and Philology ◽

10.25205/1818-7919-2021-20-2-9-19 ◽

2021 ◽

Vol 20 (2) ◽

pp. 9-19

Author(s):

Aneta Tihova

Keyword(s):

Fourteenth Century ◽

Spoken Language ◽

Personal Pronouns ◽

Person Pronoun ◽

Demonstrative Pronouns ◽

Third Person

The article examines in turn the demonstrative pronouns of general reference, of proximity (near persons and objects), of distance (distant persons and objects) and of enumeration that function as third-person forms of the personal pronoun in the Tarnovo edition of the Stishen Prologue (a calendar hagiographic collection translated in the first half of the fourteenth century). It is established that the meaning of the demonstrative forms is determined by context: combined with a noun they have a demonstrative meaning, and combined with a verb they act as a third-person pronoun. For example, съи блаженꙑи corresponds to Modern Bulgarian този блаженият (“this blissful one”), and съи бѣше corresponds to той беше (“he was”). Historically significant are the use of the elongated forms тъи (masculine), тя (feminine) and тие / тйе (plural), which later became the pronouns for “he”, “she” and “they”, and the presence of the new form тои, characteristic of the spoken language.

PERSONAL AND DEMONSTRATIVE PRONOUNS AS LINGUISTIC MEANS OF MODELLING THE ADDRESSEE IN KATE FOX’S BOOK “WATCHING THE ENGLISH”

НАУЧНЫЙ ЖУРНАЛ СОВРЕМЕННЫЕ ЛИНГВИСТИЧЕСКИЕ И МЕТОДИКО-ДИДАКТИЧЕСКИЕ ИССЛЕДОВАНИЯ (Scientific Journal “Modern Linguistic and Methodological-Didactic Research”) ◽

10.36622/vstu.2020.85.25.007 ◽

2020 ◽

pp. 93-103

Author(s):

Е.М. Вишневская ◽

Н.И. Хайду

Keyword(s):

Speech Act ◽

Point Of View ◽

Semantic Features ◽

Personal Pronouns ◽

Problem Statement ◽

Dual Nature ◽

Native Culture ◽

Demonstrative Pronouns ◽

Third Person ◽

Linguistic Means

Problem statement. The paper considers person deixis as a linguistic means of modelling the addressee in Kate Fox’s book “Watching the English: The Hidden Rules of English Behaviour”. It examines the particular features of the linguistic and cultural situation that determine the type of addressee. The paper presents examples of person deixis and of the author’s use of these linguistic means.
It describes the semantic features of the pronouns and their ability to function as deictic markers that provide cohesion between the components of the speech act. The main objective of the study was to analyze the text from a pragmatic point of view, in particular its pronoun deixis and the communicative tactics and strategies applied by the author. Results. The study revealed the dual nature of the addressee, communicated by means of personal pronouns. In the text, the addressee is portrayed as a representative of either the author’s native culture or another culture. When addressing her compatriots, the author unites herself with the addressee by using the pronoun “we”. In other contexts, she takes the position of a neutral observer at the foreign reader’s side and uses third-person pronouns. The study also revealed that the author uses the demonstrative pronoun “that” to enhance evaluative meaning in the text, while “this” functions according to the same principles as first-person pronouns. Conclusion. The results of the study suggest that the author uses a variety of deictic techniques to present her position, thereby achieving maximum involvement of the addressee in the narrative.

Disfluencies signal reference to novel objects for adults but not children

Journal of Child Language ◽

10.1017/s0305000917000368 ◽

2017 ◽

Vol 45 (3) ◽

pp. 581-609 ◽

Cited By ~ 5

Author(s):

Sarah J. OWENS ◽

Justine M. THACKER ◽

Susan A. GRAHAM

Keyword(s):

Young Children ◽

Spoken Language ◽

The Novel ◽

Novel Object ◽

Filled Pauses ◽

Novel Objects ◽

Novel Targets ◽

The Way ◽

Familiar Objects

Speech disfluencies can guide the ways in which listeners interpret spoken language. Here, we examined whether three-year-olds, five-year-olds, and adults use filled pauses to anticipate that a speaker is likely to refer to a novel object. Across three experiments, participants were presented with pairs of novel and familiar objects and heard a speaker refer to one of the objects using a fluent (“Look at the ball/lep!”) or disfluent (“Look at thee uh ball/lep!”) expression. The salience of the speaker's unfamiliarity with the novel referents and the way in which the speaker referred to them (i.e., with a noun vs. a description) varied across experiments. Three- and five-year-olds successfully identified familiar and novel targets, but only adults’ looking patterns reflected increased looks to novel objects in the presence of a disfluency. Together, these findings demonstrate that adults, but not young children, use filled pauses to anticipate reference to novel objects.

Spoken language understanding and interaction: machine learning for human-like conversational systems

Computer Speech & Language ◽

10.1016/j.csl.2017.05.006 ◽

2017 ◽

Vol 46 ◽

pp. 249-251 ◽

Cited By ~ 1

Author(s):

Milica Gašić ◽

Dilek Hakkani-Tür ◽

Asli Celikyilmaz

Keyword(s):

Machine Learning ◽

Spoken Language ◽

Language Understanding ◽

Spoken Language Understanding

Transfer Learning for Risk Classification of Social Media Posts: Model Evaluation Study (Preprint)

10.2196/preprints.15371 ◽

2019 ◽

Author(s):

Derek Howard ◽

Marta M Maslej ◽

Justin Lee ◽

Jacob Ritchie ◽

Geoffrey Woollard ◽

...

Keyword(s):

Mental Health ◽

Machine Learning ◽

Social Media ◽

Transfer Learning ◽

Computational Linguistics ◽

Feature Representation ◽

Fine Tuning ◽

Language Models ◽

Universal Sentence ◽

Text Feature

BACKGROUND Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods. OBJECTIVE This study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response. METHODS We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used the Tree-based Pipeline Optimization Tool and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts. RESULTS The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macro-averaged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we present visualizations that aid in understanding the learned classifiers.
CONCLUSIONS In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
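The macro-averaged F1 score reported above weights every class equally, which matters for triage data in which the posts needing urgent attention are rare. A minimal standard-library sketch of the metric follows; the helper name macro_f1 and the label names are hypothetical, and this is not the evaluation code used in the study (scikit-learn offers the same metric through f1_score with average="macro").

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores.

    Every label seen in y_true or y_pred counts equally, so a rare class
    weighs as much as a frequent one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for illustration only
gold = ["low", "low", "low", "high"]
pred = ["low", "low", "high", "high"]
print(round(macro_f1(gold, pred), 3))  # 0.733 (accuracy would be 0.75)
```

In the toy example the frequent class scores 0.8 and the rare class only 0.667, so the macro average falls below plain accuracy; this is why macro F1 is a stricter yardstick for imbalanced risk labels.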

Chart parsing of stochastic spoken language models

10.3115/100964.100999 ◽

1989 ◽

Author(s):

Charles Hemphill ◽

Joseph Picone

Keyword(s):

Spoken Language ◽

Language Models ◽

Chart Parsing

Anaforinen nolla: Kielioppia ja affekteja (Anaphoric zero: Grammar and affect)

Virittäjä ◽

10.23982/vir.40659 ◽

2008 ◽

Vol 112 (2) ◽

pp. 162

Author(s):

Auli Hakulinen ◽

Lea Laitinen

Keyword(s):

Spoken Language ◽

Initial Position ◽

Written Language ◽

Literary Texts ◽

Clear Cut ◽

Anaphoric Pronouns ◽

Third Person ◽

Story Character ◽

The Third Person ◽

Semantic Properties

The article examines the syntactic and semantic properties of the anaphoric zero in spoken and written Finnish. Referentially, the zero is equivalent to the third person pronouns hän “he/she” and he “they”. However, the writers started out with the hypothesis that this equivalence does not necessarily hold for other kinds of meaning conveyed by the two devices, the anaphoric zero and anaphoric pronouns. In standardised written language the conditions for the use of the zero are fairly clear cut: within a sentence it is mainly used as an anaphoric device, but in a subordinate clause that precedes the main clause it is also used as a forward-looking, anticipatory anaphor. In spoken language, as well as in literary prose, the syntactic conditions are more flexible. In the course of the research, the literary texts proved especially fruitful for understanding the implications of using the anaphoric zero. Earlier work (e.g. Kalliokoski 1990; Heinonen 1995) has pointed out that the anaphoric zero typically ties two successive clauses together more tightly than a pronoun would. The writers are able to show that it does something else as well. In talk-in-interaction, it conveys the speaker's commitment to, and often affiliation with, the previous speaker's perspective and stance. In reported speech, both in spoken language and in literary dialogue, the zero can convey the speaker's attitude toward the thoughts of the person being referred to, for example irony or empathy. The writers argue that when the zero represents one alternative in a paradigm, it is empty only in (morpho)syntactic terms, not in terms of meaning. Whether the speaker chooses a pronoun (hän or he) or a zero, he/she makes a rhetorical choice.
The zero alternative creates implications, expressing the speaker's affective stance and attitude toward the characters in the story, or his/her interpretation of the speech, thought or behaviour of the co-participant or story character being quoted. Strikingly, in more than 90 per cent of the 150 examples used, the verb is at the beginning of the utterance or turn. In the remaining cases, the verb is often preceded by an epistemic adverb (varmaan “definitely”, tuskin “hardly”), or the utterance is a fixed construction. The writers hypothesise that the grammar of the anaphoric zero should include verb-initial position as one of its constitutive factors. This factor is typical both of the co-ordinated and subordinated sentences of standard written language, which are governed by syntactic rules, and of turn-initial expressions that arise from the speaker's or narrator's affective stance toward the matter at hand.

Adding filled pauses and disfluent events into language models for speech recognition

2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) ◽

10.1109/coginfocom.2016.7804538 ◽

2016 ◽

Cited By ~ 2

Author(s):

Jan Stas ◽

Daniel Hladek ◽

Jozef Juhar

Keyword(s):

Speech Recognition ◽

Language Models ◽

Filled Pauses

An Algorithm for Anaphora Resolution in Spanish Texts

Computational Linguistics ◽

10.1162/089120101753342662 ◽

2001 ◽

Vol 27 (4) ◽

pp. 545-567 ◽

Cited By ~ 18

Author(s):

Manuel Palomar ◽

Antonio Ferrández ◽

Lidia Moreno ◽

Patricio Martínez-Barco ◽

Jesús Peral ◽

...

Keyword(s):

Noun Phrase ◽

Anaphora Resolution ◽

New Approach ◽

Partial Parsing ◽

Test Corpus ◽

Reflexive Pronouns ◽

Different Types ◽

Demonstrative Pronouns ◽

Third Person ◽

Competitive Algorithms

This paper presents an algorithm for identifying noun phrase antecedents of third person personal pronouns, demonstrative pronouns, reflexive pronouns, and omitted pronouns (zero pronouns) in unrestricted Spanish texts. We define a list of constraints and preferences for different types of pronominal expressions, and we document in detail the importance of each kind of knowledge (lexical, morphological, syntactic, and statistical) in anaphora resolution for Spanish. The paper also provides a definition for syntactic conditions on Spanish NP-pronoun noncoreference using partial parsing. The algorithm has been evaluated on a corpus of 1,677 pronouns and achieved a success rate of 76.8%. We have also implemented four competitive algorithms and tested their performance in a blind evaluation on the same test corpus. This new approach could easily be extended to other languages such as English, Portuguese, Italian, or Japanese.
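The constraints-and-preferences design described above can be illustrated with a toy resolver: hard constraints (here, gender and number agreement and no forward reference) first remove incompatible candidate noun phrases, and soft preferences then rank the survivors. This is a hypothetical simplification for illustration only; the Candidate type, the resolve function, the features and the preference order are invented here and are not the rule set of the published algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A candidate antecedent NP (all features are hypothetical)."""
    text: str
    gender: str       # e.g. "masc" or "fem"
    number: str       # "sg" or "pl"
    sentence: int     # index of the sentence containing the NP
    is_subject: bool  # whether the NP is a grammatical subject


def resolve(gender: str, number: str, sentence: int,
            candidates: list[Candidate]) -> Optional[Candidate]:
    # 1. Constraints: drop candidates that disagree in gender or number
    #    or that occur in a later sentence than the pronoun.
    compatible = [
        c for c in candidates
        if c.gender == gender and c.number == number and c.sentence <= sentence
    ]
    if not compatible:
        return None
    # 2. Preferences: prefer subjects, then the most recent sentence.
    #    This ordering is an illustrative assumption, not the published one.
    return max(compatible, key=lambda c: (c.is_subject, c.sentence))


# "María le escribió la carta a Juan. Ella ..." : what does "ella" refer to?
nps = [
    Candidate("María", "fem", "sg", 0, True),
    Candidate("la carta", "fem", "sg", 0, False),
    Candidate("Juan", "masc", "sg", 0, False),
]
print(resolve("fem", "sg", 1, nps).text)  # María
```

Agreement eliminates "Juan", leaving two feminine singular candidates; the subject preference then selects "María" over "la carta". A realistic resolver would add further knowledge sources (the abstract lists lexical, morphological, syntactic and statistical ones).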

Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Applied Sciences ◽

10.3390/app9183648 ◽

2019 ◽

Vol 9 (18) ◽

pp. 3648

Author(s):

Casper S. Shikali ◽

Zhou Sijie ◽

Liu Qihe ◽

Refuoe Mokhosi

Keyword(s):

Language Processing ◽

Critical Role ◽

Language Model ◽

Central Africa ◽

Spoken Language ◽

Language Models ◽

Word Embeddings ◽

Word Representation

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this is not the case for Swahili, a widely spoken but low-resource language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation in agglutinative, syllable-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded the syllables of words, instead of their characters, character n-grams or morphemes, and generated high-quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results of the syllable-aware language model on both the small dataset (perplexity 31.229) and the medium dataset (perplexity 45.859), outperforming character-aware language models. We further evaluated the word embeddings on a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
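WEFSE composes word vectors from syllable units rather than characters. As a rough illustration of the input side only, the sketch below splits Swahili words into open (consonants + vowel) syllables and assigns each distinct syllable an integer id, as an embedding lookup would need. The "close a syllable after every vowel" rule and the helper names syllabify and build_syllable_vocab are simplifying assumptions, not the segmentation procedure of the paper; in particular, syllabic nasals such as the "m" of "mtu" are not handled.

```python
VOWELS = set("aeiou")


def syllabify(word: str) -> list[str]:
    """Split a word into open syllables by closing one after every vowel.

    Simplifying assumption: consonants attach to the following vowel and
    any trailing consonants join the last syllable. Syllabic nasals
    (the "m" of "mtu") are NOT handled.
    """
    syllables: list[str] = []
    current = ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:  # leftover consonants with no following vowel
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


def build_syllable_vocab(words: list[str]) -> dict[str, int]:
    """Map each distinct syllable to an integer id (e.g. embedding-table rows)."""
    vocab: dict[str, int] = {}
    for word in words:
        for syl in syllabify(word):
            vocab.setdefault(syl, len(vocab))
    return vocab


print(syllabify("watoto"))  # ['wa', 'to', 'to']
print(syllabify("ndege"))   # ['nde', 'ge']
print(build_syllable_vocab(["watoto", "kitabu"]))
# {'wa': 0, 'to': 1, 'ki': 2, 'ta': 3, 'bu': 4}
```

Because Swahili syllables are predominantly open, a small syllable inventory covers many words, which is the intuition behind using syllables as the unit; the model that turns these ids into word vectors (a convolutional network in the study) is not sketched here.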
