Compositional Distributional Semantics with Syntactic Dependencies and Selectional Preferences

2021 ◽  
Vol 11 (12) ◽  
pp. 5743
Author(s):  
Pablo Gamallo

This article describes a compositional model based on syntactic dependencies, designed to build contextualized word vectors by following linguistic principles related to the concept of selectional preferences. The proposed compositional strategy was evaluated on a syntactically controlled, multilingual dataset and compared with Transformer BERT-like models such as Sentence BERT, the state of the art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish, based on the one defined for English, containing expressions with noun-verb-noun transitive constructions. The results show that the linguistics-based compositional approach is competitive with Transformer models.
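The abstract does not spell out the composition operation itself. As a rough illustration of the general idea only (a dependent word's distribution restricting the head's vector, in the spirit of selectional preferences), here is a minimal sketch; the component-wise weighting and the toy co-occurrence vectors are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def restrict(head_vec, dependent_vec):
    """Contextualize the head vector by component-wise weighting with
    the dependent's distribution, emphasising shared contexts (one
    common way to model selectional preferences; the paper's exact
    operation may differ)."""
    combined = head_vec * dependent_vec
    norm = np.linalg.norm(combined)
    return combined / norm if norm > 0 else combined

# toy co-occurrence vectors over 4 hypothetical context dimensions
drink = np.array([0.9, 0.1, 0.7, 0.0])   # verb "drink"
water = np.array([0.8, 0.0, 0.6, 0.1])   # noun "water"

# contextualized sense of "drink" given its object "water"
drink_given_water = restrict(drink, water)
```

Contexts where either word never occurs are zeroed out, so the resulting vector represents only the readings compatible with both words.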

2014 ◽  
Vol 40 (3) ◽  
pp. 587-631 ◽  
Author(s):  
Diarmuid Ó Séaghdha ◽  
Anna Korhonen

We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning. Our framework uses Bayesian latent-variable models inspired by, and extending, the well-known Latent Dirichlet Allocation (LDA) model of topical structure in documents; when applied to predicate–argument data, topic models automatically induce semantic classes of arguments and assign each predicate a distribution over those classes. We consider LDA and a number of extensions to the model and evaluate them on a variety of semantic prediction tasks, demonstrating that our approach attains state-of-the-art performance. More generally, we argue that probabilistic methods provide an effective and flexible methodology for distributional semantics.
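The predicate-argument use of topic models described above can be illustrated with a toy sketch. The counts below, scikit-learn's vanilla LDA, and the two-class setting are hypothetical stand-ins for the paper's richer Bayesian models trained on parsed corpora; only the overall shape (predicates as documents, argument words as tokens, latent classes as topics) matches the abstract.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy predicate-argument counts: rows are predicates ("documents"),
# columns are argument words ("tokens").
arguments = ["apple", "bread", "water", "beer"]
predicates = ["eat", "drink"]
X = np.array([
    [30, 25, 1, 0],    # eat: mostly food arguments
    [0, 1, 28, 27],    # drink: mostly liquid arguments
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
# each row of theta is a predicate's distribution over induced
# argument classes, i.e. its selectional preference profile
theta = lda.fit_transform(X)
```

Fitting induces semantic classes of arguments without supervision, and each predicate receives a distribution over those classes, as in the framework above.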


2017 ◽  
Vol 13 (2) ◽  
pp. 261-289 ◽  
Author(s):  
Pablo Gamallo

Abstract This article provides a preliminary semantic framework for Dependency Grammar in which lexical words are semantically defined as contextual distributions (sets of contexts) while syntactic dependencies are compositional operations on word distributions. More precisely, any syntactic dependency uses the contextual distribution of the dependent word to restrict the distribution of the head, and makes use of the contextual distribution of the head to restrict that of the dependent word. The interpretation of composite expressions and sentences, which are analyzed as a tree of binary dependencies, is performed by restricting the contexts of words dependency by dependency in a left-to-right incremental way. Consequently, the meaning of the whole composite expression or sentence is not a single representation, but a list of contextualized senses, namely the restricted distributions of its constituent (lexical) words. We report the results of two large-scale corpus-based experiments on two different natural language processing applications: paraphrasing and compositional translation.
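The mutual-restriction idea above can be reduced to a very small sketch. The framework works with weighted, dependency-typed corpus distributions; this toy version shrinks that to plain set intersection over hypothetical feature-style contexts, purely to show how a dependency disambiguates a word.

```python
def restrict(word_ctx, other_ctx):
    """Keep only the contexts of `word_ctx` compatible with the other
    word's contexts (toy version: plain set intersection, with a
    back-off to the full distribution if the sets are disjoint)."""
    restricted = word_ctx & other_ctx
    return restricted if restricted else word_ctx

# toy contextual distributions (sets of contexts)
coach = {"sports", "person", "vehicle", "road"}   # ambiguous noun
drive = {"vehicle", "road", "license"}            # verb sense cues

# the dependency drive -> coach restricts "coach" to its vehicle sense
coach_in_context = restrict(coach, drive)
```

Applying such restrictions dependency by dependency, left to right, yields one contextualized sense per lexical word rather than a single sentence vector, as the abstract describes.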


2015 ◽  
Vol 41 (1) ◽  
pp. 165-173 ◽  
Author(s):  
Fabio Massimo Zanzotto ◽  
Lorenzo Ferrone ◽  
Marco Baroni

Distributional semantics has been extended to phrases and sentences by means of composition operations. We look at how these operations affect similarity measurements, showing that similarity equations of an important class of composition methods can be decomposed into operations performed on the subparts of the input phrases. This establishes a strong link between these models and convolution kernels.
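For the simplest member of that class, additive composition, the decomposition is immediate: the dot product of two composed phrase vectors is exactly the sum of dot products between their word-level subparts, which is the shape of a convolution kernel. A minimal numerical check with made-up vectors:

```python
import numpy as np

a, b = np.array([1., 0., 2.]), np.array([0., 3., 1.])   # words of phrase 1
c, d = np.array([2., 1., 0.]), np.array([1., 1., 1.])   # words of phrase 2

# phrase similarity under additive composition ...
phrase_dot = np.dot(a + b, c + d)

# ... decomposes into a sum of word-pair dot products over subparts
decomposed = np.dot(a, c) + np.dot(a, d) + np.dot(b, c) + np.dot(b, d)

assert np.isclose(phrase_dot, decomposed)
```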


2020 ◽  
Vol 44 (2-3) ◽  
pp. 108-117
Author(s):  
Thomas Licata

Abstract Thomas DeLio is a composer and theorist of international renown in both fields and especially noted for his work in computer music. In this conversation he discusses his musical thinking, developed over more than 40 years in the field. His compositions have been performed worldwide and are recorded on numerous labels. Neuma Recordings has recently released five volumes of his recorded compositions in an ongoing series of his collected works. The interview itself, conducted over a series of phone conversations in the summer of 2020, begins with a look at his early student years at the New England Conservatory of Music in the late 1960s and early 1970s and navigates through to his current work and thinking today. Numerous aspects of his compositional approach and aesthetics are discussed, including his early influences, his illuminating thoughts on time and silence in his music, his applications of various technologies, and spatial projection, particularly as it relates to his work with sound installations.


2020 ◽  
pp. injuryprev-2020-043829 ◽  
Author(s):  
Marie-Soleil Cloutier ◽  
Emilie Beaulieu ◽  
Liraz Fridman ◽  
Alison K Macpherson ◽  
Brent E Hagel ◽  
...  

Aim: To undertake a comprehensive review of the best available evidence related to risk factors for child pedestrian motor vehicle collision (PMVC), as well as identification of established and emerging prevention strategies.
Methods: Articles on risk factors were identified through a search of English-language publications listed in Medline, Embase, Transport, SafetyLit, Web of Science, CINAHL, Scopus and PsycINFO within the last 30 years (~1989 onwards).
Results: This state-of-the-art review uses the road safety Safe System approach as a new lens to examine three risk factor domains affecting child pedestrian safety (built environment, drivers and vehicles) and four cross-cutting critical issues (reliable collision and exposure data, evaluation of interventions, evidence-based policy and intersectoral collaboration).
Conclusions: Research conducted over the past 30 years has reported extensively on child PMVC risk factors. The challenge facing us now is how to move these findings into action and intervene to reduce child PMVC injury and fatality rates worldwide.


2019 ◽  
Vol 29 (06) ◽  
pp. 783-809
Author(s):  
Jules Hedges ◽  
Mehrnoosh Sadrzadeh

Abstract Categorical compositional distributional semantics is a model of natural language; it combines the statistical vector space models of words with the compositional models of grammar. We formalise in this model the generalised quantifier theory of natural language, due to Barwise and Cooper. The underlying setting is a compact closed category with bialgebras. We start from a generative grammar formalisation and develop an abstract categorical compositional semantics for it, and then instantiate the abstract setting to sets and relations and to finite-dimensional vector spaces and linear maps. We prove the equivalence of the relational instantiation to the truth theoretic semantics of generalised quantifiers. The vector space instantiation formalises the statistical usages of words and enables us to, for the first time, reason about quantified phrases and sentences compositionally in distributional semantics.
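The truth-theoretic semantics the paper proves its relational instantiation equivalent to is the classical Barwise-Cooper picture: a generalised quantifier relates a restrictor set A to a scope set B. The paper's contribution is lifting this to a categorical and vector-space setting; the sketch below only shows the elementary set-theoretic side, with made-up individuals.

```python
# Generalised quantifiers as relations between sets (Barwise & Cooper):
# "all A are B", "some A are B", "most A are B".

def q_all(A, B):  return A <= B              # A is a subset of B
def q_some(A, B): return bool(A & B)         # A and B overlap
def q_most(A, B): return len(A & B) > len(A - B)

dogs    = {"rex", "fido", "spot"}
barkers = {"rex", "fido", "spot", "rover"}

assert q_all(dogs, barkers)        # "all dogs bark" holds here
assert q_some(dogs, {"rex"})       # "some dog ..." holds
assert not q_most(dogs, {"rex"})   # only 1 of 3 dogs is in the scope
```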


2016 ◽  
Vol 42 (4) ◽  
pp. 727-761 ◽  
Author(s):  
David Weir ◽  
Julie Weeds ◽  
Jeremy Reffin ◽  
Thomas Kober

We present a new framework for compositional distributional semantics in which the distributional contexts of lexemes are expressed in terms of anchored packed dependency trees. We show that these structures have the potential to capture the full sentential contexts of a lexeme and provide a uniform basis for the composition of distributional knowledge in a way that captures both mutual disambiguation and generalization.
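The flavour of context that anchored packed dependency trees are built from can be hinted at with a toy extraction of dependency-path contexts. The path notation (`rel↑head`, `rel↑.rel↓word`) and the tiny parse below are illustrative assumptions; real APTs pack weighted paths of this kind into a tree structure per lexeme rather than a flat set.

```python
from collections import defaultdict

# (head, relation, dependent) arcs for the parse of "dogs chased cats"
arcs = [("chased", "nsubj", "dogs"), ("chased", "dobj", "cats")]

contexts = defaultdict(set)
for head, rel, dep in arcs:
    contexts[dep].add(f"{rel}\u2191{head}")   # path up to the head
    contexts[head].add(f"{rel}\u2193{dep}")   # path down to the dependent

# second-order paths: a word also sees its co-arguments via the head,
# which is what lets a lexeme's context cover the full sentence
for h1, r1, d1 in arcs:
    for h2, r2, d2 in arcs:
        if h1 == h2 and d1 != d2:
            contexts[d1].add(f"{r1}\u2191.{r2}\u2193{d2}")
```

Here "dogs" acquires not just its direct attachment to "chased" but also a path to "cats", illustrating how higher-order dependency contexts reach across the sentence.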


2020 ◽  
Vol 8 ◽  
pp. 330-345
Author(s):  
Ashutosh Kumar ◽  
Kabir Ahuja ◽  
Raghuram Vadapalli ◽  
Partha Talukdar

Given a sentence (e.g., “I like mangoes”) and a constraint (e.g., sentiment flip), the goal of controlled text generation is to produce a sentence that adapts the input sentence to meet the requirements of the constraint (e.g., “I hate mangoes”). Going beyond such simple constraints, recent work has started exploring the incorporation of complex syntactic guidance as constraints in the task of controlled paraphrase generation. In these methods, syntactic guidance is sourced from a separate exemplar sentence. However, these prior works have only utilized the limited syntactic information available in the parse tree of the exemplar sentence. We address this limitation and propose Syntax Guided Controlled Paraphraser (SGCP), an end-to-end framework for syntactic paraphrase generation. We find that SGCP can generate syntax-conforming sentences while not compromising on relevance. We perform extensive automated and human evaluations over multiple real-world English language datasets to demonstrate the efficacy of SGCP over state-of-the-art baselines. To drive future research, we have made SGCP's source code available.


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246751
Author(s):  
Ponrudee Netisopakul ◽  
Gerhard Wohlgenannt ◽  
Aleksei Pulich ◽  
Zar Zar Hlaing

Research into semantic similarity has a long history in lexical semantics, and it has applications in many natural language processing (NLP) tasks like word sense disambiguation or machine translation. The task of calculating semantic similarity is usually presented in the form of datasets which contain word pairs and a human-assigned similarity score. Algorithms are then evaluated by their ability to approximate the gold-standard similarity scores. Many such datasets, with different characteristics, have been created for the English language. Recently, four of those were transformed into Thai-language versions, namely WordSim-353, SimLex-999, SemEval-2017-500, and R&G-65. Given those four datasets, in this work we aim to improve on the previous baseline evaluations for Thai semantic similarity and to solve challenges of unsegmented Asian languages (particularly the high fraction of out-of-vocabulary (OOV) dataset terms). To this end we apply and integrate different strategies to compute similarity, including traditional word-level embeddings, subword-unit embeddings, and ontological or hybrid sources like WordNet and ConceptNet. With our best model, which combines self-trained fastText subword embeddings with ConceptNet Numberbatch, we raise the state of the art, measured by the harmonic mean of Pearson and Spearman correlations, by a large margin: from 0.356 to 0.688 for TH-WordSim-353, from 0.286 to 0.769 for TH-SemEval-500, from 0.397 to 0.717 for TH-SimLex-999, and from 0.505 to 0.901 for TWS-65.
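The evaluation protocol described above (model scores correlated against human gold scores, combined via a harmonic mean) is easy to sketch. The five pair scores below are hypothetical, not taken from any of the Thai datasets; only the pair/score format and the metric mirror the description.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# hypothetical gold similarity ratings and model cosine scores
# for five word pairs, in the same pair/score format as the datasets
gold  = np.array([9.1, 7.4, 6.0, 3.2, 1.0])
model = np.array([0.85, 0.70, 0.55, 0.40, 0.05])

r   = pearsonr(gold, model)[0]    # linear correlation
rho = spearmanr(gold, model)[0]   # rank correlation

# the combined metric reported in the abstract
harmonic_mean = 2 * r * rho / (r + rho)
```

Since the model ranks the pairs perfectly here, Spearman's rho is 1.0, and the harmonic mean is pulled down only by the Pearson term.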

