Calibration of Constraint Promotion Does Not Help with Learning Variation in Stochastic Optimality Theory

2020 ◽  
Vol 51 (1) ◽  
pp. 97-123
Author(s):  
Giorgio Magri ◽  
Benjamin Storme

The Calibrated Error-Driven Ranking Algorithm (CEDRA; Magri 2012) is shown to fail on two test cases of phonologically conditioned variation from Boersma and Hayes 2001. The failure of the CEDRA raises a serious unsolved challenge for learnability research in stochastic Optimality Theory, because the CEDRA itself was proposed to repair a learnability problem (Pater 2008) encountered by the original Gradual Learning Algorithm. This result is supported by both simulation results and a detailed analysis whereby a few constraints and a few candidates at a time are recursively “peeled off” until we are left with a “core” small enough that the behavior of the learner is easy to interpret.

2001 ◽  
Vol 32 (1) ◽  
pp. 45-86 ◽  
Author(s):  
Paul Boersma ◽  
Bruce Hayes

The Gradual Learning Algorithm (Boersma 1997) is a constraint-ranking algorithm for learning optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and Smolensky (1993, 1996, 1998, 2000), which initiated the learnability research program for Optimality Theory. We argue that the Gradual Learning Algorithm has a number of special advantages: it can learn free variation, deal effectively with noisy learning data, and account for gradient well-formedness judgments. The case studies we examine involve Ilokano reduplication and metathesis, Finnish genitive plurals, and the distribution of English light and dark /l/.
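The abstract above describes the core mechanics of the Gradual Learning Algorithm: constraints carry numeric ranking values, evaluation perturbs them with noise, and errors trigger small promotions and demotions. The following is a minimal sketch of that loop, with invented constraint names, candidates and violation profiles (not drawn from the article):

```python
import random

# Sketch of stochastic-OT evaluation with a GLA-style update.
# Constraints, candidates and violation counts are hypothetical.

ranking = {"Max": 100.0, "Dep": 100.0, "*Coda": 100.0}  # ranking values

# violations[candidate][constraint] = number of violation marks
violations = {
    "pat": {"Max": 0, "Dep": 0, "*Coda": 1},
    "pa":  {"Max": 1, "Dep": 0, "*Coda": 0},
}

def evaluate(noise=2.0):
    """Pick the optimal candidate under a noisily perturbed total ranking."""
    points = {c: r + random.gauss(0.0, noise) for c, r in ranking.items()}
    order = sorted(points, key=points.get, reverse=True)
    # Optimality: lexicographically fewest violations down the hierarchy.
    return min(violations, key=lambda cand: [violations[cand][c] for c in order])

def gla_update(winner, plasticity=0.1):
    """On an error, nudge ranking values toward the observed winner."""
    loser = evaluate()
    if loser == winner:
        return  # no error, no update
    for c in ranking:
        if violations[loser][c] > violations[winner][c]:
            ranking[c] += plasticity  # promote constraints favouring the winner
        elif violations[loser][c] < violations[winner][c]:
            ranking[c] -= plasticity  # demote constraints favouring the loser

for _ in range(1000):
    gla_update("pa")  # the learner repeatedly observes the coda-less form
```

After enough updates the ranking value of *Coda drifts above that of Max, so the learner's errors on this datum die out, illustrating the gradual, error-driven character of the algorithm.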


Author(s):  
Karen Jesney

Many error-driven learning algorithms for constraint-based phonological grammars, including the Gradual Learning Algorithm for Optimality Theory and Harmonic Grammar, predict that more frequent input forms will be acquired earlier than less frequent input forms – a fact that has been commonly taken as a virtue of these models. These models also predict, however, that the rate of learning for more frequent input forms should be faster than the rate of learning for less frequent input forms. In other words, these models predict that sequence and rate of acquisition are related; structures acquired earlier in the course of learning will be acquired more rapidly, while those that are acquired relatively later will be acquired more slowly. This paper explicates these predictions and argues that they are not consistently supported by child language data. Evidence from six children’s acquisition of consonant clusters is presented, demonstrating that, contrary to the predictions of the learning models, learning sequence and rate of acquisition are largely dissociated.


2013 ◽  
Vol 44 (4) ◽  
pp. 569-609 ◽  
Author(s):  
Giorgio Magri

Various authors have recently endorsed Harmonic Grammar (HG) as a replacement for Optimality Theory (OT). One argument for this move is that OT seems not to have close correspondents within machine learning while HG allows methods and results from machine learning to be imported into computational phonology. Here, I prove that this argument in favor of HG and against OT is wrong. In fact, I show that any algorithm for HG can be turned into an algorithm for OT. Hence, HG has no computational advantages over OT. This result allows tools from machine learning to be systematically adapted to OT. As an illustration of this new toolkit for computational OT, I prove convergence for a slight variant of Boersma’s (1998) (nonstochastic) Gradual Learning Algorithm.
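The contrast at issue in the abstract above is between HG's weighted-sum evaluation and OT's strict (lexicographic) domination. A small sketch over invented violation profiles shows how the two modes can pick different winners from the same tableau; the constraints, weights and candidates are illustrative, not from the article:

```python
# Contrast of HG (weighted-sum) and OT (lexicographic) evaluation
# over the same hypothetical violation profiles.

constraints = ["C1", "C2"]          # OT: ranked left-to-right
weights = {"C1": 2.0, "C2": 1.0}    # HG: numeric weights

candidates = {
    "a": {"C1": 0, "C2": 3},  # many violations of the low-ranked constraint
    "b": {"C1": 1, "C2": 0},  # one violation of the high-ranked constraint
}

def ot_winner():
    # Strict domination: compare violation vectors lexicographically.
    return min(candidates, key=lambda k: [candidates[k][c] for c in constraints])

def hg_winner():
    # Least total weighted penalty wins.
    return min(candidates,
               key=lambda k: sum(weights[c] * candidates[k][c] for c in constraints))
```

Under OT the top-ranked C1 decides alone, so "a" wins; under HG the three C2 violations gang up (penalty 3.0 versus 2.0), so "b" wins. It is exactly this gap that makes the reduction of HG algorithms to OT algorithms a nontrivial result.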


2009 ◽  
Vol 40 (4) ◽  
pp. 667-686 ◽  
Author(s):  
Paul Boersma

This article shows that Error-Driven Constraint Demotion (EDCD), an error-driven learning algorithm proposed by Tesar (1995) for Prince and Smolensky's (1993/2004) version of Optimality Theory, can fail to converge to a correct totally ranked hierarchy of constraints, unlike the earlier non-error-driven learning algorithms proposed by Tesar and Smolensky (1993). The cause of the problem is found in Tesar's use of “mark-pooling ties,” indicating that EDCD can be repaired by assuming Anttila's (1997) “permuting ties” instead. Proofs show, and simulations confirm, that totally ranked hierarchies can indeed be found by both this repaired version of EDCD and Boersma's (1998) Minimal Gradual Learning Algorithm.


2005 ◽  
Vol 38 ◽  
pp. 187
Author(s):  
Jason Mattausch

The purpose of this dissertation is to defend the idea that the empirical responsibilities of binding theory can be handled in a more psychologically and historically realistic way when assigned to the field of pragmatics. In particular, I wish to show that Optimality Theory (OT) (Prince & Smolensky, 1993), the stochastic OT and Gradual Learning Algorithm of Boersma (1998), the Recoverability OT of Wilson (2001) and Buchwald et al. (2002), and the bidirectional OT of Blutner (2000b) and Bidirectional Gradual Learning Algorithm of Jäger (2003a) can all participate in a formal framework in which one can spell out and justify the idea that the distributional behavior of bound pronouns and reflexives is a pragmatic phenomenon.


Author(s):  
Vsevolod Kapatsinski

Russian velar palatalization changes velars into alveopalatals before certain suffixes, including the stem extension -i and the diminutive suffixes -ok and -ek/ik. While velar palatalization always applies before the relevant suffixes in the established lexicon, it often fails with nonce loanwords before -i and -ik but not before -ok or -ek. This is shown to be predicted by three models: the Minimal Generalization Learner (MGL), a model of rule induction and weighting developed by Albright and Hayes (Cognition 90: 119–161, 2003); a novel version of Network Theory (Bybee, Morphology: A study of the relation between meaning and form, John Benjamins, 1985; Phonology and language use, Cambridge University Press, 2001), which uses competing unconditional product-oriented schemas weighted by type frequency and paradigm uniformity constraints; and stochastic Optimality Theory with language-specific constraints learned using the Gradual Learning Algorithm (GLA; Boersma, Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam 21: 43–58, 1997). The successful models are shown to predict that a morphophonological rule will fail if the triggering suffix comes to attach to inputs that are not eligible to undergo the rule. This prediction is confirmed in an artificial grammar learning experiment. Under either model, the choice between generalizations or output forms is shown to be stochastic, which requires retrieving known word-forms from the lexicon as wholes, rather than generating them through the grammar. Furthermore, MGL and GLA are shown to succeed only if the suffix and the stem shape are chosen simultaneously, as opposed to the suffix being chosen first and then triggering (or failing to trigger) a stem change.
In addition, the GLA is shown to require output-output faithfulness to be ranked above markedness at the beginning of learning (Hayes, Phonological acquisition in Optimality Theory: the early stages, Cambridge University Press, 2004) to account for the present data.


Phonology ◽  
2020 ◽  
Vol 37 (3) ◽  
pp. 383-418
Author(s):  
Shigeto Kawahara

An experiment showed that Japanese speakers’ judgement of Pokémons’ evolution status on the basis of nonce names is affected both by mora count and by the presence of a voiced obstruent. The effects of mora count are a case of counting cumulativity, and the interaction between the two factors a case of ganging-up cumulativity. Together, the patterns result in what Hayes (2020) calls ‘wug-shaped curves’, a quantitative signature predicted by MaxEnt. I show in this paper that the experimental results can indeed be successfully modelled with MaxEnt, and also that Stochastic Optimality Theory faces an interesting set of challenges. The study was inspired by a proposal made within formal phonology, and reveals important previously understudied aspects of sound symbolism. In addition, it demonstrates how cumulativity is manifested in linguistic patterns. The work here shows that formal phonology and research on sound symbolism can be mutually beneficial.
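The MaxEnt grammar referred to above assigns each candidate a probability proportional to the exponential of its (negative) weighted violation sum, which is what yields smooth cumulative effects. A generic sketch, with invented constraint names, weights and violation counts (none taken from the article):

```python
import math

# Generic MaxEnt grammar: P(candidate) ∝ exp(-Σ w_i · violations_i).
# Weights and violation profiles below are purely illustrative.

weights = {"*VoicedObs": 2.0, "*LongName": 1.0}

def maxent_probs(candidates):
    """Map {candidate: {constraint: violations}} to a probability distribution."""
    harmony = {k: -sum(weights[c] * n for c, n in v.items())
               for k, v in candidates.items()}
    z = sum(math.exp(h) for h in harmony.values())
    return {k: math.exp(h) / z for k, h in harmony.items()}

probs = maxent_probs({
    "evolved":   {"*VoicedObs": 0, "*LongName": 0},
    "unevolved": {"*VoicedObs": 1, "*LongName": 1},
})
# Each additional violated constraint multiplies the odds down by exp(weight),
# so violations gang up multiplicatively in the odds: cumulativity falls out
# of the model rather than being stipulated.
```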


Author(s):  
S N Huang ◽  
K K Tan ◽  
T H Lee

A novel iterative learning controller for linear time-varying systems is developed. The learning law is derived on the basis of a quadratic criterion. This control scheme does not include package information. The advantage of the proposed learning law is that convergence is guaranteed without the need for an empirical choice of parameters. Furthermore, the tracking error on the final iteration will be a class K function of the bounds on the uncertainties. Finally, simulation results show that the proposed controller achieves good setpoint tracking performance.
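Iterative learning control of the kind described above refines the input signal from one trial to the next using the previous trial's tracking error. The sketch below is a generic P-type update u_{k+1} = u_k + L·e_k on a trivially simple scalar plant, not the quadratic-criterion law of the article; plant, gains and trial counts are invented:

```python
# Minimal iterative-learning-control sketch on a scalar, static plant y = b·u.
# Generic P-type law u_{k+1} = u_k + L·e_k (NOT the article's learning law).

N = 20                  # samples per trial
ref = [1.0] * N         # setpoint to track on every iteration
b = 0.5                 # plant input gain
L = 1.0                 # learning gain; converges since |1 - L*b| < 1

u = [0.0] * N
for iteration in range(30):
    y = [b * ui for ui in u]                    # run one trial
    e = [r - yi for r, yi in zip(ref, y)]       # tracking error of this trial
    u = [ui + L * ei for ui, ei in zip(u, e)]   # learn from the error

max_error = max(abs(r - b * ui) for r, ui in zip(ref, u))
```

Here the per-sample error contracts by the factor |1 − L·b| = 0.5 on every iteration, so after 30 trials the tracking error is negligible; guaranteeing such contraction without hand-tuning L is exactly the kind of property the article establishes for its quadratic-criterion law.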


2011 ◽  
Vol 121-126 ◽  
pp. 4239-4243 ◽  
Author(s):  
Du Jou Huang ◽  
Yu Ju Chen ◽  
Huang Chu Huang ◽  
Yu An Lin ◽  
Rey Chue Hwang

Chromatic aberration estimation for touch panel (TP) film using neural networks is presented in this paper. Neural networks trained with the error back-propagation (BP) learning algorithm were used to capture the complex relationship between chromatic aberration, i.e., L*a*b* values, and the relevant parameters of the TP decoration film. The aim is to develop an artificial intelligence (AI) estimator, based on a neural model, for estimating this physical property of TP film. The simulation results show that the estimates of the chromatic aberration of TP film are very accurate. In other words, such an AI estimator is promising and has commercial potential.
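The estimator described above is a standard feed-forward network fitted by error back-propagation. A tiny from-scratch sketch of that training scheme on synthetic "process parameter → colour value" data follows; the network size, learning rate and data are invented, and the target here is a simple synthetic function rather than real L*a*b* measurements:

```python
import math, random

# Tiny one-hidden-layer regression network trained with error back-propagation.
# All data and hyperparameters are invented for illustration.

random.seed(0)
n_in, n_hid = 2, 4
w1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
w2 = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return h, sum(w * hi for w, hi in zip(w2, h))  # linear output unit

def train_step(x, target, lr=0.1):
    h, y = forward(x)
    err = y - target                                 # dE/dy for E = err²/2
    for j in range(n_hid):
        delta_h = err * w2[j] * h[j] * (1 - h[j])    # error propagated back
        for i in range(n_in):
            w1[j][i] -= lr * delta_h * x[i]          # hidden-layer update
        w2[j] -= lr * err * h[j]                     # output-layer update

# Synthetic samples: a smooth function of two "process parameters".
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.5),
        ([1.0, 0.0], 0.3), ([1.0, 1.0], 0.8)]
for _ in range(5000):
    for x, t in data:
        train_step(x, t)

mse = sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```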


Author(s):  
Caroline R. Wiltshire

This study uses data from Indian English as a second language, spoken by speakers of five first languages, to illustrate and evaluate the role of the emergence of the unmarked (TETU) in phonological theory. The analysis focusses on word-final consonant devoicing and cluster reduction, for which the five Indian first languages have various constraints, while Indian English is relatively unrestricted. Variation in L2 Indian Englishes results from both transfer of L1 phonotactics and the emergence of the unmarked, accounted for within Optimality Theory. The use of a learning algorithm also allows us to test the relative importance of markedness and frequency and to evaluate the relative markedness of various clusters. Thus, data from Indian Englishes provides insight into the form and function of markedness constraints, as well as the mechanisms of Second Language Acquisition (SLA).

