scholarly journals Improving Statistical Linguistic Algorithms for Parsing Mathematics

10.29007/8c2m ◽  
2018 ◽  
Author(s):  
Cezary Kaliszyk ◽  
Josef Urban ◽  
Jiri Vyskocil

In this paper we describe our combined statistical/semantic parsing method based on the CYK chart-parsing algorithm augmented with limited internal typechecking and external ATP filtering. This method was previously evaluated on parsing ambiguous mathematical expressions over the informalized Flyspeck corpus of 20000 theorems. We first discuss the motivation and drawbacks of the first version of the CYK-based component of the algorithm, and then we propose and implement a more sophisticated approach based on better statistical model of mathematical data structures.

2019 ◽  
Author(s):  
Nick Papoulias

Background. Context-free grammars (CFGs) and Parsing-expression Grammars (PEGs) are the two main formalisms used by formal specifications and parsing frameworks to describe programming languages. They mainly differ in the definition of the choice operator, describing language alternatives. CFGs support the use of non-deterministic choice (i.e., unordered choice), where all alternatives are equally explored. PEGs support a deterministic choice (i.e., ordered choice), where alternatives are explored in strict succession. In practice the two formalisms, are used through concrete classes of parsing algorithms (such as Left-to-right, rightmost derivation (LR) for CFGs and Packrat parsing for PEGs), that follow the semantics of the formal operators. Problem Statement. Neither the two formalisms, nor the accompanying algorithms are sufficient for a complete description of common cases arising in language design. In order to properly handle ambiguity, recursion, precedence or associativity, parsing frameworks either introduce implementation specific directives or ask users to refactor their grammars to fit the needs of the framework/algorithm/formalism combo. This introduces significant complexity even in simple cases and results in incompatible grammar specifications. Our Proposal. We introduce Multi-Ordered Grammars (MOGs) as an alternative to the CFG and PEG formalisms. MOGs aim for a better exploration of ambiguity, ordering, recursion and associativity during language design. This is achieved by (a) allowing both deterministic and non-deterministic choices to co-exist, and (b) introducing a form of recursive and scoped ordering. The formalism is accompanied by a new parsing algorithm (Gray) that extends chart parsing (normally used for Natural Language Processing) with the proposed MOG operators. Results. We conduct two case-studies to assess the expressiveness of MOGs, compared to CFGs and PEGs. The first consists of two idealized examples from literature (an expression grammar and a simple procedural language). The second examines a real-world case (the entire Smalltalk grammar and eleven new Smalltalk extensions) probing the complexities of practical needs. We show that in comparison, MOGs are able to reduce complexity and naturally express language constructs, without resorting to implementation specific directives. Conclusion. We conclude that combining deterministic and non-deterministic choices in a single grammar specification is indeed not only possible but also beneficial. Moreover, augmented by operators for recursive and scoped ordering the resulting multi-ordered formalism presents a viable alternative to both CFGs and PEGs. Concrete implementations of MOGs can be constructed by extending chart parsing with MOG operators for recursive and scoped ordering.


2014 ◽  
Author(s):  
Michael M Monagan

The principal data structure in Maple used to represent polynomials and general mathematical expressions involving functions like sqrt(x), sin x, exp(2x), y'(x) etc., is known to the Maple developers as the sum-of-products data structure. Gaston Gonnet, as the primary author of the Maple kernel, designed and implemented this data structure in the early 80s. As part of the process of simplifying a mathematical formula, he represented every Maple object and every sub-object uniquely in memory. This makes testing for equality, which is used in many operations, very fast. In this article, on occasion of Gaston's retirement, we present details of his design, its pros and cons, and changes we have made to it over the years. One of the cons is the sum-of-products data structure is not nearly as efficient for multiplying multivariate polynomials as other special purpose computer algebra systems. We describe the new data structure called POLY which we added to Maple 17 (released 2013) to improve performance for polynomials in Maple, and recent work done for Maple 18 (released 2014).


2016 ◽  
Vol 27 (2) ◽  
pp. 480-489 ◽  
Author(s):  
Moonseong Heo ◽  
Namhee Kim ◽  
Michael L Rinke ◽  
Judith Wylie-Rosett

Stepped-wedge (SW) designs have been steadily implemented in a variety of trials. A SW design typically assumes a three-level hierarchical data structure where participants are nested within times or periods which are in turn nested within clusters. Therefore, statistical models for analysis of SW trial data need to consider two correlations, the first and second level correlations. Existing power functions and sample size determination formulas had been derived based on statistical models for two-level data structures. Consequently, the second-level correlation has not been incorporated in conventional power analyses. In this paper, we derived a closed-form explicit power function based on a statistical model for three-level continuous outcome data. The power function is based on a pooled overall estimate of stratified cluster-specific estimates of an intervention effect. The sampling distribution of the pooled estimate is derived by applying a fixed-effect meta-analytic approach. Simulation studies verified that the derived power function is unbiased and can be applicable to varying number of participants per period per cluster. In addition, when data structures are assumed to have two levels, we compare three types of power functions by conducting additional simulation studies under a two-level statistical model. In this case, the power function based on a sampling distribution of a marginal, as opposed to pooled, estimate of the intervention effect performed the best. Extensions of power functions to binary outcomes are also suggested.


2014 ◽  
Author(s):  
Michael M Monagan

The principal data structure in Maple used to represent polynomials and general mathematical expressions involving functions like sqrt(x), sin x, exp(2x), y'(x) etc., is known to the Maple developers as the sum-of-products data structure. Gaston Gonnet, as the primary author of the Maple kernel, designed and implemented this data structure in the early 80s. As part of the process of simplifying a mathematical formula, he represented every Maple object and every sub-object uniquely in memory. This makes testing for equality, which is used in many operations, very fast. In this article, on occasion of Gaston's retirement, we present details of his design, its pros and cons, and changes we have made to it over the years. One of the cons is the sum-of-products data structure is not nearly as efficient for multiplying multivariate polynomials as other special purpose computer algebra systems. We describe the new data structure called POLY which we added to Maple 17 (released 2013) to improve performance for polynomials in Maple, and recent work done for Maple 18 (released 2014).


2019 ◽  
Author(s):  
Nick Papoulias

Background. Context-free grammars (CFGs) and Parsing-expression Grammars (PEGs) are the two main formalisms used by formal specifications and parsing frameworks to describe programming languages. They mainly differ in the definition of the choice operator, describing language alternatives. CFGs support the use of non-deterministic choice (i.e., unordered choice), where all alternatives are equally explored. PEGs support a deterministic choice (i.e., ordered choice), where alternatives are explored in strict succession. In practice the two formalisms, are used through concrete classes of parsing algorithms (such as Left-to-right, rightmost derivation (LR) for CFGs and Packrat parsing for PEGs), that follow the semantics of the formal operators. Problem Statement. Neither the two formalisms, nor the accompanying algorithms are sufficient for a complete description of common cases arising in language design. In order to properly handle ambiguity, recursion, precedence or associativity, parsing frameworks either introduce implementation specific directives or ask users to refactor their grammars to fit the needs of the framework/algorithm/formalism combo. This introduces significant complexity even in simple cases and results in incompatible grammar specifications. Our Proposal. We introduce Multi-Ordered Grammars (MOGs) as an alternative to the CFG and PEG formalisms. MOGs aim for a better exploration of ambiguity, ordering, recursion and associativity during language design. This is achieved by (a) allowing both deterministic and non-deterministic choices to co-exist, and (b) introducing a form of recursive and scoped ordering. The formalism is accompanied by a new parsing algorithm (Gray) that extends chart parsing (normally used for Natural Language Processing) with the proposed MOG operators. Results. We conduct two case-studies to assess the expressiveness of MOGs, compared to CFGs and PEGs. The first consists of two idealized examples from literature (an expression grammar and a simple procedural language). The second examines a real-world case (the entire Smalltalk grammar and eleven new Smalltalk extensions) probing the complexities of practical needs. We show that in comparison, MOGs are able to reduce complexity and naturally express language constructs, without resorting to implementation specific directives. Conclusion. We conclude that combining deterministic and non-deterministic choices in a single gram- mar specification is indeed not only possible but also beneficial. Moreover, augmented by operators for recursive and scoped ordering the resulting multi-ordered formalism presents a viable alternative to both CFGs and PEGs. Concrete implementations of MOGs can be constructed by extending chart parsing with MOG operators for recursive and scoped ordering.


2013 ◽  
Vol E96.D (1) ◽  
pp. 93-101
Author(s):  
Meixun JIN ◽  
Yong-Hun LEE ◽  
Jong-Hyeok LEE

2020 ◽  
Vol 34 (05) ◽  
pp. 8952-8959
Author(s):  
Yawei Sun ◽  
Lingling Zhang ◽  
Gong Cheng ◽  
Yuzhong Qu

Semantic parsing transforms a natural language question into a formal query over a knowledge base. Many existing methods rely on syntactic parsing like dependencies. However, the accuracy of producing such expressive formalisms is not satisfying on long complex questions. In this paper, we propose a novel skeleton grammar to represent the high-level structure of a complex question. This dedicated coarse-grained formalism with a BERT-based parsing algorithm helps to improve the accuracy of the downstream fine-grained semantic parsing. Besides, to align the structure of a question with the structure of a knowledge base, our multi-strategy method combines sentence-level and word-level semantics. Our approach shows promising performance on several datasets.


2019 ◽  
Author(s):  
Nick Papoulias

Background. Context-free grammars (CFGs) and Parsing-expression Grammars (PEGs) are the two main formalisms used by formal specifications and parsing frameworks to describe programming languages. They mainly differ in the definition of the choice operator, describing language alternatives. CFGs support the use of non-deterministic choice (i.e., unordered choice), where all alternatives are equally explored. PEGs support a deterministic choice (i.e., ordered choice), where alternatives are explored in strict succession. In practice the two formalisms, are used through concrete classes of parsing algorithms (such as Left-to-right, rightmost derivation (LR) for CFGs and Packrat parsing for PEGs), that follow the semantics of the formal operators. Problem Statement. Neither the two formalisms, nor the accompanying algorithms are sufficient for a complete description of common cases arising in language design. In order to properly handle ambiguity, recursion, precedence or associativity, parsing frameworks either introduce implementation specific directives or ask users to refactor their grammars to fit the needs of the framework/algorithm/formalism combo. This introduces significant complexity even in simple cases and results in incompatible grammar specifications. Our Proposal. We introduce Multi-Ordered Grammars (MOGs) as an alternative to the CFG and PEG formalisms. MOGs aim for a better exploration of ambiguity, ordering, recursion and associativity during language design. This is achieved by (a) allowing both deterministic and non-deterministic choices to co-exist, and (b) introducing a form of recursive and scoped ordering. The formalism is accompanied by a new parsing algorithm (Gray) that extends chart parsing (normally used for Natural Language Processing) with the proposed MOG operators. Results. We conduct two case-studies to assess the expressiveness of MOGs, compared to CFGs and PEGs. The first consists of two idealized examples from literature (an expression grammar and a simple procedural language). The second examines a real-world case (the entire Smalltalk grammar and eleven new Smalltalk extensions) probing the complexities of practical needs. We show that in comparison, MOGs are able to reduce complexity and naturally express language constructs, without resorting to implementation specific directives. Conclusion. We conclude that combining deterministic and non-deterministic choices in a single grammar specification is indeed not only possible but also beneficial. Moreover, augmented by operators for recursive and scoped ordering the resulting multi-ordered formalism presents a viable alternative to both CFGs and PEGs. Concrete implementations of MOGs can be constructed by extending chart parsing with MOG operators for recursive and scoped ordering.


Sign in / Sign up

Export Citation Format

Share Document