Solving string constraints with Regex-dependent functions through transducers with priorities and variables

Taolue Chen; Alejandro Flores-Lamas; Matthew Hague; Zhilei Han; Denghang Hu; Shuanglong Kan; Anthony W. Lin; Philipp Rümmer; Zhilin Wu

doi:10.1145/3498707


Proceedings of the ACM on Programming Languages ◽

10.1145/3498707 ◽

2022 ◽

Vol 6 (POPL) ◽

pp. 1-31

Author(s):

Taolue Chen ◽

Alejandro Flores-Lamas ◽

Matthew Hague ◽

Zhilei Han ◽

Denghang Hu ◽

...

Keyword(s):

Programming Languages ◽

Formal Language ◽

Formal Semantics ◽

Symbolic Execution ◽

Language Theory ◽

Regular Expressions ◽

Constraint Language ◽

Straight Line ◽

Finite State ◽

Lazy Semantics

Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx), such as those of JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes calls for string solvers natively supporting important features of RegEx, such a string solver has hitherto been missing. In this paper, we propose the first string theory and string solver that natively provides such support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which have previously been introduced in prioritized finite-state automata to capture greedy/lazy semantics, with string variables as in streaming string transducers to model capturing groups. We validate the consistency of the formal semantics with the actual JavaScript semantics by extensive experiments. Furthermore, to solve the string constraints, we show that PSSTs enjoy nice closure and algorithmic properties, in particular, the regularity-preserving property (i.e., pre-images of regular constraints under PSSTs are regular), and introduce a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images. Although the satisfiability of the string constraint language is generally undecidable, we show that our approach is complete for the so-called straight-line fragment. We evaluate the performance of our string solver on over 195,000 string constraints generated from an open-source RegEx library. The experimental results show the efficacy of our approach, drastically improving on the existing methods (via symbolic execution) in both precision and efficiency.
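The greedy/lazy distinction and the capturing groups mentioned in this abstract can be illustrated with a minimal sketch. It uses Python's standard `re` module as a stand-in for the JavaScript RegEx engine the paper targets, which is an assumption of this example and not part of the paper; the sample strings and variable names are likewise hypothetical.

```python
import re

text = "<a><b>"

# Greedy Kleene star: the group takes as much as possible,
# then backtracks until the final ">" can still match.
greedy = re.match(r"<(.*)>", text)

# Lazy Kleene star: the group takes as little as possible.
lazy = re.match(r"<(.*?)>", text)

greedy_capture = greedy.group(1)
lazy_capture = lazy.group(1)

# A RegEx-dependent string function: the replacement string
# refers back to the capturing groups of the pattern.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(greedy_capture)  # a><b
print(lazy_capture)    # a
print(swapped)         # value=key
```

The same pattern yields different captured values depending only on the priority given to the star, which is the kind of behaviour a string solver needs to model in order to reason about such functions.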


Logic and the Generative Power of Autosegmental Phonology

Proceedings of the Annual Meetings on Phonology ◽

10.3765/amp.v1i1.4 ◽

2014 ◽

Vol 1 (1) ◽

Cited By ~ 3

Author(s):

Adam Jardine

Keyword(s):

Formal Language ◽

Second Order ◽

Order Logic ◽

Language Theory ◽

Preliminary Conclusion ◽

Autosegmental Phonology ◽

Finite State ◽

Regular Sets ◽

Second Order Logic ◽

Monadic Second Order Logic

Autosegmental Phonology is studied in the framework of Formal Language Theory, which classifies the computational complexity of patterns. In contrast to previous computational studies of Autosegmental Phonology, which were mainly concerned with finite-state implementations of the formalism, a methodology for a model-theoretic study of autosegmental diagrams with monadic second-order logic is introduced. Monadic second-order logic provides a mathematically rigorous way of studying autosegmental formalisms, and its complexity is well understood. The preliminary conclusion is that autosegmental diagrams which conform to the well-formedness constraints defined here likely describe at most regular sets of strings.


Interactive Teaching of Programming Language Theory with a Proof Assistant

Central-European Journal of New Technologies in Research, Education and Practice ◽

10.36427/cejntrep.2.1.470 ◽

2020 ◽

pp. 19-33

Author(s):

Péter Bereczky ◽

István Donkó ◽

Dániel Horpácsi ◽

Ambrus Kaposi ◽

Dávid János Németh

Keyword(s):

Programming Languages ◽

Programming Language ◽

Learning Experience ◽

Formal Semantics ◽

Type Systems ◽

Language Theory ◽

Interactive Methods ◽

Proof Assistant ◽

Track Record ◽

New Material

The teaching of programming language theory has a long track record at the ELTE Faculty of Informatics. Traditionally, formal semantics and type systems of programming languages, like other theory-oriented subjects, were taught with the pen-and-paper method. However, modern proof assistants call for replacing this old-fashioned way of teaching with novel and interactive methods that bring deeper understanding, provide a better learning experience and build technical skills in applying formal methods. The authors have launched practice classes for two programming language theory subjects and carefully developed course material based on executable and verifiable definitions formalised in the Coq proof assistant. In this paper, we share our experiences regarding the design and implementation of the new material, we outline the pros and cons of using a proof assistant in the courses, and we describe how the presented method may be adapted to other courses.


From Two-Way Transducers to Regular Function Expressions

International Journal of Foundations of Computer Science ◽

10.1142/s0129054120410087 ◽

2020 ◽

Vol 31 (06) ◽

pp. 843-873

Author(s):

Nicolas Baudru ◽

Pierre-Alain Reynier

Keyword(s):

Regular Function ◽

Language Theory ◽

Fundamental Result ◽

Regular Word ◽

Finite State Automata ◽

Regular Expressions ◽

Important Class ◽

Graph Transformations ◽

Finite State ◽

Effective Translation

Transducers constitute a fundamental extension of automata. The class of regular word functions has recently emerged as an important class of word-to-word functions, characterized by means of (functional, or unambiguous, or deterministic) two-way transducers, copyless streaming string transducers, and MSO-definable graph transformations. A fundamental result in language theory is Kleene’s Theorem, relating finite state automata and regular expressions. Recently, a set of regular function expressions has been introduced and used to prove a similar result for regular word functions, by showing its equivalence with copyless streaming string transducers. In this paper, we propose a direct, simplified and effective translation from unambiguous two-way transducers to regular function expressions extending the Brzozowski and McCluskey algorithm. In addition, our approach allows us to derive a subset of regular function expressions characterizing the (strict) subclass of functional sweeping transducers.


Generalisations of DNA Splicing Systems with One Palindromic Restriction Enzyme

MATEMATIKA ◽

10.11113/matematika.v34.n1.1011 ◽

2018 ◽

Vol 34 (1) ◽

pp. 59-71 ◽

Cited By ~ 1

Author(s):

Fong Wan Heng ◽

Nurul Izzaty Ismail

Keyword(s):

Restriction Enzyme ◽

Formal Language ◽

Restriction Enzymes ◽

Potential Effect ◽

Language Theory ◽

Regular Expressions ◽

Nitrogen Bases ◽

Restriction Sites ◽

Splicing Systems ◽

Dna Splicing

In a DNA splicing system, the potential effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and re-associated to produce further molecules is modelled mathematically. This modelling is done in the framework of formal language theory, in which the nitrogen bases, nucleotides and restriction sites are modelled as alphabets, strings and rules respectively. The molecules resulting from a splicing system are depicted as the splicing language. In this research, the splicing languages resulting from DNA splicing systems with one palindromic restriction enzyme for one and two (non-overlapping) cutting sites are generalised as regular expressions.


The descriptional power of queue automata of constant length

Acta Informatica ◽

10.1007/s00236-021-00398-7 ◽

2021 ◽

Vol 58 (4) ◽

pp. 335-356

Author(s):

Sebastian Jakobi ◽

Katja Meckel ◽

Carlo Mereghetti ◽

Beatrice Palano

Keyword(s):

Blow Up ◽

Regular Expressions ◽

Constant Length ◽

Straight Line ◽

Constant Height ◽

Double Exponential ◽

Finite State ◽

Pushdown Automata ◽

Exponential Size ◽

Straight Line Programs

We consider the notion of a constant length queue automaton (i.e., a traditional queue automaton with a built-in constant limit on the length of its queue) as a formalism for representing regular languages. We show that the descriptional power of constant length queue automata greatly outperforms that of traditional finite state automata, of constant height pushdown automata, and of straight line programs for regular expressions, by providing optimal exponential and double-exponential size gaps. Moreover, we prove that constant height pushdown automata can be simulated by constant length queue automata at the cost of only a linear size increase, and that removing nondeterminism in constant length queue automata requires an optimal exponential size blow-up, against the optimal double-exponential cost for determinizing constant height pushdown automata. Finally, we investigate the size cost of implementing Boolean language operations on deterministic and nondeterministic constant length queue automata.


Formal Semantics and Abstract Properties of String Pattern Operations and Extended Formal Language Description Mechanisms

SIAM Journal on Computing ◽

10.1137/0212011 ◽

1983 ◽

Vol 12 (1) ◽

pp. 166-188 ◽

Cited By ~ 1

Author(s):

A. C. Fleck ◽

R. S. Limaye

Keyword(s):

Formal Language ◽

Formal Semantics ◽

String Pattern ◽

Language Description


THE DEVELOPMENT OF FORMAL LANGUAGE THEORY SINCE 1956

International Journal of Foundations of Computer Science ◽

10.1142/s0129054190000254 ◽

1990 ◽

Vol 01 (04) ◽

pp. 355-368

Author(s):

ROBERT McNAUGHTON

Keyword(s):

Formal Language ◽

Formal Languages ◽

Early Years ◽

Formal Language Theory ◽

Language Theory ◽

Personal Interest

This brief survey will discuss the early years of the theory of formal languages through about 1970, treating only the most fundamental of the concepts. The paper will conclude with a brief discussion of a small number of topics, the choice reflecting only the personal interest of the author.


Introduction to Formal Language Theory

Modeling and Control of Logical Discrete Event Systems ◽

10.1007/978-1-4615-2217-1_1 ◽

1995 ◽

pp. 1-34

Author(s):

Ratnesh Kumar ◽

Vijay K. Garg

Keyword(s):

Formal Language ◽

Formal Language Theory ◽

Language Theory


Theory of L systems: From the point of view of formal language theory

L Systems - Lecture Notes in Computer Science ◽

10.1007/3-540-06867-8_1 ◽

1974 ◽

pp. 1-23 ◽

Cited By ~ 12

Author(s):

G. Rozenberg

Keyword(s):

Formal Language ◽

Point Of View ◽

Formal Language Theory ◽

Language Theory ◽

L Systems


Automata for DNA Splicing Languages with Palindromic and Non-Palindromic Restriction Enzymes using Grammars

MATEMATIKA ◽

10.11113/matematika.v35.n4.1260 ◽

2019 ◽

Vol 35 (4) ◽

pp. 1-14

Author(s):

Wan Heng Fong ◽

Nurul Izzaty Ismail ◽

Nor Haniza Sarmin

Keyword(s):

Dna Sequences ◽

Formal Language ◽

Restriction Enzymes ◽

Language Theory ◽

Dna Assembly ◽

Dna Molecules ◽

Splicing Systems ◽

Transition Graphs ◽

Palindromic Sequences ◽

Dna Splicing

In a DNA splicing system, DNA molecules are cut and recombined in the presence of restriction enzymes and a ligase. The splicing system is analyzed via formal language theory, where the molecules resulting from the splicing system generate a language called a splicing language. In nature, DNA molecules can be read in two ways: forward and backward. A string that reads the same forward and backward is known as a palindrome. Palindromic and non-palindromic sequences can also be recognized in restriction enzymes. Research on splicing languages from DNA splicing systems with palindromic and non-palindromic restriction enzymes has been done previously. This research is motivated by the problem of DNA assembly to read millions of long DNA sequences, where the concepts of automata and grammars are applied in DNA splicing systems to simplify the assembly in short-read sequences. The splicing languages generated from DNA splicing systems with palindromic and non-palindromic restriction enzymes are deduced from the grammars, which are visualised as automata diagrams and presented by transition graphs, where transition labels represent the language of DNA molecules resulting from the respective DNA splicing systems.
