Solving string constraints with Regex-dependent functions through transducers with priorities and variables

2022 ◽  
Vol 6 (POPL) ◽  
pp. 1-31
Author(s):  
Taolue Chen ◽  
Alejandro Flores-Lamas ◽  
Matthew Hague ◽  
Zhilei Han ◽  
Denghang Hu ◽  
...  

Regular expressions are a classical concept in formal language theory. Regular expressions in programming languages (RegEx), such as those of JavaScript, feature non-standard semantics of operators (e.g. greedy/lazy Kleene star), as well as additional features such as capturing groups and references. While symbolic execution of programs containing RegExes calls for string solvers that natively support important features of RegEx, such a string solver has hitherto been missing. In this paper, we propose the first string theory and string solver that natively provide such support. The key idea of our string solver is to introduce a new automata model, called prioritized streaming string transducers (PSST), to formalize the semantics of RegEx-dependent string functions. PSSTs combine priorities, which have previously been introduced in prioritized finite-state automata to capture greedy/lazy semantics, with string variables as in streaming string transducers to model capturing groups. We validate the consistency of the formal semantics with the actual JavaScript semantics by extensive experiments. Furthermore, to solve the string constraints, we show that PSSTs enjoy nice closure and algorithmic properties, in particular the regularity-preserving property (i.e., pre-images of regular constraints under PSSTs are regular), and introduce a sound sequent calculus that exploits these properties and performs propagation of regular constraints by means of taking post-images or pre-images. Although the satisfiability of the string constraint language is generally undecidable, we show that our approach is complete for the so-called straight-line fragment. We evaluate the performance of our string solver on over 195,000 string constraints generated from an open-source RegEx library. The experimental results show the efficacy of our approach, which drastically improves on the existing methods (via symbolic execution) in both precision and efficiency.
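As a small illustration of the RegEx features named in this abstract (greedy versus lazy Kleene star, capturing groups, and references), the following sketch may help. It uses Python's standard `re` module as a stand-in, which is an assumption made for illustration only: the paper targets JavaScript semantics, and the sample strings and variable names here are hypothetical and not taken from the paper.

```python
import re

# Hypothetical sample input, chosen only to make the difference visible.
text = "<a><b>"

# Greedy star: the group consumes as much as possible, then backtracks
# just far enough for the final ">" to match.
greedy = re.match(r"<(.*)>", text)

# Lazy star: the group consumes as little as possible.
lazy = re.match(r"<(.*?)>", text)

print(greedy.group(1))  # a><b
print(lazy.group(1))    # a

# Capturing groups referenced in a replacement string (\1, \2):
# a RegEx-dependent string function of the kind a solver must model.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")
print(swapped)  # value=key
```

The same pattern yields different captured substrings depending only on the greedy or lazy choice, which is why a model of these functions needs some notion of priority among possible matches.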

Author(s):  
Adam Jardine

Autosegmental Phonology is studied in the framework of Formal Language Theory, which classifies the computational complexity of patterns. In contrast to previous computational studies of Autosegmental Phonology, which were mainly concerned with finite-state implementations of the formalism, a methodology for a model-theoretic study of autosegmental diagrams with monadic second-order logic is introduced. Monadic second-order logic provides a mathematically rigorous way of studying autosegmental formalisms, and its complexity is well understood. The preliminary conclusion is that autosegmental diagrams which conform to the well-formedness constraints defined here likely describe at most regular sets of strings.


Author(s):  
Péter Bereczky ◽  
István Donkó ◽  
Dániel Horpácsi ◽  
Ambrus Kaposi ◽  
Dávid János Németh

The teaching of programming language theory has a long track record at the ELTE Faculty of Informatics. Traditionally, formal semantics and type systems of programming languages, like other theory-oriented subjects, were taught with pen and paper. However, modern proof assistants call for replacing this old-fashioned way of teaching with novel and interactive methods that bring deeper understanding, provide a better learning experience and build technical skills in applying formal methods. The authors have launched practice classes for two programming language theory subjects and carefully developed course material based on executable and verifiable definitions formalised in the Coq proof assistant. In this paper, we share our experiences regarding the design and implementation of the new material, we outline the pros and cons of using a proof assistant in the courses, and we describe how the presented method may be adapted to other courses.


2020 ◽  
Vol 31 (06) ◽  
pp. 843-873
Author(s):  
Nicolas Baudru ◽  
Pierre-Alain Reynier

Transducers constitute a fundamental extension of automata. The class of regular word functions has recently emerged as an important class of word-to-word functions, characterized by means of (functional, or unambiguous, or deterministic) two-way transducers, copyless streaming string transducers, and MSO-definable graph transformations. A fundamental result in language theory is Kleene’s Theorem, relating finite state automata and regular expressions. Recently, a set of regular function expressions has been introduced and used to prove a similar result for regular word functions, by showing its equivalence with copyless streaming string transducers. In this paper, we propose a direct, simplified and effective translation from unambiguous two-way transducers to regular function expressions extending the Brzozowski and McCluskey algorithm. In addition, our approach allows us to derive a subset of regular function expressions characterizing the (strict) subclass of functional sweeping transducers.


MATEMATIKA ◽  
2018 ◽  
Vol 34 (1) ◽  
pp. 59-71
Author(s):  
Fong Wan Heng ◽  
Nurul Izzaty Ismail

In a DNA splicing system, the potential effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and re-associated to produce further molecules is modelled mathematically. This modelling is done in the framework of formal language theory, in which the nitrogen bases, nucleotides and restriction sites are modelled as alphabets, strings and rules respectively. The set of molecules resulting from a splicing system is depicted as the splicing language. In this research, the splicing languages resulting from DNA splicing systems with one palindromic restriction enzyme, for one and two (non-overlapping) cutting sites, are generalised as regular expressions.


2021 ◽  
Vol 58 (4) ◽  
pp. 335-356
Author(s):  
Sebastian Jakobi ◽  
Katja Meckel ◽  
Carlo Mereghetti ◽  
Beatrice Palano

We consider the notion of a constant length queue automaton, i.e., a traditional queue automaton with a built-in constant limit on the length of its queue, as a formalism for representing regular languages. We show that the descriptional power of constant length queue automata greatly outperforms that of traditional finite state automata, of constant height pushdown automata, and of straight line programs for regular expressions, by providing optimal exponential and double-exponential size gaps. Moreover, we prove that constant height pushdown automata can be simulated by constant length queue automata at the cost of only a linear size increase, and that removing nondeterminism in constant length queue automata requires an optimal exponential size blow-up, against the optimal double-exponential cost for determinizing constant height pushdown automata. Finally, we investigate the size cost of implementing Boolean language operations on deterministic and nondeterministic constant length queue automata.


1990 ◽  
Vol 01 (04) ◽  
pp. 355-368
Author(s):  
ROBERT McNAUGHTON

This brief survey will discuss the early years of the theory of formal languages through about 1970, treating only the most fundamental of the concepts. The paper will conclude with a brief discussion of a small number of topics, the choice reflecting only the personal interest of the author.


MATEMATIKA ◽  
2019 ◽  
Vol 35 (4) ◽  
pp. 1-14
Author(s):  
Wan Heng Fong ◽  
Nurul Izzaty Ismail ◽  
Nor Haniza Sarmin

In a DNA splicing system, DNA molecules are cut and recombined in the presence of restriction enzymes and a ligase. The splicing system is analyzed via formal language theory, where the molecules resulting from the splicing system generate a language called a splicing language. In nature, DNA molecules can be read in two ways: forward and backward. A string that reads the same forward and backward is known as a palindrome. Palindromic and non-palindromic sequences can also be recognized in restriction enzymes. Research on splicing languages from DNA splicing systems with palindromic and non-palindromic restriction enzymes has been done previously. This research is motivated by the problem of DNA assembly to read millions of long DNA sequences, where the concepts of automata and grammars are applied in DNA splicing systems to simplify the assembly in short-read sequences. The splicing languages generated from DNA splicing systems with palindromic and non-palindromic restriction enzymes are deduced from the grammars, which are visualised as automata diagrams and presented by transition graphs where transition labels represent the language of DNA molecules resulting from the respective DNA splicing systems.
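The palindrome notion in this abstract can be sketched in a few lines. The first function below follows the abstract's wording literally (a string equal to its own reversal). The second is an assumption added for illustration: in molecular biology a restriction site is usually called palindromic when it equals its own reverse complement, as with the EcoRI site GAATTC. The function names are hypothetical, and the abstract does not say which of the two senses the paper formalises.

```python
def is_palindrome(s: str) -> bool:
    """Literal sense from the abstract: reads the same forward and backward."""
    return s == s[::-1]


# Watson-Crick base pairing: A<->T and C<->G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_reverse_complement_palindrome(seq: str) -> bool:
    """Assumed biological sense: the sequence equals its reverse complement."""
    return seq == seq.translate(_COMPLEMENT)[::-1]


print(is_palindrome("ACGGCA"))                     # True
print(is_palindrome("GAATTC"))                     # False
print(is_reverse_complement_palindrome("GAATTC"))  # True
```

The two checks disagree on GAATTC, so it matters which definition a given splicing-system model adopts.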

