Space-time tradeoff in regular expression matching with semi-deterministic finite automata

Author(s):  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna
2009 ◽  
Vol 2009 ◽  
pp. 1-10 ◽  
Author(s):  
Yi-Hua E. Yang ◽  
Viktor K. Prasanna

We present a software toolchain for constructing large-scaleregular expression matching(REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance nondeterministic finite automata (RE-NFA). Each RE-NFA is described as an RTL regular expression matching engine (REME) in VHDL for FPGA implementation. Assuming a fixed number of fan-out transitions per state, ann-statem-bytes-per-cycle RE-NFA can be constructed inO(n×m)time andO(n×m)memory by our software. A large number of RE-NFAs are placed onto a two-dimensionalstaged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator which can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.


1977 ◽  
Vol 6 (82) ◽  
Author(s):  
Erik Meineche Schmidt

<p>The gain in succinctness of descriptions of regular languages when nondeterministic (unambiguous) finite automata are used rather than unambiguous (deterministic) finite automata, is not bounded by any polynomium.</p><p>The problem of deciding whether an unambiguous regular expression does not generate all words over its terminal alphabet, is in NP.</p>


Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 238
Author(s):  
Faissal Ouardi ◽  
Zineb Lotfi ◽  
Bilal Elghadyry

This paper describes a fast algorithm for constructing directly the equation automaton from the well-known Thompson automaton associated with a regular expression. Allauzen and Mohri have presented a unified construction of small automata and gave a construction of the equation automaton with time and space complexity in O(mlogm+m2), where m denotes the number of Thompson automaton transitions. It is based on two classical automata operations, namely epsilon-removal and Hopcroft’s algorithm for deterministic Finite Automata (DFA) minimization. Using the notion of c-continuation, Ziadi et al. presented a fast computation of the equation automaton in O(m2) time complexity. In this paper, we design an output-sensitive algorithm combining advantages of the previous algorithms and show that its computational complexity can be reduced to O(m×|Q≡e|), where |Q≡e| denotes the number of states of the equation automaton, by an epsilon-removal and Bubenzer minimization algorithm of an Acyclic Deterministic Finite Automata (ADFA).


2012 ◽  
Vol 263-266 ◽  
pp. 3108-3113
Author(s):  
Wei He ◽  
Yun Fei Guo ◽  
Hong Chao Hu

Fast data transmission put forward high requirements on network content matching (NCM). Due to the high time complexity, Nondeterministic Finite Automata (NFA) was unable to meet the demand of regular expression matching (REM) which was the core of NCM; Transfer NFA to Deterministic Finite Automaton (DFA) could enhance the throughput, but led to state explosion, which increased demand for memory. To balance memory and throughput, state explosion in the transformation from NFA to DFA has been analyzed and a new method DC-DFA is presented for large scale REM. DC-DFA is based on hybrid automata structure which composed of NFA and DFA. DC-DFA introduces GradeOne classification to cut the memory usage and deep classification to improve throughput. The results show that for serious state explosion, DC-DFA could reduce 75% DFA states and improve memory utilization efficiently while maintain high system throughput.


Sign in / Sign up

Export Citation Format

Share Document