Cookies on this website

We use cookies to ensure that we give you the best experience on our website. If you click 'Accept all cookies' we'll assume that you are happy to receive all cookies and you won't see this message again. If you click 'Reject all non-essential cookies' only necessary cookies providing core functionality such as security, network management, and accessibility will be enabled. Click 'Find out more' for information on how to change your cookie settings.

Modern deep packet inspection systems use regular expressions to define various patterns of interest in network data streams. Deterministic Finite Automata (DFA) are commonly used to parse regular expressions. DFAs are fast, but can require prohibitively large amounts of memory for patterns arising in network applications. Traditional DFA table compression only slightly reduces the memory required and requires an additional memory access per input character. Alternative representations of regular expressions, such as NFAs and Delayed Input DFAs (D 2FA) require less memory but sacrifice throughput. In this paper we introduce the Content Addressed Delayed Input DFA (CD 2FA), which provides a compact representation of regular expressions that match the throughput of traditional uncompressed DFAs. A CD 2FA addresses successive states of a D 2FA using their content, rather than a "content-less" identifier. This makes selected information available earlier in the state traversal process, which makes it possible to avoid unnecessary memory accesses. We demonstrate that such content-addressing can be effectively used to obtain automata that are very compact and can achieve high throughput. Specifically, we show that for an application using thousands of patterns defined by regular expressions, CD 2FAs use as little as 10% of the space required by a conventional compressed DFA, and match the throughput of an uncompressed DFA. Copyright 2006 ACM.

Original publication




Conference paper

Publication Date



81 - 92