Modern deep packet inspection systems use regular expressions to define various patterns of interest in network data streams. Deterministic Finite Automata (DFAs) are commonly used to parse regular expressions. DFAs are fast, but can require prohibitively large amounts of memory for patterns arising in network applications. Traditional DFA table compression only slightly reduces the memory required and requires an additional memory access per input character. Alternative representations of regular expressions, such as NFAs and Delayed Input DFAs (D²FAs), require less memory but sacrifice throughput. In this paper we introduce the Content Addressed Delayed Input DFA (CD²FA), which provides a compact representation of regular expressions that matches the throughput of traditional uncompressed DFAs. A CD²FA addresses successive states of a D²FA using their content, rather than a "content-less" identifier. This makes selected information available earlier in the state traversal process, which makes it possible to avoid unnecessary memory accesses. We demonstrate that such content-addressing can be effectively used to obtain automata that are very compact and can achieve high throughput. Specifically, we show that for an application using thousands of patterns defined by regular expressions, CD²FAs use as little as 10% of the space required by a conventional compressed DFA, and match the throughput of an uncompressed DFA. Copyright 2006 ACM.
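The idea of content addressing can be illustrated with a small sketch. It is not the paper's encoding: the `State` class, the three-state automaton, and the label format (a tuple of character sets along the default-transition chain) are all assumptions made for illustration. The sketch only shows why a state identifier that carries content lets a lookup skip intermediate states, while a plain D²FA may read several states per input character.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class State:
    labeled: Dict[str, int]        # char -> next state id (sparse)
    default: Optional[int] = None  # default transition; None for the root

# Hypothetical toy D2FA over the alphabet "abc". The root (state 0) has a
# labeled transition for every character, so every default chain terminates.
STATES = {
    0: State({"a": 1, "b": 0, "c": 0}),
    1: State({"b": 2}, default=0),
    2: State({"c": 2}, default=1),
}

def d2fa_step(states, s, ch):
    """Plain D2FA step: follow default transitions until a labeled one
    matches. Returns (next_state, number_of_states_read)."""
    accesses = 0
    while True:
        accesses += 1                 # one memory access per state read
        st = states[s]
        if ch in st.labeled:
            return st.labeled[ch], accesses
        s = st.default

def content_label(states, s):
    """Illustrative 'content' identifier: for each state on the default
    chain starting at s, the characters it has labeled transitions for."""
    label = []
    while s is not None:
        label.append((frozenset(states[s].labeled), s))
        s = states[s].default
    return tuple(label)

# Assumption: in a real design the label would be stored with each
# transition, so it arrives with the next-state pointer at no extra cost.
LABELS = {s: content_label(STATES, s) for s in STATES}

def cd2fa_step(states, label, ch):
    """Content-addressed step: the label alone tells us which state on the
    default chain holds the transition, so only that state is read."""
    for chars, sid in label:
        if ch in chars:
            return states[sid].labeled[ch], 1   # single memory access
    raise KeyError(ch)

if __name__ == "__main__":
    print(d2fa_step(STATES, 2, "a"))            # walks states 2, 1, 0
    print(cd2fa_step(STATES, LABELS[2], "a"))   # reads state 0 directly
```

In this toy automaton, consuming `a` from state 2 reads three states in the plain D²FA traversal but only one in the content-addressed version, and both reach the same next state.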

Original publication

DOI

10.1145/1185347.1185359

Type

Conference paper

Publication Date

01/12/2006

Pages

81 - 92