Advanced algorithms for fast and scalable deep packet inspection
Kumar S., Turner J., Williams J.
Modern deep packet inspection systems use regular expressions to define various patterns of interest in network data streams. Deterministic Finite Automata (DFAs) are commonly used to implement regular expression matching. DFAs are fast, but can require prohibitively large amounts of memory for the patterns that arise in network applications. Traditional DFA table compression reduces the memory required only slightly and requires an additional memory access per input character. Alternative representations of regular expressions, such as Nondeterministic Finite Automata (NFAs) and Delayed Input DFAs (D2FAs), require less memory but sacrifice throughput. In this paper we introduce the Content Addressed Delayed Input DFA (CD2FA), which provides a compact representation of regular expressions that matches the throughput of traditional uncompressed DFAs. A CD2FA addresses successive states of a D2FA by their content rather than by a "content-less" identifier. This makes selected information available earlier in the state traversal process, so that unnecessary memory accesses can be avoided. We demonstrate that such content addressing can be used effectively to obtain automata that are very compact and can achieve high throughput. Specifically, we show that for an application using thousands of patterns defined by regular expressions, CD2FAs use as little as 10% of the space required by a conventional compressed DFA, and match the throughput of an uncompressed DFA. Copyright 2006 ACM.
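To make the memory/throughput trade-off concrete, here is a minimal Python sketch under strong simplifying assumptions. The toy automaton for the pattern `ab`, all function names, and the rule that every non-root state defaults directly to a single root are hypothetical illustrations, not the paper's construction; a "memory access" is modeled as fetching one state's transition record. The D2FA stores fewer transitions but pays extra fetches when it follows default transitions. The content-addressed variant lets each handle (a state id plus the set of characters with labeled transitions at that state) decide whether to jump to the root before any fetch, restoring one fetch per character. It ignores the cost of storing those character sets and how a real CD2FA would encode them.

```python
# Toy model only: states are ints, the alphabet is {"a", "b", "c"},
# and one "memory access" means fetching one state's transition record.

# Hypothetical DFA for the pattern "ab" (state 2 = match).
DFA_TABLE = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 1, "b": 0, "c": 0},
}


def run_dfa(table, start, text):
    """Uncompressed DFA: exactly one record fetch per input character."""
    s, accesses = start, 0
    for ch in text:
        accesses += 1
        s = table[s][ch]
    return s, accesses


def compress_to_root(table, root):
    """Simplified D2FA (assumption): every non-root state defaults to `root`
    and keeps only the transitions that differ from the root's."""
    labeled, default = {}, {}
    for s, row in table.items():
        if s == root:
            labeled[s], default[s] = dict(row), None  # root keeps everything
        else:
            labeled[s] = {ch: t for ch, t in row.items() if table[root][ch] != t}
            default[s] = root
    return labeled, default


def run_d2fa(labeled, default, start, text):
    """Follow default transitions (consuming no input) until a labeled one matches."""
    s, accesses = start, 0
    for ch in text:
        while True:
            accesses += 1  # fetch the record of state s
            if ch in labeled[s]:
                s = labeled[s][ch]
                break
            s = default[s]  # default hop: ch is not consumed
    return s, accesses


def to_content_addressed(labeled):
    """Each stored transition carries the target's labeled-character set."""
    return {
        s: {ch: (t, frozenset(labeled[t])) for ch, t in row.items()}
        for s, row in labeled.items()
    }


def run_content_addressed(cd, labeled, root, start, text):
    """The handle already says whether ch is labeled at s, so a miss goes
    straight to the root without fetching s's record first."""
    handle = (start, frozenset(labeled[start]))
    accesses = 0
    for ch in text:
        s, chars = handle
        if ch not in chars:
            s = root  # decided from the handle alone, no fetch of s
        accesses += 1  # one fetch: the record of s (or of the root)
        handle = cd[s][ch]
    return handle[0], accesses


labeled, default = compress_to_root(DFA_TABLE, root=0)
cd = to_content_addressed(labeled)
print(run_dfa(DFA_TABLE, 0, "cabab"))                     # (2, 5)
print(run_d2fa(labeled, default, 0, "cabab"))             # (2, 6)
print(run_content_addressed(cd, labeled, 0, 0, "cabab"))  # (2, 5)
print(sum(len(r) for r in labeled.values()))              # 4 (vs. 9 in DFA_TABLE)
```

In this toy case the compressed form stores 4 of the 9 transitions, the D2FA needs one extra fetch on the input `cabab`, and the content-addressed run is back to one fetch per character. That last property relies on the simplification that every default points directly at the root.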