MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

1/6

Abstract - In the current scenario network security is emerg-

ing the world. Matching large sets of patterns against an incom-

ing stream of data is a fundamental task in several fields such

as network security or computational biology. High-speed net-

work intrusion detection systems (IDS) rely on efficient pattern

matching techniques to analyze the packet payload and make

decisions on the significance of the packet body. However,

matching the streaming payload bytes against thousands of

patterns at multi-gigabit rates is computationally intensive.

Various techniques have been proposed in past but the perfor-

mance of the system is reducing because of multi-gigabit rates.

Pattern matching is a significant issue in intrusion detection

systems, but by no means the only one. Handling multi-content

rules, reordering, and reassembling incoming packets are also

significant for system performance. We present two pattern

matching techniques to compare incoming packets against in-

trusion detection search patterns. The first approach, decoded

partial CAM (DpCAM), pre-decodes incoming characters,

aligns the decoded data, and performs logical AND on them to

produce the match signal for each pattern. The second ap-

proach, perfect hashing memory (PHmem), uses perfect hash-

ing to determine a unique memory location that contains the

search pattern and a comparison between incoming data and

memory output to determine the match. The suggested methodshave implemented in VHDL coding and we use Xilinx for syn-

thesis.

Keywords : DCAM, DpCAM ,Packet inspection, pattern matching,

perfect hashing.

I. INTRODUCTION

Matching large sets of patterns against an incom-

ing stream of data is a fundamental task in several

fields such as network security or computational biolo-

gy. For example, high-speed network intrusion detec-

tion systems (IDS) rely on efficient pattern matching

techniques to analyze the packet payload and makedecisions on the significance of the packet body. How-

ever, matching the streaming payload bytes against

thousands of patterns at multi-gigabit rates is computa-

tionally intensive.

On the other hand, hardware-based solutions can

significantly increase performance and achieve higher

throughput. Many hardware units have been proposed

for IDS pattern matching most of them in the area of

reconfigurable hardware. In general, field-

programmable gate arrays (FPGAs) are well suited

for this task, since designs can be customized for a par-ticular set of search patterns and updates to that set can

be performed via reconfiguration. FPGA-based systems

provide higher flexibility and comparable to ASICs

performance. Generally, the performance results of

FPGA systems are promising, showing that FPGAs can

be used to support the increasing needs for network

security. FPGAs are flexible, reconfigurable, pro-

vide hardware speed, and therefore, are suitable for

implementing such systems. On the other hand, there

are several issues that should be faced. Large designs

are complex and therefore hard to operate at high fre-quency. Additionally, matching a large number of pat-

terns has high area cost, so sharing logic is critical,

since it could save a significant amount of resources,

and make designs smaller and faster. Furthermore, the

performance of such designs is promising and indicates

that FPGAs can be used to support the increasing needs

for high-speed network security.

Pattern matching is a significant issue in intrusion

detection systems, but by no means the only one. Han-

dling multi content rules, reordering, and reassembling

incoming packets are also significant for system perfor-mance. We present two efficient pattern matching tech-

niques to analyze packet payloads at multi gigabit rates

and detect hazardous contents. The first one is Decoded

CAM (DCAM) and uses pre-decoding to exploit pat-

tern similarities and reduce the area cost of the designs.

We improve DCAM and decrease the required logic

resources by partially matching long patterns. The im-

proved approach is denoted as decoded partial CAM

(DpCAM). The second approach perfect hashing

memory (PHmem), combines logic and memory for the

matching. PHmem utilizes a new perfect hashing tech-

nique to hash the incoming data and determine a

unique memory location of a possible matching pat-

tern. Subsequently, we read this pattern from memory

and compare it against the incoming data. We extend

the perfect hashing algorithm in order to guarantee that

for any given set a perfect hash function can be gener-

ated, and present a theoretical proof of its correctness.

The rest of the paper is organized as follows: In

Section II, we discuss related work. In Section III, we

describe our initial Discrete Comparator approaches,

DCAM, DpCAM architectures respectively. In SectionIV, perfect hashing is introduced, In Section V, we pre-

sent the implementation results of both DpCAM and

International Journal ofSystems , Algorithms &

ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN

NETWORK SECURITY

N.Sagar1,G.V.RaviKumar2

1M.TechinEmbeddedsystems,2AssistantProfessor,Dept.ofECE1

BharatInstituteofEngineeringandTechnology,Hyderabad2St.JohnsCollegeofEngineering&Technology,Yemmiganur

email:[email protected],[email protected]

Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 1


2/6

PHmem and compare them with related work. Finally, in

Section VI, conclusions are made.

II. LITERATURE SURVEY

This chapter includes a brief description of pattern

matching in software NIDS solutions and hardware-

based pattern matching architectures in NIDS.

A. PATTERN MATCHING IN SOFTWARE

NIDS SOLUTIONS

NIDS scan packet payloads with signatures to detect

malicious intrusions. This is a string matching proce-

dure. The most well-known software-based algorithms

are: Knuth-Morris-Pratt (KMP), Boyer-Moore (BM),

Aho-Corasick (AC) and Commentz-Walter (CW). The

KMP and BM algorithms are designed for single pattern

searching. If the pattern length is m bytes, then it will

take O(m+n) time to finish the search in an n bytes pack-

et. If there are k patterns, the search time will be O (k(m+n)); linearly to k. String matching module using BM

algorithm is called in current SNORT system.

The AC and CW algorithms are designed for multi-

pattern matching by pre-processing the patterns and

building a finite automaton which can process an input

packet with n bytes length in O (n) time. However, the

exponential state explosion cost too much space. On the

other hand, Aho-Corasick (AC) is a multiple pattern

string matching algorithm, meaning it matches the input

against multiple patterns at the same time. Multiple pat-

tern string matching algorithms generally preprocess the

set of patterns, and then search all of them together over

the packet content. AC is more suitable for hardware

implementation because it has a deterministic execution

time per packet. They concluded that their compressed

version of AC is the best choice for hardware implemen-

tation of pattern matching for NIDS.

B. HARDWARE-BASED PATTERN

MATCHING ARCHITECTURES IN NIDS

In the past few years, numerous hardware-based

pattern matching solutions have been proposed, most ofthem using FPGAs, finite automata or hashing approach-

es. Next, we describe some significant steps forward in

IDS pattern matching over the past few years. Simple

CAM or discrete comparators structures offer high per-

formance, at high area cost. Using regular expressions

(NFAs and DFAs) for pattern matching slightly reduces

the area requirements, however, results in significantly

lower performance. A technique to substantially increase

sharing of character comparators and reduce the design

cost is predecoding, applicable to both regular expres-

sion and CAM-like approaches. The main idea is that

incoming characters are predecoded resulting in eachunique character being represented by a single wire.

This way, an Ncharacter comparator is reduced to an N

-input AND gate. Yusuf and Luk presented a tree-based

CAM structure, representing multiple patterns as a Bool-

ean expression in the form of a binary decision diagram

(BDD). In doing so, the area cost is lower than other

CAM and NFA approaches. Given the processing band-

width limitations of General purpose processors (GPP),

which can serve only a few hundred Mbps throughput,

Hardware-based NIDS (Multicore Processors, ASIC orFPGA) as illustrated in Fig. 1 is an attractive alternative

solution.

Fig. 1. Abstract illustration of performance and area efficiency forvarious hardware pattern matching techniques

III. DECODED CAMS

Simple CAM or discrete comparators may provide

high performance; however, they are not scalable due to

their high area cost. We assumed the simple organiza-

tion depicted in Fig 2(a). The input stream is inserted in

a shift register, and the individual entries are fanned out

to the pattern comparators. There is one comparator for

each pattern, fed from the shift register. This design is

simple and regular, and with proper use of pipelining,

the circuit can be fast. Its drawback, however, is the higharea cost. To remedy this cost, we suggested sharing the

character comparators exploiting similarities between

patterns as shown in Fig 2(b).

Fig 2. Basic discrete comparator structure and its optimized version

which shares common character comparators.

The Decoded CAM architecture illustrated in Fig 3,

builds on this idea extending it further by the following

observation: instead of keeping a window of input char-

acters in the shift register each of which is compared

against multiple search patterns, we can first test forequality of the input for the desired characters, and then

delay the partial matching signals. This approach both

shares the equality logic for character comparators and


ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

NETWORKSECURITY



3/6

replaces the 8-bit wide shift registers used in our initial

approach with single bit shift registers for the equality

result(s). If we exploit this advantage, the potential for

area savings is significant. In practice, about 5 less area

resources are required compared to simple CAM and

discrete comparators designs.

Fig 3. Decoded CAM: Three comparators provide the equality

signals for characters A, B, and C (A is shared). To match pattern

ABCA we have to remember (using shift registers) the matching of

character A, B, C, for 3, 2, and 1 cycles, respectively, until the final

character is matched.

One of the possible shortcomings of our approach is

that the number of the single bit shift registers is propor-

tional to the length of the patterns. Fig 3 illustrates this

point: to match a four-character long pattern, we need to

test equality for each character (in the dashed decoder

block), and to delay the matching of the first character

by three cycles, the matching of the second character bytwo cycles, and so on, for the width of the search pat-

tern. In total, the number of storage elements required in

this approach is for a string of length. For many long

patterns this number can exceed the number of bits in

the character shift register used in the original CAM de-

sign. To our advantage, however, is that these shift reg-

isters are true first-inputfirst-outputs (FIFOs) with one

input and one output, as opposed to the shift registers in

the simple design in which each entry in the shift regis-

ter is fan-out to comparators.

To tackle this possible obstacle, we use two tech-niques. First, we reduce the number of shift registers by

sharing their outputs whenever the same character is

used in the same position in multiple search patterns.

Second, we use the SRL16 optimized implementation of

shift register that is available in Xilinx devices and uses

a single logic cell for a shift register. Together these two

optimizations lead to significant area savings. To further

reduce the area cost of our designs, we split long pat-

terns in smaller substrings and match each substring sep-

arately. This improved version of DCAM is denoted as

DpCAM(decoded partial CAM). Fig 4 depicts the block

diagram of matching patterns longer than 16 characters.

Fig 4. DpCAM: Partial matching of long patterns. In this example, a

31-byte pattern is matched. The first 16 bytes are partially matched

and the result is properly delayed to feed the second substring com-

parator. Both substring comparators are fed from the same pool of

shifted decoded characters (SRL16s) and therefore sharing of decod-

ed characters is higher.

Long patterns are partially matched in substrings of

maximally 16 characters long. The reason is that the

AND-tree of a 16 character substring needs only five

LUTs, while only a single SRL16 shift register is re-

quired to delay each decoded input character. Conse-quently, a pattern longer than 16 characters is partitioned

in smaller substrings which are matched separately. The

partial match of each substring is properly delayed and

provides input to the AND-tree of the next substring.

Fig 5. DpCAM processing two characters per cycle.

In order to achieve better performance, we use tech-

niques to improve the operating frequency, as well as the

throughput of our DpCAM implementation. To increase

the processing throughput, we use parallelism. We wid-

en the distribution paths by a factor of P providing P

copies of comparators (decoders) and the corresponding

matching gates. Fig 5 illustrates this point for P = 2. To

achieve high operating frequency, we use extensive fine-grain pipelining. The latency of the pipeline depends on

the pattern length and in practice is a few tens of cycles,

which translates to a few hundreds of nanoseconds and

is acceptable for such systems.

IV. PERFECT HASHING MEMORY

The alternative pattern matching approach proposed

in this paper is the PHmem. Instead of matching each

pattern separately, it is more efficient to utilize a hash

module to determine which pattern is a possible match,

read this pattern from a memory and compare it against

the incoming data. Hardware hashing for pattern match-ing is a technique known for decades.



NETWORKSECURITY



4/6

Fig 6. PHmem block diagram.

Fig 6 depicts our PHmem scheme. The incoming

packet data are shifted into a serial-in parallel-out shift

register. The parallel-out lines of the shift register pro-

vide input to the comparator which is also fed by the

memory that stores the patterns. Selected bit positions of

the shifted incoming data are used as input to a hash

module, which outputs the ID of the possible match

pattern.

For memory utilization reasons, we do not use this

pattern ID to directly read the search pattern from the

pattern memory. We utilize instead an indirection

memory. The indirection memory outputs the actual lo-

cation of the pattern in the pattern memory and its length

that is used to determine which bytes of the pattern

memory and the incoming data are needed to be com-

pared. In our case, the indirection memory performs a 1-

to-1 instead of the N-to-1 mapping, since the output ad-

dress has the same width (number of bits) as the pattern

ID. Finally, it is worth noting that the implementation ofthe hash tree and the memories are pipelined. Conse-

quently, the incoming bit stream must be buffered by the

same amount of pipeline stages in order to correctly

align it for comparison with the chosen pattern from the

pattern memory.

A. PERFECT HASHING TREE

The proposed scheme requires the hash function to

generate a different address for each pattern, in other

words, requires a perfect hash function which has no

collisions for a given set of patterns. Furthermore, the

address space would preferably be minimal and equal to

the number of patterns. Instead of matching unique pat-

tern prefixes, we hash unique substrings in order to dis-

tinguish the patterns. To do so, we introduce a perfect

hashing method to guarantee that no collisions will oc-

cur for a given set. Generating such a perfect hash func-

tion may be difficult and time consuming. In our ap-

proach, instead of searching for a single hash function,

we search for multiple simpler sub hashes that when put

together in a tree-like structure will construct a perfect

hash function. The perfect hash tree, is created based on

the idea of divide and conquer. Let A be a set ofunique\ sub-strings ={a1, a2.an} and H (A) a perfect

hash function ofA, then the perfect hash tree is created

according to the following equations:

H(A) = h0 (H1 (1st half of A), H2 (2

nd half of A)) (1)

H1 (1st half of A) = h1 (H1.1(1

st quarter of A),

H1.2(2nd quarter of A)) (2)

and so on for the smaller subsets of the set A (until each

subset contains a single element). The h0, h1 etc., are

functions that combine subhashes. The H1, H2, H1.1, H1.2etc., are perfect hashes of subsets (subhashes).

Following the previously discussed methodology,

we create a binary hash tree. For a given set of n pat-

terns that have unique substrings, we consider the set of

substrings as an nxm matrix A. Each row of the matrix

A (m bits long) represents a substring, which differs at

least in one bit from all the other rows. Each column of

the matrix A( n bits long) represents a different bit posi-

tion of the substrings. The perfect hash tree should have

log2(n) output bits in order to be minimal. We construct

the tree by recursively partitioning the given matrix asfollows.

Search for a function (e.g., h0) that separates the

matrix A in two parts (e.g., A0, A1), which can be

encoded in log2(n) 1 bits.

Recursively repeat the procedure for each part of

the matrix, in order to separate them again in

smaller parts.

The process terminates when all parts contain one

row.

To prove that our method generates perfect hash func-

tions, we need to prove the following.

For any given set A ofn items that can be encoded

in log2(n) bits, our method generates a function

h:A{0,1} to split the set in two subsets that can

be encoded in log2(n/2) bits (that is log2(n) 1

bits).

Based on the first proof, the proposed scheme out-

puts a perfect hash function for the initial set of

patterns.

Proof: By definition, a hash function H|A of set A = {a1,

a2.an} which outputs a different value for each ele-

ment aiis perfectH |a1 H |a2. H |ax . (3)

Also, ifh|S , where S = A U B U . U N and A B

. N = V is a hash function that separates the n sub-

sets A,B,N having a different output for elements of

different subsets is also perfect, that is

h |A h |B. h |N . (4)

We construct our hash trees based on two facts.

First, the selects of the multiplexers h separate per-

fectly the subsets of the node. Second, that the inputs of

the leaf nodes are perfect hash functions; this is given bythe fact that each element differs to any other element at

least one bit, therefore, there exists a single bit that sepa-

rates (perfectly) any pair of elements in the set. Conse-



NETWORKSECURITY



5/6

quently, it must be proven that a node which combines

the outputs of perfect hash functions HA,HB,..HN of

the subsets A,B,..N using a perfect hash function h |S

which separates these subsets, outputs also a perfect

hash function Hnodefor the entire set S.

The output of the node is the following:

Hnode = h|S2

IF(h|S = h|A) THEN HA ELSEIF (h|S = h|B) THEN HB ELSE . . .

IF (h|S = h|N) THEN HN ELSE

Consequently, the Hnode outputs different values for ei-

ther two entries of the same subset Hnode|aiHnode|aj based

on (3), or for two entries of different subsets Hnode|aiHnode|bj based on (4). Therefore, each tree node and also

the entire hash tree output perfect hash functions.

V. EXPERIMENTAL RESULTS & ANALYSIS

DpCAM and PHmem structures are simulated by

active-HDL, synthesized in Xilinx and their FPGA mod-ule are given.

Fig 7. Simulated results of DpCAM

Implementation of designs using Vertex-E devices

with -8 speed grade, designs that process 8, 16 bits/cycle, which were implemented in an Xcv50e8-CS144.

Table 1, contains comparison of FPGA-based pattern

matching approaches. When compared with the delay

time between implemented designs, PHmem is better

because it has less delay time.

Fig 8. FPGA module of DpCAM

Fig 9. Simulated results of PHmem

Fig 10. FPGA module of PHmem

Table 1 Comparison of FPGA-based pattern matching

approaches

VI. CONCLUSIOS

Network intrusion detection and prevention systems

have become an essential part of the Internet infrastruc-

ture. An IDPS has two primary purposes. First, it must

detect all potentially damaging events that can occur in a

system. Second, it must prevent the harmful activity

from being performed by blocking the flow of harmful

traffic into the system. We describe two reconfigurable

pattern matching approaches, suitable for intrusion de-

tection. The first one (DpCAM) uses only logic and the

pre-decoding technique to share resources. Pre-decoding

is a very effective method to share logic between differ-

ent pattern comparators (either in discrete comparatorsor regular expressions). The second one (PHmem) re-

quires both memory and logic, employing an in practice

simple and compact hash function to access the pattern

memory. The proposed PHmem algorithm guarantees

the generation of a perfect hash function for any given

set of patterns. Porting these architectures to newer gen-

eration FPGAs to take advantage of additional resources

and benefit from faster clock frequencies is a logical

next step for expanding these designs. Systems with

such architectures can significantly outperform existing

commercially available systems.

Description Input

in Bits

Family Device Logic

Cells

Total

Delay

(ns)

BCD 8 Vertex E Xcv50e8-

CS144

14 9.341

CCC 8 Vertex E Xcv50e8-

CS144

14 9.341

DCAM 8 Vertex E Xcv50e8-

CS144

178 5.857

DpCAM 16 Vertex E Xcv50e8-

CS144

346 6.991

PHmem 16 Vertex E Xcv50e8-

CS144

269 5.783



NETWORKSECURITY



6/6

REFERENCES[1] I. Sourdis and D. Pnevmatikatos, Fast, large-scale string match

for a 10 Gbps FPGA-based network intrusion detection system, in

Proc. Int. Conf. Field Program. Logic Appl,2003.

[2] I. Sourdis and D. Pnevmatikatos, Pre-decoded CAMs for effi-

cient and high-speed NIDS pattern matching, in Proc. IEEE Symp.

Field-Program. Custom Comput. Mach., 2004.

[3] M. Gokhale, D. Dubois, A. Dubois, M. Boorman, S. Poole, andV. Hogsett, Granidt: Towards gigabit rate network intrusion detec-

tion technology, in Proc. Int. Conf. Field Program. Logic Appl.,

2002.

[4] Z. K. Baker and V. K. Prasanna, A methodology for synthesis

of efficient intrusion detection systems on FPGAs, in Proc. IEEE

Symp. Field-Program. Custom Comput. Mach., 2004, pp. 135144.

[5] C. R. Clark and D. E. Schimmel, Scalable parallel pattern-

matching on high-speed networks, in Proc. IEEE Symp. Field-

Program. Custom Comput. Mach., 2004, pp. 249257.

[6]Y. H. Cho, S. Navab, and W. Mangione-Smith, Specialized hard-

ware for deep network packet filtering, in Proc. 12th Int. Conf. Field

Program.Logic Appl., 2002, pp. 452461.

[7] Z. K. Baker and V. K. Prasanna, Automatic synthesis of effi-

cient intrusion detection systems on FPGAs, in Proc. 14th Int. Conf.

Field Program. Logic Appl., 2004, pp. 311321.

[8] G. Papadopoulos and D. Pnevmatikatos, Hashing + Memory =

Low Cost, exact pattern matching, in Proc. Int. Conf. Field Pro-

gram. Logic Appl., 2005, pp. 3944.

[9] Xilinx, San Jose, CA, VirtexE, Virtex2, Virtex2Pro, and Spar-

tan3 datasheets, 2006. [Online]. Available: http://www.xilinx.com

[10] F. J. Burkowski, A hardware hashing scheme in the design of a

multiterm string comparator, IEEE Trans. Comput., vol. 31, no. 9,

Sep.1982, pp.825834.



NETWORKSECURITY


MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

Documents

Transcript of MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY