MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

download MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

of 6

Transcript of MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    1/6

    Abstract - In the current scenario network security is emerg-

    ing the world. Matching large sets of patterns against an incom-

    ing stream of data is a fundamental task in several fields such

    as network security or computational biology. High-speed net-

    work intrusion detection systems (IDS) rely on efficient pattern

    matching techniques to analyze the packet payload and make

    decisions on the significance of the packet body. However,

    matching the streaming payload bytes against thousands of

    patterns at multi-gigabit rates is computationally intensive.

    Various techniques have been proposed in past but the perfor-

    mance of the system is reducing because of multi-gigabit rates.

    Pattern matching is a significant issue in intrusion detection

    systems, but by no means the only one. Handling multi-content

    rules, reordering, and reassembling incoming packets are also

    significant for system performance. We present two pattern

    matching techniques to compare incoming packets against in-

    trusion detection search patterns. The first approach, decoded

    partial CAM (DpCAM), pre-decodes incoming characters,

    aligns the decoded data, and performs logical AND on them to

    produce the match signal for each pattern. The second ap-

    proach, perfect hashing memory (PHmem), uses perfect hash-

    ing to determine a unique memory location that contains the

    search pattern and a comparison between incoming data and

    memory output to determine the match. The suggested methodshave implemented in VHDL coding and we use Xilinx for syn-

    thesis.

    Keywords : DCAM, DpCAM ,Packet inspection, pattern matching,

    perfect hashing.

    I. INTRODUCTION

    Matching large sets of patterns against an incom-

    ing stream of data is a fundamental task in several

    fields such as network security or computational biolo-

    gy. For example, high-speed network intrusion detec-

    tion systems (IDS) rely on efficient pattern matching

    techniques to analyze the packet payload and makedecisions on the significance of the packet body. How-

    ever, matching the streaming payload bytes against

    thousands of patterns at multi-gigabit rates is computa-

    tionally intensive.

    On the other hand, hardware-based solutions can

    significantly increase performance and achieve higher

    throughput. Many hardware units have been proposed

    for IDS pattern matching most of them in the area of

    reconfigurable hardware. In general, field-

    programmable gate arrays (FPGAs) are well suited

    for this task, since designs can be customized for a par-ticular set of search patterns and updates to that set can

    be performed via reconfiguration. FPGA-based systems

    provide higher flexibility and comparable to ASICs

    performance. Generally, the performance results of

    FPGA systems are promising, showing that FPGAs can

    be used to support the increasing needs for network

    security. FPGAs are flexible, reconfigurable, pro-

    vide hardware speed, and therefore, are suitable for

    implementing such systems. On the other hand, there

    are several issues that should be faced. Large designs

    are complex and therefore hard to operate at high fre-quency. Additionally, matching a large number of pat-

    terns has high area cost, so sharing logic is critical,

    since it could save a significant amount of resources,

    and make designs smaller and faster. Furthermore, the

    performance of such designs is promising and indicates

    that FPGAs can be used to support the increasing needs

    for high-speed network security.

    Pattern matching is a significant issue in intrusion

    detection systems, but by no means the only one. Han-

    dling multi content rules, reordering, and reassembling

    incoming packets are also significant for system perfor-mance. We present two efficient pattern matching tech-

    niques to analyze packet payloads at multi gigabit rates

    and detect hazardous contents. The first one is Decoded

    CAM (DCAM) and uses pre-decoding to exploit pat-

    tern similarities and reduce the area cost of the designs.

    We improve DCAM and decrease the required logic

    resources by partially matching long patterns. The im-

    proved approach is denoted as decoded partial CAM

    (DpCAM). The second approach perfect hashing

    memory (PHmem), combines logic and memory for the

    matching. PHmem utilizes a new perfect hashing tech-

    nique to hash the incoming data and determine a

    unique memory location of a possible matching pat-

    tern. Subsequently, we read this pattern from memory

    and compare it against the incoming data. We extend

    the perfect hashing algorithm in order to guarantee that

    for any given set a perfect hash function can be gener-

    ated, and present a theoretical proof of its correctness.

    The rest of the paper is organized as follows: In

    Section II, we discuss related work. In Section III, we

    describe our initial Discrete Comparator approaches,

    DCAM, DpCAM architectures respectively. In SectionIV, perfect hashing is introduced, In Section V, we pre-

    sent the implementation results of both DpCAM and

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN

    NETWORK SECURITY

    N.Sagar1,G.V.RaviKumar2

    1M.TechinEmbeddedsystems,2AssistantProfessor,Dept.ofECE1

    BharatInstituteofEngineeringandTechnology,Hyderabad2St.JohnsCollegeofEngineering&Technology,Yemmiganur

    email:[email protected],[email protected]

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 1

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    2/6

    PHmem and compare them with related work. Finally, in

    Section VI, conclusions are made.

    II. LITERATURE SURVEY

    This chapter includes a brief description of pattern

    matching in software NIDS solutions and hardware-

    based pattern matching architectures in NIDS.

    A. PATTERN MATCHING IN SOFTWARE

    NIDS SOLUTIONS

    NIDS scan packet payloads with signatures to detect

    malicious intrusions. This is a string matching proce-

    dure. The most well-known software-based algorithms

    are: Knuth-Morris-Pratt (KMP), Boyer-Moore (BM),

    Aho-Corasick (AC) and Commentz-Walter (CW). The

    KMP and BM algorithms are designed for single pattern

    searching. If the pattern length is m bytes, then it will

    take O(m+n) time to finish the search in an n bytes pack-

    et. If there are k patterns, the search time will be O (k(m+n)); linearly to k. String matching module using BM

    algorithm is called in current SNORT system.

    The AC and CW algorithms are designed for multi-

    pattern matching by pre-processing the patterns and

    building a finite automaton which can process an input

    packet with n bytes length in O (n) time. However, the

    exponential state explosion cost too much space. On the

    other hand, Aho-Corasick (AC) is a multiple pattern

    string matching algorithm, meaning it matches the input

    against multiple patterns at the same time. Multiple pat-

    tern string matching algorithms generally preprocess the

    set of patterns, and then search all of them together over

    the packet content. AC is more suitable for hardware

    implementation because it has a deterministic execution

    time per packet. They concluded that their compressed

    version of AC is the best choice for hardware implemen-

    tation of pattern matching for NIDS.

    B. HARDWARE-BASED PATTERN

    MATCHING ARCHITECTURES IN NIDS

    In the past few years, numerous hardware-based

    pattern matching solutions have been proposed, most ofthem using FPGAs, finite automata or hashing approach-

    es. Next, we describe some significant steps forward in

    IDS pattern matching over the past few years. Simple

    CAM or discrete comparators structures offer high per-

    formance, at high area cost. Using regular expressions

    (NFAs and DFAs) for pattern matching slightly reduces

    the area requirements, however, results in significantly

    lower performance. A technique to substantially increase

    sharing of character comparators and reduce the design

    cost is predecoding, applicable to both regular expres-

    sion and CAM-like approaches. The main idea is that

    incoming characters are predecoded resulting in eachunique character being represented by a single wire.

    This way, an Ncharacter comparator is reduced to an N

    -input AND gate. Yusuf and Luk presented a tree-based

    CAM structure, representing multiple patterns as a Bool-

    ean expression in the form of a binary decision diagram

    (BDD). In doing so, the area cost is lower than other

    CAM and NFA approaches. Given the processing band-

    width limitations of General purpose processors (GPP),

    which can serve only a few hundred Mbps throughput,

    Hardware-based NIDS (Multicore Processors, ASIC orFPGA) as illustrated in Fig. 1 is an attractive alternative

    solution.

    Fig. 1. Abstract illustration of performance and area efficiency forvarious hardware pattern matching techniques

    III. DECODED CAMS

    Simple CAM or discrete comparators may provide

    high performance; however, they are not scalable due to

    their high area cost. We assumed the simple organiza-

    tion depicted in Fig 2(a). The input stream is inserted in

    a shift register, and the individual entries are fanned out

    to the pattern comparators. There is one comparator for

    each pattern, fed from the shift register. This design is

    simple and regular, and with proper use of pipelining,

    the circuit can be fast. Its drawback, however, is the higharea cost. To remedy this cost, we suggested sharing the

    character comparators exploiting similarities between

    patterns as shown in Fig 2(b).

    Fig 2. Basic discrete comparator structure and its optimized version

    which shares common character comparators.

    The Decoded CAM architecture illustrated in Fig 3,

    builds on this idea extending it further by the following

    observation: instead of keeping a window of input char-

    acters in the shift register each of which is compared

    against multiple search patterns, we can first test forequality of the input for the desired characters, and then

    delay the partial matching signals. This approach both

    shares the equality logic for character comparators and

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

    NETWORKSECURITY

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 2

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    3/6

    replaces the 8-bit wide shift registers used in our initial

    approach with single bit shift registers for the equality

    result(s). If we exploit this advantage, the potential for

    area savings is significant. In practice, about 5 less area

    resources are required compared to simple CAM and

    discrete comparators designs.

    Fig 3. Decoded CAM: Three comparators provide the equality

    signals for characters A, B, and C (A is shared). To match pattern

    ABCA we have to remember (using shift registers) the matching of

    character A, B, C, for 3, 2, and 1 cycles, respectively, until the final

    character is matched.

    One of the possible shortcomings of our approach is

    that the number of the single bit shift registers is propor-

    tional to the length of the patterns. Fig 3 illustrates this

    point: to match a four-character long pattern, we need to

    test equality for each character (in the dashed decoder

    block), and to delay the matching of the first character

    by three cycles, the matching of the second character bytwo cycles, and so on, for the width of the search pat-

    tern. In total, the number of storage elements required in

    this approach is for a string of length. For many long

    patterns this number can exceed the number of bits in

    the character shift register used in the original CAM de-

    sign. To our advantage, however, is that these shift reg-

    isters are true first-inputfirst-outputs (FIFOs) with one

    input and one output, as opposed to the shift registers in

    the simple design in which each entry in the shift regis-

    ter is fan-out to comparators.

    To tackle this possible obstacle, we use two tech-niques. First, we reduce the number of shift registers by

    sharing their outputs whenever the same character is

    used in the same position in multiple search patterns.

    Second, we use the SRL16 optimized implementation of

    shift register that is available in Xilinx devices and uses

    a single logic cell for a shift register. Together these two

    optimizations lead to significant area savings. To further

    reduce the area cost of our designs, we split long pat-

    terns in smaller substrings and match each substring sep-

    arately. This improved version of DCAM is denoted as

    DpCAM(decoded partial CAM). Fig 4 depicts the block

    diagram of matching patterns longer than 16 characters.

    Fig 4. DpCAM: Partial matching of long patterns. In this example, a

    31-byte pattern is matched. The first 16 bytes are partially matched

    and the result is properly delayed to feed the second substring com-

    parator. Both substring comparators are fed from the same pool of

    shifted decoded characters (SRL16s) and therefore sharing of decod-

    ed characters is higher.

    Long patterns are partially matched in substrings of

    maximally 16 characters long. The reason is that the

    AND-tree of a 16 character substring needs only five

    LUTs, while only a single SRL16 shift register is re-

    quired to delay each decoded input character. Conse-quently, a pattern longer than 16 characters is partitioned

    in smaller substrings which are matched separately. The

    partial match of each substring is properly delayed and

    provides input to the AND-tree of the next substring.

    Fig 5. DpCAM processing two characters per cycle.

    In order to achieve better performance, we use tech-

    niques to improve the operating frequency, as well as the

    throughput of our DpCAM implementation. To increase

    the processing throughput, we use parallelism. We wid-

    en the distribution paths by a factor of P providing P

    copies of comparators (decoders) and the corresponding

    matching gates. Fig 5 illustrates this point for P = 2. To

    achieve high operating frequency, we use extensive fine-grain pipelining. The latency of the pipeline depends on

    the pattern length and in practice is a few tens of cycles,

    which translates to a few hundreds of nanoseconds and

    is acceptable for such systems.

    IV. PERFECT HASHING MEMORY

    The alternative pattern matching approach proposed

    in this paper is the PHmem. Instead of matching each

    pattern separately, it is more efficient to utilize a hash

    module to determine which pattern is a possible match,

    read this pattern from a memory and compare it against

    the incoming data. Hardware hashing for pattern match-ing is a technique known for decades.

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

    NETWORKSECURITY

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 3

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    4/6

    Fig 6. PHmem block diagram.

    Fig 6 depicts our PHmem scheme. The incoming

    packet data are shifted into a serial-in parallel-out shift

    register. The parallel-out lines of the shift register pro-

    vide input to the comparator which is also fed by the

    memory that stores the patterns. Selected bit positions of

    the shifted incoming data are used as input to a hash

    module, which outputs the ID of the possible match

    pattern.

    For memory utilization reasons, we do not use this

    pattern ID to directly read the search pattern from the

    pattern memory. We utilize instead an indirection

    memory. The indirection memory outputs the actual lo-

    cation of the pattern in the pattern memory and its length

    that is used to determine which bytes of the pattern

    memory and the incoming data are needed to be com-

    pared. In our case, the indirection memory performs a 1-

    to-1 instead of the N-to-1 mapping, since the output ad-

    dress has the same width (number of bits) as the pattern

    ID. Finally, it is worth noting that the implementation ofthe hash tree and the memories are pipelined. Conse-

    quently, the incoming bit stream must be buffered by the

    same amount of pipeline stages in order to correctly

    align it for comparison with the chosen pattern from the

    pattern memory.

    A. PERFECT HASHING TREE

    The proposed scheme requires the hash function to

    generate a different address for each pattern, in other

    words, requires a perfect hash function which has no

    collisions for a given set of patterns. Furthermore, the

    address space would preferably be minimal and equal to

    the number of patterns. Instead of matching unique pat-

    tern prefixes, we hash unique substrings in order to dis-

    tinguish the patterns. To do so, we introduce a perfect

    hashing method to guarantee that no collisions will oc-

    cur for a given set. Generating such a perfect hash func-

    tion may be difficult and time consuming. In our ap-

    proach, instead of searching for a single hash function,

    we search for multiple simpler sub hashes that when put

    together in a tree-like structure will construct a perfect

    hash function. The perfect hash tree, is created based on

    the idea of divide and conquer. Let A be a set ofunique\ sub-strings ={a1, a2.an} and H (A) a perfect

    hash function ofA, then the perfect hash tree is created

    according to the following equations:

    H(A) = h0 (H1 (1st half of A), H2 (2

    nd half of A)) (1)

    H1 (1st half of A) = h1 (H1.1(1

    st quarter of A),

    H1.2(2nd quarter of A)) (2)

    and so on for the smaller subsets of the set A (until each

    subset contains a single element). The h0, h1 etc., are

    functions that combine subhashes. The H1, H2, H1.1, H1.2etc., are perfect hashes of subsets (subhashes).

    Following the previously discussed methodology,

    we create a binary hash tree. For a given set of n pat-

    terns that have unique substrings, we consider the set of

    substrings as an nxm matrix A. Each row of the matrix

    A (m bits long) represents a substring, which differs at

    least in one bit from all the other rows. Each column of

    the matrix A( n bits long) represents a different bit posi-

    tion of the substrings. The perfect hash tree should have

    log2(n) output bits in order to be minimal. We construct

    the tree by recursively partitioning the given matrix asfollows.

    Search for a function (e.g., h0) that separates the

    matrix A in two parts (e.g., A0, A1), which can be

    encoded in log2(n) 1 bits.

    Recursively repeat the procedure for each part of

    the matrix, in order to separate them again in

    smaller parts.

    The process terminates when all parts contain one

    row.

    To prove that our method generates perfect hash func-

    tions, we need to prove the following.

    For any given set A ofn items that can be encoded

    in log2(n) bits, our method generates a function

    h:A{0,1} to split the set in two subsets that can

    be encoded in log2(n/2) bits (that is log2(n) 1

    bits).

    Based on the first proof, the proposed scheme out-

    puts a perfect hash function for the initial set of

    patterns.

    Proof: By definition, a hash function H|A of set A = {a1,

    a2.an} which outputs a different value for each ele-

    ment aiis perfectH |a1 H |a2. H |ax . (3)

    Also, ifh|S , where S = A U B U . U N and A B

    . N = V is a hash function that separates the n sub-

    sets A,B,N having a different output for elements of

    different subsets is also perfect, that is

    h |A h |B. h |N . (4)

    We construct our hash trees based on two facts.

    First, the selects of the multiplexers h separate per-

    fectly the subsets of the node. Second, that the inputs of

    the leaf nodes are perfect hash functions; this is given bythe fact that each element differs to any other element at

    least one bit, therefore, there exists a single bit that sepa-

    rates (perfectly) any pair of elements in the set. Conse-

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

    NETWORKSECURITY

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 4

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    5/6

    quently, it must be proven that a node which combines

    the outputs of perfect hash functions HA,HB,..HN of

    the subsets A,B,..N using a perfect hash function h |S

    which separates these subsets, outputs also a perfect

    hash function Hnodefor the entire set S.

    The output of the node is the following:

    Hnode = h|S2

    IF(h|S = h|A) THEN HA ELSEIF (h|S = h|B) THEN HB ELSE . . .

    IF (h|S = h|N) THEN HN ELSE

    Consequently, the Hnode outputs different values for ei-

    ther two entries of the same subset Hnode|aiHnode|aj based

    on (3), or for two entries of different subsets Hnode|aiHnode|bj based on (4). Therefore, each tree node and also

    the entire hash tree output perfect hash functions.

    V. EXPERIMENTAL RESULTS & ANALYSIS

    DpCAM and PHmem structures are simulated by

    active-HDL, synthesized in Xilinx and their FPGA mod-ule are given.

    Fig 7. Simulated results of DpCAM

    Implementation of designs using Vertex-E devices

    with -8 speed grade, designs that process 8, 16 bits/cycle, which were implemented in an Xcv50e8-CS144.

    Table 1, contains comparison of FPGA-based pattern

    matching approaches. When compared with the delay

    time between implemented designs, PHmem is better

    because it has less delay time.

    Fig 8. FPGA module of DpCAM

    Fig 9. Simulated results of PHmem

    Fig 10. FPGA module of PHmem

    Table 1 Comparison of FPGA-based pattern matching

    approaches

    VI. CONCLUSIOS

    Network intrusion detection and prevention systems

    have become an essential part of the Internet infrastruc-

    ture. An IDPS has two primary purposes. First, it must

    detect all potentially damaging events that can occur in a

    system. Second, it must prevent the harmful activity

    from being performed by blocking the flow of harmful

    traffic into the system. We describe two reconfigurable

    pattern matching approaches, suitable for intrusion de-

    tection. The first one (DpCAM) uses only logic and the

    pre-decoding technique to share resources. Pre-decoding

    is a very effective method to share logic between differ-

    ent pattern comparators (either in discrete comparatorsor regular expressions). The second one (PHmem) re-

    quires both memory and logic, employing an in practice

    simple and compact hash function to access the pattern

    memory. The proposed PHmem algorithm guarantees

    the generation of a perfect hash function for any given

    set of patterns. Porting these architectures to newer gen-

    eration FPGAs to take advantage of additional resources

    and benefit from faster clock frequencies is a logical

    next step for expanding these designs. Systems with

    such architectures can significantly outperform existing

    commercially available systems.

    Description Input

    in Bits

    Family Device Logic

    Cells

    Total

    Delay

    (ns)

    BCD 8 Vertex E Xcv50e8-

    CS144

    14 9.341

    CCC 8 Vertex E Xcv50e8-

    CS144

    14 9.341

    DCAM 8 Vertex E Xcv50e8-

    CS144

    178 5.857

    DpCAM 16 Vertex E Xcv50e8-

    CS144

    346 6.991

    PHmem 16 Vertex E Xcv50e8-

    CS144

    269 5.783

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

    NETWORKSECURITY

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 5

  • 8/2/2019 MULTI-GIGABIT PATTERN MATCHING FOR PACKET ASSESMENT IN NETWORK SECURITY

    6/6

    REFERENCES[1] I. Sourdis and D. Pnevmatikatos, Fast, large-scale string match

    for a 10 Gbps FPGA-based network intrusion detection system, in

    Proc. Int. Conf. Field Program. Logic Appl,2003.

    [2] I. Sourdis and D. Pnevmatikatos, Pre-decoded CAMs for effi-

    cient and high-speed NIDS pattern matching, in Proc. IEEE Symp.

    Field-Program. Custom Comput. Mach., 2004.

    [3] M. Gokhale, D. Dubois, A. Dubois, M. Boorman, S. Poole, andV. Hogsett, Granidt: Towards gigabit rate network intrusion detec-

    tion technology, in Proc. Int. Conf. Field Program. Logic Appl.,

    2002.

    [4] Z. K. Baker and V. K. Prasanna, A methodology for synthesis

    of efficient intrusion detection systems on FPGAs, in Proc. IEEE

    Symp. Field-Program. Custom Comput. Mach., 2004, pp. 135144.

    [5] C. R. Clark and D. E. Schimmel, Scalable parallel pattern-

    matching on high-speed networks, in Proc. IEEE Symp. Field-

    Program. Custom Comput. Mach., 2004, pp. 249257.

    [6]Y. H. Cho, S. Navab, and W. Mangione-Smith, Specialized hard-

    ware for deep network packet filtering, in Proc. 12th Int. Conf. Field

    Program.Logic Appl., 2002, pp. 452461.

    [7] Z. K. Baker and V. K. Prasanna, Automatic synthesis of effi-

    cient intrusion detection systems on FPGAs, in Proc. 14th Int. Conf.

    Field Program. Logic Appl., 2004, pp. 311321.

    [8] G. Papadopoulos and D. Pnevmatikatos, Hashing + Memory =

    Low Cost, exact pattern matching, in Proc. Int. Conf. Field Pro-

    gram. Logic Appl., 2005, pp. 3944.

    [9] Xilinx, San Jose, CA, VirtexE, Virtex2, Virtex2Pro, and Spar-

    tan3 datasheets, 2006. [Online]. Available: http://www.xilinx.com

    [10] F. J. Burkowski, A hardware hashing scheme in the design of a

    multiterm string comparator, IEEE Trans. Comput., vol. 31, no. 9,

    Sep.1982, pp.825834.

    International Journal ofSystems , Algorithms &

    ApplicationsIIIIJJJJSSSSAAAA AAAAMULTI-GIGABITPATTERNMATCHINGFORPACKETASSESMENTIN

    NETWORKSECURITY

    Volume 2, Issue 1, January 2012, ISSN Online: 2277-2677 6