Decompression-Free Inspection: DPI for Shared Dictionary Compression over HTTP

Anat Bremler-Barr, Interdisciplinary Center Herzliya

Shimrit Tzur David, Interdisciplinary Center Herzliya & The Hebrew University, Jerusalem

David Hay, The Hebrew University, Jerusalem

Yaron Koral, Tel Aviv University

Outline

Motivation

Background
◦AC algorithm

Our solution
◦The offline phase
◦The online phase

Experimental Results

Deep Packet Inspection (DPI)

Search for patterns in the packets' payload

Signature-based NIDS
◦Intrusion prevention

Web-application firewalls
◦Leakage prevention
◦Content filtering

Challenges:
◦Thousands of known malicious patterns
◦Real time, link rate

The performance of security tools is dominated by the pattern matching engine (Fisk & Varghese 2002)

Compressed HTTP

19% increase in 8 months!

84.1% of the top 1,000 sites compress their traffic.

Data compression is done by adding references to repeated data.

There are two types of compression:

◦Intra-response compression – the references point to bytes within the response (Gzip/Deflate)

◦Inter-responses/connections compression – the references point to bytes in a separate file, called dictionary (Google’s SDCH).

Example – Intra-Response Compression

File1.html: abcdefgabcd

File2.html: abcdxyzbcdtr

Encode repeated strings by pointer: {distance, length}

TCP Connection Setup

GET File1.html

abcdefg(7,4)

GET File2.html

abcdxyz(6,3)tr
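The two responses above can be decoded with a few lines of code. The following is a minimal sketch that works on the slide's textual notation only (literal characters mixed with `(distance,length)` pointers); it is not a Gzip/Deflate decoder, and the function name `decode_intra` is hypothetical.

```python
import re

# Pointer notation used on the slide: (distance,length)
POINTER = re.compile(r"\((\d+),(\d+)\)")

def decode_intra(encoded):
    """Expand (distance,length) pointers that refer back into the same response."""
    out = []
    pos = 0
    for m in POINTER.finditer(encoded):
        out.extend(encoded[pos:m.start()])      # literal characters before the pointer
        distance, length = int(m.group(1)), int(m.group(2))
        for _ in range(length):
            out.append(out[-distance])          # copy one character at a time
        pos = m.end()
    out.extend(encoded[pos:])                   # trailing literals
    return "".join(out)

file1 = decode_intra("abcdefg(7,4)")
file2 = decode_intra("abcdxyz(6,3)tr")
print(file1, file2)
```

Copying one character at a time keeps the sketch correct even when a pointer overlaps the bytes it is producing.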

Example – Inter-Response Compression

Dictionary: abcd

File1.html: abcdefgabcd

File2.html: abcdxyzbcdtr

Copy repeated strings from the dictionary: (address, length)

TCP Connection Setup

GET File1.html

Delta file: (0,4)efg(0,4)

GET File2.html

Delta file:(0,4)xyz(1,3)tr

GET dictionary

abcd
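Decoding a delta file against the shared dictionary can be sketched in the same way. This is an illustration of the slide's `(address,length)` notation only, not of the actual SDCH wire format; `decode_delta` is a hypothetical name.

```python
import re

# Copy notation used on the slide: (address,length) into the dictionary
COPY = re.compile(r"\((\d+),(\d+)\)")

def decode_delta(delta, dictionary):
    """Expand (address,length) copies that refer to a separate dictionary."""
    out = []
    pos = 0
    for m in COPY.finditer(delta):
        out.append(delta[pos:m.start()])                   # literal bytes
        address, length = int(m.group(1)), int(m.group(2))
        out.append(dictionary[address:address + length])   # bytes copied from the dictionary
        pos = m.end()
    out.append(delta[pos:])
    return "".join(out)

DICTIONARY = "abcd"
file1 = decode_delta("(0,4)efg(0,4)", DICTIONARY)
file2 = decode_delta("(0,4)xyz(1,3)tr", DICTIONARY)
print(file1, file2)
```

Unlike the intra-response case, every copy refers to the same dictionary, so whatever is learned about the dictionary once can be reused for every response that references it.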

Current NIDS Operation (1)

Server Client

Http uncompressed

GET /index.html
Accept-Encoding: SDCH

Scan for Intrusions

Http uncompressed

Current NIDS Operation (2)

Server Client

Http compressed

Do not scan, or: decompress, scan, compress

Http compressed

Our Solution

Server Client

Http compressed

Scan directly with no decompression

Http compressed

Our Solution: Decompression-Free Scanning

Focused on inter-response compression

Our algorithm works in two phases
◦Offline phase - Scanning the dictionary
◦Online phase - Scanning the delta files

Works at the rate of the compressed traffic
◦Gains a 56% improvement compared with scanning the plain text directly

Outline

Motivation

Background
◦Aho-Corasick (AC) algorithm

Our solution

Aho-Corasick (AC) Algorithm

Finite State Machine (FSM)
◦Regular states, accepting states

Goto function (black arrows)
◦g(state, symbol) → state

Each state corresponds to a label: the sequence of characters on its goto path from the root.
◦The length of the label is the depth of the state

Failure function (red arrows)
◦f(state) → state
◦Taken when there is no goto transition for the symbol
◦Goes to the state whose label is the longest proper suffix of the current state's label

[Figure: AC automaton for the pattern set]

Patterns: E, BE, BD, BCAA, BCD, CDBCAB

The label of S14 is BCAA

g(S11,B) = S12
g(S11,A) = ?

f(S11) = S13, so g(S11,A) is resolved as g(S13,A) = S14
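The goto and failure functions can be made concrete with a small sketch. It assumes the fused pattern string on the slide splits as E, BE, BD, BCAA, BCD, CDBCAB. State numbers produced by this code need not match the s0..s14 numbering of the figure, so states are identified by their labels; all function names are hypothetical.

```python
from collections import deque

def build_ac(patterns):
    """Return goto, fail and label tables; state 0 is the root."""
    goto, fail, label = [{}], [0], [""]
    for pat in patterns:                    # goto function: a trie of the patterns
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                label.append(label[s] + ch)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    queue = deque(goto[0].values())         # depth-1 states fail to the root
    while queue:                            # breadth-first: shallow states first
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)    # longest proper suffix that is a label
            queue.append(t)
    return goto, fail, label

def step(goto, fail, s, ch):
    """One input symbol: follow failure transitions until a goto exists."""
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

PATTERNS = ["E", "BE", "BD", "BCAA", "BCD", "CDBCAB"]   # assumed split of the slide's list
goto, fail, label = build_ac(PATTERNS)

s11 = label.index("CDBCA")                  # the state called S11 on the slide
print(label[fail[s11]])                     # label of f(S11)
print(label[step(goto, fail, s11, "A")])    # state reached on symbol A
```

With this pattern set the trie has 15 states, which is consistent with the labels s0 to s14 that appear in the figure.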

Aho-Corasick Insights

The automaton remembers only its current state
◦The input text ends with the label of the current state
◦This label is the longest suffix of the text that can be a prefix of a match

No future match can begin before this label


Accelerator Algorithm Idea

The algorithm operates in two phases:

The Offline Phase:
◦Scan the dictionary and store information about the pattern matching results

The Online Phase:
◦Scan the delta file and skip almost all referenced bytes that were already scanned for patterns

The Offline Phase

The dictionary is scanned using AC (from its first byte and from s0). We save the state after each byte.

Position:  0   1   2   3   4   5   6   7   8   9   10  11
Symbol:    D   B   E   A   A   C   D   B   C   A   B   C
State:     S0  S2  S3  S0  S0  S7  S8  S9  S10 S11 S12 S5

We also save information about the matched patterns that are found in the dictionary
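The offline phase can be sketched as a single AC pass over the dictionary that records the state after every byte. The sketch assumes the pattern set E, BE, BD, BCAA, BCD, CDBCAB and the dictionary D B E A A C D B C A B C from the slides; states are reported by label because the numbering produced by the code need not match the figure, and `offline_phase` is a hypothetical name.

```python
from collections import deque

def build_ac(patterns):
    """Return goto, fail, out and label tables; state 0 is the root."""
    goto, fail, out, label = [{}], [0], [set()], [""]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                label.append(label[s] + ch)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]          # patterns that end at a suffix of the label
            queue.append(t)
    return goto, fail, out, label

def offline_phase(patterns, dictionary):
    """Scan the dictionary from the root; save the state and matches per byte."""
    goto, fail, out, label = build_ac(patterns)
    saved_labels, matches, s = [], [], 0
    for position, ch in enumerate(dictionary):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        saved_labels.append(label[s])
        matches.extend((position, pat) for pat in sorted(out[s]))
    return saved_labels, matches

PATTERNS = ["E", "BE", "BD", "BCAA", "BCD", "CDBCAB"]   # assumed split of the slide's list
saved_labels, matches = offline_phase(PATTERNS, "DBEAACDBCABC")
print(saved_labels)
print(matches)
```

A real implementation would store a state pointer per symbol rather than a label string; labels are used here only to make the saved information readable.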

Challenges

Dictionary:
Position:  0 1 2 3 4 5 6 7 8 9 10 11
Symbol:    D B E A A C D B C A B  C

Delta file: ABDB(5,4)AAB(1,4)

The uncompressed data is: A B D B C D B C A A B B E A A

Patterns/Signatures: E, BE, BD, BCAA, BCD, CDBCAB

We copy from an arbitrary position in the dictionary while the automaton is in an arbitrary state
◦We show that no matter in what state and at which symbol we start to copy, the resulting state is reachable via failure transitions from the saved state.

Types of matches:
◦Right boundary
◦Internal
◦Left boundary

The Online Phase

Scan the delta file:

Uncompressed bytes
◦Scan using AC.

Copy instruction (p,x)
◦The copied data was already scanned in the offline phase.
◦We skip the scan for almost all of these bytes.

Internal matches are trivial; see the paper for details.

The Online Phase - Right Boundary

When encountering a copy instruction (p,x), we want to stop scanning and jump to state[p+x-1]

◦If the label of the state is longer than the copy value: the label begins before the copy value, so the context of this state is not the same as in the online scan. We take failure transitions to find a state with a sufficiently short label.

◦Otherwise: the label of the state is contained in the copy value. This is the longest suffix that can lead to a match.

Example – Right Boundary

Uncompressed data: …B

COPY(7,4): B C A B

Go to State[10] = s12. depth(s12) > 4.

Go to f(s12) = s2. depth(s2) ≤ 4.

Current state is s2.
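The right-boundary jump of this example can be sketched as follows. The sketch assumes the pattern set E, BE, BD, BCAA, BCD, CDBCAB and the dictionary D B E A A C D B C A B C, and it assumes the left boundary of the copy has already been resolved. States are shown by label, and `right_boundary_state` is a hypothetical name.

```python
from collections import deque

def build_ac(patterns):
    """Return goto, fail and label tables; state 0 is the root."""
    goto, fail, label = [{}], [0], [""]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                label.append(label[s] + ch)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return goto, fail, label

def scan_states(goto, fail, text):
    """Offline scan: the AC state after each byte of the dictionary."""
    states, s = [], 0
    for ch in text:
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        states.append(s)
    return states

def right_boundary_state(fail, label, saved, p, x):
    """Jump to the saved state at the end of copy (p, x), then shorten its label."""
    s = saved[p + x - 1]
    hops = 0
    while len(label[s]) > x:        # label starts before the copied bytes
        s = fail[s]                 # next-longest suffix that is still a label
        hops += 1
    return s, hops

PATTERNS = ["E", "BE", "BD", "BCAA", "BCD", "CDBCAB"]   # assumed split of the slide's list
goto, fail, label = build_ac(PATTERNS)
saved = scan_states(goto, fail, "DBEAACDBCABC")
state, hops = right_boundary_state(fail, label, saved, p=7, x=4)
print(label[saved[10]], "->", label[state], "after", hops, "failure transition(s)")
```

The saved state at position 10 has the six-character label CDBCAB, which cannot fit in a four-byte copy, so one failure transition leads to the state labelled B, matching s2 on the slide.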

The Online Phase – Left Boundary

When encountering a copy instruction (p,x), we want to stop scanning and jump to state[p+x-1]

◦If the number of bytes we have read from the copy value is less than the depth of the current state: the label of the state begins before the copied bytes. We scan the copy value until we reach a state whose label is no longer than the number of bytes read.

◦Otherwise: the label of the state is contained in the copy value. Both the offline and the online scans have the same context.

Example – Left Boundary

Uncompressed data: …B

COPY(5,4): C D B C

j=0, depth=1: continue

j=1, depth=2: continue

j=2, depth=3: continue

j=3: stop scanning (depth(s9) ≤ 3)
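The left-boundary rule, the internal-match lookup and the right-boundary jump can be combined into one online scan. The following is a minimal sketch under stated assumptions, not the authors' implementation: the pattern set is assumed to be E, BE, BD, BCAA, BCD, CDBCAB, the delta file is given as a list of literal strings and `(p, x)` tuples, internal matches are found with a simple sorted-list lookup, and all names are hypothetical. A plain AC scan of the uncompressed text is included as a cross-check.

```python
from bisect import bisect_left
from collections import deque

def build_ac(patterns):
    """Build goto/fail/out/depth tables of an Aho-Corasick automaton."""
    goto, fail, out, depth = [{}], [0], [set()], [0]
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out, depth

def step(ac, s, ch):
    goto, fail = ac[0], ac[1]
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)

def offline_phase(ac, dictionary):
    """Save the state after each dictionary byte and the matches, sorted by end."""
    states, matches, s = [], [], 0
    for q, ch in enumerate(dictionary):
        s = step(ac, s, ch)
        states.append(s)
        matches.extend((q, pat) for pat in sorted(ac[2][s]))
    return states, matches

def online_phase(ac, dictionary, states, matches, delta):
    """Return (matches as (end index, pattern), final state, bytes scanned)."""
    _, fail, out, depth = ac
    s, pos, found, scanned = 0, 0, set(), 0
    for token in delta:
        if isinstance(token, str):                 # uncompressed bytes: plain AC
            for ch in token:
                s = step(ac, s, ch)
                found |= {(pos, pat) for pat in out[s]}
                pos += 1
            scanned += len(token)
            continue
        p, x = token
        j = 0
        while j < x and depth[s] > j:              # left boundary
            s = step(ac, s, dictionary[p + j])
            found |= {(pos + j, pat) for pat in out[s]}
            j += 1
        scanned += j
        if j < x:                                  # the remaining x - j bytes are skipped
            i = bisect_left(matches, (p + j, ""))
            while i < len(matches) and matches[i][0] < p + x:
                q, pat = matches[i]
                if len(pat) <= q - p + 1:          # internal: fits inside the copy
                    found.add((pos + q - p, pat))
                i += 1
            s = states[p + x - 1]                  # right boundary
            while depth[s] > x:
                s = fail[s]
        pos += x
    return found, s, scanned

def naive_scan(ac, text):
    s, found = 0, set()
    for i, ch in enumerate(text):
        s = step(ac, s, ch)
        found |= {(i, pat) for pat in ac[2][s]}
    return found, s

DICTIONARY = "DBEAACDBCABC"
DELTA = ["ABDB", (5, 4), "AAB", (1, 4)]            # ABDB(5,4)AAB(1,4)
ac = build_ac(["E", "BE", "BD", "BCAA", "BCD", "CDBCAB"])
states, matches = offline_phase(ac, DICTIONARY)
found, final_state, scanned = online_phase(ac, DICTIONARY, states, matches, DELTA)
plain = "".join(t if isinstance(t, str) else DICTIONARY[t[0]:t[0] + t[1]] for t in DELTA)
expected, expected_state = naive_scan(ac, plain)
print(sorted(found), scanned, "of", len(plain), "bytes scanned")
```

On this small example the sketch scans 11 of the 15 uncompressed bytes and reports the same matches and final state as the plain scan. With such short copies the savings are modest; the skipped portion grows with the length of the copies.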

Experimental Results

Input:
◦google.com dictionary
◦Pages for the 1000 most popular Google queries

Patterns
◦Snort

The synthetic case
◦A patterns file for each input file, so that the input file has a different percentage of matches, from 25% to 100%

The Algorithm Overheads

1. Traversing the failure transitions
◦In the right boundary

2. Scanning the copy value
◦In the left boundary

3. Memory consumption
◦The additional information of the offline phase
◦Total: 420 KB (per dictionary)
◦Can be further reduced by a variable-length pointer encoding

Failure Transitions – Right Boundaries

If length ≥ depth, no failure transition is taken

In our experiments:
◦The average is 2.35 failure transitions per file (average of 557 copy instructions per file)

Scanning the Copy Value - Left Boundary

Compression ratio – compressed/uncompressed

Scan ratio – scanned/uncompressed

Snort
◦Low percentage of matches
◦Scan ratio ≈ compression ratio

The synthetic case
◦High percentage of matches
◦Unrealistic case
◦Scan ratio is between 1.05 and 1.2 times the compression ratio

Regular Expression Results

Strings were extracted from the regular expression and were added to the pattern set.

When needed, we use an off-the-shelf Perl-compatible regular expression engine to scan additional parts of the text.

The overhead of the regular expressions is around 1%, which is almost negligible

Questions??

Regular Expression

Very common in security-purpose patterns.
◦In Snort, 55% of the rules contain a regular expression.

Composed of anchors and pcre tokens. For example, in the pattern abc[1-9]*xyza{3,7}

The anchors are:
◦abc
◦xyz

The pcre tokens are:
◦[1-9]*
◦a{3,7}

Dealing with Regular Expression

1. The anchors are extracted from the regular expression offline.

2. The anchors are added to the patterns set.

3. If there is a regular expression all of whose anchors were matched:

◦Run an off-the-shelf regular expression engine until either a mismatch, a full pattern match, or the whole (limited) text is searched.

Regular Expression – Limited Search

In most cases, we can limit the search in at least one direction.

◦If before the first anchor all tokens have a limited size, there is a bounded number of characters we should examine before the matched anchor.

◦If after the last anchor all tokens have a limited size, there is a bounded number of characters we should examine after the matched anchor.
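The anchor filter and the limited search can be illustrated for the example pattern abc[1-9]*xyza{3,7}. This is a rough sketch only: Python's `re` module stands in for the off-the-shelf PCRE engine, `str.find`/`str.rfind` stand in for the anchor matches that the AC scan would report, and the anchors and the bound of 7 characters after the last anchor are listed by hand rather than extracted automatically.

```python
import re

REGEX = re.compile(r"abc[1-9]*xyza{3,7}")
FIRST_ANCHOR = "abc"
LAST_ANCHOR = "xyz"
MAX_AFTER_LAST_ANCHOR = 7    # a{3,7} consumes at most 7 characters

def scan_regex(text):
    """Run the regex engine only if both anchors occur, on a bounded window."""
    first = text.find(FIRST_ANCHOR)          # stand-in for an AC anchor match
    last = text.rfind(LAST_ANCHOR)           # stand-in for an AC anchor match
    if first < 0 or last < 0:
        return None                          # an anchor is missing: engine is never invoked
    # Nothing precedes the first anchor in the pattern, so start exactly there.
    # After the last anchor only a bounded token follows, so stop 7 characters later.
    window_end = last + len(LAST_ANCHOR) + MAX_AFTER_LAST_ANCHOR
    m = REGEX.search(text, first, window_end)
    return m.group(0) if m else None

print(scan_regex("zzabc123xyzaaaazz"))
print(scan_regex("abc then xyz"))
print(scan_regex("only abc here"))
```

The token [1-9]* between the anchors is unbounded, so the window is limited only before the first anchor and after the last one, as described above.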

Memory Consumption

1. Doubling the size of the dictionary (for saving the offline scan results, one pointer per symbol)

2. Saving the matched list (for internal matches)

Our experiments:
◦Match list size: 40,000
◦Dictionary size: 116K symbols
◦Pointer size: 17 bits

Total memory consumption is 420 KB (per dictionary)
◦Can be further reduced by a variable-length pointer encoding.
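The share of the saved-state table in this total can be estimated with a little arithmetic. The sketch below assumes that "116K" means 116,000 symbols and that "420 KB" means 420,000 bytes; both readings are assumptions, and the remainder is simply what would be left for the match list and other bookkeeping under them.

```python
DICTIONARY_SYMBOLS = 116_000   # "116K symbols", reading K as 1,000 (assumption)
POINTER_BITS = 17              # one state pointer per dictionary symbol
TOTAL_BYTES = 420_000          # "420 KB", reading KB as 1,000 bytes (assumption)

# One 17-bit pointer per symbol, packed without padding
state_table_bytes = DICTIONARY_SYMBOLS * POINTER_BITS // 8

# Number of distinct states a 17-bit pointer can address
addressable_states = 2 ** POINTER_BITS

# What remains of the reported total for the match list and other data
remaining_bytes = TOTAL_BYTES - state_table_bytes

print(state_table_bytes, addressable_states, remaining_bytes)
```

At 17 bits per 8-bit symbol the state table is roughly twice the size of the dictionary itself, which agrees with item 1 above.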
