Post on 01-Aug-2020

Co-Evaluation of Pattern Matching Algorithms on IoT Devices with Embedded GPUs

Charalampos Stylianopoulos, Simon Kindström, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou

Distributed Computing and Systems

Motivation

- IoT security is a concern
- Recent attacks:
  - Show that IoT security is lacking
    - Mirai botnet
    - Attacks on a casino's aquarium thermostat
  - Underline the need for countermeasures

Motivation

Standard security countermeasures (e.g. NIDS) can be applied:

- on the IoT devices themselves
- on the entry point to the network of IoT devices

Motivation

- Challenges
  - Resource-constrained devices
  - More connected devices -> more traffic to inspect
- NIDS
  - Performance bottleneck
  - Not tailored to hardware

Motivation: Pattern matching

Pattern matching = the core functionality of NIDS

- more than 70% of running time [1]

Goal: compare all network traffic against all malicious signatures

Search for all patterns, anywhere in the network stream.

Input stream: … http://some.site.com/get.asp?f=/etc/passwd … GET HTTP try_backdoor …

Pattern set: … /etc/passwd, admin.dll, get.asp, backdoor …

[1] "Generating realistic workloads for network intrusion detection systems", Antonatos et al.
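To make the goal concrete, here is a minimal baseline sketch in Python that checks every pattern at every offset of the input. It is not one of the evaluated algorithms; the function name `find_all` and the concatenated example stream are illustrative assumptions. Its cost grows with both the input length and the number of patterns, which is the behaviour the algorithms discussed in this talk try to avoid.

```python
def find_all(stream, patterns):
    """Report every (offset, pattern) occurrence by brute force."""
    matches = []
    for offset in range(len(stream)):
        for pattern in patterns:
            # bytes.startswith accepts a start offset, so no slicing is needed
            if stream.startswith(pattern, offset):
                matches.append((offset, pattern))
    return matches


# Example input and pattern set taken from the slide (elided parts omitted).
stream = b"http://some.site.com/get.asp?f=/etc/passwd GET HTTP try_backdoor"
patterns = [b"/etc/passwd", b"admin.dll", b"get.asp", b"backdoor"]
matches = find_all(stream, patterns)
```

Three of the four patterns occur in this input; `admin.dll` does not.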

Motivation: New Devices

- Opportunities
  - IoT/embedded hardware is evolving
  - New hardware features
    - Example: ODROID single-board computers with embedded Graphics Processing Units (GPUs)

Making use of those features is an open issue

Our work

- The questions we are trying to answer in this work:
  - Which algorithms to use?
  - What are the hardware characteristics that affect the performance?
  - How to create new algorithms that make best use of those characteristics?

Our work

- Co-evaluation of pattern matching algorithms
  - Evaluate existing implementations
  - Influence the design of new ones
- Target embedded GPUs
  - Deep look into their architectural features
- Extensive evaluation
  - Different datasets, patterns
  - Energy efficiency

Outline

- Background
  - GPU computing
- Our Benchmark
- Evaluation

Background

- General-purpose GPU computing (GPGPU)
  - Other than graphics, GPUs can be used for general tasks as well
  - Highly parallel architecture
- Pattern matching on a GPU: not a new thing
  - Not much work on embedded GPUs

[1] "Gnort: High Performance Network Intrusion Detection Using Graphics Processors", Vasiliadis et al., RAID 2008
[2] "APUNet: Revitalizing GPU as Packet Processing Accelerator", Go et al., NSDI 2017
[3] "A highly-efficient memory-compression scheme for GPU-accelerated intrusion detection systems", Bellekens et al., SINCONF 2017

Background

- The platform

Source: "Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs"

Background

Important characteristics (unique to embedded GPUs)

- Small number of cores/threads
- No main memory on the GPU
  -> Shared main memory between CPU and GPU
- No local memory on chip
- Vectorization in each GPU thread
- Separate instruction counter per GPU thread
  -> No need to worry about divergent execution

Outline

- Background
- Our Benchmark
  - Algorithms
  - Optimizations
- Evaluation

Algorithms

Representative algorithms from two categories:

        State machine based    Filtering based
CPU     Aho-Corasick           DFC
GPU

Algorithms (CPU)

The Aho-Corasick algorithm

- Used in many Network Intrusion Detection Systems
- Builds a State Machine (SM) from all the patterns
- Traverses the SM reading the input byte by byte

Benefits
- Only one lookup per input byte

Limitations
- Poor cache locality
- Data dependencies

"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Comm. ACM '75
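The three steps above can be sketched in Python as follows. This is a minimal illustration, assuming the fully expanded (deterministic) form of the state machine, in which every state stores a transition for each of the 256 byte values; that is the form in which the scan performs exactly one table lookup per input byte. The names `build_dfa` and `search` are hypothetical, and production NIDS implementations use more compact table layouts.

```python
from collections import deque

ALPHABET = 256


def build_dfa(patterns):
    """Return (delta, output): a full transition table and per-state match lists."""
    children = [{}]   # trie edges: children[state][byte] -> state
    output = [[]]     # patterns that end at each state
    for pattern in patterns:
        state = 0
        for byte in pattern:
            if byte not in children[state]:
                children.append({})
                output.append([])
                children[state][byte] = len(children) - 1
            state = children[state][byte]
        output[state].append(pattern)

    delta = [[0] * ALPHABET for _ in children]
    fail = [0] * len(children)   # longest proper suffix that is also a trie path
    for byte, child in children[0].items():
        delta[0][byte] = child
    queue = deque(children[0].values())
    while queue:                 # breadth-first, so shallower states are done first
        state = queue.popleft()
        output[state] = output[state] + output[fail[state]]
        for byte in range(ALPHABET):
            child = children[state].get(byte)
            if child is None:
                # No trie edge: reuse the transition of the failure state.
                delta[state][byte] = delta[fail[state]][byte]
            else:
                delta[state][byte] = child
                fail[child] = delta[fail[state]][byte]
                queue.append(child)
    return delta, output


def search(stream, delta, output):
    """Scan the input once, reporting (start offset, pattern) for every match."""
    state = 0
    matches = []
    for position, byte in enumerate(stream):
        state = delta[state][byte]   # the single lookup per input byte
        for pattern in output[state]:
            matches.append((position - len(pattern) + 1, pattern))
    return matches


delta, output = build_dfa([b"/etc/passwd", b"admin.dll", b"get.asp", b"backdoor"])
found = search(b"http://some.site.com/get.asp?f=/etc/passwd try_backdoor", delta, output)
```

Each state depends on the previous one (the data dependency listed under Limitations), and the table holds 256 entries per state, so for large pattern sets consecutive lookups tend to touch unrelated memory.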

Algorithms (CPU)

The DFC algorithm

- Creates a filter from the patterns
- Quickly filters out parts of the input

Pattern set: … activate, admin.dll, backdoor, get.asp …

Filter (8 KB): fits in cache!

  index:  … ab ac ad ... ba bb ... ge ...
  bits:   … 0 1 1 0 0 0 1 0 0 0 1 0 0

Input stream: … this is an input

"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
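The filter idea can be sketched as a bitmap. The 8 KB size is consistent with one bit for each of the 2^16 possible two-byte values (65,536 bits = 8,192 bytes), and the index labels on the slide are two characters long, so this sketch assumes the filter is indexed by the first two bytes of each pattern. The byte order used to form the index and the names `build_filter` and `candidate_positions` are illustrative assumptions, and patterns shorter than two bytes are not handled here.

```python
FILTER_BITS = 1 << 16             # one bit per possible two-byte value
FILTER_BYTES = FILTER_BITS // 8   # 8192 bytes = 8 KB


def build_filter(patterns):
    """Set the bit indexed by the first two bytes of each pattern (length >= 2 assumed)."""
    bitmap = bytearray(FILTER_BYTES)
    for pattern in patterns:
        index = pattern[0] | (pattern[1] << 8)
        bitmap[index >> 3] |= 1 << (index & 7)
    return bitmap


def candidate_positions(stream, bitmap):
    """Return the input offsets whose two-byte window passes the filter."""
    hits = []
    for offset in range(len(stream) - 1):
        index = stream[offset] | (stream[offset + 1] << 8)
        if bitmap[index >> 3] & (1 << (index & 7)):
            hits.append(offset)
    return hits


patterns = [b"activate", b"admin.dll", b"backdoor", b"get.asp"]
bitmap = build_filter(patterns)
clean = candidate_positions(b"this is an input", bitmap)
suspicious = candidate_positions(b"try_backdoor", bitmap)
```

For the slide's input `this is an input`, no two-byte window passes the filter, so the whole input is discarded without touching the patterns. For `try_backdoor`, two offsets pass: the `ba` that really starts `backdoor`, and the `ac` inside it, which only shares a prefix with `activate`. Such false positives are why a verification step is still needed.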

Algorithms (CPU)

The DFC algorithm (continued)

- Progressive filtering
  - in cache
- Verification
  - in memory

[Figure: initial filter, followed by pattern-length-specific filters (1B, 2-3B, 4-7B, 8B and longer), followed by hash tables]

Benefits
- Cache locality (on filtering)
- No data dependencies

Limitations
- Verification phase is costly

"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16
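The split between a cheap filtering step and a costlier verification step can be illustrated with a simplified two-phase scan. This is only a sketch under stated assumptions: a Python set of two-byte prefixes stands in for the in-cache filter, a single dictionary keyed by the same prefix stands in for the hash tables, and the pattern-length-specific filters are left out. The names `build_tables` and `scan` are hypothetical.

```python
from collections import defaultdict


def build_tables(patterns):
    """Return (prefixes, table): a stand-in filter plus a hash table that maps
    each two-byte prefix to the patterns starting with it (length >= 2 assumed)."""
    table = defaultdict(list)
    for pattern in patterns:
        table[pattern[:2]].append(pattern)
    return set(table), dict(table)


def scan(stream, prefixes, table):
    """Return (matches, verified): the matches found and how many offsets
    reached the verification phase."""
    matches = []
    verified = 0
    for offset in range(len(stream) - 1):
        window = stream[offset:offset + 2]
        if window not in prefixes:       # filtering: most offsets stop here
            continue
        verified += 1
        for pattern in table[window]:    # verification: full comparison
            if stream.startswith(pattern, offset):
                matches.append((offset, pattern))
    return matches, verified


prefixes, table = build_tables([b"activate", b"admin.dll", b"backdoor", b"get.asp"])
matches, verified = scan(b"try_backdoor", prefixes, table)
```

Each offset is handled independently of the previous one, which is what removes the data dependency, while every offset that passes the filter pays for a hash lookup and a full comparison whether or not it turns out to be a match.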

Algorithms

Representative algorithms from two categories:

        State machine based    Filtering based
CPU     Aho-Corasick           DFC
GPU     PFAC [1]               DFC (GPU)
GPU               HYBRID

[1] "Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs", Lin et al., TOC 2013

Hardware-oriented optimizations

Relevant aspects that we investigate:

- Memory mapping vs data transfers
  - 2-5X faster with memory mapping
- Placement of the filters
  - Global memory
  - Texture memory
  - Local memory
- Vectorization
  - No significant speedup

More in the paper…

Outline

- Background
- Our Benchmark
- Evaluation

Evaluation Methodology

Hardware

  CPU       4 ARM big.LITTLE
  GPU       ARM Mali-T628 (6 shader cores)
  Memory    2 GB RAM
  Sensors   On-board energy sensors

Datasets

  - 3 publicly available traffic traces
  - 1 randomly generated data set

Malicious patterns

  - 2183 patterns (from Snort)
  - 5000 patterns (emergingthreats.net)

Evaluation Methodology

- Goal of the evaluation:
  1. How fast we can process the input (execution time)
  2. How much energy we spend on processing (energy consumption)
  3. Effect of datasets and number of patterns
  4. Influence the design of new algorithms
- Versions:
  - CPU
    - Aho-Corasick
    - DFC
  - GPU
    - PFAC
    - DFC on GPU (with/without vectorization)
    - HYBRID (with/without vectorization)

Evaluation Results

- Experiment 1: execution time breakdown

[Figure: execution time breakdown for the CPU versions and the GPU versions, including vectorized variants; legend entries include Post-processing and CPU->GPU]

(Post-processing = output which and how many patterns matched, on the CPU)

Evaluation Results

- Experiment 2: energy consumption

Evaluation Results

- Experiment 3: effect of datasets and #patterns

[Figure: two panels, one for 2183 patterns and one for 5000 patterns]

Evaluation Results

- Experiment 4: configuring Hybrid

Bigger filter = slower access time (green trend, left y-axis)

Higher hit ratio -> less verification (red trend, right y-axis)

Conclusions & Future Work

- Conclusions
  - New hardware features (embedded GPUs) can alleviate the bottleneck of pattern matching
  - Architecture characteristics important for high performance and low energy consumption
  - Possible to design new algorithms tailored to the hardware
- Future Work
  - Overlap CPU/GPU execution (heterogeneous design)
  - More algorithms and devices (e.g. Nvidia's Jetson Nano)
  - Integrate with existing systems (e.g. Snort)
- Code available online

Backup Slides

Background (1/3)

- Snort
  - The de facto NIDS
  - Signature based (malicious signatures are known in advance)
  - The main pipeline looks like this:

[Figure: Snort pipeline; more than 70% of running time includes pattern matching]

Related work

- State machine based
  - Aho-Corasick
    - Benefits: only one lookup per input byte
    - Limitations: poor cache locality, data dependencies
- Filter based
  - DFC
    - Benefits: cache locality (on filtering), no data dependencies
    - Limitations: much of the hardware remains underutilized (e.g. vector instructions?)

Pattern set: … activate, admin.dll, backdoor, get.asp …

Filter (8 KB), successive states as the patterns are added:

  index:  … ab ac ad ... ba bb ... ge ...
  bits:   … 0 0 0 0 0 0 0 0 0 0 0 0 0
          … 0 1 0 0 0 0 0 0 0 0 0 0 0
          … 0 1 1 0 0 0 0 0 0 0 0 0 0
          … 0 1 1 0 0 0 1 0 0 0 1 0 0

"Efficient String Matching: An Aid to Bibliographic Search", A. Aho, M. Corasick, Comm. ACM '75

"DFC: Accelerating String Pattern Matching for Network Applications", Choi et al., NSDI '16