Hyperion: High Volume Stream Archival for Retrospective Querying

Peter Desnoyers and Prashant Shenoy
University of Massachusetts, Amherst, Department of Computer Science
Posted: 22-Dec-2015

Page 1: Hyperion: High Volume Stream Archival for Retrospective Querying

Peter Desnoyers and Prashant Shenoy, University of Massachusetts

Page 2: Packet monitoring with history

Packet monitor: capture and search packet headers
    E.g.: Snort, tcpdump, Gigascope

... with history:
    Capture, index, and store packet headers
    Interactive queries on stored data

Provides new capabilities:
    Network forensics: When was a system compromised? From where? How?
    Management: After-the-fact debugging

[Figure labels: monitor, storage]

Page 3: Challenges

Speed

Storage rate, capacity: to store data without loss, retain long enough

Queries: must search millions of packet records

Indexing in real time: for online queries

Commodity hardware

For each link monitored:
    1 Gbit/s x 80% ÷ 400 B/pkt = 250,000 pkts/s
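The per-link packet rate can be reproduced with a few lines of arithmetic. This is only a worked check of the slide's numbers; the function name and parameter names are made up for illustration.

```python
def packets_per_second(link_bits_per_s: float, utilization: float,
                       bytes_per_packet: float) -> float:
    """Packet rate for a link at a given utilization and mean packet size."""
    bytes_per_s = link_bits_per_s * utilization / 8   # bits -> bytes
    return bytes_per_s / bytes_per_packet


# Slide values: 1 Gbit/s link, 80% utilization, 400-byte packets.
rate = packets_per_second(1e9, 0.80, 400)
print(f"{rate:,.0f} pkts/s")
```

The division by 8 (bits to bytes) is the step the slide leaves implicit.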

Page 4: Existing approaches

Packet monitoring with history requires a new system.

                                                 Event rates   Archive   Index, query   Commodity HW
Streaming query systems (GigaScope, Bro, Snort)  Yes           No        No             Yes
Peer-to-peer systems (MIND, PIER)                No            Yes       Yes            Yes
Conventional DBMS                                No            Yes       Yes            Yes
CoMo                                             Yes           Yes       No             Yes
Proprietary systems*                             ?             Yes       Yes            No

*Niksun NetDetector, Sandstorm NetInterceptor

Page 5: Outline of talk

Introduction and Motivation
Design
Implementation
Results
Conclusions

Page 6: Hyperion Design

Multiple monitor systems
High-speed storage system
Local index
Distributed index for query routing

[Figure: Hyperion node, with components Monitor/capture, Storage, Index, Distributed index]

Page 7: Storage Requirements

Real-time: writes must keep up or data is lost
Prioritized: reads shouldn’t interfere with writes
Aging: old data replaced by new
Stream storage: different behavior

Behavior: typical app. vs. Hyperion

                    Typical app      Hyperion
Likely deletes      Newest files     Oldest data
File size           Random, small    Streaming
Sequential reads    yes              no

Packet monitoring is different from typical applications.

Page 8: Log structured stream storage

Goal: minimize seeks despite interleaved writes on multiple streams

Log-structured file system minimizes seeks
    Interleave writes at advancing frontier
    Free space collected by segment cleaner

But: general-purpose segment cleaner performs poorly on streams

[Figure: interleaved writes from streams A, B, and C (1: A, 2: C, 3: A, 4: B) laid out by disk position, with the write frontier marked]

Page 9: Hyperion StreamFS

How to improve on a general-purpose file system?
    Rely on application use patterns
    Eliminate unneeded features

StreamFS - log structure with no segment cleaner.
    No deletes (just overwrite)
    No fragmentation
    No segment cleaning overhead

Operation:
    Write fixed-size segment
    Advance write frontier to next segment ready for deletion

[Figure label: skip]
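The write operation described on this slide can be sketched as a circular log of fixed-size segments. This is a toy in-memory model, not the StreamFS implementation: the class name, the `reserved` set (standing in for "not yet ready for deletion"), and the method names are all assumptions made for illustration.

```python
class SegmentLog:
    """Toy model: fixed-size segments, no deletes, no segment cleaner."""

    def __init__(self, num_segments: int):
        self.segments = [None] * num_segments  # None = never written
        self.reserved = set()                  # segments not ready for deletion
        self.frontier = 0                      # next segment to consider

    def reserve(self, index: int) -> None:
        self.reserved.add(index)

    def release(self, index: int) -> None:
        self.reserved.discard(index)

    def write_segment(self, stream: str, payload: bytes) -> int:
        """Overwrite the next segment that is ready for deletion."""
        for _ in range(len(self.segments)):
            if self.frontier not in self.reserved:
                break
            # Skip a segment whose data must still be retained.
            self.frontier = (self.frontier + 1) % len(self.segments)
        else:
            raise RuntimeError("every segment is reserved")
        index = self.frontier
        self.segments[index] = (stream, payload)   # overwrite, never delete
        self.frontier = (index + 1) % len(self.segments)
        return index


log = SegmentLog(4)
log.reserve(1)
written = [log.write_segment("A", b"data") for _ in range(5)]
```

Because old data is simply overwritten when the frontier wraps, nothing is ever copied or compacted, which is where the "no segment cleaning overhead" claim comes from.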

Page 10: StreamFS Design

Record: single write, packed into:

Segment: fixed-size, single stream, interleaved into:

Region: contains:
    Region map: identifies segments in region
        Used when write frontier wraps

Directory: locate streams on disk

[Figure labels: record, segment, region, region map, directory, Stream_A]
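The record / segment / region / directory hierarchy can be pictured with a few plain data classes. This is a hypothetical in-memory sketch only: the slides do not give the on-disk layout, so the field names and the idea of deriving the region map and directory on the fly are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    stream: str                                   # a segment holds one stream
    records: list = field(default_factory=list)   # each record is one write


@dataclass
class Region:
    segments: list = field(default_factory=list)  # segments of many streams

    def region_map(self) -> list:
        """Identify which stream owns each segment in this region."""
        return [segment.stream for segment in self.segments]


@dataclass
class Volume:
    regions: list

    def directory(self) -> dict:
        """Locate each stream as a list of (region, segment) positions."""
        locations = {}
        for r, region in enumerate(self.regions):
            for s, stream in enumerate(region.region_map()):
                locations.setdefault(stream, []).append((r, s))
        return locations


volume = Volume([
    Region([Segment("Stream_A", [b"r1", b"r2"]), Segment("Stream_B", [b"r3"])]),
    Region([Segment("Stream_A", [b"r4"])]),
])
```

Here `volume.directory()` maps `Stream_A` to two positions and `Stream_B` to one, showing how segments of different streams interleave inside a region.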

Page 11: StreamFS optimizations

Data Retention
    Control how much history saved
    Lets filesystem make delete decisions

Speed balancing
    Worst-case speed set by slowest tracks
    Solution: interleave fast and slow sections
    Worst-case speed now set by average track

[Figure labels: New data; Reservation; Old data is deleted]
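The speed-balancing idea can be illustrated with a small calculation. The zone transfer rates below are invented, and pairing the fastest section with the slowest is just one possible interleaving; the slides do not say how StreamFS orders sections.

```python
def interleave_fast_slow(zone_rates):
    """Order sections so that fast and slow ones alternate."""
    ordered = sorted(zone_rates, reverse=True)
    result = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        result.append(ordered[lo])          # next fastest section
        if lo != hi:
            result.append(ordered[hi])      # next slowest section
        lo += 1
        hi -= 1
    return result


def worst_pair_rate(order):
    """Lowest sustained rate over consecutive pairs of equal-size sections.

    Writing equal amounts at rates a and b averages to their harmonic mean.
    """
    pairs = zip(order[0::2], order[1::2])
    return min(2 / (1 / a + 1 / b) for a, b in pairs)


zones_mb_per_s = [60, 55, 50, 45, 40, 35]   # hypothetical outer-to-inner rates
balanced = interleave_fast_slow(zones_mb_per_s)
print(min(zones_mb_per_s), round(worst_pair_rate(balanced), 1))
```

Written sequentially, the worst case is the slowest zone; once interleaved, the worst case over any pair of sections sits much closer to the average.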

Page 12: Local Index

Requirements:
    High insertion speed
    Interactive query response

Index and search mechanisms

                     Search speed    Insert speed
Exhaustive search    No              Yes
B-tree               Yes             No
Hash index           Yes             No
Signature index      Yes             Yes

Page 13: Signature Index

Compress data into signature
Store signature separately
Search signature, not data
Retrieve data itself on match

Signature algorithm: Bloom filter
    No false negatives - never misses a result
    False positives - extra read overhead

[Figure labels: Records, Keys, Signature]
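A Bloom-filter signature of the kind named on this slide can be sketched in a few lines. The sizes, the SHA-256 double-hashing scheme, and the key format are assumptions chosen for illustration, not Hyperion's actual parameters.

```python
import hashlib


class BloomSignature:
    """Minimal Bloom filter held in a Python int used as a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key: bytes):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: bytes) -> bool:
        # True for every inserted key; occasionally true for others.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


signature = BloomSignature()
for key in (b"10.0.0.1:80", b"10.0.0.2:443"):
    signature.add(key)
```

A query tests `signature.might_contain(key)` first and reads the underlying records only on a match; a false positive costs one unnecessary read, while an inserted key is never missed.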

Page 14: Signature index efficiency

Overhead = bytes searched
    Index size
    False positives (data scan)

Concise index:
    Index scan cost: low
    False positive scans: high

Verbose index:
    Index scan cost: high
    False positive scans: low

[Figure: bytes searched vs. index size]
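The trade-off on this slide can be made concrete with a rough cost model. It uses the standard Bloom filter false-positive approximation, (1 - e^(-k/b))^k for k hash functions and b bits per key; the block counts and sizes are invented, and one signature per data block is an assumption.

```python
import math


def false_positive_rate(bits_per_key: float, num_hashes: int) -> float:
    """Standard Bloom filter approximation of the false-positive rate."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def bytes_searched(bits_per_key: int, num_blocks: int = 1000,
                   keys_per_block: int = 1000, block_bytes: int = 65536) -> float:
    """Index bytes scanned plus data bytes read because of false positives."""
    num_hashes = max(1, round(bits_per_key * math.log(2)))
    index_bytes = num_blocks * keys_per_block * bits_per_key / 8
    wasted_data_bytes = (false_positive_rate(bits_per_key, num_hashes)
                         * num_blocks * block_bytes)
    return index_bytes + wasted_data_bytes


costs = {b: bytes_searched(b) for b in range(1, 33)}
best = min(costs, key=costs.get)
print(best, round(costs[best]))
```

Under these assumptions the total falls steeply as the index grows, bottoms out at an intermediate size, and then rises again as the index scan itself dominates.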

Page 15: Multi-level signature index

Concise index: low scan overhead

Verbose index: low false positive overhead

Use both:
    Scan concise index
    Check positives in verbose index

[Figure labels: Concise index, Verbose index, Data records]
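A two-level lookup along these lines might look as follows. The layout is an assumption: one verbose signature per data block, one concise signature per group of four blocks, with made-up bit widths. The slides do not specify how the two levels are organized.

```python
import hashlib

CONCISE_BITS = 64     # hypothetical: small signature covering a block group
VERBOSE_BITS = 512    # hypothetical: larger signature covering one block
GROUP = 4             # hypothetical: blocks per concise signature


def bit_positions(key: bytes, num_bits: int, num_hashes: int = 3):
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]


def make_signature(keys, num_bits: int) -> int:
    sig = 0
    for key in keys:
        for pos in bit_positions(key, num_bits):
            sig |= 1 << pos
    return sig


def may_contain(sig: int, key: bytes, num_bits: int) -> bool:
    return all((sig >> pos) & 1 for pos in bit_positions(key, num_bits))


def build_index(blocks):
    verbose = [make_signature(block, VERBOSE_BITS) for block in blocks]
    concise = [
        make_signature([k for block in blocks[i:i + GROUP] for k in block],
                       CONCISE_BITS)
        for i in range(0, len(blocks), GROUP)
    ]
    return concise, verbose


def search(blocks, concise, verbose, key: bytes):
    """Scan the concise level, confirm in the verbose level, then read data."""
    hits, blocks_read = [], 0
    for g, concise_sig in enumerate(concise):
        if not may_contain(concise_sig, key, CONCISE_BITS):
            continue                      # whole group ruled out cheaply
        for b in range(g * GROUP, min((g + 1) * GROUP, len(blocks))):
            if may_contain(verbose[b], key, VERBOSE_BITS):
                blocks_read += 1          # only now touch the data records
                hits.extend((b, k) for k in blocks[b] if k == key)
    return hits, blocks_read


blocks = [[f"10.0.{b}.{i}".encode() for i in range(8)] for b in range(16)]
concise, verbose = build_index(blocks)
hits, blocks_read = search(blocks, concise, verbose, b"10.0.5.3")
```

The final comparison against the data filters out any remaining false positives, so `hits` is exact; the two signature levels only decide how many blocks have to be read to get there.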

Page 16: Distributed Index

Query routing:
    Send queries only to nodes holding matches
    Use signature index

Index distribution:
    Aggregate indexes at cluster head
    Route queries through cluster head
    Rotate cluster head for load sharing
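Query routing through a cluster head can be sketched as below. Everything here is illustrative: the single-hash CRC32 signature, the node names, and the round-robin rotation rule are assumptions, since the slides give only the outline of the scheme.

```python
import zlib

NUM_BITS = 256   # hypothetical signature width


def signature_of(keys) -> int:
    """One-hash signature: set one bit per key."""
    sig = 0
    for key in keys:
        sig |= 1 << (zlib.crc32(key) % NUM_BITS)
    return sig


def may_hold(sig: int, key: bytes) -> bool:
    return bool((sig >> (zlib.crc32(key) % NUM_BITS)) & 1)


class ClusterHead:
    """Holds the signatures published by each node and routes queries."""

    def __init__(self):
        self.signatures = {}

    def publish(self, node: str, keys) -> None:
        self.signatures[node] = signature_of(keys)

    def route(self, key: bytes):
        # Forward the query only to nodes whose signature may match.
        return sorted(node for node, sig in self.signatures.items()
                      if may_hold(sig, key))


def head_for_epoch(nodes, epoch: int) -> str:
    """Rotate the cluster-head role round-robin to share load."""
    ordered = sorted(nodes)
    return ordered[epoch % len(ordered)]


head = ClusterHead()
head.publish("node-a", [b"10.0.0.1", b"10.0.0.2"])
head.publish("node-b", [b"192.168.1.9"])
targets = head.route(b"10.0.0.1")
```

A node that really holds the key is always in `targets`; a signature collision can add an extra node, which costs one wasted query rather than a missed result.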

Page 17: Implementation

Components:
    StreamFS
    Index
    Capture
    RPC, query & index distribution
    Query API

Linux OS
Python framework

[Figure: Hyperion components: Linux kernel capture, StreamFS, Index, RPC/query/index dist., Query API]

Page 18: Outline of talk

Introduction and Motivation
Design
Implementation
Results
Conclusions

Page 19: Experimental Setup

Hardware:
    Linux cluster
    Dual 2.4GHz Xeon CPUs
    1 GB memory
    4 x 10K RPM SCSI disks
    Syskonnect SK98xx + U. Cambridge driver

Test data:
    Packet traces from UMass Internet gateway*
    400 Mbit/s, 100k pkt/s

*http://traces.cs.umass.edu

Page 20: StreamFS - write performance

Tested configurations:
    NetBSD / LFS
    Linux / XFS (SGI)
    StreamFS

Workload:
    Multiple streams, rates
    Logfile rotation: used for LFS, XFS

Results:
    50% boost in worst-case throughput
    Fast enough to store 1,000,000 packet hdrs/s

Page 21: StreamFS - read/write

Workload:
    Continuous writes
    Random reads

StreamFS: sustained write throughput

XFS: throughput collapse

StreamFS can handle stream read+write traffic without data loss. XFS cannot.

Page 22: Index Performance

Calculation benchmark: 250,000 pkts/sec

Query:
    380M packet headers
    26GB data
    Selective query (1 pkt returned)

Query results:
    13MB data fetched to query 26GB data (1:2000)

[Figure: data fetched (MB) vs. index size]
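The 1:2000 ratio, and the stored size per header that the query figures imply, can be checked directly. Decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) are assumed here; the slide does not say which convention it uses.

```python
data_bytes = 26e9        # 26 GB of stored packet data
fetched_bytes = 13e6     # 13 MB read to answer the query
num_headers = 380e6      # 380M packet headers

fetch_ratio = data_bytes / fetched_bytes        # data stored per byte fetched
bytes_per_header = data_bytes / num_headers     # average stored header size

print(f"1:{fetch_ratio:.0f}, about {bytes_per_header:.0f} bytes per header")
```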

Page 23: System Performance

Workload:
    Trace replay
    Simultaneous queries

Speed: 100-200K pkts/s

Packet loss measured: #transmitted - #received

Up to 175K pkts/s with negligible packet loss

Results:

Packets/s    Loss rate
110,000      0
130,000      0
150,000      2·10^-6
160,000      4·10^-6
175,000      10·10^-6
200,000      0.001
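The loss measurement can be written out as a short calculation. Dividing the packet difference by the number transmitted to get a rate is an assumption (the slide gives only the difference), and the 10^-5 threshold for "negligible" is chosen here just to match the slide's 175K figure.

```python
def loss_rate(transmitted: int, received: int) -> float:
    """Fraction of replayed packets that the monitor did not record."""
    return (transmitted - received) / transmitted


# Loss rates reported on the slide, keyed by offered load in packets/s.
SLIDE_RESULTS = {
    110_000: 0.0,
    130_000: 0.0,
    150_000: 2e-6,
    160_000: 4e-6,
    175_000: 10e-6,
    200_000: 1e-3,
}


def highest_rate_within(results: dict, max_loss: float) -> int:
    """Highest offered load whose loss rate stays at or below max_loss."""
    return max(rate for rate, loss in results.items() if loss <= max_loss)


print(highest_rate_within(SLIDE_RESULTS, 1e-5))
```

The jump from 10^-5 at 175,000 pkts/s to 10^-3 at 200,000 pkts/s is what marks 175K as the practical limit in these results.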

Page 24: Conclusions

Hyperion - packet monitoring with retrospective queries

Key components:
    Storage: 50% improvement over general-purpose file systems
    Index: insert at 250K pkts/sec; interactive query over 100s of millions of pkts
    System: capture, index, and query at 175K pkts/sec

Page 25: Questions?