Date posted: 22-Dec-2015
University of Massachusetts, Amherst • Department of Computer Science
Hyperion: High Volume Stream Archival for Retrospective Querying
Peter Desnoyers and Prashant Shenoy
University of Massachusetts

Packet monitoring with history
Packet monitor: capture and search packet headers
  E.g.: Snort, tcpdump, Gigascope
… with history:
  Capture, index, and store packet headers
  Interactive queries on stored data
Provides new capabilities:
  Network forensics: When was a system compromised? From where? How?
  Management: after-the-fact debugging
[Figure: monitor, storage]

Challenges
Speed
  Storage rate, capacity: store data without loss, retain it long enough
  Queries: must search millions of packet records
  Indexing in real time: for online queries
Commodity hardware
For each link monitored:
  1 Gbit/s x 80% ÷ 400 B/pkt = 250,000 pkts/s
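
The per-link arithmetic above can be checked with a short calculation. This is a minimal sketch; the function and parameter names are illustrative and do not come from the talk.

```python
def packets_per_second(link_bits_per_s: float,
                       utilization: float,
                       bytes_per_packet: float) -> float:
    """Packet arrival rate on one monitored link."""
    bytes_per_s = link_bits_per_s * utilization / 8  # bits -> bytes
    return bytes_per_s / bytes_per_packet


# 1 Gbit/s link, 80% utilized, 400-byte average packet (figures from the slide)
rate = packets_per_second(1e9, 0.80, 400)
print(f"{rate:,.0f} pkts/s")
```

Every monitored link adds this much load again, which is why the insert rate of the index and the write rate of the storage system dominate the design.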

Existing approaches
                                                 Event rates  Archive  Index, query  Commodity HW
Streaming query systems (Gigascope, Bro, Snort)  Yes          No       No            Yes
Peer-to-peer systems (MIND, PIER)                No           Yes      Yes           Yes
Conventional DBMS                                No           Yes      Yes           Yes
CoMo                                             Yes          Yes      No            Yes
Proprietary systems*                             ?            Yes      Yes           No
*Niksun NetDetector, Sandstorm NetInterceptor
Packet monitoring with history requires a new system.

Outline of talk
  Introduction and Motivation
  Design
  Implementation
  Results
  Conclusions

Hyperion Design
  Multiple monitor systems
  High-speed storage system
  Local index
  Distributed index for query routing
[Figure: Hyperion node, with monitor/capture, storage, index, and distributed index]

Storage Requirements
Real-time: writes must keep up or data is lost
Prioritized: reads shouldn’t interfere with writes
Aging: old data replaced by new
Stream storage: different behavior
Behavior: typical app. vs. Hyperion
                  Typical app    Hyperion
Likely deletes    Newest files   Oldest data
File size         Random, small  Streaming
Sequential reads  yes            no
Packet monitoring is different from typical applications

Log structured stream storage
Goal: minimize seeks despite interleaved writes on multiple streams
Log-structured file system minimizes seeks
  Interleave writes at advancing frontier
  Free space collected by segment cleaner
But: general-purpose segment cleaner performs poorly on streams
[Figure: writes 1: A, 2: C, 3: A, 4: B from streams A, B, and C laid out by disk position up to the write frontier]

Hyperion StreamFS
How to improve on a general-purpose file system?
  Rely on application use patterns
  Eliminate unneeded features
StreamFS: log structure with no segment cleaner
  No deletes (just overwrite)
  No fragmentation
  No segment cleaning overhead
Operation:
  Write fixed-size segment
  Advance write frontier to next segment ready for deletion
[Figure: write frontier skipping a segment]
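
The write operation described above can be sketched as a circular log of fixed-size segment slots. This is an illustrative model only, not the StreamFS implementation: the `SegmentLog` class, the in-memory slot list, and the `protected` flag (standing in for "not yet ready for deletion") are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    stream: str
    data: bytes
    protected: bool = False  # hypothetical: must still be retained


class SegmentLog:
    """Circular log of fixed-size segment slots with no segment cleaner."""

    def __init__(self, num_slots: int, segment_size: int):
        self.slots: List[Optional[Segment]] = [None] * num_slots
        self.segment_size = segment_size
        self.frontier = 0  # index of the next slot to consider

    def write(self, stream: str, data: bytes, protected: bool = False) -> int:
        """Write one segment at the frontier and return the slot used."""
        if len(data) > self.segment_size:
            raise ValueError("data exceeds segment size")
        n = len(self.slots)
        for _ in range(n):
            slot = self.slots[self.frontier]
            if slot is None or not slot.protected:
                index = self.frontier
                # Overwriting is the only form of deletion.
                self.slots[index] = Segment(stream, data, protected)
                self.frontier = (index + 1) % n
                return index
            # Slot is not ready for deletion: skip it.
            self.frontier = (self.frontier + 1) % n
        raise RuntimeError("no segment is ready for deletion")


log = SegmentLog(num_slots=4, segment_size=16)
for stream, payload, keep in [("A", b"a1", False), ("C", b"c1", True),
                              ("A", b"a2", False), ("B", b"b1", False),
                              ("A", b"a3", False), ("B", b"b2", False)]:
    print(stream, "-> slot", log.write(stream, payload, protected=keep))
```

Because old data is only ever overwritten in place, no cleaner pass has to copy live segments, and the frontier only moves forward around the disk.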

StreamFS Design
Record: single write, packed into:
Segment: fixed-size, single stream, interleaved into:
Region: contains:
  Region map: identifies segments in region; used when write frontier wraps
Directory: locate streams on disk
[Figure: record, segment, region, region map, and directory entry Stream_A …]

StreamFS optimizations
Data retention
  Control how much history is saved
  Lets filesystem make delete decisions
Speed balancing
  Worst-case speed set by slowest tracks
  Solution: interleave fast and slow sections
  Worst-case speed now set by average track
[Figure: new data, reservation, old data is deleted]
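
A rough way to see why speed balancing helps is to compare the worst sustained rate with and without interleaving. The sketch below is a toy model under stated assumptions: the zone throughputs are made-up numbers, and it assumes write buffering smooths the rate over each pair of consecutive sections. It is not the StreamFS layout algorithm.

```python
from typing import List


def interleave_fast_slow(zone_rates: List[float]) -> List[float]:
    """Order zones as fastest, slowest, 2nd fastest, 2nd slowest, ..."""
    ordered = sorted(zone_rates, reverse=True)
    result = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        result.append(ordered[lo])
        if lo != hi:
            result.append(ordered[hi])
        lo += 1
        hi -= 1
    return result


def worst_pair_average(order: List[float]) -> float:
    """Lowest average rate over consecutive pairs of sections."""
    return min((a + b) / 2 for a, b in zip(order[::2], order[1::2]))


# Hypothetical throughput (MB/s) from outer tracks to inner tracks.
zones = [60, 55, 50, 45, 40, 35]
print("sequential :", worst_pair_average(zones))
print("interleaved:", worst_pair_average(interleave_fast_slow(zones)))
```

In this model the sequential layout bottoms out near the slowest tracks, while the interleaved layout stays at the disk-wide average.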

Local Index
Requirements:
  High insertion speed
  Interactive query response
Index and search mechanisms
                   Search speed  Insert speed
Exhaustive search  No            Yes
B-tree             Yes           No
Hash index         Yes           No
Signature index    Yes           Yes

Signature Index
Compress data into signature
Store signature separately
Search signature, not data
Retrieve data itself on match
Signature algorithm: Bloom filter
  No false negatives: never misses a result
  False positives: extra read overhead
[Figure: records, keys, signature]
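
A Bloom filter of the kind named above can be sketched in a few lines. This is a generic textbook version, not Hyperion's code: the class name, the SHA-256-based hash family, and the sizes used in the example are assumptions chosen for clarity rather than speed.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size bit signature summarizing a set of keys."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must be below 256 in this sketch
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True may be a false positive."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


sig = BloomFilter(num_bits=1024, num_hashes=3)
for addr in (b"10.0.0.1", b"10.0.0.2"):
    sig.add(addr)
print(sig.might_contain(b"10.0.0.1"))
```

A `False` answer lets the search skip the corresponding data block entirely; a `True` answer requires reading the block to confirm the match.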

Signature index efficiency
Overhead = bytes searched
  Index size
  False positives (data scan)
Concise index:
  Index scan cost: low
  False positive scans: high
Verbose index:
  Index scan cost: high
  False positive scans: low
[Figure: bytes searched vs. index size]

Multi-level signature index
Concise index: low scan overhead
Verbose index: low false positive overhead
Use both
  Scan concise index
  Check positives in verbose index
[Figure: concise index, verbose index, data records]
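
The two-level lookup can be illustrated with a deliberately simplified signature: a set of bit positions from a single CRC32 hash. Everything here is an assumption for the example (the signature sizes, the helper names, the in-memory blocks); the point is only the order of operations: concise scan first, verbose check second, data read last.

```python
import zlib
from typing import List, Sequence, Set, Tuple

CONCISE_BITS = 16    # assumed: small signature, cheap to scan
VERBOSE_BITS = 4096  # assumed: large signature, few false positives


def signature(keys: Sequence[bytes], num_bits: int) -> Set[int]:
    """One-hash Bloom-style signature, kept as a set of bit positions."""
    return {zlib.crc32(key) % num_bits for key in keys}


def build_index(blocks: List[List[bytes]]) -> List[Tuple[Set[int], Set[int]]]:
    return [(signature(b, CONCISE_BITS), signature(b, VERBOSE_BITS))
            for b in blocks]


def search(blocks, index, key: bytes) -> Tuple[List[int], int]:
    """Return (blocks that contain key, number of data blocks read)."""
    h = zlib.crc32(key)
    hits, blocks_read = [], 0
    for i, (concise, verbose) in enumerate(index):
        if h % CONCISE_BITS not in concise:
            continue  # rejected by the cheap scan
        if h % VERBOSE_BITS not in verbose:
            continue  # concise false positive, caught without a data read
        blocks_read += 1  # only now touch the data records
        if key in blocks[i]:
            hits.append(i)
    return hits, blocks_read


blocks = [[b"10.0.0.1", b"10.0.0.2"], [b"10.0.0.3"], [b"192.168.1.1"]]
index = build_index(blocks)
print(search(blocks, index, b"10.0.0.3"))
```

Neither level can produce a false negative, so a block holding the key is always read; the verbose level only removes unnecessary reads.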

Distributed Index
Query routing:
  Send queries only to nodes holding matches
  Use signature index
Index distribution:
  Aggregate indexes at cluster head
  Route queries through cluster head
  Rotate cluster head for load sharing
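
Query routing through a cluster head might look like the sketch below. The talk does not specify the protocol, so the `ClusterHead` class, the one-hash signature, the node names, and the round-robin rotation rule are all hypothetical choices made to keep the example small.

```python
import zlib
from typing import Dict, Iterable, List, Set

SIGNATURE_BITS = 1024  # assumed signature size


def node_signature(keys: Iterable[bytes]) -> Set[int]:
    """Summary of the keys a node has stored, as a set of bit positions."""
    return {zlib.crc32(k) % SIGNATURE_BITS for k in keys}


class ClusterHead:
    """Holds one aggregated signature per node and routes queries."""

    def __init__(self) -> None:
        self.signatures: Dict[str, Set[int]] = {}

    def update(self, node: str, sig: Set[int]) -> None:
        self.signatures[node] = sig

    def route(self, key: bytes) -> List[str]:
        """Nodes that may hold a match; all other nodes are skipped."""
        pos = zlib.crc32(key) % SIGNATURE_BITS
        return sorted(n for n, sig in self.signatures.items() if pos in sig)


def head_for_epoch(nodes: List[str], epoch: int) -> str:
    """Round-robin rotation of the cluster-head role (an assumed policy)."""
    return nodes[epoch % len(nodes)]


head = ClusterHead()
head.update("node-a", node_signature([b"10.0.0.1"]))
head.update("node-b", node_signature([b"10.0.0.2"]))
print(head.route(b"10.0.0.1"))
print(head_for_epoch(["node-a", "node-b", "node-c"], epoch=4))
```

As with the local index, a signature match can be a false positive, so a routed node may still return no results.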

Implementation
Components:
  StreamFS
  Index
  Capture
  RPC, query & index distribution
  Query API
Linux OS
Python framework
[Figure: Hyperion components: Linux kernel, capture, StreamFS, index, RPC/query/index dist., query API]

Outline of talk
  Introduction and Motivation
  Design
  Implementation
  Results
  Conclusions

Experimental Setup
Hardware:
  Linux cluster
  Dual 2.4GHz Xeon CPUs
  1 GB memory
  4 x 10K RPM SCSI disks
  Syskonnect SK98xx + U. Cambridge driver
Test data
  Packet traces from UMass Internet gateway*
  400 Mbit/s, 100k pkt/s
*http://traces.cs.umass.edu

StreamFS – write performance
Tested configurations:
  NetBSD / LFS
  Linux / XFS (SGI)
  StreamFS
Workload: multiple streams, rates
Logfile rotation: used for LFS, XFS
Results:
  50% boost in worst-case throughput
  Fast enough to store 1,000,000 packet hdrs/s

StreamFS – read/write
Workload:
  Continuous writes
  Random reads
StreamFS: sustained write throughput
XFS: throughput collapse
StreamFS can handle stream read+write traffic without data loss. XFS cannot.

Index Performance
Calculation benchmark: 250,000 pkts/sec
Query:
  380M packet headers
  26GB data
  Selective query (1 pkt returned)
Query results:
  13MB data fetched to query 26GB data (1:2000)
[Figure: data fetched (MB) vs. index size]
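
The 1:2000 ratio follows directly from the figures on the slide. The short check below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the per-record size it prints is derived from the slide's totals, not a number stated in the talk.

```python
records = 380e6        # packet headers stored
data_bytes = 26e9      # total data searched by the query
fetched_bytes = 13e6   # bytes actually read to answer it

reduction = data_bytes / fetched_bytes
bytes_per_record = data_bytes / records

print(f"read 1 byte for every {reduction:,.0f} bytes stored")
print(f"about {bytes_per_record:.1f} bytes per stored header record")
```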

System Performance
Workload:
  Trace replay
  Simultaneous queries
Speed: 100-200K pkts/s
Packet loss measured: #transmitted - #received
Results:
Packets/s  Loss rate
110,000    0
130,000    0
150,000    2·10^-6
160,000    4·10^-6
175,000    10·10^-6
200,000    .001
Up to 175K pkts/s with negligible packet loss

Conclusions
Hyperion: packet monitoring with retrospective queries
Key components:
  Storage: 50% improvement over GP file systems
  Index: insert at 250K pkts/sec; interactive query over 100s of millions of pkts
  System: capture, index, and query at 175K pkts/sec
Questions
Questions?