1
An Information Theoretic Approach to Network Trace Compression
Y. Liu, D. Towsley, J. Weng and D. Goeckel
2
Outline
• network monitoring/measurement
• information theory & compression
• single point trace compression
• joint network trace compression
• future work
3
Motivation
• service providers, service users
• monitoring
• anomaly detection
• debugging
• traffic engineering
• pricing, peering, service level agreements
• architecture design
• application design
4
Network of Network Sensors
• network monitoring: sensing a network
  • embedded vs. exogenous
  • single point vs. distributed
• different granularities
  • full traffic trace: packet headers
  • flow level record: timing, volume
  • summary statistics: byte/packet counts
• challenges
  • growing scales: high speed link, large topology
  • constrained resources: processing, storage, transmission
  • 30G headers/hour at UMass gateway
• solutions
  • sampling: temporal/spatial
  • compression: marginal/distributed
5
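Entropy & Compression (I)
• Shannon entropy of discrete r.v. $X$ with distribution $p(x)$: $H(X) = -\sum_x p(x) \log_2 p(x)$
• compression of i.i.d. sequence by source coding
  • coding: each outcome $x$ is mapped to a binary codeword of length $l(x)$
  • expected code length: $\sum_x p(x)\, l(x)$
  • info. theoretic bound: $\sum_x p(x)\, l(x) \ge H(X)$
• Shannon/Huffman coding
  • assign short codeword to frequent outcome
  • achieve the $H(X)$ bound to within 1 bit per symbol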
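The bound on this slide can be checked numerically. The sketch below is an illustration only, not the authors' code: it builds a binary Huffman code for a small hypothetical distribution (`probs` is made up for the example) and compares the expected code length with the Shannon entropy.

```python
import heapq
import math


def shannon_entropy(probs):
    """Entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code."""
    if len(probs) == 1:
        return {s: 1 for s in probs}
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical distribution: frequent outcomes should get short codewords.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
expected_length = sum(probs[s] * lengths[s] for s in probs)
entropy = shannon_entropy(probs)
print(lengths, expected_length, entropy)
```

Because every probability here is a power of 1/2, the expected code length meets the entropy exactly; for other distributions Huffman coding stays within one bit of it.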
6
Entropy & Compression (II)
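• joint entropy: $H(X_1, \dots, X_n) = -\sum_{x_1, \dots, x_n} p(x_1, \dots, x_n) \log_2 p(x_1, \dots, x_n)$
• entropy rate of stochastic process $X = \{X_n\}$: $H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n)$
  • exploit auto-correlation
  • lower bound on # bits per sample of X
  • compression ratio: H(X)/M, where M is the original size per sample
• Lempel-Ziv Coding
  • asymptotically achieve entropy rate of stationary ergodic process
  • universal data compression algorithms: LZ77, gzip, WinZip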
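A rough illustration of the entropy-rate bound, under assumptions that are mine and not from the slides: the source is a symmetric two-state Markov chain (hypothetical `flip_prob`), each sample is stored as one byte (so M = 8 bits), and Python's `zlib` (DEFLATE, an LZ77-based coder) stands in for a Lempel-Ziv compressor.

```python
import math
import random
import zlib


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def sticky_bits(n, flip_prob, seed=0):
    """Two-state Markov chain: the state flips with probability flip_prob."""
    rng = random.Random(seed)
    state, out = 0, bytearray()
    for _ in range(n):
        if rng.random() < flip_prob:
            state ^= 1
        out.append(ord("0") + state)  # one ASCII byte per sample
    return bytes(out)


n, flip_prob = 100_000, 0.05
trace = sticky_bits(n, flip_prob)

M = 8  # original size per sample, in bits
marginal_entropy = 1.0  # stationary distribution of the symmetric chain is uniform
entropy_rate = binary_entropy(flip_prob)  # H(X_n | X_{n-1}) for this chain

lz_bits_per_sample = 8 * len(zlib.compress(trace, 9)) / n
bound_ratio = entropy_rate / M          # H(X)/M from the slide
achieved_ratio = lz_bits_per_sample / M  # what zlib reaches on this trace
print(entropy_rate, lz_bits_per_sample, bound_ratio, achieved_ratio)
```

The gap between `marginal_entropy` and `entropy_rate` is the gain available from auto-correlation; a finite-length LZ coder will typically land somewhere above the entropy rate.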
7
Entropy & Compression (III)
• joint entropy rate of set of stochastic processes
• joint data compression
  • exploit cross-correlation between sources
  • joint compression ratio
• Slepian-Wolf Coding
  • distributed compression: encode each process individually, achieve joint entropy rate in limit
  • require knowledge of cross-correlation structure
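To make the cross-correlation gain concrete, the sketch below computes marginal, joint, and conditional entropies for a hypothetical pair of correlated binary sources (the `joint` table is invented for the example). It evaluates the Slepian-Wolf rate bounds only; it does not implement a Slepian-Wolf coder.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axis):
    """Marginalise a joint pmf {(x, y): p} onto one coordinate."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[axis]] += p
    return dict(out)


# Hypothetical sources: X is a fair bit, Y equals X with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_xy = entropy(joint)

# Slepian-Wolf region: R_X >= H(X|Y), R_Y >= H(Y|X), R_X + R_Y >= H(X,Y).
h_x_given_y = h_xy - h_y
h_y_given_x = h_xy - h_x

# Bits needed jointly, relative to compressing each source on its own.
joint_ratio = h_xy / (h_x + h_y)
print(h_x, h_y, h_xy, h_x_given_y, h_y_given_x, joint_ratio)
```

Here separate compression needs about 2 bits per pair while joint compression needs about 1.47, so the ratio is roughly 0.73.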
8
Network Trace Compression
• naïve way: treat as byte stream, compress by generic tools
  • gzip compresses UMass traces by a factor of 2
• network traces are highly structured data
  • multiple fields per packet
    • diversity in information richness
    • correlation among fields
  • multiple packets per flow
    • packets within a flow share information
    • temporal correlation
  • multiple monitors traversed by a flow
    • most fields unchanged within the network
    • spatial correlation
• network trace models
  • quantify information content of network traces
  • serve as lower bounds/guidelines for compression algorithms
9
Packet Header Trace
Timing
• time stamp (sec.)
• time stamp (sub-sec.)
IP Header (32-bit rows, bits 0 to 31)
• vers., HLen, ToS, total length
• IPID, flags, fragment offset
• TTL, protocol, header checksum
• source IP address
• destination IP address
TCP Header (32-bit rows, bits 0 to 31)
• source port, destination port
• data sequence number
• acknowledgment number
• Hlen, TCP flags, window size
• checksum, urgent pointer
10
Header Field Entropy
[Figure: packet header layout (Timing, IP Header, TCP Header; bits 0 to 31), same fields as on the Packet Header Trace slide]
Θ: flow id
T: Time
11
Single Point Compression
• packet trace: (T0, Θ0) (T1, Θ1) (T3, Θ0) (Tn, Θn) (Tm, Θ0)
• temporal correlation introduced by flows
  • packets from same flow closely spaced in time
  • they share header information
• packet inter-arrival:
• # bits per packet:
• flow based trace: (T0, Θ0) (T3, Θ0) (Tm, Θ0)
• flow record: Θ0 (flow ID), K (flow size), T0 (arrival time), packet inter-arrivals
• # bits per flow id: H(Θ)
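The regrouping from a packet trace to flow records can be sketched as follows. This is a minimal illustration, not the authors' implementation; `FlowRecord`, `to_flow_records`, and the sample packets are hypothetical names and data, and timestamps are plain integers for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    flow_id: str          # Θ: stored once per flow instead of once per packet
    size: int             # K: number of packets in the flow
    arrival_time: int     # T: timestamp of the flow's first packet
    inter_arrivals: list = field(default_factory=list)  # gaps between its packets


def to_flow_records(packets):
    """Group a time-ordered list of (timestamp, flow_id) pairs into flow records."""
    records = {}
    last_seen = {}
    for t, theta in packets:
        if theta not in records:
            records[theta] = FlowRecord(theta, 1, t)
        else:
            rec = records[theta]
            rec.size += 1
            rec.inter_arrivals.append(t - last_seen[theta])
        last_seen[theta] = t
    # Dicts preserve insertion order, so flows come out ordered by arrival time.
    return list(records.values())


packets = [(0, "A"), (1, "B"), (3, "A"), (4, "B"), (9, "A")]
records = to_flow_records(packets)
for rec in records:
    print(rec)
```

Five (timestamp, flow id) pairs become two records, and the flow id is written twice instead of five times; the packet-level timing is kept as small inter-arrival gaps.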
12
Flow Level Model
• Poisson flow arrival
  • rate:
  • flow inter-arrival:
• independent packet inter-arrival:
• K – flow length
• # bits per flow:
• # bits per second: H(Φ)
• marginal compression ratio
13
Empirical Results: single point
• 1 hour UMass gateway traces
  • Sept. 22, 2004 to Oct. 23, 2004
  • 1am, 10am, 1pm
14
Distributed Network Monitoring
• single flow recorded by multiple monitors
• spatial correlation: traces collected at distributed monitors are correlated
• marginal node view: # bits/sec to represent flows seen by one node, bound on single point compression
• network system view: # bits/sec to represent flows across the network, bound on joint compression
• joint compression ratio: quantify gain of joint compression
15
Baseline Joint Entropy Model
• “perfect” network
  • fixed routes/constant link delay/no packet loss
• flow classes based on routes
  • flows arrive with rate:
  • # of monitors traversed:
  • # bits per flow record:
• info. rate at node v:
• network view info. rate:
• joint compression ratio:
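One way to read this baseline model is sketched below. It is my interpretation of the slide, not the authors' formulas: in a "perfect" network each monitor on a route sees the same flow record, so the network view needs each flow once while the per-node views count it once per monitor traversed. The function name, the flow classes, and the assumption that every class uses the same `bits_per_flow` are all hypothetical.

```python
def joint_compression_ratio(flow_classes, bits_per_flow):
    """flow_classes: list of (arrival_rate, monitors_on_route) pairs.

    Returns (ratio, node_rates, network_rate), all rates in bits per second.
    """
    # Network view: every flow is described once, whatever its route length.
    network_rate = bits_per_flow * sum(rate for rate, _ in flow_classes)

    # Marginal node view: each monitor describes every flow that crosses it.
    node_rates = {}
    for rate, monitors in flow_classes:
        for v in monitors:
            node_rates[v] = node_rates.get(v, 0.0) + bits_per_flow * rate

    ratio = network_rate / sum(node_rates.values())
    return ratio, node_rates, network_rate


# Hypothetical topology: three route-based flow classes over monitors a..d.
flow_classes = [
    (10.0, ["a", "b"]),
    (5.0, ["b", "c", "d"]),
    (20.0, ["a"]),
]
ratio, node_rates, network_rate = joint_compression_ratio(flow_classes, bits_per_flow=100.0)
print(ratio, node_rates, network_rate)
```

Under these assumptions the ratio reduces to the total flow arrival rate divided by the rate-weighted number of monitors traversed, so longer routes leave more room for joint compression.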
16
Joint Trace Compression
Results from synthetic networks
17
Open Issues
• how many more bits for network characteristics
  • variable delay/loss/route change
• distributed compression algorithms
• lossless vs. lossy
• joint routing and compression in trace aggregation
18
Future work
• develop compression algorithms
  • single point compression
  • distributed joint compression
• different levels of details
  • full packet traces
  • NetFlow data
  • SNMP data
• entropy based applications
  • network monitor placement
  • network anomaly detection
19
Questions & Comments
???