An Information Theoretic Approach to Network Trace Compression

Y. Liu, D. Towsley, J. Weng and D. Goeckel
Date posted: 22-Dec-2015



Outline

• network monitoring/measurement
• information theory & compression
• single point trace compression
• joint network trace compression
• future work


Motivation

• service providers, service users
• monitoring, anomaly detection, debugging
• traffic engineering
• pricing, peering, service level agreements
• architecture design, application design


Network of Network Sensors

• network monitoring: sensing a network
  – embedded vs. exogenous
  – single point vs. distributed
• different granularities
  – full traffic trace: packet headers
  – flow level record: timing, volume
  – summary statistics: byte/packet counts
• challenges
  – growing scales: high speed link, large topology
  – constrained resources: processing, storage, transmission
  – 30G headers/hour at UMass gateway
• solutions
  – sampling: temporal/spatial
  – compression: marginal/distributed


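Entropy & Compression (I)

• Shannon entropy of a discrete r.v. X with pmf p(x):
  $H(X) = -\sum_{x} p(x) \log_2 p(x)$
• compression of an i.i.d. sequence by source coding
  – coding: $C: \mathcal{X} \to \{0,1\}^*$, codeword length $l(x)$
  – expected code length: $L(C) = \sum_{x} p(x)\, l(x)$
  – info. theoretic bound: $L(C) \ge H(X)$ for any uniquely decodable code
• Shannon/Huffman coding
  – assign short codewords to frequent outcomes
  – achieve the H(X) bound to within one bit, $H(X) \le L < H(X) + 1$, and approach H(X) per symbol when coding blocks of symbols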
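To make the bound concrete, the sketch below builds a binary Huffman code with Python's standard heapq module and compares its expected code length L with H(X). It is an illustrative implementation, not code from the talk; the function names and the example distributions are hypothetical.

```python
import heapq
import math
from itertools import count


def entropy(probs):
    """Shannon entropy H(X) in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code(probs):
    """Binary Huffman code for a pmf: returns {symbol: codeword string}."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A lone symbol still needs a 1-bit codeword.
        return {sym: "0" for sym in probs}
    while len(heap) > 1:
        # Merge the two least likely subtrees; prefix their codewords with 0/1.
        p1, _, code1 = heapq.heappop(heap)
        p2, _, code2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in code1.items()}
        merged.update({s: "1" + w for s, w in code2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]


def expected_length(probs, code):
    """Expected code length L = sum_x p(x) * l(x), in bits per symbol."""
    return sum(p * len(code[s]) for s, p in probs.items())


if __name__ == "__main__":
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    code = huffman_code(probs)
    print(code, entropy(probs), expected_length(probs, code))
```

For the dyadic pmf {1/2, 1/4, 1/8, 1/8} the Huffman code meets H(X) = 1.75 bits exactly; for a non-dyadic pmf such as {0.6, 0.3, 0.1} its expected length (1.4 bits) lands strictly between H(X) and H(X) + 1.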


Entropy & Compression (II)

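• joint entropy: $H(X_1, \ldots, X_n) = -\sum p(x_1, \ldots, x_n) \log_2 p(x_1, \ldots, x_n)$
• entropy rate of a stochastic process $\mathcal{X} = \{X_i\}$: $H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n)$
  – exploits auto-correlation
  – lower bound on # bits per sample of X
  – compression ratio: $H(\mathcal{X})/M$, M – original size per sample (bits)
• Lempel-Ziv coding
  – asymptotically achieves the entropy rate of a stationary ergodic process
  – universal data compression algorithms: LZ77, gzip, winzip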
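The gap between marginal entropy and entropy rate shows up in a quick experiment. The sketch below (an illustration, not from the talk) generates a symmetric two-state Markov chain whose marginal is uniform, so H(X_i) = 1 bit, but whose entropy rate is h(p) bits per sample, and measures how many bits per sample zlib spends on it. zlib implements DEFLATE (LZ77 plus Huffman coding) with a finite 32 KB window, so it is only a practical stand-in for the asymptotic Lempel-Ziv result. Each sample is stored as one byte, so M = 8 bits; the helper names are hypothetical.

```python
import math
import random
import zlib


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def markov_samples(n, p_switch, seed=0):
    """n samples (one byte each, value 0 or 1) of a symmetric two-state chain.

    The state flips with probability p_switch at each step, so the stationary
    marginal is uniform (H(X_i) = 1 bit) while the entropy rate is h(p_switch).
    p_switch = 0.5 gives an i.i.d. uniform sequence.
    """
    rng = random.Random(seed)
    state = rng.randint(0, 1)
    out = bytearray()
    for _ in range(n):
        out.append(state)
        if rng.random() < p_switch:
            state ^= 1
    return bytes(out)


def zlib_bits_per_sample(data):
    """Bits per sample after zlib (DEFLATE: LZ77 + Huffman) at its default level."""
    return 8 * len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    n = 100_000
    for p in (0.5, 0.01):
        bits = zlib_bits_per_sample(markov_samples(n, p))
        # M = 8 bits per stored sample, so the achieved ratio is bits / 8.
        print(f"p={p}: h(p)={binary_entropy(p):.3f}, zlib={bits:.3f} bits/sample")
```

With p = 0.5 the chain is i.i.d. uniform and zlib cannot go below about 1 bit per sample; with p = 0.01 (entropy rate h(0.01) ≈ 0.08 bits) it needs far less, because long LZ77 matches capture the auto-correlation, though it stays above the entropy rate.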


Entropy & Compression (III)

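• joint entropy rate of a set of stochastic processes $\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(m)}$:
  $H(\mathcal{X}^{(1)}, \ldots, \mathcal{X}^{(m)}) = \lim_{n \to \infty} \frac{1}{n} H(X^{(1)}_1, \ldots, X^{(1)}_n, \ldots, X^{(m)}_1, \ldots, X^{(m)}_n)$
• joint data compression
  – exploit cross-correlation between sources
  – joint compression ratio
• Slepian-Wolf coding: distributed compression
  – encode each process individually (decoding is joint), achieving the joint entropy rate in the limit
  – requires knowledge of the cross-correlation structure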
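For two memoryless sources (i.i.d. sample pairs), the Slepian-Wolf region is R1 ≥ H(X1|X2), R2 ≥ H(X2|X1), R1 + R2 ≥ H(X1, X2). The sketch below computes these bounds from a joint pmf and compares the minimum sum rate with the H(X1) + H(X2) needed when each source is compressed on its own. The example pmf (one bit observed at two monitors, disagreeing 10% of the time) and the function names are hypothetical.

```python
import math


def entropy(pmf):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axis):
    """Marginal pmf of one coordinate of a joint pmf over pairs."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


def slepian_wolf_bounds(joint):
    """Constraints of the Slepian-Wolf rate region (bits per sample pair)."""
    h1 = entropy(marginal(joint, 0))
    h2 = entropy(marginal(joint, 1))
    h12 = entropy(joint)
    return {
        "R1_min": h12 - h2,       # H(X1 | X2)
        "R2_min": h12 - h1,       # H(X2 | X1)
        "sum_rate_min": h12,      # H(X1, X2): same total as joint encoding
        "separate_sum": h1 + h2,  # total if cross-correlation is ignored
    }


if __name__ == "__main__":
    # Hypothetical: the same bit seen at two monitors, disagreeing 10% of the time.
    joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    print(slepian_wolf_bounds(joint))
```

Here each source alone needs 1 bit per sample (2 bits in total), while separate encoders that exploit the correlation at the decoder need only 1 + h(0.1) ≈ 1.47 bits in total.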


Network Trace Compression

• naïve way: treat a trace as a byte stream and compress it with generic tools
  – gzip compresses UMass traces by a factor of 2
• network traces are highly structured data
  – multiple fields per packet: diversity in information richness; correlation among fields
  – multiple packets per flow: packets within a flow share information; temporal correlation
  – multiple monitors traversed by a flow: most fields unchanged within the network; spatial correlation
• network trace models
  – quantify the information content of network traces
  – serve as lower bounds/guidelines for compression algorithms


Packet Header Trace

Header layout (32-bit rows; bit positions 0, 16, 31):

Timing
  time stamp (sec.)
  time stamp (sub-sec.)
IP Header
  vers. | HLen | ToS | total length
  IPID | flags | fragment offset
  TTL | protocol | header checksum
  source IP address
  destination IP address
TCP Header
  source port | destination port
  data sequence number
  acknowledgment number
  HLen | TCP flags | window size
  checksum | urgent pointer
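To connect this layout to actual bytes, the sketch below unpacks an IPv4 header followed by a TCP header with Python's struct module, in the standard field order of RFC 791 and RFC 793. It is an illustration rather than the authors' trace format: the two time stamps belong to the capture record, not the packet, so they are not parsed here, and treating the (source IP, destination IP, source port, destination port, protocol) 5-tuple as the flow id is our assumption. The function and key names are hypothetical.

```python
import socket
import struct

# Network byte order. IPv4: vers/HLen, ToS, total length, IPID, flags/fragment
# offset, TTL, protocol, header checksum, source IP, destination IP (20 bytes).
IP_FMT = "!BBHHHBBH4s4s"
# TCP: source port, destination port, sequence number, acknowledgment number,
# HLen/reserved, TCP flags, window size, checksum, urgent pointer (20 bytes).
TCP_FMT = "!HHIIBBHHH"


def parse_ipv4_tcp(raw):
    """Split raw IPv4+TCP header bytes into the fields of the header layout."""
    (vh, tos, total_len, ipid, flags_frag, ttl, proto, ip_csum,
     src, dst) = struct.unpack(IP_FMT, raw[:20])
    ip_hlen = (vh & 0x0F) * 4  # HLen counts 32-bit words; skips any IP options
    (sport, dport, seq, ack, off, tcp_flags, window, tcp_csum,
     urgent) = struct.unpack(TCP_FMT, raw[ip_hlen:ip_hlen + 20])
    fields = {
        "version": vh >> 4, "ip_hlen": ip_hlen, "tos": tos,
        "total_length": total_len, "ipid": ipid,
        "flags": flags_frag >> 13, "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl, "protocol": proto, "ip_checksum": ip_csum,
        "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
        "sport": sport, "dport": dport, "seq": seq, "ack": ack,
        "tcp_hlen": (off >> 4) * 4,  # TCP data offset, also in 32-bit words
        "tcp_flags": tcp_flags,      # full flags byte (CWR ... FIN)
        "window": window, "tcp_checksum": tcp_csum, "urgent": urgent,
    }
    # Hypothetical flow id: the usual 5-tuple.
    fields["flow_id"] = (fields["src"], fields["dst"], sport, dport, proto)
    return fields
```

A real trace parser would also read the per-packet capture time stamps and handle non-TCP protocols; both are left out to keep the mapping to the layout visible.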


Header Field Entropy

[Figure: the packet header layout (Timing, IP Header, TCP Header) from the Packet Header Trace slide, with fields labeled as follows]

Θ: flow id

T: Time
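Per-field information content can be estimated from a trace with plug-in (empirical) entropies. The sketch below is a minimal illustration, assuming header records are dicts keyed by hypothetical field names; it also computes the entropy of the flow id Θ taken as the 5-tuple (again an assumption), which by subadditivity is at most the sum of its fields' entropies and is strictly smaller when fields are correlated. Plug-in estimates are biased low for short traces.

```python
import math
from collections import Counter

# Hypothetical field names; the flow id Θ is assumed to be this 5-tuple.
FLOW_ID_FIELDS = ("src", "dst", "sport", "dport", "proto")


def empirical_entropy(values):
    """Plug-in entropy estimate (bits) from a list of observed values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def field_entropies(records, fields):
    """Empirical entropy of each header field, treated separately."""
    return {f: empirical_entropy([r[f] for r in records]) for f in fields}


def flow_id_entropy(records, fields=FLOW_ID_FIELDS):
    """Empirical entropy of the joint flow id, i.e. H(Θ) for the 5-tuple."""
    return empirical_entropy([tuple(r[f] for f in fields) for r in records])
```

In a four-packet trace with two flows that differ only in source address and source port, src and sport each carry 1 bit, yet Θ carries only 1 bit in total because the two fields always change together.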


Single Point Compression

• packet trace: (T0, Θ0), (T1, Θ1), (T3, Θ0), …, (Tn, Θn), (Tm, Θ0)
• temporal correlation introduced by flows
  – packets from the same flow are closely spaced in time
  – they share header information
• packet inter-arrival; # bits per packet
• flow based trace: (T0, Θ0), (T3, Θ0), …, (Tm, Θ0)
• flow record: Θ0 | K | T0 | packet inter-arrivals
  – flow ID Θ0, flow size K, arrival time T0, then the packet inter-arrival times
  – # bits per flow id: H(Θ)
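The conversion from a packet trace to flow records can be sketched as follows. The input is a list of (timestamp, flow id) pairs like the trace on this slide; each output record is (Θ, K, T0, inter-arrivals). This is a hypothetical helper, and it assumes a flow id is never reused, so there is none of the idle-timeout splitting that real flow meters apply.

```python
def to_flow_records(packets):
    """Group (timestamp, flow_id) packets into flow records.

    Each record is (flow_id, K, T0, inter_arrivals): K packets, first-packet
    time T0, and the K - 1 gaps between consecutive packets of the flow.
    Assumes a flow id is never reused (no idle-timeout splitting).
    """
    times_by_flow = {}  # dicts keep insertion order, so flows stay sorted by T0
    for t, fid in sorted(packets, key=lambda pkt: pkt[0]):
        times_by_flow.setdefault(fid, []).append(t)
    records = []
    for fid, times in times_by_flow.items():
        gaps = [b - a for a, b in zip(times, times[1:])]
        records.append((fid, len(times), times[0], gaps))
    return records


if __name__ == "__main__":
    trace = [(0.0, "Θ0"), (0.5, "Θ1"), (1.0, "Θ0"), (3.0, "Θn"), (4.0, "Θ0")]
    for rec in to_flow_records(trace):
        print(rec)
```

In the example Θ0 is written once with K = 3 rather than once per packet, which is where the per-packet flow-id cost H(Θ) is saved.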


Flow Level Model

• Poisson flow arrivals
  – arrival rate
  – flow inter-arrival times
• independent packet inter-arrivals
• K – flow length
• # bits per flow
• # bits per second: H(Φ)
• marginal compression ratio
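A back-of-the-envelope version of this accounting is sketched below. It is not the slide's exact formula (those expressions did not survive); it assumes, hypothetically, that the flow id, flow length K, flow inter-arrival time and packet inter-arrival times are mutually independent, that times are quantized to a resolution Δ, and that each raw packet record costs M bits. For Poisson flow arrivals with rate λ the flow inter-arrival time is Exp(λ), whose differential entropy is log2(e/λ) bits, so its quantized entropy is about log2(e/(λΔ)) when λΔ ≪ 1. Then bits per flow ≈ H(Θ) + H(K) + log2(e/(λΔ)) + (E[K] − 1)·H(packet gap), H(Φ) = λ × bits per flow, and the marginal ratio is H(Φ)/(λ·E[K]·M).

```python
import math


def quantized_exp_bits(rate, resolution):
    """Approximate entropy (bits) of an Exp(rate) interval quantized to `resolution`.

    Differential entropy of Exp(rate) is log2(e / rate); quantizing to width
    `resolution` adds -log2(resolution). Valid when rate * resolution << 1.
    """
    return math.log2(math.e / (rate * resolution))


def flow_model(lam, h_theta, h_k, mean_k, h_pkt_gap, resolution, m_bits):
    """Return (bits per flow, H(Phi) in bits/sec, marginal compression ratio).

    Hypothetical independence assumptions: flow id, K, flow arrival time and
    packet gaps are independent, so their entropies simply add.
    """
    bits_per_flow = (h_theta + h_k
                     + quantized_exp_bits(lam, resolution)
                     + (mean_k - 1) * h_pkt_gap)
    h_phi = lam * bits_per_flow          # information rate, bits per second
    raw_rate = lam * mean_k * m_bits     # uncompressed trace, bits per second
    return bits_per_flow, h_phi, h_phi / raw_rate


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the UMass traces.
    per_flow, h_phi, ratio = flow_model(
        lam=100.0, h_theta=20.0, h_k=4.0, mean_k=10.0,
        h_pkt_gap=8.0, resolution=1e-6, m_bits=384)
    print(f"{per_flow:.1f} bits/flow, {h_phi:.0f} bits/s, ratio {ratio:.3f}")
```

All parameter values above are made up for illustration; in practice H(Θ), H(K) and the packet-gap entropy would be estimated from a trace.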


Empirical Results: single point

• 1 hour UMass gateway traces
• Sept. 22, 2004 to Oct. 23, 2004; 1am, 10am, 1pm


Distributed Network Monitoring

• single flow recorded by multiple monitors
• spatial correlation: traces collected at distributed monitors are correlated
• marginal node view: # bits/sec to represent the flows seen by one node; bound on single point compression
• network system view: # bits/sec to represent the flows crossing the network; bound on joint compression
• joint compression ratio: quantifies the gain of joint compression


Baseline Joint Entropy Model

• “perfect” network: fixed routes, constant link delay, no packet loss
• flow classes based on routes
  – flows arrive with rate
  – # of monitors traversed
  – # bits per flow record
• info. rate at node v
• network view info. rate
• joint compression ratio
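One way to turn this model into numbers is sketched below, under our own reading of the slide (the original expressions were lost): class r has flow rate λ_r, route R_r and h_r bits per flow record. Every monitor on R_r logs each of its flows, so node v sees H_v = Σ over classes r with v in R_r of λ_r·h_r. In a perfect network a flow's record at one monitor determines its records at the others, so the network view needs only Σ_r λ_r·h_r. The joint compression ratio is taken here, as an assumption, to be the network-view rate divided by Σ_v H_v. The function name and example classes are hypothetical.

```python
from collections import defaultdict


def baseline_joint_model(classes):
    """Per-node rates, network-view rate and joint compression ratio.

    `classes` is a list of (rate, bits_per_record, route) tuples, where route
    lists the monitors traversed. Assumes a "perfect" network: fixed routes,
    constant delays, no loss, so one record per flow suffices network-wide.
    """
    node_rate = defaultdict(float)
    network_rate = 0.0
    for lam, h, route in classes:
        for v in set(route):
            node_rate[v] += lam * h   # each monitor on the route logs the flow
        network_rate += lam * h       # the other copies are redundant
    ratio = network_rate / sum(node_rate.values())
    return dict(node_rate), network_rate, ratio
```

With one class crossing monitors a, b and c at 10 flows/s and one class seen only at a at 20 flows/s (50 bits per record each, hypothetical), the nodes log 2500 bits/s in total but the network view needs 1500 bits/s, a ratio of 0.6; the ratio shrinks as flows traverse more monitors.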


Joint Trace Compression

Results from synthetic networks


Open Issues

• how many more bits for network characteristics: variable delay/loss/route change
• distributed compression algorithms: lossless vs. lossy
• joint routing and compression in trace aggregation


Future work

• develop compression algorithms: single point compression, distributed joint compression
• different levels of detail: full packet traces, NetFlow data, SNMP data
• entropy based applications: network monitor placement, network anomaly detection


Questions & Comments

???