Internet Traffic Classification KISS
Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

Page 1:

Internet Traffic Classification
KISS

Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi

Page 2:

Traffic Classification & Measurement

Why?
• Identify normal and anomalous behavior
• Characterize the network and its users
• Quality of service
• Filtering
• …

How?
• By means of passive measurement
• Using Tstat

Page 3:

Tstat

• Traffic classifier
  • Deep packet inspection
  • Statistical methods
• Persistent and scalable monitoring platform
  • Round Robin Database (RRD)
  • Histograms

[Figure: Internal Clients, Edge Router, External Servers]

http://tstat.tlc.polito.it

Page 4:

Tstat at a Glance

Page 5:

Worms and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!

Page 6:

Anomalies (Good!)

Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008

Page 7:

New Applications – P2P-TV

[Figure annotations: Fiorentina 4 - Udinese 2; Inter 1 - Juventus 0]

Page 8:

Traffic classification

Look at the packets…

Tell me what protocol and/or application generated them

Page 9:

Typical approach: Deep Packet Inspection (DPI)

Application   Port        Payload
Bittorrent    (blank)     “bittorrent”
eMule         4662/4672   E4/E5
Skype         (blank)     (blank)
Gtalk         (blank)     RTP protocol

It fails more and more:
• P2P
• Encryption
• Proprietary solution
• Many different flavours

Page 10:

The Failure of DPI

11.05.2008 12:29 eMule 0.49a released

1.08.2008 20:25 eMule 0.49b released

Page 11:

Possible Solution: Behavioral Classifier

[Diagram: Traffic (Known) → Phase 1: Feature → Phase 2: Decision → Phase 3: Verify; stages labelled (Training) and (Operation)]

1. Statistical characterization of traffic (given source)
2. Look for the behaviour of unknown traffic and assign the class that best fits it
3. Check for possible classification mistakes

Page 12:

Our Approach

[Diagram: Traffic (Known), Phase 1: Feature, Phase 2: Decision, Phase 3: Verify]

Statistical characterization of bits in a flow: χ² Test

Do NOT look at the SEMANTIC and TIMING… but rather look at the protocol FORMAT

Page 13:

Chunking and χ²

• First N payload bytes
• C chunks, each of b bits
• Vector of Statistics: [χ²_1, … , χ²_C]

The χ² provides an implicit measure of entropy or randomness. It compares the Observed distribution O_i with the Expected distribution E_i (uniform):

$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
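The χ² of a single chunk position can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi_square` and the sample lists are hypothetical, and `n` denotes the number of observed chunk values rather than the N payload bytes of the slide.

```python
from collections import Counter


def chi_square(values, b):
    """Pearson chi-square of observed b-bit chunk values against a uniform distribution."""
    n = len(values)
    expected = n / 2 ** b          # E_i: every one of the 2^b values is equally likely
    observed = Counter(values)     # O_i: how often each value was actually seen
    return sum((observed.get(v, 0) - expected) ** 2 / expected
               for v in range(2 ** b))


# Two hypothetical 2-bit chunks, 100 observations each.
uniform_chunk = [0, 1, 2, 3] * 25   # every value seen equally often
constant_chunk = [2] * 100          # always the same value

uniform_score = chi_square(uniform_chunk, b=2)
constant_score = chi_square(constant_chunk, b=2)
```

A perfectly even chunk scores 0, while a constant chunk scores far higher, which is what makes the statistic usable as a randomness indicator.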

Page 14:

χ² and different behaviour

Consider a chunk of 2 bits:

[Figure: three histograms of the observed counts O_i over the chunk values 0, 1, 2, 3, for Random Values, a Deterministic Value, and a Counter]

Page 15:

4-bit-long chunks: χ² evolution

• random: x x x x

Page 16:

4-bit-long chunks: χ² evolution

• random
• deterministic: 0 0 0 1, with $\chi^2 = N(2^b - 1)$
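The value N(2^b − 1) for a deterministic chunk follows directly from the Pearson statistic. The short derivation below is a sketch that assumes N is the number of observed chunk values, that the expected distribution is uniform (E_i = N/2^b), and that a deterministic chunk always takes the same value (one O_i equals N, all others are 0):

```latex
\chi^2
  = \frac{\left(N - \frac{N}{2^b}\right)^2}{\frac{N}{2^b}}
  + \left(2^b - 1\right)\frac{\left(0 - \frac{N}{2^b}\right)^2}{\frac{N}{2^b}}
  = \frac{N\left(2^b - 1\right)^2}{2^b} + \frac{N\left(2^b - 1\right)}{2^b}
  = N\left(2^b - 1\right)
```

For 4-bit chunks this gives χ² = 15N, so the deterministic curve grows linearly with the number of samples.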

Page 17:

4-bit-long chunks: χ² evolution

• random
• deterministic
• mixed: x 0 0 0, x 0 x 0, 0 x x x

Page 18:

Chi Square Classifier

• Split the payload into groups
• Apply the χ² test on the groups at the flow end: each message is a sample
• Some groups will contain
  • Random bits
  • Mixed bits
  • Deterministic bits

0        8        16       24
---------------------
|   ID   |  FUNC  |
---------------------
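The steps on this slide can be sketched as follows: every message of a flow contributes one sample per group, and the χ² of each group is computed once at the flow end. The function names, the 4-bit group size and the 2-byte header layout (a 1-byte ID followed by a 1-byte FUNC) are assumptions made for the example; they are not taken from the KISS implementation.

```python
def split_groups(payload, n_groups, b=4):
    """Return the first n_groups b-bit groups of a payload, most significant bits first."""
    total_bits = len(payload) * 8
    as_int = int.from_bytes(payload, "big")
    mask = (1 << b) - 1
    return [(as_int >> (total_bits - (g + 1) * b)) & mask for g in range(n_groups)]


def flow_signature(packets, n_groups, b=4):
    """Chi-square of each group, computed once over all packets of a flow."""
    counts = [[0] * (2 ** b) for _ in range(n_groups)]
    n = 0
    for payload in packets:
        if len(payload) * 8 < n_groups * b:
            continue  # payload too short to cover every group: skip it
        for g, value in enumerate(split_groups(payload, n_groups, b)):
            counts[g][value] += 1
        n += 1
    if n == 0:
        raise ValueError("no packet long enough to be sampled")
    expected = n / 2 ** b  # uniform expectation per value
    return [sum((o - expected) ** 2 / expected for o in row) for row in counts]


# Hypothetical header: an ID byte that changes with every message, then a fixed FUNC byte.
packets = [bytes([i % 256, 0x01]) for i in range(256)]
signature = flow_signature(packets, n_groups=4)
```

In this toy flow the two ID groups score 0 (they look random) while the two FUNC groups score 256 × 15 = 3840 (they are deterministic), so the signature mirrors the header format without parsing it.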

Page 19:

CSC

[Figure: log-log plot against n [pkt] (x-axis from 100 to 1e+006, y-axis from 1 to 1e+006), with curves for the Deterministic group, the Random group and the Mixed group]

Page 20:

And the counter example?

2-byte-long counter:

MSG | L2 | L1 | LSG

MSG: Most Significant Group
LSG: Least Significant Group
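A small simulation suggests how a counter looks through the χ² lens. It is a sketch under assumptions not stated on the slide: the counter starts at 0, increases by one per packet, and is split into four 4-bit groups named as above.

```python
def chi_square(values, b=4):
    """Pearson chi-square of b-bit values against a uniform distribution."""
    expected = len(values) / 2 ** b
    counts = [0] * (2 ** b)
    for v in values:
        counts[v] += 1
    return sum((o - expected) ** 2 / expected for o in counts)


def counter_groups(n_packets, start=0):
    """Simulate a 2-byte counter incremented once per packet; return its four 4-bit groups."""
    groups = {"MSG": [], "L2": [], "L1": [], "LSG": []}
    for k in range(n_packets):
        c = (start + k) & 0xFFFF           # wrap around at 2 bytes
        groups["MSG"].append((c >> 12) & 0xF)
        groups["L2"].append((c >> 8) & 0xF)
        groups["L1"].append((c >> 4) & 0xF)
        groups["LSG"].append(c & 0xF)
    return groups


# After 256 packets only the low byte of the counter has moved.
scores = {name: chi_square(vals) for name, vals in counter_groups(256).items()}
```

With 256 samples the two low groups are perfectly uniform (score 0) and the two high groups are still constant (score 3840). Over a longer flow the higher groups start to change as well, so their score depends on how many packets have been observed.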

Page 21:

Protocol format as seen from the χ²

Page 22:

Our Approach

[Diagram: Traffic (Known), Phase 1: Feature, Phase 2: Decision, Phase 3: Verify]

• Statistical characterization of bits in a flow: χ² Test
• Decision process: Minimum distance / maximum likelihood

Page 23:

C-dimension space: [χ²_1, … , χ²_C]

Hyperspace

Classification Regions
• Euclidean Distance
• Support Vector Machine

[Figure: plane with axes χ²_i and χ²_j, showing two Class regions and “My Point”]

Page 24:

Example considering the χ²

Page 25:

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j]

• Centroid: center of mass

Page 26:

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j]

• Centroid: center of mass
• True Positives are “Nearby”
• True Negatives are “Far”

Page 27:

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j]

• Centroid: center of mass
• Hyper-sphere
• False Positives

Page 28:

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j]

• Centroid: center of mass
• Hyper-sphere of a given Radius
• False negatives

Page 29:

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j]

• Centroid: center of mass
• Hyper-sphere: min { False Pos. }, min { False Neg. }
• Confidence: the distance is a measure of the confidence of the decision
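The Euclidean distance classifier described on these slides can be sketched as a nearest-centroid rule with a per-class hyper-sphere. Everything concrete below is an assumption made for illustration: the function names, the two-dimensional toy points standing in for χ² vectors, the class names and the radius of 3.0.

```python
import math


def centroid(points):
    """Center of mass of a list of equally weighted points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def classify(point, centroids, radii):
    """Return (label, distance): the nearest class if the point lies inside its
    hyper-sphere, otherwise "other". The distance doubles as a confidence measure."""
    best_label, best_dist = None, math.inf
    for label, center in centroids.items():
        d = math.dist(point, center)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist <= radii[best_label]:
        return best_label, best_dist
    return "other", best_dist


# Toy training sets in a 2-dimensional space (real signatures have C dimensions).
training = {
    "rtp": [[0, 0], [2, 0], [0, 2], [2, 2]],
    "dns": [[10, 10], [12, 10], [10, 12], [12, 12]],
}
centroids = {label: centroid(pts) for label, pts in training.items()}
radii = {"rtp": 3.0, "dns": 3.0}

near = classify([1.5, 1.0], centroids, radii)
far = classify([5.0, 5.0], centroids, radii)
```

A larger radius lets in more false positives, a smaller one produces more false negatives, which is the trade-off the hyper-sphere has to balance.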

Page 30:

How to define the sphere radius?

[Figure: True Positive – False positive plotted against the Radius]
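One plausible reading of the plot is that the radius is chosen where the true positive rate minus the false positive rate peaks. The sketch below follows that reading; the function name, the candidate radii and the distance lists are hypothetical values chosen for the example.

```python
def best_radius(known_dists, other_dists, candidates):
    """Pick the candidate radius that maximizes TP rate minus FP rate.

    known_dists: distances from the centroid of flows that belong to the class.
    other_dists: distances from the centroid of flows that do not belong to it.
    """
    def score(radius):
        tp_rate = sum(d <= radius for d in known_dists) / len(known_dists)
        fp_rate = sum(d <= radius for d in other_dists) / len(other_dists)
        return tp_rate - fp_rate

    return max(candidates, key=score)


# Hypothetical distances measured on a labelled training trace.
known = [0.5, 1.0, 1.5, 2.0]
other = [2.5, 3.0, 4.0, 5.0]
radius = best_radius(known, other, candidates=[1.0, 1.5, 2.0, 3.0])
```

Here a radius of 2.0 captures every known flow and no other flow, so it wins over both the tighter and the looser candidates.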

Page 31:

Support Vector Machine

Space of samples (dim. C) → Kernel function → Space of feature (dim. ∞)

Kernel functions: move points so that borders are simple

Page 32:

Support Vector Machine

[Figure: Support vectors]

• Kernel functions: move points so that borders are simple
• Borders are planes
  • Simple surface!
  • Nice math
  • Support Vectors
  • LibSVM

Page 33:

Support Vector Machine

• Kernel functions
• Borders are planes
  • Simple surface!
  • Nice math
  • Support Vectors
  • LibSVM
• Decision: distance from the border
• Confidence is a probability: p ( class )
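For reference, the textbook decision function of a two-class kernel SVM is given below, with x the vector of χ² values of a flow, the sum running over the support vectors x_i with labels y_i and weights α_i, and β the offset. The slides do not name the kernel; the Gaussian (RBF) kernel shown here is an assumption, chosen because it is a common kernel whose feature space is infinite-dimensional.

```latex
f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{i \in SV} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + \beta \Big),
\qquad
K(\mathbf{x}, \mathbf{x}') = \exp\big( -\gamma \, \lVert \mathbf{x} - \mathbf{x}' \rVert^2 \big)
```

The border is a plane in the feature space, and the magnitude of the argument of the sign function grows with the distance from that border.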

Page 34:

Performance evaluation: How accurate is all this?

Our Approach

[Diagram: Traffic (Known), Phase 1: Feature, Phase 2: Decision, Phase 3: Verify]

• Statistical characterization of bits in a flow: χ² Test
• Decision process: Minimum distance / maximum likelihood

Page 35:

Per flow and per endpoint

What are we going to classify?
• It can be applied both to single flows and to endpoints
• It is robust to sampling: it does not require monitoring all packets, nor the first packets

Page 36:

Real traffic traces

[Diagram: Internet / Fastweb → Trace → Oracle (DPI + Manual) → RTP, eMule, DNS, other]

• Training: Known + Other
• Known Traffic: False Negatives
• Unknown traffic (Other): False Positives

1-day-long trace
20 GByte of UDP traffic

Page 37:

Definition of false positive/negative

[Diagram: Traffic → Oracle (DPI) → eMule, RTP, DNS or Other → KISS]

Classifying “known” with KISS:
• true positives
• false negatives

Classifying “other” with KISS:
• true negatives
• false positives
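These definitions translate into a short bookkeeping routine. The sketch below assumes one oracle label and one classifier label per flow, and it counts a known flow assigned to any label other than its own as a false negative; the function name and the sample labels are hypothetical.

```python
def error_rates(oracle, predicted, known_classes):
    """Return (false negative %, false positive %) against the oracle labels.

    False negative: the oracle says a known class, the classifier says something else.
    False positive: the oracle says "other", the classifier says a known class.
    """
    fn = fp = n_known = n_other = 0
    for truth, guess in zip(oracle, predicted):
        if truth in known_classes:
            n_known += 1
            fn += guess != truth
        else:
            n_other += 1
            fp += guess in known_classes
    fn_pct = 100.0 * fn / n_known if n_known else 0.0
    fp_pct = 100.0 * fp / n_other if n_other else 0.0
    return fn_pct, fp_pct


# Hypothetical labels for eight flows.
oracle = ["rtp", "rtp", "edk", "dns", "other", "other", "other", "other"]
predicted = ["rtp", "other", "edk", "dns", "other", "edk", "other", "other"]
fn_pct, fp_pct = error_rates(oracle, predicted, known_classes={"rtp", "edk", "dns"})
```

In this example one of four known flows is missed and one of four other flows is wrongly accepted, so both rates are 25%.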

Page 38:

Results

Known traffic (False Neg.) [%]

         Euclidean Distance       SVM
         Case A    Case B    Case A    Case B
Rtp        0.08      0.23      0.00      0.05
Edk       13.03      7.97      0.98      0.54
Dns        6.57     19.19      0.12      2.14

Other (False Pos.) [%]

         Euclidean Distance       SVM
         Case A    Case B    Case A    Case B
other     13.6      17.01      0.00      0.18

Page 39:

Real traffic trace

• RTP errors are oracle mistakes (do not identify RTP v1)
• DNS errors are due to impure training set (for the oracle all port 53 is DNS traffic)
• EDK errors are (maybe) Xbox Live (proper training for “other”)

FN are always below 3%!!!

Page 40:

Tuning trainset size

[Figure: True positives and False positives [%] versus Samples per class (confidence 5%)]

Small training set:
• For “known”: 70-80 MByte
• For “other”: 300 MByte

Page 41:

Tuning num of packets for χ²

[Figure: True positives and False positives [%] versus χ² packets (confidence 5%)]

Protocols with volumes of at least 70-80 pkts per flow

Page 42:

P2P-TV applications

• P2P-TV applications are becoming popular
• They heavily rely on UDP as the transport protocol
• They are based on proprietary protocols
• They are evolving over time very quickly
• How to identify them? ... After 6 hours, KISS gives you results

Page 43:

The Failure of DPI

Page 44:

And for TCP?

Page 45:

Chunking and χ²

• First N payload bytes
• C chunks, each of b bits
• Vector of Statistics: [χ²_1, … , χ²_C]

The χ² provides an implicit measure of entropy or randomness. It compares the Observed distribution O_i with the Expected distribution E_i (uniform):

$\chi^2 = \sum_i (O_i - E_i)^2 / E_i$

Page 46:

Results

Page 47:

Results

Page 48:

Pros and Cons

KISS is good because…
• Blind approach
• Completely automated
• Works with many protocols
• Works even with small training
• Statistics can start at any point
• Robust w.r.t. packet drops
• Bypasses some DPI problems

but…
• Learn (other) properly
• Needs volumes of traffic
• May require memory (for now)
• Only UDP (for now)
• Only offline (for now)

Page 49:

Papers

• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, “Revealing Skype Traffic: When Randomness Plays with You”, ACM SIGCOMM, Kyoto, JP, August 2007
• D. Rossi, M. Mellia, M. Meo, “A Detailed Measurement of Skype Network Traffic”, 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008
• D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, “Tracking Down Skype Traffic”, IEEE Infocom, Phoenix, AZ, 15-17 April 2008
• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, “Detailed Analysis of Skype Traffic”, IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009
• A. Finamore, M. Mellia, M. Meo, D. Rossi, “KISS: Stochastic Packet Inspection”, 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009

Page 50:

And for TCP