DiffProbe: Detecting ISP Service Discrimination
Partha Kanuparthy, Constantine Dovrolis
Net Neutrality
Recent FCC-ISP debates: Comcast throttling dispute, etc.
FCC broadband mapping framework: tools to estimate performance; $350m stimulus funds
What is Service Discrimination?
ISPs can classify certain applications as low-priority and service them accordingly
Discrimination can manifest as relatively high delays or high loss rates
ISPs can also shape traffic, which leads to low throughput (and hence both delay and loss); ShaperProbe is a first step toward detecting this
Goals
Problem: Is an application's traffic being classified as low-priority by an ISP?
Is the ISP doing loss discrimination, delay discrimination, or both?
Can we identify the scheduler type?
Solution: Compare the performance of normal and application traffic sent simultaneously
Identifying discrimination is not easy:
1. Congestion events can be short-lived (µs to ms timescales)
– Bad idea: compare delays/loss rates from different times
2. The customer may see the same performance if there is no cross-traffic
– Bad idea: call this no-discrimination
[Figure: DiffProbe server and a WRR scheduler]
Delay Discrimination: Practice
Non-discriminatory schedulers (single queue): First-Come-First-Serve (FCFS)
Discriminatory schedulers (multiple classes): Strict Priority (SP), Weighted Fair Queuing (WFQ)
Delay discrimination creates a difference in delay distributions
Loss Discrimination: Practice
Non-discriminatory buffer managers: DropTail (DT), Random Early Detection (RED)
Discriminatory buffer managers: Weighted RED (WRED), Drop-from-Longest-Queue
Loss discrimination creates a difference in loss rates
Rest of the talk…
High-level design
Detecting delay discrimination
Detecting loss discrimination
The DiffProbe tool
ShaperProbe
High-level design
Send normal (P) and application (A) traffic simultaneously
Measure one-way delays (OWDs) and lost packets for each flow
[Figure: DiffProbe server; application traffic (A) and normal traffic (P)]
Avoiding Classification
A flow: , ...
The P flow has to be:
sufficiently different from A to avoid classification (e.g., alter payload, ports, gaps)
sufficiently similar to A to observe the same network performance as A when there is no discrimination: the same packet size distribution for A and P, and a P packet sent at about the same time as each A packet
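A minimal Python sketch of deriving a P flow from an A flow under these two constraints. It assumes a hypothetical `Packet` record (send time, destination port, payload) and a hypothetical `make_p_flow` helper; the P flow keeps A's packet sizes and timing but uses one random destination port and random payload bytes. The `offset_s` spacing and the port range are illustrative assumptions, not DiffProbe's actual values.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical packet model used only for this sketch.
    send_time: float  # seconds
    dst_port: int
    payload: bytes


def make_p_flow(a_flow, rng, offset_s=0.0001):
    """Mirror each A packet with a P packet of equal size sent at about the same time.

    The P flow gets a single random destination port (different from A's ports)
    and random payload bytes, so a port- or payload-based classifier should not
    match it, while packet sizes and timing stay the same as A's.
    """
    a_ports = {pkt.dst_port for pkt in a_flow}
    p_port = rng.randint(1024, 65535)
    while p_port in a_ports:  # assumption: any other unprivileged port will do
        p_port = rng.randint(1024, 65535)
    return [
        Packet(
            send_time=pkt.send_time + offset_s,  # "about the same time" as A
            dst_port=p_port,
            payload=bytes(rng.getrandbits(8) for _ in range(len(pkt.payload))),
        )
        for pkt in a_flow
    ]
```

Using a single port for the whole P flow keeps it a single flow from the classifier's point of view, which matters if the ISP classifies per flow rather than per packet.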
Probing Patterns
Create two probing structures using A and P:
Balanced Load Period (BLP): send both flows at their normal rates
Load Increase Period (LIP): scale up the P flow's rate
Why create an LIP? To maximize the chances of queuing in the ISP network
Discrimination Identifiability
The user does not always “see” discrimination: with no high-priority backlog, low-priority traffic gets the full link capacity
We use BLP to detect unidentifiable conditions for delay discrimination: the P delays created during LIP must be larger than those during BLP, i.e.,
90th percentile of P's delays during LIP > median of P's delays during BLP
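The identifiability condition is a single comparison. A minimal Python sketch, assuming delays are plain lists of one-way delays in any consistent unit and using a nearest-rank percentile (an assumption; the tool may interpolate percentiles differently). The function names are hypothetical.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def delay_discrimination_identifiable(p_delays_lip, p_delays_blp):
    """True if the LIP raised P's delays above their BLP level.

    If P's 90th-percentile delay during LIP does not exceed its median delay
    during BLP, the LIP created no observable queuing, and delay discrimination
    cannot be identified from this run.
    """
    return percentile(p_delays_lip, 90) > statistics.median(p_delays_blp)
```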
Detecting Delay Discrimination
We observe the empirical delay distributions of the A and P flows during LIP
No delay discrimination: the two distributions are the same (example: FCFS, Comcast)
Delay discrimination: the two distributions differ (example: WRR 1:3, emulated)
Detecting Delay Discrimination (2)
Pre-processing:
Pairing: consider only those (A, P) sample pairs that were sent within one MTU transmission time, τ, of each other
Discard delay values within a τ-neighborhood of the estimated propagation delay, since such samples do not see queuing
Subtract the propagation delay estimate from the remaining samples
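A minimal Python sketch of these three pre-processing steps, under stated assumptions: samples are `(send_time, owd)` tuples in seconds; τ is one MTU (assumed 1500 bytes) transmitted at the path capacity from pre-probing; each A sample is paired with the P sample sent closest to it; and the propagation delay is estimated as the minimum paired OWD. The helper names are hypothetical.

```python
import bisect


def mtu_tx_time(capacity_bps, mtu_bytes=1500):
    """tau: time to transmit one MTU-sized packet at the path capacity."""
    return mtu_bytes * 8 / capacity_bps


def pair_samples(a_samples, p_samples, tau):
    """Pair each A sample with the P sample sent closest in time, if within tau.

    Samples are (send_time_s, owd_s) tuples; returns a list of (a_owd, p_owd).
    In this sketch a P sample may be paired with more than one A sample.
    """
    p_sorted = sorted(p_samples)
    p_times = [t for t, _ in p_sorted]
    pairs = []
    for a_time, a_owd in sorted(a_samples):
        i = bisect.bisect_left(p_times, a_time)
        neighbors = [j for j in (i - 1, i) if 0 <= j < len(p_sorted)]
        if not neighbors:
            continue
        j = min(neighbors, key=lambda k: abs(p_times[k] - a_time))
        if abs(p_times[j] - a_time) <= tau:
            pairs.append((a_owd, p_sorted[j][1]))
    return pairs


def preprocess(pairs, tau):
    """Drop delays within tau of the propagation delay, then subtract it."""
    if not pairs:
        return [], []
    prop = min(min(a, p) for a, p in pairs)  # assumption: min OWD ~ propagation delay
    a_queued = [a - prop for a, _ in pairs if a - prop > tau]
    p_queued = [p - prop for _, p in pairs if p - prop > tau]
    return a_queued, p_queued
```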
Detecting Delay Discrimination (3)
Hypothesis test for a difference between the A and P delay distributions:
1. Null hypothesis: equal distributions
2. Compute the Kullback-Leibler (KL) divergence of the pre-processed samples
3. Compute the KL divergences of uniform random partitions of the pooled A and P samples
4. Is (2) > (3)?
• Test for the direction of discrimination: compare all higher percentiles (50th to 90th) of the A and P delay distributions. Redo the test, swapping A and P as inputs. If this test fails, we state that delay discrimination is unknown
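One way to sketch the direction test in Python, assuming "compare all higher percentiles" means checking the 50th, 60th, ..., 90th percentiles and requiring one flow's delay to be strictly larger at every one of them (the step size and the strictness are assumptions; `delay_verdict` is a hypothetical name). The verdict names the flow that sees larger delays.

```python
import math


def percentile(samples, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def dominates(x, y, qs=range(50, 91, 10)):
    """True if x's delay exceeds y's at every checked percentile."""
    return all(percentile(x, q) > percentile(y, q) for q in qs)


def delay_verdict(a_delays, p_delays):
    """Run the test with (A, P), then with the inputs swapped."""
    if dominates(a_delays, p_delays):
        return "A"  # A sees larger delays
    if dominates(p_delays, a_delays):
        return "P"  # P sees larger delays
    return "unknown"
```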
Delay Discrimination: Accuracy
Evaluate using simulations: discrimination using SP and WFQ; Skype iSAC packet trace as the A flow; cross-traffic: interactive TCP sessions (200 users), with half of the user traffic classified low-priority; BLP and LIP durations: 30 s each
90+% accuracy among detectable trials
95% confidence, 2% error margin
[Figure: accuracy vs. WFQ weights for FCFS, SP, and WFQ]
WFQ with weights 1:1.5 is similar to FCFS
SP or WFQ?
SP-like and WFQ-like scheduling create different delays
Idea: a P packet serviced just after an A packet would see only the non-preemption delay caused by A (if any) under SP, but would see A's queuing delays under WFQ
[Figure: low-priority queue under SP and WFQ 1:2 (non-preemption vs. queuing delay); distribution of P subset under WFQ and SP]
Method: choose the subset of P samples received very close to, but after, an A packet
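A minimal Python sketch of selecting this subset, assuming receive timestamps are available for both flows and that "very close" means within a window such as τ (the window value and the name `p_subset_after_a` are assumptions).

```python
import bisect


def p_subset_after_a(a_recv_times, p_samples, window):
    """OWDs of P packets received within `window` seconds after some A packet.

    p_samples are (recv_time_s, owd_s) tuples; a_recv_times are A receive times.
    """
    a_sorted = sorted(a_recv_times)
    subset = []
    for recv, owd in p_samples:
        i = bisect.bisect_left(a_sorted, recv)  # a_sorted[i - 1] < recv
        if i > 0 and recv - a_sorted[i - 1] <= window:
            subset.append(owd)
    return subset
```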
Detecting Loss Discrimination
Estimate the loss rates of the A and P flows during LIP as the fraction of packets lost
No loss discrimination: the two loss rates are the same
Loss discrimination: the two loss rates differ (example: WRR 1:3 with Drop-Longest-Queue, emulated)
Detecting Loss Discrimination (2)
Pre-processing to estimate the loss rates: pairing, same as for delay discrimination; this ensures that the A and P flows sample the same congestion events under DropTail/RED
Use the two-proportion test on the A and P loss rates
Unidentifiability: fewer than 10 dropped packets in each flow
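A minimal Python sketch of the loss test, using the standard pooled two-proportion z-test at the 5% level (two-sided, critical value 1.96). Treating fewer than 10 drops in each flow as unidentifiable follows the slide; the function name and returned labels are hypothetical.

```python
import math


def loss_discrimination_test(lost_a, sent_a, lost_p, sent_p,
                             min_drops=10, z_crit=1.96):
    """Compare A and P loss rates (fraction of packets lost) during LIP."""
    if lost_a < min_drops and lost_p < min_drops:
        return "unidentifiable"
    rate_a, rate_p = lost_a / sent_a, lost_p / sent_p
    # Pooled loss rate under the null hypothesis that both flows lose equally.
    pooled = (lost_a + lost_p) / (sent_a + sent_p)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_p))
    if se == 0:  # every packet lost in both flows: no difference to detect
        return "no discrimination"
    z = (rate_a - rate_p) / se
    if z > z_crit:
        return "A loses more"
    if z < -z_crit:
        return "P loses more"
    return "no discrimination"
```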
Loss Discrimination: Accuracy
Buffer sizes set according to the bandwidth-delay product
90+% accuracy for discriminating configurations
[Figure: WRED accuracy vs. f, the minimum queue threshold of normal flows]
Drop-Longest-Queue (WFQ) vs. DT: WFQ 1:1.5 is similar to DT, with similar loss rates
Implementing DiffProbe
DiffProbe runs as a client-server tool (~7500 LoC)
Classifier types: port, payload
A flow: Skype and Vonage voice traces
P flow: randomize the payload and port of the A flow
LIP, BLP durations: 30 s each
Pre-probing: estimate path capacity using packet trains
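Path capacity can be estimated from the dispersion of a back-to-back packet train: N packets of L bytes that leave the bottleneck back to back arrive spread over (N-1)·L·8 / C seconds. A minimal Python sketch, assuming receive timestamps in seconds and equal-sized packets; taking the median across several trains is an assumption, not necessarily what DiffProbe does, and the helper names are hypothetical.

```python
import statistics


def train_capacity_bps(recv_times_s, pkt_bytes):
    """Capacity implied by one train: (N - 1) packets' bits over the dispersion."""
    if len(recv_times_s) < 2:
        raise ValueError("a train needs at least two packets")
    dispersion = max(recv_times_s) - min(recv_times_s)
    if dispersion <= 0:
        raise ValueError("zero dispersion; timestamps too coarse")
    return (len(recv_times_s) - 1) * pkt_bytes * 8 / dispersion


def estimate_capacity_bps(trains, pkt_bytes):
    """Median of per-train estimates (assumption: tolerates a few disturbed trains)."""
    return statistics.median([train_capacity_bps(t, pkt_bytes) for t in trains])
```

Cross-traffic that interleaves with a train widens its dispersion and biases that train's estimate low, which is why a robust summary over several trains is used here.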
Experiments
Emulations: discriminating link configured using tc; Pareto cross-traffic; SP, WRR, and Drop-Longest-Queue discriminators. No false positives or false negatives
Real-world experiments (Skype and Vonage):
[Figure: KL-test p-values for access ISP runs]
We do not have ground truth. A high KL-test p-value is a good “indicator” of no discrimination. One ISP showed multi-path routing, which created different delays
Validation
ISPs have so far not disclosed details of their application discrimination practices (if any): no ground truth!
Discrimination: a significant difference in the delays and/or losses of A and P. Why? Controlled-environment trials!
Validation ideas?
ShaperProbe
A pre-probing module of DiffProbe to answer:
Can we detect traffic shaping by ISPs?
What is the shaping configuration?
Key idea: probe and detect level shifts in rate, the token bucket signature (example upload: 7 Mbps -> 2 Mbps in 8 s)
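Under a standard token bucket model (token rate ρ, bucket depth σ; an assumption about how the shaper works), a sender pushing R > ρ drains the bucket at R - ρ, so the rate drops after t = σ / (R - ρ). For the upload example, σ ≈ (7 - 2) Mbps × 8 s = 40 Mb, or 5 MB. A minimal Python sketch that finds the first sustained drop in a received-rate time series and inverts that relation; the 0.8 drop threshold and the helper names are assumptions.

```python
import statistics


def detect_level_shift(times_s, rates_bps, drop_ratio=0.8):
    """Return (shift_time, rate_before, rate_after) for the first sustained drop.

    A shift at index k requires every later rate to stay below drop_ratio times
    the median rate before k. Returns None if no such drop exists.
    """
    for k in range(1, len(rates_bps)):
        before = statistics.median(rates_bps[:k])
        if max(rates_bps[k:]) < drop_ratio * before:
            return times_s[k], before, statistics.median(rates_bps[k:])
    return None


def token_bucket_from_shift(start_s, shift):
    """Estimate (token rate, bucket depth in bits) via sigma = (R - rho) * t."""
    shift_time, rate_before, rate_after = shift
    burst_bits = (rate_before - rate_after) * (shift_time - start_s)
    return rate_after, burst_bits
```

Requiring all later samples to stay below the threshold keeps the sketch simple but makes it sensitive to noisy rate samples; a real detector would likely tolerate outliers.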
ShaperProbe (contd.)
Deployed at Google M-Lab: 60,000+ runs so far
Who shapes traffic?
...among 700+ other ASes.
Thank You!
partha @ cc.gatech.edu
Detecting Delay Discrimination (3)
Hypothesis test for a difference between the A and P delay distributions:
Null hypothesis: equal distributions
Compute the Kullback-Leibler (KL) divergence of the pre-processed samples; call it the observed divergence
Bootstrap: compute the KL divergences of uniform random partitions of the pooled A and P samples; this gives a KL distribution
Reject the null hypothesis if the p-value is < 0.05
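A minimal Python sketch of this bootstrap KL test. KL divergence between two samples needs a density estimate; here both samples are binned into a shared 20-bin histogram with small additive smoothing so that no bin has zero probability, and the p-value is taken as the fraction of random partitions whose divergence is at least the observed one. The binning, smoothing, number of rounds, and function names are assumptions, not DiffProbe's settings.

```python
import math
import random


def kl_divergence(x, y, bins=20, eps=1e-6):
    """Histogram-based estimate of KL(x || y) over shared, smoothed bins."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # all values equal: any positive width works

    def hist(sample):
        counts = [0] * bins
        for v in sample:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        total = len(sample) + eps * bins
        return [(c + eps) / total for c in counts]  # sums to 1

    px, py = hist(x), hist(y)
    return sum(a * math.log(a / b) for a, b in zip(px, py))


def kl_bootstrap_test(a, p, rounds=1000, alpha=0.05, seed=0):
    """Return (p_value, reject_null) for H0: A and P delays share a distribution."""
    observed = kl_divergence(a, p)
    pooled = list(a) + list(p)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # uniform random partition into |A| and |P| samples
        if kl_divergence(pooled[:len(a)], pooled[len(a):]) >= observed:
            at_least += 1
    p_value = at_least / rounds
    return p_value, p_value < alpha
```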
Detecting Delay Discrimination (4)
Test for the direction of discrimination (if the KL test rejects the null hypothesis): compare higher percentiles of the A and P delay distributions
Redo the test, swapping A and P as inputs
If this test fails, we state that delay discrimination is unknown
SP or WFQ? (2)
For the distribution of this subset of P samples:
SP-like if the 95th-percentile P delay ≈ the 5th-percentile P delay
WFQ-like otherwise
[Figure: distribution of P subset under WFQ and SP; WFQ-SP accuracy]
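The SP/WFQ rule can be sketched in Python, assuming "≈" means that the 95th and 5th percentiles of the P subset differ by at most a tolerance such as τ (the tolerance and the name `classify_scheduler` are assumptions). Under SP the subset sees at most a non-preemption delay, so its delays cluster tightly; under WFQ they spread out with A's queuing.

```python
import math


def percentile(samples, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a non-empty sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def classify_scheduler(p_subset_delays, tol):
    """SP-like if the P subset's delays are tightly clustered, else WFQ-like."""
    spread = percentile(p_subset_delays, 95) - percentile(p_subset_delays, 5)
    return "SP-like" if spread <= tol else "WFQ-like"
```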