Pushing Up Performance for Everyone

Post on 14-Jan-2016

Matt Mathis

7-Dec-99

Why do so few people get good network performance?

• Context and history

• Architectural origins

• Approaches

The Wizard Gap

[Figure: data rate (Mb/s), with axis ticks at 0.1, 1, 10, 100 and 1000, plotted against year for two curves, Expert and Default.]

Past Performance Evolution

• Wizards wrote standards
  – Standard TCP could not go fast (1988)

• Wizards enhanced systems
  – Stock systems could not go fast (1995)

• Gurus tune systems (today)
  – Fast TCP is present
  – Badly mistuned by default

Ongoing Performance Evolution

• More disciples tune and debug (tomorrow)
  – All netadmins and sysadmins?

• Systems are tuned by default (future)
  – Web100...

• Debugging will become "easy" (?)

Architecture

• The Good news
  – TCP hides the net from the application

• The Bad news
  – TCP hides the net

... including ALL bugs everywhere.

• The only legal symptom is less-than-expected performance

You get poor performance if:

– The application is inefficient
– TCP is buggy
– TCP is mistuned
– The path is buggy
– The path is congested
– Routing is suboptimal

Especially on a long path.
– Think: weakest link of an invisible chain

Closing the Wizard Gap

• Share the expertise
  – Train more disciples

• Require less expertise
  – Systems should tune themselves

• Better observability
  – Focused and efficient debugging

• Documentation
  – Show that the world is improving

Share the expertise

• Joint Techs meetings

• TCP Tuning
  – In-depth presentation by Matt Mathis

• DAST Application tutorials
  – See: dast.nlanr.net

Require less expertise

• TCP Autotuning
  – Presentation by Matt Mathis

• Web100
  – Presentation by Basil Irwin

• Online TCP debugging resources
  – See: http://www.ncne.nlanr.net/TCP

Better Observability (Instrumentation)

• Network Instrumentation and Visualization
  – Presentation by Mark Gates

• Trace Analysis and Auto-Diagnosis
  – Presentation by Kathy Benninger

• Better TCP instrumentation (Web100)
  – Just ask TCP why it is slow

Better Observability (Debugging methods)

• Sweden - Pittsburgh path
  – Presentation by Greg Miller & Jerry Sobieski

• iPerf tool
  – Presentation by Mark Gates

• Existing tools and tool repositories
  – See: http://www.ncne.nlanr.net/tools

• Still insufficient

Better Observability (Measurement)

• Measurements from Seattle I2 Meeting
  – Presentation by Matt Zekauskas

• Advanced Research and Engineering Atlas
  – Presentation by John Jamison

• Many distributed measurement efforts
  – AMP, Surveyor, NIMI, etc.

Documentation

• vBNS stats and measurement
  – Tutorial by Rick Wilder

• NLANR MOAT vBNS traffic on NAI
  – See: moat.nlanr.net

• Many benchmark efforts
  – Surveyor, AMP, NIMI, Web100, ...

• HPC host census (?)

Conclusion

• We need to find every bug that TCP hides
  – Now and always

• We need to eliminate all irrelevant controls
  – Autotune TCP (and RED, etc.)

Debugging flowchart

• http://www.ncne.nlanr.net/TCP/debugging

• Look at a trace and click to study symptoms

• Ongoing evolution

Testrig kit

• "Foolproof" TCP diagnosis starter kit with:
  – Simple diagnostic application
  – TCP trace collection tools
  – Visualization tools
  – Pointer to the debugging flowchart

• With wrapper scripts around everything

TCP Debugging In-depth

• Draft done at CAIDA this summer

• Future NCNE On-site
  – 1, 2.5 and 5 hour versions

• Basis for the debugging flowchart

• Update from flowchart as it evolves

• Interactive - Uses magicpoint/xplot

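Trace Analysis and Auto-Diagnosis (TAAD)

• Scan GigaPop traffic for mistuned TCP connections
  – that fail to meet the model

rate = (MSS/RTT) * (C/sqrt(p))

  – MSS is the maximum segment size, RTT the round-trip time, p the packet loss probability, and C a constant of order one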

• Running prototype

• Use to direct other resources
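The model on this slide can be turned into a small check. The following is a minimal sketch, not the TAAD prototype itself: the function names, the value chosen for C, and the `tolerance` threshold are all assumptions made for illustration. The slide leaves C unspecified; sqrt(3/2) is the value commonly quoted for periodic loss without delayed ACKs.

```python
import math

# Assumed value for C: sqrt(3/2), commonly quoted for periodic loss
# without delayed ACKs. The slide itself does not give a value.
C_DEFAULT = math.sqrt(3.0 / 2.0)


def model_rate_bps(mss_bytes, rtt_s, loss_prob, c=C_DEFAULT):
    """Return rate = (MSS/RTT) * (C/sqrt(p)) in bits per second."""
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("MSS and RTT must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss probability must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


def falls_short(measured_bps, mss_bytes, rtt_s, loss_prob, tolerance=0.5):
    """Flag a connection whose measured rate is below tolerance * model rate.

    The tolerance is a hypothetical threshold, not taken from the slides.
    """
    return measured_bps < tolerance * model_rate_bps(mss_bytes, rtt_s, loss_prob)


if __name__ == "__main__":
    # Example path: 1460-byte MSS, 100 ms RTT, 0.01% loss.
    rate = model_rate_bps(1460, 0.100, 0.0001)
    print(f"model rate: {rate / 1e6:.1f} Mb/s")
    print("1 Mb/s connection flagged:", falls_short(1e6, 1460, 0.100, 0.0001))
```

With these example inputs the model gives about 14.3 Mb/s, so a connection on that path measured at 1 Mb/s would be flagged as a candidate for tuning or debugging.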

Autotuning

• Make TCP “do the right thing” by default

• No unneeded user controls
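The slides do not say how autotuning picks its settings. As background, the quantity that manual TCP tuning usually targets is the bandwidth-delay product, the amount of data that must be in flight to fill a path. The sketch below only illustrates that arithmetic; the function names are hypothetical and this is not the autotuning algorithm from the presentation.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    if bandwidth_bps <= 0 or rtt_s <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    return bandwidth_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    if window_bytes <= 0 or rtt_s <= 0:
        raise ValueError("window and RTT must be positive")
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Example path (illustrative numbers): 100 Mb/s bottleneck, 100 ms RTT.
    needed = bdp_bytes(100e6, 0.100)
    print(f"buffer needed to fill the path: {needed / 1e6:.2f} MB")

    # 65535 bytes is the largest window TCP can advertise without window scaling.
    limit = window_limited_rate_bps(65535, 0.100)
    print(f"rate with a 64 KB window: {limit / 1e6:.2f} Mb/s")
```

The gap between the two printed figures is one way to see why a default window can leave a fast, long path mostly idle, and why a stack that sizes its buffers automatically removes a control users would otherwise have to set by hand.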

Generate data points (AMP)

• Nearly 100 systems already

• Kernel TCP bug
  – Need to upgrade to FreeBSD 3.3

• Easy to create 100x1 data points

• Can create 100x100 data points

• Opportunity for NIMI

Generate OC-12 data points

• Max Okumoto working at PSC for SDSC

• Will start tuning selected paths

HPC Host Census

• Use existing data from MCI OC-Xmon

• Patterned after HWB big flow detection

• Measure the number of fast hosts

• Words needed to generalize to all of JET