Pushing Up Performance for Everyone

Pushing Up Performance for Everyone Matt Mathis 7-Dec-99

Transcript of Pushing Up Performance for Everyone

Page 1: Pushing Up Performance for Everyone

Pushing Up Performance for Everyone

Matt Mathis

7-Dec-99

Page 2: Pushing Up Performance for Everyone

Why do so few people get good network performance?

• Context and history

• Architectural origins

• Approaches

Page 3: Pushing Up Performance for Everyone

The Wizard Gap

[Figure: data rate (Mb/s, logarithmic axis with ticks at 0.1, 1, 10, 100 and 1000) plotted against year, with two series labeled Expert and Default.]

Page 4: Pushing Up Performance for Everyone

Past Performance Evolution

• Wizards wrote standards
  – Standard TCP could not go fast (1988)

• Wizards enhanced systems
  – Stock systems could not go fast (1995)

• Gurus tune systems (today)
  – Fast TCP is present
  – Badly mistuned by default

Page 5: Pushing Up Performance for Everyone

Ongoing Performance Evolution

• More disciples tune and debug (tomorrow)
  – All netadmins and sysadmins?

• Systems are tuned by default (future)
  – Web100...

• Debugging will become “easy” (?)

Page 6: Pushing Up Performance for Everyone

Architecture

• The Good news
  – TCP hides the net from the application

• The Bad news
  – TCP hides the net

Page 7: Pushing Up Performance for Everyone

Architecture

• The Good news
  – TCP hides the net from the application

• The Bad news
  – TCP hides the net

... including ALL bugs everywhere.

• The only legal symptom is less than expected performance

Page 8: Pushing Up Performance for Everyone

You get poor performance if:

– The application is inefficient
– TCP is buggy
– TCP is mistuned
– The path is buggy
– The path is congested
– Routing is suboptimal

Especially on a long path.
– Think: weakest link of an invisible chain
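One reason a mistuned TCP hurts most on a long path can be shown with simple arithmetic: a sender can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT. The sketch below is illustrative only. The path names and round-trip times are assumptions rather than measurements from the talk, and the 65,535-byte window is the largest a TCP header can advertise without the window scale option.

```python
def window_limited_rate_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on TCP throughput (Mb/s) when only one window
    of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


# Hypothetical paths: same unscaled 65,535-byte window, different RTTs.
UNSCALED_MAX_WINDOW = 65535
for name, rtt in [("campus", 0.002),
                  ("cross-country", 0.070),
                  ("transatlantic", 0.150)]:
    rate = window_limited_rate_mbps(UNSCALED_MAX_WINDOW, rtt)
    print(f"{name:>14}: {rate:8.2f} Mb/s")
```

The same window that is harmless on a short path becomes the weakest link as the round-trip time grows.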

Page 9: Pushing Up Performance for Everyone

Closing the Wizard Gap

• Share the expertise
  – Train more disciples

• Require less expertise
  – Systems should tune themselves

• Better observability
  – Focused and efficient debugging

• Documentation
  – Show that the world is improving

Page 10: Pushing Up Performance for Everyone

Share the expertise

• Joint Techs meetings

• TCP Tuning
  – In-depth presentation by Matt Mathis

• DAST Application tutorials
  – See: dast.nlanr.net

Page 11: Pushing Up Performance for Everyone

Require less expertise

• TCP Autotuning
  – Presentation by Matt Mathis

• Web100
  – Presentation by Basil Irwin

• Online TCP debugging resources
  – See: http://www.ncne.nlanr.net/TCP

Page 12: Pushing Up Performance for Everyone

Better Observability (Instrumentation)

• Network Instrumentation and Visualization
  – Presentation by Mark Gates

• Trace Analysis and Auto-Diagnosis
  – Presentation by Kathy Benninger

• Better TCP instrumentation (Web100)
  – Just ask TCP why it is slow

Page 13: Pushing Up Performance for Everyone

Better Observability (Debugging methods)

• Sweden - Pittsburgh path
  – Presentation by Greg Miller & Jerry Sobieski

• iPerf tool
  – Presentation by Mark Gates

• Existing tools and tool repositories
  – See: http://www.ncne.nlanr.net/tools

• Still insufficient

Page 14: Pushing Up Performance for Everyone

Better Observability (Measurement)

• Measurements from Seattle I2 Meeting
  – Presentation by Matt Zekauskas

• Advanced Research and Engineering Atlas
  – Presentation by John Jamison

• Many distributed measurement efforts
  – AMP, Surveyor, NIMI, etc.

Page 15: Pushing Up Performance for Everyone

Documentation

• vBNS stats and measurement
  – Tutorial by Rick Wilder

• NLANR MOAT vBNS traffic on NAI
  – See: moat.nlanr.net

• Many benchmark efforts
  – Surveyor, AMP, NIMI, Web100, ...

• HPC host census(?)

Page 16: Pushing Up Performance for Everyone

Conclusion

• We need to find every bug that TCP hides
  – Now and always

• We need to eliminate all irrelevant controls
  – Autotune TCP (and RED, etc.)

Page 17: Pushing Up Performance for Everyone

Debugging flowchart

• http://www.ncne.nlanr.net/TCP/debugging

• Look at a trace and click to study symptoms

• Ongoing evolution

Page 18: Pushing Up Performance for Everyone

Testrig kit

• "Foolproof" TCP diagnosis starter kit with:
  – Simple diagnostic application
  – TCP trace collection tools
  – Visualization tools
  – Pointer to the debugging flowchart

• With wrapper scripts around everything

Page 19: Pushing Up Performance for Everyone

TCP Debugging In-depth

• Draft done at CAIDA this summer

• Future NCNE On-site
  – 1, 2.5 and 5 hour versions

• Basis for the debugging flowchart

• Update from flowchart as it evolves

• Interactive - Uses magicpoint/xplot

Page 20: Pushing Up Performance for Everyone

Trace Analysis and Auto-Diagnosis (TAAD)

• Scan GigaPop traffic for mistuned TCP connections
  – that fail to meet the model

    rate = (MSS/RTT) * (C/sqrt(p))

• Running prototype

• Use to direct other resources
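As a rough illustration of how a scan against this model might flag a connection, the sketch below computes the model rate from MSS, RTT and loss probability p, then compares an observed rate against it. It is not the TAAD prototype. The function names, the 0.5 tolerance and the sample numbers are hypothetical, and the value C = sqrt(3/2) is an assumption taken from the commonly cited periodic-loss derivation, since the slide does not give a value for C.

```python
import math

# Assumed constant: sqrt(3/2), about 1.22. The slide leaves C unspecified.
C = math.sqrt(3 / 2)


def model_rate_bps(mss_bytes: int, rtt_s: float, loss_prob: float,
                   c: float = C) -> float:
    """rate = (MSS/RTT) * (C/sqrt(p)), returned in bits per second."""
    if not 0 < loss_prob <= 1:
        raise ValueError("loss_prob must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_prob))


def fails_model(observed_bps: float, mss_bytes: int, rtt_s: float,
                loss_prob: float, tolerance: float = 0.5) -> bool:
    """True if the observed rate is below `tolerance` times the model rate.

    The tolerance is a hypothetical knob, not something from the talk.
    """
    return observed_bps < tolerance * model_rate_bps(mss_bytes, rtt_s, loss_prob)


# Hypothetical connection: 1460-byte MSS, 70 ms RTT, 0.01% loss.
predicted = model_rate_bps(1460, 0.070, 1e-4)
print(f"model rate: {predicted / 1e6:.1f} Mb/s")
print("2 Mb/s observed, flagged:", fails_model(2e6, 1460, 0.070, 1e-4))
```

A connection that runs well below the model rate for its measured loss is limited by something other than congestion, which makes it a candidate for closer inspection.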

Page 21: Pushing Up Performance for Everyone

Autotuning

• Make TCP “do the right thing” by default

• No unneeded user controls

Page 22: Pushing Up Performance for Everyone

Generate data points (AMP)

• Nearly 100 systems already

• Kernel TCP bug
  – Need to upgrade to FreeBSD 3.3

• Easy to create 100x1 data points

• Can create 100x100 data points

• Opportunity for NIMI

Page 23: Pushing Up Performance for Everyone

Generate OC-12 data points

• Max Okumoto working at PSC for SDSC

• Will start tuning selected paths

Page 24: Pushing Up Performance for Everyone

HPC Host Census

• Use existing data from MCI OC-Xmon

• Patterned after HWB big flow detection

• Measure the number of fast hosts

• Words needed to generalize to all of JET
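A census of this kind could be sketched as a pass over flow records that counts the distinct hosts seen sending at least one fast flow. The example below is a minimal sketch under stated assumptions: the `Flow` record layout, the 10 Mb/s threshold and the one-second minimum duration are all hypothetical, and it does not reflect the actual OC-Xmon data format or the HWB detection method.

```python
from typing import Iterable, NamedTuple, Set


class Flow(NamedTuple):
    """Hypothetical flow record; real monitor output will differ."""
    src: str        # sending host
    nbytes: int     # bytes carried by the flow
    seconds: float  # flow duration


def fast_hosts(flows: Iterable[Flow], min_rate_mbps: float = 10.0,
               min_seconds: float = 1.0) -> Set[str]:
    """Return the hosts that sent at least one flow whose average
    rate meets the threshold. Very short flows are skipped because
    their average rate is not meaningful."""
    hosts: Set[str] = set()
    for flow in flows:
        if flow.seconds < min_seconds:
            continue
        if flow.nbytes * 8 / flow.seconds / 1e6 >= min_rate_mbps:
            hosts.add(flow.src)
    return hosts


# Made-up sample records for illustration only.
sample = [
    Flow("hostA", 50_000_000, 10.0),  # 40 Mb/s
    Flow("hostB", 1_000_000, 10.0),   # 0.8 Mb/s
    Flow("hostA", 2_000_000, 20.0),   # 0.8 Mb/s, but hostA already counted
    Flow("hostC", 5_000_000, 0.2),    # too short to judge
]
print(len(fast_hosts(sample)), "fast host(s)")
```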