Performance Optimization in Apache 2.0 Development: How we made Apache faster, and what we learned from the experience [email protected] O’Reilly Open Source Convention, San Diego, CA July 24, 2002


Page 1: Agenda

Performance Optimization in Apache 2.0 Development:

How we made Apache faster, and what we learned from the experience

[email protected]

O’Reilly Open Source Convention, San Diego, CA July 24, 2002

Page 2: Agenda

Agenda

• Introductions

• Performance optimization approach
  – Specific optimizations in Apache 2.0
  – General strategy for open-source software performance improvement

• Results and Next Steps

Page 3: Agenda

Goals for Apache 2.0 Performance

• Make the httpd faster

• But what does that mean?
  – How will we measure speed?
  – What are we willing to sacrifice for speed?
  – And why does performance matter?

Page 4: Agenda

Optimization Strategy: Part 1

Know your project’s priorities:
• Metrics that matter
• Rules of the game

Page 5: Agenda

Performance Guidelines

• Metrics that matter for Apache:
  – Throughput
    • HTTP requests per second
  – Resource utilization
    • CPU, memory

• Rules of the game for Apache:
  – Keep the server portable, reliable, configurable, maintainable, and compatible

Page 6: Agenda

Making Strategic Tradeoffs

• Use these metrics and rules to make effective tradeoffs

• Example: Table data structures
  – Slow, O(n)-time lookups; a significant bottleneck
  – But 3rd party code depended upon the array-based implementation (wasn’t well abstracted)
  – Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
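The slide does not say which optimizations were applied to the table code. As one possible illustration of "keep the O(n) design, but optimize it heavily", the Python sketch below keeps an array-backed table with a linear scan but stores a small precomputed checksum of each key, so most non-matching entries are rejected with a single integer comparison. The class name `Table`, its methods, and the four-byte checksum are assumptions made for this example, not Apache's API.

```python
class Table:
    """Array-backed, case-insensitive key/value table (hypothetical sketch).

    Lookups remain O(n); a cheap integer checksum filters out most
    non-matching entries before the full string comparison.
    """

    def __init__(self):
        # Entries stay in insertion order in a plain list, so code that
        # walks the underlying array keeps working.
        self._entries = []  # list of (checksum, key, value)

    @staticmethod
    def _checksum(key: str) -> int:
        # Assumption for this sketch: pack the first four case-folded
        # bytes of the key into one integer.
        prefix = key.lower().encode("utf-8")[:4].ljust(4, b"\0")
        return int.from_bytes(prefix, "big")

    def add(self, key: str, value: str) -> None:
        self._entries.append((self._checksum(key), key, value))

    def get(self, key: str):
        want = self._checksum(key)
        folded = key.lower()
        for checksum, entry_key, value in self._entries:
            # Integer compare first; full compare only on a checksum hit.
            if checksum == want and entry_key.lower() == folded:
                return value
        return None


headers = Table()
headers.add("Content-Type", "text/html")
headers.add("Content-Length", "10240")
print(headers.get("content-length"))
```

The trade-off mirrors the one on the slide: the data layout that third-party code relies on is unchanged, and only the cost of each step of the scan goes down.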

Page 7: Agenda

Optimization Strategy: Part 2

Profile early, profile often

Page 8: Agenda

Profiling Tools

• We used traditional code profiling tools to find the slow functions and basic blocks
  – gprof
  – Quantify
  – OProfile

• Plus tracing tools to profile system calls
  – truss
  – strace

• And occasional custom instrumentation

Page 9: Agenda

Profile-Driven Optimization

• Profiling helps to create an informal roadmap:
  – Small problems: fix the code now
  – Medium problems: phase in API changes & faster code
  – Large problems: rearchitect

Page 10: Agenda

Profile-Driven Optimization

Apache 2.0 optimizations due to profiling, throughout the entire request processing flow:

• Faster accept(2) serialization
• Less buffer copying
• More scalable, multi-threaded memory allocator
• Faster MIME-type mapper and config merge
• Less string manipulation
• Complete rewrite of server-side-include parser
• Platform-specific socket I/O speedups
• Timestamp caching in access logger

Request processing stages shown in the diagram:

• Read Request
• Create Request Data Structures
• Map URL to File
• Determine Content-Type
• Stream Output Through Filters
• Send Response To Client
• Accept Connection
• Log Request
• Open File
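The slide names "timestamp caching in access logger" without describing how it was done. The general idea can be sketched as follows: a log timestamp with one-second resolution only changes once per second, so the formatted string can be reused for every request logged within the same second. The class name, the injectable `clock` parameter, and the exact format string below are assumptions for illustration, not the Apache implementation.

```python
import time


class CachedTimestamp:
    """Formats log timestamps, reformatting at most once per second."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the cache can be tested
        self._cached_second = None
        self._cached_text = ""
        self.format_calls = 0        # counts how often strftime actually ran

    def now(self) -> str:
        second = int(self._clock())
        if second != self._cached_second:
            # Cache miss: the (comparatively expensive) formatting runs
            # only when the wall-clock second has changed.
            self._cached_text = time.strftime(
                "%d/%b/%Y:%H:%M:%S +0000", time.gmtime(second)
            )
            self._cached_second = second
            self.format_calls += 1
        return self._cached_text


stamp = CachedTimestamp()
for _ in range(1000):
    stamp.now()
print(stamp.format_calls)  # far fewer than 1000 on any realistic run
```

On a server handling hundreds of requests per second, this turns hundreds of formatting calls into one, which is the kind of small problem the profiling roadmap says to fix right away.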

Page 11: Agenda

Optimization Strategy: Part 3

Take advantage of improvements in the platform

Page 12: Agenda

Platform Optimizations

• 2.0 uses fast platform features if available:
  – sendfile(2)
  – unserialized or pthread-mutex-serialized accept(2)
  – Atomic operations

Page 13: Agenda

Platform Optimizations

• Apache Portable Runtime (APR) library abstracts the OS specifics
  – “Greatest common denominator” approach
  – Write your application code to use efficient OS features
  – On platforms where those features are not available, APR will emulate them

• In 2.0, the concurrency model is a plug-in
  – We can add better threading models for specific platforms
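The "greatest common denominator" approach can be sketched in a few lines: the caller always asks for the efficient operation, and the wrapper emulates it where the platform cannot provide it. The example below is a Python analogy rather than APR code. The function name `send_file` is hypothetical; `os.sendfile` is the real Python wrapper around sendfile(2) and exists only on platforms that support it.

```python
import os
import socket


def send_file(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected blocking socket.

    Uses sendfile(2) when the platform offers it and falls back to a
    read/send loop otherwise. Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if hasattr(os, "sendfile"):
            try:
                while sent < size:
                    n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
                    if n == 0:
                        break
                    sent += n
                return sent
            except OSError:
                # sendfile is present but refused this descriptor pair;
                # continue with the emulation from the current offset.
                pass
        # Emulation path: copy through a user-space buffer.
        f.seek(sent)
        while True:
            chunk = f.read(64 * 1024)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent
```

Application code calls `send_file` everywhere and never checks the platform itself, which is the property the slide attributes to APR.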

Page 14: Agenda

Optimization Strategy: Part 4

Use the power of distributed development

Page 15: Agenda

Distributed Development

• Just like open-source debugging, open-source performance tuning scales well as more people work on a problem

• “Redundant” coding has worked well:
  – Multiple people implementing different approaches to the same problem
  – Share ideas, compare results, pick the best implementation

Page 16: Agenda

Distributed Optimization Example: SSI Parser

From: Brian Pane
Date: 2001-09-05 3:00:35
Subject: remaining CPU bottlenecks in 2.0

…Here are the top 30 functions, ranked according to their CPU utilization:

                        CPU time
function                (% of total)
--------                ------------
find_start_sequence     23.9
…

* find_start_sequence() is the main scanning function within mod_include. …

Page 17: Agenda

Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 8:42:46
Subject: [PATCH] Potential replacement for find_start_sequence

…Basically, replace the inner search with a Rabin-Karp search…

From: Sander Striker
Date: 2001-09-05 8:47:59
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '--->'…

From: Sascha Schumann
Date: 2001-09-05 10:51:53
Subject: Re: [PATCH] Potential replacement for find_start_sequence

…I'd suggest looking at BNDM which combines the advantages of bit-parallelism (shift-and/-or algorithms) and suffix automata…

From: Ian Holsman
Date: 2001-09-05 16:18:11
Subject: [PATCH] Potential replacement for find_start_sequence..--skip5

…I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see a lower CPU utilization than the standard mod-includes parser…

Page 18: Agenda

Distributed Optimization Example: SSI Parser

From: Justin Erenkrantz
Date: 2001-09-05 19:08:31
Subject: [PATCH] Round 2 of mod_include/find_start_sequence...

…Replaced Rabin-Karp with the bndm algorithm as implemented by Sascha. Seems to work. Can people please test/review?…

• SSI parser performance improvement:
  – Before: 23.9% of total usr CPU time
  – After: 4.8%

• Greater than 4x improvement in 48 hours
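For readers unfamiliar with BNDM (Backward Nondeterministic DAWG Matching), the algorithm that won out in the thread, the sketch below shows the textbook form of the search in Python. It is not the mod_include code: the function name `bndm_find` is made up for this example, and it returns only the first match. Each bit of `state` tracks one position in the pattern where the text read so far (scanning the window right to left) could still fit, which lets the search skip ahead by up to the full pattern length.

```python
def bndm_find(pattern: bytes, text: bytes) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    full = (1 << m) - 1      # m one-bits: every pattern position still alive
    top = 1 << (m - 1)       # set when the bytes read so far form a pattern prefix
    # masks[b] has bit (m-1-i) set for every i where pattern[i] == b.
    masks = {}
    for i, byte in enumerate(pattern):
        masks[byte] = masks.get(byte, 0) | (1 << (m - 1 - i))

    pos = 0
    while pos <= n - m:
        j = m                # number of window bytes not yet read
        shift = m            # safe distance to slide the window afterwards
        state = full
        while state:
            # Read the window right to left, one byte per iteration.
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & top:
                if j > 0:
                    # A pattern prefix starts at pos + j: remember it so
                    # the next window is aligned with that prefix.
                    shift = j
                else:
                    return pos   # the whole window matched
            state = (state << 1) & full
        pos += shift
    return -1


page = b'<html><!-- plain comment --><!--#include virtual="/header.html" --></html>'
print(bndm_find(b"<!--#", page))
```

Because the SSI start sequence is short, the whole automaton fits in a single machine word in a C implementation; Python integers hide that limit, so the sketch works for patterns of any length.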

Page 19: Agenda

Results

Page 20: Agenda

Results

Performance on a simple file delivery test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections requesting 10KB non-parsed file over 100Mb/s switched network

httpd     Requests/sec    CPU Utilization
1.3.24    777             61%
2.0.36    912             77%

Page 21: Agenda

Results

Performance on a server-parsed (.shtml) file test:

Test case description:
  – Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM
  – 20 concurrent client connections over 100Mb/s switched network
  – .shtml file with virtual includes of five 2KB files

httpd     Requests/sec    CPU Utilization
1.3.24    389             94%
2.0.37    712             93%
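The two result tables (static file and server-parsed file) can be turned into relative figures with a few lines of arithmetic. The sketch below uses only the numbers from the slides. The helper names are made up for this example, and "requests per CPU point" is an informal efficiency measure introduced here, not a metric from the talk.

```python
# (requests/sec, CPU utilization in percent), copied from the two result slides.
RESULTS = {
    "static 10KB file": {"1.3.24": (777, 61), "2.0.36": (912, 77)},
    "server-parsed .shtml": {"1.3.24": (389, 94), "2.0.37": (712, 93)},
}


def throughput_gain_percent(old_rps: float, new_rps: float) -> float:
    """Relative throughput change from the old to the new server, in percent."""
    return (new_rps - old_rps) / old_rps * 100.0


def requests_per_cpu_point(rps: float, cpu_percent: float) -> float:
    """Requests per second delivered per percentage point of CPU used."""
    return rps / cpu_percent


for test_name, rows in RESULTS.items():
    (old_name, (old_rps, old_cpu)), (new_name, (new_rps, new_cpu)) = rows.items()
    gain = throughput_gain_percent(old_rps, new_rps)
    print(f"{test_name}: {old_name} -> {new_name}: {gain:+.1f}% requests/sec")
    print(f"  {old_name}: {requests_per_cpu_point(old_rps, old_cpu):.1f} req/s per CPU point")
    print(f"  {new_name}: {requests_per_cpu_point(new_rps, new_cpu):.1f} req/s per CPU point")
```

Read this way, the static-file gain comes with higher CPU utilization, while the server-parsed test nearly doubles throughput at essentially the same CPU cost, consistent with the SSI parser work described earlier.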

Page 22: Agenda

Conclusion

Next steps for Apache:

• Continue incremental performance improvements

• Explore highly scalable concurrency models (multiple connections per thread)

Page 23: Agenda

Conclusion

Recommendations for other projects:

1. Know your project’s priorities:
   • Metrics that matter
   • Rules of the game
2. Profile early, profile often
3. Take advantage of platform improvements
4. Use the power of distributed development