Modularity and Costs Greg Busby Computer Science 614 March 26, 2002.
Modularity and Costs
Greg Busby
Computer Science 614
March 26, 2002

Problem 1 – Complexity
- Protocols are necessary for network communication
  - Both ends must agree on a format to exchange messages
- Communication protocols are complex
- Using several protocols together is even more complex

Solution 1 – Layers
- Implement each protocol independently
  - Allows cleaner implementation
- Layer protocols
  - Maintains modularity
  - Reduces complexity – no need to understand interactions between protocols

Problem 2 – Delays
- Messages get larger as additional headers are added at each layer
- Processing overhead for switching between layers
- Need to wait for one protocol to finish before starting the next
- I/O overhead from multiple writes to memory as buffers are stored between layers
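
The first cost listed above, message growth, can be made concrete with a small sketch. The stack, the header sizes, and the `encapsulate` helper are all hypothetical, chosen only to show how each layer adds bytes and makes another copy of the buffer.

```python
# Toy illustration: the stack and header sizes below are assumed, not measured.
HEADER_BYTES = {"RPC": 16, "TCP": 20, "IP": 20, "ETH": 14}

def encapsulate(payload: bytes, layers: list[str]) -> bytes:
    """Pass the payload down the stack; each layer prepends its own header."""
    message = payload
    for name in layers:
        header = name.encode().ljust(HEADER_BYTES[name], b"\x00")
        message = header + message  # a fresh copy of the buffer at every layer
    return message

payload = b"x" * 64
wire = encapsulate(payload, ["RPC", "TCP", "IP", "ETH"])
overhead = len(wire) - len(payload)
print(f"payload={len(payload)} bytes, on the wire={len(wire)} bytes, overhead={overhead} bytes")
```

With these assumed sizes a 64-byte payload more than doubles on the wire, and the loop performs one buffer copy per layer.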

Solution 2 – Improve Performance
Will discuss several approaches, including pros and cons of each:
- x-Kernel: Puts the entire communication system directly in the kernel, with specific objects and support routines
- Integrated Layer Processing (ILP): Integrates protocol layers to reduce task switching and memory writes
- Protocol Accelerator (PA): Reduces total data to send and shortens the critical path of code between messages

x-Kernel
- Defines a uniform set of abstractions for protocols
- Structures protocols for efficient interaction in the common case
- Supports primitive routines for common protocol tasks

x-Kernel Architecture
- Provides objects for protocols, sessions, and messages
- Creates a kernel for a specific set of protocols (static)
- Instantiates sessions for each protocol as needed (dynamic)
- Messages are active objects that move through protocols/sessions
- Provides specific support routines
[Figure: protocol graph with nodes TCP, UDP, IP, and ETH]

x-Kernel Objects
- Protocols
  - Create sessions
  - Demux messages received
- Sessions
  - Represent connections
  - Created and destroyed when connections are made/terminated
- Messages
  - Contain the data itself
  - Passed from level to level
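
The roles of the three object types can be sketched roughly as follows. This is an illustration only, not the x-Kernel interface: the class and method names (`open`, `close`, `demux`, `deliver`) and the 2-byte connection key are assumptions made for the example.

```python
class Message:
    """Carries the data; headers are pushed going down and popped going up."""
    def __init__(self, data: bytes):
        self.data = data

    def push(self, header: bytes) -> None:
        self.data = header + self.data

    def pop(self, length: int) -> bytes:
        header, self.data = self.data[:length], self.data[length:]
        return header


class Session:
    """Represents one connection of a protocol."""
    def __init__(self, key: bytes):
        self.key = key
        self.delivered = []

    def deliver(self, msg: Message) -> None:
        self.delivered.append(msg.data)


class Protocol:
    """Creates sessions and demultiplexes received messages to them."""
    KEY_LEN = 2  # assumed size of the connection key in the header

    def __init__(self):
        self.sessions = {}

    def open(self, key: bytes) -> Session:
        # Sessions are created dynamically, when a connection is made.
        if key not in self.sessions:
            self.sessions[key] = Session(key)
        return self.sessions[key]

    def close(self, key: bytes) -> None:
        # Sessions are destroyed when the connection is terminated.
        self.sessions.pop(key, None)

    def demux(self, msg: Message) -> Session:
        # Strip this protocol's header and hand the message to its session.
        key = msg.pop(self.KEY_LEN)
        session = self.sessions[key]
        session.deliver(msg)
        return session


proto = Protocol()
session = proto.open(b"\x00\x07")
incoming = Message(b"hello")
incoming.push(b"\x00\x07")          # header as the sender would have added it
assert proto.demux(incoming) is session
```

The protocol object is long-lived while sessions come and go with connections, which mirrors the static/dynamic split described in the slides.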

x-Kernel Primitives
- Buffer managers
  - Allocate, concatenate, split, and truncate
  - Operate in local process heap
- Map managers
  - Add, remove, and map bindings for protocols
- Event managers
  - Provide timers to allow timeouts

x-Kernel Performance
- 2-3x faster than Unix overall
- Unix cost is primarily due to sockets
- Protocol performance is comparable
- Conclusion: architecture is the difference

x-Kernel Conclusions
- Pros:
  - Architecture simplifies the implementation of protocols
  - Uniformity of interface between protocols makes protocol performance predictable and reduces overhead between protocols
  - Possible to write efficient protocols by tuning the underlying architecture
  - Don’t need to know exact protocol stack
- Cons:
  - Requires new compilation of the kernel for each new set of protocols
  - Doesn’t reduce message size (headers) or sequentiality of processes
- Primarily useful as a research tool for protocol implementation, not to improve performance per se.

Integrated Layer Processing (ILP)
- Reduces protocol layers by integrating processing
- Tunes performance to increase caching and avoid memory I/O
- Eliminates redundant copies (similar to U-Net’s shared memory)

ILP Architecture
- Combine protocol-specific manipulations in a single loop where possible
- Process small pieces to make use of processor on-board caching
- Put as much processing as possible in-line (macros) versus function calls

ILP Loop
- Combine marshalling (encoding), encryption, and checksumming
  - Work in memory, reduce copying
- Reduces steps from 5 to 2 (increased processing at step 1)
[Figure: non-ILP send versus ILP send; buffers shown: Application Data, TCP Buffer, Kernel Buffer]
- Non-ILP send: 1. Marshalling (r/w); 2. Encryption (r/w); 3. Copying (r/w); 4. Checksum (r); 5. System copy (r/w)
- ILP send: 1. Marshalling, encryption, and checksumming (r/w); 2. System copy (r/w)
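
The difference between the two send paths can be sketched as three separate passes versus one fused loop. This is a toy illustration under stated assumptions: the XOR "cipher", the additive 16-bit checksum, and the function names are invented for the example, and Python will not show the cache or memory-traffic benefit that motivates ILP. It only shows that the fused loop produces the same bytes and checksum while touching each piece of data once.

```python
import struct

KEY = 0x5A  # toy XOR "cipher", assumed purely for illustration


def send_layered(values):
    """Each manipulation is a separate pass over the whole message."""
    marshalled = b"".join(struct.pack(">I", v) for v in values)  # pass 1: marshal
    encrypted = bytes(b ^ KEY for b in marshalled)               # pass 2: encrypt
    checksum = sum(encrypted) & 0xFFFF                           # pass 3: checksum
    return encrypted, checksum


def send_integrated(values):
    """All three manipulations are applied to each small piece in one loop."""
    out = bytearray()
    checksum = 0
    for v in values:
        word = struct.pack(">I", v)   # marshal one 4-byte piece
        for b in word:
            e = b ^ KEY               # encrypt it while it is still at hand
            checksum += e             # fold it into the running checksum
            out.append(e)
    return bytes(out), checksum & 0xFFFF


values = [1, 2, 3, 0xDEADBEEF]
assert send_layered(values) == send_integrated(values)
```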

ILP Processing (send)
- Divide message into small parts
- Begin marshalling and encryption on part B, then C…
- Process part A once length is known
- Finish protocol-specific processing
- Doesn’t work if A must be processed first (ordering-constrained)
[Figure: message made up of TCP Header, RPC Header, and Data, divided into Part A, Part B, and Part C; labels: length, alignment, marshalling, encryption, checksum]
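
The ordering described above can be sketched as follows, assuming part A is a 2-byte length field and that the per-part functions do not depend on position. The XOR "cipher", the additive checksum, and the names `process` and `send` are hypothetical stand-ins; a cipher that chains each block to the previous one would be ordering-constrained, and this trick would not apply.

```python
import struct

KEY = 0x5A  # toy XOR "cipher", assumed for illustration


def process(part: bytes):
    """Encrypt one part and return it with its partial checksum."""
    encrypted = bytes(b ^ KEY for b in part)
    return encrypted, sum(encrypted)


def send(data_parts):
    """Process parts B, C, ... first; build and process part A last."""
    body, checksum, total_len = [], 0, 0
    for part in data_parts:
        encrypted, partial = process(part)
        body.append(encrypted)
        checksum += partial
        total_len += len(part)
    header = struct.pack(">H", total_len)     # part A needs the final length
    encrypted_a, partial_a = process(header)
    checksum += partial_a                     # addition is order-independent
    return encrypted_a + b"".join(body), checksum & 0xFFFF


wire, checksum = send([b"hello", b"data"])

# Reference: the same message processed strictly front to back.
in_order = bytes(b ^ KEY for b in struct.pack(">H", 9) + b"hellodata")
assert wire == in_order
assert checksum == sum(in_order) & 0xFFFF
```

The result matches front-to-back processing only because both toy functions are order-independent, which is the condition the last bullet on the slide points at.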

ILP Performance
- Processing reduction of 20-25%
- Throughput improvement of 10-15%
- Actually reduces cache usage, although designed to optimize it
- Performance gains can easily be masked by using strong encryption, which drastically increases processing
- Conclusion: performance results were such that use is “debatable in existing communication systems…”

ILP Conclusions
- Pros
  - Decreased memory access up to 30%
  - Slightly improved performance
- Cons
  - Only applicable with non-ordering-constrained functions
  - Requires macros to increase speed, reducing flexibility
  - Protocol stack must be known beforehand

The Protocol Accelerator (PA)
- Reduces header overhead by sending non-changing protocol headers only once
- Further reduces total bytes by packing other header information across protocols
- Reduces layered protocol processing overhead by splitting processing of header and data (canonical processing)

PA Header Reduction
- Four classes of Header Information
  - Connection Identification – doesn’t change during session
  - Protocol-specific Information – depends only on protocol state, not on message
  - Message-specific Information – depends on contents of message but not protocol state
  - Gossip – included because overhead is small, but optional
- Connection Cookies
  - 8-byte field that replaces the Connection Identification information

PA Message Format
- Connection Cookie suffices for Connection ID on 2nd & later messages
- Packing information explained below
- Gossip is optional but useful
[Figure: PA message layout]
- Fields:
  - Connection cookie (62-bit number)
  - Connection Identification (first message)
  - Protocol-specific Information
  - Message-specific Information
  - Gossip (optional)
  - Packing Information (if packed)
  - Application Data
- Flag bits:
  - Connection Id Present bit
  - Byte order bit (big- or little-endian)
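
A 62-bit cookie plus the two flag bits fills exactly 64 bits, which matches the 8-byte cookie field. One way such a preamble could be packed is sketched below. The bit positions, the choice to always write the preamble itself in big-endian order, and the function names are assumptions for illustration; the slides do not specify them.

```python
import struct

CONN_ID_PRESENT = 1 << 63       # assumed position of the Connection Id Present bit
LITTLE_ENDIAN = 1 << 62         # assumed position of the byte order bit
COOKIE_MASK = (1 << 62) - 1     # low 62 bits hold the connection cookie


def pack_preamble(cookie: int, conn_id_present: bool, little_endian: bool) -> bytes:
    """Pack the cookie and both flag bits into one 8-byte field."""
    if not 0 <= cookie <= COOKIE_MASK:
        raise ValueError("cookie must fit in 62 bits")
    word = cookie
    if conn_id_present:
        word |= CONN_ID_PRESENT
    if little_endian:
        word |= LITTLE_ENDIAN
    return struct.pack(">Q", word)


def unpack_preamble(raw: bytes):
    """Return (cookie, conn_id_present, little_endian) from the first 8 bytes."""
    (word,) = struct.unpack(">Q", raw[:8])
    return word & COOKIE_MASK, bool(word & CONN_ID_PRESENT), bool(word & LITTLE_ENDIAN)


first = pack_preamble(0x1234, conn_id_present=True, little_endian=False)
later = pack_preamble(0x1234, conn_id_present=False, little_endian=False)
assert len(first) == len(later) == 8
assert unpack_preamble(first) == (0x1234, True, False)
```

On the first message the flag tells the receiver that the full Connection Identification follows; on later messages the cookie alone identifies the connection.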

PA Processing Reduction
- Canonical Protocol Processing – breaks processing in a protocol layer into 2 parts
  - Pre-processing Phase – build or check message header without changing protocol state
  - Post-processing Phase – update protocol state; attempt to do this after message is sent or delivered
- Pre-processing at every layer done before post-processing at any layer
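
The two-phase split can be sketched with a hypothetical layer that stamps a sequence number on each message. The class `SeqLayer`, the `pre_send`/`post_send` method names, and the text header format are assumptions for the example, not the PA's actual interface.

```python
class SeqLayer:
    """Hypothetical layer that stamps a sequence number on each message."""

    def __init__(self, name: str):
        self.name = name
        self.next_seq = 0  # protocol state

    def pre_send(self, msg: bytes) -> bytes:
        # Pre-processing: build the header from current state, change nothing.
        return f"{self.name}:{self.next_seq}|".encode() + msg

    def post_send(self, msg: bytes) -> None:
        # Post-processing: update state; can run once the message is on its way.
        self.next_seq += 1


def send(layers, payload: bytes, wire: list, trace: list) -> None:
    msg = payload
    for layer in layers:              # pre-processing at every layer first
        msg = layer.pre_send(msg)
        trace.append(("pre", layer.name))
    wire.append(msg)                  # the message leaves here
    trace.append(("sent", None))
    for layer in layers:              # post-processing is off the critical path
        layer.post_send(msg)
        trace.append(("post", layer.name))


layers = [SeqLayer("top"), SeqLayer("bottom")]
wire, trace = [], []
send(layers, b"hi", wire, trace)
send(layers, b"hi", wire, trace)
print(wire)
```

The trace shows every `pre` step ahead of `sent` and every `post` step after it, so the state updates do not delay the message.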

PA Processing Reduction (cont.)
- Header Prediction
  - Use post-processing phase to predict formation of next header
- Packet Filters
  - A pre-pre-processor that checks or ensures header correctness without invoking protocol where possible; invokes protocol if necessary
- Message packing
  - Pack backlogged messages together if application gets ahead – reduces space and processing since checksums etc. calculated only once
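
Message packing can be sketched as follows: backlogged messages are concatenated with length prefixes and covered by a single checksum. The packet layout (a 2-byte count, a 4-byte CRC-32, then length-prefixed messages) and the function names are assumptions for illustration, not the PA's wire format.

```python
import struct
import zlib


def pack(backlog):
    """Concatenate backlogged messages; compute one checksum for all of them."""
    body = b"".join(struct.pack(">H", len(m)) + m for m in backlog)
    return struct.pack(">HI", len(backlog), zlib.crc32(body)) + body


def unpack(packet):
    """Verify the single checksum, then split the body back into messages."""
    count, crc = struct.unpack(">HI", packet[:6])
    body = packet[6:]
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    messages, offset = [], 0
    for _ in range(count):
        (length,) = struct.unpack(">H", body[offset:offset + 2])
        messages.append(body[offset + 2:offset + 2 + length])
        offset += 2 + length
    return messages


backlog = [b"first", b"second", b"third"]
packet = pack(backlog)
assert unpack(packet) == backlog
```

Three messages share one 6-byte packet header and one checksum computation here, instead of paying for each three times.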

PA Processing (send)
- Check backlog; queue and exit if any
- Create packing and predicted header, add to message data
- Run packet filter to create message-specific data (and gossip, if any)
- Push to protocol if necessary
- Push connection cookie onto front of message and send
- Pass to protocol stack for post-processing to update protocol state
[Figure: the PA between Application and Network, with components Packer, Unpacker, PreSend, PreDeliver, and the Protocol Stack]

PA Performance
- Can gain an order of magnitude improvement over pure layered protocols
- Maximal throughput achieved by reducing garbage collection and doing post-processing while messages are “on the wire”
- Conclusion: Useful in improving performance as long as PA is used on both ends of

PA Conclusions
- Pros
  - Eliminates much of the overhead of layered protocols
  - Significant speed improvement
  - Canonical processing applicable in any case
- Cons
  - Can’t communicate with non-PA peer
  - Specific PA needed for set of protocols
  - No fragmentation of messages, so only works on small messages

Summary
- Protocols are layered to improve modularity and reduce complexity
  - This reduces performance
- Improving performance reduces modularity
  - Requires foreknowledge of protocol stack
- Approaches
  - Increase use of kernel (x-Kernel)
  - Integrate processing of all layers together (ILP)
  - Reduce message size and speed critical path (PA)
- All improve performance, but only PA results in significant improvement.