Operating Systems & Network Interfaces
Mohammad Alisafaee
Department of Electrical and Computer Engineering
University of Tehran
8 Jan 2004
Outline
• Network Subsystem
• Problems of Current Network Subsystems
• Principles for the Design of High-Performance Network Subsystem
• Network Interface Design
• High-Bandwidth Network I/O
• Low Latency Network Access
• Predictable Communication Performance
• Conclusion
Network Subsystem
• The network subsystem comprises the network interface and the part of the operating system responsible for communication
• It has three main tasks:
  • Multiplexing the network among the applications running on the node
  • Providing a rich communication service for applications on top of the raw packet-delivery service of the underlying network
  • Providing a standardized abstract API for applications
Challenges
• The key challenge in designing a network subsystem is to achieve application-to-application performance close to the capabilities of the physical network, while maintaining standardized APIs for applications
Grid Applications
• Grid applications have demanding communication requirements, including high bandwidth, real-time delivery, and multiple concurrent data streams with different QoS requirements
• The fundamental services needed to support such applications are:
  • High-bandwidth communication
  • Low-delay communication
  • Predictable communication performance
Problems of Current Network Subsystems
• In the network subsystems of current OSs:
  • High per-byte processing overhead reduces effective application throughput
  • High per-message processing overhead increases application latency
  • Inappropriate accounting and scheduling of operating system resources cause high variance in an application's communication throughput and latency
• The result is a loss of the performance and quality of service provided by the network, caused by inappropriate processing in the end system
Principles for the Design of a High-Performance Network Subsystem
• Coordinated design: Achieving high performance requires a coordinated design of the entire network subsystem
• Early demultiplexing: To appropriately schedule resources in the end system, incoming packets must be demultiplexed at a single point, as close to the point of network attachment as possible
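Early demultiplexing can be pictured as a single classification step that maps each arriving packet to the receive queue of the owning application. The following is an illustrative sketch; the flow table, key fields, and queue indices are invented for the example and are not from any particular OS:

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_FLOWS 16

struct flow_key { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

struct flow_entry {
    struct flow_key key;
    int app_queue;            /* index of the owning application's queue */
    int in_use;
};

static struct flow_entry flow_table[MAX_FLOWS];

static int key_eq(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Classify an incoming packet: return the application queue that owns
 * this flow, or -1 for the default (slow) path. */
static int early_demux(const struct flow_key *pkt)
{
    for (int i = 0; i < MAX_FLOWS; i++)
        if (flow_table[i].in_use && key_eq(&flow_table[i].key, pkt))
            return flow_table[i].app_queue;
    return -1;
}

/* Installed when the application opens a connection. */
static void register_flow(struct flow_key key, int app_queue)
{
    for (int i = 0; i < MAX_FLOWS; i++)
        if (!flow_table[i].in_use) {
            flow_table[i].key = key;
            flow_table[i].app_queue = app_queue;
            flow_table[i].in_use = 1;
            return;
        }
}
```

Running this lookup as early as possible (ideally on the adaptor itself) is what lets all later processing be scheduled and accounted on behalf of the right application.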
Principles for the Design of a High-Performance Network Subsystem (cont.)
• Integrated layer processing (ILP): ILP is a technique that attempts to combine all computations on the network payload data into a single traversal of the data
• Path-oriented structure: A system structure that uses the data path as the central point of optimization is required to achieve high performance
Network Interface Design
• How is data transmitted between the adaptor and host memory?
  – What bus is the adaptor attached to?
    • I/O versus memory bus
  – Where are packets buffered while they await processing?
    • DMA versus PIO
  – How do the adaptor and host signal each other?
    • Interrupt versus polling
I/O vs. Memory Bus
• I/O buses are both open and stable, but memory buses are generally proprietary and change at the vendor's whim
• If adaptors for memory buses could be made practical:
  • The data path between the adaptor and memory would be shortened
  • The adaptor could exploit the higher bandwidth of memory buses
DMA vs. PIO
• DMA supports large transfers, which amortize the cost of acquiring the bus
• PIO loads the incoming data into the cache; if the application reads the data soon after the PIO transfer, it may still be in the cache. PIO, however, requires a substantial amount of buffer space on the adaptor
• Which technique is preferable is highly machine dependent
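The cache effect mentioned above follows directly from what PIO is: the CPU itself moves every word, so the data passes through the CPU and its cache. A minimal sketch, assuming `adaptor_buf` stands in for a memory-mapped adaptor buffer (with DMA, the adaptor would write host memory directly and only notify the CPU on completion):

```c
#include <stdint.h>
#include <stddef.h>

/* Programmed I/O: the CPU performs one load from the adaptor and one
 * store to host memory per word, warming the cache as a side effect. */
static void pio_copy(uint32_t *host, const volatile uint32_t *adaptor_buf,
                     size_t nwords)
{
    for (size_t i = 0; i < nwords; i++)
        host[i] = adaptor_buf[i];
}
```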
Interrupt vs. Polling
• Handling an interrupt is a time-consuming task; e.g., an interrupt takes 75 µs in the Mach OS on a DECstation 5000/200, while the total service time for a received UDP/IP packet is 200 µs
• In some situations, when there is sufficient buffering on the adaptor and the frequency of communication is predictable, polling is appropriate
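A polling receive path can be sketched as a bounded loop that probes a status flag instead of taking a per-packet interrupt. The flag and data word below simulate a device register pair invented for the example; on real hardware they would be memory-mapped device registers read through volatile pointers:

```c
#include <stdint.h>

static volatile int rx_ready;    /* 1 when the adaptor has a packet */
static int rx_data;              /* simulated received payload */

/* Poll the adaptor up to max_spins times; return the packet if one
 * arrives, or -1 when the polling budget is exhausted (at which point a
 * real driver might re-enable interrupts as a fallback). */
static int poll_receive(int max_spins)
{
    for (int spin = 0; spin < max_spins; spin++) {
        if (rx_ready) {          /* cheap check; no interrupt overhead */
            rx_ready = 0;        /* acknowledge */
            return rx_data;
        }
        /* a real driver would pause/cpu_relax() between probes */
    }
    return -1;
}
```

The trade-off is visible in the structure: polling spends CPU cycles probing, but each successful probe avoids the fixed interrupt-handling cost quoted above.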
High-Bandwidth Network I/O
• Achieving low per-byte overhead enables the system to deliver the bandwidth of a high-speed network to the application
• The primary source of per-byte processing cost is data-touching operations such as copying, checksumming, and presentation conversion
• The performance impact of checksumming and presentation conversion can be reduced by using ILP techniques
• Multiple data copies can easily be avoided
Integrated Data-Touching Operations
• Integrated Layer Processing (ILP) is a general technique to reduce the overhead of multiple data-touching operations
• The idea is to merge all data-touching operations performed on a packet and to perform the resulting compound operation in a single traversal of the data
• In practice, implementing ILP is complicated by the need to merge operations from independent protocol layers
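The classic ILP example is fusing the copy with the Internet checksum (RFC 1071 style), so the payload is traversed once instead of twice. A minimal sketch, with the function name chosen for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a packet payload and compute its 16-bit ones'-complement
 * checksum in the same traversal of the data. */
static uint16_t copy_and_checksum(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += (uint16_t)((s[i] << 8) | s[i + 1]);  /* checksum while hot */
        d[i] = s[i];                                /* copy in the same pass */
        d[i + 1] = s[i + 1];
    }
    if (i < len) {                   /* odd trailing byte, zero-padded */
        sum += (uint16_t)(s[i] << 8);
        d[i] = s[i];
    }
    while (sum >> 16)                /* fold the carries */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Each byte is loaded into a register once and both operations consume it there, which is exactly the memory-traffic saving ILP is after.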
Avoiding Data Copying
• Data copying occurs most frequently at the boundaries between layers of the network subsystem
• Copying among different protocols can be avoided because most protocols are implemented inside the OS kernel
• After this improvement, only two data copies remain: the transfer between main memory and the network adaptor, and the transfer between the OS and the application
Avoiding Data Copying (cont.)
• The data transfer between the network adaptor and main memory cannot usually be avoided
• Avoiding data copying between the OS and the application requires solving a complex set of issues and affects the design of the network adaptor, the demultiplexing strategy, the data buffering system in the OS, and the network API
Avoiding Data Copying (cont.)
• Numerous approaches for avoiding data copying between the OS and the application have been proposed:

  Method         | Copy semantics | Safety | Outboard buffering | Early demux
  ---------------|----------------|--------|--------------------|------------
  WITLESS        | Yes            | Yes    | Needed             | Never
  Genie          | Yes            | Yes    | No                 | Always
  Remap          | No             | Yes    | No                 | Always
  Shared Memory  | No             | No     | No                 | Not in CC
  fbufs          | No             | Yes    | No                 | Not in CC
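The common thread in the zero-copy schemes without copy semantics is that a reference to a shared buffer crosses the OS/application boundary instead of the bytes themselves. The following is a very loose sketch in that spirit; it models only reference passing and counting, not the page remapping and protection machinery that systems like fbufs actually rely on:

```c
#include <stdint.h>
#include <stddef.h>

/* An illustrative shared, read-only buffer with a reference count. */
struct fbuf {
    const uint8_t *data;
    size_t len;
    int refcnt;                /* how many holders still reference it */
};

static void fbuf_ref(struct fbuf *b)   { b->refcnt++; }
static void fbuf_unref(struct fbuf *b) { b->refcnt--; }

/* "Hand off" to the application: no memcpy, just a second reference;
 * the application then reads b->data in place. */
static struct fbuf *deliver_to_app(struct fbuf *b)
{
    fbuf_ref(b);
    return b;
}
```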
Low Latency Network Access
• High per-message processing overhead may limit the rate at which messages can be sent or received
• Per-message processing overhead refers to the number of CPU cycles spent per sent or received application message
• Latency is affected by two factors: overhead cycles in the critical path, and scheduling delays
Low Latency Network Access (cont.)
• To minimize scheduling delays, each task involved in the processing of a message must have a sufficiently high priority
• There are two ideas for reducing overhead cycles: optimizing message processing for the common case, and eliminating the OS kernel from the critical processing path of messages
Optimizing for Common Case Processing
• Reducing memory stalls reduces Cycles Per Instruction (CPI)
• Inlining: Some functions are inlined in order to reduce memory stalls
• Outlining: Infrequently executed parts of a function are moved out of the critical path
• Path inlining: The entire latency-sensitive path of execution (except library functions) is inlined to form a single function
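Outlining can be sketched in a few lines: the rarely taken error path is moved into a separate, deliberately non-inlined function so the common case stays compact in the instruction cache. The names, the bounds check, and the GCC/Clang-specific `__builtin_expect` hint and `noinline` attribute are illustrative:

```c
/* Outlined cold path, declared up front. */
static int handle_bad_packet(int len);

/* Hot path: the unlikely branch jumps out of line, so the straight-line
 * common case occupies few cache lines. */
static int process_packet(int len)
{
    if (__builtin_expect(len <= 0 || len > 1500, 0))
        return handle_bad_packet(len);   /* rare */
    return len;                          /* common case */
}

/* Kept out of line on purpose: it must not pollute the hot path. */
__attribute__((noinline))
static int handle_bad_packet(int len)
{
    (void)len;
    return -1;
}
```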
Bypassing the Kernel: Application Device Channels
• Protection boundaries between the OS and the application add latency to I/O operations
• An ADC gives applications direct access to a network device for common I/O operations
• The implementation of ADCs comprises three components:
  • A user-level implementation of the network protocols, device driver, and communications API
  • OS support for the ADC mechanism
  • Network adaptor support for ADC-based networking
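The data path an ADC enables can be sketched as a descriptor ring shared between the application and the adaptor: common-case sends are plain memory writes, with no trap into the kernel (which is involved only in setup, to map the queue). The descriptor layout and queue size below are invented for the example:

```c
#include <stdint.h>

#define QSIZE 8

struct send_desc { uint32_t buf_addr; uint16_t len; };

struct adc_queue {
    struct send_desc ring[QSIZE];
    unsigned head;             /* next slot the application fills */
    unsigned tail;             /* next slot the adaptor consumes */
};

/* User-level send: enqueue a descriptor -- no system call. */
static int adc_send(struct adc_queue *q, uint32_t buf_addr, uint16_t len)
{
    unsigned next = (q->head + 1) % QSIZE;
    if (next == q->tail)
        return -1;             /* ring full: rare case, may need the kernel */
    q->ring[q->head].buf_addr = buf_addr;
    q->ring[q->head].len = len;
    q->head = next;            /* adaptor observes head to pick up work */
    return 0;
}
```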
Application Device Channels (cont.)
Predictable Communication Performance
• The key to predictable communication performance is appropriate scheduling of resources in the end host
• Appropriate scheduling of resources has four requirements:
  • All communication-related processing must be scheduled according to a policy that can maintain the network's QoS guarantees and ensure fairness among applications
  • Communication events must be associated with their application's process through early demultiplexing
  • Communication processing must be scheduled according to the contract between the OS and the application
  • Resources consumed during the processing of communication must be charged against the resource allocation of the application
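The last requirement can be sketched as per-application accounting: once early demultiplexing has identified the owning application, the cycles spent on its packets are charged against that application's allocation rather than to a shared kernel account. The structure and budget model below are illustrative:

```c
/* Per-application resource account, debited by communication processing. */
struct app_account {
    long cycle_budget;         /* cycles the application may still consume */
    long cycles_charged;       /* cycles consumed so far */
};

/* Charge processing cost to the owning application; return 0 if the work
 * may proceed, -1 if its allocation is exhausted (defer or drop). */
static int charge_cycles(struct app_account *a, long cycles)
{
    if (a->cycle_budget < cycles)
        return -1;
    a->cycle_budget -= cycles;
    a->cycles_charged += cycles;
    return 0;
}
```

Charging the right account is what keeps one application's traffic from silently consuming cycles allocated to another, which is the basis of the fairness and QoS guarantees above.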
Conclusion
• Future challenges and research problems in operating systems for grid environments were discussed
• End-to-end performance is critical for grid applications, so many changes to current operating systems are necessary
• Proper accounting and control of resources are important because grid applications execute on shared resources

Thanks & Questions