
Operating Systems & Network Interfaces

Mohammad Alisafaee
Department of Electrical and Computer Engineering
University of Tehran
8 Jan 2004


2

Outline

• Network Subsystem
• Problems of Current Network Subsystems
• Principles for the Design of High-Performance Network Subsystem
• Network Interface Design
• High-Bandwidth Network I/O
• Low Latency Network Access
• Predictable Communication Performance
• Conclusion


3

Network Subsystem

• The network subsystem comprises the network interface and the part of the operating system responsible for communication
• It has three main tasks:
  • Multiplexing the network among the applications running on the node
  • Providing a rich communication service for applications on top of the raw packet-delivery service of the underlying network
  • Providing a standardized abstract API for applications (see the sketch below)
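The most familiar instance of such a standardized API is the BSD socket interface. As a minimal illustration (the address and port below are arbitrary examples), this C program asks the network subsystem for a UDP endpoint and sends one datagram through it:

    /* Minimal UDP send using the BSD socket API, the most common
     * standardized network API exposed by the OS to applications. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* ask the network subsystem for an endpoint */
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port   = htons(9999);              /* example port */
        inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

        const char msg[] = "hello";
        /* The kernel multiplexes this send with traffic from other processes. */
        sendto(fd, msg, sizeof(msg), 0, (struct sockaddr *)&dst, sizeof(dst));
        close(fd);
        return 0;
    }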


4

Challenges

• The key challenge in designing a network subsystem is to achieve application-to-application performance close to the capabilities of the physical network, while maintaining standardized APIs for applications


5

Grid Applications

• Grid applications have demanding communication requirements, including high bandwidth, real-time delivery, and multiple concurrent data streams with different QoS requirements
• The fundamental services needed to support such applications are:
  • high-bandwidth communication
  • low-delay communication
  • predictable communication performance


6

Problems of Current Network Subsystems

• In the network subsystems of current OSs:
  • High per-byte processing overhead reduces effective application throughput
  • High per-message processing overhead increases application latency
  • Inappropriate accounting and scheduling of operating system resources cause high variance in an application's communication throughput and latency
• The result is a loss of the performance and quality of service provided by the network, caused by inappropriate processing in the end system


7

Principles for the Design of High-Performance Network Subsystem

• Coordinated design: Achieving high performance requires a coordinated design of the entire network subsystem
• Early demultiplexing: To schedule end-system resources appropriately, incoming packets must be demultiplexed at a single point, as close to the point of network attachment as possible (see the sketch below)
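To make early demultiplexing concrete, the toy fragment below reads one header field per arriving packet, right where packets enter the host, and steers each packet to the registered application's queue. The header layout, port numbers, and queue structure are invented for illustration and do not correspond to any particular OS.

    /* Toy illustration of early demultiplexing (assumed packet layout):
     * the destination port is read once, at the point of network
     * attachment, and the packet is steered to that application's queue. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_APPS 4

    struct app_queue {
        uint16_t port;        /* flow key this application registered */
        unsigned pkts;        /* packets delivered (also usable for accounting) */
    } queues[MAX_APPS] = { {5000, 0}, {5001, 0}, {5002, 0}, {5003, 0} };

    /* Single classification point: one header lookup decides the owner. */
    struct app_queue *demux(const uint8_t *pkt, size_t len)
    {
        if (len < 4) return NULL;
        uint16_t dport = (uint16_t)((pkt[2] << 8) | pkt[3]);  /* bytes 2-3: dest port */
        for (int i = 0; i < MAX_APPS; i++)
            if (queues[i].port == dport) return &queues[i];
        return NULL;                                          /* unknown flow: drop */
    }

    int main(void)
    {
        uint8_t pkt[] = { 0x13, 0x88, 0x13, 0x89, 'h', 'i' }; /* src 5000, dst 5001 */
        struct app_queue *q = demux(pkt, sizeof(pkt));
        if (q) { q->pkts++; printf("delivered to app on port %u\n", q->port); }
        return 0;
    }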


8

Principles for the Design of High-Performance Network Subsystem (cont.)

• Integrated Layer Processing (ILP): ILP is a technique that attempts to combine all computations on the network payload data into a single traversal of the data
• Path-oriented structure: A system structure that uses the data path as the central point of optimization is required to achieve high performance


9

Network Interface Design

• How is data transferred between the adaptor and host memory?
  – What bus is the adaptor attached to?
    • I/O versus Memory Bus
  – Where are packets buffered while they await processing?
    • DMA versus PIO
  – How do the adaptor and the host signal each other?
    • Interrupt versus Polling


10

I/O vs. Memory Bus

• I/O buses are both open and stable, whereas memory buses are generally proprietary and change at the vendor's whim
• If adaptors for memory buses could be made practical:
  • the data path between the adaptor and memory would be shortened
  • the adaptor could exploit the higher bandwidth of the memory bus


11

DMA vs. PIO

• DMA supports large transfers, which amortize the cost of acquiring the bus
• PIO loads the incoming data into the cache, so if the application reads the data soon after the PIO transfer, it may still be in the cache; PIO, however, requires a substantial amount of buffer space on the adaptor
• Which technique is preferable is highly machine dependent (see the sketch below)
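The sketch below contrasts the two transfer styles against an imaginary adaptor: with PIO the CPU itself moves every word across the bus (and through its cache), while with DMA the host only posts a descriptor and rings a doorbell. The register and descriptor layouts are invented, so this compiles only as the shape of the two approaches, not as a driver for any real device.

    /* Contrast of PIO and DMA against a made-up adaptor interface. */
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical adaptor registers / descriptor; real hardware differs. */
    struct nic_regs { volatile uint32_t data_window; volatile uint32_t doorbell; };
    struct dma_desc { uint64_t host_addr; uint32_t len; uint32_t flags; };

    /* PIO: the CPU moves every word across the bus (and through its cache). */
    void pio_receive(struct nic_regs *nic, uint32_t *dst, size_t words)
    {
        for (size_t i = 0; i < words; i++)
            dst[i] = nic->data_window;      /* one bus transaction per word */
    }

    /* DMA: the host only describes the transfer; the adaptor bursts the data
     * into host memory while the CPU is free to do other work. */
    void dma_receive(struct nic_regs *nic, struct dma_desc *ring, int slot,
                     void *buf, uint32_t len)
    {
        ring[slot].host_addr = (uint64_t)(uintptr_t)buf;
        ring[slot].len       = len;
        ring[slot].flags     = 1;               /* mark descriptor ready (assumed convention) */
        nic->doorbell        = (uint32_t)slot;  /* tell the adaptor to start the transfer */
    }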


12

Interrupt vs. Polling

• Handling an interrupt is a time-consuming task; for example, an interrupt takes 75 µs in the Mach OS on a DECstation 5000/200, while the service time for a received UDP/IP packet is 200 µs
• In some situations, when there is sufficient buffering on the adaptor and the frequency of communication is predictable, polling is appropriate (see the sketch below)
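A minimal polling receive loop is sketched below, with a small in-memory ring standing in for the adaptor's on-board buffer; the names and the budget value are invented. The point is only the trade-off: the CPU spends cycles checking for work instead of paying the per-interrupt cost on every packet.

    /* Polling-based receive, with a tiny in-memory ring standing in for the
     * adaptor's on-board buffer (pure illustration, not a real driver). */
    #include <stdio.h>

    #define RING 8
    static int ring[RING];          /* pretend each int is a buffered packet */
    static int head = 0, tail = 3;  /* three packets already "arrived" */

    static int nic_rx_ready(void) { return head != tail; }
    static int nic_rx_pop(void)   { int p = ring[head]; head = (head + 1) % RING; return p; }

    /* Poll with a budget so the CPU is not monopolized; appropriate when the
     * adaptor has enough buffering and traffic arrival is predictable. */
    static void poll_loop(int budget)
    {
        while (budget-- > 0 && nic_rx_ready())
            printf("processed packet %d without taking an interrupt\n", nic_rx_pop());
    }

    int main(void)
    {
        ring[0] = 1; ring[1] = 2; ring[2] = 3;
        poll_loop(16);
        return 0;
    }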


13

High-Bandwidth Network I/O

• Achieving low per-byte overhead enables the system to deliver the bandwidth of a high-speed network to the application
• The primary source of per-byte processing cost is data-touching operations such as copying, checksumming, and presentation conversion
• The performance impact of checksumming and presentation conversion can be reduced by using ILP techniques
• Multiple data copies can also be avoided, as discussed in the following slides


14

Integrated Data-Touching Operations

• Integrated Layer Processing (ILP) is a general technique for reducing the overhead of multiple data-touching operations
• The idea is to merge all data-touching operations performed on a packet and to perform the resulting compound operation in a single traversal of the data
• In practice, implementing ILP is complicated by the need to merge operations from independent protocols (a minimal sketch follows)
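As a toy instance of the idea, the function below fuses two data-touching operations, the copy of a payload into an application buffer and an Internet-style 16-bit ones'-complement checksum, so that each byte is brought into the CPU only once. A real ILP implementation would merge operations across independent protocol layers; this only shows the single-traversal pattern.

    /* ILP toy example: copy + 16-bit ones'-complement checksum fused into a
     * single traversal of the payload, instead of one pass per operation. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    uint16_t copy_and_checksum(uint8_t *dst, const uint8_t *src, size_t len)
    {
        uint32_t sum = 0;
        size_t i = 0;
        for (; i + 1 < len; i += 2) {               /* one pass touches each byte once */
            dst[i]     = src[i];
            dst[i + 1] = src[i + 1];
            sum += (uint32_t)(src[i] << 8) | src[i + 1];
        }
        if (i < len) {                              /* odd trailing byte */
            dst[i] = src[i];
            sum += (uint32_t)(src[i] << 8);
        }
        while (sum >> 16)                           /* fold carries */
            sum = (sum & 0xFFFF) + (sum >> 16);
        return (uint16_t)~sum;
    }

    int main(void)
    {
        const uint8_t payload[] = "integrated layer processing";
        uint8_t appbuf[sizeof(payload)];
        uint16_t csum = copy_and_checksum(appbuf, payload, sizeof(payload));
        printf("checksum 0x%04x, copy ok: %d\n", csum,
               memcmp(appbuf, payload, sizeof(payload)) == 0);
        return 0;
    }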


15

Avoiding Data Copying

• Data copying occurs most frequently at the boundaries between layers of the network subsystem
• Copying between different protocols can be avoided because most protocols are implemented inside the OS kernel
• After this improvement, only two data transfers remain: between main memory and the network adaptor, and between the OS and the application


16

Avoiding Data Copying (cont.)

• The data transfer between the network adaptor and main memory usually cannot be avoided
• Avoiding the data copy between the OS and the application requires solving a complex set of issues and affects the design of the network adaptor, the demultiplexing strategy, the data-buffering system in the OS, and the network API (see the sketch below)
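The toy model below conveys the flavor of such a redesign: instead of copying a received payload into an application-supplied buffer, the "kernel" lends the application a reference to its own buffer, which the application returns when done. All names and structures are invented; the real approaches listed in the table on the next slide differ in copy semantics, safety, and buffering.

    /* Toy model of buffer passing between "kernel" and "application": the
     * application is lent a reference to the kernel's buffer rather than
     * receiving a copy. No real OS API is implied. */
    #include <stdio.h>
    #include <string.h>

    #define NBUF 4
    #define BUFSZ 256

    static char pool[NBUF][BUFSZ];    /* "kernel" packet buffers */
    static int  in_use[NBUF];

    /* Kernel side: a packet arrives into a pool buffer (e.g., via DMA). */
    static int kernel_receive(const char *payload)
    {
        for (int i = 0; i < NBUF; i++)
            if (!in_use[i]) {
                in_use[i] = 1;
                strncpy(pool[i], payload, BUFSZ - 1);
                return i;              /* hand the application a reference, not a copy */
            }
        return -1;                     /* pool exhausted */
    }

    static void app_release(int idx) { in_use[idx] = 0; }   /* buffer returns to the kernel */

    int main(void)
    {
        int idx = kernel_receive("zero-copy payload");
        if (idx >= 0) {
            printf("application reads in place: %s\n", pool[idx]);  /* no second copy */
            app_release(idx);
        }
        return 0;
    }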


17

Avoiding Data Copying (cont.)

• Numerous approaches for avoiding data copying between the OS and the application have been proposed:

  Method          Copy semantics   Safety   Outboard buffering   Early demux
  WITLESS         Yes              Yes      Needed               Never
  Genie           Yes              Yes      No                   Always
  Remap           No               Yes      No                   Always
  Shared Memory   No               No       No                   Not in CC
  fbufs           No               Yes      No                   Not in CC


18

Low Latency Network Access

• High per-message processing overhead may limit the rate at which messages can be sent or received
• Per-message processing overhead refers to the number of CPU cycles spent per sent or received application message
• Latency is affected by two factors: overhead cycles in the critical path, and scheduling delays


19

Low Latency Network Access (cont.)

• To minimize scheduling delays, each task involved in the processing of a message must have a sufficiently high priority
• There are two approaches to reducing overhead cycles: optimizing message processing for the common case, and eliminating the OS kernel from the critical processing path of messages


20

Optimizing for Common Case Processing

• Reducing memory stalls reduces cycles per instruction (CPI)
• Inlining: some functions are inlined in order to reduce memory stalls
• Outlining: rarely executed parts of a function are moved out of the critical path
• Path inlining: the entire latency-sensitive path of execution (except library functions) is inlined to form a single function (see the sketch below)
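The fragment below illustrates inlining and outlining on a made-up receive check. The always_inline, noinline, and cold attributes and __builtin_expect are GCC/Clang extensions; the header test itself is invented, so this only shows the shape of the optimization, not a real protocol.

    /* Illustration of inlining and outlining on an invented receive fast path. */
    #include <stdint.h>
    #include <stdio.h>

    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    /* Outlining: the rare error handling is kept out of line so it does not
     * pollute the instruction cache along the common path. */
    __attribute__((noinline, cold))
    static int handle_bad_header(uint32_t hdr)
    {
        fprintf(stderr, "malformed header 0x%08x\n", hdr);
        return -1;
    }

    /* Inlining: the small common-case check is expanded into its caller,
     * removing call overhead on the latency-sensitive path. */
    __attribute__((always_inline))
    static inline int header_ok(uint32_t hdr) { return (hdr >> 28) == 0x4; }

    int process_packet(uint32_t hdr)
    {
        if (unlikely(!header_ok(hdr)))
            return handle_bad_header(hdr);   /* cold, outlined path */
        return 0;                            /* hot path stays short and straight-line */
    }

    int main(void)
    {
        printf("%d %d\n", process_packet(0x40000001u), process_packet(0x10000000u));
        return 0;
    }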


21

Bypassing the Kernel: Application Device Channels

• Protection boundaries between the OS and the application add latency to I/O operations
• An ADC gives applications direct access to a network device for common I/O operations (see the sketch below)
• The implementation of ADCs comprises three components:
  • a user-level implementation of the network protocols, the device driver, and the communications API
  • ADC mechanism support in the OS
  • network adaptor support for ADC-based networking
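The toy send path below conveys the flavor of an ADC-style common case: after a one-time setup through the kernel, the application writes a descriptor into a queue it shares with the adaptor and rings a doorbell, with no system call on the data path. The queue layout, doorbell, and names are all invented and do not describe a real ADC interface.

    /* Flavor of a kernel-bypass (ADC-style) send path: after setup, sends are
     * just writes into a queue shared with the adaptor -- no system call.
     * Everything here is an invented stand-in. */
    #include <stdint.h>
    #include <stdio.h>

    #define QLEN 16

    struct send_desc { uint64_t buf_addr; uint32_t len; uint32_t ready; };

    struct adc_channel {             /* mapped into the application at setup time */
        struct send_desc sq[QLEN];   /* send queue shared with the adaptor */
        uint32_t sq_tail;            /* next free slot (application-owned index) */
        volatile uint32_t *doorbell; /* adaptor register telling it to look at sq */
    };

    /* Common-case send: no kernel involvement, only shared-memory writes. */
    static int adc_send(struct adc_channel *ch, const void *buf, uint32_t len)
    {
        struct send_desc *d = &ch->sq[ch->sq_tail % QLEN];
        if (d->ready) return -1;                    /* queue full; caller retries */
        d->buf_addr = (uint64_t)(uintptr_t)buf;
        d->len      = len;
        d->ready    = 1;                            /* publish descriptor */
        *ch->doorbell = ++ch->sq_tail;              /* notify the adaptor */
        return 0;
    }

    int main(void)
    {
        static uint32_t fake_doorbell;              /* stands in for a device register */
        static struct adc_channel ch = { .doorbell = &fake_doorbell };
        const char msg[] = "user-level send";
        printf("send '%s': %d\n", msg, adc_send(&ch, msg, sizeof(msg)));
        return 0;
    }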


22

Application Device Channels (cont.)


23

Predictable Communication Performance

• The key to predictable communication performance is appropriate scheduling of resources in the end host
• Appropriate scheduling of resources imposes four requirements:
  • All communication-related processing must be scheduled according to a policy that can maintain the network's QoS guarantees and ensure fairness among applications
  • Each communication event must be associated with its application's process, using early demultiplexing
  • Communication processing must be scheduled according to the contract between the OS and the application
  • Resources consumed during the processing of communication must be charged against the resource allocation of the application (see the sketch below)
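As a small illustration of the last two requirements, the toy fragment below charges the cycles spent processing a packet against the owning application's allocation; early demultiplexing is what makes the owner known before any protocol work is done. The accounting structure, cycle counter, and numbers are invented.

    /* Toy resource accounting for communication processing: cycles spent on a
     * packet are charged to the owning application, identified up front by
     * early demultiplexing. The structures are illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    struct app_account {
        const char *name;
        uint64_t    cycles_allocated;   /* per-period allocation ("contract") */
        uint64_t    cycles_used;        /* charged communication processing */
    };

    /* Stand-in for a cycle counter (e.g., a hardware timestamp counter). */
    static uint64_t now_cycles(void) { static uint64_t t = 0; return t += 1000; }

    static int process_packet_for(struct app_account *app)
    {
        if (app->cycles_used >= app->cycles_allocated)
            return -1;                              /* over budget: defer or drop */
        uint64_t start = now_cycles();
        /* ... protocol processing on behalf of 'app' would happen here ... */
        app->cycles_used += now_cycles() - start;   /* charge the owner, not "the kernel" */
        return 0;
    }

    int main(void)
    {
        struct app_account app = { "stream-1", 2500, 0 };
        for (int i = 0; i < 4; i++)
            printf("packet %d: %s\n", i,
                   process_packet_for(&app) == 0 ? "processed" : "deferred");
        return 0;
    }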


24

Conclusion

• Future challenges and research problems in operating systems for grid environments were discussed
• End-to-end performance is critical for grid applications, so many changes in current operating systems are necessary
• Proper accounting and control of resources are important because grid applications execute on shared resources


Thanks & Questions