Windows OS Internals - Copyright © 2005 David A. Solomon, Mark E. Russinovich, and Andreas Polze...
-
Upload
christopher-york -
Category
Documents
-
view
237 -
download
3
Transcript of Windows OS Internals - Copyright © 2005 David A. Solomon, Mark E. Russinovich, and Andreas Polze...
Windows OS Internals - Copyright © 2005 David A. Solomon, Mark E. Russinovich, and Andreas PolzeWindows OS Internals - Copyright © 2005 David A. Solomon, Mark E. Russinovich, and Andreas Polze
Unit OS4: Scheduling and DispatchUnit OS4: Scheduling and Dispatch
4.4. Windows Thread Scheduling4.4. Windows Thread Scheduling
2
Copyright NoticeCopyright Notice© 2000-2005 David A. Solomon and Mark Russinovich© 2000-2005 David A. Solomon and Mark Russinovich
These materials are part of the These materials are part of the Windows Operating Windows Operating System Internals Curriculum Development Kit,System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E. developed by David A. Solomon and Mark E. Russinovich with Andreas PolzeRussinovich with Andreas Polze
Microsoft has licensed these materials from David Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic academic organizations solely for use in academic environments (and not for commercial use)environments (and not for commercial use)
3
Roadmap for Section 4.4.Roadmap for Section 4.4.
Windows Scheduling PrinciplesWindows Scheduling Principles
Windows API vs. NT Kernel PrioritiesWindows API vs. NT Kernel Priorities
Scheduling Data StructuresScheduling Data Structures
Scheduling ScenariosScheduling Scenarios
Priority Boosts and Priority AdjustmentsPriority Boosts and Priority Adjustments
4
Scheduling CriteriaScheduling Criteria
CPU utilization – keep the CPU as busy as possibleCPU utilization – keep the CPU as busy as possible
Throughput – # of processes/threads that complete Throughput – # of processes/threads that complete their execution per time unittheir execution per time unit
Turnaround time – amount of time to execute a Turnaround time – amount of time to execute a particular process/threadparticular process/thread
Waiting time – amount of time a process/thread has Waiting time – amount of time a process/thread has been waiting in the ready queuebeen waiting in the ready queue
Response time – amount of time it takes from when a Response time – amount of time it takes from when a request was submitted until the first response is request was submitted until the first response is produced, produced, notnot output (i.e.; the hourglass) output (i.e.; the hourglass)
5
How does the Windows scheduler How does the Windows scheduler relate to the issues discussed:relate to the issues discussed:
Priority-driven, preemptive scheduling systemPriority-driven, preemptive scheduling system
Highest-priority runHighest-priority runnnable thread always runsable thread always runs
Thread runs for time amount of Thread runs for time amount of quantumquantum
No single scheduler – event-based scheduling code No single scheduler – event-based scheduling code spread across the kernelspread across the kernel
Dispatcher routines triggered by the following events:Dispatcher routines triggered by the following events:
Thread becomes ready for executionThread becomes ready for execution
Thread leaves running state (quantum expires, wait state)Thread leaves running state (quantum expires, wait state)
Thread‘s priority changes (system call/NT acThread‘s priority changes (system call/NT acttivity)ivity)
Processor affinity of a running thread changesProcessor affinity of a running thread changes
6
Windows Scheduling PrinciplesWindows Scheduling Principles
32 priority levels32 priority levels
Threads within same priority are scheduled Threads within same priority are scheduled following the Round-Robin policyfollowing the Round-Robin policy
Non-Realtime Priorities are adjusted dynamicallyNon-Realtime Priorities are adjusted dynamically
Priority elevation as response to certain I/O and Priority elevation as response to certain I/O and dispatch eventsdispatch events
Quantum stretching to optimize responsivenessQuantum stretching to optimize responsiveness
Realtime priorities (i.e.; > 15) are assigned Realtime priorities (i.e.; > 15) are assigned statically to threadsstatically to threads
7
Windows Scheduling-related APIs:Windows Scheduling-related APIs:
Get/SetPriorityClassGet/SetPriorityClass
Get/SetThreadPriorityGet/SetThreadPriority
Get/SetProcessAffinityMaskGet/SetProcessAffinityMask
SetThreadAffinityMaskSetThreadAffinityMask
SetThreadIdealProcessorSetThreadIdealProcessor
Suspend/ResumeThreadSuspend/ResumeThread
SchedulingSchedulingMultiple threads may be Multiple threads may be readyready to run to run
““Who gets to use the CPU?”Who gets to use the CPU?”
From Windows API point of view:From Windows API point of view:
Processes are given a priority class upon creationProcesses are given a priority class upon creation
Idle, Normal, High, RealtimeIdle, Normal, High, Realtime
Windows 2000 added “Above normal” and “Below normal”Windows 2000 added “Above normal” and “Below normal”
Threads have a relative priority within the classThreads have a relative priority within the class
Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Idle, Lowest, Below_Normal, Normal, Above_Normal, Highest, and Time_CriticalTime_Critical
From the kernel’s view:From the kernel’s view:
Threads have priorities Threads have priorities 0 through 310 through 31
Threads are scheduled, Threads are scheduled, not processesnot processes
Process priority class is not usedProcess priority class is not usedto make scheduling decisionsto make scheduling decisions
8
Kernel: Thread Priority LevelsKernel: Thread Priority Levels
16 “real-time” levels
15 variable levels
Used by zero page thread
Used by idle thread(s)
31
16
0
i
15
1
9
Windows vs. NT Kernel Priorities Windows vs. NT Kernel Priorities Win32 Process Classes
Realtime HighAboveNormal Normal
BelowNormal Idle
Win32 Time-critical 31 15 15 15 15 15Thread Highest 26 15 12 10 8 6
Priorities Above-normal 25 14 11 9 7 5
Normal 24 13 10 8 6 4
Below-normal 23 12 9 7 5 3
Lowest 22 11 8 6 4 2
Idle 16 1 1 1 1 1
Table shows Table shows basebase priorities (“current” or “dynamic” thread priority may priorities (“current” or “dynamic” thread priority may be higher if base is < 15)be higher if base is < 15)
Many utilities (such as Process Viewer) show the “dynamic priority” of Many utilities (such as Process Viewer) show the “dynamic priority” of threads rather than the base (Performance Monitor can show both) threads rather than the base (Performance Monitor can show both)
Drivers can set to any value with KeSetPriorityThreadDrivers can set to any value with KeSetPriorityThread
10
Special Thread PrioritiesSpecial Thread Priorities
Idle threads -- one per CPUIdle threads -- one per CPU
When no threads want to run, Idle thread “runs”When no threads want to run, Idle thread “runs”
Not a real priority level - appears to have priority zero, but actually runs “below” Not a real priority level - appears to have priority zero, but actually runs “below” priority 0priority 0
Provides CPU idle time accounting (unused clock ticks are charged to the idle Provides CPU idle time accounting (unused clock ticks are charged to the idle thread)thread)
Loop:Loop:
Calls HAL to allow for power managementCalls HAL to allow for power management
Processes DPC listProcesses DPC list
Dispatches to a thread if selectedDispatches to a thread if selected
Server 2003: in certain cases, scans per-CPU ready queues for next threadServer 2003: in certain cases, scans per-CPU ready queues for next thread
Zero page thread -- one per NT systemZero page thread -- one per NT system
Zeroes pages of memory in anticipation of “demand zero” page faultsZeroes pages of memory in anticipation of “demand zero” page faults
Runs at priority zero (lower than any reachable from Windows)Runs at priority zero (lower than any reachable from Windows)
Part of the “System” process (not a complete process)Part of the “System” process (not a complete process)
11
Thread Scheduling Priorities vs. Thread Scheduling Priorities vs. Interrupt Request Levels (IRQLs)Interrupt Request Levels (IRQLs)
Passive_LevelAPC
Dispatch/DPCDevice 1
.
.
.Device n
ClockInterprocessor Interrupt
Power failHigh
Hardware interrupts
IRQLs (x86)
Software interrupts
012
302928
31
Thread priorities
0-31
12
Single Processor Thread SchedulingSingle Processor Thread Scheduling
Priority driven, preemptivePriority driven, preemptive
32 queues (FIFO lists) of “ready” threads32 queues (FIFO lists) of “ready” threads
UP: highest priority thread always runsUP: highest priority thread always runs
MP: MP: OneOne of the highest priority runnable thread will be running of the highest priority runnable thread will be running somewheresomewhere
No attempt to share processor(s) “fairly” among processes, only No attempt to share processor(s) “fairly” among processes, only among threadsamong threads
Time-sliced, round-robin within a priority levelTime-sliced, round-robin within a priority level
Event-driven; no guaranteed execution period before Event-driven; no guaranteed execution period before preemptionpreemption
When a thread becomes Ready, it either runs immediately or is When a thread becomes Ready, it either runs immediately or is inserted at the tail of the Ready queue for its current (dynamic) inserted at the tail of the Ready queue for its current (dynamic) prioritypriority
13
Thread SchedulingThread Scheduling
No central scheduler!No central scheduler!
i.e. there is no always-instantiated routine called “the scheduler”i.e. there is no always-instantiated routine called “the scheduler”
The “code that does scheduling” is not a threadThe “code that does scheduling” is not a thread
Scheduling routines are simply called whenever events occur that Scheduling routines are simply called whenever events occur that change the Ready state of a threadchange the Ready state of a thread
Things that cause scheduling events include:Things that cause scheduling events include:
interval timer interrupts (for quantum end)interval timer interrupts (for quantum end)
interval timer interrupts (for timed wait completion)interval timer interrupts (for timed wait completion)
other hardware interrupts (for I/O wait completion)other hardware interrupts (for I/O wait completion)
one thread changes the state of a waitable object upon which other one thread changes the state of a waitable object upon which other thread(s) are waitingthread(s) are waiting
a thread waits on one or more dispatcher objectsa thread waits on one or more dispatcher objects
a thread priority is changeda thread priority is changed
Based on doubly-linked lists (queues) of Ready threadsBased on doubly-linked lists (queues) of Ready threads
Nothing that takes “order-Nothing that takes “order-nn time” for time” for nn threads threads
14
Scheduling Data StructuresScheduling Data Structures
Process
thread thread
Process
thread thread
Default base prioDefault proc affinityDefault quantum
31
0
Ready summary Idle summary31 (or 63) 0 31 (or 63) 0
Base priorityCurrent priorityProcessor affinityQuantum
Bitmask for non-emptyready queuesBitmask for idle CPUs
15
Scheduling ScenariosScheduling Scenarios
PreemptionPreemption
A thread becomes Ready at a higher priority than the running threadA thread becomes Ready at a higher priority than the running thread
Lower-priority Running thread is preemptedLower-priority Running thread is preempted
Preempted thread goes back to Preempted thread goes back to headhead of its Ready queue of its Ready queue
action:action: pick lowest priority thread to preempt pick lowest priority thread to preempt
Voluntary switchVoluntary switch
Waiting on a dispatcher objectWaiting on a dispatcher object
TerminationTermination
Explicit lowering of priorityExplicit lowering of priority
action:action: scan for next Ready thread (starting at your priority & down) scan for next Ready thread (starting at your priority & down)
Running thread experiences quantum endRunning thread experiences quantum end
Priority is decremented unless already at thread base priorityPriority is decremented unless already at thread base priority
Thread goes to Thread goes to tailtail of ready queue for its new priority of ready queue for its new priority
May continue running if no equal or higher-priority threads are ReadyMay continue running if no equal or higher-priority threads are Ready
action:action: pick next thread at same priority level pick next thread at same priority level
16
Preemption is strictly event-drivenPreemption is strictly event-driven
does not wait for the next clock tickdoes not wait for the next clock tick
no guaranteed execution period before preemptionno guaranteed execution period before preemption
threads in kernel mode may be preempted (unless they raise IRQL to >= 2)threads in kernel mode may be preempted (unless they raise IRQL to >= 2)
A preempted thread goes back to the head of its ready queueA preempted thread goes back to the head of its ready queue
Scheduling ScenariosScheduling Scenarios
Preemption Preemption
181716151413
Running Ready
from Wait state
17
If newly-ready thread is not of higher priority than the running thread…If newly-ready thread is not of higher priority than the running thread…
……it is put at the tail of the ready queue for its current priorityit is put at the tail of the ready queue for its current priority
If priority >=14 quantum is reset (t.b.d.)If priority >=14 quantum is reset (t.b.d.)
If priority <14 and you’re about to be boosted and didn’t already have a If priority <14 and you’re about to be boosted and didn’t already have a boost, quantum is set to process quantum - 1boost, quantum is set to process quantum - 1
Scheduling ScenariosScheduling Scenarios
Ready after Wait ResolutionReady after Wait Resolution
181716151413
Running Ready
from Wait state
18
Scheduling ScenariosScheduling Scenarios
Voluntary SwitchVoluntary SwitchWhen the running thread gives up the CPU…When the running thread gives up the CPU…
……Schedule the thread at the head of the next non-empty “ready” queueSchedule the thread at the head of the next non-empty “ready” queue
to Waiting state
181716151413
Running Ready
19
Scheduling ScenariosScheduling Scenarios
Quantum End (“time-slicing”)Quantum End (“time-slicing”)When the running thread exhausts its CPU quantum, it goes to the end When the running thread exhausts its CPU quantum, it goes to the end of its ready queueof its ready queue
Applies to both real-time and dynamic priority threads, user and kernel Applies to both real-time and dynamic priority threads, user and kernel modemode
Quantums can be disabled for a thread by a kernel functionQuantums can be disabled for a thread by a kernel function
Default quantum on Professional is 2 clock ticks, 12 on ServerDefault quantum on Professional is 2 clock ticks, 12 on Serverstandard clock tick is 10 msec; might be 15 msec on some MP Pentium systemsstandard clock tick is 10 msec; might be 15 msec on some MP Pentium systems
if no other ready threads at that priority, same thread continues running (just if no other ready threads at that priority, same thread continues running (just gets new quantum)gets new quantum)
if running at boosted priority, priority decays by one at quantum end if running at boosted priority, priority decays by one at quantum end (described later) (described later)
Running Ready181716151413
20
Basic Thread Scheduling StatesBasic Thread Scheduling States
Ready (1) Running (2)
Waiting (5)
voluntaryswitch
preemption, quantum end
21
Priority AdjustmentsPriority Adjustments
Dynamic priority adjustments (boost and decay) are applied to threads in Dynamic priority adjustments (boost and decay) are applied to threads in “dynamic” classes“dynamic” classes
Threads with base priorities 1-15 (technically, 1 through 14)Threads with base priorities 1-15 (technically, 1 through 14)
Disable if desired with SetThreadPriorityBoost or SetProcessPriorityBoostDisable if desired with SetThreadPriorityBoost or SetProcessPriorityBoost
Five types:Five types:
I/O completionI/O completion
Wait completion on events or semaphoresWait completion on events or semaphores
When threads in the foreground process complete a waitWhen threads in the foreground process complete a wait
When GUI threads wake up for windows inputWhen GUI threads wake up for windows input
For CPU starvation avoidanceFor CPU starvation avoidance
No automatic adjustments in “real-time” class (16 or above)No automatic adjustments in “real-time” class (16 or above)
““Real time” here really means “system won’t change the relative priorities of Real time” here really means “system won’t change the relative priorities of your real-time threads”your real-time threads”
Hence, scheduling is predictable with respect to other “real-time” threads (but Hence, scheduling is predictable with respect to other “real-time” threads (but not for absolute latency) not for absolute latency)
22
Priority BoostingPriority BoostingTo favor I/O intense threads:To favor I/O intense threads:
After an I/O: specified by device driverAfter an I/O: specified by device driver
IoCompleteRequest( Irp, PriorityBoost ) IoCompleteRequest( Irp, PriorityBoost )
Other cases discussed in the Windows Scheduling Internals SectionOther cases discussed in the Windows Scheduling Internals Section
After a wait on executive event or After a wait on executive event or semaphoresemaphore
After any wait on a dispatcher object by a thread in the foreground processAfter any wait on a dispatcher object by a thread in the foreground process
GUI threads that wake up to process windowing input (e.g. windows GUI threads that wake up to process windowing input (e.g. windows messages) get a boost of 2messages) get a boost of 2
Common boost values (see NTDDK.H)1: disk, CD-ROM, parallel, Video2: serial, network, named pipe, mailslot6: keyboard or mouse8: sound
23
Priority
BasePriority
Run Wait Run
Preempt(beforequantumend)
Run
Priority decayat quantum end
Boostuponwaitcomplete
Round-robin atbase priority
quantum
Time
Thread Priority Boost and DecayThread Priority Boost and DecayBehavior of these boosts:Behavior of these boosts:
Applied to thread’s base priorityApplied to thread’s base priority
will not take you above priority 15will not take you above priority 15
After a boost, you get one quantumAfter a boost, you get one quantum
Then decays 1 level, Then decays 1 level, runs another quantumruns another quantum
24
Further ReadingFurther Reading
Mark E. Russinovich and David A. Solomon, Microsoft Windows Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004. Internals, 4th Edition, Microsoft Press, 2004.
Chapter 6 - Processes, Thread, and JobsChapter 6 - Processes, Thread, and Jobs(from pp. 289)(from pp. 289)
Thread Scheduling (from pp. 325)Thread Scheduling (from pp. 325)
Thread States (from pp. 334)Thread States (from pp. 334)
Scheduling Scenarios (from pp. 345)Scheduling Scenarios (from pp. 345)
25
Source Code ReferencesSource Code References
Windows Research Kernel sourcesWindows Research Kernel sources
\base\ntos\ke\i386, \base\ntos\ke\amd64:\base\ntos\ke\i386, \base\ntos\ke\amd64:
Ctxswap.asm – Context SwapCtxswap.asm – Context Swap
Clockint.asm – Clock Interrupt HandlerClockint.asm – Clock Interrupt Handler
\base\ntos\ke\base\ntos\ke
procobj.c - Process objectprocobj.c - Process object
thredobj.c, thredsup.c – Thread objectthredobj.c, thredsup.c – Thread object
Idsched.c – Idle schedulerIdsched.c – Idle scheduler
Wait.c – quantum management, wait resolutionWait.c – quantum management, wait resolution
Waitsup.c – dispatcher exit (deferred ready queue)Waitsup.c – dispatcher exit (deferred ready queue)
\base\ntos\inc\ke.h – structure/type definitions\base\ntos\inc\ke.h – structure/type definitions