Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity...


Unit 3: Concurrency

Instructor: Hengming Zou, Ph.D.

In Pursuit of Absolute Simplicity (求于至简,归于永恒)

Outline of Content

3.1 Critical Sections, Semaphores, and Monitors

3.2 Windows Trap Dispatching, Interrupts, Synchronization

3.3 Advanced Windows Synchronization

3.4 Windows APIs for Synchronization and IPC

3.1 Critical Sections, Semaphores, and Monitors

The Critical-Section Problem

Software Solutions

Synchronization Hardware

Semaphores

Synchronization in Windows & Linux

The Critical-Section Problem

n threads all compete to use a shared resource

Each thread has a code segment, called its critical section, in which the shared data is accessed

Problem: ensure that
- when one thread is executing in its critical section, no other thread is allowed to execute in its critical section

Solution to Critical-Section Problem

Mutual Exclusion
- Only one thread at a time is allowed into its critical section (CS), among all threads that have a CS for the same resource or shared data
- A thread halted in its non-critical section must not interfere with other threads

Progress
- A thread remains inside its CS for a finite time only
- No assumptions are made concerning the relative speed of the threads

Solution to Critical-Section Problem (contd)

Bounded Waiting
- It must not be possible for a thread requiring access to a critical section to be delayed indefinitely
- When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (the other thread is Tj):

    do {
        enter section
        critical section
        exit section
        remainder section
    } while (1);

Threads may share some common variables to synchronize their actions

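First Attempt: Algorithm 1

Shared variables
- Initialization: int turn = 0;
- turn == i: Ti can enter its critical section

Thread Ti

    do {
        while (turn != i)
            ;                 /* busy-wait until it is Ti's turn */
        critical section
        turn = j;
        remainder section
    } while (1);

Satisfies mutual exclusion but not progress: the threads must strictly alternate, so Ti cannot re-enter until Tj has taken its turn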

Second Attempt: Algorithm 2

Shared variables
- Initialization: int flag[2]; flag[0] = flag[1] = 0;
- flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j] == 1)
            ;                 /* busy-wait while Tj is interested */
        critical section
        flag[i] = 0;
        remainder section
    } while (1);

Satisfies mutual exclusion but not the progress requirement: if both threads set their flags at the same time, each waits for the other forever

Algorithm 3 (Peterson's Algorithm, 1981)

Combines the shared variables of algorithms 1 and 2. Initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        turn = j;
        while (flag[j] == 1 && turn == j)
            ;                 /* busy-wait */
        critical section
        flag[i] = 0;
        remainder section
    } while (1);

Solves the critical-section problem for two threads
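As a hedged illustration, Peterson's algorithm can be sketched in C11. This is a minimal sketch, not the lecture's own code: it assumes `<stdatomic.h>` and POSIX threads are available, and the names `peterson_lock`, `peterson_unlock`, and `run_peterson_demo` are hypothetical. The shared variables are declared atomic because plain `int` variables, as written on the slide, are not reliable on modern compilers and CPUs that reorder memory accesses. The `sched_yield()` in the wait loop is an addition for the user-space demo only.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared variables from the slide, made atomic (sequentially consistent). */
static atomic_int flag[2];
static atomic_int turn;
static long counter;                 /* protected by the Peterson lock */

static void peterson_lock(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);       /* announce interest */
    atomic_store(&turn, j);          /* let the other thread go first */
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();               /* busy-wait; yield is a demo courtesy */
}

static void peterson_unlock(int i) {
    atomic_store(&flag[i], 0);
}

struct worker_arg { int id; long iterations; };

static void *worker(void *p) {
    struct worker_arg *a = p;
    for (long k = 0; k < a->iterations; k++) {
        peterson_lock(a->id);
        counter++;                   /* critical section */
        peterson_unlock(a->id);
    }
    return NULL;
}

/* Hypothetical demo helper: runs T0 and T1, returns the final counter. */
long run_peterson_demo(long iterations) {
    pthread_t t[2];
    struct worker_arg args[2] = { {0, iterations}, {1, iterations} };
    counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &args[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, `run_peterson_demo(n)` returns exactly `2 * n`, because no increment is lost.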

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical-section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd)

Shared variables, initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j]) {
            if (turn == j) {
                flag[i] = 0;          /* back off: Tj is favored */
                while (turn == j)
                    ;
                flag[i] = 1;
            }
        }
        critical section
        turn = j;                     /* favor the other thread next time */
        flag[i] = 0;
        remainder section
    } while (1);

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order
- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n - 1

Shared data

    int choosing[n];
    int number[n];    /* the ticket */

Data structures are initialized to 0
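The two pieces of notation above can be sketched as small C helpers. This is an illustrative sketch only; the function names `ticket_less` and `max_ticket` are hypothetical and do not appear in the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

/* (a, b) < (c, d) in the slide's lexicographic order:
   a and c are ticket numbers, b and d are thread ids. */
bool ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

/* max(a0, ..., an-1): the largest ticket currently held.
   Returns 0 when every entry is 0, i.e. no thread holds a ticket. */
int max_ticket(const int *number, size_t n) {
    int k = 0;
    for (size_t i = 0; i < n; i++)
        if (number[i] > k)
            k = number[i];
    return k;
}
```

Ties on the ticket number are broken by thread id, so `ticket_less(3, 1, 3, 2)` holds while `ticket_less(3, 2, 3, 1)` does not.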

Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1)
                ;             /* wait while Tj is picking a ticket */
            while ((number[j] != 0) && ((number[j], j) < (number[i], i)))
                ;             /* wait for threads with smaller tickets */
        }
        critical section
        number[i] = 0;
        remainder section
    } while (1);

Mutual Exclusion: Hardware Support

Interrupt Disabling
- Concurrent threads cannot overlap on a uniprocessor
- A thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
- Test and Set instruction: read & write a memory location
- Exchange instruction: swap register and memory location

Problems with Machine-Instruction Approach
- Busy waiting
- Starvation is possible
- Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock))
            ;                 /* spin until the lock was free */
        critical section
        lock = false;
        remainder section
    } while (1);
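A hedged C11 sketch of the same idea: `atomic_flag_test_and_set` from `<stdatomic.h>` returns the previous value and sets the flag in one atomic step, which matches the TestAndSet instruction described above. The helper names (`spin_acquire`, `spin_release`, `run_tas_demo`) are hypothetical, POSIX threads are assumed, and the `sched_yield()` call is a concession for a user-space demo rather than part of the technique.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;  /* clear means "lock = false" */
static long shared_total;                        /* protected by tas_lock */

static void spin_acquire(void) {
    /* Loop until the previous value was "clear", i.e. we took the lock. */
    while (atomic_flag_test_and_set(&tas_lock))
        sched_yield();                           /* demo courtesy only */
}

static void spin_release(void) {
    atomic_flag_clear(&tas_lock);                /* lock = false */
}

static void *adder(void *arg) {
    long n = *(long *)arg;
    for (long k = 0; k < n; k++) {
        spin_acquire();
        shared_total++;                          /* critical section */
        spin_release();
    }
    return NULL;
}

/* Hypothetical demo helper: at most 8 threads, returns the final total. */
long run_tas_demo(int nthreads, long iterations) {
    pthread_t t[8];
    if (nthreads > 8)
        nthreads = 8;
    shared_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, adder, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```

Unlike Peterson's algorithm, this works for any number of threads, but it gives no bound on how long a particular thread may wait.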

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1)
            Swap(lock, key);
        critical section
        lock = 0;
        remainder section
    } while (1);

Semaphores

Semaphore S: an integer variable that can only be accessed via two atomic operations

    wait(S):
        while (S <= 0)
            ;       /* busy-wait */
        S--;

    signal(S):
        S++;

Critical Section of n Threads

Shared data

    semaphore mutex;    /* initially mutex = 1 */

Thread Ti

    do {
        wait(mutex);
        critical section
        signal(mutex);
        remainder section
    } while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoids busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations are now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
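The bookkeeping in these two operations can be traced with a single-threaded model. This is only a sketch of the value/queue arithmetic, not a working blocking semaphore: no thread is actually suspended, and the type `model_sem`, its functions, and the fixed `MAX_WAITERS` capacity are all hypothetical.

```c
#include <stdbool.h>

#define MAX_WAITERS 16   /* assumed capacity of the model's queue */

typedef struct {
    int value;                 /* when negative, -value threads are queued */
    int queue[MAX_WAITERS];    /* S.L, modeled as a FIFO ring of thread ids */
    int head, count;
} model_sem;

void model_init(model_sem *s, int initial) {
    s->value = initial;
    s->head = 0;
    s->count = 0;
}

/* wait(S): returns true if the caller may continue,
   false if it would be added to S.L and suspended. */
bool model_wait(model_sem *s, int tid) {
    s->value--;
    if (s->value < 0) {
        s->queue[(s->head + s->count) % MAX_WAITERS] = tid;
        s->count++;
        return false;
    }
    return true;
}

/* signal(S): returns the id of the thread that would be resumed,
   or -1 if no thread was waiting. */
int model_signal(model_sem *s) {
    s->value++;
    if (s->value <= 0) {
        int tid = s->queue[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return tid;
    }
    return -1;
}
```

With an initial value of 1, the first `model_wait` succeeds and later callers are queued; each `model_signal` then releases one queued thread until the value becomes positive again. A FIFO queue is one possible choice for S.L; the definition above does not prescribe an order.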

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
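A hedged sketch of this ordering pattern with POSIX unnamed semaphores (`sem_init`, `sem_wait`, `sem_post`), assuming a platform that supports them, such as Linux. The names `flag_sem`, `thread_i`, `thread_j`, and `run_order_demo` are hypothetical; statements A and B are modeled as appending a letter to a log.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t flag_sem;        /* the slide's "flag", initialized to 0 */
static char order_log[2];
static int order_len;

static void *thread_i(void *unused) {
    (void)unused;
    order_log[order_len++] = 'A';      /* statement A */
    sem_post(&flag_sem);               /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused) {
    (void)unused;
    while (sem_wait(&flag_sem) != 0)   /* wait(flag); retry if interrupted */
        ;
    order_log[order_len++] = 'B';      /* statement B */
    return NULL;
}

/* Returns 1 if A was recorded before B, 0 otherwise. */
int run_order_demo(void) {
    pthread_t ti, tj;
    order_len = 0;
    sem_init(&flag_sem, 0, 0);
    pthread_create(&tj, NULL, thread_j, NULL);  /* Tj is started first on purpose */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return order_len == 2 && order_log[0] == 'A' && order_log[1] == 'B';
}
```

Even though Tj is created first, it blocks in `sem_wait` until Ti has executed A and posted the semaphore.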

Two Types of Semaphores

Counting semaphore
- integer value can range over an unrestricted domain

Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

Deadlock and Starvation (contd)

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution (to the deadlock between T0 and T1)
- all code should acquire/release semaphores in the same order

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7: Process Synchronization
- Chapter 8: Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode (contd)

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap Dispatching

[Figure: trap dispatching. Interrupts go to the interrupt dispatcher, which calls interrupt service routines; system service calls go to the system service dispatcher, which calls system services; hardware and software exceptions go to the exception dispatcher, which calls exception handlers; virtual address exceptions go to the virtual memory manager's pager.]

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous

Interrupt Dispatching

Interrupt dispatch routine (entered on an interrupt from user- or kernel-mode code; runs in kernel mode)
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
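The masking rule can be modeled in a few lines of C. This is a toy model under stated assumptions, not kernel code: `model_cpu` and the four functions are hypothetical, and only the level names `PASSIVE_LEVEL`, `APC_LEVEL`, and `DISPATCH_LEVEL` (values 0, 1, 2) mirror real Windows driver-kit constants.

```c
#include <stdbool.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef struct { int irql; } model_cpu;   /* hypothetical per-processor state */

/* Interrupts at equal or lower IRQL are masked; only higher ones get through. */
bool interrupt_deliverable(const model_cpu *cpu, int source_irql) {
    return source_irql > cpu->irql;
}

/* Servicing an interrupt raises the processor IRQL to the interrupt's IRQL.
   Returns the previous IRQL so it can be restored after the ISR. */
int raise_irql(model_cpu *cpu, int new_irql) {
    int old = cpu->irql;
    cpu->irql = new_irql;
    return old;
}

/* After the ISR runs, the IRQL is lowered to its initial level. */
void lower_irql(model_cpu *cpu, int old_irql) {
    cpu->irql = old_irql;
}

/* No waits or page faults at IRQL >= DISPATCH_LEVEL. */
bool may_wait_or_page_fault(const model_cpu *cpu) {
    return cpu->irql < DISPATCH_LEVEL;
}
```

For example, while a device interrupt at a hypothetical IRQL of 5 is being serviced, another interrupt at level 5 or 3 stays pending, while a level 28 interrupt is still delivered.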

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, highest to lowest]

Hardware interrupts
- 31 High
- 30 Power fail
- 29 Interprocessor Interrupt
- 28 Clock
- Profile & Synch (Srv 2003)
- Device levels, down to Device 1

Deferrable software interrupts
- 2 Dispatch/DPC
- 1 APC

Normal thread execution
- 0 Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of an ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with the same IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and allocate CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL assignments, highest to lowest]

x64
- 15 High/Profile
- 14 Interprocessor Interrupt/Power
- 13 Clock
- 12 Synch (Srv 2003)
- Device n ... Device 1
- 2 Dispatch/DPC
- 1 APC
- 0 Passive/Low

IA64
- High/Profile/Power
- Interprocessor Interrupt
- Clock
- 12 Synch (MP only)
- Device n ... Device 1 (Device 1 at 4)
- 3 Correctable Machine Check
- 2 Dispatch/DPC & Synch (UP only)
- 1 APC
- 0 Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device. This can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
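The two deterministic mapping rules can be written as worked arithmetic. This is a sketch under one explicit assumption: the operator in the x64/IA64 rule is taken to be integer division (vector number divided by 16). The function names are hypothetical.

```c
/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int x86_up_irql_from_irq(int irq) {
    return 27 - irq;
}

/* x64 and IA64 rule, assuming integer division:
   IRQL = IDT vector number / 16. */
int x64_irql_from_vector(int vector) {
    return vector / 16;
}
```

Under these rules, IRQ 0 on an x86 uniprocessor maps to IRQL 27 and IRQ 8 maps to IRQL 19; on x64, IDT vector 0xD1 (209) maps to IRQL 13, since 209 / 16 = 13 in integer arithmetic.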

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts; diagram not reproduced in the transcript]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- A spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B both contend for a spinlock that protects the DPC queue (the critical section). Each processor runs:]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the 1st processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
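The queueing idea can be illustrated with an MCS-style lock, a well-known published design in which each waiter spins on a flag in its own queue node. This is a user-space sketch in C11 under stated assumptions, not the Windows kernel's implementation: threads stand in for processors, all names (`qnode`, `queued_lock`, `queued_acquire`, `queued_release`, `run_queued_lock_demo`) are hypothetical, and the `sched_yield()` calls are a concession for running as ordinary threads.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting processor (here: per thread). */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             must_wait;   /* the processor-local flag */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;               /* last waiter, or NULL if free */
} queued_lock;

void queued_acquire(queued_lock *lk, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&lk->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);              /* link behind predecessor */
        while (atomic_load(&me->must_wait))         /* spin on own flag only */
            sched_yield();                          /* demo courtesy */
    }
}

void queued_release(queued_lock *lk, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&lk->tail, &expected, NULL))
            return;                                 /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                          /* successor is mid-enqueue */
    }
    atomic_store(&succ->must_wait, false);          /* signal exactly one waiter */
}

static queued_lock demo_lock;
static long demo_total;                             /* protected by demo_lock */

static void *demo_worker(void *arg) {
    long n = *(long *)arg;
    qnode me;
    for (long k = 0; k < n; k++) {
        queued_acquire(&demo_lock, &me);
        demo_total++;
        queued_release(&demo_lock, &me);
    }
    return NULL;
}

/* Hypothetical demo helper: at most 8 threads, returns the final total. */
long run_queued_lock_demo(int nthreads, long iterations) {
    pthread_t t[8];
    if (nthreads > 8)
        nthreads = 8;
    atomic_store(&demo_lock.tail, NULL);
    demo_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, demo_worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_total;
}
```

Waiters are served in the order in which they exchanged themselves onto the tail, which corresponds to the pre-determined wait order, and a release writes to exactly one waiter's flag.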

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length of time they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signalled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on dispatcher objects, outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. A thread object is created and initialized; a thread enters Waiting when it waits on an object handle; when the object is set to the signaled state, the wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching (contd)

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- Nonsignaled to signaled: owning thread releases mutex
- Signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- Nonsignaled to signaled: owning thread or other thread releases mutex
- Signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- Nonsignaled to signaled: one thread releases the semaphore, freeing a resource
- Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd)

Event
- Nonsignaled to signaled: a thread sets the event
- Signaled to nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- Nonsignaled to signaled: dedicated thread sets one event in the event pair
- Signaled to nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- Nonsignaled to signaled: timer expires
- Signaled to nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- Nonsignaled to signaled: thread terminates
- Signaled to nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3: System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linked to a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low; the DPC queue; and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserApc
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, labeled K and U]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
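The four accounting rules can be written as a single decision function. This is an illustrative model only; the enum `charge_target`, the function `clock_tick_charge`, and its parameters are hypothetical names, and the split between user and kernel time below IRQL 2 is assumed to follow the mode the thread was running in when the clock fired.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, given the processor state
   at the moment the clock interrupt arrived. */
charge_target clock_tick_charge(int irql, bool processing_dpc, bool was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: ordinary thread execution */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For instance, a tick that arrives at IRQL 2 while a DPC routine is running is charged to DPC time, not to the thread that happened to be interrupted.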

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus, threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signalled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: a header with Size, Type, State, and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their wait block lists (WaitBlockList); each wait block holds a list entry, Object and Thread pointers, Key, Type, and Next link; dispatcher objects (Size, Type, State, Wait list head, object-type-specific data) link the wait blocks of the threads waiting on them]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        __try {
            EnterCriticalSection( &crit );
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the guarded code raises */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions: Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
- dwTimeout == 0: no wait, check whether object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison: POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
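A short sketch of the blocking and polling styles with the pthread calls listed above. The helper names `add_blocking`, `add_if_free`, and `read_counter` are hypothetical; the mutex uses the default attributes via `PTHREAD_MUTEX_INITIALIZER`.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter;

/* Blocking style: waits as long as needed, comparable to
   WaitForSingleObject(hMutex, INFINITE) on a Windows mutex. */
void add_blocking(int v) {
    pthread_mutex_lock(&counter_mutex);
    counter += v;
    pthread_mutex_unlock(&counter_mutex);
}

/* Polling style: returns 1 if the update was applied, 0 if the mutex
   was busy (EBUSY) or the attempt failed for another reason. */
int add_if_free(int v) {
    int rc = pthread_mutex_trylock(&counter_mutex);
    if (rc != 0)
        return 0;                /* typically rc == EBUSY */
    counter += v;
    pthread_mutex_unlock(&counter_mutex);
    return 1;
}

int read_counter(void) {
    pthread_mutex_lock(&counter_mutex);
    int v = counter;
    pthread_mutex_unlock(&counter_mutex);
    return v;
}
```

A caller that gets 0 back from `add_if_free` can do other work and retry later, which is the polling pattern that a zero timeout gives on Windows.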

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
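Although there is no exact equivalent, a manual-reset event can be approximated with a mutex, a condition variable, and a boolean flag. The following is only a sketch under that assumption; the type manual_event and the event_* functions are hypothetical names, not a POSIX or Windows API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a flag guarded by a mutex, plus a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Comparable to SetEvent(): the event stays signaled and all waiters wake. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Comparable to ResetEvent(). */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until signaled; the loop guards against spurious wakeups. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event start_event;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;
static int released;

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts n waiters (n <= 8), sets the event once, returns how many ran. */
int run_event_demo(int n)
{
    pthread_t t[8];
    assert(n >= 0 && n <= 8);
    released = 0;
    event_init(&start_event, false);
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    event_set(&start_event);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    pthread_cond_destroy(&start_event.cond);
    pthread_mutex_destroy(&start_event.lock);
    return released;
}
```

A single event_set() releases every waiter, including threads that call event_wait() later, which is the manual-reset behavior; an auto-reset variant would clear the flag inside event_wait() and use pthread_cond_signal().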

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
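For comparison, the POSIX pipe() call creates the same kind of one-way byte channel. The sketch below assumes a POSIX system and a message short enough to fit in the pipe buffer; pipe_round_trip and pipe_echo_matches are illustrative names only.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back into out (capacity cap).
   Returns the number of bytes read, or -1 on error. Assumes msg is short
   enough that write() does not block on a full pipe. */
long pipe_round_trip(const char *msg, char *out, size_t cap)
{
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    assert(cap > 0);
    if (pipe(fds) != 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                   /* reader sees end-of-file after the data */
    ssize_t got = -1;
    if (written == (ssize_t)len)
        got = read(fds[0], out, cap - 1);
    close(fds[0]);
    if (got < 0)
        return -1;
    out[got] = '\0';
    return (long)got;
}

/* Returns 1 if the text read back equals the text written. */
int pipe_echo_matches(const char *msg)
{
    char buf[64];
    long n = pipe_round_trip(msg, buf, sizeof buf);
    return n == (long)strlen(msg) && strcmp(buf, msg) == 0;
}
```

Data flows only from the write end to the read end; two-way traffic between processes needs a second pipe.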

9494

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }
    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

9595

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);
    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() call creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

9999

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile(), ReadFile() sequence in one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle() in one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }
    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);
    CloseHandle (hNamedPipe);
    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);
    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);
        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation. */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm.)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 ... App client n):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile(hMS, &ServStat);
    /* connect to server */

Mailslot client (App Server); message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile(hMS, &StatInfo, ...
    }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0 - immediate return
- MAILSLOT_WAIT_FOREVER - infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


22

Outline of Content

3.1 Critical Sections, Semaphores, and Monitors

3.2 Windows Trap Dispatching, Interrupts, Synchronization

3.3 Advanced Windows Synchronization

3.4 Windows APIs for Synchronization and IPC

33

3.1 Critical Sections, Semaphores, and Monitors

- The Critical-Section Problem
- Software Solutions
- Synchronization Hardware
- Semaphores
- Synchronization in Windows & Linux

44

The Critical-Section Problem

n threads all competing to use a shared resource

Each thread has a code segment, called critical section, in which the shared data is accessed

Problem: ensure that
- when one thread is executing in its critical section, no other thread is allowed to execute in its critical section

55

Solution to Critical-Section Problem

Mutual Exclusion
- Only one thread at a time is allowed into its CS, among all threads that have a CS for the same resource or shared data
- A thread halted in its non-critical section must not interfere with other threads

Progress
- A thread remains inside its CS for a finite time only
- No assumptions concerning relative speed of the threads

66

Solution to Critical-Section Problem

Bounded Waiting
- It must not be possible for a thread requiring access to a critical section to be delayed indefinitely
- When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

77

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj):

    do {
        enter section
            critical section
        exit section
            remainder section
    } while (1);

Threads may share some common variables to synchronize their actions

Initial Attempts to Solve Problem

88

First Attempt Algorithm 1

Shared variables
- Initialization: int turn = 0;
- turn == i: Ti can enter its critical section

Thread Ti:

    do {
        while (turn != i) ;
            critical section
        turn = j;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not progress

99

Second Attempt Algorithm 2

Shared variables
- Initialization: int flag[2]; flag[0] = flag[1] = 0;
- flag[i] == 1: Ti wants to enter its critical section

Thread Ti:

    do {
        flag[i] = 1;
        while (flag[j] == 1) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not the progress requirement (both threads may set their flags and then wait for each other forever)

1010

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti:

    do {
        flag[i] = 1;
        turn = j;
        while ((flag[j] == 1) && turn == j) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Solves the critical-section problem for two threads
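On modern hardware and compilers, plain int variables are not enough for Peterson's algorithm because loads and stores may be reordered. The following sketch assumes C11 atomics (sequentially consistent by default) and pthreads; peterson_lock, peterson_unlock and run_peterson_demo are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ROUNDS 20000

static atomic_int flag[2];     /* flag[i] == 1: Ti wants to enter */
static atomic_int turn;
static long shared_counter;    /* protected by the Peterson lock */

static void peterson_lock(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();         /* busy wait, but let the other thread run */
}

static void peterson_unlock(int i)
{
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg)
{
    int i = *(int *)arg;
    for (int r = 0; r < ROUNDS; r++) {
        peterson_lock(i);
        shared_counter++;      /* critical section */
        peterson_unlock(i);
    }
    return NULL;
}

/* Runs T0 and T1 and returns the final counter value. */
long run_peterson_demo(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    shared_counter = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, run_peterson_demo() returns 2 * ROUNDS; the sched_yield() call is only there to keep the busy wait cheap on machines with few cores.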

1111

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

1212

Dekker's Algorithm (contd)

Shared variables - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti:

    do {
        flag[i] = 1;
        while (flag[j]) {
            if (turn == j) {
                flag[i] = 0;
                while (turn == j) ;
                flag[i] = 1;
            }
        }
            critical section
        turn = j;
        flag[i] = 0;
            remainder section
    } while (1);

1313

Bakery Algorithm (Lamport 1979)

A solution to the critical section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order
- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

1414

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n - 1

Shared data:

    int choosing[n];
    int number[n];    /* the ticket */

Data structures are initialized to 0

1515

Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1) ;
            while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
        }
            critical section
        number[i] = 0;
            remainder section
    } while (1);
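The bakery pseudocode can be turned into runnable C by spelling out the max() and the tuple comparison. This is a sketch assuming C11 atomics and pthreads, with a fixed thread count N chosen for illustration; bakery_lock, bakery_unlock and run_bakery_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define N      4
#define ROUNDS 2000

static atomic_int choosing[N];
static atomic_int number[N];    /* the ticket; 0 means "not interested" */
static long shared_counter;

static void bakery_lock(int i)
{
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int k = 0; k < N; k++) {
        int v = atomic_load(&number[k]);
        if (v > max)
            max = v;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        while (atomic_load(&choosing[j]) == 1)
            sched_yield();
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            /* keep waiting only while (number[j], j) < (number[i], i) */
            if (nj == 0 || nj > ni || (nj == ni && j >= i))
                break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *worker(void *arg)
{
    int i = *(int *)arg;
    for (int r = 0; r < ROUNDS; r++) {
        bakery_lock(i);
        shared_counter++;       /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

/* Runs N threads and returns the final counter value. */
long run_bakery_demo(void)
{
    pthread_t t[N];
    int ids[N];
    shared_counter = 0;
    for (int i = 0; i < N; i++) {
        atomic_store(&choosing[i], 0);
        atomic_store(&number[i], 0);
    }
    for (int i = 0; i < N; i++) {
        ids[i] = i;
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Ticket numbers grow without bound while the lock stays contended, so a long-running version would need a wider integer type or a reset strategy.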

1616

Mutual Exclusion - Hardware Support

Interrupt Disabling
- Concurrent threads cannot overlap on a uniprocessor
- Thread will run until performing a system call or interrupt happens

Special Atomic Machine Instructions
- Test and Set Instruction - read & write a memory location
- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach
- Busy waiting
- Starvation is possible
- Deadlock is possible

1717

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

1818

Shared data: boolean lock = false;

Thread Ti:

    do {
        while (TestAndSet(lock)) ;
            critical section
        lock = false;
            remainder section
    } while (1);

Mutual Exclusion with Test-and-Set
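C11 exposes a test-and-set primitive directly as atomic_flag. The sketch below assumes C11 atomics and pthreads; spin_lock, spin_unlock and run_tas_demo are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define THREADS 4
#define ROUNDS  5000

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;   /* clear == unlocked */
static long shared_counter;

static void spin_lock(void)
{
    /* atomic_flag_test_and_set sets the flag and returns its previous
       value, which is what TestAndSet(lock) does on the slide. */
    while (atomic_flag_test_and_set(&lock_flag))
        sched_yield();
}

static void spin_unlock(void)
{
    atomic_flag_clear(&lock_flag);                 /* lock = false */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        spin_lock();
        shared_counter++;                          /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
long run_tas_demo(void)
{
    pthread_t t[THREADS];
    shared_counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

This lock gives mutual exclusion but no fairness: nothing prevents one thread from winning the flag repeatedly, which is the starvation problem listed for the machine-instruction approach.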

1919

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti:

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
            critical section
        lock = 0;
            remainder section
    } while (1);

2121

Semaphores

Semaphore S - integer variable

Can only be accessed via two atomic operations:

    wait (S):
        while (S <= 0) ;
        S--;

    signal (S):
        S++;

2222

Critical Section of n Threads

Shared data:

    semaphore mutex;    /* initially mutex = 1 */

Thread Ti:

    do {
        wait(mutex);
            critical section
        signal(mutex);
            remainder section
    } while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record:

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as:

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
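A blocking semaphore along these lines can be sketched with a pthreads mutex and condition variable, where the condition variable plays the role of the waiting list S.L. This is an assumption-laden variant rather than the slide's exact scheme: the value never goes negative here, and csem, csem_wait and csem_signal are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

#define THREADS 4
#define ROUNDS  5000

/* Hypothetical type: counter plus the means to block and wake threads. */
typedef struct {
    int             value;
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
} csem;

void csem_init(csem *s, int initial)
{
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

/* wait(S): sleep while no unit is available, then take one. */
void csem_wait(csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value <= 0)
        pthread_cond_wait(&s->nonzero, &s->lock);   /* plays suspend() */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* signal(S): return one unit and wake one waiter, if any. */
void csem_signal(csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);               /* plays resume(T) */
    pthread_mutex_unlock(&s->lock);
}

static csem mutex_sem;
static long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        csem_wait(&mutex_sem);
        shared_counter++;                           /* critical section */
        csem_signal(&mutex_sem);
    }
    return NULL;
}

/* Uses the semaphore (initial value 1) as a mutex for THREADS workers. */
long run_csem_demo(void)
{
    pthread_t t[THREADS];
    shared_counter = 0;
    csem_init(&mutex_sem, 1);
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Waiting threads sleep inside pthread_cond_wait() instead of spinning, which is the point of the suspend/resume design.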

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
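The A-before-B pattern maps directly onto POSIX unnamed semaphores. The sketch below assumes a platform where sem_init() is supported (for example Linux); the step counter and the function names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag_sem;           /* the slide's "flag", initialized to 0 */
static int step;                 /* advanced by A, then by B */
static int a_ran_at, b_ran_at;

static void *thread_i(void *arg) /* Ti: A; signal(flag) */
{
    (void)arg;
    a_ran_at = ++step;           /* statement A */
    sem_post(&flag_sem);
    return NULL;
}

static void *thread_j(void *arg) /* Tj: wait(flag); B */
{
    (void)arg;
    sem_wait(&flag_sem);
    b_ran_at = ++step;           /* statement B */
    return NULL;
}

/* Returns 1 if A ran before B. Tj is started first on purpose. */
int run_order_demo(void)
{
    pthread_t ti, tj;
    step = a_ran_at = b_ran_at = 0;
    int rc = sem_init(&flag_sem, 0, 0);
    assert(rc == 0);
    (void)rc;
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return a_ran_at == 1 && b_ran_at == 2;
}
```

Even though Tj is created first, it blocks in sem_wait() until Ti has finished A and posted the semaphore.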

2626

Two Types of Semaphores

Counting semaphore
- Integer value can range over an unrestricted domain

Binary semaphore
- Integer value can range only between 0 and 1
- Can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock
- Two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S);        wait(Q);
    wait(Q);        wait(S);
    signal(S);      signal(Q);
    signal(Q);      signal(S);

2828

Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- All code should acquire/release semaphores in the same order
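One way to enforce a single acquisition order is to sort the locks by address before taking them. The sketch below uses pthread mutexes in place of semaphores and assumes both threads go through the same helper; lock_pair, unlock_pair and run_ordering_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 5000

/* Locks two distinct mutexes in a fixed global order (by address),
   whatever order the caller names them in. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long updates;             /* protected by holding both S and Q */

static void *t0(void *arg)       /* names S first, then Q */
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        lock_pair(&S, &Q);
        updates++;
        unlock_pair(&S, &Q);
    }
    return NULL;
}

static void *t1(void *arg)       /* names Q first, then S */
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        lock_pair(&Q, &S);
        updates++;
        unlock_pair(&Q, &S);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of updates. */
long run_ordering_demo(void)
{
    pthread_t a, b;
    updates = 0;
    pthread_create(&a, NULL, t0, NULL);
    pthread_create(&b, NULL, t1, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return updates;
}
```

With direct lock calls in the order each thread names them, T0 and T1 could each hold one lock and wait forever for the other; with the helper, both always take the lower-addressed lock first.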

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

- Trap and Interrupt dispatching
- IRQL levels & Interrupt Precedence
- Spinlocks and Kernel Synchronization
- Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

3737

Trap dispatching

[Figure: each kind of trap is routed to its handler]
- Interrupt -> interrupt dispatcher -> interrupt service routines
- System service call -> system service dispatcher -> system services
- HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
- Virtual address exceptions -> virtual memory manager's pager

Trap: processor's mechanism to capture executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; the kernel-mode interrupt dispatch routine calls the interrupt service routine]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

3939

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

4040

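    IRQL    Name                            Class
    31      High                            Hardware interrupts
    30      Power fail                      Hardware interrupts
    29      Interprocessor Interrupt        Hardware interrupts
    28      Clock                           Hardware interrupts
    27      Profile & Synch (Srv 2003)      Hardware interrupts
    3-26    Device 1 ... Device n           Hardware interrupts
    2       Dispatch/DPC                    Deferrable software interrupts
    1       APC                             Deferrable software interrupts
    0       Passive/Low                     Normal thread execution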

Interrupt Precedence via IRQLs (x86)

4141

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

4343

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all, normal thread execution

4545

IRQLs on 64-bit Systems

    IRQL    x64                                 IA64
    15      High/Profile                        High/Profile/Power
    14      Interprocessor Interrupt/Power      Interprocessor Interrupt
    13      Clock                               Clock
    12      Synch (Srv 2003)                    Synch (MP only)
    ...     Device n                            Device n
    4       ...                                 Device 1
    3       Device 1                            Correctable Machine Check
    2       Dispatch/DPC                        Dispatch/DPC & Synch (UP only)
    1       APC                                 APC
    0       Passive/Low                         Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
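The two arithmetic rules and the masking rule can be written down as small helper functions. This is only a toy model for working through the numbers; the function names and the assumed IRQ range 0..15 are not part of any Windows API.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);   /* assumption: IRQ lines 0..15 */
    return 27 - irq;
}

/* x64 / IA64 rule from the slide: IRQL = IDT vector number / 16. */
int irql_from_vector_x64(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}

/* An interrupt is serviced only if its IRQL is strictly above the
   processor's current IRQL; equal and lower levels stay masked. */
bool interrupt_is_deliverable(int current_irql, int interrupt_irql)
{
    return interrupt_irql > current_irql;
}
```

For example, under the x86 UP rule IRQ 1 maps to IRQL 26, and a processor already running at IRQL 26 would not take a second interrupt at that level until it lowers its IRQL.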

4747

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free, or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

[Figure: Processor A and Processor B both run the code below; the DPC queue is the shared data, guarded by one spinlock]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                           /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first processor in the queue is modified
- Exactly one processor is being signaled
- Pre-determined wait order
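The idea of waiting on a local flag is the basis of the well-known MCS queue lock. The following user-mode sketch illustrates that technique with C11 atomics and pthreads; it is not the Windows kernel implementation, and mcs_lock, mcs_node and the function names are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define THREADS 4
#define ROUNDS  2000

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool                locked;   /* the waiter's own flag */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;            /* last waiter, or NULL if free */
} mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me)
{
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->locked, true);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL)
        return;                                       /* lock was free */
    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked))                  /* spin on own flag */
        sched_yield();
}

static void mcs_release(mcs_lock *l, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected,
                                           (mcs_node *)NULL))
            return;                                   /* nobody waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                            /* successor mid-enqueue */
    }
    atomic_store(&succ->locked, false);               /* wake exactly one */
}

static mcs_lock qlock;           /* zero-initialized: tail == NULL */
static long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    mcs_node me;                 /* one queue node per thread */
    for (int r = 0; r < ROUNDS; r++) {
        mcs_acquire(&qlock, &me);
        shared_counter++;        /* critical section */
        mcs_release(&qlock, &me);
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
long run_mcs_demo(void)
{
    pthread_t t[THREADS];
    shared_counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Each waiter spins only on its own node, and release hands the lock to the next node in line, which gives the pre-determined (FIFO) wait order mentioned above.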

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when no contention. Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signalled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

6161

What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

- Deferred and Asynchronous Procedure Calls
- IRQLs and CPU Time Accounting
- Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
- See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head followed by a chain of DPC objects]

6666

Delivering a DPC

1. Timer expires; the kernel queues a DPC that will release all waiting threads, and requests a software interrupt
2. The DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. The dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low, next to the DPC queue]

6767

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend or terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
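The four accounting rules can be read as a small decision function. The sketch below is only an illustration of those rules, not kernel code: the enum, the function name `charge_tick`, and its parameters are hypothetical, and the 0..31 range assumes the x86 IRQL numbering used in this unit.

```c
#include <assert.h>
#include <stdbool.h>

/* Buckets a clock tick can be charged to (illustrative names, not kernel identifiers). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged.
   irql:      IRQL the processor was at when the clock interrupt arrived
   in_dpc:    true if a DPC routine was being processed
   user_mode: true if the interrupted thread was running user-mode code */
charge_target charge_tick(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0 && irql <= 31);   /* assumed x86 IRQL range */
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: normal thread execution, split by processor mode */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, `charge_tick(2, true, false)` yields `CHARGE_DPC`, while the same tick with `in_dpc` false is charged to the thread's kernel time.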

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (they are not really processes)
- Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked to wait blocks (List entry, Object, Thread, Key, Type, Next link); each wait block is also queued on the wait list head of a dispatcher object (Size, Type, State, Wait list head, object-type-specific data)]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even if an exception is raised */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, only check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  When waiting for the first signaled object, the function returns its index (as WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else /* mutex was abandoned */
            break; /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
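A minimal sketch of the POSIX side of this comparison, assuming a Linux-like system with pthreads. The names `worker`, `run_mutex_demo`, and `try_once`, and the thread and iteration counts, are made up for illustration; the comments point at the Windows call each pthread call roughly corresponds to.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define ITERATIONS  10000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;   /* shared by all threads, protected by counter_lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks, like WaitForSingleObject(hMutex, INFINITE) */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* like ReleaseMutex(hMutex) */
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter value. */
long run_mutex_demo(void)
{
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}

/* Returns 1 if the lock was free and could be taken without blocking, 0 if it was busy. */
int try_once(void)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;                             /* like a zero-timeout wait that times out */
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

Because every increment happens under the lock, `run_mutex_demo()` returns `NUM_THREADS * ITERATIONS`.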

8888

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
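Since pthreads has no direct counterpart to a manual-reset event, one common workaround is to pair a condition variable with a mutex and a boolean flag. The sketch below shows that idea only; the type `manual_event` and the `event_*` functions are hypothetical names, not part of any POSIX or Windows API, and error checking is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a manual-reset event emulated with a flag, a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(manual_event *e)      /* roughly SetEvent(): stays signaled until reset */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);  /* wake all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)    /* roughly ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e)     /* roughly WaitForSingleObject(hEvent, INFINITE) */
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event start_gate;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int released;                   /* number of threads that got past the gate */

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_gate);
    pthread_mutex_lock(&count_lock);
    released++;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Starts three waiters, sets the event once, and returns how many were released. */
int run_event_demo(void)
{
    pthread_t t[3];
    released = 0;
    event_init(&start_gate, false);
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    event_set(&start_gate);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

The flag is what makes the event "sticky": a thread that calls `event_wait()` after `event_set()` returns immediately, which a bare `pthread_cond_broadcast()` would not provide.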

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
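For comparison, POSIX offers the same one-way, byte-oriented channel through `pipe()`. The sketch below is a POSIX analogue, not Win32 code, and the helper name `pipe_round_trip` is made up; it writes a short message into the write end and reads it back from the read end within a single process.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back into out.
   Returns the number of bytes read, or -1 on error.
   Assumes msg is short enough to fit in the pipe buffer, so write() does not block. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                      /* fds[0]: read end, fds[1]: write end */
    assert(out_size > 0);
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                   /* closing the write end lets a reader see end-of-file */
    if (written != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    ssize_t got = read(fds[0], out, out_size - 1);
    close(fds[0]);
    if (got >= 0)
        out[got] = '\0';             /* make the result a C string */
    return got;
}
```

As with `CreatePipe()`, data flows in one direction only: everything written to `fds[1]` comes out of `fds[0]`, and a read on an empty pipe with an open write end would block.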

9494

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }

    CloseHandle (hWritePipe);

9595

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }

    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle in one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");

        if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);

        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

107107

Locate a server via mailslot

[Figure: each application client (App client 0 ... App client n) acts as a mailslot server: it creates the mailslot "status" with CreateMailslot(), reads the server status (ServStat) with ReadFile(), and then connects to the server. The application server acts as the mailslot client: in a loop it sleeps, opens the mailslot with CreateFile(), and writes StatInfo with WriteFile(). The message is sent periodically.]

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


33

3.1 Critical Sections, Semaphores and Monitors

The Critical-Section Problem

Software Solutions

Synchronization Hardware

Semaphores

Synchronization in Windows & Linux

44

The Critical-Section Problem

n threads all competing to use a shared resource

Each thread has a code segment called critical section in which the shared data is accessed

Problem: ensure that
- when one thread is executing in its critical section, no other thread is allowed to execute in its critical section

55

Solution to Critical-Section Problem

Mutual Exclusion
- Only one thread at a time is allowed into its CS, among all threads that have a CS for the same resource or shared data
- A thread halted in its non-critical section must not interfere with other threads

Progress
- A thread remains inside its CS for a finite time only
- No assumptions are made concerning the relative speed of the threads

66

Solution to Critical-Section Problem

Bounded Waiting
- It must not be possible for a thread requiring access to a critical section to be delayed indefinitely
- When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

77

Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj):

    do {
        enter section
        critical section
        exit section
        remainder section
    } while (1);

Threads may share some common variables to synchronize their actions

88

First Attempt: Algorithm 1

Shared variables
- Initialization: int turn = 0;
- turn == i: Ti can enter its critical section

Thread Ti

    do {
        while (turn != i) ;
        critical section
        turn = j;
        remainder section
    } while (1);

Satisfies mutual exclusion, but not progress

99

Second Attempt: Algorithm 2

Shared variables
- Initialization: int flag[2]; flag[0] = flag[1] = 0;
- flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j] == 1) ;
        critical section
        flag[i] = 0;
        remainder section
    } while (1);

Satisfies mutual exclusion, but not the progress requirement

1010

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        turn = j;
        while ((flag[j] == 1) && turn == j) ;
        critical section
        flag[i] = 0;
        remainder section
    } while (1);

Solves the critical-section problem for two threads
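On modern compilers and processors the plain loads and stores of the pseudocode may be reordered, so a runnable version needs atomic operations. The sketch below is one way to try Peterson's algorithm with C11 sequentially consistent atomics and two pthreads; the function names, the iteration count, and the shared counter are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITERATIONS 50000

static atomic_int flag[2];     /* flag[i] == 1: thread i wants to enter */
static atomic_int turn;        /* which thread yields when both want to enter */
static long shared_counter;    /* only touched inside the critical section */

static void peterson_lock(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        ;                      /* busy wait */
}

static void peterson_unlock(int i)
{
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        peterson_lock(id);
        shared_counter++;      /* critical section */
        peterson_unlock(id);
    }
    return NULL;
}

/* Runs both threads and returns the final counter value. */
long run_peterson_demo(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    shared_counter = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost and the result is `2 * ITERATIONS`. The default `atomic_store`/`atomic_load` ordering (sequentially consistent) is what makes the flag and turn accesses behave like the idealized pseudocode.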

1111

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread, and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

1212

Dekker's Algorithm (contd)

Shared variables - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j]) {
            if (turn == j) {
                flag[i] = 0;
                while (turn == j) ;
                flag[i] = 1;
            }
        }
        critical section
        turn = j;
        flag[i] = 0;
        remainder section
    } while (1);

1313

Bakery Algorithm (Lamport 1979)

A Solution to the Critical Section problem for n threads

Before entering its CS a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order
- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

1414

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n - 1

Shared data

    int choosing[n];
    int number[n];   /* the ticket */

Data structures are initialized to 0

1515

Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1) ;
            while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
        }
        critical section
        number[i] = 0;
        remainder section
    } while (1);
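The bakery pseudocode can be tried out with C11 atomics standing in for the shared arrays. This is a sketch under assumptions: three threads, a fixed iteration count, and helper names (`ticket_less`, `bakery_lock`, `run_bakery_demo`) that are invented here; sequentially consistent atomics are used so the reads and writes keep the order the algorithm relies on.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N 3
#define ITERATIONS 10000

static atomic_int choosing[N];   /* 1 while thread i is picking a ticket */
static atomic_int number[N];     /* the ticket; 0 means "not interested" */
static long shared_counter;      /* only touched inside the critical section */

/* (a, b) < (c, d) in lexicographic order on (ticket, thread id) */
static int ticket_less(int a, int b, int c, int d)
{
    return a < c || (a == c && b < d);
}

static void bakery_lock(int i)
{
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int n = atomic_load(&number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        while (atomic_load(&choosing[j]) == 1)
            ;                                   /* wait until j has its ticket */
        for (;;) {                              /* wait while j holds a smaller ticket */
            int nj = atomic_load(&number[j]);
            if (nj == 0 || !ticket_less(nj, j, atomic_load(&number[i]), i))
                break;
        }
    }
}

static void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        bakery_lock(id);
        shared_counter++;                       /* critical section */
        bakery_unlock(id);
    }
    return NULL;
}

/* Runs N threads and returns the final counter value. */
long run_bakery_demo(void)
{
    pthread_t t[N];
    int ids[N];
    shared_counter = 0;
    for (int i = 0; i < N; i++) {
        atomic_store(&choosing[i], 0);
        atomic_store(&number[i], 0);
        ids[i] = i;
    }
    for (int i = 0; i < N; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Two threads can draw the same ticket, which is why the comparison uses the (ticket, thread id) pair: the thread id breaks the tie. Tickets grow while the lock stays contended, so a long-running version would have to consider integer overflow; this short demo does not.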

1616

Mutual Exclusion - Hardware Support

Interrupt Disabling
- Concurrent threads cannot overlap on a uniprocessor
- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
- Test and Set Instruction - read & write a memory location
- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach
- Busy waiting
- Starvation is possible
- Deadlock is possible

1717

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

1818

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock)) ;
        critical section
        lock = false;
        remainder section
    } while (1);

1919

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
        critical section
        lock = 0;
        remainder section
    } while (1);
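Both hardware-based locks map onto C11 atomics: `atomic_flag_test_and_set` behaves like TestAndSet, and `atomic_exchange` with the value 1 has the same effect as Swap(lock, key) when key is 1. The sketch below exercises both variants; the thread count, iteration count, and function names are assumptions chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define THREADS    4
#define ITERATIONS 20000

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;  /* boolean lock = false */
static atomic_int  swap_lock_word;               /* int lock = 0 (statics start at zero) */
static long counter_tas;                         /* protected by tas_lock */
static long counter_swap;                        /* protected by swap_lock_word */

static void tas_acquire(void)
{
    while (atomic_flag_test_and_set(&tas_lock))
        ;                                        /* spin while the old value was true */
}

static void tas_release(void)
{
    atomic_flag_clear(&tas_lock);                /* lock = false */
}

static void swap_acquire(void)
{
    int key = 1;
    while (key == 1)
        key = atomic_exchange(&swap_lock_word, 1);  /* Swap(lock, key) with key == 1 */
}

static void swap_release(void)
{
    atomic_store(&swap_lock_word, 0);            /* lock = 0 */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        tas_acquire();
        counter_tas++;
        tas_release();

        swap_acquire();
        counter_swap++;
        swap_release();
    }
    return NULL;
}

/* Runs the workers and returns the sum of both protected counters. */
long run_spinlock_demo(void)
{
    pthread_t t[THREADS];
    counter_tas = counter_swap = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter_tas + counter_swap;
}
```

Both locks busy-wait and give no fairness guarantee, which matches the "busy waiting" and "starvation is possible" problems listed for the machine-instruction approach.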

2121

Semaphores

Semaphore S - integer variable

Can only be accessed via two atomic operations:

    wait(S):
        while (S <= 0) ;
        S--;

    signal(S):
        S++;

2222

Critical Section of n Threads

Shared data

    semaphore mutex;   /* initially mutex = 1 */

Thread Ti

    do {
        wait(mutex);
        critical section
        signal(mutex);
        remainder section
    } while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
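In user space, a blocking semaphore of this kind is often built from a mutex and a condition variable. The sketch below is a variant rather than a literal transcription: the count never goes negative, and the condition variable's internal wait queue plays the role of the list S.L. The type `ksem`, its functions, and the demo parameters are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    int             value;    /* available resources; never negative in this variant */
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;  /* its wait queue stands in for the list S.L */
} ksem;

void ksem_init(ksem *s, int initial)
{
    assert(initial >= 0);
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

void ksem_wait(ksem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);  /* suspend instead of spinning */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void ksem_signal(ksem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);              /* resume one waiting thread, if any */
    pthread_mutex_unlock(&s->lock);
}

#define WORKERS   4
#define ROUNDS    1000
#define RESOURCES 2

static ksem pool;
static atomic_int inside;      /* threads currently holding a resource */
static atomic_int max_inside;  /* highest value of inside observed so far */

static void *worker(void *arg)
{
    (void)arg;
    for (int r = 0; r < ROUNDS; r++) {
        ksem_wait(&pool);
        int now = atomic_fetch_add(&inside, 1) + 1;
        int seen = atomic_load(&max_inside);
        while (now > seen && !atomic_compare_exchange_weak(&max_inside, &seen, now))
            ;                                      /* retry until max_inside >= now */
        atomic_fetch_sub(&inside, 1);
        ksem_signal(&pool);
    }
    return NULL;
}

/* Returns the largest number of threads that held a resource at the same time. */
int run_semaphore_demo(void)
{
    pthread_t t[WORKERS];
    ksem_init(&pool, RESOURCES);
    atomic_store(&inside, 0);
    atomic_store(&max_inside, 0);
    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&max_inside);
}
```

With the semaphore initialized to `RESOURCES`, the observed maximum never exceeds that value, which is the resource-counting behavior a counting semaphore is meant to give.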

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
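The same ordering pattern can be written with an unnamed POSIX semaphore. This sketch assumes a platform that supports `sem_init` (Linux does; some systems do not); the thread functions, the trace array, and `run_ordering_demo` are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag_sem;     /* the semaphore "flag", initialized to 0 */
static char  trace[2];     /* records the order in which A and B ran */
static int   trace_len;

static void *thread_i(void *arg)
{
    (void)arg;
    trace[trace_len++] = 'A';      /* statement A */
    sem_post(&flag_sem);           /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg)
{
    (void)arg;
    while (sem_wait(&flag_sem) == -1 && errno == EINTR)
        continue;                  /* wait(flag), retried if interrupted by a signal */
    trace[trace_len++] = 'B';      /* statement B */
    return NULL;
}

/* Returns 1 if A was recorded before B, 0 otherwise. */
int run_ordering_demo(void)
{
    pthread_t ti, tj;
    int rc;
    trace_len = 0;
    rc = sem_init(&flag_sem, 0, 0);
    assert(rc == 0);
    /* Start Tj first so that it has to wait for Ti. */
    rc = pthread_create(&tj, NULL, thread_j, NULL);
    assert(rc == 0);
    rc = pthread_create(&ti, NULL, thread_i, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return trace_len == 2 && trace[0] == 'A' && trace[1] == 'B';
}
```

Because the semaphore starts at 0, Tj cannot pass `sem_wait` until Ti has posted, regardless of which thread the scheduler runs first.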

2626

Two Types of Semaphores

Counting semaphore
- integer value can range over an unrestricted domain

Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

2828

Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- all code should acquire/release semaphores in the same order

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads) and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

[Figure: trap sources and their handlers
  Interrupt -> interrupt dispatcher -> interrupt service routines
  System service call -> system service dispatcher -> system services
  HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
  Virtual address exceptions -> virtual memory manager's pager]

Interrupt Dispatching

An interrupt arrives while user or kernel mode code is running; the dispatch routine and the ISR run in kernel mode.

Interrupt dispatch routine
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
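The masking rule above can be sketched as a toy model in C. This is not kernel code: `irql_cpu_t` and the function names are invented for illustration, and only the level numbers 0 and 2 (passive and dispatch) are taken from the slides.

```c
#include <stdbool.h>

/* Toy model only: names are illustrative, not the Windows kernel API. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef struct { int irql; } irql_cpu_t;

/* An interrupt source is masked when its IRQL is equal to or lower than
   the processor's current IRQL. */
bool interrupt_is_masked(const irql_cpu_t *cpu, int source_irql) {
    return source_irql <= cpu->irql;
}

/* Servicing an interrupt raises the processor IRQL to the source's IRQL.
   The previous level is returned so it can be restored after the ISR. */
int irql_raise(irql_cpu_t *cpu, int new_irql) {
    int old = cpu->irql;
    if (new_irql > old)
        cpu->irql = new_irql;
    return old;
}

void irql_restore(irql_cpu_t *cpu, int old_irql) {
    cpu->irql = old_irql;
}

/* Waits and page faults are only legal below DISPATCH_LEVEL. */
bool may_wait_or_page_fault(const irql_cpu_t *cpu) {
    return cpu->irql < DISPATCH_LEVEL;
}
```

Raising to a device level (say 5) masks sources at levels 5 and below while a clock interrupt at a higher level would still be delivered; restoring the old level makes waits legal again.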

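Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest to lowest]

31  High
30  Power fail
29  Interprocessor Interrupt
28  Clock
    Profile & Synch (Srv 2003)
    Device 1 (lowest device level)
2   Dispatch/DPC
1   APC
0   Passive/Low

Hardware interrupts: Device 1 and above. Deferrable software interrupts: APC and Dispatch/DPC. Normal thread execution: Passive/Low.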

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High

– Used when halting the system (via KeBugCheck())

Power fail

– Originated in the NT design document, but has never been used

Inter-processor interrupt

– Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock, allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

IRQL   x64                              IA64
15     High/Profile                     High/Profile/Power
14     Interprocessor Interrupt/Power   Interprocessor Interrupt
13     Clock                            Clock
12     Synch (Srv 2003)                 Synch (MP only)
…      Device n                         Device n
4      …                                Device 1
3      Device 1                         Correctable Machine Check
2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
1      APC                              APC
0      Passive/Low                      Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device. Can be configured with IntFilter utility in Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
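As a worked example of the two fixed mapping rules above, the arithmetic can be written as two small C functions. The function names are invented for illustration; the x86 MP case is omitted because the slides describe it only as "bucketized".

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16,
   using integer division, so each IRQL covers a block of 16 vectors. */
int irql_from_vector_64bit(int idt_vector) {
    return idt_vector / 16;
}
```

For instance, IRQ 0 on an x86 uniprocessor maps to IRQL 27, and IDT vector 0xD1 (209) on x64 maps to IRQL 13.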

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– Spinlock is either free, or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B both run the sequence below against the same spinlock guarding the DPC queue]

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin                          (critical section)
    remove DPC from queue
end

release_spinlock(DPC)
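The acquire/release pair can be sketched in user-mode C11 with an atomic test-and-set flag. This is an illustration, not the kernel's KSPIN_LOCK implementation: `spinlock_t`, `dpc_lock`, `dpc_removed` and `run_spinlock_demo` are invented names, threads stand in for processors, and the `sched_yield()` in the wait loop is only there so the demo stays fast when threads outnumber cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical names; the real KSPIN_LOCK is opaque and not modeled here. */
typedef struct { atomic_flag cell; } spinlock_t;

static spinlock_t dpc_lock = { ATOMIC_FLAG_INIT };
static long dpc_removed;            /* stands in for the shared DPC queue */

static void acquire_spinlock(spinlock_t *lock) {
    /* Atomic test-and-set returns the previous value: true means another
       owner holds the lock, so retry until the flag was seen clear. */
    while (atomic_flag_test_and_set_explicit(&lock->cell, memory_order_acquire))
        sched_yield();
}

static void release_spinlock(spinlock_t *lock) {
    atomic_flag_clear_explicit(&lock->cell, memory_order_release);
}

static void *processor(void *arg) {
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        acquire_spinlock(&dpc_lock);
        dpc_removed++;              /* critical section */
        release_spinlock(&dpc_lock);
    }
    return NULL;
}

/* Runs two "processors"; returns how many critical sections were executed. */
long run_spinlock_demo(long iterations) {
    pthread_t a, b;
    dpc_removed = 0;
    pthread_create(&a, NULL, processor, &iterations);
    pthread_create(&b, NULL, processor, &iterations);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return dpc_removed;
}
```

Calling `run_spinlock_demo(50000)` should return 100000: no increment is lost because only one thread at a time owns the flag.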

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the 1st processor's flag is modified

– Exactly one processor is being signaled

– Pre-determined wait order
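The idea of waiting on a private flag can be sketched with an MCS-style queue lock in C11. This is a swapped-in textbook technique used for illustration; the slides do not say how Windows implements its queued spinlocks, and all names below (`qnode_t`, `queued_lock_t`, `run_queued_lock_demo`) are invented. Threads stand in for processors, and `sched_yield()` is only there to keep the demo quick on machines with few cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool must_wait;          /* the waiter's own ("processor-local") flag */
} qnode_t;

typedef struct { _Atomic(qnode_t *) tail; } queued_lock_t;

static void queued_acquire(queued_lock_t *lock, qnode_t *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->must_wait, true);
    qnode_t *prev = atomic_exchange(&lock->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->must_wait))   /* spin on our own flag only */
            sched_yield();
    }
}

static void queued_release(queued_lock_t *lock, qnode_t *me) {
    qnode_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;                           /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                    /* a waiter is still linking in */
    }
    atomic_store(&succ->must_wait, false);    /* signal exactly one waiter */
}

static queued_lock_t demo_lock;
static long demo_counter;

static void *queued_worker(void *arg) {
    long iterations = *(long *)arg;
    qnode_t node;                             /* one queue node per thread */
    for (long i = 0; i < iterations; i++) {
        queued_acquire(&demo_lock, &node);
        demo_counter++;                       /* critical section */
        queued_release(&demo_lock, &node);
    }
    return NULL;
}

/* Runs up to 8 threads; returns the number of critical sections executed. */
long run_queued_lock_demo(int nthreads, long iterations) {
    pthread_t t[8];
    if (nthreads > 8) nthreads = 8;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, queued_worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Each waiter spins on its own `must_wait` field, and the releasing thread clears only its successor's flag, which gives the "exactly one processor is signaled" and "pre-determined wait order" properties listed above.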

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once

– "All": all objects must be in the signalled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include

– Events (may be auto-reset or manual reset, may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signalled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. A thread object is created and initialized; when a thread waits on an object handle it enters the Waiting state; when the object is set to the signaled state the wait is complete and the thread can be scheduled again]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt running thread if it has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: owning thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: owning thread or other thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: one thread releases the semaphore, freeing a resource

– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect: kernel resumes one or more waiting threads

What signals an object (contd)

Event

– nonsignaled → signaled: a thread sets the event

– signaled → nonsignaled: kernel resumes one or more threads

– Effect: kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: dedicated thread sets one event in the event pair

– signaled → nonsignaled: kernel resumes the other dedicated thread

– Effect: kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: timer expires

– signaled → nonsignaled: a thread (re)initializes the timer

– Effect: kernel resumes all waiting threads

Thread

– nonsignaled → signaled: thread terminates

– signaled → nonsignaled: a thread reinitializes the thread object

– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head → DPC object → DPC object → DPC object]

Delivering a DPC

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC and Low; DPC objects wait in the DPC queue]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue
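The four delivery steps can be modeled with a small single-processor simulation in C. It is a sketch under simplifying assumptions: one fixed-size queue, no real interrupts, and invented names (`dpc_cpu_t`, `dpc_queue`, `dpc_lower_irql`); it is not how the kernel stores or dispatches DPC objects.

```c
#include <stddef.h>

#define DPC_DISPATCH_LEVEL 2
#define DPC_QUEUE_MAX 16

typedef void (*dpc_routine_t)(void *context);

typedef struct {
    dpc_routine_t routine;
    void *context;
} dpc_t;

/* Toy model of one processor: its current IRQL and its own DPC queue. */
typedef struct {
    int irql;
    dpc_t queue[DPC_QUEUE_MAX];
    size_t head, count;
} dpc_cpu_t;

/* Step 1: code running at high IRQL (e.g. an ISR) queues deferred work.
   Returns 1 on success, 0 if the toy queue is full. */
int dpc_queue(dpc_cpu_t *cpu, dpc_routine_t routine, void *context) {
    if (cpu->count == DPC_QUEUE_MAX)
        return 0;
    size_t tail = (cpu->head + cpu->count) % DPC_QUEUE_MAX;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 1;
}

/* Steps 2-4: when the IRQL is about to drop below dispatch level, the
   queued routines run first, at dispatch level, in FIFO order. */
void dpc_lower_irql(dpc_cpu_t *cpu, int new_irql) {
    if (new_irql < DPC_DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DPC_DISPATCH_LEVEL;
        while (cpu->count > 0) {
            dpc_t d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % DPC_QUEUE_MAX;
            cpu->count--;
            d.routine(d.context);
        }
    }
    cpu->irql = new_irql;
}

/* Example DPC routine: counts how often it ran. */
void dpc_count_run(void *context) {
    ++*(int *)context;
}
```

Queuing two DPCs at IRQL 5 and then lowering to IRQL 3 runs nothing; lowering further to IRQL 0 drains the queue before the processor returns to passive level.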

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: thread object with separate kernel-mode (K) and user-mode (U) lists of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
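The four accounting rules translate directly into a decision function. The sketch below is illustrative only: the enum, the function name and the `was_user_mode` parameter (which distinguishes user from kernel time below IRQL 2) are assumptions, not kernel identifiers.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} tick_charge_t;

/* Decide where one clock tick is charged, given the state the clock
   interrupt found the processor in. */
tick_charge_t charge_clock_tick(int irql, bool processing_dpc, bool was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```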

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

– E.g., one or more threads run and enter a wait state before clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– When signalled, a wait on the object is satisfied

– Different object types differ in terms of what changes their state

– Wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects each point to a WaitBlockList; dispatcher objects each have a header (Size, Type, State, Wait list head) plus object-type-specific data; every wait block holds a list entry, Object and Thread pointers, Key, Type and a Next link, so it sits both on its object's wait list and on its thread's chain]
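The relationship between threads, wait blocks and dispatcher objects can be sketched as plain C structures. The field and type names below are simplified stand-ins chosen for illustration (they are not the definitions from ntddk.h), and the lists are singly linked to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

enum wait_type { WAIT_ANY, WAIT_ALL };

struct wait_block;

/* Common header shared by every waitable object (simplified). */
typedef struct dispatcher_header {
    int type;
    int size;
    bool signaled;                    /* the State field */
    struct wait_block *wait_list;     /* Wait list head */
} dispatcher_header_t;

typedef struct wait_thread {
    struct wait_block *wait_block_list;   /* chain of this thread's wait blocks */
} wait_thread_t;

/* One wait block per object the thread is waiting for. */
typedef struct wait_block {
    struct wait_block *next_on_object;    /* List entry: next waiter on the object */
    struct wait_block *next_for_thread;   /* Next link: next block of the same wait call */
    wait_thread_t *thread;
    dispatcher_header_t *object;
    int key;                              /* position in the handle array */
    enum wait_type type;
} wait_block_t;

/* Link a wait block onto both the object's wait list and the thread's chain. */
void wait_block_attach(wait_thread_t *t, dispatcher_header_t *obj,
                       wait_block_t *wb, int key, enum wait_type type) {
    wb->thread = t;
    wb->object = obj;
    wb->key = key;
    wb->type = type;
    wb->next_on_object = obj->wait_list;
    obj->wait_list = wb;
    wb->next_for_thread = t->wait_block_list;
    t->wait_block_list = wb;
}

/* A wait for "any" is satisfied by one signaled object, a wait for "all"
   only when every object in the call is signaled at the same time. */
bool wait_is_satisfied(const wait_thread_t *t) {
    const wait_block_t *wb = t->wait_block_list;
    if (wb == NULL)
        return false;
    enum wait_type type = wb->type;
    bool any = false, all = true;
    for (; wb != NULL; wb = wb->next_for_thread) {
        if (wb->object->signaled)
            any = true;
        else
            all = false;
    }
    return type == WAIT_ALL ? all : any;
}
```

With two objects attached under `WAIT_ALL`, signaling only the first leaves the wait unsatisfied; it resolves once both are signaled.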

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects

– Function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else    /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
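A minimal POSIX counterpart to the Windows mutex example might look like the following. The helper names (`run_mutex_demo`, `try_lock_once`) and the shared counter are invented for the sketch; the pthread calls are the standard ones listed above.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;

static void *add_to_total(void *arg) {
    long rounds = *(long *)arg;
    for (long i = 0; i < rounds; i++) {
        pthread_mutex_lock(&counter_lock);     /* blocks, like WaitForSingleObject(hMutex, INFINITE) */
        shared_total += 1;
        pthread_mutex_unlock(&counter_lock);   /* like ReleaseMutex(hMutex) */
    }
    return NULL;
}

/* Two threads each add `rounds` to the shared total; returns the total. */
long run_mutex_demo(long rounds) {
    pthread_t a, b;
    shared_total = 0;
    pthread_create(&a, NULL, add_to_total, &rounds);
    pthread_create(&b, NULL, add_to_total, &rounds);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;
}

/* Polling attempt, like WaitForSingleObject(hMutex, 0):
   returns 1 if the mutex was acquired, 0 if it is busy. */
int try_lock_once(pthread_mutex_t *m) {
    return pthread_mutex_trylock(m) == 0;
}
```

One difference worth noting: a default pthread mutex is not recursive, so a second `try_lock_once()` on a mutex the caller already holds reports "busy", whereas a Windows mutex can be re-acquired by its owner.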

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value (up to the maximum count)

– ReleaseSemaphore() returns old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
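The counting behaviour described above (signaled while count > 0, each wait takes one unit, release adds several and reports the old count) can be sketched with a mutex and a condition variable. This is a portable model of the semantics, not the Windows implementation; `csem_t` and its functions are invented names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t nonzero;
    long count;     /* "signaled" while count > 0 */
    long max;
} csem_t;

void csem_init(csem_t *s, long initial, long max) {
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Blocking wait: takes one unit, sleeping while none is available. */
void csem_wait(csem_t *s) {
    pthread_mutex_lock(&s->mu);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->mu);
    s->count--;
    pthread_mutex_unlock(&s->mu);
}

/* Zero-timeout wait: returns true if a unit was taken. */
bool csem_try_wait(csem_t *s) {
    pthread_mutex_lock(&s->mu);
    bool ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->mu);
    return ok;
}

/* Adds n units and reports the previous count; fails without changing
   anything if n is not positive or the maximum would be exceeded. */
bool csem_release(csem_t *s, long n, long *previous) {
    pthread_mutex_lock(&s->mu);
    bool ok = n > 0 && s->count + n <= s->max;
    if (ok) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);
    }
    pthread_mutex_unlock(&s->mu);
    return ok;
}
```

With an initial count of 2 and a maximum of 3, two waits succeed, a third zero-timeout wait fails, and releasing one unit reports a previous count of 0.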

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() - one thread

– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
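Because a condition variable carries no state of its own, a manual-reset event has to be emulated by pairing it with a flag and a mutex. The sketch below shows one common way to do that; `manual_event_t` and the function names are invented, and `run_event_demo` is just a small driver.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool signaled;          /* the state a bare condition variable lacks */
} manual_event_t;

void event_init(manual_event_t *e, bool initial_state) {
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: stays signaled, wakes everyone. */
void event_set(manual_event_t *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = true;
    pthread_cond_broadcast(&e->cv);
    pthread_mutex_unlock(&e->mu);
}

/* Like ResetEvent(). */
void event_reset(manual_event_t *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = false;
    pthread_mutex_unlock(&e->mu);
}

/* Like WaitForSingleObject(hEvent, INFINITE); the loop guards against
   spurious wakeups and returns at once if the event is already set. */
void event_wait(manual_event_t *e) {
    pthread_mutex_lock(&e->mu);
    while (!e->signaled)
        pthread_cond_wait(&e->cv, &e->mu);
    pthread_mutex_unlock(&e->mu);
}

static manual_event_t gate;
static int released;

static void *gate_waiter(void *arg) {
    (void)arg;
    event_wait(&gate);
    pthread_mutex_lock(&gate.mu);
    released++;
    pthread_mutex_unlock(&gate.mu);
    return NULL;
}

/* Starts up to 8 waiters, sets the event once, returns how many got through. */
int run_event_demo(int nwaiters) {
    pthread_t t[8];
    if (nwaiters > 8) nwaiters = 8;
    event_init(&gate, false);
    released = 0;
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&t[i], NULL, gate_waiter, NULL);
    event_set(&gate);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

A single `event_set()` releases every waiter, including threads that only start waiting afterwards, which is the behaviour `pthread_cond_broadcast()` alone cannot provide.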

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe

– Several clients can communicate with a single server using the same instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine ("." – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns false if client connected between CreateNamedPipe call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual read/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
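The fixed-size-record advice can be illustrated with a small FIFO round trip between a parent and a forked child. This is a sketch for a POSIX system: `fifo_roundtrip`, `RECORD_SIZE` and the path passed by the caller are all assumptions made for the example, and error handling is kept minimal.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 32   /* every message is padded to this fixed size */

/* Creates a FIFO at `path`, lets a child process write one fixed-size
   record containing `msg`, and reads it back into `out` in the parent.
   Returns 0 on success, -1 on failure. */
int fifo_roundtrip(const char *path, const char *msg, char out[RECORD_SIZE]) {
    unlink(path);                              /* ignore "does not exist" */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                            /* child: the writer */
        char record[RECORD_SIZE] = {0};
        strncpy(record, msg, RECORD_SIZE - 1);
        int wfd = open(path, O_WRONLY);        /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, record, RECORD_SIZE);
        close(wfd);
        _exit(n == RECORD_SIZE ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);            /* blocks until the writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }
    size_t got = 0;
    while (got < RECORD_SIZE) {                /* the stream is byte-oriented */
        ssize_t n = read(rfd, out + got, RECORD_SIZE - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(rfd);

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_SIZE && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The reader loops until a whole record has arrived because a FIFO delivers a byte stream; message boundaries, which a Windows named pipe in message mode preserves, have to be imposed by the application.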

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
            (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: app clients 0 … n act as mailslot servers; the app server acts as mailslot client and sends its status message periodically]

Mailslot servers (app client 0 … app client n)

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

Mailslot client (app server)

while ( ) {
    Sleep( );
    hMS = CreateFile( "\\*\mailslot\status" );
    WriteFile( hMS, &StatInfo );
}

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is max. msg size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0 – immediate return

– MAILSLOT_WAIT_FOREVER – infinite wait

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name - retrieve handle for local mailslot

– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. mesg. len: 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


The Critical-Section Problem

n threads all competing to use a shared resource

Each thread has a code segment, called critical section, in which the shared data is accessed

Problem: ensure that

– when one thread is executing in its critical section, no other thread is allowed to execute in its critical section

Solution to Critical-Section Problem

Mutual Exclusion

– Only one thread at a time is allowed into its CS, among all threads that have CS for the same resource or shared data

– A thread halted in its non-critical section must not interfere with other threads

Progress

– A thread remains inside CS for a finite time only

– No assumptions concerning relative speed of the threads

Solution to Critical-Section Problem

Bounded Waiting

– It must not be possible for a thread requiring access to a critical section to be delayed indefinitely

– When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj)

do {
    enter section
    critical section
    exit section
    remainder section
} while (1);

Threads may share some common variables to synchronize their actions

First Attempt: Algorithm 1

Shared variables

– Initialization: int turn = 0;

– turn == i: Ti can enter its critical section

Thread Ti

do {
    while (turn != i);
    critical section
    turn = j;
    remainder section
} while (1);

Satisfies mutual exclusion, but not progress

Second Attempt: Algorithm 2

Shared variables

– Initialization: int flag[2]; flag[0] = flag[1] = 0;

– flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

do {
    flag[i] = 1;
    while (flag[j] == 1);
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Satisfies mutual exclusion, but not the progress requirement

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization:

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    turn = j;
    while ((flag[j] == 1) && turn == j);
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Solves the critical-section problem for two threads
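Peterson's algorithm can be tried out in C11. The sketch below is one possible rendering: it uses sequentially consistent atomics for `flag` and `turn`, because with plain variables a modern compiler or CPU may reorder the stores and loads and break the algorithm. The names `peterson_enter`, `peterson_exit` and `run_peterson_demo` are invented, and `sched_yield()` in the wait loop only keeps the demo fast on a single core.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int flag[2];        /* flag[i] == 1: Ti wants to enter */
static atomic_int turn;           /* which thread yields on conflict */
static long protected_counter;    /* ordinary variable guarded by the algorithm */
static long rounds_per_thread;

static void peterson_enter(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();            /* busy wait */
}

static void peterson_exit(int i) {
    atomic_store(&flag[i], 0);
}

static void *peterson_thread(void *arg) {
    int i = *(int *)arg;
    for (long k = 0; k < rounds_per_thread; k++) {
        peterson_enter(i);
        protected_counter++;      /* critical section */
        peterson_exit(i);
    }
    return NULL;
}

/* Runs T0 and T1; returns the number of critical sections executed. */
long run_peterson_demo(long rounds) {
    pthread_t t[2];
    int ids[2] = { 0, 1 };
    rounds_per_thread = rounds;
    protected_counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, peterson_thread, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return protected_counter;
}
```

If mutual exclusion holds, `run_peterson_demo(20000)` returns exactly 40000.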

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd)

Shared variables - initialization:

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;
            flag[i] = 1;
        }
    }
    critical section
    turn = j;
    flag[i] = 0;
    remainder section
} while (1);

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

– i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

(a,b) < (c,d) if a < c, or if a == c and b < d

max(a_0, …, a_{n-1}) is a number k such that k >= a_i for i = 0, …, n - 1

Shared data:

int choosing[n];
int number[n];    /* the ticket */

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
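
The bakery algorithm can be sketched the same way for n threads. The version below assumes C11 sequentially consistent atomics and POSIX threads; N, ITERATIONS, ticket_less, bakery_enter, bakery_exit and bakery_demo are names invented for this example, and the sched_yield() calls are an addition that only makes the busy-wait friendlier on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define N 4                       /* assumed number of threads */
#define ITERATIONS 5000           /* assumed workload per thread */

static atomic_int choosing[N];
static atomic_int number[N];      /* ticket; 0 means "not interested" */
static long counter;              /* shared data guarded by the algorithm */

/* (a,b) "<" (c,d) on (ticket, thread id) pairs */
static int ticket_less(int a, int b, int c, int d)
{
    return a < c || (a == c && b < d);
}

static void bakery_enter(int i)
{
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int nj = atomic_load(&number[j]);
        if (nj > max)
            max = nj;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        while (atomic_load(&choosing[j]) == 1)
            sched_yield();        /* Tj is still picking its ticket */
        for (;;) {
            int nj = atomic_load(&number[j]);
            if (nj == 0 || !ticket_less(nj, j, atomic_load(&number[i]), i))
                break;            /* Tj is not ahead of us */
            sched_yield();
        }
    }
}

static void bakery_exit(int i)
{
    atomic_store(&number[i], 0);
}

static void *worker(void *arg)
{
    int i = (int)(intptr_t)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        bakery_enter(i);
        counter++;                /* critical section */
        bakery_exit(i);
    }
    return NULL;
}

/* Runs N threads to completion and returns the final counter value. */
long bakery_demo(void)
{
    pthread_t t[N];
    counter = 0;
    for (intptr_t i = 0; i < N; i++) {
        int rc = pthread_create(&t[i], NULL, worker, (void *)i);
        assert(rc == 0);
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

With mutual exclusion intact, bakery_demo() returns N * ITERATIONS. Ties between equal tickets are broken by thread id, as in the "<" notation.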

Mutual Exclusion - Hardware Support

Interrupt Disabling

– Concurrent threads cannot overlap on a uniprocessor

– A thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

– Test and Set Instruction - read & write a memory location

– Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach

– Busy waiting

– Starvation is possible

– Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically:

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;
    critical section
    lock = false;
    remainder section
} while (1);
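
In portable C, a test-and-set instruction is available through the C11 atomic_flag type. The sketch below wraps it in a tiny lock; tas_lock, tas_try_acquire, tas_acquire, tas_release and tas_demo are names chosen for this example, not part of any library. The demo is single-threaded on purpose, so that each return value of test-and-set is easy to follow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag held;             /* set while some thread owns the lock */
} tas_lock;

/* Returns true if the lock was free and now belongs to the caller. */
static bool tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void tas_acquire(tas_lock *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;                         /* old value was true: lock is taken, spin */
}

static void tas_release(tas_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Single-threaded walk-through; returns 1 if every step behaved as expected. */
int tas_demo(void)
{
    tas_lock l = { ATOMIC_FLAG_INIT };

    tas_acquire(&l);                        /* lock was free: returns at once */
    bool second = tas_try_acquire(&l);      /* already held: false */
    tas_release(&l);
    bool third = tas_try_acquire(&l);       /* free again: true */
    tas_release(&l);

    assert(!second && third);
    return !second && third;
}
```

tas_acquire() is the `while (TestAndSet(lock))` loop, and tas_release() is `lock = false`.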

Synchronization Hardware

Atomically swap two variables:

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);

Semaphores

Semaphore S – integer variable

can only be accessed via two atomic operations:

wait(S):
    while (S <= 0) ;
    S--;

signal(S):
    S++;

Critical Section of n Threads

Shared data:

semaphore mutex;    /* initially mutex = 1 */

Thread Ti:

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

– Avoid busy waiting

Define a semaphore as a record:

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations:

– suspend() suspends the thread that invokes it

– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as:

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
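
The bookkeeping in wait and signal can be traced without real threads. The sketch below is a single-threaded simulation: instead of calling suspend() and resume(T), the functions report which thread id would be suspended or resumed. The sim_* names, the MAX_WAITERS bound and the FIFO order of the waiting list are assumptions of this example (the definition above leaves the choice of T open).

```c
#include <assert.h>

#define MAX_WAITERS 8             /* assumed queue capacity for the sketch */

typedef struct {
    int value;
    int queue[MAX_WAITERS];       /* FIFO of suspended thread ids (plays S.L) */
    int head, count;
} sim_semaphore;

static void sim_init(sim_semaphore *s, int initial)
{
    s->value = initial;
    s->head = 0;
    s->count = 0;
}

/* Returns 1 if thread `tid` may continue, 0 if it would be suspended. */
static int sim_wait(sim_semaphore *s, int tid)
{
    s->value--;
    if (s->value < 0) {
        assert(s->count < MAX_WAITERS);
        s->queue[(s->head + s->count) % MAX_WAITERS] = tid;
        s->count++;
        return 0;                 /* stands in for suspend() */
    }
    return 1;
}

/* Returns the id of the thread that would be resumed, or -1 if none. */
static int sim_signal(sim_semaphore *s)
{
    s->value++;
    if (s->value <= 0) {
        int tid = s->queue[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return tid;               /* stands in for resume(T) */
    }
    return -1;
}
```

Starting from value 1, three calls to sim_wait() leave value at -2: a negative value counts the suspended threads, which is why signal tests `value <= 0` after the increment.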

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
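
The A-before-B pattern maps directly onto POSIX unnamed semaphores. The sketch below assumes a POSIX system that provides sem_init (Linux does); thread_i, thread_j, trace and ordering_demo are names made up for this example. Tj is started first on purpose, to show that the order comes from the semaphore and not from thread start order.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag;                /* semaphore "flag", initialized to 0 */
static char trace[3];             /* records the order in which A and B ran */
static int pos;

static void *thread_i(void *arg)
{
    (void)arg;
    trace[pos++] = 'A';           /* statement A */
    sem_post(&flag);              /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg)
{
    (void)arg;
    sem_wait(&flag);              /* wait(flag): blocks until A is done */
    trace[pos++] = 'B';           /* statement B */
    return NULL;
}

/* Returns the execution order as a string. */
const char *ordering_demo(void)
{
    pthread_t ti, tj;
    pos = 0;
    memset(trace, 0, sizeof trace);
    int rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    pthread_create(&tj, NULL, thread_j, NULL);   /* Tj first, on purpose */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace;
}
```

Whatever the scheduler does, the recorded order is A followed by B.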

Two Types of Semaphores

Counting semaphore

– integer value can range over an unrestricted domain

Binary semaphore

– integer value can range only between 0 and 1

– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

Deadlock and Starvation

Starvation – indefinite blocking

– A thread may never be removed from the semaphore queue in which it is suspended

Solution

– all code should acquire/release semaphores in the same order
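
One common way to enforce a single acquisition order is to sort the locks by some fixed key before taking them. The sketch below uses pthread mutexes in place of the semaphores S and Q and orders them by address; lock_pair, unlock_pair and lock_order_demo are hypothetical helper names for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in one fixed global order (here: lower address first). */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Returns 1 if both mutexes were held after lock_pair(). */
int lock_order_demo(void)
{
    pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;

    lock_pair(&S, &Q);            /* T0 asks for S, then Q */
    int both_held = pthread_mutex_trylock(&S) == EBUSY &&
                    pthread_mutex_trylock(&Q) == EBUSY;
    unlock_pair(&S, &Q);

    lock_pair(&Q, &S);            /* T1 asks for Q, then S: same order inside */
    unlock_pair(&Q, &S);
    return both_held;
}
```

Because both call sites end up locking the lower-addressed mutex first, the circular wait of the T0/T1 example cannot form.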

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

– Chapter 7 - Process Synchronization

– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

– Protects the system from the users

– Protects the user (process) from themselves

– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

– Does not affect scheduling

– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

– "Privileged Time" and "User Time"

– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times, mostly in the "System" process

– Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

– Interrupt dispatcher invokes the interrupt service routine

– ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture executing thread

– Switch from user to kernel mode

– Interrupts – asynchronous

– Exceptions - synchronous

[Figure: trap dispatching. Interrupts go to the interrupt dispatcher, which calls interrupt service routines. System service calls go to the system service dispatcher, which calls system services. HW exceptions and SW exceptions go to the exception dispatcher, which calls exception handlers. Virtual address exceptions go to the virtual memory manager's pager.]

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; control passes to the kernel mode interrupt dispatch routine, which calls the interrupt service routine. Note: no thread or process context switch.]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

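Interrupt Precedence via IRQLs (x86)

IRQL  Level                         Kind
31    High                          Hardware interrupts
30    Power fail                    Hardware interrupts
29    Interprocessor Interrupt      Hardware interrupts
28    Clock                         Hardware interrupts
      Profile & Synch (Srv 2003)    Hardware interrupts
      Device 1                      Hardware interrupts
2     Dispatch/DPC                  Deferrable software interrupts
1     APC                           Deferrable software interrupts
0     Passive/Low                   normal thread execution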
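
The masking rule (servicing an interrupt raises the processor IRQL, which holds off interrupts at equal and lower IRQLs) is small enough to write down. The sketch below uses the x86 level numbers given on this slide; the enum and function names are this example's own and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 IRQL numbers from the slide; the identifiers are illustrative only. */
enum irql {
    IRQL_PASSIVE  = 0,
    IRQL_APC      = 1,
    IRQL_DISPATCH = 2,
    IRQL_CLOCK    = 28,
    IRQL_IPI      = 29,
    IRQL_POWER    = 30,
    IRQL_HIGH     = 31
};

/* An interrupt stays pending while the processor is at an equal or higher IRQL. */
static bool irql_is_masked(int current_irql, int interrupt_irql)
{
    return interrupt_irql <= current_irql;
}

/* Servicing an interrupt raises the processor IRQL to the interrupt's IRQL. */
static int irql_after_dispatch(int current_irql, int interrupt_irql)
{
    assert(!irql_is_masked(current_irql, interrupt_irql));
    return interrupt_irql;
}
```

For example, a clock interrupt preempts a processor running at dispatch level, but a second clock interrupt is masked until the IRQL drops again.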

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector, uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High

– Used when halting the system (via KeBugCheck())

Power fail

– Originated in the NT design document, but has never been used

Inter-processor interrupt

– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update the system's clock and to allocate CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

IRQL  x64                               IA64
15    High/Profile                      High/Profile/Power
14    Interprocessor Interrupt/Power    Interprocessor Interrupt
13    Clock                             Clock
12    Synch (Srv 2003)                  Synch (MP only)
      Device n                          Device n
4                                       Device 1
3     Device 1                          Correctable Machine Check
2     Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1     APC                               APC
0     Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows:

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
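
The two arithmetic rules can be checked with a few lines of C. The sketch below reads the x64/IA64 rule as the IDT vector number divided by 16 (integer division); the function names and the input ranges (16 legacy IRQ lines, 256 IDT vectors) are assumptions of this example.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
static int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);        /* assumes 16 legacy IRQ lines */
    return 27 - irq;
}

/* x64 and IA64 rule: IRQL = IDT vector number / 16 (integer division). */
static int irql_from_vector_x64(int vector)
{
    assert(vector >= 0 && vector <= 255); /* assumes a 256-entry IDT */
    return vector / 16;
}
```

Under these assumptions IRQ 0 maps to IRQL 27 and IRQ 15 to IRQL 12 on an x86 UP system, and the 256 vectors fall into 16 groups, one for each of the IRQLs 0 to 15 on 64-bit systems.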

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– Spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B both run the sequence below against the one spinlock that guards the DPC queue.]

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin                          /* critical section */
    remove DPC from queue
end

release_spinlock(DPC)

Queued Spinlocks

Problem: Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first waiting processor is modified

– Exactly one processor is being signaled

– Pre-determined wait order

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003:

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003:

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include:

– Events (may be auto-reset or manual reset, may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signalled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

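Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. A thread object is created and initialized; a thread that waits on an object handle enters the Waiting state; when the object is set to the signaled state, the wait is complete and the thread can be scheduled again.]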

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt running thread if it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled -> signaled: Owning thread releases mutex

– signaled -> nonsignaled: Resumed thread acquires mutex

– Effect: Kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled -> signaled: Owning thread or other thread releases mutex

– signaled -> nonsignaled: Resumed thread acquires mutex

– Effect: Kernel resumes one waiting thread

Semaphore

– nonsignaled -> signaled: One thread releases the semaphore, freeing a resource

– signaled -> nonsignaled: A thread acquires the semaphore. More resources are not available

– Effect: Kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event

– nonsignaled -> signaled: A thread sets the event

– signaled -> nonsignaled: Kernel resumes one or more threads

– Effect: Kernel resumes one or more waiting threads

Event pair

– nonsignaled -> signaled: Dedicated thread sets one event in the event pair

– signaled -> nonsignaled: Kernel resumes the other dedicated thread

– Effect: Kernel resumes waiting dedicated thread

Timer

– nonsignaled -> signaled: Timer expires

– signaled -> nonsignaled: A thread (re)initializes the timer

– Effect: Kernel resumes all waiting threads

Thread

– nonsignaled -> signaled: Thread terminates

– signaled -> nonsignaled: A thread reinitializes the thread object

– Effect: Kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. A queue head links a chain of DPC objects.]

Delivering a DPC

[Figure: the interrupt dispatch table with levels from High and Power failure down to Dispatch/DPC, APC and Low, next to the DPC queue of DPC objects and the thread dispatcher.]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW int

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking a chain of APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
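
The four accounting rules form a small decision function. The sketch below encodes them directly; the enum, the function name clock_tick_charge and its parameters are made up for this example and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

enum charge_target {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
};

/* Decide where one clock tick is charged, following the rules above. */
static enum charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;      /* irql > 2 */
}
```

A tick that lands while a device ISR is running (IRQL above 2) is therefore charged to interrupt time, whichever thread was interrupted.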

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signalled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

[Figure: layout of a dispatcher object (see \ntddk\inc\ddk\ntddk.h): a common header with Size, Type, State and Wait list head, followed by object-type-specific data.]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, point to chains of wait blocks. Each wait block holds List entry, Object, Thread, Key, Type and Next link. Each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads that wait on it.]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection ( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* runs on every exit path, so Leave is called exactly once per Enter */
        LeaveCriticalSection ( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0 - no wait, check whether object is signaled

– dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles - pointer to array identifying these objects

– bWaitAll - whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
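
For comparison with the Windows mutex example, the five pthread calls can be exercised in a few lines. This is a minimal single-threaded sketch; counter_lock, counter and posix_mutex_demo are names invented here, and the mutex uses default attributes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock;
static int counter;

/* Adds local_value twice, once per lock style; returns the counter or -1. */
int posix_mutex_demo(int local_value)
{
    int rc = pthread_mutex_init(&counter_lock, NULL);
    assert(rc == 0);
    counter = 0;

    pthread_mutex_lock(&counter_lock);            /* blocks until acquired */
    counter += local_value;
    /* While the mutex is held, the polling variant reports EBUSY. */
    int busy = pthread_mutex_trylock(&counter_lock);
    pthread_mutex_unlock(&counter_lock);

    rc = pthread_mutex_trylock(&counter_lock);    /* free now: returns 0 */
    if (rc == 0) {
        counter += local_value;
        pthread_mutex_unlock(&counter_lock);
    }

    pthread_mutex_destroy(&counter_lock);
    return (busy == EBUSY) ? counter : -1;
}
```

The EBUSY result plays the role of WAIT_TIMEOUT from WaitForSingleObject() with a zero timeout.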

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() - one thread

– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
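
A manual-reset event can still be approximated with a mutex, a condition variable and a boolean that remembers the signaled state. The sketch below is one possible emulation and not a drop-in replacement; manual_event, the event_* functions, WAITERS and event_demo are hypothetical names for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;     /* stays true until event_reset() */
} manual_event;

static void event_init(manual_event *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

static void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);     /* release all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

static void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

#define WAITERS 3                         /* assumed number of waiting threads */
static manual_event ev;
static int released;                      /* protected by ev.lock */

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&ev);
    pthread_mutex_lock(&ev.lock);
    released++;
    pthread_mutex_unlock(&ev.lock);
    return NULL;
}

/* Starts WAITERS threads, sets the event once, returns how many were released. */
int event_demo(void)
{
    pthread_t t[WAITERS];
    event_init(&ev);
    released = 0;
    for (int i = 0; i < WAITERS; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, NULL);
        assert(rc == 0);
    }
    event_set(&ev);
    for (int i = 0; i < WAITERS; i++)
        pthread_join(t[i], NULL);
    event_reset(&ev);
    return released;
}
```

Because the flag stays set, a thread that calls event_wait() after event_set() still passes through, which a bare pthread_cond_broadcast() would not guarantee.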

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */

if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */

StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */

StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE,    /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */

WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name, each over its own instance
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe(LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa)

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine ("." means the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS chooses based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

The first CreateNamedPipe call creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe(HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage)

Named Pipe Client Connections

CreateFile with the named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe(HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa)

– Combines a WriteFile, ReadFile sequence

Convenience Functions (contd.)

BOOL CallNamedPipe(LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut)

– Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe(HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL
– Call will return as soon as there is a client connection
– Returns false if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then returns ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

    WaitNamedPipe(ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile(ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile(hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile(hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
        printf("%s", Response.Record);

    CloseHandle(hNamedPipe);

    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe(SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf("Server is awaiting next request.\n");

        if (!ConnectNamedPipe(hNamedPipe, NULL)
                || !ReadFile(hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }

        printf("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen(File, "r");

        while ((fgets(Response.Record, MAX_RQRS_LEN, fp) != NULL))
            WriteFile(hNamedPipe, &Response.Record,
                    (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);

        fclose(fp);

        DisconnectNamedPipe(hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client]

Mailslot servers (App client 0 ... App client n):

    hMS = CreateMailslot("\\\\.\\mailslot\\status", ...);
    ReadFile(hMS, &ServStat, ...);
    /* connect to server */

Mailslot client (App server); the message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile("\\\\*\\mailslot\\status", ...);
        WriteFile(hMS, &StatInfo, ...);
    }

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Solution to Critical-Section Problem

Mutual Exclusion
– Only one thread at a time is allowed into its CS, among all threads that have a CS for the same resource or shared data
– A thread halted in its non-critical section must not interfere with other threads

Progress
– A thread remains inside its CS for a finite time only
– No assumptions concerning relative speed of the threads

Solution to Critical-Section Problem (contd.)

Bounded Waiting
– It must not be possible for a thread requiring access to a critical section to be delayed indefinitely
– When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj):

    do {
        enter section
            critical section
        exit section
            remainder section
    } while (1);

Threads may share some common variables to synchronize their actions

First Attempt: Algorithm 1

Shared variables
– Initialization: int turn = 0;
– turn == i means Ti can enter its critical section

Thread Ti

    do {
        while (turn != i) ;
            critical section
        turn = j;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not progress

Second Attempt: Algorithm 2

Shared variables
– Initialization: int flag[2]; flag[0] = flag[1] = 0;
– flag[i] == 1 means Ti is ready to enter its critical section

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j] == 1) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not the progress requirement

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        turn = j;
        while ((flag[j] == 1) && turn == j) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Solves the critical-section problem for two threads
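On modern hardware and compilers, plain int variables are not enough for this algorithm, because stores and loads may be reordered. The following is a minimal sketch that assumes C11 sequentially consistent atomics and POSIX threads; the names peterson_lock, peterson_unlock, and peterson_demo are made up for illustration, and the sched_yield() call in the wait loop is an addition that keeps the demo fast on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Shared variables from the slide, as sequentially consistent atomics. */
static atomic_int flag[2];
static atomic_int turn;
static long counter;        /* protected by the Peterson lock */
static long iterations;     /* hypothetical demo parameter */

static void peterson_lock(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();      /* busy wait, but let the other thread run */
}

static void peterson_unlock(int i) {
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg) {
    int i = *(int *)arg;
    for (long k = 0; k < iterations; k++) {
        peterson_lock(i);
        counter++;          /* critical section */
        peterson_unlock(i);
    }
    return NULL;
}

/* Run T0 and T1, each incrementing the shared counter n times. */
long peterson_demo(long n) {
    pthread_t t[2];
    int id[2] = {0, 1};
    assert(n >= 0);
    counter = 0;
    iterations = n;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &id[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, peterson_demo(n) returns exactly 2n, since no increment of the unprotected counter can be lost.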

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical-section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd.)

Shared variables - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j]) {
            if (turn == j) {
                flag[i] = 0;
                while (turn == j) ;
                flag[i] = 1;
            }
        }
            critical section
        turn = j;
        flag[i] = 0;
            remainder section
    } while (1);

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order
– i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

(a,b) < (c,d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n – 1

Shared data

    int choosing[n];
    int number[n];    /* the ticket */

Data structures are initialized to 0
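The two pieces of notation above translate directly into small helper functions. This is only a sketch; the names ticket_less and max_ticket are invented here and do not come from the slides.

```c
#include <assert.h>
#include <stdbool.h>

/* (a, b) "<" (c, d): compare tickets first, thread ids break ties. */
bool ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

/* Largest ticket currently held; a newcomer takes this value plus one. */
int max_ticket(const int *number, int n) {
    int k = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (number[i] > k)
            k = number[i];
    return k;
}
```

Because thread ids are unique, ticket_less never reports two different threads as equal, which is what lets the algorithm pick a single winner even when two threads draw the same number.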

Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1) ;
            while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
        }
            critical section
        number[i] = 0;
            remainder section
    } while (1);

Mutual Exclusion - Hardware Support

Interrupt Disabling
– Concurrent threads cannot overlap on a uniprocessor
– Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
– Test and Set Instruction - read & write a memory location
– Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach
– Busy waiting
– Starvation is possible
– Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock)) ;
            critical section
        lock = false;
            remainder section
    } while (1);
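C11 exposes a test-and-set primitive as atomic_flag_test_and_set, so the loop above can be tried directly. The sketch below assumes POSIX threads; tas_demo and the worker function are illustrative names, and sched_yield() is added to the spin loop only so the demo does not burn whole time slices on small machines.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear == unlocked */
static long shared_count;
static long per_thread;

static void *worker(void *arg) {
    (void)arg;
    for (long k = 0; k < per_thread; k++) {
        while (atomic_flag_test_and_set(&lock))
            sched_yield();                    /* spin while already set */
        shared_count++;                       /* critical section */
        atomic_flag_clear(&lock);             /* lock = false */
    }
    return NULL;
}

/* Run nthreads workers (at most 16), each incrementing n times. */
long tas_demo(int nthreads, long n) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    shared_count = 0;
    per_thread = n;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_count;
}
```

Unlike the two-thread software algorithms, this lock works for any number of threads, but it gives no ordering guarantee: a thread may keep losing the race, which is the starvation risk noted on the hardware-support slide.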

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
            critical section
        lock = 0;
            remainder section
    } while (1);
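In C11 the closest standard primitive to Swap is atomic_exchange, which stores a new value and returns the old one. The sketch below uses it to show one round of the loop above; try_acquire, acquire, and release are hypothetical helper names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int lock_word;      /* 0 = free, 1 = held; starts at 0 */

/* One round of Swap(lock, key) with key = 1:
 * returns true if the lock was free and now belongs to the caller. */
bool try_acquire(void) {
    int key = 1;
    key = atomic_exchange(&lock_word, key);   /* key receives the old value */
    return key == 0;
}

/* The slide's inner loop: repeat the swap until the old value was 0. */
void acquire(void) {
    while (!try_acquire())
        ;
}

void release(void) {
    atomic_store(&lock_word, 0);
}
```

A second try_acquire() before release() returns false, because the exchange hands back the 1 that the first caller stored.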

Semaphores

Semaphore S: integer variable
– Can only be accessed via two atomic operations

    wait(S):
        while (S <= 0) ;
        S--;

    signal(S):
        S++;

Critical Section of n Threads

Shared data

    semaphore mutex;    /* initially mutex = 1 */

Thread Ti

    do {
        wait(mutex);
            critical section
        signal(mutex);
            remainder section
    } while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
– Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations
– suspend() suspends the thread that invokes it
– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
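A blocking semaphore along these lines can be sketched with a pthread mutex and condition variable, where the condition variable plays the role of the waiting list and of suspend()/resume(). This variant is an assumption of the sketch, not the slide's exact scheme: the value never goes negative, and waiters are counted implicitly by the condition variable. The csem names are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int             value;    /* never negative in this variant */
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;  /* stands in for the waiting list S.L */
} csem;

void csem_init(csem *s, int value) {
    assert(value >= 0);
    s->value = value;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

void csem_wait(csem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);   /* suspend() */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void csem_signal(csem *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);               /* resume(T) */
    pthread_mutex_unlock(&s->lock);
}

int csem_value(csem *s) {
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

A thread that calls csem_wait when the value is 0 sleeps inside pthread_cond_wait and consumes no CPU time, which is the point of replacing the busy-waiting definition.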

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
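The ordering pattern can be tried with POSIX unnamed semaphores, where sem_wait and sem_post correspond to wait and signal. This sketch assumes a platform that supports sem_init (for example Linux); order_demo and the trace buffer are illustrative additions.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag;            /* initialized to 0 in order_demo() */
static char  trace[3];        /* records the order in which A and B ran */
static int   pos;

static void *thread_i(void *arg) {
    (void)arg;
    trace[pos++] = 'A';       /* statement A */
    sem_post(&flag);          /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg) {
    (void)arg;
    sem_wait(&flag);          /* wait(flag): blocks until A is done */
    trace[pos++] = 'B';       /* statement B */
    return NULL;
}

/* Start Tj before Ti on purpose; the semaphore still forces A before B. */
const char *order_demo(void) {
    pthread_t ti, tj;
    int rc;
    pos = 0;
    memset(trace, 0, sizeof trace);
    rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace;
}
```

The returned string is "AB" regardless of which thread the scheduler happens to run first.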

Two Types of Semaphores

Counting semaphore
– Integer value can range over an unrestricted domain

Binary semaphore
– Integer value can range only between 0 and 1
– Can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
– Two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0            T1
    wait(S);      wait(Q);
    wait(Q);      wait(S);
    signal(S);    signal(Q);
    signal(Q);    signal(S);

Deadlock and Starvation (contd.)

Starvation: indefinite blocking
– A thread may never be removed from the semaphore queue in which it is suspended

Solution
– All code should acquire/release semaphores in the same order
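One common way to enforce a single acquisition order is to sort the locks by some fixed key before taking them. The sketch below uses pthread mutexes instead of semaphores and orders them by address; both choices, and the names lock_pair and unlock_pair, are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in a fixed global order (here: by address),
 * regardless of the order in which the caller names them.
 * Returns the mutex that was locked first. */
pthread_mutex_t *lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    assert(a != b);
    pthread_mutex_t *first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    pthread_mutex_t *second = (first == a) ? b : a;
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    return first;
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, T0 calling lock_pair(&S, &Q) and T1 calling lock_pair(&Q, &S) both take the same lock first, so the circular wait from the deadlock example cannot form.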

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
– Chapter 7 - Process Synchronization
– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & interrupt precedence

Spinlocks and kernel synchronization

Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
– Protects the system from the users
– Protects the user (process) from themselves
– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode (contd.)

Control flow (a thread) can change from user to kernel mode and back
– Does not affect scheduling
– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
– "Privileged Time" and "User Time"
– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
– Via the system service dispatch mechanism
– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
– Some threads in the system stay in kernel mode at all times, mostly in the "System" process
– Scheduled, preempted, etc., like any other threads

3. Interrupts from external devices
– Interrupt dispatcher invokes the interrupt service routine
– ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"
– ISR often requests the execution of a "DPC routine", which also runs in kernel mode
– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
– Switch from user to kernel mode
– Interrupts: asynchronous
– Exceptions: synchronous

[Figure: trap handlers]
– Interrupt -> interrupt dispatcher -> interrupt service routines
– System service call -> system service dispatcher -> system services
– HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
– Virtual address exceptions -> virtual memory manager's pager

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the kernel-mode interrupt dispatch routine runs and calls the interrupt service routine]

Interrupt dispatch routine
– Disable interrupts
– Record machine state (trap frame) to allow resume
– Mask equal- and lower-IRQL interrupts
– Find and call appropriate ISR
– Dismiss interrupt
– Restore machine state (including mode and enabled interrupts)

Interrupt service routine
– Tell the device to stop interrupting
– Interrogate device state, start next operation on device, etc.
– Request a DPC
– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
– Precedence of the interrupt with respect to other interrupts
– Different interrupt sources have different IRQLs
– Not the same as IRQ

IRQL is also a state of the processor
– Servicing an interrupt raises processor IRQL to that interrupt's IRQL
– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

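Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, highest to lowest; levels not legible in the transcript are shown as "..."]

    IRQL  Name                          Category
    31    High                          Hardware interrupts
    30    Power fail                    Hardware interrupts
    29    Interprocessor Interrupt      Hardware interrupts
    28    Clock                         Hardware interrupts
    ...   Profile & Synch (Srv 2003)    Hardware interrupts
    ...   Device 1                      Hardware interrupts
    2     Dispatch/DPC                  Deferrable software interrupts
    1     APC                           Deferrable software interrupts
    0     Passive/Low                   Normal thread execution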

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts processor (single line)
– Processor queries for interrupt vector; uses vector as index to IDT

Alpha
– PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel
– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn on/off ISR

Interrupt objects can synchronize access to ISR data
– Multiple instances of an ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected to the same interrupt level

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document, but has never been used

Inter-processor interrupt
– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
– Used to update system's clock, allocation of CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all; normal thread execution

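IRQLs on 64-bit Systems

[Figure: IRQL assignments on x64 and IA64, lowest to highest; "..." marks the device levels between Device 1 and Device n]

x64

    0     Passive/Low
    1     APC
    2     Dispatch/DPC
    3     Device 1
    ...   Device n
    12    Synch (Srv 2003)
    13    Clock
    14    Interprocessor Interrupt/Power
    15    High/Profile

IA64

    0     Passive/Low
    1     APC
    2     Dispatch/DPC & Synch (UP only)
    3     Correctable Machine Check
    4     Device 1
    ...   Device n
    12    Synch (MP only)
    13    Clock
    14    Interprocessor Interrupt
    15    High/Profile/Power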

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
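The two deterministic IRQL rules above are plain arithmetic and can be written down as small functions. This sketch reads the x64/IA64 rule as integer division of the vector number by 16; the function names are invented for illustration and are not kernel APIs.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int irql_x86_up(int irq) {
    assert(irq >= 0);
    return 27 - irq;
}

/* x64 / IA64 rule: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

Under the x86 rule a lower IRQ number yields a higher IRQL, so IRQ 0 maps to IRQL 27. Under the vector rule, 256 vectors fall into 16 groups, which matches the 0 to 15 IRQL range on the 64-bit systems.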

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B both run the code below against the spinlock that guards the shared DPC queue]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                        /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
– Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first processor in the queue is modified
– Exactly one processor is being signaled
– Pre-determined wait order
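The idea of waiting on a local flag and handing the lock to exactly one successor is also the core of the classic MCS queue lock. The sketch below is an MCS-style lock in C11 atomics, offered as an analogy and not as the Windows kernel's implementation; threads stand in for processors, all names are hypothetical, and sched_yield() is added to the wait loops so the demo stays fast when there are more threads than cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* One queue node per waiter; "locked" is the waiter-local flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_int              locked;
} qnode;

static _Atomic(qnode *) tail;     /* NULL means the lock is free */
static long total;
static long rounds;

static void q_acquire(qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, 1);
    qnode *prev = atomic_exchange(&tail, me);     /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))          /* wait on our own flag only */
            sched_yield();
    }
}

static void q_release(qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&tail, &expected, (qnode *)NULL))
            return;                               /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                        /* successor is still linking in */
    }
    atomic_store(&succ->locked, 0);               /* signal exactly one waiter */
}

static void *worker(void *arg) {
    (void)arg;
    qnode me;
    for (long k = 0; k < rounds; k++) {
        q_acquire(&me);
        total++;                                  /* critical section */
        q_release(&me);
    }
    return NULL;
}

/* Run nthreads workers (at most 16), each entering the lock n times. */
long qlock_demo(int nthreads, long n) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    atomic_store(&tail, (qnode *)NULL);
    total = 0;
    rounds = n;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return total;
}
```

Each waiter polls only its own node, so a release touches one successor's flag instead of a word that every waiter is hammering, and waiters are served in the order they joined the queue.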

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length of time they are held
– Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements (contd.)

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait
– All wait calls include an optional timeout argument
– Waiting threads consume no CPU time

Waiting (contd.)

Waitable objects include
– Events (may be auto-reset or manual reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signalled upon exit or terminate)
– Directories (change notification)

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects: outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. Labels: "Create and initialize thread object" (entering Initialized), "Thread waits on an object handle" (entering Waiting), "Wait is complete; set object to signaled state" (leaving Waiting)]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching (contd.)

Dispatcher re-schedules new thread: it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that moves it from nonsignaled to signaled, the event that moves it back, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– Nonsignaled -> signaled: owning thread releases mutex
– Signaled -> nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– Nonsignaled -> signaled: owning thread or other thread releases mutex
– Signaled -> nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– Nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
– Signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

What signals an object (contd.)

Event
– Nonsignaled -> signaled: a thread sets the event
– Signaled -> nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– Nonsignaled -> signaled: dedicated thread sets one event in the event pair
– Signaled -> nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– Nonsignaled -> signaled: timer expires
– Signaled -> nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– Nonsignaled -> signaled: thread terminates
– Signaled -> nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs) (contd.)

[Figure: DPC queue: queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down to Dispatch/DPC, APC, and Low; the DPC queue; and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs) (contd.)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make a thread suspend/terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable, but not documented

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs) (contd.)

[Figure: thread object with two APC queues, kernel (K) and user (U), each a list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 & not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
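The four accounting rules form a small decision function. The sketch below only models the rules as stated; the enum, the function name charge_tick, and its parameters are invented for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick is charged.
 * in_dpc only matters at IRQL 2; user_mode only matters below IRQL 2. */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;
}
```

Only the first and third rules charge a thread, which is why a system busy with interrupts and DPCs can show high CPU usage that no process accounts for.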

IRQLs and CPU Time Accounting (contd.)

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switch counts for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: DispatcherObject layout: a header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"

- some exist exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- the wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object and Thread references, Key, Type, and a next link]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies the kernel object

- dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, just check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to an array identifying these objects

- bWaitAll: whether to wait for the first signaled object or for all objects
  When waiting for the first signaled object, the function returns its index (as WAIT_OBJECT_0 + index)

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;

while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in the same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
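As a rough POSIX counterpart to the Windows mutex example, the sketch below protects a shared counter with pthread_mutex_lock()/pthread_mutex_unlock() and wraps pthread_mutex_trylock() as a polling acquire. The names `run_counter_demo`, `try_acquire`, `NUM_THREADS` and `INCREMENTS` are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  10000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                /* shared, protected by lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);      /* blocks until the mutex is free */
        counter++;
        pthread_mutex_unlock(&lock);    /* gives up ownership */
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t t[NUM_THREADS];
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Non-blocking attempt: returns 1 if the mutex was acquired, 0 if it was busy. */
int try_acquire(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;
}
```

With the lock in place every increment is counted, so the demo returns NUM_THREADS * INCREMENTS.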

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases the semaphore count by 1

- ReleaseSemaphore() may increment the count by any value

- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- A manual-reset event can signal several threads simultaneously; it must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- An auto-reset event signals a single thread; the event is reset automatically

- fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal(): one thread

- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
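Because there is no exact pthread equivalent, a manual-reset event is usually approximated with a flag guarded by a mutex plus a condition variable. The sketch below is one such approximation; `event_t` and all `event_*` names are hypothetical, not part of any library.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event: flag + mutex + condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} event_t;

void event_init(event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(event_t *e)              /* comparable to SetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* wake all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(event_t *e)            /* comparable to ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(event_t *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

static event_t start_event;
static int released;                    /* protected by released_lock */
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts n waiters (n <= 16), sets the event once, returns how many were released. */
int run_event_demo(int n)
{
    pthread_t t[16];
    assert(n >= 0 && n <= 16);
    released = 0;
    event_init(&start_event, false);
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    event_set(&start_event);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    event_destroy(&start_event);
    return released;
}
```

The persistent flag is what makes this behave like a manual-reset event: threads that arrive after event_set() still pass, which a bare condition variable would not guarantee.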

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main, with prog1 and prog2 connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

A read on the pipe handle will block if the pipe is empty

A write to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */

if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */

StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */

StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,    /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */

WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same pipe name

- Server can respond to a client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine ("." means the local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing the handle to the last instance deletes the named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with the named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()

- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates a mailslot with CreateMailslot()

- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: the mailslot servers (App client 0 … App client n) each call CreateMailslot() on the "status" mailslot, ReadFile(hMS, &ServStat), and then connect to the server. The mailslot client (App Server) loops: Sleep(), CreateFile() on the "status" mailslot, WriteFile(hMS, &StatInfo). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0: immediate return

- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name: retrieve handle for a local mailslot

- \\host\mailslot\[path]name: retrieve handle for a mailslot on the specified host

- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Solution to Critical-Section Problem

Bounded Waiting

- It must not be possible for a thread requiring access to a critical section to be delayed indefinitely

- When no thread is in a critical section, any thread that requests entry must be permitted to enter without delay

Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj)

do {
    enter section
        critical section
    exit section
        remainder section
} while (1);

Threads may share some common variables to synchronize their actions

First Attempt: Algorithm 1

Shared variables

- Initialization: int turn = 0;

- turn == i means Ti can enter its critical section

Thread Ti

do {
    while (turn != i) ;    /* busy wait */
        critical section
    turn = j;
        remainder section
} while (1);

Satisfies mutual exclusion, but not progress

Second Attempt: Algorithm 2

Shared variables

- Initialization: int flag[2]; flag[0] = flag[1] = 0;

- flag[i] == 1 means Ti is ready to enter its critical section

Thread Ti

do {
    flag[i] = 1;
    while (flag[j] == 1) ;    /* busy wait */
        critical section
    flag[i] = 0;
        remainder section
} while (1);

Satisfies mutual exclusion, but not the progress requirement

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    turn = j;
    while ((flag[j] == 1) && turn == j) ;    /* busy wait */
        critical section
    flag[i] = 0;
        remainder section
} while (1);

Solves the critical-section problem for two threads
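On modern hardware and compilers, plain int variables are not enough for Peterson's algorithm, because loads and stores may be reordered. A minimal sketch using C11 sequentially consistent atomics and two POSIX threads is shown here; the function names, `ITERATIONS`, and the sched_yield() call in the wait loop are illustrative assumptions, not part of the algorithm as stated on the slide.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ITERATIONS 20000

static atomic_int flag[2];      /* flag[i] == 1: Ti wants to enter */
static atomic_int turn;         /* which thread waits when both want in */
static long shared_counter;     /* protected by the Peterson lock */

static void peterson_lock(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();          /* busy wait, but let the other thread run */
}

static void peterson_unlock(int i)
{
    atomic_store(&flag[i], 0);
}

static void *peterson_worker(void *arg)
{
    int i = *(int *)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        peterson_lock(i);
        shared_counter++;       /* critical section */
        peterson_unlock(i);
    }
    return NULL;
}

/* Runs T0 and T1 and returns the final counter value. */
long run_peterson_demo(void)
{
    pthread_t t[2];
    int id[2] = {0, 1};
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    shared_counter = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, peterson_worker, &id[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost and the demo returns 2 * ITERATIONS.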

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd)

Shared variables - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;    /* busy wait */
            flag[i] = 1;
        }
    }
        critical section
    turn = j;
    flag[i] = 0;
        remainder section
} while (1);

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

The holder of the smallest number enters the CS

If threads Ti and Tj receive the same number, then if i < j, Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

$(a, b) < (c, d)$ if $a < c$, or if $a = c$ and $b < d$

$\max(a_0, \ldots, a_{n-1})$ is a number $k$ such that $k \ge a_i$ for $i = 0, \ldots, n-1$

Shared data

int choosing[n];

int number[n];    /* the ticket */

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
    }
        critical section
    number[i] = 0;
        remainder section
} while (1);
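The two building blocks of the bakery algorithm, the lexicographic tuple comparison and the ticket draw, can be written as small helpers. This is a single-threaded sketch of those two pieces only, not the full lock; the names `ticket_less` and `next_ticket` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* (a, b) < (c, d) if a < c, or if a == c and b < d.
 * a, c are tickets; b, d are thread ids used as tie-breakers. */
bool ticket_less(int a, int b, int c, int d)
{
    return a < c || (a == c && b < d);
}

/* Draw a ticket: one more than the largest ticket currently held.
 * number[k] == 0 means thread k holds no ticket. */
int next_ticket(const int number[], int n)
{
    assert(n > 0);
    int max = number[0];
    for (int k = 1; k < n; k++)
        if (number[k] > max)
            max = number[k];
    return max + 1;
}
```

Two threads that read the same maximum concurrently draw the same ticket, which is exactly the case the thread-id tie-breaker in ticket_less() resolves.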

Mutual Exclusion - Hardware Support

Interrupt Disabling

- Concurrent threads cannot overlap on a uniprocessor

- A thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

- Test and Set instruction: read & write a memory location

- Exchange instruction: swap register and memory location

Problems with the Machine-Instruction Approach

- Busy waiting

- Starvation is possible

- Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;    /* busy wait */
        critical section
    lock = false;
        remainder section
} while (1);
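In portable C11, atomic_flag_test_and_set() provides the TestAndSet primitive: it sets the flag and returns its previous value in one atomic step. The sketch below builds a spinlock on it; the thread count, iteration count and function names are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define TAS_THREADS    4
#define TAS_ITERATIONS 10000

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;    /* clear == lock is free */
static long tas_counter;                           /* protected by tas_lock */

static void spin_acquire(void)
{
    /* Spin while the previous value was "set", i.e. someone else holds the lock. */
    while (atomic_flag_test_and_set(&tas_lock))
        ;                                          /* busy wait */
}

static void spin_release(void)
{
    atomic_flag_clear(&tas_lock);                  /* lock = false */
}

static void *tas_worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < TAS_ITERATIONS; k++) {
        spin_acquire();
        tas_counter++;                             /* critical section */
        spin_release();
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_tas_demo(void)
{
    pthread_t t[TAS_THREADS];
    tas_counter = 0;
    for (int i = 0; i < TAS_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, tas_worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_join(t[i], NULL);
    return tas_counter;
}
```

Unlike Peterson's algorithm this works for any number of threads, but it gives no bound on how long a particular thread may spin.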

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;

do {
    key = 1;
    while (key == 1) Swap(lock, key);
        critical section
    lock = 0;
        remainder section
} while (1);

Semaphores

Semaphore S: integer variable

Can only be accessed via two atomic operations

wait(S):
    while (S <= 0) ;    /* busy wait */
    S--;

signal(S):
    S++;

Critical Section of n Threads

Shared data

semaphore mutex;    /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
        critical section
    signal(mutex);
        remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoids busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }

Semaphore as a General Synchronization Tool

Execute B in Tj only after A has executed in Ti

Use semaphore flag, initialized to 0

Code

Ti              Tj
A               wait(flag)
signal(flag)    B
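A blocking semaphore and the "A before B" pattern can be sketched together with a mutex and a condition variable. This is an assumed variant, not the record-based definition itself: the condition variable stands in for the waiting list, and the value never goes negative (waiters are not counted in it). The names `csem_t`, `csem_*` and `run_order_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical blocking semaphore built on a mutex and a condition variable. */
typedef struct {
    int             value;
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
} csem_t;

void csem_init(csem_t *s, int initial)
{
    assert(initial >= 0);
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)                       /* suspend instead of spinning */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void csem_signal(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);           /* resume one waiter, if any */
    pthread_mutex_unlock(&s->lock);
}

static csem_t flag_sem;
static int step;    /* 0: nothing ran, 1: A ran, 2: B ran after A */

static void *thread_i(void *arg)
{
    (void)arg;
    step = 1;                   /* A */
    csem_signal(&flag_sem);
    return NULL;
}

static void *thread_j(void *arg)
{
    (void)arg;
    csem_wait(&flag_sem);
    if (step == 1)
        step = 2;               /* B, reached only after A */
    return NULL;
}

/* Starts Tj before Ti on purpose; returns 2 if B ran after A. */
int run_order_demo(void)
{
    pthread_t ti, tj;
    step = 0;
    csem_init(&flag_sem, 0);
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    return step;
}
```

Because the semaphore starts at 0, Tj blocks in csem_wait() no matter which thread is scheduled first, so B can never run ahead of A.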

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused only by one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
signal(S)       signal(Q)
signal(Q)       signal(S)

Starvation: indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution

- all code should acquire/release semaphores in the same order
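The ordering rule can be enforced mechanically. The sketch below swaps the semaphores S and Q for pthread mutexes (an assumption made for brevity) and always locks them in address order, so T0 and T1 cannot deadlock even though they name the locks in opposite order. `lock_both`, `unlock_both` and `ROUNDS` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 10000

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long work_done;      /* only touched while holding both S and Q */

/* Acquire two locks in one global order (by address), whatever
 * order the caller passes them in. */
static void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static void *t0(void *arg)
{
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        lock_both(&S, &Q);      /* T0 names S first */
        work_done++;
        unlock_both(&S, &Q);
    }
    return NULL;
}

static void *t1(void *arg)
{
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        lock_both(&Q, &S);      /* T1 names Q first, same real order */
        work_done++;
        unlock_both(&Q, &S);
    }
    return NULL;
}

/* Runs T0 and T1 to completion and returns the total work count. */
long run_ordering_demo(void)
{
    pthread_t a, b;
    work_done = 0;
    int rc0 = pthread_create(&a, NULL, t0, NULL);
    int rc1 = pthread_create(&b, NULL, t1, NULL);
    assert(rc0 == 0 && rc1 == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return work_done;
}
```

With the naive order from the T0/T1 table (each thread taking its first-named lock first), the same program could hang with each thread holding one lock and waiting for the other.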

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & interrupt precedence

Spinlocks and kernel synchronization

Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the users (processes) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times, mostly in the "System" process

- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time is not charged to the interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts: asynchronous

- Exceptions: synchronous

[Figure: trap sources and the handlers they are dispatched to]

Interrupt -> Interrupt dispatcher -> Interrupt service routines
System service call -> System service dispatcher -> System services
HW exceptions, SW exceptions -> Exception dispatcher -> Exception handlers
Virtual address exceptions -> Virtual memory manager's pager

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the ISR run in kernel mode]

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call the appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises the processor IRQL to that interrupt's IRQL

- this masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

[Figure: x86 IRQLs, highest to lowest]

Hardware interrupts
31  High
30  Power fail
29  Interprocessor interrupt
28  Clock
    Profile & Synch (Srv 2003)
    Device levels (Device 1 upward)

Deferrable software interrupts
 2  Dispatch/DPC
 1  APC

Normal thread execution
 0  Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts the processor (single line)

- Processor queries for the interrupt vector; uses the vector as an index into the IDT

Alpha

- PAL code (Privileged Architecture Library, the Alpha BIOS) determines the interrupt vector and calls the kernel

- Kernel uses the vector to index the IDT

After ISR execution, IRQL is lowered to its initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls the ISR with the interrupt object as parameter (HW cannot pass parameters to an ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn ISR on/off

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High: used when halting the system (via KeBugCheck())

Power fail: originated in the NT design document, but has never been used

Inter-processor interrupt: used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock: used to update the system's clock and to allocate CPU time to threads

Profile: used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that the kernel and device drivers generate

Passive

- No interrupt level at all; normal thread execution

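IRQLs on 64-bit Systems

[Figure: IRQL assignments on x64 and IA64, listed here from lowest to highest; the figure's level numbers run from 0 to 15]

x64: Passive/Low (0), APC (1), Dispatch/DPC (2), Device 1 … Device n, Synch (Srv 2003), Clock, Interprocessor Interrupt/Power, High/Profile (15)

IA64: Passive/Low (0), APC (1), Dispatch/DPC & Synch (UP only) (2), Correctable Machine Check, Device 1 … Device n, Synch (MP only), Clock, Interprocessor Interrupt, High/Profile/Power (15)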

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device
  Can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  Processors are assigned round robin for each interrupt vector
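The two deterministic mapping rules can be written as one-line functions. This sketch assumes the 16 classic IRQ lines (0 to 15) for the x86 uniprocessor rule and reads the x64/IA64 rule as integer division of the IDT vector number by 16; the function names are illustrative.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);      /* assumption: 16 IRQ lines */
    return 27 - irq;
}

/* x64 / IA64 rule: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

Under these rules a lower IRQ number maps to a higher IRQL on x86 UP systems, while on x64/IA64 each block of 16 consecutive vectors shares one IRQL.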

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- A spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B each run the same critical section against a DPC queue guarded by one spinlock]

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin
    remove DPC from queue
end

release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order
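One well-known way to get these properties is an MCS-style queue lock. The sketch below uses that technique in portable C11; it is an assumption that it resembles the Windows implementation in spirit, and the names mcs_node, mcs_lock, mcs_acquire and mcs_release are made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiting processor (here: per thread). */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool must_wait;              /* the "processor-local flag" */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;           /* last waiter in the queue, NULL if free */
} mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->must_wait, true);
    mcs_node *prev = atomic_exchange(&lock->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->must_wait))
            ;                           /* spin on our own flag only */
    }
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected, (mcs_node *)NULL))
            return;
        /* A waiter is in the middle of enqueueing; wait for its link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->must_wait, false);   /* signal exactly one waiter */
}
```

Each waiter spins on the flag in its own node, so a release touches only the next waiter's flag, and waiters are served in the order in which they joined the queue.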

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks (PFN or Page Frame Database lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas; allocating and mapping system page table entries; charging commitment of pages; allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time

Waiting

Waitable objects include:

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. A thread object is created and initialized; a thread waits on an object handle and enters Waiting; when the object is set to the signaled state, the wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher reschedules the new thread (it may preempt a running thread that has lower priority) and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. A queue head links a chain of DPC objects.]

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from high through Power failure, Dispatch/DPC and APC down to low, next to the DPC queue and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue
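The four steps can be modelled with a toy, single-threaded simulation. Everything here is hypothetical (cpu_state, queue_dpc, lower_irql, the numeric DISPATCH_LEVEL of 2); real DPCs are queued with kernel routines and run in kernel mode, which this sketch does not attempt to reproduce.

```c
#include <stddef.h>

#define DISPATCH_LEVEL 2            /* assumed numeric value for this model */

typedef void (*dpc_routine)(void *context);

typedef struct dpc {
    dpc_routine routine;
    void *context;
    struct dpc *next;
} dpc;

typedef struct {
    dpc *head, *tail;               /* FIFO DPC queue of one simulated CPU */
    int irql;                       /* current IRQL of that CPU */
} cpu_state;

/* Step 1: something running at high IRQL queues a DPC for later. */
static void queue_dpc(cpu_state *cpu, dpc *d) {
    d->next = NULL;
    if (cpu->tail) cpu->tail->next = d; else cpu->head = d;
    cpu->tail = d;
}

/* Steps 2-4: when IRQL is about to drop below dispatch level, drain the
   queue at DISPATCH_LEVEL first. Returns the number of DPCs executed. */
static int lower_irql(cpu_state *cpu, int new_irql) {
    int ran = 0;
    if (new_irql < DISPATCH_LEVEL) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->head) {
            dpc *d = cpu->head;
            cpu->head = d->next;
            if (!cpu->head) cpu->tail = NULL;
            d->routine(d->context);
            ran++;
        }
    }
    cpu->irql = new_irql;
    return ran;
}

/* Example DPC routine: increments the int that context points to. */
static void add_one(void *context) { (*(int *)context)++; }
```

In this model, lowering the IRQL only to dispatch level leaves the queue untouched; the queued routines run once the IRQL is about to fall below it.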

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

- Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 & not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
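The accounting rules above amount to a small decision function. The sketch below restates them in C; the enum, the function name charge_clock_tick and its parameters are illustrative assumptions, not kernel identifiers.

```c
/* Where one clock tick gets charged (names are made up for this sketch). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

/* irql:          IRQL that was interrupted by the clock ISR
   in_dpc:        nonzero if a DPC routine was executing
   was_user_mode: nonzero if the interrupted thread ran in user mode */
static charge_bucket charge_clock_tick(int irql, int in_dpc, int was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* irql < 2: ordinary thread execution */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Only the first and last branches can charge a tick to something other than the running thread or DPC time, which is why heavy interrupt load shows up outside any process's CPU time.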

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (not really processes)

- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g., one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout. Header fields: Size, Type, State, Wait list head; followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, chained to wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); each wait block is also linked from the wait list head of a dispatcher object (fields: Size, Type, State, Wait list head, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the guarded code raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec
  dwTimeout == 0 - no wait, check whether object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles - pointer to array identifying these objects

- bWaitAll - whether to wait for first signaled object or all objects. When waiting for any object, the function returns WAIT_OBJECT_0 plus the index of the first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
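For comparison with the Windows mutex example, here is a minimal pthreads sketch of the blocking and polling variants. The helper names add_locked and try_add and the global counter are assumptions made for this illustration.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking variant: waits until the mutex is available. */
static void add_locked(int local_value) {
    pthread_mutex_lock(&counter_lock);
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling variant: returns 1 if the update happened, 0 if the mutex was busy. */
static int try_add(int local_value) {
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;                   /* someone else holds the mutex */
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

A caller of try_add can do other work and retry later instead of blocking, which is the same trade-off as a zero timeout on the Windows side.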

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
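The counting behaviour can be tried out with POSIX semaphores, used here as a portable stand-in for the Windows object. The helper release_n is hypothetical: it only imitates the "increment by n and report the previous count" shape of ReleaseSemaphore(), and unlike the Windows call it is not atomic.

```c
#include <semaphore.h>

/* Hypothetical helper: posts the semaphore n times and stores the count
   observed just before the first post in *previous. Not atomic: another
   thread could change the count between sem_getvalue and sem_post. */
static int release_n(sem_t *s, int n, int *previous) {
    if (sem_getvalue(s, previous) != 0)
        return -1;
    for (int i = 0; i < n; i++)
        if (sem_post(s) != 0)
            return -1;
    return 0;
}
```

Each successful sem_wait() or sem_trywait() lowers the count by one; once the count is zero, sem_trywait() fails and sem_wait() would block, which corresponds to the nonsignaled state.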

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
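Because a condition variable keeps no state of its own, a manual-reset event has to be emulated with an explicit flag. The following is one possible sketch; the type manual_event and the functions event_init, event_set, event_reset, event_wait and release_two_waiters are invented names, and error handling is omitted.

```c
#include <pthread.h>
#include <stdbool.h>

/* Emulated manual-reset event: mutex + condition variable + state flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

static void event_init(manual_event *e, bool initial_state) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

static void event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;                 /* stays set until event_reset() */
    pthread_cond_broadcast(&e->cond);   /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

static void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event gate;

static void *gate_waiter(void *arg) {
    event_wait(&gate);
    *(int *)arg = 1;                    /* report that this waiter got through */
    return NULL;
}

/* Starts two waiters, sets the event once, returns how many were released. */
static int release_two_waiters(void) {
    pthread_t t1, t2;
    int woke1 = 0, woke2 = 0;
    event_init(&gate, false);
    pthread_create(&t1, NULL, gate_waiter, &woke1);
    pthread_create(&t2, NULL, gate_waiter, &woke2);
    event_set(&gate);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return woke1 + woke2;
}
```

The flag is what makes a later event_wait() return immediately while the event is still set, which a bare pthread_cond_broadcast() cannot provide.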

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main connects prog1 and prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
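The same one-way, byte-stream behaviour can be seen with the POSIX pipe() call, used here as a portable stand-in for CreatePipe(). The helper pipe_roundtrip is a made-up name, and the sketch assumes the message is small enough to fit in the pipe buffer so that the write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back within one process.
   Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size) {
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    ssize_t n = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);              /* no more writers: reader sees EOF after the data */
        fds[1] = -1;
        n = read(fds[0], out, out_size - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    if (fds[1] != -1)
        close(fds[1]);
    close(fds[0]);
    return n;
}
```

In a real producer/consumer setup the two ends would be handed to different processes, as in the redirection example that follows in the deck.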

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE,  /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same pipe name, each over its own instance

- Server can respond to a client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. - local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

- Closing handle to last instance deletes named pipe

Polling a pipe

- Nondestructive - is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many comm.)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

Locate a server via mailslot

Mailslot servers (App client 0 ... App client n), each running:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server); the message is sent periodically:

    while () {
        Sleep();
        hMS = CreateFile( "\\*\mailslot\status" );
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0 - immediate return

- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Initial Attempts to Solve Problem

Only 2 threads, T0 and T1

General structure of thread Ti (other thread Tj)

    do {
        enter section
            critical section
        exit section
            remainder section
    } while (1);

Threads may share some common variables to synchronize their actions

First Attempt: Algorithm 1

Shared variables

- Initialization: int turn = 0;

- turn == i: Ti can enter its critical section

Thread Ti

    do {
        while (turn != i) ;
            critical section
        turn = j;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not progress

Second Attempt: Algorithm 2

Shared variables

- Initialization: int flag[2]; flag[0] = flag[1] = 0;

- flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j] == 1) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Satisfies mutual exclusion, but not the progress requirement

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        turn = j;
        while ((flag[j] == 1) && turn == j) ;
            critical section
        flag[i] = 0;
            remainder section
    } while (1);

Solves the critical-section problem for two threads
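On modern compilers and processors the plain-variable pseudocode is not safe as written, because loads and stores may be reordered. A runnable sketch therefore needs sequentially consistent atomics, which is what the C11 defaults below provide. The names peterson_enter, peterson_exit and run_peterson and the iteration count are assumptions for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int flag[2];          /* flag[i] == 1: thread i wants to enter */
static atomic_int turn;             /* which thread must wait on a conflict */
static long shared_total;           /* protected by the Peterson lock */

static void peterson_enter(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);         /* give priority to the other thread */
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        ;                           /* busy-wait */
}

static void peterson_exit(int i) {
    atomic_store(&flag[i], 0);
}

static void *peterson_worker(void *arg) {
    int i = *(int *)arg;
    for (int k = 0; k < 20000; k++) {
        peterson_enter(i);
        shared_total++;             /* critical section */
        peterson_exit(i);
    }
    return NULL;
}

/* Runs thread 0 and thread 1 to completion; returns the final total. */
static long run_peterson(void) {
    pthread_t t[2];
    int id[2] = { 0, 1 };
    shared_total = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, peterson_worker, &id[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```

If mutual exclusion holds, no increment is lost and the total equals the number of iterations summed over both threads.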

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd)

Shared variables - initialization:

    int flag[2]; flag[0] = flag[1] = 0;
    int turn = 0;

Thread Ti

    do {
        flag[i] = 1;
        while (flag[j]) {
            if (turn == j) {
                flag[i] = 0;
                while (turn == j) ;
                flag[i] = 1;
            }
        }
            critical section
        turn = j;
        flag[i] = 0;
            remainder section
    } while (1);

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

$(a,b) < (c,d)$ if $a < c$, or if $a = c$ and $b < d$

$\max(a_0, \ldots, a_{n-1}) = k$ such that $k \ge a_i$ for $i = 0, \ldots, n-1$

Shared data

    int choosing[n];
    int number[n];      (the ticket)

Data structures are initialized to 0

Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1) ;
            while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
        }
            critical section
        number[i] = 0;
            remainder section
    } while (1);
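The pseudocode leaves the tuple comparison and the max computation abstract. A possible C11 rendering is sketched below, assuming a fixed thread count NTHREADS and using sequentially consistent atomics in place of plain shared variables; the function names are invented for this example.

```c
#include <stdatomic.h>

#define NTHREADS 4                      /* assumed fixed number of threads */

static atomic_int choosing[NTHREADS];   /* 1 while thread j picks a ticket */
static atomic_int number[NTHREADS];     /* ticket of thread j, 0 = none */

/* Lexicographic order on (ticket, thread id): (a,b) < (c,d). */
static int ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

static void bakery_lock(int i) {
    int max = 0;
    atomic_store(&choosing[i], 1);
    for (int j = 0; j < NTHREADS; j++) {
        int n = atomic_load(&number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&number[i], max + 1);  /* take the next ticket */
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < NTHREADS; j++) {
        while (atomic_load(&choosing[j]) == 1)
            ;                           /* wait until j has its ticket */
        while (atomic_load(&number[j]) != 0 &&
               ticket_less(atomic_load(&number[j]), j,
                           atomic_load(&number[i]), i))
            ;                           /* j is ahead of us in line */
    }
}

static void bakery_unlock(int i) {
    atomic_store(&number[i], 0);        /* give up the ticket */
}
```

The tie-break on thread id is what makes the order total when two threads read the same maximum and end up with equal tickets.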

Mutual Exclusion - Hardware Support

Interrupt Disabling

- Concurrent threads cannot overlap on a uniprocessor

- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

- Test and Set Instruction - read & write a memory location

- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach

- Busy waiting

- Starvation is possible

- Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock)) ;
            critical section
        lock = false;
            remainder section
    } while (1);

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
            critical section
        lock = 0;
            remainder section
    } while (1);
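In C11, the atomic Swap of a shared word with a thread-local key corresponds to atomic_exchange(). The sketch below mirrors the loop above; swap_acquire, swap_try_acquire and swap_release are names chosen for the example, not part of any library.

```c
#include <stdatomic.h>

/* Spins until the lock word was 0 at the moment of the exchange. */
static void swap_acquire(atomic_int *lock) {
    int key = 1;
    while (key == 1)
        key = atomic_exchange(lock, key);   /* lock gets key, key gets old lock */
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was already held. */
static int swap_try_acquire(atomic_int *lock) {
    return atomic_exchange(lock, 1) == 0;
}

static void swap_release(atomic_int *lock) {
    atomic_store(lock, 0);
}
```

Whichever thread's exchange reads back 0 is the one that changed the word from free to held, so at most one thread can leave the loop per release.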

Semaphores

Semaphore S - integer variable

can only be accessed via two atomic operations

    wait (S):
        while (S <= 0) ;
        S--;

    signal (S):
        S++;

Critical Section of n Threads

Shared data

    semaphore mutex;    (initially mutex = 1)

Thread Ti

    do {
        wait(mutex);
            critical section
        signal(mutex);
            remainder section
    } while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

    Ti                Tj
    A                 wait(flag)
    signal(flag)      B
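This ordering pattern can be tried with POSIX semaphores and two threads. In the sketch below, "A" and "B" are reduced to recording a letter in a trace buffer; all names (flag_sem, thread_ti, thread_tj, run_ordered) are assumptions for the illustration.

```c
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag_sem;              /* the semaphore "flag", initialized to 0 */
static char trace[3];               /* records the order of A and B */
static int trace_len;

static void record(char c) { trace[trace_len++] = c; }

static void *thread_ti(void *unused) {
    (void)unused;
    record('A');                    /* statement A */
    sem_post(&flag_sem);            /* signal(flag) */
    return NULL;
}

static void *thread_tj(void *unused) {
    (void)unused;
    sem_wait(&flag_sem);            /* wait(flag): blocks until A is done */
    record('B');                    /* statement B */
    return NULL;
}

/* Starts Tj before Ti to show that B still runs after A. */
static const char *run_ordered(void) {
    pthread_t ti, tj;
    trace_len = 0;
    memset(trace, 0, sizeof trace);
    sem_init(&flag_sem, 0, 0);
    pthread_create(&tj, NULL, thread_tj, NULL);
    pthread_create(&ti, NULL, thread_ti, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return trace;
}
```

Because the semaphore starts at 0, Tj cannot pass its wait until Ti has posted, regardless of which thread the scheduler runs first.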

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S);        wait(Q);
    wait(Q);        wait(S);
    signal(S);      signal(Q);
    signal(Q);      signal(S);

Deadlock and Starvation

Starvation - indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution

- all code should acquire/release semaphores in the same order
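One common way to enforce a single acquisition order is to sort the locks by address before taking them. The sketch below applies that idea to two pthread mutexes standing in for S and Q; first_in_order, lock_both and unlock_both are hypothetical helpers, and the two arguments are assumed to be distinct mutexes.

```c
#include <pthread.h>
#include <stdint.h>

/* Returns whichever mutex comes first in the global order (lower address). */
static pthread_mutex_t *first_in_order(pthread_mutex_t *x, pthread_mutex_t *y) {
    return ((uintptr_t)x < (uintptr_t)y) ? x : y;
}

/* Locks both mutexes in the global order, whatever order the caller names them in. */
static void lock_both(pthread_mutex_t *x, pthread_mutex_t *y) {
    pthread_mutex_t *first = first_in_order(x, y);
    pthread_mutex_t *second = (first == x) ? y : x;
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
}

static void unlock_both(pthread_mutex_t *x, pthread_mutex_t *y) {
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}
```

With this helper, a thread calling lock_both(&S, &Q) and another calling lock_both(&Q, &S) contend for the same first mutex, so the circular wait from the deadlock example cannot form.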

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt Dispatching

IRQLs & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects users (processes) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

Trap dispatching

[Figure: trap dispatching. Interrupts go to the interrupt dispatcher, which calls interrupt service routines; system service calls go to the system service dispatcher, which calls system services; hardware and software exceptions go to the exception dispatcher, which calls exception handlers; virtual address exceptions go to the virtual memory manager's pager.]

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts: asynchronous

- Exceptions: synchronous

Interrupt Dispatching

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

[Figure: an interrupt arrives while user- or kernel-mode code is running; the dispatch routine and the ISR run in kernel mode]

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

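Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, highest to lowest; level numbers are shown where the slide gives them]

31  High                          (hardware interrupts)
30  Power fail
29  Interprocessor interrupt
28  Clock
    Profile & Synch (Srv 2003)
    Device levels (Device 1 upward)
2   Dispatch/DPC                  (deferrable software interrupts)
1   APC
0   Passive/Low                   (normal thread execution)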

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to its initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn ISR on/off

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected with the same IRQL

Predefined IRQLs

High

- Used when halting the system (via KeBugCheck())

Power fail

- Originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update the system's clock and to allocate CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQLs 0 to 15 on x64 and IA64, listed here from lowest to highest]

x64: Passive/Low, APC, Dispatch/DPC, Device 1 ... Device n, Synch (Srv 2003), Clock, Interprocessor Interrupt/Power, High/Profile

IA64: Passive/Low, APC, Dispatch/DPC & Synch (UP only), Correctable Machine Check, Device 1 ... Device n, Synch (MP only), Clock, Interprocessor Interrupt, High/Profile/Power

Interrupt Prioritization & Delivery

IRQLs are determined as follows:

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to receive an interrupt?

- By default, any processor can receive an interrupt from any device (can be configured with the IntFilter utility in the Resource Kit)

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
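The two IRQL formulas above are plain integer arithmetic, sketched here for illustration only. The function names `irql_x86_up` and `irql_from_vector` are hypothetical, and the division is assumed to be integer division.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq) {
    assert(irq >= 0);
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16 (integer division assumed). */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

For example, under these rules IRQ 1 on an x86 uniprocessor maps to IRQL 26, and IDT vector 0x41 on x64 maps to IRQL 4.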

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts


Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- A spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient
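The "data cell plus atomic test-and-modify" idea can be sketched in user mode with C11 atomics. This is not the kernel's KSPIN_LOCK implementation; `demo_spinlock`, `spin_acquire`, `spin_release`, and `run_spin_demo` are hypothetical names for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* One flag is enough: set means owned, clear means free. */
typedef struct { atomic_flag held; } demo_spinlock;

static demo_spinlock queue_lock = { ATOMIC_FLAG_INIT };
static long counter;             /* stands in for the protected global data */

static void spin_acquire(demo_spinlock *l) {
    /* Atomic test-and-set: keep spinning while the previous value was "held". */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void spin_release(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        spin_acquire(&queue_lock);
        counter++;               /* critical section */
        spin_release(&queue_lock);
    }
    return NULL;
}

/* Runs up to 8 threads and returns the final counter value. */
long run_spin_demo(int nthreads, int iters) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Every waiting thread here hammers the same memory cell, which is the bus-contention problem that queued spinlocks address.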

Kernel Synchronization

Processor A and Processor B each execute:

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin
    remove DPC from queue      (critical section)
end

release_spinlock(DPC)

[Figure: both processors contend for the spinlock that guards the shared DPC queue]

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag

- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified

- Exactly one processor is being signaled

- Pre-determined wait order
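A queue-based lock of this kind can be sketched in user mode as an MCS-style lock with C11 atomics. This is an assumption-laden illustration, not the Windows queued spinlock itself: all names (`qnode`, `queued_lock`, `q_acquire`, `q_release`, `run_queued_demo`) are hypothetical, and `sched_yield()` is added to the wait loops only to keep the demo well-behaved when threads outnumber CPUs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool must_wait;          /* the waiter's own local flag */
} qnode;

typedef struct { _Atomic(qnode *) tail; } queued_lock;

static queued_lock demo_lock;       /* tail starts out NULL: lock is free */
static long counter;

static void q_acquire(queued_lock *l, qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->must_wait))        /* wait on our own flag only */
            sched_yield();
    }
}

static void q_release(queued_lock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* Nobody queued behind us: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;
        /* A successor is in the middle of linking itself; wait for the link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->must_wait, false);         /* signal exactly one waiter */
}

static void *worker(void *arg) {
    int iters = *(int *)arg;
    qnode me;                                       /* per-thread queue node */
    for (int i = 0; i < iters; i++) {
        q_acquire(&demo_lock, &me);
        counter++;                                  /* critical section */
        q_release(&demo_lock, &me);
    }
    return NULL;
}

/* Runs up to 8 threads and returns the final counter value. */
long run_queued_demo(int nthreads, int iters) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Each waiter spins on a flag in its own node, and release hands the lock to exactly one successor in arrival order, which matches the two properties listed on the slide.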

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of the use of spinlocks & the length of time they are held

- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks (PFN or Page Frame Database lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once ("all": all objects must be in the signaled state concurrently to resolve the wait)

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. A thread object is created and initialized; a thread that waits on an object handle enters the Waiting state; setting the object to the signaled state completes the wait.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching (contd)

Dispatcher reschedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled -> signaled: owning thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled -> signaled: owning thread or other thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource

- signaled -> nonsignaled: a thread acquires the semaphore and more resources are not available

- Effect: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Event

- nonsignaled -> signaled: a thread sets the event

- signaled -> nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- nonsignaled -> signaled: dedicated thread sets one event in the event pair

- signaled -> nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- nonsignaled -> signaled: timer expires

- signaled -> nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- nonsignaled -> signaled: thread terminates

- signaled -> nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs) (contd)

[Figure: the DPC queue head links a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, and Low, alongside the DPC queue and the dispatcher]

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

- Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make a thread suspend/terminate itself (env. subsystems)

APCs are delivered when the thread is in an alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs) (contd)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs) (contd)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs) (contd)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
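The four accounting rules can be read as a small decision function. The sketch below is illustrative only: the enum `charge_t`, the function `classify_tick`, and its parameters are hypothetical names, not the kernel's actual clock ISR code.

```c
#include <assert.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick is charged, following the slide's rules.
 * irql:       processor IRQL when the clock interrupt fired
 * in_dpc:     nonzero if a DPC routine was being processed
 * user_mode:  nonzero if the interrupted thread was running in user mode */
charge_t classify_tick(int irql, int in_dpc, int user_mode) {
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For instance, a tick that arrives at IRQL 2 outside any DPC routine is charged to the current thread's kernel time.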

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (not really processes)

- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g. one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a header with Size, Type, State, and Wait list head, followed by object-type-specific data]

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their wait blocks through WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type, and Next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

- dwTimeout == 0: no wait, check whether object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to array identifying these objects

- bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first one, the function returns the index of the first signaled object (relative to WAIT_OBJECT_0)

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

/* done and counter are global, shared by all threads */
volatile int done = 0, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else        /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
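The pthread calls listed above can be exercised in a short sketch. It mirrors the earlier Windows mutex example under the assumption of a POSIX system; the function names `run_adders` and `trylock_while_held` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter;             /* shared by all threads */

static void *adder(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        pthread_mutex_lock(&mutex);     /* blocks, like an INFINITE wait */
        counter++;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

/* Runs up to 8 adder threads and returns the final counter. */
long run_adders(int nthreads, int iters) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, adder, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}

static void *try_once(void *arg) {
    int rc = pthread_mutex_trylock(&mutex);  /* polls, like a zero timeout */
    if (rc == 0)
        pthread_mutex_unlock(&mutex);
    *(int *)arg = rc;
    return NULL;
}

/* Holds the mutex in the caller while another thread polls it; returns that thread's result. */
int trylock_while_held(void) {
    pthread_t t;
    int rc = 0;
    pthread_mutex_lock(&mutex);
    pthread_create(&t, NULL, try_once, &rc);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&mutex);
    return rc;
}
```

While the mutex is held, the polling thread gets `EBUSY` back immediately instead of blocking.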

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )
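Resource counting with a semaphore can be sketched with the POSIX counterpart of these calls. This is an analogy under stated assumptions, not the Windows API: `sem_trywait()` plays the role of a wait with zero timeout, and `sem_post()` releases one unit at a time, whereas ReleaseSemaphore() can add several. The function names are hypothetical.

```c
#include <assert.h>
#include <semaphore.h>

/* Create a semaphore with `initial` free slots and try to take `attempts` of them.
 * Returns how many attempts succeeded without blocking. */
int slots_granted(unsigned initial, int attempts) {
    sem_t s;
    int granted = 0;
    int rc = sem_init(&s, 0, initial);
    assert(rc == 0);
    for (int i = 0; i < attempts; i++)
        if (sem_trywait(&s) == 0)    /* succeeds only while count > 0 */
            granted++;
    sem_destroy(&s);
    return granted;
}

/* Create a semaphore with `initial` slots, release `releases` more, and report the count. */
int count_after_release(unsigned initial, int releases) {
    sem_t s;
    int value = -1;
    int rc = sem_init(&s, 0, initial);
    assert(rc == 0);
    for (int i = 0; i < releases; i++)
        sem_post(&s);                /* each post increments the count by 1 */
    sem_getvalue(&s, &value);
    sem_destroy(&s);
    return value;
}
```

With two slots and three attempts, only two waits are satisfied; the third finds the count at zero.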

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )
BOOL ResetEvent( HANDLE hEvent )
BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal(): one thread

- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
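Because pthreads has no exact equivalent to a manual-reset event, one is usually built from a mutex, a condition variable, and a flag. The sketch below is one such construction; the type `manual_event` and the functions `event_set`, `event_reset`, `event_wait`, and `release_all` are hypothetical names for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool signaled;      /* stays true until event_reset(), like a manual-reset event */
    int released;       /* number of waiters that have passed the event */
} manual_event;

static manual_event ev = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 };

void event_set(manual_event *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = true;
    pthread_cond_broadcast(&e->cv);     /* wake all waiting threads */
    pthread_mutex_unlock(&e->mu);
}

void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = false;
    pthread_mutex_unlock(&e->mu);
}

void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->mu);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cv, &e->mu);
    e->released++;
    pthread_mutex_unlock(&e->mu);
}

static void *waiter(void *arg) {
    event_wait((manual_event *)arg);
    return NULL;
}

/* Starts up to 8 waiters, sets the event once, and returns how many were released. */
int release_all(int nwaiters) {
    pthread_t t[8];
    assert(nwaiters >= 1 && nwaiters <= 8);
    event_reset(&ev);
    ev.released = 0;
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&t[i], NULL, waiter, &ev);
    event_set(&ev);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(t[i], NULL);
    return ev.released;
}
```

The flag is what makes the event "sticky": a thread that arrives after `event_set()` still passes, which a bare `pthread_cond_broadcast()` would not provide.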

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same pipe name, each over its own instance

- Server can respond to a client using the same instance

Pipe can be accessed over the network

- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (the "." means local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use the same flag settings for all instances of a named pipe

Named Pipes (contd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing the handle to the last instance deletes the named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

- Combines a WriteFile, ReadFile sequence

Convenience Functions (contd)

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

- Combines CreateFile, WriteFile, ReadFile, CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then returns ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates a mailslot with CreateMailslot()

- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in the network domain

Locate a server via mailslot

Mailslot servers: App client 0 ... App client n

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

Mailslot client: App server

while (...) {
    Sleep(...);
    hMS = CreateFile( "\\*\mailslot\status" );
    ...
    WriteFile( hMS, &StatInfo );
}

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0: immediate return

- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieves handle for local mailslot

- \\host\mailslot\[path]name - retrieves handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


88

First Attempt: Algorithm 1

Shared variables

- Initialization: int turn = 0;

- turn == i: Ti can enter its critical section

Thread Ti

do {
    while (turn != i) ;    /* busy-wait until it is our turn */
    critical section
    turn = j;
    remainder section
} while (1);

Satisfies mutual exclusion, but not progress: the threads must strictly alternate

99

Second Attempt: Algorithm 2

Shared variables

- Initialization: int flag[2]; flag[0] = flag[1] = 0;

- flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

do {
    flag[i] = 1;
    while (flag[j] == 1) ;    /* busy-wait while Tj is ready */
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Satisfies mutual exclusion, but not the progress requirement: if both threads set their flags at the same time, both wait forever

1010

Algorithm 3 (Peterson's Algorithm - 1981)

Combines the shared variables of algorithms 1 and 2 - initialization:

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    turn = j;
    while ((flag[j] == 1) && turn == j) ;    /* busy-wait */
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Solves the critical-section problem for two threads
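The two-thread protocol above can be tried out with a small program. The following is only a sketch under stated assumptions: it uses C11 sequentially consistent atomics (plain int variables would not be safe on modern compilers and CPUs), POSIX threads, and a sched_yield() call inside the wait loop. The names peterson_enter, peterson_leave, run_peterson_demo and ITERATIONS are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ITERATIONS 20000          /* arbitrary demo size (assumption) */

static atomic_int flag[2];        /* flag[i] == 1: Ti wants to enter */
static atomic_int turn;           /* which thread has priority on conflict */
static long shared_counter;       /* plain variable, protected by the protocol */

static void peterson_enter(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    /* Wait while the other thread wants in and it is its turn. */
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();            /* be polite while busy-waiting */
}

static void peterson_leave(int i) {
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg) {
    int i = *(int *)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        peterson_enter(i);
        shared_counter++;         /* critical section */
        peterson_leave(i);
    }
    return NULL;
}

/* Runs T0 and T1 to completion and returns the final counter value. */
long run_peterson_demo(void) {
    pthread_t t[2];
    int id[2] = {0, 1};
    shared_counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost and run_peterson_demo() returns 2 * ITERATIONS.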

1111

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

1212

Dekker's Algorithm (contd)

Shared variables - initialization:

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;              /* back off: Tj is favored */
            while (turn == j) ;       /* busy-wait */
            flag[i] = 1;
        }
    }
    critical section
    turn = j;
    flag[i] = 0;
    remainder section
} while (1);

1313

Bakery Algorithm (Lamport 1979)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

1414

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n - 1

Shared data

int choosing[n];

int number[n];    - the ticket

Data structures are initialized to 0

1515

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;    /* Tj is still picking a ticket */
        while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
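The tuple comparison and the max() step in the bakery algorithm are written in slide notation, not C. As an illustration only, they could be spelled out as below; the helper names ticket_less, next_ticket and must_wait_for are hypothetical, and the sketch covers just the arithmetic, not the memory-ordering guarantees a real multi-threaded implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* (a, b) < (c, d) if a < c, or if a == c and b < d */
bool ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

/* Next ticket: one more than the largest ticket currently held. */
int next_ticket(const int number[], size_t n) {
    assert(number != NULL);
    int max = 0;
    for (size_t k = 0; k < n; k++)
        if (number[k] > max)
            max = number[k];
    return max + 1;
}

/* Does Ti have to wait for Tj? (second while-loop of the algorithm) */
bool must_wait_for(const int number[], int i, int j) {
    return number[j] != 0 && ticket_less(number[j], j, number[i], i);
}
```

With number = {2, 0, 2}, threads T0 and T2 hold the same ticket, so the tie is broken by thread id: T2 must wait for T0, and T0 waits for nobody.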

1616

Mutual Exclusion - Hardware Support

Interrupt Disabling

- Concurrent threads cannot overlap on a uniprocessor

- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

- Test and Set Instruction: read and write a memory location

- Exchange Instruction: swap register and memory location

Problems with Machine-Instruction Approach

- Busy waiting

- Starvation is possible

- Deadlock is possible

1717

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

1818

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;    /* busy-wait until lock was false */
    critical section
    lock = false;
    remainder section
}
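In portable C, the closest standard counterpart to the TestAndSet instruction is atomic_flag_test_and_set from C11's <stdatomic.h>. A minimal spinlock sketch built on it follows; the type and function names (tas_lock, tas_acquire and so on) are hypothetical, chosen for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    atomic_flag locked;           /* set = held, clear = free */
} tas_lock;

void tas_init(tas_lock *l) {
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
}

/* A single TestAndSet step: true if this call acquired the lock. */
bool tas_try_acquire(tas_lock *l) {
    return !atomic_flag_test_and_set(&l->locked);
}

/* The slide's entry section: spin until TestAndSet returns false. */
void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set(&l->locked))
        ;                         /* busy-wait */
}

/* The slide's exit section: lock = false. */
void tas_release(tas_lock *l) {
    atomic_flag_clear(&l->locked);
}
```

A thread would bracket its critical section with tas_acquire() and tas_release(); tas_try_acquire() exposes one iteration of the loop, which is convenient for polling.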

1919

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;

do {
    key = 1;
    while (key == 1) Swap(lock, key);    /* spin until we swap out a 0 */
    critical section
    lock = 0;
    remainder section
}
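The same entry protocol can be sketched with C11's atomic_exchange, which stores a new value and returns the old one in a single atomic step, so it plays the role of Swap(lock, key) here. The names swap_lock_word, swap_once, swap_acquire and swap_release are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int swap_lock_word;     /* 0 = free, 1 = held (starts at 0) */

/* One Swap(lock, key): writes key into the lock, returns the old lock value. */
int swap_once(int key) {
    return atomic_exchange(&swap_lock_word, key);
}

/* Entry section: keep swapping a 1 in until a 0 comes back out. */
void swap_acquire(void) {
    int key = 1;
    while (key == 1)
        key = swap_once(key);
}

/* Exit section: lock = 0. */
void swap_release(void) {
    assert(atomic_load(&swap_lock_word) == 1);   /* only release a held lock */
    atomic_store(&swap_lock_word, 0);
}
```

While the lock is held, every contender swaps a 1 for a 1 and keeps spinning; the first swap after a release returns 0 and ends the loop.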

2121

Semaphores

Semaphore S: integer variable

Can only be accessed via two atomic operations

wait(S):
    while (S <= 0) ;    /* busy-wait */
    S--;

signal(S):
    S++;

2222

Critical Section of n Threads

Shared data

semaphore mutex;    (initially mutex = 1)

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoids busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;    /* list of waiting threads */
} semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
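One way to approximate this blocking semaphore in user space is with a POSIX mutex and condition variable. This is a sketch, not the kernel mechanism the slides describe: the condition variable stands in for the wait list S.L, and the extra wakeups counter (an addition of this sketch) guards against spurious wakeups and against a late arrival stealing a resume meant for a thread that is already waiting. All names (ksem, ksem_wait, ksem_signal) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;                /* as on the slide: negative = number of waiters */
    int wakeups;              /* resume() calls not yet consumed by a waiter */
    pthread_mutex_t m;        /* makes wait/signal atomic */
    pthread_cond_t c;         /* stands in for the list S.L */
} ksem;

void ksem_init(ksem *s, int initial) {
    assert(initial >= 0);
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->c, NULL);
}

void ksem_wait(ksem *s) {
    pthread_mutex_lock(&s->m);
    s->value--;
    if (s->value < 0) {
        /* "add this thread to S.L; suspend()" */
        do {
            pthread_cond_wait(&s->c, &s->m);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->m);
}

void ksem_signal(ksem *s) {
    pthread_mutex_lock(&s->m);
    s->value++;
    if (s->value <= 0) {
        /* "remove a thread T from S.L; resume(T)" */
        s->wakeups++;
        pthread_cond_signal(&s->c);
    }
    pthread_mutex_unlock(&s->m);
}
```

A waiting thread sleeps inside pthread_cond_wait() and consumes no CPU time, which is the point of replacing the busy-wait definition.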

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                Tj
A                 wait(flag)
signal(flag)      B
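The A-before-B pattern can be sketched with an unnamed POSIX semaphore and two threads. The statements A and B are represented by appending a letter to a small buffer; the names flag_sem, thread_i, thread_j and run_ordering_demo are hypothetical, and Tj is deliberately started first to show that the start order does not matter.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag_sem;        /* the slide's semaphore "flag", initially 0 */
static char order[3];         /* records the order in which A and B ran */
static int pos;

static void *thread_i(void *unused) {
    (void)unused;
    order[pos++] = 'A';       /* statement A */
    sem_post(&flag_sem);      /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused) {
    (void)unused;
    while (sem_wait(&flag_sem) == -1 && errno == EINTR)
        ;                     /* wait(flag), retried if interrupted */
    order[pos++] = 'B';       /* statement B */
    return NULL;
}

/* Starts Tj before Ti and returns the observed execution order. */
const char *run_ordering_demo(void) {
    pthread_t ti, tj;
    int rc;
    pos = 0;
    memset(order, 0, sizeof order);
    rc = sem_init(&flag_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tj, NULL, thread_j, NULL);
    assert(rc == 0);
    rc = pthread_create(&ti, NULL, thread_i, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return order;
}
```

Because the semaphore starts at 0, Tj blocks in wait(flag) until Ti has executed A and signalled, so the recorded order is always "AB".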

2626

Two Types of Semaphores

Counting semaphore

ndash integer value can range over an unrestricted domain

Binary semaphore

ndash integer value can range only between 0 and 1

ndash can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0                T1
wait(S)           wait(Q)
wait(Q)           wait(S)
signal(S)         signal(Q)
signal(Q)         signal(S)

2828

Deadlock and Starvation

Starvation: indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution (to the deadlock above)

- all code should acquire/release semaphores in the same order
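The "same order" rule can be enforced in code rather than by convention. The sketch below uses POSIX mutexes in place of the binary semaphores S and Q and orders them by address, which is one common convention (an assumption of this sketch; any fixed global order works). The helper names lock_pair and unlock_pair are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two locks in one fixed global order (lowest address first),
 * no matter in which order the caller names them. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release order does not matter for deadlock avoidance. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, T0 calling lock_pair(&S, &Q) and T1 calling lock_pair(&Q, &S) both take the locks in the same order, so the circular wait from the T0/T1 example cannot form.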

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

ndash Protects the system from the users

ndash Protects the user (process) from themselves

ndash System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

ndash Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts: asynchronous

- Exceptions: synchronous

[Figure: trap sources and the handlers they are dispatched to]

- Interrupt -> Interrupt dispatcher -> Interrupt service routines

- System service call -> System service dispatcher -> System services

- HW exceptions, SW exceptions -> Exception dispatcher -> Exception handlers

- Virtual address exceptions -> Virtual memory manager's pager

Interrupt Dispatching

An interrupt arrives while user- or kernel-mode code is running; the handling code runs in kernel mode

Interrupt dispatch routine

1. Disable interrupts

2. Record machine state (trap frame) to allow resume

3. Mask equal- and lower-IRQL interrupts

4. Find and call appropriate ISR

5. Dismiss interrupt

6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine

1. Tell the device to stop interrupting

2. Interrogate device state, start next operation on device, etc.

3. Request a DPC

4. Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

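Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest first. Only the level numbers 0-2 and 28-31 are legible in the source.]

IRQL    Level                           Class
31      High                            Hardware interrupts
30      Power fail                      Hardware interrupts
29      Interprocessor interrupt        Hardware interrupts
28      Clock                           Hardware interrupts
        Profile & Synch (Srv 2003)      Hardware interrupts
        Device levels (Device 1 up)     Hardware interrupts
2       Dispatch/DPC                    Deferrable software interrupts
1       APC                             Deferrable software interrupts
0       Passive/Low                     Normal thread execution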

4141

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected at the same IRQL

4343

Predefined IRQLs

High: used when halting the system (via KeBugCheck())

Power fail: originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update system's clock, allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Resource Kit)

4444

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest first]

IRQL    x64                                 IA64
15      High/Profile                        High/Profile/Power
14      Interprocessor interrupt/Power      Interprocessor interrupt
13      Clock                               Clock
12      Synch (Srv 2003)                    Synch (MP only)
...     Device n                            Device n
4       ...                                 Device 1
3       Device 1                            Correctable Machine Check
2       Dispatch/DPC                        Dispatch/DPC & Synch (UP only)
1       APC                                 APC
0       Passive/Low                         Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
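The two arithmetic rules for deriving an IRQL can be checked with a few lines of C. This is purely illustrative: the function names are hypothetical, the x64/IA64 rule is taken to be the IDT vector number divided by 16 (an assumption about the slide's intent), and the accepted input ranges (IRQ 0-15, vector 0-255) are likewise assumptions of this sketch.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ */
int irql_x86_up(int irq) {
    assert(irq >= 0 && irq <= 15);        /* assumed classic PIC IRQ range */
    return 27 - irq;
}

/* x64 / IA64 rule (as read here): IRQL = IDT vector number / 16 */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255); /* assumed 256-entry IDT */
    return vector / 16;
}
```

For example, IRQ 1 maps to IRQL 26 on an x86 uniprocessor, and IDT vector 0x41 maps to IRQL 4 under the divide-by-16 rule, so a lower IRQ number means a higher precedence on x86.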

4747

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

[Figure: Processor A and Processor B both want to remove a DPC from the DPC queue. Access to the queue is the critical section and is guarded by a spinlock. Each processor runs:]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag

- Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified

- Exactly one processor is being signaled

- Pre-determined wait order
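The queueing idea can be illustrated with an MCS-style lock written with C11 atomics. This is a sketch of the general technique only, not the Windows kernel's implementation; every name in it (mcs_node, mcs_lock, mcs_acquire, mcs_release) is hypothetical. Each waiter supplies its own node and spins only on the flag inside that node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool waiting;               /* the local flag this waiter spins on */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;          /* last waiter; NULL when the lock is free */
} mcs_lock;

void mcs_init(mcs_lock *l) {
    atomic_init(&l->tail, (mcs_node *)NULL);
}

void mcs_acquire(mcs_lock *l, mcs_node *me) {
    assert(l != NULL && me != NULL);
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->waiting, true);
    mcs_node *prev = atomic_exchange(&l->tail, me);    /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                 /* link behind predecessor */
        while (atomic_load(&me->waiting))
            ;                                          /* spin on own flag only */
    }
}

void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (mcs_node *)NULL))
            return;                                    /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                          /* successor is still linking in */
    }
    atomic_store(&succ->waiting, false);               /* signal exactly one waiter */
}
```

The release path touches only the successor's flag, which matches the two properties listed above: exactly one processor is signaled, and waiters are served in the order in which they joined the queue.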

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU; allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks (PFN or Page Frame Database lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include optional timeout argument

- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on dispatcher objects outside the kernel; interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. Annotations: "Create and initialize thread object" (entering Initialized), "Thread waits on an object handle" (entering Waiting), "Wait is complete; set object to signaled state" (leaving Waiting).]

5858

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system event that sets it to the signaled state, the event that returns it to the nonsignaled state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled -> signaled: owning thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled -> signaled: owning thread or other thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource

- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

6161

What signals an object (contd)

Event

- nonsignaled -> signaled: a thread sets the event

- signaled -> nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- nonsignaled -> signaled: dedicated thread sets one event in the event pair

- signaled -> nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- nonsignaled -> signaled: timer expires

- signaled -> nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- nonsignaled -> signaled: thread terminates

- signaled -> nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, a queue head followed by a linked list of DPC objects]

6666

Delivering a DPC

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

[Figure: interrupt dispatch table with the levels High, Power failure, Dispatch/DPC, APC and Low; the Dispatch/DPC entry leads to the dispatcher, which drains the DPC queue]

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding a list of APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
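The four accounting rules amount to a small decision function. The sketch below only restates them in C for clarity; the enum, the function name clock_tick_charge and its parameters are hypothetical, and irql stands for the IRQL of the code that the clock interrupt interrupted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,       /* thread's user time */
    CHARGE_THREAD_KERNEL,     /* thread's kernel time */
    CHARGE_DPC,               /* DPC time */
    CHARGE_INTERRUPT          /* interrupt time */
} charge_t;

/* Decide where one clock tick is charged. */
charge_t clock_tick_charge(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0 && irql <= 31);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;
}
```

For instance, a tick that lands while a device ISR runs at IRQL 5 is charged to interrupt time, not to the thread that happened to be interrupted.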

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (they are not really processes)

- Context switches shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

- E.g., one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

[Figure: dispatcher object layout, a header with Size, Type, State and Wait list head, followed by object-type-specific data]

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each wait block holds a List entry, Object, Thread, Key, Type and Next link. The wait blocks are also linked into the Wait list head of the dispatcher objects (header with Size, Type, State and Wait list head, followed by object-type-specific data) that they wait on.]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection ( &crit );

/* ... main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection ( &crit );
        counter += local_value;
    }
    /* leave exactly once per enter, even if an exception occurs */
    __finally { LeaveCriticalSection ( &crit ); }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

- dwTimeout == 0: no wait, check whether object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to array identifying these objects

- bWaitAll: whether to wait for first signaled object or all objects; when waiting for the first, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
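The blocking and polling variants can be contrasted in a short pthreads sketch that mirrors the shared-counter idea used in the Windows mutex example. The function names add_blocking and add_if_free are hypothetical; the mutex is a default (non-recursive) one, which is an assumption that matters for the trylock behavior.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter;               /* shared by all threads */

/* Blocking update: comparable to waiting with an INFINITE timeout. */
void add_blocking(int local_value) {
    int rc = pthread_mutex_lock(&counter_mutex);
    assert(rc == 0);
    (void)rc;
    counter += local_value;
    pthread_mutex_unlock(&counter_mutex);
}

/* Polling update: comparable to a wait with timeout 0.
 * Returns 1 if the counter was updated, 0 if the mutex was busy. */
int add_if_free(int local_value) {
    if (pthread_mutex_trylock(&counter_mutex) != 0)
        return 0;                 /* busy: the caller may retry later */
    counter += local_value;
    pthread_mutex_unlock(&counter_mutex);
    return 1;
}
```

add_if_free() never blocks, so a thread can do other work and retry later, at the cost of possibly never getting the lock under heavy contention.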

8888

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns old semaphore count
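Resource counting can be tried out with POSIX semaphores, which are used here as a stand-in for the Windows objects. Two differences are worth flagging: sem_post() always increments by exactly 1, so releasing n resources takes a loop, and reading the previous count with sem_getvalue() is a separate step rather than part of one atomic release. The helper names try_take and release_n are hypothetical.

```c
#include <assert.h>
#include <semaphore.h>

/* Take one resource without blocking; returns 1 on success, 0 if none is free. */
int try_take(sem_t *s) {
    return sem_trywait(s) == 0;
}

/* Release n resources; returns the count observed just before the release.
 * Note: unlike ReleaseSemaphore(), this is not one atomic operation. */
int release_n(sem_t *s, int n) {
    int previous = 0;
    assert(n > 0);
    sem_getvalue(s, &previous);
    for (int k = 0; k < n; k++)
        sem_post(s);
    return previous;
}
```

Starting from a count of 2, two try_take() calls succeed and a third fails; after release_n(&s, 3) three more resources are available again.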

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )

BOOL ResetEvent( HANDLE hEvent )

BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
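Since there is no exact equivalent, a manual-reset event is usually emulated with a flag guarded by a mutex and a condition variable. The sketch below is one such emulation, not a standard API: the type `manual_event` and the functions `event_init`, `event_set`, `event_reset`, `event_wait` and `run_event_demo` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type: a manual-reset "event" built from a flag,
   a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Sets the flag and wakes every waiter; the flag stays set. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Clears the flag so that later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the flag is set; the loop absorbs spurious wakeups. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

#define WAITERS 3                       /* hypothetical number of waiting threads */
static manual_event start_event;
static int released;                    /* how many waiters got through */
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts the waiters, signals once, and returns how many were released. */
int run_event_demo(void)
{
    pthread_t tid[WAITERS];
    released = 0;
    event_init(&start_event, false);
    for (int i = 0; i < WAITERS; i++) {
        int rc = pthread_create(&tid[i], NULL, waiter, NULL);
        assert(rc == 0);
    }
    event_set(&start_event);            /* one signal releases all waiters */
    for (int i = 0; i < WAITERS; i++)
        pthread_join(tid[i], NULL);
    return released;
}
```

The persistent flag is what the bare condition variable lacks: a thread that calls `event_wait()` after `event_set()` still passes, which matches the behavior of a manual-reset event until it is reset.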

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main connects prog1 and prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

9494

IO Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

9595

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

9696

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same pipe name

- Server can respond to client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on remote machine (. - local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

- Closing handle to last instance deletes named pipe

Polling a pipe

- Nondestructive - is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

9999

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile, ReadFile sequence in one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines CreateFile, WriteFile, ReadFile, CloseHandle in one call

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if client connected between CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
            (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0..n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 ... App client n)

hMS = CreateMailslot( "\\.\mailslot\status" ); ReadFile( hMS, &ServStat ); /* connect to server */

Mailslot client (App server) - message is sent periodically

while (...) { Sleep(...); hMS = CreateFile( "\\*\mailslot\status" ); ... WriteFile( hMS, &StatInfo ); }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is max message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0 - immediate return

- MAILSLOT_WAIT_FOREVER - infinite wait

109109

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


99

Second Attempt: Algorithm 2

Shared variables

- initialization: int flag[2]; flag[0] = flag[1] = 0;

- flag[i] == 1: Ti is ready to enter its critical section

Thread Ti

do {
    flag[i] = 1;
    while (flag[j] == 1) ;
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Satisfies mutual exclusion, but not the progress requirement: if both threads set their flags at the same time, each waits for the other forever

1010

Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    turn = j;
    while ((flag[j] == 1) && turn == j) ;
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Solves the critical-section problem for two threads
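On modern compilers and CPUs the plain-variable pseudocode is not safe as written, because loads and stores may be reordered. The following sketch, an adaptation rather than the slide's code, uses sequentially consistent C11 atomics for `flag` and `turn`. The names `peterson_enter`, `peterson_leave`, `run_peterson_demo` and `ROUNDS` are hypothetical, and `sched_yield()` is added in the wait loop only so that the demo also finishes quickly on a single core.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define ROUNDS 20000            /* hypothetical iterations per thread */

static atomic_int flag[2];      /* flag[i] == 1: Ti wants to enter */
static atomic_int turn;         /* which thread has to wait on a conflict */
static long shared_counter;     /* protected by the Peterson lock */

static void peterson_enter(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    /* Wait while the other thread wants in and it is its turn. */
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();
}

static void peterson_leave(int i)
{
    atomic_store(&flag[i], 0);
}

static void *thread_body(void *arg)
{
    int i = *(int *)arg;
    for (int k = 0; k < ROUNDS; k++) {
        peterson_enter(i);
        shared_counter++;       /* critical section */
        peterson_leave(i);
    }
    return NULL;
}

/* Runs T0 and T1 and returns the final counter value. */
long run_peterson_demo(void)
{
    pthread_t tid[2];
    int ids[2] = {0, 1};
    shared_counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, thread_body, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost and `run_peterson_demo()` returns `2 * ROUNDS`.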

1111

Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

1212

Dekker's Algorithm (contd)

Shared variables - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;
            flag[i] = 1;
        }
    }
    critical section
    turn = j;
    flag[i] = 0;
    remainder section
} while (1);

1313

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

1414

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a_0, ..., a_{n-1}) is a number k such that k >= a_i for i = 0, ..., n - 1

Shared data

int choosing[n];

int number[n];   - the ticket

Data structures are initialized to 0

1515

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
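The bakery pseudocode can be turned into a small test program. This is a sketch under assumptions, not Lamport's original presentation: it uses sequentially consistent C11 atomics for `choosing` and `number`, fixes the number of threads with a hypothetical constant `N`, and calls `sched_yield()` inside the wait loops only so that the demo does not crawl on a single core. The names `bakery_lock`, `bakery_unlock` and `run_bakery_demo` are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define N      3                /* hypothetical number of threads */
#define ROUNDS 5000             /* hypothetical iterations per thread */

static atomic_int choosing[N];  /* 1 while thread i is picking a ticket */
static atomic_int number[N];    /* ticket of thread i, 0 = not interested */
static long shared_counter;     /* protected by the bakery lock */

static void bakery_lock(int i)
{
    /* Take a ticket one larger than every ticket currently visible. */
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int n = atomic_load(&number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        /* Wait until thread j has finished picking its ticket. */
        while (atomic_load(&choosing[j]) == 1)
            sched_yield();
        /* Wait while (number[j], j) < (number[i], i) lexicographically. */
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            if (nj == 0 || !(nj < ni || (nj == ni && j < i)))
                break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *thread_body(void *arg)
{
    int i = *(int *)arg;
    for (int k = 0; k < ROUNDS; k++) {
        bakery_lock(i);
        shared_counter++;       /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

/* Runs N threads and returns the final counter value. */
long run_bakery_demo(void)
{
    pthread_t tid[N];
    int ids[N];
    shared_counter = 0;
    for (int i = 0; i < N; i++) {
        ids[i] = i;
        int rc = pthread_create(&tid[i], NULL, thread_body, &ids[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Ties on the ticket value are broken by thread index, exactly as in the 2-tuple ordering, so two threads with equal tickets never both pass the inner loop.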

1616

Mutual Exclusion - Hardware Support

Interrupt Disabling

- Concurrent threads cannot overlap on a uniprocessor

- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

- Test and Set Instruction - read & write a memory location

- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach

- Busy waiting

- Starvation is possible

- Deadlock is possible

1717

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

1818

Mutual Exclusion with Test-and-Set

Shared data - boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;
    critical section
    lock = false;
    remainder section
} while (1);
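C11 exposes a test-and-set primitive directly as `atomic_flag_test_and_set()`, so the scheme above can be sketched in portable code. The wrapper names `spin_lock`, `spin_unlock`, `run_tas_demo` and the constants are hypothetical; the `sched_yield()` call is a demo convenience and not part of the algorithm.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4           /* hypothetical worker count */
#define ROUNDS      20000       /* hypothetical iterations per worker */

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear == lock is free */
static long shared_counter;                   /* protected by lock */

static void spin_lock(void)
{
    /* test_and_set returns the old value: true means someone holds the lock. */
    while (atomic_flag_test_and_set(&lock))
        sched_yield();
}

static void spin_unlock(void)
{
    atomic_flag_clear(&lock);                 /* lock = false */
}

static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        spin_lock();
        shared_counter++;                     /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_tas_demo(void)
{
    pthread_t tid[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Nothing in this lock orders the waiters, so a thread can in principle lose the race indefinitely; this is the starvation problem listed for the machine-instruction approach.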

1919

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;

do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);
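The Swap-based entry protocol maps onto C11's `atomic_exchange()`, which stores a new value and returns the old one in a single atomic step. The sketch below assumes that mapping; `swap_try_lock`, `swap_lock` and `swap_unlock` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int lock;         /* 0 = free, 1 = held (static storage starts at 0) */

/* One Swap(lock, key) step with key = 1.
   Returns 1 if the lock was free and is now ours, 0 otherwise. */
int swap_try_lock(void)
{
    int key = atomic_exchange(&lock, 1);   /* key receives the old lock value */
    return key == 0;
}

/* The full entry protocol: repeat the swap until the old value was 0. */
void swap_lock(void)
{
    while (!swap_try_lock())
        ;                                   /* busy wait */
}

void swap_unlock(void)
{
    assert(atomic_load(&lock) == 1);        /* only a held lock may be released */
    atomic_store(&lock, 0);
}
```

A thread that finds `key == 1` after the exchange has changed nothing, since it wrote 1 over an existing 1, which is why retrying is harmless.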

2121

Semaphores

Semaphore S - integer variable

can only be accessed via two atomic operations

wait (S):
    while (S <= 0) ;
    S--;

signal (S):
    S++;

2222

Critical Section of n Threads

Shared data

semaphore mutex;   initially mutex = 1

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
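A user-level approximation of this blocking semaphore can be built from a pthread mutex and condition variable, which stand in for the atomicity of the operations and for suspend()/resume(). This is a sketch under those assumptions, not a kernel implementation: the wait list S.L is replaced by the condition variable's internal queue, and a `wakeups` counter (an addition that is not in the pseudocode) guards against spurious wakeups. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;                  /* may go negative: -value = number of waiters */
    int wakeups;                /* resume() tokens not yet consumed */
    pthread_mutex_t lock;       /* makes wait/signal atomic */
    pthread_cond_t  cond;       /* stands in for the list L plus suspend/resume */
} semaphore_rec;

void sema_init(semaphore_rec *s, int initial)
{
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
}

void sema_wait(semaphore_rec *s)
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        do {                                   /* suspend() */
            pthread_cond_wait(&s->cond, &s->lock);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void sema_signal(semaphore_rec *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {                       /* someone is waiting */
        s->wakeups++;
        pthread_cond_signal(&s->cond);         /* resume(T) */
    }
    pthread_mutex_unlock(&s->lock);
}

#define NUM_THREADS 4           /* hypothetical worker count */
#define ROUNDS      10000       /* hypothetical iterations per worker */
static semaphore_rec mutex_sem;
static long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        sema_wait(&mutex_sem);
        shared_counter++;                      /* critical section */
        sema_signal(&mutex_sem);
    }
    return NULL;
}

/* Uses the semaphore, initialized to 1, as a mutex; returns the final count. */
long run_sema_demo(void)
{
    pthread_t tid[NUM_THREADS];
    shared_counter = 0;
    sema_init(&mutex_sem, 1);
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Waiting threads sleep inside `pthread_cond_wait()` instead of spinning, which is the point of the record-based definition.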

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                Tj
...               ...
A                 wait(flag)
signal(flag)      B
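The same ordering constraint can be sketched with POSIX semaphores, where `sem_post()` plays the role of signal and `sem_wait()` the role of wait. The thread functions, the `trace` buffer and `run_order_demo` are hypothetical names; the example assumes a platform that supports unnamed semaphores via `sem_init()`.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t flag;              /* initialized to 0: B may not run yet */
static char  trace[2];          /* records the order in which A and B ran */
static int   trace_len;

static void *thread_i(void *arg)
{
    (void)arg;
    trace[trace_len++] = 'A';   /* statement A */
    sem_post(&flag);            /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg)
{
    (void)arg;
    while (sem_wait(&flag) != 0)
        ;                       /* wait(flag); retry if interrupted */
    trace[trace_len++] = 'B';   /* statement B */
    return NULL;
}

/* Returns 1 if A was executed before B. */
int run_order_demo(void)
{
    pthread_t ti, tj;
    int rc;
    trace_len = 0;
    rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tj, NULL, thread_j, NULL);   /* start the waiter first */
    assert(rc == 0);
    rc = pthread_create(&ti, NULL, thread_i, NULL);
    assert(rc == 0);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace_len == 2 && trace[0] == 'A' && trace[1] == 'B';
}
```

Tj is deliberately started first; it still cannot execute B until Ti has posted the semaphore.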

2626

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

Counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0                T1
wait(S)           wait(Q)
wait(Q)           wait(S)
...               ...
signal(S)         signal(Q)
signal(Q)         signal(S)

2828

Deadlock and Starvation

Starvation - indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution (for deadlock)

- all code should acquire/release semaphores in same order
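The ordering rule can be illustrated with two locks that every thread takes in the same order. In this sketch pthread mutexes stand in for the binary semaphores S and Q (an assumption made for brevity), and `worker`, `run_ordered_demo` and `ROUNDS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ROUNDS 20000            /* hypothetical iterations per thread */

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long operations;         /* protected by holding both S and Q */

/* Both threads run the same code: always S first, then Q.
   No thread can hold Q while waiting for S, so no cycle can form. */
static void *worker(void *arg)
{
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        pthread_mutex_lock(&S);
        pthread_mutex_lock(&Q);
        operations++;
        pthread_mutex_unlock(&Q);   /* release in reverse order */
        pthread_mutex_unlock(&S);
    }
    return NULL;
}

/* Runs T0 and T1 and returns the number of completed operations. */
long run_ordered_demo(void)
{
    pthread_t t0, t1;
    int rc;
    operations = 0;
    rc = pthread_create(&t0, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return operations;
}
```

If one thread instead took Q before S, the interleaving from the deadlock example could leave each thread holding one lock and waiting for the other.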

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking multithreading (including real-time threads) and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times, mostly in the "System" process

- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

- interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture executing thread

- Switch from user to kernel mode

- Interrupts - asynchronous

- Exceptions - synchronous

[Figure: trap handlers and the routines they dispatch to]

- Interrupt -> interrupt dispatcher -> interrupt service routines

- System service call -> system service dispatcher -> system services

- HW exceptions, SW exceptions -> exception dispatcher -> exception handlers

- Virtual address exceptions -> virtual memory manager's pager

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the kernel-mode interrupt dispatch routine runs and calls the interrupt service routine]

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- this masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

Interrupt Precedence via IRQLs (x86)

IRQL    Level                           Class
31      High                            Hardware interrupts
30      Power fail
29      Interprocessor Interrupt
28      Clock
27      Profile & Synch (Srv 2003)
...     Device n ... Device 1
2       Dispatch/DPC                    Deferrable software interrupts
1       APC
0       Passive/Low                     normal thread execution

4141

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected with one IRQL

4343

Predefined IRQLs

High - used when halting the system (via KeBugCheck())

Power fail - originated in the NT design document, but has never been used

Inter-processor interrupt

- used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update system's clock, allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all, normal thread execution

4545

IRQLs on 64-bit Systems

IRQL    x64                                 IA64
15      High/Profile                        High/Profile/Power
14      Interprocessor Interrupt/Power      Interprocessor Interrupt
13      Clock                               Clock
12      Synch (Srv 2003)                    Synch (MP only)
...     Device n                            Device n
4       ...                                 Device 1
3       Device 1                            Correctable Machine Check
2       Dispatch/DPC                        Dispatch/DPC & Synch (UP only)
1       APC                                 APC
0       Passive/Low                         Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device. Can be configured with IntFilter utility in Resource Kit

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
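The two closed-form rules for deriving an IRQL can be written out as arithmetic. This sketch assumes that the x64/IA64 rule is an integer division of the IDT vector number by 16; the function names `irql_x86_up` and `irql_from_vector` are hypothetical and do not correspond to kernel routines.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq)
{
    assert(irq >= 0);
    return 27 - irq;
}

/* x64 / IA64 rule (assumed to be integer division): IRQL = vector / 16.
   With 256 IDT vectors this yields the 16 levels 0..15. */
int irql_from_vector(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

For example, IRQ 1 on an x86 uniprocessor maps to IRQL 26, and vector 0xD1 (decimal 209) on a 64-bit system maps to IRQL 13.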

4747

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when kernel is deep within many layers of code

- Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

[Figure: Processor A and Processor B both run the code below against a DPC queue that is protected by a spinlock]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin                       /* critical section */
    remove DPC from queue
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag

- Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first waiting processor is modified

- Exactly one processor is being signaled

- Pre-determined wait order
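The idea of waiting on a processor-local flag can be illustrated with the classic MCS queue lock. This is an analogy under assumptions, not the Windows kernel's implementation: threads stand in for processors, each waiter brings its own queue node, and the names `mcs_node`, `mcs_lock`, `mcs_acquire`, `mcs_release` and `run_mcs_demo` are hypothetical. `sched_yield()` appears in the wait loops only to keep the demo fast on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_int locked;                 /* local flag this waiter spins on */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;          /* last waiter, NULL if lock is free */
} mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me)
{
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->locked, 1);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                /* link behind predecessor */
        while (atomic_load(&me->locked))              /* spin on own flag only */
            sched_yield();
    }
}

static void mcs_release(mcs_lock *l, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, (mcs_node *)NULL))
            return;
        /* A successor has joined but not linked itself yet; wait for the link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, 0);    /* signal exactly one waiter */
}

#define NUM_THREADS 4                  /* hypothetical worker count */
#define ROUNDS      10000              /* hypothetical iterations per worker */
static mcs_lock queue_lock;            /* static storage: tail starts as NULL */
static long shared_counter;

static void *worker(void *arg)
{
    mcs_node me;                       /* per-thread queue node */
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        mcs_acquire(&queue_lock, &me);
        shared_counter++;              /* critical section */
        mcs_release(&queue_lock, &me);
    }
    return NULL;
}

/* Runs the workers and returns the final counter value. */
long run_mcs_demo(void)
{
    pthread_t tid[NUM_THREADS];
    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

Each waiter polls only its own `locked` field, and the releaser clears the flag of exactly one successor, which gives the first-come, first-served wait order described above.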

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU. Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks: PFN or Page Frame Database lock

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when no contention. Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait

- All wait calls include optional timeout argument

- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual reset, may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signalled upon exit or terminate)

- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized (create and initialize thread object), Ready, Standby, Running, Waiting (thread waits on an object handle), Transition, Terminated. Setting the object to the signaled state completes the wait]

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt running thread if it has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled -> signaled: owning thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled -> signaled: owning thread or other thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource

- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

6161

What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event

- nonsignaled -> signaled: a thread sets the event

- signaled -> nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- nonsignaled -> signaled: dedicated thread sets one event in the event pair

- signaled -> nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- nonsignaled -> signaled: timer expires

- signaled -> nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- nonsignaled -> signaled: thread terminates

- signaled -> nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head -> DPC object -> DPC object -> DPC object]

6666

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low; a DPC queue holding DPC objects; the dispatcher]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: thread object with two APC queues, K (kernel mode) and U (user mode), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread's kernel time
- If IRQL > 2, charge to interrupt time
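
The four accounting rules above can be read as a small decision function. The sketch below is only an illustration: the enum, the function name, and the `in_user_mode` flag are hypothetical and are not part of any Windows interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Accounting buckets named on the slide; the enum itself is hypothetical. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

/* Decide where one clock tick is charged, given what the clock ISR interrupted. */
charge_bucket classify_tick(int irql, bool processing_dpc, bool in_user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL 0 or 1: ordinary thread execution, split by processor mode */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives at IRQL 2 outside a DPC routine lands in the current thread's kernel time, while the same tick inside a DPC routine is counted as DPC time.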


IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column
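
To make the quirk concrete, here is a toy simulation of tick-driven accounting. It is a sketch under simplifying assumptions (two threads, the "bursty" one always waits before the tick, the whole interval is charged to whoever is running at the tick); the struct and function names are made up for this example.

```c
#include <assert.h>

typedef struct {
    long ran_ms;      /* time the thread really spent on the CPU */
    long charged_ms;  /* time the clock-tick accounting attributes to it */
} cpu_usage;

/*
 * Simulate `intervals` clock intervals of `tick_ms` each. In every interval the
 * bursty thread runs first for `burst_ms` and then waits; the steady thread runs
 * for the rest of the interval and is the one on the CPU when the tick fires.
 */
void simulate_ticks(int intervals, int tick_ms, int burst_ms,
                    cpu_usage *bursty, cpu_usage *steady)
{
    assert(burst_ms >= 0 && burst_ms < tick_ms);
    bursty->ran_ms = bursty->charged_ms = 0;
    steady->ran_ms = steady->charged_ms = 0;
    for (int i = 0; i < intervals; i++) {
        bursty->ran_ms += burst_ms;
        steady->ran_ms += tick_ms - burst_ms;
        /* The tick charges the whole interval to the thread running right now. */
        steady->charged_ms += tick_ms;
    }
}
```

With 100 intervals of 10 msec and a 4 msec burst, the bursty thread really runs for 400 msec yet is charged nothing, while the steady thread is charged the full 1000 msec.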


Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object (see \ntddk\inc\ddk\ntddk.h): a header holding Size, Type, State and the wait list head, followed by object-type-specific data.]

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "Signaled" vs. "nonsignaled"
- When signaled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, chained to their wait blocks (list entry, object, thread, key, type, next link); each wait block refers to the dispatcher object being waited on (Size, Type, State, wait list head, object-type-specific data).]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  - dwTimeout == 0 - no wait, check whether object is signaled
  - dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles - pointer to array identifying these objects
- bWaitAll - whether to wait for first signaled object or all objects
  - Function returns index of first signaled object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object


Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );
BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
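
The difference between the blocking and the polling call can be seen in a few lines of pthreads code. This is a minimal sketch; the helper names (`poll_mutex`, `trylock_demo`) are invented for the example, and error handling is reduced to assertions.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs in a second thread: polls the mutex once and reports the result code. */
static void *poll_mutex(void *out)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc == 0)
        pthread_mutex_unlock(&demo_mutex);  /* we got it, so give it back */
    *(int *)out = rc;
    return NULL;
}

static int poll_from_other_thread(void)
{
    pthread_t t;
    int rc = -1;
    int err = pthread_create(&t, NULL, poll_mutex, &rc);
    assert(err == 0);
    (void)err;
    pthread_join(t, NULL);
    return rc;
}

/* Returns 1 if trylock reported EBUSY while the mutex was held and 0 after release. */
int trylock_demo(void)
{
    pthread_mutex_lock(&demo_mutex);            /* blocking acquire */
    int while_held = poll_from_other_thread();  /* polling attempt fails at once */
    pthread_mutex_unlock(&demo_mutex);
    int after_release = poll_from_other_thread();
    return while_held == EBUSY && after_release == 0;
}
```

The polling thread never blocks: it either gets the mutex immediately or receives `EBUSY`, which mirrors a wait with a zero timeout on the Windows side.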


Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() reports the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );
BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
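
A manual-reset event can still be approximated on top of pthreads by pairing a condition variable with an explicit state flag. The sketch below shows one common way to do this; the `manual_event` type and its functions are hypothetical names, not a POSIX or Windows API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* the persistent state a bare condition variable lacks */
} manual_event;

void manual_event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void manual_event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release every current waiter */
    pthread_mutex_unlock(&e->lock);
}

void manual_event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void manual_event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event gate;
static int released;                    /* protected by released_lock */
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *unused)
{
    (void)unused;
    manual_event_wait(&gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts three waiters, sets the event once, and returns how many got through. */
int manual_event_demo(void)
{
    pthread_t t[3];
    released = 0;
    manual_event_init(&gate, false);
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    manual_event_set(&gate);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    manual_event_wait(&gate);           /* still signaled, so this returns at once */
    return released;
}
```

Because the flag stays set until `manual_event_reset()` is called, threads that arrive after the broadcast still pass, which is the behavior a plain `pthread_cond_broadcast()` does not give.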


Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe from prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
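
The same one-way, byte-stream behavior can be tried out with the POSIX `pipe()` call, which is the closest UNIX analog of CreatePipe(). This is a sketch of the analog, not of the Windows API; `pipe_roundtrip` is a made-up helper that writes a short message and reads it back in one process.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Push `msg` through a pipe and read it back into `out` (capacity `cap`).
 * Returns the number of bytes read, or -1 on error. The message is assumed
 * to be short enough to fit in the pipe buffer, so the write cannot block.
 */
long pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fd[2];                       /* fd[0]: read end, fd[1]: write end */
    assert(cap > 0);
    if (pipe(fd) != 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);                    /* no writers left: the reader will see EOF */

    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fd[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    close(fd[0]);
    out[total] = '\0';
    return (long)total;
}
```

Data flows only from the write end to the read end; a two-way conversation needs a second pipe, just as with anonymous pipes on Windows.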


I/O Redirection Using an Anonymous Pipe

/* Create default-size anonymous pipe; handles are inheritable */
if (!CreatePipe( &hReadPipe, &hWritePipe, &PipeSA, 0 )) {
    fprintf( stderr, "Anon pipe create failed\n" );
    exit( 1 );
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle( STD_INPUT_HANDLE );
StartInfoCh1.hStdError = GetStdHandle( STD_ERROR_HANDLE );
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess( NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1 )) {
    fprintf( stderr, "CreateProc1 failed\n" );
    exit( 2 );
}
CloseHandle( hWritePipe );

Pipe Example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle( STD_ERROR_HANDLE );
StartInfoCh2.hStdOutput = GetStdHandle( STD_OUTPUT_HANDLE );
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess( NULL, (LPTSTR)targv, NULL, NULL, TRUE /* Inherit handles */,
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2 )) {
    fprintf( stderr, "CreateProc2 failed\n" );
    exit( 3 );
}
CloseHandle( hReadPipe );

/* Wait for both processes to complete */
WaitForSingleObject( ProcInfo1.hProcess, INFINITE );
WaitForSingleObject( ProcInfo2.hProcess, INFINITE );
CloseHandle( ProcInfo1.hThread ); CloseHandle( ProcInfo1.hProcess );
CloseHandle( ProcInfo2.hThread ); CloseHandle( ProcInfo2.hProcess );
return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );
- WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
- CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: Eliminate the Polling Loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

WaitNamedPipe( ServerPipeName, NMPWAIT_WAIT_FOREVER );
hNamedPipe = CreateFile( ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf( stderr, "Failure to locate server\n" );
    exit( 3 );
}

/* Write the request */
WriteFile( hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL );

/* Read each response and send it to std out */
while (ReadFile( hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL ))
    printf( "%s", ResponseRecord );

CloseHandle( hNamedPipe );
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe( SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA );

while (!Done) {
    printf( "Server is awaiting next request\n" );
    if (!ConnectNamedPipe( hNamedPipe, NULL )
            || !ReadFile( hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL )) {
        fprintf( stderr, "Connect or Read Named Pipe error\n" );
        exit( 4 );
    }
    printf( "Request is: %s\n", RequestRecord );

    /* Send the file, one line at a time, to the client */
    fp = fopen( File, "r" );
    while (fgets( ResponseRecord, MAX_RQRS_LEN, fp ) != NULL)
        WriteFile( hNamedPipe, &ResponseRecord,
                (strlen( ResponseRecord ) + 1) * TSIZE, &nXfer, NULL );
    fclose( fp );
    DisconnectNamedPipe( hNamedPipe );
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a Server via Mailslot

[Figure: the application server acts as mailslot client and sends its status periodically; application clients 0 through n act as mailslot servers and read it.]

Mailslot servers (app client 0 … app client n)

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

Mailslot client (app server); message is sent periodically

while (...) {
    Sleep( ... );
    hMS = CreateFile( "\\*\mailslot\status" );
    WriteFile( hMS, &StatInfo );
}

Creating a Mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a Mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Algorithm 3 (Peterson's Algorithm - 1981)

Shared variables of algorithms 1 and 2 - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti (Tj is the other thread)

do {
    flag[i] = 1;
    turn = j;
    while ((flag[j] == 1) && turn == j) ;   /* busy wait */
    critical section
    flag[i] = 0;
    remainder section
} while (1);

Solves the critical-section problem for two threads
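
On modern hardware the pseudocode above only works if the loads and stores of `flag` and `turn` are not reordered. The sketch below is one way to try it out with C11 sequentially consistent atomics and pthreads; the demo harness, the iteration count, and the `sched_yield()` call inside the busy-wait loop are additions for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define ITERATIONS 20000

static atomic_int flag[2];      /* flag[i] == 1: thread i wants to enter */
static atomic_int turn;         /* which thread has priority on a conflict */
static long shared_counter;     /* protected only by Peterson's algorithm */

static void peterson_enter(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    atomic_store(&turn, j);
    while (atomic_load(&flag[j]) == 1 && atomic_load(&turn) == j)
        sched_yield();          /* busy wait; yielding is a courtesy, not part of the algorithm */
}

static void peterson_leave(int i)
{
    atomic_store(&flag[i], 0);
}

static void *worker(void *arg)
{
    int i = *(int *)arg;
    for (int k = 0; k < ITERATIONS; k++) {
        peterson_enter(i);
        shared_counter++;       /* critical section */
        peterson_leave(i);
    }
    return NULL;
}

/* Runs both threads to completion and returns the final counter value. */
long peterson_demo(void)
{
    pthread_t t[2];
    int id[2] = {0, 1};
    shared_counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, no increment is lost and the counter ends at twice the iteration count. With plain `int` variables instead of atomics, the compiler or CPU may reorder the accesses and the guarantee no longer holds.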


Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical-section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd.)

Shared variables - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;   /* busy wait */
            flag[i] = 1;
        }
    }
    critical section
    turn = j;
    flag[i] = 0;
    remainder section
} while (1);

Bakery Algorithm (Lamport 1979)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order
- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket #, thread id #)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, …, an-1) = k such that k >= ai for i = 0, …, n - 1

Shared data

int choosing[n];
int number[n];   /* the ticket */

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
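
The tuple comparison `(number[j], j) < (number[i], i)` is the only part of the pseudocode that does not map directly onto C. The sketch below spells it out for a fixed number of threads, again using C11 sequentially consistent atomics; `N`, `ROUNDS`, the demo harness, and the `sched_yield()` calls are assumptions of this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define N      3        /* number of threads (chosen for the demo) */
#define ROUNDS 5000

static atomic_int choosing[N];
static atomic_int number[N];    /* the ticket; 0 means "not interested" */
static long counter;            /* protected only by the bakery lock */

static void bakery_lock(int i)
{
    /* Take a ticket one larger than every ticket currently visible. */
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int n = atomic_load(&number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        while (atomic_load(&choosing[j]) == 1)
            sched_yield();
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            /* Wait while (number[j], j) < (number[i], i) in lexicographic order. */
            if (nj == 0 || nj > ni || (nj == ni && j >= i))
                break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *worker(void *arg)
{
    int i = *(int *)arg;
    for (int k = 0; k < ROUNDS; k++) {
        bakery_lock(i);
        counter++;              /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

/* Runs N threads to completion and returns the final counter value. */
long bakery_demo(void)
{
    pthread_t t[N];
    int id[N];
    counter = 0;
    for (int i = 0; i < N; i++) {
        atomic_store(&choosing[i], 0);
        atomic_store(&number[i], 0);
    }
    for (int i = 0; i < N; i++) {
        id[i] = i;
        int rc = pthread_create(&t[i], NULL, worker, &id[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Two threads can draw the same ticket when they compute the maximum at the same time; the thread id then breaks the tie, which is why the comparison is on the pair and not on the ticket alone.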


Mutual Exclusion - Hardware Support

Interrupt disabling
- Concurrent threads cannot overlap on a uniprocessor
- Thread will run until it performs a system call or an interrupt happens

Special atomic machine instructions
- Test-and-Set instruction - read & write a memory location
- Exchange instruction - swap register and memory location

Problems with the machine-instruction approach
- Busy waiting
- Starvation is possible
- Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;   /* busy wait */
    critical section
    lock = false;
    remainder section
} while (1);
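
C11 exposes this instruction portably as `atomic_flag_test_and_set()`. A minimal sketch of the same lock follows; the thread count, round count, demo harness, and the `sched_yield()` in the spin loop are choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define THREADS 4
#define ROUNDS  10000

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear means unlocked */
static long counter;                          /* protected by lock */

static void *worker(void *unused)
{
    (void)unused;
    for (int k = 0; k < ROUNDS; k++) {
        while (atomic_flag_test_and_set(&lock))   /* TestAndSet(lock) */
            sched_yield();                        /* busy wait */
        counter++;                                /* critical section */
        atomic_flag_clear(&lock);                 /* lock = false */
    }
    return NULL;
}

/* Runs all threads to completion and returns the final counter value. */
long tas_demo(void)
{
    pthread_t t[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Unlike Peterson's algorithm, this works for any number of threads, but nothing bounds how long a particular thread may lose the race for the flag.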


Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);
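
The Swap-based lock maps onto C11's `atomic_exchange()`, which stores the new value and returns the old one in a single atomic step. The following is a sketch of that mapping; everything around `swap_acquire()` and `swap_release()` (thread count, rounds, yielding while spinning) is demo scaffolding.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define THREADS 4
#define ROUNDS  10000

static atomic_int lock;     /* 0: free, 1: held */
static long counter;        /* protected by lock */

static void swap_acquire(void)
{
    int key = 1;
    while (key == 1) {
        /* Swap(lock, key): lock becomes 1, key receives the previous value of lock. */
        key = atomic_exchange(&lock, 1);
        if (key == 1)
            sched_yield();  /* somebody else holds the lock; keep spinning */
    }
}

static void swap_release(void)
{
    atomic_store(&lock, 0);
}

static void *worker(void *unused)
{
    (void)unused;
    for (int k = 0; k < ROUNDS; k++) {
        swap_acquire();
        counter++;          /* critical section */
        swap_release();
    }
    return NULL;
}

/* Runs all threads to completion and returns the final counter value. */
long swap_demo(void)
{
    pthread_t t[THREADS];
    counter = 0;
    atomic_store(&lock, 0);
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

A thread leaves the loop only when the value it swapped out was 0, meaning it was the one that changed the lock from free to held.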


Semaphores

Semaphore S - integer variable

Can only be accessed via two atomic operations

wait(S):
    while (S <= 0) ;   /* busy wait */
    S--;

signal(S):
    S++;

Critical Section of n Threads

Shared data

semaphore mutex;   /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
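
In user space, suspend() and resume() are usually borrowed from the threading library. The sketch below builds a blocking semaphore from a pthread mutex and condition variable; note that it differs from the pseudocode in one respect: the condition variable's internal queue plays the role of the list L, so `value` never goes negative. All names are invented for this example.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;                  /* never negative in this variant */
    pthread_mutex_t guard;      /* makes wait/signal atomic */
    pthread_cond_t  queue;      /* stands in for the list L of suspended threads */
} semaphore;

void semaphore_init(semaphore *s, int initial)
{
    assert(initial >= 0);
    s->value = initial;
    pthread_mutex_init(&s->guard, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void semaphore_wait(semaphore *s)
{
    pthread_mutex_lock(&s->guard);
    while (s->value == 0)                       /* suspend() instead of spinning */
        pthread_cond_wait(&s->queue, &s->guard);
    s->value--;
    pthread_mutex_unlock(&s->guard);
}

void semaphore_signal(semaphore *s)
{
    pthread_mutex_lock(&s->guard);
    s->value++;
    pthread_cond_signal(&s->queue);             /* resume(T) for one waiter, if any */
    pthread_mutex_unlock(&s->guard);
}

#define THREADS 4
#define ROUNDS  10000

static semaphore mutex_sem;
static long counter;            /* protected by mutex_sem */

static void *worker(void *unused)
{
    (void)unused;
    for (int k = 0; k < ROUNDS; k++) {
        semaphore_wait(&mutex_sem);
        counter++;              /* critical section */
        semaphore_signal(&mutex_sem);
    }
    return NULL;
}

/* Uses the semaphore as a mutex (initial value 1) and returns the final counter. */
long semaphore_demo(void)
{
    pthread_t t[THREADS];
    counter = 0;
    semaphore_init(&mutex_sem, 1);
    for (int i = 0; i < THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Waiting threads sleep inside `pthread_cond_wait()` and consume no CPU time, which is the point of replacing the busy-wait definition.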


Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                  Tj
A                   wait(flag)
signal(flag)        B
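
The two-column schedule above can be run directly with POSIX unnamed semaphores, where `sem_post()` is signal and `sem_wait()` is wait. This is a small sketch; the `order` log and the decision to start Tj first are only there to make the ordering observable.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag;          /* initialized to 0: "A has not happened yet" */
static char order[2];       /* records which statement ran first and second */
static int next_slot;

static void *thread_i(void *unused)
{
    (void)unused;
    order[next_slot++] = 'A';   /* statement A */
    sem_post(&flag);            /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused)
{
    (void)unused;
    sem_wait(&flag);            /* wait(flag): blocks until A is done */
    order[next_slot++] = 'B';   /* statement B */
    return NULL;
}

/* Returns 1 if A was recorded before B. */
int ordering_demo(void)
{
    pthread_t ti, tj;
    int rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    (void)rc;
    next_slot = 0;
    pthread_create(&tj, NULL, thread_j, NULL);  /* start the waiter first on purpose */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return order[0] == 'A' && order[1] == 'B';
}
```

Even though Tj is created first, it cannot get past `sem_wait()` until Ti has executed A and posted the semaphore.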


Two Types of Semaphores

Counting semaphore
- Integer value can range over an unrestricted domain

Binary semaphore
- Integer value can range only between 0 and 1
- Can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
- Two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0                  T1
wait(S)             wait(Q)
wait(Q)             wait(S)
signal(S)           signal(Q)
signal(Q)           signal(S)

Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- All code should acquire/release semaphores in the same order
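
One way to enforce a single acquisition order is to funnel every two-lock acquisition through a helper that sorts the locks first. The sketch below uses pthread mutexes in place of the semaphores S and Q and orders them by address; the helper names and the address-based ordering are choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 10000

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long operations;     /* touched only while holding both S and Q */

/* Acquire both locks in one global order (lowest address first). */
static void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static void *t0(void *unused)
{
    (void)unused;
    for (int k = 0; k < ROUNDS; k++) {
        lock_both(&S, &Q);          /* asks for S then Q */
        operations++;
        unlock_both(&S, &Q);
    }
    return NULL;
}

static void *t1(void *unused)
{
    (void)unused;
    for (int k = 0; k < ROUNDS; k++) {
        lock_both(&Q, &S);          /* asks for Q then S; helper reorders */
        operations++;
        unlock_both(&Q, &S);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of operations done. */
long lock_order_demo(void)
{
    pthread_t a, b;
    operations = 0;
    pthread_create(&a, NULL, t0, NULL);
    pthread_create(&b, NULL, t1, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return operations;
}
```

Because both threads end up locking in the same order, the circular wait from the T0/T1 example cannot form. Calling `pthread_mutex_lock()` directly in the order each thread asks for would reintroduce the deadlock risk.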


Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable.

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

- Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982
- Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986
- Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
  - Chapter 7 - Process Synchronization
  - Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

- Trap and interrupt dispatching
- IRQL levels & interrupt precedence
- Spinlocks and kernel synchronization
- Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

[Figure: trap handlers route each kind of event to its dispatcher]
- Interrupt -> interrupt dispatcher -> interrupt service routines
- System service call -> system service dispatcher -> system services
- HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
- Virtual address exceptions -> virtual memory manager's pager

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the interrupt service routine both run in kernel mode.]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, highest first]

31  High
30  Power fail
29  Interprocessor interrupt
28  Clock
    Profile & Synch (Srv 2003)
    …
    Device 1
2   Dispatch/DPC
1   APC
0   Passive/Low

- Hardware interrupts: Device 1 up to High
- Deferrable software interrupts: APC and Dispatch/DPC
- Normal thread execution: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with an IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and to allocate CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQLs on x64 and IA64, numbered from 0 (Passive/Low) to 15 (High); levels are listed highest first.]

x64
- High/Profile
- Interprocessor interrupt/Power
- Clock
- Synch (Srv 2003)
- Device n
- …
- Device 1
- Dispatch/DPC
- APC
- Passive/Low

IA64
- High/Profile/Power
- Interprocessor interrupt
- Clock
- Synch (MP only)
- Device n
- …
- Device 1
- Correctable Machine Check
- Dispatch/DPC & Synch (UP only)
- APC
- Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device
  - Can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  - Processors are assigned round robin for each interrupt vector

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: processors A and B each run the code below against the same DPC queue, which is guarded by one spinlock; the begin/end block is the critical section.]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag

- Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified

- Exactly one processor is signaled

- Pre-determined wait order
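One well-known way to get "each waiter spins on its own flag" is an MCS-style queue lock. The sketch below uses C11 atomics and is an assumption-laden illustration of the idea, not the Windows queued spinlock implementation; `qnode`, `qlock` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool locked;             /* local flag this waiter spins on */
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

void qlock_acquire(qlock *l, qnode *me) {
    assert(l != NULL && me != NULL);
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL) return;                      /* lock was free */
    atomic_store(&prev->next, me);                 /* link behind predecessor */
    while (atomic_load(&me->locked)) { }           /* spin on own flag only */
}

void qlock_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL) { } /* waiter is linking */
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}
```

Waiters are served in the order in which they exchanged themselves into `tail`, which gives the pre-determined wait order.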

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction in the use of spinlocks & the length of time they are held

- Scheduling database is now per-CPU, which allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas; allocating and mapping system page table entries; charging commitment of pages; allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time
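The difference between an "any" wait and an "all" wait can be modeled as a pure function over the signaled states of the objects. This is a single-threaded sketch of the resolution rule only; the function name and return convention are assumptions, not a Windows API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the index that resolves a "wait any", 0 when a "wait all" is
 * satisfied, or -1 if the caller would keep waiting. */
int wait_resolves(const bool *signaled, size_t n, bool wait_all) {
    assert(signaled != NULL && n > 0);
    if (wait_all) {
        for (size_t i = 0; i < n; i++)
            if (!signaled[i]) return -1;   /* one nonsignaled object blocks the wait */
        return 0;
    }
    for (size_t i = 0; i < n; i++)
        if (signaled[i]) return (int)i;    /* lowest signaled index resolves the wait */
    return -1;
}
```

With states {nonsignaled, signaled, signaled}, an "any" wait resolves at index 1 while an "all" wait keeps waiting until the first object is signaled as well.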

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g., it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. Labels on the diagram: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state"]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

Dispatcher object | System event: nonsignaled to signaled | System event: signaled to nonsignaled | Effect of signaled state on waiting threads

Mutex (kernel mode) | Owning thread releases mutex | Resumed thread acquires mutex | Kernel resumes one waiting thread

Mutex (exported to user mode) | Owning thread or other thread releases mutex | Resumed thread acquires mutex | Kernel resumes one waiting thread

Semaphore | One thread releases the semaphore, freeing a resource | A thread acquires the semaphore; more resources are not available | Kernel resumes one or more waiting threads

What signals an object (contd)

Dispatcher object | System event: nonsignaled to signaled | System event: signaled to nonsignaled | Effect of signaled state on waiting threads

Event | A thread sets the event | Kernel resumes one or more threads | Kernel resumes one or more waiting threads

Event pair | Dedicated thread sets one event in the event pair | Kernel resumes the other dedicated thread | Kernel resumes waiting dedicated thread

Timer | Timer expires | A thread (re)initializes the timer | Kernel resumes all waiting threads

Thread | Thread terminates | A thread reinitializes the thread object | Kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, a queue head linked to a list of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from High through Power failure, Dispatch/DPC and APC down to Low, shown next to the DPC queue and the dispatcher]
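The delivery sequence can be modeled with a small per-CPU queue that is drained when the IRQL drops below dispatch level. This is a toy single-threaded model under stated assumptions: the structure, the fixed capacity and all names are hypothetical, and only the value 2 for dispatch level is taken from the time accounting slides.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2   /* dispatch/DPC IRQL */
#define MAX_DPCS 8         /* hypothetical fixed capacity */

typedef void (*dpc_routine)(void *ctx);
typedef struct { dpc_routine fn; void *ctx; } dpc;

typedef struct {
    dpc items[MAX_DPCS];
    size_t head, count;
    int irql;              /* current IRQL of this model CPU */
} cpu_model;

/* Called at high IRQL (e.g. by an ISR): only records the deferred work. */
int queue_dpc(cpu_model *cpu, dpc_routine fn, void *ctx) {
    assert(cpu != NULL && fn != NULL);
    if (cpu->count == MAX_DPCS) return 0;
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->items[tail] = (dpc){ fn, ctx };
    cpu->count++;
    return 1;
}

/* Lowering IRQL below dispatch level drains the queue in FIFO order. */
void lower_irql(cpu_model *cpu, int new_irql) {
    assert(cpu != NULL);
    cpu->irql = new_irql;
    if (new_irql >= DISPATCH_LEVEL) return;   /* DPCs stay pending */
    cpu->irql = DISPATCH_LEVEL;               /* routines run at dispatch level */
    while (cpu->count > 0) {
        dpc d = cpu->items[cpu->head];
        cpu->head = (cpu->head + 1) % MAX_DPCS;
        cpu->count--;
        d.fn(d.ctx);
    }
    cpu->irql = new_irql;
}

/* Example routine: counts how often it was invoked. */
void count_call(void *ctx) { ++*(int *)ctx; }
```

Queuing two DPCs at IRQL 5 and lowering to IRQL 3 runs nothing; lowering to IRQL 0 runs both.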

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

- Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when a thread is in an alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode at IRQL 1

- Always deliverable unless the thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when the thread is in an "alertable wait"
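The deliverability rules for the three APC kinds can be summarized as one predicate. This is a sketch of the rules as stated on the slides; `thread_state` and `apc_deliverable` are hypothetical names, not Windows structures or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { APC_SPECIAL_KERNEL, APC_ORDINARY_KERNEL, APC_USER } apc_kind;

typedef struct {
    int  irql;                  /* current IRQL of the target thread */
    bool kernel_apcs_disabled;  /* e.g. after KeEnterCriticalRegion */
    bool in_alertable_wait;
} thread_state;                 /* hypothetical model */

bool apc_deliverable(apc_kind kind, const thread_state *t) {
    assert(t != NULL);
    switch (kind) {
    case APC_SPECIAL_KERNEL:  return t->irql < 1;   /* blocked only at IRQL 1 or above */
    case APC_ORDINARY_KERNEL: return t->irql == 0 && !t->kernel_apcs_disabled;
    case APC_USER:            return t->in_alertable_wait;
    }
    return false;
}
```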

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to the thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 & not processing a DPC, charge to the thread's kernel time

- If IRQL > 2, charge to interrupt time
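The clock ISR's charging decision is a small decision table, sketched here as a pure function. The enum and function names are hypothetical; the `user_mode` parameter is an assumption used to pick between user and kernel time when IRQL < 2.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decides where one clock tick is charged. */
charge_target charge_clock_tick(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0);
    if (irql > 2)  return CHARGE_INTERRUPT;
    if (irql == 2) return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```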

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (they are not really processes)

- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g., one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a common header holding Size, Type, State and the wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point through WaitBlockList to their wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type and a next link; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) heads a list of the wait blocks waiting on it]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded block raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec
  dwTimeout == 0 - no wait, check whether object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles - pointer to array identifying these objects

- bWaitAll - whether to wait for the first signaled object or all objects. When waiting for any object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object


Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else        /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns old semaphore count

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE - create event in signaled state
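The difference between manual-reset and auto-reset events when the event is set can be captured in a small state model. This is a single-threaded sketch of the bookkeeping only; `event_model`, `event_set` and `event_reset` are hypothetical names, not the Windows event object or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool manual_reset;
    bool signaled;
    int  waiters;      /* number of threads currently blocked on the event */
} event_model;         /* hypothetical model */

/* Models setting the event: returns how many waiting threads are released. */
int event_set(event_model *e) {
    assert(e != NULL);
    if (e->manual_reset) {
        int released = e->waiters;   /* every waiter is released */
        e->waiters = 0;
        e->signaled = true;          /* stays signaled until event_reset() */
        return released;
    }
    if (e->waiters > 0) {            /* auto-reset: exactly one waiter */
        e->waiters--;
        e->signaled = false;         /* the signal is consumed by that thread */
        return 1;
    }
    e->signaled = true;              /* nobody waiting: remember the signal */
    return 0;
}

void event_reset(event_model *e) {
    assert(e != NULL);
    e->signaled = false;
}
```

With three waiters, setting a manual-reset event releases all three, while setting an auto-reset event releases one and leaves two waiting.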

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe connecting prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same instance

- Server can respond to client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

- Closing handle to last instance deletes named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile + ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile + WriteFile + ReadFile + CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: one application server broadcasting its status to application clients 0 ... n through a mailslot named "status"]

Mailslot Servers (App client 0 ... App client n)

- Each calls CreateMailslot() for the "status" mailslot, then ReadFile(hMS, &ServStat), then connects to the server

Mailslot Client (App Server)

- Loops: Sleep(), CreateFile() on the "status" mailslot, WriteFile(hMS, &StatInfo)

- Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0 - immediate return

- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length: 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Dekker's Algorithm (1965)

This is the first correct solution proposed for the two-thread (two-process) case

Originally developed by Dekker in a different context, it was applied to the critical section problem by Dijkstra

Dekker adds the idea of a favored thread and allows access to either thread when the request is uncontested

When there is a conflict, one thread is favored, and the priority reverses after successful execution of the critical section

Dekker's Algorithm (contd)

Shared variables - initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;   /* wait until favored */
            flag[i] = 1;
        }
    }
    /* critical section */
    turn = j;
    flag[i] = 0;
    /* remainder section */
} while (1);
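On current compilers and multiprocessors, plain `int` flags are not enough for Dekker's algorithm, because loads and stores may be reordered. The sketch below restates the entry and exit protocol with sequentially consistent C11 atomics; the function names are hypothetical and the code is an illustration, not a recommended production lock.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int flag[2];   /* flag[i] != 0: thread i wants to enter */
atomic_int turn;      /* which thread is favored on conflict */

void dekker_lock(int i) {
    assert(i == 0 || i == 1);
    int j = 1 - i;
    atomic_store(&flag[i], 1);
    while (atomic_load(&flag[j])) {
        if (atomic_load(&turn) == j) {
            atomic_store(&flag[i], 0);            /* back off */
            while (atomic_load(&turn) == j) { }   /* wait to become favored */
            atomic_store(&flag[i], 1);
        }
    }
}

void dekker_unlock(int i) {
    assert(i == 0 || i == 1);
    atomic_store(&turn, 1 - i);                   /* favor the other thread */
    atomic_store(&flag[i], 0);
}
```

Each thread brackets its critical section with `dekker_lock(i)` and `dekker_unlock(i)`, using its own index 0 or 1.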

Bakery Algorithm (Lamport 1974)

A Solution to the Critical Section problem for n threads

Before entering its CS, a thread receives a number

Holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

- i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

(a, b) < (c, d) if a < c, or if a == c and b < d

max(a0, ..., an-1) is a number k such that k >= ai for i = 0, ..., n - 1

Shared data

int choosing[n];

int number[n];   /* the ticket */

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
    }
    /* critical section */
    number[i] = 0;
    /* remainder section */
} while (1);
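The bakery protocol, including the lexicographic comparison of (ticket, thread id) pairs, can be sketched with C11 atomics. The fixed thread count and all names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NTHREADS 4                  /* assumption: fixed number of threads */

atomic_int choosing[NTHREADS];
atomic_int number[NTHREADS];        /* 0 means "holds no ticket" */

/* Lexicographic "<" on (ticket, thread id) pairs. */
bool ticket_less(int a, int b, int c, int d) {
    return a < c || (a == c && b < d);
}

void bakery_lock(int i) {
    assert(i >= 0 && i < NTHREADS);
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int k = 0; k < NTHREADS; k++) {
        int t = atomic_load(&number[k]);
        if (t > max) max = t;
    }
    atomic_store(&number[i], max + 1);             /* take the next ticket */
    atomic_store(&choosing[i], 0);
    for (int j = 0; j < NTHREADS; j++) {
        while (atomic_load(&choosing[j])) { }      /* j is still picking */
        for (;;) {                                 /* wait while j is ahead of us */
            int nj = atomic_load(&number[j]);
            if (nj == 0 || !ticket_less(nj, j, atomic_load(&number[i]), i))
                break;
        }
    }
}

void bakery_unlock(int i) {
    atomic_store(&number[i], 0);
}
```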

Mutual Exclusion - Hardware Support

Interrupt Disabling

- Concurrent threads cannot overlap on a uniprocessor

- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions

- Test and Set Instruction - read & write a memory location

- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach

- Busy waiting

- Starvation is possible

- Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;
    /* critical section */
    lock = false;
    /* remainder section */
} while (1);

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;

do {
    key = 1;
    while (key == 1) Swap(lock, key);
    /* critical section */
    lock = 0;
    /* remainder section */
} while (1);
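In C11 the atomic swap is available as `atomic_exchange`, so the swap-based lock can be sketched as follows. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

atomic_int lock_word;          /* 0 = free, 1 = held */

/* One atomic swap of key = 1 with the lock word; true if we got the lock. */
bool swap_try_lock(void) {
    int key = atomic_exchange(&lock_word, 1);
    return key == 0;           /* old value 0 means the lock was free */
}

void swap_lock(void) {
    while (!swap_try_lock()) { /* busy-wait */ }
}

void swap_unlock(void) {
    assert(atomic_load(&lock_word) == 1);   /* only a held lock is released */
    atomic_store(&lock_word, 0);
}
```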

Semaphores

Semaphore S - integer variable

can only be accessed via two atomic operations

wait (S):
    while (S <= 0) ;
    S--;

signal (S):
    S++;

Critical Section of n Threads

Shared data

semaphore mutex;   /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
    /* critical section */
    signal(mutex);
    /* remainder section */
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
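The bookkeeping of the blocking semaphore can be traced with a single-threaded model in which suspend() and resume() are replaced by return values. This is only a sketch of the counting logic; the queue capacity, the use of integer thread ids and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 8               /* hypothetical capacity */

typedef struct {
    int value;                      /* < 0: -value threads are suspended */
    int queue[MAX_WAITERS];         /* FIFO of suspended thread ids (plays the role of S.L) */
    int head, count;
} sem_model;

/* Returns true if the caller proceeds, false if it would be suspended. */
bool sem_model_wait(sem_model *s, int tid) {
    s->value--;
    if (s->value < 0) {
        assert(s->count < MAX_WAITERS);
        int tail = (s->head + s->count) % MAX_WAITERS;
        s->queue[tail] = tid;
        s->count++;
        return false;               /* suspend() would be called here */
    }
    return true;
}

/* Returns the id of the thread to resume, or -1 if nobody was waiting. */
int sem_model_signal(sem_model *s) {
    s->value++;
    if (s->value <= 0) {
        int tid = s->queue[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return tid;                 /* resume(T) would be called here */
    }
    return -1;
}
```

Starting from value 1, three waits leave the value at -2 with two queued threads, and each signal then resumes them in FIFO order.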

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti              Tj

A               wait(flag)

signal(flag)    B

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

Counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1

wait(S)         wait(Q)

wait(Q)         wait(S)

signal(S)       signal(Q)

signal(Q)       signal(S)

Deadlock and Starvation

Starvation - indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution

- all code should acquire/release semaphores in the same order

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times, mostly in the "System" process

– Scheduled, preempted, etc. like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

– Interrupt dispatcher invokes the interrupt service routine (ISR)

– ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

3737

Trap dispatching

[Figure: trap sources and the kernel components they are dispatched to]

Interrupt -> interrupt dispatcher -> interrupt service routines

System service call -> system service dispatcher -> system services

HW exceptions, SW exceptions -> exception dispatcher -> exception handlers

Virtual address exceptions -> virtual memory manager's pager

Trap: the processor's mechanism to capture an executing thread

– Switch from user to kernel mode

– Interrupts: asynchronous

– Exceptions: synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the ISR then run in kernel mode]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Note: no thread or process context switch

3939

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)
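The masking rule on this slide can be modeled in a few lines. The following is a minimal sketch, not kernel code: the function names are hypothetical, and only the level values 0, 1 and 2 are used, matching the x86 table on the next slide.

```c
#include <stdbool.h>

/* Software IRQLs used on these slides (x86 values). */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* Hypothetical helper: an interrupt gets through only if its IRQL is
   strictly above the processor's current IRQL; equal and lower levels
   stay masked until the IRQL drops. */
static bool interrupt_deliverable(int current_irql, int interrupt_irql)
{
    return interrupt_irql > current_irql;
}

/* Hypothetical helper: waits and page faults are only allowed below
   DISPATCH_LEVEL. */
static bool may_wait_or_page_fault(int current_irql)
{
    return current_irql < DISPATCH_LEVEL;
}
```

For example, a processor running a DPC (IRQL 2) still accepts a device interrupt at a higher level, but a second DPC request stays pending until the IRQL falls below 2.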

4040

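[Figure: x86 IRQLs, highest to lowest]

31 High

30 Power fail

29 Interprocessor Interrupt

28 Clock

Profile & Synch (Srv 2003)

Device 1 and other device levels

2 Dispatch/DPC

1 APC

0 Passive/Low

Device levels and above: hardware interrupts

Dispatch/DPC and APC: deferrable software interrupts

Passive/Low: normal thread execution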

Interrupt Precedence via IRQLs (x86)

4141

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn ISR on/off

Interrupt objects can synchronize access to ISR data

– Multiple instances of an ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected to the same IRQL

4343

Predefined IRQLs

High – used when halting the system (via KeBugCheck())

Power fail – originated in the NT design document, but has never been used

Inter-processor interrupt – used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock – used to update the system's clock and to allocate CPU time to threads

Profile – used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

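[Figure: IRQL assignments on x64 and IA64, highest to lowest]

x64

15 High/Profile

14 Interprocessor Interrupt/Power

13 Clock

12 Synch (Srv 2003)

Device n ... Device 1 (lowest device level: 3)

2 Dispatch/DPC

1 APC

0 Passive/Low

IA64

15 High/Profile/Power

14 Interprocessor Interrupt

13 Clock

12 Synch (MP only)

Device n ... Device 1 (lowest device level: 4)

3 Correctable Machine Check

2 Dispatch/DPC & Synch (UP only)

1 APC

0 Passive/Low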

4646

Interrupt Prioritization amp Delivery

IRQLs are determined as follows:

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
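The two deterministic mapping rules above are simple arithmetic. Here is a small sketch with hypothetical function names; it assumes the x64/IA64 rule reads "IDT vector number divided by 16" with integer division, which is how the slide text is interpreted here.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
static int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 / IA64 rule (assumed reading): IRQL = IDT vector number / 16,
   so each block of 16 vectors shares one IRQL. */
static int irql_from_vector_64bit(int idt_vector)
{
    return idt_vector / 16;
}
```

Under these rules, IRQ 8 on an x86 uniprocessor maps to IRQL 19, and IDT vector 0xD1 on a 64-bit system maps to IRQL 13.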

4747

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when the kernel is deep within many layers of code

– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– A spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Synchronization on SMP Systems
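To make the "single data cell plus atomic test-and-modify" idea concrete, here is a minimal user-space sketch using C11 atomics and POSIX threads. It is not the Windows kernel implementation; all names (demo_spinlock, demo_run, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space spinlock: one atomic flag is the whole lock. */
typedef struct { atomic_flag held; } demo_spinlock;

static void demo_spin_acquire(demo_spinlock *l)
{
    /* Atomic test-and-set returns the previous value, so keep looping
       while the flag was already set by another owner. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait */
}

static void demo_spin_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

enum { DEMO_THREADS = 4, DEMO_ITERS = 20000 };

static demo_spinlock demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter;

static void *demo_worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < DEMO_ITERS; i++) {
        demo_spin_acquire(&demo_lock);
        demo_counter++;                 /* critical section */
        demo_spin_release(&demo_lock);
    }
    return NULL;
}

/* Runs DEMO_THREADS workers and returns the final counter value. */
static long demo_run(void)
{
    pthread_t t[DEMO_THREADS];
    demo_counter = 0;
    for (int i = 0; i < DEMO_THREADS; i++)
        pthread_create(&t[i], NULL, demo_worker, NULL);
    for (int i = 0; i < DEMO_THREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Every waiter hammers the same memory cell in this design, which is the bus-contention problem that queued spinlocks address.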

5050

Kernel Synchronization

[Figure: Processor A and Processor B both run the code below; the spinlock guards the critical section that manipulates the shared DPC queue]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified

– Exactly one processor is being signaled

– Pre-determined wait order
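The queueing idea can be sketched with an MCS-style lock, a well-known queue lock design in which each waiter spins on a flag in its own node. This is an illustrative user-space sketch with hypothetical names, not the Windows implementation; the sched_yield() calls are an accommodation for running more threads than cores and would not appear in a kernel spinlock.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool must_wait;             /* the waiter-local flag */
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->must_wait, true);
    /* Append ourselves to the queue; the old tail is our predecessor. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->must_wait))
            sched_yield();             /* spin on our own flag only */
    }
}

static void mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected,
                                           (mcs_node *)NULL))
            return;
        /* A waiter has swapped the tail but not linked itself yet. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->must_wait, false);   /* signal exactly one waiter */
}

enum { MCS_THREADS = 3, MCS_ITERS = 2000 };
static mcs_lock mcs_demo_lock;               /* zero-initialized: free */
static long mcs_demo_counter;

static void *mcs_worker(void *unused)
{
    mcs_node me;                             /* per-thread queue node */
    (void)unused;
    for (int i = 0; i < MCS_ITERS; i++) {
        mcs_acquire(&mcs_demo_lock, &me);
        mcs_demo_counter++;
        mcs_release(&mcs_demo_lock, &me);
    }
    return NULL;
}

/* Runs MCS_THREADS workers and returns the final counter value. */
static long mcs_run(void)
{
    pthread_t t[MCS_THREADS];
    mcs_demo_counter = 0;
    for (int i = 0; i < MCS_THREADS; i++)
        pthread_create(&t[i], NULL, mcs_worker, NULL);
    for (int i = 0; i < MCS_THREADS; i++)
        pthread_join(t[i], NULL);
    return mcs_demo_counter;
}
```

Release touches only the successor's flag, so exactly one waiter is woken and the hand-off order is the order of arrival.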

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU. Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks: PFN (Page Frame Database) lock

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

– Events (may be auto-reset or manual reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated]

– Create and initialize thread object

– Thread waits on an object handle

– Set object to signaled state; wait is complete

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher reschedules the newly ready thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: system events and the resulting state change (nonsignaled <-> signaled), and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– Becomes signaled: owning thread releases mutex

– Becomes nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

– Becomes signaled: owning thread or other thread releases mutex

– Becomes nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Semaphore

– Becomes signaled: one thread releases the semaphore, freeing a resource

– Becomes nonsignaled: a thread acquires the semaphore and no more resources are available

– Effect on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (contd)

Event

– Becomes signaled: a thread sets the event

– Becomes nonsignaled: kernel resumes one or more threads

– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

– Becomes signaled: dedicated thread sets one event in the event pair

– Becomes nonsignaled: kernel resumes the other dedicated thread

– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

– Becomes signaled: timer expires

– Becomes nonsignaled: a thread (re)initializes the timer

– Effect on waiting threads: kernel resumes all waiting threads

Thread

– Becomes signaled: thread terminates

– Becomes nonsignaled: a thread reinitializes the thread object

– Effect on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; DPC objects waiting in the DPC queue; the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make a thread suspend/terminate itself (env. subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

7070

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

7171

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding APC objects]

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
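The four accounting rules form a small decision function. The sketch below only restates them in C; the enum and function names are hypothetical, and the in_dpc and user_mode flags are assumed inputs that a real clock ISR would derive from processor state.

```c
#include <stdbool.h>

/* Hypothetical labels for the four buckets named on the slide. */
enum charge_target {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
};

/* Decide where one clock tick is charged, given the IRQL at the time
   the clock interrupt arrived. */
static enum charge_target clock_tick_charge(int irql, bool in_dpc,
                                            bool user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL 0 or 1: normal thread execution or APC delivery. */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is sampled only when the clock fires, work that starts and finishes between two ticks is never charged at all.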

7373

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (they are not really processes)

– Context switches shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object: common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"

– Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– When signaled, a wait on the object is satisfied

– Different object types differ in terms of what changes their state

– Wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

[Figure: thread objects point to their WaitBlockList; each wait block (List entry, Object, Thread, Key, Type, Next link) links a thread to a dispatcher object (Size, Type, State, Wait list head, object-type-specific data)]

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even on an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );
    BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
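The blocking and polling variants can be contrasted in a short pthreads sketch. The helper names add_blocking and add_if_free are hypothetical; the pthread calls are the standard ones listed above.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking acquire, comparable to WaitForSingleObject(hMutex, INFINITE). */
static void add_blocking(int value)
{
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire, comparable to WaitForSingleObject(hMutex, 0).
   Returns 1 if the update was applied, 0 if the mutex was busy,
   and -1 on any other error. */
static int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc == EBUSY)
        return 0;
    if (rc != 0)
        return -1;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

A caller that gets 0 back from add_if_free() can do other work and retry later instead of sleeping inside the lock call.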

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count

8989

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );
    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

9191

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
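Although pthreads has no built-in manual-reset event, one can be approximated with a mutex, a condition variable and a boolean. The following is a sketch under that assumption; the manual_event type and the event_* functions are hypothetical names, not part of any library.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event built from pthread primitives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() */
} manual_event;

static void event_init(manual_event *e, bool initially_signaled)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

/* Like SetEvent() on a manual-reset event: release every waiter. */
static void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(). */
static void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE); the loop guards against
   spurious wakeups, and the state is left signaled for other waiters. */
static void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, 0): poll without blocking. */
static bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

The persistent flag is what a bare condition variable lacks: a thread that calls event_wait() after event_set() still passes through, which matches manual-reset behavior.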

9393

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe connecting prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
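The same one-way, byte-stream behavior can be observed with the POSIX pipe() call, which is used here only as a portable stand-in for CreatePipe(). The helper name pipe_roundtrip is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back from the other end.
   Returns the number of bytes read, or -1 on error. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                 /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0)
        return -1;

    ssize_t written = write(fds[1], msg, strlen(msg));
    close(fds[1]);              /* closing the write end lets readers see EOF */

    ssize_t got = (written < 0) ? -1 : read(fds[0], out, outlen);
    close(fds[0]);
    return got;
}
```

Data flows in one direction only; a two-way conversation needs a second pipe or, on Windows, a duplex named pipe.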

9494

IO Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

9595

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe

– Several clients can communicate with a single server using the same instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

9797

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in one call

101101

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in one call

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0 ... n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 ... App client n)

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server)

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        WriteFile( hMS, &StatInfo ... );
    }

108108

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Dekker's Algorithm (contd)

Shared variables – initialization

int flag[2]; flag[0] = flag[1] = 0;
int turn = 0;

Thread Ti

do {
    flag[i] = 1;
    while (flag[j]) {
        if (turn == j) {
            flag[i] = 0;
            while (turn == j) ;
            flag[i] = 1;
        }
    }
    critical section
    turn = j;
    flag[i] = 0;
    remainder section
} while (1);
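The pseudocode above assumes sequentially consistent memory. A minimal sketch of how it might look in portable C follows, assuming C11 atomics and POSIX threads (neither is part of the slides). The names `dekker_demo`, `dekker_thread`, `counter`, and `rounds` are hypothetical, and `sched_yield()` is added only to keep the busy-wait loops polite.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_int flag[2];   /* flag[i] == 1: thread i wants to enter */
static atomic_int turn;      /* which thread has priority on conflict */
static long counter;         /* shared data, touched only inside the CS */
static long rounds;          /* hypothetical: iterations per thread */

static void *dekker_thread(void *arg) {
    int i = (int)(intptr_t)arg, j = 1 - i;
    for (long k = 0; k < rounds; k++) {
        atomic_store(&flag[i], 1);
        while (atomic_load(&flag[j])) {
            if (atomic_load(&turn) == j) {
                atomic_store(&flag[i], 0);            /* back off */
                while (atomic_load(&turn) == j) sched_yield();
                atomic_store(&flag[i], 1);
            } else {
                sched_yield();
            }
        }
        counter++;                                     /* critical section */
        atomic_store(&turn, j);
        atomic_store(&flag[i], 0);
    }
    return NULL;
}

/* Runs two threads for n rounds each and returns the final counter. */
long dekker_demo(long n) {
    pthread_t t[2];
    rounds = n;
    counter = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    atomic_store(&turn, 0);
    for (intptr_t i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, dekker_thread, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, `dekker_demo(n)` returns exactly `2 * n`. With plain `int` variables instead of atomics, compilers and CPUs may reorder the accesses and the algorithm can fail.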

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads

Before entering its CS, a thread receives a number

The holder of the smallest number enters the CS

If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first

The numbering scheme generates numbers in monotonically non-decreasing order

– i.e. 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

Bakery Algorithm

Notation: “<” establishes lexicographical order among 2-tuples (ticket, thread id)

(a,b) < (c,d) if a < c, or if a == c and b < d

$\max(a_0, \ldots, a_{n-1}) = k$ such that $k \ge a_i$ for $i = 0, \ldots, n-1$

Shared data

int choosing[n];

int number[n]; – the ticket

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) “<” (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
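A minimal sketch of the bakery entry and exit protocol in C, assuming C11 atomics and POSIX threads. The thread count `N_THREADS` and the names `bakery_lock`, `bakery_unlock`, and `bakery_demo` are hypothetical, and the ticket is a plain `int` that this sketch assumes never overflows.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_THREADS 4                      /* hypothetical thread count */

static atomic_int choosing[N_THREADS];
static atomic_int number[N_THREADS];     /* the ticket; 0 = not competing */
static long bakery_counter;
static long bakery_rounds;

static void bakery_lock(int i) {
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N_THREADS; j++) {
        int t = atomic_load(&number[j]);
        if (t > max) max = t;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);
    for (int j = 0; j < N_THREADS; j++) {
        while (atomic_load(&choosing[j])) sched_yield();
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            /* proceed unless (number[j], j) < (number[i], i) */
            if (nj == 0 || nj > ni || (nj == ni && j >= i)) break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i) {
    atomic_store(&number[i], 0);
}

static void *bakery_thread(void *arg) {
    int i = (int)(intptr_t)arg;
    for (long k = 0; k < bakery_rounds; k++) {
        bakery_lock(i);
        bakery_counter++;                /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

/* Runs N_THREADS threads for n rounds each and returns the final counter. */
long bakery_demo(long n) {
    pthread_t t[N_THREADS];
    bakery_rounds = n;
    bakery_counter = 0;
    for (int j = 0; j < N_THREADS; j++) {
        atomic_store(&choosing[j], 0);
        atomic_store(&number[j], 0);
    }
    for (intptr_t i = 0; i < N_THREADS; i++)
        pthread_create(&t[i], NULL, bakery_thread, (void *)i);
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return bakery_counter;
}
```

With mutual exclusion intact, `bakery_demo(n)` returns `N_THREADS * n`.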

Mutual Exclusion - Hardware Support

Interrupt Disabling

– Concurrent threads cannot overlap on a uniprocessor

– A thread runs until it performs a system call or an interrupt occurs

Special Atomic Machine Instructions

– Test and Set Instruction – read & write a memory location

– Exchange Instruction – swap register and memory location

Problems with Machine-Instruction Approach

– Busy waiting

– Starvation is possible

– Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data – boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;
    critical section
    lock = false;
    remainder section
} while (1);
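In C11, `atomic_flag_test_and_set` behaves like the TestAndSet primitive used above. The sketch below is one possible rendering, assuming POSIX threads. The names `tas_demo`, `lock_word`, and the fixed count of four threads are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag lock_word = ATOMIC_FLAG_INIT;  /* clear = lock is free */
static long tas_counter;
static long tas_rounds;

static void *tas_thread(void *arg) {
    (void)arg;
    for (long k = 0; k < tas_rounds; k++) {
        /* returns the old value: true means someone else holds the lock */
        while (atomic_flag_test_and_set(&lock_word)) sched_yield();
        tas_counter++;                    /* critical section */
        atomic_flag_clear(&lock_word);    /* lock = false */
    }
    return NULL;
}

/* Runs four threads for n rounds each and returns the final counter. */
long tas_demo(long n) {
    pthread_t t[4];
    tas_rounds = n;
    tas_counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, tas_thread, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return tas_counter;
}
```

Nothing in this lock bounds how long a thread may wait, which is the starvation problem listed for the machine-instruction approach.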

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;

do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);
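The same loop can be sketched with C11's `atomic_exchange`, which stores `key` into the lock word and hands back the previous value in one atomic step, matching the effect of `Swap(lock, key)`. POSIX threads and the names `xchg_demo` and `xchg_lock` are assumptions made for illustration.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int xchg_lock;   /* 0 = free, 1 = held */
static long xchg_counter;
static long xchg_rounds;

static void *xchg_thread(void *arg) {
    (void)arg;
    for (long k = 0; k < xchg_rounds; k++) {
        int key = 1;
        while (key == 1) {
            key = atomic_exchange(&xchg_lock, 1);  /* Swap(lock, key) */
            if (key == 1) sched_yield();
        }
        xchg_counter++;                /* critical section */
        atomic_store(&xchg_lock, 0);   /* lock = 0 */
    }
    return NULL;
}

/* Runs four threads for n rounds each and returns the final counter. */
long xchg_demo(long n) {
    pthread_t t[4];
    xchg_rounds = n;
    xchg_counter = 0;
    atomic_store(&xchg_lock, 0);
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, xchg_thread, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return xchg_counter;
}
```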

Semaphores

Semaphore S – integer variable

It can only be accessed via two atomic operations

wait (S):
    while (S <= 0) ;
    S--;

signal (S):
    S++;

Critical Section of n Threads

Shared data

semaphore mutex; initially mutex = 1

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

– Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

– suspend() suspends the thread that invokes it

– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
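One way to approximate this blocking semaphore in user space is with a POSIX mutex and condition variable standing in for suspend() and resume(T). That substitution is an assumption of this sketch, not how a kernel implements it. The type `ksem` and the `wakeups` field are hypothetical. The wakeup counter is there because condition variables can wake spuriously, so a bare wait cannot be trusted as a resume.

```c
#include <pthread.h>

typedef struct {
    int value;               /* negative: -value threads are waiting */
    int wakeups;             /* signals issued but not yet consumed */
    pthread_mutex_t m;
    pthread_cond_t c;        /* stands in for the waiting list S.L */
} ksem;

void ksem_init(ksem *s, int initial) {
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->m, NULL);
    pthread_cond_init(&s->c, NULL);
}

void ksem_wait(ksem *s) {
    pthread_mutex_lock(&s->m);
    s->value--;
    if (s->value < 0) {
        do {
            pthread_cond_wait(&s->c, &s->m);   /* suspend() */
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->m);
}

void ksem_signal(ksem *s) {
    pthread_mutex_lock(&s->m);
    s->value++;
    if (s->value <= 0) {
        s->wakeups++;
        pthread_cond_signal(&s->c);            /* resume(T) */
    }
    pthread_mutex_unlock(&s->m);
}

static ksem demo_sem;
static long ksem_counter;
static long ksem_rounds;

static void *ksem_thread(void *arg) {
    (void)arg;
    for (long k = 0; k < ksem_rounds; k++) {
        ksem_wait(&demo_sem);
        ksem_counter++;                        /* critical section */
        ksem_signal(&demo_sem);
    }
    return NULL;
}

/* Four threads share a semaphore initialised to 1 (used as a mutex). */
long ksem_demo(long n) {
    pthread_t t[4];
    ksem_init(&demo_sem, 1);
    ksem_rounds = n;
    ksem_counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, ksem_thread, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return ksem_counter;
}
```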

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti              Tj
A               wait(flag)
signal(flag)    B
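The ordering pattern can be tried out with POSIX unnamed semaphores, assuming a platform where `sem_init` is supported (Linux, for example). The names `ordering_demo`, `thread_ti`, `thread_tj`, and `trace` are hypothetical. Tj is started first on purpose to show that B still runs after A.

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t flag_sem;       /* the semaphore "flag", initialised to 0 */
static char trace[2];        /* records the order in which A and B ran */
static int trace_len;

static void *thread_ti(void *arg) {
    (void)arg;
    trace[trace_len++] = 'A';            /* statement A */
    sem_post(&flag_sem);                 /* signal(flag) */
    return NULL;
}

static void *thread_tj(void *arg) {
    (void)arg;
    while (sem_wait(&flag_sem) != 0) { } /* wait(flag), retry if interrupted */
    trace[trace_len++] = 'B';            /* statement B */
    return NULL;
}

/* Returns 1 if A was recorded before B. */
int ordering_demo(void) {
    pthread_t ti, tj;
    trace_len = 0;
    sem_init(&flag_sem, 0, 0);
    pthread_create(&tj, NULL, thread_tj, NULL);
    pthread_create(&ti, NULL, thread_ti, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag_sem);
    return trace[0] == 'A' && trace[1] == 'B';
}
```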

Two Types of Semaphores

Counting semaphore

– integer value can range over an unrestricted domain

Binary semaphore

– integer value can range only between 0 and 1

– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation – indefinite blocking

– A thread may never be removed from the semaphore queue in which it is suspended

Solution

– all code should acquire/release semaphores in the same order
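A small sketch of the same-order rule, using POSIX mutexes in place of the semaphores S and Q (an assumption made for brevity). The helper `lock_both` is hypothetical. It orders the two locks by address so that both threads acquire them in the same sequence, whichever order the caller names them in.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long order_counter;
static long order_rounds;

/* Always lock the lower-addressed mutex first: one global order. */
static void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static void *order_t0(void *arg) {
    (void)arg;
    for (long k = 0; k < order_rounds; k++) {
        lock_both(&S, &Q);               /* T0 names S first */
        order_counter++;
        unlock_both(&S, &Q);
    }
    return NULL;
}

static void *order_t1(void *arg) {
    (void)arg;
    for (long k = 0; k < order_rounds; k++) {
        lock_both(&Q, &S);               /* T1 names Q first */
        order_counter++;
        unlock_both(&Q, &S);
    }
    return NULL;
}

/* Returns the final counter; it completes because no cycle can form. */
long lock_order_demo(long n) {
    pthread_t a, b;
    order_rounds = n;
    order_counter = 0;
    pthread_create(&a, NULL, order_t0, NULL);
    pthread_create(&b, NULL, order_t1, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return order_counter;
}
```

A consistent acquisition order removes the circular wait behind the deadlock. It does not by itself guarantee freedom from starvation.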

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

The kernel disables interrupts to synchronize access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M.: Principles of Concurrent Programming. Prentice Hall, 1982

Lamport, L.: The Mutual Exclusion Problem. Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin: Operating System Concepts. John Wiley & Sons, 6th Ed., 2003

– Chapter 7 - Process Synchronization

– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt Dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

– Protects the system from the users

– Protects the user (process) from themselves

– System is not protected from system

Code regions are tagged “no write in any mode”

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

– Does not affect scheduling

– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

– “Privileged Time” and “User Time”

– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times, mostly in the “System” process

– Scheduled, preempted, etc. like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

– The interrupt dispatcher invokes the interrupt service routine

– The ISR runs in the context of the interrupted thread, so-called “arbitrary thread context”

– The ISR often requests the execution of a “DPC routine”, which also runs in kernel mode

– Time is not charged to the interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread

– Switch from user to kernel mode

– Interrupts – asynchronous

– Exceptions – synchronous

Trap handlers (figure)

– Interrupt → interrupt dispatcher → interrupt service routines

– System service call → system service dispatcher → system services

– HW exceptions, SW exceptions → exception dispatcher → exception handlers

– Virtual address exceptions → virtual memory manager's pager

Interrupt Dispatching

An interrupt arrives while user- or kernel-mode code is running; the dispatch routine and the ISR run in kernel mode

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
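The masking rule can be modelled in a few lines of C. This is a toy model for illustration only, not kernel code. The functions `next_deliverable` and `waits_allowed` and the pending-interrupt bitmask are hypothetical. The level constants mirror the names used in the slides (Passive = 0, APC = 1, Dispatch/DPC = 2).

```c
#include <stdint.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, HIGH_LEVEL = 31 };

/*
 * pending: bit n set means an interrupt at IRQL n is waiting.
 * Returns the highest pending IRQL strictly above current_irql,
 * or -1 when everything pending is masked (equal or lower IRQL).
 */
int next_deliverable(uint32_t pending, int current_irql) {
    for (int level = HIGH_LEVEL; level > current_irql; level--) {
        if (pending & (UINT32_C(1) << level))
            return level;
    }
    return -1;
}

/* Waits and page faults are only legal below DISPATCH_LEVEL. */
int waits_allowed(int current_irql) {
    return current_irql < DISPATCH_LEVEL;
}
```

For instance, with interrupts pending at levels 2 and 28, a processor at IRQL 0 takes the level-28 interrupt first. While it runs at IRQL 28, the level-2 interrupt stays pending.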

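Interrupt Precedence via IRQLs (x86)

IRQL  Name                          Kind
31    High                          Hardware interrupts
30    Power fail                    Hardware interrupts
29    Interprocessor Interrupt      Hardware interrupts
28    Clock                         Hardware interrupts
…     Profile & Synch (Srv 2003)    Hardware interrupts
…     Device 1                      Hardware interrupts
2     Dispatch/DPC                  Deferrable software interrupts
1     APC                           Deferrable software interrupts
0     Passive/Low                   Normal thread execution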

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of an ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High

– Used when halting the system (via KeBugCheck())

Power fail

– Originated in the NT design document, but has never been used

Inter-processor interrupt

– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update the system's clock and to allocate CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

IRQL  x64                               IA64
15    High/Profile                      High/Profile/Power
14    Interprocessor Interrupt/Power    Interprocessor Interrupt
13    Clock                             Clock
12    Synch (Srv 2003)                  Synch (MP only)
…     Device n                          Device n
4     …                                 Device 1
3     Device 1                          Correctable Machine Check
2     Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1     APC                               APC
0     Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
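The two arithmetic rules for deriving an IRQL can be written out directly. The function names are hypothetical. The x64/IA64 rule is read here as integer division of the IDT vector number by 16, which is an interpretation of the slide text rather than something it states explicitly.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ */
int irql_x86_up(int irq) {
    return 27 - irq;
}

/* x64 / IA64 rule, assumed to be integer division by 16 */
int irql_from_vector(int idt_vector) {
    return idt_vector / 16;
}
```

Under these rules, IRQ 8 on an x86 uniprocessor maps to IRQL 19, and an IDT vector of 0xD1 (209) maps to IRQL 13.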

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when the kernel is deep within many layers of code

– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts (figure)

Synchronization on SMP Systems

Synchronization on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– A spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue      (critical section)
end
release_spinlock(DPC)

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag

– Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the first waiting processor's flag is modified

– Exactly one processor is being signaled

– Pre-determined wait order
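The idea of spinning on a local flag is the core of the classic MCS queue lock, sketched below with C11 atomics and POSIX threads. This is a stand-in chosen to illustrate the technique; it is not claimed to match the Windows queued-spinlock implementation. All names (`mcs_lock`, `mcs_node`, `mcs_demo`) are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_int locked;                 /* local flag this waiter spins on */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;          /* last waiter, NULL if lock is free */
} mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, 1);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked)) sched_yield();  /* spin locally */
    }
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* nobody queued behind us: mark the lock free and leave */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* a successor is enqueueing; wait until it links itself in */
        while ((succ = atomic_load(&me->next)) == NULL) sched_yield();
    }
    atomic_store(&succ->locked, 0);    /* signal exactly one waiter */
}

static mcs_lock qlock;
static long mcs_counter;
static long mcs_rounds;

static void *mcs_thread(void *arg) {
    (void)arg;
    mcs_node me;                       /* per-thread queue node */
    for (long k = 0; k < mcs_rounds; k++) {
        mcs_acquire(&qlock, &me);
        mcs_counter++;                 /* critical section */
        mcs_release(&qlock, &me);
    }
    return NULL;
}

/* Runs four threads for n rounds each and returns the final counter. */
long mcs_demo(long n) {
    pthread_t t[4];
    atomic_store(&qlock.tail, NULL);
    mcs_rounds = n;
    mcs_counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, mcs_thread, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return mcs_counter;
}
```

Each waiter spins only on its own `locked` field, and the releaser hands the lock to its direct successor. That gives the first-come, first-served order and single wake-up described above.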

SMP Scalability Improvements

Windows 2000: queued spinlocks

– qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length of time they are held

– Scheduling database is now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks: PFN or Page Frame Database lock

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for “any” one or “all” at once; for “all”, all objects must be in the signalled state concurrently to resolve the wait

– All wait calls include an optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include

– Events (may be auto-reset or manual reset; may be set or “pulsed”)

– Mutexes (“mutual exclusion”, one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signalled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

Interaction with thread scheduling (figure)

– Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated

– Create and initialize thread object

– Thread waits on an object handle

– Set object to signaled state; wait is complete

Interaction between Synchronization & Dispatching

A user-mode thread waits on an event object's handle

The kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

The kernel wakes up the waiting threads; variable-priority threads get a priority boost

The dispatcher re-schedules the new thread – it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: system events and resulting state change, and effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: owning thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: owning thread or other thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: one thread releases the semaphore, freeing a resource

– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of the signaled state on waiting threads

Event

– nonsignaled → signaled: a thread sets the event

– signaled → nonsignaled: kernel resumes one or more threads

– Effect: kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: dedicated thread sets one event in the event pair

– signaled → nonsignaled: kernel resumes the other dedicated thread

– Effect: kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: timer expires

– signaled → nonsignaled: a thread (re)initializes the timer

– Effect: kernel resumes all waiting threads

Thread

– nonsignaled → signaled: thread terminates

– signaled → nonsignaled: a thread reinitializes the thread object

– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon: Microsoft Windows Internals, 4th Edition. Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) is completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

queue head → DPC object → DPC object → DPC object

Delivering a DPC

1. A timer expires; the kernel queues a DPC that will release all waiting threads, and requests a software interrupt

2. The DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. The dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

(Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; DPC queue; dispatcher)

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless the thread is already at IRQL 1 or above

– Used for I/O completion reporting from “arbitrary thread context”

– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

“Ordinary” kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when the thread is in “alertable wait”

Asynchronous Procedure Calls (APCs)

(Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each a list of APC objects)

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
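The four accounting rules amount to a small decision function. The sketch below restates them in C. The enum and the function name `clock_tick_charge` are hypothetical and do not correspond to any kernel interface.

```c
typedef enum {
    CHARGE_THREAD_USER_OR_KERNEL,  /* IRQL < 2 */
    CHARGE_DPC_TIME,               /* IRQL == 2, inside a DPC */
    CHARGE_THREAD_KERNEL,          /* IRQL == 2, not in a DPC */
    CHARGE_INTERRUPT_TIME          /* IRQL > 2 */
} charge_t;

/* Decide where one clock tick is charged, given the interrupted state. */
charge_t clock_tick_charge(int irql, int processing_dpc) {
    if (irql < 2)
        return CHARGE_THREAD_USER_OR_KERNEL;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC_TIME : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT_TIME;
}
```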

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (they are not really processes)

– Context switches shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a “dispatcher object”

– Some exclusively for synchronization, e.g. events, mutexes (“mutants”), semaphores, queues, timers

– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– Non-waitable kernel objects are called “control objects”

All dispatcher objects have a common header

– Size, Type, State, Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)

All dispatcher objects are in one of two states

– “signaled” vs. “nonsignaled”

– When signalled, a wait on the object is satisfied

– Different object types differ in terms of what changes their state

– Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for “any” or “all”

Key denotes argument list position for WaitForMultipleObjects

(Figure: each thread object points to its WaitBlockList; each wait block holds List entry, Object, Thread, Key, Type, and Next link fields; each dispatcher object holds Size, Type, State, Wait list head, and object-type-specific data)

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection ( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the body raises an exception */
        LeaveCriticalSection ( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies the kernel object

– dwTimeout specifies the wait time in msec

– dwTimeout == 0: no wait, check whether the object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to an array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for any object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects

– Mutexes, auto-reset events, and waitable timers will be reset to the non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

The first thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in the same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
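A short sketch of the blocking and polling calls side by side, using only standard pthreads functions. The function name `trylock_demo` is hypothetical. The mutex uses default attributes, so a second lock attempt while it is held reports EBUSY instead of succeeding.

```c
#include <errno.h>
#include <pthread.h>

/* Returns 1 if trylock reports EBUSY while held and 0 once released. */
int trylock_demo(void) {
    pthread_mutex_t m;
    pthread_mutex_init(&m, NULL);

    pthread_mutex_lock(&m);                      /* blocking acquire */
    int while_held = pthread_mutex_trylock(&m);  /* polls: mutex is busy */
    pthread_mutex_unlock(&m);

    int when_free = pthread_mutex_trylock(&m);   /* polls: acquires it */
    if (when_free == 0)
        pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return while_held == EBUSY && when_free == 0;
}
```

The EBUSY result plays the role that a WAIT_TIMEOUT return plays for WaitForSingleObject() with a zero timeout.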

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases the semaphore count by 1

– ReleaseSemaphore() may increment the count by any value

– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– A manual-reset event can signal several threads simultaneously; it must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– An auto-reset event signals a single thread; the event is reset automatically

– fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() – one thread

– pthread_cond_broadcast() – all waiting threads

No exact equivalent to manual-reset events
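Although pthreads has no direct counterpart, a manual-reset event can be approximated by pairing a condition variable with a mutex-protected flag. The sketch below shows one common way to do so. The type `mr_event` and its functions are hypothetical names, not part of any standard API.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t c;
    int signaled;            /* stays 1 until mr_reset() is called */
} mr_event;

void mr_init(mr_event *e) {
    pthread_mutex_init(&e->m, NULL);
    pthread_cond_init(&e->c, NULL);
    e->signaled = 0;
}

void mr_set(mr_event *e) {               /* like SetEvent(), manual-reset */
    pthread_mutex_lock(&e->m);
    e->signaled = 1;
    pthread_cond_broadcast(&e->c);       /* release all waiting threads */
    pthread_mutex_unlock(&e->m);
}

void mr_reset(mr_event *e) {             /* like ResetEvent() */
    pthread_mutex_lock(&e->m);
    e->signaled = 0;
    pthread_mutex_unlock(&e->m);
}

void mr_wait(mr_event *e) {              /* like an INFINITE wait */
    pthread_mutex_lock(&e->m);
    while (!e->signaled)
        pthread_cond_wait(&e->c, &e->m);
    pthread_mutex_unlock(&e->m);
}

static mr_event gate;
static int released;                     /* protected by gate.m */

static void *gate_waiter(void *arg) {
    (void)arg;
    mr_wait(&gate);
    pthread_mutex_lock(&gate.m);
    released++;
    pthread_mutex_unlock(&gate.m);
    return NULL;
}

/* Starts four waiters, sets the event once, returns how many got through. */
int event_demo(void) {
    pthread_t t[4];
    mr_init(&gate);
    released = 0;
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, gate_waiter, NULL);
    mr_set(&gate);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return released;
}
```

Because the flag stays set, threads that arrive after `mr_set()` also pass through. That is the manual-reset behaviour a bare `pthread_cond_broadcast()` cannot provide.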

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

(Figure: main creates the pipe; prog1 → pipe → prog2)

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

A read on a pipe handle will block if the pipe is empty

A write operation to a full pipe will block

Anonymous pipes are one-way
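For comparison, POSIX offers the same one-way, byte-oriented abstraction through `pipe()`. The sketch below uses that POSIX call rather than CreatePipe(), so that it runs on a Unix-like system. The function name `pipe_demo` is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Writes a short message into a pipe and reads it back. Returns 1 on success. */
int pipe_demo(void) {
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    const char msg[] = "hello";
    char buf[16] = {0};

    if (pipe(fds) != 0)
        return 0;

    ssize_t written = write(fds[1], msg, sizeof msg);  /* includes the NUL */
    close(fds[1]);                    /* no more writers: reader sees EOF later */

    ssize_t got = read(fds[0], buf, sizeof buf);       /* would block if empty */
    close(fds[0]);

    return written == (ssize_t)sizeof msg && got == written
        && strcmp(buf, msg) == 0;
}
```

Data flows only from the write end to the read end. Two-way communication needs a second pipe.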

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name
- Server can respond to client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. - local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (cont'd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

9999

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile and ReadFile sequence into one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines CreateFile, WriteFile, ReadFile, and CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
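The fixed-size-record idea for FIFOs can be sketched as follows. This is only an illustration under stated assumptions: the function name `fifo_roundtrip`, the record size `RECORD_LEN`, and the trick of opening both ends in one process are choices made for this example, not part of the course material. A real client/server pair would open the two ends in different processes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_LEN 32   /* assumed fixed record size for this sketch */

/* Creates a FIFO at `path`, sends one fixed-size record through it and
 * reads it back. Returns 0 on success, -1 on failure. */
int fifo_roundtrip(const char *path, const char *text, char out[RECORD_LEN]) {
    char record[RECORD_LEN] = {0};
    int rfd, wfd, rc = -1;

    assert(strlen(text) < RECORD_LEN);
    strcpy(record, text);              /* pad the message to a full record */

    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Open the read end first and non-blocking, so that neither open()
     * call has to wait for a peer in another process. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0) { unlink(path); return -1; }
    wfd = open(path, O_WRONLY);
    if (wfd < 0) { close(rfd); unlink(path); return -1; }

    /* One record per write()/read(): the byte stream has no message
     * boundaries, so both sides agree on RECORD_LEN instead. */
    if (write(wfd, record, RECORD_LEN) == RECORD_LEN &&
        read(rfd, out, RECORD_LEN) == RECORD_LEN)
        rc = 0;

    close(wfd);
    close(rfd);
    unlink(path);
    return rc;
}
```

A caller might use `fifo_roundtrip("/tmp/demo_fifo", "hello", buf)` and then treat `buf` as a NUL-padded record.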

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

107107

Locate a server via mailslot

App client 0 ... App client n (mailslot servers):
    hMS = CreateMailslot("\\.\mailslot\status");
    ReadFile(hMS, &ServStat);
    /* connect to server */

App server (mailslot client); message is sent periodically:
    while (...) {
        Sleep(...);
        hMS = CreateFile("\\*\mailslot\status");
        WriteFile(hMS, &StatInfo, ...);
    }

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


1313

Bakery Algorithm (Lamport 1974)

A solution to the critical-section problem for n threads
- Before entering its CS, a thread receives a number
- Holder of the smallest number enters the CS
- If threads Ti and Tj receive the same number: if i < j, then Ti is served first; else Tj is served first
- The numbering scheme generates numbers in monotonically non-decreasing order, i.e., 1, 1, 1, 2, 3, 3, 3, 4, 4, 5

1414

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)
- $(a,b) < (c,d)$ if $a < c$, or if $a = c$ and $b < d$
- $\max(a_0, \dots, a_{n-1})$ is a number $k$ such that $k \ge a_i$ for $i = 0, \dots, n-1$

Shared data
    int choosing[n];
    int number[n];    /* the ticket */

Data structures are initialized to 0

1515

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1)
            ;   /* wait until Tj has picked its ticket */
        while ((number[j] != 0) && ((number[j], j) "<" (number[i], i)))
            ;   /* wait while Tj holds a smaller ticket */
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
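The pseudocode above can be turned into a runnable sketch with C11 atomics and POSIX threads. Everything beyond the `choosing`/`number` arrays is an assumption of this example: the names `bakery_lock`, `bakery_unlock`, `run_bakery_demo`, the limit `MAX_THREADS`, and the use of `sched_yield()` inside the wait loops. Sequentially consistent atomics stand in for the plain shared integers of the slide, because ordinary variables give no ordering guarantees on modern hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 8

static atomic_int choosing[MAX_THREADS];
static atomic_int number[MAX_THREADS];   /* the tickets */
static int nthreads;
static int iterations;
static long shared_counter;              /* protected by the bakery lock */

static void bakery_lock(int i) {
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < nthreads; j++) {
        int t = atomic_load(&number[j]);
        if (t > max) max = t;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < nthreads; j++) {
        while (atomic_load(&choosing[j]))
            sched_yield();               /* Tj is still picking a ticket */
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            /* Stop waiting unless (number[j], j) < (number[i], i). */
            if (nj == 0 || nj > ni || (nj == ni && j >= i))
                break;
            sched_yield();
        }
    }
}

static void bakery_unlock(int i) {
    atomic_store(&number[i], 0);
}

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    for (int k = 0; k < iterations; k++) {
        bakery_lock(id);
        shared_counter++;                /* critical section */
        bakery_unlock(id);
    }
    return NULL;
}

/* Runs n threads that each increment the counter iters times and
 * returns the final counter value. */
long run_bakery_demo(int n, int iters) {
    pthread_t tid[MAX_THREADS];
    assert(n >= 1 && n <= MAX_THREADS);
    nthreads = n;
    iterations = iters;
    shared_counter = 0;
    for (int i = 0; i < n; i++) {
        atomic_store(&choosing[i], 0);
        atomic_store(&number[i], 0);
    }
    for (int i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, `run_bakery_demo(n, iters)` returns exactly `n * iters`; lost updates would show up as a smaller value.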

1616

Mutual Exclusion - Hardware Support

Interrupt Disabling
- Concurrent threads cannot overlap on a uniprocessor
- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
- Test-and-Set instruction: read & write a memory location
- Exchange instruction: swap register and memory location

Problems with Machine-Instruction Approach
- Busy waiting
- Starvation is possible
- Deadlock is possible

1717

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

1818

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti
do {
    while (TestAndSet(lock))
        ;   /* spin until the lock was free */
    critical section
    lock = false;
    remainder section
} while (1);

1919

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti
int key;
do {
    key = 1;
    while (key == 1)
        Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);

2121

Semaphores

Semaphore S: integer variable

Can only be accessed via two atomic operations

wait(S):
    while (S <= 0)
        ;   /* busy wait */
    S--;

signal(S):
    S++;

2222

Critical Section of n Threads

Shared data
    semaphore mutex;   /* initially mutex = 1 */

Thread Ti
do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record
typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
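A blocking semaphore along these lines can be sketched with POSIX threads. Note the differences, which are assumptions of this example rather than the slide's design: the condition variable's internal queue plays the role of the list `L`, `pthread_cond_wait()` and `pthread_cond_signal()` replace `suspend()` and `resume(T)`, and `value` never goes negative. The names `csem` and `csem_*` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical counting semaphore built from a mutex and a
 * condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             value;    /* always >= 0 in this variant */
} csem;

void csem_init(csem *s, int initial) {
    assert(s != NULL && initial >= 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = initial;
}

/* wait(S): block (without busy waiting) until a unit is available. */
void csem_wait(csem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns false instead of suspending. */
bool csem_trywait(csem *s) {
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->value > 0) {
        s->value--;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* signal(S): release one unit and wake one waiter, if any. */
void csem_signal(csem *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

int csem_value(csem *s) {
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

Initialized to 1, such a semaphore serves as the `mutex` of the n-thread critical-section pattern; initialized to 0, it can order two statements in different threads.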

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code
    Ti              Tj
    A               wait(flag)
    signal(flag)    B

2626

Two Types of Semaphores

Counting semaphore
- integer value can range over an unrestricted domain

Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

2828

Deadlock and Starvation

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution (for deadlock)
- all code should acquire/release semaphores in the same order
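The "same order" rule can be illustrated with two locks that are always taken in a fixed global order, here by comparing addresses. This is a sketch under assumptions: POSIX mutexes stand in for the semaphores S and Q, and the `account` type and the `transfer` function are hypothetical examples.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical shared object guarded by its own lock. */
typedef struct {
    pthread_mutex_t lock;
    long            balance;
} account;

/* Always lock the object with the lower address first, so every
 * thread acquires any given pair of locks in the same order. */
static void lock_pair(account *a, account *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account *a, account *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* transfer(x, y, ...) and transfer(y, x, ...) may run concurrently
 * without the circular wait shown for T0 and T1. */
void transfer(account *from, account *to, long amount) {
    assert(from != to);
    lock_pair(from, to);
    from->balance -= amount;
    to->balance   += amount;
    unlock_pair(from, to);
}
```

With the naive version (lock `from`, then `to`), two opposite transfers reproduce the wait(S)/wait(Q) versus wait(Q)/wait(S) pattern and can deadlock.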

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

- Trap and interrupt dispatching
- IRQL levels & interrupt precedence
- Spinlocks and kernel synchronization
- Executive synchronization

3333

Kernel Mode Versus User Mode

A processor state
- Controls access to memory
- Each memory page is tagged to show the required mode for reading and for writing
  - Protects the system from the users
  - Protects the user (process) from themselves
  - System is not protected from system
- Code regions are tagged "no write in any mode"
- Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times, mostly in the "System" process
- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous

Trap source -> dispatcher -> handlers
- Interrupt -> Interrupt dispatcher -> Interrupt service routines
- System service call -> System service dispatcher -> System services
- HW exceptions, SW exceptions -> Exception dispatcher -> Exception handlers
- Virtual address exceptions -> Virtual memory manager's pager

Interrupt Dispatching

An interrupt arrives while user- or kernel-mode code is running; the routines below run in kernel mode

Interrupt dispatch routine
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

Interrupt Precedence via IRQLs (x86)

    31    High
    30    Power fail
    29    Interprocessor Interrupt
    28    Clock
          Profile & Synch (Srv 2003)
          ...
          Device 1
    2     Dispatch/DPC
    1     APC
    0     Passive/Low

- Hardware interrupts: Device 1 up to High
- Deferrable software interrupts: APC and Dispatch/DPC
- Normal thread execution: Passive/Low

4141

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

4343

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and allocate CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

    IRQL  x64                               IA64
    15    High/Profile                      High/Profile/Power
    14    Interprocessor Interrupt/Power    Interprocessor Interrupt
    13    Clock                             Clock
    12    Synch (Srv 2003)                  Synch (MP only)
    ...   Device n                          Device n
    4     ...                               Device 1
    3     Device 1                          Correctable Machine Check
    2     Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
    1     APC                               APC
    0     Passive/Low                       Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
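The two deterministic mappings (x86 uniprocessor and x64/IA64) are plain arithmetic and can be written down as a small sketch. The function names are hypothetical, and the accepted input ranges (IRQ 0 to 15, vector 0 to 255) are assumptions made for the example; the formula for x64 and IA64 is read here as integer division of the vector number by 16.

```c
#include <assert.h>

/* x86 uniprocessor systems: IRQL = 27 - IRQ.
 * Assumes the 16 IRQ lines (0..15) of the classic PIC. */
int irql_x86_up(int irq) {
    assert(irq >= 0 && irq <= 15);
    return 27 - irq;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16 (integer
 * division), i.e. the upper four bits of an 8-bit vector. */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

For example, under these formulas IRQ 0 maps to IRQL 27 on an x86 uniprocessor, and vector 0xD1 maps to IRQL 13 on x64.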

4747

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

Processor A and Processor B each execute:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin                       /* critical section */
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
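The idea of spinning on a per-processor flag can be sketched with an MCS-style queue lock in C11. This is an illustration of the general technique only, not the Windows kernel's queued spinlock implementation; the types `qnode` and `qlock` and the functions `q_acquire` and `q_release` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiter; each waiter spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             locked;
} qnode;

/* The lock itself is just a pointer to the tail of the waiter queue. */
typedef struct {
    _Atomic(qnode *) tail;
} qlock;

void q_acquire(qlock *l, qnode *me) {
    assert(l != NULL && me != NULL);
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, true);

    /* Append ourselves to the queue in one atomic step. */
    qnode *prev = atomic_exchange(&l->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;   /* spin on our own flag, not on the shared lock word */
    }
    /* prev == NULL: the queue was empty, so the lock is ours. */
}

void q_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No known successor: try to mark the lock free. */
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;
        /* A successor is enqueueing right now; wait for its link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    /* Hand the lock to exactly one waiter, in queue order. */
    atomic_store(&succ->locked, false);
}
```

Each caller passes its own `qnode` (for example one per processor or per thread), which gives the two properties listed above: exactly one waiter is signaled on release, and waiters are served in the order in which they enqueued.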

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks: PFN or Page Frame Database lock
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include
- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized (create and initialize thread object), Ready, Standby, Running, Waiting (thread waits on an object handle), Transition, Terminated. When the object is set to the signaled state, the wait is complete and the thread leaves the Waiting state.]

5858

Interaction between Synchronization & Dispatching

- User-mode thread waits on an event object's handle
- Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list
- Another thread sets the event
- Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

- Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch
- If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

6161

What signals an object (cont'd)

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

- Deferred and Asynchronous Procedure Calls
- IRQLs and CPU Time Accounting
- Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head -> DPC object -> DPC object -> DPC object]

6666

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; DPC objects in the DPC queue; dispatcher]

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
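The accounting rule can be restated as a small decision function. This is a sketch of the rule as listed, not kernel code: the enum `charge_t`, the function `charge_for_tick`, and its parameters are hypothetical, and the 0 to 31 IRQL range assumed here is the x86 one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical categories a clock tick can be charged to. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick is charged, given the IRQL that was
 * interrupted, the mode of the interrupted code, and whether a DPC
 * was being processed. */
charge_t charge_for_tick(int irql, bool in_user_mode, bool processing_dpc) {
    assert(irql >= 0 && irql <= 31);
    if (irql < 2)    /* passive or APC level: ordinary thread time */
        return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)   /* dispatch/DPC level */
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;   /* device level and above */
}
```

The function makes the sampling nature of the scheme visible: the whole tick is charged according to the state found at the moment the clock interrupt fires.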

7373

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: Size, Type, State, wait list head, object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

7878

Wait Internals 2: Wait Blocks

- Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)
- All wait blocks from a given wait call are chained to the waiting thread
- Type indicates wait for "any" or "all"
- Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds List entry, Object, Thread, Key, Type, Next link]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects
  When waiting for any object, the return value minus WAIT_OBJECT_0 is the array index of the first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
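As a rough POSIX counterpart to the Windows mutex example, the sketch below protects a shared counter with pthread_mutex_lock()/pthread_mutex_unlock() and shows one polling attempt with pthread_mutex_trylock(). The helper names (`worker`, `run_counter_demo`, `try_lock_once`) and the 16-thread limit are assumptions made for this illustration.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;   /* shared by all worker threads */

/* Each worker adds 1 to the shared counter `iterations` times. */
static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);     /* blocks until the mutex is free */
        counter++;
        pthread_mutex_unlock(&counter_lock);   /* gives up ownership */
    }
    return NULL;
}

/* Runs up to 16 workers and returns the final counter value (-1 on error). */
long run_counter_demo(int nthreads, long iterations)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    counter = 0;
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, worker, &iterations) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return (started == nthreads) ? counter : -1;
}

/* One polling attempt: returns 1 if the lock was taken, 0 if it was busy. */
int try_lock_once(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;
}
```

With 4 workers and 10000 iterations each, `run_counter_demo(4, 10000)` should return 40000; without the mutex the increments could be lost.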

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
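Although there is no exact equivalent, a manual-reset event can be approximated with a mutex, a condition variable and a flag. The sketch below is one possible emulation, not a drop-in replacement for the Windows object; the type `manual_event` and the `event_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() is called */
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: wakes all current waiters. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not reset it. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the state persistent: a condition variable alone forgets a signal that arrives before a thread starts waiting.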

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (cont'd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same named pipe

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (cont'd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile + ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile + WriteFile + ReadFile + CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns false if client connected between CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; the application server acts as mailslot client and sends a message periodically]

Mailslot servers (App client 0 … App client n)

hMS = CreateMailslot( "\\.\mailslot\status" … );
ReadFile( hMS, &ServStat … );
/* connect to server */

Mailslot client (App server); message is sent periodically

while (…) {
    Sleep(…);
    hMS = CreateFile( "\\*\mailslot\status" … );
    …
    WriteFile( hMS, &StatInfo … );
}

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is max. msg size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. msg. len. 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

Page 14: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

Bakery Algorithm

Notation: "<" establishes lexicographical order among 2-tuples (ticket, thread id)

$(a,b) < (c,d)$ if $a < c$, or if $a = c$ and $b < d$

$\max(a_0, \ldots, a_{n-1})$ is a number $k$ such that $k \ge a_i$ for $i = 0, \ldots, n-1$

Shared data

int choosing[n];
int number[n];   /* the ticket */

Data structures are initialized to 0

Bakery Algorithm

do {
    choosing[i] = 1;
    number[i] = max(number[0], number[1], …, number[n-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < n; j++) {
        while (choosing[j] == 1) ;
        while ((number[j] != 0) && ((number[j], j) "<" (number[i], i))) ;
    }
    critical section
    number[i] = 0;
    remainder section
} while (1);
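The bakery pseudocode can be turned into compilable C roughly as follows. This sketch assumes C11 sequentially consistent atomics for `choosing` and `number` (plain ints would not be safe on modern hardware) and adds sched_yield() inside the spin loops so the demo also finishes on a single core; the thread count `N`, the iteration count and the helper names are choices made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define N 3                      /* number of competing threads */

static atomic_int choosing[N];   /* static storage: initialized to 0 */
static atomic_int number[N];     /* the tickets; 0 means "not interested" */
static long shared_counter = 0;  /* protected by the bakery lock */

/* (a,b) < (c,d) in lexicographical order */
int ticket_less(int a, int b, int c, int d)
{
    return a < c || (a == c && b < d);
}

void bakery_lock(int i)
{
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < N; j++) {
        int t = atomic_load(&number[j]);
        if (t > max)
            max = t;
    }
    atomic_store(&number[i], max + 1);
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < N; j++) {
        while (atomic_load(&choosing[j]) == 1)
            sched_yield();                 /* thread j is still picking a ticket */
        for (;;) {
            int nj = atomic_load(&number[j]);
            if (nj == 0 || !ticket_less(nj, j, atomic_load(&number[i]), i))
                break;                     /* j is not ahead of us */
            sched_yield();
        }
    }
}

void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

static void *bakery_worker(void *arg)
{
    int id = *(int *)arg;
    for (int k = 0; k < 1000; k++) {
        bakery_lock(id);
        shared_counter++;                  /* critical section */
        bakery_unlock(id);
    }
    return NULL;
}

/* Starts N workers and returns the final counter value. */
long run_bakery_demo(void)
{
    pthread_t tids[N];
    int ids[N];
    shared_counter = 0;
    for (int i = 0; i < N; i++) {
        ids[i] = i;
        pthread_create(&tids[i], NULL, bakery_worker, &ids[i]);
    }
    for (int i = 0; i < N; i++)
        pthread_join(tids[i], NULL);
    return shared_counter;
}
```

If mutual exclusion holds, `run_bakery_demo()` returns N × 1000; ties between equal tickets are broken by the thread id, exactly as in the tuple comparison above.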

Mutual Exclusion - Hardware Support

Interrupt Disabling

– Concurrent threads cannot overlap on a uniprocessor

– Thread will run until performing a system call or interrupt happens

Special Atomic Machine Instructions

– Test and Set Instruction: read & write a memory location

– Exchange Instruction: swap register and memory location

Problems with Machine-Instruction Approach

– Busy waiting

– Starvation is possible

– Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;
    critical section
    lock = false;
    remainder section
} while (1);
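In portable C the TestAndSet instruction is available as atomic_flag_test_and_set() from C11 <stdatomic.h>. The sketch below wraps it in a tiny spinlock; the type `tas_lock` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;   /* clear = free, set = held */
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* One TestAndSet attempt: returns true if the caller now holds the lock. */
bool tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set(&l->locked);
}

/* Entry section: spin until TestAndSet reports the old value "false". */
void tas_acquire(tas_lock *l)
{
    while (atomic_flag_test_and_set(&l->locked))
        ;   /* busy-wait */
}

/* Exit section: corresponds to lock = false. */
void tas_release(tas_lock *l)
{
    atomic_flag_clear(&l->locked);
}
```

A caller would bracket its critical section with `tas_acquire(&l)` and `tas_release(&l)`; the busy-wait loop is the cost noted under "Problems with Machine-Instruction Approach".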

Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);
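The same entry protocol can be written with the C11 atomic_exchange() operation, which swaps a value into memory and returns the old content in one atomic step. This is a sketch with a single global lock word; `swap_lock` and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int swap_lock = 0;   /* 0 = free, 1 = held */

/* Atomically exchanges *key with the lock word, like Swap(lock, key). */
static void swap_with_lock(int *key)
{
    *key = atomic_exchange(&swap_lock, *key);
}

/* Entry section: keep swapping 1 in until the old value that comes back is 0. */
void swap_acquire(void)
{
    int key = 1;
    while (key == 1)
        swap_with_lock(&key);
}

/* Exit section: lock = 0 */
void swap_release(void)
{
    atomic_store(&swap_lock, 0);
}

int swap_lock_is_held(void)
{
    return atomic_load(&swap_lock) == 1;
}
```

While another thread holds the lock, each exchange writes 1 over 1 and returns 1, so the loop spins without changing the lock state.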

Semaphores

Semaphore S: integer variable

can only be accessed via two atomic operations

wait(S):
    while (S <= 0) ;
    S--;

signal(S):
    S++;

Critical Section of n Threads

Shared data

semaphore mutex;   /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

– Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

– suspend() suspends the thread that invokes it

– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
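A user-level approximation of this blocking semaphore can be built on a pthread mutex and condition variable. In the sketch below the condition variable stands in for the list S.L, pthread_cond_wait() for suspend() and pthread_cond_signal() for resume(T); the `wakeups` counter and all `sketch_sem*` names are assumptions added for this example.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;               /* may go negative: -value = number of waiters */
    int wakeups;             /* resume() calls not yet consumed by a waiter */
    pthread_mutex_t lock;    /* makes wait/signal atomic */
    pthread_cond_t  queue;   /* plays the role of the list S.L */
} sketch_sem;

void sketch_sem_init(sketch_sem *s, int initial)
{
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void sketch_sem_wait(sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        /* suspend(): always sleep at least once, then consume one wakeup */
        do {
            pthread_cond_wait(&s->queue, &s->lock);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void sketch_sem_signal(sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {
        /* resume(T): hand one wakeup to some suspended thread */
        s->wakeups++;
        pthread_cond_signal(&s->queue);
    }
    pthread_mutex_unlock(&s->lock);
}

int sketch_sem_value(sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

The wakeup counter keeps a late-arriving thread from stealing a resume that was meant for a thread already suspended, and it also absorbs spurious wakeups.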

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                Tj
A                 wait(flag)
signal(flag)      B

Two Types of Semaphores

Counting semaphore

– integer value can range over an unrestricted domain

Binary semaphore

– integer value can range only between 0 and 1

– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
…               …
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation: indefinite blocking

– A thread may never be removed from the semaphore queue in which it is suspended

Solution

– all code should acquire/release semaphores in same order
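One common way to enforce a single acquisition order is to sort the locks by some fixed key before taking them. The sketch below uses pthread mutexes as a stand-in for the semaphores S and Q and orders them by address; `lock_both` and `unlock_both` are hypothetical helpers, and the two mutexes are assumed to be distinct.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Locks two distinct mutexes in one global order (lower address first).
   Returns the mutex that was locked first, which is handy for checking. */
pthread_mutex_t *lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    pthread_mutex_t *second = (first == a) ? b : a;
    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    return first;
}

/* Releases both mutexes; the release order does not affect deadlock freedom. */
void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, T0 calling `lock_both(&S, &Q)` and T1 calling `lock_both(&Q, &S)` acquire the locks in the same order, so the circular wait from the previous slide cannot form.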


Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events An event acts much like a condition variable


Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking multithreading (including real-time threads) and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

– Chapter 7 - Process Synchronization

– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

– Protects the system from the users

– Protects the user (process) from themselves

– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

– Does not affect scheduling

– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

– "Privileged Time" and "User Time"

– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

– Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

– interrupt dispatcher invokes the interrupt service routine

– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture executing thread

– Switch from user to kernel mode

– Interrupts: asynchronous

– Exceptions: synchronous

[Figure: each kind of trap is routed to its handler. Interrupt → interrupt dispatcher → interrupt service routines; system service call → system service dispatcher → system services; HW exceptions and SW exceptions → exception dispatcher → exception handlers; virtual address exceptions → virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; the interrupt dispatch routine and the interrupt service routine run in kernel mode]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– this masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

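Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest precedence first]

31      High
30      Power fail
29      Interprocessor Interrupt
28      Clock
27      Profile & Synch (Srv 2003)
…       Device levels, down to Device 1 (hardware interrupts)
2       Dispatch/DPC (deferrable software interrupt)
1       APC (deferrable software interrupt)
0       Passive/Low (normal thread execution)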

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High

– used when halting the system (via KeBugCheck())

Power fail

– originated in the NT design document, but has never been used

Inter-processor interrupt

– used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock, allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (cont'd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest precedence first]

IRQL    x64                               IA64
15      High/Profile                      High/Profile/Power
14      Interprocessor Interrupt/Power    Interprocessor Interrupt
13      Clock                             Clock
12      Synch (Srv 2003)                  Synch (MP only)
…       Device n                          Device n
4       …                                 Device 1
3       Device 1                          Correctable Machine Check
2       Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1       APC                               APC
0       Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device. Can be configured with IntFilter utility in Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
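The two arithmetic rules above can be written down directly. The function names are made up for this sketch, and the x64/IA64 rule is assumed to use integer division (one IRQL per block of 16 IDT vectors).

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ */
int irql_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16 */
int irql_from_idt_vector(int idt_vector)
{
    return idt_vector / 16;
}
```

For example, IRQ 8 on an x86 uniprocessor maps to IRQL 19, and IDT vector 0x41 on x64 maps to IRQL 4.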

Software interrupts

Initiating thread dispatching

– DPC allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– Spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B both execute the sequence below; the spinlock guards the critical section that operates on the shared DPC queue]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)

Queued Spinlocks

Problem: Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first waiting processor is modified

– Exactly one processor is being signaled

– Pre-determined wait order
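The queueing idea can be illustrated with an MCS-style lock written with C11 atomics. This is a generic sketch of the technique, not the Windows kernel implementation; the `mcs_*` names are assumptions, and each waiter supplies its own queue node, which plays the role of the processor-local flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool locked;                /* local flag this waiter spins on */
} mcs_node;

typedef struct {
    _Atomic(mcs_node *) tail;          /* last waiter, NULL when the lock is free */
} mcs_lock;

void mcs_init(mcs_lock *l)
{
    atomic_init(&l->tail, NULL);
}

void mcs_acquire(mcs_lock *l, mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                /* link behind predecessor */
        while (atomic_load(&me->locked))
            ;                                         /* spin on our own flag only */
    }
}

void mcs_release(mcs_lock *l, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* nobody queued behind us: try to mark the lock free */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* a waiter is in the middle of linking itself; wait for the link */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);               /* signal exactly one waiter */
}
```

Each waiter spins on the flag in its own node, so a release touches only the successor's flag, and the queue order fixes the order in which waiters get the lock.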

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include

– Events (may be auto-reset or manual reset, may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" leads to Waiting; "Set object to signaled state" completes the wait ("Wait is complete")]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: Owning thread releases mutex

– signaled → nonsignaled: Resumed thread acquires mutex

– Effect: Kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: Owning thread or other thread releases mutex

– signaled → nonsignaled: Resumed thread acquires mutex

– Effect: Kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: One thread releases the semaphore, freeing a resource

– signaled → nonsignaled: A thread acquires the semaphore. More resources are not available

– Effect: Kernel resumes one or more waiting threads

What signals an object (cont'd)

Event

– nonsignaled → signaled: A thread sets the event

– signaled → nonsignaled: Kernel resumes one or more threads

– Effect: Kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: Dedicated thread sets one event in the event pair

– signaled → nonsignaled: Kernel resumes the other dedicated thread

– Effect: Kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: Timer expires

– signaled → nonsignaled: A thread (re)initializes the timer

– Effect: Kernel resumes all waiting threads

Thread

– nonsignaled → signaled: Thread terminates

– signaled → nonsignaled: A thread reinitializes the thread object

– Effect: Kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, a queue head linked to a list of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down to Dispatch/DPC, APC, and Low; the DPC queue; the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level.
3. After the DPC interrupt, control transfers to the thread dispatcher.
4. Dispatcher executes each DPC routine in the DPC queue.

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects.

DPC routines can't assume what process address space is currently mapped.

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserApc
– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking a list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
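
The four accounting rules above can be written as one small decision function. The sketch below is only an illustration of the rules as stated on the slide; the enum, the function name charge_tick, and its parameters are hypothetical and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buckets that one clock tick can be charged to. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick is charged, following the slide's rules.
 *   irql      - IRQL the processor was at when the clock interrupt arrived
 *   in_dpc    - true if a DPC routine was executing
 *   user_mode - true if the interrupted code was running in user mode */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0 && irql <= 31);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;   /* irql > 2: servicing a hardware interrupt */
}
```

For example, charge_tick(2, true, false) selects the DPC bucket, while charge_tick(5, false, false) selects interrupt time.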

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (they are not really processes)
– Context switches for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: a header with Size, Type, State, and wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, point to chains of wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type, and Next link; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) links the wait blocks that are waiting on it]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection ( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection ( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the body raises an exception */
            LeaveCriticalSection ( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait; only check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to an array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects
  When bWaitAll is FALSE, the function returns WAIT_OBJECT_0 plus the index of the first signaled object

Side effects
– Mutexes, auto-reset events, and waitable timers will be reset to the nonsignaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0) {
            counter += local_value;
        } else {
            /* mutex was abandoned: this thread still owns it now, */
            /* so release it before exiting the loop */
            ReleaseMutex( mutex );
            break;
        }
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in the same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
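
As a rough POSIX counterpart to the Windows mutex example, the sketch below protects a shared counter with pthread_mutex_lock() and shows the polling style with pthread_mutex_trylock(). The names counter_lock, add_worker, run_mutex_demo, and try_enter are invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>

#define MUTEX_DEMO_MAX 16

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;          /* protected by counter_lock */

/* Each worker adds *arg increments, taking the mutex around every update. */
void *add_worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks until acquired */
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers and return the final counter value. */
long run_mutex_demo(int nthreads, long iters)
{
    pthread_t t[MUTEX_DEMO_MAX];
    assert(nthreads > 0 && nthreads <= MUTEX_DEMO_MAX);
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, add_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}

/* Polling acquire: returns 1 if the mutex was taken, 0 if it was busy. */
int try_enter(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;
}
```

With four workers and 10000 iterations each, run_mutex_demo(4, 10000) is expected to return 40000, since every increment happens under the lock.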

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value
– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously; it must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
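
Although POSIX has no direct equivalent, a manual-reset event can be approximated with a mutex, a condition variable, and a boolean flag. The following is a minimal sketch under that assumption; the type mr_event_t and all function names are hypothetical and only mimic the SetEvent/ResetEvent/wait behaviour described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define EVENT_DEMO_MAX 16

/* Hypothetical emulation of a manual-reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until explicitly reset */
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial;
}

void mr_event_set(mr_event_t *e)      /* like SetEvent(): release all waiters */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

void mr_event_reset(mr_event_t *e)    /* like ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void mr_event_wait(mr_event_t *e)     /* like an INFINITE wait on the event */
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static mr_event_t gate;
static int released;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

void *gate_waiter(void *unused)
{
    (void)unused;
    mr_event_wait(&gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Start nwaiters threads, set the event once, and count how many got through. */
int run_event_demo(int nwaiters)
{
    pthread_t t[EVENT_DEMO_MAX];
    assert(nwaiters > 0 && nwaiters <= EVENT_DEMO_MAX);
    mr_event_init(&gate, false);
    released = 0;
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&t[i], NULL, gate_waiter, NULL);
    mr_event_set(&gate);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

Because the flag stays set until mr_event_reset() is called, threads that arrive after mr_event_set() also pass, which is the property that distinguishes a manual-reset event from a bare pthread_cond_broadcast().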

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2; prog1 writes to prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Equivalent to the sequence WriteFile, ReadFile

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Equivalent to the sequence CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then returns ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
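
To make the FIFO comparison concrete, here is a minimal POSIX sketch in which a child process writes one short record into a FIFO and the parent reads it back. The function name fifo_roundtrip and the path passed to it are hypothetical; error handling is reduced to the bare minimum.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send msg (including its terminating NUL) through a FIFO at path,
 * from a child process to the calling process.
 * Returns the number of bytes read into out, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg, char *out, size_t outsz)
{
    assert(path != NULL && msg != NULL && out != NULL);
    unlink(path);                          /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                        /* child: the writer */
        int wfd = open(path, O_WRONLY);    /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        if (write(wfd, msg, strlen(msg) + 1) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    int rfd = open(path, O_RDONLY);        /* blocks until the writer opens */
    ssize_t n = -1;
    if (rfd >= 0) {
        n = read(rfd, out, outsz);         /* one small record, read in one call */
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);                          /* the FIFO name outlives the processes */
    return n;
}
```

The data flows in one direction only, which is why a FIFO-based server needs a second FIFO per client for responses, as noted above.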

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf( "Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

Mailslot servers (App client 0 ... App client n), each running:

    hMS = CreateMailslot( "\\.\mailslot\status" ... );
    ReadFile(hMS, &ServStat ... );
    /* connect to server */

Mailslot client (App server):

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" ... );
        WriteFile(hMS, &StatInfo ... );
    }

Message is sent periodically

Creating a mailslot

    HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Bakery Algorithm

    do {
        choosing[i] = 1;
        number[i] = max(number[0], number[1], ..., number[n-1]) + 1;
        choosing[i] = 0;
        for (j = 0; j < n; j++) {
            while (choosing[j] == 1) ;
            while ((number[j] != 0) && ((number[j], j) < (number[i], i))) ;
        }
        critical section
        number[i] = 0;
        remainder section
    } while (1);
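
The pseudocode assumes that reads and writes of choosing[] and number[] become visible to other threads in program order. On modern hardware that has to be requested explicitly, so the sketch below uses C11 sequentially consistent atomics. It is an illustrative demo, not part of the original slides: the names bakery_lock, bakery_unlock, and run_bakery_demo are invented, and sched_yield() is added only to keep the busy waiting tolerable on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define BAKERY_MAX 8

static atomic_int choosing[BAKERY_MAX];
static atomic_int number[BAKERY_MAX];
static int  bakery_n;          /* number of participating threads */
static long bakery_iters;
static long bakery_counter;    /* shared data protected by the bakery lock */

void bakery_lock(int i)
{
    atomic_store(&choosing[i], 1);
    int max = 0;
    for (int j = 0; j < bakery_n; j++) {
        int v = atomic_load(&number[j]);
        if (v > max)
            max = v;
    }
    atomic_store(&number[i], max + 1);       /* take a ticket */
    atomic_store(&choosing[i], 0);

    for (int j = 0; j < bakery_n; j++) {
        while (atomic_load(&choosing[j]))    /* j is still picking a ticket */
            sched_yield();
        for (;;) {
            int nj = atomic_load(&number[j]);
            int ni = atomic_load(&number[i]);
            /* proceed unless j holds a ticket with (number[j], j) < (number[i], i) */
            if (nj == 0 || nj > ni || (nj == ni && j >= i))
                break;
            sched_yield();
        }
    }
}

void bakery_unlock(int i)
{
    atomic_store(&number[i], 0);
}

void *bakery_worker(void *arg)
{
    int i = (int)(intptr_t)arg;
    for (long k = 0; k < bakery_iters; k++) {
        bakery_lock(i);
        bakery_counter++;                    /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

/* Run n threads, each doing iters increments; returns the final counter. */
long run_bakery_demo(int n, long iters)
{
    pthread_t t[BAKERY_MAX];
    assert(n > 0 && n <= BAKERY_MAX);
    bakery_n = n;
    bakery_iters = iters;
    bakery_counter = 0;
    for (int i = 0; i < n; i++) {
        atomic_store(&choosing[i], 0);
        atomic_store(&number[i], 0);
    }
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, bakery_worker, (void *)(intptr_t)i);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return bakery_counter;
}
```

If mutual exclusion holds, run_bakery_demo(3, 500) returns 1500; a lost update would show up as a smaller value.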

Mutual Exclusion - Hardware Support

Interrupt Disabling
– Concurrent threads cannot overlap on a uniprocessor
– Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
– Test and Set Instruction: read & write a memory location
– Exchange Instruction: swap register and memory location

Problems with Machine-Instruction Approach
– Busy waiting
– Starvation is possible
– Deadlock is possible

Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock)) ;
        critical section
        lock = false;
        remainder section
    } while (1);
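
In C11, atomic_flag_test_and_set() provides the TestAndSet primitive directly. The sketch below builds the same spin lock with it; the names tas_acquire, tas_release, and run_tas_demo are invented for this illustration, and sched_yield() is an addition that softens the busy wait.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define TAS_MAX_THREADS 8

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;  /* clear means lock == false */
static long tas_counter;                         /* guarded by tas_lock */
static long tas_iters;

void tas_acquire(void)
{
    /* Returns the old value and sets the flag in one atomic step,
     * which is what TestAndSet(lock) does in the pseudocode. */
    while (atomic_flag_test_and_set(&tas_lock))
        sched_yield();
}

void tas_release(void)
{
    atomic_flag_clear(&tas_lock);                /* lock = false */
}

void *tas_worker(void *unused)
{
    (void)unused;
    for (long k = 0; k < tas_iters; k++) {
        tas_acquire();
        tas_counter++;                           /* critical section */
        tas_release();
    }
    return NULL;
}

/* Run nthreads workers; returns the final counter value. */
long run_tas_demo(int nthreads, long iters)
{
    pthread_t t[TAS_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= TAS_MAX_THREADS);
    tas_counter = 0;
    tas_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, tas_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return tas_counter;
}
```

Note that nothing in this lock bounds how long a particular thread waits, which matches the starvation caveat listed under the machine-instruction approach.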

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
        critical section
        lock = 0;
        remainder section
    } while (1);
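
The Swap-based loop maps onto C11's atomic_exchange(), which stores a new value and returns the old one atomically. A minimal sketch follows, with invented names (swap_acquire, swap_release, run_swap_demo) and an added sched_yield() in the spin loop.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define SWAP_MAX_THREADS 8

static atomic_int swap_lock;     /* 0 = free, 1 = held; zero-initialized */
static long swap_counter;        /* guarded by swap_lock */
static long swap_iters;

void swap_acquire(void)
{
    int key = 1;
    while (key == 1) {
        /* Swap(lock, key): put 1 into the lock, get the old lock value in key */
        key = atomic_exchange(&swap_lock, 1);
        if (key == 1)
            sched_yield();       /* lock was already held, try again */
    }
}

void swap_release(void)
{
    atomic_store(&swap_lock, 0); /* lock = 0 */
}

void *swap_worker(void *unused)
{
    (void)unused;
    for (long k = 0; k < swap_iters; k++) {
        swap_acquire();
        swap_counter++;          /* critical section */
        swap_release();
    }
    return NULL;
}

/* Run nthreads workers; returns the final counter value. */
long run_swap_demo(int nthreads, long iters)
{
    pthread_t t[SWAP_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= SWAP_MAX_THREADS);
    swap_counter = 0;
    swap_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, swap_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return swap_counter;
}
```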

Semaphores

Semaphore S: integer variable
– can only be accessed via two atomic operations

    wait (S):
        while (S <= 0) ;
        S--;

    signal (S):
        S++;

Critical Section of n Threads

Shared data

    semaphore mutex;  // initially mutex = 1

Thread Ti

    do {
        wait(mutex);
        critical section
        signal(mutex);
        remainder section
    } while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
– Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations
– suspend() suspends the thread that invokes it
– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
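
A user-level approximation of this blocking semaphore can be written with a pthread mutex and condition variable, where the condition variable's wait queue stands in for the list S.L. This is a sketch under that assumption, not the implementation the slides refer to: the type ksem_t, the extra wakeups counter (which absorbs spurious wakeups), and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define KSEM_MAX_THREADS 8

typedef struct {
    int value;               /* may go negative: -value = number of waiters */
    int wakeups;             /* resume() tokens not yet consumed */
    pthread_mutex_t lock;
    pthread_cond_t  queue;   /* plays the role of the list S.L */
} ksem_t;

void ksem_init(ksem_t *s, int value)
{
    s->value = value;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void ksem_wait(ksem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        do {                                    /* suspend() */
            pthread_cond_wait(&s->queue, &s->lock);
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void ksem_signal(ksem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {
        s->wakeups++;                           /* resume(T) for one waiter */
        pthread_cond_signal(&s->queue);
    }
    pthread_mutex_unlock(&s->lock);
}

static ksem_t ksem_mutex;
static long ksem_counter;
static long ksem_iters;

void *ksem_worker(void *unused)
{
    (void)unused;
    for (long k = 0; k < ksem_iters; k++) {
        ksem_wait(&ksem_mutex);
        ksem_counter++;                         /* critical section */
        ksem_signal(&ksem_mutex);
    }
    return NULL;
}

/* Use the semaphore as a mutex (initial value 1) shared by nthreads workers. */
long run_ksem_demo(int nthreads, long iters)
{
    pthread_t t[KSEM_MAX_THREADS];
    assert(nthreads > 0 && nthreads <= KSEM_MAX_THREADS);
    ksem_init(&ksem_mutex, 1);
    ksem_counter = 0;
    ksem_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, ksem_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return ksem_counter;
}
```

Waiting threads sleep inside pthread_cond_wait() instead of spinning, which is the point of the suspend/resume formulation.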

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

    Ti                Tj
    A                 wait(flag)
    signal(flag)      B
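
The same ordering constraint can be tried out with a POSIX unnamed semaphore. In the sketch below, thread_i performs step A and posts the semaphore, and thread_j waits on it before performing step B; the step counters and the function run_order_demo are invented for the demonstration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag;                 /* initialized to 0 in run_order_demo */
static int step, a_step, b_step;   /* record the order in which A and B ran */

void *thread_i(void *unused)       /* Ti: A; signal(flag) */
{
    (void)unused;
    a_step = ++step;               /* statement A */
    sem_post(&flag);
    return NULL;
}

void *thread_j(void *unused)       /* Tj: wait(flag); B */
{
    (void)unused;
    while (sem_wait(&flag) != 0)
        ;                          /* retry if interrupted by a signal */
    b_step = ++step;               /* statement B */
    return NULL;
}

/* Returns 1 if A ran before B, 0 otherwise. */
int run_order_demo(void)
{
    pthread_t ti, tj;
    step = a_step = b_step = 0;
    sem_init(&flag, 0, 0);
    pthread_create(&tj, NULL, thread_j, NULL);   /* start the waiter first */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return a_step == 1 && b_step == 2;
}
```

Starting Tj first makes the point that the semaphore, and not thread start order, enforces A before B.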

Two Types of Semaphores

Counting semaphore
– integer value can range over an unrestricted domain

Binary semaphore
– integer value can range only between 0 and 1
– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S);        wait(Q);
    wait(Q);        wait(S);
    signal(S);      signal(Q);
    signal(Q);      signal(S);

If T0 completes wait(S) and T1 completes wait(Q), each then blocks forever in its second wait.

Deadlock and Starvation

Starvation: indefinite blocking
– A thread may never be removed from the semaphore queue in which it is suspended

Solution
– all code should acquire/release semaphores in the same order
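
One common way to enforce a single acquisition order is to sort the locks by address before taking them. The sketch below applies that idea to two pthread mutexes named S and Q after the slide; lock_pair, unlock_pair, and run_pair_demo are hypothetical helpers, and ordering by address is just one possible convention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long pair_counter;      /* protected by holding both S and Q */
static long pair_iters;

/* Acquire both locks in a global order (lower address first),
 * regardless of the order in which the caller names them. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

void *t0_worker(void *unused)  /* names the locks as S, Q like T0 */
{
    (void)unused;
    for (long k = 0; k < pair_iters; k++) {
        lock_pair(&S, &Q);
        pair_counter++;
        unlock_pair(&S, &Q);
    }
    return NULL;
}

void *t1_worker(void *unused)  /* names the locks as Q, S like T1 */
{
    (void)unused;
    for (long k = 0; k < pair_iters; k++) {
        lock_pair(&Q, &S);
        pair_counter++;
        unlock_pair(&Q, &S);
    }
    return NULL;
}

/* Returns the final counter; with unordered locking this could hang instead. */
long run_pair_demo(long iters)
{
    pthread_t t0, t1;
    assert(iters >= 0);
    pair_counter = 0;
    pair_iters = iters;
    pthread_create(&t0, NULL, t0_worker, NULL);
    pthread_create(&t1, NULL, t1_worker, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return pair_counter;
}
```

Because both threads end up locking in the same order, the circular wait from the T0/T1 table cannot form.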

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982.

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986.

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003.
– Chapter 7 - Process Synchronization
– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt Dispatching

IRQL Levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
– Protects the system from the users
– Protects the user (process) from themselves
– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
– Does not affect scheduling
– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
– "Privileged Time" and "User Time"
– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
– Via the system service dispatch mechanism
– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
– Scheduled, preempted, etc. like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
– Interrupt dispatcher invokes the interrupt service routine
– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
– ISR often requests the execution of a "DPC routine", which also runs in kernel mode
– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
– Switch from user to kernel mode
– Interrupts: asynchronous
– Exceptions: synchronous

[Figure: trap handlers. Interrupts go to the interrupt dispatcher, which calls interrupt service routines; system service calls go to the system service dispatcher, which calls system services; HW and SW exceptions go to the exception dispatcher, which calls exception handlers; virtual address exceptions go to the virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; the interrupt dispatch routine and the interrupt service routine run in kernel mode]

Interrupt dispatch routine
– Disable interrupts
– Record machine state (trap frame) to allow resume
– Mask equal- and lower-IRQL interrupts
– Find and call appropriate ISR
– Dismiss interrupt
– Restore machine state (including mode and enabled interrupts)

Interrupt service routine
– Tell the device to stop interrupting
– Interrogate device state, start next operation on device, etc.
– Request a DPC
– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
– Precedence of the interrupt with respect to other interrupts
– Different interrupt sources have different IRQLs
– Not the same as IRQ

IRQL is also a state of the processor
– Servicing an interrupt raises processor IRQL to that interrupt's IRQL
– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

    IRQL   Level
    31     High
    30     Power fail
    29     Interprocessor Interrupt
    28     Clock
           Profile & Synch (Srv 2003)
           Device 1
    2      Dispatch/DPC
    1      APC
    0      Passive/Low

Levels from Device 1 up to High are hardware interrupts; APC and Dispatch/DPC are deferrable software interrupts; Passive/Low is normal thread execution.

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts processor (single line)
– Processor queries for interrupt vector; uses vector as index to IDT

Alpha
– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn on/off ISR

Interrupt objects can synchronize access to ISR data
– Multiple instances of an ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document, but has never been used

Inter-processor interrupt
– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
– Used to update the system's clock and for allocation of CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

x64
    15   High/Profile
    14   Interprocessor Interrupt/Power
    13   Clock
    12   Synch (Srv 2003)
         Device n
         ...
    3    Device 1
    2    Dispatch/DPC
    1    APC
    0    Passive/Low

IA64
    15   High/Profile/Power
    14   Interprocessor Interrupt
    13   Clock
    12   Synch (MP only)
         Device n
         ...
    4    Device 1
    3    Correctable Machine Check
    2    Dispatch/DPC & Synch (UP only)
    1    APC
    0    Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
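
The two closed-form rules above are simple arithmetic. The following sketch only restates them as C helpers; the function names are hypothetical, and the IRQ and vector values in the usage note are arbitrary inputs chosen to show the calculation.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ */
int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);
    return 27 - irq;
}

/* x64 / IA64 rule from the slide: IRQL = IDT vector number / 16
 * (integer division, so each group of 16 vectors shares one IRQL) */
int irql_from_vector(int idt_vector)
{
    assert(idt_vector >= 0 && idt_vector <= 255);
    return idt_vector / 16;
}
```

For instance, irql_x86_up(8) evaluates to 19, and irql_from_vector(0x2F) evaluates to 2 because vectors 0x20 through 0x2F all fall into the same group.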

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- A spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

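
The notion of a spinlock as a single memory cell updated by an atomic test-and-modify operation can be sketched in user mode with C11 atomics. This is an illustration only, not the kernel's KSPIN_LOCK code; the type `tas_lock_t` and the `tas_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The whole lock is one flag: clear = free, set = owned. */
typedef struct { atomic_flag held; } tas_lock_t;

void tas_lock_init(tas_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->held);
}

/* One atomic test-and-set attempt; true means the caller now owns the lock. */
bool tas_try_lock(tas_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the test-and-set observes the flag clear. */
void tas_lock(tas_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin */
    }
}

void tas_unlock(tas_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every failed attempt in `tas_lock` is still a write to the shared cell, which is the source of the bus contention that queued spinlocks are meant to reduce.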

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B both run the same sequence; the spinlock guards the critical section around the shared DPC queue:

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)


Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified
- Exactly one processor is being signaled
- Pre-determined wait order

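
Waiting on a processor-local flag can be illustrated with an MCS-style queue lock. The sketch below uses C11 atomics in user mode and is not the Windows kernel implementation; `mcs_lock`, `mcs_node`, `mcs_acquire` and `mcs_release` are hypothetical names, and each waiter is assumed to supply its own node.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;  /* successor in the wait queue */
    atomic_bool locked;               /* local flag this waiter spins on */
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

void mcs_init(mcs_lock *l) { atomic_init(&l->tail, NULL); }

void mcs_acquire(mcs_lock *l, mcs_node *me)
{
    assert(l != NULL && me != NULL);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Append ourselves to the queue with one atomic exchange. */
    mcs_node *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                       /* queue was empty: lock acquired */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    /* Spin only on our own flag, not on the shared lock word. */
    while (atomic_load_explicit(&me->locked, memory_order_acquire)) {
        /* spin */
    }
}

void mcs_release(mcs_lock *l, mcs_node *me)
{
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* No visible successor: try to mark the queue empty. */
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor is enqueueing; wait until it links itself in. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL) {
            /* spin */
        }
    }
    /* Hand the lock to exactly one waiter, in queue order. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

The release path writes to a single successor's flag, which matches the two properties listed on the slide: exactly one processor is signaled, and the wait order is fixed by queue position.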

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length of time they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks


Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable


Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. A thread object is created and initialized (Initialized) and then moves through the states Ready, Standby, Running, Transition, Waiting and Terminated. A thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete.]


Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later


What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- Nonsignaled to signaled: owning thread releases mutex
- Signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- Nonsignaled to signaled: owning thread or other thread releases mutex
- Signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- Nonsignaled to signaled: one thread releases the semaphore, freeing a resource
- Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd)

Event
- Nonsignaled to signaled: a thread sets the event
- Signaled to nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- Nonsignaled to signaled: dedicated thread sets one event in the event pair
- Signaled to nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- Nonsignaled to signaled: timer expires
- Signaled to nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- Nonsignaled to signaled: thread terminates
- Signaled to nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads


Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)


3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects


Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor but can be targeted to other CPUs
- Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx


Deferred Procedure Calls (DPCs)

[Figure: a DPC queue, shown as a queue head followed by a chain of DPC objects]

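
The queue-of-DPC-objects picture can be modeled as a singly linked FIFO of routine/context pairs that is drained in order. This is a simplified, single-threaded sketch for illustration; the `dpc`, `dpc_queue`, `dpc_enqueue` and `dpc_drain` names are hypothetical and do not correspond to the kernel's KDPC structures.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dpc_routine)(void *context);

/* One deferred call: what to run, and with which argument. */
typedef struct dpc {
    dpc_routine routine;
    void *context;
    struct dpc *next;
} dpc;

/* One such queue would exist per processor. */
typedef struct { dpc *head; dpc *tail; } dpc_queue;

/* Called by the "ISR": append a DPC object to the end of the queue. */
void dpc_enqueue(dpc_queue *q, dpc *d)
{
    assert(d != NULL && d->routine != NULL);
    d->next = NULL;
    if (q->tail != NULL)
        q->tail->next = d;
    else
        q->head = d;
    q->tail = d;
}

/* Called once the IRQL has dropped: run every queued routine in FIFO
 * order and return how many were executed. */
int dpc_drain(dpc_queue *q)
{
    int executed = 0;
    while (q->head != NULL) {
        dpc *d = q->head;
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
        d->routine(d->context);
        executed++;
    }
    return executed;
}

/* Example routine: counts how often it has been invoked. */
void count_invocation(void *context)
{
    ++*(int *)context;
}
```

A real per-processor queue would additionally be protected by raising IRQL or taking a spinlock; that protection is deliberately left out of this model.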

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from High and Power failure down to Dispatch/DPC, APC and Low, next to the DPC queue and the dispatcher]


Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()


Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"


Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each holding APC objects]


IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 & not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time

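
The four accounting rules of the clock ISR form a small decision function. The sketch below encodes them directly, assuming the x86 numbering where IRQL 2 is the dispatch/DPC level; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged.
 * irql:       IRQL the processor was at when the clock interrupt arrived
 * in_dpc:     true if a DPC routine was executing
 * user_mode:  true if the interrupted thread was running in user mode */
charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;
}
```

Only the first and third branches ever charge a thread, which is why heavy interrupt or DPC activity does not show up in any thread's CPU time.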

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread. If the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)


Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches shown for these are really counts of interrupts & DPCs


Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g. one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column


Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat


Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a common header with Size, Type, State and Wait list head, followed by object-type-specific data]


Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, Wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds List entry, Object, Thread, Key, Type and Next link]


3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots


Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section, other than trying to enter it with TryEnterCriticalSection()

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );


Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );


Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );


Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
- dwTimeout == 0: no wait, check whether object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects. When waiting for any object, the function returns the index of the first signaled object (as an offset from WAIT_OBJECT_0)

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );


Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );


Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

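
The blocking and polling variants compared above can be sketched with a shared counter. This is a minimal illustration assuming a default (non-recursive) statically initialized mutex; `counter_lock`, `shared_counter`, `add_blocking` and `add_if_free` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
int shared_counter = 0;

/* Blocking acquire, comparable to WaitForSingleObject(hMutex, INFINITE). */
void add_blocking(int value)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    shared_counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire, comparable to WaitForSingleObject(hMutex, 0).
 * Returns 1 if the update was applied, 0 if the mutex was busy. */
int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc == EBUSY)
        return 0;
    assert(rc == 0);
    shared_counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

Unlike a Windows mutex, a default pthread mutex is not recursive, so a thread that already holds `counter_lock` would see `add_if_free` report "busy" rather than succeed.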

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any positive value, as long as the maximum count is not exceeded
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );


Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );


Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

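
Because a condition variable carries no state of its own, a manual-reset event has to be emulated by pairing it with a mutex and a flag. The following is one possible sketch under that assumption; the `mr_event` type and its functions are hypothetical names, not part of pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool signaled;            /* the state a condition variable lacks */
} mr_event;

void mr_event_init(mr_event *e, bool initially_signaled)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->changed, NULL);
    assert(rc == 0);
    e->signaled = initially_signaled;
}

/* Like SetEvent() on a manual-reset event: wake all waiters, stay set. */
void mr_event_set(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->changed);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(). */
void mr_event_reset(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE); does not consume the signal. */
void mr_event_wait(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->changed, &e->lock);
    pthread_mutex_unlock(&e->lock);
}
```

Waiting twice after a single `mr_event_set` returns immediately both times, which is the manual-reset behavior; an auto-reset variant would clear `signaled` inside `mr_event_wait` and use `pthread_cond_signal` instead.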

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way


I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe; handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);


Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput  = hReadPipe;
StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;


Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions


Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)


Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive - is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );


Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo


Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle sequence
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT


Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE


Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios


Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;


Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */


Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain


Locate a server via mailslot

Mailslot servers (App client 0 ... App client n) each run:

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

Mailslot client (App server); the message is sent periodically:

while (...) {
    Sleep(...);
    hMS = CreateFile( "\\*\mailslot\status" );
    ...
    WriteFile( hMS, &StatInfo );
}


Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait


Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Mutual Exclusion - Hardware Support

Interrupt Disabling
- Concurrent threads cannot overlap on a uniprocessor
- Thread will run until it performs a system call or an interrupt happens

Special Atomic Machine Instructions
- Test and Set Instruction - read & write a memory location
- Exchange Instruction - swap register and memory location

Problems with Machine-Instruction Approach
- Busy waiting
- Starvation is possible
- Deadlock is possible


Synchronization Hardware

Test and modify the content of a word atomically

boolean TestAndSet(boolean &target) {
    boolean rv = target;
    target = true;
    return rv;
}


Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

do {
    while (TestAndSet(lock)) ;   /* busy-wait until the lock was free */
    critical section
    lock = false;
    remainder section
} while (1);


Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}


Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);   /* leaves the loop once lock was 0 */
    critical section
    lock = 0;
    remainder section
} while (1);

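
The swap-based entry protocol maps directly onto an atomic exchange. The sketch below uses C11 `atomic_exchange` as the Swap primitive, which is an assumption about how the instruction would be exposed; `swap_lock_t` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int lock; } swap_lock_t;   /* 0 = free, 1 = held */

void swap_lock_init(swap_lock_t *l)
{
    assert(l != NULL);
    atomic_init(&l->lock, 0);
}

/* key starts at 1; each exchange stores 1 into lock and returns the old
 * value into key. The loop ends when the old value was 0, which means
 * this thread is the one that changed the lock from free to held. */
void swap_lock_acquire(swap_lock_t *l)
{
    int key = 1;
    while (key == 1)
        key = atomic_exchange_explicit(&l->lock, 1, memory_order_acquire);
}

void swap_lock_release(swap_lock_t *l)
{
    atomic_store_explicit(&l->lock, 0, memory_order_release);
}
```

As with test-and-set, this provides mutual exclusion but no bounded waiting: nothing prevents the same thread from winning the exchange repeatedly.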

Semaphores

Semaphore S - integer variable

Can only be accessed via two atomic operations

wait(S):
    while (S <= 0) ;   /* busy-wait */
    S--;

signal(S):
    S++;


Critical Section of n Threads

Shared data

semaphore mutex;   /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);


Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T


Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }

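
A blocking semaphore of this kind can be approximated in user mode with a pthread mutex and condition variable. In this sketch the condition variable's internal wait queue stands in for the list S.L, so the value never goes negative as it does in the slide's version; `ksem`, `ksem_wait` and `ksem_signal` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int value;                 /* number of available units, never < 0 */
    pthread_mutex_t lock;      /* makes wait/signal atomic */
    pthread_cond_t  nonzero;   /* suspended threads queue here (role of S.L) */
} ksem;

void ksem_init(ksem *s, int initial)
{
    assert(initial >= 0);
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

/* wait(S): suspend the caller instead of busy-waiting while no unit is free. */
void ksem_wait(ksem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);   /* suspend() */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* signal(S): free one unit and resume one suspended thread, if any. */
void ksem_signal(ksem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);               /* resume(T) */
    pthread_mutex_unlock(&s->lock);
}
```

The `while` loop around `pthread_cond_wait` is needed because POSIX allows spurious wakeups; a suspended thread rechecks the count before proceeding.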

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                  Tj
A                   wait(flag)
signal(flag)        B

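
The A-before-B ordering can be tried out with POSIX unnamed semaphores and two threads. This is a sketch assuming a platform that supports `sem_init` (for example Linux); the `trace` buffer and the function names are hypothetical and exist only to record the order.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag;        /* initialized to 0 in run_a_before_b() */
static char trace[3];     /* records the order in which A and B ran */
static int trace_len;

static void *thread_i(void *unused)
{
    (void)unused;
    trace[trace_len++] = 'A';            /* statement A */
    sem_post(&flag);                     /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused)
{
    (void)unused;
    while (sem_wait(&flag) == -1 && errno == EINTR) {
        /* wait(flag): retry if interrupted by a signal handler */
    }
    trace[trace_len++] = 'B';            /* statement B */
    return NULL;
}

/* Starts Tj before Ti on purpose; returns the recorded order. */
const char *run_a_before_b(void)
{
    pthread_t ti, tj;
    int rc;

    trace_len = 0;
    trace[0] = trace[1] = trace[2] = '\0';
    rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&tj, NULL, thread_j, NULL);
    assert(rc == 0);
    rc = pthread_create(&ti, NULL, thread_i, NULL);
    assert(rc == 0);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace;
}
```

Even though Tj is created first, it blocks in `sem_wait` until Ti has executed A and posted the semaphore, so the recorded order is always A followed by B.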

Two Types of Semaphores

Counting semaphore
- integer value can range over an unrestricted domain

Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores


Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0                  T1
wait(S)             wait(Q)
wait(Q)             wait(S)
...                 ...
signal(S)           signal(Q)
signal(Q)           signal(S)


Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution (to the deadlock above)
- all code should acquire/release semaphores in the same order

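
One way to enforce a single acquisition order is to sort the locks by address before taking them. The sketch below applies this to the S/Q scenario using pthread mutexes in place of binary semaphores, which is a substitution made for brevity; all names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two distinct mutexes in one global order (lower address first),
 * regardless of the order in which the caller names them. */
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;

static void *t0(void *arg)           /* names S first, as T0 does */
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_both(&S, &Q);
        shared_total++;
        unlock_both(&S, &Q);
    }
    return NULL;
}

static void *t1(void *arg)           /* names Q first, as T1 does */
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_both(&Q, &S);
        shared_total++;
        unlock_both(&Q, &S);
    }
    return NULL;
}

/* Runs T0 and T1 concurrently and returns the final counter value. */
long run_t0_t1(int iterations)
{
    pthread_t a, b;
    shared_total = 0;
    pthread_create(&a, NULL, t0, &iterations);
    pthread_create(&b, NULL, t1, &iterations);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;
}
```

Because both threads end up locking in the same physical order, the circular wait from the T0/T1 example cannot form, and the run completes with every increment counted.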

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable


Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads) and multiprocessing


Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks


3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization


Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3


Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system


Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc. like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread


Trap dispatching

[Figure: trap dispatching. Interrupts go to the interrupt dispatcher, which calls interrupt service routines. System service calls go to the system service dispatcher, which calls system services. HW and SW exceptions go to the exception dispatcher, which calls exception handlers. Virtual address exceptions go to the virtual memory manager's pager.]

Trap: processor's mechanism to capture executing thread

– Switch from user to kernel mode

– Interrupts: asynchronous

– Exceptions: synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the ISR then run in kernel mode. Note: no thread or process context switch]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, from highest to lowest]

Hardware interrupts

    31  High
    30  Power fail
    29  Interprocessor Interrupt
    28  Clock
        Profile & Synch (Srv 2003)
        Device levels, down to Device 1

Deferrable software interrupts

    2   Dispatch/DPC
    1   APC

Normal thread execution

    0   Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High: used when halting the system (via KeBugCheck())

Power fail: originated in the NT design document but has never been used

Inter-processor interrupt

– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock, allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler: Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQLs on x64 and IA64, from highest to lowest]

    IRQL  x64                               IA64
    15    High/Profile                      High/Profile/Power
    14    Interprocessor Interrupt/Power    Interprocessor Interrupt
    13    Clock                             Clock
    12    Synch (Srv 2003)                  Synch (MP only)
    ...   Device n                          Device n
    4     ...                               Device 1
    3     Device 1                          Correctable Machine Check
    2     Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
    1     APC                               APC
    0     Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device (can be configured with the IntFilter utility in the Resource Kit)

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
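The two arithmetic rules above can be written down directly. This is only a sketch: the function names and the range guards are hypothetical, and the x64/IA64 rule is taken to mean integer division of the IDT vector number by 16.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);   /* hypothetical guard: keeps the result non-negative */
    return 27 - irq;
}

/* x64 / IA64 rule: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector_64bit(int vector)
{
    assert(vector >= 0 && vector <= 255);   /* an IDT has 256 entries */
    return vector / 16;
}
```

With these rules, a lower IRQ number maps to a higher IRQL on x86 UP systems, while on 64-bit systems every block of 16 consecutive vectors shares one IRQL.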

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts


Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– A spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B each run the loop below; the spinlock guards the DPC queue, and removing a DPC from the queue is the critical section]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified

– Exactly one processor is being signaled

– Pre-determined wait order
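The queueing idea above can be sketched in portable C11 as an MCS-style lock, where each waiter spins only on a flag in its own queue node. This is an illustration of the technique, not the Windows kernel implementation: all names are hypothetical, threads stand in for processors, and the `sched_yield()` calls are only there to keep the user-mode demo polite.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool must_wait;          /* waiter-local flag to spin on */
} qnode;

typedef struct { _Atomic(qnode *) tail; } queued_lock;

static queued_lock demo_lock;       /* static storage: tail starts as NULL */
static long shared_counter = 0;

void queued_acquire(queued_lock *lock, qnode *me)
{
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&lock->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                /* link behind predecessor */
        while (atomic_load(&me->must_wait))           /* spin on own flag only */
            sched_yield();
    }
}

void queued_release(queued_lock *lock, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected, (qnode *)NULL))
            return;
        /* A successor is enqueueing; wait until it has linked itself. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->must_wait, false);            /* signal exactly one waiter */
}

enum { QL_THREADS = 4, QL_ITERS = 2000 };

static void *ql_worker(void *arg)
{
    (void)arg;
    qnode me;
    for (int i = 0; i < QL_ITERS; i++) {
        queued_acquire(&demo_lock, &me);
        shared_counter++;                             /* critical section */
        queued_release(&demo_lock, &me);
    }
    return NULL;
}

long run_queued_lock_demo(void)
{
    pthread_t t[QL_THREADS];
    shared_counter = 0;
    for (int i = 0; i < QL_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, ql_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < QL_THREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Calling `run_queued_lock_demo()` runs four threads through the lock and returns the final counter, which should equal `QL_THREADS * QL_ITERS` if mutual exclusion holds. Waiters are served in arrival order, which matches the "pre-determined wait order" point above.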

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include

– Events (may be auto-reset or manual-reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. A thread object is created and initialized; thread states are Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. A thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get priority boost

Dispatcher reschedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)

– Nonsignaled to signaled: owning thread releases mutex

– Signaled to nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

– Nonsignaled to signaled: owning thread or other thread releases mutex

– Signaled to nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Semaphore

– Nonsignaled to signaled: one thread releases the semaphore, freeing a resource

– Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event

– Nonsignaled to signaled: a thread sets the event

– Signaled to nonsignaled: kernel resumes one or more threads

– Effect: kernel resumes one or more waiting threads

Event pair

– Nonsignaled to signaled: dedicated thread sets one event in the event pair

– Signaled to nonsignaled: kernel resumes the other dedicated thread

– Effect: kernel resumes waiting dedicated thread

Timer

– Nonsignaled to signaled: timer expires

– Signaled to nonsignaled: a thread (re)initializes the timer

– Effect: kernel resumes all waiting threads

Thread

– Nonsignaled to signaled: thread terminates

– Signaled to nonsignaled: a thread reinitializes the thread object

– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head followed by a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low, shown next to the DPC queue and the dispatcher]

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped
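The delivery steps above can be modeled in a few lines of single-threaded C. This is a toy simulation only, assuming one CPU with a fixed-size queue; every name here (`sim_cpu`, `sim_queue_dpc`, `sim_lower_irql`, the `SIM_*` levels) is hypothetical and none of it is a kernel API.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_PASSIVE_LEVEL = 0, SIM_APC_LEVEL = 1, SIM_DISPATCH_LEVEL = 2, SIM_DEVICE_LEVEL = 3 };
enum { SIM_QUEUE_CAP = 8 };

typedef void (*dpc_routine)(void *context);
typedef struct { dpc_routine routine; void *context; } sim_dpc;

typedef struct {
    int     irql;                     /* current IRQL of the simulated CPU */
    sim_dpc queue[SIM_QUEUE_CAP];     /* per-CPU DPC queue (ring buffer) */
    size_t  head, count;
} sim_cpu;

/* Queue a DPC, as an ISR would. Returns 0 if the queue is full. */
int sim_queue_dpc(sim_cpu *cpu, dpc_routine routine, void *context)
{
    if (cpu->count == SIM_QUEUE_CAP)
        return 0;
    size_t tail = (cpu->head + cpu->count) % SIM_QUEUE_CAP;
    cpu->queue[tail] = (sim_dpc){ routine, context };
    cpu->count++;
    return 1;
}

/* Lower the IRQL. If it would drop below dispatch level while DPCs are
 * pending, drain the queue at dispatch level first (steps 2 to 4 above). */
void sim_lower_irql(sim_cpu *cpu, int new_irql)
{
    assert(new_irql <= cpu->irql);
    if (new_irql < SIM_DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = SIM_DISPATCH_LEVEL;
        while (cpu->count > 0) {
            sim_dpc d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % SIM_QUEUE_CAP;
            cpu->count--;
            d.routine(d.context);
        }
    }
    cpu->irql = new_irql;
}

/* Example DPC routine: counts how often it ran. */
void count_dpc(void *context) { (*(int *)context)++; }

/* Returns 1 if DPCs queued at device level ran only after the IRQL dropped. */
int run_dpc_demo(void)
{
    sim_cpu cpu = { .irql = SIM_DEVICE_LEVEL };   /* pretend we are inside an ISR */
    int ran = 0;
    sim_queue_dpc(&cpu, count_dpc, &ran);
    sim_queue_dpc(&cpu, count_dpc, &ran);
    int ran_in_isr = ran;                         /* nothing has run yet */
    sim_lower_irql(&cpu, SIM_PASSIVE_LEVEL);
    return ran_in_isr == 0 && ran == 2 && cpu.irql == SIM_PASSIVE_LEVEL;
}
```

The point of the model is the deferral: work queued at device level does not run until the processor is about to leave dispatch level, so the ISR itself stays short.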

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with separate kernel-mode (K) and user-mode (U) queues of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
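The four accounting rules above amount to a small decision function. The sketch below is illustrative only: the enum and function names are hypothetical, and the `in_user_mode` flag is an assumption standing in for however the clock ISR tells user time from kernel time.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, following the rules on the slide. */
charge_target clock_tick_charge(int irql, bool in_user_mode, bool processing_dpc)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: charged to the thread's user or kernel time. */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is made only when the clock ISR fires, the result is a sample of what the processor was doing at that instant, not a measurement of the whole interval.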

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"

– Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– When signaled, a wait on the object is satisfied

– Different object types differ in terms of what changes their state

– Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection ( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        __try {
            EnterCriticalSection ( &crit );
            counter += local_value;
        }
        /* leave exactly once per enter, even if the guarded code raises an exception */
        __finally { LeaveCriticalSection ( &crit ); }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects; function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object


Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );
    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
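The blocking and polling calls compared above can be seen side by side in a short pthreads sketch. The helper names and the shared `counter` are hypothetical; only the `pthread_mutex_*` calls come from the POSIX API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking acquire: waits as long as needed, like an INFINITE wait. */
void add_blocking(int local_value)
{
    pthread_mutex_lock(&counter_lock);
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire: returns 1 if the update was applied, 0 if the lock was busy. */
int add_if_free(int local_value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc == EBUSY)
        return 0;                    /* lock held elsewhere; caller may retry later */
    assert(rc == 0);
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

int read_counter(void)
{
    pthread_mutex_lock(&counter_lock);
    int value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}

/* Returns 1 if trylock reports EBUSY instead of blocking while the lock is held. */
int trylock_while_held_demo(void)
{
    pthread_mutex_lock(&counter_lock);
    int rc = pthread_mutex_trylock(&counter_lock);
    pthread_mutex_unlock(&counter_lock);
    return rc == EBUSY;
}
```

The polling variant suits callers that have other useful work to do when the lock is busy; the blocking variant is simpler when they do not.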

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
    HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );
    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
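Since condition variables have no direct counterpart to a manual-reset event, one common way to approximate it is a mutex, a condition variable, and a boolean that stays set until it is explicitly cleared. The sketch below assumes that approach; the `manual_event` type and its functions are hypothetical names, not part of pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until manual_event_reset() */
} manual_event;

void manual_event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void manual_event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);       /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

void manual_event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void manual_event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event start_gate;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;
static int released = 0;

static void *gate_waiter(void *arg)
{
    (void)arg;
    manual_event_wait(&start_gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts three waiters, sets the event once, and returns how many were released. */
int run_event_demo(void)
{
    pthread_t t[3];
    released = 0;
    manual_event_init(&start_gate, false);
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, gate_waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    manual_event_set(&start_gate);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

The boolean is what makes the difference: a waiter that arrives after `manual_event_set()` still passes, which a bare `pthread_cond_broadcast()` would not guarantee.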

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
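The same one-way, byte-stream behavior can be observed with the POSIX `pipe()` call, used here as a stand-in for `CreatePipe()` so the sketch stays portable. It runs in a single process for brevity, and the function name `pipe_round_trip` is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into a fresh pipe, close the write end, and read the bytes back.
 * Returns the number of bytes read, or -1 on error. msg is assumed to be
 * short enough to fit in the pipe buffer, so the write does not block. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    assert(out_size > 0);
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                       /* reader sees end-of-file after the data */
    if (written != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    size_t total = 0;
    while (total < out_size - 1) {
        ssize_t n = read(fds[0], out + total, out_size - 1 - total);
        if (n <= 0)                      /* 0 = end-of-file, -1 = error */
            break;
        total += (size_t)n;
    }
    out[total] = '\0';
    close(fds[0]);
    return (ssize_t)total;
}
```

Data flows in one direction only, from the write end to the read end, and the reader receives a plain sequence of bytes with no message boundaries.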

I/O Redirection Using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same pipe name, each over its own instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode,
        DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut,
        LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. means local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer,
        LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into one call

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle into one call

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client. The message is sent periodically]

App client 0 ... App client n (mailslot servers)

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client)

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout,
        LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain (max message length: 400 bytes)

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Synchronization Hardware

Test and modify the content of a word atomically

    boolean TestAndSet(boolean &target) {
        boolean rv = target;
        target = true;
        return rv;
    }

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;

Thread Ti

    do {
        while (TestAndSet(lock)) ;
            critical section
        lock = false;
            remainder section
    } while (1);
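In C11 the hardware test-and-set is exposed as `atomic_flag_test_and_set()`, so the loop above can be tried with ordinary threads. This is a minimal sketch: the names `tas_acquire`, `tas_release`, and the demo harness are hypothetical, and `sched_yield()` is added only to keep the busy wait from hogging a core.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;   /* clear = lock is free */
static long tas_counter = 0;

void tas_acquire(void)
{
    /* Atomically set the flag and get its previous value; loop while it was already set. */
    while (atomic_flag_test_and_set(&tas_lock))
        sched_yield();
}

void tas_release(void)
{
    atomic_flag_clear(&tas_lock);                 /* lock = false */
}

enum { TAS_THREADS = 4, TAS_ITERS = 5000 };

static void *tas_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < TAS_ITERS; i++) {
        tas_acquire();
        tas_counter++;                            /* critical section */
        tas_release();
    }
    return NULL;
}

long run_tas_demo(void)
{
    pthread_t t[TAS_THREADS];
    tas_counter = 0;
    for (int i = 0; i < TAS_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, tas_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < TAS_THREADS; i++)
        pthread_join(t[i], NULL);
    return tas_counter;
}
```

`run_tas_demo()` returns the final counter; without the lock, the unsynchronized increments would typically lose updates.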

Synchronization Hardware

Atomically swap two variables

    void Swap(boolean &a, boolean &b) {
        boolean temp = a;
        a = b;
        b = temp;
    }

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
            critical section
        lock = 0;
            remainder section
    } while (1);
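The swap-based entry protocol maps onto C11's `atomic_exchange()`, which stores a new value and returns the old one in a single atomic step. The sketch below assumes that mapping; function names and the demo harness are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int swap_lock = 0;     /* 0 = free, 1 = held */
static long swap_counter = 0;

void swap_acquire(void)
{
    int key = 1;
    while (key == 1) {
        /* Swap(lock, key): lock receives 1, key receives the old lock value. */
        key = atomic_exchange(&swap_lock, 1);
        if (key == 1)
            sched_yield();           /* lock was already held; try again */
    }
}

void swap_release(void)
{
    atomic_store(&swap_lock, 0);     /* lock = 0 */
}

enum { SWAP_THREADS = 4, SWAP_ITERS = 5000 };

static void *swap_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SWAP_ITERS; i++) {
        swap_acquire();
        swap_counter++;              /* critical section */
        swap_release();
    }
    return NULL;
}

long run_swap_demo(void)
{
    pthread_t t[SWAP_THREADS];
    swap_counter = 0;
    for (int i = 0; i < SWAP_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, swap_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SWAP_THREADS; i++)
        pthread_join(t[i], NULL);
    return swap_counter;
}
```

A thread enters only when the value it swapped out was 0, meaning it was the one that changed the lock from free to held.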

Semaphores

Semaphore S: integer variable

Can only be accessed via two atomic operations

    wait(S):
        while (S <= 0) ;
        S--;

    signal(S):
        S++;

Critical Section of n Threads

Shared data

semaphore mutex initially mutex = 1

Thread Ti

do wait(mutex) critical section

signal(mutex) remainder section

while (1)

Semaphore Implementation

Semaphores may suspend/resume threads

- Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations

- suspend() suspends the thread that invokes it

- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
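The bookkeeping of the blocking implementation can be traced with a small single-threaded model. This is only a sketch: threads are represented by integer IDs, `suspend()` and `resume(T)` are modeled as return values rather than real scheduling, and the waiting list `S.L` is assumed to be a FIFO holding at most `MAX_WAITERS` entries. All names are hypothetical.

```c
#include <stdbool.h>

#define MAX_WAITERS 8                 /* hypothetical fixed queue size */

typedef struct {
    int value;
    int waiters[MAX_WAITERS];         /* S.L modeled as a FIFO of thread IDs */
    int head, count;
} model_semaphore;

void sem_model_init(model_semaphore *s, int initial) {
    s->value = initial;
    s->head = 0;
    s->count = 0;
}

/* wait(S): returns true if thread `tid` would be suspended. */
bool sem_model_wait(model_semaphore *s, int tid) {
    s->value--;
    if (s->value < 0) {
        s->waiters[(s->head + s->count) % MAX_WAITERS] = tid;
        s->count++;
        return true;                  /* stands in for suspend() */
    }
    return false;
}

/* signal(S): returns the ID of the resumed thread, or -1 if none waited. */
int sem_model_signal(model_semaphore *s) {
    s->value++;
    if (s->value <= 0) {
        int tid = s->waiters[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return tid;                   /* stands in for resume(T) */
    }
    return -1;
}
```

In this model a negative `value` equals the number of suspended threads, which is why `signal` checks `value <= 0` after the increment.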

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

    Ti                Tj
    A                 wait(flag)
    signal(flag)      B

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0                T1
    wait(S)           wait(Q)
    wait(Q)           wait(S)
    signal(S)         signal(Q)
    signal(Q)         signal(S)

Deadlock and Starvation

Starvation: indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution to the deadlock above

- all code should acquire/release semaphores in the same order

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7: Process Synchronization

- Chapter 8: Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & interrupt precedence

Spinlocks and kernel synchronization

Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc. like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine (ISR)

- ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

Trap dispatching

[Figure: trap sources and their handlers]

- Interrupt: interrupt dispatcher, which calls interrupt service routines

- System service call: system service dispatcher, which calls system services

- HW exceptions, SW exceptions: exception dispatcher, which calls exception handlers

- Virtual address exceptions: virtual memory manager's pager

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts: asynchronous

- Exceptions: synchronous

Interrupt Dispatching

An interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the ISR run in kernel mode

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

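Interrupt Precedence via IRQLs (x86)

    IRQL   Level                         Category
    31     High                          Hardware interrupts
    30     Power fail                    Hardware interrupts
    29     Interprocessor interrupt      Hardware interrupts
    28     Clock                         Hardware interrupts
           Profile & Synch (Srv 2003)    Hardware interrupts
           Device 1                      Hardware interrupts
    2      Dispatch/DPC                  Deferrable software interrupts
    1      APC                           Deferrable software interrupts
    0      Passive/Low                   Normal thread execution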

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn ISR on/off

Interrupt objects can synchronize access to ISR data

- Multiple instances of ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High

- Used when halting the system (via KeBugCheck())

Power fail

- Originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update the system's clock and the allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

    IRQL   x64                              IA64
    15     High/Profile                     High/Profile/Power
    14     Interprocessor Interrupt/Power   Interprocessor Interrupt
    13     Clock                            Clock
    12     Synch (Srv 2003)                 Synch (MP only)
           Device n                         Device n
    4                                       Device 1
    3      Device 1                         Correctable Machine Check
    2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
    1      APC                              APC
    0      Passive/Low                      Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is an interrupt delivered to?

- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
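The two IRQL formulas above can be written out as a small worked example. This is only a sketch: the function names are hypothetical, and treating the division as integer division (dropping the remainder) is an assumption not stated on the slide.

```c
/* x86 uniprocessor systems: IRQL = 27 - IRQ */
int irql_x86_up(int irq) {
    return 27 - irq;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16.
   Assumption: integer division, so vectors 0x40..0x4F all map to IRQL 4. */
int irql_from_vector(int vector) {
    return vector / 16;
}
```

For instance, under these formulas IRQ 8 on an x86 UP system maps to IRQL 19, and vector 0x41 on x64 maps to IRQL 4.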

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts; the diagram is not preserved in this transcript]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- A spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute the following around the shared DPC queue

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                          // critical section
        remove DPC from queue
    end

    release_spinlock(DPC)

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag

- Thus, the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified

- Exactly one processor is being signaled

- Pre-determined wait order
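The queueing idea can be illustrated with an MCS-style lock in C11. This is a sketch of the general technique, not the Windows kernel's implementation: each waiter spins only on the `must_wait` flag in its own node, and release hands the lock to exactly one successor. The type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;     /* next waiter in the queue */
    atomic_bool must_wait;            /* the processor-local flag */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;            /* NULL means the lock is free */
} queued_lock;

void queued_acquire(queued_lock *lk, qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&lk->tail, me);    /* join the queue */
    if (prev == NULL)
        return;                                      /* lock was free */
    atomic_store(&prev->next, me);                   /* link behind predecessor */
    while (atomic_load(&me->must_wait)) {
        /* spin on our own flag only */
    }
}

void queued_release(queued_lock *lk, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lk->tail, &expected, (qnode *)NULL))
            return;
        /* A waiter swapped itself in but has not linked yet; wait for the link. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->must_wait, false);           /* signal exactly one waiter */
}
```

Each caller supplies its own `qnode`, for example one per processor, which is what gives the pre-determined FIFO wait order.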

SMP Scalability Improvements

Windows 2000: queued spinlocks

- qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks: PFN (Page Frame Database) lock

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once; "all" means all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time
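The difference between "any" and "all" can be stated as a small predicate over the current object states. This is a simplified single-threaded model, not the kernel's wait code; `wait_would_resolve` and `WAIT_MODEL_PENDING` are hypothetical names.

```c
#include <stdbool.h>

#define WAIT_MODEL_PENDING (-1)       /* hypothetical "wait not satisfied" result */

/* Returns the index of the first signaled object if the wait would resolve
   now, or WAIT_MODEL_PENDING if the caller would keep waiting.
   wait_all == false: one signaled object is enough.
   wait_all == true:  every object must be signaled at the same time. */
int wait_would_resolve(const bool signaled[], int count, bool wait_all) {
    int first = WAIT_MODEL_PENDING;
    for (int i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == WAIT_MODEL_PENDING)
                first = i;
        } else if (wait_all) {
            return WAIT_MODEL_PENDING;    /* one nonsignaled object blocks "all" */
        }
    }
    return first;
}
```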

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects, outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. Labeled transitions: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- Nonsignaled to signaled: owning thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- Nonsignaled to signaled: owning thread or other thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- Nonsignaled to signaled: one thread releases the semaphore, freeing a resource

- Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

What signals an object (contd.)

Event

- Nonsignaled to signaled: a thread sets the event

- Signaled to nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- Nonsignaled to signaled: dedicated thread sets one event in the event pair

- Signaled to nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- Nonsignaled to signaled: timer expires

- Signaled to nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- Nonsignaled to signaled: thread terminates

- Signaled to nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3: System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: a queue head linking a list of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with levels from High and Power failure down to Dispatch/DPC, APC and Low; the DPC queue; the thread dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads, and requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

- Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make a thread suspend/terminate itself (env. subsystems)

APCs are delivered when the thread is in an alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode at IRQL 1

- Always deliverable unless the thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when the thread is in an "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, labeled K and U]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
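The four accounting rules can be condensed into one decision function. This is a sketch of the rules as listed, not the actual clock ISR; the enum, the function name and the `in_user_mode` parameter (used to choose between user and kernel time when IRQL < 2) are hypothetical.

```c
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, given the IRQL at the time of
   the tick, whether the interrupted code ran in user mode, and whether a
   DPC was being processed. */
charge_target clock_tick_charge(int irql, int in_user_mode, int processing_dpc) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```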

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (they are not really processes)

- Context switches shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g. one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

[Figure: dispatcher object layout: a header with Size, Type, State and the wait list head, followed by object-type-specific data]

All dispatcher objects are in one of two states

- "Signaled" vs. "nonsignaled"

- When signaled, a wait on the object is satisfied

- Different object types differ in terms of what changes their state

- Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects each point to a WaitBlockList; every wait block holds a list entry, Object and Thread pointers, Key, Type and a next link; dispatcher objects (Size, Type, State, wait list head, object-type-specific data) chain the wait blocks of the threads waiting on them]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        __try {
            EnterCriticalSection( &crit );
            counter += local_value;
        }
        /* leave exactly once, in the termination handler */
        __finally { LeaveCriticalSection( &crit ); }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies the kernel object

- dwTimeout specifies the wait time in msec

- dwTimeout == 0: no wait, check whether the object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to array identifying these objects

- bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first, the function returns the index of the first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in the same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases the semaphore count by 1

- ReleaseSemaphore() may increment the count by any value

- ReleaseSemaphore() returns the old semaphore count
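The counting rules can be traced with a small model of the semaphore count. This sketch does not call the Windows API: it assumes that a release must be positive and must not push the count past the maximum, in which case the count is left unchanged and the call fails. The type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    long count;                       /* current count */
    long max;                         /* maximum count fixed at creation */
} winsem_model;

/* A semaphore is signaled when count > 0. */
bool winsem_is_signaled(const winsem_model *s) {
    return s->count > 0;
}

/* A satisfied wait decreases the count by 1.
   Returns false if the caller would have to block instead. */
bool winsem_try_wait(winsem_model *s) {
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Models a release: adds `release` to the count and reports the old count
   through `previous` (may be NULL). Fails without changing the count if
   `release` is not positive or the maximum would be exceeded. */
bool winsem_release(winsem_model *s, long release, long *previous) {
    if (release <= 0 || s->count > s->max - release)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return true;
}
```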

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE: create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal(): one thread

- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main, with prog1 and prog2 connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same pipe name

- Server can respond to a client using the same instance

Pipe can be accessed over the network

- Location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (the "." denotes the local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing the handle to the last instance deletes the named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to named pipes
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic
A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO
mkfifo() is the UNIX counterpart to CreateNamedPipe()
Use sockets for networked client/server scenarios
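A minimal POSIX sketch of the FIFO points above may help: one fixed-size request record travels through a FIFO from a child ("client") to the parent ("server"). The helper name fifo_round_trip, the RECORD_SIZE constant, and the /tmp path are assumptions made for illustration only; they do not come from the slides.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 64  /* hypothetical fixed record size, far below PIPE_BUF */

/* Send msg as one fixed-size record through a FIFO from a child process
 * and read it back in the parent. Returns 0 on success, -1 on failure. */
int fifo_round_trip(const char *msg, char out[RECORD_SIZE])
{
    char path[64];
    assert(strlen(msg) < RECORD_SIZE);
    snprintf(path, sizeof path, "/tmp/fifo_demo_%ld", (long)getpid());
    if (mkfifo(path, 0600) != 0)             /* counterpart of CreateNamedPipe */
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* child plays the client */
        char rec[RECORD_SIZE] = {0};
        memcpy(rec, msg, strlen(msg));
        int wfd = open(path, O_WRONLY);      /* blocks until the reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, rec, RECORD_SIZE);  /* one atomic write */
        close(wfd);
        _exit(n == RECORD_SIZE ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);          /* parent plays the server */
    ssize_t n = -1;
    if (rfd >= 0) {
        n = read(rfd, out, RECORD_SIZE);     /* read exactly one record */
        close(rfd);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (n == RECORD_SIZE && WIFEXITED(status)
            && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the FIFO is byte-oriented, the fixed record size is what lets the reader know where one request ends; a real server would loop over read() and use a second FIFO per client for responses.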

104104

Client Example using Named Pipe

/* Wait for a server instance, then open the pipe */
WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf (stderr, "Failure to locate server\n"); exit (3);
}
/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request\n");
    /* Both calls return FALSE on failure */
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
    }
    printf ("Request is: %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used
Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)
Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)
Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

107107

Locate a server via mailslot

[Figure: application clients 0 through n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]
Mailslot servers (app client 0 ... app client n):
hMS = CreateMailslot ("\\.\mailslot\status" ...); ReadFile (hMS, &ServStat ...); /* then connect to server */
Mailslot client (app server):
while (...) { Sleep (...); hMS = CreateFile ("\\*\mailslot\status" ...); WriteFile (hMS, &StatInfo ...); }

108108

Creating a mailslot

HANDLE CreateMailslot (LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);
lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally
cbMaxMsg is the maximum message size in bytes
dwReadTimeout
- Read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieves handle for local mailslot
- \\host\mailslot\[path]name: retrieves handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length is 400 bytes
- Client must specify the FILE_SHARE_READ flag
GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


1818

Mutual Exclusion with Test-and-Set

Shared data: boolean lock = false;
Thread Ti:
do {
    while (TestAndSet(lock)) ;   /* spin until the lock is acquired */
    critical section
    lock = false;
    remainder section
} while (1);
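The test-and-set loop on this slide maps fairly directly onto C11's atomic_flag. The sketch below is one possible rendering, not the course's reference code: the thread count, iteration count, and the names worker and run_tas_demo are assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define ITERS    50000

static atomic_flag lock = ATOMIC_FLAG_INIT;  /* clear means "unlocked" */
static long counter = 0;                     /* shared data guarded by lock */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        /* Entry section: test-and-set returns the previous value,
         * so keep spinning while the flag was already set. */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;
        counter++;                            /* critical section */
        /* Exit section: the "lock = false" of the slide. */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }
    return NULL;
}

/* Run NTHREADS workers and return the final counter value. */
long run_tas_demo(void)
{
    pthread_t t[NTHREADS];
    counter = 0;
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, no increment is lost and run_tas_demo() returns NTHREADS * ITERS.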

1919

Synchronization Hardware

Atomically swap two variables
void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

2020

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;
Thread Ti:
int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);

2121

Semaphores

Semaphore S: an integer variable
can only be accessed via two atomic operations
wait (S):
    while (S <= 0) ;   /* busy-wait */
    S--;
signal (S):
    S++;

2222

Critical Section of n Threads

Shared data:
semaphore mutex;   /* initially mutex = 1 */
Thread Ti:
do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

2323

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting
Define a semaphore as a record
typedef struct {
    int value;
    struct thread *L;
} semaphore;
Assume two simple operations
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as
wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }
signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
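A blocking semaphore along these lines can be sketched with a pthread mutex and condition variable. Note the substitution: the condition variable stands in for the explicit list L and for suspend()/resume(), and value is kept non-negative instead of counting waiters as a negative number. All names (sem_setup, sem_wait_op, sem_signal_op, run_semaphore_demo) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;                /* free permits; kept >= 0 in this variant */
    pthread_mutex_t guard;    /* makes the two operations atomic */
    pthread_cond_t  queue;    /* stands in for the list L of suspended threads */
} semaphore;

void sem_setup(semaphore *s, int initial)
{
    assert(initial >= 0);
    s->value = initial;
    pthread_mutex_init(&s->guard, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void sem_wait_op(semaphore *s)
{
    pthread_mutex_lock(&s->guard);
    while (s->value == 0)                        /* suspend(): sleep, no spinning */
        pthread_cond_wait(&s->queue, &s->guard);
    s->value--;
    pthread_mutex_unlock(&s->guard);
}

void sem_signal_op(semaphore *s)
{
    pthread_mutex_lock(&s->guard);
    s->value++;
    pthread_cond_signal(&s->queue);              /* resume(T) for one waiter, if any */
    pthread_mutex_unlock(&s->guard);
}

static semaphore mutex_sem;
static long total = 0;

static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < 20000; i++) {
        sem_wait_op(&mutex_sem);
        total++;                                 /* critical section */
        sem_signal_op(&mutex_sem);
    }
    return NULL;
}

/* Four threads share one semaphore initialized to 1; returns the total. */
long run_semaphore_demo(void)
{
    pthread_t t[4];
    sem_setup(&mutex_sem, 1);
    total = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, adder, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return total;
}
```

Waiting threads sleep inside pthread_cond_wait(), which is the point of the suspend/resume design: no CPU time is burned while the semaphore is unavailable.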

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti
Use semaphore flag initialized to 0
Code:
    Ti              Tj
    A               wait(flag)
    signal(flag)    B
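The A-before-B pattern can be tried out with POSIX semaphores, where sem_wait() and sem_post() play the roles of wait and signal. This is a sketch under assumptions: the step counter and the names thread_i, thread_j, and run_ordering_demo are invented here to make the ordering observable.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t flag;              /* initialized to 0, so B must wait for A */
static int step = 0;            /* counts statements as they execute */
static int a_order, b_order;    /* position at which A and B ran */

static void *thread_i(void *arg)    /* Ti: A, then signal(flag) */
{
    (void)arg;
    a_order = ++step;               /* statement A */
    sem_post(&flag);
    return NULL;
}

static void *thread_j(void *arg)    /* Tj: wait(flag), then B */
{
    (void)arg;
    sem_wait(&flag);
    b_order = ++step;               /* statement B */
    return NULL;
}

/* Returns 1 if A executed before B, 0 otherwise. */
int run_ordering_demo(void)
{
    pthread_t ti, tj;
    int rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    (void)rc;
    step = 0;
    pthread_create(&tj, NULL, thread_j, NULL);  /* start Tj first on purpose */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return a_order == 1 && b_order == 2;
}
```

Even though Tj is created first, it blocks in sem_wait() until Ti has executed A and posted the semaphore.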

2626

Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain
Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement
A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads
Let S and Q be two semaphores initialized to 1
    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

2828

Deadlock and Starvation

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended
Solution to the deadlock
- all code should acquire/release semaphores in the same order
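The ordering rule can be illustrated with two pthread mutexes standing in for the semaphores S and Q. In this sketch (names ordered_worker and run_lock_order_demo are assumptions), both threads take S before Q, so the circular wait of the T0/T1 example cannot form.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long shared = 0;             /* guarded by S and Q together */

static void *ordered_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        pthread_mutex_lock(&S);     /* every thread takes S first ... */
        pthread_mutex_lock(&Q);     /* ... and Q second */
        shared++;
        pthread_mutex_unlock(&Q);   /* release in reverse order */
        pthread_mutex_unlock(&S);
    }
    return NULL;
}

/* Two threads using the same acquisition order; returns the final count. */
long run_lock_order_demo(void)
{
    pthread_t t0, t1;
    int rc;
    shared = 0;
    rc = pthread_create(&t0, NULL, ordered_worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, ordered_worker, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return shared;
}
```

Swapping the two lock calls in just one of the threads would recreate the T0/T1 pattern and could hang the program.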

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982
Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986
Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching
IRQL levels & Interrupt Precedence
Spinlocks and Kernel Synchronization
Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system
Code regions are tagged "no write in any mode"
Controls ability to execute privileged instructions
A Windows abstraction
- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)
PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons
1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread
2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous
[Figure: trap handlers route each trap to its dispatcher]
- Interrupt -> interrupt dispatcher -> interrupt service routines
- System service call -> system service dispatcher -> system services
- HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
- Virtual address exceptions -> virtual memory manager's pager

Interrupt Dispatching

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)
Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller
[Figure: an interrupt arrives while user or kernel mode code is running; the dispatch routine and the ISR run in kernel mode]
Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ
IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs
User mode is limited to IRQL 0
No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

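Interrupt Precedence via IRQLs (x86)

IRQL  Name                          Category
31    High                          Hardware interrupts
30    Power fail                    Hardware interrupts
29    Interprocessor Interrupt      Hardware interrupts
28    Clock                         Hardware interrupts
27    Profile & Synch (Srv 2003)    Hardware interrupts
...   Device n ... Device 1         Hardware interrupts
2     Dispatch/DPC                  Deferrable software interrupts
1     APC                           Deferrable software interrupts
0     Passive/Low                   Normal thread execution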

4141

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines
x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT
Alpha
- PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT
After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)
Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR
Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

4343

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())
Power fail
- Originated in the NT design document, but has never been used
Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)
Clock
- Used to update the system's clock and to allocate CPU time to threads
Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts
DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate
Passive
- No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

IRQL  x64                               IA64
15    High/Profile                      High/Profile/Power
14    Interprocessor Interrupt/Power    Interprocessor Interrupt
13    Clock                             Clock
12    Synch (Srv 2003)                  Synch (MP only)
...   Device n                          Device n
4     ...                               Device 1
3     Device 1                          Correctable Machine Check
2     Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1     APC                               APC
0     Passive/Low                       Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16
On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
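The two IRQL formulas on this slide are simple enough to check by hand. The helpers below are illustrative only (the function names are invented, and the 0..15 IRQ range assumes a classic 16-line interrupt controller, which the slide does not state).

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);      /* assumption: 16 IRQ lines */
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16,
 * using integer division. */
int irql_from_vector(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

For example, IRQ 1 maps to IRQL 26 on an x86 uniprocessor, and IDT vector 0xD1 (209) maps to IRQL 13 on x64, since 209 / 16 truncates to 13.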

4747

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor
Handling timer expiration
Asynchronous execution of a procedure in context of a particular thread
Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors
Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode
A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

[Figure: Processor A and Processor B each contend for the spinlock that guards the DPC queue]
Both processors execute:
do
    acquire_spinlock(DPC)
until (SUCCESS)
begin                          /* critical section */
    remove DPC from queue
end
release_spinlock(DPC)
A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention
Queued spinlocks maintain a queue of waiting processors
The first processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus
When releasing the lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
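The idea of spinning on a local flag can be sketched with an MCS-style queue lock in C11 atomics. This is a user-mode illustration of the general technique, not the Windows kernel's implementation; all type and function names are hypothetical, and sched_yield() is added only so the demo stays fast when threads outnumber processors.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool locked;             /* local flag this waiter spins on */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;          /* last waiter, NULL when the lock is free */
} qlock;

static void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);     /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))             /* spin on our own flag only */
            sched_yield();
    }
}

static void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;                                  /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                           /* successor is still linking in */
    }
    atomic_store(&succ->locked, false);              /* signal exactly one waiter */
}

static qlock demo_lock;             /* zero-initialized: tail == NULL */
static long counter = 0;

static void *worker(void *arg)
{
    qnode me;                       /* per-thread queue node */
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        qlock_acquire(&demo_lock, &me);
        counter++;                  /* critical section */
        qlock_release(&demo_lock, &me);
    }
    return NULL;
}

/* Four threads increment a shared counter under the queue lock. */
long run_qlock_demo(void)
{
    pthread_t t[4];
    counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Each waiter touches only its own locked flag while spinning, and release hands the lock to the next node in FIFO order, which matches the "exactly one processor is signaled" and "pre-determined wait order" points above.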

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger
Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction in the use of spinlocks & the length of time they are held
- Scheduling database is now per-CPU, which allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; "all" means all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel
[Figure: thread scheduling states (Initialized, Ready, Standby, Running, Waiting, Transition, Terminated), annotated with: create and initialize thread object; thread waits on an object handle; set object to signaled state, wait is complete]
Interaction with thread scheduling

5858

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle
Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list
Another thread sets the event
Kernel wakes up waiting threads; variable-priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch
If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Dispatcher object: Mutex (kernel mode)
- Nonsignaled -> signaled: owning thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect of signaled state on waiting threads: kernel resumes one waiting thread
Dispatcher object: Mutex (exported to user mode)
- Nonsignaled -> signaled: owning thread or other thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect of signaled state on waiting threads: kernel resumes one waiting thread
Dispatcher object: Semaphore
- Nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- Signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect of signaled state on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (contd)

Dispatcher object: Event
- Nonsignaled -> signaled: a thread sets the event
- Signaled -> nonsignaled: kernel resumes one or more threads
- Effect of signaled state on waiting threads: kernel resumes one or more waiting threads
Dispatcher object: Event pair
- Nonsignaled -> signaled: dedicated thread sets one event in the event pair
- Signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect of signaled state on waiting threads: kernel resumes waiting dedicated thread
Dispatcher object: Timer
- Nonsignaled -> signaled: timer expires
- Signaled -> nonsignaled: a thread (re)initializes the timer
- Effect of signaled state on waiting threads: kernel resumes all waiting threads
Dispatcher object: Thread
- Nonsignaled -> signaled: thread terminates
- Signaled -> nonsignaled: a thread reinitializes the thread object
- Effect of signaled state on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004
Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls
IRQLs and CPU Time Accounting
Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration
Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue; the queue head links to a chain of DPC objects]

6666

Delivering a DPC

[Figure: the interrupt dispatch table (IRQLs from High and Power failure down through Dispatch/DPC, APC, and Low), the DPC queue, and the dispatcher]
1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a SW interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue
DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects
DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services
APC queue is thread-specific
User mode & kernel mode APCs
- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)
APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)
User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking to APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
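The four accounting rules can be written as a small decision function. This is only a restatement of the slide in C; the enum, the function name, and the in_user_mode parameter (used to pick between user and kernel time below IRQL 2) are assumptions for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, following the slide's rules. */
charge_target charge_clock_tick(int irql, bool processing_dpc, bool in_user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;            /* IRQL > 2: interrupt time */
    if (irql == 2)                          /* IRQL = 2: DPC or thread kernel time */
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: the thread's user or kernel time */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Note that a tick arriving while a device ISR runs (IRQL above 2) is never attributed to the interrupted thread, which is why heavy interrupt load does not show up in per-process CPU totals.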

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

- Process Explorer shows these as separate processes (they are not really processes)
- Context switches shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)
Thread execution and context switches between clock intervals are NOT accounted for
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged
View context switch activity with Process Explorer
- Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason
Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"
All dispatcher objects have a common header
[Figure: dispatcher object layout: header with Size, Type, State, and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]
All dispatcher objects are in one of two states
- "Signaled" vs. "nonsignaled"
- When signaled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)
All wait blocks from a given wait call are chained to the waiting thread
Type indicates wait for "any" or "all"
Key denotes argument list position for WaitForMultipleObjects
[Figure: thread objects point to their WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type, and Next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) chains the wait blocks of the threads waiting on it]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication
Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects
Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles
Only one thread at a time can be in a critical section
A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match
Leaving a critical section before entering it may cause deadlocks
No way to test whether another thread is in a critical section
VOID InitializeCriticalSection (LPCRITICAL_SECTION sec);
VOID DeleteCriticalSection (LPCRITICAL_SECTION sec);
VOID EnterCriticalSection (LPCRITICAL_SECTION sec);
VOID LeaveCriticalSection (LPCRITICAL_SECTION sec);
BOOL TryEnterCriticalSection (LPCRITICAL_SECTION sec);

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection (&crit);
/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection (&crit);
    __try {
        counter += local_value;
    }
    __finally {
        LeaveCriticalSection (&crit);   /* runs exactly once per Enter */
    }
}
DeleteCriticalSection (&crit);

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers
DWORD WaitForSingleObject (HANDLE hObject, DWORD dwTimeout);
DWORD WaitForMultipleObjects (DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout);

8383

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever
WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for any, the function returns the index of the first signaled object (as an offset from WAIT_OBJECT_0)
Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex (LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName);
HANDLE OpenMutex (DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName);
BOOL ReleaseMutex (HANDLE hMutex);

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex (NULL, FALSE, NULL);
/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject (mutex, INFINITE);
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex (mutex);
}
CloseHandle (mutex);

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process
Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()
Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject (hMutex, INFINITE)
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
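The polling behaviour of pthread_mutex_trylock() can be seen in a few lines. The wrapper name try_enter is hypothetical; it just converts the POSIX return codes into a boolean, much as one would compare WaitForSingleObject()'s result against WAIT_OBJECT_0 and WAIT_TIMEOUT.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Try to take the mutex without blocking.
 * Returns 1 if the lock was acquired, 0 if it was busy. */
int try_enter(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    assert(rc == 0 || rc == EBUSY);   /* other errors are not expected here */
    return rc == 0;
}
```

A caller that gets 0 back can do other work and poll again later, whereas pthread_mutex_lock() would have put the thread to sleep until the owner unlocks.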

8888

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0
Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)

8989

Semaphores

HANDLE CreateSemaphore (LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName);
HANDLE OpenSemaphore (DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName);
BOOL ReleaseSemaphore (HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount);

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent (LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName);
BOOL SetEvent (HANDLE hEvent);
BOOL ResetEvent (HANDLE hEvent);
BOOL PulseEvent (HANDLE hEvent);

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
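Because a condition variable carries no state of its own, the usual way to approximate a manual-reset event is to pair it with a mutex-protected flag. The following is a sketch under that assumption; manual_event, event_set, event_wait and the other names are hypothetical helpers, not part of pthreads or of the lecture.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* the state a bare condition variable lacks */
} manual_event;

void event_init(manual_event *e, bool initially_signaled) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

/* Like SetEvent() on a manual-reset event: release every waiter and
   stay signaled until event_reset() is called. */
void event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(). */
void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE); the loop guards against
   spurious wakeups. */
void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(manual_event *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

static void *waiter(void *arg) {
    event_wait(arg);
    return NULL;
}

/* Starts up to 8 waiters, sets the event once, and returns how many
   threads could be joined, i.e. were released by that single set. */
int release_all(manual_event *e, int n) {
    pthread_t t[8];
    int started = 0, joined = 0;
    if (n > 8) n = 8;
    for (int i = 0; i < n; i++)
        if (pthread_create(&t[started], NULL, waiter, e) == 0)
            started++;
    event_set(e);
    for (int i = 0; i < started; i++)
        if (pthread_join(t[i], NULL) == 0)
            joined++;
    return joined;
}
```

The sketch does not attempt to reproduce PulseEvent(), which has no clean counterpart here.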

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main connects prog1 to prog2 through a pipe]

Half-duplex, character-based IPC
cbPipe: pipe size in bytes; zero == default
Read on pipe handle will block if pipe is empty
Write operation to a full pipe will block
Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}
/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage)

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa)

Combines a WriteFile, ReadFile sequence in one call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut)

Combines CreateFile, WriteFile, ReadFile, CloseHandle in one call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL
– Call will return as soon as there is a client connection
– Returns false if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}
/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0..n act as mailslot servers; each calls hMS = CreateMailslot( "\\.\mailslot\status" ), then ReadFile(hMS, &ServStat), and connects to the server. The application server acts as mailslot client; in a loop it calls Sleep(), hMS = CreateFile( "\\*\mailslot\status" ) and WriteFile(hMS, &StatInfo), so the message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0 – immediate return
– MAILSLOT_WAIT_FOREVER – infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Synchronization Hardware

Atomically swap two variables

void Swap(boolean &a, boolean &b) {
    boolean temp = a;
    a = b;
    b = temp;
}

Mutual Exclusion with Swap

Shared data (initialized to 0): int lock = 0;

Thread Ti

int key;
do {
    key = 1;
    while (key == 1) Swap(lock, key);
    critical section
    lock = 0;
    remainder section
} while (1);
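On current compilers the atomic Swap is available as C11's atomic_exchange. The sketch below assumes a C11 toolchain with pthreads; lock_word, spin_acquire, spin_release and run_workers are illustrative names, and the thread limit of 8 is arbitrary.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int lock_word = 0;   /* 0 = free, 1 = held */
static long shared_total = 0;      /* protected by lock_word */

/* atomic_exchange plays the role of Swap(lock, key) with key == 1:
   it stores 1 and returns the previous value in one atomic step. */
static void spin_acquire(void) {
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_acquire) == 1)
        ;   /* busy wait until the previous value was 0 */
}

static void spin_release(void) {
    atomic_store_explicit(&lock_word, 0, memory_order_release);   /* lock = 0 */
}

static void *worker(void *arg) {
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        spin_acquire();
        shared_total++;            /* critical section */
        spin_release();
    }
    return NULL;
}

/* Runs up to 8 threads that each increment the shared total
   `iterations` times, then returns the total. */
long run_workers(int nthreads, long iterations) {
    pthread_t t[8];
    int started = 0;
    if (nthreads > 8) nthreads = 8;
    shared_total = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[started], NULL, worker, &iterations) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return shared_total;
}
```

Without the lock, the increments from different threads could interleave and the final total would usually fall short of nthreads * iterations.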

Semaphores

Semaphore S – integer variable
can only be accessed via two atomic operations

wait (S):
    while (S <= 0) ;   /* busy wait */
    S--;

signal (S):
    S++;

Critical Section of n Threads

Shared data
semaphore mutex;   /* initially mutex = 1 */

Thread Ti

do {
    wait(mutex);
    critical section
    signal(mutex);
    remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
– Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations
– suspend() suspends the thread that invokes it
– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
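A blocking semaphore along these lines can be sketched on top of a pthread mutex and condition variable, which stand in for the suspend()/resume() primitives and the explicit list S.L. Note one deliberate difference: this variant keeps the count non-negative and lets the condition variable track the waiters, rather than using a negative value to count them. The names csem, csem_wait, csem_signal and csem_trywait are hypothetical.

```c
#include <pthread.h>

typedef struct {
    int             value;     /* available units; never negative here */
    pthread_mutex_t lock;      /* makes wait/signal atomic */
    pthread_cond_t  nonzero;   /* waiters sleep here instead of spinning */
} csem;

void csem_init(csem *s, int initial) {
    s->value = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

/* wait(S): sleep while no unit is available, then take one. */
void csem_wait(csem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);   /* suspend */
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* signal(S): return one unit and wake one sleeper, if any. */
void csem_signal(csem *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);               /* resume(T) */
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns 1 if a unit was taken, 0 otherwise. */
int csem_trywait(csem *s) {
    int taken = 0;
    pthread_mutex_lock(&s->lock);
    if (s->value > 0) {
        s->value--;
        taken = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return taken;
}

void csem_destroy(csem *s) {
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}
```

Waiting threads sleep inside pthread_cond_wait() and consume no CPU time, which is the point of replacing the busy-wait definition.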

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti
Use semaphore flag initialized to 0

Code

Ti              Tj
A               wait(flag)
signal(flag)    B
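The same ordering can be tried out with POSIX unnamed semaphores, where sem_post() and sem_wait() correspond to signal and wait. This is a sketch assuming a platform that supports sem_init() (for example Linux); thread_i, thread_j, trace and run_in_order are names invented for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag;        /* initialized to 0 in run_in_order() */
static char  trace[3];    /* records the order in which A and B ran */
static int   pos;

static void *thread_i(void *unused) {
    (void)unused;
    trace[pos++] = 'A';   /* statement A */
    sem_post(&flag);      /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused) {
    (void)unused;
    while (sem_wait(&flag) == -1 && errno == EINTR)
        ;                 /* wait(flag), retried if interrupted by a signal */
    trace[pos++] = 'B';   /* statement B */
    return NULL;
}

/* Starts Tj before Ti on purpose; the semaphore still forces A before B. */
const char *run_in_order(void) {
    pthread_t ti, tj;
    memset(trace, 0, sizeof trace);
    pos = 0;
    sem_init(&flag, 0, 0);
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace;
}
```

Because B can only run after the post that follows A, the recorded trace is always "AB" regardless of which thread the scheduler picks first.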

Two Types of Semaphores

Counting semaphore
– integer value can range over an unrestricted domain

Binary semaphore
– integer value can range only between 0 and 1
– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0           T1
wait(S)      wait(Q)
wait(Q)      wait(S)
signal(S)    signal(Q)
signal(Q)    signal(S)

Deadlock and Starvation

Starvation – indefinite blocking
– A thread may never be removed from the semaphore queue in which it is suspended

Solution
– all code should acquire/release semaphores in the same order
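One common way to enforce a single acquisition order is to rank the locks, for instance by address, and always take the lower-ranked one first. The sketch below uses pthread mutexes in place of semaphores and an invented account/transfer scenario; all names are illustrative.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    long            balance;
} account;

/* Every caller locks the account with the lower address first, so two
   threads can never hold one lock each while waiting for the other. */
static void lock_pair(account *a, account *b) {
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_pair(account *a, account *b) {
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

void transfer(account *from, account *to, long amount) {
    if (from == to) return;
    lock_pair(from, to);
    from->balance -= amount;
    to->balance   += amount;
    unlock_pair(from, to);
}

typedef struct { account *from, *to; int rounds; } job;

static void *mover(void *p) {
    job *j = p;
    for (int i = 0; i < j->rounds; i++)
        transfer(j->from, j->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions, the pattern that deadlocks
   T0 and T1 when each locks "its" resource first. Returns the combined
   balance, which must be unchanged. */
long run_opposing_transfers(int rounds) {
    account x, y;
    pthread_t t1, t2;
    pthread_mutex_init(&x.lock, NULL);
    pthread_mutex_init(&y.lock, NULL);
    x.balance = 1000;
    y.balance = 1000;
    job j1 = { &x, &y, rounds }, j2 = { &y, &x, rounds };
    pthread_create(&t1, NULL, mover, &j1);
    pthread_create(&t2, NULL, mover, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&x.lock);
    pthread_mutex_destroy(&y.lock);
    return x.balance + y.balance;
}
```

Consistent ordering addresses the deadlock case; it does not by itself guarantee freedom from starvation, which depends on how waiters are selected.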

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
– Chapter 7 - Process Synchronization
– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state
Controls access to memory
Each memory page is tagged to show the required mode for reading and for writing
– Protects the system from the users
– Protects the user (process) from themselves
– System is not protected from system
Code regions are tagged "no write in any mode"
Controls ability to execute privileged instructions

A Windows abstraction
– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
– Does not affect scheduling
– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
– "Privileged Time" and "User Time"
– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
– Via the system service dispatch mechanism
– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
– Some threads in the system stay in kernel mode at all times, mostly in the "System" process
– Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
– Interrupt dispatcher invokes the interrupt service routine
– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
– ISR often requests the execution of a "DPC routine", which also runs in kernel mode
– Time not charged to interrupted thread

Trap dispatching

[Figure: trap handlers. Interrupt -> interrupt dispatcher -> interrupt service routines; system service call -> system service dispatcher -> system services; HW exceptions and SW exceptions -> exception dispatcher -> exception handlers; virtual address exceptions -> virtual memory manager's pager]

Trap: processor's mechanism to capture an executing thread
– Switch from user to kernel mode
– Interrupts – asynchronous
– Exceptions – synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; control passes to the interrupt dispatch routine in kernel mode, which calls the interrupt service routine]

Interrupt dispatch routine
– Disable interrupts
– Record machine state (trap frame) to allow resume
– Mask equal- and lower-IRQL interrupts
– Find and call appropriate ISR
– Dismiss interrupt
– Restore machine state (including mode and enabled interrupts)

Interrupt service routine
– Tell the device to stop interrupting
– Interrogate device state, start next operation on device, etc.
– Request a DPC
– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
– Precedence of the interrupt with respect to other interrupts
– Different interrupt sources have different IRQLs
– Not the same as IRQ

IRQL is also a state of the processor
– Servicing an interrupt raises processor IRQL to that interrupt's IRQL
– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

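Interrupt Precedence via IRQLs (x86)

IRQL   Level                          Category
31     High                           Hardware interrupts
30     Power fail                     Hardware interrupts
29     Interprocessor interrupt       Hardware interrupts
28     Clock                          Hardware interrupts
       Profile & Synch (Srv 2003)     Hardware interrupts
       Device 1 ...                   Hardware interrupts
2      Dispatch/DPC                   Deferrable software interrupts
1      APC                            Deferrable software interrupts
0      Passive/Low                    Normal thread execution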

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts processor (single line)
– Processor queries for interrupt vector; uses vector as index to IDT

Alpha
– PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel
– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn on/off ISR

Interrupt objects can synchronize access to ISR data
– Multiple instances of an ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected with an IRQL

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document, but has never been used

Inter-processor interrupt
– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
– Used to update the system's clock and the allocation of CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels 0 to 15 on x64 and IA64.
x64, lowest to highest: Passive/Low, APC, Dispatch/DPC, Device 1 ... Device n, Synch (Srv 2003), Clock, Interprocessor Interrupt/Power, High/Profile.
IA64, lowest to highest: Passive/Low, APC, Dispatch/DPC & Synch (UP only), Correctable Machine Check, Device 1 ... Device n, Synch (MP only), Clock, Interprocessor Interrupt, High/Profile/Power.]

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device
  Can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  Processors are assigned round robin for each interrupt vector
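The two fixed mappings can be written down as one-line functions. This is only an arithmetic illustration: it reads the x64/IA64 rule as "IDT vector number divided by 16" (the division sign appears to have been lost in the slide text), and the function names are invented.

```c
/* x86 uniprocessor systems: IRQL = 27 - IRQ. */
int irql_x86_up(int irq) {
    return 27 - irq;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16
   (integer division, i.e. the upper four bits of an 8-bit vector). */
int irql_from_vector(int vector) {
    return vector / 16;
}
```

For example, IRQ 0 maps to IRQL 27 on an x86 UP system, and vector 0xD1 maps to IRQL 13 under the x64/IA64 rule.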

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B both run the sequence below against a shared DPC queue that is guarded by a spinlock]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue    (critical section)
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag
– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the 1st processor's flag is modified
– Exactly one processor is being signaled
– Pre-determined wait order
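The idea of spinning on a waiter-local flag can be illustrated with an MCS-style queue lock in C11. This is a user-space sketch of the general technique, not the Windows kernel implementation; qnode, qlock, qlock_acquire, qlock_release and run_cpus are invented names, and the sched_yield() calls are a courtesy for oversubscribed test machines that a kernel spinlock would not make.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;     /* successor in the wait queue */
    atomic_int              locked;   /* 1 while this waiter must spin */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;            /* last waiter, NULL when lock is free */
} qlock;

void qlock_acquire(qlock *l, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, 1);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind predecessor */
        while (atomic_load(&me->locked))
            sched_yield();                         /* spin on our own flag only */
    }
}

void qlock_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody was waiting */
        /* A waiter has swapped itself in but not linked yet: wait for it. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, 0);                /* signal exactly one waiter */
}

static qlock dpc_lock;        /* static storage: tail starts as NULL */
static long  queue_ops;       /* stands in for work on a shared queue */

static void *cpu(void *arg) {
    long n = *(long *)arg;
    qnode me;                                      /* per-thread queue node */
    for (long i = 0; i < n; i++) {
        qlock_acquire(&dpc_lock, &me);
        queue_ops++;                               /* critical section */
        qlock_release(&dpc_lock, &me);
    }
    return NULL;
}

/* Runs up to 8 threads, each performing n guarded increments. */
long run_cpus(int ncpus, long n) {
    pthread_t t[8];
    int started = 0;
    if (ncpus > 8) ncpus = 8;
    queue_ops = 0;
    for (int i = 0; i < ncpus; i++)
        if (pthread_create(&t[started], NULL, cpu, &n) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return queue_ops;
}
```

Each waiter polls only its own locked field, and the releasing owner clears the flag of its direct successor, which gives the first-come, first-served order described above.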

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length they are held
– Scheduling database now per-CPU
  Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated
  Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks)
  Doesn't use spinlocks when there is no contention
  Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once
  "All": all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include optional timeout argument
– Waiting threads consume no CPU time

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset, may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" leads to Waiting; "Set object to signaled state" followed by "Wait is complete" leads out of Waiting.]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled -> signaled: owning thread releases mutex
– signaled -> nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled -> signaled: owning thread or other thread releases mutex
– signaled -> nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
– signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

What signals an object (contd.)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event
– nonsignaled -> signaled: a thread sets the event
– signaled -> nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled -> signaled: dedicated thread sets one event in the event pair
– signaled -> nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– nonsignaled -> signaled: timer expires
– signaled -> nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled -> signaled: thread terminates
– signaled -> nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
  See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; a DPC queue holding DPC objects; the thread dispatcher]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserApc
– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, labeled K and U (kernel-mode and user-mode APCs), each a list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
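The four accounting rules amount to a small decision function. The sketch below is only a restatement of the rules in C; the enum, the function name and the in_dpc/was_user_mode parameters are invented for illustration and do not correspond to actual kernel interfaces.

```c
typedef enum {
    CHARGE_THREAD_USER,     /* thread's user time */
    CHARGE_THREAD_KERNEL,   /* thread's kernel time */
    CHARGE_DPC,             /* DPC time */
    CHARGE_INTERRUPT        /* interrupt time */
} charge_t;

/* Decides where one clock tick is charged, given the IRQL the processor
   was at when the clock interrupt fired, whether a DPC was being
   processed, and whether the interrupted code ran in user mode. */
charge_t charge_for_tick(int irql, int in_dpc, int was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: charged to the thread, split by execution mode */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For instance, a tick that arrives at IRQL 2 while a DPC routine is running is counted as DPC time, not as time of the interrupted thread.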

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g. one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a common header with Size, Type, State and Wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their wait blocks through WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type and a Next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so the section is left exactly once */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, just check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  When waiting for any object, the return value identifies the first signaled object (WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to the nonsignaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() releases the handle; the mutex object is freed once its last handle is closed

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );
    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else                /* mutex was abandoned */
            break;          /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
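As a rough POSIX counterpart to the Windows mutex example, the shared-counter loop might look like the sketch below. The helper names (worker, run_counter_demo, try_enter) and the limit of 16 threads are assumptions made for this illustration only.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                 /* shared by all threads */

struct worker_args { int iterations; long local_value; };

/* Each worker adds local_value to the shared counter, one locked step at a time. */
static void *worker(void *p)
{
    const struct worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(&counter_lock);   /* blocks, like an INFINITE wait */
        counter += a->local_value;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run nthreads workers (at most 16) and return the final counter value. */
static long run_counter_demo(int nthreads, int iterations)
{
    pthread_t tid[16];
    struct worker_args args = { iterations, 1 };
    assert(nthreads > 0 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &args);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polling attempt: returns 1 if the lock was free and is now held, else 0. */
static int try_enter(void)
{
    return pthread_mutex_trylock(&counter_lock) == 0;
}
```

A caller that gets 1 back from try_enter owns the lock and has to call pthread_mutex_unlock(&counter_lock) when done, mirroring ReleaseMutex() after a zero-timeout wait that succeeded.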

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value (up to the maximum count)
- ReleaseSemaphore() reports the old semaphore count through lpPreviousCount

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );
    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
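Although pthreads has no direct counterpart, something close to a manual-reset event can be assembled from a mutex, a condition variable and a flag. The type and function names below (manual_event, event_set, event_reset, event_wait, release_all_demo) are hypothetical, and the sketch leaves out timeouts and PulseEvent-style behavior.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event: stays signaled until explicitly reset. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

static void event_init(manual_event *e, bool initial_state)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = initial_state;
}

static void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);    /* wake every waiter */
    pthread_mutex_unlock(&e->lock);
}

static void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event start_event;
static int released = 0;                 /* protected by start_event.lock */

static void *waiter(void *unused)
{
    (void)unused;
    event_wait(&start_event);
    pthread_mutex_lock(&start_event.lock);
    released++;
    pthread_mutex_unlock(&start_event.lock);
    return NULL;
}

/* Start nthreads waiters (at most 8), set the event once, count who got through. */
static int release_all_demo(int nthreads)
{
    pthread_t tid[8];
    assert(nthreads > 0 && nthreads <= 8);
    event_init(&start_event, false);
    released = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, waiter, NULL);
    event_set(&start_event);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return released;
}
```

Because the flag stays set, a thread that calls event_wait after event_set returns immediately, which is the behavior a bare pthread_cond_broadcast() does not provide on its own.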

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server through separate instances of the same named pipe
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine ("." means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile, ReadFile sequence in one call

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines CreateFile, WriteFile, ReadFile, CloseHandle in one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if a client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }
    /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- A client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

Mailslot servers (application clients 0 ... n):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (application server); the message is sent periodically:

    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Mutual Exclusion with Swap

Shared data (initialized to 0):

    int lock = 0;

Thread Ti:

    int key;
    do {
        key = 1;
        while (key == 1) Swap(lock, key);
            critical section
        lock = 0;
            remainder section
    } while (1);
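In portable C11, the Swap instruction can be approximated with atomic_exchange, which stores a new value and returns the old one in a single atomic step. The following is a minimal sketch under that assumption. The names (lock_word, swap_lock, swap_unlock, run_swap_demo) are made up for the example, and sched_yield() is added only so the spin loop behaves well when there are more threads than processors.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static atomic_int lock_word = 0;     /* 0 = free, 1 = held */
static long shared_total = 0;        /* protected by lock_word */

static void swap_lock(void)
{
    /* Swapping in 1 and getting 1 back means another thread holds the lock. */
    while (atomic_exchange(&lock_word, 1) == 1)
        sched_yield();
}

static void swap_unlock(void)
{
    atomic_store(&lock_word, 0);     /* lock = 0 */
}

static void *adder(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        swap_lock();
        shared_total++;              /* critical section */
        swap_unlock();
    }
    return NULL;
}

/* Run nthreads adders (at most 8) and return the final total. */
static long run_swap_demo(int nthreads, int iterations)
{
    pthread_t tid[8];
    assert(nthreads > 0 && nthreads <= 8);
    shared_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, adder, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return shared_total;
}
```

Like the slide's version, this lock makes no fairness promise: a thread can lose the exchange race repeatedly, so mutual exclusion holds but bounded waiting does not.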

Semaphores

Semaphore S: integer variable

Can only be accessed via two atomic operations:

    wait(S):
        while (S <= 0) ;    /* busy-wait */
        S--;

    signal(S):
        S++;

Critical Section of n Threads

Shared data:

    semaphore mutex;    /* initially mutex = 1 */

Thread Ti:

    do {
        wait(mutex);
            critical section
        signal(mutex);
            remainder section
    } while (1);

Semaphore Implementation

Semaphores may suspend/resume threads
- Avoid busy waiting

Define a semaphore as a record:

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations:
- suspend() suspends the thread that invokes it
- resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as:

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
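One way to try out this blocking definition on a POSIX system is to let a mutex and a condition variable play the roles of the list S.L, suspend() and resume(T). This is a sketch under that assumption, not a kernel implementation. The wakeups counter is an addition that keeps spurious wakeups from letting a thread through early, and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int             value;     /* may go negative: -value = number of waiters */
    int             wakeups;   /* signals handed out but not yet consumed */
    pthread_mutex_t mutex;     /* makes wait/signal atomic */
    pthread_cond_t  cond;      /* stands in for the list S.L */
} semaphore;

static void semaphore_init(semaphore *s, int initial)
{
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
}

static void semaphore_wait(semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    s->value--;
    if (s->value < 0) {
        do {
            pthread_cond_wait(&s->cond, &s->mutex);   /* suspend() */
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->mutex);
}

static void semaphore_signal(semaphore *s)
{
    pthread_mutex_lock(&s->mutex);
    s->value++;
    if (s->value <= 0) {
        s->wakeups++;
        pthread_cond_signal(&s->cond);                /* resume(T) */
    }
    pthread_mutex_unlock(&s->mutex);
}

static semaphore guard;
static long guarded_total = 0;     /* protected by guard (initial value 1) */

static void *guarded_adder(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        semaphore_wait(&guard);
        guarded_total++;
        semaphore_signal(&guard);
    }
    return NULL;
}

/* Run nthreads adders (at most 8) behind a semaphore used as a mutex. */
static long run_semaphore_demo(int nthreads, int iterations)
{
    pthread_t tid[8];
    assert(nthreads > 0 && nthreads <= 8);
    semaphore_init(&guard, 1);
    guarded_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, guarded_adder, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return guarded_total;
}
```

Initializing the semaphore to 1 reproduces the "Critical Section of n Threads" pattern; a larger initial value would admit that many threads at once.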

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
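The same ordering pattern can be tried with POSIX unnamed semaphores, where sem_wait() and sem_post() correspond to wait and signal. The sketch assumes a platform that supports sem_init() (Linux does). The log buffer and function names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

static sem_t flag;                 /* initialized to 0 in the demo */
static char  order_log[3];         /* receives "AB" */
static int   log_pos = 0;

static void *thread_i(void *unused)
{
    (void)unused;
    order_log[log_pos++] = 'A';    /* statement A */
    sem_post(&flag);               /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused)
{
    (void)unused;
    sem_wait(&flag);               /* wait(flag): blocks until A has run */
    order_log[log_pos++] = 'B';    /* statement B */
    return NULL;
}

/* Start Tj before Ti on purpose; the semaphore still forces A before B. */
static const char *run_ordering_demo(void)
{
    pthread_t ti, tj;
    int rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    (void)rc;
    memset(order_log, 0, sizeof order_log);
    log_pos = 0;
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return order_log;
}
```

The log is only touched before sem_post() in Ti and after sem_wait() in Tj, so the semaphore is the only synchronization the two threads need.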

Two Types of Semaphores

Counting semaphore
- integer value can range over an unrestricted domain

Binary semaphore
- integer value can range only between 0 and 1
- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0            T1
    wait(S);      wait(Q);
    wait(Q);      wait(S);
    signal(S);    signal(Q);
    signal(Q);    signal(S);

Deadlock and Starvation

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution (to the deadlock between T0 and T1)
- all code should acquire/release semaphores in the same order
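A common way to enforce a single acquisition order is to rank the locks, for example by address, and always take the lower-ranked one first. The sketch below uses pthread mutexes in place of the semaphores S and Q. The helper names (lock_both, unlock_both, run_ordered_demo) and the address-based ranking are choices made for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t lock_S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_Q = PTHREAD_MUTEX_INITIALIZER;
static long both_held_count = 0;   /* only touched while both locks are held */

/* Take the lower-addressed lock first, whatever order the caller names them in. */
static void lock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a; a = b; b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_both(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

struct lock_pair { pthread_mutex_t *first, *second; int iterations; };

static void *locker(void *p)
{
    const struct lock_pair *lp = p;
    for (int i = 0; i < lp->iterations; i++) {
        lock_both(lp->first, lp->second);
        both_held_count++;
        unlock_both(lp->first, lp->second);
    }
    return NULL;
}

/* T0 names the locks as (S, Q), T1 as (Q, S); neither can deadlock the other. */
static long run_ordered_demo(int iterations)
{
    pthread_t t0, t1;
    struct lock_pair p0 = { &lock_S, &lock_Q, iterations };
    struct lock_pair p1 = { &lock_Q, &lock_S, iterations };
    assert(iterations >= 0);
    both_held_count = 0;
    pthread_create(&t0, NULL, locker, &p0);
    pthread_create(&t1, NULL, locker, &p1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return both_held_count;
}
```

If lock_both simply locked its arguments in the order given, the two threads could each hold one lock and wait forever for the other, which is the T0/T1 scenario from the deadlock slide.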

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & interrupt precedence

Spinlocks and kernel synchronization

Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

[Figure: trap handlers. Interrupt -> interrupt dispatcher -> interrupt service routines; system service call -> system service dispatcher -> system services; HW exceptions and SW exceptions -> exception dispatcher -> exception handlers; virtual address exceptions -> virtual memory manager's pager]

Trap: processor's mechanism to capture executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; the interrupt dispatch routine runs in kernel mode and calls the interrupt service routine]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

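Interrupt Precedence via IRQLs (x86)

    IRQL   Level                         Kind
    31     High                          hardware interrupts
    30     Power fail                    hardware interrupts
    29     Interprocessor interrupt      hardware interrupts
    28     Clock                         hardware interrupts
           Profile & Synch (Srv 2003)    hardware interrupts
    ...    Device n ... Device 1         hardware interrupts
    2      Dispatch/DPC                  deferrable software interrupts
    1      APC                           deferrable software interrupts
    0      Passive/Low                   normal thread execution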

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler: Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

    IRQL   x64                               IA64
    15     High/Profile                      High/Profile/Power
    14     Interprocessor interrupt/Power    Interprocessor interrupt
    13     Clock                             Clock
    12     Synch (Srv 2003)                  Synch (MP only)
    ...    Device n                          Device n
    4      ...                               Device 1
    3      Device 1                          Correctable Machine Check
    2      Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
    1      APC                               APC
    0      Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device
  Can be configured with IntFilter utility in Resource Kit
- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  Processors are assigned round robin for each interrupt vector
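The two closed-form rules on this slide are plain integer arithmetic. As a worked illustration, they can be written as two small C helpers. The function names and the argument range checks are assumptions added for the example, and the x86 MP case is left out because the slide gives no formula for it.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
static int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);       /* keeps the result non-negative */
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16. */
static int irql_from_vector(int vector)
{
    assert(vector >= 0 && vector <= 255); /* an IDT has 256 entries */
    return vector / 16;                   /* integer division drops the low 4 bits */
}
```

For instance, IRQ 12 maps to IRQL 15 under the x86 UP rule, and IDT vector 0xD1 (209) maps to IRQL 13 under the 64-bit rule, so each IRQL covers a block of 16 consecutive vectors.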

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B both contend for the spinlock that guards the DPC queue]

Each processor runs:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                          /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
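The idea resembles the classic MCS queue lock, which can be sketched in portable C11 atomics with threads standing in for processors. This is an illustration of the queueing technique under that assumption, not the Windows kernel's implementation. All names (qnode, queued_lock, queued_acquire, queued_release, run_queued_demo) are hypothetical, and sched_yield() is used only to keep the demo friendly on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiter; each waiter spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             must_wait;
} qnode;

typedef struct { _Atomic(qnode *) tail; } queued_lock;

static void queued_acquire(queued_lock *l, qnode *me)
{
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the end of the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind the predecessor */
        while (atomic_load(&me->must_wait))        /* spin on the local flag only */
            sched_yield();
    }
}

static void queued_release(queued_lock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                         /* successor is still linking in */
    }
    atomic_store(&succ->must_wait, false);         /* signal exactly one waiter */
}

static queued_lock dpc_lock = { NULL };
static long queue_ops = 0;                         /* protected by dpc_lock */

static void *cpu(void *arg)
{
    int iterations = *(const int *)arg;
    qnode me = { NULL, false };
    for (int i = 0; i < iterations; i++) {
        queued_acquire(&dpc_lock, &me);
        queue_ops++;
        queued_release(&dpc_lock, &me);
    }
    return NULL;
}

/* Run nthreads simulated processors (at most 8) and return the operation count. */
static long run_queued_demo(int nthreads, int iterations)
{
    pthread_t tid[8];
    assert(nthreads > 0 && nthreads <= 8);
    queue_ops = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, cpu, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return queue_ops;
}
```

Waiters are served in the order in which they swapped themselves onto the tail, which gives the pre-determined wait order, and each release writes to exactly one successor's flag.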

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003:
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU
  Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003:
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated
  Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks)
  Doesn't use spinlocks when no contention
  Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once
  "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include:
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Creating and initializing a thread object leads to Initialized; a thread that waits on an object handle moves to Waiting; when the object is set to the signaled state, the wait is complete]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread; it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object?

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- Nonsignaled -> signaled: owning thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- Nonsignaled -> signaled: owning thread or other thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- Nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- Signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

What signals an object? (contd)

Event
- Nonsignaled -> signaled: a thread sets the event
- Signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- Nonsignaled -> signaled: dedicated thread sets one event in the event pair
- Signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- Nonsignaled -> signaled: timer expires
- Signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- Nonsignaled -> signaled: thread terminates
- Signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
  See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Figure: interrupt dispatch table with levels from High and Power failure down to Dispatch/DPC, APC and Low; DPC objects wait in the DPC queue until the dispatcher runs them]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserApc
- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
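The four charging rules can be sketched as a small decision function. This is only an illustration in portable C, not kernel code: the enum names and `classify_clock_tick()` are hypothetical, and the choice between a thread's user and kernel time at IRQL < 2 is reduced to a flag supplied by the caller.

```c
#include <assert.h>

/* Hypothetical buckets; the names are illustrative, not Windows identifiers. */
enum charge_bucket {
    CHARGE_THREAD_USER,    /* thread's user time */
    CHARGE_THREAD_KERNEL,  /* thread's kernel time */
    CHARGE_DPC,            /* DPC time */
    CHARGE_INTERRUPT       /* interrupt time */
};

/* Decide where one clock tick is charged, following the four rules.
 * irql:          IRQL at which the clock interrupt found the processor
 * in_dpc:        nonzero if a DPC routine was being processed
 * was_user_mode: nonzero if the interrupted thread ran in user mode */
enum charge_bucket classify_clock_tick(int irql, int in_dpc, int was_user_mode)
{
    assert(irql >= 0);
    if (irql < 2)
        return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;
}
```

Because the decision is made only when the clock fires, work that starts and ends between two ticks is never seen by such a function.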

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (they are not really processes)

– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout; a common header (Size, Type, State, Wait list head) followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, chained to their wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); each wait block is also linked from the wait list head of the dispatcher object it refers to (fields: Size, Type, State, Wait list head, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

  dwTimeout == 0: no wait, check whether object is signaled

  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects

  Function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else        /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
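As a rough pthread counterpart of the shared-counter loop, the sketch below protects a global counter with a mutex. The helper names (`worker`, `try_add`, `run_workers`) and the `MAX_WORKERS` limit are assumptions made for this illustration; only the `pthread_*` calls are real API.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 8

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;            /* shared by all threads */

/* Blocking acquire: the pthread counterpart of waiting with INFINITE. */
static void *worker(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter += 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Non-blocking attempt: the counterpart of a wait with timeout 0.
 * Returns 1 if the value was added, 0 if the mutex was busy. */
int try_add(long value)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

/* Start nthreads workers, wait for them, and return the final count. */
long run_workers(int nthreads, int iterations)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, worker, &iterations) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling `run_workers(4, 10000)` should leave the counter at the product of the two arguments, since every increment happens under the lock.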


Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
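Since there is no exact equivalent, a manual-reset event is usually approximated with a mutex, a condition variable, and a flag that stays set until it is cleared. The following is a minimal sketch under that assumption; `struct manual_event` and the `event_*` and `release_waiters` names are hypothetical, not part of pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: flag + condition variable guarded by one mutex. */
struct manual_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
};

void event_init(struct manual_event *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

void event_set(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;                 /* stays set until event_reset() */
    pthread_cond_broadcast(&e->cond);   /* wake all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(struct manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

static void *waiter(void *arg)
{
    event_wait(arg);
    return NULL;
}

/* Demo helper: start n waiters, set the event once, and return how many
 * threads were released and joined. */
int release_waiters(int n)
{
    struct manual_event ev;
    pthread_t tids[8];
    int started = 0;

    assert(n > 0 && n <= 8);
    event_init(&ev);
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[started], NULL, waiter, &ev) == 0)
            started++;
    event_set(&ev);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    event_destroy(&ev);
    return started;
}
```

The flag is what makes the difference: a bare `pthread_cond_broadcast()` is lost if no thread is waiting yet, whereas a set event remains observable to threads that arrive later.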

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Equivalent to a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Equivalent to a CreateFile, WriteFile, ReadFile, CloseHandle sequence

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL

– Call will return as soon as there is a client connection

– Returns false if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
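To make the fixed-size-record advice concrete, here is a small POSIX sketch in which a "client" thread sends one record through a FIFO and the "server" side reads it back. The record layout, the temporary directory under /tmp, and all function names are assumptions chosen for this illustration; a real server would keep a well-known FIFO path and serve many clients.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 32               /* fixed record size, far below PIPE_BUF */

struct record { char text[RECORD_SIZE]; };
struct fifo_job { const char *path; const char *msg; };

/* Client side: open the FIFO and send one fixed-size record. */
static void *client_thread(void *arg)
{
    const struct fifo_job *job = arg;
    struct record r = { "" };
    int fd = open(job->path, O_WRONLY);      /* blocks until a reader opens */

    if (fd < 0)
        return NULL;
    snprintf(r.text, sizeof r.text, "%s", job->msg);
    if (write(fd, &r, sizeof r) != (ssize_t)sizeof r)
        perror("write");
    close(fd);
    return NULL;
}

/* Server side: create the FIFO, read one record, copy its text to out.
 * Returns 0 on success, -1 on any failure. */
int fifo_roundtrip(const char *msg, char *out, size_t outlen)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";
    char path[64];
    struct record r;
    size_t got = 0;
    pthread_t tid;

    assert(out != NULL && outlen > 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/requests", dir);
    if (mkfifo(path, 0600) != 0) {
        rmdir(dir);
        return -1;
    }
    struct fifo_job job = { path, msg };
    if (pthread_create(&tid, NULL, client_thread, &job) != 0) {
        unlink(path);
        rmdir(dir);
        return -1;
    }
    int fd = open(path, O_RDONLY);           /* blocks until the writer opens */
    while (fd >= 0 && got < sizeof r) {      /* byte stream: read a full record */
        ssize_t n = read(fd, (char *)&r + got, sizeof r - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    if (fd >= 0)
        close(fd);
    pthread_join(tid, NULL);
    unlink(path);
    rmdir(dir);
    if (got != sizeof r)
        return -1;
    snprintf(out, outlen, "%s", r.text);
    return 0;
}
```

Because every message has the same size, the reader never has to search for message boundaries, which is what a message-mode named pipe would otherwise provide.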


Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 … App client n)

hMS = CreateMailslot( "\\.\mailslot\status" ); ReadFile( hMS, &ServStat ); /* connect to server */

Mailslot client (App server)

while (…) { Sleep(…); hMS = CreateFile( "\\.\mailslot\status" ); … WriteFile( hMS, &StatInfo ); }

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Semaphores

Semaphore S – integer variable

can only be accessed via two atomic operations

wait (S):
    while (S <= 0) ;    /* busy-wait */
    S--;

signal (S):
    S++;

Critical Section of n Threads

Shared data:

semaphore mutex;    /* initially mutex = 1 */

Thread Ti:

do {
    wait(mutex);
        critical section
    signal(mutex);
        remainder section
} while (1);

Semaphore Implementation

Semaphores may suspend/resume threads

– Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations

– suspend() suspends the thread that invokes it

– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
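A blocking semaphore of this kind can be approximated in user space with a mutex and a condition variable. Note the swapped-in technique: instead of an explicit list S.L and a value that goes negative, this sketch keeps the value non-negative and lets the condition variable hold the suspended threads. The `csem` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore: value never drops below zero here. */
struct csem {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             value;
};

void csem_init(struct csem *s, int initial)
{
    assert(initial >= 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = initial;
}

void csem_wait(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)                    /* suspend instead of spinning */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void csem_signal(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);        /* resume one suspended thread */
    pthread_mutex_unlock(&s->lock);
}

int csem_value(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}

void csem_destroy(struct csem *s)
{
    pthread_cond_destroy(&s->nonzero);
    pthread_mutex_destroy(&s->lock);
}

static struct csem handoff_sem;
static int handoff_done;

static void *handoff_waiter(void *unused)
{
    (void)unused;
    csem_wait(&handoff_sem);                 /* blocks until the signal below */
    handoff_done = 1;
    return NULL;
}

/* Demo helper: a thread blocks on a semaphore initialized to 0 and is
 * released by one signal. Returns 1 on success, -1 if no thread started. */
int csem_handoff(void)
{
    pthread_t t;

    handoff_done = 0;
    csem_init(&handoff_sem, 0);
    if (pthread_create(&t, NULL, handoff_waiter, NULL) != 0)
        return -1;
    csem_signal(&handoff_sem);
    pthread_join(t, NULL);
    csem_destroy(&handoff_sem);
    return handoff_done;
}
```

A waiting thread here sleeps inside `pthread_cond_wait()` and consumes no CPU time, which is the point of replacing the busy-wait loop.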


Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

Ti                  Tj
A                   wait(flag)
signal(flag)        B
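The same A-before-B ordering can be tried with a POSIX semaphore. This is a sketch assuming unnamed semaphores (`sem_init`) are available, as they are on Linux; the thread functions and the `order_log` buffer are hypothetical names used only to observe the order.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag;                 /* initialized to 0 in run_ordered() */
static char order_log[3];          /* records the order in which A and B ran */
static int log_pos;

static void record(char step)
{
    assert(log_pos < 2);
    order_log[log_pos++] = step;
}

static void *thread_i(void *unused)
{
    (void)unused;
    record('A');                   /* statement A */
    sem_post(&flag);               /* signal(flag) */
    return NULL;
}

static void *thread_j(void *unused)
{
    (void)unused;
    while (sem_wait(&flag) != 0)   /* wait(flag); retry if interrupted */
        ;
    record('B');                   /* statement B */
    return NULL;
}

/* Run Ti and Tj once and return the observed order as a string. */
const char *run_ordered(void)
{
    pthread_t ti, tj;

    log_pos = 0;
    order_log[0] = order_log[1] = order_log[2] = '\0';
    if (sem_init(&flag, 0, 0) != 0)
        return order_log;
    /* Start Tj first on purpose; it still cannot run B before A. */
    if (pthread_create(&tj, NULL, thread_j, NULL) != 0) {
        sem_destroy(&flag);
        return order_log;
    }
    if (pthread_create(&ti, NULL, thread_i, NULL) != 0)
        sem_post(&flag);           /* let Tj finish so the join cannot hang */
    else
        pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return order_log;
}
```

Even though Tj is started first, it blocks in `sem_wait()` until Ti has executed A and posted the semaphore.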


Two Types of Semaphores

Counting semaphore

– integer value can range over an unrestricted domain

Binary semaphore

– integer value can range only between 0 and 1

– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation – indefinite blocking

– A thread may never be removed from the semaphore queue in which it is suspended

Solution

– all code should acquire/release semaphores in the same order
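One common way to enforce a single acquisition order is to sort the locks by address before taking them. The sketch below applies that idea to two pthread mutexes named S and Q; ordering by address is an assumption of this example (any fixed global order works), and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;           /* protected by holding both S and Q */

/* Always lock the mutex with the lower address first, whatever order the
 * caller names them in, so every thread uses the same global order. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

static void *t0(void *arg)          /* asks for S, then Q */
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_pair(&S, &Q);
        shared_total++;
        unlock_pair(&S, &Q);
    }
    return NULL;
}

static void *t1(void *arg)          /* asks for Q, then S */
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        lock_pair(&Q, &S);
        shared_total++;
        unlock_pair(&Q, &S);
    }
    return NULL;
}

/* Run both threads to completion; returns the total, or -1 on failure. */
long run_ordered_locking(int iterations)
{
    pthread_t a, b;

    assert(iterations > 0);
    shared_total = 0;
    if (pthread_create(&a, NULL, t0, &iterations) != 0)
        return -1;
    if (pthread_create(&b, NULL, t1, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_total;
}
```

With a naive "lock in the order named" version, t0 and t1 could each hold one mutex and wait forever for the other; with the shared order that interleaving cannot occur.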


Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

– Chapter 7 - Process Synchronization

– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

– Protects the system from the users

– Protects the user (process) from themselves

– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

– Does not affect scheduling

– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

– "Privileged Time" and "User Time"

– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

– Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

– Interrupt dispatcher invokes the interrupt service routine

– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture executing thread

– Switch from user to kernel mode

– Interrupts – asynchronous

– Exceptions – synchronous

[Figure: each kind of trap is routed to its handler: interrupts to the interrupt dispatcher (interrupt service routines), system service calls to the system service dispatcher (system services), hardware and software exceptions to the exception dispatcher (exception handlers), virtual address exceptions to the virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running and is handled by kernel mode code]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

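Interrupt Precedence via IRQLs (x86)

IRQL   Level                         Category
31     High                          Hardware interrupts
30     Power fail                    Hardware interrupts
29     Interprocessor Interrupt      Hardware interrupts
28     Clock                         Hardware interrupts
27     Profile & Synch (Srv 2003)    Hardware interrupts
…      (device interrupt levels)     Hardware interrupts
3      Device 1                      Hardware interrupts
2      Dispatch/DPC                  Deferrable software interrupts
1      APC                           Deferrable software interrupts
0      Passive/Low                   Normal thread execution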

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library – Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High

– Used when halting the system (via KeBugCheck())

Power fail

– Originated in the NT design document, but has never been used

Inter-processor interrupt

– Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock, allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler – Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

IRQL   x64                               IA64
15     High/Profile                      High/Profile/Power
14     Interprocessor Interrupt/Power    Interprocessor Interrupt
13     Clock                             Clock
12     Synch (Srv 2003)                  Synch (MP only)
…      Device n                          Device n
4      …                                 Device 1
3      Device 1                          Correctable Machine Check
2      Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1      APC                               APC
0      Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device

  Can be configured with IntFilter utility in Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source

  Processors are assigned round robin for each interrupt vector

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts


Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– Spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient
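The "single data cell plus atomic test-and-modify" idea can be shown with C11 atomics. This is a user-space sketch, not the Windows KSPIN_LOCK implementation: `demo_spinlock` and the helper names are hypothetical, and the loop yields the CPU while waiting (a kernel spinlock would not) only so the demo behaves well on a machine with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical spinlock: one flag, set while some thread owns the lock. */
typedef struct { atomic_flag held; } demo_spinlock;

static demo_spinlock queue_lock = { ATOMIC_FLAG_INIT };
static long protected_count;        /* stands in for a shared queue */

static void spin_acquire(demo_spinlock *l)
{
    /* Atomic test-and-set: returns the previous value, so the loop exits
     * only for the one thread that changed the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();              /* user-space courtesy, see lead-in */
}

static void spin_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void *spin_worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        spin_acquire(&queue_lock);
        protected_count++;          /* critical section */
        spin_release(&queue_lock);
    }
    return NULL;
}

/* Start nthreads workers and return the final count. */
long run_spin_demo(int nthreads, int iterations)
{
    pthread_t tids[8];
    int started = 0;

    assert(nthreads > 0 && nthreads <= 8);
    protected_count = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, spin_worker, &iterations) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return protected_count;
}
```

Every waiting thread hammers the same flag, which is the contention problem that queued spinlocks are designed to reduce.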


Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B both execute the code below against one shared DPC queue protected by a spinlock]

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin                          /* critical section */
    remove DPC from queue
end

release_spinlock(DPC)

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag

– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first processor in the queue is modified

– Exactly one processor is being signaled

– Pre-determined wait order
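A well-known lock with these properties is the MCS queue lock, sketched below with C11 atomics. It is offered as an analogy, not as the Windows implementation: each waiter spins only on the `locked` flag in its own node, and the releaser hands the lock to exactly one successor in FIFO order. All names are hypothetical, and the spin loops yield only to keep the user-space demo well-behaved.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool locked;                /* local flag this waiter spins on */
};

struct mcs_lock { _Atomic(struct mcs_node *) tail; };

void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store(&me->next, (struct mcs_node *)NULL);
    atomic_store(&me->locked, true);
    struct mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                        /* queue was empty: lock acquired */
    atomic_store(&prev->next, me);     /* link behind the previous waiter */
    while (atomic_load(&me->locked))   /* spin on our own flag only */
        sched_yield();
}

void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct mcs_node *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&lock->tail, &expected,
                                           (struct mcs_node *)NULL))
            return;
        /* A successor is enqueueing; wait until it has linked itself. */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, false);   /* signal exactly one waiter */
}

static struct mcs_lock demo_lock;      /* zero-initialized: tail == NULL */
static long mcs_count;

static void *mcs_worker(void *arg)
{
    int n = *(int *)arg;
    struct mcs_node me;                /* per-thread queue node */
    for (int i = 0; i < n; i++) {
        mcs_acquire(&demo_lock, &me);
        mcs_count++;                   /* critical section */
        mcs_release(&demo_lock, &me);
    }
    return NULL;
}

/* Start nthreads workers and return the final count. */
long run_mcs_demo(int nthreads, int iterations)
{
    pthread_t tids[8];
    int started = 0;

    assert(nthreads > 0 && nthreads <= 8);
    mcs_count = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, mcs_worker, &iterations) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return mcs_count;
}
```

The only shared location touched on every acquire is the tail pointer, and it is written once per acquisition rather than polled.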


SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU

  Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated

  Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks)

  Doesn't use spinlocks when no contention

  Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once

  "All": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

Waiting

Waitable objects include

– Events (may be auto-reset or manual reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling, shown on the thread state diagram (Initialized, Ready, Standby, Running, Waiting, Transition, Terminated): create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Dispatcher re-schedules new thread – it may preempt running thread if it has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that sets it to signaled, the event that returns it to nonsignaled, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled to signaled: owning thread releases mutex
– signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled to signaled: owning thread or other thread releases mutex
– signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled to signaled: one thread releases the semaphore, freeing a resource
– signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

6161

What signals an object (contd.)

Event
– nonsignaled to signaled: a thread sets the event
– signaled to nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled to signaled: dedicated thread sets one event in the event pair
– signaled to nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– nonsignaled to signaled: timer expires
– signaled to nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled to signaled: thread terminates
– signaled to nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

– Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: a DPC queue, with a queue head linked to a chain of DPC objects]

6666

Delivering a DPC

[Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low, a DPC queue holding DPC objects, and the thread dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads, and requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

– Permission required for user-mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread's kernel time

– If IRQL > 2, charge to interrupt time
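The four accounting rules above amount to a small decision function. The sketch below is only an illustration of that decision in portable C; the enum and function names are hypothetical and are not kernel identifiers.

```c
#include <stdbool.h>

/* Where one clock tick is charged (illustrative names, not kernel symbols). */
typedef enum {
    CHARGE_THREAD_TIME,        /* thread's user or kernel time */
    CHARGE_DPC_TIME,           /* DPC time */
    CHARGE_THREAD_KERNEL_TIME, /* thread's kernel time */
    CHARGE_INTERRUPT_TIME      /* interrupt time */
} tick_charge;

/* irql:   IRQL that was current when the clock interrupt fired.
 * in_dpc: true if a DPC routine was being processed at that moment. */
tick_charge classify_tick(int irql, bool in_dpc)
{
    if (irql < 2)
        return CHARGE_THREAD_TIME;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC_TIME : CHARGE_THREAD_KERNEL_TIME;
    return CHARGE_INTERRUPT_TIME;   /* irql > 2 */
}
```

For example, classify_tick(2, true) yields CHARGE_DPC_TIME, while any tick taken above IRQL 2 is counted as interrupt time.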

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList pointing to a chain of wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); the wait blocks are also linked from the wait list heads of dispatcher objects (Size, Type, State, Wait list head, object-type-specific data)]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection( &crit );
        counter += local_value;
    }
    __finally {
        /* leave exactly once, so Enter and Leave operations match */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec; dwTimeout == 0: no wait, check whether object is signaled; dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects; when waiting for any, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
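As a small illustration of the blocking versus polling distinction, the sketch below locks a default pthread mutex and then polls it with pthread_mutex_trylock(). The helper names try_enter and trylock_demo are hypothetical; only the pthread calls come from the slide.

```c
#include <pthread.h>
#include <stddef.h>

/* Polls the mutex once: returns 1 if it was acquired, 0 if it was busy. */
int try_enter(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0;   /* never blocks */
}

/* Locks a fresh mutex, polls it while held, unlocks it, polls again.
 * Returns first_poll * 10 + second_poll. */
int trylock_demo(void)
{
    pthread_mutex_t m;
    pthread_mutex_init(&m, NULL);

    pthread_mutex_lock(&m);          /* blocking acquire */
    int first = try_enter(&m);       /* mutex is held, so the poll fails */
    pthread_mutex_unlock(&m);

    int second = try_enter(&m);      /* mutex is free, so the poll succeeds */
    if (second)
        pthread_mutex_unlock(&m);

    pthread_mutex_destroy(&m);
    return first * 10 + second;
}
```

The failed poll corresponds to WaitForSingleObject() returning WAIT_TIMEOUT when called with a timeout of 0.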

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns the old semaphore count through its lpPreviousCount argument

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
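Since pthreads has no direct counterpart, a manual-reset event is usually approximated with a mutex, a condition variable and a flag. The following is a minimal sketch of that idea, not a drop-in replacement for the Windows object; the type manual_event and all function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() */
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release every waiter */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

static void *waiter(void *arg)
{
    event_wait((manual_event *)arg);
    return NULL;
}

/* Starts up to 16 waiting threads, sets the event once and joins them.
 * Returns how many threads were released. */
int release_waiters(int n)
{
    manual_event e;
    pthread_t tids[16];
    int started = 0, released = 0;

    if (n > 16)
        n = 16;
    event_init(&e, false);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, waiter, &e) != 0)
            break;
        started++;
    }
    event_set(&e);                      /* one set releases all waiters */
    for (int i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) == 0)
            released++;
    pthread_cond_destroy(&e.cond);
    pthread_mutex_destroy(&e.lock);
    return released;
}
```

The flag is what makes the event "manual-reset": threads that call event_wait() after event_set() pass straight through until someone calls event_reset().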

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
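For comparison, the POSIX counterpart of an anonymous pipe is pipe(), which also yields one read end and one write end. The sketch below is a POSIX analogue rather than Win32 code; the function name pipe_roundtrip is hypothetical. It pushes a short message through a pipe within one process.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back from the read end.
 * Returns the number of bytes read, or -1 on error.
 * out is NUL-terminated on success; msg should be short (it must fit
 * in the pipe buffer, otherwise the write would block). */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    if (len > cap - 1)
        len = cap - 1;                /* keep room for the terminator */

    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                    /* data flows one way only: write end is done */

    ssize_t n = -1;
    if (written == (ssize_t)len) {
        n = read(fds[0], out, cap - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    close(fds[0]);
    return n;
}
```

A call such as pipe_roundtrip("hello", buf, sizeof buf) fills buf with the message; in the redirection scenario of the next slides the two ends would instead be handed to two different child processes.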

9494

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

9595

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message-oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same named pipe, each over its own instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. means local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf( "Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers: each calls CreateMailslot() on the "status" mailslot, reads the server status with ReadFile(hMS, &ServStat), then connects to the server. The application server acts as the mailslot client: in a loop it sleeps, opens the "status" mailslot with CreateFile() and sends StatInfo with WriteFile(). The message is sent periodically.]

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


2222

Critical Section of n Threads

Shared data

    semaphore mutex;    /* initially mutex = 1 */

Thread Ti

    do {
        wait(mutex);
            critical section
        signal(mutex);
            remainder section
    } while (1);
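The same entry/exit pattern can be tried out with POSIX semaphores, where sem_wait() plays the role of wait(mutex) and sem_post() the role of signal(mutex). This is a minimal sketch assuming a POSIX system; run_workers, worker and shared_counter are hypothetical names.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t mutex;            /* binary semaphore, initially 1 */
static long  shared_counter;   /* the shared data */

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        sem_wait(&mutex);      /* entry section: wait(mutex) */
        shared_counter++;      /* critical section */
        sem_post(&mutex);      /* exit section: signal(mutex) */
        /* remainder section would go here */
    }
    return NULL;
}

/* Runs up to 16 worker threads and returns the final counter value. */
long run_workers(int nthreads, long iterations)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    shared_counter = 0;
    sem_init(&mutex, 0, 1);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    sem_destroy(&mutex);
    return shared_counter;
}
```

Because every increment happens between sem_wait() and sem_post(), no updates are lost: with 4 threads and 10000 iterations each, the counter ends at 40000.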

2323

Semaphore Implementation

Semaphores may suspend/resume threads

– Avoid busy waiting

Define a semaphore as a record

    typedef struct {
        int value;
        struct thread *L;
    } semaphore;

Assume two simple operations

– suspend() suspends the thread that invokes it

– resume(T) resumes the execution of a blocked thread T

2424

Implementation

Semaphore operations now defined as

    wait(S):
        S.value--;
        if (S.value < 0) {
            add this thread to S.L;
            suspend();
        }

    signal(S):
        S.value++;
        if (S.value <= 0) {
            remove a thread T from S.L;
            resume(T);
        }
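One way to try this definition out in user space is to let a pthread mutex make wait and signal atomic and let a condition variable stand in for the list L of suspended threads. The sketch below follows the slide's counting scheme (a negative value means threads are waiting); the wakeups field and all names are assumptions of this sketch, added so that each signal resumes exactly one waiter.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;               /* if negative, -value threads are suspended */
    int wakeups;             /* resumes issued but not yet consumed */
    pthread_mutex_t lock;    /* makes wait/signal atomic */
    pthread_cond_t  queue;   /* stands in for the list L */
} semaphore;

void sem_setup(semaphore *s, int initial)
{
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void sem_wait_op(semaphore *s)                      /* wait(S) */
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        do {
            pthread_cond_wait(&s->queue, &s->lock); /* suspend() */
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void sem_signal_op(semaphore *s)                    /* signal(S) */
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {
        s->wakeups++;
        pthread_cond_signal(&s->queue);             /* resume(T) */
    }
    pthread_mutex_unlock(&s->lock);
}
```

With an initial value of 2, two calls to sem_wait_op() return immediately and leave value at 0; a third caller would be suspended until another thread calls sem_signal_op().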

2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

    Ti              Tj
    A               wait(flag)
    signal(flag)    B
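The ordering pattern can be checked with two POSIX threads. In this sketch (POSIX semaphores assumed; thread_i, thread_j, run_ordered and the step counter are hypothetical names), Tj is started first on purpose, yet B still runs after A because Tj blocks on the flag.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t flag;               /* initialized to 0 */
static int   step;               /* incremented by A and by B */
static int   a_ran_at, b_ran_at; /* order in which A and B executed */

static void *thread_i(void *arg)
{
    (void)arg;
    a_ran_at = ++step;           /* statement A */
    sem_post(&flag);             /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg)
{
    (void)arg;
    sem_wait(&flag);             /* wait(flag): blocks until A is done */
    b_ran_at = ++step;           /* statement B */
    return NULL;
}

/* Returns 1 if A executed before B, 0 otherwise (or on thread creation failure). */
int run_ordered(void)
{
    pthread_t ti, tj;

    step = 0;
    a_ran_at = b_ran_at = 0;
    sem_init(&flag, 0, 0);
    if (pthread_create(&tj, NULL, thread_j, NULL) != 0)
        return 0;
    if (pthread_create(&ti, NULL, thread_i, NULL) != 0) {
        sem_post(&flag);         /* let Tj finish so it can be joined */
        pthread_join(tj, NULL);
        return 0;
    }
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return a_ran_at == 1 && b_ran_at == 2;
}
```

The semaphore starts at 0, so wait(flag) cannot succeed before the matching signal(flag); that single fact enforces the order.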

2626

Two Types of Semaphores

Counting semaphore

ndash integer value can range over an unrestricted domain

Binary semaphore

ndash integer value can range only between 0 and 1

ndash can be simpler to implement

Counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock

– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0              T1
    wait(S)         wait(Q)
    wait(Q)         wait(S)
    signal(S)       signal(Q)
    signal(Q)       signal(S)

2828

Deadlock and Starvation

Starvation: indefinite blocking

– A thread may never be removed from the semaphore queue in which it is suspended

Solution

– all code should acquire/release semaphores in same order
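The ordering rule can be illustrated with the two semaphores S and Q from the deadlock slide. In the sketch below (POSIX semaphores assumed; all names are hypothetical) both threads take S first and Q second, so the circular wait of the T0/T1 example cannot form.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t S, Q;          /* both initialized to 1 */
static long  updates;       /* data protected by holding S and Q together */

static void *ordered_worker(void *arg)
{
    long rounds = *(long *)arg;
    for (long i = 0; i < rounds; i++) {
        sem_wait(&S);       /* every thread acquires S first ... */
        sem_wait(&Q);       /* ... and Q second */
        updates++;
        sem_post(&Q);
        sem_post(&S);
    }
    return NULL;
}

/* Runs two threads that both use the S-then-Q order; returns total updates,
 * or -1 if a thread could not be created. */
long run_ordered_locking(long rounds)
{
    pthread_t t0, t1;

    updates = 0;
    sem_init(&S, 0, 1);
    sem_init(&Q, 0, 1);
    if (pthread_create(&t0, NULL, ordered_worker, &rounds) != 0)
        return -1;
    if (pthread_create(&t1, NULL, ordered_worker, &rounds) != 0) {
        pthread_join(t0, NULL);
        return -1;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    sem_destroy(&S);
    sem_destroy(&Q);
    return updates;
}
```

If one thread instead acquired Q before S, as T1 does on the deadlock slide, each thread could end up holding one semaphore while waiting forever for the other.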

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events An event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking multithreading (including real-time threads) and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

– Chapter 7 - Process Synchronization

– Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

ndash Protects the system from the users

ndash Protects the user (process) from themselves

ndash System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

ndash Does not affect scheduling

ndash Thread context includes info about execution mode (along with registers etc)

PerfMon counters

– "Privileged Time" and "User Time"

– 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

– Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

– interrupt dispatcher invokes the interrupt service routine

– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture executing thread

– Switch from user to kernel mode

– Interrupts: asynchronous

– Exceptions: synchronous

[Figure: trap handlers route each kind of trap to its dispatcher: interrupts to the interrupt dispatcher (which calls interrupt service routines), system service calls to the system service dispatcher (system services), hardware and software exceptions to the exception dispatcher (exception handlers), and virtual address exceptions to the virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the kernel-mode interrupt dispatch routine runs and calls the interrupt service routine]

Interrupt dispatch routine

– Disable interrupts

– Record machine state (trap frame) to allow resume

– Mask equal- and lower-IRQL interrupts

– Find and call appropriate ISR

– Dismiss interrupt

– Restore machine state (including mode and enabled interrupts)

Interrupt service routine

– Tell the device to stop interrupting

– Interrogate device state, start next operation on device, etc.

– Request a DPC

– Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– this masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

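Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL table, listed from highest to lowest]

    31  High
    30  Power fail
    29  Interprocessor interrupt
    28  Clock
        Profile & Synch (Srv 2003)
        Device levels, down to Device 1
    2   Dispatch/DPC
    1   APC
    0   Passive/Low

Device 1 and above: hardware interrupts; Dispatch/DPC and APC: deferrable software interrupts; Passive/Low: normal thread execution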

4141

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library, Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn on/off ISR

Interrupt objects can synchronize access to ISR data

– Multiple instances of ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected with an IRQL

4343

Predefined IRQLs

High: used when halting the system (via KeBugCheck())

Power fail: originated in the NT design document, but has never been used

Inter-processor interrupt

– used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock, allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

ndash Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all, normal thread execution

4545

IRQLs on 64-bit Systems

[Figure: IRQL tables for x64 and IA64, levels 0 to 15, listed here from lowest to highest]

x64: Passive/Low, APC, Dispatch/DPC, Device 1 … Device n, Synch (Srv 2003), Clock, Interprocessor Interrupt/Power, High/Profile

IA64: Passive/Low, APC, Dispatch/DPC & Synch (UP only), Correctable Machine Check, Device 1 … Device n, Synch (MP only), Clock, Interprocessor Interrupt, High/Profile/Power

4646

Interrupt Prioritization amp Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device; can be configured with IntFilter utility in Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
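The two deterministic mapping rules on this slide (IRQL = 27 - IRQ on x86 uniprocessor systems, IRQL = IDT vector number / 16 on x64 and IA64) are simple arithmetic. The sketch below only restates them as C functions; the function names are hypothetical and this is not HAL code.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 / IA64 rule from the slide: IRQL = IDT vector number / 16
 * (integer division, so 16 consecutive vectors share one IRQL). */
int irql_from_vector(int idt_vector)
{
    return idt_vector / 16;
}
```

Under the first rule a lower IRQ number maps to a higher IRQL; under the second, vector 0x41 falls in the group for IRQL 4.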

4747

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when kernel is deep within many layers of code

– Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– A spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

Processor A and Processor B each run the same sequence against the shared DPC queue:

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue    (critical section)
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
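The acquire/release sequence above can be sketched in portable C11 with an atomic test-and-set flag. This is a user-mode illustration only: the type and function names are invented for the sketch, and it does not show the IRQL handling that kernel spinlock routines perform.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A spinlock is one memory cell; a single bit of state is enough. */
typedef struct { atomic_flag held; } spinlock;

void spinlock_init(spinlock *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->held);            /* start out free */
}

/* One atomic test-and-set attempt; true means the caller now owns the lock. */
bool spinlock_try_acquire(spinlock *s)
{
    return !atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire);
}

/* "do acquire_spinlock until (SUCCESS)": retry until the flag was free. */
void spinlock_acquire(spinlock *s)
{
    while (!spinlock_try_acquire(s)) {
        /* busy-wait */
    }
}

void spinlock_release(spinlock *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}
```

Every failed attempt in the loop is another atomic operation on the shared cell, which is the bus contention that queued spinlocks are meant to reduce.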

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag
– Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified
– Exactly one processor is signaled
– Pre-determined wait order
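One well-known design that matches this description is the MCS queue lock, sketched below in C11. The source does not say that Windows implements its queued spinlocks exactly this way, so treat the structure and all names here as assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-processor queue node: each waiter spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

void qlock_init(qlock *l)
{
    assert(l != NULL);
    atomic_init(&l->tail, NULL);
}

void qlock_acquire(qlock *l, qnode *me)
{
    assert(l != NULL && me != NULL);
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind predecessor */
        while (atomic_load(&me->locked)) {
            /* spin on the local flag only */
        }
    }
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No known successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* A successor is joining; wait until it has linked itself. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}
```

Waiters are served in the order in which they swapped themselves into `tail`, which gives the pre-determined wait order mentioned on the slide.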

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction in the use of spinlocks and in the length of time they are held
– Scheduling database is now per-CPU, which allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once; "all" means all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include an optional timeout argument
– Waiting threads consume no CPU time

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and threads (signaled upon exit or terminate)
– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g., it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. A thread object is created and initialized (Initialized), then moves among the Ready, Standby, Running, Waiting, Transition, and Terminated states. A thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Dispatcher reschedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– Nonsignaled to signaled: owning thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
– Nonsignaled to signaled: owning thread or other thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Semaphore
– Nonsignaled to signaled: one thread releases the semaphore, freeing a resource
– Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (cont'd)

Event
– Nonsignaled to signaled: a thread sets the event
– Signaled to nonsignaled: kernel resumes one or more threads
– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
– Nonsignaled to signaled: dedicated thread sets one event in the event pair
– Signaled to nonsignaled: kernel resumes the other dedicated thread
– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
– Nonsignaled to signaled: timer expires
– Signaled to nonsignaled: a thread (re)initializes the timer
– Effect on waiting threads: kernel resumes all waiting threads

Thread
– Nonsignaled to signaled: thread terminates
– Signaled to nonsignaled: a thread reinitializes the thread object
– Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually the ISR) queues the request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. The queue head links to a chain of DPC objects.]

Delivering a DPC

1 Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2 DPC interrupt occurs when IRQL drops below dispatch/DPC level
3 After the DPC interrupt, control transfers to the thread dispatcher
4 Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; the DPC queue holding DPC objects; the dispatcher]
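The delivery steps above can be modeled in plain C as a per-processor FIFO that is drained when the simulated IRQL drops below dispatch level. This is a toy model, not kernel code: `cpu`, `queue_dpc`, `lower_irql`, and `count_dpc` are hypothetical names, and only the three level constants mirror the DDK names and values.

```c
#include <assert.h>
#include <stddef.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef void (*dpc_routine)(void *context);

typedef struct dpc {
    dpc_routine routine;
    void *context;
    struct dpc *next;
} dpc;

/* Simulated processor: current IRQL plus its own FIFO queue of DPCs. */
typedef struct {
    dpc *head, *tail;
    int irql;
} cpu;

/* Step 1: queue a DPC on this processor (appended at the tail). */
void queue_dpc(cpu *c, dpc *d, dpc_routine routine, void *context)
{
    assert(c != NULL && d != NULL && routine != NULL);
    d->routine = routine;
    d->context = context;
    d->next = NULL;
    if (c->tail != NULL) c->tail->next = d; else c->head = d;
    c->tail = d;
}

/* Steps 2-4: when IRQL would drop below DISPATCH_LEVEL, run every queued
 * DPC at DISPATCH_LEVEL first. Returns the number of routines executed. */
int lower_irql(cpu *c, int new_irql)
{
    int ran = 0;
    assert(c != NULL && new_irql <= c->irql);
    if (new_irql < DISPATCH_LEVEL) {
        c->irql = DISPATCH_LEVEL;
        while (c->head != NULL) {
            dpc *d = c->head;
            c->head = d->next;
            if (c->head == NULL) c->tail = NULL;
            d->routine(d->context);
            ran++;
        }
    }
    c->irql = new_irql;
    return ran;
}

/* Example DPC routine: counts how often it has been run. */
void count_dpc(void *context) { ++*(int *)context; }
```

Queuing two DPCs while the simulated processor is at a device IRQL and then lowering to `DISPATCH_LEVEL` runs nothing; lowering further to `PASSIVE_LEVEL` drains both in FIFO order.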

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless the thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when the thread is in an "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to the thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
– If IRQL > 2, charge to interrupt time
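The four accounting rules can be written as one small decision function. This is a sketch of the rule set as stated on the slide; the enum, the function name, and the `user_mode` parameter (used to pick between user and kernel time below IRQL 2) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, given the interrupted state. */
charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0 && irql <= 31);
    if (irql > 2)
        return CHARGE_INTERRUPT;                  /* servicing a HW interrupt */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is taken only when the clock fires, a thread that runs and blocks between two ticks is never charged, which is the quirk discussed on the later slides.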

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (they are not really processes)
– The context switch counts shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g., events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g., processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

[Figure: dispatcher object layout. Common header (Size, Type, State, Wait list head) followed by object-type-specific data.]

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each wait block holds Key, Type, Next link, a List entry, and Object and Thread pointers; each dispatcher object (header with Size, Type, State, Wait list head, plus object-type-specific data) links the wait blocks of the threads waiting on it.]
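How the Type and Key fields of a wait block decide whether a wait is resolved can be sketched with a simplified model. The structures below keep only the fields needed for that decision; they are not the real kernel layouts, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WT_ANY, WT_ALL } wait_type;

/* Only the State field of the common dispatcher header is modeled. */
typedef struct {
    bool signaled;
} dispatcher_object;

typedef struct wait_block {
    dispatcher_object *object;   /* what this block waits for */
    int key;                     /* position in the caller's handle array */
    wait_type type;              /* "any" or "all" for the whole wait call */
    struct wait_block *next;     /* next block of the same wait call */
} wait_block;

/* Returns the key of the first signaled object for an "any" wait,
 * 0 for a satisfied "all" wait, or -1 if the thread must keep waiting. */
int wait_resolved(const wait_block *list)
{
    assert(list != NULL);
    if (list->type == WT_ANY) {
        for (const wait_block *b = list; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const wait_block *b = list; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;           /* "all": every object must be signaled */
    return 0;
}
```

For a two-handle wait in which only the second object is signaled, the "any" form resolves with key 1 while the "all" form keeps waiting until the first object is signaled as well.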

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* the only Leave call, so Enter and Leave always match */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to an array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects; the function returns the index of the first signaled object

Side effects
– Mutexes, auto-reset events, and waitable timers will be reset to the nonsignaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in the same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
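The polling behaviour of pthread_mutex_trylock() can be illustrated with a short sketch. The helper names `try_lock` and `add_if_free` are made up for this example; the pthread calls and the EBUSY result are standard POSIX, assuming a default (non-recursive) mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Nonblocking attempt: true if the caller now owns the mutex. */
bool try_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    assert(rc == 0 || rc == EBUSY);   /* anything else is a usage error here */
    return rc == 0;
}

/* Add to a shared counter only if the mutex is free right now;
 * otherwise return immediately without blocking. */
bool add_if_free(pthread_mutex_t *m, int *counter, int value)
{
    if (!try_lock(m))
        return false;
    *counter += value;
    pthread_mutex_unlock(m);
    return true;
}
```

This plays the same role as calling WaitForSingleObject() with a timeout of 0 on a Windows mutex: the caller learns at once whether it obtained ownership.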

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value
– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously; it must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
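Although there is no exact equivalent, a manual-reset event can be approximated with a mutex, a condition variable, and a flag. The sketch below is one common way to do that; the `manual_event` type and its functions are hypothetical names, and the emulation has no counterpart to PulseEvent().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() */
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: release all waiters. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the state persistent: a bare condition variable forgets a signal that arrives while no thread is waiting, whereas a set event satisfies waits that start later.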

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
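For comparison, the POSIX pipe() call has the same one-way, byte-stream behaviour. The sketch below uses POSIX rather than the Windows CreatePipe() API, and the helper `pipe_round_trip` is a made-up name; it assumes the message is small enough to fit in the pipe buffer, since otherwise the write would block with no reader running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back from the other end.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    ssize_t n = -1;
    assert(msg != NULL && out != NULL && out_size > 0);
    if (pipe(fds) != 0)
        return -1;
    size_t len = strlen(msg);
    if (write(fds[1], msg, len) == (ssize_t)len) {
        n = read(fds[0], out, out_size - 1);   /* would block if pipe were empty */
        if (n >= 0)
            out[n] = '\0';
    }
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Data flows only from the write end to the read end; two-way communication between two processes needs two pipes.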

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (cont'd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (cont'd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &Response.Record,
            (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
}
/* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

App client 0 ... App client n (mailslot servers), each:
    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client):
    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for that many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieves a handle for a local mailslot
– \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
– \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life (意念改变生活)


Semaphore Implementation

Semaphores may suspend/resume threads
– Avoid busy waiting

Define a semaphore as a record

typedef struct {
    int value;
    struct thread *L;
} semaphore;

Assume two simple operations
– suspend() suspends the thread that invokes it
– resume(T) resumes the execution of a blocked thread T

Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }
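The wait/signal definitions above can be sketched with POSIX threads, with a mutex making each operation atomic and a condition variable standing in for the list S.L together with suspend()/resume(T). That substitution, the `wakeups` counter that guards against spurious wakeups, and all names below are assumptions of this sketch rather than part of the slide's definition.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;               /* when negative: number of suspended threads */
    int wakeups;             /* resume() tokens not yet consumed */
    pthread_mutex_t lock;    /* makes wait/signal atomic */
    pthread_cond_t  queue;   /* stands in for the thread list S.L */
} semaphore;

void sem_setup(semaphore *s, int initial)
{
    assert(s != NULL && initial >= 0);
    s->value = initial;
    s->wakeups = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->queue, NULL);
}

void sem_wait_op(semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    s->value--;
    if (s->value < 0) {
        do {
            pthread_cond_wait(&s->queue, &s->lock);   /* suspend() */
        } while (s->wakeups < 1);
        s->wakeups--;
    }
    pthread_mutex_unlock(&s->lock);
}

void sem_signal_op(semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    if (s->value <= 0) {
        s->wakeups++;
        pthread_cond_signal(&s->queue);               /* resume(T) */
    }
    pthread_mutex_unlock(&s->lock);
}

int sem_value(semaphore *s)
{
    pthread_mutex_lock(&s->lock);
    int v = s->value;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

A blocked thread sleeps inside pthread_cond_wait() instead of spinning, which is the point of this formulation: no busy waiting.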

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code

Ti                  Tj
A                   wait(flag)
signal(flag)        B

Two Types of Semaphores

Counting semaphore
– integer value can range over an unrestricted domain

Binary semaphore
– integer value can range only between 0 and 1
– can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
– two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation – indefinite blocking
– A thread may never be removed from the semaphore queue in which it is suspended

Solution
– all code should acquire/release semaphores in the same order
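The same-order rule can be enforced mechanically. The sketch below uses pthread mutexes in place of the semaphores S and Q and orders every pair by address, which is one common convention and an assumption of this example; `lock_pair` and `unlock_pair` are hypothetical helper names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two locks in one global order (ascending address), no matter
 * in which order the caller names them. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release order does not affect deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

If T0 calls lock_pair(&S, &Q) and T1 calls lock_pair(&Q, &S), both threads acquire the locks in the same order, so the circular wait from the deadlock example cannot form.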

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982.

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986.

Abraham Silberschatz and Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003.
– Chapter 7 - Process Synchronization
– Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
– Protects the system from the users
– Protects the user (process) from themselves
– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
– Does not affect scheduling
– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
– "Privileged Time" and "User Time"
– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1 Requests from user mode
– Via the system service dispatch mechanism
– Kernel-mode code runs in the context of the requesting thread

2 Dedicated kernel-mode system threads
– Some threads in the system stay in kernel mode at all times, mostly in the "System" process
– Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3 Interrupts from external devices
– Interrupt dispatcher invokes the interrupt service routine
– ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"
– ISR often requests the execution of a "DPC routine", which also runs in kernel mode
– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture executing thread
– Switch from user to kernel mode
– Interrupts – asynchronous
– Exceptions – synchronous

[Figure: trap handlers]
Interrupt: interrupt dispatcher, which calls interrupt service routines
System service call: system service dispatcher, which calls system services
HW exceptions, SW exceptions: exception dispatcher, which calls exception handlers
Virtual address exceptions: virtual memory manager's pager

Interrupt Dispatching

An interrupt arrives while user- or kernel-mode code is running; the routines below run in kernel mode

Interrupt dispatch routine
– Disable interrupts
– Record machine state (trap frame) to allow resume
– Mask equal- and lower-IRQL interrupts
– Find and call appropriate ISR
– Dismiss interrupt
– Restore machine state (including mode and enabled interrupts)

Interrupt service routine
– Tell the device to stop interrupting
– Interrogate device state, start next operation on device, etc.
– Request a DPC
– Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
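The masking rule can be modelled in a few lines of C. This is only a sketch of the rule as stated on the slide, not kernel code: `cpu_t`, `can_deliver`, `begin_service`, `end_service` and `may_wait_or_fault` are hypothetical names, and the level constants simply follow the x86 numbering used in this unit.

```c
#include <stdbool.h>

/* Hypothetical model: level numbers follow the x86 IRQL table of this unit. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

typedef struct {
    int irql;                       /* current IRQL of the simulated CPU */
} cpu_t;

/* An interrupt is delivered only if its IRQL is strictly above the CPU's;
 * equal and lower levels are masked. */
bool can_deliver(const cpu_t *cpu, int interrupt_irql) {
    return interrupt_irql > cpu->irql;
}

/* Servicing raises the CPU to the interrupt's IRQL. Returns the previous
 * level so it can be restored after the ISR, or -1 if the interrupt is masked. */
int begin_service(cpu_t *cpu, int interrupt_irql) {
    if (!can_deliver(cpu, interrupt_irql))
        return -1;
    int old = cpu->irql;
    cpu->irql = interrupt_irql;
    return old;
}

/* After the ISR, the IRQL is lowered to the initial level. */
void end_service(cpu_t *cpu, int old_irql) {
    cpu->irql = old_irql;
}

/* Waits and page faults are only allowed below DISPATCH_LEVEL. */
bool may_wait_or_fault(const cpu_t *cpu) {
    return cpu->irql < DISPATCH_LEVEL;
}
```

For example, a CPU that has entered a device ISR at level 5 rejects a second interrupt at level 5 or 3 but still accepts the clock at level 28.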

Interrupt Precedence via IRQLs (x86)

IRQL   Level                          Kind
31     High                           Hardware interrupts
30     Power fail
29     Interprocessor Interrupt
28     Clock
27     Profile & Synch (Srv 2003)
26-3   Device n ... Device 1
2      Dispatch/DPC                   Deferrable software interrupts
1      APC
0      Passive/Low                    Normal thread execution

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn ISR on/off

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and to allocate CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

IRQL   x64                               IA64
15     High/Profile                      High/Profile/Power
14     Interprocessor Interrupt/Power    Interprocessor Interrupt
13     Clock                             Clock
12     Synch (Srv 2003)                  Synch (MP only)
...    Device n                          Device n
4      ...                               Device 1
3      Device 1                          Correctable Machine Check
2      Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1      APC                               APC
0      Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device. This can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
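The two arithmetic rules above are easy to check by hand. The sketch below assumes the x64/IA64 rule reads "IDT vector number divided by 16" (integer division); the function names are hypothetical and exist only for illustration.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* x64 / IA64 rule, read here as IRQL = IDT vector number / 16
 * (integer division, i.e. the upper four bits of an 8-bit vector). */
int irql_from_vector_64bit(int vector) {
    return vector / 16;
}
```

Under these rules IRQ 0 maps to IRQL 27 and IRQ 8 to IRQL 19 on an x86 UP system, while vector 0xD1 lands at IRQL 13 on x64.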


Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts


Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
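A user-mode analogue of such a one-bit lock can be written with C11 atomics. This is a minimal sketch, not the kernel's KSPIN_LOCK implementation; `demo_spinlock` and the `spin_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode spinlock: one atomic flag is the whole lock. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

void spin_init(demo_spinlock *l) {
    atomic_flag_clear(&l->held);
}

/* Atomic test-and-set: returns true if the lock was free and is now ours. */
bool spin_try_acquire(demo_spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the test-and-set succeeds. */
void spin_acquire(demo_spinlock *l) {
    while (!spin_try_acquire(l)) {
        /* spin */
    }
}

/* Clearing the flag hands the lock to whoever wins the next test-and-set. */
void spin_release(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every failed `spin_try_acquire` still performs an atomic write to the shared flag, which is the contention that queued spinlocks are meant to reduce.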


Kernel Synchronization

[Figure: Processor A and Processor B both want to remove a DPC from the DPC queue; a spinlock guards the critical section.]

Each processor runs:

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin
    remove DPC from queue
end

release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first waiting processor is modified
- Exactly one processor is being signaled
- Pre-determined wait order
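The idea of spinning on a per-waiter flag can be sketched in the style of the textbook MCS queue lock. This is an illustration of the technique under that assumption, not the Windows implementation; `qnode`, `qlock` and the `qlock_*` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiter; each waiter spins only on its own 'locked' flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             locked;
} qnode;

typedef struct {
    _Atomic(qnode *) tail;          /* last waiter in the queue, or NULL if free */
} qlock;

void qlock_init(qlock *l) {
    atomic_store(&l->tail, (qnode *)NULL);
}

void qlock_acquire(qlock *l, qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind predecessor */
        while (atomic_load(&me->locked)) {
            /* spin on our own flag only */
        }
    }
}

void qlock_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;
        /* A successor is in the middle of linking itself; wait for it. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}
```

Each caller passes its own `qnode` (for example a stack variable), so the queue order is fixed at the moment of the exchange and the release wakes exactly the next waiter.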


SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signalled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signalled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labelled transitions: create and initialize thread object; thread waits on an object handle; set object to signaled state, wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Dispatcher re-schedules new thread: it may preempt the running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that moves it from nonsignaled to signaled, the event that moves it back, and the effect of the signaled state on waiting threads.

Mutex (kernel mode)
- Signaled when: owning thread releases mutex
- Nonsignaled when: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- Signaled when: owning thread or other thread releases mutex
- Nonsignaled when: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- Signaled when: one thread releases the semaphore, freeing a resource
- Nonsignaled when: a thread acquires the semaphore and more resources are not available
- Effect: kernel resumes one or more waiting threads

Event
- Signaled when: a thread sets the event
- Nonsignaled when: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- Signaled when: dedicated thread sets one event in the event pair
- Nonsignaled when: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- Signaled when: timer expires
- Nonsignaled when: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- Signaled when: thread terminates
- Nonsignaled when: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: the DPC queue, a queue head followed by a linked list of DPC objects.]

Delivering a DPC

[Figure: the interrupt dispatch table ordered by IRQL (High, Power failure, ..., Dispatch/DPC, APC, Low), the DPC queue, and the dispatcher.]

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped
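The four delivery steps can be imitated with a small single-CPU simulation. This is a hedged model only: `sim_cpu`, `queue_dpc`, `lower_irql` and `count_dpc` are hypothetical names, the queue is a fixed-size ring buffer chosen for brevity, and real DPC objects carry more state than a function pointer and a context.

```c
#include <stddef.h>

#define MAX_DPCS  8
#define DPC_LEVEL 2                 /* dispatch/DPC IRQL in the x86 numbering */

typedef void (*dpc_routine)(void *context);

typedef struct {
    dpc_routine routine[MAX_DPCS];
    void       *context[MAX_DPCS];
    size_t      head, count;        /* FIFO ring buffer of pending DPCs */
    int         irql;               /* current IRQL of the simulated CPU */
} sim_cpu;

/* Step 1: queue a DPC. Returns 0 on success, -1 if the queue is full. */
int queue_dpc(sim_cpu *cpu, dpc_routine fn, void *ctx) {
    if (cpu->count == MAX_DPCS)
        return -1;
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->routine[tail] = fn;
    cpu->context[tail] = ctx;
    cpu->count++;
    return 0;
}

/* Steps 2-4: when the IRQL is about to drop below DPC level, run every
 * queued routine at DPC level first, then settle at the new level. */
void lower_irql(sim_cpu *cpu, int new_irql) {
    if (new_irql < DPC_LEVEL && cpu->count > 0) {
        cpu->irql = DPC_LEVEL;
        while (cpu->count > 0) {
            dpc_routine fn  = cpu->routine[cpu->head];
            void       *ctx = cpu->context[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            fn(ctx);
        }
    }
    cpu->irql = new_irql;
}

/* Example DPC routine: counts how many times it ran. */
void count_dpc(void *context) {
    ++*(int *)context;
}
```

Queuing two DPCs while the simulated CPU is at a device IRQL and then lowering to IRQL 3 runs nothing; lowering to IRQL 0 drains both.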


Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each holding APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
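The four accounting rules form a small decision function. The sketch below encodes them exactly as listed; the type `charge_t`, the function `clock_tick_charge` and the `user_mode` flag (used to pick between the thread's user and kernel time below IRQL 2) are hypothetical names for illustration.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide which bucket one clock tick is charged to, given the IRQL the
 * processor was at when the clock interrupt arrived. */
charge_t clock_tick_charge(int irql, bool in_dpc, bool user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: ordinary thread execution */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

A tick that arrives during a device ISR at IRQL 5 is therefore booked as interrupt time, never against the interrupted thread's user or kernel time.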


IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout. Header fields Size, Type, State and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h).]

All dispatcher objects are in one of two states
- "Signaled" vs. "nonsignaled"
- When signalled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point through WaitBlockList to chains of wait blocks. Each wait block holds List entry, Object, Thread, Key, Type and Next link. Each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks that wait on it.]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
- dwTimeout == 0: no wait, check whether object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
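The return value of WaitForMultipleObjects() encodes both the outcome and the array index. The portable sketch below only decodes such a value; the `DEMO_` constants mirror the Windows SDK values of WAIT_OBJECT_0, WAIT_ABANDONED_0, WAIT_TIMEOUT and WAIT_FAILED and are redefined here so the code compiles without the Windows headers, while `wait_kind` and `classify_wait` are hypothetical names.

```c
#include <stdint.h>

/* Mirror the Windows SDK values; redefined so this sketch builds anywhere. */
#define DEMO_WAIT_OBJECT_0    0x00000000u
#define DEMO_WAIT_ABANDONED_0 0x00000080u
#define DEMO_WAIT_TIMEOUT     0x00000102u
#define DEMO_WAIT_FAILED      0xFFFFFFFFu

typedef enum { WK_SIGNALED, WK_ABANDONED, WK_TIMED_OUT, WK_FAILED } wait_kind;

/* Classify a wait return value for a call that passed 'count' handles
 * (count <= 64). On WK_SIGNALED or WK_ABANDONED, *index receives the
 * position of the handle in the array. */
wait_kind classify_wait(uint32_t ret, uint32_t count, uint32_t *index) {
    if (ret < DEMO_WAIT_OBJECT_0 + count) {
        *index = ret - DEMO_WAIT_OBJECT_0;
        return WK_SIGNALED;
    }
    if (ret >= DEMO_WAIT_ABANDONED_0 && ret < DEMO_WAIT_ABANDONED_0 + count) {
        *index = ret - DEMO_WAIT_ABANDONED_0;
        return WK_ABANDONED;        /* a mutex owner exited without releasing */
    }
    if (ret == DEMO_WAIT_TIMEOUT)
        return WK_TIMED_OUT;
    return WK_FAILED;
}
```

In a real Windows program the value passed in would be the DWORD returned by the wait call, and the index is only meaningful when bWaitAll is FALSE.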


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object


Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
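The blocking and polling variants can be put side by side in a short pthreads sketch. The shared `counter`, the lock `counter_lock` and the two helper functions are hypothetical names chosen to echo the mutex example of this unit.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking update: waits as long as needed, like an INFINITE timeout. */
void add_blocking(int value) {
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling update: like a wait with timeout 0. Returns false without
 * touching the counter if the mutex is currently held (EBUSY) or the
 * call fails for another reason. */
bool add_if_free(int value) {
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return false;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return true;
}
```

A caller of `add_if_free` must be prepared to retry or do other work when it returns false, which is the polling style the comparison refers to.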


Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
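Because a condition variable has no memory of past signals, a manual-reset event has to be emulated with an explicit flag. The following is one common way to do that, offered as a sketch rather than a standard API: `manual_event` and the `event_*` functions are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* A flag guarded by a mutex, plus a condition variable to sleep on. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Set: the flag stays true until event_reset(), so all current and
 * future waiters pass. */
void event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Wait: blocks until the flag is true; does not clear it. */
void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool event_is_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void event_destroy(manual_event *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

An auto-reset variant would clear the flag inside `event_wait` and use pthread_cond_signal() instead of a broadcast.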


Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on remote machine (. means local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use same flag settings for all instances of a named pipe

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );
- Combines a WriteFile, ReadFile sequence

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
- Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server.\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm.)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the application server acts as mailslot client and sends its status periodically; application clients 0 to n act as mailslot servers and read it.]

Mailslot servers (app client 0 ... app client n):

hMS = CreateMailslot( "\\.\mailslot\status" ... );
ReadFile( hMS, &ServStat ... );
/* connect to server */

Mailslot client (app server); message is sent periodically:

while (...) {
    Sleep( ... );
    hMS = CreateFile( "\\.\mailslot\status" ... );
    WriteFile( hMS, &StatInfo ... );
}

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Implementation

Semaphore operations now defined as

wait(S):
    S.value--;
    if (S.value < 0) {
        add this thread to S.L;
        suspend();
    }

signal(S):
    S.value++;
    if (S.value <= 0) {
        remove a thread T from S.L;
        resume(T);
    }

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

Ti                Tj
...               ...
A                 wait(flag)
signal(flag)      B
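The same ordering can be tried out with pthreads. This sketch builds a small counting semaphore from a mutex and a condition variable (using the common non-negative counter rather than the negative-value bookkeeping shown earlier) and then runs Ti and Tj; `csem`, `csem_*`, `thread_i`, `thread_j` and `run_ordering_demo` are hypothetical names.

```c
#include <pthread.h>
#include <string.h>

/* Minimal counting semaphore: value never drops below zero. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    int             value;
} csem;

void csem_init(csem *s, int initial) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = initial;
}

void csem_wait(csem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

void csem_signal(csem *s) {
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

static csem flag;          /* the semaphore "flag", initialized to 0 */
static char trace[3];      /* records the order in which A and B ran */
static int  pos;

/* Ti: perform A, then signal(flag). */
static void *thread_i(void *arg) {
    (void)arg;
    trace[pos++] = 'A';
    csem_signal(&flag);
    return NULL;
}

/* Tj: wait(flag), then perform B. */
static void *thread_j(void *arg) {
    (void)arg;
    csem_wait(&flag);
    trace[pos++] = 'B';
    return NULL;
}

/* Starts Tj first on purpose; the semaphore still forces A before B. */
const char *run_ordering_demo(void) {
    pthread_t ti, tj;
    pos = 0;
    memset(trace, 0, sizeof trace);
    csem_init(&flag, 0);
    pthread_create(&tj, NULL, thread_j, NULL);
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    return trace;
}
```

The writes to `trace` need no extra lock here because Tj only touches it after acquiring the semaphore's mutex, which Ti released after its own write.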


Two Types of Semaphores

Counting semaphore

- Integer value can range over an unrestricted domain

Binary semaphore
- Integer value can range only between 0 and 1
- Can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock
- Two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
...             ...
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- All code should acquire/release semaphores in the same order
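One way to enforce a single acquisition order is to sort the locks before taking them. The sketch below uses pthread mutexes as stand-ins for the binary semaphores S and Q and orders them by address; that choice of order, and the names `lock_pair` and `unlock_pair`, are assumptions made for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in one fixed global order (here: by address),
 * no matter in which order the caller names them. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    if (b != a)
        pthread_mutex_lock(b);
}

/* Release both; the order of release does not matter for deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    if (b != a)
        pthread_mutex_unlock(b);
}
```

With this helper, T0 calling `lock_pair(&S, &Q)` and T1 calling `lock_pair(&Q, &S)` both take the lower-addressed lock first, so the circular wait of the example above cannot form.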


Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking multithreading (including real-time threads) and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc. like any other threads

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine (ISR)

- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

Trap dispatching

[Figure: trap sources and their handlers]

    Interrupt                    -> Interrupt dispatcher       -> Interrupt service routines
    System service call          -> System service dispatcher  -> System services
    HW exceptions, SW exceptions -> Exception dispatcher       -> Exception handlers
    Virtual address exceptions   -> Virtual memory manager's pager

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts - asynchronous

- Exceptions - synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; control passes to the interrupt dispatch routine, which calls the interrupt service routine in kernel mode]

Interrupt dispatch routine

1. Disable interrupts

2. Record machine state (trap frame) to allow resume

3. Mask equal- and lower-IRQL interrupts

4. Find and call appropriate ISR

5. Dismiss interrupt

6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine

1. Tell the device to stop interrupting

2. Interrogate device state, start next operation on device, etc.

3. Request a DPC

4. Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest first]

    31   High
    30   Power fail
    29   Interprocessor Interrupt
    28   Clock
         Profile & Synch (Srv 2003)
         Device n ... Device 1
    2    Dispatch/DPC
    1    APC
    0    Passive/Low

Device 1 through High: hardware interrupts

APC and Dispatch/DPC: deferrable software interrupts

Passive/Low: normal thread execution

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High

- Used when halting the system (via KeBugCheck())

Power fail

- Originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update the system's clock and the allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest first]

    IRQL   x64                              IA64
    15     High/Profile                     High/Profile/Power
    14     Interprocessor Interrupt/Power   Interprocessor Interrupt
    13     Clock                            Clock
    12     Synch (Srv 2003)                 Synch (MP only)
    ...    Device n                         Device n
    4      ...                              Device 1
    3      Device 1                         Correctable Machine Check
    2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
    1      APC                              APC
    0      Passive/Low                      Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows:

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
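The two closed-form IRQL rules on this slide are simple arithmetic and can be written down directly. The sketch below reads the x64/IA64 rule as integer division of the IDT vector number by 16, which is an assumption because the operator is missing in the transcript; the function names and the IRQ range check are also illustrative only.

```c
#include <assert.h>

/* x86 uniprocessor rule quoted on the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq) {
    assert(irq >= 0 && irq <= 15);   /* assumption: legacy IRQ lines 0..15 */
    return 27 - irq;
}

/* x64/IA64 rule, read as integer division (assumption): IRQL = vector / 16. */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

Under this reading, sixteen consecutive IDT vectors share one IRQL, which matches the 0..15 range of IRQLs on 64-bit systems.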

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts (figure)

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient
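The "data cell plus atomic test-and-modify" idea can be illustrated in user mode with C11 atomics. This is not the kernel's KSPIN_LOCK implementation: demo_spinlock and the helper names are hypothetical, threads stand in for processors, and the sched_yield() call is only there so the demonstration behaves well when there are fewer cores than threads.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* One flag is enough: set = owned, clear = free. */
typedef struct { atomic_flag held; } demo_spinlock;

void spin_acquire(demo_spinlock *l) {
    /* Atomic test-and-set; loop until the previous value was "clear". */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();   /* a kernel spinlock would busy-wait instead */
}

void spin_release(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock queue_lock = { ATOMIC_FLAG_INIT };
static long queue_len;   /* shared data protected by queue_lock */

static void *cpu(void *arg) {
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        spin_acquire(&queue_lock);
        queue_len++;                 /* critical section */
        spin_release(&queue_lock);
    }
    return NULL;
}

long run_two_cpus(void) {
    pthread_t a, b;
    queue_len = 0;
    pthread_create(&a, NULL, cpu, NULL);
    pthread_create(&b, NULL, cpu, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return queue_len;
}
```

Without the lock the two increments could interleave and lose updates; with it, every increment is counted.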

Kernel Synchronization

[Figure: Processor A and Processor B both execute the following code against the shared DPC queue; the spinlock guards the critical section]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag

- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the 1st processor's flag is modified

- Exactly one processor is being signaled

- Pre-determined wait order
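The queued-spinlock idea (each waiter spins on its own flag and the releaser hands the lock to exactly one successor) is the same idea as the classic MCS lock. The sketch below is an MCS-style lock in C11, offered as an analogy and not as the Windows implementation; all names are hypothetical, threads stand in for processors, and sched_yield() is added only to keep the demonstration fast on machines with few cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool locked;                /* the waiter's own local flag */
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store(&me->next, (mcs_node *)NULL);
    atomic_store(&me->locked, true);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))              /* spin on local flag only */
            sched_yield();
    }
}

void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (mcs_node *)NULL))
            return;                                   /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                            /* a waiter is mid-enqueue */
    }
    atomic_store(&succ->locked, false);               /* signal exactly one waiter */
}

static mcs_lock qlock;      /* zero-initialized: empty queue */
static long dpc_count;      /* shared data protected by qlock */

static void *cpu(void *arg) {
    (void)arg;
    mcs_node me;            /* per-"processor" queue node */
    for (int i = 0; i < 20000; i++) {
        mcs_acquire(&qlock, &me);
        dpc_count++;
        mcs_release(&qlock, &me);
    }
    return NULL;
}

long run_cpus(void) {
    pthread_t t[3];
    dpc_count = 0;
    for (int i = 0; i < 3; i++) pthread_create(&t[i], NULL, cpu, NULL);
    for (int i = 0; i < 3; i++) pthread_join(t[i], NULL);
    return dpc_count;
}
```

The queue order fixes who gets the lock next, which corresponds to the "pre-determined wait order" bullet, and only the successor's flag is written on release.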

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003:

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003:

- Minimized lock contention for hot locks (PFN or Page Frame Database lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time

Waiting

Waitable objects include:

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: thread state diagram showing the interaction with thread scheduling. States: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state"]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt the running thread if it has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled -> signaled: owning thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled -> signaled: owning thread or other thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource

- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

What signals an object (contd.)

Event

- nonsignaled -> signaled: a thread sets the event

- signaled -> nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- nonsignaled -> signaled: dedicated thread sets one event in the event pair

- signaled -> nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- nonsignaled -> signaled: timer expires

- signaled -> nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- nonsignaled -> signaled: thread terminates

- signaled -> nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed

- Maximum times recommended: ISR: 10 usec, DPC: 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low; a DPC queue holding DPC objects; and the thread dispatcher]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make a thread suspend/terminate itself (env. subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, one for kernel-mode (K) and one for user-mode (U) APCs]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting:

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
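The four accounting rules above form a small decision function. The sketch below encodes them literally; the enum, the function name and the user_mode parameter (used to pick between user and kernel time when IRQL < 2) are hypothetical and are not the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, following the slide's rules. */
charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0);
    if (irql < 2)                       /* passive or APC level */
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)                      /* dispatch/DPC level */
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;            /* device IRQLs and above */
}
```

For example, a tick that arrives at IRQL 5 is charged to interrupt time no matter which thread was interrupted.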

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (not really processes)

- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g., one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object: common header with Size, Type, State, and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it. Each wait block contains: List entry, Object, Thread, Key, Type, Next link]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the body raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

- dwTimeout == 0: no wait, check whether object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to array identifying these objects

- bWaitAll: whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions:

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison:

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
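The difference between the blocking and the polling call can be seen in a few lines of pthreads code. The helper name try_enter is hypothetical; the sketch assumes a default (non-recursive) pthread mutex, which differs from a Windows mutex in that the owning thread cannot re-acquire it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Nonblocking attempt, roughly analogous to WaitForSingleObject(hMutex, 0).
   Returns 1 if the mutex was acquired, 0 if it is currently locked. */
int try_enter(pthread_mutex_t *mx) {
    int rc = pthread_mutex_trylock(mx);
    assert(rc == 0 || rc == EBUSY);
    return rc == 0;
}

/* Blocking acquire/release, roughly analogous to
   WaitForSingleObject(hMutex, INFINITE) and ReleaseMutex(hMutex). */
void enter(pthread_mutex_t *mx) { pthread_mutex_lock(mx); }
void leave(pthread_mutex_t *mx) { pthread_mutex_unlock(mx); }
```

A caller that gets 0 back from try_enter() can do other work and poll again later instead of being suspended.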

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
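The resource-counting behaviour described above can be tried out with POSIX semaphores, used here as a portable stand-in for the Windows semaphore object. The helpers take_nowait and give are hypothetical; give() only imitates the "previous count" result, and reading the count before posting is not atomic with the release, unlike ReleaseSemaphore().

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Take one resource without blocking (like a wait with timeout 0).
   Returns 1 on success, 0 if the count is already 0 (nonsignaled). */
int take_nowait(sem_t *s) {
    if (sem_trywait(s) == 0)
        return 1;
    assert(errno == EAGAIN);
    return 0;
}

/* Release n resources; returns the count observed before the release. */
int give(sem_t *s, int n) {
    int before = 0;
    sem_getvalue(s, &before);
    for (int i = 0; i < n; i++)
        sem_post(s);
    return before;
}
```

With an initial count of 2, two take_nowait() calls succeed and the third fails until some thread calls give().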

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE: create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
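Although pthreads has no manual-reset event, the behaviour can be approximated by pairing a condition variable with a mutex-protected flag. The type and function names below (mr_event and friends) are hypothetical, and this is only one possible emulation, not an exact equivalent of the Windows object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;          /* stays true until explicitly reset */
} mr_event;

void mr_event_init(mr_event *e, bool initial) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial;
}

/* Like SetEvent on a manual-reset event: release all current waiters. */
void mr_event_set(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
void mr_event_reset(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE); does not change the state. */
void mr_event_wait(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool mr_event_is_set(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

The flag is what makes the event "sticky": a thread that calls mr_event_wait() after mr_event_set() returns immediately, which a bare condition variable would not provide.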

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe connecting prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
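The same one-way, byte-stream behaviour can be observed with the POSIX pipe() call, shown here as a portable analogue of CreatePipe() and not as Windows code. The helper name pipe_roundtrip is hypothetical, and the sketch assumes the message is small enough to fit in the pipe buffer so that the write does not block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes msg into the write end of a fresh pipe, reads it back from the
   read end into out, and returns the number of bytes read (or -1). */
long pipe_roundtrip(const char *msg, char *out, size_t out_size) {
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    assert(len < out_size);
    if (pipe(fds) != 0)
        return -1;
    ssize_t w = write(fds[1], msg, len);
    close(fds[1]);                   /* reader sees end-of-file after the data */
    ssize_t r = (w == (ssize_t)len) ? read(fds[0], out, out_size - 1) : -1;
    close(fds[0]);
    if (r < 0)
        return -1;
    out[r] = '\0';
    return (long)r;
}
```

Data flows only from the write end to the read end; a two-way conversation needs a second pipe.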

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe

- Several clients can communicate with a single server using the same named pipe

- Server can respond to client using the same instance

Pipe can be accessed over the network

- Location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode:

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode:

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances:

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut:

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing handle to last instance deletes the named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name:

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions:

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile(), ReadFile() sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle()

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL:

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe():

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes:

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism:

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot:

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0..n act as mailslot servers. Each one runs hMS = CreateMailslot("\\.\mailslot\status"); ReadFile(hMS, &ServStat); and then connects to the server. The application server acts as the mailslot client and loops: while (...) { Sleep(...); hMS = CreateFile("\\*\mailslot\status"); WriteFile(hMS, &StatInfo, ...); } The message is sent periodically.]

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


2525

Semaphore as a General Synchronization Tool

Execute B in Tj only after A executed in Ti

Use semaphore flag initialized to 0

Code:

    Ti                Tj
    A                 wait(flag)
    signal(flag)      B
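The ordering pattern above can be sketched with POSIX semaphores, where sem_post stands in for signal(flag) and sem_wait for wait(flag). This is only an illustrative sketch: the names thread_i, thread_j, trace, and run_a_before_b are made up here and are not part of the lecture.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t flag;        /* semaphore "flag", initialized to 0 below */
static int   trace[2];    /* records the order in which A and B ran */
static int   trace_len;

static void *thread_i(void *arg) {          /* Ti */
    (void)arg;
    trace[trace_len++] = 'A';               /* statement A */
    sem_post(&flag);                        /* signal(flag) */
    return NULL;
}

static void *thread_j(void *arg) {          /* Tj */
    (void)arg;
    sem_wait(&flag);                        /* wait(flag): blocks until Ti signals */
    trace[trace_len++] = 'B';               /* statement B */
    return NULL;
}

/* Runs Ti and Tj once; returns 1 if A was recorded before B. */
int run_a_before_b(void) {
    pthread_t ti, tj;
    int rc = sem_init(&flag, 0, 0);
    assert(rc == 0);
    (void)rc;
    trace_len = 0;
    pthread_create(&tj, NULL, thread_j, NULL);   /* start the waiter first */
    pthread_create(&ti, NULL, thread_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    sem_destroy(&flag);
    return trace_len == 2 && trace[0] == 'A' && trace[1] == 'B';
}
```

Even though Tj is started first, it cannot reach B until Ti has posted the semaphore, so the recorded order is always A then B.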

2626

Two Types of Semaphores

Counting semaphore
- Integer value can range over an unrestricted domain

Binary semaphore
- Integer value can range only between 0 and 1
- Can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

2727

Deadlock and Starvation

Deadlock
- Two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

    T0                T1
    wait(S)           wait(Q)
    wait(Q)           wait(S)
    ...               ...
    signal(S)         signal(Q)
    signal(Q)         signal(S)

2828

Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- All code should acquire/release semaphores in the same order (this rules out the circular wait behind a deadlock)
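The same-order rule can be sketched with two POSIX mutexes standing in for the binary semaphores S and Q. This is an illustration under assumptions, not the lecture's code: worker, shared_total, and run_ordered_locking are hypothetical names. Both threads take S before Q, so neither can hold Q while waiting for S.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t S = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t Q = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;                 /* protected by S and Q together */

/* Every thread acquires S first, then Q: one global order. */
static void *worker(void *arg) {
    long rounds = *(long *)arg;
    for (long i = 0; i < rounds; i++) {
        pthread_mutex_lock(&S);
        pthread_mutex_lock(&Q);
        shared_total++;
        pthread_mutex_unlock(&Q);         /* release in reverse order */
        pthread_mutex_unlock(&S);
    }
    return NULL;
}

/* Runs two workers and returns the final total (2 * rounds). */
long run_ordered_locking(long rounds) {
    pthread_t t0, t1;
    assert(rounds >= 0);
    shared_total = 0;
    pthread_create(&t0, NULL, worker, &rounds);
    pthread_create(&t1, NULL, worker, &rounds);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return shared_total;
}
```

If one worker instead took Q first and S second, as T1 does on the deadlock slide, the two threads could each hold one lock and wait forever for the other.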

2929

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

3030

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

3131

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3232

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt Dispatching

IRQL Levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times, mostly in the "System" process
- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread, the so-called "arbitrary thread context"
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

3737

Trap dispatching

[Figure: each kind of trap is routed to its handler. Interrupts go to the interrupt dispatcher, which calls interrupt service routines; system service calls go to the system service dispatcher, which calls system services; hardware and software exceptions go to the exception dispatcher, which calls exception handlers; virtual address exceptions go to the virtual memory manager's pager.]

Trap: processor's mechanism to capture executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the interrupt service routine then run in kernel mode.]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

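Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQLs, highest precedence first]

    31   High
    30   Power fail
    29   Interprocessor Interrupt
    28   Clock
    27   Profile & Synch (Srv 2003)
    ...  Device levels, down to
    3    Device 1
    2    Dispatch/DPC
    1    APC
    0    Passive/Low

Device 1 and above are hardware interrupts; APC and Dispatch/DPC are deferrable software interrupts; Passive/Low is normal thread execution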

4141

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected to the same IRQL

4343

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

4545

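IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest precedence first; device levels and the IA64 Correctable Machine Check level lie between 2 and 12]

x64
    15   High/Profile
    14   Interprocessor Interrupt/Power
    13   Clock
    12   Synch (Srv 2003)
         Device n ... Device 1
    2    Dispatch/DPC
    1    APC
    0    Passive/Low

IA64
    15   High/Profile/Power
    14   Interprocessor Interrupt
    13   Clock
    12   Synch (MP only)
         Device n ... Device 1
         Correctable Machine Check
    2    Dispatch/DPC & Synch (UP only)
    1    APC
    0    Passive/Low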

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
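The two arithmetic rules can be written out as small functions. This is a sketch, assuming the x64/IA64 rule reads "IDT vector number divided by 16" with integer division; the function names irql_from_irq_x86_up and irql_from_vector are made up for the illustration and are not Windows kernel routines.

```c
#include <assert.h>

/* x86 uniprocessor rule quoted on the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq) {
    assert(irq >= 0);
    return 27 - irq;
}

/* x64/IA64 rule, read here as IRQL = IDT vector number / 16. */
int irql_from_vector(int vector) {
    assert(vector >= 0 && vector <= 255);   /* an IDT has 256 entries */
    return vector / 16;                     /* integer division */
}
```

For example, IRQ 8 maps to IRQL 19 under the first rule, and vector 0xE1 (225) maps to IRQL 14 under the second, so the 256 vectors fall into 16 IRQL bands.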

4747

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
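The idea of a spinlock as a single memory cell updated by an atomic test-and-modify operation can be sketched in user space with C11 atomics. This is not the Windows kernel implementation: demo_spinlock, demo_spin_acquire, demo_spin_release, and run_spinlock_demo are hypothetical names, and threads stand in for processors.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct { atomic_flag held; } demo_spinlock;   /* one bit of state */

static void demo_spin_acquire(demo_spinlock *l) {
    /* test-and-set returns the previous value: spin while it was already set */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void demo_spin_release(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock queue_lock = { ATOMIC_FLAG_INIT };
static long          shared_counter;       /* protected by queue_lock */

static void *spin_worker(void *arg) {
    long rounds = *(long *)arg;
    for (long i = 0; i < rounds; i++) {
        demo_spin_acquire(&queue_lock);
        shared_counter++;                   /* critical section */
        demo_spin_release(&queue_lock);
    }
    return NULL;
}

/* Four threads each add `rounds` to the counter under the spinlock. */
long run_spinlock_demo(long rounds) {
    enum { NTHREADS = 4 };
    pthread_t t[NTHREADS];
    assert(rounds >= 0);
    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, spin_worker, &rounds);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

Every waiter repeatedly hits the same memory cell, which is the bus contention that queued spinlocks are designed to avoid.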

5050

Kernel Synchronization

[Figure: Processor A and Processor B both need the DPC queue, which is guarded by a spinlock. Each processor runs the same sequence, and the queue update is the critical section:]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
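One well-known way to build a lock with these properties is the MCS queue lock, sketched below with C11 atomics. It is offered as an illustration of the queueing idea, not as the Windows implementation; qnode, qlock, q_acquire, q_release, and run_queued_lock_demo are hypothetical names. Each waiter spins only on the flag in its own node. The sched_yield() calls are a concession to running as user-space threads and would not appear in a kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             waiting;        /* the waiter's local flag */
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

static void q_acquire(qlock *l, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->waiting, true);
    qnode *prev = atomic_exchange(&l->tail, me);    /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);              /* link behind predecessor */
        while (atomic_load(&me->waiting))           /* spin on our own flag only */
            sched_yield();
    }
}

static void q_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                 /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                          /* successor is mid-enqueue */
    }
    atomic_store(&succ->waiting, false);            /* signal exactly one waiter */
}

static qlock demo_lock;                             /* tail starts as NULL */
static long  demo_counter;

static void *q_worker(void *arg) {
    long rounds = *(long *)arg;
    qnode me;                                       /* per-thread queue node */
    for (long i = 0; i < rounds; i++) {
        q_acquire(&demo_lock, &me);
        demo_counter++;
        q_release(&demo_lock, &me);
    }
    return NULL;
}

long run_queued_lock_demo(long rounds) {
    enum { NTHREADS = 3 };
    pthread_t t[NTHREADS];
    assert(rounds >= 0);
    demo_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, q_worker, &rounds);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Because the release writes only the successor's flag, exactly one waiter is woken, and waiters get the lock in the order in which they joined the queue.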

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU; allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks: PFN or Page Frame Database lock
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks)
  - Doesn't use spinlocks when there is no contention
  - Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once
  - "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g., it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. A thread object is created and initialized; a thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete.]

5858

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (contd)

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: a DPC queue, consisting of a queue head followed by a chain of DPC objects]

6666

Delivering a DPC

[Figure: the interrupt dispatch table, with levels from High and Power failure down through Dispatch/DPC, APC, and Low, shown next to the DPC queue and the thread dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads; kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each holding a chain of APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
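The four accounting rules can be written as a single decision function. This is a sketch of the rules as listed, not the actual clock ISR; the enum, the function name classify_tick, and the user_mode parameter (used to choose between user and kernel time when IRQL < 2) are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, given the interrupted state. */
charge_target classify_tick(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;                /* irql > 2 */
}
```

For instance, a tick that lands while a DPC routine runs at IRQL 2 goes to DPC time, while a tick at a device IRQL goes to interrupt time and to no thread at all.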

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus, threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: a header with Size, Type, State, and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g., events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g., processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "Signaled" vs. "nonsignaled"
- When signaled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (each with Size, Type, State, Wait list head, and object-type-specific data) and thread objects (each with a WaitBlockList) are linked through wait blocks; each wait block holds List entry, Object, Thread, Key, Type, and Next link fields]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection ( &crit );

/* ... main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection ( &crit );
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection ( &crit );
    }
}
DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  - dwTimeout == 0: no wait, check whether object is signaled
  - dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for first signaled object or all objects
  - When waiting for any object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
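The difference between the blocking and the polling call can be sketched as follows. The helper names blocking_add, try_add, and demo_mutex_calls are invented for this example; only the pthread_mutex_* calls come from POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Blocking update: comparable to waiting on a mutex with INFINITE. */
void blocking_add(pthread_mutex_t *m, int *counter, int delta) {
    pthread_mutex_lock(m);
    *counter += delta;
    pthread_mutex_unlock(m);
}

/* Polling update: comparable to a wait with timeout 0.
   Returns 1 if the lock was taken, 0 if it was busy. */
int try_add(pthread_mutex_t *m, int *counter, int delta) {
    int rc = pthread_mutex_trylock(m);
    if (rc == EBUSY)
        return 0;                       /* someone holds the lock: do not wait */
    assert(rc == 0);
    *counter += delta;
    pthread_mutex_unlock(m);
    return 1;
}

/* Exercises both paths; returns 1 if they behaved as described. */
int demo_mutex_calls(void) {
    pthread_mutex_t m;
    int counter = 0;
    pthread_mutex_init(&m, NULL);
    blocking_add(&m, &counter, 5);
    int free_case = try_add(&m, &counter, 2);    /* lock is free */
    pthread_mutex_lock(&m);
    int busy_case = try_add(&m, &counter, 100);  /* lock is held */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return free_case == 1 && busy_case == 0 && counter == 7;
}
```

The busy case leaves the counter untouched, which mirrors a zero-timeout wait returning without acquiring ownership.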

8888

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
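Although there is no built-in counterpart, a manual-reset event can be approximated by pairing a condition variable with a mutex and a boolean state. The sketch below is one possible construction, not a standard API: manual_event, event_init, event_set, event_reset, event_wait, and run_event_demo are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;     /* stays true until event_reset() */
} manual_event;

void event_init(manual_event *e, bool initial_state) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);     /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static manual_event    start_gate;
static int             released;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

static void *gate_waiter(void *arg) {
    (void)arg;
    event_wait(&start_gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Three threads wait on the event; one event_set() releases them all. */
int run_event_demo(void) {
    enum { NWAITERS = 3 };
    pthread_t t[NWAITERS];
    released = 0;
    event_init(&start_gate, false);
    for (int i = 0; i < NWAITERS; i++)
        pthread_create(&t[i], NULL, gate_waiter, NULL);
    event_set(&start_gate);
    for (int i = 0; i < NWAITERS; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

Because the state is kept in signaled, a thread that calls event_wait() after event_set() passes straight through, which is the behavior a bare condition variable does not provide.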

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main connects prog1 and prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
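For comparison, the POSIX counterpart pipe() creates the same kind of one-way byte channel with a read end and a write end. The sketch below pushes a message through a pipe inside a single process; pipe_roundtrip_ok is a made-up helper, and a real use would hand the two ends to different processes.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Writes a short message into a pipe and reads it back.
   Returns 1 if the bytes arrive unchanged, 0 otherwise. */
int pipe_roundtrip_ok(void) {
    int fds[2];                            /* fds[0]: read end, fds[1]: write end */
    const char msg[] = "hello through the pipe";
    char buf[sizeof msg];

    if (pipe(fds) != 0)
        return 0;
    ssize_t w = write(fds[1], msg, sizeof msg);   /* small write: does not block */
    close(fds[1]);                                /* reader sees EOF after the data */
    ssize_t r = (w == (ssize_t)sizeof msg) ? read(fds[0], buf, sizeof buf) : -1;
    close(fds[0]);
    return r == (ssize_t)sizeof msg && strcmp(buf, msg) == 0;
}
```

Data flows only from the write end to the read end, so two-way communication needs a second pipe.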

9494

I/O Redirection Using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf (stderr, "Anon pipe create failed\n"); exit (1);
}
/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf (stderr, "CreateProc1 failed\n"); exit (2);
}
CloseHandle (hWritePipe);

9595

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf (stderr, "CreateProc2 failed\n"); exit (3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. - local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe (HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile, ReadFile sequence in one call

Convenience Functions (contd)

BOOL CallNamedPipe (LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in one call

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,

0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {

fprintf(stderr, "Failure to locate server\n"); exit(3);

}

/* Write the request */

WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */

while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))

printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,

1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {

printf ("Server is awaiting next request.\n");

if (!ConnectNamedPipe (hNamedPipe, NULL)

|| !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {

fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);

}

printf ("Request is: %s\n", RequestRecord);

/* Send the file, one line at a time, to the client */

fp = fopen (File, "r");

while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)

WriteFile (hNamedPipe, &ResponseRecord,

(strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

fclose (fp);

DisconnectNamedPipe (hNamedPipe);

} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates a mailslot with CreateMailslot()

- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: App clients 0 to n act as mailslot servers. Each calls hMS = CreateMailslot(...) for the "status" mailslot, then ReadFile(hMS, &ServStat) and connects to the server. The App Server acts as the mailslot client: in a loop it calls Sleep(), hMS = CreateFile(...) on the "status" mailslot and WriteFile(hMS, &StatInfo, ...). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot (LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0: immediate return

- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Two Types of Semaphores

Counting semaphore

- integer value can range over an unrestricted domain

Binary semaphore

- integer value can range only between 0 and 1

- can be simpler to implement

A counting semaphore S can be implemented using binary semaphores

Deadlock and Starvation

Deadlock

- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0            T1

wait(S);      wait(Q);

wait(Q);      wait(S);

...           ...

signal(S);    signal(Q);

signal(Q);    signal(S);

Deadlock and Starvation

Starvation - indefinite blocking

- A thread may never be removed from the semaphore queue in which it is suspended

Solution to the deadlock above

- all code should acquire/release semaphores in the same order
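The same-order rule can be sketched with POSIX mutexes standing in for the binary semaphores S and Q. This is an illustrative sketch only: the helper names lock_pair and unlock_pair are hypothetical, and ordering by address is just one possible way to fix a global order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock two mutexes in one fixed global order (lowest address first), so
 * every caller acquires them in the same order whatever the argument order. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);                      /* locking one mutex twice would self-deadlock */
    if ((uintptr_t)a > (uintptr_t)b) {   /* normalise the acquisition order */
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release both mutexes; the release order does not affect deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, T0 calling lock_pair(&S, &Q) and T1 calling lock_pair(&Q, &S) both end up taking the two locks in the same order, so the circular wait from the table above cannot form.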

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode (contd)

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts - asynchronous

- Exceptions - synchronous

[Figure: trap handlers. Interrupt -> interrupt dispatcher -> interrupt service routines. System service call -> system service dispatcher -> system services. HW exceptions, SW exceptions -> exception dispatcher -> exception handlers. Virtual address exceptions -> virtual memory manager's pager.]

Interrupt Dispatching

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

[Figure: an interrupt arrives while user or kernel mode code is running; the dispatch routine and the ISR run in kernel mode.]

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- this masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
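The two IRQL rules on this slide (masking of equal and lower IRQLs, and no waits or page faults at or above dispatch level) can be expressed as simple predicates. This is a toy model, not kernel code: the function names are hypothetical, and the constants assume the x86 values 0, 1, 2 and 31 used in this unit.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 IRQL values as used in this unit (assumed for illustration). */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, HIGH_LEVEL = 31 };

/* An interrupt stays masked while the processor runs at an equal or higher IRQL. */
bool irql_masks(int current_irql, int interrupt_irql)
{
    assert(current_irql >= PASSIVE_LEVEL && current_irql <= HIGH_LEVEL);
    assert(interrupt_irql >= PASSIVE_LEVEL && interrupt_irql <= HIGH_LEVEL);
    return interrupt_irql <= current_irql;
}

/* Waits and page faults are only allowed below DISPATCH_LEVEL. */
bool may_wait_or_page_fault(int current_irql)
{
    assert(current_irql >= PASSIVE_LEVEL && current_irql <= HIGH_LEVEL);
    return current_irql < DISPATCH_LEVEL;
}
```

For example, a processor running a DPC at level 2 masks further level-2 requests but can still be interrupted by the clock at level 28.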

Interrupt Precedence via IRQLs (x86)

Hardware interrupts

- 31: High

- 30: Power fail

- 29: Interprocessor Interrupt

- 28: Clock

- 27: Profile & Synch (Srv 2003)

- Device n down to Device 1 (3)

Deferrable software interrupts

- 2: Dispatch/DPC

- 1: APC

Normal thread execution

- 0: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High

- Used when halting the system (via KeBugCheck())

Power fail

- Originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update system's clock, allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

x64

- 15: High/Profile

- 14: Interprocessor Interrupt/Power

- 13: Clock

- 12: Synch (Srv 2003)

- Device n down to Device 1 (3)

- 2: Dispatch/DPC

- 1: APC

- 0: Passive/Low

IA64

- 15: High/Profile/Power

- 14: Interprocessor Interrupt

- 13: Clock

- 12: Synch (MP only)

- Device n down to Device 1 (4)

- 3: Correctable Machine Check

- 2: Dispatch/DPC & Synch (UP only)

- 1: APC

- 0: Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
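As a worked illustration of the two IRQL formulas on this slide, the sketch below computes the IRQL for a given IRQ or IDT vector. The function names are hypothetical, and the accepted input ranges (IRQ 0 to 15, vector 0 to 255) are assumptions made for the example.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);        /* assumed IRQ range for illustration */
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16. */
int irql_from_vector_64bit(int vector)
{
    assert(vector >= 0 && vector <= 255); /* assumed 256-entry IDT */
    return vector / 16;                   /* integer division groups 16 vectors per IRQL */
}
```

Under these assumptions IRQ 0 maps to IRQL 27 on an x86 uniprocessor, and vector 0x30 maps to IRQL 3 on x64.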

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when kernel is deep within many layers of code

- Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient
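A minimal user-mode sketch of the "single data cell plus atomic test-and-modify" idea, written with C11 atomics. It is not the Windows kernel implementation: demo_spinlock and the spin_* functions are hypothetical names, and a real kernel spinlock also raises IRQL, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One flag is enough: set means owned, clear means free. */
typedef struct { atomic_flag locked; } demo_spinlock;

void spin_init(demo_spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear(&l->locked);
}

/* Atomic test-and-set: returns true only for the caller that found the lock free. */
bool spin_try_acquire(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the test-and-set succeeds. */
void spin_acquire(demo_spinlock *l)
{
    while (!spin_try_acquire(l)) {
        /* spin: the owner is expected to release soon */
    }
}

void spin_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Every failed attempt in spin_acquire repeats the atomic operation on the shared cell, which is the bus contention that queued spinlocks are meant to avoid.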

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute:

do acquire_spinlock(DPC) until (SUCCESS)

begin remove DPC from queue end (critical section)

release_spinlock(DPC)

[Figure: both processors contend for the spinlock that guards the DPC queue.]

Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag

- Thus, busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified

- Exactly one processor is being signaled

- Pre-determined wait order

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements (contd)

XP/2003

- Minimized lock contention for hot locks (PFN or Page Frame Database lock)

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include optional timeout argument

- Waiting threads consume no CPU time

Waiting (contd)

Waitable objects include

- Events (may be auto-reset or manual reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting (contd)

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread states Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" moves the thread to Waiting; "Set object to signaled state" means the wait is complete.]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching (contd)

Dispatcher re-schedules new thread; it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that changes it from nonsignaled to signaled, the event that changes it back, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- Nonsignaled to signaled: owning thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- Nonsignaled to signaled: owning thread or other thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- Nonsignaled to signaled: one thread releases the semaphore, freeing a resource

- Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

What signals an object (contd)

Event

- Nonsignaled to signaled: a thread sets the event

- Signaled to nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- Nonsignaled to signaled: dedicated thread sets one event in the event pair

- Signaled to nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- Nonsignaled to signaled: timer expires

- Signaled to nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- Nonsignaled to signaled: thread terminates

- Signaled to nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs) (contd)

[Figure: a queue head linked to a chain of DPC objects.]

Delivering a DPC

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC and Low; DPC objects in the DPC queue are handed to the dispatcher.]

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each linking APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
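The clock ISR accounting rules above amount to a small decision function. The sketch below is a model for illustration only; the enum, the function name and the user_mode parameter are hypothetical and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide where one clock tick is charged, following the four rules on the slide. */
charge_target clock_tick_charge(int irql, bool processing_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;                      /* IRQL > 2: interrupt time */
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC            /* IRQL = 2 in a DPC: DPC time */
                              : CHARGE_THREAD_KERNEL; /* IRQL = 2 otherwise: thread kernel time */
    return user_mode ? CHARGE_THREAD_USER             /* IRQL < 2: thread's user or kernel time */
                     : CHARGE_THREAD_KERNEL;
}
```

Note that the decision is made only at the moment the clock fires, so it is a sample of what the processor was doing, not an exact measurement.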

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (not really processes)

- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g., one or more threads run and enter a wait state before clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: header with Size, Type, State and Wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h).]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each wait block holds List entry, Object, Thread, Key, Type and Next link; dispatcher objects (Size, Type, State, Wait list head, object-type-specific data) link the wait blocks through their wait list heads.]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );

VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );

VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );

BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */

volatile int counter = 0;

CRITICAL_SECTION crit;

InitializeCriticalSection ( &crit );

/* ... main loop in any of the threads */

while (!done) {

EnterCriticalSection ( &crit );

__try {

counter += local_value;

}

__finally { LeaveCriticalSection ( &crit ); } /* leave exactly once per enter */

}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

- dwTimeout == 0: no wait, check whether object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to array identifying these objects

- bWaitAll: whether to wait for first signaled object or all objects. Function returns index of first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
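The "any" versus "all" semantics of bWaitAll can be modelled without the Windows API. The sketch below is a portable model, not a call to WaitForMultipleObjects: the function name and the boolean array of object states are hypothetical, and the return convention (index, 0, or -1) is chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WAIT_OBJECTS 64   /* mirrors MAXIMUM_WAIT_OBJECTS from the slide */

/* Returns the index of the first signaled object for a wait-any,
 * 0 when a wait-all is satisfied, and -1 when the caller would block. */
int wait_would_complete(const bool signaled[], size_t count, bool wait_all)
{
    assert(count >= 1 && count <= MODEL_MAX_WAIT_OBJECTS);
    if (wait_all) {
        for (size_t i = 0; i < count; i++)
            if (!signaled[i])
                return -1;           /* at least one object is still nonsignaled */
        return 0;
    }
    for (size_t i = 0; i < count; i++)
        if (signaled[i])
            return (int)i;           /* lowest index wins when several are signaled */
    return -1;
}
```

A wait-all only completes when every object is signaled at the same moment, which is why the same object states can satisfy a wait-any but not a wait-all.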

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes (contd)

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */

volatile int done, counter = 0;

HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */

DWORD ret;

while (!done) {

ret = WaitForSingleObject( mutex, INFINITE );

if (ret == WAIT_OBJECT_0)

counter += local_value;

else /* mutex was abandoned */

break; /* exit the loop */

ReleaseMutex( mutex );

}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
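The polling behaviour of pthread_mutex_trylock() can be sketched as follows. The helper names try_enter and add_if_free are hypothetical, and the sketch assumes a default (non-recursive) mutex, for which a failed attempt reports EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Poll the mutex once: true if it was acquired, false if it is busy. */
bool try_enter(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    assert(rc == 0 || rc == EBUSY);   /* other errors are not expected in this sketch */
    return rc == 0;
}

/* Update a shared counter only if the lock is free right now;
 * returns whether the update happened. */
bool add_if_free(pthread_mutex_t *m, int *counter, int delta)
{
    if (!try_enter(m))
        return false;                 /* like a zero timeout: give up instead of blocking */
    *counter += delta;
    pthread_mutex_unlock(m);
    return true;
}
```

A caller that gets false back can do other work and retry later, which is the polling pattern the zero-timeout WaitForSingleObject() call provides on Windows.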

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
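The counting rules can be made concrete with a small single-threaded model. It is not thread-safe and is not the Win32 object: sem_model and its functions are hypothetical names, and the refusal to exceed the maximum count mirrors the documented behaviour of ReleaseSemaphore().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { long count; long max; } sem_model;

void sem_model_init(sem_model *s, long initial, long max)
{
    assert(s != NULL && max > 0 && initial >= 0 && initial <= max);
    s->count = initial;
    s->max = max;
}

/* Signaled means at least one resource is available. */
bool sem_model_is_signaled(const sem_model *s) { return s->count > 0; }

/* A successful wait consumes one resource; false means the caller would block. */
bool sem_model_try_wait(sem_model *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Release n resources; fails without changes if the maximum would be exceeded.
 * The old count is reported through previous when it is not NULL. */
bool sem_model_release(sem_model *s, long n, long *previous)
{
    if (n <= 0 || n > s->max - s->count)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

Starting from a count of 1 with a maximum of 3, one wait drains the semaphore, and a release of 2 then reports a previous count of 0.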

Semaphores (contd)

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE: create event in signaled state

Events (contd)

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );

BOOL ResetEvent( HANDLE hEvent );

BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal(): one thread

- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
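Since pthreads has no direct counterpart, a manual-reset event is usually emulated with a mutex, a condition variable and a flag. The sketch below shows one such emulation; the manual_event type and its functions are hypothetical names, and error handling is omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until explicitly reset */
} manual_event;

void manual_event_init(manual_event *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent on a manual-reset event: wake every waiter and stay signaled. */
void manual_event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
void manual_event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Block until the event is signaled; the loop guards against spurious wakeups. */
void manual_event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool manual_event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool state = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return state;
}

void manual_event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The persistent flag is what distinguishes this from a bare condition variable: a thread that calls manual_event_wait() after the set still passes through, whereas a bare pthread_cond_broadcast() is lost if nobody is waiting yet.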

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Diagram: main, prog1 and prog2; prog1 and prog2 are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
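The one-way, byte-oriented behaviour described above can be tried out with the POSIX `pipe()` call, used here as a UNIX-side analogue of CreatePipe() rather than the Windows API itself. The helper name `pipe_round_trip` is hypothetical, and the sketch assumes the message is small enough to fit in the pipe buffer so that the write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back through the other end.
   Returns the number of bytes read, or -1 on failure.
   Assumes out_size >= 1 and a message smaller than the pipe buffer. */
long pipe_round_trip(const char *msg, char *out, size_t out_size) {
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0) return -1;

    size_t len = strlen(msg);
    long result = -1;

    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);               /* closing the write end lets a reader see EOF */
        fds[1] = -1;
        ssize_t n = read(fds[0], out, out_size - 1);
        if (n >= 0) {
            out[n] = '\0';
            result = (long)n;
        }
    }

    if (fds[1] != -1) close(fds[1]);
    close(fds[0]);
    return result;
}
```

Data only flows from the write end to the read end; a second pipe would be needed for traffic in the other direction.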

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented:
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional:
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe:
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network:
- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode:
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode:
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances:
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut:
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe:
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

Named Pipe Client Connections

CreateFile with the named pipe name:
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions:
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL:
- Call will return as soon as there is a client connection
- Returns false if a client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe():
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes:
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism:
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot:
- Each reader (server) creates a mailslot with CreateMailslot()
- A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- A client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

[Diagram: the App Server acts as the mailslot client and sends its status periodically; App client 0 through App client n act as mailslot servers and read it]

App client 0 ... App client n (mailslot servers):
    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile(hMS, &ServStat);
    /* connect to server */

App Server (mailslot client), message is sent periodically:
    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        WriteFile(hMS, &StatInfo);
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form:
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout:
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names:
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Deadlock and Starvation

Deadlock:
- two or more threads are waiting indefinitely for an event that can be caused by only one of the waiting threads

Let S and Q be two semaphores initialized to 1

T0              T1
wait(S)         wait(Q)
wait(Q)         wait(S)
signal(S)       signal(Q)
signal(Q)       signal(S)

Deadlock and Starvation

Starvation: indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution:
- all code should acquire/release semaphores in the same order
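The "same order" rule can be enforced mechanically by always taking the lock with the lower address first. The sketch below uses pthread mutexes as stand-ins for the binary semaphores S and Q; `first_in_order`, `lock_pair` and `unlock_pair` are hypothetical helper names, and the two locks are assumed to be distinct.

```c
#include <pthread.h>
#include <stdint.h>

/* Picks which of two locks comes first in a fixed global order.
   Ordering by address is one simple convention; any total order works. */
pthread_mutex_t *first_in_order(pthread_mutex_t *a, pthread_mutex_t *b) {
    return ((uintptr_t)a < (uintptr_t)b) ? a : b;
}

/* Acquires both locks in the global order, whatever order the caller
   names them in. Assumes a != b. Returns 0 on success. */
int lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_t *first = first_in_order(a, b);
    pthread_mutex_t *second = (first == a) ? b : a;

    int rc = pthread_mutex_lock(first);
    if (rc != 0) return rc;
    rc = pthread_mutex_lock(second);
    if (rc != 0) pthread_mutex_unlock(first);
    return rc;
}

/* Releases both locks; release order does not affect deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, T0 calling `lock_pair(&S, &Q)` and T1 calling `lock_pair(&Q, &S)` both acquire the locks in the same order, so the circular wait from the T0/T1 example cannot form.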

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events. An event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters:
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

[Diagram: each kind of trap is routed to its dispatcher and on to the matching handlers]
Interrupt -> interrupt dispatcher -> interrupt service routines
System service call -> system service dispatcher -> system services
HW exceptions, SW exceptions -> exception dispatcher -> exception handlers
Virtual address exceptions -> virtual memory manager's pager

Interrupt Dispatching

[Diagram: an interrupt arrives while user or kernel mode code is running; the interrupt dispatch routine and the interrupt service routine run in kernel mode]

Interrupt dispatch routine:
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine:
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises the processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
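The masking rule can be illustrated with a toy model of one processor's IRQL. This is a hedged sketch only: `cpu_model` and the functions are hypothetical, real IRQL changes happen in the kernel and the interrupt controller, and the level constants simply reuse the values 0, 1 and 2 for passive, APC and dispatch level.

```c
#include <stdbool.h>
#include <stddef.h>

#define PASSIVE_LEVEL  0
#define APC_LEVEL      1
#define DISPATCH_LEVEL 2

/* Hypothetical model: the only processor state tracked is the current IRQL. */
typedef struct { int irql; } cpu_model;

/* An interrupt is masked when its IRQL is not above the current IRQL. */
bool interrupt_is_masked(const cpu_model *cpu, int interrupt_irql) {
    return interrupt_irql <= cpu->irql;
}

/* Services an unmasked interrupt: raise the IRQL, run the (stand-in) ISR,
   then lower the IRQL to its previous value. Returns false if masked. */
bool service_interrupt(cpu_model *cpu, int interrupt_irql, int *irql_seen_by_isr) {
    if (interrupt_is_masked(cpu, interrupt_irql)) return false;

    int previous = cpu->irql;
    cpu->irql = interrupt_irql;            /* equal and lower levels are now masked */
    if (irql_seen_by_isr != NULL)
        *irql_seen_by_isr = cpu->irql;     /* stand-in for the ISR body */
    cpu->irql = previous;                  /* restore after the ISR returns */
    return true;
}

/* Waits and page faults are only allowed below dispatch level. */
bool waits_allowed(const cpu_model *cpu) {
    return cpu->irql < DISPATCH_LEVEL;
}
```

A processor at level 5 in this model ignores interrupts at levels 5 and below until its IRQL drops, while a level 6 interrupt is serviced immediately.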

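Interrupt Precedence via IRQLs (x86)

IRQL   Level                          Category
31     High                           Hardware interrupts
30     Power fail                     Hardware interrupts
29     Interprocessor Interrupt       Hardware interrupts
28     Clock                          Hardware interrupts
       Profile & Synch (Srv 2003)     Hardware interrupts
       Device 1                       Hardware interrupts
2      Dispatch/DPC                   Deferrable software interrupts
1      APC                            Deferrable software interrupts
0      Passive/Low                    Normal thread execution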

Interrupt processing

Interrupt dispatch table (IDT):
- Links to interrupt service routines

x86:
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha:
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to the initial level

Interrupt object

Allows device drivers to register ISRs for their devices:
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects:
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn ISR on/off

Interrupt objects can synchronize access to ISR data:
- Multiple instances of an ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with an IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

Levels are listed from highest (IRQL 15) to lowest (IRQL 0)

x64:
- High/Profile
- Interprocessor Interrupt/Power
- Clock
- Synch (Srv 2003)
- Device n
- Device 1
- Dispatch/DPC
- APC
- Passive/Low

IA64:
- High/Profile/Power
- Interprocessor Interrupt
- Clock
- Synch (MP only)
- Device n
- Device 1
- Correctable Machine Check
- Dispatch/DPC & Synch (UP only)
- APC
- Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device. Can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector
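The two arithmetic mappings can be written out as a quick worked example. The function names are hypothetical, and the x64/IA64 rule is read here as "IDT vector number divided by 16", which is an assumption about the slide's formula.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* x64/IA64 rule, read as integer division of the IDT vector number by 16
   (equivalently, the high four bits of an 8-bit vector). */
int irql_from_vector_x64(int idt_vector) {
    return idt_vector / 16;
}
```

Under these rules IRQ 0 on an x86 uniprocessor maps to IRQL 27 and IRQ 8 maps to IRQL 19, while an IDT vector of 0xD1 on x64 maps to IRQL 13.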

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
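The "data cell plus atomic test-and-modify" idea can be sketched in portable C11 with `atomic_flag`. This is a user-mode illustration, not the kernel's KSPIN_LOCK implementation; the `spinlock` type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode spinlock: a single atomic flag is the whole lock. */
typedef struct { atomic_flag held; } spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One atomic test-and-set: returns true if the lock was free and is now ours. */
bool spin_try_acquire(spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-waits until the test-and-set succeeds. */
void spin_acquire(spinlock *l) {
    while (!spin_try_acquire(l)) {
        /* spin: every failed attempt is another atomic access to the shared cell */
    }
}

/* Clears the flag so that another processor's test-and-set can succeed. */
void spin_release(spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release memory orders keep accesses inside the protected region from being reordered across the lock operations.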

Kernel Synchronization

[Diagram: Processor A and Processor B each run the code below against the shared DPC queue; the queue is a critical section protected by a spinlock]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the 1st processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
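A well-known design with these properties (a queue of waiters, spinning on a local flag, handing the lock to exactly one successor) is the MCS lock. The sketch below shows that technique in C11; it is offered as an illustration of the idea and is not claimed to match the Windows implementation. All names (`qnode`, `qlock`, `q_acquire`, `q_release`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting processor/thread; each waiter spins on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             locked;
} qnode;

/* The lock itself only records the tail of the waiter queue. */
typedef struct { _Atomic(qnode *) tail; } qlock;

void q_init(qlock *l) {
    atomic_init(&l->tail, NULL);
}

void q_acquire(qlock *l, qnode *me) {
    qnode *none = NULL;
    atomic_store(&me->next, none);
    atomic_store(&me->locked, true);

    /* Join the queue with one atomic exchange on the shared tail. */
    qnode *prev = atomic_exchange(&l->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);      /* link behind the predecessor */
        while (atomic_load(&me->locked)) {
            /* spin on our own flag only */
        }
    }
}

void q_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No known successor: try to mark the lock free. */
        qnode *expected = me;
        qnode *none = NULL;
        if (atomic_compare_exchange_strong(&l->tail, &expected, none))
            return;
        /* A waiter is in the middle of linking itself; wait for the link. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->locked, false);     /* hand the lock to exactly one waiter */
}
```

Waiters are served in the order in which they performed the exchange on `tail`, which gives the pre-determined wait order mentioned above.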

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003:
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU. Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003:
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks). Doesn't use spinlocks when there is no contention. Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include:
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Diagram: thread states Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labels: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state"]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate a context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect of signaled state: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect of signaled state: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect of signaled state: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect of signaled state: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect of signaled state: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect of signaled state: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect of signaled state: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU. DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Diagram: queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Diagram: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; a DPC queue holding DPC objects; the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests SW interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Diagram: a thread object with two APC queues, K (kernel) and U (user), each holding APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting:
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
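The four accounting rules amount to a small decision function. The sketch below encodes them directly; the enum and function names are hypothetical, and the `in_user_mode` parameter is an assumption used to pick between user and kernel time when the IRQL is below 2.

```c
#include <stdbool.h>

/* Where a clock tick is charged (hypothetical names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Applies the clock ISR rules to the state that was interrupted by the tick. */
charge_target clock_tick_charge(int irql, bool in_user_mode, bool processing_dpc) {
    if (irql > 2)
        return CHARGE_INTERRUPT;                              /* IRQL > 2 */
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC                    /* IRQL = 2, in a DPC */
                              : CHARGE_THREAD_KERNEL;         /* IRQL = 2, no DPC */
    return in_user_mode ? CHARGE_THREAD_USER                  /* IRQL < 2 */
                        : CHARGE_THREAD_KERNEL;
}
```

Because the decision is made only when the clock fires, whatever ran between two ticks is attributed entirely to the state sampled at the tick.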

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g. one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Diagram: dispatcher object layout: Size, Type, State, Wait list head, object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- When signaled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Diagram: thread objects point to their WaitBlockList; each wait block holds List entry, Object, Thread, Key, Type, Next link; each dispatcher object holds Size, Type, State, Wait list head and object-type-specific data]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection ( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection ( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject():
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
- dwTimeout == 0: no wait, check whether object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects():
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for the first signaled object or all objects. Function returns index of first signaled object

Side effects:
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
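The "any" versus "all" semantics and the index-based return value can be sketched as a pure function over an array of signaled flags. This is a hypothetical model of a zero-timeout poll only: it does not call the Windows API, does not block, and leaves out the side effects listed above. `MODEL_WAIT_TIMEOUT` reuses the numeric value of WAIT_TIMEOUT (0x102).

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WAIT_OBJECT_0 0u
#define MODEL_WAIT_TIMEOUT  0x102u

/* Models a wait on several objects with dwTimeout == 0.
   wait_all == false: returns MODEL_WAIT_OBJECT_0 + index of the first signaled object.
   wait_all == true:  returns MODEL_WAIT_OBJECT_0 only if every object is signaled.
   Otherwise returns MODEL_WAIT_TIMEOUT. */
unsigned poll_objects(const bool signaled[], size_t count, bool wait_all) {
    size_t first = count;       /* index of the first signaled object, if any */
    size_t n_signaled = 0;

    for (size_t i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == count) first = i;
            n_signaled++;
        }
    }

    if (wait_all)
        return (count > 0 && n_signaled == count) ? MODEL_WAIT_OBJECT_0
                                                  : MODEL_WAIT_TIMEOUT;
    return (first < count) ? MODEL_WAIT_OBJECT_0 + (unsigned)first
                           : MODEL_WAIT_TIMEOUT;
}
```

For the flags {false, true, true}, a wait for "any" reports index 1, while a wait for "all" times out until the first object becomes signaled as well.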

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
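For comparison with the Win32 mutex example, here is a minimal pthreads sketch of the same shared-counter loop. The names worker(), run_counter_demo() and try_once(), and the THREADS and ROUNDS constants, are made up for this illustration; error checking is left out to keep it short.

```c
#include <pthread.h>

enum { THREADS = 4, ROUNDS = 10000 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                /* shared by all threads */

static void *worker(void *arg)
{
    long local_value = *(long *)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&lock);      /* blocks, like an INFINITE wait */
        counter += local_value;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs THREADS workers and returns the final counter value. */
static long run_counter_demo(long local_value)
{
    pthread_t tid[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&tid[i], NULL, worker, &local_value);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polling variant: returns 1 if the lock was free, 0 if it was busy. */
static int try_once(void)
{
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;                       /* like a wait with timeout 0 */
    pthread_mutex_unlock(&lock);
    return 1;
}
```

Because every increment happens under the lock, the final value is always THREADS * ROUNDS * local_value, whatever the interleaving.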

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count
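The counting rules above can be written down as a small sequential model. This is only a sketch of the bookkeeping, under the assumption that nothing ever blocks: sem_model and its three functions are hypothetical names, not part of the Windows API, and a real semaphore is a kernel object that suspends waiting threads.

```c
#include <stdbool.h>

/* Hypothetical model type; a real semaphore is a kernel object. */
typedef struct {
    long count;   /* current count; "signaled" while count > 0 */
    long max;     /* upper bound fixed at creation             */
} sem_model;

static bool sem_model_signaled(const sem_model *s)
{
    return s->count > 0;
}

/* A satisfied wait consumes exactly one unit. Returns false where a
 * real wait function would block (or time out). */
static bool sem_model_try_wait(sem_model *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Adds n units at once and reports the previous count, mirroring the
 * cReleaseCount / lpPreviousCount parameters. Fails without changing
 * the count if the maximum would be exceeded. */
static bool sem_model_release(sem_model *s, long n, long *previous)
{
    if (n <= 0 || s->count + n > s->max)
        return false;
    if (previous)
        *previous = s->count;
    s->count += n;
    return true;
}
```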

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
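A manual-reset event can still be approximated on top of pthreads by pairing a condition variable with an explicit state flag. The following is one possible sketch, not a standard API: the mr_event type and its functions are names invented here, and error handling is omitted.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a manual-reset event built from POSIX parts. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* the state a condition variable lacks */
} mr_event;

static void mr_event_init(mr_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

static void mr_event_set(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release every waiter */
    pthread_mutex_unlock(&e->lock);
}

static void mr_event_reset(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void mr_event_wait(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);     /* state stays set: manual reset */
}

static void mr_event_destroy(mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the difference: a bare pthread_cond_broadcast() is lost if no thread is waiting at that moment, whereas a set event stays signaled until mr_event_reset() is called.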

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main, prog1, prog2, pipe]

Half-duplex character-based IPC
- cbPipe: pipe byte size; zero == default
- Read on pipe handle will block if pipe is empty
- Write operation to a full pipe will block
- Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. denotes the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
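To make the FIFO side of the comparison concrete, here is a small POSIX sketch in which a child process writes one request into a FIFO and the parent reads it. The helper name fifo_roundtrip(), the /tmp/fifo_demo_XXXXXX directory template and the file name "requests" are assumptions made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg from a child process to the parent through a FIFO.
 * Returns the number of bytes received, or -1 on error. */
static ssize_t fifo_roundtrip(const char *msg, char *out, size_t outsz)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";   /* hypothetical location */
    char path[64];
    ssize_t n = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/requests", dir);
    if (mkfifo(path, 0600) != 0) {          /* counterpart of CreateNamedPipe */
        rmdir(dir);
        return -1;
    }

    pid_t pid = fork();
    if (pid == 0) {                         /* child: the writer */
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        if (wfd < 0 || write(wfd, msg, strlen(msg)) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }
    if (pid > 0) {                          /* parent: the reader */
        int rfd = open(path, O_RDONLY);     /* blocks until the writer opens */
        if (rfd >= 0) {
            n = read(rfd, out, outsz - 1);
            if (n >= 0)
                out[n] = '\0';
            close(rfd);
        }
        waitpid(pid, NULL, 0);
    }
    unlink(path);
    rmdir(dir);
    return n;
}
```

The data flows in one direction only; a reply channel would need a second FIFO, which is the half-duplex limitation listed above.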

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the App Server acts as mailslot client and sends a status message periodically; App client 0 ... App client n act as mailslot servers and read it]

Mailslot servers (App client 0 ... App client n):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App Server), message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Deadlock and Starvation

Starvation - indefinite blocking
- A thread may never be removed from the semaphore queue in which it is suspended

Solution
- All code should acquire/release semaphores in the same order

Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects, which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable


Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

[Figure: trap handlers. Interrupt -> interrupt dispatcher -> interrupt service routines; system service call -> system service dispatcher -> system services; HW exceptions / SW exceptions -> exception dispatcher -> exception handlers; virtual address exceptions -> virtual memory manager's pager]

Trap: processor's mechanism to capture executing thread
- Switch from user to kernel mode
- Interrupts - asynchronous
- Exceptions - synchronous

Interrupt Dispatching

[Figure: an interrupt arrives while user or kernel mode code is running; control passes to the interrupt dispatch routine (kernel mode), which calls the interrupt service routine]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
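The masking rule (servicing an interrupt raises the processor IRQL, which holds back interrupts at equal and lower levels) can be illustrated with a toy model of a single processor. This is a teaching sketch only: cpu_model, cpu_request() and cpu_lower() are hypothetical names and do not correspond to kernel routines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one processor; levels 0..31 as on x86. */
typedef struct {
    int      irql;      /* current IRQL of the processor            */
    uint32_t pending;   /* bit n set: an interrupt at level n waits */
} cpu_model;

/* An interrupt is taken only if its level is above the current IRQL. */
static bool cpu_can_take(const cpu_model *c, int level)
{
    return level > c->irql;
}

/* A source raises an interrupt: service it now or leave it pending.
 * Returns true if the ISR would start immediately. */
static bool cpu_request(cpu_model *c, int level)
{
    if (!cpu_can_take(c, level)) {
        c->pending |= UINT32_C(1) << level;   /* masked for now */
        return false;
    }
    c->irql = level;    /* servicing raises the processor IRQL */
    return true;
}

/* ISR finished: lower the IRQL and report the highest pending level
 * that is no longer masked, or -1 if there is none. */
static int cpu_lower(cpu_model *c, int new_irql)
{
    c->irql = new_irql;
    for (int level = 31; level > new_irql; level--) {
        if (c->pending & (UINT32_C(1) << level)) {
            c->pending &= ~(UINT32_C(1) << level);
            return level;
        }
    }
    return -1;
}
```

In this model a request at level 3 that arrives while the processor runs at level 5 is recorded as pending and only reported once the IRQL drops below 3.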

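Interrupt Precedence via IRQLs (x86)

IRQL   Level                        Category
31     High                         Hardware interrupts
30     Power fail                   Hardware interrupts
29     Interprocessor Interrupt     Hardware interrupts
28     Clock                        Hardware interrupts
27     Profile & Synch (Srv 2003)   Hardware interrupts
...    Device 1 ...                 Hardware interrupts
2      Dispatch/DPC                 Deferrable software interrupts
1      APC                          Deferrable software interrupts
0      Passive/Low                  Normal thread execution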

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with the same IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

IRQL   x64                              IA64
15     High/Profile                     High/Profile/Power
14     Interprocessor Interrupt/Power   Interprocessor Interrupt
13     Clock                            Clock
12     Synch (Srv 2003)                 Synch (MP only)
...    Device n                         Device n
4      ...                              Device 1
3      Device 1                         Correctable Machine Check
2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
1      APC                              APC
0      Passive/Low                      Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device
  - Can be configured with IntFilter utility in Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  - Processors are assigned round robin for each interrupt vector
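The two fixed mapping rules can be written as one-line functions. This is merely the arithmetic from the slide, reading the x64/IA64 rule as vector number divided by 16; the function names are invented for the example, and the bucketized x86 MP case is not modelled.

```c
/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
static int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 and IA64 rule from the slide: IRQL = IDT vector number / 16
 * (integer division, so 16 consecutive vectors share one IRQL). */
static int irql_from_vector_64(int vector)
{
    return vector / 16;
}
```

For instance, IRQ 1 on an x86 uniprocessor maps to IRQL 26, and IDT vector 0x51 (decimal 81) on x64 maps to IRQL 5.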

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- A spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
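The "data cell plus atomic test-and-modify" idea can be sketched in user-space C11 with atomic_flag. This is an illustration of the principle, not the kernel's KSPIN_LOCK code; the spin_lock type and the spin_* functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a kernel spinlock. */
typedef struct { atomic_flag cell; } spin_lock;

static void spin_init(spin_lock *l)
{
    atomic_flag_clear(&l->cell);             /* lock starts out free */
}

/* One atomic test-and-set: true if the caller now owns the lock. */
static bool spin_try_acquire(spin_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->cell, memory_order_acquire);
}

static void spin_acquire(spin_lock *l)
{
    while (!spin_try_acquire(l))
        ;                                    /* busy-wait until free */
}

static void spin_release(spin_lock *l)
{
    atomic_flag_clear_explicit(&l->cell, memory_order_release);
}
```

A single flag is all the state there is, which matches the remark that one bit is sufficient; the price is that every waiter keeps hitting the same memory cell.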

Kernel Synchronization

[Figure: Processor A and Processor B both access the DPC queue; the queue is a critical section guarded by a spinlock]

Each processor executes:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first waiting processor is modified
- Exactly one processor is being signaled
- Pre-determined wait order
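One well-known way to build such a lock is the MCS queue lock, sketched below with C11 atomics. It is offered as an illustration of waiting on a local flag, under the assumption that it resembles the idea described here; it is not claimed to be the Windows implementation, and qnode, qlock, qlock_acquire() and qlock_release() are names chosen for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting processor (here: per caller). */
typedef struct qnode {
    _Atomic(struct qnode *) next;    /* successor in the wait queue      */
    atomic_bool             locked;  /* local flag this waiter spins on  */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;           /* last waiter, NULL if lock is free */
} qlock;

static void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL)
        return;                                    /* lock was free */
    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked))
        ;                                          /* spin on own flag only */
}

static void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* A new waiter is between its exchange and its link store. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}
```

Waiters are served in the order in which they swapped themselves into the tail pointer, which gives the pre-determined wait order mentioned above.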

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU
  - Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated
  - Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks)
  - Doesn't use spinlocks when no contention
  - Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once
  - "All": all objects must be in the signalled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset, may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signalled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labels on the diagram: "Create and initialize thread object", "Thread waits on an object handle", "Wait is complete", "Set object to signaled state"]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt a running thread if that thread has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads.

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

What signals an object (contd.)

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
  - See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head -> DPC object -> DPC object -> DPC object]

Delivering a DPC

[Figure: interrupt dispatch table with levels High, Power failure, ..., Dispatch/DPC, APC, Low; a DPC queue holding DPC objects; the dispatcher]

1. Timer expires, kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
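The four accounting rules amount to a small decision function. The sketch below restates them in C; the charge_t enum, the charge_tick() function and its was_user_mode parameter are names introduced here for illustration and are not taken from the Windows kernel.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decides which bucket one clock tick is charged to, given the IRQL
 * that was interrupted by the clock ISR. */
static charge_t charge_tick(int irql, bool in_dpc, bool was_user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL 0 or 1: ordinary thread time, split by processor mode */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```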

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- When signalled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): common header with Size, Type, State and Wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, point to chains of wait blocks; each wait block holds List entry, Object, Thread, Key, Type and Next link; dispatcher objects (Size, Type, State, Wait list head, object-type-specific data) link the wait blocks that refer to them]

7979
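The relationship between dispatcher headers and wait blocks can be sketched in portable C. This is only a simplified model: the struct and field names below are illustrative stand-ins, not the real definitions from ntddk.h, and the "any"/"all" check is an assumption about how the fields described above would be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the common dispatcher header. */
typedef struct DispatcherHeader {
    int  type;       /* object type code */
    bool signaled;   /* true = signaled, false = nonsignaled */
} DispatcherHeader;

typedef enum { WAIT_ANY, WAIT_ALL } WaitType;

/* One wait block per handle passed to the wait call. */
typedef struct WaitBlock {
    DispatcherHeader *object;  /* object being waited on */
    int               key;     /* position in the caller's handle array */
    WaitType          type;    /* "any" or "all", same for every block of a call */
    struct WaitBlock *next;    /* next wait block of the same wait call */
} WaitBlock;

/* Returns the key of the first signaled object for a WAIT_ANY wait,
   0 when a WAIT_ALL wait is satisfied, or -1 if the thread keeps waiting. */
int wait_satisfied(const WaitBlock *list)
{
    assert(list != NULL);
    if (list->type == WAIT_ANY) {
        for (const WaitBlock *b = list; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const WaitBlock *b = list; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;          /* "all": every object must be signaled at once */
    return 0;
}
```

The key plays the same role as the array index that WaitForMultipleObjects reports back to the caller.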

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so the section is left exactly once */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies the kernel object

- dwTimeout specifies the wait time in msec

- dwTimeout == 0: no wait, just check whether the object is signaled

- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles: pointer to an array identifying these objects

- bWaitAll: whether to wait for the first signaled object or for all objects

- With bWaitAll == FALSE, the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object


Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in the same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
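As a rough illustration of the comparison, the following C sketch uses the pthread calls listed above. The helper names (add_many, run_two_workers, try_once) and the shared counter are made up for this example; the mutex is a default (non-recursive) one, which is why trylock reports EBUSY while the lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Blocking path: plays the role of WaitForSingleObject(hMutex, INFINITE). */
static void *add_many(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_two_workers(long per_thread)
{
    pthread_t t[2];
    counter = 0;
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, add_many, &per_thread);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Polling path: plays the role of WaitForSingleObject(hMutex, 0).
   Returns 1 if the lock was free (it is taken and released again), 0 if busy. */
int try_once(void)
{
    int rc = pthread_mutex_trylock(&lock);
    if (rc == EBUSY)
        return 0;
    assert(rc == 0);
    pthread_mutex_unlock(&lock);
    return 1;
}
```

Unlike a Windows mutex handle, this pthread mutex is visible only to threads of one process, matching the "same process" note above.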

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases the semaphore count by 1

- ReleaseSemaphore() may increment the count by any value

- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- A manual-reset event can signal several threads simultaneously; it must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- An auto-reset event signals a single thread; the event is reset automatically

- fInitialState == TRUE: create the event in the signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal(): one thread

- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
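Because a condition variable keeps no state of its own, one common way to approximate a manual-reset event is to pair it with a mutex and a flag. The sketch below is such an approximation, not an exact equivalent; the manual_event type and the event_* function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: the flag remembers the signaled state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = initial_state;
}

void event_set(manual_event *e)            /* roughly SetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);      /* release all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)          /* roughly ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e)           /* roughly an INFINITE wait */
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

Once set, the flag stays set, so later calls to event_wait() return immediately until event_reset() is called, which is the manual-reset behaviour described above.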

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,  /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server using the same instance

- Server can respond to client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing the handle to the last instance deletes the named pipe

Polling a pipe

- Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with the named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

- Combines a WriteFile, ReadFile sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

- Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()

- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
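To make the FIFO comparison concrete, here is a minimal POSIX sketch in C that sends one fixed-size record through a FIFO from a forked child to its parent. The function name fifo_round_trip, the RECORD_LEN value, and the FIFO path chosen by the caller are assumptions made for this example only.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 32   /* fixed-size record: a FIFO has no message boundaries */

/* Creates a FIFO at path, lets a child write one record, reads it back in
   the parent, and removes the FIFO. Returns 0 on success, -1 on failure. */
int fifo_round_trip(const char *path, const char *msg, char out[RECORD_LEN])
{
    assert(strlen(msg) < RECORD_LEN);
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                        /* child plays the client */
        char rec[RECORD_LEN] = {0};
        strncpy(rec, msg, RECORD_LEN - 1);
        int wfd = open(path, O_WRONLY);    /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, rec, RECORD_LEN);
        close(wfd);
        _exit(n == RECORD_LEN ? 0 : 1);
    }

    size_t got = 0;
    int rfd = open(path, O_RDONLY);        /* blocks until the writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);                /* do not leave the child blocked */
    } else {
        while (got < RECORD_LEN) {
            ssize_t n = read(rfd, out + got, RECORD_LEN - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        close(rfd);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_LEN && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Data flows in one direction only here; a reply to the client would need a second FIFO, as noted above.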

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many comm.)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates a mailslot with CreateMailslot()

- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: the application clients (App client 0 ... App client n) act as mailslot servers: each calls CreateMailslot() on a mailslot named "status", reads ServStat from it with ReadFile(), and then connects to the server. The application server acts as the mailslot client: in a loop it sleeps, opens the "status" mailslot with CreateFile(), and writes StatInfo with WriteFile(). The message is sent periodically.]

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for this many msec

- 0: immediate return

- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name: retrieve handle for local mailslot

- \\host\mailslot\[path]name: retrieve handle for mailslot on the specified host

- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Windows Synchronization

Uses interrupt masks to protect access to global resources on uniprocessor systems

Uses spinlocks on multiprocessor systems

Provides dispatcher objects which may act as mutexes and semaphores

Dispatcher objects may also provide events; an event acts much like a condition variable

Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003

- Chapter 7 - Process Synchronization

- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and interrupt dispatching

IRQL levels & interrupt precedence

Spinlocks and kernel synchronization

Executive synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

- Protects the system from the users

- Protects the user (process) from themselves

- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times, mostly in the "System" process

- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread, so-called "arbitrary thread context"

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts: asynchronous

- Exceptions: synchronous

[Figure: each kind of trap is routed to its handler: interrupt -> interrupt dispatcher -> interrupt service routines; system service call -> system service dispatcher -> system services; HW and SW exceptions -> exception dispatcher -> exception handlers; virtual address exceptions -> virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; control passes to the kernel-mode interrupt dispatch routine, which calls the interrupt service routine]

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises the processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
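The masking rule can be modelled in a few lines of C. This is a toy model of the rules stated above, not kernel code; the function names and the integer representation of the processor IRQL are assumptions for illustration, and the level constants follow the x86 numbering used in this unit.

```c
#include <assert.h>
#include <stdbool.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, HIGH_LEVEL = 31 };

/* An interrupt is delivered only if its IRQL is above the current one;
   equal and lower IRQLs are masked. */
bool interrupt_deliverable(int current_irql, int interrupt_irql)
{
    assert(current_irql >= PASSIVE_LEVEL && current_irql <= HIGH_LEVEL);
    return interrupt_irql > current_irql;
}

/* Waits and page faults are only allowed below DISPATCH_LEVEL. */
bool may_wait_or_page_fault(int current_irql)
{
    return current_irql < DISPATCH_LEVEL;
}

/* Servicing raises the processor IRQL for the duration of the ISR and
   lowers it back afterwards. Returns the IRQL the ISR ran at, or -1 if
   the interrupt is masked and stays pending. */
int service_interrupt(int *processor_irql, int interrupt_irql)
{
    if (!interrupt_deliverable(*processor_irql, interrupt_irql))
        return -1;
    int saved = *processor_irql;
    *processor_irql = interrupt_irql;    /* raise */
    int seen_by_isr = *processor_irql;   /* the ISR would run here */
    *processor_irql = saved;             /* lower to the initial level */
    return seen_by_isr;
}
```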

Interrupt Precedence via IRQLs (x86)

Hardware interrupts

- 31: High

- 30: Power fail

- 29: Interprocessor Interrupt

- 28: Clock

- Profile & Synch (Srv 2003)

- Device levels (Device 1 and up)

Deferrable software interrupts

- 2: Dispatch/DPC

- 1: APC

Normal thread execution

- 0: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected to one IRQL

Predefined IRQLs

High

- Used when halting the system (via KeBugCheck())

Power fail

- Originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update the system's clock and the allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all, normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL tables for x64 and IA64 (levels 0 to 15), listed here from lowest to highest.
x64: Passive/Low; APC; Dispatch/DPC; Device 1 ... Device n; Synch (Srv 2003) and Clock; Interprocessor Interrupt and Power; High and Profile.
IA64: Passive/Low; APC; Dispatch/DPC & Synch (UP only); Correctable Machine Check; Device 1 ... Device n; Synch (MP only); Clock; Interprocessor Interrupt; High, Profile and Power.]

Interrupt Prioritization & Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device; can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
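The two formulas above are simple enough to check by hand, and the following C sketch just encodes them. The function names are invented for this example, and the accepted input ranges (IRQ 0 to 15, vector 0 to 255) are assumptions about a typical PC, not something the slides state.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);      /* assumed legacy PC IRQ range */
    return 27 - irq;
}

/* x64 / IA64 rule from the slide: IRQL = IDT vector number / 16. */
int irql_from_vector(int idt_vector)
{
    assert(idt_vector >= 0 && idt_vector <= 255);
    return idt_vector / 16;             /* integer division: 16 vectors per IRQL */
}
```

For example, IRQ 8 maps to IRQL 19 on an x86 uniprocessor system, and IDT vector 0x91 (145) falls into IRQL 9 on x64.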

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient
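A user-mode analogue of the "data cell plus atomic test-and-modify" idea can be written with C11 atomics. This is a minimal sketch, not the kernel's KSPIN_LOCK implementation; the spinlock type, the helper names, and the thread-based demo are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock: a single atomic flag. */
typedef struct { atomic_flag held; } spinlock;

static void spin_acquire(spinlock *l)
{
    /* Atomic test-and-set returns the previous value: loop until it was clear. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait */
}

static void spin_release(spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static spinlock queue_lock = { ATOMIC_FLAG_INIT };
static long shared_count = 0;

static void *spin_worker(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        spin_acquire(&queue_lock);
        shared_count++;                  /* critical section */
        spin_release(&queue_lock);
    }
    return NULL;
}

/* Starts nthreads workers (at most 8) and returns the final count. */
long spin_demo(int nthreads, long per_thread)
{
    pthread_t t[8];
    assert(nthreads > 0 && nthreads <= 8);
    shared_count = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, spin_worker, &per_thread);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_count;
}
```

Every waiter hammers the same flag here, which is the bus-contention problem that queued spinlocks address.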

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B each execute the same sequence around the shared DPC queue, which is guarded by one spinlock]

    do acquire_spinlock(DPC) until (SUCCESS)
    begin                        /* critical section */
        remove DPC from queue
    end
    release_spinlock(DPC)

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag

- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the 1st processor's flag is modified

- Exactly one processor is being signaled

- Pre-determined wait order
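The queue-of-waiters idea can be sketched with an MCS-style lock in C11. This is a swapped-in textbook technique that shows the same principle (each waiter spins on its own flag and is released in FIFO order); it is not the Windows kernel's queued spinlock code, and all names below are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool locked;             /* local flag this waiter spins on */
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;   /* NULL tail = lock free */

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);    /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->locked))
            ;                                       /* spin on own flag only */
    }
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                 /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                       /* successor is still linking in */
    }
    atomic_store(&succ->locked, false);             /* signal exactly one waiter */
}

static qlock dpc_lock;
static long dpc_count;

static void *q_worker(void *arg)
{
    long n = *(const long *)arg;
    qnode me;                                       /* stands in for per-processor state */
    for (long i = 0; i < n; i++) {
        qlock_acquire(&dpc_lock, &me);
        dpc_count++;
        qlock_release(&dpc_lock, &me);
    }
    return NULL;
}

/* Starts nthreads workers (at most 8) and returns the final count. */
long qlock_demo(int nthreads, long per_thread)
{
    pthread_t t[8];
    assert(nthreads > 0 && nthreads <= 8);
    dpc_count = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, q_worker, &per_thread);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return dpc_count;
}
```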

SMP Scalability Improvements

Windows 2000: queued spinlocks

- !qlocks in Kernel Debugger

Server 2003

- More spinlocks eliminated (context swap, system space, commit)

- Further reduction of use of spinlocks & length they are held

- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003

- Minimized lock contention for hot locks: PFN or Page Frame Database lock

- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset, may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" leads to Waiting; "Wait is complete; set object to signaled state" leads out of Waiting]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt the running thread if it has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object?

For each dispatcher object: the system event that sets it to signaled, the event that returns it to nonsignaled, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- Nonsignaled to signaled: owning thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- Nonsignaled to signaled: owning thread or other thread releases mutex

- Signaled to nonsignaled: resumed thread acquires mutex

- Effect: kernel resumes one waiting thread

Semaphore

- Nonsignaled to signaled: one thread releases the semaphore, freeing a resource

- Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect: kernel resumes one or more waiting threads

What signals an object? (contd.)

Event

- Nonsignaled to signaled: a thread sets the event

- Signaled to nonsignaled: kernel resumes one or more threads

- Effect: kernel resumes one or more waiting threads

Event pair

- Nonsignaled to signaled: dedicated thread sets one event in the event pair

- Signaled to nonsignaled: kernel resumes the other dedicated thread

- Effect: kernel resumes waiting dedicated thread

Timer

- Nonsignaled to signaled: timer expires

- Signaled to nonsignaled: a thread (re)initializes the timer

- Effect: kernel resumes all waiting threads

Thread

- Nonsignaled to signaled: thread terminates

- Signaled to nonsignaled: a thread reinitializes the thread object

- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed

- Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue: queue head followed by a chain of DPC objects]

Delivering a DPC

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

1. Timer expires, kernel queues DPC that will release all waiting threads; kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; the Dispatch/DPC entry leads to the dispatcher, which drains the DPC queue]

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each holding a list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
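The four accounting rules above translate directly into a small decision function. The C sketch below is only a model of those rules: the enum, the function name, and the in_dpc and user_mode parameters are assumptions about what the clock ISR would know at each tick.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decides where one clock tick is charged, given the interrupted state. */
charge_target clock_tick_charge(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0 && irql <= 31);
    if (irql > 2)
        return CHARGE_INTERRUPT;                      /* IRQL > 2 */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC                    /* IRQL = 2, in a DPC */
                      : CHARGE_THREAD_KERNEL;         /* IRQL = 2, no DPC */
    return user_mode ? CHARGE_THREAD_USER             /* IRQL < 2 */
                     : CHARGE_THREAD_KERNEL;
}
```

Only the state sampled at the tick matters, which is why work that starts and finishes between two ticks is never charged.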

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column
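To see why a thread can run without ever being charged, consider a minimal model of tick-driven accounting. It assumes clock ticks fire at exact multiples of the period and that a thread is charged one tick for each tick that lands inside its run interval; the function name is hypothetical and the model ignores everything else the real clock ISR does.

```c
#include <assert.h>

/* Count the clock ticks that fire while a thread runs in [start_ms, end_ms).
 * Ticks are assumed to fire at period_ms, 2*period_ms, 3*period_ms, ... */
long ticks_charged(long start_ms, long end_ms, long period_ms)
{
    long ticks = 0;
    if (period_ms <= 0)
        return 0;
    for (long t = period_ms; t < end_ms; t += period_ms)
        if (t >= start_ms)
            ticks++;
    return ticks;
}

/* A thread that runs from 1 ms to 9 ms with a 10 ms clock is never charged. */
void ticks_charged_self_check(void)
{
    assert(ticks_charged(1, 9, 10) == 0);
    assert(ticks_charged(5, 25, 10) == 2);   /* ticks at 10 ms and 20 ms */
}
```

In this model a thread that always enters a wait state just before the tick accumulates no CPU time at all, which is the quirk the slide describes.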

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: common header (Size, Type, State, Wait listhead) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects each hold a WaitBlockList; every wait block (List entry, Object, Thread, Key, Type, Next link) ties a waiting thread to a dispatcher object (Size, Type, State, Wait listhead, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  - dwTimeout == 0 - no wait, check whether object is signaled
  - dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles - pointer to array identifying these objects
- bWaitAll - whether to wait for first signaled object or all objects
  - When waiting for the first signaled object, the return value minus WAIT_OBJECT_0 is the index of that object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
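The difference between waiting for "any" and waiting for "all" objects can be modelled without the Windows API. The following is a single-threaded sketch with hypothetical names: it only evaluates whether a wait would be satisfied for a given set of signaled flags, and it returns a plain index rather than the WAIT_OBJECT_0-based value the real call returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model only. For a wait-any, returns the index of the first signaled
 * object. For a wait-all, returns 0 once every object is signaled.
 * Returns -1 where a real wait would block (or time out). */
int wait_would_satisfy(const bool *signaled, size_t count, bool wait_all)
{
    if (count == 0)
        return -1;
    if (wait_all) {
        for (size_t i = 0; i < count; i++)
            if (!signaled[i])
                return -1;          /* at least one object still nonsignaled */
        return 0;
    }
    for (size_t i = 0; i < count; i++)
        if (signaled[i])
            return (int)i;          /* lowest-index signaled object wins */
    return -1;
}

void wait_model_self_check(void)
{
    const bool state[3] = { false, true, true };
    assert(wait_would_satisfy(state, 3, false) == 1);   /* any: object 1 */
    assert(wait_would_satisfy(state, 3, true) == -1);   /* all: object 0 blocks */
}
```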


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
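The blocking and polling variants can be compared side by side on the POSIX side. This is a minimal sketch: the shared counter and the function names are hypothetical, and it assumes a default (non-recursive) mutex, for which pthread_mutex_trylock() returns EBUSY whenever the mutex is already locked.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking update: analogous to WaitForSingleObject(hMutex, INFINITE). */
void add_blocking(int value)
{
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling update: analogous to WaitForSingleObject(hMutex, 0).
 * Returns 1 if the counter was updated, 0 if the mutex was not acquired. */
int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc != 0)
        return 0;               /* EBUSY (already locked) or another error */
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

void mutex_compare_self_check(void)
{
    add_blocking(2);
    assert(add_if_free(3) == 1);    /* mutex is free, so polling succeeds */
    assert(counter == 5);
}
```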

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any positive value (the count may not exceed its maximum)
- ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
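The counting rules can be captured in a small single-threaded model. It is not a real semaphore (nothing blocks and nothing is atomic), and the csem_* names are hypothetical. It assumes, as the Windows documentation states for ReleaseSemaphore(), that a release which would push the count past the maximum fails and leaves the count unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model of the counting rules; not a usable lock. */
typedef struct {
    long count;   /* current count; the semaphore is signaled when count > 0 */
    long max;     /* maximum count fixed at creation */
} csem_model;

bool csem_is_signaled(const csem_model *s)
{
    return s->count > 0;
}

/* A satisfied wait consumes one unit. Returns false where a real wait would block. */
bool csem_try_wait(csem_model *s)
{
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Adds `release` units and reports the previous count.
 * Fails, changing nothing, if the maximum would be exceeded. */
bool csem_release(csem_model *s, long release, long *previous)
{
    if (release <= 0 || s->count + release > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return true;
}

void csem_self_check(void)
{
    csem_model s = { 1, 3 };
    long old = -1;
    assert(csem_try_wait(&s) && !csem_is_signaled(&s));
    assert(csem_release(&s, 2, &old) && old == 0 && s.count == 2);
}
```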

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
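Because pthreads has no direct counterpart, a manual-reset event is usually emulated by pairing a condition variable with a mutex and a flag. The sketch below is one common way to do this, not a standard API: the mr_event type and its functions are hypothetical names, and error handling is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until mr_event_reset() */
} mr_event;

void mr_event_init(mr_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: wake every waiter. */
void mr_event_set(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void mr_event_wait(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void mr_event_destroy(mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event "manual-reset": a thread that calls mr_event_wait() after mr_event_set() still passes, which a bare pthread_cond_broadcast() would not guarantee.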

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main, prog1, prog2, pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default-size anonymous pipe; handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using separate instances of the same named pipe
- Server can respond to client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");

    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
            (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 ... App client n)
    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server)
    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Linux Synchronization

Kernel disables interrupts for synchronizing access to global data on uniprocessor systems

Uses spinlocks for multiprocessor synchronization

Uses semaphores and readers-writers locks when longer sections of code need access to data

Implements POSIX synchronization primitives to support multitasking, multithreading (including real-time threads), and multiprocessing

Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc., like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine
- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous

[Figure: trap sources and the handlers they reach]
- Interrupt -> Interrupt dispatcher -> Interrupt service routines
- System service call -> System service dispatcher -> System services
- HW exceptions, SW exceptions -> Exception dispatcher -> Exception handlers
- Virtual address exceptions -> Virtual memory manager's pager

Interrupt Dispatching

An interrupt arrives while user-mode or kernel-mode code is running; the handling code runs in kernel mode

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

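Interrupt Precedence via IRQLs (x86)

IRQL  Level                        Kind
31    High                         Hardware interrupts
30    Power fail                   Hardware interrupts
29    Interprocessor Interrupt     Hardware interrupts
28    Clock                        Hardware interrupts
      Profile & Synch (Srv 2003)   Hardware interrupts
      Device 1                     Hardware interrupts
2     Dispatch/DPC                 Deferrable software interrupts
1     APC                          Deferrable software interrupts
0     Passive/Low                  Normal thread execution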

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

IRQL   x64                              IA64
15     High/Profile                     High/Profile/Power
14     Interprocessor Interrupt/Power   Interprocessor Interrupt
13     Clock                            Clock
12     Synch (Srv 2003)                 Synch (MP only)
...    Device n                         Device n
4      ...                              Device 1
3      Device 1                         Correctable Machine Check
2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
1      APC                              APC
0      Passive/Low                      Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16
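The two deterministic mappings can be expressed directly in code. This sketch reads the x64 and IA64 rule as the IDT vector number divided by 16 with integer division, which is an assumption about the garbled slide text; the function names are hypothetical.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 and IA64 rule, read as IDT vector number / 16 (integer division),
 * so each block of 16 vectors shares one IRQL. */
int irql_from_vector_64bit(int idt_vector)
{
    return idt_vector / 16;
}

void irql_mapping_self_check(void)
{
    assert(irql_from_irq_x86_up(0) == 27);       /* lowest IRQ, highest device IRQL */
    assert(irql_from_irq_x86_up(8) == 19);
    assert(irql_from_vector_64bit(0x2F) == 2);
    assert(irql_from_vector_64bit(0xD1) == 13);
}
```

Note that on x86 UP systems a lower IRQ number yields a higher IRQL, so the mapping inverts the IRQ ordering.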

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device
  - Can be configured with IntFilter utility in Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  - Processors are assigned round robin for each interrupt vector

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
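The "single bit accessed with an atomic test-and-modify operation" idea can be illustrated in portable C11. This is a user-mode sketch built on atomic_flag, not the kernel's KSPIN_LOCK implementation; the spinlock type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode type; the one flag plays the role of the single bit. */
typedef struct {
    atomic_flag held;
} spinlock;

void spin_init(spinlock *l)
{
    atomic_flag_clear(&l->held);          /* start in the free state */
}

/* One atomic test-and-set attempt; true means the caller now owns the lock. */
bool spin_try_acquire(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_acquire(spinlock *l)
{
    while (!spin_try_acquire(l))
        ;   /* busy-wait: every retry is another atomic access to shared memory */
}

void spin_release(spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

void spinlock_self_check(void)
{
    spinlock l;
    spin_init(&l);
    spin_acquire(&l);
    assert(!spin_try_acquire(&l));        /* one owner at a time */
    spin_release(&l);
}
```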

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B both want the DPC queue; the spinlock guards the critical section, and each processor runs the same sequence]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin
        remove DPC from queue
    end

    release_spinlock(DPC)

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
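The queueing idea can be sketched with an MCS-style lock in C11 atomics. This is a swapped-in textbook technique used for illustration, not the Windows queued spinlock code: the qnode and qlock types and all function names are hypothetical, and each caller is assumed to supply its own node, standing in for the processor-local flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;        /* successor in the wait queue */
    atomic_bool             must_wait;   /* the local flag this waiter spins on */
} qnode;

typedef struct {
    _Atomic(qnode *) tail;               /* last waiter, or NULL when the lock is free */
} qlock;

void qlock_init(qlock *l)
{
    atomic_init(&l->tail, (qnode *)NULL);
}

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the end of the queue */
    if (prev == NULL)
        return;                                    /* queue was empty: lock is ours */
    atomic_store(&prev->next, me);                 /* link behind the predecessor */
    while (atomic_load(&me->must_wait))
        ;                                          /* spin on our own flag only */
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                      /* a waiter is still linking in */
    }
    atomic_store(&succ->must_wait, false);         /* hand off to exactly one waiter */
}
```

Waiters are released in the order they joined the queue, which corresponds to the pre-determined wait order on the slide, and each waiter polls only its own node rather than the shared lock word.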

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU
  - Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated
  - Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks)
  - Doesn't use spinlocks when no contention
  - Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once
  - "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" leads to Waiting; "Set object to signaled state" leads to "Wait is complete"]

Interaction bet. Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction bet. Synchronization & Dispatching

Dispatcher re-schedules new thread; it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that changes it from nonsignaled to signaled, the event that changes it back, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

What signals an object (contd)

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
  - See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue: queue head linked to a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down to Dispatch/DPC, APC and Low; the DPC queue; the thread dispatcher]

1. Timer expires; the kernel queues a DPC that will release all waiting threads and requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume which process address space is currently mapped
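
The delivery steps above can be modeled in a few lines of portable C. This is only a single-CPU simulation under stated assumptions: the names `cpu`, `dpc`, `queue_dpc`, `lower_irql` and `count_call` are hypothetical and are not Windows kernel APIs. It shows the one property the slide stresses: queued DPCs stay pending while the IRQL is above dispatch level and are drained, at dispatch level, as soon as the IRQL is about to drop below it.

```c
#include <assert.h>
#include <stddef.h>

enum { PASSIVE_LEVEL = 0, DISPATCH_LEVEL = 2 };

typedef struct dpc {
    void (*routine)(void *context);   /* the deferred procedure */
    void *context;
    struct dpc *next;
} dpc;

typedef struct {
    int irql;     /* current IRQL of this simulated CPU */
    dpc *head;    /* FIFO of pending DPCs (one queue per CPU) */
    dpc *tail;
} cpu;

/* Example DPC routine: counts how often it was invoked. */
void count_call(void *context) { ++*(int *)context; }

/* Called from an ISR: only queue the work, do not run it yet. */
void queue_dpc(cpu *c, dpc *d)
{
    d->next = NULL;
    if (c->tail != NULL) c->tail->next = d; else c->head = d;
    c->tail = d;
}

/* Lower the IRQL. If it would drop below dispatch level, first drain the
   DPC queue at dispatch level. Returns the number of DPCs executed. */
int lower_irql(cpu *c, int new_irql)
{
    int ran = 0;
    assert(new_irql <= c->irql);
    if (new_irql < DISPATCH_LEVEL && c->head != NULL) {
        c->irql = DISPATCH_LEVEL;
        while (c->head != NULL) {
            dpc *d = c->head;
            c->head = d->next;
            if (c->head == NULL) c->tail = NULL;
            d->routine(d->context);
            ran++;
        }
    }
    c->irql = new_irql;
    return ran;
}
```

Dropping from a device IRQL to another device IRQL runs nothing; dropping to passive level runs every queued routine in FIFO order.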

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
- Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend/terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in an "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to the thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
- If IRQL > 2, charge to interrupt time
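
The four accounting cases above amount to a small decision function. The sketch below is illustrative only: `charge_bucket`, `charge_tick` and the two boolean inputs are hypothetical names chosen for this example, not kernel interfaces, and the 0..31 range assumes the x86 IRQL numbering used in this unit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buckets mirroring the four cases on the slide. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

enum { DISPATCH_LEVEL = 2 };

/* Decide which counter one clock tick is charged to, given the IRQL the
   clock ISR interrupted, whether a DPC was running, and the previous mode. */
charge_bucket charge_tick(int irql, bool in_dpc, bool was_user_mode)
{
    assert(irql >= 0 && irql <= 31);   /* assumption: x86 IRQL range */
    if (irql > DISPATCH_LEVEL)
        return CHARGE_INTERRUPT;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL 0 or 1: ordinary thread execution */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is taken only when the clock fires, anything that starts and finishes between two ticks is never seen by this function.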

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (they are not really processes)
- Context switches shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout. Common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
- Some exist exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- When signaled, a wait on the object is satisfied
- Different object types differ in terms of what changes their state
- Wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its WaitBlockList. Every wait block (List entry, Object, Thread, Key, Type, Next link) is linked into the wait list head of the dispatcher object it refers to (Size, Type, State, Wait list head, object-type-specific data) and, via Next link, to the other wait blocks of the same thread]
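
To make the roles of Type and Key concrete, here is a reduced C model of a wait block chain. It is a sketch, not the kernel's layout: the structure and function names are hypothetical, the dispatcher header is cut down to its signaled state, and the return convention (key of the first signaled object for "any", 0 for a satisfied "all", -1 for "keep waiting") is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { bool signaled; } dispatcher_object;   /* header reduced to its state */

typedef enum { WAIT_ANY, WAIT_ALL } wait_type;

typedef struct wait_block {
    dispatcher_object *object;   /* what is being waited for */
    int key;                     /* position in the caller's handle array */
    wait_type type;              /* same value in every block of one wait call */
    struct wait_block *next;     /* next wait block of the same wait call */
} wait_block;

/* Check whether the wait described by the chain can be resolved. */
int wait_satisfied(const wait_block *list)
{
    assert(list != NULL);
    if (list->type == WAIT_ANY) {
        for (const wait_block *wb = list; wb != NULL; wb = wb->next)
            if (wb->object->signaled)
                return wb->key;          /* first signaled object wins */
        return -1;
    }
    for (const wait_block *wb = list; wb != NULL; wb = wb->next)
        if (!wb->object->signaled)
            return -1;                   /* "all" needs every object signaled */
    return 0;
}
```

A two-handle wait is then a chain of two blocks with keys 0 and 1; flipping `type` from `WAIT_ANY` to `WAIT_ALL` changes only the resolution rule, not the data structure.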

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded block raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, just check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  With bWaitAll == FALSE, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to the nonsignaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0) {
        counter += local_value;
    } else {
        /* mutex was abandoned (or the wait failed) */
        if (ret == WAIT_ABANDONED)
            ReleaseMutex( mutex );   /* an abandoned wait still grants ownership */
        break;                       /* exit the loop */
    }
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
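
The blocking and polling variants can be put side by side in a short pthreads sketch. The shared counter and the function names `add_blocking`, `add_polling` and `read_counter` are hypothetical, chosen to echo the counter examples in this unit; only the `pthread_mutex_*` calls are real POSIX API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

/* Blocking acquire: waits as long as needed, like an INFINITE timeout. */
void add_blocking(int value)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    (void)rc;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire: returns 0 if the update was made, or EBUSY if the mutex
   is currently held, like a wait with timeout == 0. */
int add_polling(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc != 0)
        return rc;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 0;
}

/* Read the counter under the lock. */
int read_counter(void)
{
    pthread_mutex_lock(&counter_lock);
    int value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

A caller that gets `EBUSY` from `add_polling()` can do other work and retry later, which is the pattern a zero-timeout `WaitForSingleObject()` supports on Windows.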

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() reports the old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
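
The counting rules can be checked against a tiny single-threaded model. This is not the Windows implementation: `sem_model` and its functions are hypothetical names, and the model covers only the bookkeeping (signaled when count > 0, a wait takes one unit, a release adds several units, fails if it would exceed the maximum, and reports the previous count), not blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { long count; long max; } sem_model;

bool sem_is_signaled(const sem_model *s) { return s->count > 0; }

/* Zero-timeout wait: succeeds and takes one unit only if signaled. */
bool sem_try_wait(sem_model *s)
{
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Release several units at once. Fails, leaving the count unchanged, if
   the release count is not positive or would push the count past max. */
bool sem_release(sem_model *s, long release_count, long *previous)
{
    assert(s->count >= 0 && s->count <= s->max);
    if (release_count <= 0 || release_count > s->max - s->count)
        return false;
    if (previous != NULL)
        *previous = s->count;    /* report the old count to the caller */
    s->count += release_count;
    return true;
}
```

With an initial count of 2 and a maximum of 3, two waits succeed, the third fails, and a release of 3 restores the full pool while reporting a previous count of 0.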

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can release several threads simultaneously and must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event releases a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
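
A condition variable carries no state of its own, which is why there is no direct counterpart to a manual-reset event. One common way to approximate one is to pair the condition variable with a mutex and a boolean flag. The sketch below assumes that approach; `manual_event` and the `event_*` functions are hypothetical names, and the code covers set, reset and wait only, not `PulseEvent()` semantics or timeouts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;               /* the state a condition variable lacks */
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = initial_state;
}

void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);    /* release every waiter */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; the event stays signaled afterwards. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool state = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return state;
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

Because the flag persists, a thread that calls `event_wait()` after `event_set()` returns immediately, which a bare `pthread_cond_wait()` would not do.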

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates prog1 and prog2; prog1 writes into the pipe, prog2 reads from it]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process. */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete. */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with the named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );
- Combines a WriteFile, ReadFile sequence

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
- Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request. */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out. */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client. */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers (readers); the application server acts as the mailslot client (writer). The message is sent periodically]

App client 0 ... App client n (mailslot servers):
    hMS = CreateMailslot( "\\.\mailslot\status" ... );
    ReadFile( hMS, &ServStat ... );
    /* connect to server */

App server (mailslot client):
    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\*\mailslot\status" ... );
        WriteFile( hMS, &StatInfo ... );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieves a handle for a local mailslot
- \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; max message length: 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Further Reading

Ben-Ari, M., Principles of Concurrent Programming, Prentice Hall, 1982

Lamport, L., The Mutual Exclusion Problem, Journal of the ACM, April 1986

Abraham Silberschatz, Peter B. Galvin, Operating System Concepts, John Wiley & Sons, 6th Ed., 2003
- Chapter 7 - Process Synchronization
- Chapter 8 - Deadlocks

3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
- Protects the system from the users
- Protects the user (process) from themselves
- System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
- Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
- Does not affect scheduling
- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
- "Privileged Time" and "User Time"
- 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode
- Via the system service dispatch mechanism
- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
- Scheduled, preempted, etc. like any other threads

Getting Into Kernel Mode

3. Interrupts from external devices
- Interrupt dispatcher invokes the interrupt service routine (ISR)
- ISR runs in the context of the interrupted thread, the so-called "arbitrary thread context"
- ISR often requests the execution of a "DPC routine", which also runs in kernel mode
- Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture an executing thread
- Switch from user to kernel mode
- Interrupts: asynchronous
- Exceptions: synchronous

[Figure: each kind of trap is routed to its dispatcher. Interrupt -> interrupt dispatcher -> interrupt service routines; system service call -> system service dispatcher -> system services; HW and SW exceptions -> exception dispatcher -> exception handlers; virtual address exceptions -> virtual memory manager's pager]

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; the interrupt dispatch routine and the interrupt service routine then run in kernel mode]

Interrupt dispatch routine
- Disable interrupts
- Record machine state (trap frame) to allow resume
- Mask equal- and lower-IRQL interrupts
- Find and call appropriate ISR
- Dismiss interrupt
- Restore machine state (including mode and enabled interrupts)

Interrupt service routine
- Tell the device to stop interrupting
- Interrogate device state, start next operation on device, etc.
- Request a DPC
- Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises the processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

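Interrupt Precedence via IRQLs (x86)

IRQL    Level                          Category
31      High                           Hardware interrupts
30      Power fail                     Hardware interrupts
29      Interprocessor Interrupt       Hardware interrupts
28      Clock                          Hardware interrupts
27      Profile & Synch (Srv 2003)     Hardware interrupts
26..3   Device n ... Device 1          Hardware interrupts
2       Dispatch/DPC                   Deferrable software interrupts
1       APC                            Deferrable software interrupts
0       Passive/Low                    Normal thread execution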

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index to IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of an ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected to one IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and to allocate CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

IRQL    x64                               IA64
15      High/Profile                      High/Profile/Power
14      Interprocessor Interrupt/Power    Interprocessor Interrupt
13      Clock                             Clock
12      Synch (Srv 2003)                  Synch (MP only)
...     Device n                          Device n
4       ...                               Device 1
3       Device 1                          Correctable Machine Check
2       Dispatch/DPC                      Dispatch/DPC & Synch (UP only)
1       APC                               APC
0       Passive/Low                       Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
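
The two fixed mapping rules can be written down as one-line functions. This is a worked illustration, not kernel code: the function names are hypothetical, the x64/IA64 rule is read here as the IDT vector number divided by 16 (integer division), and the input ranges (16 legacy IRQ lines, 256 IDT vectors) are assumptions stated in the asserts.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);       /* assumption: 16 legacy IRQ lines */
    return 27 - irq;
}

/* x64 and IA64 rule: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector_x64(int vector)
{
    assert(vector >= 0 && vector <= 255);  /* assumption: 256 IDT vectors */
    return vector / 16;
}
```

Under these rules a lower IRQ number yields a higher IRQL on x86 UP systems, while on x64 every block of 16 consecutive vectors shares one IRQL, so 256 vectors map onto the 16 levels 0..15.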

Software interrupts

Initiating thread dispatching
- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- A spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
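
The "single bit plus atomic test-and-modify" idea can be sketched with C11 atomics. This is a user-mode illustration under stated assumptions, not the kernel's KSPIN_LOCK implementation: the `spinlock` type and `spin_*` names are hypothetical, and a real kernel spinlock also raises the IRQL, which is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag held; } spinlock;   /* one bit of state */

void spin_init(spinlock *l) { atomic_flag_clear(&l->held); }

/* One atomic test-and-set attempt; true means the caller now owns the lock. */
bool spin_try_acquire(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the test-and-set succeeds. */
void spin_acquire(spinlock *l)
{
    while (!spin_try_acquire(l))
        ;   /* spin */
}

void spin_release(spinlock *l)
{
    /* test-and-set returns the previous value: the lock must have been held */
    bool was_held = atomic_flag_test_and_set(&l->held);
    assert(was_held);
    (void)was_held;
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every failed attempt in `spin_acquire()` is another atomic write to the same memory cell, which is the bus contention that queued spinlocks are designed to avoid.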

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: processors A and B both run the sequence below; the spinlock guards the critical section around the shared DPC queue]

do
    acquire_spinlock(DPC)
until (SUCCESS)

begin
    remove DPC from queue
end

release_spinlock(DPC)

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first waiting processor is modified
- Exactly one processor is being signaled
- Pre-determined wait order
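
One well-known way to build a lock with these properties is an MCS-style queue lock, sketched below with C11 atomics. It is offered as an illustration of the idea only; the source does not say that Windows queued spinlocks are implemented this way, and the `qlock`, `qnode` and `qlock_*` names are hypothetical. Each waiter spins on the `locked` flag in its own node, and the releaser clears the flag of exactly one successor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool locked;             /* processor-local flag to spin on */
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

void qlock_init(qlock *l) { atomic_init(&l->tail, NULL); }

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL)
        return;                                    /* lock was free */
    atomic_store(&prev->next, me);                 /* link behind predecessor */
    while (atomic_load(&me->locked))
        ;                                          /* spin on our own flag only */
}

void qlock_release(qlock *l, qnode *me)
{
    assert(atomic_load(&l->tail) != NULL);         /* lock must be held */
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                      /* successor is still linking in */
    }
    atomic_store(&succ->locked, false);            /* hand over to exactly one waiter */
}
```

Each caller supplies its own `qnode` (typically per processor or on the stack), so the wait order is the order in which nodes were appended to the tail.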

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; with "all", all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual-reset, may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized (create and initialize thread object), Ready, Standby, Running, Waiting (thread waits on an object handle), Transition, Terminated. Setting the object to the signaled state completes the wait and moves the thread out of the Waiting state]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Dispatcher re-schedules the new thread; it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object?

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode): nonsignaled -> signaled when the owning thread releases the mutex; signaled -> nonsignaled when the resumed thread acquires the mutex. Effect: kernel resumes one waiting thread.

Mutex (exported to user mode): nonsignaled -> signaled when the owning thread or another thread releases the mutex; signaled -> nonsignaled when the resumed thread acquires the mutex. Effect: kernel resumes one waiting thread.

Semaphore: nonsignaled -> signaled when one thread releases the semaphore, freeing a resource; signaled -> nonsignaled when a thread acquires the semaphore and no more resources are available. Effect: kernel resumes one or more waiting threads.

What signals an object? (contd.)

Event: nonsignaled -> signaled when a thread sets the event. Effect: kernel resumes one or more waiting threads.

Event pair: nonsignaled -> signaled when a dedicated thread sets one event in the event pair. Effect: kernel resumes the other (waiting) dedicated thread.

Timer: nonsignaled -> signaled when the timer expires; signaled -> nonsignaled when a thread (re)initializes the timer. Effect: kernel resumes all waiting threads.

Thread: nonsignaled -> signaled when the thread terminates; signaled -> nonsignaled when a thread reinitializes the thread object. Effect: kernel resumes all waiting threads.

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
  See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: a DPC queue, drawn as a queue head followed by a chain of DPC objects.]

Delivering a DPC

[Figure: interrupt dispatch table with IRQL levels from High and Power failure down through Dispatch/DPC and APC to Low; a DPC queue holding DPC objects; the dispatcher.]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped
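The four delivery steps can be modeled in a few lines of portable C. This is only a sketch for illustration: `cpu_t`, `dpc_t`, `queue_dpc` and `lower_irql` are hypothetical names, not kernel interfaces, and the "interrupt" is simulated by an ordinary function call.

```c
#include <assert.h>
#include <stddef.h>

#define PASSIVE_LEVEL  0
#define DISPATCH_LEVEL 2

/* Hypothetical DPC object: a routine plus its argument, linked into a FIFO. */
typedef struct dpc {
    void (*routine)(void *arg);
    void *arg;
    struct dpc *next;
} dpc_t;

/* Hypothetical per-processor state: current IRQL and that CPU's DPC queue. */
typedef struct {
    int irql;
    dpc_t *head, *tail;
} cpu_t;

/* Step 1: code running at high IRQL (e.g. an ISR) only queues the work. */
void queue_dpc(cpu_t *cpu, dpc_t *dpc)
{
    dpc->next = NULL;
    if (cpu->tail != NULL)
        cpu->tail->next = dpc;
    else
        cpu->head = dpc;
    cpu->tail = dpc;
}

/* Steps 2-4: when the IRQL is about to drop below dispatch level, drain the
 * queue at dispatch level first. Returns the number of DPC routines run. */
int lower_irql(cpu_t *cpu, int new_irql)
{
    int ran = 0;
    assert(new_irql <= cpu->irql);
    if (new_irql < DISPATCH_LEVEL && cpu->head != NULL) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->head != NULL) {
            dpc_t *d = cpu->head;
            cpu->head = d->next;
            if (cpu->head == NULL)
                cpu->tail = NULL;
            d->routine(d->arg);
            ran++;
        }
    }
    cpu->irql = new_irql;
    return ran;
}

/* Example DPC routine: bump an int counter. */
void count_dpc(void *arg)
{
    ++*(int *)arg;
}
```

Queueing two DPCs at a device IRQL and then lowering to `DISPATCH_LEVEL` runs nothing; only the later drop to `PASSIVE_LEVEL` drains the queue, which mirrors step 2.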

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding a chain of APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
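The four accounting rules amount to a small decision function. The sketch below restates them in C; the enum and function names are made up for illustration, and the `user_mode` flag (which of the thread's two counters gets the tick when IRQL < 2) is an assumption about how the user/kernel split is decided.

```c
#include <assert.h>

/* Hypothetical labels for the counters a clock tick can be charged to. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick goes, given the state the clock ISR interrupted. */
charge_t charge_tick(int irql, int processing_dpc, int user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;            /* servicing a device interrupt */
    if (irql == 2)                          /* dispatch/DPC level */
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```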

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (they are not really processes)
– Context switches shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a header with Size, Type, State and Wait list head, followed by object-type-specific data.]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked to wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); each wait block also hangs off the wait list head of a dispatcher object (header: Size, Type, State, Wait list head; then object-type-specific data).]
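A reduced model of these structures makes the "any" versus "all" distinction concrete. The sketch below is an assumption-laden simplification: the struct and field names are invented, the wait list linkage is left out, and only the chain of wait blocks belonging to one wait call is kept.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WAIT_TYPE_ALL, WAIT_TYPE_ANY } wait_type_t;

/* Simplified stand-in for the common dispatcher header. */
typedef struct {
    int type;
    int size;
    int signaled;                 /* the "State" field: 0 = nonsignaled */
} dispatcher_header_t;

/* One wait block per object passed to the wait call. */
typedef struct wait_block {
    dispatcher_header_t *object;  /* what is being waited on */
    int key;                      /* position in the argument list */
    wait_type_t wait_type;        /* "any" or "all" */
    struct wait_block *next;      /* next block of the same wait call */
} wait_block_t;

/* Returns -1 if the wait is not yet satisfied. For "any", returns the key of
 * the first signaled object; for "all", returns 0 once every object is
 * signaled at the same time. */
int wait_satisfied(const wait_block_t *first)
{
    const wait_block_t *wb;
    if (first == NULL)
        return -1;
    if (first->wait_type == WAIT_TYPE_ANY) {
        for (wb = first; wb != NULL; wb = wb->next) {
            assert(wb->object != NULL);
            if (wb->object->signaled)
                return wb->key;
        }
        return -1;
    }
    for (wb = first; wb != NULL; wb = wb->next) {
        assert(wb->object != NULL);
        if (!wb->object->signaled)
            return -1;
    }
    return 0;
}
```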

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even if an exception occurs */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for first signaled object or all objects
  Function returns index of first signaled object

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
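For comparison with the Windows mutex example, here is a minimal pthreads sketch of the same shared-counter pattern, plus a trylock probe. The helper names (`run_counter_demo`, `try_once`), the thread count and the iteration count are arbitrary choices for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

#define NUM_THREADS 4
#define ITERATIONS  10000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks until the mutex is free */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Runs NUM_THREADS workers and returns the final counter value. */
long run_counter_demo(void)
{
    pthread_t t[NUM_THREADS];
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Polls the mutex once: returns 1 if it could be taken (it is released again
 * right away), 0 if it was busy. Never blocks. */
int try_once(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    if (rc == EBUSY)
        return 0;
    assert(rc == 0);
    pthread_mutex_unlock(m);
    return 1;
}
```

Because every increment happens under the lock, `run_counter_demo()` should return `NUM_THREADS * ITERATIONS`; without the lock the result would usually be smaller.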

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns old semaphore count

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
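The counting rules can be tried out with a small portable model. This is not the Windows implementation: `csem_t` and its functions are hypothetical names, built on a pthread mutex and condition variable, that only imitate the described behavior (wait decrements by 1, release adds any amount up to the maximum and reports the previous count).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  nonzero;
    long count;
    long max;
} csem_t;

/* Returns 1 on success, 0 if the initial/maximum counts are inconsistent. */
int csem_init(csem_t *s, long initial, long max)
{
    assert(s != NULL);
    if (max <= 0 || initial < 0 || initial > max)
        return 0;
    pthread_mutex_init(&s->mtx, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
    return 1;
}

/* Blocking wait: sleeps while the count is 0, then takes one unit. */
void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->mtx);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->mtx);
    s->count--;
    pthread_mutex_unlock(&s->mtx);
}

/* Like a wait with timeout 0: returns 1 if a unit was taken, else 0. */
int csem_trywait(csem_t *s)
{
    int ok;
    pthread_mutex_lock(&s->mtx);
    ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->mtx);
    return ok;
}

/* Adds n units; fails (returns 0, count unchanged) if that would exceed max.
 * On success the previous count is stored through 'previous' if non-NULL. */
int csem_release(csem_t *s, long n, long *previous)
{
    int ok;
    pthread_mutex_lock(&s->mtx);
    ok = n > 0 && s->count + n <= s->max;
    if (ok) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);
    }
    pthread_mutex_unlock(&s->mtx);
    return ok;
}
```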

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
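Although pthreads has no manual-reset event, one can be approximated by pairing a condition variable with a flag that stays set until explicitly reset. The sketch below shows one way to do it; `mr_event_t` and the `mr_event_*` functions are invented names, and the `released` counter exists only so the demo can report how many waiters got through.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    int signaled;    /* stays 1 until mr_event_reset() */
    int released;    /* demo bookkeeping: waiters that passed the event */
} mr_event_t;

void mr_event_init(mr_event_t *e, int initially_signaled)
{
    pthread_mutex_init(&e->mtx, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
    e->released = 0;
}

/* Rough counterpart of SetEvent() on a manual-reset event. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);   /* wake every current waiter */
    pthread_mutex_unlock(&e->mtx);
}

/* Rough counterpart of ResetEvent(). */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = 0;
    pthread_mutex_unlock(&e->mtx);
}

/* Blocks until the event is signaled; does not consume the signal. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->mtx);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->mtx);
    e->released++;
    pthread_mutex_unlock(&e->mtx);
}

static void *waiter(void *arg)
{
    mr_event_wait(arg);
    return NULL;
}

/* Starts n waiters (n <= 16), sets the event once, and returns how many
 * waiters were released. */
int release_waiters(int n)
{
    pthread_t t[16];
    mr_event_t e;
    assert(n >= 0 && n <= 16);
    mr_event_init(&e, 0);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, &e);
        assert(rc == 0);
    }
    mr_event_set(&e);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return e.released;
}
```

A single `mr_event_set()` releases all waiters, including threads that reach `mr_event_wait()` after the set, because the flag persists; that persistence is what a bare `pthread_cond_broadcast()` lacks.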

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main, with prog1 and prog2 connected by a pipe.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default-size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
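As a sketch of the fixed-size-record idea, the function below creates a FIFO with mkfifo(), writes one record and reads it back. To keep it self-contained it plays both roles in one process, which is an artificial setup: the read end is opened with O_NONBLOCK only so that open() does not wait for a writer. The record size, directory template and function name are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 32   /* every request is padded to this many bytes */

/* Sends 'msg' through a freshly created FIFO as one fixed-size record and
 * reads it back into 'reply' (at least RECORD_SIZE bytes).
 * Returns 0 on success, -1 on any failure. */
int fifo_round_trip(const char *msg, char *reply)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";
    char path[64];
    char record[RECORD_SIZE] = {0};
    int rfd = -1, wfd = -1, result = -1;

    assert(msg != NULL && reply != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/requests", dir);

    if (mkfifo(path, 0600) == 0) {
        /* Reader first (non-blocking), then the writer can open at once. */
        rfd = open(path, O_RDONLY | O_NONBLOCK);
        if (rfd >= 0)
            wfd = open(path, O_WRONLY);
        if (wfd >= 0) {
            strncpy(record, msg, RECORD_SIZE - 1);
            if (write(wfd, record, RECORD_SIZE) == RECORD_SIZE &&
                read(rfd, reply, RECORD_SIZE) == RECORD_SIZE)
                result = 0;
        }
        if (wfd >= 0)
            close(wfd);
        if (rfd >= 0)
            close(rfd);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

In a real client/server pair the two ends would live in different processes and the server would normally open its end in blocking mode.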

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many comm.)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: an application server, acting as mailslot client, periodically sends a status message to application clients 0 … n, which act as mailslot servers.]

Mailslot servers (app client 0 … app client n):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (app server); message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


3.2 Trap Dispatching, Interrupts, Synchronization

Trap and Interrupt dispatching

IRQL levels & Interrupt Precedence

Spinlocks and Kernel Synchronization

Executive Synchronization

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing
– Protects the system from the users
– Protects the user (process) from themselves
– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction
– Intel: Ring 0, Ring 3

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back
– Does not affect scheduling
– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters
– "Privileged Time" and "User Time"
– 4 levels of granularity: thread, process, processor, system

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons:

1. Requests from user mode
– Via the system service dispatch mechanism
– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads
– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)
– Scheduled, preempted, etc., like any other threads

3. Interrupts from external devices
– Interrupt dispatcher invokes the interrupt service routine
– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")
– ISR often requests the execution of a "DPC routine", which also runs in kernel mode
– Time not charged to interrupted thread

Trap dispatching

Trap: processor's mechanism to capture the executing thread
– Switch from user to kernel mode
– Interrupts: asynchronous
– Exceptions: synchronous

[Figure: trap handlers and what they dispatch to]
– Interrupt: interrupt dispatcher, which calls interrupt service routines
– System service call: system service dispatcher, which calls system services
– HW exceptions, SW exceptions: exception dispatcher, which calls exception handlers
– Virtual address exceptions: virtual memory manager's pager

Interrupt Dispatching

[Figure: an interrupt arrives while user- or kernel-mode code is running; control passes to kernel-mode code.]

Interrupt dispatch routine
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
– Precedence of the interrupt with respect to other interrupts
– Different interrupt sources have different IRQLs
– Not the same as IRQ

IRQL is also a state of the processor
– Servicing an interrupt raises processor IRQL to that interrupt's IRQL
– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest first]

Hardware interrupts
– 31: High
– 30: Power fail
– 29: Interprocessor Interrupt
– 28: Clock
– Profile & Synch (Srv 2003)
– Device n … Device 1

Deferrable software interrupts
– 2: Dispatch/DPC
– 1: APC

Normal thread execution
– 0: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts processor (single line)
– Processor queries for interrupt vector; uses vector as index to IDT

Alpha
– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn on/off ISR

Interrupt objects can synchronize access to ISR data
– Multiple instances of ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected with the same IRQL

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document, but has never been used

Inter-processor interrupt
– Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
– Used to update system's clock, allocation of CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler: Kernprof.exe, Res Kit)

Predefined IRQLs (contd.)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all; normal thread execution

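IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest first]

x64
– 15: High/Profile
– 14: Interprocessor Interrupt/Power
– 13: Clock
– 12: Synch (Srv 2003)
– Device n … Device 1 (Device 1 at 3)
– 2: Dispatch/DPC
– 1: APC
– 0: Passive/Low

IA64
– 15: High/Profile/Power
– 14: Interprocessor Interrupt
– 13: Clock
– 12: Synch (MP only)
– Device n … Device 1 (Device 1 at 4)
– 3: Correctable Machine Check
– 2: Dispatch/DPC & Synch (UP only)
– 1: APC
– 0: Passive/Low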

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to receive an interrupt?
– By default, any processor can receive an interrupt from any device
  Can be configured with IntFilter utility in Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  Processors are assigned round robin for each interrupt vector
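The two deterministic mappings are plain arithmetic and can be written down directly. The function names are made up, the accepted input ranges are assumptions chosen only to keep results non-negative, and the division by 16 reflects the reading "IDT vector number / 16" of the formula.

```c
#include <assert.h>

/* x86 uniprocessor: IRQL = 27 - IRQ, so lower IRQ numbers get higher IRQLs. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);
    return 27 - irq;
}

/* x64 / IA64: IRQL = IDT vector number / 16 (integer division), so the
 * 256 vectors fall into 16 IRQL levels of 16 vectors each. */
int irql_from_vector_64bit(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

For instance, IRQ 1 maps to IRQL 26 on an x86 UP system, and vector 0xD1 maps to IRQL 13 on x64.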

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

[Figure: Processor A and Processor B each execute the sequence below; the spinlock guards the DPC queue, and removing a DPC from the queue is the critical section.]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
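The acquire/remove/release pattern can be imitated in user space with a C11 `atomic_flag`, which provides the atomic test-and-set a spinlock needs. This is a sketch, not the kernel's KSPIN_LOCK: `demo_spinlock_t` and the helper names are hypothetical, threads stand in for processors, and an integer length stands in for the DPC queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_flag held; } demo_spinlock_t;

void spin_init(demo_spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* One atomic test-and-set: returns 1 if the lock was free and is now ours. */
int spin_try_acquire(demo_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* "do acquire_spinlock until (SUCCESS)": busy-wait until the lock is taken. */
void spin_acquire(demo_spinlock_t *l)
{
    while (!spin_try_acquire(l))
        ;
}

void spin_release(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock_t queue_lock;
static int queue_length;          /* stand-in for the shared DPC queue */

static void *drain(void *arg)
{
    int *removed = arg;
    for (;;) {
        int empty;
        spin_acquire(&queue_lock);
        empty = (queue_length == 0);
        if (!empty) {             /* critical section: remove one entry */
            queue_length--;
            (*removed)++;
        }
        spin_release(&queue_lock);
        if (empty)
            return NULL;
    }
}

/* Two threads drain 'entries' items; returns the total number removed. */
int drain_with_two_threads(int entries)
{
    pthread_t a, b;
    int removed_a = 0, removed_b = 0, rc;
    spin_init(&queue_lock);
    queue_length = entries;
    rc = pthread_create(&a, NULL, drain, &removed_a);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, drain, &removed_b);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return removed_a + removed_b;
}
```

Since each removal happens while the lock is held, the two per-thread counts always add up to the initial queue length, whichever thread wins each round.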

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the first waiting processor's flag is modified
– Exactly one processor is being signaled
– Pre-determined wait order

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length they are held
– Scheduling database now per-CPU
  Allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated
  Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks)
  Doesn't use spinlocks when no contention
  Used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once
  "All": all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include optional timeout argument
– Waiting threads consume no CPU time

Waiting

Waitable objects include
– Events (may be auto-reset or manual reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g., it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: owning thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: owning thread or other thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: one thread releases the semaphore, freeing a resource

– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (contd.)

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Event

– nonsignaled → signaled: a thread sets the event

– signaled → nonsignaled: kernel resumes one or more threads

– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: dedicated thread sets one event in the event pair

– signaled → nonsignaled: kernel resumes the other dedicated thread

– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: timer expires

– signaled → nonsignaled: a thread (re)initializes the timer

– Effect on waiting threads: kernel resumes all waiting threads

Thread

– nonsignaled → signaled: thread terminates

– signaled → nonsignaled: a thread reinitializes the thread object

– Effect on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

– Maximum times recommended: ISR 10 µsec, DPC 25 µsec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, with a queue head linked to a list of DPC objects]

6666

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads, and requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low, next to the DPC queue of DPC objects that feeds the dispatcher]

6767

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

– Permission required for user-mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: thread object with a kernel-mode (K) and a user-mode (U) APC queue, each a list of APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
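
The four accounting rules above amount to a small decision function. The following is a minimal sketch in portable C, not the kernel's actual code; the enum, the function name `charge_tick`, and the `in_user_mode` parameter are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Buckets a clock tick can be charged to (illustrative names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

#define DISPATCH_LEVEL 2   /* IRQL 2, as on the slide */

/* Decide where one clock tick is charged, following the four rules:
   irql         - IRQL the processor was at when the clock fired
   in_dpc       - true if a DPC routine was being processed
   in_user_mode - true if the interrupted thread ran in user mode */
charge_t charge_tick(int irql, bool in_dpc, bool in_user_mode)
{
    assert(irql >= 0);
    if (irql < DISPATCH_LEVEL)
        return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;   /* IRQL > 2: device interrupt time */
}
```

For example, `charge_tick(2, true, false)` selects the DPC bucket, while any IRQL above 2 is charged to interrupt time regardless of the other arguments.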

7373

IRQLs and CPU Time Accounting

Since the time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– The context switches shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exist exclusively for synchronization, e.g., events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g., processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout, with a common header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type, and a Next link]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section, other than trying to enter it with TryEnterCriticalSection()

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs exactly once per Enter, even if the body raises */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles,
        BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects
  When waiting for the first signaled object, the function returns the index of that object (as an offset from WAIT_OBJECT_0)

Side effects

– Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions
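
Decoding the DWORD returned by WaitForMultipleObjects() is a common source of mistakes, so a small sketch may help. To keep it compilable without <windows.h>, the constants below are redefined with an X prefix; their values mirror the Win32 constants WAIT_OBJECT_0, WAIT_ABANDONED_0, WAIT_TIMEOUT, and WAIT_FAILED. The function name `decode_wait` and the `wait_kind` enum are hypothetical helpers, not part of the Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the Win32 constants of the same names (without the X). */
#define XWAIT_OBJECT_0     0x00000000u
#define XWAIT_ABANDONED_0  0x00000080u
#define XWAIT_TIMEOUT      0x00000102u
#define XWAIT_FAILED       0xFFFFFFFFu

typedef enum { WR_SIGNALED, WR_ABANDONED, WR_TIMEOUT, WR_FAILED } wait_kind;

/* Classify a wait-any result for `count` handles.
   On WR_SIGNALED or WR_ABANDONED, *index receives the array position
   of the object that satisfied the wait. */
wait_kind decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    assert(count >= 1 && count <= 64);   /* MAXIMUM_WAIT_OBJECTS is 64 */
    if (ret < XWAIT_OBJECT_0 + count) {
        *index = ret - XWAIT_OBJECT_0;
        return WR_SIGNALED;
    }
    if (ret >= XWAIT_ABANDONED_0 && ret < XWAIT_ABANDONED_0 + count) {
        *index = ret - XWAIT_ABANDONED_0;   /* an abandoned mutex */
        return WR_ABANDONED;
    }
    if (ret == XWAIT_TIMEOUT)
        return WR_TIMEOUT;
    return WR_FAILED;
}
```

In real code the first argument would be the value returned by WaitForMultipleObjects() with bWaitAll == FALSE; with bWaitAll == TRUE the index carries no useful information.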

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed when its last handle is closed

8585

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa,
        BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD fdwAccess,
        BOOL fInherit, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
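
The blocking and polling variants compared above can be sketched with a shared counter, mirroring the Windows mutex example. This is an illustrative sketch only; the names `demo_lock`, `demo_counter`, `add_blocking`, `add_if_free`, and `current_total` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter = 0;

/* Blocking update: plays the role of WaitForSingleObject(hMutex, INFINITE). */
void add_blocking(int value)
{
    int rc = pthread_mutex_lock(&demo_lock);
    assert(rc == 0);
    (void)rc;
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
}

/* Polling update: plays the role of WaitForSingleObject(hMutex, 0).
   Returns 1 if the counter was updated, 0 if the mutex was busy. */
int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&demo_lock);
    if (rc == EBUSY)
        return 0;               /* someone else holds the lock */
    assert(rc == 0);
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
    return 1;
}

/* Read the counter under the lock. */
int current_total(void)
{
    pthread_mutex_lock(&demo_lock);
    int v = demo_counter;
    pthread_mutex_unlock(&demo_lock);
    return v;
}
```

One difference worth keeping in mind: a Windows mutex may be re-acquired by the thread that already owns it, whereas a default pthread mutex must not be locked again by its owner.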

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)

8989

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa,
        LONG cSemInit, LONG cSemMax,
        LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD fdwAccess,
        BOOL fInherit, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore,
        LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

9191

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa,
        BOOL fManualReset, BOOL fInitialState,
        LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
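
Since condition variables carry no state of their own, a manual-reset event has to be emulated by pairing one with a mutex and a flag. The following is one possible sketch, not a complete replacement for Windows events (no timeouts, no PulseEvent); the type `mr_event` and all `mr_event_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event: the flag stays true until reset. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            signaled;
} mr_event;

void mr_event_init(mr_event *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->mtx, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void mr_event_set(mr_event *e)          /* role of SetEvent() */
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* wake all current waiters */
    pthread_mutex_unlock(&e->mtx);
}

void mr_event_reset(mr_event *e)        /* role of ResetEvent() */
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = false;
    pthread_mutex_unlock(&e->mtx);
}

void mr_event_wait(mr_event *e)         /* role of an infinite wait */
{
    pthread_mutex_lock(&e->mtx);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->mtx);
    pthread_mutex_unlock(&e->mtx);
}

bool mr_event_is_set(mr_event *e)       /* role of a wait with timeout 0 */
{
    pthread_mutex_lock(&e->mtx);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->mtx);
    return s;
}

void mr_event_destroy(mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->mtx);
}
```

Because the flag persists, threads that call `mr_event_wait()` after `mr_event_set()` return immediately, which is the behavior a bare pthread_cond_broadcast() cannot provide.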

9393

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite,
        LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

9494

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }

    CloseHandle (hWritePipe);

9595

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }

    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

    return 0;

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using instances of the same named pipe

– Server can respond to a client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

9797

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName,
        DWORD fdwOpenMode, DWORD fdwPipeMode,
        DWORD nMaxInstances, DWORD cbOutBuf,
        DWORD cbInBuf, DWORD dwTimeOut,
        LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (the "." denotes the local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use the same flag settings for all instances of a named pipe

9898

Named Pipes (contd.)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() creates the named pipe

– Closing the handle to the last instance deletes the named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile()?

    BOOL PeekNamedPipe( HANDLE hPipe,
        LPVOID lpvBuffer, DWORD cbBuffer,
        LPDWORD lpcbRead, LPDWORD lpcbAvail,
        LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile() with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile(), ReadFile() sequence

101101

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle()

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if a client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);

    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");

        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }

        printf( "Request is %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");

        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

        fclose (fp);

        DisconnectNamedPipe (hNamedPipe);
    }

    /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates a mailslot with CreateMailslot()

– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in the network domain

107107

Locate a server via mailslot

App client 0 … App client n (mailslot servers):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client); the message is sent periodically:

    while (…) {
        Sleep(…);
        hMS = CreateFile( "\\*\mailslot\status" );
        …
        WriteFile( hMS, &StatInfo );
    }

108108

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName,
        DWORD cbMaxMsg, DWORD dwReadTimeout,
        LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile() with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


3333

Kernel Mode Versus User Mode

A processor state

Controls access to memory

Each memory page is tagged to show the required mode for reading and for writing

– Protects the system from the users

– Protects the user (process) from themselves

– System is not protected from system

Code regions are tagged "no write in any mode"

Controls ability to execute privileged instructions

A Windows abstraction

– Intel: Ring 0, Ring 3

3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

– Does not affect scheduling

– Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

– "Privileged Time" and "User Time"

– 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

– Via the system service dispatch mechanism

– Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

– Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

– Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

– Interrupt dispatcher invokes the interrupt service routine

– ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

– ISR often requests the execution of a "DPC routine", which also runs in kernel mode

– Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture an executing thread

– Switch from user to kernel mode

– Interrupts: asynchronous

– Exceptions: synchronous

[Figure: trap handlers and what they dispatch to]

– Interrupt → interrupt dispatcher → interrupt service routines

– System service call → system service dispatcher → system services

– HW exceptions, SW exceptions → exception dispatcher → exception handlers

– Virtual address exceptions → virtual memory manager's pager

Interrupt Dispatching

Interrupt dispatch routine

1. Disable interrupts

2. Record machine state (trap frame) to allow resume

3. Mask equal- and lower-IRQL interrupts

4. Find and call appropriate ISR

5. Dismiss interrupt

6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine

1. Tell the device to stop interrupting

2. Interrogate device state, start next operation on device, etc.

3. Request a DPC

4. Return to caller

[Figure: an interrupt arrives while user- or kernel-mode code is running; the dispatch routine and the ISR run in kernel mode]

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

– Precedence of the interrupt with respect to other interrupts

– Different interrupt sources have different IRQLs

– Not the same as IRQ

IRQL is also a state of the processor

– Servicing an interrupt raises processor IRQL to that interrupt's IRQL

– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

Interrupt Precedence via IRQLs (x86)

    31  High
    30  Power fail
    29  Interprocessor interrupt
    28  Clock
        Profile & Synch (Srv 2003)
        …
        Device 1
     2  Dispatch/DPC
     1  APC
     0  Passive/Low

– High down to Device 1: hardware interrupts

– Dispatch/DPC and APC: deferrable software interrupts

– Passive/Low: normal thread execution

4141

Interrupt processing

Interrupt dispatch table (IDT)

– Links to interrupt service routines

x86

– Interrupt controller interrupts processor (single line)

– Processor queries for interrupt vector; uses vector as index to IDT

Alpha

– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel

– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

– Contains dispatch code (initial handler)

– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

– Dynamic association between ISR and IDT entry

– Loadable device drivers (kernel modules)

– Turn ISR on/off

Interrupt objects can synchronize access to ISR data

– Multiple instances of an ISR may be active simultaneously (MP machine)

– Multiple ISRs may be connected to the same IRQL

4343

Predefined IRQLs

High

– Used when halting the system (via KeBugCheck())

Power fail

– Originated in the NT design document, but has never been used

Inter-processor interrupt

– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

– Used to update system's clock and for allocation of CPU time to threads

Profile

– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd.)

Device

– Used to prioritize device interrupts

DPC/dispatch and APC

– Software interrupts that kernel and device drivers generate

Passive

– No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

x64

    15  High/Profile
    14  Interprocessor interrupt/Power
    13  Clock
    12  Synch (Srv 2003)
        Device n
        …
        Device 1
     2  Dispatch/DPC
     1  APC
     0  Passive/Low

IA64

    15  High/Profile/Power
    14  Interprocessor interrupt
    13  Clock
    12  Synch (MP only)
        Device n
        …
     4  Device 1
     3  Correctable Machine Check
     2  Dispatch/DPC & Synch (UP only)
     1  APC
     0  Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows

– x86 UP systems: IRQL = 27 - IRQ

– x86 MP systems: bucketized (random)

– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

– By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit

– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
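
The two closed-form IRQL rules from this slide can be written down as a short worked example. This sketch reads the x64/IA64 rule as an integer division of the IDT vector number by 16, which is an assumption about the slide's intent; the function names are hypothetical.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);
    return 27 - irq;
}

/* x64 / IA64 rule, read as integer division: IRQL = vector / 16. */
int irql_from_vector(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

Under these rules, IRQ 1 on an x86 UP system maps to IRQL 26, and IDT vector 0x81 on x64 maps to IRQL 8, so the sixteen vectors 0x80 through 0x8F all share one IRQL.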

4747

Software interrupts

Initiating thread dispatching

– DPCs allow for scheduling actions when the kernel is deep within many layers of code

– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

– A spinlock is either free or is considered to be owned by a CPU

– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

– Accessed with a test-and-modify operation that is atomic across all processors

– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

– To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                       /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

[Figure: processors A and B contend for the spinlock that guards the DPC queue]

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag

– Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first waiting processor is modified

– Exactly one processor is being signaled

– Pre-determined wait order
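
The queueing idea can be illustrated with an MCS-style lock in C11, in which each waiter spins only on a flag in its own node and the releaser hands the lock to exactly one successor. This is a sketch of the general technique under that assumption, not the Windows kernel's queued spinlock code; `qnode`, `qlock`, and the `q_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-waiter node: each processor spins on its own `locked` flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             locked;
} qnode;

/* The lock itself is just a pointer to the last waiter in the queue. */
typedef struct { _Atomic(qnode *) tail; } qlock;

void q_init(qlock *l) { atomic_init(&l->tail, (qnode *)NULL); }

void q_acquire(qlock *l, qnode *me)
{
    assert(me != NULL);
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind predecessor */
        while (atomic_load(&me->locked))
            ;                                      /* spin on the local flag only */
    }
}

void q_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;
        /* A waiter is in the middle of enqueueing; wait for its link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}
```

Each thread passes its own `qnode` (typically on its stack) to `q_acquire()` and the same node to `q_release()`; waiters are served in the order in which they joined the queue.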

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length of time they are held

– Scheduling database is now per-CPU, which allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call

- A wait for multiple objects can wait for "any" one or for "all" at once; for "all", all objects must be in the signaled state concurrently to resolve the wait

- All wait calls include an optional timeout argument

- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")

- Mutexes ("mutual exclusion", one-at-a-time)

- Semaphores (n-at-a-time)

- Timers

- Processes and Threads (signaled upon exit or terminate)

- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. Labeled steps: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread - it may preempt the running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled to signaled: owning thread releases mutex

- signaled to nonsignaled: resumed thread acquires mutex

- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled to signaled: owning thread or other thread releases mutex

- signaled to nonsignaled: resumed thread acquires mutex

- Effect on waiting threads: kernel resumes one waiting thread

Semaphore

- nonsignaled to signaled: one thread releases the semaphore, freeing a resource

- signaled to nonsignaled: a thread acquires the semaphore; more resources are not available

- Effect on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (cont'd)

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Event

- nonsignaled to signaled: a thread sets the event

- signaled to nonsignaled: kernel resumes one or more threads

- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

- nonsignaled to signaled: dedicated thread sets one event in the event pair

- signaled to nonsignaled: kernel resumes the other dedicated thread

- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

- nonsignaled to signaled: timer expires

- signaled to nonsignaled: a thread (re)initializes the timer

- Effect on waiting threads: kernel resumes all waiting threads

Thread

- nonsignaled to signaled: thread terminates

- signaled to nonsignaled: a thread reinitializes the thread object

- Effect on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

Figure: DPC queue - a queue head linked to a list of DPC objects

6666

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low; DPC queue holding DPC objects; dispatcher
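
The four delivery steps can be mimicked with a toy single-CPU model. Everything in it is hypothetical (cpu_model_t, queue_dpc, lower_irql, the fixed queue size); it only illustrates that queued routines stay pending until the IRQL drops below dispatch/DPC level.

```c
#include <stddef.h>

#define DPC_LEVEL 2      /* dispatch/DPC IRQL, as used on the slides */
#define MAX_DPCS  8      /* arbitrary capacity for this model */

typedef void (*dpc_routine_t)(void *context);

typedef struct {
    dpc_routine_t routine;
    void *context;
} dpc_t;

/* Hypothetical per-CPU state: a ring buffer of DPCs plus the current IRQL. */
typedef struct {
    dpc_t items[MAX_DPCS];
    size_t head, count;
    int irql;
} cpu_model_t;

/* Step 1: queue a DPC; returns 0 if the model's queue is full. */
static int queue_dpc(cpu_model_t *cpu, dpc_routine_t routine, void *context)
{
    if (cpu->count == MAX_DPCS)
        return 0;
    size_t slot = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->items[slot].routine = routine;
    cpu->items[slot].context = context;
    cpu->count++;
    return 1;
}

/* Steps 2-4: once IRQL drops below DPC level, drain the queue in order. */
static void lower_irql(cpu_model_t *cpu, int new_irql)
{
    cpu->irql = new_irql;
    if (new_irql >= DPC_LEVEL)
        return;                      /* still masked: DPCs stay queued */
    cpu->irql = DPC_LEVEL;           /* routines run at dispatch/DPC level */
    while (cpu->count > 0) {
        dpc_t d = cpu->items[cpu->head];
        cpu->head = (cpu->head + 1) % MAX_DPCS;
        cpu->count--;
        d.routine(d.context);
    }
    cpu->irql = new_irql;
}

/* Example routine for the model: counts how often it was called. */
static void count_call(void *context)
{
    ++*(int *)context;
}
```

An ISR in this model would call queue_dpc() while the IRQL is at device level; nothing runs until lower_irql() brings the level below DPC_LEVEL, at which point the routines execute in FIFO order.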

6767

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

Figure: a thread object with its kernel-mode (K) and user-mode (U) queues of APC objects

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
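
The four rules read naturally as a small decision function. The sketch below is only a model of the slide: charge_t, clock_tick_charge, and the was_user_mode parameter are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical buckets matching the four cases on the slide. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

#define DISPATCH_IRQL 2   /* dispatch/DPC level */

/* Decide where one clock tick is charged, given the interrupted state. */
static charge_t clock_tick_charge(int irql, bool in_dpc, bool was_user_mode)
{
    if (irql < DISPATCH_IRQL)
        return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == DISPATCH_IRQL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;      /* IRQL > 2: device interrupt level */
}
```

Because the decision is made once per clock tick, whatever happens to be running at the instant the clock ISR fires receives the whole interval.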

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (they are not really processes)

- The context switch counts shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted for

- E.g. one or more threads run and enter a wait state before the clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add the Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

Figure: dispatcher object layout - header fields Size, Type, State, and Wait list head, followed by object-type-specific data

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

Figure: thread objects point to their wait blocks through WaitBlockList; each wait block holds List entry, Object, Thread, Key, Type, and Next link fields; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) heads the list of wait blocks that reference it

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections

- Mutexes

- Semaphores

- Event objects

Synchronization through interprocess communication

- Anonymous pipes

- Named pipes

- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    /* __finally runs on every exit path, so the section is left exactly once */
    __finally { LeaveCriticalSection( &crit ); }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

- dwTimeout == 0 - no wait, check whether object is signaled

- dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles - pointer to array identifying these objects

- bWaitAll - whether to wait for the first signaled object or for all objects

- Function returns index of first signaled object

Side effects

- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions
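
The "any" versus "all" rule can be stated as a small pure function over a snapshot of object states. This is a model only, not the Win32 API: wait_resolution and WAIT_NOT_SATISFIED are hypothetical names, and the real call additionally blocks, honors the timeout, and applies the side effects listed above.

```c
#include <stdbool.h>
#include <stddef.h>

#define WAIT_NOT_SATISFIED (-1)   /* hypothetical sentinel for this model */

/*
 * Given which objects are currently signaled, return the index that would
 * resolve the wait, or WAIT_NOT_SATISFIED if the caller would keep waiting.
 * With wait_all set, the wait resolves only when every object is signaled.
 */
static int wait_resolution(const bool signaled[], size_t count, bool wait_all)
{
    int first = WAIT_NOT_SATISFIED;
    for (size_t i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == WAIT_NOT_SATISFIED)
                first = (int)i;          /* lowest signaled index */
        } else if (wait_all) {
            return WAIT_NOT_SATISFIED;   /* one object still nonsignaled */
        }
    }
    return (wait_all && first != WAIT_NOT_SATISFIED) ? 0 : first;
}
```

In the real API the returned value is offset by WAIT_OBJECT_0, and a "wait all" is satisfied atomically: no object's state is modified until all of them are signaled at the same time.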

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
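
A minimal sketch of the blocking and polling styles with the pthread calls listed above. The shared counter and the helper names add_blocking and add_polling are hypothetical; a real program would call them from several threads.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking update: waits for the mutex, like a wait with INFINITE timeout. */
static void add_blocking(int value)
{
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling update: returns 0 on success, or EBUSY if the mutex is held. */
static int add_polling(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc != 0)
        return rc;               /* caller decides whether to retry */
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 0;
}
```

The polling variant never sleeps, so a caller can do other work and retry later instead of being descheduled while the mutex is busy.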

8888

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value

- ReleaseSemaphore() returns old semaphore count
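
The counting rules can be captured in a few lines of plain C. This is an in-memory model for illustration, not the Win32 semaphore object; sem_model_t and its functions are hypothetical, and no blocking or thread safety is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the counting rules only. */
typedef struct {
    long count;   /* current count; the object is signaled while count > 0 */
    long max;     /* upper bound fixed at creation */
} sem_model_t;

static bool sem_model_is_signaled(const sem_model_t *s)
{
    return s->count > 0;
}

/* A satisfied wait takes one unit; false means a real wait would block. */
static bool sem_model_try_wait(sem_model_t *s)
{
    if (s->count <= 0)
        return false;
    s->count -= 1;
    return true;
}

/* Release n units and report the old count; fails unchanged past max. */
static bool sem_model_release(sem_model_t *s, long n, long *previous)
{
    if (n <= 0 || s->count + n > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

The release function mirrors ReleaseSemaphore() in two respects: the previous count comes back through an output parameter, and a release that would push the count past the maximum fails without changing the count.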

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
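
Since there is no exact equivalent, a manual-reset event is usually approximated by pairing a condition variable with a mutex and a boolean flag. The sketch below shows one such approximation; manual_event_t and the event_* functions are hypothetical names, not part of pthreads.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event type: mutex + condition variable + state flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event_t;

static void event_init(manual_event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent on a manual-reset event: wake every waiter, stay signaled. */
static void event_set(manual_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
static void event_reset(manual_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Returns at once if signaled; the loop guards against spurious wakeups. */
static void event_wait(manual_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void event_destroy(manual_event_t *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the state persistent: a condition variable alone has no memory, so a broadcast sent before a thread starts waiting would otherwise be lost.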

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

Figure: main creates a pipe that connects prog1 to prog2

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
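
For comparison, the same one-way, byte-stream behavior can be tried with the POSIX pipe() call, which is an analog chosen here for illustration and does not appear on the slides. The helper name pipe_roundtrip is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg through a fresh pipe and reads it back; returns bytes read or -1. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                        /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;
    ssize_t written = write(fds[1], msg, strlen(msg));
    close(fds[1]);                     /* reader sees end-of-file after the data */
    ssize_t got = (written < 0) ? -1 : read(fds[0], out, out_size);
    close(fds[0]);
    return got;
}
```

Data flows only from the write end to the read end; for traffic in the other direction a second pipe is needed, which is the same restriction the slide states for anonymous pipes.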

9494

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

9595

Pipe example (cont'd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server, each using its own instance of the same named pipe

- Server can respond to a client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. - local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (cont'd)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

- Closing the handle to the last instance deletes the named pipe

Polling a pipe

- Nondestructive - is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in one call

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE if a client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
            (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

107107

Locate a server via mailslot

Figure: mailslot servers (App client 0 ... App client n) each run

hMS = CreateMailslot( "\\.\mailslot\status" ); ReadFile( hMS, &ServStat ); /* connect to server */

Figure: the mailslot client (App Server) sends a message periodically

while (...) { Sleep(...); hMS = CreateFile( "\\*\mailslot\status" ); ... WriteFile( hMS, &StatInfo ... ) }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0 - immediate return

- MAILSLOT_WAIT_FOREVER - infinite wait

109109

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length is 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


3434

Kernel Mode Versus User Mode

Control flow (a thread) can change from user to kernel mode and back

- Does not affect scheduling

- Thread context includes info about execution mode (along with registers, etc.)

PerfMon counters

- "Privileged Time" and "User Time"

- 4 levels of granularity: thread, process, processor, system

3535

Getting Into Kernel Mode

Code is run in kernel mode for one of three reasons

1. Requests from user mode

- Via the system service dispatch mechanism

- Kernel-mode code runs in the context of the requesting thread

2. Dedicated kernel-mode system threads

- Some threads in the system stay in kernel mode at all times (mostly in the "System" process)

- Scheduled, preempted, etc., like any other threads

3636

Getting Into Kernel Mode

3. Interrupts from external devices

- Interrupt dispatcher invokes the interrupt service routine

- ISR runs in the context of the interrupted thread (so-called "arbitrary thread context")

- ISR often requests the execution of a "DPC routine", which also runs in kernel mode

- Time not charged to interrupted thread

3737

Trap dispatching

Trap: processor's mechanism to capture an executing thread

- Switch from user to kernel mode

- Interrupts - asynchronous

- Exceptions - synchronous

Figure: each kind of trap is routed to its handler

- Interrupt: interrupt dispatcher, then interrupt service routines

- System service call: system service dispatcher, then system services

- HW exceptions, SW exceptions: exception dispatcher, then exception handlers

- Virtual address exceptions: virtual memory manager's pager

Interrupt Dispatching

Figure: an interrupt arrives while user or kernel mode code is running; the interrupt dispatch routine and the interrupt service routine run in kernel mode

Interrupt dispatch routine

- Disable interrupts

- Record machine state (trap frame) to allow resume

- Mask equal- and lower-IRQL interrupts

- Find and call appropriate ISR

- Dismiss interrupt

- Restore machine state (including mode and enabled interrupts)

Interrupt service routine

- Tell the device to stop interrupting

- Interrogate device state, start next operation on device, etc.

- Request a DPC

- Return to caller

Note: no thread or process context switch

3939

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level

- Precedence of the interrupt with respect to other interrupts

- Different interrupt sources have different IRQLs

- Not the same as IRQ

IRQL is also a state of the processor

- Servicing an interrupt raises processor IRQL to that interrupt's IRQL

- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

4040

Interrupt Precedence via IRQLs (x86)

Hardware interrupts

- 31: High

- 30: Power fail

- 29: Interprocessor Interrupt

- 28: Clock

- Profile & Synch (Srv 2003)

- Device 1 (lowest device level)

Deferrable software interrupts

- 2: Dispatch/DPC

- 1: APC

Normal thread execution

- 0: Passive/Low

4141

Interrupt processing

Interrupt dispatch table (IDT)

- Links to interrupt service routines

x86

- Interrupt controller interrupts processor (single line)

- Processor queries for interrupt vector; uses vector as index to IDT

Alpha

- PAL code (Privileged Architecture Library - Alpha BIOS) determines interrupt vector, calls kernel

- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices

- Contains dispatch code (initial handler)

- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects

- Dynamic association between ISR and IDT entry

- Loadable device drivers (kernel modules)

- Turn on/off ISR

Interrupt objects can synchronize access to ISR data

- Multiple instances of an ISR may be active simultaneously (MP machine)

- Multiple ISRs may be connected to one IRQL

4343

Predefined IRQLs

High - used when halting the system (via KeBugCheck())

Power fail - originated in the NT design document, but has never been used

Inter-processor interrupt

- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock

- Used to update system's clock, allocation of CPU time to threads

Profile

- Used for kernel profiling (see Kernel profiler - Kernprof.exe, Res Kit)

4444

Predefined IRQLs (cont'd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC

- Software interrupts that kernel and device drivers generate

Passive

- No interrupt level at all, normal thread execution

4545

IRQLs on 64-bit Systems

x64

- 15: High/Profile

- 14: Interprocessor Interrupt/Power

- 13: Clock

- 12: Synch (Srv 2003)

- Device n

- 3: Device 1

- 2: Dispatch/DPC

- 1: APC

- 0: Passive/Low

IA64

- 15: High/Profile/Power

- 14: Interprocessor Interrupt

- 13: Clock

- 12: Synch (MP only)

- Device n

- 4: Device 1

- 3: Correctable Machine Check

- 2: Dispatch/DPC & Synch (UP only)

- 1: APC

- 0: Passive/Low

4646

Interrupt Prioritization amp Delivery

IRQLs are determined as follows

- x86 UP systems: IRQL = 27 - IRQ

- x86 MP systems: bucketized (random)

- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?

- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit

- On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL

- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
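
The two mapping rules, together with the masking rule from the IRQL slide (servicing an interrupt masks equal and lower IRQLs), fit in three one-line functions. The function names are hypothetical and the code only restates the arithmetic; it is not how the HAL is implemented.

```c
#include <stdbool.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
static int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 / IA64 rule: IRQL = IDT vector number / 16 (integer division). */
static int irql_from_vector_x64(int vector)
{
    return vector / 16;
}

/* An interrupt is held back while the processor runs at an equal or higher IRQL. */
static bool interrupt_is_masked(int processor_irql, int source_irql)
{
    return source_irql <= processor_irql;
}
```

Under the x86 UP rule a lower IRQ number yields a higher IRQL, so IRQ 0 lands at IRQL 27 and IRQ 8 at IRQL 19.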

4747

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code

- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

- Spinlock is either free or is considered to be owned by a CPU

- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

- Accessed with a test-and-modify operation that is atomic across all processors

- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG

- To implement synchronization, a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

53

SMP Scalability Improvements

XP/2003:

– Minimized lock contention for hot locks: PFN (Page Frame Database) lock

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

54

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once; "all" means all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include an optional timeout argument

– Waiting threads consume no CPU time

55

Waiting

Waitable objects include:

– Events (may be auto-reset or manual-reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

56

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

57

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: thread state diagram showing interaction with thread scheduling. States: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state"]

58

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

59

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread – it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

60

What signals an object?

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: owning thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: owning thread or other thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: one thread releases the semaphore, freeing a resource

– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect on waiting threads: kernel resumes one or more waiting threads

61

What signals an object? (contd.)

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Event

– nonsignaled → signaled: a thread sets the event

– signaled → nonsignaled: kernel resumes one or more threads

– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: dedicated thread sets one event in the event pair

– signaled → nonsignaled: kernel resumes the other dedicated thread

– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: timer expires

– signaled → nonsignaled: a thread (re)initializes the timer

– Effect on waiting threads: kernel resumes all waiting threads

Thread

– nonsignaled → signaled: thread terminates

– signaled → nonsignaled: a thread reinitializes the thread object

– Effect on waiting threads: kernel resumes all waiting threads

62

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

63

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

64

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

65

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head → DPC object → DPC object → DPC object]

66

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down to Dispatch/DPC, APC and Low; a DPC queue holding DPC objects; the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

67

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs

– Permission required for user-mode APCs

68

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

69

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable, but not documented

70

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

71

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding APC objects]

72

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
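The four accounting rules above can be written as one small decision function. This is a sketch for illustration only: the enum, the function name charge_tick, and the in_dpc and user_mode flags are hypothetical and do not come from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

enum { DISPATCH_LEVEL = 2 };   /* dispatch/DPC IRQL, as in the rules above */

/* Where one clock tick gets charged. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC_TIME,
    CHARGE_INTERRUPT_TIME
} charge_t;

/* irql:      IRQL of the processor when the clock interrupt arrived
   in_dpc:    true if a DPC routine was executing
   user_mode: true if the interrupted thread was running user-mode code */
static charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql < DISPATCH_LEVEL)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC_TIME : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT_TIME;  /* device interrupt levels and above */
}
```

For example, charge_tick(2, true, false) yields CHARGE_DPC_TIME, while any tick that arrives at an IRQL above 2 is counted as interrupt time and not charged to a thread.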

73

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (they are not really processes)

– Context switches shown for these are really the number of interrupts & DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus, threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object (see \ntddk\inc\ddk\ntddk.h): header with Size, Type and State fields and the wait list head, followed by object-type-specific data]

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList pointer, linked to wait blocks; each wait block holds a list entry, Object and Thread pointers, and Key, Type and Next link fields; dispatcher objects (Size, Type, State, wait list head, object-type-specific data) link through their wait list head to the wait blocks of the threads waiting on them]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

81

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if an exception occurs */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

83

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for first signaled object or all objects; function returns index of first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

84

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

85

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

86

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

87

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
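The blocking and polling variants compared above can be sketched with a shared counter, mirroring the Windows mutex example. The helper names add_blocking and add_if_free are hypothetical; only the pthread_mutex_* calls are part of POSIX.

```c
#include <assert.h>
#include <pthread.h>

/* counter is shared by all threads and protected by counter_lock */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking variant: waits until the mutex is available. */
static void add_blocking(int value)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling variant: returns 1 if the value was added,
   0 if the mutex was busy (or trylock failed) and nothing was changed. */
static int add_if_free(int value)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

A worker thread would call add_blocking() in its main loop; add_if_free() suits code that has other useful work to do when the mutex is held elsewhere.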

88

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
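The counting rules above can be modeled in a few lines of portable C on top of a pthread mutex and condition variable. This is a sketch of the semantics only, not the Windows implementation; csem_t and the csem_* functions are hypothetical names. The model also assumes that a release which would push the count above the maximum fails and leaves the count unchanged.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;  /* signaled whenever count becomes > 0 */
    long            count;    /* the semaphore is "signaled" while count > 0 */
    long            max;
} csem_t;

static void csem_init(csem_t *s, long initial, long max)
{
    assert(initial >= 0 && initial <= max);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Blocking wait: sleeps until count > 0, then takes one unit. */
static void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Like a wait with timeout 0: true if a unit was taken. */
static bool csem_try_wait(csem_t *s)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Adds n units and reports the old count; fails if max would be exceeded. */
static bool csem_release(csem_t *s, long n, long *previous)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && n <= s->max - s->count) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

With csem_init(&s, 2, 3), two waits succeed immediately and a third would block until another thread calls csem_release().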

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

91

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
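Although there is no exact equivalent, a manual-reset event is commonly approximated with a condition variable plus a boolean flag guarded by a mutex. The sketch below shows that pattern; mr_event_t and the mr_event_* names are hypothetical, and the pattern does not reproduce PulseEvent().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;  /* stays true until explicitly reset */
} mr_event_t;

static void mr_event_init(mr_event_t *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent on a manual-reset event: releases all current waiters. */
static void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
static void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
static void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static bool mr_event_is_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

The flag is what makes the state persistent: pthread_cond_broadcast() alone wakes only the threads that are already waiting, whereas a set manual-reset event also lets later waiters pass until it is reset.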

93

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates prog1 and prog2, connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

94

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

95

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

96

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server through instances of the same named pipe

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

98

Named Pipes (contd.)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

99

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns false if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; the application server acts as the mailslot client]

Mailslot servers (App client 0 … App client n)

– Each creates the "status" mailslot with CreateMailslot(), reads the server status with ReadFile(hMS, &ServStat, …), then connects to the server

Mailslot client (App Server)

– In a loop: Sleep(), open the "status" mailslot with CreateFile(), then WriteFile(hMS, &StatInfo, …)

– Message is sent periodically

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


ndash DPC allow for scheduling actions when kernel is deep within many layers of code

ndash Delayed scheduling decision one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Sync on MP use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

ndash Spinlock is either free or is considered to be owned by a CPU

ndash Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

ndash Accessed with a test-and-modify operation that is atomic across all processors

ndash KSPIN_LOCK is an opaque data type typedefrsquod as a ULONG

ndash To implement synchronization a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag

– Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor in the queue is modified

– Exactly one processor is being signaled

– Pre-determined wait order
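One well-known way to get "wait on a local flag, hand off in queue order" is an MCS-style queue lock. The sketch below uses that technique in portable C11; it is an assumption-laden illustration, not the Windows queued spinlock code, and all names (qnode, qlock, qlock_init, qlock_acquire, qlock_release, qlock_is_free) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-waiter node: each processor spins only on its own 'locked' flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

/* The lock itself is just a pointer to the tail of the waiter queue. */
typedef struct { _Atomic(qnode *) tail; } qlock;

void qlock_init(qlock *l) { atomic_init(&l->tail, NULL); }

bool qlock_is_free(qlock *l) { return atomic_load(&l->tail) == NULL; }

void qlock_acquire(qlock *l, qnode *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* Append ourselves to the queue in one atomic step. */
    qnode *prev = atomic_exchange(&l->tail, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);      /* link behind predecessor */
        while (atomic_load(&me->locked)) {
            /* spin on our own flag only */
        }
    }
}

void qlock_release(qlock *l, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No visible successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* A successor is mid-enqueue; wait until it has linked itself. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    /* Signal exactly one waiter: the next in queue order. */
    atomic_store(&succ->locked, false);
}
```

Each caller supplies its own qnode (for example one per processor or per thread), which is what makes the busy-wait loop local rather than a repeated test-and-set on shared memory.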

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction of use of spinlocks & length they are held

– Scheduling database now per-CPU; allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN or Page Frame Database lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include optional timeout argument

– Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

– Events (may be auto-reset or manual-reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and threads (signaled upon exit or terminate)

– Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. A thread object is created and initialized, then moves between the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. A thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete]

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: system events and resulting state change (nonsignaled / signaled), and effect of signaled state on waiting threads

Mutex (kernel mode)

– Becomes signaled: owning thread releases mutex

– Becomes nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)

– Becomes signaled: owning thread or other thread releases mutex

– Becomes nonsignaled: resumed thread acquires mutex

– Effect: kernel resumes one waiting thread

Semaphore

– Becomes signaled: one thread releases the semaphore, freeing a resource

– Becomes nonsignaled: a thread acquires the semaphore; more resources are not available

– Effect: kernel resumes one or more waiting threads

6161

What signals an object (contd)

For each dispatcher object: system events and resulting state change (nonsignaled / signaled), and effect of signaled state on waiting threads

Event

– Becomes signaled: a thread sets the event

– Becomes nonsignaled: kernel resumes one or more threads

– Effect: kernel resumes one or more waiting threads

Event pair

– Becomes signaled: dedicated thread sets one event in the event pair

– Becomes nonsignaled: kernel resumes the other dedicated thread

– Effect: kernel resumes waiting dedicated thread

Timer

– Becomes signaled: timer expires

– Becomes nonsignaled: a thread (re)initializes the timer

– Effect: kernel resumes all waiting threads

Thread

– Becomes signaled: thread terminates

– Becomes nonsignaled: a thread reinitializes the thread object

– Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head followed by a linked list of DPC objects]

6666

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low, the DPC queue, and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads; kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped
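The delivery steps can be modeled with a small user-mode simulation. This is a sketch under assumptions, not kernel code: demo_cpu, demo_queue_dpc, demo_lower_irql and demo_count_dpc are hypothetical names, the queue is a fixed-size ring, and "IRQL" is just an integer field.

```c
#include <stddef.h>

#define DEMO_DISPATCH_LEVEL 2   /* dispatch/DPC level, as on the slides */
#define DEMO_MAX_DPCS 16

typedef void (*demo_dpc_routine)(void *context);

typedef struct { demo_dpc_routine routine; void *context; } demo_dpc;

/* One modeled processor: its current IRQL and its own DPC queue. */
typedef struct {
    demo_dpc items[DEMO_MAX_DPCS];
    size_t head, count;
    int irql;
} demo_cpu;

/* Example DPC routine: bumps an int counter passed as context. */
void demo_count_dpc(void *context) { ++*(int *)context; }

/* Step 1: queue a DPC. Returns 0, or -1 if the ring is full. */
int demo_queue_dpc(demo_cpu *cpu, demo_dpc_routine r, void *ctx) {
    if (cpu->count == DEMO_MAX_DPCS) return -1;
    size_t tail = (cpu->head + cpu->count) % DEMO_MAX_DPCS;
    cpu->items[tail] = (demo_dpc){ r, ctx };
    cpu->count++;
    return 0;
}

/* Steps 2-4: when IRQL is about to drop below dispatch level, run the
   queued DPCs at dispatch level first. Returns how many DPCs ran. */
int demo_lower_irql(demo_cpu *cpu, int new_irql) {
    int ran = 0;
    if (new_irql < DEMO_DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DEMO_DISPATCH_LEVEL;
        while (cpu->count > 0) {
            demo_dpc d = cpu->items[cpu->head];
            cpu->head = (cpu->head + 1) % DEMO_MAX_DPCS;
            cpu->count--;
            d.routine(d.context);
            ran++;
        }
    }
    cpu->irql = new_irql;
    return ran;
}
```

In the model, DPCs queued while the processor is at a device IRQL stay pending until demo_lower_irql() is asked to go below level 2, which mirrors why a DPC never preempts higher-IRQL work.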

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make a thread suspend/terminate itself (environment subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each a list of APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
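The four accounting rules amount to a small decision function. The sketch below restates them in C; the enum, the function name demo_charge_tick, and the was_user_mode parameter (used to pick between user and kernel time below IRQL 2) are assumptions made for illustration.

```c
/* Where the clock ISR charges the elapsed tick (hypothetical names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} demo_charge;

/* irql:          IRQL the processor was running at when the clock fired
   in_dpc:        nonzero if a DPC routine was being processed
   was_user_mode: nonzero if the interrupted code ran in user mode */
demo_charge demo_charge_tick(int irql, int in_dpc, int was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;                 /* IRQL > 2 */
    if (irql == 2)                               /* dispatch/DPC level */
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: ordinary thread execution */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives while a device ISR runs at IRQL 5 is counted as interrupt time and never appears in any thread's totals.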

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the cause must be interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches shown for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

[Figure: dispatcher object layout: header with Size, Type, State and wait list head, followed by object-type-specific data]

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type and a Next link; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) chains the wait blocks of the threads waiting on it]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );

VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );

VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );

BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so the section is left exactly once */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );
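The "enter multiple times, leave the same number of times" rule can be tried out on a POSIX system with a recursive mutex. This is a portable analogue chosen for illustration, not the Windows CRITICAL_SECTION implementation; the demo_section type and its functions are hypothetical names.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE */
#endif
#include <pthread.h>

/* Hypothetical process-local section backed by a recursive mutex. */
typedef struct { pthread_mutex_t m; } demo_section;

/* Returns 0 on success, otherwise a pthread error code. */
int demo_section_init(demo_section *s) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(&s->m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* The owning thread may enter again; each enter needs one leave. */
int demo_section_enter(demo_section *s) { return pthread_mutex_lock(&s->m); }

/* Returns nonzero if the caller does not currently own the section. */
int demo_section_leave(demo_section *s) { return pthread_mutex_unlock(&s->m); }

int demo_section_delete(demo_section *s) { return pthread_mutex_destroy(&s->m); }
```

One visible difference: an unmatched leave is reported as an error here, whereas the slides warn that on Windows leaving a critical section without owning it may cause deadlocks.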

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first, the function returns the index of the first signaled object (as an offset from WAIT_OBJECT_0)

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
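Decoding the DWORD returned by a wait-any call is a common stumbling block, so here is a portable sketch of it. The numeric constants mirror the Windows SDK values of WAIT_OBJECT_0, WAIT_ABANDONED_0, WAIT_TIMEOUT and WAIT_FAILED; the DEMO_ prefixes, the enum and the function demo_decode_wait are hypothetical names introduced so the code compiles without windows.h.

```c
#include <stdint.h>

/* Local copies of the Windows SDK wait return codes. */
#define DEMO_WAIT_OBJECT_0    0x00000000u
#define DEMO_WAIT_ABANDONED_0 0x00000080u
#define DEMO_WAIT_TIMEOUT     0x00000102u
#define DEMO_WAIT_FAILED      0xFFFFFFFFu

typedef enum {
    DEMO_SIGNALED,    /* *index = position of the signaled handle    */
    DEMO_ABANDONED,   /* *index = position of an abandoned mutex     */
    DEMO_TIMED_OUT,
    DEMO_FAILED
} demo_wait_kind;

/* Classify a wait-any return value for an array of 'count' handles. */
demo_wait_kind demo_decode_wait(uint32_t ret, uint32_t count, uint32_t *index) {
    if (ret < DEMO_WAIT_OBJECT_0 + count) {
        *index = ret - DEMO_WAIT_OBJECT_0;
        return DEMO_SIGNALED;
    }
    if (ret >= DEMO_WAIT_ABANDONED_0 && ret < DEMO_WAIT_ABANDONED_0 + count) {
        *index = ret - DEMO_WAIT_ABANDONED_0;
        return DEMO_ABANDONED;
    }
    if (ret == DEMO_WAIT_TIMEOUT)
        return DEMO_TIMED_OUT;
    return DEMO_FAILED;   /* DEMO_WAIT_FAILED or an unexpected value */
}
```

With bWaitAll set to TRUE, a value in the signaled range only says that all objects were signaled; the index then carries no per-object meaning.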

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
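The blocking versus polling distinction can be sketched with the POSIX calls named above. The global names demo_lock and demo_counter and the helper functions are hypothetical; the sketch assumes a default (non-recursive) mutex, for which pthread_mutex_trylock() reports EBUSY whenever the mutex is held.

```c
#include <errno.h>
#include <pthread.h>

pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
int demo_counter = 0;

/* Blocking update: waits for the mutex, like a wait with INFINITE. */
void demo_add_blocking(int value) {
    pthread_mutex_lock(&demo_lock);
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
}

/* Polling update: like a wait with timeout 0.
   Returns 1 if the value was added, 0 if the mutex was busy,
   -1 on any other error. */
int demo_add_if_free(int value) {
    int rc = pthread_mutex_trylock(&demo_lock);
    if (rc == EBUSY) return 0;
    if (rc != 0) return -1;
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
    return 1;
}
```

A caller that gets 0 back can do other work and retry later instead of sleeping inside the lock call.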

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each successful wait decreases the semaphore count by 1

– ReleaseSemaphore() may increment count by any value (up to the maximum count)

– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
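The counting behaviour can be observed with POSIX unnamed semaphores, used here as a portable stand-in for the Windows object. The helper names demo_sem_take_nowait and demo_sem_release are hypothetical. Two assumptions to note: POSIX semaphores have no maximum count, and the "previous count" below is a snapshot taken before posting rather than an atomic result as with ReleaseSemaphore().

```c
#include <errno.h>
#include <semaphore.h>

/* Zero-timeout wait: returns 1 if a unit was taken (count was > 0),
   0 if the count was zero (nonsignaled), -1 on any other error. */
int demo_sem_take_nowait(sem_t *s) {
    if (sem_trywait(s) == 0) return 1;
    return (errno == EAGAIN) ? 0 : -1;
}

/* Release n units; returns the count observed just before releasing,
   or -1 on error. */
int demo_sem_release(sem_t *s, int n) {
    int previous;
    if (sem_getvalue(s, &previous) != 0) return -1;
    for (int i = 0; i < n; i++) {
        if (sem_post(s) != 0) return -1;
    }
    return previous;
}
```

Initializing the semaphore with sem_init(&s, 0, 2) corresponds to an initial count of 2: two waits succeed at once and a third finds the semaphore nonsignaled.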

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );

BOOL ResetEvent( HANDLE hEvent );

BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
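Because a condition variable carries no state of its own, a manual-reset event has to be assembled from a mutex, a condition variable and a flag. The sketch below shows one common way to do that; demo_event and its functions are hypothetical names, not part of pthreads.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event: the flag holds the signaled state. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;
} demo_event;

/* Returns 0 on success, otherwise a pthread error code. */
int demo_event_init(demo_event *e, bool initial_state) {
    int rc = pthread_mutex_init(&e->lock, NULL);
    if (rc != 0) return rc;
    rc = pthread_cond_init(&e->cond, NULL);
    if (rc != 0) { pthread_mutex_destroy(&e->lock); return rc; }
    e->signaled = initial_state;
    return 0;
}

/* Like SetEvent on a manual-reset event: releases all waiters and
   stays signaled until demo_event_reset() is called. */
void demo_event_set(demo_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

void demo_event_reset(demo_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void demo_event_wait(demo_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool demo_event_is_set(demo_event *e) {
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

An auto-reset variant would clear the flag inside demo_event_wait() before returning and use pthread_cond_signal() instead of the broadcast.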

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
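The one-way, byte-stream behaviour is the same as that of a POSIX pipe, which makes for a compact portable illustration. The function demo_pipe_roundtrip is a hypothetical helper; the sketch assumes the message is short enough to fit in the pipe buffer, so the write does not block before the read starts.

```c
#include <string.h>
#include <unistd.h>

/* Write msg into a fresh pipe, then read it back from the other end.
   Returns the number of bytes read, or -1 on error.
   'out' must have room for at least one byte (the terminator). */
long demo_pipe_roundtrip(const char *msg, char *out, size_t outsz) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (outsz == 0 || pipe(fds) != 0) return -1;

    ssize_t written = write(fds[1], msg, strlen(msg));
    close(fds[1]);                    /* closing the write end lets the reader see EOF */

    ssize_t got = (written < 0) ? -1 : read(fds[0], out, outsz - 1);
    close(fds[0]);
    if (got < 0) return -1;

    out[got] = '\0';
    return (long)got;
}
```

Data flows only from the write end to the read end; two-way communication needs a second pipe, or a named pipe as described on the following slides.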

9494

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

9595

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same pipe name, each over its own instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use same flag settings for all instances of a named pipe

9898

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Equivalent to a WriteFile, ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Equivalent to a CreateFile, WriteFile, ReadFile, CloseHandle sequence

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation. */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: the application server acts as mailslot client and sends its status message periodically; application clients 0 … n act as mailslot servers and read it]

App client 0 … n (mailslot servers): hMS = CreateMailslot( "…mailslot\status" … ); ReadFile( hMS, &ServStat … ); then connect to server

App server (mailslot client): while (…) { Sleep(…); hMS = CreateFile( "…mailslot\status" … ); WriteFile( hMS, &StatInfo … ); }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

4747

Software interrupts

Initiating thread dispatching

ndash DPC allow for scheduling actions when kernel is deep within many layers of code

ndash Delayed scheduling decision one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Sync on MP use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

ndash Spinlock is either free or is considered to be owned by a CPU

ndash Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

ndash Accessed with a test-and-modify operation that is atomic across all processors

ndash KSPIN_LOCK is an opaque data type typedefrsquod as a ULONG

ndash To implement synchronization a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

51

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; the other processors wait on a processor-local flag

– Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified

– Exactly one processor is being signaled

– Pre-determined wait order
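A minimal sketch of the queueing idea, assuming an MCS-style list lock written with C11 atomics; it is not the Windows kernel's queued spinlock code. All names (`qnode_t`, `qlock_t`, `qlock_acquire`, `qlock_release`, `run_qlock_demo`) are hypothetical. Each waiter spins only on the `locked` flag in its own node, and release hands the lock to exactly one successor in arrival order. As in any user-mode demo, `sched_yield()` is added so the test finishes quickly when threads outnumber cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool locked;             /* the waiter-local flag this waiter spins on */
} qnode_t;

typedef struct { _Atomic(qnode_t *) tail; } qlock_t;   /* tail == NULL means free */

void qlock_acquire(qlock_t *l, qnode_t *me) {
    atomic_store(&me->next, (qnode_t *)NULL);
    atomic_store(&me->locked, true);
    qnode_t *prev = atomic_exchange(&l->tail, me);      /* join the queue atomically */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                  /* link behind the predecessor */
        while (atomic_load(&me->locked))                /* spin on our own flag only */
            sched_yield();
    }
}

void qlock_release(qlock_t *l, qnode_t *me) {
    qnode_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode_t *)NULL))
            return;                                     /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL) /* a successor is still linking in */
            sched_yield();
    }
    atomic_store(&succ->locked, false);                 /* signal exactly one waiter */
}

qlock_t demo_qlock;
long demo_total = 0;

void *qlock_worker(void *arg) {
    long iters = *(const long *)arg;
    qnode_t me;                                         /* one queue node per waiter */
    for (long i = 0; i < iters; i++) {
        qlock_acquire(&demo_qlock, &me);
        demo_total++;
        qlock_release(&demo_qlock, &me);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
long run_qlock_demo(int nthreads, long iters) {
    pthread_t t[16];
    assert(nthreads >= 1 && nthreads <= 16);
    atomic_store(&demo_qlock.tail, (qnode_t *)NULL);
    demo_total = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, qlock_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_total;
}
```

Because every waiter polls a different memory location, a release touches only the successor's flag instead of a word that all waiters are hammering.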

52

SMP Scalability Improvements

Windows 2000: queued spinlocks

– !qlocks in Kernel Debugger

Server 2003

– More spinlocks eliminated (context swap, system space, commit)

– Further reduction in the use of spinlocks and the length of time they are held

– Scheduling database is now per-CPU, which allows thread state transitions in parallel

53

SMP Scalability Improvements

XP/2003

– Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)

– Some locks completely eliminated: charging nonpaged/paged pool quotas; allocating and mapping system page table entries; charging commitment of pages; allocating/mapping physical memory through AWE functions

– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

54

Waiting

Flexible wait calls

– Wait for one or multiple objects in one call

– A wait for multiple objects can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait

– All wait calls include an optional timeout argument

– Waiting threads consume no CPU time

55

Waiting

Waitable objects include

– Events (may be auto-reset or manual-reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and threads (signaled upon exit or termination)

– Directories (change notification)

56

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

57

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labeled steps: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

58

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

59

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

60

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled to signaled: owning thread releases mutex

– signaled to nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled to signaled: owning thread or other thread releases mutex

– signaled to nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Semaphore

– nonsignaled to signaled: one thread releases the semaphore, freeing a resource

– signaled to nonsignaled: a thread acquires the semaphore and more resources are not available

– Effect on waiting threads: kernel resumes one or more waiting threads

61

What signals an object (contd)

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Event

– nonsignaled to signaled: a thread sets the event

– signaled to nonsignaled: kernel resumes one or more threads

– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

– nonsignaled to signaled: dedicated thread sets one event in the event pair

– signaled to nonsignaled: kernel resumes the other dedicated thread

– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

– nonsignaled to signaled: timer expires

– signaled to nonsignaled: a thread (re)initializes the timer

– Effect on waiting threads: kernel resumes all waiting threads

Thread

– nonsignaled to signaled: thread terminates

– signaled to nonsignaled: a thread reinitializes the thread object

– Effect on waiting threads: kernel resumes all waiting threads

62

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

63

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

64

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

– Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

– Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

65

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linked to a chain of DPC objects]

66

Delivering a DPC

[Figure: interrupt dispatch table ordered from High through Power failure and Dispatch/DPC to APC and Low, with a DPC queue feeding the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads, then requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume which process address space is currently mapped

67

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode and kernel-mode APCs

– Permission required for user-mode APCs

68

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

69

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode, at IRQL 1

– Always deliverable unless the thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

70

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when the thread is in an "alertable wait"

71

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking APC objects]

72

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to the thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to the thread's kernel time

– If IRQL > 2, charge to interrupt time
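The four accounting rules above can be written as a small decision function. This is a sketch of the rule table only, not the actual clock ISR: `charge_t`, `charge_tick` and the `was_user_mode` parameter are hypothetical, and the 0 to 31 range check assumes the x86 IRQL numbering.

```c
#include <assert.h>
#include <stdbool.h>

enum { DISPATCH_LEVEL = 2 };       /* IRQL 2, the dispatch/DPC level */

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decides which bucket one clock tick is charged to.
 * irql          - IRQL that was current when the clock interrupt arrived
 * in_dpc        - true if a DPC routine was being processed
 * was_user_mode - true if the interrupted thread was running user-mode code */
charge_t charge_tick(int irql, bool in_dpc, bool was_user_mode) {
    assert(irql >= 0 && irql <= 31);           /* assumed x86 IRQL range */
    if (irql > DISPATCH_LEVEL)
        return CHARGE_INTERRUPT;               /* IRQL > 2: interrupt time */
    if (irql == DISPATCH_LEVEL)                /* IRQL = 2: DPC time or thread kernel time */
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER  /* IRQL < 2: the thread's own time */
                         : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives at IRQL 2 while a DPC runs lands in DPC time, which is why a busy system can show little CPU time charged to any thread.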

73

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread; if the system is busy but no process appears to be running, the cause must be interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts and DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add the Context Switch Delta column

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout: a header with Size, Type, State and the wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its WaitBlockList; every wait block holds Key, Type, Next link, a list entry, and Object and Thread pointers; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

81

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

83

Wait Functions - Details

WaitForSingleObject()

– hObject specifies the kernel object

– dwTimeout specifies the wait time in msec

– dwTimeout == 0: no wait, check whether the object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to an array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects. The function returns the index of the first signaled object

Side effects

– Mutexes, auto-reset events, and waitable timers are reset to the non-signaled state after the wait function completes
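The difference between waiting for "any" and waiting for "all" can be modeled with a small pure function. This is a sketch of the decision rule only, not of WaitForMultipleObjects itself: `wait_check`, `MAX_WAIT` and `NOT_SATISFIED` are hypothetical names, the object states are passed in as plain booleans, and blocking, timeouts and the reset side effects are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAIT      64     /* mirrors the MAXIMUM_WAIT_OBJECTS limit quoted above */
#define NOT_SATISFIED (-1)   /* hypothetical "keep waiting" result */

/* Returns the index that would resolve the wait, or NOT_SATISFIED.
 * wait_all == false: the first signaled object resolves the wait.
 * wait_all == true : every object must be signaled at the same moment;
 *                    index 0 is reported in that case. */
int wait_check(const bool signaled[], size_t count, bool wait_all) {
    assert(count >= 1 && count <= MAX_WAIT);
    if (wait_all) {
        for (size_t i = 0; i < count; i++)
            if (!signaled[i])
                return NOT_SATISFIED;
        return 0;
    }
    for (size_t i = 0; i < count; i++)
        if (signaled[i])
            return (int)i;
    return NOT_SATISFIED;
}
```

With states {nonsignaled, signaled, signaled}, a wait-any check reports index 1 while a wait-all check is still unsatisfied.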

84

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

85

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

86

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else        /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

87

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in the same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
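The blocking and polling variants can be placed side by side in a short POSIX sketch. The pthread calls are the standard ones listed above; the helper names `add_blocking` and `add_if_free` and the shared `total` counter are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

pthread_mutex_t total_mutex = PTHREAD_MUTEX_INITIALIZER;
int total = 0;                       /* shared by all threads */

/* Blocking variant: waits as long as needed, like a wait with an INFINITE timeout. */
void add_blocking(int value) {
    pthread_mutex_lock(&total_mutex);
    total += value;
    pthread_mutex_unlock(&total_mutex);
}

/* Polling variant: returns 1 if the update was made, 0 if the mutex was busy,
 * like a wait with a timeout of 0. */
int add_if_free(int value) {
    int rc = pthread_mutex_trylock(&total_mutex);
    if (rc == EBUSY)
        return 0;                    /* another owner holds the mutex; do not block */
    assert(rc == 0);
    total += value;
    pthread_mutex_unlock(&total_mutex);
    return 1;
}
```

A caller that cannot afford to block can retry `add_if_free()` later or do other work first, which is the polling pattern the comparison refers to.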

88

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases the semaphore count by 1

– ReleaseSemaphore() may increment the count by any value

– ReleaseSemaphore() returns the old semaphore count
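The counting behavior described above can be modeled with a mutex and a condition variable. This is a user-mode sketch, not the Windows semaphore object: `csem_t`, `csem_init`, `csem_wait` and `csem_release` are hypothetical names. It assumes, as ReleaseSemaphore does, that a release which would push the count past the maximum fails and leaves the count unchanged.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signaled whenever the count becomes positive */
    long count;                /* "signaled" means count > 0 */
    long max;
} csem_t;

void csem_init(csem_t *s, long initial, long max) {
    assert(max > 0 && initial >= 0 && initial <= max);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Each successful wait decreases the count by 1; blocks while the count is 0. */
void csem_wait(csem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Adds n to the count and reports the previous count through *prev.
 * Fails without changing anything if the result would exceed max. */
bool csem_release(csem_t *s, long n, long *prev) {
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (prev != NULL)
            *prev = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);   /* several waiters may now proceed */
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

Starting from a count of 2 with a maximum of 3, two waits bring the count to 0, a release of 2 reports a previous count of 0, and a further release of 2 is rejected.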

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– A manual-reset event can signal several threads simultaneously; it must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– An auto-reset event signals a single thread; the event is reset automatically

– fInitialState == TRUE: create the event in the signaled state

91

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
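Since condition variables carry no state of their own, a manual-reset event has to be approximated by pairing one with a mutex and a boolean. The sketch below shows one common way to do that; it is an emulation under stated assumptions, not an equivalent of the Windows object, and the names `mr_event_t`, `mr_event_*` and `run_event_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until explicitly reset */
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void mr_event_set(mr_event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release every waiting thread */
    pthread_mutex_unlock(&e->lock);
}

void mr_event_reset(mr_event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void mr_event_wait(mr_event_t *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);     /* the state is left signaled for other waiters */
}

mr_event_t gate;
int released = 0;
pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

void *gate_waiter(void *arg) {
    (void)arg;
    mr_event_wait(&gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts nwaiters threads blocked on the gate, sets it once, returns how many got through. */
int run_event_demo(int nwaiters) {
    pthread_t t[16];
    assert(nwaiters >= 1 && nwaiters <= 16);
    mr_event_init(&gate, false);
    released = 0;
    for (int i = 0; i < nwaiters; i++) {
        int rc = pthread_create(&t[i], NULL, gate_waiter, NULL);
        assert(rc == 0);
        (void)rc;
    }
    mr_event_set(&gate);                /* one set releases all waiters */
    for (int i = 0; i < nwaiters; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

The stored `signaled` flag is what makes this behave like a manual-reset event: threads that arrive after the set still pass until someone calls `mr_event_reset()`.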

93

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

94

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

95

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

96

Named Pipes

Message-oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same instance

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

98

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

– Closing the handle to the last instance deletes the named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

99

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle sequence

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()

– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);

CloseHandle (hNamedPipe);
return 0;

105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &Response.Record,
            (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates a mailslot with CreateMailslot()

– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in the network domain

107

Locate a server via mailslot

[Figure: application clients 0 through n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

App client 0 … App client n (mailslot servers)

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

App server (mailslot client)

while (...) {
    Sleep(...);
    hMS = CreateFile( "\\.\mailslot\status" );
    ...
    WriteFile( hMS, &StatInfo … )
}

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; the mailslot is created locally

cbMaxMsg is the message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on the specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

4747

Software interrupts

Initiating thread dispatching

ndash DPC allow for scheduling actions when kernel is deep within many layers of code

ndash Delayed scheduling decision one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Sync on MP use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

ndash Spinlock is either free or is considered to be owned by a CPU

ndash Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

ndash Accessed with a test-and-modify operation that is atomic across all processors

ndash KSPIN_LOCK is an opaque data type typedefrsquod as a ULONG

ndash To implement synchronization a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of the use of spinlocks and the length of time they are held
- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements (contd)

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; "all" means all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waitable objects include
- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labelled events: "Create and initialize thread object", "Thread waits on an object handle", "Set object to signaled state", "Wait is complete"]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system event that sets it to signaled, the event that returns it to nonsignaled, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- Signaled when: owning thread releases mutex
- Nonsignaled when: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- Signaled when: owning thread or other thread releases mutex
- Nonsignaled when: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- Signaled when: one thread releases the semaphore, freeing a resource
- Nonsignaled when: a thread acquires the semaphore and more resources are not available
- Effect: kernel resumes one or more waiting threads

Event
- Signaled when: a thread sets the event
- Nonsignaled when: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- Signaled when: dedicated thread sets one event in the event pair
- Nonsignaled when: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- Signaled when: timer expires
- Nonsignaled when: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- Signaled when: thread terminates
- Nonsignaled when: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
- Recommended maximum times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue: a queue head linked to a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels High, Power failure, Dispatch/DPC, APC, Low; the DPC queue; the dispatcher]

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode and kernel-mode APCs
- Permission required for user-mode APCs

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend or terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in "alertable wait"

[Figure: thread object with kernel (K) and user (U) APC queues, each a list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
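The four accounting rules above amount to a small decision function. The sketch below only restates those rules; the enum, the function name classify_tick and its parameters are hypothetical and do not correspond to kernel symbols.

```c
/* Where one clock tick gets charged (illustrative names only). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

/* irql:          processor IRQL when the clock interrupt arrived
 * in_dpc:        nonzero if a DPC routine was executing
 * was_user_mode: nonzero if the interrupted code ran in user mode */
charge_bucket classify_tick(int irql, int in_dpc, int was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For instance, a tick that lands at IRQL 2 while a DPC routine is running goes to DPC time rather than to the interrupted thread.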

IRQLs and CPU Time Accounting (contd)

Time spent servicing interrupts is NOT charged to the interrupted thread; if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (they are not really processes)
- Context switches shown for these are really counts of interrupts and DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g. one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: header with Size, Type, State and wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their wait blocks through WaitBlockList; each wait block holds a list entry, Object and Thread pointers, Key, Type and a next link; dispatcher objects (Size, Type, State, wait list head, object-type-specific data) link the wait blocks waiting on them]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
- dwTimeout == 0: no wait, check whether the object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
- When waiting for any object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );
BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else        /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
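The blocking and polling variants can be put side by side in a short pthreads sketch. The names tally_mutex, tally_add, tally_try_add and tally_try_add_while_held are hypothetical; the default (non-recursive) mutex type is assumed.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t tally_mutex = PTHREAD_MUTEX_INITIALIZER;
static int tally;                      /* shared data guarded by tally_mutex */

/* Blocking acquire, comparable to WaitForSingleObject(hMutex, INFINITE).
 * Returns the new total. */
int tally_add(int value) {
    pthread_mutex_lock(&tally_mutex);
    tally += value;
    int total = tally;
    pthread_mutex_unlock(&tally_mutex);
    return total;
}

/* Polling acquire, comparable to WaitForSingleObject(hMutex, 0).
 * Returns 1 if the value was added, 0 if the mutex was busy, -1 on error. */
int tally_try_add(int value) {
    int rc = pthread_mutex_trylock(&tally_mutex);
    if (rc == EBUSY) return 0;
    if (rc != 0) return -1;
    tally += value;
    pthread_mutex_unlock(&tally_mutex);
    return 1;
}

/* Polls while the mutex is already held, so the poll reports "busy" (0). */
int tally_try_add_while_held(int value) {
    pthread_mutex_lock(&tally_mutex);
    int result = tally_try_add(value);
    pthread_mutex_unlock(&tally_mutex);
    return result;
}
```

The analogy has a limit: a Windows mutex can be re-acquired by the thread that already owns it, whereas the default pthread mutex used here reports EBUSY to its own holder.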

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value (up to the maximum count)
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );
BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
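The counting rules can be modelled with a mutex and a condition variable. This is a portable sketch of the semantics only, not the Windows object: counting_sem, csem_wait, csem_try_wait, csem_release and demo_sem are hypothetical names, and csem_release reports the previous count as its return value instead of through an output pointer.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  available;
    long count;                 /* "signaled" while count > 0 */
    long max;                   /* upper bound, like cSemMax */
} counting_sem;

/* Demo instance: 2 resources free, at most 3 (cSemInit = 2, cSemMax = 3). */
counting_sem demo_sem = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 2, 3 };

/* Blocking wait: each successful wait decreases the count by 1. */
void csem_wait(counting_sem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->available, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Zero-timeout wait: returns 1 if a unit was taken, 0 otherwise. */
int csem_try_wait(counting_sem *s) {
    int taken = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        taken = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return taken;
}

/* Adds n units. Returns the previous count, or -1 (count unchanged)
 * if n is not positive or the maximum would be exceeded. */
long csem_release(counting_sem *s, long n) {
    long previous = -1;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && n <= s->max - s->count) {
        previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->available);
    }
    pthread_mutex_unlock(&s->lock);
    return previous;
}
```

Starting from demo_sem, two csem_try_wait calls succeed and a third fails; csem_release(&demo_sem, 2) then reports a previous count of 0.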

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
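Although pthreads has no direct counterpart, a manual-reset event can be approximated by pairing a condition variable with a flag that stays set. The sketch below is one common way to do this, under the assumption that all waiters share one process; manual_event and the functions around it are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool            signaled;   /* stays true until explicitly reset */
} manual_event;

static manual_event demo_event = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static int released_count;      /* guarded by demo_event.lock */

void manual_event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->changed);   /* wake every current waiter */
    pthread_mutex_unlock(&e->lock);
}

void manual_event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void manual_event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                   /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->changed, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void *event_waiter(void *arg) {
    manual_event *e = arg;
    manual_event_wait(e);
    pthread_mutex_lock(&e->lock);
    released_count++;
    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Starts 1..16 waiters, sets the event once, returns how many were released. */
int run_event_demo(int nwaiters) {
    pthread_t threads[16];
    if (nwaiters < 1 || nwaiters > 16) return -1;
    manual_event_reset(&demo_event);
    released_count = 0;
    for (int i = 0; i < nwaiters; i++)
        if (pthread_create(&threads[i], NULL, event_waiter, &demo_event) != 0) return -1;
    manual_event_set(&demo_event);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(threads[i], NULL);
    return released_count;
}
```

Because the flag persists, a thread that starts waiting after manual_event_set() still passes straight through, which a bare pthread_cond_broadcast() would not provide.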

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server, each over its own instance of the same named pipe
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. denotes the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );
- WriteFile + ReadFile sequence

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
- CreateFile + WriteFile + ReadFile + CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns false if a client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
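For comparison, a one-way exchange over a UNIX FIFO might look like the sketch below. It assumes a POSIX system with a writable /tmp; fifo_roundtrip and fifo_demo are hypothetical helper names, and the child process plays the writing side.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates a FIFO at path, lets a child write msg into it, and reads the
 * message back into out. Returns the number of bytes read, or -1 on error. */
long fifo_roundtrip(const char *path, const char *msg, char *out, size_t outlen) {
    if (outlen == 0) return -1;
    unlink(path);                              /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1; }
    if (pid == 0) {                            /* child: the writer */
        int wfd = open(path, O_WRONLY);        /* blocks until a reader opens */
        if (wfd < 0) _exit(1);
        size_t len = strlen(msg);
        ssize_t written = write(wfd, msg, len);
        close(wfd);
        _exit(written == (ssize_t)len ? 0 : 1);
    }

    long total = -1;
    int rfd = open(path, O_RDONLY);            /* blocks until the writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);                    /* do not leave the child blocked */
    } else {
        ssize_t n;
        total = 0;
        while ((size_t)total < outlen - 1 &&
               (n = read(rfd, out + total, outlen - 1 - (size_t)total)) > 0)
            total += n;
        out[total] = '\0';
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return total;
}

/* Sends "hello" through a FIFO; returns 5 on success, -1 otherwise. */
long fifo_demo(void) {
    char path[64], reply[32];
    snprintf(path, sizeof path, "/tmp/fifo_demo_%ld", (long)getpid());
    long n = fifo_roundtrip(path, "hello", reply, sizeof reply);
    return (n == 5 && strcmp(reply, "hello") == 0) ? n : -1;
}
```

The FIFO carries data in one direction only, so a reply channel would need a second FIFO, which is the per-client response FIFO mentioned above.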

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while (fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL)
        WriteFile (hNamedPipe, &Response.Record,
            (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

App client 0 ... App client n (mailslot servers), in outline:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client), in outline; the message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo ...
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; maximum message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

Interrupt Dispatching

An interrupt arrives while user-mode or kernel-mode code is running; it is handled in kernel mode

Interrupt dispatch routine
1. Disable interrupts
2. Record machine state (trap frame) to allow resume
3. Mask equal- and lower-IRQL interrupts
4. Find and call appropriate ISR
5. Dismiss interrupt
6. Restore machine state (including mode and enabled interrupts)

Interrupt service routine
1. Tell the device to stop interrupting
2. Interrogate device state, start next operation on device, etc.
3. Request a DPC
4. Return to caller

Note: no thread or process context switch

Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
- Precedence of the interrupt with respect to other interrupts
- Different interrupt sources have different IRQLs
- Not the same as IRQ

IRQL is also a state of the processor
- Servicing an interrupt raises processor IRQL to that interrupt's IRQL
- This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL
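The masking rule and the restriction at DISPATCH_LEVEL can be written down as simple predicates. The helper function names below are hypothetical; the level constants use the values 0, 1 and 2 given for passive, APC and dispatch level.

```c
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2 };

/* An interrupt source is masked while the processor runs at an equal or higher IRQL. */
int interrupt_is_masked(int processor_irql, int source_irql) {
    return source_irql <= processor_irql;
}

/* Servicing an unmasked interrupt raises the processor to the source's IRQL;
 * a masked interrupt leaves the current IRQL unchanged. */
int irql_while_servicing(int processor_irql, int source_irql) {
    return interrupt_is_masked(processor_irql, source_irql) ? processor_irql : source_irql;
}

/* Waits and page faults are only allowed below DISPATCH_LEVEL. */
int may_wait_or_page_fault(int processor_irql) {
    return processor_irql < DISPATCH_LEVEL;
}
```

A thread running at PASSIVE_LEVEL can therefore be interrupted by any source, while code at DISPATCH_LEVEL masks APC and DPC delivery and must not block.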

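Interrupt Precedence via IRQLs (x86) (contd)

[Figure: x86 IRQL levels, highest to lowest]
- 31 High
- 30 Power fail
- 29 Interprocessor Interrupt
- 28 Clock
- Profile & Synch (Srv 2003)
- Device 1
- 2 Dispatch/DPC
- 1 APC
- 0 Passive/Low

Hardware interrupts: Device 1 and above
Deferrable software interrupts: APC and Dispatch/DPC
Normal thread execution: Passive/Low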

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts processor (single line)
- Processor queries for interrupt vector; uses vector as index into IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
- Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to the initial level

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn ISR on/off

Interrupt objects can synchronize access to ISR data
- Multiple instances of an ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update the system's clock and for allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler, Kernprof.exe, Resource Kit)

Predefined IRQLs (contd)

Device
- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, highest to lowest]

x64
- 15 High/Profile
- 14 Interprocessor Interrupt/Power
- 13 Clock
- 12 Synch (Srv 2003)
- Device n ... Device 1
- 2 Dispatch/DPC
- 1 APC
- 0 Passive/Low

IA64
- 15 High/Profile/Power
- 14 Interprocessor Interrupt
- 13 Clock
- 12 Synch (MP only)
- Device n ...
- 4 Device 1
- 3 Correctable Machine Check
- 2 Dispatch/DPC & Synch (UP only)
- 1 APC
- 0 Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
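The two closed-form mappings are plain integer arithmetic, sketched below. The function names are hypothetical, and the bucketized x86 MP assignment is left out because the slide gives no formula for it.

```c
/* x86 uniprocessor systems: IRQL = 27 - IRQ */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16 (integer division) */
int irql_from_vector_64bit(int idt_vector) {
    return idt_vector / 16;
}
```

Under these formulas, IRQ 0 on an x86 uniprocessor maps to IRQL 27, and an IDT vector of 0xD1 on a 64-bit system maps to IRQL 13.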

Software interrupts

Initiating thread dispatching
- DPCs allow scheduling actions to be deferred while the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Sync on MP use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

ndash Spinlock is either free or is considered to be owned by a CPU

ndash Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

ndash Accessed with a test-and-modify operation that is atomic across all processors

ndash KSPIN_LOCK is an opaque data type typedefrsquod as a ULONG

ndash To implement synchronization a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and threads (signaled upon exit or termination)
– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. A thread object is created and initialized and moves through the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. A thread that waits on an object handle enters Waiting; when the object is set to the signaled state, the wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up the waiting threads; variable-priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– Nonsignaled to signaled: owning thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– Nonsignaled to signaled: owning thread or other thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– Nonsignaled to signaled: one thread releases the semaphore, freeing a resource
– Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

Event
– Nonsignaled to signaled: a thread sets the event
– Signaled to nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– Nonsignaled to signaled: dedicated thread sets one event in the event pair
– Signaled to nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– Nonsignaled to signaled: timer expires
– Signaled to nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– Nonsignaled to signaled: thread terminates
– Signaled to nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually the ISR) queues the request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum recommended times: ISR 10 usec, DPC 25 usec
– See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: the DPC queue, a queue head linked to a chain of DPC objects.]

Delivering a DPC

[Figure: the interrupt dispatch table, ordered from High and Power failure down through Dispatch/DPC and APC to Low, next to the DPC queue and the dispatcher.]

1. Timer expires; the kernel queues a DPC that will release all waiting threads and requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless the thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when the thread is in an "alertable wait"
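The delivery rules for the three APC kinds can be summarized as one predicate. The sketch below only restates the conditions given on these slides; `apc_kind`, `thread_state`, and `apc_deliverable` are hypothetical names and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    APC_SPECIAL_KERNEL,    /* runs at IRQL 1 */
    APC_ORDINARY_KERNEL,   /* runs at IRQL 0, can be disabled */
    APC_USER               /* needs an alertable wait */
} apc_kind;

/* Hypothetical snapshot of the target thread, just enough for the rules. */
typedef struct {
    int  irql;                  /* current IRQL of the thread */
    bool kernel_apcs_disabled;  /* e.g. inside KeEnterCriticalRegion */
    bool in_alertable_wait;     /* e.g. blocked in SleepEx(..., TRUE) */
} thread_state;

/* Returns true if an APC of the given kind could be delivered now. */
bool apc_deliverable(apc_kind kind, const thread_state *t)
{
    assert(t != NULL);

    switch (kind) {
    case APC_SPECIAL_KERNEL:
        return t->irql < 1;                       /* blocked at IRQL 1 or above */
    case APC_ORDINARY_KERNEL:
        return t->irql == 0 && !t->kernel_apcs_disabled;
    case APC_USER:
        return t->in_alertable_wait;
    }
    return false;
}
```

For example, a thread that has called KeEnterCriticalRegion at IRQL 0 still receives special kernel APCs but not ordinary kernel APCs.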

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each holding a list of APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to the thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
– If IRQL > 2, charge to interrupt time
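The four accounting rules map directly onto a small decision function. This is a model of the rules as listed, not the clock ISR itself; `charge_bucket` and `charge_clock_tick` are hypothetical names, and the user/kernel split below IRQL 2 is passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* IRQL 2 is the dispatch/DPC level referred to on the slide. */
#define DISPATCH_IRQL 2

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

/* Decide which counter one clock tick is charged to. */
charge_bucket charge_clock_tick(int irql, bool running_dpc, bool in_user_mode)
{
    assert(irql >= 0);

    if (irql > DISPATCH_IRQL)
        return CHARGE_INTERRUPT;                 /* servicing an interrupt */
    if (irql == DISPATCH_IRQL)
        return running_dpc ? CHARGE_DPC          /* inside a DPC routine */
                           : CHARGE_THREAD_KERNEL;
    /* Below dispatch level the tick belongs to the current thread. */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Only the first two outcomes appear in a thread's own CPU time; DPC and interrupt time are accounted separately.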

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (they are not really processes)
– "Context switches" for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "Signaled" vs. "nonsignaled"
– When signaled, a wait on the object is satisfied
– Different object types differ in terms of what changes their state
– Wait and unwait implementation is common to all types of dispatcher objects

[Figure: a dispatcher object consists of a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h.]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList. Each wait block holds a list entry, Object and Thread pointers, a Key, a Type, and a Next link. Each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it.]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the update raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
– dwTimeout == 0: no wait; check whether the object is signaled
– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to an array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first, the function returns the index of the first signaled object (relative to WAIT_OBJECT_0)

Side effects
– Mutexes, auto-reset events, and waitable timers are reset to the non-signaled state after a wait function completes

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done = 0, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in the same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
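The blocking and polling forms can be put side by side on the POSIX side of the comparison. The sketch below protects a shared counter in the same spirit as the Windows mutex example; `counter_add`, `counter_try_add`, and `counter_value` are hypothetical helper names, and the mutex uses the default (non-recursive) type.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all threads */

/* Blocking update: comparable to WaitForSingleObject(hMutex, INFINITE). */
void counter_add(int value)
{
    int rc = pthread_mutex_lock(&counter_lock);
    assert(rc == 0);
    (void)rc;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling update: comparable to WaitForSingleObject(hMutex, 0).
 * Returns false without touching the counter if the mutex is busy. */
bool counter_try_add(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc == EBUSY)
        return false;
    assert(rc == 0);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return true;
}

/* Read the counter under the lock. */
int counter_value(void)
{
    pthread_mutex_lock(&counter_lock);
    int v = counter;
    pthread_mutex_unlock(&counter_lock);
    return v;
}
```

A worker thread would call `counter_add(local_value)` in its main loop; `counter_try_add` suits code that has other work to do when the lock is busy.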

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value
– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
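The counting behavior (signaled while the count is above zero, one decrement per successful wait) can be tried out with a POSIX unnamed semaphore, used here as a stand-in for the Windows semaphore object. This assumes a system that supports `sem_init` (Linux does); `resource_pool` and its functions are hypothetical names.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool of identical resources guarded by a counting semaphore. */
typedef struct {
    sem_t free_slots;
} resource_pool;

/* Start with n free resources (the initial semaphore count). */
bool pool_init(resource_pool *pool, unsigned int n)
{
    assert(pool != NULL);
    return sem_init(&pool->free_slots, 0, n) == 0;
}

/* Non-blocking wait: succeeds and decrements the count while count > 0. */
bool pool_try_acquire(resource_pool *pool)
{
    return sem_trywait(&pool->free_slots) == 0;
}

/* Return one resource, incrementing the count by 1. */
bool pool_release(resource_pool *pool)
{
    return sem_post(&pool->free_slots) == 0;
}

/* Current count; only a snapshot if other threads are active. */
int pool_available(resource_pool *pool)
{
    int value = 0;
    sem_getvalue(&pool->free_slots, &value);
    return value;
}

void pool_destroy(resource_pool *pool)
{
    sem_destroy(&pool->free_slots);
}
```

Unlike ReleaseSemaphore(), `sem_post` always increments by exactly one and does not report the previous count.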

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously and must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
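Since a condition variable carries no state of its own, one common way to approximate a manual-reset event is to pair it with a mutex and a boolean flag. The sketch below is one such approximation, not an exact equivalent; `manual_event` and the `event_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical manual-reset event: a flag guarded by a mutex and a condvar. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent(): the flag stays set and every waiter is released. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Block until the flag is set; the loop absorbs spurious wakeups. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Poll without blocking, comparable to a wait with timeout 0. */
bool event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

Because the flag is left set after a wait returns, any number of threads can pass `event_wait` until `event_reset` is called, which is the manual-reset behavior.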

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
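The one-way, byte-stream behavior can be seen in a few lines. The sketch below uses the POSIX `pipe()` call as a stand-in for CreatePipe(), which is an assumption made so the example runs outside Windows; `pipe_round_trip` is a hypothetical helper that writes a short message into the write end and reads it back from the read end within one process.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send a short string through a fresh anonymous pipe and read it back.
 * Returns the number of bytes read, or -1 on error.
 * The message must be small enough to fit in the pipe buffer; otherwise
 * the write would block, because nobody is reading yet.
 */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                      /* fds[0] = read end, fds[1] = write end */
    size_t len;
    ssize_t written;
    ssize_t got = -1;

    assert(msg != NULL && out != NULL);
    len = strlen(msg);

    if (pipe(fds) != 0)
        return -1;

    written = write(fds[1], msg, len);
    close(fds[1]);                   /* reader sees end-of-file after the data */

    if (written == (ssize_t)len)
        got = read(fds[0], out, out_size);

    close(fds[0]);
    return got;
}
```

In the redirection scenario the two ends would instead be handed to two different processes, one as standard output and the other as standard input.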

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server through the same named pipe, each on its own instance
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use the same flag settings for all instances of a named pipe

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?
– BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with the named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", Response.Record);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &Response.Record,
                    (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically. Call arguments are abbreviated as on the slide.]

Each App client (mailslot server):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App Server (mailslot client):

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the message size in bytes

dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on the specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; maximum message length 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Interrupt Precedence via IRQLs (x86)

IRQL = Interrupt Request Level
– Precedence of the interrupt with respect to other interrupts
– Different interrupt sources have different IRQLs
– Not the same as IRQ

IRQL is also a state of the processor
– Servicing an interrupt raises the processor IRQL to that interrupt's IRQL
– This masks subsequent interrupts at equal and lower IRQLs

User mode is limited to IRQL 0

No waits or page faults at IRQL >= DISPATCH_LEVEL

Interrupt Precedence via IRQLs (x86)

[Figure: x86 IRQL levels, highest first]

Hardware interrupts
– 31: High
– 30: Power fail
– 29: Interprocessor Interrupt
– 28: Clock
– Profile & Synch (Srv 2003)
– Device 1

Deferrable software interrupts
– 2: Dispatch/DPC
– 1: APC

Normal thread execution
– 0: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts the processor (single line)
– Processor queries for the interrupt vector and uses the vector as an index into the IDT

Alpha
– PAL code (Privileged Architecture Library, the Alpha BIOS) determines the interrupt vector and calls the kernel
– Kernel uses the vector to index the IDT

After ISR execution, IRQL is lowered to its initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls the ISR with the interrupt object as parameter (HW cannot pass parameters to an ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn ISR on/off

Interrupt objects can synchronize access to ISR data
– Multiple instances of an ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected at the same IRQL

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document but has never been used

Inter-processor interrupt
– Used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
– Used to update the system's clock and to allocate CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that the kernel and device drivers generate

Passive
– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, lowest first]

x64
– 0: Passive/Low
– 1: APC
– 2: Dispatch/DPC
– Device 1 … Device n
– 12: Synch (Srv 2003)
– 13: Clock
– 14: Interprocessor Interrupt/Power
– 15: High/Profile

IA64
– 0: Passive/Low
– 1: APC
– 2: Dispatch/DPC & Synch (UP only)
– 3: Correctable Machine Check
– 4: Device 1 … Device n
– 12: Synch (MP only)
– 13: Clock
– 14: Interprocessor Interrupt
– 15: High/Profile/Power

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
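The two fixed mappings and the masking rule can be written as one-line functions. The sketch assumes the x64 rule reads "IDT vector number divided by 16" (integer division) and takes the masking rule from the IRQL slide (an interrupt at or below the processor's current IRQL is masked); all three function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 uniprocessor systems: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0);
    return 27 - irq;
}

/* x64 and IA64 systems: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector_x64(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}

/* An interrupt is held off while the processor runs at an equal or higher IRQL. */
bool interrupt_is_masked(int processor_irql, int interrupt_irql)
{
    return interrupt_irql <= processor_irql;
}
```

Under these rules IRQ 12 on an x86 uniprocessor maps to IRQL 15, and IDT vector 0x51 on x64 maps to IRQL 5.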

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts; the diagram is not preserved in this transcript.]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– A spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

ndash Events (may be auto-reset or manual reset may be set or ldquopulsedrdquo)

ndash Mutexes (ldquomutual exclusionrdquo one-at-a-time)

ndash Semaphores (n-at-a-time)

ndash Timers

ndash Processes and Threads (signalled upon exit or terminate)

ndash Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

ndash If multiple threads are waiting for an object and only one thread is released (eg itrsquos a mutex or auto-reset event) which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction bet Synchronization amp Dispatching User mode thread waits on an event objectlsquos handle

Kernel changes threadlsquos scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads variable priority threads get priority boost

5959

Interaction bet Synchronization amp Dispatching Dispatcher re-schedules new thread ndash it may preempt running thread it it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– Nonsignaled → signaled: owning thread releases mutex
– Signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– Nonsignaled → signaled: owning thread or other thread releases mutex
– Signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– Nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– Signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

Event
– Nonsignaled → signaled: a thread sets the event
– Signaled → nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– Nonsignaled → signaled: dedicated thread sets one event in the event pair
– Signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– Nonsignaled → signaled: timer expires
– Signaled → nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– Nonsignaled → signaled: thread terminates
– Signaled → nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: a DPC queue, shown as a queue head linked to a chain of DPC objects.]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads, then requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can’t call system services, generate page faults, or create or wait on objects

DPC routines can’t assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low; queued DPCs sit in the DPC queue until the dispatcher runs them.]
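The four delivery steps can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct cpu`, `queue_dpc`, `lower_irql` and `count_dpc` are hypothetical names, the queue is a fixed-size ring, and none of this is the kernel's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one processor's DPC queue; not the Windows kernel API. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, DEVICE_LEVEL = 3 };
#define MAX_DPCS 16

typedef void (*dpc_routine)(void *context);

struct dpc { dpc_routine routine; void *context; };

struct cpu {
    int irql;                     /* current interrupt request level */
    struct dpc queue[MAX_DPCS];   /* FIFO ring of pending DPCs */
    size_t head, count;
};

/* Step 1: an ISR (or the kernel) queues a DPC; nothing runs yet. */
static int queue_dpc(struct cpu *c, dpc_routine r, void *ctx)
{
    if (c->count == MAX_DPCS)
        return 0;                 /* queue full in this toy model */
    size_t tail = (c->head + c->count) % MAX_DPCS;
    c->queue[tail].routine = r;
    c->queue[tail].context = ctx;
    c->count++;
    return 1;
}

/* Steps 2-4: when the IRQL drops below dispatch level, the pending
 * DPCs are drained at dispatch level before the lower level is entered. */
static void lower_irql(struct cpu *c, int new_irql)
{
    assert(new_irql <= c->irql);
    if (new_irql < DISPATCH_LEVEL && c->count > 0) {
        c->irql = DISPATCH_LEVEL;
        while (c->count > 0) {
            struct dpc d = c->queue[c->head];
            c->head = (c->head + 1) % MAX_DPCS;
            c->count--;
            d.routine(d.context);
        }
    }
    c->irql = new_irql;
}

/* Example DPC routine: counts how often it ran. */
static void count_dpc(void *context) { ++*(int *)context; }
```

Queuing from device level and then lowering only to dispatch level leaves the DPCs pending; they run once the level drops below dispatch.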

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

User-mode APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from “arbitrary thread context”
– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

“Ordinary” kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in “alertable wait”

[Figure: a thread object with a kernel-mode (K) and a user-mode (U) APC queue, each linking APC objects.]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread’s user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
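The charging rules above amount to a small decision function. The sketch below is only an illustration: `charge_tick`, the `bucket` names and the `was_user_mode` flag are assumptions introduced here, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Buckets that one clock tick can be charged to, following the rules listed above. */
enum bucket { THREAD_USER_TIME, THREAD_KERNEL_TIME, DPC_TIME, INTERRUPT_TIME };

/* Hypothetical helper: decide where the clock ISR charges the current tick. */
static enum bucket charge_tick(int irql, bool processing_dpc, bool was_user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return INTERRUPT_TIME;                 /* above dispatch level */
    if (irql == 2)
        return processing_dpc ? DPC_TIME : THREAD_KERNEL_TIME;
    /* IRQL 0 or 1: charged to the thread, split by processor mode */
    return was_user_mode ? THREAD_USER_TIME : THREAD_KERNEL_TIME;
}
```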

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread’s quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a “dispatcher object”
– some exist exclusively for synchronization, e.g., events, mutexes (“mutants”), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g., processes, threads, file objects
– non-waitable kernel objects are called “control objects”

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

[Figure: a dispatcher object consists of a header (size, type, state, wait list head) followed by object-type-specific data.]

All dispatcher objects are in one of two states
– “signaled” vs. “nonsignaled”
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Wait blocks represent a thread’s reference to something it’s waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for “any” or “all”

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point through WaitBlockList to their wait blocks. Each wait block holds a list entry, the object, the thread, key, type, and next link. Each dispatcher object (size, type, state, wait list head, object-type-specific data) links the wait blocks of the threads waiting on it.]
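The relationship between dispatcher headers and wait blocks can be sketched in C. The layouts below are simplified and hypothetical (the real definitions live in the DDK headers); `chain_wait_block` and `wait_satisfied` are names invented here to show how “any” and “all” waits could be evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical layouts; not the actual kernel structures. */
enum wait_type { WAIT_TYPE_ALL, WAIT_TYPE_ANY };

struct wait_block;

struct dispatcher_header {
    int type;                           /* object type (event, mutex, ...) */
    bool signaled;                      /* signaled vs. nonsignaled */
    struct wait_block *wait_list_head;  /* blocks of threads waiting on this object */
};

struct wait_block {
    struct dispatcher_header *object;   /* what is being waited for */
    int thread_id;                      /* stands in for the thread pointer */
    int key;                            /* position in the handle array */
    enum wait_type type;                /* wait for "any" or "all" */
    struct wait_block *next_link;       /* next block of the same wait call */
    struct wait_block *list_entry;      /* next block in the object's wait list */
};

/* Link a wait block into its object's wait list and into the thread's chain. */
static void chain_wait_block(struct wait_block *wb, struct wait_block **thread_chain)
{
    assert(wb != NULL && wb->object != NULL);
    wb->list_entry = wb->object->wait_list_head;
    wb->object->wait_list_head = wb;
    wb->next_link = *thread_chain;
    *thread_chain = wb;
}

/* Returns the lowest key of a signaled object for an "any" wait, 0 for a
 * satisfied "all" wait, or -1 while the wait cannot be resolved yet. */
static int wait_satisfied(const struct wait_block *chain)
{
    if (chain == NULL)
        return -1;
    int lowest_key = -1;
    bool all = true;
    for (const struct wait_block *wb = chain; wb != NULL; wb = wb->next_link) {
        if (!wb->object->signaled)
            all = false;
        else if (lowest_key < 0 || wb->key < lowest_key)
            lowest_key = wb->key;
    }
    if (chain->type == WAIT_TYPE_ALL)
        return all ? 0 : -1;
    return lowest_key;
}
```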

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the guarded code raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
– dwTimeout == 0: no wait, check whether object is signaled
– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects
– When waiting for the first signaled object, the function returns WAIT_OBJECT_0 plus the array index of that object

Side effects
– Mutexes, auto-reset events and auto-reset (synchronization) waitable timers will be reset to non-signaled state after completing wait functions
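Decoding the return value of a wait-for-any call is a common source of mistakes, so here is a minimal sketch. The constants are mirrored locally with their Windows SDK values so the code builds on any platform; `decode_wait` and `wait_outcome` are hypothetical helpers, and only the results of the non-Ex wait functions are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the Windows SDK headers so the sketch builds anywhere. */
#define WAIT_OBJECT_0         0x00000000u
#define WAIT_ABANDONED_0      0x00000080u
#define WAIT_TIMEOUT          0x00000102u
#define WAIT_FAILED           0xFFFFFFFFu
#define MAXIMUM_WAIT_OBJECTS  64u

enum wait_outcome { OUTCOME_SIGNALED, OUTCOME_ABANDONED, OUTCOME_TIMEOUT, OUTCOME_FAILED };

/* Hypothetical helper: classify the result of WaitForMultipleObjects() called
 * with `count` handles and bWaitAll == FALSE. `*index` receives the array
 * position when the outcome refers to one particular object. */
static enum wait_outcome decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    assert(count >= 1 && count <= MAXIMUM_WAIT_OBJECTS);
    if (ret < WAIT_OBJECT_0 + count) {          /* WAIT_OBJECT_0 is zero */
        *index = ret - WAIT_OBJECT_0;
        return OUTCOME_SIGNALED;
    }
    if (ret >= WAIT_ABANDONED_0 && ret < WAIT_ABANDONED_0 + count) {
        *index = ret - WAIT_ABANDONED_0;        /* an abandoned mutex was acquired */
        return OUTCOME_ABANDONED;
    }
    if (ret == WAIT_TIMEOUT)
        return OUTCOME_TIMEOUT;
    return OUTCOME_FAILED;                      /* WAIT_FAILED or unexpected value */
}
```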

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed when its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
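The blocking and polling variants compared above can be sketched with the pthread calls. The functions `add_blocking` and `add_if_free` and the shared `counter` are illustrative names only. One difference worth keeping in mind: a default pthread mutex is not recursive, whereas a Windows mutex can be re-acquired by its owner.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all threads */

/* Blocking update: the counterpart of WaitForSingleObject(hMutex, INFINITE)
 * followed by ReleaseMutex(). */
static void add_blocking(int value)
{
    pthread_mutex_lock(&lock);
    counter += value;
    pthread_mutex_unlock(&lock);
}

/* Polling update: the counterpart of WaitForSingleObject(hMutex, 0).
 * Returns 1 if the update was applied, 0 if the mutex was busy. */
static int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&lock);
    if (rc == EBUSY)
        return 0;             /* someone holds the mutex; do not wait */
    assert(rc == 0);
    counter += value;
    pthread_mutex_unlock(&lock);
    return 1;
}
```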

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each successful wait decreases the semaphore count by 1
– ReleaseSemaphore() may increment count by any value, up to the maximum count
– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
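The counting rules can be modeled in a few lines. This is bookkeeping only, under the assumption that a blocked wait is represented by a failed try; `sem_model`, `sem_try_wait` and `sem_release` are hypothetical names and not the Windows API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a counting semaphore's bookkeeping; not the Windows API. */
struct sem_model { long count, max; };

/* A wait succeeds only while the semaphore is signaled (count > 0)
 * and takes one unit; a real wait would block instead of failing. */
static bool sem_try_wait(struct sem_model *s)
{
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Release adds n units and reports the previous count. As with
 * ReleaseSemaphore(), exceeding the maximum fails and changes nothing. */
static bool sem_release(struct sem_model *s, long n, long *previous)
{
    if (n <= 0 || s->count + n > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```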

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread’s condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
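Because a condition variable holds no state of its own, the behavior of a manual-reset event has to be built from a mutex, a condition variable and a flag. The following is a minimal sketch of that construction; the `mr_event` type and the `mr_*` functions are hypothetical names, and error handling is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event built from a mutex, a condition variable and a flag. */
struct mr_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
};

static void mr_init(struct mr_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent(): the event stays signaled and every waiter is released. */
static void mr_set(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
static void mr_reset(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(h, INFINITE): returns at once if already signaled. */
static void mr_wait(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void mr_destroy(struct mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what carries the signaled state across calls; the condition variable only wakes threads that are already waiting.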

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server through instances of the same named pipe
– Server can respond to client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile and ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server’s ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it’s easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client’s response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
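The advice about fixed-size records on a byte-oriented channel can be illustrated as follows. This is a sketch under assumptions: the `record` layout and the helper names are hypothetical, and the helpers take any file descriptor, so a descriptor obtained from a FIFO created with mkfifo() or from pipe() works equally. The record is kept far smaller than PIPE_BUF, which is what makes each write atomic.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size record, far smaller than PIPE_BUF. */
struct record {
    int  client_id;
    char text[60];
};

/* Build a zero-padded record so every message has the same size. */
static struct record make_record(int id, const char *text)
{
    struct record r;
    assert(text != NULL);
    memset(&r, 0, sizeof r);
    r.client_id = id;
    strncpy(r.text, text, sizeof r.text - 1);
    return r;
}

/* Write exactly one record; returns 1 on success. */
static int send_record(int fd, const struct record *r)
{
    return write(fd, r, sizeof *r) == (ssize_t)sizeof *r;
}

/* Read exactly one record, looping over short reads.
 * Returns 1 on success, 0 on end of file or error. */
static int recv_record(int fd, struct record *r)
{
    size_t got = 0;
    while (got < sizeof *r) {
        ssize_t n = read(fd, (char *)r + got, sizeof *r - got);
        if (n <= 0)
            return 0;
        got += (size_t)n;
    }
    return 1;
}
```

With fixed-size records the reader never has to search for message boundaries, which is the property that message-mode named pipes provide directly.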

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client’s message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers. Each one creates the mailslot “status” with CreateMailslot(), reads the server status from it with ReadFile( hMS, &ServStat ), and then connects to the server. The application server acts as the mailslot client: in a loop it calls Sleep(), opens the mailslot with CreateFile(), and sends its status with WriteFile( hMS, &StatInfo ). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system’s primary domain; max. message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Interrupt Precedence via IRQLs (x86)

Hardware interrupts
– 31: High
– 30: Power fail
– 29: Interprocessor interrupt
– 28: Clock
– Profile & Synch (Srv 2003)
– Device levels, down to Device 1

Deferrable software interrupts
– 2: Dispatch/DPC
– 1: APC

Normal thread execution
– 0: Passive/Low

Interrupt processing

Interrupt dispatch table (IDT)
– Links to interrupt service routines

x86
– Interrupt controller interrupts processor (single line)
– Processor queries for interrupt vector; uses vector as index to IDT

Alpha
– PAL code (Privileged Architecture Library, the Alpha BIOS) determines interrupt vector, calls kernel
– Kernel uses vector to index IDT

After ISR execution, IRQL is lowered to initial level

Interrupt object

Allows device drivers to register ISRs for their devices
– Contains dispatch code (initial handler)
– Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
– Dynamic association between ISR and IDT entry
– Loadable device drivers (kernel modules)
– Turn on/off ISR

Interrupt objects can synchronize access to ISR data
– Multiple instances of an ISR may be active simultaneously (MP machine)
– Multiple ISRs may be connected to the same IRQL

Predefined IRQLs

High
– Used when halting the system (via KeBugCheck())

Power fail
– Originated in the NT design document, but has never been used

Inter-processor interrupt
– Used to request action from another processor (dispatching a thread, updating a processor’s TLB, system shutdown, system crash)

Clock
– Used to update system’s clock and allocate CPU time to threads

Profile
– Used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

Predefined IRQLs (contd)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

x64 (highest to lowest)
– 15: High/Profile
– 14: Interprocessor interrupt/Power
– 13: Clock
– 12: Synch (Srv 2003)
– Device n down to Device 1
– 2: Dispatch/DPC
– 1: APC
– 0: Passive/Low

IA64 (highest to lowest)
– 15: High/Profile/Power
– 14: Interprocessor interrupt
– 13: Clock
– 12: Synch (MP only)
– Device n down to Device 1
– 3: Correctable Machine Check
– 2: Dispatch/DPC & Synch (UP only)
– 1: APC
– 0: Passive/Low

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
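The two deterministic IRQL mapping rules on this slide (27 minus the IRQ on x86 uniprocessor systems, and the IDT vector number divided by 16 on x64 and IA64) are simple enough to write down directly. The function names are hypothetical, the IRQ range assumes the 16 legacy interrupt lines, and the bucketized x86 MP case is not modeled.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ.
 * Assumes the 16 legacy interrupt lines, IRQ 0 to 15. */
static int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 15);
    return 27 - irq;
}

/* x64 and IA64 rule: IRQL = IDT vector number / 16,
 * so each IRQL covers a block of 16 consecutive vectors. */
static int irql_from_vector_64(int vector)
{
    assert(vector >= 0 && vector <= 255);
    return vector / 16;
}
```

Under the first rule, a lower IRQ number yields a higher IRQL, so IRQ 0 maps to IRQL 27 and IRQ 15 to IRQL 12.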

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts; the diagram is not preserved in this transcript.]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef’d as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                        /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

[Figure: both processors contend for the one spinlock that guards the DPC queue.]

Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires lock; other processors wait on a processor-local flag
– Thus the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first waiting processor in the queue is modified
– Exactly one processor is being signaled
– Pre-determined wait order
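A well-known algorithm with these properties is the MCS queue lock, sketched below with C11 atomics. It is offered as an illustration of the technique only, not as the Windows implementation; `qnode`, `qlock` and the two functions are hypothetical names. Each waiter spins on the flag in its own node, and a release hands the lock to exactly one successor in FIFO order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical MCS-style queued lock; each waiter supplies its own node. */
struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             must_wait;   /* the processor-local flag */
};

struct qlock { _Atomic(struct qnode *) tail; };

static void qlock_acquire(struct qlock *l, struct qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->must_wait, true);
    struct qnode *prev = atomic_exchange(&l->tail, me);  /* join the queue */
    if (prev == NULL)
        return;                                          /* lock was free */
    atomic_store(&prev->next, me);                       /* link behind predecessor */
    while (atomic_load(&me->must_wait))
        ;                                                /* spin on own flag only */
}

static void qlock_release(struct qlock *l, struct qnode *me)
{
    struct qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                      /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                            /* successor is mid-enqueue */
    }
    assert(succ != NULL);
    atomic_store(&succ->must_wait, false);               /* signal exactly one waiter */
}
```

The only shared location touched on every acquire is the tail pointer, and it is touched once per acquisition rather than once per spin iteration.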

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length of time they are held
– Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn’t use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

ndash Events (may be auto-reset or manual reset may be set or ldquopulsedrdquo)

ndash Mutexes (ldquomutual exclusionrdquo one-at-a-time)

ndash Semaphores (n-at-a-time)

ndash Timers

ndash Processes and Threads (signalled upon exit or terminate)

ndash Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

ndash If multiple threads are waiting for an object and only one thread is released (eg itrsquos a mutex or auto-reset event) which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction bet Synchronization amp Dispatching User mode thread waits on an event objectlsquos handle

Kernel changes threadlsquos scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads variable priority threads get priority boost

5959

Interaction bet Synchronization amp Dispatching Dispatcher re-schedules new thread ndash it may preempt running thread it it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Dispatcher object

System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

Owning thread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (kernel mode)

nonsignaled signaled

Owning thread or otherthread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (exported to user mode)

nonsignaled signaled

One thread releases thesemaphore freeing a resource

A thread acquires the semaphoreMore resources are not available

Kernel resumes one or more waiting threads

Semaphore

6161

A thread reinitializesthe thread object

What signals an object (contd)

Dispatcher object System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

A thread sets the event

Kernel resumes one or more threads

Kernel resumes one or more waiting threads

Event

nonsignaled signaled

Dedicated thread sets oneevent in the event pair

Kernel resumes theother dedicated thread

Kernel resumes waitingdedicated thread

Event pair

nonsignaled signaled

Timer expires

A thread (re) initializes the timer

Kernel resumes all waiting threads

Timer

nonsignaled signaled

Thread terminates

Kernel resumes all waiting threads

Thread

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from High through Power failure, Dispatch/DPC, and APC down to Low; DPC objects waiting in the DPC queue; the dispatcher]

1. Timer expires; the kernel queues a DPC that will release all waiting threads, and requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in an "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to the thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
- If IRQL > 2, charge to interrupt time
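The four accounting rules can be written as one small decision function. This is only an illustrative sketch: the type and function names (charge_t, charge_tick) and the in_dpc/user_mode flags are made up for this example and are not kernel interfaces.

```c
#include <stdbool.h>

/* Buckets a clock tick can be charged to (illustrative names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

#define DISPATCH_LEVEL 2   /* IRQL 2 = dispatch/DPC level */

/* Decide where one clock tick is charged, following the rules above.
   in_dpc:    a DPC routine was running when the clock fired
   user_mode: the interrupted thread was executing user-mode code */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    if (irql > DISPATCH_LEVEL)
        return CHARGE_INTERRUPT;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: ordinary thread execution */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For instance, charge_tick(2, true, false) lands in the DPC bucket, while the same IRQL without an active DPC is billed to the thread's kernel time.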

7373

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really counts of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted for
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: common header (Size, Type, State, Wait list head) followed by object-type-specific data; see ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
- some exist exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their WaitBlockList; each wait block holds List entry, Object, Thread, Key, Type, and Next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so the section is left exactly once */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  - dwTimeout == 0: no wait, check whether the object is signaled
  - dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to the array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  - Function returns the index of the first signaled object

Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to the non-signaled state after completing wait functions
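The "returns index of first signaled object" rule is easier to see when the return value is decoded explicitly. The sketch below redefines the relevant Windows SDK constants with a DEMO_ prefix so that it compiles without windows.h; the enum and the function decode_wait are hypothetical helpers, not part of the Windows API.

```c
#include <stdint.h>

/* Same numeric values as WAIT_OBJECT_0, WAIT_ABANDONED_0 and WAIT_TIMEOUT
   in the Windows SDK, redefined so the sketch builds anywhere. Any other
   value (including WAIT_FAILED, 0xFFFFFFFF) is treated as a failure. */
#define DEMO_WAIT_OBJECT_0    0x00000000u
#define DEMO_WAIT_ABANDONED_0 0x00000080u
#define DEMO_WAIT_TIMEOUT     0x00000102u

typedef enum { KIND_SIGNALED, KIND_ABANDONED, KIND_TIMEOUT, KIND_FAILED } wait_kind_t;

/* Classify the return value of a wait on `count` handles (count <= 64).
   For KIND_SIGNALED and KIND_ABANDONED, *index receives the position of
   the handle in the lpHandles array (meaningful when bWaitAll is FALSE). */
wait_kind_t decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    if (ret < DEMO_WAIT_OBJECT_0 + count) {
        *index = ret - DEMO_WAIT_OBJECT_0;
        return KIND_SIGNALED;
    }
    if (ret >= DEMO_WAIT_ABANDONED_0 && ret < DEMO_WAIT_ABANDONED_0 + count) {
        *index = ret - DEMO_WAIT_ABANDONED_0;   /* owner died holding this mutex */
        return KIND_ABANDONED;
    }
    if (ret == DEMO_WAIT_TIMEOUT)
        return KIND_TIMEOUT;
    return KIND_FAILED;
}
```

In real code the same arithmetic is applied directly to the DWORD returned by WaitForMultipleObjects().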

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done = 0, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
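As a rough POSIX counterpart to the Windows mutex example, the sketch below guards a shared counter with a pthread mutex. The helper names (run_workers, try_lock_once, unlock_once) and the 16-thread cap are assumptions made for this illustration only.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                 /* shared by all threads */

/* Worker: add 1 to the shared counter n times, under the mutex. */
static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);       /* blocks, like WaitForSingleObject(h, INFINITE) */
        counter++;
        pthread_mutex_unlock(&lock);     /* like ReleaseMutex(h) */
    }
    return NULL;
}

/* Run up to 16 workers and return the final counter value. */
long run_workers(int nthreads, long n)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &n);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Non-blocking attempt, like WaitForSingleObject(h, 0); returns 1 if acquired. */
int try_lock_once(void)
{
    return pthread_mutex_trylock(&lock) == 0;
}

/* Release a lock taken by try_lock_once(). */
void unlock_once(void)
{
    pthread_mutex_unlock(&lock);
}
```

With the mutex in place, run_workers(4, 10000) always yields 40000; without it, lost updates would make the total unpredictable.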

8888

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
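The counting behaviour can be tried out with POSIX semaphores, used here as a stand-in for the Windows semaphore object. The helpers take_units, give_units, and current_count are hypothetical names for this sketch; sem_trywait plays the role of a wait with a zero timeout.

```c
#include <semaphore.h>

/* Take up to `want` units without blocking.
   Returns how many units were actually obtained. */
int take_units(sem_t *sem, int want)
{
    int got = 0;
    while (got < want && sem_trywait(sem) == 0)   /* each successful wait decrements by 1 */
        got++;
    return got;
}

/* Give back n units. POSIX sem_post() adds exactly one, so unlike
   ReleaseSemaphore() with cReleaseCount > 1 this needs a loop. */
void give_units(sem_t *sem, int n)
{
    for (int i = 0; i < n; i++)
        sem_post(sem);
}

/* Current count, for inspection only. */
int current_count(sem_t *sem)
{
    int value = 0;
    sem_getvalue(sem, &value);
    return value;
}
```

A semaphore created with sem_init(&s, 0, 3) hands out three units; a fourth take_units() attempt gets nothing until some thread gives a unit back.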

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
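Although pthreads has no built-in manual-reset event, one can be approximated with a mutex, a condition variable, and a flag. The following is a minimal sketch under that assumption; mr_event_t and the mr_event_* functions are invented names, not a standard API.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;    /* stays true until explicitly reset */
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: wake every waiter. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Block until the event is signaled; the state is left unchanged. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                          /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool mr_event_is_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

The flag is what makes the event "sticky": a condition variable alone forgets a signal that arrives while nobody is waiting.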

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main connects prog1 and prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
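The same one-way, byte-stream behaviour can be observed with the POSIX pipe() call, shown here as a portable analogue of CreatePipe(). The function pipe_roundtrip is a hypothetical helper; it assumes the message is small enough to fit in the pipe buffer, so the single write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Send msg through a freshly created pipe and read it back into out
   (capacity cap, which must be at least 1).
   Returns the number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fd[2];                        /* fd[0] = read end, fd[1] = write end */
    if (pipe(fd) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t written = write(fd[1], msg, len);
    close(fd[1]);                     /* closing the write end lets the reader see EOF */
    if (written != (ssize_t)len) {
        close(fd[0]);
        return -1;
    }

    ssize_t n = read(fd[0], out, cap - 1);
    close(fd[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Data only ever flows from the write end to the read end; a two-way exchange needs a second pipe.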

9494

IO Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf (stderr, "Anon pipe create failed\n"); exit (1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf (stderr, "CreateProc1 failed\n"); exit (2);
    }
    CloseHandle (hWritePipe);

9595

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf (stderr, "CreateProc2 failed\n"); exit (3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same instance
- Server can respond to the client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. = local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile()?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile() with the named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile() / ReadFile() sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle()
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf (stderr, "Failure to locate server\n"); exit (3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }
    /* End of server operation. */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

107107

Locate a server via mailslot

[Figure: application clients 0 through n act as mailslot servers (readers); the application server acts as the mailslot client (writer) and sends its status message periodically]

Mailslot servers (App client 0 ... App client n), each running:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App Server); message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo
    }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile() with the following names
- \\.\mailslot\[path]name: retrieve a handle for a local mailslot
- \\host\mailslot\[path]name: retrieve a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; max message length is 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


4141

Interrupt processing

Interrupt dispatch table (IDT)
- Links to interrupt service routines

x86
- Interrupt controller interrupts the processor (single line)
- Processor queries for the interrupt vector and uses the vector as an index into the IDT

Alpha
- PAL code (Privileged Architecture Library, the Alpha BIOS) determines the interrupt vector and calls the kernel
- Kernel uses the vector to index the IDT

After ISR execution, IRQL is lowered to its initial level

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls the ISR with the interrupt object as parameter (HW cannot pass parameters to an ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn ISR on/off

Interrupt objects can synchronize access to ISR data
- Multiple instances of an ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected to one IRQL

4343

Predefined IRQLs

High
- used when halting the system (via KeBugCheck())

Power fail
- originated in the NT design document but has never been used

Inter-processor interrupt
- used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- used to update the system's clock and to allocate CPU time to threads

Profile
- used for kernel profiling (see Kernel profiler: Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that the kernel and device drivers generate

Passive
- No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

IRQL | x64 | IA64
15 | High/Profile | High/Profile/Power
14 | Interprocessor Interrupt/Power | Interprocessor Interrupt
13 | Clock | Clock
12 | Synch (Srv 2003) | Synch (MP only)
... | Device n | Device n
4 | ... | Device 1
3 | Device 1 | Correctable Machine Check
2 | Dispatch/DPC | Dispatch/DPC & Synch (UP only)
1 | APC | APC
0 | Passive/Low | Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16
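The two deterministic mappings can be checked with a couple of one-line helpers. This sketch reads the x64/IA64 rule as the IDT vector number divided by 16 (integer division); the function names are made up for illustration.

```c
/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    return 27 - irq;
}

/* x64 / IA64 rule: IRQL = IDT vector number / 16 (integer division),
   i.e. the upper four bits of an 8-bit vector number. */
int irql_from_vector_x64(int vector)
{
    return vector / 16;
}
```

Under these rules, IRQ 0 on an x86 UP system maps to IRQL 27, and on x64 every vector in the range 0xD0 to 0xDF shares IRQL 13.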

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector

4747

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when the kernel is deep within many layers of code
- Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- A spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
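The "single data cell plus atomic test-and-modify" idea can be sketched in portable C11 with atomic_flag. This is a user-mode illustration only; demo_spinlock and the spin_* functions are hypothetical names and are not how KSPIN_LOCK is implemented.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic bit is enough to record "free" or "owned". */
typedef struct { atomic_flag held; } demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Atomic test-and-set returns the previous value:
   false means the lock was free and now belongs to the caller. */
bool spin_try_acquire(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is obtained. */
void spin_acquire(demo_spinlock *l)
{
    while (!spin_try_acquire(l))
        ;   /* spin */
}

/* Clear the bit so that another processor's test-and-set can succeed. */
void spin_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every failed attempt in spin_acquire() is another atomic operation on the same shared memory cell, which is why heavy contention on a plain spinlock is costly.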

5050

Kernel Synchronization

[Figure: Processor A and Processor B each run the sequence below against the spinlock that guards the shared DPC queue]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                       /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

5151

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus the busy-wait loop requires no access to the memory bus

When releasing the lock, the first waiting processor's flag is modified
- Exactly one processor is being signaled
- Pre-determined wait order
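A well-known lock with these properties is the MCS queue lock, sketched below in C11. It is offered as an illustration of the queueing idea under that assumption, not as the Windows kernel's actual implementation; qnode_t, qlock_t, and the qlock_* functions are names invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each waiter brings its own node and spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool             locked;
} qnode_t;

typedef struct { _Atomic(qnode_t *) tail; } qlock_t;   /* NULL tail = lock is free */

void qlock_acquire(qlock_t *l, qnode_t *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode_t *prev = atomic_exchange(&l->tail, me);     /* join the end of the queue */
    if (prev == NULL)
        return;                                        /* queue was empty: lock acquired */
    atomic_store(&prev->next, me);                     /* link behind the predecessor */
    while (atomic_load(&me->locked))
        ;                                              /* spin on the local flag only */
}

void qlock_release(qlock_t *l, qnode_t *me)
{
    qnode_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                    /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                          /* a successor is still linking in */
    }
    atomic_store(&succ->locked, false);                /* wake exactly one waiter, in FIFO order */
}
```

Because the queue is built by an atomic exchange on the tail pointer, waiters are served in arrival order, and each release touches only the flag of the next waiter.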

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of the use of spinlocks & the length of time they are held
- Scheduling database is now per-CPU, which allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

- Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; for "all", all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. Labeled steps: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete: set object to signaled state"]

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread; it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Columns: Dispatcher object | System events and resulting state change | Effect of signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: Owning thread releases mutex
- signaled -> nonsignaled: Resumed thread acquires mutex
- Effect: Kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: Owning thread or other thread releases mutex
- signaled -> nonsignaled: Resumed thread acquires mutex
- Effect: Kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: One thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: A thread acquires the semaphore; more resources are not available
- Effect: Kernel resumes one or more waiting threads

6161

What signals an object (contd)

Columns: Dispatcher object | System events and resulting state change | Effect of signaled state on waiting threads

Event
- nonsignaled -> signaled: A thread sets the event
- signaled -> nonsignaled: Kernel resumes one or more threads
- Effect: Kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: Dedicated thread sets one event in the event pair
- signaled -> nonsignaled: Kernel resumes the other dedicated thread
- Effect: Kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: Timer expires
- signaled -> nonsignaled: A thread (re)initializes the timer
- Effect: Kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: Thread terminates
- signaled -> nonsignaled: A thread reinitializes the thread object
- Effect: Kernel resumes all waiting threads

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: common header (Size, Type, State, Wait listhead) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"; Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked to wait blocks (Key, Type, Next link, List entry, Object, Thread); each wait block also hangs off the Wait listhead of a dispatcher object (Size, Type, State, Wait listhead, object-type-specific data)]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded block raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  - dwTimeout == 0: no wait, check whether object is signaled
  - dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for first signaled object or all objects
  - When waiting for any object, the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
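The "any" versus "all" rule and the index-based return value can be modeled without Windows at all. The following is a portable C sketch of a zero-timeout poll over an array of signaled flags; `poll_objects` and the `MODEL_` constants are hypothetical names, and the constants only mirror the values of WAIT_OBJECT_0 and WAIT_TIMEOUT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WAIT_OBJECT_0 0     /* same value as WAIT_OBJECT_0 */
#define MODEL_WAIT_TIMEOUT  258   /* same value as WAIT_TIMEOUT (0x102) */

/* Model of a wait with dwTimeout == 0 over `count` objects.
 * wait_all == true : satisfied only if every object is signaled.
 * wait_all == false: satisfied by the lowest-index signaled object,
 *                    whose index is encoded in the return value. */
int poll_objects(const bool *signaled, size_t count, bool wait_all)
{
    assert(count > 0 && count <= 64);   /* MAXIMUM_WAIT_OBJECTS */
    if (wait_all) {
        for (size_t i = 0; i < count; i++)
            if (!signaled[i])
                return MODEL_WAIT_TIMEOUT;
        return MODEL_WAIT_OBJECT_0;
    }
    for (size_t i = 0; i < count; i++)
        if (signaled[i])
            return MODEL_WAIT_OBJECT_0 + (int)i;
    return MODEL_WAIT_TIMEOUT;
}
```

The model deliberately leaves out the side effects listed above (for example, resetting an auto-reset event once the wait is satisfied).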

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0) {
        counter += local_value;
    } else {
        /* mutex was abandoned (or the wait failed) */
        if (ret == WAIT_ABANDONED)
            ReleaseMutex( mutex );   /* an abandoned wait still grants ownership */
        break;                       /* exit the loop */
    }
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
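For comparison with the Windows mutex example, here is a minimal pthreads sketch of the same shared-counter pattern. The helper names `run_workers` and `try_once`, and the cap of 16 threads, are assumptions made for illustration; only the `pthread_*` calls are standard.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;               /* shared by all worker threads */

/* Each worker adds 1 to the shared counter `*arg` times. */
static void *worker(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);     /* blocks, like an INFINITE wait */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Polling acquire, like a wait with timeout 0.
 * Returns 1 if the mutex was free (it is released again), 0 if busy. */
int try_once(void)
{
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;
    pthread_mutex_unlock(&lock);
    return 1;
}

/* Start up to 16 workers, wait for all of them, return the final count. */
long run_workers(int nthreads, long iterations)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

With the lock in place the result is always `nthreads * iterations`; removing the lock/unlock pair would make the final count unpredictable.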

8888

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment count by any positive value, up to the maximum count
- ReleaseSemaphore() returns the old semaphore count (via lpPreviousCount)

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
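Although there is no exact equivalent, a manual-reset event is commonly approximated with a mutex, a condition variable and a boolean flag. The sketch below assumes that approach; the `manual_event` type and the `event_` functions are hypothetical names, not part of pthreads.

```c
#include <pthread.h>
#include <stdbool.h>

/* Approximation of a manual-reset event: flag + condition variable. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            signaled;
} manual_event;

void event_init(manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->mtx, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent on a manual-reset event: wake every waiter, stay signaled. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->mtx);
}

/* Like ResetEvent: later waiters block until the next event_set. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->mtx);
    e->signaled = false;
    pthread_mutex_unlock(&e->mtx);
}

/* Block until signaled; the loop guards against spurious wakeups. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->mtx);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->mtx);
    pthread_mutex_unlock(&e->mtx);
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->mtx);
}
```

The flag is what makes the state persistent: a condition variable alone forgets a signal that arrives before the waiter, whereas a set event stays signaled until it is reset.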

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
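The same one-way, byte-stream behavior can be observed with the POSIX pipe() call, which plays the role of CreatePipe() on UNIX systems. The sketch below is a single-process round trip; `pipe_roundtrip` is a hypothetical helper, and it assumes the message is small enough to fit in the pipe buffer so that the write does not block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into a fresh pipe and read it back from the other end.
 * Returns the number of bytes read, or -1 on error.
 * Assumes out_size > 0 and that msg fits in the pipe buffer. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                          /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);                   /* reader sees end-of-file after the data */
        fds[1] = -1;
        n = read(fds[0], out, out_size - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    if (fds[1] != -1)
        close(fds[1]);
    close(fds[0]);
    return n;
}
```

Data only flows from the write end to the read end; a second pipe is needed for traffic in the opposite direction.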

9494

IO Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

9595

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on remote machine (. = local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into one call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server.\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm.)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: App client 0 ... App client n act as mailslot servers; the App server acts as mailslot client and sends its status message periodically]

Mailslot servers (App client 0 ... App client n):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server):

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is msg size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

Page 42: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

4242

Interrupt object

Allows device drivers to register ISRs for their devices
- Contains dispatch code (initial handler)
- Dispatch code calls ISR with interrupt object as parameter (HW cannot pass parameters to ISR)

Connecting/disconnecting interrupt objects
- Dynamic association between ISR and IDT entry
- Loadable device drivers (kernel modules)
- Turn on/off ISR

Interrupt objects can synchronize access to ISR data
- Multiple instances of ISR may be active simultaneously (MP machine)
- Multiple ISRs may be connected with one IRQL

4343

Predefined IRQLs

High
- Used when halting the system (via KeBugCheck())

Power fail
- Originated in the NT design document, but has never been used

Inter-processor interrupt
- Used to request action from other processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock
- Used to update system's clock, allocation of CPU time to threads

Profile
- Used for kernel profiling (see Kernel profiler: Kernprof.exe, Res Kit)

4444

Predefined IRQLs (contd)

Device

- Used to prioritize device interrupts

DPC/dispatch and APC
- Software interrupts that kernel and device drivers generate

Passive
- No interrupt level at all, normal thread execution

4545

IRQLs on 64-bit Systems

x64
  15     High/Profile
  14     Interprocessor Interrupt/Power
  13     Clock
  12     Synch (Srv 2003)
  3-11   Device 1 ... Device n
  2      Dispatch/DPC
  1      APC
  0      Passive/Low

IA64
  15     High/Profile/Power
  14     Interprocessor Interrupt
  13     Clock
  12     Synch (MP only)
  4-11   Device 1 ... Device n
  3      Correctable Machine Check
  2      Dispatch/DPC & Synch (UP only)
  1      APC
  0      Passive/Low

4646

Interrupt Prioritization amp Delivery

IRQLs are determined as follows
- x86 UP systems: IRQL = 27 - IRQ
- x86 MP systems: bucketized (random)
- x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
- By default, any processor can receive an interrupt from any device
  - Can be configured with IntFilter utility in Resource Kit
- On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
- On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  - Processors are assigned round robin for each interrupt vector

4747

Software interrupts

Initiating thread dispatching

- DPCs allow for scheduling actions when kernel is deep within many layers of code
- Delayed scheduling decision, one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient
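The idea that a spinlock is a single memory cell accessed with an atomic test-and-modify operation can be sketched with C11 atomics. This is a user-mode illustration only, not the kernel's KSPIN_LOCK routines; `tas_lock` and the `tas_` functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One flag is the whole lock: set = owned, clear = free. */
typedef struct {
    atomic_flag held;
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Single attempt: atomically set the flag and report whether it was free. */
bool tas_try_acquire(tas_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the atomic test-and-set finds the flag clear. */
void tas_acquire(tas_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* spin: every iteration is an atomic read-modify-write on shared memory */
    }
}

/* Clear the flag so that exactly one spinner can win the next test-and-set. */
void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every spinning processor repeatedly performs an atomic write to the same cell, which is the source of the bus contention that queued spinlocks are designed to avoid.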

5050

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

[Figure: Processor A and Processor B each run the sequence below against the shared DPC queue, which is protected by one spinlock]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                        /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

5151

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
- Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the flag of the first waiting processor is modified
- Exactly one processor is being signaled
- Pre-determined wait order
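A well-known algorithm with exactly these properties is the MCS queue lock. The sketch below uses it as a stand-in to illustrate the idea with C11 atomics; it is an assumption that this resembles the Windows implementation, and `mcs_lock`, `mcs_node`, `mcs_acquire` and `mcs_release` are illustrative names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each waiter brings its own node and spins only on its own flag. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the wait queue */
    atomic_bool locked;                /* true while this waiter must spin */
} mcs_node;

/* The lock itself is just a pointer to the tail of the queue. */
typedef struct {
    _Atomic(mcs_node *) tail;          /* NULL when the lock is free */
} mcs_lock;

void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* Enqueue at the tail; the previous tail (if any) is our predecessor. */
    mcs_node *prev = atomic_exchange(&lock->tail, me);
    if (prev == NULL)
        return;                        /* queue was empty: lock acquired */
    atomic_store(&prev->next, me);
    while (atomic_load(&me->locked)) {
        /* spin on our own flag only */
    }
}

void mcs_release(mcs_lock *lock, mcs_node *me)
{
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A waiter is enqueuing itself; wait until it links in. */
        while ((succ = atomic_load(&me->next)) == NULL) {
        }
    }
    atomic_store(&succ->locked, false);   /* hand the lock to exactly one waiter */
}
```

Waiters are served in the order in which they swapped themselves onto the tail, which gives the pre-determined wait order, and a release touches only the successor's flag rather than a cell that every processor is polling.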

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU
  - Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated
  - Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks)
  - Doesn't use spinlocks when no contention
  - Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once
  - "All": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: thread state diagram (Initialized, Ready, Standby, Running, Waiting, Transition, Terminated) showing the interaction with thread scheduling: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

5858

Interaction bet. Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction bet. Synchronization & Dispatching

Dispatcher re-schedules new thread: it may preempt the running thread if that thread has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled -> signaled: owning thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled -> signaled: owning thread or other thread releases mutex
- signaled -> nonsignaled: resumed thread acquires mutex
- Effect: kernel resumes one waiting thread

Semaphore
- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect: kernel resumes one or more waiting threads

Event
- nonsignaled -> signaled: a thread sets the event
- signaled -> nonsignaled: kernel resumes one or more threads
- Effect: kernel resumes one or more waiting threads

Event pair
- nonsignaled -> signaled: dedicated thread sets one event in the event pair
- signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect: kernel resumes waiting dedicated thread

Timer
- nonsignaled -> signaled: timer expires
- signaled -> nonsignaled: a thread (re)initializes the timer
- Effect: kernel resumes all waiting threads

Thread
- nonsignaled -> signaled: thread terminates
- signaled -> nonsignaled: a thread reinitializes the thread object
- Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec
  - See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linking a chain of DPC objects]

6666

Delivering a DPC

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table (IRQL levels from High and Power failure down to Dispatch/DPC, APC and Low), the DPC queue, and the dispatcher]

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW int.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
- Permission required for user-mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection( &crit );
        counter += local_value;
    }
    /* leave exactly once per enter; the __finally block always runs */
    __finally { LeaveCriticalSection( &crit ); }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0 - no wait, check whether object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles - pointer to array identifying these objects
– bWaitAll - whether to wait for first signaled object or all objects
  Function returns index of first signaled object

Side effects
– Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions
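The "any" versus "all" rule can be modelled without the Windows API. The sketch below is a portable illustration only: `model_wait_check`, `MODEL_MAX_WAIT_OBJECTS`, and `MODEL_WAIT_TIMEOUT` are hypothetical names, the array of flags stands in for the handle array, and the function behaves like a poll with dwTimeout == 0. Returning 0 for a satisfied "all" wait is a modelling choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WAIT_OBJECTS 64   /* mirrors MAXIMUM_WAIT_OBJECTS on the slide */
#define MODEL_WAIT_TIMEOUT     (-1) /* hypothetical "wait not satisfied" result */

/* Model of a zero-timeout multi-object wait over an array of signaled flags.
   wait_all == false: returns the index of the first signaled object.
   wait_all == true : returns 0 only if every object is signaled.
   Otherwise returns MODEL_WAIT_TIMEOUT. */
int model_wait_check(const bool signaled[], size_t count, bool wait_all)
{
    assert(count >= 1 && count <= MODEL_MAX_WAIT_OBJECTS);

    size_t first = count;               /* "none found yet" marker */
    for (size_t i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == count)
                first = i;
        } else if (wait_all) {
            return MODEL_WAIT_TIMEOUT;  /* one nonsignaled object blocks "all" */
        }
    }
    if (first == count)
        return MODEL_WAIT_TIMEOUT;      /* nothing signaled at all */
    return wait_all ? 0 : (int)first;
}
```

With flags {false, true, true}, an "any" wait is satisfied by index 1, while an "all" wait is not satisfied until the first object is signaled too.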

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
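A minimal pthreads sketch of the two acquisition styles compared above. The names `counter_add` and `counter_try_add` are hypothetical helpers, and the statically initialized mutex stands in for an explicit pthread_mutex_init()/pthread_mutex_destroy() pair.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all threads */

/* Blocking acquire: comparable to WaitForSingleObject(hMutex, INFINITE).
   Returns the counter value after the update. */
int counter_add(int value)
{
    pthread_mutex_lock(&counter_lock);
    counter += value;
    int result = counter;
    pthread_mutex_unlock(&counter_lock);
    return result;
}

/* Non-blocking attempt: comparable to a wait with timeout == 0.
   Returns 1 if the update happened, 0 if the mutex was busy. */
int counter_try_add(int value)
{
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc == EBUSY)
        return 0;          /* another thread holds the mutex; do not wait */
    assert(rc == 0);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

A caller that must not stall (for example a polling loop) would use counter_try_add() and retry later when it returns 0.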

8888

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns old semaphore count
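The counting rules above can be illustrated with a small single-threaded model. This is a sketch, not the Windows implementation: `model_sem` and its functions are hypothetical names, and the rule that a release exceeding the maximum fails without changing the count follows the documented behavior of ReleaseSemaphore().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    long count;   /* current resource count */
    long max;     /* upper bound fixed at creation (cSemMax) */
} model_sem;

/* A semaphore is signaled while count > 0. */
bool model_sem_is_signaled(const model_sem *s)
{
    return s->count > 0;
}

/* Models a satisfied wait: only possible while signaled, decrements by 1. */
bool model_sem_try_wait(model_sem *s)
{
    if (s->count <= 0)
        return false;
    s->count -= 1;
    return true;
}

/* Models a release: adds `release` to the count and reports the old count.
   Fails, leaving the count unchanged, if the maximum would be exceeded. */
bool model_sem_release(model_sem *s, long release, long *previous)
{
    assert(s->count >= 0 && s->count <= s->max);
    if (release <= 0 || s->count + release > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return true;
}
```

Starting from count 2 with maximum 3, two waits succeed, a third finds the semaphore nonsignaled, and a release of 3 restores the full count while reporting the old count of 0.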

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
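Because a condition variable carries no state of its own, a manual-reset event is usually approximated by pairing it with a mutex and a flag. The sketch below shows one such approximation; the `manual_event` type and its functions are hypothetical names, not part of pthreads, and error handling is reduced to assertions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* the state a condition variable lacks */
} manual_event;

void manual_event_init(manual_event *e, bool initial_state)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: release every waiter and
   stay signaled until explicitly reset. */
void manual_event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void manual_event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; returns at once if it already is.
   The loop guards against spurious wakeups. */
void manual_event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void manual_event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

An auto-reset variant would clear the flag inside manual_event_wait() before unlocking and use pthread_cond_signal() instead of the broadcast.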

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main connects prog1 to prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

9494

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

9595

Pipe example (cont'd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

9696

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server, each over its own instance of the same named pipe
– Server can respond to a client using the same instance the client connected to

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine ("." denotes the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (cont'd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns false if client connected between the CreateNamedPipe call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);

CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &Response.Record,
                (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; each one runs
    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );  /* then connect to server */
The application server acts as mailslot client and runs
    while (…) { Sleep(…); hMS = CreateFile( "\\*\mailslot\status" ); … WriteFile( hMS, &StatInfo … ); }
Message is sent periodically]

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is max. message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


4343

Predefined IRQLs

High – used when halting the system (via KeBugCheck())

Power fail – originated in the NT design document but has never been used

Inter-processor interrupt – used to request action from another processor (dispatching a thread, updating a processor's TLB, system shutdown, system crash)

Clock – used to update the system's clock and to allocate CPU time to threads

Profile – used for kernel profiling (see Kernel profiler, Kernprof.exe, Res Kit)

4444

Predefined IRQLs (cont'd)

Device
– Used to prioritize device interrupts

DPC/dispatch and APC
– Software interrupts that kernel and device drivers generate

Passive
– No interrupt level at all; normal thread execution

4545

IRQLs on 64-bit Systems

IRQL   x64                              IA64
15     High/Profile                     High/Profile/Power
14     Interprocessor Interrupt/Power   Interprocessor Interrupt
13     Clock                            Clock
12     Synch (Srv 2003)                 Synch (MP only)
11     Device n                         Device n
...    ...                              ...
4      ...                              Device 1
3      Device 1                         Correctable Machine Check
2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
1      APC                              APC
0      Passive/Low                      Passive/Low

4646

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device
  Can be configured with IntFilter utility in Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source
  Processors are assigned round robin for each interrupt vector
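The two deterministic IRQL rules on this slide can be written out as plain arithmetic. The following is a sketch only: the function names are hypothetical, the code just encodes the stated formulas, and it does not query any real hardware or HAL.

```c
#include <assert.h>

/* x86 uniprocessor rule from the slide: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0);
    return 27 - irq;
}

/* x64 / IA64 rule from the slide: IRQL = IDT vector number / 16.
   Integer division groups 16 consecutive vectors into one IRQL. */
int irql_from_vector_64(int vector)
{
    assert(vector >= 0 && vector <= 255);   /* assumption: 256 IDT entries */
    return vector / 16;
}
```

For instance, IRQ 1 on an x86 UP system maps to IRQL 26, and vectors 0xD0 through 0xDF on x64 all map to IRQL 13.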

4747

Software interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous I/O operations

4848

Flow of Interrupts

4949

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

5050

Kernel Synchronization

[Figure: Processor A and Processor B each execute the sequence below against one spinlock that guards the shared DPC queue]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin                           /* critical section */
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
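The acquire-until-success loop can be sketched in portable C11 with a single atomic flag. This is an illustration of the test-and-set idea, not the kernel's KSPIN_LOCK code; `demo_spinlock` and the `spin_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A spinlock is one memory cell; a single bit of state is enough. */
typedef struct { atomic_flag cell; } demo_spinlock;

void spin_init(demo_spinlock *l)
{
    atomic_flag_clear(&l->cell);
}

/* One atomic test-and-set; true means the caller now owns the lock. */
bool spin_try_acquire(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->cell, memory_order_acquire);
}

/* "do acquire_spinlock until (SUCCESS)": busy-wait until the cell is free. */
void spin_acquire(demo_spinlock *l)
{
    while (!spin_try_acquire(l))
        ;   /* spin */
}

void spin_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->cell, memory_order_release);
}

/* Usage: single-threaded walk through the two lock states. */
void spin_demo(void)
{
    demo_spinlock dpc_queue_lock;
    spin_init(&dpc_queue_lock);
    spin_acquire(&dpc_queue_lock);               /* free -> owned */
    assert(!spin_try_acquire(&dpc_queue_lock));  /* a second owner is refused */
    spin_release(&dpc_queue_lock);               /* owned -> free */
    assert(spin_try_acquire(&dpc_queue_lock));
    spin_release(&dpc_queue_lock);
}
```

Every failed attempt in spin_acquire() is another atomic write to the shared cell, which is the source of contention that motivates more elaborate lock designs.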

5151

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the first waiting processor's flag is modified
– Exactly one processor is being signaled
– Pre-determined wait order
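One well-known way to obtain these properties is an MCS-style queue lock, sketched below in C11. The slides do not say that Windows uses exactly this algorithm, so treat it as an illustration of the idea; `qlock`, `qnode`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiting processor; each waiter spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;   /* tail of the wait queue */

void qlock_init(qlock *l) { atomic_init(&l->tail, NULL); }

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev == NULL)
        return;                                    /* lock was free */
    atomic_store(&prev->next, me);                 /* link behind predecessor */
    while (atomic_load(&me->locked))
        ;                                          /* spin on the local flag */
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                      /* successor is still linking */
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}

/* Usage: uncontended acquire and release on a single thread. */
void qlock_demo(void)
{
    qlock lock;
    qnode me;
    qlock_init(&lock);
    qlock_acquire(&lock, &me);
    assert(atomic_load(&lock.tail) == &me);
    qlock_release(&lock, &me);
    assert(atomic_load(&lock.tail) == NULL);
}
```

Waiters are released in the order in which they joined the queue, and a release touches only the successor's flag rather than a cell that every processor is polling.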

5252

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length they are held
– Scheduling database now per-CPU
  Allows thread state transitions in parallel

5353

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated
  Charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks)
  Doesn't use spinlocks when no contention
  Used for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once
  "All": all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include optional timeout argument
– Waiting threads consume no CPU time

5555

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g., it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

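Waiting on Dispatcher Objects – outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated. Create and initialize thread object leads to Initialized; a thread that waits on an object handle moves to Waiting; setting the object to signaled state completes the wait and makes the thread schedulable again]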

5858

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread – it may preempt running thread if it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled → signaled: owning thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled → signaled: owning thread or other thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

6161

What signals an object (cont'd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads

Event
– nonsignaled → signaled: a thread sets the event
– signaled → nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled → signaled: dedicated thread sets one event in the event pair
– signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– nonsignaled → signaled: timer expires
– signaled → nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled → signaled: thread terminates
– signaled → nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
  See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head → DPC object → DPC object → DPC object]

6666

Delivering a DPC

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table (High, Power failure, …, Dispatch/DPC, APC, Low), the DPC queue holding DPC objects, and the dispatcher]
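The queue-then-drain sequence can be sketched as a simple per-processor linked list of deferred routines. This is a simplified model with hypothetical names (`dpc`, `dpc_queue`, `dpc_enqueue`, `dpc_drain`, `dpc_count_hit`); it assumes strict FIFO order and ignores IRQL handling, locking, and targeting other CPUs.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*dpc_routine)(void *context);

typedef struct dpc {
    dpc_routine routine;    /* procedure to run at dispatch level */
    void *context;          /* argument handed to the routine */
    struct dpc *next;
} dpc;

typedef struct { dpc *head, *tail; } dpc_queue;   /* one queue per processor */

/* Steps 1-2: an ISR or the kernel queues a DPC object at the tail. */
void dpc_enqueue(dpc_queue *q, dpc *d)
{
    assert(d->routine != NULL);
    d->next = NULL;
    if (q->tail != NULL)
        q->tail->next = d;
    else
        q->head = d;
    q->tail = d;
}

/* Step 4: run every queued routine in order; returns how many ran. */
int dpc_drain(dpc_queue *q)
{
    int ran = 0;
    while (q->head != NULL) {
        dpc *d = q->head;
        q->head = d->next;
        if (q->head == NULL)
            q->tail = NULL;
        d->routine(d->context);
        ran++;
    }
    return ran;
}

/* Example routine: counts how often it was invoked. */
void dpc_count_hit(void *context)
{
    *(int *)context += 1;
}
```

In this model the drain function would be called once the processor's IRQL drops below dispatch/DPC level, which corresponds to steps 2 and 3 of the list above.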

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when thread is in alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable, but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each holding APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
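The four accounting rules translate directly into a decision function. The sketch below only restates the slide's rules; `charge_t`, `clock_tick_charge`, and the `was_user_mode` parameter (used to pick between the thread's user and kernel time below IRQL 2) are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide which counter one clock tick is charged to, given the IRQL
   that the clock interrupt preempted. */
charge_t clock_tick_charge(int irql, bool in_dpc, bool was_user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;                       /* IRQL > 2 */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC                     /* IRQL = 2, in a DPC */
                      : CHARGE_THREAD_KERNEL;          /* IRQL = 2, no DPC */
    return was_user_mode ? CHARGE_THREAD_USER          /* IRQL < 2 */
                         : CHARGE_THREAD_KERNEL;
}
```

Because the decision is made only at each clock tick, a thread that runs and blocks between two ticks is never charged for that time.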

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before clock fires
– Thus, threads may run but never get charged

View context switch activity with Process Explorer
– Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

[Figure: dispatcher object layout: a header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
– Some exclusively for synchronization, e.g., events, mutexes ("mutants"), semaphores, queues, timers
– Others can be waited for as a side effect of their primary function, e.g., processes, threads, file objects
– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "Signaled" vs. "nonsignaled"
– When signaled, a wait on the object is satisfied
– Different object types differ in terms of what changes their state
– Wait and unwait implementation is common to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0
Threads/processes use wait functions
– Each successful wait decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value (up to the maximum)
– ReleaseSemaphore() reports the old semaphore count through lpPreviousCount

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )
HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName )
BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )
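The counting behaviour can be tried out with a POSIX semaphore, used here only as a stand-in for the Windows object. This is a sketch under assumptions: the function names are hypothetical, unnamed semaphores (sem_init) are assumed to be available, and unlike ReleaseSemaphore(), sem_post() always increments by exactly 1.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Returns the count that remains after two waits on a semaphore created with count 3. */
int count_after_two_waits(void) {
    sem_t sem;
    int value = -1;
    if (sem_init(&sem, 0, 3) != 0) return -1;   /* unnamed, process-private */
    sem_wait(&sem);                 /* 3 -> 2 */
    sem_wait(&sem);                 /* 2 -> 1 */
    sem_getvalue(&sem, &value);
    sem_destroy(&sem);
    return value;
}

/* Returns 1 if a non-blocking wait fails with EAGAIN once the count has reached 0,
   i.e. the semaphore is "nonsignaled" at count 0. */
int zero_count_is_nonsignaled(void) {
    sem_t sem;
    if (sem_init(&sem, 0, 1) != 0) return -1;
    sem_wait(&sem);                 /* 1 -> 0 */
    int rc = sem_trywait(&sem);     /* would block, so it fails instead */
    int nonsignaled = (rc == -1 && errno == EAGAIN);
    sem_post(&sem);                 /* 0 -> 1, releases one resource */
    sem_destroy(&sem);
    return nonsignaled;
}
```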

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously; it must be reset manually
– PulseEvent() releases all threads waiting on a manual-reset event and automatically resets the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE creates the event in the signaled state

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )
BOOL SetEvent( HANDLE hEvent )
BOOL ResetEvent( HANDLE hEvent )
BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()
Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()
Signaling
– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads
No exact equivalent to manual-reset events
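Because a condition variable keeps no state of its own, a manual-reset event is usually approximated by pairing it with a mutex and a flag. The sketch below is one such approximation, not an exact equivalent; the manual_event type and all function names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;      /* stays true until event_reset() */
} manual_event;

void event_init(manual_event *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

void event_set(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

enum { WAITERS = 3 };
static manual_event start_event;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;
static int released = 0;                /* protected by released_lock */

static void *waiter(void *arg) {
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Returns how many waiters got past the event after a single event_set(). */
int run_event_demo(void) {
    pthread_t threads[WAITERS];
    released = 0;
    event_init(&start_event);
    for (int i = 0; i < WAITERS; i++)
        pthread_create(&threads[i], NULL, waiter, NULL);
    event_set(&start_event);
    for (int i = 0; i < WAITERS; i++)
        pthread_join(threads[i], NULL);
    return released;
}
```

The flag is what makes the event "sticky": a thread that calls event_wait() after event_set() still passes, which a bare pthread_cond_broadcast() would not guarantee.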

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC
cbPipe: pipe size in bytes; zero == default
A read on the pipe handle blocks if the pipe is empty
A write to a full pipe blocks
Anonymous pipes are one-way
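The one-way, byte-stream behaviour is the same as that of a POSIX pipe, which makes for a compact illustration. This is a sketch of the POSIX analogue, not of CreatePipe(); pipe_roundtrip is a hypothetical helper and the message is assumed to be small enough to fit in the pipe buffer so that the write cannot block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Sends msg (including its terminating NUL) through a pipe and reads it back
   into out. Returns the number of bytes read, or -1 on failure. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size) {
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;
    size_t len = strlen(msg) + 1;
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                       /* data flows one way only: write end -> read end */
    ssize_t n = (written == (ssize_t)len) ? read(fds[0], out, out_size) : -1;
    close(fds[0]);
    return n;
}
```

In the redirection scenario on the following slides the two ends are handed to two different processes instead of being used by one.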

I/O Redirection using an Anonymous Pipe

    /* Create default-size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }
    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);
    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process
Bi-directional
– Two processes can exchange messages over the same pipe
Multiple independent instances of a named pipe
– Several clients can communicate with a single server, each over its own instance of the same named pipe
– Server can respond to a client using the same instance
Pipe can be accessed over the network
– location transparency
Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use the same flag settings for all instances of a named pipe
lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine ("." means the local machine)
fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND
fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources
dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()
The first CreateNamedPipe call creates the named pipe
– Closing the handle to the last instance deletes the named pipe
Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

Named Pipe Client Connections

CreateFile with the named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)
Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )
– Combines a WriteFile, ReadFile sequence in a single call

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )
– Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL
– Call returns as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED
Use DisconnectNamedPipe to free the handle for a connection from another client
WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()
Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)
A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO
mkfifo() is the UNIX counterpart to CreateNamedPipe()
Use sockets for networked client/server scenarios
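The fixed-size-record advice can be illustrated with a small FIFO round trip. This is only a sketch: the path under /tmp, the record size, and the function names are assumptions made for the example, and the "client" is a thread in the same process rather than a separate program.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

enum { RECORD_SIZE = 16, RECORD_COUNT = 3 };

static char fifo_path[64];               /* hypothetical well-known FIFO name */

/* Client side: writes RECORD_COUNT fixed-size request records. */
static void *fifo_writer(void *arg) {
    (void)arg;
    int fd = open(fifo_path, O_WRONLY);  /* blocks until a reader opens the FIFO */
    if (fd < 0) return NULL;
    for (int i = 0; i < RECORD_COUNT; i++) {
        char record[RECORD_SIZE] = {0};
        snprintf(record, sizeof record, "request %d", i);
        if (write(fd, record, sizeof record) != (ssize_t)sizeof record)
            break;
    }
    close(fd);
    return NULL;
}

/* Server side: creates the FIFO and counts the complete records it receives. */
int run_fifo_demo(void) {
    snprintf(fifo_path, sizeof fifo_path, "/tmp/fifo_demo_%ld", (long)getpid());
    unlink(fifo_path);
    if (mkfifo(fifo_path, 0600) != 0) return -1;

    pthread_t writer;
    pthread_create(&writer, NULL, fifo_writer, NULL);

    int fd = open(fifo_path, O_RDONLY);  /* blocks until the writer opens */
    int records = 0;
    char record[RECORD_SIZE];
    while (fd >= 0) {
        size_t got = 0;                  /* a read may return a partial record */
        while (got < sizeof record) {
            ssize_t n = read(fd, record + got, sizeof record - got);
            if (n <= 0) break;           /* end of file or error */
            got += (size_t)n;
        }
        if (got < sizeof record) break;
        records++;
    }
    if (fd >= 0) close(fd);
    pthread_join(writer, NULL);
    unlink(fifo_path);
    return records;
}
```

Because the stream carries no message boundaries, the reader rebuilds each record by reading until RECORD_SIZE bytes have arrived; a message-mode named pipe would do this framing itself.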

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }
    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);
    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);
    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);
        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used
Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)
Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open fails if there are no waiting readers
– A client's message can be read by all servers (readers)
Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: an application server acts as mailslot client and periodically broadcasts its status; application clients 0..n act as mailslot servers and read it]

Mailslot servers (App client 0 ... App client n)
    hMS = CreateMailslot( "\\.\mailslot\status" ... );
    ReadFile( hMS, &ServStat ... );
    /* connect to server */

Mailslot client (App server)
    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\*\mailslot\status" ... );
        ...
        WriteFile( hMS, &StatInfo ... );
    }

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally
cbMaxMsg is the maximum message size in bytes
dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify the FILE_SHARE_READ flag
GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Predefined IRQLs (contd.)

Device
– Used to prioritize device interrupts
DPC/dispatch and APC
– Software interrupts that the kernel and device drivers generate
Passive
– No interrupt level at all; normal thread execution

IRQLs on 64-bit Systems

[Figure: IRQL levels on x64 and IA64, lowest to highest]
x64: 0 Passive/Low; 1 APC; 2 Dispatch/DPC; 3 Device 1 ... Device n; 12 Synch (Srv 2003); 13 Clock; 14 Interprocessor Interrupt/Power; 15 High/Profile
IA64: 0 Passive/Low; 1 APC; 2 Dispatch/DPC & Synch (UP only); 3 Correctable Machine Check; 4 Device 1 ... Device n; 12 Synch (MP only); 13 Clock; 14 Interprocessor Interrupt; 15 High/Profile/Power

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16
On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device (can be configured with the IntFilter utility in the Resource Kit)
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector
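The two arithmetic rules can be written out as a pair of one-line functions. This is a sketch only: the function names are invented, and the x64/IA64 rule is read here as an integer division of the IDT vector number by 16.

```c
#include <assert.h>

/* x86 uniprocessor rule: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq) {
    return 27 - irq;
}

/* x64 and IA64 rule: IRQL = IDT vector number / 16 (integer division). */
int irql_from_vector_x64(int idt_vector) {
    return idt_vector / 16;
}
```

Under these rules, IRQ 1 on an x86 uniprocessor maps to IRQL 26, and any vector in the range 0x40 to 0x4F on x64 maps to IRQL 4.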

Software interrupts

Initiating thread dispatching
– DPCs allow scheduling actions to be taken when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor
Handling timer expiration
Asynchronous execution of a procedure in the context of a particular thread
Support for asynchronous I/O operations

Flow of Interrupts

[Figure: flow of interrupts; the diagram is not recoverable from the transcript]

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors
Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– A spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode
A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient
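The "single data cell plus atomic test-and-modify" idea can be shown in user space with a C11 atomic_flag. This is a teaching sketch, not the kernel's KSPIN_LOCK implementation; the spinlock type and all function names are made up, and threads stand in for processors.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* A spinlock is a single memory cell; atomic_flag gives it one bit of state. */
typedef struct { atomic_flag held; } spinlock;

static void spin_acquire(spinlock *lock) {
    /* Atomic test-and-set returns the previous value: loop while it was already set. */
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire)) {
        /* busy-wait until the current owner clears the flag */
    }
}

static void spin_release(spinlock *lock) {
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

enum { SPIN_THREADS = 4, SPIN_ITERATIONS = 5000 };

static spinlock total_lock = { ATOMIC_FLAG_INIT };
static long shared_total = 0;            /* protected by total_lock */

static void *spin_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < SPIN_ITERATIONS; i++) {
        spin_acquire(&total_lock);
        shared_total += 1;               /* critical section */
        spin_release(&total_lock);
    }
    return NULL;
}

/* Runs the workers and returns the final total. */
long run_spinlock_demo(void) {
    pthread_t threads[SPIN_THREADS];
    shared_total = 0;
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_create(&threads[i], NULL, spin_worker, NULL);
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_total;
}
```

Every waiter hammers the same memory cell here, which is the bus-contention problem that queued spinlocks address.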

Kernel Synchronization

[Figure: Processor A and Processor B each run the sequence below; the spinlock guards the DPC queue, and removing a DPC from the queue is the critical section]

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin
        remove DPC from queue
    end
    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention
Queued spinlocks maintain a queue of waiting processors
The first processor acquires the lock; other processors wait on a processor-local flag
– Thus, the busy-wait loop requires no access to the memory bus
When the lock is released, the first waiting processor's flag is modified
– Exactly one processor is signaled
– Pre-determined wait order
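One well-known lock with these properties is the MCS queue lock, sketched below with C11 atomics. Whether the Windows implementation matches it in detail is not claimed here; the qnode and qlock types and the function names are invented, threads stand in for processors, and the sched_yield() calls are a user-space courtesy that a kernel spinlock would not need.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;   /* successor in the wait queue */
    atomic_bool must_wait;          /* the waiter's own, local flag */
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

static void qlock_acquire(qlock *lock, qnode *me) {
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->must_wait, true);
    qnode *prev = atomic_exchange(&lock->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);
        while (atomic_load(&me->must_wait))           /* spin on own flag only */
            sched_yield();
    }
}

static void qlock_release(qlock *lock, qnode *me) {
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, (qnode *)NULL))
            return;                                   /* nobody was waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                            /* successor is still linking in */
    }
    atomic_store(&succ->must_wait, false);            /* signal exactly one waiter */
}

enum { Q_THREADS = 3, Q_ITERATIONS = 2000 };

static qlock demo_lock;
static long demo_total = 0;                           /* protected by demo_lock */

static void *q_worker(void *arg) {
    (void)arg;
    qnode me;                                         /* one queue node per thread */
    for (int i = 0; i < Q_ITERATIONS; i++) {
        qlock_acquire(&demo_lock, &me);
        demo_total += 1;
        qlock_release(&demo_lock, &me);
    }
    return NULL;
}

/* Runs the workers and returns the final total. */
long run_queued_lock_demo(void) {
    pthread_t threads[Q_THREADS];
    demo_total = 0;
    atomic_store(&demo_lock.tail, (qnode *)NULL);
    for (int i = 0; i < Q_THREADS; i++)
        pthread_create(&threads[i], NULL, q_worker, NULL);
    for (int i = 0; i < Q_THREADS; i++)
        pthread_join(threads[i], NULL);
    return demo_total;
}
```

Waiters are served in the order in which they swapped themselves into the tail, which gives the pre-determined wait order.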

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger
Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & of the length of time they are held
– Scheduling database now per-CPU; allows thread state transitions in parallel
XP/2003
– Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once ("all": all objects must be in the signaled state concurrently to resolve the wait)
– All wait calls include an optional timeout argument
– Waiting threads consume no CPU time
Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)
No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel
Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition, and Terminated; arrow labels: "Create and initialize thread object", "Thread waits on an object handle", "Wait is complete; set object to signaled state"]

Interaction between Synchronization & Dispatching

A user-mode thread waits on an event object's handle
The kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list
Another thread sets the event
The kernel wakes up the waiting threads; variable-priority threads get a priority boost
The dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch
If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

Mutex (kernel mode)
– Nonsignaled → signaled: owning thread releases mutex
– Signaled → nonsignaled: resumed thread acquires mutex
– Effect of signaled state on waiting threads: kernel resumes one waiting thread
Mutex (exported to user mode)
– Nonsignaled → signaled: owning thread or other thread releases mutex
– Signaled → nonsignaled: resumed thread acquires mutex
– Effect of signaled state on waiting threads: kernel resumes one waiting thread
Semaphore
– Nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– Signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect of signaled state on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd.)

Event
– Nonsignaled → signaled: a thread sets the event
– Signaled → nonsignaled: kernel resumes one or more threads
– Effect of signaled state on waiting threads: kernel resumes one or more waiting threads
Event pair
– Nonsignaled → signaled: dedicated thread sets one event in the event pair
– Signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect of signaled state on waiting threads: kernel resumes waiting dedicated thread
Timer
– Nonsignaled → signaled: timer expires
– Signaled → nonsignaled: a thread (re)initializes the timer
– Effect of signaled state on waiting threads: kernel resumes all waiting threads
Thread
– Nonsignaled → signaled: thread terminates
– Signaled → nonsignaled: a thread reinitializes the thread object
– Effect of signaled state on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004
Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls
IRQLs and CPU Time Accounting
Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration
Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue: queue head → DPC object → DPC object → DPC object]

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects
DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table (levels from High through Power failure, Dispatch/DPC, and APC down to Low), the DPC queue, and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services
APC queue is thread-specific
User-mode & kernel-mode APCs
– Permission required for user-mode APCs
Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)
User-mode APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()
Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented
"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)
User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in "alertable wait"

[Figure: thread object with kernel-mode (K) and user-mode (U) lists of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time
Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
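The four accounting rules amount to a small decision function. The sketch below only restates them; the enum, the function name, and the was_user_mode parameter (used to pick between user and kernel time below IRQL 2) are assumptions for the example and not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_bucket;

/* Decides where one clock tick is charged, given the state that was interrupted. */
charge_bucket classify_clock_tick(int irql, bool in_dpc, bool was_user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;                         /* interrupt time */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER            /* IRQL < 2 */
                         : CHARGE_THREAD_KERNEL;
}
```

Note that the decision is made once per tick, so whatever happens to be running when the clock fires is charged for the whole interval.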

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time
Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time
CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)
Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged
View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason
Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"
All dispatcher objects have a common header
All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: header with Size, Type, State, and wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)
All wait blocks from a given wait call are chained to the waiting thread
Type indicates wait for "any" or "all"
Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its wait block list; each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]
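The role of the Type and Key fields can be made concrete with a much-reduced model. Everything below is hypothetical: the struct layouts are simplified stand-ins, not the real kernel structures, and wait_satisfied and wait_demo are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_TYPE_ALL, WAIT_TYPE_ANY } wait_type;

/* Reduced stand-in for the common dispatcher header: only the state is kept. */
typedef struct { bool signaled; } dispatcher_header;

/* One wait block per handle passed to the wait call. */
typedef struct wait_block {
    dispatcher_header *object;   /* what the thread is waiting for */
    int                key;      /* position in the argument list */
    wait_type          type;     /* "any" or "all", same for the whole call */
    struct wait_block *next;     /* next wait block of the same thread */
} wait_block;

/* Returns the key of the first signaled object for a wait-any,
   0 when every object of a wait-all is signaled, and -1 if the wait must continue. */
int wait_satisfied(const wait_block *list) {
    bool all_signaled = true;
    for (const wait_block *wb = list; wb != NULL; wb = wb->next) {
        if (wb->object->signaled) {
            if (wb->type == WAIT_TYPE_ANY)
                return wb->key;
        } else {
            all_signaled = false;
        }
    }
    return (list != NULL && list->type == WAIT_TYPE_ALL && all_signaled) ? 0 : -1;
}

/* Builds a two-object wait and evaluates it once. */
int wait_demo(bool first_signaled, bool second_signaled, wait_type type) {
    dispatcher_header a = { first_signaled };
    dispatcher_header b = { second_signaled };
    wait_block second = { &b, 1, type, NULL };
    wait_block first  = { &a, 0, type, &second };
    return wait_satisfied(&first);
}
```

For a wait-any, the returned key plays the role of the index that WaitForMultipleObjects reports for the first signaled object.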

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication
Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects
Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process
Critical sections are initialized and deleted but do not have handles
Only one thread at a time can be in a critical section
A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match
Leaving a critical section before entering it may cause deadlocks
No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection ( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        __try {
            EnterCriticalSection ( &crit );
            counter += local_value;
        }
        /* leave exactly once per enter, even if the body raises an exception */
        __finally { LeaveCriticalSection ( &crit ); }
    }
    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait; check whether the object is signaled
  dwTimeout == INFINITE: wait forever
WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for any one object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)
Side effects
– Mutexes, auto-reset events, and waitable timers are reset to the non-signaled state when a wait function completes


ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName NMPWAIT_WAIT_FOREVER)

hNamedPipe = CreateFile (ServerPipeName GENERIC_READ | GENERIC_WRITE

0 NULL OPEN_EXISTING FILE_ATTRIBUTE_NORMAL NULL)

if (hNamedPipe == INVALID_HANDLE_VALUE)

fptinf(stderr Failure to locate servern) exit(3)

Write the request

WriteFile (hNamedPipe ampRequest MAX_RQRS_LEN ampnWrite NULL)

Read each response and send it to std out

while (ReadFile (hNamedPipe ResponseRecord MAX_RQRS_LEN ampnRead NULL))

printf (s ResponseRecord)

CloseHandle (hNamedPipe)

return 0

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE PIPE_ACCESS_DUPLEX

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT

1 0 0 CS_TIMEOUT pNPSA)

while (Done)

printf (Server is awaiting next requestn)

if (ConnectNamedPipe (hNamedPipe NULL)

|| ReadFile (hNamedPipe ampRequest RQ_SIZE ampnXfer NULL))

fprintf(stderr ldquoConnect or Read Named Pipe errornrdquo) exit(4)

printf( ldquoRequest is sn RequestRecord)

Send the file one line at a time to the client

fp = fopen (File r)

while ((fgets (ResponseRecord MAX_RQRS_LEN fp) = NULL))

WriteFile (hNamedPipe ampResponseRecord

(strlen(ResponseRecord) + 1) TSIZE ampnXfer NULL)

fclose (fp)

DisconnectNamedPipe (hNamedPipe)

End of server operation

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活

IRQLs on 64-bit Systems

IRQL   x64                              IA64
0      Passive/Low                      Passive/Low
1      APC                              APC
2      Dispatch/DPC                     Dispatch/DPC & Synch (UP only)
3      Device 1                         Correctable Machine Check
4      ...                              Device 1
...    Device n                         Device n
12     Synch (Srv 2003)                 Synch (MP only)
13     Clock                            Clock
14     Interprocessor Interrupt/Power   Interprocessor Interrupt
15     High/Profile                     High/Profile/Power

Interrupt Prioritization & Delivery

IRQLs are determined as follows:
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device. This can be configured with the IntFilter utility in the Resource Kit.
– On x86 and x64 systems, the I/O APIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL.
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source. Processors are assigned round robin for each interrupt vector.
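The two arithmetic rules above can be written out as a small worked example. This is only a sketch: the function names are made up for illustration, the x64/IA64 rule is taken to be an integer division by 16, and the range checks are assumptions added to keep the results sensible.

```c
#include <assert.h>

/* x86 uniprocessor rule quoted above: IRQL = 27 - IRQ. */
int irql_from_irq_x86_up(int irq)
{
    assert(irq >= 0 && irq <= 27);          /* assumption: keep the IRQL non-negative */
    return 27 - irq;
}

/* x64 and IA64 rule quoted above: IRQL = IDT vector number / 16. */
int irql_from_vector_x64(int vector)
{
    assert(vector >= 0 && vector <= 255);   /* an IDT has 256 entries */
    return vector / 16;                     /* 16 consecutive vectors share one IRQL */
}
```

For instance, IRQ 7 on an x86 UP system maps to IRQL 20, and IDT vector 0xD1 on x64 maps to IRQL 13.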

Software Interrupts

Initiating thread dispatching
– DPCs allow for scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– A spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue.

Processor A and Processor B each run the same sequence; the spinlock guards the critical section around the shared DPC queue:

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)
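The acquire/release pattern can be sketched in portable C11 using an atomic test-and-set flag. This is a user-mode illustration only, not the kernel's KSPIN_LOCK implementation; the type and function names below are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical spinlock: a single atomic flag, clear = free, set = owned. */
typedef struct { atomic_flag owned; } demo_spinlock;

void demo_spinlock_init(demo_spinlock *l)
{
    atomic_flag_clear(&l->owned);
}

/* One atomic test-and-set attempt; returns 1 if the caller now owns the lock. */
int demo_spinlock_try_acquire(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->owned, memory_order_acquire);
}

/* "do acquire until SUCCESS": busy-wait until the test-and-set succeeds. */
void demo_spinlock_acquire(demo_spinlock *l)
{
    while (!demo_spinlock_try_acquire(l))
        ;   /* spin */
}

void demo_spinlock_release(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->owned, memory_order_release);
}
```

A caller would wrap the critical section (for example, removing an entry from a shared queue) between demo_spinlock_acquire() and demo_spinlock_release().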

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag
– Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first processor in the queue is modified
– Exactly one processor is being signaled
– Pre-determined wait order
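One well-known way to build a lock with these properties is an MCS-style queue lock, sketched below in C11. This is a swapped-in textbook technique chosen for illustration; it is not claimed to be how the Windows kernel implements queued spinlocks, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* One node per waiting processor; each waiter spins only on its own flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_int locked;
} qnode;

typedef struct { _Atomic(qnode *) tail; } qlock;

void qlock_init(qlock *l) { atomic_init(&l->tail, NULL); }

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, (qnode *)NULL);
    atomic_store(&me->locked, 1);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the end of the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind the predecessor */
        while (atomic_load(&me->locked))
            ;                                      /* spin on the local flag only */
    }
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        /* No known successor: try to mark the lock free. */
        if (atomic_compare_exchange_strong(&l->tail, &expected, (qnode *)NULL))
            return;
        /* A successor is in the middle of enqueuing; wait for its link. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, 0);                /* signal exactly one waiter */
}
```

Each caller passes its own qnode, which plays the role of the processor-local flag; waiters are released strictly in the order in which they enqueued.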

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length they are held
– Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements (contd)

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas; allocating and mapping system page table entries; charging commitment of pages; allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once. "All": all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include an optional timeout argument
– Waiting threads consume no CPU time

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and threads (signaled upon exit or terminate)
– Directories (change notification)

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete.]

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get a priority boost

Dispatcher re-schedules the new thread – it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What Signals an Object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads.

Mutex (kernel mode)
– nonsignaled → signaled: owning thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled → signaled: owning thread or other thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

Event
– nonsignaled → signaled: a thread sets the event
– signaled → nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled → signaled: dedicated thread sets one event in the event pair
– signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– nonsignaled → signaled: timer expires
– signaled → nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled → signaled: thread terminates
– signaled → nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually the ISR) queues the request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum recommended times: ISR 10 usec, DPC 25 usec
– See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue; a queue head linked to a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level.
3. After the DPC interrupt, control transfers to the thread dispatcher.
4. Dispatcher executes each DPC routine in the DPC queue.

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table (High, Power failure, ..., Dispatch/DPC, APC, Low), the DPC queue, and the dispatcher]

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless the thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when the thread is in an "alertable wait"

[Figure: thread object with separate kernel-mode (K) and user-mode (U) queues of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to the thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
– If IRQL > 2, charge to interrupt time
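The four accounting rules amount to a small decision function. The sketch below restates them in C; the enum, the function name, and the two boolean inputs (whether a DPC is being processed, whether the interrupted thread was in user mode) are illustrative assumptions, not kernel interfaces.

```c
/* Where one clock tick gets charged, following the rules listed above. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

charge_target clock_tick_charge(int irql, int processing_dpc, int was_user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;                      /* IRQL > 2 */
    if (irql == 2)                                    /* dispatch/DPC level */
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: charge the thread's user or kernel time */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives at IRQL 2 while no DPC is running is charged to the current thread's kernel time.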

IRQLs and CPU Time Accounting (contd)

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus, threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– Some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– Others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– Non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header (see \ntddk\inc\ddk\ntddk.h)

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– When signaled, a wait on the object is satisfied
– Different object types differ in terms of what changes their state
– Wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout; a common header (Size, Type, State, Wait list head) followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects (WaitBlockList) linked to wait blocks (List entry, Object, Thread, Key, Type, Next link), which are chained from the wait list heads of dispatcher objects (Size, Type, State, Wait list head, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection( &crit );
        counter += local_value;
    }
    /* leave exactly once per enter, even if the guarded code raises an exception */
    __finally { LeaveCriticalSection( &crit ); }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
  dwTimeout == 0: no wait, check whether the object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects. When waiting for any object, the return value identifies the first signaled object (WAIT_OBJECT_0 + its index)

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions
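Interpreting the value returned by a wait-for-any call can be sketched as follows. To keep the example buildable on any platform it does not include the Windows headers; the DEMO_ constants are local copies of the documented values of WAIT_OBJECT_0, WAIT_ABANDONED_0 and WAIT_TIMEOUT, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Local mirrors of the documented Windows values (assumption: not taken from windows.h). */
#define DEMO_WAIT_OBJECT_0    0x00000000u
#define DEMO_WAIT_ABANDONED_0 0x00000080u
#define DEMO_WAIT_TIMEOUT     0x00000102u

typedef enum { WK_SIGNALED, WK_ABANDONED, WK_TIMEOUT, WK_FAILED } wait_kind;

/* Classifies a wait-for-any return value for `count` handles (count <= 64).
   On WK_SIGNALED or WK_ABANDONED, *index is the handle's position in the array. */
wait_kind decode_wait_result(uint32_t ret, uint32_t count, uint32_t *index)
{
    if (ret - DEMO_WAIT_OBJECT_0 < count) {
        *index = ret - DEMO_WAIT_OBJECT_0;
        return WK_SIGNALED;
    }
    if (ret >= DEMO_WAIT_ABANDONED_0 && ret - DEMO_WAIT_ABANDONED_0 < count) {
        *index = ret - DEMO_WAIT_ABANDONED_0;      /* an abandoned mutex */
        return WK_ABANDONED;
    }
    if (ret == DEMO_WAIT_TIMEOUT)
        return WK_TIMEOUT;
    return WK_FAILED;                              /* e.g. WAIT_FAILED */
}
```

With real handles, the value passed in as ret would be the DWORD returned by WaitForMultipleObjects() called with bWaitAll == FALSE.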

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed when its last handle is closed

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX Mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in the same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
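As a counterpart to the Windows counter example, here is a minimal pthreads sketch of the blocking and the polling acquisition. The function names and the shared counter are illustrative assumptions; any number of threads could call these functions concurrently.

```c
#include <pthread.h>

/* Shared state protected by one statically initialized mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking variant: waits until the mutex is available. */
void counter_add(int local_value)
{
    pthread_mutex_lock(&counter_lock);
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling variant: returns 0 without changing the counter if the mutex is busy. */
int counter_try_add(int local_value)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;
    counter += local_value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

int counter_read(void)
{
    pthread_mutex_lock(&counter_lock);
    int value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

counter_add() corresponds to waiting with an infinite timeout, while counter_try_add() corresponds to a wait with a timeout of zero.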

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value (up to the maximum count)
– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
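The counting rules can be modeled in a few lines of portable C. This is a simplified model for illustration, not the Windows implementation: it offers only a non-blocking wait, it uses a pthread mutex to protect the count, and all names are hypothetical.

```c
#include <pthread.h>

/* Simplified counting semaphore: signaled while count > 0, never above max. */
typedef struct {
    pthread_mutex_t lock;
    long count;
    long max;
} demo_semaphore;

int demo_sem_init(demo_semaphore *s, long initial, long max)
{
    if (max <= 0 || initial < 0 || initial > max)
        return 0;
    s->count = initial;
    s->max = max;
    return pthread_mutex_init(&s->lock, NULL) == 0;
}

/* Like a wait with timeout 0: takes one unit if the semaphore is signaled. */
int demo_sem_try_wait(demo_semaphore *s)
{
    int acquired = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        acquired = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return acquired;
}

/* Adds n units and reports the previous count; fails if max would be exceeded. */
int demo_sem_release(demo_semaphore *s, long n, long *previous)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && n <= s->max - s->count) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

A release that would push the count past the maximum is rejected and leaves the count unchanged, which is also how the model treats a release count of zero or less.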

Semaphores (contd)

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX Condition Variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
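Although there is no direct counterpart, a manual-reset event can be approximated by pairing a condition variable with a mutex-protected flag. The sketch below shows one common way to do this; the type and function names are hypothetical, and error checking is omitted for brevity.

```c
#include <pthread.h>

/* Approximation of a manual-reset event: a flag guarded by a mutex,
   plus a condition variable to wake the waiters. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
} manual_event;

void manual_event_init(manual_event *e, int initially_signaled)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

/* Stays signaled until reset, so every current and future waiter passes. */
void manual_event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);   /* release all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

void manual_event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; the loop guards against spurious wakeups. */
void manual_event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

int manual_event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    int value = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return value;
}
```

The flag is what makes the state persistent: a condition variable alone has no memory, so a signal sent before a thread starts waiting would otherwise be lost.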

Anonymous Pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe connecting prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
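The same one-way, byte-stream behavior can be tried out with the POSIX pipe() call, used here as an analogue of CreatePipe() rather than the Windows API itself. The helper name is hypothetical, and the sketch assumes the message is short enough to fit in the pipe's buffer so that the write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg through a fresh pipe and reads it back into out.
   Returns the number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);

    if (outlen == 0 || len >= outlen || pipe(fds) != 0)
        return -1;

    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                       /* closing the write end lets the reader see EOF */

    ssize_t got = (written == (ssize_t)len) ? read(fds[0], out, outlen - 1) : -1;
    close(fds[0]);
    if (got < 0)
        return -1;

    out[got] = '\0';
    return (long)got;
}
```

In a real redirection setup the two ends would be handed to different processes, as in the Windows example on the following slides; here both ends stay in one process only to keep the sketch short.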

I/O Redirection Using an Anonymous Pipe

/* Create default size anonymous pipe; handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe Example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message-oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );
– WriteFile, ReadFile sequence

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );
– CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: Eliminate the Polling Loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107

Locate a server via mailslot

Mailslot servers (application clients 0 ... n); each one runs:

    hMS = CreateMailslot ("\\\\.\\mailslot\\status", ...);
    ReadFile (hMS, &ServStat, ...);
    /* connect to server */

Mailslot client (application server); the message is sent periodically:

    while (...) {
        Sleep (...);
        hMS = CreateFile ("\\\\*\\mailslot\\status", ...);
        WriteFile (hMS, &StatInfo, ...);
    }

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

109

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieves a handle for a local mailslot
– \\host\mailslot\[path]name - retrieves a handle for a mailslot on the specified host
– \\domain\mailslot\[path]name - returns a handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns a handle representing mailslots on machines in the system's primary domain; maximum message length is 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life 意念改变生活

46

Interrupt Prioritization & Delivery

IRQLs are determined as follows
– x86 UP systems: IRQL = 27 - IRQ
– x86 MP systems: bucketized (random)
– x64 & IA64 systems: IRQL = IDT vector number / 16

On MP systems, which processor is chosen to deliver an interrupt?
– By default, any processor can receive an interrupt from any device; this can be configured with the IntFilter utility in the Resource Kit
– On x86 and x64 systems, the IOAPIC (I/O advanced programmable interrupt controller) is programmed to interrupt the processor running at the lowest IRQL
– On IA64 systems, the SAPIC (streamlined advanced programmable interrupt controller) is configured to interrupt one processor for each interrupt source; processors are assigned round robin for each interrupt vector

47

Software interrupts

Initiating thread dispatching
– DPCs allow scheduling actions when the kernel is deep within many layers of code
– Delayed scheduling decision; one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in the context of a particular thread

Support for asynchronous I/O operations

48

Flow of Interrupts

49

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– A spinlock is either free or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient

50

Kernel Synchronization

Processor A and Processor B each run the same sequence against the shared DPC queue:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)

    begin                       /* critical section */
        remove DPC from queue
    end

    release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
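
The acquire/release sequence above can be sketched in portable C11. This is a user-mode illustration only, not the kernel's KSPIN_LOCK code: the names spinlock_t, spin_acquire, spin_release and run_spinlock_demo are hypothetical, and two pthreads stand in for Processor A and Processor B.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a spinlock: a single atomic flag. */
typedef struct { atomic_flag held; } spinlock_t;

static void spin_acquire(spinlock_t *l) {
    /* test-and-set returns the previous value; loop until it was clear */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait */
}

static void spin_release(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static spinlock_t queue_lock = { ATOMIC_FLAG_INIT };
static long queue_length = 0;   /* stands in for the shared DPC queue */

static void *worker(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        spin_acquire(&queue_lock);
        queue_length++;             /* critical section */
        spin_release(&queue_lock);
    }
    return NULL;
}

/* Runs two threads against the shared counter and returns the final total. */
long run_spinlock_demo(long per_thread) {
    pthread_t a, b;
    int rc;
    queue_length = 0;
    rc = pthread_create(&a, NULL, worker, &per_thread);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, &per_thread);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return queue_length;
}
```

Calling run_spinlock_demo(n) should return 2 * n; without the lock the two increments could interleave and lose updates.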

51

Queued Spinlocks

Problem: checking the status of a spinlock via a test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag
– Thus the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first waiting processor is modified
– Exactly one processor is signaled
– Pre-determined wait order
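
One well-known algorithm with these properties is the MCS queue lock. The sketch below shows that technique in C11; it is not the Windows implementation, and the names mcs_node, mcs_lock, mcs_acquire, mcs_release and run_mcs_demo are hypothetical. The sched_yield() calls are an assumption added so the demo stays usable when threads outnumber cores; a kernel spinlock would not yield.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter; each waiter spins only on its own flag. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

static void mcs_acquire(mcs_lock *l, mcs_node *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    mcs_node *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);                /* link behind predecessor */
        while (atomic_load(&me->locked))
            sched_yield();                            /* wait on the local flag */
    }
}

static void mcs_release(mcs_lock *l, mcs_node *me) {
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* no visible successor: try to mark the lock free */
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* a waiter is enqueueing; wait until it has linked itself */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();
    }
    atomic_store(&succ->locked, false);               /* signal exactly one waiter */
}

static mcs_lock demo_lock = { NULL };
static long shared_counter = 0;

static void *worker(void *arg) {
    long n = *(long *)arg;
    mcs_node node;                  /* this thread's queue entry */
    for (long i = 0; i < n; i++) {
        mcs_acquire(&demo_lock, &node);
        shared_counter++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

/* Runs nthreads (1..8) workers and returns the final counter value. */
long run_mcs_demo(int nthreads, long per_thread) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &per_thread);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```

The release path writes only the successor's flag, which is why exactly one waiter is signaled and the wait order is the queue order.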

52

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction in the use of spinlocks & the length of time they are held
– Scheduling database is now per-CPU, which allows thread state transitions in parallel

53

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN or Page Frame Database lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

54

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– A wait for multiple objects can wait for "any" one or for "all" at once; for "all", all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include an optional timeout argument
– Waiting threads consume no CPU time

55

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and threads (signaled upon exit or terminate)
– Directories (change notification)

56

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

57

Executive Synchronization

Waiting on Dispatcher Objects (outside the kernel)

[Figure: interaction with thread scheduling. Thread states: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labeled steps: "Create and initialize thread object", "Thread waits on an object handle", "Wait is complete; set object to signaled state".]

58

Interaction between Synchronization & Dispatching

A user-mode thread waits on an event object's handle

The kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait list

Another thread sets the event

The kernel wakes up the waiting threads; variable-priority threads get a priority boost

59

Interaction between Synchronization & Dispatching

The dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

60

What signals an object

For each dispatcher object: the system event that sets it to the signaled state, the event that returns it to nonsignaled, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled → signaled: owning thread releases the mutex
– signaled → nonsignaled: resumed thread acquires the mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled → signaled: owning thread or other thread releases the mutex
– signaled → nonsignaled: resumed thread acquires the mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

61

What signals an object (contd.)

Event
– nonsignaled → signaled: a thread sets the event
– signaled → nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled → signaled: dedicated thread sets one event in the event pair
– signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes the waiting dedicated thread

Timer
– nonsignaled → signaled: timer expires
– signaled → nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled → signaled: thread terminates
– signaled → nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

62

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

63

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

64

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually the ISR) queues the request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes the specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
– See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

65

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. A queue head links a chain of DPC objects.]

66

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low, shown next to the DPC queue and the dispatcher.]

1. Timer expires; the kernel queues a DPC that will release all waiting threads, and requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

67

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

68

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

69

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless the thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

70

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when the thread is in an "alertable wait"

71

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, one for kernel mode (K) and one for user mode (U).]

72

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to the thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 & not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
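
The four accounting rules above can be written as a small decision function. This is only a sketch of the rules as stated on the slide: charge_t, charge_tick and the user_mode parameter (used to choose between user and kernel time below IRQL 2) are hypothetical names, not Windows identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative accounting buckets. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide which counter one clock tick is charged to.
 *   irql      IRQL when the clock interrupt fired (2 = dispatch/DPC level)
 *   in_dpc    true if a DPC routine was executing
 *   user_mode true if the interrupted thread was running in user mode */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode) {
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, charge_tick(2, true, false) yields CHARGE_DPC, while charge_tick(2, false, false) is charged to the thread's kernel time.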

73

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the cause must be interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (they are not really processes)
– Context switches shown for these are really the number of interrupts & DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h). Header fields: Size, Type, State, Wait list head; followed by object-type-specific data.]

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked to wait blocks. Each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link. Each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) heads a list of the wait blocks waiting on it.]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

81

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection ( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        __try {
            EnterCriticalSection ( &crit );
            counter += local_value;
        }
        /* leave exactly once, even if the guarded code raises an exception */
        __finally { LeaveCriticalSection ( &crit ); }
    }

    DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

83

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
– dwTimeout == 0: no wait; check whether the object is signaled
– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to an array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first one, the return value identifies the index of the first signaled object (WAIT_OBJECT_0 + index)

Side effects
– Mutexes, auto-reset events, and waitable timers will be reset to the non-signaled state after completing wait functions

84

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

85

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

86

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

87

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in the same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
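
The pthread side of this comparison can be sketched as follows, mirroring the shared-counter loop of the Win32 mutex example. The function names run_mutex_demo and trylock_reports_busy are hypothetical, and the sketch assumes a default (non-recursive) mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                 /* shared by all threads */

static void *add_values(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_mutex);     /* blocks until the mutex is free */
        counter += 1;
        pthread_mutex_unlock(&counter_mutex);   /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Runs nthreads (1..8) workers and returns the final counter value. */
long run_mutex_demo(int nthreads, long per_thread) {
    pthread_t t[8];
    assert(nthreads >= 1 && nthreads <= 8);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&t[i], NULL, add_values, &per_thread);
        assert(rc == 0);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Returns 1 if trylock reports EBUSY while the mutex is already held. */
int trylock_reports_busy(void) {
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock(&m);
    int rc = pthread_mutex_trylock(&m);   /* polls; never blocks */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc == EBUSY;
}
```

Where WaitForSingleObject() with a zero timeout returns WAIT_TIMEOUT, pthread_mutex_trylock() returns EBUSY instead.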

88

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value
– ReleaseSemaphore() reports the old semaphore count (through lpPreviousCount)
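
The counting behaviour can be tried out with an unnamed POSIX semaphore, used here as a portable stand-in for the Win32 object. One difference to keep in mind: sem_post() always increments by exactly 1, whereas ReleaseSemaphore() takes a release count. The helper name semaphore_count_after is hypothetical, and the sketch assumes a platform that supports sem_init() (for example Linux).

```c
#include <assert.h>
#include <semaphore.h>

/* Creates a semaphore with `initial` units, takes up to `take` units without
 * blocking, releases `give` units, and returns the resulting count
 * (or -1 if the semaphore could not be created). */
int semaphore_count_after(unsigned initial, int take, int give) {
    sem_t sem;
    int value = -1;
    if (sem_init(&sem, 0, initial) != 0)
        return -1;
    for (int i = 0; i < take; i++)
        if (sem_trywait(&sem) != 0)   /* count is 0: like a wait with timeout 0 */
            break;
    for (int i = 0; i < give; i++)
        sem_post(&sem);               /* increments the count by 1 */
    sem_getvalue(&sem, &value);
    sem_destroy(&sem);
    return value;
}
```

Starting from 3 units, taking 2 and giving back 1 leaves a count of 2; trying to take more units than exist simply stops at a count of 0.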

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously; it must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

91

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )
BOOL ResetEvent( HANDLE hEvent )
BOOL PulseEvent( HANDLE hEvent )

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
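
Because a condition variable keeps no state of its own, a manual-reset event has to be emulated with an explicit flag. The sketch below shows one common way to do that; the type mr_event and the functions event_init, event_set, event_reset, event_wait, event_destroy and run_event_demo are hypothetical names, not part of pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Emulated manual-reset event: mutex + condition variable + state flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} mr_event;

static void event_init(mr_event *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = false;
}

static void event_set(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = true;                  /* stays set until event_reset() */
    pthread_cond_broadcast(&e->cond);    /* release all waiting threads */
    pthread_mutex_unlock(&e->lock);
}

static void event_reset(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(mr_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void event_destroy(mr_event *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

static mr_event start_event;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;
static int released = 0;

static void *waiter(void *arg) {
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts n waiters (1..16), sets the event once, returns how many were released. */
int run_event_demo(int n) {
    pthread_t t[16];
    assert(n >= 1 && n <= 16);
    event_init(&start_event);
    released = 0;
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&t[i], NULL, waiter, NULL);
        assert(rc == 0);
    }
    event_set(&start_event);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    event_reset(&start_event);
    event_destroy(&start_event);
    return released;
}
```

A single event_set() releases every waiter, including threads that only start waiting after the call, which is the manual-reset behaviour.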

93

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates prog1 and prog2; prog1's output is connected to prog2's input through a pipe.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
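
For comparison, the POSIX counterpart of CreatePipe() is pipe(), which also yields one read end and one write end. The sketch below sends a short message through such a pipe within a single process; the helper name pipe_roundtrip is hypothetical, and the message is assumed to be small enough to fit in the pipe buffer so the write does not block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Sends msg (including its terminating NUL) through an anonymous pipe and
 * reads it back into out, which has room for cap bytes.
 * Returns the number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t cap) {
    int fds[2];                        /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg) + 1;
    if (len > cap || pipe(fds) != 0)
        return -1;
    ssize_t w = write(fds[1], msg, len);
    close(fds[1]);                     /* no more writers: readers will see EOF */
    ssize_t r = (w == (ssize_t)len) ? read(fds[0], out, cap) : -1;
    close(fds[0]);
    return (long)r;
}
```

Data flows in one direction only, from the write end to the read end; a second pipe is needed for replies.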

94

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf (stderr, "Anon pipe create failed\n"); exit (1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf (stderr, "CreateProc1 failed\n"); exit (2);
    }
    CloseHandle (hWritePipe);

95

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE,  /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf (stderr, "CreateProc2 failed\n"); exit (3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

96

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine ( . = local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use the same flag settings for all instances of a named pipe

98

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

99

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile, ReadFile sequence in one call

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in one call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
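
The fixed-size-record advice can be illustrated with a small FIFO round trip. This is a sketch under stated assumptions: RECORD_LEN and fifo_roundtrip are hypothetical names, the path is chosen by the caller, and the read end is opened with O_NONBLOCK only so that a single process can hold both ends; a real client and server would normally be separate processes using blocking opens.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_LEN 64   /* hypothetical fixed record size */

/* Creates a FIFO at path, sends one fixed-size record through it, and reads
 * the record back into out. Returns 0 on success, -1 on error. */
int fifo_roundtrip(const char *path, const char *text, char out[RECORD_LEN]) {
    char record[RECORD_LEN] = {0};     /* zero padding up to the record size */
    int rc = -1;
    strncpy(record, text, RECORD_LEN - 1);
    if (mkfifo(path, 0600) != 0)
        return -1;
    /* Non-blocking open of the read end returns without waiting for a writer. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    int wfd = (rfd >= 0) ? open(path, O_WRONLY) : -1;
    if (wfd >= 0
            && write(wfd, record, RECORD_LEN) == RECORD_LEN
            && read(rfd, out, RECORD_LEN) == RECORD_LEN)
        rc = 0;
    if (wfd >= 0) close(wfd);
    if (rfd >= 0) close(rfd);
    unlink(path);                      /* remove the FIFO from the file system */
    return rc;
}
```

Because every record has the same length, the reader knows exactly how many bytes make up one request even though the FIFO itself is only a byte stream.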

Page 47: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

4747

Software interrupts

Initiating thread dispatching

ndash DPC allow for scheduling actions when kernel is deep within many layers of code

ndash Delayed scheduling decision one DPC queue per processor

Handling timer expiration

Asynchronous execution of a procedure in context of a particular thread

Support for asynchronous IO operations

4848

Flow of Interrupts

4949

Sync on MP use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm

ndash Spinlock is either free or is considered to be owned by a CPU

ndash Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory

ndash Accessed with a test-and-modify operation that is atomic across all processors

ndash KSPIN_LOCK is an opaque data type typedefrsquod as a ULONG

ndash To implement synchronization a single bit is sufficient

Synchronization on SMP Systems

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; with "all", all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labels: "Create and initialize thread object"; "Thread waits on an object handle"; "Set object to signaled state"; "Wait is complete"]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes the thread's scheduling state from ready to waiting and adds the thread to the wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread; it may preempt a running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled to signaled: owning thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled to signaled: owning thread or other thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- nonsignaled to signaled: one thread releases the semaphore, freeing a resource
- signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd)

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Event
- nonsignaled to signaled: a thread sets the event
- signaled to nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- nonsignaled to signaled: dedicated thread sets one event in the event pair
- signaled to nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- nonsignaled to signaled: timer expires
- signaled to nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- nonsignaled to signaled: thread terminates
- signaled to nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, consisting of a queue head followed by a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low; a DPC queue holding DPC objects; the thread dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend/terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in an "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, K (kernel mode) and U (user mode), each holding APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
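The four accounting rules above amount to a small decision function. The sketch below restates them in C; the enum, the function name demo_classify_tick, and the user_mode flag (used here to pick between user and kernel time when IRQL < 2) are hypothetical names introduced for illustration, not kernel interfaces.

```c
/* Buckets a clock tick can be charged to (illustrative names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} demo_charge;

/* irql:      IRQL in effect when the clock interrupt arrived
 * in_dpc:    nonzero if a DPC routine was being processed
 * user_mode: nonzero if the interrupted thread ran in user mode
 *            (assumed input; the slide only says "user or kernel") */
demo_charge demo_classify_tick(int irql, int in_dpc, int user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;              /* IRQL > 2 */
    if (irql == 2)                            /* dispatch/DPC level */
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: ordinary thread execution */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is made once per clock tick, whatever happens to be running at that instant receives the whole interval, which is the source of the accounting quirks discussed on the following slides.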

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread, so if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (they are not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g. one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: a common header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, point to chains of wait blocks (fields: list entry, Object, Thread, Key, Type, Next link); each wait block is also linked into the wait list head of a dispatcher object (fields: Size, Type, State, wait list head, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section (TryEnterCriticalSection only attempts entry without blocking)

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* runs on every exit path, so Enter and Leave stay balanced */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects; when waiting for the first, the function returns the index of the first signaled object (as an offset from WAIT_OBJECT_0)

Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions
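Interpreting the DWORD returned by a multiple-object wait is a common stumbling block, so a small decoder may help. The sketch below is an assumption-laden illustration: demo_wait_kind and demo_decode_wait are hypothetical names, and the numeric constants repeat the Windows SDK values only so that the snippet also compiles without <windows.h>. The index is meaningful when bWaitAll is FALSE.

```c
/* Values as defined in the Windows SDK headers; repeated here so the
 * sketch also compiles where <windows.h> is unavailable. */
#ifndef WAIT_OBJECT_0
#define WAIT_OBJECT_0    0x00000000UL
#define WAIT_ABANDONED_0 0x00000080UL
#define WAIT_TIMEOUT     0x00000102UL
#define WAIT_FAILED      0xFFFFFFFFUL
#endif

typedef enum {
    DEMO_SIGNALED,    /* *index = position of the signaled handle */
    DEMO_ABANDONED,   /* *index = position of an abandoned mutex   */
    DEMO_TIMEOUT,
    DEMO_FAILED
} demo_wait_kind;

/* Classifies the result of a wait on `count` handles (count <= 64). */
demo_wait_kind demo_decode_wait(unsigned long ret, unsigned long count,
                                unsigned long *index)
{
    if (ret - WAIT_OBJECT_0 < count) {
        *index = ret - WAIT_OBJECT_0;
        return DEMO_SIGNALED;
    }
    if (ret >= WAIT_ABANDONED_0 && ret - WAIT_ABANDONED_0 < count) {
        *index = ret - WAIT_ABANDONED_0;
        return DEMO_ABANDONED;
    }
    if (ret == WAIT_TIMEOUT)
        return DEMO_TIMEOUT;
    return DEMO_FAILED;   /* WAIT_FAILED or an unexpected value */
}
```

On Windows the first argument would simply be the value returned by WaitForMultipleObjects(), and cObjects would be passed as count.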

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
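For comparison with the Windows mutex example, the same counter update can be sketched with the pthread calls listed above. The variable and function names (demo_lock, demo_counter, demo_add_blocking, demo_add_polling) are hypothetical and only serve the illustration.

```c
#include <pthread.h>

pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
int demo_counter = 0;             /* shared data protected by demo_lock */

/* Blocking variant: comparable to waiting with an INFINITE timeout. */
void demo_add_blocking(int value)
{
    pthread_mutex_lock(&demo_lock);
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
}

/* Polling variant: comparable to a wait with a timeout of 0.
 * Returns 1 if the update was applied, 0 if the mutex was busy. */
int demo_add_polling(int value)
{
    if (pthread_mutex_trylock(&demo_lock) != 0)
        return 0;                 /* typically EBUSY: owned elsewhere */
    demo_counter += value;
    pthread_mutex_unlock(&demo_lock);
    return 1;
}
```

Any number of threads may call either function; the polling variant lets a thread do other work instead of blocking when the mutex is held.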

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value (up to the maximum count)
- ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
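The counting rules can be made concrete with a tiny in-memory model. This is not a synchronization object and does not block; demo_semaphore and its functions are hypothetical names that only mimic how the count changes on a wait and on a release.

```c
#include <stddef.h>

/* Hypothetical model of a semaphore's bookkeeping (no blocking). */
typedef struct {
    long count;   /* current count; "signaled" while count > 0 */
    long max;     /* upper bound fixed at creation */
} demo_semaphore;

int demo_sem_is_signaled(const demo_semaphore *s)
{
    return s->count > 0;
}

/* Models a wait with timeout 0: takes one unit if one is available. */
int demo_sem_try_wait(demo_semaphore *s)
{
    if (s->count <= 0)
        return 0;
    s->count -= 1;
    return 1;
}

/* Models a release of `release` units: fails, leaving the count
 * unchanged, if release <= 0 or the maximum would be exceeded;
 * otherwise reports the old count through *previous (may be NULL). */
int demo_sem_release(demo_semaphore *s, long release, long *previous)
{
    if (release <= 0 || s->count + release > s->max)
        return 0;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return 1;
}
```

With an initial count of 2 and a maximum of 3, two waits succeed, a third would have to block, and a single release of 2 makes the semaphore signaled again.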

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
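Although there is no exact equivalent, a manual-reset event can be approximated by pairing a condition variable with a mutex and an explicit state flag. The sketch below is one possible emulation, not a standard API; demo_event and its functions are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical manual-reset event built from pthread primitives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;   /* the state a bare condition variable lacks */
} demo_event;

void demo_event_init(demo_event *e, int initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like setting a manual-reset event: stays signaled until reset. */
void demo_event_set(demo_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);   /* release every waiter */
    pthread_mutex_unlock(&e->lock);
}

void demo_event_reset(demo_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; returns at once if it already is. */
void demo_event_wait(demo_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void demo_event_destroy(demo_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the difference: a thread that calls demo_event_wait() after demo_event_set() still passes, whereas a signal on a bare condition variable with no waiter is simply lost.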

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
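The one-way, byte-stream behaviour is the same as that of a POSIX pipe, which makes for a compact illustration that runs outside Windows. The sketch below uses the POSIX pipe(), write(), and read() calls rather than CreatePipe(); the function name demo_pipe_roundtrip is hypothetical, and the message is assumed to be small enough to fit in the pipe buffer so that the write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Writes msg into a fresh pipe and reads it back from the other end.
 * Returns the number of bytes read, or -1 on error. */
long demo_pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                        /* fds[0]: read end, fds[1]: write end */
    size_t len = strlen(msg);
    ssize_t written, got = -1;

    if (out_size == 0 || pipe(fds) != 0)
        return -1;

    written = write(fds[1], msg, len); /* data flows in one direction only */
    close(fds[1]);                     /* reader sees end-of-file after the data */

    if (written == (ssize_t)len) {
        got = read(fds[0], out, out_size - 1);
        if (got >= 0)
            out[got] = '\0';           /* make the result a C string */
    }
    close(fds[0]);
    return (long)got;
}
```

In a real redirection scenario the two ends would be handed to different processes, as the following slides do with CreateProcess.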

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine ("." denotes the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use same flag settings for all instances of a named pipe

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm.)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the application server acts as the mailslot client and sends its status periodically; application clients 0 to n act as mailslot servers and read it]

App client 0 ... App client n (mailslot servers):

hMS = CreateMailslot( "\\.\mailslot\status" );
ReadFile( hMS, &ServStat );
/* connect to server */

App server (mailslot client); message is sent periodically:

while (...) {
    Sleep( ... );
    hMS = CreateFile( "\\*\mailslot\status" );
    WriteFile( hMS, &StatInfo );
}

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Flow of Interrupts

Synchronization on SMP Systems

Synchronization on MP systems uses spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
- Spinlock is either free or is considered to be owned by a CPU
- Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
- Accessed with a test-and-modify operation that is atomic across all processors
- KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
- To implement synchronization, a single bit is sufficient

5050

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

do acquire_spinlock(DPC)until (SUCCESS)

begin remove DPC from queueend

release_spinlock(DPC)

Kernel Synchronization

Processor BProcessor A

Critical section

spinlock

DPC DPC

A spinlock is a locking primitive associatedwith a global data structure such as the DPC queue

5151

Queued Spinlocks

Problem Checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock other processors wait on processor-local flag

ndash Thus busy-wait loop requires no access to the memory bus

When releasing lock the 1st processorrsquos flag is modified

ndash Exactly one processor is being signaled

ndash Pre-determined wait order

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

ndash Events (may be auto-reset or manual reset may be set or ldquopulsedrdquo)

ndash Mutexes (ldquomutual exclusionrdquo one-at-a-time)

ndash Semaphores (n-at-a-time)

ndash Timers

ndash Processes and Threads (signalled upon exit or terminate)

ndash Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

ndash If multiple threads are waiting for an object and only one thread is released (eg itrsquos a mutex or auto-reset event) which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction bet Synchronization amp Dispatching User mode thread waits on an event objectlsquos handle

Kernel changes threadlsquos scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads variable priority threads get priority boost

5959

Interaction bet Synchronization amp Dispatching Dispatcher re-schedules new thread ndash it may preempt running thread it it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Dispatcher object

System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

Owning thread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (kernel mode)

nonsignaled signaled

Owning thread or otherthread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (exported to user mode)

nonsignaled signaled

One thread releases thesemaphore freeing a resource

A thread acquires the semaphoreMore resources are not available

Kernel resumes one or more waiting threads

Semaphore

6161

A thread reinitializesthe thread object

What signals an object (contd)

Dispatcher object System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

A thread sets the event

Kernel resumes one or more threads

Kernel resumes one or more waiting threads

Event

nonsignaled signaled

Dedicated thread sets oneevent in the event pair

Kernel resumes theother dedicated thread

Kernel resumes waitingdedicated thread

Event pair

nonsignaled signaled

Timer expires

A thread (re) initializes the timer

Kernel resumes all waiting threads

Timer

nonsignaled signaled

Thread terminates

Kernel resumes all waiting threads

Thread

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

72

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
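
The four accounting rules amount to a small decision function. The sketch below restates them in C; the enum and the function name `charge_clock_tick` are made up for illustration, and the `was_user_mode` flag is an assumption about how the "user or kernel time" choice is made.

```c
#include <stdbool.h>

/* Hypothetical buckets a clock tick can be charged to. */
enum charge {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
};

/* Decide where one clock tick goes, following the rules on the slide. */
static enum charge charge_clock_tick(int irql, bool in_dpc, bool was_user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;                 /* servicing a device interrupt */
    if (irql == 2)
        return in_dpc ? CHARGE_DPC               /* running a DPC routine */
                      : CHARGE_THREAD_KERNEL;    /* e.g. dispatcher code */
    /* IRQL 0 or 1: ordinary thread execution */
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Note that only the first branch and the DPC branch leave the interrupted thread uncharged, which is why heavy interrupt load shows up as time no process appears to own.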

73

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, chained to their wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); each wait block is also linked from the wait list head of a dispatcher object (Size, Type, State, Wait list head, object-type-specific data)]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

81

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even if the body raises */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

83

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
– dwTimeout == 0: no wait, check whether object is signaled
– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for first signaled object or all objects
– Function returns index of first signaled object

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
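
The "any" versus "all" rule can be stated as a small pure function. This is a model of the decision only, not of the Windows implementation: `poll_wait` and `WOULD_BLOCK` are hypothetical names, and returning 0 when a wait-all is satisfied is a simplification made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define WOULD_BLOCK (-1)   /* hypothetical: the wait is not yet satisfied */

/* signaled[i] is the current state of the i-th object in the handle array.
 * wait_all == false: return the index of the first signaled object.
 * wait_all == true : succeed (return 0) only if every object is signaled
 *                    at the same time. */
static int poll_wait(const bool *signaled, size_t n, bool wait_all)
{
    if (wait_all) {
        for (size_t i = 0; i < n; i++)
            if (!signaled[i])
                return WOULD_BLOCK;
        return 0;
    }
    for (size_t i = 0; i < n; i++)
        if (signaled[i])
            return (int)i;
    return WOULD_BLOCK;
}
```

With states `{false, true, true}`, a wait-any is satisfied by index 1, while a wait-all keeps blocking until the first object is signaled too.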

84

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

85

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

86

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done = 0, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

87

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
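
A minimal pthreads sketch of the two acquisition styles compared above. The helper names (`add_blocking`, `try_add`, `run_workers`) and the shared `counter` are illustrative assumptions; only the `pthread_*` calls are real POSIX functions.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Blocking acquire: the analogue of waiting with an INFINITE timeout. */
static void add_blocking(long v)
{
    pthread_mutex_lock(&counter_lock);
    counter += v;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire: the analogue of waiting with a timeout of 0.
 * Returns 1 if the value was added, 0 if the lock was busy. */
static int try_add(long v)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;                       /* EBUSY: mutex is already locked */
    counter += v;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        add_blocking(1);
    return NULL;
}

/* Runs n (at most 8) workers and returns the final counter value. */
static long run_workers(int n)
{
    pthread_t t[8];
    counter = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Because every increment happens under the lock, `run_workers(4)` yields exactly 400000; without the mutex the result would usually be lower.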

88

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns old semaphore count
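
The counting behaviour can be tried out with a POSIX unnamed semaphore, used here as a stand-in for the Windows object. The wrapper names (`pool_init`, `pool_try_acquire`, `pool_release`) are hypothetical. Unlike ReleaseSemaphore(), the read of the previous count and the increments below are not one atomic step, so this is only an approximation.

```c
#include <semaphore.h>

static sem_t slots;   /* counts free resources */

static int pool_init(unsigned initial)
{
    return sem_init(&slots, 0, initial);   /* 0: shared between threads only */
}

/* Like a wait with timeout 0: take one unit if count > 0.
 * Returns 1 on success, 0 if the count was already 0. */
static int pool_try_acquire(void)
{
    return sem_trywait(&slots) == 0;
}

/* Add n units and report the count seen just before, similar in spirit
 * to the lpPreviousCount output of ReleaseSemaphore(). */
static int pool_release(int n)
{
    int previous = 0;
    sem_getvalue(&slots, &previous);
    for (int i = 0; i < n; i++)
        sem_post(&slots);
    return previous;
}
```

Starting from a count of 2, two acquires succeed and a third fails; releasing 2 units reports a previous count of 0 and makes the semaphore "signaled" again.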

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

91

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
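
Since there is no exact equivalent, a manual-reset event is usually approximated with a mutex, a condition variable, and a flag. The following is one possible sketch; `struct event` and the `event_*` and `release_waiters` names are assumptions made for this example, not a standard API.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical manual-reset event built from pthread primitives. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;              /* stays true until event_reset() */
};

static void event_init(struct event *e, bool initial)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial;
}

static void event_set(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);   /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

static void event_reset(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

static void event_wait(struct event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static struct event gate;
static int released;                    /* protected by gate.lock */

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&gate);
    pthread_mutex_lock(&gate.lock);
    released++;
    pthread_mutex_unlock(&gate.lock);
    return NULL;
}

/* Starts n (at most 8) waiters, sets the event once, returns how many got through. */
static int release_waiters(int n)
{
    pthread_t t[8];
    released = 0;
    event_init(&gate, false);
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    event_set(&gate);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return released;
}
```

The persistent `signaled` flag is what distinguishes this from a bare `pthread_cond_broadcast()`: threads that arrive after `event_set()` still pass, as they would with a signaled manual-reset event.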

93

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
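
For comparison, the same one-way behaviour can be observed with the POSIX `pipe()` call, which is used here instead of CreatePipe() so the sketch runs on a UNIX-like system. The function name `pipe_roundtrip` is made up for this example.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg from a child process to the parent through a one-way pipe.
 * Returns the number of bytes received, or -1 on error.
 * outsz must be at least 1. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fd[2];                          /* fd[0] = read end, fd[1] = write end */
    if (pipe(fd) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: writer only */
        close(fd[0]);
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fd[1]);                       /* parent: reader only; also lets read() see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(fd[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;             /* read() blocks while the pipe is empty */
    out[total] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Each process closes the end it does not use, mirroring the `CloseHandle (hWritePipe)` and `CloseHandle (hReadPipe)` calls in the Windows redirection example.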

94

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

95

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

96

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server through the same named pipe, each over its own instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use same flag settings for all instances of a named pipe

98

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

99

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status Functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
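
A short sketch of the FIFO side of this comparison, using fixed-size records as suggested above. The function `fifo_roundtrip`, the `RECORD_SIZE` constant, and the FIFO path passed in by the caller are all assumptions for illustration; a real client/server pair would be two separate programs rather than a forked child.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 32   /* fixed-size records: a FIFO keeps no message boundaries */

/* Creates a FIFO at path, lets a child write one record, and reads it back.
 * Returns 0 on success, -1 on failure. */
static int fifo_roundtrip(const char *path, const char *text, char out[RECORD_SIZE])
{
    unlink(path);                              /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                            /* child acts as the writer */
        char rec[RECORD_SIZE] = {0};
        strncpy(rec, text, RECORD_SIZE - 1);
        int wfd = open(path, O_WRONLY);        /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, rec, RECORD_SIZE);
        close(wfd);
        _exit(w == RECORD_SIZE ? 0 : 1);
    }

    int ok = -1;
    int rfd = open(path, O_RDONLY);            /* blocks until the writer opens */
    if (rfd >= 0) {
        size_t total = 0;
        ssize_t n;
        while (total < RECORD_SIZE &&
               (n = read(rfd, out + total, RECORD_SIZE - total)) > 0)
            total += (size_t)n;
        ok = (total == RECORD_SIZE) ? 0 : -1;
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return ok;
}
```

The data flows in one direction only; a reply channel would need a second FIFO, which is the per-client response FIFO mentioned on the slide.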

104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

107

Locate a server via mailslot

Mailslot servers (application clients 0 … n), each running:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );     /* connect to server */

Mailslot client (application server); message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        …
        WriteFile( hMS, &StatInfo );
    }

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

109

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


49

Synchronization on SMP Systems

Sync on MP: use spinlocks to coordinate among processors

Spinlock acquisition and release routines implement a one-owner-at-a-time algorithm
– Spinlock is either free, or is considered to be owned by a CPU
– Analogous to using Windows API mutexes from user mode

A spinlock is just a data cell in memory
– Accessed with a test-and-modify operation that is atomic across all processors
– KSPIN_LOCK is an opaque data type, typedef'd as a ULONG
– To implement synchronization, a single bit is sufficient
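
The "single data cell plus atomic test-and-modify" idea can be sketched in user-mode C11. This is not how KSPIN_LOCK is implemented; `tas_lock`, `tas_acquire`, `tas_release`, and the worker helpers are hypothetical names, and `atomic_flag` stands in for the kernel's memory cell.

```c
#include <pthread.h>
#include <stdatomic.h>

/* One memory cell; a single bit of state is enough. */
typedef struct { atomic_flag held; } tas_lock;

static void tas_acquire(tas_lock *l)
{
    /* Atomic test-and-set: returns the previous value, so the loop
     * exits only for the one caller that changed it from clear to set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait (spin) */
}

static void tas_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static tas_lock queue_lock = { ATOMIC_FLAG_INIT };
static long shared;          /* stands in for a global structure such as a queue */

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        tas_acquire(&queue_lock);
        shared++;            /* critical section */
        tas_release(&queue_lock);
    }
    return NULL;
}

/* Runs n (at most 8) workers and returns the final value of shared. */
static long run_spin_workers(int n)
{
    pthread_t t[8];
    shared = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, spin_worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return shared;
}
```

Every waiting thread hammers the same flag, which is the source of the bus contention that queued spinlocks are meant to avoid.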

50

Kernel Synchronization

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue

Processor A and Processor B each execute:

    do
        acquire_spinlock(DPC)
    until (SUCCESS)
    begin                           /* critical section */
        remove DPC from queue
    end
    release_spinlock(DPC)

[Figure: Processor A and Processor B contend for the spinlock that guards the DPC queue]

51

Queued Spinlocks

Problem: checking status of spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain queue of waiting processors

First processor acquires lock; other processors wait on processor-local flag
– Thus, busy-wait loop requires no access to the memory bus

When releasing lock, the 1st processor's flag is modified
– Exactly one processor is being signaled
– Pre-determined wait order
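
One well-known way to get these properties is an MCS-style queue lock, sketched below in C11. This is a swapped-in textbook technique used for illustration; the source does not say that Windows queued spinlocks are implemented this way, and all names here are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Each waiter brings its own node and spins only on its own flag. */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

struct mcs_lock { _Atomic(struct mcs_node *) tail; };   /* NULL when free */

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Join the queue; the previous tail (if any) is our predecessor. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev != NULL) {
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;   /* spin on the local flag only */
    }
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;                     /* nobody was waiting */
        /* A waiter has swapped the tail but not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Hand over to exactly one successor, in arrival order. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

static struct mcs_lock qlock;           /* zero-initialized: tail == NULL */
static long q_shared;

static void *mcs_worker(void *arg)
{
    (void)arg;
    struct mcs_node me;
    for (int i = 0; i < 20000; i++) {
        mcs_acquire(&qlock, &me);
        q_shared++;
        mcs_release(&qlock, &me);
    }
    return NULL;
}

/* Runs n (at most 8) workers and returns the final value of q_shared. */
static long run_mcs_workers(int n)
{
    pthread_t t[8];
    q_shared = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, mcs_worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return q_shared;
}
```

The release path writes to one successor's flag, so exactly one waiter is signaled and the order of acquisition equals the order of arrival.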

52

SMP Scalability Improvements

Windows 2000: queued spinlocks
– !qlocks in Kernel Debugger

Server 2003
– More spinlocks eliminated (context swap, system space, commit)
– Further reduction of use of spinlocks & length they are held
– Scheduling database now per-CPU; allows thread state transitions in parallel

53

SMP Scalability Improvements

XP/2003
– Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)
– Some locks completely eliminated: charging nonpaged/paged pool quotas; allocating and mapping system page table entries; charging commitment of pages; allocating/mapping physical memory through AWE functions
– New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

54

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include optional timeout argument
– Waiting threads consume no CPU time

55

Waiting

Waitable objects include
– Events (may be auto-reset or manual reset; may be set or "pulsed")
– Mutexes ("mutual exclusion", one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)

56

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

57

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized (create and initialize thread object), Ready, Standby, Running, Waiting (thread waits on an object handle), Transition, Terminated. When the object is set to the signaled state, the wait is complete]

58

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

59

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread; it may preempt the running thread if it has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

60

What signals an object?

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– Nonsignaled to signaled: owning thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– Nonsignaled to signaled: owning thread or other thread releases mutex
– Signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– Nonsignaled to signaled: one thread releases the semaphore, freeing a resource
– Signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

61

What signals an object? (contd.)

For each dispatcher object: the system event that changes its state, and the effect of the signaled state on waiting threads

Event
– Nonsignaled to signaled: a thread sets the event
– Signaled to nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– Nonsignaled to signaled: dedicated thread sets one event in the event pair
– Signaled to nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– Nonsignaled to signaled: timer expires
– Signaled to nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– Nonsignaled to signaled: thread terminates
– Signaled to nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

62

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

63

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

64

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

65

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linked to a chain of DPC objects]

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

The first thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed once its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0) {
        counter += local_value;
    } else {
        /* mutex was abandoned */
        break;                  /* exit the loop */
    }
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
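The blocking and polling variants compared above can be sketched with pthreads as follows. This is a minimal illustration, not code from the lecture; the names counter_lock, shared_counter, add_blocking, add_if_free and read_counter are made up for the example.

```c
#include <pthread.h>

/* Hypothetical shared state, used only for this illustration. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;

/* Blocking acquire: the analogue of WaitForSingleObject(hMutex, INFINITE). */
void add_blocking(int value)
{
    pthread_mutex_lock(&counter_lock);
    shared_counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling acquire: the analogue of WaitForSingleObject(hMutex, 0).
   Returns 1 if the update happened, 0 if the mutex was busy. */
int add_if_free(int value)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;               /* EBUSY: the mutex is currently held */
    shared_counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}

/* Reads the counter under the lock. */
int read_counter(void)
{
    int value;
    pthread_mutex_lock(&counter_lock);
    value = shared_counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

A caller that must not stall (for example a UI thread) would use add_if_free() and retry later; worker threads would normally use add_blocking().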

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each completed wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
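The counting rules above can be made concrete with a small emulation built on a pthread mutex and condition variable. This is a sketch of the semantics only, not the Windows implementation; the type csem_t and the csem_* functions are hypothetical names. Like ReleaseSemaphore(), the release operation refuses a count that would exceed the maximum.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore that mimics the Windows semantics. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signaled whenever count becomes > 0 */
    long count;
    long max;
} csem_t;

void csem_init(csem_t *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Wait: block while the count is 0, then take one unit. */
void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Zero-timeout wait: returns 1 if a unit was taken, 0 otherwise. */
int csem_trywait(csem_t *s)
{
    int taken = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        taken = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return taken;
}

/* Release n units and report the previous count.
   Returns 0 and changes nothing if the maximum would be exceeded. */
int csem_release(csem_t *s, long n, long *previous)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```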

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event (Microsoft's documentation describes PulseEvent() as unreliable and advises against using it)
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
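Because a condition variable keeps no state of its own, a manual-reset event has to be assembled from a mutex, a condition variable and a flag. The following is one possible sketch under that assumption; mr_event_t and the mr_event_* functions are hypothetical names, not part of pthreads.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical manual-reset event built from pthreads primitives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int signaled;              /* stays 1 until explicitly reset */
} mr_event_t;

void mr_event_init(mr_event_t *e, int initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state ? 1 : 0;
}

/* Like SetEvent() on a manual-reset event: wake every waiter. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE): the flag is left set. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)       /* the loop also absorbs spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Zero-timeout poll of the event state. */
int mr_event_is_set(mr_event_t *e)
{
    int state;
    pthread_mutex_lock(&e->lock);
    state = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return state;
}
```

An auto-reset variant would clear the flag inside mr_event_wait() and use pthread_cond_signal() instead of a broadcast.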

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
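For comparison, the same one-way byte-stream behaviour can be seen with the POSIX pipe() call. This is a POSIX analogue rather than the Win32 API described above, and pipe_roundtrip is a hypothetical helper written for this illustration. The message is kept short so that the write cannot block on a full pipe.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg through a freshly created pipe and reads it back into out.
   Returns the number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fd[2];                          /* fd[0] = read end, fd[1] = write end */
    size_t len = strlen(msg);
    ssize_t n;

    if (out_size == 0 || pipe(fd) != 0)
        return -1;

    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);                       /* reader sees end-of-file after the data */

    n = read(fd[0], out, out_size - 1); /* would block if the pipe were empty and still open for writing */
    close(fd[0]);
    if (n < 0)
        return -1;

    out[n] = '\0';
    return (long)n;
}
```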

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe( &hReadPipe, &hWritePipe, &PipeSA, 0 )) {
    fprintf( stderr, "Anon pipe create failed\n" );
    exit( 1 );
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput  = GetStdHandle( STD_INPUT_HANDLE );
StartInfoCh1.hStdError  = GetStdHandle( STD_ERROR_HANDLE );
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess( NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1 )) {
    fprintf( stderr, "CreateProc1 failed\n" );
    exit( 2 );
}
CloseHandle( hWritePipe );

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput  = hReadPipe;
StartInfoCh2.hStdError  = GetStdHandle( STD_ERROR_HANDLE );
StartInfoCh2.hStdOutput = GetStdHandle( STD_OUTPUT_HANDLE );
StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess( NULL, (LPTSTR)targv, NULL, NULL,
                    TRUE,   /* Inherit handles */
                    0, NULL, NULL, &StartInfoCh2, &ProcInfo2 )) {
    fprintf( stderr, "CreateProc2 failed\n" );
    exit( 3 );
}
CloseHandle( hReadPipe );

/* Wait for both processes to complete */
WaitForSingleObject( ProcInfo1.hProcess, INFINITE );
WaitForSingleObject( ProcInfo2.hProcess, INFINITE );
CloseHandle( ProcInfo1.hThread ); CloseHandle( ProcInfo1.hProcess );
CloseHandle( ProcInfo2.hThread ); CloseHandle( ProcInfo2.hProcess );
return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server, each over its own instance of the same named pipe
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. = local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in a single call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in a single call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe( ServerPipeName, NMPWAIT_WAIT_FOREVER );
hNamedPipe = CreateFile( ServerPipeName, GENERIC_READ | GENERIC_WRITE,
                         0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL );
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf( stderr, "Failure to locate server\n" );
    exit( 3 );
}

/* Write the request */
WriteFile( hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL );

/* Read each response and send it to std out */
while (ReadFile( hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL ))
    printf( "%s", ResponseRecord );

CloseHandle( hNamedPipe );
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe( SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA );

while (!Done) {
    printf( "Server is awaiting next request\n" );
    if (!ConnectNamedPipe( hNamedPipe, NULL )
            || !ReadFile( hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL )) {
        fprintf( stderr, "Connect or Read Named Pipe error\n" );
        exit( 4 );
    }
    printf( "Request is: %s\n", RequestRecord );

    /* Send the file, one line at a time, to the client */
    fp = fopen( File, "r" );
    while (fgets( ResponseRecord, MAX_RQRS_LEN, fp ) != NULL)
        WriteFile( hNamedPipe, &ResponseRecord,
                   (strlen( ResponseRecord ) + 1) * TSIZE, &nXfer, NULL );
    fclose( fp );
    DisconnectNamedPipe( hNamedPipe );
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- A client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- The client will connect to every server in the network domain

Locate a server via mailslot

[Figure: the application server acts as mailslot client and periodically sends a status message; application clients 0 to n act as mailslot servers and read it]

Mailslot servers (app client 0 ... app client n):

    hMS = CreateMailslot( "\\.\mailslot\status", ... );
    ReadFile( hMS, &ServStat, ... );
    /* connect to server */

Mailslot client (app server); the message is sent periodically:

    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\*\mailslot\status", ... );
        WriteFile( hMS, &StatInfo, ... );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Kernel Synchronization

[Figure: Processor A and Processor B each run the sequence below; a spinlock guards the critical section around the shared DPC queue]

do
    acquire_spinlock(DPC)
until (SUCCESS)
begin
    remove DPC from queue
end
release_spinlock(DPC)

A spinlock is a locking primitive associated with a global data structure, such as the DPC queue
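The acquire/release loop above can be sketched in user space with a C11 test-and-set flag. This is an illustration of the technique, not the kernel's implementation; tas_lock_t, dpc_lock, dpc_removed and run_tas_demo are hypothetical names, and the counter merely stands in for the "remove DPC from queue" step.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set spinlock. */
typedef struct { atomic_flag busy; } tas_lock_t;

static tas_lock_t dpc_lock = { ATOMIC_FLAG_INIT };
static long dpc_removed = 0;        /* protected by dpc_lock */

static void tas_acquire(tas_lock_t *l)
{
    /* "do acquire_spinlock until SUCCESS": retry until the flag was clear. */
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ;                           /* busy-wait */
}

static void tas_release(tas_lock_t *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

static void *tas_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 20000; i++) {
        tas_acquire(&dpc_lock);
        dpc_removed++;              /* critical section */
        tas_release(&dpc_lock);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter value. */
long run_tas_demo(int nthreads)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    dpc_removed = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, tas_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return dpc_removed;
}
```

Every waiter hammers the same flag with test-and-set operations, which is the bus contention that queued spinlocks are designed to avoid.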

Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operation creates bus contention

Queued spinlocks maintain a queue of waiting processors

First processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When releasing the lock, the flag of the first processor in the queue is modified
- Exactly one processor is being signaled
- Pre-determined wait order
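The queueing idea can be illustrated with an MCS-style queue lock, a well-known algorithm in which each waiter spins on a flag in its own queue node. This sketch is not the Windows implementation; mcs_node_t, mcs_lock_t and run_mcs_demo are hypothetical names. The sched_yield() call in the spin loops is an assumption made only to keep a user-space demo responsive when threads outnumber cores; a kernel spinlock would simply spin.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;             /* the local flag this waiter spins on */
} mcs_node_t;

typedef struct { _Atomic(mcs_node_t *) tail; } mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Join the end of the queue; the queue order is the wait order. */
    mcs_node_t *prev = atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev != NULL) {
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            sched_yield();          /* spin on our own flag only */
    }
}

void mcs_release(mcs_lock_t *lock, mcs_node_t *me)
{
    mcs_node_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;                 /* nobody was waiting */
        /* A waiter has joined the queue but not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            sched_yield();
    }
    /* Signal exactly one waiter: the next one in the queue. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

static mcs_lock_t demo_lock = { NULL };
static long demo_total = 0;         /* protected by demo_lock */

static void *mcs_worker(void *arg)
{
    mcs_node_t node;                /* one queue node per thread */
    (void)arg;
    for (int i = 0; i < 5000; i++) {
        mcs_acquire(&demo_lock, &node);
        demo_total++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

/* Runs nthreads workers (at most 16) and returns the final counter value. */
long run_mcs_demo(int nthreads)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    demo_total = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, mcs_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return demo_total;
}
```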

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN, or Page Frame Database, lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; for "all", all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include an optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object, and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: interaction with thread scheduling. Thread states shown: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labeled steps: create and initialize thread object; thread waits on an object handle; set object to signaled state; wait is complete]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread; it may preempt the running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled to signaled: owning thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled to signaled: owning thread or other thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- nonsignaled to signaled: one thread releases the semaphore, freeing a resource
- signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd)

Event
- nonsignaled to signaled: a thread sets the event
- signaled to nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- nonsignaled to signaled: dedicated thread sets one event in the event pair
- signaled to nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- nonsignaled to signaled: timer expires
- signaled to nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- nonsignaled to signaled: thread terminates
- signaled to nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: queue head followed by a chain of three DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with levels from High and Power failure down through Dispatch/DPC and APC to Low; DPC objects wait in the DPC queue until the dispatcher runs them]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level.
3. After the DPC interrupt, control transfers to the thread dispatcher.
4. Dispatcher executes each DPC routine in the DPC queue.

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped
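The delivery sequence can be mimicked with a toy model of one CPU's IRQL and DPC queue. This is a simulation written for illustration only, not kernel code. PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL follow the usual Windows numbering (0, 1, 2); DEVICE_LEVEL, cpu_t, dpc_t, queue_dpc, lower_irql and count_dpc are hypothetical.

```c
#include <stddef.h>

enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, DEVICE_LEVEL = 3 };

typedef void (*dpc_routine_t)(void *context);

typedef struct dpc {
    dpc_routine_t routine;
    void *context;
    struct dpc *next;
} dpc_t;

/* Toy model of one processor: current IRQL plus its own DPC queue. */
typedef struct {
    int irql;
    dpc_t *head;
    dpc_t *tail;
} cpu_t;

/* Step 1: code running at high IRQL (e.g. an ISR) queues a DPC. */
void queue_dpc(cpu_t *cpu, dpc_t *dpc, dpc_routine_t routine, void *context)
{
    dpc->routine = routine;
    dpc->context = context;
    dpc->next = NULL;
    if (cpu->tail != NULL)
        cpu->tail->next = dpc;
    else
        cpu->head = dpc;
    cpu->tail = dpc;
}

/* Steps 2 to 4: when IRQL drops below DISPATCH_LEVEL, the pending
   "DPC interrupt" fires and every queued routine runs at DISPATCH_LEVEL. */
void lower_irql(cpu_t *cpu, int new_irql)
{
    if (new_irql < DISPATCH_LEVEL && cpu->head != NULL) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->head != NULL) {
            dpc_t *d = cpu->head;
            cpu->head = d->next;
            if (cpu->head == NULL)
                cpu->tail = NULL;
            d->routine(d->context);
        }
    }
    cpu->irql = new_irql;
}

/* Sample DPC routine for the demo: counts how often it ran. */
void count_dpc(void *context)
{
    ++*(int *)context;
}
```

Lowering the IRQL only to DISPATCH_LEVEL leaves the queue untouched; the routines run once the level drops further.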

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when thread is in alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"
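The deliverability rules for the three APC kinds can be summarized as a small decision function. This is a model of the rules as stated on the slides, not kernel code; apc_kind_t, thread_state_t and apc_deliverable are hypothetical names.

```c
#include <stdbool.h>

typedef enum { APC_SPECIAL_KERNEL, APC_NORMAL_KERNEL, APC_USER } apc_kind_t;

/* Hypothetical snapshot of the target thread's state. */
typedef struct {
    int  irql;                    /* 0 = passive, 1 = APC level, ... */
    bool kernel_apcs_disabled;    /* e.g. after KeEnterCriticalRegion */
    bool in_alertable_wait;       /* e.g. inside SleepEx(..., TRUE) */
} thread_state_t;

/* Returns true if an APC of the given kind could be delivered now. */
bool apc_deliverable(apc_kind_t kind, const thread_state_t *t)
{
    switch (kind) {
    case APC_SPECIAL_KERNEL:
        return t->irql < 1;       /* blocked only at IRQL 1 or above */
    case APC_NORMAL_KERNEL:
        return t->irql == 0 && !t->kernel_apcs_disabled;
    case APC_USER:
        return t->in_alertable_wait;
    }
    return false;
}
```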

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each holding APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 & not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
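The four accounting rules amount to a small decision function that the clock ISR would evaluate on each tick. The following sketch only restates those rules in code; charge_t and charge_tick are hypothetical names.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decides where one clock tick is charged, given the IRQL that was
   interrupted, whether a DPC was running, and the processor mode. */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```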

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column
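The quirk is easy to reproduce in a toy model of tick-based sampling: whoever is running when the timer fires is charged the whole interval. This sketch is an illustration under that assumption; sample_charges and the timeline layout are hypothetical.

```c
/* timeline[i] is the id of the thread running during millisecond i.
   On every tick (every tick_ms milliseconds) the thread running at that
   moment is charged the full interval in charged_ms[]. */
void sample_charges(const int *timeline, int length_ms, int tick_ms,
                    int *charged_ms, int nthreads)
{
    for (int t = tick_ms - 1; t < length_ms; t += tick_ms) {
        int running = timeline[t];
        if (running >= 0 && running < nthreads)
            charged_ms[running] += tick_ms;
    }
}
```

With a 10 msec tick, a thread that runs for 9 msec and enters a wait state just before each tick is never charged, while the thread that happens to be running at the tick is charged the full 10 msec.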

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout: header with Size, Type, State and wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (each with Size, Type, State, wait list head and object-type-specific data) and thread objects (each with a WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type and a next link]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots


Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

9898

Named Pipes (contd)

BOOL PeekNamedPipe (HANDLE hPipeLPVOID lpvBuffer DWORD cbBufferLPDWORD lpcbRead LPDWORD lpcbAvailLPDWORD lpcbMessage)

nMaxInstances

ndash Number of instances

ndash PIPE_UNLIMITED_INSTANCES OS choice based on resources

dwTimeOut

ndash Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

ndash Closing handle to last instance deletes named pipe

Polling a pipe

ndash Nondestructive ndash is there a message waiting for ReadFile

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile, ReadFile sequence into a single call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines the sequence CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it is easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
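The FIFO points above can be made concrete with a small POSIX sketch in C. It is only an illustration: the `record` type, the function name `fifo_roundtrip`, and the caller-supplied path are assumptions for this example and are not part of the slides. A child process plays the writing client and sends one fixed-size record; the parent plays the reading server.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fixed-size record (hypothetical layout): FIFOs are byte-oriented,
 * so a fixed size tells the reader where each message ends. */
typedef struct {
    int  id;
    char text[28];
} record;

/* Creates a FIFO at `path`, lets a child write one record into it,
 * and reads that record back in the parent.
 * Returns 0 on success, -1 on failure. */
int fifo_roundtrip(const char *path, const record *out, record *in)
{
    unlink(path);                          /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                        /* child: the writing client */
        int wfd = open(path, O_WRONLY);    /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, out, sizeof *out);
        close(wfd);
        _exit(n == (ssize_t)sizeof *out ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);        /* blocks until a writer opens */
    ssize_t n = (rfd < 0) ? -1 : read(rfd, in, sizeof *in);
    if (rfd >= 0)
        close(rfd);

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);                          /* the FIFO is a file system object */

    return (n == (ssize_t)sizeof *in &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would fill a `record`, pass a path in a writable directory, and check the return value. Because the FIFO is half-duplex, a reply to the client would need a second FIFO, as the slide notes.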

Client Example using Named Pipe

WaitNamedPipe(ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile(ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile(hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile(hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf("%s", ResponseRecord);

CloseHandle(hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe(SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf("Server is awaiting next request\n");

    if (!ConnectNamedPipe(hNamedPipe, NULL)
        || !ReadFile(hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen(File, "r");
    while (fgets(ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile(hNamedPipe, &ResponseRecord,
            (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose(fp);

    DisconnectNamedPipe(hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0..n act as mailslot servers. Each one calls CreateMailslot() for a mailslot named "status", reads the server status with ReadFile(hMS, &ServStat), and then connects to the server. The application server acts as the mailslot client: in a loop it sleeps, opens the "status" mailslot with CreateFile(), and writes its status with WriteFile(hMS, &StatInfo). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieve handle for local mailslot
- \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Queued Spinlocks

Problem: checking the status of a spinlock via test-and-set operations creates bus contention

Queued spinlocks maintain a queue of waiting processors

The first processor acquires the lock; other processors wait on a processor-local flag
- Thus, the busy-wait loop requires no access to the memory bus

When the lock is released, the flag of the first processor in the queue is modified
- Exactly one processor is being signaled
- Pre-determined wait order
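The queueing idea can be sketched in user-mode C with C11 atomics. This is a minimal MCS-style queue lock, not the Windows kernel implementation. All names (`qnode`, `qlock`, `run_qlock_demo`) are assumptions for illustration, and threads stand in for processors. Each waiter spins only on the flag in its own queue node, and a release hands the lock to exactly one successor, in queue order.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter; a waiter spins on its own `locked` flag. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool locked;
} qnode;

typedef struct {
    _Atomic(qnode *) tail;      /* last waiter in the queue, NULL if free */
} qlock;

void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    qnode *prev = atomic_exchange(&l->tail, me);   /* join the queue */
    if (prev != NULL) {
        atomic_store(&prev->next, me);             /* link behind predecessor */
        while (atomic_load(&me->locked))           /* spin on local flag only */
            sched_yield();
    }
}

void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        qnode *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;                                /* nobody is waiting */
        while ((succ = atomic_load(&me->next)) == NULL)
            sched_yield();                         /* successor still linking */
    }
    atomic_store(&succ->locked, false);            /* signal exactly one waiter */
}

static qlock demo_lock;         /* zero-initialized: tail == NULL */
static long  demo_counter;

static void *demo_worker(void *arg)
{
    int iters = *(int *)arg;
    qnode me;
    for (int i = 0; i < iters; i++) {
        qlock_acquire(&demo_lock, &me);
        demo_counter++;                            /* protected by the lock */
        qlock_release(&demo_lock, &me);
    }
    return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
long run_qlock_demo(int nthreads, int iters)
{
    pthread_t t[16];
    if (nthreads > 16)
        nthreads = 16;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

The `sched_yield()` calls are a concession to running as ordinary threads; a kernel spinlock would spin without yielding. With `n` threads and `k` iterations each, `run_qlock_demo(n, k)` should return `n * k` if mutual exclusion holds.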

SMP Scalability Improvements

Windows 2000: queued spinlocks
- !qlocks in Kernel Debugger

Server 2003
- More spinlocks eliminated (context swap, system space, commit)
- Further reduction of use of spinlocks & length they are held
- Scheduling database now per-CPU; allows thread state transitions in parallel

SMP Scalability Improvements

XP/2003
- Minimized lock contention for hot locks (PFN or Page Frame Database lock)
- Some locks completely eliminated: charging nonpaged/paged pool quotas, allocating and mapping system page table entries, charging commitment of pages, allocating/mapping physical memory through AWE functions
- New, more efficient locking mechanism (pushlocks): doesn't use spinlocks when there is no contention; used for object manager and address windowing extensions (AWE) related locks

Waiting

Flexible wait calls
- Wait for one or multiple objects in one call
- Wait for multiple can wait for "any" one or "all" at once; "all": all objects must be in the signaled state concurrently to resolve the wait
- All wait calls include optional timeout argument
- Waiting threads consume no CPU time

Waiting

Waitable objects include
- Events (may be auto-reset or manual reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: thread state diagram showing the interaction with thread scheduling. States: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Annotations: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state".]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object?

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
- nonsignaled to signaled: owning thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- nonsignaled to signaled: owning thread or other thread releases mutex
- signaled to nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- nonsignaled to signaled: one thread releases the semaphore, freeing a resource
- signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object? (contd)

For each dispatcher object: the system events and resulting state changes, and the effect of the signaled state on waiting threads

Event
- nonsignaled to signaled: a thread sets the event
- signaled to nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- nonsignaled to signaled: dedicated thread sets one event in the event pair
- signaled to nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- nonsignaled to signaled: timer expires
- signaled to nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- nonsignaled to signaled: thread terminates
- signaled to nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
- Also used for quantum end and timer expiration

Driver (usually ISR) queues request
- One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) is completed
- Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, consisting of a queue head followed by a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from high to low (High, Power failure, Dispatch/DPC, APC, Low), the DPC queue, and the dispatcher, annotated with steps 1 to 4 above]

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (environment subsystems)

User mode APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, one for kernel mode (K) and one for user mode (U)]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 & not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
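The accounting rules above amount to a small decision function. The sketch below only models those four rules in plain C; the enum, the function name `charge_tick`, and the `was_user_mode` parameter (which selects between user and kernel time below IRQL 2) are assumptions for illustration and do not correspond to actual kernel code.

```c
#include <stdbool.h>

#define DISPATCH_LEVEL 2    /* dispatch/DPC IRQL, as used on the slide */

typedef enum {
    CHARGE_THREAD_USER,     /* thread's user-mode time   */
    CHARGE_THREAD_KERNEL,   /* thread's kernel-mode time */
    CHARGE_DPC,             /* DPC time                  */
    CHARGE_INTERRUPT        /* interrupt time            */
} charge_bucket;

/* Decides which counter one clock tick is charged to, given the IRQL at
 * the time of the tick, whether a DPC was executing, and whether the
 * interrupted code was running in user mode. */
charge_bucket charge_tick(int irql, bool in_dpc, bool was_user_mode)
{
    if (irql > DISPATCH_LEVEL)
        return CHARGE_INTERRUPT;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives at IRQL 2 while a DPC routine is running is charged to DPC time and not to the interrupted thread.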

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches shown for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout, with a common header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section (other than attempting entry with TryEnterCriticalSection)

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* Leave exactly once per Enter, even if the body raises an exception */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()
- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  When waiting for any object, the function returns the index of the first signaled object (as WAIT_OBJECT_0 + index)

Side effects
- Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process (by default)

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
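For comparison with the Win32 mutex example, here is a minimal pthreads sketch of the same shared-counter pattern, plus a nonblocking probe with `pthread_mutex_trylock()`. The helper names (`run_counter_demo`, `try_once`) and the fixed limit of 16 threads are assumptions made for this illustration.

```c
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* shared by all threads */

static void *add_worker(void *arg)
{
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_lock);      /* blocks until the lock is free */
        counter += 1;
        pthread_mutex_unlock(&counter_lock);    /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
long run_counter_demo(int nthreads, int iters)
{
    pthread_t t[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], NULL, add_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Polls the mutex once without blocking.
 * Returns 1 if the lock was free (it is taken and released again),
 * 0 if it was busy (pthread_mutex_trylock() reported EBUSY). */
int try_once(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) != 0)
        return 0;
    pthread_mutex_unlock(m);
    return 1;
}
```

Unlike a Win32 mutex, a default pthreads mutex has no "abandoned" state to check for after the wait, which is why the worker loop has no counterpart to the WAIT_OBJECT_0 test.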

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count
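The counting behaviour described above can be modelled portably with a pthreads mutex and condition variable. This is a sketch of the semantics only, not the Win32 implementation; the type `csem` and its functions are hypothetical names. Following the documented Win32 behaviour, a release that would push the count past the maximum fails and leaves the count unchanged.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;    /* signaled whenever count is raised */
    long count;                 /* semaphore is "signaled" when count > 0 */
    long max;
} csem;

void csem_init(csem *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Blocking wait: each successful wait decreases the count by 1. */
void csem_wait(csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Nonblocking wait, like a wait with timeout 0. */
bool csem_trywait(csem *s)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Raises the count by n and reports the previous count through prev.
 * Fails, leaving the count unchanged, if n <= 0 or the maximum would
 * be exceeded. */
bool csem_release(csem *s, long n, long *prev)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (prev != NULL)
            *prev = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);    /* wake waiters to re-check */
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

A pool of N identical resources would be guarded by `csem_init(&s, N, N)`, with one wait before taking a resource and one `csem_release(&s, 1, NULL)` after returning it.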

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )
BOOL ResetEvent( HANDLE hEvent )
BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
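Although pthreads has no direct counterpart, a manual-reset event can be approximated by pairing a condition variable with a mutex-protected flag. The sketch below is one such approximation under that assumption; `mr_event` and its functions are hypothetical names and it makes no attempt to reproduce PulseEvent().

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;              /* stays true until explicitly reset */
} mr_event;

void mr_event_init(mr_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void mr_event_destroy(mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

/* Like SetEvent() on a manual-reset event: releases all waiters. */
void mr_event_set(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not reset it. */
void mr_event_wait(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                    /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static int released_count;      /* protected by the event's lock */

static void *waiter(void *arg)
{
    mr_event *e = arg;
    mr_event_wait(e);
    pthread_mutex_lock(&e->lock);
    released_count++;
    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Starts up to 16 waiters, sets the event once, and returns how many
 * waiters were released. */
int run_event_demo(int nwaiters)
{
    pthread_t t[16];
    mr_event e;
    if (nwaiters > 16)
        nwaiters = 16;
    released_count = 0;
    mr_event_init(&e, false);
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&t[i], NULL, waiter, &e);
    mr_event_set(&e);                       /* one set releases every waiter */
    for (int i = 0; i < nwaiters; i++)
        pthread_join(t[i], NULL);
    mr_event_destroy(&e);
    return released_count;
}
```

The persistent `signaled` flag is what supplies the manual-reset behaviour: a bare condition variable keeps no state, so a thread that starts waiting after the broadcast would otherwise miss it.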

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC
- cbPipe: pipe size in bytes; zero == default
- Read on pipe handle will block if pipe is empty
- Write operation to a full pipe will block
- Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe(&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle(STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle(STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess(NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle(hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle(STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess(NULL, (LPTSTR)targv, NULL, NULL, TRUE,   /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle(hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject(ProcInfo1.hProcess, INFINITE);
WaitForSingleObject(ProcInfo2.hProcess, INFINITE);

CloseHandle(ProcInfo1.hThread); CloseHandle(ProcInfo1.hProcess);
CloseHandle(ProcInfo2.hThread); CloseHandle(ProcInfo2.hProcess);

return 0;


  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 52: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

5252

SMP Scalability Improvements

Windows 2000 queued spinlocks

ndash qlocks in Kernel Debugger

Server 2003

ndash More spinlocks eliminated (context swap system space commit)

ndash Further reduction of use of spinlocks amp length they are held

ndash Scheduling database now per-CPUAllows thread state transitions in parallel

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

Waiting

Waitable objects include

- Events (may be auto-reset or manual-reset; may be set or "pulsed")
- Mutexes ("mutual exclusion", one-at-a-time)
- Semaphores (n-at-a-time)
- Timers
- Processes and Threads (signaled upon exit or terminate)
- Directories (change notification)

Waiting

No guaranteed ordering of wait resolution

- If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects - outside the kernel

[Figure: thread state diagram showing the interaction with thread scheduling. States: Initialized, Ready, Standby, Running, Waiting, Transition, Terminated. Labels: "Create and initialize thread object"; "Thread waits on an object handle"; "Wait is complete; set object to signaled state".]

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get a priority boost

Interaction between Synchronization & Dispatching

Dispatcher re-schedules new thread - it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads.

Mutex (kernel mode)
- Nonsignaled -> signaled: owning thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
- Nonsignaled -> signaled: owning thread or other thread releases mutex
- Signaled -> nonsignaled: resumed thread acquires mutex
- Effect on waiting threads: kernel resumes one waiting thread

Semaphore
- Nonsignaled -> signaled: one thread releases the semaphore, freeing a resource
- Signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available
- Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object (contd)

Event
- Nonsignaled -> signaled: a thread sets the event
- Signaled -> nonsignaled: kernel resumes one or more threads
- Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
- Nonsignaled -> signaled: dedicated thread sets one event in the event pair
- Signaled -> nonsignaled: kernel resumes the other dedicated thread
- Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
- Nonsignaled -> signaled: timer expires
- Signaled -> nonsignaled: a thread (re)initializes the timer
- Effect on waiting threads: kernel resumes all waiting threads

Thread
- Nonsignaled -> signaled: thread terminates
- Signaled -> nonsignaled: a thread reinitializes the thread object
- Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)
- Synchronization (pp. 149 ff.)
- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
- Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, a queue head linked to a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low, the DPC queue holding DPC objects, and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt.

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level.

3. After the DPC interrupt, control transfers to the thread dispatcher.

4. Dispatcher executes each DPC routine in the DPC queue.

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects.

DPC routines can't assume what process address space is currently mapped.
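The queue-then-drain pattern in the four steps above can be sketched as a small user-mode simulation in C. This is an illustration only, not kernel code: struct dpc_queue, dpc_enqueue, dpc_drain and release_waiters are hypothetical names, the fixed capacity is an arbitrary assumption, and a real DPC queue is a per-processor kernel structure.

```c
#include <assert.h>
#include <stddef.h>

#define DPC_QUEUE_CAPACITY 16   /* arbitrary size for the sketch */

typedef void (*dpc_routine)(void *context);

struct dpc_object {
    dpc_routine routine;
    void *context;
};

/* One queue per simulated CPU, drained in FIFO order. */
struct dpc_queue {
    struct dpc_object items[DPC_QUEUE_CAPACITY];
    size_t head;
    size_t count;
};

/* Step 1: the simulated ISR records the deferred work and returns quickly.
 * Returns 0 on success, -1 if the queue is full. */
int dpc_enqueue(struct dpc_queue *q, dpc_routine routine, void *context)
{
    assert(q != NULL && routine != NULL);
    if (q->count == DPC_QUEUE_CAPACITY)
        return -1;
    size_t tail = (q->head + q->count) % DPC_QUEUE_CAPACITY;
    q->items[tail].routine = routine;
    q->items[tail].context = context;
    q->count++;
    return 0;
}

/* Steps 2-4: once the simulated IRQL drops below dispatch level,
 * run every queued routine. Returns how many routines ran. */
size_t dpc_drain(struct dpc_queue *q)
{
    assert(q != NULL);
    size_t ran = 0;
    while (q->count > 0) {
        struct dpc_object d = q->items[q->head];
        q->head = (q->head + 1) % DPC_QUEUE_CAPACITY;
        q->count--;
        d.routine(d.context);
        ran++;
    }
    return ran;
}

/* Example routine: a timer DPC that "releases" waiters by bumping a counter. */
void release_waiters(void *context)
{
    int *released = context;
    (*released)++;
}
```

Nothing runs at enqueue time; the work happens only when dpc_drain is called, which mirrors how the deferred routine waits for the IRQL to drop.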

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend/terminate itself (env subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode, at IRQL 1
- Always deliverable unless thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each linking APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 & not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
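The four charging rules above amount to a small decision function. The following C sketch restates them; the enum and the function name clock_tick_charge are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four accounting buckets named on the slide. */
enum charge_target {
    CHARGE_THREAD_USER_OR_KERNEL,  /* IRQL < 2 */
    CHARGE_DPC_TIME,               /* IRQL = 2, inside a DPC */
    CHARGE_THREAD_KERNEL,          /* IRQL = 2, not inside a DPC */
    CHARGE_INTERRUPT_TIME          /* IRQL > 2 */
};

/* Decides where one clock tick is charged, given the IRQL that was
 * interrupted and whether a DPC routine was being processed. */
enum charge_target clock_tick_charge(int irql, bool processing_dpc)
{
    assert(irql >= 0);
    if (irql < 2)
        return CHARGE_THREAD_USER_OR_KERNEL;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC_TIME : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT_TIME;
}
```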

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add Context Switch Delta column
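The quirk that a thread "may run but never get charged" follows from sampling at clock ticks only. A minimal C sketch, assuming ticks fire at exact multiples of the period and a thread is charged one tick for each tick that lands while it runs (ticks_charged is a hypothetical helper, and times are in microseconds):

```c
#include <assert.h>

/* Counts the clock ticks that land while a thread runs in the half-open
 * interval [start_us, end_us). Ticks are assumed to fire at 0, period_us,
 * 2 * period_us, and so on. */
long ticks_charged(long start_us, long end_us, long period_us)
{
    assert(period_us > 0 && start_us >= 0 && end_us >= start_us);
    long first_tick = (start_us + period_us - 1) / period_us;  /* ceil(start / period) */
    long past_last  = (end_us + period_us - 1) / period_us;    /* ceil(end / period)   */
    return past_last - first_tick;
}
```

With a 10 msec period, a thread that runs from 1 msec to 9 msec is charged nothing, while one that runs from 9 msec to 11 msec is charged a full tick for 2 msec of work.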

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a header holding Size, Type, State and the wait list head, followed by object-type-specific data]

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects point to their wait blocks through WaitBlockList; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) links the wait blocks waiting on it. Each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link.]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication

- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so each Enter is matched by exactly one Leave */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object
- dwTimeout specifies wait time in msec
  dwTimeout == 0 - no wait, check whether object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles - pointer to array identifying these objects
- bWaitAll - whether to wait for first signaled object or all objects; function returns index of first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
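The index mentioned above is encoded in the DWORD return value of the wait functions. The sketch below decodes it in portable C. It assumes the documented Windows constant values (WAIT_OBJECT_0 = 0, WAIT_ABANDONED_0 = 0x80, WAIT_TIMEOUT = 0x102, WAIT_FAILED = 0xFFFFFFFF) and redefines them under local SK_ names so the code compiles without windows.h; decode_wait_result and the outcome enum are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the documented Windows values (assumption: unchanged). */
#define SK_WAIT_OBJECT_0     0x00000000u
#define SK_WAIT_ABANDONED_0  0x00000080u
#define SK_WAIT_TIMEOUT      0x00000102u
#define SK_WAIT_FAILED       0xFFFFFFFFu

enum wait_outcome {
    OUTCOME_SIGNALED,   /* *index = position of the signaled object   */
    OUTCOME_ABANDONED,  /* *index = position of the abandoned mutex   */
    OUTCOME_TIMEOUT,
    OUTCOME_FAILED,
    OUTCOME_UNKNOWN     /* any value this sketch does not classify    */
};

/* Classifies the value returned by a wait on `count` handles. */
enum wait_outcome decode_wait_result(uint32_t result, uint32_t count, uint32_t *index)
{
    assert(index != NULL && count >= 1 && count <= 64);
    if (result < SK_WAIT_OBJECT_0 + count) {
        *index = result - SK_WAIT_OBJECT_0;
        return OUTCOME_SIGNALED;
    }
    if (result >= SK_WAIT_ABANDONED_0 && result < SK_WAIT_ABANDONED_0 + count) {
        *index = result - SK_WAIT_ABANDONED_0;
        return OUTCOME_ABANDONED;
    }
    if (result == SK_WAIT_TIMEOUT)
        return OUTCOME_TIMEOUT;
    if (result == SK_WAIT_FAILED)
        return OUTCOME_FAILED;
    return OUTCOME_UNKNOWN;
}
```

In a real program the first argument would be the value returned by WaitForMultipleObjects() and count would be the cObjects argument passed to it.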

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else        /* mutex was abandoned */
            break;  /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
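The POSIX side of this comparison can be tried out directly. The following C sketch protects a shared counter with pthread_mutex_lock() and polls the lock with pthread_mutex_trylock(); the helper names add_many, run_two_workers and lock_is_free are hypothetical, and the statically initialized default mutex is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all worker threads */

/* Worker: adds 1 to the shared counter `iterations` times. */
static void *add_many(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks until the lock is free */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
int run_two_workers(int iterations)
{
    pthread_t a, b;
    counter = 0;
    int rc = pthread_create(&a, NULL, add_many, &iterations);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, add_many, &iterations);
    assert(rc == 0);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}

/* Polls the lock once without blocking: returns 1 if it was free
 * (and releases it again), 0 if it was already held. */
int lock_is_free(pthread_mutex_t *m)
{
    int rc = pthread_mutex_trylock(m);
    if (rc == 0) {
        pthread_mutex_unlock(m);
        return 1;
    }
    assert(rc == EBUSY);
    return 0;
}
```

Because every increment happens under the lock, run_two_workers(n) ends at 2 * n regardless of how the two threads interleave.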

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each wait function decreases semaphore count by 1
- ReleaseSemaphore() may increment count by any value
- ReleaseSemaphore() returns old semaphore count
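The counting rules above can be modelled without any operating system support. The C sketch below is a single-threaded model, not a usable semaphore: struct sem_model and its functions are hypothetical, and the rule that a release exceeding the maximum fails and leaves the count unchanged is taken from the documented behaviour of ReleaseSemaphore().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a counting semaphore. */
struct sem_model {
    long count;  /* current count; the object is signaled while count > 0 */
    long max;    /* maximum count fixed at creation                       */
};

bool sem_is_signaled(const struct sem_model *s)
{
    assert(s != NULL);
    return s->count > 0;
}

/* Models a satisfied wait: takes one unit if one is available.
 * Returns false where a real wait would block. */
bool sem_try_wait(struct sem_model *s)
{
    assert(s != NULL);
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Models a release: adds release_count and reports the old count.
 * Fails, leaving the count unchanged, if the maximum would be exceeded. */
bool sem_release(struct sem_model *s, long release_count, long *previous)
{
    assert(s != NULL);
    if (release_count <= 0 || release_count > s->max - s->count)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += release_count;
    return true;
}
```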

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()
- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
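Although pthreads has no direct counterpart, a manual-reset event can be approximated by combining a mutex, a condition variable and a boolean flag. The following C sketch shows one common way to do this; struct mr_event and the mr_event_* functions are hypothetical names, and the sketch does not attempt to reproduce PulseEvent() or named, cross-process events.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical manual-reset event built from pthread primitives. */
struct mr_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
};

/* Returns 0 on success or a pthread error number. */
int mr_event_init(struct mr_event *e, bool initially_signaled)
{
    assert(e != NULL);
    e->signaled = initially_signaled;
    int rc = pthread_mutex_init(&e->lock, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&e->cond, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&e->lock);
    return rc;
}

/* Like SetEvent() on a manual-reset event: the flag stays set
 * and every current waiter is released. */
void mr_event_set(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void mr_event_wait(struct mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void mr_event_destroy(struct mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event "manual-reset": a condition variable alone keeps no state, so a thread that starts waiting after the broadcast would otherwise miss it.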

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
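The same one-way, byte-oriented behaviour can be observed with the POSIX pipe() call, used here as an analogue because it runs without the Windows API. In this C sketch pipe_roundtrip is a hypothetical helper, and the 512-byte limit is an assumption chosen to stay well below the pipe capacity so that the write cannot block.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg through a fresh pipe and reads it back into out as a
 * NUL-terminated string. Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];   /* fds[0] is the read end, fds[1] is the write end */
    assert(msg != NULL && out != NULL && out_size > 0);

    size_t len = strlen(msg);
    if (len > 512 || len >= out_size)
        return -1;            /* keep the write small enough not to block */
    if (pipe(fds) != 0)
        return -1;

    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);            /* reader sees end-of-file after the data */

    ssize_t got = -1;
    if (written == (ssize_t)len) {
        got = read(fds[0], out, out_size - 1);
        if (got >= 0)
            out[got] = '\0';
    }
    close(fds[0]);
    return got;
}
```

Data flows only from the write end to the read end; a two-way exchange would need a second pipe.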

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE, /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple, independent instances of a named pipe

- Several clients can communicate with a single server through the same pipe name, each over its own instance
- Server can respond to client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on a remote machine (. - local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

- Closing handle to last instance deletes named pipe

Polling a pipe

- Nondestructive - is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle sequence

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection
- Returns false if client connected between the CreateNamedPipe call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

App client 0 ... App client n (mailslot servers):

    hMS = CreateMailslot( "\\.\mailslot\status" ... );
    ReadFile( hMS, &ServStat ... );
    /* connect to server */

App server (mailslot client):

    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\*\mailslot\status" ... );
        ...
        WriteFile( hMS, &StatInfo ... );
    }

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

- Read operation will wait for so many msec
- 0 - immediate return
- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with one of the following names:

- \\.\mailslot\[path]name - retrieves a handle for a local mailslot
- \\host\mailslot\[path]name - retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name - returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns a handle representing mailslots on machines in the system's primary domain (maximum message length: 400 bytes)
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 53: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

5353

SMP Scalability Improvements

XP2003

ndash Minimized lock contention for hot locks PFN or Page Frame Database lock

ndash Some locks completely eliminatedCharging nonpagedpaged pool quotas allocating and mapping system page table entries charging commitment of pages allocatingmapping physical memory through AWE functions

ndash New more efficient locking mechanism (pushlocks)Doesnrsquot use spinlocks when no contentionUsed for object manager and address windowing extensions (AWE) related locks

5454

Waiting

Flexible wait calls

ndash Wait for one or multiple objects in one call

ndash Wait for multiple can wait for ldquoanyrdquo one or ldquoallrdquo at onceldquoAllrdquo all objects must be in the signalled state concurrently to resolve the wait

ndash All wait calls include optional timeout argument

ndash Waiting threads consume no CPU time

5555

Waiting

Waitable objects include

ndash Events (may be auto-reset or manual reset may be set or ldquopulsedrdquo)

ndash Mutexes (ldquomutual exclusionrdquo one-at-a-time)

ndash Semaphores (n-at-a-time)

ndash Timers

ndash Processes and Threads (signalled upon exit or terminate)

ndash Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

ndash If multiple threads are waiting for an object and only one thread is released (eg itrsquos a mutex or auto-reset event) which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction bet Synchronization amp Dispatching User mode thread waits on an event objectlsquos handle

Kernel changes threadlsquos scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads variable priority threads get priority boost

5959

Interaction bet Synchronization amp Dispatching Dispatcher re-schedules new thread ndash it may preempt running thread it it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Dispatcher object

System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

Owning thread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (kernel mode)

nonsignaled signaled

Owning thread or otherthread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (exported to user mode)

nonsignaled signaled

One thread releases thesemaphore freeing a resource

A thread acquires the semaphoreMore resources are not available

Kernel resumes one or more waiting threads

Semaphore

6161

A thread reinitializesthe thread object

What signals an object (contd)

Dispatcher object System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

A thread sets the event

Kernel resumes one or more threads

Kernel resumes one or more waiting threads

Event

nonsignaled signaled

Dedicated thread sets oneevent in the event pair

Kernel resumes theother dedicated thread

Kernel resumes waitingdedicated thread

Event pair

nonsignaled signaled

Timer expires

A thread (re) initializes the timer

Kernel resumes all waiting threads

Timer

nonsignaled signaled

Thread terminates

Kernel resumes all waiting threads

Thread

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object: a header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a “dispatcher object”
– some exist exclusively for synchronization, e.g. events, mutexes (“mutants”), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called “control objects”

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– “signaled” vs. “nonsignaled”
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread’s reference to something it’s waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for “any” or “all”

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (header with Size, Type, State, Wait list head, plus object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type, and a Next link]
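To make the roles of Type and Key concrete, here is a small model of how a chain of wait blocks could be checked. It is a sketch under simplifying assumptions (a bare signaled flag per object, no locking, no wait list on the object side); all type and function names are hypothetical and do not mirror the kernel's real structures.

```c
#include <stdbool.h>
#include <stddef.h>

enum wait_type { WAIT_ANY, WAIT_ALL };   /* hypothetical names */

struct dispatcher_object { bool signaled; };

struct wait_block {
    struct dispatcher_object *object; /* what is being waited on */
    enum wait_type type;              /* "any" or "all" for this wait call */
    int key;                          /* position in the handle array */
    struct wait_block *next;          /* next block of the same wait call */
};

/* Returns the key of the object that satisfies a wait-any, 0 for a
 * satisfied wait-all, or -1 if the thread must keep waiting. */
int wait_satisfied(const struct wait_block *list)
{
    if (list == NULL)
        return -1;
    if (list->type == WAIT_ANY) {
        for (const struct wait_block *b = list; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const struct wait_block *b = list; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;        /* "all": every object must be signaled */
    return 0;
}
```

For a wait-any the key identifies which handle resolved the wait; for a wait-all the chain is only satisfied once every object is signaled at the same time.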

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* … main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* runs on every exit path, so the section is left exactly once */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies the kernel object
– dwTimeout specifies the wait time in msec
  dwTimeout == 0 - no wait, check whether the object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles - pointer to array identifying these objects
– bWaitAll - whether to wait for the first signaled object or for all objects
  When waiting for any object, the function returns the index of the first signaled object (as an offset from WAIT_OBJECT_0)

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions
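The return value of the wait functions encodes both the outcome and, for a wait-any, the array index. The sketch below decodes it in portable C; the numeric values are the ones defined in the Windows SDK headers, copied under a DEMO_ prefix so the code compiles anywhere, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Local copies of the Windows SDK values (WAIT_OBJECT_0,
 * WAIT_ABANDONED_0, WAIT_TIMEOUT). */
#define DEMO_WAIT_OBJECT_0    0x00000000u
#define DEMO_WAIT_ABANDONED_0 0x00000080u
#define DEMO_WAIT_TIMEOUT     0x00000102u

enum wait_outcome { OUT_SIGNALED, OUT_ABANDONED, OUT_TIMEOUT, OUT_FAILED };

/* Classifies a wait return code for a call that passed `count` handles.
 * On OUT_SIGNALED or OUT_ABANDONED, *index is the array position. */
enum wait_outcome decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    *index = 0;
    if (ret < DEMO_WAIT_OBJECT_0 + count) {
        *index = ret - DEMO_WAIT_OBJECT_0;
        return OUT_SIGNALED;
    }
    if (ret >= DEMO_WAIT_ABANDONED_0 && ret < DEMO_WAIT_ABANDONED_0 + count) {
        *index = ret - DEMO_WAIT_ABANDONED_0;   /* an abandoned mutex */
        return OUT_ABANDONED;
    }
    if (ret == DEMO_WAIT_TIMEOUT)
        return OUT_TIMEOUT;
    return OUT_FAILED;   /* WAIT_FAILED (0xFFFFFFFF) or anything unexpected */
}
```

Because cObjects is limited to 64, the signaled range and the abandoned range (starting at 0x80) can never overlap.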


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
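For comparison with the Windows mutex example, the same shared-counter loop can be sketched with pthreads. This is a minimal illustration, assuming a statically initialized mutex and a fixed cap of 16 threads; the helper names run_counter and try_once are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;                 /* shared by all threads */

struct worker_arg { long iterations; };

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        pthread_mutex_lock(&lock);   /* blocks, like an INFINITE wait */
        counter += 1;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Starts up to 16 workers and returns the final counter value. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tid[16];
    struct worker_arg arg = { iterations };
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, &arg) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Non-blocking attempt: returns 1 if the lock was free, 0 if busy. */
int try_once(void)
{
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;
    pthread_mutex_unlock(&lock);
    return 1;
}
```

With the lock held around every increment, run_counter(4, 10000) ends at 40000; without it, updates could be lost.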

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases the semaphore count by 1
– ReleaseSemaphore() may increment the count by any value (up to the maximum)
– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
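The counting rules can be modeled without any threads. The sketch below is a single-threaded model of the count arithmetic only, not the Windows implementation; the type and function names are hypothetical, and the refusal to exceed the maximum follows the documented behavior of ReleaseSemaphore().

```c
#include <stdbool.h>
#include <stddef.h>

struct sem_model { long count; long max; };   /* hypothetical type */

/* Signaled means a wait would succeed without blocking. */
bool sem_signaled(const struct sem_model *s)
{
    return s->count > 0;
}

/* Models a zero-timeout wait: takes one unit if one is available. */
bool sem_try_wait(struct sem_model *s)
{
    if (s->count <= 0)
        return false;
    s->count -= 1;
    return true;
}

/* Models ReleaseSemaphore(): adds n, reports the previous count, and
 * refuses (changing nothing) if the maximum would be exceeded. */
bool sem_release(struct sem_model *s, long n, long *previous)
{
    if (n <= 0 || n > s->max - s->count)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

A semaphore created with an initial count of 0 and a maximum of 3 would, in this model, accept a release of 2, then hand out two units before becoming nonsignaled again.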

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously; it must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE - create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread’s condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
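Since pthreads has no direct counterpart, a manual-reset event is usually emulated with a mutex, a condition variable, and a flag. The following is one possible sketch of that idea, not a standard API; the struct and the event_* function names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct manual_event {                 /* hypothetical type */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;         /* the state a condvar alone lacks */
};

void event_init(struct manual_event *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond); /* wake every waiting thread */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)              /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(struct manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event stay signaled for late arrivals: a thread calling event_wait() after event_set() returns at once, whereas a bare pthread_cond_broadcast() would be lost to it.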

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }

    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }

    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server over the same named pipe, each through its own instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (“.” denotes the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status Functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile, ReadFile sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server’s ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it’s easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client’s response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
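To illustrate the UNIX side of the comparison, the sketch below pushes one message through a FIFO from a child process to its parent. It is a minimal example under stated assumptions: the helper name fifo_roundtrip and the path passed to it are hypothetical, error handling is reduced to return codes, and the function removes any existing file at that path before creating the FIFO.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sends msg through a FIFO at `path` from a child process to the parent.
 * Returns the number of bytes received, or -1 on error. */
long fifo_roundtrip(const char *path, const char *msg, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;
    unlink(path);                          /* ignore "does not exist" */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                        /* child: the writer */
        int wfd = open(path, O_WRONLY);    /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w < 0 ? 1 : 0);
    }

    long total = -1;
    int rfd = open(path, O_RDONLY);        /* blocks until the writer opens */
    if (rfd >= 0) {
        ssize_t n;
        total = 0;
        /* Byte-oriented: keep reading until EOF or the buffer is full. */
        while ((size_t)total < outlen - 1 &&
               (n = read(rfd, out + total, outlen - 1 - (size_t)total)) > 0)
            total += n;
        out[total] = '\0';
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return total;
}
```

The data flows in one direction only; a reply would need a second FIFO, which is the half-duplex limitation noted above.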

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", Response.Record);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while (fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL)
            WriteFile (hNamedPipe, &Response.Record,
                (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many comm.)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client’s message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the application clients (App client 0 … App client n) act as mailslot servers: each calls hMS = CreateMailslot() on the “status” mailslot, reads the server status with ReadFile(hMS, &ServStat), and then connects to the server. The application server acts as the mailslot client: in a loop it sleeps, opens the “status” mailslot with CreateFile(), and sends its status with WriteFile(hMS, &StatInfo …). The message is sent periodically.]

Creating a mailslot

    HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system’s primary domain; max. message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

Page 54: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

Waiting

Flexible wait calls
– Wait for one or multiple objects in one call
– Wait for multiple can wait for “any” one or “all” at once
  “All”: all objects must be in the signaled state concurrently to resolve the wait
– All wait calls include optional timeout argument
– Waiting threads consume no CPU time

Waiting

Waitable objects include
– Events (may be auto-reset or manual-reset; may be set or “pulsed”)
– Mutexes (“mutual exclusion”, one-at-a-time)
– Semaphores (n-at-a-time)
– Timers
– Processes and Threads (signaled upon exit or terminate)
– Directories (change notification)

Waiting

No guaranteed ordering of wait resolution
– If multiple threads are waiting for an object and only one thread is released (e.g. it’s a mutex or auto-reset event), which thread gets released is unpredictable

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

[Figure: interaction with thread scheduling. Thread states: Initialized (create and initialize thread object), Ready, Standby, Running, Waiting (thread waits on an object handle), Transition, Terminated. Setting the object to the signaled state completes the wait.]

Interaction bet. Synchronization & Dispatching

User mode thread waits on an event object’s handle

Kernel changes thread’s scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Interaction bet. Synchronization & Dispatching (contd)

Dispatcher re-schedules new thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object?

For each dispatcher object: the system events and resulting state change, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled → signaled: owning thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled → signaled: owning thread or other thread releases mutex
– signaled → nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Semaphore
– nonsignaled → signaled: one thread releases the semaphore, freeing a resource
– signaled → nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect on waiting threads: kernel resumes one or more waiting threads

What signals an object? (contd)

Event
– nonsignaled → signaled: a thread sets the event
– signaled → nonsignaled: kernel resumes one or more threads
– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
– nonsignaled → signaled: dedicated thread sets one event in the event pair
– signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
– nonsignaled → signaled: timer expires
– signaled → nonsignaled: a thread (re)initializes the timer
– Effect on waiting threads: kernel resumes all waiting threads

Thread
– nonsignaled → signaled: thread terminates
– signaled → nonsignaled: a thread reinitializes the thread object
– Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or “dispatch level”, also “DPC level”) when all higher-IRQL work (interrupts) is completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: a DPC queue: the queue head linked to a chain of DPC objects]

Delivering a DPC

DPC routines can call kernel functions but can’t call system services, generate page faults, or create or wait on objects

DPC routines can’t assume what process address space is currently mapped

[Figure: interrupt dispatch table ordered from High, Power failure, …, Dispatch/DPC, APC, down to Low; a DPC queue feeds the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads and requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue
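The four delivery steps can be mimicked with a tiny single-CPU model: routines are queued while the IRQL is high and run only when it is about to drop below dispatch level. This is a teaching sketch, not kernel code; the struct layouts and the names queue_dpc, lower_irql, and count_dpc are hypothetical.

```c
#include <stddef.h>

#define DISPATCH_LEVEL 2

typedef void (*dpc_routine)(void *context);

struct dpc {                    /* hypothetical stand-in for a DPC object */
    dpc_routine routine;
    void *context;
    struct dpc *next;
};

struct cpu {                    /* one DPC queue per CPU, as on the slides */
    int irql;
    struct dpc *head, *tail;
};

/* Step 1: append a DPC to this CPU's queue (FIFO order). */
void queue_dpc(struct cpu *c, struct dpc *d)
{
    d->next = NULL;
    if (c->tail != NULL)
        c->tail->next = d;
    else
        c->head = d;
    c->tail = d;
}

/* Steps 2-4: when the IRQL is about to drop below dispatch level, run
 * every queued routine at dispatch level first, then lower the IRQL. */
void lower_irql(struct cpu *c, int new_irql)
{
    if (new_irql < DISPATCH_LEVEL && c->head != NULL) {
        c->irql = DISPATCH_LEVEL;
        while (c->head != NULL) {
            struct dpc *d = c->head;
            c->head = d->next;
            if (c->head == NULL)
                c->tail = NULL;
            d->routine(d->context);
        }
    }
    c->irql = new_irql;
}

/* Example routine: counts how often it ran. */
void count_dpc(void *context)
{
    ++*(int *)context;
}
```

Lowering the IRQL from a device level to exactly dispatch level leaves the queue untouched; only the drop below dispatch level drains it, which matches step 2.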

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from “arbitrary thread context”
– Kernel-mode interface is linkable, but not documented

“Ordinary” kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in “alertable wait”

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

9595

Pipe example (cont'd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server under the same pipe name, each over its own instance

– Server can respond to a client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. = local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (cont'd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)
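
The two name forms differ only in the machine component, so a client can build either from one format string. The helper below is a hypothetical illustration (make_pipe_name is not a Windows API function); note that each backslash must be doubled in a C string literal.

```c
#include <stdio.h>

/* Hypothetical helper: formats a named-pipe path into buf.
 * server == NULL selects the local machine (".").
 * Returns 0 on success, -1 if buf is too small. */
int make_pipe_name(char *buf, size_t size,
                   const char *server, const char *name) {
    int n = snprintf(buf, size, "\\\\%s\\pipe\\%s",
                     server ? server : ".", name);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting string would then be passed as the first argument to CreateFile() on the client side.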

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()

– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
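
As a rough illustration of the fixed-size-record approach, the sketch below sends one record through a FIFO and reads it back within a single process. The request_rec layout and the fifo_roundtrip name are assumptions made for the example; a real client and server would be separate processes, each opening one end.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-size request record. */
typedef struct {
    int  client_id;
    char text[28];
} request_rec;

/* Creates a FIFO at path, writes one record, reads it back,
 * and removes the FIFO again. Returns 0 on success, -1 on error. */
int fifo_roundtrip(const char *path, const request_rec *in, request_rec *out) {
    if (mkfifo(path, 0600) == -1)
        return -1;

    /* O_NONBLOCK lets the read end open without waiting for a writer. */
    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    /* A reader now exists, so the blocking open for writing returns at once. */
    int wfd = (rfd == -1) ? -1 : open(path, O_WRONLY);

    int rc = -1;
    if (wfd != -1 &&
        write(wfd, in, sizeof *in) == (ssize_t)sizeof *in &&
        read(rfd, out, sizeof *out) == (ssize_t)sizeof *out)
        rc = 0;

    if (wfd != -1) close(wfd);
    if (rfd != -1) close(rfd);
    unlink(path);
    return rc;
}
```

Because every record has the same size, the reader always knows how many bytes make up one request, which is what the byte-oriented FIFO itself cannot tell it.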

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");

    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

Mailslot Servers (App client 0 … App client n), each running:

hMS = CreateMailslot( "\\.\mailslot\status" ); ReadFile(hMS, &ServStat); /* connect to server */

Mailslot Client (App Server):

while (…) { Sleep(…); hMS = CreateFile( "\\*\mailslot\status" ); … WriteFile(hMS, &StatInfo); }

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name - retrieve handle for local mailslot

– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


5555

Waiting

Waitable objects include

– Events (may be auto-reset or manual-reset; may be set or "pulsed")

– Mutexes ("mutual exclusion", one-at-a-time)

– Semaphores (n-at-a-time)

– Timers

– Processes and Threads (signaled upon exit or terminate)

– Directories (change notification)

5656

Waiting

No guaranteed ordering of wait resolution

– If multiple threads are waiting for an object and only one thread is released (e.g. it's a mutex or auto-reset event), which thread gets released is unpredictable

5757

Executive Synchronization

Waiting on Dispatcher Objects – outside the kernel

Interaction with thread scheduling

[Figure: thread state diagram with the states Initialized, Ready, Standby, Running, Waiting, Transition and Terminated. "Create and initialize thread object" leads to Initialized; "Thread waits on an object handle" leads to Waiting; "Set object to signaled state" completes the wait.]

5858

Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

5959

Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

– nonsignaled → signaled: owning thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)

– nonsignaled → signaled: owning thread or other thread releases mutex

– signaled → nonsignaled: resumed thread acquires mutex

– Effect on waiting threads: kernel resumes one waiting thread

Semaphore

– nonsignaled → signaled: one thread releases the semaphore, freeing a resource

– signaled → nonsignaled: a thread acquires the semaphore and no more resources are available

– Effect on waiting threads: kernel resumes one or more waiting threads

6161

What signals an object (cont'd)

Event

– nonsignaled → signaled: a thread sets the event

– signaled → nonsignaled: kernel resumes one or more threads

– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair

– nonsignaled → signaled: dedicated thread sets one event in the event pair

– signaled → nonsignaled: kernel resumes the other dedicated thread

– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer

– nonsignaled → signaled: timer expires

– signaled → nonsignaled: a thread (re)initializes the timer

– Effect on waiting threads: kernel resumes all waiting threads

Thread

– nonsignaled → signaled: thread terminates

– signaled → nonsignaled: a thread reinitializes the thread object

– Effect on waiting threads: kernel resumes all waiting threads

6262

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

6363

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue, a queue head followed by a chain of DPC objects]

6666

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low, next to the DPC queue holding the pending DPC objects]
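
The delivery steps can be mimicked with a toy model in plain C. This is only a sketch to make the ordering concrete: cpu_state, queue_dpc and lower_irql are hypothetical names, the queue is a fixed-size array, and dispatch level is assumed to be IRQL 2.

```c
#include <stddef.h>

#define DISPATCH_LEVEL 2   /* assumed numeric value of dispatch/DPC level */
#define MAX_DPCS 8

typedef void (*dpc_routine)(void *context);

typedef struct {
    dpc_routine routine;
    void *context;
} dpc_object;

/* Toy per-CPU state: current IRQL plus a FIFO queue of pending DPCs. */
typedef struct {
    int irql;
    dpc_object queue[MAX_DPCS];
    size_t head, count;
} cpu_state;

/* Step 1: queue a DPC. Returns 0 on success, -1 if the queue is full. */
int queue_dpc(cpu_state *cpu, dpc_routine routine, void *context) {
    if (cpu->count == MAX_DPCS)
        return -1;
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 0;
}

/* Steps 2-4: when the IRQL drops below dispatch level, every queued
 * DPC routine runs at dispatch level before the lower IRQL takes
 * effect. Returns the number of DPC routines executed. */
int lower_irql(cpu_state *cpu, int new_irql) {
    int ran = 0;
    if (new_irql < DISPATCH_LEVEL) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->count > 0) {
            dpc_object d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            d.routine(d.context);
            ran++;
        }
    }
    cpu->irql = new_irql;
    return ran;
}

/* Example DPC routine: counts how often it was called. */
void count_call(void *context) {
    ++*(int *)context;
}
```

Queuing two DPCs at a device IRQL and then lowering the IRQL only to dispatch level runs nothing; both routines run once the IRQL is lowered further.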

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (environment subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"
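
The deliverability rules for the three APC kinds can be summarised as one predicate. The sketch below only restates the rules from these slides; the type and function names are hypothetical and do not correspond to kernel structures.

```c
#include <stdbool.h>

typedef enum {
    APC_SPECIAL_KERNEL,
    APC_ORDINARY_KERNEL,
    APC_USER
} apc_kind;

/* Hypothetical snapshot of the thread state relevant to APC delivery. */
typedef struct {
    int  irql;                  /* 0 = passive level, 1 = APC level */
    bool kernel_apcs_disabled;  /* e.g. after KeEnterCriticalRegion */
    bool alertable_wait;        /* e.g. inside SleepEx(..., TRUE) */
} thread_apc_state;

/* Returns true if an APC of the given kind could be delivered now. */
bool apc_deliverable(apc_kind kind, const thread_apc_state *t) {
    switch (kind) {
    case APC_SPECIAL_KERNEL:
        return t->irql < 1;
    case APC_ORDINARY_KERNEL:
        return t->irql == 0 && !t->kernel_apcs_disabled;
    case APC_USER:
        return t->alertable_wait;
    }
    return false;
}
```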

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
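
The accounting rules amount to a small decision function evaluated on each clock tick. The following sketch restates them in C; the enum, the function name and the user_mode parameter (used to pick between user and kernel time below IRQL 2) are assumptions for illustration.

```c
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decides where the clock ISR charges the elapsed interval. */
charge_target clock_tick_charge(int irql, bool processing_dpc, bool user_mode) {
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return processing_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```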

7373

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted for

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column

7676

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

7777

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout, a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, and dispatcher objects (Size, Type, State, Wait list head, object-type-specific data), linked through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type and a Next link]

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );

VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );

VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );

BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection ( &crit );

/* … main loop in any of the threads */
while (!done) {
    __try {
        EnterCriticalSection ( &crit );
        counter += local_value;
    }
    /* leave exactly once, even if the guarded code raises an exception */
    __finally { LeaveCriticalSection ( &crit ); }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0: no wait, check whether object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects

– Function returns index of first signaled object (as WAIT_OBJECT_0 + index)

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
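
The difference between waiting for any object and waiting for all objects can be illustrated with a small, non-blocking model over an array of signaled flags. This is not the Windows implementation; check_wait and WAIT_NOT_SATISFIED are hypothetical names, and returning 0 for a satisfied wait-all is a simplification chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define WAIT_NOT_SATISFIED (-1)

/* Models one check of a multi-object wait.
 * wait_all == false: returns the lowest index of a signaled object.
 * wait_all == true:  returns 0 once every object is signaled.
 * Returns WAIT_NOT_SATISFIED if the wait cannot complete yet. */
int check_wait(const bool *signaled, size_t count, bool wait_all) {
    size_t first = count;       /* index of the first signaled object */
    size_t n_signaled = 0;

    for (size_t i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == count)
                first = i;
            n_signaled++;
        }
    }
    if (count == 0)
        return WAIT_NOT_SATISFIED;
    if (wait_all)
        return n_signaled == count ? 0 : WAIT_NOT_SATISFIED;
    return first < count ? (int)first : WAIT_NOT_SATISFIED;
}
```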

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;

while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
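
A short sketch of the two acquisition styles, using the shared-counter idea from the mutex example. The function names are made up for the illustration; the pthread calls are the standard ones listed above.

```c
#include <errno.h>
#include <pthread.h>

pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

/* Blocking variant: waits until the mutex is available. */
void add_blocking(int value) {
    pthread_mutex_lock(&counter_lock);
    counter += value;
    pthread_mutex_unlock(&counter_lock);
}

/* Polling variant: returns 0 if the counter was updated, or the
 * error code from pthread_mutex_trylock() (EBUSY if already locked). */
int add_if_free(int value) {
    int rc = pthread_mutex_trylock(&counter_lock);
    if (rc != 0)
        return rc;
    counter += value;
    pthread_mutex_unlock(&counter_lock);
    return 0;
}

int get_counter(void) {
    pthread_mutex_lock(&counter_lock);
    int value = counter;
    pthread_mutex_unlock(&counter_lock);
    return value;
}
```

Unlike a Windows mutex, a default pthread mutex is not recursive, so a second acquisition attempt by the owning thread does not succeed.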

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
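
The counting rules can be captured in a small single-threaded model. It is a sketch of the bookkeeping only (no blocking, no kernel object); sem_model and its functions are hypothetical names, and rejecting a release that would exceed the maximum follows the documented behaviour of ReleaseSemaphore().

```c
#include <stdbool.h>

/* Hypothetical model of a semaphore's count and maximum. */
typedef struct {
    long count;
    long max;
} sem_model;

bool sem_model_init(sem_model *s, long initial, long max) {
    if (max <= 0 || initial < 0 || initial > max)
        return false;
    s->count = initial;
    s->max = max;
    return true;
}

/* Signaled means count > 0. */
bool sem_model_is_signaled(const sem_model *s) {
    return s->count > 0;
}

/* Models a wait with timeout 0: succeeds and decrements the count
 * by 1 only if the semaphore is signaled. */
bool sem_model_try_wait(sem_model *s) {
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Models a release: adds release_count and reports the old count.
 * Fails without changing the count if the maximum would be exceeded. */
bool sem_model_release(sem_model *s, long release_count, long *previous) {
    if (release_count <= 0 || release_count > s->max - s->count)
        return false;
    if (previous)
        *previous = s->count;
    s->count += release_count;
    return true;
}
```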

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );



5656

Waiting

No guaranteed ordering of wait resolution

ndash If multiple threads are waiting for an object and only one thread is released (eg itrsquos a mutex or auto-reset event) which thread gets released is unpredictable

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

Interaction between Synchronization & Dispatching

User-mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable-priority threads get priority boost

Dispatcher re-schedules new thread – it may preempt a running thread if that thread has lower priority, and issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– To signaled: owning thread releases mutex
– To nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Mutex (exported to user mode)
– To signaled: owning thread or other thread releases mutex
– To nonsignaled: resumed thread acquires mutex
– Effect on waiting threads: kernel resumes one waiting thread

Semaphore
– To signaled: one thread releases the semaphore, freeing a resource
– To nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect on waiting threads: kernel resumes one or more waiting threads

Event
– To signaled: a thread sets the event
– To nonsignaled: kernel resumes one or more threads
– Effect on waiting threads: kernel resumes one or more waiting threads

Event pair
– To signaled: dedicated thread sets one event in the event pair
– To nonsignaled: kernel resumes the other dedicated thread
– Effect on waiting threads: kernel resumes waiting dedicated thread

Timer
– To signaled: timer expires
– To nonsignaled: a thread (re)initializes the timer
– Effect on waiting threads: kernel resumes all waiting threads

Thread
– To signaled: thread terminates
– To nonsignaled: a thread reinitializes the thread object
– Effect on waiting threads: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

– Trap Dispatching (pp. 85 ff.)

– Synchronization (pp. 149 ff.)

– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

– Deferred and Asynchronous Procedure Calls

– IRQLs and CPU Time Accounting

– Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue. A queue head linked to a chain of DPC objects.]

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

[Figure: interrupt dispatch table listing IRQLs from High through Power failure, Dispatch/DPC and APC down to Low, next to the DPC queue and the dispatcher.]
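The four delivery steps can be mimicked in a few lines of portable C. This is a toy model for illustration only, not kernel code: the names `DpcObject`, `Cpu`, `queue_dpc`, `lower_irql` and `count_dpc`, the fixed queue capacity, and the plain integer IRQL are all assumptions introduced here.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2   /* IRQL at which DPC routines run */
#define MAX_DPCS 16        /* arbitrary capacity for this toy model */

typedef void (*DpcRoutine)(void *context);

typedef struct {
    DpcRoutine routine;
    void *context;
} DpcObject;

/* Hypothetical per-CPU state: current IRQL plus a FIFO of pending DPCs. */
typedef struct {
    int irql;
    DpcObject queue[MAX_DPCS];
    size_t head, count;
} Cpu;

/* Step 1: code running at high IRQL (e.g. an ISR) queues a DPC.
   Returns 1 on success, 0 if the model's queue is full. */
int queue_dpc(Cpu *cpu, DpcRoutine routine, void *context) {
    assert(routine != NULL);
    if (cpu->count == MAX_DPCS) return 0;
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 1;
}

/* Steps 2-4: when the IRQL is about to drop below dispatch level,
   drain the queue at dispatch level first. Returns the number of DPCs run. */
int lower_irql(Cpu *cpu, int new_irql) {
    int ran = 0;
    if (new_irql < DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->count > 0) {
            DpcObject d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            d.routine(d.context);
            ran++;
        }
    }
    cpu->irql = new_irql;
    return ran;
}

/* Example DPC routine: counts how often it was invoked. */
void count_dpc(void *context) { ++*(int *)context; }
```

In this model, queued DPCs stay pending while the IRQL remains at or above dispatch level and all run, in FIFO order, as soon as the IRQL is lowered below it.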

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

[Figure: a thread object with two queues of APC objects, one for kernel mode (K) and one for user mode (U).]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 & not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
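The four charging rules can be written as a small decision function. This is a minimal sketch of the rules as stated on the slide, not the actual clock ISR; `ChargeBucket`, `charge_tick` and its parameters are hypothetical names chosen for this example.

```c
#include <assert.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} ChargeBucket;

/* Decide which counter one clock tick is charged to.
   irql:          IRQL that was interrupted by the clock tick
   in_dpc:        nonzero if a DPC routine was being processed
   was_user_mode: nonzero if the thread was executing in user mode */
ChargeBucket charge_tick(int irql, int in_dpc, int was_user_mode) {
    assert(irql >= 0);
    if (irql > 2) return CHARGE_INTERRUPT;
    if (irql == 2) return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return was_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, `charge_tick(2, 1, 0)` selects the DPC counter, while `charge_tick(2, 0, 0)` charges the interrupted thread's kernel time.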

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, a system that is busy while no process appears to be running must be busy with interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column
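The quirk can be reproduced with a small simulation that compares the time a thread really ran with the time a tick-sampling accountant would charge. This is an illustrative sketch under simplifying assumptions (one CPU, a 10 msec tick, a whole tick charged to whichever thread is running at the instant the clock fires); `RunSlice`, `actual_ms` and `charged_ms` are names invented for this example.

```c
#include <assert.h>

#define TICK_MS 10   /* clock interval assumed by this model */

typedef struct {
    int thread_id;   /* which thread ran during the slice */
    int start_ms;    /* inclusive */
    int end_ms;      /* exclusive */
} RunSlice;

/* Sum of the time the given thread actually ran. */
int actual_ms(const RunSlice *s, int n, int thread_id) {
    int total = 0;
    for (int i = 0; i < n; i++)
        if (s[i].thread_id == thread_id)
            total += s[i].end_ms - s[i].start_ms;
    return total;
}

/* Time charged by tick sampling: each tick at t = 10, 20, ... up to
   horizon_ms charges one full interval to the thread running at t. */
int charged_ms(const RunSlice *s, int n, int thread_id, int horizon_ms) {
    assert(horizon_ms >= 0);
    int total = 0;
    for (int t = TICK_MS; t <= horizon_ms; t += TICK_MS)
        for (int i = 0; i < n; i++)
            if (s[i].thread_id == thread_id &&
                s[i].start_ms <= t && t < s[i].end_ms)
                total += TICK_MS;
    return total;
}
```

With the slices `{1, 1, 9}`, `{2, 9, 12}`, `{1, 12, 19}` and a 20 msec horizon, thread 1 runs for 15 msec but is never running when a tick fires, so it is charged nothing, while thread 2 runs for 3 msec and is charged a full 10 msec interval.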

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

[Figure: layout of a dispatcher object. Common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h.]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, pointing to chains of wait blocks (fields: List entry, Object, Thread, Key, Type, Next link). Each wait block is also linked into the wait list head of the dispatcher object it refers to (Size, Type, State, Wait list head, object-type-specific data).]
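The relationship between wait blocks, the wait type and the key can be sketched with a simplified data structure. The layout below is an assumption made for illustration and does not reproduce the real definitions in ntddk.h; `DispatcherHeader`, `WaitBlock`, `WaitType` and `wait_satisfied` are hypothetical names, and the list is assumed to be in argument order.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WAIT_ANY, WAIT_ALL } WaitType;

/* Simplified stand-in for the common dispatcher header. */
typedef struct {
    int signaled;               /* 0 = nonsignaled, nonzero = signaled */
} DispatcherHeader;

/* One wait block per handle passed to the wait call. */
typedef struct WaitBlock {
    DispatcherHeader *object;   /* what is being waited for */
    int key;                    /* position in the argument list */
    WaitType type;              /* same for all blocks of one wait call */
    struct WaitBlock *next;     /* next block of the same wait call */
} WaitBlock;

/* Walks the chain of one wait call.
   WAIT_ANY: returns the key of the first signaled object.
   WAIT_ALL: returns 0 once every object is signaled.
   Returns -1 while the wait is not yet satisfied. */
int wait_satisfied(const WaitBlock *list) {
    assert(list != NULL);
    WaitType type = list->type;
    for (const WaitBlock *b = list; b != NULL; b = b->next) {
        if (type == WAIT_ANY && b->object->signaled) return b->key;
        if (type == WAIT_ALL && !b->object->signaled) return -1;
    }
    return type == WAIT_ALL ? 0 : -1;
}
```

The key is what lets a wait-any call report which object ended the wait: the value returned here plays the role of the index that WaitForMultipleObjects hands back to its caller.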

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section (other than attempting entry with TryEnterCriticalSection)

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even if the body raises */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object

– dwTimeout specifies wait time in msec
   dwTimeout == 0 - no wait, check whether object is signaled
   dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles - pointer to array identifying these objects

– bWaitAll - whether to wait for first signaled object or all objects
   When waiting for the first signaled object, function returns its index (as WAIT_OBJECT_0 + index)

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed when its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else              /* mutex was abandoned */
            break;        /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
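For comparison with the Win32 mutex example, the same shared-counter pattern can be sketched with the pthread calls listed above. This is a minimal example assuming a POSIX system; `worker`, `run_counter`, `try_lock_once` and the 16-thread limit are names and choices made up for this sketch.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all threads */

/* Each worker increments the shared counter *(int *)arg times. */
static void *worker(void *arg) {
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&lock);     /* blocks, like a wait with INFINITE */
        counter++;
        pthread_mutex_unlock(&lock);   /* gives up ownership, like ReleaseMutex */
    }
    return NULL;
}

/* Runs nthreads workers to completion and returns the final counter value. */
int run_counter(int nthreads, int rounds) {
    assert(nthreads > 0 && nthreads <= 16);
    pthread_t tid[16];
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &rounds);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Non-blocking attempt, like a wait with timeout 0.
   Returns 1 if the mutex was acquired, 0 if it is already locked. */
int try_lock_once(void) {
    return pthread_mutex_trylock(&lock) == 0;
}
```

Because every increment happens while the mutex is held, `run_counter(4, 10000)` yields exactly 40000. On older toolchains the program may need to be built with the `-pthread` option.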

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
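The counting rules can be illustrated with a toy bookkeeping model in portable C. It does no blocking and involves no threads; it only mirrors how the count moves. `SemModel`, `model_wait` and `model_release` are hypothetical names, and the refusal to exceed the maximum follows the documented behavior of ReleaseSemaphore().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a counting semaphore's bookkeeping. */
typedef struct {
    long count;   /* current count; "signaled" while count > 0 */
    long max;     /* upper bound fixed at creation */
} SemModel;

/* Non-blocking wait: returns 1 and decrements if signaled, else 0. */
int model_wait(SemModel *s) {
    if (s->count <= 0) return 0;
    s->count--;
    return 1;
}

/* Adds `release` to the count. Fails (returns 0, count unchanged) if the
   amount is not positive or the result would exceed max. On success the
   old count is stored in *previous when previous is non-NULL. */
int model_release(SemModel *s, long release, long *previous) {
    assert(s->max > 0);
    if (release <= 0 || s->count + release > s->max) return 0;
    if (previous != NULL) *previous = s->count;
    s->count += release;
    return 1;
}
```

Starting from a count of 2 with a maximum of 3, two waits succeed and a third fails; releasing 2 then restores the count and reports the previous count of 0.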

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()

– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling
– pthread_cond_signal() - one thread

– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
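Although there is no exact equivalent, a manual-reset event is commonly approximated with a flag guarded by a mutex plus a condition variable. The sketch below is one such approximation, assuming a POSIX system; `ManualEvent` and the `event_*` functions are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical emulation of a manual-reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int signaled;
} ManualEvent;

void event_init(ManualEvent *e, int initially_signaled) {
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

/* Like SetEvent on a manual-reset event: wakes every waiter and
   leaves the event signaled until it is reset. */
void event_set(ManualEvent *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
void event_reset(ManualEvent *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled. The loop re-checks the flag
   because condition waits may wake up spuriously. */
void event_wait(ManualEvent *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Non-blocking check, like a wait with timeout 0. */
int event_is_set(ManualEvent *e) {
    pthread_mutex_lock(&e->lock);
    int s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}
```

The persistent flag is what the bare condition variable lacks: a thread that calls `event_wait()` after `event_set()` still passes straight through, whereas a signal or broadcast on a condition variable is lost if no thread is waiting at that moment.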

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2.]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe; handles are inheritable. */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process. */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process. */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,  /* Inherit handles. */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete. */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server, each over its own instance of the same named pipe

– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
– Call will return as soon as there is a client connection

– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the application server acts as mailslot client and periodically sends a status message; application clients 0..n act as mailslot servers and read it.]

App client 0 ... App client n (mailslot servers)

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client)

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo
    }

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec

– 0 – immediate return

– MAILSLOT_WAIT_FOREVER – infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot

– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

– Client must specify FILE_SHARE_READ flag
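The four name forms differ only in the scope component, which a small helper can make explicit. This is an illustrative sketch in portable C that only builds the string to hand to CreateFile() or CreateMailslot(); `mailslot_name` and its parameters are hypothetical and not part of the Windows API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "\\<scope>\mailslot\<name>" into buf.
   scope: "." for the local machine, a host name, a domain name,
          or "*" for the system's primary domain.
   Returns 1 on success, 0 if buf is too small for the full name. */
int mailslot_name(char *buf, size_t size, const char *scope, const char *name) {
    assert(buf != NULL && scope != NULL && name != NULL);
    int n = snprintf(buf, size, "\\\\%s\\mailslot\\%s", scope, name);
    return n >= 0 && (size_t)n < size;
}
```

Called with scope `"."` and name `"status"` it produces `\\.\mailslot\status`, the only form CreateMailslot() accepts, since mailslots are always created locally; scope `"*"` produces the broadcast form `\\*\mailslot\status` for writers.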

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 57: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

5757

Executive Synchronization

Thread waitson an object

handle

Create and initialize thread object

Initialized

Ready

Transition

Waiting

Running

Terminated

Standby

Wait is completeSet object to

signaled state

Interaction with thread scheduling

Waiting on Dispatcher Objects ndash outside the kernel

5858

Interaction bet Synchronization amp Dispatching User mode thread waits on an event objectlsquos handle

Kernel changes threadlsquos scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads variable priority threads get priority boost

5959

Interaction bet Synchronization amp Dispatching Dispatcher re-schedules new thread ndash it may preempt running thread it it has lower priority and issues software interrupt to initiate context switch

If no processor can be preempted the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

6060

What signals an object

Dispatcher object

System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

Owning thread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (kernel mode)

nonsignaled signaled

Owning thread or otherthread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (exported to user mode)

nonsignaled signaled

One thread releases thesemaphore freeing a resource

A thread acquires the semaphoreMore resources are not available

Kernel resumes one or more waiting threads

Semaphore

6161

A thread reinitializesthe thread object

What signals an object (contd)

Dispatcher object System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

A thread sets the event

Kernel resumes one or more threads

Kernel resumes one or more waiting threads

Event

nonsignaled signaled

Dedicated thread sets oneevent in the event pair

Kernel resumes theother dedicated thread

Kernel resumes waitingdedicated thread

Event pair

nonsignaled signaled

Timer expires

A thread (re) initializes the timer

Kernel resumes all waiting threads

Timer

nonsignaled signaled

Thread terminates

Kernel resumes all waiting threads

Thread

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section other than trying to enter it (TryEnterCriticalSection)

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the guarded code raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

- Processes

- Threads

- Files

- Console input

- File change notifications

- Mutexes

- Events (auto-reset + manual-reset)

- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()

- hObject specifies kernel object

- dwTimeout specifies wait time in msec

  - dwTimeout == 0 - no wait, check whether object is signaled

  - dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

- cObjects <= MAXIMUM_WAIT_OBJECTS (64)

- lpHandles - pointer to array identifying these objects

- bWaitAll - whether to wait for first signaled object or all objects

  - When bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects

- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed when its last handle is closed

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

    HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads; ret is local */
    DWORD ret;

    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

- Synchronization among threads in same process

Five basic functions

- pthread_mutex_init()

- pthread_mutex_destroy()

- pthread_mutex_lock()

- pthread_mutex_unlock()

- pthread_mutex_trylock()

Comparison

- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )

- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
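For comparison with the Windows mutex example, the shared-counter loop can be sketched with the five pthread functions listed on this slide. This is a minimal sketch assuming a pthreads platform; the function names run_counter and lock_was_free and the limit of 16 threads are illustrative choices, not part of the original slides.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;            /* shared by all worker threads */

struct work { int iterations; };

static void *worker(void *arg)
{
    const struct work *w = arg;
    int i;

    for (i = 0; i < w->iterations; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks, like an INFINITE wait */
        counter += 1;
        pthread_mutex_unlock(&counter_lock);  /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Starts `threads` workers (at most 16) and returns the final counter value. */
long run_counter(int threads, int iterations)
{
    pthread_t tid[16];
    struct work w = { iterations };
    int i;

    assert(threads > 0 && threads <= 16);
    counter = 0;
    for (i = 0; i < threads; i++)
        pthread_create(&tid[i], NULL, worker, &w);
    for (i = 0; i < threads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polls the lock once without blocking; returns 1 if it was free. */
int lock_was_free(void)
{
    if (pthread_mutex_trylock(&counter_lock) != 0)
        return 0;
    pthread_mutex_unlock(&counter_lock);
    return 1;
}
```

Because every increment happens while the lock is held, the final value equals threads times iterations regardless of how the workers interleave.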

Semaphores

Semaphore objects are used for resource counting

- A semaphore is signaled when count > 0

Threads/processes use wait functions

- Each successful wait decreases semaphore count by 1

- ReleaseSemaphore() may increment count by any value, up to the semaphore's maximum count

- ReleaseSemaphore() returns old semaphore count through lpPreviousCount
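The counting behavior can be tried out with POSIX semaphores, which follow the same idea of a count that waits decrement and releases increment. The sketch below is an analog, not the Windows API: it assumes a platform with unnamed POSIX semaphores (such as Linux), and semaphore_count_after is a hypothetical helper name.

```c
#include <assert.h>
#include <semaphore.h>

/*
 * Takes `acquire` units from a semaphore created with `initial` units,
 * then releases `release` units, and returns the resulting count.
 * Returns -1 if an acquire would have had to block.
 */
int semaphore_count_after(unsigned initial, int acquire, int release)
{
    sem_t sem;
    int i, value = -1;

    assert(acquire >= 0 && release >= 0);
    if (sem_init(&sem, 0, initial) != 0)
        return -1;

    for (i = 0; i < acquire; i++) {
        /* sem_trywait() fails instead of blocking when the count is 0 */
        if (sem_trywait(&sem) != 0) {
            sem_destroy(&sem);
            return -1;
        }
    }

    /* POSIX releases one unit per call; ReleaseSemaphore() can add several at once. */
    for (i = 0; i < release; i++)
        sem_post(&sem);

    sem_getvalue(&sem, &value);
    sem_destroy(&sem);
    return value;
}
```

One difference worth noting: a POSIX semaphore has no per-object maximum corresponding to cSemMax, so the upper bound has to be enforced by the caller if it matters.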

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

    HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

- Manual-reset event can signal several threads simultaneously; must be reset manually

- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

- Auto-reset event signals a single thread; event is reset automatically

- fInitialState == TRUE - create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

- pthread_cond_init()

- pthread_cond_destroy()

Wait functions

- pthread_cond_wait()

- pthread_cond_timedwait()

Signaling

- pthread_cond_signal() - one thread

- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
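Since pthreads has no exact equivalent, a manual-reset event is usually emulated by pairing a condition variable with a flag protected by a mutex. The following is one possible sketch under that assumption; struct manual_event and the event_* and release_waiters names are hypothetical and belong to neither the Windows nor the POSIX API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical emulation of a manual-reset event. */
struct manual_event {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int signaled;
};

void event_init(struct manual_event *e, int initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void event_set(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;                    /* stays set until event_reset() */
    pthread_cond_broadcast(&e->cond);   /* release all current waiters */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void *waiter(void *arg)
{
    event_wait(arg);
    return NULL;
}

/* Starts `n` waiters (at most 8), sets the event once, returns how many were released. */
int release_waiters(int n)
{
    struct manual_event e;
    pthread_t tid[8];
    int i, released = 0;

    assert(n > 0 && n <= 8);
    event_init(&e, 0);
    for (i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, waiter, &e);
    event_set(&e);
    for (i = 0; i < n; i++)
        if (pthread_join(tid[i], NULL) == 0)
            released++;
    pthread_cond_destroy(&e.cond);
    pthread_mutex_destroy(&e.lock);
    return released;
}
```

The flag is what makes the event "manual-reset": a thread that calls event_wait() after event_set() still passes straight through, which a bare pthread_cond_broadcast() would not guarantee.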

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2; prog1 feeds prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
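The one-way, byte-oriented behavior is the same on UNIX, where pipe() plays the role of CreatePipe(). The sketch below is a POSIX analog rather than Windows code; pipe_roundtrip is a hypothetical helper, and it assumes the message is small enough to fit in the pipe's buffer so that the write does not block.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Sends `msg` through a one-way anonymous pipe and reads it back into `out`.
 * Returns the number of bytes read, or -1 on error.
 */
long pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                 /* fds[0] is the read end, fds[1] the write end */
    size_t len = strlen(msg);
    ssize_t n;

    assert(len < out_size);
    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* closing the write end lets the reader see end of data */

    n = read(fds[0], out, out_size - 1);
    close(fds[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Data can only travel from the write end to the read end; a two-way exchange needs a second pipe, which is the limitation that named pipes remove.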

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf (stderr, "Anon pipe create failed\n");
        exit (1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf (stderr, "CreateProc1 failed\n");
        exit (2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd.)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf (stderr, "CreateProc2 failed\n");
        exit (3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented

- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

- Several clients can communicate with a single server through the same pipe name, each over its own instance

- Server can respond to client using the same instance

Pipe can be accessed over the network

- location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode,
            DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf,
            DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

- Not possible to create a pipe on remote machine (. - local machine)

fdwOpenMode

- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances

- Number of instances

- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

- Closing handle to last instance deletes named pipe

Polling a pipe

- Nondestructive - is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer,
            LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name

- \\.\pipe\[path]pipename

- \\servername\pipe\[path]pipename

- First method gives better performance (local server)

Status Functions

- GetNamedPipeHandleState

- SetNamedPipeHandleState

- GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf,
            LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

- Combines a WriteFile, ReadFile sequence

    BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf,
            LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

- Combines CreateFile, WriteFile, ReadFile, CloseHandle

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

- Call will return as soon as there is a client connection

- Returns FALSE (with GetLastError() == ERROR_PIPE_CONNECTED) if client connected between the CreateNamedPipe() call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

- FIFOs are half-duplex

- FIFOs are limited to a single machine

- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

- Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
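The fixed-size-record advice can be made concrete with a small POSIX sketch. It assumes a UNIX-like system with pthreads; struct request, fifo_roundtrip, and the idea of running the "client" as a thread in the same process are illustrative simplifications, not part of the original slides.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fixed-size record, so the byte-oriented FIFO needs no message framing. */
struct request {
    int client_id;
    int value;
};

struct writer_args {
    const char *path;
    struct request req;
};

static void *client_writer(void *arg)
{
    struct writer_args *w = arg;
    int fd = open(w->path, O_WRONLY);   /* blocks until the reader opens the FIFO */

    if (fd >= 0) {
        /* a write of at most PIPE_BUF bytes is atomic */
        ssize_t n = write(fd, &w->req, sizeof w->req);
        (void)n;                        /* a short write shows up as a failed read below */
        close(fd);
    }
    return NULL;
}

/* Creates a FIFO at `path`, passes one record through it, returns its value or -1. */
int fifo_roundtrip(const char *path, int client_id, int value)
{
    struct writer_args w = { path, { client_id, value } };
    struct request got = { -1, -1 };
    size_t have = 0;
    ssize_t n;
    pthread_t tid;
    int fd;

    assert(path != NULL);
    if (mkfifo(path, 0600) != 0)
        return -1;
    if (pthread_create(&tid, NULL, client_writer, &w) != 0) {
        unlink(path);
        return -1;
    }

    fd = open(path, O_RDONLY);          /* blocks until the writer opens the FIFO */
    while (fd >= 0 && have < sizeof got &&
           (n = read(fd, (char *)&got + have, sizeof got - have)) > 0)
        have += (size_t)n;
    if (fd >= 0)
        close(fd);
    pthread_join(tid, NULL);
    unlink(path);
    return (have == sizeof got && got.client_id == client_id) ? got.value : -1;
}
```

Because every record has the same size, the reader knows exactly how many bytes make up one request, which is what a message-mode named pipe provides automatically on Windows.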

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf (stderr, "Failure to locate server\n");
        exit (3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf (stderr, "Connect or Read Named Pipe error\n");
            exit (4);
        }
        printf ("Request is %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

- One-directional

- Multiple writers/multiple readers (frequently one-to-many communication)

- Message delivery is unreliable

- Can be located over a network domain

- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot

- Each reader (server) creates mailslot with CreateMailslot()

- Write-only client opens mailslot with CreateFile() and uses WriteFile() - open will fail if there are no waiting readers

- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

- Client will connect to every server in network domain

Locate a server via mailslot

Mailslot servers (App client 0 ... App client n), each:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server):

    while (...) {
        Sleep( ... );
        hMS = CreateFile( "\\.\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo
    }

Message is sent periodically

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

- \\.\mailslot\[path]name

- Name must be unique; mailslot is created locally

cbMaxMsg is max message size in bytes

dwReadTimeout

- Read operation will wait for so many msec

- 0 - immediate return

- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with the following names

- \\.\mailslot\[path]name - retrieve handle for local mailslot

- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Interaction between Synchronization & Dispatching

User mode thread waits on an event object's handle

Kernel changes thread's scheduling state from ready to waiting and adds thread to wait-list

Another thread sets the event

Kernel wakes up waiting threads; variable priority threads get priority boost

Dispatcher re-schedules new thread - it may preempt running thread if it has lower priority, and issues software interrupt to initiate context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)

- nonsignaled -> signaled: owning thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- effect: kernel resumes one waiting thread

Mutex (exported to user mode)

- nonsignaled -> signaled: owning thread or other thread releases mutex

- signaled -> nonsignaled: resumed thread acquires mutex

- effect: kernel resumes one waiting thread

Semaphore

- nonsignaled -> signaled: one thread releases the semaphore, freeing a resource

- signaled -> nonsignaled: a thread acquires the semaphore; more resources are not available

- effect: kernel resumes one or more waiting threads

Event

- nonsignaled -> signaled: a thread sets the event

- signaled -> nonsignaled: kernel resumes one or more threads

- effect: kernel resumes one or more waiting threads

Event pair

- nonsignaled -> signaled: dedicated thread sets one event in the event pair

- signaled -> nonsignaled: kernel resumes the other dedicated thread

- effect: kernel resumes waiting dedicated thread

Timer

- nonsignaled -> signaled: timer expires

- signaled -> nonsignaled: a thread (re)initializes the timer

- effect: kernel resumes all waiting threads

Thread

- nonsignaled -> signaled: thread terminates

- signaled -> nonsignaled: a thread reinitializes the thread object

- effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms

- Trap Dispatching (pp. 85 ff.)

- Synchronization (pp. 149 ff.)

- Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

- Also used for quantum end and timer expiration

Driver (usually ISR) queues request

- One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs

- Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed

- Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

[Figure: DPC queue, a queue head followed by a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with levels from high to low (Power failure, Dispatch/DPC, APC, Low) next to the DPC queue and the dispatcher]

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

- Permission required for user mode APCs

Executive uses APCs to complete work in thread space

- Wait for asynchronous I/O operation

- Emulate delivery of POSIX signals

- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs

- Run in kernel mode at IRQL 1

- Always deliverable unless thread is already at IRQL 1 or above

- Used for I/O completion reporting from "arbitrary thread context"

- Kernel-mode interface is linkable but not documented

"Ordinary" kernel APCs

- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

- Only deliverable when thread is in "alertable wait"

[Figure: thread object with a kernel-mode (K) and a user-mode (U) list of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

- If IRQL < 2, charge to thread's user or kernel time

- If IRQL = 2 and processing a DPC, charge to DPC time

- If IRQL = 2 and not processing a DPC, charge to thread kernel time

- If IRQL > 2, charge to interrupt time
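The four rules above amount to a small decision function. The sketch below restates them in C purely as an illustration; the enum and function names are hypothetical and the code is not taken from the Windows kernel.

```c
#include <assert.h>

/* Hypothetical names for the accounting buckets listed above. */
enum charge_target {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC_TIME,
    CHARGE_INTERRUPT_TIME
};

/*
 * Decides which bucket one clock tick is charged to, given the IRQL that was
 * interrupted, whether a DPC was running, and whether the thread was in user mode.
 */
enum charge_target charge_tick(int irql, int in_dpc, int in_user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT_TIME;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC_TIME : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: the tick belongs to the thread that was running */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Only the first and third outcomes land on a thread, which is why heavy interrupt or DPC activity does not show up in any process's CPU time.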

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity

- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

- Process Explorer shows these as separate processes (they are not really processes)

- Context switches for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

- E.g. one or more threads run and enter a wait state before clock fires

- Thus threads may run but never get charged

View context switch activity with Process Explorer

- Add Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

- "signaled" vs. "nonsignaled"

- when signaled, a wait on the object is satisfied

- different object types differ in terms of what changes their state

- wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): Size, Type, State, and wait list head, followed by object-type-specific data]

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

9898

Named Pipes (contd)

BOOL PeekNamedPipe (HANDLE hPipeLPVOID lpvBuffer DWORD cbBufferLPDWORD lpcbRead LPDWORD lpcbAvailLPDWORD lpcbMessage)

nMaxInstances

ndash Number of instances

ndash PIPE_UNLIMITED_INSTANCES OS choice based on resources

dwTimeOut

ndash Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

ndash Closing handle to last instance deletes named pipe

Polling a pipe

ndash Nondestructive ndash is there a message waiting for ReadFile

9999

Named Pipe Client Connections

CreateFile with named pipe namendash pipe[path]pipename

ndash servernamepipe[path]pipename

ndash First method gives better performance (local server)

Status Functionsndash GetNamedPipeHandleState

ndash SetNamedPipeHandleState

ndash GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipeLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDOWRD lpcbRead LPOVERLAPPED lpa)

WriteFile ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeNameLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDWORD lpcbRead DWORD dwTimeOut)

CreateFile WriteFile ReadFile CloseHandle

ndash dwTimeOut NMPWAIT_NOWAIT NMPWAIT_WIAT_FOREVER NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
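The FIFO behaviour described above can be tried out with a small sketch. It assumes a POSIX system; the helper name fifo_round_trip, the record length of 64 bytes and the FIFO path passed by the caller are hypothetical choices made for this illustration, not part of the original slides. A child process writes one fixed-size record and the parent reads it back.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64   /* fixed-size record (illustrative value) */

/* Sends one fixed-size record from a child process to the parent through
 * a FIFO. Returns 0 on success and copies the received record into out. */
int fifo_round_trip(const char *path, const char *msg, char out[RECORD_LEN])
{
    char record[RECORD_LEN] = {0};
    assert(strlen(msg) < RECORD_LEN);
    strncpy(record, msg, RECORD_LEN - 1);

    unlink(path);                        /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                      /* child: the writer */
        int wfd = open(path, O_WRONLY);  /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, record, RECORD_LEN);
        close(wfd);
        _exit(n == RECORD_LEN ? 0 : 1);
    }

    ssize_t got = 0;                     /* parent: the reader */
    int rfd = open(path, O_RDONLY);
    if (rfd < 0) {
        kill(pid, SIGKILL);              /* do not leave the writer blocked */
    } else {
        while (got < RECORD_LEN) {
            ssize_t n = read(rfd, out + got, (size_t)(RECORD_LEN - got));
            if (n <= 0)
                break;
            got += n;
        }
        close(rfd);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_LEN && WIFEXITED(status)
            && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the record is much smaller than PIPE_BUF, the single write() is atomic, which is what makes fixed-size records convenient when several clients share one request FIFO.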

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf (stderr, "Failure to locate server\n"); exit (3);
    }
    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);
    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
        }
        printf ("Request is: %s\n", RequestRecord);
        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

App client 0 ... App client n (mailslot servers):

    hMS = CreateMailslot ("\\.\mailslot\status", ...);
    ReadFile (hMS, &ServStat, ...);
    /* connect to server */

App server (mailslot client); the message is sent periodically:

    while (...) {
        Sleep (...);
        hMS = CreateFile ("\\*\mailslot\status", ...);
        ...
        WriteFile (hMS, &StatInfo, ...);
    }

Creating a mailslot

HANDLE CreateMailslot (LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieves handle for local mailslot
– \\host\mailslot\[path]name - retrieves handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length is 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life 意念改变生活


Interaction between Synchronization & Dispatching

Dispatcher re-schedules the new thread: it may preempt the running thread if that thread has lower priority, and it issues a software interrupt to initiate the context switch

If no processor can be preempted, the dispatcher places the ready thread in the dispatcher ready queue to be scheduled later

What signals an object

For each dispatcher object: the system events that change its state, and the effect of the signaled state on waiting threads

Mutex (kernel mode)
– nonsignaled to signaled: owning thread releases mutex
– signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Mutex (exported to user mode)
– nonsignaled to signaled: owning thread or other thread releases mutex
– signaled to nonsignaled: resumed thread acquires mutex
– Effect: kernel resumes one waiting thread

Semaphore
– nonsignaled to signaled: one thread releases the semaphore, freeing a resource
– signaled to nonsignaled: a thread acquires the semaphore; more resources are not available
– Effect: kernel resumes one or more waiting threads

What signals an object (contd)

Event
– nonsignaled to signaled: a thread sets the event
– signaled to nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads

Event pair
– nonsignaled to signaled: dedicated thread sets one event in the event pair
– signaled to nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread

Timer
– nonsignaled to signaled: timer expires
– signaled to nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads

Thread
– nonsignaled to signaled: thread terminates
– signaled to nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004.

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) has completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: the DPC queue, a queue head followed by a chain of DPC objects]

Delivering a DPC

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests a software interrupt
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. Dispatcher executes each DPC routine in the DPC queue

DPC routines can call kernel functions, but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from high to low (Power failure, Dispatch/DPC, APC, Low), the DPC queue and the dispatcher]
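The delivery steps can be modeled in a few lines of portable C. This is only a user-mode simulation written for this transcript: the types cpu_t and dpc_t, the functions queue_dpc and lower_irql, and the queue capacity are hypothetical names and values, not kernel interfaces. It shows the one idea that matters here: work queued at high IRQL runs only once the IRQL drops below dispatch level.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2   /* IRQL at which DPC routines run */
#define MAX_DPCS 16        /* illustrative queue capacity */

typedef void (*dpc_routine_t)(void *context);

typedef struct {
    dpc_routine_t routine;
    void *context;
} dpc_t;

/* One DPC queue per (simulated) CPU. */
typedef struct {
    dpc_t queue[MAX_DPCS];
    size_t head, count;
    int irql;
} cpu_t;

/* Called from an "ISR" running above dispatch level: only record the work. */
int queue_dpc(cpu_t *cpu, dpc_routine_t routine, void *context)
{
    assert(routine != NULL);
    if (cpu->count == MAX_DPCS)
        return 0;                                  /* queue is full */
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 1;
}

/* Lower the IRQL. When it would drop below dispatch level, drain the queue
 * first, running each routine at dispatch level in FIFO order. */
void lower_irql(cpu_t *cpu, int new_irql)
{
    if (new_irql < DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->count > 0) {
            dpc_t dpc = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            dpc.routine(dpc.context);
        }
    }
    cpu->irql = new_irql;
}

/* Example routine: counts how many times it ran. */
void count_dpc(void *context)
{
    ++*(int *)context;
}
```

Lowering the IRQL from a device level to another level that is still above dispatch level leaves the queue untouched; only the drop to APC or passive level drains it.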

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
– Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make a thread suspend/terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode, at IRQL 1
– Always deliverable unless the thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when the thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two queues of APC objects, one for kernel mode (K) and one for user mode (U)]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
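The four accounting rules map directly onto a small decision function. The sketch below is an illustration only: charge_t, charge_tick and the user_mode flag are names invented here, and the real clock ISR obviously has more state than three parameters.

```c
#include <assert.h>
#include <stdbool.h>

/* Buckets a clock tick can be charged to (illustrative names). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

/* Decide where one clock tick is charged.
 * irql:      IRQL that the clock interrupt preempted
 * in_dpc:    a DPC routine was executing
 * user_mode: the interrupted thread was running user-mode code */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql < 2)
        return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return CHARGE_INTERRUPT;
}
```

Note that only the state at the instant of the tick is inspected, which is why work done entirely between two ticks is never charged to anyone.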

IRQLs and CPU Time Accounting (contd)

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the load must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout (see \ntddk\inc\ddk\ntddk.h): a header with Size, Type, State and Wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its wait block list (WaitBlockList); each wait block holds a list entry, Object and Thread pointers, Key, Type and Next link; each dispatcher object (Size, Type, State, Wait list head, object-type-specific data) heads the list of wait blocks waiting on it]
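The relationship between wait blocks, keys and the any/all type can be sketched as plain C structures. This is a simplified model for illustration, not the kernel's layout: the names dispatcher_object_t, wait_block_t and wait_satisfied are hypothetical, the list is NULL-terminated for brevity, and the dispatcher header is reduced to its signal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_type_t;

/* Simplified dispatcher header: only the signal state is modeled. */
typedef struct {
    bool signaled;
} dispatcher_object_t;

/* One wait block per handle passed to the wait call. */
typedef struct wait_block {
    dispatcher_object_t *object;   /* object being waited on */
    size_t key;                    /* position in the argument list */
    wait_type_t type;              /* wait for any or for all */
    struct wait_block *next;       /* next block of the same wait call */
} wait_block_t;

/* Checks whether the wait described by the chain is satisfied.
 * Returns the key of the first signaled object for a wait-any,
 * 0 for a satisfied wait-all, and -1 if the thread must keep waiting. */
long wait_satisfied(const wait_block_t *list)
{
    assert(list != NULL);
    wait_type_t type = list->type;
    for (const wait_block_t *wb = list; wb != NULL; wb = wb->next) {
        if (type == WAIT_ANY && wb->object->signaled)
            return (long)wb->key;
        if (type == WAIT_ALL && !wb->object->signaled)
            return -1;
    }
    return type == WAIT_ALL ? 0 : -1;
}
```

The key is what lets a wait-any report which argument position was satisfied, mirroring the index returned by WaitForMultipleObjects.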

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )
VOID EnterCriticalSection( LPCRITICAL_SECTION sec )
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection ( &crit );
    /* … main loop in any of the threads */
    while (!done) {
        EnterCriticalSection ( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per Enter, even if an exception occurs */
            LeaveCriticalSection ( &crit );
        }
    }
    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for first signaled object or all objects
  Function returns index of first signaled object

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );
    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else /* mutex was abandoned */
            break; /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex )
– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
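For comparison with the Windows mutex example, here is a minimal pthreads sketch of the same shared-counter pattern. The function names worker and run_counter_demo and the limit MAX_THREADS are assumptions made for this illustration; only the pthread_* calls come from the POSIX API.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;                     /* shared by all threads */

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        /* blocks, like WaitForSingleObject( hMutex, INFINITE ) */
        pthread_mutex_lock(&counter_lock);
        counter += 1;
        /* gives up ownership, like ReleaseMutex( hMutex ) */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
long run_counter_demo(int nthreads, long iterations)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    int created = 0;

    counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tid[created], NULL, worker, &iterations) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Without the lock/unlock pair the increments of different threads could interleave and the final value would usually fall short of nthreads * iterations.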

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() reports the old semaphore count (through lpPreviousCount)
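The counting rules can be captured in a tiny single-threaded model. It is not the Windows implementation and does no blocking; sem_model_t, sem_model_try_wait and sem_model_release are hypothetical names used only to make the bookkeeping explicit, including the rule that a release which would exceed the maximum fails and leaves the count unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bookkeeping of a counting semaphore: current and maximum count. */
typedef struct {
    long count;
    long max;
} sem_model_t;

/* Non-blocking wait: succeeds only while the semaphore is signaled
 * (count > 0) and then decreases the count by 1. */
bool sem_model_try_wait(sem_model_t *s)
{
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Release: adds n to the count and reports the previous count.
 * Fails without changing anything if the maximum would be exceeded. */
bool sem_model_release(sem_model_t *s, long n, long *previous)
{
    assert(n > 0);
    if (s->count + n > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

A semaphore created with an initial count of 2 and a maximum of 3 therefore admits two waiters immediately, and a later release of 2 brings the count back to 2 while reporting 0 as the old count.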

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )
BOOL ResetEvent( HANDLE hEvent )
BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
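Although pthreads has no direct counterpart, a manual-reset event can be approximated by combining a mutex, a condition variable and a flag. The sketch below is one possible emulation under that assumption; the type mr_event_t and all mr_event_* functions, as well as release_all_demo, are names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 8

/* Manual-reset event emulated with a mutex, a condition variable and a flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

void mr_event_destroy(mr_event_t *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

/* Like SetEvent on a manual-reset event: wake all waiters, stay signaled. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent: later waiters block again. */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject( hEvent, INFINITE ); the loop guards against
 * spurious wakeups of pthread_cond_wait. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static void *waiter(void *arg)
{
    mr_event_wait(arg);
    return NULL;
}

/* Starts nwaiters threads that wait on one event, sets the event once,
 * and returns how many threads were released and joined. */
int release_all_demo(int nwaiters)
{
    assert(nwaiters > 0 && nwaiters <= MAX_WAITERS);
    mr_event_t e;
    pthread_t tid[MAX_WAITERS];
    int created = 0, joined = 0;

    mr_event_init(&e, false);
    for (int i = 0; i < nwaiters; i++)
        if (pthread_create(&tid[created], NULL, waiter, &e) == 0)
            created++;
    mr_event_set(&e);
    for (int i = 0; i < created; i++)
        if (pthread_join(tid[i], NULL) == 0)
            joined++;
    mr_event_destroy(&e);
    return joined;
}
```

Because the flag stays set until mr_event_reset is called, a thread that starts waiting after the set still passes straight through, which is the behaviour a plain pthread_cond_broadcast alone would not give.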

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf (stderr, "Anon pipe create failed\n"); exit (1);
    }
    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf (stderr, "CreateProc1 failed\n"); exit (2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf (stderr, "CreateProc2 failed\n"); exit (3);
    }
    CloseHandle (hReadPipe);
    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa)


  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 60: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

6060

What signals an object

Dispatcher object

System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

Owning thread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (kernel mode)

nonsignaled signaled

Owning thread or otherthread releases mutex

Resumed thread acquires mutex

Kernel resumes one waiting thread

Mutex (exported to user mode)

nonsignaled signaled

One thread releases thesemaphore freeing a resource

A thread acquires the semaphoreMore resources are not available

Kernel resumes one or more waiting threads

Semaphore

6161

A thread reinitializesthe thread object

What signals an object (contd)

Dispatcher object System events and resultingstate change

Effect of signaled stateon waiting threads

nonsignaled signaled

A thread sets the event

Kernel resumes one or more threads

Kernel resumes one or more waiting threads

Event

nonsignaled signaled

Dedicated thread sets oneevent in the event pair

Kernel resumes theother dedicated thread

Kernel resumes waitingdedicated thread

Event pair

nonsignaled signaled

Timer expires

A thread (re) initializes the timer

Kernel resumes all waiting threads

Timer

nonsignaled signaled

Thread terminates

Kernel resumes all waiting threads

Thread

6262

Further Reading

Mark E Russinovich and David A Solomon Microsoft Windows Internals 4th Edition Microsoft Press 2004

Chapter 3 - System Mechanisms

ndash Trap Dispatching (pp 85 ff)

ndash Synchronization (pp 149 ff)

ndash Kernel Event Tracing (pp 175 ff)

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time
Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time
CPU time accounting is driven by programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)
Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before clock fires
– Thus threads may run but never get charged
View context switch activity with Process Explorer
– Add Context Switch Delta column
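The sampling effect described above can be illustrated with a toy simulation. It assumes a thread that runs for a fixed slice inside every period and a clock that fires at fixed intervals. `charged_ticks` and all of its parameters are hypothetical, and real scheduling is far less regular.

```c
#include <assert.h>

/* Count clock ticks that land while a periodic thread is running.
 * The thread runs for run_ms starting at offset start_ms within every
 * period_ms; the clock fires every tick_ms, up to total_ms. */
int charged_ticks(int tick_ms, int period_ms, int start_ms,
                  int run_ms, int total_ms)
{
    assert(tick_ms > 0 && period_ms > 0);
    int charged = 0;
    for (int t = tick_ms; t <= total_ms; t += tick_ms) {
        int phase = t % period_ms;          /* position inside the period */
        if (phase >= start_ms && phase < start_ms + run_ms)
            charged++;                      /* tick hit the running thread */
    }
    return charged;
}
```

With a 10 msec tick, a thread that runs 8 msec out of every 10 but always blocks before the clock fires (`charged_ticks(10, 10, 1, 8, 1000)`) is charged nothing, whereas a thread that runs only 1 msec but is on the CPU whenever the clock fires (`charged_ticks(10, 10, 0, 1, 1000)`) is charged for every tick.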

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason
Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"
All dispatcher objects have a common header
All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects
[Figure: dispatcher object layout: header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)
All wait blocks from a given wait call are chained to the waiting thread
Type indicates wait for "any" or "all"
Key denotes argument list position for WaitForMultipleObjects
[Figure: thread objects point to their WaitBlockList; each wait block (List entry, Object, Thread, Key, Type, Next link) is also linked from the wait list head of a dispatcher object (Size, Type, State, Wait list head, object-type-specific data)]
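The relationship between dispatcher headers and wait blocks can be sketched in portable C. This is a simplified model and not the kernel's actual layout: it keeps only the State, Object, Key, Type and Next link fields named on the slides, and `wait_satisfied` is a hypothetical helper showing how "any" and "all" waits differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_type_t;

/* Common header shared by every waitable object (simplified). */
typedef struct dispatcher_header {
    int  type;       /* object type code */
    bool signaled;   /* signaled vs. nonsignaled */
} dispatcher_header_t;

/* One wait block per object named in a wait call. */
typedef struct wait_block {
    dispatcher_header_t *object; /* object being waited on */
    wait_type_t          type;   /* wait for any or for all */
    int                  key;    /* position in the caller's handle list */
    struct wait_block   *next;   /* next block of the same wait call */
} wait_block_t;

/* Returns the key of the first signaled object for WAIT_ANY, 0 when a
 * WAIT_ALL wait is fully satisfied, or -1 if the thread must keep waiting. */
int wait_satisfied(const wait_block_t *list)
{
    if (list == NULL)
        return -1;
    if (list->type == WAIT_ANY) {
        for (const wait_block_t *b = list; b != NULL; b = b->next) {
            assert(b->object != NULL);
            if (b->object->signaled)
                return b->key;
        }
        return -1;
    }
    for (const wait_block_t *b = list; b != NULL; b = b->next) {
        assert(b->object != NULL);
        if (!b->object->signaled)
            return -1;
    }
    return 0;
}
```

The Key field is what lets a wait-any call report which handle in the argument list was satisfied.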

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication
Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects
Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process
Critical sections are initialized and deleted but do not have handles
Only one thread at a time can be in a critical section
A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match
Leaving a critical section before entering it may cause deadlocks
No way to test whether another thread is in a critical section, other than attempting entry with TryEnterCriticalSection()

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );
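The rule that Enter and Leave operations must match can be illustrated with a single-threaded model of the ownership bookkeeping. This is only a sketch of the counting logic, not a usable lock; `cs_model_t`, `cs_try_enter` and `cs_leave` are hypothetical names, and thread identity is passed in as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>

/* Bookkeeping of a recursive critical section (owner 0 means free). */
typedef struct {
    int owner;      /* id of the owning thread, 0 if none */
    int recursion;  /* how many times the owner has entered */
} cs_model_t;

/* Succeeds if the section is free or already owned by thread_id. */
bool cs_try_enter(cs_model_t *cs, int thread_id)
{
    assert(thread_id != 0);
    if (cs->recursion > 0 && cs->owner != thread_id)
        return false;               /* another thread is inside */
    cs->owner = thread_id;
    cs->recursion++;
    return true;
}

/* Returns false for an unmatched Leave (not entered, or wrong thread). */
bool cs_leave(cs_model_t *cs, int thread_id)
{
    if (cs->recursion == 0 || cs->owner != thread_id)
        return false;
    if (--cs->recursion == 0)
        cs->owner = 0;              /* last Leave frees the section */
    return true;
}
```

In this model a thread that entered twice must leave twice before any other thread can get in, which is the matching requirement stated above.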

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* runs on every exit path, so Leave is called exactly once per Enter */
        LeaveCriticalSection( &crit );
    }
}
DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever
WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for first signaled object or all objects
  When bWaitAll is FALSE, function returns WAIT_OBJECT_0 plus the index of the first signaled object
Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
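A caller usually has to decode the value returned by a wait function. The sketch below shows one way to classify it; the constants repeat the values from the Windows SDK headers so that the code compiles without windows.h, while `wait_kind_t` and `decode_wait` are hypothetical helpers, not part of the Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Windows SDK headers. */
#define WAIT_OBJECT_0        0x00000000u
#define WAIT_ABANDONED_0     0x00000080u
#define WAIT_TIMEOUT         0x00000102u
#define WAIT_FAILED          0xFFFFFFFFu
#define MAXIMUM_WAIT_OBJECTS 64u

typedef enum { R_SIGNALED, R_ABANDONED, R_TIMEOUT, R_FAILED } wait_kind_t;

/* Classify a wait return value. count is the cObjects value passed to the
 * wait; *index receives the array position for R_SIGNALED and R_ABANDONED
 * and is left untouched otherwise. */
wait_kind_t decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    assert(count >= 1 && count <= MAXIMUM_WAIT_OBJECTS);
    if (ret < WAIT_OBJECT_0 + count) {
        *index = ret - WAIT_OBJECT_0;
        return R_SIGNALED;
    }
    if (ret >= WAIT_ABANDONED_0 && ret < WAIT_ABANDONED_0 + count) {
        *index = ret - WAIT_ABANDONED_0;   /* an abandoned mutex */
        return R_ABANDONED;
    }
    if (ret == WAIT_TIMEOUT)
        return R_TIMEOUT;
    return R_FAILED;
}
```

For WaitForSingleObject() the same helper can be used with `count` set to 1.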

Mutexes

Mutexes work across processes
First thread has to call CreateMutex()
When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()
fInitialOwner == TRUE gives creator immediate ownership
Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()
ReleaseMutex() gives up ownership
CloseHandle() releases the handle; the mutex object is freed when its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );
HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );
BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else    /* mutex was abandoned */
        break;  /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process
Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()
Comparison
– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0
Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any positive value (up to the maximum)
– ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
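The counting rules above can be modelled without any threading. The following single-threaded C sketch only mirrors the bookkeeping; `sem_model_t`, `sem_try_wait` and `sem_release` are hypothetical names. The rule that a release beyond the maximum fails and leaves the count unchanged follows the documented behaviour of ReleaseSemaphore() and is not stated on the slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model of a counting semaphore's bookkeeping. */
typedef struct {
    long count;  /* current count; signaled while count > 0 */
    long max;    /* maximum count fixed at creation */
} sem_model_t;

/* A wait succeeds only while the semaphore is signaled and then takes
 * one unit. Returns false where a real wait would block. */
bool sem_try_wait(sem_model_t *s)
{
    assert(s != NULL);
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Add n units. Fails without changing the count if n is not positive or
 * the maximum would be exceeded. On success the old count is stored in
 * *previous when previous is not NULL. */
bool sem_release(sem_model_t *s, long n, long *previous)
{
    assert(s != NULL);
    if (n <= 0 || s->count + n > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

A semaphore created with an initial count of 2 therefore admits two waits, refuses the third, and can be refilled in a single release call.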

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );
BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()
Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()
Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads
No exact equivalent to manual-reset events
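Since condition variables carry no state of their own, a manual-reset event is usually emulated by pairing one with a mutex and a flag. The sketch below shows one common way to do this with pthreads; the `mr_event_*` names are hypothetical, error handling is omitted, and it is an emulation rather than an exact equivalent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Manual-reset event emulated with a mutex, a condition variable, a flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: stays signaled, wakes all. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not reset it. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)            /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void mr_event_destroy(mr_event_t *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event "sticky": a thread that calls `mr_event_wait()` after `mr_event_set()` returns immediately, which a bare pthread_cond_wait() would not do.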

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

Half-duplex character-based IPC
cbPipe: pipe byte size; zero == default
Read on pipe handle will block if pipe is empty
Write operation to a full pipe will block
Anonymous pipes are one-way
[Figure: main creates a pipe connecting prog1 to prog2]

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}
/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process. */
StartInfoCh2.hStdInput  = hReadPipe;
StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete. */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process
Bi-directional
– Two processes can exchange messages over the same pipe
Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name, each over its own instance
– Server can respond to client using the same instance
Pipe can be accessed over the network
– location transparency
Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe
lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on remote machine (. = local machine)
fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND
fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources
dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()
First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe
Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)
Status Functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in one call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if client connected between CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED
Use DisconnectNamedPipe() to free the handle for connection from another client
WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()
Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic
A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO
mkfifo() is the UNIX counterpart to CreateNamedPipe()
Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}
/* Write the request. */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out. */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client. */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}
/* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used
Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many comm.)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)
Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)
Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 … n act as mailslot servers; each creates the "status" mailslot with CreateMailslot(), reads a status message with ReadFile(hMS, &ServStat), then connects to the application server. The application server acts as mailslot client: in a loop it sleeps, opens the "status" mailslot with CreateFile(), and sends StatInfo with WriteFile(). Message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally
cbMaxMsg is max. msg size in bytes
dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. msg. len.: 400 bytes
– Client must specify FILE_SHARE_READ flag
GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


What signals an object (contd)

For each dispatcher object: system events and resulting state change, and effect of signaled state on waiting threads
Event
– nonsignaled → signaled: a thread sets the event
– signaled → nonsignaled: kernel resumes one or more threads
– Effect: kernel resumes one or more waiting threads
Event pair
– nonsignaled → signaled: dedicated thread sets one event in the event pair
– signaled → nonsignaled: kernel resumes the other dedicated thread
– Effect: kernel resumes waiting dedicated thread
Timer
– nonsignaled → signaled: timer expires
– signaled → nonsignaled: a thread (re)initializes the timer
– Effect: kernel resumes all waiting threads
Thread
– nonsignaled → signaled: thread terminates
– signaled → nonsignaled: a thread reinitializes the thread object
– Effect: kernel resumes all waiting threads

Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004
Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls
IRQLs and CPU Time Accounting
Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration
Driver (usually ISR) queues request
– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs
– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed
– Maximum times recommended: ISR 10 usec, DPC 25 usec
See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linking three DPC objects]

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects
DPC routines can't assume what process address space is currently mapped
1. Timer expires; kernel queues DPC that will release all waiting threads. Kernel requests SW int.
2. DPC interrupt occurs when IRQL drops below dispatch/DPC level
3. After DPC interrupt, control transfers to thread dispatcher
4. Dispatcher executes each DPC routine in DPC queue
[Figure: interrupt dispatch table ordered from High and Power failure down through Dispatch/DPC and APC to Low, with the DPC queue feeding the dispatcher]
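The delivery steps above can be mimicked with a small per-CPU queue model in portable C. This is a sketch of the ordering only, not kernel code: `cpu_t`, `queue_dpc`, `lower_irql` and the fixed queue size are hypothetical, and `count_call` merely stands in for a real DPC routine.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2
#define MAX_DPCS 8

typedef void (*dpc_routine_t)(void *context);

typedef struct {
    dpc_routine_t routine;
    void         *context;
} dpc_t;

/* One DPC queue per CPU, modelled as a fixed-size FIFO ring. */
typedef struct {
    int   irql;
    dpc_t queue[MAX_DPCS];
    int   head;
    int   count;
} cpu_t;

/* Example DPC routine: counts how often it was run. */
void count_call(void *context)
{
    (*(int *)context)++;
}

/* Called from an ISR above dispatch level: only enqueue the request.
 * Returns 1 on success, 0 if the model's queue is full. */
int queue_dpc(cpu_t *cpu, dpc_routine_t routine, void *context)
{
    if (cpu->count == MAX_DPCS)
        return 0;
    int tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 1;
}

/* Lowering IRQL below dispatch level first drains the queue at
 * dispatch level, in FIFO order. */
void lower_irql(cpu_t *cpu, int new_irql)
{
    assert(new_irql <= cpu->irql);
    if (new_irql < DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->count > 0) {
            dpc_t d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            d.routine(d.context);
        }
    }
    cpu->irql = new_irql;
}
```

Queued routines stay pending while the CPU remains at or above dispatch level and run only when the IRQL is about to drop below it.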

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, call system services
APC queue is thread-specific
User mode & kernel mode APCs
– Permission required for user mode APCs


103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

ndash FIFOs are half-duplex

ndash FIFOs are limited to a single machine

ndash FIFOs are still byte-oriented so its easiest to use fixed-size records in clientserver applications

ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName NMPWAIT_WAIT_FOREVER)

hNamedPipe = CreateFile (ServerPipeName GENERIC_READ | GENERIC_WRITE

0 NULL OPEN_EXISTING FILE_ATTRIBUTE_NORMAL NULL)

if (hNamedPipe == INVALID_HANDLE_VALUE)

fptinf(stderr Failure to locate servern) exit(3)

Write the request

WriteFile (hNamedPipe ampRequest MAX_RQRS_LEN ampnWrite NULL)

Read each response and send it to std out

while (ReadFile (hNamedPipe ResponseRecord MAX_RQRS_LEN ampnRead NULL))

printf (s ResponseRecord)

CloseHandle (hNamedPipe)

return 0

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE PIPE_ACCESS_DUPLEX

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT

1 0 0 CS_TIMEOUT pNPSA)

while (Done)

printf (Server is awaiting next requestn)

if (ConnectNamedPipe (hNamedPipe NULL)

|| ReadFile (hNamedPipe ampRequest RQ_SIZE ampnXfer NULL))

fprintf(stderr ldquoConnect or Read Named Pipe errornrdquo) exit(4)

printf( ldquoRequest is sn RequestRecord)

Send the file one line at a time to the client

fp = fopen (File r)

while ((fgets (ResponseRecord MAX_RQRS_LEN fp) = NULL))

WriteFile (hNamedPipe ampResponseRecord

(strlen(ResponseRecord) + 1) TSIZE ampnXfer NULL)

fclose (fp)

DisconnectNamedPipe (hNamedPipe)

End of server operation

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Further Reading

Mark E. Russinovich and David A. Solomon, Microsoft Windows Internals, 4th Edition, Microsoft Press, 2004

Chapter 3 - System Mechanisms
– Trap Dispatching (pp. 85 ff.)
– Synchronization (pp. 149 ff.)
– Kernel Event Tracing (pp. 175 ff.)

3.3 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues & Dispatcher Objects

Deferred Procedure Calls (DPCs)

Used to defer processing from a higher (device) interrupt level to a lower (dispatch) level
– Also used for quantum end and timer expiration

Driver (usually the ISR) queues the request
– One queue per CPU; DPCs are normally queued to the current processor but can be targeted to other CPUs
– Executes the specified procedure at dispatch IRQL (also called "dispatch level" or "DPC level") once all higher-IRQL work (interrupts) has completed
– Maximum recommended times: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

Deferred Procedure Calls (DPCs)

[Figure: the DPC queue, a queue head followed by a chain of DPC objects]

Delivering a DPC

[Figure: interrupt dispatch table with IRQLs from high to low (power failure, dispatch/DPC, APC, low) next to the DPC queue and the thread dispatcher]

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

1. Timer expires; the kernel queues a DPC that will release all waiting threads, then requests a software interrupt

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After the DPC interrupt, control transfers to the thread dispatcher

4. Dispatcher executes each DPC routine in the DPC queue
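The four delivery steps can be mimicked with a small single-threaded model. This is only a sketch of the bookkeeping, not kernel code: the `cpu` and `dpc` structures, the IRQL constants, and the functions `queue_dpc` and `lower_irql` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IRQL values for this model; only their ordering matters. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, DEVICE_LEVEL = 3 };

typedef void (*dpc_routine)(void *context);

typedef struct dpc {
    dpc_routine routine;
    void *context;
    struct dpc *next;
} dpc;

/* One queue per (simulated) CPU: FIFO list of pending DPC objects. */
typedef struct cpu {
    int irql;
    dpc *head, *tail;
} cpu;

/* Step 1: code running at a high IRQL queues the deferred work. */
void queue_dpc(cpu *c, dpc *d, dpc_routine routine, void *context)
{
    d->routine = routine;
    d->context = context;
    d->next = NULL;
    if (c->tail) c->tail->next = d; else c->head = d;
    c->tail = d;
}

/* Steps 2-4: when the IRQL drops below dispatch level, the queue is
 * drained at DISPATCH_LEVEL before the CPU settles at the new level.
 * Returns the number of DPC routines that ran. */
int lower_irql(cpu *c, int new_irql)
{
    int ran = 0;
    if (new_irql < DISPATCH_LEVEL) {
        c->irql = DISPATCH_LEVEL;
        while (c->head) {
            dpc *d = c->head;
            c->head = d->next;
            if (!c->head) c->tail = NULL;
            d->routine(d->context);
            ran++;
        }
    }
    c->irql = new_irql;
    return ran;
}

/* Example DPC routine: counts how often it was invoked. */
void count_calls(void *context)
{
    ++*(int *)context;
}
```

Queuing two DPCs at device level and then lowering the IRQL to dispatch level runs nothing; only the drop to a level below dispatch drains the queue, which is the behavior step 2 describes.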

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
– APC routines can acquire resources (objects), incur page faults, and call system services

APC queue is thread-specific

User-mode & kernel-mode APCs
– Permission required for user-mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
– Wait for asynchronous I/O operation
– Emulate delivery of POSIX signals
– Make threads suspend/terminate themselves (environment subsystems)

User-mode APCs are delivered when the thread is in an alertable wait state
– WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
– Run in kernel mode at IRQL 1
– Always deliverable unless thread is already at IRQL 1 or above
– Used for I/O completion reporting from "arbitrary thread context"
– Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
– Only deliverable when thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel mode (K) and user mode (U), each holding a chain of APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
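The four accounting rules amount to a small decision function. The sketch below only restates those rules in C; the enum `charge_target`, the function `charge_tick`, and the `in_user_mode` flag (used to split "user or kernel time" for IRQL < 2) are illustrative assumptions, not kernel definitions.

```c
#include <assert.h>

/* Buckets the clock ISR can charge a tick to (names are illustrative). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* irql:         IRQL that was active when the clock tick arrived
 * in_dpc:       nonzero if a DPC routine was being processed
 * in_user_mode: nonzero if the thread was executing user-mode code */
charge_target charge_tick(int irql, int in_dpc, int in_user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: charged to the thread, split by processor mode */
    return in_user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Because the decision is made only at the instant of the tick, whatever ran between two ticks is invisible to this function.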

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, a system that is busy while no process appears to be running must be busy with interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really counts of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add the Context Switch Delta column

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object: a common header (size, type, state, wait list head) followed by object-type-specific data; see ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList that chains its wait blocks (fields: list entry, object, thread, key, type, next link); every wait block is also linked into the wait list head of the dispatcher object (size, type, state, wait list head, object-type-specific data) it refers to]
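The relationship between dispatcher objects, wait blocks, and the "any"/"all" wait type can be sketched as plain C data structures. This is a deliberately reduced model: the struct layouts, the field selection, and the function `wait_satisfied` are assumptions made for illustration and do not reproduce the definitions in the DDK headers (in particular, the wait type is passed as a parameter here instead of being stored in each block).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WAIT_TYPE_ANY, WAIT_TYPE_ALL } wait_type;

/* Simplified common header: only the signal state is modeled. */
typedef struct dispatcher_header {
    int signaled;
} dispatcher_header;

/* One wait block per object passed to the wait call. */
typedef struct wait_block {
    dispatcher_header *object;  /* what the thread is waiting for */
    int key;                    /* position in the argument list */
    struct wait_block *next;    /* chain anchored at the waiting thread */
} wait_block;

/* Returns 1 if the wait described by the chain is satisfied, else 0.
 * For WAIT_TYPE_ANY, *key_out receives the key of the first signaled
 * object in the chain (blocks are assumed to be in argument order). */
int wait_satisfied(const wait_block *list, wait_type type, int *key_out)
{
    const wait_block *wb;
    if (list == NULL)
        return 0;
    for (wb = list; wb != NULL; wb = wb->next) {
        if (type == WAIT_TYPE_ANY && wb->object->signaled) {
            *key_out = wb->key;
            return 1;
        }
        if (type == WAIT_TYPE_ALL && !wb->object->signaled)
            return 0;
    }
    /* Falling out of the loop: every object was signaled (ALL)
     * or none was (ANY). */
    return type == WAIT_TYPE_ALL;
}
```

The key is what lets a caller of WaitForMultipleObjects learn which handle in its array ended the wait.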

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section without trying to enter it (TryEnterCriticalSection)

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );
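The rule that Enter and Leave operations must match can be made concrete with a toy model of the ownership bookkeeping. The code below is not a lock (it neither blocks nor uses atomic operations) and it is not how CRITICAL_SECTION is implemented; `cs_model`, `cs_try_enter`, and `cs_leave` are hypothetical names that only illustrate the recursion count.

```c
#include <assert.h>

#define NO_OWNER (-1)

/* Hypothetical bookkeeping only: no blocking, no atomicity. */
typedef struct {
    int owner;       /* id of the owning thread, or NO_OWNER */
    int recursion;   /* how many times the owner has entered */
} cs_model;

/* Follows the TryEnterCriticalSection contract: nonzero on success. */
int cs_try_enter(cs_model *cs, int thread_id)
{
    if (cs->owner == NO_OWNER) {
        cs->owner = thread_id;
        cs->recursion = 1;
        return 1;
    }
    if (cs->owner == thread_id) {
        cs->recursion++;      /* re-entry by the owner always succeeds */
        return 1;
    }
    return 0;                 /* held by another thread */
}

/* Returns 1 if the section became free, 0 if the owner still holds it,
 * and -1 for an unmatched leave (the case the slide warns about). */
int cs_leave(cs_model *cs, int thread_id)
{
    if (cs->owner != thread_id || cs->recursion == 0)
        return -1;
    if (--cs->recursion == 0) {
        cs->owner = NO_OWNER;
        return 1;
    }
    return 0;
}
```

A thread that entered twice must leave twice before any other thread can get in; the real API gives no error code for an unmatched leave, which is why the mistake tends to show up later as a deadlock.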

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once, even if the guarded code raises an exception */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for the first signaled object or for all objects
  If bWaitAll is FALSE, the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
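How the return value of WaitForMultipleObjects encodes the result can be illustrated with a model of a zero-timeout poll. The function `poll_objects` and the `MODEL_` constants are hypothetical; the two constants carry the same numeric values as WAIT_OBJECT_0 and WAIT_TIMEOUT, and the reset side effects listed above are intentionally left out.

```c
#include <assert.h>

#define MODEL_WAIT_OBJECT_0 0u     /* same numeric value as WAIT_OBJECT_0 */
#define MODEL_WAIT_TIMEOUT  258u   /* same numeric value as WAIT_TIMEOUT */

/* Poll (timeout == 0) over `count` objects whose states are given as
 * flags; count is assumed to be between 1 and 64.
 * wait_all == 0: returns MODEL_WAIT_OBJECT_0 + lowest signaled index.
 * wait_all != 0: succeeds only if every object is signaled; the model
 *                then returns MODEL_WAIT_OBJECT_0.
 * Returns MODEL_WAIT_TIMEOUT if the wait would have to block. */
unsigned poll_objects(const int *signaled, unsigned count, int wait_all)
{
    unsigned i;
    if (wait_all) {
        for (i = 0; i < count; i++)
            if (!signaled[i])
                return MODEL_WAIT_TIMEOUT;
        return MODEL_WAIT_OBJECT_0;
    }
    for (i = 0; i < count; i++)
        if (signaled[i])
            return MODEL_WAIT_OBJECT_0 + i;
    return MODEL_WAIT_TIMEOUT;
}
```

A caller typically subtracts WAIT_OBJECT_0 from the result and uses the difference as an index into its handle array.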

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done = 0, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }

    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex )
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
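For comparison with the Windows mutex example, here is a minimal pthreads sketch of the same shared-counter pattern, including a nonblocking probe with pthread_mutex_trylock(). The helper names `worker`, `run_counter`, and `lock_was_free` are made up for this illustration, and the limit of 16 threads is an arbitrary assumption; on some toolchains the program must be built with `-pthread`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    long i;
    for (i = 0; i < iterations; i++) {
        pthread_mutex_lock(&lock);    /* blocks, like WaitForSingleObject(hMutex, INFINITE) */
        counter++;
        pthread_mutex_unlock(&lock);  /* like ReleaseMutex(hMutex) */
    }
    return NULL;
}

/* Starts up to 16 workers, waits for them, and returns the final counter. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tid[16];
    int created = 0, i;
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[created], NULL, worker, &iterations) != 0)
            break;                    /* stop on the first failure */
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Nonblocking probe, like WaitForSingleObject(hMutex, 0):
 * returns 1 if the lock was free (and releases it again), 0 if busy. */
int lock_was_free(void)
{
    if (pthread_mutex_trylock(&lock) == EBUSY)
        return 0;
    pthread_mutex_unlock(&lock);
    return 1;
}
```

With the mutex in place the final count equals threads times iterations; without the lock/unlock pair the increments could interleave and lose updates.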

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each successful wait decreases the semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
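The counting rules (signaled while count > 0, each wait takes one unit, a release may add several) can be traced with a small model. It is a sketch of the arithmetic only, with no blocking or thread safety; `sem_model`, `sem_try_wait`, and `sem_release` are hypothetical names. The model also assumes, as the Windows API does, that a release which would push the count above the maximum fails and leaves the count unchanged.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long count;   /* the semaphore is signaled while count > 0 */
    long max;     /* upper bound, cSemMax in CreateSemaphore */
} sem_model;

/* Models a zero-timeout wait: returns 1 and decrements the count if
 * the semaphore is signaled, otherwise returns 0 (would block). */
int sem_try_wait(sem_model *s)
{
    if (s->count <= 0)
        return 0;
    s->count--;
    return 1;
}

/* Models ReleaseSemaphore: adds `release` to the count and stores the
 * old count in *previous (if not NULL). Returns 0 and changes nothing
 * if release <= 0 or the new count would exceed the maximum. */
int sem_release(sem_model *s, long release, long *previous)
{
    if (release <= 0 || s->count + release > s->max)
        return 0;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return 1;
}
```

Starting from a count of 2 with a maximum of 3, two waits succeed, a third would block, and a release of 2 reports a previous count of 0.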

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );
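The difference between manual-reset and auto-reset events is easiest to see in a state model. The sketch below tracks only the signaled flag and the number of blocked waiters; it does not block or schedule anything, PulseEvent is not modeled, and `event_model`, `event_set`, `event_reset`, and `event_wait` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int manual_reset;  /* corresponds to fManualReset */
    int signaled;      /* current state; starts as fInitialState */
    int waiters;       /* threads currently blocked on the event */
} event_model;

/* Models SetEvent: returns how many waiting threads are released. */
int event_set(event_model *e)
{
    int released;
    if (e->manual_reset) {
        released = e->waiters;   /* every waiter passes; stays signaled */
        e->waiters = 0;
        e->signaled = 1;
        return released;
    }
    if (e->waiters > 0) {        /* auto-reset: exactly one waiter runs */
        e->waiters--;
        e->signaled = 0;         /* the signal is consumed by that waiter */
        return 1;
    }
    e->signaled = 1;             /* remembered for the next wait */
    return 0;
}

/* Models ResetEvent. */
void event_reset(event_model *e)
{
    e->signaled = 0;
}

/* Models a thread starting to wait: returns 1 if it passes
 * immediately, 0 if it has to block. */
int event_wait(event_model *e)
{
    if (e->signaled) {
        if (!e->manual_reset)
            e->signaled = 0;     /* auto-reset on a satisfied wait */
        return 1;
    }
    e->waiters++;
    return 0;
}
```

With two waiters, one SetEvent on a manual-reset event releases both and leaves the event signaled, whereas an auto-reset event needs one SetEvent per waiter.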

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server through instances of the same named pipe
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status Functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while (fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers (readers); the application server acts as mailslot client (writer) and sends its status message periodically]

App client 0 ... App client n (mailslot servers):

    hMS = CreateMailslot( "…\mailslot\status" … );
    ReadFile( hMS, &ServStat … );
    /* connect to server */

App server (mailslot client), message is sent periodically:

    while ( … ) {
        Sleep( … );
        hMS = CreateFile( "…\mailslot\status" … );
        WriteFile( hMS, &StatInfo … );
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 63: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

6363

33 Advanced Windows Synchronization

Deferred and Asynchronous Procedure Calls

IRQLs and CPU Time Accounting

Wait Queues amp Dispatcher Objects

6464

Used to defer processing from higher (device) interrupt level to a lower (dispatch) levelndash Also used for quantum end and timer expiration

Driver (usually ISR) queues requestndash One queue per CPU DPCs are normally queued to the current processor but can be targeted to other CPUs

ndash Executes specified procedure at dispatch IRQL (or ldquodispatch levelrdquo also ldquoDPC levelrdquo) when all higher-IRQL work (interrupts) completed

ndash Maximum times recommended ISR 10 usec DPC 25 usec

See httpwwwmicrosoftcomwhdcdriverperformmmdrvmspx

Deferred Procedure Calls (DPCs)

6565

queue head DPC object DPC object DPC object

Deferred Procedure Calls (DPCs)

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1: Dispatcher Objects

[Figure: layout of a dispatcher object: a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"

– some exist exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

7878

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked to dispatcher objects (Size, Type, State, Wait list head, object-type-specific data) through wait blocks; each wait block holds a list entry, Object and Thread pointers, Key, Type and a next link]
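The relationship between dispatcher objects and wait blocks can be sketched with a few structs. The layout below is illustrative only: the type and field names (`dispatcher_object_t`, `wait_block_t`, `wait_satisfied`) are hypothetical and much simpler than the real DDK definitions; it only models the signaled state, the key, the any/all type and the chain of blocks belonging to one wait call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative layout only; the real definitions live in the DDK headers. */
typedef enum { WAIT_TYPE_ANY, WAIT_TYPE_ALL } wait_type_t;

typedef struct dispatcher_object {
    bool signaled;                 /* "signaled" vs. "nonsignaled" */
} dispatcher_object_t;

typedef struct wait_block {
    dispatcher_object_t *object;   /* what the thread is waiting for */
    int key;                       /* position in the handle array */
    wait_type_t type;              /* wait for "any" or "all" */
    struct wait_block *next;       /* next block of the same wait call */
} wait_block_t;

/* Returns the key of the first signaled object (in list order) for an
   "any" wait, 0 for a satisfied "all" wait, and -1 if the wait is not
   yet satisfied. The wait type is read from the first block. */
int wait_satisfied(const wait_block_t *list)
{
    if (list == NULL)
        return -1;
    if (list->type == WAIT_TYPE_ANY) {
        for (const wait_block_t *b = list; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const wait_block_t *b = list; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;
    return 0;
}
```

The return convention mirrors the idea that a wait for any object reports which argument position was satisfied, whereas a wait for all objects only succeeds once every object in the chain is signaled.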

7979

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if an exception occurs */
        LeaveCriticalSection( &crit );
    }
}

DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies the kernel object

– dwTimeout specifies the wait time in msec

– dwTimeout == 0: no wait, only check whether the object is signaled

– dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles: pointer to array identifying these objects

– bWaitAll: whether to wait for the first signaled object or for all objects

– When waiting for any object, the function returns WAIT_OBJECT_0 plus the index of the first signaled object

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to the non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

8686

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads; ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else            /* mutex was abandoned */
        break;      /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block; equivalent to WaitForSingleObject( hMutex, INFINITE )

– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
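The difference between the blocking and the polling call can be sketched with pthreads. The helper names `add_blocking` and `add_if_free` are made up for this illustration; the pthread calls themselves are the standard ones listed above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;

/* Blocking update: waits as long as needed, like a wait with INFINITE. */
void add_blocking(int value)
{
    pthread_mutex_lock(&lock);
    counter += value;
    pthread_mutex_unlock(&lock);
}

/* Polling update: like a wait with timeout 0.
   Returns 1 if the update was applied, 0 if the mutex was not acquired. */
int add_if_free(int value)
{
    int rc = pthread_mutex_trylock(&lock);
    if (rc != 0)            /* EBUSY when another holder has the mutex */
        return 0;
    counter += value;
    pthread_mutex_unlock(&lock);
    return 1;
}
```

A caller of `add_if_free` has to decide what to do when it returns 0, for example retry later or do other work, which is the usual trade-off of polling against blocking.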

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment the count by any positive value, up to the maximum count

– ReleaseSemaphore() reports the old semaphore count through lpPreviousCount

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
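The counting behaviour of a semaphore (a wait decrements by one, a release adds an arbitrary amount and reports the old count) can be modeled portably. The sketch below is not the Windows implementation: it builds a hypothetical `csem_t` on a pthread mutex and condition variable, and its function names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical type: a counting semaphore modeled on the semantics above. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;  /* waiters sleep here while count == 0 */
    long count;               /* "signaled" while count > 0 */
    long max;                 /* upper bound for count */
} csem_t;

void csem_init(csem_t *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Wait: block until count > 0, then decrease it by 1. */
void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Release: add n to the count and store the old count in *previous.
   Returns 0 and changes nothing if n <= 0 or max would be exceeded. */
int csem_release(csem_t *s, long n, long *previous)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

Initializing the count to the number of available resources lets each `csem_wait` claim one resource and each `csem_release` hand back one or several at once.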

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– A manual-reset event can signal several threads simultaneously; it must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– An auto-reset event signals a single thread; the event is reset automatically

– fInitialState == TRUE: create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );

BOOL ResetEvent( HANDLE hEvent );

BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal(): one thread

– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
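Because a condition variable keeps no state of its own, a manual-reset event has to be emulated with an explicit flag. The following is one possible sketch, not a standard API: `mr_event_t`, its functions and the demonstration thread body `waiter_main` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a manual-reset event emulated with a flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Set: stays signaled and releases all current and future waiters. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Reset: must be called explicitly, hence "manual-reset". */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)               /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool mr_event_is_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    bool s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

/* Demonstration thread body: waits for the event, then marks itself released. */
typedef struct { mr_event_t *event; int released; } waiter_t;

void *waiter_main(void *arg)
{
    waiter_t *w = arg;
    mr_event_wait(w->event);
    w->released = 1;
    return NULL;
}
```

The flag is what makes the event "sticky": a thread that starts waiting after `mr_event_set` still passes straight through, which a bare `pthread_cond_broadcast` would not guarantee.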

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

9494

I/O Redirection Using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

9595

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server, each through its own instance of the same named pipe

– Server can respond to a client using the same instance

Pipe can be accessed over the network

– Location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. denotes the local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd.)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into a single call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines the sequence CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic (for writes, only up to PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
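The fixed-size-record advice for FIFOs can be sketched as follows. This is a minimal POSIX illustration under assumptions of its own: the helper `fifo_roundtrip`, the record length of 32 bytes and the FIFO path chosen by the caller are all hypothetical, and error handling is kept to the bare minimum.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 32   /* fixed-size record; the size is an arbitrary choice */

/* Sends one fixed-size record from a child process to the parent through
   a FIFO created at `path`. Returns 0 on success, -1 on error. */
int fifo_roundtrip(const char *path, const char *text, char out[RECORD_LEN])
{
    char record[RECORD_LEN] = {0};
    strncpy(record, text, RECORD_LEN - 1);   /* pad with zero bytes */

    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* child: the writer */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, record, RECORD_LEN);
        close(wfd);
        _exit(n == RECORD_LEN ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);          /* blocks until the writer opens */
    ssize_t got = 0;
    if (rfd >= 0) {
        while (got < RECORD_LEN) {           /* a read may return a partial record */
            ssize_t n = read(rfd, out + got, RECORD_LEN - got);
            if (n <= 0)
                break;
            got += n;
        }
        close(rfd);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_LEN && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because every record has the same length, the reader knows exactly how many bytes make up one message, which is what message mode provides for free on a Windows named pipe.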

104104

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");

    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many comm.)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Figure: each application client (App client 0 … App client n) acts as a mailslot server; the application server acts as the mailslot client and sends its status message periodically]

Mailslot Servers (App client 0 … App client n)

hMS = CreateMailslot( "\\.\mailslot\status" … );
ReadFile( hMS, &ServStat … ); /* connect to server */

Mailslot Client (App Server)

while (…) { Sleep(…);
    hMS = CreateFile( "\\*\mailslot\status" … );
    WriteFile( hMS, &StatInfo … );
}

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0: immediate return

– MAILSLOT_WAIT_FOREVER: infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name: retrieve handle for local mailslot

– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life 意念改变生活


6464

Deferred Procedure Calls (DPCs)

Used to defer processing from higher (device) interrupt level to a lower (dispatch) level

– Also used for quantum end and timer expiration

Driver (usually ISR) queues request

– One queue per CPU; DPCs are normally queued to the current processor, but can be targeted to other CPUs

– Executes specified procedure at dispatch IRQL (or "dispatch level", also "DPC level") when all higher-IRQL work (interrupts) completed

– Maximum times recommended: ISR 10 usec, DPC 25 usec

See http://www.microsoft.com/whdc/driver/perform/mmdrv.mspx

6565

Deferred Procedure Calls (DPCs)

[Figure: DPC queue: a queue head linked to a chain of DPC objects]

6666

Delivering a DPC

DPC routines can call kernel functions but can't call system services, generate page faults, or create or wait on objects

DPC routines can't assume what process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from High and Power failure down through Dispatch/DPC and APC to Low, together with the DPC queue and the dispatcher]

1. Timer expires; kernel queues a DPC that will release all waiting threads. Kernel requests SW int.

2. DPC interrupt occurs when IRQL drops below dispatch/DPC level

3. After DPC interrupt, control transfers to thread dispatcher

4. Dispatcher executes each DPC routine in DPC queue
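The four delivery steps can be mimicked with a toy per-CPU queue. Nothing here is kernel code: `cpu_t`, `queue_dpc`, `lower_irql` and `add_one` are hypothetical names, the queue size is arbitrary, and the dispatch level is taken to be IRQL 2.

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2   /* dispatch/DPC IRQL */
#define MAX_DPCS 8         /* arbitrary queue capacity for the sketch */

typedef void (*dpc_routine_t)(void *context);
typedef struct { dpc_routine_t routine; void *context; } dpc_t;

/* Hypothetical per-CPU state: current IRQL plus a FIFO of pending DPCs. */
typedef struct {
    int irql;
    dpc_t queue[MAX_DPCS];
    size_t head, count;
} cpu_t;

/* Step 1: an ISR (or the kernel) queues a DPC. Returns 0 if the queue is full. */
int queue_dpc(cpu_t *cpu, dpc_routine_t routine, void *context)
{
    if (cpu->count == MAX_DPCS)
        return 0;
    size_t tail = (cpu->head + cpu->count) % MAX_DPCS;
    cpu->queue[tail].routine = routine;
    cpu->queue[tail].context = context;
    cpu->count++;
    return 1;
}

/* Steps 2-4: when the IRQL is about to drop below dispatch level, the
   queued routines run first, at dispatch level, in FIFO order. */
void lower_irql(cpu_t *cpu, int new_irql)
{
    if (new_irql < DISPATCH_LEVEL && cpu->count > 0) {
        cpu->irql = DISPATCH_LEVEL;
        while (cpu->count > 0) {
            dpc_t d = cpu->queue[cpu->head];
            cpu->head = (cpu->head + 1) % MAX_DPCS;
            cpu->count--;
            d.routine(d.context);
        }
    }
    cpu->irql = new_irql;
}

/* Example DPC routine: counts how often it ran. */
void add_one(void *context)
{
    (*(int *)context)++;
}
```

Queuing is cheap and can happen at device IRQL, while the deferred work only runs once the processor is no longer busy with higher-priority interrupts.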

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user thread

– APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs

– Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space

– Wait for asynchronous I/O operation

– Emulate delivery of POSIX signals

– Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when thread is in alertable wait state

– WaitForMultipleObjectsEx(), SleepEx()

6969

Asynchronous Procedure Calls (APCs)

Special kernel APCs

– Run in kernel mode at IRQL 1

– Always deliverable unless thread is already at IRQL 1 or above

– Used for I/O completion reporting from "arbitrary thread context"

– Kernel-mode interface is linkable but not documented

7070

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when thread is in "alertable wait"

7171

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel-mode (K) and user-mode (U), each linking APC objects]

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

– If IRQL < 2: charge to thread's user or kernel time


ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

65

Deferred Procedure Calls (DPCs)

[Figure: the DPC queue, a queue head followed by a chain of DPC objects]

66

Delivering a DPC

DPC routines can call kernel functions, but they cannot call system services, generate page faults, or create or wait on objects

DPC routines cannot assume which process address space is currently mapped

[Figure: interrupt dispatch table with IRQLs from high to low (Power failure, Dispatch/DPC, APC, Low), shown next to the DPC queue and the dispatcher]

1. A timer expires; the kernel queues a DPC that will release all waiting threads, then requests a software interrupt
2. The DPC interrupt occurs when the IRQL drops below dispatch/DPC level
3. After the DPC interrupt, control transfers to the thread dispatcher
4. The dispatcher executes each DPC routine in the DPC queue
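The four delivery steps can be sketched as a toy model in portable C. This is not kernel code: struct cpu, queue_dpc, lower_irql and the queue size are hypothetical names chosen for illustration, and the only behavior taken from the slide is that queued routines run, in order, once the IRQL drops below dispatch/DPC level (assumed here to be level 2).

```c
#include <assert.h>
#include <stddef.h>

#define DISPATCH_LEVEL 2      /* assumed IRQL of the dispatch/DPC level */
#define MAX_DPCS 16           /* hypothetical queue capacity */

typedef void (*dpc_routine)(void *context);

struct dpc { dpc_routine routine; void *context; };

struct cpu {
    int irql;                     /* current interrupt request level */
    struct dpc queue[MAX_DPCS];   /* circular FIFO of pending DPCs */
    size_t head, count;
};

/* Step 1: queue a deferred routine. Returns 0 on success, -1 if full. */
int queue_dpc(struct cpu *c, dpc_routine r, void *ctx) {
    assert(c != NULL && r != NULL);
    if (c->count == MAX_DPCS) return -1;
    size_t tail = (c->head + c->count) % MAX_DPCS;
    c->queue[tail].routine = r;
    c->queue[tail].context = ctx;
    c->count++;
    return 0;
}

/* Steps 2-4: lowering the IRQL below DISPATCH_LEVEL drains the queue
 * in FIFO order. Returns the number of routines that ran. */
int lower_irql(struct cpu *c, int new_irql) {
    int ran = 0;
    assert(c != NULL);
    if (new_irql < DISPATCH_LEVEL) {
        c->irql = DISPATCH_LEVEL;           /* DPCs run at dispatch level */
        while (c->count > 0) {
            struct dpc d = c->queue[c->head];
            c->head = (c->head + 1) % MAX_DPCS;
            c->count--;
            d.routine(d.context);
            ran++;
        }
    }
    c->irql = new_irql;
    return ran;
}

/* Example deferred routine: increments the int it is handed. */
void add_one(void *context) { ++*(int *)context; }
```

Queuing two add_one DPCs while the model is at IRQL 5 and then lowering to 3 runs nothing; lowering to 0 runs both.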

67

Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, and call system services

The APC queue is thread-specific

User-mode & kernel-mode APCs
- Permission is required for user-mode APCs

68

Asynchronous Procedure Calls (APCs)

The executive uses APCs to complete work in thread space
- Wait for an asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make a thread suspend or terminate itself (environment subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

69

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from an "arbitrary thread context"
- The kernel-mode interface is linkable but not documented

70

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User-mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in an "alertable wait"

71

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, kernel (K) and user (U), each a list of APC objects]

72

IRQLs and CPU Time Accounting

The interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to the thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to the thread's kernel time
- If IRQL > 2, charge to interrupt time
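The four accounting rules amount to a small decision function. The sketch below restates them in C; the enum, the function name charge_tick, and the user_mode flag (used to pick between the thread's user and kernel time below IRQL 2) are assumptions made for illustration, not kernel identifiers.

```c
#include <assert.h>

/* Buckets a clock tick can be charged to (names are illustrative). */
enum charge {
    THREAD_USER_TIME,
    THREAD_KERNEL_TIME,
    DPC_TIME,
    INTERRUPT_TIME
};

/* Decide where one clock tick is charged.
 *   irql      - IRQL that was interrupted by the clock ISR
 *   in_dpc    - nonzero if a DPC routine was executing
 *   user_mode - nonzero if the thread was running in user mode */
enum charge charge_tick(int irql, int in_dpc, int user_mode) {
    assert(irql >= 0);
    if (irql > 2)
        return INTERRUPT_TIME;
    if (irql == 2)
        return in_dpc ? DPC_TIME : THREAD_KERNEL_TIME;
    return user_mode ? THREAD_USER_TIME : THREAD_KERNEL_TIME;
}
```

For instance, a tick that lands at IRQL 2 while a DPC is running goes to DPC time, not to the interrupted thread.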

73

IRQLs and CPU Time Accounting

Time spent servicing interrupts is NOT charged to the interrupted thread. If the system is busy but no process appears to be running, the load must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread or process
- Process Explorer shows these as separate processes (they are not really processes)
- Context switches reported for these are really counts of interrupts & DPCs

75

Time Accounting Quirks

Looking at the total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exist exclusively for synchronization, e.g., events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g., processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- the wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout, a common header (Size, Type, State, wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

78

Wait Internals 2: Wait Blocks

Wait blocks represent a thread's reference to something it is waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates a wait for "any" or "all"; Key denotes the argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, linked through wait blocks to dispatcher objects (Size, Type, State, wait list head, object-type-specific data); each wait block holds a list entry, Object and Thread pointers, Key, Type, and a next link]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

81

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        /* Enter before __try so __finally always pairs with a successful enter */
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* Exactly one Leave per Enter, even if the body raises an exception */
            LeaveCriticalSection( &crit );
        }
    }
    DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles,
                                  BOOL bWaitAll, DWORD dwTimeout );

83

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  - dwTimeout == 0: no wait, just check whether the object is signaled
  - dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  - When bWaitAll is FALSE, the function returns WAIT_OBJECT_0 plus the index of the first signaled object

Side effects
- Mutexes, auto-reset events, and waitable timers are reset to the non-signaled state when a wait on them completes
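The difference between waiting for any object and waiting for all objects can be sketched with a zero-timeout poll over an array of flags. This is a portable model, not the Win32 implementation: poll_objects, MODEL_WAIT_OBJECT_0 and MODEL_WAIT_TIMEOUT are hypothetical names, and the model ignores the side effects on mutexes, events and timers listed above.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_WAIT_OBJECT_0 0      /* stand-in for WAIT_OBJECT_0 */
#define MODEL_WAIT_TIMEOUT  (-1)   /* stand-in for WAIT_TIMEOUT */

/* Poll `count` objects once (as with a timeout of 0).
 * signaled[i] != 0 means object i is currently signaled.
 *   wait_all == 0: return MODEL_WAIT_OBJECT_0 + index of the first
 *                  signaled object, or MODEL_WAIT_TIMEOUT if none is.
 *   wait_all != 0: return MODEL_WAIT_OBJECT_0 only if every object is
 *                  signaled, otherwise MODEL_WAIT_TIMEOUT. */
int poll_objects(const int *signaled, size_t count, int wait_all) {
    size_t first = count;       /* index of first signaled object */
    size_t n_signaled = 0;
    assert(signaled != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        if (signaled[i]) {
            if (first == count) first = i;
            n_signaled++;
        }
    }
    if (wait_all)
        return n_signaled == count ? MODEL_WAIT_OBJECT_0 : MODEL_WAIT_TIMEOUT;
    return first < count ? MODEL_WAIT_OBJECT_0 + (int)first
                         : MODEL_WAIT_TIMEOUT;
}
```

With flags {0, 1, 1}, a wait-any poll reports index 1, while a wait-all poll times out until object 0 is signaled as well.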

84

Mutexes

Mutexes work across processes

The first thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() frees the mutex object once the last handle to it is closed

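85

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner,
                        LPCTSTR lpszMutexName );
    HANDLE OpenMutex( DWORD fdwAccess, BOOL fInherit,
                      LPCTSTR lpszMutexName );
    BOOL ReleaseMutex( HANDLE hMutex );

Note: OpenMutex takes a desired-access mask and an inherit flag, not the security attributes and initial-owner flag of CreateMutex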

86

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0) {
            counter += local_value;
            ReleaseMutex( mutex );
        } else {
            /* mutex was abandoned: this thread now owns it, but the
               shared data may be inconsistent */
            if (ret == WAIT_ABANDONED)
                ReleaseMutex( mutex );
            break;   /* exit the loop */
        }
    }
    CloseHandle( mutex );

87

Comparison - POSIX mutexes

The POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
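As a POSIX counterpart to the Windows mutex example, the following minimal sketch protects a shared counter with a pthread mutex. The names worker, run_workers, try_once and the thread and iteration counts are illustrative assumptions; only the pthread_* calls come from the list above.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4        /* illustrative thread count */
#define N_ITERS   10000    /* increments per thread */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;   /* shared by all threads */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        pthread_mutex_lock(&lock);     /* blocks until the mutex is free */
        counter++;
        pthread_mutex_unlock(&lock);   /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Starts the workers, joins them, and returns the final counter. */
long run_workers(void) {
    pthread_t t[N_THREADS];
    counter = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}

/* Polling acquire: returns 1 if the lock was free (and releases it
 * again), 0 if another thread currently holds it. */
int try_once(void) {
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;
    pthread_mutex_unlock(&lock);
    return 1;
}
```

Because every increment happens under the lock, run_workers() always yields N_THREADS * N_ITERS; without the lock the result could be lower.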

88

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value, up to the semaphore's maximum
- ReleaseSemaphore() reports the old semaphore count through lpPreviousCount
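The counting rules can be sketched as a single-threaded model. This is not the Win32 semaphore: struct sem_model, model_sem_try_wait and model_sem_release are hypothetical names, there is no blocking, and the model only mirrors the count arithmetic (a wait takes one unit, a release adds n units unless that would exceed the maximum, and the previous count is reported).

```c
#include <assert.h>
#include <stddef.h>

/* Count-only model of a semaphore with an upper bound. */
struct sem_model { long count; long max; };

/* Zero-timeout wait: succeeds (returns 1) and takes one unit only
 * while the semaphore is signaled, i.e. count > 0. */
int model_sem_try_wait(struct sem_model *s) {
    assert(s != NULL);
    if (s->count <= 0) return 0;
    s->count--;
    return 1;
}

/* Release n units. Fails (returns 0) and leaves the count unchanged if
 * n is not positive or the result would exceed max; otherwise stores
 * the previous count in *previous (if non-NULL) and returns 1. */
int model_sem_release(struct sem_model *s, long n, long *previous) {
    assert(s != NULL);
    if (n <= 0 || s->count + n > s->max) return 0;
    if (previous != NULL) *previous = s->count;
    s->count += n;
    return 1;
}
```

Starting from count 2 and maximum 3, two waits succeed and a third fails; releasing 2 units then reports a previous count of 0.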

89

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit,
                            LONG cSemMax, LPCTSTR lpszSemName );
    HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit,
                          LPCTSTR lpszSemName );
    BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount,
                           LPLONG lpPreviousCount );

Note: OpenSemaphore takes a desired-access mask and an inherit flag; ReleaseSemaphore returns a BOOL

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() releases all threads waiting on a manual-reset event and automatically resets the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

91

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset,
                        BOOL fInitialState, LPCTSTR lpszEventName );
    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
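Since pthreads has no direct equivalent, a manual-reset event is usually emulated with a mutex, a condition variable, and a flag. The sketch below is one such emulation under that assumption; struct manual_event and the event_* functions are hypothetical names, not part of either API.

```c
#include <assert.h>
#include <pthread.h>

/* Manual-reset event emulated with a flag guarded by a mutex. */
struct manual_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int signaled;            /* stays 1 until event_reset() */
};

void event_init(struct manual_event *e, int initial_state) {
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: wake every waiter. */
void event_set(struct manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(struct manual_event *e) {
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void event_wait(struct manual_event *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)     /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(struct manual_event *e) {
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event "sticky": a thread that calls event_wait() after event_set() returns immediately, which a bare condition variable would not guarantee.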

93

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite,
                     LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

A read on the pipe handle will block if the pipe is empty

A write operation to a full pipe will block

Anonymous pipes are one-way
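For comparison, the POSIX pipe() call creates the same kind of one-way byte channel with a read end and a write end. The sketch below uses that POSIX analog rather than CreatePipe; pipe_round_trip is a hypothetical helper that assumes the message is small enough to fit in the pipe buffer.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Sends msg through a fresh pipe and reads it back into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size) {
    int fds[2];              /* fds[0] = read end, fds[1] = write end */
    ssize_t n = -1;
    size_t len;
    assert(msg != NULL && out != NULL && out_size > 0);
    if (pipe(fds) != 0)
        return -1;
    len = strlen(msg);
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);       /* closing the write end signals end-of-data */
        fds[1] = -1;
        n = read(fds[0], out, out_size - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    if (fds[1] != -1)
        close(fds[1]);
    close(fds[0]);
    return n;
}
```

Data flows only from the write end to the read end; two-way communication would need a second pipe.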

94

I/O Redirection using an Anonymous Pipe

    /* Create default-size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

95

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

96

Named Pipes

Message oriented
- The reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- The server can respond to a client using the same instance

The pipe can be accessed over the network
- location transparency

Convenience and connection functions

97

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode,
                            DWORD fdwPipeMode, DWORD nMaxInstances,
                            DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut,
                            LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (the leading \\. denotes the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

98

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: the OS chooses based on available resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

The first CreateNamedPipe call creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer,
                        LPDWORD lpcbRead, LPDWORD lpcbAvail,
                        LPDWORD lpcbMessage );

99

Named Pipe Client Connections

CreateFile with the named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- The first method gives better performance (local server)

Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo
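The two name forms differ only in the machine component, so a client can build either from one helper. The function below is a hypothetical convenience written for illustration (format_pipe_name is not a Windows API); it only formats the string that would then be passed to CreateFile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "\\<server>\pipe\<name>" in out. A NULL server selects the
 * local form "\\.\pipe\<name>". Returns 0 on success, -1 if the
 * result does not fit in out_size bytes. */
int format_pipe_name(char *out, size_t out_size,
                     const char *server, const char *name) {
    int n;
    assert(out != NULL && name != NULL);
    n = snprintf(out, out_size, "\\\\%s\\pipe\\%s",
                 server != NULL ? server : ".", name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Keeping the local form as the default matches the note above that a local connection performs better than going through the server name.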

100

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe,
                            LPVOID lpvWriteBuf, DWORD cbWriteBuf,
                            LPVOID lpvReadBuf, DWORD cbReadBuf,
                            LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into one call

101

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName,
                        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
                        LPVOID lpvReadBuf, DWORD cbReadBuf,
                        LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

With lpo == NULL
- The call returns as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- The client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it is easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
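Because a FIFO is a byte stream with no message boundaries, fixed-size records give both sides an agreed framing. The sketch below shows one way to do that on any POSIX descriptor (a FIFO opened after mkfifo(), or an ordinary pipe); RECORD_SIZE, write_record and read_record are assumed names for illustration.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define RECORD_SIZE 64   /* hypothetical fixed record length in bytes */

/* Writes text as one zero-padded, NUL-terminated record (longer text
 * is truncated). Returns 0 on success, -1 on error. */
int write_record(int fd, const char *text) {
    char rec[RECORD_SIZE] = {0};
    assert(text != NULL);
    strncpy(rec, text, RECORD_SIZE - 1);
    return write(fd, rec, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}

/* Reads exactly one record, looping because read() on a byte stream
 * may return fewer bytes than requested. Returns 0 on success, -1 on
 * end-of-file or error. */
int read_record(int fd, char rec[RECORD_SIZE]) {
    size_t got = 0;
    while (got < RECORD_SIZE) {
        ssize_t n = read(fd, rec + got, RECORD_SIZE - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}
```

A Windows named pipe created with PIPE_TYPE_MESSAGE does not need this framing, since each ReadFile returns one message as written.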

104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is %s\n", RequestRecord);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open fails if there are no waiting readers
- A client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- The client will connect to every server in the network domain

107

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically. Argument lists are abbreviated as on the slide.]

App client 0 ... App client n (mailslot servers):

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

App server (mailslot client); the message is sent periodically:

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        WriteFile( hMS, &StatInfo );
    }

108

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg,
                           DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- The name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- A read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

109

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieves a handle for a local mailslot
- \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; the maximum message length is 400 bytes
- The client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life

6666

DPC

Delivering a DPC

DPC routines can call kernel functionsbut canlsquot call system services generatepage faults or create or wait on objects

DPC routines canlsquotassume whatprocess addressspace is currentlymapped

Interruptdispatch table

high

Power failure

DispatchDPC

APC

Low

DPC

1 Timer expires kernelqueues DPC that willrelease all waiting threadsKernel requests SW int

DPCDPC

DPC queue

2 DPC interrupt occurswhen IRQL drops belowdispatchDPC level

dispatcher

3 After DPC interruptcontrol transfers tothread dispatcher

4 Dispatcher executes each DPCroutine in DPC queue

6767

Asynchronous Procedure Calls (APCs)

Execute code in context of a particular user threadndash APC routines can acquire resources (objects) incur page faultscall system services

APC queue is thread-specific

User mode amp kernel mode APCsndash Permission required for user mode APCs

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0) {
        counter += local_value;
    } else {                            /* mutex was abandoned (or the wait failed) */
        if (ret == WAIT_ABANDONED)
            ReleaseMutex( mutex );      /* an abandoned mutex is still owned by the waiter */
        break;                          /* exit the loop */
    }
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

The POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
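
A minimal pthreads sketch of the blocking and polling calls listed above, mirroring the shared-counter loop of the Win32 mutex example. The names add_many(), run_workers(), and try_while_held() are made up for this illustration, and a default (non-recursive) mutex is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;              /* shared by all worker threads */

/* Worker: add 1 to the shared counter n times, holding the mutex each time. */
static void *add_many(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&lock);    /* blocks, like a wait with INFINITE */
        counter++;
        pthread_mutex_unlock(&lock);  /* like ReleaseMutex() */
    }
    return NULL;
}

/* Start nthreads workers, wait for all of them, return the final counter. */
long run_workers(int nthreads, long per_thread)
{
    pthread_t tid[16];
    assert(nthreads >= 1 && nthreads <= 16);
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, add_many, &per_thread);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polling: trylock on a mutex that is already locked returns EBUSY at once. */
int try_while_held(void)
{
    pthread_mutex_t m;
    int rc;
    pthread_mutex_init(&m, NULL);
    pthread_mutex_lock(&m);
    rc = pthread_mutex_trylock(&m);   /* like a wait with timeout == 0 */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Because every increment happens under the mutex, run_workers(4, 10000) should return exactly 40000; without the lock the result could be lower.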

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each wait function decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value (up to the maximum count)
- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
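
The counting rules above can be modelled in a few lines of plain C. This is a single-threaded model of the bookkeeping only, not the Win32 object: sem_model, sem_try_wait(), and sem_release() are hypothetical names, and there is no blocking or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    long count;   /* current count; "signaled" while count > 0 */
    long max;     /* upper bound fixed at creation time */
} sem_model;

/* A satisfied wait decreases the count by 1; with count == 0 the wait
   cannot be satisfied (a real wait function would block or time out). */
bool sem_try_wait(sem_model *s)
{
    assert(s != NULL);
    if (s->count > 0) {
        s->count--;
        return true;
    }
    return false;
}

/* Release n units. The count is left unchanged and false is returned if
   the release would exceed the maximum; otherwise the previous count is
   reported through `previous` (which may be NULL). */
bool sem_release(sem_model *s, long n, long *previous)
{
    assert(s != NULL);
    if (n <= 0 || n > s->max - s->count)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += n;
    return true;
}
```

With an initial count of 2 and a maximum of 3, two waits succeed, a third fails, and a release of 2 reports a previous count of 0.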

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
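
Since pthreads has no direct counterpart, a manual-reset event is usually emulated with a mutex, a condition variable, and a flag. The sketch below shows one way this could look; the mr_event type and its functions are hypothetical names chosen for this example, not a standard API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until explicitly reset */
} mr_event;

void mr_event_init(mr_event *e, bool initial_state)
{
    int rc = pthread_mutex_init(&e->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&e->cond, NULL);
    assert(rc == 0);
    (void)rc;
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: release every waiter. */
void mr_event_set(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Like a wait with INFINITE: returns once the flag is set, leaves it set. */
void mr_event_wait(mr_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

bool mr_event_is_set(mr_event *e)
{
    bool s;
    pthread_mutex_lock(&e->lock);
    s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void mr_event_destroy(mr_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the state persistent: a bare pthread_cond_broadcast() only affects threads that are already waiting, whereas a set manual-reset event also lets later waiters through.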

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Diagram: main, prog1 and prog2, connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput  = hReadPipe;
StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
- The reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- The server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe ( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (the "." stands for the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe ( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with the named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe ( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
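
The fixed-size-record advice can be illustrated with a small POSIX sketch. To stay self-contained it opens both ends of the FIFO in one process; a real client/server pair would open them in separate processes. The record type, RECORD_LEN, and fifo_round_trip() are hypothetical names used only here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_LEN 32                       /* well below PIPE_BUF, so writes are atomic */

typedef struct { char text[RECORD_LEN]; } record;

/* Create a FIFO at `path`, send one fixed-size record through it and read
   it back into `out`. Returns 0 on success, -1 on failure. */
int fifo_round_trip(const char *path, const char *msg, record *out)
{
    record in;
    int rfd, wfd, ok = -1;

    assert(path != NULL && msg != NULL && out != NULL);
    memset(&in, 0, sizeof in);
    strncpy(in.text, msg, RECORD_LEN - 1);  /* buffer was zeroed, so it stays terminated */

    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Open the read end non-blocking first, so open() does not wait for a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    wfd = (rfd >= 0) ? open(path, O_WRONLY) : -1;
    if (wfd >= 0) {
        if (write(wfd, &in, sizeof in) == (ssize_t)sizeof in &&
            read(rfd, out, sizeof *out) == (ssize_t)sizeof *out)
            ok = 0;
        close(wfd);
    }
    if (rfd >= 0)
        close(rfd);
    unlink(path);                           /* the FIFO name does not go away by itself */
    return ok;
}
```

Because every message has the same size, the reader never has to search the byte stream for message boundaries, which is what a Windows message-mode named pipe provides for free.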

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

Mailslot Servers (App client 0 ... App client n), each running:

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot Client (App Server):

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for the local mailslot
- \\host\mailslot\[path]name - retrieve handle for the mailslot on the specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


Asynchronous Procedure Calls (APCs)

Execute code in the context of a particular user thread
- APC routines can acquire resources (objects), incur page faults, call system services

APC queue is thread-specific

User mode & kernel mode APCs
- Permission required for user mode APCs

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread space
- Wait for asynchronous I/O operation
- Emulate delivery of POSIX signals
- Make threads suspend/terminate themselves (env. subsystems)

APCs are delivered when the thread is in an alertable wait state
- WaitForMultipleObjectsEx(), SleepEx()

Asynchronous Procedure Calls (APCs)

Special kernel APCs
- Run in kernel mode, at IRQL 1
- Always deliverable unless the thread is already at IRQL 1 or above
- Used for I/O completion reporting from "arbitrary thread context"
- Kernel-mode interface is linkable, but not documented

Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs
- Always deliverable if at IRQL 0, unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs
- Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC
- Only deliverable when the thread is in "alertable wait"

Asynchronous Procedure Calls (APCs)

[Diagram: a thread object with a kernel-mode (K) and a user-mode (U) APC queue, each linking to APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
- If IRQL < 2, charge to the thread's user or kernel time
- If IRQL = 2 and processing a DPC, charge to DPC time
- If IRQL = 2 and not processing a DPC, charge to thread kernel time
- If IRQL > 2, charge to interrupt time
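
The four accounting rules can be written down as a small decision function. This is an illustrative model, not kernel code: charge_tick(), the charge_target enum, and the in_dpc and user_mode parameters are assumptions introduced for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_target;

/* Decide which bucket one clock tick is charged to.
   irql      - IRQL that was interrupted by the clock ISR
   in_dpc    - true if a DPC was being processed (only relevant at IRQL 2)
   user_mode - true if the interrupted thread was running in user mode */
charge_target charge_tick(int irql, bool in_dpc, bool user_mode)
{
    assert(irql >= 0);
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

Note that only the case IRQL = 2 depends on whether a DPC is running; above IRQL 2 the tick always goes to interrupt time.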

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
- Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
- Process Explorer shows these as separate processes (not really processes)
- Context switches for these are really the number of interrupts & DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
- Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
- E.g., one or more threads run and enter a wait state before the clock fires
- Thus threads may run but never get charged

View context switch activity with Process Explorer
- Add the Context Switch Delta column
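
The quirk can be reproduced with a toy simulation of tick-based sampling. The model assumes that the clock fires every tick_ms and charges a whole tick to the thread only if it happens to be running at that instant. The function charged_ticks() and its parameters are hypothetical.

```c
#include <assert.h>

/* A thread runs during [start_ms, start_ms + run_ms) of every period of
   period_ms milliseconds. The clock fires at tick_ms, 2 * tick_ms, ... up
   to total_ms and charges a tick to the thread only if it is running at
   that instant. Returns the number of ticks charged. */
long charged_ticks(long total_ms, long tick_ms, long period_ms,
                   long start_ms, long run_ms)
{
    long charged = 0;
    assert(tick_ms > 0 && period_ms > 0 && start_ms >= 0 && run_ms >= 0);
    for (long t = tick_ms; t <= total_ms; t += tick_ms) {
        long phase = t % period_ms;          /* position inside the current period */
        if (phase >= start_ms && phase < start_ms + run_ms)
            charged++;
    }
    return charged;
}
```

With a 10 msec tick, a thread that runs from 1 to 9 msec of every 10 msec period is on the CPU 80% of the time yet is never charged, while a thread that runs only during the first millisecond of each period is charged for every tick.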

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Diagram: dispatcher object layout: a header with Size, Type, State, and Wait listhead, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Diagram: thread objects, each with a WaitBlockList, chained to wait blocks (fields: List entry, Object, Thread, Key, Type, Next link); each wait block refers to a dispatcher object (Size, Type, State, Wait listhead, object-type-specific data)]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection( &crit );

/* ... main loop in any of the threads */
while (!done) {
    EnterCriticalSection( &crit );
    __try {
        counter += local_value;
    }
    __finally {                         /* leave exactly once, even on an exception */
        LeaveCriticalSection( &crit );
    }
}
DeleteCriticalSection( &crit );


  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 68: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

6868

Asynchronous Procedure Calls (APCs)

Executive uses APCs to complete work in thread spacendash Wait for asynchronous IO operation

ndash Emulate delivery of POSIX signals

ndash Make threads suspendterminate itself (env subsystems)

APCs are delivered when thread is in alertable wait statendash WaitForMultipleObjectsEx() SleepEx()

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts and DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g., one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add Context Switch Delta column
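A small simulation makes the quirk concrete. It is a sketch under stated assumptions: the run_t type and ticks_charged function are hypothetical, the schedule is given as explicit run intervals, and whoever is on the CPU when the tick fires is charged the whole period.

```c
#include <stddef.h>

/* One interval during which a (hypothetical) thread occupied the CPU. */
typedef struct {
    int thread_id;
    int start_ms;   /* inclusive */
    int end_ms;     /* exclusive */
} run_t;

/*
 * Count the clock ticks charged to thread_id. A tick fires at every
 * multiple of period_ms up to total_ms; the thread running at that
 * instant is charged, no matter who ran in between.
 */
int ticks_charged(const run_t *runs, size_t n, int thread_id,
                  int period_ms, int total_ms)
{
    int charged = 0;
    for (int t = period_ms; t <= total_ms; t += period_ms) {
        for (size_t i = 0; i < n; i++) {
            if (runs[i].thread_id == thread_id &&
                runs[i].start_ms <= t && t < runs[i].end_ms) {
                charged++;
                break;
            }
        }
    }
    return charged;
}
```

With a 10 msec period and a schedule in which thread 1 runs during 1..5, 11..15 and 21..25 msec while thread 2 runs during 5..11, 15..21 and 25..31 msec, thread 1 is never on the CPU at a tick and is charged nothing, although it ran for 12 msec.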

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

[Figure: dispatcher object layout. Header with Size, Type, State and wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: thread objects, each with a WaitBlockList, point to chains of wait blocks (fields: Key, Type, Next link, List entry, Object, Thread); each wait block is also queued on the wait list head of a dispatcher object (Size, Type, State, wait list head, object-type-specific data)]
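The relationship between dispatcher objects and wait blocks can be modeled in a few lines of C. This is a deliberately reduced sketch, not the layout from the DDK headers: the type and field names are hypothetical, the header is cut down to its state bit, and the list entry that queues a block on its object is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified dispatcher header: only the signaled state is modeled. */
typedef struct {
    bool signaled;
} dispatcher_t;

typedef enum { WAIT_ANY, WAIT_ALL } wait_type_t;

/* One wait block per handle passed to the wait call. */
typedef struct wait_block {
    dispatcher_t      *object;  /* object being waited for */
    int                key;     /* position in the argument list */
    wait_type_t        type;    /* wait for "any" or "all" */
    struct wait_block *next;    /* next block of the same wait call */
} wait_block_t;

/*
 * Check whether the wait described by the chain is satisfied.
 * Returns the key of the first signaled object for WAIT_ANY,
 * 0 for a satisfied WAIT_ALL, and -1 if the thread must keep waiting.
 */
int wait_satisfied(const wait_block_t *head)
{
    if (head == NULL)
        return -1;
    if (head->type == WAIT_ANY) {
        for (const wait_block_t *b = head; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const wait_block_t *b = head; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;
    return 0;
}
```

The key plays the role the slide describes: for a wait-any it identifies which argument position satisfied the wait.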

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted, but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

    VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
    VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );
    VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
    VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
    BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;

    InitializeCriticalSection( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection( &crit );
        __try {
            counter += local_value;
        }
        __finally {
            /* leave exactly once per enter, even if an exception occurs */
            LeaveCriticalSection( &crit );
        }
    }

    DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

    DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );
    DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles,
                                  BOOL bWaitAll, DWORD dwTimeout );

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0: no wait, check whether object is signaled
  dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles: pointer to array identifying these objects
– bWaitAll: whether to wait for first signaled object or all objects
  Function returns index of first signaled object

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

    HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa,
                        BOOL fInitialOwner, LPCTSTR lpszMutexName );
    HANDLE OpenMutex( DWORD fdwAccess,
                      BOOL fInherit, LPCTSTR lpszMutexName );
    BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else            /* mutex was abandoned */
            break;      /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
– pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
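The POSIX side of the comparison can be illustrated with the shared-counter example rewritten for pthreads. This is a minimal sketch: the helper names worker, run_counter and try_increment are made up for illustration, and error checking of the pthread calls is omitted for brevity.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;   /* shared by all threads */

/* Thread body: add 1 to the shared counter `iterations` times. */
static void *worker(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&lock);      /* blocks, like an INFINITE wait */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run up to 16 workers and return the final counter value. */
int run_counter(int nthreads, int iterations)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Nonblocking attempt: returns 1 if the lock was free, 0 if it was busy. */
int try_increment(void)
{
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;                       /* like a wait with timeout 0 */
    counter++;
    pthread_mutex_unlock(&lock);
    return 1;
}
```

Because every increment happens under the lock, run_counter(4, 10000) returns 40000 regardless of how the threads interleave.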

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)

Semaphores

    HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa,
        LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );
    HANDLE OpenSemaphore( DWORD fdwAccess,
        BOOL fInherit, LPCTSTR lpszSemName );
    BOOL ReleaseSemaphore( HANDLE hSemaphore,
        LONG cReleaseCount, LPLONG lpPreviousCount );
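The counting behaviour described for semaphores can be modeled with a mutex and a condition variable. The sketch below is not the Windows implementation: csem_t and its functions are hypothetical names, and POSIX threads are used only so the model also runs outside Windows. It mirrors three properties: a wait takes one unit, a release may add several, and a release reports the previous count.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    long count;     /* "signaled" while count > 0 */
    long max;
} csem_t;

void csem_init(csem_t *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Blocking wait: each successful wait decreases the count by 1. */
void csem_wait(csem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Wait with timeout 0: returns false instead of blocking. */
bool csem_trywait(csem_t *s)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/*
 * Add n to the count and report the previous count through *previous.
 * Fails, leaving the count unchanged, if n <= 0 or if the maximum
 * would be exceeded.
 */
bool csem_release(csem_t *s, long n, long *previous)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);  /* several waiters may proceed */
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

A semaphore created with an initial count of 2 and a maximum of 5 admits two waits, refuses a third with timeout 0, and accepts a release of 3 that reports a previous count of 0.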

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

    HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa,
        BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );
    BOOL SetEvent( HANDLE hEvent );
    BOOL ResetEvent( HANDLE hEvent );
    BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

Pthreads condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
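Although there is no exact equivalent, a manual-reset event can be approximated by pairing a condition variable with an explicit flag. The sketch below is one possible emulation under that assumption; mr_event_t and its functions are hypothetical names, not part of pthreads or of the Windows API.

```c
#include <pthread.h>
#include <stdbool.h>

/* A condition variable carries no state, so the flag holds it. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signaled;
} mr_event_t;

void mr_event_init(mr_event_t *e, bool initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Counterpart of SetEvent(): wake all waiters and stay signaled. */
void mr_event_set(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Counterpart of ResetEvent(): later waiters block again. */
void mr_event_reset(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

/* Counterpart of an INFINITE wait: returns once the flag is set. */
void mr_event_wait(mr_event_t *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}
```

The flag is what makes the difference: a thread that calls mr_event_wait() after mr_event_set() returns at once, whereas a bare pthread_cond_wait() issued after the broadcast would miss it.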

Anonymous pipes

    BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite,
                     LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput  = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (cont'd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput  = hReadPipe;
    StartInfoCh2.hStdError  = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags    = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using distinct instances of the same named pipe
– Server can respond to client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

Using Named Pipes

    HANDLE CreateNamedPipe( LPCTSTR lpszPipeName,
        DWORD fdwOpenMode, DWORD fdwPipeMode,
        DWORD nMaxInstances, DWORD cbOutBuf,
        DWORD cbInBuf, DWORD dwTimeOut,
        LPSECURITY_ATTRIBUTES lpsa );

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Use same flag settings for all instances of a named pipe

Named Pipes (cont'd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

    BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer,
        LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

    BOOL TransactNamedPipe( HANDLE hNamedPipe,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, LPOVERLAPPED lpa );

WriteFile, ReadFile sequence

Convenience Functions

    BOOL CallNamedPipe( LPCTSTR lpszPipeName,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, DWORD dwTimeOut );

CreateFile, WriteFile, ReadFile, CloseHandle
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

    BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe() (GetLastError() then reports ERROR_PIPE_CONNECTED)

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
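The fixed-size-record advice for FIFOs can be tried out with a short POSIX program fragment. It is a sketch under stated assumptions: fifo_roundtrip, RECORD_SIZE and the single forked writer are choices made for this illustration, and error handling is reduced to a success/failure result.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 32  /* fixed record size, an assumption of this sketch */

/*
 * Create a FIFO at `path`, fork a writer that sends one fixed-size
 * record, and read that record back in the parent.
 * Returns 0 on success and -1 on any failure.
 */
int fifo_roundtrip(const char *path, const char *msg, char out[RECORD_SIZE])
{
    char record[RECORD_SIZE] = {0};
    strncpy(record, msg, RECORD_SIZE - 1);   /* pad the message to a record */

    unlink(path);                            /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* child: the writing client */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, record, RECORD_SIZE);
        close(wfd);
        _exit(n == RECORD_SIZE ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);          /* blocks until the writer opens */
    ssize_t got = 0;
    if (rfd >= 0) {
        /* Byte stream: loop until one whole record has arrived. */
        while (got < RECORD_SIZE) {
            ssize_t n = read(rfd, out + got, RECORD_SIZE - got);
            if (n <= 0)
                break;
            got += n;
        }
        close(rfd);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_SIZE && WIFEXITED(status)
            && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because the FIFO delivers a byte stream rather than messages, the reader relies on the agreed record size to know where one request ends; a message-mode named pipe on Windows would preserve that boundary by itself.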

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", Response.Record);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while (fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL)
            WriteFile (hNamedPipe, &Response.Record,
                    (strlen(Response.Record) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: application clients 0..n act as mailslot servers; the application server acts as mailslot client. Argument lists are abbreviated.]

Mailslot servers (App client 0 ... App client n)

    hMS = CreateMailslot( "\\.\mailslot\status" );
    ReadFile( hMS, &ServStat );
    /* connect to server */

Mailslot client (App server)

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" );
        ...
        WriteFile( hMS, &StatInfo );
    }

Message is sent periodically

Creating a mailslot

    HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg,
        DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 69: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

6969

Special kernel APCs

ndash Run in kernel mode at IRQL 1

ndash Always deliverable unless thread is already at IRQL 1 or above

ndash Used for IO completion reporting from ldquoarbitrary thread contextrdquo

ndash Kernel-mode interface is linkable but not documented

Asynchronous Procedure Calls (APCs)

7070

ldquoOrdinaryrdquo kernel APCs

ndash Always deliverable if at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

ndash Used for IO completion callback routines (see ReadFileEx WriteFileEx) also QueueUserApc

ndash Only deliverable when thread is in ldquoalertable waitrdquo

Asynchronous Procedure Calls (APCs)

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

9898

Named Pipes (contd)

BOOL PeekNamedPipe (HANDLE hPipeLPVOID lpvBuffer DWORD cbBufferLPDWORD lpcbRead LPDWORD lpcbAvailLPDWORD lpcbMessage)

nMaxInstances

ndash Number of instances

ndash PIPE_UNLIMITED_INSTANCES OS choice based on resources

dwTimeOut

ndash Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

ndash Closing handle to last instance deletes named pipe

Polling a pipe

ndash Nondestructive ndash is there a message waiting for ReadFile

9999

Named Pipe Client Connections

CreateFile with named pipe namendash pipe[path]pipename

ndash servernamepipe[path]pipename

ndash First method gives better performance (local server)

Status Functionsndash GetNamedPipeHandleState

ndash SetNamedPipeHandleState

ndash GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipeLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDOWRD lpcbRead LPOVERLAPPED lpa)

WriteFile ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeNameLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDWORD lpcbRead DWORD dwTimeOut)

CreateFile WriteFile ReadFile CloseHandle

ndash dwTimeOut NMPWAIT_NOWAIT NMPWAIT_WIAT_FOREVER NMPWAIT_USE_DEFAULT_WAIT

102102

Server eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipeLPOVERLAPPED lpo

lpo == NULLndash Call will return as soon as there is a client connection

ndash Returns false if client connected between CreateNamed Pipe calland ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()ndash Client may wait for serverlsquos ConnectNamedPipe()

Security rights for named pipesndash GENERIC_READ GENERIC_WRITE SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

ndash FIFOs are half-duplex

ndash FIFOs are limited to a single machine

ndash FIFOs are still byte-oriented so its easiest to use fixed-size records in clientserver applications

ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf (stderr, "Failure to locate server\n"); exit (3);
}
/* Write the request. */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out. */
while (ReadFile (hNamedPipe, Response.Record, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", Response.Record);
CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
    }
    printf ("Request is %s\n", Request.Record);
    /* Send the file, one line at a time, to the client. */
    fp = fopen (File, "r");
    while ((fgets (Response.Record, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &Response.Record,
            (strlen (Response.Record) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism:

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot:

– Each reader (server) creates a mailslot with CreateMailslot()

– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: App clients 0 … n act as mailslot servers. Each one creates a mailslot named "status" with CreateMailslot(), reads the server status with ReadFile(hMS, &ServStat), and then connects to the server. The app server acts as the mailslot client: in a loop it sleeps, opens the "status" mailslot with CreateFile(), and writes its status with WriteFile(hMS, &StatInfo …). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form:

– \\.\mailslot\[path]name

– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout:

– Read operation will wait for this many msec

– 0 – immediate return

– MAILSLOT_WAIT_FOREVER – infinite wait

Opening a mailslot

CreateFile with the following names:

– \\.\mailslot\[path]name - retrieve handle for local mailslot

– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes

– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts
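The name forms passed to CreateFile differ only in the scope component, which a small helper makes explicit. This is a portable sketch that only formats the string; mailslot_name is a hypothetical helper, not a Windows API, and it does not open anything.

```c
#include <stddef.h>
#include <stdio.h>

/* Builds "\\<scope>\mailslot\<name>" into buf.
 * scope is "." (local machine), a host name, a domain name,
 * or "*" (primary domain).
 * Returns 0 on success, -1 if the buffer is too small. */
int mailslot_name(char *buf, size_t size, const char *scope, const char *name)
{
    int n = snprintf(buf, size, "\\\\%s\\mailslot\\%s", scope, name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A reader would typically pass the "." form to CreateMailslot(), while a writer that wants to reach every reader in its domain would pass the "*" form to CreateFile().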

Thoughts Change Life意念改变生活


Asynchronous Procedure Calls (APCs)

"Ordinary" kernel APCs

– Always deliverable at IRQL 0 unless explicitly disabled (disable with KeEnterCriticalRegion)

User mode APCs

– Used for I/O completion callback routines (see ReadFileEx, WriteFileEx); also QueueUserAPC

– Only deliverable when the thread is in an "alertable wait"

Asynchronous Procedure Calls (APCs)

[Figure: a thread object with two APC queues, labeled K and U, each holding APC objects]

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting:

– If IRQL < 2, charge to thread's user or kernel time

– If IRQL = 2 and processing a DPC, charge to DPC time

– If IRQL = 2 and not processing a DPC, charge to thread kernel time

– If IRQL > 2, charge to interrupt time
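The four rules above amount to a small decision function. The following is an illustrative model only, not kernel code; the enum tick_charge and the function charge_tick are hypothetical names, and the user_mode flag is an assumption about how the "user or kernel time" choice is made.

```c
#include <stdbool.h>

/* Hypothetical categories for where one clock tick is charged. */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} tick_charge;

/* irql:      interrupt request level when the clock interrupt fired
 * in_dpc:    true if a DPC was being processed at that moment
 * user_mode: true if the interrupted thread was running in user mode */
tick_charge charge_tick(int irql, bool in_dpc, bool user_mode)
{
    if (irql > 2)
        return CHARGE_INTERRUPT;
    if (irql == 2)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    /* IRQL < 2: normal thread execution */
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```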

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, the cause must be interrupt-related activity

– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– Context switches for these are really the number of interrupts and DPCs

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g., one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add the Context Switch Delta column
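A toy simulation shows how a thread can run and never be charged. It is a sketch under simplifying assumptions: the thread runs for run_ms at the start of every clock interval and then waits, and a full tick is charged only if the thread is still running when the clock fires. The names accounting and simulate are hypothetical.

```c
/* Actual CPU time used versus CPU time charged by tick sampling. */
typedef struct {
    long actual_ms;
    long charged_ms;
} accounting;

/* intervals: number of clock intervals to simulate
 * tick_ms:   length of one clock interval
 * run_ms:    how long the thread runs at the start of each interval */
accounting simulate(long intervals, long tick_ms, long run_ms)
{
    accounting a = {0, 0};
    for (long i = 0; i < intervals; i++) {
        long ran = run_ms < tick_ms ? run_ms : tick_ms;
        a.actual_ms += ran;
        if (run_ms >= tick_ms)      /* still on the CPU when the tick arrives */
            a.charged_ms += tick_ms;
    }
    return a;
}
```

With a 10 msec tick, a thread that runs 9 msec per interval uses 90% of the CPU in this model, yet its charged time stays at zero.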

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

[Figure: a dispatcher object consists of a common header (Size, Type, State, Wait list head) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: each thread object points to its WaitBlockList. Every wait block holds a List entry, Object, Thread, Key, Type and Next link, and is linked into the wait list head of the dispatcher object (Size, Type, State, Wait list head, object-type-specific data) it refers to.]
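The relationship between dispatcher objects and wait blocks can be modeled in a few lines of portable C. This is a deliberately simplified sketch: the types dispatcher_header and wait_block and the function wait_satisfied are hypothetical and do not reproduce the real kernel structures declared in ntddk.h.

```c
#include <stddef.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_type;

/* Simplified common header: only the state is modeled. */
typedef struct dispatcher_header {
    int signaled;                 /* 1 = signaled, 0 = nonsignaled */
} dispatcher_header;

/* One wait block per handle passed to the wait call. */
typedef struct wait_block {
    dispatcher_header *object;    /* object being waited on */
    int key;                      /* position in the caller's handle array */
    struct wait_block *next;      /* next block of the same wait call */
} wait_block;

/* Returns the key of the first signaled object for WAIT_ANY,
 * 0 for a satisfied WAIT_ALL, or -1 if the thread must keep waiting. */
int wait_satisfied(const wait_block *list, wait_type type)
{
    if (list == NULL)
        return -1;
    for (const wait_block *b = list; b != NULL; b = b->next) {
        if (type == WAIT_ANY && b->object->signaled)
            return b->key;
        if (type == WAIT_ALL && !b->object->signaled)
            return -1;
    }
    return type == WAIT_ALL ? 0 : -1;
}
```

The key stored in each block is what lets a "wait any" report which argument position satisfied the wait.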

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )

VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec )

VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )

BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection ( &crit );
/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once per enter, even if an exception occurs */
        LeaveCriticalSection ( &crit );
    }
}
DeleteCriticalSection( &crit );

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads:

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0 - no wait, check whether object is signaled

– dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles - pointer to array identifying these objects

– bWaitAll - whether to wait for first signaled object or all objects

– Function returns index of first signaled object (as an offset from WAIT_OBJECT_0)

Side effects

– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions
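Interpreting the DWORD returned by the wait functions can be sketched as a small decoder. To keep the example buildable without <windows.h>, the documented return values are redefined here under hypothetical W_ names; wait_kind and decode_wait are likewise illustrative names, not part of the Windows API.

```c
#include <stdint.h>

/* Return values of the Win32 wait functions, mirrored under W_ names. */
#define W_WAIT_OBJECT_0     0x00000000u
#define W_WAIT_ABANDONED_0  0x00000080u
#define W_WAIT_TIMEOUT      0x00000102u
#define W_WAIT_FAILED       0xFFFFFFFFu

typedef enum { R_SIGNALED, R_ABANDONED, R_TIMEOUT, R_FAILED } wait_kind;

/* ret:   value returned by the wait function
 * count: number of handles passed to the wait (at most 64)
 * index: receives the array position for R_SIGNALED and R_ABANDONED */
wait_kind decode_wait(uint32_t ret, uint32_t count, uint32_t *index)
{
    if (ret < W_WAIT_OBJECT_0 + count) {
        *index = ret - W_WAIT_OBJECT_0;
        return R_SIGNALED;
    }
    if (ret >= W_WAIT_ABANDONED_0 && ret < W_WAIT_ABANDONED_0 + count) {
        *index = ret - W_WAIT_ABANDONED_0;   /* owner ended without releasing the mutex */
        return R_ABANDONED;
    }
    if (ret == W_WAIT_TIMEOUT)
        return R_TIMEOUT;
    return R_FAILED;
}
```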


Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );
/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}
CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes

– Synchronization among threads in same process

Five basic functions

– pthread_mutex_init()

– pthread_mutex_destroy()

– pthread_mutex_lock()

– pthread_mutex_unlock()

– pthread_mutex_trylock()

Comparison

– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
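For comparison with the Windows mutex example, the same shared-counter pattern can be written with pthreads. This is a minimal sketch; the names work, worker and run_counter, and the limit of 16 threads, are assumptions made for the example.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;          /* shared by all threads */

typedef struct { long iterations; long local_value; } work;

static void *worker(void *arg)
{
    const work *w = arg;
    for (long i = 0; i < w->iterations; i++) {
        pthread_mutex_lock(&lock);      /* blocks, like WaitForSingleObject(hMutex, INFINITE) */
        counter += w->local_value;
        pthread_mutex_unlock(&lock);    /* like ReleaseMutex(hMutex) */
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value,
 * or -1 if the arguments are invalid or a thread could not be created. */
long run_counter(int nthreads, long iterations, long local_value)
{
    pthread_t tid[16];
    work w = { iterations, local_value };
    int started = 0;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, &w) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == nthreads ? counter : -1;
}
```

Without the lock/unlock pair the increments from different threads could interleave and updates would be lost.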

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1

– ReleaseSemaphore() may increment count by any value

– ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)
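The counting rules can be captured in a small single-threaded model. It is a sketch of the semantics only, with no blocking and no kernel object; sem_model, sem_try_wait and sem_release are hypothetical names. The model also reflects that a release is rejected when it would push the count past the maximum.

```c
/* Model of a counting semaphore: signaled while count > 0. */
typedef struct {
    long count;
    long max;
} sem_model;

/* Like a wait with timeout 0: returns 1 and decrements the count
 * if the semaphore is signaled, otherwise returns 0. */
int sem_try_wait(sem_model *s)
{
    if (s->count > 0) {
        s->count--;
        return 1;
    }
    return 0;
}

/* Like ReleaseSemaphore: adds n to the count and reports the old count.
 * Fails (returns 0) and leaves the count unchanged if n is not positive
 * or the new count would exceed the maximum. */
int sem_release(sem_model *s, long n, long *previous)
{
    if (n <= 0 || n > s->max - s->count)
        return 0;
    if (previous)
        *previous = s->count;
    s->count += n;
    return 1;
}
```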

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually

– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

– Auto-reset event signals a single thread; event is reset automatically

– fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )

BOOL ResetEvent( HANDLE hEvent )

BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() - one thread

– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
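Since there is no exact equivalent, a manual-reset event is usually emulated by pairing a condition variable with a mutex and a flag. The following is one possible sketch; manual_event and the event_* functions are hypothetical names, not a standard API.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;     /* stays 1 until event_reset() */
} manual_event;

int event_init(manual_event *e, int initial_state)
{
    if (pthread_mutex_init(&e->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&e->cond, NULL) != 0) {
        pthread_mutex_destroy(&e->lock);
        return -1;
    }
    e->signaled = (initial_state != 0);
    return 0;
}

void event_set(manual_event *e)          /* comparable to SetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);    /* release every waiting thread */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)        /* comparable to ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e)         /* comparable to an INFINITE wait */
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

int event_is_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    int s = e->signaled;
    pthread_mutex_unlock(&e->lock);
    return s;
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the event "manual-reset": a condition variable alone does not remember a signal, so a thread that starts waiting after the broadcast would otherwise miss it.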

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
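The one-way, byte-stream behavior is the same for the POSIX pipe() call, which makes for a compact, runnable illustration. This sketch writes and reads within a single process, so it assumes the message fits in the pipe buffer; pipe_roundtrip is a hypothetical helper name.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg through an anonymous pipe and reads it back into out.
 * out_size must be at least 1. Returns the number of bytes read, or -1. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t out_size)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    ssize_t n = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        close(fds[1]);                /* closing the write end lets the reader see EOF */
        fds[1] = -1;
        n = read(fds[0], out, out_size - 1);
        if (n >= 0)
            out[n] = '\0';
    }
    if (fds[1] != -1)
        close(fds[1]);
    close(fds[0]);
    return n;
}
```

Data flows only from the write end to the read end; two-way communication needs a second pipe, or a named pipe on Windows.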

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf (stderr, "Anon pipe create failed\n"); exit (1);
}
/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf (stderr, "CreateProc1 failed\n"); exit (2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process. */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf (stderr, "CreateProc2 failed\n"); exit (3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete. */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server using the same named pipe

– Server can respond to client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine ("." – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

Page 71: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

7171

ThreadObject

K

U

APC objects

Asynchronous Procedure Calls (APCs)

7272

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting

ndash If IRQLlt2 charge to threadrsquos user or kernel time

ndash If IRQL=2 and processing a DPC charge to DPC time

ndash If IRQL=2 amp not processing a DPC charge to thread kernel time

ndash If IRQLgt2 charge to interrupt time

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

95

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
        TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

96

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

98

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

99

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into a single call

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into a single call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE
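Because a FALSE return from a blocking ConnectNamedPipe() does not always mean failure, servers usually check the last error before giving up. The sketch below models only that decision as a plain function so it compiles without the Windows headers; `client_is_connected` and `PIPE_CONNECTED_ERROR` are hypothetical names, and the constant repeats the value of the Win32 error code ERROR_PIPE_CONNECTED.

```c
#include <assert.h>
#include <stdbool.h>

/* Value of the Win32 error code ERROR_PIPE_CONNECTED (winerror.h),
 * repeated here so the sketch compiles without the Windows headers. */
#define PIPE_CONNECTED_ERROR 535UL

/* Interpret the outcome of a blocking ConnectNamedPipe() call.
 * connect_result: the BOOL returned by ConnectNamedPipe()
 * last_error:     the value of GetLastError() taken right after the call */
bool client_is_connected(int connect_result, unsigned long last_error)
{
    if (connect_result != 0)
        return true;                            /* client connected during the call */
    return last_error == PIPE_CONNECTED_ERROR;  /* client was already connected */
}
```

In a real server the two arguments would come from `ConnectNamedPipe(hNamedPipe, NULL)` and `GetLastError()`; any other error value would still be treated as a failed connection.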

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (for writes, up to PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
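The advice to use fixed-size records on a byte-oriented channel can be sketched as follows. The code works on any byte-stream file descriptor, such as a FIFO created with mkfifo() and then opened; the record layout `record_t` and the helpers `send_record` and `recv_record` are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define RECORD_TEXT_LEN 32

/* Fixed-size record: every message occupies exactly sizeof(record_t) bytes,
 * so the reader knows where each message ends in the byte stream. */
typedef struct {
    int  client_id;
    char text[RECORD_TEXT_LEN];
} record_t;

/* Write one whole record; returns 0 on success, -1 otherwise. */
int send_record(int fd, int client_id, const char *text)
{
    record_t r;
    memset(&r, 0, sizeof r);
    r.client_id = client_id;
    strncpy(r.text, text, RECORD_TEXT_LEN - 1);   /* keeps the final NUL */
    return write(fd, &r, sizeof r) == (ssize_t)sizeof r ? 0 : -1;
}

/* Read exactly one record, looping because read() may return fewer bytes. */
int recv_record(int fd, record_t *r)
{
    size_t done = 0;
    while (done < sizeof *r) {
        ssize_t n = read(fd, (char *)r + done, sizeof *r - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}
```

With a message-mode named pipe this framing is unnecessary because the pipe preserves message boundaries; on a FIFO the fixed record size takes over that job.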

104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many comm.)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

107

Locate a server via mailslot

[Diagram: the application server acts as mailslot client and periodically sends a status message; application clients 0 to n act as mailslot servers and read it to locate the server]

App client 0 ... App client n (mailslot servers)
– hMS = CreateMailslot( "\\.\mailslot\status" ); ReadFile(hMS, &ServStat); /* connect to server */

App server (mailslot client), message is sent periodically
– while (...) { Sleep(...); hMS = CreateFile( "\\*\mailslot\status" ); ... WriteFile(hMS, &StatInfo ...

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0 – immediate return
– MAILSLOT_WAIT_FOREVER – infinite wait

109

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max. message length 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


72

IRQLs and CPU Time Accounting

Interval clock timer ISR keeps track of time

Clock ISR time accounting
– If IRQL < 2, charge to thread's user or kernel time
– If IRQL = 2 and processing a DPC, charge to DPC time
– If IRQL = 2 and not processing a DPC, charge to thread kernel time
– If IRQL > 2, charge to interrupt time
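The four accounting rules above amount to a small decision function. The following sketch models them in portable C; it is not kernel code, and the names `charge_t`, `charge_tick` and `DISPATCH_LEVEL` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Buckets the clock ISR can charge a tick to (names are illustrative). */
typedef enum {
    CHARGE_THREAD_USER,
    CHARGE_THREAD_KERNEL,
    CHARGE_DPC,
    CHARGE_INTERRUPT
} charge_t;

#define DISPATCH_LEVEL 2  /* IRQL 2 */

/* Decide where one clock tick is charged, following the four rules above.
 * irql:      IRQL that was interrupted by the clock
 * in_dpc:    true if a DPC was being processed
 * user_mode: true if the thread ran in user mode (only relevant below IRQL 2) */
charge_t charge_tick(int irql, bool in_dpc, bool user_mode)
{
    if (irql > DISPATCH_LEVEL)
        return CHARGE_INTERRUPT;
    if (irql == DISPATCH_LEVEL)
        return in_dpc ? CHARGE_DPC : CHARGE_THREAD_KERNEL;
    return user_mode ? CHARGE_THREAD_USER : CHARGE_THREAD_KERNEL;
}
```

For example, a tick that arrives at IRQL 2 while no DPC is running is charged to the thread's kernel time, even though the thread itself may not be doing useful work at that moment.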

73

IRQLs and CPU Time Accounting

Since time spent servicing interrupts is NOT charged to the interrupted thread, if the system is busy but no process appears to be running, it must be due to interrupt-related activity
– Note: time at IRQL 2 or more is charged to the current thread's quantum (to be described)

74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process
– Process Explorer shows these as separate processes (not really processes)
– Context switches for these are really the number of interrupts & DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interrupt timer
– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted
– E.g. one or more threads run and enter a wait state before the clock fires
– Thus threads may run but never get charged

View context switch activity with Process Explorer
– Add Context Switch Delta column
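The quirk that a thread can run without ever being charged follows directly from tick-based sampling. The sketch below is a toy model under the assumption that the clock fires at every multiple of the period and charges whichever thread is running at that instant; `ticks_charged` is a hypothetical helper.

```c
#include <assert.h>

/* Count how many clock ticks are charged to a thread that runs during
 * [start_ms, end_ms) when the clock fires every period_ms (at period_ms,
 * 2 * period_ms, ...). Only a tick that lands while the thread is running
 * charges it. */
int ticks_charged(int start_ms, int end_ms, int period_ms)
{
    int charged = 0;
    for (int t = period_ms; t < end_ms; t += period_ms)
        if (t >= start_ms)
            charged++;
    return charged;
}
```

With a 10 msec period, a thread that runs from 1 msec to 9 msec is charged nothing, while a thread that runs only from 9 msec to 11 msec is charged a whole tick. This is why per-process CPU totals can mislead and why the context switch delta is worth watching.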

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

[Diagram: dispatcher object layout - common header (Size, Type, State, Wait listhead) followed by object-type-specific data; see \ntddk\inc\ddk\ntddk.h]

Any kernel object you can wait for is a "dispatcher object"
– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
– "signaled" vs. "nonsignaled"
– when signaled, a wait on the object is satisfied
– different object types differ in terms of what changes their state
– wait and unwait implementation is common to all types of dispatcher objects

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor…)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Diagram: each thread object points to its WaitBlockList; every wait block holds a list entry, Object and Thread pointers, Key, Type and a Next link; each dispatcher object (Size, Type, State, Wait listhead, object-type-specific data) chains the wait blocks of the threads waiting on it]
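The relationship between wait blocks, keys and the "any"/"all" wait type can be illustrated with a simplified user-space model. This is a sketch only: the real kernel structures carry more fields and different names, and `wait_block_t`, `dispatcher_object_t` and `wait_satisfied` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { WAIT_ANY, WAIT_ALL } wait_type_t;

/* Simplified dispatcher object: only the signaled state of the common header. */
typedef struct {
    bool signaled;
} dispatcher_object_t;

/* Simplified wait block: one per handle passed to the wait call. */
typedef struct wait_block {
    dispatcher_object_t *object;  /* object being waited on */
    int                  key;     /* position in the argument list */
    wait_type_t          type;    /* wait for "any" or "all" */
    struct wait_block   *next;    /* next wait block of the same wait call */
} wait_block_t;

/* Walk the chain of wait blocks of one wait call.
 * Returns the key of the first signaled object for WAIT_ANY, 0 when every
 * object is signaled for WAIT_ALL, and -1 if the wait is not yet satisfied. */
int wait_satisfied(const wait_block_t *head)
{
    if (head == NULL)
        return -1;
    if (head->type == WAIT_ANY) {
        for (const wait_block_t *b = head; b != NULL; b = b->next)
            if (b->object->signaled)
                return b->key;
        return -1;
    }
    for (const wait_block_t *b = head; b != NULL; b = b->next)
        if (!b->object->signaled)
            return -1;
    return 0;
}
```

The key is what lets a "wait any" call report which handle in the argument list ended the wait.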

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
– Critical sections
– Mutexes
– Semaphores
– Event objects

Synchronization through interprocess communication
– Anonymous pipes
– Named pipes
– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section, other than trying to enter it with TryEnterCriticalSection()

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );

81

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;

InitializeCriticalSection ( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection ( &crit );
    }
}

DeleteCriticalSection( &crit );
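The rule that Enter and Leave operations must match can be tried out with a POSIX recursive mutex, used here as a stand-in for a CRITICAL_SECTION so that the sketch runs without the Windows headers. The helper names `init_recursive_lock` and `nested_add` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* POSIX stand-in for a CRITICAL_SECTION: a recursive mutex may be locked
 * several times by its owner and must be unlocked the same number of times. */
int init_recursive_lock(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Enter the lock `depth` times, add to the counter, then leave as many
 * times as the lock was entered. Returns 0 or the first pthread error code. */
int nested_add(pthread_mutex_t *m, int depth, int *counter, int value)
{
    int rc = 0, entered = 0;
    while (entered < depth && (rc = pthread_mutex_lock(m)) == 0)
        entered++;
    if (rc == 0)
        *counter += value;
    while (entered-- > 0)
        pthread_mutex_unlock(m);   /* one unlock per successful lock */
    return rc;
}
```

If the unlock loop ran fewer times than the lock loop, the owner would keep the lock and every other thread would block on it, which is the deadlock risk the slide warns about.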

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

83

Wait Functions - Details

WaitForSingleObject()
– hObject specifies kernel object
– dwTimeout specifies wait time in msec
  dwTimeout == 0 - no wait, check whether object is signaled
  dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles - pointer to array identifying these objects
– bWaitAll - whether to wait for first signaled object or all objects
  Function returns index of first signaled object (as WAIT_OBJECT_0 + index)

Side effects
– Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

84

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free the mutex object

85

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

86

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

87

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )
– pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
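The polling behaviour of pthread_mutex_trylock() can be sketched as a small wrapper. The name `try_acquire` is hypothetical, and the sketch assumes a default (non-recursive) mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Poll a mutex without blocking, much like WaitForSingleObject() with a
 * zero timeout. Returns true if the lock was acquired (the caller must
 * unlock it later) and false if the mutex is busy. */
bool try_acquire(pthread_mutex_t *m)
{
    /* pthread_mutex_trylock() returns 0 on success and EBUSY when the
     * mutex is already locked. */
    return pthread_mutex_trylock(m) == 0;
}
```

One difference worth keeping in mind: a Windows mutex can be re-acquired by the thread that already owns it, whereas a default pthread mutex reports busy even to its owner, so the equivalence on the slide holds only for the contended case.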

88

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value
– ReleaseSemaphore() returns old semaphore count

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL bInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );
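The counting rules for semaphores can be captured in a small single-threaded model. This is a sketch of the bookkeeping only, with no blocking or thread safety; `sem_model_t`, `sem_try_wait` and `sem_release` are hypothetical names, with `release` and `previous` playing the roles of cReleaseCount and lpPreviousCount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counting model of a semaphore: signaled while count > 0. */
typedef struct {
    long count;   /* current count (cSemInit at creation) */
    long max;     /* upper bound (cSemMax) */
} sem_model_t;

/* A successful wait consumes one unit; returns false if the caller would block. */
bool sem_try_wait(sem_model_t *s)
{
    if (s->count <= 0)
        return false;
    s->count--;
    return true;
}

/* Add `release` units and report the previous count through *previous.
 * Fails without changing the count if the maximum would be exceeded. */
bool sem_release(sem_model_t *s, long release, long *previous)
{
    if (release <= 0 || s->count + release > s->max)
        return false;
    if (previous != NULL)
        *previous = s->count;
    s->count += release;
    return true;
}
```

Starting from a count of 1 with a maximum of 3, one wait succeeds, a second would block, and releasing 2 units reports a previous count of 0.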

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE - create event in signaled state
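The difference between manual-reset and auto-reset events comes down to how many waiters a signal releases and what state the event is left in. The following single-threaded model sketches that bookkeeping; it does not block or schedule anything, and `event_model_t`, `event_set` and `event_reset` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool manual_reset;  /* corresponds to fManualReset */
    bool signaled;      /* current state */
    int  waiters;       /* threads currently blocked on the event */
} event_model_t;

/* Model of SetEvent(): returns how many waiting threads are released. */
int event_set(event_model_t *e)
{
    int released;
    if (e->manual_reset) {
        released = e->waiters;   /* all waiters run */
        e->waiters = 0;
        e->signaled = true;      /* stays signaled until reset manually */
    } else if (e->waiters > 0) {
        released = 1;            /* exactly one waiter runs */
        e->waiters--;
        e->signaled = false;     /* reset automatically */
    } else {
        released = 0;
        e->signaled = true;      /* remembered for the next waiter */
    }
    return released;
}

/* Model of ResetEvent(). */
void event_reset(event_model_t *e)
{
    e->signaled = false;
}
```

With three waiters, setting a manual-reset event releases all three and leaves it signaled, while setting an auto-reset event releases one and leaves two still waiting.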


  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 73: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

7373

IRQLs and CPU Time Accounting

Since time servicing interrupts are NOT charged to interrupted thread if system is busy but no process appears to be running must be due to interrupt-related activity

ndash Note time at IRQL 2 or more is charged to the current threadrsquos quantum (to be described)

7474

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any threadprocess

ndash Process Explorer shows these as separate processesnot really processes

ndash Context switches for these are really of interrupts amp DPCs

7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

88

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each completed wait decreases the semaphore count by 1

– ReleaseSemaphore() may increment the count by any positive value, up to the maximum count

– ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
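The counting rules above can be tried out with POSIX unnamed semaphores, used here as a portable stand-in rather than the Windows API itself. A minimal sketch, assuming a Linux-like system where sem_init() is available; sem_take_nowait, sem_release_n and semaphore_demo are hypothetical helper names. Unlike ReleaseSemaphore(), sem_post() always adds exactly 1, so releasing several units takes a loop and the previous count is not obtained atomically.

```c
#include <assert.h>
#include <semaphore.h>

/* Take one unit without blocking, roughly WaitForSingleObject(h, 0).
 * Returns 1 if the count was > 0 (it is now one lower), else 0. */
int sem_take_nowait(sem_t *s)
{
    return sem_trywait(s) == 0;
}

/* Release n units and report the count read just before the release.
 * The read and the posts are separate steps, so this is only exact
 * when no other thread touches the semaphore in between. */
int sem_release_n(sem_t *s, int n)
{
    int previous = 0;

    assert(n > 0);
    sem_getvalue(s, &previous);
    for (int i = 0; i < n; i++)
        sem_post(s);
    return previous;
}

/* Walk-through: start with 2 units and count the successful takes. */
int semaphore_demo(void)
{
    sem_t s;
    int taken = 0;

    if (sem_init(&s, 0, 2) != 0)     /* unnamed semaphore, initial count 2 */
        return -1;
    taken += sem_take_nowait(&s);    /* count 2 -> 1 */
    taken += sem_take_nowait(&s);    /* count 1 -> 0 */
    taken += sem_take_nowait(&s);    /* count 0: not signaled, take fails */
    sem_release_n(&s, 1);            /* count 0 -> 1 */
    taken += sem_take_nowait(&s);    /* count 1 -> 0 */
    sem_destroy(&s);
    return taken;
}
```

Three of the four attempts succeed: the third one finds the count at zero, which corresponds to the non-signaled state.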

89

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

90

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– A manual-reset event can release several threads simultaneously; it must be reset manually

– PulseEvent() releases all threads currently waiting on a manual-reset event and then automatically resets the event

– An auto-reset event releases a single thread; the event is reset automatically

– fInitialState == TRUE - create event in signaled state

91

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )

BOOL ResetEvent( HANDLE hEvent )

BOOL PulseEvent( HANDLE hEvent )

92

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()

– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()

– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() - one thread

– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
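Since pthreads offers no exact equivalent, a manual-reset event is usually approximated with a mutex, a condition variable and a flag. The following is one possible sketch of that idea, not a definitive implementation; struct manual_event and the me_* functions, as well as the demo driver, are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical manual-reset event built on top of pthreads. */
struct manual_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
};

void me_init(struct manual_event *e, int initially_signaled)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

void me_destroy(struct manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

/* Like SetEvent(): the event stays signaled and every waiter is woken. */
void me_set(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void me_reset(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(h, INFINITE) on a manual-reset event. */
void me_wait(struct manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* Demo state: every released waiter bumps demo_released. */
static struct manual_event demo_event;
static int demo_released;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void *demo_waiter(void *arg)
{
    (void)arg;
    me_wait(&demo_event);
    pthread_mutex_lock(&demo_lock);
    demo_released++;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Start nwaiters threads, signal once, return how many were released. */
int run_event_demo(int nwaiters)
{
    pthread_t tids[16];

    assert(nwaiters > 0 && nwaiters <= 16);
    demo_released = 0;
    me_init(&demo_event, 0);
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&tids[i], NULL, demo_waiter, NULL);
    me_set(&demo_event);                      /* a single signal releases all of them */
    for (int i = 0; i < nwaiters; i++)
        pthread_join(tids[i], NULL);
    me_destroy(&demo_event);
    return demo_released;
}
```

The flag is what makes the event "sticky": a thread that calls me_wait() after me_set() returns at once, which a bare pthread_cond_broadcast() would not guarantee.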

93

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Diagram: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

94

I/O Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe; handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }
    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

95

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL,
            TRUE,   /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);
    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

96

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server through the same pipe name, each over its own instance

– Server can respond to a client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

97

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

98

Named Pipes (contd)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile()?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )

99

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile, ReadFile sequence in one call

101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines CreateFile, WriteFile, ReadFile, CloseHandle in one call

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL

– Call will return as soon as there is a client connection

– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
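The FIFO side of this comparison can be sketched in a few lines of POSIX C. This is a minimal illustration under stated assumptions: the function name fifo_roundtrip, the temporary directory template under /tmp and the FIFO name "requests" are all hypothetical, and a forked child stands in for a separate client process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a FIFO in a fresh temporary directory, let a child process
 * (the "client") write msg into it, and read it back in the parent
 * (the "server"). Returns the number of bytes read, or -1 on error. */
long fifo_roundtrip(const char *msg, char *out, size_t outsz)
{
    char dir[] = "/tmp/fifo_demo_XXXXXX";
    char path[64];
    long total = 0;

    assert(outsz > 0);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/requests", dir);
    if (mkfifo(path, 0600) != 0) {        /* counterpart of CreateNamedPipe() */
        rmdir(dir);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        rmdir(dir);
        return -1;
    }
    if (pid == 0) {                       /* child: write-only client */
        int wfd = open(path, O_WRONLY);   /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, msg, strlen(msg));
        close(wfd);
        _exit(w == (ssize_t)strlen(msg) ? 0 : 1);
    }

    int rfd = open(path, O_RDONLY);       /* parent: read-only server */
    if (rfd < 0) {
        kill(pid, SIGKILL);               /* do not leave the child blocked */
        total = -1;
    } else {
        ssize_t n;
        /* A FIFO is a byte stream: read until the writer closes its end. */
        while ((size_t)total < outsz - 1 &&
               (n = read(rfd, out + total, outsz - 1 - (size_t)total)) > 0)
            total += n;
        close(rfd);
        out[total] = '\0';
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    rmdir(dir);
    return total;
}
```

The data flows in one direction only; a reply to the client would need a second FIFO, which is the half-duplex limitation noted above.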

104

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server\n"); exit(3);
    }
    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);
    CloseHandle (hNamedPipe);
    return 0;

105

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);
    while (!Done) {
        printf ("Server is awaiting next request\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf ("Request is: %s\n", RequestRecord);
        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    }   /* End of server operation */

106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (Windows 2000: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()

– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107

Locate a server via mailslot

[Diagram: the application clients 0..n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot Servers (App client 0 ... App client n)

    hMS = CreateMailslot( "\\.\mailslot\status" ... );
    ReadFile(hMS, &ServStat ... );
    /* connect to server */

Mailslot Client (App Server)

    while (...) {
        Sleep(...);
        hMS = CreateFile( "\\*\mailslot\status" ... );
        ...
        WriteFile(hMS, &StatInfo ... );
    }

Message is sent periodically

108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form

– \\.\mailslot\[path]name

– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec

– 0 – immediate return

– MAILSLOT_WAIT_FOREVER – infinite wait

109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name - retrieve handle for local mailslot

– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


74

Interrupt Time Accounting

Task Manager includes interrupt and DPC time with the Idle process time

Interrupt activity is not charged to any thread/process

– Process Explorer shows these as separate processes (not really processes)

– The context switch counts shown for these are really the number of interrupts & DPCs

75

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where the system has spent its time

CPU time accounting is driven by the programmable interval timer (clock interrupt)

– Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals are NOT accounted

– E.g. one or more threads run and enter a wait state before the clock fires

– Thus threads may run but never get charged

View context switch activity with Process Explorer

– Add Context Switch Delta column
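The effect of tick-driven accounting can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the Windows implementation: one thread runs a fixed burst in every period, the clock fires at fixed intervals, and whoever is running at that instant is charged a whole tick. The function names charged_time_us and actual_time_us are hypothetical.

```c
#include <assert.h>

/* Simulated tick-based accounting; all times are in microseconds.
 * The thread runs for burst_us, starting offset_us after the start of
 * every period of period_us. The clock fires every tick_us and charges
 * a full tick to the thread if it is running at that instant.
 * Returns the CPU time charged over total_us of simulated time. */
long charged_time_us(long tick_us, long period_us, long offset_us,
                     long burst_us, long total_us)
{
    long charged = 0;

    assert(tick_us > 0 && period_us > 0);
    assert(offset_us >= 0 && burst_us >= 0);
    assert(offset_us + burst_us <= period_us);

    for (long t = tick_us; t <= total_us; t += tick_us) {
        long phase = t % period_us;          /* position inside the period */
        if (phase >= offset_us && phase < offset_us + burst_us)
            charged += tick_us;              /* thread was caught running */
    }
    return charged;
}

/* CPU time the thread really used over total_us (whole periods only). */
long actual_time_us(long period_us, long burst_us, long total_us)
{
    return (total_us / period_us) * burst_us;
}
```

With a 10 msec tick, a thread that runs 5 msec in every 10 msec period but always blocks before the tick is charged nothing, although it used half of the CPU: charged_time_us(10000, 10000, 1000, 5000, 1000000) is 0 while actual_time_us(10000, 5000, 1000000) is 500000. In this model the opposite error is also possible: a thread that happens to be running at each tick is charged full ticks even for a short burst.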

76

Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

77

Wait Internals 1: Dispatcher Objects

[Diagram: dispatcher object layout: a header holding Size, Type, State and the wait list head, followed by object-type-specific data (see \ntddk\inc\ddk\ntddk.h)]

Any kernel object you can wait for is a "dispatcher object"

– some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers

– others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects

– non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states

– "signaled" vs. "nonsignaled"

– when signaled, a wait on the object is satisfied

– different object types differ in terms of what changes their state

– wait and unwait implementation is common to all types of dispatcher objects

78

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Diagram: each thread object points to its WaitBlockList; every wait block holds Key, Type, Next link, a list entry, and Object and Thread pointers; each dispatcher object (Size, Type, State, wait list head, object-type-specific data) links the wait blocks of the threads waiting on it]

79

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization

– Critical sections

– Mutexes

– Semaphores

– Event objects

Synchronization through interprocess communication

– Anonymous pipes

– Named pipes

– Mailslots

80

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however, the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )

VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec )

VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )

BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec )

81

Critical Section Example

    /* counter is global, shared by all threads */
    volatile int counter = 0;
    CRITICAL_SECTION crit;
    InitializeCriticalSection ( &crit );

    /* ... main loop in any of the threads */
    while (!done) {
        EnterCriticalSection ( &crit );
        __try {
            counter += local_value;
        }
        /* __finally runs on every exit path, so it does the one matching Leave */
        __finally { LeaveCriticalSection ( &crit ); }
    }
    DeleteCriticalSection( &crit );

82

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes

– Threads

– Files

– Console input

– File change notifications

– Mutexes

– Events (auto-reset + manual-reset)

– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout )

83

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object

– dwTimeout specifies wait time in msec

– dwTimeout == 0 - no wait, check whether object is signaled

– dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)

– lpHandles - pointer to array identifying these objects

– bWaitAll - whether to wait for the first signaled object or for all objects

– With bWaitAll == FALSE the function returns WAIT_OBJECT_0 plus the index of the first signaled object

Side effects

– Mutexes, auto-reset events and auto-reset (synchronization) waitable timers are reset to the non-signaled state when a wait on them completes



7575

Time Accounting Quirks

Looking at total CPU time for each process may not reveal where system has spent its time

CPU time accounting is driven by programmable interrupt timer

ndash Normally 10 msec (15 msec on some MP Pentiums)

Thread execution and context switches between clock intervals NOT accounted

ndash Eg one or more threads run and enter a wait state before clock fires

ndash Thus threads may run but never get charged

View context switch activity with Process Explorer

ndash Add Context Switch Delta column

7676

For waiting threads user-mode utilities only display the wait reason

Example pstat

Looking at Waiting Threads

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
- dwTimeout == 0 - no wait, only check whether the object is signaled
- dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles - pointer to array identifying these objects
- bWaitAll - whether to wait for the first signaled object or for all objects
- With bWaitAll == FALSE the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex() with the same mutex name

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed once its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else {
        /* mutex was abandoned (we own it now, data may be inconsistent) or the wait failed */
        if (ret == WAIT_ABANDONED)
            ReleaseMutex( mutex );
        break;   /* exit the loop */
    }
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process (by default)

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any positive value, up to the maximum count
- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- Manual-reset event can signal several threads simultaneously; must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event (Microsoft documents PulseEvent() as unreliable; it exists mainly for backward compatibility)
- Auto-reset event signals a single thread; event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main starts prog1 and prog2, which are connected by a pipe]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- Server can respond to client using the same instance

Pipe can be accessed over the network
- location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile() block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() creates the named pipe
- Closing handle to last instance deletes named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile()?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile(), ReadFile() sequence in a single call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle()

- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic (for writes of up to PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");

    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);

    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
}   /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many comm.)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates mailslot with CreateMailslot()
- Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in network domain

Locate a server via mailslot

[Figure: Mailslot servers (app client 0 ... app client n) each create the mailslot "status" with CreateMailslot(), call ReadFile( hMS, &ServStat ) and then connect to the server. The mailslot client (app server) runs a loop: Sleep(), open the mailslot "status" with CreateFile(), WriteFile( hMS, &StatInfo ... ). Message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0 - immediate return
- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieve handle for local mailslot
- \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
- \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life 意念改变生活


Looking at Waiting Threads

For waiting threads, user-mode utilities only display the wait reason

Example: pstat

Wait Internals 1: Dispatcher Objects

Any kernel object you can wait for is a "dispatcher object"
- some exclusively for synchronization, e.g. events, mutexes ("mutants"), semaphores, queues, timers
- others can be waited for as a side effect of their prime function, e.g. processes, threads, file objects
- non-waitable kernel objects are called "control objects"

All dispatcher objects have a common header

All dispatcher objects are in one of two states
- "signaled" vs. "nonsignaled"
- when signaled, a wait on the object is satisfied
- different object types differ in terms of what changes their state
- wait and unwait implementation is common to all types of dispatcher objects

[Figure: layout of a dispatcher object (see \ntddk\inc\ddk\ntddk.h): common header with Size, Type, State and wait list head, followed by object-type-specific data]

Wait Internals 2: Wait Blocks

Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for "any" or "all"

Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data) and thread objects (WaitBlockList) linked through wait blocks; each wait block holds a list entry, Object, Thread, Key, Type and next link]

3.4 Windows APIs for Synchronization and IPC

Windows API constructs for synchronization and interprocess communication

Synchronization
- Critical sections
- Mutexes
- Semaphores
- Event objects

Synchronization through interprocess communication
- Anonymous pipes
- Named pipes
- Mailslots

8080

Critical Sections

Only usable from within the same process


  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 77: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

7777

Wait Internals 1 Dispatcher Objects

Size TypeState

Wait listhead

Object-type-specific data

DispatcherObject

(see ntddkincddkntddkh)

Any kernel object you can wait for is a ldquodispatcher objectrdquo

ndash some exclusively for synchronizationeg events mutexes (ldquomutantsrdquo) semaphores queues timers

ndash others can be waited for as a side effect of their prime function eg processes threads file objects

ndash non-waitable kernel objects are called ldquocontrol objectsrdquo

All dispatcher objects have a common header

All dispatcher objects are in one of two states

ndash ldquosignaledrdquo vs ldquononsignaledrdquo

ndash when signalled a wait on the object is satisfied

ndash different object types differ in terms of what changes their state

ndash wait and unwait implementation iscommon to all types of dispatcher objects

7878

Object-type-specific data

Wait Internals 2Wait Blocks

Size TypeState

Wait listhead

Size TypeState

Wait listhead

Represent a threadrsquos reference to something itrsquos waiting for (one per handle passed to WaitForhellip)

All wait blocks from a given wait call are chained to the waiting thread

Type indicates wait for ldquoanyrdquo or ldquoallrdquo Key denotes argument list position for

WaitForMultipleObjects

Object-type-specific data

DispatcherObjects

Thread Objects

WaitBlockListWaitBlockList

Wait blocks

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

Key TypeNext link

List entry

ObjectThread

7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
- dwTimeout == 0: no wait, only check whether the object is signaled
- dwTimeout == INFINITE: wait forever

WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
- When waiting for any object, the return value minus WAIT_OBJECT_0 is the index of the first signaled object

Side effects
- Mutexes, auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed once its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName );

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName );

BOOL ReleaseMutex( HANDLE hMutex );

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject( mutex, INFINITE );
    if (ret == WAIT_OBJECT_0) {
        counter += local_value;
    } else {
        /* mutex was abandoned (or the wait failed) */
        if (ret == WAIT_ABANDONED)
            ReleaseMutex( mutex );   /* an abandoned mutex is now owned by this thread */
        break;                       /* exit the loop */
    }
    ReleaseMutex( mutex );
}

CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process (the default; sharing across processes requires the process-shared attribute)

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
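The five pthread calls listed above can be sketched as follows. This is a minimal illustration in the spirit of the counter examples, not code from the slides: the thread count, the iteration count and the function names run_counter_demo() and trylock_demo() are assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4       /* illustrative values, not from the slides */
#define INCREMENTS  10000

static int counter = 0;                 /* shared by all threads */
static pthread_mutex_t counter_lock;

/* Each worker adds 1 to the shared counter INCREMENTS times. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);      /* blocks until the mutex is free */
        counter += 1;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs all workers to completion and returns the final counter value. */
int run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];
    int rc;

    counter = 0;
    rc = pthread_mutex_init(&counter_lock, NULL);
    assert(rc == 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&counter_lock);
    return counter;
}

/* Polling: returns 1 if a free mutex is taken at once and a second
   poll on the (default, non-recursive) mutex reports EBUSY. */
int trylock_demo(void)
{
    pthread_mutex_t m;
    pthread_mutex_init(&m, NULL);
    int first  = pthread_mutex_trylock(&m);     /* 0: acquired without blocking */
    int second = pthread_mutex_trylock(&m);     /* EBUSY: already locked, no wait */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return first == 0 && second == EBUSY;
}
```

With the lock in place every increment is preserved, so run_counter_demo() yields NUM_THREADS * INCREMENTS; without it, updates could be lost.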

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount
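The counting rules above can be modeled in portable C. The sketch below is a hypothetical model built from a pthread mutex and condition variable, not the Win32 implementation; the names model_sem, model_sem_wait(), model_sem_trywait(), model_sem_release() and model_sem_demo() are invented for illustration. The rejection of a release that would exceed the maximum follows the documented behavior of ReleaseSemaphore().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a counting semaphore; not the Win32 implementation. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* waited on while count == 0 */
    long count;                /* "signaled" means count > 0 */
    long max;                  /* upper bound, in the role of cSemMax */
} model_sem;

void model_sem_init(model_sem *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Blocking wait: returns after the count was > 0 and has been decreased by 1. */
void model_sem_wait(model_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Zero-timeout wait: returns 1 and decrements if signaled, otherwise 0. */
int model_sem_trywait(model_sem *s)
{
    int got = 0;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        got = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* Release: adds n to the count and stores the old count in *previous
   (previous may be NULL). Returns 0 and changes nothing if n <= 0 or
   the new count would exceed max. */
int model_sem_release(model_sem *s, long n, long *previous)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (previous != NULL)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);    /* wake waiters to re-check count */
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/* Single-threaded walk-through; returns the final count. */
long model_sem_demo(void)
{
    model_sem s;
    long prev = -1;

    model_sem_init(&s, 2, 5);                   /* 2 resources free, at most 5 */
    model_sem_wait(&s);                         /* count 2 -> 1 */
    model_sem_wait(&s);                         /* count 1 -> 0: non-signaled */
    assert(model_sem_trywait(&s) == 0);         /* zero-timeout wait fails */
    assert(model_sem_release(&s, 3, &prev));    /* count 0 -> 3 */
    assert(prev == 0);                          /* old count is reported */
    assert(!model_sem_release(&s, 3, NULL));    /* 3 + 3 > 5: rejected */

    long final_count = s.count;
    pthread_cond_destroy(&s.nonzero);
    pthread_mutex_destroy(&s.lock);
    return final_count;
}
```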

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
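A condition variable keeps no state of its own, which is why it cannot stand in for a manual-reset event directly. One common workaround, sketched here under that assumption, pairs the condition variable with a mutex and a flag. The type manual_event and the functions event_init(), event_set(), event_reset(), event_wait() and event_demo() are hypothetical names for this sketch, not part of pthreads or Win32.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper: a manual-reset event emulated with a flag,
   a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int signaled;              /* the state a bare condition variable lacks */
} manual_event;

void event_init(manual_event *e, int initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: releases all current and future waiters. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Blocks until the event is signaled; does not consume the signal. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

#define WAITERS 3              /* illustrative value */

static manual_event start_gate;
static int released;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;

static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_gate);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts WAITERS threads, opens the gate once, returns how many got through. */
int event_demo(void)
{
    pthread_t threads[WAITERS];

    released = 0;
    event_init(&start_gate, 0);                 /* created non-signaled */
    for (int i = 0; i < WAITERS; i++) {
        int rc = pthread_create(&threads[i], NULL, waiter, NULL);
        assert(rc == 0);
    }
    event_set(&start_gate);                     /* one signal releases every waiter */
    for (int i = 0; i < WAITERS; i++)
        pthread_join(threads[i], NULL);
    event_wait(&start_gate);                    /* still signaled: returns at once */
    return released;
}
```

Because the flag stays set until event_reset() is called, a thread that reaches event_wait() after event_set() still passes, which is the behavior a manual-reset event provides and a bare pthread_cond_broadcast() does not.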

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Figure: main sets up a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe byte size; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}

CloseHandle (hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}

CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);

CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);

return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server using the same pipe name, each over its own instance
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (the "." stands for the local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence in a single call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence in a single call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic (POSIX guarantees this for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
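The FIFO properties above (one direction, byte stream, fixed-size records) can be sketched with mkfifo(). This is a minimal illustration, not code from the slides: the record size, the /tmp path and the function names fifo_roundtrip() and fifo_demo() are assumptions, and a forked child plays the client that sends one request record.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 32      /* fixed-size records, since a FIFO is a byte stream */

/* Creates a FIFO at path, forks a writer that sends msg as one fixed-size
   record, and reads that record into out. Returns 0 on success, -1 on failure. */
int fifo_roundtrip(const char *path, const char *msg, char out[RECORD_SIZE])
{
    assert(strlen(msg) < RECORD_SIZE);

    unlink(path);                               /* drop a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                             /* child: the writer ("client") */
        char record[RECORD_SIZE] = {0};
        strncpy(record, msg, RECORD_SIZE - 1);
        int wfd = open(path, O_WRONLY);         /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, record, RECORD_SIZE);   /* <= PIPE_BUF: atomic */
        close(wfd);
        _exit(n == RECORD_SIZE ? 0 : 1);
    }

    int ok = -1;
    int rfd = open(path, O_RDONLY);             /* blocks until the writer opens */
    if (rfd >= 0) {
        size_t got = 0;
        while (got < RECORD_SIZE) {             /* read() may return short counts */
            ssize_t n = read(rfd, out + got, RECORD_SIZE - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        close(rfd);
        if (got == RECORD_SIZE)
            ok = 0;
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);                               /* a FIFO persists until removed */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ok = -1;
    return ok;
}

/* Sends one record through a FIFO under /tmp; returns 1 if it arrives intact. */
int fifo_demo(void)
{
    char out[RECORD_SIZE];
    char path[64];

    snprintf(path, sizeof path, "/tmp/fifo_demo_%ld", (long)getpid());
    if (fifo_roundtrip(path, "status: ok", out) != 0)
        return 0;
    return strcmp(out, "status: ok") == 0;
}
```

Unlike a named pipe instance, the FIFO here carries data in one direction only; a reply to the client would need a second FIFO, as noted above.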

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);

hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);

return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request\n");

    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }

    printf ("Request is %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);

    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (app client 0 ... app client n):
hMS = CreateMailslot( "\\.\mailslot\status" ... );
ReadFile( hMS, &ServStat ... );
/* connect to server */

Mailslot client (app server):
while (...) {
    Sleep( ... );
    hMS = CreateFile( "\\*\mailslot\status" ... );
    WriteFile( hMS, &StatInfo ... );
}

Message is sent periodically

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieves a handle for a local mailslot
- \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Wait Internals 2: Wait Blocks

Wait blocks
- Represent a thread's reference to something it's waiting for (one per handle passed to WaitFor...)
- All wait blocks from a given wait call are chained to the waiting thread
- Type indicates wait for "any" or "all"
- Key denotes argument list position for WaitForMultipleObjects

[Figure: dispatcher objects (Size, Type, State, wait list head, object-type-specific data), thread objects (WaitBlockList) and wait blocks (list entry, Object, Thread, Key, Type, next link)]


7979

34 Windows APIs for Synchronization and IPC Windows API constructs for synchronization and interprocess communication

Synchronization

ndash Critical sections

ndash Mutexes

ndash Semaphores

ndash Event objects

Synchronization through interprocess communication

ndash Anonymous pipes

ndash Named pipes

ndash Mailslots

8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times - however the number of Enter- and Leave-operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec )VOID DeleteCriticalSection( LPCRITICAL_SECTION sec )

VOID EnterCriticalSection( LPCRITICAL_SECTION sec ) VOID LeaveCriticalSection( LPCRITICAL_SECTION sec )BOOL TryEnterCriticalSection ( LPCRITICAL_SECTION sec )

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource counting

– A semaphore is signaled when count > 0

Threads/processes use wait functions

– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment count by any value (up to the maximum count)
– ReleaseSemaphore() returns old semaphore count (through lpPreviousCount)
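The counting behaviour described above can be tried out with POSIX semaphores, used here only as a stand-in for the Windows objects. The helper names `take_unit`, `release_units` and `semaphore_demo` are hypothetical. One difference is worth noting: `sem_post()` increments by exactly one, so releasing several units takes a loop, and the "previous count" is read with `sem_getvalue()` in a separate, non-atomic step.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Tries to take one unit without blocking.
   Returns 1 on success, 0 if the count was already zero (errno == EAGAIN). */
int take_unit(sem_t *sem)
{
    return sem_trywait(sem) == 0 ? 1 : 0;
}

/* Releases n units and returns the count observed just before the release,
   roughly what ReleaseSemaphore() reports through lpPreviousCount. */
int release_units(sem_t *sem, int n)
{
    int previous = 0;
    sem_getvalue(sem, &previous);
    for (int i = 0; i < n; i++)
        sem_post(sem);              /* one unit per call */
    return previous;
}

/* Walks a two-unit semaphore through take and release.
   Returns the final count, or -1 if any step behaved unexpectedly. */
int semaphore_demo(void)
{
    sem_t sem;
    int value = -1;

    sem_init(&sem, 0, 2);                   /* initial count 2, process-local */
    int first    = take_unit(&sem);         /* count 2 -> 1 */
    int second   = take_unit(&sem);         /* count 1 -> 0 */
    int third    = take_unit(&sem);         /* count 0: not signaled */
    int previous = release_units(&sem, 2);  /* count 0 -> 2 */
    sem_getvalue(&sem, &value);
    sem_destroy(&sem);

    if (first != 1 || second != 1 || third != 0 || previous != 0)
        return -1;
    return value;
}
```

With an initial count of 2, the third `take_unit()` fails because the semaphore is no longer signaled, and the final count after releasing two units is 2 again.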

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName );

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName );

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount );

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)

– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName );

BOOL SetEvent( HANDLE hEvent );
BOOL ResetEvent( HANDLE hEvent );
BOOL PulseEvent( HANDLE hEvent );

9292

Comparison - POSIX condition variables

pthread's condition variables are comparable to events

– pthread_cond_init()
– pthread_cond_destroy()

Wait functions

– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling

– pthread_cond_signal() - one thread
– pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
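Because a condition variable carries no state of its own, a manual-reset event has to be approximated by pairing it with a mutex and a flag. The sketch below is one common way to do that, not an official mapping. The type `manual_event` and the functions `event_init`, `event_set`, `event_reset`, `event_wait`, `event_destroy` and `release_all_waiters` are hypothetical names introduced for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper type: a flag guarded by a mutex plus a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() is called */
} manual_event;

void event_init(manual_event *ev, bool initial_state)
{
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = initial_state;
}

/* Like SetEvent() on a manual-reset event: wake every waiter, stay signaled. */
void event_set(manual_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Blocks until the event is signaled; the loop guards against spurious wakeups. */
void event_wait(manual_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

void event_destroy(manual_event *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->lock);
}

#define WAITERS 3
static manual_event start_event;
static pthread_mutex_t released_lock = PTHREAD_MUTEX_INITIALIZER;
static int released = 0;

/* Each waiter blocks on the event, then records that it was released. */
static void *waiter(void *arg)
{
    (void)arg;
    event_wait(&start_event);
    pthread_mutex_lock(&released_lock);
    released++;
    pthread_mutex_unlock(&released_lock);
    return NULL;
}

/* Starts WAITERS threads, signals once, and returns how many were released. */
int release_all_waiters(void)
{
    pthread_t tid[WAITERS];
    released = 0;
    event_init(&start_event, false);
    for (int i = 0; i < WAITERS; i++)
        pthread_create(&tid[i], NULL, waiter, NULL);
    event_set(&start_event);
    for (int i = 0; i < WAITERS; i++)
        pthread_join(tid[i], NULL);
    event_destroy(&start_event);
    return released;
}
```

A single `event_set()` releases all waiters, including threads that reach `event_wait()` only after the signal, because the flag stays set until `event_reset()`.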

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe );

[Diagram: main starts prog1 and prog2; prog1 writes into the pipe, prog2 reads from it]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
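The one-way, byte-oriented behaviour is easiest to see with the POSIX `pipe()` call, used here as an analogue of `CreatePipe()` rather than as the Win32 API itself. The function name `pipe_roundtrip` is an assumption for this sketch, and the message is assumed to be small enough to fit in the pipe buffer so the write does not block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Sends msg (including its terminating NUL) through a fresh pipe and reads it
   back into out, at most cap bytes. Returns the bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fds[2];                         /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) != 0)
        return -1;

    size_t len = strlen(msg) + 1;
    ssize_t written = write(fds[1], msg, len);
    close(fds[1]);                      /* data only flows from fds[1] to fds[0] */
    if (written != (ssize_t)len) {
        close(fds[0]);
        return -1;
    }

    ssize_t got = read(fds[0], out, cap);
    close(fds[0]);
    return (long)got;
}
```

For example, `pipe_roundtrip("hello", buf, sizeof buf)` with a 32-byte `buf` returns 6, the five characters plus the terminator.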

9494

IO Redirection using an Anonymous Pipe

/* Create default size anonymous pipe; handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

9595

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;
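The same redirection pattern exists on POSIX systems, where `fork()` and `dup2()` take the place of `CreateProcess()` with `STARTF_USESTDHANDLES`. The sketch below is an analogue under that assumption, not a translation of the code above, and `capture_child_output` is a hypothetical name. It also shows why the parent closes its copy of the write end: without that, the reader would never see end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a child whose standard output is redirected into a pipe and collects
   what it wrote into out (NUL-terminated). Returns bytes collected, -1 on error. */
long capture_child_output(const char *text, char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                        /* child: plays the role of prog1 */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);       /* stdout now refers to the pipe */
        close(fds[1]);
        if (write(STDOUT_FILENO, text, strlen(text)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                         /* parent keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);                 /* wait for the child to complete */
    return (long)total;
}
```

Passing `"hello from child\n"` and a 64-byte buffer should return 17 and leave the same string in the buffer.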

9696

Named Pipes

Message oriented

– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional

– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe

– Several clients can communicate with a single server, each over its own instance of the same named pipe
– Server can respond to a client using the same instance

Pipe can be accessed over the network

– location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa );

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

9898

Named Pipes (contd)

nMaxInstances

– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

– Closing handle to last instance deletes named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage );

9999

Named Pipe Client Connections

CreateFile with named pipe name

– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status Functions

– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa );

Combines a WriteFile, ReadFile sequence into a single call

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut );

Combines the sequence CreateFile, WriteFile, ReadFile, CloseHandle

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

102102

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo );

lpo == NULL

– Call will return as soon as there is a client connection
– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()

– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
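The fixed-size-record advice can be illustrated with a short FIFO round trip. This is a sketch under assumptions: the function name `fifo_record_demo`, the record size of 32 bytes and the example path passed by the caller are all invented for the illustration, and the FIFO is created in a directory the process can write to. A forked child plays the client and writes records; the parent plays the server and counts complete records.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_SIZE 32   /* fixed size preserves message boundaries on a byte stream */

/* Child writes `count` fixed-size records into the FIFO at `path`; the parent
   reads them back. Returns the number of complete records received, -1 on error. */
int fifo_record_demo(const char *path, int count)
{
    unlink(path);                              /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                            /* child: the writing client */
        int wfd = open(path, O_WRONLY);        /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        for (int i = 0; i < count; i++) {
            char rec[RECORD_SIZE] = {0};
            snprintf(rec, sizeof rec, "request %d", i);
            if (write(wfd, rec, sizeof rec) != (ssize_t)sizeof rec)
                _exit(1);
        }
        close(wfd);
        _exit(0);
    }

    int received = 0;
    int rfd = open(path, O_RDONLY);            /* parent: the reading server */
    if (rfd < 0) {
        kill(pid, SIGKILL);                    /* do not leave the child blocked */
        received = -1;
    } else {
        char rec[RECORD_SIZE];
        size_t filled = 0;
        ssize_t n;
        /* read() may return part of a record, so accumulate until one is full */
        while ((n = read(rfd, rec + filled, sizeof rec - filled)) > 0) {
            filled += (size_t)n;
            if (filled == sizeof rec) {
                received++;
                filled = 0;
            }
        }
        close(rfd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return received;
}
```

Each record is well below `PIPE_BUF`, so every `write()` reaches the FIFO as one unit, and the reader recovers boundaries simply by counting bytes.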

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

106106

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism

– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot

– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in network domain

107107

Locate a server via mailslot

[Diagram: application clients 0 to n act as mailslot servers; the application server acts as mailslot client and sends its status message periodically]

App client 0 ... App client n (mailslot servers):

hMS = CreateMailslot( "\\.\mailslot\status", ... );
ReadFile( hMS, &ServStat, ... );
/* connect to server */

App server (mailslot client):

while (...) {
    Sleep( ... );
    hMS = CreateFile( "\\*\mailslot\status", ... );
    ...
    WriteFile( hMS, &StatInfo, ... );
}

108108

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa );

lpszName points to a name of the form

– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout

– Read operation will wait for so many msec
– 0 – immediate return
– MAILSLOT_WAIT_FOREVER – infinite wait

109109

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name - retrieve handle for local mailslot
– \\host\mailslot\[path]name - retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


8080

Critical Sections

Only usable from within the same process

Critical sections are initialized and deleted but do not have handles

Only one thread at a time can be in a critical section

A thread can enter a critical section multiple times; however, the number of Enter and Leave operations must match

Leaving a critical section before entering it may cause deadlocks

No way to test whether another thread is in a critical section

VOID InitializeCriticalSection( LPCRITICAL_SECTION sec );
VOID DeleteCriticalSection( LPCRITICAL_SECTION sec );

VOID EnterCriticalSection( LPCRITICAL_SECTION sec );
VOID LeaveCriticalSection( LPCRITICAL_SECTION sec );
BOOL TryEnterCriticalSection( LPCRITICAL_SECTION sec );
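The rule that Enter and Leave operations must match can be demonstrated with a POSIX recursive mutex, which behaves similarly with respect to nesting. This is an analogy rather than the Win32 API, and `recursive_lock_demo` is a hypothetical name. Where Windows leaves an unmatched Leave undefined (with deadlock as a possible outcome), a POSIX recursive mutex reports the error instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Locks a recursive mutex `depth` times, unlocks it `depth` times, then
   attempts one unmatched unlock. Returns the result of that extra unlock. */
int recursive_lock_demo(int depth)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    for (int i = 0; i < depth; i++)
        pthread_mutex_lock(&m);          /* nested entry by the same thread */
    for (int i = 0; i < depth; i++)
        pthread_mutex_unlock(&m);        /* one unlock per lock */

    int extra = pthread_mutex_unlock(&m);    /* unmatched: mutex is not owned */
    pthread_mutex_destroy(&m);
    return extra;
}
```

With matched calls the mutex ends up free; the unmatched unlock fails with `EPERM` because the calling thread no longer owns it.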

8181

Critical Section Example

/* counter is global, shared by all threads */
volatile int counter = 0;
CRITICAL_SECTION crit;
InitializeCriticalSection ( &crit );

/* … main loop in any of the threads */
while (!done) {
    EnterCriticalSection ( &crit );
    __try {
        counter += local_value;
    }
    __finally {
        /* leave exactly once, even if the guarded code raises an exception */
        LeaveCriticalSection ( &crit );
    }
}
DeleteCriticalSection( &crit );

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

– Processes
– Threads
– Files
– Console input
– File change notifications
– Mutexes
– Events (auto-reset + manual-reset)
– Waitable timers

DWORD WaitForSingleObject( HANDLE hObject, DWORD dwTimeout );

DWORD WaitForMultipleObjects( DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout );

8383

Wait Functions - Details

WaitForSingleObject()

– hObject specifies kernel object
– dwTimeout specifies wait time in msec
– dwTimeout == 0 - no wait, check whether object is signaled
– dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()

– cObjects <= MAXIMUM_WAIT_OBJECTS (64)
– lpHandles - pointer to array identifying these objects
– bWaitAll - whether to wait for the first signaled object or for all objects

Function returns index of first signaled object

Side effects

– Mutexes, auto-reset events, and waitable timers will be reset to non-signaled state after completing wait functions

Page 81: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

8181

Critical Section Example

counter is global shared by all threads

volatile int counter = 0

CRITICAL_SECTION crit

InitializeCriticalSection ( ampcrit )

hellip main loop in any of the threads

while (done)

_try

EnterCriticalSection ( ampcrit )

counter += local_value

LeaveCriticalSection ( ampcrit )

_finally LeaveCriticalSection ( ampcrit )

DeleteCriticalSection( ampcrit )

8282

Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads

ndash Processes

ndash Threads

ndash Files

ndash Console input

File change notificationsFile change notifications

MutexesMutexes

Events (auto-reset + manual-reset)Events (auto-reset + manual-reset)

Waitable timersWaitable timers

DWORD WaitForSingleObject( HANDLE hObject DWORD dwTimeout )

DWORD WaitForMultipleObjects( DWORD cObjects LPHANDLE lpHandles BOOL bWaitAll DWORD dwTimeout )

8383

Wait Functions - Details

WaitForSingleObject()ndash hObject specifies kernel object

ndash dwTimeout specifies wait time in msecdwTimeout == 0 - no wait check whether object is signaled

dwTimeout == INFINITE - wait forever

WaitForMultipleObjects()ndash cObjects lt= MAXIMUM_WAIT_OBJECTS (64)

ndash lpHandles - pointer to array identifying these objects

ndash bWaitAll - whether to wait for first signaled object or all objectsFunction returns index of first signaled object

Side effectsndash Mutexes auto-reset events and waitable timers will be reset to non-signaled state after completing wait functions

8484

Mutexes

Mutexes work across processes

First thread has to call CreateMutex()

When sharing a mutex second thread (process) calls CreateMutex() or OpenMutex()

fInitialOwner == TRUE gives creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() will free mutex object

8585

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

HANDLE OpenMutex( LPSECURITY_ATTRIBUTE lpsaBOOL fInitialOwner LPTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

8686

Mutex Example

counter is global shared by all threads

volatile int done counter = 0

HANDLE mutex = CreateMutex( NULL FALSE NULL )

main loop in any of the threads ret is local

DWORD ret

while (done)

ret = WaitForSingleObject( mutex INFINITE )

if (ret == WAIT_OBJECT_0)

counter += local_value

else mutex was abandoned

break exit the loop

ReleaseMutex( mutex )

CloseHandle( mutex )

8787

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexesndash Synchronization among threads in same process

Five basic functionsndash pthread_mutex_init()

ndash pthread_mutex_destroy()

ndash pthread_mutex_lock()

ndash pthread_mutex_unlock()

ndash pthread_mutex_trylock()

Comparisonndash pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex )

ndash pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0

8888

Semaphores

Semaphore objects are used for resource countingndash A semaphore is signaled when count gt 0

Threadsprocesses use wait functionsndash Each wait function decreases semaphore count by 1

ndash ReleaseSemaphore() may increment count by any value

ndash ReleaseSemaphore() returns old semaphore count

8989

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE OpenSemaphore( LPSECURITY_ATTRIBUTE lpsaLONG cSemInit LONG cSemMaxLPTSTR lpszSemName )

HANDLE ReleaseSemaphore( HANDLE hSemaphoreLONG cReleaseCount LPLONG lpPreviousCount )

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)ndash Manual-reset event can signal several thread simultaneously must be reset manually

ndash PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event

ndash Auto-reset event signals a single thread event is reset automatically

ndash fInitialState == TRUE - create event in signaled state

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

Named Pipes (contd.)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: the OS chooses based on available resources
dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()
The first CreateNamedPipe call creates the named pipe
- Closing the handle to the last instance deletes the named pipe
Polling a pipe
- Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe(HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with the named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- The first form gives better performance (local server)
Status functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo
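
The local and remote name forms used with CreateFile can be assembled with ordinary string formatting. The following is a minimal sketch in portable C (no Windows headers required). build_pipe_name is a hypothetical helper, not a Win32 function, and treating a NULL or empty server argument as the local machine is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the Win32 API): formats a named pipe
 * path into out.
 *   server == NULL or ""  ->  \\.\pipe\name       (local machine)
 *   otherwise             ->  \\server\pipe\name  (remote machine)
 * Returns 0 on success, -1 if the result does not fit in out. */
int build_pipe_name(char *out, size_t out_size,
                    const char *server, const char *name)
{
    assert(out != NULL && name != NULL);
    const char *host = (server == NULL || server[0] == '\0') ? "." : server;
    /* Each backslash is doubled inside a C string literal. */
    int n = snprintf(out, out_size, "\\\\%s\\pipe\\%s", host, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting string is what a client would pass to CreateFile or WaitNamedPipe; a server always uses the local form, since a pipe cannot be created on a remote machine.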

Convenience Functions

BOOL TransactNamedPipe(HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Combines a WriteFile, ReadFile sequence into one call

Convenience Functions (contd.)

BOOL CallNamedPipe(LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Combines CreateFile, WriteFile, ReadFile, CloseHandle
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, or NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe(HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
- The call returns as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED
Use DisconnectNamedPipe to free the handle for a connection from another client
WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()
Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to named pipes
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it is easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic (for writes, up to PIPE_BUF bytes)
A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO
mkfifo() is the UNIX counterpart to CreateNamedPipe()
Use sockets for networked client/server scenarios
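
Because a FIFO is a byte stream, the reader has to impose message boundaries itself. The sketch below shows the fixed-size-record approach on a POSIX system: a child process writes records of RECORD_LEN bytes into a FIFO and the parent reads them back one record at a time. fifo_roundtrip, RECORD_LEN, and the record text are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 32   /* hypothetical fixed record size, well below PIPE_BUF */

/* Creates a FIFO at path, forks a writer that sends nrec fixed-size
 * records, reads them back in the parent, and returns the number of
 * complete records that arrived intact (-1 on setup failure). */
int fifo_roundtrip(const char *path, int nrec)
{
    assert(path != NULL && nrec >= 0);
    unlink(path);                        /* ignore failure: may not exist */
    if (mkfifo(path, 0600) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1; }
    if (pid == 0) {                      /* child: the writing "client" */
        int wfd = open(path, O_WRONLY);  /* blocks until a reader opens */
        if (wfd < 0) _exit(1);
        for (int i = 0; i < nrec; i++) {
            char rec[RECORD_LEN] = {0};
            snprintf(rec, sizeof rec, "request %d", i);
            /* One write per record; RECORD_LEN <= PIPE_BUF keeps it atomic. */
            if (write(wfd, rec, sizeof rec) != (ssize_t)sizeof rec) _exit(1);
        }
        close(wfd);
        _exit(0);
    }

    int rfd = open(path, O_RDONLY);      /* parent: the reading "server" */
    if (rfd < 0) { unlink(path); return -1; }
    int good = 0;
    for (;;) {
        char rec[RECORD_LEN];
        size_t got = 0;
        while (got < sizeof rec) {       /* assemble exactly one record */
            ssize_t n = read(rfd, rec + got, sizeof rec - got);
            if (n <= 0) break;           /* 0 = all writers closed */
            got += (size_t)n;
        }
        if (got < sizeof rec) break;     /* end of stream or error */
        char expect[RECORD_LEN] = {0};
        snprintf(expect, sizeof expect, "request %d", good);
        if (memcmp(rec, expect, sizeof rec) == 0) good++;
    }
    close(rfd);
    waitpid(pid, NULL, 0);
    unlink(path);
    return good;
}
```

A message-mode Windows named pipe preserves these boundaries itself, which is why the fixed-size convention is needed only on the FIFO side of the comparison.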

Client Example Using a Named Pipe

WaitNamedPipe(ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile(ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n");
    exit(3);
}
/* Write the request. */
WriteFile(hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out. */
while (ReadFile(hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf("%s", ResponseRecord);
CloseHandle(hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe(SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf("Server is awaiting next request.\n");
    if (!ConnectNamedPipe(hNamedPipe, NULL)
            || !ReadFile(hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n");
        exit(4);
    }
    printf("Request is %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client. */
    fp = fopen(File, "r");
    while (fgets(ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile(hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose(fp);
    DisconnectNamedPipe(hNamedPipe);
} /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)
Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- A write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
- A client's message can be read by all servers (readers)
Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

[Diagram: application clients 0..n act as mailslot servers (readers); the application server acts as the mailslot client (writer). The message is sent periodically.]

App client 0 .. App client n (mailslot servers):
    hMS = CreateMailslot("\\.\mailslot\status");
    ReadFile(hMS, &ServStat);
    /* connect to server */

App server (mailslot client):
    while (...) {
        Sleep(...);
        hMS = CreateFile("\\*\mailslot\status");
        ...
        WriteFile(hMS, &StatInfo);
    }

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
- \\.\mailslot\[path]name
- The name must be unique; the mailslot is created locally
cbMaxMsg is the maximum message size in bytes
dwReadTimeout
- A read operation will wait for this many msec
- 0: immediate return
- MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name: retrieves a handle for a local mailslot
- \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; maximum message length is 400 bytes
- The client must specify the FILE_SHARE_READ flag
GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


Synchronizing Threads with Kernel Objects

The following kernel objects can be used to synchronize threads
- Processes
- Threads
- Files
- Console input
- File change notifications
- Mutexes
- Events (auto-reset + manual-reset)
- Waitable timers

DWORD WaitForSingleObject(HANDLE hObject, DWORD dwTimeout);
DWORD WaitForMultipleObjects(DWORD cObjects, LPHANDLE lpHandles, BOOL bWaitAll, DWORD dwTimeout);

Wait Functions - Details

WaitForSingleObject()
- hObject specifies the kernel object
- dwTimeout specifies the wait time in msec
  - dwTimeout == 0: no wait, only check whether the object is signaled
  - dwTimeout == INFINITE: wait forever
WaitForMultipleObjects()
- cObjects <= MAXIMUM_WAIT_OBJECTS (64)
- lpHandles: pointer to an array identifying these objects
- bWaitAll: whether to wait for the first signaled object or for all objects
  - When waiting for any object, the function returns the index of the first signaled object (offset from WAIT_OBJECT_0)
Side effects
- Mutexes, auto-reset events, and waitable timers are reset to the non-signaled state after completing wait functions
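
The return value of WaitForMultipleObjects() with bWaitAll == FALSE encodes both the outcome and the handle index. A minimal sketch of decoding it in portable C follows. The constants are redefined here with the values from the Windows SDK headers so that the sketch compiles without <windows.h>; classify_wait and the wait_kind enum are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in the Windows SDK headers; on Windows, include
 * <windows.h> instead of redefining them. */
#define WAIT_OBJECT_0     0x00000000UL
#define WAIT_ABANDONED_0  0x00000080UL
#define WAIT_TIMEOUT      0x00000102UL
#define WAIT_FAILED       0xFFFFFFFFUL

/* Hypothetical result categories for this sketch. */
typedef enum { WK_SIGNALED, WK_ABANDONED, WK_TIMED_OUT, WK_ERROR } wait_kind;

/* Hypothetical helper: classifies the value returned by a wait on `count`
 * handles with bWaitAll == FALSE. For WK_SIGNALED and WK_ABANDONED,
 * *index receives the position of the handle in the lpHandles array. */
wait_kind classify_wait(unsigned long ret, unsigned long count,
                        unsigned long *index)
{
    assert(index != NULL && count >= 1 && count <= 64);
    *index = 0;
    if (ret < WAIT_OBJECT_0 + count) {            /* WAIT_OBJECT_0 + i */
        *index = ret - WAIT_OBJECT_0;
        return WK_SIGNALED;
    }
    if (ret >= WAIT_ABANDONED_0 && ret < WAIT_ABANDONED_0 + count) {
        *index = ret - WAIT_ABANDONED_0;          /* abandoned mutex i */
        return WK_ABANDONED;
    }
    if (ret == WAIT_TIMEOUT)
        return WK_TIMED_OUT;
    return WK_ERROR;                              /* WAIT_FAILED or unknown */
}
```

On Windows the first argument would be the DWORD returned by the wait call; after WK_ERROR, GetLastError() gives the reason.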

Mutexes

Mutexes work across processes
The first thread has to call CreateMutex()
When sharing a mutex, the second thread (process) calls CreateMutex() or OpenMutex()
fInitialOwner == TRUE gives the creator immediate ownership
Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()
ReleaseMutex() gives up ownership
CloseHandle() will free the mutex object

Mutexes

HANDLE CreateMutex(LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName);
HANDLE OpenMutex(DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName);
BOOL ReleaseMutex(HANDLE hMutex);

Mutex Example

/* counter is global, shared by all threads */
volatile int done, counter = 0;
HANDLE mutex = CreateMutex(NULL, FALSE, NULL);

/* main loop in any of the threads, ret is local */
DWORD ret;
while (!done) {
    ret = WaitForSingleObject(mutex, INFINITE);
    if (ret == WAIT_OBJECT_0)
        counter += local_value;
    else /* mutex was abandoned */
        break; /* exit the loop */
    ReleaseMutex(mutex);
}
CloseHandle(mutex);

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process (by default)
Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()
Comparison
- pthread_mutex_lock() will block: equivalent to WaitForSingleObject(hMutex, INFINITE)
- pthread_mutex_trylock() is nonblocking (polling): equivalent to WaitForSingleObject() with timeout == 0
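
The shared-counter pattern from the Win32 mutex example can be sketched with pthreads as follows. NUM_THREADS, INCREMENTS, run_counter_demo, and lock_is_free are hypothetical names for illustration; the comments mark which Win32 call each pthread call corresponds to in the comparison above.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define INCREMENTS  10000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;                 /* shared by all threads */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);       /* blocks, like WaitForSingleObject(h, INFINITE) */
        counter += 1;                    /* critical section */
        pthread_mutex_unlock(&lock);     /* like ReleaseMutex(h) */
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter. */
long run_counter_demo(void)
{
    pthread_t tid[NUM_THREADS];
    counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polls the mutex, like WaitForSingleObject(h, 0): returns 1 if it was
 * free (and releases it again), 0 if it is currently held. */
int lock_is_free(void)
{
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;                        /* EBUSY: someone holds the lock */
    pthread_mutex_unlock(&lock);
    return 1;
}
```

Without the lock/unlock pair around the increment, the final count would usually fall short of NUM_THREADS * INCREMENTS because concurrent read-modify-write updates get lost.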

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0
Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value
- ReleaseSemaphore() returns the old semaphore count (through lpPreviousCount)

Semaphores

HANDLE CreateSemaphore(LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName);
HANDLE OpenSemaphore(DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName);
BOOL ReleaseSemaphore(HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount);

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE: create the event in the signaled state

Events

HANDLE CreateEvent(LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName);
BOOL SetEvent(HANDLE hEvent);
BOOL ResetEvent(HANDLE hEvent);
BOOL PulseEvent(HANDLE hEvent);

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()
Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()
Signaling
- pthread_cond_signal(): one thread
- pthread_cond_broadcast(): all waiting threads
No exact equivalent to manual-reset events
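
Since a condition variable carries no state of its own, one common workaround is to pair it with a mutex-protected flag. The sketch below emulates a manual-reset event that way. The manual_event type and the event_* functions are hypothetical names, and this is one possible emulation rather than an equivalent of the Win32 object (for instance, it does not work across processes).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type emulating a manual-reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* stays true until event_reset() */
} manual_event;

void event_init(manual_event *e, bool initial)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial;
}

void event_set(manual_event *e)          /* like SetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);    /* release every waiter */
    pthread_mutex_unlock(&e->lock);
}

void event_reset(manual_event *e)        /* like ResetEvent() */
{
    pthread_mutex_lock(&e->lock);
    e->signaled = false;
    pthread_mutex_unlock(&e->lock);
}

void event_wait(manual_event *e)         /* like a wait with INFINITE */
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                 /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

static void *waiter(void *arg)
{
    event_wait((manual_event *)arg);
    return NULL;
}

/* Starts n waiters (n <= 16), sets the event once, and returns how many
 * waiters were released and joined. */
int release_all_demo(int n)
{
    manual_event ev;
    pthread_t tid[16];
    int joined = 0;
    assert(n >= 0 && n <= 16);
    event_init(&ev, false);
    for (int i = 0; i < n; i++) {
        int rc = pthread_create(&tid[i], NULL, waiter, &ev);
        assert(rc == 0);
        (void)rc;
    }
    event_set(&ev);
    for (int i = 0; i < n; i++)
        if (pthread_join(tid[i], NULL) == 0)
            joined++;
    event_destroy(&ev);
    return joined;
}
```

Because the flag stays set, a thread that calls event_wait() after event_set() passes straight through, which is the behavior a bare pthread_cond_broadcast() cannot provide.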

Anonymous pipes

BOOL CreatePipe(PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe);

[Diagram: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC
cbPipe: pipe size in bytes; zero == default
A read on the pipe handle will block if the pipe is empty
A write to a full pipe will block
Anonymous pipes are one-way

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe(&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n");
    exit(1);
}
/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput  = GetStdHandle(STD_INPUT_HANDLE);
StartInfoCh1.hStdError  = GetStdHandle(STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags    = STARTF_USESTDHANDLES;
if (!CreateProcess(NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n");
    exit(2);
}
CloseHandle(hWritePipe);



  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 84: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

8484

Mutexes

Mutexes work across processes

The first thread has to call CreateMutex()

To share a mutex, a second thread (or process) calls CreateMutex() or OpenMutex() with the same mutex name

fInitialOwner == TRUE gives the creator immediate ownership

Threads acquire mutex ownership using WaitForSingleObject() or WaitForMultipleObjects()

ReleaseMutex() gives up ownership

CloseHandle() closes the handle; the mutex object is freed once its last handle is closed

Mutexes

HANDLE CreateMutex( LPSECURITY_ATTRIBUTES lpsa, BOOL fInitialOwner, LPCTSTR lpszMutexName )

HANDLE OpenMutex( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszMutexName )

BOOL ReleaseMutex( HANDLE hMutex )

Mutex Example

    /* counter is global, shared by all threads */
    volatile int done, counter = 0;
    HANDLE mutex = CreateMutex( NULL, FALSE, NULL );

    /* main loop in any of the threads, ret is local */
    DWORD ret;
    while (!done) {
        ret = WaitForSingleObject( mutex, INFINITE );
        if (ret == WAIT_OBJECT_0)
            counter += local_value;
        else /* mutex was abandoned */
            break; /* exit the loop */
        ReleaseMutex( mutex );
    }
    CloseHandle( mutex );

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
- Synchronization among threads in the same process (the default; sharing across processes requires the PTHREAD_PROCESS_SHARED attribute)

Five basic functions
- pthread_mutex_init()
- pthread_mutex_destroy()
- pthread_mutex_lock()
- pthread_mutex_unlock()
- pthread_mutex_trylock()

Comparison
- pthread_mutex_lock() will block - equivalent to WaitForSingleObject( hMutex, INFINITE )
- pthread_mutex_trylock() is nonblocking (polling) - equivalent to WaitForSingleObject() with timeout == 0
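For comparison with the Windows mutex example, here is a small sketch of the same shared-counter pattern with POSIX mutexes. The names counter_lock, add_many, run_workers and lock_is_free are hypothetical, and error checking on pthread_create() is left out to keep the sketch short.

```c
#include <pthread.h>

static long counter = 0;                                    /* shared by all threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds 1 to the shared counter `iterations` times. */
static void *add_many(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);    /* blocks, like an INFINITE wait */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* counterpart of ReleaseMutex() */
    }
    return NULL;
}

/* Starts up to 16 workers, waits for all of them, returns the final counter. */
long run_workers(int nthreads, long iterations)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, add_many, &iterations);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return counter;
}

/* Polls the lock: returns 1 if it was free (and releases it again), else 0. */
int lock_is_free(void)
{
    if (pthread_mutex_trylock(&counter_lock) == 0) {  /* never blocks */
        pthread_mutex_unlock(&counter_lock);
        return 1;
    }
    return 0;
}
```

Without the lock/unlock pair around counter++, the final value would usually fall short of nthreads * iterations because increments from different threads would overwrite each other.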

Semaphores

Semaphore objects are used for resource counting
- A semaphore is signaled when count > 0

Threads/processes use wait functions
- Each successful wait decreases the semaphore count by 1
- ReleaseSemaphore() may increment the count by any value, up to the maximum count
- ReleaseSemaphore() returns the old semaphore count through lpPreviousCount

Semaphores

HANDLE CreateSemaphore( LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName )

HANDLE OpenSemaphore( DWORD dwDesiredAccess, BOOL fInheritHandle, LPCTSTR lpszSemName )

BOOL ReleaseSemaphore( HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount )

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
- A manual-reset event can signal several threads simultaneously; it must be reset manually
- PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event (Microsoft's documentation describes PulseEvent() as unreliable and advises against using it)
- An auto-reset event signals a single thread; the event is reset automatically
- fInitialState == TRUE - create event in signaled state

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )

BOOL ResetEvent( HANDLE hEvent )

BOOL PulseEvent( HANDLE hEvent )

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
- pthread_cond_init()
- pthread_cond_destroy()

Wait functions
- pthread_cond_wait()
- pthread_cond_timedwait()

Signaling
- pthread_cond_signal() - one thread
- pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events
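Although pthreads has no direct counterpart, a manual-reset event can be approximated with a flag, a mutex and a condition variable. This is one possible sketch, not a standard API: the type manual_event and the event_* functions are hypothetical names, and return codes of the pthread calls are ignored for brevity.

```c
#include <pthread.h>

/* Hypothetical type: the flag holds the event state, the mutex protects it,
 * and the condition variable lets waiters sleep until the flag changes. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
} manual_event;

void event_init(manual_event *e, int initially_signaled)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initially_signaled;
}

/* Like SetEvent() on a manual-reset event: stays set and wakes every waiter. */
void event_set(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like ResetEvent(): later waiters block again. */
void event_reset(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(h, INFINITE): returns once the flag is set. */
void event_wait(manual_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)                        /* loop handles spurious wakeups */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

void event_destroy(manual_event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

The flag is what makes the state "sticky": a thread that calls event_wait() after event_set() returns immediately, which a bare condition variable would not provide.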

Anonymous pipes

BOOL CreatePipe( PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe )

[Figure: main connects prog1 and prog2 through a pipe]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

A read on the pipe handle will block if the pipe is empty

A write operation to a full pipe will block

Anonymous pipes are one-way

IO Redirection using an Anonymous Pipe

    /* Create default size anonymous pipe, handles are inheritable */
    if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
        fprintf(stderr, "Anon pipe create failed\n"); exit(1);
    }

    /* Set output handle to pipe handle, create first process */
    StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
    StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh1.hStdOutput = hWritePipe;
    StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
            0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
        fprintf(stderr, "CreateProc1 failed\n"); exit(2);
    }
    CloseHandle (hWritePipe);

Pipe example (contd)

    /* Repeat (symmetrically) for the second process */
    StartInfoCh2.hStdInput = hReadPipe;
    StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
    StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
    StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

    if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
            0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
        fprintf(stderr, "CreateProc2 failed\n"); exit(3);
    }
    CloseHandle (hReadPipe);

    /* Wait for both processes to complete */
    WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
    WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
    CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
    CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
    return 0;

Named Pipes

Message oriented
- Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
- Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
- Several clients can communicate with a single server through instances of the same named pipe
- Server can respond to a client using the same instance

Pipe can be accessed over the network
- Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe( LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa )

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
- Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
- PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
- PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
- PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
- PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
- Number of instances
- PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
- Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe() creates the named pipe
- Closing the handle to the last instance deletes the named pipe

Polling a pipe

BOOL PeekNamedPipe( HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage )
- Nondestructive - is there a message waiting for ReadFile()?

Named Pipe Client Connections

CreateFile with named pipe name
- \\.\pipe\[path]pipename
- \\servername\pipe\[path]pipename
- First method gives better performance (local server)

Status Functions
- GetNamedPipeHandleState
- SetNamedPipeHandleState
- GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa )

Combines a WriteFile(), ReadFile() sequence in one call

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut )

Combines CreateFile(), WriteFile(), ReadFile(), CloseHandle() in one call
- dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe( HANDLE hNamedPipe, LPOVERLAPPED lpo )

lpo == NULL
- Call will return as soon as there is a client connection
- Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
- Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
- GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
- FIFOs are half-duplex
- FIFOs are limited to a single machine
- FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
- Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
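The fixed-size-record approach can be sketched on the UNIX side as follows. A child process plays the writer and the parent plays the reader; the function name fifo_round_trip and the record size RECORD_LEN are hypothetical choices for the illustration.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64   /* hypothetical fixed record size in bytes */

/* Creates a FIFO at `path`, lets a child write one fixed-size record,
 * reads it back in the parent and removes the FIFO again.
 * `out` must hold RECORD_LEN bytes. Returns 0 on success, -1 on failure. */
int fifo_round_trip(const char *path, const char *msg, char *out)
{
    char record[RECORD_LEN] = {0};
    int rfd, status = 0;
    ssize_t got = -1;
    pid_t pid;

    strncpy(record, msg, RECORD_LEN - 1);   /* pad the message to a full record */
    unlink(path);                           /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                         /* child: the writer */
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        ssize_t n;
        if (wfd < 0)
            _exit(1);
        n = write(wfd, record, RECORD_LEN);
        close(wfd);
        _exit(n == RECORD_LEN ? 0 : 1);
    }

    rfd = open(path, O_RDONLY);             /* blocks until the writer opens */
    if (rfd >= 0) {
        got = read(rfd, out, RECORD_LEN);
        close(rfd);
    } else {
        kill(pid, SIGKILL);                 /* do not leave the child blocked */
    }
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_LEN && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Since the FIFO carries a plain byte stream, the reader relies on RECORD_LEN to know where one record ends; a message-mode Windows named pipe would preserve that boundary by itself.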

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
            0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "Failure to locate server.\n"); exit(3);
    }

    /* Write the request */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

    /* Read each response and send it to std out */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);

    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
            PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
            1, 0, 0, CS_TIMEOUT, pNPSA);

    while (!Done) {
        printf ("Server is awaiting next request.\n");
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
        }
        printf( "Request is: %s\n", Request.Record);

        /* Send the file, one line at a time, to the client */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                    (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
- One-directional
- Multiple writers/multiple readers (frequently one-to-many communication)
- Message delivery is unreliable
- Can be located over a network domain
- Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
- Each reader (server) creates a mailslot with CreateMailslot()
- Write-only client opens the mailslot with CreateFile() and uses WriteFile() - the open will fail if there are no waiting readers
- Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
- Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 to n act as mailslot servers; the application server acts as the mailslot client and sends its status message periodically]

App client 0 ... App client n (mailslot servers):

    hMS = CreateMailslot("\\.\mailslot\status");
    ReadFile(hMS, &ServStat);
    /* connect to server */

App server (mailslot client):

    while (...) {
        Sleep(...);
        hMS = CreateFile("\\*\mailslot\status");
        ...
        WriteFile(hMS, &StatInfo ...
    }

Creating a mailslot

HANDLE CreateMailslot( LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa )

lpszName points to a name of the form
- \\.\mailslot\[path]name
- Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
- Read operation will wait for so many msec
- 0 - immediate return
- MAILSLOT_WAIT_FOREVER - infinite wait

Opening a mailslot

CreateFile with the following names
- \\.\mailslot\[path]name - retrieves a handle for a local mailslot
- \\host\mailslot\[path]name - retrieves a handle for a mailslot on the specified host
- \\domain\mailslot\[path]name - returns a handle representing all mailslots on machines in the domain
- \\*\mailslot\[path]name - returns a handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
- Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life


109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 87: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

Comparison - POSIX mutexes

POSIX pthreads specification supports mutexes
– Synchronization among threads in same process

Five basic functions
– pthread_mutex_init()
– pthread_mutex_destroy()
– pthread_mutex_lock()
– pthread_mutex_unlock()
– pthread_mutex_trylock()

Comparison
– pthread_mutex_lock() will block; equivalent to WaitForSingleObject(hMutex, INFINITE)
– pthread_mutex_trylock() is nonblocking (polling); equivalent to WaitForSingleObject() with timeout == 0
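
The blocking and polling entry styles compared above can be sketched with the pthread calls themselves. This is a minimal illustration, not code from the course: the names counter_lock, counter, run_two_adders and try_enter are hypothetical.

```c
#include <pthread.h>

/* Hypothetical shared counter protected by a mutex. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Blocking entry: waits until the mutex is free, like an INFINITE wait. */
static void *add_blocking(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                       /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads that each add n; returns the final counter value. */
long run_two_adders(int n)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, add_blocking, &n);
    pthread_create(&b, NULL, add_blocking, &n);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}

/* Polling entry: returns 1 if the lock was taken, 0 if it was busy
   (pthread_mutex_trylock reports EBUSY instead of blocking). */
int try_enter(pthread_mutex_t *m)
{
    return pthread_mutex_trylock(m) == 0 ? 1 : 0;
}
```

A caller that gets 0 from try_enter() is free to do other work and retry later, which is the same pattern as a zero-timeout wait on a Windows mutex handle.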

Semaphores

Semaphore objects are used for resource counting
– A semaphore is signaled when count > 0

Threads/processes use wait functions
– Each wait function decreases semaphore count by 1
– ReleaseSemaphore() may increment the count by any positive value, up to the maximum count
– ReleaseSemaphore() reports the old semaphore count through lpPreviousCount
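
The counting rules above can be modelled in portable C. The sketch below is not the Win32 implementation; it is a hypothetical sketch_sem type built on a pthread mutex and condition variable, meant only to show wait-decrements-by-one and release-by-n with the previous count reported.

```c
#include <pthread.h>

/* Hypothetical portable model of the counting rules listed above. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
    long count;   /* "signaled" while count > 0 */
    long max;
} sketch_sem;

void sketch_sem_init(sketch_sem *s, long initial, long max)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
    s->max = max;
}

/* Wait: blocks while count == 0, then decreases the count by 1. */
void sketch_sem_wait(sketch_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Release: adds n (n > 0) and reports the previous count.
   Returns 0 and leaves the count unchanged if max would be exceeded. */
int sketch_sem_release(sketch_sem *s, long n, long *previous)
{
    int ok = 0;
    pthread_mutex_lock(&s->lock);
    if (n > 0 && s->count + n <= s->max) {
        if (previous)
            *previous = s->count;
        s->count += n;
        pthread_cond_broadcast(&s->nonzero);  /* wake waiters to re-check */
        ok = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

Initialising with sketch_sem_init(&s, 2, 3) models a pool of three resources with two currently free; a release that would push the count past 3 is rejected, which mirrors how ReleaseSemaphore() fails without changing the count.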

Semaphores

HANDLE CreateSemaphore(LPSECURITY_ATTRIBUTES lpsa, LONG cSemInit, LONG cSemMax, LPCTSTR lpszSemName)

HANDLE OpenSemaphore(DWORD fdwAccess, BOOL fInherit, LPCTSTR lpszSemName)

BOOL ReleaseSemaphore(HANDLE hSemaphore, LONG cReleaseCount, LPLONG lpPreviousCount)

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– Manual-reset event can signal several threads simultaneously; must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– Auto-reset event signals a single thread; event is reset automatically
– fInitialState == TRUE: create event in signaled state

Events

HANDLE CreateEvent(LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset, BOOL fInitialState, LPCTSTR lpszEventName)

BOOL SetEvent(HANDLE hEvent)
BOOL ResetEvent(HANDLE hEvent)
BOOL PulseEvent(HANDLE hEvent)

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
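
Since condition variables have no direct manual-reset counterpart, one common workaround is to pair a condition variable with a state flag. The sketch below is an assumption-laden emulation, not a pthreads or Win32 facility: sketch_event, gate, passed and release_waiters are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical emulation of a manual-reset event: a flag, a mutex,
   and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int signaled;
} sketch_event;

void sketch_event_init(sketch_event *e, int initial_state)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signaled = initial_state;
}

/* Set: stays signaled until reset, and wakes every waiting thread. */
void sketch_event_set(sketch_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 1;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

void sketch_event_reset(sketch_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = 0;
    pthread_mutex_unlock(&e->lock);
}

/* Wait: the loop re-checks the flag, so spurious wakeups are harmless. */
void sketch_event_wait(sketch_event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->signaled)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

static sketch_event gate;
static int passed = 0;            /* protected by gate.lock */

static void *waiter(void *arg)
{
    (void)arg;
    sketch_event_wait(&gate);
    pthread_mutex_lock(&gate.lock);
    passed++;
    pthread_mutex_unlock(&gate.lock);
    return NULL;
}

/* Starts up to 8 waiters, signals once, returns how many got through. */
int release_waiters(int n)
{
    pthread_t t[8];
    if (n > 8)
        n = 8;
    sketch_event_init(&gate, 0);
    passed = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    sketch_event_set(&gate);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return passed;
}
```

Because the flag stays set, a thread that arrives after sketch_event_set() still passes, which is the manual-reset behaviour; an auto-reset variant would clear the flag inside the wait and use pthread_cond_signal() instead.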

Anonymous pipes

BOOL CreatePipe(PHANDLE phRead, PHANDLE phWrite, LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe)

[Figure: main; prog1 -> pipe -> prog2]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
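
The one-way, byte-stream behaviour described above can be tried out with the POSIX pipe() call, used here as a stand-in for CreatePipe() so the sketch runs on a UNIX system. The function name pipe_roundtrip is hypothetical, and the message is assumed to be small enough to fit in the pipe buffer so that the write does not block.

```c
#include <string.h>
#include <unistd.h>

/* Sends msg (including its terminating NUL) through a one-way pipe and
   reads it back in the same process. Returns the number of bytes read,
   or -1 on error. Assumes msg is small enough not to fill the pipe. */
long pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fd[2];                        /* fd[0] = read end, fd[1] = write end */
    size_t len = strlen(msg) + 1;

    if (len > outlen || pipe(fd) != 0)
        return -1;

    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);                     /* no writers left: EOF follows the data */

    ssize_t n = read(fd[0], out, outlen);
    close(fd[0]);
    return (long)n;
}
```

In a real redirection setup the two ends would be handed to two different processes, as the CreateProcess example on the following slides does with inheritable handles.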

I/O Redirection using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}

/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE, 0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;

if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle (hReadPipe);

/* Wait for both processes to complete */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name, each over its own instance
– Server can respond to client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe(LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa)

Use same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

BOOL PeekNamedPipe(HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe
– Closing handle to last instance deletes named pipe

Polling a pipe
– PeekNamedPipe() is nondestructive: is there a message waiting for ReadFile?

Named Pipe Client Connections

CreateFile with named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status Functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe(HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa)

Combines a WriteFile, ReadFile sequence into a single call

Convenience Functions

BOOL CallNamedPipe(LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut)

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into a single call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe(HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()
– Client may wait for server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
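
The advice to use fixed-size records over a byte-oriented FIFO can be illustrated with a short sketch. It is only an illustration under stated assumptions: RECORD_LEN, fifo_send_record and the FIFO path passed by the caller are hypothetical, and a forked child stands in for the client process.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64   /* hypothetical fixed record size */

/* Creates a FIFO at path, lets a child process write one fixed-size
   record, and reads it back in the parent.
   Returns 0 on success, -1 on error. */
int fifo_send_record(const char *path, const char *text, char out[RECORD_LEN])
{
    char record[RECORD_LEN] = {0};
    strncpy(record, text, RECORD_LEN - 1);   /* always NUL-terminated */

    unlink(path);                            /* remove a stale FIFO, if any */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }
    if (pid == 0) {                          /* child: the writer */
        int wfd = open(path, O_WRONLY);      /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t w = write(wfd, record, RECORD_LEN);
        close(wfd);
        _exit(w == RECORD_LEN ? 0 : 1);
    }

    ssize_t got = 0;
    int rfd = open(path, O_RDONLY);          /* blocks until the writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);                  /* do not leave the child blocked */
    } else {
        while (got < RECORD_LEN) {           /* byte stream: read a full record */
            ssize_t n = read(rfd, out + got, RECORD_LEN - got);
            if (n <= 0)
                break;
            got += n;
        }
        close(rfd);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    unlink(path);
    return (got == RECORD_LEN && WIFEXITED(status)
            && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because every record has the same length, the reader knows exactly how many bytes make up one request and never has to search the stream for message boundaries, which is what message-mode named pipes provide directly on Windows.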

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server\n"); exit(3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf ("Request is: %s\n", RequestRecord);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many comm.)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (w2k: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates mailslot with CreateMailslot()
– Write-only client opens mailslot with CreateFile() and uses WriteFile(); open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in network domain

Locate a server via mailslot

[Figure: the application server acts as mailslot client and sends its status message periodically; application clients 0 ... n act as mailslot servers and read it]

Mailslot servers (App client 0 ... App client n):
hMS = CreateMailslot("\\.\mailslot\status"); ReadFile(hMS, &ServStat); /* connect to server */

Mailslot client (App server):
while (...) { Sleep(...); hMS = CreateFile("\\*\mailslot\status"); WriteFile(hMS, &StatInfo

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for so many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieve handle for local mailslot
– \\host\mailslot\[path]name: retrieve handle for mailslot on specified host
– \\domain\mailslot\[path]name: returns handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes
– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活


ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 90: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

9090

Events

Multiple threads can be released when a single event is signaled (barrier synchronization)
– A manual-reset event can signal several threads simultaneously and must be reset manually
– PulseEvent() will release all threads waiting on a manual-reset event and automatically reset the event
– An auto-reset event signals a single thread; the event is reset automatically
– fInitialState == TRUE: create the event in the signaled state

HANDLE CreateEvent (LPSECURITY_ATTRIBUTES lpsa, BOOL fManualReset,
        BOOL fInitialState, LPCTSTR lpszEventName)

BOOL SetEvent (HANDLE hEvent)
BOOL ResetEvent (HANDLE hEvent)
BOOL PulseEvent (HANDLE hEvent)

Comparison - POSIX condition variables

pthread's condition variables are comparable to events
– pthread_cond_init()
– pthread_cond_destroy()

Wait functions
– pthread_cond_wait()
– pthread_cond_timedwait()

Signaling
– pthread_cond_signal(): one thread
– pthread_cond_broadcast(): all waiting threads

No exact equivalent to manual-reset events
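
Although there is no exact equivalent, manual-reset behavior can be approximated with a mutex, a condition variable, and a flag. The sketch below is an illustration only, not code from the slides; manual_event, event_set, event_reset, event_wait, and release_waiters are hypothetical names, and the released counter exists purely for the demonstration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical type: a manual-reset event built from POSIX primitives. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* state that a Windows event object keeps itself */
    int             released;   /* demo counter, protected by lock */
} manual_event;

void event_init(manual_event *ev, bool initial_state) {
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = initial_state;
    ev->released = 0;
}

/* Like SetEvent() on a manual-reset event: wake every waiter, stay signaled. */
void event_set(manual_event *ev) {
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Like ResetEvent(): threads that wait afterwards block again. */
void event_reset(manual_event *ev) {
    pthread_mutex_lock(&ev->lock);
    ev->signaled = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Like WaitForSingleObject(hEvent, INFINITE); the loop absorbs spurious wakeups. */
void event_wait(manual_event *ev) {
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->released++;
    pthread_mutex_unlock(&ev->lock);
}

void event_destroy(manual_event *ev) {
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->lock);
}

static void *waiter(void *arg) {
    event_wait((manual_event *)arg);
    return NULL;
}

/* Start n waiting threads, signal once, and report how many got through. */
int release_waiters(int n) {
    enum { MAX_WAITERS = 16 };
    pthread_t tid[MAX_WAITERS];
    manual_event ev;
    int started = 0, released;

    assert(n >= 0 && n <= MAX_WAITERS);
    event_init(&ev, false);
    for (int i = 0; i < n; i++)
        if (pthread_create(&tid[started], NULL, waiter, &ev) == 0)
            started++;
    event_set(&ev);                 /* a single signal releases all of them */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    released = ev.released;         /* safe: all waiters have been joined */
    event_destroy(&ev);
    return released;
}
```

Because the flag stays set until event_reset() is called, a thread that starts waiting after event_set() still passes straight through, which is the property that distinguishes a manual-reset event from a bare pthread_cond_broadcast().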

Anonymous pipes

BOOL CreatePipe (PHANDLE phRead, PHANDLE phWrite,
        LPSECURITY_ATTRIBUTES lpsa, DWORD cbPipe)

[Figure: main creates a pipe that connects prog1 to prog2]

Half-duplex, character-based IPC

cbPipe: pipe size in bytes; zero == default

Read on a pipe handle will block if the pipe is empty

Write operation to a full pipe will block

Anonymous pipes are one-way
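
The one-way, byte-stream behavior listed above can be tried out on a POSIX system, where pipe() plays the role of CreatePipe(). This is a stand-in sketch under that assumption, not the Win32 call itself; pipe_roundtrip is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Send msg through a POSIX pipe and read it back into out.
   Returns the number of bytes received, or -1 on error. */
int pipe_roundtrip(const char *msg, char *out, size_t outsz) {
    int fd[2];                      /* fd[0]: read end, fd[1]: write end */
    size_t len = strlen(msg), got = 0;
    ssize_t n = 0;

    assert(outsz > 0);
    assert(len < 512);              /* small enough to fit in an empty pipe,
                                       so write() below does not block */
    if (pipe(fd) != 0)
        return -1;
    if (write(fd[1], msg, len) != (ssize_t)len) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);                   /* closing the write end lets the reader see EOF */
    while (got < outsz - 1 &&
           (n = read(fd[0], out + got, outsz - 1 - got)) > 0)
        got += (size_t)n;
    close(fd[0]);
    if (n < 0)
        return -1;
    out[got] = '\0';
    return (int)got;
}
```

Data only ever flows from fd[1] to fd[0]; a reply channel would need a second pipe, which is the same restriction the slide states for anonymous pipes.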

I/O Redirection Using an Anonymous Pipe

/* Create default size anonymous pipe, handles are inheritable. */
if (!CreatePipe (&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf (stderr, "Anon pipe create failed\n");
    exit (1);
}
/* Set output handle to pipe handle, create first process. */
StartInfoCh1.hStdInput = GetStdHandle (STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf (stderr, "CreateProc1 failed\n");
    exit (2);
}
CloseHandle (hWritePipe);

Pipe example (contd.)

/* Repeat (symmetrically) for the second process. */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle (STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess (NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles. */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf (stderr, "CreateProc2 failed\n");
    exit (3);
}
CloseHandle (hReadPipe);
/* Wait for both processes to complete. */
WaitForSingleObject (ProcInfo1.hProcess, INFINITE);
WaitForSingleObject (ProcInfo2.hProcess, INFINITE);
CloseHandle (ProcInfo1.hThread); CloseHandle (ProcInfo1.hProcess);
CloseHandle (ProcInfo2.hThread); CloseHandle (ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server over the same pipe name, each through its own instance
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName,
        DWORD fdwOpenMode, DWORD fdwPipeMode,
        DWORD nMaxInstances, DWORD cbOutBuf,
        DWORD cbInBuf, DWORD dwTimeOut,
        LPSECURITY_ATTRIBUTES lpsa)

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (. means the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer,
        LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage)

Named Pipe Client Connections

CreateFile with the named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe (HANDLE hNamedPipe,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, LPOVERLAPPED lpa)

– Combines a WriteFile, ReadFile sequence into one call

BOOL CallNamedPipe (LPCTSTR lpszPipeName,
        LPVOID lpvWriteBuf, DWORD cbWriteBuf,
        LPVOID lpvReadBuf, DWORD cbReadBuf,
        LPDWORD lpcbRead, DWORD dwTimeOut)

– Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into one call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL
– Call will return as soon as there is a client connection
– Returns FALSE if a client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then returns ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe() to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to named pipes
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (POSIX guarantees this for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
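
The fixed-size record advice can be sketched as follows. This is a minimal illustration assuming a POSIX system: a child process plays the client and writes records into a FIFO, and the parent plays the server and reads them back whole. RECORD_LEN, read_record, fifo_roundtrip, and the FIFO path passed in by the caller are all hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64   /* hypothetical fixed record size, far below PIPE_BUF */

/* Read one whole record. Returns 1 on success, 0 on end-of-file, -1 on error. */
static int read_record(int fd, char *rec) {
    size_t got = 0;
    while (got < RECORD_LEN) {
        ssize_t n = read(fd, rec + got, RECORD_LEN - got);
        if (n < 0) return -1;
        if (n == 0) return got == 0 ? 0 : -1;   /* EOF inside a record is an error */
        got += (size_t)n;
    }
    return 1;
}

/* Create a FIFO at path, let a child write count records into it, and
   return how many complete records the parent read (-1 on error). */
int fifo_roundtrip(const char *path, int count) {
    char rec[RECORD_LEN];
    int fd, rc, received = 0;
    pid_t pid;

    assert(count >= 0);
    if (mkfifo(path, 0600) != 0)            /* fails if path already exists */
        return -1;
    pid = fork();
    if (pid < 0) { unlink(path); return -1; }
    if (pid == 0) {                         /* child: plays the client */
        fd = open(path, O_WRONLY);          /* blocks until the reader opens too */
        if (fd < 0) _exit(1);
        for (int i = 0; i < count; i++) {
            memset(rec, 0, sizeof rec);
            snprintf(rec, sizeof rec, "request %d", i);
            if (write(fd, rec, sizeof rec) != (ssize_t)sizeof rec)
                _exit(1);
        }
        close(fd);
        _exit(0);
    }
    fd = open(path, O_RDONLY);              /* parent: plays the server */
    if (fd < 0) {
        kill(pid, SIGKILL);                 /* child may be stuck in open() */
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }
    while ((rc = read_record(fd, rec)) == 1)
        received++;
    close(fd);
    waitpid(pid, NULL, 0);
    unlink(path);
    return rc < 0 ? -1 : received;
}
```

Since every write is one RECORD_LEN block, records from several clients sharing the request FIFO would not interleave, and the reader never has to search for message boundaries. Replies would still need one FIFO per client, as the slide notes.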

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf (stderr, "Failure to locate server.\n");
    exit (3);
}
/* Write the request. */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out. */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);
CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
            || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf (stderr, "Connect or Read Named Pipe error\n");
        exit (4);
    }
    printf ("Request is: %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client. */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
                (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
}
/* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: application clients 0 through n each act as a mailslot server; the application server acts as the mailslot client and sends its status message periodically]

Mailslot servers (App client 0 through App client n):
hMS = CreateMailslot ("\\.\mailslot\status");
ReadFile (hMS, &ServStat);
/* connect to server */

Mailslot client (App server):
while () {
    Sleep ();
    hMS = CreateFile ("\\*\mailslot\status");
    WriteFile (hMS, &StatInfo

Creating a mailslot

HANDLE CreateMailslot (LPCTSTR lpszName, DWORD cbMaxMsg,
        DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieves a handle for a local mailslot
– \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
– \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; maximum message length is 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life 意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 91: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

9191

Events

HANDLE CreateEvent( LPSECURITY_ATTRIBUTE lpsaBOOL fManualReset BOOL fInititalStateLPTSTR lpszEventName )

BOOL SetEvent( HANDLE hEvent )BOOL ResetEvent( HANDLE hEvent )BOOL PulseEvent( HANDLE hEvent )

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

9898

Named Pipes (contd)

BOOL PeekNamedPipe (HANDLE hPipeLPVOID lpvBuffer DWORD cbBufferLPDWORD lpcbRead LPDWORD lpcbAvailLPDWORD lpcbMessage)

nMaxInstances

ndash Number of instances

ndash PIPE_UNLIMITED_INSTANCES OS choice based on resources

dwTimeOut

ndash Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

ndash Closing handle to last instance deletes named pipe

Polling a pipe

ndash Nondestructive ndash is there a message waiting for ReadFile

9999

Named Pipe Client Connections

CreateFile with named pipe namendash pipe[path]pipename

ndash servernamepipe[path]pipename

ndash First method gives better performance (local server)

Status Functionsndash GetNamedPipeHandleState

ndash SetNamedPipeHandleState

ndash GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipeLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDOWRD lpcbRead LPOVERLAPPED lpa)

WriteFile ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeNameLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDWORD lpcbRead DWORD dwTimeOut)

CreateFile WriteFile ReadFile CloseHandle

ndash dwTimeOut NMPWAIT_NOWAIT NMPWAIT_WIAT_FOREVER NMPWAIT_USE_DEFAULT_WAIT

102102

Server eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipeLPOVERLAPPED lpo

lpo == NULLndash Call will return as soon as there is a client connection

ndash Returns false if client connected between CreateNamed Pipe calland ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()ndash Client may wait for serverlsquos ConnectNamedPipe()

Security rights for named pipesndash GENERIC_READ GENERIC_WRITE SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

ndash FIFOs are half-duplex

ndash FIFOs are limited to a single machine

ndash FIFOs are still byte-oriented so its easiest to use fixed-size records in clientserver applications

ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName NMPWAIT_WAIT_FOREVER)

hNamedPipe = CreateFile (ServerPipeName GENERIC_READ | GENERIC_WRITE

0 NULL OPEN_EXISTING FILE_ATTRIBUTE_NORMAL NULL)

if (hNamedPipe == INVALID_HANDLE_VALUE)

fptinf(stderr Failure to locate servern) exit(3)

Write the request

WriteFile (hNamedPipe ampRequest MAX_RQRS_LEN ampnWrite NULL)

Read each response and send it to std out

while (ReadFile (hNamedPipe ResponseRecord MAX_RQRS_LEN ampnRead NULL))

printf (s ResponseRecord)

CloseHandle (hNamedPipe)

return 0

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE PIPE_ACCESS_DUPLEX

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT

1 0 0 CS_TIMEOUT pNPSA)

while (Done)

printf (Server is awaiting next requestn)

if (ConnectNamedPipe (hNamedPipe NULL)

|| ReadFile (hNamedPipe ampRequest RQ_SIZE ampnXfer NULL))

fprintf(stderr ldquoConnect or Read Named Pipe errornrdquo) exit(4)

printf( ldquoRequest is sn RequestRecord)

Send the file one line at a time to the client

fp = fopen (File r)

while ((fgets (ResponseRecord MAX_RQRS_LEN fp) = NULL))

WriteFile (hNamedPipe ampResponseRecord

(strlen(ResponseRecord) + 1) TSIZE ampnXfer NULL)

fclose (fp)

DisconnectNamedPipe (hNamedPipe)

End of server operation

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 92: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

9292

Comparison - POSIX condition variables

pthreadrsquos condition variables are comparable to eventsndash pthread_cond_init()

ndash pthread_cond_destroy()

Wait functionsndash pthread_cond_wait()

ndash pthread_cond_timedwait()

Signalingndash pthread_cond_signal() - one thread

ndash pthread_cond_broadcast() - all waiting threads

No exact equivalent to manual-reset events

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

9494

IO Redirection using an Anonymous Pipe

Create default size anonymous pipe handles are inheritable

if (CreatePipe (amphReadPipe amphWritePipe ampPipeSA 0))

fprintf(stderr ldquoAnon pipe create failednrdquo) exit(1)

Set output handle to pipe handle create first processes

StartInfoCh1hStdInput = GetStdHandle (STD_INPUT_HANDLE)

StartInfoCh1hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh1hStdOutput = hWritePipe

StartInfoCh1dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)Command1 NULL NULL TRUE 0 NULL NULL ampStartInfoCh1 ampProcInfo1))

fprintf(stderr ldquoCreateProc1 failednrdquo) exit(2)

CloseHandle (hWritePipe)

9595

Pipe example (contd)

Repeat (symmetrically) for the second process

StartInfoCh2hStdInput = hReadPipe

StartInfoCh2hStdError = GetStdHandle (STD_ERROR_HANDLE)

StartInfoCh2hStdOutput = GetStdHandle (STD_OUTPUT_HANDLE)

StartInfoCh2dwFlags = STARTF_USESTDHANDLES

if (CreateProcess (NULL (LPTSTR)targv NULL NULLTRUE Inherit handles

0 NULL NULL ampStartInfoCh2 ampProcInfo2))

fprintf(stderr ldquoCreateProc2 failednrdquo) exit(3)

CloseHandle (hReadPipe)

Wait for both processes to complete

WaitForSingleObject (ProcInfo1hProcess INFINITE)

WaitForSingleObject (ProcInfo2hProcess INFINITE)

CloseHandle (ProcInfo1hThread) CloseHandle (ProcInfo1hProcess)

CloseHandle (ProcInfo2hThread) CloseHandle (ProcInfo2hProcess)

return 0

9696

Named Pipes

Message orientedndash Reading process can read varying-length messages precisely as sent by the writing process

Bi-directionalndash Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipendash Several clients can communicate with a single server using the same instance

ndash Server can respond to client using the same instance

Pipe can be accessed over the networkndash location transparency

Convenience and connection functions

9797

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeNameDWORD fdwOpenMode DWORD fdwPipModeDWORD nMaxInstances DWORD cbOutBufDWORD cbInBuf DWORD dwTimeOutLPSECURITY_ATTRIBUTES lpsa )

Use same flag settings forall instances of a named pipe

lpszPipeName pipe[path]pipename

ndash Not possible to create a pipe on remote machine ( ndash local machine)

fdwOpenMode

ndash PIPE_ACCESS_DUPLEX PIPE_ACCESS_INBOUND PIPE_ACCESS_OUTBOUND

fdwPipeMode

ndash PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

ndash PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

ndash PIPE_WAIT or PIPE_NOWAIT (will ReadFile block)

9898

Named Pipes (contd)

BOOL PeekNamedPipe (HANDLE hPipeLPVOID lpvBuffer DWORD cbBufferLPDWORD lpcbRead LPDWORD lpcbAvailLPDWORD lpcbMessage)

nMaxInstances

ndash Number of instances

ndash PIPE_UNLIMITED_INSTANCES OS choice based on resources

dwTimeOut

ndash Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates named pipe

ndash Closing handle to last instance deletes named pipe

Polling a pipe

ndash Nondestructive ndash is there a message waiting for ReadFile

9999

Named Pipe Client Connections

CreateFile with named pipe namendash pipe[path]pipename

ndash servernamepipe[path]pipename

ndash First method gives better performance (local server)

Status Functionsndash GetNamedPipeHandleState

ndash SetNamedPipeHandleState

ndash GetNamedPipeInfo

100100

Convenience Functions

BOOL TransactNamedPipe( HANDLE hNamedPipeLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDOWRD lpcbRead LPOVERLAPPED lpa)

WriteFile ReadFile sequence

101101

Convenience Functions

BOOL CallNamedPipe( LPCTSTR lpszPipeNameLPVOID lpvWriteBuf DWORD cbWriteBufLPVOID lpvReadBuf DWORD cbReadBufLPDWORD lpcbRead DWORD dwTimeOut)

CreateFile WriteFile ReadFile CloseHandle

ndash dwTimeOut NMPWAIT_NOWAIT NMPWAIT_WIAT_FOREVER NMPWAIT_USE_DEFAULT_WAIT

102102

Server eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipeLPOVERLAPPED lpo

lpo == NULLndash Call will return as soon as there is a client connection

ndash Returns false if client connected between CreateNamed Pipe calland ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()ndash Client may wait for serverlsquos ConnectNamedPipe()

Security rights for named pipesndash GENERIC_READ GENERIC_WRITE SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

ndash FIFOs are half-duplex

ndash FIFOs are limited to a single machine

ndash FIFOs are still byte-oriented so its easiest to use fixed-size records in clientserver applications

ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName NMPWAIT_WAIT_FOREVER)

hNamedPipe = CreateFile (ServerPipeName GENERIC_READ | GENERIC_WRITE

0 NULL OPEN_EXISTING FILE_ATTRIBUTE_NORMAL NULL)

if (hNamedPipe == INVALID_HANDLE_VALUE)

fptinf(stderr Failure to locate servern) exit(3)

Write the request

WriteFile (hNamedPipe ampRequest MAX_RQRS_LEN ampnWrite NULL)

Read each response and send it to std out

while (ReadFile (hNamedPipe ResponseRecord MAX_RQRS_LEN ampnRead NULL))

printf (s ResponseRecord)

CloseHandle (hNamedPipe)

return 0

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE PIPE_ACCESS_DUPLEX

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT

1 0 0 CS_TIMEOUT pNPSA)

while (Done)

printf (Server is awaiting next requestn)

if (ConnectNamedPipe (hNamedPipe NULL)

|| ReadFile (hNamedPipe ampRequest RQ_SIZE ampnXfer NULL))

fprintf(stderr ldquoConnect or Read Named Pipe errornrdquo) exit(4)

printf( ldquoRequest is sn RequestRecord)

Send the file one line at a time to the client

fp = fopen (File r)

while ((fgets (ResponseRecord MAX_RQRS_LEN fp) = NULL))

WriteFile (hNamedPipe ampResponseRecord

(strlen(ResponseRecord) + 1) TSIZE ampnXfer NULL)

fclose (fp)

DisconnectNamedPipe (hNamedPipe)

End of server operation

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

109109

Opening a mailslot

CreateFile with the following namesndash mailslot[path]name - retrieve handle for local mailslot

ndash hostmailslot[path]name - retrieve handlefor mailslot on specified host

ndash domainmailslot[path]name - returns handle representing all mailslots on machines in the domain

ndash mailslot[path]name - returns handle representing mailslots on machines in the systemlsquos primary domain max mesg len 400 bytes

ndash Client must specifiy FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Thoughts Change Life意念改变生活

  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 93: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

9393

Anonymous pipes

BOOL CreatePipe( PHANDLE phReadPHANDLE phWriteLPSECURITY_ATTRIBUTES lpsaDWORD cbPipe )

main

prog1 prog2pipe

Half-duplex character-based IPC

cbPipe pipe byte size zero == default

Read on pipe handle will block if pipe is empty

Write operation to a full pipe will block

Anonymous pipes are oneway

I/O Redirection using an Anonymous Pipe

/* Create default-size anonymous pipe; handles are inheritable */
if (!CreatePipe(&hReadPipe, &hWritePipe, &PipeSA, 0)) {
    fprintf(stderr, "Anon pipe create failed\n"); exit(1);
}
/* Set output handle to pipe handle, create first process */
StartInfoCh1.hStdInput = GetStdHandle(STD_INPUT_HANDLE);
StartInfoCh1.hStdError = GetStdHandle(STD_ERROR_HANDLE);
StartInfoCh1.hStdOutput = hWritePipe;
StartInfoCh1.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess(NULL, (LPTSTR)Command1, NULL, NULL, TRUE,
        0, NULL, NULL, &StartInfoCh1, &ProcInfo1)) {
    fprintf(stderr, "CreateProc1 failed\n"); exit(2);
}
CloseHandle(hWritePipe);

Pipe example (contd)

/* Repeat (symmetrically) for the second process */
StartInfoCh2.hStdInput = hReadPipe;
StartInfoCh2.hStdError = GetStdHandle(STD_ERROR_HANDLE);
StartInfoCh2.hStdOutput = GetStdHandle(STD_OUTPUT_HANDLE);
StartInfoCh2.dwFlags = STARTF_USESTDHANDLES;
if (!CreateProcess(NULL, (LPTSTR)targv, NULL, NULL, TRUE, /* Inherit handles */
        0, NULL, NULL, &StartInfoCh2, &ProcInfo2)) {
    fprintf(stderr, "CreateProc2 failed\n"); exit(3);
}
CloseHandle(hReadPipe);
/* Wait for both processes to complete */
WaitForSingleObject(ProcInfo1.hProcess, INFINITE);
WaitForSingleObject(ProcInfo2.hProcess, INFINITE);
CloseHandle(ProcInfo1.hThread); CloseHandle(ProcInfo1.hProcess);
CloseHandle(ProcInfo2.hThread); CloseHandle(ProcInfo2.hProcess);
return 0;

Named Pipes

Message oriented
– Reading process can read varying-length messages precisely as sent by the writing process

Bi-directional
– Two processes can exchange messages over the same pipe

Multiple independent instances of a named pipe
– Several clients can communicate with a single server using the same pipe name, each over its own instance
– Server can respond to a client using the same instance

Pipe can be accessed over the network
– Location transparency

Convenience and connection functions

Using Named Pipes

HANDLE CreateNamedPipe(LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa);

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename
– Not possible to create a pipe on a remote machine (the "." denotes the local machine)

fdwOpenMode
– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode
– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE
– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE
– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd)

nMaxInstances
– Number of instances
– PIPE_UNLIMITED_INSTANCES: the OS chooses based on resources

dwTimeOut
– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe
– Closing the handle to the last instance deletes the named pipe

Polling a pipe
– Nondestructive: is there a message waiting for ReadFile?

BOOL PeekNamedPipe(HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage);

Named Pipe Client Connections

CreateFile with the named pipe name
– \\.\pipe\[path]pipename
– \\servername\pipe\[path]pipename
– First method gives better performance (local server)

Status functions
– GetNamedPipeHandleState
– SetNamedPipeHandleState
– GetNamedPipeInfo
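
The two pipe name forms listed for CreateFile differ only in the machine component. A minimal sketch of building either form in portable C follows; `build_pipe_name` is a hypothetical helper introduced here for illustration, not a Win32 function, and the resulting string is what a client would pass as the file name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a Win32 call): formats a named pipe path.
 * server == NULL or "" selects the local machine, written as ".".
 * Returns 0 on success, -1 if the name does not fit in out. */
int build_pipe_name(char *out, size_t outlen,
                    const char *server, const char *pipename)
{
    const char *host;
    int n;

    assert(out != NULL && pipename != NULL);
    host = (server != NULL && server[0] != '\0') ? server : ".";
    n = snprintf(out, outlen, "\\\\%s\\pipe\\%s", host, pipename);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Passing NULL as the server yields the local form, which is the one the slide recommends when client and server share a machine.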

Convenience Functions

BOOL TransactNamedPipe(HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa);

Performs a WriteFile() followed by a ReadFile() in a single call

Convenience Functions

BOOL CallNamedPipe(LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut);

Performs CreateFile(), WriteFile(), ReadFile(), and CloseHandle() in a single call
– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe(HANDLE hNamedPipe, LPOVERLAPPED lpo);

lpo == NULL
– Call will return as soon as there is a client connection
– Returns false if a client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe()
– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes
– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE

Comparison with UNIX

UNIX FIFOs are similar to a named pipe
– FIFOs are half-duplex
– FIFOs are limited to a single machine
– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications
– Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
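
The fixed-size-record advice for FIFOs can be sketched with POSIX calls. This is an illustrative single-process round trip, not a client/server design: `struct record`, `RECORD_LEN`, and `fifo_roundtrip` are hypothetical names, and the read end is opened non-blocking only so that one process can hold both ends of the FIFO.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_LEN 64   /* assumed fixed record size for this sketch */

/* Hypothetical fixed-size record: every message occupies RECORD_LEN bytes,
 * so the reader always knows how many bytes make up one message. */
struct record {
    char text[RECORD_LEN];
};

/* Creates a FIFO at path, sends one record through it, reads it back,
 * and removes the FIFO. Returns 0 on success, -1 on any failure. */
int fifo_roundtrip(const char *path, const struct record *in,
                   struct record *out)
{
    int rfd = -1, wfd = -1, rc = -1;

    assert(path != NULL && in != NULL && out != NULL);
    if (mkfifo(path, 0600) != 0)
        return -1;

    /* Open the read end non-blocking so open() returns without a writer. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd < 0)
        goto done;
    /* A reader now exists, so the write-only open does not block. */
    wfd = open(path, O_WRONLY);
    if (wfd < 0)
        goto done;

    /* One record per write; RECORD_LEN is well below PIPE_BUF. */
    if (write(wfd, in, sizeof *in) != (ssize_t)sizeof *in)
        goto done;
    if (read(rfd, out, sizeof *out) != (ssize_t)sizeof *out)
        goto done;
    rc = 0;

done:
    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    unlink(path);
    return rc;
}
```

In a real server the request FIFO would stay open across many clients, and each client would name its own response FIFO inside the record, as the slide describes.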

Client Example using Named Pipe

WaitNamedPipe(ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile(ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf(stderr, "Failure to locate server.\n"); exit(3);
}
/* Write the request */
WriteFile(hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
/* Read each response and send it to std out */
while (ReadFile(hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf("%s", ResponseRecord);
CloseHandle(hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe(SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
while (!Done) {
    printf("Server is awaiting next request.\n");
    if (!ConnectNamedPipe(hNamedPipe, NULL)
            || !ReadFile(hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf(stderr, "Connect or Read Named Pipe error\n"); exit(4);
    }
    printf("Request is: %s\n", RequestRecord);
    /* Send the file, one line at a time, to the client */
    fp = fopen(File, "r");
    while (fgets(ResponseRecord, MAX_RQRS_LEN, fp) != NULL)
        WriteFile(hNamedPipe, &ResponseRecord,
                (strlen(ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose(fp);
    DisconnectNamedPipe(hNamedPipe);
} /* End of server operation */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism
– One-directional
– Multiple writers/multiple readers (frequently one-to-many communication)
– Message delivery is unreliable
– Can be located over a network domain
– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot
– Each reader (server) creates a mailslot with CreateMailslot()
– Write-only client opens the mailslot with CreateFile() and uses WriteFile(); the open will fail if there are no waiting readers
– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname
– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: one application server, acting as the mailslot client, periodically sends its status; application clients 0 to n, acting as mailslot servers, read it and then connect to the server]

App clients 0 to n (mailslot servers):
hMS = CreateMailslot("\\.\mailslot\status");
ReadFile(hMS, &ServStat);
/* connect to server */

App server (mailslot client); the message is sent periodically:
while (...) {
    Sleep(...);
    hMS = CreateFile("\\*\mailslot\status");
    ...
    WriteFile(hMS, &StatInfo);
}

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa);

lpszName points to a name of the form
– \\.\mailslot\[path]name
– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout
– Read operation will wait for this many msec
– 0: immediate return
– MAILSLOT_WAIT_FOREVER: infinite wait

Opening a mailslot

CreateFile with the following names
– \\.\mailslot\[path]name: retrieves a handle for a local mailslot
– \\host\mailslot\[path]name: retrieves a handle for a mailslot on the specified host
– \\domain\mailslot\[path]name: returns a handle representing all mailslots on machines in the domain
– \\*\mailslot\[path]name: returns a handle representing mailslots on machines in the system's primary domain; max message length: 400 bytes
– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts
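
The four mailslot name forms share one layout and differ only in the first component. A minimal sketch of composing them in portable C follows; the `mailslot_scope` enum and `build_mailslot_name` are hypothetical names made up for this illustration and are not part of the Win32 API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scope selector for this sketch (not a Win32 type). */
enum mailslot_scope {
    MS_LOCAL,           /* "."    : mailslot on the local machine      */
    MS_HOST_OR_DOMAIN,  /* target : a named host or a named domain     */
    MS_PRIMARY_DOMAIN   /* "*"    : the system's primary domain        */
};

/* Formats a mailslot name for CreateFile. target is used only for
 * MS_HOST_OR_DOMAIN. Returns 0 on success, -1 on a missing target
 * or if the result does not fit in out. */
int build_mailslot_name(char *out, size_t outlen, enum mailslot_scope scope,
                        const char *target, const char *name)
{
    const char *prefix;
    int n;

    assert(out != NULL && name != NULL);
    switch (scope) {
    case MS_LOCAL:
        prefix = ".";
        break;
    case MS_PRIMARY_DOMAIN:
        prefix = "*";
        break;
    default:
        if (target == NULL || target[0] == '\0')
            return -1;  /* a host or domain name is required */
        prefix = target;
        break;
    }
    n = snprintf(out, outlen, "\\\\%s\\mailslot\\%s", prefix, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A reader calling CreateMailslot would only ever use the MS_LOCAL form, since a mailslot is always created on the local machine; the other forms are for writers.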

Thoughts Change Life




  • Unit 3 Concurrency
  • Interrupt Dispatching
  • Thoughts Change Life 意念改变生活
Page 97: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

Using Named Pipes

HANDLE CreateNamedPipe (LPCTSTR lpszPipeName, DWORD fdwOpenMode, DWORD fdwPipeMode, DWORD nMaxInstances, DWORD cbOutBuf, DWORD cbInBuf, DWORD dwTimeOut, LPSECURITY_ATTRIBUTES lpsa)

Use the same flag settings for all instances of a named pipe

lpszPipeName: \\.\pipe\[path]pipename

– Not possible to create a pipe on a remote machine (. – local machine)

fdwOpenMode

– PIPE_ACCESS_DUPLEX, PIPE_ACCESS_INBOUND, PIPE_ACCESS_OUTBOUND

fdwPipeMode

– PIPE_TYPE_BYTE or PIPE_TYPE_MESSAGE

– PIPE_READMODE_BYTE or PIPE_READMODE_MESSAGE

– PIPE_WAIT or PIPE_NOWAIT (will ReadFile block?)

Named Pipes (contd.)

nMaxInstances

– Number of instances

– PIPE_UNLIMITED_INSTANCES: OS choice based on resources

dwTimeOut

– Default time-out period (in msec) for WaitNamedPipe()

First CreateNamedPipe creates the named pipe

– Closing the handle to the last instance deletes the named pipe

Polling a pipe

– Nondestructive – is there a message waiting for ReadFile?

BOOL PeekNamedPipe (HANDLE hPipe, LPVOID lpvBuffer, DWORD cbBuffer, LPDWORD lpcbRead, LPDWORD lpcbAvail, LPDWORD lpcbMessage)

Named Pipe Client Connections

CreateFile with named pipe name:

– \\.\pipe\[path]pipename

– \\servername\pipe\[path]pipename

– First method gives better performance (local server)

Status functions:

– GetNamedPipeHandleState

– SetNamedPipeHandleState

– GetNamedPipeInfo

Convenience Functions

BOOL TransactNamedPipe (HANDLE hNamedPipe, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, LPOVERLAPPED lpa)

Combines a WriteFile, ReadFile sequence into a single call

Convenience Functions

BOOL CallNamedPipe (LPCTSTR lpszPipeName, LPVOID lpvWriteBuf, DWORD cbWriteBuf, LPVOID lpvReadBuf, DWORD cbReadBuf, LPDWORD lpcbRead, DWORD dwTimeOut)

Combines a CreateFile, WriteFile, ReadFile, CloseHandle sequence into a single call

– dwTimeOut: NMPWAIT_NOWAIT, NMPWAIT_WAIT_FOREVER, NMPWAIT_USE_DEFAULT_WAIT

Server: eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipe, LPOVERLAPPED lpo)

lpo == NULL:

– Call will return as soon as there is a client connection

– Returns FALSE if the client connected between the CreateNamedPipe() call and ConnectNamedPipe(); GetLastError() then reports ERROR_PIPE_CONNECTED

Use DisconnectNamedPipe to free the handle for a connection from another client

WaitNamedPipe():

– Client may wait for the server's ConnectNamedPipe()

Security rights for named pipes:

– GENERIC_READ, GENERIC_WRITE, SYNCHRONIZE
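The ConnectNamedPipe() corner case above is easy to mishandle: the call can return FALSE even though a usable client connection already exists. The following is a minimal sketch of that check, not code from the course material. client_is_connected and wait_for_client are hypothetical helper names, and the stand-in typedefs in the non-Windows branch are an assumption added only so the decision logic can be compiled on other platforms.

```c
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in definitions so the decision logic compiles off Windows.
 * They mirror the Win32 names; 535 is the value winerror.h assigns
 * to ERROR_PIPE_CONNECTED. */
typedef int BOOL;
typedef unsigned long DWORD;
#define ERROR_PIPE_CONNECTED 535L
#endif

/* Hypothetical helper: decide whether a client is connected, given the
 * result of ConnectNamedPipe() and, on failure, the GetLastError() value. */
int client_is_connected(BOOL connect_result, DWORD last_error)
{
    if (connect_result)
        return 1;                               /* normal case */
    /* The client connected before the server called ConnectNamedPipe(). */
    return last_error == ERROR_PIPE_CONNECTED;
}

#ifdef _WIN32
/* Blocking wait (lpo == NULL) for a client on an existing pipe instance. */
int wait_for_client(HANDLE hNamedPipe)
{
    BOOL ok = ConnectNamedPipe(hNamedPipe, NULL);
    return client_is_connected(ok, ok ? 0 : GetLastError());
}
#endif
```

A server loop would call wait_for_client(hNamedPipe) and treat only a zero return as an error, so that an early client is served rather than rejected.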

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios
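As a rough illustration of the FIFO side of this comparison, the sketch below creates a FIFO with mkfifo() and passes one fixed-size record from a child process (the "client") to the parent (the "server"). The names fifo_round_trip and RECORD_LEN, the record size of 64 bytes, and the FIFO path passed by the caller are all assumptions made for the example. The record is kept well below PIPE_BUF, the limit up to which POSIX guarantees that a single write to a FIFO is atomic.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RECORD_LEN 64   /* fixed record size, chosen for this sketch */

/* Create a FIFO at path, let a child write one fixed-size record into it,
 * and read that record back in the parent.
 * Returns 0 on success, -1 on error. */
int fifo_round_trip(const char *path, const char *msg, char out[RECORD_LEN])
{
    unlink(path);                           /* remove any stale FIFO */
    if (mkfifo(path, 0600) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {                         /* child: the "client" */
        char rec[RECORD_LEN] = {0};
        strncpy(rec, msg, RECORD_LEN - 1);  /* pad the record with zeros */
        int wfd = open(path, O_WRONLY);     /* blocks until a reader opens */
        if (wfd < 0)
            _exit(1);
        ssize_t n = write(wfd, rec, RECORD_LEN);
        close(wfd);
        _exit(n == RECORD_LEN ? 0 : 1);
    }

    /* parent: the "server" */
    int ok = -1;
    int rfd = open(path, O_RDONLY);         /* blocks until a writer opens */
    if (rfd < 0) {
        kill(pid, SIGKILL);                 /* do not leave the child blocked */
    } else {
        size_t got = 0;
        while (got < RECORD_LEN) {          /* a byte stream may return short reads */
            ssize_t n = read(rfd, out + got, RECORD_LEN - got);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        close(rfd);
        if (got == RECORD_LEN)
            ok = 0;
    }
    waitpid(pid, NULL, 0);
    unlink(path);
    return ok;
}
```

Because the FIFO carries a plain byte stream, the reader loops until a whole record has arrived. With a message-mode Windows named pipe, one ReadFile would instead return one complete message.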

Client Example using Named Pipe

    WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
    hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
        0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hNamedPipe == INVALID_HANDLE_VALUE) {
        fprintf (stderr, "Failure to locate server.\n"); exit (3);
    }
    /* Write the request. */
    WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);
    /* Read each response and send it to std out. */
    while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
        printf ("%s", ResponseRecord);
    CloseHandle (hNamedPipe);
    return 0;

Server Example Using a Named Pipe

    hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
        PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
        1, 0, 0, CS_TIMEOUT, pNPSA);
    while (!Done) {
        printf ("Server is awaiting next request.\n");
        /* Treats any FALSE return as an error, including the case where
           the client connected before ConnectNamedPipe was called. */
        if (!ConnectNamedPipe (hNamedPipe, NULL)
                || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
            fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
        }
        printf ("Request is: %s\n", RequestRecord);
        /* Send the file, one line at a time, to the client. */
        fp = fopen (File, "r");
        while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
            WriteFile (hNamedPipe, &ResponseRecord,
                (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
        fclose (fp);
        DisconnectNamedPipe (hNamedPipe);
    } /* End of server operation. */

Win32 IPC - Mailslots

Mailslots bear some nasty implementation details; they are almost never used

Broadcast mechanism:

– One-directional

– Multiple writers/multiple readers (frequently one-to-many communication)

– Message delivery is unreliable

– Can be located over a network domain

– Message lengths are limited (W2K: < 426 bytes)

Operations on the mailslot:

– Each reader (server) creates a mailslot with CreateMailslot()

– Write-only client opens the mailslot with CreateFile() and uses WriteFile() – open will fail if there are no waiting readers

– Client's message can be read by all servers (readers)

Client lookup: \\*\mailslot\mailslotname

– Client will connect to every server in the network domain

Locate a server via mailslot

[Figure: Application clients 0 to n act as mailslot servers. Each one creates the "status" mailslot with hMS = CreateMailslot (...), reads the server status with ReadFile (hMS, &ServStat), and then connects to the server. The application server acts as the mailslot client: in a loop it calls Sleep (), opens the "status" mailslot with hMS = CreateFile (...), and sends its status with WriteFile (hMS, &StatInfo). The message is sent periodically.]

Creating a mailslot

HANDLE CreateMailslot (LPCTSTR lpszName, DWORD cbMaxMsg, DWORD dwReadTimeout, LPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the form:

– \\.\mailslot\[path]name

– Name must be unique; the mailslot is created locally

cbMaxMsg is the maximum message size in bytes

dwReadTimeout:

– Read operation will wait for this many msec

– 0 – immediate return

– MAILSLOT_WAIT_FOREVER – infinite wait

Opening a mailslot

CreateFile with the following names:

– \\.\mailslot\[path]name – retrieves a handle for a local mailslot

– \\host\mailslot\[path]name – retrieves a handle for a mailslot on the specified host

– \\domain\mailslot\[path]name – returns a handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name – returns a handle representing mailslots on machines in the system's primary domain; max. message length: 400 bytes

– Client must specify the FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts
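The four name forms above differ only in the first path component, which selects how widely a message is delivered. The following sketch builds such names as plain strings so the forms can be compared side by side. The enum mailslot_scope and the function build_mailslot_name are hypothetical helpers introduced for illustration; they are not part of the Win32 API, and the resulting string would still have to be passed to CreateFile().

```c
#include <stddef.h>
#include <stdio.h>

/* Scope of a mailslot name as used by a writer (client). */
enum mailslot_scope {
    MS_LOCAL,           /* \\.\mailslot\name       local machine           */
    MS_HOST,            /* \\host\mailslot\name    one named host          */
    MS_DOMAIN,          /* \\domain\mailslot\name  all machines in domain  */
    MS_PRIMARY_DOMAIN   /* \\*\mailslot\name       system's primary domain */
};

/* Build a mailslot name into out. target is the host or domain name and is
 * only used for MS_HOST and MS_DOMAIN.
 * Returns 0 on success, -1 on bad arguments or if out is too small. */
int build_mailslot_name(char *out, size_t out_size, enum mailslot_scope scope,
                        const char *target, const char *name)
{
    const char *first;

    switch (scope) {
    case MS_LOCAL:
        first = ".";
        break;
    case MS_PRIMARY_DOMAIN:
        first = "*";
        break;
    case MS_HOST:
    case MS_DOMAIN:
        if (target == NULL || *target == '\0')
            return -1;              /* a host or domain name is required */
        first = target;
        break;
    default:
        return -1;
    }

    int n = snprintf(out, out_size, "\\\\%s\\mailslot\\%s", first, name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

A reader always creates its mailslot under the local form, while a writer picks whichever scope matches the set of readers it wants to reach.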

Thoughts Change Life

Page 102: Unit 3: Concurrency Instructor: Hengming Zou, Ph.D. In Pursuit of Absolute Simplicity 求于至简,归于永恒.

102102

Server eliminate the polling loop

BOOL ConnectNamedPipe (HANDLE hNamedPipeLPOVERLAPPED lpo

lpo == NULLndash Call will return as soon as there is a client connection

ndash Returns false if client connected between CreateNamed Pipe calland ConnectNamedPipe()

Use DisconnectNamedPipe to free the handle for connection from another client

WaitNamedPipe()ndash Client may wait for serverlsquos ConnectNamedPipe()

Security rights for named pipesndash GENERIC_READ GENERIC_WRITE SYNCHRONIZE

103103

Comparison with UNIX

UNIX FIFOs are similar to a named pipe

ndash FIFOs are half-duplex

ndash FIFOs are limited to a single machine

ndash FIFOs are still byte-oriented so its easiest to use fixed-size records in clientserver applications

ndash Individual readwrites are atomic

A server using FIFOs must use a separate FIFO for each clientlsquos response although all clients can send requests via a single well known FIFO

Mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked clientserver scenarios

104104

Client Example using Named Pipe

WaitNamedPipe (ServerPipeName NMPWAIT_WAIT_FOREVER)

hNamedPipe = CreateFile (ServerPipeName GENERIC_READ | GENERIC_WRITE

0 NULL OPEN_EXISTING FILE_ATTRIBUTE_NORMAL NULL)

if (hNamedPipe == INVALID_HANDLE_VALUE)

fptinf(stderr Failure to locate servern) exit(3)

Write the request

WriteFile (hNamedPipe ampRequest MAX_RQRS_LEN ampnWrite NULL)

Read each response and send it to std out

while (ReadFile (hNamedPipe ResponseRecord MAX_RQRS_LEN ampnRead NULL))

printf (s ResponseRecord)

CloseHandle (hNamedPipe)

return 0

105105

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE PIPE_ACCESS_DUPLEX

PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT

1 0 0 CS_TIMEOUT pNPSA)

while (Done)

printf (Server is awaiting next requestn)

if (ConnectNamedPipe (hNamedPipe NULL)

|| ReadFile (hNamedPipe ampRequest RQ_SIZE ampnXfer NULL))

fprintf(stderr ldquoConnect or Read Named Pipe errornrdquo) exit(4)

printf( ldquoRequest is sn RequestRecord)

Send the file one line at a time to the client

fp = fopen (File r)

while ((fgets (ResponseRecord MAX_RQRS_LEN fp) = NULL))

WriteFile (hNamedPipe ampResponseRecord

(strlen(ResponseRecord) + 1) TSIZE ampnXfer NULL)

fclose (fp)

DisconnectNamedPipe (hNamedPipe)

End of server operation

106106

Win32 IPC - MailslotsMailslots bear some nasty implementation detailsthey are almost never used

Broadcast mechanism

ndash One-directional

ndash Mutliple writersmultiple readers (frequently one-to-many comm)

ndash Message delivery is unreliable

ndash Can be located over a network domain

ndash Message lengths are limited (w2k lt 426 byte) Operations on the mailslot

ndash Each reader (server) creates mailslot with CreateMailslot()

ndash Write-only client opens mailslot with CreateFile() and uses WriteFile() ndash open will fail if there are no waiting readers

ndash Clientlsquos message can be read by all servers (readers) Client lookup mailslotmailslotname

ndash Client will connect to every server in network domain

107107

Locate a server via mailslot

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

hMS = CreateMailslot( ldquomailslotstatusldquo)ReadFile(hMS ampServStat) connect to server

App client 0

App client n

Mailslot Servers

While () Sleep() hMS = CreateFile( ldquomailslotstatusldquo)

WriteFile(hMS ampStatInfo

App Server

Mailslot Client

Message is sent periodically

108108

Creating a mailslot

HANDLE CreateMailslot(LPCTSTR lpszNameDWORD cbMaxMsgDWORD dwReadTimeoutLPSECURITY_ATTRIBUTES lpsa)

lpszName points to a name of the formndash mailslot[path]name

ndash Name must be unique mailslot is created locally

cbMaxMsg is msg size in byte

dwReadTimeout ndash Read operation will wait for so many msec

ndash 0 ndash immediate return

ndash MAILSLOT_WAIT_FOREVER ndash infinite wait

Opening a mailslot

CreateFile with the following names

– \\.\mailslot\[path]name - retrieves handle for local mailslot

– \\host\mailslot\[path]name - retrieves handle for mailslot on specified host

– \\domain\mailslot\[path]name - returns handle representing all mailslots on machines in the domain

– \\*\mailslot\[path]name - returns handle representing mailslots on machines in the system's primary domain; max message length 400 bytes

– Client must specify FILE_SHARE_READ flag

GetMailslotInfo() and SetMailslotInfo() are similar to their named pipe counterparts

Comparison with UNIX

UNIX FIFOs are similar to named pipes

– FIFOs are half-duplex

– FIFOs are limited to a single machine

– FIFOs are still byte-oriented, so it's easiest to use fixed-size records in client/server applications

– Individual reads/writes are atomic (for writes of at most PIPE_BUF bytes)

A server using FIFOs must use a separate FIFO for each client's response, although all clients can send requests via a single, well-known FIFO

mkfifo() is the UNIX counterpart to CreateNamedPipe()

Use sockets for networked client/server scenarios

Client Example Using a Named Pipe

WaitNamedPipe (ServerPipeName, NMPWAIT_WAIT_FOREVER);
hNamedPipe = CreateFile (ServerPipeName, GENERIC_READ | GENERIC_WRITE,
    0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (hNamedPipe == INVALID_HANDLE_VALUE) {
    fprintf (stderr, "Failure to locate server.\n"); exit (3);
}

/* Write the request */
WriteFile (hNamedPipe, &Request, MAX_RQRS_LEN, &nWrite, NULL);

/* Read each response and send it to std out */
while (ReadFile (hNamedPipe, ResponseRecord, MAX_RQRS_LEN, &nRead, NULL))
    printf ("%s", ResponseRecord);

CloseHandle (hNamedPipe);
return 0;

Server Example Using a Named Pipe

hNamedPipe = CreateNamedPipe (SERVER_PIPE, PIPE_ACCESS_DUPLEX,
    PIPE_READMODE_MESSAGE | PIPE_TYPE_MESSAGE | PIPE_WAIT,
    1, 0, 0, CS_TIMEOUT, pNPSA);

while (!Done) {
    printf ("Server is awaiting next request.\n");
    if (!ConnectNamedPipe (hNamedPipe, NULL)
        || !ReadFile (hNamedPipe, &Request, RQ_SIZE, &nXfer, NULL)) {
        fprintf (stderr, "Connect or Read Named Pipe error\n"); exit (4);
    }
    printf ("Request is: %s\n", Request.Record);

    /* Send the file, one line at a time, to the client */
    fp = fopen (File, "r");
    while ((fgets (ResponseRecord, MAX_RQRS_LEN, fp) != NULL))
        WriteFile (hNamedPipe, &ResponseRecord,
            (strlen (ResponseRecord) + 1) * TSIZE, &nXfer, NULL);
    fclose (fp);
    DisconnectNamedPipe (hNamedPipe);
} /* End of server operation */

Thoughts Change Life
