![Page 1: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/1.jpg)
Concurrent, Multi-core Programming on Windows and .NET
Pre-conference Session
David Callahan, Distinguished Engineer, Microsoft Corporation
Joe Duffy, Lead Software Engineer, Microsoft Corporation
Stephen Toub, Lead Program Manager, Microsoft Corporation
![Page 2: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/2.jpg)
Agenda
• Overview and Architecture
• Mechanisms for Asynchrony
• Lunch
• Topics in Synchronization
• Synchronization Best Practices
• Break
• Designs and Algorithms
• .NET Framework 4.0
• Wrap-up
![Page 4: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/4.jpg)
Overview and Architecture
• The Shift to Manycore: parallel computing matters
• Foundations: parallel computing concepts
• Techniques: concerns, top-down
![Page 5: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/5.jpg)
Moore's Law
“That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.” -- Intel co-founder Gordon Moore in 1965
Quad-core Nehalem announced at IDF in 2007: 731 Million transistors (more than 13 doublings later…)
![Page 6: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/6.jpg)
Spending Moore's Dividend
[Chart: Processor (SPECInt), Memory (MB), and Disk (MB) on a log scale from 1.0 to 1000.0, across Windows 3.1, NT 3.51, Windows 95, Windows 98, Windows 2000, Windows XP, Windows XP SP2, and Vista Premium]
Thanks to Jim Larus of Microsoft Research
52% CAGR in Spec Performance!
Attack of the Killer Micros!
Software is a Gas!
![Page 7: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/7.jpg)
The Power Wall
[Chart: Power Density (W/cm2), log scale from 1 to 10,000, by year from ‘70 to ‘10. Processors plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® processors. Reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface]
Dr. Pat Gelsinger, Sr. VP, Intel Corporation and GM, Digital Enterprise Group, February 19, 2004, Intel Developer Forum, Spring 2004
The Memory Wall
The ILP Wall
Single-thread software performance will not be improving (much)
![Page 8: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/8.jpg)
The Manycore Shift
Future processors:
• More cores, not faster cores
• Optimized for parallel workloads
Intel Larrabee
![Page 9: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/9.jpg)
Return of the free lunch = scalable parallel programs
![Page 10: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/10.jpg)
Parallel Computing
Latent parallelism for future scaling
Focus on data – the scalable dimension
Tasks instead of threads
No silver bullet – many “right” approaches
![Page 11: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/11.jpg)
Outline for the rest of this session
• Running example
• Parallelism, Concurrency, “multi-threading”
• System Attributes
• Approaches to Parallelism
• Mutual-exclusion
• Concurrent Data Structures
• The Rabbit Hole of Performance
• The Black Hole of Correctness
![Page 12: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/12.jpg)
Example: Connected Components
Identify connected components and map every node to its containing component
All code in this talk is pseudo-code

foreach node do node.component = null
foreach node do
  if (node.component == null) then
    node.component = new Component;
    roots.add(node);
    dfsearch(node)
  fi
![Page 13: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/13.jpg)
Example: Connected Components

function dfsearch(n)
  foreach m in adjacent(n) do
    if (m.component == null) then
      m.component = n.component
      dfsearch(m)
    fi
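The sequential pseudo-code above can be sketched in C++ as follows. This is only an illustration: the adjacency-list `Graph` type, the integer component ids, and the `kNoComponent` sentinel (standing in for `null`) are assumptions made for the sketch, not part of the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed representation: adjacent[n] lists the neighbours of node n.
using Graph = std::vector<std::vector<std::size_t>>;

constexpr int kNoComponent = -1;  // plays the role of "null" in the pseudo-code

// Give every unvisited node reachable from n the component of n.
void dfsearch(const Graph& adjacent, std::vector<int>& component, std::size_t n) {
    assert(n < adjacent.size());
    for (std::size_t m : adjacent[n]) {
        if (component[m] == kNoComponent) {
            component[m] = component[n];
            dfsearch(adjacent, component, m);
        }
    }
}

// Returns the component id of every node; roots receives one node per component.
std::vector<int> connected_components(const Graph& adjacent,
                                      std::vector<std::size_t>& roots) {
    std::vector<int> component(adjacent.size(), kNoComponent);
    int next_id = 0;
    for (std::size_t node = 0; node < adjacent.size(); ++node) {
        if (component[node] == kNoComponent) {
            component[node] = next_id++;  // "new Component"
            roots.push_back(node);
            dfsearch(adjacent, component, node);
        }
    }
    return component;
}
```

Calling `connected_components` on a graph with edges 0-1, 1-2, and 3-4 plus an isolated node 5 yields three roots, one per component. The recursion mirrors the slide; a very deep graph would need an explicit stack instead.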
![Page 14: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/14.jpg)
Parallelization Strategy
• Start concurrent searches at arbitrary nodes
• Each is a candidate component
• Identify where searches “collide”
• Arbitrate “ownership” of the nodes
• Record that two candidates connect
• Candidates & connections form a reduced graph
• Recursively find components on reduced graph
• Update nodes to refer to final components
![Page 15: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/15.jpg)
Parallelism v. Concurrency
Concurrent processing: independent requests (most server applications)
Parallel processing: decompose one task to enable concurrent execution
“Start concurrent searches …”
“Arbitrate “ownership” of the nodes”
Scheduling tasks
Simulating isolation of threads
Multi-threading, Asynchronous, …
![Page 16: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/16.jpg)
System Metrics
• Fairness: allocating resources to competing uses
• Preemption: removing resources from one use to enable another
• Responsiveness: latency from request to response
• Throughput: work (requests) per unit time
For parallelism, not a goal but a context
Existing architectural concern
Drive overheads down
![Page 17: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/17.jpg)
Approaches to Parallelism
Ways to structure the code
• Task v. Data Parallelism
• Structured Multi-threading
• Work-sharing (SPMD, OpenMP)
• Dataflow
![Page 18: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/18.jpg)
Task v. Data
parallel foreach node do node.component = NULL
parallel foreach node do …start a parallel search …
Classically data parallel: same operation applied to a homogeneous collection
Data-focused but built on an underlying “task” model for generality
![Page 19: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/19.jpg)
Structured Multi-threading
function dfsearch(n)
  parallel foreach m in adjacent(n) do
    if (… first to visit m …) dfsearch(m) fi

• Emphasize recursive decomposition
• Preserves function interfaces
• “fork-join”
• Structured control constructs
• Parallel loops, co-begin
Each iteration is a task
All tasks finish before function returns
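A minimal fork-join sketch in C++, assuming `std::async` as the task mechanism (the talk does not prescribe one). The function `parallel_sum` and its `grain` parameter are hypothetical names for this illustration; the point is that the child task is joined before the function returns, so the caller sees an ordinary function interface.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Sum values[lo, hi) by recursive decomposition.
long long parallel_sum(const std::vector<int>& values, std::size_t lo,
                       std::size_t hi, std::size_t grain = 1024) {
    assert(lo <= hi && hi <= values.size());
    if (hi - lo <= grain) {
        // Small enough: do the work sequentially.
        return std::accumulate(values.begin() + lo, values.begin() + hi, 0LL);
    }
    std::size_t mid = lo + (hi - lo) / 2;
    // Fork: the left half runs as a child task.
    std::future<long long> left = std::async(std::launch::async,
        [&values, lo, mid, grain] { return parallel_sum(values, lo, mid, grain); });
    // The current thread handles the right half itself.
    long long right = parallel_sum(values, mid, hi, grain);
    // Join: the child finishes before this function returns.
    return left.get() + right;
}
```

`std::launch::async` starts a thread per fork, which is simple but heavier than a pooled task scheduler; the grain size keeps the number of forks small.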
![Page 20: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/20.jpg)
Work-sharing
Parallel -- acquire workers
  shared foreach node do node.component = NULL
  -- implied barrier, workers wait
  shared foreach node do …start a parallel search …
-- release workers

• Emphasizes processors
• “fork-join” threads + barrier
• Structured control constructs
• “shared loops”
• Improving support for recursion
OpenMP is the common binding of this model
Resource Management is too hard
![Page 21: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/21.jpg)
Dataflow
A computation is a network
• Data flows along connections
• Nodes consume, transform, and produce data
A good fit for
• Streaming media algorithms
• Concurrent event systems
• Component interactions for responsiveness
• Occasionally in recursive algorithms
[Figure: data flow graph for subtasks of Strassen Multiplication, with nodes m1 … m7 and c11, c12, c21, c22]
![Page 22: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/22.jpg)
Mutual Exclusion
function dfsearch(n)
  foreach m in adjacent(n) do
    if (m.component == NULL) then
      m.component = n.component
      dfsearch(m)
    fi
Search 1 Search 2
If(m.component == null)? If(m.component == null)?
m.Component = n1.component m.Component = n2.component
dfsearch(m) dfsearch(m) !
Identify where searches “collide”
Arbitrate “ownership” of the nodes
![Page 23: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/23.jpg)
Locking for Mutual Exclusion
function dfsearch(n)
  foreach m in adjacent(n) do
    m.lock();
    var old = m.component;
    if (old == NULL) m.component = n.component
    m.unlock();
    if (old == NULL) then
      dfsearch(m)
    else if (old != n.component) then
      -- record the “edge” between searches
    endif

Locks provide exclusion, but the algorithm's correctness depends on careful reasoning that order does not matter
One action at a time for any specific node
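The lock, read, conditional write, unlock sequence can be sketched in C++ with one `std::mutex` per node. The `Node`, `Component`, and `Visit` types and the `visit` function are hypothetical names for this illustration; the caller would continue the search on `Claimed` and record an edge on `Collision`.

```cpp
#include <cassert>
#include <mutex>

struct Component { int id; };

struct Node {
    std::mutex lock;                 // protects component
    Component* component = nullptr;  // nullptr means "not yet visited"
};

// What one search learned when it tried to visit a node.
enum class Visit { Claimed, AlreadyOurs, Collision };

Visit visit(Node& m, Component* mine) {
    assert(mine != nullptr);
    Component* old;
    {
        // One action at a time for this specific node.
        std::lock_guard<std::mutex> guard(m.lock);
        old = m.component;
        if (old == nullptr) m.component = mine;
    }
    // Decisions are made outside the lock, on the value read inside it.
    if (old == nullptr) return Visit::Claimed;  // caller continues dfsearch(m)
    if (old != mine) return Visit::Collision;   // caller records the edge
    return Visit::AlreadyOurs;
}
```

Whichever search takes the lock first owns the node; the result is a valid candidate component either way, which is the "order does not matter" reasoning the slide refers to.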
![Page 24: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/24.jpg)
Lock-free Techniques
word compare_and_swap(word * loc, word oldv, word newv) {
  word current = *loc;
  if (current == oldv) *loc = newv;
  return current;
}

function dfsearch(n)
  foreach m in adjacent(n) do
    var old = compare_and_swap(&m.component, NULL, n.component)
    if (old == NULL) then

Common hardware primitive
• Short duration
• Preemption friendly
• Limited scenarios
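A hedged C++ sketch of the same claim step using `std::atomic` and `compare_exchange_strong`, which performs the compare-and-swap as one atomic action. `try_claim` and `count_winners` are hypothetical helpers for this illustration, and `count_winners` exists only to show that exactly one of several racing searches claims the node.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct Component { int id; };

struct Node {
    std::atomic<Component*> component{nullptr};
};

// Returns the previous owner, like compare_and_swap on the slide:
// nullptr means this caller claimed the node.
Component* try_claim(Node& m, Component* mine) {
    assert(mine != nullptr);
    Component* expected = nullptr;
    // On failure, compare_exchange_strong stores the current owner in expected.
    m.component.compare_exchange_strong(expected, mine);
    return expected;
}

// Lets several searches race for one node and counts how many claimed it.
int count_winners(int searches) {
    Node m;
    std::vector<Component> candidates(searches);
    std::atomic<int> winners{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < searches; ++i) {
        threads.emplace_back([&, i] {
            if (try_claim(m, &candidates[i]) == nullptr) winners.fetch_add(1);
        });
    }
    for (std::thread& t : threads) t.join();
    return winners.load();
}
```

No thread ever holds a lock here, so a preempted search cannot block the others, which matches the "preemption friendly" point above.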
![Page 25: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/25.jpg)
Concurrent Data Structures
function dfsearch(n, edges)
  foreach m in adjacent(n) do
    m.lock(); -- Arbitrate “ownership” of the nodes
    var old = m.component;
    if (old == NULL) m.component = n.component
    m.unlock();
    if (old == NULL) then
      dfsearch(m, edges)
    else if (old != n.component) then
      edges.insert(old, n.component)
    endif

Identify where searches “collide”
… Record that two candidates connect
Concurrency Safe
High-Bandwidth
![Page 26: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/26.jpg)
Example top level
parallel foreach node do node.component = NULL

parallel foreach node do
  node.lock()
  var old = node.component
  if (old == NULL) node.component = new Component
  node.unlock()
  if (old == NULL) then
    roots.add(node)
    dfsearch(node, edges)
  fi
-- (roots, edges) form a derived problem
![Page 27: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/27.jpg)
Parallel Computing
Moore’s Dividend: Sequential → Parallel
• Identify computations that are currently or may grow to be performance concerns
• Over-decompose for scaling (the new Free Lunch)
• Structured multi-threading with a data focus
• Relax sequential order to gain more parallelism
• Ensure atomicity of unordered interactions
• Consider data as well as control flow
• Careful data structure & locking choices to manage contention
![Page 28: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/28.jpg)
The Rabbit Hole of Performance
[Chart: time and efficiency (scale 0 to 120) against number of processors (1 to 14). Series: Time 95%, Efficiency 95%, Efficiency 99%]
A program that is 95% (99%) parallel, with 3% overhead to parallelize
• Contention
• Load Balance
• Cache Effects
• Latencies
• Preemption
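The shape of such a chart can be approximated with Amdahl's law. The sketch below is an assumption about the model, not the formula used for the slide: sequential run time is normalised to 1, a fraction `f` of the work divides evenly across `p` processors, and the parallelization overhead is charged as a fixed extra fraction whenever `p > 1`. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Run time on p processors relative to the sequential run time (= 1).
// f: fraction of the work that parallelises; overhead: assumed fixed cost.
double relative_time(double f, double overhead, int p) {
    assert(p >= 1 && f >= 0.0 && f <= 1.0 && overhead >= 0.0);
    if (p == 1) return 1.0;  // the sequential version pays no overhead
    return (1.0 - f) + f / p + overhead;
}

// Speedup over the sequential program.
double speedup(double f, double overhead, int p) {
    return 1.0 / relative_time(f, overhead, p);
}

// Efficiency: speedup per processor, 1.0 being ideal.
double efficiency(double f, double overhead, int p) {
    return speedup(f, overhead, p) / p;
}
```

Under these assumptions, a 95% parallel program with 3% overhead runs at under half efficiency on 14 processors, while a 99% parallel one stays noticeably higher, so the sequential fraction dominates long before contention or cache effects are considered.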
![Page 29: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/29.jpg)
The Black Hole of Correctness
• Data races
• Deadlock
• Livelock
• Memory hierarchy
• Performance robustness
• Reproducibility & Testing
![Page 30: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/30.jpg)
PDC Parallelism Sessions
Microsoft Visual Studio: Bringing out the Best in Multicore Systems
Date/Time: Monday, Oct. 27 1:45 PM – 3:00 PM
Parallel Programming for C++ Developers in the Next Version of Microsoft Visual Studio
Date/Time: Monday, Oct. 27 3:30 PM – 4:45 PM
Parallel Programming for Managed Developers with the Next Version of Microsoft Visual Studio
Date/Time: Wednesday, Oct. 29 10:30 AM – 11:45 AM
Concurrency Runtime Deep Dive: How to Harvest Multicore Computing Resources
Date/Time: Wednesday, Oct. 29 1:15 PM – 2:30 PM
Research: Concurrency Analysis Platform and Tools for Finding Concurrency Bugs
Date/Time: Wednesday, Oct. 29 10:30 AM – 11:45 AM
The Concurrency and Coordination Runtime and Decentralized Software Services Toolkit
Date/Time: Tuesday, Oct. 28 1:45 PM – 3:00 PM
Parallel Computing Application Architectures and Opportunities
Date/Time: Thursday, Oct. 30 10:15 AM – 11:45 AM
Addressing the Hard Problems of Concurrency
Date/Time: Thursday, Oct. 30 8:30 AM – 10:00 AM
Future of Parallel Computing (Panel)
Date/Time: Thursday, Oct. 30 12:00 PM – 1:30 PM
![Page 31: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/31.jpg)
Overview and Architecture Mechanisms for Asynchrony Lunch Topics in Synchronization Synchronization Best Practices Break Designs and Algorithms .NET Framework 4.0 Wrap-up
Agenda
![Page 32: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/32.jpg)
Scheduling: Explicit Threading
For coarse-grained work and agents
System.Threading.Thread
• Exposes Windows threads
• Explicit control over concurrent work
  • Join: wait for it to exit
  • IsBackground: allows process to exit while threads are alive
  • Interrupt: dangerous!
  • Abort: even more dangerous!
  • Suspend/Resume: run away!!
Cons:
• Too expensive for fine-grained work
• No process-wide resource management

Thread t = new Thread(delegate()
{
    // concurrent work
});
t.Start();
![Page 33: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/33.jpg)
Scheduling: ThreadPool
For fine-grained work
System.Threading.ThreadPool
• Executes work with a pool of Threads
• Manages queues of work items and shared threads
• Each thread dispatches work items from the queue
• Improvements for .NET 4.0 to be discussed later
• CLR controls # of threads to ensure good scaling
Best for fine-grained (task and data) parallelism
• Short-lived work items that don’t (or seldom) block
• Also used for async I/O completion (FileStream.BeginRead)
• Prefer over asynchronous delegate invocation
Cons:
• Blocking on the thread pool can lead to deadlocks / degradation
• Not straightforward to wait, cancel, continue from, etc.

ThreadPool.QueueUserWorkItem(delegate
{
    // concurrent work
});
![Page 34: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/34.jpg)
ThreadPool in Action
[Diagram labels: Program Thread; ThreadPool with Queue; Worker Thread 1 … Worker Thread p; Item 1 to Item 6]
![Page 36: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/36.jpg)
Scheduling: ThreadPool
Advanced capabilities
Two pools of work/threads
• One for work items (QueueUserWorkItem)
• One for IO completions
• Both are process-wide
Min/max threads
• Default: min = 0, max = 250 * CPU
• Max used to be the cause of frequent deadlocks
• Fixed in the .NET Framework 3.5
Thread injection & retirement is automatic
• CLR has a daemon thread that watches for blocking
• Throttles creation at 2 threads/second
• Min is often used to reduce “startup” latency
![Page 37: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/37.jpg)
Scheduling: Async Prog Model (APM)
Common Async API Pattern in the Framework
API convention supported by much of .NET
Synch API
  int Foo(object o, string s);
Async APIs
  IAsyncResult BeginFoo(object o, string s, AsyncCallback callback, object state);
  int EndFoo(IAsyncResult result);
• “Begin” schedules work to run asynchronously
• “End” retrieves the operation return value or throws exception
Rendezvous options
• Blocking: wait on IAsyncResult.AsyncWaitHandle
• Callback: pass an AsyncCallback to “Begin”; runs when completed
• Polling: check IAsyncResult.IsCompleted
Must call “End” (in almost all cases)
![Page 38: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/38.jpg)
Scheduling: I/O Completion Ports
Efficient async I/O on Windows
I/O completion ports
• Bound to file HANDLEs and network sockets
• Async I/O is entirely async in hardware
• Interrupt posts packet to port when complete
• Threads block on port to be notified of packets
• Windows throttles waking for good scaling
CLR ThreadPool has a single, global port
• All async I/O via APM goes through it
• Pool of thread-pool I/O threads wait on the port
• Highly efficient: # of blocked threads is amortized
Can post to it directly using ThreadPool API
  public static unsafe bool UnsafeQueueNativeOverlapped(NativeOverlapped* overlapped)
![Page 39: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/39.jpg)
Background Work and UIs
UI Marshaling
.NET UI frameworks have thread affinity
• Controls must be accessed on creating thread
Each framework has marshaling mechanism
• Windows Forms (Control.Invoke / BeginInvoke)
• WPF (Dispatcher.Invoke / BeginInvoke)

// on background thread
Control c = …;
c.BeginInvoke((Action)delegate
{
    // runs on UI thread
});

// on background thread
Control c = …;
c.Dispatcher.BeginInvoke((Action)delegate
{
    // runs on UI thread
});
![Page 40: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/40.jpg)
Background Work and UIs
Synchronization Context
SynchronizationContext hides marshaling details
• Send (sync) and Post (async)
SynchronizationContext
• Send: d()
• Post: ThreadPool.QueueUserWorkItem
WindowsFormsSynchronizationContext
• Send: Control.Invoke
• Post: Control.BeginInvoke
DispatcherSynchronizationContext
• Send: Dispatcher.Invoke
• Post: Dispatcher.BeginInvoke
…
SynchronizationContext.Current
AsyncOperationManager.SynchronizationContext
![Page 41: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/41.jpg)
Background Work and UIs
BackgroundWorker
Performs work on the right thread
• Heavy lifting on ThreadPool thread: DoWork event
• Progress reporting and completion on UI thread: ProgressChanged and RunWorkerCompleted events
• Kicked off by a call to RunWorkerAsync
Also supports cancellation
• Initiated by CancelAsync
• DoWork needs to poll the CancellationPending flag
Built on top of SynchronizationContext
• Captured when BackgroundWorker is instantiated
• Works for both Windows Forms and WPF
• Hides Control.Invoke, Dispatcher.Invoke, etc.
![Page 42: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/42.jpg)
Thread hopping
ExecutionContext
Ambient state associated with each thread
• Security info, call context, …
• Async points represent a logical continuation
• State must be flowed
ExecutionContext
• Capture: gets the current context
• Run: executes a delegate with a captured context
Flowed automatically
• Thread, ThreadPool, Control.BeginInvoke, etc.
• But if you do something funky…
Flow can be suppressed
• SuppressFlow
• ThreadPool.UnsafeQueueUserWorkItem
![Page 43: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/43.jpg)
Overview and Architecture Mechanisms for Asynchrony Lunch (12pm-1:15pm) Topics in Synchronization Synchronization Best Practices Break Designs and Algorithms .NET Framework 4.0 Wrap-up
Agenda
![Page 45: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/45.jpg)
Concurrency and State
The Pitfalls of Shared Memory
Threads in a process share access to memory
• Easy to share information; enticing even!
• Hard to identify what is shared
  • Private: locals, heap objects not shared
  • Shared: statics, heap objects shared explicitly, transitive

E.g.
class C
{
    static int s_f;
    int m_f;
public:
    void f(int * py)
    {
        int x = 0;
        x++;      // local variable
        s_f++;    // static class member
        m_f++;    // class member
        (*py)++;  // pointer to something
    }
};
![Page 46: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/46.jpg)
Managing State
Isolation, Immutability, and Synchronization
Isolation (a.k.a. confinement)
• Memory space is “partitioned”
• No two threads ever access the same state
• +: no overhead, easy to reason about
• -: sharing is often needed, leading to message passing
Immutability
• Data is only read, not written
• +: no overhead, easy to reason about
• -: C# and VB encourage mutability … [lineage]
• -: copying means efficiency can be a challenge
• +: see F# for promising advances!
Synchronization
• Lock access to shared state
• +: flexible, programming techniques remain similar
• -: perf overhead, deadlocks, races, …
![Page 47: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/47.jpg)
Sharing Hazards
R/W
Read/Write, a.k.a. unrepeatable read
t1’s 2nd read of x different from its 1st

static int x = 0;

void t1() {                     void t2() {
    int y = x;                      …
                                    x = 42;
    int z = x; // y != z        }
}
![Page 48: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/48.jpg)
Sharing Hazards
W/R
Write/Read, a.k.a. dirty read
t2 saw t1’s initial write of x, but it got “rolled back”

static int x = 0;

void t1() {                     void t2() {
    try {
        x = 42;                     …
        …                           int y = x;
        throw e;                    …
    } catch {                       f(y); // whoops
        // rollback!            }
        x = 0;
        throw;
    }
}
![Page 49: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/49.jpg)
Sharing Hazards
W/W
Write/Write
What value exists in x at the end? And what do y and z contain? Who knows.

static int x = 0;

void t1() {                     void t2() {
    x = 42;                         x = 99;
    int y = x;                      int z = x;
}                               }
![Page 50: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/50.jpg)
On Serializability and Linearizability
Ensuring A happens-before (→) B
Transactions are useful concepts
For some method M, …
• A – atomic: M happens all-at-once
• C – consistent: M never moves the program into an inconsistent state
• I – isolated: M’s intermediary work is isolated
• D – durable: M’s effects persist for the lifetime of the context in which the effects take place
The effect? Serializability!
• Given two properly serialized methods M and N, we say either M → N or N → M
Linearization points
• The place where M’s effects take place
Locks are a common implementation technique
![Page 51: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/51.jpg)
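Concurrency and Time
Example of a Serializability Problem

static int x;
…
x++;

// compiles to

MOV EAX, [x]
INC EAX
MOV [x], EAX

Three threads (t0, t1, t2) each run x++. T is the time step; # is the value loaded, computed, or stored by that step.

T   Step    Instruction    #
0   t2(0)   MOV EAX,[x]    0
1   t0(0)   MOV EAX,[x]    0
2   t0(1)   INC EAX        1
3   t0(2)   MOV [x],EAX    1
4   t1(0)   MOV EAX,[x]    1
5   t1(1)   INC EAX        2
6   t1(2)   MOV [x],EAX    2
7   t2(1)   INC EAX        1
8   t2(2)   MOV [x],EAX    1

x ends at 1 although three increments ran: t2 loaded x before the other two threads stored their results, then overwrote them.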
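The interleaving on this slide can be replayed deterministically. The sketch below is a small simulator, not real multi-threaded code: each simulated thread has a private `EAX`, and a schedule lists which thread performs which of the three instructions at each time step. `Step`, `Event`, `replay`, and `slide_schedule` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <vector>

// The three instructions that x++ compiles to.
enum class Step { Load, Inc, Store };  // MOV EAX,[x] / INC EAX / MOV [x],EAX

struct Event {
    int thread;  // which simulated thread runs
    Step step;   // which instruction it executes
};

// Replays a schedule against a shared x; each thread has its own EAX.
int replay(const std::vector<Event>& schedule, int thread_count) {
    int x = 0;
    std::vector<int> eax(thread_count, 0);
    for (const Event& e : schedule) {
        assert(e.thread >= 0 && e.thread < thread_count);
        switch (e.step) {
            case Step::Load:  eax[e.thread] = x; break;
            case Step::Inc:   ++eax[e.thread];   break;
            case Step::Store: x = eax[e.thread]; break;
        }
    }
    return x;
}

// The schedule from the slide: t2 loads first, then stores last.
std::vector<Event> slide_schedule() {
    return {
        {2, Step::Load},
        {0, Step::Load}, {0, Step::Inc}, {0, Step::Store},
        {1, Step::Load}, {1, Step::Inc}, {1, Step::Store},
        {2, Step::Inc},  {2, Step::Store},
    };
}
```

Replaying `slide_schedule()` with three threads leaves `x` at 1, whereas any schedule that runs each thread's three steps back to back leaves it at 3. That difference is the lost update.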
![Page 52: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/52.jpg)
Concurrency Begets Complexity

| | Sequential | Concurrent |
|---|---|---|
| Behavior | Deterministic | Nondeterministic |
| Memory | Stable | In flux (unless private, read-only, or protected by a lock) |
| Locks | Unnecessary | Essential |
| Invariants | Must hold only on method entry/exit or calls to external code | Must hold any time the protecting lock is not held |
| Deadlock | Impossible | Possible, but can be mitigated |
| Testing | Code coverage finds most bugs | Code coverage insufficient; races, timing, and environments probabilistically change |
| Debugging | Trace execution leading to failure; finding a fix is generally assured | Postulate a race and inspect code; root causes easily remain unidentified |
![Page 53: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/53.jpg)
Interlocked Operations: Hardware Synchronization
Interlocked operations are the lowest-level sync primitive
  A.k.a. compare-and-swap (CAS)
  XCHG, LOCK CMPXCHG, LOCK CMPXCHG8B
  Atomic sequence of read and write
  Locks, events, etc. are all built out of them
System.Threading.Interlocked static class
  int Add(ref int l, int v);
  int CompareExchange(ref int l, int v, int cmp);
  int Decrement(ref int l);
  int Increment(ref int l);
  int Exchange(ref int l, int v);
Fairly expensive
  Cache coherency – bus traffic
  >100 cycles on non-NUMA, >500 cycles on NUMA (uncontended)
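Operations that `Interlocked` does not offer directly are usually built as a CompareExchange retry loop. As a hedged sketch, here is an atomic "maximum"; `InterlockedExtras.Max` is a hypothetical helper, not a framework API.

```csharp
using System.Threading;

static class InterlockedExtras
{
    // Atomically sets 'target' to max(target, value) and returns the resulting value.
    public static int Max(ref int target, int value)
    {
        while (true)
        {
            int seen = target;                 // snapshot the current value
            if (seen >= value) return seen;    // already large enough, nothing to write
            // Publish 'value' only if nobody changed 'target' since the snapshot.
            if (Interlocked.CompareExchange(ref target, value, seen) == seen)
                return value;
            // Another thread won the race; loop and retry with a fresh snapshot.
        }
    }
}
```

The loop can retry an unbounded number of times under contention, but a failed CAS always means some other thread made progress.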
![Page 54: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/54.jpg)
Kernel Synchronization Objects: The Foundation on Top of Which All Else Exists
Kernel objects are basic primitives
  Signaled / nonsignaled state
  WaitForSingleObject, WaitForMultipleObjects
  *Msg variety: pump messages (STAs, GUIs)
  *Ex variety: alertable waits (dispatch APCs)
  CLR always pumps and does alertable waits
Data synchronization kinds
  Mutex – mutually exclusive
  Semaphore – N waits are satisfied before it becomes nonsignaled
  Auto-reset Event – becomes nonsignaled when a wait is satisfied
  Manual-reset Event – manually reset
Exposed through System.Threading.WaitHandle
public class WaitHandle : IDisposable {
    public void Close();
    public bool WaitOne();
    // timeout-variants, and plenty of others…
    public static bool WaitAll(WaitHandle[] hs);
    public static int WaitAny(WaitHandle[] hs);
}
![Page 55: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/55.jpg)
Mutex and Semaphore: In .NET
Useful for
  Win32 interop
  When ACLs need to be used
  Inter-AppDomain synchronization
But otherwise too heavyweight for industrial use
public class Mutex : WaitHandle {
    public Mutex(string name, MutexSecurity acl, …);
    public void ReleaseMutex();
}
public class Semaphore : WaitHandle {
    public Semaphore(int initialCount, int maximumCount, string name, SemaphoreSecurity acl, …);
    public int Release(int count);
}
![Page 56: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/56.jpg)
Auto- and Manual-Reset Events: In .NET
Manual-reset: Once set, all threads are awoken, reset must happen manually
Auto-reset: Once set, one thread is awoken, reset happens automatically
Useful for all the previously stated reasons
  But also because it’s “sticky”
public class EventWaitHandle : WaitHandle {
    public EventWaitHandle(bool initialState, EventResetMode mode, string name, EventWaitHandleSecurity acl, …);
    public bool Reset();
    public bool Set();
}
public enum EventResetMode { AutoReset, ManualReset }
public class AutoResetEvent : EventWaitHandle { … }
public class ManualResetEvent : EventWaitHandle { … }
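"Sticky" means a Set() that happens before anyone waits is not lost. The sketch below makes that visible and also shows the auto/manual difference; `StickyEventDemo` is an illustrative name, and the zero timeout is used only to poll without blocking.

```csharp
using System.Threading;

static class StickyEventDemo
{
    // Returns how many of two back-to-back, zero-timeout waits succeed after one Set().
    public static int WaitsSatisfiedAfterOneSet(EventResetMode mode)
    {
        using (EventWaitHandle ev = new EventWaitHandle(false, mode))
        {
            ev.Set();                        // signal before anyone is waiting
            int satisfied = 0;
            if (ev.WaitOne(0)) satisfied++;  // the earlier Set() is remembered
            if (ev.WaitOne(0)) satisfied++;  // auto-reset: already consumed by the first wait
            return satisfied;
        }
    }
}
```

With `EventResetMode.AutoReset` one wait is satisfied; with `EventResetMode.ManualReset` both are, because the event stays signaled until Reset() is called.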
![Page 57: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/57.jpg)
Monitors: Locking
Monitor.Enter/Exit (and TryEnter) – mutual exclusion
  Any CLR object can be used as target
  Recursive and timeout-based acquires
  Spins briefly to reduce frequency of context switches
Language syntax
  [C#] lock (obj) { … }
  [VB] SyncLock obj … End SyncLock
  Monitor.Enter(obj);
  try { … } finally { Monitor.Exit(obj); }
Thin and fat locks
  Header word before fields in object layout used for thin lock
  Acquire incurs a single interlocked operation
  Fat lock on contention: sync-block (+event) reclaimed by GC
![Page 58: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/58.jpg)
Monitors: Condition Variables
Monitor.Wait/Pulse/PulseAll – alternative to events
  Waiter releases all locks on target
  Pulse wakes one, PulseAll wakes all
Efficient mechanisms used internally to pool events
  Inflates target to fat lock to register thread/event pair
  Event used is one per-thread
bool P = false;
…
lock (obj) {
    while (!P) Monitor.Wait(obj);
    …
}
… elsewhere …
lock (obj) {
    P = true;
    Monitor.Pulse[All](obj);
}
![Page 59: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/59.jpg)
Reader/Writer Locks: When Mutual Exclusion is Unnecessary
Take advantage of R/R “conflicts” being safe
  Allow many readers in the lock
  But when there is one writer, no others can be in
ReaderWriterLock
  Read and write modes (AcquireXXLock/ReleaseXXLock)
  Upgrades (but prone to broken serialization)
ReaderWriterLockSlim (in 3.5) is cheaper
  Supersedes old ReaderWriterLock class
  Read, write, and upgradeable-read modes (EnterXXLock, ExitXXLock)
  Deadlock-free upgrades
However, scalability can still suffer…
  When work in lock is small: CAS dominates
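The three ReaderWriterLockSlim modes can be sketched with a small read-mostly cache. `ReadMostlyCache` and its members are hypothetical names chosen for illustration; only the Enter/Exit calls are framework APIs.

```csharp
using System.Collections.Generic;
using System.Threading;

class ReadMostlyCache
{
    private readonly Dictionary<string, int> m_map = new Dictionary<string, int>();
    private readonly ReaderWriterLockSlim m_lock = new ReaderWriterLockSlim();

    // Read mode: many readers may be inside at the same time.
    public bool TryGet(string key, out int value)
    {
        m_lock.EnterReadLock();
        try { return m_map.TryGetValue(key, out value); }
        finally { m_lock.ExitReadLock(); }
    }

    // Upgradeable-read mode: look first, take the write lock only if we must add.
    public int GetOrAdd(string key, int valueToAdd)
    {
        m_lock.EnterUpgradeableReadLock();
        try
        {
            int existing;
            if (m_map.TryGetValue(key, out existing)) return existing;
            m_lock.EnterWriteLock();
            try { m_map[key] = valueToAdd; }
            finally { m_lock.ExitWriteLock(); }
            return valueToAdd;
        }
        finally { m_lock.ExitUpgradeableReadLock(); }
    }
}
```

Only one thread at a time can hold the upgradeable-read lock, which is what makes the upgrade deadlock-free; plain readers still run concurrently with it until the write lock is taken.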
![Page 60: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/60.jpg)
Locks and Fairness: Convoy Avoidance
Should locks be fair?
  Thread A arrives, gets the lock
  Thread B arrives, must wait
  Thread C arrives, must wait
  Is it guaranteed that B gets the lock when A exits?
  Pre-Windows Server 2003 SP1: Yes; Post: No!
Fairness exacerbates convoys, e.g.
  A leaves lock, pulses B
  Time for B to awaken is at least C, where C here is the cost of a context switch (>10,000 cycles), and could be >2C!
  If Thread D arrives in this window, it must wait!
  Effectively extends lock hold times!
![Page 61: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/61.jpg)
Thread Local Storage (TLS): Confined State Within Threads
Ensures a reference is local to one thread
Static (preferred)
  [ThreadStaticAttribute]
  private static T foo;
  Provides best performance (~TLS lookup + field fetch)
Dynamic
  Thread.SetData(k, v); .. T v = (T)Thread.GetData(k);
  Only needed when per-object TLS is needed
In both cases, initialization is tricky
  The class constructor runs only once per thread-static field, not once per thread
  E.g. static T foo = new T(); -- not what you want
  All uses need to check for initialization
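The "check for initialization on every use" advice can be sketched as follows. `PerThreadBuffer`, `Items`, and `CountOnNewThread` are illustrative names, not part of the slides or the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class PerThreadBuffer
{
    // No initializer here: it would run once, on whichever thread triggers the
    // class constructor, and every other thread would still see null.
    [ThreadStatic]
    private static List<int> t_items;

    // Every access goes through this check, so each thread lazily gets its own list.
    public static List<int> Items
    {
        get
        {
            if (t_items == null) t_items = new List<int>();
            return t_items;
        }
    }

    // Helper for the usage note: the element count observed by a brand-new thread.
    public static int CountOnNewThread()
    {
        int count = -1;
        Thread t = new Thread(() => { count = Items.Count; });
        t.Start();
        t.Join();
        return count;
    }
}
```

Adding items on one thread does not affect another: after `PerThreadBuffer.Items.Add(1)` on the calling thread, `CountOnNewThread()` still reports an empty list.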
![Page 62: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/62.jpg)
Immutability in Practice: An ImmutableStack<T> Type
public class ImmutableStack<T> {
    private readonly T m_value;
    private readonly ImmutableStack<T> m_next;
    private readonly bool m_empty;
    public ImmutableStack() { m_empty = true; }
    internal ImmutableStack(T value, ImmutableStack<T> next) {
        m_value = value; m_next = next; m_empty = false;
    }
    // Returns a new stack; this instance is never modified.
    public ImmutableStack<T> Push(T value) {
        return new ImmutableStack<T>(value, this);
    }
    // Returns the rest of the stack and hands back the top element via 'value'.
    public ImmutableStack<T> Pop(out T value) {
        if (m_empty) throw new Exception("Empty.");
        value = m_value;
        return m_next;
    }
}
![Page 63: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/63.jpg)
Memory Models (Danger!): Architecture and Platform Guarantees
Sometimes locks are unnecessary
  Native pointer-sized writes are atomic (32bit/64bit)
  But compilers + processors reorder reads + writes
CLR memory model:
  Data dependence
  All writes are store/release (nothing moves after)
  All ‘volatile’ reads are load/acquire (nothing moves before)
  Adjacent writes can be coalesced
  Fence ensures nothing moves either direction (lock, interlocked operation, Thread.MemoryBarrier)
C++:
  Volatiles and explicit barriers
Sophisticated code can exploit this; warning!
  E.g. double checked locking
  Hard to test, since real reordering differs per-platform
  Often requires so much cleverness that locks win out
![Page 64: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/64.jpg)
Memory Reordering: Examples
Example 1
  X = Y = 0;
  Thread 1: X = 1; Y = 1;
  Thread 2: A = Y; B = X;
  A == 1 && B == 0? No, except on IA64. (No StoreStore, No LoadLoad)
Example 2
  X = Y = 0;
  Thread 1: X = 1;
  Thread 2: A = X; Y = 1;
  Thread 3: B = Y; C = X;
  A == 1 && B == 1 && C == 0? No. (Transitivity)
Example 3
  X = Y = 0;
  Thread 1: X = 1; A = Y;
  Thread 2: Y = 1; B = X;
  A == 0 && B == 0? Yes! (StoreLoad is permitted)
![Page 65: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/65.jpg)
Word Tearing: Accessing Nonatomic Locations w/out Proper Synchronization
Can the assert fire? Yes!
  On a 32-bit platform, each 64-bit access is comprised of two 32-bit reads/writes
  Possible values incl. 0x0L and 0xaaaabbbbccccddddL (of course), but also 0xaaaabbbb00000000L and 0x00000000ccccddddL !!
internal static long s_x;
void t1() {
    int i = 0;
    while (true) {
        s_x = (i & 1) == 0 ? 0x0L : 0xaaaabbbbccccddddL;
        i++;
    }
}
void t2() {
    while (true) {
        long x = s_x;
        Debug.Assert(x == 0x0L || x == 0xaaaabbbbccccddddL);
    }
}
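One way to avoid the tearing shown above is to route every 64-bit access through `Interlocked`, which is atomic even on 32-bit platforms. This is a sketch under assumptions: `TearFreeLong` and `CountTornReads` are illustrative names, and the second constant differs from the slide's because the slide's literal does not fit in a signed long.

```csharp
using System.Threading;

static class TearFreeLong
{
    private static long s_x;
    private const long A = 0L;
    private const long B = 0x1111222233334444L;

    // A writer thread alternates between A and B while the caller reads.
    // Returns the number of values observed that are neither A nor B.
    public static int CountTornReads(int iterations)
    {
        int torn = 0;
        Thread writer = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                Interlocked.Exchange(ref s_x, (i & 1) == 0 ? A : B);  // atomic 64-bit write
        });
        writer.Start();
        for (int i = 0; i < iterations; i++)
        {
            long x = Interlocked.Read(ref s_x);                        // atomic 64-bit read
            if (x != A && x != B) torn++;
        }
        writer.Join();
        return torn;
    }
}
```

Because both sides use atomic 64-bit operations, the reader can only ever observe a whole value; a lock around both accesses would give the same guarantee.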
![Page 66: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/66.jpg)
Lock Freedom: Double Edged Sword
Interlocked provides hardware atomicity
  CompareExchange(&a, b, c): if a contains c, replace it w/ b
  Guaranteed atomic in the hardware
Can be used to build scalable, lock-free algorithms:
class Stack<T> {
    Node<T> head;
    void Push(T obj) {
        Node<T> n = new Node<T>(obj);
        Node<T> h;
        do {
            h = head;
            n.next = h;
        } while (Interlocked.CompareExchange(ref head, n, h) != h);
    }
    T Pop() {
        Node<T> n;
        do {
            n = head;
            // Guard against popping an empty stack before dereferencing n.
            if (n == null) throw new InvalidOperationException("Empty.");
        } while (Interlocked.CompareExchange(ref head, n.next, n) != n);
        return n.Value;
    }
    …
}
![Page 67: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/67.jpg)
Double Checked Locking: Efficient Lazy Initialization (Variant 1: Never Create >1)
CLR 2.0 memory model guarantees safety
  Other memory models (ECMA/1.1, C++, Java) don’t
Volatile necessary to prevent speculative loads (IA64)
class Foo {
    private static volatile Foo s_inst;
    private static object s_mutex = new object();
    internal static Foo Instance {
        get {
            if (s_inst == null)
                lock (s_mutex)
                    if (s_inst == null) s_inst = new Foo(…);
            return s_inst;
        }
    }
}
![Page 68: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/68.jpg)
Double Checked Locking: Efficient Lazy Initialization (Variant 2: >1 OK)
If possibility of garbage is OK, lock-free…
class Foo {
    private static volatile Foo s_inst;
    internal static Foo Instance {
        get {
            if (s_inst == null) {
                Foo candidate = new Foo();
                Interlocked.CompareExchange(ref s_inst, candidate, null);
            }
            return s_inst;
        }
    }
}
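For comparison, .NET Framework 4.0 ships `System.Lazy<T>`, whose thread-safety modes line up roughly with the two variants: `ExecutionAndPublication` runs the factory at most once, while `PublicationOnly` lets racing threads each run it and keeps a single published result. The sketch below is not the slides' code; `LazyFoo`, `Created`, `Instance`, and `RacyInstance` are hypothetical names.

```csharp
using System;
using System.Threading;

class LazyFoo
{
    public static int Created;   // counts constructor calls, for illustration only
    public LazyFoo() { Interlocked.Increment(ref Created); }

    // Like variant 1: at most one instance is ever constructed.
    private static readonly Lazy<LazyFoo> s_single =
        new Lazy<LazyFoo>(() => new LazyFoo(), LazyThreadSafetyMode.ExecutionAndPublication);

    // Like variant 2: racing threads may each construct one; a single winner is published.
    private static readonly Lazy<LazyFoo> s_racy =
        new Lazy<LazyFoo>(() => new LazyFoo(), LazyThreadSafetyMode.PublicationOnly);

    public static LazyFoo Instance { get { return s_single.Value; } }
    public static LazyFoo RacyInstance { get { return s_racy.Value; } }
}
```

Either property always returns the same object once it has been published; the modes differ only in how many throwaway instances may be created along the way.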
![Page 69: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/69.jpg)
Building a Spin-lock: Trickier Than You Think!
class SpinLock {
    private int m_state = 0;

    public void Enter() {
        while (Interlocked.CompareExchange(ref m_state, 1, 0) != 0) ;
    }

    public void Exit() {
        m_state = 0;
    }
}
![Page 70: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/70.jpg)
Building a Spin-lock: Brain Melting Details …
Spin-waiting needs to
  Yield immediately on a 1-CPU machine
  Not use CompareExchange in a loop
    Too much cache contention
    Back-off to add randomization
    Reread and only try CompareExchange once the lock is seen as free [TTAS]
    Possibly consider queueing [MCS]
  Avoid priority starvation (Sleep(0) issue)
  Issue PAUSE instructions on Intel HT machines
  Mark BeginCriticalRegion/EndCriticalRegion
  Use managed thread ID to avoid OS thread affinity
![Page 71: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/71.jpg)
Building a Spin-lock: Try Numero Dos – Still Imperfect
class SpinLock {
    private volatile int m_state = 0;

    public void Enter() {
        int tid = Thread.CurrentThread.ManagedThreadId;
        while (true) {
            // Acquired if the CAS swapped 0 (free) for our thread ID.
            if (Interlocked.CompareExchange(ref m_state, tid, 0) == 0) return;
            // Otherwise wait, reading only, until the lock looks free, then retry the CAS.
            int iters = 1;
            while (m_state != 0) {
                if (Environment.ProcessorCount == 1) {
                    if (iters % 5 == 0) Thread.Sleep(1);
                    else Thread.Sleep(0);
                    iters++;
                } else {
                    Thread.SpinWait(iters);
                    if (iters >= 4096) Thread.Sleep(1);
                    else {
                        if (iters >= 2048) Thread.Sleep(0);
                        iters *= 2;
                    }
                }
            }
        }
    }

    public void Exit() { m_state = 0; }
}
![Page 72: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/72.jpg)
Overview and Architecture Mechanisms for Asynchrony Lunch Topics in Synchronization Synchronization Best Practices Break Designs and Algorithms .NET Framework 4.0 Wrap-up
Agenda
![Page 73: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/73.jpg)
Synchronization Best Practices: Lock consistently
Do: Lock over all mutable shared state
Do: Always use the same lock for the same state
Do: Comment on how state is protected
class MyList<T> {
    T[] items; // lock: items
    int n;     // lock: items

    void Add(T item) {
        lock (items) {
            items[n] = item;
            n++;
        }
    }
    …
}
![Page 74: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/74.jpg)
Synchronization Best Practices: Lock for the right duration
Do: Lock over entire invariant
Don’t: Lock for longer than is absolutely necessary
class MyList<T> {
    T[] items; // lock: items
    int n;     // lock: items

    // invariant: n is count of valid
    // items in list and items[n] == null

    void Add(T item) {
        lock (items) {
            items[n] = item;
            n++;
        }
    }
    …
}
![Page 75: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/75.jpg)
Synchronization Best Practices: Make critical regions short and sweet
Do: Minimize the time you hold on to a lock
Don’t: Call others’ code while you hold locks
Don’t: Block while you hold locks
class MyList<T> {
    ...

    void Add(T t) {
        lock(items) {
            items[n] = t;
            n++;
        }
        Listener.Notify(this);
    }
    …
}
![Page 76: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/76.jpg)
Synchronization Best Practices: Encapsulate your locks
Don’t: Use public lock objects
Don’t: Lock on Types or Strings
class MyList<T> {
    T[] items;
    int n;

    static object slk = new object();
    …
    static void ResetStats() {
        lock(slk){ … }
    }
    …
}
![Page 77: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/77.jpg)
Synchronization Best Practices: Avoiding deadlocks
Do: Acquire locks in a consistent order
class MyService {
    A a; B b;
    …
    // Inconsistent order: DoAB takes a then b, DoBA takes b then a, so the two can deadlock.
    void DoAB() {
        lock(a) lock(b) { a.Do(); b.Do(); }
    }
    void DoBA() {
        lock(b) lock(a) { b.Do(); a.Do(); }
    }
}
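DoAB and DoBA above take the same two locks in opposite orders, which is the classic recipe for deadlock. A minimal sketch of the consistent-order rule follows; `OrderedService` and its private lock objects are hypothetical, and the counters stand in for the real work.

```csharp
using System.Threading;

class OrderedService
{
    private readonly object m_lockA = new object();  // always acquired first
    private readonly object m_lockB = new object();  // always acquired second
    private int m_a, m_b;

    public void DoAB()
    {
        lock (m_lockA) lock (m_lockB) { m_a++; m_b++; }
    }

    // Does the B work first, but still takes the locks in A-then-B order.
    public void DoBA()
    {
        lock (m_lockA) lock (m_lockB) { m_b++; m_a++; }
    }

    public int Total
    {
        get { lock (m_lockA) lock (m_lockB) return m_a + m_b; }
    }
}
```

The order in which work is done inside the region is free to vary; only the acquisition order has to be the same everywhere.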
![Page 78: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/78.jpg)
Synchronization Best Practices: Locking Miscellany
Do: Document your locking policy
  Especially for public APIs
Do: Use a reader/writer lock if readers are common
Do: Prefer lock-based code to lock-free code
Do: Prefer Monitors over kernel synchronization
Avoid: Lock recursion in your designs
Don’t: Build your own lock
![Page 79: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/79.jpg)
ThreadPool Best Practices
Avoid: Writing your own thread pools
![Page 80: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/80.jpg)
Overview and Architecture Mechanisms for Asynchrony Lunch Topics in Synchronization Synchronization Best Practices Break (3:15pm-3:45pm) Designs and Algorithms .NET Framework 4.0 Wrap-up
Agenda
![Page 82: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/82.jpg)
Coarse vs. Fine Grained Concurrency: The Impact of Multi-core on Apps
Server programs have it “easy”
  Have been doing this for years
  Steady stream of incoming work, W
  Typically W >= #P, thus …
  One W per P isn’t such a bad strategy
  Multi-core is only a problem once P > W; then the server needs to worry about fine-grained
Clients, not so much
  User-centric and responsiveness-oriented
  “User clicks a button” now what?
  Faster? Sure!
  Need to go fine-grained
![Page 83: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/83.jpg)
Going Fine-Grained: Code and Data
Two basic approaches to fine-grained concurrency
Approach #1: It’s the task (i.e. code)
  Futures: make a function call o.f and its results will be available as soon as possible; enough independent calls leads to scaling
  Divide and conquer: split computations as you go into “left” and “right”, running one on a different thread (recursively)
Approach #2: It’s the data
  Partitioning: split data source into n partitions, and process each partition in parallel (database query approach)
  Pipelining: break into n stages, and execute each stage in parallel
Ideally, app developers choose whichever is most appropriate for the task at hand…
  Coarse-grained mixed with task and data parallelism == happiness
  (So long as libraries don’t get in the way)
![Page 84: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/84.jpg)
A Taxonomy of Concurrency
…
Messaging
  Agents/CSPs
  Message passing
  Loose coupling
Task Parallelism
  Statements
  Structured
  Futures
  ~O(1) parallelism
Data Parallelism
  Data operations
  O(N) parallelism
![Page 85: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/85.jpg)
On Speedups: Metrics Worth Measuring
Goal: tS/tP ≈ P
  tS is the time it takes to run an algorithm sequentially
  tP is the time it takes to run on P processors
Speedups:
  If tS/tP = P, this is a linear speedup (good! for every doubling in processors, we halve execution time)
  Most problems aren’t linear, …
  But we try to get as close as possible
  If tS/tP > P, we have super-linear speedup! Wowzas!
Design challenge: cheap problem decomposition
  Parallelism always costs something, even when efficient
  Synchronization, inter-thread communication, cache effects, and threading are not major factors in sequential code
![Page 86: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/86.jpg)
Algorithms: Parallel For Loops
Independent loop iterations may run in parallel
for (int i = 0; i < n; i++) a(i);
  Only if no loop-carried dependencies, and …
  No shared state mutation
Static decomposition
  Loop iterations are assigned to workers a priori
  Contiguous chunks, striping, etc.
  +: simple, predictable, efficient
  -: can’t tolerate iteration imbalance, blocking
Dynamic decomposition
  Loop iterations are assigned on-demand, usually in chunks
  +: tolerates imbalance, blocking
  -: more difficult, communication overhead
![Page 87: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/87.jpg)
Algorithms: Parallel For Loops – Static Decomposition
void ParallelForS(int lo, int hi, Action<int> body, int p) {
    int chunk = ((hi - lo) + p - 1) / p; // Iterations/thread
    ManualResetEvent mre = new ManualResetEvent(false);
    int remaining = p;

    // Schedule the threads to run in parallel
    for (int i = 0; i < p; i++) {
        ThreadPool.QueueUserWorkItem(delegate(object procId) {
            int start = lo + (int)procId * chunk;
            for (int j = start; j < start + chunk && j < hi; j++) {
                body(j);
            }
            if (Interlocked.Decrement(ref remaining) == 0)
                mre.Set();
        }, i);
    }

    mre.WaitOne(); // Wait for them to finish
}
![Page 88: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/88.jpg)
Algorithms: Parallel For Loops – Dynamic Decomposition
void ParallelForD(int lo, int hi, Action<int> body, int p) {
    const int chunk = 16; // Chunk size (constant)
    ManualResetEvent mre = new ManualResetEvent(false);
    int remaining = p;
    int current = lo;

    // Schedule the threads to run in parallel
    for (int i = 0; i < p; i++) {
        ThreadPool.QueueUserWorkItem(delegate(object procId) {
            int j;
            // Atomically claim the next chunk; j is the first index of that chunk.
            while ((j = (Interlocked.Add(ref current, chunk) - chunk)) < hi) {
                for (int k = 0; k < chunk && j + k < hi; k++) {
                    body(j + k);
                }
            }
            if (Interlocked.Decrement(ref remaining) == 0)
                mre.Set();
        }, i);
    }

    mre.WaitOne(); // Wait for them to finish
}
![Page 89: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/89.jpg)
Algorithms: Parallel Foreach Loops
What about IEnumerable<T> objects?
IEnumerable<T> source = …;
foreach (T e in source) a(e);
Challenge: don’t know size a priori, can’t index into it
  ICollection<T> is slightly better (size) but not by much
Can be handled a bit like dynamic for partitioning
  Hand out chunks of elements at a time
  Requires serializing access to a single enumerator
  Large chunk size means more time spent in lock
  But less frequent lock arrivals
![Page 90: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/90.jpg)
Algorithms: Parallel Foreach Loops
void ParallelForEach<T>(IEnumerable<T> e, Action<T> body, int p) {
    const int chunk = 16; // Chunk size (constant)
    ManualResetEvent mre = new ManualResetEvent(false);
    int remaining = p;

    using (IEnumerator<T> en = e.GetEnumerator()) { // shared
        // Schedule the threads to run in parallel
        for (int i = 0; i < p; i++) {
            ThreadPool.QueueUserWorkItem(delegate(object procId) {
                T[] buffer = new T[chunk];
                int j;
                do {
                    lock (en) {
                        for (j = 0; j < chunk && en.MoveNext(); j++)
                            buffer[j] = en.Current;
                    }
                    for (int k = 0; k < j; k++)
                        body(buffer[k]);
                } while (j == chunk);
                if (Interlocked.Decrement(ref remaining) == 0)
                    mre.Set();
            }, i);
        }
        mre.WaitOne(); // Wait for them to finish
    }
}
![Page 91: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/91.jpg)
Algorithms: Divide and Conquer - Recursion
Process separate “halves” of the problem in parallel
  Process right half concurrently, …
  And left half on current thread
  (Recursively…)
  Switchover to sequential at some point (else too much parallelism is exposed)
E.g. mirror a binary tree in place
void Mirror(TreeNode node) {
    if (node == null) return;
    Mirror(node.Left);
    Mirror(node.Right);
    TreeNode tmp = node.Left;
    node.Left = node.Right;
    node.Right = tmp;
}
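A parallel version of Mirror with a sequential cutoff might look like the sketch below. It assumes .NET 4.0's `Parallel.Invoke` rather than hand-rolled threads, so which half runs on the current thread is left to the scheduler; `TreeNode`'s shape, `TreeOps`, and the `depth` parameter are illustrative assumptions.

```csharp
using System.Threading.Tasks;

class TreeNode
{
    public int Value;
    public TreeNode Left, Right;
}

static class TreeOps
{
    // Mirrors the tree in place. 'depth' is how many more levels may fork;
    // once it reaches 0 the recursion continues sequentially.
    public static void ParallelMirror(TreeNode node, int depth)
    {
        if (node == null) return;
        if (depth <= 0)
        {
            ParallelMirror(node.Left, 0);
            ParallelMirror(node.Right, 0);
        }
        else
        {
            // The two subtrees are disjoint, so the calls share no mutable state.
            Parallel.Invoke(
                () => ParallelMirror(node.Left, depth - 1),
                () => ParallelMirror(node.Right, depth - 1));
        }
        TreeNode tmp = node.Left;
        node.Left = node.Right;
        node.Right = tmp;
    }
}
```

A cutoff of a few levels (roughly log2 of the processor count, plus some slack for unbalanced trees) is a common starting point; the right value depends on the tree and the work per node.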
![Page 92: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/92.jpg)
Algorithms: Reductions
Associative and commutative reductions can be done in parallel, e.g. Sum, Count, Min, Max, Average, …
int ParallelSum(int[] array, int p) {
    int chunk = (array.Length + p - 1) / p; // Iterations/thread
    ManualResetEvent mre = new ManualResetEvent(false);
    int sum = 0, remaining = p;

    // Schedule the threads to run in parallel
    for (int i = 0; i < p; i++) {
        ThreadPool.QueueUserWorkItem(delegate(object procId) {
            int mySum = 0;
            int start = (int)procId * chunk;
            for (int j = start; j < start + chunk && j < array.Length; j++)
                mySum += array[j];
            Interlocked.Add(ref sum, mySum);
            if (Interlocked.Decrement(ref remaining) == 0)
                mre.Set();
        }, i);
    }

    mre.WaitOne(); // Wait for them to finish
    return sum;
}
![Page 93: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/93.jpg)
When to “Go Parallel”?
There is a cost; only worthwhile when
  Work per task/element is large, and/or
  Number of tasks/elements is large
[Figure: speedup (vertical axis, -- to ++) against work per task and number of tasks (horizontal axis, -- to ++), marking 1 task (sequential), the break-even point (? tasks), and the point of diminishing returns (? tasks).]
![Page 94: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/94.jpg)
Speedup Inhibitors: Synchronous I/O
[Figure: timelines for Thread 1 and Thread 2 over time, with running and waiting periods marked, for each of the two cases below.]
With synchronous IO: 6 work items in 4 time units
With asynchronous (overlapped) IO: 6 work items in 3 time units
![Page 95: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/95.jpg)
Speedup Inhibitors: Synchronization
Synchronization inhibits speedup
[Figure: timelines for Threads 1 to 4, … showing periods spent running, running with the lock held, and waiting for the lock.]
![Page 96: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/96.jpg)
Speedup Inhibitors: Load Imbalance
Exacerbates Amdahl’s Law
  E.g. if Your API is 50% of the work, and is only sequential, maximum parallel speedup = 2x!
  More than 2 threads is just wasted resource: S = 50%, 1/S == 2
  No matter how many processors, 2x is it
[Figure: a sequential timeline compared with a parallel timeline across Threads 1 to 4; the marked portion is Your API.]
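The 2x ceiling follows from Amdahl's Law. As a short worked form, with S the sequential fraction and tS, tP the sequential and P-processor run times (the idealized formula assumes the parallel part divides perfectly across processors):

```latex
\[
\frac{t_S}{t_P} \;=\; \frac{1}{S + \dfrac{1 - S}{P}}
\qquad\Longrightarrow\qquad
\lim_{P \to \infty} \frac{t_S}{t_P} \;=\; \frac{1}{S}
\]
\[
S = 0.5:\quad
P = 2 \;\Rightarrow\; \frac{1}{0.5 + 0.25} \approx 1.33,\qquad
P = 4 \;\Rightarrow\; \frac{1}{0.5 + 0.125} = 1.6,\qquad
P \to \infty \;\Rightarrow\; 2
\]
```

Each doubling of P buys less than the one before, and no processor count gets past 1/S.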
![Page 97: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/97.jpg)
Algorithms: Other Miscellaneous Algorithms
- Pipelining
  - Multiple stages run in parallel with one another
  - Producer/consumer relationship
- Speculation and search
  - Many threads cooperate to “precompute” an answer
  - If the answer isn’t needed, speculation is discarded
  - Wasted work if unused, but speedups otherwise
- Dataflow
  - Future<T> abstraction: accessing the Value waits if not computed yet, or runs synchronously otherwise
  - Synchronization is driven by data dependence
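
The dataflow bullet can be illustrated with a small sketch. It assumes `Task<TResult>` from .NET 4 as a stand-in for the `Future<T>` named on the slide; `FutureSketch`, `SlowSquare`, and `Compute` are hypothetical names made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FutureSketch
{
    // Hypothetical stand-in for an expensive computation.
    public static int SlowSquare(int x)
    {
        Thread.Sleep(50);
        return x * x;
    }

    public static int Compute()
    {
        // Start two independent computations; each Task<int> plays the role of a future.
        Task<int> a = Task.Factory.StartNew(() => SlowSquare(3));
        Task<int> b = Task.Factory.StartNew(() => SlowSquare(4));

        // Reading Result waits only if the value is not available yet,
        // so the synchronization follows the data dependence.
        return a.Result + b.Result;
    }

    public static void Main()
    {
        Console.WriteLine(Compute()); // prints 25
    }
}
```

No explicit lock or event appears in the calling code: the only wait happens where a value is actually consumed.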
![Page 98: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/98.jpg)
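Algorithms: Producer/Consumer: Blocking & Bounded Queue

    using System.Collections.Generic;
    using System.Threading;

    public class BlockingBoundedQueue<T> {
        private const int Capacity = 128;
        private Queue<T> m_queue = new Queue<T>();
        // Counts free slots; producers block when it reaches zero.
        private Semaphore m_fullSemaphore = new Semaphore(Capacity, Capacity);
        // Counts queued items; consumers block when it reaches zero.
        private Semaphore m_emptySemaphore = new Semaphore(0, Capacity);

        public void Enqueue(T item) {
            m_fullSemaphore.WaitOne();      // wait for a free slot
            lock (m_queue) { m_queue.Enqueue(item); }
            m_emptySemaphore.Release();     // signal one available item
        }

        public T Dequeue() {
            T e;
            m_emptySemaphore.WaitOne();     // wait for an item
            lock (m_queue) { e = m_queue.Dequeue(); }
            m_fullSemaphore.Release();      // free one slot
            return e;
        }
    }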
![Page 99: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/99.jpg)
![Page 100: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/100.jpg)
Example: "Baby Names"

    IEnumerable<BabyInfo> babies = ...;
    var results = new List<BabyInfo>();
    foreach (var baby in babies)
    {
        if (baby.Name == queryName && baby.State == queryState &&
            baby.Year >= yearStart && baby.Year <= yearEnd)
        {
            results.Add(baby);
        }
    }
    results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
![Page 101: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/101.jpg)
Manual Parallel Solution

    IEnumerable<BabyInfo> babies = …;
    var results = new List<BabyInfo>();
    int partitionsCount = Environment.ProcessorCount;
    int remainingCount = partitionsCount;
    var enumerator = babies.GetEnumerator();
    try {
        using (var done = new ManualResetEvent(false)) {
            for (int i = 0; i < partitionsCount; i++) {
                ThreadPool.QueueUserWorkItem(delegate {
                    var partialResults = new List<BabyInfo>();
                    while (true) {
                        BabyInfo baby;
                        lock (enumerator) {
                            if (!enumerator.MoveNext()) break;
                            baby = enumerator.Current;
                        }
                        if (baby.Name == queryName && baby.State == queryState &&
                            baby.Year >= yearStart && baby.Year <= yearEnd) {
                            partialResults.Add(baby);
                        }
                    }
                    lock (results) results.AddRange(partialResults);
                    if (Interlocked.Decrement(ref remainingCount) == 0) done.Set();
                });
            }
            done.WaitOne();
            results.Sort((b1, b2) => b1.Year.CompareTo(b2.Year));
        }
    }
    finally { if (enumerator is IDisposable) ((IDisposable)enumerator).Dispose(); }
![Page 102: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/102.jpg)
LINQ Solution

    var results = from baby in babies
                  where baby.Name == queryName && baby.State == queryState &&
                        baby.Year >= yearStart && baby.Year <= yearEnd
                  orderby baby.Year ascending
                  select baby;

To run the query in parallel with PLINQ, add .AsParallel() to the source: from baby in babies.AsParallel()
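
A self-contained version of the PLINQ query might look like the following sketch. The slides do not show the definition of `BabyInfo`, so the class below, the `BabyNamesQuery` wrapper, and the sample records are all assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the slides do not show BabyInfo's definition.
public class BabyInfo
{
    public string Name;
    public string State;
    public int Year;
}

public static class BabyNamesQuery
{
    public static List<BabyInfo> Run(IEnumerable<BabyInfo> babies,
        string queryName, string queryState, int yearStart, int yearEnd)
    {
        // AsParallel() opts the query into PLINQ; orderby keeps the output sorted by year.
        var results = from baby in babies.AsParallel()
                      where baby.Name == queryName && baby.State == queryState &&
                            baby.Year >= yearStart && baby.Year <= yearEnd
                      orderby baby.Year ascending
                      select baby;
        return results.ToList();
    }

    // Made-up records, only to exercise the query.
    public static List<BabyInfo> SampleData()
    {
        return new List<BabyInfo>
        {
            new BabyInfo { Name = "Ann", State = "WA", Year = 2003 },
            new BabyInfo { Name = "Ann", State = "WA", Year = 2001 },
            new BabyInfo { Name = "Ann", State = "OR", Year = 2002 },
            new BabyInfo { Name = "Bob", State = "WA", Year = 2002 },
            new BabyInfo { Name = "Ann", State = "WA", Year = 1990 },
            new BabyInfo { Name = "Ann", State = "WA", Year = 2002 },
        };
    }

    public static void Main()
    {
        foreach (var baby in Run(SampleData(), "Ann", "WA", 2000, 2005))
            Console.WriteLine(baby.Year); // prints 2001, 2002, 2003 on separate lines
    }
}
```

Compared with the manual solution, partitioning, merging, and thread coordination are left to the query engine.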
![Page 103: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/103.jpg)
Visual Studio 2010: Tools / Programming Models / Runtimes
[Diagram of the parallel computing stack. Key: managed library, native library.
Tools: Parallel Debugger Windows; Profiler Concurrency Analysis.
Programming models: Task Parallel Library, PLINQ, Parallel Pattern Library, Agents Library, Data Structures.
Runtimes: ThreadPool and Concurrency Runtime, each with a Task Scheduler and a Resource Manager.
Operating system: Threads.]
![Page 104: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/104.jpg)
Parallel Extensions to the .NET Framework 4.0
- What is it?
  - .NET types (mscorlib.dll, System.dll, System.Core.dll)
  - No compiler changes necessary
  - Work-stealing runtime
  - Multiple programming models
    - Declarative data parallelism (PLINQ)
    - Imperative data and task parallelism (Task Parallel Library)
    - Coordination/synchronization constructs (Coordination Data Structures)
  - Common exception handling model
  - Parallel debugging and profiling support
- Why is it good?
  - Supports parallelism in any .NET language
  - Delivers reduced concept count and complexity, better time to solution
  - Begins to move parallelism capabilities from concurrency experts to domain experts
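
As a rough illustration of the imperative data parallelism and common exception handling bullets, here is a sketch using `Parallel.For` and `AggregateException` as they shipped in .NET 4. The `TplSketch` class and its method names are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TplSketch
{
    // Sums 0..n-1 with Parallel.For. Each worker accumulates a private partial sum,
    // which is merged into the shared total once per worker.
    public static long ParallelSum(int n)
    {
        long total = 0;
        Parallel.For<long>(0, n,
            () => 0L,                                      // initial value of each partial sum
            (i, loopState, local) => local + i,            // loop body
            local => Interlocked.Add(ref total, local));   // merge step
        return total;
    }

    // Exceptions thrown by loop bodies surface as a single AggregateException.
    // Returns how many inner exceptions were collected (at least one here;
    // the exact number depends on scheduling).
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 4, i => { throw new InvalidOperationException("item " + i); });
            return 0;
        }
        catch (AggregateException ae)
        {
            return ae.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ParallelSum(1000)); // prints 499500
        Console.WriteLine(CountFailures() >= 1);
    }
}
```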
![Page 105: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/105.jpg)
Parallel Extensions Architecture
[Diagram: a .NET program is compiled (C#, VB, C++, F#, or other .NET compiler) to MSIL. Declarative queries go to the PLINQ execution engine, which covers query analysis, data partitioning (chunk, range, hash, striped, repartitioning), operator types (map, filter, sort, search, reduce, …), merging (buffering options, order preservation, inverted), and parallel algorithms. The Task Parallel Library provides loop replacements, imperative task parallelism, and scheduling. Coordination Data Structures provide concurrent collections, synchronization types, and coordination types. Programs use PLINQ, or TPL or CDS directly, and everything runs on threads across Proc 1 … Proc p.]
![Page 106: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/106.jpg)
Work-Stealing Scheduler
[Diagram, "Work-Stealing in Action": a program thread, a global queue, and Worker Thread 1 … Worker Thread p, each with its own local queue; Tasks 1 through 6 are distributed among the queues.]
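
The local queues in the diagram can be pictured with a deliberately simplified sketch. This is not the runtime's implementation: it uses a plain lock where a production scheduler would use a more refined structure, and the `WorkStealingDeque<T>` type and its method names are hypothetical. It only shows the usual access pattern, where the owning worker pushes and pops at one end and other workers steal from the opposite end.

```csharp
using System;
using System.Collections.Generic;

public class WorkStealingDeque<T>
{
    private readonly LinkedList<T> m_items = new LinkedList<T>();

    // Owner thread: add newly created work at the tail.
    public void PushLocal(T item)
    {
        lock (m_items) { m_items.AddLast(item); }
    }

    // Owner thread: take the most recently pushed item (LIFO).
    public bool TryPopLocal(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.Last.Value;
            m_items.RemoveLast();
            return true;
        }
    }

    // Other threads: take the oldest item from the head (FIFO).
    public bool TrySteal(out T item)
    {
        lock (m_items)
        {
            if (m_items.Count == 0) { item = default(T); return false; }
            item = m_items.First.Value;
            m_items.RemoveFirst();
            return true;
        }
    }
}
```

Taking from opposite ends keeps the owner working on its most recent (likely cache-warm) task while thieves take older work.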
![Page 107: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/107.jpg)
Coordination Data Structures
- Used throughout PLINQ and TPL
- Address many of today’s core concurrency issues
- Thread-safe collections: ConcurrentStack<T>, ConcurrentQueue<T>, ConcurrentDictionary<TKey,TValue>, …
- Work exchange: BlockingCollection<T>, IProducerConsumerCollection<T>
- Phased operation: CountdownEvent, Barrier
- Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait
- Initialization: LazyInit<T>
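
As one example of the work exchange types, the sketch below passes items from a producer to a consumer through a bounded `BlockingCollection<T>`, using the API as it shipped in .NET 4. The `WorkExchangeSketch` class, the capacity of 4, and the item values are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WorkExchangeSketch
{
    // Produces the integers 1..itemCount on one task and sums them on the calling thread.
    public static int ProduceAndConsume(int itemCount)
    {
        int sum = 0;
        using (var queue = new BlockingCollection<int>(4)) // bounded: Add blocks when 4 items are queued
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= itemCount; i++) queue.Add(i);
                queue.CompleteAdding(); // lets the consumer's loop finish
            });

            // Blocks while the collection is empty; ends once adding is complete and it drains.
            foreach (int item in queue.GetConsumingEnumerable()) sum += item;

            producer.Wait();
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(ProduceAndConsume(10)); // prints 55
    }
}
```

This covers the same ground as a hand-written blocking bounded queue, with the semaphores and locking handled inside the collection.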
![Page 108: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/108.jpg)
![Page 109: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/109.jpg)
Talk Recap: 6 Hours in a Slide
- Concurrency begins with architecture
  - State management
  - Agent interactions (coarse concurrency)
  - Problem decomposition (fine grain parallelism)
- It is possible with today’s tools
  - Windows and .NET offer rich support
  - Kernel objects
  - Threads and thread pools
  - User-mode sync primitives (.NET & C++)
- Advances are on the horizon
  - To bring parallelism within everybody’s reach
  - You saw some (Parallel Extensions) …
![Page 110: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/110.jpg)
What the Future Holds: Programming Models
- Safety
  - Current offerings minimal impact (sharp knives)
  - Three key themes
    - Functional: immutable & pure
    - Safe imperative: isolated
    - Safe side-effects: transactions
  - Verification tools
- Patterns
  - Agents (CSPs) + tasks + data
  - 1st class isolated agents
  - Raise level of abstraction: what, not how
![Page 111: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/111.jpg)
What the Future Holds: Efficiency and Heterogeneity
- Efficiency
  - “Do no harm”
  - O(P) >= O(1)
  - More static decision-making vs. dynamic
  - Profile guided optimizations
- The future is heterogeneous
  - Chip multiprocessors are “easy”
  - Out-of-order vs. in-order
  - GPGPU (fusion of x86 with GPU)
  - Vector ISAs
  - Possibly different memory systems
![Page 112: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/112.jpg)
All Programmers Will Not Be Parallel
- Implicit Parallelism
  - Use APIs that internally use parallelism
  - Structured in terms of agents
  - Apps, LINQ queries, etc.
- Explicit Parallelism, Safe
  - Frameworks, DSLs, XSLT, sorting, searching
- Explicit Parallelism, Unsafe
  - (Parallel Extensions, etc)
![Page 113: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/113.jpg)
In Conclusion
- Opportunity and crisis
  - Competitive advantage for those who grok it
  - Less incentive for the client platform without
- Architects & senior developers pay heed
  - Time to start thinking and experimenting
  - Not yet for ubiquitous consumption [5 year horizon] but…
  - Can make a real difference today in select places: embarrassingly parallel
- Begin experimenting today
  - Windows Vista + .NET 3.5
  - Play with Parallel Extensions (.NET 4.0 and C++)
- Exciting times!

Thank you.
![Page 114: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/114.jpg)
Learn more about concurrency:
Concurrent Programming on Windows (Addison-Wesley)
- Covers Win32 & .NET Framework
- Just released! Available at the PDC bookstore
- Book Signing
  - Where: PDC bookstore
  - Date/Time: Wednesday, Oct. 29, 2:30 PM – 3:00 PM
![Page 115: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/115.jpg)
Learn more about Parallel Computing at: msdn.com/concurrency
And download Parallel Extensions to the .NET Framework!
![Page 116: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/116.jpg)
PDC Parallelism Sessions
- Microsoft Visual Studio: Bringing out the Best in Multicore Systems
  - Date/Time: Monday, Oct. 27, 1:45 PM – 3:00 PM
- Parallel Programming for C++ Developers in the Next Version of Microsoft Visual Studio
  - Date/Time: Monday, Oct. 27, 3:30 PM – 4:45 PM
- Parallel Programming for Managed Developers with the Next Version of Microsoft Visual Studio
  - Date/Time: Wednesday, Oct. 29, 10:30 AM – 11:45 AM
- Concurrency Runtime Deep Dive: How to Harvest Multicore Computing Resources
  - Date/Time: Wednesday, Oct. 29, 1:15 PM – 2:30 PM
- Research: Concurrency Analysis Platform and Tools for Finding Concurrency Bugs
  - Date/Time: Wednesday, Oct. 29, 10:30 AM – 11:45 AM
- The Concurrency and Coordination Runtime and Decentralized Software Services Toolkit
  - Date/Time: Tuesday, Oct. 28, 1:45 PM – 3:00 PM
- Parallel Computing Application Architectures and Opportunities
  - Date/Time: Thursday, Oct. 30, 10:15 AM – 11:45 AM
- Addressing the Hard Problems of Concurrency
  - Date/Time: Thursday, Oct. 30, 8:30 AM – 10:00 AM
- Future of Parallel Computing (Panel)
  - Date/Time: Thursday, Oct. 30, 12:00 PM – 1:30 PM
![Page 117: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/117.jpg)
Thank you!
![Page 118: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/118.jpg)
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
![Page 119: David Callahan Distinguished Engineer Microsoft Corporation Joe Duffy Lead Software Engineer Microsoft Corporation Stephen Toub Lead Program Manager.](https://reader033.fdocuments.net/reader033/viewer/2022051516/56649d5f5503460f94a3fed5/html5/thumbnails/119.jpg)