.Net Multithreading and Parallelization
Multithreading and Parallelization
Dmitri Nesteruk | http://nesteruk.org/seminars
Agenda
Overview
Multithreading
PowerThreading (AsyncEnumerator)
Multi-core parallelization
Parallel Extensions to .NET Framework
Multi-computer parallelization
PureMPI.NET
Why now?
Manycore paradigm shift
CPU speeds reach production challenges (not at the limit yet)
growth
Processor features
Hyper-threading
SIMD
CPU Scope
Yesterday: 1x-core
Today: 2x-core norm, 4x-core
Tomorrow: 32x-core?
Past: more transistors per chip
Present: more cores per chip
Future: even more cores per chip; NUMA & other specialties
Machine Scope
Most clients are concerned with one-machine use
Clustering helps leverage performance
Clouds
Machine
Cluster
Cloud
Multithreading vs. Parallelization
Multithreading
Using threads/thread pool to perform async operations
Explicit (# of threads known)
Parallelization
Implicit parallelization
No explicit thread operation
Ways to Parallelize/Multithread
Managed: System.Threading, Parallel Extensions, Libraries
Unmanaged: OpenMP, Libraries
Specialized: GPGPU, FPGA
Managed
System.Threading
Libraries
Parallel Extensions (TPL + PLINQ)
PowerThreading
Languages/frameworks
Sing#, CCR
Remoting, WCF, MPI.NET, PureMPI.NET, etc.
Use over many machines
Unmanaged
OpenMP
– #pragma directives in C++ code
Intel multi-core libraries
Threading Building Blocks (low-level)
Integrated Performance Primitives
Math Kernel Library (also has MPI support)
MPI, PVM, etc.
Use over many machines
Specialized Ex. (Intrinsic Parallelization)
GPU Computation (GPGPU)
Calculations on graphic card
Uses programmable pixel shaders
See, e.g., NVidia CUDA, GPGPU.org
FPGA
Hardware-specific solutions
E.g., in-socket accelerators
Requires HDL programming & custom hardware
Multithreading: a look at AsyncEnumerator
Part I
Multithreading
Goals
Do stuff concurrently
Preserve safety/consistency
Tools
Threads
ThreadPool
Synchronization objects
Framework async APIs
A Look at Delegates
Making a delegate for a function is easy
Given void a() { … }
– ThreadStart del = a;
Given void a(int n) { … }
– Action<int> del = a;
Given float a(int n, double m) {…}
– Func<int, double, float> del = a;
Otherwise, make your own!
Delegate Methods
Invoke()
Synchronous, blocks your thread
BeginInvoke
Executes in ThreadPool
Returns IAsyncResult
EndInvoke
Waits for completion
Takes the IAsyncResult from BeginInvoke
Usage
Fire and forget
– del.BeginInvoke(null, null);
Fire, and wait until done
– IAsyncResult ar = del.BeginInvoke(null, null);
  …
  del.EndInvoke(ar);
Fire, and call a function when done
– del.BeginInvoke(firedWhenDone, null); // first argument is the callback parameter
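The three usage patterns can also be sketched with tasks. Delegate BeginInvoke/EndInvoke is not supported on .NET Core and later, so this sketch swaps in Task.Run and ContinueWith from the shipped System.Threading.Tasks API. It is not the slide's delegate API, and Work() is a hypothetical workload.

```csharp
using System;
using System.Threading.Tasks;

public static class DelegateUsageSketch
{
    // Hypothetical workload standing in for the delegate target.
    public static int Work() { return 6 * 7; }

    // Fire, and wait until done: Task.Run queues the delegate on the
    // thread pool; Result blocks until it has finished.
    public static int FireAndWait()
    {
        Func<int> del = Work;
        Task<int> task = Task.Run(del);
        return task.Result;
    }

    // Fire, and call a function when done: the continuation plays the
    // role of the callback parameter.
    public static int FireWithCallback()
    {
        Func<int> del = Work;
        Task<int> done = Task.Run(del).ContinueWith(t => t.Result + 1);
        return done.Result;
    }

    public static void Main()
    {
        Task.Run(() => Work());                 // fire and forget
        Console.WriteLine(FireAndWait());
        Console.WriteLine(FireWithCallback());
    }
}
```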
WaitAny and WaitAll
To wait until either delegate completes
– WaitHandle.WaitAny(new WaitHandle[] { ar1.AsyncWaitHandle, ar2.AsyncWaitHandle }); // wait until either completes
To wait until all delegates complete
Use WaitAll instead of WaitAny
– Needs an [MTAThread]: WaitAll on several handles is not supported on an STA thread; use Pulse & Wait there instead
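A minimal sketch of the difference between waiting for any handle and waiting for all of them. It uses ManualResetEvent handles instead of delegate wait handles so that it runs on any .NET version; the names slow, fast and release are made up for the illustration.

```csharp
using System;
using System.Threading;

public static class WaitHandleSketch
{
    // Returns the index of the handle that WaitAny reports first.
    public static int FirstToFinish()
    {
        using (var release = new ManualResetEvent(false))
        using (var slow = new ManualResetEvent(false))
        using (var fast = new ManualResetEvent(false))
        {
            // The slow work item is held back until the caller releases it.
            ThreadPool.QueueUserWorkItem(_ => { release.WaitOne(); slow.Set(); });
            ThreadPool.QueueUserWorkItem(_ => fast.Set());

            var handles = new WaitHandle[] { slow, fast };
            int first = WaitHandle.WaitAny(handles);   // returns once one handle is set
            release.Set();                             // now let the slow one finish
            WaitHandle.WaitAll(handles);               // returns once both are set
            return first;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstToFinish());
    }
}
```

WaitAny returns the array index of a signaled handle, which is how the caller finds out which operation finished.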
Example
Execute a() and b() in parallel; wait on both
ThreadStart delA = a;
ThreadStart delB = b;
IAsyncResult arA = delA.BeginInvoke(null, null);
IAsyncResult arB = delB.BeginInvoke(null, null);
WaitHandle.WaitAll(new [] { arA.AsyncWaitHandle, arB.AsyncWaitHandle });
LINQ Example
Execute a() and b() in parallel; wait on both
WaitHandle.WaitAll(new ThreadStart[] { a, b }   // make an array of delegates (method groups need the explicit type)
    .Select(f => f.BeginInvoke(null, null)      // call each delegate
        .AsyncWaitHandle)                       // get a wait handle of each
    .ToArray());                                // convert from IEnumerable to array
Asynchronous Programming Model (APM)
Basic goal
– IAsyncResult ar = del.BeginXXX(null, null);
  …
  del.EndXXX(ar);
Supported by Framework classes, e.g.,
– FileStream
– WebRequest
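As a concrete instance of the Begin/End pair, here is a small sketch using FileStream.BeginRead and EndRead. The scratch file and the 64-byte buffer size are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Text;

public static class ApmSketch
{
    // Reads up to 64 bytes from a file via the BeginRead/EndRead pair.
    public static string ReadStart(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, 4096, true))
        {
            var buffer = new byte[64];
            IAsyncResult ar = fs.BeginRead(buffer, 0, buffer.Length, null, null);
            // ... other work could run here while the read is pending ...
            int read = fs.EndRead(ar);   // blocks until the read completes
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();   // hypothetical scratch file
        File.WriteAllText(path, "hello APM");
        Console.WriteLine(ReadStart(path));
        File.Delete(path);
    }
}
```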
Difficulties
Async calls do not always succeed
Timeout
Exceptions
Cancelation
Results in too many functions/anonymous delegates
Async workflow code becomes difficult to read
PowerThreading
A free library from Wintellect (Jeffrey Richter)
Get it at wintellect.com
Also check out PowerCollections
Resource locks
ReaderWriterGate
Async. prog. model
AsyncEnumerator, SyncGate
Other features
IOState manager, NumaInformation :)
AsyncEnumerator
Simplifies APM programming
No need to manually manage IAsyncResult cookies
Fewer functions, cleaner code
Usage patterns
1 async op → process
X async ops → process all
X async ops → process each one as it completes
X async ops → process some, discard the rest
X async ops → process some until cancellation/timeout occurs, discard the rest
AsyncEnumerator Basics
Has three methods
Execute(IEnumerator<Int32>)
BeginExecute
EndExecute
Also exists as AsyncEnumerator<T> when a return value is required
Inside the Function
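internal IEnumerator<Int32> GetFile(
    AsyncEnumerator ae, string uri)
{
    WebRequest wr = WebRequest.Create(uri);
    // start the async operation; ae.End() supplies the callback
    wr.BeginGetResponse(ae.End(), null);
    // report one pending operation and wait for it
    yield return 1;
    // collect the result of the completed operation
    WebResponse resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    // use response
}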
Signature
Function must return IEnumerator<Int32>
Function must accept AsyncEnumerator as one of the parameters (order unimportant)
Callback
Call the async BeginXXX() methods
Pass ae.End() as callback parameter
Yield
Now yield return the number of pending asynchronous operations
Wait & Process
Call the async EndXXX() methods
Pass ae.DequeueAsyncResult() as parameter
Usage
Init the enumerator
– var ae = new AsyncEnumerator();
Use it, passing itself as a parameter
– ae.Execute(GetFile(ae, "http://nesteruk.org"));
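To see why the iterator yields the number of pending operations, it can help to look at a toy driver. The MiniAsyncEnumerator below is a hypothetical miniature written for this illustration. It is not the PowerThreading implementation and makes no claim about how the library works internally; it simply blocks the calling thread until the requested number of callbacks has arrived. It mirrors only the method names used on the slides.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// Hypothetical miniature of the pattern, for illustration only.
public sealed class MiniAsyncEnumerator
{
    private readonly Queue<IAsyncResult> completed = new Queue<IAsyncResult>();
    private readonly object gate = new object();

    // Callback to hand to BeginXXX: records the finished operation.
    public AsyncCallback End()
    {
        return ar => { lock (gate) { completed.Enqueue(ar); Monitor.Pulse(gate); } };
    }

    // Hands a finished operation back to the iterator for its EndXXX call.
    public IAsyncResult DequeueAsyncResult()
    {
        lock (gate) { return completed.Dequeue(); }
    }

    // Drives the iterator: each yielded number says how many operations
    // must have completed before the iterator is resumed.
    public void Execute(IEnumerator<int> iterator)
    {
        while (iterator.MoveNext())
        {
            int pending = iterator.Current;
            lock (gate)
            {
                while (completed.Count < pending) Monitor.Wait(gate);
            }
        }
    }
}

public static class MiniDemo
{
    // Same shape as GetFile on the slides, but reading from a stream.
    static IEnumerator<int> ReadBytes(MiniAsyncEnumerator ae, Stream s, int[] result)
    {
        var buffer = new byte[4];
        s.BeginRead(buffer, 0, buffer.Length, ae.End(), null);
        yield return 1;                                   // one operation pending
        result[0] = s.EndRead(ae.DequeueAsyncResult());   // bytes actually read
    }

    public static int Run()
    {
        var ae = new MiniAsyncEnumerator();
        var result = new int[1];
        using (var s = new MemoryStream(new byte[] { 1, 2, 3 }))
        {
            ae.Execute(ReadBytes(ae, s, result));
        }
        return result[0];
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```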
Exception Handling
Break out of function
– try {
    resp = wr.EndGetResponse(ae.DequeueAsyncResult());
  } catch (WebException e) {
    // process e
    yield break;
  }
Propagate a parameter
Discard Groups
Sometimes, you want to ignore the result of some calls
E.g., you already got the data elsewhere
To discard a group of calls
Use overloaded End(…) methods to specify
Group number
Cleanup delegate
Call DiscardGroup(…) with group number
Cancellation
External code can cancel the iterator
– ae.Cancel(…)
Or specify a timeout
– ae.SetCancelTimeout(…)
Check whether iterator is cancelled with
– ae.IsCanceled(…)
just call yield break if it is
Parallel Extensions to .NET Framework TPL and PLINQ
Part II
Parallelization
Algorithms vary
Some parallelize easily (e.g., matrix multiplication)
Some not so (e.g., matrix inversion)
Some not at all
parallelize them
Parallel Extensions to .NET Framework (PFX)
A library for parallelization
Consists of
Task Parallel Library
Parallel LINQ (PLINQ)
Currently in CTP stage
Maybe in .NET 4.0?
Task Parallel Library Features
System.Linq
Parallel LINQ
System.Threading
Implicit parallelism (Parallel.Xxx)
System.Threading.Collections
Thread-safe stack and queue
System.Threading.Tasks
Task manager, tasks, futures
System.Threading
Implicit parallelization (Parallel.For and ForEach)
Aggregate exceptions
Other useful classes
Parallel.For | ForEach
LazyInit<T>, WriteOnce<T>
AggregateException
Other goodies
Parallel.For
Parallelizes a for loop
Instead of
for (int i = 0; i < 10; ++i) { … }
We write
Parallel.For(0, 10, i => { … });
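A runnable sketch of the same transformation, using Parallel.For as it shipped in .NET 4 and later. Because iterations may run on different threads, the shared total is updated with Interlocked.Add; the sum-of-squares workload is made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Sums the squares of 0..n-1 with a parallel loop.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            // Iterations run concurrently, so the update must be atomic.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(10));
    }
}
```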
Parallel.For Overloads
Step size
ParallelState for cancelation
Thread-local initialization
Thread-local finalization
References to a TaskManager
Task creation options
Parallel.ForEach
Same features as Parallel.For except
No counters or steps
Takes an IEnumerable<T>
Cancelation
Parallel.For takes an Action<Int32> delegate
Can also take an Action<Int32, ParallelState>
ParallelState keeps track of the state of parallel execution
ParallelState.Stop() stops execution in all threads
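In the released .NET Framework 4 API the state object is called ParallelLoopState rather than the CTP's ParallelState. A minimal sketch of stopping a loop early under that assumption:

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelStopSketch
{
    // Returns true if the loop was stopped before running every iteration.
    public static bool StoppedEarly()
    {
        ParallelLoopResult result = Parallel.For(0, 1000000, (i, state) =>
        {
            // Ask all threads to stop starting new iterations.
            if (i == 10) state.Stop();
        });
        return !result.IsCompleted;
    }

    public static void Main()
    {
        Console.WriteLine(StoppedEarly());
    }
}
```

Iterations that are already running are not aborted; Stop() only prevents new ones from starting.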
Parallel.For Exceptions
The AggregateException class holds all exceptions thrown
Created even if only one thread throws
Used by both Parallel.Xxx and PLINQ
Original exceptions stored in InnerExceptions property.
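A short sketch of catching the aggregate and inspecting InnerExceptions, again using the shipped Parallel.For. How many iterations get to throw before the loop shuts down is not deterministic, so the sketch only counts them.

```csharp
using System;
using System.Threading.Tasks;

public static class AggregateSketch
{
    // Runs a loop in which every iteration throws; returns how many
    // exceptions were collected in the AggregateException.
    public static int CountFailures()
    {
        try
        {
            Parallel.For(0, 4, i =>
            {
                throw new InvalidOperationException("iteration " + i);
            });
            return 0;
        }
        catch (AggregateException ex)
        {
            foreach (Exception inner in ex.InnerExceptions)
                Console.WriteLine(inner.Message);
            return ex.InnerExceptions.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountFailures());
    }
}
```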
LazyInit<T>
Lazy initialization of a single variable
Options
– AllowMultipleExecution: init function can be called by many threads, only one value published
– EnsureSingleExecution: init function executed only once
– ThreadLocal: one init call & value per thread
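LazyInit<T> is the CTP name. In the shipped framework the closest type is Lazy<T>, where LazyThreadSafetyMode.ExecutionAndPublication roughly corresponds to EnsureSingleExecution and PublicationOnly to AllowMultipleExecution (ThreadLocal<T> covers the per-thread case). The sketch below assumes that mapping and shows the single-execution mode:

```csharp
using System;
using System.Threading;

public static class LazySketch
{
    // Counts how often the init function actually ran.
    public static int InitCalls;

    // Four threads read the same lazy value; returns the sum of what they saw.
    public static int Run()
    {
        InitCalls = 0;
        var lazy = new Lazy<int>(
            () => { Interlocked.Increment(ref InitCalls); return 42; },
            LazyThreadSafetyMode.ExecutionAndPublication);

        int sum = 0;
        var threads = new Thread[4];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() => Interlocked.Add(ref sum, lazy.Value));
            threads[t].Start();
        }
        foreach (Thread thread in threads) thread.Join();
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
        Console.WriteLine(InitCalls);
    }
}
```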
WriteOnce<T>
Single-assignment structure
Just like Nullable:
HasValue
Value
Also try methods
TryGetValue
TrySetValue
Futures
A future is the name of a value that will eventually be produced by a computation
Thus, we can decide what to do with the value before we know it
Futures of T
• Future is a factory
• Future<T> is the actual future (and also has factory methods)
To make a future
– var f = Future.Create(() => g());
To use a future
Get f.Value
The accessor waits until the async computation has produced the value
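Future<T> did not survive under that name; in the shipped framework the same idea is expressed by Task<T>. A sketch under that assumption, where G() is a hypothetical computation: the continuation is attached before the value is known, and Result blocks until it is available.

```csharp
using System;
using System.Threading.Tasks;

public static class FutureSketch
{
    // Hypothetical computation that produces the value.
    static int G() { return 21; }

    public static int Run()
    {
        Task<int> f = Task.Run(() => G());                       // starts computing
        // Decide what to do with the value before it is known.
        Task<int> doubled = f.ContinueWith(t => t.Result * 2);
        return doubled.Result;                                   // blocks until available
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```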
Tasks & TaskManager
A better Thread+ThreadPool combination
TaskManager
A very clever thread pool :)
Adjusts worker threads to # of CPUs/cores
Keeps all cores busy
Task
A unit of work
May (or may not) run concurrently
http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
Task
Just like a future, a task takes an Action<T>
– Task t = Task.Create(DoSomeWork);
Overloads exist :)
Fires off immediately. To wait on completion
– t.Wait();
Unlike the thread pool, task manager will use as many threads as there are cores
Parallel LINQ (PLINQ)
Parallel evaluation in
LINQ to Objects
LINQ to XML
Features
IParallelEnumerable<T>
ParallelEnumerable.AsParallel static method
Example
IEnumerable<T> data = ...;
var q = data.AsParallel()
    .Where(x => p(x))
    .OrderBy(x => k(x))
    .Select(x => f(x));
foreach (var e in q)
    a(e);
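A self-contained version of the same query shape, using PLINQ as shipped in System.Linq. The predicate, sort key and projection are made up for the example.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Keeps the even numbers from 1..20, sorts them descending,
    // and squares each one, evaluating the query in parallel.
    public static int[] Run()
    {
        int[] data = Enumerable.Range(1, 20).ToArray();
        return data.AsParallel()
                   .Where(x => x % 2 == 0)
                   .OrderBy(x => -x)
                   .Select(x => x * x)
                   .ToArray();
    }

    public static void Main()
    {
        foreach (int e in Run())
            Console.WriteLine(e);
    }
}
```

The OrderBy call makes the query ordered, so the results come out in sorted order even though the work is split across threads.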
Interprocess communication with PureMPI.NET
Part III
Message Passing Interface
An API for general-purpose IPC
Works across cores & machines
C++ and Fortran
Some Intel libraries support it explicitly
http://www.mcs.anl.gov/research/projects/mpich2/
PureMPI.NET
A free library available at http://purempi.net
Uses WCF endpoints for communication
Uses MPI syntax
Features
A library DLL for WCF functionality
An EXE for easy deployment over network
How it works
Your computers run a service that connects them together
Your program exposes WCF endpoints
You use the MPI interfaces to communicate
Communicator & Rank
A communicator is a group of computers
In most scenarios, you would have one group
MPI_COMM_WORLD
comm
Useful for determining whether we are the
Main
static void Main(string[] args)
{
    // "MPIEnvironment" refers to the app.config settings;
    // MpiProcess is run on all machines
    using (ProcessorGroup processors =
        new ProcessorGroup("MPIEnvironment", MpiProcess))
    {
        processors.Start();             // start each one
        processors.WaitForCompletion(); // wait on all
    }
}
Sending & Receiving
Blocking or non-blocking methods
Send/Receive (blocking)
Begin|End Send/Receive (async)
Invoked on the comm
Send/Receive
static void MpiProcess(IDictionary<string, Comm> comms)
{
    // get the default comm from the dictionary
    Comm comm = comms["MPI_COMM_WORLD"];
    if (comm.Rank == 0)
    {
        // get a message from 1 (blocking)
        string msg = comm.Receive<string>(1, string.Empty);
        Console.WriteLine("Got " + msg);
    }
    else if (comm.Rank == 1)
    {
        // send a message to 0 (also blocking)
        comm.Send(0, string.Empty, "Hello");
    }
}
Extras
Can use async ops
Can send to all (Broadcast)
Can distribute work and then collect it (Gather/Scatter)
Thank You!