C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons...
-
Upload
beatrix-skinner -
Category
Documents
-
view
220 -
download
2
Transcript of C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons...
![Page 1: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/1.jpg)
C++ Accelerated Massive Parallelism in Visual C++ 2012Kate GregoryGregory Consultingwww.gregcons.com/kateblog, @gregcons
DEV334
![Page 2: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/2.jpg)
C++ is the language for performance
If you need speed at runtime, you use C++Frameworks and libraries can make your code faster
Eg PPL: use all the CPU coresWith little or no change to your logic and code
Experienced C++ developers are productive in C++Don’t want to go back to C or C-like languageEnjoy the tool support in Visual Studio
Many C++ developers value portabilityWrite standard C++, compile with anythingUse portable libraries, run anywhereEven in all-Microsoft universe, simple deployment is important
![Page 3: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/3.jpg)
demo
Cartoonizer
![Page 4: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/4.jpg)
What is C++ AMP?Accelerated Massive Parallelism
Run your calculations on one or more acceleratorsToday, GPU is the accelerator you useEventually: other kinds of accelerators
Write your whole application in C++Not a “C-like” language or a separate resource you link inUse Visual Studio and familiar toolsSpeed up 20x, 50x, or more
Basically a libraryComes with Visual Studio 2012, included in vcredistSpec is open – other platforms/compilers can implement it too
![Page 5: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/5.jpg)
Agenda
Hardware ReviewC++ AMP FundamentalsTilingDebugging and VisualizingCall to Action
![Page 6: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/6.jpg)
CPUs vs GPUs today
CPU
Low memory bandwidthHigher power consumptionMedium level of parallelismDeep execution pipelinesRandom accessesSupports general codeMainstream programming
GPU
High memory bandwidthLower power consumptionHigh level of parallelismShallow execution pipelinesSequential accessesSupports data-parallel codeNiche programming
images source: AMD
![Page 7: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/7.jpg)
C++ AMP is fundamentally a library
Comes with Visual C++ 2012#include <amp.h>Namespace: concurrencyNew classes:
array, array_viewextent, indexaccelerator, accelerator_view
New function(s): parallel_for_each()New (use of) keyword: restrict
Asks compiler to check your code is ok for GPU (DirectX)
![Page 8: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/8.jpg)
parallel_for_each
Entry point to the libraryTakes number (and shape) of threads neededTakes function or lambda to be done by each thread
Must be restrict(amp) Sends the work to the accelerator
Scheduling etc handled thereReturns – no blocking/waiting
![Page 9: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/9.jpg)
void AddArrays(int n, int * pA, int * pB, int * pSum){
for (int i=0; i<n; i++)
{ pSum[i] = pA[i] + pB[i]; }
}
#include <amp.h>using namespace concurrency;
void AddArrays(int n, int * pA, int * pB, int * pSum){ array_view<int,1> a(n, pA); array_view<int,1> b(n, pB); array_view<int,1> sum(n, pSum); parallel_for_each( sum.extent, [=](index<1> i) restrict(amp) { sum[i] = a[i] + b[i]; } );}
Hello World: Array Addition
void AddArrays(int n, int * pA, int * pB, int * pSum){
for (int i=0; i<n; i++)
{ pSum[i] = pA[i] + pB[i]; }
}
![Page 10: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/10.jpg)
Basic Elements of C++ AMP codingvoid AddArrays(int n, int * pA, int * pB, int * pSum){ array_view<int,1> a(n, pA); array_view<int,1> b(n, pB); array_view<int,1> sum(n, pSum); parallel_for_each(
sum.extent, [=](index<1> i) restrict(amp) { sum[i] = a[i] + b[i];
} );}
array_view variables captured and associated data copied to accelerator (on demand)
restrict(amp): tells the compiler to check that this code conforms to C++ AMP language restrictions
parallel_for_each: execute the lambda on the accelerator once per thread
extent: the number and shape of threads to execute the lambda
index: the thread ID that is running the lambda, used to index into data
array_view: wraps the data to operate on the accelerator
![Page 11: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/11.jpg)
extent<N> - size of an N-dim space
![Page 12: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/12.jpg)
index<N> - an N-dimensional point
![Page 13: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/13.jpg)
View on existing data on the CPU or GPUDense in least significant dimensionOf element T and rank NRequires extentRectangularAccess anywhere (implicit sync)
vector<int> v(10);
extent<2> e(2,5); array_view<int,2> a(e, v);
array_view<T,N>
//above two lines can also be written//array_view<int,2> a(2,5,v);
index<2> i(1,3);
int o = a[i]; // or a[i] = 16;//or int o = a(1, 3);
![Page 14: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/14.jpg)
demo
Matrix Multiplication
![Page 15: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/15.jpg)
Matrix Multiplication
C00 = A00 * B00 + A01 * B10 + A02 * B20 + A03 * B30
![Page 16: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/16.jpg)
restrict(amp) restrictions
Can only call other restrict(amp) functionsAll functions must be inlinableOnly amp-supported types
int, unsigned int, float, double, boolstructs & arrays of these types
Pointers and ReferencesLambdas cannot capture by reference, nor capture pointersReferences and single-indirection pointers supported only as local variables and function arguments
![Page 17: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/17.jpg)
restrict(amp) restrictions
No recursion'volatile'virtual functionspointers to functionspointers to member functionspointers in structspointers to pointersbitfields
No goto or labeled statementsthrow, try, catchglobals or staticsdynamic_cast or typeidasm declarationsvarargsunsupported types
e.g. char, short, long double
![Page 18: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/18.jpg)
vector<int> v(8 * 12);extent<2> e(8,12);accelerator acc = …array<int,2> a(e,acc.default_view);copy_async(v.begin(), v.end(), a);
array<T,N>
Multi-dimensional array of rank N with element TContainer whose storage lives on a specific acceleratorCapture by reference [&] in the lambdaExplicit copyNearly identical interface to array_view<T,N>
parallel_for_each(e, [&](index<2> idx) restrict(amp) { a[idx] += 1;});copy(a, v.begin());
![Page 19: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/19.jpg)
Tiling
Rearrange algorithm to do the calculation in tilesEach thread in a tile shares a programmable cache
tile_static memoryAccess 100x as fast as global memoryExcellent for algorithms that use each piece of information again and again
Overload of parallel_for_each that takes a tiled extent
![Page 20: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/20.jpg)
Race Conditions in the Cache
Because a tile of threads shares the programmable cache, you must prevent race conditions
Tile barrier can ensure a waitTypical pattern:
Each thread does a share of the work to fill the cacheThen waits until all threads have done that workThen uses the cache to calculate a share of the answer
![Page 21: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/21.jpg)
Visual Studio 2012
DebuggingEverything you had before, plus: GPU ThreadsParallel StacksParallel Watch
Visualizing
![Page 22: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/22.jpg)
Debugging
Can hit either CPU breakpoints or GPU breakpointsNew UI in VS 2012 to control that
GPU breakpoints: Only on Windows 8 today
![Page 23: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/23.jpg)
Debugging Properties
![Page 24: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/24.jpg)
Values, Call Stacks, etc
![Page 25: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/25.jpg)
GPU Threads Window
Shows progress through the calculation
![Page 26: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/26.jpg)
Parallel Watch
Shows values across multiple threads
![Page 27: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/27.jpg)
And more!
Race Condition Detection Parallel StacksFlagging, Filtering, and GroupingFreezing and ThawingRun Tile to Cursor
![Page 28: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/28.jpg)
Concurrency Visualizer
Shows activity on CPU and GPUCan highlight relative times for specific parts of a calculationOr copy times to/from the accelerator
![Page 29: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/29.jpg)
![Page 30: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/30.jpg)
![Page 31: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/31.jpg)
C++ AMP is…
C++The language you knowExcellent productivity The language you choose when performance matters
Implemented as (mostly) a libraryVariety of application types
Well supported by Visual Studio 2012DebuggerConcurrency VisualizerEverything else you already use
Can be supported by other compilers and platformsOpen spec
![Page 32: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/32.jpg)
Learn C++ AMP
book http://www.gregcons.com/cppamp/
training http://www.acceleware.com/cpp-amp-training
videos http://channel9.msdn.com/Tags/c++-accelerated-massive-parallelism
articles http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/05/c-amp-articles-in-msdn-magazine-april-issue.aspx
samples http://blogs.msdn.com/b/nativeconcurrency/archive/2012/01/30/c-amp-sample-projects-for-download.aspx
guides http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/11/c-amp-for-the-cuda-programmer.aspx
spec http://blogs.msdn.com/b/nativeconcurrency/archive/2012/02/03/c-amp-open-spec-published.aspx
forum http://social.msdn.microsoft.com/Forums/en/parallelcppnative/threads
http://blogs.msdn.com/nativeconcurrency/
![Page 33: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/33.jpg)
Call to Action
Get Visual Studio 2012Download some samplesPlay with debugger and other toolsTry writing a C++ AMP application of your own
Console (command prompt)WindowsMetro style for Windows 8
Measure your performance and see the difference
![Page 34: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/34.jpg)
DEV Track Resources
Visual Studio Home Page :: http://www.microsoft.com/visualstudio/en-us
Jason Zander’s Blog :: http://blogs.msdn.com/b/jasonz/
Facebook :: http://www.facebook.com/visualstudio
Twitter :: http://twitter.com/#!/visualstudio
Somasegar’s Blog :: http://blogs.msdn.com/b/somasegar/
![Page 35: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/35.jpg)
Resources
Connect. Share. Discuss.
http://europe.msteched.com
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Resources for Developers
http://microsoft.com/msdn
![Page 36: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/36.jpg)
Evaluations
http://europe.msteched.com/sessions
Submit your evals online
![Page 37: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/37.jpg)
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS
PRESENTATION.
![Page 38: C++ Accelerated Massive Parallelism in Visual C++ 2012 Kate Gregory Gregory Consulting @gregcons DEV334.](https://reader030.fdocuments.net/reader030/viewer/2022032705/56649dd45503460f94acc9b8/html5/thumbnails/38.jpg)