IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)
![Page 1: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/1.jpg)
IAP09 CUDA@MIT / 6.963
Supercomputing on your desktop: Programming the next generation of cheap
and massively parallel hardware using CUDA
Lecture 02
CUDA Basics #1
Nicolas Pinto (MIT)
![Page 2: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/2.jpg)
During this course,
we’ll try to
and use existing material ;-)
“ ”
adapted for 6.963
![Page 3: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/3.jpg)
Today
yey!!
![Page 4: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/4.jpg)
Intro
GPU?
GPU History
// Analysis
CUDA Overview
CUDA Basics
![Page 5: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/5.jpg)
Intro
![Page 6: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/6.jpg)
Prepared by Matthew Bolitho, Johns Hopkins University, Spring 2008
Parallel Computing
“Parallel computing is a form of computing in which many instructions are carried out simultaneously” (Wikipedia)
• Traditionally: large, expensive, specialized
• Exotic Supercomputers (e.g. Cray)
• Distributed Systems (e.g. ASCI White, BlueGene)
• Parallel computing was traditionally inaccessible to the commodity marketplace
“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.”
Gordon Moore, Electronics Magazine, 19 April 1965
• The most economic number of components in an IC will double every year
• Historically, CPUs get faster
– Hardware reaching frequency limitations
• Now CPUs get wider
• Rather than expecting CPUs to get twice as fast, expect to have twice as many!
• Parallel processing for the masses
• Unfortunately: parallel programming is hard.
– Algorithms and Data Structures must be fundamentally redesigned
slide by Matthew Bolitho
Motivation
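The arithmetic behind Moore's projection quoted on this slide can be checked in a few lines. The sketch below is in Python and is not part of the original slides; the 1965 starting point of 64 components per circuit is an assumption chosen so that ten yearly doublings land near the quoted figure of 65,000.

```python
# Moore's 1965 projection: the component count doubles every year.
START_YEAR = 1965
START_COMPONENTS = 64  # assumed starting point (2**6); not stated on the slide


def projected_components(year, start_year=START_YEAR, start=START_COMPONENTS):
    """Components per integrated circuit if the count doubles every year."""
    return start * 2 ** (year - start_year)


if __name__ == "__main__":
    for year in (1965, 1970, 1975):
        print(year, projected_components(year))
```

Ten doublings multiply the count by 2**10 = 1024, which is how a few dozen components in 1965 become roughly 65,000 by 1975.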
![Page 8: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/8.jpg)
GPU?
![Page 18: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/18.jpg)
GPUs are REALLY fast
3D Filterbank Convolution: Performance (gflops) and Development Time (hours)
• Matlab: 0.3 gflops, 0.5 hours
• C/SSE: 9.0 gflops, 10.0 hours
• PS3: 110.0 gflops, 30.0 hours
• GT200: 330.0 gflops, 10.0 hours
GPU?
Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard)
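To put the chart's numbers in perspective, the relative speedups can be computed directly. This is a small Python sketch, not something from the slides; it assumes the performance and development-time values pair with the platforms in the order they are listed (Matlab, C/SSE, PS3, GT200), and the helper name `speedup_over` is purely illustrative.

```python
# (performance in gflops, development time in hours) per platform.
# The pairing of values to platforms follows the order on the slide
# and is an assumption of this sketch.
results = {
    "Matlab": (0.3, 0.5),
    "C/SSE": (9.0, 10.0),
    "PS3": (110.0, 30.0),
    "GT200": (330.0, 10.0),
}


def speedup_over(baseline, platform, table=results):
    """Ratio of the platform's gflops to the baseline's gflops."""
    return table[platform][0] / table[baseline][0]


if __name__ == "__main__":
    for name in results:
        print(f"{name}: {speedup_over('Matlab', name):.0f}x over Matlab")
```

Under that pairing, the GT200 version runs about three orders of magnitude faster than the Matlab version and took about as long to develop as the C/SSE version.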
![Page 19: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/19.jpg)
• FLOPS Performance:
Figure courtesy of Dave Luebke, SC07 CUDA Course
• Designed for math-intensive, parallel problems:
• More transistors dedicated to ALU than flow control and data cache
• What are the consequences?
• Program must be more predictable:
– Data access coherency
– Program flow
slide by Matthew Bolitho
GPU?
![Page 22: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/22.jpg)
Task vs. Data parallelism
• Task parallel
– Independent processes with little communication
– Easy to use
• “Free” on modern operating systems with SMP
• Data parallel
– Lots of data on which the same computation is being
executed
– No dependencies between data elements in each
step in the computation
– Can saturate many ALUs
– But often requires redesign of traditional algorithms
slide by Mike Houston
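The data-parallel property described above (the same computation on lots of data, with no dependencies between elements) can be illustrated with a short Python sketch. It is not from the slides: the function names are hypothetical, and a thread pool stands in for the many ALUs. In CPython this shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_and_add(chunk, a=2.0, b=1.0):
    """Same computation on every element; no element depends on another."""
    return [a * x + b for x in chunk]


def data_parallel_map(data, workers=4):
    """Split the data into chunks, process them concurrently, reassemble in order."""
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scale_and_add, chunks)  # map preserves chunk order
        return [y for chunk in results for y in chunk]


def running_sum(data):
    """Counter-example: each output depends on the previous one."""
    total, out = 0.0, []
    for x in data:
        total += x
        out.append(total)
    return out
```

`data_parallel_map` gives the same answer however the chunks are scheduled, which is what lets hardware spread the work across many units. `running_sum`, written this way, carries a dependency from one step to the next; this is the kind of traditional algorithm that has to be redesigned before it maps well onto data-parallel hardware.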
GPU?
![Page 23: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/23.jpg)
CPU vs. GPU
• CPU
– Really fast caches (great for data reuse)
– Fine branching granularity
– Lots of different processes/threads
– High performance on a single thread of execution
• GPU
– Lots of math units
– Fast access to onboard memory
– Run a program on each fragment/vertex
– High throughput on parallel tasks
• CPUs are great for task parallelism
• GPUs are great for data parallelism
slide by Mike Houston
GPU?
![Page 24: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/24.jpg)
The Importance of Data Parallelism for GPUs
• GPUs are designed for highly parallel tasks like
rendering
• GPUs process independent vertices and fragments
– Temporary registers are zeroed
– No shared or static data
– No read-modify-write buffers
– In short, no communication between vertices or fragments
• Data-parallel processing
– GPU architectures are ALU-heavy
• Multiple vertex & pixel pipelines
• Lots of compute power
– GPU memory systems are designed to stream data
• Linear access patterns can be prefetched
• Hide memory latency
slide by Mike Houston
GPUs
![Page 25: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/25.jpg)
GPU History
![Page 26: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/26.jpg)
not true!
History
![Page 27: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/27.jpg)
Johns Hopkins University, Spring 2008. Copyright © Matthew Bolitho 2008
Pipeline diagram (repeated on each slide): Application, Command, Geometry, Rasterization, Texture, Fragment, Display
The traditional model for 3-D Rendering
Input:
• Vertices and Primitives
• Transformations
• Lighting Parameters, etc.
Output:
• 2D Image for display
Client Application
• Prepare and load data
• Issues commands via an API (e.g. OpenGL or DirectX)
Processing commands/data from the application
• Triangulate Polygons
• Prepare vertex data streams
• …
Processing Vertices
• Modeling Transformations
• Viewing Transformations
• Vertex-based Lighting Computation
• Perspective Transformation
Converting Geometry to Fragments
• Formation of triangles with processed vertices
• Interpolation of vertex attributes across triangles
• (Perspective Divide)
• Creating Fragments from the Triangles
• Clipping
Texture mapping of Fragments
• Lookup (texture memory)
• Perform texture filtering
slide by Matthew Bolitho
History
![Page 28: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/28.jpg)
Fragment processing
• Per Fragment Lighting
• Image-based effects
• Anti-Aliasing
• Alpha Blending
• …
Converting Fragments to Pixels
• Depth Buffer Test
• Stencil Buffer Test
• Accumulation Buffer Operation
• Write Fragments to Framebuffer
• Render interactive, realistic computer generated scenes
• Each frame is complex
• Need tens of frames per second
– CPUs were too slow
– Dedicated hardware
• To improve performance, move some work to dedicated hardware
• Hardware could process each vertex and each fragment independently: highly parallel
• The graphics pipeline was “fixed function”
– Hardware was hardwired to perform the operations in the pipeline
• Eventually, pipeline became more programmable
• First stages to be “programmable” were the Texture and Fragment stages
• Able to specify a discrete set of texture blending operations
• Could combine results from a small number of texture lookups, e.g. MAX(A,B), A DOT B, and other two-operand combinations of A and B
• No circulation of data in pipeline
Diagram labels: CPU / Host, Graphics Hardware
slide by Matthew Bolitho
History
![Page 31: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/31.jpg)
• Texture and Fragment stages became more programmable, combined into “Fragment Unit”
– Programmable via assembly language
– Memory reads via texture lookups
– “Dependent” texture lookups
– Limited Program size
– No real branching (thus no looping)
• Geometry stage became programmable, called “Vertex Unit”
– Programmable via assembly language
– No memory reads!
– Limited Program size
– No real branching (thus no looping)
• Things improved over time:
– Vertex unit can do memory reads
– Maximum Program size increased
– Branching support
– Higher level languages (e.g. HLSL, Cg)
• Neither the Vertex nor the Fragment units could write to memory. Can only write to frame buffer
• No integer math
• No bitwise operators
Pipeline diagram: Application, Command (CPU / Host); Vertex Unit, Rasterization, Fragment Unit, Display, Texture Memory (Graphics Hardware)
• In the 2000s, GPUs became mostly programmable
• “Multi-pass” algorithms allowed writes to memory:
– In pass 1 write to framebuffer
– Rebind the framebuffer as a texture
– Read it in pass 2, etc.
• But were inefficient
slide by Matthew Bolitho
History
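The multi-pass idea on this slide (write to the framebuffer in one pass, rebind it as a texture, read it in the next) can be sketched as a parallel sum. This is an illustrative Python simulation, not graphics API code; the names `fragment_pass` and `multipass_sum` are hypothetical, and plain lists stand in for textures and framebuffers.

```python
def fragment_pass(texture, out_size):
    """One rendering pass: every output element gathers from the input texture.

    Output i reads texture[2*i] and texture[2*i + 1] and writes only to its
    own location, which is all a fragment program of that era could do.
    """
    framebuffer = [0.0] * out_size
    for i in range(out_size):
        left = texture[2 * i]
        right = texture[2 * i + 1] if 2 * i + 1 < len(texture) else 0.0
        framebuffer[i] = left + right
    return framebuffer


def multipass_sum(values):
    """Sum by repeated passes: write framebuffer, rebind as texture, read again."""
    texture = list(values)
    passes = 0
    while len(texture) > 1:
        out_size = (len(texture) + 1) // 2
        framebuffer = fragment_pass(texture, out_size)  # this pass writes the framebuffer
        texture = framebuffer                           # rebind it as the next pass's texture
        passes += 1
    return (texture[0] if texture else 0.0), passes
```

Each pass halves the data, so summing N values takes about log2(N) passes, each with its own round trip through memory. That overhead is one reason such algorithms were inefficient.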
![Page 36: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/36.jpg)
• Despite limitations, GPGPU community grew (GPGPU = General Purpose Computation on the GPU)
GPGPU Programs
• Don't use Vertex Unit
• Place data in textures
• Draw a flat quad (off-screen)
• Write multi-pass algorithm using Fragment Unit to perform custom processing
• Under-utilized hardware
– Only utilized Fragment Unit
– Often memory bandwidth limited
• Gather-based algorithms only (no scatter)
• Used the Graphics API
Pipeline diagram: Application, Command (CPU / Host); Vertex Unit, Rasterization, Fragment Unit, Display, Memory (Graphics Hardware); a later version adds a Geometry Unit with Memory
• Geometry Unit operates on a primitive, can write back to memory
• Changes to underlying hardware:
– Ability to write to memory
– “Unified” processing units
• CUDA is the new way to perform computation on the GPU
– Doesn't use Graphics API
– Arbitrary access to memory (scatter or gather)
– Uses all available processing units
– Has Integer math, Bitwise operators
slide by Matthew Bolitho
History
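The gather/scatter distinction on this slide is easy to state in code. The following Python sketch is illustrative only (the function names are hypothetical and lists stand in for GPU memory): a gather lets each output choose where to read from, while a scatter lets each input choose where to write to.

```python
def gather(src, indices):
    """out[i] = src[indices[i]]: each output chooses where to READ from."""
    return [src[j] for j in indices]


def scatter(src, indices, size):
    """out[indices[i]] = src[i]: each input chooses where to WRITE to."""
    out = [0] * size
    for value, j in zip(src, indices):
        out[j] = value
    return out


def histogram(values, bins):
    """A scatter-style computation: each element picks the address it updates."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    return counts
```

A fragment program could only write to its own output location, so only the gather form was available to early GPGPU code. Computations like `histogram`, where the write address depends on the data, are awkward to express that way and become natural once arbitrary memory writes are allowed.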
![Page 40: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/40.jpg)
// Analysis
IAP09 CUDA@MIT / 6.963
![Page 41: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/41.jpg)
[Figure: classification of architectures by Instruction stream and Data stream]
                       Single Data   Multiple Data
Single Instruction     SISD          SIMD
Multiple Instruction   MISD          MIMD
MIMD: Shared Memory (SMP, NUMA), Distributed Memory (MPP, Clusters), Hybrid, Grid
slide by Matthew Bolitho
// Analysis
![Page 42: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/42.jpg)
• Computes expected speedup from partial improvement
• P: proportion of program that is parallel
• S: speedup of parallel portion
• A serial algorithm can be made parallel by […]
• Find fundamental parts of the algorithm that are separable
slide by Matthew Bolitho
// Analysis
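The P and S defined on this slide are the inputs to what is usually called Amdahl's law. A minimal sketch, assuming the standard form speedup = 1 / ((1 - P) + P / S); the function name `expected_speedup` is illustrative and does not come from the slides.

```python
def expected_speedup(p, s):
    """Overall speedup when a fraction p of the run time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    # The serial part (1 - p) is unchanged; the parallel part shrinks to p / s.
    return 1.0 / ((1.0 - p) + p / s)


half_parallel = expected_speedup(0.5, 2.0)      # 1 / (0.5 + 0.25)
mostly_parallel = expected_speedup(0.9, 100.0)  # 1 / (0.1 + 0.009)
```

Under this formula the speedup can never exceed 1 / (1 - P), however large S becomes, which is why the slide goes on to ask which parts of an algorithm are separable.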
![Page 44: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/44.jpg)
[Figure: Decomposition (Task Decomposition, Data Decomposition) and Dependency Analysis (Group Tasks, Order Tasks, Data Sharing)]
• Algorithms can be decomposed by both task and data:
  • Task: Find groups of instructions that can be executed in parallel
  • Data: Find partitions in the data that can be used (relatively) independently
• Analyze the algorithm and find groups of instructions that are (relatively) independent
• Eg: Matrix Multiplication
  • Computing each element of the product matrix is a dot product
slide by Matthew Bolitho
// Analysis
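The matrix multiplication example can be made concrete: every element of the product is one independent task. The sketch below uses plain nested lists; the helper names `dot`, `matmul_tasks` and `matmul` are illustrative. The `reverse` flag exists only to show that the tasks can run in any order.

```python
def dot(row, col):
    """Dot product of one row of A with one column of B."""
    return sum(x * y for x, y in zip(row, col))


def matmul_tasks(a, b):
    """Enumerate one independent task (i, j, row, column) per output element."""
    columns = list(zip(*b))
    return [(i, j, row, col)
            for i, row in enumerate(a)
            for j, col in enumerate(columns)]


def matmul(a, b, reverse=False):
    """Run every task; each one writes only its own element of the result."""
    c = [[0] * len(b[0]) for _ in a]
    tasks = matmul_tasks(a, b)
    if reverse:
        tasks.reverse()  # any order gives the same result
    for i, j, row, col in tasks:
        c[i][j] = dot(row, col)
    return c


product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Because no task reads another task's output, the loop over `tasks` could be handed to parallel workers without any ordering constraint.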
![Page 47: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/47.jpg)
• Analyze the algorithm and find groups of instructions that are (relatively) independent
• Eg: Molecular Dynamics
  • ComputeVibrationalForces
  • ComputeRotationalForces
  • ComputeDihedralForces
  • ComputeNeighbours
  • ComputeNonBondingForces
  • UpdatePositionsAndVelocities
• Analyze the algorithm to find ways to partition the data
• Eg: Matrix Multiplication: Columns and Rows
• Eg: Matrix Multiplication: Blocks
[Figure: matrix multiplication diagrams showing the partitions]
slide by Matthew Bolitho
// Analysis
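A data decomposition by blocks can be sketched as follows. The helper names `split_blocks` and `blocked_matmul` and the default block size are assumptions made for illustration; the slides only name the partitioning schemes (columns and rows, blocks).

```python
def split_blocks(n, block):
    """Return (start, stop) index ranges covering range(n) in chunks of `block`."""
    return [(start, min(start + block, n)) for start in range(0, n, block)]


def blocked_matmul(a, b, block=2):
    """Multiply a (n x m) by b (m x p), one output block at a time."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i0, i1 in split_blocks(n, block):          # block of rows of the result
        for j0, j1 in split_blocks(p, block):      # block of columns of the result
            # Each (row block, column block) pair owns a disjoint part of c.
            for k0, k1 in split_blocks(m, block):  # walk the shared dimension
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        c[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k1))
    return c


result = blocked_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

Each output block is a separate partition of the data, so the two outer loops are the natural place to distribute work.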
![Page 51: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/51.jpg)
• There are many ways to decompose any given algorithm
  • Sometimes data decompose easily
  • Sometimes tasks decompose easily
  • Sometimes both!
  • Sometimes neither!
• Once the algorithm has been decomposed into data and tasks:
  • Analyze interactions
• To ease the management of dependencies find tasks that are similar and group them
• Then analyze constraints to determine any necessary order
slide by Matthew Bolitho
// Analysis
![Page 55: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/55.jpg)
• To ease the management of dependencies find tasks that are similar and group them
• Eg: Molecular Dynamics
  • Compute Bonded Forces
    • ComputeVibrationalForces
    • ComputeRotationalForces
    • ComputeDihedralForces
  • Compute Neighbours
  • Compute Non-Bonding Forces
  • UpdatePositionsAndVelocities
• Once groups of tasks are identified, data flow constraints enforce a partial order:
[Figure: ordering of the task groups Neighbor List, Bonded Forces, Non Bonded Forces, Update Positions and Velocities]
slide by Matthew Bolitho
// Analysis
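A partial order over task groups can be turned into stages of work that may run concurrently. In the sketch below the dependency edges are an assumption: the arrows of the slide's figure are not recoverable from the transcript, so the edges are a plausible reading of the molecular dynamics example (the neighbor list feeds the non-bonded forces, and both force groups feed the position update). `parallel_stages` is an illustrative name; `graphlib.TopologicalSorter` is part of the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each key lists the groups it has to wait for.
dependencies = {
    "Neighbor List": set(),
    "Bonded Forces": set(),
    "Non Bonded Forces": {"Neighbor List"},
    "Update Positions and Velocities": {"Bonded Forces", "Non Bonded Forces"},
}


def parallel_stages(deps):
    """Group tasks into stages; tasks inside one stage have no mutual ordering."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(dependencies)
```

Only the order between stages is enforced; within a stage the groups are free to execute in parallel.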
![Page 59: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/59.jpg)
• Once partially ordered groups of tasks and partitions of data are identified, analyze the data sharing that occurs
• Data sharing can be categorized as:
  • Read-only
  • Effectively Local
  • Read-Write
  • Accumulate
  • Multiple Read/Single Write
Read-only
• Data is read, but not written
• No consistency problems
• Replication in distributed system
Effectively Local
• Data is read and written
• Data is partitioned into subsets
• One task per subset
• Can distribute subsets
Read-Write
• Data is read and written
• Many tasks access many data
• Consistency issues
• Most difficult to deal with
Read-Write (Accumulation)
• As per Read-Write, although writes consist of an accumulation operation
• Common in reduction-type algorithms
• Can replicate since accumulation can be linear
slide by Matthew Bolitho
// Analysis
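The accumulation case can be sketched by giving every task its own replicated accumulator and combining the partial results at the end. The function name `accumulate_replicated` and the strided assignment of data to tasks are illustrative assumptions; the sketch relies on the accumulation (here integer addition) giving the same total regardless of grouping.

```python
def accumulate_replicated(values, num_tasks):
    """Sum `values` using one private accumulator per task, then combine them."""
    partials = [0] * num_tasks              # one replicated accumulator per task
    for task in range(num_tasks):
        for v in values[task::num_tasks]:   # the slice of data owned by this task
            partials[task] += v             # no other task writes partials[task]
    return sum(partials), partials          # final combine step (the reduction)


total, partials = accumulate_replicated(list(range(1, 11)), 4)
```

Since each task writes only to its own accumulator, the consistency problem of the general Read-Write case is confined to the final combine step.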
![Page 64: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/64.jpg)
Read-Write (Multiple Read/Single Write)
• As per Read-Write, although only one task writes
• Relaxed consistency constraints
• Example: Matrix Multiplication
[Figure: matrix multiplication, with two matrices labeled Read-Only and one labeled Effectively-Local]
• Example: Molecular Dynamics
[Figure: task groups Neighbor List, Bonded Forces, Non Bonded Forces, Update Positions and Velocities, with shared data Forces and Atomic Coordinates]
slide by Matthew Bolitho
// Analysis
![Page 67: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/67.jpg)
CUDA Overview
IAP09 CUDA@MIT / 6.963
![Page 68: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/68.jpg)
S05: High Performance Computing with CUDA
Problem: GPGPU
OLD: GPGPU - trick the GPU into general-purpose computing by casting problem as graphics
Turn data into images (“texture maps”)
Turn algorithms into image synthesis (“rendering passes”)
Promising results, but:
Tough learning curve, particularly for non-graphics experts
Potentially high overhead of graphics API
Highly constrained memory layout & access model
Need for many passes drives up bandwidth consumption
Solution: GPU Computing
NEW: GPU Computing with CUDA
CUDA = Compute Unified Device Architecture
Co-designed hardware & software for direct GPU computing
Hardware: fully general data-parallel architecture
Software: program the GPU in C
General thread launch
Global load-store
Parallel data cache
Scalar architecture
Integers, bit operations
Double precision (soon)
Scalable data-parallel execution/memory model
C with minimal yet powerful extensions
Overview
![Page 69: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/69.jpg)
Lecture […]: Introduction to CUDA
• Flynn's Taxonomy gives SIMD and MIMD
• Randal asked about SPMD
• SPMD = “Single Program, Multiple Data”
• Is MIMD under Flynn's taxonomy, but:
  • Most like SIMD, except no instruction lockstep
• Graphics Hardware History 101
• CUDA Hardware Architecture Overview
• CUDA Memory Model
• CUDA Execution Model
• Homework One
• CUDA: Compute Unified Device Architecture
• Created by NVIDIA
• A way to perform computation on the GPU
• Specification for:
  • A computer architecture
  • A language
  • An application interface (API)
• CUDA hardware architecture is based on latest Graphics Processing Units (GPUs)
slide by Matthew Bolitho
Overview
![Page 70: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/70.jpg)
© NVIDIA Corporation 2006
CUDA Advantages over Legacy GPGPU
Random access to memory
Thread can access any memory location
Unlimited access to memory
Thread can read/write as many locations as needed
User-managed cache (per block)
Threads can cooperatively load data into SMEM
Any thread can then access any SMEM location
Low learning curve
Just a few extensions to C
No knowledge of graphics is required
No graphics API overhead
Overview
![Page 71: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/71.jpg)
© 2008 NVIDIA Corporation.
Some Design Goals
Scale to 100’s of cores, 1000’s of parallel threads
Let programmers focus on parallel algorithms
Not on the mechanics of a parallel programming language
Enable heterogeneous systems (i.e. CPU + GPU)
CPU and GPU are separate devices with separate DRAMs
Overview
![Page 72: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/72.jpg)
• ATI/AMD are trying to make a competing product: Close To the Metal
• First release (02/07) bombed
• Second release in 12/07 re-written
• Open Source
• Not yet a viable solution

[Diagram: a Device contains several Multi-Processors, each with Local Memory and a set of SPs, all connected to Device Memory]

• The fundamental unit is the stream processor
  • A scalar, single precision floating point ALU
  • One Multiply/Add per clock cycle
  • Not IEEE-754 compliant
• Stream processors are grouped into multi-processors
• Multi-processors run in SIMD lockstep
• A number of multi-processors form a device

[Diagram: a Multi-Processor contains Stream Processors with Registers and Local Memory, plus Shared Memory, and reaches Global Memory, Constant Memory and Texture Memory]

Stream Processors have access to:

| Memory Type | Access | Sharing |
| --- | --- | --- |
| Registers | Read/Write | Private |
| Local Memory | Read/Write | Private |
| Shared Memory | Read/Write | Multi-Processor |
| Global Memory | Read/Write | Device |
| Constant Memory | Read | Device |
| Texture Memory | Read | Device |
Overview
![Page 73: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/73.jpg)
Overview
![Page 74: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/74.jpg)
© 2008 NVIDIA Corporation.
CUDA Installation
CUDA installation consists of:
Driver
CUDA Toolkit (compiler, libraries)
CUDA SDK (example codes)
Overview
![Page 75: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/75.jpg)
© 2008 NVIDIA Corporation.
CUDA Software Development
CUDA Optimized Libraries: math.h, FFT, BLAS, …
[Diagram: Integrated CPU + GPU C Source Code -> NVIDIA C Compiler -> NVIDIA Assembly for Computing (PTX) -> CUDA Driver and Profiler -> GPU; NVIDIA C Compiler -> CPU Host Code -> Standard C Compiler -> CPU]
Overview
![Page 76: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/76.jpg)
© 2008 NVIDIA Corporation.
Compiling CUDA Code
[Diagram: C/C++ CUDA Application -> NVCC -> CPU Code and PTX Code (virtual); PTX Code -> PTX to Target Compiler -> Target code (physical) for G80 … GPU]
Overview
![Page 77: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/77.jpg)
CUDA Basics
IAP09 CUDA@MIT / 6.963
![Page 78: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/78.jpg)
© 2008 NVIDIA Corporation.
CUDA Kernels and Threads
Parallel portions of an application are executed on the device as kernels
One kernel is executed at a time
Many threads execute each kernel
Differences between CUDA and CPU threads
CUDA threads are extremely lightweight
Very little creation overhead
Instant switching
CUDA uses 1000s of threads to achieve efficiency
Multi-core CPUs can use only a few
Definitions
Device = GPU
Host = CPU
Kernel = function that runs on the device
Basics
![Page 79: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/79.jpg)
© 2008 NVIDIA Corporation.
Arrays of Parallel Threads
A CUDA kernel is executed by an array of threads
All threads run the same code
Each thread has an ID that it uses to compute memory addresses and make control decisions
threadID: 0 1 2 3 4 5 6 7
…
float x = input[threadID];
float y = func(x);
output[threadID] = y;
…
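The slide's fragment can be turned into a complete program. The following is a minimal sketch, assuming a single block of 8 threads so that `threadIdx.x` alone serves as `threadID`; the body of `func` and the helper `run_apply` are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element function standing in for the slide's func().
__device__ float func(float x) { return 2.0f * x + 1.0f; }

// Every thread runs this same code; only threadID differs.
__global__ void apply(const float *input, float *output)
{
    int threadID = threadIdx.x;   // one block only, so the local index is enough
    float x = input[threadID];
    float y = func(x);
    output[threadID] = y;
}

// Host helper: runs apply() on the values 0..7 and returns output[i].
float run_apply(int i)
{
    const int n = 8;
    float h_in[n], h_out[n];
    for (int k = 0; k < n; ++k) h_in[k] = (float)k;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    apply<<<1, n>>>(d_in, d_out);   // 1 block of 8 threads, IDs 0..7

    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[i];
}

int main(void)
{
    printf("output[3] = %f\n", run_apply(3));
    return 0;
}
```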
Basics
![Page 80: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/80.jpg)
© 2008 NVIDIA Corporation.
Thread Cooperation
The Missing Piece: threads may need to cooperate
Thread cooperation is valuable
Share results to avoid redundant computation
Share memory accesses
Drastic bandwidth reduction
Thread cooperation is a powerful feature of CUDA
Cooperation between a monolithic array of threads is not scalable
Cooperation within smaller batches of threads is scalable
Basics
![Page 81: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/81.jpg)
© 2008 NVIDIA Corporation.
Thread Batching
Kernel launches a grid of thread blocks
Threads within a block cooperate via shared memory
Threads within a block can synchronize
Threads in different blocks cannot cooperate
Allows programs to transparently scale to different GPUs
[Diagram: a Grid of Thread Block 0, Thread Block 1, …, Thread Block N-1, each with its own Shared Memory]
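To make block-level cooperation concrete, here is a small sketch in which each block reverses its own slice of an array through shared memory. The block size of 4, the kernel name `reverse_in_block` and the helper `reversed_at` are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 4   // threads per block, kept small for illustration

// Each block reverses its own BLOCK-element slice using shared memory.
__global__ void reverse_in_block(int *data)
{
    __shared__ int tile[BLOCK];                 // one copy per thread block
    int local  = threadIdx.x;
    int global = blockIdx.x * blockDim.x + local;

    tile[local] = data[global];                 // cooperative load
    __syncthreads();                            // wait until the whole tile is loaded
    data[global] = tile[BLOCK - 1 - local];     // read a value another thread loaded
}

// Host helper: reverses slices of the array 0..7 and returns element i.
int reversed_at(int i)
{
    const int n = 2 * BLOCK;
    int h[n];
    for (int k = 0; k < n; ++k) h[k] = k;

    int *d = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);
    reverse_in_block<<<n / BLOCK, BLOCK>>>(d);  // grid of 2 blocks
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[i];
}

int main(void)
{
    for (int i = 0; i < 2 * BLOCK; ++i) printf("%d ", reversed_at(i));
    printf("\n");   // each slice of 4 comes back reversed
    return 0;
}
```

The two blocks never exchange data, so they can be scheduled in any order on any multiprocessor.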
Basics
![Page 82: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/82.jpg)
© 2008 NVIDIA Corporation.
Transparent Scalability
[Diagram: a kernel grid of Block 0 to Block 7 mapped onto two devices; the smaller device runs the blocks two at a time, the larger device runs them four at a time]
Hardware is free to schedule thread blocks on any processor
A kernel scales across parallel multiprocessors
Basics
![Page 83: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/83.jpg)
© 2008 NVIDIA Corporation.
8-Series Architecture (G80)
128 thread processors execute kernel threads
16 multiprocessors, each contains
8 thread processors
Shared memory enables thread cooperation
[Diagram: 16 Multiprocessors, each with its Thread Processors and its own Shared Memory]
Basics
![Page 84: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/84.jpg)
© 2008 NVIDIA Corporation.
10-Series Architecture
240 thread processors execute kernel threads
30 multiprocessors, each contains
8 thread processors
One double-precision unit
Shared memory enables thread cooperation
[Diagram: a Multiprocessor with Thread Processors, a Double unit and Shared Memory]
Basics
![Page 85: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/85.jpg)
© 2008 NVIDIA Corporation.
Kernel Memory Access
Per-thread
Registers: on-chip
Local Memory: off-chip, uncached
Per-block
Shared Memory: on-chip, small, fast
Per-device
Global Memory: off-chip, large, uncached, persistent across kernel launches, kernel I/O
[Diagram: Thread, Block, and Kernel 0, Kernel 1, ... running over time against the same Global Memory]
Basics
![Page 87: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/87.jpg)
© 2008 NVIDIA Corporation.
Execution Model
Software -> Hardware
Thread -> Thread Processor
Threads are executed by thread processors
Thread Block -> Multiprocessor
Thread blocks are executed on multiprocessors
Thread blocks do not migrate
Several concurrent thread blocks can reside on one multiprocessor - limited by multiprocessor resources (shared memory and register file)
Grid -> Device
A kernel is launched as a grid of thread blocks
Only one kernel can execute on a device at one time
Basics
![Page 88: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/88.jpg)
© 2008 NVIDIA Corporation.
Key Parallel Abstractions in CUDA
Trillions of lightweight threads
Simple decomposition model
Hierarchy of concurrent threads
Simple execution model
Lightweight synchronization primitives
Simple synchronization model
Shared memory model for thread cooperation
Simple communication model
Basics
![Page 89: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/89.jpg)
© 2008 NVIDIA Corporation.
Managing Memory
CPU and GPU have separate memory spaces
Host (CPU) code manages device (GPU) memory:
Allocate / free
Copy data to and from device
Applies to global device memory (DRAM)
[Diagram: Host (CPU, Chipset, DRAM) connected to Device (GPU with Multiprocessors, each holding Registers and Shared Memory; device DRAM holding Local Memory and Global Memory)]
Basics
![Page 90: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/90.jpg)
© 2008 NVIDIA Corporation.
GPU Memory Allocation / Release
cudaMalloc(void ** pointer, size_t nbytes)
cudaMemset(void * pointer, int value, size_t count)
cudaFree(void* pointer)
int n = 1024;
int nbytes = 1024*sizeof(int);
int *a_d = 0;
cudaMalloc( (void**)&a_d, nbytes );
cudaMemset( a_d, 0, nbytes);
cudaFree(a_d);
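As a check that the three calls behave as described, the sketch below allocates a device buffer, clears it with `cudaMemset`, copies it back and counts non-zero values. The helper name `count_nonzero_after_memset` is hypothetical, and the copy uses `cudaMemcpy`, which the next slide introduces.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Returns the number of non-zero ints found after cudaMemset,
// or -1 if any CUDA call fails.
int count_nonzero_after_memset(int n)
{
    size_t nbytes = n * sizeof(int);
    int *a_d = 0;
    if (cudaMalloc((void **)&a_d, nbytes) != cudaSuccess) return -1;

    // cudaMemset writes a byte value, so 0 clears every int.
    if (cudaMemset(a_d, 0, nbytes) != cudaSuccess) { cudaFree(a_d); return -1; }

    int *a_h = (int *)malloc(nbytes);
    for (int i = 0; i < n; ++i) a_h[i] = 123;   // garbage that the copy must overwrite
    cudaError_t err = cudaMemcpy(a_h, a_d, nbytes, cudaMemcpyDeviceToHost);

    int nonzero = 0;
    for (int i = 0; i < n; ++i) if (a_h[i] != 0) ++nonzero;

    free(a_h);
    cudaFree(a_d);
    return err == cudaSuccess ? nonzero : -1;
}

int main(void)
{
    printf("non-zero elements: %d\n", count_nonzero_after_memset(1024));
    return 0;
}
```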
Basics
![Page 91: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/91.jpg)
© 2008 NVIDIA Corporation.
Data Copies
cudaMemcpy(void *dst, void *src, size_t nbytes, enum cudaMemcpyKind direction);
direction specifies locations (host or device) of src and dst
Blocks CPU thread: returns after the copy is complete
Doesn’t start copying until previous CUDA calls complete
enum cudaMemcpyKind
cudaMemcpyHostToDevice
cudaMemcpyDeviceToHost
cudaMemcpyDeviceToDevice
Basics
![Page 92: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/92.jpg)
© 2008 NVIDIA Corporation.
Data Movement Example
#include <assert.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    float *a_h, *b_h; // host data
    float *a_d, *b_d; // device data
    int N = 14, nBytes, i;
    nBytes = N*sizeof(float);
    a_h = (float *)malloc(nBytes);
    b_h = (float *)malloc(nBytes);
    cudaMalloc((void **) &a_d, nBytes);
    cudaMalloc((void **) &b_d, nBytes);
    for (i=0; i<N; i++) a_h[i] = 100.f + i; // fill the host array
    cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);   // host a_h -> device a_d
    cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice); // device a_d -> device b_d
    cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);   // device b_d -> host b_h
    for (i=0; i<N; i++) assert( a_h[i] == b_h[i] ); // the round trip preserves the data
    free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);
    return 0;
}
Basics
![Page 101: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/101.jpg)
© 2008 NVIDIA Corporation.
Executing Code on the GPU
Kernels are C functions with some restrictions
Cannot access host memory
Must have void return type
No variable number of arguments (“varargs”)
Not recursive
No static variables
Function arguments automatically copied from host to device
Basics
![Page 102: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/102.jpg)
© 2008 NVIDIA Corporation.
Function Qualifiers
Kernels designated by function qualifier:
__global__
Function called from host and executed on device
Must return void
Other CUDA function qualifiers
__device__
Function called from device and run on device
Cannot be called from host code
__host__
Function called from host and executed on host (default)
__host__ and __device__ qualifiers can be combined to generate both CPU and GPU code
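The three qualifiers can be seen side by side in a short sketch. All function names here (`square`, `square_plus_one`, `fill`, `gpu_square_plus_one`) are hypothetical and exist only to illustrate where each qualifier allows a function to be called.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled for both CPU and GPU.
__host__ __device__ float square(float x) { return x * x; }

// Device-only helper: callable from kernels, not from host code.
__device__ float square_plus_one(float x) { return square(x) + 1.0f; }

// Kernel: called from the host, runs on the device, returns void.
__global__ void fill(float *out, float x) { out[0] = square_plus_one(x); }

// Plain host function (implicitly __host__) that drives the kernel.
float gpu_square_plus_one(float x)
{
    float *d = 0, h = 0.0f;
    cudaMalloc((void **)&d, sizeof(float));
    fill<<<1, 1>>>(d, x);
    cudaMemcpy(&h, d, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

int main(void)
{
    printf("host: %f  device: %f\n",
           square(3.0f) + 1.0f, gpu_square_plus_one(3.0f));
    return 0;
}
```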
Basics
![Page 103: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/103.jpg)
© 2008 NVIDIA Corporation.
Launching Kernels
Modified C function call syntax:
kernel<<<dim3 dG, dim3 dB>>>(…)
Execution Configuration (“<<< >>>”)
dG - dimension and size of grid in blocks
Two-dimensional: x and y
Blocks launched in the grid: dG.x * dG.y
dB - dimension and size of blocks in threads:
Three-dimensional: x, y, and z
Threads per block: dB.x * dB.y * dB.z
Unspecified dim3 fields initialize to 1
Basics
![Page 104: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/104.jpg)
© 2008 NVIDIA Corporation.
Execution Configuration Examples
kernel<<<32,512>>>(...);

dim3 grid, block;
grid.x = 2; grid.y = 4;
block.x = 8; block.y = 16;
kernel<<<grid, block>>>(...);

Equivalent assignment using constructor functions:
dim3 grid(2, 4), block(8,16);
kernel<<<grid, block>>>(...);
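One way to check how many threads a configuration launches is to have every thread increment a counter. This is a sketch under assumptions: the names `count_threads`, `threads_in_launch` and `count_on_device` are hypothetical, and `atomicAdd` on global memory needs hardware with atomic memory instructions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every launched thread adds one to the counter.
__global__ void count_threads(unsigned int *counter)
{
    atomicAdd(counter, 1u);
}

// Threads implied by an execution configuration: blocks * threads per block.
unsigned int threads_in_launch(dim3 grid, dim3 block)
{
    return grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

// Launches the kernel and returns the count observed on the device.
unsigned int count_on_device(dim3 grid, dim3 block)
{
    unsigned int *d = 0, h = 0;
    cudaMalloc((void **)&d, sizeof(unsigned int));
    cudaMemset(d, 0, sizeof(unsigned int));
    count_threads<<<grid, block>>>(d);
    cudaMemcpy(&h, d, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

int main(void)
{
    dim3 grid(2, 4), block(8, 16);   // unspecified z fields default to 1
    printf("expected %u, counted %u\n",
           threads_in_launch(grid, block), count_on_device(grid, block));
    return 0;
}
```

For `grid(2, 4)` and `block(8, 16)` the product is 2 * 4 * 8 * 16 = 1024 threads.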
Basics
![Page 105: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/105.jpg)
© 2008 NVIDIA Corporation.
CUDA Built-in Device Variables
All __global__ and __device__ functions have access to these automatically defined variables
dim3 gridDim;
Dimensions of the grid in blocks (at most 2D)
dim3 blockDim;
Dimensions of the block in threads
dim3 blockIdx;
Block index within the grid
dim3 threadIdx;
Thread index within the block
Basics
![Page 106: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/106.jpg)
© 2008 NVIDIA Corporation.
Unique Thread IDs
Built-in variables are used to determine unique thread IDs
Map from local thread ID (threadIdx) to a global ID which can be used as array indices
[Diagram: a Grid of three blocks with blockDim.x = 5]
blockIdx.x: 0 | 1 | 2
threadIdx.x: 0 1 2 3 4 | 0 1 2 3 4 | 0 1 2 3 4
blockIdx.x*blockDim.x + threadIdx.x: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
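The mapping in the diagram can be reproduced with the same configuration, 3 blocks of 5 threads. In this sketch each thread stores an encoding of its block and local index at its global position; the kernel name `record_ids`, the helper `id_at` and the `100 * block + thread` encoding are assumptions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread writes (block, thread) encoded as 100*block + thread
// into the slot given by its global ID.
__global__ void record_ids(int *out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;   // global ID
    out[idx] = 100 * blockIdx.x + threadIdx.x;
}

// Launches 3 blocks of 5 threads and returns the value stored at global index i.
int id_at(int i)
{
    const int blocks = 3, threads = 5, n = blocks * threads;
    int h[n];
    int *d = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    record_ids<<<blocks, threads>>>(d);
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[i];
}

int main(void)
{
    for (int i = 0; i < 15; ++i) {
        int v = id_at(i);
        printf("global %2d <- blockIdx.x %d, threadIdx.x %d\n", i, v / 100, v % 100);
    }
    return 0;
}
```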
Basics
![Page 107: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/107.jpg)
• Note that Local, Global, Constant and Texture memory all come from the same physical memory pool
• Just differ in access patterns, caching, etc.

• A CUDA device is a highly parallel processor
• We assume it can execute many hundreds of threads in parallel
• Threads >> Stream Processors > 1
• When writing CUDA software, think in terms of threads, not processors

• A kernel is executed as a grid
• A grid is a collection of thread blocks
• A thread block is a collection of threads
• Thread blocks and threads are given unique identifiers
• Identifiers can be 1D, 2D or 3D
• Used to help identify which part of a problem a thread/block should operate on

[Diagram: a Device runs a Grid of thread blocks indexed (0,0), (0,1), (1,0), (1,1), (2,0), (2,1), …; Thread Block (1,1) expands into threads indexed (0,0), (0,1), (1,0), (1,1), (2,0), (2,1), …]
slide by Matthew Bolitho
Basics
![Page 108: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/108.jpg)
Thread Blocks
• A thread block may have up to 512 threads
• All threads in a thread block are run on the same multi-processor
  • Thus can communicate via shared memory
  • And synchronize
• Threads of a block are multiplexed onto a multi-processor as warps

[Diagram: CPU connected to the Northbridge over the Front Side Bus; Northbridge connected to DRAM over the Memory Bus, to the Graphics Card / CUDA over the PCI-Express Bus, and to the Southbridge; Southbridge connected to SATA, Ethernet and the PCI Bus]

• PCI-E or PCIe
• Replaced AGP
• P2P, Full Duplex Serial, Symmetric Bus
  • 250MB/s bandwidth in each direction
• Commonly combined into multiple “lane” configurations. E.g.:
  • PCI-E 16x = 16 lanes
  • 16 times the bandwidth (8GB/s)

• The CUDA specification has been updated
  • Version 1.0: initial release, 01/07
  • Version 1.1: update with newer hardware, 08/07
  • Backwards compatible
• Expected updates in near future:
  • Version 1.2 / 2.0
  • 64-bit floating point support (i.e. double)
• Version 1.1 added some important useful features:
  Software
  • Asynchronous memory copies
  • Asynchronous GPU program launch
  Hardware
  • Atomic memory instructions
Basics
![Page 109: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/109.jpg)
© 2008 NVIDIA Corporation.
Minimal Kernels
__global__ void minimal( int* a_d, int value)
{
*a_d = value;
}
__global__ void assign( int* a_d, int value)
{
int idx = blockDim.x * blockIdx.x + threadIdx.x;
a_d[idx] = value;
}
Basics
![Page 110: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/110.jpg)
© 2008 NVIDIA Corporation.
Increment Array Example
CPU program
void inc_cpu(int *a, int N)
{
    int idx;
    for (idx = 0; idx<N; idx++)
        a[idx] = a[idx] + 1;
}
int main()
{
    ...
    inc_cpu(a, N);
}
CUDA program
__global__ void inc_gpu(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        a[idx] = a[idx] + 1;
}
int main()
{
    ...
    dim3 dimBlock (blocksize);
    dim3 dimGrid( ceil( N / (float)blocksize) );
    inc_gpu<<<dimGrid, dimBlock>>>(a, N);
}
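The CUDA fragment leaves out allocation and data transfer. A complete version might look like the sketch below, which assumes the array is initialised to its own indices so the result is easy to verify. The helpers `blocks_for` and `run_inc` are hypothetical, and the grid size uses integer ceiling division in place of the slide's `ceil(N / (float)blocksize)`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void inc_gpu(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)               // the last block may have spare threads
        a[idx] = a[idx] + 1;
}

// Integer ceiling division: number of blocks needed to cover N elements.
int blocks_for(int N, int blocksize) { return (N + blocksize - 1) / blocksize; }

// Increments N ints (initialised to their index) on the GPU and
// returns how many came back with the wrong value.
int run_inc(int N, int blocksize)
{
    size_t nbytes = N * sizeof(int);
    int *a_h = (int *)malloc(nbytes);
    for (int i = 0; i < N; ++i) a_h[i] = i;

    int *a_d = 0;
    cudaMalloc((void **)&a_d, nbytes);
    cudaMemcpy(a_d, a_h, nbytes, cudaMemcpyHostToDevice);

    inc_gpu<<<blocks_for(N, blocksize), blocksize>>>(a_d, N);

    cudaMemcpy(a_h, a_d, nbytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    int errors = 0;
    for (int i = 0; i < N; ++i) if (a_h[i] != i + 1) ++errors;
    cudaFree(a_d);
    free(a_h);
    return errors;
}

int main(void)
{
    printf("blocks for N=1000, blocksize=256: %d\n", blocks_for(1000, 256));
    printf("mismatches: %d\n", run_inc(1000, 256));
    return 0;
}
```

With N = 1000 and a block size of 256, four blocks are launched and the `idx < N` guard keeps the 24 surplus threads from writing past the array.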
Basics
![Page 111: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/111.jpg)
© 2008 NVIDIA Corporation.
Host Synchronization
All kernel launches are asynchronous
control returns to CPU immediately
kernel executes after all previous CUDA calls have completed
cudaMemcpy() is synchronous
control returns to CPU after copy completes
copy starts after all previous CUDA calls have completed
cudaThreadSynchronize()
blocks until all previous CUDA calls complete
Basics
![Page 112: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/112.jpg)
© 2008 NVIDIA Corporation.
Host Synchronization Example
// copy data from host to device
cudaMemcpy(a_d, a_h, numBytes, cudaMemcpyHostToDevice);
// execute the kernel
inc_gpu<<<ceil(N/(float)blocksize), blocksize>>>(a_d, N);
// run independent CPU code
run_cpu_stuff();
// copy data from device back to host
cudaMemcpy(a_h, a_d, numBytes, cudaMemcpyDeviceToHost);
Basics
![Page 113: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/113.jpg)
© NVIDIA Corporation 2006
Device Runtime Component: Synchronization Function
void __syncthreads();
Synchronizes all threads in a block
Once all threads have reached this point, execution resumes normally
Used to avoid RAW / WAR / WAW hazards when accessing shared memory
Allowed in conditional code only if the conditional is uniform across the entire thread block
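A per-block sum is a common case where these barriers matter. The sketch below assumes 8 threads per block and input values 1..16 split over two blocks; the names `block_sum` and `block_sum_at` are hypothetical. The barrier sits outside the `if`, so every thread of the block reaches it in every iteration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 8   // threads per block; a power of two for the tree reduction

__global__ void block_sum(const int *in, int *out)
{
    __shared__ int partial[THREADS];
    int t = threadIdx.x;
    partial[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                      // all loads finished before anyone reads

    // Tree reduction. The loop bounds are identical for every thread,
    // so the barrier inside the loop is reached uniformly.
    for (int stride = THREADS / 2; stride > 0; stride /= 2) {
        if (t < stride)
            partial[t] += partial[t + stride];
        __syncthreads();                  // finish this round before the next one
    }
    if (t == 0) out[blockIdx.x] = partial[0];
}

// Sums the values 1..16 in two blocks of 8 and returns the sum of block b.
int block_sum_at(int b)
{
    const int blocks = 2, n = blocks * THREADS;
    int h_in[n], h_out[blocks];
    for (int i = 0; i < n; ++i) h_in[i] = i + 1;

    int *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<blocks, THREADS>>>(d_in, d_out);
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out[b];
}

int main(void)
{
    printf("block 0: %d, block 1: %d\n", block_sum_at(0), block_sum_at(1));
    return 0;
}
```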
Basics
![Page 114: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/114.jpg)
© 2008 NVIDIA Corporation.
Variable Qualifiers (GPU code)
__device__
Stored in global memory (large, high latency, no cache)
Allocated with cudaMalloc (__device__ qualifier implied)
Accessible by all threads
Lifetime: application
__shared__
Stored in on-chip shared memory (very low latency)
Specified by execution configuration or at compile time
Accessible by all threads in the same thread block
Lifetime: thread block
Unqualified variables:
Scalars and built-in vector types are stored in registers
What doesn’t fit in registers spills to “local” memory
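The three storage classes can appear together in one kernel. In this sketch the `__device__` variable `scale` is set from the host with `cudaMemcpyToSymbol`, a `__shared__` variable holds a per-block offset, and the unqualified `t` is an ordinary per-thread scalar. The names `scale`, `scaled_ids` and `scaled_at` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int scale;   // global memory, visible to all threads, application lifetime

__global__ void scaled_ids(int *out)
{
    __shared__ int base;                 // one copy per thread block, on-chip
    int t = threadIdx.x;                 // unqualified scalar, normally a register

    if (t == 0) base = blockIdx.x * blockDim.x;
    __syncthreads();                     // make base visible to the whole block

    out[base + t] = scale * (base + t);
}

// Runs 2 blocks of 4 threads with the given scale and returns element i.
int scaled_at(int i, int h_scale)
{
    const int blocks = 2, threads = 4, n = blocks * threads;
    int h[n];
    int *d = 0;
    cudaMalloc((void **)&d, n * sizeof(int));
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(int));   // host value -> __device__ variable
    scaled_ids<<<blocks, threads>>>(d);
    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h[i];
}

int main(void)
{
    printf("element 5 with scale 3: %d\n", scaled_at(5, 3));
    return 0;
}
```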
Basics
![Page 115: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/115.jpg)
© 2008 NVIDIA Corporation.
CUDA Error Reporting to CPU
All CUDA calls return error code:
Except for kernel launches
cudaError_t type
cudaError_t cudaGetLastError(void)
Returns the code for the last error (no error has a code)
Can be used to get error from kernel execution
char* cudaGetErrorString(cudaError_t code)
Returns a null-terminated character string describing the error
printf("%s\n", cudaGetErrorString( cudaGetLastError() ) );
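Because a kernel launch returns nothing, its status has to be fetched afterwards. The sketch below launches an empty kernel once with a valid block size and once with a deliberately oversized one, then prints what `cudaGetLastError()` reports. The names `noop` and `try_launch` are hypothetical, and `cudaDeviceSynchronize()` is used as the current name for the synchronization call that these slides refer to as `cudaThreadSynchronize()`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop(void) {}

// Launches noop() with the given block size and returns the launch status.
cudaError_t try_launch(int threads_per_block)
{
    noop<<<1, threads_per_block>>>();
    return cudaGetLastError();   // fetching the error also resets it to cudaSuccess
}

int main(void)
{
    cudaError_t ok  = try_launch(32);
    cudaError_t bad = try_launch(1 << 20);   // far more threads than a block may hold
    printf("32 threads:   %s\n", cudaGetErrorString(ok));
    printf("2^20 threads: %s\n", cudaGetErrorString(bad));
    cudaDeviceSynchronize();                 // wait for the valid launch to finish
    return 0;
}
```

The oversized launch is rejected before any thread runs, so the only way the host learns about it is through this explicit check.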
Basics
![Page 116: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/116.jpg)
© NVIDIA Corporation 2006
Host Runtime Component: Device Management
Device enumeration
cudaGetDeviceCount(), cudaGetDeviceProperties()
Device selection
cudaChooseDevice(), cudaSetDevice()
> ~/NVIDIA_CUDA_SDK/bin/linux/release/deviceQuery
There is 1 device supporting CUDA
Device 0: "Quadro FX 5600"
Major revision number: 1
Minor revision number: 0
Total amount of global memory: 1609891840 bytes
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1350000 kilohertz
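A report similar to the deviceQuery output can be produced with the enumeration calls named above. This is a minimal sketch rather than the SDK sample itself: the helper `device_count` is hypothetical, and only a handful of `cudaDeviceProp` fields are printed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of CUDA devices, or 0 if the query itself fails.
int device_count(void)
{
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess) return 0;
    return n;
}

int main(void)
{
    int n = device_count();
    printf("There are %d device(s) supporting CUDA\n", n);

    for (int dev = 0; dev < n; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\"\n", dev, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory: %zu bytes\n", prop.totalGlobalMem);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Registers per block: %d\n", prop.regsPerBlock);
        printf("  Warp size: %d\n", prop.warpSize);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
    }

    if (n > 0) cudaSetDevice(0);   // select the first device for later calls
    return 0;
}
```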
![Page 117: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/117.jpg)
Host Runtime Component: Memory Management
Two kinds of memory:
Linear memory: accessed through 32-bit pointers
CUDA arrays: opaque layouts with dimensionality, readable only through texture objects
Memory allocation:
cudaMalloc(), cudaFree(), cudaMallocPitch(), cudaMallocArray(), cudaFreeArray()
Memory copy:
cudaMemcpy(), cudaMemcpy2D(), cudaMemcpyToArray(), cudaMemcpyFromArray(), etc.
cudaMemcpyToSymbol(), cudaMemcpyFromSymbol()
Memory addressing:
cudaGetSymbolAddress()
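As an illustration of the linear-memory and symbol calls listed above, the sketch below allocates a device buffer, writes a scalar into a __constant__ symbol, runs a kernel and copies the result back. The names c_scale, scale_kernel and scale_on_gpu() are hypothetical, and the helper assumes n > 0.

```cuda
#include <cuda_runtime.h>

__constant__ float c_scale;  // hypothetical constant-memory symbol

__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= c_scale;
}

// Multiplies h_data[0..n-1] by scale on the GPU, in place on the host array.
cudaError_t scale_on_gpu(float *h_data, int n, float scale)
{
    float *d_data = 0;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&d_data, bytes);  // linear memory
    if (err != cudaSuccess)
        return err;

    // Host -> constant symbol, then host -> linear device memory.
    err = cudaMemcpyToSymbol(c_scale, &scale, sizeof(float));
    if (err == cudaSuccess)
        err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover n
        scale_kernel<<<blocks, threads>>>(d_data, n);
        err = cudaGetLastError();
    }

    // Device -> host; this copy also waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err;
}
```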
![Page 118: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/118.jpg)
Thread Blocks
A thread block may have up to 512 threads
All threads in a thread block are run on the same multi-processor
Thus can communicate via shared memory
And synchronize
Threads of a block are multiplexed onto a multi-processor as warps
[Figure: PC architecture diagram. Labels: CPU, Northbridge, DRAM, Southbridge, SATA, Ethernet, Graphics Card / CUDA, Front Side Bus, Memory Bus, PCI Bus, PCI-Express Bus]
PCI-E or PCIe
Replaced AGP
P2P, Full Duplex Serial, Symmetric Bus
250MB/s bandwidth in each direction
Commonly combined into multiple "lane" configurations. E.g.:
PCI-E 16x = 16 lanes
16 times the bandwidth (16 x 250MB/s = 4GB/s in each direction)
The CUDA specification has been updated
Version 1.0: Initial release
Version 1.1: Update with newer hardware
Backwards compatible
Expected updates in near future:
Version 1.2 / 2.0
64-bit floating point support (i.e. double)
Version 1.1 added some important useful features:
Software
Asynchronous memory copies
Asynchronous GPU program launch
Hardware
Atomic memory instructions
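The atomic memory instructions listed under the version 1.1 hardware features let many threads update one memory location without losing updates. A minimal sketch, assuming a device of compute capability 1.1 or later; count_positive and count_positive_on_gpu() are hypothetical names, and the helper assumes n > 0.

```cuda
#include <cuda_runtime.h>

// Each thread that sees a positive element increments one shared counter.
__global__ void count_positive(const int *in, int n, int *counter)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && in[i] > 0)
        atomicAdd(counter, 1);  // read-modify-write performed atomically
}

// Returns the number of positive elements in h_in, or -1 on a CUDA error.
int count_positive_on_gpu(const int *h_in, int n)
{
    int *d_in = 0, *d_counter = 0;
    int result = -1;
    size_t bytes = (size_t)n * sizeof(int);

    if (cudaMalloc((void**)&d_in, bytes) != cudaSuccess)
        return -1;
    if (cudaMalloc((void**)&d_counter, sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    // The launch returns immediately; the copy below waits for the kernel.
    count_positive<<<blocks, threads>>>(d_in, n, d_counter);

    int h_counter = 0;
    if (cudaMemcpy(&h_counter, d_counter, sizeof(int),
                   cudaMemcpyDeviceToHost) == cudaSuccess)
        result = h_counter;

    cudaFree(d_in);
    cudaFree(d_counter);
    return result;
}
```

Without atomicAdd(), two threads could read the same old counter value and one increment would be lost.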
slide by Matthew Bolitho
![Page 119: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/119.jpg)
![Page 120: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/120.jpg)
COME
![Page 121: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/121.jpg)
Back Pocket Slides
slide by David Cox
![Page 122: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/122.jpg)
Code Walkthrough 2: Parallel Reduction
![Page 123: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/123.jpg)
Execution Decomposition
Two stages of computation:
Sum within each block
Sum partial results from the blocks
For reductions, code for all levels is the same
[Figure: reduction tree, input row at the top, one row per level]
3 1 7 0 4 1 6 3
4 7 5 9
11 14
25
Stage 1: many blocks
Stage 2: 1 block
![Page 124: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/124.jpg)
Kernel execution
Values (shared memory): 10 1 8 -1 0 -2 3 5 -2 -3 2 7 0 11 0 2
Step 1, Distance 8, threads 0 1 2 3 4 5 6 7
values: 8 -2 10 6 0 9 3 7 -2 -3 2 7 0 11 0 2
Step 2, Distance 4, threads 0 1 2 3
values: 8 7 13 13 0 9 3 7 -2 -3 2 7 0 11 0 2
Step 3, Distance 2, threads 0 1
values: 21 20 13 13 0 9 3 7 -2 -3 2 7 0 11 0 2
Step 4, Distance 1, thread 0
values: 41 20 13 13 0 9 3 7 -2 -3 2 7 0 11 0 2
![Page 125: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/125.jpg)
Kernel Source Code
__global__ void sum_kernel(int *g_input, int *g_output)
{
    extern __shared__ int s_data[];  // allocated during kernel launch

    // read input into shared memory
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    s_data[threadIdx.x] = g_input[idx];
    __syncthreads();

    // compute sum for the threadblock
    for (int dist = blockDim.x / 2; dist > 0; dist /= 2)
    {
        if (threadIdx.x < dist)
            s_data[threadIdx.x] += s_data[threadIdx.x + dist];
        __syncthreads();
    }

    // write the block's sum to global memory
    if (threadIdx.x == 0)
        g_output[blockIdx.x] = s_data[0];
}
![Page 126: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/126.jpg)
Host Source Code (1)
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // data set size in elements and bytes
    unsigned int n = 4096;
    unsigned int num_bytes = n * sizeof(int);

    // launch configuration parameters
    unsigned int block_dim = 256;
    unsigned int num_blocks = n / block_dim;
    unsigned int num_smem_bytes = block_dim * sizeof(int);

    // allocate and initialize the data on the CPU
    int *h_a = (int*)malloc(num_bytes);
    for (unsigned int i = 0; i < n; i++)
        h_a[i] = 1;

    // allocate memory on the GPU device
    int *d_a = 0, *d_output = 0;
    cudaMalloc((void**)&d_a, num_bytes);
    cudaMalloc((void**)&d_output, num_blocks * sizeof(int));
...
![Page 127: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)](https://reader030.fdocuments.net/reader030/viewer/2022020301/543d7ad28d7f72fa138b534b/html5/thumbnails/127.jpg)
Host Source Code (2)
...
    // copy the input data from CPU to the GPU device
    cudaMemcpy(d_a, h_a, num_bytes, cudaMemcpyHostToDevice);

    // two stages of kernel execution
    sum_kernel<<<num_blocks, block_dim, num_smem_bytes>>>(d_a, d_output);
    sum_kernel<<<1, num_blocks, num_blocks * sizeof(int)>>>(d_output, d_output);

    // copy the output from GPU device to CPU and print
    cudaMemcpy(h_a, d_output, sizeof(int), cudaMemcpyDeviceToHost);
    printf("%d\n", h_a[0]);

    // release resources
    cudaFree(d_a);
    cudaFree(d_output);
    free(h_a);
    return 0;
}
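The walkthrough relies on n being a multiple of the block size and on both launch sizes (256 and 16 threads) being powers of two. One possible way to lift the first restriction is sketched below. It is not the lecture's code: sum_kernel_guarded and sum_on_gpu() are hypothetical names, out-of-range threads load 0, and the host loop alternates between two buffers until a single value is left.

```cuda
#include <cuda_runtime.h>

// Same tree reduction as sum_kernel, but threads past the end contribute 0.
__global__ void sum_kernel_guarded(const int *g_input, int *g_output, unsigned int n)
{
    extern __shared__ int s_data[];
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    s_data[threadIdx.x] = (idx < n) ? g_input[idx] : 0;
    __syncthreads();

    for (unsigned int dist = blockDim.x / 2; dist > 0; dist /= 2) {
        if (threadIdx.x < dist)
            s_data[threadIdx.x] += s_data[threadIdx.x + dist];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        g_output[blockIdx.x] = s_data[0];
}

// Sums h_in[0..n-1] into *result. Returns 0 on success, -1 on a CUDA error.
int sum_on_gpu(const int *h_in, unsigned int n, int *result)
{
    const unsigned int block_dim = 256;  // must stay a power of two
    if (n == 0) { *result = 0; return 0; }

    unsigned int max_blocks = (n + block_dim - 1) / block_dim;
    int *d_a = 0, *d_b = 0;
    if (cudaMalloc((void**)&d_a, n * sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void**)&d_b, max_blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(d_a);
        return -1;
    }
    cudaMemcpy(d_a, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // Each pass shrinks the problem by a factor of block_dim. Separate source
    // and destination buffers avoid races between blocks of the same pass.
    int *src = d_a, *dst = d_b;
    unsigned int remaining = n;
    do {
        unsigned int blocks = (remaining + block_dim - 1) / block_dim;
        sum_kernel_guarded<<<blocks, block_dim, block_dim * sizeof(int)>>>(
            src, dst, remaining);
        remaining = blocks;
        int *tmp = src; src = dst; dst = tmp;  // output becomes next input
    } while (remaining > 1);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(result, src, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    return err == cudaSuccess ? 0 : -1;
}
```

For n = 4096 this performs the same two passes as the host code above (16 blocks, then 1 block). Larger inputs simply take more passes.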