CS 61C: Great Ideas in Computer Architecture
Lecture 19: Thread-Level Parallel Processing
Krste Asanović & Randy H. Katz
http://inst.eecs.berkeley.edu/~cs61c/fa17
11/2/17 Fall 2017 - Lecture #19
Agenda
• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion…
Improving Performance
1. Increase clock rate fs
− Reached practical maximum for today’s technology
− < 5 GHz for general-purpose computers
2. Lower CPI (cycles per instruction)
− SIMD, “instruction-level parallelism”
3. Perform multiple tasks simultaneously (today’s lecture)
− Multiple CPUs, each executing a different program
− Tasks may be related
§ E.g., each CPU performs part of a big matrix multiplication
− or unrelated
§ E.g., distribute different web HTTP requests over different computers
§ E.g., run pptx (view lecture slides) and browser (YouTube) simultaneously
4. Do all of the above:
− High fs, SIMD, multiple parallel tasks
New-School Machine Structures (It’s a bit more complicated!)
• Parallel Requests: assigned to computer, e.g., search “Katz”
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates @ one time
• Programming Languages
[Figure: software and hardware levels, from Warehouse Scale Computer and Smart Phone down to Computer (Cores, Memory (Cache), Input/Output), Core (Instruction Unit(s), Functional Unit(s) computing A0+B0, A1+B1, A2+B2, A3+B3, Cache Memory) and Logic Gates. Goal: Harness Parallelism & Achieve High Performance]
Projects 3 and 5!
Parallel Computer Architectures
• Several separate computers, some means for communication (e.g., Ethernet)
• Massive array of computers, fast communication between processors
• Multi-core CPU: more than one datapath in a single chip
− Share L3 cache, memory, peripherals
− Example: Hive machines
• GPU “graphics processing unit”
Example: CPU with Two Cores
[Figure: Processor “Core” 1 and Processor “Core” 2, each with its own Control and Datapath (PC, Registers, ALU). Processor 0 and Processor 1 memory accesses go to a shared Memory (Bytes) and Input/Output through I/O-Memory Interfaces.]
Multiprocessor Execution Model
• Each processor (core) executes its own instructions
• Separate resources (not shared)
− Datapath (PC, registers, ALU)
− Highest-level caches, i.e., those closest to the core (e.g., 1st and 2nd)
• Shared resources
− Memory (DRAM)
− Often 3rd-level cache
§ Often on same silicon chip
§ But not a requirement
• Nomenclature
− “Multiprocessor Microprocessor”
− Multicore processor
§ E.g., four-core CPU (central processing unit)
§ Executes four different instruction streams simultaneously
Transition to Multicore
[Figure: Sequential App Performance over time]
Pixel 2 vs. iPhone 8
[Table comparing ALUs, process (nm), clock (MHz) and GFLOPS; recoverable entry: 2.35 GHz + 1.9 GHz, 64-bit octa-core]
Multiprocessor Execution Model
• Shared memory
− Each “core” has access to the entire memory in the processor
− Special hardware keeps caches consistent (next lecture!)
− Advantages:
§ Simplifies communication in program via shared variables
− Drawbacks:
§ Does not scale well:
o “Slow” memory shared by many “customers” (cores)
o May become bottleneck (Amdahl’s Law)
• Two ways to use a multiprocessor:
− Job-level parallelism
§ Processors work on unrelated problems
§ No communication between programs
− Partition work of single task between several cores
§ E.g., each performs part of large matrix multiplication
Parallel Processing
• It’s difficult!
• It’s inevitable
− Only path to increase performance
− Only path to lower energy consumption (improve battery life)
• In mobile systems (e.g., smartphones, tablets)
− Multiple cores
− Dedicated processors, e.g.,
§ Motion processor, image processor, neural processor in iPhone 8 and iPhone X
§ GPU (graphics processing unit)
• Warehouse-scale computers (next week!)
− Multiple “nodes”
§ “Boxes” with several CPUs, disks per box
− MIMD (multi-core) and SIMD (e.g., AVX) in each node
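Potential Parallel Performance (assuming software can use it)

| Year | Cores | SIMD bits/Core | Core × SIMD bits | Total, e.g., FLOPs/Cycle |
|---|---|---|---|---|
| 2003 | 2 | 128 | 256 | 4 |
| 2005 | 4 | 128 | 512 | 8 |
| 2007 | 6 | 128 | 768 | 12 |
| 2009 | 8 | 128 | 1024 | 16 |
| 2011 | 10 | 256 | 2560 | 40 |
| 2013 | 12 | 256 | 3072 | 48 |
| 2015 | 14 | 512 | 7168 | 112 |
| 2017 | 16 | 512 | 8192 | 128 |
| 2019 | 18 | 1024 | 18432 | 288 |
| 2021 | 20 | 1024 | 20480 | 320 |

Growth over the 12 years from 2009 to 2021: MIMD 2.5X (+2 cores every 2 years), SIMD 8X (2X every 4 years), MIMD & SIMD 20X.
20X in 12 years: 20^(1/12) = 1.28X → 28% per year, or 2X every 3 years!
IF (!) we can use it.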
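The table’s last column equals Core × SIMD bits divided by 64, i.e., one 64-bit (double-precision) operation per SIMD lane per core per cycle. The C sketch below assumes exactly that relation and checks the growth arithmetic by compounding the rounded 1.28X rate; the function names are illustrative, not from the lecture.

```c
/* Peak FLOPs per cycle for `cores` cores, each with a SIMD unit `simd_bits`
   wide, assuming one 64-bit (double-precision) operation per SIMD lane per
   core per cycle. This reproduces the table's last column. */
int peak_flops_per_cycle(int cores, int simd_bits) {
    return cores * simd_bits / 64;
}

/* Total growth after `years` of compounding at `rate` per year.
   A plain loop is used instead of pow() so no math library is needed. */
double compound_growth(double rate, int years) {
    double total = 1.0;
    for (int y = 0; y < years; y++)
        total *= rate;
    return total;
}
```

For example, `peak_flops_per_cycle(20, 1024) / peak_flops_per_cycle(8, 128)` is 320 / 16 = 20, the 20X from 2009 to 2021. Compounding 1.28 over 12 years gives about 19.3 rather than exactly 20 because 1.28 is rounded; over 3 years it gives about 2.1, the “2X every 3 years”.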
Agenda
• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion…
Programs Running on my Computer
ps -x
PID TTY TIME CMD
220 ?? 0:04.34 /usr/libexec/UserEventAgent (Aqua)
222 ?? 0:10.60 /usr/sbin/distnoted agent
224 ?? 0:09.11 /usr/sbin/cfprefsd agent
229 ?? 0:04.71 /usr/sbin/usernoted
230 ?? 0:02.35 /usr/libexec/nsurlsessiond
232 ?? 0:28.68 /System/Library/PrivateFrameworks/CalendarAgent.framework/Executables/CalendarAgent
234 ?? 0:04.36 /System/Library/PrivateFrameworks/GameCenterFoundation.framework/Versions/A/gamed
235 ?? 0:01.90 /System/Library/CoreServices/cloudphotosd.app/Contents/MacOS/cloudphotosd
236 ?? 0:49.72 /usr/libexec/secinitd
239 ?? 0:01.66 /System/Library/PrivateFrameworks/TCC.framework/Resources/tccd
240 ?? 0:12.68 /System/Library/Frameworks/Accounts.framework/Versions/A/Support/accountsd
241 ?? 0:09.56 /usr/libexec/SafariCloudHistoryPushAgent
242 ?? 0:00.27 /System/Library/PrivateFrameworks/CallHistory.framework/Support/CallHistorySyncHelper
243 ?? 0:00.74 /System/Library/CoreServices/mapspushd
244 ?? 0:00.79 /usr/libexec/fmfd
246 ?? 0:00.09 /System/Library/PrivateFrameworks/AskPermission.framework/Versions/A/Resources/askpermissiond
248 ?? 0:01.03 /System/Library/PrivateFrameworks/CloudDocsDaemon.framework/Versions/A/Support/bird
249 ?? 0:02.50 /System/Library/PrivateFrameworks/IDS.framework/identityservicesd.app/Contents/MacOS/identityservicesd
250 ?? 0:04.81 /usr/libexec/secd
254 ?? 0:24.01 /System/Library/PrivateFrameworks/CloudKitDaemon.framework/Support/cloudd
258 ?? 0:04.73 /System/Library/PrivateFrameworks/TelephonyUtilities.framework/callservicesd
267 ?? 0:02.15 /System/Library/CoreServices/AirPlayUIAgent.app/Contents/MacOS/AirPlayUIAgent --launchd
271 ?? 0:03.91 /usr/libexec/nsurlstoraged
274 ?? 0:00.90 /System/Library/PrivateFrameworks/CommerceKit.framework/Versions/A/Resources/storeaccountd
282 ?? 0:00.09 /usr/sbin/pboard
283 ?? 0:00.90 /System/Library/PrivateFrameworks/InternetAccounts.framework/Versions/A/XPCServices/com.apple.internetaccounts.xpc/Contents/MacOS/com.apple.internetaccounts
285 ?? 0:04.72 /System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Support/fontd
291 ?? 0:00.25 /System/Library/Frameworks/Security.framework/Versions/A/Resources/CloudKeychainProxy.bundle/Contents/MacOS/CloudKeychainProxy
292 ?? 0:09.54 /System/Library/CoreServices/CoreServicesUIAgent.app/Contents/MacOS/CoreServicesUIAgent
293 ?? 0:00.29 /System/Library/PrivateFrameworks/CloudPhotoServices.framework/Versions/A/Frameworks/CloudPhotoServicesConfiguration.framework/Versions/A/XPCServices/com.apple.CloudPhotosConfiguration.xpc/Contents/MacOS/com.apple.CloudPhotosConfiguration
297 ?? 0:00.84 /System/Library/PrivateFrameworks/CloudServices.framework/Resources/com.apple.sbd
302 ?? 0:26.11 /System/Library/CoreServices/Dock.app/Contents/MacOS/Dock
303 ?? 0:09.55 /System/Library/CoreServices/SystemUIServer.app/Contents/MacOS/SystemUIServer
… 156 total at this moment
How does my laptop do this?
Imagine doing 156 assignments all at the same time!
Threads
• Sequential flow of instructions that performs some task
− Up to now we just called this a “program”
• Each thread has:
− Dedicated PC (program counter)
− Separate registers
− Accesses the shared memory
• Each physical core provides one (or more)
− Hardware threads that actively execute instructions
− Each executes one “hardware thread”
• Operating system multiplexes multiple software threads onto the available hardware threads
− All threads except those mapped to hardware threads are waiting
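In OpenMP (introduced later in this lecture), the split between per-thread state and shared memory shows up as private versus shared variables. A minimal sketch, assuming a compiler with OpenMP support (`-fopenmp`); the `#ifdef _OPENMP` guards let it also compile and run single-threaded without it. `record_thread_ids` and `NUM_THREADS` are illustrative names.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 4

/* Variables declared inside the parallel region get one copy per thread
   (on that thread's own stack or in its registers); variables declared
   outside it live in shared memory and every thread sees the same copy.
   Returns how many threads actually ran the region. */
int record_thread_ids(int ids[NUM_THREADS]) {
    int team_size = 1;                  /* shared: one copy in memory */

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int my_id = 0, n = 1;           /* private: one copy per thread */
#ifdef _OPENMP
        my_id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        ids[my_id] = my_id;             /* each thread writes only its own slot */
        if (my_id == 0)
            team_size = n;              /* a single writer avoids a race */
    }                                   /* implicit barrier: all writes are done */
    return team_size;
}
```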
Operating System Threads
Give illusion of many “simultaneously” active threads
1. Multiplex software threads onto hardware threads:
a) Switch out blocked threads (e.g., cache miss, user input, network access)
b) Timer (e.g., switch active thread every 1 ms)
2. Remove a software thread from a hardware thread by
a) Interrupting its execution
b) Saving its registers and PC to memory
3. Start executing a different software thread by
a) Loading its previously saved registers into a hardware thread’s registers
b) Jumping to its saved PC
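Steps 2 and 3 can be modeled as copying a small record of architectural state. The sketch below is a simplified model, not an actual OS routine: it assumes a thread’s state is just its PC plus 32 integer registers (as on RISC-V), and `ThreadContext` and `context_switch` are hypothetical names. Real kernels typically perform this step in a short piece of architecture-specific assembly.

```c
#include <stdint.h>

#define NUM_REGS 32   /* e.g., the 32 RISC-V integer registers x0..x31 */

/* Architectural state of one software thread: what the OS saves and restores. */
typedef struct {
    uint64_t pc;
    uint64_t regs[NUM_REGS];
} ThreadContext;

/* `hw` models the state currently loaded in a hardware thread.
   Save it to the outgoing thread's save area in memory, then load the
   incoming thread's saved state; "jumping to its saved PC" is modeled by
   making that PC the live one. */
void context_switch(ThreadContext *hw, ThreadContext *save_area,
                    const ThreadContext *next) {
    *save_area = *hw;   /* 2b) save registers and PC to memory */
    *hw = *next;        /* 3a) load saved registers; 3b) resume at saved PC */
}
```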
Example: Four Cores
Thread pool: List of threads competing for processor
OS maps threads to cores and schedules logical (software) threads
[Figure: Core 1, Core 2, Core 3, Core 4]
Each “Core” actively runs one instruction stream at a time
Multithreading
• Typical scenario:
− Active thread encounters cache miss
− Active thread waits ~1000 cycles for data from DRAM
− → switch out and run different thread until data available
• Problem
− Must save current thread state and load new thread state
§ PC, all registers (could be many, e.g., AVX)
− → must perform switch in ≪ 1000 cycles
• Can hardware help?
− Moore’s Law: transistors are plentiful
Hardware Assisted Software Multithreading
• Two copies of PC and Registers inside processor hardware
• Looks identical to two processors to software (hardware thread 0, hardware thread 1)
• Hyperthreading: both threads can be active simultaneously
[Figure: Processor (1 Core, 2 Threads): Control; Datapath with PC 0, Registers 0, PC 1, Registers 1 and a shared ALU; connected to Memory (Bytes) and Input/Output through I/O-Memory Interfaces]
Multithreading
• Logical threads
− ≈1% more hardware, ≈10% (?) better performance
§ Separate registers
§ Share datapath, ALU(s), caches
• Multicore
− => Duplicate processors
− ≈50% more hardware, ≈2X better performance?
• Modern machines do both
− Multiple cores with multiple threads per core
Randy’s Laptop
$ sysctl -a | grep hw
hw.physicalcpu: 2
hw.logicalcpu: 4
hw.l1icachesize: 32,768
hw.l1dcachesize: 32,768
hw.l2cachesize: 262,144
hw.l3cachesize: 4,194,304
• 2 Cores
• 4 Threads total
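A program can ask for the number of logical CPUs (hardware threads) itself. A minimal sketch using `sysconf(_SC_NPROCESSORS_ONLN)`; note that `_SC_NPROCESSORS_ONLN` is a widely supported extension (Linux, macOS) rather than part of base POSIX, and `logical_cpu_count` is an illustrative name.

```c
#include <unistd.h>

/* Number of logical CPUs (hardware threads) currently online,
   the counterpart of hw.logicalcpu in the sysctl output above. */
long logical_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* sysconf returns -1 on failure; assume 1 CPU */
}
```

On a machine like the one above one would expect 4 (2 cores × 2 hardware threads). There is no equally portable query for the physical-core count, which is one reason the slide uses macOS’s `sysctl`.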
Example: 6 Cores, 24 Logical Threads
Thread pool: List of threads competing for processor
OS maps threads to cores and schedules logical (software) threads
[Figure: Cores 1 to 6, each running hardware Threads 1 to 4]
4 logical (hardware) threads per core
Break!
Agenda
• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion…
Languages Supporting Parallel Programming
ActorScript, Ada, Afnix, Alef, Alice, APL, Axum, Chapel, Cilk, Clean, Clojure, Concurrent C, Concurrent Pascal, Concurrent ML, Concurrent Haskell, Curry, CUDA, E, Eiffel, Erlang, Fortran 90, Go, Io, Janus, JoCaml, Join, Java, Joule, Joyce, LabVIEW, Limbo, Linda, MultiLisp, Modula-3, Occam, occam-π, Orc, Oz, Pict, Reia, SALSA, Scala, SISAL, SR, Stackless Python, SuperPascal, VHDL, XC
Which one to pick?
Why So Many Parallel Programming Languages?
• Why “intrinsics”?
− To Intel: fix your #()&$! compiler!
• It’s happening... but
− SIMD features are continually added to compilers (Intel, gcc)
− Intense area of research
− Research progress:
§ 20+ years to translate C into good (fast!) assembly
§ How long to translate C into good (fast!) parallel code?
o General problem is very hard to solve
o Present state: specialized solutions for specific cases
o Your opportunity to become famous!
Parallel Programming Languages
• The number of choices indicates that there is no universal solution
§ Needs are very problem specific
− E.g.,
§ Scientific computing/machine learning (matrix multiply)
§ Webserver: handle many unrelated requests simultaneously
§ Input/output: it’s all happening simultaneously!
• Specialized languages for different tasks
− Some are easier to use (for some problems)
− None is particularly “easy” to use
• 61C
− Parallel language examples for high-performance computing
− OpenMP
Parallel Loops
• Serial execution:
for (int i=0; i<100; i++) {
  …
}
• Parallel execution:
for (int i=0; i<25; i++) {
  …
}
for (int i=25; i<50; i++) {
  …
}
for (int i=50; i<75; i++) {
  …
}
for (int i=75; i<100; i++) {
  …
}
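The four hand-written loops can be expressed as one loop over chunks. A minimal sketch, assuming the iterations are independent (each writes only its own element), which is what makes the split safe; the loop body `out[i] = 2 * i` is a hypothetical placeholder for the slide’s “…”, and `run_in_chunks` is an illustrative name. With `-fopenmp` the chunks run on different threads; without it the pragma is ignored and they run one after another.

```c
#define N 100        /* iterations, as on the slide */
#define NCHUNKS 4    /* one chunk per thread, as on the slide */

/* Chunk c covers iterations [c*N/NCHUNKS, (c+1)*N/NCHUNKS),
   i.e. 0..24, 25..49, 50..74 and 75..99. */
void run_in_chunks(int out[N]) {
    #pragma omp parallel for
    for (int c = 0; c < NCHUNKS; c++) {
        int lo = c * N / NCHUNKS;
        int hi = (c + 1) * N / NCHUNKS;
        for (int i = lo; i < hi; i++)
            out[i] = 2 * i;   /* placeholder for the loop body "…" */
    }
}
```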
Parallel for in OpenMP
#include <omp.h>

#pragma omp parallel for
for (int i=0; i<100; i++) {
  …
}
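A minimal runnable version of that loop, with a hypothetical body that records which OpenMP thread ran each iteration (`record_owners` is an illustrative name). The loop variable `i` is automatically private to each thread; `owner[]` is shared, but every iteration writes a different element, so there is no race. The `_OPENMP` guard keeps it compilable without `-fopenmp`.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define N 100

void record_owners(int owner[N]) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;   /* compiled without OpenMP: a single thread */
#endif
    }
}
```

Compile with `gcc -fopenmp`; the number of threads can be chosen at run time through the `OMP_NUM_THREADS` environment variable, e.g. `OMP_NUM_THREADS=4 ./a.out`.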
OpenMP Example
$ gcc-5 -fopenmp for.c;./a.out
thread 0, i = 0
thread 1, i = 3
thread 2, i = 6
thread 3, i = 8
thread 0, i = 1
thread 1, i = 4
thread 2, i = 7
thread 3, i = 9
thread 0, i = 2
thread 1, i = 5
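In this run, 10 iterations on 4 threads were split into contiguous chunks of 3, 3, 2 and 2 (thread 0: i = 0 to 2, thread 1: 3 to 5, thread 2: 6 and 7, thread 3: 8 and 9); the interleaved print order only reflects when each thread happened to run. The default schedule is implementation-defined, and for `schedule(static)` without a chunk size the OpenMP specification only asks for roughly equal contiguous chunks, so the formula below is a sketch that reproduces this particular split, not a guaranteed rule. `chunk_begin` is an illustrative name.

```c
/* First iteration of thread t when n iterations are divided among nthreads
   threads into contiguous chunks whose sizes differ by at most one: the
   first n % nthreads threads get one extra iteration.
   Thread t runs iterations [chunk_begin(t, ...), chunk_begin(t + 1, ...)). */
int chunk_begin(int t, int n, int nthreads) {
    int base  = n / nthreads;   /* every thread gets at least this many */
    int extra = n % nthreads;   /* leftover iterations handed to the first threads */
    return t * base + (t < extra ? t : extra);
}
```

For n = 10 and 4 threads this gives chunk starts 0, 3, 6, 8 (and 10 as the end), matching the output above; for n = 100 it gives the 0, 25, 50, 75 split of the Parallel Loops slide.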
OpenMP
• C extension: no new language to learn
• Multi-threaded, shared-memory parallelism
− Compiler Directives, #pragma
− Runtime Library Routines, #include <omp.h>
• #pragma
− Ignored by compilers unaware of OpenMP
− Same source for multiple architectures
§ E.g., same program for 1 & 16 cores
• Only works with shared memory
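Because unknown pragmas are ignored, the same file can build with or without OpenMP. Calls into the runtime library (`omp.h`) are the exception, since they need OpenMP at compile and link time. A common pattern, sketched below, is to guard them with the `_OPENMP` macro, which OpenMP-enabled compilers define (its value is the date of the supported specification). `max_threads` and `report_openmp` are illustrative helper names.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads a parallel region would use by default; 1 without OpenMP. */
int max_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Report which way this file was compiled. */
void report_openmp(void) {
#ifdef _OPENMP
    printf("OpenMP enabled (_OPENMP = %d), up to %d threads\n", _OPENMP, max_threads());
#else
    printf("OpenMP disabled: pragmas ignored, running on 1 thread\n");
#endif
}
```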
OpenMP Programming Model
• Fork-Join Model:
• OpenMP programs begin as single process (master thread)
− Sequential execution
• When parallel region is encountered
− Master thread “forks” into team of parallel threads
− Executed simultaneously
− At end of parallel region, parallel threads “join”, leaving only master thread
• Process repeats for each parallel region
− Amdahl’s Law?
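The sequential stretches between parallel regions bound the overall speedup, which is what Amdahl’s Law quantifies: Speedup = 1 / ((1 − F) + F / S), where F is the fraction of the original execution time that is parallelized and S is the speedup of that fraction (e.g., the number of threads). A direct C transcription; `amdahl_speedup` is an illustrative name.

```c
/* Amdahl's Law: overall speedup when a fraction f (0..1) of the original
   execution time is sped up by a factor s, while the remaining 1 - f
   stays sequential. */
double amdahl_speedup(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, if 90% of a program’s time is parallelizable, `amdahl_speedup(0.9, s)` stays below 10 no matter how large `s` gets.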
What Kind of Threads?
• OpenMP threads are operating system (software) threads
• OS will multiplex requested OpenMP threads onto available hardware threads
• Hopefully each gets a real hardware thread to run on, so no OS-level time-multiplexing
• But other tasks on machine compete for hardware threads!
• Be “careful” (?) when timing results for Project 3!
− 5 AM?
− Job queue?
Example 2: Computing π
http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
Sequential π
pi = 3.142425985001
• Resembles π, but not very accurate
• Let’s increase num_steps and parallelize
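The slide’s code did not survive extraction. The sketch below is the usual midpoint-rule approach from the hands-on tutorial referenced above (an assumption about the exact code): it approximates π = ∫₀¹ 4/(1+x²) dx by evaluating 4/(1+x²) at the midpoint of each of `num_steps` intervals of width `step`. With `num_steps = 10` it should reproduce the value shown; `pi_sequential` is an illustrative name.

```c
/* Midpoint-rule approximation of pi = integral from 0 to 1 of 4/(1+x^2) dx. */
double pi_sequential(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;   /* midpoint of the i-th interval */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```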
Parallelize (1) …
• Problem: each thread needs access to the shared variable sum
• Code runs sequentially …
Parallelize (2) …
[Figure: partial sums sum[0] and sum[1]]
1. Compute sum[0] and sum[1] in parallel
2. Compute sum = sum[0] + sum[1] sequentially
Parallel π
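This slide’s code did not survive extraction either. Following the two steps of Parallelize (2), here is a sketch under stated assumptions: a cyclic split of iterations (consistent with the trial run below, where thread 1 handles i = 1, 5, 9), a team of at most 4 threads, and one partial sum per thread in `sum[id]`, combined sequentially after the parallel region. `pi_parallel` and `NUM_THREADS` are illustrative names; without `-fopenmp` it runs on one thread.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 4

double pi_parallel(long num_steps) {
    double step = 1.0 / (double)num_steps;
    double sum[NUM_THREADS] = {0.0};   /* shared array, one partial sum per thread */
    int nthreads = 1;                  /* actual team size, may be < NUM_THREADS */

    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int id = 0, n = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        n = omp_get_num_threads();
#endif
        if (id == 0)
            nthreads = n;              /* single writer, read after the join */
        /* Step 1: thread id handles i = id, id + n, id + 2n, ... */
        for (long i = id; i < num_steps; i += n) {
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);   /* only this thread touches sum[id] */
        }
    }   /* implicit barrier: all partial sums are complete */

    /* Step 2: combine the partial sums sequentially. */
    double total = 0.0;
    for (int k = 0; k < nthreads; k++)
        total += sum[k];
    return step * total;
}
```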
Trial Run
i = 1, id = 1
i = 0, id = 0
i = 2, id = 2
i = 3, id = 3
i = 5, id = 1
i = 4, id = 0
i = 6, id = 2
i = 7, id = 3
i = 9, id = 1
i = 8, id = 0
pi = 3.142425985001
Scale up: num_steps = 10^6
pi = 3.141592653590
You verify how many digits are correct …
Can We Parallelize Computing sum?
[Figure: summation inside parallel section]
• Insignificant speedup in this example, but…
• pi = 3.138450662641
• Wrong! And value changes between runs?!
• What’s going on?
Always looking for ways to beat Amdahl’s Law …
Peer Instruction
What are the possible values of *(x1) after executing this code by two concurrent threads?
# *(x1) = 100
lw x2,0(x1)
addi x2,x2,1
sw x2,0(x1)
Answer: *(x1)
RED: 100 or 101
GREEN: 101
ORANGE: 101 or 102
YELLOW: 100 or 101 or 102
What’s Going On?
• Operation is really pi = pi + sum[id]
• What if more than one thread reads the current (same) value of pi, computes the sum, and stores the result back to pi?
• Each processor reads the same intermediate value of pi!
• Result depends on who gets there when
• A “race” → result is not deterministic
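One way to see the race concretely is to enumerate every interleaving of two threads that each run the peer-instruction sequence `lw` / `addi` / `sw` on a word that starts at 100. The sketch below is a simulation, not real threads: it assumes each instruction executes atomically and memory is sequentially consistent, and all names are illustrative. It explores all 20 orderings and finds the final values 101 and 102, but never 100.

```c
#include <stdbool.h>

/* Simulated machine: one shared memory word, plus a private register x2
   and a step counter (0 = lw, 1 = addi, 2 = sw, 3 = done) per thread. */
typedef struct {
    int mem;
    int x2[2];
    int step[2];
} State;

/* Execute the next instruction of thread t. */
static void execute(State *s, int t) {
    switch (s->step[t]) {
    case 0: s->x2[t] = s->mem;       break;   /* lw   x2,0(x1) */
    case 1: s->x2[t] = s->x2[t] + 1; break;   /* addi x2,x2,1  */
    case 2: s->mem   = s->x2[t];     break;   /* sw   x2,0(x1) */
    }
    s->step[t]++;
}

/* Depth-first search over all interleavings. Marks seen[v - 100] for each
   reachable final value v; returns the number of interleavings explored. */
static int explore(State s, bool seen[3]) {
    if (s.step[0] == 3 && s.step[1] == 3) {
        seen[s.mem - 100] = true;
        return 1;
    }
    int count = 0;
    for (int t = 0; t < 2; t++) {
        if (s.step[t] < 3) {
            State next = s;             /* copy, so each branch is independent */
            execute(&next, t);
            count += explore(next, seen);
        }
    }
    return count;
}

/* seen[0], seen[1], seen[2]: can the final value be 100, 101, 102? */
int possible_results(bool seen[3]) {
    State start = { .mem = 100, .x2 = {0, 0}, .step = {0, 0} };
    seen[0] = seen[1] = seen[2] = false;
    return explore(start, seen);
}
```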
Administrivia
• Homework 4 (Caches, Floating Point) due tomorrow at 11:59pm
• Project 2-2 due Monday
− Project office hours that Monday will be well staffed!
− Test your CPU thoroughly!
§ Write programs with Venus and load them into your circuit
• Project 3 will be released Monday night
− A two-week performance project
− Can earn extra credit from the performance contest (Project 5)
• Midterm scores will be released before Tuesday on Gradescope
Break!
Agenda
• MIMD - multiple programs simultaneously
• Threads
• Parallel programming: OpenMP
• Synchronization primitives
• Synchronization in OpenMP
• And, in Conclusion…
Synchronization
• Problem:
− Limit access to shared resource to 1 actor at a time
− E.g., only 1 person permitted to edit a file at a time
§ otherwise changes by several people get all mixed up
• Solution: take turns
− Only one person gets the microphone & talks at a time
− Also good practice for classrooms, btw…
Locks
• Computers use locks to control access to shared resources
  − Serves the purpose of the microphone in the example
  − Also referred to as a "semaphore" (strictly, a lock behaves like a binary semaphore)
• Usually implemented with a variable
  − int lock;
    § 0 for unlocked
    § 1 for locked
Synchronization with Locks
```c
// wait for lock released
while (lock != 0) ;
// lock == 0 now (unlocked)

// set lock
lock = 1;

// access shared resource ...
// e.g. pi
// sequential execution! (Amdahl ...)

// release lock
lock = 0;
```
Lock Synchronization
Thread 1
```c
while (lock != 0) ;
lock = 1;
// critical section
lock = 0;
```
Thread 2
```c
while (lock != 0) ;
lock = 1;
// critical section
lock = 0;
```
• Thread 2 finds the lock not set, before thread 1 sets it
• Both threads believe they got and set the lock!
Try as you like, this test-then-set approach cannot be fixed, not even at the assembly level: the test (a read) and the set (a write) are separate instructions, so another thread can slip in between them.
Unless we introduce new instructions, that is!
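To make the failure concrete, the hypothetical function below replays the bad interleaving one step at a time inside a single thread. This is a simulation, not real concurrency: both "threads" read the lock as 0 before either one writes 1, so both conclude they own the lock.

```c
#include <stdbool.h>

/* Hypothetical single-threaded replay of the interleaving above.
   Each statement stands for one step taken by thread 1 (T1) or thread 2 (T2). */
int owners_after_bad_interleaving(void) {
    int lock = 0;                     /* shared lock word, initially free     */
    bool t1_sees_free = (lock == 0);  /* T1: while (lock != 0) ; exits        */
    bool t2_sees_free = (lock == 0);  /* T2: exits too; T1 has not written yet */
    if (t1_sees_free) lock = 1;       /* T1: lock = 1                         */
    if (t2_sees_free) lock = 1;       /* T2: lock = 1 (already 1, no effect)  */
    /* Return how many threads believe they hold the lock. */
    return (int)t1_sees_free + (int)t2_sees_free;
}
```

The function returns 2: both threads enter the critical section at the same time.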
Hardware Synchronization
• Solution:
  − Atomic read/write
  − Read & write in a single instruction
    § No other access permitted between read and write
  − Note:
    § Must use shared memory (multiprocessing)
• Common implementations:
  − Atomic swap of register ↔ memory
  − Pair of instructions for "linked" read and write
    § write fails if memory location has been "tampered" with after linked read
• RISC-V has variations of both, but for simplicity we will focus on the former
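In C11, both styles of hardware support can be reached through `<stdatomic.h>`. `atomic_exchange` is an atomic swap. `atomic_compare_exchange` writes only if the location still holds an expected value. The sketch below is a minimal illustration with hypothetical function names. The exact instructions depend on the compiler, but on RISC-V a compare-exchange is typically built from a linked read/write pair (`lr.w`/`sc.w`).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helpers. Lock word: 0 = free, 1 = held. */

/* Style 1: atomic swap. Write 1 and get the old value in one indivisible step. */
bool try_lock_swap(atomic_int *lock) {
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Style 2: compare-and-swap. Writes 1 only if the word still holds 0.
   (On RISC-V this is typically compiled to an lr.w/sc.w retry loop.) */
bool try_lock_cas(atomic_int *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Release: store 0 with release ordering. */
void release_lock(atomic_int *lock) {
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Both `try_lock_*` functions return true only for the caller that actually changed the word from 0 to 1. A second attempt on a held lock therefore fails no matter which style it uses.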
RISC-V Atomic Memory Operations (AMOs)
• AMOs atomically perform an operation on an operand in memory and set the destination register to the original memory value
• R-Type instruction format: Add, And, Or, Swap, Xor, Max, Max Unsigned, Min, Min Unsigned
  − Load from address in rs1 to "t"
  − rd = "t", i.e., the value in memory
  − Store at address in rs1 the calculation "t" <operation> rs2
  − aq and rl ensure in-order execution (acquire/release ordering)
amoadd.w rd, rs2, (rs1):
  t = M[x[rs1]]; x[rd] = t; M[x[rs1]] = t + x[rs2]
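One way to internalize AMO semantics is to model them in C11. Each helper below returns the original memory value (what lands in rd) and updates memory atomically. The `model_*` names are hypothetical. C11 has no atomic fetch-max, so `amomax.w` is emulated here with a compare-exchange retry loop rather than a single instruction.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical C models of three RISC-V AMOs on a 32-bit word. */

int32_t model_amoadd_w(_Atomic int32_t *addr, int32_t rs2) {
    return atomic_fetch_add(addr, rs2);      /* rd = t; M = t + rs2 */
}

int32_t model_amoswap_w(_Atomic int32_t *addr, int32_t rs2) {
    return atomic_exchange(addr, rs2);       /* rd = t; M = rs2 */
}

int32_t model_amomax_w(_Atomic int32_t *addr, int32_t rs2) {
    int32_t t = atomic_load(addr);
    /* If memory changed since the load, the CAS fails, reloads t, and we retry. */
    while (!atomic_compare_exchange_weak(addr, &t, t > rs2 ? t : rs2)) {
    }
    return t;                                /* rd = t; M = max(t, rs2) */
}
```

For example, starting from a word holding 5, `model_amoadd_w(&m, 3)` returns 5 and leaves 8 in memory.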
RISC-V Critical Section
• Assume that the lock is at the memory address held in register a0
• The lock is "set" if it is 1; it is "free" if it is 0 (its initial value)
```asm
        li   t0, 1                  # Get 1 to set lock
Try:    amoswap.w.aq t1, t0, (a0)   # t1 gets old lock value
                                    # while we set it to 1
        bnez t1, Try                # if it was already 1, another
                                    # thread has the lock,
                                    # so we need to try again
        # … critical section goes here …
        amoswap.w.rl x0, x0, (a0)   # store 0 in lock to release
```
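The same spin lock can be sketched in C11 with POSIX threads. This assumes a platform that provides `<stdatomic.h>` and `<pthread.h>`. An `atomic_exchange_explicit` with `memory_order_acquire` plays the role of `amoswap.w.aq`. A release store of 0 plays the role of `amoswap.w.rl x0, x0, (a0)`. All names here (`lock_word`, `spin_acquire`, `count_with_lock`, ...) are illustrative.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names throughout. 0 = free, 1 = held (as on the slide). */
static atomic_int lock_word = 0;
static long shared_counter = 0;      /* protected by lock_word */

static void spin_acquire(void) {
    /* Try: swap in 1; an old value of 1 means another thread holds the lock. */
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_acquire) != 0) {
        /* spin and try again */
    }
}

static void spin_release(void) {
    /* Release ordering: critical-section accesses cannot move after this store. */
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *worker(void *arg) {
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        spin_acquire();
        shared_counter++;            /* critical section */
        spin_release();
    }
    return NULL;
}

#define MAX_THREADS 16

/* Runs up to MAX_THREADS workers and returns the final counter value. */
long count_with_lock(int nthreads, long iters) {
    pthread_t tids[MAX_THREADS];
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_counter = 0;
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tids[t], NULL, worker, &iters);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);
    return shared_counter;
}
```

With the lock in place, `count_with_lock(4, 20000)` should return exactly 80000. Without the `spin_acquire`/`spin_release` calls, the unsynchronized increments would typically lose updates.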
Lock Synchronization
Broken Synchronization
```c
while (lock != 0) ;
lock = 1;
// critical section
lock = 0;
```
Fix (lock is at location (a0))
```asm
        li   t0, 1
Try:    amoswap.w.aq t1, t0, (a0)
        bnez t1, Try
Locked:
        # critical section
Unlock:
        amoswap.w.rl x0, x0, (a0)
```
OpenMP Locks
Synchronization in OpenMP
• Synchronization primitives are typically used in libraries of higher-level parallel programming constructs
• E.g. OpenMP offers #pragmas for common cases:
  − critical
  − atomic
  − barrier
  − ordered
• OpenMP offers many more features
  − See online documentation
  − Or tutorial at
    § http://openmp.org/mp-documents/omp-hands-on-SC08.pdf
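As a hedged illustration of the `atomic` pragma, the hypothetical histogram function below lets many threads increment shared bins. When compiled with `-fopenmp` (GCC/Clang), the loop runs in parallel and `#pragma omp atomic` makes each `bins[b]++` indivisible. When compiled without it, the pragmas are ignored and the loop runs serially with the same result.

```c
/* Hypothetical example: count how many values fall into each bin. */
void histogram(const int *data, int n, int *bins, int nbins) {
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = data[i] % nbins;     /* assumes data[i] >= 0 */
        #pragma omp atomic
        bins[b]++;                   /* several threads may hit the same bin */
    }
}
```

Here `atomic` fits better than a full critical section because the protected statement is a single update of one memory location.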
OpenMP Critical Section
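Here is a minimal sketch of a critical section in OpenMP. It assumes the pi example mentioned earlier is computed by midpoint integration of 4/(1+x²) over [0,1]; the function name and method are illustrative. Each thread accumulates a private partial sum, and `#pragma omp critical` lets only one thread at a time add that partial into the shared `sum`. Without `-fopenmp` the pragmas are ignored and the code runs serially.

```c
/* Hypothetical sketch: pi = integral of 4/(1+x^2) from 0 to 1 (midpoint rule). */
double pi_critical(long num_steps) {
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;                              /* shared by all threads */

    #pragma omp parallel
    {
        double partial = 0.0;                      /* private: declared inside the region */
        #pragma omp for
        for (long i = 0; i < num_steps; i++) {
            double x = ((double)i + 0.5) * step;   /* midpoint of slice i */
            partial += 4.0 / (1.0 + x * x);
        }
        #pragma omp critical
        sum += partial;                            /* one thread at a time */
    }
    return sum * step;
}
```

The critical section runs once per thread rather than once per iteration, so the serialization it adds (Amdahl) stays small. OpenMP's `reduction(+:sum)` clause expresses the same pattern more concisely.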
The Trouble with Locks…
• …is deadlocks
• Consider 2 cooks sharing a kitchen
  − Each cooks a meal that requires salt and pepper (locks)
  − Cook 1 grabs salt
  − Cook 2 grabs pepper
  − Cook 1 notices s/he needs pepper
    § it's not there, so s/he waits
  − Cook 2 realizes s/he needs salt
    § it's not there, so s/he waits
  − Neither can proceed: each waits forever for the lock the other holds
• A not-so-common cause of cook starvation
  − But deadlocks are possible in parallel programs
  − Very difficult to debug
    § malloc/free is easy by comparison…
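One standard way to avoid the cooks' deadlock, which goes beyond what the slide covers, is to make every thread acquire locks in the same global order. The sketch below uses POSIX threads. The hypothetical `salt` and `pepper` mutexes stand in for the two locks, and both cooks always take salt before pepper, so neither can hold one lock while waiting on the other.

```c
#include <pthread.h>

/* Hypothetical kitchen: two shared resources, each protected by a mutex. */
static pthread_mutex_t salt = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pepper = PTHREAD_MUTEX_INITIALIZER;
static long meals_cooked = 0;          /* updated only while holding both locks */

/* Every cook grabs salt first, then pepper. If one cook grabbed pepper first
   instead, the two could each hold one lock and wait forever for the other. */
static void *cook(void *arg) {
    long meals = *(const long *)arg;
    for (long i = 0; i < meals; i++) {
        pthread_mutex_lock(&salt);     /* lock #1 in the global order */
        pthread_mutex_lock(&pepper);   /* lock #2 in the global order */
        meals_cooked++;                /* critical section: uses both */
        pthread_mutex_unlock(&pepper);
        pthread_mutex_unlock(&salt);
    }
    return NULL;
}

/* Runs two cooks concurrently and returns the total number of meals cooked. */
long cook_meals(long meals_per_cook) {
    pthread_t cook1, cook2;
    meals_cooked = 0;
    pthread_create(&cook1, NULL, cook, &meals_per_cook);
    pthread_create(&cook2, NULL, cook, &meals_per_cook);
    pthread_join(cook1, NULL);
    pthread_join(cook2, NULL);
    return meals_cooked;
}
```

Because the acquisition order is fixed, `cook_meals(n)` always finishes and returns 2n.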
And, in Conclusion, …
• Sequential software execution speed is limited
• Parallel processing is the only path to higher performance
  − SIMD: data-level parallelism
    § Implemented in all high-performance CPUs today (x86, ARM, …)
    § Partially supported by compilers
  − MIMD: thread-level parallelism
    § Multicore processors
    § Supported by Operating Systems (OS)
    § Requires programmer intervention to exploit at the single-program level
      o E.g. OpenMP
  − SIMD & MIMD for maximum performance
• Synchronization
  − Requires hardware support: specialized assembly instructions
  − Typically use higher-level support
  − Beware of deadlocks