CS 152 Computer Architecture and Engineering Lecture 15 ...cs152/sp16/lectures/L15...3/30/2016...

3/30/2016 CS152,Spring2016

CS152ComputerArchitectureandEngineering

Lecture15:VectorComputers

Dr. George Michelogiannakis

EECS, University of California at Berkeley CRD, Lawrence Berkeley National Laboratory

http://inst.eecs.berkeley.edu/~cs152!

3/30/2016 CS152,Spring2016

LastTimeLecture14:Mul?threading

Time (

le) Superscalar Fine-Grained Coarse-Grained Multiprocessing

Simultaneous Multithreading

Thread 1 Thread 2

Thread 3 Thread 4

Thread 5 Idle slot

3/30/2016 CS152,Spring2016

Ques?onoftheDay

§ CanVectorandVLIWcombine?

3/30/2016 CS152,Spring2016

Supercomputers

§ Defini@onofasupercomputer:§  Fastestmachineinworldatgiventask

–  Performsatornearthecurrentlyhighestopera@onalrateforcomputers

§ Adevicetoturnacompute-boundproblemintoanI/Oboundproblem

§ Anymachinecos@ng$30M+§ AnymachinedesignedbySeymourCray

§ CDC6600(Cray,1964)regardedasfirstsupercomputer

3/30/2016 CS152,Spring2016

CDC6600SeymourCray,1963

§ Afastpipelinedmachinewith60-bitwords–  128Kwordmainmemorycapacity,32banks

§ Tenfunc@onalunits(parallel,unpipelined)–  Floa@ngPoint:adder,2mul@pliers,divider–  Integer:adder,2incrementers,...

§ Hardwiredcontrol(nomicrocoding)§ Scoreboardfordynamicschedulingofinstruc@ons§ TenPeripheralProcessorsforInput/Output

–  afastmul@-threaded12-bitintegerALU§ Veryfastclock,10MHz(FPaddin4clocks)§ >400,000transistors,750sq.b.,5tons,150kW,novelfreon-basedtechnologyforcooling

§ Fastestmachineinworldfor5years(un@l7600)–  over100sold($7-10Meach)

53/10/2009

3/30/2016 CS152,Spring2016

IBMMemoonCDC6600ThomasWatsonJr.,IBMCEO,August1963:

“Lastweek,ControlData...announcedthe6600system.Iunderstandthatinthelaboratorydevelopingthesystemthereareonly34peopleincludingthejanitor.Ofthese,14areengineersand4areprogrammers...ContrasGngthismodesteffortwithourvastdevelopmentacGviGes,IfailtounderstandwhywehavelostourindustryleadershipposiGonbyleIngsomeoneelseoffertheworld'smostpowerfulcomputer.”

TowhichCrayreplied:“ItseemslikeMr.WatsonhasansweredhisownquesGon.”

3/30/2016 CS152,Spring2016

Top500Systems

LINPACK & LAPACK: Software libraries for performing linear algebra

3/30/2016 CS152,Spring2016

OakRidgeTitan

§  560,640cores§  LinkPackperformance17,590TFlop/s§  Theore@calpeak27,112.5TFlop/s§  8,209.00kW§  710,144GB§ Opteron627416C2.2GHz

3/30/2016 CS152,Spring2016

NERSC(LBNL)Cori

§ CrayXC40supercomputer§  Theore@calPeakperformance1.92Petaflops/sec§  1,630computesnodes,52,160coresintotal§ CrayArieshigh-speedinterconnectwithDragonflytopologyasonEdison(0.25μsto3.7μsMPIlatency,~8GB/secMPIbandwidth)

§ Aggregatememory:203TB§  Scratchstoragecapacity:30PB

3/30/2016 CS152,Spring2016

CDC6600:ALoad/StoreArchitecture

•  Separate instructions to manipulate three types of reg. 8 60-bit data registers (X)

8 18-bit address registers (A) 8 18-bit index registers (B)

• All arithmetic and logic instructions are reg-to-reg

• Only Load and Store instructions refer to memory!

Touching address registers 1 to 5 initiates a load 6 to 7 initiates a store

- very useful for vector operations

opcode i j k Ri ← (Rj) op (Rk)

opcode i j disp Ri ← M[(Rj) + disp]

6 3 3 3

6 3 3 18

3/30/2016 CS152,Spring2016

CDC6600:Datapath

AddressRegsIndexRegs8x18-bit8x18-bit

OperandRegs8x60-bit

Inst.Stack8x60-bit

10Func@onalUnits

CentralMemory

128Kwords,32banks,1µscycle

resultaddr

result

operand

operandaddr

3/30/2016 CS152,Spring2016

CDC6600ISAdesignedtosimplifyhigh-performanceimplementa?on

§ Useofthree-address,register-registerALUinstruc@onssimplifiespipelinedimplementa@on–  Noimplicitdependenciesbetweeninputsandoutputs

§ Decouplingseongofaddressregister(Ar)fromretrievingvaluefromdataregister(Xr)simplifiesprovidingmul@pleoutstandingmemoryaccesses–  Sobwarecanscheduleloadofaddressregisterbeforeuseofvalue–  Caninterleaveindependentinstruc@onsinbetween

§ CDC6600hasmul@pleparallelbutunpipelinedfunc@onalunits–  E.g.,2separatemul@pliers

§  Follow-onmachineCDC7600usedpipelinedfunc@onalunits–  ForeshadowslaterRISCdesigns

3/30/2016 CS152,Spring2016

CDC6600:VectorAddi?on

B0<--nloop: JZEB0,exit

A0<-B0+a0 loadX0A1<-B0+b0 loadX1X6<-X0+X1A6<-B0+c0 storeX6B0<-B0+1jumploop

Ai=addressregisterBi=indexregisterXi=dataregister

3/30/2016 CS152,Spring2016

SupercomputerApplica?ons

§  Typicalapplica@onareas–  Militaryresearch(nuclearweapons,cryptography)–  Scien@ficresearch–  Weatherforecas@ng–  Oilexplora@on–  Industrialdesign(carcrashsimula@on)–  Bioinforma@cs–  Cryptography

§ Allinvolvehugecomputa@onsonlargedatasets

§  In70s-80s,Supercomputer≡VectorMachine

3/30/2016 CS152,Spring2016

VLIWvsVector

§ VLIWtakesadvantageofinstruc@onlevelparallelism(ILP)byspecifyinginstruc@onstoexecuteinparallel

§ Vectorarchitecturesperformthesameopera@ononmul@pledataelements–  Data-levelparallelism

+ + + + + +

[0] [1] [VLR-1]

Vector Arithmetic Instructions

ADDV v3, v1, v2 v3

IntOp2 MemOp1 MemOp2 FPOp1 FPOp2IntOp1

3/30/2016 CS152,Spring2016

VectorProgrammingModel

+ + + + + +

[0] [1] [VLR-1]

Vector Arithmetic Instructions

ADDV v3, v1, v2 v3

Scalar Registers

r15 Vector Registers

[0] [1] [2] [VLRMAX-1]

VLR Vector Length Register

v1 Vector Load and Store Instructions

LV v1, r1, r2

Base, r1 Stride, r2 Memory

Vector Register

3/30/2016 CS152,Spring2016

ControlInforma?on

§ VLRlimitsthehighestvectorelementtobeprocessedbyavectorinstruc@on–  VLRisloadedpriortoexecu@ngthevectorinstruc@onwithaspecialinstruc@on

§  Strideforload/stores:–  Vectorsmaynotbeadjacentinmemoryaddresses–  E.g.,differentdimensionsofamatrix–  Stridecanbespecifiedaspartoftheload/store

3/30/2016 CS152,Spring2016

VectorCodeExample

# Scalar Code LI R4, 64 loop: L.D F0, 0(R1) L.D F2, 0(R2) ADD.D F4, F2, F0 S.D F4, 0(R3) DADDIU R1, 8 DADDIU R2, 8 DADDIU R3, 8 DSUBIU R4, 1 BNEZ R4, loop

# Vector Code LI VLR, 64 LV V1, R1 LV V2, R2 ADDV.D V3, V1, V2 SV V3, R3

# C code for (i=0; i<64; i++) C[i] = A[i] + B[i];

3/30/2016 CS152,Spring2016

Flynn’sTaxonomy

§ Singleinstruc@on,singledata(SISD)–  E.g.,ourin-orderprocessor

§ Singleinstruc@on,mul@pledata(SIMD)–  Mul@pleprocessingelements,sameopera@on,differentdata–  Vector–  Mul@pleprocessingunitsexecutethesameinstruc@onondifferentdatainalockstep.Eitherallcompleteornonedo.Therefore,allunitshavetoexecutethesameinstruc@onatagiven@me

§ Mul@pleinstruc@on,mul@pledata(MIMD)–  Mul@pleautonomousprocessorsexecu@ngdifferentinstruc@onsondifferentdata

–  Mostcommonandgeneralparallelmachine

§ Mul@pleinstruc@on,singledata(MISD)– Whywouldanyonedothis?

3/30/2016 CS152,Spring2016

MoreCategories

§ Singleprogram,mul@pledata(SPMD)–  Mul@pleautonomousprocessorsexecutetheprogramatindependentpoints

–  DifferencewithSIMD:SIMDimposesalockstep–  ProgramsatSPMDcanbeatindependentpoints–  SPMDcanrunongeneralpurposeprocessors–  Mostcommonmethodforparallelcompu@ng

§ Mul@pleprogram,mul@pledata(MPMD)–  Mul@pleautonomousprocessorssimultaneouslyopera@ngatleast2independentprograms

3/30/2016 CS152,Spring2016

VectorSupercomputers

§ Epitomy:Cray-1,1976§ ScalarUnit

–  Load/StoreArchitecture

§ VectorExtension–  VectorRegisters–  VectorInstruc@ons

§ Implementa@on–  HardwiredControl–  HighlyPipelinedFunc@onalUnits–  InterleavedMemorySystem–  NoDataCaches–  NoVirtualMemory

3/30/2016 CS152,Spring2016

Cray-1(1976)

Single Port Memory 16 banks of 64-bit words

+ 8-bit SECDED

80MW/sec data load/store 320MW/sec instruction buffer refill

4 Instruction Buffers

64-bitx16 NIP

( (Ah) + j k m )

64 T Regs

( (Ah) + j k m )

64 B Regs

S0 S1 S2 S3 S4 S5 S6 S7

A0 A1 A2 A3 A4 A5 A6 A7

FP Add FP Mul FP Recip

Int Add Int Logic Int Shift Pop Cnt

Addr Add Addr Mul

memory bank cycle 50 ns processor cycle 12.5 ns (80MHz)

V0 V1 V2 V3 V4 V5 V6 V7

Vi V. Mask

V. Length 64 Element Vector Registers

3/30/2016 CS152,Spring2016

VectorInstruc?onSetAdvantages

§ Compact–  oneshortinstruc@onencodesNopera@ons

§ Expressive,tellshardwarethattheseNopera@ons:–  areindependent–  usethesamefunc@onalunit–  accessdisjointregisters–  accessregistersinsamepauernaspreviousinstruc@ons–  accessacon@guousblockofmemory(unit-strideload/store)

–  accessmemoryinaknownpauern(stridedload/store)

§ Scalable–  canrunsamecodeonmoreparallelpipelines(lanes)

3/30/2016 CS152,Spring2016

VectorArithme?cExecu?on

• Usedeeppipeline(=>fastclock)toexecuteelementopera@ons

•  Simplifiescontrolofdeeppipelinebecauseelementsinvectorareindependent(=>nohazards!)

V3<-v1*v2

SixstagemulGplypipeline

3/30/2016 CS152,Spring2016

VectorInstruc?onExecu?on

ADDV C,A,B

A[3] B[3]

A[4] B[4]

A[5] B[5]

A[6] B[6]

Execution using one pipelined functional unit

A[12] B[12]

A[16] B[16]

A[20] B[20]

A[24] B[24]

A[13] B[13]

A[17] B[17]

A[21] B[21]

A[25] B[25]

A[14] B[14]

A[18] B[18]

A[22] B[22]

A[26] B[26]

A[15] B[15]

A[19] B[19]

A[23] B[23]

A[27] B[27]

Execution using four pipelined functional units

3/30/2016 CS152,Spring2016

HowDoVectorArchitecturesAffectMemory?

3/30/2016 CS152,Spring2016

InterleavedVectorMemorySystem

0 1 2 3 4 5 6 7 8 9 A B C D E F

Base Stride Vector Registers

Memory Banks

Address Generator

Cray-1, 16 banks, 4 cycle bank busy time, 12 cycle latency

•  Bank busy time: Time before bank ready to accept next request

3/30/2016 CS152,Spring2016

VectorUnitStructure

FuncGonalUnit

VectorRegisters

MemorySubsystem

Elements0,4,8,…

Elements1,5,9,…

Elements2,6,10,…

Elements3,7,11,…

3/30/2016 CS152,Spring2016

T0VectorMicroprocessor(UCB/ICSI,1995)

LaneVectorregisterelementsstriped

overlanes

[0] [8] [16] [24]

[1] [9] [17] [25]

[2] [10] [18] [26]

[3] [11] [19] [27]

[4] [12] [20] [28]

[5] [13] [21] [29]

[6] [14] [22] [30]

[7] [15] [23] [31]

3/30/2016 CS152,Spring2016

VectorInstruc?onParallelism§ Canoverlapexecu@onofmul@plevectorinstruc@ons

–  examplemachinehas32elementspervectorregisterand8lanes

load mul

Load Unit Multiply Unit Add Unit

Instruction issue

Complete24opera@ons/cyclewhileissuing1shortinstruc@on/cycle

3/30/2016 CS152,Spring2016

VectorChaining

§ Vectorversionofregisterbypassing–  introducedwithCray-1

Memory

Load Unit

MULV v3,v1,v2

ADDV v5, v3, v4

3/30/2016 CS152,Spring2016

VectorChainingAdvantage

•  Withchaining,canstartdependentinstruc@onassoonasfirstresultappears

LoadMul

AddTime

•  Withoutchaining,mustwaitforlastelementofresulttobewriuenbeforestar@ngdependentinstruc@on

3/30/2016 CS152,Spring2016

VectorStartup§  Twocomponentsofvectorstartuppenalty

–  func@onalunitlatency(@methroughpipeline)–  dead@meorrecovery@me(@mebeforeanothervectorinstruc@oncanstartdownpipeline).Somepipelinesreducecontrollogicbyrequiringdead@mebetweeninstruc@onstothesamevectorunit

R X X X WR X X X W

R X X X W

R X X X WR X X X W

R X X X W

Func@onalUnitLatency

DeadTime

FirstVectorInstruc@on

SecondVectorInstruc@on

DeadTime

3/30/2016 CS152,Spring2016

DeadTimeandShortVectors

Cray C90, Two lanes 4 cycle dead time

Maximum efficiency 94% with 128 element vectors

4 cycles dead time T0, Eight lanes No dead time

100% efficiency with 8 element vectors

No dead time

64 cycles active

3/30/2016 CS152,Spring2016

VectorMemory-MemoryversusVectorRegisterMachines

§ Vectormemory-memoryinstruc@onsholdallvectoroperandsinmainmemory

§  Thefirstvectormachines,CDCStar-100(‘73)andTIASC(‘71),werememory-memorymachines

§ Cray-1(’76)wasfirstvectorregistermachine

for (i=0; i<N; i++) { C[i] = A[i] + B[i]; D[i] = A[i] - B[i]; }

Example Source Code ADDV C, A, B SUBV D, A, B

Vector Memory-Memory Code

LV V1, A LV V2, B ADDV V3, V1, V2 SV V3, C SUBV V4, V1, V2 SV V4, D

Vector Register Code

3/30/2016 CS152,Spring2016

VectorMemory-Memoryvs.VectorRegisterMachines

§ Vectormemory-memoryarchitectures(VMMA)requiregreatermainmemorybandwidth,why?–  Alloperandsmustbereadinandoutofmemory

§ VMMAsmakeifdifficulttooverlapexecu@onofmul@plevectoropera@ons,why?–  Mustcheckdependenciesonmemoryaddresses

§ VMMAsincurgreaterstartuplatency–  ScalarcodewasfasteronCDCStar-100(VMM)forvectors<100elements

§ ApartfromCDCfollow-ons(Cyber-205,ETA-10)allmajorvectormachinessinceCray-1havehadvectorregisterarchitectures

§  (weignorevectormemory-memoryfromnowon)

3/30/2016 CS152,Spring2016

Automa?cCodeVectoriza?on

for (i=0; i < N; i++) C[i] = A[i] + B[i];

loadload

Iter.1

Iter.2

ScalarSequenGalCode

Vectoriza@onisamassivecompile-@mereorderingofopera@onsequencing

⇒requiresextensiveloopdependenceanalysis

VectorInstrucGon

Iter.1 Iter.2

VectorizedCode

3/30/2016 CS152,Spring2016

VectorStripminingProblem:VectorregistershavefinitelengthSolu?on:Breakloopsintopiecesthatfitinregisters,“Stripmining”

ANDI R1, N, 63 # N mod 64 MTC1 VLR, R1 # Do remainder loop: LV V1, RA DSLL R2, R1, 3 # Multiply by 8 DADDU RA, RA, R2 # Bump pointer LV V2, RB DADDU RB, RB, R2 ADDV.D V3, V1, V2 SV V3, RC DADDU RC, RC, R2 DSUBU N, N, R1 # Subtract elements LI R1, 64 MTC1 VLR, R1 # Reset full length BGTZ N, loop # Any more to do?

for (i=0; i<N; i++) C[i] = A[i]+B[i];

64 elements

Remainder

3/30/2016 CS152,Spring2016

VectorCondi?onalExecu?on

Problem: Want to vectorize loops with conditional code: for (i=0; i<N; i++) if (A[i]>0) then A[i] = B[i];

Solution: Add vector mask (or flag) registers –  vector version of predicate registers, 1 bit per element

…and maskable vector instructions –  vector operation becomes bubble (“NOP”) at elements where mask bit is clear

Code example: CVM # Turn on all elements LV vA, rA # Load entire A vector SGTVS.D vA, F0 # Set bits in mask register where A>0 LV vA, rB # Load B vector into A under mask SV vA, rA # Store A back to memory under mask

3/30/2016 CS152,Spring2016

MaskedVectorInstruc?ons

Write data port

A[7] B[7]

M[3]=0

M[4]=1

M[5]=1

M[6]=0

M[2]=0

M[1]=1

M[0]=0

M[7]=1

Density-TimeImplementa@on–  scanmaskvectorandonlyexecuteelementswithnon-zeromasks

A[3] B[3]

A[4] B[4]

A[5] B[5]

A[6] B[6]

M[3]=0

M[4]=1

M[5]=1

M[6]=0

M[2]=0

M[1]=1

M[0]=0

Write data port Write Enable

A[7] B[7] M[7]=1

SimpleImplementa@on–  executeallNopera@ons,turnoffresultwritebackaccordingtomask

3/30/2016 CS152,Spring2016

VectorReduc?ons

Problem:Loop-carrieddependenceonreduc@onvariablessum = 0; for (i=0; i<N; i++) sum += A[i]; # Loop-carried dependence on sum

Solu?on:Re-associateopera@onsifpossible,usebinarytreetoperformreduc@on # Rearrange as: sum[0:VL-1] = 0 # Vector of VL partial sums for(i=0; i<N; i+=VL) # Stripmine VL-sized chunks sum[0:VL-1] += A[i:i+VL-1]; # Vector sum # Now have VL partial sums in one vector register do { VL = VL/2; # Halve vector length sum[0:VL-1] += sum[VL:2*VL-1] # Halve no. of partials } while (VL>1)

3/30/2016 CS152,Spring2016

VectorScacer/Gather

Wanttovectorizeloopswithindirectaccesses:for (i=0; i<N; i++) A[i] = B[i] + C[D[i]]

Indexedloadinstruc@on(Gather)LV vD, rD # Load indices in D vector LVI vC, rC, vD # Load indirect from rC base LV vB, rB # Load B vector ADDV.D vA,vB,vC # Do add SV vA, rA # Store result

3/30/2016 CS152,Spring2016

VectorScacer/Gather

Histogramexample:for (i=0; i<N; i++) A[B[i]]++;

Isfollowingacorrecttransla@on? LV vB, rB # Load indices in B vector LVI vA, rA, vB # Gather initial A values ADDV vA, vA, 1 # Increment SVI vA, rA, vB # Scatter incremented values

3/30/2016 CS152,Spring2016

AModernVectorSuper:NECSX-9(2008)§ 65nmCMOStechnology§ Vectorunit(3.2GHz)

– 8foregroundVRegs+64backgroundVRegs(256x64-bitelements/VReg)

– 64-bitfunc@onalunits:2mul@ply,2add,1divide/sqrt,1logical,1maskunit

– 8lanes(32+FLOPS/cycle,100+GFLOPSpeakperCPU)

– 1loadorstoreunit(8x8-byteaccesses/cycle)

§ Scalarunit(1.6GHz)– 4-waysuperscalarwithout-of-orderandspecula@veexecu@on

– 64KBI-cacheand64KBdatacache

• Memorysystemprovides256GB/sDRAMbandwidthperCPU• Upto16CPUsandupto1TBDRAMformshared-memorynode

–  totalof4TB/sbandwidthtosharedDRAMmemory

• Upto512nodesconnectedvia128GB/snetworklinks(messagepassingbetweennodes)

3/30/2016 CS152,Spring2016

Mul?mediaExtensions(akaSIMDextensions)

§  Veryshortvectorsaddedtoexis@ngISAsformicroprocessors§  Useexis@ng64-bitregisterssplitinto2x32bor4x16bor8x8b

–  LincolnLabsTX-2from1957had36bdatapathsplitinto2x18bor4x9b–  Newerdesignshavewiderregisters

•  128bforPowerPCAl@vec,IntelSSE2/3/4•  256bforIntelAVX

§  Singleinstruc@onoperatesonallelementswithinregister

16b 16b 16b 16b

32b 32b

8b 8b 8b 8b 8b 8b 8b 8b

16b 16b 16b 16b

+ + + + 4x16badds

3/30/2016 CS152,Spring2016

Mul?mediaExtensionsversusVectors

§ Limitedinstruc@onset:–  novectorlengthcontrol–  nostridedload/storeorscauer/gather–  unit-strideloadsmustbealignedto64/128-bitboundary

§ Limitedvectorregisterlength:–  requiressuperscalardispatchtokeepmul@ply/add/loadunitsbusy–  loopunrollingtohidelatenciesincreasesregisterpressure

§ Trendtowardsfullervectorsupportinmicroprocessors–  Beuersupportformisalignedmemoryaccesses–  Supportofdouble-precision(64-bitfloa@ng-point)–  NewIntelAVXspec(announcedApril2008),256bvectorregisters(expandableupto1024b)

3/30/2016 CS152,Spring2016

DegreeofVectoriza?on

§ Compilersaregoodatfindingdata-levelparallelism

3/30/2016 CS152,Spring2016

AverageVectorLength

§ Maximumdependsonifbecnhmarksuse16bitor32bitopera@ons

3/30/2016 CS152,Spring2016

Distribu?onofInstruc?ons

3/30/2016 CS152,Spring2016

Ques?onoftheDay

§ CanVectorandVLIWcombine?

§  Yes!§  FujitsyFR-VcanprocessbothVLIWandvectorinstruc@ons§  Exploitsbothinstruc@on-anddata-levelparallelism

3/30/2016 CS152,Spring2016

Acknowledgements

§  Theseslidescontainmaterialdevelopedandcopyrightby:–  Arvind(MIT)–  KrsteAsanovic(MIT/UCB)–  JoelEmer(Intel/MIT)–  JamesHoe(CMU)–  JohnKubiatowicz(UCB)–  DavidPauerson(UCB)

§ MITmaterialderivedfromcourse6.823§ UCBmaterialderivedfromcourseCS252§  “VectorVs.SuperscalarandVLIWArchitecturesforEmbeddedMul@mediaBenchmarks”.ChristosKozyrakisandDavidPauerson.MICRO-35.2002

CS 152 Computer Architecture and Engineering Lecture 15 ...cs152/sp16/lectures/L15...3/30/2016...

Documents

Transcript of CS 152 Computer Architecture and Engineering Lecture 15 ...cs152/sp16/lectures/L15...3/30/2016...

CS 152 Computer Architecture and Engineering CS252 ...inst.eecs.berkeley.edu/~cs152/sp18/lectures/L03-Pipelining.pdfCS 152 Computer Architecture and Engineering CS252 Graduate Computer

CS 152 Computer Architecture and Engineering Lecture 9 -Virtual …cs152/fa16/lectures/L09... · 2016-10-02 · CS 152 Computer Architecture and Engineering Lecture 9 -Virtual Memory

CS 152 Computer Architecture and Engineering Lecture 1 ...cs152/sp16/lectures/L01-Intro.pdf · CS 152 Computer Architecture and Engineering Lecture 1 - Introducon Dr. George Michelogiannakis

CS 152 Computer Architecture and Engineering …inst.cs.berkeley.edu/~cs152/sp08/lectures/L01-Intro.pdf · · 2008-02-21Engineering Lecture 1 - Introduction ... ¥New CS152 focuses

CS 152 Computer Architecture and Engineering …cs152/sp12/lectures/L13...March 8, 2012 CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 13 - VLIW Machines and

CS 152 Computer Architecture and Engineering Lecture …cs152/sp11/lectures/L02-SimpleImps.pdf · CS 152 Computer Architecture and Engineering Lecture 2 ... ← Opcode(A,B) do instruction

EECS 152 Computer Architecture and Engineeringinst.eecs.berkeley.edu/~cs152/sp16/lectures/L03-CISCRISC.pdf · CS 152 Computer Architecture and Engineering Lecture 3 ... (8085, 6800,

CS 152: Programming Language Paradigmsaustin/cs152-fall16/slides/CS152-Day18-RubyIntroduction.pdfRuby . Introduction to Ruby Created by Yukihiro Matsumoto (known as "Matz") Ruby influences

CS 152 Laboratory Exercise 1 - University of California ...cs152/fa16/assignments/lab1.pdf · CS 152 Laboratory Exercise 1 ... The lab has two sections, ... Unlike HDL languages such

CS 152 L03 FPGA (1)Patterson Fall 2003 © UCB 2003-09-02 Dave Patterson (patterson) cs152/ CS152 – Computer.

CS 152 Computer Architecture and Engineering …cs152/sp05/lecnotes/lec5-1.pdfCS 152 L9: Pipelining II UC Regents Spring 2005 © UCB 2005-2-15 John Lazzaro (lazzaro) CS 152 Computer

CS 152 Computer Architecture and Engineering Lecture 14 ...cs152/sp08/... · CS 152 Computer Architecture and Engineering Lecture 14 - Branch Prediction Krste Asanovic Electrical

CS 152 Computer Architecture and Engineering Lecture 11 ...inst.eecs.berkeley.edu/~cs152/sp10/lectures/L11-VMCaches.pdf · February 25, 2010 CS152, Spring 2010 CS 152 Computer Architecture

CS 152 Computer Architecture and Engineering Lecture …cs152/fa16/lectures/L03-CISCRISC.pdf · CS 152 Computer Architecture and Engineering Lecture 3 -From CISC to RISC ... § Time

CS 152 Computer Architecture and Engineering …cs152/sp10/lectures/L01...1/19/2010 CS152, Spring 2010 11 The “New” CS152 • New CS152 focuses on interaction of software and hardware

CS 152 Computer Architecture and Engineering Lecture 3 ...cs152/fa16/lectures/L03-CISCRISC.pdf · CS 152 Computer Architecture and Engineering Lecture 3 -From CISC to RISC ... (i.e.,

CS 152 Computer Architecture and Engineering Lecture 4 ...inst.eecs.berkeley.edu/~cs152/sp11/lectures/L04-Pipelining.pdf · CS 152 Computer Architecture and Engineering Lecture 4

CS 152 Computer Programming Fundamentals Quiz 7bchenoweth/cs152/cs152lecture20-quiz7.… · CS 152 Computer Programming Fundamentals Quiz 7 Brooke Chenoweth University of New Mexico

CS 152 Computer Architecture and Engineering Lecture 1 …cs152/sp05/lecnotes/lec1-1.pdfCS 152 Computer Architecture and Engineering Lecture 1 – The MIPS ISA cs152/ A n d a l s o

CS 152 Computer Architecture and Engineering …inst.eecs.berkeley.edu/~cs152/sp08/lectures/L01-Intro.pdfCS 152 Computer Architecture and Engineering Lecture 1 ... cannot be solved