Page 1: CSC 631: High-Performance Computer Architectureharmanani.github.io › classes › csc631 › Notes › L02-ISAs.pdf · CSC 631: High-Performance Computer Architecture Spring 2017

CSC 631: High-Performance Computer Architecture

Spring 2017

Lecture 2: Instruction Set Architectures

Analog Computers

§ Analog computer represents problem variables as some physical quantity (e.g., mechanical displacement, voltage on a capacitor) and uses scaled physical behavior to calculate results

[Marsyas, Creative Commons BY-SA 3.0]

Antikythera mechanism, c. 100 BC

[BenFrantzDale, Creative Commons BY-SA 3.0]

Wingtip vortices off Cessna tail in wind tunnel

Digital Computers

§ Represent problem variables as numbers encoded using discrete steps
  - Discrete steps provide noise immunity
§ Enables accurate and deterministic calculations
  - Same inputs give same outputs exactly
§ Not constrained by physically realizable functions
§ Programmable digital computers are the focus of computer architectures

Charles Babbage (1791-1871)

§ Lucasian Professor of Mathematics, Cambridge University, 1828-1839
§ A true “polymath” with interests in many areas
§ Frustrated by errors in printed tables, wanted to build machines to evaluate and print accurate tables
§ Inspired by earlier work organizing human “computers” to methodically calculate tables by hand

[Copyright expired and in public domain. Image obtained from Wikimedia Commons.]

Difference Engine 1822

§ Continuous functions can be approximated by polynomials, which can be computed from difference tables:

f(n) = n^2 + n + 41
d1(n) = f(n) - f(n-1) = 2n
d2(n) = d1(n) - d1(n-1) = 2

§ Can calculate using only a single adder:

n         0    1    2    3    4
d2(n)               2    2    2
d1(n)          2    4    6    8
f(n)     41   43   47   53   61
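The table above can be reproduced with additions alone. The following is a minimal Python sketch of that idea, not a model of Babbage's mechanism; the function name `difference_engine` and its parameters are illustrative assumptions. It starts from f(0) = 41, d1(1) = 2 and the constant d2 = 2, and performs two additions per step.

```python
def difference_engine(f0, d1_first, d2, steps):
    """Tabulate f(0), f(1), ..., f(steps) using additions only.

    f0       -- starting value f(0)
    d1_first -- first difference d1(1) = f(1) - f(0)
    d2       -- constant second difference
    """
    values = [f0]
    f, d1 = f0, d1_first
    for _ in range(steps):
        f += d1          # f(n) = f(n-1) + d1(n)
        values.append(f)
        d1 += d2         # d1(n+1) = d1(n) + d2
    return values


# Values for f(n) = n^2 + n + 41, as in the table above.
table = difference_engine(f0=41, d1_first=2, d2=2, steps=4)
print(table)  # [41, 43, 47, 53, 61]
```

No multiplication is needed, because the second difference of a quadratic is constant; a higher-degree polynomial would need one more running difference per degree.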

Realizing the Difference Engine

§ Mechanical calculator, hand-cranked, using decimal digits
§ Babbage did not complete the DE, moving on to the Analytical Engine (but used ideas from AE in improved DE 2 plan)
§ Scheutz completed working version in 1855, sold copy to British Government
§ Modern day recreation of DE 2, including printer, showed entire design possible using original technology
  - first at British Science Museum
  - copy at Computer History Museum in Mountain View

[Geni, Creative Commons BY-SA 3.0]

Analytical Engine 1837

§ Recognized as first general-purpose digital computer
  - Many iterations of the design (multiple Analytical Engines)
§ Contains the major components of modern computers:
  - “Store”: Main memory where numbers and intermediate results were held (1,000 decimal words, 40 digits each)
  - “Mill”: Arithmetic unit where processing was performed, including addition, multiplication, and division
  - Also supported conditional branching and looping, and exceptions on overflow (machine jams and bell rings)
  - Had a form of microcode (the “Barrel”)
§ Program, input and output data on punched cards
§ Instruction cards hold opcode and address of operands in store
  - 3-address format with two sources and one destination, all in store
§ Branches implemented by mechanically changing the order in which cards were inserted into machine
§ Only small pieces were ever built

Analytical Engine Design Choices

§ Decimal, because storage on mechanical gears
  - Babbage considered binary and other bases, but no clear advantage over human-friendly decimal
§ 40-digit precision (equivalent to about 133 bits, since 40 × log2(10) ≈ 132.9)
  - To reduce impact of scaling given lack of floating-point hardware
§ Used “locking” or mechanical amplification to overcome noise in transferring mechanical motion around machine
  - Similar to non-linear gain in digital electronic circuits
§ Had a fast “anticipating” carry
  - Mechanical version of pass-transistor carry propagate used in CMOS adders (and earlier in relay adders)

Ada Lovelace (1815-1852)

§ Translated lectures of Luigi Menabrea, who published notes of Babbage’s lectures in Italy
§ Lovelace considerably embellished notes and described Analytical Engine program to calculate Bernoulli numbers that would have worked if AE was built
  - The first program!
§ Imagined many uses of computers beyond calculations of tables
§ Was interested in modeling the brain

[By Margaret Sarah Carpenter, Copyright expired and in public domain]

Early Programmable Calculators

§ Analog computing was popular in first half of 20th century as digital computing was too expensive
§ But during late 30s and 40s, several programmable digital calculators were built (date when operational)
  - Atanasoff Linear Equation Solver (1939)
  - Zuse Z3 (1941)
  - Harvard Mark I (1944)
  - ENIAC (1946)

Atanasoff-Berry Linear Equation Solver (1939)

§ Fixed-function calculator for solving up to 29 simultaneous linear equations
§ Digital binary arithmetic (50-bit fixed-point words)
§ Dynamic memory (rotating drum of capacitors)
§ Vacuum tube logic for processing

[Manop, Creative Commons BY-SA 3.0]

In 1973, Atanasoff was credited as inventor of “automatic electronic digital computer” after patent dispute with Eckert and Mauchly (ENIAC)

Zuse Z3 (1941)

§ Built by Konrad Zuse in wartime Germany using 2000 relays
§ Had normalized floating-point arithmetic with hardware handling of exceptional values (+/- infinity, undefined)
  - 1-bit sign, 7-bit exponent, 14-bit significand
§ 64 words of memory
§ Two-stage pipeline: 1) fetch & execute 2) writeback
§ No conditional branch
§ Programmed via paper tape

Replica of the Zuse Z3 in the Deutsches Museum, Munich
[Venusianer, Creative Commons BY-SA 3.0]

Harvard Mark I (1944)

§ Proposed by Howard Aiken at Harvard, and funded and built by IBM
§ Mostly mechanical with some electrically controlled relays and gears
§ Weighed 5 tons and had 750,000 components
§ Stored 72 numbers each of 23 decimal digits
§ Speed: adds 0.3 s, multiplies 6 s, divide 15 s, trig >1 minute
§ Instructions on paper tape (2-address format)
§ Could run long programs automatically
§ Loops by gluing paper tape into loops
§ No conditional branch
§ Although mentioned Babbage in proposal, was more limited than analytical engine

[Waldir, Creative Commons BY-SA 3.0]

ENIAC (1946)

§ First electronic general-purpose computer
§ Construction started in secret at UPenn Moore School of Electrical Engineering during WWII to calculate firing tables for US Army, designed by Eckert and Mauchly
§ 17,468 vacuum tubes
§ Weighed 30 tons, occupied 1800 sq ft, power 150 kW
§ Twelve 10-decimal-digit accumulators
§ Had a conditional branch!
§ Programmed by plugboard and switches, time consuming!
§ Purely electronic instruction fetch and execution, so fast
  - 10-digit x 10-digit multiply in 2.8 ms (2000x faster than Mark-1)
§ As a result of speed, it was almost entirely I/O bound
§ As a result of large number of tubes, it was often broken (5 days was longest time between failures)

ENIAC

[Public Domain, US Army Photo]

Changing the program could take days!

EDVAC

§ ENIAC team started discussing stored-program concept to speed up programming and simplify machine design
§ John von Neumann was consulting at UPenn and typed up ideas in “First Draft of a Report on the EDVAC”
§ Herman Goldstine circulated the draft June 1945 to many institutions, igniting interest in the stored-program idea
  - But also, ruined chances of patenting it
  - Report falsely gave sole credit to von Neumann for the ideas
  - Maurice Wilkes was excited by report and decided to come to US workshop on building computers
§ Later, in 1948, modifications to ENIAC allowed it to run in stored-program mode, but 6x slower than hardwired
  - Due to I/O limitations, this speed drop was not practically significant and improvement in productivity made it worthwhile
§ EDVAC eventually built and (mostly) working in 1951
  - Delayed by patent disputes with university

Manchester SSEM “Baby” (1948)

§ Manchester University group built small-scale experimental machine to demonstrate idea of using cathode-ray tubes (CRTs) for computer memory instead of mercury delay lines
§ Williams-Kilburn Tubes were first random access electronic storage devices
§ 32 words of 32 bits, accumulator, and program counter
§ Machine ran world’s first stored program in June 1948
§ Led to later Manchester Mark-1 full-scale machine
  - Mark-1 introduced index registers
  - Mark-1 commercialized by Ferranti

Williams-Kilburn Tube Store
[Piero71, Creative Commons BY-SA 3.0]

Cambridge EDSAC (1949)

§ Maurice Wilkes came back from workshop in US and set about building a stored-program computer in Cambridge
§ EDSAC used mercury-delay line storage to hold up to 1024 words (512 initially) of 17 bits (+1 bit of padding in delay line)
§ Two’s-complement binary arithmetic
§ Accumulator ISA with self-modifying code for indexing
§ David Wheeler, who earned the world’s first computer science PhD, invented the subroutine (“Wheeler jump”) for this machine
  - Users built a large library of useful subroutines
§ UK’s first commercial computer, LEO-I (Lyons Electronic Office), was based on EDSAC, ran business software in 1951
  - Software for LEO was still running in the 1980s in emulation on ICL mainframes!
§ EDSAC-II (1958) was first machine with microprogrammed control unit

Commercial computers: BINAC (1949) and UNIVAC (1951)

§ Eckert and Mauchly left U. Penn after patent rights disputes and formed the Eckert-Mauchly Computer Corporation
§ World’s first commercial computer was BINAC with two CPUs that checked each other
  - BINAC apparently never worked after shipment to first (only) customer
§ Second commercial computer was UNIVAC
  - Used mercury delay-line memory, 1000 words of 12 alpha characters
  - Famously used to predict presidential election in 1952
  - Eventually 46 units sold at >$1M each
  - Often mistakenly called the IBM UNIVAC

IBM 701 (1952)

§ IBM’s first commercial scientific computer
§ Main memory was 72 Williams tubes, each 1 Kib, for total of 2048 words of 36 bits each
  - Memory cycle time of 12 µs
§ Accumulator ISA with multiplier/quotient register
§ 18-bit/36-bit numbers in sign-magnitude fixed-point
§ Misquote from Thomas Watson Sr/Jr:
  “I think there is a world market for maybe five computers”
§ Actually TWJr said at shareholder meeting:
  “as a result of our trip [selling the 701], on which we expected to get orders for five machines, we came home with orders for 18.”

IBM 650 (1953)

§ The first mass-produced computer
§ Low-end system with drum-based storage and digit-serial ALU
§ Almost 2,000 produced

[Cushing Memorial Library and Archives, Texas A&M, Creative Commons Attribution 2.0 Generic]

IBM 650 Architecture

[Figure from 650 Manual, © IBM. Labeled blocks: magnetic drum (1,000 or 2,000 10-digit decimal words); 20-digit accumulator; active instruction (including next program counter); digit-serial ALU]

IBM 650 Instruction Set

§ Address and data in 10-digit decimal words
§ Instructions encode:
  - Two-digit opcode encoded 44 instructions in base instruction set, expandable to 97 instructions with options
  - Four-digit data address
  - Four-digit next instruction address
  - Programmers arrange code to minimize drum latency!
§ Special instructions added to compare value to all words on track
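The 2 + 4 + 4 digit split of an instruction word can be illustrated with a short Python sketch. The field order (opcode in the two most significant digits, then data address, then next-instruction address) is an assumption made for this illustration, the word's sign is ignored, and the names `Instruction650`, `decode` and `encode` are hypothetical. The example word is made up, not a real 650 instruction.

```python
from typing import NamedTuple


class Instruction650(NamedTuple):
    opcode: int         # two decimal digits
    data_address: int   # four decimal digits
    next_address: int   # four decimal digits: where to fetch the next instruction


def decode(word: int) -> Instruction650:
    """Split a 10-digit decimal word into its three fields (assumed order)."""
    if not 0 <= word < 10**10:
        raise ValueError("expected a non-negative 10-digit decimal word")
    opcode, rest = divmod(word, 10**8)
    data_address, next_address = divmod(rest, 10**4)
    return Instruction650(opcode, data_address, next_address)


def encode(ins: Instruction650) -> int:
    """Inverse of decode."""
    return ins.opcode * 10**8 + ins.data_address * 10**4 + ins.next_address


example = decode(1512340567)  # made-up word
print(example)  # Instruction650(opcode=15, data_address=1234, next_address=567)
```

Because every instruction names its successor, a programmer can place the next instruction at whatever drum position will be passing under the read heads when the current one finishes, which is the latency trick the slide mentions.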

Early Instruction Sets

§ Very simple ISAs, mostly single-address accumulator-style machines, as high-speed circuitry was expensive
  - Based on earlier “calculator” model
§ Over time, appreciation of software needs shaped ISA
§ Index registers (Kilburn, Mark-1) added to avoid need for self-modifying code to step through array
§ Over time, more index registers were added
§ And more operations on the index registers
§ Eventually, just provide general-purpose registers (GPRs) and orthogonal instruction sets
§ But some other options explored…
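To make the index-register point concrete, here is a small Python sketch of a toy single-address accumulator machine. Everything in it is hypothetical (the opcodes, the `opcode * 1000 + address` encoding and the memory layout); it models no historical machine. The first program steps through an array by rewriting the address field of its own ADD instruction; the second uses an index register X instead.

```python
HALT, LDA, ADD, STA, SUB, JNZ, ADDX, INCX = range(8)  # toy opcodes


def ins(op, addr=0):
    """Encode an instruction as a plain integer: opcode * 1000 + address."""
    return op * 1000 + addr


def run(program, data):
    """Run a toy accumulator machine and return its final memory."""
    mem = [0] * 200
    mem[:len(program)] = program
    for addr, value in data.items():
        mem[addr] = value
    acc = pc = x = 0
    while True:
        op, addr = divmod(mem[pc], 1000)
        pc += 1
        if op == HALT:
            return mem
        elif op == LDA:
            acc = mem[addr]
        elif op == ADD:
            acc += mem[addr]
        elif op == STA:
            mem[addr] = acc
        elif op == SUB:
            acc -= mem[addr]
        elif op == JNZ:
            if acc != 0:
                pc = addr
        elif op == ADDX:
            acc += mem[addr + x]   # effective address = addr + index register
        elif op == INCX:
            x += 1
        else:
            raise ValueError(f"unknown opcode {op}")


# Array at 100..103, running sum at 110, loop counter at 111, constant 1 at 112.
DATA = {100: 3, 101: 1, 102: 4, 103: 1, 110: 0, 111: 4, 112: 1}

self_modifying = [
    ins(LDA, 110), ins(ADD, 100), ins(STA, 110),  # sum += array[i]
    ins(LDA, 1), ins(ADD, 112), ins(STA, 1),      # bump address field of instruction 1
    ins(LDA, 111), ins(SUB, 112), ins(STA, 111),  # counter -= 1
    ins(JNZ, 0), ins(HALT),
]

indexed = [
    ins(LDA, 110), ins(ADDX, 100), ins(STA, 110),  # sum += array[X]
    ins(INCX),                                     # X += 1, program text untouched
    ins(LDA, 111), ins(SUB, 112), ins(STA, 111),
    ins(JNZ, 0), ins(HALT),
]

print(run(self_modifying, DATA)[110], run(indexed, DATA)[110])  # 9 9
```

Both programs compute the same sum, but the self-modifying version leaves instruction 1 changed (it ends as ADD 104), so it must be restored before the loop can be reused; the indexed version leaves the program text intact.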

Burroughs B5000 Stack Architecture: Robert Barton, 1960

§ Hide instruction set completely from programmer using high-level language (ALGOL)
§ Use stack architecture to simplify compilation, expression evaluation, recursive subroutine calls, interrupt handling, …

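Evaluation of Expressions

(a + b * c) / (a + d * c - e)

[Figure: expression tree with / at the root, over the subtrees for a + b * c and a + d * c - e]

Reverse Polish: a b c * + a d c * + e - /

Instructions: push a, push b, push c, multiply
Evaluation stack (bottom to top): a, b, c → a, b * c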

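Evaluation of Expressions

(a + b * c) / (a + d * c - e)

[Figure: expression tree with / at the root, over the subtrees for a + b * c and a + d * c - e]

Reverse Polish: a b c * + a d c * + e - /

Instruction: add
Evaluation stack (bottom to top): a, b * c → a + b * c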
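The stack evaluation shown on these two slides can be sketched in a few lines of Python. This illustrates the general technique only, not B5000 code; the function name `eval_rpn` and the variable values are made up for the example.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, env):
    """Evaluate a Reverse Polish token list using an explicit evaluation stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()     # top of stack is the right operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])  # "push a", "push b", ...
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# Hypothetical values for the variables in (a + b * c) / (a + d * c - e).
env = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}
result = eval_rpn("a b c * + a d c * + e - /".split(), env)
print(result)  # 0.875, i.e. (2 + 12) / (2 + 20 - 6)
```

Operators carry no operand addresses; they always act on the top of the stack, which is what makes code generation from an expression tree a simple post-order walk.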

IBM’s Big Bet: 360 Architecture

§ By early 1960s, IBM had several incompatible families of computer:
  701 → 7094
  650 → 7074
  702 → 7080
  1401 → 7010
§ Each system had its own
  - Instruction set
  - I/O system and secondary storage (magnetic tapes, drums and disks)
  - assemblers, compilers, libraries, ...
  - market niche (business, scientific, real time, ...)

IBM 360: Design Premises
Amdahl, Blaauw and Brooks, 1964

§ The design must lend itself to growth and successor machines
§ General method for connecting I/O devices
§ Total performance - answers per month rather than bits per microsecond → programming aids
§ Machine must be capable of supervising itself without manual intervention
§ Built-in hardware fault checking and locating aids to reduce down time
§ Simple to assemble systems with redundant I/O devices, memories etc. for fault tolerance
§ Some problems required floating-point larger than 36 bits

Stack versus GPR Organization
Amdahl, Blaauw and Brooks, 1964

1. The performance advantage of push-down stack organization is derived from the presence of fast registers and not the way they are used.
2. “Surfacing” of data in stack which are “profitable” is approximately 50% because of constants and common subexpressions.
3. Advantage of instruction density because of implicit addresses is equaled if short addresses to specify registers are allowed.
4. Management of finite-depth stack causes complexity.
5. Recursive subroutine advantage can be realized only with the help of an independent stack for addressing.
6. Fitting variable-length fields into fixed-width word is awkward.

IBM 360: A General-Purpose Register (GPR) Machine

§ Processor State
  - 16 General-Purpose 32-bit Registers
    - may be used as index and base register
    - Register 0 has some special properties
  - 4 Floating Point 64-bit Registers
  - A Program Status Word (PSW)
    - PC, Condition codes, Control flags
§ A 32-bit machine with 24-bit addresses
  - But no instruction contains a 24-bit address!
§ Data Formats
  - 8-bit bytes, 16-bit half-words, 32-bit words, 64-bit double-words

The IBM 360 is why bytes are 8 bits long today!
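The slide does not spell out how a 24-bit address is formed when no instruction holds one. The sketch below assumes the commonly described System/360 scheme, which is not stated in these notes: a base register plus an optional index register plus a 12-bit displacement, with register number 0 meaning "no register" (one of the "special properties" of Register 0). The function name and register contents are hypothetical.

```python
ADDRESS_MASK = (1 << 24) - 1  # addresses are 24 bits wide


def effective_address(regs, base, index, displacement):
    """Form a 24-bit address as base + index + 12-bit displacement.

    Assumption for this sketch: a register field of 0 contributes zero
    instead of the contents of general register 0.
    """
    if not 0 <= displacement < 4096:
        raise ValueError("displacement must fit in 12 bits")
    base_value = regs[base] if base != 0 else 0
    index_value = regs[index] if index != 0 else 0
    return (base_value + index_value + displacement) & ADDRESS_MASK


regs = [0] * 16
regs[0] = 0xDEAD      # ignored when used as base or index
regs[12] = 0x012000   # hypothetical base of a data area
regs[3] = 0x10        # hypothetical array offset

ea = effective_address(regs, base=12, index=3, displacement=0x0A4)
print(hex(ea))  # 0x120b4
```

Short register fields plus a small displacement keep instructions compact while still reaching the whole address space, in the spirit of point 3 of the stack-versus-GPR comparison.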

IBM 360: Initial Implementations

                 Model 30            ...   Model 70
Storage          8K - 64 KB                256K - 512 KB
Datapath         8-bit                     64-bit
Circuit Delay    30 nsec/level             5 nsec/level
Local Store      Main Store                Transistor Registers
Control Store    Read only 1 μsec          Conventional circuits

IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models.

Milestone: The first true ISA designed as portable hardware-software interface!

With minor modifications it still survives today!

Acknowledgements

§ These course notes were developed by:
  - Krste Asanovic (UCB)
  - Arvind (MIT)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)