Post on 19-Oct-2015
Control Unit: Hardwired vs. Microprogrammed Approach
Dr Shankar Balachandran
Indian Institute of Technology Madras
shankar@cse.iitm.ernet.in
14 October 2006
Two Major Blocks in a CPU
- Datapath
  - Adders, multipliers, dividers
  - Shifters, registers
  - Anything that changes or stores data
- Control Unit
  - Controls the data
  - How is data stored?
  - Where is it stored?
  - When should data be available?
Control Unit
- Correct sequencing of control signals
- Much like the human brain controlling various parts of the body
- Sequence and timing are the key
- Any aberration will result in wrong operation
A Simplified Control Unit
[Figure: a Control Unit with a Fetch Unit, Decode Unit, Execution Unit and Write Back Unit, driven by the signals Fetch, Decode, Execute and Write Back]
A Possible Implementation
[Figure: CLK, a mod-4 (two-bit) counter and a 2-to-4 decoder]
Timing Diagram
[Figure: waveforms for CLK, Fetch, Decode, Execute and Write Back]
Let's Sample the Signals
1000
0100
0010
0001
Another Way to Generate Signals
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
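The signal generation on the last few slides can be sketched in Python. This is only an illustrative model: it assumes a two-bit counter that wraps after four states and a decoder whose outputs are ordered Fetch, Decode, Execute, Write Back. Reading "another way" as a stored table that is read once per clock is an assumption, and all function names are hypothetical.

```python
PHASES = ("Fetch", "Decode", "Execute", "Write Back")


def decode_2_to_4(count):
    """2-to-4 decoder: assert exactly one of four outputs for a 2-bit input."""
    return tuple(1 if count == i else 0 for i in range(4))


def counter_and_decoder(cycles):
    """Scheme 1: a four-state counter stepped by CLK feeds the decoder."""
    count = 0
    for _ in range(cycles):
        yield decode_2_to_4(count)
        count = (count + 1) % 4  # wrap around after Write Back


# Scheme 2 (assumed reading): keep the four patterns in a table
# and read one row per clock instead of computing them with gates.
SIGNAL_TABLE = ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))


def table_lookup(cycles):
    for clk in range(cycles):
        yield SIGNAL_TABLE[clk % len(SIGNAL_TABLE)]


if __name__ == "__main__":
    for phase, row in zip(PHASES, counter_and_decoder(4)):
        print(phase, "".join(map(str, row)))
```

Both generators produce the same one-hot sequence; the second one is the idea that the microprogrammed approach builds on.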
Hardwired vs Microprogrammed
- Hardwired
  - Use gates to generate signals
  - Squeeze out the juice for performance
  - Different logic styles possible
- Microprogrammed
  - Store the control signals in sequence
  - Just read from the memory every clock cycle
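A Model Computer (Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
[Figure: block diagram with Accumulator, ALU, Register B, PC, MAR, MDR, RAM, IR and Control connected by a Bus; paths are labelled with widths of 8, 12 and 4 bits; control signals R, W, LM, IP, LP, EP, LD, ED, LA, EA, S, A, EU, LB, LI, EI]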
More Details
- L = Load
- E = Copy to bus
- A, S = Add and Subtract
- Sign bit goes to the control unit
- IP = Increment PC
Mnemonic | Opcode | Action | Register Transfers | Active Controls
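Hardwired Unit
[Figure: the Opcode in IR feeds a Decoder with outputs LDA, STA, ADD, SUB, MBA, JMP, JN and Halt; a Ring Counter driven by CLK supplies the timing signals (labelled T5 ... T1); these and NF enter the Control Matrix, which produces the Control Signals]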
Table with Sequencing
IP = T2
R = T1 + T4*LDA
LI = T2
LP = T3*JMP + T3*JN*NF
W = T5*STA
A = T3*ADD
EP = T0
LD = T4*STA
S = T3*SUB
LM = T0 + T3*LDA + T3*STA
ED = T2 + T5*LDA
..
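The sequencing equations can be checked by evaluating them directly. The sketch below models only the equations listed on this slide (the slide's list is cut off, so signals such as LA, EA or EU are not covered); it assumes `+` means OR, `*` means AND, that T0 to T5 are one-hot timing states, and that NF is the negative flag. The function name is hypothetical.

```python
def control_matrix(t, opcode, nf=False):
    """Return the control signals active in timing state t for one opcode.

    Only the equations shown on the slide are modelled.
    """
    if not 0 <= t <= 5:
        raise ValueError("timing state must be in 0..5")
    T = [t == i for i in range(6)]  # one-hot timing signals T0..T5

    def op(name):
        return opcode == name  # decoder output for this opcode

    equations = {
        "IP": T[2],
        "R": T[1] or (T[4] and op("LDA")),
        "LI": T[2],
        "LP": (T[3] and op("JMP")) or (T[3] and op("JN") and nf),
        "W": T[5] and op("STA"),
        "A": T[3] and op("ADD"),
        "EP": T[0],
        "LD": T[4] and op("STA"),
        "S": T[3] and op("SUB"),
        "LM": T[0] or (T[3] and op("LDA")) or (T[3] and op("STA")),
        "ED": T[2] or (T[5] and op("LDA")),
    }
    return {name for name, active in equations.items() if active}


if __name__ == "__main__":
    for t in range(6):
        print(f"T{t}", sorted(control_matrix(t, "LDA")))
```

States T0 to T2 give the same signals for every opcode, which is the shared fetch sequence; the opcode only matters from T3 onward.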
Control Matrix
- Implement using discrete gates
- Usually done using PLAs
- Large control matrices are implemented hierarchically
  - For speed
- Well-known processes and design flows are widespread
An Alternate Implementation
[Figure labels: IR, 4-bit opcode, Starting Address Generator, uPC, +1, +NF&CD, Control Store (Control ROM, 32 x 24), Microinstruction Register, CD, MAP, HLT, Jump Address, 1*0100, Control, CLK]
Control Store
[Figure labels: Instruction Op-Code, uInstruction Address, Control Signals, CD, MAP, HLT, Addr. of Next Control Word]
Example 1: MBA followed by ADD
[Figure: microinstruction addresses 0B and 09; control-signal columns LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP]
Sequence for MBA, ADD
MBA:
1. MAR <- PC
2. MDR <- M(MAR)
3. IR <- MDR
4. B <- A
ADD:
1. MAR <- PC
2. MDR <- M(MAR)
3. IR <- MDR
4. A <- ALU(Add)
Step  MOV B,A           ADD
1     0011000000000000  0011000000000000
2     0000100000000000  0000100000000000
3     1000000110000000  1000000110000000
4     0000000000010001  0000000000101010
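The 16-bit control words of this example can be decoded back into signal names. The sketch below assumes the bit order IP LP EP LM R W LD ED LI EI LA EA A S EU LB from left to right; that order is not stated on the slides and is inferred here by matching each word against the register transfers of the MBA and ADD sequences. The function name is hypothetical.

```python
# Assumed left-to-right bit order (inferred, not given by the slides).
BIT_ORDER = ("IP", "LP", "EP", "LM", "R", "W", "LD", "ED",
             "LI", "EI", "LA", "EA", "A", "S", "EU", "LB")


def active_signals(word):
    """List the control signals whose bit is 1 in a 16-bit control word."""
    if len(word) != len(BIT_ORDER) or set(word) - {"0", "1"}:
        raise ValueError("expected a string of 16 binary digits")
    return [name for name, bit in zip(BIT_ORDER, word) if bit == "1"]


# Control words from the example, one per microinstruction.
MBA_WORDS = ("0011000000000000", "0000100000000000",
             "1000000110000000", "0000000000010001")
ADD_WORDS = ("0011000000000000", "0000100000000000",
             "1000000110000000", "0000000000101010")

if __name__ == "__main__":
    for label, words in (("MBA", MBA_WORDS), ("ADD", ADD_WORDS)):
        for step, word in enumerate(words, start=1):
            print(label, step, word, active_signals(word))
```

Under that assumed order, the first three words of both routines decode to the fetch steps (EP and LM, then R, then IP with ED and LI), and only the fourth word differs between the two instructions.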
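Example 2: JN with Flag Set
- If the negative flag is set, jump to a new location by skipping to the uInstruction at 0F
[Figure: microinstruction address 0D, marked CD; control-signal columns LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP]
Example 3: JN with Flag Not Set
[Figure: microinstruction address 0D; CD is marked twice; control-signal columns LB, EU, S, A, EA, LA, EI, LI, ED, LD, W, R, LM, EP, LP, IP]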
Let's Review the Microprogramming Model
- Store the microprogram in the control store
- Fetch the instruction
- Get the set of control signals from the control word
- Move the microinstruction address
- Lather, rinse, repeat
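The review above can be sketched as a small loop over a control store. This is a simplified illustration, not the design on the slides: it assumes each control word holds the active signals, a next-address field and a MAP flag that jumps to the start of the routine for the fetched opcode. The start addresses 0B and 09 are taken from the labels in Example 1, and the CD and HLT bits are left out. All names are hypothetical.

```python
FETCH = 0x00
START_ADDRESS = {"MBA": 0x0B, "ADD": 0x09}  # assumed opcode-to-address map

# address -> (active control signals, next address, MAP flag)
CONTROL_STORE = {
    0x00: (("EP", "LM"), 0x01, False),        # MAR <- PC
    0x01: (("R",), 0x02, False),              # MDR <- M(MAR)
    0x02: (("IP", "ED", "LI"), None, True),   # IR <- MDR, then map on opcode
    0x09: (("LA", "A", "EU"), FETCH, False),  # ADD: A <- ALU(Add)
    0x0B: (("EA", "LB"), FETCH, False),       # MBA: B <- A
}


def run(program):
    """Yield (micro-address, signals) for every microinstruction executed."""
    for opcode in program:
        upc = FETCH
        while True:
            signals, next_addr, use_map = CONTROL_STORE[upc]
            yield upc, signals
            if use_map:
                upc = START_ADDRESS[opcode]  # jump to the opcode's routine
            elif next_addr == FETCH:
                break                        # routine done, fetch next opcode
            else:
                upc = next_addr


if __name__ == "__main__":
    for upc, signals in run(["MBA", "ADD"]):
        print(f"{upc:02X}", " ".join(signals))
```

The loop never inspects the opcode except at the MAP step, which is the point of the model: sequencing is data in the control store rather than gates.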
What is Microcode?
Michael Slater's "Microprocessor Based Design" (pg. 42): "Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."
Thought Experiment
- Why is the design a little clumsy?
- What can we do about it?
Reason for Clumsiness
- JN: conditional flag check
- Without any condition check, the whole process is very smooth
- Solution: avoid all conditional checks
Real Life
- A little American football story
- Theory vs. practice
  - In theory, there is no difference between theory and practice
  - In practice, theory and practice are two different things altogether
- Live with condition checks
- Keep designs as clean as possible
A General Approach
[Figure labels: IR, Starting and Branch Address Generator, uPC, Control Store, Control Word, External Inputs, Conditional Codes]
Format of Microinstructions
- Pick yours
  - Your choice is as good as your neighbor's
- What we did:
  - One bit position per control signal
  - Order of the bits? It doesn't matter
  - Can result in long microinstructions
    - Not the number of microinstructions, but the width
A Note About Density
- Observe that only a few bits are set to 1
  - Poor usage of bit space
- This scheme is called Horizontal Microprogram
- Alternate version: encode the bits
  - Vertical Microprogram
Vertical Microprogram
- Encode the bits by grouping similar elements together
- General idea: group similar resources together
  - There can be only one source or destination register
  - Some operations are mutually exclusive
    - Read vs. write of memory
Design Issues
- Encoding reduces the bit space
  - But requires decoders
- Cost of decoder vs. bit space
  - Usually decoder cost is very low
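The trade-off between bit space and decoders can be made concrete with one field. The sketch below assumes the five bus-enable signals (EP, EI, ED, EA, EU) form a mutually exclusive group, since only one source can drive the bus at a time, and that code 0 means "no source". The grouping and the names are illustrative choices, not taken from the slides.

```python
# One-hot (horizontal) form: one bit per bus source.
BUS_SOURCES = ("EP", "EI", "ED", "EA", "EU")

# Encoded (vertical) form: codes 0..5, so 3 bits instead of 5.
FIELD_WIDTH = len(BUS_SOURCES).bit_length()


def encode(source):
    """Return the field code for a bus source (None means no source)."""
    return 0 if source is None else BUS_SOURCES.index(source) + 1


def decode(code):
    """Decoder: expand a field code back into one-hot enable lines."""
    if not 0 <= code <= len(BUS_SOURCES):
        raise ValueError("unused field code")
    return tuple(1 if code == i + 1 else 0 for i in range(len(BUS_SOURCES)))


if __name__ == "__main__":
    print("bits per word:", len(BUS_SOURCES), "->", FIELD_WIDTH)
    for source in (None,) + BUS_SOURCES:
        print(source, format(encode(source), "03b"), decode(encode(source)))
```

The saving grows with the size of the group, while the decoder stays a small block of combinational logic.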
Another Idea
- Group concurrently active signals
- Every meaningful combination gets a code
- Complex decoder to interpret every code
Vertical vs Horizontal
- Horizontal
  - Faster
  - More area
  - More common currently
    - Cheap transistors
- Vertical
  - Slower
  - More microinstructions
Microsequencing
- Other ways to save on hardware
- Every instruction had its own microprogram sequence
- Also, instructions have several addressing modes
  - Only the first few microinstructions differ
- Can we share microcode?
A Powerful Technique in Sharing
- Bit-ORing
- Example
  - Two instructions share some microcode
  - Eventually, they must branch
  - The default branch (for one instruction) is X0
  - The other branch is stored at X1
  - Change the least significant bit(s?) to get a new address
- Compare that with:
  - Having two conditional branches
  - Storing two fields, one for each branch
  - Both very unclean
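Bit-ORing amounts to a single OR on the next-address field. A minimal sketch, assuming the shared microcode names an even base address (the X0 of the slide) and a single condition bit selects between X0 and X1; the function name and the example address are hypothetical.

```python
def next_address(base, condition):
    """OR a condition bit into the least significant bit of the base address.

    base must end in 0 so that the OR can only select X0 or X1.
    """
    if base & 1:
        raise ValueError("base address must have its low bit clear")
    return base | int(bool(condition))


if __name__ == "__main__":
    base = 0x10  # illustrative X0 address
    print(f"{next_address(base, False):02X}")  # default branch, X0
    print(f"{next_address(base, True):02X}")   # other branch, X1
```

No comparison or second address field is needed: the two targets sit next to each other in the control store and the condition picks one.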
Thought Experiment:
- What if we provided an explicit branch instead of storing the next-address field in our microprogram?
  - A typical instruction set will need a lot of branches
  - A lot of time will be wasted on branching
A Pat on Our Back
- We provided an explicit field for the address
  - Branch location is now data
  - It is already saved
- Caution: microinstructions can get very wide
- Solution: there is no free lunch.
Can we pipeline microfetch?
- A neat idea:
  - Why wait till the current micro-op is over?
  - The branch field gives the next operation
  - Get the next op
- Caveat:
  - External inputs and status flags may change the order
  - What about interrupts?
    - They are going to follow you everywhere
  - Should have a mechanism that can invalidate microcode prefetch
    - Similar to pipeline flush for instructions
- Commonly used
Historical Perspectives
- Hardwired logic
  - Popular before the 60s
    - Only way people did it
  - Popular now
    - Speed benefits
- Microprogram
  - Popular in the 70s
    - Memory was slower than the CPU
    - No on-chip cache
    - Best way is to store the microcode
  - Now: depends on who you ask
- Shades of gray: extremes of the spectrum are harder to find nowadays
Tools for Design
- Hardwired
  - Any state machine optimizer
  - Assigning states, minimizing transitions, races, hazards, ..
- Microcoding
  - Small ones can be in binary
  - Large ones: use a microassembler
    - Very useful debug tool
    - Can use the microassembler simultaneously with actual hardware development
Hardwired vs Microcoding
- Hardwired units are faster and smaller
- Emulation is easy with microcoding
- Hardwired design is complex if large
- Bugs in hardwired design cannot be fixed in the field
- Hardwired control is not suited for loops
  - Looping with microcode can be made as fast
Hardwired vs Microcode vs RISC
- RISC
  - Simpler instruction set
  - Hardwired implementation
- RISC instructions are like microcode
  - Instructions come from the I-Cache instead of the Control Store
  - Difference: contents are not fixed
  - Advantage: only load what you want on the I-Cache
    - Keeps size smaller as compared to Control Stores
Microprogram vs Software
- Imagine floating point division
- Solution 1: write in software
  - Long process
  - Error prone
  - Many fetches repeatedly from memory for the given sequence of operations
- Solution 2: microcode
  - Long process too, but done by designers, not programmers
  - Relatively error free: more thorough design
  - Requires many cycles, but fetched and used locally
Emulation
- A very common use of microcoding
- IBM System/360
  - 32-bit architecture
  - 16 general-purpose registers
  - Secret: most implementations were 8-bit
    - Keep cost low
    - Heavy microcoding
    - Programmers oblivious
- In 1992, International Meta Systems (IMS) announced the 3250
  - Designed to emulate the x86, 68K, and 6502 architectures
  - Uses customizable microcode, among other techniques
  - Went bust, never released
Another Interesting Note
- Writable Control Store
  - What if you, a programmer, can write your own control store?
  - Not a mad scientist thought
- Implemented in
  - VAX 8800
  - PDP-11/60
  - IBM System/370
Current Trends
- Microcode update
  - Linux utility: microcode_ctl
    - Companion to the IA32 microcode driver
    - It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
  - Update is volatile: lost on reboots
  - Microcode updates are also typically rolled into BIOS updates
    - Ready even before an OS is loaded
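Because an update is lost on reboot, it can be useful to check which microcode revision is currently loaded. On x86 Linux systems `/proc/cpuinfo` reports a `microcode` field per logical CPU; the sketch below only parses that text and does not talk to microcode_ctl or the kernel driver. The function name is hypothetical, and the sample values in the test are made up.

```python
def microcode_revisions(cpuinfo_text):
    """Collect the distinct microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # field is hex, e.g. 0xf0
    return revisions


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="ascii", errors="replace") as fh:
            found = microcode_revisions(fh.read())
    except OSError:
        found = set()  # not Linux, or the file is unavailable
    print(sorted(hex(rev) for rev in found) or "no microcode field found")
```

Comparing the value before and after an update, or across a reboot, shows whether the loaded revision changed.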
Intel Said..
The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST.
Current Trends
- Hyperthreading in P4
  - A second logical CPU
  - Complete state of the system in both CPUs
- Microcoding in P4
  - Two pointers control flow independently
  - Both processors share the ROM entries
  - Access is alternated between the CPUs
Thank You
The ADD and SUB microroutines here differ from the ones on Eckert's website.