Chapter 1 Microcomputers and Microprocessors

  • Chapter 1 Microcomputers and Microprocessors: Microprocessor Evolution and Performance

  • Contents: Introduction to microcomputer systems; microprocessor evolution; the INTEL processor family; microprocessor performance

  • Introduction to Microcomputers: A microcomputer can be viewed as a machine with I/O devices for input/output, a microprocessor for processing, memory units for storage, and buses connecting these components. In 1970, a microcomputer was normally understood as a computer considerably smaller than a minicomputer, possibly using ROM for program storage.

  • Basic Hardware Units: Input, e.g., keyboard, mouse; Microprocessor, e.g., 8085, 8086, MC68000; Memory, e.g., RAM, hard disk; Output, e.g., monitor, printer

  • Buses: external connections to the input/output units. Major buses: address bus (addresses of memory locations containing instructions or data); data bus (contents of memory locations); control bus (synchronization and handshaking between components)

  • General Architecture [figure: input unit, microprocessing unit, output unit, and a memory unit made up of primary and secondary memory]

  • Processor History: Vacuum Tubes to ICs

  • First Generation Computers: vacuum tube technology; large, air-conditioned rooms; tube lifetime: 3,000 hours; a useless machine? 1951: the first UNIVAC I (UNIVersal Automatic Computer) delivered; 1952: CBS used it to predict the presidential election; 1952: IBM Model 701 Data Processing System

  • Second Generation Computers: The Transistor Is Born (Solid-State Era). 1948: invention of the bipolar transistor; 1956: Nobel Prize in Physics to Drs. William Shockley, John Bardeen and Walter H. Brattain (Bell Labs); 1954: Bell Labs' all-transistorized computer (TRADIC): 800 transistors, much less heat, more reliable and less costly

  • Second Generation Computers: Mainframe Computers. 1958: IBM's first transistorized computers, the 7070/7090; 1959: the 1401 (business-oriented model). Built on circuit boards mounted in rack panels, or frames; main frame (mainframe): the CPU portion of the computer. Popular with business and industry

  • Third Generation Computers: invention of the IC, 1959: Dr. Robert Noyce (Fairchild) and Jack Kilby (TI). Kilby fabricated resistors, capacitors and transistors on a germanium wafer and connected these parts with fine gold wires. Noyce isolated individual components with reverse-biased diodes and deposited an adherent metal film over the circuit, thus connecting the components. First IC: a 2-transistor multivibrator. By the mid-1960s, memory chips with 1,000 components were common

  • Third Generation Computers: 1964: IBM 360 Series (32-bit), the first to use IC technology; a family of 6 compatible computers; 40 different I/O and auxiliary storage devices; memory capacity: 16K words to over 1 MB; 16 × 32-bit registers; 24-bit address bus; 128-bit data bus

  • Third Generation Computers: 1964: IBM 360 Series (32-bit), 375,000 computations per second

  • Minicomputer: 1960s: space race between the US and USSR; IC industry boom; a tremendous demand from scientists and engineers for an inexpensive computer they could operate by themselves. 1965: DEC PDP-8 (by Edson de Castro's group), a low-cost ($25,000) 12-bit minicomputer; later the 16-bit PDP-11 and the supermini

  • Microprocessors: CPU on a Chip: 1968: INTEL (Integrated Electronics) founded by Robert Noyce and Gordon Moore (from Fairchild); original goal: the semiconductor memory market. 1969: custom ICs for Busicom's calculator; Ted Hoff and Stan Mazor proposed a 4-bit CPU on a single chip, plus ROM and RAM chips

  • Microprocessors: CPU on a Chip: 1971: the 4000 family, by Federico Faggin. 4001: 2K-bit ROM with a 4-bit I/O port; 4002: 320-bit RAM with a 4-bit output port; 4003: 10-bit serial-in, parallel-out shift register; 4004: 4-bit processor. Processor on a chip: the microprocessor era

  • Microprocessors: CPU on a Chip: 1972: 8008, 8-bit; 1974: 8080, an improved version

  • Microprocessors: CPU on a Chip: 8-bit CPUs with 16-bit addresses (64K). MC6800: Motorola; 6502: MOS Technology (spin-off from Motorola), used in the Apple II with Apple DOS; Z-80: Zilog (spin-off from Intel), Z-80 cards for the Apple II running CP/M

  • Microprocessors: CPU on a Chip: 16-bit CPUs (late 1970s). 8086, 80186, 80286: Intel; PC, PC-DOS, MS-DOS, SCO Unix. MC68000: Motorola. 16-bit instructions; hardware multiply and divide; 20-bit address buses (1 MB). Workstations: Sun-3

  • Microprocessors: CPU on a Chip: 32-bit CPUs: 80386, 80486 (Intel); MC68020, 68030 (Motorola). "64-bit" CPUs: Pentium, Pentium Pro (64-bit external data bus but 32-bit internal registers, so not recognized as 64-bit CPUs in terms of internal register word length)

  • Microcomputers: Computers Based on Microprocessors: 1975: MITS Altair 8800 kit, $399, Intel 8080, programmed by depositing 1s and 0s via front-panel switches. Other computers boomed: 8080: MITS; 6800: SWTPC 6800; Z-80: TRS-80; 6502: Apple I (8K, programmed in BASIC). Steve Jobs and Steve Wozniak became millionaires from their PC company

  • Personal Computers: the Open Architecture Era: 1981: IBM PC with a system board (motherboard), an Intel 8088 processor, 16K memory, and 5 expansion slots; third-party vendors supplied various I/O adapter cards. Open architecture: a computer with interchangeable components

  • Microcontrollers: Microcomputers on a Chip: a microcontroller is a computer on a chip: a microprocessor plus on-chip memory plus input/output ports. 1995: microcontrollers outsold microprocessors 10:1. Embedded in many kinds of equipment: thermostats, machine tools, communication and automotive systems. Evolution: ever greater I/O capabilities. Intel: MCS-51, MCS-96

  • High-Performance Processors: Supercomputers, used for aircraft design, global climate modeling, oil-bearing formations, molecular design of new drugs, financial behavior. CDC 6600, 7600: Seymour Cray. Cray-1 (1976), the first true supercomputer: ECL, 128 kW power consumption, 130 MFLOPS (Pentium 100: 150 MFLOPS), $5.1 million

  • High-Performance Processors: Parallel Processors, reaching tens of gigaflops. Multiple processors wired by a common bus, each given a portion of the problem to solve. Hypercube (early 1980s): Cosmic Cube, iPSC (with i860 RISC chips). 2D rectangular mesh architecture: multiple processors at each node. Intel: a teraflops computer with 4,500 nodes, each powered by 2 Pentium Pro 200 processors

  • RISC vs. CISC: RISC: Reduced Instruction Set Computer (1980s): a small number of fixed-length instructions; simple addressing modes; a large number of registers; instructions executed in one clock cycle. Example: Intel i860 ("Cray on a chip"): 82 instructions, each 32 bits long; four addressing modes; 32 general-purpose registers

  • RISC vs. CISC: CISC: Complex Instruction Set Computer: a large number of variable-length instructions; multiple addressing modes; a small number of registers; multiple clock cycles to execute. Example: Intel 8086: over 3,000 instruction forms, 1 to 6 bytes long; 9 addressing modes; 8 general-purpose registers; execution takes from 2 to 80+ cycles

  • RISC vs. CISC: in RISC, the control unit is much simpler (simple instructions, executed in 1 clock), giving faster execution with less total on-chip logic. The control unit takes about 10% of the chip area (vs. 50% for CISC), leaving more area for the register file, data and instruction caches, FPU, and coprocessors. Examples: PowerPC (32-bit, by IBM, Apple, Motorola); SPARC (for Sun Microsystems workstations)

  • Application-Specific Processors: DSP chips, mostly for processing (digitized) analog signals. ADC → DSP → DAC architecture: avoids processing analog signals with discrete circuits involving capacitors and inductors. DSPs perform complex mathematical functions, e.g., digital filtering and spectrum analysis

  • Application-Specific Processors: DSP chip architecture: separate data and program areas (Harvard architecture); hardware multipliers and adders optimized to execute in a single cycle; arithmetic pipelining (several instructions operated on at once); hardware loop control; multiple I/O ports for communication with other processors
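
The multiply-accumulate pattern that these hardware multipliers, adders, and loop controllers accelerate can be seen in a direct-form FIR digital filter. The following Python sketch is illustrative only (a real DSP would run this loop in dedicated fixed-point hardware); the function name `fir_filter` and the 4-tap moving-average coefficients are hypothetical choices for the example.

```python
from typing import List, Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0  # accumulator, the role played by a DSP's MAC unit
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y


# 4-tap moving-average filter applied to a step input
print(fir_filter([0, 0, 4, 4, 4, 4], [0.25, 0.25, 0.25, 0.25]))
# [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

The inner loop body is exactly what DSP hardware loop control repeats once per tap without branch overhead.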

  • Summary of Processor History: 1940s: vacuum tubes, large and power-hungry; 1950s: transistors (1948-); 1959: first IC (the second industrial revolution); 1960s: ICs became popular for building CPUs; 1971: Intel 4004 microprocessor (2,300 transistors), the start of the microprocessor age; late 1970s: 8080/85

  • Summary of Processor History: 1980: RISC (reduced instruction set computer); CISC (complex instruction set computer) vs. RISC. CISC families: Intel 80x86, Pentium; Motorola 68000 series. Most other processor families are RISC.

  • Evolution of INTEL Processors: 4004 ('71) to Pentium Pro ('93-)

  • INTEL: Integrated Electronics. 1968: founded by Robert Noyce and Gordon Moore. IA: Intel Architecture (e.g., IA-16, IA-32, IA-64); since the 8008 (1972) it has become the de facto standard. Evolution of: internal register sizes; external bus widths; Real, Protected, and Virtual 8086 modes

  • 4-bit Processors: the 4004, the first microprocessor, became available in 1971. 4-bit microprocessor: 4-bit registers and 4-bit data bus; 2,250 transistors; minimum feature size: 10 microns; address bus: 10 bits (1K); 0.06 MIPS (@ 0.108 MHz); no internal cache

  • 8-bit Processors: 8008 (1972), 8080 (1974), 8085 (1976): 8-bit microprocessors

  • 8086: the IA standard. Became available in 1978. 16-bit data bus; 20-bit address bus (16-bit on the 8080); memory organization: 16 segments of 64KB (1 MB limit); CPU reorganized into a BIU (bus interface unit) and an EU (execution unit), allowing fetch and execution to proceed simultaneously; internal registers expanded to 16 bits, with separate access to the low/high bytes

  • 8086: hardware multiply and divide instructions; external math coprocessor; instruction set compatible with the 8080/8085 at the assembly-source level. The 8086 defined the 80x86 architecture

  • 8086: not quite successful at first: its 16-bit data bus requires two separate 8-bit memory banks, and memory chips were expensive

  • 8088: the PC standard. Became available in 1979, almost identical to the 8086. 8-bit data bus, for hardware compatibility with the 8080; 16-bit internal registers and internal data bus (same as the 8086); 20-bit address bus (16-bit on the 8080); redesigned BIU; memory organization: 16 segments of 64KB (1 MB limit). Two memory accesses are needed for 16-bit data (less efficient), but the cost is lower. Used in the IBM PC (1981): 16K-64K memory, 4.77 MHz

  • 80186, 80188: High-Integration CPUs. A PC system is an 8088 CPU plus various supporting chips: clock generator; 8251: serial I/O (RS-232); 8253: timer/counter; 8255: PPI (programmable peripheral interface); 8257: DMA controller; 8259: interrupt controller. 80186/80188: 8086/8088 plus these supporting functions on chip; compatible instruction set (+ 9 new instructions)

  • 80286: became available in 1982; used in the IBM AT computer (1984); 16-bit data bus; clock speed 25% faster than the 8088, throughput 5 times greater than the 8088; 24-bit address bus (16 MB), vs. 20-bit (1 MB) on the 8086

  • 80286: Real vs. Protected Modes: larger address space via the 24-bit address bus. Real Mode: the power-on default mode; functions like an 8086, using the 20 least significant address lines (1 MB); software compatible with the 8086. 16 new instructions (for Protected Mode management). A faster 286: redesigned processor plus a higher clock rate (6-8 MHz)

  • 80286: Real vs. Protected Modes: Protected Mode: a multi-program environment; each program has a predetermined amount of memory, addressed via segment selectors (physical addresses are invisible), 16 MB addressable; multiple programs loaded at once, each within its own segments and protected from reads/writes by the others

  • 80286: Real vs. Protected Modes: Protected Mode cannot be switched back to Real Mode (short of a processor reset), to prevent illegal access by switching back and forth between modes. Was the 286 then only a faster 8086? MS-DOS requires that all programs run in Real Mode

  • Clock Speed: electrical signals cannot change instantaneously (a transition period is required); the system clock provides the timing signal for synchronization. Clock speed alone cannot be used to compare the performance of microprocessors with different internal designs or instruction sets: e.g., a 66 MHz Pentium is about twice as fast as a 66 MHz 80486
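
One way to see why equal clock rates do not imply equal performance is the standard relation execution time = instruction count × CPI / clock rate, where CPI is the average number of clock cycles per instruction. The Python sketch below assumes hypothetical CPI values (2.0 for the 486, 1.0 for the Pentium) chosen only to reproduce the slide's 2x figure; they are not measured numbers.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """Execution time = instruction count x cycles per instruction / clock rate."""
    return instruction_count * cpi / clock_hz


CLOCK_HZ = 66e6        # both processors run at 66 MHz
N = 1_000_000          # hypothetical instruction count for one program

t_486 = cpu_time_seconds(N, cpi=2.0, clock_hz=CLOCK_HZ)      # hypothetical CPI
t_pentium = cpu_time_seconds(N, cpi=1.0, clock_hz=CLOCK_HZ)  # hypothetical CPI

print(f"speedup = {t_486 / t_pentium:.1f}x")  # speedup = 2.0x
```

With the clock held constant, the whole difference comes from cycles per instruction, which is what superscalar and pipelined designs reduce.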

  • 80386DX (aka 80386): available in 1985, a major redesign of the 8086/286; compatibility commitment through 2000; 32-bit data and address buses (4 GB memory). Real Address Mode: 1 MB visible, like the 286 Real Mode. Protected Virtual Address Mode: on-board MMU; segmented tasks of 1 byte to 4 GB; segment base, limit, and attributes defined by a descriptor register; page swapping with 4K pages, up to 64 TB of virtual memory space. Used by Windows, OS/2, Unix/Linux

  • 80386DX (aka 80386): Virtual 8086 mode (a special Protected Mode feature) permits multiple 8086 virtual machines, i.e., multitasking of real-mode-like programs, e.g., Windows running multiple MS-DOS sessions. Clock rate: max. 40 MHz, 2 clock pulses per read/write bus cycle. External memory cache of fast SRAM to avoid wait states: 93% hit rate with a 64K cache. Compatible instruction set (14 new instructions)
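
The benefit of the external cache can be estimated with the usual average-access-time formula, $t_{avg} = h \cdot t_{cache} + (1 - h) \cdot t_{mem}$, where $h$ is the hit rate. This Python sketch uses the slide's 93% hit rate; the 2-clock hit and 4-clock miss timings are hypothetical values for illustration only.

```python
def effective_access_time(hit_rate: float, t_cache: float, t_memory: float) -> float:
    """Average access time when a fraction hit_rate of accesses hit the cache."""
    return hit_rate * t_cache + (1.0 - hit_rate) * t_memory


# 93% hit rate from the slide; timings in bus clocks are hypothetical:
# 2 clocks for a zero-wait-state SRAM hit, 4 clocks for a DRAM access with wait states.
print(round(effective_access_time(0.93, 2, 4), 2))  # 2.14
```

Even with a slow main memory, the average stays close to the cache's own speed as long as the hit rate is high.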

  • 80386SX: for the transition to 32-bit: 16-bit data bus with 32-bit registers; 24-bit address bus

  • 80486DX: 1989: a polished 386 with 6 new OS-level instructions; virtually identical to the 386 in terms of compatibility. RISC design concepts: fewer clock cycles per operation, a single clock cycle for the most frequently used instructions. Max. 50 MHz. 5-stage execution pipeline: portions of 5 instructions execute at once

  • 80486DX: highly integrated: on-board 8K memory cache; FPP (floating-point processor, equivalent to the external 80387 coprocessor). Twice as fast as a 386 at any given clock rate: a 20 MHz 486 ≈ a 40 MHz 386

  • 80486SX: NOT a 16-bit version for transition purposes (unlike the 386SX). No floating-point coprocessor, though it keeps the 8K internal cache; for low-end applications; max. 33 MHz only

  • 80486DX2/DX4: Overdrive Chips. Processor speeds increased so fast that redesigning the microcomputer (system board) for compatibility became harder. Solution: separate the internal speed from the external speed, so each can be improved independently. The 80486DX2/DX4 internal clock is two/three times (NOT four times) the external clock: the processor runs faster internally

  • 80486DX2/DX4: Overdrive Chips. System board design is independent of processor upgrades (less expensive components are allowed). The processor operates at its maximum data rate internally; only slow accesses to external data run at the system board rate, and the internal cache offsets the speed gap. 486DX2-66: 66 MHz internal, 33 MHz external (2x); 486DX4-100: 100 MHz internal, 33 MHz external (3x). Overdrive sockets: for upgrading a 486DX/SX to a 486DX2/DX4 (with Overdrive socket pin-outs)

  • Pentium: Superscalar Processor: available in 1993; 32-bit architecture; superscalar architecture. Scaling: shrinking the etchable feature size to increase IC complexity (e.g., DRAM), from 10 microns (4004) to 0.13 microns (2001). Superscalar goes beyond simply scaling down: two instruction pipelines, each with its own ALU, address generation circuitry, and data cache interface, execute two different instructions simultaneously

  • Pentium: Superscalar Processor: on-board cache: separate 8K data and code caches to avoid access conflicts. FPP: 8-stage instruction pipeline; optimized floating-point functions; 5x-10x the FLOPS of the 486. 2x the performance of a 486 at any clock rate

  • Pentium: Superscalar Processor: compatibility with the 386/486: internal 32-bit registers and address bus; data bus expanded to 64 bits for a higher data transfer rate (compare the 8088 and 386SX transitions)

  • Pentium: Superscalar Processor: non-clone competition from AMD and Cyrix; development of a brand identity by Intel

  • Pentium Pro: Two Chips in One. Became available in 1995. Superscalar of degree 3: can execute 3 instructions simultaneously. Optimized for 32-bit operating systems (e.g., Windows NT, OS/2 Warp). Two separate silicon dies in the same package: the processor (0.35 µm, 5.5 million transistors) and a 256KB (or 512KB) Level 2 cache (15.5 million transistors in a smaller area)

  • Pentium Pro: Two Chips in One. The on-board Level 2 cache simplifies system board design, requires less space, and gains faster communication with the processor. Internal (Level 1) cache: 8K. Pentium Pro 133 ≈ 2x Pentium 66 ≈ 4x 486DX2-66

  • Pentium Pro: Dynamic Execution. Dynamic execution reduces idle processor time by predicting instruction behavior. Multiple branch prediction: looks as far as 30 instructions ahead to anticipate program branches. Data flow analysis: examines upcoming instructions to determine whether they are ready for processing, given their dependence on other instructions, and determines optimal execution sequences. Speculative execution: executes instructions in a different order than they were entered; speculative results are stored until their final states can be determined

  • Processor Future: What's More from Moore's Law?

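  • Moore's Law: In 1965, Gordon Moore predicted that:

    The number of transistors per integrated circuit would double every year

    He forecast that this trend would continue through 1975. In 1975 he revised the rate to a doubling every two years; the widely quoted "every 18 months" is a later restatement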

  • Moore's Law [chart]

  • Other Microprocessors: Motorola family, from the 6809 through the 68040; PowerPC, a joint venture between Apple, IBM, and Motorola; RISC processors: DEC Alpha, MIPS, Sun SPARC, etc.

  • CISC vs. RISC: CISC (Complex Instruction Set Computer) processors have a large, versatile instruction set that supports many complex addressing modes, moving complexity from software to hardware. RISC (Reduced Instruction Set Computer) processors have a small instruction set, moving complexity from hardware to software

  • Microprocessor Performance: two main factors:

    Response time: the time between the start and completion of a task, also referred to as execution time. Throughput: the total amount of work done in a given time

  • MIPS: Million Instructions Per Second. $\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time (s)} \times 10^{6}}$. It specifies performance inversely to execution time: faster machines have a higher MIPS rating

  • Some Problems of MIPS: it cannot compare computers with different instruction sets, since the instruction counts will certainly differ; MIPS also varies between programs on the same computer
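
A small Python sketch of the MIPS formula, also illustrating the first problem above: two hypothetical machines with different instruction sets run the same task, and the one with the higher MIPS rating finishes later. All instruction counts and times are invented for illustration.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time in seconds x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


# Hypothetical program: 50 million instructions executed in 0.5 s
print(mips(50_000_000, 0.5))  # 100.0

# Same task on two hypothetical machines with different instruction sets:
cisc = mips(20_000_000, 0.4)  # fewer, more complex instructions: about 50 MIPS
risc = mips(60_000_000, 0.5)  # more, simpler instructions: about 120 MIPS
# The RISC machine has the higher MIPS rating, yet it takes longer (0.5 s vs. 0.4 s).
```

Execution time for the same task, not MIPS, is the fair comparison across instruction sets.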

  • iCOMP: an index provided by Intel for comparing the performance of its 32-bit microprocessors; based on a variety of performance components representing integer mathematics, graphics, etc.; combines the results of a set of software application benchmarks

  • Chapter 2 Computer Codes, Programming, and Operating Systems: Number Systems; Computer Codes; Programming; Operating Systems

  • Number Systems: Decimal: base 10; Binary: base 2; Octal: base 8; Hexadecimal: base 16

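  • Base Conversion (2 ↔ 10): Binary to decimal: $D = \sum_{i=0}^{n-1} b_i \times 2^i$. Decimal to binary by repeated subtraction: find the largest $2^m \le D$, set $b_m = 1$, and continue with $D' = D - 2^m = \sum_{i=0}^{m-1} b_i \times 2^i$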
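
Both conversions above can be sketched in Python for non-negative integers: `binary_to_decimal` evaluates the sum directly, and `decimal_to_binary` follows the repeated-subtraction method (rather than the more common repeated-division method). The function names are hypothetical.

```python
def binary_to_decimal(bits: str) -> int:
    """D = sum of b_i * 2**i, with b_0 the rightmost bit."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


def decimal_to_binary(d: int) -> str:
    """Repeated subtraction for d >= 0: remove the largest 2**m <= d, set b_m = 1, repeat."""
    if d == 0:
        return "0"
    bits = [0] * d.bit_length()
    while d > 0:
        m = d.bit_length() - 1  # largest m with 2**m <= d
        bits[m] = 1
        d -= 1 << m             # D' = D - 2**m
    return "".join(str(b) for b in reversed(bits))


print(binary_to_decimal("101101"), decimal_to_binary(45))  # 45 101101
```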

  • MCS-51 Program Development [figure: Editor → Assembler (X8051) → Linker (Link) → Symbol Converter (CVTSYM) → ICE / target; files: Program.ASM, .OBJ, .HEX, .SYM, .SDT]

  • Chapter 3 80x86 Processor Architecture: 8086/88; Segmented Memory; 80386; 80486; Pentium; Pentium Pro

  • The 8086 and 8088: Processor Model; Programming Model

  • 8086 Processor Model: BIU + EU. BIU: memory and I/O address generation. EU: receives code and data from the BIU; not connected to the system buses; executes instructions; saves results in registers, or passes them to the BIU for memory and I/O

  • 8086 Processor Model [figure: BIU (address generation and bus control, instruction queue) and EU]

  • Fetch and Execution Cycle: the BIU + EU split allows the fetch and execution cycles to overlap:

    0. System boot: the instruction queue is empty
    1. IP → BIU → address bus, and IP++
    2. Mem[IP-1] → InstrQ[tail++]
    3a. InstrQ[head] → EU → execution
    3b. Mem[IP++] → InstrQ[tail++], possibly several instructions
    Repeat 3a + 3b (overlapped)

  • Waiting Conditions: Memory Access. With BIU + EU, the processor executes (almost) continuously without waiting. Waiting condition: accessing memory locations not in the queue. The BIU suspends instruction fetch, issues the external memory address, and then resumes instruction fetch and execution

  • Waiting Conditions: Jump. When the next instruction is a jump, the instructions in the queue are discarded; the EU waits for the instruction at the jump target to be fetched by the BIU, then resumes execution

  • Waiting Conditions: Long Instructions. While a long instruction is being executed and the instruction queue is full, the BIU waits; it resumes instruction fetch after the EU pulls one or two bytes from the queue

  • BIU: 8088 vs. 8086. The BIU is the major difference. 8088: 8-bit data bus (vs. 16-bit on the 8086); 4-byte instruction queue (vs. 6-byte on the 8086). Only 30% slower than the 8086 if the queue is kept full

  • 8086 Programming Model

  • 8086 Programming Model: Data Group: AX (AH + AL): Accumulator; BX (BH + BL): Base; CX (CH + CL): Counter; DX (DH + DL): Data
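
The byte-register aliasing of the data group (AX = AH:AL, and likewise BX, CX, DX) amounts to simple shifts and masks. A minimal Python sketch, with hypothetical helper names `split_ax` and `join_ax`:

```python
from typing import Tuple


def split_ax(ax: int) -> Tuple[int, int]:
    """Return (AH, AL): the high and low bytes of a 16-bit register value."""
    ax &= 0xFFFF
    return (ax >> 8) & 0xFF, ax & 0xFF


def join_ax(ah: int, al: int) -> int:
    """Rebuild AX from AH (bits 15..8) and AL (bits 7..0)."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_ax(0x1234)
print(hex(ah), hex(al), hex(join_ax(ah, al)))  # 0x12 0x34 0x1234
```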

  • 8086 Programming Model: Segment Group: CS: Code Segment; DS: Data Segment; ES: Extra Segment; SS: Stack Segment. Segment registers hold the base addresses of particular segments

  • 8086 Programming Model: Pointer/Index Group: IP: Instruction Pointer (with CS); SI: Source Index (with DS); DI: Destination Index (with ES); SP: Stack Pointer (with SS). Index registers hold an index (offset) or a pointer relative to a base address

  • 8086 Flag Word: low byte (bits 7-0): SF ZF X AF X PF X CF (X = unused). AF: auxiliary carry: carry/borrow on bit 3 (low nibble of AL); SF: sign flag (0: positive, 1: negative); ZF: zero flag (1: result is zero); PF: (even) parity flag (1 if the low-order 8 bits of the result contain an even number of 1s); CF: carry flag (carry/borrow out of the most significant bit)

  • 8086 Flag Word: high byte (bits 15-8): X X X X OF DF IF TF. TF: trap flag (single-step after the next instruction; cleared by the single-step interrupt); IF: interrupt-enable flag: enables maskable interrupts; DF: direction flag: auto-decrement (1) or auto-increment (0) of the index registers in string operations; OF: overflow flag: the signed result cannot be expressed within the number of bits in the destination operand
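
As an illustration of how the status flags are derived, this Python sketch computes the result and flags of an 8-bit ADD using the definitions above (CF, PF, AF, ZF, SF, OF). It models only the arithmetic, not the 8086's flag register layout or any other instruction, and the function name `add8_flags` is hypothetical.

```python
from typing import Dict


def add8_flags(a: int, b: int) -> Dict[str, int]:
    """Result and status flags for an 8-bit ADD (e.g., ADD AL, imm8)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    r = total & 0xFF
    return {
        "result": r,
        "CF": int(total > 0xFF),                     # carry out of bit 7
        "PF": int(bin(r).count("1") % 2 == 0),       # even number of 1s in the result byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),      # carry out of bit 3 (low nibble)
        "ZF": int(r == 0),                           # result is zero
        "SF": (r >> 7) & 1,                          # copy of the sign bit
        "OF": int(((a ^ r) & (b ^ r) & 0x80) != 0),  # signed overflow
    }


# 127 + 1 overflows a signed byte: SF = 1, OF = 1, AF = 1, CF = 0
print(add8_flags(0x7F, 0x01))
```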

  • Segmented Memory: Linear vs. Segmented. Linear addressing: the entire memory is regarded as a whole, and the entire memory space is available all the time. Segmented: memory is divided into segments, and a process is limited to accessing designated segments at a given time

  • 8086 Memory Organization: even and odd memory banks on a 16-bit data bus, for one two-byte or two one-byte accesses. This allows the processor to work on bytes or on words (16-bit): I/O operations are normally conducted in bytes, and odd-length instructions can be handled (single-byte instructions as well as multi-byte, even very long, instructions)

  • 8086 Memory Organization: memory space: 20-bit address bus, 1 MB linearly and directly addressable. Memory banks: 16-bit data (512K words) can be read from the even- and odd-addressed banks simultaneously, which needs two memory banks in parallel; the BHE control line allows addressing the even bank, the odd bank, or both

  • Memory Organization: Alignment. Endianness: one way to model a multi-byte CPU register is AX = AH + AL; there are two ways to store such operands in memory. Big-endian CPU (IBM 370, M68*, SPARC): high-order byte first (HOBF); maps the highest-order byte of the internal register to the lowest (first) memory byte address; the operand address is the address of the MSB. MOV R1, N: N is the first byte in memory and the MSB of the register

  • Memory Organization: Alignment. Little-endian CPU (DEC, Intel): low-order byte first (LOBF); maps the lowest-order byte of the register to the first memory byte; the operand address is the address of the LSB (the first memory byte). MOV AX, N: N is the first byte in memory and the LSB of the register, i.e., AL ← [N], AH ← [N+1]. Configurable: a CPU can switch between big- and little-endian, or provide instructions that convert 16-/32-bit data between the two byte orderings (80486)
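
Python's `struct` module can show the two byte orders directly: packing the 16-bit value 0x1234 little-endian (Intel) puts 0x34 (AL) at the lower address, while big-endian puts 0x12 first. The `bswap32` helper is a hypothetical name that mimics the effect of the 80486 BSWAP instruction, which reverses the bytes of a 32-bit register.

```python
import struct


def bswap32(value: int) -> int:
    """Reverse the byte order of a 32-bit value (the effect of the 486 BSWAP instruction)."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


value = 0x1234                     # e.g., the contents of AX
little = struct.pack("<H", value)  # Intel order: AL at address N, AH at N+1
big = struct.pack(">H", value)     # big-endian order (e.g., 68000): high byte first

print(little.hex(), big.hex())     # 3412 1234
print(hex(bswap32(0x12345678)))    # 0x78563412
```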

  • 8086 Memory Organization: Aligned operands: operands aligned at even-byte (word/dword) boundaries allow a single access to read/write one operand, through an internal shift/swap mechanism if necessary. Mis-aligned words: a word operand not starting at an even address needs 2 read cycles to read/write the word (8086); the processor issues two addresses to access the two even-aligned words containing the operand. This is slower but transparent to the programmer

  • 8086 Memory Organization: the 8088 always takes 2 cycles for word operations, aligned or not, because of its 8-bit external data bus; a single memory bank is sufficient
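
The bank-selection and alignment rules above can be modeled in a short Python sketch that lists, for each 8086 bus cycle, the levels of A0 (low selects the even bank) and BHE# (low, i.e. active, selects the odd bank), and compares the 8088's byte-at-a-time transfers. It is a simplified model assuming one byte or one word per access; the function names are hypothetical.

```python
from typing import List, Tuple


def bus_cycles_8086(address: int, size: int) -> List[Tuple[int, int]]:
    """(A0, BHE#) per bus cycle for an 8086 byte (size=1) or word (size=2) access."""
    if size == 1:
        odd = address & 1
        return [(odd, 0 if odd else 1)]  # odd byte: BHE# active; even byte: A0 low
    if size == 2:
        if address % 2 == 0:
            return [(0, 0)]              # aligned word: both banks in one cycle
        return [(1, 0), (0, 1)]          # mis-aligned: odd byte, then next even byte
    raise ValueError("size must be 1 or 2")


def bus_cycles_8088(address: int, size: int) -> int:
    """8-bit external bus: one transfer per byte, aligned or not."""
    return size


print(bus_cycles_8086(0x200, 2), bus_cycles_8086(0x201, 2), bus_cycles_8088(0x200, 2))
```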

  • 8086 Memory Map: how the memory space is allocated. ROM area: boot, BIOS; RAM: OS/user applications and data; unused; reserved: for future hardware/software uses; dedicated: for specific system interrupt and reset functions, etc.

  • Segment Registers: 16 memory segments of 64K, each with a 16-bit offset; CS, DS, ES, SS

  • Logical and Physical Addresses: physical address: 20-bit; logical address: a 16-bit segment and a 16-bit offset; segments start on 16-byte boundaries. Address translation, e.g., CS:IP
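
The address translation is physical address = segment × 16 + offset, i.e., the 16-bit segment value shifted left by 4 bits plus the 16-bit offset. A minimal Python sketch (hypothetical function name), including the 8086 wrap-around past 1 MB and the fact that different logical addresses can map to the same physical address:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 address bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(hex(physical_address(0x1000, 0x0010)))  # 0x10010
print(hex(physical_address(0x1001, 0x0000)))  # 0x10010: same byte, different logical address
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0: wraps around on the 8086
```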

  • 80286: the first Intel processor with Protected Mode (its Real and Protected Modes are reviewed in the Chapter 1 slides on the 80286)

  • 80386 Model: refines the 286 Protected Mode; expands to 32-bit registers; adds the new Virtual 8086 Mode

  • 80386: Real vs. Protected Modes: larger address space via the 32-bit address bus (4 GB). Real Mode vs. Protected Mode (refined from the 286). Real Mode: the power-on default mode; functions like an 8086: (1) uses only the 20 least significant address lines (1 MB); (2) retains segmented memory (64K segments); software compatible with the 286. New Real Mode features: access to the 32-bit register set; two new segment registers, FS and GS

  • 80386: Real vs. Protected Modes: Protected Mode: a new addressing mechanism vs. Real Mode; supports protection levels; segment size: 1 byte to 4 GB (not a fixed 64K); a segment register holds a pointer into a descriptor table, not a base address

  • 80386: Real vs. Protected Modes: Protected Mode: descriptor table (8 bytes per entry): 32-bit segment base address, segment size, access rights. Memory address = base address (from the table) + offset (from the instruction)

  • 80386: Real vs. Protected Modes: Protected Mode: paging mechanism: maps the 32-bit linear address (base + offset) to a physical address within a page frame (4K page frames in system memory); 64 TB of virtual memory
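
On the 80386 the paging unit splits a 32-bit linear address into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset within the 4K page frame. The Python sketch below shows that split; the `page_frames` dictionary is a hypothetical stand-in for the two-level page tables, not how the hardware stores them.

```python
from typing import Dict, Tuple


def split_linear_address(linear: int) -> Tuple[int, int, int]:
    """Split a 32-bit 80386 linear address into (directory, table, offset) fields."""
    linear &= 0xFFFFFFFF
    directory = linear >> 22        # bits 31..22: page-directory entry
    table = (linear >> 12) & 0x3FF  # bits 21..12: page-table entry
    offset = linear & 0xFFF         # bits 11..0: byte within the 4K page frame
    return directory, table, offset


# Hypothetical mapping: (directory, table) -> base address of a 4K page frame
page_frames: Dict[Tuple[int, int], int] = {(1, 3): 0x0012_3000}

d, t, off = split_linear_address(0x0040_3025)
physical = page_frames[(d, t)] + off
print(d, t, hex(off), hex(physical))  # 1 3 0x25 0x123025
```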

  • 80386: Real vs. Protected Modes: Protected Mode: protection mechanism: tasks, data, and instructions are assigned a privilege level (PL), from 0 (most privileged) to 3 (least privileged); tasks running at a less privileged level cannot access tasks or data segments at a more privileged level, so running programs are protected from one another

  • 80386: Real vs. Protected Modes: two ways to run 8086 programs: Real Mode and Virtual 8086 Mode. Virtual 8086 Mode runs multiple 8086 programs plus other 386 (Protected Mode) programs independently; each 8086 program sees 1 MB (mapped via paging to anywhere in the 4 GB space), so V86 and Protected Mode programs run simultaneously

  • 80386 Processor Model [figure]

  • 80386 Processor Model: BIU + CPU + MMU. BIU: controls the 32-bit address and data buses; keeps the instruction queue (16 bytes) full. Address pipelining: the address of the next memory location is output halfway through the current bus cycle, giving more address decode time, so slower memory chips are acceptable and it is easier to keep up with the faster (2-clock) bus cycle of the 386

  • 80386 Processor Model: BIU: dynamic data bus sizing switches between 16- and 32-bit data buses on the fly, to accommodate external 16-bit memory cards or I/O devices, adjusting the bus timing to use only the least significant 16 bits

  • 80386 Processor Model: BIU: external memory is organized as 4 memory banks (4 × 8 = 32 bits), with BE0-BE3 for bank selection, to access a byte, word, or doubleword. Aligned operands: 1 bus cycle; mis-aligned (address not divisible by 4): 2 bus cycles
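
Strictly, a second bus cycle is needed only when an operand crosses a doubleword (4-byte) boundary; a word at an address ending in 1, for instance, still fits in byte lanes 1 and 2. This Python sketch lists the active byte lanes (BE0#-BE3#) per bus cycle for a given access; it does not model the order in which the processor actually issues the two cycles, and the function name is hypothetical.

```python
from typing import List


def byte_enables(address: int, size: int) -> List[List[int]]:
    """Active byte lanes (0..3, i.e. BE0#..BE3#) for each 80386 bus cycle of one access."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    cycles: List[List[int]] = []
    current: List[int] = []
    for a in range(address, address + size):
        lane = a % 4
        if current and lane == 0:  # crossed a doubleword boundary: start a new bus cycle
            cycles.append(current)
            current = []
        current.append(lane)
    cycles.append(current)
    return cycles


print(byte_enables(0x100, 4))  # [[0, 1, 2, 3]]: aligned doubleword, 1 cycle
print(byte_enables(0x101, 4))  # [[1, 2, 3], [0]]: mis-aligned doubleword, 2 cycles
```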

  • 80386 Processor Model: CPU = IU (instruction unit) + EU (execution unit); fetching and execution overlap. IU: retrieves instructions from the queue, decodes them, and stores them in the decoded-instruction queue. EU: ALU + registers (32-bit); executes decoded instructions

  • 80386 Processor Model: MMU. Segmentation unit: in Real Mode, generates the 20-bit physical address; in Protected Mode, stores base/size/rights in descriptor registers, caching the descriptor tables held in RAM for faster operation. Paging unit: determines the physical addresses associated with active segments (divided into 4K pages); virtual memory support allows larger programs

  • 80386 Programming Model: general-purpose registers (data and address groups); status and control flags (VM, RF, NT, IOPL); segment group

  • 80386 Programming Model: Special-Purpose Registers

  • 80386 Programming Model: memory management: segment descriptors keep the base, size, and access rights; 3 types of tables: global (GDT), local (LDT), interrupt (IDT). Addressing: index (into a table) + RPL; base + offset (from the instruction). Paging: TLB
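
A segment selector packs the descriptor-table index together with a table indicator (TI: 0 = GDT, 1 = LDT) and the RPL in its two low bits; since each descriptor is 8 bytes, the index times 8 gives the descriptor's byte offset within its table. A minimal Python decoding sketch with hypothetical names:

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # entry number in the descriptor table (bits 15..3)
    ti: int     # table indicator (bit 2): 0 = GDT, 1 = LDT
    rpl: int    # requested privilege level (bits 1..0)


def decode_selector(value: int) -> Selector:
    value &= 0xFFFF
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 3)


def descriptor_offset(sel: Selector) -> int:
    return sel.index * 8  # each descriptor is 8 bytes


sel = decode_selector(0x001B)
print(sel, descriptor_offset(sel))  # Selector(index=3, ti=0, rpl=3) 24
```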

  • 80386 Programming Model: protection (PL): task: CPL; instruction: RPL; data segment: DPL. Gates: special descriptors that allow tasks at a less privileged level to access tasks at a more privileged level in a controlled way
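
For data segments, the usual 80386 rule is that access is allowed only when the numerically larger (less privileged) of CPL and RPL is no greater than the segment's DPL, where 0 is the most privileged level. A Python sketch of that check (hypothetical function name; gates and code-segment transfers are not modeled):

```python
def can_access_data_segment(cpl: int, rpl: int, dpl: int) -> bool:
    """Protected-mode data-segment check; privilege level 0 is the most privileged."""
    effective = max(cpl, rpl)  # the less privileged of the task's CPL and the selector's RPL
    return effective <= dpl


print(can_access_data_segment(cpl=0, rpl=0, dpl=3))  # True: kernel reads user data
print(can_access_data_segment(cpl=3, rpl=3, dpl=0))  # False: user cannot read kernel data
print(can_access_data_segment(cpl=0, rpl=3, dpl=0))  # False: selector carries user RPL
```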

  • 486 Processor Features: 386 features retained: Real/Protected Modes; memory management; privilege levels; register and bus sizes. New features: 6 OS instructions; 8K/16K on-board cache (the cache was external on the 386)

  • 486 Processor Features: a better 386. 5-stage instruction pipeline: IF/ID/EX becomes PF/D1/D2/EX/WB. PF: prefetch instructions into the queue (2 × 16 bytes); D1: determine the opcode; D2: determine the memory addresses of operands; EX: execute the indicated operation; WB: update registers

  • 486 Processor Features: reduced instruction cycle times via the 5-stage instruction pipeline (e.g., Fig. 3.18). Instruction cycle times: 8086: 4 CLK; 80386: 2 CLK; 80486: 1 CLK (close to RISC), about 2x faster than the 386
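
In an ideal k-stage pipeline with no stalls, n instructions finish in k + (n - 1) clocks instead of k × n, which is why the 486 approaches one instruction per clock. This Python sketch assumes no stalls, jumps, or cache misses, so it overstates real throughput; the function names are hypothetical.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]  # 486 pipeline stages from the slide


def pipelined_cycles(n: int, k: int = len(STAGES)) -> int:
    """Clocks for n instructions in an ideal k-stage pipeline: k + (n - 1)."""
    return 0 if n == 0 else k + (n - 1)


def unpipelined_cycles(n: int, k: int = len(STAGES)) -> int:
    """Clocks if each instruction passed through all k stages before the next started."""
    return k * n


# Timing diagram: each row is one instruction, each column one clock
for i in range(3):
    print("   " * i + " ".join(STAGES))

print(pipelined_cycles(100), unpipelined_cycles(100))  # 104 500
```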

  • 486 Processor Model: 386 + FPU + Cache. 386 units retained: BIU, CPU, MMU. New: FPU (80387) + cache (8K/16K). FPU: the 387 on board; 0.8 µm process, so the transistor count increased (275K → over 1 million); simplified system board design; faster FP operations

  • 486 Processor Model: Cache (8K, or 16K on the DX4). Function: bridge the gap between processor speed and memory bandwidth (8088: 4.77 MHz; 80486: 50 MHz; Pentium: 100 MHz; Pentium Pro: 133 MHz). Main memory (DRAM) is relatively slow, so fast static RAM (SRAM) is used as cache

  • 486 Processor Model: Cache organization: 8K, 4-way set associative (4 direct-mapped caches wired in parallel; each block maps to a set of 4 lines); unified: data and code in the same cache; write-through: updates both the cache and memory on write operations
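
For an 8K, 4-way set-associative cache with 16-byte lines, there are 8192 / (16 × 4) = 128 sets, so a 32-bit address splits into a 4-bit byte offset, a 7-bit set index, and a 21-bit tag kept in the tag RAM. A Python sketch of that split (names hypothetical):

```python
from typing import Tuple

CACHE_BYTES = 8 * 1024
LINE_BYTES = 16                                # 128-bit cache line
WAYS = 4
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)      # 128 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 4
INDEX_BITS = SETS.bit_length() - 1             # 7


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set index, byte offset) for a 32-bit physical address."""
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


tag, s, off = split_address(0x12345678)
print(hex(tag), hex(s), hex(off))  # 0x2468a 0x67 0x8
```

Addresses 2K apart land in the same set with different tags, so up to four of them can be cached at once before a replacement is needed.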

  • 486 Processor Model: Cache: locality (why caches help): spatial locality, e.g., arrays of data; temporal locality, e.g., loops in code. Operations on hit/miss. 128-bit cache lines (32 bits × N with N = 4, to capture locality); 128 bits = 16 bytes

  • 486 Processor Model: Cache mapping: memory → cache is a many-to-many mapping; the data RAM saves memory data and the tag RAM saves memory address information. 3 methods of mapping: fully associative: a memory block can go in any cache line; direct mapped: a memory block goes in one specific line (prone to thrashing); set associative: a memory block goes in a set of cache lines

  • 486 Processor Model: Cache replacement policy (LRU). Valid bits: are all 4 lines in use? NO: use any unused line. YES: find one to replace; the LRU bits record which line is least recently used
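
The replacement decision within one set can be sketched in Python with an `OrderedDict` kept in least- to most-recently-used order. This sketch uses exact LRU for clarity; the 486 is generally documented as using a cheaper pseudo-LRU approximation with 3 bits per set. The class name `LRUSet` is hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with true LRU replacement (a simplification of the 486's scheme)."""

    def __init__(self, ways: int = 4) -> None:
        self.ways = ways
        self.lines: "OrderedDict[int, None]" = OrderedDict()  # tags, least recent first

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, fill the tag and return False."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # hit: mark as most recently used
            return True
        if len(self.lines) >= self.ways:    # all lines valid: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = None              # fill an unused (or freed) line
        return False


s = LRUSet()
print([s.access(t) for t in [1, 2, 3, 4, 1, 5, 2]])
# [False, False, False, False, True, False, False]
```

Tag 5 evicts tag 2 (least recently used after the hit on 1), and the later access to 2 evicts 3.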
