Mpmc

MPMC © Pawar Virendra D.

Microprocessors and

Microcontrollers

Third Year BE Computers

Pawar Virendra D. Mo. No.:9423582261


Syllabus

EC4813 : Microprocessors and Microcontrollers

Prerequisites : Understanding of Microprocessors, Peripheral Chips, Analogue Sensors, Conversion, Interfacing Techniques.

Aim : This course covers the design of hardware and software code using a modern microcontroller. It emphasizes assembly language programming of the microcontroller including device drivers, exception and interrupt handling, and interfacing with higher-level languages.

Objectives:
1. To exhibit knowledge of the architecture of microcontrollers and apply program control structures to microcontrollers;
2. To develop the ability to use assembly language to program a microcontroller and demonstrate the capability to program the microcontroller to communicate with external circuitry using parallel ports;
3. To demonstrate the capability to program the microcontroller to communicate with external circuitry using serial ports and timer ports.

Unit 1 : Introduction to Pentium microprocessor ( 7 Hrs )
Pentium Microprocessor: History, Feature & Architecture, Pin Description, Functional Description Real Mode, Risc Super Scalar, Pipe lining, Instruction Pairing, Branch Prediction, Inst Data Cache, FPU

Unit 2 : Bus Cycles and Memory Organization ( 7 Hrs )
Bus Cycles & Memory Organisation: Init & Configuration, Bus Operations-RST, Mem/Io Organisation, Data Transfer Mechanism, 8/16/32 bit Data Bus I, Programmers Model, Register Set, Instru Set, Data Types, Instructions

Unit 3 : Protected Mode ( 6 Hrs )
Protected Mode: Intro Segmentation, Supp Registers, Rel Int Desc, Mem Man thru Segmentation, Logical to linear translation, protection by segmentation, Privilege Level protection, related instructions, inter-privilege level transfer of control, paging-support registers, descriptors, linear-physical add trans, TLB, page level protection, virtual memory

Unit 4 : Multitasking, Interrupts, Exceptions and I/O ( 6 Hrs )
Multitasking, Interrupts, Exception I/O: Multi Tasking Support Reg, Rel Des, Task Switch I/O per BitMap, Virtual Mode, Add Gen, Priv Level, Inst & Reg, enter/Leaving V86 M, Interrupt Structure Real/Prot V86 Mode, I/O Handling, comparison of 3 modes.

Unit 5 : 8051 Micro controller ( 7 Hrs )
Family Architecture, Data / Programme Memory, Reg set Reg Bank SFR, Ext Data / Mem Programme Mem, Interrupt Structure, Timer Prog, Serial Port Prog, Misc Features, Min System

Unit 6 : PIC Micro-Controller ( 7 Hrs )
PIC Micro-Controller: OverView, Features, Pin Out, Capture / Compare / Pulse width modulation Mode, Block Dia Prog Model, Rest / Clocking, Mem Org, Prog/Data, Flash Eprom, Add Mode/Inst Set Prog, I/o, Interrupt, Timer, ADC

Outcomes: Upon completion of the course, the student should be able to:


1. Describe and use the functional blocks utilized in a basic microcontroller based system.
2. Describe the programmer's model of the CPU's instruction set and various addressing modes.
3. Proficiently use the various instruction set and functional groups, when programming.
4. Integrate structured programming techniques and sub-routines into microcontroller based hardware topologies.
5. Develop I/O port, ADC hardware, and software interfacing techniques.
6. Describe the use of sensors, interfacing, and signal conditioning when utilizing the microcontroller in control and monitor applications.

Text Books:
1. Antonakos J., "The Pentium Microprocessor", Pearson Education, 2004, 2nd Edition.
2. Deshmukh A., "Microcontrollers - Theory and Applications", Tata McGraw-Hill, 2004.

Reference Books:
1. Mazidi M., Gillispie J., "The 8051 Microcontroller and embedded systems", Pearson education, 2002, ISBN 81-7808-574-7
2. Intel Pentium Data Sheets
3. Ayala K., "The 8051 Microcontroller", Penram International, 1996, ISBN 81-900828-4-1
4. Intel 8 bit Microcontroller manual
5. Microchip manual for PIC 16CXX and 16FXX


INTRODUCTION

16-bit Processors and Segmentation (1978)

The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088. The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space. The 8088 is similar to the 8086 except it has an 8-bit external data bus. The 8086/8088 introduced segmentation to the IA-32 architecture. With segmentation, a 16-bit segment register contains a pointer to a memory segment of up to 64 KBytes. Using four segment registers at a time, 8086/8088 processors are able to address up to 256 KBytes without switching between segments. The 20-bit addresses that can be formed using a segment register and an additional 16-bit pointer provide a total address range of 1 MByte.

The Intel® 286 Processor (1982)

The Intel 286 processor introduced protected mode operation into the IA-32 architecture. Protected mode uses the segment register content as selectors or pointers into descriptor tables. Descriptors provide 24-bit base addresses with a physical memory size of up to 16 MBytes, support for virtual memory management on a segment swapping basis, and a number of protection mechanisms. These mechanisms include:

• Segment limit checking
• Read-only and execute-only segment options
• Four privilege levels

The Intel386™ Processor (1985)

The Intel386 processor was the first 32-bit processor in the IA-32 architecture family. It introduced 32-bit registers for use both to hold operands and for addressing. The lower half of each 32-bit Intel386 register retains the properties of the 16-bit registers of earlier generations, permitting backward compatibility. The processor also provides a virtual-8086 mode that allows for even greater efficiency when executing programs created for 8086/8088 processors. In addition, the Intel386 processor has support for:

• A 32-bit address bus that supports up to 4 GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory management
• Support for parallel stages

The Intel486™ Processor (1989)

The Intel486™ processor added more parallel execution capability by expanding the Intel386 processor’s instruction decode and execution units into five pipelined stages. Each stage operates in parallel with the others on up to five instructions in different stages of execution. In addition, the processor added:

• An 8-KByte on-chip first-level cache that increased the percent of instructions that could execute at the scalar rate of one per clock
• An integrated x87 FPU
• Power saving and system management capabilities

The Intel® Pentium® Processor (1993)

The introduction of the Intel Pentium processor added a second execution pipeline to achieve superscalar performance (two pipelines, known as u and v, together can execute two instructions per clock). The on-chip first-level cache doubled, with 8 KBytes devoted to code and another 8 KBytes devoted to data. The data cache uses the MESI protocol to support more efficient write-back cache in addition to the write-through cache previously used by the Intel486 processor. Branch prediction with an on-chip branch table was added to increase performance in looping constructs. In addition, the processor added:

• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well as 4-KByte pages
• Internal data paths of 128 and 256 bits add speed to internal data transfers
• Burstable external data bus was increased to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two processor systems

PROCESSOR FEATURES OVERVIEW

The Pentium processor supports the features of previous Intel Architecture processors and provides significant enhancements including the following:

• Superscalar Architecture
• Dynamic Branch Prediction
• Pipelined Floating-Point Unit
• Improved Instruction Execution Time
• Separate Code and Data Caches
• Writeback MESI Protocol in the Data Cache
• 64-Bit Data Bus
• Bus Cycle Pipelining
• Address Parity
• Internal Parity Checking
• Functional Redundancy Checking and Lock Step operation
• Execution Tracing
• Performance Monitoring
• IEEE 1149.1 Boundary Scan
• System Management Mode
• Virtual Mode Extensions
• Upgradable with a Pentium OverDrive processor
• Dual processing support
• Advanced SL Power Management Features
• Fractional Bus Operation
• On-Chip Local APIC Device
• Support for the Intel 82498/82493 and 82497/82492 cache chipset products
• Split line accesses to the code cache
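As a worked illustration of the 8086/8088 segmentation scheme described in the introduction, the sketch below forms a 20-bit physical address from a 16-bit segment value and a 16-bit offset. The function name physical_address is hypothetical, and the wrap-around at 1 MByte assumes that only 20 address lines are driven, as on the 8086/8088.

```python
def physical_address(segment: int, offset: int) -> int:
    """Form a 20-bit 8086 physical address from a 16-bit segment and offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the 20 bits that fit on the address bus.
    return ((segment << 4) + offset) & 0xFFFFF


SEGMENT_BYTES = 64 * 1024            # one segment spans up to 64 KBytes
REACHABLE = 4 * SEGMENT_BYTES        # four segment registers in use at once

print(hex(physical_address(0x1234, 0x0010)))   # 0x12350
print(REACHABLE // 1024, "KBytes reachable without reloading a segment register")
```

With segment 1234h and offset 0010h the result is 12350h. Four 64-KByte segments give the 256 KBytes mentioned above, while the 20-bit result gives the 1-MByte total range.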

COMPONENT INTRODUCTION

The application instruction set of the Pentium processor family includes the complete instruction set of existing Intel Architecture processors to ensure backward compatibility, with extensions to accommodate the additional functionality of the Pentium processor. All application software written for the Intel386™ and Intel486™ microprocessors will run on the Pentium processor without modification. The on-chip memory management unit (MMU) is completely compatible with the Intel386 and Intel486 CPUs.

The two instruction pipelines and the floating-point unit on the Pentium processor are capable of independent operation. Each pipeline issues frequently used instructions in a single clock. Together, the dual pipes can issue two integer instructions in one clock, or one floating-point instruction (under certain circumstances, 2 floating-point instructions) in one clock.

Branch prediction is implemented in the Pentium processor. To support this, the Pentium processor implements two prefetch buffers, one to prefetch code in a linear fashion, and one that prefetches code according to the Branch Target Buffer (BTB) so the needed code is almost always prefetched before it is needed for execution.

The Pentium processor includes separate code and data caches integrated on chip to meet its performance goals. The caches on the Pentium processor are each 8 Kbytes in size and 2-way set-associative. Each cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to physical addresses. The Pentium processor data cache is configurable to be writeback or writethrough on a line-by-line basis and follows the MESI protocol. The data cache tags are triple ported to support two data transfers and an inquire cycle in the same clock. The code cache is an inherently write protected cache. The code cache tags of the Pentium processor are also triple ported to support snooping and split-line accesses.

The Pentium processor has a 64-bit data bus. Burst read and burst writeback cycles are supported by the Pentium processor. In addition, bus cycle pipelining has been added to allow two bus cycles to be in progress simultaneously. The Pentium processor Memory Management Unit contains optional extensions to the architecture which allow 4 MB page sizes.

The Pentium processor has added significant data integrity and error detection capability. Data parity checking is still supported on a byte-by-byte basis. Address parity checking, and internal parity checking features have been added along with a new exception, the machine check exception. The Pentium processor has implemented functional redundancy checking to provide maximum error detection of the processor and the interface to the processor.
When functional redundancy checking is used, a second processor, the “checker”, is used to execute in lock step with the “master” processor. The checker samples the master’s outputs and compares those values with the values it computes internally, and asserts an error signal if a mismatch occurs. The Pentium processor with MMX technology does not support functional redundancy checking.

As more and more functions are integrated on chip, the complexity of board level testing is increased. To address this, the Pentium processor has increased test and debug capability by implementing IEEE Boundary Scan (Standard 1149.1). System management mode has been implemented along with some extensions to the SMM architecture. Enhancements to the Virtual 8086 mode have been made to increase performance by reducing the number of times it is necessary to trap to a Virtual 8086 monitor.

The block diagram of the Pentium processor shows the two instruction pipelines, the “u” pipe and the “v” pipe. The u-pipe can execute all integer and floating-point instructions. The v-pipe can execute simple integer instructions and the FXCH floating-point instruction.


The separate code and data caches are shown. The data cache has two ports, one for each of the two pipes (the tags are triple ported to allow simultaneous inquire cycles). The data cache has a dedicated TLB to translate linear addresses to the physical addresses used by the data cache.

The code cache, branch target buffer and prefetch buffers are responsible for getting raw instructions into the execution units of the Pentium processor. Instructions are fetched from the code cache or from the external bus. Branch addresses are remembered by the branch target buffer. The code cache TLB translates linear addresses to physical addresses used by the code cache.

The decode unit contains two parallel decoders which decode and issue up to the next two sequential instructions into the execution pipeline. The control ROM contains the microcode which controls the sequence of operations performed by the processor. The control unit has direct control over both pipelines.

The Pentium processor contains a pipelined floating-point unit that provides a significant floating-point performance advantage over previous generations of Intel Architecture-based processors.

The Pentium processor includes features to support multi-processor systems, namely an on-chip Advanced Programmable Interrupt Controller (APIC). This APIC implementation supports multiprocessor interrupt management (with symmetric interrupt distribution across all processors), multiple I/O subsystem support, 8259A compatibility, and inter-processor interrupt support.

The dual processor configuration allows two Pentium processors to share a single L2 cache for a low-cost symmetric multi-processor system. The two processors appear to the system as a single Pentium processor. Multiprocessor operating systems properly schedule computing tasks between the two processors. This scheduling of tasks is transparent to software applications and the end-user.
Logic built into the processors supports a “glueless” interface for easy system design. Through a private bus, the two Pentium processors arbitrate for the external bus and maintain cache coherency. The Pentium processor can also be used in a conventional multi-processor system in which one L2 cache is dedicated to each processor.

The Pentium processor is produced on Intel’s advanced silicon technology. The Pentium processor also includes SL enhanced power management features. When the clock to the Pentium processor is stopped, power dissipation is virtually eliminated. The low VCC operating voltages and SL enhanced power management features make the Pentium processor a good choice for energy-efficient desktop designs.
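The cache organization described in this section (8 Kbytes, 2-way set-associative) fixes how an address is divided into tag, set index and byte offset. The sketch below works through that arithmetic. It assumes a 32-byte cache line (the line size these notes quote for the prefetch buffers) and a 32-bit physical address. The names split_address, SETS and so on are illustrative only.

```python
CACHE_BYTES = 8 * 1024    # from the text: each cache is 8 Kbytes
WAYS = 2                  # from the text: 2-way set-associative
LINE_BYTES = 32           # assumption: 32-byte line, as quoted for the prefetch buffers
ADDRESS_BITS = 32         # assumption: 32-bit physical address

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)         # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1         # bits selecting a byte in the line
INDEX_BITS = SETS.bit_length() - 1                # bits selecting the set
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS


def split_address(addr: int):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, "sets;", TAG_BITS, "tag bits;", INDEX_BITS, "index bits;", OFFSET_BITS, "offset bits")
print([hex(field) for field in split_address(0x12345678)])
```

Under these assumptions each cache has 128 sets, so an address splits into a 20-bit tag, a 7-bit set index and a 5-bit offset. Two lines whose addresses share the same set index can reside in the cache at once, one in each way.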


PIN DESCRIPTION

Symbol Type Name and Function

A31-A3 I/O As outputs, the address lines of the processor along with the byte enables define the physical area of memory or I/O accessed. The external system drives the inquire address to the processor on A31-A5.

D63-D0 I/O These are the 64 data lines for the processor. Lines D7-D0 define the least significant byte of the data bus; lines D63-D56 define the most significant byte of the data bus. When the CPU is driving the data lines, they are driven during the T2, T12, or T2P clocks for that cycle. During reads, the CPU samples the data bus when BRDY# is returned.

ADS# O The address status indicates that a new valid bus cycle is currently being driven by the Pentium processor.

BE7#-BE5# O, BE4#-BE0# I/O The byte enable pins are used to determine which bytes must be written to external memory, or which bytes were requested by the CPU for the current cycle. The byte enables are driven in the same clock as the address lines (A31-A3).

BOFF# I The backoff input is used to abort all outstanding bus cycles that have not yet completed. In response to BOFF#, the Pentium processor will float all pins normally floated during bus hold in the next clock. The processor remains in bus hold until BOFF# is negated, at which time the Pentium processor restarts the aborted bus cycle(s) in their entirety.

BRDY# I The burst ready input indicates that the external system has presented valid data on the data pins in response to a read or that the external system has accepted the Pentium processor data in response to a write request. This signal is sampled in the T2, T12 and T2P bus states.

CACHE# O For Pentium processor initiated cycles the cache pin indicates internal cacheability of the cycle (if a read), and indicates a burst write back cycle (if a write). If this pin is driven inactive during a read cycle, the Pentium processor will not cache the returned data, regardless of the state of the KEN# pin. This pin is also used to determine the cycle length (number of transfers in the cycle).

CPUTYP I CPU type distinguishes the Primary processor from the Dual processor. In a single processor environment, or when the Pentium processor is acting as the Primary processor in a dual processing system, CPUTYP should be strapped to VSS. The Dual processor should have CPUTYP strapped to VCC. For the Pentium OverDrive processor, CPUTYP will be used to determine whether the bootup handshake protocol will be used (in a dual socket system) or not (in a single socket system).

FLUSH# I When asserted, the cache flush input forces the Pentium processor to write back all modified lines in the data cache and invalidate its internal caches. A Flush Acknowledge special cycle will be generated by the Pentium processor indicating completion of the write back and invalidation. If FLUSH# is sampled low when RESET transitions from high to low, tristate test mode is entered. If two Pentium processors are operating in dual processing mode and FLUSH# is asserted, the Dual processor will perform a flush first (without a flush acknowledge cycle), then the Primary processor will perform a flush followed by a flush acknowledge cycle. NOTE: If the FLUSH# signal is asserted in dual processing mode, it must be deasserted at least one clock prior to BRDY# of the FLUSH Acknowledge cycle to avoid DP arbitration problems.

FRCMC# I The functional redundancy checking master/checker mode input is used to determine whether the Pentium processor is configured in master mode or checker mode. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR# and TDO) and samples the output pins. The configuration as a master/checker is set after RESET and may not be changed other than by a subsequent RESET.

HOLD I In response to the bus hold request, the Pentium processor will float most of its output and input/output pins and assert HLDA after completing all outstanding bus cycles. The Pentium processor will maintain its bus in this state until HOLD is de-asserted. HOLD is not recognized during LOCK cycles. The Pentium processor will recognize HOLD during reset.

HLDA O The bus hold acknowledge pin goes active in response to a hold request driven to the processor on the HOLD pin. It indicates that the Pentium processor has floated most of the output pins and relinquished the bus to another local bus master. When leaving bus hold, HLDA will be driven inactive and the Pentium processor will resume driving the bus. If the Pentium processor has a bus cycle pending, it will be driven in the same clock that HLDA is de-asserted.

INIT I The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state. The processor state after INIT is the same as the state after RESET except that the internal caches, write buffers, and floating point registers retain the values they had prior to INIT. INIT may NOT be used in lieu of RESET after power-up. If INIT is sampled high when RESET transitions from high to low, the Pentium processor will perform built-in self test prior to the start of program execution.


INV I The invalidation input determines the final cache line state (S or I) in case of an inquire cycle hit. It is sampled together with the address for the inquire cycle in the clock EADS# is sampled active.

KEN# I The cache enable pin is used to determine whether the current cycle is cacheable or not and is consequently used to determine cycle length. When the Pentium processor generates a cycle that can be cached (CACHE# asserted) and KEN# is active, the cycle will be transformed into a burst line fill cycle.

LOCK# O The bus lock pin indicates that the current bus cycle is locked. The Pentium processor will not allow a bus hold when LOCK# is asserted (but AHOLD and BOFF# are allowed). LOCK# goes active in the first clock of the first locked bus cycle and goes inactive after the BRDY# is returned for the last locked bus cycle. LOCK# is guaranteed to be de-asserted for at least one clock between back-to-back locked cycles.

NA# I An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed. The Pentium processor will issue ADS# for a pending cycle two clocks after NA# is asserted. The Pentium processor supports up to 2 outstanding bus cycles.

RESET I RESET forces the Pentium processor to begin execution at a known state. All the Pentium processor internal caches will be invalidated upon the RESET. Modified lines in the data cache are not written back. FLUSH#, FRCMC# and INIT are sampled when RESET transitions from high to low to determine if tristate test mode or checker mode will be entered, or if BIST will be run.
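The way the address lines A31-A3 and the byte enables BE7#-BE0# together identify the bytes of a transfer can be sketched as follows. This is a minimal illustration, not the processor's actual logic. The function name byte_enables is hypothetical, and the sketch assumes the access fits inside one aligned 8-byte quadword, so it rejects accesses that cross that boundary instead of modelling how they would be split into separate bus cycles. BE0# is taken to correspond to D7-D0 and BE7# to D63-D56.

```python
def byte_enables(addr: int, size: int):
    """Return (quadword address on A31-A3, active-low BE7#-BE0# pattern).

    Bit n of the returned pattern is the level of BEn#: 0 means asserted.
    """
    lane = addr & 0x7                      # byte position inside the quadword
    if size < 1 or lane + size > 8:
        raise ValueError("access must fit in one aligned 8-byte quadword")
    used = ((1 << size) - 1) << lane       # 1 marks each byte lane in use
    be_n = ~used & 0xFF                    # invert: the pins are active low
    return addr & ~0x7, be_n


quadword, be_n = byte_enables(0x1002, 2)   # 16-bit access at address 1002h
print(hex(quadword), format(be_n, "08b"))  # 0x1000 11110011
```

For a 2-byte access at 1002h, the address lines carry 1000h and only BE2# and BE3# are driven low, selecting data lines D31-D16.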


REAL MODE RISC

A Complex Instruction Set Computer (CISC) provides a large and powerful range of instructions, which is less flexible to implement. For example, the 8086 microprocessor family has these instructions:

JA Jump if Above
JAE Jump if Above or Equal
JB Jump if Below
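Each of these jumps tests the flags left behind by a preceding unsigned comparison: JA is taken when both the carry flag (CF) and the zero flag (ZF) are clear, JAE when CF is clear, and JB when CF is set. The sketch below models only those two flags. The helper cmp_flags is a hypothetical stand-in for the CMP instruction, not a full flag model.

```python
def cmp_flags(a: int, b: int, bits: int = 16):
    """Model CF and ZF after an unsigned CMP a, b on `bits`-wide operands."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    return {"CF": a < b,      # a borrow occurs when a is below b
            "ZF": a == b}     # the difference is zero


def ja(flags):                # Jump if Above: CF = 0 and ZF = 0
    return not flags["CF"] and not flags["ZF"]


def jae(flags):               # Jump if Above or Equal: CF = 0
    return not flags["CF"]


def jb(flags):                # Jump if Below: CF = 1
    return flags["CF"]


flags = cmp_flags(5, 3)
print(ja(flags), jae(flags), jb(flags))   # True True False
```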

By contrast, the Reduced Instruction Set Computer (RISC) concept is to identify the sub-components and use those. As these are much simpler, they can be implemented directly in silicon, so will run at the maximum possible speed. Nothing is 'translated'.

Most modern CISC processors, such as the Pentium Pro and its successors, use a fast RISC-like core with a translation layer sitting between the core and the instruction stream. So when you are running Windows 95 on such a PC, it is not that much different from trying to get Windows 95 running on a software PC emulator. Just imagine the power hidden inside such a processor.

This is not to say that CISC processors cannot have a large number of registers; some do. However, for its use, a typical RISC processor requires more registers to give it additional flexibility. Gone are the days when you had two general purpose registers and an 'accumulator'.

One thing RISC does offer, though, is register independence.

The 8086 offers you fourteen registers, but with caveats: the first four (A, B, C, and D) are Data registers (a.k.a. scratch-pad registers). They are 16-bit registers that can also be accessed as two 8-bit registers; thus register A is really AH (A, high-order byte) and AL (A, low-order byte). These can be used as general purpose registers, but they can also have dedicated functions: Accumulator, Base, Count, and Data.
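The high-byte/low-byte view of a data register can be sketched with a small class. The name Reg16 and its attributes are hypothetical, chosen only to show how the two 8-bit halves alias the 16-bit value.

```python
class Reg16:
    """A 16-bit data register that can also be accessed as two 8-bit halves."""

    def __init__(self, value: int = 0):
        self.x = value & 0xFFFF           # the full 16-bit register (e.g. AX)

    @property
    def high(self) -> int:                # e.g. AH
        return (self.x >> 8) & 0xFF

    @high.setter
    def high(self, value: int) -> None:
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def low(self) -> int:                 # e.g. AL
        return self.x & 0xFF

    @low.setter
    def low(self, value: int) -> None:
        self.x = (self.x & 0xFF00) | (value & 0xFF)


ax = Reg16(0x1234)
ax.low = 0xFF                             # like MOV AL, 0FFh
print(hex(ax.x), hex(ax.high), hex(ax.low))   # 0x12ff 0x12 0xff
```

Writing one half leaves the other half untouched, which is why AH and AL can serve as two independent 8-bit scratch registers.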

The advantages of RISC over CISC today are these:

• RISC processors are much simpler to build, which in turn brings the following advantages:

o easier to build, i.e. you can use already existing production facilities
o much less expensive, just compare the price of an XScale with that of a Pentium III at 1 GHz
o less power consumption, which again gives two advantages:

▪ much longer use of battery driven devices
▪ no need for cooling of the device, which again gives two advantages: a smaller design of the whole device, and no noise

• RISC processors are much simpler to program, which helps not only the assembler programmer but the compiler designer, too. You will hardly find any compiler which uses all the functions of a Pentium III optimally.

SUPER SCALAR

A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier.

While a superscalar CPU is typically also pipelined, pipelining and superscalar architecture are considered different performance enhancement techniques.

The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU core):

• Instructions are issued from a sequential instruction stream
• CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time)
• The CPU accepts multiple instructions per clock cycle

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is sort of a mixture of the two. Each instruction processes one data item, but there are multiple redundant functional units within each CPU thus multiple instructions can be processing separate data items concurrently.

Superscalar CPU design emphasizes improving the accuracy of the instruction dispatcher, and allowing it to keep the multiple functional units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will suffer.


A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor or multi-core architectures also achieve that, but with different methods.

In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. Therefore a superscalar processor can be envisioned having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.

Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; b = e + f might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.
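The two instruction pairs above can be checked mechanically. The sketch below is a simplified illustration, not how any particular dispatcher is built: each instruction is reduced to one destination and a set of sources, and the names Instr and hazards are hypothetical. It reports the classic hazard kinds: read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW).

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str             # the variable or register written
    srcs: frozenset       # the variables or registers read


def hazards(first: Instr, second: Instr) -> set:
    """Return the dependencies that the second instruction has on the first."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites what first still has to read
    if first.dest == second.dest:
        found.add("WAW")  # both write the same location
    return found


a_eq = Instr("a", frozenset({"b", "c"}))      # a = b + c
d_eq = Instr("d", frozenset({"e", "f"}))      # d = e + f
b_eq = Instr("b", frozenset({"e", "f"}))      # b = e + f

print(hazards(a_eq, d_eq))    # set(): independent, can run in parallel
print(hazards(a_eq, b_eq))    # {'WAR'}: b must not be overwritten before it is read
```

The second pair is safe only if the first instruction reads b before the second one writes it, which is why the result depends on the order in which the instructions complete.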

When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes additional logic gates required to implement the checks.


PIPELINE AND INSTRUCTION FLOW

The integer instructions traverse a five stage pipeline in the Pentium processor. The pipeline stages are as follows:

PF Prefetch
D1 Instruction Decode
D2 Address Generate
EX Execute - ALU and Cache Access
WB Writeback

The Pentium processor is a superscalar machine, built around two general purpose integer pipelines and a pipelined floating-point unit capable of executing two instructions in parallel. Both pipelines operate in parallel allowing integer instructions to execute in a single clock in each pipeline. The figure depicts instruction flow in the Pentium processor.

The pipelines in the Pentium processor are called the “u” and “v” pipes and the process of issuing two instructions in parallel is termed “pairing.” The u-pipe can execute any instruction in the Intel architecture, while the v-pipe can execute “simple” instructions as defined in the “Instruction Pairing Rules” section of this chapter. When instructions are paired, the instruction issued to the v-pipe is always the next sequential instruction after the one issued to the u-pipe.

Pentium® Processor Pipeline Execution
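The flow of instructions through the five stages can be sketched as a small timing chart. The model below is idealized: it assumes no stalls, and with issue_width=2 it assumes that every adjacent pair of instructions satisfies the pairing rules. The function names schedule, total_clocks and render are illustrative only.

```python
STAGES = ["PF", "D1", "D2", "EX", "WB"]   # the five integer pipeline stages


def schedule(n_instructions: int, issue_width: int = 1):
    """Map each instruction index to {clock: stage}, assuming no stalls."""
    table = {}
    for i in range(n_instructions):
        start = i // issue_width          # clock in which instruction i enters PF
        table[i] = {start + k: stage for k, stage in enumerate(STAGES)}
    return table


def total_clocks(n_instructions: int, issue_width: int = 1) -> int:
    """Clocks needed until the last instruction has passed through WB."""
    if n_instructions == 0:
        return 0
    return (n_instructions - 1) // issue_width + len(STAGES)


def render(n_instructions: int, issue_width: int = 1) -> str:
    """Text chart with one row per instruction and one column per clock."""
    table = schedule(n_instructions, issue_width)
    clocks = total_clocks(n_instructions, issue_width)
    rows = []
    for i in range(n_instructions):
        cells = [table[i].get(clock, "--") for clock in range(clocks)]
        rows.append(f"i{i + 1}: " + " ".join(cells))
    return "\n".join(rows)


print(render(4, issue_width=2))           # i1/i2 and i3/i4 move as pairs
print(total_clocks(4, 1), "clocks unpaired,", total_clocks(4, 2), "clocks paired")
```

In this idealized model four instructions need 8 clocks in a single pipeline and 6 clocks when issued as two pairs. Once the pipeline is full, one instruction (or one pair) completes every clock.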

The Pentium processor pipeline has been optimized to achieve higher throughput compared to previous generations of Intel Architecture processors. The first stage of the pipeline is the Prefetch (PF) stage in which instructions are prefetched from the on-chip instruction cache or memory. Because the Pentium processor has separate caches for instructions and data, prefetches do not conflict with data references for access to the cache. If the requested line is not in the code cache, a memory reference is made. In the PF stage of the Pentium processor, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the Branch Target Buffer. This allows one prefetch buffer to prefetch instructions sequentially, while the other prefetches according to the branch target buffer predictions.

The pipeline stage after the PF stage in the Pentium processor is Decode 1 (D1) in which two parallel decoders attempt to decode and issue the next two sequential instructions. The decoders determine whether one or two instructions can be issued contingent upon the “Instruction Pairing Rules.” The Pentium processor requires an extra D1 clock to decode instruction prefixes. Prefixes are issued to the u-pipe at the rate of one per clock without pairing. After all prefixes have been issued, the base instruction will then be issued and paired according to the pairing rules.

The D1 stage is followed by Decode 2 (D2) in which addresses of memory resident operands are calculated. In the Intel486 processor, instructions containing both a displacement and an immediate, or instructions containing a base and index addressing mode, require an additional D2 clock to decode. The Pentium processor removes both of these restrictions and is able to issue instructions in these categories in a single clock.

The Pentium processor uses the Execute (EX) stage of the pipeline for both ALU operations and for data cache access; therefore those instructions specifying both an ALU operation and a data cache access will require more than one clock in this stage. In EX all u-pipe instructions and all v-pipe instructions except conditional branches are verified for correct branch prediction. Microcode is designed to utilize both pipelines and thus those instructions requiring microcode execute faster.

The final stage is Writeback (WB) where instructions are enabled to modify processor state and complete execution. In this stage, v-pipe conditional branches are verified for correct branch prediction.

During their progression through the pipeline, instructions may be stalled due to certain conditions. Both the u-pipe and v-pipe instructions enter and leave the D1 and D2 stages in unison. When an instruction in one pipe is stalled, then the instruction in the other pipe is also stalled at the same pipeline stage.
Thus both the u-pipe and the v-pipe instructions enter the EX stage in unison. Once in EX, if the u-pipe instruction is stalled, then the v-pipe instruction (if any) is also stalled. If the v-pipe instruction is stalled, then the instruction paired with it in the u-pipe is not allowed to advance. No successive instructions are allowed to enter the EX stage of either pipeline until the instructions in both pipelines have advanced to WB.

INSTRUCTION PREFETCH

In the Pentium processor PF stage, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the branch target buffer. Only one prefetch buffer actively requests prefetches at any given time. Prefetches are requested sequentially until a branch instruction is fetched. When a branch instruction is fetched, the branch target buffer (BTB) predicts whether the branch will be taken or not. If the branch is predicted not taken, prefetch requests continue linearly. On a predicted taken branch, the other prefetch buffer is enabled and begins to prefetch as though the branch was taken. If a branch is discovered to be mispredicted, the instruction pipelines are flushed and prefetching activity starts over.

Integer Instruction Pairing Rules

The Pentium processor can issue one or two instructions every clock. In order to issue two instructions simultaneously they must satisfy the following conditions:

• Both instructions in the pair must be “simple” as defined below. Simple instructions are entirely hardwired; they do not require any microcode control and, in general, execute in one clock. The exceptions are the ALU mem, reg and ALU reg, mem instructions, which take more than one clock.
• There must be no read-after-write or write-after-write register dependencies between them.
• Neither instruction may contain both a displacement and an immediate.
• Instructions with prefixes can only occur in the u-pipe.
• Instruction prefixes are treated as separate 1-byte instructions. Sequencing hardware is used to allow them to function as simple instructions.

The following integer instructions are considered simple and may be paired:

1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg, mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm

In addition, conditional and unconditional branches may be paired only if they occur as the second instruction in the pair. They may not be paired with the next sequential instruction. Also, SHIFT/ROT by 1 and SHIFT by imm may pair as the first instruction in a pair.

The register dependencies that prohibit instruction pairing include implicit dependencies via registers or flags not explicitly encoded in the instruction. For example, an ALU instruction in the u-pipe (which sets the flags) may not be paired with an ADC or an SBB instruction in the v-pipe. There are two exceptions to this rule. The first is the commonly occurring sequence of compare and branch, which may be paired. The second exception is pairs of pushes or pops. Although these instructions have an implicit dependency on the stack pointer, special hardware is included to allow these common operations to proceed in parallel.

Although in general two paired instructions may proceed in parallel independently, there is an exception for paired “read-modify-write” instructions. Read-modify-write instructions are ALU operations with an operand in memory. When two of these instructions are paired there is a sequencing delay of two clocks in addition to the three clocks required to execute the individual instructions. Although instructions may execute in parallel, their behavior as seen by the programmer is exactly the same as if they were executed sequentially.
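The pairing conditions above can be read as a small checklist. The following is a minimal sketch of that checklist, not a model of the actual issue logic: the `Instr` record, the `can_pair` function and the register-set representation are hypothetical names introduced for illustration. Only explicitly named registers are tracked, so flag dependencies, the compare-and-branch and push/pop exceptions, and the shift/rotate rule are not modeled.

```python
from dataclasses import dataclass

# Mnemonics taken from the "simple" list above; "alu" stands for any ALU operation.
SIMPLE = {"mov", "alu", "inc", "dec", "push", "pop", "lea",
          "jmp", "call", "jcc", "nop", "test"}
BRANCHES = {"jmp", "call", "jcc"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    mnemonic: str
    writes: frozenset = frozenset()   # registers written (explicit only)
    reads: frozenset = frozenset()    # registers read (explicit only)
    has_disp_and_imm: bool = False
    has_prefix: bool = False


def can_pair(u: Instr, v: Instr) -> bool:
    """Return True if u (u-pipe) and v (v-pipe) pass the checks modeled here."""
    # Both instructions must be simple.
    if u.mnemonic not in SIMPLE or v.mnemonic not in SIMPLE:
        return False
    # Neither may contain both a displacement and an immediate.
    if u.has_disp_and_imm or v.has_disp_and_imm:
        return False
    # Prefixed instructions can only occur in the u-pipe.
    if v.has_prefix:
        return False
    # Branches pair only as the second instruction of the pair.
    if u.mnemonic in BRANCHES:
        return False
    # No read-after-write or write-after-write register dependency.
    if u.writes & v.reads or u.writes & v.writes:
        return False
    return True
```

For example, a store that only reads `edx` and `al` followed by an ALU operation that writes `edx` passes the check, because a write-after-read dependency is not among the listed restrictions.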


BRANCH PREDICTION

Branch Target Buffer (BTB)

The Pentium processor uses a Branch Target Buffer (BTB) to predict the outcome of branch instructions, which minimizes pipeline stalls due to prefetch delays. The Pentium processor accesses the BTB with the address of the instruction in the D1 stage. It contains a branch prediction state machine with four states: (1) strongly not taken, (2) weakly not taken, (3) weakly taken, and (4) strongly taken. In the event of a correct prediction, a branch will execute without pipeline stalls or flushes. Branches which miss the BTB are assumed to be not taken. Conditional and unconditional near branches and near calls execute in 1 clock and may be executed in parallel with other integer instructions. A mispredicted branch (whether a BTB hit or miss) or a correctly predicted branch with the wrong target address will cause the pipelines to be flushed and the correct target to be fetched. Incorrectly predicted unconditional branches will incur an additional three clock delay, incorrectly predicted conditional branches in the u-pipe will incur an additional three clock delay, and incorrectly predicted conditional branches in the v-pipe will incur an additional four clock delay.

The benefits of branch prediction are illustrated in the following example. Consider the following loop from a benchmark program for computing prime numbers:

    for (k = i + prime; k <= SIZE; k += prime)
        flags[k] = FALSE;

A popular compiler generates the following assembly code (prime is allocated to ecx, k is allocated to edx, and al contains the value FALSE):

    inner_loop:
        mov byte ptr flags[edx], al
        add edx, ecx
        cmp edx, SIZE
        jle inner_loop

Each iteration of this loop will execute in 6 clocks on the Intel486 CPU. On the Pentium processor, the mov is paired with the add, and the cmp with the jle. With branch prediction, each loop iteration executes in 2 clocks.

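Figure: Branch prediction state machine. Legend: H = History, P = Prediction, T = Taken, NT = Not Taken. States: H: 11, P: T; H: 10, P: T; H: 01, P: T; H: 00, P: NT. Transitions between the states are labelled T or NT.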
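The four-state prediction scheme described above behaves like a 2-bit history counter. The sketch below is a generic saturating-counter model under stated assumptions, not the documented Pentium circuit: the class name, the starting history of 0 and the `taken_threshold` parameter are all hypothetical. A threshold of 2 gives the textbook behavior (the two "not taken" states predict not taken), while a threshold of 1 matches the figure legend, in which only history 00 predicts not taken.

```python
class TwoBitPredictor:
    """Saturating 2-bit history counter (0 = strongly not taken, 3 = strongly taken)."""

    def __init__(self, taken_threshold: int = 2):
        self.history = 0                      # assumed starting state
        self.taken_threshold = taken_threshold

    def predict(self) -> bool:
        """Predict taken when the history has reached the threshold."""
        return self.history >= self.taken_threshold

    def update(self, taken: bool) -> None:
        """Move one step toward 3 on a taken branch, toward 0 otherwise."""
        if taken:
            self.history = min(3, self.history + 1)
        else:
            self.history = max(0, self.history - 1)


def count_mispredictions(outcomes, taken_threshold: int = 2) -> int:
    """Run one branch through the predictor and count wrong predictions."""
    predictor = TwoBitPredictor(taken_threshold)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

For a loop branch that is taken nine times and then falls through once, `count_mispredictions([True] * 9 + [False])` counts the warm-up mispredictions plus the final loop exit. The single not-taken outcome only moves the history one step, so the next execution of the loop still starts with a taken prediction.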


CACHE

ON-CHIP CACHES

The Pentium processor implements two internal caches for a total integrated cache size of 16 Kbytes: an 8 Kbyte data cache and a separate 8 Kbyte code cache. These caches are transparent to application software to maintain compatibility with previous Intel Architecture generations. The data cache fully supports the MESI (modified/exclusive/shared/invalid) writeback cache consistency protocol. The code cache is inherently write protected to prevent code from being inadvertently corrupted, and as a consequence supports a subset of the MESI protocol, the S (shared) and I (invalid) states.

The caches have been designed for maximum flexibility and performance. The data cache is configurable as writeback or writethrough on a line-by-line basis. Memory areas can be defined as non-cacheable by software and external hardware. Cache writeback and invalidations can be initiated by hardware or software. Protocols for cache consistency and line replacement are implemented in hardware, easing system design.

Each cache is 8 Kbytes in size and is organized as a 2-way set associative cache. There are 128 sets in each cache, each set containing 2 lines (each line has its own tag address). Each cache line is 32 bytes wide. Replacement in both the data and instruction caches is handled by the LRU mechanism, which requires one bit per set in each of the caches.

Cache Structure

The instruction and data caches can be accessed simultaneously. The instruction cache can provide up to 32 bytes of raw opcodes and the data cache can provide data for two data references, all in the same clock. This capability is implemented partially through the tag structure. The tags in the data cache are triple ported. One of the ports is dedicated to snooping while the other two are used to look up two independent addresses corresponding to data references from each of the pipelines. The instruction cache tags of the Pentium processor are also triple ported. Again, one port is dedicated to support snooping and the other two ports facilitate split line accesses (simultaneously accessing the upper half of one line and the lower half of the next line). Each of the caches is parity protected. The operating modes of the caches are controlled by the CD (cache disable) and NW (not writethrough) bits in CR0.

TLB (Translation Lookaside Buffers)

Each of the caches is accessed with physical addresses, and each cache has its own TLB (translation lookaside buffer) to translate linear addresses to physical addresses. The TLBs associated with the instruction cache are single ported, whereas the data cache TLBs are fully dual ported to be able to translate two independent linear addresses for two data references simultaneously.
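The geometry given above (8 Kbytes, 2 ways, 128 sets, 32-byte lines, one LRU bit per set) can be illustrated with a small tag-only simulation. This is a sketch under stated assumptions rather than a description of the hardware: the names `split_address` and `TwoWayCache` are hypothetical, the LRU bit is assumed to hold the way to replace next, and data storage, MESI states and parity are left out.

```python
LINE_SIZE = 32        # bytes per line  -> 5 offset bits
NUM_SETS = 128        # sets per cache  -> 7 set-index bits
WAYS = 2              # lines per set
OFFSET_BITS = 5
INDEX_BITS = 7


def split_address(addr: int):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class TwoWayCache:
    """Tag store of a 2-way set associative cache with one LRU bit per set."""

    def __init__(self):
        self.tags = [[None, None] for _ in range(NUM_SETS)]
        self.lru = [0] * NUM_SETS          # assumed meaning: way to replace next

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, replace the least recently used way."""
        tag, index, _ = split_address(addr)
        ways = self.tags[index]
        for way in range(WAYS):
            if ways[way] == tag:
                self.lru[index] = 1 - way  # the other way is now the LRU one
                return True
        victim = self.lru[index]
        ways[victim] = tag
        self.lru[index] = 1 - victim
        return False
```

Addresses that differ by a multiple of 128 × 32 = 4096 bytes fall into the same set, so three such addresses accessed in turn are enough to force a replacement.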


The goal of an effective memory system is that the effective access time that the processor sees is very close to the access time of the cache. Most accesses that the processor makes are satisfied within the cache level. The achievement of this goal depends on many factors: the architecture of the processor, the behavioral properties of the programs being executed, and the size and organization of the cache. Caches work on the basis of the locality of program behavior. There are three principles involved:

1. Spatial Locality - Given an access to a particular location in memory, there is a high probability that other accesses will be made to either that or neighboring locations within the lifetime of the program.

2. Temporal Locality - This is complementary to spatial locality. Given a sequence of references to n locations, there is a high probability that references following this sequence will be made into the sequence. Elements of the sequence will again be referenced during the lifetime of the program.

3. Sequentiality - Given that a reference has been made to a particular location s, it is likely that within the next several references a reference to location s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.

Some common terms

Processor references that are found in the cache are called cache hits. References not found in the cache are called cache misses. On a cache miss, the cache control mechanism must fetch the missing data from memory and place it in the cache. Usually the cache fetches a unit of spatial locality, called the line, from memory. The physical word is the basic unit of access in the memory. The processor-cache interface can be characterized by a number of parameters. Those that directly affect processor performance include:

1. Access time for a reference found in the cache (a hit) - property of the cache size and organization.

2. Access time for a reference not found in the cache (a miss) - property of the memory organization.

3. Time to initially compute a real address given a virtual address (not-in-TLB-time) - property of the address translation facility, which, though strictly speaking, is not part of the cache, resembles the cache in most aspects and is discussed in this chapter.
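The first two parameters above combine into the effective access time seen by the processor. A common first-order model, used here as an assumption since the text gives no formula, weights the hit time and the miss time by the hit ratio. The function name and the sample numbers are hypothetical.

```python
def effective_access_time(hit_ratio: float, t_hit: float, t_miss: float) -> float:
    """Weighted average of hit and miss access times (first-order model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss
```

With an assumed hit ratio of 0.95, a 1-clock hit and a 20-clock miss, the model gives 1.95 clocks per access. This illustrates why the hit ratio must be high for the effective time to stay close to the cache access time.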

Data Cache Consistency Protocol (MESI Protocol)

The Pentium processor Cache Consistency Protocol is a set of rules by which states are assigned to cached entries (lines). The rules apply for memory read/write cycles only. I/O and special cycles are not run through the data cache. Every line in the Pentium processor data cache is assigned a state dependent on both Pentium processor generated activities and activities generated by other bus masters (snooping). The Pentium processor Data Cache Protocol consists of four states that define whether a line is valid (HIT/MISS), if it is available in other caches, and if it has been MODIFIED. The four states are the M (Modified), E (Exclusive), S (Shared) and I (Invalid) states, and the protocol is referred to as the MESI protocol. A definition of the states is given below:

M - Modified: An M-state line is available in ONLY one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written to) without sending a cycle out on the bus.

E - Exclusive: An E-state line is also available in ONLY one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written to) without generating a bus cycle. A write to an E-state line will cause the line to become MODIFIED.

S - Shared: This state indicates that the line is potentially shared with other caches (i.e., the same line may exist in more than one cache). A read to an S-state line will not generate bus activity, but a write to a SHARED line will generate a write-through cycle on the bus. The write-through cycle may invalidate this line in other caches. A write to an S-state line will update the cache.

I - Invalid: This state indicates that the line is not available in the cache. A read to this line will be a MISS and may cause the Pentium processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line will cause the Pentium processor to execute a write-through cycle on the bus.

Inquire Cycles (Snooping)

The purpose of inquire cycles is to check whether the address being presented is contained within the caches in the Pentium processor.
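The state definitions above can be summarized as a lookup table for accesses made by the processor itself. This is a minimal sketch, assuming two simplifications that the text does not state: a write to an S-state line is shown as leaving the line in S, and a read miss is shown as filling the line in E (that is, the line is cacheable and no other cache holds it). The function name is hypothetical, and snoop-driven transitions are not modeled.

```python
def local_access(state: str, op: str):
    """Return (next_state, bus_cycle) for a processor read or write.

    bus_cycle is None when the access completes without bus activity.
    """
    table = {
        ("M", "read"):  ("M", None),
        ("M", "write"): ("M", None),
        ("E", "read"):  ("E", None),
        ("E", "write"): ("M", None),             # line becomes MODIFIED
        ("S", "read"):  ("S", None),
        ("S", "write"): ("S", "write-through"),  # assumption: line stays shared
        ("I", "read"):  ("E", "line fill"),      # assumption: no other cache has it
        ("I", "write"): ("I", "write-through"),  # write miss does not allocate
    }
    return table[(state, op)]
```

Reading the table row by row shows the point of the E state: a line known to be private can be written without bus traffic, whereas a write to a shared line must be made visible on the bus.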

----------------------------------------------------------------------------------------------


Cache Organization

Within the cache, there are three basic types of organization:

1. Direct Mapped

2. Fully Associative

3. Set Associative

In fully associative mapping, when a request is made to the cache, the requested address is compared in a directory against all entries in the directory. If the requested address is found (a directory hit), the corresponding location in the cache is fetched and returned to the processor; otherwise, a miss occurs.


Fully Associative Cache

In a direct mapped cache, lower order line address bits are used to access the directory. Since multiple line addresses map into the same location in the cache directory, the upper line address bits (tag bits) must be compared with the directory address to ensure a hit. If a comparison is not valid, the result is a cache miss, or simply a miss. The address given to the cache by the processor actually is subdivided into several pieces, each of which has a different role in accessing data.


Direct Mapped Cache

The set associative cache operates in a fashion somewhat similar to the direct-mapped cache. Bits from the line address are used to address a cache directory. However, now there are multiple choices: two, four, or more complete line addresses may be present in the directory. Each of these line addresses corresponds to a location in a sub-cache. The collection of these sub-caches forms the total cache array. In a set associative cache, as in the direct-mapped cache, all of these sub-arrays can be accessed simultaneously, together with the cache directory. If any of the entries in the cache directory matches the reference address, there is a hit, and the corresponding sub-cache array is selected and gated out to the processor.

Set Associative Cache


Cache Calculation

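Given:

Line size = 16 bytes = $2^{4}$, so Byte / Block = 4 bits
Main memory = 16 KB = $2^{14}$ bytes, so total number of address lines to address main memory = 14
Cache size = 512 bytes = $2^{9}$
Sets (ways) = 2

Size of one cache set = $2^{9} / 2 = 2^{8}$ bytes

Lines per cache set = $2^{8} / 2^{4} = 2^{4}$, so Line / Set size = 4 bits

Total number of lines in main memory = $2^{14} / 2^{4} = 2^{10}$

Tag size = (total number of lines in main memory) / (number of lines in a cache set) = $2^{10} / 2^{4} = 2^{6}$, so Tag size = 6 bits

Total: $2^{14} = 2^{6}\ (\text{Tag}) \times 2^{4}\ (\text{Line / Set}) \times 2^{4}\ (\text{Byte / Block})$

Address format: Tag (6 bits) | Line / Set (4 bits) | Byte / Block (4 bits)

Figure: Cache of 512 bytes with 16 bytes / line, 2 sets and $2^{4}$ lines per set; main memory of 16 KB with $2^{10}$ lines of 16 bytes / line.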
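The calculation above follows a fixed recipe, so it can be checked mechanically. The sketch below is one way to do that; the function name and argument names are hypothetical. It assumes all sizes are powers of two and uses "ways" for what the notes call sets.

```python
def address_fields(memory_bytes: int, cache_bytes: int, line_bytes: int, ways: int):
    """Return (tag_bits, line_bits, byte_bits) for a set associative cache."""

    def log2_exact(n: int) -> int:
        # Only exact powers of two have a whole number of address bits.
        if n <= 0 or n & (n - 1):
            raise ValueError(f"{n} is not a power of two")
        return n.bit_length() - 1

    address_bits = log2_exact(memory_bytes)        # total address lines
    byte_bits = log2_exact(line_bytes)             # Byte / Block field
    lines_per_way = cache_bytes // (line_bytes * ways)
    line_bits = log2_exact(lines_per_way)          # Line / Set field
    tag_bits = address_bits - line_bits - byte_bits
    return tag_bits, line_bits, byte_bits
```

Calling `address_fields(16 * 1024, 512, 16, 2)` reproduces the 6 / 4 / 4 split worked out above. The same function applied to the Pentium figures quoted earlier (8 Kbyte cache, 32-byte lines, 2 ways, 32-bit physical address) gives a 20-bit tag, a 7-bit set index and a 5-bit byte offset.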


THE X87 FPU FLOATING-POINT UNIT

The floating-point unit (FPU) of the Pentium processor is integrated with the integer unit on the first five stages of the U pipeline; the fifth stage, WB, becomes X1. The FPU is heavily pipelined and is designed to accept one floating-point operation every clock. It can receive up to two floating-point instructions every clock, one of which must be an exchange instruction.

Floating-Point Pipeline Stages

The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with the integer unit. Integer instructions pass through only the first 5 stages and use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages, and the activities that are performed in them, are summarized below:

PF: Prefetch.
D1: Instruction decode.
D2: Address generation.
EX: Memory and register read; conversion of FP data to external memory format and memory write.
X1: Floating-point execute stage one; conversion of external memory format to internal FP data format and write of the operand to the FP register file; bypass 1 (described under “FPU Bypasses”).
X2: Floating-point execute stage two.
WF: Rounding and write of the floating-point result to the register file; bypass 2 (described under “FPU Bypasses”).
ER: Error reporting / update of the status word.

FPU Bypasses

The Pentium processor stack architecture instruction set requires that all instructions have one source operand on the top of the stack. Since most instructions also have their destination as the top of the stack, most instructions see a “top of stack bottleneck.” New source operands must be brought to the top of the stack before an arithmetic instruction can be issued on them. This calls for extra usage of the exchange instruction, which allows the programmer to bring an available operand to the top of the stack. The following describes the floating-point register file bypasses that exist on the Pentium processor.

The register file has two write ports and two read ports. The read ports are used to read data out of the register file in the E (EX) stage. One write port is used to write data into the register file in the X1 stage, and the other in the WF stage. A bypass allows data that is about to be written into the register file to be available as an operand that is to be read from the register file by any succeeding floating-point instruction. A bypass is specified by a pair of ports (a write port and a read port) that get circumvented. Using the bypass, data is made available even before it is actually written to the register file.

The following bypasses are implemented:

1. Bypass the X1 stage register file write port and the E stage register file read port.
2. Bypass the WF stage register file write port and the E stage register file read port.

With bypass 1, the result of a floating-point load (that writes to the register file in the X1 stage) can bypass the X1 stage write and be sent directly to the operand fetch stage or E stage of the next instruction. With bypass 2, the result of any arithmetic operation can bypass the WF stage write to the register file, and be sent directly to the desired execution unit as an operand for the next instruction.

PROGRAMMING WITH THE x87 FPU The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities for use in graphics processing, scientific, engineering, and business applications. It supports the floating-point, integer, and packed BCD integer data types and the floating-point processing algorithms and exception handling architecture defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic. X87 FPU EXECUTION ENVIRONMENT The x87 FPU represents a separate execution environment within the IA-32. This execution environment consists of eight data registers (called the x87 FPU data registers) and the following special-purpose registers: • Status register • Control register • Tag word register • Last instruction pointer register • Last data (operand) pointer register • Opcode register These registers are described in the following sections. x87 FPU Data Registers The x87 FPU data registers consist of eight 80-bit registers. Values are stored in these registers in the double extended-precision floating-point format. When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended precision floating-point format (if they are not already in that format). When computation results are subsequently transferred back into memory from any of the x87 FPU registers, the results can be left in the double extended-precision floating-point format or converted back into a shorter floating-point format, an integer format, or the packed BCD integer format.


x87 FPU Execution Environment

The x87 FPU instructions treat the eight x87 FPU data registers as a register stack. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field in the x87 FPU status word. Load operations decrement TOP by one and load a value into the new top-of-stack register, and store operations store the value from the current TOP register in memory and then increment TOP by one. (For the x87 FPU, a load operation is equivalent to a push and a store operation is equivalent to a pop.) Note that load and store operations are also available that do not push and pop the stack.

x87 FPU Data Register Stack


If a load operation is performed when TOP is at 0, register wraparound occurs and the new value of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound might cause an unsaved value to be overwritten.

Many floating-point instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the TOP. Assemblers support these register addressing modes, using the expression ST(0), or simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the top of the stack), the following instruction would add the contents of two registers in the stack (registers 3 and 5):

    FADD ST, ST(2);

The figure shows an example of how the stack structure of the x87 FPU registers and instructions are typically used to perform a series of computations. Here, a two-dimensional dot product is computed, as follows:

1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and loads the value 5.6 from memory into ST(0). The result of this operation is shown in snapshot (a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and stores the result in ST(0), shown in snapshot (b).
3. The third instruction decrements TOP and loads the value 3.8 into ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory and stores the result in ST(0), shown in snapshot (c).
5. The fifth instruction adds the value in ST(0) and the value in ST(1) and stores the result in ST(0), shown in snapshot (d).

Example x87 FPU Dot Product Computation
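The five steps of the dot product can be traced with a small model of the register stack. This is a sketch for following the TOP arithmetic only: the class and method names are hypothetical, TOP is assumed to start at 0 so that the first load wraps to register 7, Python floats stand in for the 80-bit extended format, and the tag word and stack-overflow exception are not modeled.

```python
class X87Stack:
    """Eight data registers addressed relative to TOP."""

    def __init__(self):
        self.regs = [0.0] * 8
        self.top = 0                      # assumed initial value of TOP

    def st(self, i: int = 0) -> float:
        """Read ST(i), the i-th register counted from TOP."""
        return self.regs[(self.top + i) % 8]

    def fld(self, value: float) -> None:
        """Load (push): decrement TOP with wraparound, then write ST(0)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fmul_mem(self, value: float) -> None:
        """Multiply ST(0) by a memory operand, result in ST(0)."""
        self.regs[self.top] *= value

    def fadd_st(self, i: int) -> None:
        """Add ST(i) to ST(0), result in ST(0)."""
        self.regs[self.top] += self.st(i)


def dot_product_example() -> X87Stack:
    """Replay the five instructions of the dot product walkthrough."""
    fpu = X87Stack()
    fpu.fld(5.6)         # step 1
    fpu.fmul_mem(2.4)    # step 2
    fpu.fld(3.8)         # step 3
    fpu.fmul_mem(10.3)   # step 4
    fpu.fadd_st(1)       # step 5
    return fpu
```

After the run, ST(0) holds 5.6 × 2.4 + 3.8 × 10.3 and ST(1) still holds the first product, since the add does not pop the stack.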


MICROPROCESSOR INITIALIZATION AND CONFIGURATION Before normal operation of the Pentium processor can begin, the Pentium processor must be initialized by driving the RESET pin active. The RESET pin forces the Pentium processor to begin execution in a known state. Several features are optionally invoked at the falling edge of RESET: Built-in-Self-Test (BIST), Functional Redundancy Checking and Tristate Test Mode. In addition to the standard RESET pin, the Pentium processor has implemented an initialization pin (INIT) that allows the processor to begin execution in a known state without disrupting the contents of the internal caches or the floating-point state. POWER UP SPECIFICATIONS During power up, RESET must be asserted while VCC is approaching nominal operating voltage to prevent internal bus contention which could negatively affect the reliability of the processor. It is recommended that CLK begin toggling within 150 ms after VCC reaches its proper operating level. This recommendation is only to ensure long term reliability of the device. In order for RESET to be recognized, the CLK input needs to be toggling. RESET must remain asserted for 1 millisecond after VCC and CLK have reached their AC/DC specifications. TEST AND CONFIGURATION FEATURES (BIST, FRC, TRISTATE TEST MODE) The INIT, FLUSH#, and FRCMC# inputs are sampled when RESET transitions from high to low to determine if BIST will be run, or if tristate test mode or checker mode will be entered (respectively). If RESET is driven synchronously, these signals must be at their valid level and meet setup and hold times on the clock before the falling edge of RESET. If RESET is asserted asynchronously, these signals must be at their valid level two clocks before and after RESET transitions from high to low. Built In Self-Test Self-test is initiated by driving the INIT pin high when RESET transitions from high to low. No bus cycles are run by the Pentium processor during self test. 
The duration of self-test is approximately $2^{19}$ core clocks. Approximately 70% of the devices in the Pentium processor are tested by BIST. The Pentium processor BIST consists of two parts: hardware self-test and microcode self-test. During the hardware portion of BIST, the microcode ROM and all large PLAs are tested. All possible input combinations of the microcode ROM and PLAs are tested. The constant ROMs, BTB, TLBs, and all caches are tested by the microcode portion of BIST. The array tests (caches, TLBs and BTB) have two passes. On the first pass, data patterns are written to arrays, read back and checked for mismatches. The second pass writes the complement of the initial data pattern, reads it back, and checks for mismatches. The constant ROMs are tested by using the microcode to add various constants and check the result against a stored value.


Upon successful completion of BIST, the cumulative result of all tests is stored in the EAX register. If EAX contains 0h, then all checks passed; any non-zero result indicates a faulty unit.

Tristate Test Mode

When the FLUSH# pin is sampled low when RESET transitions from high to low, the Pentium processor enters tristate test mode. The Pentium processor floats all of its output pins and bidirectional pins, including pins which are never floated during normal operation (except TDO). Tristate test mode can be initiated in order to facilitate testing of board interconnects by external circuitry. The Pentium processor remains in tristate test mode until the RESET pin is asserted again.

Functional Redundancy Checking

The functional redundancy checking master/checker configuration input is sampled when RESET is high to determine whether the Pentium processor is configured in master mode (FRCMC# high) or checker mode (FRCMC# low). The final master/checker configuration of the Pentium processor is determined the clock before the falling edge of RESET. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR#, PICD0, PICD1 and TDO) and samples the output pins (that would normally be driven in master mode). If the sampled value differs from the value computed internally, the Pentium processor asserts IERR# to indicate an error.

INITIALIZATION WITH RESET, INIT AND BIST

Two pins, RESET and INIT, are used to reset the Pentium processor in different manners. A “cold” or “power on” RESET refers to the assertion of RESET while power is initially being applied to the Pentium processor. A “warm” RESET refers to the assertion of RESET or INIT while VCC and CLK remain within specified operating limits. Table 3-1 shows the effect of asserting RESET and/or INIT.

Toggling either the RESET pin or the INIT pin individually forces the Pentium processor to begin execution at address FFFFFFF0h. The internal instruction cache and data cache are invalidated when RESET is asserted (modified lines in the data cache are NOT written back). The instruction cache and data cache are not altered when the INIT pin is asserted without RESET. In both cases, the branch target buffer (BTB) and translation lookaside buffers (TLBs) are invalidated.

After RESET (with or without BIST) or INIT, the Pentium processor will start executing instructions at location FFFFFFF0h. When the first intersegment jump or call instruction is executed, address lines A20-A31 will be driven low for CS-relative memory cycles and the Pentium processor will only execute instructions in the lower one Mbyte of physical memory. This allows the system designer to use a ROM at the top of physical memory to initialize the system.

RESET is internally hardwired and forces the Pentium processor to terminate all execution and bus cycle activity within 2 clocks. No instruction or bus activity will occur as long as RESET is active. INIT is implemented as an edge-triggered interrupt and will be recognized when an instruction boundary is reached. As soon as the Pentium processor completes the INIT sequence, instruction execution and bus cycle activity will continue at address FFFFFFF0h even if the INIT pin is not deasserted.

At the conclusion of RESET (with or without self-test) or INIT, the DX register will contain a component identifier. The upper byte will contain 05h and the lower byte will contain a stepping identifier.
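The configuration sampling at the falling edge of RESET and the layout of the DX identifier can be sketched as two small helpers. This is an illustration only: the function names and dictionary keys are hypothetical, pin levels are passed as booleans with True meaning electrically high, and the stepping value in the usage note is made up.

```python
def sample_reset_config(init: bool, flush_n: bool, frcmc_n: bool) -> dict:
    """Interpret INIT, FLUSH# and FRCMC# as sampled when RESET goes high to low."""
    return {
        "run_bist": init,                    # INIT high  -> built-in self-test runs
        "tristate_test_mode": not flush_n,   # FLUSH# low -> tristate test mode
        "checker_mode": not frcmc_n,         # FRCMC# low -> checker, high -> master
    }


def decode_dx(dx: int) -> dict:
    """Split the DX value left after RESET or INIT into its two bytes."""
    return {
        "component_id": (dx >> 8) & 0xFF,    # upper byte, 05h per the text
        "stepping": dx & 0xFF,               # lower byte, stepping identifier
    }
```

For instance, `decode_dx(0x0512)` returns a component identifier of 5 and a stepping of 12h, where 12h is an arbitrary value chosen for the example.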


BUS CYCLES

The Pentium processor bus is designed to support a 528-Mbyte/sec data transfer rate at 66 MHz. All data transfers occur as a result of one or more bus cycles.

PHYSICAL MEMORY AND I/O INTERFACE

Pentium processor memory is accessible in 8-, 16-, 32-, and 64-bit quantities. Pentium processor I/O is accessible in 8-, 16-, and 32-bit quantities. The Pentium processor can directly address up to 4 Gbytes of physical memory, and up to 64 Kbytes of I/O. In hardware, memory space is organized as a sequence of 64-bit quantities. Each 64-bit location has eight individually addressable bytes at consecutive memory addresses.

Memory Organization

The I/O space is organized as a sequence of 32-bit quantities. Each 32-bit quantity has four individually addressable bytes at consecutive memory addresses. See Figure for a conceptual diagram of the I/O space.

I/O Space Organization


Sixty-four-bit memories are organized as arrays of physical quadwords (8-byte words). Physical quadwords begin at addresses evenly divisible by 8. The quadwords are addressable by physical address lines A31-A3. Thirty-two-bit memories are organized as arrays of physical dwords (4-byte words). Physical dwords begin at addresses evenly divisible by 4. The dwords are addressable by physical address lines A31-A3 and A2. A2 can be decoded from the byte enables. Sixteen-bit memories are organized as arrays of physical words (2-byte words). Physical words begin at addresses evenly divisible by 2.

DATA TRANSFER MECHANISM

All data transfers occur as a result of one or more bus cycles. Logical data operands of byte, word, dword, and quadword lengths may be transferred. Data may be accessed at any byte boundary, but two cycles may be required for misaligned data transfers. The Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte boundary to be misaligned. In addition, an 8-byte operand that crosses an 8-byte boundary is misaligned.

The Pentium processor address signals are split into two components. High-order address bits are provided by the address lines A31-A3. The byte enables BE7#-BE0# form the low-order address and select the appropriate bytes of the 8-byte data bus. For both memory and I/O accesses, the byte enable outputs indicate which of the associated data bus bytes are driven valid for write cycles and on which bytes data is expected back for read cycles. Non-contiguous byte enable patterns will never occur.

Generating A2-A0 from BE7-0#
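The relationship between an operand's address, the byte enables and the low-order address bits can be sketched as follows. This is a simplified model under stated assumptions: byte enables are represented as an 8-bit mask in which bit n set means BEn# is asserted (the pins themselves are active low), the function names are hypothetical, and misaligned operands are rejected rather than split into two cycles.

```python
def is_misaligned(address: int, size: int) -> bool:
    """Apply the misalignment rule for 2-, 4- and 8-byte operands."""
    if size in (2, 4):
        return (address & 3) + size > 4    # crosses a 4-byte boundary
    if size == 8:
        return (address & 7) != 0          # crosses an 8-byte boundary
    return False                           # single bytes are always aligned


def byte_enables(address: int, size: int) -> int:
    """Mask of asserted byte enables for one aligned access of 1, 2, 4 or 8 bytes."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if is_misaligned(address, size):
        raise ValueError("misaligned operand: two bus cycles are required")
    return ((1 << size) - 1) << (address & 7)


def low_address_bits(enables: int) -> int:
    """Recover A2-A0 as the number of the lowest asserted byte enable."""
    if enables == 0:
        raise ValueError("no byte enable asserted")
    return (enables & -enables).bit_length() - 1
```

A dword read at address 1004h asserts BE7#-BE4# (mask F0h), and external logic for a narrower memory can recover A2-A0 = 4 from that pattern. Because byte enable patterns are always contiguous, the lowest asserted enable is enough to identify the starting byte.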

Interfacing With 8-, 16-, 32-, and 64-Bit Memories

In 64-bit physical memories, each 8-byte quadword begins at a byte address that is a multiple of eight. A31-A3 are used as an 8-byte quadword select and BE7#-BE0# select individual bytes within the quadword.


Pentium® Processor with 64-Bit Memory

The Figure shows the Pentium processor data bus interface to 32-, 16- and 8-bit wide memories. External byte swapping logic is needed on the data lines so that data is supplied to and received from the Pentium processor on the correct data pins (see Table). For memory widths smaller than 64 bits, byte assembly logic is needed to return all bytes of data requested by the Pentium processor in one cycle.

Addressing 32-, 16- and 8-Bit Memories


Data Bus Interface to 32-, 16- and 8-Bit Memories

Operand alignment and size dictate when two cycles are required for a data transfer.
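The misalignment rule stated under DATA TRANSFER MECHANISM can be written as a small check. This is only a sketch: the function name is hypothetical, and it encodes just the two rules given in the text (2- and 4-byte operands against a 4-byte boundary, 8-byte operands against an 8-byte boundary).

```python
def is_misaligned(address, size):
    """True if the operand is misaligned by the rules quoted in the text."""
    if size not in (1, 2, 4, 8):
        raise ValueError("operand size must be 1, 2, 4 or 8 bytes")
    boundary = 8 if size == 8 else 4
    first_block = address // boundary                 # block holding the first byte
    last_block = (address + size - 1) // boundary     # block holding the last byte
    return first_block != last_block


print(is_misaligned(0x1001, 2))   # False: both bytes lie inside one dword
print(is_misaligned(0x1003, 2))   # True: the word straddles a 4-byte boundary
print(is_misaligned(0x100C, 8))   # True: the quadword straddles an 8-byte boundary
```

A misaligned operand is the case in which two bus cycles may be needed; an aligned one is transferred in a single cycle.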


BUS STATE DEFINITION

This section describes the Pentium processor bus states in detail. See Figure for the bus state diagram.

Ti: This is the bus idle state. In this state, no bus cycles are being run. The Pentium processor may or may not be driving the address and status pins, depending on the state of the HLDA, AHOLD, and BOFF# inputs. An asserted BOFF# or RESET will always force the state machine back to this state. HLDA will only be driven in this state.

T1: This is the first clock of a bus cycle. Valid address and status are driven out and ADS# is asserted. There is one outstanding bus cycle.

T2: This is the second and subsequent clock of the first outstanding bus cycle. In state T2, data is driven out (if the cycle is a write), or data is expected (if the cycle is a read), and the BRDY# pin is sampled. There is one outstanding bus cycle.

T12: This state indicates there are two outstanding bus cycles, and that the Pentium processor is starting the second bus cycle at the same time that data is being transferred for the first. In T12, the Pentium processor drives the address and status and asserts ADS# for the second outstanding bus cycle, while data is transferred and BRDY# is sampled for the first outstanding cycle.

T2P: This state indicates there are two outstanding bus cycles, and that both are in their second and subsequent clocks. In T2P, data is being transferred and BRDY# is sampled for the first outstanding cycle. The address, status and ADS# for the second outstanding cycle were driven sometime in the past (in state T12).

TD: This state indicates there is one outstanding bus cycle, that its address, status and ADS# have already been driven sometime in the past (in state T12), and that the data and BRDY# pins are not being sampled because the data bus requires one dead clock to turn around between consecutive reads and writes, or writes and reads. The Pentium processor enters TD if in the previous clock there were two outstanding cycles, the last BRDY# was returned, and a dead clock is needed.

The timing diagrams in the next section give examples of when a dead clock is needed. Table gives a brief summary of bus activity during each bus state. Figure shows the Pentium processor bus state diagram.

Pentium® Processor Bus Activity


Pentium® Processor Bus Control State Machine


BUS CYCLES

The Pentium processor requests data transfer cycles, bus cycles, and bus operations. A data transfer cycle is one data item, up to 8 bytes in width, being returned to the Pentium processor or accepted from the Pentium processor with BRDY# asserted. A bus cycle begins with the Pentium processor driving an address and status and asserting ADS#, and ends when the last BRDY# is returned. A bus cycle may have 1 or 4 data transfers. A burst cycle is a bus cycle with 4 data transfers. A bus operation is a sequence of bus cycles to carry out a specific function, such as a locked read-modify-write or an interrupt acknowledge.

Single-Transfer Cycle

The Pentium processor supports a number of different types of bus cycles. The simplest type of bus cycle is a single-transfer non-cacheable 64-bit cycle, either with or without wait states. Non-pipelined read and write cycles with 0 wait states are shown in Figure.

Non Pipelined Read or Write
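A non-pipelined single-transfer cycle can be pictured as a short sequence of the bus states defined earlier: one T1 clock in which ADS# is asserted, followed by T2 clocks until BRDY# is returned. The sketch below only lists that sequence; the function name is hypothetical, and it assumes no pipelining (no T12 or T2P) and no dead clock (no TD).

```python
def single_transfer_states(wait_states=0):
    """Bus states visited by one non-pipelined single-transfer cycle."""
    if wait_states < 0:
        raise ValueError("wait_states cannot be negative")
    # T1 drives address/status with ADS#; T2 repeats until BRDY# is sampled active.
    return ["T1"] + ["T2"] * (1 + wait_states)


print(single_transfer_states(0))   # ['T1', 'T2']
print(single_transfer_states(2))   # ['T1', 'T2', 'T2', 'T2']
```

Under these assumptions a zero-wait-state transfer occupies two clocks, and every wait state adds one more T2 clock.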


The Pentium processor initiates a cycle by asserting the address status signal (ADS#) in the first clock. The clock in which ADS# is asserted is by definition the first clock in the bus cycle. The ADS# output indicates that a valid bus cycle definition and address are available on the cycle definition pins and the address bus. The CACHE# output is deasserted (high) to indicate that the cycle will be a single transfer cycle.

For a zero wait state transfer, BRDY# is returned by the external system in the second clock of the bus cycle. BRDY# indicates that the external system has presented valid data on the data pins in response to a read or the external system has accepted data in response to a write. The Pentium processor samples the BRDY# input in the second and subsequent clocks of a bus cycle.

If the system is not ready to drive or accept data, wait states can be added to these cycles by not returning BRDY# to the processor at the end of the second clock. Cycles of this type, with one and two wait states added, are shown in Figure. Note that BRDY# must be driven inactive at the end of the second clock.

Burst Cycles

For bus cycles that require more than a single data transfer (cacheable cycles and writeback cycles), the Pentium processor uses the burst data transfer. In burst transfers, a new data item can be sampled or driven by the Pentium processor in consecutive clocks. In addition, the addresses of the data items in burst cycles all fall within the same 32-byte aligned area (corresponding to an internal Pentium processor cache line).

Burst cycles are implemented via the BRDY# pin. While running a bus cycle of more than one data transfer, the Pentium processor requires that the memory system perform a burst transfer and follow the burst order (see Table). Given the first address in the burst sequence, the address of subsequent transfers must be calculated by external hardware. This requirement exists because the Pentium processor address and byte-enables are asserted for the first transfer and are not re-driven for each transfer. The burst sequence is optimized for two-bank memory subsystems and is shown in Table.

Pentium Processor Burst Order
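Since external hardware has to compute the addresses of the second to fourth transfers itself, a short sketch of that calculation may help. It assumes the usual Pentium interleaved order, in which each transfer address is the first address with its offset inside the 32-byte line XORed with 0, 8h, 10h and 18h in turn; treat this as an illustration to be checked against the burst order table, not as a replacement for it. The function name is hypothetical.

```python
def burst_addresses(first_address):
    """Addresses of the four 8-byte transfers of one burst, in order."""
    line_base = first_address & ~0x1F     # start of the 32-byte aligned line
    first_offset = first_address & 0x18   # which quadword of the line comes first
    return [line_base | (first_offset ^ step) for step in (0x00, 0x08, 0x10, 0x18)]


print([hex(a) for a in burst_addresses(0x1000)])  # ['0x1000', '0x1008', '0x1010', '0x1018']
print([hex(a) for a in burst_addresses(0x1008)])  # ['0x1008', '0x1000', '0x1018', '0x1010']
```

Every address stays inside the same 32-byte line, and the first two transfers always touch the two halves of one 16-byte pair, which is what suits a two-bank memory.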


BURST READ CYCLES

When initiating any read, the Pentium processor will present the address and byte enables for the data item requested. When the cycle is converted into a cache linefill, the first data item returned should correspond to the address sent out by the Pentium processor; however, the byte enables should be ignored, and valid data must be returned on all 64 data lines. In addition, the address of the subsequent transfers in the burst sequence must be calculated by external hardware since the address and byte enables are not re-driven for each transfer.

Figure shows a cacheable burst read cycle. Note that in this case the initial cycle generated by the Pentium processor might have been satisfied by a single data transfer, but was transformed into a multiple-transfer cache fill by KEN# being returned active on the clock that the first BRDY# is returned. In this case KEN# has such an effect because the cycle is internally cacheable in the Pentium processor (CACHE# pin is driven active). KEN# is only sampled once during a cycle to determine cacheability.

Basic Burst Read Cycle


BURST WRITE CYCLES

Figure shows the timing diagram of a basic burst write cycle. KEN# is ignored in a burst write cycle. If the CACHE# pin is active (low) during a write cycle, it indicates that the cycle will be a burst writeback cycle. Burst write cycles are always writebacks of modified lines in the data cache. Writeback cycles have several causes:
1. Writeback due to replacement of a modified line in the data cache.
2. Writeback due to an inquire cycle that hits a modified line in the data cache.
3. Writeback due to an internal snoop that hits a modified line in the data cache.
4. Writebacks caused by asserting the FLUSH# pin.
5. Writebacks caused by executing the WBINVD instruction.

The only write cycles that are burstable by the Pentium processor are writeback cycles. All other write cycles will be 64 bits or less, single transfer bus cycles.

Basic Burst Write Cycle

For writeback cycles, the lower five bits of the first burst address are always zero; therefore, the burst order becomes 0, 8h, 10h, and 18h. Again, note that the address of the subsequent transfers in the burst sequence must be calculated by external hardware since the Pentium processor does not drive the address and byte enables for each transfer.


Locked Operations

The Pentium processor architecture provides a facility to perform atomic accesses of memory. For example, a programmer can change the contents of a memory-based variable and be assured that the variable was not accessed by another bus master between the read of the variable and the update of that variable. This functionality is provided for select instructions using a LOCK prefix, and also for instructions which implicitly perform locked read-modify-write cycles, such as the XCHG (exchange) instruction when one of its operands is memory based. Locked cycles are also generated when a segment descriptor or page table entry is updated and during interrupt acknowledge cycles.

In hardware, the LOCK functionality is implemented through the LOCK# pin, which indicates to the outside world that the Pentium processor is performing a read-modify-write sequence of cycles, and that the Pentium processor should be allowed atomic access for the location that was accessed with the first locked cycle. Locked operations begin with a read cycle and end with a write cycle. Note that the data width read is not necessarily the data width written. For example, for descriptor access bit updates the Pentium processor fetches eight bytes and writes one byte. A locked operation is a combination of one or multiple read cycles followed by one or multiple write cycles. Programmer generated locked cycles and locked page table / directory accesses are treated differently and are described in the following sections.

Snooping (Inquire)

When operating in an MP system, IA-32 processors (beginning with the Intel486 processor) have the ability to snoop other processors' accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus. For example, in the Pentium and P6 family processors, if through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in shared state, the snooping processor will invalidate its cache line, forcing it to perform a cache line fill the next time it accesses the same memory location.
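The invalidate-on-snoop behaviour in the example above can be mimicked with a very small model. This is a sketch under strong simplifications: the class and method names are hypothetical, only lines held in shared state are tracked (modified lines and their writebacks are left out), and the 32-byte line size is taken from the burst cycle discussion.

```python
class SnoopingCache:
    """Toy model of one processor's cache holding lines in shared state only."""

    LINE_MASK = ~0x1F                      # 32-byte cache lines

    def __init__(self):
        self.shared_lines = set()          # line addresses currently cached
        self.line_fills = 0                # how many fills this cache performed

    def read(self, address):
        line = address & self.LINE_MASK
        if line not in self.shared_lines:
            self.line_fills += 1           # miss: fetch the whole line
            self.shared_lines.add(line)

    def snoop_write(self, address):
        # Another processor intends to write: invalidate our copy if we have one.
        self.shared_lines.discard(address & self.LINE_MASK)


cache = SnoopingCache()
cache.read(0x2004)          # first access: line fill
cache.read(0x2008)          # same line: hit
cache.snoop_write(0x2010)   # other processor writes into that line
cache.read(0x2004)          # line was invalidated: second line fill
print(cache.line_fills)     # 2
```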


REGISTER SET

Alternate General Purpose Register Names
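The alternate names in the figure do not denote separate registers: AX is the low 16 bits of EAX, AL is its lowest byte and AH the byte above it, and likewise for EBX, ECX and EDX. A small sketch with a hypothetical helper makes the overlap concrete.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into the parts named AX, AH and AL."""
    eax &= 0xFFFFFFFF                 # keep the value within 32 bits
    return {
        "AX": eax & 0xFFFF,           # bits 15..0
        "AH": (eax >> 8) & 0xFF,      # bits 15..8
        "AL": eax & 0xFF,             # bits 7..0
    }


parts = sub_registers(0x12345678)
print({name: hex(value) for name, value in parts.items()})
# {'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
```

Writing to AL therefore changes the low byte of both AX and EAX; the upper 16 bits of EAX have no separate name.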


• I/O ports: The IA-32 architecture supports transfers of data to and from input/output (I/O) ports.
• Control registers: The five control registers (CR0 through CR4) determine the operating mode of the processor and the characteristics of the currently executing task.
• Memory management registers: The GDTR, IDTR, task register, and LDTR specify the locations of data structures used in protected mode memory management.
• Debug registers: The debug registers (DR0 through DR7) control and allow monitoring of the processor's debugging operations.

BASIC PROGRAM EXECUTION REGISTERS

The processor provides 16 basic program execution registers for use in general system and application programming (see Figure). These registers can be grouped as follows:
• General-purpose registers. These eight registers are available for storing operands and pointers.
• Segment registers. These registers hold up to six segment selectors.
• EFLAGS (program status and control) register. The EFLAGS register reports on the status of the program being executed and allows limited (application-program level) control of the processor.
• EIP (instruction pointer) register. The EIP register contains a 32-bit pointer to the next instruction to be executed.

The general-purpose registers have the following special uses:
• EAX: Accumulator for operands and results data
• EBX: Pointer to data in the DS segment
• ECX: Counter for string and loop operations
• EDX: I/O pointer
• ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations
• EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
• ESP: Stack pointer (in the SS segment)
• EBP: Pointer to data on the stack (in the SS segment)

As shown in Figure 3-5, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SI, DI, and SP. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).

DATA TYPES

This chapter introduces data types defined for the IA-32 architecture.

FUNDAMENTAL DATA TYPES

The fundamental data types of the IA-32 architecture are bytes, words, doublewords, quadwords, and double quadwords (see Figure). A byte is eight bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits), and a double quadword is 16 bytes (128 bits). A subset of the IA-32 architecture instructions operates on these fundamental data types without any additional operand typing.

Figure shows the byte order of each of the fundamental data types when referenced as operands in memory. The low byte (bits 0 through 7) of each data type occupies the lowest address in memory and that address is also the address of the operand.

Bytes, Words, Doublewords, Quadwords, and Double Quadwords in Memory
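The byte order described above (low byte at the lowest address) is the little-endian convention. A short sketch shows the memory image of a doubleword; the helper name is hypothetical, and Python's `int.to_bytes` is used only to lay the bytes out in address order.

```python
def memory_image(value, size):
    """Bytes of an operand in increasing address order (little-endian)."""
    return list(value.to_bytes(size, byteorder="little"))


image = memory_image(0x12345678, 4)
print([hex(b) for b in image])   # ['0x78', '0x56', '0x34', '0x12']

# Reading the same four bytes back as a doubleword restores the value.
print(hex(int.from_bytes(bytes(image), byteorder="little")))   # 0x12345678
```

The address of the operand is the address of the first byte in the list, the one holding bits 0 through 7.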


Alignment of Words, Doublewords, Quadwords, and Double Quadwords

Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, doublewords, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason for this is that the processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.

NUMERIC DATA TYPES

Although bytes, words, and doublewords are the fundamental data types of the IA-32 architecture, some instructions support additional interpretations of these data types to allow operations to be performed on numeric data types (signed and unsigned integers, and floating-point numbers). See Figure.


Numeric Data Types
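The point that numeric types are interpretations of the same fundamental data can be illustrated by reading one word both ways. The sketch assumes two's complement for signed integers (the IA-32 convention) and little-endian byte order; the function names are hypothetical.

```python
def as_unsigned(raw):
    """Interpret little-endian bytes as an unsigned integer."""
    return int.from_bytes(raw, byteorder="little", signed=False)


def as_signed(raw):
    """Interpret the same bytes as a two's complement signed integer."""
    return int.from_bytes(raw, byteorder="little", signed=True)


word = bytes([0xFE, 0xFF])      # the word FFFEh as stored in memory
print(as_unsigned(word))        # 65534
print(as_signed(word))          # -2
```

The bits in memory are identical in both cases; only the instruction applied to them decides which reading is used.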

OPERAND ADDRESSING

IA-32 machine instructions act on zero or more operands. Some operands are specified explicitly and others are implicit. The data for a source operand can be located in:
• the instruction itself (an immediate operand)
• a register
• a memory location
• an I/O port

When an instruction returns data to a destination operand, it can be returned to:
• a register
• a memory location
• an I/O port

Immediate Operands

Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands (or simply immediates). For example, the following ADD instruction adds an immediate value of 14 to the contents of the EAX register:

ADD EAX, 14


All arithmetic instructions (except the DIV and IDIV instructions) allow the source operand to be an immediate value. The maximum value allowed for an immediate operand varies among instructions, but can never be greater than the maximum value of an unsigned doubleword integer (2^32 - 1).

Register Operands

Source and destination operands can be any of the following registers, depending on the instruction being executed:
• 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)
• 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)
• 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL)
• segment registers (CS, DS, SS, ES, FS, and GS)
• EFLAGS register
• x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand pointer, and instruction pointer)

Some instructions (such as the DIV and MUL instructions) use quadword operands contained in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For example, in the register pair EDX:EAX, EDX contains the high order bits and EAX contains the low order bits of a quadword operand.

Several instructions (such as the PUSHFD and POPFD instructions) are provided to load and store the contents of the EFLAGS register or to set or clear individual flags in this register. Other instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS register as condition codes for branching or other decision making operations.

The processor contains a selection of system registers that are used to control memory management, interrupt and exception handling, task management, processor management, and debugging activities. Some of these system registers are accessible by an application program, the operating system, or the executive through a set of system instructions. When accessing a system register with a system instruction, the register is generally an implied operand of the instruction.

Memory Operands

Source and destination operands in memory are referenced by means of a segment selector and an offset (see Figure). Segment selectors specify the segment containing the operand. Offsets specify the linear or effective address of the operand. Offsets can be 32 bits (represented by the notation m16:32) or 16 bits (represented by the notation m16:16).

Memory Operand Address

Specifying a Segment Selector

The segment selector can be specified either implicitly or explicitly. The most common method of specifying a segment selector is to load it in a segment register and then allow the processor to select the register implicitly, depending on the type of operation being performed. The processor automatically chooses a segment according to the rules given in Table.

When storing data in memory or loading data from memory, the DS segment default can be overridden to allow other segments to be accessed. Within an assembler, the segment override is generally handled with a colon “:” operator. For example, the following MOV instruction moves a value from register EAX into the segment pointed to by the ES register. The offset into the segment is contained in the EBX register:

MOV ES:[EBX], EAX

Default Segment Selection Rules

At the machine level, a segment override is specified with a segment-override prefix, which is a byte placed at the beginning of an instruction. The following default segment selections cannot be overridden:
• Instruction fetches must be made from the code segment.
• Destination strings in string instructions must be stored in the data segment pointed to by the ES register.
• Push and pop operations must always reference the SS segment.

Some instructions require a segment selector to be specified explicitly. In these cases, the 16-bit segment selector can be located in a memory location or in a 16-bit register. For example, the following MOV instruction moves a segment selector located in register BX into segment register DS:

MOV DS, BX

Segment selectors can also be specified explicitly as part of a 48-bit far pointer in memory. Here, the first doubleword in memory contains the offset and the next word contains the segment selector.

Specifying an Offset

The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
• Displacement: An 8-, 16-, or 32-bit value.
• Base: The value in a general-purpose register.
• Index: The value in a general-purpose register.
• Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.


The offset which results from adding these components is called an effective address. Each of these components can have either a positive or negative (2s complement) value, with the exception of the scaling factor. Figure 3-11 shows all the possible ways that these components can be combined to create an effective address in the selected segment.

Offset (or Effective Address) Computation
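The computation in the figure amounts to Base + (Index ∗ Scale) + Displacement, truncated to 32 bits. The sketch below is a hypothetical helper, not processor documentation: a scale of 1 stands for "no scaling", and omitted components default to zero.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """32-bit offset formed from base, scaled index and displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    # Components may be negative (two's complement); the result wraps at 32 bits.
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Element 3 of a static array of doublewords that starts at offset 1000h.
print(hex(effective_address(index=3, scale=4, displacement=0x1000)))   # 0x100c

# A local variable 8 bytes below the frame pointer held in EBP.
print(hex(effective_address(base=0x7FFF0010, displacement=-8)))        # 0x7fff0008
```

The first call corresponds to the (Index ∗ Scale) + Displacement form and the second to Base + Displacement.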

The uses of general-purpose registers as base or index components are restricted in the following manner:
• The ESP register cannot be used as an index register.
• When the ESP or EBP register is used as the base, the SS segment is the default segment. In all other cases, the DS segment is the default segment.

The base, index, and displacement components can be used in any combination, and any of these components can be null. A scale factor may be used only when an index also is used. Each possible combination is useful for data structures commonly used by programmers in high-level languages and assembly language. The following addressing modes suggest uses for common combinations of address components.
• Displacement: A displacement alone represents a direct (uncomputed) offset to the operand. Because the displacement is encoded in the instruction, this form of an address is sometimes called an absolute or static address. It is commonly used to access a statically allocated scalar operand.
• Base: A base alone represents an indirect offset to the operand. Since the value in the base register can change, it can be used for dynamic storage of variables and data structures.
• Base + Displacement: A base register and a displacement can be used together for two distinct purposes:
  - As an index into an array when the element size is not 2, 4, or 8 bytes: the displacement component encodes the static offset to the beginning of the array, and the base register holds the result of a calculation to determine the offset to a specific element within the array.
  - To access a field of a record: the base register holds the address of the beginning of the record, while the displacement is a static offset to the field.
  An important special case of this combination is access to parameters in a procedure activation record. A procedure activation record is the stack frame created when a procedure is entered. Here, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.
• (Index ∗ Scale) + Displacement: This address mode offers an efficient way to index into a static array when the element size is 2, 4, or 8 bytes. The displacement locates the beginning of the array, the index register holds the subscript of the desired array element, and the processor automatically converts the subscript into an index by applying the scaling factor.
• Base + Index + Displacement: Using two registers together supports either a two-dimensional array (the displacement holds the address of the beginning of the array) or one of several instances of an array of records (the displacement is an offset to a field within the record).
• Base + (Index ∗ Scale) + Displacement: Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.

I/O Port Addressing

The processor supports an I/O address space that contains up to 65,536 8-bit I/O ports. Ports that are 16-bit and 32-bit may also be defined in the I/O address space. An I/O port can be addressed with either an immediate operand or a value in the DX register.
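The choice between the two port-addressing forms can be sketched as follows. The helper name is hypothetical, and the example relies on one fact not stated above: the immediate form of IN and OUT carries only an 8-bit port number, so ports above FFh have to go through DX.

```python
def in_al_sequence(port):
    """Assembly lines that read one byte from `port` into AL."""
    if port not in range(0x10000):
        raise ValueError("port must lie in the 64K I/O address space")
    if port in range(0x100):
        # Ports 0..FFh fit in the immediate operand of IN.
        return ["IN AL, 0%Xh" % port]
    # Any port can be reached by loading its number into DX first.
    return ["MOV DX, 0%Xh" % port, "IN AL, DX"]


print(in_al_sequence(0x60))    # ['IN AL, 060h']
print(in_al_sequence(0x3F8))   # ['MOV DX, 03F8h', 'IN AL, DX']
```

The port numbers in the usage lines are arbitrary illustrations, not a statement about any particular system.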


INSTRUCTION SET

• General purpose
• x87 FPU

GENERAL-PURPOSE INSTRUCTIONS

The general-purpose instructions perform basic data movement, arithmetic, logic, program flow, and string operations that programmers commonly use to write application and system software to run on IA-32 processors. They operate on data contained in memory, in the general-purpose registers (EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP) and in the EFLAGS register. They also operate on address information contained in memory, the general-purpose registers, and the segment registers (CS, DS, SS, ES, FS, and GS). This group of instructions includes the data transfer, binary integer arithmetic, decimal arithmetic, logic operations, shift and rotate, bit and byte operations, program control, string, flag control, segment register operations, and miscellaneous subgroups. The sections that follow introduce each subgroup.

Data Transfer Instructions

The data transfer instructions move data between memory and the general-purpose and segment registers. They also perform specific operations such as conditional moves, stack access, and data conversion.

MOV                 Move data between general-purpose registers; move data between memory and general-purpose or segment registers; move immediates to general-purpose registers
CMOVE/CMOVZ         Conditional move if equal/Conditional move if zero
CMOVNE/CMOVNZ       Conditional move if not equal/Conditional move if not zero
CMOVA/CMOVNBE       Conditional move if above/Conditional move if not below or equal
CMOVAE/CMOVNB       Conditional move if above or equal/Conditional move if not below
CMOVB/CMOVNAE       Conditional move if below/Conditional move if not above or equal
CMOVBE/CMOVNA       Conditional move if below or equal/Conditional move if not above
CMOVG/CMOVNLE       Conditional move if greater/Conditional move if not less or equal
CMOVGE/CMOVNL       Conditional move if greater or equal/Conditional move if not less
CMOVL/CMOVNGE       Conditional move if less/Conditional move if not greater or equal
CMOVLE/CMOVNG       Conditional move if less or equal/Conditional move if not greater
CMOVC               Conditional move if carry
CMOVNC              Conditional move if not carry
CMOVO               Conditional move if overflow
CMOVNO              Conditional move if not overflow
CMOVS               Conditional move if sign (negative)
CMOVNS              Conditional move if not sign (non-negative)
CMOVP/CMOVPE        Conditional move if parity/Conditional move if parity even
CMOVNP/CMOVPO       Conditional move if not parity/Conditional move if parity odd
XCHG                Exchange
BSWAP               Byte swap
XADD                Exchange and add
CMPXCHG             Compare and exchange
CMPXCHG8B           Compare and exchange 8 bytes
PUSH                Push onto stack
POP                 Pop off of stack
PUSHA/PUSHAD        Push general-purpose registers onto stack
POPA/POPAD          Pop general-purpose registers from stack
CWD/CDQ             Convert word to doubleword/Convert doubleword to quadword
CBW/CWDE            Convert byte to word/Convert word to doubleword in EAX register
MOVSX               Move and sign extend
MOVZX               Move and zero extend

Binary Arithmetic Instructions

The binary arithmetic instructions perform basic binary integer computations on byte, word, and doubleword integers located in memory and/or the general-purpose registers.

ADD                 Integer add
ADC                 Add with carry
SUB                 Subtract
SBB                 Subtract with borrow
IMUL                Signed multiply
MUL                 Unsigned multiply
IDIV                Signed divide
DIV                 Unsigned divide
INC                 Increment
DEC                 Decrement
NEG                 Negate
CMP                 Compare

Decimal Arithmetic Instructions

The decimal arithmetic instructions perform decimal arithmetic on binary coded decimal (BCD) data.

DAA                 Decimal adjust after addition
DAS                 Decimal adjust after subtraction
AAA                 ASCII adjust after addition
AAS                 ASCII adjust after subtraction
AAM                 ASCII adjust after multiplication
AAD                 ASCII adjust before division

Logical Instructions

The logical instructions perform basic AND, OR, XOR, and NOT logical operations on byte, word, and doubleword values.

AND                 Perform bitwise logical AND
OR                  Perform bitwise logical OR
XOR                 Perform bitwise logical exclusive OR
NOT                 Perform bitwise logical NOT
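ADD and ADC from the binary arithmetic group are commonly paired to add values wider than one register: ADD handles the low doublewords and sets the carry flag, and ADC adds the high doublewords plus that carry. The sketch below imitates the pair in software; the function names are hypothetical and only the carry flag is modelled.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """32-bit add returning (result, carry_out), like ADD (or ADC with carry_in=1)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as high:low doubleword pairs."""
    lo, carry = add32(a_lo, b_lo)            # ADD: low halves, produces CF
    hi, carry = add32(a_hi, b_hi, carry)     # ADC: high halves plus CF
    return hi, lo, carry


# 00000000:FFFFFFFFh + 1 carries into the high doubleword.
print(add64(0x00000000, 0xFFFFFFFF, 0x00000000, 0x00000001))   # (1, 0, 0)
```

SUB and SBB play the same roles for multi-doubleword subtraction, with the carry flag acting as a borrow.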


Shift and Rotate Instructions The shift and rotate instructions shift and rotate the bits in word and doubleword operands. SAR Shift arithmetic right SHR Shift logical right SAL/SHL Shift arithmetic left/Shift logical left SHRD Shift right double SHLD Shift left double ROR Rotate right ROL Rotate left RCR Rotate through carry right RCL Rotate through carry left Bit and Byte Instructions Bit instructions test and modify individual bits in word and doubleword operands. Byte instructions set the value of a byte operand to indicate the status of flags in the EFLAGS register. BT Bit test BTS Bit test and set BTR Bit test and reset BTC Bit test and complement BSF Bit scan forward BSR Bit scan reverse SETE/SETZ Set byte if equal/Set byte if zero SETNE/SETNZ Set byte if not equal/Set byte if not zero SETA/SETNBE Set byte if above/Set byte if not below or equal SETAE/SETNB/SETNC Set byte if above or equal/Set byte if not below/Set byte if not carry SETB/SETNAE/SETC Set byte if below/Set byte if not above or equal/Set byte if carry SETBE/SETNA Set byte if below or equal/Set byte if not above SETG/SETNLE Set byte if greater/Set byte if not less or equal SETGE/SETNL Set byte if greater or equal/Set byte if not less SETL/SETNGE Set byte if less/Set byte if not greater or equal SETLE/SETNG Set byte if less or equal/Set byte if not greater SETS Set byte if sign (negative) SETNS Set byte if not sign (non-negative) SETO Set byte if overflow SETNO Set byte if not overflow SETPE/SETP Set byte if parity even/Set byte if parity SETPO/SETNP Set byte if parity odd/Set byte if not parity TEST Logical compare Control Transfer Instructions The control transfer instructions provide jump, conditional jump, loop, and call and return operations to control program flow. JMP Jump JE/JZ Jump if equal/Jump if zero


JNE/JNZ Jump if not equal/Jump if not zero JA/JNBE Jump if above/Jump if not below or equal JAE/JNB Jump if above or equal/Jump if not below JB/JNAE Jump if below/Jump if not above or equal JBE/JNA Jump if below or equal/Jump if not above JG/JNLE Jump if greater/Jump if not less or equal JGE/JNL Jump if greater or equal/Jump if not less JL/JNGE Jump if less/Jump if not greater or equal JLE/JNG Jump if less or equal/Jump if not greater JC Jump if carry JNC Jump if not carry JO Jump if overflow JNO Jump if not overflow JS Jump if sign (negative) JNS Jump if not sign (non-negative) JPO/JNP Jump if parity odd/Jump if not parity JPE/JP Jump if parity even/Jump if parity JCXZ/JECXZ Jump register CX zero/Jump register ECX zero LOOP Loop with ECX counter LOOPZ/LOOPE Loop with ECX and zero/Loop with ECX and equal LOOPNZ/LOOPNE Loop with ECX and not zero/Loop with ECX and not equal CALL Call procedure RET Return IRET Return from interrupt INT Software interrupt INTO Interrupt on overflow BOUND Detect value out of range ENTER High-level procedure entry LEAVE High-level procedure exit String Instructions The string instructions operate on strings of bytes, allowing them to be moved to and from memory. MOVS/MOVSB Move string/Move byte string MOVS/MOVSW Move string/Move word string MOVS/MOVSD Move string/Move doubleword string CMPS/CMPSB Compare string/Compare byte string CMPS/CMPSW Compare string/Compare word string CMPS/CMPSD Compare string/Compare doubleword string SCAS/SCASB Scan string/Scan byte string SCAS/SCASW Scan string/Scan word string SCAS/SCASD Scan string/Scan doubleword string LODS/LODSB Load string/Load byte string LODS/LODSW Load string/Load word string LODS/LODSD Load string/Load doubleword string STOS/STOSB Store string/Store byte string STOS/STOSW Store string/Store word string


STOS/STOSD Store string/Store doubleword string REP Repeat while ECX not zero REPE/REPZ Repeat while equal/Repeat while zero REPNE/REPNZ Repeat while not equal/Repeat while not zero I/O Instructions These instructions move data between the processor’s I/O ports and a register or memory. IN Read from a port OUT Write to a port INS/INSB Input string from port/Input byte string from port INS/INSW Input string from port/Input word string from port INS/INSD Input string from port/Input doubleword string from port OUTS/OUTSB Output string to port/Output byte string to port OUTS/OUTSW Output string to port/Output word string to port OUTS/OUTSD Output string to port/Output doubleword string to port Enter and Leave Instructions These instructions provide machine-language support for procedure calls in block-structured languages. ENTER High-level procedure entry LEAVE High-level procedure exit Flag Control (EFLAG) Instructions The flag control instructions operate on the flags in the EFLAGS register. STC Set carry flag CLC Clear the carry flag CMC Complement the carry flag CLD Clear the direction flag STD Set direction flag LAHF Load flags into AH register SAHF Store AH register into flags PUSHF/PUSHFD Push EFLAGS onto stack POPF/POPFD Pop EFLAGS from stack STI Set interrupt flag CLI Clear the interrupt flag Segment Register Instructions The segment register instructions allow far pointers (segment addresses) to be loaded into the segment registers. LDS Load far pointer using DS LES Load far pointer using ES LFS Load far pointer using FS LGS Load far pointer using GS LSS Load far pointer using SS Miscellaneous Instructions The miscellaneous instructions provide such functions as loading an effective address, executing a “no-operation,” and retrieving processor identification information. LEA Load effective address NOP No operation UD2 Undefined instruction


XLAT/XLATB  Table lookup translation
CPUID  Processor identification

X87 FPU INSTRUCTIONS

The x87 FPU instructions are executed by the processor’s x87 FPU. These instructions operate on floating-point, integer, and binary-coded decimal (BCD) operands. For more detail on x87 FPU instructions, see Chapter 8, Programming with the x87 FPU. These instructions are divided into the following subgroups: data transfer, basic arithmetic, comparison, transcendental, load constants, and FPU control instructions. The sections that follow introduce each subgroup.

x87 FPU Data Transfer Instructions

The data transfer instructions move floating-point, integer, and BCD values between memory and the x87 FPU registers. They also perform conditional move operations on floating-point operands.

FLD  Load floating-point value
FST  Store floating-point value
FSTP  Store floating-point value and pop
FILD  Load integer
FIST  Store integer
FISTP  Store integer and pop
FBLD  Load BCD
FBSTP  Store BCD and pop
FXCH  Exchange registers
FCMOVE  Floating-point conditional move if equal
FCMOVNE  Floating-point conditional move if not equal
FCMOVB  Floating-point conditional move if below
FCMOVBE  Floating-point conditional move if below or equal
FCMOVNB  Floating-point conditional move if not below
FCMOVNBE  Floating-point conditional move if not below or equal
FCMOVU  Floating-point conditional move if unordered
FCMOVNU  Floating-point conditional move if not unordered

x87 FPU Basic Arithmetic Instructions

The basic arithmetic instructions perform basic arithmetic operations on floating-point and integer operands.

FADD  Add floating-point
FADDP  Add floating-point and pop
FIADD  Add integer
FSUB  Subtract floating-point
FSUBP  Subtract floating-point and pop
FISUB  Subtract integer
FSUBR  Subtract floating-point reverse
FSUBRP  Subtract floating-point reverse and pop
FISUBR  Subtract integer reverse
FMUL  Multiply floating-point
FMULP  Multiply floating-point and pop


FIMUL  Multiply integer
FDIV  Divide floating-point
FDIVP  Divide floating-point and pop
FIDIV  Divide integer
FDIVR  Divide floating-point reverse
FDIVRP  Divide floating-point reverse and pop
FIDIVR  Divide integer reverse
FPREM  Partial remainder
FPREM1  IEEE partial remainder
FABS  Absolute value
FCHS  Change sign
FRNDINT  Round to integer
FSCALE  Scale by power of two
FSQRT  Square root
FXTRACT  Extract exponent and significand

x87 FPU Comparison Instructions

The compare instructions examine or compare floating-point or integer operands.

FCOM  Compare floating-point
FCOMP  Compare floating-point and pop
FCOMPP  Compare floating-point and pop twice
FUCOM  Unordered compare floating-point
FUCOMP  Unordered compare floating-point and pop
FUCOMPP  Unordered compare floating-point and pop twice
FICOM  Compare integer
FICOMP  Compare integer and pop
FCOMI  Compare floating-point and set EFLAGS
FUCOMI  Unordered compare floating-point and set EFLAGS
FCOMIP  Compare floating-point, set EFLAGS, and pop
FUCOMIP  Unordered compare floating-point, set EFLAGS, and pop
FTST  Test floating-point (compare with 0.0)
FXAM  Examine floating-point

x87 FPU Transcendental Instructions

The transcendental instructions perform basic trigonometric and logarithmic operations on floating-point operands.

FSIN  Sine
FCOS  Cosine
FSINCOS  Sine and cosine
FPTAN  Partial tangent
FPATAN  Partial arctangent
F2XM1  2^x − 1
FYL2X  y * log2(x)
FYL2XP1  y * log2(x + 1)

x87 FPU Load Constants Instructions


The load constants instructions load common constants, such as π, into the x87 floating-point registers.

FLD1  Load +1.0
FLDZ  Load +0.0
FLDPI  Load π
FLDL2E  Load log2(e)
FLDLN2  Load loge(2)
FLDL2T  Load log2(10)
FLDLG2  Load log10(2)

x87 FPU Control Instructions

The x87 FPU control instructions operate on the x87 FPU register stack and save and restore the x87 FPU state.

FINCSTP  Increment FPU register stack pointer
FDECSTP  Decrement FPU register stack pointer
FFREE  Free floating-point register
FINIT  Initialize FPU after checking error conditions
FNINIT  Initialize FPU without checking error conditions
FCLEX  Clear floating-point exception flags after checking for error conditions
FNCLEX  Clear floating-point exception flags without checking for error conditions
FSTCW  Store FPU control word after checking error conditions
FNSTCW  Store FPU control word without checking error conditions
FLDCW  Load FPU control word
FSTENV  Store FPU environment after checking error conditions
FNSTENV  Store FPU environment without checking error conditions
FLDENV  Load FPU environment
FSAVE  Save FPU state after checking error conditions
FNSAVE  Save FPU state without checking error conditions
FRSTOR  Restore FPU state
FSTSW  Store FPU status word after checking error conditions
FNSTSW  Store FPU status word without checking error conditions
WAIT/FWAIT  Wait for FPU


EFLAGS Register

The 32-bit EFLAGS register contains a group of status flags, a control flag, and a group of system flags. Figure 3-8 defines the flags within this register. Following initialization of the processor (either by asserting the RESET pin or the INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through 31 of this register are reserved. Software should not use or depend on the states of any of these bits.

Some of the flags in the EFLAGS register can be modified directly, using special-purpose instructions (described in the following sections). There are no instructions that allow the whole register to be examined or modified directly. The following instructions can be used to move groups of flags to and from the procedure stack or the EAX register: LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD. After the contents of the EFLAGS register have been transferred to the procedure stack or EAX register, the flags can be examined and modified using the processor’s bit manipulation instructions (BT, BTS, BTR, and BTC).

When suspending a task (using the processor’s multitasking facilities), the processor automatically saves the state of the EFLAGS register in the task state segment (TSS) for the task being suspended. When binding itself to a new task, the processor loads the EFLAGS register with data from the new task’s TSS.

When a call is made to an interrupt or exception handler procedure, the processor automatically saves the state of the EFLAGS register on the procedure stack. When an interrupt or exception is handled with a task switch, the state of the EFLAGS register is saved in the TSS for the task being suspended.


EFLAGS Register

Status Flags The status flags (bits 0, 2, 4, 6, 7, and 11) of the EFLAGS register indicate the results of arithmetic instructions, such as the ADD, SUB, MUL, and DIV instructions. The status flag functions are: CF (bit 0) Carry flag — Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise. This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic. PF (bit 2) Parity flag — Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise. AF (bit 4) Adjust flag — Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise. This flag is used in binary-coded decimal (BCD) arithmetic. ZF (bit 6) Zero flag — Set if the result is zero; cleared otherwise. SF (bit 7) Sign flag — Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.) OF (bit 11) Overflow flag — Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.
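The flag definitions above can be checked by hand on a small case. The following Python sketch models an 8-bit ADD and derives the six status flags from the definitions given; the function name add8_flags and the choice of an 8-bit operand size are illustrative assumptions, not part of the processor documentation.

```python
def add8_flags(a: int, b: int) -> dict:
    """Model an 8-bit ADD and derive the status flags from their definitions."""
    a &= 0xFF
    b &= 0xFF
    full = a + b                 # unbounded sum, used to detect the carry out
    result = full & 0xFF         # what actually fits in the 8-bit destination

    def sign(v: int) -> int:
        return (v >> 7) & 1      # most-significant bit of an 8-bit value

    return {
        "result": result,
        "CF": int(full > 0xFF),                              # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),          # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),           # carry out of bit 3
        "ZF": int(result == 0),
        "SF": sign(result),
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": int(sign(a) == sign(b) and sign(result) != sign(a)),
    }


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))   # signed overflow: 127 + 1
    print(add8_flags(0xFF, 0x01))   # unsigned overflow: 255 + 1
```

The two calls show why CF and OF are separate flags: 7FH + 01H overflows as a signed addition without producing a carry, while FFH + 01H produces a carry without a signed overflow.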


DF Flag -- The direction flag (DF, located in bit 10 of the EFLAGS register) controls string instructions (MOVS, CMPS, SCAS, LODS, and STOS). Setting the DF flag causes the string instructions to auto-decrement (to process strings from high addresses to low addresses). Clearing the DF flag causes the string instructions to auto-increment (process strings from low addresses to high addresses). The STD and CLD instructions set and clear the DF flag, respectively. System Flags and IOPL Field The system flags and IOPL field in the EFLAGS register control operating-system or executive operations. They should not be modified by application programs. The functions of the system flags are as follows: TF (bit 8) Trap flag — Set to enable single-step mode for debugging; clear to disable single-step mode. IF (bit 9) Interrupt enable flag — Controls the response of the processor to maskable interrupt requests. Set to respond to maskable interrupts; cleared to inhibit maskable interrupts. IOPL (bits 12 and 13) I/O privilege level field — Indicates the I/O privilege level of the currently running program or task. The current privilege level (CPL) of the currently running program or task must be less than or equal to the I/O privilege level to access the I/O address space. This field can only be modified by the POPF and IRET instructions when operating at a CPL of 0. NT (bit 14) Nested task flag — Controls the chaining of interrupted and called tasks. Set when the current task is linked to the previously executed task; cleared when the current task is not linked to another task. RF (bit 16) Resume flag — Controls the processor’s response to debug exceptions. VM (bit 17) Virtual-8086 mode flag — Set to enable virtual-8086 mode; clear to return to protected mode without virtual-8086 mode semantics. 
AC (bit 18) Alignment check flag — Set this flag and the AM bit in the CR0 register to enable alignment checking of memory references; clear the AC flag and/or the AM bit to disable alignment checking. VIF (bit 19) Virtual interrupt flag — Virtual image of the IF flag. Used in conjunction with the VIP flag. (To use this flag and the VIP flag the virtual mode extensions are enabled by setting the VME flag in control register CR4.) VIP (bit 20) Virtual interrupt pending flag — Set to indicate that an interrupt is pending; clear when no interrupt is pending. (Software sets and clears this flag; the processor only reads it.) Used in conjunction with the VIF flag. ID (bit 21) Identification flag — The ability of a program to set or clear this flag indicates support for the CPUID instruction.
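As an illustration of the bit positions listed above, the following Python sketch unpacks a 32-bit EFLAGS image into named fields. The names EFLAGS_BITS, decode_eflags, and io_allowed are hypothetical helpers introduced for this example. The io_allowed check models only the CPL/IOPL comparison described above and, as a simplifying assumption, ignores the I/O permission bit map in the TSS.

```python
# Bit positions of the single-bit flags, taken from the descriptions above.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Return every named flag plus the two-bit IOPL field."""
    fields = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    fields["IOPL"] = (value >> 12) & 0b11    # bits 12 and 13
    return fields


def io_allowed(cpl: int, eflags: int) -> bool:
    """CPL must be numerically less than or equal to IOPL."""
    return cpl <= decode_eflags(eflags)["IOPL"]


if __name__ == "__main__":
    reset_state = 0x00000002                 # value after RESET or INIT
    print(decode_eflags(reset_state))
    print(io_allowed(3, 0x00003202))         # IOPL = 3
    print(io_allowed(3, 0x00000202))         # IOPL = 0
```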


MEMORY-MANAGEMENT REGISTERS The processor provides four memory-management registers (GDTR, LDTR, IDTR, and TR) that specify the locations of the data structures which control segmented memory management (see Figure ). Special instructions are provided for loading and storing these registers.

Memory Management Registers

Global Descriptor Table Register (GDTR)

The GDTR register holds the base address (32 bits in protected mode) and the 16-bit table limit for the GDT. The base address specifies the linear address of byte 0 of the GDT; the table limit specifies the number of bytes in the table. The LGDT and SGDT instructions load and store the GDTR register, respectively. On power up or reset of the processor, the base address is set to the default value of 0 and the limit is set to 0FFFFH. A new base address must be loaded into the GDTR as part of the processor initialization process for protected-mode operation.

Local Descriptor Table Register (LDTR)

The LDTR register holds the 16-bit segment selector, base address (32 bits in protected mode), segment limit, and descriptor attributes for the LDT. The base address specifies the linear address of byte 0 of the LDT segment; the segment limit specifies the number of bytes in the segment.

The LLDT and SLDT instructions load and store the segment selector part of the LDTR register, respectively. The segment that contains the LDT must have a segment descriptor in the GDT. When the LLDT instruction loads a segment selector in the LDTR, the base address, limit, and descriptor attributes from the LDT descriptor are automatically loaded into the LDTR.

When a task switch occurs, the LDTR is automatically loaded with the segment selector and descriptor for the LDT for the new task. The contents of the LDTR are not automatically saved prior to writing the new LDT information into the register. On power up or reset of the processor, the segment selector and base address are set to the default value of 0 and the limit is set to 0FFFFH.


Interrupt Descriptor Table Register (IDTR)

The IDTR register holds the base address (32 bits in protected mode) and the 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of bytes in the table. The LIDT and SIDT instructions load and store the IDTR register, respectively. On power up or reset of the processor, the base address is set to the default value of 0 and the limit is set to 0FFFFH. The base address and limit in the register can then be changed as part of the processor initialization process.

Task Register (TR)

The task register holds the 16-bit segment selector, base address (32 bits in protected mode), segment limit, and descriptor attributes for the TSS of the current task. The selector references the TSS descriptor in the GDT. The base address specifies the linear address of byte 0 of the TSS; the segment limit specifies the number of bytes in the TSS.

The LTR and STR instructions load and store the segment selector part of the task register, respectively. When the LTR instruction loads a segment selector in the task register, the base address, limit, and descriptor attributes from the TSS descriptor are automatically loaded into the task register. On power up or reset of the processor, the base address is set to the default value of 0 and the limit is set to 0FFFFH.

When a task switch occurs, the task register is automatically loaded with the segment selector and descriptor for the TSS for the new task. The contents of the task register are not automatically saved prior to writing the new TSS information into the register.


CONTROL REGISTERS

Control registers (CR0, CR1, CR2, CR3, and CR4; see Figure ) determine the operating mode of the processor and the characteristics of the currently executing task. These registers are 32 bits in all 32-bit modes and compatibility mode.

Control Registers

• CR0: Contains system control flags that control the operating mode and states of the processor.
• CR1: Reserved.
• CR2: Contains the page-fault linear address (the linear address that caused a page fault).
• CR3: Contains the physical address of the base of the page directory and two flags (PCD and PWT). This register is also known as the page-directory base register (PDBR). Only the most-significant bits (less the lower 12 bits) of the base address are specified; the lower 12 bits of the address are assumed to be 0. The page directory must thus be aligned to a page (4-KByte) boundary. The PCD and PWT flags control caching of the page directory in the processor’s internal data caches (they do not control TLB caching of page-directory information).
• CR4: Contains a group of flags that enable several architectural extensions, and indicate operating system or executive support for specific processor capabilities.
• CR8: Provides read and write access to the Task Priority Register (TPR). It specifies the priority threshold value that operating systems use to control the priority class of external interrupts allowed to interrupt the processor. This register is available only in 64-bit mode. However, interrupt filtering continues to apply in compatibility mode.

The control registers can be read and loaded (or modified) using the move-to-or-from-control-registers forms of the MOV instruction. In protected mode, the MOV instructions allow the control registers to be read or loaded (at privilege level 0 only). This restriction means that application programs or operating-system procedures (running at privilege levels 1, 2, or 3) are prevented from reading or loading the control registers.

PG Paging (bit 31 of CR0): Enables paging when set; disables paging when clear. When paging is disabled, all linear addresses are treated as physical addresses. The PG flag has no effect if the PE flag (bit 0 of register CR0) is not also set; setting the PG flag when the PE flag is clear causes a general-protection exception (#GP).

CD Cache Disable (bit 30 of CR0): When the CD and NW flags are clear, caching of memory locations for the whole of physical memory in the processor’s internal (and external) caches is enabled. To prevent the processor from accessing and updating its caches, the CD flag must be set and the caches must be invalidated so that no cache hits can occur.

NW Not Write-through (bit 29 of CR0): When the NW and CD flags are clear, write-back (for Pentium 4, Intel Xeon, P6 family, and Pentium processors) or write-through (for Intel486 processors) is enabled for writes that hit the cache, and invalidation cycles are enabled.

AM Alignment Mask (bit 18 of CR0): Enables automatic alignment checking when set; disables alignment checking when clear. Alignment checking is performed only when the AM flag is set, the AC flag in the EFLAGS register is set, CPL is 3, and the processor is operating in either protected or virtual-8086 mode.

WP Write Protect (bit 16 of CR0): Inhibits supervisor-level procedures from writing into user-level read-only pages when set; allows supervisor-level procedures to write into user-level read-only pages when clear. This flag facilitates implementation of the copy-on-write method of creating a new process (forking) used by operating systems such as UNIX.

NE Numeric Error (bit 5 of CR0): Enables the native (internal) mechanism for reporting x87 FPU errors when set; enables the PC-style x87 FPU error reporting mechanism when clear.

ET Extension Type (bit 4 of CR0): Reserved in the Pentium processor. In the Intel386 and Intel486 processors, this flag indicates support of Intel 387 DX math coprocessor instructions when set.

TS Task Switched (bit 3 of CR0): Allows the saving of the x87 FPU context on a task switch to be delayed until an x87 FPU instruction is actually executed by the new task. The processor sets this flag on every task switch and tests it when executing x87 FPU instructions.

EM Emulation (bit 2 of CR0): Indicates that the processor does not have an internal or external x87 FPU when set; indicates an x87 FPU is present when clear.

MP Monitor Coprocessor (bit 1 of CR0): Controls the interaction of the WAIT (or FWAIT) instruction with the TS flag (bit 3 of CR0).

PE Protection Enable (bit 0 of CR0): Enables protected mode when set; enables real-address mode when clear. This flag does not enable paging directly. It only enables segment-level protection. To enable paging, both the PE and PG flags must be set.

PCD Page-level Cache Disable (bit 4 of CR3): Controls caching of the current page directory. When the PCD flag is set, caching of the page directory is prevented; when the flag is clear, the page directory can be cached.

PWT Page-level Writes Transparent (bit 3 of CR3): Controls the write-through or write-back caching policy of the current page directory. When the PWT flag is set, write-through caching is enabled; when the flag is clear, write-back caching is enabled. This flag affects only internal caches.

VME Virtual-8086 Mode Extensions (bit 0 of CR4): Enables interrupt- and exception-handling extensions in virtual-8086 mode when set; disables the extensions when clear.

PVI Protected-Mode Virtual Interrupts (bit 1 of CR4): Enables hardware support for a virtual interrupt flag (VIF) in protected mode when set; disables the VIF flag in protected mode when clear.

TSD Time Stamp Disable (bit 2 of CR4): Restricts the execution of the RDTSC instruction to procedures running at privilege level 0 when set; allows the RDTSC instruction to be executed at any privilege level when clear.

DE Debugging Extensions (bit 3 of CR4): References to debug registers DR4 and DR5 cause an undefined opcode (#UD) exception to be generated when set; when clear, the processor aliases references to registers DR4 and DR5 for compatibility with software written to run on earlier IA-32 processors.

PSE Page Size Extensions (bit 4 of CR4): Enables 4-MByte pages when set; restricts pages to 4 KBytes when clear.

PAE Physical Address Extension (bit 5 of CR4): When set, enables the paging mechanism to reference physical addresses of 36 bits or more. When clear, restricts physical addresses to 32 bits.

MCE Machine-Check Enable (bit 6 of CR4): Enables the machine-check exception when set; disables the machine-check exception when clear.

PGE Page Global Enable (bit 7 of CR4): (Introduced in the P6 family processors.) Enables the global page feature when set; disables the global page feature when clear. The global page feature allows frequently used or shared pages to be marked as global to all users.

TPL Task Priority Level (bits 3:0 of CR8): This sets the threshold value corresponding to the highest-priority interrupt to be blocked. A value of 0 means all interrupts are enabled.
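The CR0 and CR3 layouts described above can be exercised with a short Python sketch. The helper names (CR0_FLAGS, cr0_set_flags, paging_enabled, decode_cr3) and the sample register values are assumptions made for illustration; real software reads these registers with privileged MOV instructions, which user-level Python cannot do.

```python
# CR0 flag positions as listed in the flag descriptions above.
CR0_FLAGS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def cr0_set_flags(cr0: int) -> list:
    """Names of the CR0 flags that are set, from low bit to high bit."""
    return [name for name, bit in CR0_FLAGS.items() if (cr0 >> bit) & 1]


def paging_enabled(cr0: int) -> bool:
    """Paging needs both PE (bit 0) and PG (bit 31) to be set."""
    return bool(cr0 & 1) and bool((cr0 >> 31) & 1)


def decode_cr3(cr3: int) -> dict:
    """Split CR3 into the page-directory base and the PCD/PWT flags."""
    return {
        "page_directory_base": cr3 & 0xFFFFF000,  # low 12 bits assumed to be 0
        "PCD": (cr3 >> 4) & 1,
        "PWT": (cr3 >> 3) & 1,
    }


if __name__ == "__main__":
    print(cr0_set_flags(0x80000011))
    print(paging_enabled(0x80000011), paging_enabled(0x00000011))
    print(decode_cr3(0x00123018))
```

Masking with FFFFF000H reflects the rule that the page directory is aligned to a 4-KByte boundary, so only the upper 20 bits of the base address are stored.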


Summary of System Instruction


SYSTEM ARCHITECTURE OVERVIEW

The system-level architecture includes features to assist in the following operations:
• Memory management
• Protection of software modules
• Multitasking
• Exception and interrupt handling
• Multiprocessing
• Cache management
• Hardware resource and power management
• Debugging and performance monitoring

System-level architectural features are normally used only by system programmers. However, application programmers may need to read this chapter and the following chapters in order to create a reliable and secure environment for application programs.


MEMORY ORGANIZATION

The memory that the processor addresses on its bus is called physical memory. Physical memory is organized as a sequence of 8-bit bytes. Each byte is assigned a unique address, called a physical address. The physical address space ranges from zero to a maximum of 2^32 − 1 (4 GBytes).

Virtually any operating system or executive designed to work with an IA-32 processor will use the processor’s memory management facilities to access memory. These facilities provide features such as segmentation and paging, which allow memory to be managed efficiently and reliably. The following paragraphs describe the basic methods of addressing memory when memory management is used.

Three Memory Models

When employing the processor’s memory management facilities, programs do not directly address physical memory. Instead, they access memory using one of three memory models: flat, segmented, or real-address mode:

• Flat memory model: Memory appears to a program as a single, continuous address space. This space is called a linear address space. Code, data, and stacks are all contained in this address space. Linear address space is byte addressable, with addresses running contiguously from 0 to 2^32 − 1 (if not in 64-bit mode). An address for any byte in linear address space is called a linear address.

• Segmented memory model: Memory appears to a program as a group of independent address spaces called segments. Code, data, and stacks are typically contained in separate segments. To address a byte in a segment, a program issues a logical address. This consists of a segment selector and an offset (logical addresses are often referred to as far pointers). The segment selector identifies the segment to be accessed and the offset identifies a byte in the address space of the segment. Programs running on an IA-32 processor can address up to 16,383 segments of different sizes and types, and each segment can be as large as 2^32 bytes.

Internally, all the segments that are defined for a system are mapped into the processor’s linear address space. To access a memory location, the processor thus translates each logical address into a linear address. This translation is transparent to the application program. The primary reason for using segmented memory is to increase the reliability of programs and systems. For example, placing a program’s stack in a separate segment prevents the stack from growing into the code or data space and overwriting instructions or data, respectively.

• Real-address mode memory model: This is the memory model for the Intel 8086 processor. It is supported to provide compatibility with existing programs written to run on the Intel 8086 processor. The real-address mode uses a specific implementation of segmented memory in which the linear address space for the program and the operating system/executive consists of an array of segments of up to 64 KBytes in size each. The maximum size of the linear address space in real-address mode is 2^20 bytes.
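The real-address mode limits quoted above (64-KByte segments inside a 2^20-byte space) follow from the 8086 address calculation, in which the 16-bit segment value is shifted left by four bits and added to the 16-bit offset. The text above does not spell this calculation out, so the Python sketch below should be read as a supplementary illustration. The function name real_mode_linear is hypothetical, and the wraparound at 1 MByte models 8086 behaviour as an assumption.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """8086-style address: segment * 16 + offset, wrapped to 20 bits."""
    segment &= 0xFFFF            # segment registers are 16 bits wide
    offset &= 0xFFFF             # so each segment spans at most 64 KBytes
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    print(hex(real_mode_linear(0x1234, 0x0010)))
    # Two different segment:offset pairs can name the same byte.
    print(real_mode_linear(0x1234, 0x0010) == real_mode_linear(0x1235, 0x0000))
```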


MEMORY MANAGEMENT OVERVIEW

The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism for isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks.

When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional. These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use shared memory.

Segmentation provides a mechanism for dividing the processor’s addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is running on a processor, each program can be assigned its own set of segments. The processor then enforces the boundaries between these segments and ensures that one program does not interfere with the execution of another program by writing into the other program’s segments. The segmentation mechanism also allows typing of segments so that the operations that may be performed on a particular type of segment can be restricted.

All the segments in a system are contained in the processor’s linear address space. To locate a byte in a particular segment, a logical address (also called a far pointer) must be provided. A logical address consists of a segment selector and an offset. The segment selector is a unique identifier for a segment. Among other things, it provides an offset into a descriptor table (such as the global descriptor table, GDT) to a data structure called a segment descriptor. Each segment has a segment descriptor, which specifies the size of the segment, the access rights and privilege level for the segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor’s linear address space.

Segmentation


If paging is not used, the linear address space of the processor is mapped directly into the physical address space of processor. The physical address space is defined as the range of addresses that the processor can generate on its address bus. Because multitasking computing systems commonly define a linear address space much larger than it is economically feasible to contain all at once in physical memory, some method of “virtualizing” the linear address space is needed. This virtualization of the linear address space is handled through the processor’s paging mechanism. Paging supports a “virtual memory” environment where a large linear address space is simulated with a small amount of physical memory (RAM and ROM) and some disk storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages. When a program (or task) attempts to access an address location in the linear address space, the processor uses the page directory and page tables to translate the linear address into a physical address and then performs the requested operation (read or write) on the memory location. Paging and Segmentation Paging can be used with any of the segmentation models. The processor’s paging mechanism divides the linear address space (into which segments are mapped) into pages. These linear-address-space pages are then mapped to pages in the physical address space. The paging mechanism offers several page-level protection facilities that can be used with or instead of the segment-protection facilities. For example, it lets read-write protection be enforced on a page-by-page basis. The paging mechanism also provides two-level user-supervisor protection that can also be specified on a page-by-page basis. 
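The page-directory and page-table lookup described above can be sketched in Python. With 4-KByte pages, IA-32 splits a 32-bit linear address into a 10-bit directory index, a 10-bit table index, and a 12-bit page offset; this split is not stated in the text above and is supplied here as background. The dictionaries standing in for the page directory and page tables, and the function names, are hypothetical simplifications that omit the present, protection, and caching bits of real entries.

```python
def split_linear_address(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    linear &= 0xFFFFFFFF
    directory_index = linear >> 22           # top 10 bits
    table_index = (linear >> 12) & 0x3FF     # middle 10 bits
    offset = linear & 0xFFF                  # low 12 bits: byte within the page
    return directory_index, table_index, offset


def translate(linear: int, page_directory: dict) -> int:
    """Walk the simulated tables; a missing entry stands in for a page fault."""
    directory_index, table_index, offset = split_linear_address(linear)
    page_table = page_directory.get(directory_index)
    if page_table is None or table_index not in page_table:
        raise LookupError("page fault at linear address %08XH" % linear)
    return page_table[table_index] + offset  # page frame base + offset


if __name__ == "__main__":
    # One page table (directory entry 1) mapping one page (entry 3).
    page_directory = {1: {3: 0x0009A000}}
    print(hex(translate(0x00403025, page_directory)))
```

On a real processor the LookupError path corresponds to a page-fault exception, which gives the operating system the chance to bring the page in from disk.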
PHYSICAL ADDRESS SPACE

In protected mode, the IA-32 architecture provides a normal physical address space of 4 GBytes (2^32 bytes). This is the address space that the processor can address on its address bus. This address space is flat (unsegmented), with addresses ranging continuously from 0 to FFFFFFFFH. This physical address space can be mapped to read-write memory, read-only memory, and memory-mapped I/O. The memory mapping facilities described in this chapter can be used to divide this physical memory up into segments and/or pages.

Starting with the Pentium Pro processor, the IA-32 architecture also supports an extension of the physical address space to 2^36 bytes (64 GBytes), with a maximum physical address of FFFFFFFFFH. This extension is invoked in either of two ways:
• Using the physical address extension (PAE) flag, located in bit 5 of control register CR4.
• Using the 36-bit page size extension (PSE-36) feature (introduced in the Pentium III processors).


LOGICAL AND LINEAR ADDRESSES

At the system-architecture level in protected mode, the processor uses two stages of address translation to arrive at a physical address: logical-address translation and linear address space paging. Even with the minimum use of segments, every byte in the processor’s address space is accessed with a logical address. A logical address consists of a 16-bit segment selector and a 32-bit offset. The segment selector identifies the segment the byte is located in and the offset specifies the location of the byte in the segment relative to the base address of the segment.

The processor translates every logical address into a linear address. A linear address is a 32-bit address in the processor’s linear address space. Like the physical address space, the linear address space is a flat (unsegmented), 2^32-byte address space, with addresses ranging from 0 to FFFFFFFFH. The linear address space contains all the segments and system tables defined for a system.

To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in the GDT or LDT and reads it into the processor. (This step is needed only when a new segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment to ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a linear address.
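The three translation steps can be sketched in Python as follows. The Descriptor class and the list-based GDT and LDT are hypothetical stand-ins for the real 8-byte descriptor format. The sketch also assumes that the limit field holds the highest valid offset, that bit 2 of the selector chooses between GDT and LDT with the index in the bits above it, and it skips the access-rights part of step 2.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int    # linear address of byte 0 of the segment
    limit: int   # assumption: highest valid offset within the segment


def logical_to_linear(selector: int, offset: int, gdt: list, ldt: list) -> int:
    """Translate selector:offset to a linear address using simulated tables."""
    # Step 1: locate the descriptor in the GDT or LDT.
    index = (selector & 0xFFFF) >> 3
    table = ldt if selector & 0b100 else gdt
    if index >= len(table) or table[index] is None:
        raise ValueError("no usable descriptor for selector %04XH" % selector)
    descriptor = table[index]
    # Step 2: range check only (access rights are omitted in this sketch).
    if offset > descriptor.limit:
        raise ValueError("offset %XH is outside the segment limit" % offset)
    # Step 3: base + offset, kept within the 32-bit linear address space.
    return (descriptor.base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    gdt = [None, Descriptor(base=0x00100000, limit=0xFFFF)]  # entry 0 unused
    print(hex(logical_to_linear(0x0008, 0x1234, gdt, [])))
```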


System Level Registers and Data Structures

Global and Local Descriptor Tables

When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT). These tables contain entries called segment descriptors. Segment descriptors provide the base address of segments as well as access rights, type, and usage information.

Each segment descriptor has an associated segment selector. A segment selector provides the software that uses it with an index into the GDT or LDT (the offset of its associated segment descriptor), a global/local flag (determines whether the selector points to the GDT or the LDT), and access rights information.

To access a byte in a segment, a segment selector and an offset must be supplied. The segment selector provides access to the segment descriptor for the segment (in the GDT or LDT). From the segment descriptor, the processor obtains the base address of the segment in the linear address space. The offset then provides the location of the byte relative to the base address. This mechanism can be used to access any valid code, data, or stack segment, provided the segment is accessible from the current privilege level (CPL) at which the processor is operating. The CPL is defined as the protection level of the currently executing code segment. (Pentium 3 2.4.1)

Memory Management Registers

Logical Address Translation

To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in the GDT or LDT and reads it into the processor. (This step is needed only when a new segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment, to ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a linear address.

Logical Address to Linear Address Translation
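The translation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processor's actual logic: `Segment`, `SegmentLimitError` and `logical_to_linear` are hypothetical names, the descriptor is assumed to be already loaded (step 1), only the limit check of step 2 is modeled (access rights and expand-down segments are ignored), and `limit` is taken to be the last valid offset in the segment.

```python
from dataclasses import dataclass


class SegmentLimitError(Exception):
    """Stands in for the exception raised on a limit violation."""


@dataclass
class Segment:
    base: int   # 32-bit base address taken from the segment descriptor
    limit: int  # effective limit: last valid offset inside the segment


def logical_to_linear(seg: Segment, offset: int, size: int = 1) -> int:
    """Translate (segment, offset) to a 32-bit linear address."""
    # Step 2 (simplified): the whole access must lie within the segment.
    if offset + size - 1 > seg.limit:
        raise SegmentLimitError("offset beyond segment limit")
    # Step 3: linear address = segment base + offset (wraps at 32 bits).
    return (seg.base + offset) & 0xFFFFFFFF


data = Segment(base=0x00100000, limit=0xFFFF)
print(hex(logical_to_linear(data, 0x1234)))  # 0x101234
```

If paging is disabled, the value returned here would be the address placed on the address bus; with paging enabled it is the input to the second translation stage.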

If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor’s address bus). If the linear address space is paged, a second level of address translation is used to translate the linear address into a physical address. Segment Selectors A segment selector is a 16-bit identifier for a segment (see Figure ). It does not point directly to the segment, but instead points to the segment descriptor that defines the segment. A segment selector contains the following items: Index (Bits 3 through 15) — Selects one of 8192 descriptors in the GDT or LDT. The processor multiplies the index value by 8 (the number of bytes in a segment descriptor) and adds the result to the base address of the GDT or LDT (from the GDTR or LDTR register, respectively). TI (table indicator) flag (Bit 2) — Specifies the descriptor table to use: clearing this flag selects the GDT; setting this flag selects the current LDT.

Segment Selector

Requested Privilege Level (RPL) (Bits 0 and 1) — Specifies the privilege level of the selector. The privilege level can range from 0 to 3, with 0 being the most privileged level. See Section 4.5, “Privilege Levels”, for a description of the relationship of the RPL to the CPL of the executing program (or task) and the descriptor privilege level (DPL) of the descriptor the segment selector points to.

The first entry of the GDT is not used by the processor. A segment selector that points to this entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used as a “null segment selector.” The processor does not generate an exception when a segment register (other than the CS or SS registers) is loaded with a null selector. It does, however, generate an exception when a segment register holding a null selector is used to access memory. A null selector can be used to initialize unused segment registers. Loading the CS or SS register with a null segment selector causes a general-protection exception (#GP) to be generated.

Segment selectors are visible to application programs as part of a pointer variable, but the values of selectors are usually assigned or modified by link editors or linking loaders, not application programs.

Segment Descriptors

A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information. Segment descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all types of segment descriptors.
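The selector fields described above (index in bits 3 through 15, TI in bit 2, RPL in bits 0 and 1) can be pulled apart with shifts and masks. The sketch below is illustrative only; the function names and the example table base addresses are assumptions made up for this example.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # bits 3..15: one of 8192 descriptors
        "ti": (selector >> 2) & 0x1,        # bit 2: 0 = GDT, 1 = current LDT
        "rpl": selector & 0x3,              # bits 0..1: requested privilege level
    }


def descriptor_address(selector: int, gdt_base: int, ldt_base: int) -> int:
    """Address of the descriptor: table base + index * 8 bytes per descriptor."""
    fields = decode_selector(selector)
    table_base = ldt_base if fields["ti"] else gdt_base
    return table_base + fields["index"] * 8


def is_null_selector(selector: int) -> bool:
    """A null selector has index 0 and TI = 0; the RPL bits do not matter."""
    return (selector & 0xFFFC) == 0


# Hypothetical table bases, chosen only for the example.
print(decode_selector(0x001B))                             # {'index': 3, 'ti': 0, 'rpl': 3}
print(hex(descriptor_address(0x001B, 0x00001000, 0x00002000)))  # 0x1018
```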

Segment Descriptor

Segment limit field
Specifies the size of the segment. The processor puts together the two segment limit fields to form a 20-bit value. The processor interprets the segment limit in one of two ways, depending on the setting of the G (granularity) flag:
• If the granularity flag is clear, the segment size can range from 1 byte to 1 MByte, in byte increments.
• If the granularity flag is set, the segment size can range from 4 KBytes to 4 GBytes, in 4-KByte increments.

Base address fields
Defines the location of byte 0 of the segment within the 4-GByte linear address space. The processor puts together the three base address fields to form a single 32-bit value. Segment base addresses should be aligned to 16-byte boundaries. Although 16-byte alignment is not required, this alignment allows programs to maximize performance by aligning code and data on 16-byte boundaries.

Type field
Indicates the segment or gate type and specifies the kinds of access that can be made to the segment and the direction of growth. The interpretation of this field depends on whether the descriptor type flag specifies an application (code or data) descriptor or a system descriptor. The encoding of the type field is different for code, data, and system descriptors.

CODE- and DATA- Segment Types

S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment (S flag is clear) or a code or data segment (S flag is set).

DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can range from 0 to 3, with 0 being the most privileged level. The DPL is used to control access to the segment.

P (segment-present) flag
Indicates whether the segment is present in memory (set) or not present (clear). If this flag is clear, the processor generates a segment-not-present exception (#NP) when a segment selector that points to the segment descriptor is loaded into a segment register.

D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment descriptor is an executable code segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1 for 32-bit code and data segments and to 0 for 16-bit code and data segments.)

G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is clear, the segment limit is interpreted in byte units; when the flag is set, the segment limit is interpreted in 4-KByte units.

SYSTEM DESCRIPTOR TYPES

When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type is a system descriptor. The processor recognizes the following types of system descriptors:
• Local descriptor-table (LDT) segment descriptor.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.

These descriptor types fall into two categories: system-segment descriptors and gate descriptors. System-segment descriptors point to system segments (LDT and TSS segments). Gate descriptors are in themselves “gates,” which hold pointers to procedure entry points in code segments (call, interrupt, and trap gates) or which hold segment selectors for TSSs (task gates).

Segment Descriptor Classification (figure): System descriptors (LDT, TSS, Gate) and Non-System descriptors (Code, Data)
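To make the descriptor fields concrete, the sketch below unpacks an 8-byte descriptor given as a 64-bit integer. The text above names the fields but the figure with their positions is not reproduced here, so the bit positions used are an assumption taken from the standard IA-32 descriptor layout; `decode_descriptor` is a hypothetical helper name. The limit is returned unscaled, exactly as stored.

```python
def decode_descriptor(raw: int) -> dict:
    """Unpack a 64-bit segment descriptor (assumed standard IA-32 layout)."""
    low = raw & 0xFFFFFFFF           # bytes 0..3 of the descriptor
    high = (raw >> 32) & 0xFFFFFFFF  # bytes 4..7 of the descriptor
    # The three base fields are joined into one 32-bit base address.
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    # The two limit fields are joined into one 20-bit limit.
    limit = (low & 0xFFFF) | (high & 0x000F0000)
    return {
        "base": base,
        "limit": limit,
        "type": (high >> 8) & 0xF,   # segment or gate type
        "s": (high >> 12) & 0x1,     # 0 = system, 1 = code or data
        "dpl": (high >> 13) & 0x3,   # descriptor privilege level
        "p": (high >> 15) & 0x1,     # segment present
        "db": (high >> 22) & 0x1,    # default operation size / upper bound
        "g": (high >> 23) & 0x1,     # granularity
    }


# Example value: a flat 32-bit code segment with base 0 and limit FFFFFH.
fields = decode_descriptor(0x00CF9A000000FFFF)
kind = "code/data" if fields["s"] else "system"
print(kind, hex(fields["base"]), hex(fields["limit"]), fields["g"])
```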


PAGING (P3) When operating in protected mode, the architecture permits linear address space to be mapped directly into a large physical memory (for example, 4 GBytes of RAM) or indirectly (using paging) into a smaller physical memory and disk storage. This latter method of mapping the linear address space is referred to as virtual memory or demand-paged virtual memory. When paging is used, the processor divides the linear address space into fixed-size pages (of 4 KBytes, 2 MBytes, or 4 MBytes in length) that can be mapped into physical memory and/or disk storage. When a program (or task) references a logical address in memory, the processor translates the address into a linear address and then uses its paging mechanism to translate the linear address into a corresponding physical address. If the page containing the linear address is not currently in physical memory, the processor generates a page-fault exception (#PF). Paging is different from segmentation through its use of fixed-size pages. Unlike segments, which usually are the same size as the code or data structures they hold, pages have a fixed size. If segmentation is the only form of address translation used, a data structure present in physical memory will have all of its parts in memory. If paging is used, a data structure can be partly in memory and partly in disk storage. To minimize the number of bus cycles required for address translation, the most recently accessed page-directory and page-table entries are cached in the processor in devices called translation lookaside buffers (TLBs). TRANSLATION LOOKASIDE BUFFERS (TLBS) The processor stores the most recently used page-directory and page-table entries in on-chip caches called Translation Lookaside Buffers or TLBs. The P6 family and Pentium processors have separate TLBs for the data and instruction caches. Also, the P6 family processors maintain separate TLBs for 4-KByte and 4-MByte page sizes. 
The CPUID instruction can be used to determine the sizes of the TLBs provided in the P6 family and Pentium processors. The TLBs store the most recently used page-directory and page-table entries. They speed up memory accesses when paging is enabled by reducing the number of memory accesses that are required to read the page tables stored in system memory. The TLBs are divided into four groups: instruction TLBs for 4-KByte pages, data TLBs for 4-KByte pages, instruction TLBs for large pages (2-MByte or 4-MByte pages), and data TLBs for large pages. The TLBs are normally active only in protected mode with paging enabled. When paging is disabled or the processor is in real-address mode, the TLBs maintain their contents until explicitly or implicitly flushed.

Most paging is performed using the contents of the TLBs. Bus cycles to the page directory and page tables in memory are performed only when the TLBs do not contain the translation information for a requested page. The TLBs are inaccessible to application programs and tasks (privilege level greater than 0); that is, they cannot invalidate TLBs. Only operating-system or executive procedures running at privilege level 0 can invalidate TLBs or selected TLB entries. Whenever a page-directory or page-table entry is changed (including when the present flag is set to zero), the operating system must immediately invalidate the corresponding entry in the TLB so that it can be updated the next time the entry is referenced.

All of the (non-global) TLBs are automatically invalidated any time the CR3 register is loaded (unless the G flag for a page or page-table entry is set, as described later in this section). The CR3 register can be loaded in either of two ways:
• Explicitly, using the MOV instruction, for example: MOV CR3, EAX where the EAX register contains an appropriate page-directory base address.
• Implicitly, by executing a task switch, which automatically changes the contents of the CR3 register.

The INVLPG instruction is provided to invalidate a specific page-table entry in the TLB. Normally, this instruction invalidates only an individual TLB entry; however, in some cases, it may invalidate more than the selected entry and may even invalidate all of the TLBs. This instruction ignores the setting of the G flag in a page-directory or page-table entry.

Paging is controlled by three flags in the processor’s control registers:
• PG (paging) flag. Bit 31 of CR0. The PG flag enables the page-translation mechanism. The operating system or executive usually sets this flag during processor initialization. The PG flag must be set if the processor’s page-translation mechanism is to be used to implement a demand-paged virtual memory system or if the operating system is designed to run more than one program (or task) in virtual-8086 mode.
• PSE (page size extensions) flag. Bit 4 of CR4 (introduced in the Pentium processor). The PSE flag enables large page sizes: 4-MByte pages or 2-MByte pages (when the PAE flag is set).
When the PSE flag is clear, the more common page length of 4 KBytes is used.
• PAE (physical address extension) flag. Bit 5 of CR4 (introduced in the Pentium Pro processors). (For future Pentium processors, support Intel EM64T.) The PAE flag provides a method of extending physical addresses to 36 bits. This physical address extension can only be used when paging is enabled. It relies on an additional page-directory-pointer table that is used along with page directories and page tables to reference physical addresses above FFFFFFFFH.

Page Tables and Directories

The information that the processor uses to translate linear addresses into physical addresses (when paging is enabled) is contained in four data structures:
• Page directory — An array of 32-bit page-directory entries (PDEs) contained in a 4-KByte page. Up to 1024 page-directory entries can be held in a page directory.
• Page table — An array of 32-bit page-table entries (PTEs) contained in a 4-KByte page. Up to 1024 page-table entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page sizes are mapped directly from one or more page-directory entries.)
• Page — A 4-KByte, 2-MByte, or 4-MByte flat address space.
• Page-Directory-Pointer Table — An array of four 64-bit entries, each of which points to a page directory. This data structure is only used when the physical address extension is enabled.

These tables provide access to either 4-KByte or 4-MByte pages when normal 32-bit physical addressing is being used.

PAGE TRANSLATION USING 32-BIT PHYSICAL ADDRESSING

The following sections describe the IA-32 architecture’s page translation mechanism when using 32-bit physical addresses and a maximum physical address space of 4 GBytes.

Linear Address Translation (4-KByte Pages)

Figure 3-12 shows the page directory and page-table hierarchy when mapping linear addresses to 4-KByte pages. The entries in the page directory point to page tables, and the entries in a page table point to pages in physical memory. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).

Linear Address Translation (4 KB Pages)

To select the various table entries, the linear address is divided into three sections:
• Page-directory entry — Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a page table.
• Page-table entry — Bits 12 through 21 of the linear address provide an offset to an entry in the selected page table. This entry provides the base physical address of a page in physical memory.
• Page offset — Bits 0 through 11 provide an offset to a physical address in the page.

Memory management software has the option of using one page directory for all programs and tasks, one page directory for each task, or some combination of the two.
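The three-way split of the linear address and the resulting two-level lookup can be sketched as follows. This is a simplified model, not a description of the hardware: the page directory and page tables are represented as Python dictionaries, `PageFault`, `split_linear` and `translate` are hypothetical names, only the present flag (bit 0) is checked, and the TLB and large pages are left out.

```python
PRESENT = 0x1            # P flag, bit 0 of a PDE or PTE
BASE_MASK = 0xFFFFF000   # upper 20 bits: 4-KByte-aligned base address


class PageFault(Exception):
    """Stands in for the #PF exception."""


def split_linear(linear: int) -> tuple:
    """Return (directory index, table index, page offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear: int, page_directory: dict, page_tables: dict) -> int:
    """Walk directory -> table -> page for a 4-KByte page."""
    dir_index, table_index, offset = split_linear(linear)
    pde = page_directory.get(dir_index, 0)
    if not pde & PRESENT:
        raise PageFault("page table not present")
    # page_tables maps a page table's physical base to its entries.
    pte = page_tables[pde & BASE_MASK].get(table_index, 0)
    if not pte & PRESENT:
        raise PageFault("page not present")
    return (pte & BASE_MASK) | offset


# Made-up mapping: directory entry 1 -> table at 5000H, table entry 3 -> page at 9A000H.
directory = {1: 0x00005000 | PRESENT}
tables = {0x00005000: {3: 0x0009A000 | PRESENT}}
print(split_linear(0x00403025))                       # (1, 3, 37)
print(hex(translate(0x00403025, directory, tables)))  # 0x9a025
```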

Page-Directory Entry and Page-Table Entry Formats (4-KByte Pages, 32-Bit Physical Addresses)

Page base address, bits 12 through 31
(Page-table entries for 4-KByte pages) — Specifies the physical address of the first byte of a 4-KByte page. The bits in this field are interpreted as the 20 most-significant bits of the physical address, which forces pages to be aligned on 4-KByte boundaries.
(Page-directory entries for 4-KByte page tables) — Specifies the physical address of the first byte of a page table. The bits in this field are interpreted as the 20 most-significant bits of the physical address, which forces page tables to be aligned on 4-KByte boundaries.

Present (P) flag, bit 0
Indicates whether the page or page table being pointed to by the entry is currently loaded in physical memory. When the flag is set, the page is in physical memory and address translation is carried out. When the flag is clear, the page is not in memory and, if the processor attempts to access the page, it generates a page-fault exception (#PF). The processor does not set or clear this flag; it is up to the operating system or executive to maintain the state of the flag. If the processor generates a page-fault exception, the operating system generally needs to carry out the following operations:
1. Copy the page from disk storage into physical memory.
2. Load the page address into the page-table or page-directory entry and set its present flag. Other flags, such as the dirty and accessed flags, may also be set at this time.
3. Invalidate the current page-table entry in the TLB.
4. Return from the page-fault handler to restart the interrupted program (or task).

Read/write (R/W) flag, bit 1
Specifies the read-write privileges for a page or group of pages (in the case of a page-directory entry that points to a page table). When this flag is clear, the page is read only; when the flag is set, the page can be read and written into. This flag interacts with the U/S flag and the WP flag in register CR0.

User/supervisor (U/S) flag, bit 2
Specifies the user-supervisor privileges for a page or group of pages (in the case of a page-directory entry that points to a page table). When this flag is clear, the page is assigned the supervisor privilege level; when the flag is set, the page is assigned the user privilege level. This flag interacts with the R/W flag and the WP flag in register CR0.

Page-level write-through (PWT) flag, bit 3
Controls the write-through or write-back caching policy of individual pages or page tables.
When the PWT flag is set, write-through caching is enabled for the associated page or page table; when the flag is clear, write-back caching is enabled for the associated page or page table. The processor ignores this flag if the CD (cache disable) flag in CR0 is set.

Page-level cache disable (PCD) flag, bit 4
Controls the caching of individual pages or page tables. When the PCD flag is set, caching of the associated page or page table is prevented; when the flag is clear, the page or page table can be cached. This flag permits caching to be disabled for pages that contain memory-mapped I/O ports or that do not provide a performance benefit when cached. The processor ignores this flag (assumes it is set) if the CD (cache disable) flag in CR0 is set.

Accessed (A) flag, bit 5
Indicates whether a page or page table has been accessed (read from or written to) when set. Memory management software typically clears this flag when a page or page table is initially loaded into physical memory. The processor then sets this flag the first time a page or page table is accessed. This flag is a “sticky” flag, meaning that once set, the processor does not implicitly clear it. Only software can clear this flag. The accessed and dirty flags are provided for use by memory management software to manage the transfer of pages and page tables into and out of physical memory.

Dirty (D) flag, bit 6
Indicates whether a page has been written to when set. (This flag is not used in page-directory entries that point to page tables.) Memory management software typically clears this flag when a page is initially loaded into physical memory. The processor sets this flag the first time a page is accessed for a write operation. This flag is “sticky,” meaning that once set, the processor does not implicitly clear it. Only software can clear this flag.

Page size (PS) flag, bit 7 of page-directory entries
Determines the page size. When this flag is clear, the page size is 4 KBytes and the page-directory entry points to a page table. When the flag is set, the page size is 4 MBytes for normal 32-bit addressing.

Page attribute table index (PAT) flag, bit 7 in page-table entries for 4-KByte pages and bit 12 in page-directory entries for 4-MByte pages
(Introduced in the Pentium III processor.)

Global (G) flag, bit 8
(Introduced in the Pentium Pro processor) — Indicates a global page when set. When a page is marked global and the page global enable (PGE) flag in register CR4 is set, the page-table or page-directory entry for the page is not invalidated in the TLB when register CR3 is loaded or a task switch occurs.
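The flag positions listed above can be read out of a 32-bit page-table entry with a small table of bit numbers. The sketch below is illustrative; `FLAG_BITS` and `decode_entry` are hypothetical names, and bit 7 (PS or PAT, depending on the entry type) is deliberately left out to keep the example short.

```python
# Bit positions as given in the text for 4-KByte-page entries.
FLAG_BITS = {
    "P": 0,     # present
    "R/W": 1,   # 0 = read only, 1 = read/write
    "U/S": 2,   # 0 = supervisor, 1 = user
    "PWT": 3,   # page-level write-through
    "PCD": 4,   # page-level cache disable
    "A": 5,     # accessed
    "D": 6,     # dirty (page-table entries only)
    "G": 8,     # global page
}


def decode_entry(entry: int) -> dict:
    """Return the flags and the 4-KByte-aligned base address of an entry."""
    fields = {name: (entry >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["base"] = entry & 0xFFFFF000  # bits 12..31
    return fields


# Example value: present, writable, user page that has been read and written.
info = decode_entry(0x0009A067)
print(hex(info["base"]), info["P"], info["R/W"], info["U/S"], info["A"], info["D"])
```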


PROTECTION

The Pentium architecture provides a protection mechanism that operates at both the segment level and the page level. This protection mechanism provides the ability to limit access to certain segments or pages based on privilege levels (four privilege levels for segments and two privilege levels for pages). For example, critical operating-system code and data can be protected by placing them in more privileged segments than those that contain applications code. The processor’s protection mechanism will then prevent application code from accessing the operating-system code and data in any but a controlled, defined manner.

Segment and page protection can be used at all stages of software development to assist in localizing and detecting design problems and bugs. It can also be incorporated into end-products to offer added robustness to operating systems, utilities software, and applications software.

When the protection mechanism is used, each memory reference is checked to verify that it satisfies various protection checks. All checks are made before the memory cycle is started; any violation results in an exception. Because checks are performed in parallel with address translation, there is no performance penalty. The protection checks that are performed fall into the following categories:
• Limit checks.
• Type checks.
• Privilege level checks.
• Restriction of addressable domain.
• Restriction of procedure entry-points.
• Restriction of instruction set.

All protection violations result in an exception being generated.

Limit Checks

The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment. The effective value of the limit depends on the setting of the G (granularity) flag. For data segments, the limit also depends on the E (expansion direction) flag and the B (default stack pointer size and/or upper bound) flag. The E flag is one of the bits in the type field when the segment descriptor is for a data-segment type.

When the G flag is clear (byte granularity), the effective limit is the value of the 20-bit limit field in the segment descriptor. Here, the limit ranges from 0 to FFFFFH (1 MByte). When the G flag is set (4-KByte page granularity), the processor scales the value in the limit field by a factor of 2^12 (4 KBytes). In this case, the effective limit ranges from FFFH (4 KBytes) to FFFFFFFFH (4 GBytes).

For all types of segments except expand-down data segments, the effective limit is the last address that is allowed to be accessed in the segment, which is one less than the size, in bytes, of the segment. The processor causes a general-protection exception any time an attempt is made to access the following addresses in a segment:
• A byte at an offset greater than the effective limit
• A word at an offset greater than the (effective-limit – 1)
• A doubleword at an offset greater than the (effective-limit – 3)
• A quadword at an offset greater than the (effective-limit – 7)
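The limit rules above reduce to two small computations: scaling the 20-bit limit field when G is set, and checking that the last byte of an access does not pass the effective limit. The sketch below covers only segments that are not expand-down; `effective_limit` and `access_within_limit` are hypothetical names. With G set, the low 12 bits of the effective limit are taken as FFFH, which matches the FFFH to FFFFFFFFH range stated above.

```python
def effective_limit(limit_field: int, g_flag: bool) -> int:
    """Effective limit (last valid offset) from the 20-bit limit field."""
    if g_flag:
        # 4-KByte granularity: scale by 2^12 and fill the low 12 bits.
        return (limit_field << 12) | 0xFFF
    return limit_field


def access_within_limit(offset: int, size: int, eff_limit: int) -> bool:
    """True if an access of `size` bytes at `offset` stays inside the segment.

    Equivalent to the rules in the text: a word faults above limit - 1,
    a doubleword above limit - 3, a quadword above limit - 7.
    """
    return offset + size - 1 <= eff_limit


limit = effective_limit(0x0000F, g_flag=True)   # 16 pages of 4 KBytes
print(hex(limit))                               # 0xffff
print(access_within_limit(0xFFFC, 4, limit))    # True: last byte is at FFFFH
print(access_within_limit(0xFFFD, 4, limit))    # False: would cross the limit
```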


TYPE CHECKING Segment descriptors contain type information in two places: • The S (descriptor type) flag. • The type field. The processor uses this information to detect programming errors that result in an attempt to use a segment or gate in an incorrect or unintended manner. The S flag indicates whether a descriptor is a system type or a code or data type. The type field provides 4 additional bits for use in defining various types of code, data, and system descriptors. The processor examines type information at various times while operating on segment selectors and segment descriptors. The following list gives examples of typical operations where type checking is performed (this list is not exhaustive): OPTIONAL • When a segment selector is loaded into a segment register — Certain segment registers can contain only certain descriptor types, for example: — The CS register only can be loaded with a selector for a code segment. — Segment selectors for code segments that are not readable or for system segments cannot be loaded into data-segment registers (DS, ES, FS, and GS). — Only segment selectors of writable data segments can be loaded into the SS register. • When a segment selector is loaded into the LDTR or task register — For example: — The LDTR can only be loaded with a selector for an LDT. — The task register can only be loaded with a segment selector for a TSS. • When instructions access segments whose descriptors are already loaded into segment registers — Certain segments can be used by instructions only in certain predefined ways, for example: — No instruction may write into an executable segment. — No instruction may write into a data segment if it is not writable. — No instruction may read an executable segment unless the readable flag is set. 
• When an instruction operand contains a segment selector — Certain instructions can access segments or gates of only a particular type, for example:
— A far CALL or far JMP instruction can only access a segment descriptor for a conforming code segment, nonconforming code segment, call gate, task gate, or TSS.
— The LLDT instruction must reference a segment descriptor for an LDT.
— The LTR instruction must reference a segment descriptor for a TSS.
— The LAR instruction must reference a segment or gate descriptor for an LDT, TSS, call gate, task gate, code segment, or data segment.
— The LSL instruction must reference a segment descriptor for an LDT, TSS, code segment, or data segment.
— IDT entries must be interrupt, trap, or task gates.
• During certain internal operations — For example:
— On a far call or far jump (executed with a far CALL or far JMP instruction), the processor determines the type of control transfer to be carried out (call or jump to another code segment, a call or jump through a gate, or a task switch) by checking the type field in the segment (or gate) descriptor pointed to by the segment (or gate) selector given as an operand in the CALL or JMP instruction. If the descriptor type is for a code segment or call gate, a call or jump to another code segment is indicated; if the descriptor type is for a TSS or task gate, a task switch is indicated.
— On a call or jump through a call gate (or on an interrupt- or exception-handler call through a trap or interrupt gate), the processor automatically checks that the segment descriptor being pointed to by the gate is for a code segment.
— On a call or jump to a new task through a task gate (or on an interrupt- or exception-handler call to a new task through a task gate), the processor automatically checks that the segment descriptor being pointed to by the task gate is for a TSS.
— On a call or jump to a new task by a direct reference to a TSS, the processor automatically checks that the segment descriptor being pointed to by the CALL or JMP instruction is for a TSS.
— On return from a nested task (initiated by an IRET instruction), the processor checks that the previous task link field in the current TSS points to a TSS.

4.81 PRIVILEGE LEVELS

The processor’s segment-protection mechanism recognizes 4 privilege levels, numbered from 0 to 3. The greater numbers mean lesser privileges. Figure 4-3 shows how these levels of privilege can be interpreted as rings of protection. The center (reserved for the most privileged code, data, and stacks) is used for the segments containing the critical software, usually the kernel of an operating system. Outer rings are used for less critical software. (Systems that use only 2 of the 4 possible privilege levels should use levels 0 and 3.)

Protection Rings

The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP).


To carry out privilege-level checks between code segments and data segments, the processor recognizes the following three types of privilege levels: • Current privilege level (CPL) — The CPL is the privilege level of the currently executing program or task. It is stored in bits 0 and 1 of the CS and SS segment registers. Normally, the CPL is equal to the privilege level of the code segment from which instructions are being fetched. • Descriptor privilege level (DPL) — The DPL is the privilege level of a segment or gate. It is stored in the DPL field of the segment or gate descriptor for the segment or gate. When the currently executing code segment attempts to access a segment or gate, the DPL of the segment or gate is compared to the CPL and RPL of the segment or gate selector (as described later in this section). The DPL is interpreted differently, depending on the type of segment or gate being accessed: — Data segment — The DPL indicates the numerically highest privilege level that a program or task can have to be allowed to access the segment. For example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment. — Nonconforming code segment (without using a call gate) — The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment. — Call gate — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the call gate. (This is the same access rule as for a data segment.) — Conforming code segment and nonconforming code segment accessed through a call gate — The DPL indicates the numerically lowest privilege level that a program or task can have to be allowed to access the segment. 
For example, if the DPL of a conforming code segment is 2, programs running at a CPL of 0 or 1 cannot access the segment. — TSS — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the TSS. (This is the same access rule as for a data segment.)
• Requested privilege level (RPL) — The RPL is an override privilege level that is assigned to segment selectors. It is stored in bits 0 and 1 of the segment selector. The processor checks the RPL along with the CPL to determine if access to a segment is allowed.

Accessing Data in Data Segments

To access operands in a data segment, the segment selector for the data segment must be loaded into the data-segment registers (DS, ES, FS, or GS) or into the stack-segment register (SS). (Segment registers can be loaded with the MOV, POP, LDS, LES, LFS, LGS, and LSS instructions.) Before the processor loads a segment selector into a segment register, it performs a privilege check. The processor checks the RPL along with the CPL to determine if access to a segment is allowed. Even if the program or task requesting access to a segment has sufficient privilege to access the segment, access is denied if the RPL is not of sufficient privilege level.

Privilege for Data Access

Accessing Data in Code Segments

In some instances it may be desirable to access data structures that are contained in a code segment. The following methods of accessing data in code segments are possible:
• Load a data-segment register with a segment selector for a nonconforming, readable, code segment.
• Load a data-segment register with a segment selector for a conforming, readable, code segment.
• Use a code-segment override prefix (CS) to read a readable, code segment whose selector is already loaded in the CS register.

Privilege rule for data access (numeric comparison): MAX(CPL, RPL) <= DPL
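The rule MAX(CPL, RPL) <= DPL can be checked directly. The sketch below is a minimal illustration for loading a data-segment register with a selector for a data segment; `data_access_allowed` is a hypothetical name, and type checks and the special cases for conforming code segments and the SS register are not modeled.

```python
def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Apply MAX(CPL, RPL) <= DPL; lower numbers mean more privilege."""
    for level in (cpl, rpl, dpl):
        if level not in (0, 1, 2, 3):
            raise ValueError("privilege levels range from 0 to 3")
    return max(cpl, rpl) <= dpl


# A data segment with DPL 1 is reachable from CPL 0 or 1, as in the text.
print(data_access_allowed(cpl=1, rpl=1, dpl=1))  # True
print(data_access_allowed(cpl=2, rpl=2, dpl=1))  # False
# Sufficient CPL, but the selector's RPL is too weak: access is denied.
print(data_access_allowed(cpl=0, rpl=3, dpl=1))  # False
```

The last call shows why the RPL is described as an override: privileged code that uses a selector handed to it by less privileged code is held to the weaker of the two levels.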


Gate Descriptors

To provide controlled access to code segments with different privilege levels, the processor provides a special set of descriptors called gate descriptors. There are four kinds of gate descriptors:
• Call gates
• Trap gates
• Interrupt gates
• Task gates

Task gates are used for task switching (ch6). Trap and interrupt gates (ch5) are special kinds of call gates used for calling exception and interrupt handlers.

Call Gates

Call gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level protection mechanism. Call gates are also useful for transferring program control between 16-bit and 32-bit code segments.

Call Gate Descriptor

Figure 4-8 shows the format of a call-gate descriptor. A call-gate descriptor may reside in the GDT or in an LDT, but not in the interrupt descriptor table (IDT). It performs six functions:
• It specifies the code segment to be accessed.
• It defines an entry point for a procedure in the specified code segment.
• It specifies the privilege level required for a caller trying to access the procedure.
• If a stack switch occurs, it specifies the number of optional parameters to be copied between stacks.
• It defines the size of values to be pushed onto the target stack: 16-bit gates force 16-bit pushes and 32-bit gates force 32-bit pushes.
• It specifies whether the call-gate descriptor is valid.

The segment selector field in a call gate specifies the code segment to be accessed. The offset field specifies the entry point in the code segment. This entry point is generally to the first instruction of a specific procedure. The DPL field indicates the privilege level of the call gate, which in turn is the privilege level required to access the selected procedure through the gate. The P flag indicates whether the call-gate descriptor is valid.

As shown in Figure 4-11, four different privilege levels are used to check the validity of a program control transfer through a call gate:
• The CPL (current privilege level).
• The RPL (requestor's privilege level) of the call gate's selector.
• The DPL (descriptor privilege level) of the call gate descriptor.
• The DPL of the segment descriptor of the destination code segment.
The C flag (conforming) in the segment descriptor for the destination code segment is also checked.

Call-Gate Mechanism


Privilege Check for Control Transfer with Call Gate

The privilege checking rules are different depending on whether the control transfer was initiated with a CALL or a JMP instruction, as shown in Table.

Privilege Check Rules for Call Gates
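The rules table itself is not reproduced in these notes, so the sketch below encodes the rules as they appear in the Intel architecture manual: for both CALL and JMP the CPL and RPL must not exceed the gate DPL; a CALL may reach a destination whose DPL is at or below the CPL; a JMP to a nonconforming segment requires the destination DPL to equal the CPL. The function and parameter names are hypothetical.

```python
def call_gate_transfer_allowed(instr: str, cpl: int, rpl: int,
                               gate_dpl: int, code_dpl: int,
                               conforming: bool) -> bool:
    """Apply the call-gate privilege checks for a CALL or JMP."""
    if instr not in ("CALL", "JMP"):
        raise ValueError("instr must be 'CALL' or 'JMP'")
    # Step 1: the caller must be privileged enough to use the gate.
    if max(cpl, rpl) > gate_dpl:
        return False
    # Step 2: check the destination code segment.
    if instr == "CALL" or conforming:
        return code_dpl <= cpl      # same or more privileged target
    return code_dpl == cpl          # JMP to nonconforming: same level only
```

The asymmetry reflects that only CALL can raise the privilege level (with a stack switch); JMP never changes the CPL.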


TASK MANAGEMENT ( P3 6.1) A task is a unit of work that a processor can dispatch, execute, and suspend. It can be used to execute a program, a task or process, an operating-system service utility, an interrupt or exception handler, or a kernel or executive utility. Task Structure A task is made up of two parts: a task execution space and a task-state segment (TSS). The task execution space consists of a code segment, a stack segment, and one or more data segments. The TSS specifies the segments that make up the task execution space and provides a storage place for task state information. In multitasking systems, the TSS also provides a mechanism for linking tasks. A task is identified by the segment selector for its TSS. When a task is loaded into the processor for execution, the segment selector, base address, limit, and segment descriptor attributes for the TSS are loaded into the task register

Structure of a Task

Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.

TASK MANAGEMENT DATA STRUCTURES
The processor defines five data structures for handling task-related activities:
• Task-state segment (TSS).
• Task-gate descriptor.
• TSS descriptor.
• Task register.
• NT flag in the EFLAGS register.
When operating in protected mode, a TSS and TSS descriptor must be created for at least one task, and the segment selector for the TSS must be loaded into the task register (using the LTR instruction).

Task-State Segment (TSS)
The processor state information needed to restore a task is saved in a system segment called the task-state segment (TSS). Figure 6-2 shows the format of a TSS for tasks designed for 32-bit CPUs. The fields of a TSS are divided into two main categories: dynamic fields and static fields.

The following are dynamic fields:
• General-purpose register fields: State of the EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI registers prior to the task switch.
• Segment selector fields: Segment selectors stored in the ES, CS, SS, DS, FS, and GS registers prior to the task switch.
• EFLAGS register field: State of the EFLAGS register prior to the task switch.
• EIP (instruction pointer) field: State of the EIP register prior to the task switch.
• Previous task link field: Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task by using the IRET instruction.

The processor reads the static fields, but does not normally change them. These fields are set up when a task is created. The following are static fields:
• LDT segment selector field: Contains the segment selector for the task's LDT.
• CR3 control register field: Contains the base physical address of the page directory to be used by the task. Control register CR3 is also known as the page-directory base register (PDBR).
• Privilege level-0, -1, and -2 stack pointer fields: These stack pointers consist of a logical address made up of the segment selector for the stack segment (SS0, SS1, and SS2) and an offset into the stack (ESP0, ESP1, and ESP2). Note that the values in these fields are static for a particular task, whereas the SS and ESP values will change if stack switching occurs within the task.
• T (debug trap) flag (byte 100, bit 0): When set, the T flag causes the processor to raise a debug exception when a task switch to this task occurs (see Section 15.3.1.5, “Task-Switch Exception Condition”).
• I/O map base address field: Contains a 16-bit offset from the base of the TSS to the I/O permission bit map and interrupt redirection bit map. When present, these maps are stored in the TSS at higher addresses. The I/O map base address points to the beginning of the I/O permission bit map and the end of the interrupt redirection bit map.

32-Bit Task-State Segment (TSS)


TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor. Figure 6-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT. An attempt to access a TSS using a segment selector with its TI flag set (which indicates the current LDT) causes a general-protection exception (#GP) to be generated during CALLs and JMPs; it causes an invalid TSS exception (#TS) during IRETs. A general-protection exception is also generated if an attempt is made to load a segment selector for a TSS into a segment register.

The busy flag (B) in the type field indicates whether the task is busy. A busy task is currently running or suspended. A type field with a value of 1001B indicates an inactive task; a value of 1011B indicates a busy task. Tasks are not recursive. The processor uses the busy flag to detect an attempt to call a task whose execution has been interrupted. To ensure that only one busy flag is associated with a task, each TSS should have only one TSS descriptor that points to it.
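The two type-field values differ only in bit 1, which is the busy flag. A minimal sketch of how software could test it (the constant and function names are hypothetical):

```python
TSS_AVAILABLE = 0b1001  # 32-bit TSS, inactive task
TSS_BUSY = 0b1011       # 32-bit TSS, busy task
BUSY_BIT = 0b0010       # the B flag inside the 4-bit type field


def tss_is_busy(type_field: int) -> bool:
    """Return the state of the busy flag for a 32-bit TSS descriptor type."""
    if type_field not in (TSS_AVAILABLE, TSS_BUSY):
        raise ValueError("not a 32-bit TSS descriptor type")
    return bool(type_field & BUSY_BIT)
```

Because tasks are not recursive, a CALL or JMP to a task whose descriptor reports busy is rejected by the processor.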

TSS Descriptor


Task Register The task register holds the 16-bit segment selector and the entire segment descriptor (32-bit base address, 16-bit segment limit, and descriptor attributes) for the TSS of the current task (see Figure ). This information is copied from the TSS descriptor in the GDT for the current task. Figure shows the path the processor uses to access the TSS (using the information in the task register).

Task Register

Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task (see Figure). It can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used. The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch. When a program or procedure makes a call or jump to a task through a task gate, the CPL and the RPL field of the gate selector pointing to the task gate must be less than or equal to the DPL of the task-gate descriptor. Note that when a task gate is used, the DPL of the destination TSS descriptor is not used.

Task Gate Descriptor

TASK SWITCHING The processor transfers execution to another task in one of four cases: • The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT. • The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT. • An interrupt or exception vector points to a task-gate descriptor in the IDT. • The current task executes an IRET when the NT flag in the EFLAGS register is set. JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs. TASK LINKING ( NESTED TASKS) The previous task link field of the TSS (sometimes called the “backlink”) and the NT flag in the EFLAGS register are used to return execution to the previous task. EFLAGS.NT = 1 indicates that the currently executing task is nested within the execution of another task. NESTED TASK SWITCHES

1) Nested tasks act like subroutines.
2) A CALL instruction to a task nests tasks.
3) An interrupt or exception through a task gate nests tasks.
4) A JMP instruction does not nest tasks.
5) The new TSS receives the old TSS selector in its back link field.
6) The new task gets the nested task (NT) bit set in the EFLAGS register.
7) The new task must return to the old task with IRET.
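The nesting rules listed above can be illustrated with a toy model. This is a simplified sketch, not the processor's actual task-switch sequence: the `Task` and `Processor` classes are hypothetical and track only the back link, the NT flag and the busy flag.

```python
class Task:
    """Holds only the TSS fields needed to illustrate nesting."""
    def __init__(self, name):
        self.name = name
        self.back_link = None   # previous task link field
        self.nt = False         # NT flag in this task's EFLAGS image
        self.busy = False       # busy flag in the TSS descriptor


class Processor:
    def __init__(self, first_task):
        first_task.busy = True
        self.current = first_task

    def call(self, new_task):
        """CALL (or interrupt via task gate): nests the new task."""
        if new_task.busy:
            raise RuntimeError("tasks are not recursive")
        new_task.back_link = self.current  # old task stays busy
        new_task.nt = True
        new_task.busy = True
        self.current = new_task

    def jmp(self, new_task):
        """JMP: switches tasks without nesting."""
        if new_task.busy:
            raise RuntimeError("tasks are not recursive")
        self.current.busy = False
        new_task.busy = True
        self.current = new_task

    def iret(self):
        """IRET with NT set: return to the task named by the back link."""
        if not self.current.nt:
            raise RuntimeError("NT clear: IRET does not switch tasks")
        finished = self.current
        finished.nt = False
        finished.busy = False
        self.current = finished.back_link
```

Calling task B from task A leaves A busy and records A in B's back link; IRET in B then resumes A, whereas a JMP leaves no path back.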

Nested Tasks


PROTECTED-MODE I/O When the processor is running in protected mode, the following protection mechanisms regulate access to I/O ports: • When accessing I/O ports through the I/O address space, two protection devices control access: — The I/O privilege level (IOPL) field in the EFLAGS register — The I/O permission bit map of a task state segment (TSS) • When accessing memory-mapped I/O ports, the normal segmentation and paging protection and the MTRRs (in processors that support them) also affect access to I/O ports. The following sections describe the protection mechanisms available when accessing I/O ports in the I/O address space with the I/O instructions. I/O Permission Bit Map The I/O permission bit map is a device for permitting limited access to I/O ports by less privileged programs or tasks and for tasks operating in virtual-8086 mode. The I/O permission bit map is located in the TSS (see Figure ) for the currently running task or program. The address of the first byte of the I/O permission bit map is given in the I/O map base address field of the TSS. The size of the I/O permission bit map and its location in the TSS are variable.
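How the IOPL and the bit map combine is not spelled out in these notes, so the sketch below relies on the rules in the Intel architecture manual: the bit map is consulted only when CPL > IOPL or the task is in virtual-8086 mode, each port corresponds to one bit, a clear bit (0) permits access, and ports beyond the end of the map are treated as denied. All names are hypothetical.

```python
def io_access_allowed(port: int, cpl: int, iopl: int, bitmap: bytes,
                      v86: bool = False, width: int = 1) -> bool:
    """Decide whether an I/O instruction touching `width` ports may proceed."""
    if not v86 and cpl <= iopl:
        return True                      # privileged enough: map not consulted
    for p in range(port, port + width):  # every byte of the access is checked
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False                 # beyond the map: treated as denied
        if (bitmap[byte_index] >> bit) & 1:
            return False                 # bit set: access denied
    return True
```

With a one-byte map of 00000100B, a level-3 task under IOPL 0 may use ports 0 and 1 but not port 2.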

I/O Permission Bit Map


INTERRUPT AND EXCEPTION OVERVIEW Interrupts and exceptions are events that indicate that a condition exists somewhere in the system, the processor, or within the currently executing program or task that requires the attention of a processor. They typically result in a forced transfer of execution from the currently running program or task to a special software routine or task called an interrupt handler or an exception handler. The action taken by a processor in response to an interrupt or exception is referred to as servicing or handling the interrupt or exception. Interrupts occur at random times during the execution of a program, in response to signals from hardware. System hardware uses interrupts to handle events external to the processor, such as requests to service peripheral devices. Software can also generate interrupts by executing the INT n instruction. When an interrupt is received or an exception is detected, the currently running procedure or task is suspended while the processor executes an interrupt or exception handler. When execution of the handler is complete, the processor resumes execution of the interrupted procedure or task. The resumption of the interrupted procedure or task happens without loss of program continuity, unless recovery from an exception was not possible or an interrupt caused the currently running program to be terminated. EXCEPTION AND INTERRUPT VECTORS To aid in handling exceptions and interrupts, each IA-32 architecture-defined exception and each interrupt condition that requires special handling by the processor is assigned a unique identification number, called a vector. The processor uses the vector assigned to an exception or interrupt as an index into the interrupt descriptor table (IDT). The table provides the entry point to an exception or interrupt handler The allowable range for vector numbers is 0 to 255. 
Vectors in the range 0 through 31 are reserved by the IA-32 architecture for architecture-defined exceptions and interrupts. The vectors in the range 32 to 255 are designated as user-defined interrupts and are not reserved by the IA-32 architecture. These interrupts are generally assigned to external I/O devices to enable those devices to send interrupts to the processor through one of the external hardware interrupt mechanisms. SOURCES OF INTERRUPTS The processor receives interrupts from two sources:

• External (hardware generated) interrupts. • Software-generated interrupts. • Occurrence of some condition


SOURCES OF EXCEPTIONS The processor receives exceptions from three sources:

• Processor-detected program-error exceptions. • Software-generated exceptions. • Machine-check exceptions.

Exceptions are classified as faults, traps, or aborts depending on the way they are reported and whether the instruction that caused the exception can be restarted without loss of program or task continuity. Faults — Detected and serviced before execution of faulting instruction . A fault is an exception that can generally be corrected and that, once corrected, allows the program to be restarted with no loss of continuity. When a fault is reported, the processor restores the machine state to the state prior to the beginning of execution of the faulting instruction. The return address (saved contents of the CS and EIP registers) for the fault handler points to the faulting instruction, rather than to the instruction following the faulting instruction. Traps — A trap is an exception that is reported immediately following the execution of the trapping instruction. Traps allow execution of a program or task to be continued without loss of program continuity. The return address for the trap handler points to the instruction to be executed after the trapping instruction. Aborts — An abort is an exception that does not always report the precise location of the instruction causing the exception and does not allow a restart of the program or task that caused the exception. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables. INTERRUPT DESCRIPTOR TABLE (IDT) The interrupt descriptor table (IDT) associates each exception or interrupt vector with a gate descriptor for the procedure or task used to service the associated exception or interrupt. Like the GDT and LDTs, the IDT is an array of 8-byte descriptors (in protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor. To form an index into the IDT, the processor scales the exception or interrupt vector by eight (the number of bytes in a gate descriptor). 
Because there are only 256 interrupt or exception vectors, the IDT need not contain more than 256 descriptors. It can contain fewer than 256 descriptors, because descriptors are required only for the interrupt and exception vectors that may occur. All empty descriptor slots in the IDT should have the present flag for the descriptor set to 0. The base address of the IDT should be aligned on an 8-byte boundary to maximize performance of cache line fills.
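The scaling of a vector into an IDT offset can be sketched as follows. The function name is hypothetical, and the limit check assumes the usual IDTR convention in which the limit is the offset of the last valid byte of the table.

```python
GATE_SIZE = 8  # bytes per gate descriptor in protected mode


def idt_entry_address(idt_base: int, idt_limit: int, vector: int) -> int:
    """Return the linear address of the gate descriptor for a vector."""
    if not 0 <= vector <= 255:
        raise ValueError("vector numbers range from 0 to 255")
    offset = vector * GATE_SIZE
    if offset + GATE_SIZE - 1 > idt_limit:
        # The processor would raise a general-protection exception here.
        raise LookupError("vector lies beyond the IDT limit")
    return idt_base + offset
```

For an IDT based at 1000h, vector 14 is found at 1000h + 14 x 8 = 1070h.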


Relationship of the IDTR and IDT

IDT DESCRIPTORS The IDT may contain any of three kinds of gate descriptors: • Task-gate descriptor • Interrupt-gate descriptor • Trap-gate descriptor

IDT Gate Descriptors


Interrupt Procedure Call


VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of task that runs in protected mode. When the operating system or executive switches to a virtual-8086-mode task, the processor emulates an Intel 8086 processor. The execution environment of the processor while in the 8086-emulation state is the same as for real-address mode. The major difference between the two modes is that in virtual-8086 mode the 8086 emulator uses some protected-mode services (such as the protected-mode interrupt and exception-handling and paging facilities). As in real-address mode, any new or legacy program that has been assembled and/or compiled to run on an Intel 8086 processor will run in a virtual-8086-mode task. Several 8086 programs can be run as virtual-8086-mode tasks concurrently with normal protected-mode tasks, using the processor’s multitasking facilities. The processor runs in virtual-8086 mode when the VM (virtual machine) flag in the EFLAGS register is set. This flag can only be set when the processor switches to a new protected-mode task or resumes virtual-8086 mode via an IRET instruction.

INTERRUPT AND EXCEPTION HANDLING IN VIRTUAL-8086 MODE
When the processor receives an interrupt or detects an exception condition while in virtual-8086 mode, it invokes an interrupt or exception handler, just as it does in protected or real-address mode. The interrupt or exception handler that is invoked and the mechanism used to invoke it depend on the class of interrupt or exception that has been detected or generated and the state of various system flags and fields.

Handling an Interrupt or Exception Through a Protected-Mode Trap or Interrupt Gate
When an interrupt or exception vector points to a 32-bit trap or interrupt gate in the IDT, the gate must in turn point to a nonconforming, privilege-level 0 code segment. When accessing this code segment, the processor performs the following steps:
1. Switches to 32-bit protected mode and privilege level 0.
2. Saves the state of the processor on the privilege-level 0 stack. The states of the EIP, CS, EFLAGS, ESP, SS, ES, DS, FS, and GS registers are saved (see Figure ).
3. Clears the segment registers. Saving the DS, ES, FS, and GS registers on the stack and then clearing the registers lets the interrupt or exception handler safely save and restore these registers regardless of the type of segment selectors they contain (protected-mode or 8086-style).
4. Clears the VM, NT, RF and TF flags (in the EFLAGS register). If the gate is an interrupt gate, clears the IF flag.
5. Begins executing the selected interrupt or exception handler.
If the trap or interrupt gate references a procedure in a conforming segment or in a segment at a privilege level other than 0, the processor generates a general-protection exception (#GP). Here, the error code is the segment selector of the code segment to which a call was attempted.

------------------------------------------The End (Pentium)-------------------------------------------


PIC16f84A /16c6x/16c7x/16f877A

Enhanced FLASH/EEPROM 8-Bit Microcontroller SPECIAL FEATURES OF THE CPU What sets a microcontroller apart from other processors are special circuits to deal with the needs of real time applications. The PIC16F8X has a host of such features intended to maximize system reliability, minimize cost through elimination of external components, provide power saving operating modes and offer code protection. These features are: • OSC Selection • Reset - Power-on Reset (POR) - Power-up Timer (PWRT) - Oscillator Start-up Timer (OST) • Interrupts • Watchdog Timer (WDT) • SLEEP • Code protection • ID locations • In-circuit serial programming The PIC16F8X has a Watchdog Timer which can be shut off only through configuration bits. It runs off its own RC oscillator for added reliability. There are two timers that offer necessary delays on power-up. One is the Oscillator Start-up Timer (OST), intended to keep the chip in reset until the crystal oscillator is stable. The other is the Power-up Timer (PWRT), which provides a fixed delay of 72 ms (nominal) on power-up only. This design keeps the device in reset while the power supply stabilizes. With these two timers on-chip, most applications need no external reset circuitry. SLEEP mode offers a very low current power-down mode. The user can wake-up from SLEEP through external reset, Watchdog Timer time-out or through an interrupt. Several oscillator options are provided to allow the part to fit the application. The RC oscillator option saves system cost while the LP crystal option saves power. A set of configuration bits are used to select the various options. 
High Performance RISC CPU Features:
• Only 35 single word instructions to learn
• All instructions single-cycle except for program branches which are two-cycle
• Operating speed: DC - 20 MHz clock input, DC - 200 ns instruction cycle
• 1024 words of program memory
• 68 bytes of Data RAM
• 64 bytes of Data EEPROM
• 14-bit wide instruction words
• 8-bit wide data bytes
• 15 Special Function Hardware registers
• Eight-level deep hardware stack
• Direct, indirect and relative addressing modes
• Four interrupt sources:
  - External RB0/INT pin
  - TMR0 timer overflow
  - PORTB<7:4> interrupt-on-change
  - Data EEPROM write complete

Peripheral Features:
• 13 I/O pins with individual direction control
• High current sink/source for direct LED drive
  - 25 mA sink max. per pin
  - 25 mA source max. per pin
• TMR0: 8-bit timer/counter with 8-bit programmable prescaler

Special Microcontroller Features:
• 10,000 erase/write cycles Enhanced FLASH Program memory typical
• 10,000,000 typical erase/write cycles EEPROM Data memory typical
• EEPROM Data Retention > 40 years
• In-Circuit Serial Programming™ (ICSP™) via two pins
• Power-on Reset (POR), Power-up Timer (PWRT), Oscillator Start-up Timer (OST)
• Watchdog Timer (WDT) with its own On-Chip RC Oscillator for reliable operation
• Code protection
• Power saving SLEEP mode
• Selectable oscillator options

The PIC16F8X has up to 68 bytes of RAM, 64 bytes of Data EEPROM memory, and 13 I/O pins. A timer/counter is also available.


Reset The PIC differentiates between various kinds of Reset: • Power-on Reset (POR) • MCLR Reset during normal operation • MCLR Reset during Sleep • WDT Reset (during normal operation) • WDT Wake-up (during Sleep) • Brown-out Reset (BOR) Some registers are not affected in any Reset condition. Their status is unknown on POR and unchanged in any other Reset. Most other registers are reset to a “Reset state” on Power-on Reset (POR), on the MCLR and WDT Reset, on MCLR Reset during Sleep and Brownout Reset (BOR). They are not affected by a WDT wake-up which is viewed as the resumption of normal operation. The TO and PD bits are set or cleared differently in different Reset situations. MCLR PIC devices have a noise filter in the MCLR Reset path. The filter will detect and ignore small pulses. It should be noted that a WDT Reset does not drive MCLR pin low. The behavior of the ESD protection on the MCLR pin differs from previous devices of this family. Voltages applied to the pin that exceed its specification can result in both Resets and current consumption outside of device specification during the Reset event. For this reason, Microchip recommends that the MCLR pin no longer be tied directly to VDD. The use of an RCR network, as shown in Figure , is suggested

RECOMMENDED MCLR CIRCUIT


Power-on Reset (POR)
A Power-on Reset pulse is generated on-chip when VDD rise is detected (in the range of 1.2V-1.7V). To take advantage of the POR, tie the MCLR pin to VDD through an RC network, as described above. When the device starts normal operation (exits the Reset condition), device operating parameters (voltage, frequency, temperature, etc.) must be met to ensure operation. If these conditions are not met, the device must be held in Reset until the operating conditions are met. Brown-out Reset may be used to meet the start-up conditions.

Power-up Timer (PWRT)
The Power-up Timer provides a fixed 72 ms nominal time-out on power-up only from the POR. The Power-up Timer operates on an internal RC oscillator. The chip is kept in Reset as long as the PWRT is active. The PWRT’s time delay allows VDD to rise to an acceptable level. A configuration bit is provided to enable or disable the PWRT. The power-up time delay will vary from chip to chip due to VDD, temperature and process variation.

Oscillator Start-up Timer (OST)
The Oscillator Start-up Timer (OST) provides a delay of 1024 oscillator cycles (from OSC1 input) after the PWRT delay is over (if PWRT is enabled). This helps to ensure that the crystal oscillator or resonator has started and stabilized. The OST time-out is invoked only for XT, LP and HS modes and only on Power-on Reset or wake-up from Sleep.

Brown-out Reset (BOR)
The configuration bit, BODEN, can enable or disable the Brown-out Reset circuit. If VDD falls below VBOR (parameter D005, about 4V) for longer than TBOR (parameter #35, about 100 µs), the brown-out situation will reset the device. If VDD falls below VBOR for less than TBOR, a Reset may not occur. Once the brown-out occurs, the device will remain in Brown-out Reset until VDD rises above VBOR. The Power-up Timer then keeps the device in Reset for TPWRT (parameter #33, about 72 ms). If VDD should fall below VBOR during TPWRT, the Brown-out Reset process will restart when VDD rises above VBOR with the Power-up Timer Reset. The Power-up Timer is always enabled when the Brown-out Reset circuit is enabled, regardless of the state of the PWRT configuration bit.

Time-out Sequence
On power-up, the time-out sequence is as follows: the PWRT delay starts (if enabled) when a POR Reset occurs. Then, OST starts counting 1024 oscillator cycles when PWRT ends (LP, XT, HS). When the OST ends, the device comes out of Reset. If MCLR is kept low long enough, the time-outs will expire. Bringing MCLR high will begin execution immediately. This is useful for testing purposes or to synchronize more than one PIC device operating in parallel.
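As a rough worked example of the time-out sequence, the total start-up delay is the nominal PWRT time (if enabled) plus 1024 oscillator periods for the crystal modes. The function below is a sketch with hypothetical names; it uses the nominal 72 ms figure, which in practice varies from chip to chip.

```python
def power_up_delay_s(fosc_hz: float, mode: str, pwrt_enabled: bool,
                     t_pwrt_s: float = 72e-3) -> float:
    """Estimate the nominal delay between POR and the end of Reset."""
    if fosc_hz <= 0:
        raise ValueError("oscillator frequency must be positive")
    delay = t_pwrt_s if pwrt_enabled else 0.0
    if mode in ("XT", "LP", "HS"):       # OST runs only for these modes
        delay += 1024 / fosc_hz          # 1024 oscillator cycles
    return delay
```

With a 4 MHz crystal in XT mode and PWRT enabled, the OST adds 1024 / 4 MHz = 256 µs to the 72 ms PWRT delay.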


Power Control/Status Register (PCON) The Power Control/Status Register, PCON, has up to two bits depending upon the device. Bit 0 is the Brown-out Reset Status bit, BOR. The BOR bit is unknown on a Power-on Reset. It must then be set by the user and checked on subsequent Resets to see if it has been cleared, indicating that a BOR has occurred. When the Brown-out Reset is disabled, the state of the BOR bit is unpredictable and is, therefore, not valid at any time. Bit 1 is the Power-on Reset Status bit, POR. It is cleared on a Power-on Reset and unaffected otherwise. The user must set this bit following a Power-on Reset.

TIME-OUT IN VARIOUS SITUATIONS

STATUS BITS AND THEIR SIGNIFICANCE

Note 1: When the wake-up is due to an interrupt and the GIE bit is set, the PC is loaded with the interrupt vector (0004h). RESET CONDITIONS FOR SPECIAL REGISTERS


Block Diagram

Clocking Scheme/Instruction Cycle
The clock input (from OSC1) is internally divided by four to generate four non-overlapping quadrature clocks, namely Q1, Q2, Q3 and Q4. Internally, the program counter (PC) is incremented every Q1, and the instruction is fetched from the program memory and latched into the instruction register in Q4. The instruction is decoded and executed during the following Q1 through Q4. The clocks and instruction execution flow are shown in Figure 3-2.

Instruction Flow/Pipelining
An “Instruction Cycle” consists of four Q cycles (Q1, Q2, Q3 and Q4). The instruction fetch and execute are pipelined such that fetch takes one instruction cycle while decode and execute takes another instruction cycle. However, due to the pipelining, each instruction effectively executes in one cycle. If an instruction causes the program counter to change (e.g., GOTO), then two cycles are required to complete the instruction.
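Since one instruction cycle is four oscillator periods, execution time follows directly from the clock frequency. A small sketch (function names are hypothetical; every branch that changes the PC is counted as one extra cycle):

```python
def instruction_cycle_s(fosc_hz: float) -> float:
    """One instruction cycle = four Q cycles = 4 / Fosc."""
    if fosc_hz <= 0:
        raise ValueError("oscillator frequency must be positive")
    return 4.0 / fosc_hz


def execution_time_s(fosc_hz: float, instructions: int,
                     taken_branches: int) -> float:
    """Total time for a sequence; PC-changing instructions take two cycles."""
    cycles = instructions + taken_branches
    return cycles * instruction_cycle_s(fosc_hz)
```

At the maximum 20 MHz clock this gives the 200 ns instruction cycle quoted in the feature list.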


CLOCK/INSTRUCTION CYCLE

INSTRUCTION PIPELINE FLOW

2.0 MEMORY ORGANIZATION There are two memory blocks in the PIC16F84A. These are the program memory and the data memory. Each block has its own bus, so that access to each block can occur during the same oscillator cycle. The data memory can further be broken down into the general purpose RAM and the Special Function Registers (SFRs). The operation of the SFRs that control the “core” are described here. The SFRs used to control the peripheral modules are described in the section discussing each individual peripheral module. The data memory area also contains the data EEPROM memory. This memory is not directly mapped into the data memory, but is indirectly mapped. That is, an indirect address pointer specifies the address of the data EEPROM memory to read/write. The 64 bytes of data EEPROM memory have the address range 0h-3Fh. More details on the EEPROM memory can be found in Section 3.0. 2.1 Program Memory Organization The PIC16FXX has a 13-bit program counter capable of addressing an 8K x 14 program memory space. For the PIC16F84A, the first 1K x 14 (0000h-03FFh) are physically implemented (Figure 2-1). Accessing a location above the physically implemented address will cause a wraparound. For example, for locations 20h, 420h, 820h, C20h, 1020h, 1420h, 1820h, and 1C20h, the instruction will be the same. The RESET vector is at 0000h and the interrupt vector is at 0004h.
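The wraparound described above amounts to ignoring the upper three bits of the 13-bit program counter. A minimal sketch (the function name is hypothetical):

```python
IMPLEMENTED_WORDS = 0x400   # 1K x 14 on the PIC16F84A (0000h-03FFh)
PC_MASK = 0x1FFF            # 13-bit program counter


def physical_program_address(pc: int) -> int:
    """Map a 13-bit PC value onto the implemented 1K program memory."""
    if not 0 <= pc <= PC_MASK:
        raise ValueError("the program counter is 13 bits wide")
    return pc & (IMPLEMENTED_WORDS - 1)
```

All eight example locations (20h, 420h, ..., 1C20h) reduce to physical address 020h.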


Program Memory Map and Stack

2.2 Data Memory Organization The data memory is partitioned into two areas. The first is the Special Function Registers (SFR) area, while the second is the General Purpose Registers (GPR) area. The SFRs control the operation of the device. Portions of data memory are banked. This is for both the SFR area and the GPR area. The GPR area is banked to allow greater than 116 bytes of general purpose RAM. The banked areas of the SFR are for the registers that control the peripheral functions. Banking requires the use of control bits for bank selection. These control bits are located in the STATUS Register. Figure 2-2 shows the data memory map organization. Instructions MOVWF and MOVF can move values from the W register to any location in the register file (“F”), and vice-versa. The entire data memory can be accessed either directly using the absolute address of each register file or indirectly through the File Select Register (FSR) (Section 2.5). Indirect addressing uses the present value of the RP0 bit for access into the banked areas of data memory. Data memory is partitioned into two banks which contain the general purpose registers and the special function registers. Bank 0 is selected by clearing the RP0 bit (STATUS<5>). Setting the RP0 bit selects Bank 1. Each Bank extends up to 7Fh (128 bytes). The first twelve locations of each Bank are reserved for the Special Function Registers. The remainder are General Purpose Registers, implemented as static RAM.


GENERAL PURPOSE REGISTER FILE Each General Purpose Register (GPR) is 8-bits wide and is accessed either directly or indirectly through the FSR (Section 2.5). The GPR addresses in Bank 1 are mapped to addresses in Bank 0. As an example, addressing location 0Ch or 8Ch will access the same GPR.
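The bank selection and the Bank 1 to Bank 0 mapping of the GPR area can be sketched as below. The names are hypothetical, and the upper GPR bound of 4Fh is an assumption derived from the 68 bytes of RAM starting at 0Ch on the PIC16F84A.

```python
GPR_FIRST = 0x0C   # first twelve locations of each bank are SFRs
GPR_LAST = 0x4F    # assumed: 68 bytes of RAM, 0Ch..4Fh


def direct_address(rp0: int, f: int) -> int:
    """Combine the RP0 bank bit with the 7-bit address from the opcode."""
    if rp0 not in (0, 1):
        raise ValueError("RP0 is a single bit")
    return (rp0 << 7) | (f & 0x7F)


def gpr_cell(address: int) -> int:
    """Return the Bank 0 address of the RAM cell actually accessed."""
    offset = address & 0x7F          # Bank 1 GPRs mirror Bank 0
    if not GPR_FIRST <= offset <= GPR_LAST:
        raise ValueError("address is not in the GPR area")
    return offset
```

Locations 0Ch and 8Ch therefore resolve to the same RAM cell, so GPR variables can be used without switching banks.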

Register File

Special Function Registers The Special Function Registers (Figure 2-2 and Table 2-1) are used by the CPU and Peripheral functions to control the device operation. These registers are static RAM. The special function registers can be classified into two sets, core and peripheral. Those associated with the core functions are described in this section. Those related to the operation of the peripheral features are described in the section for that specific feature.


Register File Summary

STATUS REGISTER The STATUS register contains the arithmetic status of the ALU, the RESET status and the bank select bit for data memory. As with any register, the STATUS register can be the destination for any instruction. If the STATUS register is the destination for an instruction that affects the Z, DC or C bits, then the write to these three bits is disabled. These bits are set or cleared according to device logic. Furthermore, the TO and PD bits are not writable. Therefore, the result of an instruction with the STATUS register as destination may be different than intended. For example, CLRF STATUS will clear the upper-three bits and set the Z bit. This leaves the STATUS register as 000u u1uu (where u = unchanged). Only the BCF, BSF, SWAPF and MOVWF instructions should be used to alter the STATUS register (Table 9-2) because these instructions do not affect any status bit.
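The CLRF STATUS example can be checked with a small bit-mask model. The sketch below assumes the bit layout implied by the pattern 000u u1uu (bits 7 to 5 writable, bits 4 and 3 the TO and PD bits, bit 2 the Z bit, bits 1 and 0 the DC and C bits); the function name is hypothetical.

```python
TO_PD_MASK = 0b00011000   # TO and PD: not writable by instructions
Z_MASK = 0b00000100       # Z: set by CLRF because the result is zero
DC_C_MASK = 0b00000011    # DC and C: write disabled for this instruction


def clrf_status(status):
    """Model CLRF STATUS: the register ends up as 000u u1uu."""
    unchanged = status & (TO_PD_MASK | DC_C_MASK)   # the 'u' bits keep their value
    return unchanged | Z_MASK                       # upper three bits cleared, Z set
```

Whatever value STATUS held before, the upper three bits come back as 0 and Z comes back as 1, which is why the result differs from the all-zero value a programmer might expect.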


Status Register


OPTION_REG REGISTER The OPTION_REG register is a readable and writable register which contains various control bits to configure the TMR0/WDT prescaler, the external INT interrupt, TMR0, and the weak pull-ups on PORTB. Note: When the prescaler is assigned to the WDT (PSA = ’1’), TMR0 has a 1:1 prescaler assignment.

Option Register

INTCON REGISTER The INTCON register is a readable and writable register which contains the various enable bits for all interrupt sources. Note: Interrupt flag bits get set when an interrupt condition occurs regardless of the state of its corresponding enable bit or the global enable bit, GIE (INTCON<7>).


Intcon Register

Program Counter: PCL and PCLATH
The Program Counter (PC) is 13 bits wide. The low byte is the PCL register, which is readable and writable. The high bits of the PC (PC<12:8>) are not directly readable or writable; they come from the PCLATH register. The PCLATH (PC latch high) register is a holding register for PC<12:8>. The contents of PCLATH are transferred to the upper bits of the program counter when the PC is loaded with a new value. This occurs during a CALL, a GOTO or a write to PCL. The figure below shows how the high bits of the PC are loaded from PCLATH in each case.


Loading of PC in Different Situations

Stack
The PIC16FXX has an 8-deep x 13-bit wide hardware stack (Figure 4-1). The stack space is not part of either program or data space, and the stack pointer is not readable or writable. The entire 13-bit PC is “pushed” onto the stack when a CALL instruction is executed or an interrupt is acknowledged. The stack is “popped” in the event of a RETURN, RETLW or RETFIE instruction execution. PCLATH is not affected by a push or a pop operation.

Note: There are no instruction mnemonics called push or pop. These are actions that occur from the execution of the CALL, RETURN, RETLW, and RETFIE instructions, or the vectoring to an interrupt address.

The stack operates as a circular buffer. That is, after the stack has been pushed eight times, the ninth push overwrites the value that was stored from the first push. The tenth push overwrites the second push (and so on). If the stack is effectively popped nine times, the PC value is the same as the value from the first pop.

Indirect Addressing; INDF and FSR Registers
The INDF register is not a physical register. Addressing INDF actually addresses the register whose address is contained in the FSR register (FSR is a pointer). This is indirect addressing.
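Returning to the hardware stack, its wrap-around behaviour can be modelled as a ring of eight slots. The class below is an illustrative sketch only: the class and attribute names are hypothetical, and on the real device the stack pointer is not visible to software.

```python
class HardwareStack:
    """Toy model of the 8-deep, 13-bit wide circular hardware stack."""

    DEPTH = 8
    PC_MASK = 0x1FFF   # the PC is 13 bits wide

    def __init__(self):
        self._slots = [0] * self.DEPTH
        self._pointer = 0   # hidden from software on the real device

    def push(self, pc):
        """Store a return address (CALL or interrupt); wraps after eight pushes."""
        self._slots[self._pointer] = pc & self.PC_MASK
        self._pointer = (self._pointer + 1) % self.DEPTH

    def pop(self):
        """Fetch a return address (RETURN, RETLW or RETFIE); also wraps."""
        self._pointer = (self._pointer - 1) % self.DEPTH
        return self._slots[self._pointer]
```

Pushing nine addresses and then popping shows both effects from the text: the first address pushed is gone, and the ninth pop returns the same value as the first pop.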


EXAMPLE 4-1: INDIRECT ADDRESSING
• Register file 05 contains the value 10h
• Register file 06 contains the value 0Ah
• Load the value 05 into the FSR register
• A read of the INDF register will return the value of 10h
• Increment the value of the FSR register by one (FSR = 06)
• A read of the INDF register now will return the value of 0Ah.

Reading INDF itself indirectly (FSR = 0) will produce 00h. Writing to the INDF register indirectly results in a no-operation (although STATUS bits may be affected).

A simple program to clear RAM locations 20h-2Fh using indirect addressing is shown in Example 4-2.

EXAMPLE 4-2: HOW TO CLEAR RAM USING INDIRECT ADDRESSING
          movlw  0x20     ;initialize pointer
          movwf  FSR      ; to RAM
NEXT      clrf   INDF     ;clear INDF register
          incf   FSR,F    ;inc pointer
          btfss  FSR,4    ;all done?
          goto   NEXT     ;NO, clear next
CONTINUE                  ;YES, continue

An effective 9-bit address is obtained by concatenating the 8-bit FSR register and the IRP bit (STATUS<7>), as shown in Figure 4-1. However, IRP is not used in the PIC16F8X.
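To see why testing bit 4 of FSR ends the loop of Example 4-2 at the right place, the same loop can be traced in Python. This is a sketch for reasoning about the assembly, not device code; the `ram` list and the function name are assumptions made for the illustration.

```python
def clear_ram_indirect(ram, start=0x20):
    """Trace Example 4-2: clear RAM through the FSR pointer until FSR bit 4 is set."""
    fsr = start                    # movlw 0x20 / movwf FSR
    while True:
        ram[fsr] = 0               # clrf INDF  (writes the cell FSR points to)
        fsr = (fsr + 1) & 0xFF     # incf FSR   (8-bit pointer)
        if fsr & (1 << 4):         # btfss FSR,4: skip the goto once bit 4 is set
            break                  # CONTINUE
    return fsr
```

Starting from 20h, bit 4 of FSR first becomes 1 at 30h, so exactly the sixteen locations 20h to 2Fh are cleared. The trick depends on the start address having bit 4 clear and on the block being sixteen bytes long.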

DIRECT/INDIRECT ADDRESSING


I/O PORTS
The PIC16F8X has two ports, PORTA and PORTB. Some port pins are multiplexed with an alternate function for other features on the device.

PORTA and TRISA Registers
PORTA is a 5-bit wide latch. RA4 is a Schmitt Trigger input and an open-drain output. All other RA port pins have TTL input levels and full CMOS output drivers. All pins have data direction bits (TRIS registers) which can configure these pins as output or input. Setting a TRISA bit (=1) makes the corresponding PORTA pin an input, i.e., puts the corresponding output driver in a high-impedance mode. Clearing a TRISA bit (=0) makes the corresponding PORTA pin an output, i.e., puts the contents of the output latch on the selected pin. Reading the PORTA register reads the status of the pins, whereas writing to it writes to the port latch. All write operations are read-modify-write operations, so a write to a port implies that the port pins are first read, then this value is modified and written to the port data latch. The RA4 pin is multiplexed with the TMR0 clock input.

PORTB and TRISB Registers
PORTB is an 8-bit wide bi-directional port. The corresponding data direction register is TRISB. A ’1’ on any bit in the TRISB register puts the corresponding output driver in a high-impedance mode. A ’0’ on any bit in the TRISB register puts the contents of the output latch on the selected pin(s). Each of the PORTB pins has a weak internal pull-up. A single control bit can turn on all the pull-ups. This is done by clearing the RBPU (OPTION_REG<7>) bit. The weak pull-up is automatically turned off when the port pin is configured as an output. The pull-ups are disabled on a Power-on Reset.

Four of PORTB’s pins, RB7:RB4, have an interrupt-on-change feature. Only pins configured as inputs can cause this interrupt to occur (i.e., any RB7:RB4 pin configured as an output is excluded from the interrupt-on-change comparison). The values of the pins in input mode are compared with the old value latched on the last read of PORTB. The “mismatch” outputs of the pins are OR’ed together to generate the RB port change interrupt. This interrupt can wake the device from SLEEP. The user, in the interrupt service routine, can clear the interrupt in the following manner:
a) Read (or write) PORTB. This will end the mismatch condition.
b) Clear flag bit RBIF.

A mismatch condition will continue to set the RBIF bit. Reading PORTB will end the mismatch condition and allow the RBIF bit to be cleared. This interrupt-on-mismatch feature, together with software-configurable pull-ups on these four pins, allows easy interfacing to a keypad and makes wake-up on key depression possible. Polling of PORTB is not recommended while using the interrupt-on-change feature.


TIMER0 MODULE AND TMR0 REGISTER

BLOCK DIAGRAM OF THE TIMER0/WDT PRESCALER

TIMER0 BLOCK DIAGRAM

The Timer0 module timer/counter has the following features:
• 8-bit timer/counter
• Readable and writable
• 8-bit software programmable prescaler
• Internal or external clock select
• Interrupt on overflow from FFh to 00h
• Edge select for external clock
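To give a feel for the numbers these features imply, the sketch below estimates the time between Timer0 overflows in timer mode. It assumes one instruction cycle equals four oscillator periods, treats a prescale value of 1 as "prescaler not assigned to Timer0", and ignores the short increment inhibit that follows a write to TMR0. The function name and parameters are hypothetical.

```python
VALID_PRESCALE = (1, 2, 4, 8, 16, 32, 64, 128, 256)


def timer0_overflow_period(fosc_hz, prescale=1, preload=0):
    """Approximate seconds between TMR0 overflows (FFh to 00h) in timer mode."""
    if prescale not in VALID_PRESCALE:
        raise ValueError("prescale must be 1 or a power of two up to 256")
    if not 0 <= preload <= 0xFF:
        raise ValueError("preload must fit in 8 bits")
    instruction_cycle = 4.0 / fosc_hz   # assumption: 4 oscillator periods per cycle
    counts = 256 - preload              # increments needed to roll over
    return counts * prescale * instruction_cycle
```

With a 4 MHz oscillator, for example, the model gives 256 µs per overflow without the prescaler and about 65.5 ms with the 1:256 setting.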


Timer mode is selected by clearing the T0CS bit (OPTION_REG<5>). In timer mode, the Timer0 module will increment every instruction cycle (without prescaler). If the TMR0 register is written, the increment is inhibited for the following two cycles. The user can work around this by writing an adjusted value to the TMR0 register.

Counter mode is selected by setting the T0CS bit (OPTION_REG<5>). In this mode TMR0 will increment on either every rising or every falling edge of pin RA4/T0CKI. The incrementing edge is determined by the T0 source edge select bit, T0SE (OPTION_REG<4>). Clearing bit T0SE selects the rising edge.

The prescaler is shared between the Timer0 Module and the Watchdog Timer. The prescaler assignment is controlled, in software, by control bit PSA (OPTION_REG<3>). Clearing bit PSA will assign the prescaler to the Timer0 Module. The prescaler is not readable or writable. When the prescaler is assigned to the Timer0 Module, the prescale value (1:2, 1:4, ..., 1:256) is software selectable.

REGISTERS ASSOCIATED WITH TIMER0

TMR0 Interrupt
The TMR0 interrupt is generated when the TMR0 register overflows from FFh to 00h. This overflow sets the T0IF bit (INTCON<2>). The interrupt can be masked by clearing enable bit T0IE (INTCON<5>). The T0IF bit must be cleared in software by the Timer0 Module interrupt service routine before re-enabling this interrupt. The TMR0 interrupt (Figure 6-4) cannot wake the processor from SLEEP since the timer is shut off during SLEEP.

DATA EEPROM MEMORY
The EEPROM data memory is readable and writable during normal operation (full VDD range). This memory is not directly mapped in the register file space. Instead it is indirectly addressed through the Special Function Registers. There are four SFRs used to read and write this memory. These registers are:
• EECON1
• EECON2
• EEDATA
• EEADR

EEDATA holds the 8-bit data for read/write, and EEADR holds the address of the EEPROM location being accessed. PIC16F8X devices have 64 bytes of data EEPROM with an address range from 0h to 3Fh. The EEPROM data memory allows byte read and write.
A byte write automatically erases the location and writes the new data (erase before write). The EEPROM data memory is rated for high erase/write cycles. The write time is controlled by an on-chip timer. The write time will vary with voltage and temperature as well as from chip to chip. Please refer to the AC specifications for exact limits. When the device is code protected, the CPU may continue to read and write the data EEPROM memory. The device programmer can no longer access this memory.

EECON1 REGISTER (ADDRESS 88h)

ANALOG-TO-DIGITAL CONVERTER (A/D) MODULE
The number of analog inputs and the resolution of the A/D converter depend on the device. On some PIC devices the conversion of an analog input signal results in a corresponding 10-bit digital number, and the A/D module has high and low voltage reference inputs that are software selectable to some combination of VDD, VSS, RA2 or RA3. The A/D converter has the unique feature of being able to operate while the device is in Sleep mode. To operate in Sleep, the A/D clock must be derived from the A/D’s internal RC oscillator.

On the PIC16C72/R72 the A/D converter module has five inputs and converts an analog input signal to a corresponding 8-bit digital number. The output of the sample and hold is the input into the converter, which generates the result via successive approximation.

The A/D module has three registers. These registers are:
• A/D Result Register (ADRES)
• A/D Control Register 0 (ADCON0)
• A/D Control Register 1 (ADCON1)

A device reset forces all registers to their reset state. This forces the A/D module to be turned off, and any conversion is aborted. The ADCON0 register, shown in Figure 9-1, controls the operation of the A/D module. The ADCON1 register configures the functions of the port pins. The port pins can be configured as analog inputs (RA3 can also be a voltage reference) or as digital I/O.

ADCON0 REGISTER (ADDRESS 1Fh)

The ADRES register contains the result of the A/D conversion. When the A/D conversion is complete, the result is loaded into the ADRES register, the GO/DONE bit (ADCON0<2>) is cleared, and the A/D interrupt flag bit ADIF is set. The block diagram of the A/D module is shown in the figure titled A/D BLOCK DIAGRAM. The value in the ADRES register is not modified by a Power-on Reset, so the ADRES register will contain unknown data after a Power-on Reset.

After the A/D module has been configured as desired, the selected channel must be acquired before the conversion is started. The analog input channels must have their corresponding TRIS bits selected as an input. To determine acquisition time, see Section 9.1. After this acquisition time has elapsed, the A/D conversion can be started. The following steps should be followed for doing an A/D conversion:


ADCON1 REGISTER (ADDRESS 9Fh)

1. Configure the A/D module:
   • Configure analog pins / voltage reference / and digital I/O (ADCON1)
   • Select A/D input channel (ADCON0)
   • Select A/D conversion clock (ADCON0)
   • Turn on A/D module (ADCON0)
2. Configure A/D interrupt (if desired):
   • Clear ADIF bit
   • Set ADIE bit
   • Set GIE bit
3. Wait the required acquisition time.
4. Start conversion:
   • Set GO/DONE bit (ADCON0)
5. Wait for A/D conversion to complete, by either:
   • Polling for the GO/DONE bit to be cleared, OR
   • Waiting for the A/D interrupt
6. Read A/D Result register (ADRES); clear bit ADIF if required.
7. For the next conversion, go to step 1 or step 2 as required.

The A/D conversion time per bit is defined as TAD. A minimum wait of 2TAD is required before the next acquisition starts.
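The polled variant of the steps above can be sketched against a toy stand-in for the A/D module. Everything in this block is hypothetical scaffolding for illustration: the `SimulatedADC` class, its methods, the ideal transfer function (input voltage scaled to 0..255 against a reference) and the fixed number of polls per conversion are assumptions. Only the GO/DONE position (ADCON0<2>), the ADRES result register and the ADIF flag come from the text.

```python
GO_DONE = 1 << 2   # GO/DONE is bit 2 of ADCON0


class SimulatedADC:
    """Toy 8-bit A/D module; not a register-accurate model of any device."""

    def __init__(self, vref=5.0, polls_per_conversion=3):
        self.vref = vref
        self.adcon0 = 0
        self.adres = 0
        self.adif = False
        self._inputs = {}
        self._channel = 0
        self._remaining = 0
        self._polls = polls_per_conversion

    def set_input(self, channel, volts):
        self._inputs[channel] = volts

    def select_channel(self, channel):
        self._channel = channel

    def start(self):
        self.adcon0 |= GO_DONE          # step 4: set GO/DONE
        self._remaining = self._polls

    def tick(self):
        """Advance simulated time; complete the conversion when the countdown ends."""
        if self.adcon0 & GO_DONE:
            self._remaining -= 1
            if self._remaining <= 0:
                volts = self._inputs.get(self._channel, 0.0)
                code = int(volts / self.vref * 256)   # assumed ideal transfer function
                self.adres = max(0, min(255, code))
                self.adcon0 &= ~GO_DONE               # hardware clears GO/DONE
                self.adif = True                      # and sets ADIF


def read_channel(adc, channel):
    """Steps 1 and 4 to 6 of the procedure, using polling."""
    adc.select_channel(channel)     # step 1 (channel selection only)
    adc.start()                     # step 4 (acquisition wait of step 3 omitted)
    while adc.adcon0 & GO_DONE:     # step 5: poll until GO/DONE is cleared
        adc.tick()
    result = adc.adres              # step 6: read ADRES
    adc.adif = False                # clear ADIF
    return result
```

The point of the sketch is the control flow: software sets GO/DONE, the module clears it when the result is ready, and ADIF has to be cleared by software afterwards.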


A/D BLOCK DIAGRAM

------------------------------ The End (PIC) ------------------------------


The 8051 (MCS-51 family)
The features of the 8051 core are:
• 8-bit CPU optimized for control applications
• Extensive Boolean processing (single-bit logic) capabilities
• 64K Program Memory address space
• 64K Data Memory address space
• 4K bytes of on-chip Program Memory
• 128 bytes of on-chip Data RAM
• 32 bidirectional and individually addressable I/O lines
• Two 16-bit timer/counters
• Full duplex UART
• 5-source interrupt structure with two priority levels
• On-chip clock oscillator

The basic architectural structure of this 8051 core is shown in the block diagram.


Program Memory
The figure shows a map of the lower part of the Program Memory. After reset, the CPU begins execution from location 0000H. As shown in the figure, each interrupt is assigned a fixed location in Program Memory. The interrupt causes the CPU to jump to that location, where it commences execution of the service routine. External Interrupt 0, for example, is assigned to location 0003H. If External Interrupt 0 is going to be used, its service routine must begin at location 0003H. If the interrupt is not going to be used, its service location is available as general purpose Program Memory.

The interrupt service locations are spaced at 8-byte intervals: 0003H for External Interrupt 0, 000BH for Timer 0, 0013H for External Interrupt 1, 001BH for Timer 1, etc. If an interrupt service routine is short enough (as is often the case in control applications), it can reside entirely within that 8-byte interval. Longer service routines can use a jump instruction to skip over subsequent interrupt locations, if other interrupts are in use.
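The 8-byte spacing of the service locations can be written as a one-line formula. The sketch below covers only the four sources named in the text; the list and function names are hypothetical.

```python
# Sources in the order listed in the text; index 0 is External Interrupt 0.
SOURCES = ["External Interrupt 0", "Timer 0", "External Interrupt 1", "Timer 1"]

FIRST_VECTOR = 0x0003
VECTOR_SPACING = 8


def vector_address(index):
    """Program Memory address where the service routine for SOURCES[index] starts."""
    if not 0 <= index < len(SOURCES):
        raise ValueError("unknown interrupt source index")
    return FIRST_VECTOR + VECTOR_SPACING * index


def fits_in_slot(routine_length_bytes):
    """True if a routine fits before the next vector without needing a jump."""
    return routine_length_bytes <= VECTOR_SPACING
```

A routine longer than eight bytes fails `fits_in_slot`, which is the case where the text suggests placing a jump at the vector location.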

In the 4K byte ROM devices, if the EA pin is strapped to VCC, then program fetches to addresses 0000H through 0FFFH are directed to the internal ROM. Program fetches to addresses 1000H through FFFFH are directed to external ROM. If the EA pin is strapped to VSS, then all program fetches are directed to external ROM. The ROMless parts must have this pin externally strapped to VSS to enable them to execute properly. The read strobe to external ROM, PSEN, is used for all external ROM fetches. PSEN is not activated for internal ROM fetches.

The hardware configuration for external program execution is shown in Figure 4. Note that 16 I/O lines (Ports 0 and 2) are dedicated to bus functions during external Program Memory fetches. Port 0 (P0 in Figure 4) serves as a multiplexed address/data bus. It emits the low byte of the Program Counter (PCL) as an address, and then goes into a float state awaiting the arrival of the code byte from the Program Memory. During the time that the low byte of the Program Counter is valid on P0, the signal ALE (Address Latch Enable) clocks this byte into an address latch. Meanwhile, Port 2 emits the high byte of the Program Counter (PCH). Then PSEN strobes the EPROM and the code byte is read into the microcontroller.


The DATA MEMORY
The right half of Figure 2 shows the internal and external Data Memory spaces available to the MCS-51 user. Figure 5 shows a hardware configuration for accessing up to 2K bytes of external RAM. The CPU in this case is executing from internal ROM. Port 0 serves as a multiplexed address/data bus to the RAM, and 3 lines of Port 2 are being used to page the RAM. The CPU generates RD and WR signals as needed during external RAM accesses.

There can be up to 64K bytes of external Data Memory. External Data Memory addresses can be either 1 or 2 bytes wide. One-byte addresses are often used in conjunction with one or more other I/O lines to page the RAM, as shown in Figure 5. Two-byte addresses can also be used, in which case the high address byte is emitted at Port 2.

Internal Data Memory is mapped in the figure above. The memory space is shown divided into three blocks, which are generally referred to as the Lower 128, the Upper 128, and SFR space. Internal Data Memory addresses are always one byte wide, which implies an address space of only 256 bytes. However, the addressing modes for internal RAM can in fact accommodate 384 bytes, using a simple trick. Direct addresses higher than 7FH access one memory space, and indirect addresses higher than 7FH access a different memory space. Thus the figure shows the Upper 128 and SFR space occupying the same block of addresses, 80H through FFH, although they are physically separate entities.


The lower 128 bytes of RAM are present in all MCS-51 devices, as mapped in the figure. The lowest 32 bytes are grouped into 4 banks of 8 registers. Program instructions call out these registers as R0 through R7. Two bits in the Program Status Word (PSW) select which register bank is in use. This allows more efficient use of code space, since register instructions are shorter than instructions that use direct addressing.
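Because the four banks of eight registers occupy the lowest 32 bytes, the internal RAM address behind a register name follows directly from the two bank-select bits. A minimal sketch follows; the function name is hypothetical, and RS1 is taken as the more significant of the two select bits.

```python
def register_address(rs1, rs0, n):
    """Internal RAM address of register Rn for the bank selected by RS1:RS0."""
    if rs1 not in (0, 1) or rs0 not in (0, 1):
        raise ValueError("bank-select bits must be 0 or 1")
    if not 0 <= n <= 7:
        raise ValueError("register number must be 0..7")
    bank = (rs1 << 1) | rs0      # bank number 0..3
    return bank * 8 + n          # 8 registers per bank, starting at 00H
```

For instance, R0 lives at 00H in bank 0 and at 08H in bank 1, and R7 of bank 3 is the last register location, 1FH.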

The next 16 bytes above the register banks form a block of bit-addressable memory space. The MCS-51 instruction set includes a wide selection of single-bit instructions, and the 128 bits in this area can be directly addressed by these instructions. The bit addresses in this area are 00H through 7FH. Some SFRs are both byte- and bit-addressable.

All of the bytes in the Lower 128 can be accessed by either direct or indirect addressing. The Upper 128 (Figure 8) can only be accessed by indirect addressing. The Upper 128 bytes of RAM are not implemented in the 8051, but are in the devices with 256 bytes of RAM.

The figure gives a brief look at the Special Function Register (SFR) space. SFRs include the port latches, timers, peripheral controls, etc. These registers can only be accessed by direct addressing. In general, all MCS-51 microcontrollers have the same SFRs as the 8051, and at the same addresses in SFR space. However, enhancements to the 8051 have additional SFRs that are not present in the 8051, nor perhaps in other proliferations of the family.
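For the bit-addressable block in internal RAM, the relation between a bit address (00H to 7FH) and the byte that holds it can be sketched as follows. The function name is hypothetical; the block is taken to start at 20H, directly above the 32 bytes of register banks.

```python
BIT_AREA_BASE = 0x20   # first byte above the four register banks


def bit_location(bit_address):
    """Map a bit address 00H-7FH to (byte address, bit position within that byte)."""
    if not 0x00 <= bit_address <= 0x7F:
        raise ValueError("only bit addresses 00H-7FH lie in the internal RAM block")
    return BIT_AREA_BASE + (bit_address >> 3), bit_address & 0x07
```

Bit address 00H is bit 0 of byte 20H and bit address 7FH is bit 7 of byte 2FH, so the 128 bits cover exactly the sixteen bytes 20H to 2FH.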

Bit Addressable


Program Status Word
The Program Status Word (PSW) contains several status bits that reflect the current state of the CPU. The PSW, shown in the figure, resides in SFR space. It contains the Carry bit, the Auxiliary Carry (for BCD operations), the two register bank select bits, the Overflow flag, a Parity bit, and two user-definable status flags. The Carry bit, other than serving the functions of a Carry bit in arithmetic operations, also serves as the “Accumulator” for a number of Boolean operations.

The bits RS0 and RS1 are used to select one of the four register banks shown in the figure. A number of instructions refer to these RAM locations as R0 through R7. The selection of which of the four banks is being referred to is made on the basis of the bits RS0 and RS1 at execution time. The Parity bit reflects the number of 1s in the Accumulator: P = 1 if the Accumulator contains an odd number of 1s, and P = 0 if the Accumulator contains an even number of 1s. Thus the number of 1s in the Accumulator plus P is always even. Two bits in the PSW are uncommitted and may be used as general purpose status flags.

Addressing Modes

DIRECT ADDRESSING
In direct addressing the operand is specified by an 8-bit address field in the instruction. Only internal Data RAM and SFRs can be directly addressed.

INDIRECT ADDRESSING
In indirect addressing the instruction specifies a register which contains the address of the operand. Both internal and external RAM can be indirectly addressed. The address register for 8-bit addresses can be R0 or R1 of the selected register bank, or the Stack Pointer. The address register for 16-bit addresses can only be the 16-bit “data pointer” register, DPTR.


REGISTER INSTRUCTIONS
The register banks, containing registers R0 through R7, can be accessed by certain instructions which carry a 3-bit register specification within the opcode of the instruction. Instructions that access the registers this way are code efficient, since this mode eliminates an address byte. When the instruction is executed, one of the eight registers in the selected bank is accessed. One of four banks is selected at execution time by the two bank select bits in the PSW.

REGISTER-SPECIFIC INSTRUCTIONS
Some instructions are specific to a certain register. For example, some instructions always operate on the Accumulator, or Data Pointer, etc., so no address byte is needed to point to it. The opcode itself does that. Instructions that refer to the Accumulator as A assemble as accumulator-specific opcodes.

INDEXED ADDRESSING
Only Program Memory can be accessed with indexed addressing, and it can only be read. This addressing mode is intended for reading look-up tables in Program Memory. A 16-bit base register (either DPTR or the Program Counter) points to the base of the table, and the Accumulator is set up with the table entry number. The address of the table entry in Program Memory is formed by adding the Accumulator data to the base pointer. Another type of indexed addressing is used in the “case jump” instruction. In this case the destination address of a jump instruction is computed as the sum of the base pointer and the Accumulator data.

Arithmetic Instructions
The ADD A,<byte> instruction can be written as:
ADD A,7FH (direct addressing)
ADD A,@R0 (indirect addressing)
ADD A,R7 (register addressing)
ADD A,#127 (immediate constant)

The execution times quoted here assume a 12 MHz clock frequency. All of the arithmetic instructions execute in 1 µs except the INC DPTR instruction, which takes 2 µs, and the Multiply and Divide instructions, which take 4 µs.

Note that any byte in the internal Data Memory space can be incremented or decremented without going through the Accumulator. One of the INC instructions operates on the 16-bit Data Pointer. The Data Pointer is used to generate 16-bit addresses for external memory, so being able to increment it in one 16-bit operation is a useful feature.

The MUL AB instruction multiplies the Accumulator by the data in the B register and puts the 16-bit product into the concatenated B and Accumulator registers. The DIV AB instruction divides the Accumulator by the data in the B register and leaves the 8-bit quotient in the Accumulator, and the 8-bit remainder in the B register.
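The register results of MUL AB and DIV AB can be mirrored with ordinary integer arithmetic. The sketch below only models where the result bytes end up (high byte of the product in B, low byte in A; quotient in A, remainder in B). The function names are hypothetical, flag behaviour is not modelled, and division by zero is simply rejected.

```python
def mul_ab(a, b):
    """Return (A, B) after MUL AB: 16-bit product with low byte in A, high byte in B."""
    product = (a & 0xFF) * (b & 0xFF)
    return product & 0xFF, product >> 8


def div_ab(a, b):
    """Return (A, B) after DIV AB: quotient in A, remainder in B."""
    a &= 0xFF
    b &= 0xFF
    if b == 0:
        raise ZeroDivisionError("result registers are not modelled for B = 0")
    return a // b, a % b
```

As a worked case, 200 x 100 = 20000 = 4E20H, so the model leaves 20H in A and 4EH in B; dividing 251 by 18 leaves quotient 13 in A and remainder 17 in B.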


Interrupt Structure
The 8051 core provides 5 interrupt sources: 2 external interrupts, 2 timer interrupts, and the serial port interrupt. What follows is an overview of the interrupt structure for the 8051.

INTERRUPT ENABLES
Each of the interrupt sources can be individually enabled or disabled by setting or clearing a bit in the SFR named IE (Interrupt Enable). This register also contains a global disable bit, which can be cleared to disable all interrupts at once. The figure shows the IE register for the 8051.

IE (Interrupt Enable) Register in the 8051

IP(Interrupt Priority)

INTERRUPT PRIORITIES
Each interrupt source can also be individually programmed to one of the two priority levels by setting or clearing a bit in the SFR named IP (Interrupt Priority). The figure shows the IP register in the 8051. A low-priority interrupt can be interrupted by a high-priority interrupt, but not by another low-priority interrupt. A high-priority interrupt cannot be interrupted by any other interrupt source. If two interrupt requests of different priority levels are received simultaneously, the request of the higher priority level is serviced. If interrupt requests of the same priority level are received simultaneously, an internal polling sequence determines which request is serviced. Thus within each priority level there is a second priority structure determined by the polling sequence.


8051 Interrupt control system
In operation, all the interrupt flags are latched into the interrupt control system during State 5 of every machine cycle. The samples are polled during the following machine cycle. If the flag for an enabled interrupt is found to be set (1), the interrupt system generates an LCALL to the appropriate location in Program Memory, unless some other condition blocks the interrupt. Several conditions can block an interrupt, among them that an interrupt of equal or higher priority level is already in progress.

The hardware-generated LCALL causes the contents of the Program Counter to be pushed onto the stack, and reloads the PC with the beginning address of the service routine. As previously noted, the service routine for each interrupt begins at a fixed location. Only the Program Counter is automatically pushed onto the stack, not the PSW or any other register. Having only the PC automatically saved allows the programmer to decide how much time to spend saving which other registers. This enhances the interrupt response time, albeit at the expense of increasing the programmer’s burden of responsibility. As a result, many interrupt functions that are typical in control applications (toggling a port pin, for example, or reloading a timer, or unloading a serial buffer) can often be completed in less time than it takes other architectures to commence them.

SIMULATING A THIRD PRIORITY LEVEL IN SOFTWARE
Some applications require more than the two priority levels that are provided by on-chip hardware in the 8051. In these cases, relatively simple software can be written to produce the same effect as a third priority level. First, interrupts that are to have higher priority than 1 are assigned to priority 1 in the IP (Interrupt Priority) register. The service routines for priority 1 interrupts that are supposed to be interruptible by “priority 2” interrupts are written to include the following code:

        PUSH  IE
        MOV   IE, #MASK
        CALL  LABEL
        ; ****** (execute service routine) ******
        POP   IE
        RET
LABEL:  RETI

As soon as any priority 1 interrupt is acknowledged, the IE (Interrupt Enable) register is re-defined so as to disable all but “priority 2” interrupts. Then, a CALL to LABEL executes the RETI instruction, which clears the priority 1 interrupt-in-progress flip-flop. At this point any priority 1 interrupt that is enabled can be serviced, but only “priority 2” interrupts are enabled. POPping IE restores the original enable byte. Then a normal RET (rather than another RETI) is used to terminate the service routine. The additional software adds 10 µs (at 12 MHz) to priority 1 interrupts.

PCON: POWER CONTROL REGISTER. NOT BIT ADDRESSABLE

TIMER/COUNTERS
The 8051 has two 16-bit Timer/Counter registers: Timer 0 and Timer 1. In the “Timer” function, the register is incremented every machine cycle, so one can think of it as counting machine cycles. Since a machine cycle consists of 12 oscillator periods, the count rate is 1/12 of the oscillator frequency.

In the “Counter” function, the register is incremented in response to a 1-to-0 transition at its corresponding external input pin, T0 or T1. In this function, the external input is sampled during S5P2 of every machine cycle. When the samples show a high in one cycle and a low in the next cycle, the count is incremented. The new count value appears in the register during S3P1 of the cycle following the one in which the transition was detected. Since it takes 2 machine cycles (24 oscillator periods) to recognize a 1-to-0 transition, the maximum count rate is 1/24 of the oscillator frequency. There are no restrictions on the duty cycle of the external input signal, but to ensure that a given level is sampled at least once before it changes, it should be held for at least one full machine cycle.

In addition to the “Timer” or “Counter” selection, Timer 0 and Timer 1 have four operating modes from which to select.

Timer 0 and Timer 1
The “Timer” or “Counter” function is selected by control bits C/T in the Special Function Register TMOD. These two Timer/Counters have four operating modes, which are selected by bit-pairs (M1, M0) in TMOD. Modes 0, 1, and 2 are the same for both Timer/Counters. Mode 3 is different. The four operating modes are described in the following text.
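The rates quoted above follow from the 12 oscillator periods in a machine cycle and the two machine cycles needed to recognize one 1-to-0 transition. A small sketch with hypothetical function names:

```python
OSC_PERIODS_PER_MACHINE_CYCLE = 12


def timer_increment_rate(fosc_hz):
    """Increments per second in the Timer function (one per machine cycle)."""
    return fosc_hz / OSC_PERIODS_PER_MACHINE_CYCLE


def max_counter_input_rate(fosc_hz):
    """Highest countable transition rate: one count per two machine cycles."""
    return fosc_hz / (2 * OSC_PERIODS_PER_MACHINE_CYCLE)


def min_level_hold_time(fosc_hz):
    """Seconds an input level should be held: one full machine cycle."""
    return OSC_PERIODS_PER_MACHINE_CYCLE / fosc_hz
```

At 12 MHz this gives one timer increment per microsecond, a maximum external count rate of 500 kHz, and a minimum hold time of 1 µs for each input level.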

TMOD: Timer/Counter Mode Control Register

MODE 0

Timer/Counter 1 Mode 0: 13-Bit Counter


Either Timer in Mode 0 is an 8-bit Counter with a divide-by-32 prescaler. This 13-bit timer is MCS-48 (8048) compatible. The figure shows the Mode 0 operation as it applies to Timer 1. In this mode, the Timer register is configured as a 13-bit register. As the count rolls over from all 1s to all 0s, it sets the Timer interrupt flag TF1. The counted input is enabled to the Timer when TR1 = 1 and either GATE = 0 or INT1 = 1. (Setting GATE = 1 allows the Timer to be controlled by external input INT1, to facilitate pulse width measurements.) TR1 is a control bit in the Special Function Register TCON. GATE is in TMOD.

The 13-bit register consists of all 8 bits of TH1 and the lower 5 bits of TL1. The upper 3 bits of TL1 are indeterminate and should be ignored. Setting the run flag (TR1) does not clear the registers. Mode 0 operation is the same for Timer 0 as for Timer 1. Substitute TR0, TF0 and INT0 for the corresponding Timer 1 signals in the figure. There are two different GATE bits, one for Timer 1 (TMOD.7) and one for Timer 0 (TMOD.3).

TMOD: TIMER/COUNTER MODE CONTROL REGISTER. NOT BIT ADDRESSABLE.
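How the 13-bit Mode 0 count is assembled from TH1 and the low five bits of TL1 can be sketched as below. The function names are hypothetical, and the model returns the indeterminate upper three bits of TL as 0, which real hardware does not guarantee.

```python
def mode0_count(th, tl):
    """13-bit count: all 8 bits of TH above the lower 5 bits of TL."""
    return ((th & 0xFF) << 5) | (tl & 0x1F)


def mode0_increment(th, tl):
    """Advance the 13-bit count by one; return (new TH, new TL, overflow flag)."""
    count = (mode0_count(th, tl) + 1) & 0x1FFF   # wraps from all 1s to all 0s
    overflow = count == 0                         # this rollover sets TF1 (or TF0)
    return count >> 5, count & 0x1F, overflow
```

The low five bits of TL act as the divide-by-32 prescaler: TH advances once for every 32 counted inputs, and the overflow flag is raised only when all 13 bits roll over.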


TCON: Timer/Counter Control Register

MODE 1
Mode 1 is the same as Mode 0, except that the Timer register is being run with all 16 bits.

Timer/Counter 0 or 1 in Mode 1: 16-Bit Counter


MODE 2

Timer/Counter 1 Mode 2: 8-Bit Auto-Reload

Mode 2 configures the Timer register as an 8-bit Counter (TL1) with automatic reload, as shown in the figure. Overflow from TL1 not only sets TF1, but also reloads TL1 with the contents of TH1, which is preset by software. The reload leaves TH1 unchanged.

MODE 3

Timer/Counter 0 Mode 3: Two 8-Bit Counters

Timer 1 in Mode 3 simply holds its count. The effect is the same as setting TR1 = 0. Timer 0 in Mode 3 establishes TL0 and TH0 as two separate counters. The logic for Mode 3 on Timer 0 is shown in the figure. TL0 uses the Timer 0 control bits C/T, GATE, TR0, INT0 and TF0. TH0 is locked into a timer function and takes over the use of TR1 and TF1 from Timer 1. Thus TH0 now controls the Timer 1 interrupt.


SERIAL INTERFACE
The serial port is full duplex, meaning it can transmit and receive simultaneously. It is also receive-buffered, meaning it can commence reception of a second byte before a previously received byte has been read from the receive register. (However, if the first byte still hasn’t been read by the time reception of the second byte is complete, one of the bytes will be lost.) The serial port receive and transmit registers are both accessed as Special Function Register SBUF. Writing to SBUF loads the transmit register, and reading SBUF accesses a physically separate receive register.

The serial port can operate in 4 modes:

Mode 0: Serial data enters and exits through RXD. TXD outputs the shift clock. 8 bits are transmitted/received: 8 data bits (LSB first). The baud rate is fixed at 1/12 the oscillator frequency.

Mode 1: 10 bits are transmitted (through TXD) or received (through RXD): a start bit (0), 8 data bits (LSB first), and a stop bit (1). On receive, the stop bit goes into RB8 in Special Function Register SCON. The baud rate is variable.

Mode 2: 11 bits are transmitted (through TXD) or received (through RXD): a start bit (0), 8 data bits (LSB first), a programmable 9th data bit, and a stop bit (1). On transmit, the 9th data bit (TB8 in SCON) can be assigned the value of 0 or 1. Or, for example, the parity bit (P, in the PSW) could be moved into TB8. On receive, the 9th data bit goes into RB8 in Special Function Register SCON, while the stop bit is ignored. The baud rate is programmable to either 1/32 or 1/64 the oscillator frequency.

Mode 3: 11 bits are transmitted (through TXD) or received (through RXD): a start bit (0), 8 data bits (LSB first), a programmable 9th data bit and a stop bit (1). In fact, Mode 3 is the same as Mode 2 in all respects except the baud rate. The baud rate in Mode 3 is variable.

In all four modes, transmission is initiated by any instruction that uses SBUF as a destination register. Reception is initiated in Mode 0 by the condition RI = 0 and REN = 1. Reception is initiated in the other modes by the incoming start bit if REN = 1.

Serial Port Control Register
The serial port control and status register is the Special Function Register SCON, shown in the figure. This register contains not only the mode selection bits, but also the 9th data bit for transmit and receive (TB8 and RB8), and the serial port interrupt bits (TI and RI).

SCON: Serial Port Control Register
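The frame formats of the four serial modes described above can be illustrated with a small Python model that lists the bits in the order they appear on the line. This is a sketch for study purposes only; the helper names are hypothetical, and the parity helper mirrors the PSW parity flag P (1 when the byte holds an odd number of 1s).

```python
def parity_flag(byte: int) -> int:
    """Model of the PSW parity bit P: 1 if the byte has an odd number of 1s."""
    return bin(byte & 0xFF).count("1") & 1


def serial_frame(byte: int, mode: int, tb8: int = 0) -> list:
    """Return the bits of one frame in transmission order.

    Mode 0: 8 data bits only (LSB first).
    Mode 1: start (0) + 8 data bits + stop (1) = 10 bits.
    Mode 2/3: start (0) + 8 data bits + 9th bit (TB8) + stop (1) = 11 bits.
    """
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    if mode == 0:
        return data
    if mode == 1:
        return [0] + data + [1]
    if mode in (2, 3):
        return [0] + data + [tb8 & 1] + [1]
    raise ValueError("mode must be 0, 1, 2 or 3")


# Example: send 'A' (41H) in Mode 3 with the parity flag copied into TB8.
frame = serial_frame(0x41, 3, tb8=parity_flag(0x41))
print(frame)
```

Copying the parity flag into the 9th bit is the example use of TB8 mentioned in the Mode 2 description; the receiver would find that bit in RB8.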


BAUD RATES

The baud rate in Mode 0 is fixed at:

$$\text{Mode 0 Baud Rate} = \frac{\text{Oscillator Frequency}}{12}$$

The baud rate in Mode 2 depends on the value of bit SMOD in SFR PCON. If SMOD = 0 (the reset value), the baud rate is 1/64 of the oscillator frequency. If SMOD = 1, the baud rate is 1/32 of the oscillator frequency.

$$\text{Mode 2 Baud Rate} = \frac{2^{\text{SMOD}}}{64} \times (\text{Oscillator Frequency})$$
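As a quick numerical check of the fixed-rate formulas for Mode 0 and Mode 2, the following Python sketch evaluates them for a given crystal. The 12 MHz value is only an example, and the function names are hypothetical.

```python
def mode0_baud(osc_hz: float) -> float:
    """Mode 0: baud rate is fixed at oscillator frequency / 12."""
    return osc_hz / 12


def mode2_baud(osc_hz: float, smod: int) -> float:
    """Mode 2: baud rate = (2 ** SMOD / 64) * oscillator frequency."""
    if smod not in (0, 1):
        raise ValueError("SMOD is a single bit")
    return (2 ** smod) * osc_hz / 64


osc = 12_000_000  # example crystal, 12 MHz
print(mode0_baud(osc))     # Mode 0
print(mode2_baud(osc, 0))  # Mode 2, SMOD = 0 (oscillator / 64)
print(mode2_baud(osc, 1))  # Mode 2, SMOD = 1 (oscillator / 32)
```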

USING TIMER/COUNTER 1 TO GENERATE BAUD RATES

For this purpose, Timer 1 is used in Mode 2 (Auto Reload). Refer to the Timer Setup section of this chapter.

$$\text{Baud Rate} = \frac{K \times \text{Oscillator Frequency}}{32 \times 12 \times [256 - (\text{TH1})]}$$

If SMOD = 0, then K = 1. If SMOD = 1, then K = 2. (SMOD is a bit in the PCON register.) Most of the time the user knows the baud rate and needs to know the reload value for TH1. Therefore, the equation to calculate TH1 can be written as:

$$\text{TH1} = 256 - \frac{K \times \text{Oscillator Frequency}}{32 \times 12 \times \text{Baud Rate}}$$

TH1 must be an integer value. Rounding off TH1 to the nearest integer may not produce the desired baud rate. In this case the user may have to choose another crystal frequency. Since the PCON register is not bit addressable, one way to set the SMOD bit is to OR it into the PCON register (i.e., ORL PCON, #80H). The address of PCON is 87H.
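The effect of rounding TH1 can be seen by computing the reload value and then feeding it back into the baud rate equation. The Python sketch below does this; the function names and the example crystal values (11.0592 MHz and 12 MHz) are assumptions chosen for illustration.

```python
def timer1_baud(osc_hz: float, th1: int, smod: int = 0) -> float:
    """Baud rate = K * Fosc / (32 * 12 * (256 - TH1)), with K = 2 ** SMOD."""
    k = 2 if smod else 1
    return k * osc_hz / (32 * 12 * (256 - th1))


def th1_for_baud(osc_hz: float, baud: float, smod: int = 0):
    """Return (TH1, actual_baud, error_percent) for a requested baud rate."""
    k = 2 if smod else 1
    exact = 256 - k * osc_hz / (32 * 12 * baud)
    th1 = round(exact)  # TH1 must be an integer
    if not 0 <= th1 <= 255:
        raise ValueError("baud rate not reachable with this crystal")
    actual = timer1_baud(osc_hz, th1, smod)
    return th1, actual, 100 * (actual - baud) / baud


for osc, baud in [(11_059_200, 9600), (12_000_000, 1200), (12_000_000, 9600)]:
    th1, actual, err = th1_for_baud(osc, baud)
    print(f"Fosc={osc} Hz, {baud} baud: TH1={th1:02X}H, "
          f"actual={actual:.1f}, error={err:+.2f}%")
```

With an 11.0592 MHz crystal the division is exact, whereas with 12 MHz the 9600 baud case rounds to a rate that is off by several percent, which is the situation where another crystal frequency may be needed.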

-------xxxxxxx-------


QUESTION BANK MPMC

No.  Question  [Marks]

1a  Explain different features of Pentium. [6]
1b  How is pipelining supported in the floating point unit of Pentium? [6]
1c  How does Pentium service interrupts in real mode? [6]

1a  What is the privilege level of an application program, if it is running in real mode? Why? [4]
1b  What is BTB in Pentium processor? How is it useful? Explain. [6]
1c  With help of neat diagram, explain how data / operand is stored in register file (Stack) of floating point unit of the Pentium. [4]
1d  State the super-scalar features of the Pentium. [4]

1a  Compare Real mode and Protected Mode of Pentium. [6]
1b  What is use of INIT pin? Is it same as reset pin functionally? [4]
1c  Explain pin functions: i) IU ii) BRDY iii) CPUTYP iv) IV [6]

1a  Which features make Pentium a super scalar processor? Give details of every feature. [8]
1b  List features of real Mode of Pentium Processor. [4]
1d  Explain functions of BOFF and BE0-BE7 pins. [6]

---xxx---

2a  Explain function of the following pins: i) ADSC ii) AP iii) BP[3:2] iv) CPUTYP v) BREQ [10]
2b  With help of neat diagram, explain Pentium Architecture. [8]

2a  How is physical address generated in the Pentium? [3]
2b  Is the programmer allowed to use all the instructions and registers when Pentium operates in the real mode? Elaborate. [4]
2c  Explain the following: i) BE0-BE7 ii) BRDY iii) IERR [6]
2d  State and explain pairing rules for instructions in Pentium. [5]

2a  Which features of the Pentium can be called RISC features? Give details. [6]
2b  Explain pipelining in the Pentium processor. [6]
2c  How is Data cache organized in the Pentium? [4]

2a  With help of neat block diagram, explain the architecture of Pentium Processor. [8]
2b  Explain ADS and NA pins of the Pentium Processor. [4]
2c  What is branch prediction in Pentium Processor? [6]

---xxx---

3a  What are different memory type addressing modes of the Pentium? Explain. Give one example each. [10]
3b  Differentiate between pipelined and non-pipelined bus cycles of Pentium. [6]

3a  What are the different steps followed by Pentium processor in power up? [6]
3b  Explain various addressing modes of Pentium processor. Give one example each. [6]
3c  What is the significance of CR0 and CR3 registers in the Pentium? [4]

3a  Explain initialization process of Pentium. [6]
3b  Draw non-pipelined write cycle with one wait state. [6]
3c  Which are the protected mode registers in Pentium? Explain their use. [6]

3a  How can you make the Pentium processor enter into self test? Explain. [6]
3b  With help of block diagrams explain how Pentium addresses 32, 16, 8 bit memories. [6]
3c  Draw non-pipelined read cycle of Pentium Processor. [4]

---xxx---

4a  When does the Pentium processor take up Built-in Self Test? What are the advantages of this test? [6]
4b  With the help of a neat diagram, explain pipelined bus cycle of the Pentium. [6]
4c  Draw and explain EFLAGS register of the Pentium. [4]

4a  Name and explain 4 protected mode instructions of the Pentium Processor. [8]
4b  With help of neat diagram, explain the non-pipelined read bus cycle of the Pentium. [6]
4c  Name protected mode registers of the Pentium. [2]

4a  What are the tests performed when INIT pin is asserted with RESET pin? [6]
4b  What do you mean by bus cycle? How are pipelined bus cycles different from non-pipelined bus cycles? Explain. [6]
4c  Explain different data types supported by Pentium processor. [6]

4a  How are 32, 16, 8 bit memories interfaced to the Pentium Processor? Draw a neat block diagram. [8]
4b  What is “cold” and “warm” Reset of the Pentium processor? Explain the operations performed immediately after the Reset signal is activated. [8]

---xxx---

5a  Differentiate between segmentation in Protected mode and segmentation in Real Mode. [6]
5b  What is TLB? How is it useful in Paging? [6]
5c  What are page directories and page tables? What are their sizes? Where are they located? [4]

5a  What are the selectors in Pentium? Explain their use in Segmentation. [6]
5b  State privilege level rule for I/O access, if Pentium is acting in Protected mode. [2]
5c  Under what circumstances does an application program need to change privilege levels? What are different techniques used for changing privilege levels? [8]

5a  Differentiate between system descriptors and non-system descriptors. How does Pentium recognize these descriptors? Give format of non-system descriptor. Also name four system descriptors along with their use. [12]
5b  Give details of four instructions which are related to segmentation of Pentium in protected mode. [4]

5a  Explain the process of linear to physical address translation for 4 MB pages. Also name and draw formats of descriptors and registers used for this translation. [8]
5b  How is protection provided at segmentation level in Pentium? [4]
5c  Differentiate between IVT and IDT. [4]

---xxx---

6a  With the help of descriptor format, explain the different fields used to provide protection to a segment. [8]
6b  Explain linear to physical address translation of Pentium. [8]

6a  What is LDT descriptor? Describe its use in protected mode of Pentium. [6]
6b  Explain 2 instructions related to Paging unit of Pentium. [4]
6c  What can be sizes of pages in Pentium? How does Pentium protect those pages? [6]

6a  Explain the process of translation of logical address to linear address, with the help of a neat diagram and related registers and structures. [8]
6b  What is TLB? Where is it situated? How is it organized? What is its use? [8]

6a  What are the system descriptors in Pentium? Name them and describe their use in Protected Mode. [5]
6b  What is CPL, DPL, RPL? [3]
6c  What is GDT? How is it useful in logical to linear address translation in Pentium? Explain with help of a neat diagram.

---xxx---

7a  Compare Real mode with Protected mode on following features: i) Privilege level ii) Interrupt handling iii) IOPL iv) Instructions allowed to be used [10]
7b  Name and describe descriptors used in multitasking. [8]

7a  Name instructions used for performing a task switch. Explain the significance of IRET instruction in task switching. [6]
7b  How does Pentium enter and leave Virtual Mode? [6]
7c  What are error codes? What is their use? Elaborate. [6]

7a  What is multitasking? How does Pentium support this feature? What registers, descriptors are involved? [8]
7b  How is virtual mode different from protected mode? [6]
7c  What information is pushed on the stack when ISR is at different privilege level and requires error code? [4]

7a  What is Task Register? Which mode of Pentium makes use of this register and how? Explain in detail. [8]
7b  What privilege level checks are performed when any I/O access is done by Pentium Processor in protected mode? If this check fails, is there any alternative by which the processor can continue the I/O access? [6]
7c  List features of virtual mode in Pentium Processor. How does Pentium enter into Virtual Mode? [6]

---xxx---

8a  What is TSS? Where is it located? How is it useful in multitasking? [8]
8b  What do you mean by Trap, Fault, Abort? Explain. [6]
8c  Explain use of LTR instruction. [4]

8a  Differentiate between TSS Descriptor and Task Gate Descriptor. Where are they found? How are they useful to Pentium processor? Explain. [8]
8b  Differentiate between Real Mode and Virtual Mode. [4]
8c  What are different types of exceptions? Explain by giving one example of each. [6]

8a  How are interrupts handled in Virtual Mode? Explain. [6]
8b  What is “Back link”? Where is it situated? What is its use? [5]
8c  What types of descriptors are found in IDT? Differentiate between them. Also give their applications. [7]

8a  What is TSS? What does it contain? How is it useful in multitasking in Pentium Processor? Explain. [8]
8b  What are sources of interrupts in Pentium Processor? How are interrupts handled in Real Mode? Will it be different if Pentium Processor is in Protected mode? Explain with help of neat diagrams. [12]

---xxx---

9a  Name SFRs found in 8051 microcontroller. [4]
9b  Explain the function of the following pins of 8051 uC: i) EA ii) XTAL1 iii) PSEN iv) T0 [8]
9c  Interface 4KB RAM to 8051 microcontroller. [4]

9a  Name bit addressable SFRs in 8051 uC. [6]
9b  Write an instruction to clear the bit which has address 001h. [2]
9c  What is count value to be loaded in TH1 and TL1 if delay of 1 msec is to be generated? Assume oscillator frequency of 12 MHz and TIMER 1 in MODE 1. [6]
9d  What is idle mode of 8051 uC? [2]

9a  Write a program in 8051 Assembly Language to XOR two bits. Assume those bits are in bit addressable area. [4]
9b  How does 8051 uC differentiate external program memory and external data memory read and write operations? [4]
9c  Explain different modes of serial port in 8051. Also give details of baud rate in every mode. [8]

9a  Write 8051 instructions to: i) Read from external program memory at address 0200h. ii) Write to external program memory at address 6000h. [4]
9b  What are the different addressing modes of 8051 uC? Give two examples of each. [6]
9c  Write 8051 ALP to send one byte of data serially with baud rate of 1200. Oscillator freq 12 MHz. Assume suitable mode for serial port. [6]

---xxx---

10a  Explain on-chip features of 8051 uC. [4]
10b  Where are RS0 and RS1 bits found in 8051? What is their use? [4]
10c  Explain different modes supported by serial port of 8051. How do you program baud rates for programmable modes of serial port? [8]

10a  What is maximum size of program memory and data memory that can be connected to 8051 uC? Which pins / signals are used for this purpose? [6]
10b  What are the different sources of interrupts and vectors allocated in 8051? Also name SFRs involved in interrupt structure of 8051. [6]
10c  Name 4 bit addressable instructions of 8051. [4]

10a  What is use of T0 pin of 8051? When is it used? Explain. [4]
10b  What are the different sources of interrupts of 8051? How are interrupts handled in 8051? [8]
10c  Write 8051 program to generate continuous interrupt every 100 usec. Assume clock freq 12 MHz. [4]

10a  Draw the memory map of 8051 uC. What is bit addressable area? How many bits are addressable? What are the uses of SFRs? [6]
10b  It is said number of Timers and Counters in 8051 is three. When is this possible? Explain. [6]
10c  What are power saving modes of 8051? [4]

---xxx---

11a  Name and explain the different CPU registers of PIC 16C61/71. [10]
11b  Explain indirect addressing mode of PIC 16C61/71. [4]
11c  What is CRLF in PIC 16C61/71? [2]

11a  Explain features of PIC 16C61/71. [6]
11b  Explain following instructions: i) clrf TRISA ii) btfsc STATUS, 5 iii) movlw 0x25 [6]
11c  What is prescaler in PIC 16C61/71? [4]

11a  Explain features of PIC 16C61/71. [6]
11b  What is CLRWDT? Explain its significance. [6]
11c  Explain different addressing modes of PIC 16C61/71. [4]

11a  With help of neat diagram explain memory organization of PIC 16C61/71. Give details of program memory and data memory. [8]
11b  Explain following instructions: i) bcf f, b ii) comf f, F(W) [4]
11c  Explain Watchdog timer found in PIC 16C61/71. [4]

---xxx---

12a  Draw and explain memory map of PIC 16C61/71. [8]
12b  How many I/O ports are found in PIC 16C61/71? How do you configure them as input or output? [6]
12c  What is CLRWDT? Give details. [2]

12a  What is architectural difference between PIC 16C61 and 16C71? [2]
12b  Explain different addressing modes of PIC 16C61/71 with one example each. [6]
12c  What are the sources of interrupts in PIC 16C61/71? How are they recognized? How are they serviced? Give details. [8]

12a  Give details of different CPU registers of 16C61 PIC. [8]
12b  Explain the following: i) XOR W K ii) incf F(W) [4]
12c  Differentiate between PIC 16C61 and PIC 16F8XX uCs. [4]

12a  How many interrupts do PIC 16C61 and PIC 16C71 support? With help of neat diagram explain interrupt structure of PIC 16C61. [6]
12b  How many Timers does PIC 16C61/7X contain? Explain Timer0 operation in detail. [6]
12c  Write PIC 16C61 ALP to configure RA0, RA1, RA2 as output and RA3 as input port lines. [6]

WISH YOU ALL THE BEST
