ARM1176JZF-S (iPhone 3G)
Jeff Brantley, Chris Gregg, Bill Stitson
Processor Overview
Features
• Designed for consumer and wireless products
• RISC processor with Harvard architecture
• Vector Floating Point (VFP) coprocessor
• Branch prediction
• “TrustZone” security built into the CPU
• Instruction and data caches
• 8-stage pipeline
• 32-bit and 16-bit (“Thumb”) instruction sets, and “Jazelle” technology for Java execution
Memory Hierarchy
Harvard architecture:
• Separate data and instruction caches
• Allows simultaneous access to instructions and data
• 64-bit datapaths
L1 Cache
• Up to 64KB in size
• 4-way set associative
• Virtually indexed, physically tagged
• 8 words per line, critical word first on a miss
• Round-robin or pseudo-random replacement policy
[1]
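To make the L1 cache parameters concrete, here is a minimal Python sketch of how an address splits into tag, set index, and line offset for a 4-way cache with 32-byte lines (8 words of 4 bytes), plus a set-associative lookup with round-robin replacement. The function and class names are hypothetical. Two simplifications are assumptions, not details from [1]: the sketch uses a single address for both indexing and tagging (the real cache indexes with the virtual address and tags with the physical address), and it keeps a separate round-robin pointer for each set.

```python
LINE_BYTES = 8 * 4   # 8 words per line, 4-byte words -> 32 bytes
WAYS = 4             # 4-way set associative


def cache_geometry(size_bytes):
    """Return (number of sets, offset bits, index bits) for a given cache size."""
    sets = size_bytes // (WAYS * LINE_BYTES)
    offset_bits = LINE_BYTES.bit_length() - 1   # log2(32) = 5
    index_bits = sets.bit_length() - 1          # log2(sets)
    return sets, offset_bits, index_bits


def split_address(addr, size_bytes):
    """Split an address into (tag, set index, byte offset within the line)."""
    sets, offset_bits, index_bits = cache_geometry(size_bytes)
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


class RoundRobinCache:
    """Tag-only model of a 4-way cache; assumes one round-robin pointer per set."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        sets, _, _ = cache_geometry(size_bytes)
        self.tags = [[None] * WAYS for _ in range(sets)]
        self.victim = [0] * sets   # next way to replace in each set

    def access(self, addr):
        """Return True on a hit; on a miss, fill the round-robin victim way."""
        tag, index, _ = split_address(addr, self.size)
        ways = self.tags[index]
        if tag in ways:
            return True
        ways[self.victim[index]] = tag
        self.victim[index] = (self.victim[index] + 1) % WAYS
        return False
```

For a 64KB cache this gives 512 sets, a 5-bit offset, and a 9-bit index. Addresses 16KB apart land in the same set, so a fifth such address evicts the first one to arrive, whether or not it was recently used. Round robin is cheap precisely because it ignores recency.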
Level 2 Interface
“high-bandwidth interface to second level caches, on-chip RAM, peripherals, and interfaces to external memory” [1]
Level 2 interconnect: 64-bit wide interfaces for
• Instruction fetch
• Data read/write
• DMA
The peripheral interface is 32 bits wide.
Translation Lookaside Buffer (TLB)
MicroTLBs
• One each for instructions and data
• 10 entries, fully associative
• Round-robin or random replacement
Single main TLB
• Contains a fully associative region of 8 lockable elements
• Misses are handled by walking a two-level page table
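The translation path can be sketched in Python as a 10-entry, fully associative microTLB with round-robin replacement that falls back to a two-level page-table walk on a miss. This is a simplified model under stated assumptions. The page tables are nested dictionaries, with the first level indexed by VA[31:20] and the second by VA[19:12], which mirrors ARM's 4KB small-page layout. Sections, permissions, and the main TLB level that the real core consults before walking are all omitted. `walk` and `MicroTLB` are hypothetical names.

```python
PAGE_SHIFT = 12   # 4KB small pages (assumed)
L1_SHIFT = 20     # first-level index: VA[31:20]
L2_MASK = 0xFF    # second-level index: VA[19:12]


def walk(page_tables, va):
    """Two-level walk over {l1_index: {l2_index: physical_page_number}}."""
    l2_table = page_tables.get(va >> L1_SHIFT)
    if l2_table is None:
        raise LookupError("translation fault (level 1)")
    ppn = l2_table.get((va >> PAGE_SHIFT) & L2_MASK)
    if ppn is None:
        raise LookupError("translation fault (level 2)")
    return ppn


class MicroTLB:
    """Fully associative TLB with round-robin replacement."""

    def __init__(self, entries=10):
        self.entries = [None] * entries   # (virtual page, physical page) pairs
        self.next_victim = 0
        self.misses = 0

    def translate(self, va, page_tables):
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        # Fully associative: the requested page may be in any entry.
        for entry in self.entries:
            if entry is not None and entry[0] == vpn:
                return (entry[1] << PAGE_SHIFT) | offset
        self.misses += 1
        ppn = walk(page_tables, va)   # main TLB lookup skipped in this sketch
        self.entries[self.next_victim] = (vpn, ppn)
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        return (ppn << PAGE_SHIFT) | offset
```

A second access to the same 4KB page hits in the microTLB without touching the page tables. An address whose first-level entry is missing raises a translation fault instead.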
Coprocessor Interface
The core processor can interface to on-chip coprocessors.
• The instruction set supports up to 16 coprocessors; two of these (CP10 and CP11) are used by the VFP
• Coprocessors are intended to run in step with the core and share data
• A two-cycle delay gives “generous timing margins” [1]
• Loose synchronization via token queues
• The core may flush the coprocessor pipeline or cancel instructions
• Only one coprocessor is “active” at a time. This is less restrictive than it sounds, since calls to driver software are core instructions, and it allows much of the interface hardware to be shared (reducing cost)
Coprocessor Synchronization
[1]
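Loose synchronization through a token queue can be sketched in Python as a small FIFO between core and coprocessor. The core pushes instruction tokens, the coprocessor consumes them at its own pace, a full queue stalls the sender, and the core can flush outstanding tokens (for example, after a mispredicted branch). `TokenQueue` is a hypothetical name, and the depth of 3 is an illustrative assumption, not a figure taken from [1].

```python
from collections import deque


class TokenQueue:
    """Hypothetical fixed-depth FIFO carrying instruction tokens from core to coprocessor."""

    def __init__(self, depth=3):
        self.depth = depth
        self.q = deque()

    def push(self, token):
        """Core side: enqueue a token, or return False if the queue is full (sender stalls)."""
        if len(self.q) >= self.depth:
            return False
        self.q.append(token)
        return True

    def pop(self):
        """Coprocessor side: take the oldest token, or None if nothing is waiting."""
        return self.q.popleft() if self.q else None

    def flush(self):
        """Core side: cancel all outstanding tokens and return what was dropped."""
        dropped = list(self.q)
        self.q.clear()
        return dropped
```

The queue lets the two pipelines drift apart by a few cycles without losing track of which instruction is which. That is the sense in which the synchronization is "loose" rather than lock-step.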
VFP Coprocessor
• Uses a dedicated interface to the processor
• Implements the IEEE 754 Standard for Binary Floating-Point Arithmetic
• 64-bit load and store buses
• 3 independent, parallel pipelines:
  • Load and store
  • Multiply and accumulate
  • Divide and square root
• Short vector instructions: up to 8 single-precision or 4 double-precision operands per instruction
• No branch instructions
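The IEEE 754 binary formats that the VFP implements can be decoded in a few lines of Python using the standard `struct` module. This is a host-side illustration of the formats only: single precision is 1 sign bit, 8 exponent bits (bias 127), and 23 fraction bits, and double precision is 1, 11 (bias 1023), and 52. The function names are hypothetical.

```python
import struct


def decode_single(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def decode_double(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
```

For example, -2.5 is -1.01 (binary) × 2^1, so as a single its sign is 1, its biased exponent is 127 + 1 = 128, and only the second fraction bit is set.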
Branch Prediction
Branch prediction (BP) can be turned on and off with a control register, which provides a high level of control.
The ARM processor performs two types of BP:
• Dynamic: performed in the Prefetch Unit
• Static: performed by the integer core (also used the first time a branch is seen, before historical data exists)
Branch folding
After prediction, the branch instruction is completely removed from the instruction stream presented to the pipeline.
Dynamic Branch Prediction
Dynamic branch prediction is the “first line” of branch prediction: if history exists, it will be used.
• The Branch Target Address Cache (BTAC) holds the virtual target addresses of previous branches
• 128-entry, direct-mapped cache
• Each entry includes a 2-bit branch prediction history
• A BTAC hit produces a branch prediction with zero-cycle delay
• Both resolved-taken and resolved-not-taken branches are stored in the BTAC, which improves performance
• Branch folding is done for almost all dynamically predicted branches
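A direct-mapped BTAC with 2-bit saturating counters can be modeled in Python as follows. This is a sketch, not the ARM1176 design. The index function (word-address bits modulo 128), storing the full branch address as the tag, and initializing a new entry to "weakly" taken or not taken are all assumptions made for illustration. `BTAC` is a hypothetical name.

```python
BTAC_ENTRIES = 128


class BTAC:
    """Direct-mapped branch target cache; each entry is [branch_pc, target, 2-bit counter]."""

    def __init__(self):
        self.entries = [None] * BTAC_ENTRIES

    def _index(self, pc):
        return (pc >> 2) % BTAC_ENTRIES   # hypothetical index: low word-address bits

    def predict(self, pc):
        """Return (hit, predicted_taken, target); a miss falls back to static prediction."""
        e = self.entries[self._index(pc)]
        if e is None or e[0] != pc:
            return False, None, None
        return True, e[2] >= 2, e[1]   # counter 2 or 3 means "predict taken"

    def update(self, pc, target, taken):
        """Record a resolved branch; both taken and not-taken outcomes are stored."""
        i = self._index(pc)
        e = self.entries[i]
        if e is None or e[0] != pc:
            # Allocate, or replace a conflicting branch that maps to the same entry.
            self.entries[i] = [pc, target, 2 if taken else 1]
            return
        e[1] = target
        e[2] = min(e[2] + 1, 3) if taken else max(e[2] - 1, 0)
```

Because the table is direct mapped, two branches whose addresses differ by a multiple of 128 words share an entry and evict each other. This is one source of the conflict misses that send a branch back to the static predictor.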
Static Branch Prediction
Static branch prediction is based only on the characteristics of the branch instruction (i.e., it does not use history).
• Simple: all forward conditional branches are predicted not taken, and all backward branches are predicted taken
• “Around 65% of all branches are preceded by enough non-branch cycles to be completely predicted.” [1]
The static branch predictor is used:
• on compulsory misses (i.e., the first time a branch is encountered)
• when there are capacity or conflict misses in the BTAC
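The static rule fits in a few lines of Python. In this sketch `static_predict` and `accuracy` are hypothetical names. Predicting unconditional branches as taken is an added assumption, but a safe one, because such branches always are.

```python
def static_predict(branch_pc, target, conditional=True):
    """Backward branches predicted taken; forward conditional branches predicted not taken."""
    if not conditional:
        return True             # unconditional branches are always taken
    return target < branch_pc   # backward target -> taken


def accuracy(predicted_taken, outcomes):
    """Fraction of actual outcomes that match a fixed prediction."""
    return sum(1 for o in outcomes if o == predicted_taken) / len(outcomes)
```

The rule works well for loops. The closing branch of a 10-iteration loop jumps backward, so it is predicted taken and is right 9 times out of 10, missing only on the final exit.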
TrustZone
The ARM1176 processors implement “TrustZone” security extensions that “provide a secure environment for software” [1].
[2]
• Hardware resources on the chip are partitioned so that each belongs to either the Normal World or the Secure World, creating a strong, hardware-enforced boundary between the two
• Two virtual processors are created from the one physical processor, removing the need for a separate processor dedicated to security
• TrustZone-aware hardware, such as DMA controllers, allows secure data transfer
• Examples of how TrustZone can be used range from secure PIN entry from the keyboard to Digital Rights Management of multimedia data
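The two-world partitioning can be illustrated with a toy Python access-check model. Each resource is tagged as secure or non-secure, code running in the Normal World cannot touch secure resources, and the Secure World can access both. `TrustZoneModel` and `switch_world` are hypothetical names, and `switch_world` stands in for the monitor-mode transition the real processor performs. This is not an ARM API. The sketch starts in the Normal World only to demonstrate the check; the real processor comes out of reset in the Secure World.

```python
class TrustZoneModel:
    """Toy model: every resource is tagged secure or non-secure."""

    def __init__(self, secure_regions, start_secure=False):
        self.secure_regions = set(secure_regions)   # resources owned by the Secure World
        self.secure = start_secure                  # current world of the virtual processor

    def switch_world(self):
        """Stand-in for the monitor-mode world switch."""
        self.secure = not self.secure

    def read(self, region):
        """Normal World may not access secure resources; the Secure World may access both."""
        if region in self.secure_regions and not self.secure:
            raise PermissionError(f"Normal World may not access {region}")
        return f"data from {region}"
```

In the secure PIN entry example, the keyboard buffer would be a secure resource. A compromised Normal World operating system could still run, but any read of that buffer would be refused.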
Integer Pipeline
• Up to 4 instructions fetched
• Static branch prediction in Fe2 (the second fetch stage)
• Decode/Issue can hold a branch alongside another instruction
• Non-blocking loads
• Hit Under Miss (HUM) buffer
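The benefit of non-blocking loads with a Hit Under Miss buffer can be estimated with a rough Python timing model. Each access is a cache hit or miss. A blocking cache stalls for the full miss latency. With hit-under-miss, later hits proceed while one miss is outstanding, and a second miss waits for the first to finish. The one-cycle issue cost, the single outstanding miss, and the absence of data dependencies on the missing load are simplifying assumptions, and the function names are hypothetical.

```python
def blocking_cycles(accesses, miss_latency):
    """Every miss stalls the pipeline for the full miss latency."""
    return sum(1 if hit else 1 + miss_latency for hit in accesses)


def hit_under_miss_cycles(accesses, miss_latency):
    """Hits continue under one outstanding miss; a second miss waits for the first."""
    cycle = 0
    miss_done_at = 0   # cycle at which the outstanding miss completes
    for hit in accesses:
        if hit:
            cycle += 1                         # hit serviced despite the pending miss
        else:
            cycle = max(cycle, miss_done_at)   # only one miss may be in flight
            miss_done_at = cycle + 1 + miss_latency
            cycle += 1
    return max(cycle, miss_done_at)
```

With a 10-cycle miss latency, one miss followed by three hits takes 14 cycles when blocking but only 11 under hit-under-miss, because the hits overlap the miss. Two back-to-back misses gain nothing (22 cycles either way), since the second must wait for the first.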
Jazelle
Java hardware acceleration
• Java bytecodes are translated to ARM instruction(s)
• Extra decode logic between the Fetch and Decode stages
• Extension of the ARM instruction set
• Limited (unpublished) subset of Java bytecodes
• Instructions to enter and exit Jazelle state
• Unsupported bytecodes are interpreted in software by the JVM
• Requires a Jazelle-aware JVM
• Relatively proprietary: free/open-source JVMs cannot take advantage
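The division of labor between hardware and the JVM can be sketched in Python as a tiny stack machine. Bytecodes in a "hardware" subset run on the fast path, and anything else is passed to a software handler, as a Jazelle-aware JVM would do. The real supported subset is unpublished, so `HW_BYTECODES` is a hypothetical stand-in. The opcode names are simplified (real Java uses `iconst_0` to `iconst_5`, `bipush`, and so on), and 32-bit overflow wrapping is ignored.

```python
HW_BYTECODES = {"iconst", "iadd", "imul"}   # hypothetical stand-in for the unpublished subset


def java_idiv(a, b):
    """Java-style integer division: truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def software_handler(op, arg, stack):
    """Slow path: the JVM interprets bytecodes the hardware does not support."""
    if op == "idiv":
        b, a = stack.pop(), stack.pop()
        stack.append(java_idiv(a, b))
    else:
        raise NotImplementedError(op)


def run(program):
    """Execute (opcode, operand) pairs; count hardware vs. software-handled bytecodes."""
    stack, hw, sw = [], 0, 0
    for op, arg in program:
        if op not in HW_BYTECODES:
            sw += 1
            software_handler(op, arg, stack)
            continue
        hw += 1
        if op == "iconst":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "iadd" else a * b)
    return stack, hw, sw
```

Running `[("iconst", 6), ("iconst", 3), ("idiv", None), ("iconst", 4), ("imul", None)]` leaves `[8]` on the stack, with four bytecodes on the fast path and one (`idiv`) handled in software. The speedup therefore depends on how much of a program falls inside the supported subset.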
Thumb
• 16-bit extension to the 32-bit ARM ISA
• The “most commonly used” ARM instructions in 16-bit form
• Enables higher code density
• “Reduces memory bandwidth and size requirements by up to 35%” [4]
• Like Jazelle, requires extra pre-decode translation hardware
• Thumb-compiled code optimized for space can be linked against performance-critical code compiled to 32-bit ARM (ARM/Thumb interworking)
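The "up to 35%" figure can be sanity-checked with simple arithmetic. Thumb instructions are half the size of ARM instructions, but a Thumb program usually needs more of them. The Python sketch below computes the size saving. Assuming, for illustration only, that Thumb needs about 30% more instructions than ARM reproduces the 35% figure: 1 − (2 × 130) / (4 × 100) = 0.35. The function name is hypothetical.

```python
def thumb_size_reduction(arm_instructions, thumb_instructions):
    """Fractional code-size saving of 16-bit Thumb vs. 32-bit ARM encodings."""
    arm_bytes = 4 * arm_instructions      # 32-bit ARM instructions
    thumb_bytes = 2 * thumb_instructions  # 16-bit Thumb instructions
    return 1 - thumb_bytes / arm_bytes
```

The theoretical maximum is 50%, reached only if every ARM instruction had a one-for-one 16-bit equivalent. Real savings are lower because some operations need several Thumb instructions.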
References
[1] “ARM1176JZF-S Processor Technical Reference Manual”, ARM Limited, ARM DDI 0301F, 2004-2007.
[2] “TrustZone Hardware Architecture”, ARM Limited, http://www.arm.com/products/security/trustzone/hardware.html, downloaded Dec. 4, 2009.
[3] “TrustZone System Design”, ARM Limited, http://www.arm.com/products/security/trustzone/systemdesign.html, downloaded Dec. 4, 2009.
[4] “ARM1176JZ(F)-S”, ARM Limited, http://www.arm.com/products/CPUs/ARM1176.html, downloaded Dec. 4, 2009.