Post on 19-Dec-2015
Processor/Interface
Ki-Hyung Kim
Division of Information and Computer Eng.
Ajou University
2Embedded Software
Embedded System Architecture
Coordination of many levels of abstraction
• Software: Application (Netscape), Operating System (Windows 98), Compiler, Assembler
• Instruction Set Architecture
• Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; transistors
Embedded System H/W Architecture
• Embedded system composition: embedded H/W
  • Processor/controller, memory, I/O interfaces, network interfaces
[Figure: Embedded System = Processor (active), made of Control ("brain") and Datapath ("brawn"); Memory (passive; where programs and data live when running); Devices (Input, Output)]
Embedded H/W Components
• Embedded processors/controllers
  • Most processors are used in embedded systems
  • Finding the product best suited to the application among the many kinds of microprocessors/controllers is a very difficult and important design task
[Figure: 8.5B parts per year: Embedded Computers 80%, Vehicles 12%, Robots 6%, Direct 2%. Source: DARPA/Intel (Tennenhouse)]
Embedded H/W Components (2)
• Memory
  • ROM/RAM: higher speed and larger capacity
  • Increasing use of FLASH memory
  • Usefulness of cache/virtual memory
• Bus
• Peripherals
  • Timer/Counter
  • Interrupt
  • DMA
  • Others
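Embedded Processors
• Mainly responsible for computation tasks
• Evolving toward SoC designs that integrate a variety of peripheral interfaces
• Not only processing speed, power consumption, and price but also the fit with the development environment is very important
• Consists of a control unit and a data-path
• Processor selection is important
  • ARM, PPC, MIPS, i386, Alpha, Sparc, m68k, SH, CRIS, IA64, PARISC, and others
  • MSP430, Atmega128 (AVR), i8051
• This course uses the ARM core as its running example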
Basic Processor Structure
• Consists of a control unit and a data-path
• Characteristics
  • General-purpose data-path
  • The control unit doesn't store the algorithm; the algorithm is "programmed" into the memory
[Figure: Processor = Control unit (Controller, Control/Status, PC, IR) + Data-path (ALU, Registers), connected to Memory and I/O]
Data-path Operation
• Load: read a memory location into a register
• ALU operation: arithmetic/logical operation
• Store: write a register into a memory location
[Figure: the same processor diagram; the value 10 is loaded from memory, incremented (+1) in the ALU, and the result 11 is written back to memory]
Control Unit
• Control unit: configures the data-path operations
  • The sequence of desired operations ("instructions") is stored in memory: the "program"
• Instruction cycle: broken into several sub-operations, each one clock cycle
  • Fetch: get the next instruction into IR
  • Decode: determine what the instruction means
  • Fetch operands: move data from memory to a data-path register
  • Execute: move data through the ALU
  • Store results: write data from a register to memory
[Figure: the same processor diagram with registers R0 and R1; memory holds the program at addresses 100 to 102 and data at addresses 500 and 501 (M[500] = 10)]
  100: load R0, M[500]
  101: inc R1, R0
  102: store M[501], R1
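The load/inc/store program from the control-unit slide can be traced with a small simulator. This is only a sketch under stated assumptions: the instruction encoding, the `Instr` and `Cpu` types, the `OP_HALT` opcode, and the function names are hypothetical and exist only to make the fetch/decode/execute/store steps visible; they do not correspond to any real instruction set.

```c
#include <stdint.h>

/* Hypothetical three-instruction machine, just enough for the slide's program. */
enum Opcode { OP_LOAD, OP_INC, OP_STORE, OP_HALT };

typedef struct {
    enum Opcode op;
    int dst;   /* destination register (LOAD, INC) or source register (STORE) */
    int src;   /* source register (INC only) */
    int addr;  /* memory address (LOAD, STORE) */
} Instr;

typedef struct {
    int pc;        /* program counter */
    Instr ir;      /* instruction register */
    int32_t r[2];  /* R0 and R1 */
} Cpu;

#define PROGRAM_BASE 100   /* the program occupies addresses 100.. as on the slide */
#define MEM_WORDS    512   /* data memory large enough for addresses 500 and 501 */

void run(Cpu *cpu, const Instr *program, int32_t *mem) {
    for (;;) {
        cpu->ir = program[cpu->pc - PROGRAM_BASE];  /* fetch into IR */
        cpu->pc++;
        switch (cpu->ir.op) {                       /* decode */
        case OP_LOAD:   /* fetch operand: memory to register */
            cpu->r[cpu->ir.dst] = mem[cpu->ir.addr];
            break;
        case OP_INC:    /* execute: move data through the ALU */
            cpu->r[cpu->ir.dst] = cpu->r[cpu->ir.src] + 1;
            break;
        case OP_STORE:  /* store result: register to memory */
            mem[cpu->ir.addr] = cpu->r[cpu->ir.dst];
            break;
        case OP_HALT:
            return;
        }
    }
}

/* Runs the slide's program with M[500] = 10 and returns M[501]. */
int32_t run_slide_program(void) {
    static const Instr program[] = {
        { OP_LOAD,  0, 0, 500 },  /* 100: load  R0, M[500] */
        { OP_INC,   1, 0, 0   },  /* 101: inc   R1, R0     */
        { OP_STORE, 1, 0, 501 },  /* 102: store M[501], R1 */
        { OP_HALT,  0, 0, 0   },  /* added so the loop terminates */
    };
    int32_t mem[MEM_WORDS] = { 0 };
    Cpu cpu = { .pc = PROGRAM_BASE };
    mem[500] = 10;
    run(&cpu, program, mem);
    return mem[501];
}
```

Note that the control logic in `run` never changes; only the contents of `program` do, which is the sense in which the algorithm is programmed into memory rather than stored in the control unit.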
CISC and RISC Architectures
• CISC: Complex Instruction Set Computer
  • Has a large number of instructions that perform related operations
  • CISC code is compact
  • Can take many clock cycles per instruction
  • Large silicon area, hence higher cost per die
• RISC: Reduced Instruction Set Computer
  • More modern architecture
  • One instruction executed per clock cycle, hence very fast
  • RISC CPU cores tend to be small
• Typical dynamic instruction usage
  • Data movement, control flow, arithmetic operations, comparisons, and logical operations account for 99%
  • This suggests the feasibility of RISC (?)
Bus
• A bus is:
  • A shared communication link
  • A single set of wires used to connect multiple subsystems
  • Data bus, address bus, control bus
  • Input/output bus (e.g. PCI): must be standardized
  • System bus (local bus): aims at high speed; depends on the processor and chipset
• A bus is also a fundamental tool for composing large, complex systems
  • A systematic means of abstraction
[Figure: Processor (Control, Datapath), Memory, Input, and Output connected by a bus]
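Von Neumann Architecture
[Figure: a CPU and a single memory connected by address and data lines; the CPU puts address 200 on the address lines, memory location 200 holds ADD r5,r1,r3, and that instruction is fetched into the IR]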
Harvard Architecture
• Harvard can't use self-modifying code.
  • http://www.arm.com/support/faqip/3738.html
• Harvard allows two simultaneous memory fetches.
• Most DSPs use Harvard architecture for streaming data:
  • greater memory bandwidth;
  • more predictable bandwidth.
[Figure: CPU with PC, connected to a data memory and a program memory, each through its own address and data lines]
Pipeline in RISC
[Figure: nine instructions, each passing through stages T1 T2 T3 T4 T5, drawn as overlapping rows]
• T1: instruction fetch
• T2: decode
• T3: execution (load from memory)
• T4: write to memory (or register)
MIPS
MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. By the late 1990s it was estimated that one in three RISC chips produced were MIPS-based designs.
Pipeline in CISC
[Figure: nine instructions with stages T1 to T5, drawn as overlapping rows; some instructions need the extra stages T6 and T7]
• The number and length of pipeline stages vary
• Building a pipeline is difficult
• It is hard to minimize the length of each pipeline step
Simplified Harvard Architecture of ARM
• TCM: Tightly Coupled Memory (used for real-time programs)
Dataflow Architecture
• There is no program counter
• There is no mechanism for designating the next instruction to execute
• Any instruction is executed as soon as the data it needs is ready (in parallel)
[Figure: several ALUs fed from a memory that holds the instructions to be executed, e.g. Add M1 + M2 → M3]
Dataflow Architecture (2)
• Dataflow architecture is a computer architecture that directly contrasts with the traditional von Neumann architecture, or control flow architecture.
• Dataflow architectures do not have a program counter, or (at least conceptually) the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions.
Memory: ROM
• Read-Only Memory (ROM)
  • Non-volatile storage
  • ROM, PROM, EPROM, EEPROM, OTP ROM (one-time programmable)
• Types: Mask ROM, Fuse ROM, PROM (programmable), EPROM, EEPROM
[Figure: cell structures at a word line/bit line crossing for Mask ROM, Fuse ROM, and EPROM/EEPROM/Flash memory; the EPROM/EEPROM/Flash cell has a floating gate]
Flash Memory
• NOR, NAND
• XIP
  • In computer science, execute in place (XIP) is a method of executing programs directly from long-term storage rather than copying them into RAM. It is an extension of using shared memory to reduce the total amount of memory required.
Memory: RAM
• Random Access Memory
  • Retains data only while power is applied
  • Two main types: Static RAM (SRAM) and Dynamic RAM (DRAM)
  • The two differ in how a bit is stored
• Static RAM
  • Fast (active drive)
  • Less dense (4-6 transistors/bit)
  • Stable (holds value as long as power is applied)
• Dynamic RAM
  • Slower
  • High density (1 transistor/bit)
  • Unstable (needs refresh)
• Other types: SDRAM, Video RAM, FeRAM
Inverter with CMOS
SRAM
DRAM
SDRAM
SDRAM refers to synchronous dynamic random access memory, a term that is used to describe dynamic random access memory that has a synchronous interface. Traditionally, dynamic random access memory (DRAM) has an asynchronous interface, which means that it responds as quickly as possible to changes in control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus.
[Figure: read timing; the RD value is output]
• Latency (delay time)
• Performance: throughput
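Basic Structure of RAM
[Figure: an array of bit cells at the crossings of word lines and bit lines, with sense amplifiers on the bit lines. Labels: Word Lines, Bit Lines, Bit Cell, Sense Amplifier, Address (High, Low), Data]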
Typical 16 Mb DRAM (4M x 4)
Static RAM (SRAM) Structure and Access
• Read: drive word line, sense value on bit lines
• Write: drive word line, drive new value (strongly) on bit lines
[Figure: SRAM cell on a word line, between the bit lines Bit and !Bit]
Accessing a Static RAM
[Figure: timing of CE, Addr, and Data for a read and a write]
Note: the CE signal is often active-low, as opposed to how it is shown here. SRAMs also generally have a write enable signal.
Dynamic RAM (DRAM)
• Read: drive word line, sense value on bit line (destroys saved value)
• Write: drive word line, drive new value on bit line
[Figure: DRAM cell at a word line/bit line crossing]
Dynamic RAM Timing (Read)
[Figure: timing of RAS, CAS, and Addr]
Control signals are often active-low.
Other RAM Types
• Video RAM
  • Optimized for high-speed regular accesses to the frame buffer
• SDRAM
  • Uses clocked organization to pipeline for speed
• Flash
  • Non-volatile (holds data without power)
• FeRAM
  • Uses a ferroelectric layer to store data (it is MRAM, not FeRAM, that uses magnetic storage similar to a hard disk)
  • Holds value when power is off
  • Capacity and access time similar to RAM (hard disks take ms)
• Nanotech RAMs
  • Molecular electronics, carbon nanotubes
  • Nowhere near ready for prime time
Video DRAM
• VRAM is a dual-ported variant of DRAM which was once commonly used to store the frame buffer in some graphics adaptors.
• Dual-ported RAM (DPRAM) is a type of random access memory that allows multiple reads or writes to occur at the same time, or nearly the same time, unlike single-ported RAM, which only allows one access at a time.
• Video RAM, or VRAM, is a common form of dual-ported dynamic RAM mostly used for video memory, allowing the CPU to draw the image at the same time the video hardware is reading it out to the screen.
[Figure: the VRAM frame buffer in the graphics card, accessed by the CPU on one side and by the DVI output on the other]
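Flash Memory
1. NOR type
• Cells are arranged in parallel, so random access is possible and programming can be done in byte units.
• Reads are faster than NAND, but write/erase is slow.
• Each cell needs a bit-line contact, so it takes more area per cell than NAND and is more expensive.
• Because reads are fast, it is used for code storage (mainly for booting the device's OS).
2. NAND type
• Cells are arranged in series, so reads and writes are done in page/block units.
• Random access is not possible, so reads are slower than NOR, but write/erase is fast.
• Its high integration density allows large capacities, so it is used for data storage (digital cameras, MP3 players, etc.).
Summary: NOR is hard to scale to large capacities, and NAND has slow reads.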
Cache Systems
[Figure: CPU ↔ Cache (SRAM) by data object transfer; Cache ↔ Main Memory (DRAM) by block transfer. Two systems are compared, each with a 400 MHz CPU, a 66 MHz bus, and 10 MHz main memory: one without a cache and one with a cache next to the CPU]
Why Memory Hierarchy?
Cache Mechanism (1)
Cache Address Mapping
Cache Block
How many lines does a 512-byte cache have?
• With 1 block = 4 bytes: 128 blocks (512 / 4)
• With 1 block = 16 bytes (4 words): 32 blocks (512 / 16)
[Figure: the cache drawn as rows of 4-byte words, one word per row for 4-byte blocks and four words per row for 16-byte blocks]
If the cache has 8 entries: 1-way cache (direct-mapped cache)
• Which blocks can be placed in cache line 0?
  • With 8 cache lines: blocks 0, 8, 16, 24, …
  • With 256 cache lines: blocks 0, 256, 512, …
[Figure: blocks with Tag=0, Tag=1, … competing for the same cache line]
If the cache has 8 entries: 2-way set-associative cache
[Figure: Set 0 to Set 3, each with way 1 and way 2]
• Tag: tells whether the indexed block is present in the set
• Set: set ID
If the cache has 8 entries: 4-way cache
If the cache has 8 entries: n-way set-associative cache (n=8, s=1), i.e. a fully associative cache
Direct Mapped
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Pros & Cons
• Simple
• Inexpensive
• Fixed location for a given block
  • If a program repeatedly accesses 2 blocks that map to the same line, the cache miss rate is very high
Direct Mapping Cache Line Table
  Cache line    Main memory blocks held
  0             0, m, 2m, 3m, …, 2^s - m
  1             1, m+1, 2m+1, …, 2^s - m + 1
  …             …
  m-1           m-1, 2m-1, 3m-1, …, 2^s - 1
2 Way Set Associative
Set Associative Cache
[Figure: ways A0 and A1 across sets S0, S1, S2]
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
  • e.g. block B can be in any line of set i
• e.g. 2 lines per set
  • 2-way associative mapping
  • A given block can be in one of 2 lines in only one set
Set Associative Mapping Example
• 13-bit set number
• Block number in main memory is taken modulo 2^13
• With a 2-bit word field below the set field, addresses that are 0x8000 apart share a set: 000000, 008000, 010000, 018000 … map to the same set
Two Way Set Associative Cache Organization
Set Associative Mapping Address Structure
  Tag: 9 bits | Set: 13 bits | Word: 2 bits
• Use the set field to determine which cache set to look in
• Compare the tag field to see if we have a hit
• e.g.
  Address      Tag    Data        Set number
  1FF 7FFC     1FF    12345678    1FFF
  001 7FFC     001    11223344    1FFF
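The 9/13/2-bit address structure can be checked with a few shifts and masks. The following is a minimal sketch, assuming a 24-bit address laid out as tag, set, word from the most significant bit down; the `CacheAddr` type and the `split_address` function are hypothetical names introduced for illustration.

```c
#include <stdint.h>

#define WORD_BITS 2u    /* byte within the 4-byte block */
#define SET_BITS  13u   /* selects one of 2^13 sets */
#define TAG_BITS  9u    /* compared against the tags stored in the set */

typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t word;
} CacheAddr;

/* Splits a 24-bit address into its tag, set, and word fields. */
CacheAddr split_address(uint32_t addr) {
    CacheAddr a;
    a.word = addr & ((1u << WORD_BITS) - 1u);
    a.set  = (addr >> WORD_BITS) & ((1u << SET_BITS) - 1u);
    a.tag  = (addr >> (WORD_BITS + SET_BITS)) & ((1u << TAG_BITS) - 1u);
    return a;
}
```

The slide writes each address as the tag followed by the remaining 15 bits, so "1FF 7FFC" is the 24-bit address 0xFFFFFC and "001 7FFC" is 0x00FFFC. Both select set 0x1FFF and differ only in the tag, which is why they can share a set in a 2-way cache.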
Two Way Set Associative Mapping Example
Fully Associative Cache
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies a block of memory
• Every line's tag is examined for a match
• Cache searching gets expensive
Fully Associative Cache Organization
Associative Mapping Example
Associative Mapping Address Structure
  Tag: 22 bits | Word: 2 bits
• A 22-bit tag is stored with each 32-bit block of data
• Compare the tag field with the tag entry in the cache to check for a hit
• The least significant 2 bits of the address identify which of the 4 bytes is required from the 32-bit data block
• e.g.
  Address    Tag       Data        Cache line
  FFFFFC     FFFFFC    24682468    3FFF
Replacement Algorithms (1): Direct Mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms (2): Associative & Set Associative
• Hardware-implemented algorithm (for speed)
• Least recently used (LRU)
  • e.g. in a 2-way set-associative cache: which of the 2 blocks is LRU?
• First in first out (FIFO)
  • Replace the block that has been in the cache longest
• Least frequently used (LFU)
  • Replace the block which has had the fewest hits
• Random
• LRU: needs a timestamp of the last access
• LFU: needs an access frequency count
• FIFO: needs a timestamp of when the block was loaded
[Figure: Set 0 to Set 3, way 1 and way 2, each entry annotated with a timestamp and a frequency]
Locality of Reference
Example: access pattern 0, 1, 0, 0, 1, 0, 0, 1 on blocks 0 and 1
• Frequency: block 0 = 5, block 1 = 3
• Access timestamp: block 1 is the most recent
• Loading timestamp: block 0 is the oldest
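The bookkeeping behind LRU, LFU, and FIFO can be replayed on the access pattern 0, 1, 0, 0, 1, 0, 0, 1. This is a sketch under assumptions: one set with two ways that already hold blocks 0 and 1, time measured as the 1-based position in the pattern, and hypothetical names (`Way`, `replay`, `pick_victim`, `slide_example_victim`) chosen for illustration. Real caches implement this in hardware, usually with cheaper approximations.

```c
#include <stddef.h>

typedef enum { POLICY_LRU, POLICY_LFU, POLICY_FIFO } Policy;

typedef struct {
    int block;             /* block number held by this way */
    unsigned loaded_at;    /* time of first access (FIFO) */
    unsigned last_access;  /* time of most recent access (LRU) */
    unsigned accesses;     /* access count (LFU) */
} Way;

/* Updates the per-way bookkeeping for every access in the pattern. */
void replay(Way *ways, size_t n_ways, const int *pattern, size_t n) {
    for (size_t t = 0; t < n; t++) {
        for (size_t w = 0; w < n_ways; w++) {
            if (ways[w].block != pattern[t]) continue;
            if (ways[w].accesses == 0) ways[w].loaded_at = (unsigned)t + 1;
            ways[w].last_access = (unsigned)t + 1;
            ways[w].accesses++;
        }
    }
}

/* Returns the index of the way that the given policy would replace. */
size_t pick_victim(const Way *ways, size_t n_ways, Policy policy) {
    size_t victim = 0;
    for (size_t w = 1; w < n_ways; w++) {
        unsigned cand, best;
        switch (policy) {
        case POLICY_LRU:
            cand = ways[w].last_access; best = ways[victim].last_access; break;
        case POLICY_LFU:
            cand = ways[w].accesses;    best = ways[victim].accesses;    break;
        default: /* POLICY_FIFO */
            cand = ways[w].loaded_at;   best = ways[victim].loaded_at;   break;
        }
        if (cand < best) victim = w;
    }
    return victim;
}

/* Replays the slide's pattern and returns the block number chosen as victim. */
int slide_example_victim(Policy policy) {
    static const int pattern[] = { 0, 1, 0, 0, 1, 0, 0, 1 };
    Way ways[2] = { { .block = 0 }, { .block = 1 } };
    replay(ways, 2, pattern, sizeof pattern / sizeof pattern[0]);
    return ways[pick_victim(ways, 2, policy)].block;
}
```

With this pattern, LRU and FIFO both evict block 0 (it was touched least recently and loaded first), while LFU evicts block 1 (3 accesses against 5), so the policies can disagree on the same history.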
Write Policy
• Must not overwrite a cache block unless main memory is up to date
• Multiple CPUs may have individual caches
  • Cache coherency: the data in the several caches and the data in main memory must agree
• I/O may address main memory directly
Write Through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep the local (to CPU) cache up to date
• Lots of traffic
• Slows down writes
• Remember bogus write-through caches!
Write Back
• Updates initially made in cache only
• Update bit for the cache slot is set when an update occurs
• If a block is to be replaced, write to main memory only if the update bit is set
• Other caches get out of sync
• I/O must access main memory through the cache
• N.B. 15% of memory references are writes
The Memory System
• Embedded systems and applications
  • The memory system requirements vary considerably
• Simple blocks
• Multiple types of memory
• Caches
• Write buffers
• Virtual memory
Memory Management Units
• The memory management unit (MMU) translates addresses and performs protection checks
[Figure: CPU → logical address → memory management unit → physical address → main memory]
Memory Management Tasks
• Allows programs to move in physical memory during execution
• Allows virtual memory:
  • memory images kept in secondary storage;
  • images returned to main memory on demand during execution
• Page fault: request for a location not resident in memory
Address Translation
• Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses
• Two basic schemes:
  • segmented
  • paged
• Segmentation and paging can be combined (x86)
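Memory Fragmentation
[Figure: memory holding processes P1, P2, and P3; after processes leave and P5 is loaded, the free memory is split into separate holes (fragmentation)]
Compaction
[Figure: P2, P3, and P5 are moved together so that the free space becomes one contiguous region]
Segments and Pages
[Figure: memory divided into segment 1 and segment 2 (variable size) and into page 1 and page 2 (fixed size)]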
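Code and Data Segment (Section)
#include <stdio.h>
#include <stdlib.h>
int a, b;                 /* BSS: uninitialized globals */
int c = 3;                /* Data: initialized global */
static int k = 2;         /* Data: initialized static */
int add(int y, int z);    /* code goes to Text */
int main(void) {
    int d = 5;            /* Stack */
    int e = c;            /* Stack */
    e = add(3, 5);
    d = add(3, 5);
    int *p = malloc(sizeof(int));   /* p on the Stack, block on the Heap */
    free(p);
    return 0;
}
int add(int y, int z) {
    static int f = 7;     /* Data: static local */
    return y + z + f;
}
Sections: Text (code), Data, BSS (Block Started by Symbol, or Block Static Storage), Stack, Heap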
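Heap and Stack
[Figure: heap and stack regions of memory, with SP (stack pointer) marking the top of the stack]
Code and Data Segment (Section)
[Figure: the executable file (COFF or ELF header, Text (code), Data, symbol table) is loaded by the loader (ld.so) into a memory image made of Text (code), Data, BSS (Block Started by Symbol, or Block Static Storage), Heap, and Stack]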
Storage Class and Scope
• Static vs. volatile
• Static vs. dynamic
• Static vs. external (extern in C and C++)
• Static vs. instance (class variables in C++ and Java)
• Local vs. global
• Fixed-size data vs. variable-size data
  • Variable-size (dynamic) memory carries a risk of fragmentation
Segment Address Translation
[Figure: the segment base address and the logical address are added (+) to form the physical address; a range check against the segment lower bound and segment upper bound signals a range error when the address falls outside the segment]
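The adder and range check in the segment translation figure amount to only a few lines. This is a sketch assuming inclusive lower and upper bounds expressed as physical addresses; the `Segment` type and the `segment_translate` function are hypothetical, and overflow of the addition is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;    /* segment base address */
    uint32_t lower;   /* lowest valid physical address (inclusive) */
    uint32_t upper;   /* highest valid physical address (inclusive) */
} Segment;

/* Adds the logical address to the base and performs the range check.
 * Returns false (range error) without touching *physical when out of range. */
bool segment_translate(const Segment *seg, uint32_t logical, uint32_t *physical) {
    uint32_t addr = seg->base + logical;
    if (addr < seg->lower || addr > seg->upper) {
        return false;
    }
    *physical = addr;
    return true;
}
```

For a 4 KB segment based at 0x1000, logical address 0x10 translates to 0x1010, while logical address 0x2000 fails the range check.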
Page Address Translation
[Figure: the logical address is split into a page number and an offset; the page number selects the base of page i, which is concatenated with the unchanged offset to form the physical address]
Page Table Organizations
[Figure: a flat table of page descriptors versus a tree of page descriptors]
Caching Address Translations
• Large translation tables require main memory access
• TLB: cache for address translation
  • Typically small
ARM Memory Management Unit
ARM Memory Management
• System control coprocessor (CP15)
  • Memory
  • Write buffers
  • Caches
• Registers
  • Up to 16 primary registers
  • More than 16 physical registers in CP15
• Register access instructions
  • MCR (ARM to CP15)
  • MRC (CP15 to ARM)
Cached MMU Memory System
ARM Memory Management
• MMU can be enabled and disabled
• Memory region types:
  • Section: 1 MB block
  • Large page: 64 KB
  • Small page: 4 KB
  • Tiny page: 1 KB
• Two-level translation scheme (why?)
  • First-level table
  • Second-level table
  • A single-level page table for 4 KB pages would need 2^20 × 4 bytes = 4 MB
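ARM Address Translation
[Figure: the virtual address is split into 1st index, 2nd index, and offset. The translation table base register locates the 1st-level table; the 1st index selects a descriptor that points to a 2nd-level table; the 2nd index selects a descriptor whose base is concatenated with the offset to form the physical address]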
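The two-level walk can be sketched for the 4 KB small-page case: a 12-bit first index (4K first-level entries), an 8-bit second index (256-entry coarse table), and a 12-bit offset. This is a simplification under stated assumptions: the tables here hold plain pointers and page base addresses rather than real ARM descriptor bit layouts, and `L1Table`, `L2Table`, `translate`, and `flat_table_bytes` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_ENTRIES 4096u   /* one entry per 1 MB of virtual address space */
#define L2_ENTRIES 256u    /* one entry per 4 KB page within that 1 MB */

/* Simplified second-level table: each entry is just a page base address. */
typedef struct { uint32_t page_base[L2_ENTRIES]; } L2Table;

/* Simplified first-level table: each entry points to a second-level table
 * or is NULL when the 1 MB region is unmapped. */
typedef struct { const L2Table *l2[L1_ENTRIES]; } L1Table;

/* Walks both levels; returns 0 on success and -1 on a translation fault. */
int translate(const L1Table *l1, uint32_t va, uint32_t *pa) {
    uint32_t first  = va >> 20;             /* bits 31..20 */
    uint32_t second = (va >> 12) & 0xFFu;   /* bits 19..12 */
    uint32_t offset = va & 0xFFFu;          /* bits 11..0  */
    const L2Table *l2 = l1->l2[first];
    if (l2 == NULL) {
        return -1;
    }
    /* Concatenate the page base with the unchanged offset. */
    *pa = (l2->page_base[second] & ~0xFFFu) | offset;
    return 0;
}

/* Size of a single-level table covering 4 GB with 4 KB pages, 4 bytes per entry. */
uint32_t flat_table_bytes(void) {
    return (1u << 20) * 4u;
}
```

One way to read the "why?" next to the two-level scheme: a flat table always costs 4 MB, whereas with two levels only the first-level table and the second-level tables for regions actually in use need to exist.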
First-level Descriptors
• AP: access permission
• C, B: cacheability and bufferability
Section Descriptor and Translating Section References
[Figure: CP15 register 2 holds the translation table base, aligned on a 16 KB boundary; the first-level table has 4K entries (max 16 KB), each mapping a 1 MB block (section)]
Coarse Page Table Descriptor
[Figure: first-level table with 4K entries (max 16 KB); coarse second-level table with 256 entries (max 1 KB)]
Fine Page Table Descriptor
[Figure: fine second-level table with 1K entries (max 4 KB)]
Second-level Descriptor
Translating Large Page References
Access Permissions
• System (S) and ROM (R) bits in CP15 register 1
TLB Functions
• Invalidate instruction TLB
• Invalidate instruction TLB single entry
• Invalidate entire data TLB
• Invalidate data TLB single entry
• TLB lockdown
PC Bus Architecture
• The northbridge, also known as the memory controller hub (MCH) in Intel systems (AMD, VIA, SiS and others usually use 'northbridge'), is traditionally one of the two chips in the core logic chipset on a PC motherboard.
• The southbridge, also known as the I/O controller hub (ICH) in Intel systems (AMD, VIA, SiS and others usually use 'southbridge'), is a chip that implements the "slower" capabilities of the motherboard in a northbridge/southbridge chipset computer architecture.
I/O Devices
• Usually include some non-digital component
• Typical digital interface to CPU:
[Figure: the CPU connects to the device's status register and data register, which sit in front of the device mechanism]
I/O Addressing
• A microprocessor communicates with other devices using some of its pins
• Port-based I/O (parallel I/O)
  • Processor has one or more N-bit ports
  • Processor's software reads and writes a port just like a register
  • E.g., P0 = 0xFF; v = P1; (P0 and P1 are 8-bit ports)
• Bus-based I/O
  • Processor has address, data, and control ports that form a single bus
  • Communication protocol is built into the processor
  • A single instruction carries out the read or write protocol on the bus
Bus-based I/O
• The processor uses the same bus to communicate with memory and with peripherals
• Memory-mapped I/O
  • Peripheral registers occupy addresses in the same address space as memory
  • e.g., bus has a 16-bit address
    • lower 32K addresses may correspond to memory
    • upper 32K addresses may correspond to peripherals
• Standard I/O (I/O-mapped I/O)
  • An additional pin (M/IO) on the bus indicates whether the access is to memory or to a peripheral
  • e.g., bus has a 16-bit address
    • all 64K addresses correspond to memory when M/IO is set to 0
    • all 64K addresses correspond to peripherals when M/IO is set to 1
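The two addressing schemes differ only in what the decoder looks at. The sketch below models the 16-bit example: memory-mapped I/O decodes the top address bit (lower 32K memory, upper 32K peripherals), while standard I/O ignores the address and decodes the M/IO pin. The `Target` type and both function names are hypothetical, and the split at bit 15 is the example split from the slide, not a property of any particular processor.

```c
#include <stdint.h>

typedef enum { TARGET_MEMORY, TARGET_PERIPHERAL } Target;

/* Memory-mapped I/O: one address space, the top address bit picks the target. */
Target decode_memory_mapped(uint16_t addr) {
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard (I/O-mapped) I/O: the M/IO pin picks the target, so all
 * 64K addresses are available in each of the two spaces. */
Target decode_standard_io(uint16_t addr, int m_io) {
    (void)addr;   /* the address does not take part in the selection */
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}
```

The memory-mapped decoder gives up half of the address space to peripherals in this example, which is the cost that standard I/O avoids at the price of an extra pin and dedicated instructions.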
Memory-mapped vs. Standard I/O
• Memory-mapped I/O
  • Requires no special instructions
  • Assembly instructions involving memory, like MOV and ADD, work with peripherals as well
• Standard I/O
  • No loss of memory addresses to peripherals
  • Simpler address decoding logic in peripherals is possible
    • When the number of peripherals is much smaller than the address space, the high-order address bits can be ignored, allowing smaller and/or faster comparators
  • Requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory
Timers
• Timer: measures time intervals
  • To generate timed output events, e.g., hold a traffic light green for 10 s
  • To measure input events, e.g., measure a car's speed
• Based on counting clock pulses
  • E.g., let the Clk period be 10 ns
  • And we count 20,000 Clk pulses
  • Then 200 microseconds have passed
  • A 16-bit counter would count up to 65,535 × 10 ns = 655.35 microseconds, with a resolution of 10 ns
  • Top: indicates that the top count was reached (wrap-around)
[Figure: basic timer: a 16-bit up counter clocked by Clk, with a Reset input, a 16-bit Cnt output, and a Top output]
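The timer arithmetic above is a multiplication of a pulse count by the clock period. A minimal sketch, with hypothetical function names and times kept in nanoseconds to stay in integer arithmetic:

```c
#include <stdint.h>

/* Time represented by a number of counted clock pulses. */
uint64_t elapsed_ns(uint64_t pulses, uint64_t clk_period_ns) {
    return pulses * clk_period_ns;
}

/* Longest interval an n-bit up counter can measure before it wraps around.
 * Assumes bits < 64. */
uint64_t timer_range_ns(unsigned bits, uint64_t clk_period_ns) {
    uint64_t top = (UINT64_C(1) << bits) - 1u;   /* 65,535 for 16 bits */
    return top * clk_period_ns;
}
```

With a 10 ns clock, 20,000 pulses give 200,000 ns (200 microseconds), and a 16-bit counter tops out at 655,350 ns (655.35 microseconds). Measuring longer intervals requires a slower clock (coarser resolution), a wider counter, or counting Top events in software.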
Counters
• Similar to a timer, but counts pulses on a general input signal rather than clock pulses
  • e.g., count cars passing over a sensor
• A device can often be configured as either a timer or a counter
[Figure: timer/counter: a 2x1 mux, controlled by Mode, selects either Clk or Cnt_in as the input of a 16-bit up counter with Reset input, 16-bit Cnt output, and Top output]
Watchdog Timer
• Since most industrial or mission-critical embedded systems cannot be allowed to fail, how do we guarantee that a glitch doesn't break the instruction flow?
• Watchdog timer: monitors the operation of the system and generates a RESET signal when certain conditions occur
  • Power supply voltage goes out of range
  • The computer hasn't issued a reset pulse to the timer within the designated time interval
[Figure: bit 0 of a processor output port drives the watchdog timer's RESET IN; the watchdog timer's RESET OUT drives the processor's RESET INPUT]
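Interrupt Interface
• An essential element for meeting the real-time requirements of embedded systems
[Figure: CPU (with PC and IR) connected to the device's status register and data register by intr request, intr ack, and data/address lines; the registers sit in front of the device mechanism]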
Interrupts
• Suppose a peripheral intermittently receives data, which must be serviced by the processor
  • The processor can poll the peripheral regularly to see if data has arrived: wasteful
  • The peripheral can interrupt the processor when it has data
• Requires an extra pin or pins: Int
  • If Int is 1, the processor suspends the current program and jumps to an Interrupt Service Routine, or ISR
  • Known as interrupt-driven I/O
  • Essentially, "polling" of the interrupt pin is built into the hardware, so it takes no extra time!
Interrupts (2)
• What is the address of the ISR (interrupt service routine)?
  • Fixed interrupt
    • Address built into the microprocessor, cannot be changed
    • Either the ISR is stored at that address, or a jump to the actual ISR is stored there if not enough bytes are available
  • Vectored interrupt
    • The peripheral provides the address
    • Common when the microprocessor has multiple peripherals connected by a system bus
  • Compromise: interrupt address table
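The interrupt address table compromise can be pictured as an array of ISR addresses indexed by interrupt number: the peripheral supplies only a small index, and the table supplies the address. The sketch below models this in software with function pointers; the table contents, the two example ISRs, and the `dispatch` function are hypothetical, and on real hardware the lookup and jump are done by the processor, not by C code.

```c
#include <stddef.h>

typedef void (*Isr)(void);

/* Example ISRs that only count how often they ran. */
static int timer_ticks;
static int uart_bytes;
static void timer_isr(void) { timer_ticks++; }
static void uart_isr(void)  { uart_bytes++; }

/* Interrupt address table: entry i holds the ISR for interrupt number i.
 * A NULL entry means no ISR is installed for that interrupt. */
static const Isr interrupt_table[] = { timer_isr, uart_isr, NULL };

/* Looks up and calls the ISR for the given interrupt number.
 * Returns 0 if an ISR ran, -1 for an invalid or unhandled interrupt. */
int dispatch(size_t irq) {
    size_t entries = sizeof interrupt_table / sizeof interrupt_table[0];
    if (irq >= entries || interrupt_table[irq] == NULL) {
        return -1;
    }
    interrupt_table[irq]();
    return 0;
}
```

Compared with a fixed interrupt, the ISR location can be changed by editing one table entry; compared with a fully vectored interrupt, the peripheral does not need to know or drive a full address.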
Additional Interrupt Issues
• Maskable vs. non-maskable interrupts
  • Maskable: the programmer can set a bit that causes the processor to ignore the interrupt
    • Important when in the middle of time-critical code
  • Non-maskable: a separate interrupt pin that can't be masked
    • Typically reserved for drastic situations, like a power failure requiring immediate backup of data to non-volatile memory
• Jump to ISR
  • Some microprocessors treat the jump the same as a call of any subroutine
    • Complete state saved (PC, registers): may take hundreds of cycles
  • Others only save partial state, like the PC only
    • Thus, the ISR must not modify registers, or else must save them first
    • The assembly-language programmer must be aware of which registers are stored
Direct Memory Access (DMA)
• Buffering
  • Temporarily storing data in memory before processing
  • Data accumulated in peripherals is commonly buffered
• The microprocessor could handle this with an ISR
  • Storing and restoring the microprocessor state is inefficient
  • The regular program must wait
• A DMA controller is more efficient
  • Separate single-purpose processor
  • The microprocessor relinquishes control of the system bus to the DMA controller
  • The microprocessor can meanwhile execute its regular program
    • No inefficient storing and restoring of state due to an ISR call
    • The regular program need not wait unless it requires the system bus
ARM Processors (focusing on the XScale-core-based PXA255)
References
• ARM Architecture Reference Manual, Second Edition, by David Seal, Addison-Wesley, 1996
• ARM System Developer's Guide: Designing and Optimizing System Software, by Andrew N. Sloss, Dominic Symes, and Chris Wright, Morgan Kaufmann and Elsevier, 2004
PXA255 Processor
• Intel PXA255 overview
  • High-performance 32-bit microprocessor, max 400 MHz
  • Technology
    • 0.35 um, 3-layer metal CMOS, 2.6 million transistors
    • 256-pin PBGA package (17x17 mm)
• An XScale core, based on ARMv5TE
• An ARM processor with a modified Harvard architecture
  • Separate instruction and data caches (2 caches)
ARM Processor Evolution
Evolution of ARM Architecture
• Each ARM architecture revision (version) has a specific ISA (Instruction Set Architecture).
• http://www.arm.com/pdfs/ARM11%20Microarchitecture%20White%20Paper.pdf
ARM Nomenclature
ARM Nomenclature (2)
ARM Revision History
CPSR and Attribute Comparison
ARM Processor Variants
ARM7 Family
• The ARM7 core has a von Neumann style architecture, a 3-stage pipeline, and the ARMv4T instruction set
• ARM7TDMI is the first of a new range of processors introduced in 1995 by ARM
• ARM7TDMI-S: same as the ARM7TDMI but synthesizable
• ARM720T: has an MMU (capable of running Linux and WinCE) and a unified 8 KB cache (data + instruction)
• A variation of the ARM7 is the ARM7EJ-S: 5-stage pipeline, executes ARMv5TEJ instructions
ARM9 Family
• The ARM9 family was announced in 1997
  • 5-stage pipeline, higher clock frequency than the ARM7 family
  • Memory system redesign: Harvard architecture (separate D and I caches and buses)
• The first processor in the ARM9 family is the ARM920T (separate D + I caches, MMU for an OS with virtual memory, ARMv4T instructions)
  • ARM922T is a variation on the ARM920T (half of the cache size)
  • ARM940T (smaller D+I caches and an MPU)
• The next processors in the ARM9 family are based on the ARM9E-S core (synthesizable version of the ARM9 core with the E extensions)
  • Two variations of the ARM9E-S: ARM946E-S and ARM966E-S
  • Both execute architecture v5TE instructions
  • Both support an optional embedded trace macrocell (ETM)
  • ARM946E-S includes TCM, cache, and an MPU (designed for use in embedded control applications that require deterministic real-time response)
  • ARM966E-S does not have the MPU and cache extensions (but does have configurable TCMs)
• The latest core in the ARM9 family is the ARM926EJ-S (announced in 2000)
  • Designed for portable Java-enabled devices such as 3G phones and PDAs
  • ARM926EJ-S is the first ARM core to include Jazelle technology
  • Includes an MMU, configurable TCMs, and D+I caches
ARM10 Family
• ARM10 was designed for performance (announced in 1999)
  • It extends the ARM9 pipeline to six stages
  • Optional vector floating-point (VFP) unit
• ARM1020E is the first processor to use an ARM10E core
  • Separate 32K D+I caches, MMU, optional vector floating-point unit, dual 64-bit bus interface for increased performance
• ARM1026EJ-S is very similar to the ARM926EJ-S but has both an MPU and an MMU
  • Has the performance of the ARM10 and the flexibility of an ARM926EJ-S
ARM11 Family
• ARM1136J-S was designed for high-performance and power-efficient applications (announced in 2003)
• ARM1136J-S: the first processor implementation of the ARMv6 architecture instructions
• 8-stage pipeline with separate load-store and arithmetic pipelines
• ARMv6 instructions include SIMD extensions for media processing
• ARM1136JF-S is an ARM1136J-S with the addition of the vector floating-point unit
Specialized Processors
• StrongARM was originally co-developed by Digital Semiconductor and is now exclusively licensed by Intel
  • Popular for PDAs (high performance and low power consumption)
  • Harvard architecture with separate D+I caches
  • 5-stage pipeline, without the Thumb instruction set
• XScale is a follow-on product to the StrongARM (up to 1 GHz)
  • XScale executes architecture v5TE instructions
  • Harvard architecture, similar to the StrongARM; includes an MMU
• SC100 is designed for low-power security applications
  • Based on the ARM7TDMI core with an MPU
  • Used for smart card applications
Memory Management of ARM
• Three different types of memory management hardware in ARM
  • Non-protected memory
  • MPU: Memory Protection Unit
    • Simple system that uses a limited number of memory regions
  • MMU: Memory Management Unit
    • Used by the virtual memory management system of the OS
Comparison of ARM Architecture Features
ARM Processor Roadmap
http://www.arm.com/pdfs/ARM11%20Microarchitecture%20White%20Paper.pdf
ARM Roadmap
Comparison of ARM7 and ARM9 Cores
http://www.arm.com/documentation/ARMProcessor_Cores/index.html
• Maximum clock frequency: 1.8 to 2 times higher
• Performance: 30% higher
ARM Architecture
Source: ARM6 Architecture: http://www.arm.com/documentation/White_Papers/index.html
Performance Enhancement Techniques in ARMv6
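Little and Big Endian
• Little endian
  • 0x345f is stored as: address 0x8000: 5f, address 0x8001: 34
  • Analogy: a DNS name such as www.ajou.ac.kr, where the least significant part (www) comes first and the most significant part (kr) comes last
  • Intel i386 CPUs
• Big endian
  • 0x345f is stored as: 34, 5f
  • Analogy: kr.ac.yu.www
  • IBM, Motorola
• Mixed (supports both little and big endian)
  • ARM (default: little endian)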
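The two byte orders for 0x345f can be written out explicitly. The sketch below stores a 16-bit value into a byte buffer in each order regardless of the host CPU, and adds a probe for the host's own order; all three function names are hypothetical.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lowest address. */
void store_le16(uint8_t *dst, uint16_t value) {
    dst[0] = (uint8_t)(value & 0xFFu);
    dst[1] = (uint8_t)(value >> 8);
}

/* Big endian: most significant byte at the lowest address. */
void store_be16(uint8_t *dst, uint16_t value) {
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)(value & 0xFFu);
}

/* Returns 1 when the host stores the low byte of a 16-bit value first. */
int host_is_little_endian(void) {
    uint16_t probe = 0x345f;
    return *(const uint8_t *)&probe == 0x5f;
}
```

Writing bytes explicitly, as in `store_le16` and `store_be16`, is the usual way to keep file formats and network protocols independent of the CPU's byte order, which matters on a processor such as ARM that can be configured either way.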
ARM Processors from Samsung
http://www.samsung.com/Products/Semiconductor/common/product_list.aspx?family_cd=LSI090101
Qualcomm Processors
Qualcomm MSM6800
http://www.cdmatech.com/images/products/diagram_msm6800.pdf
Qualcomm MSM3300
http://www.cdmatech.com/solutions/pdf/msm3300_chipset.pdf
TMS320DM270
http://www.tij.co.jp/jsc/docs/apps/digital/pdf/tms320dm270.pdf
Intel XScale Core Architecture
Refer to the Intel XScale Core Developer's Manual, January 2004
Extensions to ARM Architecture
Event Architecture
Event Priority of XScale
Configuration
MCR/MRC Format
LDC/STC Format when Accessing CP14
CP15 Registers
Intel PXA255 Processor
Intel PXA255 Processor Developer’s Manual
January, 2004
System Integration Unit
PXA255 Pin
[Figure: pin diagram of the Intel XScale PXA250 (256 pins).
Signal groups: Serial Channel 0 (USB), Serial Channel 1, Serial Channel 2 (IrDA), Serial Channel 3 (UART), Serial Channel 4 (CODEC), Power Management, Clocks/Reset/Test, JTAG, LCD Control, GPIO Ports, Memory Control, Transceiver Control, PCMCIA Bus Signals, Supply, Address Bus, Data Bus.
Signals shown: UDC-, UDC+, RXD_1, TXD_1, RXD_2, TXD_2, RXD_3, TXD_3, TXD_C, RXD_C, SCLK_C, SFRM_C, BATT_FAULT, VDD_FAULT, PWR_EN, TCK_BYP, TESTCLK, PEXTAL, PXTAL, TEXTAL, TXTAL, nRESET, nRESET_OUT, SMROM_EN, ROM_SEL, TCK, TDI, TDO, TMS, nTRST, L_DD(15:0), L_FCLK, L_LCLK, L_PCLK, L_BIAS, GP(27:0), nCAS/DQM(3:0), SDCLK<2:0>, SDCKE<1:0>, nSDCAS, nSDRAS, RDY, nCS(5:0), nWE, nOE, nRAS/nSDCS(3:0), RD/nWR, nPOE, nPCE<2:1>, nPIOW, nPIOR, nPWE, nIOIS16, nPWAIT, nPREG, PSKTSEL, VDD, VSS/VSSX, VDDX, A<25:0>, D<31:0>]
166
Embedded Software
PXA255 Address Map
Static Memory Interface (ROM, Flash, SRAM): 384 Mbytes
PCMCIA Interface: 512 Mbytes
Memory Mapped Registers Interface: 192 Mbytes
Dynamic Memory Interface: 256 Mbytes

Start address   Region
0x0000_0000     Static Chip Select 0 (64 Mbytes)
0x0400_0000     Static Chip Select 1 (64 Mbytes)
0x0800_0000     Static Chip Select 2 (64 Mbytes)
0x0C00_0000     Static Chip Select 3 (64 Mbytes)
0x1000_0000     Static Chip Select 4 (64 Mbytes)
0x1400_0000     Static Chip Select 5 (64 Mbytes)
0x1800_0000     Reserved (128 Mbytes)
0x2000_0000     PCMCIA/CF - Slot 0 (256 Mbytes)
0x3000_0000     PCMCIA/CF - Slot 1 (256 Mbytes)
0x4000_0000     Memory Mapped registers (Peripherals)
0x4400_0000     Memory Mapped registers (LCD)
0x4800_0000     Memory Mapped registers (Memory Ctl)
0x4C00_0000     Reserved (1344 Mbytes)
0xA000_0000     SDRAM Bank 0 (64 Mbytes)
0xA400_0000     SDRAM Bank 1 (64 Mbytes)
0xA800_0000     SDRAM Bank 2 (64 Mbytes)
0xAC00_0000     SDRAM Bank 3 (64 Mbytes)
0xB000_0000     Reserved (1280 Mbytes)
0xFFFF_FFFF     End of address space
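The address map can be checked with a small host-side model. The sketch below is only an illustration, not target firmware: the region table is transcribed from this slide, each memory-mapped register region is assumed to be 64 Mbytes (192 Mbytes split three ways), and the names `REGIONS`, `find_region` and `map_is_contiguous` are hypothetical.

```python
MB = 1 << 20

# (start address, size in Mbytes, name), transcribed from the address map slide.
# The 64-Mbyte size of each register region is an assumption (192 Mbytes / 3).
REGIONS = [
    (0x0000_0000, 64, "Static Chip Select 0"),
    (0x0400_0000, 64, "Static Chip Select 1"),
    (0x0800_0000, 64, "Static Chip Select 2"),
    (0x0C00_0000, 64, "Static Chip Select 3"),
    (0x1000_0000, 64, "Static Chip Select 4"),
    (0x1400_0000, 64, "Static Chip Select 5"),
    (0x1800_0000, 128, "Reserved"),
    (0x2000_0000, 256, "PCMCIA/CF - Slot 0"),
    (0x3000_0000, 256, "PCMCIA/CF - Slot 1"),
    (0x4000_0000, 64, "Memory Mapped registers (Peripherals)"),
    (0x4400_0000, 64, "Memory Mapped registers (LCD)"),
    (0x4800_0000, 64, "Memory Mapped registers (Memory Ctl)"),
    (0x4C00_0000, 1344, "Reserved"),
    (0xA000_0000, 64, "SDRAM Bank 0"),
    (0xA400_0000, 64, "SDRAM Bank 1"),
    (0xA800_0000, 64, "SDRAM Bank 2"),
    (0xAC00_0000, 64, "SDRAM Bank 3"),
    (0xB000_0000, 1280, "Reserved"),
]


def find_region(addr):
    """Return the name of the region that contains a 32-bit physical address."""
    for start, size_mb, name in REGIONS:
        if start <= addr < start + size_mb * MB:
            return name
    raise ValueError("address outside the 32-bit address space")


def map_is_contiguous():
    """Check that the regions tile the 4-Gbyte space with no gaps or overlaps."""
    expected = 0
    for start, size_mb, _ in REGIONS:
        if start != expected:
            return False
        expected = start + size_mb * MB
    return expected == 1 << 32
```

For example, `find_region(0xA000_0000)` names the first SDRAM bank, which is where the example board later places its SDRAM.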
Example System Based on the PXA255
[Figure: the Intel® XScale PXA255 Portable Communications Microprocessor at the center, connected to UART communications, a tablet / serial keyboard, AC97 (speaker, microphone), infrared communications, a USB synchronization port, a TFT color LCD display, SDRAM/DRAM, SMROM/ROM, Flash, SRAM, variable-latency I/O through glue logic, and a PCMCIA interface (Flash, modem). Clock sources: 3.686 MHz and 32.768 kHz.]
PXA255 Processor(1)
[Figure: PXA255 block diagram]
XScale core: IMMU, DMMU, Icache (32 Kbytes), Dcache (32 Kbytes), minicache, write buffer, read buffer; instruction path (PC, address) and load/store data path; megacell core
Peripheral bus: ASIC, color or grayscale LCD controller (0x4400_0000), RTC, OS timer, PWM (2), interrupt controller, clock & power management, I2S, I2C, AC97, FF_UART, BT_UART, slow IrDA, fast IrDA, SSP, NSSP, USB client, MMC, general purpose I/O
System bus: DMA controller and bridge, memory controller, XCVR
Memory controller: variable-latency I/O control; PCMCIA & CF control (socket 0, socket 1); static memory control (ROM/Flash/SRAM, 4 banks; CS# 0,1,2 and CS# 3,4,5); dynamic memory control (SDRAM/SMROM, 4 banks)
Oscillators: 3.6864 MHz, 32.768 kHz
PXA255 Processor(2)
Micro-architecture
[Figure: XScale micro-architecture. Blocks around the instruction execution core: branch target buffer, trace buffer, instruction cache (32 KBytes), mini I-cache (2 KBytes), data cache (32 KBytes), mini D-cache (2 KBytes), instruction and data MMUs, write buffer, system management, debug / JTAG, coprocessor interface with CP0 multiplier/accumulator, CP14 performance monitoring and CP15 config registers, interrupt request inputs (IRQ, FIQ), and the core memory bus (address, data).]
PXA255 Processor(3)
XScale Core Architecture Features
Instruction cache: 32 Kbytes, 32 ways, lockable by line
Microprocessor: 7-stage pipeline
Data cache: max 32 Kbytes, 32 ways, write-back or write-through, hit under miss
Debug: hardware breakpoints, branch history table
MAC: single-cycle throughput (16*32), 16-bit SIMD, 40-bit accumulator
Power management control
Write buffer: 8 entries, full coalescing
JTAG
Data RAM: max 28 Kbytes, re-map of data cache
Branch target buffer: 128 entries
IMMU: 32-entry TLB, fully associative, lockable by entry
DMMU: 32-entry TLB, fully associative, lockable by entry
Fill buffer: 4~8 entries
Performance monitoring
Mini-data cache: 2 Kbytes, 2 ways
PXA255 Processor(4)
XScale core
32-bit RISC
32-bit registers
32-bit instructions, longword aligned
32-bit datapaths
7~8 stage pipeline
[Figure: XScale pipelines]
Main execution pipeline: F1 (Instruction Fetch 1), F2 (Instruction Fetch 2), ID (Instruction Decode), RF (Register File / Operand Shifter), X1 (ALU Execute), X2 (State Execute), XWB (Write Back)
Memory pipeline: D1 and D2 (Data Cache Access), DWB (Data Cache Writeback)
MAC pipeline: M1 (Multiplier Stage 1), M2 (Multiplier Stage 2), Mx (Multiplier Stage X)
PC labels in the figure: PC, PC - 4 (F2), PC - 8 (ID), PC - 12, PC - 16
PXA255 Processor(5)
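Advanced Microcontroller Bus Architecture (AMBA)
[Figure: AMBA system. The ARM core with its bus interface, on-chip RAM, arbiter, decoder, TIC, and the EBI to external ROM and external RAM sit on the AHB or ASB. A bridge connects to the APB, which serves the slow peripherals: timer, remap/pause, interrupt controller, reset.]
CPU bus: consists of the A, B and ALU buses
AHB (Advanced High-performance Bus), ASB (Advanced System Bus): connect the high-speed devices built into the processor
APB (Advanced Peripheral Bus): connects the devices that operate at low speed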
PXA255 Processor(6)
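Memory Model
[Figure: inside the PXA255 processor, the core issues virtual addresses to the on-chip caches and the MMU; the MMU produces physical addresses, which pass through the buffers and the memory controller to external memory.]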
PXA255 Processor(7)
PXA255 bus reads
Cache line fills read 8 words
Read allocate
Round-robin replacement
[Figure: read path. The XScale core sends the instruction address (PC) and data address as VA[0:31] through the I-MMU and D-MMU to the 32 KB I-cache and the 32 KB D-cache (32-byte lines). On a hit the cache returns the instruction or data. On a miss the request goes out on the system bus (A[0:31], D[0:31]) through the read buffer and memory controller to the external bus (A[0:25], D[0:31]) and system memory. The figure marks a core-clock domain and a half-core-clock domain.]
PXA255 Processor(8)
PXA255 bus writes
No write to the I-cache
Write-back D-cache
Software coherency needed between caches
Not write allocate
[Figure: write path. Data from the XScale core passes through the D-MMU (VA[0:31] to A[0:31]) to the 32 KB D-cache (32-byte lines, with dirty bits) and the write buffer (8 entries), then over the system bus (A[0:31], D[0:31]) to the memory controller and the external bus (A[0:25], D[0:31]) to system memory. The figure marks a core-clock domain and a half-core-clock domain.]
What happens on a write?
Write through: the information is written to both the block in the cache and the block in the lower-level memory.
Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced, so each block needs a dirty bit that records whether it is clean or dirty.
Pros and cons of each:
WT: read misses cannot result in writes
WB: repeated writes to the same block do not each go to memory
WT is always combined with a write buffer so that the processor does not wait for the lower-level memory.
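The difference in memory traffic between the two policies can be seen with a toy model. This is a minimal sketch under stated assumptions, not a model of the PXA255 caches: it assumes a direct-mapped cache that allocates a line on every write, and the function name and parameters are hypothetical.

```python
def count_memory_writes(addresses, policy, num_lines=4, line_size=32):
    """Count writes that reach lower-level memory for a stream of CPU stores.

    Assumed model: direct-mapped cache, a line is allocated on every store,
    and dirty lines are flushed once at the end for write-back.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy")
    lines = {}  # line index -> (tag, dirty)
    writes = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        if policy == "write-through":
            writes += 1  # every store is also sent to memory
            lines[index] = (tag, False)
        else:
            old = lines.get(index)
            if old is not None and old[0] != tag and old[1]:
                writes += 1  # a dirty block is written out when replaced
            lines[index] = (tag, True)
    if policy == "write-back":
        writes += sum(1 for _, dirty in lines.values() if dirty)  # final flush
    return writes
```

Ten stores to the same address cost ten memory writes under write-through but a single one under write-back, which is the "repeated writes" advantage listed above.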
Write Buffer for Write Through
A write buffer is needed between the cache and memory.
Processor: writes data into the cache and the write buffer
Memory controller: writes the contents of the buffer to memory
The write buffer is just a FIFO.
The write buffer exists to improve write performance, whereas the cache memory improves performance when reading instructions and data.
The address and data are stored in the write buffer so that the CPU can continue with other processing while the write is in progress.
When the write buffer is granted the bus, it writes to the external device.
[Figure: Processor and Cache -> Write Buffer -> DRAM]
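The FIFO behaviour described on this slide can be sketched as a small host-side model. The class and method names are hypothetical, the depth of 8 follows the "8 entries" figure quoted elsewhere in the deck, and coalescing is deliberately left out.

```python
from collections import deque


class WriteBuffer:
    """Toy FIFO write buffer sitting between the cache and memory."""

    def __init__(self, depth=8):
        self.depth = depth
        self.entries = deque()  # oldest entry on the left

    def cpu_write(self, addr, data):
        """Queue a store. Returns False when full, i.e. the CPU would stall."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((addr, data))
        return True

    def drain_one(self, memory):
        """Called when the buffer is granted the bus: retire the oldest store."""
        if not self.entries:
            return False
        addr, data = self.entries.popleft()
        memory[addr] = data
        return True
```

The CPU side only ever calls `cpu_write`, so it keeps running as long as the buffer has room; the memory side drains entries in arrival order whenever the bus is free.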
Cache organization
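[Figure: cache organization]
Virtual address fields: Tag = bits 31:10, Index (set) = bits 9:5, Line offset = bits 4:0 (word = bits 4:2, byte = bits 1:0)
Sizes: 4 bytes per word, 8 words per line, 32 sets
Each set has a tag CAM with 32 lines (0 to 31) and a data RAM holding words 0 to 7 of each line; data moves to and from the CPU.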
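How a virtual address maps onto this organization can be worked through in a few lines. The field widths below are derived from the sizes on the slide (4-byte words, 8-word lines, 32 sets); the function and constant names are hypothetical.

```python
BYTE_BITS = 2   # 4 bytes per word
WORD_BITS = 3   # 8 words per line
SET_BITS = 5    # 32 sets
WAYS = 32       # 32 lines per set


def split_virtual_address(va):
    """Split a 32-bit virtual address into (tag, set index, word, byte)."""
    byte = va & ((1 << BYTE_BITS) - 1)
    word = (va >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    set_index = (va >> (BYTE_BITS + WORD_BITS)) & ((1 << SET_BITS) - 1)
    tag = va >> (BYTE_BITS + WORD_BITS + SET_BITS)
    return tag, set_index, word, byte


def cache_size_bytes():
    """Total capacity implied by the organization: sets x ways x line size."""
    line_bytes = 1 << (BYTE_BITS + WORD_BITS)
    return (1 << SET_BITS) * WAYS * line_bytes
```

The set index selects one of 32 sets, the tag is compared against the 32 entries of that set's CAM, and the word and byte fields pick the data within the 32-byte line. The implied capacity is 32 x 32 x 32 bytes = 32 Kbytes.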
PXA255 - Instruction Cache
32 KB instruction cache
1024 lines of 32 bytes (8 words)
Uses the virtual address
32-way, 32-set associative
Round-robin replacement
Mapped via MMU page C bits
When the MMU is enabled, caching is controlled by the C bit in the memory management table.
When the MMU is disabled, C=1 is assumed for all addresses.
On a miss with C=1 or with the MMU disabled, an 8-word line fetch is performed and a cache line is replaced according to round-robin replacement.
When the MMU is enabled and C=0, a single word is read from the external memory location that corresponds to the virtual address, and it is not written to the cache.
[Figure: XScale core with IMMU, DMMU, 32-Kbyte I-cache, main D-cache and mini D-cache; instruction path (PC, address) and data path.]
PXA255 - Data Caches
Two data caches (main data cache, mini data cache)
Both: write-back, read allocate, virtual
Mapped via MMU page B, C bits
Main data cache, 32 KB: 32-way, 32-set associative; round-robin replacement; B=1 & C=1
Mini data cache, 2 KB: 2-way set associative; least recently used (LRU) replacement; B=0 & C=1
[Figure: XScale core with IMMU, DMMU, I-cache, main D-cache and mini D-cache; instruction path (PC, address) and data path.]
PXA255 - Read Buffer
PXA255 read buffer
Data prefetcher
Saves processor waiting
Load and calculate in parallel
For read-only data
Supplements the data cache
Under software control
Coprocessor 15, register #9
4 entries, 32 bytes each
Loads of 1, 4, 8 words
Replace or invalidate data
[Figure: XScale core with I-cache, D-cache and mini D-cache, a write buffer and a 128-byte read buffer connected to the system bus.]
PXA255 Memory Management
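[Figure: PXA255 MMU. The XScale core issues 32-bit addresses in the virtual address space for instructions and data. The ITLB (entry fields labelled VA, PA, AC) and the DTLB (entry fields labelled VA, PA, AC, B) translate them into the physical address space used by the I-cache, D-cache and system memory. On a TLB miss, descriptors are read from system memory, located through the Translation Table Base Register.]
MMU support is provided through the coprocessor.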
PXA255 Processor - CP15
CP15 Register structure
Register Purpose
0 ID Register
1 Control
2 Translation Table Base
3 Domain Access Control
5 Fault Status
6 Fault Address
7 Cache Operations
8 TLB Operations
9 Read Buffer Operations
10 TLB lockdown
13 Process ID Mapping
14 Debug Support
15 Test & Clock Control
4,11~12 UNUSED
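These CP15 registers are reached with the ARM MRC (read) and MCR (write) coprocessor instructions. The sketch below builds the 32-bit instruction word from its fields, following the standard ARM MRC/MCR encoding; it is a host-side illustration, and the function name and argument order are hypothetical.

```python
# Register numbers and purposes as listed on the slide.
CP15_REGISTERS = {
    0: "ID Register",
    1: "Control",
    2: "Translation Table Base",
    3: "Domain Access Control",
    5: "Fault Status",
    6: "Fault Address",
    7: "Cache Operations",
    8: "TLB Operations",
    9: "Read Buffer Operations",
    10: "TLB lockdown",
    13: "Process ID Mapping",
    14: "Debug Support",
    15: "Test & Clock Control",
}


def encode_coproc_transfer(read, cp_num, opc1, rd, crn, crm=0, opc2=0, cond=0xE):
    """Encode MRC (read=True) or MCR (read=False) as a 32-bit ARM instruction.

    Layout: cond[31:28] 1110[27:24] opc1[23:21] L[20] CRn[19:16]
            Rd[15:12] cp_num[11:8] opc2[7:5] 1[4] CRm[3:0]
    """
    return (
        (cond << 28) | (0b1110 << 24) | (opc1 << 21) | (int(read) << 20)
        | (crn << 16) | (rd << 12) | (cp_num << 8) | (opc2 << 5) | (1 << 4) | crm
    )
```

Reading the control register into r0 (`MRC p15, 0, r0, c1, c0, 0`) corresponds to `encode_coproc_transfer(True, 15, 0, 0, 1)`, and writing it back is the same call with `read=False`.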
PXA255 CoProcessor
CP15 register layouts
C0 (ID): implementer = bits 31:24, bits 23:16, part number (BCD) = bits 15:4, revision = bits 3:0
C1 (Control): bits 31:15 = 0, RR = bit 14, V = bit 13, I = bit 12, Z = bit 11, F = bit 10, R = bit 9, S = bit 8, B = bit 7, L = bit 6, D = bit 5, P = bit 4, W = bit 3, C = bit 2, A = bit 1, M = bit 0
C2 (Translation Table Base): translation table base address = bits 31:14, bits 13:0 = 0
C3 (Domain Access Control): sixteen 2-bit fields, D15 = bits 31:30 down to D0 = bits 1:0
C5 (Fault Status): bits 31:9 = 0, bit 8 = 0, domain = bits 7:4, status = bits 3:0
C6 (Fault Address): fault address = bits 31:0
C13 (Process ID): process ID = bits 31:25, bits 24:0 = 0
PXA255 Processor - SCM
System Control Module
Power management controller: supports normal, idle and sleep modes
81 general purpose I/O ports: generate FIQ, IRQ and "wakeup" interrupts
Interrupt controller: routes all system (GPIOs, LCD, serial channel) interrupts to either IRQ or FIQ
Multi-channel DMA controller: software programmable to any serial port and the LCD; supports external DMA
Real time clock and timer: 32-bit counter/comparator; 32.7 kHz crystal, accuracy +/- 5 sec/month
OS timer with alarm register: 32-bit counter/comparator; 3.68 MHz crystal, fine-grain timing interrupts
PXA255 - running mode
Summary of the running modes of the PXA255
States: HARDWARE RESET, RUN, IDLE, SLEEP
Power on or nRESET asserted (from any state) -> HARDWARE RESET
HARDWARE RESET -> RUN: nRESET negated
RUN -> IDLE: wait-for-interrupt instruction
IDLE -> RUN: system or peripheral unit interrupt
RUN -> SLEEP: force sleep bit set, or VDD or battery fault pins asserted
IDLE -> SLEEP: VDD or battery fault pins asserted
SLEEP -> RUN: GPIO or RTC alarm interrupt
IDLE: CPU clock held low, all other resources active, wait for interrupt
SLEEP: wait for wake-up event
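The mode diagram reads naturally as a state machine. The following is a minimal sketch of it as a transition table; the event strings are hypothetical labels paraphrased from the figure, and events not shown in the figure are assumed to leave the mode unchanged.

```python
# (current mode, event) -> next mode, paraphrased from the mode diagram.
TRANSITIONS = {
    ("HARDWARE RESET", "nRESET negated"): "RUN",
    ("RUN", "wait for interrupt instruction"): "IDLE",
    ("RUN", "force sleep bit set"): "SLEEP",
    ("RUN", "VDD or battery fault"): "SLEEP",
    ("IDLE", "system or peripheral interrupt"): "RUN",
    ("IDLE", "VDD or battery fault"): "SLEEP",
    ("SLEEP", "GPIO or RTC alarm interrupt"): "RUN",
}


def next_mode(mode, event):
    """Return the mode entered after an event; unknown events keep the mode."""
    if event == "nRESET asserted":
        return "HARDWARE RESET"  # reset wins from every state
    return TRANSITIONS.get((mode, event), mode)


def run_events(events, mode="HARDWARE RESET"):
    """Apply a sequence of events and return the final mode."""
    for event in events:
        mode = next_mode(mode, event)
    return mode
```

A table-driven form like this makes it easy to see that SLEEP can only be left through a wake-up interrupt or a reset.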
PXA255 Processor - GPIO
General Purpose I/O
GPIO[58:73] = dual panel color or 16 bit parallel input on LCD
GPIO[23:27] = SPI if both synchronous serial protocols are required in a single system
Modem control signals for UART (CTS, RTS, CD, etc) implemented via GPIO signals
4-5 GPIOs required for full PCMCIA support
3 GPIOs required for Intel® SA-1111 interface
PXA255 General Purpose I/O Block Diagram
[Figure: each GPIO pin is driven through the pin direction, pin set/clear and alternate function (output) logic, and read back through the pin-level register and alternate function (input); an edge detector feeds the edge detect status register and the sleep wake-up logic of the power manager.]
Base address: 0x40E0_0000
Register                                         Address
Pin-Level Register (GPLR)                        0x40E0_0000/04/08
Pin Direction Register (GPDR), 1: output, 0: input   0x40E0_000C/10/14
Pin Set Registers (GPSR)                         0x40E0_0018/1C/20
Pin Clear Registers (GPCR)                       0x40E0_0024/28/2C
Rising Edge Detect Enable Register (GRER)        0x40E0_0030/34/38
Falling Edge Detect Enable Register (GFER)       0x40E0_003C/40/44
Edge Detect Status Register (GEDR)               0x40E0_0048/4C/50
Alternate Function Register (GAFR)               0x40E0_0054/58/5C/60/64/68
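Because each GPIO function uses three consecutive 32-bit registers, the register address and bit for a given pin follow from simple arithmetic. The sketch below is a host-side calculation only; the offsets for GPLR, GPDR, GRER, GFER, GEDR and GAFR are taken from the slide, while the helper names and the assumption of 2 bits per pin in GAFR (six registers, 16 pins each) are illustrative.

```python
GPIO_BASE = 0x40E0_0000

# Offset of the first register of each group, from the addresses on the slide.
OFFSETS = {"GPLR": 0x00, "GPDR": 0x0C, "GRER": 0x30, "GFER": 0x3C, "GEDR": 0x48}
GAFR_OFFSET = 0x54


def gpio_register(name, pin):
    """Return (register address, bit mask) of a 1-bit-per-pin register."""
    if not 0 <= pin < 96:  # three 32-bit registers per group in this model
        raise ValueError("pin out of range")
    bank, bit = divmod(pin, 32)
    return GPIO_BASE + OFFSETS[name] + 4 * bank, 1 << bit


def gafr_field(pin):
    """Return (register address, shift) of the assumed 2-bit GAFR field."""
    if not 0 <= pin < 96:
        raise ValueError("pin out of range")
    index, position = divmod(pin, 16)
    return GPIO_BASE + GAFR_OFFSET + 4 * index, 2 * position
```

For instance, the direction bit of GPIO 19 is bit 19 of the first GPDR register at 0x40E0_000C; setting it to 1 makes the pin an output.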
PXA255 Interrupt controller
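[Figure: each interrupt source bit sets a bit in the pending register (ICPR). The bit is gated by the mask register (ICMR), except that the mask is bypassed when ICCR[DIM]=0 and the processor is in idle mode. The level register (ICLR) then steers it (0: IRQ, 1: FIQ) into the IRQ pending register (ICIP) or the FIQ pending register (ICFP). It is combined with all other qualified interrupt bits to drive IRQ and FIQ into the XScale core, where CPSR.7 (I) and CPSR.6 (F) mask them.]
Address       Register
0x40D0_0000   Interrupt Controller IRQ Pending Register (ICIP)
0x40D0_0004   Interrupt Controller Mask Register (ICMR)
0x40D0_0008   Interrupt Controller Level Register (ICLR)
0x40D0_000C   Interrupt Controller FIQ Pending Register (ICFP)
0x40D0_0010   Interrupt Controller Pending Register (ICPR)
0x40D0_0014   Interrupt Controller Control Register (ICCR); ICCR.0: disable idle mask (DIM)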
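The routing in the block diagram reduces to a few bitwise operations. This is a sketch of that logic as drawn on the slide, treating each register as a 32-bit integer; the function name and the `idle`/`dim` parameters are hypothetical, and the return values model the qualified IRQ and FIQ bits rather than exact register contents.

```python
MASK32 = 0xFFFF_FFFF


def route_interrupts(icpr, icmr, iclr, idle=False, dim=1):
    """Return (irq_bits, fiq_bits) for given pending, mask and level values.

    A pending bit passes when its mask bit is set. When the processor is in
    idle mode and ICCR[DIM] is 0, the mask is bypassed, as in the diagram.
    ICLR then selects the destination: 0 routes to IRQ, 1 routes to FIQ.
    """
    qualified = icpr if (idle and dim == 0) else (icpr & icmr)
    irq_bits = qualified & ~iclr & MASK32
    fiq_bits = qualified & iclr & MASK32
    return irq_bits, fiq_bits


def core_lines(irq_bits, fiq_bits):
    """IRQ and FIQ are asserted when any qualified bit is set."""
    return irq_bits != 0, fiq_bits != 0
```

The core still decides whether to take the exception: a set I bit (CPSR.7) or F bit (CPSR.6) masks the corresponding line inside the processor.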
Universal Serial Bus
USB: standard used for device/peripheral interconnect in the PC market.
Intel® PXA250 is a client, not a hub
Differential signaling
Half-duplex
Individual bits encoded with NRZI
Bit stuffing keeps the receiver synchronized
Hand-helds use USB to synchronize to a desktop PC
[Figure: USB client connected through the UDC+ and UDC- pins.]
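NRZI encoding and bit stuffing can be illustrated on plain bit lists. The sketch below follows the USB rules (a 0 is sent as a level transition, a 1 as no transition, and a 0 is inserted after six consecutive 1s); the function names and the idle level passed as `level` are assumptions made for illustration.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)  # stuffed bit forces a transition on the line
                ones = 0
        else:
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """Encode bits as line levels: 0 toggles the level, 1 keeps it."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, level=1):
    """Recover bits from line levels: a change means 0, no change means 1."""
    out = []
    for cur in levels:
        out.append(1 if cur == level else 0)
        level = cur
    return out
```

Without stuffing, a long run of 1s would produce no transitions at all, and the receiver would have nothing to resynchronize its clock on.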
DMAC Block Diagram
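[Figure: the DMA controller contains 16 DMA channels (channel 0 to channel 15) and the control registers. It connects to the memory controller over the internal system bus and to the peripherals over the internal peripheral bus. Request inputs: DREQ[1:0] (external) and PREQ[37:0] (internal). Interrupt output: DMA_IRQ (internal).]
Registers shown for channel 0: DDADR 0, DSADR 0, DTADR 0, DCMD 0, DCSR 0
Other registers: DRCMR 0, DINT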
Serial Infrared Datalink
IrDA: Infrared Data Association Standard v1.1
www.irda.org, 150 members including Digital
HP-SIR at 115 kbps and 4PPM at 4 Mbps
HP-SIR: UART datastream divided by 16; the pulse is then fed to the IR transceiver
4PPM encodes 2 data bits at a time
Each period is divided into four 125 ns time slots; a 125 ns pulse in slot 1 represents 00, in slot 2 represents 01, etc
Loopback for diagnostics
Hand-helds talk IrDA with laptops, PDAs and printers
[Figure: IrDA or UART block connected through the RXD 2 and TXD 2 pins.]
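The 4PPM scheme described above can be sketched as a mapping from 2-bit values to pulse positions. The example takes the data as already split into 2-bit values, so the order in which bit pairs are taken from a byte is left open; the function and constant names are hypothetical.

```python
SLOT_NS = 125          # duration of one time slot, from the slide
SLOTS_PER_SYMBOL = 4   # one symbol carries 2 data bits


def encode_4ppm(pairs):
    """Map each 2-bit value to a 4-slot symbol with a single pulse.

    Value 0 (00) pulses in slot 1, value 1 (01) in slot 2, and so on.
    """
    slots = []
    for value in pairs:
        if not 0 <= value <= 3:
            raise ValueError("each value must fit in 2 bits")
        symbol = [0] * SLOTS_PER_SYMBOL
        symbol[value] = 1
        slots.extend(symbol)
    return slots


def data_rate_bps():
    """Bits per second implied by 2 bits per 4 x 125 ns symbol."""
    return 2 * 10**9 // (SLOTS_PER_SYMBOL * SLOT_NS)
```

Two bits every 500 ns gives the 4 Mbps rate quoted on the slide.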
UART
Universal Asynchronous Receiver/Transmitter
UART: RS-232, the infamous PC "COM" ports
Operates to 230 Kbits/s
Level shifters needed for 5V logic (TTL)
Loopback for diagnostics
Data is byte wide if DMA is used
Hand-helds talk RS232 for synchronization, communication, keyboard I/O, software loading, etc
Primary debug connection for the ARM Software Development Toolset
[Figure: UART block connected through the RXD 3 and TXD 3 pins.]
PXA255 - H/W Interface(1)
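[Figure: reset circuit. A MAX811T with manual reset input MR drives RESET into the PXA255 RESET_IN; JTAG_RST comes from JTAG port J20; the PXA255 RESET_OUT drives the RESET of other devices.]
RESET (EMPOS II example)
uP reset circuit: MAX811T
Voltage monitor (3 V ~ 3.15 V)
Manual reset input (push button, active "Low")
Multi-ICE reset
Reset output to Flash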
PXA255 - H/W Interface(2)
[Figure: PXA255 memory controller interface to two 16-bit Flash devices. Signals: ADDR[10..23], DATA[0..31] split into D[0..15] (low device) and D[16..31] (high device), CS0, RESET, OE.]
Flash memory
3 V Intel StrataFlash 28F128
32-bit data bus
Size: 32 MByte = 128 Mbit (16 MByte) x 2 EA
MSC0 - Static Chip Select 0 (Bank 0)
Base address = 0x0000_0000
PXA255 - H/W Interface(3)
[Figure: PXA255 memory controller interface to two 16-bit SRAM devices. Signals: ADDR[10..23], DATA[0..31] split into D[0..15] (low device) and D[16..31] (high device), DQM[0..1] and DQM[2..3], CS3, WE, OE.]
Static RAM (SRAM)
Samsung K6R4016V1C
3 V high-speed CMOS static RAM
32-bit data bus / 1 MByte
MSC1 - Static Chip Select 3 (Bank 3)
Base address = 0x0C00_0000
PXA255 - H/W Interface(4)
[Figure: PXA255 memory controller interface to two 16-bit SDRAM devices. Signals: ADDR[10..24], DATA[0..31] split into D[0..15] (low device) and D[16..31] (high device), DQM[0..1] and DQM[2..3], nSDCS0, WE, RAS/CAS, SDCLK1/SDCKE1.]
SDRAM
Samsung synchronous DRAM K4S561632
32-bit data bus
256 Mbit = 4M x 16 bit x 4 banks
Size: 64 MByte = 256 Mbit (32 MByte) x 2 EA
SDRAM Bank 0 - dynamic memory
Base address = 0xA000_0000
PXA255 - H/W Interface(5)
PCMCIA / CF
[Figure: PXA255 connected to SOCKET 0 and SOCKET 1 through buffers with DIR and OE# controls.]
PXA255 signal -> socket signal
D[15:0] -> D[15:0]
MA(25:0) -> A(25:0)
nPOE -> OE#
nPWE -> WE#
nPIOR -> IOR#
nPIOW -> IOW#
nPREG -> REG#
nPCE(1:2) -> CE(1:2)#
nPWAIT <- WAIT#
nIOIS16 <- IOIS16#
PSKTSEL, nPIOR and nPOE also drive the buffer control logic
Socket status: CD1#/CD2# and RDY/BSY# of the two sockets are wired to GPIO(7), GPIO(12), GPIO(11) and GPIO(10)
PXA255 - H/W Interface(6)
[Figure: PXA255 connected to the HT6542B through a data buffer (MD(31:0) to D(7:0), with DIR and OE#) and an address decoder (inputs RD_nWR, nCS1 to nCS4, MA(25:0), nOE, nPWE; output HT6542_CS). HT6542B pins: CS#, A0, RD#, WR#, RESET#; KBCO/KBCI and KBDO/KBDI to the keyboard; MSCO/MSCI and MSDO/MSDI to the mouse; KB_INT and MS_INT to GPIO(19) and GPIO(9).]
PS/2 keyboard / mouse
Holtek HT6542B
8-bit data bus
8 MHz operation
Supports a PS/2-compatible mouse
PXA255 - H/W Interface(7)
[Figure: the PXA255 AC'97 controller unit (ACUNIT) connected to the CS4202 AC'97 primary CODEC. Signals: nACRESET, SDATA_OUT, SYNC (48 kHz), SDATA_IN_0, BITCLK (12.288 MHz).]
Audio codec
Cirrus Logic CS4202
AC'97 2.2 compliant
20-bit stereo D/A converters
18-bit stereo A/D converters
MIC input / headphone output
PXA255 - H/W Interface(8)
[Figure: PXA255 connected to the primary and secondary Ethernet controllers, each followed by a transformer (T/F). MD(31:0) passes through a buffer (DIR, OE#) to D(31:0); control logic takes nCS1 to nCS4, RD_nWR, nPWE, nOE, MA(25:0) and nDQM(3:0) and drives WE#, OE#, A(15:2) and DQM(3:0)# of each controller; nCS1 and nCS2 select the controllers; their interrupt outputs go to GPIO(0) and GPIO(1).]
Ethernet controller
SMSC 10/100 Ethernet single chip LAN91C111
Internal 32-bit wide data path
8 Kbytes internal memory (receive and transmit FIFO buffers)
External 25 MHz output pin for an external PHY and MAC
MSC0,1 - Static Chip Select 1, 2 (Bank 1, 2)
Base address = 0x0400_0000 (primary), 0x0800_0000 (secondary)
PXA255 - H/W Interface(9)
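[Figure: the PXA255 memory controller reads the push switches (pulled up to +3.3 V) over MD(7:0) through a buffer; the address decoder enables the buffer (G) from CS4 and MA(22:20).]
Push switches
8-bit read [ D0~D7 ]
Base address = 0x1050_0000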
PXA255 - H/W Interface(10)
[Figure: the PXA255 memory controller writes the LEDs over MD(7:0) through a buffer; the address decoder generates its clock (CK) from CS4 and MA(22:20).]
Discrete LEDs
8-bit write [ D0~D7 ]
Base address = 0x1060_0000
PXA255 - H/W Interface(11)
[Figure: the PXA255 memory controller writes two latches over MD(15:0); each latch output is split into DQ(7:0) and DQ(15:8). The address decoder generates the latch clocks (CK) from CS4 and MA(22:20).]
7-segment LEDs
16-bit write [ D0~D7 ]
Base address = 0x1030_0000 [ low 2 segments ], 0x1040_0000 [ high 2 segments ]
PXA255 - H/W Interface(12)
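[Figure: the PXA255 memory controller writes a latch over MD(15:0); the address decoder generates the latch clock (CK) from CS4 and MA(22:20). Latch outputs DQ(7:0) drive D(7:0) of the character LCD module (+5 V supply), and DQ(8), DQ(9) and DQ(10) drive its RS, RW and E control inputs.]
Character LCD
8-bit data write [ D0~D7 ]
3-bit control write [ D8~D10 ]
Base address = 0x1060_0000
20 characters x 2 lines / backlight type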
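The simple I/O devices on the last few slides all hang off CS4 and are separated by the address decoder on MA(22:20). The sketch below shows that decode as a host-side calculation, using the base addresses quoted on the slides; the names are hypothetical, and the character LCD is left out of the table because its slide quotes the same base address as the discrete LEDs.

```python
CS4_BASE = 0x1000_0000
CS4_SIZE = 64 << 20  # Static Chip Select 4 covers 64 Mbytes

# Base addresses as quoted on the H/W interface slides.
DEVICES = {
    0x1030_0000: "7-segment LEDs (low 2 segments)",
    0x1040_0000: "7-segment LEDs (high 2 segments)",
    0x1050_0000: "push switches",
    0x1060_0000: "discrete LEDs",
}


def decoder_select(addr):
    """Return the 3-bit value seen on MA(22:20) for an address inside CS4."""
    if not CS4_BASE <= addr < CS4_BASE + CS4_SIZE:
        raise ValueError("address is not in the CS4 region")
    return (addr >> 20) & 0b111


def device_for(addr):
    """Look up the device whose base address shares the same MA(22:20) value."""
    select = decoder_select(addr)
    for base, name in DEVICES.items():
        if decoder_select(base) == select:
            return name
    return None
```

Each device therefore occupies a 1-Mbyte window, and any address inside that window reaches the same latch or buffer.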