Signal Processing Catches the Multi-core Wave
Freescale Semiconductor Confidential and Proprietary Information. Freescale™ and the
Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service
names are the property of their respective owners. © Freescale Semiconductor, Inc. 2005.
Dan Bouvier
Director, Advanced Processor Architecture
Freescale Semiconductor
Signal Processing Catches the Multi-core Wave: Which Architecture is Right for Your Application?
GSPx 2005 Conference
25 October 2005
Prologue
• Decisions taken for scheduling and mapping at a high level
of abstraction have a major impact on the global design flow
“Only positive consequences encourage good future performances”
John Harvey-Jones
• Completing tasks in the least time possible is highly desirable
• Finally –
it is well known that nine women can deliver a
child in one month…
Evolution of this presentation
• The underlying motivations to move to multi-core
• A mathematician’s view of signal processing and
applicability to parallelism
• Some attributes of signal processing applications
• Applying Applications to Signal Processors
• Futures: Longer term processor evolution
An appetite for more
• By the year 2010, the average person
will encounter more than 300 embedded
processors every day.
-- Semico Research
• Applications of all forms continue
to demand more computational
performance
• Traditional means for scaling
performance have run their course
• We are on the cusp of an exciting
transition in how we meet the
performance demands
Is the system now the chip?
“It may prove to be more economical to build large
systems out of smaller functions, which are separately
packaged and interconnected.”
“The availability of large functions, combined with functional design and
construction, should allow the manufacturer of large systems to design and
construct a considerable variety of equipment both rapidly and economically.”
“Clearly, we will be able to build such component-crammed equipment.”
April 19, 1965, Gordon Moore, 35th Anniversary Issue of Electronics magazine
Performance scaling and technology challenges
• Clock rate improvements slowing: from 40%/year to 12%/year
  - Pipelining has increased by a factor of 4 in the last decade – not possible in the next decade
[Figure: clock-rate scaling trend with the labels Semiconductor Technology, Pipelining, Historical Microarchitecture, Technology, GAP, and 8-10 FO4 Pipeline. Source: UT Dept. Computer Science]
How did we get here? – the physics
[Figure: ITRS Historical Technology Trends. Log-scale vertical axis (0.1 to 100) versus technology node (250, 180, 130, 90, 65, 45, 32), with curves for transistor gate delay, local interconnect, global interconnect with repeaters, and global interconnect without repeaters.]
How much logic can we touch in 1 clock cycle?
• Transistors getting faster
• But wire delays begin to dominate
• Historical Solution: Divide and conquer with longer pipeline (less work per clock)
[Figure: logic reachable in one clock cycle, at 1GHz and at 6GHz]
Time for a new approach
• The increase in performance is roughly proportional to the square root of the increase in complexity (Pollack’s Rule).
  - Doubling the logic of a processor core delivers approximately 40 percent more performance
“There ain’t no such thing as a free lunch.” R.A. Heinlein, The Moon is a Harsh Mistress
[Figure: Pollack’s Rule, with the complexity-adding techniques labeled: speculation, multi-issue, VLIW, branch prediction, pipelining, out-of-order execution]
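The square-root relationship can be written out as a short worked calculation. This is a sketch of the arithmetic only; the symbols $P$ (performance) and $C$ (core complexity, meaning the amount of logic) are notation introduced here, not taken from the slide.

```latex
P \propto \sqrt{C}
\quad\Longrightarrow\quad
\frac{P_{2C}}{P_{C}} = \sqrt{\frac{2C}{C}} = \sqrt{2} \approx 1.41
```

So doubling the logic of one core buys roughly 41 percent more performance, which the slide rounds to 40 percent.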
Power Trends with the PowerPC® Processor Family
Hitting the application power envelope wall
[Figure: frequency (MHz) and frequency per watt for the 603, 750, 7410, 7455, 7457, 7447A, and 7448 processors]
Are multi-core processors the answer?
• Options for higher performance
  - Double the core speed
  - Double the cores
• Both can have performance close to that of 2x a single core at F1
• Multi-core has better performance per watt
[Figure: power and performance over time for a single core at F1, a single core at 2x F1, and a dual core at F1 (2x frequency versus 2x core)]
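One common way to see why doubling the cores tends to win on performance per watt is the first-order model for dynamic CMOS power. The model and its symbols (activity factor $\alpha$, switched capacitance $C$, supply voltage $V$, voltage scaling factor $s$) are assumptions brought in here for illustration; the slide itself only shows the resulting curves.

```latex
\begin{aligned}
P_{\text{dyn}} &\approx \alpha C V^{2} f \\
\text{Dual core at } F_1: \quad P &\approx 2\,\alpha C V^{2} F_1 \\
\text{Single core at } 2F_1: \quad P &\approx \alpha C (sV)^{2} (2F_1) = 2 s^{2}\,\alpha C V^{2} F_1, \qquad s \ge 1
\end{aligned}
```

Under this model, if reaching 2x F1 requires raising the supply voltage ($s > 1$), the single fast core draws about $s^2$ times the power of the dual core for, at best, similar throughput.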
What forms might these processors take?
• General-purpose processors with Vector SIMD engines
• Multi-core Digital Signal Processors
• Hybrid processors – GPP + DSP
A Mathematician’s View of Signal Processing and Applicability to Parallelism
Parallelism
• Sequential algorithms historically preceded parallel algorithms
• In simple terms, parallelism can be viewed as…
  - Algorithms that can be represented as a directed graph
    > nodes representing operations
    > edges representing data flow
  - If the graph contains layers of parallel nodes
    > it could be executed in parallel
    > mapped to a parallel platform
• Graph representation can help to determine the optimal depth of the algorithm
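The graph view above can be sketched in a few lines of C. This is a minimal illustration, not a method from the presentation: the names `struct edge` and `graph_depth` are hypothetical, and the sketch assumes nodes are already numbered in topological order. It returns the number of layers, which is the fewest sequential steps possible when enough PEs are available.

```c
#include <assert.h>
#include <stddef.h>

/* One data-flow edge: the result of node `from` feeds node `to`. */
struct edge { int from; int to; };

/*
 * Depth of a data-flow graph: the number of nodes on its longest path.
 * Assumption for this sketch: nodes are numbered in topological order,
 * so every edge satisfies from < to.
 */
int graph_depth(int num_nodes, const struct edge *edges, size_t num_edges)
{
    enum { MAX_NODES = 64 };
    int level[MAX_NODES];
    int depth = 0;

    assert(num_nodes > 0 && num_nodes <= MAX_NODES);
    for (int n = 0; n < num_nodes; n++)
        level[n] = 1;                   /* a node with no inputs sits in layer 1 */

    /* Visit nodes in order; a node's layer is one past its deepest input. */
    for (int n = 0; n < num_nodes; n++) {
        for (size_t e = 0; e < num_edges; e++) {
            if (edges[e].to == n) {
                assert(edges[e].from >= 0 && edges[e].from < n);
                if (level[edges[e].from] + 1 > level[n])
                    level[n] = level[edges[e].from] + 1;
            }
        }
        if (level[n] > depth)
            depth = level[n];
    }
    return depth;
}
```

For example, four add nodes chained one after another have depth 4, while two independent adds feeding a third have depth 2, so the second form exposes a layer that can run in parallel.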
Parallel code example: inner loop of the checksum
[Figure: scalar versus vector checksum. Scalar: operands A through H are accumulated through a chain of sequential “+” operations. Vector: four “+” operations run side by side on A through H to build a partial checksum, which is combined once per function. “+” is add with carry.]
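The scalar and vector forms of the checksum loop can be compared with a portable C sketch. It assumes a 16-bit one's-complement checksum (add with end-around carry), which the slide does not specify, and it models the vector lanes with a plain array instead of AltiVec registers. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add with end-around carry (one's-complement addition of 16-bit words). */
static uint16_t add_with_carry(uint16_t x, uint16_t y)
{
    uint32_t s = (uint32_t)x + y;
    return (uint16_t)((s & 0xFFFFu) + (s >> 16));
}

/* Scalar form: one long dependency chain, one add per step. */
uint16_t checksum_scalar(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;

    assert(w != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum = add_with_carry(sum, w[i]);
    return sum;
}

/* Vector-style form: four independent partial checksums, combined once. */
uint16_t checksum_lanes(const uint16_t *w, size_t n)
{
    uint16_t lane[4] = { 0, 0, 0, 0 };
    size_t i = 0;

    assert(w != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)     /* these four adds do not depend on each other */
            lane[k] = add_with_carry(lane[k], w[i + k]);
    for (; i < n; i++)                  /* leftover words */
        lane[0] = add_with_carry(lane[0], w[i]);

    /* Final reduction, done once per call. */
    return add_with_carry(add_with_carry(lane[0], lane[1]),
                          add_with_carry(lane[2], lane[3]));
}
```

Because one's-complement addition is associative and commutative, both functions return the same value; only the length of the dependency chain differs.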
Math background
• Many signal transforms can be represented by a transformation matrix A, which is multiplied by an input data vector X of degree n to produce the desired output vector Y=AX, or formally:
  $y_i = \sum_{j=1}^{n} a_{ij} x_j$
• If we assume that we also have $n^2$ processing elements (PEs)
• Then at best we would need $O(\log_2^2 n)$ steps on $O(n^4)$ possible PEs
• Many transforms also use complex math
  - which could be translated to use real numbers, by replacing every complex number a + bj by the matrix
  $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$
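The complex-to-real substitution can be checked with a small C sketch. It uses the standard representation of a + bj as the real matrix [[a, -b], [b, a]]; the type and function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* 2x2 real matrix standing in for one complex number. */
struct mat2 { double m[2][2]; };

/* Map a + bj to the real matrix [[a, -b], [b, a]]. */
struct mat2 complex_as_matrix(double a, double b)
{
    struct mat2 r = { { { a, -b }, { b, a } } };
    return r;
}

/* Ordinary 2x2 real matrix product. */
struct mat2 mat2_mul(struct mat2 x, struct mat2 y)
{
    struct mat2 r;

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = x.m[i][0] * y.m[0][j] + x.m[i][1] * y.m[1][j];
    return r;
}

/* Complex multiply (a + bj)(c + dj) using only real arithmetic. */
void complex_mul_real(double a, double b, double c, double d,
                      double *re, double *im)
{
    struct mat2 p = mat2_mul(complex_as_matrix(a, b), complex_as_matrix(c, d));

    assert(re != NULL && im != NULL);
    *re = p.m[0][0];    /* ac - bd */
    *im = p.m[1][0];    /* ad + bc */
}
```

The product matrix has the same [[re, -im], [im, re]] shape, so a complex transform becomes a real matrix multiply of twice the dimension.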
Math background – matrix multiply example
Generic matrix multiply
/* Accumulates A*B into C for N x N matrices; c[][] must start at zero to get C = A*B */
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
    }
}
Math background – matrix multiply example
$$
\begin{pmatrix}
C_{00} & C_{01} & C_{02} & C_{03} \\
C_{10} & C_{11} & C_{12} & C_{13} \\
C_{20} & C_{21} & C_{22} & C_{23} \\
C_{30} & C_{31} & C_{32} & C_{33}
\end{pmatrix}
=
\begin{pmatrix}
A_{00} & A_{01} & A_{02} & A_{03} \\
A_{10} & A_{11} & A_{12} & A_{13} \\
A_{20} & A_{21} & A_{22} & A_{23} \\
A_{30} & A_{31} & A_{32} & A_{33}
\end{pmatrix}
\begin{pmatrix}
B_{00} & B_{01} & B_{02} & B_{03} \\
B_{10} & B_{11} & B_{12} & B_{13} \\
B_{20} & B_{21} & B_{22} & B_{23} \\
B_{30} & B_{31} & B_{32} & B_{33}
\end{pmatrix}
$$
$$
\begin{aligned}
C_{00} &= A_{00}B_{00} + A_{01}B_{10} + A_{02}B_{20} + A_{03}B_{30} \\
C_{01} &= A_{00}B_{01} + A_{01}B_{11} + A_{02}B_{21} + A_{03}B_{31} \\
C_{02} &= A_{00}B_{02} + A_{01}B_{12} + A_{02}B_{22} + A_{03}B_{32} \\
C_{03} &= A_{00}B_{03} + A_{01}B_{13} + A_{02}B_{23} + A_{03}B_{33} \\
&\;\;\vdots \\
C_{33} &= A_{30}B_{03} + A_{31}B_{13} + A_{32}B_{23} + A_{33}B_{33}
\end{aligned}
$$
Note: All source elements are available for computation at start
Math background – matrix multiply example
• Same algorithm could be presented in several formats
• The unrolled version in this case is directly “mappable” to SIMD (AltiVec™ Multiply-Sum instruction)
Loop form:
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            vc[i][j] = vc[i][j] + va[i][k] * vb[k][j];
        }
    }
}
Unrolled form (each 4x4 matrix stored row-major in a flat array of 16 elements):
vc[0] = va[0]*vb[0] + va[1]*vb[4] + va[2]*vb[8] + va[3]*vb[12];
vc[1] = va[0]*vb[1] + va[1]*vb[5] + va[2]*vb[9] + va[3]*vb[13];
vc[2] = va[0]*vb[2] + va[1]*vb[6] + va[2]*vb[10] + va[3]*vb[14];
...
vc[14] = va[12]*vb[2] + va[13]*vb[6] + va[14]*vb[10] + va[15]*vb[14];
vc[15] = va[12]*vb[3] + va[13]*vb[7] + va[14]*vb[11] + va[15]*vb[15];
Equation form:
$$
\begin{aligned}
C_{00} &= A_{00}B_{00} + A_{01}B_{10} + A_{02}B_{20} + A_{03}B_{30} \\
&\;\;\vdots \\
C_{33} &= A_{30}B_{03} + A_{31}B_{13} + A_{32}B_{23} + A_{33}B_{33}
\end{aligned}
$$
Math background – matrix multiply example
• As a result, for a 4x4 matrix of bytes the speedup can reach 400%
vresult0 = vec_msum(va_temp, vb_temp, vzero);
The four words of vresult0 then hold:
  word 0: va[0]*vb[0] + va[1]*vb[4] + va[2]*vb[8] + va[3]*vb[12]
  word 1: va[0]*vb[1] + va[1]*vb[5] + va[2]*vb[9] + va[3]*vb[13]
  word 2: va[0]*vb[2] + va[1]*vb[6] + va[2]*vb[10] + va[3]*vb[14]
  word 3: va[0]*vb[3] + va[1]*vb[7] + va[2]*vb[11] + va[3]*vb[15]
[Figure: multiply-sum data flow across byte lanes 0 to 15, with rows labeled A, B, Prod, C, and D]
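To make the single-instruction step concrete, here is a portable C model of a byte multiply-sum and of how one row of the 4x4 product could be formed with it. This is a sketch under stated assumptions, not the AltiVec code from the talk: `msum_u8` only imitates the unsigned-byte behaviour of the multiply-sum operation, and the way `va_temp` and `vb_temp` are built here (row of A repeated, B transposed) is an assumption, since the slides do not show it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable model of a byte multiply-sum: each 32-bit output word is the sum
 * of the four byte products in the matching word position, plus the
 * accumulator word.
 */
void msum_u8(const uint8_t a[16], const uint8_t b[16],
             const uint32_t acc[4], uint32_t out[4])
{
    for (int w = 0; w < 4; w++) {
        uint32_t sum = acc[w];

        for (int k = 0; k < 4; k++)
            sum += (uint32_t)a[4 * w + k] * b[4 * w + k];
        out[w] = sum;
    }
}

/* Computes row `row` of C = A * B for 4x4 byte matrices stored row-major. */
void matmul4_row(const uint8_t va[16], const uint8_t vb[16],
                 int row, uint32_t vc_row[4])
{
    uint8_t va_temp[16], vb_temp[16];
    const uint32_t vzero[4] = { 0, 0, 0, 0 };

    assert(row >= 0 && row < 4);
    for (int j = 0; j < 4; j++) {
        for (int k = 0; k < 4; k++) {
            va_temp[4 * j + k] = va[4 * row + k];   /* row of A repeated in every word */
            vb_temp[4 * j + k] = vb[4 * k + j];     /* column j of B placed in word j */
        }
    }
    msum_u8(va_temp, vb_temp, vzero, vc_row);       /* four dot products in one step */
}
```

With `row` set to 0, the four output words match the four expressions listed for vresult0, so one multiply-sum replaces four scalar dot products.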
Some Attributes of Signal Processing Applications
Math background
• If the serial form of an algorithm was computationally “stable,” it will remain so in parallel form
  - “internal” parallelism
• The majority of traditional DSP algorithms fall under this category
  - Fast Fourier transform (FFT)
  - Discrete Fourier transform (DFT)
  - Discrete cosine transforms (DCTs)
  - Walsh-Hadamard transform (WHT)
  - Various filters
  - ECC codes (Viterbi, convolutional, CRC)
  - and many others
Signal processing code types
• Two major classes
  - Signal processing
    > filtering
    > transformation (such as FFT, DCT, etc.)
    > convolution of signals
    > correlation
  - Baseband processing
    > channel coding
      – convolutional, turbo, Reed-Solomon, LDPC (Low-Density Parity-Check code)
    > decoders for the above codes, such as the Viterbi decoder for convolutional codes
    > Cyclic Redundancy Code (CRC)
    > source coding such as voice compression and image (still or video) compression
• In addition, there are many other signal processing types
  - Example: modulation
Applying Applications to Signal Processors
Multicore DSPs in the converged network
[Figure: network diagram showing where infrastructure DSPs sit in the converged network. Network domains: enterprise network, SOHO/home network, access network, ISP network, broadband network, 3G network, PSTN switches, and the Internet. Labeled equipment: packet trunk bypass, enterprise IP-PABX, IP-phone, TDM-to-IP/ATM gateway, trunking gateways, MSC/RNC, fax, server, workstation, ISP-RAC, printer, ISP web servers, DSLAM, DSL router, IP-TV, ATM switch, CATV, CMTS, media gateway, 802.11 notebook, 802.11 AP, TRAU gateway, video transcoding gateway, Node-B, video-phone, content server, cable modem, and packet video streaming server.]
The ways…
• Many tradeoffs in choosing processors for the application
  - Performance target
  - Workload mix
  - Power consumption
  - Code development and portability
  - Memory access profile
  - Application stability
[Figure: DSP versus GPP]
Multi-Core DSPs are here now - MSC8122
• Four 500MHz SC140 DSP cores
• 1.4MByte internal SRAM
• 10/100BT Ethernet interface support
• 4 TDM interfaces
• DSI port (32/64)
• 16-channel DMA engine
• In production
“Large System built from smaller functions”
Multi-core programming considerations
• How should memory be partitioned?
  - Multi-level memories
  - Do I want to use the instruction cache?
• What do I need to do to allow for multi-threading?
  - Re-entrancy
• How can device resources be safely shared by the cores?
• How can the cores and other hosts communicate efficiently with each other?
• How do you partition an application?
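The re-entrancy point can be illustrated with a small C sketch. The function and type names are hypothetical. The first routine keeps its state in a single static variable, so tasks or cores that share it interfere with each other; the second takes all of its state from the caller and is safe to run from several contexts at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Not re-entrant: the running total lives in one static variable, so two
   tasks calling this routine would corrupt each other's result. */
static int32_t shared_acc;

int32_t accumulate_shared(int16_t sample)
{
    shared_acc += sample;
    return shared_acc;
}

/* Re-entrant: all state is owned and passed in by the caller. */
struct acc_state { int32_t acc; };

int32_t accumulate_reentrant(struct acc_state *st, int16_t sample)
{
    assert(st != NULL);
    st->acc += sample;
    return st->acc;
}
```

Giving each task or core its own `struct acc_state` keeps interleaved calls independent, which is the property a shared subroutine needs.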
Basic elements for multi-core DSP usage
[Figure: diagram of the elements necessary for multi-core DSP usage, each marked in the legend as realized by hardware or realized by software. Successive builds of the slide carry the titles “Inter-Core Synchronization and Communication”, “Hardware Support for Efficient Application”, and “Software Programming for Multi-core DSP”.]
Elements shown:
• Core-to-core interrupt (VIRQ)
• DMA
• Binary semaphore
• Shared interrupt source
• M2 shared memory
• Multi-core link managed by one project
• Instruction cache (ICache)
• Write buffer (WB)
• Shared subroutine
• RTOS
• Multi-task programming
• C runtime library
• Memory utilization
Inter-core synchronization and communication: the three major elements are the binary semaphore, the core-to-core interrupt, and DMA.
Parallel computation
• Parallel computation is a very vague and general topic…
• Flynn’s classification
  - SISD, SIMD, MISD, MIMD
• SIMD and MIMD are both in essence parallel platforms, but…
• SIMD - Single Instruction Multiple Data
  - Implemented as one logic control unit, but multiple PEs operating on multiple data streams
• MIMD - Multiple Instruction Multiple Data - could vary greatly by degree of integration:
  - Multi-node MIMD system
  - Multi-processor system
  - Multi-core device
Parallel computation
• A MIMD system needs a model for partitioning workloads onto it
• The Parallel Random Access Machine (PRAM) model is widely used
• The operation of a synchronous PRAM can result in simultaneous access to the same location in shared memory
• Synchronization is usually achieved through a system of locks or semaphores
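A binary semaphore of the kind mentioned above can be sketched with C11 atomics. This is a generic software illustration under stated assumptions: the names are hypothetical, and how any particular device realizes its semaphore is not described in this presentation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Binary semaphore built on a single atomic flag: set means "taken". */
struct binary_semaphore { atomic_flag taken; };

#define BINARY_SEMAPHORE_INIT { ATOMIC_FLAG_INIT }

/* Try to take the semaphore; returns true if this caller now owns it. */
bool bsem_try_take(struct binary_semaphore *s)
{
    assert(s != NULL);
    return !atomic_flag_test_and_set_explicit(&s->taken, memory_order_acquire);
}

/* Spin until the semaphore is free, then take it. */
void bsem_take(struct binary_semaphore *s)
{
    while (!bsem_try_take(s)) {
        /* busy-wait; a real system might sleep or yield here */
    }
}

/* Release the semaphore so another core or task can take it. */
void bsem_give(struct binary_semaphore *s)
{
    assert(s != NULL);
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}
```

A shared location is then protected by calling `bsem_take` before the access and `bsem_give` after it, so that only one core touches it at a time.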
Strategies for multithreading synchronization
• Binary semaphore - mutual exclusion for shared resources
  - Guaranteed exclusive access to avoid corruption by a preemptive task
• Ideal mapping via lock-free and wait-free algorithms
  - Allows multiple threads to concurrently read and write shared data
  - Every step taken brings progress to the system
  - No synchronization primitives, such as mutexes or semaphores, can be involved
  - “Wait-free”
    > A thread can complete any operation in a finite number of steps
    > Regardless of the actions of other threads
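The difference between the two progress guarantees can be sketched with C11 atomics. The function names are hypothetical. One caveat: the fetch-and-add below is wait-free only on hardware with a native atomic add, because on load-reserve/store-conditional machines the compiler emits a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Shared event counter: a single atomic add, no lock taken.
   Returns the value the counter held before the increment. */
uint_least32_t counter_increment(atomic_uint_least32_t *c)
{
    assert(c != NULL);
    return atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
}

/* Lock-free running maximum: a compare-and-swap retry loop. Some thread
   always makes progress, but one thread may retry while others win. */
void record_max(atomic_uint_least32_t *max, uint_least32_t sample)
{
    uint_least32_t seen;

    assert(max != NULL);
    seen = atomic_load_explicit(max, memory_order_relaxed);
    while (sample > seen &&
           !atomic_compare_exchange_weak_explicit(max, &seen, sample,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* on failure, seen holds the latest value and the loop re-checks it */
    }
}
```

Neither routine uses a mutex or semaphore, so a stalled or preempted thread cannot block the others.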
Processing convergence
• General-purpose processor
• Exploit data-level parallelism
• Add a vector SIMD unit
• Leverage a unified programming model
[Figure: execution flow of a single instruction stream dispatched to the IU (32 bits), FPU (64 bits), and vector unit (128 bits) over a shared cache/memory; the vector unit contains a vector register file (VR0 to VR31), a vector ALU, and a vector permute unit connected by 128-bit paths.]
Increased single thread performance with SIMD MPC7447A
[Figure: bar chart “MPC7447A AltiVec™ Performance Improvement”; axis: X Factor, log scale from 1 to 1000; chart callouts: 12.09X and 2.89X. Benchmarks shown:]
• OSPF/Dijkstra
• Route Lookup/Patricia
• Packet Flow - 512 kbytes
• Packet Flow - 1 Mbyte
• Packet Flow - 2 Mbytes
• Autocorrelation - Data1 (pulse)
• Autocorrelation - Data2 (sine)
• Autocorrelation - Data3 (speech)
• Convolutional Encoder - Data1 (xk5r2dt)
• Convolutional Encoder - Data2 (xk4r2dt)
• Convolutional Encoder - Data3 (xk3r2dt)
• Fixed-point Bit Allocation - Data2 (typ)
• Fixed-point Bit Allocation - Data3 (step)
• Fixed-point Bit Allocation - Data6 (pent)
• Fixed-point Complex FFT - Data1 (pulse)
• Fixed-point Complex FFT - Data2 (spn)
• Fixed-point Complex FFT - Data3 (sine)
• Viterbi GSM Decoder - Data1 (get)
• Viterbi GSM Decoder - Data2 (toggle)
• Viterbi GSM Decoder - Data3 (ones)
• Viterbi GSM Decoder - Data4 (zeros)
Big performance gain with minimal power impact
[Figure: bar chart “MPC7447AL AltiVec Power Measurements @ 1.4GHz/1.3V”; average measured typical power in watts (axis 0 to 22) for benchmarks 1 through 20, without AltiVec versus with AltiVec.]
Higher numeric performance power density achieved
• Multi-core performance scaling
  - Dual 1.67GHz e600 PowerPC® processor cores
• Data-level parallelism
  - Two 128b AltiVec™ SIMD engines
• System-level parallelism
  - Serial RapidIO 1x/4x
[Figure: Dual-Core PowerPC® Processor – MPC8641D]
Challenges
Inertia:
• More than 3 decades of exploiting Instruction Level Parallelism (ILP)
  - Sequential thinking
  - Programming style
• Need development environments to ease transition
  - New compilation strategies, libraries, language support
  - Perhaps a portable SIMD API – for common library development
  - New analysis tools
  - Methods to make code portable
  - Benchmark retooling
Futures: How Will Processors Evolve Longer Term
The Challenges in Long-Term Processor Evolution
Digital signal processor lineage
• 1st Generation: Specialized Hardware for Accelerating Multiplications
• Increased Parallel Operations
• Single-Issue Complex-Instruction
• Multiple Instructions in every cycle (VLIW, Superscalar)
• Single Instruction Multiple Data (SIMD)
• Parallel Execution MIMD (Multi-core)
[Figure: timeline spanning the 1980s, 1990s, 2000s, and 2010s]
Where might the next step take us?
The solution
• Long-term performance scaling will come from parallelism
  - Must have 100s of simultaneous instructions in flight
• Processor core architecture must remain simple
  - Frequency and power still important
• Simple replication of cores not the end game
  - Only so much unassisted thread level parallelism to exploit
• Parallel computing not a new topic
• Next-generation architecture must
  - easily adapt to either thread- or instruction-level parallelism
  - have tighter marriage between hardware and software
  - Examples:
    > Heterogeneous processors
    > DARPA – Polymorphous Computing Architecture: RAW, Smart Memories, TRIPS
Summary
• REALITY:
Continued appetite for more processing performance
• DISCOVERY:
Physics driving us to new approaches
• MAPPING:
Signal processing applications can leverage parallelism
• ENABLEMENT:
Multi-core signal processing is here now
We Deliver the Processing Intelligence and Connectivity Solutions Behind the World’s Networks