Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ......

21
1 © 2007 Tensilica Inc. Six billion people want live multimedia entertainment and information anywhere and anytime at the lowest cost 1. Multimedia subsystems appear everywhere – big market 2. Technical demands require new tools, architectures and business models 3. Diversity within multimedia mirrors diversity of entire MPSOC challenge Putting MPSOC to Work in Multimedia Putting MPSOC to Work in Multimedia Putting MPSOC to Work in Multimedia

Transcript of Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ......

Page 1: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

1 © 2007 Tensilica Inc.

Six billion people want live multimedia entertainment and information anywhere and anytime at the lowest cost

1. Multimedia subsystems appear everywhere – big market

2. Technical demands require new tools, architectures and business models

3. Diversity within multimedia mirrors diversity of entire MPSOC challenge

Putting MPSOC to Work in MultimediaPutting MPSOC to Work in MultimediaPutting MPSOC to Work in Multimedia

Page 2: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

2 © 2007 Tensilica Inc.

MPSOC Becomes Mainstream StyleThe Transformation of the Modern Mobile Device

Key complementary processing functions of mobile devices covered by Tensilica configurable processors

Audio Codec

GPSBluetooth

Satellite Radio

HD Radio

User Authentication

Camera Image Processing

Mobile WiFi

Noise Cancellation

Video Codec

DTV Reception

Baseband DSP

Page 3: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

3 © 2007 Tensilica Inc.

The designer never has to manually add these instructioComplete

MultimediaSolutions

Software RulesNetwork of Partners and Customers Defines Strength

Video Imaging

Audio

AFA TECHNOLOGY

NUFRONTPnpNetwork

Technologies

MobileMedia

ReceiverConcept Embedded

Systems

Page 4: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

4 © 2007 Tensilica Inc.

Design Teams Need Different Entry PointsKey Question: Who Specifies Processor?

Hard-wired logic

RISC CPUμController

DSP

Xtensa LX

Dat

a P

roce

ssin

g N

eeds

Complex Computing Needs

• Xtensa configurable processors widely used in multimedia

• Diamond Standard cores introduced February 2006:

• Broad line of compatible controller, CPU and DSP cores

• High performance general-purpose DSP

• Complete low-power HiFi2 audio solution

• Diamond Video spans wide range of mobile media needs

Diamond388VDOVideo Codec

Diamond381 VDO

VideoDecode

Diamond383VDO

Video Codec

Diamond385VDOVideo

Decode

Xtensa

Diamond 570T

High-end CPU

Diamond 232L

Linux CPU

Diamond 212GP

Mid-rangeController/

DSP

Diamond 108 Mini

Low-powerController

Diamond330HiFi

Audio

Diamond 545CK

High-end DSPHiFi2Audio

Page 5: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

5 © 2007 Tensilica Inc.

Automation Enables Broad Product LineC

apab

i lit y

time

Xtensa 7

XtensaXtensa IIXtensa IIIXtensa IVXtensa VXtensa 6

Xtensa –next

Mainstream

extensible processors

XtensaLX

XtensaLX2

XtensaLX-next

High-performance

extensible processorsXPRES

Compiler

Processor Synthesis

- next

Automatic

processor synthesis

XTMPSystem Simulation

API

XTSCSystemC

Simulation

TurboXimEnergy

Modeling

Processor and

system modelingNext ModelingMP SW tools

CoWare

Seamless

xcc C/C++ Code

Development

Xtensa XplorerDevelopmentEnvironment

Next Xtensa Xplorer

Software development

environment

gccGNU tools

Xtensa XplorerDevelopmentEnvironment

DiamondCtrlrs

DiamondCtrlr - next

Standard controllers

DiamondAudio

DiamondVideo

nextDSP

next Video

next Audio

Application-

specific DSPs

DiamondDSP

Page 6: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

6 © 2007 Tensilica Inc.

Base CPU

AppsDatapaths

OCD

Timer

FPUExtended Registers

Cache

Multimedia Design is Energy-Based DesignExample: Xenergy Energy Explorer

Processor Description

Xenergy Energy

Estimator

Complete DynamicEnergy/Power Profile

Processor + Memory

Xtensa xccC/C++ compiler

int main( ) {int i;short c[100];for (i=0;i<N/2;i++){

int main( ) {int i;short c[100];for (i=0;i<N/2;i++){

Application Code

Tune processorinstruction set and

memories

Tune softwarealgorithms and coding

Page 7: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

7 © 2007 Tensilica Inc.

Application-Specific ProcessorsExample: HiFi 2 Audio Engine

• Pre-configured VLIW audio DSP with dual 24-bit MACs for higher quality audio• Already in use in dozens of chip designs, licensed by many top semiconductor

suppliers• Available 2 ways:

• Xtensa HiFi 2 Audio Engine for further customization• Diamond Standard 330HiFi audio core

• Delivers both low power and scalability• <75K gates for full-feature processor• Example: MP3 decode at 5.7 MHz (core + memory <0.75mW in 65nm)• Emerging trend: multiple instances of HiFi2 on same chip, tuned for different mW vs. MHz

Fetch/Decode

I-RAM

Reg File

ALU

Branch Unit

External I/F

Load/Store D-RAM 2

Brid

ge (o

pt.)

VLCAccel.

I-Cache

Audio ALU

Extended

Audio MAC 24b

Misc. AudioFunctionsQ

ueue

I/F

SLOT 1 SLOT 0

Audio Reg File

D-RAM 1D-CACHE

Page 8: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

8 © 2007 Tensilica Inc.

Sample of HiFi2 Audio Software

Currently available:MP3 decodeMP3 encodeWMA decodeWMA encodeAAC-LC decodeAAC-LC encodeaacPlus v2 decodeaacPlus v2 encodeaacPlus v1 decodeaacPlus v1 encodeOgg Vorbis DecoderDolby AC-3 5.1 ch decodeDolby AC-3 2 ch encodeDolby Digital Plus 5.1 ch

decodeDolby Digital Plus 7.1 ch

decode AM3D Diesel 3D

AM3D Zirene effectsQSound microQ MIDI, 3D, effectsSONiVOX AudioiNSIDEMIDISRS WOW XT audio expansionSRS Xspace 3DSRS TruSurround HDAMR NB codecAMR WB codecSchedule for release:BSAC decoderDolby TrueHD Dolby Digital Prologic II / IIxDolby Digital Consumer Encoder 2 channel encoder

Dolby Digital Compatible Output 5.1 channel encoder

aacPlus 5.1/7.1 channel decode

SBC Bluetooth stereo codec

AMR-WB+G.711 codecG.723.1 codecG.726 codec DTMF aacPlus 5.1/7.1 channel

decodeDolby TrueHD Dolby Digital

Prologic II / IIxG.167 (AEC)G.168 (LEC)WMA 10 Pro Decoder

Page 9: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

9 © 2007 Tensilica Inc.

Growth in Video Resolution and Complexity

Computational Complexity of Digital Video

1

10

100

1000

H.264MPEG-4MPEG-2

Video Standard

Rel

ativ

e C

ompu

teC

ycle

s (M

PEG

-2 C

IF =

1)

HD 1080pHD 720pD1 (576i)CIF

Source: “A Performance Characterization of High Definition Digital Video Decoding using H.264/AVC”, scaled for CIF

Page 10: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

10 © 2007 Tensilica Inc.

Diamond VDO Design Goals

• Video encode & decode up to720 x 480 @30 fps (NTSC)720 x 576 @25 fps (PAL)

• Support multiple coding standards @ main profileH.264 main profile @ L3 decoderMPEG-4 ASP [no GMC] @ L5 decoderMPEG-2 MP @ ML decoderWMV9/VC1 main profile decoderMPEG-4 ASP encoder

• Programmable solution • Less than 100MB/s memory decode bandwidth requirement• Operating frequency <200Mhz

Can be implemented in 130nm G technology

• Compatible subsets for decode only and Base Profile only

Page 11: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

11 © 2007 Tensilica Inc.

Top Level Block Diagram

Page 12: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

12 © 2007 Tensilica Inc.

Diamond VDO Product Family

Baseline Profiles – D1

Main / Advanced Profiles – D1

MPEG-4 Advanced Simple Profile

H.264 Main Profile MPEG-4 Advanced

Simple Profile VC-1/WMV9 Main ProfileMPEG-2 Main Profile

Diamond 388VDO

NoneH.264 Main Profile MPEG-4 Advanced

Simple Profile VC-1/WMV9 Main ProfileMPEG-2 Main Profile

Diamond 385VDO

MPEG-4 Simple Profile

H.264 Baseline ProfileMPEG-4 Simple ProfileVC-1/WMV9 Simple

Profile MPEG-2 Main Profile

Diamond 383VDO

NoneH.264 Baseline Profile MPEG-4 Simple ProfileVC-1/WMV9 Simple

ProfileMPEG-2 Main Profile

Diamond 381VDO

EncoderDecodersTensilica Engine

150 MHz30D1MPEG-2 Main Profile Decode

185 MHz30D1MPEG-4 Advanced Simple Profile Encode

189 MHz30D1VC-1/WMV9 Main Profile Decode

156 MHz30D1MPEG-4 Advanced Simple Profile Decode

172 MHz30D1H.264 Main Profile Decode

Max. Required

Clock Rate

Frames-per-

Second

ResolutionStandard

Page 13: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

13 © 2007 Tensilica Inc.

Optimizing EnergyISA Extension Example: CABAC

• Context-adaptive binary arithmetic coding (CABAC) achieves higher compression in H.264

Required for Main Profile H.264 decodingOur solution: Xtensa ISA extensionsPeak decode performance is one bin per cycle

13

Millions of cycles per second in ISA extended core

710CABAC decode cycles

Millions of cycles per second in unaugmented core

5mJ164mJEnergy for 1 second

Area cost: 20K gates for CABAC ISA extensions

Unaugmented core assumes base core plus 16b MUL, LOOP ops, debug, PIF at max MHz

Page 14: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

14 © 2007 Tensilica Inc.

Optimizing EnergyExample: Motion Estimation for MPEG4 Encode

14.216598pt SAD26.6204Control76.24268Motion Estimation

10.71143Half Pel

8.4735Diamond

16.352716x16 SAD

Millions of cycles required on ISA extended core

Millions of cycles required on unaugmented core

Area Cost: Less than 40Kgates for all Motion Estimation extensions

29mJ988mJEnergy for 1 second

Unaugmented core assumes base core plus 16b MUL, LOOP ops, debug, PIF at max MHz

Page 15: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

15 © 2007 Tensilica Inc.

System Issues in Mobile VideoExample: Decode Memory Bandwidth Drives Power

• Off-chip power is a major component of solution power

• Early, accurate modeling gives insight and enables novel optimization

• Non-trivial interactions:Core power increases with bit stream data rateMemory power decreases with bit stream data rate

• Design of video architecture, DMA and software reduces bandwidth by about one-third (saves 30mW)

• Additional power savings in on-chip interconnect and memory controller 0

2040

60

80

100120

140

160

60 70 80 90 100 110 120 130 140Delivered MB/s

Mob

ile D

DR p

ower

(mW

)

Estimated power for Micron MT46H16M16LF MobileDDR at 133MHz: 50% 16B block read, 50% 16B writes

0

20

40

60

80

100

120

0 2 4 6 8 10H.264 MP bitstream (Mbps)

Mem

ory

Band

wid

th (M

B/s)

Memory Bandwidth Needs

Memory Bandwidth Power Cost

Page 16: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

16 © 2007 Tensilica Inc.

Move Up to Fully Programmable HD Approach: five small video task engines + audio

Off-chip DRAM controller

Processor InterFace (PIF) connect

Off-chip DDR

Bridge to legacy system bus

AHB, AXI, OCP, CoreConnect etc.

BitstreamGP scalar with

CABAC/CAVLC accel

InstCache Data

RAM

MotionVector

GP FLIX

InstCache

DataRAM

1K motionvectors

PredictionPixel SIMD

InstCache

DataRAM

ResidualSmall SIMD

InstCache

DataCache

DeblockingPixel SIMD

InstCache

DataRAM

Hifi2 AudioEngine

InstCache

DataRAM

3-4 computed macroblocks:

<1.5KBcomputedresiduals<100B

Residual data on 6-8 macroblocks

2-4KB20-30 motion

vectors

~32B

bits

tream

data

Bits

tream

addr

Blo

ck d

ata

Boundary strength data on5-7 macroblocks

Pixel positions: 3-5 macroblocks

Intra

fram

ead

dr

Intra

fram

eda

ta

DataCache

3KB

DataCache

Optional Audio

Res

ult

addr

ess

& d

ata

Page 17: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

17 © 2007 Tensilica Inc.

Baseband DSP Needs Extensible Processors

Optimized baseband architecture with DSP-optimized task-engines

• Explosion of standards and multiple radios dictates agile-but-efficient baseband design

• Each processor extensible to handle specific algorithm kernels with RTL-like power and throughput

• Easy composition of engines into efficient pipeline

• Unified hardware and software development model

Baseband Processing Engine

MA

C

proc

esso

r

Enc

rypt

enco

de

proc

esso

r

Mod

ulat

ion

Dem

odul

atio

npr

oces

sor

Ana

log

I/FA

nalo

g I/F

Ana

log

I/F

Xtensa DSP Foundation

Ker

nel A

ccel

Pack

age

Ker

nel A

ccel

Pack

age

Ker

nel A

ccel

Pack

age

Ker

nel A

ccel

Pack

age

StreamingI/O

Interfaces

Special FunctionMemories

~100xTI C64xLDPC

2xRISCUpper layer processing4xRISCSSL

300xRISCAES43xRISC3-DESSecurity

18xTI C62xTurbo decoding

3xTI C64xViterbiError Coding

3xTI C64xFFT4xTI C64xFIRModulation

(e.g. OFDM)

Efficiency (cycle reduction)

VersusFunctionDomain

Page 18: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

18 © 2007 Tensilica Inc.

System Level Design MattersExample: Audio System Optimization

System-level design tools make a big difference• Tensilica supports wealth of system modeling:

ISS, TurboXim, XTMP C modeling, XTSC System C modeling, CoWare, Seamless

• Tensilica Xenergy Energy Estimator provides rapid, accurate configuration-specific application power analysis

• Example: Audio system organization analysis:

HiFi2 coreInst

cache

On-chip SRAM

Mobile DDR (32M x 16b)

Data cache

PIF

AudioDACs

DDR/Flash controller

Other sub-systems

PLLs & clock trees

0.250.3616K/32K DM + DDR

0.190.5616K/32K + 128K on-chip + DDR0.150.6516K/32K DM + 192K on-chip

0.160.654K/8K DM + 192K on-chip0.941.004K/8K DM + DDR

90G high-vt90GSystem Organization Power includes dynamic and static at minimum WMA decode MHzPower includes HiFi2 core, I cache, D cache, On-chip SRAM, DDR (no refresh)I/O through queues

Relative power

Page 19: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

19 © 2007 Tensilica Inc.

MPSOC System Design is CollaborativeExample: Xtensa in CoWare Platform Architect

Configured subsystems with rich interfaces:

• AMBA bus• Local memories• Caches• Special lookup

memories• Input queues• Output queues• Input wires• Exported state

wires• Interrupts

• Wrapper automatically generated from processor configuration

Page 20: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

20 © 2007 Tensilica Inc.

Putting It TogetherThe principles

• Multimedia SOC is not “rocket science”: hundreds of designs underway around the world

• Pay attention to best practices:1.Leverage existing design

standards, tools, interconnect and subsystem IP

2.Model early and often. Cost, performance, power largely set in architectural phase

3.Application standards explode. Build in programmability everywhere

4.Look at total bandwidth and total power, especially in off-chip communications

• Differentiation, time-to-market and high efficiency are compatible!

On-chip interconnect

Video

DSPI/O

inte

rface

s System CPU

Audio

Mem

ory

cont

rolle

r

RF

AudioDAC

MA

C

proc

esso

r

Enc

rypt

/enc

ode

proc

esso

r

Mod

ulat

ion

proc

esso

r

Fetch/Decode

I-RAM

Reg File

ALU

Branch Unit

External I/F

Load/Store D-RAM 2

Brid

ge

(opt

.)

VLC

Accel.

I-Cache

Audio ALU

Extended

Audio MAC 24bMisc.

AudioFunctio

ns

Que

ue

I/F

SLOT 1 SLOT 0

Audio Reg File

D-RAM 1D-CACHE

Fetch/Decode MMU

Reg File

ALU

Branch Unit

External I/F

Load/Store MMU

32

PIF

Brid

ge

(opt

.)

32

AMBA AHB lite

MAC 16 DSP

I-Cache

D-Cache

Page 21: Putting MPSOC to Work in Multimedia · © 2007 Tensilica Inc. Sample of HiFi2 Audio Software ... SRS TruSurround HD AMR NB codec ... Turbo TI C62x 18x decoding

21 © 2007 Tensilica Inc.Thank you!