Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing

Jun-Dong Cho, Sungkyunkwan University

Contents
• Reconfigurable platforms
• Software Defined Radio
• SW/HW co-design case studies
• SW/HW co-design tools
• Network on Chip
• Laboratory introduction
• Research proposal

SoC and Customizable Platform-Based Design

[Figure: SoC platform block diagram. Blocks: reconfigurable hardware (coarse grain), reconfigurable hardware (fine grain), ASIC 1, ASIC 2, DSP, controller CPU, RAM, ROM, Flash]

Semiconductor Revolutions: Makimoto's Wave

[Figure: Makimoto's wave timeline, 1957 to 2007 in ten-year steps. Labels: TTL; µproc., memory; LSI, MSI; ASICs, accel's; FPGAs; coarse grain; soft CPUs. Annotations: hardware people, CS people, new breed needed]

Abstract
• iSoC is an on-chip communication architecture intended to improve the scalability and flexibility of SoC design
• Dynamic configuration
• The regular, flexible structure of iSoC provides a predictable framework for modeling the traffic, power, speed, and area requirements of global communication

IBM's CoreConnect

Bandwidth extended from the initial 32 bits up to 128 bits

Sonics Smart Interconnect IP

SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
• Plug-and-play on-chip communications network
• Packet-based
• 50 employees in a year
• Provides IP and a design environment; supports SoC design
• Partnered with Cadence
• SiliconBackplane III targets communications plus media

Nexperia Digital Video Platform

• Designing the initial platform, along with the pnx8500, wasn't quick and easy.

• It involved about 300 hardware, software and systems people working between 1999 and 2001, of which 60 were involved with hardware.

Development Direction
• Expand communication bandwidth to meet the large burst data transfer demands of a growing range of multimedia products
• Dual-core architecture (ARM + DSP)

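On-Chip Network Architecture
● Develop router/scheduler algorithms
● Design and verify a network model using SystemC
● Design core IP for star and mesh on-chip networks
● Design master/slave network interfaces and a high-performance memory management interface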

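On-Chip-Network-Based SoC Design Platform and Design Environment
● Develop tools for generating distributed crossbar switch topologies and mapping IP onto them
● Develop an IP-to-mesh-tile mapping tool
● Develop a network topology generation tool based on analysis of data flow between IPs; build the SoC platform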
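The IP-to-mesh-tile mapping step can be illustrated with a small sketch. This is not the tool described on the slide; it assumes the data-flow analysis yields a traffic weight per IP pair, and it simply tries every placement on a small mesh, scoring each by traffic-weighted hop count. The IP names, traffic figures, and the function `map_ips_to_tiles` are all hypothetical.

```python
from itertools import permutations

def manhattan(a, b):
    """Hop count between two tiles on a mesh (row, column coordinates)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def map_ips_to_tiles(ips, traffic, rows, cols):
    """Exhaustively search for the placement with the lowest traffic-weighted hop count.

    ips     -- list of IP names (hypothetical)
    traffic -- dict {(src, dst): words per frame}, assumed output of data-flow analysis
    """
    tiles = [(r, c) for r in range(rows) for c in range(cols)]
    if len(ips) > len(tiles):
        raise ValueError("more IPs than tiles")
    best_cost, best_placement = None, None
    for chosen in permutations(tiles, len(ips)):
        placement = dict(zip(ips, chosen))
        cost = sum(w * manhattan(placement[s], placement[d])
                   for (s, d), w in traffic.items())
        if best_cost is None or cost < best_cost:
            best_cost, best_placement = cost, placement
    return best_cost, best_placement

# Illustrative traffic figures only.
ips = ["cpu", "dsp", "mem", "video"]
traffic = {("cpu", "mem"): 10, ("dsp", "mem"): 40,
           ("video", "dsp"): 30, ("cpu", "video"): 1}
cost, placement = map_ips_to_tiles(ips, traffic, rows=2, cols=2)
print(cost, placement)
```

Exhaustive search is only practical for a handful of tiles; a real mapping tool would presumably use a heuristic, which the slide does not specify.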

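Application Areas
- Supports protocols that guarantee QoS, making it suitable for real-time applications and applications that require high data bandwidth
- Multimedia SoCs, portable and communication terminals, Internet set-top boxes, game consoles, and the system-level chips needed to implement network terminal products
- SoC design for high-volume multimedia applications such as high-frame-rate video and 3D graphics
- Build a platform-based design environment that combines the core on-chip network IP and design support tools into a single platform, and use it across a variety of SoC designs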

Recent Research Trends
• Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor)
• Reconfigurable Baseband Processing, Picochip
• Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
• A configurable platform: Altera Excalibur, Xilinx Virtex FPGA
• Adaptive Computing Machine, Quicksilver Tech.
• Mercury, Sky, Galileo, Tundra (crossbars, bridges)
• Virginia Tech's reconfigurable hardware

66% of chips are not OK on first silicon (2004)

Mid-90s: 6 months late => 31% earnings loss
Today: 3 months late = $500M loss

HIERARCHY OF PLATFORMS

Full application platform
• Users design full applications on top of hardware and software architectures
• Nexperia
• Texas Instruments' OMAP multimedia platform
• Infineon's M-Gold 3G wireless platform
• Parthus' Bluetooth platforms
• ARM's PrimeXsys wireless platform

Processor-centric platform
• Focuses on access to a configurable processor but doesn't model complete applications
• Improv Systems
• ARC
• Tensilica
• Triscend

Communication-centric platform
• Provides an interconnect architecture but doesn't typically provide a processor or a full application
• Sonics' SiliconBackplane
• PalmChip's CoreFrame architectures

Fully programmable platform
• Consists of FPGA logic and a processor core
• Altera's Excalibur, Xilinx's Virtex-II Pro, and QuickLogic's QuickMIPS
• Xilinx-IBM XBlue architecture

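Five Tiers of SDR Solutions

Tier 0: Traditional hardware implementation
Tier 1: SCR (software controlled radio). Control functions for multiple hardware elements are implemented in software.
Tier 2: SDR (software defined radio). Modulation and baseband processing are implemented in software; multi-frequency RF is implemented in fixed-function hardware.
  Sand-Bridge (ARM + 4 DSPs)
Tier 3: ISR (ideal software radio). Programmability is extended through an RF implementation that performs analog conversion at the antenna.
Tier 4: USR (ultimate software radio). In addition to digital processing capability, provides fast switching between communication protocols (within a few milliseconds).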

Introduction
• Wireless processing systems require high throughput and heavy computation, yet face strict power constraints
• Reconfigurable SoC implementations seek performance gains through parallelism and rely on IP reuse
• Algorithm partitioning guided by performance prediction based on hot-spot bottlenecks (traffic)

Introduction
• Scheduled interconnect

– Link utilizations are substantially smaller than on a bus, since communication is distributed and pipelined throughout the system.

– Eliminates the congestion caused by the bus and the header overhead present in dynamic routing.

• Reconfigurable Architecture Workstation (RAW) project has re-examined static communication as a mechanism for general-purpose computing.

• A regular interconnect structure and static scheduling eliminate unnecessary interconnect switching
• Balancing the computational load across all cores improves performance
• Overhead of the configuration streams
– Configuration streams must be scheduled periodically along with the data
– Configuration streams use 4% of the bandwidth
• Depending on data content variation and the system operating environment, the core interfaces and the cores themselves are dynamically reconfigured into low-power modes

Scheduled Communication
• A tiled architecture
• Each tile is a computational core, and the core interfaces together form the network
• The core interface supports heterogeneous processing that takes place in one or more tiles
• The system is connected by a statically scheduled interconnect mesh
• Data moves between neighboring tiles through a communication pipeline, which allows a fast clock rate and time-division sharing of interconnect resources
• The reconfigurability of the cores and of the runtime interconnect enables dynamic power management
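A minimal sketch of statically scheduled, time-shared mesh links follows. It assumes XY routing, one hop per clock cycle, and a repeating schedule of fixed length; none of these details are given on the slide, and the stream names and the functions `xy_route` and `schedule_streams` are hypothetical. Each stream reserves a (link, cycle) slot per hop, so no link carries two streams in the same cycle.

```python
def xy_route(src, dst):
    """Tiles visited from src to dst, moving along the column axis first, then the row axis."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    while c != c2:
        c += 1 if c2 > c else -1
        path.append((r, c))
    while r != r2:
        r += 1 if r2 > r else -1
        path.append((r, c))
    return path

def schedule_streams(streams, period):
    """Reserve (link, cycle) slots so that no link carries two streams in one cycle.

    streams -- dict {name: (src_tile, dst_tile)}
    period  -- length of the repeating schedule in cycles (assumed)
    Returns {name: (start_cycle, path)}.
    """
    reserved = set()  # entries are (from_tile, to_tile, cycle mod period)
    result = {}
    for name, (src, dst) in streams.items():
        path = xy_route(src, dst)
        links = list(zip(path, path[1:]))
        for start in range(period):
            slots = [(a, b, (start + hop) % period)
                     for hop, (a, b) in enumerate(links)]
            if not any(slot in reserved for slot in slots):
                reserved.update(slots)
                result[name] = (start, path)
                break
        else:
            raise RuntimeError(f"no free slot for stream {name!r}")
    return result

# Hypothetical streams on a 2x3 mesh.
streams = {
    "src_to_fft": ((0, 0), (0, 2)),
    "src_to_fir": ((0, 0), (0, 1)),  # shares the first link with src_to_fft
    "fft_to_out": ((0, 2), (1, 2)),
}
schedule = schedule_streams(streams, period=4)
for name, (start, path) in schedule.items():
    print(name, "starts at cycle", start, "via", path)
```

In this example the second stream is pushed back by one cycle because its first link is already taken in cycle 0, which is the time-division sharing the slide refers to.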

Adaptive System on Chip

Communication Interface

- Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
- The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
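To make the "set of instructions for its associated interconnect memory" concrete, here is a hedged sketch. It assumes each instruction is a (cycle, input port, output port) entry and that ports are named N, S, E, W, and CORE; the actual instruction format is not given in the slides, and `interconnect_memory` is a hypothetical helper.

```python
from collections import defaultdict

def direction(a, b):
    """Port on tile a that faces neighboring tile b (rows grow southward)."""
    step = (b[0] - a[0], b[1] - a[1])
    return {(0, 1): "E", (0, -1): "W", (1, 0): "S", (-1, 0): "N"}[step]

def interconnect_memory(routes, period):
    """Build per-tile instruction tables: cycle -> list of (in_port, out_port).

    routes -- dict {stream: (start_cycle, [tile, tile, ...])}, one hop per cycle (assumed)
    """
    memory = defaultdict(lambda: defaultdict(list))
    for start, path in routes.values():
        for hop, tile in enumerate(path):
            # The first tile takes data from its own core; the last hands it to its core.
            in_port = "CORE" if hop == 0 else direction(tile, path[hop - 1])
            out_port = "CORE" if hop == len(path) - 1 else direction(tile, path[hop + 1])
            memory[tile][(start + hop) % period].append((in_port, out_port))
    return memory

# A hypothetical stream already scheduled to start at cycle 0.
routes = {"src_to_fft": (0, [(0, 0), (0, 1), (0, 2)])}
memory = interconnect_memory(routes, period=4)
for tile in sorted(memory):
    print(tile, dict(memory[tile]))
```

Each tile's table is indexed by cycle within the schedule period, so the interface can replay it cyclically without inspecting packet headers.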

9-core and 16-core Mode

Evaluation Methodology

Performance of the Benchmarks

Dynamic Power Management

• Dynamic power management saves power by lowering clock frequency, using separate clock domains that follow the run-time variation in data content
• In the DCT implementation, high-order bits whose computed values do not change are bypassed to eliminate switching
• Links are connected only for valid stream data, eliminating unnecessary switching
• Prefetch many frames in an optimal-sized buffer

Dynamic Power Management

• Reconfigurable clock-based system balancing creates an environment of just-in-time computing, which can reduce overall power usage.

• Taking advantage of interconnect flexibility allows a system to dynamically change functionality and avoid unused computational units.

• Interconnect power consumption is low and the overhead due to configuration streams is under 10% for both bandwidth and power.
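The just-in-time idea can be sketched as picking, for each core, the slowest clock that still meets the frame deadline. The sketch assumes a fixed supply voltage and cores that keep clocking while idle, so average clocking power is taken as proportional to frequency. The core names, cycle counts, clock list, and 30 frames/s deadline are illustrative values, not figures from the slides.

```python
def pick_clocks(cycles_per_frame, frame_time_s, available_hz):
    """For each core, choose the slowest available clock that finishes one frame in time."""
    choice = {}
    for core, cycles in cycles_per_frame.items():
        feasible = [f for f in sorted(available_hz) if cycles / f <= frame_time_s]
        if not feasible:
            raise ValueError(f"{core} cannot meet the deadline at any available clock")
        choice[core] = feasible[0]
    return choice

def relative_dynamic_power(choice, f_max):
    """Clocking power relative to running every core at f_max (assumes P proportional to f)."""
    return sum(choice.values()) / (f_max * len(choice))

# Illustrative workloads (cycles per frame) and clock options.
cycles = {"dct": 2_000_000, "me": 6_500_000, "vlc": 900_000}
clocks = [25e6, 50e6, 100e6, 200e6]
choice = pick_clocks(cycles, frame_time_s=1 / 30, available_hz=clocks)
saving = 1 - relative_dynamic_power(choice, max(clocks))
print(choice, f"saving vs. all cores at max clock: {saving:.1%}")
```

Under these assumptions only the heaviest core runs at the top clock, and the lighter cores are slowed to match the frame rate.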

Power Metric
• Based on network activity and HSPICE circuit simulation of the interconnect, the network power consumption (P_int) is:

[Equation for P_int not preserved in the extracted text]

T: the number of tiles
P_IF/D: the overhead of the instruction memory fetch and decode
s: the stream number
N_vs and N_ivs: the numbers of valid and invalid transfers for stream s
P_s: the power consumed in transferring 1 bit through stream s

iSOC Compiler

• Divides applications into parts, each of which fits into a specific core.

• Determines data communications between the cores in a space-time fashion.

• Generates interconnect memory contents for each individual interface.

References
• Jian Liang, Sriram Swaminathan, and Russell Tessier, "aSOC: A Scalable, Single-Chip Communications Architecture," Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003. {jliang, tessier}@ecs.umass.edu
• Krishna Sekar, Kanishka Lahiri, and Sujit Dey, "Configurable Platforms With Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips," Dept. of ECE, UC San Diego, La Jolla, CA, and NEC Laboratories America, Princeton, NJ.


OMAP™ (Open Multimedia Application Platform)

• The OMAP architecture includes SW/OS support that can control clocking and idle modes for the entire platform.

• The dual-core architecture makes it possible to assign each task to the processor best suited to it.

Memory vs Reused-IP

ED²

• SMT (Simultaneous Multi-Threading): 20% speed-up and 24% power overhead, measured using PowerTimer, a PowerPC simulator
• Slow-down using DVS: 10% energy gain; scheduling: 15% increase in energy saving
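As a worked check of the SMT figures, the sketch below reads the slide title as the energy-delay-squared metric (energy × delay²) and interprets "20% speed-up" as execution time divided by 1.2 and "24% power overhead" as power multiplied by 1.24. These readings are assumptions, and `ed2_ratio` is a hypothetical helper.

```python
def ed2_ratio(speedup, power_ratio):
    """Energy x delay^2 relative to the baseline (baseline = 1.0)."""
    delay = 1.0 / speedup          # relative execution time
    energy = power_ratio * delay   # energy = power x time
    return energy * delay ** 2

smt = ed2_ratio(speedup=1.20, power_ratio=1.24)
print(f"SMT ED^2 relative to baseline: {smt:.3f}")
```

Under these assumptions the ratio is about 0.72, so the speed-up outweighs the extra power when the metric weights delay quadratically.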

Time-Space Exploration
• Enumerate all trade-offs and select the one with the most benefit.
• Branch-and-bound method for estimating every SoC metric.
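A minimal branch-and-bound sketch of this kind of exploration is shown below. It assumes each module has a few implementation options with an estimated area and power, and it minimizes total power under an area budget. The module names, numbers, and the choice of these two metrics are illustrative; the slide does not specify the cost model.

```python
def branch_and_bound(modules, area_budget):
    """Pick one option per module, minimizing total power within an area budget.

    modules -- list of option lists; each option is (label, area, power)
    Returns (best_power, [labels]) or (inf, None) if nothing fits.
    """
    n = len(modules)
    # Lower bounds on what the not-yet-decided modules must still cost.
    min_area = [0.0] * (n + 1)
    min_power = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        min_area[i] = min_area[i + 1] + min(a for _, a, _ in modules[i])
        min_power[i] = min_power[i + 1] + min(p for _, _, p in modules[i])

    best = {"power": float("inf"), "choice": None}

    def visit(i, area, power, picked):
        if area + min_area[i] > area_budget:       # can never fit: prune
            return
        if power + min_power[i] >= best["power"]:  # can never beat the incumbent: prune
            return
        if i == n:
            best["power"], best["choice"] = power, list(picked)
            return
        # Try low-power options first so a good incumbent is found early.
        for label, a, p in sorted(modules[i], key=lambda opt: opt[2]):
            picked.append(label)
            visit(i + 1, area + a, power + p, picked)
            picked.pop()

    visit(0, 0.0, 0.0, [])
    return best["power"], best["choice"]

# Hypothetical options: (label, area, power).
modules = [
    [("fft_asic", 4.0, 1.0), ("fft_dsp", 1.0, 3.0)],
    [("viterbi_asic", 3.0, 1.5), ("viterbi_fpga", 2.0, 2.5)],
    [("ctrl_cpu", 1.0, 1.0)],
]
power, choice = branch_and_bound(modules, area_budget=7.0)
print(power, choice)
```

The two pruning tests are what distinguish this from plain enumeration: a partial choice is abandoned as soon as its optimistic completion either exceeds the area budget or cannot improve on the best design found so far.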

Jiang Xu and Wayne Wolf, Princeton University

• First decide on an architecture, and assign estimated requirements to unavailable modules.
• Adjust the requirements using performance analysis in a trial-and-error fashion.
• Based on the requirements, purchase IP cores and design customized modules.
• Several iterations may be needed to reach a final design.
• It is very helpful if designers can get performance models of IP cores before buying them.
• Cadence Virtual Component Co-design (VCC)

A Multimedia Embedded Chip