XStream: Rapid Generation of Custom Processors for ASIC Designs
Binu Mathew
* ASIC: Application Specific Integrated Circuit
2
Overview
- What is XStream?
- Comparison to Network Processors
- Design Flow
- Design Example: Ethernet Bridge/VLAN Switch
3
What is XStream?
- Software tool to rapidly generate high-performance custom stream processors
  - Stream processing: repeated application of an algorithm kernel to a sequence of packets, subject to throughput specifications
- Resulting custom processors:
  - 40-90% of the performance of a custom ASIC
  - < 5% of the design effort of a custom ASIC
- Rapidly develop your own ultra-high-performance network processors!
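As a rough software illustration of the stream processing definition above, the sketch below applies one kernel to every packet in a sequence. The names `run_stream` and `swap_mac_addresses` are hypothetical and not part of XStream, and the throughput specification is not modeled here.

```python
from typing import Callable, Iterable, Iterator


def run_stream(kernel: Callable[[bytes], bytes],
               packets: Iterable[bytes]) -> Iterator[bytes]:
    """Apply the same kernel to every packet, in arrival order."""
    for packet in packets:
        yield kernel(packet)


def swap_mac_addresses(frame: bytes) -> bytes:
    """Hypothetical kernel: swap the 6-byte destination and source MAC fields."""
    return frame[6:12] + frame[0:6] + frame[12:]


# Two dummy 14-byte Ethernet headers stand in for a packet stream.
frames = [bytes(range(14)), bytes(range(14, 28))]
out = list(run_stream(swap_mac_addresses, frames))
```

In hardware the kernel would be pipelined so that a new packet can start before the previous one finishes; the loop here only shows the "same kernel, every packet" structure.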
4
When you use a Network Processor
- What your product looks like
- What your competitor’s product looks like
5
XStream vs Network Processor
- What if my application does not look like this?
6
XStream vs Network Processor
- What if my application does not look like this?
  - Network Processor: No help
  - XStream: Make a system that looks like my app in days
7
XStream vs Network Processor
- What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
8
XStream vs Network Processor
- What if I want to use cheaper DDR2 instead of RDRAM, or need more bandwidth?
  - Network Processor: No help
  - XStream: Select a different controller from the GUI and plop it on the chip
9
XStream vs Network Processor
- What if I need
  - Different type/number of micro-engines
  - More capable control processor
  - Additional high-performance processors for value-added services
  - More crypto cores
  - Different trie lookup hardware
  - Different DRAM bandwidth
  - Etc., etc., etc.
- Network Processor: No help
- XStream: Yes
10
Design Flow
- Draw an architecture diagram for your application
- Select processors, interfaces, IP blocks, etc. from a GUI
- Specify parameters, throughput requirements, etc.
- Specify the high-level function of any additional custom coprocessors you need
- Press a button and wait...
- XStream generates the hardware for you
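The slides do not show what XStream's design input looks like beyond the GUI, so the following is only a hypothetical sketch of how the choices made in this flow (blocks, parameters, a throughput requirement) could be captured as data. `Block`, `DesignSpec`, the block names, and the 10 Gb/s figure are all illustrative placeholders, not XStream's format.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    kind: str                      # "processor", "interface", "ip" or "coprocessor"
    params: dict = field(default_factory=dict)


@dataclass
class DesignSpec:
    throughput_gbps: float         # required throughput (placeholder value below)
    blocks: list = field(default_factory=list)

    def add(self, name: str, kind: str, **params) -> "DesignSpec":
        """Record one selected block and its parameters; returns self for chaining."""
        self.blocks.append(Block(name, kind, params))
        return self

    def summary(self) -> list:
        """One human-readable line per selected block."""
        return [f"{b.kind}: {b.name} {b.params}" for b in self.blocks]


# Hypothetical selections mirroring the flow: pick blocks, then set parameters.
spec = (DesignSpec(throughput_gbps=10.0)
        .add("switching", "processor", issue_width=8)
        .add("dram", "interface", type="DDR2")
        .add("filter_rules", "coprocessor", terms=64))
```

A generator tool would consume a description like this in the "press a button" step; how XStream actually does so is not described in the deck.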
11
Design Example
- Objective: design a platform chip that is shared across different products to save cost
  - Product 1: 16-port Ethernet bridge
  - Product 2: 16-port VLAN switch with advanced filtering abilities
- Major differences:
  - Wimpy ingress/egress processors are OK on the bridge
  - VLAN switch needs high-performance ingress/egress processors
  - VLAN switch needs a high-performance filter rule engine
12
XStream: Designing a Platform Chip
[Figure: block diagram of the platform chip. Blocks shown: 16 ports, each with a Link Interface, Port Ingress Processor and Port Egress Processor; Ingress Queue; Egress Queue; Crossbar; Stream Processor for Switching Decisions; Control Processor; External DRAM.]
13
The Streams in XStream
[Figure: same platform chip block diagram as slide 12.]
14
The Streams in XStream
[Figure: same platform chip block diagram as slide 12.]
15
The Streams in XStream
[Figure: same platform chip block diagram as slide 12.]
16
XStream: Mapping the core processor
[Figure: same platform chip block diagram as slide 12.]
17
XStream: Mapping the core processor...
[Figure: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
Imagine a snazzy GUI here
- Designer says:
  - Stream processor, 8 issue
  - Stream 1: Input, 16x1 queue, N deep
  - Stream 2: Output, 16x1 queue, M deep
  - Stream 3: Inout, RISC processor interface
  - Add a CAM: 2 port, 48 bit keys, 1024 entries, 4 way associative, hash=F(…)
- The tool ponders for a while…
- Says: “Yes master”
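To make the CAM parameters concrete, here is a minimal software model of a 1024-entry, 4-way set-associative lookup table with 48-bit keys (for example, MAC addresses mapped to port numbers). The slide leaves the hash as F(…), so `fold_hash` below is a stand-in chosen purely for illustration. The FIFO eviction policy is also an assumption, and the two ports are not modeled.

```python
WAYS = 4
ENTRIES = 1024
SETS = ENTRIES // WAYS          # 256 sets of 4 ways each
KEY_BITS = 48


def fold_hash(key: int) -> int:
    """Stand-in hash (assumption): XOR the six key bytes to select a set."""
    h = 0
    for shift in range(0, KEY_BITS, 8):
        h ^= (key >> shift) & 0xFF
    return h % SETS


class SetAssociativeCam:
    def __init__(self, hash_fn=fold_hash):
        self.hash_fn = hash_fn
        # Each set holds up to WAYS (key, value) pairs.
        self.sets = [[] for _ in range(SETS)]

    def insert(self, key: int, value: int) -> None:
        if not 0 <= key < (1 << KEY_BITS):
            raise ValueError("key must fit in 48 bits")
        ways = self.sets[self.hash_fn(key)]
        for i, (k, _) in enumerate(ways):
            if k == key:                 # update an existing entry in place
                ways[i] = (key, value)
                return
        if len(ways) == WAYS:
            ways.pop(0)                  # set is full: evict the oldest (FIFO, an assumption)
        ways.append((key, value))

    def lookup(self, key: int):
        # Hardware would compare all ways in parallel; a loop is enough for a model.
        for k, v in self.sets[self.hash_fn(key)]:
            if k == key:
                return v
        return None


cam = SetAssociativeCam()
cam.insert(0x001122334455, 3)            # learn: this MAC address is on port 3
port = cam.lookup(0x001122334455)
```

Because only four keys can share a set, a poor hash causes early evictions, which is presumably why the hash function is a designer-supplied parameter.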
18
XStream: Mapping the core processor...
[Figure: Ingress Queue, Egress Queue, Stream Processor for Switching Decisions]
Imagine a snazzy GUI here
- Designer writes 15 lines of code for the data plane, say in a subset of C
- Designer says: Schedule and report
- The tool ponders for a while…
- Says:
  - Compiled 45 instructions
  - Using modulo accelerator
  - Initiation interval = 8 cycles
  - Clock speed: 500 MHz
  - Throughput based on 64 byte (worst case) packet size: 500 MHz / 8 * 64 * 8 = 32 Gb/s
  - Area: 2.5mm x 2.5mm
  - Power: 1.2 W
- Single stream processor @ 500 MHz = 32 Gb/s
- Have designed up to 1 GHz processor in 0.13u process
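The throughput figure on this slide follows from the initiation interval: if a new packet can start every 8 cycles, the processor handles 500 MHz / 8 packets per second, and each worst-case packet carries 64 bytes of 8 bits. A small sketch of that arithmetic (the function name is illustrative):

```python
def throughput_gbps(clock_hz: float, initiation_interval: int, packet_bytes: int) -> float:
    """Throughput in Gb/s when one packet starts every `initiation_interval` cycles."""
    packets_per_second = clock_hz / initiation_interval
    return packets_per_second * packet_bytes * 8 / 1e9


# Values from the slide: 500 MHz clock, initiation interval 8, 64-byte packets.
reported = throughput_gbps(500e6, 8, 64)   # 32.0
```

Minimum-size packets are the worst case because the packet rate is fixed by the initiation interval, so smaller packets deliver fewer bits per interval.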
19
XStream: Mapping the ingress processor...
[Figure: same platform chip block diagram as slide 12.]
20
XStream: Mapping the ingress processor...
[Figure: Port Ingress Processor, Filter Rule Engine]
Imagine a snazzy GUI here
- Designer says:
  - RISC processor engine, no-cache, 2 issue, scratchpad memory
  - Stream 1: Input, link interface
  - Stream 2: Output, StreamProc:Ingress Queue
  - Add a Filter Rule Engine: Rule complexity = 64 terms, …
- The tool ponders for a while…
- Says:
  - RISC core and compiler generated
  - Area: 1mm x 1mm (i.e. this can be replicated 100x on a 10x10mm chip)
  - Power: 250 mW
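The replication claim is simple tiling arithmetic, sketched below. The 16-port power total is derived here from the per-core figure and is not stated in the slides; it also ignores the egress processors and everything else on the chip.

```python
def max_replicas(chip_w_mm: float, chip_h_mm: float,
                 core_w_mm: float, core_h_mm: float) -> int:
    """How many whole cores tile a chip, ignoring routing, pads and other blocks."""
    return int(chip_w_mm // core_w_mm) * int(chip_h_mm // core_h_mm)


replicas = max_replicas(10, 10, 1, 1)     # 100, as on the slide

# Derived estimate (not from the slides): one 250 mW ingress core per port, 16 ports.
ingress_power_mw = 16 * 250
```

Under these assumptions the 16 ingress processors of the example design would occupy 16 mm² and draw about 4 W in total.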
21
Summary
- Showed network processor design
  - But might as well be multi-media or wireless product design
- Very high performance custom processors replace ASIC modules
  - Reduce design time for stream-oriented ASIC modules by 95%
  - Retain 40-90% of ASIC performance
- Software replaces hardware design
  - Software prototype already exists
  - Flexible, fast bug fixes, feature upgrades
  - Share chip across product family