Applied Micro Circuits Corporation Challenges in Making ...

Challenges in Making Highly Integrated Network Processors

Keith Morris
Director of Marketing, Switching & Network Processing
[email protected]

Applied Micro Circuits Corporation

CONFIDENTIAL


Agenda

• NPU Definition
• Challenges & Solutions
  • Physical
  • Performance
  • Flexibility
  • Scalability & Re-use
• Putting it all together – Integrated NPU

NPU Definition

[Diagram: a software-programmable NPU at the center, connected to:
• Memory: context & queuing (big & fast)
• Co-processors
• Networking interfaces: Fast Ethernet, Gigabit Ethernet, SPIx
• Back plane / fabric interface
• Special functions & host CPU]

Context Handling – The Ultimate Bottleneck

[Diagram labels: NPU or ASIC performance; read & write time of flow context; application performance, which varies by application & NPU/ASIC architecture]

Challenges & Solutions

Challenge: Physical Constraints

• System power dissipation
• Small silicon die area
• 40 ns packet arrival rate

Efficient nPcore architecture
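The 40 ns figure is the time budget per packet. The slide does not say which line rate or packet size it comes from, so the sketch below uses assumed values: 50 bytes on the wire at 10 Gb/s gives the same interval, and the 250 MHz core clock is purely hypothetical.

```python
def packet_interval_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time between back-to-back packets, in nanoseconds."""
    bits = packet_bytes * 8
    return bits / line_rate_gbps  # bits / (Gbit/s) comes out in ns


def cycle_budget(interval_ns: float, clock_mhz: float) -> float:
    """Clock cycles one core can spend before the next packet arrives."""
    return interval_ns * clock_mhz / 1000.0


interval = packet_interval_ns(50, 10.0)  # assumed: 50 bytes at 10 Gb/s
cycles = cycle_budget(interval, 250.0)   # assumed: 250 MHz core clock
print(f"{interval:.1f} ns per packet, {cycles:.1f} cycles per packet per core")
```

Under these assumptions a single core has only about ten cycles per packet, which is why raising the clock alone does not close the gap.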

Networking Optimized Execution Pipeline

[Pipeline diagram: cycles T0 to T5 across the top; several overlapping instruction streams, each shown with stages labeled Task Sel, Fetch, Decode, Read, Execute and Wr Back]

• Case statements and conditional jumps do not cause pipeline breakages
• Extremely important for deterministic performance in decision-rich data path processing
• Overlapping virtual processors

Zero Cycle Task Switching – Hides Latency

[Diagram: a pool of virtual processors (0, 1, 2, ... n) issuing work through a Command FIFO and a Request FIFO to Resource 1 and Resource 2, drawn against a time axis that marks the elapsed time for the red packet task]
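The latency-hiding effect can be illustrated with a small cycle-by-cycle model. This is a sketch under assumptions, not the nPcore scheduler: every task repeats a fixed compute burst followed by a fixed wait for a resource, the core runs any ready task, and a switch costs zero cycles. All numbers are made up.

```python
def simulate(num_tasks, compute_cycles, wait_cycles, bursts):
    """Return (total_cycles, busy_cycles) for one core.

    Each task repeats `bursts` times: run for `compute_cycles`, then wait
    `wait_cycles` for a resource to reply. Each cycle the core runs one
    ready task; switching between tasks costs nothing.
    """
    remaining = [bursts] * num_tasks          # bursts left per task
    left_in_burst = [compute_cycles] * num_tasks
    ready_at = [0] * num_tasks                # first cycle the task may run
    cycle = busy = 0
    while any(remaining):
        runnable = [t for t in range(num_tasks)
                    if remaining[t] and ready_at[t] <= cycle]
        if runnable:
            t = runnable[0]
            busy += 1
            left_in_burst[t] -= 1
            if left_in_burst[t] == 0:         # burst done: issue request, wait
                remaining[t] -= 1
                left_in_burst[t] = compute_cycles
                ready_at[t] = cycle + 1 + wait_cycles
        cycle += 1
    return cycle, busy


for tasks in (1, 5):
    total, busy = simulate(tasks, compute_cycles=2, wait_cycles=8, bursts=3)
    print(f"{tasks} task(s): busy {busy} of {total} cycles")
```

With these values a single task keeps the core busy for 6 of 22 cycles, while five tasks keep it busy for all 30: the waits of one task are filled by the compute bursts of the others.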

Frequency, Power and Die Area for nPcores

[Chart: normalized value (0 to 80) by year, 1998 to 2002, for four series: Technology (um), Frequency (MHz), Power Performance (MIPS/power) and Die Area Performance (MIPS/mm^2)]

30–70 fold increase in NPU performance ratios in 4 years

Challenge: Scaling Processing Performance

Pipeline – Multi-Stage
• Complex software partitioning
• Very sensitive to latency
  • Each stage must complete within the smallest packet time
• One stage becomes the bottleneck
• Poor scalability
  • Eventually some stages may need to become parallel

Parallel – Single Stage
• Allows simplest programming model
  • Run to completion
• Not sensitive to latency
  • Elapsed time to process a packet can be very long if required
• Highly scalable
  • 100's of tasks, multiple processors
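A back-of-the-envelope comparison shows why one stage becomes the bottleneck in the pipelined model. The sketch below is an idealized model with hypothetical stage times; it ignores hand-off and locking costs, which would penalize the pipeline further.

```python
def pipeline_throughput(stage_times_ns):
    """Packets per microsecond with one processor per stage.

    The slowest stage sets the rate for the whole pipeline.
    """
    return 1000.0 / max(stage_times_ns)


def parallel_throughput(stage_times_ns, processors):
    """Packets per microsecond when every processor runs all stages
    to completion on its own packet."""
    return processors * 1000.0 / sum(stage_times_ns)


stages = [20.0, 50.0, 30.0]  # hypothetical per-stage processing times (ns)
print(pipeline_throughput(stages))     # three processors, one per stage
print(parallel_throughput(stages, 3))  # the same three processors in parallel
```

With the same three processors, the pipeline is limited by its 50 ns stage, while the run-to-completion arrangement uses all of the available processing time.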

Multi-Stage – Complex and Inefficient

[Diagram: application code split into program segments A, B, ... N, each mapped to one of core 1, core 2, ... core N with N threads per core; additional code is needed to stitch each segment together]

Developer must:
• Subdivide algorithm
• Load-balance segments
• Stitch segments together

Performance issues with:
• Underutilized cores
• 1 segment can overrun next
• Hand-offs between cores
• Locking shared resources

AMCC: 1-Stage – Fundamentally Simpler

[Diagram: nPcore 1, nPcore 2, ... nPcore N, each running Thread 1, Thread 2, ... Thread N, all loaded with the same program image, excerpted here:]

    ;; cannot handle.
    ;;
    ;; An encapsulation header is used to communicate the
    ;; source-port-id and the errno value as follows:
    ;;
    ;; +====+====+====+====+====+====+====+====+
    ;; | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 |
    ;; +====+====+====+====+====+====+====+====+
    ;; | 81 | 00 |  S-PORT |  ERRNO  |   N/A   |
    ;; +----+----+----+----+----+----+----+----+
    ;;
    ;;===================================================================

    pkt_fwd_to_cpu

    #define DB_HDL r6

    ;; retrieve the control connection switch header from the nPkernel
    ;; unicast port table.

    set_entry_db ucast_port_table, DB_HDL, APP_CPU_PORTID
    read_first_db ucast_port_table, DB_HDL, r0

    ;; setup the encap header

• Same program image loaded on each thread
• A packet runs to completion on a single thread
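The encapsulation header drawn in the microcode comment can be modeled in a few lines. This is a sketch, not AMCC code: it assumes that S-PORT and ERRNO are each 16 bits wide, that the last two bytes are unused, and that the fields are in network byte order. The function names are hypothetical.

```python
import struct

ENCAP_TAG = b"\x81\x00"  # bytes _0 and _1 in the header diagram


def build_encap_header(source_port: int, errno: int) -> bytes:
    """Build the 8-byte header: tag, S-PORT, ERRNO, two unused bytes."""
    # Assumed layout: 16-bit S-PORT, 16-bit ERRNO, 2 pad bytes (N/A).
    return ENCAP_TAG + struct.pack("!HH2x", source_port, errno)


def parse_encap_header(header: bytes):
    """Return (source_port, errno) from an 8-byte encapsulation header."""
    if len(header) != 8 or header[:2] != ENCAP_TAG:
        raise ValueError("not an encapsulation header")
    source_port, errno = struct.unpack("!HH2x", header[2:])
    return source_port, errno


header = build_encap_header(source_port=3, errno=5)
print(header.hex(), parse_encap_header(header))
```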

Challenge: Flexibility vs. Performance

[Chart: solution flexibility (up to 100%) plotted against solution efficiency (up to 100%); a 100% software implementation (CPU) and a 100% hardware implementation (ASIC) mark the two extremes, with the NPU target operating range between them]

Solution: Co-Processors

[Block diagram: nPcore connected over a channel with Req FIFO, Cmd FIFO and Data FIFO to the Input Controller, Output Controller, Transform, Memory, Policy Engine, Search Machine and SPU blocks, plus an XSC I/F leading off chip]

High Speed Single-Step Packet Classification

[Diagram: software code blocks for IPv4, IPv6 and OAM cell, reached through a chain of tests, If-else (1) to If-else (4):
• Is this a cell or a packet?
• If a cell, which type?
• L2 or L3 packet?]

• Policy Engine saves many instructions compared to using RISC software to classify packets!
• Policy Engine also simplifies programming, making it modular
• RISC for classification is instruction hungry, not modular
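The contrast between the two approaches can be sketched in software. The if-else version mirrors the chain of questions on the slide; the table version stands in for a single-step classifier. The dictionary is only an analogy: the slides do not describe how the Policy Engine works internally, and the field names are hypothetical.

```python
def classify_if_else(unit):
    """RISC-style classification: one test after another."""
    if unit["is_cell"]:                 # Is this a cell or a packet?
        if unit["cell_type"] == "oam":  # If a cell, which type?
            return "oam_cell"
        return "data_cell"
    if unit["layer"] == 3:              # L2 or L3 packet?
        if unit["ip_version"] == 6:
            return "ipv6"
        return "ipv4"
    return "l2"


# Single-step alternative: one lookup keyed on all fields at once.
POLICY_TABLE = {
    (True, "oam", None, None): "oam_cell",
    (True, "data", None, None): "data_cell",
    (False, None, 3, 4): "ipv4",
    (False, None, 3, 6): "ipv6",
    (False, None, 2, None): "l2",
}


def classify_lookup(unit):
    key = (unit["is_cell"], unit["cell_type"],
           unit["layer"], unit["ip_version"])
    return POLICY_TABLE[key]


samples = [
    {"is_cell": True, "cell_type": "oam", "layer": None, "ip_version": None},
    {"is_cell": False, "cell_type": None, "layer": 3, "ip_version": 4},
    {"is_cell": False, "cell_type": None, "layer": 3, "ip_version": 6},
    {"is_cell": False, "cell_type": None, "layer": 2, "ip_version": None},
]
for unit in samples:
    print(classify_if_else(unit), classify_lookup(unit))
```

Adding a new traffic class to the table version means adding one entry, whereas the if-else version needs another branch threaded into the existing chain, which is the modularity point the slide makes.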

Efficient Packet Modification & Encapsulation

[Diagram: nPcore and the Transform block with a series of command (CMD) entries between them; the remaining label text is not recoverable]

Complex Bandwidth Provisioning

PHYSICAL PORT
• OC-192

SUB-PORTS
• Fixed sub-division of the physical port
• OC-192, -48, -12, -3, GE
• Flows are scheduled according to minimum rate, class of service and weight at this level

VIRTUAL PIPES
• A collection of flows that have an aggregate maximum rate
• Pipe ensures that provisioned rate is not exceeded
• For example - all flows for an individual subscriber, network or traffic type

FLOWS
• Smallest scheduled entity
• May have guaranteed minimum bandwidth
• Individually weighted within a pipe
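The relationship between flows and a virtual pipe can be illustrated with a simple allocation rule. This is a minimal sketch under assumptions, not the traffic manager's algorithm: every flow is assumed to have unlimited demand, each flow first receives its guaranteed minimum, and the rest of the pipe's maximum rate is split by weight. The flow names and rates are made up.

```python
def allocate_pipe(pipe_max_rate, flows):
    """Share a virtual pipe's rate among its flows.

    flows maps a flow name to (min_rate, weight), with positive weights.
    The allocations never add up to more than pipe_max_rate.
    """
    total_min = sum(min_rate for min_rate, _ in flows.values())
    if total_min > pipe_max_rate:
        raise ValueError("guaranteed minimums exceed the pipe's maximum rate")
    spare = pipe_max_rate - total_min
    total_weight = sum(weight for _, weight in flows.values())
    return {name: min_rate + spare * weight / total_weight
            for name, (min_rate, weight) in flows.items()}


# One subscriber's pipe, capped at 100 Mb/s, carrying two weighted flows.
rates = allocate_pipe(100.0, {"voice": (10.0, 1), "video": (20.0, 3)})
print(rates)
```

Here the 70 Mb/s left after the minimums is split 1:3, so the flows receive 27.5 and 72.5 Mb/s and the pipe's provisioned rate is not exceeded.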

Challenge: Scalability & Re-use

[Roadmap diagram:
• 40 Gbps, today: 16 x 2.5 Gbps, DS-3 to OC-48c
• 160 Gbps, today: 16 x 10 Gbps, OC-3 to OC-192c
• 640 Gbps, "tomorrow": 16 x 40 Gbps, OC-12 to 4 x OC-192]

One code base: +MPLS +DiffServ +IPv6 +VPN

Maximum software ROI

Solution: nP Programming Infrastructure

[Block diagram: nPkernel and nPdebug, with nPk message handlers, nPk drivers and nPk DB drivers, standard nPMs and nPapplication nPMs; the nPapplication ingress and egress data algorithms sit between the Framer and the Traffic Manager and use an on-chip coprocessor and an external memory search coprocessor]

"Runtime Libraries": Reducing code customer must actually write

• 99+% of total Lines of Code (LOC) are on the control CPU
• Next, nPkernel alone provides up to 95% of actual NPU code, i.e. 95% of the 1%
• So NPU-resident portion of app often only 100-200 total LOC
• AMCC sample app u-code provides most of that 100-200 LOC, all in some cases
• Actual customer-generated code typically much less than 5% of 1% !!

[Pie chart of NPU code: nPkernel versus Application (5%)]
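Taking the slide's percentages at face value, the code sizes they imply can be worked backwards from the 100-200 LOC figure. This is only rough arithmetic: "up to 95%" and "99+%" are bounds rather than exact shares, and the function name is hypothetical.

```python
def implied_code_sizes(app_npu_loc, kernel_share=0.95, npu_share=0.01):
    """Work backwards from the NPU-resident application LOC.

    kernel_share: fraction of NPU code supplied by nPkernel (up to 95%)
    npu_share: fraction of all code that runs on the NPU (at most 1%)
    Returns (npu_loc, total_loc), rounded to whole lines.
    """
    npu_loc = app_npu_loc / (1.0 - kernel_share)
    total_loc = npu_loc / npu_share
    return round(npu_loc), round(total_loc)


for app_loc in (100, 200):
    npu_loc, total_loc = implied_code_sizes(app_loc)
    print(f"{app_loc} app LOC -> {npu_loc} NPU LOC -> {total_loc} total LOC")
```

On these assumptions, 100-200 lines of NPU-resident application code correspond to roughly 2,000-4,000 lines of NPU code and 200,000-400,000 lines overall, which is the "5% of 1%" (0.05%) ratio quoted on the slide.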

Putting it all together - Integrated NPU

[Block diagram of the integrated NPU. Labeled blocks:
• Line interfaces; I/O and framer blocks (integrated MACs); Frame I/F
• nPCore cluster, Workspace, Cache
• Policy Engine, Algorithmic search, Packet Transform Engine, Channel Buffers, Checksums
• Traffic Management Co-Processor
• Data Coherency (SPU), Non Blocking Memory Access Unit
• SEARCH I/F, DRAM I/F (two), QDR I/F
• SPI4 P2 / NPSI; SPI4.2 chip-to-chip interconnect
• CPU Interface, Debug, JTAG, GPIO]

Summary

• NPU Definition – No performance penalty
• Challenges & Solutions
  • Physical – More MHz won't cut it
  • Performance – Cannot be at the price of usability
  • Flexibility – Doesn't mean 100% software
  • Scalability & Re-use – Any Protocol, Any Speed, One Architecture
• Putting it all together
  • Avoiding System Problems & Side Effects