Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist...

26
Low Latency Video Analytics Servers Devashish Paul, Director Strategic Marketing [email protected] April 2016 © Integrated Device Technology

Transcript of Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist...

Page 1: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Low Latency Video Analytics Servers

Devashish Paul, Director Strategic [email protected]

April 2016© Integrated Device Technology

Page 2: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

IDT Company Overview

Devashish Paul, Director Strategic Marketing 2

Founded 1980

Workforce Approximately 1,800 employees

Headquarters San Jose, California

Mixed-signal application-specific solutions

#1 Serial Switching – 100% 4G Infrastructure with RapidIO

#1 Memory Interface – Industry Leader DDR4

#1 Silicon Timing Devices – Broadest Portfolio

800+ Issued and Pending Patents Worldwide

Page 3: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Low Latency Video Analytics at the Edge

Devashish Paul, Director Strategic Marketing 3

Agenda

Network Trends

RapidIO 20-50 Gbps Technology

Video Analytics Architectures

Analytics examples with low latency RapidIO

Page 4: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

5G Base Station + Edge Computing Appliance

Devashish Paul, Director Strategic Marketing 4

The Network is the Data Center

Ecommerce

Fleet Management

Semi Autonomous

Vehicles

Traffic management

• Low Latency

• Energy Efficient

• Analytics Workloads

• At Network Edge

• RapidIO

• IEEE 1588

• Timing

• RF Products

• Memory

Interface

• Retimers

• Sensors

Page 5: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Devashish Paul, Director Strategic Marketing 5

The Network is the Data Center

Ubiquitous Computing:

IDT connects, synchronizes, times and makes sense

of the human-and-machine connected world

Page 6: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Network and Data Center Convergence

Devashish Paul, Director Strategic Marketing 6

• Apps moving from data center

to co-locate with access node

(base station or wired access

node)

• Supporting real time

communication to mobile

devices (phones, cars IOT)

• Tight time synchronization

between apps running on

distributed servers and in data

center

• Need low latency interconnect

Edge Computing an essential element

of 5G Rollouts

Wired and Wireless

Access nodes

Servers/

Analytics

Page 7: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

PPC

GPU

Data Center

Servers

Cloud

RAN

4G BTSTowards 5G/MEC Co-located CPU and Acceleraters

Devashish Paul, Director Strategic Marketing 7

Mobile Edge Computing

Page 8: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

RapidIO Interconnect combines the best attributes

of PCIe® and Ethernet in a multi-processor fabric

Ethernet PCIe PCIe

ScalabilityLow

LatencyHardware

Terminated

Guaranteed

Delivery

Ethernet

in

software

only

Clustering Fabric Needs

• Lowest Deterministic

System Latency

• Scalability

• Peer to Peer / Any

Topology

• Embedded Endpoints

• Energy Efficiency

• Cost per performance

• HW Reliability and

Determinism

Devashish Paul, Director Strategic Marketing 8

Page 9: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Video Analytics Edge Server with RapidIO

• Heterogeneous compute

workloads

• No protocol termination

CPU cycles

• Energy efficiency

• 20 to 50 Gbps

embedded interconnect

• Scalable Fat node

connect multiple boards

in Edge Appliance

• Push Video

encode/decode,

streaming, rendering

and analytics into the

network edge

Devashish Paul, Director Strategic Marketing 9

Board leveRapidIOInter-

ConnectFabric

FPGA/GPU

accelerators

CPU

Storage

Cable / connector

Backplane

RapidIO Rack Scale Interconnect Fabric

Other nodes

Other nodes

RapidIO Backplane Interconnect Fabric

Video Analytics require low latency computeVideo Analytics = HPC ‘like’

Video server

Page 10: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

• 10/20/40/50 Gbps per port – 6.25/10/12.5 Gps

lane

• 100+ Gbps interconnect in definition

• Hardware termination at PHY layer: 3 layer

protocol

• Lowest Latency Interconnect ~ 100 ns

• Inherently scales to large system with

1000’s of nodes

• Over 15 million RapidIO

switches shipped

• > 2xEthernet (10GbE)

Over 110 million 10-20 Gbps

ports shipped

• 100% 4G interconnect

market share

Devashish Paul, Director Strategic Marketing 10

Page 11: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Supported by Large Eco-system

02/09/16 Devashish Paul, Director Strategic Marketing 11

© 2015 RapidIO.org

Page 12: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

RapidIO Ecosystem and Market Progression

Performance evolves for multiple system platform generations

2000 2002 2004 2006 2008 2010 2012 2014 2016+

1st Gen Standard

2nd Gen Standard

3rd Gen Standard

4th Gen Standard

SpaceInterconnect

Bridging

Data Center

Computing

Storage

RapidIO Gen3 (10xN) released with path to 25 Gbaud

Devashish Paul, Director Strategic Marketing 12

Page 13: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Hyperscale Data Center Real Time Analytics Examples

13

• RapidIO Low latency interconnect fabric for real time data center analytics

• Desire to leverage multi core heterogeneous computing x86, ARM, GPU, FPGA, DSP

• Collapse workloads and detect events with more certainty in real time: ex: particle collisions, fraud detection

Market emerges for real time compute analytics in data center

Page 14: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Open Compute Project HPC and Supercomputing

02/09/16 Devashish Paul, Director Strategic Marketing 14

Supports Opencompute.org HPC Initiative

Page 15: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

LocalInter-

ConnectFabric

Cable / Connector

CPU/

Accelerator

Storage

I/O

Hub

20 to 50 Gbps

RapidIOPorts

Scalable Low Latency Video Analytics Server Fat Node

Devashish Paul, Director Strategic Marketing 15

20 to 50 Gbps

embedded ports

300 mW per 10 Gbps

100 ns latency

Distributed switching

Direct connection to BBU

CPU/

Accelerator

Storage

I/O

Hub

Edge Node with I/O Hub

LocalInter-

ConnectFabric

FPGA/GPU

CPU

Storage

Cable / Connector

Edge Node Native Interconnect

20 to 50 Gbps

RapidIOPorts

RapidIO Top of Rack Switching

Page 16: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

16

Clockin

g &

Reset

Power

MgmtJTAGI2C

Maintenance Buffer and Control

Scheduler and Switch

Fabric

Stack

Port0/1

SerDes

Stack

Port0/1

SerDes

Stack

Port0/1

SerDes

Stack

Port0/1

SerDesS

tack

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es S

tack

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

Page 17: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Analytics Optimized 100ns 50 Gbps Switch

• RXS2448

– 600 Gbps Full-Duplex Serial RapidIO® Switch

– 50 Gbps per port

– 33 x 33 mm package

– 48 lanes at 12.5 Gbps

– Up to 24 Serial RapidIO Ports

– RapidIO Specification (Rev 3.2) Compliant

• RXS1632

– 400 Gbps Full-Duplex Serial RapidIO Switch

– 50 Gbps per port

– 29 x 29 mm package

– 32 lanes at 12.5 Gbps

– Up to 16 Serial RapidIO Ports

– RapidIO Specification (Rev 3.2) Compliant

Clocking

& Reset

Power

MgmtJTAGI2C

Maintenance Buffer and Control

Scheduler and Switch

Fabric

Stack

Port0/1

SerDes

Stack

Port0/1

SerDes

Stack

Port0/1

SerDes

Stack

Port0/1

SerDes

Sta

ck

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es

Sta

ck

Port0

/1

SerD

es S

tack

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

Sta

ck

Port

0/1

SerD

es

©2016. IDT.

50 Gbps per port | 300mW per 10 Gbps data | 100ns latency

Devashish Paul, Director Strategic Marketing 17

Page 18: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Edge Video Analytics Systems

APPLICATION ISSUES SOLVED

● 50 Gbps per port with 95% link utilization

● 100 ns latency

● Power efficient 300 mW per 10 Gbps

RXS

SoC

FPGA

DSP

Connector

RXS

SoC

FPGA

DSP

Connector

RXS

SoC

FPGA

DSP

Connector

RXS

BBU

Basestation

RXS

Fabric Cluster for Rack Switching

C-RAN

Backplane

RXS

CPU

FPGA/

GPU

RXS

Server

Backplane

RXS

CPU

FPGA/

GPU

Server

RXS

CPU

FPGA/

GPU

Server

Edge Computing Appliance

Distributed

low latency switching

Optimized for needs

of Edge Video Analytics

Devashish Paul, Director Strategic Marketing 18

Page 19: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

X86 Edge based Video Analytics Appliance

Devashish Paul, Director Strategic Marketing 19

• Key Components:

• 19 inch 1U ToR 38x20 port ToRswitch

• 19 inch 1U networking server with built in 320 Gbps RapidIO Switching

• Quad Intel Broadwell Sockets

• 4 TB storage in 8 bays

• 320 Gbps built in switching in Video Server

• 100ns latency per switch hop

• Up to 160 sockets per rack with any to any connectivity

• TCP/IP software driver for RapidIO + IDT Open Source Software

• Available Now

Video Analytics Server and ToR

RapidIO|GPU|x86|FPGA|Power|ARM

Low Latency | Energy Efficient

Page 20: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

GPU Acceleration Nodes

02/09/16 Devashish Paul, Director Strategic Marketing 20

• Computing

• 4 x Nvidia Tegra K1 GPU per Compute slot

• 16x per 19 Inch server

• Analytics acceleration or cloud based rendering of gaming content

• Interconnect Fabric

• RapidIO Switching + 4x RapidIO Gen2 NIC

• Performance

• 1.28 Tflops/Compute (4 GPU)

• 140 Gbps Switching Fabric

• 100nsec RapidIO switching latency

Low Latency GPU Analytics Module

http://www.gocct.com/sheets/AG/photograph/hires/aga1xm1d.jpg

http://www.gocct.com/sheets/AG/aga1xm1d.htm

Page 21: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Low Latency Switch for Edge Computing Scale Out

Devashish Paul, Director Strategic Marketing 21

• 38 x 20 Gbps ports

• Sub 200W switching power

• Support 42U Rack level scale out

• Available Now

0.75 Tbps

1 U 19 Inch 100 ns Switch

With 20 Gbps ports

Roadmap to 4.8 Tbps

2U 100 ns Switch

With 50 Gbps ports

• 96 x 50 Gbps ports

• Sub 400W switching power

• Supports redundant ports to 42U rack and intra rack scale out

• 2H 2016

5G|Mobile Edge Computing |HPC| Video Analytics | Low Latency Financial Trading

Page 22: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

0

10

20

30

40

50

60

70

0

200

400

600

800

1000

1200

1400

1600

1800

2000

CP

U%

Gb

ps

Bytes

Utilization

Goodput

Throughput

Edge Video Analytics: IBM Power8/Nvidia Tesla/RapidIO Switching

Devashish Paul, Director Strategic Marketing 22

Roadmap to 4.8 Tbps

2U 100 ns Switch

With 50 Gbps ports

Building the OpenPower Edge Node

• Low latency RapidIO ToR Switching 38x20 20 Gbps

• 100 ns IDT RapidIO switch latency, 95% link utilization

• Multi socket Power8 servers

• 38 sockets per rack any to any compute for video analytics

• Option to add Nvidia Tesla class GPU for acceleration per Power8 Node

• IDT Versaclock5 Timing

Page 23: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Edge Social Data Analytics• Analyze User Impressions on World Cup Final 2014 (Germany/Argentina)

– HPAC Lab project to analyze World Cup 2014 twitter data using Hadoop and visualize using Tableau public on HPAC

Platform

Devashish Paul, Director Strategic Marketing 23

Orange Silicon Valley

Page 24: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

Edge Analytics for Autonomous System

• Video Processing/Analytics– GPU based Video Analytics

– Tegra K1 GPU+ARM Cluster

– Low latency RapidIO Fabric

– Remote Memory inter-node transfer

02/18/16 Devashish Paul, Director Strategic Marketing 24

Autonomous Vehicle

Video Analytics/Object

Recognition

Deep Learning/Object Analytics

Page 25: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

5G Lab Germany: Edge Analytics for Autonomous Vehicles

• Network Edge and In Vehicle Analytics– Edge Node Multi processor network

– GPU/x86/ARM/Open Power based Analytics

– Low latency RapidIO Fabric

– In vehicle sensor fusion in real time with low latency

– Leverage OCP Innovations (Edge Appliance and ToR)

Devashish Paul, Director Strategic Marketing 25

Autonomous Vehicle

Video Analytics/Object

Recognition

Deep Learning/Object Analytics

Page 26: Low Latency Video Analytics Servers · Computing • X86, OpenPower, ARM • GPU and FPGA assist • Push Video encode/decode, streaming, rendering and analytics into the network

RapidIO Video Analytics Edge Server

• Low Latency Servers

• 20 to 50 Gbps RapidIO

interconnect with 100ns

latency

• Heterogeneous

Computing

• X86, OpenPower, ARM

• GPU and FPGA assist

• Push Video

encode/decode,

streaming, rendering

and analytics into the

network edge

Devashish Paul, Director Strategic Marketing 26

Board leveRapidIOInter-

ConnectFabric

FPGA/GPU

accelerators

CPU

Storage

Cable / connector

Backplane

RapidIO Rack Scale Interconnect Fabric

Other nodes

Other nodes

RapidIO Backplane Interconnect Fabric

Video Analytics = HPC ‘like’

Video server