High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering

University of Toronto

Building the Reconfigurable Cloud Ecosystem

Paul Chow

April 8, 2017

The Ecosystem

•  What is it? What do we have now?

•  Why do we need one?

•  What do we need?

•  What are we doing about it at UofT?

•  What next?

April 8, 2017 ETCD 2017


WHAT DO WE HAVE NOW?


Systems we can learn from

•  Microsoft Catapult – two excellent papers

•  Amazon EC2 F1 – online blogs and documentation

•  Baidu – Hot Chips presentations, other online

•  Others?

•  Hard for academia to do research at scale


Characteristics

•  Cool hardware, but …

•  Microsoft still uses Verilog, as far as I know

•  Amazon just gives you Vivado and some IP

•  Baidu has built accelerators accessed via APIs

•  How do mortals use these?

•  It’s the dark ages compared to software development!


What about High-Level Synthesis?

•  Raises the level of abstraction – more software-like

•  Lots of great research

•  Absolutely necessary

•  Tremendous progress recently

•  Can describe complex computations and functions and create hardware

But!!! We are still building custom hardware.

HLS is not sufficient, only a part of the big picture…


What about OpenCL?

•  Vendor tools using a high-level language standard

•  Getting closer

•  Can almost write code, <CR>, Run
–  Closer to software environment
–  Abstracts the hardware
–  Data movers, run time, scheduling, etc. are transparent to user

But, we’re talking about clouds

Not just one host, one accelerator

Not two

More like

And many more…

Clouds are about scaling

•  And elasticity

•  And resilience

•  And sharing and virtualization

•  And security and privacy

•  And accessibility by users, devices, applications


Any of this available for FPGAs?

It depends …

What’s your architectural perspective?

[Figure: CPU and FPGA arrangements for the two architectural perspectives, Accelerator and Peers]

Accelerator Model

•  Amazon EC2 F1, Intel x86 + Arria

•  Virtual machine (VM) + accelerator

•  Many cloud issues are handled via VM
–  Still need to manage the communication between the host and the FPGA, resource allocation
–  Easier if FPGA is not network-connected

Peer Model

•  Microsoft Catapult 2

•  Pools of computing resources
–  Pick the appropriate one

•  No equivalent cloud infrastructure for FPGA part

[Figure, from Derek Chiou: ToR and CS switches connecting pooled resources: Bing Ranking SW, Bing Ranking HW, HPC, speech to text, large-scale deep learning]

Why the Peer Model?

•  Easier development: SW Prototyping → Migration

•  Model makes no distinction between CPUs and FPGAs (in terms of data communication, synchronization)

•  Heterogeneous or FPGA-only systems are easier to program – keep a uniform programming model

•  As performance requirements increase, e.g., video + 5G, CPUs will not keep up, leaving FPGA-only solutions for these tasks

Cloud Challenges of Peer Model

•  Very little infrastructure to deal with heterogeneity

•  FPGAs are very different from CPUs

•  Focus of our research at UofT
–  Building the cloud ecosystem for the reconfigurable cloud

THE RECONFIGURABLE CLOUD ECOSYSTEM

Ecosystem: www.dictionary.com

noun

1. a system, or a group of interconnected elements, formed by the interaction of a community of organisms with their environment.

2. any system or network of interconnecting and interacting parts


What parts are needed?

•  Almost everything!

•  Learn from software

•  Linux, software portability, scalability, platforms

•  Networking, management, security, resilience

•  Etc. …


Why do we need this ecosystem?

•  Already have a software ecosystem
–  Continues to grow

•  World is becoming heterogeneous
–  Expand the current software ecosystem to encompass heterogeneity

•  User should just see a collection of processing elements and pick appropriate one to use when required
–  Cannot continue to treat accelerators as a special case augmenting a current programming model
–  Scales better

WHAT DO WE NEED?


The Parts (Before the Cloud)

[Figure: software stack, top to bottom: SW Application, SW Middleware, SW OS, BIOS, Processor Hardware. Example labels: MPI Rank 0 and Software Ranks, MPI Library, Linux, Xeon Processor Motherboard]

[Figure: a hardware stack shown beside the software stack: HW Application, HW Middleware, HW OS, BSPH, with BSPS on the software side. Example labels: Hardware MPI Ranks, Message Passing Engine (MPE), MPI Network Infrastructure, FPGA BSP]

[Figure: the software and hardware stacks joined by an Interconnect: PCIe, QPI, AXI, Network]

Plus Cloud

[Figure: the same software and hardware stacks and Interconnect, plus cloud components: Resource Management, Resource Allocation, Deployment, Task Scheduling, Cloud Management, Networking]

HOW DO WE GET THERE?


Start with Software Ecosystem

•  Build from software as much as possible
–  Already lots of knowledge and infrastructure

•  OpenStack is starting point for several groups
–  Cloud resource management
–  IBM, Huawei, UofT

•  Virtualization
–  Means many things!
–  Sharing, abstraction

U OF T WORK


ENABLING FLEXIBLE NETWORK FPGA CLUSTERS IN A HETEROGENEOUS CLOUD DATA CENTER

Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow

University of Toronto

FPGA 2017



Problems We Target

•  Large multi-FPGA systems
–  Create abstraction between FPGAs in multi-FPGA systems
–  Easy scalability of system

•  Network capabilities
–  FPGA cluster directly accessible by any other network device in the datacenter

Overall System View

•  Input from user to the FPGA Cluster Generator: Logical Cluster Description, FPGA Mapping File

•  Output to VM with FPGA tools: individual FPGA projects

•  Output to cloud manager: command for resource allocation, commands for connecting FPGAs to network

•  Output to user: MAC addresses of FPGAs in multi-FPGA cluster

Baseline Infrastructure

•  SAVI (Smart Applications on Virtualized Infrastructure)

•  OpenStack (Cloud Managing Software)

•  Xilinx SDAccel (FPGA Hypervisor)

SAVI (Smart Applications on Virtualized Infrastructure)


Cloud Management Software: OpenStack


FPGA Hypervisor: Xilinx SDAccel

•  Abstracts physical hardware on FPGA and provides software interface for these modules

•  Part of Xilinx SDAccel

•  No network interface

Logical Cluster Description

FPGA Mapping File

•  Kernel A → FPGA 1

•  Kernel B → FPGA 1

•  Kernel C → FPGA 2
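The mapping above can be read as a table from kernel to FPGA. A minimal sketch follows, assuming a hypothetical plain-text format with one `kernel fpga` pair per line; the slide does not show the actual file syntax used by the FPGA Cluster Generator, and all function names here are illustrative. It groups kernels into one project per FPGA and reports whether a logical link has to leave the chip.

```python
from collections import defaultdict

# Hypothetical mapping-file contents: "<kernel> <fpga>" per line.
# The three entries mirror the slide; the syntax itself is an assumption.
MAPPING_TEXT = """\
kernel_a fpga1
kernel_b fpga1
kernel_c fpga2
"""

def parse_mapping(text):
    """Return {kernel: fpga} from the assumed line format."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kernel, fpga = line.split()
        mapping[kernel] = fpga
    return mapping

def group_by_fpga(mapping):
    """Return {fpga: [kernels]}, one list per individual FPGA project."""
    projects = defaultdict(list)
    for kernel, fpga in mapping.items():
        projects[fpga].append(kernel)
    return dict(projects)

def crosses_fpgas(mapping, src, dst):
    """True if a logical link src -> dst connects kernels on different FPGAs."""
    return mapping[src] != mapping[dst]

mapping = parse_mapping(MAPPING_TEXT)
projects = group_by_fpga(mapping)
```

Under this reading, a link from `kernel_b` to `kernel_c` would need the network, while `kernel_a` to `kernel_b` stays on one FPGA.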


Physical Mapping


I/O to FPGAs in Cluster

[Figure: Input and Output connections to the FPGAs in the cluster]

Scaling Up the Clusters


Networking Backend

[Figure sequence between the FPGA Cluster Generator and the OpenStack / SAVI Network Manager: (1) Network Port Request, (2) Network MAC address, (3) FPGA Port on Physical Switch]

Case Study: Scalability of Query Processing Engine

•  Representative case study: Database Streaming Query Processing Engine
–  Size
–  Streaming

•  Scalable

[Figure: Scheduler and replicated Query Processing Engines]

•  Query Processing Engine replicated 6 times
–  3 FPGAs
–  2 units/FPGA
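A minimal sketch of the replication above: six engines, two on each of three FPGAs, fed by a scheduler. The slide does not say how the scheduler assigns queries, so round-robin dispatch is an assumption here, and all names are hypothetical.

```python
from itertools import cycle

NUM_FPGAS = 3
UNITS_PER_FPGA = 2

# Each engine is identified by (fpga index, unit index): 3 x 2 = 6 engines.
ENGINES = [(f, u) for f in range(NUM_FPGAS) for u in range(UNITS_PER_FPGA)]

def schedule(queries, engines=ENGINES):
    """Assign each query to an engine in round-robin order (assumed policy)."""
    assignment = {}
    for query, engine in zip(queries, cycle(engines)):
        assignment[query] = engine
    return assignment

# Eight queries wrap around the six engines.
plan = schedule([f"q{i}" for i in range(8)])
```

Scaling the cluster in this model only changes `NUM_FPGAS` or `UNITS_PER_FPGA`; the dispatch loop is unchanged.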


HETEROGENEOUS VIRTUALIZED NETWORK FUNCTION CHAINING


Software-Defined Networking


VNF Service Chain Example


Software VNF Chaining

•  Initially VM1 and VM2 talk through the switch
–  Could be software VNFs

•  Automate adding a VNF between VM1 and VM2
–  Example: Signature matching

•  Input: VM1, VM2, type of VNF desired

•  Result: traffic flows through VNF

[Figure: VM1 and VM2 attached to a Switch managed by a Controller]

Software VNF Chaining

•  Modify flows in the switch with the controller

•  VM1's traffic is routed to VNF1

•  VNF1's traffic is routed to VM2

•  No modification to VM1, VM2, or VNF1

•  No user action required

•  Any VNF can be software (VM) or hardware (FPGA)
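The flow changes listed above can be sketched with a toy forwarding table. This is an illustrative model only: the `flows` dictionary and the `insert_vnf` and `path` helpers are hypothetical and do not correspond to any real SDN controller API.

```python
def insert_vnf(flows, src, dst, vnf):
    """Return a new flow table that steers src -> dst traffic through vnf.

    flows maps a sender to the next hop its traffic is forwarded to.
    The endpoints themselves are not modified; only the switch rules change.
    """
    if flows.get(src) != dst:
        raise ValueError(f"no direct flow from {src} to {dst}")
    updated = dict(flows)
    updated[src] = vnf   # VM1's traffic is routed to the VNF
    updated[vnf] = dst   # the VNF's traffic is routed to VM2
    return updated

def path(flows, src, dst):
    """Follow next hops from src until dst is reached."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(flows[hops[-1]])
    return hops

before = {"VM1": "VM2"}
after = insert_vnf(before, "VM1", "VM2", "VNF1")
```

Whether `VNF1` is a VM or an FPGA makes no difference to this table, which is the point of treating both as network endpoints.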

[Figure: VM1, VNF1, and VM2 attached to a Switch managed by a Controller]

Service Chain Scheduler


VNFs with FPGAs

•  Use same SDI interface but with FPGAs

•  OpenStack (Neutron) used to create Network Port 

•  FPGA VNF: Partial bitstream → requires FPGA “hypervisor”

FPGA Hypervisor

•  Hypervisor contains PCIe module which is the master for
–  An off-chip DRAM Controller

•  PCIe passthrough used to connect VM with FPGA

•  Processor on FPGA receives VNF application, gates the FPGA (to ensure no corruption) and then programs the FPGA with the partial bitstream
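The load sequence in the last bullet (receive, gate, program) can be sketched as a small software model. The class name, method names, and the step that reopens the gates after programming are assumptions for illustration; the slide only states that the FPGA is gated before the partial bitstream is programmed.

```python
class HypervisorModel:
    """Toy model of the on-FPGA processor that loads a VNF."""

    def __init__(self):
        self.gated = False
        self.loaded_vnf = None
        self.log = []

    def load_vnf(self, name, partial_bitstream):
        self.log.append(f"received {name}")
        self.gated = True            # isolate the region before touching it
        self.log.append("gated")
        try:
            self._program(partial_bitstream)
            self.loaded_vnf = name
            self.log.append("programmed")
        finally:
            self.gated = False       # assumed: gates reopen once programming ends
            self.log.append("ungated")

    def _program(self, partial_bitstream):
        # Stands in for writing the partial bitstream to the device.
        if not partial_bitstream:
            raise ValueError("empty partial bitstream")
        assert self.gated, "must gate before programming"

hypervisor = HypervisorModel()
hypervisor.load_vnf("signature_matching", b"\x00\x01")
```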


Gating Partial Region

What if Ethernet is in the middle of a transaction during the partial reconfiguration?!

•  The ready signal of Ethernet never goes up again!

The solution is gating the partial region. These gates make the reconfiguration safe!

[Figure: the VNF region sits between two Gates on the Input Stream and Output Stream; the Bitstream is loaded into it]

Partial Bitstream Generation

•  The VNF implemented has a standardized interface: Reset, Clock, Input Stream, Output Stream

•  Script takes a VNF using the above ports and places it in the application region, automatically creating the bitstreams needed.

Partial Bitstream Generation

The steps that the scripts do are as follows:

•  Put the VNF IP into the static region and make the connections.

•  Synthesize the VNF hardware

•  Load the netlist of VNF into static region with locked place and route

•  Place and route the VNF

•  Generate bitstream of VNF

THE HYPERVISOR


What is the Hypervisor?

•  Microsoft calls it the “Shell”

•  An API that presents the I/O of the devices

•  Provides services to application region (MS “Role”)

•  Protection, security

[Figure: the stack layers BSPS, HW OS, HW Middleware, HW Application, BSPH, with the BSPH layer labeled Hypervisor]

Requirements

•  Support virtualization

•  Multi-tenant/multi-user/multi-task on FPGAs

•  Abstracted Peripheral I/O

More thoughts →

What is Virtualization?

•  SW Analogues – Server Virt. (i.e. Hypervisor)
–  Emulates existing physical architecture (of PCs/Servers) for its I/O abstraction
–  Multiple “emulated systems” on a single host system for multi-tenant/multi-user
–  Includes both data isolation and performance isolation
–  Often tightly coupled with orchestration SW

What is Virtualization?

•  SW Analogues – Operating System
–  Not often thought of as virtualization
–  Creates a multi-user/multi-tasking environment
–  Provides an “invented” abstraction layer for I/O access
    •  Virtual Memory (in HW), and the OS ABI & APIs (in SW)
–  Generally no performance isolation

What is Virtualization?

•  SW Analogues – Containerization
–  Server Virtualization “light”
–  Creates and manages multiple separate instances of the “invented” abstraction layer of an OS
    •  Each container looks like it has exclusive access to the entire OS
–  Adds some device emulation (e.g. vNIC, vSwitch)
–  Adds support for performance isolation

System Requirements

•  Can be implemented on multiple vendors’ FPGAs

•  Multiple applications on same FPGA (either temporally or spatially)

•  Abstraction layer for peripheral I/O Access

•  External management ability


Implications of Vendor Portability

•  Recompilation for different vendors/PR regions will always be required, therefore:
–  Lowest level for portable executable in such a system is source code (maybe netlist?)
–  Some abstractions/features can be implemented during static compilation at no cost, if need be (i.e. as HDL IP)

Implications of Multi-Tasking

•  Requires Data Security/Isolation
–  Page Tables/MMU for MM resources
–  Secure channels for stream-based resources

•  Performance Isolation
–  To prevent DDoS-like attacks
–  Interesting research area (not aware of any research)

•  Partial Reconfig Necessary
–  Some standardized interface required
–  Signal decoupling during reconfig
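The page-table isolation mentioned above can be illustrated with a toy per-tenant translation table. This is a sketch under stated assumptions, not the hypervisor's design: the 4 KiB page size, the `TenantMMU` class, and the rule that a physical page belongs to at most one tenant are all chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the example

class TenantMMU:
    """Toy per-tenant page tables for memory-mapped resources."""

    def __init__(self):
        self.tables = {}  # tenant -> {virtual page: physical page}

    def map_page(self, tenant, vpage, ppage):
        """Give one tenant a physical page; refuse pages owned by others."""
        for other, table in self.tables.items():
            if other != tenant and ppage in table.values():
                raise PermissionError(f"page {ppage} belongs to {other}")
        self.tables.setdefault(tenant, {})[vpage] = ppage

    def translate(self, tenant, vaddr):
        """Translate a tenant's virtual address, or fail if it is unmapped."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        table = self.tables.get(tenant, {})
        if vpage not in table:
            raise PermissionError(f"{tenant} has no mapping for page {vpage}")
        return table[vpage] * PAGE_SIZE + offset

mmu = TenantMMU()
mmu.map_page("tenant_a", 0, 7)
mmu.map_page("tenant_b", 0, 9)
```

Both tenants use virtual page 0, yet their accesses land in different physical pages, and neither can map or reach the other's page.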


Implications of Multi-Tasking

•  Decisions to Make
–  Hierarchical Resource Management? (i.e. user groups)
–  Local vs. Remote hosted CAD tools?
–  Different-sized PR Regions?

Implications of External Mgmt.

•  Need to expose either:
–  Some management protocol connected to an accessible common network
–  On-chip CPU running a lightweight SW OS with remote access (e.g. SSH)

•  Note – “on-chip CPU” can be modelled by a CPU connected to a PCIe FPGA card for our purposes (to use the infrastructure that we already have in place)

Another Challenge -- Migration

[Figure: Live Migration Controller. Inside the FPGA, the HW Hypervisor gives the Application a standard API to memory and network; blocks shown: Migration Controller (migration signals), MMU, RAM, Mem Controller, Ethernet Controller, Ethernet Port, and control from the Host CPU]

HYPERVISOR GENERATOR


What is it?

•  Need to preserve the hypervisor abstraction across platforms
–  Different vendors, devices, board configurations

•  Hypervisor supports the rest of the “stack” so that higher layers can remain isolated from low-level platform changes
–  Software has been very successful at this
–  Must do this for hardware

•  Avoid building it by hand every time
–  Idea only about 2 weeks old, so needs more thinking!

[Figure: the Generator takes a Hypervisor Description and a Board Description and produces Hypervisor Bitstreams]

WHAT IS NEEDED NEXT?


Conclusions

•  Lots of focus on HLS today – it’s needed, not sufficient

•  Some working now on other layers – need identified

•  To achieve a cloud ecosystem for using FPGAs, much more is needed – it’s a big stack

•  Need a coordinated effort to enable cloud computing with FPGAs – cannot be haphazard → need a plan
–  Open source is only way to harness enough resources
–  How do we do this?

Acknowledgements

Stuart Byma, Naif Tarafdar, Eric Fukuda, Daniel Ly-Ma, Daniel Rozhko, Roberto DiCecco, Nariman Eskandari

SAVI – Prof. Alberto Leon-Garcia, Hadi Bannazadeh, Thomas Lin

emSYSCAN

Questions?
