Orchestration over Heterogeneous Infrastructures
by
Spandan Bemby
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2016 by Spandan Bemby
Abstract
Orchestration over Heterogeneous Infrastructures
Spandan Bemby
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2016
The future cloud ecosystem will be very diverse. On account of differences in offerings,
prices, and locations, resource allocations may span multiple public cloud providers and
include private resource pools in the form of virtual customer premise edges. Additionally,
future applications will require more powerful networking paradigms like software-defined
networking (SDN), which provide a centralized and fine-grained view of the network.
When considering private resource pools, we must extend the notion of SDN to other
resource types and consider software-defined infrastructure (SDI), a resource management
approach that converges the management of heterogeneous resource types and provides
a centralized view over all resources. This work proposes Vino, a system for managing
resources in heterogeneous domains (public and private clouds) as well as orchestration
over these heterogeneous infrastructures. Additionally, Vino enables SDI capabilities on
arbitrary clouds by leveraging overlay networks. We design, prototype and evaluate the
Vino system, capable of handling the aforementioned tasks.
For my mom, dad, sister, and my three angels, Vanya, Samaya, and Nav.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background 7
2.1 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Hybrid, Multi Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.3 Hardware Virtualization . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.4 Network Virtualization . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Compute Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.1 Bare Metal Machines . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.2 Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.3 Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6.4 Lambda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Software-defined Networking . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7.1 OpenFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7.2 Network Function Virtualization . . . . . . . . . . . . . . . . . . . 18
2.8 Resource Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8.1 OpenStack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.9 Software-defined Infrastructure . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Smart Application on Virtual Infrastructure . . . . . . . . . . . . . . . . 20
2.11 Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Related Work 24
3.1 Single Cloud Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Secondnet, VDC Planner . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.2 Borg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.3 AWS: CloudFormation . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.4 OpenStack: Heat . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Multi Cloud Orchestration Tools . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Cloud Resource Orchestration: A Data-Centric Approach, Declar-
ative Automated Cloud Resource Orchestration . . . . . . . . . . 26
3.2.2 Networked Cloud Orchestration: A GENI Perspective . . . . . . . 26
3.2.3 Greenhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.4 CloudMF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.5 Multi-Cloud Brokering . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.6 Terraform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.7 The Topology and Orchestration Specification (TOSCA) . . . . . 27
3.3 SDN over Legacy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Fibbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.2 Ravello Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.3 OpenContrail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Configuration/Orchestration Tools . . . . . . . . . . . . . . . . . . . . . 29
3.4.1 Salt Cloud, Ansible . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 Design of Multidimensional Orchestrator 30
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Requirements on the Substrate . . . . . . . . . . . . . . . . . . . 32
4.2.3 Modelling the Application . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Resource Management Overview . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Resource Provisioning Model . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 Native Provisioning . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.2 Delegated Provisioning . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4.3 Fully-managed Provisioning . . . . . . . . . . . . . . . . . . . . . 38
4.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Organization of Resource Controllers . . . . . . . . . . . . . . . . . . . . 40
4.5.1 OpenStack RMS . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.2 Software-defined Infrastructure RMS . . . . . . . . . . . . . . . . 41
4.6 Vino Version 1: SDN Orchestration Over a Single Legacy Cloud . . . . . 43
4.6.1 Initial Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6.2 Adapting the Design . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.7 Vino Version 2: SDN Orchestration Over Multiple Legacy Clouds . . . 47
4.8 Vino Version 3: SDN Orchestration Over Unmanaged Resources . . . . . 47
4.8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.8.2 Types of Virtualization . . . . . . . . . . . . . . . . . . . . . . . 49
4.8.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8.4 Modelling the Substrates . . . . . . . . . . . . . . . . . . . . . . . 50
4.9 Vino Version 4: Container Orchestration . . . . . . . . . . . . . . . . . . 51
4.9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Implementation of Multidimensional Orchestrator 53
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 Programming Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Data Serialization Language . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3.1 XML (Extensible Markup Language) . . . . . . . . . . . . . . . . 54
5.3.2 JSON (JavaScript Object Notation) . . . . . . . . . . . . . . . . . 55
5.3.3 YAML (YAML Ain't Markup Language) . . . . . . . . . . . . . 55
5.3.4 Custom Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.1 Bootloader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4.2 Orchestrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5 Bootloader Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5.1 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5.2 Remote Code Execution . . . . . . . . . . . . . . . . . . . . . . . 58
5.6 Orchestrator Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.6.1 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.6.2 Declared Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.6.3 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.4 Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.5 Special Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.6 Dependency Resolution . . . . . . . . . . . . . . . . . . . . . . . . 65
5.6.7 Cloud Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6.8 Creating the Topology . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6.9 Logical Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.6.10 Virtual Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.6.11 Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.6.12 Network Tunnels . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.7 Traffic Steering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.7.2 Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6 Evaluation 75
6.1 Functional Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 WordPress Firewall Exposition . . . . . . . . . . . . . . . . . . . 76
6.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3 Vino Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.1 Parsing Time Overhead . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.2 Memory Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 SDI Overlay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.1 VXLAN Tunnels . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.2 OpenVPN Bridged Tunnels . . . . . . . . . . . . . . . . . . . . . 85
6.4.3 Testing Ryu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.4.4 Testing the SDI Manager . . . . . . . . . . . . . . . . . . . . . . . 86
6.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Conclusions 90
7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography 93
List of Tables
6.1 Total time to allocate various topologies and the parser overhead on SAVI. 81
6.2 Total time to allocate various topologies and the parser overhead on AWS. 81
6.3 Total memory used for the different topologies. . . . . . . . . . . . . . . . 82
6.4 Comparison of underlay and VXLAN throughput for various configurations. 84
6.5 Comparison of underlay and OpenVPN tunnel throughput. . . . . . . . . 85
6.6 Statistics on the number of responses sent by the Ryu SDN controller. . . 87
6.7 Statistics on the number of responses sent by the Ryu SDN controller and
SDI manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.8 Statistics on the number of responses sent by the Ryu SDN controller and
SDI manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
List of Figures
1.1 An example illustrating a complex orchestration scenario. This involves
sensors, local private servers, and resources on a public cloud (AWS). The
sensors collect Wi-Fi probe packets and send them to a local server. The
local server runs a predictive algorithm to determine when and what the
user will order. The local server also sends the probe packets to AWS to
be stored. These components need to be connected using secure channels
with tasks distributed over each of the nodes. . . . . . . . . . . . . . . . 4
2.1 The relative weights of different characteristics of the various compute
environment. Isolation refers to how isolated the environment is. Per-
formance refers to how many system resources are used to perform actual
work, as opposed to being used for virtualization, i.e. the opposite of over-
head. Flexibility refers to how much flexibility the user has. Here, BMs
perform poorly since their hardware and kernel are fixed. Virtualization
refers to the amount of virtualization being performed. . . . . . . . . . . 14
2.2 Different types of resources sorted by increasing flexibility and isolation. . 15
2.3 The two different views of the terms resource and resource management.
Type 1 refers to system resources and management thereof. Type 2 refers
to processing units and clusters of processing units and their management. 19
2.4 A conceptual view of the SDI RMS. . . . . . . . . . . . . . . . . . . . . . 21
2.5 A conceptual view of a SAVI node modelled after the SDI RMS. . . . . . 22
4.1 An example of how different components, e.g. the compute manager, and
monitoring manager must be coordinated to realize a complex application,
e.g. an autoscaling web server deployment. . . . . . . . . . . . . . . . . . 31
4.2 A conceptual view of how the resource provisioning middleware interfaces
with the user and the cloud. . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 A conceptual view of the native provisioning model. . . . . . . . . . . . . 37
4.4 A conceptual view of the delegated provisioning model. . . . . . . . . . . 38
4.5 A conceptual view of the full middleware provisioning model. . . . . . . . 39
4.6 The upshifting of virtualization stack when OpenStack is deployed on vir-
tual machines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7 A conceptual view of the OpenStack RMS. . . . . . . . . . . . . . . . . . 42
4.8 A conceptual view of the SDI RMS. . . . . . . . . . . . . . . . . . . . . . 43
4.9 A conceptual view of the Vino RMS. . . . . . . . . . . . . . . . . . . . . 46
4.10 A conceptual view of the Vino RMS V2. . . . . . . . . . . . . . . . . . . 47
4.11 A conceptual view of the Vino RMS V3. This shows how unmanaged
resources are brought under the purview of a RMS. The logic surrounding
resource management is the same as in Vino RMS V2. . . . . . . . . . . 48
4.12 A conceptual view of the Vino RMS V4. . . . . . . . . . . . . . . . . . . 51
5.1 Final version of VTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Example of a VTL file with the complete set of features. . . . . . . . . . 61
5.3 Continuation of the above topology file. . . . . . . . . . . . . . . . . . . . 62
5.4 Node configuration snippet. User can specify a list of configurations in the
form of playbooks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.5 A conceptual view of vxlan tunnels. . . . . . . . . . . . . . . . . . . . . . 70
5.6 A conceptual view of an OpenVPN setup. . . . . . . . . . . . . . . . . . 71
5.7 The Vino Portal can be used to create service chains . . . . . . . . . . . 73
6.1 Example of a VTL topology file. . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Continuation of the above topology file. . . . . . . . . . . . . . . . . . . . 78
6.3 An example of a service chaining. The user specifies the endpoints, i.e.
the Gateway and the Web Server, and the middlebox, i.e. the DPI. This
install rules on the switches that forwards traffic going from the Gateway
to the Web Server to be sent to the DPI instead, which transparently
forwards it to the Web Server. This can be used for arbitrary VNFs. . . . 79
6.4 The total parsing and provisioning time as a function of number of nodes
on SAVI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 The total parsing and provisioning time as a function of number of nodes
on AWS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Comparison of underlay and VXLAN throughput for various configurations. 84
6.7 Comparison of underlay and VXLAN throughput. . . . . . . . . . . . . . 86
6.8 Performance of our network control stack compared with a single stan-
dalone Ryu instance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Chapter 1
Introduction
The history of humans is the story of evolving technologies being used to answer
the questions of land and food. The last such technology was the Internet. Indeed, the
Internet has drastically changed many aspects of our lives. It was able to have such
an impact because, in a sense, it democratized the means of distributing content. The
emergent field of cloud computing goes a step further, by additionally democratizing the
means of production.
Cloud computing is a model of computing that allows on-demand access to resources
over a communication network. This allows efficient sharing of resources and reduces the
associated infrastructure cost for users. Combined with hardware virtualization, this ab-
stracts away considerations of machine and network failures and allows (re)configuration
of resources. Although the term cloud computing became popular in the mid 2000s (with
the introduction of Amazon's Elastic Compute Cloud (EC2) service), the architecture of multi-
plexing users over network accessible resources has existed since the late 1970s. Likewise,
hardware virtualization technologies, whereby users interface with virtual slices of physi-
cal servers have existed since the 1960s. The various practical and theoretical challenges
associated with these technologies have generated much interest in academia and industry.
The principles of cloud computing, i.e. resource sharing by disparate users and on-
demand availability, have been adapted as a business model whereby resources are leased
under a "pay-as-you-go" model. Cloud providers (CPs) leverage economies of scale to
provide resources at a lower cost with higher reliability and availability. From the user's
perspective, public clouds shift the cost distribution from higher capital expenditure
(capex) to higher operational expenditure (opex), and can enable more agile prototyping.
Public cloud computing services are becoming increasingly popular because they reduce
the complexity and cost associated with managing private infrastructure.
Although public clouds are a key part of industry information and communication
technology (ICT) solutions, the ecosystem is diverse and additionally consists of private
and hybrid clouds. Both public and private clouds have similar architectures, but the
former is typically multi-tenancy (i.e. resources are used by multiple users, where users
can be organizations or individuals), while the latter is typically single-tenancy (i.e.
resources are used by a single user). This allows for improved security in private clouds
and also allows custom modifications.
1.1 Motivation
Traditionally, ICT requirements were defined in terms of servers and how they were
networked together. As a result, public cloud providers evolved to primarily offer virtual
analogues to existing physical resources, e.g. servers. This led to cloud providers only
supporting IP networking, while ignoring the more flexible software-defined networking
(SDN) paradigm. This created a fundamental mismatch between the offerings of cloud
providers and the needs of users who required more advanced networking, e.g. users
trying to do network function virtualization (NFV).
The cloud ecosystem consists of many different public cloud providers, with varying
prices, locations, features, etc. When deploying an application, an optimal resource
allocation may span multiple providers (optimality can be with regards to cost, energy
usage, availability, etc.) [45], [46]. Furthermore, there is large variation in user needs.
Specifically, some users may prefer private resources due to increased security [16] and
customizability. Additionally, some private resource pools may be unmanaged, i.e. exist
as individual physical servers outside the purview of a centralized manager or resource
management system (RMS). In this situation, users need a sensible way of interfacing
with, and provisioning resources over this substrate.
This heterogeneity of user requirements and infrastructure capabilities highlights the
problem of orchestration. Orchestration is the coordinated provisioning, modification,
and deprovisioning of compute, network, and storage resources, in order to form a service
that satisfies some high-level objective (e.g. policy requirements, constrained optimiza-
tion over some heuristics). To realize an orchestration system, we require: 1) a system
capable of supporting diverse orchestration tasks, such as provisioning virtual machines
(VM), and 2) control logic that determines when to provision and deprovision resources
to satisfy some objective. This thesis is concerned with the former.
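The coordinated provisioning described above can be made concrete with a short sketch: resources are modelled as a dependency graph, and provisioning proceeds in topological order so that, for example, a network exists before the VMs attached to it, and a tunnel is created only after both of its endpoints. The resource names and dependencies below are illustrative only and not part of any particular system.

```python
from graphlib import TopologicalSorter

# Each resource lists the resources it depends on. A tunnel can only be
# created after both of its endpoint VMs exist, and a VM only after its
# network exists. (Illustrative names, not a real orchestration request.)
dependencies = {
    "network": [],
    "vm-web": ["network"],
    "vm-db": ["network"],
    "volume-db": ["vm-db"],
    "tunnel-web-db": ["vm-web", "vm-db"],
}

# A valid provisioning order: every resource appears after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

An orchestrator layers control logic on top of such an ordering, invoking the appropriate compute, network, or storage manager at each step; resources with no mutual dependencies can also be provisioned in parallel.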
To illustrate this problem, consider a food festival, where food vendors, collectively,
want to reduce the time users have to wait before receiving their order (see Figure 1.1).
Indeed, there exists a model that can predict, with high accuracy, the time, the vendor,
and the food item that any user will order, given the near real-time position information
of the users. The rationale here is that the vendor can start preparing the item before
the user places the order so that when the user arrives the item is ready to be served.
In order to get this position data, there exist Wi-Fi sensors that listen for Wi-Fi probe
request packets sent by users’ smart phones. There are also some small servers located
on the festival grounds that can process the probe packets as per the model and inform
the vendors to preemptively start preparing certain items. Finally, we want all this
probing data to be stored in a public cloud, like Amazon Web Service (AWS) for future
modelling. This is an example of orchestration since different types of nodes, i.e. sensors,
local private servers, and servers in public remote datacenters, and different resource
types, i.e. compute for running the predictive algorithm, storage for storing the data,
and network for communicating the data between the various nodes, must be coordinated
to create a service.
The current public cloud offerings have become very popular; however, as illustrated
by the above example, user requirements extend beyond the offerings of cloud providers.
For the promise of cloud to be realized, users must have access to a more powerful net-
working model, and be able to orchestrate over heterogeneous infrastructures, including
private and hybrid deployment schemes. Indeed, some of the biggest cloud providers use
SDN internally [42] and could easily expose that functionality to their end user. Fur-
thermore, there is work being done to facilitate multiple autonomous clouds coordinating
(directly or through a broker) to realize user requests [30], [19]. However, until such solu-
tions are more mature, our proposed approach provides a transitory platform. This thesis
is motivated by the gap between user requirements and cloud offerings. Specifically, we
propose a design, implementation, and evaluation of a system that enables orchestration
(including advanced networking) over heterogeneous infrastructures.
1.2 Problem Statement
The goal of this thesis is to design a system that allows extending SDN capabilities to
multiple legacy infrastructures and allows orchestration over the diverse cloud landscape.
This requires designing a unified orchestration layer that can handle a wide range of
orchestration tasks. We propose the following goals for this thesis:
1. Evaluate the existing approaches to bringing SDN to legacy systems.
2. Evaluate the existing approaches to orchestrating over single and multiple domains.
Figure 1.1: An example illustrating a complex orchestration scenario. This involves
sensors, local private servers, and resources on a public cloud (AWS). The sensors collect
Wi-Fi probe packets and send them to a local server. The local server runs a predictive
algorithm to determine when and what the user will order. The local server also sends
the probe packets to AWS to be stored. These components need to be connected using
secure channels with tasks distributed over each of the nodes.
3. Design a system that supports advanced SDN features such as traffic steering. With
regards to the above scenario, this means that if a local server goes down, then we
would have the ability to steer the probe data from the sensors directly to the
public cloud, to be processed there.
4. Design a system that facilitates diverse orchestration tasks, namely, multi-cloud
orchestration, managing of unmanaged physical resources, container and VM or-
chestration, and node configuration. With regards to the above example, this means
we can coordinate the orchestration across diverse nodes and resource types.
5. Implement a system that realizes the above design.
6. Analyze the solution and quantify its performance and scalability.
We anticipate facing the following challenges as a result of the above enumerated
objectives.
1. How to enable advanced SDN features such as traffic steering.
2. How to import our own control and management layer on an arbitrary cloud.
3. How to orchestrate over multiple SDN-enabled clouds.
4. How to design a unified abstraction over multiple orchestration tasks.
1.3 Contributions
The contributions of this thesis are the following:
1. Design and implementation of how to bring SDN capabilities to non-SDN infras-
tructures.
2. Design and implementation of a unified orchestration layer that supports multiple
orchestration tasks. This includes:
(a) an application templating language that abstracts multiple orchestration tasks
(b) a subsystem that is responsible for managing unmanaged substrates in order
to be orchestrated over,
(c) and a subsystem responsible for orchestration, i.e. provisioning and deprovi-
sioning user requested resources.
3. Evaluation of the system.
1.4 Organization
The remainder of this document is organized as follows. Chapter 2 provides background
information on the cloud and related technologies and concepts. Chapter 3 is a survey
of the related works, including an analysis of their shortcomings. Chapter 4 presents
a design that realizes the above objectives. Chapter 5 discusses the implementation.
Chapter 6 evaluates the systems using various metrics. Finally, Chapter 7, considers
conclusions of this work and areas of future research.
Chapter 2
Background
2.1 Cloud Computing
Cloud computing is a paradigm of computing where resources are:
• accessed over a network,
• used concurrently by multiple users
Although the term cloud computing is somewhat recent, the underlying technologies,
namely the Internet (in the form of ARPANET) for network accessibility and time-sharing
and virtualization (as realized in the IBM OS/360 [25]) for resource multiplexing have
existed since the 1970s and 1960s, respectively. Since their inception, there have been
many advancements in networking and virtualization. Modern cloud computing offerings
consist almost exclusively of virtualized resources; although the two do not have to be
coupled thus. The cloud, in the context of cloud computing, refers to one or more resource
pools, over which resources can be provisioned.
Beyond these minimal conditions, cloud computing can further be categorized based
on the logistics of the hardware, and the interface with the user. Some of these cate-
gorizations may have more specialized names, depending on whether the niche is large
enough. The various models typically have names suffixed with "as a service", which
reflect the type of abstraction being exposed to the user. The following categorizations
are based on the logistics of the hardware.
• who owns the infrastructure
• whether the infrastructure is single or multi-tenancy, i.e. private and public cloud,
respectively
The following are the categorizations based on how the user interfaces with the re-
sources.
• whether the resources are physical or virtual
• the level of abstraction around the resources, i.e. whether the user interacts with:
– virtualized analogues of physical resources, i.e. infrastructure as a service
(IaaS)
– runtimes, related libraries, and environments, i.e. platform as a service (PaaS)
– software, i.e. software as a service (SaaS)
– other models of accessing services, e.g. desktop as a service (DaaS), mobile
backend as a service (MBaaS)
2.2 Public Cloud
Public cloud is a cloud that is accessible to the general public. Typically, the infrastruc-
ture is virtualized and multi-tenancy, i.e. VMs from different users may be provisioned on
the same hardware. However, some providers support single-tenancy as well as bare metal
machines, i.e. non-virtualized servers. The key players in this space include Amazon Web
Services (AWS), Google Compute Engine (GCE), and Microsoft Azure, among others.
2.2.1 Advantages
High Elasticity
Elasticity refers to the CP's ability to dynamically adapt the amount of provisioned
resources based on changes in user demand. High elasticity corresponds to the ability
to satisfy user requests that vary greatly in time. Public cloud providers typically have
large resource pools, allowing for high elasticity, i.e. the ability to scale up or scale down
the number of resources being used.
Low Cost
CPs typically maintain large resource pools. This allows them to benefit from economies
of scale, with regards to hardware as well as maintenance thereof, and subsequently
reduces per unit resource cost.
Lower capex, higher opex
One of the defining characteristics of modern cloud platforms is a pay-as-you-go model
where users provision and pay for resources when required and deprovision them when
not required. This shifts the cost distribution from capex to opex. This has the following
benefits:
• requires little upfront cost
• allows for more agility
• provides insurance against the exponential decrease in the cost of resources, e.g.
Moore's law (transistor density → computational power), Kryder's law (magnetic
disk storage density → storage), Keck's law (fiber optic link capacity → network
throughput)
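The capex-to-opex shift can be illustrated with a back-of-the-envelope calculation. The prices below are assumptions chosen for illustration, not quotes from any provider:

```python
# Illustrative comparison of buying a server (capex) versus renting a
# comparable cloud instance (opex). Both figures are assumptions.
server_capex = 6000.0   # assumed upfront purchase price of a server
instance_opex = 0.20    # assumed hourly rental rate for a comparable instance

break_even_hours = server_capex / instance_opex
print(break_even_hours)             # 30000.0 hours of continuous use
print(break_even_hours / 24 / 365)  # roughly 3.4 years
```

A workload that runs only intermittently never reaches this break-even point, which is why the pay-as-you-go model requires little upfront cost; the calculation also ignores power, cooling, and administration, which further favour the rented option.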
Greater availability and reliability
Public cloud providers typically possess a fleet so large that the probability that at least
one physical machine will fail over some period, e.g. a day, becomes very high. As such,
they spend effort diagnosing and retiring machines that are likely to fail. Additionally,
they have robust failover mechanisms intended to reduce downtime in case network links
or switches fail. Therefore, public cloud providers typically have greater availability and
reliability [23].
2.2.2 Disadvantages
Security
Public clouds raise various security concerns due to multi-tenancy and shared identity and
access management (IAM) systems. Specifically, various side-channel attacks have been
shown to recover secret keys (and generally any other data in memory) if the attacker
has a virtual machine hosted on the same physical machine as the victim [16], [44].
2.3 Private Cloud
A private cloud is a cloud where access is restricted to a single organization. Typically,
the cloud is owned and operated by the same organization, comparable to datacenters.
However, unlike datacenters, private clouds use virtualized resources, which brings a host
of benefits.
2.3.1 Advantages
Customizability
Private clouds are owned and operated by the same organization. Since the resources are
used exclusively by one organization, the resource management systems and the physical
resources can be customized as per user requirements.
Security
There are two facets to security shortcomings on public clouds. First, in a multi-tenancy
environment, users are vulnerable to various side-channel attacks. These attacks, as demonstrated by Bernstein [16] and Yarom et al. [44], use the time taken to load cache entries to infer the cache contents of collocated users. In a public cloud, users cannot choose where their VM will be located, and specifically, whether they will be collocated with a malicious tenant. In fact, Ristenpart et al. [35] showed how to map Amazon's EC2 infrastructure and infer the likely placement of victim VMs. They could then repeatedly provision instances until an instance was collocated with the victim. Furthermore,
malicious co-tenants can infer the victim’s data and algorithms based on memory access
patterns. The single-tenancy of private clouds circumvents these issues.
The second issue arises from the lack of trust between users and CPs, specifically the inability of users to discern malicious CPs (this includes compromised CPs, since compromised and inherently malicious CPs are indistinguishable). We primarily address security concerns in terms of data confidentiality, although integrity and availability can be reasoned about similarly. We can secure data in networks and on disk storage; however, without an efficient fully homomorphic encryption (FHE) scheme [21], data in the CPU pipeline and in memory is vulnerable. A malicious CP could emulate the entire hardware and view all the data being passed into the CPU pipeline. Although FHE schemes exist [21], they tend to be a few orders of magnitude slower than processing plaintext, and are therefore impractical [21]. A private cloud mitigates both of these situations.
2.3.2 Disadvantages
Low Elasticity, Underutilization
There is a tradeoff between the elasticity and the utilization of a cloud. High elasticity requires large unused resources, whereas high utilization requires few unused resources. If resource demand is variable, then organizations must choose between incurring costs due to unused resources or being susceptible to increases in demand. Low elasticity can be problematic if the load is highly variable. In order to account for variance in demand, organizations must have enough resources to cover their peak load. This leads to resource underutilization equal to the difference between the average and peak loads. These issues are less pronounced with public clouds, since their pricing schemes account for these differences in elasticity and utilization. By contrast, the cost of private clouds is borne entirely by the owner organization. Therefore, if demand is variable or future demand is intractable, then private clouds can become inflexible or costly.
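To make this tradeoff concrete, the sketch below computes the idle fraction of a private cloud that is provisioned for its peak load; the demand trace and units are hypothetical.

```python
# Illustrative only: quantifying underutilization in a fixed-capacity private
# cloud that must be provisioned for its peak demand.

def underutilization(demand):
    """Fraction of peak-provisioned capacity left idle on average."""
    peak = max(demand)
    average = sum(demand) / len(demand)
    return (peak - average) / peak

# A bursty demand trace (arbitrary units): mostly low load with one spike.
trace = [10, 12, 11, 80, 10, 9, 12, 10]
idle_fraction = underutilization(trace)  # roughly 0.76 of capacity idle
```

The spikier the demand, the larger the gap between peak and average, and hence the larger the idle fraction the owner organization pays for.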
High Capex, Low Opex
In order to create a private cloud, the underlying physical resources must be purchased before they can be used. This results in higher capex, due to the cost of the physical resources as well as the cost of configuring them. This is in sharp contrast to public clouds, where users can start using resources without any prior investment. However, as total cost is composed of opex and capex, private clouds may lead to a lower average cost, due to a lower opex.
2.4 Hybrid, Multi Cloud
A hybrid cloud is a cloud composed of public and private clouds, meant to overcome their respective shortcomings. Public clouds offer lower opex, effectively at the cost of reduced security, whereas private clouds offer greater security at a higher capex. Furthermore, private clouds incur costs due to unused resources because of differences in peak and average demands. A hybrid cloud is composed of a public cloud and a private cloud (typically with capacity equal to the average load). It overcomes the shortcomings of public and private clouds by using the constituent private cloud up to its maximum capacity, and then using the public cloud for additional capacity, e.g. when load spikes. This arrangement additionally requires planning with regards to resource allocation, e.g. since users would want more sensitive code and data to live and execute on private infrastructure.
A multi cloud is a cloud composed of multiple clouds: an arbitrary combination of public, private, and hybrid clouds. Furthermore, the constituent clouds in a multi cloud can themselves be multi clouds. This thesis focuses on how different kinds of clouds can be composed to realize a multi cloud.
2.5 Virtualization
Virtualization refers to a set of technologies where the user interfaces with a virtual
analogue of a physical resource. Virtualization can be applied to CPUs, storage, and
networks. Virtualization shields the user from considerations of failure and configuration of the underlying resource. Typically, virtualization creates logical partitions of the underlying
resource, which can be individually provisioned and deprovisioned. Broadly speaking, virtualization increases flexibility, albeit with associated costs. Cloud computing and virtualization are synergistic and together allow the realization of computing as a utility: cloud computing removes concerns of resource management, while virtualization removes concerns of resource failure and configuration. The following are the tradeoffs of virtualization.
2.5.1 Advantages
Separation of Concerns and Flexibility
Virtualization decouples concerns of physical resources, e.g. machine failure, from the
logical state of the applications. This separation greatly facilitates operations like migra-
tion, taking snapshots of VMs, and creating redundant replicas.
Although this varies from resource to resource, virtual instances offer more flexibility than their physical analogues. This is because we can typically create a virtual instance over any physical instance. For instance, we can create Linux virtual machines over both Windows and Linux based physical machines. This means there is no vendor lock-in, and no issues due to incompatible hardware.
Reduced Cost, Higher Utilization
Virtualization enables cost reduction through statistical multiplexing over resources. This
additionally improves resource utilization. For instance, two applications may have de-
pendencies on non-compatible versions of the same library. If we do not use virtualization,
then only one of them can run per physical machine. However, with virtualization we can
create two virtual machines, each with a separate version of the library. Utilization becomes a more pronounced issue when the physical machines have a large amount of system resources. Furthermore, higher utilization also reduces the energy used.
2.5.2 Disadvantages
Reduced Performance
There is an overhead associated with any given virtualization technology. This is, in
part, due to the extra resources used to run the virtualizing system. For instance, when
creating VMs, some resources must be used to run the hypervisor. However, some forms of
virtualization can result in substandard performance due to how the virtualization is implemented and how the virtual component runs. That is, if a physical component runs in hardware while its virtual counterpart runs in software, there will be performance degradation on account of the difference in speed between hardware and software execution.
2.5.3 Hardware Virtualization
Hardware virtualization refers to simulating the hardware such that multiple guest operating systems can run on the host system. Hardware virtualization is considered full if a guest operating system can run unmodified [20]. Furthermore, hardware virtualization
can be hardware-assisted or emulated. Hardware-assisted virtualization allows for direct
execution of instructions, barring privileged instructions, e.g. modifying the page table.
By contrast, for emulated systems, all instructions must pass through the hypervisor.
This greatly degrades performance since all instructions must be executed with a layer
of abstraction and the added layer of abstraction is software based.
2.5.4 Network Virtualization
Network virtualization refers to a host of techniques to create a virtual view of the
network. This is typically achieved using encapsulation protocols, like VXLAN and GRE.
Comparable to how hardware virtualization creates the view that the user has access to
their own hardware resources, network virtualization creates the view that the user has
complete control over the entire network and can create arbitrary network topologies.
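As an illustration of the encapsulation involved, the sketch below builds a VXLAN-style header (per the RFC 7348 layout: a flags byte, reserved bytes, and a 24-bit virtual network identifier) around an inner frame. The outer UDP/IP wrapping used in practice is omitted.

```python
import struct

# Minimal sketch of VXLAN-style encapsulation: an 8-byte header carrying a
# 24-bit virtual network identifier (VNI) is prepended to the original
# Ethernet frame. Real deployments wrap this further in UDP/IP.

def vxlan_encap(vni, inner_frame):
    # Flags byte 0x08 marks the VNI field as valid; the VNI occupies the
    # upper 24 bits of the second 32-bit word.
    header = struct.pack(">II", 0x08 << 24, vni << 8)
    return header + inner_frame

def vxlan_decap(packet):
    _flags, word2 = struct.unpack(">II", packet[:8])
    return word2 >> 8, packet[8:]

vni, frame = vxlan_decap(vxlan_encap(42, b"\xaa" * 14))
```

Because the inner frame travels opaquely inside the header, endpoints sharing a VNI see a flat layer-2 segment regardless of the underlying physical topology.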
2.6 Compute Environments
A compute environment is some interfaceable realization of a Turing machine; containers and VMs are the most popular examples [17]. This notion is intrinsically related to virtualization and VMs, since VMs are one type of compute environment. The following enumeration of the environments is sorted by decreasing resource overhead, flexibility, and isolation.
Figure 2.1: The relative weights of different characteristics of the various compute environments. Isolation refers to how isolated the environment is. Performance refers to how many system resources are used to perform actual work, as opposed to being used for virtualization, i.e. the opposite of overhead. Flexibility refers to how much flexibility the user has; here, BMs perform poorly since their hardware and kernel are fixed. Virtualization refers to the amount of virtualization being performed.
[Figure: a stack of compute environments, with the physical machine at the base, then virtual machine (virtual hardware), container (virtual OS), and serverless (virtual server), and the amount of virtualization increasing upward.]
Figure 2.2: Different types of resources sorted by increasing flexibility and isolation.
2.6.1 Bare Metal Machines
A bare metal (BM) machine is a physical server that is provisioned in its entirety. A
BM has the best performance and the lowest overhead, as BMs do not run any virtualization stack. Incidentally, this also makes them less flexible, since their hardware
(instruction set architecture) and typically OS are unchangeable. More importantly,
BMs lose the host of benefits associated with virtualization, chiefly the decoupling of the
hardware (and its inevitable failures) from the state of the machine.
2.6.2 Virtual Machine
Virtual machines (VM) are logical slices of system resources that run on top of physical
machines. VMs run a full operating system (OS) and have the same capabilities as
physical machines. The process of booting and running a virtual machine is as follows:
• a hypervisor (a special program that logically partitions the host to create VMs)
is run on the host machine either natively on the bare metal or as a process
• the hypervisor takes an OS image file and a resource allocation, e.g. 2GB RAM,
20GB storage, and 1 core
• if the hypervisor is run natively, it can use hardware-assisted virtualization and the
VM will have near native performance; otherwise the hardware must be emulated
and the performance will be significantly worse
Virtual machines require their own OS kernel and hardware components that the OS
needs to run. This allows VMs to provide high isolation and flexibility, at the cost of a
large overhead.
2.6.3 Containers
Containers are another compute environment. There are many container implementations, such as LXC, OpenVZ, and Docker. Unlike VMs, which require emulation of hardware components and their own OS, containers share the OS kernel with the host and other sibling containers. This reduces the resource overhead and provisioning time associated with containers. The OS creates a container by logically partitioning system resources and providing an overlay filesystem.
2.6.4 Lambda
Lambda (also referred to as serverless computation) extends the goals of containers [24].
Whereas containers only share the OS, the serverless model goes as far as to share
language runtimes. The user specifies a callback function that gets invoked when an
event occurs. Compared to containers, this model has an even smaller resource footprint
and response time. However, this comes at the cost of flexibility, since the user must choose from a limited set of preconfigured environments.
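The callback model can be sketched as a toy event platform; the event names and registration API are hypothetical, not those of any particular provider.

```python
# Toy model of the serverless pattern: the user registers a callback, and
# the platform invokes it when a matching event arrives.

class ServerlessPlatform:
    def __init__(self):
        self.handlers = {}

    def register(self, event_type, callback):
        """Associate a user-supplied callback with an event type."""
        self.handlers.setdefault(event_type, []).append(callback)

    def emit(self, event_type, payload):
        """Invoke every callback registered for this event type."""
        return [cb(payload) for cb in self.handlers.get(event_type, [])]

platform = ServerlessPlatform()
platform.register("object_uploaded", lambda p: f"thumbnail({p})")
results = platform.emit("object_uploaded", "cat.png")
```

Because the platform owns the runtime and only dispatches callbacks, it can pool language runtimes across users, which is the source of the small footprint noted above.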
2.6.5 Discussion
The different platforms have different tradeoffs with regards to resource overhead, flexibility, and isolation. With regards to Figure 2.2, a resource type should sensibly only be nested in something below it. Therefore, the kind of substrate resource type available determines what kind of resource can be provisioned. For instance, if the resource management systems can only provision VMs, then there is no way to provision a bare metal server; however, they can provision containers on top of these VMs.
2.7 Software-defined Networking
Software-defined networking (SDN) is an alternative model for computer networking. The
defining characteristic of SDN is the separation of high level control logic (control plane)
from the low level forwarding actions (data plane). Unlike traditional networking, where
routing is distributed, SDN exposes a centralized view of the network to the control plane.
This has a host of benefits, including agile network (re)configuration, faster convergence, and improved debugging. There are many different flavors of SDN, with OpenFlow [31] being the most popular.
2.7.1 OpenFlow
OpenFlow is one realization of the SDN paradigm. OpenFlow is a protocol for communi-
cating between the control and data plane. The control plane (also called the controller)
is a logically centralized entity that determines the routing of the packets [31]. Currently,
the most popular controllers are Ryu [36], Floodlight, OpenDaylight [32], and ONOS [15].
The data plane consists of hardware and software, OpenFlow-compliant switches. For a
switch to be OpenFlow compliant it must be able to support the following:
• forwarding a matching packet through a specified port
• encapsulating and sending a packet to the controller
• dropping a packet
• communicating using the OpenFlow protocol
A typical interaction between the controller and a switch is as follows:
1. A switch receives a packet that does not match a flow.
2. The switch sends the packet to the controller.
3. The controller determines how to route the packet.
4. The controller installs a flow, i.e. a match and a corresponding action, on the
switch.
5. Subsequent matching packets are forwarded by the switch based on the installed flow.
Matching headers can be defined for arbitrary header values. This includes any field in an Ethernet frame or in the header of any encapsulated protocol. The policy with regards to flow installation can be reactive or proactive, which determines whether rules are installed after or before seeing a matching packet, respectively.
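The reactive interaction enumerated above can be sketched as a toy model of a switch and controller; the match is simplified to the destination address, and the class and method names are illustrative rather than those of any real controller framework.

```python
# Toy simulation of reactive OpenFlow behavior: a table miss sends the
# packet to the controller, which installs a flow; later matching packets
# are handled by the switch alone.

class Controller:
    def __init__(self, routes):
        self.routes = routes  # destination address -> output port

    def packet_in(self, switch, pkt):
        """Handle a table miss: compute the route and install a flow."""
        port = self.routes[pkt["dst"]]
        switch.install_flow(match=pkt["dst"], action=port)
        return port

class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}
        self.misses = 0

    def install_flow(self, match, action):
        self.flow_table[match] = action

    def receive(self, pkt):
        if pkt["dst"] in self.flow_table:      # fast path: match in table
            return self.flow_table[pkt["dst"]]
        self.misses += 1                       # table miss: ask controller
        return self.controller.packet_in(self, pkt)

sw = Switch(Controller({"h2": 3}))
first = sw.receive({"dst": "h2"})   # miss: controller installs the flow
second = sw.receive({"dst": "h2"})  # hit: forwarded from the flow table
```

A proactive policy would simply call `install_flow` for every route up front, so that no packet ever incurs the round trip to the controller.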
2.7.2 Network Function Virtualization
Network function virtualization (NFV) is an emergent area where network functions
such as load balancers and firewalls, which typically existed as hardware appliances, are
being replaced with virtual analogues. The goals of NFV are more flexible provisioning
and reduced costs. NFV critically requires server virtualization and cloud computing to
elastically provision and deprovision resources. Additionally, it requires SDN to chain
arbitrary virtual network functions to form new services.
2.8 Resource Management
Resource management refers to management of exhaustible resources. The term resource
is ambiguous on account of its varied usage. Specifically, resource can refer to:
1. system resources, e.g. compute in terms of number of cores (or perhaps even more
granularly, FLOPS or IPS), storage in terms of bytes, and network in terms of
bandwidth
2. standalone processing units, or cluster of processing units that:
• can be provisioned
• encapsulate some amount of system resources
An example would be a physical server that contains 4 cores, 32 GB of memory, 1
TB of hard disk storage, and access to 25 Mbps of uplink and downlink network
bandwidth. These can be classified as physical or virtual. In this view, servers,
FPGAs, GPUs, microcontrollers, switches, routers, and links are considered phys-
ical resources. Virtual machines, containers, and software switches are considered
virtual resources.
By extension, resource management in the former sense refers to managing system
resources as done by an operating system. In the latter sense, resource management
refers to management of processing units, e.g. a hypervisor managing virtual machines,
Figure 2.3: The two different views of the terms resource and resource management. Type 1 refers to system resources and management thereof. Type 2 refers to processing units and clusters of processing units and their management.
or a meta entity that manages multiple hypervisors. See Figure 2.3 for a visualization of this. In the context of this document, resource typically refers to the latter notion; where the usage is ambiguous, the intended sense will be specified.
A resource management system (RMS) embodies a specific resource management approach. Since resources are finite, there is contention amongst users. The goal of a resource management system is to provide an interface through which users can provision and deprovision resources, and to arbitrate contending requests. In the context of cloud computing, an RMS exposes an API that allows individual resources to be provisioned.
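A minimal sketch of such an interface, with illustrative names and units, might look as follows.

```python
# Sketch of the RMS role: an interface for provisioning and deprovisioning
# resources that also arbitrates contention over a finite pool.

class ResourceManagementSystem:
    def __init__(self, total_cores):
        self.free_cores = total_cores
        self.allocations = {}  # resource id -> cores held
        self.next_id = 0

    def provision(self, cores):
        if cores > self.free_cores:  # arbitrate: reject on contention
            raise RuntimeError("insufficient capacity")
        self.free_cores -= cores
        self.next_id += 1
        self.allocations[self.next_id] = cores
        return self.next_id

    def deprovision(self, resource_id):
        self.free_cores += self.allocations.pop(resource_id)

rms = ResourceManagementSystem(total_cores=8)
vm = rms.provision(6)
small = rms.provision(2)  # pool now fully used
rms.deprovision(vm)       # releasing frees capacity for new requests
```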
2.8.1 OpenStack
OpenStack [37] is a cloud platform supporting IaaS. As a complete IaaS solution, OpenStack provides all of the necessary resource management components, e.g. nova for managing virtual machines, neutron for managing networks, and swift and cinder for storage. The compute management includes support for hypervisors like KVM [27], Xen [13], and QEMU [14]. It also has an image registry, an IAM system, and an orchestration engine. Additionally, the system is in a continuous state of development, with new features like Docker container support. Other RMSes include CloudStack and Eucalyptus.
2.9 Software-defined Infrastructure
Software-defined infrastructure (SDI) is a resource management architecture proposed by Kang et al. [26] that converges the management of compute, network, and other heterogeneous resource types and maintains a global topological view of all resources, virtual and physical. Additionally, SDI attempts to virtualize all resource types. SDI extends the notions of SDN in two ways:
• decouples the control and actuator logic
• exposes a centralized view of all nodes
SDI is a hierarchical resource management system whereby different resources are controlled by different resource controllers. Each of these resource controllers then interfaces with a centralized manager and a monitoring and analytics manager. For instance, the SDI resource management system (RMS) natively uses SDN to manage its networks. This allows for the following advanced use cases:
• traffic steering
• service chaining
See Figure 2.4 for a conceptual view of the SDI RMS.
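The hierarchical arrangement can be sketched as follows; the controller classes and resource names are illustrative, not part of the actual SDI implementation.

```python
# Sketch of the hierarchical SDI layout: per-type resource controllers
# register with a central SDI manager, which keeps the global view.

class SDIManager:
    def __init__(self):
        self.controllers = {}
        self.topology = []  # global view of all provisioned resources

    def register(self, resource_type, controller):
        self.controllers[resource_type] = controller

    def provision(self, resource_type, spec):
        """Delegate to the right controller and record the result."""
        resource = self.controllers[resource_type].provision(spec)
        self.topology.append((resource_type, resource))
        return resource

class ComputeController:
    def provision(self, spec):
        return f"vm:{spec}"

class NetworkController:
    def provision(self, spec):
        return f"flow:{spec}"

sdi = SDIManager()
sdi.register("compute", ComputeController())
sdi.register("network", NetworkController())
sdi.provision("compute", "small")
sdi.provision("network", "h1->h2")
```

The key property is that every provisioning action, regardless of resource type, flows through the manager, which is what makes the centralized topological view possible.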
2.10 Smart Application on Virtual Infrastructure
Smart Application on Virtual Infrastructure (SAVI) is an initiative for building a large
scale testbed for designing and testing future application platforms. The software-defined
infrastructure RMS was researched and developed within the context of the SAVI project.
The SDI RMS is natively available only on the SAVI testbed. The SAVI testbed critically
leverages the OpenStack [6] and OpenFlow initiatives to manage compute and network
resources uniformly.
2.11 Orchestration
Orchestration refers to the provisioning, modification, and deprovisioning of compute, network, and storage resources (or complex resource types composed of these primitive resource types, e.g. a virtual machine) to form a coordinated service. In general, orchestration includes both 1) the mechanism for provisioning and deprovisioning resources, and 2) the decision-making logic that determines when to provision and deprovision resources in order to satisfy some high level objective, such as policy requirements or
[Figure: the SDI manager, topology manager, and monitoring-and-analytics components sit above per-resource controllers (Resource A through Resource N), which manage the underlying physical and virtual resources; external entities interact through open interfaces.]
Figure 2.4: A conceptual view of the SDI RMS.
[Figure: a SAVI node implements the SDI design, with OpenStack managing the cloud (virtual compute resources, e.g. VM instances, over physical compute/storage servers) and an OpenFlow controller managing the networks (virtual network resources over physical switches, via the OpenFlow protocol); monitoring data is collected across both, and open interfaces are exposed above the SDI manager, topology manager, and monitoring and analytics components.]
Figure 2.5: A conceptual view of a SAVI node modelled after the SDI RMS.
constrained optimization over some metrics. Viewing orchestration as an optimization
problem requires that the domain be modelled, which may be non-trivial and difficult to generalize, e.g. when trying to optimize for data redundancy. In related literature, systems that perform 1), 2), or both all come under the purview of orchestration. In this work,
orchestration refers to the former, i.e. the mechanism for provisioning and deprovisioning
resources.
The coordination of separate resources poses challenges due to the heterogeneity of
cloud resources. An orchestration system interfaces with one or many RMSes. An RMS
typically exposes an interface to provision and deprovision resources. Orchestration sys-
tems take high level requests from users, typically in the form of a topology specification
file. They then use the interface exposed by the RMS to provision resources and compose
them together. In some cases, an orchestration system may have to perform additional
steps to satisfy the user request.
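This flow, from a high-level topology specification to provisioned and composed resources, can be sketched as follows; the specification format and the RMS calls are hypothetical.

```python
# Sketch of orchestration: walk a topology specification and provision each
# resource through the RMS interface, then compose them via links.

topology_spec = {
    "nodes": [
        {"name": "web", "type": "vm"},
        {"name": "db", "type": "container"},
    ],
    "links": [("web", "db")],
}

def orchestrate(spec, rms):
    provisioned = {n["name"]: rms.provision(n["type"]) for n in spec["nodes"]}
    for src, dst in spec["links"]:
        rms.connect(provisioned[src], provisioned[dst])
    return provisioned

class FakeRMS:
    """Stand-in for a real RMS API; records the calls it receives."""
    def __init__(self):
        self.calls = []

    def provision(self, rtype):
        self.calls.append(("provision", rtype))
        return f"id-{rtype}"

    def connect(self, a, b):
        self.calls.append(("connect", a, b))

rms = FakeRMS()
nodes = orchestrate(topology_spec, rms)
```

Note that the orchestrator holds no resource logic of its own: it only sequences calls against whatever interface the RMS exposes, which is precisely why heterogeneous RMS interfaces make multi-cloud orchestration difficult.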
Chapter 3
Related Work
In this chapter we present the related work, grouped into orchestration systems, configuration systems, and overlay SDN over legacy (non-SDN) approaches. The related work reflects the two objectives of this thesis, namely, to enable SDN capabilities on legacy systems and to orchestrate over multiple clouds.
3.1 Single Cloud Orchestration
The following works consider orchestration over a single cloud.
3.1.1 Secondnet, VDC Planner
Guo et al. [22] consider the orchestration of server and network resources while minimizing cost. They propose a virtual data center (VDC) as a joint abstraction of network (bandwidth) and server resources. The objective is to find an embedding of this VDC over the physical substrate. The VDC embedding problem is NP-hard (it is a generalization of the bin packing problem). The authors propose a greedy heuristic to achieve near-optimal allocations.
The VDC Planner is an orchestration system proposed by Zhani et al. [47]. Like
Secondnet the work focuses on the VDC embedding problem. However, it additionally
considers the problem of minimizing energy usage while utilizing VM migrations to satisfy
fluctuating user demands. The shortcoming of these works is that they model clouds as well-defined objects with standardized interfaces and capabilities. Our work is
explicitly positioned to design a system that can work with the actual, highly nuanced
cloud landscape. Additionally, these works consider the optimality of resource allocation
from the perspective of cloud providers. This thesis looks to design a system from the
user’s perspective. This includes providing capabilities, such as SDN, in an overlay
manner, when native support does not exist. Finally, these solutions only consider the
allocation of VMs; whereas we look to design a system that considers both VMs and
containers.
3.1.2 Borg
Borg is a large scale cluster management system [39] that is used internally by Google.
Borg is concerned with the scheduling of jobs and orchestration of containers. As such,
it’s design considers job failures, worker node failure, master node failure, the allocation
of jobs to optimize certain variables, and different priorities amongst jobs. The system
is designed to support a wide variety of objectives, e.g. minimizing running time and
minimizing total cost. Borg is very effective for the problem of container orchestration.
However, the scope of Borg is limited to container orchestration and also to single autho-
rization domains. As such, it cannot be considered an alternative orchestration system
since it excludes a large number of tasks that we want to consider under the purview of
our orchestration system.
3.1.3 AWS: CloudFormation
The AWS orchestration system, CloudFormation, is very comprehensive and covers all the services provided by AWS [1]. However, AWS doesn't expose an SDN interface, which prevents users from performing advanced traffic steering. Since their API must be comprehensive, some operations end up being complex to specify. Additionally, from the perspective of API design, some parts of the API are inconsistent. This can make it difficult for users to perform certain tasks. Also, CloudFormation reflects the view that each topology file specifies an isolated collection of resources. On account of this, it becomes difficult to interface with existing components, e.g. existing VMs.
3.1.4 OpenStack: Heat
Heat is the orchestration engine that is developed as part of OpenStack [6]. Heat is
largely modelled after CloudFormation, and therefore it shares many of its features.
For OpenStack, Heat provides an easy to write, maintain, and read modelling language.
However, like CloudFormation, Heat does not provide any of the advanced SDI capabilities. Furthermore, Heat also does not support interfacing with existing components.
3.2 Multi Cloud Orchestration Tools
The following works consider orchestration over multi clouds.
3.2.1 Cloud Resource Orchestration: A Data-Centric Approach,
Declarative Automated Cloud Resource Orchestration
Liu et al. have proposed a two-part orchestration system [29], [28]. The first aspect
of the work is to model cloud resources as structured data. This allows problems from
the domain of cloud resource orchestration to be mapped to problems in the domain of
database management. Subsequently, cloud resources can be queried by a declarative
language, and updated with well-understood transactional semantics (e.g. atomically
perform a set of operations). The second part of the work is concerned with formulating
orchestration tasks as constrained optimization problems. These works exclusively focus
on a model cloud, while only considering server and network resources. By comparison,
this thesis is concerned with enabling multidimensional orchestration, while delegating
the decision making of resource allocation to the user.
3.2.2 Networked Cloud Orchestration: A GENI Perspective
This work by Baldine et al. considers a unified control layer over a heterogeneous and distributed landscape [12]. The authors document their progress in building a unified control layer that allows users to provision virtual infrastructure slices consisting of
compute, storage, and network resources, over distributed infrastructures and enables au-
tonomic management thereof. The work proposes a quasi-declarative modelling language
that allows users to express resource objects, their properties, and how different objects
are related. The work also considers the stitching problem, whereby their proposed orchestration system must stitch "different pieces of virtualized resources from geographically distributed compute, network, and storage substrate into a single connected configuration" [12]. The system works by dividing orchestration tasks into subtasks based on
where the resources would be provisioned and subsequently determines each individual
resource allocation.
3.2.3 Greenhead
GreenHead is a conceptual extension of VDC Planner that applies heuristics to perform
VDC embedding across a distributed infrastructure [11]. As in the case of Secondnet and
VDC Planner, Greenhead targets abstract datacenters with standardized capabilities.
Although their evaluation shows impressive results, it is ultimately based on simulations of the requests and of the datacenters. As such, it does not consider the
design for interfacing with multiple heterogeneous clouds. Furthermore, Greenhead only
targets VMs.
3.2.4 CloudMF
Cloud Modelling Framework (CloudMF) is an approach for multi cloud orchestration
proposed by Ferry et al. [19]. CloudMF leverages model driven engineering (MDE)
to create individual models of the different clouds. It then uses these models along
with a modelling language to reduce the complexity of expressing various topologies
across multiple clouds. This approach is effective for orchestration across multiple clouds. However, it only supports provisioning of VMs, and as such CloudMF is unsuitable for network resource orchestration and traffic steering.
3.2.5 Multi-Cloud Brokering
Lucas-Simaro et al. [30] propose a mechanism that allows application deployments across
multiple clouds. This approach works by brokering user requests and cloud capabilities.
As a brokering system, this system is limited to what is provided by the substrate cloud
layer. This restricts extensibility to new resource types and private resources.
3.2.6 Terraform
Terraform [8] is a project that allows users to specify infrastructures that span multiple providers. These providers cover most IaaS, PaaS, and SaaS providers. However, this requires knowledge of many parameters, such as image identifiers, before realizing the template. This presents usability challenges, since determining the exact values of parameters like image identifiers can be tedious.
3.2.7 The Topology and Orchestration Specification (TOSCA)
The Topology and Orchestration Specification (TOSCA) [9] is declarative language for
describing applications and services that span multiple administrative domains. There
exist parsers and services that translate services written in TOSCA to topologies that
span multiple clouds (such as Cloudify [3]). Its cross cloud capabilities are similar to Ter-
raform; however, the language itself is provider-agnostic. As a specification for existing
Chapter 3. Related Work 28
and future clouds, it has limited support for extended SDI features, such as a unified view
of heterogeneous resource types.
3.3 SDN over Legacy Systems
Here we describe the various approaches for achieving SDN over non-SDN systems. These
approaches attempt to provide SDN like capabilities in an overlay manner.
3.3.1 Fibbing
Fibbing is an approach for providing centralized routing capabilities [41], [40]. Fibbing is
different from other SDN approaches, chiefly OpenFlow, in that it does not propose a way
to explicitly create a centralized management plane. Instead, Fibbing works with the existing
distributed routing protocol, Open Shortest Path First (OSPF). This approach presents
the user with a centralized view of the topology. In order to make routing changes, Fibbing
introduces fake nodes and links. These fake nodes and links are used to maneuver
routers into installing arbitrary forwarding rules. This is similar to SDN, where arbitrary
rules are installed; however, an SDN controller explicitly pushes rules to switches. Specifically,
an agent that knows the global topology sends fake OSPF packets (corresponding to
these fake nodes and links). This approach combines the benefits of centralized routing,
namely ease of use, with the benefits of distributed routing, namely robustness and fault
tolerance.
3.3.2 Ravello Systems
Ravello Systems [10] is a company that created the HVX hypervisor for orchestration over
multiple public clouds. HVX is offered as a SaaS, and the technology is closed source.
The following has been inferred from some high level descriptions of their product.
HVX is a distributed hypervisor that uses a trap-and-emulate-like approach to perform
dynamic translation of guest OS instructions. They also use a similar approach
to process network packets. Since all packets are passed through a logically centralized
control plane, they can perform arbitrary switching of packets. Additionally, they
use some encapsulation protocol (perhaps VXLAN) to allow the user to create arbitrary
network topologies. Specifically, unlike the native offerings of cloud providers (which only
enable network-layer, i.e. L3, connectivity), this allows topologies at layer 2 (L2), i.e. the
data link layer. Yet, these network topologies are based on the traditional networking stack,
and they do not expose a mechanism for traffic steering or dynamic service chaining.
3.3.3 OpenContrail
OpenContrail [5] is a network virtualization approach by Juniper Networks. OpenContrail
supports the creation of arbitrary virtual L2 networks over physical networks. Furthermore,
it also supports NFV service chaining via a domain-specific modelling language.
However, as a network virtualization platform, this only tackles half of the problem.
Specifically, there is no support for orchestration of compute nodes.
3.4 Configuration/Orchestration Tools
3.4.1 Salt Cloud, Ansible
Ansible [2] and SaltStack [7] are similar projects that primarily focus on the configuration of
nodes. Both provide a high-level language to specify configurations to apply. Indeed,
Vino uses Ansible to create the cloud and deploy applications. By themselves, however,
these tools lack features to statefully manage resources.
Chapter 4
Design of Multidimensional
Orchestrator
4.1 Overview
The goal of this thesis is to design and implement a system that extends SDN capabilities
to heterogeneous infrastructures and enables orchestration of complex objects. In section
1.2 we considered a high level set of objectives for this system. In this chapter, we will
design a system that realizes these objectives. In an attempt to produce a design that
is uninfluenced by the implementation considerations, we have separated the design and
implementation and placed them in separate chapters. This chapter documents the
iterative approach to designing the orchestration system called the virtual infrastructure
orchestrator, or simply Vino. This chapter answers the following questions.
1. What is orchestration? What dependencies does the orchestrator have on the underlying RMS? How should the orchestration task be modelled?
2. How can we interface with and orchestrate over a single legacy cloud?
3. How can we interface with and orchestrate over multiple legacy clouds?
4. How can we interface with and orchestrate over an unmanaged infrastructure?
5. How can we provision containers?
6. How can we perform arbitrary configuration of nodes?
We distinguish between two types of infrastructures, managed and unmanaged. A
managed infrastructure is a collection of resources that are controlled by an RMS and can
Figure 4.1: An example of how different components, e.g. the compute manager and monitoring manager, must be coordinated to realize a complex application, e.g. an autoscaling web server deployment.
be provisioned by users. As such, all public clouds are managed. By contrast, a set of
standalone physical servers are unmanaged. We describe an infrastructure as legacy if it
only supports distributed IP networking.
Across the literature on cloud computing, orchestration and resource management are
used in varying ways. For the purposes of our work these terms are defined in Chapter
2. Additionally, the term substrate refers to the combination of the resources (i.e. the
objects of resource management) and the RMS (if one exists).
4.2 Orchestration
4.2.1 Overview
Orchestration refers to provisioning resources and connecting them to realize more com-
plex objects. Typically, this requires interfacing with multiple components and program-
ming them in a coordinated way. For instance, an auto scaling web server deployment
would require interfacing with the compute manager to provision new virtual machines
and with the monitoring system to detect when machines should be provisioned or de-
provisioned. See Figure 4.1 for an example.
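The coordination in this example can be sketched as follows. The thresholds, class names, and method signatures below are purely illustrative and do not correspond to any particular RMS API.

```python
# A toy sketch of the coordination in Figure 4.1: monitoring readings
# drive provision/deprovision decisions on a compute manager.
# Thresholds and interfaces are illustrative only.
UPPER_THRESHOLD, LOWER_THRESHOLD = 0.8, 0.2

class ComputeManager:
    def __init__(self, servers=1):
        self.servers = servers

    def provision(self):
        self.servers += 1

    def deprovision(self):
        if self.servers > 1:  # always keep at least one server
            self.servers -= 1

def autoscale(compute, avg_load):
    """Called by the monitoring manager on each load reading."""
    if avg_load > UPPER_THRESHOLD:
        compute.provision()
    elif avg_load < LOWER_THRESHOLD:
        compute.deprovision()

compute = ComputeManager()
autoscale(compute, 0.95)   # upper threshold crossed: scale out
assert compute.servers == 2
autoscale(compute, 0.05)   # lower threshold crossed: scale in
assert compute.servers == 1
```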
4.2.2 Requirements on the Substrate
The key requirement for an orchestration system is that it has access to infrastructure
resources, such as compute, network, storage, and monitoring data. Orchestration in
this sense can be viewed as the stitching together of resources exposed to the orchestrator.
For instance, reconsider the example of an auto scaling web server. To orchestrate this
deployment, we must have access to the monitoring data, without which this arrangement
cannot be achieved. Therefore, in order to orchestrate something, the corresponding
capabilities must be provided either:
• natively by the underlying RMS
• as an overlay service
Thus, the creation of the overlay service is distinct from the orchestration requiring
that service; however, an implementation may combine these phases. In the following
section, we will consider the requirements of the underlying RMS, and how deficient
RMSes can be supplemented to enable the desired orchestration.
4.2.3 Modelling the Application
An orchestration system requires a modelling language (textual or visual) that allows the
users to express their application. Like the design of an API, the design of a modelling
language greatly determines whether it facilitates or hinders the workflow. The following
considerations will help us create a more effective modelling language.
Language Type
Visual languages provide a more intuitive interface by facilitating the discoverability
of capabilities. For instance, in a visual interface, we can have a button called Create a
Virtual Machine, which when clicked, creates a circular element representing a VM. Thus,
visual languages can improve discoverability of features and provide direct feedback on
user actions [18]. Another benefit of a visual representation is that the entire topology
can be understood in a single glimpse. However, both of these benefits only apply if the
topology is small or simple. If a modelling language supports a large number of features,
then the resulting interface may become overly complex, e.g. if there were a button
for each capability of the system. The same applies for the size of the topology, since
a large topology would not fit on a single screen, and may become incomprehensible.
Furthermore, most existing modelling platforms primarily use a textual language. So
from a user’s perspective, a visual language would require learning a new type of interface.
One benefit of a textual representation is that it can leverage the mature design of
other textual modelling languages like TOSCA and the Heat template language. Unlike
visual representations, text files are trivial to serialize. This allows users to perform diff,
search and replace, etc. Users can modify and reuse template files. With the exception
of instant feedback and discoverability, textual representations are superior to visual
representations. For this reason, a textual language is chosen.
Model
This section considers how an application topology is modelled and represented. A
modelling language implies both the language syntax and semantics. We have the option
of choosing an existing language or creating a custom language. For a custom language,
we can choose an arbitrary mapping from the syntax to the semantics, since we would be
writing the parser. However, for an existing language (i.e. when using an existing parser),
we are bound by the semantics of the language. For instance, consider the code fragment
[1,2,3,4]. If we write a custom parser, we can interpret this as an associative array,
whereby pairs of elements correspond to key-value pairs. This is valid JSON 1, and a
JSON parser interprets this as a list with the elements 1, 2, 3, and 4. As
in the case of a new language, we can define a custom mapping from the syntax to the
semantics, e.g. interpret this as an associative array by defining pairs of elements as
key-value pairs. However, this would require additional logic and extra time and space
overhead due to an extra phase of parsing, i.e. parsing the output of the JSON
parser. Furthermore, although we can define key-value pairs as above, {1:2, 3:4}
better maps to the user's intuition of how key-value pairs should be represented. The point
here is that there are many ways to model an application; however, some representations
are better with regards to resource usage and user experience.
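This discussion can be made concrete with a short sketch. The helper function below is hypothetical; note also that, unlike the YAML-style notation used in the text, JSON requires object keys to be strings.

```python
import json

# A standard JSON parser interprets "[1,2,3,4]" as a list.
doc = json.loads("[1,2,3,4]")
assert doc == [1, 2, 3, 4]

# A custom second parsing phase could reinterpret the list as an
# associative array by pairing adjacent elements -- the extra phase
# the text refers to, with its attendant time and space overhead.
def pairs_to_mapping(seq):
    return {seq[i]: seq[i + 1] for i in range(0, len(seq), 2)}

assert pairs_to_mapping(doc) == {1: 2, 3: 4}

# The native mapping syntax needs no second phase (JSON keys must
# be strings, unlike the {1:2, 3:4} notation in the text).
assert json.loads('{"1": 2, "3": 4}') == {"1": 2, "3": 4}
```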
In general, there is no restriction on either the nodes or the connection between them.
Therefore, we choose to model the application as a graph. An application can be fully
described as the 2-tuple of the list of nodes and the list of connections between the nodes
(this includes each node’s properties and each connection’s properties). To represent
these in a succinct manner, we use the following conventions. We distinguish between
scalars, i.e. number and strings, and containers, i.e. lists and associative arrays (also
called objects). Containers can hold scalars or other containers. A list is represented as
follows:
- 1
- 2
- 3

1 JSON is a data serialization language based on the JavaScript language's syntax for representing various data types. This will be discussed in more detail in the next chapter.
An associative array is represented as follows:
name: foo
type: bar
Based on this, we get the following representation of a graph of nodes.
-
- nodeA
- nodeB
-
- nodeB
- nodeC
The application to be modelled cannot wholly be represented as above. Specifically,
nodes and links have additional properties that must be specified. We can achieve this
by creating a separate list of node objects that contain each node’s properties.
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
- nodeA
- nodeB
-
- nodeB
- nodeC
However, links may also have properties. Specifically, a property on the link from
nodeA to nodeB may not be the same as that from nodeB to nodeC. Thus, we can
represent this as follows:
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
src: nodeA
destinations:
-
endpoint2: nodeB
-
src: nodeB
destinations:
-
endpoint2: nodeC
However, this becomes somewhat unreadable. An alternative would be:
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
endpoint1: nodeA
endpoint2: nodeB
-
endpoint1: nodeB
endpoint2: nodeC
At the cost of some redundancy, this allows for a more intuitive representation.
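To illustrate how an orchestrator might consume this representation, the sketch below builds the equivalent structure and checks that every edge endpoint names a declared node. The validation logic is illustrative, not part of Vino's actual parser.

```python
# The topology literal mirrors the final YAML representation above.
topology = {
    "nodes": [{"name": "nodeA"}, {"name": "nodeB"}, {"name": "nodeC"}],
    "edges": [
        {"endpoint1": "nodeA", "endpoint2": "nodeB"},
        {"endpoint1": "nodeB", "endpoint2": "nodeC"},
    ],
}

def validate(topo):
    """Check that every edge endpoint references a declared node."""
    names = {n["name"] for n in topo["nodes"]}
    for edge in topo["edges"]:
        for key in ("endpoint1", "endpoint2"):
            if edge[key] not in names:
                raise ValueError(f"edge references unknown node {edge[key]!r}")
    return True

assert validate(topology)
```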
4.3 Resource Management Overview
A resource management scheme must consider how different resources should be managed,
i.e. the interface exposed by each resource controller, and how resources should be
provisioned, i.e. are resources provisioned on a managed middleware layer or directly
on a substrate. We will first discuss these considerations and then iteratively design a
system that satisfies our requirements.
4.4 Resource Provisioning Model
The resource provisioning model refers to how resources are provisioned and deprovi-
sioned on heterogeneous infrastructures. Resource provisioning, especially provisioning
VMs, is a key consideration, because it affects the design and capabilities of the orchestrator.
Since we are taking an iterative approach to designing the requisite system, for
the initial iterations we will only consider public clouds. Furthermore, no major public
cloud provider exposes an SDN interface. Therefore, in our discussion, public cloud is
synonymous with legacy cloud.
We are designing a system that extends the native capabilities of legacy clouds. This
can be achieved by either natively supplementing the existing RMS or by providing
the capabilities in an overlay manner. Our design must not assume that we have any
privileged access to the cloud, or its API; therefore, our only option is to provide the
functionality in an overlay manner. In general, our system should be designed from the
perspective of a third-party module. In this view, the choice of resource provisioning
model also affects the thickness (i.e. the amount of logic contained in) of the corresponding
resource provisioning middleware between the user and the cloud (see Figure 4.2).
There are three main models for provisioning resources, as explained below.
4.4.1 Native Provisioning
The simplest approach is to use the native API exposed by the cloud provider (see
Figure 4.3). This has the benefit of working with a stable and mature API that is
directly provided by the cloud provider. As new capabilities are added to the cloud, the
API is updated to reflect this. This, however, leads to a strong coupling between the
orchestration system and the specific cloud. This is suitable if the cloud provides all the
requisite functionality, and the user only wants to interface with one cloud.
Figure 4.2: A conceptual view of how the resource provisioning middleware interfaces with the user and the cloud.
Figure 4.3: A conceptual view of the native provisioning model.
Figure 4.4: A conceptual view of the delegated provisioning model.
4.4.2 Delegated Provisioning
A second approach is to have a thin translation layer that takes user requests and maps
them to requests comprehensible by the resource management system, and vice versa for
responses by the cloud (see Figure 4.4). The middleware delegates most of the subtasks
associated with provisioning to the cloud RMS. However, this model keeps some state
information. For instance, this model would track the identifier for each VM that was
created. This system also achieves uniformity across different resource pools by abstracting
away differences in APIs. However, compared to the native provisioning approach, this
requires us to write drivers for each cloud we interface with.
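A minimal sketch of the delegated model is shown below, assuming a hypothetical driver interface: each per-cloud driver translates a uniform request into the provider's API, while the middleware keeps only minimal state (the returned VM identifiers).

```python
from abc import ABC, abstractmethod

class CloudDriver(ABC):
    """Hypothetical thin driver; one subclass per cloud provider."""
    @abstractmethod
    def provision_vm(self, name: str, image: str, flavor: str) -> str:
        """Translate the request to the provider API; return the VM id."""

class FakeDriver(CloudDriver):
    """Stand-in for a real per-cloud driver, for illustration."""
    def __init__(self):
        self._count = 0

    def provision_vm(self, name, image, flavor):
        self._count += 1
        return f"vm-{self._count}"

class Middleware:
    def __init__(self, driver: CloudDriver):
        self.driver = driver
        self.vm_ids = {}  # the only state kept: name -> provider id

    def provision(self, name, image="ubuntu", flavor="small"):
        self.vm_ids[name] = self.driver.provision_vm(name, image, flavor)
        return self.vm_ids[name]

mw = Middleware(FakeDriver())
mw.provision("web1")
assert mw.vm_ids == {"web1": "vm-1"}
```

Supporting a new cloud then only requires a new `CloudDriver` subclass; the middleware itself is unchanged.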
4.4.3 Fully-managed Provisioning
This approach involves creating a fully managed middleware, typically, in the form of a
cloud management platform like OpenStack running on top of the provisioned resources
(see Figure 4.5). If we interface with multiple substrates, e.g. multiple public clouds,
then this provides a uniform interface without having to write any drivers for the clouds
(e.g. as in the delegated approach). This has the benefit of providing a very powerful
management system with a rich feature set, e.g. to manage VMs, images, storage blobs.
However, if this is applied to public clouds, where we generally cannot provision BMs
(i.e. only VMs or containers), this effectively shifts the virtualization stack up. That is
because OpenStack is primarily intended to be run on bare metal. Therefore, when a
user requests to provision a VM, it will boot the VM on the VM that constitutes the
middleware. See Figure 4.6 for a visualization of this phenomenon.
This greatly degrades performance, since user-requested VMs must be emulated (cf.
virtualization). These shortcomings can be overcome if the OS and hardware support
Figure 4.5: A conceptual view of the full middleware provisioning model.
Figure 4.6: The upshifting of the virtualization stack when OpenStack is deployed on virtual machines.
nested virtualization [20]. Alternatively, this can be overcome with binary translation,
whereby guest OS instructions are translated on the fly by the hypervisor. However, these
technologies are not mature and not always supported. This approach is best suited when
working with bare metal. However, this may not always be possible, especially in the
context of public clouds.
4.4.4 Discussion
The fully-managed approach allows for uniformity across all resource pools. However,
this approach is infeasible for allocating VMs on public clouds. The delegated approach
provides a thin translation layer between the user and public clouds. If the user's infrastructure
requests span private and public clouds, it is best to use a hybrid resource
management approach, whereby resources on private clouds are provisioned using the
fully-managed approach and resources on the public clouds are managed using the
delegated approach.
4.5 Organization of Resource Controllers
An RMS typically consists of multiple resource controllers. The interface exposed by the
resource controllers, and how the different controllers are organized, is an open design
consideration. Here we will consider how OpenStack and SDI organize the control and
management planes.
4.5.1 OpenStack RMS
The OpenStack RMS consists of heterogeneous resources, such as compute, network, and
storage, and corresponding resource controllers (see Figure 4.7). OpenStack employs a
controller-agent model to manage resources. Specifically, when OpenStack is deployed on
a set of physical servers, one server is assigned the role of controller, while the others are
assigned the role of agent. The controller server runs the various controllers (processes).
Controllers expose functionalities in the form of APIs and delegate requests to the corre-
sponding agent. For instance, the compute controller exposes the API to provision VMs.
Requests to provision VMs are received by the controller and delegated to a compute
agent.
Whereas the controllers provide the management interface, the agents provide the
resource itself. A resource agent consists of the raw resource and a physically collo-
cated management process. For instance, referring back to the example of provisioning
VMs, the agent would run a hypervisor, and the management process would translate
the controller's requests into requests comprehensible by the hypervisor. The hypervisor,
upon receiving the request, would spawn the VM. Note that the logically singular agent
may in fact run a stack of processes, as is the case with compute (i.e. nova agent → libvirt → hypervisor). The difference between the agent management process and the
controller is one of scope and abstraction. The agent typically manages a single physical
machine, whereas a controller manages multiple agents. Also, the controller may
be capable of receiving abstract requests (e.g. driven by policy or trigger events). By
contrast, the agent management process can only understand relatively simple requests
to provision, update, and deprovision resources.
In addition to the controllers and services concerned with resources, there are other
services that make state changes without directly managing resources. This includes the
identity and access management (IAM) system, the telemetry system, and the orchestra-
tion system. Although agents are most commonly physical servers, they can be anything
capable of supporting an OpenStack deployment, e.g. physical servers, microcontrollers,
or VMs.
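The controller-agent split can be sketched as follows. The classes and the least-loaded placement policy are illustrative, not OpenStack's actual scheduling logic.

```python
# Toy sketch of the controller-agent model: the controller exposes the
# provisioning API and delegates each request to a per-host agent,
# which translates it for its local hypervisor.
class Agent:
    def __init__(self, host):
        self.host, self.vms = host, []

    def spawn(self, name):
        # A real agent would go through a stack of processes,
        # e.g. nova agent -> libvirt -> hypervisor.
        self.vms.append(name)
        return f"{self.host}/{name}"

class ComputeController:
    def __init__(self, agents):
        self.agents = agents

    def provision_vm(self, name):
        # Delegate to the least-loaded agent (an illustrative policy).
        agent = min(self.agents, key=lambda a: len(a.vms))
        return agent.spawn(name)

controller = ComputeController([Agent("host1"), Agent("host2")])
assert controller.provision_vm("vm1") == "host1/vm1"
assert controller.provision_vm("vm2") == "host2/vm2"
```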
4.5.2 Software-defined Infrastructure RMS
The SDI RMS (see Figure 4.8) is built on top of the OpenStack RMS. Therefore, the
resource management broadly follows the controller-agent model of OpenStack. However,
the design diverges in the following key ways:
1. The OpenStack model only manages compute nodes and implicitly relies on existing
networking. The SDI RMS, by contrast, maintains a global network topology
of all virtual and physical servers, switches, and links.
2. The SDI RMS converges the control and management of compute and network resources
through a centralized SDI manager. The SDI manager has oversight over
all resources and resource agents and can decree that certain actions be taken by individual
controllers. This multi-layer management allows most tasks to be handled
by the designated controller, while allowing specific tasks (e.g. those that require
global state information) to be handled by the SDI manager.
3. The SDI RMS replaces the OpenStack networking stack with one based on OpenFlow
SDN. This, combined with a global topology, allows it to have fine-grained
networking control. For instance, the SDI manager can install rules to redirect
Figure 4.7: A conceptual view of the OpenStack RMS.
Figure 4.8: A conceptual view of the SDI RMS.
packets in very specific ways. This enables a host of capabilities such as service
chaining and traffic steering.
4.6 Vino Version 1: SDN Orchestration Over a Single Legacy Cloud
4.6.1 Initial Design
Having provided the context and relevant discussions, we will now design the first version
of Vino. The objective is simply to perform SDN orchestration over a single legacy
cloud. Here, we have two degrees of freedom: the resource provisioning model, and
the organization of resource controllers. With regards to the organization of resource
controllers, the OpenStack model does not provide SDN functionality. In addition, since
the SDI model is built on top of the OpenStack model, there is no benefit of choosing
the OpenStack model.
Now, we must choose the appropriate provisioning model. Ideally, we want an ap-
proach that maximizes performance, ceteris paribus. In considering public clouds, the
fully-managed model would provide poor performance. Therefore, we must choose between
the native and delegated approaches. The delegated and native approaches provide
comparable performance. Additionally, the delegated approach abstracts the native
API exposed by the cloud provider. Since we need to perform additional steps with regards
to the SDN orchestration, the added abstraction layer can encapsulate this additional
functionality. Therefore, this design will use the SDI model with the delegated provision-
ing approach.
4.6.2 Adapting the Design
Now, we have a skeleton of the design. However, the SDI RMS only works as a native
RMS; whereas, we require these capabilities in an overlay manner. Let us consider how
the SDI RMS natively provides SDN capabilities, and then we can consider how these
capabilities can be provided in an overlay manner.
• A node is provisioned through the compute agent; its virtual MAC and IP addresses
are registered with the SDI manager.
• The SDI manager maintains a global topology of all connected servers and switches.
• When a VM on one physical host tries to communicate with a VM on another
physical host or to a node in an external network, the packet is sent to a virtual
OpenFlow switch running on the hypervisor. The switch would check its flow table
and after not finding a match would send the message to the OpenFlow controller,
which would subsequently send it to the SDI manager. The SDI manager would
determine the optimal route (i.e. shortest path) and install flows on the initial
switch as well as other switches in the path.
• The packet would then get forwarded to its destination.
• This allows communication between any two VMs, or between VMs and external
networks. This can also be used to install newer, higher priority flows, for instance
to create a dynamic service chain.
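The reactive routing step described above can be sketched as follows: on a table miss, compute the shortest path over the known topology and derive one flow entry per switch on the path. The topology encoding and rule format are illustrative, not the SDI manager's actual API.

```python
from collections import deque

# Illustrative topology: a chain of three switches, s1 - s2 - s3.
topology = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}

def shortest_path(graph, src, dst):
    """Breadth-first search: returns the hop-count-shortest path."""
    prev, queue, seen = {}, deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                prev[nbr] = node
                queue.append(nbr)
    return None

def flows_for(path, dst_mac):
    # One rule per switch on the path: match the destination MAC,
    # forward toward the next hop.
    return [{"switch": a, "match_dst": dst_mac, "out_to": b}
            for a, b in zip(path, path[1:])]

path = shortest_path(topology, "s1", "s3")
assert path == ["s1", "s2", "s3"]
assert flows_for(path, "aa:bb:cc:dd:ee:ff")[0]["switch"] == "s1"
```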
Let us analyze each component of the control stack and how it can be used to create
our overlay SDN.
SDI Manager, Topology Manager
The SDI manager is responsible for accessing the global topology, and installing rules
based on it. The SDI manager requires the topology manager for topology information
and the SDN controller to actually install the rules. We can just run a local copy of the
SDI manager and topology manager to provide this functionality.
OpenFlow Controller (Ryu)
The OpenFlow controller installs the rules created by the SDI manager. Separating the
OpenFlow controller from the SDI manager allows us to add more controllers to scale
up. Other OpenFlow controllers could also be used. We can run an OpenFlow controller
to provide this functionality.
Switches
Switches are the last element in the networking control stack. Whereas the native SDI
deployment uses hardware switches to do packet switching, we can use software to do
this switching. Our only constraint is that the switch support OpenFlow. Our options
are:
1. Open vSwitch (OVS) [34]
2. ofsoftswitch [4]
3. Lagopus [43]
We choose OVS because of the maturity of and support for the project.
Overlay Networks
The network stack we described cannot manage the native network provided by the
cloud. Instead, we must create our own overlay network. We can achieve this by using
a tunneling protocol. Our options are VXLAN and GRE. Functionally, VXLAN encapsulates
L2 frames in L4 UDP datagrams, whereas GRE encapsulates L2 frames in L3
packets. This reduces the header overhead for GRE tunnels. However, this can also
cause issues, e.g. if a firewall only allows certain transport-layer protocols. GRE creates
point-to-point tunnels, whereas VXLAN creates point-to-multicast tunnels. Additionally,
because VXLAN tunnels contain UDP headers, they have higher header entropy and
can allow for better network utilization when there are multiple equal-cost routes [33].
For these reasons, VXLAN was chosen.
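The header-overhead difference can be quantified with a back-of-the-envelope calculation, assuming standard header sizes (14-byte outer Ethernet, 20-byte outer IPv4, 8-byte UDP, 8-byte VXLAN header, and a basic 4-byte GRE header with no optional fields):

```python
# Per-packet encapsulation overhead in bytes, under the standard
# header-size assumptions stated above.
OUTER_ETH, OUTER_IPV4, UDP, VXLAN_HDR, GRE_HDR = 14, 20, 8, 8, 4

vxlan_overhead = OUTER_ETH + OUTER_IPV4 + UDP + VXLAN_HDR
gre_overhead = OUTER_IPV4 + GRE_HDR

assert vxlan_overhead == 50   # the commonly cited 50-byte VXLAN overhead
assert gre_overhead == 24
```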
Figure 4.9: A conceptual view of the Vino RMS.
4.6.3 Architecture
Let us present the architecture with all its components (see Figure 4.9 for a visualization
of the architecture). Essentially, our design takes the elements of a network: 1) switching/routing
nodes, 2) mediums connecting these nodes, and 3) logic determining how the
routing should be performed, and recreates them in a user-controllable overlay space.
1. There is a management layer consisting of the SDI manager and the topology
manager. There are two controllers: the network controller, i.e. Ryu, and the
legacy controller, i.e. Vino. In addition, there is a cloud driver that Vino interfaces
with in order to provision VMs.
2. The user sends a request to Vino.
3. The Vino controller sends a request to the cloud driver, which provisions resources,
and relays the response to Vino. Subsequently, Vino configures and runs OVS on
the nodes.
Figure 4.10: A conceptual view of the Vino RMS V2.
4.7 Vino Version 2: SDN Orchestration Over Multiple Legacy Clouds
Here we extend the orchestration system to work with multiple cloud providers, i.e.
SAVI, AWS, and GCE. Other controllers are organized to work with multiple resource
agents. For instance, the compute controller, nova, can interface with multiple hypervi-
sors by using a middle layer (libvirt) that abstracts the differences between the different
hypervisors. Likewise, we can add more cloud drivers to interface with multiple cloud
providers (see Figure 4.10).
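Extending the delegated model to multiple providers then amounts to keeping a registry of per-cloud drivers and dispatching each request by provider name, as in this illustrative sketch (the provider names and driver callables are placeholders):

```python
# Registry of per-cloud drivers, keyed by provider name.
drivers = {}

def register(provider, provision_fn):
    drivers[provider] = provision_fn

def provision(provider, **spec):
    """Dispatch a provisioning request to the matching driver."""
    if provider not in drivers:
        raise KeyError(f"no driver registered for {provider!r}")
    return drivers[provider](**spec)

# Placeholder drivers standing in for real AWS/GCE/SAVI drivers.
register("aws", lambda **s: ("aws", s.get("name")))
register("gce", lambda **s: ("gce", s.get("name")))

assert provision("aws", name="web1") == ("aws", "web1")
assert provision("gce", name="db1") == ("gce", "db1")
```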
4.8 Vino Version 3: SDN Orchestration Over Unmanaged Resources
4.8.1 Overview
As mentioned before, there are two kinds of substrates that must be accounted for,
unmanaged and managed. Unmanaged resources are standalone physical servers. In
Figure 4.11: A conceptual view of the Vino RMS V3. This shows how unmanaged resources are brought under the purview of an RMS. The logic surrounding resource management is the same as in Vino RMS V2.
contrast to managed infrastructures, e.g. public clouds, these resources exist as disjoint
collections without a proper provisioning interface. An unmanaged resource, after
becoming managed, is called a private cloud or a virtual customer premise edge (vCPE).
In order to provision unmanaged resources, they must first come under the purview of a
RMS. Once this is achieved, we can orchestrate over this substrate. This roughly divides
into the following subtasks,
1. manage the substrate
2. orchestrate over the substrate
We have already considered how to orchestrate over a managed infrastructure. The
goal of this section is to consider how to manage the substrate.
4.8.2 Types of Virtualization
Managing unmanaged resources is effectively about virtualizing the resources. Here, we
consider the various types of virtualization that can be applied to physical resources.
Network Virtualization
This refers to running a software switch (typically OVS) on the node(s). Once we have
OVS, we can connect this node to any other node using overlay tunnels like VXLAN or
OpenVPN tunnels. If other resource pools are involved, this approach creates a logical
L2 network connecting all the nodes. This is comparable to creating a virtual private
network (VPN) that connects the vCPE with the other resources.
Operating-system-level Virtualization
OS-level virtualization refers to running a containerization platform over a cluster
of VMs or BMs, which can subsequently be used to provision containers. Compared with
VMs, containers require fewer resources, achieve higher resource utilization, and are easier
to set up. OS-level virtualization includes network virtualization, without which multi-cloud
deployments would not be possible.
Hardware-level Virtualization
Hardware-level virtualization refers to virtualizing the resources such that they can subsequently run VMs. This is typically done by running either a hypervisor or an entire cloud management platform, like the SDI RMS, on the node(s). As with OS-level virtualization, this includes network virtualization.
Discussion
When adding unmanaged resource pools to the fleet, we must always perform network virtualization. The different levels of virtualization reflect the different capabilities of the vCPEs and the different use cases. For instance, both containers and VMs can support general computation; the tradeoff between the two is superior security and flexibility for VMs versus superior resource utilization for containers. Additionally, it is possible to perform both hardware-level and OS-level virtualization on the same set of unmanaged physical servers. This can be achieved by running containers inside a VM, or on different BM machines in the cluster.
4.8.3 Architecture
We need to create an RMS to manage the unmanaged resources. The unmanaged resources could be a single resource or a collection of them. Once the bare resources are managed through an RMS, we can interface with the collection of resources like any other cloud. The options for the RMS to run are:
• OpenStack
• SDI RMS
• CloudStack
• Eucalyptus
We choose the SDI RMS, since it lets us leverage SDI capabilities natively (overlays incur some overhead). The SDI RMS can be configured remotely.
4.8.4 Modelling the Substrates
The key component in modelling the substrate is specifying the location of the vCPEs.
As mentioned in the discussion on the SDI RMS (section 4.5.1), resource management
consists of agents, and a controller and management layer. The management layer runs the control stack, i.e. it decides where to provision requests, whereas the agent is where the actual provisioning happens. The following figure shows an example of how we can model the distributed, unmanaged resource pool.
[Figure: architecture of the Vino RMS V4, showing the legacy controller (Vino), the SDI Manager, the Network Controller (Ryu), the Topology Manager, and per-cloud Cloud Drivers. A user request triggers VM provisioning; each VM's port (MAC address) is registered with the SDI Manager, OVS is configured and VXLANs are created, flows are installed on the OVS instances, and containers run on the VMs.]

Figure 4.12: A conceptual view of the Vino RMS V4.
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    -
      username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
4.9 Vino Version 4: Container Orchestration
4.9.1 Overview
Thus far we have primarily focused on orchestrating VMs. However, we would also like to consider containers, since they offer a lightweight alternative to VMs. Although there are many flavors of containers, we will focus on Docker containers due to their popularity and maturity. In addition to being lightweight, Docker containers improve the workflow in two additional ways. Specifically, they
1. improve the packaging of applications and avoid dependency problems, since each container can encapsulate arbitrary packages and arbitrary versions of packages
2. improve the distribution of packaged containers through a global image registry
4.9.2 Architecture
Containers, and specifically Docker, greatly improve the packaging of applications. They overcome dependency issues related to unmatched dependencies or a broken source repository (i.e. the place a dependency is fetched from). Additionally, containers can be used to achieve much higher resource utilization. Consider a deployment with two nodes that require different versions of the same library. Without containers, this would require either creating separate VMs or using an ad-hoc approach. Creating separate VMs can be resource inefficient, since each VM carries the additional overhead of an extra OS that must be run. Additionally, there is a minimum size a VM can be; containers have no such restriction and can be packed more densely. Therefore, in certain situations, containers provide an advantage over VMs. Containers are provisioned as per user requests. The user can specify which containers must be colocated (i.e. share the same VM).
In order to orchestrate containers, we leverage native cloud APIs where available. Regardless of the availability of native container APIs, the workflow around container orchestration does not change much. In order to orchestrate containers, we first provision VMs. These VMs act as hosts for the containers. We configure these VMs to run the Docker container engine and the Docker container cluster manager, Docker Swarm. Docker Swarm exposes an API to provision containers on the different nodes in the cluster.
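The colocation constraint mentioned earlier can be illustrated with a simplified placement sketch. This is our own simplification, not Vino's actual scheduler: containers named together in a colocation group are assigned the same VM host, and unconstrained containers each get their own.

```python
def assign_hosts(containers, colocate_groups):
    """Assign each container a host index. Containers listed together in
    a colocation group share the same (VM) host. A simplified sketch."""
    assignment = {}
    next_host = 0
    for group in colocate_groups:
        for name in group:
            assignment[name] = next_host   # whole group shares one VM
        next_host += 1
    for name in containers:
        if name not in assignment:         # unconstrained containers get
            assignment[name] = next_host   # their own host in this sketch
            next_host += 1
    return assignment
```

A real scheduler would also weigh host capacity; here only the colocation rule is shown.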
Chapter 5
Implementation of Multidimensional Orchestrator
5.1 Overview
This chapter will report on the implementation of the system designed in the previous
chapter. We begin this chapter by presenting implementation considerations that affect
the whole design, such as the choice of programming language, libraries and frameworks
used, e.g. for remote execution. Then, we consider the design and implementation of the
Vino system, i.e. its components and how they interact.
5.2 Programming Language
Here we compare the programming languages that were considered. Although there are numerous programming languages, we only considered two: Java and Python. In this context, when we discuss programming languages we are referring to
the language and the most common interpreter, compiler, or runtime associated with the
language. Therefore, in the following, Java should be interpreted as the Java language
and the Java Runtime Environment (JRE). Likewise, Python should be interpreted as
the Python language and the CPython interpreter. The following discussion will focus
on aspects that distinguish the two languages.
5.2.1 Java
Java is an object-oriented programming language with a large user base and ecosystem
(in terms of 3rd party modules and publicly available code snippets). Java is a strongly typed (i.e. unlikely to perform implicit type conversions) and statically typed (most type information is known at compile time) language. This means that bugs arising from type inconsistencies are easily caught by the compiler. Indeed, Java was designed to
overcome the security issues arising from unsafe C and C++ code. Java provides a
hybrid execution model that compiles code to an intermediate representation, which is
interpreted at runtime.
5.2.2 Python
Python is a programming language that supports the procedural and object-oriented programming paradigms. Python also has a large user base and ecosystem. Python is a strongly typed and dynamically typed (type information is determined at runtime) programming language. Therefore, bugs that arise due to type inconsistencies cannot be caught until runtime. Compared to Java, Python runs slower (assuming typical non-optimized code). The memory and resource overhead varies between the two languages. Python was designed to be easier to read, write, and debug.
5.2.3 Discussion
The key tradeoff between Python and Java is that of ease of development versus performance. From the perspective of this thesis, it is important to work with a language that allows agile prototyping. Therefore, Python is chosen.
5.3 Data Serialization Language
Here we analyze the various data serialization languages that were considered. Data serialization languages allow data and data structures to be encoded in a format that can be stored in a file and/or transmitted over a network. By contrast, programming languages are primarily intended to express computation on data. Although one could derive a serialization language from a subset of constructs in a programming language (e.g. JSON is inspired by JavaScript), the two are different. In our work, a data serialization language is needed to model the underlay and application topologies.
5.3.1 XML (Extensible Markup Language)
XML is a data serialization format with a hierarchical, tree-like structure. All child nodes are contained within enclosing opening and closing tags, and properties associated with the node itself are embedded in the opening tag. XML is intended to be human and machine readable, and has multiple parser implementations in Python. However, the nested tree structure makes the language verbose. This hampers a human's ability to visually parse XML and carries a corresponding computational cost in increased storage and processing.
5.3.2 JSON (JavaScript Object Notation)
JSON is a data serialization language inspired by the JavaScript programming language
and how it expresses various data types. JSON has first-class support for scalars (numbers and strings), associative arrays (also called objects), and arrays. JSON replaces the
opening and closing tags of XML with curly braces and square brackets, to represent
associative arrays and arrays, respectively. This also effectively reduces the size and
visual clutter of JSON files.
5.3.3 YAML (YAML Ain't Markup Language)
YAML is a data serialization language that was inspired by JSON, Python, and others. YAML further improves on the visual clarity of languages like JSON by making indentation significant, i.e. indents and dedents imply the structure of the data. This makes it especially well suited to both human readability and machine interpretation. YAML (version 1.2 and later) is a strict superset of JSON.
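Assuming the PyYAML library is installed, this superset relationship can be checked directly: the same JSON text parses identically under a JSON parser and a YAML parser.

```python
import json
import yaml  # PyYAML; third-party, assumed installed

doc = '{"nodes": ["nodeA", "nodeB"], "count": 2}'

# The same text is both valid JSON and valid YAML, and yields
# the same Python data structure either way.
assert json.loads(doc) == yaml.safe_load(doc)
```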
5.3.4 Custom Language
Custom language refers to a language designed solely to encode the application orchestration topology. This encoding could be very efficient, since we can create a strong alignment between the syntax of the language and the concepts being expressed.
For illustrative purposes, let's assume our language is YAML based and we want to encode VMs; we could do the following:
virtual-machines:
  - vm-1
  - vm-2
Since YAML only provides very generic data types, the extra information must be explicitly specified, i.e. the virtual-machines key. By contrast, a custom language could encode virtual-machines as a dot (.), e.g.
.
  - vm-1
  - vm-2
The biggest shortcoming of this approach is that it requires us to write a custom
parser.
5.3.5 Discussion
Writing a custom parser would be a very time-consuming undertaking. For this reason, the custom language approach is not considered further. The remaining three languages have well-implemented parsers in Python. Although there may be differences in the performance (i.e. memory and time) of these parsers, our preference is for a language that supports quick prototyping and is expressive and clear from the perspective of both the developer and the end user. On account of its more verbose and noisier syntax, XML is excluded from consideration. YAML is a proper superset of JSON; therefore, it has all the benefits of JSON. Moreover, YAML is whitespace sensitive, which, compared to brackets and braces, allows data to be expressed more clearly. In fact, the abstract modelling languages described earlier are examples of valid YAML. For these reasons, we have chosen YAML.
5.4 System Architecture
The previous chapter explained the design considerations that led to the design of Vino
and its components. Here, we will reconsider those in the context of its implementation.
Broadly speaking, the Vino system performs the following:
1. manages any unmanaged substrates, and
2. orchestrates over heterogeneous substrates
These two phases map onto two components. The Bootloader, so called because of its conceptual similarity to the boot loader program in computer operating systems, is responsible for creating an RMS to manage unmanaged infrastructures. The Orchestrator is responsible for orchestration over the heterogeneous infrastructure landscape.
5.4.1 Bootloader
The bootloader is responsible for bringing unmanaged resources under the purview of an RMS, so that they can subsequently be orchestrated over. This phase is only performed when vCPEs are concerned. When the bootloading phase completes, the unmanaged resource pool has been transformed into a cloud. In order to bootload a resource pool, the user must specify where the resources (i.e. physical servers) are located. This is realized in the form of an underlay topology file that contains the IP addresses of the resources. If the bootloader is run from the same network as the servers, then private IP addresses can be used. Otherwise, the resources must have public IP addresses.
The configuration process is designed to be automatic without requiring manual oversight.
This is achieved by using sensible defaults and allowing the user to specify configuration
changes through the underlay topology file.
5.4.2 Orchestrator
The orchestrator is responsible for the orchestration of applications. Similar to the bootloader, the orchestrator reads a topology file. It parses the topology and determines the graph of resources, i.e. the different nodes, how they are connected, and other dependencies. It then determines which cloud drivers will be responsible for which tasks, and delegates the tasks accordingly.
5.5 Bootloader Design
As described before, the bootloader is responsible for instantiating the control stack over
unmanaged resources. Here, we document the various aspects of its implementation.
5.5.1 Parser
The following shows the modelling language that we previously designed to model the substrate.
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    -
      username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
The first step the bootloader must perform is to parse the underlay topology file.
To parse the YAML-based modelling language, we use a Python YAML parser called
PyYAML. The parser takes a YAML file as input and returns a data structure as output.
This data structure informs us where the agents are located and where the controller
node will be located.
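The parsing step amounts to a single PyYAML call; a minimal sketch (assuming PyYAML is installed) applied to the underlay topology above:

```python
import yaml  # PyYAML

underlay = """
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    - username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
"""

topo = yaml.safe_load(underlay)
controller = topo["cluster"]["controller"]   # where the controller runs
agents = topo["cluster"]["agents"]           # where the agents run
```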
5.5.2 Remote Code Execution
Once the Parser has performed the initial pass, we know where the agents and controllers
are located. The agents are typically unmanaged physical servers. The controller node is
either run on a physical server or on a VM, e.g. on a public cloud. To run the controller node, we must first provision a VM on the specified provider (the cloud provider that will host this VM). Once the controller is provisioned, we must perform remote code execution to create the RMS. The following are the options for remote code execution. These systems are also called configuration management systems.
Ansible
Ansible is a configuration management system with additional capabilities to deploy applications and execute arbitrary code. Ansible uses a YAML-based DSL to express tasks. A list of tasks is contained in a file called a playbook. The user specifies the nodes and the corresponding playbooks they would like to execute on each node. Additionally, Ansible has a declarative syntax (i.e. the user specifies the state they would like to achieve rather than the steps towards it; e.g. instead of specifying the steps to create the file /home/foo, the user specifies that the file should exist). In this regard, Ansible lets users specify configuration at a higher level of abstraction. Ansible has a large number of modules, including those for file IO, networking, and monitoring. Additionally, any third-party developer can write custom modules. Finally, Ansible runs over SSH, is agentless, and is written in largely dependency-free Python, which makes setup very easy.
Puppet, Chef
The chief difference between Puppet or Chef and Ansible is the learning curve and the ease of a first deployment. These systems use a master-agent model, whereby an agent process runs on each machine that must be configured, and the agents pull updates from the master. This approach is beneficial if the intention is to run a large number of commands; however, for a small number of commands it turns out to be inefficient. This is because there is an overhead tradeoff between running an agent and pushing commands. Specifically, running an agent requires more system resources; however, each agent pulls only the required configuration (as opposed to Ansible, which pushes all changes to be applied, e.g. create a file /foo, even when it exists). Also, these systems use HTTPS, which requires a non-trivial step to set up certificates. Additionally, the remote system may not be able to run the agent for many reasons (lack of system resources, unmet dependencies). All of this leads to a more involved setup process. Also, both systems have a steep learning curve.
Python
This is not so much a solution in itself, but rather a method of structuring the solution. Specifically, since Ansible is written in Python, and the Python ecosystem contains additional libraries, this means using any of these libraries in user-defined top-level programs to perform the required tasks.
Discussion
Ansible is the superior choice since it avoids the pitfalls of Chef and Puppet: it is agentless, has a large module ecosystem, and offers a declarative syntax. As described previously, the SAVI RMS is based on the agent-controller model. In this model, a control stack runs on the controller node, and a corresponding agent program runs on each agent node. Therefore, we have Ansible playbooks for agents and hosts that configure them as required.
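Concretely, running a playbook against a freshly provisioned node reduces to invoking the ansible-playbook CLI. The helper below only builds the command line; the playbook path and variable names are illustrative. The trailing comma in the `-i` argument is Ansible's inline-inventory syntax, which avoids writing an inventory file for a single host.

```python
def playbook_cmd(playbook, host_ip, user, extra_vars=None):
    """Build the ansible-playbook invocation for a single remote host.
    An ad-hoc inventory ('<ip>,') avoids writing an inventory file."""
    cmd = ["ansible-playbook", playbook,
           "-i", host_ip + ",",        # trailing comma: inline inventory
           "-u", user]
    if extra_vars:
        pairs = " ".join("%s=%s" % kv for kv in sorted(extra_vars.items()))
        cmd += ["--extra-vars", pairs]
    return cmd

# The result would typically be handed to subprocess.check_call(...).
```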
5.6 Orchestrator Design
Here we discuss the implementation of the various components of the Orchestrator.
5.6.1 Parser
The parser is the component that parses and realizes the topology. In this section we
will consider the different phases of the parser.
The following is the final topology that we designed in the previous chapter. We refer to the language as the Vino template language (VTL), and to the parser as the VTL parser, or simply the parser.
nodes:
  # Nodes only have the property 'name'
  -
    name: nodeA
  -
    name: nodeB
  -
    name: nodeC
edges:
  -
    endpoint1: nodeA
    endpoint2: nodeB
  -
    endpoint1: nodeB
    endpoint2: nodeC

Figure 5.1: Final version of VTL.
The topology in this form only expresses how the nodes are connected. As we noted before, we need a way of expressing other properties of nodes and edges. Specifically, nodes can be either virtual machines or containers. The cloud can be SAVI (native and vCPE variants), AWS, or GCE. Additionally, orchestration of nodes requires specifying the image to boot. The following is a prototypical topology file that, in addition to the topology information, contains auxiliary information.
Parsing Phase
The parsing phase is when the parser gathers all the information. When the parsing phase completes, the topology of nodes and the properties of each node are known. The topology exists as a list of nodes and a list of endpoint pairs, each representing a point-to-point connection. Alternatively, if no endpoints are specified, all nodes are meshed together. The properties of a node include the type of node, i.e. VM or container, the image to be used, and the cloud that the node should be provisioned in, among others. Although most of the properties are resolved by the end of this phase, some properties can only be resolved in a later phase.
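The mesh default just described can be sketched in a few lines: if the user supplies no edges, the parser connects every pair of nodes.

```python
from itertools import combinations

def resolve_edges(nodes, edges):
    """Return the point-to-point connections: the user's edges if given,
    otherwise a full mesh over all nodes (a sketch of the parser's rule)."""
    if edges:
        return [(e["endpoint1"], e["endpoint2"]) for e in edges]
    return list(combinations(nodes, 2))
```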
parameters:
  savi_key_name:
    description: SAVI keypair name.
  aws_key_name:
    description: AWS keypair name.
nodes:
  -
    name: vino_gateway
    role: gateway
    image: ami-df24d9b2 # ubuntu with ovs
    flavor: t2.micro
    provider: aws
    type: virtual-machine
    region: us-east-1
    key-name: utils::get_param(aws_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/gateway/playbook.yaml
        host: gateway
        extra-vars:
          webserver_ip: utils::get_overlay_ip(vino_webserver)
  -
    name: vino_webserver
    role: webserver
    image: Ubuntu64-OVS
    flavor: m1.medium
    provider: savi
    type: virtual-machine
    region: tr-edge-1
    key-name: utils::get_param(savi_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/webserver/wordpress.yaml
        host: webserver

Figure 5.2: Example of a VTL file with the complete set of features.
# Leave blank for mesh
edges:
declarations:
  -
    name: wordpress-vino
    type: security-group
    description: security group for vino
    ingress:
      -
        from: -1
        to: -1
        protocol: icmp
        allowed:
          - 0.0.0.0/0
      -
        from: 22
        to: 22
        protocol: tcp
        allowed:
          - 0.0.0.0/0
      -
        from: 80
        to: 80
        protocol: tcp
        allowed:
          - 0.0.0.0/0
      -
        from: 4789
        to: 4789
        protocol: udp
        allowed:
          - 0.0.0.0/0
      -
        from: 6633
        to: 6633
        protocol: tcp
        allowed:
          - 0.0.0.0/0
    egress:

Figure 5.3: Continuation of the above topology file.
Provisioning Phase
The provisioning phase is when the resources are provisioned. Resources can be classified as either hard or soft, depending on whether they consume system resources or are only logical objects, respectively. For instance, a VM is a hard resource since it consumes system resources, and a security group is a soft resource since it is only a state change (excepting the resources required to record that state change). There are two hard resources: nodes, which can be either VMs or containers, and network connections (although this is not strictly true, since the underlay network always exists and the overlay network is more of a logical entity). The system first provisions soft resources, such as security groups and SSH keys, since users may require these for the creation of nodes. Next, we provision the nodes. The provisioning phase completes with the creation of network tunnels, connecting the nodes as per the user's specification.
Configuration Phase
The configuration phase configures the VMs as specified by the user. The user can specify multiple Ansible playbooks to be executed on any given host. The configuration phase determines which playbooks to run on which hosts, including resolving any unknown parameters, and then runs the matching playbooks on the hosts. Users can also use the same playbook for multiple hosts, by specifying the hostnames in the playbook.
5.6.2 Declared Types
This section discusses all the resource types that can be requested to be provisioned.
Nodes
These represent the computational nodes, in the form of either virtual machines or containers. Nodes additionally contain other properties, like the name (for symbolic referencing), the image (OS distribution and installed packages), the flavor (system resources, e.g. 2 GB RAM, 1 CPU core, 20 GB disk), the associated security groups, SSH keys, the cloud service provider, the region, and the tenant (if applicable).
Edges
Edges represent bidirectional communication links. Edges have the properties endpoint1 and endpoint2, which are the symbolic names of the nodes representing the two endpoints of a link. Additionally, there is a boolean property called secure, which determines whether the link is encrypted.
Declarations
These represent resources other than nodes and edges, and include logical constructs like security groups and key pairs. Here, logical refers to the fact that these objects correspond to state changes and affect the workings of other resources, as opposed to physical constructs like virtual machines.
5.6.3 Parametrization
The topology to be deployed may be similar across multiple deployments. One of the reasons for using a textual representation was to be able to reuse the text file. To this end, the VTL parser allows values to be parameterized. The parameterized values can then be accessed through a special construct of the form utils::get_param. For instance, if a parameterized value is called foo, then in any place in a VTL file where one expects the value of foo, we can substitute utils::get_param(foo). The parameterized variables can be set either through environment variables or through the config file. If there are duplicates, then the environment variable takes precedence.
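The precedence rule can be sketched as follows. The VINO_ prefix and the lookup names are our own illustrative assumptions, not necessarily what Vino uses:

```python
import os

def get_param(name, config):
    """Resolve a parameterized value: an environment variable takes
    precedence over the config file entry (sketch of the rule above).
    The VINO_ prefix is an illustrative assumption."""
    env_val = os.environ.get("VINO_" + name.upper())
    if env_val is not None:
        return env_val
    return config.get(name)
```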
The choice of supporting both a config file and environment variables is intended to increase user flexibility. Specifically, the user can keep multiple config files, e.g. for development and production. However, files may be readable by other users (depending on file permissions). Using environment variables adds a layer of security (albeit a weak one).
5.6.4 Configuration File
As part of the design effort to allow for parameterizable template files, we allow the user to specify a configuration (config) file. The config file contains the login credentials for the various clouds, which are part of the authorization and authentication system of Vino. The config file also specifies other default values, such as the default SSH key, the default region, etc.
5.6.5 Special Forms
In the previous section, we described a mechanism for accessing parameterized vari-
ables, which could be read from a config file or from an environment variable. The
core of VTL is concerned with specifying topologies to orchestrate. Specifically, the parser is responsible for provisioning VMs or containers and networking them as defined by the user. However, the parser exposes a user-extensible mechanism to encapsulate arbitrary Python code and access it during the various phases. These constructs are called special forms, and utils::get_param is an example. Special forms have the structure <namespace>::<form name>(<arg1>, ..., <argN>). Let's consider this piecewise:
• <namespace> refers to the namespace that the form belongs to. Forms are orga-
nized by namespace, so related special forms can be grouped together.
• <form name> refers to the unique name of the form.
• <arg1>,...<argN> refers to the N positional arguments that the form accepts.
Although special forms are very powerful, they can also be easily abused to encapsulate arbitrary unstructured logic. Therefore, special forms should be used sparingly, and only when the requisite functionality is not otherwise available. When extending forms, the user can specify when the form is resolved, e.g. before parsing, after parsing, before provisioning, after provisioning, before configuration, or after configuration. Other examples of forms are:
• aws::get_image_id(<image name>). Provisioning VMs on AWS requires that the user specify the image identifier (ID). However, the image ID varies between regions. This special form takes the image name and returns the ID for the current region, i.e. based on the node definition.
• utils::install_ovs_2_3_3 installs OVS version 2.3.3 on a provisioned VM. This is useful when a cloud does not provide default images with OVS installed.
5.6.6 Dependency Resolution
The parser performs semi-intelligent dependency resolution. In general, we can efficiently and sensibly perform dependency resolution if the graph of dependencies forms a directed acyclic graph (DAG). However, in practice, it may not be possible to achieve this. For instance, consider two nodes A and B. If we want to configure A to ping B, and B to ping A, then a naive implementation may deadlock, since the system would not provision A until it knows B's IP address, and vice versa for B.
The above example foreshadows the solution. Specifically, the parser is divided into phases, i.e. parsing, provisioning, and configuration. All tasks in one phase are performed (as opposed to performing all tasks for one node), and resolutions completed, before moving on to the next phase. This approach has some shortcomings, i.e. when dependencies span phases. However, in practice these are very uncommon and this approach is effective.
5.6.7 Cloud Drivers
The various cloud drivers provide the API to perform tasks on a specific cloud. Based on our design in the previous chapter, a cloud driver is a thin wrapper around a cloud provider's API. Having this abstraction makes the system more robust in several ways. First, it protects against discontinued and changing APIs: if a cloud provider's API changes, we can update the specific driver to account for this. Although the driver is supposed to be thin, if needed it can encapsulate arbitrary logic.
Separating the cloud drivers from the parser logic is also beneficial in other ways. First, it ensures that the parser logic is focused on delegating tasks to individual drivers. This keeps the parser lean and makes it easily extensible. Finally, different provider APIs have different ways of achieving the same tasks; e.g. to provision a VM, the AWS API requires client.run_instances(<some arguments>), while the GCE API requires compute.instances().insert(<some arguments>). The wrapper can normalize the API and make testing the modules easier. The current implementation of the Orchestrator has the following drivers, and hence supports the following clouds:
• SAVI
• AWS
• GCE
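The normalization provided by the drivers can be sketched as a common interface. The AWS call name comes from the text above; the wrapper classes themselves, their method signature, and the stubbed return values are our own illustration:

```python
class CloudDriver:
    """Common interface each driver implements; the parser calls only
    this normalized API, never a provider SDK directly."""
    def provision_vm(self, image, flavor, key_name):
        raise NotImplementedError

class AWSDriver(CloudDriver):
    def provision_vm(self, image, flavor, key_name):
        # would wrap client.run_instances(...) from the AWS SDK
        return {"provider": "aws", "image": image, "flavor": flavor}

class SAVIDriver(CloudDriver):
    def provision_vm(self, image, flavor, key_name):
        # would wrap the SAVI (OpenStack-based) compute API
        return {"provider": "savi", "image": image, "flavor": flavor}

DRIVERS = {"aws": AWSDriver(), "savi": SAVIDriver()}
```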
5.6.8 Creating the Topology
Automatic Master Creation
The config file has an option for automatically creating a master node. The master node runs the control stack, i.e. the SDI manager, the topology manager, the SDN controller (Ryu), and the multi-domain controller, Vino. The Vino controller provisions the nodes and registers the MAC addresses of the nodes with the SDI manager (see 4.10 for more details). Subsequently, the SDI manager or the SDN controller can install flows on the switches, e.g. to perform service chaining.
5.6.9 Logical Resources
The logical resources are the constructs aside from compute nodes and networks. This
includes things like security groups and SSH keys. Below we discuss their implementation.
Security Groups
Security groups encapsulate the logic around the network access of virtual machines. This includes the ingress and egress ports that are open, and the nodes to which they are open. This is specified as the following 3-tuple:
• protocol type, i.e. TCP, UDP, or ICMP
• port number or range of ports (note: this is set to -1 for ICMP)
• allowed IP addresses in CIDR notation (e.g. 192.168.10.11/16)
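The 3-tuple above (plus direction) maps naturally onto a small record type. A sketch, with field names of our own choosing:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rule:
    """One security-group rule: protocol, port range, allowed CIDRs.
    For ICMP the port range is (-1, -1), matching the convention above."""
    protocol: str                  # "tcp", "udp", or "icmp"
    port_from: int
    port_to: int
    allowed: List[str] = field(default_factory=lambda: ["0.0.0.0/0"])

ssh = Rule("tcp", 22, 22)
icmp = Rule("icmp", -1, -1, ["192.168.10.0/24"])
```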
SSH Keys
SSH keys refer to the private and public key pairs that are used for accessing virtual machines. SSH can work with both keys and user-chosen passwords. However, keys offer superior security, since they are much harder to brute-force and the private key is never sent to the remote server. The Vino system performs the following steps to create and register SSH keys.
1. Check if the local machine has an SSH key (i.e. check the default location, ~/.ssh). If a key exists, then proceed to the next step; otherwise, create a keypair using the ssh-keygen utility.
2. Check if the remote end, i.e. the cloud, has your SSH public key. If there is no key, then upload the public key.
3. If a key exists, there are two ways to proceed: by name or by public key.
4. By name means that the user specifies a key name. If the key name exists remotely, check if the remote public key matches the local public key. If it does, then use this key. Otherwise, raise an exception and let the user handle this.
5. By public key means that the user specifies the public key (i.e. the local public key) and the corresponding key on the remote end is used. If there is no match on the cloud, then this key is uploaded.
5.6.10 Virtual Machines
Here we discuss the various components of the parser related to virtual machines.
Provisioning
To provision virtual machines, we leverage the cloud drivers. These cloud drivers expose APIs to provision virtual machines. They accept the image name, SSH key, and flavor
(i.e. the amount of system resources that are allocated).
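A minimal sketch of what such a driver interface might look like; the class and method names are illustrative, not the actual Vino driver API:

```python
from abc import ABC, abstractmethod

# Hypothetical minimal driver interface; real drivers (e.g. for AWS or
# SAVI/OpenStack) would wrap the provider SDK behind this shape.
class CloudDriver(ABC):
    @abstractmethod
    def provision(self, image: str, key_name: str, flavor: str) -> str:
        """Boot a VM and return its provider-assigned identifier."""

# An in-memory stand-in, useful for testing orchestration logic offline.
class FakeDriver(CloudDriver):
    def __init__(self):
        self.instances = []

    def provision(self, image, key_name, flavor):
        vm_id = "vm-%d" % len(self.instances)
        self.instances.append((vm_id, image, key_name, flavor))
        return vm_id
```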
Configuration
A comprehensive orchestration system should be able to support both provisioning and
configuration of nodes. Other orchestration systems such as OpenStack Heat enable this
by allowing the user to specify code that is injected inside the VM and executed after
it is provisioned. This can be arbitrary shell code, e.g. install packages, create files
etc. Although this can be used to perform arbitrary configuration, such code is hard to write, maintain, and debug, in part due to the inherent complexities of shell scripting.
One approach could be to extend VTL to allow users to specify common configuration tasks. However, there are many configuration tasks, each of which can be invoked in innumerable ways. Therefore, this approach is challenging not only from a development perspective, but also from the perspective of the user, who would need to learn another language. As mentioned before, we use Ansible to perform remote configuration and code
execution. Indeed, Ansible is widely used in industry. Therefore, in order to facilitate
the configuration of nodes, we allow users to specify playbooks that must be executed on
a node. Additionally, multiple nodes can use the same playbooks, with the specific role
of a given node determining which part of the playbook is executed on each node. The
following figure shows the node configuration as specified in a topology file.
config:
  -
    playbook: playbooks/firewall/playbook.yaml
    host: firewall
    extra-vars:
      webserver_ip: utils::get_overlay_ip(vino_webserver)
      gateway_ip: utils::get_overlay_ip(vino_gateway)
Figure 5.4: Node configuration snippet. User can specify a list of configurations in the form of playbooks.
Ansible is written in Python, and the core Ansible libraries are directly accessible from Python code. Therefore, rather than having the parser interface with these libraries directly, we wrote a wrapper class for interfacing with Ansible. The parser calls this wrapper to execute playbooks on nodes.
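A sketch of such a wrapper. For simplicity it shells out to the `ansible-playbook` CLI rather than using the internal Python API the thesis wrapper builds on; the function names are illustrative:

```python
import json
import subprocess

def build_command(playbook, host_ip, extra_vars=None, user="ubuntu"):
    """Assemble the ansible-playbook invocation for a single host.
    The trailing comma after the IP makes Ansible treat it as an
    inline inventory rather than an inventory file."""
    cmd = ["ansible-playbook", playbook, "-i", "%s," % host_ip, "-u", user]
    if extra_vars:
        cmd += ["--extra-vars", json.dumps(extra_vars)]
    return cmd

def run_playbook(playbook, host_ip, extra_vars=None, user="ubuntu"):
    """Execute the playbook on the given host; returns the exit code."""
    return subprocess.run(build_command(playbook, host_ip,
                                        extra_vars, user)).returncode
```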
Autoscaling
We added support for autoscaling. Autoscaling requires two components: alarms, i.e. notifications of certain events, and corresponding actions. This logic is a thin wrapper
around the functionality exposed by the clouds. This feature is only implemented for
AWS.
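As a sketch of how an alarm might be tied to a scaling action on AWS, the helper below assembles the keyword arguments that boto3's `put_metric_alarm` call expects; the names, metric, and threshold are illustrative assumptions, not Vino's actual values:

```python
# Hypothetical helper: builds a CloudWatch alarm definition (the "alarm")
# that triggers a scaling policy (the "action") when average CPU usage
# exceeds a threshold.
def build_cpu_alarm(name, policy_arn, threshold=70.0, period=300):
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }

# To register it against AWS (sketch, not executed here):
# boto3.client("cloudwatch").put_metric_alarm(**build_cpu_alarm(
#     "scale-up", policy_arn))
```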
5.6.11 Containers
There are two aspects of container management. The first is managing individual containers. The second is managing clusters of containers. Vino supports the ability to provision one or
many containers. This feature is implemented on AWS and SAVI.
5.6.12 Network Tunnels
Here, we consider the different network tunnels, namely unsecure VXLAN tunnels and
secure OpenVPN tunnels.
VXLAN (Unsecure Tunnels)
Network connections can be either secure or unsecure. Unsecure connections are implemented as VXLAN tunnels, which encapsulate entire L2 frames in UDP datagrams. This first requires a bridge on both hosts, created through OVS. We then
create a virtual network interface and assign it a private IP address. We then add this interface (also called a port) to the bridge we created. Finally, we create a VXLAN port,
which corresponds to an endpoint of a VXLAN tunnel. Thereafter, any packets being
sent to a matching private IP address, are sent over the VXLAN tunnel.
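The steps above can be sketched as a sequence of `ovs-vsctl` and `ip` commands; the bridge, interface, and address names are illustrative:

```python
def vxlan_setup_commands(bridge, iface, local_cidr, vxlan_port, remote_ip):
    """Return the shell commands implementing the steps above: create an OVS
    bridge, add an internal interface with a private IP, and add a VXLAN
    port pointing at the remote tunnel endpoint."""
    return [
        ["ovs-vsctl", "add-br", bridge],
        ["ovs-vsctl", "add-port", bridge, iface,
         "--", "set", "interface", iface, "type=internal"],
        ["ip", "addr", "add", local_cidr, "dev", iface],
        ["ip", "link", "set", iface, "up"],
        ["ovs-vsctl", "add-port", bridge, vxlan_port,
         "--", "set", "interface", vxlan_port, "type=vxlan",
         "options:remote_ip=%s" % remote_ip],
    ]

# Running them requires root and Open vSwitch installed (sketch):
# for cmd in vxlan_setup_commands("br-vino", "vport0", "10.0.0.1/24",
#                                 "vxlan0", "203.0.113.7"):
#     subprocess.check_call(cmd)
```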
OpenVPN (Secure Tunnels)
We use OpenVPN to achieve secure tunnels. OpenVPN offers two ways of creating secure VPN tunnels: routed, which creates layer 3 IP-based tunnels, and bridged, which creates layer 2 Ethernet-based tunnels. A bridged VPN requires a TAP virtual network adaptor, whereas a routed VPN requires a TUN virtual network adaptor. Practically, TUN
Figure 5.5: A conceptual view of VXLAN tunnels. (Diagram: process A hands an inner L2 frame to the local OVS bridge acting as a VXLAN tunnel endpoint (VTEP), which wraps it with a VXLAN header inside an outer UDP datagram, IP packet, and L2 frame; the peer VTEP unwraps it and delivers the inner L2 frame to process B.)
Figure 5.6: A conceptual view of an OpenVPN setup. (Diagram: Application A on Client 1 and Application B on Client 2 exchange inner L2 frames through local TAP devices and OpenVPN processes; the virtual connection between the clients is realized over physical connections relayed through the OpenVPN server and its TAP device.)
based tunnels can only be used for IP traffic. However, since the network controller works with Ethernet frames and MAC addresses, we use the latter (bridged) approach.
A bridged VPN connection works with a client initiating a UDP connection to the
server. In order to perform mutual authorization, the client and server must have access
to the certificate authority’s (CA) certificate. The client and server can then validate
each other, and establish an encrypted channel using the transport layer security (TLS)
cryptographic protocol. Subsequently the nodes are assigned an internal IP address, and
clients can communicate with each other with data being relayed through the server,
using the assigned IP address.
5.7 Traffic Steering
5.7.1 Overview
Once the nodes are provisioned and the network channels are configured, then we can
perform advanced traffic steering. Traffic steering refers to dynamically changing the
routing strategy and forwarding of packets. For instance, packets heading from node A
to node B can be redirected through a graph G (an arbitrary collection of nodes). When
the goal of traffic steering is to insert intermediate middleboxes, it is referred to as service
chaining.
The default network behavior between two communicating nodes is for the traffic
to take the shortest path. Specifically, the communication of two VMs is based on the
shortest path route determined by the SDI manager. The communication between two
VMs starts with the sender sending a packet to the first switch in the path; for our
case this is the OVS bridge on the VM. When two VMs communicate, the intermediary
switches try to find a match based on the address. If this is the first time the VMs are
communicating, there would be no matching rules and a notification will be sent to the
SDN controller, and relayed to the SDI manager. The SDI manager will use the topology
manager to determine the shortest path and install the appropriate forwarding rule on
all the intermediary switches.
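The shortest-path computation and per-switch rule installation can be sketched as follows. The thesis does not show the topology manager's data structures, so the adjacency-list graph and the rule shape here are assumptions for illustration:

```python
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path over an unweighted switch topology; a simplified
    stand-in for what the topology manager computes."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None  # destination unreachable

def forwarding_rules(path, dst_mac):
    """One (switch, match, next-hop) rule per intermediary switch on the
    path; the 'next hop' here is simply the next switch, for illustration."""
    return [(path[i], dst_mac, path[i + 1]) for i in range(len(path) - 1)]
```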
These rules (also called flows) have an associated priority. To perform steering, e.g. to make packets sent from A to B pass through G, we install higher priority flows on the switches. These higher priority flows cause the traffic to take an alternative
route. The forwarding of packets from G to B must be handled by G. That is because the network stack has no control once the packet is delivered to a userspace program.
Therefore, if we want G to transparently forward packets to B, this must be handled by the userspace program on G.
Figure 5.7: The Vino Portal can be used to create service chains.
The SDI manager exposes an HTTP RESTful API to install high-priority flows as per user requirements. The API requires the addresses of the head, middle, and tail of the service chain. This API can be called directly. Additionally, to facilitate the creation of service chains, we created a portal, documented in the following section.
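A sketch of calling such an API from Python; the endpoint path, payload field names, and authentication header are illustrative assumptions, not the actual SDI manager API:

```python
import json
from urllib import request

def chain_payload(head_mac, middle_mac, tail_mac):
    """Assemble the head/middle/tail addresses of a service chain
    (field names are an assumption for illustration)."""
    return {"head": head_mac, "middle": middle_mac, "tail": tail_mac}

def install_chain(manager_url, head_mac, middle_mac, tail_mac, token):
    """POST the chain to a hypothetical /chains endpoint on the SDI
    manager, authenticated with a token header."""
    data = json.dumps(chain_payload(head_mac, middle_mac, tail_mac)).encode()
    req = request.Request(
        manager_url + "/chains", data=data, method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token})
    with request.urlopen(req) as resp:
        return json.load(resp)
```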
5.7.2 Portal
We created a portal to facilitate the creation of service chains. The portal is built using
JavaScript for the frontend and Python for the backend. The frontend was developed
using jQuery and D3.js; whereas the backend was developed using Flask. The portal
currently only works for SAVI and AWS. The main aspects of the portal are authentication and the ability to create service chains. To authenticate, the user provides credentials. The user can authenticate against SAVI, AWS, or both. The portal then validates
the credentials against the IAM systems of SAVI and AWS, respectively. Once the user
is authenticated, a token is generated. Subsequent requests must be accompanied by the
token.
The interface is primarily designed to facilitate the creation of service chains. After
the user authenticates, the system requests a list of all the nodes. Each node is displayed
as a circular element on the canvas. Users can click two nodes successively to create a
link between the nodes, represented by a black connecting line. This represents the first
rung of a service chain, i.e. from the head to the middle. Next, we need the second rung
representing the link from the middle to the tail. In addition, the user can click a link to
delete a chain.
In addition, the portal is designed to work with any SDI manager, i.e. this could be
the native underlay manager or an overlay SDI manager. The user then clicks the Create Chain button to create the chain.
Chapter 6
Evaluation
This chapter documents the functional and performance evaluation of the Vino system.
The objective of functional analysis is to determine the capabilities of the system and
how these align with the requirements of the system. We will consider how application
topologies can be realized using the Vino system.
The objective of performance analysis is to evaluate the performance of the system.
We want to determine the space and time overhead of the system and how much of the
system resources are used to do useful work. We will then measure the scalability of the system as a whole, i.e. the system's ability to handle more requests as the allocated system resources increase. All the experiments were performed on medium sized virtual
machines (unless otherwise stated). Specifically, on SAVI we used the m1.medium flavor
which corresponds to 2 vCPUs, 4096MB RAM. vCPUs roughly correspond to cores that
are allocated to a VM. The underlying physical machines typically use Intel Xeon micro-
processors, albeit the exact microarchitecture is unknown. Likewise, AWS experiments
were conducted on t2.medium instances, which correspond to 2 vCPUs and 4096MB of
RAM running on Intel Xeon processors. Each stated value was the result of running the experiment 100 times and averaging the results.
6.1 Functional Evaluation
In the introduction, we identified the objectives of this thesis. Specifically, the functional
objectives were to have a system that supports:
1. advanced traffic steering
2. management and orchestration over managed and unmanaged resources
3. enhanced network security
The Vino system meets all these objectives. Specifically, the overlay management
enables advanced traffic steering. The orchestrator and bootloader enable management
and orchestration over heterogeneous infrastructures. Finally, using OpenVPN tunnels
improves network security (i.e. integrity and confidentiality). Furthermore, with regards to orchestration, Vino, through special forms, allows interfacing with existing components, such as existing VMs and security groups. This is a shortcoming of OpenStack Heat and AWS CloudFormation, since they consider elements declared in a template to represent an isolated and self-contained deployment. These systems do, however, support the use of static specifications, e.g. specifying the name of an existing security group. Such static specifications are not always sufficient, e.g. if the user wants the identifier of a newly created security group. Using special forms allows interfacing with arbitrary components.
6.1.1 WordPress Firewall Exposition
The topology file shown in Figures 6.1 and 6.2 represents a WordPress deployment distributed over SAVI and AWS infrastructures. It consists of a web server, a gateway, and a deep packet inspection (DPI) middlebox. The scenario is that we have provisioned a web
server, perhaps running a WordPress based blog. Note, this example is simplified to
demonstrate the capabilities of Vino. In reality, the web server would be split into the
web server proper and the database, each of which would be horizontally scaled. In this
view, the gateway server can also be viewed as the load balancer. Regardless, we notice
there is a security vulnerability in the web server and that it is susceptible to SQL injec-
tion attacks. Ideally, we would like to patch the vulnerability without taking down the
server. The SDI capabilities facilitate this. We can create a service chain, such that all
traffic going from the gateway to the webserver is sent to the DPI unit located on the
SAVI cloud. The DPI unit then analyses each packet and transparently forwards non-
malicious packets to the webserver. When the patch is created, we can apply a hotfix
and remove our previous service chain. This achieves dynamic service chaining without
service disruption (See Figure 6.3).
Based on the topology file, we can see that each node contains a provider property,
which specifies where the node must be located. Thus, we have orchestration capabilities
over multiple cloud providers. Furthermore, we can achieve network security by specifying
that the point to point tunnels be encrypted by setting each edge’s secure property to
true. Finally, by using the Vino portal as shown in Figure 5.7, we can perform arbitrary
service chaining, thereby satisfying the goal of advanced traffic steering. Therefore, the
parameters:
  savi_key_name:
    description: SAVI keypair name.
  aws_key_name:
    description: AWS keypair name.
nodes:
  -
    name: vino_gateway
    role: gateway
    image: ami-df24d9b2 # ubuntu with ovs
    flavor: t2.micro
    provider: aws
    type: virtual-machine
    region: us-east-1
    key-name: utils::get_param(aws_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/gateway/playbook.yaml
        host: gateway
        extra-vars:
          webserver_ip: utils::get_overlay_ip(vino_webserver)
  -
    name: vino_webserver
    role: webserver
    image: Ubuntu64-OVS
    flavor: m1.medium
    provider: savi
    type: virtual-machine
    region: tr-edge-1
    key-name: utils::get_param(savi_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/webserver/wordpress.yaml
        host: webserver
Figure 6.1: Example of a VTL topology file.
#Leave blank for mesh
edges:
  -
    endpoint1: vino_gateway
    endpoint2: vino_webserver
    secure: true
declarations:
  -
    name: wordpress-vino
    type: security-group
    description: security group for vino
    ingress:
      -
        from: -1
        to: -1
        protocol: icmp
        allowed:
          - 0.0.0.0/0
      -
        from: 22
        to: 22
        protocol: tcp
        allowed:
          - 0.0.0.0/0
    egress:
Figure 6.2: Continuation of the above topology file.
Figure 6.3: An example of service chaining. The user specifies the endpoints, i.e. the Gateway and the Web Server, and the middlebox, i.e. the DPI. This installs rules on the switches that forward traffic going from the Gateway to the Web Server to the DPI instead, which transparently forwards it to the Web Server. This can be used for arbitrary VNFs. (Diagram: the SDI management layer controls an overlay SDN network spanning the Cloud 1 and Cloud 2 networks; before chaining, traffic flows from the Gateway directly to the Web Server; after chaining, it is detoured through the DPI middlebox.)
Vino system satisfies the functional requirements as initially outlined.
6.2 Performance Evaluation
There are two aspects of evaluating Vino: the performance (time and space) of the
Vino parser, and the cost of running the overlay SDI management stack. For the Vino
parser, we want to understand the resource overhead of the parser. The corresponding
experiment first runs the Vino parser to deploy various topologies. These topologies only
consider the provisioning of VMs on SAVI and AWS. The reason for only provisioning
VMs (as opposed to including other entities like security groups) was because it takes
the longest time, and is the most frequently performed operation.
Next, we evaluate the overlay SDI management. The goal here is to determine the
scalability of the overlay system. We approach this by first evaluating the resource
overhead of the underlying technologies, namely OpenVPN and VXLAN tunnels. Then,
we measure the performance of the SDN controller, Ryu, in isolation. We do this by
sending a large number of events to Ryu and measuring the number of events it can
handle. We then perform this experiment again by running a single Ryu instance together
with the SDI manager. We then scale out the number of Ryu instances.
6.3 Vino Parser
Here, we perform experiments to evaluate the overhead of the Vino parser. We do this
by creating multiple topologies over SAVI and AWS. The topologies correspond to 1, 2,
4, 8, and 16 VMs being provisioned on each cloud. The time for the parser to run can
be divided as the time to parse the topology file and the time to make the API calls. In
order to evaluate the time overhead of the parser, we will measure the total time taken
to provision the topology and compare this with the time taken to make the API calls
and for the resources to become available.
Additionally, we would like to see the memory overhead of the parser as opposed to
making direct API calls. Since the parser includes the API calls, the time and memory overhead of the parser will be strictly greater than that of the API calls. However, we
want to assess whether the benefits of ease of use and reusability are enough to justify
this overhead. While evaluating the parser, we will focus on provisioning VM instances,
since VM provisioning is the most expensive and common operation. In the following
set of experiments we are only interested in evaluating the parser. Therefore, we will not
consider the overhead of creating tunnels and running the control stack. These will be
addressed in the subsequent experiments.
6.3.1 Parsing Time Overhead
The following are the results of running the experiments on increasingly complex topologies. Here, we consider the total time taken to provision VMs, i.e. starting from the parsing of the topology file, including the API calls to provision resources, until all the VMs are accessible over SSH. The times were measured using Python's time module. We perform this experiment for topologies on both AWS and SAVI. As before, we repeated this experiment 100 times and averaged the times.
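The measurement procedure above can be sketched as a small timing harness using the time module:

```python
import time

def timed_mean(fn, runs=100):
    """Repeat a provisioning step `runs` times and return the mean
    wall-clock duration in seconds, mirroring the procedure above."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs

# mean_seconds = timed_mean(lambda: parser.deploy("topology.yaml"))
```

The call inside the lambda is a placeholder; in the actual experiments the timed step was the full parse-and-provision cycle.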
6.3.2 Memory Overhead
Next we would like to measure the memory overhead of Vino. There are two sources
of memory overhead: the fixed memory overhead arising from things like making API
Number of nodes    Time on SAVI (seconds)    Parser Overhead (seconds)    Parser Overhead (Percentage)
1                  37.6813590527             0.651310248375               1.73
2                  95.4224300385             0.679185665571               0.72
4                  128.043931007             0.683220799153               0.51
8                  197.128249168             0.688514838585               0.34
16                 332.347186089             0.714512329835               0.21
Table 6.1: Total time to allocate various topologies and the parser overhead on SAVI.
Figure 6.4: The total parsing and provisioning time as a function of number of nodes on SAVI.
Number of nodes    Time on AWS (seconds)    Parser Overhead (seconds)    Parser Overhead (Percentage)
1                  54.116526842             0.37127437458                0.68
2                  67.9349241257            0.357185625719               0.52
4                  68.0365350246            0.320741057381               0.47
8                  75.2786910534            0.425184816508               0.56
16                 86.6466450691            0.467112294583               0.53
Table 6.2: Total time to allocate various topologies and the parser overhead on AWS.
Figure 6.5: The total parsing and provisioning time as a function of number of nodes on AWS.
calls and variable memory overhead on account of per node information being stored.
We used the guppy-PE Python module to measure memory usage. Due to the dynamicity
of Python, it is difficult to examine the total memory used by any object. For instance,
a Python list will typically include unused cells in order to make the addition of new
elements more efficient. Therefore, for this experiment we first created all the topologies
using only the native API. This provided us with the baseline memory usage. Then,
we created all the topologies using the Vino parser. We then subtracted the baseline from the total usage. The following table shows the differential memory overhead.
Number of nodes    SAVI Memory Overhead (KB)    AWS Memory Overhead (KB)
1                  832                          812
2                  1272                         1222
4                  1774                         1678
8                  2178                         2058
16                 2554                         2524
Table 6.3: Total memory used for the different topologies.
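The differential-measurement idea can be sketched with the standard library's tracemalloc module (the thesis used guppy-PE, which exposes a different API):

```python
import tracemalloc

def differential_memory(baseline_fn, full_fn):
    """Peak traced allocation of full_fn minus that of baseline_fn,
    mirroring the subtraction of direct-API memory usage from
    parser memory usage described above."""
    def peak(fn):
        tracemalloc.start()
        fn()
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return peak_bytes
    return peak(full_fn) - peak(baseline_fn)
```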
6.3.3 Discussion
The above experiments evaluated the parser cost of multi-cloud orchestration. Specifically, we evaluated the space and time overhead of the Vino parser and found that memory usage was roughly equal to 460*N + 372 (in KB, where N is the number of nodes). The time overhead was typically less than 1% of the total time taken. However, the orchestration system facilitates the specification and management of multi-cloud topologies, which justifies this overhead.
6.4 SDI Overlay
The following set of experiments evaluate the overlay SDI system. The goal of the
following experiments is to determine the resource overhead and scalability of the SDI
overlay. This includes assessing the scalability of point to point links, and of the entire
management stack. Specifically, we conducted experiments to measure the throughput
of VXLAN and OpenVPN tunnels. Then we measured the response time of the SDN
controller in isolation, and finally the SDN controller with the SDI manager.
6.4.1 VXLAN Tunnels
We will first assess the throughput of VXLAN tunnels. This will help us isolate the
performance degradation due to tunnels and those due to the control stack (i.e. SDN
controller and SDI manager). VXLAN works by encapsulating entire L2 frames inside
UDP datagrams. This causes a reduction in throughput and an increase in delay. Specifically,
the added header overhead reduces the amount of useful data that can be transmitted.
Furthermore, the packet encapsulation and decapsulation causes an increase in delay.
Throughput
We will measure the throughput of VXLAN tunnels by using iPerf. iPerf is a network
bandwidth measurement tool. iPerf uses a client-server model. One node runs the iPerf
server, while another runs the iPerf client. The two perform a handshake and exchange
information regarding how many total bytes will be transmitted. Then the two nodes transfer data and calculate the bandwidth. In order to better understand the results,
we will also measure the throughput of the underlying network channel.
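A sketch of driving such a measurement from Python. It assumes the JSON-capable iperf3 (`iperf3 -c <server> -J`); the classic iperf2 used in many setups reports plain text instead:

```python
import json

def throughput_mbps(iperf_json):
    """Extract the mean received TCP throughput, in Mbps, from the JSON
    report emitted by `iperf3 -c <server> -J`."""
    report = json.loads(iperf_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

# To produce the report on a live link (sketch; requires iperf3 and a
# running `iperf3 -s` on the server):
# raw = subprocess.check_output(["iperf3", "-c", server_ip, "-J"])
# print(throughput_mbps(raw))
```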
As aforementioned, VXLAN encapsulates L2 frames in UDP datagrams. However,
the sender and recipient only see the inner L2 frame. Therefore, the encapsulation
Figure 6.6: Comparison of underlay and VXLAN throughput for various configurations.
and decapsulation are performed by non-terminal points. Specifically, VXLAN requires
VXLAN tunnel end points (VTEP) to perform this task. In our case, the OVS bridge
containing the VXLAN port acts as a VTEP. This processing time affects the throughput
of the VXLAN link. We ran this experiment for point to point links between two SAVI
nodes, two AWS nodes, and SAVI and AWS nodes.
Endpoints        Underlay Throughput (Mbps)    VXLAN Throughput (Mbps)
SAVI and SAVI    3392                          906
SAVI and AWS     180                           162
AWS and AWS      160                           101
Table 6.4: Comparison of underlay and VXLAN throughput for various configurations.
Space
We can analytically determine the space overhead of VXLAN tunnels. Specifically,
VXLAN encapsulates a whole L2 frame in a UDP datagram. So the space overhead
is 50 bytes (14 bytes Ethernet + 20 bytes IP + 8 bytes UDP + 8 bytes VXLAN header).
Ethernet typically has a maximum transmission unit (MTU) of 1500 bytes. Therefore,
the space overhead of VXLAN is 3.33%.
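The header arithmetic above can be checked directly:

```python
# Per-packet VXLAN encapsulation overhead, as computed above.
ETH, IP, UDP, VXLAN = 14, 20, 8, 8
overhead = ETH + IP + UDP + VXLAN      # 50 bytes of added headers
mtu = 1500                             # typical Ethernet MTU
print(round(100 * overhead / mtu, 2))  # 3.33 (percent)
```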
Discussion
The performance of VXLAN tunnels varies depending on the underlying bandwidth.
Specifically, there are two regions of interest: when the underlay throughput is roughly
under 1Gbps, and when it is over 1Gbps. In the first region, we notice a slight degra-
dation in throughput. Typically, the performance penalty is 10 - 35%. However, in
the second region, i.e. above 1Gbps, the performance degradation is much higher. The
performance penalty arises due to: 1) bigger header and 2) processing delays. VXLAN
adds 50 bytes of additional header. For the experiment, the maximum transmission unit
(MTU) was set to 1500 bytes. This amounts to a 3.33% overhead. That is, since VXLAN
packets have bigger headers, even if processing delay was constant, there would be per-
formance degradation. The remaining degradation can be attributed to processing delay,
i.e. from encapsulating and decapsulating the transmitted frames. Specifically, OVS
performs VXLAN encapsulation and decapsulation entirely in software, which can only
be performed so fast. As such, this operation becomes a bottleneck, and as the underlay
throughput increases, the achieved throughput using the overlay tunnels degrades.
6.4.2 OpenVPN Bridged Tunnels
We repeat the above experiment again, now with OpenVPN bridged tunnels. The goal
again was to determine the space and time overheads of having OpenVPN tunnels.
Throughput
Endpoints Underlay Throughput(Mbps)
OpenVPN Throughput(Mbps)
AWS and AWS 170 51
Table 6.5: Comparison of underlay and OpenVPN tunnel throughput.
Space
OpenVPN bridged tunnels are somewhat similar to VXLAN tunnels. OpenVPN encap-
sulates entire L2 frames in UDP datagrams. However, OpenVPN does not have anything
Figure 6.7: Comparison of underlay and OpenVPN throughput.
like a VXLAN header; therefore the space overhead is 42 bytes (14 bytes Ethernet + 20 bytes IP + 8 bytes UDP).
Discussion
OpenVPN tunnels had a throughput of about 30% of the underlying network channel, i.e. an overhead of about 70%.
6.4.3 Testing Ryu
Next, we ran experiments to test the performance of the SDN controller Ryu. For this,
we used cBench [38]. cBench is a library designed for testing OpenFlow controllers. It
tests them by sending a very large number of OpenFlow events to controllers and then
seeing how many events are responded to per unit time.
6.4.4 Testing the SDI Manager
For the next step of experiments, the goal was to evaluate the SDI manager. The SDI
manager can work with multiple SDN controllers. We will first run the experiment with
Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
2571.26                                     2767.27                                     2705.92                                     46.03
Table 6.6: Statistics on the number of responses sent by the Ryu SDN controller.
a single Ryu controller. Next, we will scale out the number of Ryu instances. We will
perform these in a similar way as the above experiments, i.e. using cBench.
Single Ryu Topology
This experiment considers the performance of a single SDN controller and the SDI manager together.
Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
162.00                                      352.00                                      239.73                                      77.25
Table 6.7: Statistics on the number of responses sent by the Ryu SDN controller and SDI manager.
Scaling Out Ryu: Increasing the Number of Ryu Instances.
This experiment considers the performance of multiple SDN controllers and the SDI manager together.
Number of controllers    Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
2                        54.00                                       392.00                                      218.14                                      79.32
4                        102.00                                      403.00                                      268.49                                      65.18
8                        98.00                                       386.00                                      258.49                                      68.85
Table 6.8: Statistics on the number of responses sent by the Ryu SDN controllers and SDI manager.
Figure 6.8: Performance of our network control stack compared with a single standalone Ryu instance.
6.4.5 Discussion
For the SDN controller alone, we noticed it could handle 2700 events per second. When
we reran the experiment with the SDN controller acting as proxy to the SDI manager,
the number of events dropped to around 250 events per second. As we scaled the number
of controllers, the number of responses largely stayed the same. This was expected since
the SDN controller itself was not the bottleneck and therefore scaling it would have no
impact on the scalability of the whole system.
To interpret these results, we must understand how the SDI manager is intended to
be used. The SDI manager is meant to be a centralized manager that can interface with
multiple controllers which interface with OpenFlow switches. In addition, this design
maintains a flow store for caching flow entries (since the on-switch flow tables are limited
in size) at each controller. This hierarchical design is comparable to a multi layer cache
and in effect should reduce the number of events that the SDI manager receives, since
some of the events should be handled by the controller directly. Since cBench generates
random packets, the flow store cache is never used and all the packets end up going to the
SDI manager. In more realistic scenarios, the SDI manager would receive fewer events
thus allowing it to handle a higher number of events.
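The flow store's role as a per-controller cache layer can be illustrated with a toy LRU cache; the class shape here is illustrative, not the actual flow store implementation:

```python
from collections import OrderedDict

class FlowStore:
    """Toy per-controller flow cache: hits are answered locally, misses
    escalate to the SDI manager (here, a callback), matching the
    multi-layer-cache analogy described above."""
    def __init__(self, capacity, sdi_lookup):
        self.capacity = capacity
        self.sdi_lookup = sdi_lookup  # invoked on a miss
        self.flows = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, match):
        if match in self.flows:
            self.flows.move_to_end(match)  # mark as recently used
            self.hits += 1
            return self.flows[match]
        self.misses += 1
        action = self.sdi_lookup(match)
        self.flows[match] = action
        if len(self.flows) > self.capacity:
            self.flows.popitem(last=False)  # evict least recently used
        return action
```

With random matches (as cBench generates), every lookup misses and escalates, which is consistent with the observed drop from ~2700 to ~250 events per second.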
Chapter 7
Conclusions
7.1 Summary
The future ICT landscape will be very diverse and consist of large numbers of public and private cloud options, along with vCPEs and sensors. In light of this, this thesis proposed
a system to perform orchestration over multiple infrastructure types and showed how to bring advanced traffic steering capabilities to these systems. We started with an initial set of requirements for an orchestration system. We then iteratively designed increasingly powerful orchestration systems. We first designed a system that could support orchestration
over a single public cloud. Next, we considered how we could orchestrate over multi-
ple clouds. Then, we considered how to include unmanaged resources. Whereas public clouds already run an RMS, for unmanaged resources we designed a system to bring them under the purview of an RMS. Finally, we considered how to incorporate
container orchestration.
In addition to performing multi-cloud orchestration, we also considered how to bring
SDN capabilities to legacy clouds. Specifically, we achieved this by extending the SDI
manager and Ryu SDN controller to work with legacy systems. This required us to create
point to point tunnels between the virtual machines. The system allows users to choose
between secure or unsecure tunnels, which trade off security for performance overhead.
This thesis consisted of prototyping the above system. We also contributed a YAML
based language for expressing multi cloud topologies. This includes specifying topologies,
how they should be configured and how we can perform advanced traffic steering over
them. We also contributed a parser that could read these topology files and realize the specified topologies. Finally, we created a GUI portal to enable creating service chains.
7.2 Future Work
This thesis considered how to orchestrate over multiple clouds and how to bring SDN
and SDI management to heterogeneous infrastructures. To these ends we developed a
functional solution. This meant that two secondary objectives, performance and ease of use, were not fully achieved. Although we made some effort to realize these objectives,
namely providing a GUI portal for creating service chains, and multi controller topologies,
there is a large potential for future research on these topics.
With regards to performance, we noted that the provisioning of nodes scaled up well.
However, network throughput had very different behavior depending on whether the un-
derlying network throughput was roughly less than or equal to 1Gbps. For VXLAN, one
of the bottlenecks seemed to be the encapsulation and decapsulation of overlay frames.
This reduced bandwidth utilization efficiency (i.e. the fraction of the physical layer net bitrate that translates into actually achieved throughput). One way to reduce this overhead could be to investigate other tunneling protocols like GRE. GRE encapsulates L2 frames in IP packets, and so slightly reduces the header overhead. Another avenue for improving performance
could be to perform encapsulation and decapsulation using specialized hardware, such as
FPGAs and ASICs. Having VXLAN or OpenVPN tunnels provides us with a lot of flex-
ibility. However, we may not necessarily need tunnels to achieve SDN style centralized
and dynamic routing. Specifically, in [40] Vissicchio et. al propose a way to achieve this.
Although, this work may not necessarily be related, it show that centralized control can
be merely emulating it. This would also be an interesting area of future research.
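The VXLAN-versus-GRE header comparison can be made concrete with a small calculation. The header sizes below are the standard ones for outer IPv4 without options; the base GRE header is 4 bytes (optional checksum, key, or sequence fields would add more).

```python
def overlay_efficiency(inner_frame, extra_headers):
    """Fraction of wire bytes that carry the tunnelled frame."""
    return inner_frame / (inner_frame + extra_headers)

# Per-packet encapsulation overhead added on top of the inner
# Ethernet frame (standard header sizes, outer IPv4, no options):
VXLAN = 14 + 20 + 8 + 8   # outer Eth + IP + UDP + VXLAN = 50 bytes
GRE   = 14 + 20 + 4       # outer Eth + IP + base GRE   = 38 bytes

for name, overhead in [("VXLAN", VXLAN), ("GRE", GRE)]:
    # For a 1500-byte inner frame: VXLAN ~96.8%, GRE ~97.5%.
    print(name, round(overlay_efficiency(1500, overhead), 4))
```

The difference per packet is small (12 bytes), which is consistent with the "slightly reduces" qualifier above; the larger VXLAN cost observed in our evaluation comes from the per-packet processing, not the bytes alone.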
The advanced traffic steering we proposed is effective only when the steering logic
operates on communication-layer headers (i.e., up to OSI layer 4, the transport layer).
However, when considering traffic steering, we may want to chain arbitrary network
functions. If data gets passed up to the application layer, then steering based on
OpenFlow does not work: the middleboxes (i.e., the service function nodes) would have to
be programmed to forward the traffic to a specific location, and this is precisely what
takes away the dynamicity and flexibility of our proposed SDN overlay approach. Another
area of future research could be ways to facilitate more complex traffic steering and
service chaining.
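The expressiveness gap can be sketched as a simple check: OpenFlow match fields (the names below are OpenFlow 1.3-era examples) cover headers up to the transport layer, so a policy phrased only in those fields is realizable by flow rules, while an application-layer criterion such as an HTTP URL is not.

```python
# Representative OpenFlow match fields, all at or below L4.
L2_L4_FIELDS = {
    "eth_src", "eth_dst", "eth_type",
    "ipv4_src", "ipv4_dst", "ip_proto",
    "tcp_src", "tcp_dst", "udp_src", "udp_dst",
}

def expressible_in_openflow(policy_fields):
    """True iff every field the steering policy matches on is <= L4."""
    return set(policy_fields) <= L2_L4_FIELDS

print(expressible_in_openflow({"ipv4_dst", "tcp_dst"}))  # L3/L4 steering
print(expressible_in_openflow({"http_url"}))             # needs a middlebox
```

Anything in the second category falls back to middlebox-internal logic, which is exactly where the overlay's flexibility is lost.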
Another area of research relates to the ease of use around multi-cloud orchestration.
For instance, when users want to provision a VM, they have to specify an image
identifier. On AWS, the identifier for a vanilla Ubuntu image is ami-fce3c696, and this
identifier changes across AWS regions, which makes specifying a topology tedious. One
solution would be to create an ontological mapping: the user specifies Ubuntu, and the
system resolves the correct identifier for the given region and provider. This is also
related to the discoverability of services and offerings. For instance, AWS offers spot
instances, which are unused instances auctioned off at a considerably lower hourly price
than on-demand instances. However, if AWS needs those instances, it will preempt them:
spot instances trade stability for reduced price. In some cases, e.g., a Hadoop worker
node or a horizontally scaled web server, having an instance preempted has no effect.
Thus, providing a high-level mapping from user requirements to actual deployments would
be very powerful. Perhaps this could be taken to the point where users specify a
meta-parameter optimization, e.g., deploy a WordPress server while minimizing response
time and maintaining a medium level of redundancy. In summary, pursuing these strands of
research can lead to a greatly improved system.
Bibliography
[1] Amazon Web Services CloudFormation.

[2] Ansible. http://www.ansible.com, 2016. Accessed: 2016-3-10.

[3] Cloudify. http://getcloudify.com, 2016. Accessed: 2016-3-10.

[4] CPqD OpenFlow 1.3 software switch.

[5] OpenContrail.

[6] OpenStack. http://www.openstack.org, 2016. Accessed: 2016-3-10.

[7] SaltStack. http://www.saltstack.com, 2016. Accessed: 2016-3-10.

[8] Terraform.

[9] TOSCA Simple Profile for NFV Version 1.0.

[10] Ravello Systems. Running OpenStack on AWS using DevStack and nested VMs, 2014.
Accessed: 2016-3-10.
[11] Ahmed Amokrane, Mohamed Faten Zhani, Rami Langar, Raouf Boutaba, and Guy
Pujolle. Greenhead: Virtual data center embedding across distributed infrastruc-
tures. IEEE Transactions on Cloud Computing, 1(1):36–49, 2013.
[12] Ilia Baldine, Yufeng Xin, Anirban Mandal, Chris Heermann Renci, Unc-Ch Jeff
Chase, Varun Marupadi, Aydan Yumerefendi, and David Irwin. Networked cloud
orchestration: a geni perspective. In 2010 IEEE Globecom Workshops, pages 573–
578. IEEE, 2010.
[13] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In
ACM SIGOPS Operating Systems Review, volume 37, pages 164–177. ACM, 2003.
[14] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual
Technical Conference, FREENIX Track, pages 41–46, 2005.
[15] Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi, Masayoshi Kobayashi,
Toshio Koide, Bob Lantz, Brian O’Connor, Pavlin Radoslavov, William Snow, et al.
Onos: towards an open, distributed sdn os. In Proceedings of the third workshop on
Hot topics in software defined networking, pages 1–6. ACM, 2014.
[16] Daniel J Bernstein. Cache-timing attacks on aes, 2005.
[17] George S Boolos, John P Burgess, and Richard C Jeffrey. Computability and logic.
Cambridge university press, 2002.
[18] Margaret M Burnett, Marla J Baker, Carisa Bohus, Paul Carlson, Sherry Yang, and
Pieter Van Zee. Scaling up visual programming languages. Computer, 28(3):45–54,
1995.
[19] Nicolas Ferry, Hui Song, Alessandro Rossini, Franck Chauvel, and Arnor Solberg.
Cloud mf: Applying mde to tame the complexity of managing multi-cloud appli-
cations. In Proceedings of the 2014 IEEE/ACM 7th International Conference on
Utility and Cloud Computing, pages 269–277. IEEE Computer Society, 2014.
[20] Peter B Galvin, Greg Gagne, and Abraham Silberschatz. Operating system concepts.
John Wiley & Sons, Inc., 2013.
[21] Craig Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford Uni-
versity, 2009. url: crypto.stanford.edu/craig.
[22] Chuanxiong Guo, Guohan Lu, Helen J Wang, Shuang Yang, Chao Kong, Peng Sun,
Wenfei Wu, and Yongguang Zhang. Secondnet: a data center network virtualization
architecture with bandwidth guarantees. In Proceedings of the 6th International
COnference, page 15. ACM, 2010.
[23] Mohammad Hajjat, Xin Sun, Yu-Wei Eric Sung, David Maltz, Sanjay Rao, Kun-
wadee Sripanidkulchai, and Mohit Tawarmalani. Cloudward bound: planning for
beneficial migration of enterprise applications to the cloud. In ACM SIGCOMM
Computer Communication Review, volume 40, pages 243–254. ACM, 2010.
[24] Scott Hendrickson, Stephen Sturdevant, Tyler Harter, Venkateshwaran Venkatara-
mani, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. Serverless com-
putation with openlambda. In 8th USENIX Workshop on Hot Topics in Cloud
Computing (HotCloud 16), 2016.
[25] IBM. IBM Operating System/360.
[26] Joon-Myung Kang, Thomas Lin, Hadi Bannazadeh, and Alberto Leon-Garcia.
Software-defined infrastructure and the savi testbed. In Testbeds and Research
Infrastructure: Development of Networks and Communities, pages 3–13. Springer,
2014.
[27] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the
linux virtual machine monitor. In Proceedings of the Linux symposium, volume 1,
pages 225–230, 2007.
[28] Changbin Liu, Boon Thau Loo, and Yun Mao. Declarative automated cloud resource
orchestration. In Proceedings of the 2nd ACM Symposium on Cloud Computing,
page 26. ACM, 2011.
[29] Changbin Liu, Yun Mao, Jacobus Van der Merwe, and Mary Fernandez. Cloud
resource orchestration: A data-centric approach. In Proceedings of the biennial
Conference on Innovative Data Systems Research (CIDR), pages 1–8, 2011.
[30] Jose Luis Lucas-Simarro, Rafael Moreno-Vozmediano, Ruben S Montero, and Igna-
cio M Llorente. Cost optimization of virtual infrastructures in dynamic multi-cloud
scenarios. Concurrency and Computation: Practice and Experience, 27(9):2260–
2277, 2015.
[31] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson,
Jennifer Rexford, Scott Shenker, and Jonathan Turner. Openflow: enabling inno-
vation in campus networks. ACM SIGCOMM Computer Communication Review,
38(2):69–74, 2008.
[32] Jan Medved, Robert Varga, Anton Tkacik, and Ken Gray. Opendaylight: Towards
a model-driven sdn controller architecture. In Proceeding of IEEE International
Symposium on a World of Wireless, Mobile and Multimedia Networks 2014, 2014.
[33] David Melman and Uri Safrai. Network virtualization: A data plane perspective.
2015.
[34] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan Jackson, Andy Zhou, Jarno Raja-
halme, Jesse Gross, Alex Wang, Joe Stringer, Pravin Shelar, et al. The design and
implementation of open vswitch. In 12th USENIX symposium on networked systems
design and implementation (NSDI 15), pages 117–130, 2015.
[35] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you,
get off of my cloud: exploring information leakage in third-party compute clouds. In
Proceedings of the 16th ACM conference on Computer and communications security,
pages 199–212. ACM, 2009.
[36] Ryu SDN Framework Community. Ryu SDN framework, 2015.
[37] Omar Sefraoui, Mohammed Aissaoui, and Mohsine Eleuldj. Openstack: toward
an open-source solution for cloud computing. International Journal of Computer
Applications, 55(3), 2012.
[38] Rob Sherwood and KK Yap. Cbench controller benchmarker. Last accessed, Nov,
2011.
[39] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric
Tune, and John Wilkes. Large-scale cluster management at google with borg. In
Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM,
2015.
[40] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. Cen-
tral control over distributed routing. ACM SIGCOMM Computer Communication
Review, 45(4):43–56, 2015.
[41] Stefano Vissicchio, Laurent Vanbever, and Jennifer Rexford. Sweet little lies: Fake
topologies for flexible routing. In Proceedings of the 13th ACM Workshop on Hot
Topics in Networks, page 3. ACM, 2014.
[42] Jim Wanderer. Case study: the Google SDN WAN. Computing.co.uk, 11, 2013.
[43] K Yamazaki, Y Nakajima, T Hatano, and A Miyazaki. Lagopus fpga–a repro-
grammable data plane for high-performance software sdn switches. In 2015 IEEE
Hot Chips 27 Symposium (HCS), pages 1–1. IEEE, 2015.
[44] Yuval Yarom and Katrina Falkner. Flush+ reload: a high resolution, low noise, l3
cache side-channel attack. In 23rd USENIX Security Symposium (USENIX Security
14), pages 719–732, 2014.
[45] Qi Zhang, Mohamed Faten Zhani, Shuo Zhang, Quanyan Zhu, Raouf Boutaba, and
Joseph L Hellerstein. Dynamic energy-aware capacity provisioning for cloud comput-
ing environments. In Proceedings of the 9th international conference on Autonomic
computing, pages 145–154. ACM, 2012.
[46] Qi Zhang, Quanyan Zhu, and Raouf Boutaba. Dynamic resource allocation for spot
markets in cloud computing environments. In Utility and Cloud Computing (UCC),
2011 Fourth IEEE International Conference on, pages 178–185. IEEE, 2011.
[47] Mohamed Faten Zhani, Qi Zhang, Gwendal Simona, and Raouf Boutaba. Vdc
planner: Dynamic migration-aware virtual data center embedding for clouds. In
2013 IFIP/IEEE International Symposium on Integrated Network Management (IM
2013), pages 18–25. IEEE, 2013.