Orchestration over Heterogeneous Infrastructures
by
Spandan Bemby
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2016 by Spandan Bemby
Abstract
Orchestration over Heterogeneous Infrastructures
Spandan Bemby
Master of Applied Science
Graduate Department of Electrical and Computer Engineering
University of Toronto
2016
The future cloud ecosystem will be very diverse. On account of differences in offerings,
prices, and locations, resource allocations may span multiple public cloud providers and
include private resource pools in the form of virtual customer premise edges. Additionally,
future applications will require more powerful networking paradigms like software-defined
networking (SDN), which provide a centralized and fine-grained view of the network.
When considering private resource pools, we must extend the notion of SDN to other
resource types and consider software-defined infrastructure (SDI), a resource management
approach that converges the management of heterogeneous resource types and provides
a centralized view over all resources. This work proposes Vino, a system for managing
resources in heterogeneous domains (public and private clouds) as well as orchestration
over these heterogeneous infrastructures. Additionally, Vino enables SDI capabilities on
arbitrary clouds by leveraging overlay networks. We design, prototype and evaluate the
Vino system, capable of handling the aforementioned tasks.
For my mom, dad, sister, and my three angels, Vanya, Samaya, and Nav.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Background 7
2.1 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Public Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Private Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Hybrid, Multi Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.3 Hardware Virtualization . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.4 Network Virtualization . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Compute Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6.1 Bare Metal Machines . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.2 Virtual Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.6.3 Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6.4 Lambda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.7 Software-defined Networking . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7.1 OpenFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7.2 Network Function Virtualization . . . . . . . . . . . . . . . . . . . 18
2.8 Resource Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.8.1 OpenStack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.9 Software-defined Infrastructure . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Smart Application on Virtual Infrastructure . . . . . . . . . . . . . . . . 20
2.11 Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Related Work 24
3.1 Single Cloud Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.1 Secondnet, VDC Planner . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.2 Borg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.3 AWS: CloudFormation . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.4 OpenStack: Heat . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Multi Cloud Orchestration Tools . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 Cloud Resource Orchestration: A Data-Centric Approach, Declar-
ative Automated Cloud Resource Orchestration . . . . . . . . . . 26
3.2.2 Networked Cloud Orchestration: A GENI Perspective . . . . . . . 26
3.2.3 Greenhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.4 CloudMF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.5 Multi-Cloud Brokering . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.6 Terraform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.7 The Topology and Orchestration Specification (TOSCA) . . . . . 27
3.3 SDN over Legacy Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Fibbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.2 Ravello Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.3 OpenContrail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Configuration/Orchestration Tools . . . . . . . . . . . . . . . . . . . . . 29
3.4.1 Salt Cloud, Ansible . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 Design of Multidimensional Orchestrator 30
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Orchestration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Requirements on the Substrate . . . . . . . . . . . . . . . . . . . 32
4.2.3 Modelling the Application . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Resource Management Overview . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 Resource Provisioning Model . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.1 Native Provisioning . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4.2 Delegated Provisioning . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4.3 Fully-managed Provisioning . . . . . . . . . . . . . . . . . . . . . 38
4.4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5 Organization of Resource Controllers . . . . . . . . . . . . . . . . . . . . 40
4.5.1 OpenStack RMS . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.5.2 Software-defined Infrastructure RMS . . . . . . . . . . . . . . . . 41
4.6 Vino Version 1: SDN Orchestration Over a Single Legacy Cloud . . . . . 43
4.6.1 Initial Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.6.2 Adapting the Design . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.7 Vino Version 2: SDN Orchestration Over Multiple Legacy Clouds . . . 47
4.8 Vino Version 3: SDN Orchestration Over Unmanaged Resources . . . . . 47
4.8.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.8.2 Types of Virtualization . . . . . . . . . . . . . . . . . . . . . . . 49
4.8.3 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.8.4 Modelling the Substrates . . . . . . . . . . . . . . . . . . . . . . . 50
4.9 Vino Version 4: Container Orchestration . . . . . . . . . . . . . . . . . . 51
4.9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.9.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Implementation of Multidimensional Orchestrator 53
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 Programming Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.1 Java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Data Serialization Language . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3.1 XML (Extensible Markup Language) . . . . . . . . . . . . . . . . 54
5.3.2 JSON (JavaScript Object Notation) . . . . . . . . . . . . . . . . . 55
5.3.3 YAML (YAML Ain't Markup Language) . . . . . . . . . . . . . 55
5.3.4 Custom Language . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.1 Bootloader . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4.2 Orchestrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5 Bootloader Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5.1 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.5.2 Remote Code Execution . . . . . . . . . . . . . . . . . . . . . . . 58
5.6 Orchestrator Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.6.1 Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.6.2 Declared Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.6.3 Parametrization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.4 Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.5 Special Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.6.6 Dependency Resolution . . . . . . . . . . . . . . . . . . . . . . . . 65
5.6.7 Cloud Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6.8 Creating the Topology . . . . . . . . . . . . . . . . . . . . . . . . 66
5.6.9 Logical Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.6.10 Virtual Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.6.11 Containers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.6.12 Network Tunnels . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.7 Traffic Steering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.7.2 Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6 Evaluation 75
6.1 Functional Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 WordPress Firewall Exposition . . . . . . . . . . . . . . . . . . . 76
6.2 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3 Vino Parser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.1 Parsing Time Overhead . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.2 Memory Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4 SDI Overlay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.1 VXLAN Tunnels . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.4.2 OpenVPN Bridged Tunnels . . . . . . . . . . . . . . . . . . . . . 85
6.4.3 Testing Ryu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.4.4 Testing the SDI Manager . . . . . . . . . . . . . . . . . . . . . . . 86
6.4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7 Conclusions 90
7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography 93
List of Tables
6.1 Total time to allocate various topologies and the parser overhead on SAVI. 81
6.2 Total time to allocate various topologies and the parser overhead on AWS. 81
6.3 Total memory used for the different topologies. . . . . . . . . . . . . . . . 82
6.4 Comparison of underlay and VXLAN throughput for various configurations. 84
6.5 Comparison of underlay and OpenVPN tunnel throughput. . . . . . . . . 85
6.6 Statistics on the number of responses sent by the Ryu SDN controller. . . 87
6.7 Statistics on the number of responses sent by the Ryu SDN controller and
SDI manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.8 Statistics on the number of responses sent by the Ryu SDN controller and
SDI manager. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
List of Figures
1.1 An example illustrating a complex orchestration scenario. This involves
sensors, local private servers, and resources on a public cloud (AWS). The
sensors collect Wi-Fi probe packets and send them to a local server. The
local server runs a predictive algorithm to determine when and what the
user will order. The local server also sends the probe packets to AWS to
be stored. These components need to be connected using secure channels
with tasks distributed over each of the nodes. . . . . . . . . . . . . . . . 4
2.1 The relative weights of different characteristics of the various compute
environment. Isolation refers to how isolated the environment is. Per-
formance refers to how many system resources are used to perform actual
work, as opposed to being used for virtualization, i.e. the opposite of over-
head. Flexibility refers to how much flexibility the user has. Here, BMs
perform poorly since their hardware and kernel are fixed. Virtualization
refers to the amount of virtualization being performed. . . . . . . . . . . 14
2.2 Different types of resources sorted by increasing flexibility and isolation. . 15
2.3 The two different views of the terms resource and resource management.
Type 1 refers to system resources and management thereof. Type 2 refers
to processing units and clusters of processing units and their management. 19
2.4 A conceptual view of the SDI RMS. . . . . . . . . . . . . . . . . . . . . . 21
2.5 A conceptual view of a SAVI node modelled after the SDI RMS. . . . . . 22
4.1 An example of how different components, e.g. the compute manager, and
monitoring manager must be coordinated to realize a complex application,
e.g. an autoscaling web server deployment. . . . . . . . . . . . . . . . . . 31
4.2 A conceptual view of how the resource provisioning middleware interfaces
with the user and the cloud. . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3 A conceptual view of the native provisioning model. . . . . . . . . . . . . 37
4.4 A conceptual view of the delegated provisioning model. . . . . . . . . . . 38
4.5 A conceptual view of the full middleware provisioning model. . . . . . . . 39
4.6 The upshifting of virtualization stack when OpenStack is deployed on vir-
tual machines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.7 A conceptual view of the OpenStack RMS. . . . . . . . . . . . . . . . . . 42
4.8 A conceptual view of the SDI RMS. . . . . . . . . . . . . . . . . . . . . . 43
4.9 A conceptual view of the Vino RMS. . . . . . . . . . . . . . . . . . . . . 46
4.10 A conceptual view of the Vino RMS V2. . . . . . . . . . . . . . . . . . . 47
4.11 A conceptual view of the Vino RMS V3. This shows how unmanaged
resources are brought under the purview of a RMS. The logic surrounding
resource management is the same as in Vino RMS V2. . . . . . . . . . . 48
4.12 A conceptual view of the Vino RMS V4. . . . . . . . . . . . . . . . . . . 51
5.1 Final version of VTL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Example of a VTL file with the complete set of features. . . . . . . . . . 61
5.3 Continuation of the above topology file. . . . . . . . . . . . . . . . . . . . 62
5.4 Node configuration snippet. User can specify a list of configurations in the
form of playbooks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.5 A conceptual view of vxlan tunnels. . . . . . . . . . . . . . . . . . . . . . 70
5.6 A conceptual view of an OpenVPN setup. . . . . . . . . . . . . . . . . . 71
5.7 The Vino Portal can be used to create service chains . . . . . . . . . . . 73
6.1 Example of a VTL topology file. . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Continuation of the above topology file. . . . . . . . . . . . . . . . . . . . 78
6.3 An example of a service chaining. The user specifies the endpoints, i.e.
the Gateway and the Web Server, and the middlebox, i.e. the DPI. This
install rules on the switches that forwards traffic going from the Gateway
to the Web Server to be sent to the DPI instead, which transparently
forwards it to the Web Server. This can be used for arbitrary VNFs. . . . 79
6.4 The total parsing and provisioning time as a function of number of nodes
on SAVI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 The total parsing and provisioning time as a function of number of nodes
on AWS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Comparison of underlay and VXLAN throughput for various configurations. 84
6.7 Comparison of underlay and VXLAN throughput. . . . . . . . . . . . . . 86
6.8 Performance of our network control stack compared with a single stan-
dalone Ryu instance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Chapter 1
Introduction
The history of humans is the story of evolving technologies being used to answer
the questions of land and food. The last such technology was the Internet. Indeed, the
Internet has drastically changed many aspects of our lives. It was able to have such
an impact because, in a sense, it democratized the means of distributing content. The
emergent field of cloud computing goes a step further, by additionally democratizing the
means of production.
Cloud computing is a model of computing that allows on-demand access to resources
over a communication network. This allows efficient sharing of resources and reduces the
associated infrastructure cost for users. Combined with hardware virtualization, this ab-
stracts away considerations of machine and network failures and allows (re)configuration
of resources. Although the term cloud computing became popular in the mid 2000s (with
the introduction of Amazon's Elastic Compute Cloud (EC2) service), the architecture of multi-
plexing users over network accessible resources has existed since the late 1970s. Likewise,
hardware virtualization technologies, whereby users interface with virtual slices of physi-
cal servers have existed since the 1960s. The various practical and theoretical challenges
associated with these technologies have generated much interest in academia and industry.
The principles of cloud computing, i.e. resource sharing by disparate users and on-
demand availability, have been adapted as a business model whereby resources are leased
under a "pay-as-you-go" model. Cloud providers (CPs) leverage economies of scale to
provide resources at a lower cost with higher reliability and availability. From the user's
perspective, public clouds shift the cost distribution from higher capital expenditure
(capex) to higher operational expenditure (opex), and can enable more agile prototyping.
Public cloud computing services are becoming increasingly popular because they reduce
the complexity and cost associated with managing private infrastructure.
Although public clouds are a key part of industry information and communication
technology (ICT) solutions, the ecosystem is diverse and additionally consists of private
and hybrid clouds. Both public and private clouds have similar architectures, but the
former is typically multi-tenancy (i.e. resources are used by multiple users, where users
can be organizations or individuals), while the latter is typically single-tenancy (i.e.
resources are used by a single user). This allows for improved security in private clouds
and also allows custom modifications.
1.1 Motivation
Traditionally, ICT requirements were defined in terms of servers and how they were
networked together. As a result, public cloud providers evolved to primarily offer virtual
analogues to existing physical resources, e.g. servers. This led to cloud providers only
supporting IP networking, while ignoring the more flexible software-defined networking
(SDN) paradigm. This created a fundamental mismatch between the offerings of cloud
providers and the needs of users who required more advanced networking, e.g. users
trying to do network function virtualization (NFV).
The cloud ecosystem consists of many different public cloud providers, with varying
prices, locations, features, etc. When deploying an application, an optimal resource
allocation may span multiple providers (optimality can be with regards to cost, energy
usage, availability, etc.) [45], [46]. Furthermore, there is large variation in user needs.
Specifically, some users may prefer private resources due to increased security [16] and
customizability. Additionally, some private resource pools may be unmanaged, i.e. exist
as individual physical servers outside the purview of a centralized manager or resource
management system (RMS). In this situation, users need a sensible way of interfacing
with, and provisioning resources over this substrate.
This heterogeneity of user requirements and infrastructure capabilities highlights the
problem of orchestration. Orchestration is the coordinated provisioning, modification,
and deprovisioning of compute, network, and storage resources, in order to form a service
that satisfies some high-level objective (e.g. policy requirements, constrained optimiza-
tion over some heuristics). To realize an orchestration system, we require: 1) a system
capable of supporting diverse orchestration tasks, such as provisioning virtual machines
(VM), and 2) control logic that determines when to provision and deprovision resources
to satisfy some objective. This thesis is concerned with the former.
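The coordinated provisioning described above can be made concrete with a short sketch: resources are modelled as a dependency graph, and provisioning proceeds in topological order so that, for example, a network exists before the VMs attached to it, and a tunnel is created only after both of its endpoints. The resource names and dependencies below are illustrative only and not part of any particular system.

```python
from graphlib import TopologicalSorter

# Each resource lists the resources it depends on. A tunnel can only be
# created after both of its endpoint VMs exist, and a VM only after its
# network exists. (Illustrative names, not a real orchestration request.)
dependencies = {
    "network": [],
    "vm-web": ["network"],
    "vm-db": ["network"],
    "volume-db": ["vm-db"],
    "tunnel-web-db": ["vm-web", "vm-db"],
}

# A valid provisioning order: every resource appears after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

An orchestrator layers control logic on top of such an ordering, invoking the appropriate compute, network, or storage manager at each step; resources with no mutual dependencies can also be provisioned in parallel.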
To illustrate this problem, consider a food festival, where food vendors, collectively,
want to reduce the time users have to wait before receiving their order (see Figure 1.1).
Indeed, there exists a model that can predict, with high accuracy, the time, the vendor,
and the food item that any user will order, given the near real-time position information
of the users. The rationale here is that the vendor can start preparing the item before
the user places the order so that when the user arrives the item is ready to be served.
In order to get this position data, there exist Wi-Fi sensors that listen for Wi-Fi probe
request packets sent by users’ smart phones. There are also some small servers located
on the festival grounds that can process the probe packets as per the model and inform
the vendors to preemptively start preparing certain items. Finally, we want all this
probing data to be stored in a public cloud, like Amazon Web Service (AWS) for future
modelling. This is an example of orchestration since different types of nodes, i.e. sensors,
local private servers, and servers in public remote datacenters, and different resource
types, i.e. compute for running the predictive algorithm, storage for storing the data,
and network for communicating the data between the various nodes, must be coordinated
to create a service.
The current public cloud offerings have become very popular; however, as illustrated
by the above example, user requirements extend beyond the offerings of cloud providers.
For the promise of cloud to be realized, users must have access to a more powerful net-
working model, and be able to orchestrate over heterogeneous infrastructures, including
private and hybrid deployment schemes. Indeed, some of the biggest cloud providers use
SDN internally [42] and could easily expose that functionality to their end user. Fur-
thermore, there is work being done to facilitate multiple autonomous clouds coordinating
(directly or through a broker) to realize user requests [30], [19]. However, until such solu-
tions are more mature, our proposed approach provides a transitory platform. This thesis
is motivated by the gap between user requirements and cloud offerings. Specifically, we
propose a design, implementation, and evaluation of a system that enables orchestration
(including advanced networking) over heterogeneous infrastructures.
1.2 Problem Statement
The goal of this thesis is to design a system that allows extending SDN capabilities to
multiple legacy infrastructures and allows orchestration over the diverse cloud landscape.
This requires designing a unified orchestration layer that can handle a wide range of
orchestration tasks. We propose the following goals for this thesis:
1. Evaluate the existing approaches to bringing SDN to legacy systems.
2. Evaluate the existing approaches to orchestrating over single and multiple domains.
Figure 1.1: An example illustrating a complex orchestration scenario. This involves
sensors, local private servers, and resources on a public cloud (AWS). The sensors collect
Wi-Fi probe packets and send them to a local server. The local server runs a predictive
algorithm to determine when and what the user will order. The local server also sends
the probe packets to AWS to be stored. These components need to be connected using
secure channels with tasks distributed over each of the nodes.
3. Design a system that supports advanced SDN features such as traffic steering. With
regards to the above scenario, this means that if a local server goes down, then we
would have the ability to steer the probe data from the sensors directly to the
public cloud, to be processed there.
4. Design a system that facilitates diverse orchestration tasks, namely, multi-cloud
orchestration, managing of unmanaged physical resources, container and VM or-
chestration, and node configuration. With regards to the above example, this means
we can coordinate the orchestration across diverse nodes and resource types.
5. Implement a system that realizes the above design.
6. Analyze the solution and quantify its performance and scalability.
We anticipate facing the following challenges as a result of the above enumerated
objectives.
1. How to enable advanced SDN features such as traffic steering.
2. How to import our own control and management layer on an arbitrary cloud.
3. How to orchestrate over multiple SDN-enabled clouds.
4. How to design a unified abstraction over multiple orchestration tasks.
1.3 Contributions
The contributions of this thesis are the following:
1. Design and implementation of how to bring SDN capabilities to non-SDN infras-
tructures.
2. Design and implementation of a unified orchestration layer that supports multiple
orchestration tasks. This includes:
(a) an application templating language that abstracts multiple orchestration tasks
(b) a subsystem that is responsible for managing unmanaged substrates in order
to be orchestrated over,
(c) and a subsystem responsible for orchestration, i.e. provisioning and deprovi-
sioning user requested resources.
3. Evaluation of the system.
1.4 Organization
The remainder of this document is organized as follows. Chapter 2 provides background
information on the cloud and related technologies and concepts. Chapter 3 is a survey
of the related works, including an analysis of their shortcomings. Chapter 4 presents
a design that realizes the above objectives. Chapter 5 discusses the implementation.
Chapter 6 evaluates the systems using various metrics. Finally, Chapter 7, considers
conclusions of this work and areas of future research.
Chapter 2
Background
2.1 Cloud Computing
Cloud computing is a paradigm of computing where resources are:
• accessed over a network,
• used concurrently by multiple users
Although the term cloud computing is somewhat recent, the underlying technologies,
namely the Internet (in the form of ARPANET) for network accessibility and time-sharing
and virtualization (as realized in the IBM OS/360 [25]) for resource multiplexing have
existed since the 1970s and 1960s, respectively. Since their inception, there have been
many advancements in networking and virtualization. Modern cloud computing offerings
consist almost exclusively of virtualized resources; although the two do not have to be
coupled thus. The cloud, in the context of cloud computing, refers to one or more resource
pools, over which resources can be provisioned.
Beyond these minimal conditions, cloud computing can further be categorized based
on the logistics of the hardware, and the interface with the user. Some of these cate-
gorizations may have more specialized names, depending on whether the niche is large
enough. The various models typically have names suffixed with "as a service", which
reflect the type of abstraction being exposed to the user. The following categorizations
are based on the logistics of the hardware.
• who owns the infrastructure
• whether the infrastructure is single or multi-tenancy, i.e. private and public cloud,
respectively
The following are the categorizations based on how the user interfaces with the re-
sources.
• whether the resources are physical or virtual
• the level of abstraction around the resources, i.e. whether the user interacts with:
– virtualized analogues of physical resources, i.e. infrastructure as a service
(IaaS)
– runtimes, related libraries, and environments, i.e. platform as a service (PaaS)
– software, i.e. software as a service (SaaS)
– other models of accessing services, e.g. desktop as a service (DaaS), mobile
backend as a service (MBaaS)
2.2 Public Cloud
Public cloud is a cloud that is accessible to the general public. Typically, the infrastruc-
ture is virtualized and multi-tenancy, i.e. VMs from different users may be provisioned on
the same hardware. However, some providers support single-tenancy as well as bare metal
machines, i.e. non-virtualized servers. The key players in this space include Amazon Web
Services (AWS), Google Compute Engine (GCE), and Microsoft Azure, among others.
2.2.1 Advantages
High Elasticity
Elasticity refers to the CP's ability to dynamically adapt the amount of provisioned
resources based on changes in user demand. High elasticity corresponds to the ability
to satisfy user requests that vary greatly in time. Public cloud providers typically have
large resource pools, allowing for high elasticity, i.e. the ability to scale up or scale down
the number of resources being used.
Low Cost
CPs typically maintain large resource pools. This allows them to benefit from economies
of scale, with regards to hardware as well as maintenance thereof, and subsequently
reduces per unit resource cost.
Lower capex, higher opex
One of the defining characteristics of modern cloud platforms is a pay-as-you-go model
where users provision and pay for resources when required and deprovision them when
not required. This shifts the cost distribution from capex to opex. This has the following
benefits:
• requires little upfront cost
• allows for more agility
• provides insurance against the exponential decrease in the cost of resources, e.g.
Moore's law (transistor density → computational power), Kryder's law (magnetic
disk storage density → storage), Keck's law (fiber optic link capacity → network
throughput)
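The capex-to-opex shift can be illustrated with a back-of-the-envelope calculation. The prices below are assumptions chosen for illustration, not quotes from any provider:

```python
# Illustrative comparison of buying a server (capex) versus renting a
# comparable cloud instance (opex). Both figures are assumptions.
server_capex = 6000.0   # assumed upfront purchase price of a server
instance_opex = 0.20    # assumed hourly rental rate for a comparable instance

break_even_hours = server_capex / instance_opex
print(break_even_hours)             # 30000.0 hours of continuous use
print(break_even_hours / 24 / 365)  # roughly 3.4 years
```

A workload that runs only intermittently never reaches this break-even point, which is why the pay-as-you-go model requires little upfront cost; the calculation also ignores power, cooling, and administration, which further favour the rented option.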
Greater availability and reliability
Public cloud providers typically possess a fleet so large that the probability that at least
one physical machine will fail over some period, e.g. a day, becomes very high. As such,
they spend effort diagnosing and retiring machines that are likely to fail. Additionally,
they have robust failover mechanisms intended to reduce downtime in case network links
or switches fail. Therefore, public cloud providers typically have greater availability and
reliability [23].
2.2.2 Disadvantages
Security
Public clouds raise various security concerns due to multi-tenancy and shared identity and
access management (IAM) systems. Specifically, various side-channel attacks have been
shown to recover secret keys (and generally any other data in memory) if the attacker
has a virtual machine hosted on the same physical machine as the victim [16], [44].
2.3 Private Cloud
A private cloud is a cloud where access is restricted to a single organization. Typically,
the cloud is owned and operated by the same organization, comparable to datacenters.
However, unlike datacenters, private clouds use virtualized resources, which brings a host
of benefits.
2.3.1 Advantages
Customizability
Private clouds are owned and operated by the same organization. Since the resources are
used exclusively by one organization, the resource management systems and the physical
resources can be customized as per user requirements.
Security
There are two facets to security shortcomings on public clouds. First, in a multi-tenancy
environment, users are vulnerable to various side-channel attacks. These attacks, as demonstrated by Bernstein [16] and Yarom et al. [44], use the time taken to load cache entries to infer the cache contents of collocated users. In a public cloud, users cannot choose where their VM will be located, and specifically, whether they will be collocated with a malicious tenant. In fact, Ristenpart et al. [35] showed how to map Amazon's EC2 infrastructure and infer the likely placement of victim VMs. They could then repeatedly provision instances until an instance was collocated with the victim. Furthermore,
malicious co-tenants can infer the victim’s data and algorithms based on memory access
patterns. The single-tenancy of private clouds circumvents these issues.
The second issue arises from the lack of trust between users and CPs, specifically the inability of users to discern malicious CPs (this includes compromised CPs, since compromised and inherently malicious CPs are indistinguishable). We primarily address security concerns in terms of data confidentiality, although integrity and availability can be reasoned about similarly. We can secure data in networks and on disk storage; however, without an efficient fully homomorphic encryption (FHE) scheme [21], data in the CPU pipeline and in memory is vulnerable. A malicious CP could emulate the entire hardware and view all the data being passed into the CPU pipeline. Although FHE schemes exist [21], they tend to be a few orders of magnitude slower than processing plaintext, and are therefore impractical [21]. A private cloud mitigates both of these situations.
2.3.2 Disadvantages
Low Elasticity, Underutilization
There is a tradeoff between the elasticity and the utilization of a cloud. High elasticity requires large unused resources, whereas high utilization requires few unused resources. If resource demand is variable, then organizations must choose between incurring costs due to unused resources or being susceptible to increases in demand. Low elasticity can be problematic if the load is highly variable. In order to account for variance in demand, organizations must have enough resources to cover their peak load. This leads to resource underutilization equal to the difference between the average and peak loads. These issues are less pronounced with public clouds, since their pricing schemes account for these differences in elasticity and utilization. By contrast, the cost of private clouds is borne entirely by the owner organization. Therefore, if demand is variable or future demand is intractable, then private clouds can become inflexible or costly.
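To make this tradeoff concrete, the sketch below computes the idle fraction of a private cloud that is provisioned for its peak load; the demand trace and units are hypothetical.

```python
# Illustrative only: quantifying underutilization in a fixed-capacity private
# cloud that must be provisioned for its peak demand.

def underutilization(demand):
    """Fraction of peak-provisioned capacity left idle on average."""
    peak = max(demand)
    average = sum(demand) / len(demand)
    return (peak - average) / peak

# A bursty demand trace (arbitrary units): mostly low load with one spike.
trace = [10, 12, 11, 80, 10, 9, 12, 10]
idle_fraction = underutilization(trace)  # roughly 0.76 of capacity idle
```

The spikier the demand, the larger the gap between peak and average, and hence the larger the idle fraction the owner organization pays for.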
High Capex, Low Opex
In order to create a private cloud, the underlying physical resources must be purchased before they can be used. This results in higher capex, due to the cost of the physical resources as well as the cost of configuring them. This is in sharp contrast to public clouds, where users can start using resources without any prior investment. However, as total cost is composed of opex and capex, private clouds may lead to a lower average cost, due to a lower opex.
2.4 Hybrid, Multi Cloud
A hybrid cloud is a cloud composed of public and private clouds, meant to overcome their respective shortcomings. Public clouds offer lower opex, effectively at the cost of reduced security, whereas private clouds offer greater security at a higher capex. Furthermore, private clouds incur costs due to unused resources because of differences in peak and average demands. A hybrid cloud is composed of a public cloud and a private cloud (typically with capacity equal to the average load). It overcomes the shortcomings of public and private clouds by using the constituent private cloud up to its maximum capacity, and then using the public cloud for additional capacity, e.g. when load spikes. This arrangement additionally requires planning with regards to resource allocation, e.g. since users would want more sensitive code and data to live and execute on private infrastructure.
A multi cloud is a cloud composed of multiple clouds: an arbitrary combination of public, private, and hybrid clouds. Furthermore, the constituent clouds in a multi cloud can themselves be multi clouds. This thesis focuses on how different kinds of clouds can be composed to realize a multi cloud.
2.5 Virtualization
Virtualization refers to a set of technologies where the user interfaces with a virtual
analogue of a physical resource. Virtualization can be applied to CPUs, storage, and
networks. Virtualization shields the user from considerations of failure and configuration of the underlying resource. Typically, virtualization creates logical partitions of the underlying
resource, which can be individually provisioned and deprovisioned. Broadly speaking, virtualization increases flexibility, albeit with associated costs. Cloud computing and virtualization are synergistic and together allow the realization of computing as a utility: cloud computing removes concerns of resource management, while virtualization removes concerns of resource failure and configuration. The following are the tradeoffs of virtualization.
2.5.1 Advantages
Separation of Concerns and Flexibility
Virtualization decouples concerns of physical resources, e.g. machine failure, from the
logical state of the applications. This separation greatly facilitates operations like migra-
tion, taking snapshots of VMs, and creating redundant replicas.
Although this varies from resource to resource, virtual instances offer more flexibility than their physical analogues. This is because we can typically create a virtual instance over any physical instance. For instance, we can create Linux virtual machines over both Windows and Linux based physical machines. This means there is no vendor lock-in, and no issues due to incompatible hardware.
Reduced Cost, Higher Utilization
Virtualization enables cost reduction through statistical multiplexing over resources. This
additionally improves resource utilization. For instance, two applications may have de-
pendencies on non-compatible versions of the same library. If we do not use virtualization,
then only one of them can run per physical machine. However, with virtualization we can
create two virtual machines, each with a separate version of the library. Utilization becomes a more pronounced issue when the physical machines have a large amount of system resources. Furthermore, higher utilization also reduces the energy used.
2.5.2 Disadvantages
Reduced Performance
There is an overhead associated with any given virtualization technology. This is, in
part, due to the extra resources used to run the virtualizing system. For instance, when
creating VMs, some resources must be used to run the hypervisor. However, some forms of
virtualization can result in substandard performance due to how the virtualization is implemented and how the virtual component runs. That is, if a physical component runs in hardware while its virtual counterpart runs in software, there will be performance degradation on account of the difference in speed between hardware and software execution.
2.5.3 Hardware Virtualization
Hardware virtualization refers to simulating the hardware such that multiple guest operating systems can run on the host system. Hardware virtualization is considered full if a guest operating system can run unmodified [20]. Furthermore, hardware virtualization
can be hardware-assisted or emulated. Hardware-assisted virtualization allows for direct
execution of instructions, barring privileged instructions, e.g. modifying the page table.
By contrast, for emulated systems, all instructions must pass through the hypervisor.
This greatly degrades performance since all instructions must be executed with a layer
of abstraction and the added layer of abstraction is software based.
2.5.4 Network Virtualization
Network virtualization refers to a host of techniques to create a virtual view of the
network. This is typically achieved using encapsulation protocols, like VXLAN and GRE.
Comparable to how hardware virtualization creates the view that the user has access to
their own hardware resources, network virtualization creates the view that the user has
complete control over the entire network and can create arbitrary network topologies.
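As an illustration of the encapsulation involved, the sketch below builds a VXLAN-style header (per the RFC 7348 layout: a flags byte, reserved bytes, and a 24-bit virtual network identifier) around an inner frame. The outer UDP/IP wrapping used in practice is omitted.

```python
import struct

# Minimal sketch of VXLAN-style encapsulation: an 8-byte header carrying a
# 24-bit virtual network identifier (VNI) is prepended to the original
# Ethernet frame. Real deployments wrap this further in UDP/IP.

def vxlan_encap(vni, inner_frame):
    # Flags byte 0x08 marks the VNI field as valid; the VNI occupies the
    # upper 24 bits of the second 32-bit word.
    header = struct.pack(">II", 0x08 << 24, vni << 8)
    return header + inner_frame

def vxlan_decap(packet):
    _flags, word2 = struct.unpack(">II", packet[:8])
    return word2 >> 8, packet[8:]

vni, frame = vxlan_decap(vxlan_encap(42, b"\xaa" * 14))
```

Because the inner frame travels opaquely inside the header, endpoints sharing a VNI see a flat layer-2 segment regardless of the underlying physical topology.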
2.6 Compute Environments
A compute environment is some interfaceable realization of a Turing machine; containers and VMs are the most popular examples [17]. This notion is intrinsically related to virtualization and VMs, since VMs are one type of compute environment. The following enumeration of the environments is sorted by decreasing resource overhead, flexibility, and isolation.
Figure 2.1: The relative weights of different characteristics of the various compute environments. Isolation refers to how isolated the environment is. Performance refers to how many system resources are used to perform actual work, as opposed to being used for virtualization, i.e. the opposite of overhead. Flexibility refers to how much flexibility the user has; here, BMs perform poorly since their hardware and kernel are fixed. Virtualization refers to the amount of virtualization being performed.
[Figure: a stack of compute environments, with the physical machine at the base, then virtual machine (virtual hardware), container (virtual OS), and serverless (virtual server), and the amount of virtualization increasing upward.]
Figure 2.2: Different types of resources sorted by increasing flexibility and isolation.
2.6.1 Bare Metal Machines
A bare metal (BM) machine is a physical server that is provisioned in its entirety. A
BM has the best performance and the lowest overhead, as BMs do not run any virtualization stack. Incidentally, this also makes them less flexible, since their hardware
(instruction set architecture) and typically OS are unchangeable. More importantly,
BMs lose the host of benefits associated with virtualization, chiefly the decoupling of the
hardware (and its inevitable failures) from the state of the machine.
2.6.2 Virtual Machine
Virtual machines (VM) are logical slices of system resources that run on top of physical
machines. VMs run a full operating system (OS) and have the same capabilities as
physical machines. The process of booting and running a virtual machine is as follows:
• a hypervisor (a special program that logically partitions the host to create VMs)
is run on the host machine either natively on the bare metal or as a process
• the hypervisor takes an OS image file and a resource allocation, e.g. 2GB RAM,
20GB storage, and 1 core
• if the hypervisor is run natively, it can use hardware-assisted virtualization and the
VM will have near native performance; otherwise the hardware must be emulated
and the performance will be significantly worse
Virtual machines require their own OS kernel and hardware components that the OS
needs to run. This allows VMs to provide high isolation and flexibility, at the cost of a
large overhead.
2.6.3 Containers
Containers are another compute environment. There are many container implementations, such as LXC, OpenVZ, and Docker. Unlike VMs, which require emulation of hardware components and their own OS, containers share the OS kernel with the host and other sibling containers. This reduces the resource overhead and provisioning time associated with containers. The OS creates a container by logically partitioning system resources and providing an overlay filesystem.
2.6.4 Lambda
Lambda (also referred to as serverless computation) extends the goals of containers [24].
Whereas containers only share the OS, the serverless model goes as far as to share
language runtimes. The user specifies a callback function that gets invoked when an
event occurs. Compared to containers, this model has an even smaller resource footprint
and response time. However, this comes at the cost of flexibility, since the user must choose from a limited set of preconfigured environments.
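The callback model can be sketched as a toy event platform; the event names and registration API are hypothetical, not those of any particular provider.

```python
# Toy model of the serverless pattern: the user registers a callback, and
# the platform invokes it when a matching event arrives.

class ServerlessPlatform:
    def __init__(self):
        self.handlers = {}

    def register(self, event_type, callback):
        """Associate a user-supplied callback with an event type."""
        self.handlers.setdefault(event_type, []).append(callback)

    def emit(self, event_type, payload):
        """Invoke every callback registered for this event type."""
        return [cb(payload) for cb in self.handlers.get(event_type, [])]

platform = ServerlessPlatform()
platform.register("object_uploaded", lambda p: f"thumbnail({p})")
results = platform.emit("object_uploaded", "cat.png")
```

Because the platform owns the runtime and only dispatches callbacks, it can pool language runtimes across users, which is the source of the small footprint noted above.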
2.6.5 Discussion
The different platforms have different tradeoffs with regards to resource overhead, flexibility, and isolation. With regards to Figure 2.2, a resource type should sensibly only be nested in something below it. Therefore, the kind of substrate resource type available determines what kind of resource can be provisioned. For instance, if the resource management systems can only provision VMs, then there is no way to provision a bare metal server; however, they can provision containers on top of these VMs.
2.7 Software-defined Networking
Software-defined networking (SDN) is an alternative model for computer networking. The
defining characteristic of SDN is the separation of high level control logic (control plane)
from the low level forwarding actions (data plane). Unlike traditional networking, where
routing is distributed, SDN exposes a centralized view of the network to the control plane.
This has a host of benefits, including agile network (re)configuration, faster convergence, and improved debugging. There are many different flavors of SDN, with OpenFlow [31] being the most popular.
2.7.1 OpenFlow
OpenFlow is one realization of the SDN paradigm. OpenFlow is a protocol for communi-
cating between the control and data plane. The control plane (also called the controller)
is a logically centralized entity that determines the routing of the packets [31]. Currently,
the most popular controllers are Ryu [36], Floodlight, OpenDaylight [32], and ONOS [15].
The data plane consists of hardware and software, OpenFlow-compliant switches. For a
switch to be OpenFlow compliant it must be able to support the following:
• forwarding a matching packet through a specified port
• encapsulating and sending a packet to the controller
• dropping a packet
• communicating using the OpenFlow protocol
A typical interaction between the controller and a switch is as follows:
1. A switch receives a packet that does not match a flow.
2. The switch sends the packet to the controller.
3. The controller determines how to route the packet.
4. The controller installs a flow, i.e. a match and a corresponding action, on the
switch.
5. Subsequent matching packets are forwarded by the switch based on the installed flow.
Matching headers can be defined for arbitrary header values. This includes any field in an Ethernet frame or in the header of any encapsulated protocol. The policy with regards to flow installation can be reactive or proactive, which determines whether rules are installed after or before seeing a matching packet, respectively.
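The reactive interaction enumerated above can be sketched as a toy model of a switch and controller; the match is simplified to the destination address, and the class and method names are illustrative rather than those of any real controller framework.

```python
# Toy simulation of reactive OpenFlow behavior: a table miss sends the
# packet to the controller, which installs a flow; later matching packets
# are handled by the switch alone.

class Controller:
    def __init__(self, routes):
        self.routes = routes  # destination address -> output port

    def packet_in(self, switch, pkt):
        """Handle a table miss: compute the route and install a flow."""
        port = self.routes[pkt["dst"]]
        switch.install_flow(match=pkt["dst"], action=port)
        return port

class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}
        self.misses = 0

    def install_flow(self, match, action):
        self.flow_table[match] = action

    def receive(self, pkt):
        if pkt["dst"] in self.flow_table:      # fast path: match in table
            return self.flow_table[pkt["dst"]]
        self.misses += 1                       # table miss: ask controller
        return self.controller.packet_in(self, pkt)

sw = Switch(Controller({"h2": 3}))
first = sw.receive({"dst": "h2"})   # miss: controller installs the flow
second = sw.receive({"dst": "h2"})  # hit: forwarded from the flow table
```

A proactive policy would simply call `install_flow` for every route up front, so that no packet ever incurs the round trip to the controller.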
2.7.2 Network Function Virtualization
Network function virtualization (NFV) is an emergent area where network functions
such as load balancers and firewalls, which typically existed as hardware appliances, are
being replaced with virtual analogues. The goals of NFV are more flexible provisioning
and reduced costs. NFV critically requires server virtualization and cloud computing to
elastically provision and deprovision resources. Additionally, it requires SDN to chain
arbitrary virtual network functions to form new services.
2.8 Resource Management
Resource management refers to management of exhaustible resources. The term resource
is ambiguous on account of its varied usage. Specifically, resource can refer to:
1. system resources, e.g. compute in terms of number of cores (or perhaps even more
granularly, FLOPS or IPS), storage in terms of bytes, and network in terms of
bandwidth
2. standalone processing units, or cluster of processing units that:
• can be provisioned
• encapsulate some amount of system resources
An example would be a physical server that contains 4 cores, 32 GB of memory, 1
TB of hard disk storage, and access to 25 Mbps of uplink and downlink network
bandwidth. These can be classified as physical or virtual. In this view, servers,
FPGAs, GPUs, microcontrollers, switches, routers, and links are considered phys-
ical resources. Virtual machines, containers, and software switches are considered
virtual resources.
By extension, resource management in the former sense refers to managing system
resources as done by an operating system. In the latter sense, resource management
refers to management of processing units, e.g. a hypervisor managing virtual machines,
Figure 2.3: The two different views of the terms resource and resource management. Type 1 refers to system resources and management thereof. Type 2 refers to processing units and clusters of processing units and their management.
or a meta entity that manages multiple hypervisors. See Figure 2.3 for a visualization of this. In the context of this document, resource typically refers to the latter notion; where the usage is ambiguous, the intended sense will be specified.
A resource management system (RMS) embodies a specific resource management approach. Since resources are finite, there is contention amongst users. The goal of a resource management system is to provide an interface through which users can provision and deprovision resources, and to arbitrate contending requests. In the context of cloud computing, an RMS exposes an API that allows individual resources to be provisioned.
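A minimal sketch of such an interface, with illustrative names and units, might look as follows.

```python
# Sketch of the RMS role: an interface for provisioning and deprovisioning
# resources that also arbitrates contention over a finite pool.

class ResourceManagementSystem:
    def __init__(self, total_cores):
        self.free_cores = total_cores
        self.allocations = {}  # resource id -> cores held
        self.next_id = 0

    def provision(self, cores):
        if cores > self.free_cores:  # arbitrate: reject on contention
            raise RuntimeError("insufficient capacity")
        self.free_cores -= cores
        self.next_id += 1
        self.allocations[self.next_id] = cores
        return self.next_id

    def deprovision(self, resource_id):
        self.free_cores += self.allocations.pop(resource_id)

rms = ResourceManagementSystem(total_cores=8)
vm = rms.provision(6)
small = rms.provision(2)  # pool now fully used
rms.deprovision(vm)       # releasing frees capacity for new requests
```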
2.8.1 OpenStack
OpenStack [37] is a cloud platform supporting IaaS. As a complete IaaS solution, OpenStack provides all of the necessary resource management components, e.g. nova for managing virtual machines, neutron for managing networks, and swift and cinder for storage. The compute management includes support for hypervisors like KVM [27], Xen [13], and QEMU [14]. It also has an image registry, an IAM system, and an orchestration engine. Additionally, the system is in a continuous state of development, with new features like Docker container support. Other RMSes include CloudStack and Eucalyptus.
2.9 Software-defined Infrastructure
Software-defined infrastructure (SDI) is a resource management architecture proposed by Kang et al. [26] that converges the management of compute, network, and other heterogeneous resource types and maintains a global topological view of all resources, virtual and physical. Additionally, SDI attempts to virtualize all resource types. SDI extends the notions of SDN in two ways:
• decouples the control and actuator logic
• exposes a centralized view of all nodes
SDI is a hierarchical resource management system whereby different resources are controlled by different resource controllers. Each of these resource controllers then interfaces with a centralized manager and a monitoring and analytics manager. For instance, the SDI resource management system (RMS) natively uses SDN to manage its networks. This allows for the following advanced use cases:
• traffic steering
• service chaining
See Figure 2.4 for a conceptual view of the SDI RMS.
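The hierarchical arrangement can be sketched as follows; the controller classes and resource names are illustrative, not part of the actual SDI implementation.

```python
# Sketch of the hierarchical SDI layout: per-type resource controllers
# register with a central SDI manager, which keeps the global view.

class SDIManager:
    def __init__(self):
        self.controllers = {}
        self.topology = []  # global view of all provisioned resources

    def register(self, resource_type, controller):
        self.controllers[resource_type] = controller

    def provision(self, resource_type, spec):
        """Delegate to the right controller and record the result."""
        resource = self.controllers[resource_type].provision(spec)
        self.topology.append((resource_type, resource))
        return resource

class ComputeController:
    def provision(self, spec):
        return f"vm:{spec}"

class NetworkController:
    def provision(self, spec):
        return f"flow:{spec}"

sdi = SDIManager()
sdi.register("compute", ComputeController())
sdi.register("network", NetworkController())
sdi.provision("compute", "small")
sdi.provision("network", "h1->h2")
```

The key property is that every provisioning action, regardless of resource type, flows through the manager, which is what makes the centralized topological view possible.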
2.10 Smart Application on Virtual Infrastructure
Smart Application on Virtual Infrastructure (SAVI) is an initiative for building a large
scale testbed for designing and testing future application platforms. The software-defined
infrastructure RMS was researched and developed within the context of the SAVI project.
The SDI RMS is natively available only on the SAVI testbed. The SAVI testbed critically
leverages the OpenStack [6] and OpenFlow initiatives to manage compute and network
resources uniformly.
2.11 Orchestration
Orchestration refers to the provisioning, modification, and deprovisioning of compute, network, and storage resources (or complex resource types composed of these primitive resource types, e.g. a virtual machine) to form a coordinated service. In general, orchestration includes both 1) the mechanism for provisioning and deprovisioning resources, and 2) the decision-making logic that determines when to provision and deprovision resources in order to satisfy some high level objective, such as policy requirements or
[Figure: the SDI manager, topology manager, and monitoring-and-analytics components sit above per-resource controllers (Resource A through Resource N), which manage the underlying physical and virtual resources; external entities interact through open interfaces.]
Figure 2.4: A conceptual view of the SDI RMS.
[Figure: a SAVI node implements the SDI design, with OpenStack managing the cloud (virtual compute resources, e.g. VM instances, over physical compute/storage servers) and an OpenFlow controller managing the networks (virtual network resources over physical switches, via the OpenFlow protocol); monitoring data is collected across both, and open interfaces are exposed above the SDI manager, topology manager, and monitoring and analytics components.]
Figure 2.5: A conceptual view of a SAVI node modelled after the SDI RMS.
constrained optimization over some metrics. Viewing orchestration as an optimization
problem requires that the domain be modelled, which may be non-trivial and difficult to generalize, e.g. when trying to optimize for data redundancy. In related literature, systems that perform 1), 2), or both all come under the purview of orchestration. In this work,
orchestration refers to the former, i.e. the mechanism for provisioning and deprovisioning
resources.
The coordination of separate resources poses challenges due to the heterogeneity of
cloud resources. An orchestration system interfaces with one or many RMSes. An RMS
typically exposes an interface to provision and deprovision resources. Orchestration sys-
tems take high level requests from users, typically in the form of a topology specification
file. They then use the interface exposed by the RMS to provision resources and compose
them together. In some cases, an orchestration system may have to perform additional
steps to satisfy the user request.
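This flow, from a high-level topology specification to provisioned and composed resources, can be sketched as follows; the specification format and the RMS calls are hypothetical.

```python
# Sketch of orchestration: walk a topology specification and provision each
# resource through the RMS interface, then compose them via links.

topology_spec = {
    "nodes": [
        {"name": "web", "type": "vm"},
        {"name": "db", "type": "container"},
    ],
    "links": [("web", "db")],
}

def orchestrate(spec, rms):
    provisioned = {n["name"]: rms.provision(n["type"]) for n in spec["nodes"]}
    for src, dst in spec["links"]:
        rms.connect(provisioned[src], provisioned[dst])
    return provisioned

class FakeRMS:
    """Stand-in for a real RMS API; records the calls it receives."""
    def __init__(self):
        self.calls = []

    def provision(self, rtype):
        self.calls.append(("provision", rtype))
        return f"id-{rtype}"

    def connect(self, a, b):
        self.calls.append(("connect", a, b))

rms = FakeRMS()
nodes = orchestrate(topology_spec, rms)
```

Note that the orchestrator holds no resource logic of its own: it only sequences calls against whatever interface the RMS exposes, which is precisely why heterogeneous RMS interfaces make multi-cloud orchestration difficult.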
Chapter 3
Related Work
In this chapter we present the related work, grouped into orchestration systems, configuration systems, and overlay SDN over legacy (non-SDN) approaches. The related work reflects the two objectives of this thesis, namely, to enable SDN capabilities on legacy systems and to orchestrate over multiple clouds.
3.1 Single Cloud Orchestration
The following works consider orchestration over a single cloud.
3.1.1 Secondnet, VDC Planner
Guo et al. [22] consider the orchestration of server and network resources while minimizing cost. They propose a virtual data center (VDC) as a joint abstraction of network (bandwidth) and server resources. The objective is to find an embedding of this VDC over the physical substrate. The VDC embedding problem is NP-hard (it is a generalization of the bin packing problem). The authors propose a greedy heuristic to achieve near-optimal allocations.
The VDC Planner is an orchestration system proposed by Zhani et al. [47]. Like
Secondnet the work focuses on the VDC embedding problem. However, it additionally
considers the problem of minimizing energy usage while utilizing VM migrations to satisfy
fluctuating user demands. The shortcoming of these works is that they model clouds as well-defined objects with standardized interfaces and capabilities. Our work is
explicitly positioned to design a system that can work with the actual, highly nuanced
cloud landscape. Additionally, these works consider the optimality of resource allocation
from the perspective of cloud providers. This thesis looks to design a system from the
user’s perspective. This includes providing capabilities, such as SDN, in an overlay
manner, when native support does not exist. Finally, these solutions only consider the
allocation of VMs; whereas we look to design a system that considers both VMs and
containers.
3.1.2 Borg
Borg is a large scale cluster management system [39] that is used internally by Google.
Borg is concerned with the scheduling of jobs and orchestration of containers. As such,
it’s design considers job failures, worker node failure, master node failure, the allocation
of jobs to optimize certain variables, and different priorities amongst jobs. The system
is designed to support a wide variety of objectives, e.g. minimizing running time and
minimizing total cost. Borg is very effective for the problem of container orchestration.
However, the scope of Borg is limited to container orchestration and also to single autho-
rization domains. As such, it cannot be considered an alternative orchestration system
since it excludes a large number of tasks that we want to consider under the purview of
our orchestration system.
3.1.3 AWS: CloudFormation
The AWS orchestration system, CloudFormation, is very comprehensive and covers all the services provided by AWS [1]. However, AWS doesn't expose an SDN interface, which prevents users from performing advanced traffic steering. Since their API must be comprehensive, some operations end up being complex to specify. Additionally, from the perspective of API design, some parts of the API are inconsistent. This can make it difficult for users to perform certain tasks. Also, CloudFormation reflects the view that each topology file specifies an isolated collection of resources. On account of this, it becomes difficult to interface with existing components, e.g. existing VMs.
3.1.4 OpenStack: Heat
Heat is the orchestration engine that is developed as part of OpenStack [6]. Heat is
largely modelled after CloudFormation, and therefore it shares many of its features.
For OpenStack, Heat provides an easy to write, maintain, and read modelling language.
However, like CloudFormation, Heat does not provide any of the advanced SDI capabilities. Furthermore, Heat also does not support interfacing with existing components.
3.2 Multi Cloud Orchestration Tools
The following works consider orchestration over multi clouds.
3.2.1 Cloud Resource Orchestration: A Data-Centric Approach,
Declarative Automated Cloud Resource Orchestration
Liu et al. have proposed a two-part orchestration system [29], [28]. The first aspect
of the work is to model cloud resources as structured data. This allows problems from
the domain of cloud resource orchestration to be mapped to problems in the domain of
database management. Subsequently, cloud resources can be queried by a declarative
language, and updated with well-understood transactional semantics (e.g. atomically
perform a set of operations). The second part of the work is concerned with formulating
orchestration tasks as constrained optimization problems. These works exclusively focus
on a model cloud, while only considering server and network resources. By comparison,
this thesis is concerned with enabling multidimensional orchestration, while delegating
the decision making of resource allocation to the user.
3.2.2 Networked Cloud Orchestration: A GENI Perspective
This work by Baldine et al. considers a unified control layer over a heterogeneous and distributed landscape [12]. The authors document their progress in building a unified control layer that allows users to provision virtual infrastructure slices consisting of
compute, storage, and network resources, over distributed infrastructures and enables au-
tonomic management thereof. The work proposes a quasi-declarative modelling language
that allows users to express resource objects, their properties, and how different objects
are related. The work also considers the stitching problem, whereby their proposed orchestration system must stitch "different pieces of virtualized resources from geographically distributed compute, network, and storage substrate into a single connected configuration" [12]. The system works by dividing orchestration tasks into subtasks based on
where the resources would be provisioned and subsequently determines each individual
resource allocation.
3.2.3 Greenhead
GreenHead is a conceptual extension of VDC Planner that applies heuristics to perform
VDC embedding across a distributed infrastructure [11]. As in the case of Secondnet and
VDC Planner, Greenhead targets abstract datacenters with standardized capabilities.
Although their evaluation shows impressive results, it is ultimately based on simulations of the requests and of the datacenters. As such, it does not consider the
design for interfacing with multiple heterogeneous clouds. Furthermore, Greenhead only
targets VMs.
3.2.4 CloudMF
Cloud Modelling Framework (CloudMF) is an approach for multi cloud orchestration
proposed by Ferry et al. [19]. CloudMF leverages model driven engineering (MDE)
to create individual models of the different clouds. It then uses these models along
with a modelling language to reduce the complexity of expressing various topologies
across multiple clouds. This approach is effective for orchestration across multiple clouds. However, it only supports provisioning of VMs, and as such CloudMF is unsuitable for network resource orchestration and traffic steering.
3.2.5 Multi-Cloud Brokering
Lucas-Simaro et al. [30] propose a mechanism that allows application deployments across
multiple clouds. This approach works by brokering user requests and cloud capabilities.
As a brokering system, this system is limited to what is provided by the substrate cloud
layer. This restricts extensibility to new resource types and private resources.
3.2.6 Terraform
Terraform [8] is a project that allows users to specify infrastructures that span multiple providers. These providers cover most IaaS, PaaS, and SaaS providers. However, this requires knowledge of many parameters, such as image identifiers, before realizing the template. This presents usability challenges, since determining the exact values of parameters like image identifiers can be tedious.
3.2.7 The Topology and Orchestration Specification (TOSCA)
The Topology and Orchestration Specification (TOSCA) [9] is declarative language for
describing applications and services that span multiple administrative domains. There
exist parsers and services that translate services written in TOSCA to topologies that
span multiple clouds (such as Cloudify [3]). Its cross cloud capabilities are similar to Ter-
raform; however, the language itself is provider-agnostic. As a specification for existing
Chapter 3. Related Work 28
and future clouds, it has limited support for extended SDI features, such as a unified view
of heterogeneous resource types.
3.3 SDN over Legacy Systems
Here we describe the various approaches for achieving SDN over non-SDN systems. These
approaches attempt to provide SDN like capabilities in an overlay manner.
3.3.1 Fibbing
Fibbing is an approach for providing centralized routing capabilities [41], [40]. Fibbing is
different from other SDN approaches, chiefly OpenFlow, in that it does not propose a way
to explicitly create a centralized management plane. Instead, Fibbing works with the existing
distributed routing protocol, Open Shortest Path First (OSPF). This approach presents
the user with a centralized view of the topology. In order to make routing changes, Fibbing
introduces fake nodes and links. These fake nodes and links are used to maneuver
routers into installing arbitrary forwarding rules. This is similar to SDN, where arbitrary
rules are installed; however, an SDN controller explicitly pushes rules to switches. Specifically,
an agent that knows the global topology sends fake OSPF packets (corresponding to
these fake nodes and links). This approach combines the benefits of centralized routing,
namely ease of use, with the benefits of distributed routing, namely robustness and fault
tolerance.
3.3.2 Ravello Systems
Ravello Systems [10] is a company that created the HVX hypervisor for orchestration over
multiple public clouds. HVX is offered as a SaaS, and the technology is closed source.
The following has been inferred from some high level descriptions of their product.
HVX is a distributed hypervisor that uses a trap-and-emulate-like approach to perform
dynamic translation of guest OS instructions. They also use a similar approach
to process network packets. Since all packets are passed through a logically centralized
control plane, they can perform arbitrary switching of packets. Additionally, they
use some encapsulation protocol (perhaps VXLAN) to allow the user to create arbitrary
network topologies. Specifically, unlike the native offerings of cloud providers (which only
enable network-layer, i.e. L3, connectivity), this allows topologies at layer 2 (L2), i.e. the
data link layer. Yet, these network topologies are based on the traditional networking stack,
and they do not expose a mechanism for traffic steering or dynamic service chaining.
3.3.3 OpenContrail
OpenContrail [5] is a network virtualization approach by Juniper Networks. OpenContrail
supports the creation of arbitrary virtual L2 networks over physical networks. Furthermore,
it also supports NFV service chaining via a domain-specific modelling language.
However, as a network virtualization platform, this only tackles half of the problem.
Specifically, there is no support for orchestration of compute nodes.
3.4 Configuration/Orchestration Tools
3.4.1 Salt Cloud, Ansible
Ansible [2] and SaltStack [7] are similar projects that primarily focus on the configuration of
nodes. Both provide a high-level language to specify configurations to apply. Indeed,
Vino uses Ansible to create the cloud and deploy applications. By themselves, however,
these tools lack features to statefully manage resources.
Chapter 4
Design of Multidimensional
Orchestrator
4.1 Overview
The goal of this thesis is to design and implement a system that extends SDN capabilities
to heterogeneous infrastructures and enables orchestration of complex objects. In section
1.2 we considered a high level set of objectives for this system. In this chapter, we will
design a system that realizes these objectives. In an attempt to produce a design that
is uninfluenced by the implementation considerations, we have separated the design and
implementation and placed them in separate chapters. This chapter documents the
iterative approach to designing the orchestration system called the virtual infrastructure
orchestrator, or simply Vino. This chapter answers the following questions.
1. What is orchestration? What dependencies does the orchestrator have on the underlying RMS? How should the orchestration task be modelled?
2. How can we interface with and orchestrate over a single legacy cloud?
3. How can we interface with and orchestrate over multiple legacy clouds?
4. How can we interface with and orchestrate over an unmanaged infrastructure?
5. How can we provision containers?
6. How can we perform arbitrary configuration of nodes?
We distinguish between two types of infrastructures, managed and unmanaged. A
managed infrastructure is a collection of resources that are controlled by an RMS and can
Figure 4.1: An example of how different components, e.g. the compute manager and monitoring manager, must be coordinated to realize a complex application, e.g. an autoscaling web server deployment.
be provisioned by users. As such, all public clouds are managed. By contrast, a set of
standalone physical servers are unmanaged. We describe an infrastructure as legacy if it
only supports distributed IP networking.
Across the literature on cloud computing, orchestration and resource management are
used in varying ways. For the purposes of our work these terms are defined in Chapter
2. Additionally, the term substrate refers to the combination of the resources (i.e. the
objects of resource management) and the RMS (if one exists).
4.2 Orchestration
4.2.1 Overview
Orchestration refers to provisioning resources and connecting them to realize more com-
plex objects. Typically, this requires interfacing with multiple components and program-
ming them in a coordinated way. For instance, an auto scaling web server deployment
would require interfacing with the compute manager to provision new virtual machines
and with the monitoring system to detect when machines should be provisioned or de-
provisioned. See Figure 4.1 for an example.
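The coordination in this example can be sketched as follows. The thresholds, class names, and method signatures below are purely illustrative and do not correspond to any particular RMS API.

```python
# A toy sketch of the coordination in Figure 4.1: monitoring readings
# drive provision/deprovision decisions on a compute manager.
# Thresholds and interfaces are illustrative only.
UPPER_THRESHOLD, LOWER_THRESHOLD = 0.8, 0.2

class ComputeManager:
    def __init__(self, servers=1):
        self.servers = servers

    def provision(self):
        self.servers += 1

    def deprovision(self):
        if self.servers > 1:  # always keep at least one server
            self.servers -= 1

def autoscale(compute, avg_load):
    """Called by the monitoring manager on each load reading."""
    if avg_load > UPPER_THRESHOLD:
        compute.provision()
    elif avg_load < LOWER_THRESHOLD:
        compute.deprovision()

compute = ComputeManager()
autoscale(compute, 0.95)   # upper threshold crossed: scale out
assert compute.servers == 2
autoscale(compute, 0.05)   # lower threshold crossed: scale in
assert compute.servers == 1
```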
4.2.2 Requirements on the Substrate
The key requirement for an orchestration system is that it has access to infrastructure
resources, such as compute, network, storage, and monitoring data. Orchestration in
this sense can be viewed as the stitching together of resources exposed to the orchestrator.
For instance, reconsider the example of an auto scaling web server. To orchestrate this
deployment, we must have access to the monitoring data, without which this arrangement
cannot be achieved. Therefore, in order to orchestrate something, the corresponding
capabilities must be provided either:
• natively by the underlying RMS
• as an overlay service
Thus, the creation of the overlay service is distinct from the orchestration requiring
that service; however, an implementation may combine these phases. In the following
section, we will consider the requirements of the underlying RMS, and how deficient
RMSes can be supplemented to enable the desired orchestration.
4.2.3 Modelling the Application
An orchestration system requires a modelling language (textual or visual) that allows the
users to express their application. Like the design of an API, the design of a modelling
language greatly determines whether it facilitates or hinders the workflow. The following
considerations will help us create a more effective modelling language.
Language Type
Visual languages provide a more intuitive interface by facilitating the discoverability
of capabilities. For instance, in a visual interface, we can have a button called Create a
Virtual Machine, which when clicked, creates a circular element representing a VM. Thus,
visual languages can improve discoverability of features and provide direct feedback on
user actions [18]. Another benefit of a visual representation is that the entire topology
can be understood in a single glimpse. However, both of these benefits only apply if the
topology is small or simple. If a modelling language supports a large number of features,
then the resulting interface may become overly complex, e.g. if there were a button
for each capability of the system. The same applies for the size of the topology, since
a large topology would not fit on a single screen, and may become incomprehensible.
Furthermore, most existing modelling platforms primarily use a textual language. So
from a user’s perspective, a visual language would require learning a new type of interface.
One benefit of a textual representation is that it can leverage the mature design of
other textual modelling languages like TOSCA and the Heat template language. Unlike
visual representations, text files are trivial to serialize. This allows users to perform diff,
search and replace, etc. Users can modify and reuse template files. With the exception
of instant feedback and discoverability, textual representations are superior to visual
representations. For this reason, a textual language is chosen.
Model
This section considers how an application topology is modelled and represented. A
modelling language implies both the language syntax and semantics. We have the option
of choosing an existing language or creating a custom language. For a custom language,
we can choose an arbitrary mapping from the syntax to the semantics, since we would be
writing the parser. However, for an existing language (i.e. when using an existing parser),
we are bound by the semantics of the language. For instance, consider the code fragment
[1,2,3,4]. If we write a custom parser, we can interpret this as an associative array,
whereby pairs of elements correspond to key-value pairs. This is valid JSON 1, and a
JSON parser interprets this as a list with the elements 1, 2, 3, and 4. As
in the case of a new language, we can define a custom mapping from the syntax to the
semantics, e.g. interpret this as an associative array by defining pairs of elements as
key-value pairs. However, this would require additional logic and extra time and space
overhead due to an extra phase of parsing, i.e. parsing the output of the JSON
parser. Furthermore, although we can define key-value pairs as above, {1:2, 3:4}
better maps to the user's intuition of how key-value pairs should be represented. The point
here is that there are many ways to model an application; however, some representations
are better with regards to resource usage and user experience.
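This discussion can be made concrete with a short sketch. The helper function below is hypothetical; note also that, unlike the YAML-style notation used in the text, JSON requires object keys to be strings.

```python
import json

# A standard JSON parser interprets "[1,2,3,4]" as a list.
doc = json.loads("[1,2,3,4]")
assert doc == [1, 2, 3, 4]

# A custom second parsing phase could reinterpret the list as an
# associative array by pairing adjacent elements -- the extra phase
# the text refers to, with its attendant time and space overhead.
def pairs_to_mapping(seq):
    return {seq[i]: seq[i + 1] for i in range(0, len(seq), 2)}

assert pairs_to_mapping(doc) == {1: 2, 3: 4}

# The native mapping syntax needs no second phase (JSON keys must
# be strings, unlike the {1:2, 3:4} notation in the text).
assert json.loads('{"1": 2, "3": 4}') == {"1": 2, "3": 4}
```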
In general, there is no restriction on either the nodes or the connection between them.
Therefore, we choose to model the application as a graph. An application can be fully
described as the 2-tuple of the list of nodes and the list of connections between the nodes
(this includes each node’s properties and each connection’s properties). To represent
these in a succinct manner, we use the following conventions. We distinguish between
scalars, i.e. number and strings, and containers, i.e. lists and associative arrays (also
called objects). Containers can hold scalars or other containers. A list is represented as
follows:
- 1
- 2
- 3

1 JSON is a data serialization language based on the JavaScript language's syntax for representing various data types. This will be discussed in more detail in the next chapter.
An associative array is represented as follows:
name: foo
type: bar
Based on this, we get the following representation of a graph of nodes.
-
- nodeA
- nodeB
-
- nodeB
- nodeC
The application to be modelled cannot wholly be represented as above. Specifically,
nodes and links have additional properties that must be specified. We can achieve this
by creating a separate list of node objects that contain each node’s properties.
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
- nodeA
- nodeB
-
- nodeB
- nodeC
However, links may also have properties. Specifically, a property on the link from
nodeA to nodeB may not be the same as that from nodeB to nodeC. Thus, we can
represent this as follows:
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
src: nodeA
destinations:
-
endpoint2: nodeB
-
src: nodeB
destinations:
-
endpoint2: nodeC
However, this becomes somewhat unreadable. An alternative would be:
nodes:
-
name: nodeA
-
name: nodeB
-
name: nodeC
edges:
-
endpoint1: nodeA
endpoint2: nodeB
-
endpoint1: nodeB
endpoint2: nodeC
At the cost of some redundancy, this allows for a more intuitive representation.
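To illustrate how an orchestrator might consume this representation, the sketch below builds the equivalent structure and checks that every edge endpoint names a declared node. The validation logic is illustrative, not part of Vino's actual parser.

```python
# The topology literal mirrors the final YAML representation above.
topology = {
    "nodes": [{"name": "nodeA"}, {"name": "nodeB"}, {"name": "nodeC"}],
    "edges": [
        {"endpoint1": "nodeA", "endpoint2": "nodeB"},
        {"endpoint1": "nodeB", "endpoint2": "nodeC"},
    ],
}

def validate(topo):
    """Check that every edge endpoint references a declared node."""
    names = {n["name"] for n in topo["nodes"]}
    for edge in topo["edges"]:
        for key in ("endpoint1", "endpoint2"):
            if edge[key] not in names:
                raise ValueError(f"edge references unknown node {edge[key]!r}")
    return True

assert validate(topology)
```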
4.3 Resource Management Overview
A resource management scheme must consider how different resources should be managed,
i.e. the interface exposed by each resource controller, and how resources should be
provisioned, i.e. are resources provisioned on a managed middleware layer or directly
on a substrate. We will first discuss these considerations and then iteratively design a
system that satisfies our requirements.
4.4 Resource Provisioning Model
The resource provisioning model refers to how resources are provisioned and deprovi-
sioned on heterogeneous infrastructures. Resource provisioning, especially provisioning
VMs, is a key consideration, because it affects the design and capabilities of the orchestrator.
Since we are taking an iterative approach to designing the requisite system, for
the initial iterations we will only consider public clouds. Furthermore, no major public
cloud provider exposes an SDN interface. Therefore, in our discussion, public cloud is
synonymous with legacy cloud.
We are designing a system that extends the native capabilities of legacy clouds. This
can be achieved by either natively supplementing the existing RMS or by providing
the capabilities in an overlay manner. Our design must not assume that we have any
privileged access to the cloud, or its API; therefore, our only option is to provide the
functionality in an overlay manner. In general, our system should be designed from the
perspective of a third-party module. In this view, the choice of resource provisioning
model also affects the thickness (i.e. the amount of logic contained in) of the corresponding
resource provisioning middleware between the user and the cloud (see Figure 4.2).
There are three main models for provisioning resources, as explained below.
4.4.1 Native Provisioning
The simplest approach is to use the native API exposed by the cloud provider (see
Figure 4.3). This has the benefit of working with a stable and mature API that is
directly provided by the cloud provider. As new capabilities are added to the cloud, the
API is updated to reflect this. This, however, leads to a strong coupling between the
orchestration system and the specific cloud. This is suitable if the cloud provides all the
requisite functionality, and the user only wants to interface with one cloud.
Figure 4.2: A conceptual view of how the resource provisioning middleware interfaces with the user and the cloud.
Figure 4.3: A conceptual view of the native provisioning model.
Figure 4.4: A conceptual view of the delegated provisioning model.
4.4.2 Delegated Provisioning
A second approach is to have a thin translation layer that takes user requests and maps
them to requests comprehensible by the resource management system, and vice versa for
responses by the cloud (see Figure 4.4). The middleware delegates most of the subtasks
associated with provisioning to the cloud RMS. However, this model keeps some state
information. For instance, this model would track the identifier for each VM that was
created. This system also achieves uniformity across different resource pools by abstracting
away differences in APIs. However, compared to the native provisioning approach, this
requires us to write drivers for each cloud we interface with.
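A minimal sketch of the delegated model is shown below, assuming a hypothetical driver interface: each per-cloud driver translates a uniform request into the provider's API, while the middleware keeps only minimal state (the returned VM identifiers).

```python
from abc import ABC, abstractmethod

class CloudDriver(ABC):
    """Hypothetical thin driver; one subclass per cloud provider."""
    @abstractmethod
    def provision_vm(self, name: str, image: str, flavor: str) -> str:
        """Translate the request to the provider API; return the VM id."""

class FakeDriver(CloudDriver):
    """Stand-in for a real per-cloud driver, for illustration."""
    def __init__(self):
        self._count = 0

    def provision_vm(self, name, image, flavor):
        self._count += 1
        return f"vm-{self._count}"

class Middleware:
    def __init__(self, driver: CloudDriver):
        self.driver = driver
        self.vm_ids = {}  # the only state kept: name -> provider id

    def provision(self, name, image="ubuntu", flavor="small"):
        self.vm_ids[name] = self.driver.provision_vm(name, image, flavor)
        return self.vm_ids[name]

mw = Middleware(FakeDriver())
mw.provision("web1")
assert mw.vm_ids == {"web1": "vm-1"}
```

Supporting a new cloud then only requires a new `CloudDriver` subclass; the middleware itself is unchanged.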
4.4.3 Fully-managed Provisioning
This approach involves creating a fully managed middleware, typically, in the form of a
cloud management platform like OpenStack running on top of the provisioned resources
(see Figure 4.5). If we interface with multiple substrates, e.g. multiple public clouds,
then this provides a uniform interface without having to write any drivers for the clouds
(e.g. as in the delegated approach). This has the benefit of providing a very powerful
management system with a rich feature set, e.g. to manage VMs, images, storage blobs.
However, if this is applied to public clouds, where we generally cannot provision BMs
(i.e. only VMs or containers), this effectively shifts the virtualization stack up. That is
because OpenStack is primarily intended to be run on bare metal. Therefore, when a
user requests to provision a VM, it will boot the VM on the VM that constitutes the
middleware. See Figure 4.6 for a visualization of this phenomenon.
This greatly degrades performance, since user-requested VMs must be emulated (cf.
virtualization). These shortcomings can be overcome if the OS and hardware support
Figure 4.5: A conceptual view of the full middleware provisioning model.
Figure 4.6: The upshifting of the virtualization stack when OpenStack is deployed on virtual machines.
nested virtualization [20]. Alternatively, this can be overcome with binary translation,
whereby guest OS instructions are translated on the fly by the hypervisor. However, these
technologies are not mature and not always supported. This approach is best suited when
working with bare metal. However, this may not always be possible, especially in the
context of public clouds.
4.4.4 Discussion
The fully-managed approach allows for uniformity across all resource pools. However,
this approach is infeasible for allocating VMs on public clouds. The delegated approach
provides a thin translation layer between the user and public clouds. If the user's infrastructure
requests span private and public clouds, it is best to use a hybrid resource
management approach, whereby resources on private clouds are provisioned using the
fully-managed approach and resources on the public clouds are managed using the
delegated approach.
4.5 Organization of Resource Controllers
An RMS typically consists of multiple resource controllers. The interface exposed by the
resource controllers, and how the different controllers are organized, is an open design
consideration. Here we will consider how OpenStack and SDI organize the control and
management planes.
4.5.1 OpenStack RMS
The OpenStack RMS consists of heterogeneous resources, such as compute, network, and
storage, and corresponding resource controllers (see Figure 4.7). OpenStack employs a
controller-agent model to manage resources. Specifically, when OpenStack is deployed on
a set of physical servers, one server is assigned the role of controller, while the others are
assigned the role of agent. The controller server runs the various controllers (processes).
Controllers expose functionalities in the form of APIs and delegate requests to the corre-
sponding agent. For instance, the compute controller exposes the API to provision VMs.
Requests to provision VMs are received by the controller and delegated to a compute
agent.
Whereas the controllers provide the management interface, the agents provide the
resource itself. A resource agent consists of the raw resource and a physically collo-
cated management process. For instance, referring back to the example of provisioning
VMs, the agent would run a hypervisor, and the management process would translate
the controller's requests into requests comprehensible by the hypervisor. The hypervisor,
upon receiving the request, would spawn the VM. Note that the logically singular agent
may in fact run a stack of processes, as is the case with compute (i.e. nova agent → libvirt → hypervisor). The difference between the agent management process and the
controller is one of scope and abstraction. The agent typically manages a single physical
machine, whereas a controller manages multiple agents. Also, the controller may
be capable of receiving abstract requests (e.g. driven by policy or trigger events). By
contrast, the agent management process can only understand relatively simple requests
to provision, update, and deprovision resources.
In addition to the controllers and services concerned with resources, there are other
services that make state changes without directly managing resources. This includes the
identity and access management (IAM) system, the telemetry system, and the orchestra-
tion system. Although agents are most commonly physical servers, they can be anything
capable of supporting an OpenStack deployment, e.g. physical servers, microcontrollers,
or VMs.
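The controller-agent split can be sketched as follows. The classes and the least-loaded placement policy are illustrative, not OpenStack's actual scheduling logic.

```python
# Toy sketch of the controller-agent model: the controller exposes the
# provisioning API and delegates each request to a per-host agent,
# which translates it for its local hypervisor.
class Agent:
    def __init__(self, host):
        self.host, self.vms = host, []

    def spawn(self, name):
        # A real agent would go through a stack of processes,
        # e.g. nova agent -> libvirt -> hypervisor.
        self.vms.append(name)
        return f"{self.host}/{name}"

class ComputeController:
    def __init__(self, agents):
        self.agents = agents

    def provision_vm(self, name):
        # Delegate to the least-loaded agent (an illustrative policy).
        agent = min(self.agents, key=lambda a: len(a.vms))
        return agent.spawn(name)

controller = ComputeController([Agent("host1"), Agent("host2")])
assert controller.provision_vm("vm1") == "host1/vm1"
assert controller.provision_vm("vm2") == "host2/vm2"
```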
4.5.2 Software-defined Infrastructure RMS
The SDI RMS (see Figure 4.8) is built on top of the OpenStack RMS. Therefore, the
resource management broadly follows the controller-agent model of OpenStack. However,
the design diverges in the following key ways:
1. The OpenStack model only manages compute nodes and implicitly relies on existing
networking. The SDI RMS, by contrast, maintains a global network topology
of all virtual and physical servers, switches, and links.
2. The SDI RMS converges the control and management of compute and network resources
through a centralized SDI manager. The SDI manager has oversight over
all resources and resource agents and can decree that certain actions be taken by individual
controllers. This multi-layer management allows most tasks to be handled
by the designated controller, while allowing specific tasks (e.g. those that require
global state information) to be handled by the SDI manager.
3. The SDI RMS replaces the OpenStack networking stack with one based on OpenFlow
SDN. This, combined with a global topology, allows it to have fine-grained
networking control. For instance, the SDI manager can install rules to redirect
Figure 4.7: A conceptual view of the OpenStack RMS.
Figure 4.8: A conceptual view of the SDI RMS.
packets in very specific ways. This enables a host of capabilities such as service
chaining and traffic steering.
4.6 Vino Version 1: SDN Orchestration Over a Single Legacy Cloud
4.6.1 Initial Design
Having provided the context and relevant discussions, we will now design the first version
of Vino. The objective is simply to perform SDN orchestration over a single legacy
cloud. Here, we have two degrees of freedom: the resource provisioning model, and
the organization of resource controllers. With regards to the organization of resource
controllers, the OpenStack model does not provide SDN functionality. In addition, since
the SDI model is built on top of the OpenStack model, there is no benefit of choosing
the OpenStack model.
Now, we must choose the appropriate provisioning model. Ideally, we want an ap-
proach that maximizes performance, ceteris paribus. In considering public clouds, the
fully-managed model would provide poor performance. Therefore, we must choose between
the native and delegated approaches. The delegated and native approaches provide
comparable performance. Additionally, the delegated approach abstracts the native
API exposed by the cloud provider. Since we need to perform additional steps with regards
to the SDN orchestration, the added abstraction layer can encapsulate this additional
functionality. Therefore, this design will use the SDI model with the delegated provision-
ing approach.
4.6.2 Adapting the Design
Now, we have a skeleton of the design. However, the SDI RMS only works as a native
RMS; whereas, we require these capabilities in an overlay manner. Let us consider how
the SDI RMS natively provides SDN capabilities, and then we can consider how these
capabilities can be provided in an overlay manner.
• A node is provisioned through the compute agent; its virtual MAC and IP addresses
are registered with the SDI manager.
• The SDI manager maintains a global topology of all connected servers and switches.
• When a VM on one physical host tries to communicate with a VM on another
physical host or to a node in an external network, the packet is sent to a virtual
OpenFlow switch running on the hypervisor. The switch would check its flow table
and after not finding a match would send the message to the OpenFlow controller,
which would subsequently send it to the SDI manager. The SDI manager would
determine the optimal route (i.e. shortest path) and install flows on the initial
switch as well as other switches in the path.
• The packet would then get forwarded to its destination.
• This allows communication between any two VMs, or between VMs and external
networks. This can also be used to install newer, higher priority flows, for instance
to create a dynamic service chain.
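The reactive routing step described above can be sketched as follows: on a table miss, compute the shortest path over the known topology and derive one flow entry per switch on the path. The topology encoding and rule format are illustrative, not the SDI manager's actual API.

```python
from collections import deque

# Illustrative topology: a chain of three switches, s1 - s2 - s3.
topology = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}

def shortest_path(graph, src, dst):
    """Breadth-first search: returns the hop-count-shortest path."""
    prev, queue, seen = {}, deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                prev[nbr] = node
                queue.append(nbr)
    return None

def flows_for(path, dst_mac):
    # One rule per switch on the path: match the destination MAC,
    # forward toward the next hop.
    return [{"switch": a, "match_dst": dst_mac, "out_to": b}
            for a, b in zip(path, path[1:])]

path = shortest_path(topology, "s1", "s3")
assert path == ["s1", "s2", "s3"]
assert flows_for(path, "aa:bb:cc:dd:ee:ff")[0]["switch"] == "s1"
```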
Let us analyze each component of the control stack and how it can be used to create
our overlay SDN.
SDI Manager, Topology Manager
The SDI manager is responsible for accessing the global topology, and installing rules
based on it. The SDI manager requires the topology manager for topology information
and the SDN controller to actually install the rules. We can just run a local copy of the
SDI manager and topology manager to provide this functionality.
OpenFlow Controller (Ryu)
The OpenFlow controller installs the rules created by the SDI manager. Separating the
OpenFlow controller from the SDI manager allows us to add more controllers to scale
up. Other OpenFlow controllers could also be used. We can run an OpenFlow controller
to provide this functionality.
Switches
Switches are the last element in the networking control stack. Whereas the native SDI
deployment uses hardware switches to do packet switching, we can use software to do
this switching. Our only constraint is that the switch support OpenFlow. Our options
are:
1. Open vSwitch (OVS) [34]
2. ofsoftswitch [4]
3. Lagopus [43]
We choose OVS because of the maturity of and support for the project.
Overlay Networks
The network stack we described cannot manage the native network provided by the
cloud. Instead, we must create our own overlay network. We can achieve this by using
a tunneling protocol. Our options are VXLAN and GRE. Functionally, VXLAN encapsulates
L2 frames in L4 UDP datagrams, whereas GRE encapsulates L2 frames in L3
packets. This reduces the header overhead for GRE tunnels. However, this can also
cause issues, e.g. if a firewall only allows certain transport-layer protocols. GRE creates
point-to-point tunnels, whereas VXLAN creates point-to-multicast tunnels. Additionally,
because VXLAN tunnels contain UDP headers, they have higher header entropy and
can allow for better network utilization when there are multiple equal-cost routes [33].
For these reasons, VXLAN was chosen.
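The header-overhead difference can be quantified with a back-of-the-envelope calculation, assuming standard header sizes (14-byte outer Ethernet, 20-byte outer IPv4, 8-byte UDP, 8-byte VXLAN header, and a basic 4-byte GRE header with no optional fields):

```python
# Per-packet encapsulation overhead in bytes, under the standard
# header-size assumptions stated above.
OUTER_ETH, OUTER_IPV4, UDP, VXLAN_HDR, GRE_HDR = 14, 20, 8, 8, 4

vxlan_overhead = OUTER_ETH + OUTER_IPV4 + UDP + VXLAN_HDR
gre_overhead = OUTER_IPV4 + GRE_HDR

assert vxlan_overhead == 50   # the commonly cited 50-byte VXLAN overhead
assert gre_overhead == 24
```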
Figure 4.9: A conceptual view of the Vino RMS.
4.6.3 Architecture
Let us present the architecture with all its components (see Figure 4.9 for a visualization
of the architecture). Essentially, our design takes the elements of a network: 1) switching/routing
nodes, 2) mediums connecting these nodes, and 3) logic determining how the
routing should be performed, and recreates them in a user-controllable overlay space.
1. There is a management layer consisting of the SDI manager and the topology
manager. There are two controllers: the network controller, i.e. Ryu, and the
legacy controller, i.e. Vino. In addition, there is a cloud driver that Vino interfaces
with in order to provision VMs.
2. The user sends a request to Vino.
3. The Vino controller sends a request to the cloud driver, which provisions resources,
and relays the response to Vino. Subsequently, Vino configures and runs OVS on
the nodes.
Figure 4.10: A conceptual view of the Vino RMS V2.
4.7 Vino Version 2: SDN Orchestration Over Multiple Legacy Clouds
Here we extend the orchestration system to work with multiple cloud providers, i.e.
SAVI, AWS, and GCE. Other controllers are organized to work with multiple resource
agents. For instance, the compute controller, nova, can interface with multiple hypervi-
sors by using a middle layer (libvirt) that abstracts the differences between the different
hypervisors. Likewise, we can add more cloud drivers to interface with multiple cloud
providers (see Figure 4.10).
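Extending the delegated model to multiple providers then amounts to keeping a registry of per-cloud drivers and dispatching each request by provider name, as in this illustrative sketch (the provider names and driver callables are placeholders):

```python
# Registry of per-cloud drivers, keyed by provider name.
drivers = {}

def register(provider, provision_fn):
    drivers[provider] = provision_fn

def provision(provider, **spec):
    """Dispatch a provisioning request to the matching driver."""
    if provider not in drivers:
        raise KeyError(f"no driver registered for {provider!r}")
    return drivers[provider](**spec)

# Placeholder drivers standing in for real AWS/GCE/SAVI drivers.
register("aws", lambda **s: ("aws", s.get("name")))
register("gce", lambda **s: ("gce", s.get("name")))

assert provision("aws", name="web1") == ("aws", "web1")
assert provision("gce", name="db1") == ("gce", "db1")
```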
4.8 Vino Version 3: SDN Orchestration Over Unmanaged Resources
4.8.1 Overview
As mentioned before, there are two kinds of substrates that must be accounted for,
unmanaged and managed. Unmanaged resources are standalone physical servers. In
Figure 4.11: A conceptual view of the Vino RMS V3. This shows how unmanaged resources are brought under the purview of an RMS. The logic surrounding resource management is the same as in Vino RMS V2.
contrast to managed infrastructures, e.g. public clouds, these resources exist as disjoint
collections without a proper provisioning interface. An unmanaged resource, after
becoming managed, is called a private cloud or a virtual customer premise edge (vCPE).
In order to provision unmanaged resources, they must first come under the purview of a
RMS. Once this is achieved, we can orchestrate over this substrate. This roughly divides
into the following subtasks,
1. manage the substrate
2. orchestrate over the substrate
We have already considered how to orchestrate over a managed infrastructure. The
goal of this section is to consider how to manage the substrate.
4.8.2 Types of Virtualization
Managing unmanaged resources is effectively about virtualizing the resources. Here, we
consider the various types of virtualization that can be applied to physical resources.
Network Virtualization
This refers to running a software switch (typically OVS) on the node(s). Once we have
OVS, we can connect this node to any other node using overlay tunnels like VXLAN or
OpenVPN tunnels. If other resource pools are involved, this approach creates a logical
L2 network connecting all the nodes. This is comparable to creating a virtual private
network (VPN) that connects the vCPE with the other resources.
Operating-system-level Virtualization
OS-level virtualization refers to running a containerization platform over a cluster
of VMs or BMs, which can subsequently be used to provision containers. Compared with
VMs, containers require fewer resources, achieve higher resource utilization, and are easier
to set up. OS-level virtualization includes network virtualization, without which multi-cloud
deployments would not be possible.
Hardware-level Virtualization
Hardware-level virtualization refers to virtualizing the resources such that they can subsequently run VMs. This is typically done by running either a hypervisor or an entire cloud management platform, like the SDI RMS, on the node(s). As with OS-level virtualization, this includes network virtualization.
Discussion
When adding unmanaged resource pools to the fleet, we must always perform network virtualization. The different levels of virtualization reflect the different capabilities of the vCPEs and the different use cases. For instance, both containers and VMs can support general computation; the tradeoff between the two is superior security and flexibility for VMs versus superior resource utilization for containers. Additionally, it is possible to perform both hardware-level and OS-level virtualization on the same set of unmanaged physical servers. This can be achieved by running containers inside a VM, or on different BM machines in the cluster.
4.8.3 Architecture
We need to create an RMS to manage the unmanaged resources. The unmanaged resources could be a single resource or a collection of them. Once the bare resources are managed through an RMS, we can interface with the collection of resources like any other cloud. The options for the RMS to run are:
• OpenStack
• SDI RMS
• CloudStack
• Eucalyptus
We choose the SDI RMS, since it lets us leverage SDI capabilities natively (overlays incur some overhead). The SDI RMS can be configured remotely.
4.8.4 Modelling the Substrates
The key component in modelling the substrate is specifying the location of the vCPEs.
As mentioned in the discussion on the SDI RMS (section 4.5.1), resource management
consists of agents, and a controller and management layer. The management layer runs the control stack, i.e. it decides where to provision requests, whereas the agent is where the actual provisioning happens. The following figure shows an example of how we can model the distributed, unmanaged resource pool.
[Figure: architecture of the Vino RMS V4, showing the legacy controller (Vino), the SDI Manager, the Network Controller (Ryu), the Topology Manager, and per-cloud Cloud Drivers. A user request triggers VM provisioning; each VM's port (MAC address) is registered with the SDI Manager, OVS is configured and VXLANs are created, flows are installed on the OVS instances, and containers run on the VMs.]

Figure 4.12: A conceptual view of the Vino RMS V4.
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    -
      username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
4.9 Vino Version 4: Container Orchestration
4.9.1 Overview
Thus far we have primarily focused on orchestrating VMs. However, we would also like to consider containers, since they offer a lightweight alternative to VMs. Although there are many flavors of containers, we will focus on Docker containers due to their popularity and maturity. In addition to being lightweight, Docker containers improve the workflow in two additional ways. Specifically, they
1. improve the packaging of applications and avoid dependency problems, since each container can encapsulate arbitrary packages and arbitrary versions of packages
2. improve the distribution of packaged containers through a global image registry
4.9.2 Architecture
Containers, and specifically Docker, greatly improve the packaging of applications. They overcome dependency issues related to unmatched dependencies or a broken source repository (i.e. the place a dependency is fetched from). Additionally, containers can be used to achieve much higher resource utilization. Consider a deployment with two nodes that require different versions of the same library. Without containers, this would require either creating separate VMs or using an ad-hoc approach. Creating separate VMs can be resource inefficient, since each VM carries the additional overhead of an extra OS that must be run. Additionally, there is a minimum size a VM can be; containers have no such restriction and can be packed more densely. Therefore, in certain situations, containers provide an advantage over VMs. Containers are provisioned as per user requests. The user can specify which containers must be colocated (i.e. share the same VM).
In order to orchestrate containers, we leverage native cloud APIs where available. Regardless of the availability of native container APIs, the workflow around container orchestration does not change much. In order to orchestrate containers, we first provision VMs. These VMs act as hosts for the containers. We configure these VMs to run the Docker container engine and the Docker container cluster manager, Docker Swarm. Docker Swarm exposes an API to provision containers on the different nodes in the cluster.
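The colocation constraint mentioned earlier can be illustrated with a simplified placement sketch. This is our own simplification, not Vino's actual scheduler: containers named together in a colocation group are assigned the same VM host, and unconstrained containers each get their own.

```python
def assign_hosts(containers, colocate_groups):
    """Assign each container a host index. Containers listed together in
    a colocation group share the same (VM) host. A simplified sketch."""
    assignment = {}
    next_host = 0
    for group in colocate_groups:
        for name in group:
            assignment[name] = next_host   # whole group shares one VM
        next_host += 1
    for name in containers:
        if name not in assignment:         # unconstrained containers get
            assignment[name] = next_host   # their own host in this sketch
            next_host += 1
    return assignment
```

A real scheduler would also weigh host capacity; here only the colocation rule is shown.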
Chapter 5
Implementation of Multidimensional Orchestrator
5.1 Overview
This chapter will report on the implementation of the system designed in the previous
chapter. We begin this chapter by presenting implementation considerations that affect
the whole design, such as the choice of programming language, libraries and frameworks
used, e.g. for remote execution. Then, we consider the design and implementation of the
Vino system, i.e. its components and how they interact.
5.2 Programming Language
Here we compare the programming languages that were considered. Although there are numerous programming languages, we only considered two: Java and Python. In this context, when we discuss programming languages we are referring to
the language and the most common interpreter, compiler, or runtime associated with the
language. Therefore, in the following, Java should be interpreted as the Java language
and the Java Runtime Environment (JRE). Likewise, Python should be interpreted as
the Python language and the CPython interpreter. The following discussion will focus
on aspects that distinguish the two languages.
5.2.1 Java
Java is an object-oriented programming language with a large user base and ecosystem
(in terms of 3rd party modules and publicly available code snippets). Java is a strongly typed (i.e. unlikely to perform implicit type conversions) and statically typed (most type information is known at compile time) language. This means that bugs arising from type inconsistencies are easily caught by the compiler. Indeed, Java was designed to
overcome the security issues arising from unsafe C and C++ code. Java provides a
hybrid execution model that compiles code to an intermediate representation, which is
interpreted at runtime.
5.2.2 Python
Python is a programming language that supports the procedural and object-oriented programming paradigms. Python also has a large user base and ecosystem. Python is a strongly typed and dynamically typed (type information is determined at runtime) programming language. Therefore, bugs that arise due to type inconsistencies cannot be caught until runtime. Compared to Java, Python runs slower (assuming typical non-optimized code). The memory and resource overhead varies between the two languages. Python was designed to be easier to read, write, and debug.
5.2.3 Discussion
The key tradeoff between Python and Java is that of ease of development versus performance. From the perspective of this thesis, it is important to work with a language that allows agile prototyping. Therefore, Python is chosen.
5.3 Data Serialization Language
Here we analyze the various data serialization languages that were considered. Data serialization languages allow data and data structures to be encoded in a format that can be stored in a file and/or transmitted over a network. By contrast, programming languages are primarily intended to express computation on data. Although one could derive a serialization language from a subset of constructs in a programming language (e.g. JSON is inspired by JavaScript), the two are different. In our work, a data serialization language is needed to model the underlay and application topologies.
5.3.1 XML (Extensible Markup Language)
XML is a data serialization format with a hierarchical, tree-like structure. All child nodes are contained within enclosing opening and closing tags, and properties associated with the node itself are embedded in the opening tag. XML is intended to be human and machine readable, and has multiple parser implementations in Python. However, the nested tree structure makes the language verbose. This hampers a human's ability to visually parse XML and carries a corresponding computational cost in increased storage and processing.
5.3.2 JSON (JavaScript Object Notation)
JSON is a data serialization language inspired by the JavaScript programming language
and how it expresses various data types. JSON has first-class support for scalars (numbers and strings), associative arrays (also called objects), and arrays. JSON replaces the
opening and closing tags of XML with curly braces and square brackets, to represent
associative arrays and arrays, respectively. This also effectively reduces the size and
visual clutter of JSON files.
5.3.3 YAML (YAML Ain't Markup Language)
YAML is a data serialization language that was inspired by JSON, Python, and others. YAML further improves on the visual clarity of languages like JSON by making indentation significant, i.e. indents and dedents imply the structure of the data. This makes it especially well suited to both human readability and machine interpretation. YAML (version 1.2 and later) is a strict superset of JSON.
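Assuming the PyYAML library is installed, this superset relationship can be checked directly: the same JSON text parses identically under a JSON parser and a YAML parser.

```python
import json
import yaml  # PyYAML; third-party, assumed installed

doc = '{"nodes": ["nodeA", "nodeB"], "count": 2}'

# The same text is both valid JSON and valid YAML, and yields
# the same Python data structure either way.
assert json.loads(doc) == yaml.safe_load(doc)
```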
5.3.4 Custom Language
Custom language refers to a language designed solely to encode the application orchestration topology. This encoding could be very efficient, since we can create a strong alignment between the syntax of the language and the concepts being expressed.
For illustrative purposes, let's assume our language is YAML based and we want to encode VMs; we could do the following:
virtual-machines:
  - vm-1
  - vm-2
Since YAML only provides very generic data types, the extra information must be explicitly specified, i.e. the virtual-machines key. By contrast, a custom language could encode virtual-machines as a dot (.), e.g.
.
  - vm-1
  - vm-2
The biggest shortcoming of this approach is that it requires us to write a custom
parser.
5.3.5 Discussion
Writing a custom parser would be a very time-consuming undertaking. For this reason, the custom language approach is not considered further. The remaining three languages have well-implemented parsers in Python. Although there may be differences in the performance (i.e. memory and time) of these parsers, our preference is for a language that supports quick prototyping and is expressive and clear from the perspective of both the developer and the end user. On account of its more verbose and noisier syntax, XML is excluded from consideration. YAML is a proper superset of JSON; therefore, it has all the benefits of JSON. Moreover, YAML is whitespace sensitive, which, compared to brackets and braces, allows data to be expressed more clearly. In fact, the abstract modelling languages described earlier are examples of valid YAML. For these reasons, we have chosen YAML.
5.4 System Architecture
The previous chapter explained the design considerations that led to the design of Vino
and its components. Here, we will reconsider those in the context of its implementation.
Broadly speaking, the Vino system performs the following:
1. manages any unmanaged substrates, and
2. orchestrates over heterogeneous substrates
These two phases map onto two components. The Bootloader, so called because of its conceptual similarity to the boot loader program in computer operating systems, is responsible for creating an RMS to manage unmanaged infrastructures. The Orchestrator is responsible for orchestration over the heterogeneous infrastructure landscape.
5.4.1 Bootloader
The bootloader is responsible for bringing unmanaged resources under the purview of an RMS, so that they can subsequently be orchestrated over. This phase is only performed when vCPEs are concerned. When the bootloading phase completes, the unmanaged resource pool has been transformed into a cloud. In order to bootload a resource pool, the user must specify where the resources (i.e. physical servers) are located. This is realized in the form of an underlay topology file that contains the IP addresses of the resources. If the bootloader is run from the same network as the servers, then private IP addresses can be used. Otherwise, the resources must have public IP addresses.
The configuration process is designed to be automatic without requiring manual oversight.
This is achieved by using sensible defaults and allowing the user to specify configuration
changes through the underlay topology file.
5.4.2 Orchestrator
The orchestrator is responsible for the orchestration of applications. Similar to the bootloader, the orchestrator reads a topology file. It parses the topology and determines the graph of resources, i.e. the different nodes, how they are connected, and other dependencies. It then determines which cloud drivers will be responsible for which tasks, and delegates the tasks accordingly.
5.5 Bootloader Design
As described before, the bootloader is responsible for instantiating the control stack over
unmanaged resources. Here, we document the various aspects of its implementation.
5.5.1 Parser
The following shows the modelling language that we previously designed to model the substrate.
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    -
      username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
The first step the bootloader must perform is to parse the underlay topology file.
To parse the YAML-based modelling language, we use a Python YAML parser called
PyYAML. The parser takes a YAML file as input and returns a data structure as output.
This data structure informs us where the agents are located and where the controller
node will be located.
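The parsing step amounts to a single PyYAML call; a minimal sketch (assuming PyYAML is installed) applied to the underlay topology above:

```python
import yaml  # PyYAML

underlay = """
cluster:
  controller:
    substrate_host: savi
    controller_flavor: m1.medium
  agents:
    - username: ubuntu
      ip_addr: 10.12.1.2
      hw_virt: true
"""

topo = yaml.safe_load(underlay)
controller = topo["cluster"]["controller"]   # where the controller runs
agents = topo["cluster"]["agents"]           # where the agents run
```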
5.5.2 Remote Code Execution
Once the Parser has performed the initial pass, we know where the agents and controllers
are located. The agents are typically unmanaged physical servers. The controller node is
either run on a physical server or on a VM, e.g. on a public cloud. To run the controller node, we must first provision a VM on the specified provider (the cloud provider that will host this VM). Once the controller is provisioned, we must perform remote code execution to create the RMS. The following are the options for remote code execution. These systems are also called configuration management systems.
Ansible
Ansible is a configuration management system with additional capabilities to deploy applications and execute arbitrary code. Ansible uses a YAML-based DSL to express tasks. A list of tasks is contained in a file called a playbook. The user specifies the nodes and the corresponding playbooks they would like to execute on each node. Additionally, Ansible has a declarative syntax (i.e. the user specifies the state they would like to achieve rather than the steps towards it; e.g. instead of specifying the steps to create the file /home/foo, the user specifies that the file should exist). In this regard, Ansible lets users specify configuration at a higher level of abstraction. Ansible has a large number of modules, including those for file IO, networking, and monitoring. Additionally, any third-party developer can write custom modules. Finally, Ansible runs over SSH, is agentless, and is written in largely dependency-free Python, which makes setup very easy.
Puppet, Chef
The chief difference between Puppet or Chef and Ansible is the learning curve and the ease of a first deployment. These systems use a master-agent model, whereby an agent process runs on each machine that must be configured, and the agents pull updates from the master. This approach is beneficial if the intention is to run a large number of commands; however, for a small number of commands it turns out to be inefficient. This is because there is an overhead tradeoff between running an agent and pushing commands. Specifically, running an agent requires more system resources; however, each agent pulls only the required configuration (as opposed to Ansible, which pushes all changes to be applied, e.g. create a file /foo, even when it exists). Also, these systems use HTTPS, which requires a non-trivial step to set up certificates. Additionally, the remote system may not be able to run the agent for many reasons (lack of system resources, unmet dependencies). All of this leads to a more involved setup process. Also, both systems have a steep learning curve.
Python
This is not so much a solution in itself, but rather a method of structuring the solution. Specifically, since Ansible is written in Python, and the Python ecosystem contains additional libraries, this means using any of these libraries in user-defined top-level programs to perform the required tasks.
Discussion
Ansible is the superior choice since it avoids the pitfalls of Chef and Puppet: it is agentless, has a large module ecosystem, and offers a declarative syntax. As described previously, the SAVI RMS is based on the agent-controller model. In this model, a control stack runs on the controller node, and a corresponding agent program runs on each agent node. Therefore, we have Ansible playbooks for agents and hosts that configure them as required.
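Concretely, running a playbook against a freshly provisioned node reduces to invoking the ansible-playbook CLI. The helper below only builds the command line; the playbook path and variable names are illustrative. The trailing comma in the `-i` argument is Ansible's inline-inventory syntax, which avoids writing an inventory file for a single host.

```python
def playbook_cmd(playbook, host_ip, user, extra_vars=None):
    """Build the ansible-playbook invocation for a single remote host.
    An ad-hoc inventory ('<ip>,') avoids writing an inventory file."""
    cmd = ["ansible-playbook", playbook,
           "-i", host_ip + ",",        # trailing comma: inline inventory
           "-u", user]
    if extra_vars:
        pairs = " ".join("%s=%s" % kv for kv in sorted(extra_vars.items()))
        cmd += ["--extra-vars", pairs]
    return cmd

# The result would typically be handed to subprocess.check_call(...).
```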
5.6 Orchestrator Design
Here we discuss the implementation of the various components of the Orchestrator.
5.6.1 Parser
The parser is the component that parses and realizes the topology. In this section we
will consider the different phases of the parser.
The following is the final topology that we designed in the previous chapter. We refer to the language as the Vino template language (VTL), and to the parser as the VTL parser, or simply the parser.
nodes:
  # Nodes only have the property 'name'
  -
    name: nodeA
  -
    name: nodeB
  -
    name: nodeC
edges:
  -
    endpoint1: nodeA
    endpoint2: nodeB
  -
    endpoint1: nodeB
    endpoint2: nodeC

Figure 5.1: Final version of VTL.
The topology in this form only expresses how the nodes are connected. As we noted before, we need a way of expressing other properties of nodes and edges. Specifically, nodes can be either virtual machines or containers. The cloud can be SAVI (native and vCPE variants), AWS, or GCE. Additionally, orchestration of nodes requires specifying the image to boot. The following is a prototypical topology file that, in addition to the topology information, contains auxiliary information.
Parsing Phase
The parsing phase is when the parser gathers all the information. When the parsing phase completes, the topology of nodes and the properties of each node are known. The topology exists as a list of nodes and a list of endpoint pairs, each representing a point-to-point connection. Alternatively, if no endpoints are specified, all nodes are meshed together. The properties of a node include the type of node, i.e. VM or container, the image to be used, and the cloud that the node should be provisioned in, among others. Although most of the properties are resolved by the end of this phase, some properties can only be resolved in a later phase.
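The mesh default just described can be sketched in a few lines: if the user supplies no edges, the parser connects every pair of nodes.

```python
from itertools import combinations

def resolve_edges(nodes, edges):
    """Return the point-to-point connections: the user's edges if given,
    otherwise a full mesh over all nodes (a sketch of the parser's rule)."""
    if edges:
        return [(e["endpoint1"], e["endpoint2"]) for e in edges]
    return list(combinations(nodes, 2))
```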
parameters:
  savi_key_name:
    description: SAVI keypair name.
  aws_key_name:
    description: AWS keypair name.
nodes:
  -
    name: vino_gateway
    role: gateway
    image: ami-df24d9b2 # ubuntu with ovs
    flavor: t2.micro
    provider: aws
    type: virtual-machine
    region: us-east-1
    key-name: utils::get_param(aws_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/gateway/playbook.yaml
        host: gateway
        extra-vars:
          webserver_ip: utils::get_overlay_ip(vino_webserver)
  -
    name: vino_webserver
    role: webserver
    image: Ubuntu64-OVS
    flavor: m1.medium
    provider: savi
    type: virtual-machine
    region: tr-edge-1
    key-name: utils::get_param(savi_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/webserver/wordpress.yaml
        host: webserver

Figure 5.2: Example of a VTL file with the complete set of features.
# Leave blank for mesh
edges:
declarations:
  -
    name: wordpress-vino
    type: security-group
    description: security group for vino
    ingress:
      -
        from: -1
        to: -1
        protocol: icmp
        allowed:
          - 0.0.0.0/0
      -
        from: 22
        to: 22
        protocol: tcp
        allowed:
          - 0.0.0.0/0
      -
        from: 80
        to: 80
        protocol: tcp
        allowed:
          - 0.0.0.0/0
      -
        from: 4789
        to: 4789
        protocol: udp
        allowed:
          - 0.0.0.0/0
      -
        from: 6633
        to: 6633
        protocol: tcp
        allowed:
          - 0.0.0.0/0
    egress:

Figure 5.3: Continuation of the above topology file.
Provisioning Phase
The provisioning phase is when the resources are provisioned. Resources can be classified as either hard or soft, depending on whether they consume system resources or are only logical objects, respectively. For instance, a VM is a hard resource since it consumes system resources, and a security group is a soft resource since it is only a state change (excepting the resources required to record that state change). There are two hard resources: nodes, which can be either VMs or containers, and network connections (although this is not strictly true, since the underlay network always exists and the overlay network is more of a logical entity). The system first provisions soft resources, such as security groups and SSH keys, since users may require these for the creation of nodes. Next, we provision the nodes. The provisioning phase completes with the creation of network tunnels, connecting the nodes as per the user's specification.
Configuration Phase
The configuration phase configures the VMs as specified by the user. The user can specify multiple Ansible playbooks to be executed on any given host. The configuration phase determines which playbooks to run on which hosts, including resolving any unknown parameters, and then runs the matching playbooks on the hosts. Users can also use the same playbook for multiple hosts, by specifying the hostnames in the playbook.
5.6.2 Declared Types
This section discusses all the resource types that can be requested to be provisioned.
Nodes
These represent the computational nodes, in the form of either virtual machines or containers. Nodes additionally contain other properties, like the name (for symbolic referencing), the image (OS distribution and installed packages), the flavor (system resources, e.g. 2 GB RAM, 1 CPU core, 20 GB disk), the associated security groups, SSH keys, the cloud service provider, the region, and the tenant (if applicable).
Edges
Edges represent bidirectional communication links. Edges have the properties endpoint1 and endpoint2, which are the symbolic names of the nodes representing the two endpoints of a link. Additionally, there is a boolean property called secure, which determines whether the link is encrypted.
Declarations
These represent resources other than nodes and edges, and include logical constructs like security groups and key pairs. Here, logical refers to the fact that these objects correspond to state changes and affect the workings of other resources, as opposed to physical constructs like virtual machines.
5.6.3 Parametrization
The topology to be deployed may be similar across multiple deployments. One of the reasons for using a textual representation was to be able to reuse the text file. To this end, the VTL parser allows values to be parameterized. The parameterized values can then be accessed through a special construct of the form utils::get_param. For instance, if a parameterized value is called foo, then in any place in a VTL file where one expects the value of foo, we can substitute utils::get_param(foo). The parameterized variables can be set either through environment variables or through the config file. If there are duplicates, then the environment variable takes precedence.
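The precedence rule can be sketched as follows. The VINO_ prefix and the lookup names are our own illustrative assumptions, not necessarily what Vino uses:

```python
import os

def get_param(name, config):
    """Resolve a parameterized value: an environment variable takes
    precedence over the config file entry (sketch of the rule above).
    The VINO_ prefix is an illustrative assumption."""
    env_val = os.environ.get("VINO_" + name.upper())
    if env_val is not None:
        return env_val
    return config.get(name)
```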
The choice of supporting both a config file and environment variables is intended to increase user flexibility. Specifically, the user can keep multiple config files, e.g. for development and production. However, files may be readable by other users (depending on file permissions). Using environment variables adds a layer of security (albeit a weak one).
5.6.4 Configuration File
As part of the design effort to allow for parameterizable template files, we allow the user to specify a configuration (config) file. The config file contains the login credentials for the various clouds, which are part of the authorization and authentication system of Vino. The config file also specifies other default values, such as the default SSH key, the default region, etc.
5.6.5 Special Forms
In the previous section, we described a mechanism for accessing parameterized vari-
ables, which could be read from a config file or from an environment variable. The
core of VTL is concerned with specifying topologies to orchestrate. Specifically, the parser is responsible for provisioning VMs or containers and networking them as defined by the user. However, the parser exposes a user-extensible mechanism to encapsulate arbitrary Python code and access it during the various phases. These constructs are called special forms, and utils::get_param is an example. Special forms have the structure <namespace>::<form name>(<arg1>, ..., <argN>). Let's consider this piecewise:
• <namespace> refers to the namespace that the form belongs to. Forms are orga-
nized by namespace, so related special forms can be grouped together.
• <form name> refers to the unique name of the form.
• <arg1>,...<argN> refers to the N positional arguments that the form accepts.
Although special forms are very powerful, they can also be easily abused to encapsulate arbitrary unstructured logic. Therefore, special forms should be used sparingly, and only when the requisite functionality is not otherwise available. When extending forms, the user can specify when the form is resolved, e.g. before parsing, after parsing, before provisioning, after provisioning, before configuration, or after configuration. Other examples of forms are:
• aws::get_image_id(<image name>). Provisioning VMs on AWS requires that the user specify the image identifier (ID). However, the image ID varies between regions. This special form takes the image name and returns the ID for the current region, i.e. based on the node definition.
• utils::install_ovs_2_3_3 installs OVS version 2.3.3 on a provisioned VM. This is useful when a cloud does not provide default images with OVS installed.
5.6.6 Dependency Resolution
The parser performs semi-intelligent dependency resolution. In general, we can efficiently and sensibly perform dependency resolution if the graph of dependencies forms a directed acyclic graph (DAG). However, in practice, it may not be possible to achieve this. For instance, consider two nodes A and B. If we want to configure A to ping B, and B to ping A, then a naive implementation may deadlock, since the system would not provision A until it knows B's IP address, and vice versa for B.
The above example foreshadows the solution. Specifically, the parser is divided into phases, i.e. parsing, provisioning, and configuration. All tasks in one phase are performed (as opposed to performing all tasks for one node), and resolutions completed, before moving on to the next phase. This approach has some shortcomings, i.e. when dependencies span phases. However, in practice these are very uncommon and this approach is effective.
5.6.7 Cloud Drivers
The various cloud drivers provide the API to perform tasks on a specific cloud. Based on our design in the previous chapter, a cloud driver is a thin wrapper around a cloud provider's API. Having this abstraction makes the system more robust in several ways. First, it protects against discontinued and changing APIs: if a cloud provider's API changes, we can update the specific driver to account for this. Although the driver is supposed to be thin, if needed it can encapsulate arbitrary logic.
Separating the cloud drivers from the parser logic is also beneficial in other ways. First, it ensures that the parser logic is focused on delegating tasks to individual drivers. This keeps the parser lean and makes it easily extensible. Finally, different provider APIs have different ways of achieving the same tasks; e.g. to provision a VM, the AWS API requires client.run_instances(<some arguments>), while the GCE API requires compute.instances().insert(<some arguments>). The wrapper can normalize the API and make testing the modules easier. The current implementation of the Orchestrator has the following drivers, and hence supports the following clouds:
• SAVI
• AWS
• GCE
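The normalization provided by the drivers can be sketched as a common interface. The AWS call name comes from the text above; the wrapper classes themselves, their method signature, and the stubbed return values are our own illustration:

```python
class CloudDriver:
    """Common interface each driver implements; the parser calls only
    this normalized API, never a provider SDK directly."""
    def provision_vm(self, image, flavor, key_name):
        raise NotImplementedError

class AWSDriver(CloudDriver):
    def provision_vm(self, image, flavor, key_name):
        # would wrap client.run_instances(...) from the AWS SDK
        return {"provider": "aws", "image": image, "flavor": flavor}

class SAVIDriver(CloudDriver):
    def provision_vm(self, image, flavor, key_name):
        # would wrap the SAVI (OpenStack-based) compute API
        return {"provider": "savi", "image": image, "flavor": flavor}

DRIVERS = {"aws": AWSDriver(), "savi": SAVIDriver()}
```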
5.6.8 Creating the Topology
Automatic Master Creation
The config file has an option for automatically creating a master node. The master node runs the control stack, i.e. the SDI manager, the topology manager, the SDN controller (Ryu), and the multi-domain controller, Vino. The Vino controller provisions the nodes and registers the MAC addresses of the nodes with the SDI manager (see 4.10 for more details). Subsequently, the SDI manager or the SDN controller can install flows on the switches, e.g. to perform service chaining.
5.6.9 Logical Resources
The logical resources are the constructs aside from compute nodes and networks. This
includes things like security groups and SSH keys. Below we discuss their implementation.
Security Groups
Security groups encapsulate the logic around the network access of virtual machines. This includes the ingress and egress ports that are open, and the nodes to which they are open. This is specified as the following 3-tuple:
• protocol type, i.e. TCP, UDP, or ICMP
• port number or range of ports (note: this is set to -1 for ICMP)
• allowed IP addresses in CIDR notation (e.g. 192.168.10.11/16)
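The 3-tuple above (plus direction) maps naturally onto a small record type. A sketch, with field names of our own choosing:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Rule:
    """One security-group rule: protocol, port range, allowed CIDRs.
    For ICMP the port range is (-1, -1), matching the convention above."""
    protocol: str                  # "tcp", "udp", or "icmp"
    port_from: int
    port_to: int
    allowed: List[str] = field(default_factory=lambda: ["0.0.0.0/0"])

ssh = Rule("tcp", 22, 22)
icmp = Rule("icmp", -1, -1, ["192.168.10.0/24"])
```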
SSH Keys
SSH keys refer to the private and public key pairs that are used for accessing virtual machines. SSH can work with both keys and user-chosen passwords. However, keys offer superior security, since they are much harder to brute-force and the private key is never sent to the remote server. The Vino system performs the following steps to create and register SSH keys.
1. Check if the local machine has an SSH key (i.e. check the default location, ~/.ssh). If a key exists, then proceed to the next step; otherwise, create a keypair using the ssh-keygen utility.
2. Check if the remote end, i.e. the cloud, has your SSH public key. If there is no key, then upload the public key.
3. If a key exists, there are two ways to proceed: by name or by public key.
4. By name means that the user specifies a key name. If the key name exists remotely, check if the remote public key matches the local public key. If it does, then use this key. Otherwise, raise an exception and let the user handle this.
5. By public key means that the user specifies the public key (i.e. the local public key) and the corresponding key on the remote end is used. If there is no match on the cloud, then this key is uploaded.
5.6.10 Virtual Machines
Here we discuss the various components of the parser related to virtual machines.
Provisioning
To provision virtual machines, we leverage the cloud drivers. These cloud drivers expose APIs to provision virtual machines. They accept the image name, SSH key, and flavor
(i.e. the amount of system resources that are allocated).
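A minimal sketch of what such a driver interface might look like; the class and method names are illustrative, not the actual Vino driver API:

```python
from abc import ABC, abstractmethod

# Hypothetical minimal driver interface; real drivers (e.g. for AWS or
# SAVI/OpenStack) would wrap the provider SDK behind this shape.
class CloudDriver(ABC):
    @abstractmethod
    def provision(self, image: str, key_name: str, flavor: str) -> str:
        """Boot a VM and return its provider-assigned identifier."""

# An in-memory stand-in, useful for testing orchestration logic offline.
class FakeDriver(CloudDriver):
    def __init__(self):
        self.instances = []

    def provision(self, image, key_name, flavor):
        vm_id = "vm-%d" % len(self.instances)
        self.instances.append((vm_id, image, key_name, flavor))
        return vm_id
```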
Configuration
A comprehensive orchestration system should be able to support both provisioning and
configuration of nodes. Other orchestration systems such as OpenStack Heat enable this
by allowing the user to specify code that is injected inside the VM and executed after
it is provisioned. This can be arbitrary shell code, e.g. install packages, create files
etc. Although this can be used to perform arbitrary configuration, such code is hard to write, maintain, and debug, in part due to the inherent complexities of shell scripting.
One approach could be to extend VTL to allow users to specify common configuration tasks. However, there are many configuration tasks, each of which can be invoked in innumerable ways. Therefore, this approach is challenging not only from a development perspective, but also from the perspective of the user, who would need to learn another language. As mentioned before, we use Ansible to perform remote configuration and code
execution. Indeed, Ansible is widely used in industry. Therefore, in order to facilitate
the configuration of nodes, we allow users to specify playbooks that must be executed on
a node. Additionally, multiple nodes can use the same playbooks, with the specific role
of a given node determining which part of the playbook is executed on each node. The
following figure shows the node configuration as specified in a topology file.
config:
  -
    playbook: playbooks/firewall/playbook.yaml
    host: firewall
    extra-vars:
      webserver_ip: utils::get_overlay_ip(vino_webserver)
      gateway_ip: utils::get_overlay_ip(vino_gateway)
Figure 5.4: Node configuration snippet. User can specify a list of configurations in the form of playbooks.
Ansible is written in Python, and the core Ansible libraries are directly accessible from Python code. Therefore, rather than having the parser interface with these libraries directly, we wrote a wrapper class for interfacing with Ansible. The parser calls this wrapper to execute playbooks on nodes.
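A sketch of such a wrapper. For simplicity it shells out to the `ansible-playbook` CLI rather than using the internal Python API the thesis wrapper builds on; the function names are illustrative:

```python
import json
import subprocess

def build_command(playbook, host_ip, extra_vars=None, user="ubuntu"):
    """Assemble the ansible-playbook invocation for a single host.
    The trailing comma after the IP makes Ansible treat it as an
    inline inventory rather than an inventory file."""
    cmd = ["ansible-playbook", playbook, "-i", "%s," % host_ip, "-u", user]
    if extra_vars:
        cmd += ["--extra-vars", json.dumps(extra_vars)]
    return cmd

def run_playbook(playbook, host_ip, extra_vars=None, user="ubuntu"):
    """Execute the playbook on the given host; returns the exit code."""
    return subprocess.run(build_command(playbook, host_ip,
                                        extra_vars, user)).returncode
```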
Autoscaling
We added support for autoscaling. Autoscaling requires two components: alarms, i.e. notifications of certain events, and corresponding actions. This logic is a thin wrapper
around the functionality exposed by the clouds. This feature is only implemented for
AWS.
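As a sketch of how an alarm might be tied to a scaling action on AWS, the helper below assembles the keyword arguments that boto3's `put_metric_alarm` call expects; the names, metric, and threshold are illustrative assumptions, not Vino's actual values:

```python
# Hypothetical helper: builds a CloudWatch alarm definition (the "alarm")
# that triggers a scaling policy (the "action") when average CPU usage
# exceeds a threshold.
def build_cpu_alarm(name, policy_arn, threshold=70.0, period=300):
    return {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],
    }

# To register it against AWS (sketch, not executed here):
# boto3.client("cloudwatch").put_metric_alarm(**build_cpu_alarm(
#     "scale-up", policy_arn))
```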
5.6.11 Containers
There are two aspects of container management. The first is managing individual containers. The second is managing clusters of containers. Vino supports the ability to provision one or
many containers. This feature is implemented on AWS and SAVI.
5.6.12 Network Tunnels
Here, we consider the different network tunnels, namely unsecure VXLAN tunnels and
secure OpenVPN tunnels.
VXLAN (Unsecure Tunnels)
Network connections can be either secure or unsecure. Unsecure connections are implemented as VXLAN tunnels, which encapsulate entire L2 frames in UDP datagrams. This first requires a bridge on both hosts, created through OVS. We then
create a virtual network interface and assign it a private IP address. We then add this interface (also called a port) to the bridge we created. Finally, we create a VXLAN port,
which corresponds to an endpoint of a VXLAN tunnel. Thereafter, any packets being
sent to a matching private IP address, are sent over the VXLAN tunnel.
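The steps above can be sketched as a sequence of `ovs-vsctl` and `ip` commands; the bridge, interface, and address names are illustrative:

```python
def vxlan_setup_commands(bridge, iface, local_cidr, vxlan_port, remote_ip):
    """Return the shell commands implementing the steps above: create an OVS
    bridge, add an internal interface with a private IP, and add a VXLAN
    port pointing at the remote tunnel endpoint."""
    return [
        ["ovs-vsctl", "add-br", bridge],
        ["ovs-vsctl", "add-port", bridge, iface,
         "--", "set", "interface", iface, "type=internal"],
        ["ip", "addr", "add", local_cidr, "dev", iface],
        ["ip", "link", "set", iface, "up"],
        ["ovs-vsctl", "add-port", bridge, vxlan_port,
         "--", "set", "interface", vxlan_port, "type=vxlan",
         "options:remote_ip=%s" % remote_ip],
    ]

# Running them requires root and Open vSwitch installed (sketch):
# for cmd in vxlan_setup_commands("br-vino", "vport0", "10.0.0.1/24",
#                                 "vxlan0", "203.0.113.7"):
#     subprocess.check_call(cmd)
```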
OpenVPN (Secure Tunnels)
We use OpenVPN to achieve secure tunnels. OpenVPN offers two ways of creating secure VPN tunnels: routed, which creates layer 3 IP-based tunnels, and bridged, which creates layer 2 Ethernet-based tunnels. A bridged VPN requires a TAP virtual network adaptor, whereas a routed VPN requires a TUN virtual network adaptor. Practically, TUN
Figure 5.5: A conceptual view of VXLAN tunnels. (Diagram: process A hands an inner L2 frame to the local OVS bridge acting as a VXLAN tunnel endpoint (VTEP), which wraps it with a VXLAN header inside an outer UDP datagram, IP packet, and L2 frame; the peer VTEP unwraps it and delivers the inner L2 frame to process B.)
Figure 5.6: A conceptual view of an OpenVPN setup. (Diagram: Application A on Client 1 and Application B on Client 2 exchange inner L2 frames through local TAP devices and OpenVPN processes; the virtual connection between the clients is realized over physical connections relayed through the OpenVPN server and its TAP device.)
based tunnels can only be used for IP traffic. However, since the network controller works with Ethernet frames and MAC addresses, we use the latter (bridged) approach.
A bridged VPN connection works with a client initiating a UDP connection to the
server. In order to perform mutual authorization, the client and server must have access
to the certificate authority’s (CA) certificate. The client and server can then validate
each other, and establish an encrypted channel using the transport layer security (TLS)
cryptographic protocol. Subsequently the nodes are assigned an internal IP address, and
clients can communicate with each other with data being relayed through the server,
using the assigned IP address.
5.7 Traffic Steering
5.7.1 Overview
Once the nodes are provisioned and the network channels are configured, then we can
perform advanced traffic steering. Traffic steering refers to dynamically changing the
routing strategy and forwarding of packets. For instance, packets heading from node A
to node B can be redirected through a graph G (an arbitrary collection of nodes). When
the goal of traffic steering is to insert intermediate middleboxes, it is referred to as service
chaining.
The default network behavior between two communicating nodes is for the traffic
to take the shortest path. Specifically, the communication of two VMs is based on the
shortest path route determined by the SDI manager. The communication between two
VMs starts with the sender sending a packet to the first switch in the path; for our
case this is the OVS bridge on the VM. When two VMs communicate, the intermediary
switches try to find a match based on the address. If this is the first time the VMs are
communicating, there would be no matching rules and a notification will be sent to the
SDN controller, and relayed to the SDI manager. The SDI manager will use the topology
manager to determine the shortest path and install the appropriate forwarding rule on
all the intermediary switches.
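The shortest-path computation and per-switch rule installation can be sketched as follows. The thesis does not show the topology manager's data structures, so the adjacency-list graph and the rule shape here are assumptions for illustration:

```python
from collections import deque

def shortest_path(adj, src, dst):
    """BFS shortest path over an unweighted switch topology; a simplified
    stand-in for what the topology manager computes."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in prev:
                prev[v] = u
                q.append(v)
    return None  # destination unreachable

def forwarding_rules(path, dst_mac):
    """One (switch, match, next-hop) rule per intermediary switch on the
    path; the 'next hop' here is simply the next switch, for illustration."""
    return [(path[i], dst_mac, path[i + 1]) for i in range(len(path) - 1)]
```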
These rules (also called flows) have an associated priority. To perform steering, e.g. to make packets sent from A to B pass through G, we install higher priority flows on the switches. These higher priority flows cause the traffic to take an alternative
route. The forwarding of packets from G to B must be handled by G. That is because the network stack has no control once the packet is delivered to a userspace program.
Therefore, if we want G to transparently forward packets to B, this must be handled by the userspace program on G.
Figure 5.7: The Vino Portal can be used to create service chains.
The SDI manager exposes an HTTP RESTful API to install high-priority flows as per user requirements. The API requires the addresses of the head, middle, and tail of the service chain. This API can be called directly. Additionally, to facilitate the creation of service chains, we created a portal, documented in the following section.
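A sketch of calling such an API from Python; the endpoint path, payload field names, and authentication header are illustrative assumptions, not the actual SDI manager API:

```python
import json
from urllib import request

def chain_payload(head_mac, middle_mac, tail_mac):
    """Assemble the head/middle/tail addresses of a service chain
    (field names are an assumption for illustration)."""
    return {"head": head_mac, "middle": middle_mac, "tail": tail_mac}

def install_chain(manager_url, head_mac, middle_mac, tail_mac, token):
    """POST the chain to a hypothetical /chains endpoint on the SDI
    manager, authenticated with a token header."""
    data = json.dumps(chain_payload(head_mac, middle_mac, tail_mac)).encode()
    req = request.Request(
        manager_url + "/chains", data=data, method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token})
    with request.urlopen(req) as resp:
        return json.load(resp)
```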
5.7.2 Portal
We created a portal to facilitate the creation of service chains. The portal is built using
JavaScript for the frontend and Python for the backend. The frontend was developed
using jQuery and D3.js; whereas the backend was developed using Flask. The portal
currently only works for SAVI and AWS. The main aspects of the portal are authentication and the ability to create service chains. To authenticate, the user provides credentials. The user can authenticate against SAVI, AWS, or both. The portal then validates
the credentials against the IAM systems of SAVI and AWS, respectively. Once the user
is authenticated, a token is generated. Subsequent requests must be accompanied by the
token.
The interface is primarily designed to facilitate the creation of service chains. After
the user authenticates, the system requests a list of all the nodes. Each node is displayed
as a circular element on the canvas. Users can click two nodes successively to create a
link between the nodes, represented by a black connecting line. This represents the first
rung of a service chain, i.e. from the head to the middle. Next, we need the second rung
representing the link from the middle to the tail. In addition, the user can click a link to
delete a chain.
In addition, the portal is designed to work with any SDI manager, i.e. this could be
the native underlay manager or an overlay SDI manager. The user then clicks the Create Chain button to create the chain.
Chapter 6
Evaluation
This chapter documents the functional and performance evaluation of the Vino system.
The objective of functional analysis is to determine the capabilities of the system and
how these align with the requirements of the system. We will consider how application
topologies can be realized using the Vino system.
The objective of performance analysis is to evaluate the performance of the system.
We want to determine the space and time overhead of the system and how much of the
system resources are used to do useful work. We will then measure the scalability of the system as a whole, i.e. the system's ability to handle more requests as the allocated system resources increase. All the experiments were performed on medium sized virtual
machines (unless otherwise stated). Specifically, on SAVI we used the m1.medium flavor
which corresponds to 2 vCPUs, 4096MB RAM. vCPUs roughly correspond to cores that
are allocated to a VM. The underlying physical machines typically use Intel Xeon micro-
processors, albeit the exact microarchitecture is unknown. Likewise, AWS experiments
were conducted on t2.medium instances, which correspond to 2 vCPUs and 4096MB of
RAM running on Intel Xeon processors. Each stated value was the result of running the experiment 100 times and averaging the results.
6.1 Functional Evaluation
In the introduction, we identified the objectives of this thesis. Specifically, the functional
objectives were to have a system that supports:
1. advanced traffic steering
2. management and orchestration over managed and unmanaged resources
3. enhanced network security
The Vino system meets all these objectives. Specifically, the overlay management
enables advanced traffic steering. The orchestrator and bootloader enable management
and orchestration over heterogeneous infrastructures. Finally, using OpenVPN tunnels
improves network security (i.e. integrity and confidentiality). Furthermore, with regards to orchestration, Vino, through special forms, allows interfacing with existing components, such as existing VMs and security groups. This is a shortcoming of OpenStack Heat and AWS CloudFormation, since they consider elements declared in a template to represent an isolated and self-contained deployment. These systems do, however, support the use of static specifications, e.g. specifying the name of an existing security group. Such static specifications are not always sufficient, e.g. if the user wants the identifier of a newly created security group. Using special forms allows interfacing with arbitrary components.
6.1.1 WordPress Firewall Exposition
The topology file shown in Figures 6.1 and 6.2 represents a WordPress deployment distributed over SAVI and AWS infrastructures. It consists of a web server, a gateway, and a deep packet inspection (DPI) middlebox. The scenario is that we have provisioned a web
server, perhaps running a WordPress based blog. Note, this example is simplified to
demonstrate the capabilities of Vino. In reality, the web server would be split into the
web server proper and the database, each of which would be horizontally scaled. In this
view, the gateway server can also be viewed as the load balancer. Regardless, we notice
there is a security vulnerability in the web server and that it is susceptible to SQL injec-
tion attacks. Ideally, we would like to patch the vulnerability without taking down the
server. The SDI capabilities facilitate this. We can create a service chain, such that all
traffic going from the gateway to the webserver is sent to the DPI unit located on the
SAVI cloud. The DPI unit then analyses each packet and transparently forwards non-
malicious packets to the webserver. When the patch is created, we can apply a hotfix
and remove our previous service chain. This achieves dynamic service chaining without
service disruption (See Figure 6.3).
Based on the topology file, we can see that each node contains a provider property,
which specifies where the node must be located. Thus, we have orchestration capabilities
over multiple cloud providers. Furthermore, we can achieve network security by specifying
that the point to point tunnels be encrypted by setting each edge’s secure property to
true. Finally, by using the Vino portal as shown in Figure 5.7, we can perform arbitrary
service chaining, thereby satisfying the goal of advanced traffic steering. Therefore, the
parameters:
  savi_key_name:
    description: SAVI keypair name.
  aws_key_name:
    description: AWS keypair name.
nodes:
  -
    name: vino_gateway
    role: gateway
    image: ami-df24d9b2 # ubuntu with ovs
    flavor: t2.micro
    provider: aws
    type: virtual-machine
    region: us-east-1
    key-name: utils::get_param(aws_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/gateway/playbook.yaml
        host: gateway
        extra-vars:
          webserver_ip: utils::get_overlay_ip(vino_webserver)
  -
    name: vino_webserver
    role: webserver
    image: Ubuntu64-OVS
    flavor: m1.medium
    provider: savi
    type: virtual-machine
    region: tr-edge-1
    key-name: utils::get_param(savi_key_name)
    security-groups:
      - wordpress-vino
    config:
      -
        playbook: playbooks/webserver/wordpress.yaml
        host: webserver
Figure 6.1: Example of a VTL topology file.
#Leave blank for mesh
edges:
  -
    endpoint1: vino_gateway
    endpoint2: vino_webserver
    secure: true
declarations:
  -
    name: wordpress-vino
    type: security-group
    description: security group for vino
    ingress:
      -
        from: -1
        to: -1
        protocol: icmp
        allowed:
          - 0.0.0.0/0
      -
        from: 22
        to: 22
        protocol: tcp
        allowed:
          - 0.0.0.0/0
    egress:
Figure 6.2: Continuation of the above topology file.
Figure 6.3: An example of service chaining. The user specifies the endpoints, i.e. the Gateway and the Web Server, and the middlebox, i.e. the DPI. This installs rules on the switches that forward traffic going from the Gateway to the Web Server to the DPI instead, which transparently forwards it to the Web Server. This can be used for arbitrary VNFs. (Diagram: the SDI management layer controls an overlay SDN network spanning the Cloud 1 and Cloud 2 networks; before chaining, traffic flows from the Gateway directly to the Web Server; after chaining, it is detoured through the DPI middlebox.)
Vino system satisfies the functional requirements as initially outlined.
6.2 Performance Evaluation
There are two aspects of evaluating Vino: the performance (time and space) of the
Vino parser, and the cost of running the overlay SDI management stack. For the Vino
parser, we want to understand the resource overhead of the parser. The corresponding
experiment first runs the Vino parser to deploy various topologies. These topologies only
consider the provisioning of VMs on SAVI and AWS. The reason for only provisioning
VMs (as opposed to including other entities like security groups) was because it takes
the longest time, and is the most frequently performed operation.
Next, we evaluate the overlay SDI management. The goal here is to determine the
scalability of the overlay system. We approach this by first evaluating the resource
overhead of the underlying technologies, namely OpenVPN and VXLAN tunnels. Then,
we measure the performance of the SDN controller, Ryu, in isolation. We do this by
sending a large number of events to Ryu and measuring the number of events it can
handle. We then perform this experiment again by running a single Ryu instance together
with the SDI manager. We then scale out the number of Ryu instances.
6.3 Vino Parser
Here, we perform experiments to evaluate the overhead of the Vino parser. We do this
by creating multiple topologies over SAVI and AWS. The topologies correspond to 1, 2,
4, 8, and 16 VMs being provisioned on each cloud. The time for the parser to run can
be divided as the time to parse the topology file and the time to make the API calls. In
order to evaluate the time overhead of the parser, we will measure the total time taken
to provision the topology and compare this with the time taken to make the API calls
and for the resources to become available.
Additionally, we would like to see the memory overhead of the parser as opposed to
making direct API calls. Since the parser includes the API calls, the time and memory overhead of the parser will be strictly greater than that of the API calls. However, we
want to assess whether the benefits of ease of use and reusability are enough to justify
this overhead. While evaluating the parser, we will focus on provisioning VM instances,
since VM provisioning is the most expensive and common operation. In the following
set of experiments we are only interested in evaluating the parser. Therefore, we will not
consider the overhead of creating tunnels and running the control stack. These will be
addressed in the subsequent experiments.
6.3.1 Parsing Time Overhead
The following are the results of running the experiments on increasingly complex topologies. Here, we consider the total time taken to provision VMs, i.e. starting from the parsing of the topology file, including the API calls to provision resources, until all the VMs are accessible over SSH. The times were measured using Python's time module. We perform this experiment for topologies on both AWS and SAVI. As before, we repeated this experiment 100 times and averaged the times.
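The measurement procedure above can be sketched as a small timing harness using the time module:

```python
import time

def timed_mean(fn, runs=100):
    """Repeat a provisioning step `runs` times and return the mean
    wall-clock duration in seconds, mirroring the procedure above."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs

# mean_seconds = timed_mean(lambda: parser.deploy("topology.yaml"))
```

The call inside the lambda is a placeholder; in the actual experiments the timed step was the full parse-and-provision cycle.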
6.3.2 Memory Overhead
Next we would like to measure the memory overhead of Vino. There are two sources
of memory overhead: the fixed memory overhead arising from things like making API
Number of nodes    Time on SAVI (seconds)    Parser Overhead (seconds)    Parser Overhead (Percentage)
1                  37.6813590527             0.651310248375               1.73
2                  95.4224300385             0.679185665571               0.72
4                  128.043931007             0.683220799153               0.51
8                  197.128249168             0.688514838585               0.34
16                 332.347186089             0.714512329835               0.21
Table 6.1: Total time to allocate various topologies and the parser overhead on SAVI.
Figure 6.4: The total parsing and provisioning time as a function of number of nodes on SAVI.
Number of nodes    Time on AWS (seconds)    Parser Overhead (seconds)    Parser Overhead (Percentage)
1                  54.116526842             0.37127437458                0.68
2                  67.9349241257            0.357185625719               0.52
4                  68.0365350246            0.320741057381               0.47
8                  75.2786910534            0.425184816508               0.56
16                 86.6466450691            0.467112294583               0.53
Table 6.2: Total time to allocate various topologies and the parser overhead on AWS.
Figure 6.5: The total parsing and provisioning time as a function of number of nodes on AWS.
calls and variable memory overhead on account of per node information being stored.
We used the guppy-PE Python module to measure memory usage. Due to the dynamicity
of Python, it is difficult to examine the total memory used by any object. For instance,
a Python list will typically include unused cells in order to make the addition of new
elements more efficient. Therefore, for this experiment we first created all the topologies
using only the native API. This provided us with the baseline memory usage. Then,
we created all the topologies using the Vino parser. We then subtracted the baseline from the total usage. The following table shows the differential memory overhead.
Number of nodes    SAVI Memory Overhead (KB)    AWS Memory Overhead (KB)
1                  832                          812
2                  1272                         1222
4                  1774                         1678
8                  2178                         2058
16                 2554                         2524
Table 6.3: Total memory used for the different topologies.
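The differential-measurement idea can be sketched with the standard library's tracemalloc module (the thesis used guppy-PE, which exposes a different API):

```python
import tracemalloc

def differential_memory(baseline_fn, full_fn):
    """Peak traced allocation of full_fn minus that of baseline_fn,
    mirroring the subtraction of direct-API memory usage from
    parser memory usage described above."""
    def peak(fn):
        tracemalloc.start()
        fn()
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return peak_bytes
    return peak(full_fn) - peak(baseline_fn)
```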
6.3.3 Discussion
The above experiments evaluated the parser cost of multi-cloud orchestration. Specifically, we evaluated the space and time overhead of the Vino parser and found that memory usage was roughly equal to 460*N + 372 (in KB, where N is the number of nodes). The time overhead was typically less than 1% of the total time taken. However, the orchestration system facilitates the specification and management of multi-cloud topologies, which justifies this overhead.
6.4 SDI Overlay
The following set of experiments evaluate the overlay SDI system. The goal of the
following experiments is to determine the resource overhead and scalability of the SDI
overlay. This includes assessing the scalability of point to point links, and of the entire
management stack. Specifically, we conducted experiments to measure the throughput
of VXLAN and OpenVPN tunnels. Then we measured the response time of the SDN
controller in isolation, and finally the SDN controller with the SDI manager.
6.4.1 VXLAN Tunnels
We will first assess the throughput of VXLAN tunnels. This will help us isolate the
performance degradation due to tunnels and those due to the control stack (i.e. SDN
controller and SDI manager). VXLAN works by encapsulating entire L2 frames inside
UDP datagrams. This causes a reduction in throughput and an increase in delay. Specifically,
the added header overhead reduces the amount of useful data that can be transmitted.
Furthermore, the packet encapsulation and decapsulation causes an increase in delay.
Throughput
We will measure the throughput of VXLAN tunnels by using iPerf. iPerf is a network
bandwidth measurement tool. iPerf uses a client-server model. One node runs the iPerf
server, while another runs the iPerf client. The two perform a handshake and exchange
information regarding how many total bytes will be transmitted. Then the two nodes transfer data and calculate the bandwidth. In order to better understand the results,
we will also measure the throughput of the underlying network channel.
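A sketch of driving such a measurement from Python. It assumes the JSON-capable iperf3 (`iperf3 -c <server> -J`); the classic iperf2 used in many setups reports plain text instead:

```python
import json

def throughput_mbps(iperf_json):
    """Extract the mean received TCP throughput, in Mbps, from the JSON
    report emitted by `iperf3 -c <server> -J`."""
    report = json.loads(iperf_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6

# To produce the report on a live link (sketch; requires iperf3 and a
# running `iperf3 -s` on the server):
# raw = subprocess.check_output(["iperf3", "-c", server_ip, "-J"])
# print(throughput_mbps(raw))
```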
As aforementioned, VXLAN encapsulates L2 frames in UDP datagrams. However,
the sender and recipient only see the inner L2 frame. Therefore, the encapsulation
Figure 6.6: Comparison of underlay and VXLAN throughput for various configurations.
and decapsulation are performed by non-terminal points. Specifically, VXLAN requires
VXLAN tunnel end points (VTEP) to perform this task. In our case, the OVS bridge
containing the VXLAN port acts as a VTEP. This processing time affects the throughput
of the VXLAN link. We ran this experiment for point to point links between two SAVI
nodes, two AWS nodes, and SAVI and AWS nodes.
Endpoints        Underlay Throughput (Mbps)    VXLAN Throughput (Mbps)
SAVI and SAVI    3392                          906
SAVI and AWS     180                           162
AWS and AWS      160                           101
Table 6.4: Comparison of underlay and VXLAN throughput for various configurations.
Space
We can analytically determine the space overhead of VXLAN tunnels. Specifically,
VXLAN encapsulates a whole L2 frame in a UDP datagram. So the space overhead
is 50 bytes (14 bytes Ethernet + 20 bytes IP + 8 bytes UDP + 8 bytes VXLAN header).
Ethernet typically has a maximum transmission unit (MTU) of 1500 bytes. Therefore,
the space overhead of VXLAN is 3.33%.
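The header arithmetic above can be checked directly:

```python
# Per-packet VXLAN encapsulation overhead, as computed above.
ETH, IP, UDP, VXLAN = 14, 20, 8, 8
overhead = ETH + IP + UDP + VXLAN      # 50 bytes of added headers
mtu = 1500                             # typical Ethernet MTU
print(round(100 * overhead / mtu, 2))  # 3.33 (percent)
```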
Discussion
The performance of VXLAN tunnels varies depending on the underlying bandwidth.
Specifically, there are two regions of interest: when the underlay throughput is roughly
under 1Gbps, and when it is over 1Gbps. In the first region, we notice a slight degra-
dation in throughput. Typically, the performance penalty is 10 - 35%. However, in
the second region, i.e. above 1Gbps, the performance degradation is much higher. The
performance penalty arises due to: 1) bigger header and 2) processing delays. VXLAN
adds 50 bytes of additional header. For the experiment, the maximum transmission unit
(MTU) was set to 1500 bytes. This amounts to a 3.33% overhead. That is, since VXLAN
packets have bigger headers, even if processing delay was constant, there would be per-
formance degradation. The remaining degradation can be attributed to processing delay,
i.e. from encapsulating and decapsulating the transmitted frames. Specifically, OVS
performs VXLAN encapsulation and decapsulation entirely in software, which can only
be performed so fast. As such, this operation becomes a bottleneck, and as the underlay
throughput increases, the achieved throughput using the overlay tunnels degrades.
6.4.2 OpenVPN Bridged Tunnels
We repeat the above experiment again, now with OpenVPN bridged tunnels. The goal
again was to determine the space and time overheads of having OpenVPN tunnels.
Throughput
Endpoints Underlay Throughput(Mbps)
OpenVPN Throughput(Mbps)
AWS and AWS 170 51
Table 6.5: Comparison of underlay and OpenVPN tunnel throughput.
Space
OpenVPN bridged tunnels are somewhat similar to VXLAN tunnels. OpenVPN encap-
sulates entire L2 frames in UDP datagrams. However, OpenVPN does not have anything
Figure 6.7: Comparison of underlay and OpenVPN throughput.
like a VXLAN header; therefore the space overhead is 42 bytes (14 bytes Ethernet + 20 bytes IP + 8 bytes UDP).
Discussion
OpenVPN tunnels had a throughput of about 30% of the underlying network channel, i.e. an overhead of about 70%.
6.4.3 Testing Ryu
Next, we ran experiments to test the performance of the SDN controller Ryu. For this,
we used cBench [38]. cBench is a library designed for testing OpenFlow controllers. It
tests them by sending a very large number of OpenFlow events to controllers and then
seeing how many events are responded to per unit time.
6.4.4 Testing the SDI Manager
For the next step of experiments, the goal was to evaluate the SDI manager. The SDI
manager can work with multiple SDN controllers. We will first run the experiment with
Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
2571.26                                     2767.27                                     2705.92                                     46.03
Table 6.6: Statistics on the number of responses sent by the Ryu SDN controller.
a single Ryu controller. Next, we will scale out the number of Ryu instances. We will
perform these in a similar way as the above experiments, i.e. using cBench.
Single Ryu Topology
This experiment considers the performance of a single SDN controller and the SDI manager together.
Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
162.00                                      352.00                                      239.73                                      77.25
Table 6.7: Statistics on the number of responses sent by the Ryu SDN controller and SDI manager.
Scaling Out Ryu: Increasing the Number of Ryu Instances.
This experiment considers the performance of multiple SDN controllers and the SDI manager together.
Number of controllers    Minimum Number of Responses (per second)    Maximum Number of Responses (per second)    Average Number of Responses (per second)    Stddev (per second)
2                        54.00                                       392.00                                      218.14                                      79.32
4                        102.00                                      403.00                                      268.49                                      65.18
8                        98.00                                       386.00                                      258.49                                      68.85
Table 6.8: Statistics on the number of responses sent by the Ryu SDN controllers and SDI manager.
Figure 6.8: Performance of our network control stack compared with a single standalone Ryu instance.
6.4.5 Discussion
For the SDN controller alone, we noticed it could handle 2700 events per second. When
we reran the experiment with the SDN controller acting as proxy to the SDI manager,
the number of events dropped to around 250 events per second. As we scaled the number
of controllers, the number of responses largely stayed the same. This was expected since
the SDN controller itself was not the bottleneck and therefore scaling it would have no
impact on the scalability of the whole system.
To interpret these results, we must understand how the SDI manager is intended to
be used. The SDI manager is meant to be a centralized manager that can interface with
multiple controllers which interface with OpenFlow switches. In addition, this design
maintains a flow store for caching flow entries (since the on-switch flow tables are limited
in size) at each controller. This hierarchical design is comparable to a multi layer cache
and in effect should reduce the number of events that the SDI manager receives, since
some of the events should be handled by the controller directly. Since cBench generates
random packets, the flow store cache is never used and all the packets end up going to the
SDI manager. In more realistic scenarios, the SDI manager would receive fewer events
thus allowing it to handle a higher number of events.
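The flow store's role as a per-controller cache layer can be illustrated with a toy LRU cache; the class shape here is illustrative, not the actual flow store implementation:

```python
from collections import OrderedDict

class FlowStore:
    """Toy per-controller flow cache: hits are answered locally, misses
    escalate to the SDI manager (here, a callback), matching the
    multi-layer-cache analogy described above."""
    def __init__(self, capacity, sdi_lookup):
        self.capacity = capacity
        self.sdi_lookup = sdi_lookup  # invoked on a miss
        self.flows = OrderedDict()
        self.hits = self.misses = 0

    def lookup(self, match):
        if match in self.flows:
            self.flows.move_to_end(match)  # mark as recently used
            self.hits += 1
            return self.flows[match]
        self.misses += 1
        action = self.sdi_lookup(match)
        self.flows[match] = action
        if len(self.flows) > self.capacity:
            self.flows.popitem(last=False)  # evict least recently used
        return action
```

With random matches (as cBench generates), every lookup misses and escalates, which is consistent with the observed drop from ~2700 to ~250 events per second.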
Chapter 7
Conclusions
7.1 Summary
The future ICT landscape will be very diverse and consist of large numbers of public and private cloud options, along with vCPEs and sensors. In light of this, this thesis proposed
a system to perform orchestration over multiple infrastructure types and showed how to bring advanced traffic steering capabilities to these systems. We started with an initial set of requirements for an orchestration system. We then iteratively designed increasingly powerful orchestration systems. We first designed a system that could support orchestration
over a single public cloud. Next, we considered how we could orchestrate over multi-
ple clouds. Then, we considered how to include unmanaged resources. Whereas public clouds already run an RMS, for unmanaged resources we designed a system to bring them under the purview of an RMS. Finally, we considered how to incorporate
container orchestration.
In addition to performing multi-cloud orchestration, we also considered how to bring
SDN capabilities to legacy clouds. Specifically, we achieved this by extending the SDI
manager and Ryu SDN controller to work with legacy systems. This required us to create
point to point tunnels between the virtual machines. The system allows users to choose
between secure or unsecure tunnels, which trade off security for performance overhead.
This thesis consisted of prototyping the above system. We also contributed a YAML
based language for expressing multi cloud topologies. This includes specifying topologies,
how they should be configured and how we can perform advanced traffic steering over
them. We also contributed a parser that could read these topology files and realize the specified topologies. Finally, we created a GUI portal to enable creating service chains.
7.2 Future Work
This thesis considered how to orchestrate over multiple clouds and how to bring SDN
and SDI management to heterogeneous infrastructures. To these ends we developed a
functional solution. This meant that two secondary objectives, performance and ease of use, were not fully achieved. Although we made some effort to realize these objectives,
namely providing a GUI portal for creating service chains, and multi controller topologies,
there is a large potential for future research on these topics.
With regards to performance, we noted that the provisioning of nodes scaled up well.
However, network throughput had very different behavior depending on whether the un-
derlying network throughput was roughly less than or equal to 1Gbps. For VXLAN, one
of the bottlenecks seemed to be the encapsulation and decapsulation of overlay frames.
This reduced bandwidth utilization efficiency (i.e. the fraction of the physical layer net bitrate that translates into actually achieved throughput). One way to reduce this overhead could be to investigate other tunneling protocols like GRE. GRE encapsulates L2 frames in IP packets, and so slightly reduces the header overhead. Another avenue for improving performance
could be to perform encapsulation and decapsulation using specialized hardware, such as
FPGAs and ASICs. Having VXLAN or OpenVPN tunnels provides us with a lot of flex-
ibility. However, we may not necessarily need tunnels to achieve SDN style centralized
and dynamic routing. Specifically, in [40] Vissicchio et. al propose a way to achieve this.
Although, this work may not necessarily be related, it show that centralized control can
be merely emulating it. This would also be an interesting area of future research.
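The VXLAN-versus-GRE header comparison can be made concrete with a small calculation. The header sizes below are the standard ones for outer IPv4 without options; the base GRE header is 4 bytes (optional checksum, key, or sequence fields would add more).

```python
def overlay_efficiency(inner_frame, extra_headers):
    """Fraction of wire bytes that carry the tunnelled frame."""
    return inner_frame / (inner_frame + extra_headers)

# Per-packet encapsulation overhead added on top of the inner
# Ethernet frame (standard header sizes, outer IPv4, no options):
VXLAN = 14 + 20 + 8 + 8   # outer Eth + IP + UDP + VXLAN = 50 bytes
GRE   = 14 + 20 + 4       # outer Eth + IP + base GRE   = 38 bytes

for name, overhead in [("VXLAN", VXLAN), ("GRE", GRE)]:
    # For a 1500-byte inner frame: VXLAN ~96.8%, GRE ~97.5%.
    print(name, round(overlay_efficiency(1500, overhead), 4))
```

The difference per packet is small (12 bytes), which is consistent with the "slightly reduces" qualifier above; the larger VXLAN cost observed in our evaluation comes from the per-packet processing, not the bytes alone.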
The advanced traffic steering we proposed is effective only when the steering logic
operates on communication-layer headers (i.e., up to OSI layer 4, the transport layer).
However, when considering traffic steering, we may want to chain arbitrary network
functions. If data gets passed up to the application layer, then steering based on
OpenFlow does not work: the middleboxes (i.e., the service function nodes) would have to
be programmed to forward the traffic to a specific location, and this is precisely what
takes away the dynamicity and flexibility of our proposed SDN overlay approach. Another
area of future research could be ways to facilitate more complex traffic steering and
service chaining.
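The expressiveness gap can be sketched as a simple check: OpenFlow match fields (the names below are OpenFlow 1.3-era examples) cover headers up to the transport layer, so a policy phrased only in those fields is realizable by flow rules, while an application-layer criterion such as an HTTP URL is not.

```python
# Representative OpenFlow match fields, all at or below L4.
L2_L4_FIELDS = {
    "eth_src", "eth_dst", "eth_type",
    "ipv4_src", "ipv4_dst", "ip_proto",
    "tcp_src", "tcp_dst", "udp_src", "udp_dst",
}

def expressible_in_openflow(policy_fields):
    """True iff every field the steering policy matches on is <= L4."""
    return set(policy_fields) <= L2_L4_FIELDS

print(expressible_in_openflow({"ipv4_dst", "tcp_dst"}))  # L3/L4 steering
print(expressible_in_openflow({"http_url"}))             # needs a middlebox
```

Anything in the second category falls back to middlebox-internal logic, which is exactly where the overlay's flexibility is lost.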
Another area of research relates to the ease of use around multi-cloud orchestration.
For instance, when users want to provision a VM, they have to specify an image
identifier. On AWS, the identifier for a vanilla Ubuntu image is ami-fce3c696, and this
identifier changes across AWS regions, which makes specifying a topology tedious. One
solution would be to create an ontological mapping: the user specifies Ubuntu, and the
system resolves the correct identifier for the given region and provider. This is also
related to the discoverability of services and offerings. For instance, AWS offers spot
instances, which are unused instances auctioned off at a considerably lower hourly price
than on-demand instances. However, if AWS needs those instances, it will preempt them:
spot instances trade stability for reduced price. In some cases, e.g., a Hadoop worker
node or a horizontally scaled web server, having an instance preempted has no effect.
Thus, providing a high-level mapping from user requirements to actual deployments would
be very powerful. Perhaps this could be taken to the point where users specify a
meta-parameter optimization, e.g., deploy a WordPress server while minimizing response
time and maintaining a medium level of redundancy. In summary, pursuing these strands of
research can lead to a greatly improved system.
Bibliography
[1] Amazon Web Services CloudFormation.

[2] Ansible. http://www.ansible.com, 2016. Accessed: 2016-3-10.

[3] Cloudify. http://getcloudify.com, 2016. Accessed: 2016-3-10.

[4] CPqD OpenFlow 1.3 software switch.

[5] OpenContrail.

[6] OpenStack. http://www.openstack.org, 2016. Accessed: 2016-3-10.

[7] SaltStack. http://www.saltstack.com, 2016. Accessed: 2016-3-10.

[8] Terraform.

[9] TOSCA Simple Profile for NFV Version 1.0.

[10] Ravello Systems. Running OpenStack on AWS using DevStack and nested VMs, 2014.
Accessed: 2016-3-10.
[11] Ahmed Amokrane, Mohamed Faten Zhani, Rami Langar, Raouf Boutaba, and Guy
Pujolle. Greenhead: Virtual data center embedding across distributed infrastruc-
tures. IEEE Transactions on Cloud Computing, 1(1):36–49, 2013.
[12] Ilia Baldine, Yufeng Xin, Anirban Mandal, Chris Heermann Renci, Unc-Ch Jeff
Chase, Varun Marupadi, Aydan Yumerefendi, and David Irwin. Networked cloud
orchestration: a geni perspective. In 2010 IEEE Globecom Workshops, pages 573–
578. IEEE, 2010.
[13] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf
Neugebauer, Ian Pratt, and Andrew Warfield. Xen and the art of virtualization. In
ACM SIGOPS Operating Systems Review, volume 37, pages 164–177. ACM, 2003.
[14] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual
Technical Conference, FREENIX Track, pages 41–46, 2005.
[15] Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi, Masayoshi Kobayashi,
Toshio Koide, Bob Lantz, Brian O’Connor, Pavlin Radoslavov, William Snow, et al.
Onos: towards an open, distributed sdn os. In Proceedings of the third workshop on
Hot topics in software defined networking, pages 1–6. ACM, 2014.
[16] Daniel J Bernstein. Cache-timing attacks on aes, 2005.
[17] George S Boolos, John P Burgess, and Richard C Jeffrey. Computability and logic.
Cambridge university press, 2002.
[18] Margaret M Burnett, Marla J Baker, Carisa Bohus, Paul Carlson, Sherry Yang, and
Pieter Van Zee. Scaling up visual programming languages. Computer, 28(3):45–54,
1995.
[19] Nicolas Ferry, Hui Song, Alessandro Rossini, Franck Chauvel, and Arnor Solberg.
Cloud mf: Applying mde to tame the complexity of managing multi-cloud appli-
cations. In Proceedings of the 2014 IEEE/ACM 7th International Conference on
Utility and Cloud Computing, pages 269–277. IEEE Computer Society, 2014.
[20] Peter B Galvin, Greg Gagne, and Abraham Silberschatz. Operating system concepts.
John Wiley & Sons, Inc., 2013.
[21] Craig Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford Uni-
versity, 2009. url: crypto.stanford.edu/craig.
[22] Chuanxiong Guo, Guohan Lu, Helen J Wang, Shuang Yang, Chao Kong, Peng Sun,
Wenfei Wu, and Yongguang Zhang. Secondnet: a data center network virtualization
architecture with bandwidth guarantees. In Proceedings of the 6th International
COnference, page 15. ACM, 2010.
[23] Mohammad Hajjat, Xin Sun, Yu-Wei Eric Sung, David Maltz, Sanjay Rao, Kun-
wadee Sripanidkulchai, and Mohit Tawarmalani. Cloudward bound: planning for
beneficial migration of enterprise applications to the cloud. In ACM SIGCOMM
Computer Communication Review, volume 40, pages 243–254. ACM, 2010.
[24] Scott Hendrickson, Stephen Sturdevant, Tyler Harter, Venkateshwaran Venkatara-
mani, Andrea C Arpaci-Dusseau, and Remzi H Arpaci-Dusseau. Serverless com-
putation with openlambda. In 8th USENIX Workshop on Hot Topics in Cloud
Computing (HotCloud 16), 2016.
[25] IBM. IBM Operating System/360.
[26] Joon-Myung Kang, Thomas Lin, Hadi Bannazadeh, and Alberto Leon-Garcia.
Software-defined infrastructure and the savi testbed. In Testbeds and Research
Infrastructure: Development of Networks and Communities, pages 3–13. Springer,
2014.
[27] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin, and Anthony Liguori. kvm: the
linux virtual machine monitor. In Proceedings of the Linux symposium, volume 1,
pages 225–230, 2007.
[28] Changbin Liu, Boon Thau Loo, and Yun Mao. Declarative automated cloud resource
orchestration. In Proceedings of the 2nd ACM Symposium on Cloud Computing,
page 26. ACM, 2011.
[29] Changbin Liu, Yun Mao, Jacobus Van der Merwe, and Mary Fernandez. Cloud
resource orchestration: A data-centric approach. In Proceedings of the biennial
Conference on Innovative Data Systems Research (CIDR), pages 1–8, 2011.
[30] Jose Luis Lucas-Simarro, Rafael Moreno-Vozmediano, Ruben S Montero, and Igna-
cio M Llorente. Cost optimization of virtual infrastructures in dynamic multi-cloud
scenarios. Concurrency and Computation: Practice and Experience, 27(9):2260–
2277, 2015.
[31] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson,
Jennifer Rexford, Scott Shenker, and Jonathan Turner. Openflow: enabling inno-
vation in campus networks. ACM SIGCOMM Computer Communication Review,
38(2):69–74, 2008.
[32] Jan Medved, Robert Varga, Anton Tkacik, and Ken Gray. Opendaylight: Towards
a model-driven sdn controller architecture. In Proceeding of IEEE International
Symposium on a World of Wireless, Mobile and Multimedia Networks 2014, 2014.
[33] David Melman and Uri Safrai. Network virtualization: A data plane perspective.
2015.
[34] Ben Pfaff, Justin Pettit, Teemu Koponen, Ethan Jackson, Andy Zhou, Jarno Raja-
halme, Jesse Gross, Alex Wang, Joe Stringer, Pravin Shelar, et al. The design and
implementation of open vswitch. In 12th USENIX symposium on networked systems
design and implementation (NSDI 15), pages 117–130, 2015.
[35] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you,
get off of my cloud: exploring information leakage in third-party compute clouds. In
Proceedings of the 16th ACM conference on Computer and communications security,
pages 199–212. ACM, 2009.
[36] Ryu SDN Framework Community. Ryu SDN framework, 2015.
[37] Omar Sefraoui, Mohammed Aissaoui, and Mohsine Eleuldj. Openstack: toward
an open-source solution for cloud computing. International Journal of Computer
Applications, 55(3), 2012.
[38] Rob Sherwood and KK Yap. Cbench controller benchmarker. Last accessed, Nov,
2011.
[39] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric
Tune, and John Wilkes. Large-scale cluster management at google with borg. In
Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM,
2015.
[40] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. Cen-
tral control over distributed routing. ACM SIGCOMM Computer Communication
Review, 45(4):43–56, 2015.
[41] Stefano Vissicchio, Laurent Vanbever, and Jennifer Rexford. Sweet little lies: Fake
topologies for flexible routing. In Proceedings of the 13th ACM Workshop on Hot
Topics in Networks, page 3. ACM, 2014.
[42] Jim Wanderer. Case study: the Google SDN WAN. Computing.co.uk, 11, 2013.
[43] K Yamazaki, Y Nakajima, T Hatano, and A Miyazaki. Lagopus fpga–a repro-
grammable data plane for high-performance software sdn switches. In 2015 IEEE
Hot Chips 27 Symposium (HCS), pages 1–1. IEEE, 2015.
[44] Yuval Yarom and Katrina Falkner. Flush+ reload: a high resolution, low noise, l3
cache side-channel attack. In 23rd USENIX Security Symposium (USENIX Security
14), pages 719–732, 2014.
[45] Qi Zhang, Mohamed Faten Zhani, Shuo Zhang, Quanyan Zhu, Raouf Boutaba, and
Joseph L Hellerstein. Dynamic energy-aware capacity provisioning for cloud comput-
ing environments. In Proceedings of the 9th international conference on Autonomic
computing, pages 145–154. ACM, 2012.
[46] Qi Zhang, Quanyan Zhu, and Raouf Boutaba. Dynamic resource allocation for spot
markets in cloud computing environments. In Utility and Cloud Computing (UCC),
2011 Fourth IEEE International Conference on, pages 178–185. IEEE, 2011.
[47] Mohamed Faten Zhani, Qi Zhang, Gwendal Simona, and Raouf Boutaba. Vdc
planner: Dynamic migration-aware virtual data center embedding for clouds. In
2013 IFIP/IEEE International Symposium on Integrated Network Management (IM
2013), pages 18–25. IEEE, 2013.