Optimal Scaling Configurations for Microservice-Oriented Architectures Using Genetic Algorithms

Tobias Nebaeus

Master's Thesis in Computing Science and Engineering, 30 ECTS Credits
Autumn 2019
Supervisor: Ahmed Ali-Eldin
External supervisor: Daniel Karlsson
Examiner: Henrik Björklund
Master of Science Programme in Computing Science and Engineering, 300 ECTS Credits

Abstract

Genetic algorithms (GAs) are a powerful tool for solving multi-objective optimization problems. Resource allocation and scaling of cloud systems typically involve multiple conflicting objectives, such as high throughput in the presence of failures, cost, and reduced latency. Microservice-based architectures introduce additional complexities since the underlying services respond differently to different workloads. In this work, the performance of two multi-objective GAs is compared on the problem of finding efficient scaling configurations of a microservice-based architecture. Results show that while the use of GAs is effective at finding efficient configurations, GAs cannot be used for larger systems involving many microservices or for systems that make use of caching.

Acknowledgements

I would like to thank Ahmed Ali-Eldin and Jerry Eriksson for their feedback and guidance. I would also like to thank my external supervisor Daniel Karlsson, as well as Nasdaq for providing a place to work and excellent coffee. Finally, I would like to thank Daniel Harr, without whose cooperation this thesis would not have been possible.

Contents

1 Background
  1.1 Introduction
  1.2 Goals
  1.3 Problem Statement
  1.4 Introduction to Microservices
  1.5 Genetic Algorithms and Resource Allocation
  1.6 Related Work

2 Theory
  2.1 Multi-Objective Optimization
  2.2 Genetic Algorithm
    2.2.1 Chromosome Representation
    2.2.2 Population
    2.2.3 Crossover
    2.2.4 Selection
    2.2.5 Mutation
    2.2.6 New Generations
  2.3 NSGA-II
  2.4 NSGA-III

3 Method
  3.1 System Overview and Algorithm Implementation
  3.2 Objective Evaluation
  3.3 Caching
  3.4 Developed Test Service
  3.5 Test Procedure
  3.6 Algorithm Evaluation Metrics
  3.7 Test Environment
  3.8 Parameters
  3.9 Constraints and Invalid Solutions
  3.10 Limitations

4 Results
  4.1 Tests Using File Caching
  4.2 Tests Not Using File Caching
  4.3 Run Time
  4.4 Sock Shop and the Minium System

5 Conclusion and Discussion
  5.1 Conclusion
  5.2 Discussion
  5.3 Future Work

References

Glossary

Container Similar to a lightweight virtual machine, but runs on top of a container engine instead of a hypervisor.

GA Genetic Algorithm.

MOEA Multi-objective Evolutionary Algorithm.

NSGA-II Nondominated Sorting Genetic Algorithm II.

Pareto front The set of solutions to a multi-objective optimization problem where each solution cannot be improved with respect to one objective without impairing another.

Pareto frontier See Pareto front.

Pareto set See Pareto front.

Replica An instance of a service, running inside a container.

1 Background

This chapter introduces the main goals of the thesis and its research questions. It provides an introduction to microservices and a motivation for using the algorithms that are to be examined.

1.1 Introduction

Genetic algorithms (GAs) are a powerful tool for solving optimization problems. Some of their advantages compared to other methods are that they

• support multi-objective optimization

• are good for noisy environments

• are inherently parallel

• can find near-optimal solutions when an optimal solution cannot be found.

This makes them suitable for optimizing cloud systems with multiple objectives such as memory or CPU footprint, response time, or throughput.

Scaling microservice-oriented architectures efficiently is not a simple task. Dependencies between the services and the differences in resource usage make it hard to determine the optimal number of instances for each service. Also, different types of requests may result in a completely different load between the involved services. An efficient scaling configuration will maximize performance and quality of service with respect to the expected types and frequencies of requests, while minimizing resource usage and cost.

This project implements two genetic algorithms to optimize the scaling configuration of microservice-oriented architectures and compares their performance. The project is done in cooperation with Nasdaq, and one of the goals of the project is to use the methods of optimization on their Minium system in order to discover efficient scaling configurations.

1.2 Goals

The main goal of the thesis is to evaluate the effectiveness of using genetic algorithms to find optimal scaling configurations of a microservice-based system developed by Nasdaq. This system will be referred to as the "Minium system" in this thesis. The system is deployed using Kubernetes, a popular platform for managing containerized workloads and services. Because the expected load of the system at a certain point in time is known beforehand, no dynamic scaling mechanism or auto-scaler is needed. As the system consists of a number of microservices which can be scaled independently and which are taxed differently depending on the type and number of requests, a static scaling solution needs to be able to determine how to scale each microservice for each expected load pattern. Doing this manually is time-consuming and ineffective, as the optimal configuration might change as the underlying services change during development. Thus, the goal is to develop a method which can find optimal scaling configurations of each microservice in the system for a certain expected load. This means finding the optimal number of instances of each microservice; in Kubernetes terms, the optimal number of replicas of each pod. Genetic algorithms provide a way to do this without having knowledge about the underlying services' dependencies and interactions.

A secondary goal is to examine the possibility of adding other resources to a scaling configuration, such as the amount of memory or CPU allocated to each service.

1.3 Problem Statement

This thesis aims to find answers to the following questions:

• Are genetic algorithms a viable method for the purpose of finding optimal scaling configurations for microservice-based applications?

• Are there any requirements or constraints on the microservice architecture to make the method viable?

• Which, of the more popular variants of GAs, is best suited to the task?

The algorithms will be compared using the following metrics:

• Convergence to a pareto-optimal front

• Width and spread of the generated pareto front

• The number of evaluations required for convergence

1.4 Introduction to Microservices

The idea of microservices is a software architectural pattern that has evolved from service-oriented architectures (SOA), and it is becoming more popular as companies move their infrastructure to the cloud. Service-oriented architectures consist of a number of discrete application components, implemented as services that can be accessed remotely through a communication protocol over a network.

Microservices take the idea of SOA one step further, with fine-grained services and lightweight, technology-agnostic protocols. The main reason for adopting this architectural style is to ease development, testing and deployment, rather than to improve performance. The concept of microservices as it is used today is still fairly new at the time of writing, and the current microservice literature is very sparse [8, p. ix]. Microservices are still an evolving idea, and the definition of what a microservice architecture is varies.

Some of the typically cited benefits of using microservices are

• Since technology-agnostic protocols are used, each service can be implemented using the programming language best suited to that particular service.

• Services can be developed by small autonomous teams independently and in parallel

• They enable continuous delivery and deployment

• Microservices can be scaled independently and thus promote elasticity

Wolff argues that the most important difference between SOA and microservices is the level at which the architecture aims. Where SOA considers the entire enterprise, microservices represent an architecture for an individual system at the project level. In SOA, the integration solution is also responsible for orchestrating the services, while in a microservice-based architecture the integration solution does not possess any intelligence [19, pp. 81-82].

An application that is not composed of a set of services following the SOA or microservice architectural pattern is typically referred to as a monolith. Figure 1.1 shows a typical monolith architecture [8]. Incoming requests are handled by the frontend through a load-balancing layer which splits the load evenly among the backend servers. The backend is deployed as a single application on multiple servers. Any change to the application, even a small bug fix, would require rebuilding the entire application and redeploying it on each server.

Figure 1.1: A monolith architecture. The features A, B and C are implemented as a single, tightly coupled monolith.

Figure 1.2 shows a similar system, but implemented using a microservice architecture. The features A, B and C are each implemented as independent microservices. A change to a small part of the system, say microservice C, would only require rebuilding and redeploying that microservice.

Figure 1.2: A microservice architecture. Each feature is implemented as an independent service.

1.5 Genetic Algorithms and Resource Allocation

Multi-objective evolutionary algorithms (MOEAs), and especially NSGA-II [17], have been used with good results for other resource management optimization problems in cloud architectures [10][21]. This motivates the choice of NSGA-II as the main algorithm to examine in this thesis. A close alternative that will also be examined is NSGA-III, which is more recent and was developed to better support many objectives [6].

Modeling the problem of scaling cloud-deployed microservice architectures as a resource allocation problem makes it possible to use the MOEA approach to find optimal scaling configurations.

1.6 Related Work

Similar studies using NSGA-II to optimize the container allocation of microservices have been done by Guerrero et al. [10]. Their study uses configurations that specify the number of replicas of each container as well as their allocation to a physical machine. The study relies on estimates for measurements of objective values, which allows for greater population sizes and higher numbers of generations. The good results presented in their paper motivate the use of NSGA-II in this thesis. Metaheuristic algorithms, which include GAs, have generally been successful at solving cloud-related optimization problems [22][2][18][4].

Metaheuristic algorithms have also been used in dynamic scaling solutions. Chen and Bahsoon suggest an adaptive auto-scaling solution based on a multi-objective ant colony algorithm [5].

While generating a pareto-optimal front for the given problem is the main focus of this thesis, in a practical application, solutions will be chosen from this front according to some trade-off preference. Grierson suggests a multi-criteria decision making (MCDM) strategy, PEG-MCDM, which solves the problem of choosing the best solution from the pareto-optimal front according to a preferred trade-off [9].

2 Theory

This chapter describes the relevant theory behind the optimization methods. The first section explains the concept of multi-objective optimization, which differs from single-objective optimization in that the result is a set of solutions rather than one optimal solution. In the second section, the concept of a genetic algorithm is explained in detail, along with the parts of a genetic algorithm which can be tuned or implemented in various ways. The two main algorithms used in the thesis, NSGA-II and NSGA-III, are thoroughly explained.

2.1 Multi-Objective Optimization

Single-objective optimization deals with the problem of finding a single, optimal solution which maximizes or minimizes a certain function. Usually, minimization is implied, and unless stated otherwise, this will be the case in this thesis. While single-objective optimization can handle multiple objectives by, for example, using a weighted sum of all objectives, this requires finding optimal values for those weights. Multi-objective optimization considers all objectives separately, and produces a set of solutions rather than a single one. The obtained solutions represent those that are optimal under some trade-off preference.

If the number of objectives exceeds three, the term many-objective optimization is often used. A reason for making this distinction is that many algorithms that deal with multi-objective optimization, such as NSGA-II, suffer from performance issues when the number of objectives increases beyond this point [6].

A central concept in multi-objective optimization is that of domination. A solution y′′ dominates another solution y′, written y′′ ≺ y′, if

1. fi(y′′) ≤ fi(y′) for all objective functions fi ∈ f, and

2. fi(y′′) < fi(y′) for at least one fi ∈ f.

A solution y∗ is pareto optimal if there exists no y′ ∈ Ω such that y′ ≺ y∗, where Ω is the set of all solutions to the problem. In other words, there is no other solution that has a better value for one objective without also being worse at another objective. The pareto optimal set is the set of all pareto optimal solutions, P∗ = {y ∈ Ω : there exists no y′ ∈ Ω with y′ ≺ y}. The pareto front is simply the objective function values for those solutions, PF∗ = {f(y) | y ∈ P∗}. Figure 2.1 depicts a pareto-optimal front for two objective functions f1 and f2.

Figure 2.1: A pareto-optimal front. The front is the set of marked solutions.

The goal of multi-objective optimization is to find, or approximate, the pareto front, as it represents the best solutions to the problem. A solution can then be chosen from the front, depending on what trade-off is desired.
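
To make these definitions concrete, the following Python sketch (an illustration, not the thesis implementation) tests domination between two objective vectors and extracts the pareto front of a finite set of solutions:

    def dominates(f_a, f_b):
        """Return True if objective vector f_a dominates f_b (minimization)."""
        return (all(a <= b for a, b in zip(f_a, f_b))
                and any(a < b for a, b in zip(f_a, f_b)))

    def pareto_front(objectives):
        """Return the non-dominated subset of a list of objective vectors."""
        return [f for f in objectives
                if not any(dominates(g, f) for g in objectives if g is not f)]

    # Example with two objectives, both minimized: (3.0, 3.0) is dominated
    # by (2.0, 2.0), so only the first two points form the front.
    points = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0)]
    print(pareto_front(points))  # [(1.0, 4.0), (2.0, 2.0)]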

2.2 Genetic Algorithm

Genetic algorithms were originally introduced by John Holland in 1973 [11], based on the idea of natural selection. The basic idea is to keep a population of chromosomes, which represent solutions to a problem, and to create new generations by re-combining the fittest members of the previous population, while also mutating some of them in order to increase diversity. Algorithm 1 describes the basic structure of a genetic algorithm [15].

Algorithm 1 Genetic Algorithm

    Choose an initial population of chromosomes
    while termination condition not satisfied do
        repeat
            if crossover condition satisfied then
                select parent chromosomes
                choose crossover parameters
                perform crossover
            if mutation condition satisfied then
                select chromosome(s) for mutation
                perform mutation
            evaluate fitness of offspring
        until sufficient offspring created
        select new population
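
Expressed in code, the skeleton of Algorithm 1 might look like the sketch below; the operator arguments are placeholders for the mechanisms discussed in the following subsections, and none of the names are taken from the thesis implementation:

    import random

    def genetic_algorithm(init_population, fitness, select_parents, crossover,
                          mutate, select_next, crossover_rate=0.9,
                          mutation_rate=0.1, generations=100):
        """Minimal GA skeleton following the structure of Algorithm 1."""
        population = init_population()
        for _ in range(generations):                 # termination condition
            offspring = []
            while len(offspring) < len(population):  # until sufficient offspring
                if random.random() < crossover_rate:
                    p1, p2 = select_parents(population, fitness)
                    children = list(crossover(p1, p2))
                else:
                    children = [random.choice(population)]
                if random.random() < mutation_rate:  # mutation condition
                    children = [mutate(c) for c in children]
                offspring.extend(children)
            # Offspring fitness is evaluated inside select_next.
            population = select_next(population, offspring, fitness)
        return population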

A genetic algorithm contains many mechanisms which can be implemented in various ways, such as how individuals are combined to produce offspring. Some of the most important are the following:

• Chromosome
  – Binary or real-coded
  – Length

• Population
  – Size
  – How the initial population is generated: randomly or using some scheme to maximize diversity

• Crossover
  – Number of children
  – Uniform, single- or multi-point

• Selection
  – Tournament, truncated, or based on a probability distribution
  – Selection pressure and selection intensity

• Mutation
  – Mutation rate
  – Which, and the number of, genes to mutate

• New generations
  – Elitism and population overlap

2.2.1 Chromosome Representation

A chromosome is a representation of one possible solution. One of the most common representations is the binary representation, where the chromosome consists of a binary string. For example, the solution x = 3 might be represented as 00000011, assuming numbers are 8-bit. Finding an effective representation depends largely on the problem. For example, if the search space is the integers between 0 and 255, the mentioned binary representation using 8-bit numbers covers the entire search space. Also, solutions outside the search space are not possible with this representation, and thus crossover and mutation operations cannot generate such solutions. If the chosen representation allows for invalid solutions, these need to be handled. This can be done in multiple ways, such as modifying the crossover and mutation operators to only generate valid solutions, or repeating them until the result is valid.
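
As a small, self-contained illustration of this 8-bit binary representation:

    def encode(x):
        """Encode an integer in [0, 255] as an 8-bit binary chromosome."""
        return format(x, "08b")

    def decode(bits):
        """Decode an 8-bit binary chromosome back into an integer."""
        return int(bits, 2)

    assert encode(3) == "00000011"
    assert decode("00000011") == 3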

2.2.2 Population

Since memory and other computational resources are not limitless, the population needs to be kept at a constant size between generations. A large population size increases diversity and evaluates a larger part of the search space each generation, but is more computationally heavy. While it is possible to calculate the optimal size of the population for a given problem, it has been shown that in practice, a size of 20-30 is usually sufficient [15].

The initial population needs to be generated, and this can be done either randomly or using some kind of scheme to maximize the search space covered by the chromosomes' genes. Non-random generation usually aims to make sure that every possible value of each gene is included in the initial population, so that every combination of genes can be reached by crossover from the initial state. Reeves and Rowe suggest a scheme based on a Latin hypercube to accomplish this [15].

2.2.3 Crossover

Crossover or recombination is the mechanism by which two chromosomes combine in order to produce offspring. This is done by choosing the value of each gene of the offspring from either the first parent or the second, in a randomized fashion. Since this is a binary choice, a second offspring can be created by simply making the opposite choices. This creates an offspring that contains all the gene values that were not chosen for the first. The simplest forms of crossover are called single-point crossover, n-point crossover, and uniform crossover. Single-point crossover is a special case of n-point crossover using only one crossover point. Suppose we have two chromosomes a and b, each consisting of 8 variables or genes:

(a1,a2,a3,a4,a5,a6,a7,a8) and (b1,b2,b3,b4,b5,b6,b7,b8).

To perform single-point crossover, a crossover point p is chosen randomly from the numbers 1, ..., 7. To create the new offspring, gene values in positions 1 to p are chosen from the first parent, and values in positions (p+1) to 8 are chosen from the second parent. Figure 2.2a illustrates single-point crossover with p = 3. N-point crossover works in a similar fashion, using N randomly chosen crossover points p. Figure 2.2b shows a 2-point crossover with p1 = 2 and p2 = 5.

Figure 2.2: Different forms of crossover: (a) single-point crossover, (b) two-point crossover, (c) uniform crossover.

If, instead of using crossover points, each gene value is chosen randomly from either the first or second parent, the operation is called uniform crossover. This is depicted in Figure 2.2c. Uniform crossover has no positional bias, since neighboring genes are no more likely to be inherited together than any other genes. Single-point crossover has considerable positional bias. For example, for the chromosomes shown in Figure 2.2a there is only one crossover point, p = 1, which results in a1 being chosen but not a2, out of a total of 7 possible crossover points. On the other hand, no valid crossover point results in both a1 and a8 being chosen together. This is obviously a problem if the combination of a1 and a8 is what gives chromosome a a high fitness value. Holland suggests using an operator that re-orders the genes in a chromosome before performing crossover, thus altering the proximity of the genes and alleviating the problem of positional bias.
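
The crossover variants above can be written compactly, as in the following sketch (illustrative; not the exact operators used in the thesis):

    import random

    def n_point_crossover(a, b, n=1):
        """n-point crossover; n = 1 gives single-point crossover."""
        points = sorted(random.sample(range(1, len(a)), n))
        child1, child2, start, take = [], [], 0, 0
        for p in points + [len(a)]:
            child1 += (a if take == 0 else b)[start:p]
            child2 += (b if take == 0 else a)[start:p]
            start, take = p, 1 - take
        return child1, child2

    def uniform_crossover(a, b):
        """Choose each gene value randomly from either parent."""
        child1, child2 = [], []
        for ga, gb in zip(a, b):
            first, second = (ga, gb) if random.random() < 0.5 else (gb, ga)
            child1.append(first)
            child2.append(second)
        return child1, child2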

SBX, Simulated Binary Crossover, is a crossover operator that overcomes some of the problems of using binary crossover when the search space is continuous [1]. This operator is often used in GAs, such as NSGA-III [6]. SBX is used in the implementations of NSGA-II and NSGA-III in this thesis, as it proved to be more effective than simple binary crossover at generating a more diverse set of solutions.
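
For reference, a common formulation of SBX for a single pair of real-valued genes, following Deb and Agrawal's description [1], is sketched below; eta is the distribution index (larger values produce children closer to the parents), and bounds handling is omitted for brevity:

    import random

    def sbx_pair(x1, x2, eta=15.0):
        """Simulated binary crossover for one pair of real-valued genes."""
        u = random.random()
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1 = 0.5 * ((1.0 + beta) * x1 + (1.0 - beta) * x2)
        c2 = 0.5 * ((1.0 - beta) * x1 + (1.0 + beta) * x2)
        return c1, c2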

2.2.4 Selection

The process of selecting the parents for the next generation is not a straightforward procedure, due to conflicting objectives. There are two main goals:

• Selecting the best individuals, in order to keep the best genes in the population and to produce better offspring.

• Maintaining and increasing diversity, in order to effectively explore the search space and not get stuck in local optima.

These two goals are in conflict, as it is necessary to include individuals with lower fitness in order to increase diversity. Most of the methods used for selection fall into one of two categories: tournament-based, or based on a probability distribution. Tournament selection consists of randomly selecting a number of individuals and selecting the "winner", the one with the highest fitness, for crossover. This is repeated until the desired number of individuals have been chosen. Roulette wheel selection is one of the more popular methods using a probability distribution. In this method, the individuals of the current population are assigned a probability of being selected that is proportional to their normalized fitness value.
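
Both categories can be sketched in a few lines; note the assumptions in the comments (the tournament variant is written for minimization, the roulette wheel variant for maximization with non-negative fitness values):

    import random

    def tournament_selection(population, fitness_values, k=2):
        """Pick k random individuals and return the fittest (minimization)."""
        contestants = random.sample(range(len(population)), k)
        winner = min(contestants, key=lambda i: fitness_values[i])
        return population[winner]

    def roulette_wheel_selection(population, fitness_values):
        """Select with probability proportional to normalized fitness."""
        total = sum(fitness_values)
        weights = [f / total for f in fitness_values]
        return random.choices(population, weights=weights, k=1)[0]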

2.2.5 Mutation

A mutation operator changes the values of the genes of chromosomes randomly. One of the benefits of this operation is that solutions can be generated that cannot be reached by repeated crossover alone. For example, if the best solution is 10000000, and the initial population does not contain a chromosome with the value 1 in the first position, the best solution cannot be reached by crossover.

Mutation typically involves three random choices:

• Choosing which chromosomes to mutate

• For each chromosome selected for mutation, choose which genes to mutate

• For each selected gene, randomly choose a new allele value

If the chromosome uses a binary string representation, the chosen bits (genes) can simply be set to their complement, and a random choice of gene value is not necessary.
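
The random choices above translate into a short operator; the following sketch handles both the binary case (bit flip) and a general allele set, with illustrative default parameters:

    import random

    def mutate(chromosome, mutation_rate=0.05, alleles=(0, 1)):
        """Mutate each gene independently with probability mutation_rate."""
        mutated = list(chromosome)
        for i in range(len(mutated)):
            if random.random() < mutation_rate:
                if set(alleles) == {0, 1}:
                    mutated[i] = 1 - mutated[i]   # binary: flip the bit
                else:
                    mutated[i] = random.choice(
                        [a for a in alleles if a != mutated[i]])
        return mutated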

2.2.6 New Generations

The most basic way of creating new populations is to perform crossover on randomly chosen individuals from the set obtained by selection, until enough individuals have been generated to fill the new population. This has the downside that if the parents had better fitness than the children, the parent solutions are lost, potentially resulting in a worse population. To keep this from happening, instead of generating new populations by crossover alone, the best individuals from the old population can be included in the new population. This concept is referred to as elitism. For single-objective GAs, Reeves and Rowe suggest keeping the best individual and re-inserting it into the new population if none of the offspring have a better fitness, replacing one of the children. For multi-objective GAs such as NSGA-II, since the end result is a set of solutions rather than a single one, only keeping one solution means potentially losing other non-dominated solutions of the pareto front. NSGA-II, for example, solves this by merging the offspring population with the previous population into a population of size 2N, and selecting the best N individuals from that population [7].

2.3 NSGA-II

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is a common multi-objective optimization algorithm based on non-dominated sorting. It is an improvement over the original NSGA algorithm, addressing the main criticisms of NSGA: slow non-dominated sorting, lack of elitism, and the need for a sharing parameter [7]. The first central mechanism of the NSGA-II algorithm is the non-dominated sort, which sorts the population into a number of fronts according to non-dominance. The first front is the pareto-optimal front of the set of all individuals in the population, that is, the individuals which are not dominated by any other individuals. The second front is the pareto-optimal front of the remaining individuals when the first front is removed, and so on. The original NSGA-II paper suggests a sorting algorithm, the fast non-dominated sort, shown in Algorithm 2. While better algorithms have since been developed for non-dominated sorting [16], the original algorithm is used in the NSGA-II implementation of this thesis.

Algorithm 2 Fast non-dominated sort(P) [7]

    Input: Population P
    Output: List of fronts F
    for each p ∈ P do
        Sp = ∅
        np = 0
        for each q ∈ P do
            if p ≺ q then                 # if p dominates q
                Sp = Sp ∪ {q}             # add q to the set of solutions dominated by p
            else if q ≺ p then
                np = np + 1               # increment the domination counter of p
        if np = 0 then                    # p belongs to the first front
            prank = 1
            F1 = F1 ∪ {p}
    i = 1                                 # initialize the front counter
    while Fi ≠ ∅ do
        Q = ∅                             # used to store the members of the next front
        for each p ∈ Fi do
            for each q ∈ Sp do
                nq = nq − 1
                if nq = 0 then            # q belongs to the next front
                    qrank = i + 1
                    Q = Q ∪ {q}
        i = i + 1
        Fi = Q
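
A direct Python transcription of Algorithm 2 could look like the sketch below (the domination test from Section 2.1 is repeated here to keep the sketch self-contained; it is an illustration, not the thesis implementation):

    def dominates(a, b):
        """True if objective vector a dominates b (minimization)."""
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def fast_non_dominated_sort(objectives):
        """Sort objective vectors into fronts F1, F2, ..., as in Algorithm 2."""
        n = len(objectives)
        dominated_by = [[] for _ in range(n)]   # S_p: solutions p dominates
        counts = [0] * n                        # n_p: number dominating p
        fronts = [[]]
        for p in range(n):
            for q in range(n):
                if dominates(objectives[p], objectives[q]):
                    dominated_by[p].append(q)
                elif dominates(objectives[q], objectives[p]):
                    counts[p] += 1
            if counts[p] == 0:
                fronts[0].append(p)             # p belongs to the first front
        i = 0
        while fronts[i]:
            next_front = []
            for p in fronts[i]:
                for q in dominated_by[p]:
                    counts[q] -= 1
                    if counts[q] == 0:          # q belongs to the next front
                        next_front.append(q)
            fronts.append(next_front)
            i += 1
        return fronts[:-1]                      # drop the trailing empty front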

The second central mechanism of NSGA-II is the crowding distance assignment algorithm, shown in Algorithm 3. Unlike the non-domination sort, its main purpose is to maximize diversity and spread among the population. The algorithm assigns a crowding distance value to each solution, where a larger value is considered better since it promotes diversity by being "further away" from other solutions. Crowding distance is calculated, for each objective, by first assigning boundary solutions (solutions with the smallest and largest values) infinite distance. Intermediate solutions are then assigned distance values equal to the absolute normalized difference in the objective values of the two adjacent solutions. The overall crowding distance value of a solution is then calculated as the sum of the distance values of each objective. Normalized objective values are used in the algorithm. Crowding distance is used in NSGA-II to compare two solutions of equal non-domination rank, where a solution with a larger crowding distance is considered better.

Algorithm 3 Crowding distance assignment(I) [7]

    Input: Nondominated set of solutions I
    Output: Crowding distance values Idistance
    l = |I|                               # number of solutions in I
    for each i do
        I[i]distance = 0                  # initialize distance
    for each objective m do
        I = sort(I, m)                    # sort using objective value
        I[1]distance = I[l]distance = ∞   # so that boundary points are always selected
        for i = 2 to (l − 1) do           # for all other points
            I[i]distance = I[i]distance + (I[i+1].m − I[i−1].m) / (f_m^max − f_m^min)
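
In code, the crowding distance computation might look as follows; the sketch assumes a front of at least two solutions and is illustrative rather than the thesis implementation:

    def crowding_distance(objectives):
        """Crowding distance for the objective vectors of one front."""
        l, m = len(objectives), len(objectives[0])
        distance = [0.0] * l
        for obj in range(m):
            order = sorted(range(l), key=lambda i: objectives[i][obj])
            f_min = objectives[order[0]][obj]
            f_max = objectives[order[-1]][obj]
            # Boundary points are always selected.
            distance[order[0]] = distance[order[-1]] = float("inf")
            if f_max == f_min:
                continue            # all solutions share this objective value
            for k in range(1, l - 1):
                gap = (objectives[order[k + 1]][obj]
                       - objectives[order[k - 1]][obj])
                distance[order[k]] += gap / (f_max - f_min)
        return distance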

The main loop of NSGA-II is shown in Algorithm 4. Elitism is achieved by combining the previous population with its offspring, creating a set Rt = Pt ∪ Qt containing 2N individuals, where N is the population size. Rt is then sorted using the non-dominated sort algorithm. This results in a sequence of fronts, each front containing the solutions that dominate the following fronts. A new population Pt+1 is chosen first by including all solutions from every front Fi in sequence that can be fitted within the population size N. For the last front, crowding distance is used to choose solutions within the front that maximize diversity. The resulting population is then used to create an offspring population Qt+1 by using the usual mechanisms of selection, crossover and mutation. NSGA-II uses binary tournament selection and a comparison operator, called the crowded-comparison operator ≺n, based on the solutions' non-domination ranks as assigned by the non-dominated sort algorithm and the crowding distance value:

    (i ≺n j) if (irank < jrank)
             or ((irank = jrank) and (idistance > jdistance)).

Algorithm 4 NSGA-II main loop [7]

    Rt = Pt ∪ Qt                                 # combine parent and offspring population
    F = fast-non-dominated-sort(Rt)              # F = (F1, F2, ...), all nondominated fronts of Rt
    Pt+1 = ∅
    i = 1
    repeat
        crowding-distance-assignment(Fi)         # calculate crowding distance in Fi
        Pt+1 = Pt+1 ∪ Fi                         # include ith nondominated front in the parent pop
        i = i + 1                                # check the next front for inclusion
    until |Pt+1| + |Fi| ≥ N                      # until the parent population is filled
    Sort(Fi, ≺n)                                 # sort in descending order using ≺n
    Pt+1 = Pt+1 ∪ Fi[1 : (N − |Pt+1|)]           # choose the first (N − |Pt+1|) elements of Fi
    Qt+1 = make-new-pop(Pt+1)                    # use selection, crossover and mutation to create a new population Qt+1
    t = t + 1

2.4 NSGA-III

NSGA-III is a many-objective follow-up to the NSGA-II algorithm, designed to better handle many objectives, generally more than three. Some of the difficulties with previous MOEAs that it aims to solve are

• A large fraction of the population is non-dominated

• Evaluation of diversity measure becomes computationally expensive

• Recombination operation may be inefficient

To alleviate these difficulties, NSGA-III uses a predefined multiple targeted search, with a set of reference points. By emphasizing solutions that correspond to each reference point, a widely distributed set of pareto-optimal points can be obtained [6].

The NSGA-III algorithm is rather complicated compared to NSGA-II, and this section only presents the key ideas and operations due to space constraints. For the full details of the algorithm, see the original paper [6].

The main loop of NSGA-III is very similar to that of NSGA-II as described in the previous section. The population is sorted into non-dominated levels, and the first fronts are included in the next population until the next front is too large to fit and needs to be sorted using some other method. In NSGA-II, this is accomplished using the crowding distance metric. For NSGA-III, this procedure is replaced by a method based on reference points. A simplified listing of this part of the algorithm is shown in Algorithm 5.

Algorithm 5 Reference point-based selection [6]

    Compute the number of points K to be chosen for the next population Pt+1 from the next front Fl
    Normalize objectives of the individuals chosen so far and the next front with respect to the reference points using Normalize
    Associate each individual with a reference point using Associate
    Compute the niche count of each reference point
    Choose K members one at a time from Fl using Niching

The fitness values (objectives) of each individual in the next front are normalized with respect to the set of reference points. This procedure is shown in detail in Algorithm 6. Individuals are then associated with a reference point using the Associate procedure shown in Algorithm 7. This association is stored in both directions: each individual is associated with a unique reference point, while each reference point is associated with a number of individuals.

Algorithm 6 Normalize [6]

    Input: St (set of individuals), Zs (structured points) or Za (supplied points)
    Output: fn (normalized objectives), Zr (reference points on the normalized hyperplane)
    for j = 1 to M do
        Compute the ideal point: zmin_j = min_{s ∈ St} fj(s)
        Translate objectives: f'j(s) = fj(s) − zmin_j for all s ∈ St
        Compute extreme points: z(j,max) = s : argmin_{s ∈ St} ASF(s, wj),
            where wj = (ε, ..., ε)^T with ε = 10^−6 and wj_j = 1,
            and ASF(x, w) = max_{i=1,...,M} f'i(x)/wi for x ∈ St
    Compute intercepts aj for j = 1, ..., M
    Normalize objectives (fn) using the equation
        fn_i(x) = f'i(x)/(ai − zmin_i) = (fi(x) − zmin_i)/(ai − zmin_i), for i = 1, 2, ..., M
    if Za is given then
        Map each (aspiration) point onto the normalized hyperplane using the equation above
        and save the points in the set Zr
    else
        Zr = Zs

Algorithm 7 Associate [6]

    Input: Zr (reference points), St (set of individuals)
    Output: π(s ∈ St) (associated reference point), d(s ∈ St) (distance to reference point)
    for each reference point z ∈ Zr do
        Compute reference line w = z
    for each s ∈ St do
        for each w ∈ Zr do
            Compute d⊥(s, w) = ||s − (w^T s / ||w||²) w||
        Assign π(s) = w : argmin_{w ∈ Zr} d⊥(s, w)
        Assign d(s) = d⊥(s, π(s))

Niche counts of the reference points are then computed, that is, the number of individuals associated with each reference point. Individuals are then chosen one at a time until the new population has been filled, using the Niching procedure shown in Algorithm 8. Note that where NSGA-II uses binary tournament selection, NSGA-III has no explicit selection operation, since this is essentially included in the Niching procedure.

Algorithm 8 Niching [6]

    Input: K (number of individuals to select),
           ρj (number of individuals associated with reference point j),
           π(s ∈ St) (associated reference point), d(s ∈ St) (distance to reference point),
           Zr (reference points), Fl (front of individuals)
    Output: Pt+1 (next population)
    k = 1
    while k ≤ K do
        Jmin = {j : argmin_{j ∈ Zr} ρj}
        j = random(Jmin)
        Ij = {s : π(s) = j, s ∈ Fl}
        if Ij ≠ ∅ then
            if ρj = 0 then
                Pt+1 = Pt+1 ∪ {s : argmin_{s ∈ Ij} d(s)}
            else
                Pt+1 = Pt+1 ∪ {random(Ij)}
            ρj = ρj + 1, Fl = Fl \ {s}
            k = k + 1
        else
            Zr = Zr \ {j}

Deb and Jain suggest several methods of generating or supplying reference points for the algorithm, and suggest a number of reference points roughly equal to the population size [6]. In the NSGA-III implementation of this thesis, the reference points are generated in a uniform fashion along the normalized hyperplane generated in the Normalize procedure.
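
One standard way to place the points uniformly is the simplex-lattice construction of Das and Dennis that the NSGA-III paper builds on; the sketch below is one possible implementation, and it is an assumption that the thesis generates its reference points in exactly this way:

    from itertools import combinations

    def reference_points(m, p):
        """All points on the unit simplex in m objectives with p divisions."""
        points = []
        # Stars-and-bars: choose m-1 divider positions among p+m-1 slots.
        for dividers in combinations(range(p + m - 1), m - 1):
            coords, prev = [], -1
            for d in dividers:
                coords.append((d - prev - 1) / p)
                prev = d
            coords.append((p + m - 2 - prev) / p)
            points.append(coords)
        return points

    # For 3 objectives and p = 4 divisions this yields C(6, 2) = 15 points.
    print(len(reference_points(3, 4)))  # 15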

It should be noted that at the time of writing, the source code of the original NSGA-III algorithm has not been released by the authors, and because of this, NSGA-III implementations often differ, which can make comparisons difficult [13].

3 Method

This chapter describes the test environment and framework, how objective values are measured, as well as the metrics used for performance evaluation.

In order to evaluate the algorithms, two different test systems are used:

• A developed test system, where each service either delegates requests to another service or does a predefined amount of work per request. This makes it possible to know the expected optimal configurations of the system beforehand.

• Sock Shop, a demo application for testing microservice technologies.

The tests are run on the Google Cloud Platform using their Kubernetes service GKE, Google Kubernetes Engine. Evaluation of the algorithms' performance is done using metrics that have been used in previous work, capturing several aspects:

• The extent of convergence to a Pareto-optimal set and distribution

• The extent of spread among the obtained solutions

3.1 System Overview and Algorithm Implementation

The algorithms and test framework are implemented as two separate modules, as shown in Figure 3.1.

Figure 3.1: A system overview of how the algorithms are integrated with the testing framework and test system.

The interface between the algorithms and the test tool is a simple function, evaluate, which takes a representation of a configuration and returns a list of evaluated objective values. This function is called whenever the fitness of an individual of the GAs is to be evaluated. If a configuration has not been evaluated previously, the test tool applies the configuration to the target system using the Kubernetes API and then waits until the system is re-configured. It then performs a load test against the system, interprets the resulting performance metrics as objective values and returns them to the caller. The objective values for the configurations are cached in memory so that subsequent evaluations of the same configuration can be returned without the need for another test. The test tool also supports optional file caching, which is used for repeated tests on the same data set.
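
A sketch of what such an evaluate interface with in-memory and optional file caching could look like is shown below; the names and the cache format are illustrative assumptions, not the thesis implementation:

    import json

    class CachingEvaluator:
        """Caches objective values so repeated configurations are not re-tested."""

        def __init__(self, run_load_test, cache_file=None):
            self._run_load_test = run_load_test   # configuration -> objectives
            self._cache_file = cache_file
            self._cache = {}
            if cache_file:
                try:
                    with open(cache_file) as f:
                        self._cache = {tuple(json.loads(k)): v
                                       for k, v in json.load(f).items()}
                except FileNotFoundError:
                    pass                          # start with an empty cache

        def evaluate(self, configuration):
            """Return objective values, e.g. [response_time, total_replicas]."""
            key = tuple(configuration)
            if key not in self._cache:            # only unique configurations
                self._cache[key] = self._run_load_test(configuration)
                if self._cache_file:
                    with open(self._cache_file, "w") as f:
                        json.dump({json.dumps(list(k)): v
                                   for k, v in self._cache.items()}, f)
            return self._cache[key]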

3.2 Objective Evaluation

Two objective functions are used for the tests: average response time and resource cost. The choice of using two objectives is mainly motivated by the fact that the results can be more easily represented and analyzed compared to using three or more objectives.

Response time is evaluated by starting a large number of processes which simultaneously send requests to the frontend service of the system. The response time of each request is then stored. The procedure of starting a set of new client threads sending requests and waiting for them to finish is repeated a number of times before the average of all response times is calculated and set as the value of the objective function.
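
A minimal version of such a measurement loop, using a thread pool to simulate concurrent clients, could look like the sketch below; the URL and the client and run counts are placeholders, not the values used in the thesis:

    import time
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    def timed_request(url):
        """Send one GET request and return its response time in seconds."""
        start = time.monotonic()
        urlopen(url).read()
        return time.monotonic() - start

    def average_response_time(url, clients=50, runs=5):
        """Fire `clients` simultaneous requests, `runs` times, and average."""
        samples = []
        for _ in range(runs):
            with ThreadPoolExecutor(max_workers=clients) as pool:
                samples.extend(pool.map(timed_request, [url] * clients))
        return sum(samples) / len(samples)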

The measure used for resource cost is simply the total number of replicas. The assumption is that each pod requires additional CPU and memory, which makes it necessary to increase the size of the cluster in order to fit all the replicas, which in turn results in increased costs. Because all the services in the developed test system are allocated the same amount of CPU time and memory, only one objective function is necessary to represent their resource cost.

3.3 Caching

Due to the time necessary to evaluate one configuration, usually 10-20 seconds, caching of fitness values is necessary for the algorithms to finish within a reasonable amount of time. While in-memory caching is always used, file caching is mostly useful for analyzing the algorithms as it allows for repeated runs on the same data.

Two different types of tests are used in the thesis:

• Tests based on file-cached evaluations of configurations covering the entire search space.

• Tests where the evaluations of configurations are done as needed by the algorithm, and caching is only used for configurations that have already been evaluated during the particular run.

Because of the random nature of genetic algorithms, a fair comparison requires multiple runs, preferably on the same dataset. The first type of test makes it possible to do multiple runs within a short amount of time. It also eliminates the problem of variance in response time measurements. With the entire search space covered, it is possible to calculate the true pareto optimal set using non-dominated sorting, which can then be used to validate the result of the algorithms. This kind of test is, however, only possible for a small number of services and maximum number of replicas, as the time needed to evaluate the entire search space quickly grows too large.

The second type of test reflects how the algorithms would be used in practice, and allows for a larger number of services and maximum replicas.

3.4 Developed Test Service

In order to provide a stable test system, a microservice-based testing application was developed. This service was designed with a predefined, constant amount of work to be done for each request, and with dependencies between microservices such that a rough estimate of the expected optimal scaling configuration is known beforehand. Each service is deployed in a separate Docker container. The system and its dependencies are depicted in Figure 3.2.

Figure 3.2: The developed test system and the dependencies between its microservices. Arrows depict the direction of requests. Fib refers to the number of times that the function that is used to simulate work is called; the function computes Fibonacci numbers. Since both microservice A and B forward requests to service 3, that service calls the Fibonacci function 8 times for each request sent to the system.

The service uses a simple REST interface, containing a single resource with a single operation, GET. Services A to C only send requests to services below them, and have no dependencies between each other. Services 1 to 3 call a function that computes Fibonacci numbers a set number of times.

3.5 Test Procedure

A test run is done by first specifying the parameters of the service, the constraints of the problem, and the algorithm parameters. The optimization algorithm is then executed, and the configurations represented by the individuals of the population are evaluated as needed. A configuration is evaluated by first using API calls to set the new requested number of replicas for each service. When the current number of replicas matches the requested configuration, a load test is initiated, sending requests and measuring response time. For each generation of the GAs, the number of unique (non-cached) evaluations is stored, along with the current values of the metrics used for performance evaluation of the algorithms.

3.6 Algorithm Evaluation Metrics

Many performance indicators have been suggested for evaluating MOEAs [3]. Some of them rely on knowing the true pareto front. Because of the practical and non-mathematical nature of the problem examined in this thesis, it is not possible to know the true pareto front without evaluating the entire search space. Thus, metrics were chosen that do not depend on such knowledge. Two metrics are used to compare the algorithms: hypervolume and a spacing metric.

Hypervolume, also known as the S-metric, is a metric that is commonly used for comparing multi-objective optimization algorithms [3]. The hypervolume of a pareto front generated by an optimization algorithm is the volume of the search space that is dominated by the front's solutions, with respect to a reference point. Thus, the closer the pareto front is to the true optimal pareto set, the larger the hypervolume. Since the metric does not require knowledge of the true pareto front, it is suitable for comparing algorithms on problems where computing the true pareto front is unfeasible. Hypervolume is given by

    HV(S, r) = λm( ⋃_{z ∈ S} [z; r] )

where S is the pareto front approximation, r ∈ R^m is a reference point such that z ≺ r for all z ∈ S, and λm is the m-dimensional Lebesgue measure [3].

Figure 3.3 shows the hypervolume of a pareto approximation of a two-objective problem.

Figure 3.3: The hypervolume of a pareto approximation of a two-objective problem.

To measure the speed of convergence, the hypervolume of the current iteration of the algorithm is compared to the number of function evaluations. Using this metric, an algorithm which explores an unnecessarily large portion of the search space will appear to converge more slowly. The reference point is chosen by measuring the objective values of the extreme solutions and choosing a point that is 10% worse for all objectives. While knowing the extreme solutions beforehand is not possible in the general case, for the test systems it is assumed that the solution with the maximum number of instances of each service has the fastest response time. For the cost objective, the extreme solution is simply the configuration with the lowest possible number of total instances.
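
For the two-objective case used in this thesis, the hypervolume can be computed by sorting the front and summing rectangles against the reference point, as in this sketch (minimization and a non-dominated front are assumed):

    def hypervolume_2d(front, reference):
        """Hypervolume of a two-objective front dominated by `reference`."""
        r1, r2 = reference
        volume, prev_y = 0.0, r2
        # Ascending in f1 implies descending in f2 for a non-dominated front.
        for x, y in sorted(front):
            volume += (r1 - x) * (prev_y - y)
            prev_y = y
        return volume

    # The front {(1, 4), (2, 2)} with reference point (5, 5) dominates an
    # area of 4 * 1 + 3 * 2 = 10 in the objective space.
    print(hypervolume_2d([(1, 4), (2, 2)], (5, 5)))  # 10.0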

The second metric used for comparison is a spacing metric which shows how evenly spaced the pareto front generated by the algorithm is in the objective space. The metric is described in Yang et al. [20] and is similar to Deb et al.'s ∆-metric [7], with the difference that it does not rely on knowing the true pareto optimal set. The following equation describes the metric:

    S = sqrt( (1/(h − 1)) ∑_{i=1}^{h} (d̄ − d_i)² )        (3.1)

where d_i = min_{j ≠ i} (|f1(x_i) − f1(x_j)| + ... + |fm(x_i) − fm(x_j)|), d̄ is the mean of all d_i, and h is the size of the current pareto front generated by the algorithm. A completely uniform distribution of solutions in the objective space results in a value of zero for the metric, and higher values indicate a more uneven spread.
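
Equation (3.1) translates directly into code; the sketch below assumes a front with at least two solutions and is an illustration rather than the thesis implementation:

    import math

    def spacing(front):
        """Spacing metric (3.1) for a list of objective vectors."""
        h = len(front)
        # d_i: Manhattan distance to the nearest other solution in the front.
        d = [min(sum(abs(a - b) for a, b in zip(fi, fj))
                 for j, fj in enumerate(front) if j != i)
             for i, fi in enumerate(front)]
        d_mean = sum(d) / h
        return math.sqrt(sum((d_mean - di) ** 2 for di in d) / (h - 1))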

3.7 Test Environment

All of the test systems are deployed using Google Kubernetes Engine on the Google Cloud Platform. Each service is instantiated in a Docker container. Two different configurations were used for the Kubernetes clusters:

• For the test runs that use file caching and box constraints that limit the size of the search space so that every possible configuration can be evaluated, the cluster consisted of 6 high-CPU nodes with 1 vCPU and 3.75 GB memory per node.

• For the extended tests using larger box constraints, a cluster of 3 high-memory nodes with 2 vCPUs and 13 GB memory per node was used.

3.8 Parameters

Tests are affected by a number of parameters, both problem and algorithm related. The most central problem-related parameters are the following:

• Which services to scale.

• The minimum and maximum number of replicas for each service.

• Parameters of the load test used to measure response time, such as the number of threads, requests and runs per evaluation.

Two different configurations are used for the tests in the thesis. For the file-cached runs on the developed test system, the services A, B and C are set to a maximum of 3 replicas, while the services 1, 2 and 3 are set to a maximum of 5 replicas. The minimum number of replicas for all services is set to 1. For the larger, non-file-cached tests, all services are set to a maximum of 8 replicas.

The important parameters of NSGA-II and NSGA-III are

• Population size.

• Mutation rate.

• Crossover SBX eta parameter, which affects how similar children are to the parents.

• Number of reference points, in the case of NSGA-III.

These parameters are set to the values that were found to be the most effective. The number of reference points is set to roughly the same number as the population size, as recommended by the NSGA-III creators [6].

3.9 Constraints and Invalid Solutions

The algorithms are developed to handle box constraints, which in this case are the minimum and maximum number of instances of each service. In general, the mutation and crossover operators of a genetic algorithm can result in invalid solutions if not implemented to handle constraints. In the case of mutation, the operator was implemented such that it only generates new genes that are within the specified box constraints. For the crossover operator, if a gene of a child is not within the box constraints, the gene is set to the closest value within the constraints.

3.10 Limitations

Because running large systems in the cloud can potentially get very expensive, the tests were run using a trial account on GCP, which includes a sum of free credits. However, this puts some limitations on the number of services and on the number of replicas of each service that can be deployed. A maximum of 8 virtual CPUs means that only 8 Kubernetes nodes can be used in a cluster. This in turn affects the maximum amount of memory available. The size of the developed test system and the maximum number of replicas during test runs were chosen with these limitations in mind.

Another limiting factor is that, to produce a load high enough to affect the response time of the test system, multiple machines sending requests in parallel would normally be needed. This was handled by limiting the CPU time available to each instance.

4 Results

In this chapter, results are presented from test runs using both genetic algorithms to generate optimized scaling configurations. The first section presents results from running tests on fully cached evaluations of configurations on the developed test system. These tests show average values for sets of 500 runs of the algorithms. The second section shows the result of a single test run using larger constraints and no file caching. The run time characteristics of the algorithms are then described. Finally, while the optimization method was not successfully applied to either Sock Shop or the Minium system, the problems that were discovered are presented.

4.1 Tests Using File Caching

The following results were obtained using a cache of fitness values from a fully evaluated search space. Tests were done on the developed test system, with services 1 to 3 constrained to a maximum of 5 replicas and the other services constrained to a maximum of 3. Because we want to find a good pareto front approximation without evaluating more configurations than necessary, hypervolume is shown against the number of unique, non-cached evaluations. With the constraints used for the tests in this section, the size of the search space, that is, the total number of possible configurations, is 3 × 3 × 3 × 3 × 5 × 5 × 5 = 10125.

Figure 4.1 shows average hypervolume against unique evaluations for 200 generations of NSGA-II and NSGA-III, optimizing a configuration of 7 services constrained to 3, 3, 3, 3, 5, 5, and 5 replicas.

Figure 4.1: Average hypervolume for 500 runs of NSGA-II and NSGA-III with a population of 20, running for 200 generations.

The algorithms show similar results until about 140 unique evaluations, where NSGA-III essentially stops improving its pareto front. NSGA-II is shown to perform a larger number of unique evaluations during the 200 generations, indicating that more new configurations are generated each generation.

The average spacing metric values for the same runs are shown in Figure 4.2. As explained in the Method chapter, a smaller spacing value is considered better, as it indicates that the algorithm explores the search space more evenly. However, because the search space is discrete rather than continuous, the true optimal pareto set might not be evenly spaced. NSGA-III has a better spacing value throughout the runs, which is also one of the goals of its reference point-based selection mechanism.

Figure 4.2: Average spacing metric value for 500 runs of NSGA-II and NSGA-III with a population of 20, running for 200 generations.

While large population sizes are not feasible when each configuration takes 10-20 seconds to evaluate, it is still of interest to examine whether there are any differences in the algorithms' performance for different population sizes. Using cached evaluations, it was possible to do such tests within a reasonable amount of time. Figure 4.3 shows average hypervolume for the same test setup as in Figure 4.1 but with a population size of 100. Here, NSGA-III initially outperforms NSGA-II, but stops improving after about 400 unique evaluations, while NSGA-II continues to improve until about 800 evaluations. Over the 200 generations, NSGA-III performs a maximum of roughly 725 unique evaluations, while NSGA-II performs more than 1800 at most.

200 400 600 800 1000 1200 1400 1600 1800Unique evaluations

26

27

28

29

30

31

32

Aver

age

hype

rvol

ume

Best fit NSGA-IIBest fit NSGA-IIINSGA-IINSGA-III

Figure 4.3: Average hypervolume for 500 runs of NSGA-II and NSGA-III with a population of 100, running for 200 generations.

The spacing metric values of the same runs do not show the same pronounced difference between the algorithms as when using a population of 20, but NSGA-III still has a generally better spacing metric, as seen in Figure 4.4.

[Plot for Figure 4.4: average spacing value (y) against unique evaluations (x); series: NSGA-II, NSGA-III, best fit NSGA-II, best fit NSGA-III.]

Figure 4.4: Average spacing metric value for 500 runs of NSGA-II and NSGA-III with a population of 100, running for 200 generations.


Since the search space for the tests performed in this section is fully evaluated, it is possible to find the true pareto optimal set. This was done by using the non-domination sort algorithm from NSGA-II [7] on the set of all configurations within the constraints. The true pareto optimal set is shown in Figure 4.5.
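With only two objectives and a search space of this size, the same set can also be obtained with a brute-force non-domination filter; the following is a minimal O(n²) sketch, standing in for the full NSGA-II non-domination sort [7].

    def dominates(a, b):
        # a pareto-dominates b (minimization) if it is no worse in every
        # objective and strictly better in at least one.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def pareto_front(evaluated):
        # evaluated: list of (configuration, objectives) pairs, where the
        # objectives are e.g. (number of replicas, average response time).
        # Applied to the fully evaluated search space, this returns the
        # true pareto optimal set.
        return [(c, f) for c, f in evaluated
                if not any(dominates(g, f) for _, g in evaluated)]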

[Plot for Figure 4.5: average response time (s) (y) against number of replicas (x); labelled pareto-optimal solutions: [1 1 1 1 1 1 1], [1 1 1 1 1 1 2], [1 1 1 1 1 1 3], [1 2 1 1 1 1 3], [2 1 1 2 1 1 3], [2 1 2 1 1 1 4], [2 2 1 1 1 1 5], [2 1 2 1 1 2 5], [2 1 1 1 3 2 5], [3 2 1 1 2 2 5], [2 2 1 2 3 3 5], [3 3 2 3 4 5 5].]

Figure 4.5: The pareto optimal set of scaling configurations for the test system when constrained to a maximum of 3, 3, 3, 3, 5, 5 and 5 replicas per service.

Figure 4.6 shows a pareto front generated by NSGA-II, when using a population size of 100, after 200 generations. Most of the solutions are identical to the true pareto front. This is consistent with the high hypervolumes shown in the extensive tests.


[Plot for Figure 4.6: average response time (s) (y) against number of replicas (x); labelled solutions: [1 1 1 1 1 1 1], [1 1 1 1 1 1 2], [1 1 1 1 1 1 3], [1 2 1 1 1 1 3], [2 1 1 2 1 1 3], [2 1 2 1 1 1 4], [2 2 1 1 1 1 5], [2 1 2 1 1 2 5], [2 1 1 1 3 2 5], [2 2 1 2 1 3 5], [3 2 1 2 1 3 5], [2 2 1 2 3 3 5].]

Figure 4.6: Pareto front of the population after 200 generations of NSGA-II. Population size is 100.

A typical pareto front of NSGA-III for the same test parameters is shown in Figure 4.7. Notably, the solution (1, 1, 1, 1, 1, 1, 2) from the true pareto front is missing, which should result in a large reduction in hypervolume, consistent with the results of NSGA-III in the extensive tests. Also, the front only contains 9 solutions, while the NSGA-II front contained 12, the same number as the true pareto front.

[Plot for Figure 4.7: average response time (s) (y) against number of replicas (x); labelled solutions: [1 1 1 1 1 1 1], [1 1 1 1 1 1 3], [1 2 1 1 1 1 3], [2 2 1 1 1 1 3], [2 1 1 1 1 1 5], [2 2 1 1 1 1 5], [2 2 1 1 1 4 5], [2 3 1 1 2 3 5], [2 3 1 1 2 5 5].]

Figure 4.7: Pareto front of the population after 200 generations of NSGA-III. Population size is 100.


4.2 Tests Not Using File Caching

The tests using the larger constraints, a maximum of 6 replicas of each service, and no file caching proved to be very time-consuming. One test run of either NSGA-II or NSGA-III on the developed test system, using a population size of 20, took about 14 hours and 30 minutes to complete 20 generations. Figure 4.8 compares the hypervolumes of a single run of each algorithm.

[Plot for Figure 4.8: average hypervolume (y) against unique evaluations (x); series: NSGA-II, NSGA-III, best fit NSGA-II, best fit NSGA-III.]

Figure 4.8: Hypervolumes of a single run of each algorithm on the test system using a maximum of 6 replicas of each service and a population of 20, after 20 generations.

After only 20 generations, 2000 unique evaluations have been performed by both algorithms, compared to the tests using smaller constraints, where NSGA-II did a maximum of around 500 unique evaluations using the same population size.

Figure 4.9 shows the spacing metric for the same test run, where NSGA-II shows better spacing values.


[Plot for Figure 4.9: average spacing value (y) against unique evaluations (x); series: NSGA-II, NSGA-III, best fit NSGA-II, best fit NSGA-III.]

Figure 4.9: The spacing metric of a single run of each algorithm on the test system using a maximum of 6 replicas of each service and a population of 20, after 20 generations.

4.3 Run Time

Measuring the exact run time of the algorithms is not the aim of this thesis, as it is highly dependent on the implementations, programming languages, and the platform on which the services are deployed. However, for the algorithms to be a suitable solution to the scaling problem, the run time needs to be kept within reasonable limits.

Two important results were found within this area. The first is that the run time of the algorithms is completely dominated by the time required to evaluate a single configuration. This is supported by Povinelli and Feng, who suggest that in some practical applications of GAs, about 96% of the time is spent evaluating the objective function(s) [14]. For comparison, in the time required for a single run of the algorithms using "live" evaluations, it was possible to finish thousands of runs when using cached evaluations. Evaluating a single configuration of the developed test system, deployed using GKE, took on average about 10 to 20 seconds. The majority of this time is spent starting up or shutting down service instances. Performing the load test which evaluates response time is the second most time-consuming part of the process. For the developed test system, this was necessary in order to get consistent values.
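A minimal sketch of such fitness caching, in the spirit of Povinelli and Feng's fitness hashing [14], is shown below; evaluate_live is a hypothetical stand-in for the deploy-and-load-test step described above.

    fitness_cache = {}

    def evaluate(config, evaluate_live):
        # Memoize fitness values keyed by the scaling configuration, so a
        # configuration revisited by crossover or mutation is never re-deployed.
        # config: tuple of replica counts, e.g. (2, 1, 2, 1, 1, 1, 4).
        # evaluate_live: hypothetical function that deploys config, runs the
        # load test, and returns the objective values.
        if config not in fitness_cache:
            fitness_cache[config] = evaluate_live(config)  # 10-20 s when live
        return fitness_cache[config]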

A second important result is that the average time needed to evaluate a configuration increased with the maximum number of replicas and the number of services. This could represent an additional performance cost when using the method on larger problem sizes.

Completing 20 generations of either NSGA-II or NSGA-III on the larger problem size, with a maximum of 6 replicas of each service of the test system, took over 14 hours. This can be compared to Figure 4.1, where NSGA-II converges after about 100 generations on the smaller problem size.


4.4 Sock Shop and the Minium System

Although an interface for optimizing the Sock Shop application using the algorithms was developed, the results found when evaluating single configurations showed that running the algorithms would be meaningless due to inconsistent objective values. The evaluated response time for the same configuration varied greatly between runs, and the difference in response time when adding a large number of instances was too small. Using twice the number of instances would sometimes give a slower response time. It was assumed that this was either due to caching, or that the work done for each request was not enough to give a consistent result.


5 Conclusion and Discussion

This chapter presents the conclusions that can be drawn from the results obtained, and discusses the possible reasons behind the results as well as improvements that can be made to the methods. The section Future Work discusses possible extensions and changes that can be made to the scaling method.

5.1 Conclusion

While the examined genetic algorithms are successful in generating pareto optimal fronts in some of the cases, the results indicate that there are three requirements on the system to be optimized:

• The services need to be consistent performance-wise; if the variance of the resulting fitness function is too large, the difference in performance between two scaling configurations will be overshadowed by random variance. This is a potential problem when services use any type of caching.

• The number of microservices that are to be included in the scaling configuration must not be too large.

• The maximum number of replicas for each service must not be too large.

The results of the file-cached tests show that NSGA-II is superior to NSGA-III for this problem size. While the hypervolume gains are similar for the first generations of the algorithms, NSGA-III converges earlier and does not improve past a certain level, failing to find several solutions from the true pareto optimal front. This is clear from both Figure 4.1 and 4.3, where the average hypervolume of NSGA-III stops improving past a certain point. NSGA-II, on the other hand, continues improving, eventually converging around the true pareto optimal front. Figure 4.6 shows that in a typical run, the NSGA-II front is almost equal to the true pareto set shown in Figure 4.5. The fact that the NSGA-II front is not constant after the pareto optimal front has been discovered is explained by mutation. A typical NSGA-III front after 200 generations using the same parameters yields only 9 solutions compared to NSGA-II's 12, as shown in Figure 4.7.

As shown in Figure 4.2, the average value of the spacing metric is noticeably higher for NSGA-II in every generation. While this means that NSGA-III is better at finding an evenly spaced pareto front, if the true pareto front is uneven, it is not beneficial to focus too much on evenness.

While the goal of this thesis was not to measure or compare the run time of the algorithms or the scaling method, run time was found to be a large factor when determining the viability of the method. In its current implementation, the method can only be applied to small systems, or to an isolated subset of a larger system of services. Since the method requires the system to be deployed and running on a Kubernetes cluster with enough capacity to use the maximum number of replicas of each service, the method is also costly.


The algorithms typically have to run for many hours, and keeping a system deployed in the cloud with many replicas of each service can be expensive.

5.2 Discussion

While it was shown in this thesis that the method of using GAs to find optimal scaling configurations is only viable for a certain type of smaller microservice system, the method requires no knowledge about the system to be optimized, only the names of the services which are to be scaled and the maximum number of replicas for each service. This makes the method easy to use as an automated tool for finding the best scaling configurations of a new release of a system.
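As an illustration of how little input the method needs, a chromosome can be generated directly from the service names and replica caps; all names and values below are hypothetical.

    import random

    # Hypothetical input: the only system knowledge the method requires.
    max_replicas = {"catalogue": 3, "orders": 3, "payment": 5, "frontend": 5}

    def random_configuration():
        # A chromosome is one replica count per service, drawn within the caps.
        return tuple(random.randint(1, cap) for cap in max_replicas.values())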

The results show that NSGA-II is superior at finding a good pareto approximation for the given problem. NSGA-II was also found to be easier to implement and tune than NSGA-III, which was rather complex and not as straightforward.

One of the goals of this thesis was to examine the effectiveness of using GAs to find optimal scaling configurations of the Minium system. The limitations of the method discovered through experimentation showed that the Minium system is too large to optimize using the method. Also, the caching used would lead to sub-par results due to the variance of evaluated fitness values. However, it could be possible to isolate a specific subset of services and find optimal configurations for them, if caching was taken into account or simply turned off. As an alternative, the GAs could be used to optimize resource allocations of services instead, such as CPU and memory, thus scaling vertically instead of horizontally.

5.3 Future Work

There are several possible improvements to the general optimization method that could be explored. Since the main problem for practical applications is the time required for each evaluation, this should be a central area of improvement. As an alternative to starting new instances whenever a new scaling configuration is to be evaluated, it could be possible to simply use the maximum number of replicas of each service and disable a number of them corresponding to the configuration which is to be evaluated. Whether this yields similar performance to the current method, and is representative of how the configuration would actually perform, is not clear.

The method explored in this thesis is focused on discovering static configurations for scaling a system. It would be worthwhile to explore extensions of the method to handle dynamic scaling as well. This could be accomplished by, instead of using actual scaling configurations as chromosomes, using a series of weights associated with the services, or even an artificial neural network.

Another unexplored improvement is the possibility of scaling vertically as well as horizontally, or even focusing solely on scaling vertically. This would involve adjusting the CPU and memory resources of each service by setting their requests and limits in Kubernetes.
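A minimal sketch of what such a vertical-scaling step might look like, using the official Kubernetes Python client, is given below; the deployment name, namespace, container naming convention, and resource values are all hypothetical assumptions.

    from kubernetes import client, config

    def set_resources(deployment, namespace, cpu, memory):
        # Patch a deployment's container with new CPU/memory requests;
        # limits could be set the same way under a "limits" key.
        config.load_kube_config()
        body = {"spec": {"template": {"spec": {"containers": [{
            "name": deployment,  # assumes the container is named after the deployment
            "resources": {"requests": {"cpu": cpu, "memory": memory}},
        }]}}}}
        client.AppsV1Api().patch_namespaced_deployment(deployment, namespace, body)

    # Hypothetical usage: set_resources("orders", "default", "500m", "256Mi")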

32

Page 38: Optimal Scaling Configurations for Microservice-Oriented ...umu.diva-portal.org/smash/get/diva2:1366882/FULLTEXT01.pdf · Scaling microservice-oriented architectures efficiently

References

[1] Ram Bhushan Agrawal. Simulated binary crossover for continuous search space. Complex Systems, 9(2):115–148, 1995.

[2] A. Antonescu, P. Robinson, and T. Braun. Dynamic SLA management with forecasting using multi-objective optimization. In 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), pages 457–463, May 2013.

[3] Charles Audet, Jean Bigeon, Dominique Cartier, Sébastien Le Digabel, and Ludovic Salomon. Performance indicators in multiobjective optimization. 2018.

[4] Xiaolin Chang, Bin Wang, Jiqiang Liu, Wenbo Wang, and Jogesh Muppala. Green cloud virtual network provisioning based ant colony optimization. In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO '13 Companion, pages 1553–1560, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1964-5. doi: 10.1145/2464576.2482735. URL http://doi.acm.org/10.1145/2464576.2482735.

[5] T. Chen and R. Bahsoon. Self-adaptive trade-off decision making for autoscaling cloud-based services. IEEE Transactions on Services Computing, 10(4):618–632, July 2017. ISSN 1939-1374. doi: 10.1109/TSC.2015.2499770.

[6] K. Deb and H. Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18(4):577–601, Aug 2014. ISSN 1089-778X. doi: 10.1109/TEVC.2013.2281535.

[7] Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and Tanaka Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In International Conference on Parallel Problem Solving from Nature, pages 849–858. Springer, 2000.

[8] Susan J. Fowler. Microservices vs. Service-Oriented Architectures. O'Reilly Media, Inc., 2016.

[9] Donald E. Grierson. Pareto multi-criteria decision making. Advanced Engineering Informatics, 22(3):371–384, 2008. ISSN 1474-0346. doi: https://doi.org/10.1016/j.aei.2008.03.001. URL http://www.sciencedirect.com/science/article/pii/S1474034608000281. Collaborative Design and Manufacturing.

[10] Carlos Guerrero, Isaac Lera, and Carlos Juiz. Genetic algorithm for multi-objective optimization of container allocation in cloud architecture. Journal of Grid Computing, 16(1):113–135, Mar 2018. ISSN 1572-9184. doi: 10.1007/s10723-017-9419-x. URL https://doi.org/10.1007/s10723-017-9419-x.


[11] John H. Holland. Genetic algorithms and the optimal allocation of trials. SIAM Journal on Computing, 2:88–105, 06 1973. doi: 10.1137/0202009.

[12] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975. Second edition, 1992.

[13] H. Ishibuchi, R. Imada, Y. Setoguchi, and Y. Nojima. Performance comparison of NSGA-II and NSGA-III on various many-objective test problems. In 2016 IEEE Congress on Evolutionary Computation (CEC), pages 3045–3052, July 2016. doi: 10.1109/CEC.2016.7744174.

[14] Richard Povinelli and Xin Feng. Improving genetic algorithms performance by hashing fitness values. 04 2009.

[15] C. R. Reeves and J. E. Rowe. Genetic Algorithms: Principles and Perspectives, volume 20 of Operations Research/Computer Science Interfaces Series. Kluwer Academic Publishers, London, 2003. ISBN 1-4020-7240-6.

[16] Proteek Chandan Roy, Md. Monirul Islam, and Kalyanmoy Deb. Best order sort: A new algorithm to non-dominated sorting for evolutionary multi-objective optimization. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, GECCO '16 Companion, pages 1113–1120, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4323-7. doi: 10.1145/2908961.2931684. URL http://doi.acm.org/10.1145/2908961.2931684.

[17] Christian von Lücken, Benjamín Barán, and Carlos Brizuela. A survey on multi-objective evolutionary algorithms for many-objective problems. Computational Optimization and Applications, 58(3):707–756, Jul 2014. ISSN 1573-2894. doi: 10.1007/s10589-014-9644-1. URL https://doi.org/10.1007/s10589-014-9644-1.

[18] Hiroshi Wada, Junichi Suzuki, Yuji Yamano, and Katsuya Oba. Evolutionary deployment optimization for service-oriented clouds. Software: Practice and Experience, 41:469–493, 04 2011. doi: 10.1002/spe.1032.

[19] E. Wolff. Microservices: Flexible Software Architecture. Pearson Education, 2016. ISBN 9780134650401. URL https://books.google.se/books?id=zucwDQAAQBAJ.

[20] Kaifeng Yang, Li Mu, Dongdong Yang, Feng Zou, Lei Wang, and Qiaoyong Jiang. Multiobjective memetic estimation of distribution algorithm based on an incremental tournament local searcher. The Scientific World Journal, 2014:836272, 07 2014. doi: 10.1155/2014/836272.

[21] Zhi-Hui Zhan, Xiao-Fang Liu, Yue-Jiao Gong, Jun Zhang, Henry Shu-Hung Chung, and Yun Li. Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Computing Surveys, 47(4):63:1–63:33, July 2015. ISSN 0360-0300. doi: 10.1145/2788397. URL http://doi.acm.org/10.1145/2788397.

[22] Y. Zhang, G. Huang, X. Liu, and H. Mei. Integrating resource consumption and allocation for infrastructure resources on-demand. In 2010 IEEE 3rd International Conference on Cloud Computing, pages 75–82, July 2010. doi: 10.1109/CLOUD.2010.11.
