Server Consolidation

Xiujiao [email protected]

12/02/2011

Overview

• Introduction
• Server consolidation problems and solutions
– Static Server Allocation Problems (SSAP) and its extensions [1]
– Shares and Utilities based Server Consolidation [2]
– Server Consolidation with Dynamic Bandwidth Demand [3]
• Conclusion

Introduction

• Server consolidation
– The process of combining the workloads of several different servers or services on a set of target (physical) servers
• The Gartner Group estimates that server utilization in data centers is less than 20 percent.

[Figure: Server consolidation of VM 1, VM 2, VM 3, and VM 4]

Introduction

• Server virtualization
– Provides the technical means to consolidate multiple servers, leading to increased utilization of physical servers
– A virtual machine appears to a “guest” operating system as hardware, but it is simulated in a contained software environment by the host system
– Reduced deployment time and easier system management, hence lower hardware and operating costs


SSAP and its Extensions [1]

• Decision problems
• Available data in data centers
• Problem formulation
• Complexity and algorithms
• Experimental setup
• Simulation results

Decision Problems

• The model applies to three widespread scenarios
– Investment decisions
– Operational costs (i.e., energy, cooling, and administrative costs)
– A rack of identical blade servers (which subset of servers to use)
• Objective: minimize the sum of server costs in terms of purchasing, maintenance, administration, or a combination of them (c_i has a different meaning in each scenario)

Available Data in Data Centers

• Data centers reserve certain amounts of IT resources for each single service or server
– CPU capacity: SAPS or HP computons
– Memory: gigabytes
– Bandwidth: megabits per second
• Resource demand has seasonal patterns on a daily, weekly, or monthly basis
– The authors of [1] use a large set of workload traces from their industry partner

http://doi.ieeecomputersociety.org/10.1109/TSC.2010.25

An Example of Available Data


Available Data in Data Centers

• Workload traces can change over extended time periods
• IT service managers monitor workload developments regularly
• Servers are reallocated if necessary
• Models are needed for both the initial and subsequent allocation problems

Problem Formulation

• Static Server Allocation Problem (SSAP)
• Static Server Allocation Problem with variable workload (SSAPv)
• Extensions of the SSAP
– Max. number of services constraints
– Separation constraints
– Combination constraints
– Technical constraints and preassignment constraints
– Limit on the number of reallocations

SSAP

• n services j ∈ J are to be served by m servers i ∈ I
• Different types of resources k ∈ K
• Server i has a certain capacity s_ik of resource k
• c_i describes the potential cost of server i
• Service j orders u_jk units of resource k
• y_i are binary decision variables indicating which servers are used
• x_ij are binary decision variables describing which service is allocated to which server

SSAP

The SSAP represents a single service’s resource demand as constant over time (side constraints 2).
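The formula shown on this slide did not survive in the transcript. A formulation consistent with the symbols defined on the previous slide is sketched below; the layout and the constraint numbering are assumptions, chosen so that the capacity constraint is the second side constraint.

```latex
\begin{align}
\min \quad & \sum_{i \in I} c_i \, y_i \notag \\
\text{s.t.} \quad & \sum_{i \in I} x_{ij} = 1 && \forall j \in J \tag{1} \\
& \sum_{j \in J} u_{jk} \, x_{ij} \le s_{ik} \, y_i && \forall i \in I,\ \forall k \in K \tag{2} \\
& y_i,\ x_{ij} \in \{0, 1\} && \forall i \in I,\ \forall j \in J \tag{3}
\end{align}
```

Constraint (1) places every service on exactly one server, and constraint (2) keeps the constant demand of the services on a server within the capacity of that server for every resource.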

SSAPv

• Considers variations in the workload
• Time is divided into a set of intervals T indexed by t = 1, …, r
• u_jkt describes how much capacity service j requires from resource type k in time interval t
• u_jkt depends on the load characteristics of the servers to be consolidated
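As a sketch of how the time index changes the model (an assumption based on the definitions of T and u_jkt above, not a copy of the slide), the capacity constraint would have to hold in every interval:

```latex
\begin{equation*}
\sum_{j \in J} u_{jkt} \, x_{ij} \le s_{ik} \, y_i
\qquad \forall i \in I,\ \forall k \in K,\ \forall t \in T
\end{equation*}
```

Two services whose peaks fall into different intervals can then share a server even if the sum of their individual peak demands exceeds its capacity.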

Extensions of SSAP

• Max. number of services constraints
• Separation constraints
• Combination constraints
• Technical constraints
• Limits on the number of reallocations

Complexity and Algorithms

• SSAP is strongly NP-hard
• A straightforward proof reduces the multidimensional bin packing problem (MDBP) to SSAP [1]
• NP-hardness does not necessarily mean that the problem is intractable for practical problem sizes
• Open questions: which problem sizes can be solved exactly, and how far can heuristics get in terms of both problem size and solution quality?

Complexity and Algorithms

• Polynomial-time approximation schemes (PTAS) with worst-case guarantees on the solution quality of MDBP have been published.
• The first important result is due to C. Chekuri and S. Khanna, “On Multi-Dimensional Packing Problems,” Proc. ACM-SIAM Symp. Discrete Algorithms, pp. 185-194, 1999
• For any fixed ε > 0, it delivers a (1 + εd + O(ln ε^-1))-approximate solution for constant d (d is the dimension of MDBP)
• The algorithm works in two steps

Algorithms for MDBP

• First step
– Solves the linear programming relaxation, which makes fractional assignments for at most dm vectors in d dimensions and m bins
• Second step
– The set of fractionally assigned vectors is assigned greedily by finding the largest possible set

Algorithms for SSAP(v)

• SSAP with only one resource
– Branch & Bound (SSAP B&B)
– First Fit (FF)
– First-Fit Decreasing (FFD)
• SSAPv
– Branch & Bound (SSAPv B&B)
– LP-relaxation-based heuristic (SSAPv Heuristic)
– Uses the results of an LP relaxation
– Uses an integer program to find an integral assignment (in contrast to the PTAS)
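The two bin-packing heuristics named above can be sketched for the single-resource case as follows. This is a minimal illustration, assuming identical servers with one capacity value; the function names and the data layout are hypothetical and not taken from [1].

```python
def first_fit(demands, capacity):
    """Place each demand on the first opened server that still has room."""
    remaining = []   # remaining capacity of each opened server
    assignment = []  # assignment[j] = index of the server hosting service j
    for demand in demands:
        if demand > capacity:
            raise ValueError("a single demand exceeds the server capacity")
        for i, free in enumerate(remaining):
            if demand <= free:
                remaining[i] -= demand
                assignment.append(i)
                break
        else:
            # No opened server has room: open a new one.
            remaining.append(capacity - demand)
            assignment.append(len(remaining) - 1)
    return assignment, len(remaining)


def first_fit_decreasing(demands, capacity):
    """Sort demands in decreasing order, then apply first fit."""
    order = sorted(range(len(demands)), key=lambda j: demands[j], reverse=True)
    sorted_assignment, servers = first_fit([demands[j] for j in order], capacity)
    assignment = [None] * len(demands)
    for position, j in enumerate(order):
        assignment[j] = sorted_assignment[position]
    return assignment, servers
```

For the demands 5, 6, 4, 5 and a capacity of 10, first fit opens three servers while first-fit decreasing needs only two, which shows why sorting helps.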


• For SSAP B&B, SSAPv B&B, and the SSAPv Heuristic, the number of servers has a significant impact on computation time
• Each additional server increases the number of binary decision variables by n + 1
• A specific iterative approach keeps the number of binary variables as low as possible
• Lower bound (LB) on the number of servers: assume that all servers have the same capacity s and allow fractional allocation of services


• Start solving the problem with m equal to the lower bound (LB) on the number of servers
• If the problem is infeasible, increment m by 1
• Repeat until a feasible solution is found
• Because m grows from the lower bound in steps of 1, the first feasible solution found in the B&B search tree is an optimal solution
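The iterative scheme can be sketched as below for one resource and identical servers. The exact backtracking check stands in for the B&B solver used in [1]; it and all function names are assumptions made for illustration only.

```python
import math


def lower_bound(demands, capacity):
    """Servers needed if services could be split fractionally across servers."""
    return math.ceil(sum(demands) / capacity)


def fits(demands, capacity, m):
    """Exact check by backtracking: can all demands be packed into m servers?"""
    loads = [0] * m
    items = sorted(demands, reverse=True)

    def place(index):
        if index == len(items):
            return True
        tried = set()
        for i in range(m):
            if loads[i] in tried:
                continue  # a server with the same load was already tried
            tried.add(loads[i])
            if loads[i] + items[index] <= capacity:
                loads[i] += items[index]
                if place(index + 1):
                    return True
                loads[i] -= items[index]
        return False

    return place(0)


def min_servers(demands, capacity):
    """Start at the lower bound and add one server until the instance is feasible."""
    if any(d > capacity for d in demands):
        raise ValueError("a single demand exceeds the server capacity")
    m = lower_bound(demands, capacity)
    while not fits(demands, capacity, m):
        m += 1
    return m
```

Three services of size 6 on servers of capacity 10 give a lower bound of 2, but the instance only becomes feasible at m = 3, so the loop runs the feasibility check twice.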

Experimental Data

• Experimental data (3 consecutive months, measured in 5-minute intervals)
– 160 traces of the resource usage of Web/Application/Database (W/A/D) servers
– 259 traces describing the load of servers exclusively hosting ERP applications
– Resource demands are given in terms of CPU and memory
– Strong diurnal seasonality for nearly all servers and some weekly seasonality
– CPU is the bottleneck resource for these types of applications
– CPU demand of ERP services is significantly higher than that of W/A/D services

Data Preprocessing

• Data preprocessing: characterize the daily patterns in the workload traces discretely and solve the allocation problem as a discrete optimization problem
• Two-step process to derive the parameters u_jkt for our optimization models from the original workload traces
– u_jkt^raw: the original workload traces
– u_jkt: an estimator derived from the set of u_jkt^raw values

Data Preprocessing

• First step: derive an estimator for each interval
– A day is taken as the period of observation
– p: number of periods contained in the load data (p = 92)
– ϒ’: number of intervals in a single period (ϒ’ = 288)
– Derive u_jkt from the resulting distribution of values per interval

Data Preprocessing

• The y-axis captures a sample of about 92 values for each interval
• Risk attitude: the 0.95-quantile of Ujkt is an estimator for the resource requirement of service j with which 95 percent of requests can be satisfied
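The per-interval estimator can be sketched as follows for one service and one resource. The nearest-rank quantile definition and the function names are assumptions; the transcript does not state which quantile definition [1] uses.

```python
import math


def nearest_rank_quantile(values, q):
    """Smallest sample value such that at least a share q of the sample is <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def estimate_demand(raw, q=0.95):
    """raw[d][t] is the measured demand on day d in interval t.

    Returns one estimate per interval, taken across all days.
    """
    intervals = len(raw[0])
    return [
        nearest_rank_quantile([day[t] for day in raw], q)
        for t in range(intervals)
    ]
```

With the data described above, `raw` would hold 92 days of 288 five-minute values each, and a lower `q` corresponds to a less conservative risk attitude.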

Data Preprocessing

• Second step
– Aggregate these intervals to reduce the number of parameters for the optimization

Experimental Design

• Experimental design
– Model (SSAP and SSAPv)
– Algorithms (B&B, Heuristic, FF, FFD)
– Service type (W/A/D, ERP)
– Number of services
– Server capacity (CPU only)
– Risk attitude
– Number of time intervals considered in SSAPv
– Sensitivity with respect to additional allocation constraints

Experimental Design

• Experimental design
– lp_solve 5.5.9: revised simplex and B&B
– COIN-OR CBC branch-and-cut IP solver with the CLP LP solver
– Java 1.5.0: FF and FFD
– Timeout of 20 minutes (already up to 700 servers)

Simulation Results

• Computation time depending on problem size
– 24 time intervals are examined
– 95th percentile of the 5-minute intervals
– Server capacity of 5000 SAPS
– For each number of services, 20 instances were sampled (with replacement)
– x-axis: number of services
– y-axis: computation time, or the proportion of instances solvable within 20 minutes

Computation time Depending on Problem Size


Proportion of solvable W/A/D instances

Proportion of solvable ERP instances

Within 20 minutes, only much smaller ERP instances can be solved than W/A/D instances.

Solution Quality Depending on Problem Size

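• The computed number of required servers can exceed the lower bound on the number of servers
• The ratio Q of the computed number of servers to the lower bound is referred to as solution quality
• The closer Q is to 1, the better the solution

[Figures: solution quality depending on problem size for W/A/D and ERP services]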

Impact of Risk Attitude on Solution Quality

• The previous simulations assumed that the decision maker selects the 95th percentile in data preprocessing
• 95 percent of the historical service demand would have been satisfied without delay at this capacity
• Risk attitude
– Actual overbooking of server resources (aggregate demands)
– More conservative estimate (reduction in variance)
• Analysis of capacity violations
– 10 different consolidation problems of 250 W/A/D services
– Quantiles: 0.4, 0.45, …, 1
– Solved with SSAPv B&B

Impact of Risk Attitude on Solution Quality

Influence of the Interval Size

[Figure panels: SSAP, SSAPv]

Influence of Additional Allocation Constraints

• Upper bound on the number of services per server
– The number of servers increases
– Computation time increases
• Combination and separation constraints
– Little effect on the solution quality
– Negative impact on computation time
• Technical constraints
– Little effect on the number of servers needed
– Computation time decreases


Shares and Utilities based Server consolidation [2]

• Min, max and shares
• Problem formulation
• Algorithms
– Basic Overprovision (BO)
– Greedy Max (GM)
– Greedy Min Max (GMM)
– Expand Min Max (EMM)
– Power Expand Min Max (PEMM)
– Hypothetical Upper Bound Algorithm (HUB)
• Experimental evaluation

Min, Max and Shares

• Not all applications are created equal.
• Different priorities
– High-priority applications: an e-commerce web server
– Low-priority applications: the intranet blogging server
• Different resource affinities
– Example: a web server may value additional CPU cycles much more than a storage backup application does
• Under high load, CPU resources are best allocated to the higher-utility application, here the web server

Min, Max and Shares

• Take advantage of the Min, Max and Shares parameters
– Min: ensures that a VM receives at least that amount of resources when it is powered on
– Max: ensures that a low-priority application does not use more than that amount, keeping the remaining resources available for high-priority applications
– Shares: tell the virtualization scheduler how to distribute resources between contending VMs (for example, a shares ratio of 1:4)

Min, Max and Shares Impact Experiment

• 3 VMware ESX servers
• 12 VMs (6 low priority and 6 high priority)
– Low load: the VMs desire 35% of the total available CPU
– High load: the VMs desire 100% of the total available CPU
• Under high load conditions, MMS delivers 47% more utility than BASE

Problem Formulation

• The set of VMs
– V_i.m: minimum resources needed (CPU only)
– V_i.M: maximum resources needed (CPU only)
– V_i.u: utility derived from the VM when it is allocated V_i.m
– V_i.U: utility derived from the VM when it is allocated V_i.M
• The set of physical servers
– C_j: CPU capacity of server S_j
– P_j: power cost of server S_j if it is turned on

Problem Formulation

The set of VMs allocated to server Sj

52

Problem Formulation

• Maximize

• Subject to

• A unique multi-knapsack problem: items are elastic between their min and max sizes
• The algorithm must also find the best size for each item

Algorithms-BO

• Power-aware: chooses servers with lower power cost per unit of resource first
• Places VMs in first-fit fashion
• Packs VMs at their maximum requirements
• Conservatively uses 9/10 of each server's capacity
• Fails to prefer higher-utility VMs
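A minimal sketch of the BO steps listed above, assuming VMs are processed in the given order and each server is described by a capacity and a power cost. The function name and tuple layouts are hypothetical; [2] may differ in details such as tie-breaking.

```python
from fractions import Fraction


def basic_overprovision(vms, servers, headroom=Fraction(9, 10)):
    """vms: list of (vm_name, max_demand).
    servers: list of (server_name, capacity, power_cost).
    Returns a dict vm_name -> server_name for the VMs that could be placed.
    """
    # Power-aware ordering: cheapest power cost per unit of capacity first.
    ordered = sorted(servers, key=lambda s: Fraction(s[2], s[1]))
    # Only a conservative share (9/10 by default) of each server is used.
    free = {name: headroom * capacity for name, capacity, _ in ordered}
    placement = {}
    for vm_name, max_demand in vms:
        # First fit at the VM's maximum requirement.
        for server_name, _, _ in ordered:
            if max_demand <= free[server_name]:
                free[server_name] -= max_demand
                placement[vm_name] = server_name
                break
    return placement
```

Because utility never enters the loop, a low-utility VM that arrives early can take the room a high-utility VM would have needed, which is the weakness named in the last bullet.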

Algorithm-GM & GMM

• GM: sorts the VMs by their profitability
– Provides higher utility for every unit of resource than BO
– Assigns the maximum requested allocation
– Drawback: some VMs are more profitable at another, smaller size
• GMM: sorts the VMs by both V_i.U / V_i.M and V_i.u / V_i.m
– When it chooses one of the two entries corresponding to V_i, it deletes the other
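The GMM selection rule can be sketched as follows, assuming first-fit placement over a list of server capacities. The data layout, the tie-breaking, and the function name are assumptions rather than details from [2].

```python
def greedy_min_max(vms, capacities):
    """vms: dict name -> (m, M, u, U) with min size m, max size M and the
    utilities u and U at those sizes. capacities: list of server capacities.
    Returns a dict name -> (server_index, allocated_size, utility).
    """
    entries = []
    for name, (m, M, u, U) in vms.items():
        entries.append((u / m, name, m, u))  # the VM at its minimum size
        entries.append((U / M, name, M, U))  # the VM at its maximum size
    # Most profitable entry (utility per unit of resource) first.
    entries.sort(key=lambda entry: entry[0], reverse=True)

    free = list(capacities)
    placement = {}
    for _, name, size, utility in entries:
        if name in placement:
            continue  # the other entry of this VM was already chosen
        for j, room in enumerate(free):
            if size <= room:
                free[j] -= size
                placement[name] = (j, size, utility)
                break
        # If the size fits nowhere, the VM's other entry stays in the list.
    return placement
```

When a VM's max entry ranks first but does not fit, the loop simply moves on and may still place the same VM later through its min entry.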

Algorithm-GMM

• GMM
– Picks the right combination of VMs, especially when some VMs are more profitable at their min
– VMs placed at their min fail to get additional resources even when the incremental value they provide beyond min is much higher than that of other VMs
– VMs fail to get resources even when some nodes still have room left after first-fitting

Algorithms-EMM

• Does not place VMs in first-fit fashion
• Computes an estimated utility for each node if the new VM were added to that node
• Chooses the node that gives the best utility
• How to compute the utility of a node?
• Given a feasible set Q for S_j, use Algorithm 2 to estimate the utility of node S_j

Compute Node Utility

Expanding VMs that give the most incremental utility per unit capacity until either the node’s capacity is reached or no more expansion is possible
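A minimal sketch of this expansion step is given below. It assumes that utility grows linearly between a VM's min and max size, which the slides do not state, and it uses hypothetical names throughout.

```python
def node_utility(vms, capacity):
    """vms: dict name -> (m, M, u, U) for the VMs assigned to one node.
    Returns (total_utility, allocation), or None if the min sizes do not fit.
    """
    allocation = {name: m for name, (m, M, u, U) in vms.items()}
    spare = capacity - sum(allocation.values())
    if spare < 0:
        return None  # the set of VMs is not feasible for this node
    utility = sum(u for (_, _, u, _) in vms.values())

    def slope(name):
        # Incremental utility per unit of capacity (linear-growth assumption).
        m, M, u, U = vms[name]
        return (U - u) / (M - m) if M > m else 0.0

    # Expand the VM with the highest incremental utility first.
    for name in sorted(vms, key=slope, reverse=True):
        if spare <= 0:
            break
        m, M, u, U = vms[name]
        grow = min(M - m, spare)
        allocation[name] += grow
        utility += slope(name) * grow
        spare -= grow
    return utility, allocation
```

EMM would call such a function once per candidate node and place the new VM where the resulting utility is highest.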

58

Algorithms-EMM

• Limitation of EMM
• It tends to use all the servers it has access to, because an empty server is considered whenever one is available
• This is a disadvantage in terms of power costs
• How to decide whether or not to start a new node?

Algorithms-PEMM

• One important change compared with EMM
• Uses the node utility gain minus the proportional power cost incurred on the new node as the comparative measure
• If S_j is an already open node
• If S_j is a new node

Algorithms-HUB

• Provides an upper bound
– Relaxes the multi-bin packing constraint by “lining up the servers end-to-end”
– Allows a single VM to be placed over multiple servers
– Charges each VM only for its fractional power consumption on its respective machines
• Allows 100% capacity usage on each machine, thus providing an upper bound

Experimental Evaluation

• Large synthetic data center experiments
– Simulate thousands of VMs and hundreds of servers
– Utility and Min-Max inputs: normal distribution
– Power cost is a percentage of the total utility received
• Real testbed experiments
– Three VMware ESX 3.5 servers
– 12 VMs running Red Hat Enterprise Linux (RHEL)
– Low priority (shares 1) and high priority (shares 4), 6 VMs each
– Workloads in each VM were generated using HPCC (HPC Challenge Benchmark, http://icl.cs.utk.edu/hpcc/)

Utility under Different Number of Servers

• MM: packs VMs at sizes other than the maximum (BO always uses the maximum size)
• 150-200 servers: resources given to highly profitable VMs shrink in favor of fitting more less-profitable VMs
• 200-300 servers: more room to expand high-utility VMs
• 300-350 servers: steady state in which all VMs are placed; PEMM achieves higher utility

Number of VMs Excluded under Different Number of Servers

• All algorithms leave many VMs out with up to 150 servers
• PEMM and EMM are the first to fit all the VMs

Utility under Different Number of VMs (at min) per Server

• Lower numbers: more servers are needed, so power costs rise, leading to lower utility
• Higher numbers: fewer servers are needed and utility goes up
• PEMM yields the highest-utility placements

Utility under Different Standard Deviation of Server Capacity

As the standard deviation increases, certain servers have a high capacity and a lower power-to-capacity rate. These servers are filled first, so the utility goes up.

Computation Time under Different VMs

• EMM has a greater execution time because it attempts to use many more servers than PEMM
• PEMM can easily scale to larger systems

Utility under Different Percentage of Low Priority Load

• GMM places all VMs on a single server and allocates only the min amount of resources, resulting in poor utility
• PEMM uses fewer servers than EMM to generate more utility

Utility under Different Percentage of Power Cost

• As the power cost increases, it becomes increasingly expensive to use extra servers, causing drops in utility
• PEMM does best by adjusting to the increasing cost


Server Consolidation with Dynamic Bandwidth Demand [3]

• Dynamic bandwidth demand: modeled by a normal distribution
• Model the problem as Stochastic Bin Packing (SBP)
• Propose an online algorithm for SBP by which the number of servers required is within (1+ε)(√2+1) of the optimum for any ε > 0
• Simulation results

Dynamic Bandwidth Demand

• Normal distribution
– The mean is positive
– The standard deviation is small enough compared with the mean
– The probability of the bandwidth demand being negative is therefore very small

Stochastic Bin Packing

The total size of the items in a bin follows a normal distribution whose mean is the sum of the item means and whose variance is the sum of the item variances.
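Under this model, whether a set of items fits into a bin becomes a probabilistic check. The sketch below assumes independent item demands and a target violation probability; the function name and the parameter `violation_prob` are hypothetical, and [3] may use different notation.

```python
import math
from statistics import NormalDist


def bin_fits(items, capacity, violation_prob):
    """items: list of (mean, std) pairs of independent normal demands.

    Returns True if P(total demand > capacity) <= violation_prob.
    """
    total_mean = sum(mean for mean, _ in items)
    total_std = math.sqrt(sum(std ** 2 for _, std in items))
    # Quantile of the standard normal distribution for the allowed risk.
    beta = NormalDist().inv_cdf(1.0 - violation_prob)
    return total_mean + beta * total_std <= capacity
```

Two items with demands N(100, 30²) and N(200, 40²) have a total of N(300, 50²); at a violation probability of 0.01 they need a capacity of roughly 416, which is less than the roughly 463 obtained by sizing each item separately at the same risk level.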


• Equivalent size
• Classical bin packing
– The number of bins used is the sum of the item sizes plus the sum of the residual capacities of the bins
– Pack each bin compactly so as to reduce its residual capacity


• SBP
– Reducing the residual capacity does not necessarily reduce the number of bins used, since the equivalent size can change
• Solution
– Reduce the residual capacity
– Reduce the equivalent size at the same time while packing items


• One way: items 1-4 (full), 5-13 (full), 14-35
• Second way: items 1-2, 5-8, and 14 (full); items 3-4, 9-10, and 15 (full); the rest must go into two bins
• Both pack the items as compactly as possible
• The total equivalent size in the second way is larger than in the first

Online Algorithm

• Find a method of dividing the items into groups
– Items whose means and standard deviations both fall in the same interval belong to the same group
• Run the Group Packing algorithm

Online Algorithm

• Two scenarios
– Finite possibilities of means
– Generic scenarios
• Worst-case performance ratio
– B(L) is the number of bins used by the packing algorithm
• First scenario
• Second scenario

Simulation Setup

• Uses a traffic dataset from global operational data centers
• 9K VMs; servers are equipped with 1 Gbps Ethernet cards

The bandwidth requirement for VM i

Simulation Results

• Number of servers used: our algorithm 421, HARMONIC 609
– HARMONIC: C. C. Lee and D. T. Lee, "A simple on-line bin-packing algorithm," J. ACM, vol. 32, no. 3, pp. 562-572, 1985
• Use a different value as the bandwidth requirement for VM i
• The number of servers then becomes 402 for HARMONIC
• But the violation probability exceeds 0.01

Simulation Results

82

Simulation Setup

• Number of items: 2,000-20,000
• FF and FFD are used for comparison

Simulation Results

84


Conclusion

• Except for SSAP, the approaches no longer use deterministic values to characterize demand over time
– Some model the problem as an ILP that considers real-world constraints, then solve the ILP with a heuristic algorithm [1]
– Some take advantage of the Min, Max and Shares features inherent in server consolidation [2]
– Some use SBP to consolidate servers with dynamic demand [3]

References

• [1] Speitkamp, B.; Bichler, M., “A Mathematical Programming Approach for Server Consolidation Problems in Virtualized Data Centers,” IEEE Transactions on Services Computing, vol. 3, no. 4, pp. 266-278, Oct.-Dec. 2010
• [2] Cardosa, M.; Korupolu, M.R.; Singh, A., “Shares and utilities based power consolidation in virtualized server environments,” IFIP/IEEE International Symposium on Integrated Network Management (IM ’09), pp. 327-334, 1-5 June 2009
• [3] Meng Wang; Xiaoqiao Meng; Li Zhang, “Consolidating virtual machines with dynamic bandwidth demand in data centers,” 2011 Proceedings IEEE INFOCOM, pp. 71-75, 10-15 April 2011
• [4] Ying Song; Yanwei Zhang; Yuzhong Sun; Weisong Shi, “Utility analysis for Internet-oriented server consolidation in VM-based data centers,” IEEE International Conference on Cluster Computing and Workshops (CLUSTER ’09), pp. 1-10, Aug. 31-Sept. 4, 2009
• [5] C. Chekuri and S. Khanna, “On Multi-Dimensional Packing Problems,” Proc. ACM-SIAM Symp. Discrete Algorithms, pp. 185-194, 1999

Questions and Answers

• Q1: How should the combination constraints in the extensions of SSAP be understood? [1]
• Answer: If x_ei is zero, service e is not allocated to server i, and the formulation then forces x_ji to be zero for the other services j in the combination; in other words, the other services are not allocated to server i either. If x_ei is one, service e is allocated to server i, and all the other services have to be allocated to the same server i.


• Q2: Explain the constraint that limits the number of reallocations. [1]
• Answer: Professor Qiao explained this in a much easier way. Reformulate the constraint as $n - \sum_{(i,j) \in X} x_{ij} \le r$, where X is the set of pairs (i, j) with x_ij = 1 in the current allocation, n is the size of X, and r is the limit on the number of reallocations.


• Q3: How does GMM work if V_i.U / V_i.M is higher, but the corresponding max size cannot be satisfied? [2]
• Answer: If the max size cannot be satisfied, GMM does not allocate this VM at max. It keeps traversing the list and leaves the corresponding V_i.u / V_i.m entry in the list.


• Q4: How are the utility inputs obtained in the Min, Max and Shares paper? [2]
• Answer: The authors use a normal distribution to generate the utilities. They first generate the utility at min, and then derive the utility at max on top of it using a certain variance. The paper does not clearly explain the utility for sizes between min and max.


• Q5: How is the HUB method obtained? [2]
• Answer: HUB is an idealization. It allows VMs to use different servers in different time intervals, and VMs can be fractionally allocated. Real systems cannot do this. If we assume that migration takes no time and that VMs can be fractionally allocated, we obtain this upper bound.


• Q6: Since the “dynamic bandwidth demand” paper is very abstract and theoretical, just introduce the main ideas. [3]
• Answer: The core idea is to pack items as compactly as possible and, at the same time, to minimize the equivalent size of the items. The authors first divide the items into groups according to the intervals in which their mean and variance fall, and then use an online algorithm to pack the items in each group. They prove that the worst-case performance ratio is (1+ε)(√2+1).

Any Questions?

Thank you!
