
Hands on Virtualization with Ganeti (part 1)

Lance Albertson (@ramereth)
Associate Director, OSU Open Source Lab

About us
● OSU Open Source Lab
● Server hosting for Open Source projects
○ Linux Foundation, Apache Software Foundation, Drupal, Python Software Foundation, Freenode, Gentoo, Debian, CentOS, Fedora, etc.
● Open Source development projects
○ Ganeti Web Manager

Session Overview (part 1)
● Ganeti Introduction
● Terminology
● Major Components
● Latest Features
● Using Ganeti in Practice
● How Ganeti is deployed at the OSUOSL

Session Overview (part 2)
● Hands-on Demo
● Installation and Initialization
● Cluster Management
● Adding instances (VMs)
● Controlling instances
● Auto-allocation
● Dealing with node failures

What can Ganeti do?
● Virtual machine management software tool
● Manages clusters of physical machines
● Xen/KVM/LXC VM deployment
● Live migration
● Resiliency to failure
○ data redundancy via DRBD
● Cluster balancing
● Ease of repairs and hardware swaps

Ganeti Cluster

Comparing Ganeti
● Private IaaS
● Primarily utilizes local storage
● Designed for hardware failures
● Mature project
● Low package requirements
● Simple administration
● Easily pluggable via hooks & RAPI

Project Background
● Google-funded project
● Used in Google's internal corporate environment
● Open sourced in 2007 under GPLv2
● Team based in Google Switzerland
● Active mailing list & IRC channel
● Started internally before libvirt, OpenStack, etc.

Goals of Ganeti

Goals: Low Entry Level
● Keeping the entry level as low as possible
● Easy to install, manage and upgrade
● No specialized hardware needed
○ i.e. SANs
● Lightweight
○ no "expensive" package dependencies

Goals: Enterprise Scale
● Manage simultaneously from 1 to ~200 host machines
● Access to advanced features
○ DRBD, live migration, API, OOB control
● Batch VM deployments
● Ease of lateral expansion and rebalancing

Goals: Open Source Citizen
● Design and code discussions are open
● External contributions are welcome
● Cooperate with other "big scale" Ganeti users
● Welcome third-party projects
○ Ganeti Web Manager (OSL), Synnefo (GRNET)

Terminology

Node: virtualization host
Node Group: homogeneous set of nodes (e.g. a rack of nodes)
Instance: virtualization guest
Cluster: set of nodes, managed as a collective
Job: a Ganeti operation

Architecture

Components
● Linux & standard utils (iproute2, bridge-utils, ssh)
● KVM, Xen or LXC
● DRBD, LVM, RBD, or SAN
● Python (plus a few modules)
● socat
● Haskell (optional, for auto-allocation)

Node Roles (management level)

Master node: runs ganeti-masterd, rapi, noded and confd
Master candidates: have a full copy of the config and can become master; run ganeti-confd and noded
Regular nodes: cannot become master; get only part of the config
Offline nodes: in repair or decommissioned

Node Roles (instance hosting level)

VM-capable nodes: can run virtual machines
Drained nodes: are being evacuated
Offline nodes: are in repair

Instances
● A virtual machine that runs on the cluster
● The fault-tolerant/HA entity within the cluster

Instance Parameters
● Hypervisor: hvparams
● General: beparams
● Networking: nicparams
● Modifiable at the instance or cluster level
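As a sketch of how the two levels fit together (the instance name "web" and all values here are hypothetical), the same parameter groups are set per instance with gnt-instance modify or as cluster-wide defaults with gnt-cluster modify:

```shell
# Hypothetical instance "web": set backend (be) and hypervisor (hv)
# parameters on one instance...
gnt-instance modify -B memory=2048 web
gnt-instance modify -H boot_order=network web

# ...or change the cluster-wide defaults instead
gnt-cluster modify -B vcpus=2
```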

hvparams
● Boot order, CDROM image
● NIC type, disk type
● VNC parameters, serial console
● Kernel path, initrd, args
● Other hypervisor-specific parameters

beparams / nicparams
● Memory / virtual CPUs
● Adding or removing disks
● MAC address
● NIC mode (routed or bridged)
● Link

Disk Template

drbd: LVM + DRBD between 2 nodes
rbd: RBD volumes residing inside a RADOS cluster *
plain: LVM with no redundancy
diskless: no disks; useful for testing only

* experimental support added in 2.6
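The template is chosen per instance at creation time with -t; a sketch with hypothetical node and instance names:

```shell
# drbd needs a primary and a secondary node; plain needs only one
gnt-instance add -t drbd -n node1:node2 \
  --disk 0:size=10G -o debootstrap+default vm1

gnt-instance add -t plain -n node1 \
  --disk 0:size=10G -o debootstrap+default vm2
```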

Primary & Secondary Concepts

● An instance always runs on its primary node
● The secondary node is used for disk replication
● Depends on disk template (i.e. drbd vs. plain)

Instance creation scripts (also known as OS Definitions)
● Ganeti requires an operating system installation script
● These provide scripts to deploy various operating systems
● Ganeti Instance Debootstrap
○ upstream supported
● Ganeti Instance Image
○ written by me

OS Variants
● Variants of an OS Definition
● Used for defining the guest operating system
● Types of deployment settings:
○ extra packages
○ filesystem
○ image directory
○ image name
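Variants are selected after the "+" in the OS name when creating an instance; a sketch assuming instance-image is installed and ships a hypothetical "centos" variant:

```shell
# list installed OS definitions and their variants, e.g. image+centos
gnt-os list

# create an instance using that variant (names are hypothetical)
gnt-instance add -t plain -n node2 \
  --disk 0:size=10G -o image+centos centos-test
```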

Latest Features

2.4 (March 2011)
● Out-of-band management
● vhost-net support (KVM)
● hugepages support (KVM)
● initial node groups

2.5 (April 2012)
● shared storage (SAN) support
● improved node groups (scalability, evacuate, commands)
● master IP turnup customization
● full SPICE support (KVM)

Latest Features

2.6 (July 2012)
● RBD support (Ceph)
● initial memory ballooning (KVM, Xen)
● CPU pinning
● OVF export/import support
● customized DRBD parameters
● policies for better resource modeling
● optional Haskell ganeti-confd

Upcoming (just ideas, not promises)
● full dynamic memory support
● better instance networking customization
● rolling reboot
● better automation, self-healing, availability
● higher scalability
● KVM block device migration
● better OS installation

Initializing your cluster

gnt-cluster init [-s ip] ... \
  --enabled-hypervisors=kvm cluster

The node needs to be set up first, following the Ganeti installation guide.
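A minimal sketch of a single-node KVM initialization (hostname and addresses are hypothetical; the cluster name must resolve to a free IP that becomes the floating master IP):

```shell
gnt-cluster init \
  --enabled-hypervisors=kvm \
  -s 192.0.2.10 \
  cluster.example.org

# sanity-check the fresh cluster
gnt-cluster verify
gnt-cluster info
```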

gnt-cluster

Cluster-wide operations:

gnt-cluster info
gnt-cluster modify [-B/-H/-N ...]
gnt-cluster verify
gnt-cluster master-failover
gnt-cluster command/copyfile ...

Adding nodes

gnt-node add [-s ip] node2
gnt-node add [-s ip] node3
gnt-node add [-s ip] node4

Adding instances

# install instance-{debootstrap,image}
gnt-os list
gnt-instance add -t drbd \
  {-n node3:node2 | -I hail} \
  -o debootstrap+default web
ping web
ssh web  # easy with OS hooks

gnt-node

Per-node operations:

gnt-node remove node4
gnt-node modify \
  [--master-candidate yes|no] \
  [--drained yes|no] \
  [--offline yes|no] node2
gnt-node evacuate/failover/migrate
gnt-node powercycle

-t drbd

"RAID1" over the network

DRBD provides redundancy to instance data, and makes it possible to perform live migration without having shared storage between the nodes.
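With a drbd-backed instance, the primary and secondary roles can be swapped while the guest keeps running; a sketch using the hypothetical instance "web":

```shell
# live-migrate "web" to its secondary node (roles swap)
gnt-instance migrate web

# if the primary node is down instead, fail over (non-live restart
# on the secondary)
gnt-instance failover web
```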

Recovering from failure

# set the node offline
gnt-node modify -O yes node3

# failover instances to their secondaries
gnt-node failover --ignore-consistency node3

# or, for each instance:
gnt-instance failover \
  --ignore-consistency web

# restore redundancy
gnt-node evacuate -I hail node3

# or, for each instance:
gnt-instance replace-disks \
  {-n node1 | -I hail} web

gnt-backup

Manage instance exports/backups:

gnt-backup export -n node1 web
gnt-backup import -t plain \
  {-n node3 | -I hail} \
  --src-node node1 \
  --src-dir /tmp/myexport web
gnt-backup list
gnt-backup remove

htools: cluster resource management
● Written in Haskell
● Where do I put a new instance? Where do I move an existing one?
○ hail: the H iallocator
● How much space do I have?
○ hspace: the H space calculator
● How do I fix an N+1 error?
○ hbal: the cluster balancer
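A sketch of using the htools on the master node, talking to the local master daemon via -L; by default hbal only reports what it would do:

```shell
hbal -L        # show the cluster score and the moves hbal proposes
hbal -L -X     # actually execute those moves as Ganeti jobs
hspace -L      # estimate how many more instances would fit
```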

Controlling Ganeti
● Command line *
● Ganeti Web Manager
○ developed by the OSUOSL
● RAPI (RESTful HTTP interface) *
● On-cluster "luxi" interface *
○ luxi is currently JSON over a Unix socket
○ there are bindings for Python and Haskell

* programmable interfaces
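The RAPI daemon listens on port 5080 over HTTPS and returns JSON; a sketch of read-only queries against a hypothetical master host (-k skips verification of the cluster's self-signed certificate):

```shell
curl -k https://cluster.example.org:5080/2/info
curl -k https://cluster.example.org:5080/2/instances
```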

Job Queue

gnt-job list
gnt-job info
gnt-job watch
gnt-job cancel

● Ganeti operations generate jobs in the master
○ with the exception of queries
● Jobs execute concurrently
● You can cancel non-started jobs, inspect the queue status, and inspect individual jobs
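Most gnt-* commands accept --submit, which enqueues the job and returns its ID immediately instead of blocking; a sketch with a hypothetical instance and job ID:

```shell
gnt-instance reboot --submit web   # prints the new job ID
gnt-job list                       # see it in the queue
gnt-job watch 42                   # follow its log (hypothetical ID)
```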

gnt-group

Managing node groups:

gnt-group add
gnt-group assign-nodes
gnt-group evacuate
gnt-group list
gnt-group modify
gnt-group remove
gnt-group rename
gnt-instance change-group

Running Ganeti in Production: what should you add?
● Monitoring/automation
○ check host disks, memory, load
○ trigger events (evacuate, send to repairs, readd node, rebalance)
● Automated host installation/setup (config management)
● Self-service use
○ instance creation and resize
○ instance console access

Ganeti in practice
● Small to medium virtualization environments
● High performance
○ dedicated hardware, faster disks, more spindles on local storage
○ cheap hardware to high-end hardware
● Higher reliability

Ganeti as a "cloud"
● Not a traditional cloud environment
○ no AWS APIs (yet, at least), no object store
○ Ganeti-specific API
● Tools to extend it
○ Ganeti Web Manager, Synnefo, GlusterFS, Ceph
● Storage layer differences
○ block devices instead of disk images (typically)

How the OSL uses Ganeti
● Powers all managed virtualization
● Project hosting
● KVM based
● Hundreds of VMs
● Web hosts, code hosting, etc.
● Per-project clusters: PSF, OSGeo, phpBB, Gentoo
● Powers Supercell

Ganeti at OSL
● Node OS: Gentoo
○ migrating towards CentOS
● CFEngine for node configuration setup
● Utilize instance-image for guest installs
○ flexibility in the guest operating systems we can deploy
● 10 clusters, 27 nodes, 230 instances
● Ganeti Web Manager

Ganeti at OSL
● Production cluster
○ busybox, darcs, inkscape, musicbrainz, openmrs, php.net, qemu, freenode, yum
○ 5 nodes, 20 instances per machine
○ 64G RAM / 3-7TB / 24 cores (2)
○ 24G RAM / 670G / 4 cores (3)
● Reduced cooling footprint
● Per-project clusters enabled flexibility

People running Ganeti
● Google
○ corporate computing infrastructure
● osuosl.org
○ Oregon State University Open Source Lab
● grnet.gr
○ Greek Research & Technology Network
● nero.net
○ Network for Education & Research in Oregon

Questions? (Part 1 Conclusion)

Lance Albertson

[email protected]

@ramereth

http://lancealbertson.com

Check it out at: http://code.google.com/p/ganeti/

Or just search for "Ganeti"

Try it. Love it. Improve it. Contribute back (CLA required).

© 2009-2012 Oregon State University

Use under CC-by-SA / Some content borrowed/modified from Iustin Pop (with permission)