Unveiling CERN Cloud Architecture
OpenStack Design Summit – Tokyo, 2015
Belmiro Moreira [email protected] @belmiromoreira
What is CERN?
• European Organization for Nuclear Research (Conseil Européen pour la Recherche Nucléaire)
• Founded in 1954
• 21 member states; other countries contribute to experiments
• Situated between Geneva and the Jura Mountains, straddling the Swiss-French border
• CERN's mission is fundamental research
LHC - Large Hadron Collider
LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
LHC and Experiments
Proton-lead collisions at ALICE detector
CERN Data Centres
OpenStack at CERN by numbers
• ~5000 compute nodes (~130k cores): ~4800 KVM, ~200 Hyper-V
• ~2400 images (~30 TB in use)
• ~1800 volumes (~800 TB allocated)
• ~2000 users
• ~2300 projects
• ~16000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
OpenStack timeline at CERN
• OpenStack releases: Essex (5 Apr 2012); Folsom (27 Sep 2012); Grizzly (4 Apr 2013); Havana (17 Oct 2013); Icehouse (17 Apr 2014); Juno (16 Oct 2014); Kilo (30 Apr 2015); Liberty
• At CERN: "Guppy" (Jun 2012); "Ibex" (Mar 2013); Grizzly (Jul 2013); "Hamster" (Oct 2013); Havana (February 2014); Icehouse (October 2014); Juno (April 2015); Kilo (October 2015)
CERN production infrastructure
• Evolution of the number of VMs created since July 2013
OpenStack timeline at CERN
Number of VMs running; number of VMs created (cumulative)
Infrastructure Overview
• One region, two data centres, 26 Cells
• HA architecture only on the Top Cell
• Children Cells control planes are usually VMs running in the shared infrastructure
• Using nova-network with a custom CERN driver
• 2 hypervisor types (KVM, Hyper-V)
• Scientific Linux CERN 6; CERN CentOS 7; Windows Server 2012 R2
• 2 Ceph instances
• Keystone integrated with the CERN account/lifecycle system
• Nova; Keystone; Glance; Cinder; Heat; Horizon; Ceilometer; Rally
• Deployment using OpenStack puppet modules and RDO
Architecture Overview
• Geneva Data Centre: Load Balancer; Nova Top Cell; several Nova Compute Cells; Glance; Cinder; Heat; Ceilometer; Horizon; Keystone; Ceph; DB infrastructure
• Budapest Data Centre: several Nova Compute Cells; Ceph; DB infrastructure
Why Cells?
• Single endpoint to users
• Scale transparently between data centres
• Availability and resilience
• Isolate different use cases
CellsV1 Limitations
• Functionality limitations:
  • Security Groups
  • Managing aggregates on the Top Cell
  • Availability Zone support
  • Limited cell scheduler functionality
  • Ceilometer integration
Nova Deployment at CERN
• API nodes behind a Load Balancer: nova-api
• Top cell controller: nova-cells; rabbitmq; DB
• Child cell controllers (one per cell): nova-cells; nova-api; nova-scheduler; nova-conductor; nova-network; rabbitmq; DB
• Compute nodes: nova-compute
Nova - Cells Control Plane
Top Cell Controller:
• Controller nodes running only on physical nodes
• Clustered RabbitMQ with mirrored queues
• "nova-api" nodes are VMs, deployed in the "common" (user shared) infrastructure
Children Cells Controllers:
• Only ONE controller node per cell
• NO HA at Children Cell level
• Most are VMs running in other Cells
• Children Cell controller fails? It is replaced by another VM; user VMs are still available
• ~200 compute nodes per cell
Nova - Cells Scheduling
• Different cells have different use cases: hardware, location, network configuration, hypervisor type, ...
• Cell capabilities: "datacentre", "hypervisor", "avzs"
  • Example: capabilities=hypervisor=kvm,avzs=avz-a,datacentre=geneva
• Scheduler filters use these capabilities (an illustrative filter sketch follows)
• CERN Cell Filters available at: https://github.com/cernops/nova/tree/cern-2014.2.2-1/nova/cells/filters
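To make the capability-based scheduling concrete, below is a minimal sketch of a CellsV1 filter in the spirit of the CERN filters linked above. It is not the CERN code: the class name and the assumption that the requested capabilities arrive as scheduler hints are illustrative only.

```python
# Minimal sketch of a CellsV1 capability filter (illustrative, not the CERN
# implementation). Assumes the requested capabilities arrive as scheduler
# hints, e.g. {'hypervisor': 'kvm', 'datacentre': 'geneva'}.
from nova.cells import filters


class CapabilityFilter(filters.BaseCellFilter):
    """Keep only the cells whose advertised capabilities match the request."""

    def filter_all(self, cells, filter_properties):
        requested = filter_properties.get('scheduler_hints') or {}
        passing = []
        for cell in cells:
            # cell.capabilities maps a capability name to a set of values,
            # e.g. {'hypervisor': set(['kvm']), 'avzs': set(['avz-a'])}
            caps = cell.capabilities
            if all(str(value) in {str(v) for v in caps.get(name, set())}
                   for name, value in requested.items()):
                passing.append(cell)
        return passing
```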
Nova - Cells Scheduling - Project Mapping
How do we map projects to cells? https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/cells/filters/target_cell_project.py
• Default cells; dedicated cells
• The target cell is selected considering the following "nova.conf" configuration (a small parsing sketch follows):
  cells_default=cellA,cellB,cellC,cellD
  cells_projects=cellE:<project_uuid1>;<project_uuid2>,cellF:<project_uuid3>
• "Disabling" a cell is simply removing it from the list...
  http://openstack-in-production.blogspot.fr/2015/10/scheduling-and-disabling-cells.html
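The selection logic behind these two options fits in a few lines of Python. This is a simplified sketch of the idea, not the target_cell_project filter linked above; the option values follow the "nova.conf" example and everything else is illustrative.

```python
# Illustrative parsing of the cells_default / cells_projects options shown
# above (simplified sketch, not the CERN target_cell_project filter).
import random


def parse_project_cells(cells_projects):
    """'cellE:uuid1;uuid2,cellF:uuid3' -> {'uuid1': 'cellE', ...}"""
    mapping = {}
    for entry in filter(None, cells_projects.split(',')):
        cell, projects = entry.split(':', 1)
        for project in projects.split(';'):
            mapping[project] = cell
    return mapping


def target_cell(project_id, cells_default, cells_projects):
    """Dedicated cell if the project has one, otherwise one of the defaults."""
    dedicated = parse_project_cells(cells_projects).get(project_id)
    if dedicated:
        return dedicated
    # "Disabling" a cell is simply removing it from cells_default.
    return random.choice(cells_default.split(','))


print(target_cell('uuid3', 'cellA,cellB,cellC,cellD',
                  'cellE:uuid1;uuid2,cellF:uuid3'))  # -> cellF
```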
Nova - Cells Scheduling - AVZs
• The CellsV1 implementation is not aware of aggregates
• How to have AVZs with cells?
  • Create the aggregate/availability zone in the Top Cell
  • Create "fake" nova-compute services to add nodes into the AVZ aggregates
  • The Cell scheduler uses "capabilities" to identify AVZs
  • NO aggregates in the children cells
Nova - Legacy Child Cell configuration at CERN
• Our first cell (2013)
• Cell with >1000 compute nodes
• Any problem in the Cell control plane had a huge impact
• All availability zones behind this Cell, using aggregates
• Aggregates dedicated to specific projects
• Multiple hardware types
• KVM and Hyper-V
Nova - Cell Division (from 1 to 9)
How to divide an existing Cell?
• Set up the new Child Cell controllers
• Copy the existing DB to all new Cells and delete all instance records that will not belong to the new Cell
• Move compute nodes to the new Cells
• Change the instances' "cells path" in the Top Cell DB (see the sketch below)
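As a rough illustration of the last step: in CellsV1 the Top Cell stores the cell path of every instance in the instances.cell_name column, so moving compute nodes means re-pointing their instances at the new cell. The snippet below is a heavily simplified sketch; host names, cell path and credentials are made up, and the child-cell cleanup additionally has to deal with the instance-related tables.

```python
# Heavily simplified sketch of re-pointing instances to a new cell in the
# Top Cell DB (CellsV1 keeps the "cells path" in instances.cell_name).
# Host names, cell path and credentials below are made up for the example.
import pymysql

MOVED_HOSTS = ['compute-101', 'compute-102']   # nodes moved to the new cell
NEW_CELL_PATH = 'top!cell_new'                 # "cells path" of the new cell

conn = pymysql.connect(host='topcell-db', user='nova', password='***',
                       database='nova')
with conn.cursor() as cur:
    placeholders = ','.join(['%s'] * len(MOVED_HOSTS))
    cur.execute(
        "UPDATE instances SET cell_name = %s "
        "WHERE host IN ({}) AND deleted = 0".format(placeholders),
        [NEW_CELL_PATH] + MOVED_HOSTS)
conn.commit()
```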
Nova - Live Migration
• Block live migration
  • Compute nodes don't have shared storage
• Not used for daily operations...
  • Resource availability and network cluster constraints
  • Only considered for pets
• Planned for the SLC6 to CC7 migration
• Planned for hardware end of life
• How to orchestrate a large live-migration campaign? (a minimal sketch follows)
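One way to script such a campaign is against the Nova API with python-novaclient (compute API v2, current at the time). The sketch below is illustrative only: credentials, the hypervisor list and the pacing are placeholders, and a real campaign would poll migration status and respect the network-cluster constraints mentioned above.

```python
# Minimal sketch of a block live-migration campaign with python-novaclient.
# Credentials, host names and pacing are placeholders.
import time
from novaclient import client

nova = client.Client('2', 'admin', 'SECRET', 'admin',
                     'https://keystone.example.org:5000/v2.0')

OLD_HYPERVISORS = ['slc6-node-001', 'slc6-node-002']   # e.g. SLC6 nodes

for hypervisor in OLD_HYPERVISORS:
    servers = nova.servers.list(search_opts={'host': hypervisor,
                                             'all_tenants': 1})
    for server in servers:
        # host=None lets the scheduler pick the target; block migration is
        # used because the compute nodes have no shared storage.
        server.live_migrate(host=None, block_migration=True,
                            disk_over_commit=False)
        time.sleep(60)  # naive pacing between migrations
```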
Nova - Live Migration
• Block live migration with volumes attached is problematic...
  • Attached Cinder volumes are block migrated along with the instance
  • They are copied, over the network, from themselves to themselves
  • This can cause data corruption
• https://bugs.launchpad.net/nova/+bug/1376615
• https://bugzilla.redhat.com/show_bug.cgi?id=1203032
• https://review.openstack.org/#/c/176768/
Nova - Kilo with SLC6
• Kilo dropped support for Python 2.6
• We still have ~800 compute nodes running on SLC6
• We needed to build a Nova RPM for SLC6
• Original recipe from GoDaddy!
  • Create a venv using Python 2.7 from SCL
  • Build the venv with Anvil
  • Package the venv in an RPM
Nova - Network
CERN network configuration:
• The network is divided into several "network clusters" (L3 networks), each with several "IP services" (L2 subnets)
• Each compute node is associated with a "network cluster"
• VMs running on a compute node can only have an IP from the "network cluster" associated with that compute node
• https://etherpad.openstack.org/p/Network_Segmentation_Usecases
Nova - Network
• Developed a custom CERN network driver (an illustrative sketch of the selection logic follows)
• Creating a new VM:
  1. Select the network cluster considering the compute node chosen to boot the instance
  2. Select an address from that network cluster
  3. Update the CERN network database
  4. Wait for the central DNS refresh
• The "fixed_ips" table contains IPv4, IPv6, MAC and the network cluster
• A new table maps "host" -> network cluster
• Network constraints apply to some nova operations: resize, live migration
• https://github.com/cernops/nova/blob/cern-2014.2.2-2/nova/network/manager.py
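The essence of steps 1-2 is a host-to-cluster lookup followed by picking a free address inside that cluster. The sketch below illustrates only that selection logic with plain dictionaries and invented data; it is not the CERN driver code linked above.

```python
# Illustrative selection of a fixed IP constrained by the compute node's
# network cluster (invented data; not the CERN network driver).

# Mapping "host" -> network cluster (a dedicated table in the CERN driver).
HOST_TO_CLUSTER = {'compute-513-a-01': 'cluster-513-a'}

# Free addresses per network cluster (the CERN driver keeps the cluster name
# alongside IPv4, IPv6 and MAC in the "fixed_ips" table).
FREE_IPS = {'cluster-513-a': ['192.0.2.21', '192.0.2.22']}


def allocate_fixed_ip(host):
    """Pick an address from the network cluster associated with the host."""
    cluster = HOST_TO_CLUSTER.get(host)
    if cluster is None or not FREE_IPS.get(cluster):
        raise RuntimeError('no address available in the network cluster '
                           'of host %s' % host)
    ip = FREE_IPS[cluster].pop(0)
    # The CERN driver then updates the CERN network database and waits for
    # the central DNS refresh before the VM becomes reachable.
    return cluster, ip


print(allocate_fixed_ip('compute-513-a-01'))
```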
Neutron is coming...
• NOT in production; a testing/development instance
• What we use/don't use from Neutron:
  • No SDN or tunneling
  • Only provider networks, no private/tenant networks
  • Flat networking; VMs bridged directly to the real network
  • No DHCP or DNS from Neutron; we already have our own infrastructure
  • We don't use floating IPs
  • The Neutron API is not exposed to users
• Implemented API extensions and a Mechanism Driver for our use case
  • https://github.com/cernops/neutron/commit/63f4e19c7423dcdc2b5a7573d0898ec9e799663b
• How to migrate from nova-network to Neutron?
Keystone Deployment at CERN
• Load Balancer in front of two sets of Keystone nodes: one exposed to users, one dedicated to Ceilometer
• DB and Service Catalogue DB
• Active Directory backend
Keystone
• Keystone nodes are VMs
• Integrated with CERN's Active Directory infrastructure
• Project life cycle (see the sketch below):
  • ~200 arrivals/departures per month
  • A CERN user subscribes to the "cloud service"
  • A "Personal Project" is created with a limited quota
  • "Shared Projects" are created by request
  • The "Personal Project" is disabled when the user leaves the Organization
  • After 3 months resources are stopped; after 6 months resources are deleted (VMs, Volumes, Images, ...)
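The retirement of personal projects can be illustrated with python-keystoneclient (Identity v3 API). This is a hedged sketch: the departure lookup and the project-naming convention are assumptions for the example; only the disable call reflects the standard Keystone API.

```python
# Illustrative sketch of disabling the personal project of a departed user
# with python-keystoneclient v3. The departure lookup (get_departed_users)
# and the naming convention are assumptions, not the CERN tooling.
from keystoneauth1 import session
from keystoneauth1.identity import v3
from keystoneclient.v3 import client


def get_departed_users():
    """Placeholder: at CERN this information comes from Active Directory."""
    return ['jdoe']


auth = v3.Password(auth_url='https://keystone.example.org:5000/v3',
                   username='admin', password='SECRET', project_name='admin',
                   user_domain_id='default', project_domain_id='default')
keystone = client.Client(session=session.Session(auth=auth))

for username in get_departed_users():
    # Assumed convention: the personal project carries the user's login name.
    for project in keystone.projects.list(name=username):
        keystone.projects.update(project, enabled=False)
        # 3 months later resources are stopped, 6 months later deleted.
```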
Glance Deployment at CERN
• Load Balancer in front of two sets of Glance nodes, each running Glance-api and Glance-registry: one set exposed to users, one only used for Ceilometer calls
• DB
• Ceph Geneva backend
Glance
• Uses the Ceph backend in Geneva
• Glance nodes are VMs
• NO Glance image cache
• Glance API and Glance Registry run on the same node
  • The Glance API only talks to the local Glance Registry
• Two sets of nodes (API exposed to users; API for Ceilometer)
• When will Glance have quotas per project?
  • Problematic in private clouds where users are not "charged" for storage
Cinder Deployment at CERN
• Load Balancer in front of Cinder nodes running cinder-api, cinder-scheduler and cinder-volume
• rabbitmq; DB
• Backends: Ceph Geneva, Ceph Budapest, NetApp
Cinder
• Ceph and NetApp backends
• Extended list of available volume types (QoS, backend, location)
• Cinder nodes are VMs
• Active/Active?
  • When a volume is created, a "cinder-volume" node is associated with it and is responsible for its volume operations
  • Not easy to replace cinder controller nodes: DB entries need to be changed manually (see the sketch below)
• More about the CERN storage infrastructure for OpenStack:
  • https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service
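For the last point, replacing a cinder-volume node means re-pointing its volumes at the new node in the Cinder DB (the host column carries the node and backend name). A grossly simplified sketch with invented host names and credentials:

```python
# Grossly simplified sketch of re-assigning volumes from a retired
# cinder-volume node to its replacement (invented host names/credentials).
import pymysql

OLD_HOST = 'cinder-node-01@ceph-geneva'
NEW_HOST = 'cinder-node-02@ceph-geneva'

conn = pymysql.connect(host='cinder-db', user='cinder', password='***',
                       database='cinder')
with conn.cursor() as cur:
    cur.execute("UPDATE volumes SET host = %s "
                "WHERE host = %s AND deleted = 0",
                (NEW_HOST, OLD_HOST))
conn.commit()
```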
Ceilometer Deployment at CERN
• Compute nodes: nova-compute with the ceilometer-compute agent
• ceilometer-central-agent
• Cell rabbitmq (notifications) and a dedicated Ceilometer rabbitmq
• Ceilometer notification agent; notification, polling and UDP collectors (samples sent via RPC and UDP)
• Storage backends: HBase, MySQL, MongoDB
• Ceilometer API; Ceilometer Evaluator & Notifier; Heat
Ceilometer
• The "ceilometer-compute-agent" queries "nova-api" for the instances hosted on its compute node
  • This can be very demanding for "nova-api"
  • When using the default "instance_name_template", the "instance_name" in the Top Cell is different from the one in the Child Cell
  • Need to have a "nova-api" per Cell
  • Number of Nova API calls done by ceilometer-compute-agent per hour
• Using a dedicated RabbitMQ cluster for Ceilometer
  • Initially we used the Children Cells' RabbitMQ - not a good idea!
• Any failure/slowdown in the backend storage system can create a big queue...
Ceilometer
Size of “metering.sample” queue
Rally
• Probing/benchmarking the infrastructure every hour
Challenges
• Capacity increase to 200k cores by Summer 2016
• Live migrate thousands of VMs
  • Upgrade ~800 compute nodes from SLC6 to CC7
  • Retire old servers
• Move to Neutron
• Identity Federation with different scientific sites
• Magnum and container possibilities
[email protected] @belmiromoreira
http://openstack-in-production.blogspot.com