OpenStack HA - Theory to Reality
Speakers:
● Gerd Prüßmann, Cloud Architect, Deutsche Telekom AG (@2digitsleft)
● Shamail Tahir, Cloud Architect, EMC Office of the CTO (@ShamailXD)
● Sriram Subramanian, Founder & Cloud Specialist, CloudDon (@sriramhere)
● Kalin Nikolov, Cloud Engineer, PayPal
Agenda
● OpenStack HA - Introduction
● Active/Active
● Active/Passive
● DT Implementation
● eBay/PayPal Implementation
● Summary
OpenStack HA - Introduction
● What does it mean?
● Why is it not enabled by default?
● Stateless vs. stateful services
● Challenges
● More than one way: Active/Passive and Active/Active
Active/Active
● The OpenStack High Availability (HA) concept depends on the components used, e.g. network virtualization, storage backend, database system.
● Various technologies are available to realize HA; vendors use combinations such as Pacemaker, Corosync, Galera, Keepalived, HAProxy, VRRP, DRBD, or their own tools.
● The following description is derived from the generic proposal in the OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
Active/Active
● Target: make all services of the platform highly available
o redundancy and resiliency against single service/node failure
● stateless services are load balanced (HAProxy + Keepalived)
o e.g. API endpoints, nova-scheduler
● stateful services use individual HA technologies
o e.g. RabbitMQ, MySQL DB
o might be load balanced as well
● some services/agents have no built-in HA feature available
Active/Active - API service endpoints
API endpoints
● deploy on multiple nodes
● configure load balancing with virtual IPs (VIPs) in HAProxy
● use HAProxy's VIPs to configure the respective identity endpoints
● all service configuration files refer to these VIPs only (see the HAProxy sketch below)
Schedulers
● nova-scheduler, nova-conductor, cinder-scheduler, neutron-server, ceilometer-collector, heat-engine
● schedulers are configured with the clustered RabbitMQ nodes
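
A minimal HAProxy sketch for one load-balanced API endpoint; the VIP 10.0.0.100 and the controller addresses are illustrative assumptions, and the same pattern repeats per service:

  listen keystone_public
    bind 10.0.0.100:5000
    balance roundrobin
    option httpchk                      # drop a backend from rotation if its HTTP check fails
    server ctrl1 10.0.0.11:5000 check
    server ctrl2 10.0.0.12:5000 check
    server ctrl3 10.0.0.13:5000 check

Keepalived then floats the VIP between the HAProxy nodes, so the load balancer itself is not a single point of failure.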
Active/Active - Databases
● MySQL or MariaDB with the Galera cluster (wsrep) library extension
o transaction-commit-level replication
● synchronous multi-master setup
o min. 3 nodes to keep quorum in case of a network partition
● write and read to any node
● other database options possible: Percona XtraDB, PostgreSQL etc.
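
A minimal Galera sketch for the MySQL/MariaDB server configuration; the node addresses are illustrative, and the provider path varies by distribution:

  [mysqld]
  binlog_format=ROW                 # Galera requires row-based replication
  default_storage_engine=InnoDB
  innodb_autoinc_lock_mode=2
  wsrep_provider=/usr/lib/galera/libgalera_smm.so
  wsrep_cluster_name="openstack_db"
  wsrep_cluster_address="gcomm://10.0.0.21,10.0.0.22,10.0.0.23"

With three nodes, the cluster keeps quorum when any single node fails, and reads and writes can go to any member.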
Active/Active - RabbitMQ
● RabbitMQ nodes clustered
● mirrored queues configured via policy (e.g. ha-mode: all)
● all services use the RabbitMQ nodes
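
A sketch of the clustering and mirroring steps; the node name rabbit@node1 is an illustrative assumption:

  # on each joining node
  rabbitmqctl stop_app
  rabbitmqctl join_cluster rabbit@node1
  rabbitmqctl start_app
  # mirror all queues across the cluster (empty pattern matches every queue name)
  rabbitmqctl set_policy ha-all "" '{"ha-mode":"all"}'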
Active/Active - Networking
Network
● deploy multiple network nodes
● Neutron DHCP agent: configure multiple DHCP agents (dhcp_agents_per_network)
● Neutron L3 agent
o automatic L3 agent HA (allow_automatic_l3agent_failover)
o VRRP (l3_ha, max_l3_agents_per_router, min_l3_agents_per_router)
● Neutron L2 agent: no HA available
● Neutron metadata agent: no HA available
● Neutron LBaaS agent: no HA available
● where no HA feature is available: active/passive Pacemaker/Corosync solution
(see the neutron.conf sketch below)
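
A sketch of the relevant neutron.conf options on the network nodes; the option names are the ones listed above, the values are illustrative:

  [DEFAULT]
  # run two DHCP agents per tenant network
  dhcp_agents_per_network = 2
  # reschedule routers away from a failed L3 agent
  allow_automatic_l3agent_failover = True
  # VRRP-based HA routers
  l3_ha = True
  min_l3_agents_per_router = 2
  max_l3_agents_per_router = 3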
Active/Passive: General
● components should leverage a virtual IP (VIP)
● the primary tools used for Active/Passive OpenStack configurations are general (non-OpenStack-specific): Pacemaker + Corosync, and DRBD
Corosync
● messaging layer used by the cluster
● responsibilities include cluster membership and messaging
● leverages RRP (Redundant Ring Protocol)
o rings can be set up as active/active or active/passive
o UDP only
o mcastport specifies the receive port; mcastport minus 1 is the send port
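
A corosync.conf totem sketch with two redundant rings; the networks and multicast addresses are illustrative assumptions:

  totem {
    version: 2
    rrp_mode: active          # redundant rings run active/active ("passive" is the A/P variant)
    interface {
      ringnumber: 0
      bindnetaddr: 10.0.0.0
      mcastaddr: 239.255.1.1
      mcastport: 5405         # receive port; 5404 (mcastport - 1) is used for sending
    }
    interface {
      ringnumber: 1
      bindnetaddr: 10.0.1.0
      mcastaddr: 239.255.2.1
      mcastport: 5405
    }
  }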
Pacemaker
● Cluster Resource Manager
● Cluster Information Base (CIB)
o represents the current state of resources and cluster configuration (XML)
● Cluster Resource Management Daemon (CRMd)
o acts as the decision maker (one master)
● Policy Engine (PEngine)
o sends instructions to the LRMd and CRMd
● STONITHd
o fencing mechanism
● Resource Agents
o standardized interfaces for resources
[Diagram: Pacemaker internals - the CRMd coordinating the STONITHd, CIB, PEngine, and LRMd]
DRBD
● Distributed Replicated Block Device
● creates logical block devices (e.g. /dev/drbdX) that have backing volumes
● reads are serviced locally
● writes on the primary node are sent to the secondary node
Active/Passive: Database
[Diagram: two-host Active/Passive stack - MySQL on DRBD, managed by Pacemaker and Corosync on Host1 and Host2]
● Use DRBD to back MySQL
● Leverage a VIP that can float between hosts
● Manage all resources (including the MySQL daemon) with Pacemaker
● MySQL/Galera is an alternative, but the current version of the HA Guide does not recommend it
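
A crm shell sketch of this pattern; the IP, device, and resource names are illustrative, while the agents (ocf:heartbeat:IPaddr2, ocf:linbit:drbd, ocf:heartbeat:Filesystem, ocf:heartbeat:mysql) are the standard ones:

  crm configure primitive p_drbd_mysql ocf:linbit:drbd \
    params drbd_resource="mysql" op monitor interval="15s"
  crm configure ms ms_drbd_mysql p_drbd_mysql \
    meta master-max="1" clone-max="2" notify="true"
  crm configure primitive p_fs_mysql ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/lib/mysql" fstype="ext4"
  crm configure primitive p_vip_mysql ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.100" cidr_netmask="24" op monitor interval="30s"
  crm configure primitive p_mysql ocf:heartbeat:mysql op monitor interval="30s"
  # filesystem, VIP, and daemon move together, on the DRBD master, in order
  crm configure group g_mysql p_fs_mysql p_vip_mysql p_mysql
  crm configure colocation c_mysql_on_drbd inf: g_mysql ms_drbd_mysql:Master
  crm configure order o_drbd_before_mysql inf: ms_drbd_mysql:promote g_mysql:start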
Host1
Active/Passive: RabbitMQ
[Diagram: two-host Active/Passive stack - RabbitMQ on DRBD, managed by Pacemaker and Corosync on Host1 and Host2]
● Use DRBD to back RabbitMQ
● Leverage a VIP that can float between hosts
● Ensure the erlang.cookie file is identical on all nodes
o enables the nodes to communicate with each other (sync sketch below)
● RabbitMQ clustering does not tolerate network partitions well
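
A sketch for syncing the Erlang cookie; /var/lib/rabbitmq/.erlang.cookie is the common default location, and the hostname node2 is illustrative:

  scp /var/lib/rabbitmq/.erlang.cookie node2:/var/lib/rabbitmq/.erlang.cookie
  ssh node2 'chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie; chmod 400 /var/lib/rabbitmq/.erlang.cookie'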
Active/Passive: Overview (From Guide)
● Leverage the DB and RabbitMQ VIPs in the configuration files
● Configure Pacemaker resources for the OpenStack services (a crm sketch follows after this list)
o Image API
o Identity
o Block Storage API
o Telemetry Central Agent
o Networking
o L3-Agent
o DHCP
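
A hedged sketch of one such Pacemaker resource: the HA guide ships OCF resource agents for OpenStack services, and a primitive for the Identity service could look like the following (the agent name ocf:openstack:keystone and its config parameter follow those agents; treat the details as assumptions):

  crm configure primitive p_keystone ocf:openstack:keystone \
    params config="/etc/keystone/keystone.conf" \
    op monitor interval="30s"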
DT Implementation - Overview
● Business Market Place (BMP)
● SaaS offering: https://portal.telekomcloud.com/
● SaaS applications from software partners (ISVs) and DT, offered to SME customers
● platform based on open-source technologies only (OpenStack, CEPH, Linux)
● project started in 2012 with OpenStack Essex and CEPH
● in production since March 2013
DT Implementation
DTAG scale-out project (ongoing)
Target: migrate production to a new DC and scale out
Requirements:
● scale out compute by 30%, storage by 40%
● eliminate all SPOFs
● setup in two fire-protection areas / physically separated DC rooms
DT Implementation
● single-region HA OpenStack instance
● all services distributed over two DC rooms
o compute and storage distributed equally
o all OpenStack services HA (as far as possible)
o OSS (DNS, NTP, Puppet master, mirror etc.), redundant perimeter firewall
● instance distribution: 4 availability zones, multiple host aggregates and scheduler filters (see the nova sketch below)
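
A sketch of creating a host aggregate exposed as an availability zone with the nova CLI of that era; the aggregate, zone, and host names are illustrative:

  # create an aggregate exposed as availability zone "az1"
  nova aggregate-create room1-rack1 az1
  # add compute hosts to it
  nova aggregate-add-host room1-rack1 compute01
  nova aggregate-add-host room1-rack1 compute02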
DT Implementation
● Load balancing
o HAProxy for MySQL, services, RabbitMQ, APIs (nginx under test)
● MySQL
o Galera multi-master replication (3 nodes)
● RabbitMQ
o 2-node cluster / mirrored queues
● Neutron
o multiple DHCP agents started; Pacemaker/Corosync
● API endpoints
o load balancing with round-robin distribution
● Storage
o 2 shared, distributed CEPH clusters (RBD/S3)
DT Implementation - Tests/Experiences so far
● Load balancing works well
● Database: OpenStack multi-node write issues
o 1 write node / 2 backup nodes diminishes Galera HA efficiency (monitoring); an HAProxy sketch of this pattern follows below
● Specific issues with deployment in 2 DC rooms / uneven distribution of services (Galera)
o if the "wrong" room fails, Galera quorum requires a majority: the room with 2 nodes goes down → the 3rd node deactivates itself → DB outage
● Storage-specific
o CEPH may lose 2/3 of the replicas → heavy replication load on the CEPH cluster and danger of losing data (OSD/disk failure) → raise the replica level / adapt the CRUSH map
● Network: recovering from a Neutron / L3 failure takes <15 minutes
o pet applications are vulnerable and may suffer hiccups during disasters anyway
● DHCP agent failures
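
An HAProxy sketch of the single-writer Galera pattern described above; addresses are illustrative. All traffic goes to one node while the other two act as backups, which avoids multi-node write conflicts at the cost of Galera's load sharing:

  listen galera_mysql
    bind 10.0.0.101:3306
    balance leastconn
    option mysql-check user haproxy_check   # requires a matching check user in MySQL
    server db1 10.0.0.21:3306 check
    server db2 10.0.0.22:3306 check backup
    server db3 10.0.0.23:3306 check backup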
DT Implementation
Plans for the future
● use DVR / VRRP in the future
o make the network more resilient and elastic
● a third DC room would be desirable :-)
o CEPH replicas / MONs, MySQL Galera
eBay/PayPal Implementation
The scope of the eBay/PayPal OpenStack clouds:
● 100% of the PayPal web/mid tier
● most of Dev/QA
● number of hypervisors (HVs): 8,500
● number of virtual machines: 70,000
● number of users: several thousand
● availability zones: 10
eBay/PayPal Implementation
● Database
o MySQL MMM replication, VIP with failover persistence / Galera
● RabbitMQ
o VIP with single-node failover persistence, or 3 nodes with mirrored queues
● Neutron
o DHCP / LBaaS: Corosync/Pacemaker
● API endpoints
o LB VIPs for every service, with either round-robin or least-connection balancing
● Storage
o shared storage with NFS/iSCSI
eBay/PayPal Implementation
Successful HA implementations:
● load-balanced HA - VIPs for every service
● LB single-node failover persistence profile
● Galera/Percona for the Identity service
● global Identity service using GLB
eBay/PayPal Implementation
HA failures:
● Corosync/Pacemaker
o Neutron DHCP and LBaaS - missing advanced health checks
● RabbitMQ
o single-node failover persistence
● MySQL replication
o single-node failover persistence sometimes doesn't work well; implemented external monitoring and disabling of the failed member
● VIPs without ECV health checks
eBay/PayPal Implementation
Future direction:
● HA on global or regional services
o one leg in each availability zone (Keystone, LBaaS, Swift)
● RabbitMQ with 3 nodes / mirrored queues
o LB VIP with least connections
● no shared NFS for Glance
eBay/PayPal Implementation
Lessons learned:
● try not to overcomplicate
● simulate failures
o before placing anything in production, make sure HA works
● place your services in different availability zones
o or at least in different fault zones
● always make backups
o no matter how robust your HA solution is
Call to Action
● OpenStack HA Guide update efforts
● WTE Work Group (now known as 'Enterprise')
● share best practices
References
● OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
● Percona resources: https://www.percona.com/resources/mysql-webinars/high-availability-using-mysql-cloud-today-tomorrow-and-keys-your-success
● HAProxy documentation: http://www.haproxy.org/