Spilo, highly-available PostgreSQL cluster / Oleksii Kliukin (Zalando SE)
Spilo, highly-available PostgreSQL cluster
Oleksii Kliukin Zalando SE
Zalando
• 15 EU countries
• 3 fulfilment centers
• 15+ million active customers
• 2.2 billion € revenue 2014
150 000+ products
We are growing!
Zalando platform
Our databases
• >150 production PostgreSQL databases
• >13.5 TB data
• >5 TB biggest DB
• 400-1000+ write tps
• >2 DB failures/month
Zalando never sleeps
Infrastructure bottleneck
ACID Team
create, alter, deploy, migrate, failover, upgrade
80+ teams
Radical Agility
Purpose
Autonomy
Mastery
Cloud
• 2013: ZCloud
• 2014: project Pequod
• 2015: Let’s just use AWS…
Amazon 3-letter words
• AWS - Amazon Web Services
• EC2 - Elastic Compute Cloud
• ELB - Elastic Load Balancer
• RDS - Relational Database Service
AWS
• One account per team
• Microservices
• REST/OAuth2
• Deployment with Docker
Autonomous teams on AWS
[Diagram labels: REST, INTERNET]
Autonomous teams
• Team decides which product to build
• … and which technologies to use
• REST/OAuth2 mandatory
• Team is responsible for its infrastructure
Databases?
• Developers should take care of infrastructure
• … including production databases
• On AWS!
Isn’t it dangerous?
DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958
ACID team provides PostgreSQL training
What about failover?
Autofailover tasks
• Detect the master failure
• Elect a new master
• Redirect clients
Autofailover issues
• Discarded writes
• Split-brain
• False positives
RDS?
• Support for PostgreSQL
• Automatic failover
• Most extensions
• Automatic backups
RDS?
• Vendor lock-in
• No superuser
• No untrusted languages
• No logical decoding plugins
• Rather expensive
EC2 + Linux HA
• Complex setup
• Lots of manual steps (e.g. new replica creation)
Spilo (სპილო)
Spilo does
• Rapid deployment of PostgreSQL on AWS EC2 instances
• Streaming replication with auto-failover
Spilo on AWS
[Diagram: one Spilo MASTER and two Spilo REPLICAs. Legend: master connection, application DB request, etcd cluster status update.]
Failover
[Diagram: two Spilo REPLICAs, no master.]
Failover
[Diagram: one Spilo MASTER and one Spilo REPLICA. NEW SPILO STARTS…]
Failover
[Diagram: one Spilo MASTER and two Spilo REPLICAs.]
What is Spilo?
[Diagram: three nodes running Patroni, one MASTER and two REPLICAs, placed in auto-scaling groups.]
Patroni (პატრონი)
• Handles new replicas and failover
• Based on ideas and code of the Compose Governor
• Open-source
Compose Governor idea
Core to our PostgreSQL HA system is the Governor application which uses etcd as its repository of truth to discover which database instance is leader.
Distributed configuration systems
• Fault tolerant
• Reliably store small amounts of strongly-consistent data between distributed nodes
• Good for storing the PostgreSQL cluster state
Distributed consensus
[Diagram: one LEADER node and three CLIENT nodes.]
Cluster state in etcd
$ etcdctl ls --recursive /service
/service/batman
/service/batman/optime
/service/batman/optime/leader
/service/batman/members
/service/batman/members/postgresql0
/service/batman/members/postgresql1
/service/batman/initialize
/service/batman/leader
Leader key
$ etcdctl get /service/batman/leader
postgresql0
• Points to the member key
• Has a TTL, autoexpires
• Acts as an exclusive lock
• Only the leader can become the master
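The leader-key behaviour listed above (TTL, auto-expiry, exclusive lock) can be sketched as follows. This is a minimal illustration, not Patroni's code: `TTLStore` is a hypothetical in-memory stand-in for etcd, and `try_acquire_or_keep_leader` is an assumed helper name. A real DCS would perform the create and refresh steps atomically on the server side.

```python
import time


class TTLStore:
    """Hypothetical in-memory stand-in for a DCS such as etcd."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry timestamp)

    def _expire(self, key):
        # Drop the key once its TTL has run out.
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]

    def get(self, key):
        self._expire(key)
        entry = self._data.get(key)
        return entry[0] if entry else None

    def create(self, key, value, ttl):
        """Create the key only if it does not exist (the exclusive lock)."""
        self._expire(key)
        if key in self._data:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh(self, key, value, ttl):
        """Extend the TTL only if the key still holds the expected value."""
        if self.get(key) != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True


LEADER_KEY = "/service/batman/leader"


def try_acquire_or_keep_leader(store, member, ttl=30):
    """Return True if `member` holds the leader key after this call."""
    return store.refresh(LEADER_KEY, member, ttl) or store.create(LEADER_KEY, member, ttl)
```

A node that stops refreshing the key (crash, network partition) loses the lock once the TTL runs out, and only then can another member create it.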
Leader TTL
$ http http://127.0.0.1:2379/v2/keys/service/batman/leader
…
{
  "action": "get",
  "node": {
    "createdIndex": 48723,
    "expiration": "2015-10-23T14:51:49.686506977Z",
    "key": "/service/batman/leader",
    "modifiedIndex": 49037,
    "ttl": 27,
    "value": "postgresql0"
  }
}
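A short sketch of reading the leader name and remaining TTL out of such a response. The response body is the one shown on the slide; `parse_leader`, `lock_is_safe` and the `loop_wait` threshold are hypothetical names introduced here for illustration only.

```python
import json

# Response body as shown on the slide (etcd v2 keys API).
SAMPLE_RESPONSE = """
{
  "action": "get",
  "node": {
    "createdIndex": 48723,
    "expiration": "2015-10-23T14:51:49.686506977Z",
    "key": "/service/batman/leader",
    "modifiedIndex": 49037,
    "ttl": 27,
    "value": "postgresql0"
  }
}
"""


def parse_leader(body):
    """Return (leader name, remaining TTL in seconds) from a GET response."""
    node = json.loads(body)["node"]
    return node["value"], node.get("ttl")


def lock_is_safe(ttl, loop_wait=10):
    """Hypothetical check: is more than one loop interval left before expiry?"""
    return ttl is not None and ttl > loop_wait
```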
Member key
$ etcdctl get /service/batman/members/postgresql0
{
  "role": "master",
  "state": "running",
  "conn_url": "postgres://replicator:[email protected]:5432/postgres",
  "api_url": "http://127.0.0.1:8008/patroni",
  "xlog_location": 67108960
}
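One way the member keys could be used during an election is to prefer the running replica with the highest `xlog_location`. The sketch below assumes that replicas report `"role": "replica"` (the slide only shows a master) and uses a hypothetical helper name, `best_replica`. It illustrates the idea and is not Patroni's actual election logic.

```python
import json


def best_replica(members):
    """Pick the running replica that has received the most WAL.

    `members` maps a member name to the JSON value of its member key.
    Returns None when no running replica is known.
    """
    candidates = []
    for name, raw in members.items():
        info = json.loads(raw)
        # Assumption: replicas publish role "replica" and state "running".
        if info.get("role") == "replica" and info.get("state") == "running":
            candidates.append((info.get("xlog_location", 0), name))
    if not candidates:
        return None
    # Highest xlog_location wins; ties are broken by member name.
    return max(candidates)[1]
```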
Connection and API URL
[Diagram: a MASTER and a REPLICA node, each running Patroni. Labels: API URL (check health during promotion); CONNECTION URL next to MASTER LB; CONNECTION URL next to REPLICA LB.]
Initialize key
$ etcdctl get /service/batman/initialize
6208852353820383446
• PostgreSQL cluster system ID
• Created by the first node that joins the cluster
• Nodes with different system ID are not allowed to join
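The join rule above can be written down as a small decision function. The function name and the returned labels are hypothetical. The case of a node with no local data directory (no system ID yet) is an assumption added here, since the slide does not cover it.

```python
def join_decision(initialize_key, local_system_id):
    """Decide what a node may do, given the initialize key and its own system ID.

    initialize_key  -- value of /service/<cluster>/initialize, or None if unset
    local_system_id -- system ID of the local data directory, or None if empty
    """
    if initialize_key is None:
        # First node: it initializes the cluster and creates the key.
        return "initialize"
    if local_system_id is None:
        # Assumption: an empty node copies its data from the running cluster.
        return "bootstrap_from_cluster"
    # Nodes with a different system ID are not allowed to join.
    return "join" if initialize_key == local_system_id else "reject"
```

In a real deployment the creation of the initialize key would have to be atomic, so that two nodes starting at the same time cannot both initialize the cluster.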
Patroni modules
[Diagram: module blocks: abstract DCS (etcd, ZooKeeper), PostgreSQL, REST API, high availability, asynchronous executor, callbacks.]
From Governor to Patroni
[Diagram labels: Governor, Patroni]
Location of etcd: original
[Diagram: three Governor nodes.]
Replace etcd with proxy
[Diagram: three Governor nodes, each with a proxy.]
Embed etcd client in Patroni
[Diagram: three Patroni nodes.]
Patroni improvements
• Robust exception handling
• Run long-running tasks (e.g. base backup) in a separate thread
• etcd + ZooKeeper
• REST API
Patroni improvements
• Configurable replica imaging
• Support for pg_rewind
Patroni improvements
• Manual failover
• Initialize from external cluster
• Attach to already running PostgreSQL nodes
• Tags (e.g. nofailover)
What you should monitor
• replication lag
• unhealthy member
• no leader
• etcd/ZooKeeper
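The first three checks can be derived from the cluster state stored in the DCS. The sketch below assumes member data has already been parsed into dictionaries with `state` and `xlog_location` fields, as in the member key. `cluster_alerts` and the 16 MB lag threshold are hypothetical choices. Monitoring etcd/ZooKeeper itself is outside its scope.

```python
def cluster_alerts(leader, members, max_lag_bytes=16 * 1024 * 1024):
    """Return a list of alert strings for the given cluster state.

    leader  -- value of the leader key, or None if it is missing
    members -- mapping of member name -> dict with "state" and "xlog_location"
    """
    alerts = []
    has_leader = leader is not None and leader in members
    if not has_leader:
        alerts.append("no leader")
    leader_pos = members[leader].get("xlog_location", 0) if has_leader else None

    for name, info in sorted(members.items()):
        if info.get("state") != "running":
            alerts.append(f"unhealthy member: {name}")
        elif leader_pos is not None and name != leader:
            # Lag is measured as the WAL position difference to the leader.
            lag = leader_pos - info.get("xlog_location", 0)
            if lag > max_lag_bytes:
                alerts.append(f"replication lag: {name} is {lag} bytes behind")
    return alerts
```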
Thank you!
• Spilo: github.com/zalando/spilo, spilo.readthedocs.org
• Patroni: github.com/zalando/patroni, patroni.readthedocs.org
• Feedback: @alexeyklyukin