Spilo, a fault-tolerant PostgreSQL cluster / Oleksii Kliukin (Zalando SE)

Spilo, highly-available PostgreSQL cluster

Oleksii Kliukin Zalando SE

Zalando

• 15 EU countries

• 3 fulfilment centers

• 15+ million active customers

• 2.2 billion € revenue in 2014

150 000+ products

We are growing!

Zalando platform

Our databases

• >150 production PostgreSQL databases

• >13.5 TB of data

• >5 TB in the biggest DB

• 400-1000+ write tps

• >2 DB failures/month

Zalando never sleeps

Infrastructure bottleneck

ACID Team

create, alter, deploy, migrate, failover, upgrade

80+ teams

Radical Agility

Purpose

Autonomy

Mastery

Cloud

• 2013: ZCloud

• 2014: project Pequod

• 2015: Let’s just use AWS…

Amazon 3-letter words

• AWS - Amazon Web Services

• EC2 - Elastic Compute Cloud

• ELB - Elastic Load Balancer

• RDS - Relational Database Service

AWS

• One account per team

• Microservices

• REST/OAuth2

• Deployment with Docker

Autonomous teams on AWS

INTERNET

Autonomous teams

• Team decides which product to build

• … and which technologies to use

• REST/OAuth2 mandatory

• Team is responsible for its infrastructure

Databases?

• Developers should take care of infrastructure

• … including production databases

• On AWS!

Isn’t it dangerous?

DBAs running with scissors, by Gavin M. Roy: https://www.flickr.com/photos/gavinmroy/4638958958

ACID team provides

PostgreSQL trainings

What about failover?

Autofailover tasks

• Detect the master failure

• Elect a new master

• Redirect clients

Autofailover issues

• Discarded writes

• Split-brain

• False positives

RDS?

• Support for PostgreSQL

• Automatic failover

• Most extensions

• Automatic backups

RDS?

• Vendor lock-in

• No superuser

• No untrusted languages

• No logical decoding plugins

• Rather expensive

EC2 + Linux HA

• Complex setup

• Lots of manual steps (e.g. creating a new replica)

Spilo (სპილო)

Spilo does

• Rapid deployment of PostgreSQL on AWS EC2 instances

• Streaming replication with auto-failover

Spilo on AWS

[Diagram labels: Spilo MASTER, Spilo REPLICA, master connection, application DB request, etcd cluster status update]

Failover

[Diagram, step 1. Labels: Spilo REPLICA, master connection]

Failover

[Diagram, step 2. Labels: Spilo MASTER, Spilo REPLICA, master connection, "NEW SPILO STARTS…"]

Failover

[Diagram, step 3. Labels: Spilo MASTER, Spilo REPLICA, master connection, Spilo REPLICA]

What is Spilo?

[Diagram labels: Patroni with MASTER, Patroni with REPLICA, Patroni with REPLICA; two auto-scaling groups]

Patroni (პატრონი)

• Handles new replicas and failover

• Based on ideas and code of the Compose Governor

• Open-source

Compose Governor idea

Core to our PostgreSQL HA system is the Governor application which uses etcd as its repository of truth to discover which database instance is leader.
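As a rough illustration of the idea quoted above, each node can periodically read the leader key and decide what to do next. This is a simplified sketch, not Governor's or Patroni's actual code: the function name `next_action`, its parameters, and the returned action strings are all hypothetical.

```python
def next_action(my_name, leader, i_am_healthiest):
    """Decide what one node should do in a single pass of its HA loop.

    my_name         -- this node's member name, e.g. "postgresql1"
    leader          -- current value of the leader key, or None if it expired
    i_am_healthiest -- outcome of a health comparison done elsewhere (assumed)
    """
    if leader == my_name:
        # We hold the lock: keep it alive and keep running as master.
        return "refresh leader key"
    if leader is None:
        # Nobody holds the lock: only the best candidate should race for it.
        return "try to acquire leader key" if i_am_healthiest else "wait"
    # Another node is the leader: replicate from it.
    return "follow " + leader
```

Keeping the decision a pure function of the observed state makes it easy to reason about; the actual reads and writes against etcd would happen around it.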

Distributed configuration systems

• Fault tolerant

• Reliably store small amounts of strongly-consistent data between distributed nodes

• Good for storing the PostgreSQL cluster state

Distributed consensus

[Diagram labels: LEADER and three CLIENT nodes]

Cluster state in etcd

$ etcdctl ls --recursive /service
/service/batman
/service/batman/optime
/service/batman/optime/leader
/service/batman/members
/service/batman/members/postgresql0
/service/batman/members/postgresql1
/service/batman/initialize
/service/batman/leader

Leader key

$ etcdctl get /service/batman/leader
postgresql0

• Points to the member key

• Has a TTL, auto-expires

• Acts as an exclusive lock

• Only the leader can become the master
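The leader-key properties listed above (TTL, auto-expiry, exclusive lock) can be illustrated with a small in-memory stand-in for the key-value store. This is only a sketch under stated assumptions: `LeaderStore` and its methods are hypothetical names, and a real deployment would rely on the atomic create and compare-and-swap operations of etcd itself rather than a Python dict.

```python
import time


class LeaderStore:
    """In-memory stand-in for a DCS such as etcd (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        # Return the entry if it exists and has not expired yet.
        entry = self._data.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._data[key]  # the key auto-expired
            entry = None
        return entry

    def get(self, key):
        entry = self._live(key)
        return entry[0] if entry else None

    def create(self, key, value, ttl):
        """Create the key only if it is absent: this is the exclusive lock."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def refresh(self, key, value, ttl):
        """Extend the TTL only if the key still holds the expected value."""
        entry = self._live(key)
        if entry is None or entry[0] != value:
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True
```

A node that fails to call `refresh` before the TTL runs out loses the key, and whichever node then succeeds with `create` becomes the only one allowed to run as master.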

Leader TTL

$ http http://127.0.0.1:2379/v2/keys/service/batman/leader
…
{
    "action": "get",
    "node": {
        "createdIndex": 48723,
        "expiration": "2015-10-23T14:51:49.686506977Z",
        "key": "/service/batman/leader",
        "modifiedIndex": 49037,
        "ttl": 27,
        "value": "postgresql0"
    }
}
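For illustration, the response shown on this slide can be read with nothing but the standard library. The helper name `leader_and_ttl` is hypothetical; the JSON body is copied from the slide.

```python
import json

# Response body as shown on the slide.
RESPONSE = """
{ "action": "get",
  "node": { "createdIndex": 48723,
            "expiration": "2015-10-23T14:51:49.686506977Z",
            "key": "/service/batman/leader",
            "modifiedIndex": 49037,
            "ttl": 27,
            "value": "postgresql0" } }
"""


def leader_and_ttl(body):
    """Return (leader name, remaining TTL in seconds) from an etcd v2 reply."""
    node = json.loads(body)["node"]
    return node["value"], node["ttl"]
```

Here the leader is `postgresql0` and the key expires in 27 seconds unless it is refreshed.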

Member key

$ etcdctl get /service/batman/members/postgresql0
{
    "role": "master",
    "state": "running",
    "conn_url": "postgres://replicator:rep-pass@127.0.0.1:5432/postgres",
    "api_url": "http://127.0.0.1:8008/patroni",
    "xlog_location": 67108960
}
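One way the member keys could be used during failover is to pick the running member with the most advanced `xlog_location`. The sketch below assumes exactly that rule and nothing more; `best_candidate` is a hypothetical helper, and Patroni's real election involves more checks (for example, health checks through the API URL).

```python
import json


def best_candidate(members, exclude=None):
    """Pick the running member with the highest xlog_location.

    members -- dict of member name -> JSON string as stored in the member key
    exclude -- name to skip, e.g. the failed master (optional)
    """
    best_name, best_pos = None, -1
    for name, raw in sorted(members.items()):
        if name == exclude:
            continue
        info = json.loads(raw)
        if info.get("state") != "running":
            continue  # a stopped member cannot be promoted
        pos = info.get("xlog_location", 0)
        if pos > best_pos:
            best_name, best_pos = name, pos
    return best_name
```

Preferring the most advanced replica limits the number of writes discarded by the failover.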

Connection and API URL

[Diagram labels: Patroni; API URL (check health during promotion); MASTER; REPLICA; CONNECTION URL (twice); MASTER LB; REPLICA LB]

Initialize key

$ etcdctl get /service/batman/initialize
6208852353820383446

• PostgreSQL cluster system ID

• Created by the first node that joins the cluster

• Nodes with a different system ID are not allowed to join
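The rule for the initialize key can be sketched in a few lines. A plain dict stands in for the DCS here, and `may_join` is a hypothetical name; in a real store the "create if absent" step would have to be atomic.

```python
INITIALIZE_KEY = "/service/batman/initialize"


def may_join(store, system_id):
    """Return True if a node with this system ID may join the cluster.

    store -- plain dict standing in for the DCS (assumption for this sketch)
    """
    # setdefault writes the ID only when the key is absent, so the first
    # node to join defines the identity of the whole cluster.
    return store.setdefault(INITIALIZE_KEY, system_id) == system_id
```

A node initialized from a different `initdb` run carries a different system ID and is therefore rejected, which protects against attaching an unrelated data directory to the cluster.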

Patroni modules

[Diagram labels: ETCD and ZOOKEEPER under ABSTRACT DCS; PostgreSQL; REST API; High availability; Asynchronous executor; Callbacks]

Demo time!

https://asciinema.org/a/29087

From Governor to Patroni

[Diagram labels: Governor, Patroni]

Location of etcd: original

[Diagram label: Governor]

Replace etcd with proxy

[Diagram label: Governor]

Embed etcd client in Patroni

[Diagram label: Patroni]

Patroni improvements

• Robust exception handling

• Run long-running tasks (e.g. base backup) in a separate thread

• etcd + ZooKeeper

• REST API

Patroni improvements

• Configurable replica imaging

• Support for pg_rewind

Patroni improvements

• Manual failover

• Initialize from external cluster

• Attach to already running PostgreSQL nodes

• Tags (e.g. nofailover)

What you should monitor

• replication lag

• unhealthy member

• no leader

• etcd/ZooKeeper
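The first three checks above can be derived from the leader and member keys alone. The following is a minimal sketch under stated assumptions: `cluster_problems`, the message strings, and the default lag threshold are hypothetical, and the difference between `xlog_location` values is treated as lag in bytes. Monitoring etcd or ZooKeeper itself is outside this sketch.

```python
def cluster_problems(leader, members, max_lag_bytes=16 * 1024 * 1024):
    """List monitoring findings for one cluster.

    leader  -- value of the leader key, or None if it is missing
    members -- dict of member name -> dict with "state" and "xlog_location"
    """
    problems = []
    if leader is None or leader not in members:
        problems.append("no leader")
        leader_pos = None
    else:
        leader_pos = members[leader]["xlog_location"]

    for name, info in sorted(members.items()):
        if info["state"] != "running":
            problems.append("unhealthy member: " + name)
        elif leader_pos is not None and name != leader:
            lag = leader_pos - info["xlog_location"]
            if lag > max_lag_bytes:
                problems.append("replication lag on %s: %d bytes" % (name, lag))
    return problems
```

An empty list means none of the three conditions was detected at the moment the keys were read.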

Thank you!

• Spilo: github.com/zalando/spilo, spilo.readthedocs.org

• Patroni: github.com/zalando/patroni, patroni.readthedocs.org

• Feedback: @alexeyklyukin
