Page 1: Jena University Talk 2016.03.09 -- SQL at Zalando Technology

PostgreSQL at Zalando

SQL in Fashion

About me

Valentine Gogichashvili
Head of Data Engineering @ZalandoTech
twitter: @valgog
google+: +valgog
email: [email protected]

15 countries
4 fulfillment centers
15+ million active customers
2.9 billion € revenue 2015
150,000+ products
9,000+ employees

One of Europe's largest online fashion retailers

Zalando Technology

BERLIN

DORTMUND

DUBLIN

HELSINKI

ERFURT

MÖNCHENGLADBACH

HAMBURG

Zalando Technology

900+ TECHNOLOGISTS

Rapidly growing international team

http://tech.zalando.de

THE HISTORY

Once upon a time...

Started as a tiny online shop

Prototyped on Magento (PHP)

Used MySQL as a database

Web Application

Backend

Database

REBOOT

5½ years ago

● Java
  ○ macro service architecture with SOAP as RPC layer

● PostgreSQL
  ○ Heavy usage of Stored Procedures
  ○ 4 databases + 1 sharded database on 2 shards

● Python for tooling (e.g. code deploy automation)

REBOOT

[Diagram: a Java Web Frontend, several Java Backend "macro" services, and their PostgreSQL databases (labelled PostgreSQL 9.0 RC1)]

SQL

● human readable query language

● has stood the test of time

● allows automatic optimization of access to data

SQL

● window functions for moving averages etc.

● recursive SQL for searching trees and networks

● non-blocking locking

● DDL locking (schema guarantees)

PostgreSQL

The world's most advanced open-source database

PostgreSQL

● Minimal DDL locks (easy schema changes)

● Nearest neighbour searching using an index

● Block-range indexes

● Serializability (true theoretical serializability)

● JSONB - compressed JSON, fully indexable

● Data sampling with guarantees on execution time

PostgreSQL

● Read scalability: "Hot standby" replicas

● Synchronous Replication

● Cascading Replication

● Logical Decoding to allow Change Data Capture

● BDR for Multi-Master Scalability

● Write scalability/sharding (is on the way)

PostgreSQL

● Multiple data models (tables, JSON, ROWs, arrays)

● ACID

● Eventual Consistency (on-demand)

Stored Procedures

● clean transaction scope

● very clean data

● processing close to data

● no need for classical ORM mappers

Stored Procedures

● no debugger in Eclipse or IntelliJ

● difficult for projects heavy on CRUD

● versioning automation is needed

Stored Procedures

Java Sproc Wrapper

● very easy to use

● proxies stored procedures as Java method calls

● supports complex type mapping

● supports transparent sharding

Stored Procedures

SQL

CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender)
RETURNS int AS
$$
    INSERT INTO z_data.customer (c_email, c_gender)
    VALUES (p_email, p_gender)
    RETURNING c_id
$$
LANGUAGE 'sql' SECURITY DEFINER;

Stored Procedures

JAVA

@SProcService
public interface CustomerSProcService {
    @SProcCall
    int registerCustomer(@SProcParam String email, @SProcParam Gender gender);
}

SQL

CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender)
RETURNS int AS
$$
    INSERT INTO z_data.customer (c_email, c_gender)
    VALUES (p_email, p_gender)
    RETURNING c_id
$$
LANGUAGE 'sql' SECURITY DEFINER;
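
The Java and SQL snippets above pair the method registerCustomer with the function register_customer. The following is a minimal sketch of such a camelCase-to-snake_case naming convention. The names SprocNames and toSnakeCase are hypothetical, and this is an illustration of the convention, not the Java Sproc Wrapper's actual code.

```java
class SprocNames {
    /**
     * Converts a camelCase Java method name to a snake_case function name,
     * e.g. the way registerCustomer lines up with register_customer.
     * Hypothetical helper for illustration only.
     */
    static String toSnakeCase(String methodName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < methodName.length(); i++) {
            char c = methodName.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word at every upper-case letter (except the first character).
                if (i > 0) {
                    sb.append('_');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this convention, findOrders would line up with find_orders, matching the function name used later in the talk.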

Stored Procedures

JAVA

@SProcCall
List<Order> findOrders(@SProcParam String email);

SQL

CREATE FUNCTION find_orders(p_email text,
    OUT order_id int,
    OUT order_created timestamptz,
    OUT shipping_address order_address)
RETURNS SETOF record AS
$$
    SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address
      FROM z_data."order"
      JOIN z_data.order_address ON oa_order_id = o_id
      JOIN z_data.customer ON c_id = o_customer_id
     WHERE c_email = p_email
$$
LANGUAGE 'sql' SECURITY DEFINER;

Stored Procedures

JAVA

@SProcCall
int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
                     @SProcParam String email,
                     @SProcParam Gender gender);

@SProcCall
Article getArticle(@SProcParam @ShardKey Sku sku);

@SProcCall(runOnAllShards = true, parallel = true)
List<Order> findOrders(@SProcParam String email);
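
The flags runOnAllShards = true and parallel = true suggest that the call is sent to every shard concurrently and the per-shard results are combined. Below is a sketch of that idea, assuming the results are simply concatenated in shard order. ShardFanOut and its generic helper are hypothetical, and plain in-memory values stand in for database shards; the wrapper's real behaviour may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class ShardFanOut {
    /**
     * Runs the same query against every shard concurrently and
     * concatenates the per-shard results in shard order.
     * Hypothetical helper; a shard can be any object the query understands.
     */
    static <S, R> List<R> runOnAllShards(List<S> shards, Function<S, List<R>> query) {
        // Submit one asynchronous task per shard.
        List<CompletableFuture<List<R>>> futures = new ArrayList<>();
        for (S shard : shards) {
            futures.add(CompletableFuture.supplyAsync(() -> query.apply(shard)));
        }
        // Wait for each task and merge the results, keeping shard order.
        List<R> merged = new ArrayList<>();
        for (CompletableFuture<List<R>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }
}
```

A lookup by email has no shard key, so it cannot be routed to a single shard; fanning out to all shards is the price for queries that do not carry the partitioning key.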

Stored Procedures

Virtual Shard IDs (pre-sharding)

md5(partitioning_key)

bit position: 7 6 5 4 3 2 1 0
bit value:    0 1 0 1 1 0 0 1

[Diagram: Java Application, SprocWrapper, and PostgreSQL shards (labelled PostgreSQL 9.0 RC1)]
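
The slide above derives a virtual shard ID from bits of md5(partitioning_key). The sketch below shows one way this could work, assuming the virtual shard ID is the last byte of the MD5 digest and the physical shard is chosen from its low-order bits. The class VirtualShards, its method names, and the choice of byte are assumptions, not the SprocWrapper's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class VirtualShards {
    /** Virtual shard ID in 0..255: here, the last byte of the MD5 digest (an assumption). */
    static int virtualShardId(String partitioningKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitioningKey.getBytes(StandardCharsets.UTF_8));
            return digest[digest.length - 1] & 0xFF;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Maps a virtual shard ID to a physical shard by keeping only the
     * low-order bits. shardCount must be a power of two (1, 2, 4, ... 256).
     */
    static int physicalShard(int virtualShardId, int shardCount) {
        if (shardCount < 1 || shardCount > 256 || Integer.bitCount(shardCount) != 1) {
            throw new IllegalArgumentException(
                    "shardCount must be a power of two between 1 and 256");
        }
        return virtualShardId & (shardCount - 1);
    }
}
```

The point of pre-sharding in this sketch is that doubling the shard count only adds one more bit to the mask: a key on shard 1 of 2 moves to shard 1 or shard 3 of 4, never to an unrelated shard, so an existing shard can be split without rehashing every key.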

Stored Procedures

Schema based stored procedure versioning

● uses search_path on the clients

● new schema for every application version

● automated deployments
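
The bullets above rely on PostgreSQL resolving an unqualified function name against the first schema on the search_path that contains it. The sketch below is a simplified in-memory model of that lookup, showing how clients of two application versions can call the same function name and reach different schemas. SearchPathModel, its methods, and the api_v prefix handling are hypothetical; real resolution also considers argument types.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class SearchPathModel {
    /**
     * Simplified model of name lookup: returns the function qualified with
     * the first schema on the search_path that contains it, if any.
     */
    static Optional<String> resolve(List<String> searchPath,
                                    String function,
                                    Map<String, Set<String>> functionsBySchema) {
        for (String schema : searchPath) {
            Set<String> functions = functionsBySchema.getOrDefault(schema, Set.<String>of());
            if (functions.contains(function)) {
                return Optional.of(schema + "." + function);
            }
        }
        return Optional.empty();
    }

    /** Builds the statement a client of the given API version might issue on connect. */
    static String searchPathStatement(String apiVersion) {
        return "SET search_path = api_v" + apiVersion + ", public;";
    }
}
```

Because each application version only ever sees its own API schema, the old and new versions of a stored procedure can coexist during a rollout, and the old schema can be dropped once no client refers to it.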

Stored Procedures

[Diagram sequence: API schemas deployed on top of the shared Database Tables, with the search_path settings shown at each step]

● api_v16_01
● api_v16_01
  ○ search_path = api_v16_01, public;
● api_v16_01, api_v16_02
  ○ search_path = api_v16_01, public;
● api_v16_01, api_v16_02
  ○ search_path = api_v16_02, public;
  ○ search_path = api_v16_01, public;
● api_v16_01, api_v16_02
  ○ search_path = api_v16_02, public;
● api_v16_02
  ○ search_path = api_v16_02, public;

https://github.com/zalando/PGObserver

Schema Management

DBDIFF database schema management

● schema changes must be documented

● atomic changes per feature

● locks should be minimal

Schema management

DBDIFF SQL

BEGIN;
SELECT _v.register_patch('ZEOS-15430.order');

CREATE TABLE z_data.order_address (
    oa_id      serial,
    oa_country z_data.country,
    oa_city    varchar(64),
    oa_street  varchar(128),
    ...
);

ALTER TABLE z_data."order"
    ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;

Schema management

DBDIFF SQL

BEGIN;
SELECT _v.register_patch('ZEOS-15430.order');

\i order/database/order/10_tables/10_order_address.sql

ALTER TABLE z_data."order"
    ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;

Schema management

DBDIFF SQL

BEGIN;
SELECT _v.register_patch('ZEOS-15430.order');

\i order/database/order/10_tables/10_order_address.sql

SET statement_timeout TO '3s';

ALTER TABLE z_data."order"
    ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;

Schema management

pg_view (https://github.com/zalando/pg_view)

● helps to monitor locks and load in real-time

● used during all DB schema change rollouts

nice_updater (https://github.com/zalando/acid-tools)

● runs big migrations controlling database/system load

● used during automatic data migrations

High Availability

Patroni - High Availability Runner

● etcd or ZooKeeper for master election

Spilo - PostgreSQL AWS appliance

● Zalando Patroni for High Availability

● Docker for packaging

● Zalando STUPS for audit compliance

Monitoring: ZMON

Open Source at Zalando Technology

● Tech Blog of Zalando Technology

● https://zalando.github.io/ - Open Source Projects
  ○ Java Sproc Wrapper
  ○ PGObserver
  ○ pg_view
  ○ Patroni
  ○ Spilo
  ○ ...

Questions?

Where to Find Us:

Tech Blog: tech.zalando.com

GitHub: github.com/zalando

Twitter: @ZalandoTech

Instagram: zalandotech

Jobs: http://tech.zalando.com/jobs