Jena University Talk 2016.03.09 -- SQL at Zalando Technology
Posted 23-Jan-2017, category: Technology
PostgreSQL at Zalando: SQL in Fashion
About me
Valentine Gogichashvili
Head of Data Engineering @ZalandoTech
twitter: @valgog
google+: +valgog
email: valentine.gogichashvili@zalando.de
15 countries
4 fulfillment centers
15+ million active customers
2.9 billion revenue 2015
150,000+ products
9,000+ employees
One of Europe's largest online fashion retailers
Zalando Technology
BERLIN
DORTMUND
DUBLIN
HELSINKI
ERFURT
MÖNCHENGLADBACH
HAMBURG
Zalando Technology
900+ TECHNOLOGISTS
Rapidly growing international team
http://tech.zalando.de
THE HISTORY
Once upon a time...
Started as a tiny online shop
Prototyped on Magento (PHP)
Used MySQL as a database
Diagram: Web Application, Backend, Database
REBOOT
5 years ago
Java macro service architecture with SOAP as RPC layer
PostgreSQL: heavy usage of Stored Procedures; 4 databases + 1 sharded database on 2 shards
Python for tooling (e.g. code deployment automation)
REBOOT
Diagram: a Java Web Frontend on top of several Java Backend "macro" services; each backend has its own PostgreSQL database, one of them spread over several PostgreSQL 9.0 RC1 instances
SQL
Human-readable query language
Has stood the test of time
Allows automatic optimization of data access
SQL
window functions for moving averages etc.
recursive SQL for searching trees and networks
non-blocking locking
DDL locking (schema guarantees)
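The first two SQL features above are easier to picture with their semantics spelled out. The Java sketch below (class and method names are illustrative, not from the talk) computes in memory what a moving-average window function such as AVG(x) OVER (ORDER BY t ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) returns, and mimics how PostgreSQL evaluates a WITH RECURSIVE ... UNION query over a network: it keeps expanding a "working table" until no new rows appear.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SqlFeatureSketches {

    // What AVG(x) OVER (ORDER BY t ROWS BETWEEN (window - 1) PRECEDING AND CURRENT ROW)
    // computes: the mean of the current row and up to window - 1 preceding rows.
    static double[] movingAverage(double[] xs, int window) {
        if (window < 1) throw new IllegalArgumentException("window must be >= 1");
        double[] out = new double[xs.length];
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            sum += xs[i];                            // row enters the frame
            if (i >= window) sum -= xs[i - window];  // oldest row leaves the frame
            out[i] = sum / Math.min(i + 1, window);  // early rows have a shorter frame
        }
        return out;
    }

    // How WITH RECURSIVE ... UNION ... is evaluated: start with the non-recursive
    // term, then apply the recursive term to the rows produced in the previous step
    // (the working table) until it yields nothing new. UNION (not UNION ALL) drops
    // rows already in the result, which here (a row is just a node name) also stops cycles.
    static Set<String> reachable(Map<String, List<String>> edges, String start) {
        Set<String> result = new LinkedHashSet<>(List.of(start)); // non-recursive term
        List<String> working = List.of(start);
        while (!working.isEmpty()) {
            List<String> next = new ArrayList<>();
            for (String node : working) {
                for (String target : edges.getOrDefault(node, List.of())) {
                    if (result.add(target)) next.add(target); // keep only new rows
                }
            }
            working = next;
        }
        return result;
    }
}
```

Note that the explicit ROWS frame matters: with only ORDER BY, PostgreSQL's default frame runs from the start of the partition to the current row, which gives a running average rather than a moving one.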
PostgreSQL
The world's most advanced open-source database
PostgreSQL
Minimal DDL locks (easy schema changes)
Nearest neighbour searching using an index
Block-range indexes (BRIN)
Serializability (true theoretical serializability)
JSONB: binary JSON, fully indexable
Data sampling with guarantees on execution time
PostgreSQL
Read scalability: "Hot standby" replicas
Synchronous Replication
Cascading Replication
Logical Decoding to allow Change Data Capture
BDR for Multi-Master Scalability
Write scalability/sharding (on the way)
PostgreSQL
Multiple data models (tables, JSON, ROWs, arrays)
ACID
Eventual Consistency (on-demand)
Stored Procedures
clean transaction scope
very clean data
processing close to data
no need for classical ORM mappers
Stored Procedures
no debugger in Eclipse or IntelliJ
difficult for projects heavy on CRUD
versioning automation is needed
Stored Procedures
Java Sproc Wrapper
very easy to use
proxies stored procedures as Java method calls
supports complex type mapping
supports transparent sharding
https://github.com/zalando/java-sproc-wrapper
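To make "proxies stored procedures as Java method calls" concrete, here is a minimal sketch of the general idea using the JDK's java.lang.reflect.Proxy. It is not the java-sproc-wrapper implementation: the CustomerService interface, the SqlExecutor callback standing in for JDBC, and the camelCase-to-snake_case naming rule are assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Collections;

// Hypothetical service interface, modelled on the slides' CustomerSProcService
// but without the library's annotations and with a plain String for the gender.
interface CustomerService {
    int registerCustomer(String email, String gender);
}

final class SprocProxySketch {

    // Hypothetical stand-in for the JDBC layer: receives the SQL and the arguments.
    interface SqlExecutor {
        Object call(String sql, Object[] params);
    }

    // registerCustomer -> register_customer
    static String toSnakeCase(String camel) {
        return camel.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
    }

    // One "?" placeholder per Java parameter, e.g. SELECT * FROM register_customer(?, ?)
    static String sqlFor(Method method) {
        String placeholders = String.join(", ", Collections.nCopies(method.getParameterCount(), "?"));
        return "SELECT * FROM " + toSnakeCase(method.getName()) + "(" + placeholders + ")";
    }

    // Creates an implementation of iface whose methods become stored procedure calls.
    static <T> T create(Class<T> iface, SqlExecutor executor) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, params) -> {
                    // toString/hashCode/equals also reach the handler; answer them locally.
                    if (method.getDeclaringClass() == Object.class) {
                        switch (method.getName()) {
                            case "equals": return self == params[0];
                            case "hashCode": return System.identityHashCode(self);
                            default: return "SprocProxy(" + iface.getSimpleName() + ")";
                        }
                    }
                    return executor.call(sqlFor(method), params == null ? new Object[0] : params);
                });
        return iface.cast(proxy);
    }
}
```

Because the interface carries all the information (procedure name, argument order, return type), application code only ever sees ordinary Java methods; a real wrapper additionally binds parameters on a JDBC statement and maps result rows onto the declared return type.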
Stored Procedures
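SQL:
-- SECURITY DEFINER: runs with the privileges of the function's owner,
-- so callers need no direct access to z_data.customer
CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender)
RETURNS int AS $$
  INSERT INTO z_data.customer (c_email, c_gender)
  VALUES (p_email, p_gender)
  RETURNING c_id  -- the new customer's id
$$ LANGUAGE sql SECURITY DEFINER;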
Stored Procedures
JAVA:
@SProcService
public interface CustomerSProcService {
    // invokes the register_customer stored procedure shown above
    @SProcCall
    int registerCustomer(@SProcParam String email, @SProcParam Gender gender);
}
Stored Procedures
JAVA:
@SProcCall
List findOrders(@SProcParam String email);

SQL:
-- each result row carries the shipping address as a composite (ROW) value
CREATE FUNCTION find_orders(p_email text,
                            OUT order_id int,
                            OUT order_created timestamptz,
                            OUT shipping_address order_address)
RETURNS SETOF record AS $$
  SELECT o_id, o_created, ROW(oa_street, oa_city, oa_country)::order_address
    FROM z_data."order"
    JOIN z_data.order_address ON oa_order_id = o_id
    JOIN z_data.customer ON c_id = o_customer_id
   WHERE c_email = p_email
$$ LANGUAGE sql SECURITY DEFINER;
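find_orders returns the shipping address as a composite (ROW) value. When such a value reaches the client in PostgreSQL's text form, for example ("Main St 1",Berlin,DE), it has to be split into fields before it can be mapped onto a Java object. The sketch below shows one way to do that; RowLiteral is a hypothetical helper, not part of java-sproc-wrapper, and it only splits one level (a nested composite comes back as its own text literal).

```java
import java.util.ArrayList;
import java.util.List;

final class RowLiteral {

    // Splits PostgreSQL's text output for a composite (ROW) value, such as
    // ("Main St 1",Berlin,DE), into its fields. Quoted fields may contain commas,
    // doubled quotes ("") or backslash escapes; an empty unquoted field is NULL.
    static List<String> parse(String literal) {
        int n = literal.length();
        if (n < 2 || literal.charAt(0) != '(' || literal.charAt(n - 1) != ')') {
            throw new IllegalArgumentException("not a row literal: " + literal);
        }
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean quoted = false; // field had quotes, so an empty value means "" rather than NULL
        for (int i = 1; i < n - 1; i++) {
            char c = literal.charAt(i);
            if (inQuotes) {
                if (c == '\\') {
                    current.append(literal.charAt(++i));   // backslash escapes the next char
                } else if (c == '"' && literal.charAt(i + 1) == '"') {
                    current.append('"');                   // doubled quote inside quotes
                    i++;
                } else if (c == '"') {
                    inQuotes = false;                      // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                quoted = true;
            } else if (c == ',') {
                fields.add(finish(current, quoted));
                current.setLength(0);
                quoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(finish(current, quoted));
        return fields;
    }

    private static String finish(StringBuilder field, boolean quoted) {
        return field.length() == 0 && !quoted ? null : field.toString();
    }
}
```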
Stored Procedures
JAVA:
// @ShardKey marks the argument that decides which shard is called
@SProcCall
int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
                     @SProcParam String email,
                     @SProcParam Gender gender);

@SProcCall
Article getArticle(@SProcParam @ShardKey Sku sku);

// no shard key: executed on every shard, in parallel
@SProcCall(runOnAllShards = true, parallel = true)
List findOrders(@SProcParam String email);
Stored Procedures
Virtual Shard IDs (pre-sharding)
Diagram: the Java Application calls the PostgreSQL shards through SprocWrapper; the target shard is derived from the bits (positions 7 to 0, e.g. 0 1 0 1 1 0 0 1) of md5(partitioning_key).
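One way to read the pre-sharding diagram: hash the partitioning key with md5, take a fixed number of low-order bits as a virtual shard ID, and map virtual shards onto the physical databases. The Java sketch below follows that reading; the 8-bit width (matching the bit positions 7 to 0 in the diagram) and the mask-based mapping onto a power-of-two number of shards are assumptions, not the documented SprocWrapper algorithm.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class VirtualShards {

    // Assumption: 8 hash bits, i.e. 256 virtual shards.
    static final int VIRTUAL_SHARD_BITS = 8;

    // Virtual shard ID = lowest VIRTUAL_SHARD_BITS bits of md5(partitioning_key).
    static int virtualShardId(String partitioningKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitioningKey.getBytes(StandardCharsets.UTF_8));
            int lowBits = new BigInteger(1, digest).intValue(); // low-order 32 bits of the hash
            return lowBits & ((1 << VIRTUAL_SHARD_BITS) - 1);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // With a power-of-two number of physical shards, the physical shard is given by
    // the lowest bits of the virtual shard ID. Going from 2 to 4 shards uses one more
    // bit, so each old shard's data splits between two new shards and nothing else moves.
    static int physicalShard(int virtualShardId, int physicalShardCount) {
        if (physicalShardCount < 1 || Integer.bitCount(physicalShardCount) != 1
                || physicalShardCount > (1 << VIRTUAL_SHARD_BITS)) {
            throw new IllegalArgumentException("shard count must be a power of two <= 256");
        }
        return virtualShardId & (physicalShardCount - 1);
    }
}
```

The point of the extra indirection is that the hash of a key never changes: adding physical shards only changes which bits of the virtual shard ID are looked at, so data can be split by copying whole virtual shards.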
Stored Procedures
Schema based stored procedure versioning
uses search_path on the clients
new schema for every application version
automated deployments
Diagram (rollout of a new API version; the Stored Procedures sit in a versioned schema on top of the Database Tables):
1. Stored procedures live in schema api_v16_01.
2. Clients connect with search_path = api_v16_01, public;
3. The next deployment creates schema api_v16_02 next to api_v16_01; clients still use search_path = api_v16_01, public;
4. Newly deployed application instances use search_path = api_v16_02, public; while old instances still use search_path = api_v16_01, public;
5. All clients use search_path = api_v16_02, public; api_v16_01 is no longer used.
6. api_v16_01 is dropped; only api_v16_02 remains.
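The rollout above comes down to choosing a schema name per application version, setting search_path to it, and dropping schemas nobody points at any more. This sketch assumes the api_v&lt;major&gt;_&lt;minor&gt; naming seen on the slides; ApiSchemas and its methods are hypothetical helpers. In practice the statement would be run on every new connection (the PostgreSQL JDBC driver's currentSchema connection property is another way to set the search path).

```java
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class ApiSchemas {

    // Hypothetical naming rule matching the slides' api_v16_01, api_v16_02.
    static String schemaFor(int major, int minor) {
        return String.format(Locale.ROOT, "api_v%d_%02d", major, minor);
    }

    // Statement a client runs so that unqualified function names resolve to its own
    // API version first; public stays on the path. The name is validated because it
    // is concatenated into SQL.
    static String searchPathStatement(String apiSchema) {
        if (!apiSchema.matches("api_v\\d+_\\d+")) {
            throw new IllegalArgumentException("unexpected schema name: " + apiSchema);
        }
        return "SET search_path = " + apiSchema + ", public;";
    }

    // Deployed API schemas that no running application version uses, i.e. safe to drop.
    static SortedSet<String> droppable(Set<String> deployed, Set<String> inUse) {
        SortedSet<String> result = new TreeSet<>(deployed);
        result.removeAll(inUse);
        return result;
    }
}
```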
https://github.com/zalando/PGObserver
Schema Management
DBDIFF database schema management
schema changes must be documented
atomic changes per feature
locks should be minimal
Schema management
DBDIFF SQL:
BEGIN;
-- registers this patch in the _v versioning schema
SELECT _v.register_patch('ZEOS-15430.order');

CREATE TABLE z_data.order_address (
  oa_id serial PRIMARY KEY,  -- the foreign key below needs a unique key here
  oa_country z_data.country,
  oa_city varchar(64),
  oa_street varchar(128),
  ...
);

ALTER TABLE z_data."order"
  ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;
Schema management
DBDIFF SQL:
BEGIN;
SELECT _v.register_patch('ZEOS-15430.order');

-- include the table definition from its own file (psql \i)
\i order/database/order/10_tables/10_order_address.sql

ALTER TABLE z_data."order"
  ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;
Schema management
DBDIFF SQL:
BEGIN;
SELECT _v.register_patch('ZEOS-15430.order');

\i order/database/order/10_tables/10_order_address.sql

-- give up after 3 s instead of waiting indefinitely for the lock on "order"
SET statement_timeout TO '3s';

ALTER TABLE z_data."order"
  ADD o_shipping_address_id int REFERENCES z_data.order_address (oa_id);
COMMIT;
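SET statement_timeout TO '3s' keeps the ALTER TABLE from waiting indefinitely for its lock on "order"; while it waits, later queries on that table queue up behind it. The slide only shows the timeout, not what happens after one fires. One option, sketched here under that assumption, is to retry the whole patch a few times when PostgreSQL reports SQLSTATE 57014 (query_canceled). DdlRetry and SqlAction are hypothetical names; in real use the action would run the BEGIN ... COMMIT block on a JDBC connection and roll back the aborted transaction before the next attempt.

```java
import java.sql.SQLException;

final class DdlRetry {

    // SQLSTATE 57014 (query_canceled) is what PostgreSQL reports when statement_timeout
    // cancels a statement, including one that is still waiting for a lock.
    static final String QUERY_CANCELED = "57014";

    @FunctionalInterface
    interface SqlAction {
        void run() throws SQLException;
    }

    // Runs the patch and retries only when it was cancelled by the timeout, waiting a
    // little longer each time. Any other error is rethrown immediately.
    // Returns the number of attempts that were needed.
    static int runWithRetries(SqlAction patch, int maxAttempts, long backoffMillis)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                patch.run();
                return attempt;
            } catch (SQLException e) {
                if (!QUERY_CANCELED.equals(e.getSQLState()) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```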
Schema management
pg_view (https://github.com/zalando/pg_view)
helps to monitor locks and load in real-time
used during all DB schema change rollouts
nice_updater (https://github.com/zalando/acid-tools)
runs big migrations while keeping database/system load under control
used during automatic data migrations
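The slides do not say how nice_updater controls load. A common approach, assumed here, is to migrate in small batches and pause whenever a load indicator (for example replication lag or system load) is above a threshold. The Java sketch below takes the batch step and the load probe as callbacks so it runs without a database; ThrottledMigration is a hypothetical name, and nice_updater itself may work differently.

```java
import java.util.function.DoubleSupplier;
import java.util.function.IntUnaryOperator;

final class ThrottledMigration {

    // Repeatedly applies one bounded batch (for example an UPDATE limited to batchSize
    // rows) until a batch reports 0 changed rows. Before each batch it waits while the
    // observed load is above maxLoad. Returns the total number of rows changed.
    static long run(IntUnaryOperator updateBatch, int batchSize,
                    DoubleSupplier currentLoad, double maxLoad, long pauseMillis)
            throws InterruptedException {
        long total = 0;
        while (true) {
            while (currentLoad.getAsDouble() > maxLoad) {
                Thread.sleep(pauseMillis);        // back off until the system calms down
            }
            int changed = updateBatch.applyAsInt(batchSize);
            if (changed == 0) {
                return total;                     // nothing left to migrate
            }
            total += changed;
        }
    }
}
```

Small batches also keep each transaction short, so row locks are held only briefly and normal traffic can interleave with the migration.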
High Availability
Patroni: High Availability runner, using etcd or ZooKeeper for master election
Spilo: PostgreSQL AWS appliance, combining:
Patroni for High Availability
Docker for packaging
Zalando STUPS for audit compliance
https://github.com/zalando/patroni
https://github.com/zalando/spilo
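Master election through etcd or ZooKeeper is typically built on a leader key with a time-to-live: the master keeps renewing the key, and when it stops (for example because it crashed) the key expires and a replica can take it over. The sketch below models only that lease logic, with an in-memory object standing in for the distributed store and explicit timestamps instead of a clock. LeaderLease is hypothetical and leaves out everything else Patroni does (health checks, promotion, demoting the old master).

```java
// In-memory stand-in for the distributed configuration store (etcd/ZooKeeper).
final class LeaderLease {
    private String holder;          // name of the current leader, or null
    private long expiresAtMillis;   // the leader must renew before this moment

    // Atomic "create or renew the leader key with a TTL": succeeds if the key is
    // free or expired, or if the caller already holds it.
    synchronized boolean tryAcquire(String node, long nowMillis, long ttlMillis) {
        boolean vacant = holder == null || nowMillis >= expiresAtMillis;
        if (vacant || holder.equals(node)) {
            holder = node;
            expiresAtMillis = nowMillis + ttlMillis;
            return true;
        }
        return false;
    }

    // The node that should currently act as master, or null if the lease lapsed.
    synchronized String leaderAt(long nowMillis) {
        return holder != null && nowMillis < expiresAtMillis ? holder : null;
    }
}
```

For example, if pg-a acquires the lease at t=0 with a TTL of 30 and renews it at t=20, pg-b is refused until t=50; if pg-a stops renewing, pg-b acquires the lease at t=50.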
Monitoring: ZMON
https://zmon.io
Open Source at Zalando Technology
Tech Blog of Zalando Technology https://zalando.github.io/ - Open