Jena University Talk 2016.03.09 -- SQL at Zalando Technology

  • date post

    23-Jan-2017
  • Category

    Technology

Transcript of Jena University Talk 2016.03.09 -- SQL at Zalando Technology

  • PostgreSQL at Zalando

    SQL in Fashion

  • About me

    Valentine Gogichashvili

    Head of Data Engineering @ZalandoTech

    twitter: @valgog

    google+: +valgog

    email: valentine.gogichashvili@zalando.de

  • One of Europe's largest online fashion retailers

    15 countries

    4 fulfillment centers

    15+ million active customers

    €2.9 billion revenue (2015)

    150,000+ products

    9,000+ employees

  • Zalando Technology

    BERLIN

    DORTMUND

    DUBLIN

    HELSINKI

    ERFURT

    MÖNCHENGLADBACH

    HAMBURG

  • Zalando Technology

    900+ TECHNOLOGISTS

    Rapidly growing international team

    http://tech.zalando.de

  • THE HISTORY

  • Once upon a time...

    Started as a tiny online shop

    Prototyped on Magento (PHP)

    Used MySQL as a database

    Web Application

    Backend

    Database

  • REBOOT

  • REBOOT

    5 years ago

    Java macro service architecture with SOAP as RPC layer

    PostgreSQL

    Heavy usage of Stored Procedures

    4 databases + 1 sharded database on 2 shards

    Python for tooling (e.g. code deployment automation)

  • REBOOT

    [Diagram: a Java web frontend calling Java backend "macro" services, each backed by a PostgreSQL database]

  • SQL

  • SQL

    human readable query language

    has stood the test of time

    allows automatic optimization of access to data

  • SQL

    window functions for moving averages etc.

    recursive SQL for searching trees and networks

    non-blocking locking

    DDL locking (schema guarantees)

  • PostgreSQL

  • PostgreSQL

    The world's

    most advanced

    open-source

    database

  • PostgreSQL

    Minimal DDL locks (easy schema changes)

    Nearest neighbour searching using an index

    Block-range indexes

    Serializability (true theoretical serializability)

    JSONB - compressed JSON, fully indexable

    Data sampling with guarantees on execution time

  • PostgreSQL

    Read scalability: "Hot standby" replicas

    Synchronous Replication

    Cascading Replication

    Logical Decoding to allow Change Data Capture

    BDR for Multi-Master Scalability

    Write scalability/sharding (is on the way)

  • PostgreSQL

    Multiple data models (tables, JSON, ROWs, arrays)

    ACID

    Eventual Consistency (on-demand)

  • Stored Procedures

  • Stored Procedures

    clean transaction scope

    very clean data

    processing close to data

    no need for classical ORM mappers

  • Stored Procedures

    no debugger in Eclipse or IntelliJ

    difficult for projects heavy on CRUD

    versioning automation is needed

  • Stored Procedures

    Java Sproc Wrapper

    very easy to use

    proxies stored procedures as Java method calls

    supports complex type mapping

    supports transparent sharding

    https://github.com/zalando/java-sproc-wrapper
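
    The idea of proxying stored procedures as Java method calls can be illustrated with a plain JDK dynamic proxy. The sketch below is not the java-sproc-wrapper implementation: the names SprocProxySketch, CustomerSprocs, toSnakeCase and buildCall are hypothetical, and instead of talking to a database the proxy only returns the SQL text that such a wrapper might issue.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;

// Hypothetical illustration only: NOT the java-sproc-wrapper implementation.
class SprocProxySketch {

    // Hypothetical service interface; each method stands for one stored procedure.
    interface CustomerSprocs {
        String registerCustomer(String email, String gender);
    }

    // Convert a Java method name to a snake_case function name,
    // e.g. registerCustomer -> register_customer.
    static String toSnakeCase(String camel) {
        StringBuilder sb = new StringBuilder();
        for (char c : camel.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a parameterized call with one placeholder per argument.
    static String buildCall(String methodName, int argCount) {
        String placeholders = String.join(", ", Collections.nCopies(argCount, "?"));
        return "SELECT * FROM " + toSnakeCase(methodName) + "(" + placeholders + ")";
    }

    // Create a proxy whose methods return the SQL they would execute.
    // A real wrapper would bind the arguments and run the statement via JDBC.
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> api) {
        InvocationHandler handler = (proxy, method, args) ->
                buildCall(method.getName(), args == null ? 0 : args.length);
        return (T) Proxy.newProxyInstance(api.getClassLoader(), new Class<?>[] {api}, handler);
    }

    public static void main(String[] args) {
        CustomerSprocs sprocs = create(CustomerSprocs.class);
        // prints: SELECT * FROM register_customer(?, ?)
        System.out.println(sprocs.registerCustomer("jane@example.org", "FEMALE"));
    }
}
```

    The calling code sees only an ordinary Java interface; name mapping and parameter binding happen in one place, which is roughly the convenience the wrapper advertises.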

  • Stored Procedures

    CREATE FUNCTION register_customer(p_email text, p_gender z_data.gender)
    RETURNS int AS $$
        INSERT INTO z_data.customer (c_email, c_gender)
        VALUES (p_email, p_gender)
        RETURNING c_id
    $$ LANGUAGE 'sql' SECURITY DEFINER;

    SQL

    @SProcService
    public interface CustomerSProcService {
        @SProcCall
        int registerCustomer(@SProcParam String email, @SProcParam Gender gender);
    }

    JAVA

  • Stored Procedures

    @SProcCall
    List findOrders(@SProcParam String email);

    JAVA

    CREATE FUNCTION find_orders(p_email text,
        OUT order_id int,
        OUT order_created timestamptz,
        OUT shipping_address order_address)
    RETURNS SETOF record AS $$
        SELECT o_id, o_created,
               ROW(oa_street, oa_city, oa_country)::order_address
          FROM z_data."order"
          JOIN z_data.order_address ON oa_order_id = o_id
          JOIN z_data.customer ON c_id = o_customer_id
         WHERE c_email = p_email
    $$ LANGUAGE 'sql' SECURITY DEFINER;

    SQL

  • Stored Procedures

    @SProcCall
    int registerCustomer(@SProcParam @ShardKey CustomerNumber customerNumber,
                         @SProcParam String email,
                         @SProcParam Gender gender);

    JAVA

    @SProcCall
    Article getArticle(@SProcParam @ShardKey Sku sku);

    JAVA

    @SProcCall(runOnAllShards = true, parallel = true)
    List findOrders(@SProcParam String email);

    JAVA
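
    What runOnAllShards = true, parallel = true suggests can be sketched with plain JDK concurrency: send the same call to every shard at once and concatenate the partial results. This is an assumption about the behaviour implied by the annotation attributes, not the wrapper's actual code; the Shard interface and the in-memory shards are hypothetical stand-ins for per-shard database connections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: scatter a call to all shards, gather the results.
class AllShardsSketch {

    // Stand-in for "call find_orders(email) on one shard".
    interface Shard {
        List<String> findOrders(String email);
    }

    static List<String> findOrdersOnAllShards(List<Shard> shards, String email) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Submit one task per shard so the shards are queried concurrently.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                futures.add(pool.submit(() -> shard.findOrders(email)));
            }
            // Collect in shard order so the merged result is deterministic.
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                try {
                    merged.addAll(future.get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a shard", ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException("a shard call failed", ex.getCause());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Shard> shards = Arrays.asList(
                e -> Arrays.asList("order-1"),
                e -> Arrays.asList("order-2", "order-3"));
        // prints: [order-1, order-2, order-3]
        System.out.println(findOrdersOnAllShards(shards, "jane@example.org"));
    }
}
```

    Fanning out is needed here because the email is not the shard key, so the caller cannot know which shard holds the matching orders.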

  • Stored Procedures

    Virtual Shard IDs (pre-sharding)

    md5(partitioning_key)

    bit position: 7 6 5 4 3 2 1 0

    bit value:    0 1 0 1 1 0 0 1

    [Diagram: the Java application calls the SprocWrapper, which routes each call to one of the PostgreSQL shards]
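
    One way to read this slide: hash the partitioning key with MD5, keep a fixed number of low-order bits as a virtual shard ID, and map virtual shard IDs onto the physical databases. The sketch below assumes 8 virtual-shard bits (as in the bit diagram), takes them from the last byte of the digest, and assigns virtual shards to physical shards by modulo. All three choices, and every name in the code, are illustrative assumptions rather than the SprocWrapper's documented behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of pre-sharding with virtual shard IDs.
class VirtualShardSketch {

    static final int VIRTUAL_SHARD_BITS = 8; // assumption: 2^8 = 256 virtual shards

    // Virtual shard ID = lowest VIRTUAL_SHARD_BITS bits of md5(partitioningKey).
    static int virtualShardId(String partitioningKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitioningKey.getBytes(StandardCharsets.UTF_8));
            int lastByte = digest[digest.length - 1] & 0xFF; // treat the byte as unsigned
            return lastByte & ((1 << VIRTUAL_SHARD_BITS) - 1);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: virtual shards are spread over physical shards by modulo.
    // With a power-of-two shard count only the lowest bits decide, so doubling
    // the number of physical shards splits each shard's virtual IDs in two.
    static int physicalShard(String partitioningKey, int physicalShardCount) {
        return virtualShardId(partitioningKey) % physicalShardCount;
    }

    public static void main(String[] args) {
        String key = "customer-4711"; // hypothetical partitioning key
        System.out.println("virtual shard:  " + virtualShardId(key));
        System.out.println("physical shard: " + physicalShard(key, 2));
    }
}
```

    Because the key-to-virtual-shard mapping never changes, adding physical shards only requires moving whole virtual shards, not rehashing every key.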

  • Stored Procedures

    Schema based stored procedure versioning

    uses search_path on the clients

    new schema for every application version

    automated deployments
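
    A client can derive its search_path from the API version it was built against. The sketch below only builds the statement text; the class and method names are hypothetical, and the two-part api_vNN_NN pattern is inferred from the schema names shown on these slides.

```java
// Hypothetical illustration of schema-based stored procedure versioning.
class SearchPathSketch {

    // Build the versioned API schema name, e.g. (16, 2) -> api_v16_02.
    static String apiSchema(int major, int minor) {
        return String.format("api_v%02d_%02d", major, minor);
    }

    // Statement a client could run right after connecting, so that unqualified
    // function calls resolve to its own API version before falling back to public.
    static String setSearchPath(int major, int minor) {
        return "SET search_path TO " + apiSchema(major, minor) + ", public";
    }

    public static void main(String[] args) {
        // prints: SET search_path TO api_v16_01, public
        System.out.println(setSearchPath(16, 1));
        // prints: SET search_path TO api_v16_02, public
        System.out.println(setSearchPath(16, 2));
    }
}
```

    Old and new application versions can then run side by side against the same tables, each resolving stored procedures in its own schema.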

  • Stored Procedures

    Database Tables

    api_v16_01

  • Stored Procedures

    Database Tables

    api_v16_01

    search_path = api_v16_01, public;

  • Stored Procedures

    Database Tables

    api_v16_01

    api_v16_02

    search_path = api_v16_01, public;

  • Stored Procedures

    Database Tables

    api_v16_01

    api_v16_02

    search_path = api_v16_02, public;

    search_path = api_v16_01, public;

  • Stored Procedures

    Database Tables

    api_v16_01

    api_v16_02

    search_path = api_v16_02, public;

  • Stored Procedures

    Database Tables

    api_v16_02

    search_path = api_v16_02, public;

  • https://github.com/zalando/PGObserver

  • Schema Management

  • Schema management

    DBDIFF database schema management

    schema changes must be documented

    atomic changes per feature

    locks should be minimal

  • Schema management

    BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');

    CREATE TABLE z_data.order_address (
        oa_id      serial,
        oa_country z_data.country,
        oa_city    varchar(64),
        oa_street  varchar(128),
        ...
    );

    ALTER TABLE z_data."order"
        ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
    COMMIT;

    DBDIFF SQL

  • Schema management

    BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');

    \i order/database/order/10_tables/10_order_address.sql

    ALTER TABLE z_data."order"
        ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
    COMMIT;

    DBDIFF SQL

  • Schema management

    BEGIN;
    SELECT _v.register_patch('ZEOS-15430.order');

    \i order/database/order/10_tables/10_order_address.sql

    SET statement_timeout TO '3s';

    ALTER TABLE z_data."order"
        ADD o_shipping_address_id int
        REFERENCES z_data.order_address (oa_id);
    COMMIT;

    DBDIFF SQL

  • Schema management

    pg_view (https://github.com/zalando/pg_view)

    helps to monitor locks and load in real-time

    used during all DB schema change rollouts

    nice_updater (https://github.com/zalando/acid-tools)

    runs big migrations controlling database/system load

    used during automatic data migrations

  • https://github.com/zalando/pg_view

  • High Availability

  • High Availability

    Patroni - High Availability Runner

      etcd or ZooKeeper for master election

    Spilo - PostgreSQL AWS appliance

      Zalando Patroni for High Availability

      Docker for packaging

      Zalando STUPS for audit compliance

    https://github.com/zalando/patroni

    https://github.com/zalando/spilo

  • Monitoring: ZMON

    https://zmon.io

  • Open Source at Zalando Technology

    Tech Blog of Zalando Technology https://zalando.github.io/ - Open