On Building Public Cloud Service for PostgreSQL and MySQL
Guangzhou Zhang, Alibaba Cloud [email protected]
Agenda
• Resource isolation
  • CPU/IO/MEM
  • Disk
• Privilege management
• Critical issues
  • Handling IO stalls
  • Handling OOM
• Enhance the service with Proxy
  • Transparent switch-over with Proxy
  • Transparent connection pool with Proxy
  • Read/write auto routing
Self-Intro
• Guangzhou Zhang, developer from the Alibaba Cloud RDS team
• Alibaba Cloud RDS team
  • Covers MySQL/PostgreSQL/SQL Server
  • Handles infrastructure, database kernel (MySQL/PG), operations, customer support, etc.
  • Runs one of the world's largest database instance clusters, still growing more than 100% year over year
Architecture
[Diagram: user requests arrive through a VIP at a Server Load Balancer, pass through the proxy layer, and reach the database instances (master and slave), with an HA component watching over them]
Resource isolation – CPU/IO/MEM
• Use bare-metal machines (no VMs, no network storage)
  • Locate up to hundreds of instances on the same host
  • Take full control over the whole stack; minimize dependencies for a predictable SLA and predictable performance
  • Control the resource allocation layout
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
[Diagram: on one host machine, instance A's master and its backends run inside one cgroup, while instance B's slave and its backends run inside another]
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
• Trick: separate the data and write-ahead-logging (WAL) disks, and loosen the IOPS limit on the WAL disk. Why?

Why do we have to loosen the limit?
• Maintenance work inside the database requires heavy IO and should finish as soon as possible
• A strict limit on WAL IOPS on a slave could result in severe replication lag
• A strict limit on WAL IOPS on the master could easily trigger a failover switch
• For better write performance

Why are we able to loosen the limit?
• No one will constantly write to the database at a high pace, since database storage has an upper limit (2 TB) and is expensive
• The storage used by WAL files is charged
• 'Hot' instances can be relocated to different hosts as needed

[Diagram: instance A's cgroup, with the WAL disk limited to 10000 IOPS and the data disk to 500 IOPS]
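The split limits above can be expressed as cgroup v1 blkio throttle entries. A minimal sketch, assuming the blkio controller and example device major:minor numbers (`8:16` for the WAL disk, `8:0` for the data disk are illustrative, not RDS's actual layout):

```python
# Build the "MAJOR:MINOR iops" strings that cgroup v1 expects in
# blkio.throttle.read_iops_device / blkio.throttle.write_iops_device,
# giving the WAL disk a much looser IOPS limit than the data disk.

def throttle_line(major: int, minor: int, iops: int) -> str:
    """Format one throttle entry for a block device."""
    return f"{major}:{minor} {iops}"

def instance_limits(wal_dev=(8, 16), data_dev=(8, 0),
                    wal_iops=10000, data_iops=500):
    """Return the throttle entries for one instance's cgroup."""
    return {
        "wal": throttle_line(*wal_dev, wal_iops),
        "data": throttle_line(*data_dev, data_iops),
    }
```

Each string would then be written into the instance's cgroup, e.g. `/sys/fs/cgroup/blkio/<instance>/blkio.throttle.write_iops_device` (path assumes a cgroup v1 mount).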
Resource isolation - Disk
• Regularly collect disk usage of each database instance; mark those with excessive disk usage as locked down
• Change the database kernel to add a parameter that controls user access on a locked-down instance:
  • Allow read access
  • Disallow any writes to the instance
  • Still allow DROP/TRUNCATE commands, so users can free space
Privilege Management
• The superuser role is withheld from end users for security reasons
  • Prevents users from messing up critical configurations for HA or maintenance
  • Makes it convenient for us to handle end users' roles and maintenance roles differently in code
• Users keep asking for more of the "superuser" privilege set, for managing data, monitoring, extensions, roles, etc.
• We have to loosen the privilege checks in some cases to keep users happy
Privilege Management
• Create a new role – rds_superuser
  • Allow this role to manage all other non-superusers' data, kill sessions, etc.
  • Allow it to set specific configuration parameters
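The key rule for rds_superuser can be sketched as a privilege check. This is a hypothetical model of the check, not the actual modified PostgreSQL source:

```python
# Sketch of the rds_superuser privilege rule: it may act on any role
# that is not a true superuser, which keeps HA/maintenance roles safe
# while still letting customers manage their own data and sessions.

def can_manage(actor_roles: set, target_is_superuser: bool) -> bool:
    """Return True if the actor may manage (alter, kill sessions of) the target."""
    if "superuser" in actor_roles:
        return True                     # real superuser: unrestricted
    if "rds_superuser" in actor_roles:
        return not target_is_superuser  # may not touch true superusers
    return False                        # ordinary users manage nothing extra
```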
Handling OOM
• OOM (out of memory) cases were frequently observed
  • Cloud users tend to buy a small instance class first, then regularly upgrade to a higher class
  • Large objects / big temporary memory allocations are widely used in some instances
  • Concurrent connections are not suitably configured by the application
• When an instance goes OOM, one of its processes is picked and killed by the Linux kernel, and the whole instance is restarted with all connections lost
• How can we minimize the OOM impact for users?
Handling OOM
[Diagram: a Linux box with two cgroups, each holding one instance's backends]
Handling OOM
[Diagram: the same per-instance cgroups, plus a shared public cgroup; a backend under memory pressure is moved into the public cgroup and sent kill -USR2, instead of letting the kernel's OOM killer take down the whole instance]
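The escape hatch in the diagram can be sketched as follows. This is an assumption-laden sketch of the idea (victim selection by memory use, the helper callbacks, and the function names are all illustrative, not RDS internals):

```python
# Sketch: when an instance's cgroup nears its memory limit, pick the
# backend using the most memory, move it into the shared "public"
# cgroup, and send SIGUSR2 so it can clean up, rather than letting the
# kernel OOM killer kill a process and force a full instance restart.
import signal

def pick_victim(backends):
    """backends: list of (pid, rss_bytes); return the pid with the largest RSS."""
    return max(backends, key=lambda b: b[1])[0]

def relieve_pressure(backends, move_to_public, send_signal):
    """move_to_public(pid) would append pid to the public cgroup's tasks file;
    send_signal(pid, sig) would be os.kill in a real deployment."""
    pid = pick_victim(backends)
    move_to_public(pid)
    send_signal(pid, signal.SIGUSR2)
    return pid
```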
Handling IO stalls
• Symptoms of IO stalls (ext4 filesystem mounted with data=ordered on Linux):
  • Long checkpoint fsync times
  • Nearly all operations, including setting up a new connection, hang for > 10 s; all instances on the same machine are affected
  • Yet IO utilization is low (< 10%) on the problematic disk
Handling IO stalls
fsync can be slow
[Diagram: the checkpointer's fsync waits on ext4's journaling buffer for metadata, which also holds other files' metadata and the dirty data queued ahead of it]
Handling IO stalls
write() can be blocked by fsync()
[Diagram: while the checkpointer's fsync drains the ext4 metadata journal, a backend's write() on the same filesystem blocks behind it]
Handling IO stalls
• Mount ext4 with the option data=writeback
• Change the database kernel to call sync_file_range() before fsync() in the checkpointer process
• Linux kernel enhancements
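The checkpointer trick can be sketched as below. A minimal sketch, assuming Linux and the `SYNC_FILE_RANGE_WRITE` flag from the `sync_file_range(2)` syscall (the actual fix lives in the C checkpointer, not Python): start writeback early so the later fsync() mostly waits on metadata rather than a full journal flush.

```python
# Sketch: push dirty pages to disk with sync_file_range() first, then
# fsync() for durability. sync_file_range() is not in Python's stdlib,
# so it is reached via ctypes; on platforms without it we just fsync.
import ctypes
import ctypes.util
import os

SYNC_FILE_RANGE_WRITE = 0x2  # start writeback of dirty pages, do not wait

def flush_then_fsync(fd: int) -> None:
    libc_name = ctypes.util.find_library("c")
    if libc_name:
        libc = ctypes.CDLL(libc_name, use_errno=True)
        if hasattr(libc, "sync_file_range"):
            libc.sync_file_range.argtypes = [
                ctypes.c_int, ctypes.c_longlong,
                ctypes.c_longlong, ctypes.c_uint,
            ]
            # offset=0, nbytes=0 means "from start to end of file"
            libc.sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE)
    os.fsync(fd)  # still required: flushes metadata and issues barriers
```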
Architecture
[Diagram: user requests arrive through a VIP at a Server Load Balancer, pass through the proxy layer, and reach the database instances (master and slave), with an HA component watching over them]
Transparent connection pool with Proxy
[Diagram: each client connection through the proxy holds a dedicated backend thread/process on the instance]
• Connection establishment is costly
• Performance is bad for short-connection applications
Transparent connection pool with Proxy
[Diagram: the proxy keeps a connection pool in front of the DB instance, so many client connections share a few backends]
Transparent connection pool with Proxy
• The proxy needs to reset all the resources held by a connection before putting it back into the connection pool
• Authentication needs to be performed against the incoming user when a pooled connection is reused
• Change the database kernel to support "SET AUTH_REQ = 'user/database/md5password/salt'" for authentication
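The MD5 material that a proxy would pass through in such a command follows PostgreSQL's standard md5 authentication scheme: the client's answer is `"md5" + md5(md5(password + user) + salt)`, and the server stores `"md5" + md5(password + user)`. A sketch of both sides (the function names are illustrative; the hashing scheme itself is the real protocol):

```python
# PostgreSQL md5 auth: the server keeps md5(password + user) and sends a
# random salt; the client answers with md5 of that inner hash plus the
# salt. A proxy holding the stored hash can verify a reused connection
# without ever seeing the cleartext password.
import hashlib

def pg_md5_response(user: str, password: str, salt: bytes) -> str:
    """What a client sends for AuthenticationMD5Password."""
    inner = hashlib.md5((password + user).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()

def auth_ok(stored_md5: str, salt: bytes, client_answer: str) -> bool:
    """stored_md5 is 'md5' + md5(password + user), as kept in pg_authid."""
    expected = "md5" + hashlib.md5(stored_md5[3:].encode() + salt).hexdigest()
    return client_answer == expected
```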
Transparent switch-over with Proxy
• Quite a few cases require an instance restart, and we implement the restart as a switch-over:
  • Upgrading or downgrading an instance's class (e.g. from 2 GB of memory to 4 GB)
  • Moving an instance from one machine to another
  • Database kernel version upgrades
• A transparent switch-over performs the switch-over without breaking existing user connections. This is key to the SLA and to customer satisfaction
Transparent switch-over with Proxy
[Diagram: user connections terminate at the proxy layer, which sits in front of the master and slave DB instances managed by HA]
Transparent switch-over with Proxy
• The proxy re-establishes connections only outside transaction boundaries
• Only for connections that hold nothing that cannot be re-created:
  • No temporary table usage
  • No statement preparation
• Change the kernel to support a command like "set connection_user to xxx" so the proxy can re-establish a connection without an explicit authentication process
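The safety check above can be sketched as per-connection state tracking. This is an illustrative sketch, assuming the proxy classifies statements by their leading keywords (the class and constant names are not the actual proxy's):

```python
# Sketch: a connection is safe to switch over only when it is outside a
# transaction and has created nothing the proxy cannot re-create on the
# new backend (temp tables, prepared statements, cursors).

NON_RECREATABLE = ("CREATE TEMP", "CREATE TEMPORARY", "PREPARE ", "DECLARE ")

class ConnState:
    def __init__(self):
        self.in_txn = False
        self.tainted = False  # holds temp tables / prepared statements

    def observe(self, sql: str) -> None:
        s = sql.strip().upper()
        if s.startswith(("BEGIN", "START TRANSACTION")):
            self.in_txn = True
        elif s.startswith(("COMMIT", "ROLLBACK", "END")):
            self.in_txn = False
        if s.startswith(NON_RECREATABLE):
            self.tainted = True

    def safe_to_switch(self) -> bool:
        return not self.in_txn and not self.tainted
```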
Transparent switch-over with Proxy
[Diagram: after the switch-over, the proxy re-establishes the backend connections and changes the connection user back to the application user]
Read-write auto routing
[Diagram: the proxy routes the application's writes to the master and reads to a read-only replica, with HA managing master and slave]
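The routing rule in the diagram can be sketched as a simple heuristic. This is a simplified sketch, not the actual proxy logic (which would also have to handle `SELECT ... FOR UPDATE`, functions with side effects, and replication lag):

```python
# Sketch: reads outside a transaction go to a read-only replica,
# everything else goes to the master.

READ_ONLY_VERBS = ("SELECT", "SHOW")

def route(sql: str, in_transaction: bool) -> str:
    """Return the routing target for one statement."""
    verb = sql.strip().split()[0].upper()
    if not in_transaction and verb in READ_ONLY_VERBS:
        return "replica"
    return "master"  # writes, DDL, and anything inside a transaction
```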