On Building Public Cloud Service for PostgreSQL and MySQL
Guangzhou Zhang, Alibaba Cloud [email protected]
Agenda
• Resource isolation
  • CPU/IO/MEM
  • Disk
• Privilege management
• Critical issues
  • Handling IO stalls
  • Handling OOM
• Enhance the service with Proxy
  • Transparent switch-over with Proxy
  • Transparent connection pool with Proxy
  • Read/write auto routing
Self-Intro
• Guangzhou Zhang, developer from the Alibaba Cloud RDS team
• Alibaba Cloud RDS team
  • Covers MySQL/PostgreSQL/SQL Server
  • Handles infrastructure, database kernel (MySQL/PG), operations, customer support, etc.
  • Runs one of the world's largest database instance clusters, still growing more than 100% year over year
Architecture
[Diagram: user requests arrive through a VIP at a Server Load Balancer, pass through the proxy layer, and reach the database instances (master and slave), with an HA component watching over them]
Resource isolation – CPU/IO/MEM
• Use bare-metal machines (no VMs, no network storage)
  • Locate up to hundreds of instances on the same host
  • Take full control over the whole stack; minimize dependencies for a predictable SLA and predictable performance
  • Control the resource allocation layout
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
[Diagram: on one host machine, instance A's master and its backends run inside one cgroup, while instance B's slave and its backends run inside another]
Resource isolation – CPU/IO/MEM
• Use cgroups for CPU/IO/MEM isolation
• Trick: separate the data and write-ahead-logging (WAL) disks, and loosen the IOPS limit on the WAL disk. Why?

Why do we have to loosen the limit?
• Maintenance work inside the database requires heavy IO and should finish as soon as possible
• A strict limit on WAL IOPS on a slave could result in severe replication lag
• A strict limit on WAL IOPS on the master could easily trigger a failover switch
• For better write performance

Why are we able to loosen the limit?
• No one will constantly write to the database at a high pace, since database storage has an upper limit (2 TB) and is expensive
• The storage used by WAL files is charged
• 'Hot' instances can be relocated to different hosts as needed

[Diagram: instance A's cgroup, with the WAL disk limited to 10000 IOPS and the data disk to 500 IOPS]
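The split limits above can be expressed as cgroup v1 blkio throttle entries. A minimal sketch, assuming the blkio controller and example device major:minor numbers (`8:16` for the WAL disk, `8:0` for the data disk are illustrative, not RDS's actual layout):

```python
# Build the "MAJOR:MINOR iops" strings that cgroup v1 expects in
# blkio.throttle.read_iops_device / blkio.throttle.write_iops_device,
# giving the WAL disk a much looser IOPS limit than the data disk.

def throttle_line(major: int, minor: int, iops: int) -> str:
    """Format one throttle entry for a block device."""
    return f"{major}:{minor} {iops}"

def instance_limits(wal_dev=(8, 16), data_dev=(8, 0),
                    wal_iops=10000, data_iops=500):
    """Return the throttle entries for one instance's cgroup."""
    return {
        "wal": throttle_line(*wal_dev, wal_iops),
        "data": throttle_line(*data_dev, data_iops),
    }
```

Each string would then be written into the instance's cgroup, e.g. `/sys/fs/cgroup/blkio/<instance>/blkio.throttle.write_iops_device` (path assumes a cgroup v1 mount).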
Resource isolation - Disk
• Regularly collect disk usage of each database instance; mark those with excessive disk usage as locked down
• Change the database kernel to add a parameter that controls user access on a locked-down instance:
  • Allow read access
  • Disallow any writes to the instance
  • Still allow DROP/TRUNCATE commands, so users can free space
Privilege Management
• The superuser role is withheld from end users for security reasons
  • Prevents users from messing up critical configurations for HA or maintenance
  • Makes it convenient for us to handle end users' roles and maintenance roles differently in code
• Users keep asking for more of the "superuser" privilege set, for managing data, monitoring, extensions, roles, etc.
• We have to loosen the privilege checks in some cases to keep users happy
Privilege Management
• Create a new role – rds_superuser
  • Allow this role to manage all other non-superusers' data, kill sessions, etc.
  • Allow it to set specific configuration parameters
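The key rule for rds_superuser can be sketched as a privilege check. This is a hypothetical model of the check, not the actual modified PostgreSQL source:

```python
# Sketch of the rds_superuser privilege rule: it may act on any role
# that is not a true superuser, which keeps HA/maintenance roles safe
# while still letting customers manage their own data and sessions.

def can_manage(actor_roles: set, target_is_superuser: bool) -> bool:
    """Return True if the actor may manage (alter, kill sessions of) the target."""
    if "superuser" in actor_roles:
        return True                     # real superuser: unrestricted
    if "rds_superuser" in actor_roles:
        return not target_is_superuser  # may not touch true superusers
    return False                        # ordinary users manage nothing extra
```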
Handling OOM
• OOM (out of memory) cases were frequently observed
  • Cloud users tend to buy a small instance class first, then regularly upgrade to a higher class
  • Large objects / big temporary memory allocations are widely used in some instances
  • Concurrent connections are not suitably configured by the application
• When an instance goes OOM, one of its processes is picked and killed by the Linux kernel, and the whole instance is restarted with all connections lost
• How can we minimize the OOM impact for users?
Handling OOM
[Diagram: a Linux box with two cgroups, each holding one instance's backends]
Handling OOM
[Diagram: the same per-instance cgroups, plus a shared public cgroup; a backend under memory pressure is moved into the public cgroup and sent kill -USR2, instead of letting the kernel's OOM killer take down the whole instance]
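The escape hatch in the diagram can be sketched as follows. This is an assumption-laden sketch of the idea (victim selection by memory use, the helper callbacks, and the function names are all illustrative, not RDS internals):

```python
# Sketch: when an instance's cgroup nears its memory limit, pick the
# backend using the most memory, move it into the shared "public"
# cgroup, and send SIGUSR2 so it can clean up, rather than letting the
# kernel OOM killer kill a process and force a full instance restart.
import signal

def pick_victim(backends):
    """backends: list of (pid, rss_bytes); return the pid with the largest RSS."""
    return max(backends, key=lambda b: b[1])[0]

def relieve_pressure(backends, move_to_public, send_signal):
    """move_to_public(pid) would append pid to the public cgroup's tasks file;
    send_signal(pid, sig) would be os.kill in a real deployment."""
    pid = pick_victim(backends)
    move_to_public(pid)
    send_signal(pid, signal.SIGUSR2)
    return pid
```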
Handling IO stalls
• Symptoms of IO stalls (ext4 filesystem mounted with data=ordered on Linux):
  • Long checkpoint fsync times
  • Nearly all operations, including setting up a new connection, hang for > 10 s; all instances on the same machine are affected
  • Yet IO utilization is low (< 10%) on the problematic disk
Handling IO stalls
fsync can be slow
[Diagram: the checkpointer's fsync waits on ext4's journaling buffer for metadata, which also holds other files' metadata and the dirty data queued ahead of it]
Handling IO stalls
write() can be blocked by fsync()
[Diagram: while the checkpointer's fsync drains the ext4 metadata journal, a backend's write() on the same filesystem blocks behind it]
Handling IO stalls
• Mount ext4 with the option data=writeback
• Change the database kernel to call sync_file_range() before fsync() in the checkpointer process
• Linux kernel enhancements
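The checkpointer trick can be sketched as below. A minimal sketch, assuming Linux and the `SYNC_FILE_RANGE_WRITE` flag from the `sync_file_range(2)` syscall (the actual fix lives in the C checkpointer, not Python): start writeback early so the later fsync() mostly waits on metadata rather than a full journal flush.

```python
# Sketch: push dirty pages to disk with sync_file_range() first, then
# fsync() for durability. sync_file_range() is not in Python's stdlib,
# so it is reached via ctypes; on platforms without it we just fsync.
import ctypes
import ctypes.util
import os

SYNC_FILE_RANGE_WRITE = 0x2  # start writeback of dirty pages, do not wait

def flush_then_fsync(fd: int) -> None:
    libc_name = ctypes.util.find_library("c")
    if libc_name:
        libc = ctypes.CDLL(libc_name, use_errno=True)
        if hasattr(libc, "sync_file_range"):
            libc.sync_file_range.argtypes = [
                ctypes.c_int, ctypes.c_longlong,
                ctypes.c_longlong, ctypes.c_uint,
            ]
            # offset=0, nbytes=0 means "from start to end of file"
            libc.sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE)
    os.fsync(fd)  # still required: flushes metadata and issues barriers
```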
Architecture
[Diagram: user requests arrive through a VIP at a Server Load Balancer, pass through the proxy layer, and reach the database instances (master and slave), with an HA component watching over them]
Transparent connection pool with Proxy
[Diagram: each client connection through the proxy holds a dedicated backend thread/process on the instance]
• Connection establishment is costly
• Performance is bad for short-connection applications
Transparent connection pool with Proxy
[Diagram: the proxy keeps a connection pool in front of the DB instance, so many client connections share a few backends]
Transparent connection pool with Proxy
• The proxy needs to reset all the resources held by a connection before putting it back into the connection pool
• Authentication needs to be performed against the incoming user when a pooled connection is reused
• Change the database kernel to support "SET AUTH_REQ = 'user/database/md5password/salt'" for authentication
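The MD5 material that a proxy would pass through in such a command follows PostgreSQL's standard md5 authentication scheme: the client's answer is `"md5" + md5(md5(password + user) + salt)`, and the server stores `"md5" + md5(password + user)`. A sketch of both sides (the function names are illustrative; the hashing scheme itself is the real protocol):

```python
# PostgreSQL md5 auth: the server keeps md5(password + user) and sends a
# random salt; the client answers with md5 of that inner hash plus the
# salt. A proxy holding the stored hash can verify a reused connection
# without ever seeing the cleartext password.
import hashlib

def pg_md5_response(user: str, password: str, salt: bytes) -> str:
    """What a client sends for AuthenticationMD5Password."""
    inner = hashlib.md5((password + user).encode()).hexdigest()
    return "md5" + hashlib.md5(inner.encode() + salt).hexdigest()

def auth_ok(stored_md5: str, salt: bytes, client_answer: str) -> bool:
    """stored_md5 is 'md5' + md5(password + user), as kept in pg_authid."""
    expected = "md5" + hashlib.md5(stored_md5[3:].encode() + salt).hexdigest()
    return client_answer == expected
```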
Transparent switch-over with Proxy
• Quite a few cases require an instance restart, and we implement the restart as a switch-over:
  • Upgrading or downgrading an instance's class (e.g. from 2 GB of memory to 4 GB)
  • Moving an instance from one machine to another
  • Database kernel version upgrades
• A transparent switch-over performs the switch-over without breaking existing user connections. This is key to the SLA and to customer satisfaction
Transparent switch-over with Proxy
[Diagram: user connections terminate at the proxy layer, which sits in front of the master and slave DB instances managed by HA]
Transparent switch-over with Proxy
• The proxy re-establishes connections only outside transaction boundaries
• Only for connections that hold nothing that cannot be re-created:
  • No temporary table usage
  • No statement preparation
• Change the kernel to support a command like "set connection_user to xxx" so the proxy can re-establish a connection without an explicit authentication process
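The safety check above can be sketched as per-connection state tracking. This is an illustrative sketch, assuming the proxy classifies statements by their leading keywords (the class and constant names are not the actual proxy's):

```python
# Sketch: a connection is safe to switch over only when it is outside a
# transaction and has created nothing the proxy cannot re-create on the
# new backend (temp tables, prepared statements, cursors).

NON_RECREATABLE = ("CREATE TEMP", "CREATE TEMPORARY", "PREPARE ", "DECLARE ")

class ConnState:
    def __init__(self):
        self.in_txn = False
        self.tainted = False  # holds temp tables / prepared statements

    def observe(self, sql: str) -> None:
        s = sql.strip().upper()
        if s.startswith(("BEGIN", "START TRANSACTION")):
            self.in_txn = True
        elif s.startswith(("COMMIT", "ROLLBACK", "END")):
            self.in_txn = False
        if s.startswith(NON_RECREATABLE):
            self.tainted = True

    def safe_to_switch(self) -> bool:
        return not self.in_txn and not self.tainted
```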
Transparent switch-over with Proxy
[Diagram: after the switch-over, the proxy re-establishes the backend connections and changes the connection user back to the application user]
Read-write auto routing
[Diagram: the proxy routes the application's writes to the master and reads to a read-only replica, with HA managing master and slave]
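The routing rule in the diagram can be sketched as a simple heuristic. This is a simplified sketch, not the actual proxy logic (which would also have to handle `SELECT ... FOR UPDATE`, functions with side effects, and replication lag):

```python
# Sketch: reads outside a transaction go to a read-only replica,
# everything else goes to the master.

READ_ONLY_VERBS = ("SELECT", "SHOW")

def route(sql: str, in_transaction: bool) -> str:
    """Return the routing target for one statement."""
    verb = sql.strip().split()[0].upper()
    if not in_transaction and verb in READ_ONLY_VERBS:
        return "replica"
    return "master"  # writes, DDL, and anything inside a transaction
```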