PostgreSQL Scaling And Failover
PostgreSQL
John Paulett
October 26, 2009
High Availability & Scaling
10/26/2009 2
Overview
Scaling Overview
– Horizontal & Vertical Options
High Availability Overview
Other Options
Suggested Architecture
Hardware Discussion
What are we trying to solve?
Survive server failure?
– Support an uptime SLA (e.g. 99.9999%)?
Application scaling?
– Support additional application demand
→ Many options, each optimized for different constraints
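To make the SLA question concrete: the downtime budget for an availability target is (1 − availability) × period. A minimal Python sketch, assuming a 365-day year; the function name is illustrative, not from any library.

```python
# Assumes a 365-day year (31,536,000 seconds).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for an availability target."""
    if not 0 < availability_percent <= 100:
        raise ValueError("availability must be in (0, 100]")
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for target in (99.9, 99.99, 99.9999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

At 99.9999% the budget is roughly half a minute per year, which leaves no room for a human to notice a failure and fail over by hand.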
Scaling Overview
How To Scale
Horizontal Scaling
– “Google” approach
– Distribute load across multiple servers
– Requires appropriate application architecture
Vertical Scaling
– “Big Iron” approach
– Single, massive machine (lots of fast processors, RAM, & hard drives)
Horizontal DB Scaling
Load Balancing
– Distribute operations to multiple servers
Partitioning
– Split the data by rows (horizontal) or by tables (vertical) and put the pieces on separate servers
– Horizontal partitioning across servers is also known as “sharding”
Basic Problem when Load Balancing
Difficult to maintain consistent state between servers (remember ACID), especially when dealing with writes
4 PostgreSQL Load Balancing Methods:
– Master-Slave Replication
– Statement-Based Replication Middleware
– Asynchronous Multimaster Replication
– Synchronous Multimaster Replication
Master-Slave Replication
Master handles writes, slaves handle reads
Asynchronous replication
– Possible data loss on master failure
Slony-I
– Does not automatically propagate schema changes
– Does not offer a single connection point
– Requires a separate solution for master failures
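With master-slave replication, the application (or a proxy in front of the database) has to decide where each statement goes. A minimal Python sketch of that routing decision, assuming statements are classified by their leading SQL keyword; the class and host names are hypothetical. Keyword classification is crude (for example, `SELECT nextval(...)` modifies a sequence), so treat this as an illustration of the idea rather than a production router.

```python
from __future__ import annotations

import itertools

# Statements that only read data may go to a slave; everything else goes to the master.
READ_ONLY_KEYWORDS = {"SELECT", "SHOW"}


class ReadWriteRouter:
    """Hypothetical router: writes to the master, reads round-robin across slaves."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql: str, in_write_transaction: bool = False) -> str:
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        # Reads inside a write transaction must see its own uncommitted changes,
        # and asynchronous slaves may lag, so those reads stay on the master.
        if keyword in READ_ONLY_KEYWORDS and not in_write_transaction:
            return next(self._slaves)
        return self.master
```

For example, `ReadWriteRouter("db-master", ["db-slave1", "db-slave2"])` sends successive `SELECT`s to `db-slave1`, then `db-slave2`, and every `INSERT`/`UPDATE` to `db-master`.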
Statement-Based Replication Middleware
Intercept SQL queries, send writes to all servers, reads to any server
Possible issues with random(), CURRENT_TIMESTAMP, & sequences: each server evaluates them independently, so values can differ between servers
pgpool-II
– Connection Pooling, Replication, Load Balancing, Parallel Queries, Failover
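The random()/CURRENT_TIMESTAMP problem arises because statement-based middleware ships the SQL text, not the resulting values, so each server computes those functions on its own. A rough Python sketch of a pre-check that flags such writes; the function list is an illustrative assumption (not pgpool-II's actual logic), and a regular expression cannot see functions hidden inside triggers or column defaults.

```python
from __future__ import annotations

import re

# Illustrative, not exhaustive: functions whose result can differ per server
# when the same SQL text is executed independently on each one.
NONDETERMINISTIC = ("random", "now", "current_timestamp", "clock_timestamp",
                    "timeofday", "nextval")

_PATTERN = re.compile(r"\b(" + "|".join(NONDETERMINISTIC) + r")\b", re.IGNORECASE)


def unsafe_functions(sql: str) -> list[str]:
    """Return the (lower-cased) nondeterministic function names found in a statement."""
    return sorted({m.group(1).lower() for m in _PATTERN.finditer(sql)})


def is_safe_to_broadcast(sql: str) -> bool:
    """True if the statement text contains none of the listed functions."""
    return not unsafe_functions(sql)
```

A common workaround for flagged statements is to compute the value once in the application and send it as a literal, so every server stores the same thing.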
pgpool-II
Synchronous Multimaster Replication
Writes & reads on any server
Not built into PostgreSQL, but application code or middleware can implement it with two-phase commit (PREPARE TRANSACTION / COMMIT PREPARED)
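Two-phase commit keeps several servers in step: each server first prepares the transaction (`PREPARE TRANSACTION`), and only if every server succeeds does the coordinator issue `COMMIT PREPARED`; otherwise it issues `ROLLBACK PREPARED`. A minimal Python sketch of that coordinator, assuming a hypothetical connection object with an `execute(sql)` method; the in-memory `FakeServer` only records statements so the flow can be exercised without a database. Real servers need `max_prepared_transactions` set above zero, and a coordinator crash between the two phases leaves prepared transactions that must be resolved by hand.

```python
from __future__ import annotations

from typing import Optional, Protocol


class Connection(Protocol):
    """Hypothetical minimal interface: anything with execute(sql)."""
    def execute(self, sql: str) -> None: ...


def replicate_write(servers: list[Connection], statements: list[str], gid: str) -> bool:
    """Apply the same statements on every server, all or nothing, via two-phase commit.

    `gid` is the global transaction identifier; it is interpolated into SQL,
    so it must be generated by the application, never taken from user input.
    Returns True if committed everywhere, False if rolled back everywhere.
    """
    started: list[Connection] = []
    prepared: list[Connection] = []
    try:
        # Phase 1: do the work and PREPARE on every server ("vote yes").
        for conn in servers:
            conn.execute("BEGIN")
            started.append(conn)
            for sql in statements:
                conn.execute(sql)
            conn.execute(f"PREPARE TRANSACTION '{gid}'")
            prepared.append(conn)
    except Exception:
        # Some server failed: roll back prepared transactions, abort open ones.
        for conn in started:
            if conn in prepared:
                conn.execute(f"ROLLBACK PREPARED '{gid}'")
            else:
                conn.execute("ROLLBACK")
        return False
    # Phase 2: every server prepared successfully, so commit everywhere.
    for conn in servers:
        conn.execute(f"COMMIT PREPARED '{gid}'")
    return True


class FakeServer:
    """In-memory stand-in that records SQL and can fail on one statement."""

    def __init__(self, fail_on: Optional[str] = None):
        self.log: list[str] = []
        self.fail_on = fail_on

    def execute(self, sql: str) -> None:
        self.log.append(sql)
        if sql == self.fail_on:
            raise RuntimeError(f"simulated failure on {sql!r}")
```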
Load Balancing Issue
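Scaling writes breaks down at a certain point: every write must still be applied on every server, so adding servers adds read capacity, not write capacity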
Partitioning
Requires heavy application modification
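Queries across partitions are problematic: a single SQL query cannot simply join tables that live on different servers
PL/Proxy can help: it routes database function calls to one partition, or runs them on all partitions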
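Partitioning moves the routing decision into the application: each row's key decides which server owns it, and any query not restricted to one key has to be sent to every partition and the results merged. A minimal Python sketch using hash-modulo routing; the key, the hash choice (CRC32) and the in-memory "servers" are assumptions for illustration. PL/Proxy expresses the same two cases with `RUN ON <hash expression>` and `RUN ON ALL`.

```python
from __future__ import annotations

import zlib


class HashPartitioner:
    """Hypothetical sharding helper: each key lives on exactly one partition."""

    def __init__(self, partitions: list[dict]):
        # Each partition is modelled as a dict {key: row}; real code would hold connections.
        self.partitions = partitions

    def _index(self, key: str) -> int:
        # CRC32 is stable across processes (unlike Python's built-in hash() for str).
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key: str, row: dict) -> None:
        self.partitions[self._index(key)][key] = row

    def get(self, key: str):
        # A single-key lookup touches only one partition.
        return self.partitions[self._index(key)].get(key)

    def scan(self, predicate) -> list[dict]:
        # A cross-partition query: scatter to every partition, then gather and merge.
        results = []
        for part in self.partitions:
            results.extend(row for row in part.values() if predicate(row))
        return results
```

One drawback of plain modulo routing is that changing the number of partitions remaps most keys, which is why resharding is expensive.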
Vertical DB Scaling
“Buying a bigger box is quick(ish). Redesigning software is not.”
– Cal Henderson, Flickr
37signals upgraded the Basecamp DB server to 128 GB of RAM: “don’t need to pay the complexity tax yet”
– David Heinemeier Hansson, Ruby on Rails
Sites Running on Single DB
Stack Overflow
– MS SQL Server, 48GB RAM, RAID 1 for OS, RAID 10 for data
37signals Basecamp
– MySQL, 128GB RAM, Dell R710 or Dell 2950
High Availability Overview
High Availability
Application stays up even after a node fails
– (Also try to prevent failure with appropriate hardware)
PostgreSQL High Availability Options
– pgpool-II
– Shared Disk Failover
– File System Replication
– Warm Standby with Point-In-Time Recovery (PITR)
Often still need a heartbeat application to detect failure and trigger failover
Shared Disk Failover
Use a single disk array to hold the database's data files
– Network Attached Storage (NAS)
– Network File System (NFS)
Disk array is a single point of failure
Need heartbeat to bring 2nd server online
File System Replication
File system is mirrored to another computer
DRBD
– Linux block-device replication
Need heartbeat to bring 2nd server online
Point in Time Recovery
“Log shipping”
– Write-Ahead Log (WAL) files sent to and replayed on standby
– Included in PostgreSQL 8.0+
– Asynchronous: potential loss of data
Warm Standby
– Standby hardware should be very similar to the primary's
– Need heartbeat to bring 2nd server online
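On the primary, log shipping is driven by `archive_command`, which PostgreSQL runs for each completed WAL segment, substituting `%p` (path of the file) and `%f` (its file name); the command must report success only once the file is safely stored, and should refuse to overwrite an existing archive file (8.3+ also needs `archive_mode = on`). A minimal Python sketch of such a command, assuming it is configured as `archive_command = 'python3 /path/to/archive_wal.py %p %f /mnt/wal_archive'`; the script name and both paths are hypothetical. The standby then replays the archived files through its `restore_command` (for example with pg_standby).

```python
import os
import shutil
import sys


def archive_wal(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one WAL segment into the archive; return 0 only on success."""
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # Never overwrite: an existing file usually means a misconfiguration.
        print(f"archive file {dest} already exists", file=sys.stderr)
        return 1
    tmp = dest + ".tmp"
    try:
        with open(src_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            # Force the data to disk before reporting success to PostgreSQL.
            os.fsync(dst.fileno())
        # Rename so readers never see a half-written segment under its final name.
        os.rename(tmp, dest)
    except OSError as exc:
        print(f"archiving {src_path} failed: {exc}", file=sys.stderr)
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0


# Invoked by PostgreSQL as: archive_wal.py %p %f <archive_dir>
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2], sys.argv[3]))
```

A nonzero exit status makes PostgreSQL keep the segment and retry later, so a full or unmounted archive disk delays archiving instead of silently losing WAL.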
Heartbeat
“STONITH” (Shoot The Other Node In The Head)
– Prevents multiple nodes from believing they are the master (“split brain”)
Linux-HA
– Creates a cluster, takes nodes out when they fail
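The heartbeat logic boils down to: declare the primary dead after a period without heartbeats, fence it (STONITH) before promoting the standby, so two nodes never accept writes at the same time. A minimal Python sketch of that decision, assuming hypothetical `fence()` and `promote()` callbacks and an injectable clock; it is not Linux-HA's implementation, which also handles quorum, resource ordering and much more.

```python
import time
from typing import Callable


class FailoverMonitor:
    """Hypothetical standby-side monitor: fence the primary, then promote."""

    def __init__(self, fence: Callable[[], bool], promote: Callable[[], None],
                 timeout: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.fence = fence        # e.g. power off the old primary; returns True on success
        self.promote = promote    # e.g. end recovery so the standby accepts writes
        self.timeout = timeout    # seconds without a heartbeat before failing over
        self.clock = clock
        self.last_heartbeat = clock()
        self.promoted = False

    def heartbeat(self) -> None:
        """Record a heartbeat received from the primary."""
        self.last_heartbeat = self.clock()

    def check(self) -> bool:
        """Fail over if the primary has been silent too long; True once promoted."""
        if self.promoted:
            return True
        if self.clock() - self.last_heartbeat < self.timeout:
            return False
        # STONITH: promote only after the old primary is known to be down,
        # otherwise both nodes could act as master (split brain).
        if self.fence():
            self.promote()
            self.promoted = True
        return self.promoted
```

If fencing fails, the sketch deliberately stays a standby: an unavailable database is usually easier to recover from than two diverging masters.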
Additional Options
Tune PostgreSQL
– Defaults designed to “run anywhere”
– pgbench, VACUUM/ANALYZE
Tune Queries
– EXPLAIN
Caching (avoid the database)
– memcached
– Ehcache
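Caching avoids the database by checking a cache first and querying only on a miss (the cache-aside pattern). A minimal in-process Python sketch with a time-to-live; in practice memcached or Ehcache would replace the dict, and the loader passed to `get` stands in for a real query (its name in the usage below is hypothetical).

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside helper: serve from memory, fall back to the loader on a miss."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # hit: no database round trip
        value = loader(key)                       # miss or expired: query the database
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after writing the row so readers do not keep seeing stale data."""
        self._entries.pop(key, None)
```

Usage: `cache.get(user_id, load_user_from_db)` on reads, and `cache.invalidate(user_id)` after updating that user.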
Radical Additional Options
“NoSQL” databases
– CouchDB, MongoDB, HBase, Cassandra, Redis
– Document stores
– Map/Reduce querying
Suggested Architecture
Current Production Setup
DB and Web server on same machine
No failover
Suggested Architecture
2 well-equipped machines
Point in Time Recovery with Heartbeat
Tune PostgreSQL
Monitor & improve slow queries
Add in Ehcache as we touch code
→ Leave horizontal scaling for another day
Initial Architecture
High Availability
Future Architecture
Scale up application servers horizontally as needed
Improve DB Hardware
Hardware Options
PostgreSQL typically constrained by RAM & Disk IO, not processor
64-bit, as much memory as possible
Data Array
– RAID 10 with 4 drives (not RAID 5), 15k RPM
Separate OS Drive / Array
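The RAID 10 over RAID 5 recommendation comes down to write cost: a small random write on RAID 10 costs two disk I/Os (one per mirror copy), while RAID 5 needs four (read data, read parity, write data, write parity), in exchange for more usable space. A back-of-the-envelope Python sketch; the figure of about 180 IOPS per 15k RPM drive is a rough assumption, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class RaidEstimate:
    usable_drives: int
    read_iops: float
    write_iops: float


# Rough assumption for a single 15k RPM drive doing small random I/O.
IOPS_PER_DRIVE = 180.0


def estimate(level: str, drives: int, iops_per_drive: float = IOPS_PER_DRIVE) -> RaidEstimate:
    """Back-of-the-envelope capacity and random-I/O estimate for RAID 10 or RAID 5."""
    raw = drives * iops_per_drive
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Half the drives hold mirror copies; each write lands on both copies.
        return RaidEstimate(drives // 2, raw, raw / 2)
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of parity; each small write = 2 reads + 2 writes.
        return RaidEstimate(drives - 1, raw, raw / 4)
    raise ValueError(f"unsupported level: {level}")


for level in ("RAID10", "RAID5"):
    print(level, estimate(level, 4))
```

With four drives, RAID 10 gives up one drive of capacity compared with RAID 5 but, under these assumptions, roughly doubles small random write throughput, which matters for a write-heavy database.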
Dell R710
Processor: Xeon
4x 15k HD in RAID10
24GB (3x 8GB) RAM (up to 6x 16GB)
Price: $6,905
Other Considerations
Should have Test environment mimic Production
– Same database setup
– Provides environment for experimentation
Can host multiple DBs on single cluster
References
http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding
http://37signals.com/svn/posts/1819-basecamp-now-with-more-vroom
http://anchor.com.au/hosting/dedicated/Tuning_PostgreSQL_on_your_Dedicated_Server
http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html
http://blog.stackoverflow.com/2009/01/new-stack-overflow-servers-ready/
http://developer.postgresql.org/pgdocs/postgres/high-availability.html
http://developer.postgresql.org/pgdocs/postgres/pgbench.html
https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
http://wiki.postgresql.org/wiki/Performance_Optimization
http://www.postgresql.org/docs/8.4/static/warm-standby.html
http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/
http://www.slony.info/
Additional Links
http://ehcache.org/
http://highscalability.com/skype-plans-postgresql-scale-1-billion-users
http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx
http://www.danga.com/memcached/
http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/
http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation