Managing terabytes


Transcript of Managing terabytes

Page 1: Managing terabytes


Managing Terabytes: Problems and solutions with

operating large Postgres installations

Selena Deckelmann, Prime Radiant, @selenamarie

Page 2: Managing terabytes


About me.


Page 3: Managing terabytes


• 1.6 TB, 1 cluster, Version 8.2

• 1.1 TB, 1 cluster, Version 8.3

• 8.4/9.0 Dev systems

• Working toward moving 9.0 into production (May 2011)

• pgpool, Redis, RabbitMQ, NFS

The Environment


Page 4: Managing terabytes


• daily peak: ~3000 commits per second

• average writes: 4 MB/s

• average reads: 8 MB/s

Some stats


Page 5: Managing terabytes


What’s good

• Most queries are fast!

• Benchmarks say we’re pushing the limits of the hardware

• Developers love working with Postgres

Page 6: Managing terabytes


And lots more. But...


Page 8: Managing terabytes


The Problems

1. System resource exhaustion

2. Everything is slow: Huge catalogs, Backups

3. Handling VACUUM problems: Bloat, Transaction wraparound

4. Upgrades: Minor, Major

Page 9: Managing terabytes


System Resource Exhaustion

Page 10: Managing terabytes


Problem: UFS on Solaris

“The only way to add more inodes to a UFS filesystem is: 1. destroy the filesystem and create a new filesystem with a higher inode density 2. enlarge the filesystem” (growfs man page)

Running out of inodes


Page 11: Managing terabytes


Solution 0: Delete files.

Solution 1: Sharding/bigger filesystem

Solution 2: xfs

Running out of inodes


Page 12: Managing terabytes


Problem: Too many files opened by the database.

selena@lulu:~ #508 18:43 :) sudo lsof -p 19121 | wc

40 355 4151

Solution: You need a connection pooler.

Running out of file descriptors
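Each backend holds file descriptors for every relation it touches, so the total scales with the number of connections. A rough sketch of how to gauge the multiplier:

-- How many backends are connected right now; each one holds its own set of
-- open files, so this count times the per-process lsof figure above is
-- roughly what the OS has to support.
SELECT count(*) AS backends FROM pg_stat_activity;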


Page 13: Managing terabytes


Solution: You need a connection pooler.

Recommended:

• pgbouncer (threaded, online upgrade)

• pgpool-II (failover)

Running out of file descriptors


Page 14: Managing terabytes


Everything is slow.

Page 15: Managing terabytes


409,994 tables

Huge Catalogs


Page 16: Managing terabytes


Minor mistake in parent table definitions:

not null default nextval('important_sequence'::text)

vs

not null default nextval('important_sequence'::regclass)


Maintenance problem
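The difference matters because the ::text form looks the sequence up by name on every call and records no dependency, while the ::regclass form resolves to the sequence’s OID, so renames and dependency tracking behave correctly. A sketch of the repair for one affected table; the table and column names are placeholders:

-- Placeholder names; repeat for each table that inherited the bad default:
ALTER TABLE child_table
  ALTER COLUMN id SET DEFAULT nextval('important_sequence'::regclass);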

Page 17: Managing terabytes


Problem: Slow scans of catalog data

Solution: Upgrade to Postgres 8.4 or higher

But really: Avoid making a cluster with >400k tables.

Huge Catalogs
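A simple way to keep an eye on catalog growth (this sketch only looks at ordinary tables in the current database):

-- Number of ordinary tables, and how large pg_class itself has become:
SELECT count(*) FROM pg_class WHERE relkind = 'r';
SELECT pg_size_pretty(pg_total_relation_size('pg_class'));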


Page 18: Managing terabytes


9,019,868 total data points for table stats

4,550,770 total data points for index stats

Problem: This is slow to write. (128 MB written every second or so)

Stats collection

Page 19: Managing terabytes


9,019,868 total data points for table stats

4,550,770 total data points for index stats

Solution: Move the stats file to RAM.

stats_temp_directory (8.4 or higher); there’s a trivial patch for earlier versions.

Stats collection
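On 8.4 and higher the stats file location is a plain configuration setting, so pointing it at a RAM-backed filesystem is a one-line change; the path below is only an example:

-- postgresql.conf (8.4 or higher); any tmpfs-style mount works:
--   stats_temp_directory = '/dev/shm/pg_stats_tmp'
-- After a reload (or restart), confirm it took effect:
SHOW stats_temp_directory;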

Page 20: Managing terabytes


9,019,868 total data points for table stats

4,550,770 total data points for index stats

Problem: This is slow to read.

Stats collection

Page 21: Managing terabytes


9,019,868 total data points for table stats

4,550,770 total data points for index stats

Solution: Supposedly, this is better in 8.4 and higher (fewer writes per minute). Still probably not fast.

Stats collection

Page 22: Managing terabytes


pg_dump takes longer and longer...

Backups


Page 23: Managing terabytes


   backup    |    duration
-------------+------------------
 2009-11-22  | 02:44:36.821475
 2009-11-23  | 02:46:20.003507
 2009-11-24  | 02:47:06.260705
 2009-12-06  | 07:13:04.174964
 2009-12-13  | 05:00:01.082676
 2009-12-20  | 06:24:49.433043
 2009-12-27  | 05:35:20.551477
 2010-01-03  | 07:36:49.651492
 2010-01-10  | 05:55:02.396163
 2010-01-17  | 07:32:33.277559
 2010-01-24  | 06:22:46.522319
 2010-01-31  | 10:48:13.060888
 2010-02-07  | 21:21:47.77618
 2010-02-14  | 14:32:04.638267
 2010-02-21  | 11:34:42.353244
 2010-02-28  | 11:13:02.102345


Page 24: Managing terabytes


Problem: pg_dump is too slow.

Solutions:

• patching pg_dump for SELECT ... LIMIT

• crank down shared_buffers

• Stop using pg_dump for backups

• 64-bit might help

Backups
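“Stop using pg_dump” in practice means filesystem-level base backups plus WAL archiving. A rough outline, assuming archive_command is already configured:

SELECT pg_start_backup('nightly_base');  -- mark the start of the backup
-- ...copy the data directory at the OS level (rsync, tar, or a snapshot)...
SELECT pg_stop_backup();                 -- mark the end; keep the WAL generated in between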


Page 25: Managing terabytes


How not to migrate to a 64-bit system


Page 26: Managing terabytes


1. Install 32-bit Postgres and libraries on a 64-bit system.

2. Install 64-bit Postgres/libs of the same version.

3. Copy the “hot backup” from the 32-bit system over to the 64-bit system.

4. Run pg_dump from the 64-bit version on the 32-bit Postgres.


Page 27: Managing terabytes


But lots of people use them that way!

A single warm standby is not a backup.


Page 28: Managing terabytes


Ship WAL from Solaris x86 -> Linux

It did work!


Page 29: Managing terabytes


Handling VACUUM problems

Page 30: Managing terabytes


Problem: Lots of dead tuples in tables.

• Frequent UPDATEs to long tables of log data

• Frequent DELETEs without a VACUUM

• A terabyte of dead tuples

Bloat
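A quick way to see which tables are accumulating dead tuples, using the standard stats views (n_dead_tup appears in pg_stat_user_tables from 8.3 onward):

-- Tables with the most dead tuples first:
SELECT relname, n_dead_tup, n_live_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC
 LIMIT 20;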


Page 31: Managing terabytes


Solution: Write custom scripts to clean up

• VACUUM for small things

• CLUSTER for everything else

• Considered TRUNCATE

Fixing bloat
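The cleanup scripts boil down to something like the following sketch; the table and index names are placeholders, and CLUSTER rewrites the table under an exclusive lock (the USING syntax is 8.3 and later):

VACUUM ANALYZE small_queue_table;                -- enough for lightly bloated tables
CLUSTER big_log_table USING big_log_table_pkey;  -- rewrites the table, reclaiming dead space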


Page 32: Managing terabytes


Application allowed users to initiate ALTER TABLE.

Regular VACUUM couldn’t fix it.

VACUUM FULL of the catalog takes 2+ hours.

Use of NOTIFY/LISTEN can also cause bloat.

Catalog Bloat
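To decide whether a maintenance window for the catalogs is worth scheduling, first check how large they have become; pg_listener is the pre-9.0 catalog that LISTEN/NOTIFY churns, so drop it from the query on newer versions:

SELECT pg_size_pretty(pg_total_relation_size('pg_class'))     AS pg_class,
       pg_size_pretty(pg_total_relation_size('pg_attribute')) AS pg_attribute,
       pg_size_pretty(pg_total_relation_size('pg_listener'))  AS pg_listener;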

Page 33: Managing terabytes


Problem: autovacuum set off too frequently

Watch age(datfrozenxid)

Solution: Increase autovacuum_freeze_max_age (default is 200 million, we increase to one billion)

Transaction wraparound avoidance
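The watch query and the setting, spelled out:

-- Databases closest to wraparound first:
SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
 ORDER BY xid_age DESC;

-- postgresql.conf: raise the threshold for forced anti-wraparound vacuums
--   autovacuum_freeze_max_age = 1000000000   -- default is 200000000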


Page 34: Managing terabytes


Upgrades

Page 35: Managing terabytes


Problem: Restarting Postgres causes bad application performance.

• Requires a stop/start of the database

• Unexpected CHECKPOINT

• Cold cache

Minor upgrades


Page 36: Managing terabytes


Solutions:

• Plan for a CHECKPOINT before shutdown

• Warm the cache (Queries that exercise indexes, maybe table scans)

Minor upgrades
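A minimal sketch of both steps; the table here is only a placeholder for whatever your hot data is:

-- Immediately before the planned shutdown, take the heavy checkpoint I/O
-- on your own schedule instead of during the shutdown itself:
CHECKPOINT;

-- After the restart, touch the hot data before real traffic arrives:
SELECT count(*) FROM hot_table;   -- sequential read pulls the heap into cache
SELECT max(id) FROM hot_table;    -- typically satisfied from the index on id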


Page 37: Managing terabytes


Problem: Major upgrades are a PITA.

• <8.3 - no pg_upgrade :(

• Time your restores.

• Document your SLAs.

Major Version upgrades


Page 38: Managing terabytes


Solutions: :(

• >=8.3 - pg_upgrade

• Time your restores.

• Document your SLAs.

Major Version upgrades


Page 39: Managing terabytes


Solutions: :(

• Write tools to migrate data

• Shard

• Trigger-based replication

Major Version upgrades


Page 40: Managing terabytes


The Problems

1. System resource exhaustion

2. Everything is slow: Huge catalogs, Backups

3. Handling VACUUM problems: Bloat, Transaction wraparound

4. Upgrades: Minor, Major

Page 41: Managing terabytes


The Solutions

1. System resource exhaustion: Choose a better filesystem, Pooling

2. Everything is slow (Huge catalogs, Backups): Don’t do that; Monitor & Binary backups

Page 42: Managing terabytes


The Solutions

3. Handling VACUUM problems (Bloat, Transaction wraparound): Developer education, Monitoring, Cleanup, *_freeze_max_age

4. Upgrades (Minor, Major): Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade)

Page 43: Managing terabytes


Managing Terabytes: Problems and solutions with

operating large Postgres installations

Selena Deckelmann, Prime Radiant, @selenamarie

Thanks!