PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

21
PostgreSQL Troubleshoot On-Line. Ilya Kosmodemyansky Alexey Lesovsky

Transcript of PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

Page 1: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

PostgreSQL Troubleshoot On-Line.

Ilya KosmodemyanskyAlexey Lesovsky

Page 2: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Overview.

- Symptoms: - significant load increase, - slowing operations. - Often unpredictable: - we don't know where the problem occurs. - Emergency: - problem must be found and resolved ASAP.

Page 3: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Troubleshoot.

- Outside the database - top, sysstat, etc... - nagios/zabbix/... - Inside the database - pgbadger/loganalyze/etc... - pg_stat_statements

Page 4: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Outside the database.

- top: - cpu usage, load average, swapping, iowait. - sysstat: - disk utilization (iostat), - resource consumption (sar).

Page 5: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Outside the database.

- Okmeter: - online monitoring service, - rich feature/plugin set, - postgresql good support.

Page 6: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Inside the database.

- Log analyze (pgBadger) - huge logs - read log before report creating - a lot of time need tobuild report - pg_stat_statements (contrib) - small storage footprint, - quick and flexible reports.

Page 7: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Inside the database.

- query_stat_total.sql - https://github.com/PostgreSQL-Consulting/pg-utils

Page 8: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #1.

SELECT p.id, p.ratingFROM posts p LEFT JOIN complaints com ON (com.post_id = p.id AND com.user_id= ?) WHERE p.is_deleted IS FALSE AND com.is_hide IS NOT TRUE AND p.type_id != ? ORDER BY p.rating DESC LIMIT ?;

Page 9: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #1. JOIN -> (NOT) EXISTS

SELECT p.id, p.rating FROM posts p WHERE p.is_deleted IS FALSE AND p.type_id != ?AND NOT EXISTS (SELECT 1 FROM complaints com

WHERE com.post_id = p.id AND user_id = ? AND is_hide = true)

ORDER BY p.rating DESC LIMIT ?;

Page 10: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #2.

SELECT * FROM tags WHERE (tags.title ilike ?)

Trigram Index.

CREATE INDEX tags_title_trigram_key on tags using gin(title gin_trgm_ops);

Page 11: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #3.

SELECT post.* FROM post JOIN domain ON post.domain_id = domain.id LEFT OUTER JOIN domain_acl ON domain_acl.domain_id = domain.id AND domain_acl.user_id = ? WHERE post.deleted = ? AND post.domain_id IN (?, ?, ?, ?, ?, ?, ?, ?) AND ((domain.flags & ?) = ? OR (domain_acl.acl & ?) = ?) AND post.id NOT IN (?, ?) ORDER BY post.last_activity DESC LIMIT ? OFFSET ?

Page 12: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #3. Index Only Scan

SELECT * FROM post WHERE id IN (SELECT post.id

FROM post JOIN domain ON post.domain_id = domain.id LEFT OUTER JOIN domain_acl

ON domain_acl.domain_id = domain.id AND domain_acl.user_id = ?

WHERE post.deleted = ? AND post.domain_id IN (?, ?, ?, ?, ?, ?, ?, ?) AND ((domain.flags & ?) = ? OR (domain_acl.acl & ?) = ?) AND post.id NOT IN (?, ?) ORDER BY post.last_activity DESC LIMIT ? OFFSET ?)

order by post.last_activity DESC

CREATE INDEX post_domain_id_last_activity_id_deleted_partial ON post USING btree (domain_id, last_activity, id, deleted) where deleted = 0;

Page 13: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #4.

SELECT * FROM "group" WHERE ("group".group_vislvl_content >= ?) AND (group_main_domain_id IS NULL OR group_main_domain_id IN(?,?)) AND ("group".obj_pics_count + "group".group_persons_count + "group".group_blog_posts_count + "group".group_wiki_count >= ?) AND "group".group_is_demo = ? AND "group".obj_status_did = ? ORDER BY "group".group_persons_count desc, "group".obj_created asc LIMIT ?;

Page 14: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 1: Bad release. Query #4. Partial Index

CREATE INDEX group_special2_key ON "group" USING btree (group_persons_count DESC, obj_created) WHERE ("group".obj_pics_count + "group".group_persons_count +

"group".group_blog_posts_count + "group".group_wiki_count >= 1);

Page 15: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- project grow - load increasing - add more app servers - more apps -> more db connections

Page 16: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- too much db connections are bad - high resource contention - os overhead (memory, locks, forks)

Page 17: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- pgbouncer - lightweight connection pooler - stable, simple, fast (libevent) - use pgbouncer between apps and database

Page 18: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- simple test: without pgbouncer - pgbench -C -c 32 -T 300 -U postgres shopdb transaction type: TPC-B (sort of), scaling factor: 128, query mode: simple, number of clients: 32, number of threads: 1, duration: 300 s number of transactions actually processed: 253628 latency average: 37.851 ms tps = 845.403711 (including connections establishing) tps = 15320.442789 (excluding connections establishing)

Page 19: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- simple test: with pgbouncer - pgbench -C -c 32 -T 300 -U postgres shopdb transaction type: TPC-B (sort of), scaling factor: 128, query mode: simple, number of clients: 32, number of threads: 1, duration: 300 s number of transactions actually processed: 2689931 latency average: 3.569 ms tps = 8966.389025 (including connections establishing) tps = 19225.431659 (excluding connections establishing)

Page 20: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

case 2: More app servers... We need more...

- total: 300 seconds with 32 clients on 8-core server - latency: 37.8ms vs. 3.5ms - total transactions: 253628 vs. 2689931 - tps: 15320 vs. 19225

Page 21: PostgreSQL Troubleshoot On-line, (RITfest 2015 meetup at Moscow, Russia).

Thanks.

Questions?