Postgres demystified

Postgres Demystified Craig Kerstiens @craigkerstiens http://www.craigkerstiens.com https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

Transcript of Postgres demystified

Page 1: Postgres demystified

Postgres Demystified
Craig Kerstiens
@craigkerstiens
http://www.craigkerstiens.com

https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified

Page 4: Postgres demystified

Postgres Demystified

We’re Hiring

Page 5: Postgres demystified

Getting Setup

Postgres.app

Page 6: Postgres demystified

Agenda

Brief History
Developing w/ Postgres
Postgres Performance
Querying

Page 7: Postgres demystified

Postgres History

Postgres
PostgreSQL
Post Ingres
Around since 1989/1995
Community Driven/Owned

Page 8: Postgres demystified

MVCC

Each query sees transactions committed before it
Locks for writing don’t conflict with reading
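A minimal sketch of the two MVCC points above, meant to be run in two separate psql sessions. The `accounts` table and its values are hypothetical, and the default READ COMMITTED isolation level is assumed.

```sql
-- Setup (either session); accounts is a hypothetical table
CREATE TABLE accounts (id int PRIMARY KEY, balance int);
INSERT INTO accounts VALUES (1, 500);

-- Session A: start a transaction and modify the row, but do not commit yet
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session B: the read is not blocked by A's row lock and still sees
-- the last committed version of the row (balance = 500)
SELECT balance FROM accounts WHERE id = 1;

-- Session A: make the change visible to queries that start afterwards
COMMIT;

-- Session B: a new query now sees the committed update (balance = 400)
SELECT balance FROM accounts WHERE id = 1;
```

A second writer touching the same row would still wait for Session A; only readers are unaffected by the write lock.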

Page 10: Postgres demystified

Why Postgres

“It’s the emacs of databases”

Page 11: Postgres demystified

Developing w/ Postgres

Page 13: Postgres demystified

Basics

psql is your friend
# \dt
# \d
# \d tablename
# \x
# \e

Page 14: Postgres demystified

Datatypes

smallint, bigint, integer
numeric, float, serial, money
char, varchar, text
bytea
timestamp, timestamptz, date, time, timetz, interval
boolean
enum
point, line, polygon, box, circle, path
inet, cidr, macaddr
tsvector, tsquery
array
XML
UUID

Page 18: Postgres demystified

Datatypes

CREATE TABLE items (
    id serial NOT NULL,
    name varchar(255),
    tags varchar(255)[],
    created_at timestamp
);

Page 20: Postgres demystified

Datatypes

INSERT INTO items VALUES (1, 'Ruby Gem', '{"Programming","Jewelry"}', now());

INSERT INTO items VALUES (2, 'Django Pony', '{"Programming","Animal"}', now());
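Once tags are stored in an array column, they can be queried directly. The sketch below is one way to do that; `items_demo` is a hypothetical copy of the slide's table so that the block runs on its own.

```sql
-- Hypothetical copy of the slide's table so the sketch runs on its own
CREATE TEMP TABLE items_demo (
    id         serial NOT NULL,
    name       varchar(255),
    tags       varchar(255)[],
    created_at timestamp
);

INSERT INTO items_demo (name, tags, created_at) VALUES
    ('Ruby Gem',    '{"Programming","Jewelry"}', now()),
    ('Django Pony', '{"Programming","Animal"}',  now());

-- Rows whose tags array contains a given value
SELECT name FROM items_demo WHERE 'Jewelry' = ANY (tags);

-- Rows whose tags array contains all of the listed values
SELECT name FROM items_demo
WHERE tags @> ARRAY['Programming', 'Animal']::varchar[];

-- One output row per (item, tag) pair
SELECT name, unnest(tags) AS tag FROM items_demo;
```

The cast to `varchar[]` is needed because `@>` expects both sides to have the same array type.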

Page 22: Postgres demystified

Extensions

dblink, hstore, citext, ltree, isn, cube
pgcrypto, tablefunc, uuid-ossp, earthdistance
trigram, fuzzystrmatch, pgrowlocks, pgstattuple
btree_gist, dict_int, dict_xsyn, unaccent

Page 24: Postgres demystified

NoSQL in your SQL

CREATE EXTENSION hstore;
CREATE TABLE users (
    id integer NOT NULL,
    email character varying(255),
    data hstore,
    created_at timestamp without time zone,
    last_login timestamp without time zone
);

Page 25: Postgres demystified

hStore

INSERT INTO users VALUES (
    1,
    '[email protected]',
    'sex => "M", state => "California"',
    now(),
    now()
);
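A short sketch of reading hstore data back out. It assumes the hstore extension can be installed; the `users_demo` table and the example.com addresses are hypothetical stand-ins for the slide's data.

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical stand-in for the slide's users table
CREATE TEMP TABLE users_demo (
    id    integer NOT NULL,
    email varchar(255),
    data  hstore
);

INSERT INTO users_demo VALUES
    (1, 'user1@example.com', 'sex => "M", state => "California"'),
    (2, 'user2@example.com', 'sex => "F", state => "Oregon"');

-- Read a single key; -> returns text (NULL if the key is absent)
SELECT email, data -> 'state' AS state FROM users_demo;

-- Filter on a key/value pair using containment
SELECT email FROM users_demo WHERE data @> 'state => California';

-- Filter on the presence of a key
SELECT email FROM users_demo WHERE data ? 'sex';

-- A GIN index can support the @> and ? operators on larger tables
CREATE INDEX users_demo_data_idx ON users_demo USING gin (data);
```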

Page 29: Postgres demystified

JSON

9.2

SELECT '{"id":1,"email": "[email protected]"}'::json;

V8 w/ PLV8

CREATE OR REPLACE FUNCTION js(src text) RETURNS text AS $$
    return eval("(function() { " + src + "})")();
$$ LANGUAGE plv8;

Bad Idea

Page 32: Postgres demystified

Range Types

9.2

CREATE TABLE talks (room int, during tsrange);
INSERT INTO talks VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)');

ALTER TABLE talks ADD EXCLUDE USING gist (during WITH &&);
INSERT INTO talks VALUES (1108, '[2012-09-24 13:30, 2012-09-24 14:00)');
ERROR: conflicting key value violates exclusion constraint "talks_during_excl"
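The constraint on the slide rejects any two overlapping talks, whatever room they are in. A possible per-room variant is sketched below. It assumes the btree_gist extension is available, and the `talks_by_room` table is hypothetical.

```sql
-- btree_gist lets a GiST index handle plain equality on the int column
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE talks_by_room (
    room   int,
    during tsrange,
    -- reject rows with the same room AND overlapping time ranges
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO talks_by_room VALUES (3, '[2012-09-24 13:00, 2012-09-24 13:50)');

-- Different room, overlapping time: accepted
INSERT INTO talks_by_room VALUES (4, '[2012-09-24 13:30, 2012-09-24 14:00)');

-- Same room, overlapping time: rejected by the exclusion constraint
INSERT INTO talks_by_room VALUES (3, '[2012-09-24 13:30, 2012-09-24 14:00)');

-- Ranges are half-open here, so back-to-back talks do not overlap
SELECT tsrange('2012-09-24 13:00', '2012-09-24 13:50')
    && tsrange('2012-09-24 13:50', '2012-09-24 14:30') AS is_overlapping;
```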

Page 34: Postgres demystified

Full Text Search

TSVECTOR - Text Data
TSQUERY - Search Predicates
Specialized Indexes and Operators
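As a rough illustration of how the two types and the match operator fit together, here is a sketch using the built-in `english` text search configuration. The `docs_demo` table and its rows are hypothetical.

```sql
-- Hypothetical table for the sketch
CREATE TEMP TABLE docs_demo (id serial PRIMARY KEY, body text);

INSERT INTO docs_demo (body) VALUES
    ('Postgres supports full text search out of the box'),
    ('Window functions make reporting queries shorter');

-- to_tsvector normalizes the text; to_tsquery builds the predicate;
-- @@ is the match operator
SELECT id, body
FROM docs_demo
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'search & text');

-- An expression GIN index that the query above can use on larger tables
CREATE INDEX docs_demo_body_fts_idx
    ON docs_demo USING gin (to_tsvector('english', body));
```

The index expression has to match the expression in the query, including the configuration name, for the planner to consider it.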

Page 39: Postgres demystified

PostGIS

1. New datatypes, e.g. 2d/3d boxes
2. New operators, e.g. SELECT foo && bar ...
3. Understand relationships and distance, e.g. person within location, nearest distance

Page 41: Postgres demystified

Performance

Page 44: Postgres demystified

Sequential Scans

They’re Bad (most of the time)

Page 47: Postgres demystified

Indexes

They’re Good (most of the time)

Page 48: Postgres demystified

Indexes

B-Tree
Generalized Inverted Index (GIN)
Generalized Search Tree (GIST)
K Nearest Neighbors (KNN)
Space Partitioned GIST (SP-GIST)

Page 49: Postgres demystified

Indexes

B-Tree
Default
Usually want this

Page 50: Postgres demystified

Indexes

Generalized Inverted Index (GIN)
Use with multiple values in 1 column
Array/hStore

Page 51: Postgres demystified

Indexes

Generalized Search Tree (GIST)
Full text search
Shapes

Page 52: Postgres demystified

Understanding Query Perf

Given

SELECT last_name FROM employees WHERE salary >= 50000;

Page 56: Postgres demystified

Explain

# EXPLAIN SELECT last_name FROM employees WHERE salary >= 50000;
                               QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6)
   Filter: (salary >= 50000)
(3 rows)

Startup Cost: 0.00
Max Time: 35811.00 (estimated total cost, in planner cost units)
Rows Returned: 1 (planner estimate)

Page 61: Postgres demystified

Explain

# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;
                               QUERY PLAN
-------------------------------------------------------------------------
 Seq Scan on employees  (cost=0.00..35811.00 rows=1 width=6) (actual time=2.401..295.247 rows=1428 loops=1)
   Filter: (salary >= 50000)
 Total runtime: 295.379 ms
(3 rows)

Startup Cost
Max Time
Rows Returned

Page 64: Postgres demystified

# CREATE INDEX idx_emps ON employees (salary);
# EXPLAIN ANALYZE SELECT last_name FROM employees WHERE salary >= 50000;
                               QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using idx_emps on employees  (cost=0.00..8.49 rows=1 width=6) (actual time=0.047..1.603 rows=1428 loops=1)
   Index Cond: (salary >= 50000)
 Total runtime: 1.771 ms
(3 rows)
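To try the before-and-after comparison locally, something like the following could work. The `employees_demo` table, its size, and the salary distribution are assumptions made for the sketch, so the plans and timings will differ from the slides.

```sql
-- Hypothetical data set: 200,000 employees with random salaries
CREATE TEMP TABLE employees_demo AS
SELECT g AS id,
       'name_' || g AS last_name,
       (random() * 60000)::int AS salary
FROM generate_series(1, 200000) AS g;

-- Refresh planner statistics so the row estimates are meaningful
ANALYZE employees_demo;

-- Without an index the planner has to scan the table sequentially
EXPLAIN ANALYZE
SELECT last_name FROM employees_demo WHERE salary >= 59000;

CREATE INDEX idx_emps_demo ON employees_demo (salary);

-- With the index in place the planner can choose an index or bitmap scan;
-- whether it does depends on how selective the predicate is
EXPLAIN ANALYZE
SELECT last_name FROM employees_demo WHERE salary >= 59000;
```

Lowering the threshold (for example to `salary >= 1000`) usually brings the sequential scan back, because most of the table matches and reading it in one pass is cheaper.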

Page 69: Postgres demystified

Indexes Pro Tips

CREATE INDEX CONCURRENTLY
CREATE INDEX WHERE foo=bar
SELECT * WHERE foo LIKE '%bar%' is BAD
SELECT * WHERE foo LIKE 'bar%' is OKAY
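Spelled out as full statements, the tips above might look like this. The `orders_demo` table and all index names are hypothetical; the statements are meant to be run one at a time outside a transaction block.

```sql
-- Hypothetical table for the sketch
CREATE TABLE orders_demo (
    id         serial PRIMARY KEY,
    user_id    int,
    status     text,
    reference  text,
    created_at timestamptz DEFAULT now()
);

-- Build the index without blocking writes to the table.
-- This form cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY orders_demo_created_at_idx
    ON orders_demo (created_at);

-- Partial (conditional) index: only rows matching the WHERE clause are indexed
CREATE INDEX orders_demo_pending_idx
    ON orders_demo (user_id)
    WHERE status = 'pending';

-- A left-anchored pattern can use a B-tree index; in non-C locales this
-- needs the text_pattern_ops operator class
CREATE INDEX orders_demo_reference_idx
    ON orders_demo (reference text_pattern_ops);

-- Left-anchored pattern: eligible for the B-tree index above
EXPLAIN SELECT * FROM orders_demo WHERE reference LIKE 'bar%';

-- Leading wildcard: this B-tree index cannot be used
EXPLAIN SELECT * FROM orders_demo WHERE reference LIKE '%bar%';
```

On a nearly empty table the planner may still pick a sequential scan for both queries, since that is cheapest for a handful of rows.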

Page 72: Postgres demystified

Cache Hit Rate

SELECT 'index hit rate' AS name,
       (sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read) AS ratio
FROM pg_statio_user_indexes
UNION ALL
SELECT 'cache hit rate' AS name,
       CASE sum(idx_blks_hit)
           WHEN 0 THEN 'NaN'::numeric
           ELSE to_char((sum(idx_blks_hit) - sum(idx_blks_read)) / sum(idx_blks_hit + idx_blks_read), '99.99')::numeric
       END AS ratio
FROM pg_statio_user_indexes;
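Both branches of the slide's query read index counters. A commonly used alternative, offered here as an assumption rather than as the author's query, takes the table figure from the heap counters in `pg_statio_user_tables` and computes each ratio as hits over total block requests.

```sql
-- Table (heap) cache hit ratio: share of block requests served from
-- shared buffers rather than read in from outside them
SELECT 'cache hit rate' AS name,
       sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS ratio
FROM pg_statio_user_tables
UNION ALL
-- Same ratio for index blocks
SELECT 'index hit rate' AS name,
       sum(idx_blks_hit) / nullif(sum(idx_blks_hit) + sum(idx_blks_read), 0) AS ratio
FROM pg_statio_user_indexes;
```

The `nullif` guards return NULL instead of a division-by-zero error on a database with no recorded block activity.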

Page 73: Postgres demystified

Index Hit Rate

SELECT relname,
       100 * idx_scan / (seq_scan + idx_scan) AS percent_of_times_index_used,
       n_live_tup AS rows_in_table
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;

Page 74: Postgres demystified

Index Hit Rate

 relname             | percent_of_times_index_used | rows_in_table
---------------------+-----------------------------+---------------
 events              |                           0 |        669917
 app_infos_user_info |                           0 |        198218
 app_infos           |                          50 |        175640
 user_info           |                           3 |         46718
 rollouts            |                           0 |         34078
 favorites           |                           0 |          3059
 schema_migrations   |                           0 |             2
 authorizations      |                           0 |             0
 delayed_jobs        |                          23 |             0

Page 76: Postgres demystified

pg_stat_statements

9.2

$ select * from pg_stat_statements where query ~ 'from users where email';

userid              │ 16384
dbid                │ 16388
query               │ select * from users where email = ?;
calls               │ 2
total_time          │ 0.000268
rows                │ 2
shared_blks_hit     │ 16
shared_blks_read    │ 0
shared_blks_dirtied │ 0
shared_blks_written │ 0
local_blks_hit      │ 0
local_blks_read     │ 0
local_blks_dirtied  │ 0
local_blks_written  │ 0
temp_blks_read      │ 0
temp_blks_written   │ 0
time_read           │ 0
time_write          │ 0

Page 78: Postgres demystified

pg_stat_statements

9.2

SELECT query, calls, total_time, rows,
       100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;
----------------------------------------------------------------------
query       | UPDATE pgbench_branches SET bbalance = bbalance + ? WHERE bid = ?;
calls       | 3000
total_time  | 9609.00100000002
rows        | 2836
hit_percent | 99.9778970000200936
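The view only exists once the module is loaded and the extension is created. A minimal setup sketch follows, assuming superuser access and a server restart after the configuration change. On PostgreSQL 13 and later the `total_time` column used on the slides is named `total_exec_time`, so this sketch avoids it.

```sql
-- Assumes postgresql.conf already contains
--   shared_preload_libraries = 'pg_stat_statements'
-- and that the server was restarted afterwards
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Most frequently called statements
SELECT query, calls, rows
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 5;

-- Start a fresh measurement window
SELECT pg_stat_statements_reset();
```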

Page 80: Postgres demystified

Querying

Page 81: Postgres demystified

Window Functions

Example: Biggest spender by state

Page 83: Postgres demystified

Window Functions

SELECT email,
       users.data->'state',
       sum(total(items)),
       rank() OVER (PARTITION BY users.data->'state'
                    ORDER BY sum(total(items)) DESC)
FROM users, purchases
WHERE purchases.user_id = users.id
GROUP BY 1, 2;
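The slide's query depends on an hstore column and a `total()` function that are not shown. A self-contained variant of the same idea is sketched below with hypothetical, simplified tables (`spenders`, `spender_purchases`); an outer query keeps only the top-ranked spender per state.

```sql
-- Hypothetical, simplified tables for the sketch
CREATE TEMP TABLE spenders (id int PRIMARY KEY, email text, state text);
CREATE TEMP TABLE spender_purchases (user_id int, amount numeric);

INSERT INTO spenders VALUES
    (1, 'a@example.com', 'California'),
    (2, 'b@example.com', 'California'),
    (3, 'c@example.com', 'Oregon');

INSERT INTO spender_purchases VALUES
    (1, 50), (1, 25), (2, 100), (3, 10);

-- Rank users by total spend within each state, then keep rank 1
SELECT email, state, total_spent
FROM (
    SELECT s.email,
           s.state,
           sum(p.amount) AS total_spent,
           rank() OVER (PARTITION BY s.state
                        ORDER BY sum(p.amount) DESC) AS spend_rank
    FROM spenders AS s
    JOIN spender_purchases AS p ON p.user_id = s.id
    GROUP BY s.email, s.state
) AS ranked
WHERE spend_rank = 1;
```

The subquery is needed because a window function cannot appear in the WHERE clause of the query that computes it.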

Page 87: Postgres demystified

Fuzzystrmatch

SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');
SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');
SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');
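For reference, here is a runnable version of the queries above with the values they should return, assuming the fuzzystrmatch extension can be installed. The `levenshtein` call at the end is an addition not on the slides.

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- soundex() maps a name to a four-character phonetic code;
-- difference() counts matching code positions, from 0 (none) to 4 (all)
SELECT soundex('Craig'), soundex('Will'), difference('Craig', 'Will');   -- C620, W400, 1
SELECT soundex('Craig'), soundex('Greg'), difference('Craig', 'Greg');   -- C620, G620, 3
SELECT soundex('Willl'), soundex('Will'), difference('Willl', 'Will');   -- W400, W400, 4

-- levenshtein() gives the edit distance between two strings
SELECT levenshtein('Craig', 'Greg');
```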

Page 88: Postgres demystified

Moving Data Around

\copy (SELECT * FROM users) TO '~/users.csv';

\copy users FROM '~/users.csv';

Page 89: Postgres demystified

dblink

SELECT dblink_connect('myconn', 'dbname=postgres');
SELECT * FROM dblink('myconn','SELECT * FROM foo') AS t(a int, b text);

 a |    b
---+----------
 1 | example
 2 | example2

Page 90: Postgres demystified

Foreign Data Wrappers

oracle, mysql, informix, twitter, files, www, couch
sybase, ldap, odbc, s3, redis, jdbc, mongodb

Page 91: Postgres demystified

Foreign Data Wrappers

CREATE EXTENSION redis_fdw;

CREATE SERVER redis_server
    FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');

CREATE FOREIGN TABLE redis_db0 (key text, value text)
    SERVER redis_server
    OPTIONS (database '0');

CREATE USER MAPPING FOR PUBLIC
    SERVER redis_server
    OPTIONS (password 'secret');

Page 92: Postgres demystified

Query Redis from Postgres

SELECT * FROM redis_db0;

SELECT id, email, value AS visits
FROM users, redis_db0
WHERE ('user_' || cast(id AS text)) = cast(redis_db0.key AS text)
  AND cast(value AS int) > 40;

Page 93: Postgres demystified

All are not equal

oracle, mysql, informix, twitter, files, www, couch
sybase, ldap, odbc, s3, redis, jdbc, mongodb

Page 94: Postgres demystified

Production Ready FDWs

oracle, mysql, informix, twitter, files, www, couch
sybase, ldap, odbc, s3, redis, jdbc, mongodb

Page 95: Postgres demystified

Readability

Page 100: Postgres demystified

Common Table Expressions

WITH top_5_products AS (
    SELECT products.*, count(*)
    FROM products, line_items
    WHERE products.id = line_items.product_id
    GROUP BY products.id
    ORDER BY count(*) DESC
    LIMIT 5
)
SELECT users.email, count(*)
FROM users, line_items, top_5_products
WHERE line_items.user_id = users.id
  AND line_items.product_id = top_5_products.id
GROUP BY 1
ORDER BY 1;

Don’t do this in production

Page 101: Postgres demystified

Brief History
Developing w/ Postgres
Postgres Performance
Querying

Page 105: Postgres demystified

Extras

Listen/Notify
Per Transaction Synchronous Replication
SELECT for UPDATE
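A brief sketch of what each of these extras looks like in SQL. The `jobs` channel and the `job_queue` table are hypothetical, and the `synchronous_commit` setting only affects replication when synchronous standbys are configured.

```sql
-- Hypothetical table used by the locking example below
CREATE TABLE job_queue (id int PRIMARY KEY, payload text);
INSERT INTO job_queue VALUES (1, 'resize image');

-- LISTEN/NOTIFY: lightweight publish/subscribe between sessions
LISTEN jobs;                       -- session A subscribes to a channel
NOTIFY jobs, 'job 1 is ready';     -- session B publishes a payload

-- Per-transaction durability: do not wait for the commit to be flushed
-- or confirmed by a synchronous standby, for this transaction only
BEGIN;
SET LOCAL synchronous_commit TO off;
UPDATE job_queue SET payload = 'resize image (low priority)' WHERE id = 1;
COMMIT;

-- SELECT ... FOR UPDATE: lock the selected rows until the transaction ends,
-- so concurrent transactions cannot update or lock them in the meantime
BEGIN;
SELECT * FROM job_queue WHERE id = 1 FOR UPDATE;
DELETE FROM job_queue WHERE id = 1;
COMMIT;
```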

Page 106: Postgres demystified

Postgres - TLDR

Datatypes
Conditional Indexes
Transactional DDL
Foreign Data Wrappers
Concurrent Index Creation
Extensions
Common Table Expressions
Fast Column Addition
Listen/Notify
Table Inheritance
Per Transaction sync replication
Window functions
NoSQL inside SQL
Momentum

Page 107: Postgres demystified

Questions?
Craig Kerstiens
@craigkerstiens
http://www.craigkerstiens.com

https://speakerdeck.com/u/craigkerstiens/p/postgres-demystified