Bigger data with PostgreSQL 9


Description

Companies today collect more and more data, and they also want to ask more and more questions of their data warehouses. The days when a data warehouse only needed to be tuned for read queries, with the ETL/ELT running once a night, are over. This brought us new challenges: databases growing past 500 GB, doing hundreds, sometimes even thousands of inserts/updates/deletes every second, while also serving select queries whose motto is "the more tables we join, the more fun". Configuring and tuning those databases (and servers) took more than DBA skills; good system engineering skills were needed too to keep the database running smoothly under the heavy workload. We discovered that we needed more than one server, and luckily PostgreSQL 9 now provides streaming replication. In this talk I will discuss how we took on these challenges and how we set up our backup and replication strategy, all with as little effort as possible by using the right tools for the job.

Transcript of Bigger data with PostgreSQL 9

Page 1: Bigger data with PostgreSQL 9

© by Numius nv. Open systems, Smarter people

Bigger data with PostgreSQL 9

Data warehousing in the 21st century.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Page 2: Bigger data with PostgreSQL 9

The presenter

• Bert Desmet

• Consultant @ Deloitte

• System Engineer / DBA for deloitteanalytics.eu

• 'devop'?

Page 3: Bigger data with PostgreSQL 9

Agenda

• Introduction

• Release the elephants!

• Impacting factors

• Divide et impera

• Basic configuration

• Passing the speed limits

• Keep your database fit

Page 4: Bigger data with PostgreSQL 9

Big data?

● 44x data growth per year!

● About 35.2 zettabytes by 2020

● 80% of data is unstructured

● The volume will grow by a whopping 650% in the next 5 years

● 80% of organisations will use cloud analytics

● By 2014, 80% of enterprises will want a SaaS-based BI system

Page 5: Bigger data with PostgreSQL 9

Know your limits

● DB2

● More load

● Scaling

● Speed

● Data size

● Pricing

Page 6: Bigger data with PostgreSQL 9

Release the elephants!

Page 7: Bigger data with PostgreSQL 9

PostgreSQL 9

● Good for big databases

● Easy maintenance

● Scales!

● Very fast

● Extendable

Page 8: Bigger data with PostgreSQL 9

Impacting factors

Page 9: Bigger data with PostgreSQL 9

Highly impacting operations

• Dataload (see the sketch below)

  • In bulk (ETL)

  • Row by row, up to 100k rows/minute

• Datafetch (Reporting)

  • We do like joins. The more the better.
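As a rough sketch of the two load paths (all table and file names below are hypothetical, not from the deck):

    -- Bulk load (ETL): COPY is the fastest way to ingest flat files.
    COPY staging.sales FROM '/data/sales.csv' WITH (FORMAT csv, HEADER true);

    -- Row-by-row load: wrap batches of INSERTs in one transaction, so you
    -- don't pay a commit (and its fsync) for every single row.
    BEGIN;
    INSERT INTO staging.sales (id, sale_date, amount) VALUES (1, '2013-02-01', 9.99);
    INSERT INTO staging.sales (id, sale_date, amount) VALUES (2, '2013-02-01', 4.50);
    COMMIT;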

Page 10: Bigger data with PostgreSQL 9

Extra problems

• A lot of I/O

• A lot of CPU power (index creation)

• A lot of locks

Page 11: Bigger data with PostgreSQL 9

The solution?

• Use at least 2 servers

• Set up binary replication

• Put a lot of RAM in your servers.

Page 12: Bigger data with PostgreSQL 9

Dataflow

Page 13: Bigger data with PostgreSQL 9

Divide et Impera

Page 14: Bigger data with PostgreSQL 9

Replication with Postgres

• 8.3 Warm Standby

• 9.0 Async. Binary Replication

• 9.1 Synchronous Replication

• 9.2 Cascading Replication

• 9.3 More improvements to failover / switching masters

• 9.4 Multimaster Binary Replication?

Page 15: Bigger data with PostgreSQL 9

Configure replication

• wal_level = 'hot_standby'

• checkpoint_segments >= 32

• checkpoint_completion_target >= 0.8

• hot_standby = on

• hot_standby_feedback = on
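Put together, a minimal 9.x streaming pair might look like the sketch below. The slide's parameters are kept; max_wal_senders, wal_keep_segments and the recovery.conf lines are additions that streaming needs, and the host/user names are made up.

    # postgresql.conf on the master
    wal_level = hot_standby
    max_wal_senders = 3                 # required for streaming replication
    wal_keep_segments = 128             # assumption: retain WAL for lagging standbys
    checkpoint_segments = 32
    checkpoint_completion_target = 0.8

    # postgresql.conf on the standby
    hot_standby = on
    hot_standby_feedback = on           # 9.1+: keeps cleanup from cancelling long reports

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=master1 user=replicator'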

Page 16: Bigger data with PostgreSQL 9

Page 17: Bigger data with PostgreSQL 9

Keep it simple, stupid

• 2ndQuadrant is pretty awesome

  • Barman for backups

  • repmgr for replication management

Page 18: Bigger data with PostgreSQL 9

Basic configuration

Page 19: Bigger data with PostgreSQL 9

Raise those memory limits!

• shared_buffers = 1/8 to 1/4 of RAM

• work_mem = 128MB to 1GB

• maintenance_work_mem = 512MB to 1GB

• temp_buffers = 128MB to 1GB

• effective_cache_size = 3/4 of RAM

• wal_buffers = 32MB
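As a worked example, here are the ranges above applied to a hypothetical server with 32 GB of RAM (the concrete values are mine, not the deck's):

    # postgresql.conf, assuming 32 GB of RAM
    shared_buffers = 8GB                # 1/4 of RAM
    work_mem = 256MB                    # per sort/hash node, per query: mind concurrency
    maintenance_work_mem = 1GB          # used by VACUUM and index builds
    temp_buffers = 256MB                # per session, for temporary tables
    effective_cache_size = 24GB         # 3/4 of RAM; a planner hint, allocates nothing
    wal_buffers = 32MB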

Page 20: Bigger data with PostgreSQL 9

Tune the planner for correct plans

• random_page_cost = 3

• cpu_tuple_cost = 0.1

• constraint_exclusion = on

• from_collapse_limit >= 12

• join_collapse_limit >= 12

Page 21: Bigger data with PostgreSQL 9

Passing the speed limits

Page 22: Bigger data with PostgreSQL 9

Use partitions

• Think about the partition key!

• Trigger-based for row-by-row inserts

• Rule-based for bulk inserts

• Make sure you add CHECK constraints (see the sketch below)
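A minimal trigger-based sketch, assuming a hypothetical sales table partitioned by month (the classic inheritance pattern; all names illustrative):

    -- Parent table plus one monthly child; the CHECK constraint is what
    -- lets constraint_exclusion skip irrelevant partitions.
    CREATE TABLE sales (sale_date date NOT NULL, customer_id int, amount numeric);

    CREATE TABLE sales_2013_02 (
        CHECK (sale_date >= DATE '2013-02-01' AND sale_date < DATE '2013-03-01')
    ) INHERITS (sales);

    -- Trigger-based routing for row-by-row inserts.
    CREATE FUNCTION sales_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.sale_date >= DATE '2013-02-01' AND NEW.sale_date < DATE '2013-03-01' THEN
            INSERT INTO sales_2013_02 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for %', NEW.sale_date;
        END IF;
        RETURN NULL;  -- the row was routed; don't also store it in the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER sales_insert_trg BEFORE INSERT ON sales
        FOR EACH ROW EXECUTE PROCEDURE sales_insert_router();

For bulk loads, COPY straight into the child table (or route with rules, as the slide suggests) and skip the per-row trigger overhead.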

Page 23: Bigger data with PostgreSQL 9

Use indexes

• Learn to read query plans (EXPLAIN output)

• Use http://explain.depesz.com/

• Don’t over-index
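For example (tables hypothetical):

    -- EXPLAIN ANALYZE runs the query and reports estimated vs. actual row
    -- counts; big gaps between the two usually point to stale statistics.
    EXPLAIN ANALYZE
    SELECT c.name, sum(s.amount)
    FROM   sales s
    JOIN   customers c ON c.id = s.customer_id
    WHERE  s.sale_date >= DATE '2013-01-01'
    GROUP  BY c.name;

Paste the output into http://explain.depesz.com/ to get the expensive nodes highlighted.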

Page 24: Bigger data with PostgreSQL 9

Other sane things to do

• Use unique indexes

  • Auto-created when defining a primary key

• Use clustered indexes

  • And cluster those tables regularly
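A short sketch of both points, on the hypothetical partition from earlier (index and column names are illustrative):

    -- A primary key creates its unique index automatically; additional
    -- unique indexes are declared explicitly.
    CREATE UNIQUE INDEX sales_natural_key ON sales_2013_02 (sale_date, customer_id);

    -- CLUSTER rewrites the table in index order. It takes an exclusive
    -- lock, so run it in the nightly maintenance window.
    CLUSTER sales_2013_02 USING sales_natural_key;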

Page 25: Bigger data with PostgreSQL 9

Use partial indexes

• A feature few other relational databases offer

• Really useful on big tables

• Disadvantage: no ‘moving’ indexes, e.g. an index on the current day
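A minimal sketch, using a hypothetical orders table:

    -- Index only the slice of a big table that queries actually hit.
    CREATE INDEX orders_open_idx ON orders (customer_id) WHERE status = 'open';

    -- The planner uses it only when the query's WHERE clause implies the
    -- index predicate:
    SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;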

Page 26: Bigger data with PostgreSQL 9

Keep your database fit

Page 27: Bigger data with PostgreSQL 9

Vacuum

• Disable autovacuum for data warehouses

• Vacuum once a day

• Check regularly that the vacuums actually run!

• Prevents data loss (transaction ID wraparound)

• Keeps the database size under control

Page 28: Bigger data with PostgreSQL 9

Analyze

• Analyze once a day

• Together with vacuum

• VACUUM ANALYZE <schema>.<table>;

• ‘default_statistics_target’ >= 300
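For skewed columns on big tables you can also raise the sample size per column rather than globally; a sketch with hypothetical names:

    -- Per-column override of default_statistics_target.
    ALTER TABLE sales ALTER COLUMN customer_id SET STATISTICS 300;

    -- The nightly maintenance statement from the slide, spelled out:
    VACUUM ANALYZE staging.sales;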

Page 29: Bigger data with PostgreSQL 9

Check for bloat!

• Dead rows leave free space in tables.

• Indexes are not optimized anymore

• Use the Nagios check_postgres.pl script

Page 30: Bigger data with PostgreSQL 9

Prevent bloat

• Vacuum full

  • Offline!

  • Only when a pk is not available

• Repack

  • Online!

  • Orders the table (clustered index)

  • Needs a pk on the table

• Reindex

  • Reindex regularly.
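In SQL terms, on the hypothetical partition from earlier:

    -- No usable primary key: VACUUM FULL rewrites the table, but takes
    -- an exclusive lock (offline).
    VACUUM FULL sales_2013_02;

    -- REINDEX rebuilds bloated indexes in place (also takes locks).
    REINDEX TABLE sales_2013_02;

    -- With a primary key, pg_repack does the rewrite online; it is a
    -- shell tool, invoked along the lines of:
    --   pg_repack --table sales_2013_02 mydb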

Page 31: Bigger data with PostgreSQL 9

Partial indexes?

• Write a script

• Use a cronjob

• Recreate your time-aware indexes every day; it will be fast.
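A sketch of the daily job's core (names hypothetical): an index predicate must be immutable, so WHERE sale_date = current_date is rejected, and the script has to bake in a literal date instead.

    -- Recreate the 'today' index; the cron script substitutes the date.
    DROP INDEX IF EXISTS sales_today_idx;
    CREATE INDEX sales_today_idx ON sales (customer_id)
        WHERE sale_date = DATE '2013-02-01';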

Page 32: Bigger data with PostgreSQL 9

Page 33: Bigger data with PostgreSQL 9

Questions?

• Postgres has an awesome community

• IRC: #postgresql on Freenode

• Check the mailing list

Page 34: Bigger data with PostgreSQL 9
