Bigger data with PostgreSQL 9


Description

Companies today collect more and more data, and they also want to ask more and more questions of their data warehouses. The days when a data warehouse only needed to be tuned for read queries, with the ETL/ELT running once a night, are over. This brought us new challenges: databases growing past 500 GB, doing hundreds, sometimes even thousands of inserts/updates/deletes every second, while also serving select queries whose motto is "the more tables we join, the more fun". Configuring and tuning those databases (and servers) took more than DBA skills; good system engineering skills were needed too to keep the database running smoothly under the heavy workload. We discovered that we needed more than one server, and luckily PostgreSQL 9 now provides streaming replication. In this talk I will discuss how we took on these challenges and how we set up our backup and replication strategy, all with as little effort as possible by using the right tools for the job.

Transcript of Bigger data with PostgreSQL 9

Page 1: Bigger data with PostgreSQL 9

© by Numius nv. Open systems, Smarter people

Bigger data with PostgreSQL 9

Data warehousing in the 21st century.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Page 2: Bigger data with PostgreSQL 9

The presenter

• Bert Desmet

• Consultant @ Deloitte

• System Engineer / DBA for deloitteanalytics.eu

• 'devop'?

Page 3: Bigger data with PostgreSQL 9

Agenda

• Introduction

• Release the elephants!

• Impacting factors

• Divide et impera

• Basic configuration

• Passing the speed limits

• Keep your database fit

Page 4: Bigger data with PostgreSQL 9

Big data?

● 44x data growth per year!

● About 35.2 zettabytes by 2020

● 80% of data is unstructured

● The volume will grow by a whopping 650% in the next 5 years

● 80% of organisations will use cloud analytics

● By 2014, 80% of enterprises will want a SaaS-based BI system

Page 5: Bigger data with PostgreSQL 9

Know your limits

● DB2

● More load

● Scaling

● Speed

● Data size

● Pricing

Page 6: Bigger data with PostgreSQL 9

Release the elephants!

Page 7: Bigger data with PostgreSQL 9

PostgreSQL 9

● Good for big databases

● Easy maintenance

● Scales!

● Very fast

● Extendable

Page 8: Bigger data with PostgreSQL 9

Impacting factors

Page 9: Bigger data with PostgreSQL 9

Highly impacting operations

• Dataload (see the sketch below)

  • In bulk (ETL)

  • Row by row, up to 100k rows/minute

• Datafetch (Reporting)

  • We do like joins. The more the better.
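As a rough sketch of the two load paths (all table and file names below are hypothetical, not from the deck):

    -- Bulk load (ETL): COPY is the fastest way to ingest flat files.
    COPY staging.sales FROM '/data/sales.csv' WITH (FORMAT csv, HEADER true);

    -- Row-by-row load: wrap batches of INSERTs in one transaction, so you
    -- don't pay a commit (and its fsync) for every single row.
    BEGIN;
    INSERT INTO staging.sales (id, sale_date, amount) VALUES (1, '2013-02-01', 9.99);
    INSERT INTO staging.sales (id, sale_date, amount) VALUES (2, '2013-02-01', 4.50);
    COMMIT;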

Page 10: Bigger data with PostgreSQL 9

Extra problems

• A lot of I/O

• A lot of CPU power (index creation)

• A lot of locks

Page 11: Bigger data with PostgreSQL 9

The solution?

• Use at least 2 servers

• Set up binary replication

• Put a lot of RAM in your servers.

Page 12: Bigger data with PostgreSQL 9

Dataflow

Page 13: Bigger data with PostgreSQL 9

Divide et Impera

Page 14: Bigger data with PostgreSQL 9

Replication with Postgres

• 8.3 Warm Standby

• 9.0 Async. Binary Replication

• 9.1 Synchronous Replication

• 9.2 Cascading Replication

• 9.3 More improvements to failover / switching masters

• 9.4 Multimaster Binary Replication?

Page 15: Bigger data with PostgreSQL 9

Configure replication

• wal_level = 'hot_standby'

• checkpoint_segments >= 32

• checkpoint_completion_target >= 0.8

• hot_standby = on

• hot_standby_feedback = on
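Put together, a minimal 9.x streaming pair might look like the sketch below. The slide's parameters are kept; max_wal_senders, wal_keep_segments and the recovery.conf lines are additions that streaming needs, and the host/user names are made up.

    # postgresql.conf on the master
    wal_level = hot_standby
    max_wal_senders = 3                 # required for streaming replication
    wal_keep_segments = 128             # assumption: retain WAL for lagging standbys
    checkpoint_segments = 32
    checkpoint_completion_target = 0.8

    # postgresql.conf on the standby
    hot_standby = on
    hot_standby_feedback = on           # 9.1+: keeps cleanup from cancelling long reports

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=master1 user=replicator'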

Page 16: Bigger data with PostgreSQL 9

Page 17: Bigger data with PostgreSQL 9

Keep it simple, stupid

• 2ndQuadrant is pretty awesome

  • Barman for backups

  • repmgr for replication management

Page 18: Bigger data with PostgreSQL 9

Basic configuration

Page 19: Bigger data with PostgreSQL 9

Raise those memory limits!

• shared_buffers = 1/8 to 1/4 of RAM

• work_mem = 128MB to 1GB

• maintenance_work_mem = 512MB to 1GB

• temp_buffers = 128MB to 1GB

• effective_cache_size = 3/4 of RAM

• wal_buffers = 32MB
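As a worked example, here are the ranges above applied to a hypothetical server with 32 GB of RAM (the concrete values are mine, not the deck's):

    # postgresql.conf, assuming 32 GB of RAM
    shared_buffers = 8GB                # 1/4 of RAM
    work_mem = 256MB                    # per sort/hash node, per query: mind concurrency
    maintenance_work_mem = 1GB          # used by VACUUM and index builds
    temp_buffers = 256MB                # per session, for temporary tables
    effective_cache_size = 24GB         # 3/4 of RAM; a planner hint, allocates nothing
    wal_buffers = 32MB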

Page 20: Bigger data with PostgreSQL 9

Tune the planner for correct plans

• random_page_cost = 3

• cpu_tuple_cost = 0.1

• constraint_exclusion = on

• from_collapse_limit >= 12

• join_collapse_limit >= 12

Page 21: Bigger data with PostgreSQL 9

Passing the speed limits

Page 22: Bigger data with PostgreSQL 9

Use partitions

• Think about the partition key!

• Trigger-based for row-by-row inserts

• Rule-based for bulk inserts

• Make sure you add CHECK constraints (see the sketch below)
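A minimal trigger-based sketch, assuming a hypothetical sales table partitioned by month (the classic inheritance pattern; all names illustrative):

    -- Parent table plus one monthly child; the CHECK constraint is what
    -- lets constraint_exclusion skip irrelevant partitions.
    CREATE TABLE sales (sale_date date NOT NULL, customer_id int, amount numeric);

    CREATE TABLE sales_2013_02 (
        CHECK (sale_date >= DATE '2013-02-01' AND sale_date < DATE '2013-03-01')
    ) INHERITS (sales);

    -- Trigger-based routing for row-by-row inserts.
    CREATE FUNCTION sales_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.sale_date >= DATE '2013-02-01' AND NEW.sale_date < DATE '2013-03-01' THEN
            INSERT INTO sales_2013_02 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for %', NEW.sale_date;
        END IF;
        RETURN NULL;  -- the row was routed; don't also store it in the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER sales_insert_trg BEFORE INSERT ON sales
        FOR EACH ROW EXECUTE PROCEDURE sales_insert_router();

For bulk loads, COPY straight into the child table (or route with rules, as the slide suggests) and skip the per-row trigger overhead.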

Page 23: Bigger data with PostgreSQL 9

Use indexes

• Learn to read query plans (EXPLAIN output)

• Use http://explain.depesz.com/

• Don’t over-index
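For example (tables hypothetical):

    -- EXPLAIN ANALYZE runs the query and reports estimated vs. actual row
    -- counts; big gaps between the two usually point to stale statistics.
    EXPLAIN ANALYZE
    SELECT c.name, sum(s.amount)
    FROM   sales s
    JOIN   customers c ON c.id = s.customer_id
    WHERE  s.sale_date >= DATE '2013-01-01'
    GROUP  BY c.name;

Paste the output into http://explain.depesz.com/ to get the expensive nodes highlighted.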

Page 24: Bigger data with PostgreSQL 9

Other sane things to do

• Use unique indexes

  • Auto-created when defining a primary key

• Use clustered indexes

  • And cluster those tables regularly
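A short sketch of both points, on the hypothetical partition from earlier (index and column names are illustrative):

    -- A primary key creates its unique index automatically; additional
    -- unique indexes are declared explicitly.
    CREATE UNIQUE INDEX sales_natural_key ON sales_2013_02 (sale_date, customer_id);

    -- CLUSTER rewrites the table in index order. It takes an exclusive
    -- lock, so run it in the nightly maintenance window.
    CLUSTER sales_2013_02 USING sales_natural_key;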

Page 25: Bigger data with PostgreSQL 9

Use partial indexes

• A feature few other relational databases offer

• Really useful on big tables

• Disadvantage: no ‘moving’ indexes, e.g. an index on the current day
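A minimal sketch, using a hypothetical orders table:

    -- Index only the slice of a big table that queries actually hit.
    CREATE INDEX orders_open_idx ON orders (customer_id) WHERE status = 'open';

    -- The planner uses it only when the query's WHERE clause implies the
    -- index predicate:
    SELECT * FROM orders WHERE status = 'open' AND customer_id = 42;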

Page 26: Bigger data with PostgreSQL 9

Keep your database fit

Page 27: Bigger data with PostgreSQL 9

Vacuum

• Disable autovacuum for data warehouses

• Vacuum once a day

• Check regularly that the vacuums actually run!

• Prevents data loss (transaction ID wraparound)

• Keeps the database size under control

Page 28: Bigger data with PostgreSQL 9

Analyze

• Analyze once a day

• Together with vacuum

• VACUUM ANALYZE <schema>.<table>;

• ‘default_statistics_target’ >= 300
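For skewed columns on big tables you can also raise the sample size per column rather than globally; a sketch with hypothetical names:

    -- Per-column override of default_statistics_target.
    ALTER TABLE sales ALTER COLUMN customer_id SET STATISTICS 300;

    -- The nightly maintenance statement from the slide, spelled out:
    VACUUM ANALYZE staging.sales;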

Page 29: Bigger data with PostgreSQL 9

Check for bloat!

• Dead rows leave free space in tables.

• Indexes are not optimized anymore

• Use the Nagios check_postgres.pl script

Page 30: Bigger data with PostgreSQL 9

Prevent bloat

• Vacuum full

  • Offline!

  • Only when a pk is not available

• Repack

  • Online!

  • Orders the table (clustered index)

  • Needs a pk on the table

• Reindex

  • Reindex regularly.
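In SQL terms, on the hypothetical partition from earlier:

    -- No usable primary key: VACUUM FULL rewrites the table, but takes
    -- an exclusive lock (offline).
    VACUUM FULL sales_2013_02;

    -- REINDEX rebuilds bloated indexes in place (also takes locks).
    REINDEX TABLE sales_2013_02;

    -- With a primary key, pg_repack does the rewrite online; it is a
    -- shell tool, invoked along the lines of:
    --   pg_repack --table sales_2013_02 mydb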

Page 31: Bigger data with PostgreSQL 9

Partial indexes?

• Write a script

• Use a cronjob

• Recreate your time-aware indexes every day; it will be fast.
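A sketch of the daily job's core (names hypothetical): an index predicate must be immutable, so WHERE sale_date = current_date is rejected, and the script has to bake in a literal date instead.

    -- Recreate the 'today' index; the cron script substitutes the date.
    DROP INDEX IF EXISTS sales_today_idx;
    CREATE INDEX sales_today_idx ON sales (customer_id)
        WHERE sale_date = DATE '2013-02-01';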

Page 32: Bigger data with PostgreSQL 9

Page 33: Bigger data with PostgreSQL 9

Questions?

• Postgres has an awesome community

• IRC: #postgresql on Freenode

• Check the mailing list

Page 34: Bigger data with PostgreSQL 9
