Bigger data with PostgreSQL 9
Slide 1
© by Numius nv. Open systems, Smarter people.
Bigger data with PostgreSQL 9
Data warehousing in the 21st century.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Slide 2
The presenter
• Bert Desmet
• Consultant @ Deloitte
• System Engineer / DBA for deloitteanalytics.eu
• 'devop'?
Slide 3
Agenda
• Introduction
• Release the elephants!
• Impacting factors
• Divide et impera
• Basic configuration
• Passing the speed limits
• Keep your database fit
Slide 4
Big data?
● 44x data growth per year!
● About 35.2 zettabytes by 2020
● 80% of data is unstructured
● The volume will grow by a whopping 650% in the next 5 years
● 80% of organisations will use cloud analytics
● By 2014, 80% of enterprises will want a SaaS-based BI system
Slide 5
Know your limits
● DB2
● More load
● Scaling
● Speed
● Data size
● Pricing
Slide 6
Release the elephants!
Slide 7
PostgreSQL 9
● Good for big databases
● Easy maintenance
● Scales!
● Very fast
● Extendable
Impacting factors
Slide 9
Highly impacting operations
• Data load
• In bulk (ETL)
• Row by row: up to 100k rows / minute
• Data fetch (reporting)
• We do like joins. The more the better.
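The two load styles above differ mostly in commit and parsing overhead. A minimal sketch (table, column, and file names are made up for illustration):

```sql
-- Bulk load (ETL): COPY is far faster than issuing individual INSERTs.
COPY sales.fact_orders (order_id, customer_id, amount, order_date)
FROM '/data/staging/orders.csv' WITH (FORMAT csv, HEADER true);

-- Row-by-row load: batch the rows into one transaction
-- so you pay the commit cost once, not per row.
BEGIN;
INSERT INTO sales.fact_orders (order_id, customer_id, amount, order_date)
VALUES (1, 42, 19.99, '2013-05-01'),
       (2, 43,  5.00, '2013-05-01');
COMMIT;
```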
Slide 10
Extra problems
• A lot of I/O
• A lot of CPU power (index creation)
• A lot of locks
Slide 11
The solution?
• Use at least 2 servers
• Set up binary replication
• Put a lot of RAM in your servers.
Slide 12
Dataflow
Slide 13
Divide et Impera
Slide 14
Replication with postgres
• 8.3 Warm Standby
• 9.0 Async. Binary Replication
• 9.1 Synchronous Replication
• 9.2 Cascading Replication
• 9.3 Further improvements to failover / switching masters
• 9.4 Multimaster Binary Replication?
Slide 15
Configure replication
• wal_level = 'hot_standby'
• checkpoint_segments >= 32
• checkpoint_completion_target >= 0.8
• hot_standby = on
• hot_standby_feedback = on
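A minimal sketch of what those settings look like in practice, using 9.x-era syntax where the standby still reads a recovery.conf file (the hostname, replication user, and max_wal_senders value are assumptions):

```ini
# postgresql.conf on the master
wal_level = hot_standby
max_wal_senders = 3             # assumption: a couple of standbys plus base backups
checkpoint_segments = 32
checkpoint_completion_target = 0.8

# postgresql.conf on the standby
hot_standby = on
hot_standby_feedback = on       # stops the master vacuuming away rows the standby still reads

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
```

The master also needs a `replication` entry for that user in pg_hba.conf.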
Slide 16
Slide 17
Keep it simple, stupid
• 2ndQuadrant is pretty awesome
• Barman for backups
• repmgr for replication management
Slide 18
Basic configuration
Slide 19
Raise those memory limits!
• shared_buffers = 1/8 to 1/4 of RAM
• work_mem = 128MB to 1GB
• maintenance_work_mem = 512MB to 1GB
• temp_buffers = 128MB to 1GB
• effective_cache_size = 3/4 of RAM
• wal_buffers = 32MB
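For concreteness, a postgresql.conf fragment for a hypothetical 32 GB warehouse node (the absolute values are examples of the ratios above, not recommendations; remember work_mem is allocated per sort/hash operation per backend, so keep concurrency in mind):

```ini
shared_buffers = 8GB            # ~1/4 of RAM
work_mem = 256MB                # per sort/hash per backend; watch concurrent queries
maintenance_work_mem = 1GB      # index builds, VACUUM
temp_buffers = 256MB
effective_cache_size = 24GB     # ~3/4 of RAM; a planner hint, allocates nothing
wal_buffers = 32MB
```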
Slide 20
Tune the planner for correct planning
• random_page_cost = 3
• cpu_tuple_cost = 0.1
• constraint_exclusion = on
• from_collapse_limit >= 12
• join_collapse_limit >= 12
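As a postgresql.conf fragment, with one comment per line on why each knob matters for a warehouse (the comments are this editor's gloss, not the slide's):

```ini
random_page_cost = 3            # below the default 4: decent disks / mostly cached data
cpu_tuple_cost = 0.1            # makes per-row CPU work count for more in plans
constraint_exclusion = on       # lets the planner skip partitions via CHECK constraints
from_collapse_limit = 12        # consider join reordering across more relations
join_collapse_limit = 12        # ditto for explicit JOIN syntax
```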
Slide 21
Passing the speed limits
Slide 22
Use partitions
• Think about the partition key!
• Trigger-based for row-by-row inserts
• Rule-based for bulk inserts
• Make sure you add constraints
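In 9.x, partitioning means the inheritance pattern from the PostgreSQL docs: child tables with CHECK constraints (which constraint_exclusion needs) plus an insert trigger. A sketch with hypothetical table and column names:

```sql
-- Parent table; the partition key is logdate.
CREATE TABLE measurements (
    id      bigint,
    logdate date NOT NULL,
    value   numeric
);

-- One child per month; the CHECK constraint lets the planner skip it.
CREATE TABLE measurements_2013_05 (
    CHECK (logdate >= DATE '2013-05-01' AND logdate < DATE '2013-06-01')
) INHERITS (measurements);

-- Trigger routing row-by-row inserts to the right child.
CREATE OR REPLACE FUNCTION measurements_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2013-05-01' AND NEW.logdate < DATE '2013-06-01' THEN
        INSERT INTO measurements_2013_05 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- row already stored in the child, skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurements_insert_trg
    BEFORE INSERT ON measurements
    FOR EACH ROW EXECUTE PROCEDURE measurements_insert();
```

For bulk ETL loads you can bypass the trigger entirely and COPY straight into the correct child table.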
Slide 23
Use indexes
• Learn to read EXPLAIN output
• Use http://explain.depesz.com/
• Don’t over index
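For example (table names hypothetical): EXPLAIN ANALYZE runs the query and reports actual row counts and timings next to the planner's estimates, and the output pastes straight into explain.depesz.com.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name, sum(o.amount)
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
```

Big gaps between estimated and actual rows usually point at stale or undersized statistics.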
Slide 24
Other sane things to do
• Use unique indexes
• Auto created when defining a primary key
• Use clustered indexes
• And cluster those tables regularly
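In Postgres, "clustering" is a one-off physical reordering via the CLUSTER command, not a maintained index property, hence the need to repeat it. A sketch with hypothetical names:

```sql
-- Rewrite the table in the order of the given index, and remember that
-- index for future runs. Note: CLUSTER takes an exclusive lock.
CLUSTER sales.fact_orders USING fact_orders_order_date_idx;

-- Later maintenance runs can simply repeat:
CLUSTER sales.fact_orders;
```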
Slide 25
Use partial indexes
• Can only be found in Postgres and MySQL.
• Really useful on big tables
• Disadvantage: no ‘moving’ indexes, e.g. an index for the current day.
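A sketch (index, table, and column names hypothetical): a partial index covers only the rows matching its WHERE clause, so it stays small on a big table.

```sql
-- Index only the rows queries actually touch, e.g. unprocessed orders.
CREATE INDEX orders_unprocessed_idx
    ON orders (order_date)
    WHERE processed = false;
```

The predicate is fixed at creation time and must be immutable, so something like `WHERE order_date = current_date` is not allowed; such "moving" indexes have to be dropped and recreated by a script instead.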
Keep your database fit
Slide 27
Vacuum
• Disable autovacuum for data warehouses
• Vacuum once a day
• Check regularly that the vacuums actually run!
• Prevents data loss (transaction ID wraparound)
• Prevents the database size from growing out of control
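With autovacuum off, the daily vacuum can be a plain cron entry using vacuumdb, the wrapper shipped with Postgres (the schedule below is just an example):

```bash
# crontab on the database server: nightly VACUUM ANALYZE of all databases at 03:00
0 3 * * * vacuumdb --all --analyze --quiet
```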
Slide 28
Analyze
• Analyze once a day
• Together with vacuum
• VACUUM ANALYZE <schema>.<table>;
• default_statistics_target >= 300
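The larger statistics sample can be set globally or per column; a sketch with hypothetical names:

```sql
-- Per-column: bigger sample for a heavily filtered column, then re-analyze.
ALTER TABLE sales.fact_orders ALTER COLUMN order_date SET STATISTICS 300;
ANALYZE sales.fact_orders;

-- Or globally, in postgresql.conf:
-- default_statistics_target = 300
```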
Slide 29
Check for bloat!
• Dead space in tables
• Indexes are no longer optimal
• Use Nagios’ check_postgres.pl
Slide 30
Prevent bloat
• VACUUM FULL
• Offline!
• Only when a pk is not available
• Repack (pg_repack)
• Online!
• Orders the tables (clustered index)
• Needs a pk on the table
• Reindex
• Reindex regularly.
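The three options side by side, as they might be run during maintenance (database and table names hypothetical; pg_repack CLI flags per its 1.x documentation):

```bash
# VACUUM FULL: rewrites the table under an exclusive lock -> offline window
psql -d mydb -c 'VACUUM FULL sales.fact_orders;'

# pg_repack: rewrites and reorders the table online, but needs a primary key
pg_repack --dbname=mydb --table=sales.fact_orders

# REINDEX: rebuild bloated indexes (blocks writes on the table meanwhile)
psql -d mydb -c 'REINDEX TABLE sales.fact_orders;'
```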
Slide 31
Partial indexes?
• Write a script
• Use a cronjob
• Recreate your time-aware indexes every day. Will be fast.
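One way such a script could look as a single cron line (index, table, and database names hypothetical; note the backslash before %, which cron would otherwise treat as a newline):

```bash
# crontab: at 01:00 rebuild the "current day" partial index with today's date baked in
0 1 * * * psql -d mydb -c "DROP INDEX IF EXISTS orders_today_idx; CREATE INDEX orders_today_idx ON orders (id) WHERE order_date = '$(date +\%F)';"
```

Rebuilding is fast because the partial index only covers one day's rows.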
Slide 32
Slide 33
Questions?
• Postgres has an awesome community
• IRC: #postgresql on Freenode
• Check the mailing lists
Slide 34