Monitor Some of the Things
![Page 1: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/1.jpg)
2013-10-18
MONITOR SOME OF THE THINGS
![Page 2: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/2.jpg)
Optimization, Backups, Replication, and more
Baron Schwartz, Peter Zaitsev &
Vadim Tkachenko
High Performance MySQL
3rd Edition
Covers Version 5.5
ME
• Cofounder of @VividCortex
• Author of High Performance MySQL
• @xaprb on Twitter
• http://www.linkedin.com/in/xaprb
![Page 3: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/3.jpg)
RANT, RECAPPED
• The sky is falling
• Tools drive processes, and we need better tools designed for methods
• Pay attention to CAPS (Capacity, Availability, Performance, Scalability)
• Monitoring tools need to be a lot smarter
• Measure and monitor “work getting done”
![Page 4: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/4.jpg)
HARD CAPACITY
• Disk volume
• CPU Cycles
• max_connections
• File descriptors, sockets, TCP port numbers, etc
• %used, absolute quantity available
![Page 5: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/5.jpg)
SOFT CAPACITY
• Neil Gunther’s Universal Scalability Law
• %used, absolute quantity available
• Throughput, concurrency, errors
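As a rough illustration of the Universal Scalability Law named on this slide, the sketch below evaluates the model for a given concurrency. The function and parameter names (`lam`, `sigma`, `kappa`) and the coefficient values in the usage note are illustrative assumptions, not measurements from the talk; in practice the coefficients would be fitted from observed throughput and concurrency.

```python
import math


def usl_throughput(n, lam, sigma, kappa):
    """Predicted throughput at concurrency n under the Universal Scalability Law.

    lam   -- throughput at n = 1 (illustrative parameter name)
    sigma -- contention (serialization) coefficient
    kappa -- coherency (crosstalk) coefficient
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def peak_concurrency(sigma, kappa):
    """Concurrency at which the modeled throughput is highest."""
    return math.sqrt((1 - sigma) / kappa)
```

With the made-up coefficients `sigma=0.05` and `kappa=0.001`, the modeled throughput peaks at a concurrency of about 31 and declines beyond it, which is the "soft" limit this slide refers to.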
![Page 6: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/6.jpg)
AVAILABILITY
• Availability is absence of downtime
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
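A minimal sketch of how MTBF, MTTR and %availability relate, assuming an incident log that records only the length of each outage within an observation period. The function name and input shape are hypothetical, and exact definitions of MTBF vary between teams. MTTD is left out because it needs detection timestamps that this simple input does not carry.

```python
def availability_summary(period_hours, outage_hours):
    """Summarize availability for one observation period.

    period_hours -- length of the observation period
    outage_hours -- list with the duration of each outage in that period
    """
    outages = len(outage_hours)
    downtime = sum(outage_hours)
    uptime = period_hours - downtime
    return {
        # Mean time between failures: uptime divided by number of failures.
        "mtbf_hours": uptime / outages if outages else None,
        # Mean time to recover: downtime divided by number of failures.
        "mttr_hours": downtime / outages if outages else None,
        "availability_pct": 100.0 * uptime / period_hours,
    }
```

For example, two outages of 1 and 3 hours in a 1000-hour period give an MTBF of 498 hours, an MTTR of 2 hours, and 99.6% availability, which equals MTBF / (MTBF + MTTR).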
![Page 7: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/7.jpg)
TASK PERFORMANCE
• Task performance is consistently fast response time.
• Measure an SLA in percentile response time per task, over observation intervals
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
• Response time, 95% response time
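One possible way to check a percentile response-time SLA over observation intervals is sketched below. It assumes response-time samples arrive as `(timestamp_seconds, response_seconds)` pairs and uses the nearest-rank percentile; the function names, the interval length and the limit are all illustrative choices rather than anything prescribed in the talk.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def sla_violations(samples, interval_s, pct, limit_s):
    """Return the indexes of intervals whose pct-th percentile exceeds limit_s.

    samples -- iterable of (timestamp_seconds, response_seconds) pairs
    """
    buckets = defaultdict(list)
    for ts, rt in samples:
        buckets[int(ts // interval_s)].append(rt)
    return sorted(b for b, rts in buckets.items()
                  if percentile(rts, pct) > limit_s)
```

Calling `sla_violations(samples, 60, 95, 0.5)` would list the one-minute intervals in which the 95th-percentile response time was above half a second.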
![Page 8: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/8.jpg)
RESOURCE PERFORMANCE
• Resource performance is ability to run tasks consistently fast.
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
• Response time, 95% response time
• Throughput, concurrency, busy time, total response time, backlog/queue
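The resource metrics on this slide are related by simple arithmetic, sketched here under the assumption that a resource reports, per observation interval, how many tasks completed, how long it was busy, and the summed response time of those tasks. The function and key names are hypothetical. Mean concurrency follows from Little's Law (concurrency = throughput × mean response time).

```python
def resource_metrics(interval_s, completions, busy_s, total_response_s):
    """Derive per-interval resource metrics from raw counters.

    interval_s       -- length of the observation interval, in seconds
    completions      -- number of tasks completed in the interval
    busy_s           -- seconds during which the resource was busy
    total_response_s -- sum of the response times of the completed tasks
    """
    return {
        "throughput_per_s": completions / interval_s,
        "utilization": busy_s / interval_s,
        # Little's Law: throughput * mean response time.
        "mean_concurrency": total_response_s / interval_s,
        "mean_response_s": (total_response_s / completions
                            if completions else None),
    }
```

For instance, 50 completions, 8 busy seconds and 25 seconds of summed response time in a 10-second interval yield a throughput of 5 per second, 80% utilization, a mean concurrency of 2.5 and a mean response time of 0.5 seconds.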
![Page 9: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/9.jpg)
SCALABILITY
• Universal Scalability Law again
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
• Response time, 95% response time
• Throughput, concurrency, busy time, total response time, backlog/queue
![Page 10: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/10.jpg)
STALL DETECTION
• Overloaded or underperforming?
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
• Response time, 95% response time
• Throughput, concurrency, busy time, total response time, backlog/queue
• Utilization, saturation, errors, sources of load/demand
![Page 11: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/11.jpg)
GIT ‘ER DONE
MONITOR WORK AND RESOURCES
![Page 12: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/12.jpg)
WHAT NOT TO DO
• Don’t use top-N lists from Google
• Don’t just do what’s included in some Nagios plugin
![Page 13: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/13.jpg)
№1 TOP 10 LIST
1. MySQL availability
2. Presence of insecure users and databases
3. Aborted connects
4. Error log
5. Deadlocks
6. Change in server configuration
7. Slow query log
8. Slave lag
9. Percentage of maximum allowed connections
10. Percentage of full table scans
![Page 14: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/14.jpg)
№2 TOP 10 LIST
1. Threads_connected
2. Created_tmp_disk_tables
3. Handler_read_first
4. Innodb_buffer_pool_wait_free
5. Key_reads
6. Max_used_connections
7. Open_tables
8. Select_full_join
9. Slow_queries
10. Uptime
![Page 15: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/15.jpg)
№1 PLUGIN
1. threadcache-hitrate (Hit rate of the thread-cache)
2. slave-io-running (Slave io running: Yes)
3. slave-sql-running (Slave sql running: Yes)
4. qcache-hitrate (Query cache hitrate)
5. qcache-lowmem-prunes (Query cache entries pruned because of low memory)
6. keycache-hitrate (MyISAM key cache hitrate)
7. bufferpool-hitrate (InnoDB buffer pool hitrate)
8. bufferpool-wait-free (InnoDB buffer pool waits for clean page available)
9. log-waits (InnoDB log waits because of a too small log buffer)
10. tablecache-hitrate (Table cache hitrate)
11. table-lock-contention (Table lock contention)
12. index-usage (Usage of indices)
13. tmp-disk-tables (Percent of temp tables created on disk)
14. long-running-procs (long running processes)
![Page 16: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/16.jpg)
№2 PLUGIN
1. connection-time
2. uptime
3. threads-connected
4. threadcache-hitrate
5. q[uery]cache-hitrate
6. q[uery]cache-lowmem-prunes
7. [myisam-]keycache-hitrate
8. [innodb-]bufferpool-hitrate
9. [innodb-]bufferpool-wait-free
10. [innodb-]log-waits
11. tablecache-hitrate
12. table-lock-contention
13. index-usage
14. tmp-disk-tables
15. slow-queries
16. long-running-procs
17. slave-lag
18. slave-io-running
19. slave-sql-running
20. sql
21. open-files
22. encode
23. cluster-ndb-running
![Page 17: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/17.jpg)
№3 PLUGIN
![Page 18: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/18.jpg)
SURFACE AREA
HTTP://WWW.FLICKR.COM/PHOTOS/NASAMARSHALL/5926864640/
![Page 19: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/19.jpg)
DUPLICATE SIGNALS
• Queries
• Com_admin_commands
• Com_assign_to_keycache
• Com_alter_db
• Com_alter_db_upgrade
• Com_alter_event
• Com_alter_function
• Com_alter_procedure
• Com_alter_server
• Com_alter_table
• Com_alter_tablespace
• Com_alter_user
• Com_analyze
• Com_begin
• Com_binlog
• Com_ad_nauseum
![Page 20: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/20.jpg)
DESIRABLE METRICS
• %used, absolute quantity available
• Throughput, concurrency, errors
• MTBF, MTTR, MTTD, %availability
• Response time, 95% response time
• Throughput, concurrency, busy time, total response time, backlog/queue
• Utilization, saturation, errors, sources of load/demand
![Page 21: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/21.jpg)
Desirable Easy
![Page 22: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/22.jpg)
Desirable Easy
![Page 23: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/23.jpg)
IRRELEVANT
EXAMPLE PLEASE?
![Page 24: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/24.jpg)
RESOURCE LIMITS
• Threads_connected near max_connections?
• %table cache used?
• Open file handles?
• Long-running queries/transactions?
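The first three questions above amount to "%used" checks against a hard limit. The sketch below assumes the caller has already loaded `SHOW GLOBAL STATUS` and `SHOW GLOBAL VARIABLES` into two dictionaries of strings; the pairing of each status counter with a limit variable, the function names and the 80% threshold are illustrative assumptions.

```python
# (label, status counter, limit variable); the pairing is an assumption.
LIMITS = [
    ("connections", "Threads_connected", "max_connections"),
    ("table cache", "Open_tables", "table_open_cache"),
    ("open files", "Open_files", "open_files_limit"),
]


def percent_used(status, variables):
    """Return {label: percent of the limit currently in use}."""
    result = {}
    for label, used_key, limit_key in LIMITS:
        # MySQL returns these values as strings, so convert first.
        used = float(status[used_key])
        limit = float(variables[limit_key])
        result[label] = 100.0 * used / limit
    return result


def near_limit(status, variables, threshold_pct=80.0):
    """Return the labels whose usage is at or above threshold_pct."""
    usage = percent_used(status, variables)
    return sorted(label for label, pct in usage.items()
                  if pct >= threshold_pct)
```

A check built on `near_limit` reports only the resources that are close to exhaustion, which is the actionable part of the question.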
![Page 25: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/25.jpg)
ERRORS
• Deadlocks?
• Aborted connects?
![Page 26: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/26.jpg)
AVAILABILITY
• Ability to connect and run a query?
• Uptime is small?
• Replication is running?
![Page 27: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/27.jpg)
PERFORMANCE
• You can get throughput (Queries) and concurrency (Threads_running) from MySQL
• But in a Nagios check, no context to know whether they’re good or bad
• You generally can’t get response time, busy time, utilization, backlog, etc
• You can aggregate thread states, thread times, users, databases, query abstracts...
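Aggregating thread states, as the last bullet suggests, could look roughly like this. The sketch assumes the rows of `SHOW FULL PROCESSLIST` have been fetched as dictionaries keyed by column name (`Command`, `State`, `Time`); skipping sleeping connections and the function name are illustrative choices.

```python
from collections import Counter


def summarize_threads(rows):
    """Count active threads and sum their elapsed seconds, per state.

    rows -- iterable of dicts with the processlist columns
            "Command", "State" and "Time"
    """
    counts = Counter()
    seconds = Counter()
    for row in rows:
        if row["Command"] == "Sleep":
            continue  # idle connections are not work in progress
        state = row["State"] or "(no state)"
        counts[state] += 1
        seconds[state] += int(row["Time"] or 0)
    return counts, seconds
```

`counts.most_common(3)` then shows which states dominate at that moment, for example a pile-up of threads in "Waiting for table metadata lock".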
![Page 28: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/28.jpg)
NAGIOS IS BEST AT
LIVING IN THE MOMENT
![Page 29: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/29.jpg)
THOU SHALT NOT
• Cache hit ratios
• Thread cache hit ratio
• Buffer pool cache hit ratio
• Table cache hit ratio
• Key cache hit ratio
• Query cache hit ratio
• Rates of “bad” queries
• % temp tables on disk
• % full table scans
• % slow queries
• Unfixable things
• Replication delay
![Page 30: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/30.jpg)
WHY NOT?
• Those are properties of the workload and application
• They are not conditions to alert/warn about
• They are not fixable / actionable in the service
![Page 31: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/31.jpg)
ALERTS ARE
BETTER TOGETHER
![Page 32: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/32.jpg)
QUESTION:
WHAT IS BETTER?
![Page 33: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/33.jpg)
№1 ALERT!!!!!
Disk CRIT 100% /dev/sda2
![Page 34: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/34.jpg)
№2 ALERT!!!!!
Replication CRIT Slave I/O Thread No
![Page 35: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/35.jpg)
№3 ALERT!!!!!
Replication CRIT Slave SQL Thread No
![Page 36: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/36.jpg)
№4 ALERT!!!!!
Replication CRIT Seconds_Behind_Master NULL
![Page 37: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/37.jpg)
№5 ALERT!!!!!
MySQL CRIT oldest transaction: 86400 seconds
![Page 38: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/38.jpg)
- OR -
![Page 39: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/39.jpg)
№1 ALERT!!!!!
CRIT
* Disk /dev/sda2 full
* Replication stopped
* Oldest transaction 86400 seconds
* 4999 threads in status “Waiting for table metadata lock”
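A single combined alert like the one on this slide could be assembled from individual check results roughly as follows. This is a sketch under assumptions: the `(severity, message)` input shape, the severity names and the function name are hypothetical and not part of Nagios or of any tool mentioned in the talk.

```python
SEVERITY = {"OK": 0, "WARN": 1, "CRIT": 2}


def combine_alerts(results):
    """Merge (severity, message) check results into one notification.

    The notification is headed by the worst severity seen, followed by
    one bullet per failing check, in the order the checks were given.
    """
    problems = [(sev, msg) for sev, msg in results if SEVERITY[sev] > 0]
    if not problems:
        return "OK"
    worst = max(problems, key=lambda p: SEVERITY[p[0]])[0]
    lines = [worst] + ["* " + msg for _, msg in problems]
    return "\n".join(lines)
```

Sending one message with all failing checks gives the on-call person the related symptoms together, instead of five separate pages for what is likely one incident.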
![Page 41: Monitor Some of the Things](https://reader038.fdocuments.net/reader038/viewer/2022110310/55a8b11c1a28ab85088b470c/html5/thumbnails/41.jpg)
RESOURCES
• Chapter 3 of High Performance MySQL, 3rd Edition
• Percona White Papers
  • Causes of Downtime in Production MySQL Servers
  • Preventing MySQL Emergencies
  • Goal-Driven Performance Optimization
  • Forecasting MySQL Scalability with the Universal Scalability Law
• Method R: Optimizing Oracle Performance, Cary Millsap
• The Goal, Eli Goldratt
• The USE Method (Brendan Gregg) & his new book
• Guerrilla Capacity Planning, Neil J. Gunther
• Fundamental Performance & Scalability Instrumentation