Preview of “mod dbms monitoring.ppt” - Técnico Lisboa ... Consumer-Producer Chain of a DBMS’s...
Transcript of Preview of “mod dbms monitoring.ppt” - Técnico Lisboa ... Consumer-Producer Chain of a DBMS’s...
Administração e Optimização de Bases de Dados 2012/2013
Performance Monitoring
Bruno Martins DEI@Técnico e DMIR@INESC-ID
Performance Monitoring Goals !! Monitoring should check that
the performance-influencing ! database parameters are
correctly set and if they are not, it should point to where the problems are !
! Alternatively, one can sit and wait for the usersʼ complaints
!
Monitoring !preventive method !
OR !
Correct problems when detected ! reactive
method !
How to monitor a DBMS? !! By extracting relevant performance ! indicators, (counters, metrics, and details
of internal DBMSʼs activities) !
! By comparing the obtained values of these indicators against ideal values !! But… there are many indicators! !
Recurrent Patterns of Problems !
! An overloading high-level ! consumer
!" Ex: query accessing too many rows
!! A poorly parameterized subsystem
! " Ex: disk subsystem that stores tables, indexes, logging in a single disk
!! An overloaded primary resource ! " Ex: CPU busy because non-DB
processes are using it at the same time
!
A Consumer-Producer Chain of a DBMS’s Resources ! High Level
!sql commands ! Consumers
!
PARSER !OPTIMIZER
!
Intermediate !
DISK !SYBSYSTEM
!
probing spots ! for
indicators !
EXECUTION !SUBSYSTEM
LOCKING ! SUBSYSTEM
!
CACHE LOGGING MANAGER SUBSYSTEM !
Resources/ Consumers !
MEMORY CPU !
DISK/ CONTROLLER !
Primary !NETWORK Resources
!
A Systematic Approach to !Monitoring !Extract indicators to answer the following questions !
! Question 1: Are critical ! queries being served in
!the most efficient manner? !! Question 2: Are DBMS
! subsystems making optimal use of resources? !! Question 3: Are there
! enough primary resources available and are they configured adequately, given the current and expected workload? !
Investigating High Level Consumers: !Critical query monitoring !
Find !critical
queries !
Investigate no intermed. level !
Found any? !
yes ! Check Quest. 1
!
no Overcon- ! sumption?
!
yes Tune problematic ! queries
!
Investigating Intermediate and Primary Resources: routine !monitoring ! Check Quest. 3
!
Check Quest. 2 !
Tune yes Problematic subsystems subsystems? !
no Problems at yes ! OS level?
!
no Investigate ! upper level
!
Tune low-level resources
!
Tools used to gather information !
! Event monitors !
! Query plan explainers
! Performance monitors
!
Investigating High Level Consumers ! ! Answer question 1:
! “Are critical queries being served in the most efficient manner?” !
1. !2. !3. !
Identify the critical queries Analyze their access plans Profile their execution !
Query profiling indicators !! Duration information involves 3 indicators: ! " Elapsed time for the query: time it took to process it as
perceived by the user !" CPU time: time the CPU was actually used to process the
query !" Wait time: time the query was not processing and waiting
for a resource to become available !
! Resource consumption information includes: ! " (I/O) Physical and logical reads/writes
!" (Locking) Maximum nbr locks held, nbr lock escalations, nbr deadlocks/timeouts, total time spent waiting for locks
!" (SQL activity) Nb sorts and temporary area usage: gives an idea of the time spent in expensive overhead activity
!
Two common scenarios !1.
!
2. !
Elapsed query time close to CPU time, wait time negligible. Consumption fair: reasonable nb pages being accesses, most of them are logical accesses; nb locks low or inexistent and no deadlocks or lock escalations; if there were sorts, they !didnʼt augment the nb of logical/physical reads/writes
! Access plan is executed in the best way possible; if the rest ! of the chain is balanced, this is the best performance the
system can deliver for this query !
Noticeable discrepancy between elapsed and CPU time, wait time fills the gap. So there must be a problem in !resource consumption: contention (probably waiting for locks) or poorly performing resource (ex. if area allocated for sorting is small, additional I/O). !" To distinguish between contention and physical resource
consumption problem, run the query in isolation !
Investigating Primary Resources ! •
Answer question 3: “Are there enough primary resources available for a DBMS to consume?”
•
Primary resources: CPU, disk/controllers, memory, and network •
Analyze specific OS-level indicators to discover bottlenecks. •
A system-level performance monitor is the right tool here
CPU Consumption Indicators at the OS Level !
Sustained utilization !
CPU % of
utilization !
100% !
70% !
total usage !
system usage !
time !
over 70% should trigger the alert. !System utilization !shouldn’t be more
than 40%. !DBMS (in a non-
dedicated machine) !should be getting a decent time share. !
Disk Performance Indicators at the OS Level ! Should be
!close to zero !Average Queue Size
!
Disk Transfers /second
!
Idle disk with pending requests? !Check controller
contention. ! Also, transfers
should be !balanced among
disks/controllers !
New requests Wait queue ! Wait times
should also be close to
zero !
Memory Consumption !Indicators at the OS Level !
Page faults/time !should be close to zero. If paging
happens, at least
% of pagefile in use (it’s used a fixed file/partition) will tell ! you how much memory is “lacking”. !
not DB cache pages. real
! memory !
pagefile !
virtual memory !
Investigating Intermediate Resources/Consumers ! •
Answer question 2: “Are subsystems making optimal use of resources?”
•
Main subsystems: Cache Manager, Disk subsystem, Lock subsystem, and Log/Recovery subsystem •
Extract and analyze relevant perf. indicators •
A Performance Monitor is usually useful, but sometimes specific tools apply !
Cache Manager Performance !Indicators !
Table scan !
readpage() !
If page is not in the cache, readpage
(logical) generate an actual IO (physical). !Ratio of readpages that
did not generate !physical IO (cache hit !ration) should be 90% !Cache Pick
!Manager victim
!strategy Data Pages !
Free Page slots !
or more !
Pages are regularly saved to disk to make ! free space.
!Page reads/ writes
!# of free slots should
always be > 0 !
Disk Manager Performance !Indicators ! rows
!
page !
Storage !Hierarchy extent
!(simplified) !
file !
disk !
Row displacement: should kept under 5% of rows
!
Free space fragmentation: pages with few space should ! not be in the free list
!Data fragmentation: ideally files that store DB objects
(table, index) should be in one or few contiguous extents (<5) !
File position: should balance workload evenly among all disks !
Lock Manager Perf. Indicators !
Lock !request !
Object Lock Type !
Lock List !
Locks !pending list
!
TXN ID !
Lock wait time for a transaction should be a
small fraction of the whole ! transaction time.
!Number of locks on wait !should be a small fraction of the number of locks on ! the lock list, under 20% !
Deadlocks and timeouts ! should be virtually
!inexistent or extremely low (no more than 1% of the transactions) !
Logging Subsystem Performance Indicators !! Log is used by every transaction that alters, inserts
or deletes data !! Well tuned logging subsystem should write log
records at a pace that doesnʼt slow active transactions
!! Indicatores to measure: ! " Number of log waits to confirm that the disks storing the log
files are keeping pace with the transaction workload - should be zero
!" Nb of log expansions or log archives due to lack of space - if bigger than zero indicates a configuration problem
!" Log cache-hit ratio to check whether the size of the log buffer is adequate
!
Conclusion !
! Monitoring a DBMSʼs performance can be done in a systematic way ! " The consumption chain helps distinguishing
problems’ causes from their symptoms !" Existing tools help extracting relevant performance indicators !" The three questions guide the whole monitoring process !