Deep Dive Into ASH

Deep dive into ASH Saibabu Devabhaktuni


About me
- Sr. Manager of DB engineering at PayPal
- Using Oracle since 1998
- http://sai-oracle.blogspot.com
- Can be contacted at [email protected]
- Lives in Fremont, CA

Scope
- No OEM demos
- Tested on Oracle 10.2 and 11.2
- Some of the observations here may be incorrect or may change in future versions

Introduction
- Active Session History (ASH) was introduced in 10g
- Requires the Diagnostics Pack license
- Integrated with the database kernel and AWR
- Very lightweight
- Works on physical standby (from 11g onwards)

V$session
- Single most important performance view
- Active session count is the top KPI
- V$session_wait is included in v$session
- Blocking session information is reported
- As with any v$ view, selects are not read consistent

v$session (contd)
- Fixed_table_sequence is the most underused column
- Max(fixed_table_sequence) is the total number of db calls made
- Incremental values for a session indicate activity
- It can be used to determine the order of sessions on a wait
- About 100 columns in v$session
- DBAs should know about all columns in v$session
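
The point about incremental fixed_table_sequence values can be illustrated with a small Python sketch. It compares two polls of v$session and reports which sessions advanced. The function name and the sample readings are hypothetical; only the column name comes from the slide.

```python
def sessions_with_activity(earlier, later):
    """Return session keys whose fixed_table_sequence advanced between polls.

    Both arguments map (sid, serial#) -> fixed_table_sequence, as they might
    be read from v$session at two points in time (hypothetical data here).
    """
    active = []
    for key, seq_now in later.items():
        seq_before = earlier.get(key)
        # A session missing from the earlier poll is new, so there is
        # nothing to compare against and it is skipped.
        if seq_before is not None and seq_now > seq_before:
            active.append(key)
    return sorted(active)


poll_1 = {(101, 7): 5000, (102, 3): 5100, (103, 9): 4800}
poll_2 = {(101, 7): 5000, (102, 3): 5350, (103, 9): 4801, (104, 1): 5400}

print(sessions_with_activity(poll_1, poll_2))
```

Session (101, 7) is left out because its value did not change between the two polls.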

ASH overview
- Snapshot of active sessions at one-second intervals
- Mostly the same information as in v$session
- Maintains a circular buffer of up to 254MB
- AWR snapshot preserves a sample of ASH data to disk
- ASH queries read from head to tail of the circular buffer
- ASH overhead is minimal (less than 5%)
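
As a rough illustration of the circular buffer idea, the Python sketch below keeps a fixed number of samples and lets new ones push out the oldest. It is a toy model, not Oracle's in-memory layout; the class name and capacity are made up for the example.

```python
from collections import deque


class SampleRing:
    """Fixed-capacity buffer where new samples overwrite the oldest ones."""

    def __init__(self, capacity):
        self._samples = deque(maxlen=capacity)

    def add(self, sample):
        # When the deque is full, append() silently drops the oldest entry.
        self._samples.append(sample)

    def oldest_to_newest(self):
        return list(self._samples)


ring = SampleRing(capacity=3)
for sample_id in range(1, 6):
    ring.add({"sample_id": sample_id})

print([s["sample_id"] for s in ring.oldest_to_newest()])
```

With a capacity of three, only the last three of the five samples remain readable.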

ASH architecture
- MMNL process captures a snapshot of active sessions at 1-second intervals
- ASH is a join of x$kewash and x$ash
- X$kewash has one record for every snapshot
- X$kewash also has sample count and length
- X$kewash is used to locate the snapshot address in x$ash
- Is_awr_sample column is indexed

ASH architecture (contd)
- With the exception of a few columns, ASH is the same as x$ash
- ASH is indexed on the sample_id and is_awr_sample columns
- ASH buffer size is typically 2M per CPU, but no more than 254M or 5% of SGA
- An ASH sample record can be self-inconsistent due to the lack of read consistency in the underlying x$ (v$) fixed tables
- New records overwrite older records in the ASH circular buffer
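
The sizing rule can be worked through with a short Python sketch. It assumes the rule means "2M per CPU, capped by whichever of 254M and 5% of SGA is smaller"; that reading, and the function name, are assumptions rather than documented behaviour.

```python
MB = 1024 * 1024


def estimated_ash_buffer_bytes(cpu_count, sga_bytes):
    """Estimate the ASH buffer size under the assumed sizing rule."""
    per_cpu_total = 2 * MB * cpu_count
    absolute_cap = 254 * MB
    sga_cap = sga_bytes * 5 // 100  # 5% of SGA, integer arithmetic
    # Assumption: both caps apply, so the smallest of the three wins.
    return min(per_cpu_total, absolute_cap, sga_cap)


for cpus, sga_mb in [(8, 4096), (256, 102400), (64, 1024)]:
    size = estimated_ash_buffer_bytes(cpus, sga_mb * MB)
    print(cpus, "CPUs,", sga_mb, "MB SGA ->", size // MB, "MB")
```

On a small SGA the 5% cap dominates, while on a large server the 254M ceiling does.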

AWR
- Each AWR snapshot flushes ASH into DBA_HIST_ACTIVE_SESS_HISTORY
- Only one out of every 10 ASH samples is preserved
- ASH emergency flush when the buffer is 2/3 full
- ASH snapshots in DBA_HIST_ASH_SNAPSHOT
- Ashrpt.sql script provided under rdbms/admin
- ADDM relies on ASH data
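
The 1-in-10 retention and the 2/3 emergency flush can be sketched in Python as below. How Oracle actually chooses which samples to keep is not stated in these slides; picking sample ids divisible by the ratio is purely an illustrative assumption, and both function names are hypothetical.

```python
def select_for_awr(sample_ids, filter_ratio=10):
    """Keep one sample out of every `filter_ratio` (assumed: by sample id)."""
    return [sid for sid in sample_ids if sid % filter_ratio == 0]


def needs_emergency_flush(used_bytes, buffer_bytes, trigger_pct=66):
    """True once buffer usage reaches the trigger percentage (about 2/3)."""
    return used_bytes * 100 >= buffer_bytes * trigger_pct


print(select_for_awr(range(1, 31)))                   # default 1-in-10
print(len(select_for_awr(range(1, 31), filter_ratio=1)))  # ratio 1 keeps all
print(needs_emergency_flush(65, 100), needs_emergency_flush(66, 100))
```

A ratio of 1 keeps every sample, which is why lowering the ratio during an incident preserves more detail on disk.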

ASH parameters
- _ash_enable to enable/disable ASH
- _ash_disk_write_enable for AWR writes
- _ash_disk_filter_ratio for the ratio of sample records to write
- _ash_eflush_trigger set to 66% for emergency flush
- _ash_sample_all for sampling active/inactive sessions
- _ash_sampling_interval for changing the sampling interval, defined in milliseconds

ASH parameters (contd)
- Setting _ash_enable to false doesn't remove existing ASH data (it can only be accessed after the parameter is set back to true)
- Set _ash_disk_filter_ratio to 1 during any database incident
- Don't change _ash_sample_all, as inactive session data is not relevant and it generates more data
- Set _ash_sampling_interval to lower values (10 ms) during RAT or snapshot standby testing (not a dynamic parameter; requires an instance restart)

Sample data
- Sample id and time
- Session details
- Sql details
- Pl/sql details
- Wait event information
- Blocking session details
- Client details
- Time model information

Session details
Column                    Type
IS_AWR_SAMPLE             VARCHAR2(1)
SESSION_ID                NUMBER
SESSION_SERIAL#           NUMBER
SESSION_TYPE              VARCHAR2(10)
USER_ID                   NUMBER
SESSION_STATE             VARCHAR2(7)
XID                       RAW(8)
REMOTE_INSTANCE#          NUMBER
PGA_ALLOCATED             NUMBER
TEMP_SPACE_ALLOCATED      NUMBER
TOP_LEVEL_CALL#           NUMBER
TOP_LEVEL_CALL_NAME       VARCHAR2(64)

Session details (contd)
- AWR writes sample data when is_awr_sample is set
- Use session_id and session_serial# for comparing session data across samples
- XID indicates whether the session is in a transaction
- Percentage of sampled sqls not having an XID indicates purely read-only traffic (can be a target for Active DG)
- Pga_allocated and temp_space_allocated are helpful in determining big sort operations
- Top_level_call_name indicates db calls like exec, fetch, commit, rollback, etc.
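
The read-only percentage mentioned above is simple arithmetic over the XID column. A minimal Python sketch, using hypothetical rows in place of real ASH samples:

```python
def read_only_pct(samples):
    """Percentage of samples with no XID, i.e. not inside a transaction."""
    if not samples:
        return 0.0
    without_xid = sum(1 for s in samples if s.get("xid") is None)
    return 100.0 * without_xid / len(samples)


# Hypothetical samples; xid is None when the session had no transaction.
samples = [
    {"session_id": 10, "sql_id": "a1", "xid": None},
    {"session_id": 11, "sql_id": "b2", "xid": "0A000B00C1D20000"},
    {"session_id": 12, "sql_id": "a1", "xid": None},
    {"session_id": 13, "sql_id": "c3", "xid": None},
]

print(read_only_pct(samples))
```

Here three of the four samples carry no XID, so 75% of the sampled activity looks like a candidate for a read-only standby.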

Sql details
Column                    Type
SQL_ID                    VARCHAR2(13)
IS_SQLID_CURRENT          VARCHAR2(1)
SQL_CHILD_NUMBER          NUMBER
SQL_OPCODE                NUMBER
SQL_OPNAME                VARCHAR2(64)
TOP_LEVEL_SQL_ID          VARCHAR2(13)
TOP_LEVEL_SQL_OPCODE      NUMBER
SQL_PLAN_HASH_VALUE       NUMBER
SQL_PLAN_LINE_ID          NUMBER
SQL_PLAN_OPERATION        VARCHAR2(30)
SQL_PLAN_OPTIONS          VARCHAR2(30)
SQL_EXEC_ID               NUMBER
SQL_EXEC_START            DATE

Sql details (contd)
- Sql_exec_id is incremented for every execution per instance
- Same sql_exec_id across snapshots indicates the session is executing the same sql
- Sql plan details in ASH help determine expensive sqls
- Execution state detail in columns (in_bind, in_parse, in_sql_execution, in_sequence_load)
- Top_level_sql_id is helpful for pl/sql or recursive sql
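
Because the same sql_exec_id across samples means the same execution, counting samples per execution gives a rough duration, assuming the default one-second sampling interval. The Python sketch below uses hypothetical rows and a hypothetical function name.

```python
from collections import Counter


def samples_per_execution(samples):
    """Count samples per (session_id, sql_id, sql_exec_id).

    With one-second sampling, each count approximates seconds of activity
    for that execution (an approximation, not an exact elapsed time).
    """
    counts = Counter()
    for s in samples:
        if s["sql_exec_id"] is None:
            continue  # no execution in progress for this sample
        counts[(s["session_id"], s["sql_id"], s["sql_exec_id"])] += 1
    return counts


samples = [
    {"session_id": 10, "sql_id": "a1", "sql_exec_id": 16777216},
    {"session_id": 10, "sql_id": "a1", "sql_exec_id": 16777216},
    {"session_id": 10, "sql_id": "a1", "sql_exec_id": 16777216},
    {"session_id": 10, "sql_id": "a1", "sql_exec_id": 16777217},
    {"session_id": 11, "sql_id": "b2", "sql_exec_id": None},
]

for key, seconds in samples_per_execution(samples).most_common():
    print(key, "~", seconds, "s")
```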

Plsql details
Column                       Type
PLSQL_ENTRY_OBJECT_ID        NUMBER
PLSQL_ENTRY_SUBPROGRAM_ID    NUMBER
PLSQL_OBJECT_ID              NUMBER
PLSQL_SUBPROGRAM_ID          NUMBER
IN_PLSQL_EXECUTION           VARCHAR2(1)
IN_PLSQL_RPC                 VARCHAR2(1)
IN_PLSQL_COMPILATION         VARCHAR2(1)

Plsql details (contd)
- Plsql_entry_object_id is the calling pl/sql procedure
- Plsql_entry_subprogram_id is the calling subprogram in the pl/sql procedure
- Plsql_object_id is the currently executing procedure
- Query dba_procedures to map subprogram_id to the subprogram name
- Demo

Wait events
Column                    Type
EVENT                     VARCHAR2(64)
EVENT_ID                  NUMBER
EVENT#                    NUMBER
SEQ#                      NUMBER
P1TEXT                    VARCHAR2(64)
P1                        NUMBER
P2TEXT                    VARCHAR2(64)
P2                        NUMBER
P3TEXT                    VARCHAR2(64)
P3                        NUMBER
WAIT_CLASS                VARCHAR2(64)
WAIT_CLASS_ID             NUMBER
WAIT_TIME                 NUMBER
TIME_WAITED               NUMBER

Wait events (contd)
- Event is null in ASH (unlike in v$session) when the session is on CPU
- P1, p2, p3 are populated even when the session is on CPU (for the previous event)
- Wait class is a higher-level dimension to group data
- Wait_time is irrelevant in ASH (unlike in v$session)
- Seq# is the sequence number of each wait event in a given session

Wait events (contd)
- Seq# rolls over after reaching 64K in a given session
- Sequential samples with the same seq# indicate the session is in the same wait event (extremely useful for RCA)
- Event is not written to the AWR table wrh$_active_session_history, to save space (event_id is enough)
- Event_id and event# are synonymous (event_id is usually the same across DB versions)
- All the wait time metrics are reported in microseconds
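
The seq# technique can be sketched in Python: consecutive samples of one session that share a seq# are collapsed into a single wait, and the run length hints at how long the wait lasted. The rows and function name are hypothetical. Grouping only adjacent samples also keeps a rolled-over seq# from merging with an unrelated earlier wait.

```python
from itertools import groupby


def wait_runs(session_samples):
    """Collapse consecutive samples with the same seq# into runs.

    `session_samples` must hold one session's samples ordered by sample_id.
    Returns a list of (seq, event, number_of_samples).
    """
    runs = []
    for seq, group in groupby(session_samples, key=lambda s: s["seq"]):
        rows = list(group)
        runs.append((seq, rows[0]["event"], len(rows)))
    return runs


samples = [
    {"sample_id": 1, "seq": 10, "event": "enq: TX - row lock contention"},
    {"sample_id": 2, "seq": 10, "event": "enq: TX - row lock contention"},
    {"sample_id": 3, "seq": 10, "event": "enq: TX - row lock contention"},
    {"sample_id": 4, "seq": 11, "event": "db file sequential read"},
    {"sample_id": 5, "seq": 12, "event": "log file sync"},
    {"sample_id": 6, "seq": 12, "event": "log file sync"},
]

for run in wait_runs(samples):
    print(run)
```

The three samples with seq# 10 show a single row lock wait that spanned at least three sampling intervals, rather than three separate waits.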

Time_waited
- Only populated when the session is done waiting on a given event; it stays at zero in the earlier samples of that wait
- When ASH realizes the session is done waiting for a given event, it goes back and updates time_waited in the most recent sample for that wait event with the same seq#
- Time_waited is not populated if the session exits before ASH can capture it for the previous sample
- Query max and avg(time_waited) for the samples where it is set
- Demo
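
Since time_waited is zero in samples taken mid-wait, averages should only include samples where it is set. A minimal Python sketch of that filter, with hypothetical rows and values in microseconds:

```python
from collections import defaultdict


def time_waited_by_event(samples):
    """Max and average time_waited per event, ignoring zero-valued samples."""
    waits = defaultdict(list)
    for s in samples:
        # Samples taken while the wait was still in progress carry 0.
        if s["time_waited"] > 0:
            waits[s["event"]].append(s["time_waited"])
    return {
        event: {"max_us": max(values), "avg_us": sum(values) / len(values)}
        for event, values in waits.items()
    }


samples = [
    {"event": "db file sequential read", "time_waited": 0},
    {"event": "db file sequential read", "time_waited": 1500},
    {"event": "db file sequential read", "time_waited": 2500},
    {"event": "log file sync", "time_waited": 0},
    {"event": "log file sync", "time_waited": 800},
    {"event": None, "time_waited": 0},  # on CPU
]

print(time_waited_by_event(samples))
```

Including the zero rows would have pulled the "db file sequential read" average down from 2000 to about 1333 microseconds.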

Blocking session
Column                      Type
BLOCKING_SESSION_STATUS     VARCHAR2(11)
BLOCKING_SESSION            NUMBER
BLOCKING_SESSION_SERIAL#    NUMBER
BLOCKING_INST_ID            NUMBER
BLOCKING_HANGCHAIN_INFO     VARCHAR2(1)
CURRENT_OBJ#                NUMBER
CURRENT_FILE#               NUMBER
CURRENT_BLOCK#              NUMBER
CURRENT_ROW#                NUMBER

Blocking session (contd)
- Blocking session is reported for any lock/enqueue, buffer busy waits, latch contention, library cache mutex contention, etc. (more useful than v$lock)
- Current_obj#/file/block/row can be used to determine row lock contention
- Current_obj#/file/block is populated for any I/O related wait event
- Blocking_hangchain_info indicates multiple levels of lock contention
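
With multiple levels of lock contention, the interesting session is usually the one at the top of the chain. The Python sketch below walks session_id to blocking_session pairs from a single sample to find each waiter's root blocker. It assumes a single instance (blocking_inst_id is ignored), and the mapping and function names are hypothetical.

```python
from collections import Counter


def root_blocker(session, blocked_by):
    """Follow the blocking chain from `session` to the session at its head."""
    seen = set()
    # The `seen` set guards against looping forever on a cyclic chain.
    while session in blocked_by and session not in seen:
        seen.add(session)
        session = blocked_by[session]
    return session


def count_root_blockers(blocked_by):
    """How many waiters ultimately sit behind each root blocker."""
    return Counter(root_blocker(waiter, blocked_by) for waiter in blocked_by)


# waiter session_id -> blocking_session, taken from one sample (made up).
blocked_by = {30: 20, 20: 10, 40: 10, 50: 60}

print(count_root_blockers(blocked_by).most_common())
```

Session 30 waits on 20, which itself waits on 10, so all three waiters in that chain are attributed to session 10.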

Client details
Column                    Type
SERVICE_HASH              NUMBER
PROGRAM                   VARCHAR2(48)
MODULE                    VARCHAR2(64)
ACTION                    VARCHAR2(64)
CLIENT_ID                 VARCHAR2(64)
MACHINE                   VARCHAR2(64)
PORT                      NUMBER
ECID                      VARCHAR2(64)

Client details (contd)
- Ability to report incident data by many client dimensions
- Finding resource utilization by program, module, service, etc.
- Useful for troubleshooting network related issues
- Having this data in awr base tables would give more insight into changes being made to the middle tier

Time model
Column                        Type
TM_DELTA_TIME                 NUMBER
TM_DELTA_CPU_TIME             NUMBER
TM_DELTA_DB_TIME              NUMBER
DELTA_TIME                    NUMBER
DELTA_READ_IO_REQUESTS        NUMBER
DELTA_WRITE_IO_REQUESTS       NUMBER
DELTA_READ_IO_BYTES           NUMBER
DELTA_WRITE_IO_BYTES          NUMBER
DELTA_INTERCONNECT_IO_BYTE    NUMBER

Time model (contd)
- How much cpu and db time was spent by a session over tm_delta_time
- Cpu time + db time can be greater than tm_delta_time
- tm_delta_time is reported since the last time the session was sampled
- Useful for finding top cpu/db time sessions in a given interval
- Demo
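
Finding the top cpu sessions in an interval amounts to summing tm_delta_cpu_time per session over the samples in that interval. A small Python sketch with hypothetical rows (values taken to be microseconds) and a hypothetical function name:

```python
from collections import defaultdict


def top_cpu_sessions(samples, limit=3):
    """Sum tm_delta_cpu_time per (session_id, session_serial), largest first."""
    totals = defaultdict(int)
    for s in samples:
        cpu = s.get("tm_delta_cpu_time")
        if cpu:  # skip samples where the delta is missing or zero
            totals[(s["session_id"], s["session_serial"])] += cpu
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


samples = [
    {"session_id": 10, "session_serial": 1, "tm_delta_cpu_time": 400000},
    {"session_id": 10, "session_serial": 1, "tm_delta_cpu_time": 300000},
    {"session_id": 11, "session_serial": 5, "tm_delta_cpu_time": 900000},
    {"session_id": 12, "session_serial": 2, "tm_delta_cpu_time": None},
]

print(top_cpu_sessions(samples, limit=2))
```

The same shape works for db time by summing tm_delta_db_time instead.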

Time model (contd)
- Read/write I/O requests (including RAC interconnect) are reported over delta_time
- Delta_time is different from tm_delta_time
- Delta_time is reported since the last time the session was sampled
- Useful for finding top I/O sessions during any interval
- Demo
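
Because the I/O deltas are reported over delta_time, dividing one by the other gives a request rate per session. The Python sketch below assumes delta_time is in microseconds; the rows and function name are hypothetical.

```python
def io_requests_per_second(samples):
    """Read plus write I/O requests per second for each session_id."""
    totals = {}
    for s in samples:
        if not s.get("delta_time"):
            continue  # no interval recorded, so no rate can be derived
        requests, micros = totals.get(s["session_id"], (0, 0))
        requests += s["delta_read_io_requests"] + s["delta_write_io_requests"]
        totals[s["session_id"]] = (requests, micros + s["delta_time"])
    return {
        sid: requests / (micros / 1_000_000)
        for sid, (requests, micros) in totals.items()
    }


samples = [
    {"session_id": 1, "delta_time": 1_000_000,
     "delta_read_io_requests": 100, "delta_write_io_requests": 0},
    {"session_id": 1, "delta_time": 1_000_000,
     "delta_read_io_requests": 50, "delta_write_io_requests": 50},
    {"session_id": 2, "delta_time": 2_000_000,
     "delta_read_io_requests": 10, "delta_write_io_requests": 10},
]

print(io_requests_per_second(samples))
```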

Desired new features
- Hash of all bind values for a given sql execution (this is to identify contention caused by sql with the same binds)
- Populating current_obj#/file#/block# for all logical reads when the session is on cpu
- Adding wait_time_micro from v$session and populating it just like in v$session across each sample
- Populating the last event when the session is on cpu
- Record redo usage per sql (ER# 8646714)
- More awr samples for session outliers (ER# 8669416)
- Reporting the plsql line id being executed

DIY ASH
- Do-it-yourself ASH by copying v$session active session data every second
- Sql plan information will be unavailable
- Not going to be a lightweight process
- Time model information will be missing
- Incomplete wait time metrics
- Overall, a good alternative if the Diagnostics Pack license is not purchased
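
A do-it-yourself sampler boils down to a polling loop. The Python sketch below shows the shape of such a loop only: fetch_active_sessions is a hypothetical callable that would, in practice, run a query against v$session for active sessions, and here it is replaced by a fake so the example runs on its own.

```python
import time
from collections import deque


def run_sampler(fetch_active_sessions, iterations, interval_s=1.0,
                capacity=3600, sleep=time.sleep):
    """Poll active sessions `iterations` times and keep the recent rows.

    `fetch_active_sessions` is a caller-supplied function returning a list
    of dicts, one per active session (hypothetical; not an Oracle API).
    """
    history = deque(maxlen=capacity)  # oldest rows fall off when full
    for sample_id in range(1, iterations + 1):
        sample_time = time.time()
        for row in fetch_active_sessions():
            history.append(
                {"sample_id": sample_id, "sample_time": sample_time, **row}
            )
        if sample_id < iterations:
            sleep(interval_s)
    return history


def fake_fetch():
    # Stand-in for a real v$session query; returns two made-up sessions.
    return [
        {"sid": 101, "event": "db file sequential read"},
        {"sid": 102, "event": None},
    ]


# sleep is stubbed out so the demonstration finishes immediately.
history = run_sampler(fake_fetch, iterations=3, sleep=lambda _: None)
print(len(history), [row["sample_id"] for row in history])
```

Each poll is a separate round trip to the database, which is one reason this approach is heavier than the built-in sampler.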

ASH use cases
- Real-time troubleshooting of any incident
- Root cause analysis (RCA) post incident
- Proactive scalability analysis (index contention, buffer busy waits, latch contention, etc.)
- Fine-grained resource utilization metrics report
- Finding the effectiveness of pointing read-only traffic to Active Data Guard