EM12c Monitoring Best Practices - Rob Zoeteweij -
Author: Rob Zoeteweij
Date: 13 October 2012
http://oemgc.wordpress.com

Some weeks ago I posted an article on my blog after attending Ana McCollum’s presentation “Beyond the Basics: Making the Most of Oracle Enterprise Manager Monitoring” at OOW 2012. In this document I have elaborated on my notes to give a good overview of all topics discussed during the presentation. All credit goes to Ana! In my opinion, this document could very well be the basis for your own “EM12c Best Practices” document. I included some snippets of pictures I took of the slides during the presentation. They are a bit blurry (sorry for that), but I hope they will give a bit more understanding.

Creating the Administration Group Hierarchy
• Specify multiple values for the target property criteria
• Target Type criteria: Database, Listener and ASM can belong to the same group instead of 3 groups
• Set the time zone when you define the group
  o Time zone is used for group operations and charts
  o All subgroups will default to the same time zone
• After the hierarchy is created, you can:
  o Add or remove values for a target property (expand/shrink hierarchy horizontally)
  o Add new/remove target property criteria (add/remove a level)
    - Hierarchy will be deleted and re-created
    - Template Collections will remain but will need re-association
  o Rename any group (EMCLI rename_target)

How do I set Target Properties so Targets join Administration Groups?
• Set properties during target addition/promotion workflow
  o Target Properties page in console: Target menu > Target Setup > Properties
    - Possible Property Values are based on the Administration Group Hierarchy (New in Rel2)
  o Use EMCLI set_target_property_value to set the Property Values for multiple Targets at once
• Aggregate Targets
  o Cluster targets
    - Target property set on the cluster automatically applies to all members
  o Non-cluster aggregate targets
    - Target property set on the aggregate does not automatically apply to members
      • Members could be part of different aggregate targets; properties therefore need to be set explicitly
    - Templates are auto-applied only to members whose target properties match the group criteria (aka Direct Members)
    - To set a target property on an aggregate and its current members:
      • EMCLI set_target_property_value -propagate_to_members
      • Example: set the Location property of a database system including its members:
        emcli set_target_property_value -property_records="dbrac_sys:oracle_dbsys:Location:Bangalore" -propagate_to_members
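
The membership behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Enterprise Manager code: the `Target` class and the functions `is_direct_member` and `propagate_to_members` are hypothetical names, the "Utrecht" value is made up, and the group criteria are assumed to be a mapping from property name to the set of accepted values.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Hypothetical stand-in for a managed target."""
    name: str
    properties: dict = field(default_factory=dict)
    members: list = field(default_factory=list)  # non-empty for aggregates


def is_direct_member(target, criteria):
    """True when every criterion accepts the target's property value.

    `criteria` maps a property name to the set of values a group accepts,
    e.g. {"Lifecycle Status": {"Production", "Mission Critical"}}.
    """
    return all(target.properties.get(prop) in accepted
               for prop, accepted in criteria.items())


def propagate_to_members(aggregate, prop, value):
    """Model of setting a property on an aggregate and its current members."""
    aggregate.properties[prop] = value
    for member in aggregate.members:
        member.properties[prop] = value


db1 = Target("db1")
db2 = Target("db2", {"Location": "Utrecht"})
system = Target("dbrac_sys", members=[db1, db2])
criteria = {"Location": {"Bangalore"}}

# Setting the property on the aggregate alone leaves the members untouched.
system.properties["Location"] = "Bangalore"
before = [is_direct_member(t, criteria) for t in (system, db1, db2)]

# Propagating updates the aggregate and every current member.
propagate_to_members(system, "Location", "Bangalore")
after = [is_direct_member(t, criteria) for t in (system, db1, db2)]
```

In this model only the aggregate matches the group criteria before propagation, while all three targets match afterwards, which mirrors why non-cluster aggregates need the property set explicitly on their members.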

What Monitoring Settings will be applied to the Administration Group?
• Enhanced Group Management Settings (New Rel2)
  o Use on LEAF Groups
  o Shows parent groups/template collections
  o Review specific monitoring templates
  o Review combined monitoring settings from multiple templates
  o Verify if management settings have been applied to the group

Are my Targets monitored using our Standards for Monitoring?
• Check the Synchronization Status region of the TOPMOST administration group
  o Shows sync status of all targets in hierarchy

Sync Status Column   | What to do
---------------------|-----------
Synchronized Targets | Nothing. Targets are in sync with monitoring templates.
Pending Targets      | Ensure you have a Global Sync Schedule defined. Indicated by the ‘Next Synchronization’ date; if N/A, set a schedule.
Running Targets      | Nothing. Check later to see if they are all synchronized.
Failed Targets       | Drill down to get details; fix where possible. A re-sync will be attempted on the next sync schedule, or on demand by the user.
N/A Targets          | Targets have no associated monitoring template. Drill down to get the target type, then add a monitoring template to the template collection.

Privileges required for Monitoring Setup
• You need to use super administrator to perform these actions

Monitoring Setup                                        | Required Privilege
--------------------------------------------------------|-------------------
Create Administration Group Hierarchy                   | Full Any Target; Create Privilege Propagating Group
Use Monitoring Templates                                | None to create; View on specific Monitoring Template
Use Template Collections                                | Create Template Collection; View/Full on specific Template Collections or View any Template Collection
Associate Template Collection with Administration Group | Operator on group; View on Template Collection

Incident Management
• Manage by Incidents
  o Significant events
  o Combination of events related to the same issue (e.g. events raised from database, host, storage indicating lack of space)
• Centralized incident management console
  o View, manage, diagnose and resolve incidents from one location
• Support for incident lifecycle operations
  o Assign, acknowledge, prioritize, track status, escalate, suppress
  o Notify and open helpdesk ticket
• Integrated Oracle expertise
  o Access to My Oracle Support knowledge base
  o Accelerates incident and problem diagnosis and resolution

What Targets should be used in Rule Sets?
• Specify group(s)/systems
  o Specify administration group(s) if applicable
  o Rules keep up with changes in group membership
  o Example: All database targets whose Lifecycle Status = ‘Mission Critical’ or ‘Production’

How do I organize my Rule Sets / Rules?
• Combine all rules applying to the same group in one rule set
• Leverage the order of rules within a rule set and group similar rules together:
  o Rules to create incidents
  o Rules to manage incidents (email, ticketing, escalation)
  o Put duration-based rules last
• Duplicate actions across rule sets
  o ‘Create Incident’: first rule wins (can’t create multiple incidents for the same event)
  o Incident workflow (assign, set priority…): last rule wins (final value from rule)
  o Notifications: all actions executed
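
The three outcomes for duplicate actions can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not how Enterprise Manager is implemented: the rule dictionaries, their keys, the rule names and the email addresses are all hypothetical, and the rules are assumed to be listed in evaluation order.

```python
def resolve_actions(matching_rules):
    """Combine the actions of all rules that match one event.

    `matching_rules` is a list of dicts in evaluation order. Each dict has a
    "name" and may contain "create_incident" (bool), "workflow" (dict of
    incident fields) and "notify" (list of recipients). Hypothetical layout.
    """
    result = {"incident_created_by": None, "workflow": {}, "notify": []}
    for rule in matching_rules:
        # 'Create Incident': first rule wins, later requests are ignored.
        if rule.get("create_incident") and result["incident_created_by"] is None:
            result["incident_created_by"] = rule["name"]
        # Incident workflow: a later rule overwrites earlier values.
        result["workflow"].update(rule.get("workflow", {}))
        # Notifications: every action is executed.
        result["notify"].extend(rule.get("notify", []))
    return result


rules = [
    {"name": "create-db-incidents", "create_incident": True,
     "workflow": {"priority": "High"}},
    {"name": "create-all-incidents", "create_incident": True,
     "workflow": {"priority": "Urgent", "owner": "dba_oncall"},
     "notify": ["dba-team@example.com"]},
    {"name": "notify-managers", "notify": ["managers@example.com"]},
]
outcome = resolve_actions(rules)
```

Here the incident is created by the first rule only, the priority ends up with the value from the later rule, and both notification lists are used.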

What Type of Rules should I choose?
Type of Rule and what it is best used for:
• Event Rule
  o Create incidents based on events
  o Create helpdesk tickets for incidents
  o Send events to third-party management systems
  o Send email for specific events of interest (e.g. send email to business users if target is down)
• Incident Rule
  o Automate incident workflow operations (e.g. assign incident)
  o Send notifications on incidents
  o Create helpdesk tickets for incidents (e.g. create ticket if incident is escalated to level 2)
• Problem Rule
  o Automate problem workflow operations (e.g. assignment, prioritization, etc.)
  o Send notifications on problems

What Conditions should I specify in Event Rules?
• Use broad criteria that span multiple target types
• Metric Alert event rule
  o Use broad criteria (e.g. all critical events or critical events on specified target types) instead of individual metrics
    - Requires controlling metric alert thresholds at the source
    - Simplifies rule maintenance: no need to change the rule for new metrics
• Target Availability event rule
  o Based on status metric
  o Choose ‘agent unreachable’ only for host and/or agent targets
  o Choose ‘down’ for all other targets

Target Availability Event Severities

Scenario                                            | Target Type                             | Target Status     | Availability Event Severity
----------------------------------------------------|-----------------------------------------|-------------------|----------------------------
Target is down                                      | All target types except host and agent  | Down              | Fatal
Agent is down or unreachable                        | Agent                                   | Agent Unreachable | Critical
Agent is down or unreachable                        | All non-agent targets including host    | Agent Unreachable | Warning
Host is down or unreachable                         | Agent on the host                       | Agent Unreachable | Critical
Host is down or unreachable                         | All agents on the host including host   | Agent Unreachable | Warning
Blackout started on target                          | All target types                        | Blackout          | Advisory
Target is up (from any of the other states)         | All target types                        | Up                | Clear
Target is in status pending for more than 5 minutes | All target types                        | Unknown           | Warning
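
As a reading aid, the severity table can be expressed as a small lookup function. This is a sketch only: the scenario keys (`"target_down"`, `"agent_unreachable"` and so on) and the lowercase target type names are labels invented for this example, not Enterprise Manager identifiers.

```python
def availability_event_severity(scenario, target_type):
    """Return (target status, event severity) following the table above.

    `scenario` and `target_type` use made-up labels for illustration.
    """
    if scenario == "target_down":
        if target_type in ("host", "agent"):
            raise ValueError("'Down' applies to all target types except host and agent")
        return ("Down", "Fatal")
    if scenario in ("agent_unreachable", "host_unreachable"):
        # The agent itself is Critical; the other affected targets are Warning.
        severity = "Critical" if target_type == "agent" else "Warning"
        return ("Agent Unreachable", severity)
    if scenario == "blackout_started":
        return ("Blackout", "Advisory")
    if scenario == "target_up":
        return ("Up", "Clear")
    if scenario == "pending_over_5_minutes":
        return ("Unknown", "Warning")
    raise ValueError(f"unknown scenario: {scenario}")


db_down = availability_event_severity("target_down", "database")
agent_lost = availability_event_severity("agent_unreachable", "agent")
host_lost = availability_event_severity("agent_unreachable", "host")
```

The split between Critical for the agent and Warning for everything else is what makes it sensible to choose ‘agent unreachable’ only for host and agent targets in availability rules.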

What Conditions should I specify in Event Rules? (part 2)
• Job Status event rule
  o No job events are raised unless you set them up
  o Setup > Incidents > Job Events
  o Choose the Job Status values that raise events
    - Action Required, Problems are defaults
  o Select targets on which job events are raised (tip: use groups)

Who gets notified for Events / Incidents?
• Checklist for email notifications
  o Recipient must have at least View on the source of the event
  o Recipient must have an email address and a notification schedule
• Can specify direct email addresses including distribution lists
• Leverage TO: vs CC: email notification
  o TO: recipients: Best used to enforce mandatory recipients of the email. Only the rule creator can add these
  o CC: recipients: Best used for interested parties; users who self-subscribe to the rule are added to the CC line
• Take advantage of ‘page’ vs ‘email’ classification
  o Enables easier setup for: Page me for critical, email me for warning
• Use variables as notification recipients: $INCIDENT_OWNER$, $TGT_OWNER$, $PROBLEM_OWNER$, $SOURCE_OBJ_OWNER$
  o Example: Set up a single rule to send email notifications to the $INCIDENT_OWNER$ when they are assigned an incident
• Tailor email message (Subject, Body) formats for your requirements
  o Setup > Notifications > Customize Email Format
  o Customize per event type, incident or problem email messages
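
The idea of using variables as recipients can be illustrated with a short Python sketch. The function below is hypothetical and only models the substitution: the `context` dictionary, the user names and the choice to drop a variable that has no value are assumptions made for this example, not documented Enterprise Manager behaviour.

```python
def resolve_recipients(to_list, cc_list, context):
    """Replace $VARIABLE$ recipients with the user named in `context`.

    `context` maps variable names without the dollar signs (for example
    "INCIDENT_OWNER") to a user. Variables without a value are dropped,
    which is an assumption of this sketch.
    """
    def resolve(recipient):
        if recipient.startswith("$") and recipient.endswith("$"):
            return context.get(recipient.strip("$"))
        return recipient  # direct address or distribution list

    def expand(recipients):
        resolved = [resolve(r) for r in recipients]
        return [r for r in resolved if r]

    return {"to": expand(to_list), "cc": expand(cc_list)}


message = resolve_recipients(
    to_list=["$INCIDENT_OWNER$"],
    cc_list=["dba-team@example.com", "$TGT_OWNER$"],
    context={"INCIDENT_OWNER": "jsmith"},
)
```

A single rule written this way keeps working when incidents are reassigned, because the recipient is looked up at notification time instead of being hard-coded.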

Using Incident Rule Sets
• For rules with duration conditions, put more specific criteria in the rule
  o Rule: After 7 days, for all critical events, clear event
  o Better rule: After 7 days, for all critical Generic Alert Log Error events, clear event
• Leverage failover feature for SMTP gateway by specifying multiple gateways
• Set up repeat notifications for important incidents
  o Will repeat until cleared or acknowledged
  o Acknowledge the incident via Console or Enterprise Manager Mobile
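
The difference between the broad and the more specific duration rule can be shown with a small predicate. This is a sketch under assumptions: events are modelled as plain dictionaries with hypothetical `name`, `severity` and `raised_at` keys, and "CPU Utilization (%)" is just an example of another metric.

```python
from datetime import datetime, timedelta


def should_clear(event, now, min_age=timedelta(days=7),
                 severity="Critical", event_name=None):
    """Duration-based clear rule; `event_name=None` means 'all events'."""
    if now - event["raised_at"] < min_age:
        return False
    if event["severity"] != severity:
        return False
    return event_name is None or event["name"] == event_name


now = datetime(2012, 10, 13)
events = [
    {"name": "Generic Alert Log Error", "severity": "Critical",
     "raised_at": now - timedelta(days=8)},
    {"name": "CPU Utilization (%)", "severity": "Critical",
     "raised_at": now - timedelta(days=10)},
    {"name": "Generic Alert Log Error", "severity": "Critical",
     "raised_at": now - timedelta(days=2)},
]

# Broad rule: clears every critical event older than 7 days.
broad = [should_clear(e, now) for e in events]
# Better rule: only old critical Generic Alert Log Error events are cleared.
specific = [should_clear(e, now, event_name="Generic Alert Log Error")
            for e in events]
```

The broad rule also clears the old CPU event, which may still describe a real problem; the specific rule leaves it alone.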

How to clear these Events / Incidents?
• Incidents will auto-clear if all their underlying events are cleared
  o Most events auto-clear if underlying condition is resolved
• Exception: Manually-clearable events
  o emcli clear_stateless_alerts (bulk clear for metric alerts)
  o get_metrics_for_stateless_alerts lists manually clearable metric alerts
  o Event Rule to clear events after specified duration
    - Tip: Put the specific metric alerts in the rule
  o Incident Manager: clear (appears if applicable)
    - Clear multiple incidents (New R2)
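
The clearing logic above boils down to two checks, sketched here in Python. The dictionary keys (`manually_clearable`, `cleared_by_user`, `condition_resolved`) are hypothetical names chosen for this example, and treating an incident without events as not cleared is an assumption of the sketch.

```python
def event_is_cleared(event):
    """Manually clearable events need an explicit clear; others auto-clear."""
    if event["manually_clearable"]:
        return event["cleared_by_user"]
    return event["condition_resolved"]


def incident_is_cleared(events):
    """An incident auto-clears once all of its underlying events are cleared."""
    return bool(events) and all(event_is_cleared(e) for e in events)


tablespace_full = {"manually_clearable": False, "condition_resolved": True,
                   "cleared_by_user": False}
alert_log_error = {"manually_clearable": True, "condition_resolved": True,
                   "cleared_by_user": False}

still_open = incident_is_cleared([tablespace_full, alert_log_error])

# Clearing the manually clearable event by hand lets the incident clear.
alert_log_error["cleared_by_user"] = True
now_cleared = incident_is_cleared([tablespace_full, alert_log_error])
```

This shows why a single manually clearable event can keep an incident open even after the underlying condition has been fixed.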

Leveraging Incident Manager
• To filter on incidents of interest, create custom views on groups or by lifecycle status (New R2)
• To enable more granular tracking of the incident status, add new resolution status values (e.g. Waiting on SME)
  o emcli create_resolution_state
• Leverage ‘Resolved’ incident status as ‘soft closed’
  o Set this when the fix has been implemented
  o Enterprise Manager will set it to ‘Closed’ when the underlying event/incident is cleared

Maintain Priority processing of important Targets
• Set the Lifecycle Status target property, especially for important targets
  o Mission Critical, Production, Staging, Test, Development
  o Highest priority ---> Lowest priority
• Used to prioritize loading of data and metric alerts, and processing of events for notifications, creating incidents, etc.
• Enables priority processing of important targets even if the number of managed targets increases
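
The effect of the Lifecycle Status ordering can be sketched as a simple sort. This only illustrates the priority order listed above and is not how Enterprise Manager schedules its work: the event dictionaries and target names are hypothetical, and placing targets without a Lifecycle Status last is an assumption.

```python
# Priority order as listed above, highest first.
LIFECYCLE_PRIORITY = ["Mission Critical", "Production", "Staging",
                      "Test", "Development"]


def processing_order(events):
    """Sort events so that more important targets are handled first.

    Python's sort is stable, so events with the same Lifecycle Status keep
    their arrival order. Unknown or missing statuses go last (assumption).
    """
    rank = {status: i for i, status in enumerate(LIFECYCLE_PRIORITY)}
    return sorted(events,
                  key=lambda e: rank.get(e.get("lifecycle_status"), len(rank)))


incoming = [
    {"target": "devdb", "lifecycle_status": "Development"},
    {"target": "proddb", "lifecycle_status": "Production"},
    {"target": "scratch"},
    {"target": "paydb", "lifecycle_status": "Mission Critical"},
]
ordered = [e["target"] for e in processing_order(incoming)]
```

Even when many low-priority events arrive first, the Mission Critical and Production targets move to the front of the queue in this model.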