How to deploy and use the IBM Monitoring and Analytics Service with your IBM Bluemix application
1. Introduction
1.1. Platform as a service support model
1.2. IBM's Monitoring and Analytics features
2. Solution overview
2.1. Service Management Architecture overview
2.2. Incident Management Architecture overview
2.3. System context monitoring with IBM Monitoring and Analytics
3. Prerequisites
3.1. System requirements
3.2. Browsers
4. Set up the Monitoring and Analytics service to your application
4.1. There are two types of Monitoring and Analytics offerings available:
4.1.1. Free service
4.1.2. Diagnostics
4.2. Add the Monitoring and Analytics service to your application's components
4.3. Bind Monitoring and Analytics to the other application components
5. Use the Bluemix console for application resource consumption
6. Launch the Monitoring and Analytics user interface
6.1. Launch the UI from the services workspace
6.2. Launch Monitoring and Analytics from the application workspace
7. Use cases and roles
8. Using the Monitoring and Analytics UI (DevOps)
8.1. Availability tab
8.2. Performance monitoring
8.3. Log Analysis tab
8.3.1. Use the search feature
8.4. Events tab
8.4.1. Enable alarms
8.5. Configure notifications
9. Related links
1. INTRODUCTION
IBM® Bluemix® is a Platform as a Service (PaaS) offering that lets you rapidly build and deploy applications. Applications can run natively in Bluemix or in a hybrid fashion.
This guide provides step-by-step instructions for deploying the IBM Monitoring and Analytics Service to your application.
For this guide, we will monitor the developerWorks Microservices Store Sample application. You can read about this application in the article “Microservices Store Sample on Bluemix,” which also shows you how the application was installed.
We will cover the setup of the Monitoring and Analytics service and its integration with PagerDuty and IBM Netcool Operations Insight (NOI). Note: To follow along, you must have an IBM NOI deployment on premises or installed in the cloud.
1.1. PLATFORM AS A SERVICE SUPPORT MODEL
As you can see in the diagram below, the Bluemix team monitors the Bluemix PaaS infrastructure. As the application owner (the customer), you must provide monitoring for your application.
1.2. IBM'S MONITORING AND ANALYTICS FEATURES
With IBM Monitoring and Analytics, you can:
• Maintain visibility and control over your applications.
• Determine the response time your users see.
• Understand the performance and availability of the application components.
• Leverage analytics to keep your application up and performing well.
• Receive automatic notifications should an application problem occur.
• Integrate with event management and alert notification solutions.
2. SOLUTION OVERVIEW
2.1. SERVICE MANAGEMENT ARCHITECTURE OVERVIEW
Cloud-based applications need to be available all the time. Proper processes need to be put in place to ensure availability and performance. These include incident and problem management to respond to outages, and also release management to ensure seamless deployment and release of new versions.
Visit the IBM Cloud Architecture site to view a standard Service Management Architecture, including its runtime flow and functional and non-functional requirements.
2.2. INCIDENT MANAGEMENT ARCHITECTURE OVERVIEW
Incident management is a key component of cloud service management. An Incident Management architecture restores normal service operations as quickly as possible to maintain optimal service quality and availability.
Visit the IBM Cloud Architecture Center to view a standard Incident Management Architecture, including its runtime flow and functional and non-functional requirements.
2.3. SYSTEM CONTEXT MONITORING WITH IBM MONITORING AND ANALYTICS
Incident management can be handled with a variety of IBM and third-party tools. In this document, we use the IBM Bluemix Monitoring and Analytics Service to show how to monitor cloud-based applications that are deployed on Bluemix Public or Bluemix Dedicated.
Figure 1 shows an end-to-end view of an incident management solution for a cloud-based application. It also shows how to use IBM Monitoring and Analytics to monitor the application alongside other selected toolchains.
Figure 1 – System context diagram for Monitoring and Analytics
The following flow describes the setup and operation of this solution in the overall cloud service management space:
1. The online store's microservices application selected in this solution is hosted in Public or Dedicated Bluemix and bound to the Monitoring and Analytics service. Diagnostics for the Node.js and WebSphere Liberty components are added to the application.
2. Diagnostics capture the alerts or exceptions from an application and display them on a New Relic dashboard.
3. The Monitoring and Analytics service is integrated through an email channel with IBM Netcool/OMNIbus, an event-correlation tool. First responders look at IBM Netcool/OMNIbus for raised issues. They either use runbooks for guidance to recover from application failures or involve incident owners to investigate.
4. Netcool/OMNIbus correlates events and, depending on policies and rules, generates the necessary alerts and publishes them to Slack, a third-party collaboration tool. Slack enables incident owners to collaborate with incident specialists, who in turn validate issues in Netcool/OMNIbus or dashboard tools to inspect and resolve them.
Note: You can directly integrate the Monitoring and Analytics service with Slack, a collaboration tool, and PagerDuty, a notification tool. However, IBM recommends that you use an event-correlation system to handle applications that are built as microservices or that are deployed in a hybrid environment.
3. PREREQUISITES
3.1. SYSTEM REQUIREMENTS
Before you can include the Monitoring and Analytics Service, you need:
• A working Bluemix account
• Bluemix and Cloud Foundry command-line tools
• An installed and working application
• 110 megabytes (MB) of space in your application environment to store your application's log data.
• If you use the free version of the Monitoring and Analytics Service to monitor Node.js-based applications, you must ensure that the memory quota for the application is at least 512 MB.
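The two sizing requirements above can be checked before you add the service. The following Python sketch encodes them as a simple pre-flight check. The `AppConfig` fields, the runtime labels such as `"node"`, and the plan names are hypothetical; fill them in from your own application settings, such as the memory quota you assigned to the app in Bluemix.

```python
from __future__ import annotations
from dataclasses import dataclass

# Minimums taken from the system requirements in this guide.
LOG_SPACE_MB = 110          # space needed for the application's log data
FREE_NODE_MEMORY_MB = 512   # memory quota for Node.js apps on the free plan

@dataclass
class AppConfig:
    """Hypothetical description of one application component."""
    name: str
    runtime: str       # hypothetical labels: "node", "liberty", "ruby", "php"
    memory_mb: int     # memory quota assigned to the application
    free_disk_mb: int  # free space in the application environment
    plan: str          # "free" or "diagnostics"

def check_prerequisites(app: AppConfig) -> list[str]:
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if app.free_disk_mb < LOG_SPACE_MB:
        problems.append(f"{app.name}: needs {LOG_SPACE_MB} MB for log data, "
                        f"has {app.free_disk_mb} MB")
    if (app.plan == "free" and app.runtime == "node"
            and app.memory_mb < FREE_NODE_MEMORY_MB):
        problems.append(f"{app.name}: Node.js on the free plan needs a "
                        f"{FREE_NODE_MEMORY_MB} MB memory quota, has {app.memory_mb} MB")
    return problems

if __name__ == "__main__":
    print(check_prerequisites(AppConfig("catalog-api", "node", 256, 500, "free")))
```

The memory check applies only to Node.js on the free plan, because that is the only combination the requirements above single out.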
3.2. BROWSERS
If you use Microsoft Internet Explorer, make sure the security and privacy settings are set to medium-high or lower.
If you use Mozilla Firefox or Google Chrome, make sure that third-party cookies are allowed.
4. SET UP THE MONITORING AND ANALYTICS SERVICE TO YOUR APPLICATION
4.1. THERE ARE TWO TYPES OF MONITORING AND ANALYTICS OFFERINGS AVAILABLE:
4.1.1. Free service
You can use the free plan to monitor the availability of any type of application and to get basic monitoring and log analysis for Liberty, Node.js, and Ruby-based applications. The free plan offers the following capabilities:
• Displays instant results with capabilities that are built in to the development environment, helping you to quickly identify bottlenecks in the monitored application. These results give you insight into how your code might affect the performance of the monitored applications.
• Easy-to-use dashboards and integrated analytics-powered search help you quickly and easily find the root cause of issues in the code.
• Integrated log file search and analysis capabilities help you to quickly identify errors.
• Integrated event monitoring, alarms, and notifications.
4.1.2. Diagnostics
The Diagnostics plan contains the same features as the free plan, plus the latest features for enhanced performance monitoring of your applications. You can use specialized widgets and metrics to get a deeper analysis of your applications, including detailed information about application performance.
4.2. ADD THE MONITORING AND ANALYTICS SERVICE TO YOUR APPLICATION'S COMPONENTS
The following steps show you how to add the service to monitor your applications.
1) Log in to IBM Bluemix to see your applications and the associated services. In our scenario, the online store sample is made up of three applications (microservice APIs): PHP, Node.js, and Liberty. The application also uses five Bluemix-provided services.
2) There are several ways to add the Monitoring and Analytics service to your Bluemix workspace. You can click an application icon or select the Services and APIs tile in the upper right. For this example, we will work with each application.
3) Click the Microservices-CatalogAPI application to open it, and select ADD A SERVICE OR API.
The Catalog opens so that you can select your service.
4) Scroll down to the DevOps section or click DevOps in the filter menu on the left:
5) Click the Monitoring and Analytics icon. You will be asked to select the free or Diagnostics (fee-based) service. In our scenario, we select the Diagnostics version.
6) Under “Selected Plan,” choose Diagnostics.
7) Next, click the Create button to add the Monitoring and Analytics service.
8) Once the service is created, you will be prompted to restage your application. Click Restage.
9) Once the application is restaged (restarted), the Monitoring and Analytics service is ready to start monitoring. The service detects whether your application is Liberty, Node.js, or Ruby. Allow up to five minutes for the Monitoring and Analytics service to display data.
Important Note: For application components other than Node.js, Liberty, or Ruby (such as PHP), you will have availability data, but not diagnostics data.
We will address how to customize Monitoring and Analytics in section 8, but first let's finish adding the service to our application.
10) Click Return to Dashboard in the upper left portion of your screen.
4.3. BIND MONITORING AND ANALYTICS TO THE OTHER APPLICATION COMPONENTS
Now that you have added the Monitoring and Analytics service to your application, you need to bind the service to the other application components.
Click the Microservices Orders API application icon to open the application.
1) Click BIND A SERVICE OR API.
A pop-up menu lists all services available to your application. Select Monitoring and Analytics as shown below, and click ADD.
2) The Monitoring and Analytics service is added to your application, and the application must be restaged:
3) Click RESTAGE to restart your application.
Repeat these steps for the remaining application components. Your application is now monitored.
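The console steps in sections 4.2 and 4.3 (create the service once, then bind it to each component and restage) can also be scripted with the Cloud Foundry command-line tools listed in the prerequisites. The Python sketch below only builds, and optionally runs, the standard `cf create-service`, `cf bind-service`, and `cf restage` commands. The service offering label, plan name, instance name, and application names are placeholders, not values confirmed by this guide; run `cf marketplace` to see the real offering and plan names available to you.

```python
from __future__ import annotations
import subprocess

# PLACEHOLDERS, not values from this guide: confirm the offering label and
# plan name with `cf marketplace`, and use your own instance and app names.
SERVICE = "monitoringandanalytics"   # hypothetical service offering label
PLAN = "Diagnostics"                 # hypothetical plan name
INSTANCE = "store-monitoring"        # hypothetical service instance name

def bind_commands(apps: list[str], create: bool = True) -> list[list[str]]:
    """Build cf CLI calls mirroring the console steps: optionally create the
    service instance, then bind it to each app and restage that app."""
    cmds = []
    if create:
        cmds.append(["cf", "create-service", SERVICE, PLAN, INSTANCE])
    for app in apps:
        cmds.append(["cf", "bind-service", app, INSTANCE])
        cmds.append(["cf", "restage", app])  # restage so the app picks up the binding
    return cmds

def run(cmds: list[list[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical app names for the remaining components; dry run only.
    for cmd in bind_commands(["Microservices-OrdersAPI", "Microservices-UI"], create=False):
        print(" ".join(cmd))
```

Calling `run(bind_commands(...))` would execute the commands; the dry run prints them first so that you can review the names before anything changes.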
5. USE THE BLUEMIX CONSOLE FOR APPLICATION RESOURCE CONSUMPTION
Once logged in to Bluemix, click the application tile:
1) Click the Node.js icon for a view of the resources consumed in Bluemix, including disk, memory, and CPU.
6. LAUNCH THE MONITORING AND ANALYTICS USER INTERFACE
The Monitoring and Analytics service user interface displays KPIs for your application. Let's look at the user interface, which you can launch in several ways.
6.1. LAUNCH THE UI FROM THE SERVICES WORKSPACE
2) Click the MONITORING AND ANALYTICS tile in your service workspace:
3) To view the summary data for your components, click Show Application Summary in the Action field:
Here you can gain insight into the availability and performance of your application components.
4) To drill down to the component level, click the specific component’s name:
You will be presented with a summary view of the application and the services it is using.
5) Click Monitoring and Analytics in the left navigation:
The next screen shows the tabbed UI for Monitoring and Analytics.
There are four tabs for Liberty, Ruby, and Node.js: Availability, Performance Monitoring, Log Analysis, and Events.
There are two tabs for other application components: Availability and Events.
6.2. LAUNCH MONITORING AND ANALYTICS FROM THE APPLICATION WORKSPACE
You can also launch the UI from the application workspace. To do this, click the Monitoring and Analytics tile in the workspace or Monitoring and Analytics in the left navigation menu.
7. USE CASES AND ROLES
Now that your application is monitored, let's see how Monitoring and Analytics helps you resolve an incident. The following flow describes the roles of the people involved in responding to an incident and the steps they take to resolve it.
The numbers in brackets point to the sections of this document that each persona uses:
First responder:
1. Receives the alerts through a notification or ticketing system (PagerDuty in this scenario). [8.5]
2a. Assigns an incident owner to the ticket.
2b. Focuses on speed and resolves the issue by using the prescribed scripts (either in a runbook or manual). [5, 8.1, 8.4]
3. Collaborates with the application owner if more details are needed.
4a. Closes the ticket if the issue is resolved in step 2b.
4b. If the issue is not resolved, the incident owner continues to work on the problem until it is resolved.
Incident owner:
1. Receives the incident assignment information.
2a. Collaborates on Slack with SMEs to understand how to handle the problem.
2b. Collaborates with the first responder or application owner if more information is needed.
3. Brings the incident to closure.
4. Opens a problem ticket for root cause analysis.
Subject Matter Expert (SME):
1. Investigates the problem in monitoring tools for more details. [8.1, 8.2, 8.4]
2. Inspects logs.
3. Tests and verifies issues.
4. Recommends fixes if instructions are missing.
Site reliability engineer (SRE):
1. Pursuing Maximum Change Velocity Without Violating a Service's SLO
2. Monitoring
3. Emergency response
4. Change Management
5. Demand Forecasting and Capacity Planning
6. Provisioning
7. Efficiency and Performance
8. USING THE MONITORING AND ANALYTICS UI (DEVOPS)
All monitored applications have the Availability tab in Public Bluemix. This information is generated by URL pings conducted within Bluemix. The tabs also have an icon identifying the client personnel who are the key users of this interface.
8.1. AVAILABILITY TAB
The Availability tab provides basic information about your Bluemix application URL, including results from the last seven days of data. This data helps you answer the following questions:
• Has my Bluemix application been consistently reachable and responsive?
• Were there time periods when my application response times were unusually slow?
• Were there time periods when my application was down, as signified by 404 or other non-200 status codes?
The figure below shows a sample of this tab with the basic Bluemix information.
From the drop-down menu on the right, you can adjust the time range you are viewing: last hour, last 24 hours, or last 7 days.
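The Availability tab's data comes from periodic URL pings. As a rough illustration of how ping results answer the three questions above, the following Python sketch summarizes pings for one of the drop-down time ranges. The ping record format and the 2,000 ms "slow" threshold are assumptions for illustration, not values used by the service.

```python
from datetime import datetime, timedelta

# Time ranges offered by the Availability tab's drop-down menu.
RANGES = {
    "last hour": timedelta(hours=1),
    "last 24 hours": timedelta(hours=24),
    "last 7 days": timedelta(days=7),
}

def summarize(pings, now, range_name, slow_ms=2000):
    """Summarize ping results for one time range.

    pings: iterable of (timestamp, http_status, response_ms) tuples.
    slow_ms: hypothetical threshold for an "unusually slow" response.
    Returns (availability_percent, down_times, slow_times); a ping with
    any status other than 200 counts as down.
    """
    start = now - RANGES[range_name]
    window = [p for p in pings if start <= p[0] <= now]
    if not window:
        return None, [], []
    down = [ts for ts, status, _ in window if status != 200]
    slow = [ts for ts, status, ms in window if status == 200 and ms > slow_ms]
    percent = 100.0 * (len(window) - len(down)) / len(window)
    return percent, down, slow
```

Treating every non-200 status as "down" follows the third question above; a real check might also treat timeouts or connection errors as failures.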
8.2. PERFORMANCE MONITORING
For Liberty, Node.js, and Ruby, the Performance Monitoring tab lets you view diagnostic data associated with the application. The image below depicts the Node.js API component of the online store sample application. Data from the last seven days is displayed so that you can answer the following questions:
• Were there any unusual spikes in CPU usage?
• How much heap memory has my application used, and is it within acceptable limits?
• What is my application's thread pool usage, and is it what I expected?
• How frequently and for what duration has garbage collection run, and might that impact performance?
• What pages were the most frequent target of GET and POST requests in my Node.js application?
For Node.js, you can see the slowest requests, CPU usage, throughput and response time, and memory usage. To drill down into the requests, select Diagnose.
On the next screen, you can see details for the requests, including a summary of all recent requests. To drill down into a request, select View instance data.
On the next page, you will see the instance data. Select the instance with the highest response time and click View request sequence.
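The drill-down above amounts to ranking requests by response time. As a rough illustration, the following Python sketch ranks URLs by average response time and picks the single slowest instance. The `RequestInstance` record is a hypothetical stand-in for the data shown in the UI; the service does not expose this structure.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RequestInstance:
    """Hypothetical record for one observed request instance."""
    request_id: str
    url: str
    response_ms: float

def slowest_requests(instances, n=5):
    """Rank URLs by average response time, slowest first."""
    totals = defaultdict(lambda: [0.0, 0])   # url -> [sum_ms, count]
    for r in instances:
        totals[r.url][0] += r.response_ms
        totals[r.url][1] += 1
    averages = {url: total / count for url, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]

def slowest_instance(instances):
    """Return the single instance with the highest response time, or None."""
    return max(instances, key=lambda r: r.response_ms, default=None)
```

Averaging per URL gives the summary view; picking the maximum mirrors the step of opening the instance with the highest response time before viewing its request sequence.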
On the next screen, click the + next to the entry under Event Name to expand a view of the methods.
Next, click ./routes./items.list to see a stack trace for that event.
The following image shows a sample stack trace.
8.3. LOG ANALYSIS TAB
For Liberty, Node.js, and Ruby, the Log Analysis tab displays the log file data that is associated with the application. The following example shows the Node.js API component of the online store sample application. Use the Log Analysis tab to search the log files that your application generated in the previous 24 hours, identify errors, and graph search results.
Initially, you are presented with the last 15 minutes of log entries. This window contains several features, including the ability to hover over a graph entry to see the date, time, and number of occurrences, as well as time filters and time sliders.
8.3.1. Use the search feature
To search, type a search term and click Search. To view distribution information for all logs, type the wildcard character, an asterisk (*), in the Search field.
To search for a partial string, type an asterisk (*) at the start and end of your search string. For example, to search for strings that contain the phrase "hostname", type *hostname*.
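The two wildcard forms described above behave like shell-style patterns: `*` alone matches everything, and `*hostname*` matches any record that contains "hostname". The following Python sketch uses the standard `fnmatch` module to illustrate that behavior on a list of log lines. It applies the pattern to the whole record and is case-insensitive by default; both choices are assumptions, and it does not model how the service treats a search term with no wildcards.

```python
from fnmatch import fnmatchcase

def wildcard_search(records, pattern, ignore_case=True):
    """Return the records that match a shell-style wildcard pattern.

    '*' alone matches every record; '*term*' matches any record that
    contains 'term'. Note that fnmatch also treats '?' and '[...]' as
    wildcards, which the Log Analysis search may not.
    """
    if ignore_case:
        pattern = pattern.lower()
    return [r for r in records
            if fnmatchcase(r.lower() if ignore_case else r, pattern)]

logs = ["GET /items 200", "Resolved hostname db-1", "HOSTNAME lookup failed"]
print(wildcard_search(logs, "*hostname*"))
```

`fnmatchcase` is used instead of `fnmatch` so that case handling is explicit and does not depend on the operating system.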
After you click Search, the log records that contain a match for your search term are displayed. A graph that shows the distribution of matching events in the log is also displayed.
The search results timeline displays a graph that shows the distribution of log events over a time period. You can use the timeline slider to view the logs for a specific duration. To drill down to a more specific time range, click the relevant bar on the Search UI.
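The timeline graph is essentially a histogram of matching events over time. The following Python sketch shows one way to compute such a histogram: it floors each matching timestamp to a fixed bucket and counts the matches per bucket, with optional bounds that play the role of the timeline slider. The one-minute default bucket is an assumption; this guide does not say what bucket size the UI uses.

```python
from collections import Counter
from datetime import datetime, timedelta

def timeline(timestamps, bucket=timedelta(minutes=1), start=None, end=None):
    """Count matching log events per time bucket, like the search timeline.

    timestamps: datetimes of the log records that matched a search.
    start/end: optional bounds, playing the role of the timeline slider.
    Returns a sorted list of (bucket_start, count) pairs.
    """
    counts = Counter()
    for ts in timestamps:
        if (start is not None and ts < start) or (end is not None and ts >= end):
            continue
        # Floor the timestamp to the start of its bucket.
        bucket_start = ts - (ts - datetime.min) % bucket
        counts[bucket_start] += 1
    return sorted(counts.items())
```

Clicking a bar to drill down corresponds to calling the function again with that bar's `start` and `end` and a smaller `bucket`.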
In the following example, all logs were searched for the text “Deepdive.”
For a more readable format, click the grid box in the upper left of the search results. This grid view lets you sort on a column by clicking the column title.
To filter your results by data source, click the associated data source text. Your search bar updates automatically; then click the Search button.
The Log Analysis tab is a powerful tool that lets you quickly and easily locate issues in your application logs.
8.4. EVENTS TAB
To help you monitor the performance and availability of your application, you can use the event monitoring features on the Events tab to view the latest events, send email notifications, and configure alarms for the events that you want to monitor.
You can only use the alarms that are listed in the tables below. Custom alarms are currently not supported. Once you have enabled the alarms, the event window will display the last ten alerts.
The Monitoring and Analytics service for this application will be integrated with PagerDuty and a cloud-based installation of NOI via email. Through the NOI event management tool, the alerts are also sent to Slack. The setup of PagerDuty and NOI is covered separately.
8.4.1. Enable alarms
The SME should enable alarms when adding the Monitoring and Analytics service to the application.
1. Open the Events tab.
2. Click the Configuration wheel in the upper right corner of the page.
Select Configure events policy.
For each component, there are preconfigured alarms you can enable. Select the Enabled check box to activate a threshold alarm. Click Next to enable alarms on the next page.
Click Done. Your applica+on component is now monitored.
Repeat these steps for each application component.
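Two behaviors of the Events tab described in this section can be modeled in a few lines: preconfigured threshold alarms only fire once they are enabled, and the event window shows only the last ten alerts. The following Python sketch illustrates both with a bounded `deque`. The alarm names, metric keys, and limits are hypothetical; the real alarms are the predefined ones the service lists, and custom alarms are not supported.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ThresholdAlarm:
    """Hypothetical preconfigured alarm: fires when a metric exceeds a limit."""
    name: str
    metric: str
    limit: float
    enabled: bool = False   # alarms stay off until you select Enabled

class EventWindow:
    """Keeps only the most recent alerts, like the Events tab's last-ten view."""

    def __init__(self, size=10):
        self.alerts = deque(maxlen=size)   # oldest alerts drop off automatically

    def evaluate(self, alarms, sample):
        """Check one metrics sample (dict of metric -> value) against every
        enabled alarm and record an alert for each threshold exceeded."""
        for alarm in alarms:
            value = sample.get(alarm.metric)
            if alarm.enabled and value is not None and value > alarm.limit:
                self.alerts.append(f"{alarm.name}: {alarm.metric}={value} > {alarm.limit}")

    def latest(self):
        """Return the retained alerts, newest first."""
        return list(reversed(self.alerts))
```

A `deque` with `maxlen` discards the oldest entry on each append once it is full, which is the simplest way to keep a fixed-size "last N" view.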
8.5. CONFIGURE NOTIFICATIONS
The SME should configure notifications when adding the Monitoring and Analytics service to the application. To do this, follow these steps:
1. Select Configure notification in the customization wheel in the event view window:
Enter or paste your email addresses in the pop-up window. Click Done.
The alerts are sent to PagerDuty and NOI. You can send the notifications to more than one email address by separating the addresses with commas. You only need to configure the notifications once: all applications that use the Monitoring and Analytics service have the same notification list.
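Because the notification list is a single comma-separated field, it can be useful to check the list before you paste it in, for example when it combines the email addresses of your PagerDuty and NOI integrations. The following Python sketch splits such a list, trims whitespace, skips empty entries and case-insensitive duplicates, and flags entries that do not look like addresses. The loose pattern and the deduplication are choices of this sketch, not behavior of the service, and the example addresses are placeholders.

```python
import re

# Deliberately loose check: something@something.something with no spaces.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def parse_recipients(text):
    """Split a comma-separated address list as typed into the pop-up.

    Returns (good, bad): trimmed, de-duplicated addresses that look valid,
    and entries that do not. Empty entries (such as ",,") are skipped.
    """
    seen, good, bad = set(), [], []
    for part in text.split(","):
        addr = part.strip()
        if not addr or addr.lower() in seen:
            continue
        seen.add(addr.lower())
        (good if EMAIL.match(addr) else bad).append(addr)
    return good, bad

print(parse_recipients("ops@example.com, alerts@example.com,, not-an-email"))
```

Real address validation is more involved than this pattern; the goal here is only to catch typos such as a missing `@` before the list is saved.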
9. RELATED LINKS
For more information, please see:
Bluemix: https://ibm.biz/Bd45Yr
Monitoring and Analytics Service in Bluemix: https://ibm.biz/Bd4nA
NOI: https://ibm.biz/Bd45YR
PagerDuty: https://ibm.biz/Bd45Z5
Slack: https://ibm.biz/Bd45YB