Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

MICHAEL S. COLLIER / Principal Cloud Architect

MIKE WOOD / Technical Evangelist, Cerebrata

Slides from our #GoCloudWebinar series. In this presentation, you will learn how to incorporate the necessary diagnostic tools into your application so you can monitor and take action on your Azure applications. Michael Collier, Principal Cloud Architect at Aditi, and our guest speaker, Mike Wood, Technical Evangelist at Cerebrata, give you insights on how to best troubleshoot your Microsoft Azure applications.

Transcript of Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

Page 1: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

MICHAEL S. COLLIER / Principal Cloud Architect

MIKE WOOD / Technical Evangelist, Cerebrata

Page 2: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

www.aditi.com

MICHAEL S. COLLIER

Principal Cloud Architect, Aditi

[email protected]

@MichaelCollier

www.MichaelSCollier.com

Page 3: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

MIKE WOOD

Technical Evangelist, Cerebrata

[email protected]

@mikewo

www.mvwood.com

Page 4: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

TODAY’S AGENDA

1 / The need for diagnostic data in cloud applications

2 / Data we can monitor

3 / Using the Microsoft Azure Diagnostic Agent

4 / Real-world guidance for troubleshooting Microsoft Azure apps

Page 5: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

SUCCESS VS. FAILURE

Successful projects share at least one common trait . . .

node.js, C#, Java

Agile - vs - Waterfall

Page 6: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

SUCCESS VS. FAILURE

Successful projects share at least one common trait . . .

Diagnostics Data / Telemetry

Page 7: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

A TRUE STORY

Scenario

o Determine if solution is production ready

o Deployed as a Microsoft Azure Cloud Service

o No load tests

o No performance tests

o No unit tests

o Very little instrumentation

We Have Problema

Page 8: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

A TRUE STORY

Resolution

o Step 0 – Enable Microsoft Azure diagnostics
  – Set key performance counters
o Step 1 – Add logging statements around key functionality
  – Especially external services
o Step 3 – Test, test, test
o Step 4 – Analyze
o Step 5 – Fix it

Page 9: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

INSTRUMENTATION MORE IMPORTANT IN “THE CLOUD”

o Need to have good instrumentation for on-premises applications

o Cloud – it matters more!

o Distributed environments and services

o Composite applications

o Reliance on 3rd party vendors . . . such as Microsoft for Azure

o Highly automated environments

o Scale out model

o Massive amounts of data

Page 10: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

THE CLOUD SCALES . . . YOU DO NOT

worker roles

web roles

o Event Logs – 4x

o Performance Counters – 4x

o Trace Logs – 4x

o log4net/nlog/[custom] – 4x

o IIS Logs – 4x

Page 12: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

GATHERING DATA

Performance Counters

Event Logs

Trace Logs

IIS Logs

Crash Dumps

Custom Log Files

Page 14: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

GATHERING DATA

Performance Counters

Event Logs

Trace Logs

IIS Logs

Crash Dumps

Custom Log Files

Azure Storage

Page 15: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

HOW DOES IT GET THERE?

1 / The role instance starts up

2 / The diagnostics monitor process starts

3 / Diagnostics is configured: by code, file (wadcfg), or remotely

4 / Data is buffered locally on each instance in a rolling buffer

5 / Data is saved to the storage account, either on a configured schedule or on demand
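The buffer-then-transfer flow described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the diagnostics agent itself: `InstanceBuffer`, `run_demo`, and the list standing in for the storage account are hypothetical names, and clearing the buffer after a transfer is a simplification made for the sketch.

```python
from collections import deque


class InstanceBuffer:
    """Hypothetical stand-in for the local, per-instance diagnostics buffer."""

    def __init__(self, capacity):
        # deque(maxlen=...) drops the oldest entry once the buffer is full,
        # which gives the "rolling" behaviour.
        self.entries = deque(maxlen=capacity)

    def record(self, entry):
        self.entries.append(entry)

    def transfer(self, storage):
        """Copy buffered entries to storage and return how many were moved."""
        moved = len(self.entries)
        storage.extend(self.entries)
        self.entries.clear()  # simplification for this sketch
        return moved


def run_demo():
    storage = []                       # stands in for the storage account
    buffer = InstanceBuffer(capacity=3)
    for number in range(5):            # five entries arrive before a transfer
        buffer.record(f"trace-{number}")
    moved = buffer.transfer(storage)   # scheduled or on-demand transfer
    return moved, storage
```

Because the buffer only holds three entries, the two oldest traces are gone before the transfer runs. That is the practical risk of a buffer quota that is too small for the transfer period.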

Page 16: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

WHERE DOES THE DATA GO?

Azure Storage

Table Storage: Performance Counters, Event Logs, Trace Logs

BLOB Storage: IIS Logs, Crash Dumps, Custom Log Files

Page 18: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CONFIGURATION

Page 19: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CONFIGURATION

Default Configuration
• Trace Messages
• IIS Logs
• Azure Infrastructure Logs
• No Transfer

Imperative Configuration
• Usually handled in RoleEntryPoint.OnStart
• Overrides Default

Declarative Configuration
• diagnostics.wadcfg file
• Overrides imperative

Page 20: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CONFIGURATION – DIAGNOSTICS.WADCFG

<?xml version="1.0" encoding="utf-8"?>
<DiagnosticMonitorConfiguration configurationChangePollInterval="PT1M" overallQuotaInMB="8192" xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <DiagnosticInfrastructureLogs bufferQuotaInMB="512" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <Directories bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <IISLogs container="wad-iis-logfiles" directoryQuotaInMB="100" />
    <FailedRequestLogs container="wad-iis-frq-logfiles" directoryQuotaInMB="100" />
  </Directories>
  <Logs bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available MBytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET Applications(__Total__)\Requests/Sec" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Queued" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Rejected" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\Process(w3wp)\% Processor Time" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Committed Bytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\.NET CLR Exceptions(_Global_)\# Exceps Thrown" sampleRate="PT1M" />
  </PerformanceCounters>
  <WindowsEventLog bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning">
    <DataSource name="Application!*" />
    <DataSource name="System!*" />
    <DataSource name="Security!*" />
    <DataSource name="Windows Azure!*" />
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>
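A configuration file like this one is easy to sanity-check before deployment. The Python sketch below, using only the standard library, reads a trimmed copy of the file, adds up the per-buffer quotas, and lists the configured counters. The function name `summarize` and the trimmed `SAMPLE` are illustrative, and the check that the buffer quotas should fit inside `overallQuotaInMB` is stated here as an assumption about how the quotas relate, not as something shown on the slide.

```python
import xml.etree.ElementTree as ET

WAD_NS = "http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"

# Trimmed copy of the configuration (a raw string keeps the backslashes intact).
SAMPLE = r"""<DiagnosticMonitorConfiguration configurationChangePollInterval="PT1M" overallQuotaInMB="8192" xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <DiagnosticInfrastructureLogs bufferQuotaInMB="512" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <Directories bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <IISLogs container="wad-iis-logfiles" directoryQuotaInMB="100" />
  </Directories>
  <Logs bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available MBytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Queued" sampleRate="PT1M" />
  </PerformanceCounters>
  <WindowsEventLog bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning">
    <DataSource name="Application!*" />
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>"""


def summarize(xml_text):
    """Return (overall quota, per-buffer quotas, counter specifiers)."""
    root = ET.fromstring(xml_text)
    overall = int(root.get("overallQuotaInMB"))
    buffers = {}
    for child in root:
        name = child.tag.split("}", 1)[1]   # strip the "{namespace}" prefix
        buffers[name] = int(child.get("bufferQuotaInMB", "0"))
    counters = [
        node.get("counterSpecifier")
        for node in root.iter(f"{{{WAD_NS}}}PerformanceCounterConfiguration")
    ]
    return overall, buffers, counters


if __name__ == "__main__":
    overall, buffers, counters = summarize(SAMPLE)
    used = sum(buffers.values())
    print(f"buffers use {used} MB of {overall} MB")
    print("quota ok" if used <= overall else "buffers exceed overall quota")
    for counter in counters:
        print("counter:", counter)
```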

Page 21: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CONFIGURATION

wadcfg

Remotely Updated or On Demand

Imperative, Default or Declarative

Page 22: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

INSTRUMENTATION VS. TELEMETRY

o Instrumentation – generation of custom monitoring and debugging information, usually via event and error handling code in the application

o Telemetry – process of gathering the information collected by instrumentation

o Microsoft Azure diagnostics enables instrumentation

o 3rd party tools and/or custom processes provide the telemetry to understand

o Apply to development, test, and QA versions – validate performance & ensure telemetry systems are operating correctly

Page 23: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

DEFINE KEY METRICS

o Compute node resource usage
o Windows Event logs
o Database queries response times
o Application specific exceptions
o Database connection & cmd failures
o Microsoft Azure Storage Analytics

Process for Microsoft Azure hosted solutions is not that different from traditional, on-premises solutions.

Page 24: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

BE REALISTIC

o Sample every 1 minute

o Transfer every 5 minutes

o Transfer only what is needed

o Azure Diagnostics writes data in 60 second wide partitions

– Too much data could overwhelm the partition
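To make the 60-second partition point concrete, here is a small Python sketch. It assumes the partition key is the .NET tick count of the sample time, truncated to the minute and prefixed with a zero, which is how the diagnostics tables are commonly described; treat that format as an assumption. The function names and the instance and counter counts are illustrative.

```python
from datetime import datetime

TICKS_PER_SECOND = 10_000_000      # one .NET tick is 100 nanoseconds
DOTNET_EPOCH = datetime(1, 1, 1)   # tick zero


def to_ticks(moment):
    """Convert a naive UTC datetime to a .NET-style tick count."""
    delta = moment - DOTNET_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * TICKS_PER_SECOND + delta.microseconds * 10


def partition_key(moment):
    """Assumed key format: '0' + ticks of the timestamp truncated to the minute."""
    minute = moment.replace(second=0, microsecond=0)
    return "0" + str(to_ticks(minute))


def rows_per_partition(instances, counters, samples_per_minute=1):
    """Rows landing in one 60-second partition if every sample is one row."""
    return instances * counters * samples_per_minute


if __name__ == "__main__":
    print(partition_key(datetime(2014, 4, 1, 12, 30, 5)))
    print(partition_key(datetime(2014, 4, 1, 12, 30, 55)))  # same minute, same key
    print(rows_per_partition(instances=4, counters=7))
```

Every sample taken within the same minute shares a key, so raising the sample rate or the instance count multiplies the rows written to a single partition.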

Page 29: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

SET PRIORITIES

o Two separate channels for telemetry data

– Vital information

• Application or service failures. Higher level of alerting.

• Fix and return to “normal” as soon as possible

– Day-to-day operational data

• Root cause analysis

• How to prevent in the future

o Fine tune the alerts – reduce false alarms and noise
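One way to picture the two channels is with Python's standard `logging` module: a single logger feeds a "vital" handler that only accepts errors and an "operational" handler that keeps everything. This is a sketch only. `ListHandler` and `build_telemetry_logger` are hypothetical helpers, and the in-memory lists stand in for whatever alerting system and long-term store a real project would use.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in memory; a stand-in for a real sink."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def build_telemetry_logger(name="telemetry"):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    vital = ListHandler(logging.ERROR)        # failures: alert right away
    operational = ListHandler(logging.DEBUG)  # everything: root cause analysis
    logger.addHandler(vital)
    logger.addHandler(operational)
    return logger, vital, operational


if __name__ == "__main__":
    logger, vital, operational = build_telemetry_logger()
    logger.info("cache miss for product 42")
    logger.error("database connection failed")
    print("vital:", vital.records)
    print("operational:", operational.records)
```

Raising or lowering the level on the vital handler is the knob for fine tuning alerts: it changes what pages someone without losing anything from the day-to-day record.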

Page 30: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CONSIDERATIONS

o Log all calls to external services

o Helpful for SLA violations or challenging a provider

o Log details of transient faults

o Partition telemetry data by date (or hour) – reduce impact of data aggregation or reporting

o Use a different storage account!

o Remove old / non-relevant telemetry data
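The first three considerations can be combined in a small wrapper. The Python sketch below logs every call to an external service with its duration, records the details of each transient fault, and retries a limited number of times. `TransientFault`, `logged_external_call`, and the service name are hypothetical; a real application would map its own client library's exceptions onto the transient category.

```python
import functools
import logging
import time

log = logging.getLogger("external_calls")


class TransientFault(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def logged_external_call(service, retries=2):
    """Log duration and outcome of each attempt; retry on transient faults."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = retries + 1
            for attempt in range(1, attempts + 1):
                started = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                except TransientFault as fault:
                    elapsed_ms = (time.perf_counter() - started) * 1000
                    log.warning("%s attempt %d/%d transient fault after %.1f ms: %s",
                                service, attempt, attempts, elapsed_ms, fault)
                    if attempt == attempts:
                        raise  # out of retries: surface the fault
                else:
                    elapsed_ms = (time.perf_counter() - started) * 1000
                    log.info("%s attempt %d/%d succeeded in %.1f ms",
                             service, attempt, attempts, elapsed_ms)
                    return result

        return wrapper

    return decorate


@logged_external_call("inventory-service")
def fetch_stock(item_id, _state={"calls": 0}):
    # Demo only: fail once with a transient fault, then succeed.
    _state["calls"] += 1
    if _state["calls"] == 1:
        raise TransientFault("connection timed out")
    return {"item": item_id, "in_stock": True}
```

The logged durations are the evidence needed when challenging a provider over an SLA, and the per-attempt warnings show how often a dependency is failing transiently even when the caller ultimately succeeds.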

Page 31: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

ANALYSIS

Detect First

Transient vs. Systemic

Recover First

Root Cause Analysis

Page 34: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

QUICK ANALYSIS

% Processor Time

ASP.NET\Requests Queued

Memory\Available MBytes
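A quick first pass over these three counters can be automated. The Python sketch below flags samples that cross a threshold. The threshold values and the sample records are made up for illustration and would need tuning against a real baseline; only the counter names come from the deck.

```python
import operator

# Hypothetical thresholds: (comparison, limit). High CPU and a long request
# queue are bad; low available memory is bad.
RULES = {
    r"\Process(w3wp)\% Processor Time": (operator.gt, 80.0),
    r"\ASP.NET\Requests Queued": (operator.gt, 10.0),
    r"\Memory\Available MBytes": (operator.lt, 256.0),
}


def flag_samples(samples):
    """Return the samples whose value crosses the rule for their counter."""
    flagged = []
    for sample in samples:
        rule = RULES.get(sample["counter"])
        if rule is None:
            continue                      # no rule for this counter
        compare, limit = rule
        if compare(sample["value"], limit):
            flagged.append(sample)
    return flagged


if __name__ == "__main__":
    samples = [
        {"instance": "web_0", "counter": r"\Process(w3wp)\% Processor Time", "value": 93.0},
        {"instance": "web_0", "counter": r"\ASP.NET\Requests Queued", "value": 2.0},
        {"instance": "web_1", "counter": r"\Memory\Available MBytes", "value": 120.0},
    ]
    for hit in flag_samples(samples):
        print(hit["instance"], hit["counter"], hit["value"])
```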

Page 35: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

QUICK ANALYSIS

Azure Management Portal

o Visualize key performance counters via graph
o Data collected via host
o Requires co-admin access to subscription
o Default data survives for 7 days
o Shows only performance counters
o No query capability

Azure Management Studio

o Visualize key performance counters via graph
o Data collected via Azure Diagnostics agent
o Anyone with storage account credentials
o Data as long as you want it
o Full suite of instrumentation
o Full query and correlation capability

Page 36: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

SUMMARY

o Instrumentation and telemetry are key to successful projects

o Cloud metrics similar to metrics for traditional applications

o Be realistic and set priorities

o Cerebrata Azure Management Studio an essential tool for troubleshooting

Page 37: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

CEREBRATA OFFER

Thank you for attending or watching the webinar!

15% off Azure Management Studio until April 30th, 2014

http://bit.ly/ams-webinar

Page 38: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

ADITI OFFER

o Aditi provides an onsite cloud expert for a 2 day cloud strategy assessment

– Key objective is to analyze the viability of cloud as a deployment option, including its technical and economic impact for a targeted workload or set of applications and infrastructure

o Deliverables:

• Cloud Strategy Assessment

To know more, email us at [email protected]!

Page 39: Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure

Mike Wood, TECHNICAL EVANGELIST, CEREBRATA

[email protected]

Michael Collier, PRINCIPAL CLOUD ARCHITECT, ADITI

[email protected]

Let’s continue the conversation.
