Stay clear of the bugs: Troubleshooting Applications in Microsoft Azure
MICHAEL S. COLLIER / Principal Cloud Architect
MIKE WOOD / Technical Evangelist, Cerebrata
www.aditi.com
MICHAEL S. COLLIER
Principal Cloud Architect, Aditi
@MichaelCollier
www.MichaelSCollier.com
TODAY’S AGENDA
1 / The need for diagnostic data in cloud applications
2 / Data we can monitor
3 / Using the Microsoft Azure Diagnostic Agent
4 / Real-world guidance for troubleshooting Microsoft Azure apps
SUCCESS VS. FAILURE
Successful projects share at least one common trait . . .
o Not the language (node.js, C#, Java)
o Not the methodology (Agile vs. Waterfall)
o Diagnostics data / telemetry
A TRUE STORY
Scenario
o Determine if solution is production ready
o Deployed as a Microsoft Azure Cloud Service
o No load tests
o No performance tests
o No unit tests
o Very little instrumentation
We have a problem.
Resolution
o Step 0 – Enable Microsoft Azure diagnostics
– Set key performance counters
o Step 1 – Add logging statements around key functionality
– Especially around calls to external services
o Step 2 – Test, test, test
o Step 3 – Analyze
o Step 4 – Fix it
INSTRUMENTATION MORE IMPORTANT IN “THE CLOUD”
o Good instrumentation is needed for on-premises applications
o In the cloud, it matters even more:
– Distributed environments and services
– Composite applications
– Reliance on 3rd-party vendors . . . such as Microsoft for Azure
– Highly automated environments
– Scale-out model
– Massive amounts of data
THE CLOUD SCALES . . . YOU DO NOT
Each web role and worker role instance produces its own copy of every data source, so with four instances:
o Event Logs – 4x
o Performance Counters – 4x
o Trace Logs – 4x
o log4net/NLog/[custom] logs – 4x
o IIS Logs – 4x
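The multiplication is easy to underestimate. Below is a minimal Python sketch of the arithmetic, assuming one stored row per counter, per sample, per role instance (a simplification of how the agent actually stores data). The seven counters and the one-minute sample rate come from the wadcfg in this deck; the instance counts and the name `counter_rows_per_day` are illustrative assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def counter_rows_per_day(counters: int, sample_seconds: int, instances: int) -> int:
    """Estimate performance-counter rows stored per day across all instances.

    Assumes one row per counter, per sample, per role instance.
    """
    if sample_seconds <= 0:
        raise ValueError("sample_seconds must be positive")
    samples_per_day = SECONDS_PER_DAY // sample_seconds
    return counters * samples_per_day * instances


if __name__ == "__main__":
    # Seven counters sampled every minute (PT1M): one instance vs. four
    print(counter_rows_per_day(7, 60, 1), counter_rows_per_day(7, 60, 4))
```

Under these assumptions one instance writes 10,080 counter rows a day and four instances write 40,320, before a single trace or IIS log line is counted.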
GATHERING DATA
o Performance Counters
o Event Logs
o Trace Logs
o IIS Logs
o Crash Dumps
o Custom Log Files
All of these sources are transferred to Azure Storage.
HOW DOES IT GET THERE?
1 / The role instance starts up
2 / The diagnostics monitor process starts
3 / Diagnostics is configured by code, by file (wadcfg), or remotely
4 / Data is buffered locally on each instance in a rolling buffer
5 / Data is transferred to the storage account on a configured schedule or on demand
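Local buffering and the transfer schedule interact: if the rolling buffer fills before the next transfer, the oldest entries are lost. The toy Python model below illustrates that behavior. It is a sketch under stated assumptions, not the agent's implementation: capacity is counted in entries rather than in MB (the wadcfg uses `bufferQuotaInMB`), and `RollingBuffer` and `transfer()` are hypothetical names.

```python
from collections import deque


class RollingBuffer:
    """Toy model of one instance's local diagnostics buffer.

    Capacity is counted in entries (the real quota is in MB), and
    transfer() stands in for a scheduled or on-demand upload.
    """

    def __init__(self, capacity: int):
        # A deque with maxlen discards the oldest entry once it is full.
        self._entries = deque(maxlen=capacity)
        self.storage = []  # stands in for the storage account

    def write(self, entry: str) -> None:
        self._entries.append(entry)

    def transfer(self) -> int:
        """Move everything buffered so far to storage; return how many moved."""
        moved = list(self._entries)
        self.storage.extend(moved)
        self._entries.clear()
        return len(moved)
```

Writing five entries into a three-entry buffer and then transferring moves only the last three; the first two were overwritten before the transfer ran.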
WHERE DOES THE DATA GO?
All diagnostics data is stored in Azure Storage:
o Table storage: Performance Counters, Event Logs, Trace Logs
o Blob storage: IIS Logs, Crash Dumps, Custom Log Files
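A small lookup of where each source usually ends up can save time when browsing a storage account. In this Python sketch the table and container names are the defaults commonly seen with the classic Azure Diagnostics agent and are assumptions to verify against your own account. `wad-iis-logfiles` matches the container set in the wadcfg in this deck, and the custom-log container is whatever your configuration specifies. `destination` and `location` are hypothetical helper names.

```python
# Default destinations commonly seen with the classic Azure Diagnostics agent.
# These names are assumptions to verify against your own storage account.
DESTINATIONS = {
    "Performance Counters": ("table", "WADPerformanceCountersTable"),
    "Event Logs": ("table", "WADWindowsEventLogsTable"),
    "Trace Logs": ("table", "WADLogsTable"),
    "IIS Logs": ("blob", "wad-iis-logfiles"),
    "Crash Dumps": ("blob", "wad-crash-dumps"),
    "Custom Log Files": ("blob", None),  # container comes from your own config
}


def destination(source: str) -> str:
    """Return 'table' or 'blob' for a diagnostics data source."""
    kind, _name = DESTINATIONS[source]
    return kind


def location(source: str) -> str:
    """Return the table or container name, if a default is known."""
    _kind, name = DESTINATIONS[source]
    return name or "<container set in your configuration>"
```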
CONFIGURATION
Default Configuration
• Trace messages
• IIS logs
• Azure infrastructure logs
• No scheduled transfer to storage
Imperative Configuration
• Usually handled in RoleEntryPoint.OnStart
• Overrides the default configuration
Declarative Configuration
• Diagnostics.wadcfg file
• Overrides the imperative configuration
CONFIGURATION – DIAGNOSTICS.WADCFG
<?xml version="1.0" encoding="utf-8"?>
<DiagnosticMonitorConfiguration configurationChangePollInterval="PT1M" overallQuotaInMB="8192" xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <DiagnosticInfrastructureLogs bufferQuotaInMB="512" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <Directories bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <IISLogs container="wad-iis-logfiles" directoryQuotaInMB="100" />
    <FailedRequestLogs container="wad-iis-frq-logfiles" directoryQuotaInMB="100" />
  </Directories>
  <Logs bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning" />
  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available MBytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET Applications(__Total__)\Requests/Sec" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Queued" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Rejected" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\Process(w3wp)\% Processor Time" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Committed Bytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\.NET CLR Exceptions(_Global_)\# of Exceps Thrown" sampleRate="PT1M" />
  </PerformanceCounters>
  <WindowsEventLog bufferQuotaInMB="1024" scheduledTransferPeriod="PT5M" scheduledTransferLogLevelFilter="Warning">
    <DataSource name="Application!*" />
    <DataSource name="System!*" />
    <DataSource name="Security!*" />
    <DataSource name="Windows Azure!*" />
  </WindowsEventLog>
</DiagnosticMonitorConfiguration>
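Because the wadcfg is plain XML, it is easy to sanity-check before deploying. This minimal Python sketch uses only the standard library to list each performance counter with its sample rate in seconds and to read the counters' transfer period. It assumes durations use only the `PTnHnMnS` subset of ISO 8601 seen in this file; the function names and the trimmed `SAMPLE` document are illustrative.

```python
import re
import xml.etree.ElementTree as ET

NS = {"wad": "http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"}
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(value: str) -> int:
    """Convert a PTnHnMnS duration such as 'PT1M' to seconds."""
    match = _DURATION.match(value or "")
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def _root(wadcfg_xml: str) -> ET.Element:
    # Encode first so an <?xml ... encoding="utf-8"?> declaration is accepted.
    return ET.fromstring(wadcfg_xml.encode("utf-8"))


def performance_counters(wadcfg_xml: str) -> list:
    """Return (counterSpecifier, sample seconds) for every configured counter."""
    path = "wad:PerformanceCounters/wad:PerformanceCounterConfiguration"
    return [(pc.get("counterSpecifier"), duration_seconds(pc.get("sampleRate")))
            for pc in _root(wadcfg_xml).findall(path, NS)]


def counter_transfer_seconds(wadcfg_xml: str) -> int:
    """Return the scheduled transfer period of the PerformanceCounters buffer."""
    node = _root(wadcfg_xml).find("wad:PerformanceCounters", NS)
    if node is None:
        raise ValueError("no PerformanceCounters element")
    return duration_seconds(node.get("scheduledTransferPeriod"))


SAMPLE = r"""<?xml version="1.0" encoding="utf-8"?>
<DiagnosticMonitorConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT5M">
    <PerformanceCounterConfiguration counterSpecifier="\Memory\Available MBytes" sampleRate="PT1M" />
    <PerformanceCounterConfiguration counterSpecifier="\ASP.NET\Requests Queued" sampleRate="PT1M" />
  </PerformanceCounters>
</DiagnosticMonitorConfiguration>"""
```

Running `performance_counters(SAMPLE)` returns both counters with a 60-second sample rate, and `counter_transfer_seconds(SAMPLE)` returns 300.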
CONFIGURATION
o At startup: imperative, default, or declarative (wadcfg) configuration
o At run time: configuration updated remotely or on demand
INSTRUMENTATION VS. TELEMETRY
o Instrumentation – generation of custom monitoring and debugging information, usually via event and error handling code in the application
o Telemetry – the process of gathering the information collected by instrumentation
o Microsoft Azure diagnostics enables instrumentation
o Third-party tools and/or custom processes provide the telemetry needed to understand that information
o Apply both to development, test, and QA versions to validate performance and to ensure the telemetry systems are operating correctly
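The split between the two terms can be shown with Python's standard `logging` module. In this sketch the application's own log calls are the instrumentation, and a handler that gathers the emitted records plays the telemetry role. `CollectingHandler`, the `orders` logger, and `place_order` are hypothetical; in a cloud service a collector such as the diagnostics agent would do the gathering.

```python
import logging


class CollectingHandler(logging.Handler):
    """Telemetry side: gathers the records that instrumentation emits.

    Stands in for a real collector (such as a diagnostics agent); here the
    records are simply kept in memory.
    """

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


# Instrumentation side: the application reports events from its own code.
log = logging.getLogger("orders")


def place_order(order_id, in_stock):
    """Hypothetical business operation instrumented with log calls."""
    log.info("placing order %s", order_id)
    if not in_stock:
        log.error("order %s failed: out of stock", order_id)
        return False
    return True
```

After attaching a `CollectingHandler` to the `orders` logger and setting its level to `INFO`, each call to `place_order` leaves its events in `handler.records`, ready for analysis.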
DEFINE KEY METRICS
o Compute node resource usage
o Windows Event logs
o Database query response times
o Application-specific exceptions
o Database connection & command failures
o Microsoft Azure Storage Analytics
The process for Microsoft Azure hosted solutions is not that different from the process for traditional, on-premises solutions.
BE REALISTIC
o Sample every 1 minute
o Transfer every 5 minutes
o Transfer only what is needed
o Azure Diagnostics writes table data in 60-second-wide partitions
– Too much data can overwhelm a single partition
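To see why partition width matters, it helps to compute a partition key and count what lands in it. This Python sketch assumes the "0" + .NET ticks, minute-rounded PartitionKey format commonly observed in classic WAD tables (an observation, not a documented contract) and one row per counter sample per instance. `wad_partition_key` and `entities_per_partition` are hypothetical helper names.

```python
from datetime import datetime, timedelta

# .NET DateTime ticks: 100-nanosecond intervals since 0001-01-01 00:00:00.
_DOTNET_EPOCH = datetime(1, 1, 1)
_TICKS_PER_MINUTE = 60 * 10_000_000


def wad_partition_key(timestamp: datetime) -> str:
    """Minute-wide partition key in the assumed '0' + ticks format.

    `timestamp` must be a naive UTC datetime.
    """
    ticks = (timestamp - _DOTNET_EPOCH) // timedelta(microseconds=1) * 10
    ticks -= ticks % _TICKS_PER_MINUTE  # round down to the start of the minute
    return "0" + str(ticks)


def entities_per_partition(counters: int, instances: int, sample_seconds: int) -> int:
    """Counter rows landing in one 60-second partition."""
    if sample_seconds <= 0 or 60 % sample_seconds:
        raise ValueError("sample_seconds must be a positive divisor of 60")
    return counters * instances * (60 // sample_seconds)
```

With seven counters on four instances, a one-minute sample rate puts 28 rows into each minute's partition; sampling every 5 seconds would put 336 there, which is the kind of growth this slide warns about.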
SET PRIORITIES
o Two separate channels for telemetry data
– Vital information
• Application or service failures, with a higher level of alerting
• Fix and return to “normal” as soon as possible
– Day-to-day operational data
• Root cause analysis
• How to prevent recurrence in the future
o Fine-tune the alerts to reduce false alarms and noise
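The two channels can be sketched as a small router: every event is kept for day-to-day analysis, and only events at or above an alert level also go to the vital channel. The severity scale, the default `Error` cut-off, and the `TelemetryRouter` name are assumptions for illustration; `alert_level` is the knob you would turn when fine-tuning alerts.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale; "Warning" matches the transfer filter used in the wadcfg.
SEVERITY = {"Verbose": 0, "Information": 1, "Warning": 2, "Error": 3, "Critical": 4}


@dataclass
class TelemetryRouter:
    """Route events to a vital (alerting) channel and an operational channel."""

    alert_level: str = "Error"  # assumed cut-off; raise it to cut alert noise
    vital: list = field(default_factory=list)
    operational: list = field(default_factory=list)

    def route(self, level: str, message: str) -> list:
        """Record the event and return the channels it was sent to."""
        event = (level, message)
        channels = []
        if SEVERITY[level] >= SEVERITY[self.alert_level]:
            self.vital.append(event)  # alert someone; restore service first
            channels.append("vital")
        self.operational.append(event)  # kept for root cause analysis
        channels.append("operational")
        return channels
```

Vital events are also kept in the operational channel, so root cause analysis later sees the failures alongside the surrounding day-to-day data.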
CONSIDERATIONS
o Log all calls to external services
– Helpful for documenting SLA violations or challenging a provider
o Log details of transient faults
o Partition telemetry data by date (or hour) to reduce the impact of data aggregation and reporting
o Use a separate storage account for diagnostics data!
o Remove old or no-longer-relevant telemetry data
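The first three considerations fit naturally into one wrapper around external calls. This Python sketch logs the duration and outcome of every attempt, records the details of transient faults while retrying them, and tags each log line with an hour bucket for partitioning. `TransientError`, `call_external`, the `YYYYMMDDHH` bucket format, and the linear backoff are all assumptions, not a specific library's retry policy.

```python
import logging
import time
from datetime import datetime, timezone

log = logging.getLogger("external-calls")


class TransientError(Exception):
    """Hypothetical marker for faults worth retrying (timeouts, throttling)."""


def hour_partition(moment: datetime) -> str:
    """Hour bucket used to partition telemetry, e.g. '2014040114' (assumed format)."""
    return moment.strftime("%Y%m%d%H")


def call_external(service, fn, retries=3, delay_seconds=0.5):
    """Call fn() and log the outcome and duration of every attempt."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        partition = hour_partition(datetime.now(timezone.utc))
        started = time.perf_counter()
        try:
            result = fn()
        except TransientError as exc:
            elapsed_ms = (time.perf_counter() - started) * 1000
            # Details of the transient fault: attempt number, latency, error text
            log.warning("service=%s attempt=%d/%d elapsed_ms=%.1f partition=%s transient=%r",
                        service, attempt, retries, elapsed_ms, partition, exc)
            if attempt == retries:
                raise
            time.sleep(delay_seconds * attempt)  # simple linear backoff
        except Exception as exc:
            elapsed_ms = (time.perf_counter() - started) * 1000
            # Non-transient failure: log it and let the caller handle it
            log.error("service=%s attempt=%d elapsed_ms=%.1f partition=%s failed=%r",
                      service, attempt, elapsed_ms, partition, exc)
            raise
        else:
            elapsed_ms = (time.perf_counter() - started) * 1000
            log.info("service=%s attempt=%d elapsed_ms=%.1f partition=%s ok",
                     service, attempt, elapsed_ms, partition)
            return result
```

Because every attempt is logged with its latency, the same records can later back up an SLA claim against the provider.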
ANALYSIS
o Detect first
o Determine whether the fault is transient or systemic
o Recover first
o Then perform root cause analysis
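Deciding between transient and systemic can start as a simple rule over recent error rates. The Python sketch below is a hypothetical heuristic, not a standard algorithm: a problem is called systemic if the error rate stays above a threshold for several consecutive windows, transient if it spiked and then cleared. The 5% threshold and three-window persistence are illustrative assumptions.

```python
def classify(error_rates, threshold=0.05, persistence=3):
    """Classify per-window error rates (oldest first) as healthy, transient, or systemic.

    Hypothetical heuristic: 'systemic' if the last `persistence` windows all
    exceed `threshold`; 'transient' if some window exceeded it but the problem
    did not persist; otherwise 'healthy'.
    """
    if not error_rates:
        return "healthy"
    recent = error_rates[-persistence:]
    if len(recent) == persistence and all(rate > threshold for rate in recent):
        return "systemic"
    if any(rate > threshold for rate in error_rates):
        return "transient"
    return "healthy"
```

A transient result suggests retrying and watching; a systemic result means recovering service first and then starting root cause analysis.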
QUICK ANALYSIS
o % Processor Time
o ASP.NET\Requests Queued
o Memory\Available MBytes
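A quick first pass over these three counters can be expressed as threshold checks. The thresholds below (80% processor time, more than 10 queued requests, under 100 MB available) are hypothetical starting points, not recommendations from this deck; tune them for your role size and workload. `flag_counters` is an illustrative name.

```python
# Hypothetical thresholds for a quick first look; tune for your role size.
THRESHOLDS = {
    "% Processor Time": ("above", 80.0),
    "ASP.NET\\Requests Queued": ("above", 10.0),
    "Memory\\Available MBytes": ("below", 100.0),
}


def flag_counters(latest: dict) -> list:
    """Return the counters whose latest value crosses its threshold."""
    flagged = []
    for counter, value in latest.items():
        rule = THRESHOLDS.get(counter)
        if rule is None:
            continue  # not one of the quick-analysis counters
        direction, limit = rule
        if (direction == "above" and value > limit) or (direction == "below" and value < limit):
            flagged.append(counter)
    return sorted(flagged)
```

Read together, high processor time with a growing request queue usually points at CPU saturation, while shrinking available memory points at memory pressure.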
QUICK ANALYSIS
Azure Management Portal
o Visualizes key performance counters in a graph
o Data collected by the host
o Requires co-administrator access to the subscription
o Data retained for 7 days by default
o Shows only performance counters
o No query capability
Azure Management Studio
o Visualizes key performance counters in a graph
o Data collected by the Azure Diagnostics agent
o Available to anyone with the storage account credentials
o Data retained as long as you want
o Full suite of instrumentation
o Full query and correlation capability
SUMMARY
o Instrumentation and telemetry are key to successful projects
o Cloud metrics are similar to metrics for traditional applications
o Be realistic and set priorities
o Cerebrata Azure Management Studio is an essential tool for troubleshooting
CEREBRATA OFFER
Thank you for attending or watching the webinar!
15% off Azure Management Studio until April 30th, 2014
http://bit.ly/ams-webinar
ADITI OFFER
o Aditi provides an onsite cloud expert for a 2-day cloud strategy assessment
– Key objective: analyze the viability of the cloud as a deployment option, including its technical and economic impact for a targeted workload or set of applications and infrastructure
o Deliverables:
• Cloud Strategy Assessment
To learn more, email us at [email protected]!
Mike Wood / Technical Evangelist, Cerebrata
Michael Collier / Principal Cloud Architect, Aditi
Let’s continue the conversation.