VMworld 2013: Troubleshooting at Cox Communications with VMware vCenter Log Insight and vCenter...
Troubleshooting at Cox Communications with
VMware vCenter Log Insight and vCenter
Operations Management Suite
Chris Nakagaki, Cox Communications
Jason Davis, Cox Communications
Himanshu Kumar Singh, VMware
VCM5034
#VCM5034
Press Start
Player 1!
x3
World vCOPs
Agenda
Background
Why vCOPs and Log Insight?
vCOPs
Capacity Planning Demo
Custom Dashboarding Demo
HeatMaps Demo
Log Insight – What is it? How did Cox use it?
Storage Deeper Dive Demo
VM Backup Failures Demo
Q&A
How to Play
Background
Cox Communications, Inc. (Atlanta)
100+ Hosts, 3000+ VMs
2800+GHz Compute Capacity
13.5 TB Memory Capacity
200TB SAN Storage
Chris Nakagaki
vExpert 2011, 2012, 2013
10 years @ Cox Communications
Started w/ ESX 2.5
@zsoldier
Jason Davis
15 years Windows Experience
12 years @ Cox Communications
Started w/ ESX 2.0
Credits?
Why vCOPs and Log Insight?
!?
Dynamic Thresholds (vCOPs)
Easy Deployment (vCOPs/Log Insight)
Capacity Planning (vCOPs)
Cloud Suite Cost Savings (vCOPs)
Log Aggregation (Log Insight)
Pretty Pictures (vCOPs)
Because we like to have a strong upper
and lower body.
vCOPs – Is there capacity?
1UP!
Network switch maintenance
Multiple hosted production VMs
potentially affected
Can we place affected hosts in
maintenance mode and maintain
uptime?
vCOPs – Is there capacity?
1UP!
Conclusion:
Yes, there is capacity
Network maintenance can proceed
Demonstrated:
vCOPs Capacity Planning Tool
The bottleneck is disk space, nothing else
VMs can continue to run
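The maintenance-mode question on this slide comes down to a headroom check. The sketch below is a minimal illustration in Python, not how the vCOPs Capacity Planning Tool works internally. The `Host` fields, the `can_evacuate` helper, the 10% headroom, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_ghz: float      # usable CPU capacity (GHz)
    mem_gb: float       # usable memory capacity (GB)
    cpu_demand: float   # CPU demand of the VMs currently on this host (GHz)
    mem_demand: float   # memory demand of the VMs currently on this host (GB)


def can_evacuate(hosts, to_evacuate, headroom=0.10):
    """Return True if the hosts left running can absorb the whole cluster's
    demand while keeping `headroom` (a fraction) of their capacity free.

    This is an aggregate check only; it ignores per-host placement limits.
    """
    remaining = [h for h in hosts if h.name not in to_evacuate]
    if not remaining:
        return False
    cpu_cap = sum(h.cpu_ghz for h in remaining) * (1 - headroom)
    mem_cap = sum(h.mem_gb for h in remaining) * (1 - headroom)
    cpu_need = sum(h.cpu_demand for h in hosts)
    mem_need = sum(h.mem_demand for h in hosts)
    return cpu_need <= cpu_cap and mem_need <= mem_cap


# Hypothetical four-host cluster; these are not Cox's actual figures.
cluster = [
    Host("esx01", 28.0, 128.0, 10.0, 60.0),
    Host("esx02", 28.0, 128.0, 12.0, 70.0),
    Host("esx03", 28.0, 128.0, 8.0, 50.0),
    Host("esx04", 28.0, 128.0, 9.0, 40.0),
]
```

Calling `can_evacuate(cluster, {"esx01"})` asks whether one host can enter maintenance mode; passing a larger set models taking several hosts down for the same switch maintenance.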
vCOPs - How do we monitor streaming servers?
Sim Infrastructure
Live streaming event w/ CEO and CTO
Monitor VMs associated w/ the streaming
service live!
Key Metrics?
CPU
Memory
Network
vCOPs - How do we monitor streaming servers?
Sim Infrastructure
Conclusion:
vCOPs custom dashboarding is useful!
We demonstrated:
Grouping all streaming VMs as an
application object
Creating a custom dashboard
Focused on 3 Key Metrics
Health Tree to show who’s being lazy
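The idea behind the dashboard is to treat the streaming VMs as one group and watch three metrics. It can be sketched in a few lines of Python. This is only an illustration: the VM names, metric keys, sample values, and the "below 25% of the group average" rule for spotting idle members are assumptions, not vCOPs behavior.

```python
from statistics import mean

# Hypothetical latest readings for the VMs grouped as one application object.
samples = {
    "stream-vm01": {"cpu_pct": 72.0, "mem_pct": 64.0, "net_mbps": 410.0},
    "stream-vm02": {"cpu_pct": 68.0, "mem_pct": 61.0, "net_mbps": 395.0},
    "stream-vm03": {"cpu_pct": 4.0, "mem_pct": 22.0, "net_mbps": 3.0},
}

KEY_METRICS = ("cpu_pct", "mem_pct", "net_mbps")


def summarize(group):
    """Average each of the three key metrics across the group."""
    return {m: mean(vm[m] for vm in group.values()) for m in KEY_METRICS}


def idle_members(group, metric="net_mbps", fraction=0.25):
    """Return the VMs whose reading is below `fraction` of the group average,
    i.e. the members that are not pulling their weight."""
    avg = mean(vm[metric] for vm in group.values())
    return sorted(name for name, vm in group.items() if vm[metric] < fraction * avg)
```

`summarize(samples)` gives the group-level view, and `idle_members(samples)` plays the role of the health tree's "who's being lazy" check.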
vCOPs – Why are VMs slow?
POW!
Receiving reports that VMs are
performing slowly.
No immediately discernible pattern
vCOPs to the rescue!
vCOPs – Why are VMs slow?
POW!
Determined that one array had severe
latency.
Now questions arise around VMware NMP
To Log Insight for deeper analysis…
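A heat map makes the pattern visible by grouping VM latency by the array behind it. A rough Python equivalent of that grouping is shown below. The readings, the array names, and the 20 ms threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (vm, backing array, disk latency in ms) readings.
readings = [
    ("vm-a", "array-1", 4.0),
    ("vm-b", "array-1", 6.0),
    ("vm-c", "array-2", 85.0),
    ("vm-d", "array-2", 120.0),
    ("vm-e", "array-2", 95.0),
    ("vm-f", "array-3", 5.0),
]


def latency_by_array(rows):
    """Group latency readings by array and take the median of each group."""
    grouped = defaultdict(list)
    for _vm, array, latency_ms in rows:
        grouped[array].append(latency_ms)
    return {array: median(values) for array, values in grouped.items()}


def slow_arrays(rows, threshold_ms=20.0):
    """Return the arrays whose median VM latency exceeds the assumed threshold."""
    return sorted(a for a, m in latency_by_array(rows).items() if m > threshold_ms)
```

With these sample readings only one array stands out, which mirrors the finding on the slide: slow VMs that looked unrelated turn out to share a single array.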
Why vCOPs AND Log Insight?
Chicken Legs
Log Insight – What is it?
Continue?
5 4 3 2 1 0
We Interrupt This Program
To Bring You An Important Message…
Introducing: VMware vCenter Log Insight
Himanshu Singh
Senior Product Marketing Manager
Enterprise and Cloud Management, VMware
Problem: Operate and Troubleshoot a Complex System
VMware Logs
OS and App Logs
Physical Infrastructure Logs
VMware’s Approach to Log Management
1. Extend Analytics to Log Data
• With vC Ops, VMware introduced an analytics-based operations
management solution for structured data (metrics, KPIs, events, alerts)
• Log Insight extends our analytics-based approach to logs and
unstructured, machine-generated data
2. Easy to Use and Accessible
• Existing solutions are highly specialized and often too expensive
• Log Insight has an intuitive, easy-to-use interface
• Uses a predictable pricing model with an unlimited amount of log data,
making it accessible to all
3. Optimized for VMware Environments
• Log Insight comes with built-in knowledge and native support for vSphere
• Integration with vC Ops maximizes ROI and value, providing a complete
cloud operations management solution
VMware Cloud Ops Mgmt = Log Insight and vCenter Operations
Cloud Operations Management
• vCenter Log Insight and vCenter
Operations complement each other
• Delivers best of breed capabilities for
performance, capacity, configuration
management
• Tight integration enables seamless
transition from monitoring to
troubleshooting
• Log Insight and vC Ops together provide
a complete solution for
Cloud Operations Management
Key vCenter Log Insight Use Cases
IT Operations
• Troubleshooting and Root Cause Analysis
You observe a problem (e.g. slowness), troubleshoot it, and identify the
part of the stack that is responsible (e.g. network delay vs. storage)
Follow the trail from vC Ops to logs to get to the root cause of an observed problem
• Monitoring Using Logs
Monitor metrics and events (performance & change) that are visible only in logs
Collect all the data in one place without the need for custom parsing or
data transformation
Security and Compliance
• Security Forensics
• Comprehensive Audit (who, when) / Compliance Reporting
Business Transaction Monitoring
• Collect and correlate transaction logs with infrastructure performance
Integration with vCenter Operations
Automated correlation of performance and log data
Announcing: Log Insight Content Pack Marketplace
Extend vCenter Log Insight with Content Packs from: [partner logos not preserved in transcript]
And more…
https://solutionexchange.vmware.com/store/loginsight
And Now…
Back To Regularly Scheduled Programming…
Player 2!
x3
World Log Insight
Log Insight – Was Round Robin causing issues?
Were paths being marked dead?
Were the paths remaining dead?
Did the paths come back when
expected?
LET’S SEE ….
Leeroy Jenkins!
Log Insight – Was Round Robin causing issues?
Leeroy Jenkins!
Log Insight – Was Round Robin causing issues?
Conclusion:
No, round robin was not causing issues!
We Demonstrated:
Paths were marked DEAD.
Paths remained DEAD.
Paths came back ON when expected.
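The three checks above (marked dead, stayed dead, came back when expected) amount to building a timeline of path state changes and comparing it with the maintenance window. The Python sketch below shows that analysis on a deliberately simplified, made-up event format. Real ESXi/NMP log messages look different, and in the session this was done with Log Insight queries rather than a script. The path names, timestamps, and helper names are hypothetical.

```python
from datetime import datetime

# Hypothetical, simplified event lines. This is NOT actual vmkernel/NMP output;
# the format exists only to illustrate the timeline analysis.
LOG = """\
2013-08-01T10:00:05 path=vmhba1:C0:T0:L1 state=dead
2013-08-01T10:00:06 path=vmhba1:C0:T1:L1 state=dead
2013-08-01T10:30:10 path=vmhba1:C0:T0:L1 state=on
2013-08-01T10:30:12 path=vmhba1:C0:T1:L1 state=on
"""


def parse(line):
    """Split one event line into (timestamp, path, state)."""
    stamp, path_field, state_field = line.split()
    return (
        datetime.fromisoformat(stamp),
        path_field.split("=", 1)[1],
        state_field.split("=", 1)[1],
    )


def dead_intervals(log_text):
    """Map each path to a list of (went_dead, came_back) pairs.
    came_back is None if the path never recovered within the log."""
    open_since = {}
    intervals = {}
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, path, state = parse(line)
        if state == "dead" and path not in open_since:
            open_since[path] = when
        elif state == "on" and path in open_since:
            intervals.setdefault(path, []).append((open_since.pop(path), when))
    for path, since in open_since.items():
        intervals.setdefault(path, []).append((since, None))
    return intervals


def all_within(intervals, start, end):
    """True if every dead interval both begins and ends inside [start, end]."""
    return all(
        back is not None and start <= dead and back <= end
        for spans in intervals.values()
        for dead, back in spans
    )
```

If `all_within(dead_intervals(LOG), window_start, window_end)` holds for the planned maintenance window, the paths behaved as expected and the multipathing policy is not the suspect.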
Leeroy Jenkins!
Log Insight – What’s causing VM backup failures?
NetBackup has snapshot errors (status
code 156).
Symantec HOWTO70949 article states
there are multiple possible causes.
Which is the most probable cause?
Does VMware have correlating logs?
LET’S SEE …
Paku-Man?
Log Insight – What’s causing VM backup failures?
Conclusion:
The most probable cause is inability to
create VM snapshots due to timeouts.
We Demonstrated:
Correlating errors in VMware logs stating: “The guest OS has reported an error during quiescing.”
VMware KB 1018194 provides additional
troubleshooting steps:
Reboot the VM
Reduce I/O
Etc ….
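The correlation step matches each backup failure to a quiescing error logged for the same VM around the same time. A minimal Python sketch of it is shown below. Only the quoted error message comes from the session; the VM names, timestamps, record layout, and the 10-minute window are hypothetical.

```python
from datetime import datetime, timedelta

# Message text as quoted on the slide.
QUIESCE_MSG = "The guest OS has reported an error during quiescing."

# Hypothetical backup failures: (vm, time the backup job reported status 156).
backup_failures = [
    ("vm-app01", datetime(2013, 8, 1, 1, 5)),
    ("vm-db02", datetime(2013, 8, 1, 1, 40)),
]

# Hypothetical VMware-side events: (vm, time logged, message).
vmware_events = [
    ("vm-app01", datetime(2013, 8, 1, 1, 3), QUIESCE_MSG),
    ("vm-db02", datetime(2013, 8, 1, 0, 10), "Task completed"),
]


def correlated(failures, events, window=timedelta(minutes=10)):
    """Return the failures that have a quiescing error logged for the same VM
    within `window` of the failure time."""
    hits = []
    for vm, failed_at in failures:
        for event_vm, logged_at, message in events:
            if (
                event_vm == vm
                and QUIESCE_MSG in message
                and abs(failed_at - logged_at) <= window
            ):
                hits.append((vm, failed_at))
                break
    return hits
```

Failures that show up in `correlated(backup_failures, vmware_events)` point to snapshot quiescing as the probable cause; the rest need a different explanation from the list of possible causes.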
Paku-Man?
Q&A
42
Other VMware Activities Related to This Session
HOL:
HOL-SDC-1301
Applied Cloud Operations
Group Discussions:
VCM1002-GD, VCM1004-GD
Cloud Operations with Hicham Mourad or Sam McBride
Breakout Session – repeat by demand:
VCM4528 – Thursday, 2 pm Moscone West, room 3001
Tips and Tricks with vCenter Log Insight
Follow us:
@VMLogInsight and get 5 free licenses
Hang with us:
Booth 2020 – Cloud Management Lounge
VCM5034
THANK YOU