© 2010 VMware Inc. All rights reserved
vSphere Management and Automation
Al Grandville
VMware - Sr. Systems Engineer
South Florida - Enterprise Accounts
Customer Value Journey (figure): IT Production (cost efficiency) → Business Production (quality of service) → IT as a Service (business agility); stages marked 15%, 30%, 70% and 85%
How?
VMware’s Current Focus: Private Cloud
• Enable Self-Service: Infrastructure as a Service, orchestration/workflow, chargeback
• Automate Infrastructure & Operations Management: performance, capacity, configuration
• Ensure Security & Compliance: operational best practices, regulatory compliance
• Streamline IT Service Management: problem, incident, change and configuration
Performance
1st and 2nd Generation Monitoring
Monitoring Solutions Insufficient for Performance Management
1st generation – good data collection, but alert storms
2nd generation – rules can’t adapt to change
Result:
• Performance problems often occur with no real warning
• Performance problems require time-consuming manual effort to resolve
• Virtual infrastructure is blamed for application performance problems that originate elsewhere
3/4/08 16:45 Host 1 processingTimeServ The Processing Time Service Level on process… n/a n/a n/a
3/4/08 16:45 Host 1 Processor_Table 0 Processor 0 is at 87.0%. A CPU Bottleneck is….. n/a 0 Windows_System
3/4/08 16:44 Host 2 System_Table The number of hardware interrupts per second… n/a 0 Windows_System
3/4/08 16:30 Host 2 Processor_Table 1 Processor 1 is at 84.0%. A CPU Bottleneck is …. n/a 0 Windows_System
3/4/08 16:25 n/a responseTimeServ… The Response Time Service Level on Toadwor.. n/a n/a n/a
3/4/08 16:20 n/a processingTimeServ.. The Processing Time Service Level on Prospec.. n/a n/a n/a
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD A CPU Hog has been detected n/a OraSF Oracle
3/4/08 16:08 Host 1 Ora_Sql_Hogs_Alert Oracle: SFPRD SQL with high
* New Approach – uses analytics to turn a sea of data into information
Performance Visibility Across the Virtualized Datacenter
• Full visibility up and down the datacenter stack
• Auto-detects deviations from learned baselines
• Drill into ESX server for further details
• Automatically aggregates 100s of metrics into health scores
Anticipate Application Issues Before They Happen
• Proactive warning related to Oracle workload
• Correlated workload metrics forecast a potential breach
• Projects future issues hours or days in advance
Where do Alive’s analytics begin?
Sophisticated, metric-level dynamic thresholding:
• Learns your dynamic ranges of “normal” without templates
• Doesn’t assume IT data follows a normal (“bell-shaped”) distribution
• Accepts any time-series data
• Learns patterns of behavior: hour-by-hour, day-by-day, etc.
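The slide does not say how Alive learns “normal”. As a rough illustration of the general idea (not Alive’s actual algorithm), the Python sketch below learns a separate band for each hour of the week from empirical percentiles, so it makes no bell-curve assumption and accepts any series of (timestamp, value) pairs. The hour-of-week bucketing, the 5th/95th percentile cut-offs and the function names `learn_bands` and `is_anomalous` are our own assumptions.

```python
from collections import defaultdict
from datetime import datetime


def percentile(sorted_vals, p):
    """Rank-based percentile of an already-sorted list (p from 0 to 100)."""
    if not sorted_vals:
        raise ValueError("no samples in bucket")
    index = round(p / 100 * (len(sorted_vals) - 1))
    return sorted_vals[index]


def hour_of_week(ts: datetime) -> int:
    """Monday 00:00 maps to 0, Sunday 23:00 maps to 167."""
    return ts.weekday() * 24 + ts.hour


def learn_bands(samples, low_p=5, high_p=95):
    """Learn a (low, high) band per hour of the week.

    samples: iterable of (datetime, value) pairs from any metric.
    No distribution is assumed; bands come straight from observed values.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[hour_of_week(ts)].append(value)
    bands = {}
    for bucket, values in buckets.items():
        values.sort()
        bands[bucket] = (percentile(values, low_p), percentile(values, high_p))
    return bands


def is_anomalous(bands, ts, value):
    """True if value falls outside the learned band for its hour of week."""
    band = bands.get(hour_of_week(ts))
    if band is None:
        return False  # no history for this time slot yet, so stay quiet
    low, high = band
    return value < low or value > high
```

With a few weeks of history, a CPU reading of 44% at Monday 09:00 can sit inside the learned band while the same reading at Monday 03:00 falls outside it; a single static threshold would treat both readings the same way.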
Smart Alert™ - Alive Correlates Abnormalities Across the Application
Data sources:
• User experience (e.g., RUM)
• Servers (e.g., vCenter, ITM)
• App data (e.g., Wily)
• Network data (e.g., SMARTS)
• Business data (e.g., Finance)
Application-level analysis: Smart Alert generation (“When”)
Root cause analysis: Smart Alert summary (“What”)
! SMART ALERT
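One simple way to picture “correlating abnormalities across the application” is to group anomalies from different tiers that occur close together in time. The sketch below is a hypothetical illustration, not Alive’s correlation or root cause logic: the `Anomaly` type, the 15-minute window, the three-tier minimum and the use of the earliest anomaly as a root-cause hint are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Anomaly:
    tier: str       # e.g. "user_experience", "servers", "app", "network"
    metric: str
    at: datetime


def smart_alerts(anomalies, window=timedelta(minutes=15), min_tiers=3):
    """Group time-ordered anomalies and alert when several tiers co-occur.

    Returns (window_start, tiers_involved, earliest_anomaly) tuples.
    """
    events = sorted(anomalies, key=lambda a: a.at)
    alerts = []
    i = 0
    while i < len(events):
        start = events[i].at
        # events is sorted, so the group is a contiguous run starting at i
        group = [e for e in events[i:] if e.at - start <= window]
        tiers = {e.tier for e in group}
        if len(tiers) >= min_tiers:
            alerts.append((start, tiers, group[0]))
            i += len(group)  # consume the group so it raises a single alert
        else:
            i += 1
    return alerts
```

Treating the earliest anomaly as the likely cause is only a heuristic; genuine root cause analysis would also draw on topology and relationships between metrics.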
What Does Alive Do With This Foundational Analysis?
• Alive Smart Alert™: generates preemptive “performance alerts” with automatic root cause analysis (RCA). “Houston, we have a problem!”
• Alive Auto-Pilot Dashboards: show role-based information about ongoing performance. “A single view of the truth!”
• Alive On-Demand Analytics: behavioral and trend analysis gives powerful information for infrastructure optimization (capacity management, VM workload optimization)
VMware Performance Management OPEX Savings
• Incident Management Lifecycle Savings (managing and resolving incidents): proactive alerts reduce costs 30-40%
• Change Lifecycle Savings (managing changes to apps/infrastructure): “before/after” analysis reduces change-related incidents 30-40%
• Incident Management Savings (managing Service Desk issues, i.e. incidents): eliminating manual thresholds reduces erroneous tickets by 50-60%
• Problem Management Savings (closing problems after systems are restored, including root cause analysis): root cause analysis reduces problem closure time by 30%
Customer Success: IT Operations
Before:
• 400 critical alerts/hour
• End-user complaints alerted IT to the problem
• End users impacted (avg. 2 hours/outage)
• 12 Level-2 engineers on the bridge call to address the problem
After:
• 20 alerts/MONTH
• 3 hours’ advance warning of slowdown, with root cause
• NO end-user impact
• 1 Level-2 engineer and 1 DBA to address problems
Learn Normal → Smart Alerting → Root Cause
Solve performance issues before end users are affected and reduce total alerts
Capacity
Why do you need to manage virtual infrastructure capacity?
Spreadsheets and simple tools are no longer sufficient for managing virtual resources.
90% of VMs are over-provisioned!
Capacity Management of Virtual Infrastructure is hard!
As the creator of vSphere, VMware can give you a complete and accurate view of your available resources
• CPU Optimizations: vSMP, Shares, Reservations, Limits
• Memory Optimizations: Transparent Page Sharing, Memory Ballooning, Memory Compression
• Storage Optimizations: Thin Provisioning, Linked Clones
• Clusters: DRS, HA, FT, vMotion, Storage vMotion
• Workload Flux: VMs growing/shrinking, added/removed
(Figure: vSphere capacity divided into Reserved Capacity, Usable Capacity (?), Used Capacity and Remaining Capacity, with “36 days remaining”)
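Reading the figure as usable capacity = total minus reserved, and remaining capacity = usable minus used, a “days remaining” estimate can be approximated by fitting a linear trend to daily usage. This is a hedged sketch: CapacityIQ’s actual forecasting model is not described on the slide, and both the function names and the straight-line (least-squares) growth assumption are ours.

```python
def usable_capacity(total, reserved):
    """Capacity available to workloads once reservations are set aside."""
    return total - reserved


def days_remaining(daily_used, total, reserved):
    """Estimate days until used capacity reaches usable capacity.

    daily_used: one observation of used capacity per day, oldest first.
    Returns 0.0 if already exhausted, None if usage is flat or shrinking.
    """
    n = len(daily_used)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used) / n
    # Ordinary least-squares slope = average growth per day
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_used))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    remaining = usable_capacity(total, reserved) - daily_used[-1]
    if remaining <= 0:
        return 0.0
    if slope <= 0:
        return None
    return remaining / slope
```

For example, with hypothetical values of 1000 units total, 100 reserved and daily usage of 400, 410, 420, 430 and 440, usage grows by 10 units/day and 460 units remain, giving about 46 days.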
How VMware Simplifies Capacity Management
Deliver the right capacity at the right time!
• Forecast: When will I run out of capacity? What if I add, remove or reconfigure capacity? Can I defer infrastructure investments?
• Optimize: How can I use my resources more efficiently? Which VMs should be right-sized? Can I reclaim over-provisioned or unused capacity?
• Analyze: What are my historical utilization trends? What resources have been requested vs. needed? How many more VMs will fit in my current VI?
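A common heuristic for the “which VMs should be right-sized?” question is to size each VM to a high percentile of its observed demand plus some headroom. The sketch below assumes CPU demand samples in MHz, a fixed MHz-per-vCPU figure, the 95th percentile and 20% headroom; these parameters and the function names are illustrative assumptions, not CapacityIQ’s method.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) without floats
    return ordered[rank - 1]


def recommend_vcpus(demand_mhz, mhz_per_vcpu, headroom=0.2, min_vcpus=1):
    """vCPUs needed to cover p95 demand plus headroom."""
    target = p95(demand_mhz) * (1 + headroom)
    return max(min_vcpus, math.ceil(target / mhz_per_vcpu))


def rightsizing_report(vms, mhz_per_vcpu):
    """vms: {name: (configured_vcpus, [CPU demand samples in MHz])}.

    Returns only the VMs whose recommendation differs from their configuration.
    """
    report = {}
    for name, (configured, samples) in vms.items():
        recommended = recommend_vcpus(samples, mhz_per_vcpu)
        if recommended < configured:
            report[name] = ("over-provisioned", configured, recommended)
        elif recommended > configured:
            report[name] = ("under-provisioned", configured, recommended)
    return report
```

Using a percentile rather than the maximum keeps a single brief spike from inflating the recommendation.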
Effective Capacity Management Increases Consolidation Ratios
Source: Leading Healthcare Provider in Southern US
You could fit 3-6x more VMs in your environment and plan for the future.
Consolidation ratio: 5:1 → 30:1
Microsoft license savings: $750,000
Change & Configuration
Change Management Must Be Comprehensive to Be Effective
More than 80,000 configuration variables!
VMware now provides change and remediation for the entire datacenter:
• Operations: OS Data, HW Data, Cron Jobs, Device Drivers, Storage (Quota, Space, File systems), Event Log Settings, File System, Networking, Processes, Registry, Services/Exported Svcs, Software Inventory, System Startup, User Services, WMI
• Active Directory & Security: Accounts, Groups, Account Policy, Audit Policy, Directory Permissions, Directory Audit Settings, Event Log & (ng)Syslog config and Events, Patches, Registry Key Permissions, Service Accounts, Shares and Permissions, User Rights
• Applications: Active Directory, IIS, SQL Server, Exchange, Oracle, Apache, Sendmail
• VMware Infrastructure: OS & HW, DNS & Routing, File-level details, Physical Network, Resource Pools, Virtual Network, Snapshot details, Storage (SAN, NFS,…), VI Capability (vMotion, DRS,…), Advanced settings, Security Profile, Logs
VMware delivers Rapid Response and Remediation
Change is the #1 reason for downtime
1. Real-time analytics alert of impending performance degradation
2. Comprehensive change tracking isolates root cause
3. Single-click rollback to remediate and return to normal
Great! No more conference calls and fire drills!
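Steps 2 and 3 depend on knowing exactly what changed. A minimal sketch, assuming configuration snapshots can be flattened into key/value dictionaries (the keys below are hypothetical; vCenter Configuration Manager’s data model and rollback mechanism are not shown here): diff a known-good snapshot against the current one, then derive the settings that restore it.

```python
def diff_config(before, after):
    """Return {key: (old, new)} for added, removed or modified settings.

    A missing key is represented by None on that side, so this sketch
    cannot distinguish "absent" from a setting whose value is None.
    """
    changes = {}
    for key in before.keys() | after.keys():
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


def rollback_plan(changes):
    """Values to re-apply to return to the 'before' state (None = remove)."""
    return {key: old for key, (old, _new) in changes.items()}


def apply_plan(config, plan):
    """Return a new config with the rollback plan applied."""
    restored = dict(config)
    for key, value in plan.items():
        if value is None:
            restored.pop(key, None)  # setting did not exist before the change
        else:
            restored[key] = value
    return restored
```

Because the diff is computed per setting, the same output that isolates the root cause (step 2) also drives the single-step rollback (step 3).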
Comprehensive Impact Analysis Drives Change Control
(Figure callouts: a change to this VM → directly affects this App Server → could impact these Web Servers → potentially impacted clients)
Get better visibility into physical, virtual infrastructure & applications
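Impact analysis of this kind can be modeled as a walk over a dependency graph: start at the changed component, follow “is depended on by” edges outward, and record how many hops away each component is. In the hypothetical sketch below, one hop is labeled “directly affected”, two hops “could be impacted” and anything further “potentially impacted”, mirroring the figure’s callouts; the graph format and labels are assumptions, not Application Discovery Manager’s API.

```python
from collections import deque


def impact_analysis(depends_on_me, changed):
    """Breadth-first walk from the changed component.

    depends_on_me: {component: [components that depend on it]}
    Returns {component: hops from the changed component}.
    """
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in depends_on_me.get(node, []):
            if dependent not in hops:  # first visit is the shortest path in BFS
                hops[dependent] = hops[node] + 1
                queue.append(dependent)
    del hops[changed]
    return hops


def classify(hops):
    """Map hop counts to the impact levels used in the figure."""
    labels = {1: "directly affected", 2: "could be impacted"}
    return {c: labels.get(h, "potentially impacted") for c, h in hops.items()}
```

Breadth-first order guarantees each component gets its shortest distance from the change, and the visited check keeps cyclic dependencies from looping forever.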
Save Time with Automated Patching and Provisioning
Provision SW: Software Provisioning (Windows)
• Create software packages
• Push packages to systems & guests
• Tied to compliance
• Push software to systems out of compliance (e.g. anti-virus)
Patch: Patching to Mitigate Vulnerabilities
• Pull down patch bulletins from the OS vendors
• Assess the infrastructure for vulnerabilities
• Remediate: push patches out to the guests and systems that need them
Provision OS: Provision Standard Images (vSphere, Windows and Linux)
• Install ESX to bare metal
• Install OS in a VM container in ESX/ESXi
• Install OS to bare metal
Common provisioning platform for both physical and virtual environments
Customer Success: Operations Management
Manage 3X more VMs and Physical Servers with the same staff
(Managed areas: account policy, software inventory, file system, services, anti-virus posture, change rollback, passwords, access, registration keys, patches, vulnerabilities, Active Directory; 80,000+ variables)
Before:
• 35 Windows servers managed / admin
• 12 Web servers / admin
• 11 UNIX/Linux servers / admin
• 24 hours to change local admin passwords on 2000 systems
• 2 administrators covering all systems, including DMZ
• Verification not possible due to time and resource constraints
After:
• 72 Windows servers managed / admin
• 32 Web servers / admin
• 24 UNIX/Linux servers / admin
• 1 hour to change local admin passwords on 2000 systems
• 1 administrator scheduling the job via VCM
• Verification is automated with VCM Reporting
VMware Solution for Infrastructure and Operations Mgmt
• Alive Enterprise: Performance Analytics
• vCenter Configuration Mgr: Change & Configuration Mgmt
• Application Discovery Mgr: Discovery & Impact Analysis
• vCenter CapacityIQ: Capacity Optimization
• vCenter Server: vSphere Management Console
Covers vSphere and VMware Cloud environments, plus non-VMware (incl. physical) environments via adapters
Questions?