Moving Enterprise Application Dev/Test to VMware’s
Internal Private Cloud – Operations Transformation
Venkat Gopalakrishnan, VMware
Kurt Milne, VMware
OPT5194
#OPT5194
Executive Summary
Key Lesson Learned
Invest in Agility, and Service Quality and Cost will Improve
AppOps Team: Deploys integrated, complex SDLC instances to support 600 developers.
Challenge: Process is manual, siloed, slow, and unreliable. Reduces developer efficiency. Increases risk.
Two Fundamentally Different Options
1. Fix the “human middleware” on traditional infrastructure
2. Replace and automate on private cloud SDDC
Results From Choice to Replace and Automate on SDDC
Process time – dropped from 4 weeks to 36 hours
Developer productivity – increased 20% or more
Project schedule risk – eliminated
Infrastructure and operating costs – reduced by $6M annually
Agenda
The Challenge
The Options
The Solution
Transforming Operations
The Results
Key Takeaways
Corporate IT Application Group
Manages the portfolio of enterprise applications used by global business functions
AppOps team: 27 engineers
Customer: 600 developers
Role: Provision 16 different dev/test instances that include 80+ app components
Infrastructure footprint: ~4,000 non-production VMs, ~500 production VMs
Enterprise Application Portfolio
  SaaS      65
  IT tools  50
  Business  100
  Total     215
Project “OneCloud” – Explosive Tenant Growth
Corp IT AppOps = Tenant #4
Very low cost per VM
“Cloud first” policy in IT
AppOps – SDLC provisioning
Hands On Labs – Hol.vmware.com
Services & Support – Customer environment reproduction
Sales Engineering – Demo Pods
VMworld 2013
Management BU – Field Testing
TechSummit 2013
Tech Ops – Mini R&D Cloud
Training – LiveFire
Private Cloud IaaS Software Defined Data Center
Timeline
June 2012 – Launched, built on vCloud Suite
Jan. 2013 – 4 tenants, 10,000 VMs
Today – 9 tenants, 38,000 VMs
End 2013 – 12 tenants, 50,000 VMs
2014 – More services
Bring Your Own – Application Ops
Three-tier Ops Model
Different tenants, different Application Ops
Application Ops (provided by tenant): Now an infrastructure service consumer. Provisioning, monitoring, configuring, upgrades, maintenance. Many typical ops tasks are still required.
Infrastructure Ops (provided by OneCloud infrastructure team): Network, storage, and compute availability. Deliver to SLA.
Tenant/Service Ops (provided by OneCloud service team): Common service definitions, SLA, tenant onboarding, tenant management.
Private Cloud IaaS Software Defined Data Center
Application Ops – Requirements Vary By Workload Type
Ops requirements for one app are different from Ops requirements for another:
Workload                 Provision           Manage   Duration
Corp IT – App Dev/Test   Complex App Stack   Yes      6+ months
Sales – Demo Pod         VM                  No       3 weeks
Bring Your Own – Cloud Automation and Management
Tenant – needs automation and management capabilities
IaaS – needs automation and management capabilities
Service Manager: Decides what goes in the service catalog.
Service Catalog: Mechanism to request a service.
Policy: Logic used to guide automation.
Cloud Automation and Management: Manage workloads and underlying services.
Tenant 1, Tenant 2, Tenant 3
Private Cloud IaaS Software Defined Data Center
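The relationship between service manager, service catalog and policy can be pictured with a small data-structure sketch. This is an illustration in Python, not VMware tooling: the class names, the `max_vms` policy field and the catalog entry are all hypothetical, chosen only to show how a catalog item, a policy check and a service request could fit together.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogItem:
    """One entry the service manager has approved for the catalog (hypothetical)."""
    name: str
    vm_count: int


@dataclass
class Policy:
    """Logic used to guide automation; here just a per-tenant VM ceiling (assumed)."""
    max_vms: int

    def allows(self, item: CatalogItem, vms_in_use: int) -> bool:
        return vms_in_use + item.vm_count <= self.max_vms


@dataclass
class ServiceCatalog:
    """Mechanism to request a service: look up an item, then apply policy."""
    policy: Policy
    items: dict = field(default_factory=dict)
    vms_in_use: int = 0

    def publish(self, item: CatalogItem) -> None:
        # The service manager decides what goes in the catalog.
        self.items[item.name] = item

    def request(self, name: str) -> str:
        item = self.items.get(name)
        if item is None:
            return "rejected: not in catalog"
        if not self.policy.allows(item, self.vms_in_use):
            return "rejected: policy"
        self.vms_in_use += item.vm_count
        return "approved"


catalog = ServiceCatalog(policy=Policy(max_vms=100))
catalog.publish(CatalogItem(name="dev-instance", vm_count=60))
print(catalog.request("dev-instance"))  # fits under the assumed ceiling
print(catalog.request("dev-instance"))  # a second copy would exceed it
print(catalog.request("unknown"))       # never published by the service manager
```

The point of the sketch is the separation of concerns: what may be requested (catalog) is decided separately from whether a given request may proceed (policy).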
Phased Project Approach
Phase 1 – Completed
• Deploy automation and management capabilities
• Create 5+1 vDCs
• Blueprints for 80+ applications
• Service catalog with 16 instances
• Transition 2,800 VMs – Dev, Test, UAT
Key Milestone – 4 months
• 1st automated instance @ 172 hours
Phase 2 – H1 2014
• Expand service profiles – using expanded virtual network and storage in IaaS
• Financial transformation – chargeback
• Advanced analytics, performance management
• Transition 1,200 VMs – Stage, LoadTest
Environments: Production, Dev, Test, UAT, Stage, Load Test
© 2013 VMware Inc. All rights reserved
Automation and Ops Transformation
Venkat Gopalakrishnan – Director IT Operations
Traditional Operations Functions – Provided by AppOps
People, Process, Governance
Extension via API and SDK
3rd Party Components
Cloud Automation and Management
vCloud Suite
Private Cloud IaaS Software Defined Data Center
Human Middleware Problem – AppOps Team View
Global Team Management: Project manage around PTO, holidays, variable skills
Capacity Constrained: Only 4-6 projects in parallel
Slow and Error Prone: Many manual steps. Ticketing systems. Human error.
Handoffs: Silos. Globally distributed teams. Multiple application experts.
100% Task Automation – Not Going to Meet Needs
Process steps (in slide order):
Request | Infrastructure Verification | Hardware Setup | Build VMs – New or Clone
DNS Entries | Install, Setup, Configure Workload | Database Refresh | Latest Code Deployment
Load Balancer Entries | Web Server Configuration | Firewall Changes | External Interface & Integration
PPM Tasks | Workload Monitoring Setup | Security – VM access control | Testing
Durations shown (in slide order):
1–2 days | 3–5 days | 2–4 weeks | 3–5 days
1–2 days | 4–7 days | 2–3 days | 2–5 days
2–5 days | 1–2 days | 2–4 days | 1–2 days
3–7 days | 2–3 days | 1 day | 5–6 days
Legend: Task time, Wait time
SDDC Eliminates Steps
[Diagram: the same 16 process steps as on the previous slide]
Durations shown (in slide order):
1–2 days
1–2 days | 4–7 days | 2–3 days | 2–5 days
2–5 days | 1–2 days | 2–4 days | 1–2 days
3–7 days | 2–3 days | 1 day | 5–6 days
Legend: Task time, Wait time
Automation Eliminates Wait Time…
[Diagram: the same 16 process steps, shown without durations. Legend: Task time, Wait time]
… and Manual Work
[Diagram: the same 16 process steps, shown without durations. Legend: Task time, Wait time]
Why Standardize and Automate Service Provisioning?
Service Definition, Blueprint, Policy, POC1, POC2, To Catalog
Provision, QA, Staging, Release
40 work weeks effort – per release…
20 work weeks effort – once!
Run Book: 4 weeks
Service Request: 36 hours
Virtual Data Center, Virtual Server
It takes less effort and time to convert the runbook into blueprints than it takes to “run” the runbook.
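The effort comparison on this slide can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the 20 work weeks of blueprint creation is a one-time cost; the slide gives no figure for ongoing blueprint maintenance, so `blueprint_upkeep_per_release` is a hypothetical parameter that defaults to zero.

```python
RUNBOOK_WEEKS_PER_RELEASE = 40  # from the slide: 40 work weeks effort, per release
BLUEPRINT_WEEKS_ONCE = 20       # from the slide: 20 work weeks effort, once


def runbook_effort(releases: int) -> int:
    """Cumulative work weeks when the run book is executed by hand each release."""
    return RUNBOOK_WEEKS_PER_RELEASE * releases


def blueprint_effort(releases: int, blueprint_upkeep_per_release: int = 0) -> int:
    """Cumulative work weeks when the run book is converted to blueprints once.

    blueprint_upkeep_per_release is a hypothetical knob; the slide gives no figure.
    """
    return BLUEPRINT_WEEKS_ONCE + blueprint_upkeep_per_release * releases


for n in (1, 2, 4):
    saved = runbook_effort(n) - blueprint_effort(n)
    print(f"{n} release(s): run book {runbook_effort(n)} wk, "
          f"blueprints {blueprint_effort(n)} wk, saved {saved} wk")
```

Under these assumptions the blueprint route is already cheaper at the first release, and the gap widens by 40 work weeks with every release after that.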
Transformation – Process
Challenges
First version of automation solution did not meet all needs
Actively deploying instances while building machines
Difficulty in managing integration with SaaS apps
High inflow of demand
Action
Automation capability – parallel provisioning
Additional test suite functions being automated – environmental and functional
Continuous process improvement in place, with root cause action after every cycle
Instance provisioning treated as a ‘release’
Documentation is key to achieving predictability
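As one illustration of the “parallel provisioning” action, independent components can be provisioned concurrently rather than one after another. The sketch uses Python's standard `concurrent.futures` module; the component names and the `provision_component` stub are hypothetical stand-ins, since the deck does not describe the actual tooling.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_component(name: str) -> str:
    """Hypothetical stand-in for provisioning one app component from its blueprint."""
    time.sleep(0.1)  # simulate work; a real call would invoke the automation tooling
    return f"{name}: ready"


def provision_in_parallel(components: list[str], max_workers: int = 4) -> list[str]:
    """Provision independent components concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(provision_component, components))


# Illustrative component names only.
components = ["database", "app-server", "web-server", "portal"]
start = time.perf_counter()
results = provision_in_parallel(components)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed ~{elapsed:.2f}s versus ~{0.1 * len(components):.1f}s sequentially")
```

This only helps where components have no ordering dependency; steps that depend on each other still have to be sequenced.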
Total Cycle Time – Improvements
Improvements
1. Re-provision instead of repair, and cross-train teams
2. Improve blueprints to drive down defects; automate functional and environmental testing
3. More automation and management capabilities
Plan to get to 24 hour goal
• Even more automation and management changes
• Improve QA testing process
2013 Goal: Provision – 16 hours, QA – 8 hours
[Chart: Provisioning time (hours, axis 0 to 200) for SDLC Instance - Oracle ERP with Portal, by instance and date: Test13 (05/07), Dev14 (05/22), Test14 (05/27), Dev15 (06/19), Test15 (06/25), Dev16 (07/22), Test16 (08/05). Chart markers 1, 2 and 3 refer to the numbered improvements.]
Process – Details
Results
4 weeks to 36 hours
Target of 24 hours (Provisioning 16 hours, Testing 8 hours) by Q4’13
Streamlined demand intake process
Created bandwidth to provision an instance per week
Key Takeaways
Automate the end-to-end process rather than focusing on individual tasks
Empower the global team
Don’t skimp on blueprints
Transformation – People
Challenges
New People Roles and Change in Skill Sets
New role for Blueprint creation and management
Automation requires global coverage to manage process
Scarcity of skilled resources to perform new role
Most top skills were in one location
Action
IT resources obtained vCloud certification
Technical skills assessment to create a well-balanced global team
Created subject matter expertise in installation and configuration of the tech stack/application
Team gained solid shell scripting, task automation, and troubleshooting skills
Ticketing Systems and Late Night Phone Calls – Gone!
People – Details
Results
AppOps engineers: 27 originally, now 22, goal 5 (old instances still in use)
Provisioning can be initiated and executed from any global location
Employees perform high-value work like blueprint management
Key Takeaways
Promote the vision and help people internalize it to get in lock step
Mental shift – fix the blueprint and re-provision instead of fixing the problem in place
Transformation – Governance
Challenges
Functional test failures
Blueprint changes resulting in manual work
Lack of service definition and process to track cost per service
Action
Avoided changes during provisioning cycle
Re-provision instead of repair
Initiated programs to transform to IT-as-a-Service (ITaaS)
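The “re-provision instead of repair” action can be sketched as a small loop: when a freshly provisioned instance fails its functional tests, the fix goes into the blueprint and the instance is rebuilt, rather than being patched by hand. Everything in this Python sketch is a toy model: the function names are hypothetical, and the rule that blueprint versions below 3 contain a defect exists only to make the example run.

```python
def provision(blueprint_version: int) -> dict:
    """Hypothetical: build a fresh instance from a given blueprint version."""
    return {"blueprint_version": blueprint_version}


def passes_functional_tests(instance: dict) -> bool:
    """Toy rule for illustration: versions below 3 are assumed to contain a defect."""
    return instance["blueprint_version"] >= 3


def provision_until_healthy(version: int, max_attempts: int = 5) -> dict:
    """Re-provision from an updated blueprint instead of repairing the instance."""
    for _ in range(max_attempts):
        instance = provision(version)
        if passes_functional_tests(instance):
            return instance
        # Do not patch the failed instance: fix the blueprint, discard, rebuild.
        version += 1
    raise RuntimeError("blueprint still failing after max_attempts")


healthy = provision_until_healthy(version=1)
print(healthy)
```

Because every fix lands in the blueprint, the next instance built from it starts without the defect, which is what keeps manual work from creeping back in.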
Blueprint Work – Governance Before Service Request
Governance – Details
Results
Predictable delivery of 36 hours, targeting 24 hours by Q4’13
Improvement in functional testing, lower defect count
15 instances provisioned in 4 months
Key Takeaways
Spend time blueprinting all apps; there is no shortcut
“Disposable infrastructure” reduces IT capex
Results
[Charts by phase (Phase 1, Phase 2), with “Today” marked]
Cycle Time (hours per SDLC instance): 672 hours (4 weeks); 172; 36 (today); goal – 24 hours
Virtual Machines Transitioned to Private Cloud: 2,800; 2,200; goal – 4,000
AppOps Team (# of engineers): 27; 22; goal – 5
Reduced provision time: 95% (4 weeks to 36 hours)
Improved productivity of 600 developers: 20%
Reduced IT operations costs: $1.5M/year
Able to say “yes” to developer requests
Reduced the cost of a VM/month: 80% ($133 to $20)
Reduced infrastructure costs: $4.5M/year
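The headline percentages can be reproduced from the figures quoted in this deck. The sketch below is plain arithmetic in Python, taking 4 weeks as 672 hours as the slide states; the function name is illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Provisioning time: 4 weeks (672 hours) down to 36 hours.
hours_before = 4 * 7 * 24
print(f"provision time: {percent_reduction(hours_before, 36):.1f}% lower")

# Cost of a VM per month: $133 down to $20.
print(f"VM cost/month: {percent_reduction(133, 20):.1f}% lower")

# Annual savings: IT operations plus infrastructure, in $M.
total_savings_musd = 1.5 + 4.5
print(f"total annual savings: ${total_savings_musd:.1f}M")
```

The provisioning figure comes out at about 94.6%, which rounds to the 95% on the slide, and the two savings lines add up to the $6M quoted in the executive summary. The quoted VM prices work out to roughly 85%, so the 80% on the slide may be a rounded or conservative figure.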
Bottom Line
Agility is Self-Sustaining
Key Takeaways (Advice)
Share results of early automation with developers (customers) – show how the effort will help them
Training is key. The blueprint management role becomes a key SME – help them become experts
Don’t try to automate individual tasks – take a holistic approach with a system’s footprint view
SDDC provides greater flexibility, not possible with server virtualization – software-controlled infrastructure
Additional Resources
Expert Lounge – Tues 12-1pm. Sign up in person at Expert Lounge.
Group Discussion – Wed 11am – 12pm. OPT 1002-GD
Related Architecture Session – Wed 3:30 – 4:30 (Full - look for repeat)
VSVC4948 - Moving Enterprise Application Dev/Test to VMware’s Internal Private Cloud – Architecture, Implementation and Integration
Blogs.vmware.com/cloudops
White paper and video/demo of this presentation
IT Transformation webpage on vmware.com
http://www.vmware.com/solutions/vmware-it-journey/
@vmwarecloudops #cloudops
THANK YOU