© 2012 IBM Corporation, 11/2012
The IBM Research Compute Cloud (RC2) – Innovation, Best Practices and Lessons Learned on Delivering a Private Cloud for the Enterprise
Lorraine M. Herger
Director
Research Integrated Solutions
Agenda
Setting the Context
– IBM Research
– Research Integrated Solutions (RIS)
– Research IS Innovation
History of the Research Compute Cloud (RC2) Project
– RC2 Journey
– RC2 Today
  • Architecture
  • Monitoring, Management, Statistics, Metrics
  • Applications
Lessons Learned During Transformation to Cloud
– Technical challenges
– Cultural challenges
IBM Research Labs
IBM Research: Globalization (world map of lab locations, including the openings in 2011 and 2012)
Labs shown: Watson, Almaden, Austin, China, Tokyo, Zurich, India, Dublin, Australia, Brazil, Africa, Haifa
Research focus areas labeled on the map: Next Gen Public Sector; Water & Transportation; Human Capacity Development; Natural Resources; Disaster Management; Healthcare/Life Sciences; Smarter Devices; Human Systems/Events; Industry Solutions; Accessibility; Internet of Things; "Big Data" Analytics; Security; Smarter Cities; Services; Mobile; Communications; Semiconductors; Systems; Software & Services; Processors; Analytics; Storage; Nanotech; Healthcare; Science; Materials (one lab is noted as having 60% funding from gov't)
IBM Research – Who We Are
3,000 engineers, scientists and technical professionals
Chemistry, Computer Science, Electrical Engineering, Materials Science, Mathematical Science, Physics, Service Science, Behavioral Science
Business Innovation, Technology Innovation, Social Innovation, Demand Innovation
Science & Engineering, Business & Data Management, Social & Cognitive Sciences, Economics & Markets
Pushing the boundaries of science and technology to make the world work better
Helping clients, governments and universities apply scientific breakthroughs to solve challenges in business and society
Defining research values provide the compass for an evolving technical agenda and business model
Defining values:
– “..people from several different disciplines trying to visualize … future potentials..”
– “..intellectual curiosity and the love of knowledge..”
– recognition that the “greatest progress seems to come from almost casual encounters..”
“… the long, curved front halls. With beautiful views on the outside, but with the sense as you walked along that you could not quite see what lay ahead, just around the curve. This surely reflected the adventure of research…”
- Gardiner Tucker, Lab Director, IBM Research 1963 - 1967
IBM Research – Our 2nd Home, Yorktown Heights, NY
2012: Watson Research Computing Environment
30,000 sq ft / 7 data centers / 20 Labs in Yorktown, Hawthorne and Poughkeepsie
2,500 IBM P and X Servers
310 BladeCenter Chassis / 3,200 Blade Servers
500+ non-cloud Virtual Servers
>1 petabyte Storage
Research Compute Cloud (RC2)
– 2,900 current Virtual Machines
– 3.2 petabytes storage
– Over 50,000 VMs created and used since Nov 2009
2.5 MW IT Energy Load
Yorktown Hawthorne Poughkeepsie
Mission of Research Integrated Solutions
The worldwide Research Information Services community is a global team of IT professionals which participates in the Research Division mission as an innovation partner.
Provide leading-edge Information Services to our users and serve as an agile “Living Laboratory” for Researchers to deploy and demonstrate experimental technology
Drive Research-wide teamwork and a common strategy
Increase IBM value through partnerships
– Main Partners: IBM Research, IBM Divisions and Clients
– Innovation partner with Research teams on key initiatives
– Be recognized, inside and outside IBM, as a best-of-breed, premier IT leadership organization in the industry
Research IS
Research Living Lab Partnerships: Delivering Innovation
Innovation + Solution Delivery - Delivering Value to the World
DC Robot
Mobile Measurement Technology
Deep Thunder
What is Cloud Computing?
Cloud computing is a new consumption and delivery model inspired by consumer internet services.
5 key characteristics:
1. On-demand self-service
2. Ubiquitous network access
3. Location-independent resource pooling
4. Rapid elasticity
5. Flexible pricing models
Virtualization, Service Automation, Usage Tracking, Web 2.0, SOA
End User Focused
An effective cloud deployment is built on an Integrated Service Management Platform, Dynamic Infrastructure and can be part of an overall data center transformation plan
See “What is Cloud Computing?” for more information:
http://www.ibm.com/cloud-computing/us/en/what-is-cloud-computing.html
Cloud – Our Goals – Innovation and Operational Excellence
2009
Deliver customer value through the Cloud in the areas of Security, Energy Efficiency, and Services Enablement.
Create a Cloud ecosystem through collaboration services, platform scaling technologies, and exploration of cloud aware middleware.
2010
Create a competitive Compute cloud offering and significant value through specialized clouds such as Storage Cloud, Test Cloud, Desktop Cloud and Industry Solution Specific clouds.
2011
Substantially contribute to the IBM Cloud product & service offerings.
Drive end-to-end differentiation through private, hybrid, and industry specific clouds.
Leverage structure-aware image lifecycle management, fine-grained security, and quality-of-service-optimized Platform as a Service.
Optimally manage virtualized environments on the cloud.
Leverage High Scale Low Touch Cloud Operating Environment and single view of the hybrid cloud.
Include innovative Research contributions in the Common Cloud Management Platform.
Research Compute Cloud - Objectives
Goal: Create a Research cross-strategy initiative to establish an environment for innovation in Cloud Computing. Harness the Research ‘living lab’ for high-growth areas. Harvest Cloud Computing technology for client-facing opportunities.
Approach: Use Virtualization and ‘cloud’ as a technology enabler to deliver a Worldwide Research Computing Service as a means to unite I/T process and rapid technology delivery. RC2 is defined as the intersection point for IBM Research.
Collaboration: Overlay on the Research Compute Cloud, leveraging existing major initiatives. Focus on client-driven scenarios and close partnership with IBM Services, Software and Systems to maximize the IBM integrated value proposition. Intersect with the IBM Cloud Roadmap.
Value: Deliver a more effective and efficient set of I/T capabilities in support of Research and IBM priorities. Showcase via real-world application of cloud computing and virtualization technologies.
Future Capabilities: Datacenter Optimization & Management (Ensembles), Catalog-based I/T Services, Virtual Image management, Security Zones, Workload Optimization
Greater IT Capabilities
(Diagram: physical web, application and database servers shown alongside virtual web, application and database servers, built on virtual systems, virtual disks and virtual LANs across Watson, Australia and Haifa)
Virtual resources reduce cost and fulfill the On-Demand promise:
– Adjustable capacity
– Movable across Research Labs
– Rapid Provisioning
– Collaborative shared Images
– Standard Software Stacks
– Dense Asset utilization
– Simplified Systems Management
Four Axes of Innovation Strategy for Research Cloud
(At the center: Research Cloud – Software; Hardware; Services)
1. Deliver Effective and Efficient I/T
(Chart: value vs. IT cost, each from low to high, with a target region)
• Virtual Machines
• Storage Cloud
• Fast provisioning
• Image Search
• Increased High Availability
2. Function as Cloud Big Bet Living Lab
• Execution-based Innovation
• Best practices & Skills (IBM SW, Cloud SW)
• Client-facing Resource
• Rapid tech. adoption amidst ‘quirks’
3. Lead Cloud Innovation for IBM
• Tivoli Product Enterprise ready
• Desktop Cloud (Security; Optimization; On Boarding)
• Platform as a service
• Workload placement & migration
• Storage Cloud (GPFS ++)
• Image Analytics and construction
• Patch Management
4. World-Class Research Organization
• Client-facing testimonials
• IBM Technology leadership
• Internal / External Cloud Standards participation
• Papers and Patents
• Internal Recognition and References
Cloud – The Journey to Implementation
RC2 Component Diagram
(Diagram: users request resources through the request portal and use compute/storage in the RC2 virtual datacenters, Watson A and Watson B; RC2 Business Support Services and the cloud team manage and support the environment)
RC2 Infrastructure Diagram
Research Compute Environment: Transformation Journey
2003: Decentralized environments (offices, labs, closets)
2004: Research Lab Systems Engineering phase I – Centralized model
• Two major locations
• Pool resources
• Share infrastructure
• Provide resource flexibility
• Leverage skill sets
• Achieve economies of scale
• Centralized capital planning
2006: Research Lab Systems Engineering phase II
• Virtualized resources
• Optimize power and cooling
• Dynamic resource allocation
• Standardize
• Consolidated more than 3,500 sq ft
• Manage 6,000+ assets
2009-2012: Research Compute Cloud; IT Consolidation: 2 New Datacenters (Watson, Poughkeepsie)
IBM Research IT – Operational Pains – How Cloud Helps
Takes too long to create application infrastructures
– The average lead time to get a new application environment up and running is 4-6 weeks!
  • Approvals, procurement, shipment, HW installation, license procurement, OS installation, application installation, configuration
Cloud images can be created in a matter of minutes
Creating middleware infrastructures is a manual and error-prone process
– 30% of bugs are introduced by inconsistent configurations
  • These types of bugs are often the most difficult to detect
  • These bugs typically emerge only when moving between dev/test, QA and production
Cloud instances are built from pre-tested images, reducing infrastructure errors
Poorly utilized resources drive up hardware & labor costs
– Because it’s so expensive to set up an environment, there is an incentive to hold onto hw/sw resources just in case they are needed at a later date
– Future environments mean new hardware, rather than recycling returned hardware
Cloud resources are always available whenever a project requires them, so there is no need to ‘hoard’ machines
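The point about pre-tested images can be illustrated with a configuration-drift check. The following is a minimal Python sketch, assuming each instance's settings can be exported as a flat key/value dictionary and compared with the settings recorded for the image it was built from; `config_drift` and the setting names are hypothetical, not part of RC2.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def config_drift(baseline: Config, instance: Config) -> Drift:
    """Return {key: (baseline_value, instance_value)} for every setting that differs.

    None on either side means the key is missing there.
    """
    drift: Drift = {}
    for key in sorted(baseline.keys() | instance.keys()):
        expected = baseline.get(key)
        actual = instance.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical settings for a pre-tested middleware image and one QA instance.
    image = {"jvm.heap": "2g", "db.pool.size": "20", "tls": "enabled"}
    qa_instance = {"jvm.heap": "2g", "db.pool.size": "50", "debug": "true"}
    for key, (expected, actual) in config_drift(image, qa_instance).items():
        print(f"{key}: image={expected} instance={actual}")
```

Run before an environment is promoted from dev/test to QA or production, a non-empty result flags the kind of inconsistency that is otherwise hard to detect until later.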
Current Status of Research Compute Cloud
Largest self-service cloud in IBM
Solution for Research projects needing IT resources
Sunset 820 Physical Machines as part of Legacy Update
• Dynamic usage patterns
• Real-time monitoring and alerts
• Chargeback policies shape usage patterns
• Significant contributions to IBM cloud product offerings
• Cost efficiencies realized
(graph)
Next Steps: RC2 Research Lab(s) Federated Services
(Diagram: RC2 Central Cloud Services, based on IBM SCP/HSLT 1.2 IaaS with RC2 innovation, offer users a unified request portal and are managed and supported by the cloud team; RC2 global pods at Watson A, Watson B, Australia, Haifa and India provide local compute/storage, are managed and supported by local admins, and consume and contribute to the central services)
Monitoring, Management – Key to a reliable solution
Continuous monitoring of all infrastructure components
Track Availability
System Performance (real time and historical)
Capacity Tracking – foundation for analysis and forecasting
Events and Outage Tracking (maintenance windows, outages, failures)
Monitoring Services
Backup
Alerting
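Availability tracking depends on how outages and maintenance windows are counted. As a sketch, assuming outages are logged as non-overlapping intervals in hours and that planned maintenance windows are excluded from scheduled service time (a common convention, but an assumption here), availability could be computed as follows; `Outage` and `availability_pct` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outage:
    start: float   # hours since the start of the reporting period
    end: float
    planned: bool  # True for a scheduled maintenance window


def availability_pct(period_hours: float, outages: List[Outage]) -> float:
    """Percent of scheduled service time during which the service was up.

    Planned maintenance windows are removed from the scheduled time;
    unplanned outages count as downtime. Assumes intervals do not overlap.
    """
    planned = sum(o.end - o.start for o in outages if o.planned)
    unplanned = sum(o.end - o.start for o in outages if not o.planned)
    scheduled = period_hours - planned
    if scheduled <= 0:
        raise ValueError("no scheduled service time in this period")
    return 100.0 * (scheduled - unplanned) / scheduled
```

For a 720-hour month with a 4-hour maintenance window and a 1.5-hour unplanned outage, this gives 714.5 / 716, about 99.79%. Counting maintenance as downtime instead would simply mean leaving `planned` out of the subtraction.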
RC2 Health Dashboard
RC2 Metrics Journey
(Timeline: Jan-Dec 2012)
Milestones:
• Launched RC2 Health Dashboard
• E2E Probes and Alerts
• UI Availability Metrics
• Instance Availability Metrics
• Capacity Metrics
• Usage Delta Reports
Health Dashboard
From:
• Reactive
• Minimal monitoring
• No alerting
To:
• Proactive
• Reduced Problems
• Improved Availability
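End-to-end probes usually raise an alert only after repeated failures, so that a single network blip does not page anyone. A minimal sketch of that logic, assuming probe results arrive as pass/fail booleans and using a hypothetical threshold of three consecutive failures; the probing itself (for example, logging in to the portal) is not shown, and `ProbeMonitor` is a hypothetical name.

```python
from typing import Dict, List


class ProbeMonitor:
    """Raise an alert when an end-to-end probe fails N times in a row.

    Requiring consecutive failures filters out one-off blips; the alert
    fires once per failure streak and the streak resets on the next success.
    """

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures: Dict[str, int] = {}
        self.alerts: List[str] = []

    def record(self, probe: str, ok: bool) -> None:
        if ok:
            self.failures[probe] = 0
            return
        count = self.failures.get(probe, 0) + 1
        self.failures[probe] = count
        if count == self.threshold:  # fire exactly once per streak
            self.alerts.append(f"ALERT: {probe} failed {count} consecutive checks")
```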
RC2 Metrics Journey 2012: Smart Metrics
(Timeline: Jan-Dec 2012)
Milestones:
• Launched Smart Metrics
• Service Asset Metrics Updates
• Improved Problem and User Request Categories
• Problem Category Trending
• Launch of Maximo (Tivoli ticketing solution)
From:
• Inconsistent Methodologies
• Minimal metrics
• No Trending
To:
• Consistency
• Strategic Solutions
• Measures that Matter
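Problem category trending amounts to counting tickets per category per period and comparing periods. A sketch, assuming tickets are exported from the ticketing system as (month, category) pairs; the functions and category names are hypothetical and do not use any Maximo API.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def category_trend(tickets: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group tickets into {month: Counter(category -> count)}.

    Each ticket is a (month, category) pair, e.g. ("2012-03", "storage").
    """
    by_month: Dict[str, Counter] = defaultdict(Counter)
    for month, category in tickets:
        by_month[month][category] += 1
    return dict(by_month)


def month_over_month(trend: Dict[str, Counter], prev: str, curr: str) -> Dict[str, int]:
    """Change in ticket count per category between two months (missing = 0)."""
    before = trend.get(prev, Counter())
    after = trend.get(curr, Counter())
    categories = set(before) | set(after)
    return {c: after[c] - before[c] for c in sorted(categories)}
```

A category whose count keeps rising month over month is a candidate for root-cause work rather than one-off ticket handling.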
Compliance Automation: Automated Windows License Tracking
Daily check: available licenses vs. Windows instances
High water mark triggers ordering process
Real time reports
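The daily license check compares available Windows licenses with running Windows instances and starts an order at a high-water mark. A minimal sketch, assuming the mark is expressed as a fraction of licenses in use (the 0.9 in the usage example is hypothetical; the actual RC2 threshold is not given); `LicenseCheck` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class LicenseCheck:
    licenses: int       # Windows licenses currently available
    instances: int      # running Windows instances found by the daily scan
    high_water: float   # fraction of licenses in use that triggers an order

    @property
    def utilization(self) -> float:
        # With no licenses at all, treat the pool as fully used.
        return self.instances / self.licenses if self.licenses else 1.0

    @property
    def order_needed(self) -> bool:
        return self.utilization >= self.high_water

    def report(self) -> str:
        status = "ORDER" if self.order_needed else "OK"
        return (f"{self.instances}/{self.licenses} licenses in use "
                f"({self.utilization:.0%}) -> {status}")
```

For instance, `LicenseCheck(licenses=1000, instances=920, high_water=0.9).report()` returns `"920/1000 licenses in use (92%) -> ORDER"`. Ordering at a mark below 100% leaves headroom for the instances created while the order is processed.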
Capacity Tracking
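Capacity tracking data becomes a forecasting tool once a trend is fitted to it. A sketch, assuming one usage sample per day and a simple least-squares straight line (a technique swapped in for illustration; the slides do not say which forecasting method RC2 used); `days_until_full` is a hypothetical name.

```python
from typing import Optional, Sequence


def days_until_full(used: Sequence[float], capacity: float) -> Optional[float]:
    """Fit a least-squares line to daily usage samples and estimate how many
    days after the last sample usage reaches `capacity`.

    Returns None when the fitted trend is flat or shrinking.
    """
    n = len(used)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(used) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used))
             / sum((x - mean_x) ** 2 for x in xs))
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Day index (counted from the first sample) where the fitted line hits capacity.
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (n - 1))
```

A straight line is the simplest choice; usage with strong weekly or project-driven swings would need a longer window or a seasonal model.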
Event and Outage Tracking
• Capture every outage, failure, service or maintenance window
• Atom feed for subscription
• Reports
• Integrated with Availability and Performance monitoring
• Service ticket correlation
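Publishing outage and maintenance events as an Atom feed lets users subscribe in any feed reader. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming each event is an (id, title, timestamp, summary) tuple with timezone-aware timestamps; the function name, feed title and author name are hypothetical, not RC2's actual feed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Tuple

ATOM_NS = "http://www.w3.org/2005/Atom"
Event = Tuple[str, str, datetime, str]  # (event_id, title, updated, summary)


def _sub(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Append a namespaced Atom child element with text content."""
    el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}")
    el.text = text
    return el


def outage_feed(feed_id: str, events: List[Event]) -> str:
    """Serialize outage/maintenance events as a minimal Atom 1.0 feed."""
    ET.register_namespace("", ATOM_NS)  # emit xmlns="..." instead of ns0: prefixes
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _sub(feed, "id", feed_id)
    _sub(feed, "title", "RC2 outages and maintenance windows")
    # Feed-level <updated> is the newest event time, or now if there are no events.
    newest = max((e[2] for e in events), default=datetime.now(timezone.utc))
    _sub(feed, "updated", newest.isoformat())
    author = ET.SubElement(feed, f"{{{ATOM_NS}}}author")
    _sub(author, "name", "RC2 Cloud Team")  # hypothetical author name
    for event_id, title, updated, summary in events:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        _sub(entry, "id", event_id)
        _sub(entry, "title", title)
        _sub(entry, "updated", updated.isoformat())
        _sub(entry, "summary", summary)
    return ET.tostring(feed, encoding="unicode")
```

Atom requires `id` values to be IRIs, so event ids would take a form such as `urn:example:rc2:outage:1` (also a hypothetical scheme).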
Usage Statistics
RC2 Chargeback
An experimental chargeback service to help recover costs and to encourage efficient use of cloud resources.
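A usage-based chargeback reduces to metering each project's consumption and multiplying it by rates. A sketch, assuming metering in vCPU-hours, GB-hours of memory and GB-months of storage; the rates below are made up for illustration, since the slides do not publish RC2's rates, and all names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List

# Hypothetical rates for illustration only; not RC2's actual chargeback rates.
RATES: Dict[str, Decimal] = {
    "vcpu_hour": Decimal("0.02"),
    "ram_gb_hour": Decimal("0.005"),
    "storage_gb_month": Decimal("0.10"),
}


@dataclass
class Usage:
    project: str
    vcpu_hours: Decimal
    ram_gb_hours: Decimal
    storage_gb_months: Decimal


def monthly_charges(records: List[Usage]) -> Dict[str, Decimal]:
    """Total charge per project, rounded to cents (Decimal avoids float rounding errors)."""
    totals: Dict[str, Decimal] = {}
    for r in records:
        cost = (r.vcpu_hours * RATES["vcpu_hour"]
                + r.ram_gb_hours * RATES["ram_gb_hour"]
                + r.storage_gb_months * RATES["storage_gb_month"])
        totals[r.project] = totals.get(r.project, Decimal("0")) + cost
    return {p: t.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
            for p, t in totals.items()}
```

Because an idle VM keeps accruing vCPU and memory hours, a scheme like this gives projects a reason to release resources they no longer need.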
Why is automating image management important?
Virtualization + Cloud Automation + Scale-out Applications = Virtual Server Explosion
Linear scaling of maintenance cost is not good enough
Pandora’s box problem: rate of growth of virtual servers >> rate of growth of IT budget
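To see why linear scaling is not enough, assume for illustration that the number of virtual servers N(t) and the IT budget B(t) both grow exponentially, at rates g_N > g_B, and that maintenance cost is linear in the number of servers with a per-server cost c(t). Staying within budget then requires:

```latex
\[
c(t)\,N(t) \le B(t)
\quad\Longrightarrow\quad
c(t) \le \frac{B(t)}{N(t)} = \frac{B_0\,e^{g_B t}}{N_0\,e^{g_N t}} = \frac{B_0}{N_0}\,e^{-(g_N - g_B)\,t}
\]
```

So under these assumptions the per-server maintenance cost must fall exponentially, at rate g_N − g_B; holding it constant eventually exceeds any budget that grows more slowly than the server count, which is the argument for automating image management.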
Image Management
IBM Virtual Image Library
• Centralizes storage of reference images
• Index image content
• Check-in/out supports distributed environments
• Version numbering
• Search and compare
• Deep analytics on content
IBM’s Common Cloud Stack
IBM Workload Deployer
Image Management
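The library functions listed above (check-in/out, version numbering, content indexing, search and compare) can be illustrated with a toy in-memory catalog. This is a sketch under the assumption that image content is indexed as a set of installed package names; it is not the IBM Virtual Image Library API, and every class and method name is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ImageVersion:
    version: int
    packages: Set[str]  # indexed image content, e.g. installed packages


@dataclass
class ReferenceImage:
    name: str
    versions: List[ImageVersion] = field(default_factory=list)
    checked_out_by: Optional[str] = None


class ImageLibrary:
    """Toy reference-image library: check-out/check-in, versioning, search, compare."""

    def __init__(self) -> None:
        self.images: Dict[str, ReferenceImage] = {}

    def add(self, name: str, packages: Set[str]) -> None:
        self.images[name] = ReferenceImage(name, [ImageVersion(1, set(packages))])

    def check_out(self, name: str, user: str) -> Set[str]:
        image = self.images[name]
        if image.checked_out_by is not None:
            raise RuntimeError(f"{name} is checked out by {image.checked_out_by}")
        image.checked_out_by = user
        return set(image.versions[-1].packages)  # working copy; edits do not touch history

    def check_in(self, name: str, user: str, packages: Set[str]) -> int:
        image = self.images[name]
        if image.checked_out_by != user:
            raise RuntimeError(f"{name} is not checked out by {user}")
        new_version = image.versions[-1].version + 1
        image.versions.append(ImageVersion(new_version, set(packages)))
        image.checked_out_by = None
        return new_version

    def search(self, package: str) -> List[str]:
        """Names of images whose latest version contains `package`."""
        return sorted(n for n, img in self.images.items()
                      if package in img.versions[-1].packages)

    def compare(self, name: str, v1: int, v2: int) -> Dict[str, Set[str]]:
        """Packages added and removed going from version v1 to version v2."""
        versions = {v.version: v.packages for v in self.images[name].versions}
        a, b = versions[v1], versions[v2]
        return {"added": b - a, "removed": a - b}
```

The single-writer check-out lock is what lets distributed teams edit a shared reference image without silently overwriting each other's changes.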
RC2 Cloud Applications
Rational Team Concert
IBM Systems Director
DevOps
Rational Application Developer
VLSI
IBM Power9 Development
IBM Systems Software
Crunch Day
Measuring Our Progress
RC2 Efficiencies over Physical Machine Management
(Chart: hours per task for a physical machine (P) vs. a virtual machine (V), for one-time and recurring tasks: Order & Approval; Receiving/Delivery; Registration (MAD, eAMT); Install & Configure; Security Scan & Findings; Patch -> Rescan; KCO Selection; Physical Audit; Compliance Management Audit; Re-Image)
• One-time tasks: 65%
• Recurring tasks: 49%
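The efficiency figures compare hours spent managing a physical machine with hours spent on the same tasks for a virtual machine. Assuming each figure is the percentage reduction in total hours, (P − V) / P, across the tasks in its group, it could be computed as below; the function name and any hour values fed to it are illustrative, not the chart's data.

```python
from typing import Dict, Tuple

TaskHours = Dict[str, Tuple[float, float]]  # task -> (physical hours, virtual hours)


def efficiency_pct(tasks: TaskHours) -> float:
    """Percent reduction in total hours when tasks are done for a VM instead of a physical machine."""
    physical = sum(p for p, _ in tasks.values())
    virtual = sum(v for _, v in tasks.values())
    if physical <= 0:
        raise ValueError("total physical-machine hours must be positive")
    return 100.0 * (physical - virtual) / physical
```

Summing hours before dividing weights each task by its size, so one long task such as procurement dominates the result more than a quick one would under a simple average of per-task percentages.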
Cloud Lessons: Technical
• Cloud management solutions are complex: do not assume these are easy to implement, unless using pre-packaged appliances.
• Networks are the enabler as well as the inhibitor! Early verification of network viability for Cloud services delivery is vital, especially when Cloud spans beyond the data centre.
• Build to the lowest common denominator: build the basics starting with IaaS capability and move up the stack to PaaS and SaaS offerings.
• Don’t forget the development and test environments: testing must be done, as for any new implementation.
• A top-down approach (green field infrastructure) will achieve greater benefits: higher levels of service standardisation are possible by designing from the top down, avoiding “legacy” infrastructure & processes which may constrain the “purity” of Cloud services.
• Take a workload-based approach: understanding workloads and how these map to the cloud is key to a successful implementation.
• Development & Test workloads are ‘low hanging fruit’: 50-60% of IT spend covers non-production systems, which suffer from low utilisation, high cost and many cycles of deployment. These attributes align extremely well with cloud.
• New aspects of Cloud do need to be carefully planned: (Cloud) service definition, quality of service, evolution of the service, service catalogue and service life cycle need to be well defined and designed. Clarity in use cases, service catalogue and non-functional requirements is fundamental to success.
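The workload lesson can be made actionable by scoring candidate workloads on the attributes it names. A sketch, assuming three measurable traits (non-production, low utilisation, frequent redeployment) with illustrative thresholds that are not RC2 policy; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    production: bool
    avg_utilization: float      # 0.0 to 1.0, average CPU use of its current servers
    deployments_per_month: int  # how often the environment is rebuilt or redeployed


def cloud_fit_score(w: Workload) -> int:
    """Count of cloud-friendly traits (0 to 3). Thresholds are illustrative assumptions."""
    score = 0
    if not w.production:
        score += 1
    if w.avg_utilization < 0.2:
        score += 1
    if w.deployments_per_month >= 2:
        score += 1
    return score


def migration_order(workloads: List[Workload]) -> List[str]:
    """Best-fit workloads first; ties broken alphabetically by name."""
    ranked = sorted(workloads, key=lambda w: (-cloud_fit_score(w), w.name))
    return [w.name for w in ranked]
```

Dev/test environments typically score highest here, which matches the lesson's advice to start with them.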
Cloud Lessons: Cultural
Changing The Way People Work Takes Time and Patience!
• Centralize the IT capital management process
  – moved from department / project control to IT control
  – fair & rapid exception process
• Centralize the IT staff
  – system administrators under a single group
• Education
  – brown bag lunch series
  – IT web / wiki pages explaining rationale
  – Department Outreach (Single Points of Contact for communication)
• Development Team Leadership
  – Early adopters self-identify, sign up early, help create the environment, and become advocates
  – Commit to / execute process
  – Continuous socialization of successes
  – Rapid response to problems / requests for help
• Issue Kevlar Uniforms to Cloud Team
  – They are learning too; provide them cover!
Fundamental Security Challenges
What is unique about cloud computing security?
– Loss of physical ownership: a “technological, cultural and psychological issue”
  • Redefines boundaries of IT infrastructure, redefines “insider attacks”
– Scale: many VMs, few system administrators; misconfiguration
– Complexity of reasoning and optimization: multiple layers & constraints
  • Complexity implies the need for a framework to manage security
– Data loss risk: concentration of computing and data magnifies risk
Mission-critical workloads and sensitive business data will not migrate to the cloud unless customers are convinced that the cloud offers security and compliance guarantees equivalent to or better than what they can provide with physical systems.
Security, and in particular authentication, access control, isolation management, integrity management and image management, are key enabling technologies for cloud computing.
Research Topic
For More Information
IBM Research - http://www.research.ibm.com
Lorraine M. Herger
Director
Research Integrated Solutions