CERN Team thoughts about the OpenStack Design Summit Tim Bell; Jan Van Eldik; Dan van der Ster;...
1
CERN Team thoughts about the OpenStack Design Summit
Tim Bell; Jan Van Eldik; Dan van der Ster; Belmiro Moreira; Jose Leon; Marek Denis; Marcos Lobo
2
Growth continues
• Number of attendees: 6,000+ in Vancouver
• Over 50% attending for the first time
3
Growth continues
San Francisco (April 2012) vs. Vancouver (May 2015)
4
Growth continues
5
Conference and Design Summit
• Conference and Design Summit (Liberty cycle)
• 400+ conference sessions
• 300+ design sessions
• Up to 32 parallel conference and design sessions
6
Users – Show and Tell
• Several companies talked about their deployments
  • Walmart (> 100,000 cores) running their production
  • Nike discussed their large VMware OpenStack environment
  • National Supercomputing Center in Guangzhou (> 12,000 hypervisors)
  • eBay/PayPal (~12,000 hypervisors, ~300,000 cores)
  • Yahoo! (>> 10,000 servers) on OpenStack bare metal
7
From integrated release to Big Tent
The Big Tent - a look at the new OpenStack projects governance - https://www.youtube.com/watch?v=TTe_bZtEKxo
8
From integrated release to Big Tent
• Move to the Big Tent to become more inclusive
• Single Integrated Release ("Too Small – Too Big")
  • Release management – "Is it part of the release?"
  • QA, CI – "Should we test it together?"
  • Operators – "Is this mature enough for production?"
  • Cross-project teams – "How do we support all of them?"
• No integrated release with "Liberty"
  • "Is this OpenStack?" vs "Are you OpenStack?"
  • Tags – help navigate the OpenStack projects
  • Opt-in coordinated release at the end of the cycle
  • 6-month cycle to bind them all together
• New project members
  • Magnum; Murano; MagnetoDB; Mistral; Rally; Puppet Modules (…)
9
RDO Community meetup
• RDO provides the client- and server-side RPMs
  • Using the CentOS build and CI system
• Still quite RedHat driven, but aims to open up to a wider audience
• CERN's effort on the EL6 packaging was gratefully acknowledged by various people
  • And they read our blog as well!
• Providing packages for the Big Tent is a work in progress
From the meetup:https://etherpad.openstack.org/p/RDO_Vancouver
https://soundcloud.com/rich-bowen/sets/rdo-community-meetup-openstack
10
Large Deployments Team meetup
• Rackspace; Intel; HP; CERN; Oracle; NECTAR; RedHat; GoDaddy; Yahoo; Big Switch; Blue Box
• Capacity management
• IP address management
• Availability Zones vs Host Aggregates
• Neutron and Cells – how do large deployments use networks?
  • Most deployers in the room have segmented networks whose IP addresses are constrained to parts of the datacenter
  • Provider networks vs. tenant networks
• CellsV2
• General discussion
https://etherpad.openstack.org/p/YVR-ops-large-deployments
11
Nova Cells
• Cells are used by large deployments, however:
"The cells feature of Nova is considered experimental by the OpenStack project because it receives much less testing than the rest of Nova. This may change in the future, but current deployers should be aware that the use of it in production right now may be risky."
• CellsV2 – all OpenStack Nova deployments will run cells
  • Effort started in the Kilo development cycle
  • Changes not user-visible in the Kilo release
  • CERN and BARC actively collaborating on the design and development
    • Scheduling; Availability Zones; Host Aggregates; RPC communication; DB redesign; …
• Single-cell deployment for Liberty using CellsV2
• Classified as high priority for the Liberty cycle
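For context, "running cells" today means the experimental CellsV1 setup that the quote above warns about, configured per cell. A rough sketch of the nova.conf involved, assuming Kilo-era option names (CellsV2 aims to make this special-casing unnecessary):

```ini
# API (top-level) cell -- nova.conf (illustrative sketch; check the
# release's cells documentation for exact option names)
[cells]
enable = true
name = api
cell_type = api

# Child compute cell -- nova.conf
[cells]
enable = true
name = cell01
cell_type = compute
```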
12
Networking
• Discussions on how to move from nova-network to Neutron
  • Tests and the reference architecture use OVS
  • No deprecation date for nova-network
• Several deployments don't want to expose the network to users
• Neutron community very open and interested in operators' use cases
• Large Deployments Team is providing input for the Neutron code sprint
  • Improve the Linux bridge Neutron driver
  • Neutron DVR has centralized SNAT
  • No floating IPs with provider networks
13
Live Migration
• Presentations from Intel, HP Public Cloud, and Time Warner clouds
  • Sharing experience, tips & tricks
  • Lots of acclaim from the audience
• HP has hypervisors with 2 spinning disks and no shared storage
  • Want to patch every 3 months
  • Tried "continuous live migration" – failed miserably
  • Now: announce downtime every 3 months
• Some improvements to Nova underway
• Side note: caring for "pets" is not a CERN-specific issue :)
14
Image Management
• The community has the same management issues as us:
"As a user of the cloud, which image should I choose for my VMs from this big image list?"
• Same solutions:
  • Add metadata to images
  • Naming conventions
  • Try to show an image subset to the final user
• At CERN, we modify the image list in the OpenStack Dashboard (Horizon)
  • Upstream is interested in this solution
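The "show an image subset" idea can be sketched in a few lines: tag curated images with a metadata property, then filter the list before it reaches the user. This is only an illustration of the approach; the image dicts and the `recommended` property are made up, not a Glance API.

```python
# Sketch: reduce a large image list to a curated subset via metadata.
# The "recommended" property is a hypothetical convention, not a
# standard Glance field.

def visible_images(images, prop="recommended", value="true"):
    """Return only images whose metadata marks them as user-facing."""
    return [img for img in images
            if img.get("properties", {}).get(prop) == value]

catalog = [
    {"name": "CC7 Base",  "properties": {"recommended": "true"}},
    {"name": "SLC6 Test", "properties": {}},
]

print([img["name"] for img in visible_images(catalog)])  # → ['CC7 Base']
```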
15
Quotas
• Quotas are seen by operators as a pain point and a common bug theme
• Different deployments have different strategies to keep quotas in sync
• Proposal to simplify the quotas architecture
  • https://review.openstack.org/#/c/182445/
• Hierarchical quotas
  • Allows delegating quota management to sub-projects
  • Collaboration between CERN and BARC
  • https://review.openstack.org/#/c/160605/
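The core invariant behind hierarchical quotas is simple: a parent project cannot hand out more quota to its children than it holds itself. A minimal sketch of that check (the data model here is illustrative, not the actual Nova/Cinder quota API):

```python
# Sketch of the hierarchical-quota invariant: granting more quota to a
# child must keep the sum of all child quotas within the parent's limit.

def can_allocate(parent_limit, child_limits, requested):
    """True if `requested` extra units for a child still fit under the
    parent's limit."""
    return sum(child_limits) + requested <= parent_limit

# Parent holds 100 cores; children already hold 40 + 30.
print(can_allocate(100, [40, 30], 20))  # → True  (90 <= 100)
print(can_allocate(100, [40, 30], 40))  # → False (would exceed parent)
```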
16
Error Handling
• "No valid host"
  • Developer community is aware of the problem
• Improve error handling in general
  • Better error messages for the end user
  • Better logs and notifications for operators
• Recovering from failure
  • Retry
  • Rollback
• Self-service debugging
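The retry-then-rollback idea above can be sketched as a small helper: retry a flaky provisioning step a few times, and undo any partial work on final failure. `boot_server` and the cleanup callback are placeholders, not real OpenStack client calls.

```python
# Sketch of "recover from failure": retry, and roll back on final failure.

def with_retry(action, cleanup, attempts=3):
    """Run `action` up to `attempts` times; on the last failure, run
    `cleanup` (rollback) and re-raise."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except RuntimeError:
            if attempt == attempts:
                cleanup()   # release anything half-created
                raise

calls = {"tries": 0, "cleaned": False}

def boot_server():                      # placeholder for a real API call
    calls["tries"] += 1
    raise RuntimeError("No valid host was found")

try:
    with_retry(boot_server, lambda: calls.update(cleaned=True))
except RuntimeError:
    pass

print(calls)  # three attempts made, then cleanup ran
```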
17
Containers, Containers, Containers
• OpenStack Magnum – Containers as a Service – launched
• Native container project (as opposed to a driver implementation in Nova)
• Framework agnostic and supports native commands as well
  • Docker, Rocket, CoreOS, Atomic, …
  • Kubernetes, Mesos, …
• Builds on existing OpenStack features
  • Heat for orchestration
  • Glance for the registry
  • Nova/Ironic for resources
• Interesting, but needs at least another release for maturity
  • Can't wait? Test out Docker on the CERN cloud using http://cern.ch/go/66Tx
  • Significantly different programming model compared to VMs
18
Horizon
• Improve the user experience and performance
• Refactor using AngularJS
  • https://youtu.be/uumkhUo-7Y0?t=4m44s (new instance demo)
• Heat template creator via a drag & drop editor
• Project Searchlight to improve searches
19
Murano
• Application catalog service
• Compose reliable application environments
• Uses Heat as a foundation
• http://apps.openstack.org/
20
Hierarchical multitenancy & Reseller
• HMT code landed in Kilo
  • Project hierarchies can now be created
  • Inherited roles for projects: one can grant an inherited role for a project subtree in one API call
  • http://raildo.me/hierarchical-multitenancy-in-openstack
• Reseller use case – work in progress:
  • A project can act as a domain (new users, groups, and roles can be created within the scope of this project)
21
CERN has contributed significantly to every federation component in OpenStack upstream
Identity Federation in OpenStack
22
23
Identity Federation in OpenStack
• OS-FEDERATION is a core feature of the OpenStack Identity service
  • No longer an extension
  • Available in a default OpenStack deployment
• Keystone can act as a simple Identity Provider (featuring the SAML2 protocol)
• Keystone2Keystone federation marked as stable and production-ready
• Web Single Sign-On available upstream
24
OS-FEDERATION in general
• Design is stable and hasn't changed since Icehouse
• Keystone acts as a Service Provider
  • Issues an OpenStack token based on credentials from an Identity Provider (e.g. ADFS)
• What Kilo brings us:
  • Mapping engine enhancements (blacklisting, whitelisting)
  • Ability to map to users stored in an OpenStack backend (a 3rd-party trusted Identity Provider can be used for authentication only)
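As an illustration of the mapping enhancements, a whitelist can restrict which remote attribute values are allowed through. A sketch of a mapping rule following the format in the Keystone federation documentation, where the attribute name `orgPersonType` and the group id are placeholders:

```json
{
  "rules": [
    {
      "local": [
        {"user": {"name": "{0}"}},
        {"group": {"id": "EXAMPLE_GROUP_ID"}}
      ],
      "remote": [
        {"type": "REMOTE_USER"},
        {"type": "orgPersonType", "whitelist": ["Staff", "Contractor"]}
      ]
    }
  ]
}
```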
25
Nice surprise during the first keynotes…
"Thanks to OpenStack, DigitalFilm Tree employs multiple clouds to squeeze 3,600 hours of content into one-hour television show hits"
26
Keystone2Keystone federation
• Keystone acts as a simple SAML2 Identity Provider
• Ability to burst to remote trusted clouds
• What Kilo brings us:
  • Keystone2Keystone is marked as stable (and production-ready)
  • Fixed a few severe bugs
  • Clarified design concept (the service catalog has a new entry: service_providers)
  • Client side will be available soon (long transition due to backward compatibility and user experience requirements)
27
Web Single Sign-On
• We already know it:
  • Booting a VM at CERN – https://openstack.cern.ch
  • … or requesting holidays – https://edh.cern.ch
• What Kilo brings us:
  • Fully fledged Web-SSO feature
• Upstream change heavily based on CERN's prototype!
28
Over 30 vendors announced support for federated identities….
• Rackspace
• Cisco
• DreamHost
• EasyStack
• HP
• IBM
• Internap
• Mirantis
• Suse
• Ubuntu
• Vexxhost
• …and more
29
Storage: Cinder, Glance and more…
• Cinder: self-service block devices, aka "Volumes"; Glance: system image repository
• Both are growing in number of drivers; now considering moving drivers out of core
• Ongoing developments related to backup and DR
• Glance "Tasks": asynchronous processing on uploaded system images
  • E.g. security validation, adding standard packages
  • https://youtu.be/ROXrjX3pdqw
30
Storage Disaster Recovery
• "Dude, Where's My Volume?" – RedHat storage talk
• Today, option 1: backup to a Ceph pool in another site – user-controlled Cinder Backup
• Today, option 2: two Ceph clusters, admin-controlled backups, equivalent to tape backup
31
Storage DR: Live Failover Site (in dev)
• http://www.slideshare.net/SeanCohen/dude-wheres-my-volume-open-stack-summit-vancouver-2015
32
Coming Soon: Self-Service Shared Storage
• The Manila project has the goal of integrating shared storage into OpenStack
• Web and API access to create and manage shares and access them from client machines
33
OpenStack Manila
• Demo by NetApp: http://www.youtube.com/watch?v=8KhXD9v0jKI&t=7m35s
• DT experiences: https://www.youtube.com/watch?v=ikhjeGN8sY4
34
CephFS Drivers for kvm/lxc/docker/ironic
• S. Weil: http://www.slideshare.net/sageweil1/keeping-openstack-storage-trendy-with-ceph-and-containers
• A CephFS driver for Manila is in progress
• VirtFS/9P is a particularly interesting option with different applications – FS pass-through from hypervisor to VM
  • http://www.linux-kvm.org/page/9p_virtio
35
Other Takeaways
• Ceph Operators Meetup
  • Site reports from Time Warner Cable and Comcast
  • TWC expanded their cluster 4 times (60 TB → 400 TB): https://www.youtube.com/watch?v=8bwLxJRok08
• Largest clusters:
  • 5 PB @ Monash University (Melbourne), 4 PB @ CERN, 2 PB @ Comcast
  • 1.8 PB @ Ontario Institute for Cancer Research (going to 15 PB in 1½ years)
  • 10,000 nodes / 40 PB @ an anonymous Chinese agency (anecdotal)
  • eBay and Yahoo are other big users (Flickr hashes photos over many 3 PB instances)
• Storage Hardware
  • More vendors producing smart drives: e.g. the Toshiba KV drive
  • Inexpensive ARM processors will allow us to run software directly on the drives
  • Commodity manufacturers are trying to leverage SDS; big SAN vendors are on the defensive
36
37
Links
• About 300 hours' worth of videos at https://www.openstack.org/summit/vancouver-2015/summit-videos/
• Design sessions at https://wiki.openstack.org/wiki/Design_Summit/Liberty/Etherpads
• OpenStack in Production: http://openstack-in-production.blogspot.ch/