OpenStack Liberty Recap by VietOpenStack


Tokyo Summit Recap

Nova Liberty known issues

- neutron LBaaS does not work with Heat autoscaling
- keystone v3 has a lot of undocumented config options
- nova-conductor scaling is worse than expected; Nova switched to the PyMySQL driver in Liberty
- the glance client is broken with urllib3

Kilo known issues

The release notes have quite big holes: things that were deprecated in Juno and removed in Kilo are not documented.

Global issues:

- neutron-client was pinned to a wrong version
- upper-constraints requirements are provided; this is the last known set of requirement versions that works with Kilo
- the what-broke tool helps find new PyPI releases that might break our unit tests
- upgrading Cinder is hard because of database changes

Nova major issues: these issues are listed in https://review.openstack.org/#/c/240959/7/priorities/mitaka-priorities.rst

1. Cells V2

Use case: Operators want to partition their deployments into cells for scaling, failure domain, etc. When partitioned, we need a way to route queries to the cell database for an instance.

http://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/cells-db-connection-switching.html

Priorities: Cells v2 will become the default in Mitaka. An upgrade path from v1 to v2 and support for multiple cells will come in the next OpenStack version.

The requirement on the cell API is that it should show the instance before it is scheduled to a specific cell, so the instance information has to be persisted somewhere. Since Kilo the scheduler uses a RequestSpec that contains the information for spawning the instance, so the RequestSpec must be persisted. Alternatively, a BuildSpec can be persisted instead of the RequestSpec.


Instead of going through a nova-cells proxy, nova-api directly connects to the cell database and cell message queue for each instance. The CellMapping contains the connection information of the cell. The interaction between the DB layer (master cell db) and the CellMapping must be provided dynamically, and the situation is the same for the message queue.
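A minimal sketch of the connection-switching idea (not Nova's actual code; the table names follow the API-database schema described in the spec, everything else is illustrative): nova-api looks up the instance's cell in the API database and then talks to that cell's own database rather than a global one.

    import sqlalchemy as sa


    def get_cell_connection_info(api_engine, instance_uuid):
        """Return (database_connection, transport_url) for the instance's cell."""
        with api_engine.connect() as conn:
            row = conn.execute(sa.text(
                'SELECT cm.database_connection, cm.transport_url '
                'FROM instance_mappings im '
                'JOIN cell_mappings cm ON im.cell_id = cm.id '
                'WHERE im.instance_uuid = :uuid'), {'uuid': instance_uuid}).one()
        return row.database_connection, row.transport_url


    def show_instance(api_engine, instance_uuid):
        db_url, _mq_url = get_cell_connection_info(api_engine, instance_uuid)
        cell_engine = sa.create_engine(db_url)  # target the cell's own DB
        with cell_engine.connect() as conn:
            return conn.execute(
                sa.text('SELECT * FROM instances WHERE uuid = :uuid'),
                {'uuid': instance_uuid}).one()

The same lookup returns the transport_url, so RPC calls can be routed to the cell's message queue analogously.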

Flavor is a global concept that should be stored in only one database. Since flavor information lives at the cell API level, flavor-related tables should be created in the API database.

2. V2.1 API

Priorities:
- address the lack of documentation for the v2.1 API: an API concept guide and an API code review guideline
- support an improved service catalog
- make project_id optional in every API

API v2.1 is the default in Liberty for all API endpoints.

3. Scheduler

The RequestSpec object implementation was finished in Liberty. The idea is that the scheduler might be detached from the Nova architecture and run as a separate service, so the aspects below were created to support this long-term goal:

Resource allocation happens on the compute node today; in the future this action will move into the scheduler as well.

In current versions of OpenStack, the compute node decides by itself which resources are free or used and which host fits a specific VM. In the future, this work will belong to the scheduler.

Basically, a flavor is a composition of resource and capability needs. If we need to add a new capability or a new resource type, we have to create new flavors, so the number of flavors grows rapidly. Instead of dealing with the raw information in the flavor when booting a VM, we decompose the flavor to get the specific resources for the scheduler interface.

Introducing a resource object will make it easier to add new resources (e.g. NUMA topology) to the scheduler interface. By decomposing the flavor, the new resource is taken from the resource object.
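A minimal sketch of the decomposition idea (class and field names are illustrative, not the actual Nova objects): a flavor is turned into typed resource amounts that the scheduler interface can consume, so a new resource type does not require a new flavor for every combination.

    from dataclasses import dataclass


    @dataclass
    class ResourceRequest:
        resource_type: str   # e.g. 'vcpu', 'memory_mb', 'disk_gb', 'numa_nodes'
        amount: int


    def decompose_flavor(flavor):
        """Turn a flavor dict into the resource list the scheduler consumes."""
        resources = [
            ResourceRequest('vcpu', flavor['vcpus']),
            ResourceRequest('memory_mb', flavor['ram']),
            ResourceRequest('disk_gb', flavor['disk']),
        ]
        # A new resource type (e.g. NUMA) comes from extra_specs and simply
        # adds one more ResourceRequest instead of a brand-new flavor.
        if 'hw:numa_nodes' in flavor.get('extra_specs', {}):
            resources.append(ResourceRequest(
                'numa_nodes', int(flavor['extra_specs']['hw:numa_nodes'])))
        return resources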

A new REST API is needed that can return the capabilities of a cloud (e.g. disks, GPUs).

4. OS VIF Lib


https://review.openstack.org/#/c/193668/3/specs/liberty/approved/os-vif-library.rst,unified

- Create a new library (actually an interface) containing all the VIF drivers, which will make Neutron and Nova work together more easily.

- Implement a new VIFConfig abstract base class as a versioned object using oslo.versionedobjects:

Neutron should return the port as either a VIFConfig object instance or the legacy dict format. The library interface contains only init, plug, and unplug methods. XML generation can be handled by oslo.versionedobjects serialization.
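A minimal sketch of what such a plugin interface could look like (illustrative only; the eventual os-vif library differs in detail, and VIFConfig here is a plain stand-in rather than a real oslo.versionedobjects class):

    import abc


    class VIFConfig:
        """Stand-in for the versioned object describing a VIF."""
        def __init__(self, vif_id, vif_type, address, bridge_name=None):
            self.vif_id = vif_id
            self.vif_type = vif_type      # e.g. 'ovs', 'bridge'
            self.address = address        # MAC address
            self.bridge_name = bridge_name


    class VIFPlugin(abc.ABC):
        @abc.abstractmethod
        def init(self, config):
            """One-time plugin initialization."""

        @abc.abstractmethod
        def plug(self, vif, instance_info):
            """Attach the VIF described by a VIFConfig to the host."""

        @abc.abstractmethod
        def unplug(self, vif, instance_info):
            """Detach a previously plugged VIF."""

Nova would call plug/unplug through this interface, while each Neutron driver ships its own VIFPlugin implementation.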

Nova minor issues

SRIOV

Nova-compute needs to know which PCI devices are allowed to be passed through to the VMs. For SR-IOV PCI devices, it also needs to know which physical network each VF belongs to.

Link: https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking-Liberty

Cinder - Nova volume multi-attach:

Currently, Nova only allows a volume to be attached to one instance or host at a time, and it assumes this single-attachment limitation in a number of places.

Use case: clustered applications with two nodes where one is active and one is passive. Both require access to the same volume, although only one accesses it actively. When the active one goes down, the passive one can take over quickly because it already has access to the data.

https://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/multi-attach-volume.html
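A minimal sketch (hypothetical helper, not Nova's actual code) of the kind of check that has to be relaxed: a second attachment is refused today, but would be allowed when the volume carries a multi-attach flag.

    class VolumeAttachError(Exception):
        pass


    def check_attach(volume, instance_uuid):
        """Reject a second attachment unless the volume is multi-attach capable."""
        attachments = volume.get('attachments', [])
        if attachments and not volume.get('multiattach', False):
            raise VolumeAttachError(
                'Volume %s is already attached to %s' %
                (volume['id'], attachments[0]['server_id']))
        # Otherwise the attach may proceed; Cinder still records every attachment.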

Nova - Ironic interactions

Change from a single-compute model to a multi-compute model in order to avoid a single point of failure. A single Ironic node will belong to one nova-compute node, so migration and evacuation have no effect.

Nova-compute will report the resource usage of resource providers. For Ironic, however, resources are not elastic: they are static and come in chunks of resource types (disk, RAM, etc.) that map to the underlying hardware of the Ironic nodes. The resource pool concept will solve this problem in the future.

The Nova scheduler will only schedule based on compute resource counts; Ironic needs to do the real scheduling.
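A minimal sketch of why whole-node consumption needs exact matching (illustrative; Nova's Exact*Filter family implements the same idea): a bare-metal node only fits a request if the flavor asks for exactly what the node has.

    def ironic_node_fits(node, flavor):
        """A bare-metal node is consumed whole, so only exact matches pass."""
        return (node['memory_mb'] == flavor['ram'] and
                node['vcpus'] == flavor['vcpus'] and
                node['local_gb'] == flavor['disk'])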


Nova small issues

Nova has an instance locked-by field, but external services want to lock other things too, e.g. keypairs and snapshots.

An instance needs to prove to others who it is. Metadata is insecure, and config drive is not flexible enough to solve key rotation.

Live-migration issues are fixed.

The third mode of shelve: previously, shelve had two modes, normal (associated data and resources are reserved and kept) and offload (resources are released and the instance disk is deleted). The proposal for the third mode: release the resources but keep the instance disk.

Support shelving and unshelving of instances in Horizon: https://blueprints.launchpad.net/horizon/+spec/horizon-shelving-command

Mix and match resource federation: https://blueprints.launchpad.net/nova/+spec/mix-and-match-resource-federation

- A multiple-landlords cloud model, where multiple service providers can cooperate to stand up a single cloud environment.

- A Nova/Cinder use case: an instance can be booted in one OpenStack deployment with a volume from another OpenStack deployment. In this case Nova needs to talk to a different Cinder, the local Keystone needs to talk to the remote Keystone, and UUID tokens should be globally unique.

Hierarchical quota: a sub-project shall be able to get quota from the main project (duplicated or split). Cinder has already implemented it, so Nova needs to adapt. https://blueprints.launchpad.net/nova/+spec/nested-quota-driver-api

Nova image signature verification: Glance shall provide a checksum along with the image, and the Glance message needs to be signed with a CA. Finally, Nova needs to check the checksum when booting the instance.
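A minimal sketch of the verification step (illustrative, not Nova's actual implementation; it assumes the signature covers the image checksum as described above and uses RSA-PSS over SHA-256 as one possible signature method):

    import hashlib

    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding


    def verify_image_signature(image_bytes, signature, public_key_pem):
        """Return True if the signature matches the image checksum."""
        checksum = hashlib.sha256(image_bytes).hexdigest().encode()
        public_key = serialization.load_pem_public_key(public_key_pem)
        try:
            public_key.verify(
                signature,
                checksum,
                padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                            salt_length=padding.PSS.MAX_LENGTH),
                hashes.SHA256())
            return True
        except Exception:
            return False

Nova would refuse to boot the instance when verification fails; the certificate used for verification would come from a trusted store.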

Nova load balancer: it can help dynamically move VMs between compute nodes. https://wiki.openstack.org/wiki/Watcher

Centralized config definitions: today the Nova config option definitions are spread around in implementation files; let's move them to a central place.
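A minimal sketch of the centralization pattern (module layout and option names are illustrative): options live together in one config module with register_opts/list_opts helpers instead of next to the code that uses them.

    from oslo_config import cfg

    serial_console_opts = [
        cfg.BoolOpt('enabled', default=False,
                    help='Enable the serial console feature.'),
        cfg.PortOpt('port', default=10000,
                    help='TCP port the serial console proxy listens on.'),
    ]


    def register_opts(conf):
        conf.register_opts(serial_console_opts, group='serial_console')


    def list_opts():
        # Consumed by the config sample generator.
        return {'serial_console': serial_console_opts}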

SRIOV attach interface: Nova interface-attach/detach cannot support SR-IOV Neutron ports. https://blueprints.launchpad.net/nova/+spec/sriov-interface-attach-detach

Cross-Project: live upgrade (Source: Ericsson summary after the Mitaka summit)

Dan Smith (Nova core) has a comprehensive set of blog posts about OpenStack upgrades. There are four pillars of the live upgrade:

RPC:
- understand multiple versions of the same RPC API in the same deployment, so you can upgrade your services separately
- define compatible versions to limit the possible combinations
- the client shall be able to send the old message version if told to; the server shall understand the old version (see the sketch after this list)
- order the services for the upgrade (to limit the number of combinations)

Data facade:
- define the schema of the data you are passing around
- the db schema shall be independent of it
- this layer can handle version differences

DB schema:
- do not allow shrinking db schema changes
- allow the schema change to happen at any time
- the data facade shall handle data migration
- removing things needs extra care: the data facade moves the data, the next code revision removes the field from the db model, and only then can the db schema be shrunk

Validation:
- test that the old client can talk to the new server
- in the nova gate: Jenkins starts Liberty, does a smoke test, populates the system with data, shuts down everything except nova-compute, rolls out the new patch under test, starts the services up, and runs full tempest
- in the nova gate: the multi-node test is experimental but shall be ready in Mitaka; it will allow testing even more service separation during the upgrade
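A minimal sketch of the RPC pillar (illustrative; real Nova uses oslo.messaging with version caps, and the class and method names here are hypothetical): when the deployment is pinned to an older RPC version, the client downgrades the message so that not-yet-upgraded servers still understand it.

    class ComputeRPCClient:
        LATEST_VERSION = '4.1'   # newest version this client can speak

        def __init__(self, transport, pinned_version=None):
            self.transport = transport
            # During a rolling upgrade the deployment is pinned, e.g. to '4.0',
            # until every server understands the newer version.
            self.version_cap = pinned_version or self.LATEST_VERSION

        @staticmethod
        def _parse(version):
            return tuple(int(p) for p in version.split('.'))

        def can_send_version(self, version):
            """True if `version` is not newer than the configured cap."""
            return self._parse(version) <= self._parse(self.version_cap)

        def resize_instance(self, instance, flavor, clean_shutdown=True):
            if self.can_send_version('4.1'):
                msg = {'instance': instance, 'flavor': flavor,
                       'clean_shutdown': clean_shutdown}
                version = '4.1'
            else:
                # Drop the argument added in 4.1 so an old server still copes.
                msg = {'instance': instance, 'flavor': flavor}
                version = '4.0'
            self.transport.call('resize_instance', version=version, **msg)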

Nova supports upgrading across one release at a time because data migration needs to have a well-defined scope.

Always move to the latest stable release before the upgrade (it contains the bug fixes for the upgrade code). This should be easy, as stable releases cannot contain db schema changes or RPC version changes.

Ironic, Cinder, Neutron, and Ceilometer need to implement this support to make a whole-stack upgrade meaningful.

oslo.config for configuration changes

oslo.config already supports re-reading the config file on SIGHUP, but current services are not prepared for this kind of config change. oslo.config shall support marking mutable config variables as changeable and reload only those that the service marked mutable. The config sample generator shall be able to put this information in the config sample too.

Services shall start marking the changeable parameters in the code (a usage sketch follows).
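A minimal sketch of how a service could mark and reload mutable options (illustrative; it assumes the mutable flag and the mutate-on-SIGHUP hook as they were later added to oslo.config, and the option names and config path are made up):

    import signal

    from oslo_config import cfg

    CONF = cfg.ConfigOpts()
    CONF.register_opts([
        # Safe to change at runtime, so it is marked mutable.
        cfg.IntOpt('worker_verbosity', default=0, mutable=True),
        # Changing the listen port needs a restart, so it stays immutable.
        cfg.PortOpt('listen_port', default=8774),
    ])
    CONF(['--config-file', '/etc/myservice/myservice.conf'])


    def _reload(signum, frame):
        # Re-reads the config files and applies changes only to mutable options.
        CONF.mutate_config_files()


    signal.signal(signal.SIGHUP, _reload)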

Guru Meditation Report: http://docs.openstack.org/developer/nova/gmr.html https://wiki.openstack.org/wiki/GuruMeditationReport


Off-Topics: OPS billing

Alternatives:
- ceilometer
- gnocchi
- CloudKitty
- ceilometer with a Monasca backend for low-level querying

OPS quota

A more fine-grained quota system needs to be built and implemented, one that can measure many types of resources such as objects, flavors, availability zones, etc. Nested quota is needed to support sub-project shared quotas; Cinder has implemented it already. https://wiki.openstack.org/wiki/CinderBrick

Distributed locking

Main problem: OpenStack needs a common distributed lock solution. Today neither the db nor the message queue is used for it, and no single solution is acceptable to the whole community.

Needs:
- service discovery
- locking to have single writers
- leader election (out of scope for a distributed lock)
- load sharing between workers (re-queue the work item if a worker dies; ensure that a resurrected worker does not resume the work item)
- fair locking (no starvation)

ZooKeeper is mature but Java-based; some operators do not want to use Java in their environment, OpenJDK might not scale, and Oracle's JDK has licensing issues.

Tooz is a potential candidate as an abstraction layer to provide distributed locking. ZooKeeper will be the default backend, but others can be added.

A plan is needed for how to move from today's solutions to Tooz.
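A minimal usage sketch of Tooz as the abstraction layer, with ZooKeeper as the backend (the backend URL, member id, and lock name are illustrative):

    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'worker-1')
    coordinator.start()

    # Only one member in the whole deployment holds this lock at a time.
    lock = coordinator.get_lock(b'resize-instance-abc123')
    with lock:
        pass  # do the critical work here

    coordinator.stop()

Swapping the backend (e.g. to etcd or Consul) only changes the URL, which is the point of the abstraction layer.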


NEUTRON

Trunk Port

It is an important feature for telco applications that need multiple networks. A new spec was written about a new trunk-port abstraction in Neutron; the API alternatives are now under review:

https://review.openstack.org/#/c/243786/

Prospective Issues:

- SFC (Service Function chaining)

- Kuryr and nested containers in Kuryr.

Other Issues

Performance Testing
https://etherpad.openstack.org/p/mitaka-cross-project-performance-team-kick-off

There are several tools for perf testing: Rally, Gatling, Zipkin, Locust, etc.

Rally is nowadays the most popular tool, but there is no standard benchmarking solution/tool for OpenStack. The two main testing use cases are performance testing and scale testing; Mirantis is using Rally for both. Check the etherpad for more information.

Nova/Neutron: OS VIF lib

Since there are more and more new Neutron drivers, new VIF types are needed. However, the VIF drivers currently live in the Nova code tree, so they need to be moved out into their own library. See the Nova OS VIF Lib section above for more details.

Neutron: Scalability, Reliability Pain Points https://etherpad.openstack.org/p/mitaka-neutron-next-ops-painpoints

Neutron: Extending Network Model https://etherpad.openstack.org/p/mitaka-neutron-next-network-model

The idea is to change the Neutron API from the Net-Subnet architecture to Net-IpNetwork-Subnet. The use case: what if people do not care about the network the VM is attached to? They just care about the IP range the VM is assigned from. In a large deployment there will be a lot of networks, so remembering which network the VM is attached to is somewhat superfluous. That is why the VM should be attached to an IpNetwork and then automatically assigned to a Net.

Cross-Project: Distributed Lock Manager https://etherpad.openstack.org/p/mitaka-cross-project-dlm

Since OpenStack is a distributed system, it needs a distributed lock manager (DLM). Each OpenStack sub-project has its own locking and its own solution. What if other projects want locking but would rather not create their own solutions? Does OpenStack accept a single DLM for all sub-projects, and if yes, which one? ZooKeeper, etcd, and Consul are all potential candidates with their own strengths and weaknesses.

Cross-Project: Dynamic reconfiguration

Before Liberty, oslo.config was made capable of re-reading configuration changes, but so far no OpenStack service uses it. While a daemon is running, configuration changes on SIGHUP are ignored apart from log handling, but oslo.config can take responsibility for applying them.

Other interesting projects

- OVN: blog entry on OVN
- Kuryr: Kuryr on GitHub
- Dragonflow: SDN controller in 3 kLOC of Python
- BGPVPN
- Neutron L2 Gateway



KEYSTONE

Keystone Federation

- Add a K2K plugin into keystoneclient: https://review.openstack.org/#/c/207585/

- People are trying to make use of openstack-client for Keystone v3, since v3 is not supported by many OpenStack services. A cloud config file should be provided so that users can easily switch among cloud deployments.

http://docs.openstack.org/developer/python-openstackclient/configuration.html#configuration-files

- The Service Provider idea was introduced in Kilo; in K2K it should now be limited, for security, performance, and other reasons in large cloud deployments. Filters should be provided so that only the service providers associated with the projects in which the user is operating are enabled.

https://review.openstack.org/#/c/188534/

Token and Tokenless:

- Fernet still has some performance problems, but it is planned to become the default token type in Mitaka.

- Fernet token information: http://dolphm.com/openstack-keystone-fernet-tokens/

Cross-project: based on the topics of https://etherpad.openstack.org/p/keystone-mitaka-summit-x-project

- LDAP implementations: people are putting more effort into making LDAP more mature.

- Custom TTL on tokens for long-running operations: auth_token makes a token validation request to Keystone as GET /v3/auth/token?ttl=123456. Keystone receives the token validation request, ignores the expiration in the token, re-calculates the expiration based on the actual created_at + ttl, and uses that to perform the validation.
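A minimal sketch of the proposed flow (illustrative; the headers follow the standard v3 validation call, while the ttl query parameter is only the proposal described above, not an existing Keystone API, and the host is made up):

    import requests

    KEYSTONE = 'http://keystone.example.com:5000'


    def validate_with_custom_ttl(admin_token, subject_token, ttl_seconds):
        """Ask Keystone to validate a token against created_at + ttl."""
        resp = requests.get(
            KEYSTONE + '/v3/auth/tokens',
            params={'ttl': ttl_seconds},                 # proposed parameter
            headers={'X-Auth-Token': admin_token,        # caller's own token
                     'X-Subject-Token': subject_token})  # token being validated
        return resp.status_code == 200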


Reference Links:

https://github.com/openstack

http://accelazh.github.io/openstack/Openstack-Tokyo-Summit-Notes/

http://blog.openattic.org/posts/conference-report-openstack-summit-2015-tokyo-japan/

http://www.solidfire.com/blog/openstack-summit-tokyo-recap-containers-cloud-native-and-a-dash-of-cinder/

https://wiki.openstack.org/wiki/Design_Summit/Mitaka/Etherpads