This is a session on features people would like to add to the current scheduler in Nova. It's packed with ideas. We will do our best to discuss as much as we can in the time available. Topics that need more time can be scheduled for the unconference track later in the week.
This session will include the following subject(s):
Extend the Nova Scheduler view of mutable resources:
Currently the Nova host_manager keeps track of the Disk, Memory and vCPU resources of each host which are then available for use in scheduler filters. These are also reflected in the flavour definitions.
I’d like to discuss extending this to cover additional resource types such as:
Compute Units: A measure of CPU capacity that is independent of the physical CPU performance of a server model and independent of the vCPU configuration of a VM. Each server would report its total Compute Unit capacity. The flavor "vcpu_weight" would seem to meet the requirement in terms of definition, but it seems to be something of a hidden value (the instance_types create() method doesn't support it, for example) and it's not currently tracked as a mutable resource in the host_manager.
Network Bandwidth: A measure of network bandwidth that is independent of the network capacity of a server. Each server would report its total network bandwidth capacity. The current rxtx_factor in the flavours looks like it could logically be used to represent this, but its current usage seems to conflict with being an arbitrary measure of bandwidth, since it represents a percentage of the rxtx_base value of a network. A Nova system could include hosts with 1Gb, 10Gb, or multiple 10Gb network configurations connected to the same logical network.
These are two examples of additional flavour and host attributes; there will probably be others, either now or in the future. Flavors already provide extra_specs that could in theory be used to define these, but there is no way to expose the corresponding host capacity to scheduler filters. The host manager does support a “capabilities” feature, but this is more of a binary value than a consumable resource.
Possible options are:
- Add the existing vcpu_weight and rxtx_factor as specific mutable resources in the host manager. There may be a conflict here between their current usage in Xen and the more general definitions of these resources.
- Add additional flavour and host manager resources, to avoid overloading or conflicting with the current usage of vcpu_weight and rxtx_factor.
- Provide a generic mechanism for the host manager to support additional mutable resources that correspond to, and can be consumed by, flavour extra_spec values.
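To make the third option more concrete, here is a minimal sketch of a host state that tracks arbitrary named resources and a filter that checks them against flavour extra_specs. It is not Nova code: the "resource:" extra_spec prefix, the resource names (compute_units, bandwidth_mbps) and the simplified HostState class are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical extra_spec prefix marking a consumable resource request,
# e.g. "resource:compute_units": "4".
RESOURCE_PREFIX = "resource:"


@dataclass
class HostState:
    """Simplified host view: total and used amounts per resource name."""
    host: str
    total: Dict[str, int] = field(default_factory=dict)
    used: Dict[str, int] = field(default_factory=dict)

    def free(self, name: str) -> int:
        # A resource the host does not report is treated as zero capacity.
        return self.total.get(name, 0) - self.used.get(name, 0)

    def consume(self, requested: Dict[str, int]) -> None:
        # Record a placement so later requests see the reduced capacity.
        for name, amount in requested.items():
            self.used[name] = self.used.get(name, 0) + amount


def requested_resources(extra_specs: Dict[str, str]) -> Dict[str, int]:
    """Pick out the extra_specs that request a consumable resource."""
    return {
        key[len(RESOURCE_PREFIX):]: int(value)
        for key, value in extra_specs.items()
        if key.startswith(RESOURCE_PREFIX)
    }


def host_passes(host: HostState, extra_specs: Dict[str, str]) -> bool:
    """Filter: the host must have enough free capacity for every request."""
    return all(
        host.free(name) >= amount
        for name, amount in requested_resources(extra_specs).items()
    )
```

Because the host manager only ever sees resource names, a new resource type needs no scheduler code change: a host starts reporting it, and a flavour starts requesting it through extra_specs.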
In addition to making this data available to the scheduler, it also needs to be consumable by a resource management layer that may be, to some extent, independent of the virtualisation library. It is an established design pattern to implement resource management via an agent running outside of Nova itself, such as an agent triggered via a libvirt hook when a VM is created. Currently such an approach only has access to the flavour aspects which are part of the VM definition. This proposal would (for libvirt) create an additional XML file per VM that contains the full flavour definition.
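A possible shape for that per-VM file, and for how an external agent could read it back, is sketched below. The element names, the "instance" attribute and the idea of storing each extra_spec as a <spec key="..."> element are assumptions for illustration only; no schema for this file exists in Nova.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def flavour_to_xml(instance_uuid: str, flavour: Dict[str, object],
                   extra_specs: Dict[str, str]) -> str:
    """Serialise the full flavour, including extra_specs, for one VM."""
    root = ET.Element("flavour", {"instance": instance_uuid})
    for name, value in sorted(flavour.items()):
        # Plain flavour fields (vcpus, memory_mb, ...) become child elements.
        ET.SubElement(root, name).text = str(value)
    specs = ET.SubElement(root, "extra_specs")
    for key, value in sorted(extra_specs.items()):
        # extra_spec keys may contain ':' so keep them as attributes, not tags.
        ET.SubElement(specs, "spec", {"key": key}).text = value
    return ET.tostring(root, encoding="unicode")


def read_extra_specs(xml_text: str) -> Dict[str, str]:
    """What an agent outside Nova (e.g. run from a libvirt hook) might do."""
    root = ET.fromstring(xml_text)
    return {spec.get("key"): spec.text or "" for spec in root.iter("spec")}
```

Keeping the file separate from the libvirt domain XML means the agent does not depend on how each flavour field happens to be mapped into the domain definition.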
The Nova API for instance creation supports a scheduler_hints mechanism whereby the user can pass additional placement related information into the Nova scheduler.
The implementation of scheduler_hints lies mostly in the various scheduler filters, and the set of hints supported on any system therefore depends on the filters that have been configured (this could include non-standard filters). It is not currently possible for a user of the system to determine which hints are available. Hints that are not supported will be silently ignored by the scheduler.
We propose to add an API extension to make the list of supported hints available to users.
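One way such an extension could build its response is to have each filter declare the hint keys it reads and to take the union over the enabled filters. The hint_names attribute and the stand-in filter classes below are hypothetical (the classes are only named after Nova's SameHostFilter and DifferentHostFilter, which read the same_host and different_host hints); the same data also lets the API report which requested hints would otherwise be silently ignored.

```python
from typing import Any, Dict, Iterable, List, Type


class BaseHostFilter:
    # Hypothetical attribute: scheduler_hints keys this filter reads.
    hint_names: tuple = ()


class SameHostFilter(BaseHostFilter):
    hint_names = ("same_host",)


class DifferentHostFilter(BaseHostFilter):
    hint_names = ("different_host",)


class RamFilter(BaseHostFilter):
    pass  # uses no hints


def supported_hints(enabled_filters: Iterable[Type[BaseHostFilter]]) -> List[str]:
    """Possible body of a 'list supported hints' API extension."""
    hints = set()
    for filter_cls in enabled_filters:
        hints.update(filter_cls.hint_names)
    return sorted(hints)


def unsupported_hints(requested: Dict[str, Any],
                      enabled_filters: Iterable[Type[BaseHostFilter]]) -> List[str]:
    """Hints in a request that no enabled filter reads (today silently ignored)."""
    supported = set(supported_hints(enabled_filters))
    return sorted(key for key in requested if key not in supported)
```

Because the list is derived from whatever filters are configured, non-standard filters would advertise their hints simply by declaring them.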
A common requirement for analytical applications (amongst others) is to place related workloads on the same rack, so as to take advantage of the increased network bandwidth. In order to support this we need:
This is similar to the existing affinity scheduler filter, but requires additional per-server attributes to be exposed. We would like to discuss whether this can be achieved by extending the existing capabilities mechanism.
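If each host published its rack through capabilities, a rack-affinity filter could look roughly like the sketch below. The "rack_id" capability, the "same_rack" hint and the instance-to-host lookup passed to the filter are all hypothetical; the point is only that the check mirrors the existing affinity filter, comparing racks instead of hosts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostState:
    host: str
    capabilities: Dict[str, str] = field(default_factory=dict)


def rack_of(host: HostState) -> Optional[str]:
    # Hypothetical capability, e.g. published by a small daemon on each host.
    return host.capabilities.get("rack_id")


class SameRackFilter:
    """Pass only hosts in the same rack as the instances named in the hint."""

    def __init__(self, instance_hosts: Dict[str, HostState]):
        # Where existing instances run: instance uuid -> host state.
        self.instance_hosts = instance_hosts

    def host_passes(self, host: HostState,
                    scheduler_hints: Dict[str, List[str]]) -> bool:
        wanted = scheduler_hints.get("same_rack")
        if not wanted:
            return True  # no hint given: every host passes
        racks = {rack_of(self.instance_hosts[uuid])
                 for uuid in wanted if uuid in self.instance_hosts}
        # Hosts that do not report a rack can never satisfy the hint.
        return rack_of(host) is not None and rack_of(host) in racks
```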
Allow a tenant to allocate all of the capacity of a host for their exclusive use. The host remains part of the Nova configuration, i.e. this is different from bare metal provisioning in that the tenant is not getting access to the Host OS, just a dedicated pool of compute capacity. This gives the tenant guaranteed isolation for their instances, at the cost of paying for a whole host.
We will present a proposal that could achieve this by building on existing host aggregate and scheduler filters.
Extending this further in the future could form the basis of hosted private clouds, i.e. the semantics of having a private cloud without the operational overhead.
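As an illustration of building on host aggregates and filters, the sketch below marks an aggregate as owned by one tenant through aggregate metadata and rejects other tenants' requests for hosts in it. The "dedicated_tenant" metadata key and the simplified Aggregate class are assumptions, not the proposal itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Aggregate:
    name: str
    hosts: Set[str]
    metadata: Dict[str, str] = field(default_factory=dict)


# Hypothetical metadata key marking an aggregate as dedicated to one tenant.
DEDICATED_KEY = "dedicated_tenant"


def host_passes(host: str, project_id: str, aggregates: List[Aggregate]) -> bool:
    """Reject a host if it is dedicated to some tenant other than the requester."""
    owners = {agg.metadata[DEDICATED_KEY] for agg in aggregates
              if host in agg.hosts and DEDICATED_KEY in agg.metadata}
    if not owners:
        return True  # shared host: any tenant may use it
    # A host claimed by two different tenants passes for nobody.
    return owners == {project_id}
```

This only keeps other tenants off a dedicated host; steering the owning tenant's instances onto its own hosts would need a second, complementary check.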
The required features are explored by stepping through the main use cases in the Etherpad.
(Session proposed by Phil Day)
Make scheduler host state extensible:
The nova scheduler is periodically sent updates from each of the compute managers about the latest host capabilities / stats. This includes available ram, the amount of IOPS, the types of supported cpus, number of instances running, etc. Scheduler filters can then be defined, and guided using scheduler hints, to use this information to improve instance scheduling.
It would be useful if the information could be generalized so that services other than compute could also update the scheduler's host information.
* 3rd party extensions can feed information into the scheduler that their SchedulerFilters can interpret and make better scheduling decisions.
* This could be a good step in moving more scheduler code into oslo-incubator. The common scheduler code can accept updates from any source (e.g. cinder manager in cinder's case or compute manager in nova's case).
* If the scheduler does become a fully independent service then this type of functionality will be required (i.e. the scheduler will need to make a decision based on information from both Cinder and Nova).
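A minimal sketch of what "extensible" host state could mean in practice: the scheduler keeps per-host stats namespaced by the service that reported them, so nova-compute, cinder-volume or a third-party daemon can each publish without overwriting the others, and filters can ignore stale sources. The HostStateStore class and its methods are hypothetical, not an existing Nova or Oslo interface.

```python
import time
from collections import defaultdict
from typing import Any, Dict, Optional, Tuple


class HostStateStore:
    """Per-host stats, namespaced by the service that sent them."""

    def __init__(self) -> None:
        # host -> source -> (timestamp, stats)
        self._state: Dict[str, Dict[str, Tuple[float, Dict[str, Any]]]] = defaultdict(dict)

    def update(self, host: str, source: str, stats: Dict[str, Any],
               timestamp: Optional[float] = None) -> None:
        # Each source owns its own namespace, so one update never clobbers another.
        ts = time.time() if timestamp is None else timestamp
        self._state[host][source] = (ts, dict(stats))

    def view(self, host: str, max_age: Optional[float] = None,
             now: Optional[float] = None) -> Dict[str, Dict[str, Any]]:
        """Return {source: stats} for a host, optionally dropping stale sources."""
        now = time.time() if now is None else now
        return {source: stats
                for source, (ts, stats) in self._state.get(host, {}).items()
                if max_age is None or now - ts <= max_age}
```

A rack-aware filter, for instance, could then read view(host)["rack-daemon"]["rack_id"] without the compute manager knowing anything about racks.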
Related Summit Talks
* http://summit.openstack.org/cfp/details/121 Features like this could be implemented as small daemons on the hosts that can cast a message to the scheduler about the host rack ID.
* http://summit.openstack.org/cfp/details/120 Since we are opening up more customization to the type of data available to the scheduler, and the potential filters installed, this would be a good feature.
* http://summit.openstack.org/cfp/details/36 Updates the information that is sent to the Scheduler.
(Session proposed by David Scannell)
Coexistence of different schedulers:
Today in Nova only one scheduling algorithm can be active at any point in time. For environments that comprise multiple kinds of hardware (potentially optimized for different classes of workloads), it would be desirable to introduce flexibility in choosing different scheduler algorithms for different resource pools (aggregates). This can be done ei