Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahara in Newton

Post on 15-Apr-2017


Vitaly Gridnev, Sahara PTL (Mirantis)
Elise Gafford, Sahara Core (Red Hat)
Nikita Konovalov, Sahara Core (Mirantis)

Agenda

1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A


Sahara: The Use Cases

● Data Processing Cluster Management
  ○ On-demand, scalable, configurable, persistent clusters
  ○ Supports multiple plugins (Apache, Ambari, CDH, MapR...)
  ○ Integrates with Heat, Glance, Nova, Neutron, and Cinder
● EDP (Elastic Data Processing)
  ○ Supports multiple job types (Java, MR, Hive, Pig, Spark, Storm...)
  ○ Supports transient clusters (spin up, process, shut down) or persistent clusters
  ○ Integrates with Swift and/or Manila (optionally)

Sahara: The API

Sahara: The Project

● Cluster provisioning plugins:
  ○ Cloudera Distribution of Hadoop (using Cloudera Manager)
  ○ Hortonworks Data Platform (using Apache Ambari)
  ○ MapR
  ○ “Vanilla” Apache Hadoop, Spark, and Storm
● EDP job types:
  ○ MapReduce, Java, Hive, and Pig jobs (using Apache Oozie)
  ○ Spark, Spark Streaming, and Storm jobs (using Apache Spark and Apache Storm)
● Image packing repository (sahara-image-elements)
● Framework to validate Sahara installation (sahara-tests)
● UI plugin
● OpenStackClient plugin


Event log for clusters

● Cluster provisioning events show the current status of cluster provisioning, or the reason for a failure
● Available since Newton for clusters created with Ambari
● Supported in the CLI since Newton, with a full dump of all steps and events


Health checks for clusters

● Users want to monitor cluster state after provisioning: this is vital for long-lived clusters
● Sahara in Liberty doesn't have any monitoring of the health of cluster processes. A cluster can be broken or unavailable, but Sahara will still think that it is in ACTIVE status.

Health checks for clusters

● Cluster health checks have been implemented since Mitaka
● Available for clusters deployed using Ambari and Cloudera Manager; coverage is more limited for vanilla clusters
● Since Newton, checks are also available for the MapR plugin
● Health results can be set to notify Ceilometer
● Easy to recheck health
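To make the idea of a per-cluster health result concrete, here is a minimal sketch of how individual check results might be rolled up into one cluster status. The `Status` levels (GREEN/YELLOW/RED), the `HealthCheck` structure, and the "worst result wins" rule are assumptions made for illustration, not Sahara's actual classes or logic.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical severity levels, ordered so that max() picks the worst one.
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class HealthCheck:
    name: str            # illustrative check name, e.g. "HDFS" or "YARN"
    status: Status
    description: str = ""


def cluster_health(checks):
    """Return the worst status among all checks (GREEN if there are none)."""
    return max((check.status for check in checks), default=Status.GREEN)


checks = [
    HealthCheck("HDFS", Status.GREEN),
    HealthCheck("YARN", Status.YELLOW, "one NodeManager is not responding"),
]
print(cluster_health(checks).name)
```

Rechecking health then amounts to re-running the individual checks and recomputing the roll-up.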


Health checks for clusters: next steps

● More detailed health checks
  ○ Particular datanode/slave failure
  ○ Not enough space in HDFS
● Suggestions/actions to repair health:
  ○ Datanode replacement
  ○ New nodes
  ○ Restarting services
● More flexible configuration of health checks (advanced health checks, disabling or enabling health checks when needed)


Security improvements

● Security is an important part of created clusters
● Previously, security could be enabled only by calling the managers (Ambari and Cloudera Manager) directly, but in that case Sahara does not perform auth operations and EDP does not work
● Security is important not just for clusters, but for Sahara itself

Security improvements

In Newton the following Kerberos security features were implemented:

● MIT KDC can be preconfigured (or an existing KDC can be used)
● Oozie client was re-implemented to support auth operations with Kerberos
● Spark job executions are also supported
● Keys are distributed on nodes for system users (hdfs, hadoop, spark)
● Supported for clusters deployed using Ambari and Cloudera Manager
● Note: Be sure that the latest hadoop-swift jars are in place for Swift data sources!

Security improvements

● Bandit tests per commit
● Improved secret storage (using Barbican and Castellan) was implemented in the previous release


Where we were

Sahara had 2 flows that were relevant to image manipulation:

● Pre-Nova spawn image packing
  ○ Used sahara-image-elements repository to generate images (to store in Glance)
● Post-Nova spawn cluster generation from “clean” (OS-only) images
  ○ Logic maintained in Sahara process within plugins
● Pre-Configuration validation of images by plugins
  ○ Remember how I said we had 2 flows relevant to image manipulation?
  ○ We didn’t do this at all.

Where We Were: Problems

● Duplication of logic
  ○ Steps required for packing images and “clean” image clusters were often identical, but had to be expressed separately (in DIB and in Python).
● Poor validation
  ○ Plugins did not validate that images provided to them met their needs.
  ○ Failures due to image contents were late and sometimes difficult to understand.
● Poor encapsulation
  ○ Image generation and cluster provisioning logic for any one plugin are really one application
  ○ Maintaining them in two places allows versionitis and dependency problems
  ○ Having one monolithic repo for all plugins makes them less pluggable

Our Dream Implementation

● All flows share common logic:
  ○ Image packing
  ○ Image validation
  ○ Clean image cluster gen
● Image manipulation is stored and versioned within plugins
● The user can still generate images with a CLI...
● But they can also use an API to generate images in clean build environments
● ... And both dev test cycles and user retries are as quick and painless as possible

The plan

1. Build a validation engine that ensures that images meet a specification
   a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
   a. Spawn a clean tenant-plane image build environment
   b. Download a base image from Glance and modify it to spec
   c. Push the new image back to Glance and register it for use by Sahara

Where we are

1. Build a validation engine that ensures that images meet a specification
   a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
   a. Spawn a clean tenant-plane image build environment
   b. Download a base image from Glance and modify it to spec
   c. Push the new image back to Glance and register it for use by Sahara

What it looks like: the specs

● YAML-based definitions
● Argument definitions for configurability
● Idempotent resource declarations
  ○ Scripts must be written idempotently, as always in resource declarations
● Logical control operators (any, all, os_case, etc.)
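As a rough illustration of how idempotent resource declarations and the logical operators could combine, here is a minimal sketch of a spec evaluator. The nested-dict layout standing in for the YAML, the single `package` resource type, the in-memory `image` dict, and the exact semantics given to `any`, `all`, and `os_case` are all assumptions for this example; the real spec format may differ.

```python
# Hypothetical spec, written as the Python structure a YAML loader might produce.
SPEC = {"all": [
    {"package": "java"},
    {"os_case": [
        {"ubuntu": {"package": "libssl"}},
        {"centos": {"package": "openssl"}},
    ]},
    {"any": [{"package": "mysql"}, {"package": "mariadb"}]},
]}


def apply_spec(node, image, test_only=True):
    """Validate `image` against `node`; if test_only is False, also fix it.

    Returns True when the image satisfies the node afterwards.
    """
    (kind, body), = node.items()  # each node is a single-key mapping
    if kind == "package":
        if body in image["packages"]:
            return True           # already present: nothing to do (idempotent)
        if test_only:
            return False
        image["packages"].add(body)
        return True
    if kind == "all":
        return all(apply_spec(child, image, test_only) for child in body)
    if kind == "any":
        # Accept the image if any alternative already holds ...
        if any(apply_spec(child, image, True) for child in body):
            return True
        if test_only:
            return False
        # ... otherwise apply alternatives in order until one succeeds.
        return any(apply_spec(child, image, False) for child in body)
    if kind == "os_case":
        for case in body:
            (os_name, child), = case.items()
            if os_name == image["os"]:
                return apply_spec(child, image, test_only)
        return True               # assumption: no case for this OS means no requirement
    raise ValueError(f"unknown spec node: {kind}")


image = {"os": "centos", "packages": {"java", "mariadb"}}
print(apply_spec(SPEC, image, test_only=True))    # validate only
print(apply_spec(SPEC, image, test_only=False))   # modify to spec
print(sorted(image["packages"]))
```

Because every resource first checks the current state, running the same spec a second time changes nothing, which is what makes retries cheap.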

What it looks like: the CLI

Command structure:

sahara-image-pack --image ./image.qcow2 PLUGIN VERSION [plugin arguments]

Features:

● Auto-generates help text from arguments
● Idempotent and modifies images in-place
  ○ Very fast test cycles and retries
● Allows freeform bash scripts and more structured resources
  ○ Though it’s on you to make your scripts idempotent
● Test-only mode to validate without change
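The "auto-generates help text from arguments" feature can be sketched with the standard `argparse` module. This is only an illustration: the `PLUGIN_ARGS` definitions, the `java_distro` argument, the `--test-only` flag spelling, and the `image-pack-sketch` program name are assumptions, and the PLUGIN and VERSION positionals from the command above are left out to keep the example short.

```python
import argparse

# Hypothetical argument definitions, as a plugin's spec might declare them.
PLUGIN_ARGS = {
    "java_distro": {
        "description": "Java distribution to install",
        "default": "openjdk",
        "choices": ["openjdk", "oracle-java"],
    },
}


def build_parser(arg_defs):
    """Build a parser whose options and help text come from the definitions."""
    parser = argparse.ArgumentParser(prog="image-pack-sketch")
    parser.add_argument("--image", required=True,
                        help="path to the image to modify in place")
    parser.add_argument("--test-only", action="store_true",
                        help="validate the image without changing it")
    for name, definition in arg_defs.items():
        parser.add_argument(
            "--" + name.replace("_", "-"),
            dest=name,
            default=definition.get("default"),
            choices=definition.get("choices"),
            help=definition["description"] + " (default: %(default)s)",
        )
    return parser


parser = build_parser(PLUGIN_ARGS)
args = parser.parse_args(["--image", "./image.qcow2", "--test-only"])
print(args.image, args.test_only, args.java_distro)
```

Calling `parser.print_help()` would then list every plugin argument with its description, with no help text written by hand.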

What it’s doing

The images module runs a sequence of steps against a remote machine

● Validation uses the Sahara SSH remote in read-only mode
● Clean image gen uses the SSH remote
● Image packing uses a libguestfs Python API image handle

All three use the same logic, contained in the appropriate plugin

Plugin implementation targeting O!
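The point that all three flows share one body of logic can be sketched as a step runner that only depends on a small "handle" interface. Everything here is hypothetical: `FakeHandle` stands in for an SSH remote or a libguestfs image handle, and the `check`/`install` command vocabulary is invented for the example.

```python
class FakeHandle:
    """Stand-in for an SSH remote or a libguestfs handle (illustrative only)."""

    def __init__(self, installed):
        self.installed = set(installed)
        self.log = []

    def run(self, command):
        """Pretend to run a command and return an exit code."""
        self.log.append(command)
        verb, package = command.split()
        if verb == "check":
            return 0 if package in self.installed else 1
        if verb == "install":
            self.installed.add(package)
            return 0
        return 127


def run_steps(handle, packages, test_only):
    """Shared step logic; the caller decides which handle to pass in.

    Returns the packages that are still missing afterwards.
    """
    missing = []
    for package in packages:
        if handle.run(f"check {package}") == 0:
            continue                      # already satisfied
        if test_only or handle.run(f"install {package}") != 0:
            missing.append(package)
    return missing


ssh_like = FakeHandle(["java"])           # validation: read-only
guestfs_like = FakeHandle([])             # image packing: may modify
print(run_steps(ssh_like, ["java", "hadoop"], test_only=True))
print(run_steps(guestfs_like, ["java", "hadoop"], test_only=False))
```

In read-only mode the runner never issues an `install` command, so the same steps double as a validator.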


Ironic integration

Why should you run Bare Metal in OpenStack:

● Big Data workloads originate from Bare Metal installations
● Quick cluster scalability may have lower priority than long-running stability and persistence
● Best performance by design, no virtualization overhead
● The ability to manage a bare metal cluster with the OpenStack API

Bare Metal compared to Virtualized

Aspect | Bare metal (Ironic) | Virtual machines
Cluster size flexibility | Nodes are dedicated completely. | Flavor-based scheduling.
Resource utilization | The host is 100% utilized. | KVM has memory overhead. Other VMs may abuse the host’s resources.
Data locality | Data is accessible directly from the local disks. | Locality may be achieved by proper resource scheduling.
Live migration | A host may be lost completely. | Supported for some target daemons.

Some tips before running Bare Metal

● Scheduling is not trivial. The Cloud operator may need to specify additional Flavors, Availability Zones, or other metadata
● Storage is not backed by Cinder for Bare Metal
  ○ Sahara does disk discovery on its own
  ○ Disks other than the one holding the root mount are dedicated to HDFS
● Non-standard hardware will require drivers built into the provisioning image
● Network tenant isolation is achievable through manual hardware switch configurations
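The disk rule in the tips above (everything except the disk with the root mount goes to HDFS) can be sketched as a simple filter. The input structure, a list of disks with their mount points, is an assumption for illustration and not the data Sahara actually collects during discovery.

```python
def hdfs_candidates(disks):
    """Return the names of disks that do not hold the root mount."""
    return [disk["name"] for disk in disks if "/" not in disk["mountpoints"]]


# Hypothetical discovery result for one bare metal node.
disks = [
    {"name": "sda", "mountpoints": ["/", "/boot"]},
    {"name": "sdb", "mountpoints": []},
    {"name": "sdc", "mountpoints": []},
]
print(hdfs_candidates(disks))
```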


What is NEW in NEWton

● Designate integration
● API improvements: pagination for list operations, API to manage/enable/disable plugins
● New plugin versions
  ○ HDP 2.4 supported
  ○ MapR 5.2.0
  ○ CDH 5.7.x
  ○ Vanilla + Spark on YARN
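Pagination for list operations is commonly done in OpenStack APIs with a page size and a marker pointing at the last item already seen. The sketch below shows that idea on an in-memory list; the `limit` and `marker` parameter names and the return shape are assumptions here, not a description of Sahara's exact API.

```python
def list_page(items, limit, marker=None):
    """Return (page, next_marker); marker is the id of the last item seen."""
    start = 0
    if marker is not None:
        ids = [item["id"] for item in items]
        start = ids.index(marker) + 1     # raises ValueError for an unknown marker
    page = items[start:start + limit]
    has_more = start + limit < len(items)
    next_marker = page[-1]["id"] if page and has_more else None
    return page, next_marker


# Walk all pages of a hypothetical cluster list, two items at a time.
clusters = [{"id": f"c{i}"} for i in range(5)]
seen, marker = [], None
while True:
    page, marker = list_page(clusters, limit=2, marker=marker)
    seen.extend(item["id"] for item in page)
    if marker is None:
        break
print(seen)
```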

What is NEW in Newton

● Sahara tests framework to validate environment readiness for Sahara’s clusters

○ Sahara tempest plugin with more tests (CLI, API)

○ Sahara scenario framework with a bunch of templates

○ Published on PyPI: https://pypi.python.org/pypi/sahara-tests

Q&A

Useful links and materials

● Sahara wiki: https://wiki.openstack.org/wiki/Sahara
● Sahara specs: https://specs.openstack.org/openstack/sahara-specs/
● Sahara docs: http://docs.openstack.org/developer/sahara/
● Sahara images: http://sahara-files.mirantis.com/images/upstream/newton/