Kerberos and Health Checks and Bare Metal, Oh My! Updates to OpenStack Sahara in Newton

Vitaly Gridnev, Sahara PTL (Mirantis)
Elise Gafford, Sahara Core (Red Hat)
Nikita Konovalov, Sahara Core (Mirantis)



Agenda

1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A


Sahara: The Use Cases

● Data Processing Cluster Management
  ○ On-demand, scalable, configurable, persistent clusters
  ○ Supports multiple plugins (Apache, Ambari, CDH, MapR...)
  ○ Integrates with Heat, Glance, Nova, Neutron, and Cinder

● EDP (Elastic Data Processing)
  ○ Supports multiple job types (Java, MR, Hive, Pig, Spark, Storm...)
  ○ Supports transient clusters (spin up, process, shut down) or persistent clusters
  ○ Integrates with Swift and/or Manila (optionally)


Sahara: The API


Sahara: The Project

● Cluster provisioning plugins:
  ○ Cloudera Distribution of Hadoop (using Cloudera Manager)
  ○ Hortonworks Data Platform (using Apache Ambari)
  ○ MapR
  ○ “Vanilla” Apache Hadoop, Spark, and Storm

● EDP job types:
  ○ MapReduce, Java, Hive, and Pig jobs (using Apache Oozie)
  ○ Spark, Spark Streaming, and Storm jobs (using Apache Spark and Apache Storm)

● Image packing repository (sahara-image-elements)
● Framework to validate Sahara installation (sahara-tests)
● UI plugin
● OpenStackClient plugin


Event log for clusters

● Cluster provisioning events: show the current status of cluster provisioning, or the reason for a failure

● Available since Newton for clusters created by using Ambari

● Supported in the CLI since Newton, with a full dump of all steps and events
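
As a rough illustration of how a dump of steps and events can be turned into a status summary, here is a minimal sketch. It assumes the cluster object carries a list of provisioning steps, each with its own events; the field names (`provision_progress`, `step_name`, `total`, `successful`, `events`, `instance_name`, `event_info`) and the sample data are assumptions for illustration and should be checked against the Sahara API reference.

```python
from typing import Any, Dict, List


def summarize_provisioning(cluster: Dict[str, Any]) -> List[str]:
    """Return one line per provisioning step, plus one line per failed event.

    The dictionary layout is an assumed shape, not a confirmed API contract.
    """
    lines = []
    for step in cluster.get("provision_progress", []):
        events = step.get("events", [])
        if step.get("successful") is True:
            state = "done"
        elif step.get("successful") is False:
            state = "FAILED"
        else:
            state = "in progress"
        lines.append(
            f"{step['step_name']}: {state} "
            f"({len(events)}/{step.get('total', '?')} events)"
        )
        # Surface the reason for failure directly under the failing step.
        for event in events:
            if event.get("successful") is False:
                lines.append(
                    f"  {event.get('instance_name', '?')}: "
                    f"{event.get('event_info', '')}"
                )
    return lines


# Hypothetical sample data, invented for this sketch.
SAMPLE_CLUSTER = {
    "provision_progress": [
        {
            "step_name": "Wait for instances to become active",
            "total": 2,
            "successful": True,
            "events": [
                {"instance_name": "demo-worker-1", "successful": True},
                {"instance_name": "demo-worker-2", "successful": True},
            ],
        },
        {
            "step_name": "Configure instances",
            "total": 2,
            "successful": False,
            "events": [
                {
                    "instance_name": "demo-worker-1",
                    "successful": False,
                    "event_info": "SSH timeout",
                },
            ],
        },
    ]
}

for line in summarize_provisioning(SAMPLE_CLUSTER):
    print(line)
```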


Health checks for clusters

● Users want to monitor cluster state after provisioning: vital for long-lived clusters

● Sahara in Liberty doesn't have any monitoring of the health of cluster processes. A cluster can be broken or unavailable but Sahara will still think that it is in ACTIVE status.


● Cluster health checks have been available since Mitaka

● Available for clusters deployed using Ambari and Cloudera Manager. Support is more limited for vanilla clusters

● Since Newton, checks are also available for the MapR plugin

● Health results can be set to notify Ceilometer
● Easy to recheck health
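
One way to read a set of health check results is to report the worst status seen. The sketch below shows that idea only; the GREEN/YELLOW/RED scale, the `name` and `status` fields, and the check names are assumptions for illustration, not a description of how Sahara aggregates results internally.

```python
from typing import Dict, List

# Assumed severity scale; higher means worse.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def overall_health(checks: List[Dict[str, str]]) -> str:
    """Return the worst status among finished checks, or UNKNOWN if none."""
    finished = [c["status"] for c in checks if c["status"] in SEVERITY]
    if not finished:
        return "UNKNOWN"
    return max(finished, key=SEVERITY.__getitem__)


def failing_checks(checks: List[Dict[str, str]]) -> List[str]:
    """Return the names of checks that are worse than GREEN."""
    return [c["name"] for c in checks if SEVERITY.get(c["status"], 0) > 0]


# Hypothetical check results, invented for this sketch.
SAMPLE_CHECKS = [
    {"name": "HDFS availability", "status": "GREEN"},
    {"name": "YARN node managers", "status": "YELLOW"},
    {"name": "Manager service health", "status": "CHECKING"},
]

print(overall_health(SAMPLE_CHECKS), failing_checks(SAMPLE_CHECKS))
```

Checks that are still running (the hypothetical `CHECKING` status above) are ignored rather than counted as failures, so a recheck in progress does not flip the summary.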


Health checks for clusters: next steps

● More detailed health checks
  ○ Failure of a particular datanode/slave
  ○ Not enough space in HDFS

● Suggestions/actions to repair health:
  ○ Datanode replacement
  ○ New nodes
  ○ Restarting services

● More flexible configuration of health checks (advanced health checks, disabling or enabling health checks when needed)


Security improvements

● Security is an important part of created clusters
● Previously, security could be enabled only by calling the managers (Ambari and Cloudera Manager) directly, but in that case Sahara does not perform auth operations, and EDP does not work
● Security is important not just for clusters, but for Sahara itself


In Newton the following Kerberos security features were implemented:

● MIT KDC can be preconfigured (or an existing KDC can be used)
● Oozie client was re-implemented to support auth operations with Kerberos
● Spark job executions are also supported
● Keys are distributed on nodes for system users (hdfs, hadoop, spark)
● Supported for clusters deployed using Ambari and Cloudera Manager
● Note: Be sure that the latest hadoop-swift jars are in place for Swift data sources!


Security improvements

● Bandit tests per commit
● Improved secret storage (using Barbican and Castellan) was implemented in the previous release


Where we were

Sahara had 2 flows that were relevant to image manipulation:

● Pre-Nova spawn image packing
  ○ Used sahara-image-elements repository to generate images (to store in Glance)

● Post-Nova spawn cluster generation from “clean” (OS-only) images
  ○ Logic maintained in Sahara process within plugins

● Pre-Configuration validation of images by plugins
  ○ Remember how I said we had 2 flows relevant to image manipulation?
  ○ We didn’t do this at all.


Where We Were: Problems

● Duplication of logic
  ○ Steps required for packing images and “clean” image clusters were often identical, but had to be expressed separately (in DIB and in Python).

● Poor validation
  ○ Plugins did not validate that images provided to them met their needs.
  ○ Failures due to image contents were late and sometimes difficult to understand.

● Poor encapsulation
  ○ Image generation and cluster provisioning logic for any one plugin are really one application
  ○ Maintaining them in two places allows versionitis and dependency problems
  ○ Having one monolithic repo for all plugins makes them less pluggable


Our Dream Implementation

● All flows share common logic:
  ○ Image packing
  ○ Image validation
  ○ Clean image cluster gen

● Image manipulation is stored and versioned within plugins
● The user can still generate images with a CLI...
● But they can also use an API to generate images in clean build environments
● ... And both dev test cycles and user retries are as quick and painless as possible


The plan

1. Build a validation engine that ensures that images meet a specification
   a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
   a. Spawn a clean tenant-plane image build environment
   b. Download a base image from Glance and modify it to spec
   c. Push the new image back to Glance and register it for use by Sahara


Where we are

1. Build a validation engine that ensures that images meet a specification
   a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
   a. Spawn a clean tenant-plane image build environment
   b. Download a base image from Glance and modify it to spec
   c. Push the new image back to Glance and register it for use by Sahara


What it looks like: the specs

● YAML-based definitions
● Argument definitions for configurability
● Idempotent resource declarations
  ○ Scripts must be written idempotently, as always in resource declarations
● Logical control operators (any, all, os_case, etc.)
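
To make the logical control operators concrete, the sketch below evaluates a small spec against a description of an image. The spec is written as a Python structure mirroring what a YAML definition might look like; the `validators` and `package` keys, the package names, and the image dictionary are all assumptions for illustration, and the operator semantics are simplified rather than taken from Sahara's engine.

```python
from typing import Any, Dict

# Hypothetical stand-in for an image: its OS family and installed packages.
Image = Dict[str, Any]


def check(validator: Dict[str, Any], image: Image) -> bool:
    """Evaluate one single-key validator mapping against an image."""
    (kind, body), = validator.items()
    if kind == "package":
        return body in image["packages"]
    if kind == "all":
        return all(check(v, image) for v in body)
    if kind == "any":
        return any(check(v, image) for v in body)
    if kind == "os_case":
        # Only the branch matching the image's OS family applies.
        for case in body:
            (family, validators), = case.items()
            if family == image["os"]:
                return all(check(v, image) for v in validators)
        return True  # No branch for this OS: nothing to check.
    raise ValueError(f"unknown validator: {kind}")


def validate(spec: Dict[str, Any], image: Image) -> bool:
    """An image meets the spec when every top-level validator passes."""
    return all(check(v, image) for v in spec["validators"])


# Hypothetical spec, as it might look after loading a YAML file.
SPEC = {
    "validators": [
        {"package": "java-1.8.0-openjdk"},
        {"os_case": [
            {"redhat": [{"package": "nfs-utils"}]},
            {"debian": [{"package": "nfs-common"}]},
        ]},
        {"any": [{"package": "wget"}, {"package": "curl"}]},
    ]
}

PACKAGES = {"java-1.8.0-openjdk", "nfs-utils", "curl"}
print(validate(SPEC, {"os": "redhat", "packages": PACKAGES}))
print(validate(SPEC, {"os": "debian", "packages": PACKAGES}))
```

The same package set passes on the Red Hat image and fails on the Debian one, because `os_case` selects a different requirement for each OS family.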


What it looks like: the CLI

Command structure:

sahara-image-pack --image ./image.qcow2 PLUGIN VERSION [plugin arguments]

Features:

● Auto-generates help text from arguments
● Idempotent and modifies images in-place
  ○ Very fast test cycles and retries
● Allows freeform bash scripts and more structured resources
  ○ Though it’s on you to make your scripts idempotent
● Test-only mode to validate without change
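
The idea of generating help text from argument definitions can be sketched with Python's standard `argparse` module. This is not the actual `sahara-image-pack` implementation: the `--test-only` flag name, the `java-distro` argument, and the layout of `PLUGIN_ARGUMENTS` are assumptions made for illustration.

```python
import argparse
from typing import Any, Dict

# Hypothetical argument definitions, as a plugin spec might declare them.
PLUGIN_ARGUMENTS = {
    "java-distro": {
        "description": "The Java distribution to install.",
        "default": "openjdk",
        "choices": ["openjdk", "oracle-java"],
    },
}


def build_parser(plugin_arguments: Dict[str, Dict[str, Any]]) -> argparse.ArgumentParser:
    """Build a parser whose options and help come from the definitions."""
    parser = argparse.ArgumentParser(prog="sahara-image-pack")
    parser.add_argument("--image", required=True,
                        help="path to the image to modify in place")
    # Assumed flag name for the test-only mode.
    parser.add_argument("--test-only", action="store_true",
                        help="validate the image without changing it")
    parser.add_argument("plugin", help="plugin name")
    parser.add_argument("version", help="plugin version")
    for name, definition in plugin_arguments.items():
        parser.add_argument(
            f"--{name}",
            default=definition.get("default"),
            choices=definition.get("choices"),
            help=definition.get("description"),
        )
    return parser


parser = build_parser(PLUGIN_ARGUMENTS)
args = parser.parse_args(
    ["--image", "./image.qcow2", "demo-plugin", "1.0",
     "--java-distro", "oracle-java"]
)
print(args.plugin, args.version, args.java_distro, args.test_only)
```

Because each option's help string is taken from the definition's description, adding an argument to a spec is enough to have it show up in `--help`.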


What it’s doing

The images module runs a sequence of steps against a remote machine

● Validation uses the Sahara SSH remote in read-only mode
● Clean image gen uses the SSH remote
● Image packing uses a libguestfs Python API image handle

All three use the same logic, contained in the appropriate plugin

Plugin implementations are targeted for the O (Ocata) release!
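
The sketch below illustrates how one idempotent step can serve all three flows when only the target differs. `FakeMachine` is a hypothetical stand-in for either an SSH remote or an image handle; none of these class or function names come from Sahara.

```python
from typing import Iterable, List


class ReadOnlyError(RuntimeError):
    """Raised when a step tries to change a read-only target."""


class FakeMachine:
    """Hypothetical target: an SSH remote or an image handle."""

    def __init__(self, packages: Iterable[str], read_only: bool = False):
        self.packages = set(packages)
        self.read_only = read_only

    def has_package(self, name: str) -> bool:
        return name in self.packages

    def install(self, name: str) -> None:
        if self.read_only:
            raise ReadOnlyError(f"image does not meet spec: missing {name}")
        self.packages.add(name)


def ensure_packages(machine: FakeMachine, required: List[str]) -> List[str]:
    """Idempotent step: install only what is missing and report it."""
    missing = [p for p in required if not machine.has_package(p)]
    for package in missing:
        machine.install(package)
    return missing


REQUIRED = ["wget", "java"]

# Packing or clean image generation: the target may be modified.
builder = FakeMachine({"wget"})
print(ensure_packages(builder, REQUIRED))  # first run changes the target
print(ensure_packages(builder, REQUIRED))  # second run finds nothing to do

# Validation: same step, read-only target, so a gap becomes an error.
try:
    ensure_packages(FakeMachine({"wget"}, read_only=True), REQUIRED)
except ReadOnlyError as exc:
    print(exc)
```

Running the step twice against the writable target shows the idempotency the slides ask for: the second run has nothing left to install.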


Ironic integration

Why should you run Bare Metal in OpenStack:

● Big Data workloads originated on bare metal installations
● Quick cluster scalability may have lower priority than long-running stability and persistence
● Best performance by design, no virtualization overhead
● The ability to manage a bare metal cluster with the OpenStack API


Bare Metal compared to Virtualized

                         | Bare metal (Ironic)                               | Virtual Machines
Cluster size flexibility | Dedicating nodes completely.                      | Flavor based scheduling.
Resource utilization     | The host is 100% utilized.                        | KVM has memory overhead. Other VMs may abuse the host’s resources.
Data locality            | Data is accessible directly from the local disks. | Locality may be achieved by proper resource scheduling.
Live migration           | A host may be lost completely.                    | Supported for some target daemons.


Some tips before running Bare Metal

● Scheduling is not trivial. The Cloud operator may need to specify additional Flavors, Availability Zones, or other metadata

● Storage is not backed by Cinder for Bare Metal
  ○ Sahara does disk discovery on its own
  ○ Disks other than the one with the root mount are dedicated to HDFS

● Non-standard hardware will require drivers built into the provisioning image
● Network tenant isolation is achievable through manual hardware switch configurations
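
The disk selection rule above (every disk except the one holding the root mount goes to HDFS) can be sketched as follows. The input loosely resembles a parsed block device listing; the `name`, `mountpoint`, and `children` fields and the sample devices are assumptions for illustration, and this is not Sahara's actual discovery code.

```python
from typing import Any, Dict, List


def hdfs_candidate_disks(block_devices: List[Dict[str, Any]]) -> List[str]:
    """Return the names of whole disks that do not carry the root mount."""
    candidates = []
    for disk in block_devices:
        # A disk is excluded if it, or any of its partitions, is mounted at "/".
        mounts = [disk.get("mountpoint")]
        mounts += [part.get("mountpoint") for part in disk.get("children", [])]
        if "/" not in mounts:
            candidates.append(disk["name"])
    return candidates


# Hypothetical device listing for a bare metal node.
SAMPLE_DEVICES = [
    {"name": "sda", "mountpoint": None, "children": [
        {"name": "sda1", "mountpoint": "/boot"},
        {"name": "sda2", "mountpoint": "/"},
    ]},
    {"name": "sdb", "mountpoint": None},
    {"name": "sdc", "mountpoint": None, "children": []},
]

print(hdfs_candidate_disks(SAMPLE_DEVICES))
```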


What is NEW in NEWton

● Designate integration
● API improvements: pagination for list operations, API to manage/enable/disable plugins
● New plugin versions
  ○ HDP 2.4 supported
  ○ MapR 5.2.0
  ○ CDH 5.7.x
  ○ Vanilla + Spark on YARN
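
Paginated list operations are typically consumed by following a marker from page to page. The sketch below shows that loop against an in-memory stand-in; the `limit` and `marker` parameter names, the `items` and `next` response keys, and `fake_fetch` itself are assumptions for illustration and should be checked against the Sahara API reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical data standing in for a server-side cluster list.
CLUSTERS = [{"id": f"c{i}"} for i in range(5)]


def fake_fetch(marker: Optional[str], limit: int) -> Dict[str, Any]:
    """Stand-in for a list request: return the page after the marker."""
    start = 0
    if marker is not None:
        start = [c["id"] for c in CLUSTERS].index(marker) + 1
    chunk = CLUSTERS[start:start + limit]
    has_more = start + limit < len(CLUSTERS)
    return {"items": chunk, "next": chunk[-1]["id"] if has_more else None}


def iterate_pages(
    fetch_page: Callable[[Optional[str], int], Dict[str, Any]],
    limit: int = 2,
) -> Iterator[Dict[str, Any]]:
    """Yield every item by requesting pages until no next marker is returned."""
    marker = None
    while True:
        page = fetch_page(marker, limit)
        yield from page["items"]
        marker = page.get("next")
        if marker is None:
            break


print([c["id"] for c in iterate_pages(fake_fetch)])
```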


What is NEW in Newton

● Sahara tests framework to validate environment readiness for Sahara’s clusters
  ○ Sahara tempest plugin with more tests (CLI, API)
  ○ Sahara scenario framework with a bunch of templates
  ○ Published on PyPI: https://pypi.python.org/pypi/sahara-tests


Q&A


Useful links and materials

● Sahara wiki: https://wiki.openstack.org/wiki/Sahara
● Sahara specs: https://specs.openstack.org/openstack/sahara-specs/
● Sahara docs: http://docs.openstack.org/developer/sahara/
● Sahara images: http://sahara-files.mirantis.com/images/upstream/newton/