Cloudera User Group SF - Cloudera Manager: APIs & Extensibility


Presented by Bala Venkatrao, Director of Products at Cloudera, during our Bay Area Cloudera User Group on 12/10/13 in San Francisco.


Page 1: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager – APIs & Extensibility

Bala Venkatrao, Products@Cloudera

December 2013

Page 2: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager

End-to-End Administration for CDH

1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools

©2013 Cloudera, Inc. All Rights Reserved.

Page 3: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Integrating with your IT Mgmt tools

[Diagram: Cloudera Manager (Hadoop Operations) connected to Datacenter Operations tools: installation and deployment tools (e.g. Chef, Puppet), monitoring tools (e.g. Orion, Tivoli, BMC), and alerting tools (e.g. Nagios, SNMP).]

Various options for integrating Cloudera Manager into your existing Datacenter Operations/Tools:

• Cloudera Manager API
  • Introduced in CM4 (June 2012)
  • Installation & deployment
  • Monitoring
• SNMP Alerts
  • Introduced in CM4.5 (Feb 2013)
• And more…
  • Monitoring ‘tsquery’ (Feb 2013)
  • User-defined triggers/alarms (new for C5!)
  • Service extensibility (new for C5!)

Page 4: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager (CM) API

• API access was introduced in Cloudera Manager 4.0. It provides programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).

• The CM API is an HTTP REST API that uses JSON serialization. It is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI.

• Docs & Examples
  • http://cloudera.github.io/cm_api/
  • https://github.com/cloudera/cm_api

• Java/Python clients
  • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
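As a rough illustration of what "HTTP REST with JSON, on the same host and port as the web UI" can look like from a client, here is a minimal Python sketch that uses only the standard library. The port (7180), the version segment (`v1`), the `clusters` path, HTTP Basic authentication, and the `items` field in the sample body are assumptions for illustration and should be checked against the API docs linked above. The host name and credentials are placeholders.

```python
import base64
import json
import urllib.request


def build_request(host, path, user, password, port=7180, version="v1"):
    """Build a GET request for a CM API resource with HTTP Basic credentials.

    ASSUMPTION: port 7180 and the /api/<version>/ prefix are illustrative
    defaults; confirm them for your own deployment.
    """
    url = f"http://{host}:{port}/api/{version}/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_names(payload):
    """Pull cluster names out of a JSON list response (assumed 'items' field)."""
    return [item["name"] for item in json.loads(payload)["items"]]


# Hand-written sample body for illustration; not captured from a real server.
SAMPLE = '{"items": [{"name": "Cluster 1 - CDH4", "version": "CDH4"}]}'

request = build_request("cm-host.example.com", "clusters", "admin", "admin")
print(request.full_url)
print(cluster_names(SAMPLE))
```

Against a live server, the body would come from `urllib.request.urlopen(request).read()` instead of `SAMPLE`. The Java and Python clients linked above wrap this kind of call.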

Page 5: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Examples of integration with CM API

• Installation & Deployment
  • Chef/Puppet
  • Dell Crowbar
    • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
  • StackIQ
    • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
  • WANdisco – non-stop NN setup
  • Several other customers/partners leveraging the APIs as part of their install & deployment process

• Monitoring & Alerting
  • Oracle Enterprise Manager (via Big Data Appliance)
  • Nagios
    • https://github.com/cloudera/cm_api/tree/master/nagios
    • https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
  • SNMP alerts integration with IBM Netcool
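To make the Nagios integration above more concrete, the following Python sketch shows the core of a check plugin: translating a health summary reported by Cloudera Manager into a Nagios plugin exit code. The exit codes (0 to 3) follow the standard Nagios plugin convention. The health strings and the mapping between the two are assumptions for illustration, not taken from the plugins linked above.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]

# ASSUMPTION: health summary strings and how they map to Nagios states.
HEALTH_TO_NAGIOS = {
    "GOOD": OK,
    "CONCERNING": WARNING,
    "BAD": CRITICAL,
}


def nagios_status(service_name, health_summary):
    """Return (exit_code, status_line) for one service's health summary."""
    # Anything not explicitly mapped is reported as UNKNOWN.
    code = HEALTH_TO_NAGIOS.get(health_summary, UNKNOWN)
    return code, f"{LABELS[code]} - {service_name} health is {health_summary}"


code, line = nagios_status("hdfs1", "CONCERNING")
print(code, line)
```

A real plugin would fetch the health summary through the CM API, print the status line, and finish with `sys.exit(code)` so that Nagios picks up the state.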

Develop & contribute your plug-ins using the Cloudera Manager API

Page 6: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager – Monitoring via ‘tsquery’


• Introduced as part of CM4.5 release (Feb 2013)

• Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters

• The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store

• Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?

select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1

• Retrieved time-series data can be plotted via various options – line, bar, scatter, heat maps, table list etc.

• The same concept is extended to create user-defined triggers/alarms (new for C5!).

• More details: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
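The DataNode example on this slide follows a simple `select <metrics> where <filters>` shape, which makes it easy to generate from a script. The Python sketch below composes such a statement and URL-encodes it as a query parameter. The helper names are hypothetical. The `timeseries` resource path, the `query` parameter name, and the `v4` version segment are assumptions to verify against the CM API docs.

```python
from urllib.parse import urlencode


def build_tsquery(metrics, **filters):
    """Compose 'select m1, m2 where k1=v1 and k2=v2' from Python values."""
    statement = "select " + ", ".join(metrics)
    if filters:
        predicates = [f"{key}={value}" for key, value in filters.items()]
        statement += " where " + " and ".join(predicates)
    return statement


def timeseries_url(base_url, query):
    """URL-encode a tsquery statement as a request parameter.

    ASSUMPTION: a 'timeseries' resource that accepts a 'query' parameter.
    """
    params = urlencode({"query": query})
    return f"{base_url}/timeseries?{params}"


query = build_tsquery(
    ["bytes_read", "bytes_written"], roleType="DATANODE", serviceName="hdfs1"
)
print(query)
print(timeseries_url("http://cm-host.example.com:7180/api/v4", query))
```

The first line printed is the same statement as the example on the slide. Keeping the statement builder separate from the URL helper lets the same query text be pasted into the chart builder in the web UI.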

Page 7: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Examples of Cloudera Manager ‘tsquery’

Example 1: How do I track the aggregate cluster disk IO?

select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID

Example 2: How do I compare CPU usage across hosts?

select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
       dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
       dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
       dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
       dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
       dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100

Create & Contribute your ‘tsqueries’!

https://github.com/cloudera/cm_charting_scrapbook
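The CPU example repeats one expression for six counters, so it is a natural candidate for generation. This small Python sketch rebuilds that query from a list of metric names; the function and constant names are hypothetical, while the metric names and the `dt0(...) / getHostFact(numCores, 1) * 100` expression are copied from the slide.

```python
# Counter names as they appear in Example 2 on this slide.
CPU_METRICS = [
    "total_cpu_user",
    "total_cpu_system",
    "total_cpu_nice",
    "total_cpu_iowait",
    "total_cpu_irq",
    "total_cpu_soft_irq",
]


def cpu_percent_query(metrics=CPU_METRICS):
    """Build a tsquery that charts each CPU counter as a per-core percentage."""
    terms = [f"dt0({name}) / getHostFact(numCores, 1) * 100" for name in metrics]
    # One term per line keeps long generated queries readable.
    return "select " + ",\n       ".join(terms)


print(cpu_percent_query())
```

Passing a shorter list, for example `cpu_percent_query(["total_cpu_user", "total_cpu_system"])`, yields a narrower chart without retyping the expression.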

Page 8: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera as an Application Platform

[Diagram: ISV’s view of a Database. Core Database, with Workload Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs, and Systems Mgmt.]

[Diagram: ISV’s view of an OS. Core OS kernel, with Package Mgmt, Process/Resource Mgmt, Security Mgmt, Data Access APIs, and Systems Mgmt.]

Page 9: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera as an Application Platform

[Diagram: ISV’s view of Cloudera. CDH, with Package Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs, Systems Mgmt, and Workload/Process Mgmt.]

Page 10: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Platform Features

Features, with description and examples:

Package Mgmt
  Description:
    - Ability to easily package and distribute binaries/jars via “Parcels”
  Examples: Informatica, Syncsort, LZO libraries

Workload/Process Mgmt
  Description:
    - Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop cluster
    - Isolation of cluster resources
  Examples: SAS, 0xData, Accumulo, Spark

Security Mgmt
  Description:
    - Support for Kerberos Mgmt
    - Role-based access control for Tables/Views in Hive/Impala via Sentry

Data Access APIs
  Description:
    - HDFS API, HBase API, Search API, Spark API
    - Kite (formerly Cloudera Development Kit)
  Examples: Causata, Basis Tech, CounterTack, Amdocs

Drivers
  Description:
    - ODBC/JDBC drivers for Hive/Impala
  Examples: Zoomdata, Tableau, Microstrategy, Qlikview

Systems Mgmt
  Description:
    - End-to-End management of an application via Cloudera Manager (CM)
    - Manage: deploy and upgrade (rolling) services and pkgs; manage configurations
    - Monitor: proactive health checks; track resource utilization; custom metrics charts
    - Diagnose: distributed log collection and searching; tag and track key events
    - Integrate: access CM via API
  Examples: StackIQ, Dell Crowbar, Oracle OEM

* Support for YARN planned as part of CM5.x in FY14

Page 11: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Example – Deployment via Parcels

• Smarter Architecture: No code generation. ETL engine runs natively within Hadoop MapReduce, via plugin included in CDH 4.2

• Smarter Deployment & Administration: Seamless integration with Cloudera Manager for one-click deployment and easier administration

• Smarter Monitoring: Comprehensive logging capabilities + activity monitoring through Cloudera Manager

The platform for Big Data + The ETL app for Hadoop

Page 12: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

How it works

1. Download Syncsort DMX-h “Parcel” file to your custom repository
   • File contains everything you need to properly deploy Syncsort DMX-h ETL Edition on Cloudera

2. Distribute & activate DMX-h parcel on your Cloudera cluster

Cloudera Manager wizard steps shown on the slide:

A. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
B. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
C. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.

Page 13: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Syncsort DMX-h + Cloudera Manager

[Diagram: Cloudera Manager (Installation, Management, Monitoring, Support, Integration via API) managing a CDH cluster plus ISV software, with Syncsort DMX-h on every CDH node.]

Page 14: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Get a 360° View of Your Cluster, Including DMX-h Logs

• View service health & performance
• Monitor & diagnose workloads
• Get host-level snapshots
• Gather, view & search Hadoop & DMX-h logs
• …And more!!

Build and distribute your own Parcels via Cloudera Manager and share them with the community!

Page 15: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Service Extensibility

• Introduced in C5

• Still in Beta!

• Single management console for CDH, non-CDH services and ISV applications

• Similar look and feel as existing services

• Easy to write (Java-free!)

• Flexible

• Independent release cycle


Page 16: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

So.. How does it work?

• A JSON file that describes your service

• Set of control scripts

• Packaged as a JAR file

• As promised, Java-free
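The three ingredients above (a JSON descriptor, control scripts, a JAR) can be assembled without any Java tooling, because a JAR file is a ZIP archive. The Python sketch below packages a directory into a `.jar` with the standard `zipfile` module. The directory layout and file names (`service.json`, `scripts/control.sh`, `spark-ext.jar`) are hypothetical placeholders, since the SDK that defines the real layout was not yet published at the time of this talk.

```python
import os
import tempfile
import zipfile


def package_extension(source_dir, jar_path):
    """Zip every file under source_dir into jar_path, keeping relative paths."""
    names = []
    with zipfile.ZipFile(jar_path, "w", zipfile.ZIP_DEFLATED) as jar:
        for root, _dirs, files in os.walk(source_dir):
            for filename in sorted(files):
                full_path = os.path.join(root, filename)
                # Archive entries always use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                jar.write(full_path, arcname)
                names.append(arcname)
    return sorted(names)


def demo():
    """Create a throwaway extension directory and package it."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "spark-ext")
        os.makedirs(os.path.join(src, "scripts"))
        with open(os.path.join(src, "service.json"), "w") as handle:
            handle.write('{"name": "spark"}')
        with open(os.path.join(src, "scripts", "control.sh"), "w") as handle:
            handle.write("#!/bin/bash\n")
        return package_extension(src, os.path.join(work, "spark-ext.jar"))


print(demo())
```

The `jar` tool would also add a `META-INF/MANIFEST.MF` entry. Whether Cloudera Manager requires one is not stated in the slides.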


Page 17: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Example: Cloudera Manager Extensions - Spark


Page 18: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager Extensions


Page 19: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager Extensions: Spark


Page 20: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager Extensions: Spark


Page 21: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Cloudera Manager Extensions: Spark


Page 22: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

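#!/bin/bash
# Control script: Cloudera Manager runs it with the command name as $1
# and, per the descriptor, the generated properties file as $2.
CMD=$1
# Assumes the generated file holds key=value lines such as master_port=7077.
MASTER_PORT=$(grep '^master_port=' "${2:-./params.properties}" | cut -d= -f2)

case $CMD in
  (start_master)
    exec "$SPARK_HOME/scripts/spark-start.sh" master
    ;;
  (*)
    echo "$(date) Don't understand [$CMD]"
    exit 1
    ;;
esac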

{
  "name" : "spark",
  "roles" : [{
    "name" : "master",
    "startRunner" : {
      "program" : "scripts/control.sh",
      "args" : [ "start_master", "./params.properties" ]
    },
    "parameters" : [{
      "name" : "master_port",
      "type" : "port",
      "default" : 7077
    }],
    "configWriter" : {
      "generators" : [{
        "filename" : "params.properties"
      }]
    }
  }]
}

The Code
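Because the service descriptor is plain JSON, it can be sanity-checked before packaging. The Python sketch below loads a descriptor shaped like the one on this slide and reports obvious gaps. The checks themselves (a service name, a `startRunner.program` per role, a valid default for `port` parameters) are illustrative assumptions, not rules taken from Cloudera's SDK.

```python
import json

# Descriptor modeled on the slide, written as strict JSON (quoted keys).
DESCRIPTOR = """
{
  "name": "spark",
  "roles": [{
    "name": "master",
    "startRunner": {
      "program": "scripts/control.sh",
      "args": ["start_master", "./params.properties"]
    },
    "parameters": [{"name": "master_port", "type": "port", "default": 7077}],
    "configWriter": {"generators": [{"filename": "params.properties"}]}
  }]
}
"""


def check_descriptor(text):
    """Return a list of human-readable problems; an empty list means none found."""
    service = json.loads(text)
    problems = []
    if not service.get("name"):
        problems.append("service has no name")
    for role in service.get("roles", []):
        role_name = role.get("name", "<unnamed>")
        if not role.get("startRunner", {}).get("program"):
            problems.append(f"role {role_name}: no startRunner.program")
        for param in role.get("parameters", []):
            # Hypothetical rule: port defaults must be valid TCP port numbers.
            if param.get("type") == "port" and not 0 < param.get("default", 0) < 65536:
                problems.append(f"role {role_name}: bad port default for {param.get('name')}")
    return problems


print(check_descriptor(DESCRIPTOR))
```

A descriptor that fails `json.loads` (for example one with unquoted keys or an unbalanced brace) raises `json.JSONDecodeError`, which is itself a useful early check.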


Page 23: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Next Steps

• Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)

• Working with select ISVs (SAS, 0xData etc.) as part of Beta to further fine-tune this feature

Develop & contribute your Cloudera Manager service extensibility plug-ins!

Page 24: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Vision of CM Extensibility

[Diagram: CDH and CM at the center, with extension points labeled as follows.
 Horizontal Extension (API): Capacity Mgr, SLA Mgr, Cost Optimizer
 Vertical Extension / Service Extensibility (Apps): Syncsort, Informatica, Security ISVs, 0xData, SAS, Revolution, Spark, Giraph, Accumulo
 Ops (API, SNMP): Oracle OEM, Dell, Nagios, Chef/Puppet]

Page 25: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Q&A

• If you are interested in learning more, participating in the Beta, or contributing plug-ins or apps, contact: [email protected]

Page 26: Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Appendix/Resources

• Systems Management
  • Cloudera Manager API
    • http://cloudera.github.io/cm_api/
    • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/

• Package Management
  • Docs on Parcels
    • http://training.cloudera.com/elearning/Parcels/
    • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Introduction/cmi_primer.html
    • http://blog.cloudera.com/blog/2013/05/faq-understanding-the-parcel-binary-distribution-format/
    • http://blog.cloudera.com/blog/2013/07/one-engineers-experience-with-parcel/

• Data Access APIs
  • http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/
  • https://github.com/cloudera/cdk

• Workload/Resource Management
  • Cloudera Manager 5 documentation
    • http://cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_managing_resources.html
  • http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together/

• Security Management
  • http://blog.cloudera.com/blog/2013/07/with-sentry-cloudera-fills-hadoops-enterprise-security-gap/