Managing Enterprise Hadoop Clusters with Apache Ambari

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Managing Enterprise Hadoop Clusters with Apache Ambari
Jayush Luniya @ Hortonworks, Apache Ambari PMC
May 2016

Agenda
• Ambari Overview
• Ambari Features
• Demo
• Q&A

What’s Apache Ambari?
A 100% open-source platform for simplifying Hadoop cluster management and use.
Highly extensible.

It’s a wild zoo out there! Gotta manage this efficiently.

Apache Ambari Themes
• Operate Hadoop at Scale: Deliver the core operational capabilities to provision, manage and monitor Hadoop clusters at scale.
• Integrate with the Enterprise: Robust API for integration with existing enterprise systems, such as Microsoft SCOM and Teradata Viewpoint.
• Extend for the Ecosystem: Provide an extensible platform for Customers, Partners and the Community (Stacks, Views).

Apache Ambari

Open Source Activity

Inception: AMBARI-1 (Sept, 2011)

Fast forward 5 years to today…
• Latest JIRA: AMBARI-16131
• 150+ Contributors
• 60+ Committers
• 16131 JIRAs filed
• 14254 JIRAs fixed
• At 1.5 days per JIRA, roughly 90 person-years!
• Used by hundreds of companies

Ambari – 3rd Biggest Project* @ Apache
* Based on total JIRAs filed on a project basis as of April 26, 2016
#2 is Hadoop at ~32k JIRAs, as it is split across multiple JIRA projects.
[Chart: projects ranked #1, #3, #4 and #5 by total JIRAs filed]

Timeline
Release | Date | JIRAs
--- | --- | ---
Ambari 1.5.* | Apr 2014 | 1218
Ambari 1.6.* | May 2014 | 908
Ambari 1.7.* | Dec 2014 | 1620
Ambari 2.0.* | April 2015 | 1804
Ambari 2.1.* | July 2015 | 2674
Ambari 2.2.* | Dec 2015 | 1542
Current GA Version: 2.2.2
Resolution of 9k+ JIRAs
Feature milestones shown along the timeline: Ambari Stacks, Ambari Blueprints, Ambari Views, Alerts Framework, Metrics System, Rolling Upgrade, Kerberos Automation, Enhanced Dashboards, Smart Configs, Express Upgrade, AMS Grafana

Agenda
• Ambari Overview
• Ambari Features
• Demo
• Q&A

Extensibility Features
• Stacks
  – To add new Services (ISV or otherwise) beyond the HDP stack
  – To customize a Stack for customer-specific environments
• Blueprints
  – To use Ambari for automating cluster installations
  – To share best practices on layout and cluster configuration
• Views
  – To extend and customize the Ambari Web UI
  – Add new capabilities, customize existing capabilities

Anatomy of Ambari Extension Points

Ambari Stacks

Stack Terminology
Term | Definition | Examples
--- | --- | ---
STACK | Defines a set of Services, where to obtain the software packages and how to manage the lifecycle. | HDP-2.3, HDP-2.2
SERVICE | Defines the Components that make up the service. | HDFS, NAGIOS, YARN
COMPONENT | The building blocks of a Service, which adhere to a certain lifecycle. | NAMENODE, DATANODE, OOZIE_SERVER
CATEGORY | The category of Component. | MASTER, SLAVE, CLIENT
REPO | Repository metadata where the artifacts reside. | http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.3.0.0

Ambari Stack
• Stacks define Services + Repo
  – What is a stack, and where to get the bits
• Each service has a definition
  – What components are part of the Service
• Each service has defined lifecycle commands
  – start, stop, status, install, configure
• Lifecycle is controlled via command scripts
• Ability to define “custom” commands
[Diagram: Ambari Server holding the Stack (Service Definitions in XML, Command Scripts in Python), Ambari Agents, Repos]
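To make the lifecycle idea concrete, here is a minimal, self-contained sketch of how a command script might map lifecycle commands to methods. It does not use Ambari's own Python library; the class names `LifecycleScript` and `ExampleServer`, the custom command `refresh_queues`, and the state names returned by `status` are hypothetical stand-ins chosen for illustration.

```python
class LifecycleScript:
    """Hypothetical base class: dispatches a command name to a method."""

    LIFECYCLE = ("install", "configure", "start", "stop", "status")
    CUSTOM = ()  # subclasses list their "custom" commands here

    def execute(self, command):
        # Only declared lifecycle or custom commands may be dispatched.
        if command not in self.LIFECYCLE + self.CUSTOM:
            raise ValueError("unsupported command: " + command)
        return getattr(self, command)()


class ExampleServer(LifecycleScript):
    """Hypothetical component implementing the five lifecycle commands."""

    CUSTOM = ("refresh_queues",)

    def __init__(self):
        self.running = False
        self.log = []

    def install(self):
        self.log.append("install packages from the stack repo")

    def configure(self):
        self.log.append("write configuration files")

    def start(self):
        self.configure()  # make sure configs are in place before starting
        self.running = True
        self.log.append("start")

    def stop(self):
        self.running = False
        self.log.append("stop")

    def status(self):
        return "STARTED" if self.running else "INSTALLED"

    def refresh_queues(self):
        # Example of a "custom" command beyond the standard lifecycle.
        self.log.append("refresh queues")


if __name__ == "__main__":
    server = ExampleServer()
    for cmd in ("install", "start", "refresh_queues"):
        server.execute(cmd)
    print(server.execute("status"), server.log)
```

The point of the sketch is only the shape: one script per component, one method per command, and a dispatcher that the agent side can call with a command name.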

Stacks Support Inheritance
• HDP 2.1 Stack (child)
  – Overrides any Service definitions, commands and configurations
  – Adds new Services specific to this Stack
• HDP 2.0 Stack (parent)
  – Defines a set of Service definitions
  – Default service configurations and command scripts
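Stack inheritance can be pictured as a dictionary merge in which the child stack wins. The sketch below is an illustration under that assumption; `resolve_stack`, the script file names and the `heap_mb` values are hypothetical and do not reflect the real stack definition format.

```python
def resolve_stack(parent, child):
    """Merge a child stack over its parent, service by service."""
    # Copy each parent service so the parent stack is left untouched.
    resolved = {name: dict(definition) for name, definition in parent.items()}
    for name, overrides in child.items():
        # Existing services are overridden key by key; new ones are added.
        resolved.setdefault(name, {}).update(overrides)
    return resolved


# Hypothetical parent stack: default definitions and command scripts.
hdp_2_0 = {
    "HDFS": {"command_script": "hdfs_2_0.py", "heap_mb": 1024},
    "YARN": {"command_script": "yarn_2_0.py", "heap_mb": 1024},
}
# Hypothetical child stack: one override and one new service.
hdp_2_1 = {
    "YARN": {"heap_mb": 2048},
    "TEZ": {"command_script": "tez_2_1.py"},
}

hdp_2_1_resolved = resolve_stack(hdp_2_0, hdp_2_1)
print(sorted(hdp_2_1_resolved))
```

Anything the child does not mention is inherited unchanged, which is why a new stack version only needs to describe its differences.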

Blueprints

Automated Cluster Deployment
• Deploy clusters of any scale with ease
• Two REST API calls are all it takes to provision a cluster
Who uses it?
• HDInsight (Microsoft Azure)
• Hortonworks QA

Example: Create a 100-node Cluster

1. POST /api/v1/blueprints/my-blueprint

{
  "configurations": [
    { "hdfs-site": { "dfs.datanode.data.dir": "/hadoop/1,/hadoop/2,/hadoop/3" } }
  ],
  "host_groups": [
    {
      "name": "master-host",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" }, … ],
      "cardinality": "1"
    },
    {
      "name": "worker-host",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" }, … ],
      "cardinality": "1+"
    }
  ],
  "Blueprints": { "stack_name": "HDP", "stack_version": "2.0" }
}

2. POST /api/v1/clusters/my-cluster

{
  "blueprint": "my-blueprint",
  "host_groups": [
    {
      "name": "master-host",
      "hosts": [ { "fqdn": "master001.ambari.apache.org" } ]
    },
    {
      "name": "worker-host",
      "hosts": [
        { "fqdn": "worker001.ambari.apache.org" },
        { "fqdn": "worker002.ambari.apache.org" },
        …
        { "fqdn": "worker099.ambari.apache.org" }
      ]
    }
  ]
}
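The two calls can be scripted. The sketch below builds both requests with Python's standard library and deliberately stops short of sending them. The server address, the `admin`/`admin` credentials and the helper names `ambari_post` and `worker_hosts` are assumptions for illustration, and the component lists are abbreviated just as they are on the slide.

```python
import base64
import json
import urllib.request

AMBARI = "http://localhost:8080"  # assumed Ambari Server address


def ambari_post(path, payload, user="admin", password="admin"):
    """Build (but do not send) an authenticated POST request to Ambari."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        AMBARI + path,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "X-Requested-By": "ambari",  # Ambari expects this header on writes
        },
    )


def worker_hosts(count, domain="ambari.apache.org"):
    """Generate worker001..workerNNN host entries for the cluster template."""
    return [{"fqdn": f"worker{i:03d}.{domain}"} for i in range(1, count + 1)]


blueprint = {
    "host_groups": [
        {"name": "master-host", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker-host", "cardinality": "1+",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
}
cluster = {
    "blueprint": "my-blueprint",
    "host_groups": [
        {"name": "master-host",
         "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
        {"name": "worker-host", "hosts": worker_hosts(99)},
    ],
}

calls = [
    ambari_post("/api/v1/blueprints/my-blueprint", blueprint),  # step 1
    ambari_post("/api/v1/clusters/my-cluster", cluster),        # step 2
]
# To provision for real, send each request in order with urllib.request.urlopen.
```

Generating the host list in code is what makes the 100-node case no harder than a 3-node one: only the count changes.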

Cluster Replication
• Export blueprint from an existing cluster
• Import blueprint to replicate the cluster

GET /api/v1/clusters/my-cluster?format=blueprint

{
  "configurations": [
    {
      "cluster-env": { "user_group": "hadoop" },
      "hdfs-site": { "dfs.datanode.data.dir": "/hadoop/1,/hadoop/2,/hadoop/3" }
    }
  ],
  "host_groups": [
    {
      "name": "master-host",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" }, … ],
      "cardinality": "1"
    }
  ],
  "Blueprints": { "stack_name": "HDP", "stack_version": "2.0" }
}
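Replication then amounts to exporting the blueprint and pairing its host groups with a new set of machines. The sketch below shows one way to do that; the server address, the function names `export_request` and `cluster_template`, and the replica host names are hypothetical, and authentication is left out for brevity.

```python
import urllib.request


def export_request(cluster, server="http://localhost:8080"):
    """Build the GET request that exports a cluster as a blueprint."""
    url = f"{server}/api/v1/clusters/{cluster}?format=blueprint"
    return urllib.request.Request(url, method="GET")


def cluster_template(blueprint_name, exported, hosts_by_group):
    """Map every host group of an exported blueprint onto new hosts."""
    groups = []
    for group in exported["host_groups"]:
        name = group["name"]
        if name not in hosts_by_group:
            raise KeyError("no hosts supplied for host group " + name)
        groups.append({
            "name": name,
            "hosts": [{"fqdn": fqdn} for fqdn in hosts_by_group[name]],
        })
    return {"blueprint": blueprint_name, "host_groups": groups}


# Stand-in for the JSON returned by the export call.
exported = {
    "host_groups": [{"name": "master-host", "cardinality": "1",
                     "components": [{"name": "NAMENODE"}]}],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
}
template = cluster_template(
    "my-blueprint", exported,
    {"master-host": ["master001.replica.example.com"]},  # hypothetical host
)
print(export_request("my-cluster").full_url)
```

Failing loudly when a host group has no hosts is a deliberate choice: a silently empty group would produce a replica that is missing components.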

Blueprint Features
• Ambari 2.0
  – High availability (HA) cluster deployments
  – Adding hosts using blueprints (AMBARI-8458)
• Ambari 2.1
  – Advanced cluster creation options (AMBARI-10750)
• Ambari 2.2
  – Kerberized cluster deployments (AMBARI-13431)
  – Stack advisor recommendations (AMBARI-13487)

Stack Upgrades

Stack Upgrades
• Rolling vs Express Upgrade modes
• Side-by-Side Bits and Configs
Bits:
  /usr/hdp/2.2.0.0-2041
  /usr/hdp/2.2.4.2-2
  /usr/hdp/2.3.0.0-3000
Configs:
  /etc/hive/conf/ (initial)
  /etc/hive/conf/v0 (HDP 2.2.4.2)
  /etc/hive/conf/v1 (HDP 2.3)
2.2.0.0 → 2.2.4.2 is a minor jump; 2.2.4.2 → 2.3.0.0 is a major jump.
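The difference between a minor and a major jump can be read off the version strings. The sketch below assumes that a change in the first two version components marks a major jump, which matches the two examples on the slide but is not stated there as a rule; `parse_version`, `jump_type` and `install_dir` are hypothetical helper names.

```python
def parse_version(version):
    """Split '2.2.4.2-2' into numeric parts (2, 2, 4, 2), ignoring the build."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def jump_type(current, target):
    """Classify an upgrade as 'minor' or 'major' (assumed rule, see lead-in)."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        raise ValueError("target must be newer than the current version")
    return "major" if tgt[:2] != cur[:2] else "minor"


def install_dir(version, root="/usr/hdp"):
    """Each version's bits live side by side under their own directory."""
    return f"{root}/{version}"


for cur, tgt in [("2.2.0.0-2041", "2.2.4.2-2"), ("2.2.4.2-2", "2.3.0.0-3000")]:
    print(install_dir(cur), "->", install_dir(tgt), jump_type(cur, tgt))
```

Because every version installs into its own directory, the old bits stay in place during an upgrade and remain available if a downgrade is needed.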

Express vs Rolling Upgrade
Rolling Upgrade
• Services are up the entire time
• Upgrade one component at a time
• Robust and fault-tolerant
• Service checks performed frequently during the upgrade
Express Upgrade
• All services are brought down, upgraded and restarted
• Faster upgrade mode
• Planned service downtime
• Service checks performed less frequently during the upgrade

Stack Upgrade – Install Version
• Install new version in parallel on all agents
• No downtime

Stack Upgrade – Orchestration
Not necessarily “one-click” but fully guided

Stack Upgrade – Upgrade Catalog
• Upgrades are driven by upgrade catalogs defined in stack definitions
• Defines upgrade groups and upgrade order
• Provides ability to modify configurations
  – Set, move, delete, transform
• Upgrade steps can be marked as skippable and retryable
• Supports executing custom scripts during upgrade
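The four configuration operations are easy to picture on a plain dictionary. The sketch below is only an illustration of set, move, delete and transform; the step format, the function `apply_config_changes` and the property names are hypothetical and are not the catalog format Ambari actually uses.

```python
def apply_config_changes(configs, changes):
    """Apply set/move/delete/transform steps to a copy of a config dict."""
    result = dict(configs)  # never mutate the caller's configuration
    for change in changes:
        op, key = change["op"], change["key"]
        if op == "set":
            result[key] = change["value"]
        elif op == "move":
            if key in result:
                result[change["to"]] = result.pop(key)
        elif op == "delete":
            result.pop(key, None)
        elif op == "transform":
            if key in result:
                result[key] = change["fn"](result[key])
        else:
            raise ValueError("unknown operation: " + op)
    return result


# Hypothetical properties before the upgrade.
before = {"old.port": "8020", "heap.size": "1024", "legacy.flag": "true"}
steps = [
    {"op": "set", "key": "new.feature.enabled", "value": "true"},
    {"op": "move", "key": "old.port", "to": "rpc.port"},
    {"op": "delete", "key": "legacy.flag"},
    {"op": "transform", "key": "heap.size", "fn": lambda v: v + "m"},
]
after = apply_config_changes(before, steps)
print(sorted(after.items()))
```

Working on a copy keeps the pre-upgrade configuration intact, which mirrors the side-by-side approach the upgrade uses for bits and configs.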

Stack Upgrade – Upgrade Catalog

Stack Downgrade
• Can trigger downgrade at any stage of the stack upgrade
• Cannot downgrade once stack upgrade has been finalized

Smart Configurations

Hadoop Configuration Challenges
• Too many configurations
  – Which ones are important?
• Too easy to mess up
  – What are valid/reasonable values?
  – What are the units?
  – Ok, what about dependencies?
• Gets harder with combinations of services, host assignments, enabled features, CPU/RAM/disks, etc.
  – Any recommendations? What am I doing wrong?
Smart Configurations

Ambari Smart Configs UI
Customizable layout
- Tabs
- Sections
- Sub-sections
- Simple grid layout
(Advanced Tab contains remaining configurations)
New Widgets
- Sliders
  - Recommended
  - Minimum
  - Maximum
  - Increment Step
- Combos
  - Enumerated values
- Toggles
  - Binary options
- Spinners
  - Splits value into multiple controls. Time in milliseconds split into days, hours, minutes.
- Lists
  - Enumerated values
  - Single select
  - Multi select
Implemented
- HDFS
- YARN
- MapReduce
- Hive
- HBase
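The spinner behaviour described above, one millisecond value shown as several controls, comes down to repeated integer division. Here is a minimal sketch; the function names `split_millis` and `join_millis` and the extra `millis` remainder field are assumptions, not part of the Ambari UI code.

```python
def split_millis(millis):
    """Split a millisecond value into days, hours, minutes and a remainder."""
    if millis < 0:
        raise ValueError("duration must not be negative")
    minutes, rest_ms = divmod(millis, 60_000)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return {"days": days, "hours": hours, "minutes": minutes, "millis": rest_ms}


def join_millis(days, hours, minutes, millis=0):
    """Inverse of split_millis: rebuild the stored millisecond value."""
    return ((days * 24 + hours) * 60 + minutes) * 60_000 + millis


parts = split_millis(93_780_000)
print(parts)  # {'days': 1, 'hours': 2, 'minutes': 3, 'millis': 0}
```

Keeping the remainder makes the split lossless, so a value edited in the widget can be written back without rounding.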

Stack Driven Layouts
Stack has theme.json file
• Layout
  – Tabs
  – Sections
  – Sub-sections
• Placement
  – Configs placement in sub-sections
• Widgets
  – Widget type
  – Optional Units
    Bytes (B, KB, MB, GB, TB, PB)
    Time (Millis, Seconds, Minutes, Hours, Days, Months, Years)

{
  "name": "default",
  "description": "Default theme for HBASE service",
  "configuration": {
    "layouts": [
      {
        "name": "default",
        "tabs": [
          {
            "name": "settings",
            "display-name": "Settings",
            "layout": {
              "tab-columns": "3",
              "tab-rows": "3",
              "sections": [ ... ]
            }
          }
        ]
      }
    ],
    "placement": {
      "configuration-layout": "default",
      "configs": [...]
    },
    "widgets": [
      {
        "config": "hbase-env/hbase_master_heapsize",
        "widget": {
          "type": "slider",
          "units": [ { "unit-name": "GB" } ]
        }
      },
      ...
    ]
  }
}

Config Metadata and Dependencies
• Extended Metadata
  – Defined in property_value_attributes
  – Holds non-UI metadata about value range, increment, unit, etc.
• Dependencies
  – Models bi-directional relationship between configs
  – Depends On (property_depends_on): answers “which configs do I depend on?”
  – Depended By (dependencies): answers “which configs are dependent on me?”
  – Ambari automatically updates dependencies

{
  "StackConfigurations": {
    "final": "false",
    "property_depends_on": [
      { "type": "yarn-site", "name": "yarn.nodemanager.resource.memory-mb" }
    ],
    "property_description": "The minimum allocation for every",
    "property_display_name": "Minimum Container Size (Memory)",
    "property_name": "yarn.scheduler.minimum-allocation-mb",
    "property_type": [],
    "property_value": "512",
    "property_value_attributes": {
      "type": "int",
      "maximum": "5120",
      "minimum": "0",
      "unit": "MB",
      "increment_step": "256"
    },
    "type": "yarn-site.xml"
  },
  "dependencies": [
    {
      "StackConfigurationDependency": {
        "dependency_name": "hive.tez.container.size",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.map.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    },
    {
      "StackConfigurationDependency": {
        "dependency_name": "mapreduce.reduce.memory.mb",
        "property_name": "yarn.scheduler.minimum-allocation-mb"
      }
    }
    …
  ]
}
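The two directions of the relationship are views of the same data: given every property's "depends on" list, the "depended by" list falls out by inverting the map. The sketch below uses the property names from the example above; the function `depended_by` and the flat dictionary layout are assumptions, not Ambari's internal model.

```python
from collections import defaultdict


def depended_by(depends_on):
    """Invert a 'depends on' map into a 'depended by' map."""
    reverse = defaultdict(set)
    for prop, parents in depends_on.items():
        for parent in parents:
            reverse[parent].add(prop)
    # Sort for stable, readable output.
    return {parent: sorted(children) for parent, children in reverse.items()}


# "Which configs do I depend on?" for each property.
depends_on = {
    "yarn.scheduler.minimum-allocation-mb": ["yarn.nodemanager.resource.memory-mb"],
    "hive.tez.container.size": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.map.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.reduce.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
}

# "Which configs are dependent on me?" derived automatically.
reverse = depended_by(depends_on)
print(reverse["yarn.scheduler.minimum-allocation-mb"])
```

When a value changes, looking it up in the inverted map gives exactly the set of properties whose recommendations may need to be recalculated.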

Metrics

Ambari Metrics System (AMS) - Goals
• Ability to collect metrics from Hadoop and other Stack services
• Ability to collect system-level metrics
• Ability to retain metrics at a high precision for a configurable time period
• Ability to automatically purge metrics after retention period
• Provide integration point for metrics collection and retention by external system
• Trigger alerts based on metrics in Ambari
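The retention and purge goals can be illustrated with a few lines of code. This is a toy model only, assuming metrics are held as `(timestamp, value)` pairs in memory; `purge_expired` and the one-hour window are hypothetical and say nothing about how AMS stores or expires data internally.

```python
def purge_expired(samples, now, retention_seconds):
    """Keep only samples that are still inside the retention window."""
    cutoff = now - retention_seconds
    return [(ts, value) for ts, value in samples if ts >= cutoff]


# Hypothetical samples: (unix timestamp in seconds, metric value).
samples = [(1000, 0.42), (2500, 0.40), (4000, 0.55), (4600, 0.61)]
kept = purge_expired(samples, now=5000, retention_seconds=3600)
print(kept)
```

Making the retention period a parameter is the essential part: precision and storage cost can then be traded off per deployment.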

Ambari Metrics System - Architecture

AMS Grafana
• Ambari 2.2.2
• Powerful dashboard builder integrated with AMS
• Pre-built Grafana dashboards for host-level and service-level metrics
• User can build and save custom dashboards

AMS Grafana

Alerts

Alert – Types
Type | Description | Status | Thresholds Configurable?
--- | --- | --- | ---
PORT | Watches a port based on a configuration property such as the URI. | OK, WARN, CRIT | Yes (seconds)
WEB | Watches an HTTP or HTTPS endpoint and determines connectivity and HTTP status code. | OK, WARN, CRIT | No
AGGREGATE | Aggregate of status for another alert definition. | OK, WARN, CRIT | Yes (percentage)
METRIC | Watches a metric or series of metrics in JMX and compares a mathematical result against a threshold. | OK, WARN, CRIT | Yes (variable)
SCRIPT | Uses a custom script to handle checking. | OK or CRIT | No
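Threshold-based alerts such as PORT, METRIC and AGGREGATE share one idea: compare a measured number against a warning and a critical limit. The sketch below shows that comparison under the assumption that larger values are worse; the function names `evaluate` and `aggregate` and the sample thresholds are hypothetical and are not taken from Ambari's alert definitions.

```python
def evaluate(value, warn, crit):
    """Map a measured value to OK, WARN or CRIT (larger is assumed worse)."""
    if value >= crit:
        return "CRIT"
    if value >= warn:
        return "WARN"
    return "OK"


def aggregate(states, warn_pct, crit_pct):
    """AGGREGATE-style check: percentage of child alerts that are not OK."""
    if not states:
        return "OK"
    bad_pct = 100.0 * sum(state != "OK" for state in states) / len(states)
    return evaluate(bad_pct, warn_pct, crit_pct)


# PORT-style check: response time in seconds against two thresholds.
print(evaluate(2.0, warn=1.5, crit=5.0))
# AGGREGATE-style check: one of four child alerts is failing (25%).
print(aggregate(["OK", "CRIT", "OK", "OK"], warn_pct=10, crit_pct=30))
```

Reusing the same comparison for the aggregate keeps the behaviour consistent: only the unit of the threshold changes, from seconds to a percentage.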

UI – Current Alerts
Configured by default; managed via the web client

UI – Host Alerts
• Automatically refreshes
• Query alert history

UI – Customization & Instances
Status text, thresholds, and interval

Views

Ambari Views
• View Framework
  – Provides various applications accessible from the Ambari Web UI: interact with the cluster via a browser from a single place for all users (cluster operators, data analysts, developers, etc.)
• Easy to develop
  – No need to understand Ambari core code: view development is just like creating any other web application
• Easy to deploy
  – Packaged as a single jar file
  – Auto create / auto configure

CS Queue Manager for Cluster Operators
Capacity Scheduler Queue Manager

HDFS File Browser for General Users
HDFS File Browser

Job Analysis for Developers
• Troubleshoot Tez Jobs
• Troubleshoot / Improve Hive queries

Query Editors for Data Analysts
• Create, edit, execute, and analyze Hive queries
• Create, edit, and execute Pig scripts

Ambari Server in Views-Only mode
• Use Views on existing clusters not managed by Ambari
• Can use Views against multiple clusters
[Diagram: one Ambari Server provides Management and Views for a cluster managed by Ambari; a second Ambari Server in “Views-only” mode (aka “Stand-alone” mode) provides Views for a cluster not managed by Ambari]

Kerberos Automation

Kerberos Automation
• Ambari 2.0
• Ambari manages Kerberos principals and keytabs
• Works with existing MIT KDC or Active Directory
• Once Kerberized, seamlessly handles:
  – Adding new hosts
  – Adding new components to existing hosts
  – Adding new services
  – Moving components to different hosts

Agenda
• Ambari Overview
• Ambari Features
• Demo
• Q&A


Thank You!
• Try Ambari: Follow the Ambari Quick Start Guide https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
• Learn more: Visit the project website http://ambari.apache.org/
• Get Involved
  – User Mailing List: user@ambari.apache.org
  – Developer Mailing List: dev@ambari.apache.org
  – Use JIRA to file bugs and improvement requests https://issues.apache.org/jira/browse/AMBARI/
Jayush Luniya @ Hortonworks (Apache Ambari PMC)

Future Roadmap
• AMS Grafana Integration
• Ambari Management Packs
• Ambari Logsearch
• Patch Upgrades
• Multi Service Versions
• Multi Service Instances

Q&A
Stats
• Largest production clusters managed by Ambari: ~1600 nodes, ~800 nodes
• Largest test cluster for Ambari scale testing: ~400 nodes
• Largest test cluster where rolling upgrade was performed: ~400 nodes, ~40 hours