Post on 15-Jul-2015
Topics Covered
• Overview
  • What is Ambari?
  • Provisioning
  • Managing
  • Monitoring
• Technical Layout
  • Terminology
  • Stacks
  • Blueprints
  • API Reference
• Building Custom Services
What is Apache Ambari?
• “The seat that one sits upon an elephant”
• Provisioner, manager, and monitor of Apache Hadoop clusters
• 100% open source
• Driven by a web app or RESTful APIs
• Step-by-step wizards for installing / provisioning a cluster
• Can be used to automate a cluster install
• Distribution agnostic
• “Server-agent” type architecture
• Central place for managing everything in the Hadoop ecosystem
• Built-in, extensible, pre-configured metrics collection and system alerting for monitoring
Provisioning
• Manually provisioning a cluster doesn’t scale; Ambari does.
• Takes care of software dependencies
• Installs user and service accounts
• Scales from hundreds to a couple thousand nodes
• Simple step-by-step installation wizard to guide you through cluster setup
• Choose which services should run on which host(s)
• Customize specific service settings or use defaults
• Steps to install:
  • Install the Ambari Server
  • Install the Ambari Agent
  • Choose services, assign them to hosts, and configure them
  • Install and sit back
• Note: Ambari’s definition of provisioning is scoped to the Hadoop ecosystem, not general provisioning (Salt, Chef, Puppet, etc.)
Steps through wizard in GUI
Managing
• Add and remove hosts
• Add, remove, or modify services & components
• Decommission or recommission nodes
• Move the Hadoop NameNode or Secondary NameNode
• Rolling restarts of hosts
• Restart the entire cluster or specific services
• Roll back to previous configurations
• View history of past configuration changes
• Define host groups for better management
• Search for specific hosts by name, IP address, hardware specs, etc.
• More management capabilities specific to each service
  • View MapReduce job history
  • Job logs
  • Jobs currently running
Figure 2. Service Management Options Example
Figure 3. Host management
Managing, Cont’d.
• Supports a wide array of user authentication methods
  • Single user (default)
  • LDAP
  • Active Directory
• Kerberos support
• Built-in user access control
  • Control what users can view and interact with in the GUI
Monitoring
• Uses existing open-source projects
• Pre-configured from installation
• Ganglia
  • Monitoring, trending patterns, metrics collection
  • Used by the web interface for metric views & customizable widgets
  • Lots of heat maps
• Nagios
  • Used for health checking and alerting
  • Email alerting
  • Customizable for new services
Stacks
• Stacks are a set of services, repo information, and meta information
• Separate from Ambari: anyone can create and use a new stack
• Supports versioning
• Supports inheritance: a new stack can inherit from an old stack
  • The new stack only contains the new changes / services
• By default, Ambari comes with the HDP stack (Hortonworks)
• Services in stacks define lifecycle commands (start, stop, status, install, configure)
• Lifecycle commands are controlled via command scripts
• Ability to define “custom” commands
Figure 6. Inside a Stack
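To make the inheritance idea concrete, here is a minimal Python sketch of how a child stack that lists only its changes could be resolved against its parent. It is a conceptual model, not Ambari's implementation: the `STACKS` dictionary, the stack names and versions, and the `resolve_services` helper are all hypothetical.

```python
# Conceptual model of stack inheritance (not Ambari's actual code).
# Each hypothetical stack lists only its own services and, optionally, a parent.
STACKS = {
    "BASE-1.0": {"extends": None, "services": {"HDFS": "2.4", "YARN": "2.4"}},
    "BASE-1.1": {"extends": "BASE-1.0", "services": {"HDFS": "2.6", "GREENPLUM": "0.1"}},
}


def resolve_services(stack_name, stacks=STACKS):
    """Return the full service map for a stack, applying parents first."""
    stack = stacks[stack_name]
    parent = stack["extends"]
    # Start from the inherited services, then let the child override or add.
    services = dict(resolve_services(parent, stacks)) if parent else {}
    services.update(stack["services"])
    return services


if __name__ == "__main__":
    print(resolve_services("BASE-1.1"))
```

The child stack stays small because anything it does not mention is taken from the parent.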
Stack Details
• Agents download stack definitions and command scripts
• Agents execute commands locally
• If the stack definition changes, agents pull down the latest stack definition
• Services are made up of components:
  • NameNode
  • SNameNode
  • DataNode
  • HDFS Client
• 3 types of components:
  • MASTER, SLAVE, CLIENT
Figure 7. Layout of a Stack and its services
Blueprints
• Stacks are just a definition of what’s available
• Blueprints are a specific cluster definition
  • Maps what is installed in the cluster
  • Maps which hosts have which service components
• Stacks + Hosts = Blueprint
• Can be exported from an existing cluster and reused
• Used for installation and automation with the API
• Contains the specifics
• Blueprints use the JSON file format
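As a rough illustration of "Stacks + Hosts = Blueprint", the sketch below builds a small blueprint and a matching host mapping as Python dictionaries, then expands them into a per-host component list. The field names (`Blueprints`, `host_groups`, `components`, `cardinality`, `fqdn`) follow Ambari's blueprint JSON format as I understand it; the blueprint name, stack version, host names, and the `components_by_host` helper are placeholders invented for this example.

```python
import json

# Blueprint: which components go into which host group (placeholder values).
blueprint = {
    "Blueprints": {
        "blueprint_name": "small-cluster",
        "stack_name": "HDP",
        "stack_version": "2.2",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "SECONDARY_NAMENODE"}],
        },
        {
            "name": "workers",
            "cardinality": "1+",
            "components": [{"name": "DATANODE"}, {"name": "HDFS_CLIENT"}],
        },
    ],
}

# Cluster creation template: maps real hosts onto the blueprint's host groups.
cluster_template = {
    "blueprint": "small-cluster",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {
            "name": "workers",
            "hosts": [{"fqdn": "worker1.example.com"}, {"fqdn": "worker2.example.com"}],
        },
    ],
}


def components_by_host(blueprint, template):
    """Expand host groups into a host -> component-names mapping."""
    groups = {
        group["name"]: [component["name"] for component in group["components"]]
        for group in blueprint["host_groups"]
    }
    return {
        host["fqdn"]: groups[group["name"]]
        for group in template["host_groups"]
        for host in group["hosts"]
    }


if __name__ == "__main__":
    print(json.dumps(components_by_host(blueprint, cluster_template), indent=2))
```

Keeping the host list separate from the component layout is what lets the same blueprint be reused on a different set of machines.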
Ambari API
• API: anything the web UI can do, and more
• Used for automation and integration
• Examples of API uses:
  • Get access to monitoring & metrics information
  • Get resource usage of specific services
  • Create, delete, and update services
  • Start and stop services
  • Delete an entire cluster
  • Query a cluster with parameters
curl -u username:password -H 'X-Requested-By: ambari' -X POST \
  -d @ambari-blueprint.json \
  http://{your.ambari.server}/api/v1/clusters/{cluster-name}
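The same request pattern (basic auth plus the `X-Requested-By` header) applies to other API calls, such as starting and stopping services. The sketch below builds, but does not send, a request that sets a service's desired state. It assumes the `ServiceInfo.state` convention where `STARTED` means running and `INSTALLED` means stopped; the server URL, cluster name, credentials, and the `build_service_state_request` helper are placeholders.

```python
import base64
import json
import urllib.request


def build_service_state_request(base_url, cluster, service, state, user, password):
    """Build (but do not send) a PUT request that sets a service's desired state."""
    url = "{}/api/v1/clusters/{}/services/{}".format(base_url, cluster, service)
    body = {
        "RequestInfo": {"context": "Set {} to {}".format(service, state)},
        "Body": {"ServiceInfo": {"state": state}},
    }
    credentials = "{}:{}".format(user, password).encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        # Ambari rejects modifying requests that lack this header.
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="PUT"
    )


if __name__ == "__main__":
    # Placeholder server, cluster, and credentials.
    request = build_service_state_request(
        "http://ambari.example.com:8080", "mycluster", "HDFS", "INSTALLED", "admin", "admin"
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before touching a live cluster.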
Building a Custom Service
• Choose to define a separate stack, inherit from another stack, or just put the new service definition in an existing stack (easiest for development)
• Define a metainfo.xml with the following:
  • Service name
  • Display name
  • Comment
  • Version
  • Components
    • Component category
    • Cardinality
    • Command script
    • Timeouts
<service>
  <name>GREENPLUM</name>
  <displayName>Greenplum</displayName>
  <comment>Pivotal Greenplum Database</comment>
  <version>0.1</version>
  <components>
    <component>
      <name>GREENPLUM_MASTER</name>
      <displayName>Greenplum Master</displayName>
      <category>MASTER</category>
      <cardinality>1</cardinality>
      <commandScript>
        <script>scripts/master.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>4800</timeout>
      </commandScript>
    </component>
    <component>
      <name>GREENPLUM_SLAVE</name>
      <displayName>Greenplum Segment</displayName>
      <category>SLAVE</category>
      <cardinality>1+</cardinality>
      <commandScript>
        <script>scripts/segment.py</script>
        <scriptType>PYTHON</scriptType>
        <timeout>600</timeout>
      </commandScript>
………………..
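The `cardinality` values in the metainfo above (`1` and `1+`) constrain how many hosts may run a component. A small Python sketch of how such strings could be interpreted follows; it is not Ambari's parser. The `parse_cardinality` and `allows` helpers are hypothetical, and the `m-n` range form handled here is an assumption beyond the two values shown in the slide.

```python
def parse_cardinality(value):
    """Return (minimum, maximum) host counts; a maximum of None means unbounded."""
    value = value.strip()
    if value.endswith("+"):
        # "1+" means at least one host, with no upper bound.
        return int(value[:-1]), None
    if "-" in value:
        # Assumed range form such as "0-1".
        low, high = value.split("-", 1)
        return int(low), int(high)
    # A bare number means exactly that many hosts.
    count = int(value)
    return count, count


def allows(cardinality, host_count):
    """Check whether a component may be placed on host_count hosts."""
    minimum, maximum = parse_cardinality(cardinality)
    return host_count >= minimum and (maximum is None or host_count <= maximum)


if __name__ == "__main__":
    print(allows("1", 1), allows("1+", 8), allows("1", 2))
```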
Building a Custom Service, Cont’d.
• Create XML configuration files
  • Define properties that command scripts can use and that users can edit through the GUI or a blueprint
<configuration>
  <property>
    <name>gp.installer.zip.file.location</name>
    <value></value>
    <description>The absolute file path of where the Greenplum installer zip file is on the master host.</description>
  </property>
  <property>
    <name>gp.installation.path</name>
    <value>/usr/local</value>
    <description>The absolute path to the install location. You must have write permissions to the location you specify.</description>
  </property>
  <property>
    <name>gp.admin.user</name>
    <value>gpadmin</value>
    <description>The Greenplum system user used to administer the Greenplum Database. The user will be created on all Greenplum hosts.</description>
  </property>
  <property>
    <name>gp.admin.password</name>
    <value></value>
    <description>The password for gp.admin.user.</description>
  </property>
  <property>
    <name>use.mirrors</name>
    <value>false</value>
    <description>Create segment mirrors</description>
  </property>
</configuration>
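To see what a configuration file like this boils down to, the following sketch reads the name/value pairs into a Python dictionary with the standard library's `xml.etree.ElementTree`. It only illustrates the file's structure and is not how Ambari itself loads configurations; `SAMPLE` is an abbreviated copy of the file above and `load_defaults` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the configuration file shown above.
SAMPLE = """<configuration>
  <property>
    <name>gp.installation.path</name>
    <value>/usr/local</value>
    <description>The absolute path to the install location.</description>
  </property>
  <property>
    <name>gp.admin.password</name>
    <value></value>
    <description>The password for gp.admin.user.</description>
  </property>
  <property>
    <name>use.mirrors</name>
    <value>false</value>
    <description>Create segment mirrors</description>
  </property>
</configuration>"""


def load_defaults(xml_text):
    """Return a {property name: default value} dictionary."""
    root = ET.fromstring(xml_text)
    defaults = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        # findtext gives "" for an empty <value/>; "or" also covers a missing one.
        defaults[name] = prop.findtext("value") or ""
    return defaults


if __name__ == "__main__":
    for key, value in sorted(load_defaults(SAMPLE).items()):
        print("{} = {!r}".format(key, value))
```

Properties with an empty default, such as the password, are the ones a user is expected to fill in through the GUI or a blueprint.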
Creating a Custom Service, Cont’d.
import sys
from resource_management import *


class Slave(Script):
    """Command script for the sample service's slave component."""

    # Each lifecycle command is a method taking the execution environment.
    def install(self, env):
        print('Install the Sample Srv Slave')

    def stop(self, env):
        print('Stop the Sample Srv Slave')

    def start(self, env):
        print('Start the Sample Srv Slave')

    def status(self, env):
        print('Status of the Sample Srv Slave')

    def configure(self, env):
        print('Configure the Sample Srv Slave')


if __name__ == "__main__":
    Slave().execute()
• Write the command scripts in Python, inheriting from the “Script” class
• Override all lifecycle commands
  • Install
  • Stop
  • Start
  • Status
  • Configure
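How does a class with `install`, `start`, and similar methods get driven by a command name? The sketch below is a toy stand-in, not the real `resource_management.Script`: `ToyScript`, `ToySlave`, `run`, and the `env` dictionary are hypothetical and exist only to illustrate dispatching a lifecycle command name to the method of the same name.

```python
class ToyScript(object):
    """Hypothetical base class that dispatches lifecycle commands by name."""

    LIFECYCLE = ("install", "configure", "start", "stop", "status")

    def run(self, command, env=None):
        if command not in self.LIFECYCLE:
            raise ValueError("Unknown command: {}".format(command))
        # Look up the method with the same name as the command.
        handler = getattr(self, command, None)
        if handler is None:
            raise NotImplementedError(
                "{} does not implement {}".format(type(self).__name__, command)
            )
        return handler(env or {})


class ToySlave(ToyScript):
    """Implements only two of the lifecycle commands, for illustration."""

    def install(self, env):
        return "installed"

    def start(self, env):
        return "started"


if __name__ == "__main__":
    slave = ToySlave()
    print(slave.run("install"))
    print(slave.run("start"))
```

A component that leaves a lifecycle command unimplemented fails loudly in this model, which is one reason to override all of them.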
Summary
• Ambari is THE provisioner, manager, and monitor for Apache Hadoop clusters
• Great for automation, integration, and extensibility
• Easy step-by-step installation wizards
• Simple managing and monitoring
• Very powerful API
• Stacks separated from Ambari framework
• Services can be built for anything