Dell EMC Hadoop Application Agent Version 4.5 Installation and Administration Guide 302-004-232 REV 01


Copyright © 2016-2017 Dell Inc. or its subsidiaries. All rights reserved.

Published September 2017

Dell believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS-IS.” DELL MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. USE, COPYING, AND DISTRIBUTION OF ANY DELL SOFTWARE DESCRIBED IN THIS PUBLICATION REQUIRES AN APPLICABLE SOFTWARE LICENSE.

Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners.

Published in the USA.

Dell EMC
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 In North America 1-866-464-7381
www.DellEMC.com


CONTENTS

Preface

Chapter 1  Hadoop Application Agent Overview
    Product description
        Hadoop application agent capabilities
    DD Boost backups and restores
    Environment and system requirements

Chapter 2  Data Domain Configuration
    Licensing the Data Domain system
    Enable DD Boost on a Data Domain system
    Changing the DD Boost access rights
    Enable encryption over a WAN connection
    Enable the DD Boost operations through a firewall
    Set up the storage units
    Enable the distributed segment processing
    Enable the advanced load balancing and link failover
    Validating the Data Domain system connection

Chapter 3  Installation
    Installation overview
    Download the software
    Install the Hadoop application agent

Chapter 4  Administration Using the GUI
    Configuration using the GUI
        Configuration considerations
        Download HDBoost service files
        Hadoop application agent and Cloudera Manager
        Hadoop application agent and Ambari
    HDBoost web application overview
    Certificate requirements
        Obtain a signed certificate
        Create a self-signed certificate
        Configure the keystore and truststore
        Configure the keystore for Jetty
    Backup metadata
    Create backups
        Add backup devices
        Additional backup options
        Complete the configuration
    Change backup retention times
    Restore backups
    Delete backups

Chapter 5  Administration Using the CLI
    Configuration using the CLI
        Configuration considerations
        Connect Hadoop to the Data Domain system
        Add a Data Domain, back up an HDFS directory, and back up an HBase table
        Configure multiple Data Domain systems
        Configure replication
        Configure Kerberos
        Maps parameter
        Audit logging
    Backup overview
    Data flow overview
    Back up HDFS data to a Data Domain system
    Back up HBase data to a Data Domain system
    Restore overview
    Restore an HDFS backup
    Restore an HBase backup
    Restore a replicated backup
        Restoring a replicated backup with a device ID override
        Restoring a replicated backup with a device override
    List backup configurations
    List backups
    Search backups
    Clean up backups
    Delete backups
    Refresh the Kerberos credentials cache
    Test the connection to the Data Domain system
    Change retention dates
    Erase the backup configuration
    Restore the configuration
    Display the software version
    Disaster recovery overview
        Restore data from a lost namenode
        Restore data from a lost master Data Domain system

Chapter 6  Backup and Recovery Application Agent
    Backup and Recovery application agent overview
    Backup and Recovery application agent capabilities
    Software requirements
    Install BoostFS
        MySQL backup script
    Passwordless SSH configuration
        Verify the SSH configuration
    Backup and Recovery application agent installation overview
        Download the software
        Install and configure BRBoost
    Configure synchronized backups with HDBoost and BRBoost
    Create synchronized backups
    Restore the databases
    Disaster recovery
    Disaster recovery scenarios
        Loss of the master Data Domain system
        Loss of the Hive client
        Synchronized disaster recovery

Chapter 7  Troubleshooting
    Troubleshooting overview
    Log information

Chapter 8  Command Reference
    hdboost command overview
        hdboost --addconfig
        hdboost --backup
        hdboost --delete
        hdboost --eraseconfig
        hdboost --expire
        hdboost --job
        hdboost --kerberos
        hdboost --list
        hdboost --listconfig
        hdboost --restore
        hdboost --retention
        hdboost --search
        hdboost --test
        hdboost --version
    brboost command overview
        brboost --addconfig
        brboost --backup
        brboost --delete
        brboost --eraseconfig
        brboost --expire
        brboost --list
        brboost --listconfig
        brboost --retention
        brboost --test
        brboost --version

Preface

As part of an effort to improve product lines, periodic revisions of software and hardware are released. Therefore, all versions of the software or hardware currently in use might not support some functions that are described in this document. The product release notes provide the most up-to-date information on product features.

If a product does not function correctly or does not function as described in this document, contact a technical support professional.

Note

This document was accurate at publication time. To ensure that you are using the latest version of this document, go to the Support website at https://support.emc.com.

Purpose
This guide includes information about how to install and administer the Hadoop Application Agent.

Audience
This guide is intended for administrators who are responsible for installing and administering the Hadoop Application Agent.

Revision History
The following table presents the revision history of this document.

Table 1 Revision history

Revision   Date             Description
01         September 2017   Initial release of the Hadoop Application Agent 4.5 Installation and Administration Guide.

The Hadoop Application Agent 4.5 Release Notes provides additional information.

For compatibility information, including specific backup software and hardware configurations, go to:

http://compatibilityguide.emc.com:8080/CompGuideApp/getDDbeaCompGuidePage.do

The documentation for the following products provides additional information:

- Data Domain

- DD Boost

Special notice conventions that are used in this document
The following conventions are used for special notices:

NOTICE

Identifies content that warns of potential business or data loss.


Note

Contains information that is incidental, but not essential, to the topic.

Typographical conventions
The following type style conventions are used in this document:

Table 2 Style conventions

Bold               Used for interface elements that a user specifically selects or clicks, for example, names of buttons, fields, tab names, and menu paths. Also used for the name of a dialog box, page, pane, screen area with title, table label, and window.

Italic             Used for full titles of publications that are referenced in text.

Monospace          Used for:
                   - System code
                   - System output, such as an error message or script
                   - Pathnames, file names, file name extensions, prompts, and syntax
                   - Commands and options

Monospace italic   Used for variables.

Monospace bold     Used for user input.

[ ]                Square brackets enclose optional values.

|                  Vertical line indicates alternate selections. The vertical line means or for the alternate selections.

{ }                Braces enclose content that the user must specify, such as x, y, or z.

...                Ellipses indicate non-essential information that is omitted from the example.

You can use the following resources to find more information about this product, obtain support, and provide feedback.

Where to find product documentation

- https://support.emc.com

- https://community.emc.com

Where to get support
The Support website at https://support.emc.com provides access to licensing information, product documentation, advisories, and downloads, as well as how-to and troubleshooting information. This information may enable you to resolve a product issue before you contact Support.

To access a product specific Support page:

1. Go to https://support.emc.com/products.

2. In the Find a Product by Name box, type a product name, and then select the product from the list that appears.

3. Click the following button:


4. (Optional) To add the product to My Saved Products, in the product specific page, click Add to My Saved Products.

Knowledgebase
The Knowledgebase contains applicable solutions that you can search for by solution number, for example, 123456, or by keyword.

To search the Knowledgebase:

1. Go to https://support.emc.com.

2. Click Advanced Search.
The screen refreshes and filter options appear.

3. In the Search Support or Find Service Request by Number box, type a solution number or keywords.

4. (Optional) To limit the search to specific products, type a product name in the Scope by product box, and then select the product from the list that appears.

5. In the Scope by resource list box, select Knowledgebase.
The Knowledgebase Advanced Search panel appears.

6. (Optional) Specify other filters or advanced options.

7. Click the following button:

Live chat
To participate in a live interactive chat with a support agent:

1. Go to https://support.emc.com.

2. Click Chat with Support.

Service requests
To obtain in-depth help from Support, submit a service request. To submit a service request:

1. Go to https://support.emc.com.

2. Click Create a Service Request.

Note

To create a service request, you must have a valid support agreement. Contact a sales representative for details about obtaining a valid support agreement or with questions about an account.

To review an open service request:

1. Go to https://support.emc.com.

2. Click Manage service requests.

Online communities
Go to the Community Network at https://community.emc.com for peer contacts, conversations, and content on product support and solutions. Interactively engage online with customers, partners, and certified professionals for all products.

How to provide feedback
Feedback helps to improve the accuracy, organization, and overall quality of publications. You can send feedback to [email protected].


CHAPTER 1

Hadoop Application Agent Overview

This chapter includes the following topics:

- Product description
- DD Boost backups and restores
- Environment and system requirements


Product description
Hadoop application agent provides backup and recovery of Hadoop distributed file systems (HDFS) and HBase tables to a Data Domain storage system. Hadoop application agent uses the distributed copy (DistCp) functionality native to Hadoop to perform backup and restore operations without the need for creating or managing mount points.

Hadoop application agent capabilities
Hadoop application agent provides a CLI and a GUI to perform the following tasks:

- Configure backups

- Perform backups of Hadoop directories

- Perform offline backups of HBase tables

- Perform backups synchronized with Hive metastore backups

- Restore a full backup or a subset of backup objects

- Restore to the original or alternate location

- List backups that reside on the Data Domain system

- Delete backups

- Expire backups

- Enable and disable Kerberos authentication

DD Boost backups and restores
A DD Boost backup to a Data Domain system takes advantage of DD Boost features by using three main components.

- The DD Boost File System connector provides a file system interface to the DD Boost library.

- The DD Boost library API enables the backup software to communicate with the Data Domain system.

- The distributed segment processing (DSP) component compares the backup data being written with the data already stored on the Data Domain system, and sends only the unique data from the Hadoop host to the Data Domain system for storage.

During restore operations, the Data Domain system returns all the stored data to its original state before sending the data over the network.
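The idea behind distributed segment processing can be illustrated with a toy sketch. This is not the DD Boost implementation or its protocol: the fixed 8-byte segments, the SHA-256 fingerprints, and the in-memory `store` dictionary standing in for the Data Domain system are all assumptions chosen to keep the example small.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; deliberately tiny so the example is easy to follow


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Identify a segment by the SHA-256 digest of its content."""
    return hashlib.sha256(segment).hexdigest()


def backup(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Transfer only the segments that the store does not hold yet.

    Returns the recipe (ordered list of fingerprints) and the number of
    bytes that actually had to be sent.
    """
    recipe = []
    sent = 0
    for segment in split_segments(data):
        fp = fingerprint(segment)
        if fp not in store:      # unique data: send and store it
            store[fp] = segment
            sent += len(segment)
        recipe.append(fp)        # duplicates are only referenced
    return recipe, sent


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble the original data from the stored segments."""
    return b"".join(store[fp] for fp in recipe)
```

In this sketch, a second backup that shares segments with an earlier one transfers only the segments that are new, while a restore always returns the complete original data.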

Environment and system requirements
Hadoop clusters can have integrated compute and storage resources, or the clusters can use shared storage.

The Hadoop application agent requires:

- Java

- Data Domain Operating System (DDOS)

- 64-bit Linux

  - CentOS

  - Red Hat Enterprise Linux

  - SUSE Linux Enterprise Server

The following Hadoop distributions are supported:

- Cloudera Hadoop (CDH)

- Hortonworks Data Platform (HDP)

For detailed information about software and hardware compatibility, including supported operating system versions, as well as Cloudera and Hortonworks Data Platform versions, go to http://compatibilityguide.emc.com:8080/CompGuideApp/.

The Hadoop application agent can be installed in virtualized environments as long as the virtual name node is running a supported Linux distribution.

Replication to a secondary Data Domain system is optional. A Data Domain replication license is required to use replication functionality.


CHAPTER 2

Data Domain Configuration

This chapter includes the following topics:

- Licensing the Data Domain system
- Enable DD Boost on a Data Domain system
- Changing the DD Boost access rights
- Enable encryption over a WAN connection
- Enable the DD Boost operations through a firewall
- Set up the storage units
- Enable the distributed segment processing
- Enable the advanced load balancing and link failover
- Validating the Data Domain system connection


Licensing the Data Domain system

Note

The Data Domain administrator must configure the Data Domain system for DD Boost operations. This chapter provides examples of the basic configurations. The Data Domain documentation provides details on the Data Domain system configurations.

You need the Data Domain Boost license to use the Hadoop application agent software.

Contact your Data Domain representative for more information and to purchase licensed features.

The Data Domain Operating System Administration Guide provides details about all the licensed features and how to display and enable Data Domain licenses.

Enable DD Boost on a Data Domain system
Enable DD Boost on a Data Domain system through the ddboost enable command or from the Data Domain System Manager on the Data Management > DD Boost page as described in the Data Domain Operating System Administration Guide.

Note

DD Boost requires a separate license.

Use the Data Domain command line interface to complete the required administration tasks. The Data Domain Operating System Command Reference Guide provides details about the commands.

Procedure

1. On the Data Domain system, log in as an administrative user.

2. To verify that the file system is enabled and running, run the following command:

# filesys status

The file system is enabled and running.

To enable the file system, run the following command:

# filesys enable

3. To verify that the DD Boost license is enabled, run the following command:

# license show

Feature licenses:
##   License Key            Feature
--   --------------------   --------
1    ABCD-EFGH-IJKL-MNOP    DDBOOST
--   --------------------   --------

If the DD Boost license is disabled, run the following command to add the DD Boost license by using the license key that Data Domain provided:

# license add <license_key>

License “ABCE-BCDA-CDAB-DABC” added.

4. Establish the DD Boost username and password for the Data Domain system.

Note

The username, password, and role must be set up on the Data Domain system as described in the Data Domain Operating System Administration Guide.

To establish the username and password, run the following commands:

# user add <username> password <password>
# ddboost set user-name <username>

Changing the DD Boost access rights on page 17 provides information about how changing a username and access rights affects the operations on a Data Domain system.

5. To enable DD Boost, run the following command:

# ddboost enable

DD Boost enabled

6. To verify that DD Boost is enabled, run the following command:

# ddboost status
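The license check in step 3 of this procedure can also be scripted. The following is a minimal sketch that assumes the license show output has the column layout of the sample in step 3 (a row number, a license key, and a single-word feature name); output on a real system may differ, and the function name `licensed_features` is hypothetical.

```python
SAMPLE_LICENSE_SHOW = """\
Feature licenses:
##   License Key            Feature
--   --------------------   --------
1    ABCD-EFGH-IJKL-MNOP    DDBOOST
--   --------------------   --------
"""


def licensed_features(license_show_output: str) -> set[str]:
    """Collect the feature names from `license show` output.

    Assumption: data rows start with a row number and end with a
    single-word feature name, as in the sample output.
    """
    features = set()
    for line in license_show_output.splitlines():
        fields = line.split()
        # Skip the title, the header row ("##"), and the dashed rules ("--").
        if len(fields) >= 3 and fields[0].isdigit():
            features.add(fields[-1])
    return features


if __name__ == "__main__":
    if "DDBOOST" in licensed_features(SAMPLE_LICENSE_SHOW):
        print("DD Boost license is present")
    else:
        print("DD Boost license is missing; run: license add <license_key>")
```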

Changing the DD Boost access rights
By default, when the DD Boost service is first enabled on a Data Domain system, the service is accessible to all client hosts. You can use the ddboost access command to override this default and restrict the access to specific client hosts.

For example, the Data Domain administrator can run the following commands to remove the default access permission for all hosts and add new access permissions for two specific client hosts, dbserver1.datadomain.com and dbserver2.datadomain.com. The Data Domain Operating System Command Reference Guide provides details about the commands.

# ddboost disable
# ddboost access del clients *
# ddboost access add clients dbserver1.datadomain.com dbserver2.datadomain.com
# ddboost enable

These commands establish a set of access controls that enable DD Boost access only to the two client hosts, dbserver1.datadomain.com and dbserver2.datadomain.com.

Consider the following guidelines when you change the DD Boost access rights:


- Ensure that no backup operations are running to the Data Domain system when you change any access rights. You can run the ddboost disable command to prevent operations while access is being changed.

- Specify only a fully qualified domain name, IP address, or resolvable DNS name for the client when modifying the client access control list.

- After the access rights are changed, you can run the ddboost enable command to enable DD Boost and the access rights will take effect.

You can run the ddboost clients show command to verify which hosts have the DD Boost access rights. If the command output is simply *, then all client hosts have the access rights. For example:

# ddboost clients show

DD Boost access allowed from the following clients
*

# ddboost clients show

DD Boost access allowed from the following clients:
dbserver1.datadomain.com
dbserver2.datadomain.com
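The access-rights change and the follow-up check can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, the command strings are the ones shown in this section, and the parser assumes the ddboost clients show output looks like the two samples above. Running the commands on the Data Domain system is left to the administrator.

```python
def restrict_access_commands(clients: list[str]) -> list[str]:
    """Return the command sequence that limits DD Boost access to `clients`.

    The order follows the guidelines in this section: disable DD Boost,
    change the access list, then enable DD Boost again.
    """
    if not clients:
        raise ValueError("at least one client host is required")
    return [
        "ddboost disable",
        "ddboost access del clients *",
        "ddboost access add clients " + " ".join(clients),
        "ddboost enable",
    ]


def allowed_clients(clients_show_output: str) -> list[str]:
    """Return the hosts listed by `ddboost clients show`.

    A result of ["*"] means that all client hosts have access.
    Assumption: one host per line after the header line.
    """
    hosts = []
    for line in clients_show_output.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("dd boost access allowed"):
            continue  # skip blank lines and the header
        hosts.append(line)
    return hosts


if __name__ == "__main__":
    for command in restrict_access_commands(
        ["dbserver1.datadomain.com", "dbserver2.datadomain.com"]
    ):
        print("#", command)
```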

Enable encryption over a WAN connection
The Hadoop application agent provides support for DD Boost clients to have in-flight data encryption with a Data Domain 5.5 or later operating system over a WAN connection.

To enable the in-flight data encryption over a WAN connection, you can configure the Data Domain system with either medium-strength or high-strength TLS encryption. For example, to set the required TLS encryption for the client systems, type the following command:

ddboost clients add <client_list> [encryption-strength {medium | high} authentication-mode {one-way | two-way | anonymous}] | [authentication-mode kerberos]

The configuration is transparent to the application agent. The Data Domain Boost Administration Guide provides details.
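To make the option combinations in the syntax above easier to see, the following sketch builds the command string and rejects combinations that the syntax does not allow. The helper name `clients_add_command` is hypothetical, and the space-separated client list is an assumption based on the ddboost access add clients example; the Data Domain documentation remains the authority on the exact syntax.

```python
from typing import Optional

ENCRYPTION_STRENGTHS = ("medium", "high")
TLS_AUTHENTICATION_MODES = ("one-way", "two-way", "anonymous")


def clients_add_command(
    clients: list[str],
    encryption_strength: Optional[str] = None,
    authentication_mode: Optional[str] = None,
) -> str:
    """Build a `ddboost clients add` command line from the documented syntax."""
    if not clients:
        raise ValueError("at least one client host is required")
    parts = ["ddboost", "clients", "add", *clients]

    if authentication_mode == "kerberos":
        # Second alternative in the syntax: kerberos stands on its own.
        if encryption_strength is not None:
            raise ValueError("kerberos is not combined with encryption-strength")
        parts += ["authentication-mode", "kerberos"]
    elif encryption_strength is not None or authentication_mode is not None:
        # First alternative: both options are given together.
        if encryption_strength not in ENCRYPTION_STRENGTHS:
            raise ValueError("encryption-strength must be medium or high")
        if authentication_mode not in TLS_AUTHENTICATION_MODES:
            raise ValueError("authentication-mode must be one-way, two-way, or anonymous")
        parts += [
            "encryption-strength", encryption_strength,
            "authentication-mode", authentication_mode,
        ]
    return " ".join(parts)


if __name__ == "__main__":
    print(clients_add_command(["dbserver1.datadomain.com"], "high", "two-way"))
```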

Enable the DD Boost operations through a firewallThe Data Domain system as initially configured does not operate through a firewall,neither for a client host connection to a Data Domain system nor for one Data Domainsystem connection to another. If you need the Data Domain system to operatethrough a firewall, contact your network support provider.

The following ports must be open in a firewall to enable DD Boost backups andoptimized duplication:

l TCP 2049 (NFS)

l TCP 2051 (Replication)

• TCP 111 (NFS portmapper)

• TCP xxx (select a port for NFS mountd, where the default MOUNTD port is 2052)
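To see from a client host whether these ports are reachable through the firewall, a plain TCP connection test is often enough. The following Python sketch uses only the standard socket module. The port tuple repeats the list above and assumes the default mountd port 2052; the function name closed_ports is hypothetical.

```python
import socket

# Ports from the list above. 2052 is the default mountd port and may
# differ if another port was selected on the Data Domain system.
DD_BOOST_TCP_PORTS = (2049, 2051, 111, 2052)


def closed_ports(host, ports=DD_BOOST_TCP_PORTS, timeout=3.0):
    """Return the ports in `ports` that did not accept a TCP connection."""
    closed = []
    for port in ports:
        try:
            # A completed TCP handshake means the port is reachable.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Refused, filtered (timeout), or unresolvable host.
            closed.append(port)
    return closed
```

An empty list means that every port accepted a connection. The check only shows that a listener is reachable, not that DD Boost itself is working.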

Set up the storage units

One or more storage units must be created on each Data Domain system that will be used with the Hadoop application agent. Each storage unit name on a single Data Domain system must be unique. However, you can use the same storage unit name on more than one Data Domain system.

Note

Storage unit names are case-sensitive.

You must provide the storage unit name when you configure the operations with the Hadoop application agent.

Create a storage unit through the ddboost storage-unit command or from the Data Domain System Manager on the Data Management > DD Boost page as described in the Data Domain Operating System Administration Guide.

For example, run the following command on the Data Domain system for each storage unit that you want to create:

# ddboost storage-unit create <storage_unit_name> user <username>

Run the following command to list the status of the storage units:

# ddboost storage-unit show

Name        Pre-Comp (GiB)   Status
----------  --------------   ------
SU_ABCDE03             5.8   RW
SU_ABCDE5              9.8   RW/Q
----------  --------------   ------
 D    : Deleted
 Q    : Quota Defined
 RO   : Read Only
 RW   : Read Write

You must create at least one storage unit on each Data Domain system that you will use with the Hadoop application agent.
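The status listing can also be read by a script, for example to confirm that a storage unit exists and is writable before a backup is configured. The following Python sketch assumes the three-column layout of the example output above, with the table body enclosed by two rule lines. The function name parse_storage_units is hypothetical.

```python
def parse_storage_units(output: str) -> dict:
    """Map storage unit name -> (pre-comp size in GiB, status string).

    Assumes the layout of the example above: a header row, a rule line,
    one row per storage unit, a closing rule line, then the legend.
    """
    units = {}
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            if in_table:
                break  # second rule line closes the table body
            in_table = True
            continue
        if in_table and stripped:
            name, size, status = stripped.split()
            units[name] = (float(size), status)
    return units


sample = """Name        Pre-Comp (GiB)   Status
----------  --------------   ------
SU_ABCDE03             5.8   RW
SU_ABCDE5              9.8   RW/Q
----------  --------------   ------
 D    : Deleted
 Q    : Quota Defined
"""
```

Because storage unit names are case-sensitive, the parsed names are kept exactly as printed.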

Enable the distributed segment processing

Distributed segment processing is a DD Boost software feature that uses the DD Boost library on the client host and the Data Domain software on the DDR. The Hadoop application agent loads the DD Boost library during backup and restore operations.

You must configure the distributed segment processing option on the Data Domain system. The option setting applies to all the client hosts and all the software that uses DD Boost on this Data Domain system.

Manage the distributed segment processing through the ddboost option command or from the Data Domain System Manager on the Data Management > DD Boost page as described in the Data Domain Operating System Administration Guide.

To confirm whether or not DD Boost has distributed segment processing enabled, run the command ddboost option show.

To configure the distributed segment processing option, run the following command:

# ddboost option set distributed-segment-processing {enabled | disabled}

Enabling or disabling the distributed segment processing option does not require a restart of the Data Domain file system.

Enable the advanced load balancing and link failover

The advanced load balancing and link failover feature enables the combination of multiple Ethernet links into a group and the registration of only one interface on the Data Domain system with the Hadoop application agent.

The Data Domain documentation provides details about the features and benefits of advanced load balancing and link failover.

If an interface group is configured when the Data Domain system receives data from the DD Boost client, the data transfer is load balanced and distributed as separate jobs on the private network, providing greater throughput, especially for customers who use multiple 1 GbE connections.

Manage the advanced load balancing and link failover through the ddboost ifgroup command or from the Data Domain System Manager on the Data Management > DD Boost page as described in the Data Domain Operating System Administration Guide.

You can perform the following steps to create an interface group on the Data Domain system by adding existing interfaces to the group and registering the Data Domain system with the Hadoop application agent. After the interface group is set up, you can add or delete interfaces from the group.

Procedure

1. To add the interfaces into the group, run the ddboost ifgroup command. The interfaces must have been created with the net command. For example:

# ddboost ifgroup default add interface 192.168.1.1
# ddboost ifgroup default add interface 192.168.1.2
# ddboost ifgroup default add interface 192.168.1.3
# ddboost ifgroup default add interface 192.168.1.4

This example assumes that no additional named interface groups have been created and uses the default interface group.

2. Select one interface on the Data Domain system to register with the Hadoop application agent. Create a failover aggregated interface and register that interface with the Hadoop application agent. The Data Domain Operating System Administration Guide describes how to create a virtual interface for link aggregation.

It is not mandatory to use an interface in the ifgroup to register with the Hadoop application agent. An interface that is not part of the ifgroup can also be used to register with the Hadoop application agent. The interface should be registered with a resolvable name using DNS or any other name resolution mechanism.

3. To enable the feature on the Data Domain system, run the following command:

# ifgroup enable

4. To verify the configuration, run the following command:

# ifgroup show config interfaces

Group Name   Status    Interface
----------   -------   -----------
default      enabled   192.168.1.1
default      enabled   192.168.1.2
default      enabled   192.168.1.3
default      enabled   192.168.1.4
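The verification in step 4 can be automated by comparing the listed interfaces with the ones that were added in step 1. The following Python sketch assumes the three-column layout of the example output (group name, status, interface). The function name enabled_interfaces is hypothetical.

```python
def enabled_interfaces(output: str, group: str = "default") -> set:
    """Return the interfaces shown as enabled for `group`.

    Assumes one row per interface with exactly three whitespace-separated
    fields: group name, status, interface address.
    """
    found = set()
    for line in output.splitlines():
        fields = line.split()
        # The header row has four words and the rule line never matches
        # the group name, so both are ignored here.
        if len(fields) == 3 and fields[0] == group and fields[1] == "enabled":
            found.add(fields[2])
    return found


sample = """Group Name   Status    Interface
----------   -------   -----------
default      enabled   192.168.1.1
default      enabled   192.168.1.2
"""
expected = {"192.168.1.1", "192.168.1.2"}
missing = expected - enabled_interfaces(sample)
```

An empty missing set means that every expected interface is part of the group and enabled.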

Validating the Data Domain system connection

Depending on the type of network connection being used, you can run the appropriate command to validate the communication between the client host and the Data Domain system:

• If you have a DD Boost-over-IP system, you can log in to the primary name node of the Hadoop cluster with the Hadoop application agent installed, and run the rpcinfo command if the command is available on the system. For example:

# rpcinfo -p <Data_Domain_system_hostname>

The command output must include the ports listed in Enable the DD Boost operations through a firewall. For example:

# rpcinfo -p <Data_Domain_system_hostname>

   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    779  status
    100024    1   tcp    782  status
 537220272    2   tcp   3006
    100005    1   tcp   2052  mountd
    100005    1   udp   2052  mountd
    100005    2   tcp   2052  mountd
    100005    2   udp   2052  mountd
    100005    3   tcp   2052  mountd
    100005    3   udp   2052  mountd
    100003    3   tcp   2049  nfs
    100003    3   udp   2049  nfs
 285824256    1   udp    709
 537329792    1   tcp   3007
 537220001    2   tcp   2051
 537220001    3   tcp   2051
 537220439    1   tcp    695
 537220017    1   tcp    727
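Comparing this listing with the firewall port list by eye is error-prone, so the comparison can be scripted. The following Python sketch assumes the column order shown above (program, version, protocol, port, optional service name) and the default mountd port 2052. The names REQUIRED and missing_ports are hypothetical.

```python
# TCP ports that the firewall section of this guide requires; 2052 is
# the default mountd port and may differ on a given system.
REQUIRED = {("tcp", 111), ("tcp", 2049), ("tcp", 2051), ("tcp", 2052)}


def missing_ports(rpcinfo_output: str, required=REQUIRED) -> set:
    """Return the required (protocol, port) pairs that do not appear
    in the text printed by `rpcinfo -p`."""
    seen = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows: program vers proto port [service]. The header row
        # is skipped because its first field is not numeric.
        if len(fields) >= 4 and fields[0].isdigit() and fields[3].isdigit():
            seen.add((fields[2], int(fields[3])))
    return set(required) - seen


sample = """   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100005    3   tcp   2052  mountd
    100003    3   tcp   2049  nfs
 537220001    2   tcp   2051
"""
```

An empty result means that all required ports are registered. Any pair that is returned points to a service that is not registered or not visible from the client host.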

CHAPTER 3

Installation

This chapter includes the following topics:

• Installation overview

• Download the software

• Install the Hadoop application agent

Installation overview

Hadoop application agent consists of a single Linux software package (.rpm file) that you install on the target system.

Install the Hadoop application agent from the root account on the primary name node of the Hadoop cluster. Also, install the agent on any failover nodes that take over if the primary name node goes offline.

Note

The Hadoop application agent can be installed in virtualized environments as long as the virtual name node is running a supported Linux distribution.

The following table lists the components that the Hadoop application agent package installs.

Table 3 Hadoop application agent components

Component                      Description
-----------------------------  ---------------------------------------------------------------------
hdboost                        Hadoop application agent executable that performs product operations.
ddhcfs-<software-version>.jar  Data Domain Hadoop Compatible File System (DDHCFS) interface.
libbfswrap.so                  Connector between ddhcfs.jar and libDDBoostFS.so.
libDDBoost.so                  DD Boost library.
libDDBoostFS.so                DD BoostFS library.

Download the software

Download the Hadoop Application Agent and Backup and Recovery Application Agent software files from the Support website.

Procedure

1. On UNIX or Linux, log in as the root user.

2. In a local file system, create a temporary installation download directory with sufficient free disk space to contain both the downloaded software package and the software installation files that are extracted from the package. On UNIX or Linux, type:

mkdir /usr/extract_hdboost

3. Go to https://support.emc.com.

4. Browse to the Downloads page, and then search for Hadoop Application Agent.

5. Download the Hadoop Application Agent software file to the temporary installation download directory.

6. Extract the installation files from the downloaded software package:

a. On UNIX or Linux, to uncompress the downloaded package, type the gunzip command with the file_name.tar.gz name of the specific download file:

gunzip emchdappagent-4.5.0.x-1-linux_x86_64.tar.gz

b. Extract the software from the uncompressed, tarred file:

tar -xvpBf emchdappagent-4.5.0.x-1-linux_x86_64.tar

The extraction lists the distribution software files on the screen.

Remain in the directory for the installation.
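Where the download is scripted rather than performed by hand, the gunzip and tar steps can be combined. The following Python sketch uses the standard tarfile module as an alternative to the two commands above and is not part of the product. The function name extract_package is hypothetical, and the archive should only come from the Support website.

```python
import tarfile


def extract_package(archive_path: str, dest_dir: str) -> list:
    """Extract a .tar.gz package into dest_dir in one step and return
    the names of the archive members (the distribution files)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        # Newer Python releases provide extraction filters that reject
        # unsafe members; use the "data" filter when it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return names
```

For example, extract_package("emchdappagent-4.5.0.x-1-linux_x86_64.tar.gz", "/usr/extract_hdboost") would unpack the package into the temporary directory created in step 2, where the file name stands for the specific download file.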

Install the Hadoop application agent

To install the Hadoop application agent, complete the following steps.

Procedure

1. To install Hadoop application agent, type the following:

rpm -ivh emchdappagent*.rpm

Under the top-level installation path, the following directory layout is created:

Table 4 Hadoop application agent subdirectory structure

Subdirectory                                                    Additional information
--------------------------------------------------------------  ----------------------
<install-path>/backup-history                                   The backup history cache files are stored under this path.
<install-path>/bin                                              All executable binaries are installed under this path.
<install-path>/config                                           All configuration files are stored under this path.
<install-path>/credentials                                      The encrypted credentials files are stored here.
<install-path>/java                                             The ddhcfs-<software-version>.jar file resides in this directory.
<install-path>/logs/debug (symlink to /var/opt/dlp/logs/debug)  All Hadoop application agent logs reside under this path on the name node. If backup or restore commands are invoked with the -D option, Hadoop application agent generates additional detailed debug logs on the data nodes in addition to the basic logs.
<install-path>/logs/audit (symlink to /var/opt/dlp/logs/audit)  If the audit log feature is set from the command line, the audit log resides under this path.
<install-path>/tmp (symlink to /var/opt/dlp/tmp)                Path that the Hadoop application agent uses to store temporary files that are used during backup and recovery operations.
<install-path>/scripts                                          All Hadoop application agent utility scripts reside under this path.
/usr/lib/dlp/lib64                                              All .so libraries are installed under this path.
<install-path>/jobs                                             Information about Hadoop Application Agent backup and restore jobs is stored under this path.

Note

To easily reference working directories, symlinks are created under <install-path>/tmp and <install-path>/logs. The practice of storing variable information under /var conforms to the Filesystem Hierarchy Standard (FHS) that Linux employs.

2. To switch the user to the typical superuser hdfs, type the following command:

su - hdfs

3. To check the Hadoop native library warnings, type the following command:

hadoop checknative -a

4. Resolve any Hadoop library warnings.

5. To switch the user back to root, type the following command:

exit

6. (Optional) For environments that use the Hadoop application agent GUI, complete the steps in Certificate requirements and Configure the keystore for Jetty as part of the post-installation script. At the Configure the keystore for the GUI prompt, select y.

7. As a root user, type the following:

/opt/emc/dlp/config/dlp_config_hadoop

This command queries the user for the following:

• The Hadoop distribution type of Cloudera or Hortonworks.

• The version of libjvm.so and libhdfs.so to use.

• The location of the Hadoop binary.

• If HBase is in use, HBase information.

This command completes the following actions:

• Configures the class path that the Hadoop application agent uses (hadoop classpath --glob).

• Updates the dynamic linker (ldconfig).

• Creates the /opt/emc/dlp/tmp/dlp_cfg.json.tmp file that hdboost uses to create the dlp_cfg.json file.
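After the installation completes, the symlinked working directories listed in Table 4 can be checked with a short script. The following Python sketch takes the symlink targets from that table. The install path is passed in as a parameter because the table refers to it only as <install-path>, and the function name broken_symlinks is hypothetical.

```python
import os

# Symlinked working directories from Table 4, relative to the install path.
EXPECTED_SYMLINKS = {
    "logs/debug": "/var/opt/dlp/logs/debug",
    "logs/audit": "/var/opt/dlp/logs/audit",
    "tmp": "/var/opt/dlp/tmp",
}


def broken_symlinks(install_path, expected=EXPECTED_SYMLINKS):
    """Return the relative paths that are not symlinks resolving to
    their expected target directory."""
    broken = []
    for rel, target in expected.items():
        link = os.path.join(install_path, rel)
        points_to_target = os.path.realpath(link) == os.path.realpath(target)
        if not os.path.islink(link) or not points_to_target:
            broken.append(rel)
    return broken
```

An empty list means that every working directory under the install path resolves to its location under /var/opt/dlp.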

CHAPTER 4

Administration Using the GUI

This chapter includes the following topics:

• Configuration using the GUI

• HDBoost web application overview

• Certificate requirements

• Backup metadata

• Create backups

• Change backup retention times

• Restore backups

• Delete backups

Configuration using the GUI

You can integrate the HDBoost service with other application interfaces, allowing you to perform various application-specific operations within those interfaces.

You can currently integrate the HDBoost service with the following:

• Cloudera Manager 5.8 and later

• Ambari 2.5 and later

Configuration considerations

Before you configure the Hadoop application agent, review the following considerations:

• Configure the Hadoop application agent as a Hadoop superuser. The typical Hadoop superuser is hdfs.

• All Hadoop application agent configuration is performed on the node of the Hadoop cluster where the agent is installed.

• In a non-Kerberos environment, it is recommended that only one user manage device configurations. If more than one user is required, a Kerberos environment supports multiple users.

Download HDBoost service files

Procedure

1. From the host, download the compressed file. The name of the file is as follows:

emchdappagent-*-linux_x86_64.tar.gz

2. To extract the compressed package, type the following:

tar xvzf emchdappagent-*-linux_x86_64.tar.gz

The HDBoost service packages for both Hortonworks and Cloudera are under:

emchdappagent-*/ui-server

Under that directory, find the following:

• emchdappagent-*/ui-server/

• emchdappagent-*/ui-server/cloudera

• emchdappagent-*/ui-server/cloudera/HDBoost-Cloudera-*.jar

• emchdappagent-*/ui-server/hortonworks

• emchdappagent-*/ui-server/hortonworks/HDBoost-Ambari-*.tar.gz

Hadoop application agent and Cloudera Manager

Cloudera Manager allows third parties, such as Hadoop application agent, to integrate their services into the Cloudera Manager user interface.

The HDBoost service is an add-on service in Cloudera Manager that integrates HDBoost interface components into the Cloudera platform, allowing you to perform the following functions from the Cloudera Manager web interface:

• Install the HDBoost service

• Uninstall the HDBoost service

• Start the HDBoost service

• Stop the HDBoost service

HDBoost web server requirements for Cloudera Manager

HDBoost web server for Cloudera Manager requires Java Runtime Environment (JRE) 8 or later.

On the same host where the HDBoost command-line interface (CLI) is installed, install JRE 8 or later. See Oracle documentation for information about installing JRE 8 or later.

You can also upgrade the Cloudera cluster to JRE 8 or later. See Cloudera documentation for details.

HDBoost package for Cloudera

The HDBoost service is a Java Archive (JAR) file applied to Cloudera Manager.

The HDBoost JAR file contains the following:

jar -tf HDBoost-Cloudera-4.0-1-SNAPSHOT.jar
META-INF/MANIFEST.MF
META-INF/
descriptor/
images/
scripts/
descriptor/service.sdl
images/favicon.ico
scripts/control.sh
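Because a JAR file is a ZIP archive, its contents can also be checked without the jar tool, for example before the file is copied to the Cloudera Manager host. The following Python sketch uses the standard zipfile module. The expected entries are the file entries from the listing above, and the function name missing_entries is hypothetical.

```python
import zipfile

# File entries taken from the jar -tf listing in this guide.
EXPECTED_ENTRIES = {
    "META-INF/MANIFEST.MF",
    "descriptor/service.sdl",
    "images/favicon.ico",
    "scripts/control.sh",
}


def missing_entries(jar_path, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent from the JAR file."""
    with zipfile.ZipFile(jar_path) as jar:
        return set(expected) - set(jar.namelist())
```

An empty set indicates that the descriptor, icon, and control script are all present in the archive.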

Adding HDBoost to Cloudera Manager

On the same node where the HDBoost command-line interface (CLI) package is installed, install HDBoost.

Procedure

1. Copy the HDBoost jar file, HDBoost-Cloudera-4.0-1-SNAPSHOT.jar, to the Cloudera Manager host at /opt/cloudera/csd.

2. To restart the Cloudera Manager server, type the following command:

service cloudera-scm-server restart

3. To add the HDBoost service, perform the following actions:

a. Open a browser and type the following in the address bar:

http://<Cloudera-host>:7180/cmf/home

b. In the Cloudera Manager user interface, next to the cluster name, click the triangle, and then from the menu, select Add a Service.

c. In the Add a Service to Cluster 1 window, select HDBoost, and then click Continue.

4. In a new window, to configure the hosts for the HDBoost server, perform the following steps:

a. Choose the web server, which should be the same as the host where Hadoop application agent is installed.

b. Browse to the Cloudera Manager Home page, and then confirm that HDBoost is installed by checking the listed services under Cluster.

5. Click HDBoost.

An integrated web page for HDBoost appears.

The installation is complete and the HDBoost service appears on the left with the other services in the Cloudera management console. To launch the HDBoost GUI, go to the HDBoost service, and then use the HDBoost UI quick link.

Starting the HDBoost Service

Procedure

1. On the Cloudera Manager home page, to open the HDBoost page, under Clusters, click HDBoost.

2. On the HDBoost page, from the Actions list box, select Start.

Stopping the HDBoost service

Procedure

1. From the list box at Cloudera Manager > Clusters, on the Cloudera Manager page, click Actions, and then select HDBoost.

2. On the HDBoost page, click Actions, and then from the menu, select Stop.

Configuring HDBoost

To change the HDBoost configuration file, in Cloudera Manager, you can use the integrated HDBoost web page.

As an admin user, you can edit, deploy, and version control the configuration settings to the HDBoost nodes.

Procedure

1. In Cloudera Manager, browse to the HDBoost services page.

2. Click the Configuration tab.

3. Make the desired changes, and then click Save Changes.

4. To deploy the changed configuration, in the menu under Actions, click Restart.

Cloudera Manager automatically stops the HDBoost service, deploys the updated configuration files, and then restarts the HDBoost service.

Deleting the HDBoost service

Procedure

1. Browse to the Cloudera Manager Home page.

2. Next to HDBOOST, click the menu, and then select Delete.

3. (Optional) To remove the Java Archive (JAR) file, go to /opt/cloudera/csd, and then manually delete the file.

Hadoop application agent and Ambari

Apache Ambari allows third parties such as Hadoop application agent to integrate their services into the Ambari user interface.

HDBoost Application Server Service

HDBoost Ambari service package allows you to deploy the HDBoost Application Service to any host in a Hadoop cluster.

HDBoost Ambari application service requirements

HDBoost application server requires Ambari version 2.3 or later.

In addition, HDBoost application server requires the latest version of Google Chrome.

HDBoost for Ambari package

The HDBoost for Ambari package has the following structure:

HDBOOST
HDBOOST/configuration
HDBOOST/configuration/hdboost-site.xml
HDBOOST/package
HDBOOST/package/scripts
HDBOOST/package/scripts/hdboost_server.py
HDBOOST/quicklinks
HDBOOST/quicklinks/quicklinks.json
HDBOOST/metainfo.xml

Use the hdboost-site.xml file to configure HDBoost application server properties. Any changes that you make to these properties through the Ambari console are version-controlled.

The quicklinks.json file in Ambari version 2.3.0 allows you to add and define quick links on the HDBOOST service page to access the HDBoost Application Server.
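Once the package has been uncompressed, its structure can be compared with the listing above before the Ambari server is started. The following Python sketch checks only for the files named in that listing. The function name missing_files is hypothetical, and hdboost_dir is whichever directory holds the extracted HDBOOST folder.

```python
import os

# Files from the package structure listed in this guide, relative to
# the extracted HDBOOST directory.
EXPECTED_FILES = (
    "configuration/hdboost-site.xml",
    "package/scripts/hdboost_server.py",
    "quicklinks/quicklinks.json",
    "metainfo.xml",
)


def missing_files(hdboost_dir, expected=EXPECTED_FILES):
    """Return the expected package files that do not exist under hdboost_dir."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(hdboost_dir, rel))
    ]
```

An empty list means that the configuration file, server script, quick links definition, and metainfo.xml are all in place.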

Install HDBoost on Ambari

Procedure

1. To stop the Ambari server, type the following command:

$ ambari-server stop

2. Place the HDBoost-Ambari tar.gz file in the /var/lib/ambari-server/resources/stacks/HDP/2.5/services directory.

3. To create the HDBoost folder containing the HDBOOST package, uncompress the tar.gz file.

4. To change ownership of HDBoost to the user who starts the Ambari server, type the following command:

chown -R root:root HDBOOST

To change permissions, type the following command:

chmod -R 755 HDBOOST

5. To make HDBoost visible in the Ambari Add Service wizard, restart the Ambari server. Type the following command:

$ ambari-server start

For example:

Using python /usr/bin/python
Starting ambari-server
Ambari Server running with administrator privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Waiting for server start....................
Ambari Server 'start' completed successfully.

Add HDBoost to the Ambari user interface

Procedure

1. Log in to the Ambari management console.

2. Select Actions > +Add Service.

The Add Service Wizard appears.

3. Select the checkbox for HDBoost, and then click Next.

4. In the Add Service Wizard, on the Summary page, to deploy HDBoost, select the node where the Ambari server is running.

5. To deploy the HDBoost service in the cluster, on the right, select NameNode.

On the right, an HDBoost SERVER button appears in green.

6. Click Next.

7. (Optional) In the Add Service Wizard, make any configuration changes that you want in the Customize Services window, or to skip this step, click Next.

You can reconfigure the HDBoost service after you complete the installation.

8. Review the configuration information in the Review window. If you are satisfied with the configuration, to install the HDBoost service on the cluster, click Deploy.

The Install, Start, and Test window opens and displays the progress of the installation.

9. Click Next, and then click Complete.

The installation is complete and the HDBoost service appears on the left with the other services in the Ambari management console.

Configure the HDBoost service

After the HDBoost service is installed, you can reconfigure the service if required.

Procedure

1. On the Ambari management console, in the list of services on the left, click HDBoost.

2. Select the Configs tab.

3. Make the desired changes, and then click Save.

After you save the changes, the new version of the configuration is applied.

Note

For the changes to take effect, restart the HDBoost service.

Access the HDBoost user interface

When using HDBoost with other application interfaces, to return to the HDBoost interface, use the Quick Links menu.

Procedure

1. On the application console, click Quick Links.

2. Select HDBoost UI.

Delete the HDBoost service

Ambari versions 2.4 and later provide a delete service function. This function does not remove installed files.

Procedure

1. In the Ambari management console, click the Service Actions menu, and then click Delete Service.

2. (Optional) To completely remove the HDBoost service, delete the following directory:

/var/lib/ambari-server/resources/stacks/HDP/<version>/services/HDBOOST

HDBoost web application overview

To perform Hadoop application agent tasks, use the graphical user interface (GUI). The web application consists of an HDBoost application service and a web browser.

To start and stop the HDBoost application service, use the Ambari or Cloudera Manager interface.

Certificate requirements

The HDBoost application server requires that you have a certificate in the keystore to enable the HTTPS protocol. You can choose one of two approaches to obtain a certificate: retrieve a signed certificate from a public trusted certificate authority (CA), or generate a self-signed certificate.

To obtain or create a certificate, you can use a utility tool, keytool, provided by the Java Development Kit (JDK). The executable can be found under $JAVA_HOME/bin. You can also use OpenSSL to generate a certificate. See OpenSSL documentation for more information.

Obtain a signed certificate

Procedure

1. To generate a public/private key pair with the Java Development Kit (JDK) keytool, and then save it in a keystore, type the following command:

$ keytool -genkeypair -keystore jetty.keystore -alias hdboostas \
-dname "CN=node1.domainname1.com,O=Hadoop" -keyalg RSA \
-keysize 2048 -storepass changeme -keypass changeme

2. To create a certificate signing request (CSR) file with the key pair that was generated using the JDK keytool, type the following command:

$ keytool -certreq -keystore jetty.keystore -alias hdboostas \
-storepass changeme -keypass changeme -file hdboostas.csr

3. Transmit the CSR file to the Certificate Authority (CA) and get a signed certificate file (CRT).

You should receive a file similar to hdboostas.crt from the CA.

4. To import the signed certificate into the keystore, type the following command:

$ keytool -importcert -keystore jetty.keystore -alias hdboostas \
-storepass changeme -keypass changeme -trustcacerts -file hdboostas.crt

Create a self-signed certificate

In addition to obtaining certificates, you can also create certificates for testing or other purposes.

Note

This approach is not recommended for deployment in a production environment. You use this approach at your own risk.

When a server with a self-signed certificate is started, the browser shows an indicator that warns the communication is not secure.

The following is an example of how to create a self-signed certificate:

keytool -genkeypair -keystore /opt/emc/dlp/config/jetty/jetty.keystore -keyalg RSA \
-alias jetty123 -dname "CN=sandbox.hortonworks.com,O=Hadoop" \
-storepass changeme -keypass changeme -validity 999

Configure the keystore and truststore

Procedure

1. Create the password for the specific key within the keystore by adding the following child element in /opt/emc/dlp/config/jetty/jetty-ssl-context.xml:

<Set name="KeyManagerPassword">OBF:1sot1v961saj1v9i1v941sar1v9g1sox</Set>

The OBF password is created with changeme as the password:

java -cp /opt/emc/dlp/java/hdboostas.jar org.eclipse.jetty.util.security.Password changeme

2. Configure the truststore by adding the following child elementin /opt/emc/dlp/config/jetty/jetty-ssl-context.xml:

<Set name="TrustStorePassword">OBF:1sot1v961saj1v9i1v941sar1v9g1sox</Set><Set name="trust-store-path">full-path-name-of-truststore</Set>

After you finish

To configure secure socket layer (SSL) connectors in addition to the HTTPS client, refer to the following Jetty reference document:

• http://download.eclipse.org/jetty/stable-9/apidocs/org/eclipse/jetty/util/ssl/SslContextFactory.html

To add more fields in Jetty .xml configuration files as needed, refer to the following Jetty reference document:

• http://wiki.eclipse.org/Jetty/Reference/jetty.xml_syntax

Configure the keystore for Jetty

When you configure a keystore for Jetty on the HDBoost application, you must provide a full pathname and password for the keystore. Configuring the Jetty keystore is required to use the GUI.
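The OBF value used in the keystore and truststore procedure above can be reproduced outside Java to check that a stored string matches an expected password. The following is a minimal Python sketch of the obfuscation scheme as it behaves for ASCII passwords; the function names are illustrative, and the supported way to generate the value remains the java command shown in the procedure. Note that OBF is reversible obfuscation, not encryption.

```python
def _to_base36(value: int) -> str:
    """Render a non-negative integer in base 36 using digits 0-9 and a-z."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    if value == 0:
        return "0"
    out = ""
    while value > 0:
        value, remainder = divmod(value, 36)
        out = digits[remainder] + out
    return out


def obfuscate(password: str) -> str:
    """Sketch of Jetty-style OBF obfuscation (assumption: ASCII-only input)."""
    data = password.encode("ascii")
    groups = []
    for i, b1 in enumerate(data):
        # Each byte is combined with the byte mirrored from the other end.
        b2 = data[len(data) - 1 - i]
        i1 = 127 + b1 + b2
        i2 = 127 + b1 - b2
        # Every byte pair becomes one zero-padded, four-character base-36 group.
        groups.append(_to_base36(i1 * 256 + i2).rjust(4, "0"))
    return "OBF:" + "".join(groups)


print(obfuscate("changeme"))  # OBF:1sot1v961saj1v9i1v941sar1v9g1sox
```

The printed value matches the KeyManagerPassword and TrustStorePassword strings in the examples above, which were generated from the password changeme.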

Procedure

1. To configure the system, type:

/opt/emc/dlp/config/dlp_config_hadoop

2. To configure the keystore for the GUI, select y when prompted.

3. Specify the full pathname for the keystore, or press Enter to accept the following default location:

[/opt/emc/dlp/config/jetty/jetty.keystore]

4. When prompted to choose a password for the keystore, type the password that was associated with the keystore when it was created.

The keystore should be successfully configured.

Backup metadata

Backup metadata consists of required and optional elements.

The following list includes information about required and optional backup metadata:

• Required backup metadata: Includes information about a specific backup.

• Optional backup metadata:

  ◦ Audit log: Includes audit information.

  ◦ Backup configuration: Includes all backup configurations.

  ◦ Master index: Includes information about all backups.

Note

Setting the Backup configuration and audit logs option to false skips only the optional backup metadata. Required backup metadata cannot be skipped.

Create backups

To protect data from a data loss event, create backups. You can then recover the data from the backup.

Procedure

1. Select HDBoost > Dashboard.

2. From the Dashboard, select Backup.

The Backup page appears.

3. On the Backup page, select Add.

Six tabs appear. To save or run a backup, complete the fields under Backup Source and Backup Device.

4. On the Backup Source page, click Choose, and then in the list box, select the application type.

5. Select HDFS/HBase, and then click Next.

A list of available backup sources appears.

6. Select the desired backup source, and then click Next.

The red check mark that appears in the Backup Source tab turns green when the backup source is selected.

7. Click the Backup Device tab.

A list of available backup devices appears.

8. (Optional) To add a backup device, click Add, and then to indicate the Data Domain server, DD Boost user, and storage unit, complete the three fields.

In this window, you can also edit and delete backup devices.

9. From the list, select a backup device, and then click Next.

The red check mark that appears in the Backup Device tab turns green when the backup device is selected.

10. Click the Hive Configuration (Optional) tab.

A list of available Hive configurations appears.

11. (Optional) To add a Hive configuration, click Add, and then specify the following information:

• Hive configuration name

• Hive server

• Hive user

Note

This screen can also be used to modify and delete Hive configurations.

12. If a synchronized backup with HDBoost and BRBoost is desired, from the list, select a Hive configuration, and then click Next.

Add backup devices

Procedure

1. On the Backup Device tab, select Add.

2. Enter the necessary information in the DD Server, Storage Unit, and DD Boost User fields, and then click Add.

Note

To reserve the backup device for the specified backup configuration, and prevent it from being added to another backup configuration, select the Is a secondary backup device? option.

Additional backup options

Additional backup options are available in the HDBoost interface.

In the HDBoost interface, when you click the Backup Options tab, a list of default settings is displayed that you can change as required. These settings include the following:

• Mapper jobs: You can change the maximum number of jobs in this field.

• Hadoop Distcp Options: Specify options for copying data between clusters.

• Secondary Device: Select a secondary system that is to be used for restore operations from a replicated Data Domain system.

• Keep backup snapshot: To retain the snapshot that was used for the backup, click Keep backup snapshot.

• Debug mode: To add additional debug information in the event logs, click Enable debug mode. Debug mode is disabled by default.

• Backup configuration and audit logs: The Backup configuration and audit logs switch is on by default.

Complete the configuration

The HDBoost Backup Summary screen displays the options that were specified for the configuration.

Procedure

1. Click the Summary tab.

2. If you are satisfied with the configuration, click Finish. To abort the backup and return to the Dashboard, click Cancel.

After you click Finish, the Save the backup configuration dialog box appears.

3. Perform one of the following actions:

• To save the ad hoc backup using the current configuration, perform the following steps:

a. In the name field, type a name for the backup.
When you type a name for the backup, the Save button is enabled.

b. Click Save.

• To start the ad hoc backup using the current configuration without saving it, click Run.

Change backup retention times

You can change the retention times for backups through the HDBoost graphical user interface (GUI).

Procedure

1. From the main dashboard, select Retention.

2. From the left side of the screen, select a backup source, and then select a backup target that is associated with that source from the right side of the screen.

3. (Optional) To narrow the list of backup sources, apply type and name filtering, or sort by ascending or descending alphabetical order.

4. Drag a backup source or backup target to the Selected area next to the list.

Note

• Retention updates applied to a backup source modify all backups of the selected item.

• Retention updates applied to a target modify only the selected backup.

• If you do not select a backup, the retention update applies to all backups.

The selected item appears in the list, along with the message 1 Selected items:.

5. To set the updated retention time, click Retention.

Note

• The default retention time change is forward by 3 months, and is displayed as an absolute date.

• The Retention by Date radio button updates retention time to a specified date. To select a date, click the date box, and then use the calendar tool. The arrows to the left and right of the calendar step forward or backwards by a month. Dates in the past are disallowed, and disable the Change button.

• The Retention by Period radio button allows relative adjustments to the existing retention time. Modify the existing retention time up or down by entering values in the month, day, or year fields. Only one field can be modified in a single update. Optionally, to adjust the times up or down, click the scroll arrows that appear when the box is selected.

6. To apply the change, click Change.
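The Retention by Period rules in the procedure above (a relative change, one field per update) can be illustrated with a short Python sketch. The function names are hypothetical and this is not how the GUI is implemented. Two behaviors here are assumptions that the guide does not state: the day of the month is clamped when the target month is shorter, and the rule that rejects past dates, which the guide gives for Retention by Date, is applied to relative changes as well.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day (assumed behavior)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def retention_by_period(current: date, today: date,
                        years: int = 0, months: int = 0, days: int = 0) -> date:
    """Apply one relative adjustment to an existing retention date."""
    changed = [value for value in (years, months, days) if value != 0]
    if len(changed) != 1:
        # The guide allows only one field to be modified in a single update.
        raise ValueError("modify exactly one of years, months, or days")
    if years:
        new = add_months(current, 12 * years)
    elif months:
        new = add_months(current, months)
    else:
        new = current + timedelta(days=days)
    if new < today:
        # Assumption: past dates are rejected for relative changes too.
        raise ValueError("retention dates in the past are not allowed")
    return new


# The default change described above: forward by 3 months.
print(retention_by_period(date(2017, 6, 14), date(2017, 6, 14), months=3))
```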

Restore backups

Procedure

1. From the HDBoost home page, click the upper left corner to expand the options.

2. Click Restore.

3. If desired, use the following options to help locate a specific backup source from which to restore a backup:

• Filtering

• Sorting

• Searching by keyword

4. To expand the list of associated backups for the backup source, click the backup source.

5. Double-click the backup to restore.

The specified backup appears in the basket at the bottom of the screen.

6. Click Restore.

Note

Only one backup can be restored at a time.

7. Accept the confirmation dialog box that displays the backup URI.

8. (Optional) Choose to restore the backup to a different location, and then specify the new target location.

9. (Optional) Choose to restore the backup from a secondary Data Domain system.

10. To start the restore operation, click Yes.

A confirmation appears with the Job ID of the restore operation.

Delete backups

Procedure

1. From the HDBoost home page, click the upper left corner to expand the options.

2. Click Delete.

3. If desired, to help locate a specific backup source from which to delete a backup, use the following options:

• Filtering

• Sorting

• Keyword search

• To display all expired backups, select the Expired option.

4. To expand the list of associated backups for the backup source, click the backup source.

Note

The snapshot timestamp indicates the retention time.

5. Double-click the backup to delete.

The specified backup appears in the basket at the bottom of the screen.

6. Click Delete.

Note

Multiple backups can be deleted at a time.

7. To start the delete operation, click Yes.

The selected backups are deleted in the background.

CHAPTER 5

Administration Using the CLI

This chapter includes the following topics:

• Configuration using the CLI
• Backup overview
• Data flow overview
• Back up HDFS data to a Data Domain system
• Back up HBase data to a Data Domain system
• Restore overview
• Restore an HDFS backup
• Restore an HBase backup
• Restore a replicated backup
• List backup configurations
• List backups
• Search backups
• Clean up backups
• Delete backups
• Refresh the Kerberos credentials cache
• Test the connection to the Data Domain system
• Change retention dates
• Erase the backup configuration
• Restore the configuration
• Display the software version
• Disaster recovery overview

Configuration using the CLI

Before you perform administrative tasks with the Hadoop Application Agent CLI, configure the agent with the CLI.

For specific information about command line options, refer to Command Reference on page 85.

Configuration considerations

Before you configure the Hadoop application agent, review the following considerations:

• Configure the Hadoop application agent as a Hadoop superuser. The typical Hadoop superuser is hdfs.

• All Hadoop application agent configuration is performed on the node of the Hadoop cluster where the agent is installed.

• When the hdboost command is invoked for the first time with the --addconfig --device option, the dlp_cfg.json file is created. This file is:

  ◦ Persistent across all invocations of Hadoop application agent.

  ◦ Managed by the Hadoop application agent configuration command options.

  ◦ Backed up by default.

• Passwords are stored in a separate credentials file, dlp_cfg.jceks, when Kerberos is not in use. To securely store password information, this file uses the hadoop credential command.

Note

Data Domain credentials are not required when Kerberos is in use.

• In a non-Kerberos environment, it is recommended that only one user manage device configurations. If more than one user is required, use a Kerberos environment, which supports multiple users.

Connect Hadoop to the Data Domain system

Before you begin

If Kerberos is not being used, you must disable it by running hdboost {--kerberos|-K} --disable.

Ensure that one or more target DD Boost storage units on the Data Domain are configured to store the backups from the Hadoop environment.

To connect Hadoop objects to backup targets on the Data Domain system, complete the following steps.

Procedure

1. To configure a backup target on the Data Domain system for the Hadoop application agent configuration and credentials files, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path> [--kerberoscc <credential-cache>]

For example:

hdboost --addconfig --device [email protected]:pc_hdp -y
Enter password:
Enter password again:

2. To configure the source object for the backup to the target on the Data Domain system, type the following command:

hdboost --addconfig -n <configuration_name> {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>} --deviceid <ID>

Note

For the deviceid parameter, specify the target ID configured in the previous step.

For example:

hdboost --addconfig -n test_hdboost_bkcfg hdfs://suse11sp301/test_hdboost --deviceid <ID> [ --kerberoscc <credential-cache> ]
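The --device argument in step 1 follows the pattern <user>@<hostname>:<device-path>. When these commands are scripted, it can help to validate that string before invoking hdboost. The following Python sketch shows one way to split and check it; the DeviceSpec type, the parse_device_spec function, and the sample host name are hypothetical and are not part of the product.

```python
from typing import NamedTuple


class DeviceSpec(NamedTuple):
    """The three parts of a --device argument (illustrative type)."""
    user: str
    hostname: str
    device_path: str


def parse_device_spec(spec: str) -> DeviceSpec:
    """Split '<user>@<hostname>:<device-path>' and reject incomplete values."""
    user, at_sign, rest = spec.partition("@")
    if not at_sign or not user:
        raise ValueError("expected <user>@<hostname>:<device-path>")
    hostname, colon, device_path = rest.partition(":")
    if not colon or not hostname or not device_path:
        raise ValueError("expected <user>@<hostname>:<device-path>")
    return DeviceSpec(user, hostname, device_path)


# Hypothetical DD Boost user, Data Domain host, and storage unit.
print(parse_device_spec("ddboost@dd01.example.com:pc_hdp"))
```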

Add a Data Domain, back up an HDFS directory, and back up an HBase table

The Hadoop application agent provides the capability for you to add a Data Domain, back up an HDFS directory, and back up an HBase table.

The examples in the following procedure use the generic indicator clustername to show where the name of the specific Hadoop distribution should be placed.

Procedure

1. Switch the user to hdfs:

clustername:/submittals/ # su - hdfs
hdfs@clustername:~> whoami
hdfs
hdfs@clustername:~>

2. Type the following command:

hdboost --listconfig

3. Disable Kerberos, and then type the hdboost --listconfig command:

hdboost -K --disable

hdboost --listconfig

4. Add the first Data Domain system, and then type the hdboost --listconfig command, for example:

hdboost -a --device [email protected]:storagepath

hdboost -k

5. Add the HDFS source URI, and then type the hdboost --listconfig command, for example:

hdboost -a -n demo1 -o hdfs://bu-hdp2-nn.lss.emc.com/test1 --deviceid 7cd45463-4274-4fba-a50b-5067c87ef8bc

hdboost -k

6. Add the HBase source URI, and then type the hdboost --listconfig command, for example:

hdboost -a -n demo2 -o hbase://bu-hdp2-nn.lss.emc.com/emp --deviceid 7cd45463-4274-4fba-a50b-5067c87ef8bc

hdboost -k

You are now ready to back up the HDFS /test1 directory, the HBase table emp, or both.

Configure multiple Data Domain systems

Hadoop application agent supports the use of multiple Data Domain systems as backup targets for the Hadoop environment. A secondary Data Domain system is also required to use the Data Domain replication functionality.

By default, the first Data Domain system added to the configuration becomes the master Data Domain, which serves as the backup target for metadata and configuration. However, it is possible to designate a subsequent system as the master Data Domain.

Procedure

1. To add the first Data Domain system to the configuration, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path>

For example:

hdboost --addconfig --device [email protected]:pc_hdp -y
Enter password:
Enter password again:
[email protected]:pc_hdp will get configured as target 1

2. To add the second Data Domain system to the configuration, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path>

For example:

hdboost --addconfig --device [email protected]:pc_hdp -y
Enter password:
Enter password again:
[email protected]:pc_hdp will get configured as target 2

3. To set the second Data Domain system as the master, type the following command:

hdboost {--addconfig|-a} --master --deviceid <ID>

For example:

hdboost --addconfig --master --deviceid 2

Note

When the master device is changed, you cannot list backups with metadata information until you copy the metadata from the old master device to the new device.

Configure replication

Hadoop application agent provides the ability to restore backups that were replicated to a secondary Data Domain system, but does not provide any control over the replication process. For more information, see the Data Domain Operating System Administration Guide.

Replication requires at least two Data Domain systems, and a Data Domain replication license.

To specify another Data Domain system as a secondary source for restoring backups when the primary source is offline, complete the following steps.

Procedure

1. To specify a secondary system for the master Data Domain, type the following command:

hdboost {--addconfig|-a} --master --deviceid <ID> --secondary <ID>

For example:

hdboost --addconfig --master --deviceid 1 --secondary 2

2. To specify a secondary system from which to restore a Hadoop object or HBase table to the primary system, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path> [--kerberoscc <credential cache>] [--secondary <sid>]

For example:

hdboost --addconfig --device [email protected]:pc_hdp --secondary c97a16e7-38b1-4088-b410-6478a486749e

Configure Kerberos

Hadoop application agent supports the use of Kerberos authentication. Kerberos must be correctly configured before any Data Domain devices are added.

To configure Kerberos, complete the following steps.

Note

Kerberos is enabled by default.

Procedure

1. To enable or disable Kerberos, perform one of the following actions:

• To enable Kerberos, type the following command:

hdboost -K --enable

• To disable Kerberos, type the following command:

hdboost -K --disable

2. To configure the Kerberos credentials cache, type the following command:

hdboost -a --device <user>@<hostname>:<device-path> -y

For example:

hdboost -a --device [email protected]:pc_master -y
Hadoop App Agent Version: version number Build: build number
Enter Kerberos credential cache: /tmp/kcc3

Note

The credentials cache is required when Kerberos is in use. If it is not specified, the system prompts for it when adding Data Domain devices.

Maps parameter

By default, 20 maps are used for Hadoop Distributed Copy (DistCP) jobs, including hdboost backup and recovery. However, the maps parameter can be set manually to specify an upper limit on the number of map jobs to run on Hadoop application agent calls to DistCP. This parameter can also be used to place a limit on the number of simultaneous connections to the Data Domain system or systems in the environment.

Setting this parameter impacts backup performance.

To set the maps parameter, type the following command:

hdboost --addconfig -o {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>} --deviceid <ID> --maps <n>

For example:

hdboost --addconfig -o hdfs://suse11sp301/test_hdboost --deviceid 1 --maps 20
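If the maps setting is applied from a script, the argument list can be assembled and checked before it is run. The following Python sketch builds the command shown above using only the options that appear in this section. The build_maps_command function is hypothetical, and the check that the maps value is at least 1 is an assumption rather than a documented constraint.

```python
from typing import List


def build_maps_command(source_uri: str, device_id: str, maps: int) -> List[str]:
    """Assemble the hdboost argument list that sets the maps parameter."""
    if not source_uri.startswith(("hdfs://", "hbase://")):
        raise ValueError("source must be an hdfs:// or hbase:// URI")
    if maps < 1:
        # Assumption: the upper limit on map jobs must be a positive integer.
        raise ValueError("maps must be a positive integer")
    return ["hdboost", "--addconfig", "-o", source_uri,
            "--deviceid", device_id, "--maps", str(maps)]


command = build_maps_command("hdfs://suse11sp301/test_hdboost", "1", 20)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on the node where the agent is installed, running as the Hadoop superuser.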

Audit logging

The audit logging feature records all configuration changes, including the user who performed the change, when the change was performed, and whether the change was successful. The listed functionality of this feature applies to both HDBoost and BRBoost.

Note

The audit logging feature does not require an additional license.

All changes are recorded in the dlp_cfg.json configuration file. Each configuration item is a JSON formatted entry in the configuration file and is one of the following:

• A backup Data Domain appliance

• An HDFS backup

• An HBase backup

• A metadata backup

The audit.log file records all changes to the hdboost configuration. It is created with world-writable permission, ensuring that any Hadoop application agent user who makes a configuration change triggers an audit log entry.

The audit log itself is backed up as a final step along with the dlp_cfg.json configuration file whenever an hdboost backup occurs. The audit log is restored only when a restore operation is performed with the --restore-logs option on the invocation line.

If the user changes configuration during a backup, the updated configuration file and audit logs are written out after the backup is complete.

Added configuration items

When a configuration item is added, a structure reference holds a copy of the new item.

If the addition attempt fails, the original settings are retained and the addition attempt is logged.

Audit logging date and ID user format

Log entries have the following format:

YYYY-MMM-dd_HH:mm:ss.SSSSSS_<user><old/new><entry>

See the following example:

2017-Jun-14 09:46:21.148000 hdfs: Starting the modification of auditlog configuration.
2017-Jun-14 09:46:21.157589 hdfs: Number of old log rotations is 3.
2017-Jun-14 09:46:21.159238 hdfs: Old log rotation interval is none.
2017-Jun-14 09:46:21.160357 hdfs: Old maximum file size is 1MB.
2017-Jun-14 09:46:21.161403 hdfs: Number of new log rotations is 5.
2017-Jun-14 09:46:21.162361 hdfs: New log rotation interval is none.
2017-Jun-14 09:46:21.163373 hdfs: New maximum file size is 100KB.
2017-Jun-14 09:46:21.167695 hdfs: Modification of auditlog configuration is completed.

Table 5 Date and ID entry definitions

Entry      Definition
YYYY       The current year.
MMM        The current month as a three-letter abbreviation (Jan through Dec).
dd         The current day of the month.
HH         The current hour displayed in the 24-hour format.
mm         The current minute.
ss         The current second.
SSSSSS     The current fraction of a second, shown as six digits (microseconds).
_          Indicates a blank space.
user       The current system user name.
old/new    Either the old entry that is being overwritten or a newly added entry.
entry      The item being reconfigured.

Note

Log entries match the current local time zone provided by the Linux system.
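The entry format described above can be parsed with a short script, for example to filter the audit log by user or by time. The following Python sketch is based only on the sample entries in this guide: it accepts timestamps with or without the fractional part, and it classifies an entry as old or new by looking for that word in the message. The AuditEntry type and parse_audit_line function are hypothetical, and the month names are assumed to be English abbreviations.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class AuditEntry(NamedTuple):
    timestamp: datetime
    user: str
    kind: Optional[str]  # "old", "new", or None for start/completion lines
    message: str


_LINE = re.compile(
    r"^(?P<ts>\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{1,6})?) "
    r"(?P<user>[^:\s]+): (?P<msg>.*)$"
)


def parse_audit_line(line: str) -> AuditEntry:
    """Parse one audit.log line in the format shown in the examples."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError("not an audit log entry: " + repr(line))
    ts_text = match.group("ts")
    # Some sample entries carry a fractional second and some do not.
    fmt = "%Y-%b-%d %H:%M:%S.%f" if "." in ts_text else "%Y-%b-%d %H:%M:%S"
    timestamp = datetime.strptime(ts_text, fmt)
    message = match.group("msg")
    kind_match = re.search(r"\b(old|new)\b", message, re.IGNORECASE)
    kind = kind_match.group(1).lower() if kind_match else None
    return AuditEntry(timestamp, match.group("user"), kind, message)


entry = parse_audit_line(
    "2017-Jun-14 09:46:21.157589 hdfs: Number of old log rotations is 3."
)
print(entry.user, entry.kind, entry.timestamp.isoformat())
```

Because the log uses the local time zone of the Linux system, the parsed timestamps are naive local times.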

Deleted configuration items

When a configuration item is deleted, the status indicates whether the deletion was successful.

If the deletion attempt fails, the original settings are retained and the deletion attempt is logged.

Modified configuration items

When a configuration item is modified, structure references hold two versions of the item. The first version is the original value that existed before the modification attempt.

The second version holds the updated item that the user is attempting to write. Finally, the status is written out indicating whether the update was successful. If the update fails, the configuration remains unaltered and retains the original settings, although the attempted change is still recorded.

Audit log rotation

You can configure the system to rotate old logs according to a fixed schedule, file size limits, or both.

You can set log rotations that are based on both time and file size. In this case, the log rotates when either the period ends or the size limit is reached.

The following is an example of an audit log rotation configuration command:

./hdboost {--addconfig|-a} --audit --rotation <n> [ --yearly| --monthly| --weekly| --daily] [ --size <size>{K|M|G}] [-y] [-D]

where --rotation <n> represents the number of log rotations.

Rotated logs are saved uncompressed in the following file name format:

audit.log.<k>

where <k> runs from 1 to the number specified in <n>.

After <n> rotations, the old audit.log.1 is discarded and the file name is used to hold the next rotation in sequence. The default is for 5 rotations with monthly intervals.

The -y option skips prompts, and the -D option enables debug logging.

If a rotation parameter of 0 is specified, the log is not rotated and audit.log grows indefinitely. The rest of the optional arguments are ignored. If a log rotation is specified without a time or size argument, an error is returned to the user, and the audit log configuration remains unchanged.

The following list includes information about the log rotation parameters.

yearly

The audit log is rotated when the first entry is added on or after midnight on January 1 each year.

monthly

The audit log is rotated when the first entry is added on or after midnight on the first day of each month.

weekly

The audit log is rotated when the first entry is added on or after midnight each Sunday.

daily

The audit log is rotated when the first entry is added on or after midnight each day.

size

The audit log is rotated when a specified file size is reached.

The size argument requires a numeric size followed by K (kilobytes), M (megabytes), or G (gigabytes).

If you omit the size parameter, the log rotation does not occur based on size.

You can specify a size limit of up to 2G.
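The rotation rules above combine a time boundary with a size limit, and the log rotates when either condition is met. The following Python sketch shows one way to express that decision. It is an illustration of the documented rules, not the agent's implementation: the function names are hypothetical, the K, M, and G units are assumed to be multiples of 1024, and the size check is assumed to trigger when the file reaches the limit.

```python
from datetime import datetime, timedelta
from typing import Optional

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumption: binary units
MAX_SIZE = 2 * _UNITS["G"]  # documented upper limit of 2G


def parse_size(text: str) -> int:
    """Convert a --size value such as '100K' to bytes."""
    number, unit = text[:-1], text[-1:].upper()
    if unit not in _UNITS or not number.isdigit():
        raise ValueError("size must be a number followed by K, M, or G")
    size = int(number) * _UNITS[unit]
    if size > MAX_SIZE:
        raise ValueError("size limit cannot exceed 2G")
    return size


def period_start(now: datetime, interval: str) -> datetime:
    """Return the most recent rotation boundary at or before 'now'."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if interval == "daily":
        return midnight
    if interval == "weekly":
        # datetime.weekday() is 0 for Monday, so Sunday is 0 days back.
        return midnight - timedelta(days=(now.weekday() + 1) % 7)
    if interval == "monthly":
        return midnight.replace(day=1)
    if interval == "yearly":
        return midnight.replace(month=1, day=1)
    raise ValueError("interval must be daily, weekly, monthly, or yearly")


def should_rotate(last_entry: datetime, now: datetime, current_size: int,
                  interval: Optional[str] = None,
                  max_size: Optional[int] = None) -> bool:
    """Rotate when a boundary has passed since the last entry or the size limit is reached."""
    by_time = interval is not None and last_entry < period_start(now, interval)
    by_size = max_size is not None and current_size >= max_size
    return by_time or by_size


# First entry after midnight on the first day of the month triggers a rotation.
print(should_rotate(datetime(2017, 6, 30, 23, 59), datetime(2017, 7, 1, 0, 0, 5),
                    current_size=10, interval="monthly"))
```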

Displaying the audit log configuration using the CLI

Procedure

1. Type the following command:

hdboost {--listconfig | -k}

2. To list the audit log, type the following command:

hdboost {--listconfig | -k} --audit

Restoration of old configuration files

When you restore an older configuration file, the differences between the old and new versions are logged. Because the old configuration file being restored replaces the newer one, the restored file contains the newer entries.

Note

Because you are not updating the passwords, the passwords remain the same.

When you restore old configuration files, the following occurs:

1. Kerberos setting changes are displayed first.

2. Changes to the master Data Domain configuration are logged.

3. The list of devices is generated:

• Any device that exists only in the file being replaced is treated as a deleted device.

• Any device that exists only in the configuration being restored is treated as an added device.

• Every changed device configuration is written to the log one at a time.

4. The backup configurations are checked.

5. The process is repeated for metadata backups.

If you restore a configuration file that is substantially different from the previous one, the system generates a lengthy set of log entries.

When restoring an updated configuration file in which some Data Domain systems were deleted, the keystore is not updated. If you need to re-add the Data Domain systems, perform the following steps:

a. If the /opt/emc/dlp/credentials/dlp_cfg.jceks file exists, delete the file.

b. For each configured device, rerun the hdboost -a --device command.
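The device comparison that is logged during a configuration restore (devices only in the replaced file are deleted, devices only in the restored file are added, and devices in both with different settings are changed) can be sketched as a set comparison. In the following Python example, each configuration is represented as a dictionary keyed by device ID. That layout and the diff_devices function are hypothetical; this section does not describe the actual structure of dlp_cfg.json.

```python
from typing import Dict, List, Tuple

Device = Dict[str, str]


def diff_devices(replaced: Dict[str, Device],
                 restored: Dict[str, Device]) -> List[Tuple[str, str]]:
    """Classify devices as deleted, added, or modified by a restore."""
    events: List[Tuple[str, str]] = []
    # Devices that exist only in the file being replaced are deleted.
    for device_id in sorted(replaced.keys() - restored.keys()):
        events.append(("deleted", device_id))
    # Devices that exist only in the configuration being restored are added.
    for device_id in sorted(restored.keys() - replaced.keys()):
        events.append(("added", device_id))
    # Devices present in both are reported only if their settings differ.
    for device_id in sorted(replaced.keys() & restored.keys()):
        if replaced[device_id] != restored[device_id]:
            events.append(("modified", device_id))
    return events


# Hypothetical device entries for illustration only.
current = {"1": {"path": "pc_hdp"}, "2": {"path": "pc_old"}}
backup = {"1": {"path": "pc_master"}, "3": {"path": "pc_new"}}
print(diff_devices(current, backup))
# [('deleted', '2'), ('added', '3'), ('modified', '1')]
```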

Example log entriesLog entries provide an array of information

Kerberos is disabled

2017-Jun-14 09:55:16.766023 hdfs: Starting the modification of Kerberos authentication.2017-Jun-14 09:55:16.771584 hdfs: Old Kerberos authentication is enabled.2017-Jun-14 09:55:16.772957 hdfs: New Kerberos authentication is disabled.2017-Jun-14 09:55:16.779560 hdfs: Modification of Kerberos authentication is completed.

A device is addedThere is no old entry because this is a newly added device.

2017-Jun-14 13:14:52.049927 hdfs: Starting the device addition.2017-Jun-14 13:14:52.052245 hdfs: New device ID is 8a8a38e9-d8d5-4607-bc69-fd2083590cc7.2017-Jun-14 13:14:52.053191 hdfs: New secondary device ID is -1.2017-Jun-14 13:14:52.054010 hdfs: New device storage pathname is [email protected]:tahoe.2017-Jun-14 13:14:52.054824 hdfs: New device password is added.2017-Jun-14 13:14:53.844353 hdfs: Device addition is completed.

The master Data Domain system is changedIn this example, you add a secondary master Data Domain system to the configuration:

2017-Jun-07 15:48:31 hdfs: Starting the modification of master.2017-Jun-07 15:48:31 hdfs: Old ID of the master DD device is 1.2017-Jun-07 15:48:31 hdfs: Old ID of the master DD secondary device is 2.2017-Jun-07 15:48:31 hdfs: New ID of the master DD device is 1.2017-Jun-07 15:48:31 hdfs: Modification of master is completed.

The log rotation is reconfigured

In the following example, you can see the log rotation has changed:

2017-Jun-07 15:44:00 hdfs: Starting the modification of auditlog configuration.
2017-Jun-07 15:44:00 hdfs: Number of old log rotations is 5.
2017-Jun-07 15:44:00 hdfs: Old log rotation interval is none.
2017-Jun-07 15:44:00 hdfs: Old maximum file size is 40GB.
2017-Jun-07 15:44:00 hdfs: Number of new log rotations is 5.
2017-Jun-07 15:44:00 hdfs: New log rotation interval is none.
2017-Jun-07 15:44:00 hdfs: New maximum file size is 1KB.
2017-Jun-07 15:44:00 hdfs: Modification of auditlog configuration is completed.

The Data Domain password is updated (non-Kerberos)

2017-Jun-14 13:37:53.061111 hdfs: Starting the device addition.
2017-Jun-14 13:37:53.063844 hdfs: New device ID is 4250b7cc-1de3-4491-bbda-81bfe34a14be.
2017-Jun-14 13:37:53.065172 hdfs: New secondary device ID is -1.
2017-Jun-14 13:37:53.066124 hdfs: New device storage pathname is [email protected]:hadoop.
2017-Jun-14 13:37:53.067082 hdfs: New device password is added.
2017-Jun-14 13:37:54.728216 hdfs: Device addition is completed.

Data Domain update credential cache (Kerberos only)

2017-Jun-14 13:25:06.526704 hdfs: Starting the modification of device.
2017-Jun-14 13:25:06.538553 hdfs: Old device ID is e38d314c-0c22-4b2c-9e1d-ffbc9c7562c0.
2017-Jun-14 13:25:06.540273 hdfs: Old secondary device ID is -1.
2017-Jun-14 13:25:06.541553 hdfs: Old device storage pathname is [email protected]:tahoe.
2017-Jun-14 13:25:06.542556 hdfs: Old Kerberos credential cache path is /tmp/krb5cc_old.
2017-Jun-14 13:25:06.543366 hdfs: New device ID is e38d314c-0c22-4b2c-9e1d-ffbc9c7562c0.
2017-Jun-14 13:25:06.544244 hdfs: New secondary device ID is -1.
2017-Jun-14 13:25:06.545076 hdfs: New device storage pathname is [email protected]:tahoe.
2017-Jun-14 13:25:06.545804 hdfs: New Kerberos credential cache path is /tmp/krb5cc_new.
2017-Jun-14 13:25:06.550312 hdfs: Device modification is completed.

A device is successfully deleted

2017-Jun-14 13:20:45.735706 hdfs: Starting the device deletion.
2017-Jun-14 13:20:45.738047 hdfs: Old device ID is 8a8a38e9-d8d5-4607-bc69-fd2083590cc7.
2017-Jun-14 13:20:45.738801 hdfs: Old secondary device ID is -1.
2017-Jun-14 13:20:45.739566 hdfs: Old device storage pathname is [email protected]:tahoe.
2017-Jun-14 13:20:45.740395 hdfs: Old device password is removed.
2017-Jun-14 13:20:47.257209 hdfs: Device deletion is completed.

A backup is successfully configured

Because this is a new backup, there is no old entry.

2017-Jun-14 13:41:21.107475 hdfs: Starting the new backup configuration.
2017-Jun-14 13:41:21.110845 hdfs: New name is hadoop-admin.
2017-Jun-14 13:41:21.111783 hdfs: New URI is hdfs://dlpm-build.corp.emc.com/user/hadoop-admin.
2017-Jun-14 13:41:21.112585 hdfs: New device ID is 2.
2017-Jun-14 13:41:21.113380 hdfs: New preserve snapshot setting is yes.
2017-Jun-14 13:41:21.114114 hdfs: New setting for skip backup configuration is yes.
2017-Jun-14 13:41:21.114793 hdfs: New number of maps is 65535.
2017-Jun-14 13:41:21.115468 hdfs: New retention date is 0 3 0.
2017-Jun-14 13:41:21.120266 hdfs: Backup configuration is completed.

A backup is successfully reconfigured

2017-Jun-14 13:49:29.062372 admin: Starting the modification of backup.
2017-Jun-14 13:49:29.072617 admin: Old name is admin.
2017-Jun-14 13:49:29.073659 admin: Old URI is hdfs://dlpm-build.corp.emc.com/user/admin.
2017-Jun-14 13:49:29.074707 admin: Old device ID is 1.
2017-Jun-14 13:49:29.075686 admin: Old preserve snapshot setting is yes.
2017-Jun-14 13:49:29.076470 admin: Old setting for skip backup configuration is yes.
2017-Jun-14 13:49:29.077184 admin: Old number of maps is 65535.
2017-Jun-14 13:49:29.077967 admin: Old type is forcefull.
2017-Jun-14 13:49:29.078668 admin: New name is admin.
2017-Jun-14 13:49:29.079429 admin: New URI is hdfs://128.222.111.186/user/admin.
2017-Jun-14 13:49:29.080320 admin: New device ID is 3.
2017-Jun-14 13:49:29.081079 admin: New preserve snapshot setting is yes.
2017-Jun-14 13:49:29.081806 admin: New setting for skip backup configuration is yes.
2017-Jun-14 13:49:29.082536 admin: New number of maps is 65535.
2017-Jun-14 13:49:29.083295 admin: New retention date is 0 0 0.
2017-Jun-14 13:49:29.084084 admin: New user options are -pugp.
2017-Jun-14 13:49:29.089299 admin: backup modification completed.

A backup is successfully deleted

2017-Jun-14 13:51:41.091736 hdfs: Starting the backup deletion.
2017-Jun-14 13:51:41.101512 hdfs: Old name is hadoop-admin.
2017-Jun-14 13:51:41.102659 hdfs: Old URI is hdfs://dlpm-build.corp.emc.com/user/hadoop-admin.
2017-Jun-14 13:51:41.104327 hdfs: Old device ID is 2.
2017-Jun-14 13:51:41.105549 hdfs: Old preserve snapshot setting is yes.
2017-Jun-14 13:51:41.106411 hdfs: Old setting for skip backup configuration is yes.
2017-Jun-14 13:51:41.107343 hdfs: Old number of maps is 65535.
2017-Jun-14 13:51:41.108161 hdfs: Old retention date is 0 3 0.
2017-Jun-14 13:51:41.112603 hdfs: Backup deletion is completed.

Restore old configuration with multiple changes (non-Kerberos)

2017-Jun-07 15:48:31 hdfs: Starting the modification of master.
2017-Jun-07 15:48:31 hdfs: Old ID of the master DD device is 1.
2017-Jun-07 15:48:31 hdfs: Old ID of the master DD secondary device is 2.
2017-Jun-07 15:48:31 hdfs: New ID of the master DD device is 1.
2017-Jun-07 15:48:31 hdfs: Modification of master is completed.
2017-Jun-07 15:48:31 hdfs: Starting the modification of auditlog configuration.
2017-Jun-07 15:48:31 hdfs: Number of old log rotations is 3.
2017-Jun-07 15:48:31 hdfs: Old log rotation interval is none.
2017-Jun-07 15:48:31 hdfs: Old maximum file size is 1GB.
2017-Jun-07 15:48:31 hdfs: Number of new log rotations is 5.
2017-Jun-07 15:48:31 hdfs: New log rotation interval is none.
2017-Jun-07 15:48:31 hdfs: New maximum file size is 40GB.
2017-Jun-07 15:48:31 hdfs: Modification of auditlog configuration is completed.
2017-Jun-07 15:48:31 hdfs: Starting the backup deletion.
2017-Jun-07 15:48:31 hdfs: Old name is tmp.
2017-Jun-07 15:48:31 hdfs: Old URI is hdfs://bu-cloudera-nn.lss.emc.com/tmp.
2017-Jun-07 15:48:31 hdfs: Old device ID is 1.
2017-Jun-07 15:48:31 hdfs: Old preserve snapshot setting is yes.
2017-Jun-07 15:48:31 hdfs: Old skip backup configuration setting is yes.
2017-Jun-07 15:48:31 hdfs: Old number of maps is 65535.
2017-Jun-07 15:48:31 hdfs: Backup deletion is completed.
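When audit entries are processed by a script, each one can be split into a timestamp, a user, and a message. The following Python sketch assumes that every entry has the form `<date> <time> <user>: <message>`, as in the examples in this section. The function name `parse_audit_entry` and the pattern are illustrative and inferred from these samples only; they are not part of the product.

```python
import re
from datetime import datetime

# Pattern inferred from the sample entries: date, time (optional fractional
# seconds), user name, then the message text.
ENTRY = re.compile(
    r"^(\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?) (\S+): (.*)$"
)

def parse_audit_entry(line):
    """Return (timestamp, user, message) for one audit log line, or None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    stamp, user, message = match.groups()
    # The samples show entries both with and without fractional seconds.
    # %b expects English month abbreviations (the default C locale).
    layout = "%Y-%b-%d %H:%M:%S.%f" if "." in stamp else "%Y-%b-%d %H:%M:%S"
    return datetime.strptime(stamp, layout), user, message
```

Grouping the parsed entries by their "Starting ..." and "... is completed." messages would then recover one record per configuration change.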

Backup overview

The Hadoop application agent uses native Hadoop functionality to back up HDFS data and HBase tables to a DD Boost storage unit on a Data Domain system.

All backup operations are controlled from the Hadoop name node. No additional mounts, paths, libraries, or settings are required on the cluster data nodes.

Consider the following guidelines for creating backups with Hadoop application agent:

• Use 10 GbE or faster network connectivity, if possible.

• The amount of time that is required to complete a backup varies depending on the following factors:

  ◦ The number and type of Data Domain systems in the environment.

  ◦ The size of the Hadoop cluster.

  ◦ The amount of data to be backed up.

    Note

    Typically, the first backup should take the longest. The time that is required to complete subsequent backups should decrease due to data deduplication, but varies depending on the amount of new data that is added between backups.

  ◦ The network infrastructure in the environment.

Backup metadata

Hadoop application agent uses the following metadata to uniquely identify each backup:

• Name node host or cluster name

• Backup type

• Hadoop directory name or HBase table name

• Timestamp

The backup metadata is persisted on the Data Domain system.

Backup object URIs

As an alternative to configuring backup information with the CLI, backup URIs can be created directly in the HDFS application agent configuration file. URIs can be created for both Hadoop objects and HBase tables. If no object is specified as part of a backup command, all valid URIs in the configuration are backed up. If the URI specifies an HDFS or an HBase snapshot directory, all user snapshots are included as part of the backup.

Common backup CLI options

The following CLI options can be used to modify the operation of the hdboost --backup commands.

Table 6 CLI options

Command option   Description
--until          Specify the retention time for the backup.
-N               Do not include the configuration file, audit.log file, and master index in the backup.
-P               Preserve the temporary snapshot that is created in the Hadoop environment.
-D               Debug; send detailed information to the logs that are stored in /var/opt/dlp/logs.
-y               Assume yes to all user prompts, which allows operations to be scripted.

Backup contents

By default, a backup includes the following elements:

• Required backup metadata, which cannot be skipped with the -N option, is backed up to the following file:
  /metadata/1/hadoop/<backup_type>/<namenode>/Source_URI/<snapshot_time>/<snapshot_time>.idx

• Optional backup metadata, which can be skipped with the -N option, is backed up to the following locations:

  ◦ audit.log file: /metadata/1/hadoop/<backup_type>/<namenode>/Source_URI/<snapshot_time>/

  ◦ dlp_cfg.json file: /metadata/1/hadoop/<backup_type>/<namenode>/Source_URI/<snapshot_time>/

  ◦ master index: /metadata/1/hadoop

A subsequent -N backup only includes required backup metadata that is backed up to the following file:

/metadata/1/hadoop/<backup_type>/<namenode>/Source_URI/<snapshot_time>/<snapshot_time>.idx

To find backup data that is associated with the snapshot on the Data Domain system for restore, the <snapshot_time>.idx file is required.

Data flow overview

The following figure shows the data flow during a Hadoop application agent backup operation.

Figure 1 Data flow

The data flow is as follows:

1. The user requests through the console that a directory path is backed up to Hadoop application agent.

2. Hadoop application agent copies the dlp_cfg.json configuration file to the master Data Domain appliance.
   If Kerberos is active, the Kerberos credential file is passed to the BoostFS libraries for authentication.

3. Hadoop application agent requests a file system snapshot through the Hadoop interface.

4. The name node takes a snapshot of the path unless the path itself is a snapshot.
   The user must have previously configured this directory to allow snapshots to be taken.

5. The snapshot is saved in the <path>/.snapshot directory of the HDFS file system.

6. The file system snapshot is backed up to the Data Domain appliance.

7. Hadoop application agent deletes the snapshot by default, with the option to preserve it.

8. The backup index is created.

Back up HDFS data to a Data Domain system

Before you begin

The connections between the Hadoop cluster and the Data Domain system must be configured as described in Configuration using the CLI on page 44.

The retention period is set in the hdboost --backup command. The retention period is relative to today’s date, and is given in the following format:

• #y: refers to the number of years for data retention.

• #m: refers to the number of months for data retention.

• #d: refers to the number of days for data retention.

Each time parameter can be used individually, but you can also specify #y, #m, and #d on the same line. For example, 1y 3m 2d would yield a retention time of 1 year, 3 months, and 2 days from the current date.

A retention time in the yyyy-mm-dd format retains the backup until the specified date. If either the date or path specified is invalid, an error message appears.

To initiate the HDFS backup to the Data Domain system, type the following command:

hdboost {--backup|-b} [--until forever | <retention-period>] [-o hdfs://<hostname>/<dir> --deviceid <ID>]

For example:

hdboost --backup --until 6m -o hdfs://suse11sp301/test_hdboost

Note

• If no retention period is specified, the default is three months.

• Specify a retention period in the format [#y][#m][#d] or yyyy-mm-dd.
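To check in advance which date a retention value resolves to, the arithmetic described above can be reproduced outside the agent. The following Python sketch is illustrative only: the function name `retention_date` is hypothetical, and clamping the day to the end of a shorter month is an assumption, because this guide does not state how the agent handles that case.

```python
import re
from calendar import monthrange
from datetime import date, timedelta

def retention_date(spec, today):
    """Resolve a retention spec to an expiry date.

    Accepts 'forever' (returns None), 'yyyy-mm-dd', or any combination of
    '#y', '#m', and '#d' such as '1y 3m 2d' or '6m'.
    """
    spec = spec.strip()
    if spec == "forever":
        return None
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", spec):
        return date.fromisoformat(spec)  # raises ValueError for invalid dates
    parts = re.fullmatch(r"(?:(\d+)y)?\s*(?:(\d+)m)?\s*(?:(\d+)d)?", spec)
    if parts is None or not any(parts.groups()):
        raise ValueError(f"invalid retention period: {spec!r}")
    years, months, days = (int(g) if g else 0 for g in parts.groups())
    # Add years and months first, clamping the day to the length of the
    # target month (an assumption; the agent's own rule is not documented).
    month_index = today.month - 1 + months
    year = today.year + years + month_index // 12
    month = month_index % 12 + 1
    day = min(today.day, monthrange(year, month)[1])
    return date(year, month, day) + timedelta(days=days)
```

For example, `retention_date("1y 3m 2d", date(2017, 6, 14))` resolves to 16 September 2018 under these assumptions.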

Back up HBase data to a Data Domain system

You can back up HBase data to a Data Domain system. Only one HBase table is backed up at a time.

Before you begin

Ensure that the connections between the Hadoop cluster and the Data Domain system are configured as described in Configuration using the CLI on page 44.

To initiate the HBase backup to the Data Domain system, type the following command:

hdboost {--backup|-b} [--until forever | <retention-period>] [-o hbase://<hostname>/<table>]

Note

• If no retention period is specified, the default is 3 months.

• Specify a retention period in the format [#y][#m][#d] or yyyy-mm-dd.

For example:

hdboost --backup --until 6m -o hbase://suse11sp301/table1
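When backups are launched from a scheduler, it can help to assemble the command line in one place. The following Python sketch builds an argument list from the options documented above (`--until`, `-o`, `--deviceid`, `-N`, `-P`, `-D`, `-y`). The function `backup_command` and its parameter names are hypothetical, and the sketch does not run `hdboost` itself. It requires an object URI whenever a device ID is given, which follows the grouping in the HDFS syntax; the HBase syntax does not show `--deviceid`.

```python
def backup_command(uri=None, until=None, device_id=None,
                   skip_config=False, preserve_snapshot=False,
                   debug=False, assume_yes=False):
    """Build an argument list for `hdboost --backup` from documented options."""
    if uri is not None and not uri.startswith(("hdfs://", "hbase://")):
        raise ValueError("backup object must be an hdfs:// or hbase:// URI")
    if device_id is not None and uri is None:
        raise ValueError("--deviceid is only shown together with -o <object>")
    argv = ["hdboost", "--backup"]
    if until is not None:
        argv += ["--until", until]      # 'forever', [#y][#m][#d], or yyyy-mm-dd
    if uri is not None:
        argv += ["-o", uri]             # omitted: all configured URIs are backed up
    if device_id is not None:
        argv += ["--deviceid", str(device_id)]
    for flag, enabled in (("-N", skip_config), ("-P", preserve_snapshot),
                          ("-D", debug), ("-y", assume_yes)):
        if enabled:
            argv.append(flag)
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on the name node; passing `assume_yes=True` adds `-y` so that no prompt blocks an unattended run.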

Restore overview

Hadoop application agent provides the ability to perform the following types of restore operations to copy data from the Data Domain system back to the Hadoop cluster:

• Full restore of Hadoop data

• Partial restore of Hadoop data

• Full restore of an HBase table

Restore operations can be completed using the original backup, or a replicated backup on a secondary Data Domain system.

Hadoop application agent performs restore operations without the need to mount the Data Domain system on any node in the Hadoop cluster.

All restore operations are controlled from the Hadoop name node. No additional mounts, paths, libraries, or settings are required on the cluster data nodes.

Consider the following guidelines for performing restores with Hadoop application agent:

• Use 10 GbE or faster network connectivity, if possible.

• The amount of time that is required to complete a restore varies depending on the following factors:

  ◦ The number and type of Data Domain systems in the environment.

  ◦ The size of the Hadoop cluster.

  ◦ The amount of data to be restored.

  ◦ The network infrastructure in the environment.

Restore URIs

If valid backup URIs exist in the configuration file, the URIs can be specified for restore operations.

Common restore CLI options

To modify the operation of the hdboost --restore commands, the following CLI options can be used.

Table 7 CLI commands

Command option     Description
-R <destination>   Restore to an alternate destination.
-p                 Restore from a secondary Data Domain system.
-D                 Debug; send detailed information to the logs stored in /var/opt/dlp/logs.
-y                 Assume yes to all user prompts, which allows operations to be scripted.

Restore an HDFS backup

To initiate an HDFS restore from the Data Domain system, type the following command:

hdboost {--restore|-r} -o {<backup-URI> | hdfs://<hostname>/<dir>[/<subdirectory>]} [--deviceid <ID> | --device user@hostname:<device-path>]

For example:

hdboost --restore -o hdfs://suse11sp301/test_hdboost

Note

To restore part of a backup, specify a subdirectory path within the top-level directory.

Restore an HBase backup

To initiate an HBase restore from the Data Domain system, type the following command:

hdboost {--restore|-r} -o {<backup-URI> | hbase://<clustername>/<tablename>} [--deviceid <ID> | --device user@hostname:<device-path>]

For example:

hdboost --restore -o hbase://suse11sp301/table1

Note

• HBase backups must be restored one table at a time.

• The Hadoop distribution on the HBase restore target system must be the same version as, or newer than, the Hadoop distribution on the backup source system.
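The restore syntax shows `--deviceid` and `--device` as alternatives, so a wrapper script should reject a call that supplies both. The following Python sketch assembles a restore command from the documented options only. The function `restore_command` and its parameter names are hypothetical, and whether `-p` can be combined with a device override is not stated in this guide, so the sketch does not restrict that combination.

```python
def restore_command(obj, device_id=None, device=None, from_secondary=False,
                    destination=None, debug=False, assume_yes=False):
    """Build an argument list for `hdboost --restore` from documented options."""
    if device_id is not None and device is not None:
        raise ValueError("specify either --deviceid or --device, not both")
    argv = ["hdboost", "--restore", "-o", obj]
    if device_id is not None:
        argv += ["--deviceid", str(device_id)]
    if device is not None:
        argv += ["--device", device]    # user@hostname:<device-path>
    if destination is not None:
        argv += ["-R", destination]     # restore to an alternate destination
    if from_secondary:
        argv.append("-p")               # restore from a secondary Data Domain system
    if debug:
        argv.append("-D")
    if assume_yes:
        argv.append("-y")
    return argv
```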

Restore a replicated backup

Before you begin

Replication must be performed offline, outside of the Hadoop application agent. The Data Domain system that functions as the replication destination must be associated with a primary device. Type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path> [ --kerberoscc <credential-cache> ] [--secondary <sid>]

For example:

hdboost -a --device [email protected]:kb1 --secondary 2dd56c13-8c13-42a2-8ce7-16a8b7836adc -y
Hadoop App Agent Version: 4.5.0.0 Build: 1_SNAPSHOT20170613201136

Enter password:
Enter password again:
ID                                   SID                                  Secondary Device
==================================== ==================================== ========= ======
d3c6dafe-cb01-4103-bdff-94ec20ba5c74 2dd56c13-8c13-42a2-8ce7-16a8b7836adc false     [email protected]:kb1

[email protected]:pc_hdp is configured as target 2.

Note

If Kerberos is enabled, you must add the pathname to the credential cache file.

Procedure

1. To associate a secondary device that contains the replicated backup, modify the device configuration.

Type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path> [--kerberoscc <credential cache>] [--secondary <sid>] [--isSecondary] [-D] [-y]

Note

The --isSecondary option reserves the device and prevents it from being used by another backup configuration. Reserving the device prevents additional unwanted or accidental replication sessions that would occur if the device were shared between backup configurations.

For example:

hdboost -a --device [email protected]:kb1 --secondary 2dd56c13-8c13-42a2-8ce7-16a8b7836adc -y
Hadoop App Agent Version: 4.5.0.0 Build: 1_SNAPSHOT20170613201136

Enter password:
Enter password again:
ID                                   SID                                  Secondary Device
==================================== ==================================== ========= ======
d3c6dafe-cb01-4103-bdff-94ec20ba5c74 2dd56c13-8c13-42a2-8ce7-16a8b7836adc false     [email protected]:kb1

2. To restore the backup from the secondary Data Domain system, type the following command:

hdboost {--restore|-r} -o <object> -p

For example:

hdboost --restore -o hdfs://suse11sp301/test_hdboost -p

Restoring a replicated backup with a device ID override

If a replicated backup exists on a Data Domain system that is not part of the Hadoop application agent configuration, a restore can be performed by adding the Data Domain system to the configuration and specifying it as the location to get the backup for the restore operation.

To restore a backup with a device ID override, complete the following steps.

Note

If Kerberos is enabled, you must add the pathname to the credential cache file.

Procedure

1. To add the Data Domain system to the configuration, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path>

For example:

hdboost --addconfig --device [email protected]:pc_hdp -y
Enter password:
Enter password again:

[email protected]:pc_hdp will get configured as target 3

2. To restore the backup from the new Data Domain system, type the following command:

hdboost {--restore|-r} -o <object> --deviceid <ID>

For example:

hdboost --restore -o hdfs://suse11sp301/test_hdboost --deviceid 3

Restoring a replicated backup with a device override

If a replicated backup exists on a Data Domain system that is not part of the Hadoop application agent configuration, a restore can be performed by adding the Data Domain system to the configuration and specifying the device URI as the location to get the backup for the restore operation.

To restore a backup with a device override, complete the following steps.

Note

If Kerberos is enabled, you must add the pathname to the credential cache file.

Procedure

1. To add the Data Domain system to the configuration, type the following command:

hdboost {--addconfig|-a} --device <user>@<hostname>:<device-path>

For example:

hdboost --addconfig --device [email protected]:pc_hdp -y
Enter password:
Enter password again:
[email protected]:pc_hdp will get configured as target 4

2. To restore the backup from the Data Domain system, type the following command:

hdboost {--restore|-r} -o <object> --device <username>@<hostname>:<device-path>

For example:

hdboost --restore -o hdfs://suse11sp301/test_hdboost --device [email protected]:pc_hdp

List backup configurations

Hadoop application agent can list the backup configuration for a specified device, the master Data Domain system, a specified backup object, or for all objects and devices in the configuration.

To list backup configuration information, specify a device or object for which to list the backup configuration.

Type the following command:

hdboost {--listconfig|-k} {-? | [--deviceid <ID> | --device <username>@<hostname>:<device-path> | --master | -o <object>]}

List backups

Hadoop application agent provides the ability to list the completed backups of a specific backup object, the backups within a specified date range, or all backups on the system.

To list backup information, specify the parameters to list backups.

Type the following:

hdboost {--list|-l} { -? | [-L] [-o <object>] [[[--before|--after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]] --to yyyy[-mm[-dd[.Thh[-mm[-ss]]]]] }

For example:

hdboost --list -o hdfs://suse11sp301/test_hdboost
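The date arguments follow the nested pattern yyyy[-mm[-dd[.hh[-mm[-ss]]]]], in which each finer-grained field is allowed only when every coarser field before it is present. The following Python sketch validates a value against that pattern before it is passed to a command. The names `STAMP` and `parse_stamp` are hypothetical, and the sketch checks only the layout, not whether the month, day, or time values are in range.

```python
import re

# yyyy[-mm[-dd[.hh[-mm[-ss]]]]]: nested optional groups mirror the syntax, so
# a field can only appear if all coarser fields precede it.
STAMP = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:\.(\d{2})(?:-(\d{2})(?:-(\d{2}))?)?)?)?)?$"
)

def parse_stamp(text):
    """Return the fields present in a timestamp argument as a tuple of ints."""
    match = STAMP.match(text)
    if match is None:
        raise ValueError(f"not in yyyy[-mm[-dd[.hh[-mm[-ss]]]]] form: {text!r}")
    return tuple(int(group) for group in match.groups() if group is not None)
```

A value such as `2016-06` yields two fields, while `2016.16` is rejected because the hour appears without a month and day.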

There are short and long options for the list command. The short option is the default. The following is an example of the short version:

Retention Backup URI
========= ============
20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/analytics2/20160414T231646
20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T030854
20160729  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T032332

The following is an example of the long [-L] version:

Snapshot Time   Backup Start    Backup End       Retention Backup URI
=============== =============== ================ ========= ===========
20160414T231646 20160414T231707 20160414T231720  20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/analytics2/20160414T231646
20160415T030854 20160414T231936 20160414T231955  20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T030854
20160415T032332 20160420T010239 20160420T010253  20160729  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T032332
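The long listing carries enough information to compute how long each backup ran. The following Python sketch parses one data row, assuming the five whitespace-separated columns shown above and the `yyyymmddThhmmss` timestamp layout seen in the sample. The function name `parse_long_row` is hypothetical, and rows with a retention of forever are not covered because their representation in the listing is not shown in this guide.

```python
from datetime import datetime

LIST_STAMP = "%Y%m%dT%H%M%S"  # layout inferred from the sample listing

def parse_long_row(row):
    """Split one data row of the long listing into typed fields."""
    # Split on the first four runs of whitespace; the URI keeps the remainder.
    snapshot, start, end, retention, uri = row.split(None, 4)
    started = datetime.strptime(start, LIST_STAMP)
    ended = datetime.strptime(end, LIST_STAMP)
    return {
        "snapshot": datetime.strptime(snapshot, LIST_STAMP),
        "duration_seconds": (ended - started).total_seconds(),
        "retention": datetime.strptime(retention, "%Y%m%d").date(),
        "uri": uri.strip(),
    }
```

Applied to the first sample row, the backup start and end times are 13 seconds apart.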

Search backups

Hadoop application agent provides the ability to search for a specific backup.

To search for a backup, specify the search parameters.

Type the following command:

hdboost {--search|-s} {-? | -o {<backup-URI>| hdfs://<hostname>/<dir>} {--for object |--regex expression} [-L] [-V]}

Note

To display long information about the backup, specify the -L option. The long information includes:

• UID:GID

• Date

• File size

To search for the object in subfolders, specify the -V option.

For example:

hdboost --search -o hdfs://suse11sp301/test_hdboost --for dd_test.txt -V -L

Clean up backups

Hadoop application agent provides the ability to clean up backups with retention times that have lapsed. Expired backups are not deleted automatically.

To clean up expired backups on a system, type the following command:

hdboost {--expire|-e} [--dryrun] [-D] [-y]

To show the results of the operation without cleaning up the expired backups, use the --dryrun option.

For example:

hdboost --expire --dryrun -D -y
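Before an expiry run, a script can estimate which backups have lapsed from the rows of the short listing. The following Python sketch is a rough cross-check only. The function name `expired` is hypothetical, and treating a backup as expired when its retention date is strictly earlier than today is an assumption; the `--dryrun` output remains the authoritative preview.

```python
from datetime import date, datetime

def expired(rows, today):
    """Yield URIs from short-listing rows whose retention date is before today."""
    for row in rows:
        retention, uri = row.split(None, 1)
        # Assumption: a backup counts as expired once its date has passed.
        if datetime.strptime(retention, "%Y%m%d").date() < today:
            yield uri.strip()
```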

Delete backups

You can delete one or more backups. You can specify a specific backup object to delete or you can specify to delete all backups that fall within a specified date range.

Note

This operation deletes a backup even when the retention period is still active.

To delete the specified backup or backups, type the following command:

hdboost {--delete|-d} {-? | [-o <object>] [[[--before|--after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]] ] [-y] } [--dryrun]

For example:

hdboost --delete -o hdfs://suse11sp301/test_hdboost

Delete: 'ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hdfs/suse11sp301/test_hdboost/20160624T170940'
Are you sure (Y/N)? n

Delete: 'ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hdfs/suse11sp301/test_hdboost/20160627T121835'
Are you sure (Y/N)? y

Delete: 'ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hdfs/suse11sp301/test_hdboost/20160627T153051'
Are you sure (Y/N)? n

Refresh the Kerberos credentials cache

After a period of time, the Kerberos credentials cache expires. After expiration, hdboost commands do not work in a Kerberos environment without reinitializing the credentials cache.

To reinitialize the Kerberos credentials cache, type the following command:

kinit hdfs -c <dir>/<kerberos-file-name>

For example:

kinit hdfs -c /opt/emc/dlp/credentials/kerberos_cc_file

Test the connection to the Data Domain system

The Hadoop application agent provides the ability to test the connection between the Hadoop name node and any Data Domain system in the environment. The command may take up to 10 minutes to return a negative response because it tries up to three times to contact the specified Data Domain system.

To test the connection to a Data Domain system, complete the following steps.

Procedure

1. To display the Data Domain systems in the configuration, type the following command:

hdboost --listconfig

For example:

hdboost --listconfig
Hadoop App Agent Version: version number

Directories:

Primary   Secondary
Device ID Device ID maxmaps  Source URI
--------- --------- -------- --------------------

Devices:

ID   Device
---- ------------------------------------------------
1    [email protected]:devpath
2    [email protected]:master

Master Device:

target: 2
secondary-target:

2. To test the connection to a specific Data Domain system, type the following command:

hdboost {--test|-T} {-? | [deviceid] } [-D]

Note

Leave out the target ID to test the connection to all Data Domain systems in the Hadoop application agent configuration.

For example:

hdboost --test 1

hdfs@cloudera1-sn:/root> hdboost --test 1
Hadoop App Agent Version: version number Build: build number

The device [1] is effective.

Change retention dates

Hadoop application agent provides the ability to change the retention dates of one or more backups after those backups are created. There are two types of retention date that can be changed:

• Absolute retention date: Set a new expiration date for a single backup instance, for a range of instances of a single backup object, or for all backups of a single backup object within a specified time range.

• Relative retention date: Add or subtract a specified number of months, days, or years to the retention period for a single backup instance, for a range of instances of a single backup object, or for all backups of a single backup object within a specified time range.

To change the retention date of a backup, complete the following steps.

Procedure

1. To list the backups on the Data Domain system, type the following command:

hdboost -l

2. Change the absolute or relative retention date of a backup.

• To change the absolute retention date, type the following command:

hdboost {--retention|-t} { -? | [-o <object>] [[[--before|--after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]] ] {--until {yyyy-mm-dd|forever} [-y] }

For example:

hdboost --retention -o hdfs://nameservice1/test1 --until 2016-12-31

Update retention: 'ddhcfs://[email protected]:cloudera1_k/data/1/hadoop/hdfs/nameservice1/test1/20160627T160030'.

Are you sure (Y/N)? y

• To change the relative retention date, type the following command:

hdboost {--retention|-t} { -? | [-o <object>] [[[--before|--after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]] ] {+|-}[#y][#m][#d] } [-y] }

For example:

hdboost --retention -o hdfs://nameservice1/test1 --until +6m -y

Update retention: 'ddhcfs://[email protected]:cloudera1_k/data/1/hadoop/hdfs/nameservice1/test1/20160627T160030'.

Erase the backup configuration

Hadoop application agent provides the ability to erase specific backup objects or devices from the configuration as the environment changes.

To erase a backup object from the configuration, specify the backup object to delete.

Type the following command:

hdboost {--eraseconfig|-x} { -? | {-o {<object>} | {--deviceid <ID> | --device [<username>@]<hostname>:device-path}} [-y]}

For example:

hdfs@hadoop1-sn:/root> hdboost --eraseconfig -o hdfs://nameservice1/test1
Hadoop App Agent Version: version number Build: build number

Restore the configuration

Hadoop application agent provides the ability to restore the configuration in case of an outage or corruption of the configuration file.

By default, the most recent configuration backup is restored. To restore an earlier configuration, specify the timestamp of its backup.

Procedure

1. To restore the Hadoop application agent configuration, type the following command:

hdboost {--restore|-r} --config {latest | yyyy-mm-dd.hh-mm-ss}

For example:

hdboost --restore --config 2016-06-29.16-13-43

[hdfs@dh-hadoop1-sn logs]$ hdboost --restore --config 2016-06-29.16-13-43
Hadoop App Agent Version: version number Build: build number

Are you sure (Y/N)? y
[hdfs@dh-hadoop1-sn logs]

To verify the restored configuration, type the hdboost --listconfig command.

2. In a non-Kerberos environment, the credential file for backup devices must be re-created. To rebuild the credential file, perform the following steps:

a. If the /opt/emc/dlp/credentials/dlp_cfg.jceks credential file exists, remove the file.

b. To retrieve a list of every device that was previously connected, type the following command:

hdboost -k --device all

c. For each device in the list, to add the device to the restored configuration file, type the following command:

hdboost -a --device <user>@<hostname>:<device-path>

While adding a device, a prompt appears that asks you to input the password for the device. The password that is specified is saved into the new keystore file.
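
When many devices must be re-added, it can help to generate the hdboost -a --device commands from a hand-maintained list rather than typing each one. The following Python sketch assumes that you have copied the device entries into a list of strings in the [user@]hostname:device-path form; it does not parse the output of hdboost -k --device all, whose format is not described here. The function names and the example host name are hypothetical.

```python
def parse_device(spec):
    """Split '[user@]hostname:device-path' into (user, hostname, device_path)."""
    user, at_sign, rest = spec.rpartition("@")
    if not at_sign:
        user = None  # no user name was given
    host, colon, path = rest.partition(":")
    if not colon or not host or not path:
        raise ValueError(f"expected [user@]hostname:device-path, got {spec!r}")
    return user, host, path


def add_device_commands(specs):
    """Build one 'hdboost -a --device ...' argument list per device entry."""
    commands = []
    for spec in specs:
        user, host, path = parse_device(spec)
        if not user:
            # Step c uses the <user>@<hostname>:<device-path> form.
            raise ValueError(f"a user name is required to re-add {spec!r}")
        commands.append(["hdboost", "-a", "--device", f"{user}@{host}:{path}"])
    return commands


if __name__ == "__main__":
    # Hypothetical device entry, for illustration only.
    for argv in add_device_commands(["dd-user@ddr-hadoop.example.com:master"]):
        print(" ".join(argv))
```

The sketch only prints the commands. Each command still prompts for the device password when you run it, as described in the preceding step.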

Display the software version

To display the Hadoop application agent software version, type the following command:

hdboost --version

For example:

hdboost --version
Hadoop App Agent Version: version number Build: build number

hdfs@hadoop1-sn:/root>

Disaster recovery overview

In situations where the configuration and credentials are lost, the hostname and credentials for the master Data Domain system, or a replica of the master Data Domain system, can be used to retrieve lost configuration and credentials data.

This section describes how to retrieve information if a namenode from an existing cluster is lost, or if the master Data Domain system is lost.

Restore data from a lost namenode

To address a situation where a namenode from an existing cluster has failed, perform the following steps.

Procedure

1. Reconfigure a new namenode, and then attach it to the existing cluster, or configure a new cluster in Hortonworks or Cloudera.

2. On the new namenode, install HDBoost.

3. To restore the last known copy of the configuration, type the following:

hdboost --restore --config latest --device dd-user@ddr-hadoop.emc.com:master

4. To list the configuration devices, type the following:

hdboost -k --device all

5. If the environment does not use Kerberos, for each device, to regenerate the credentials, type the following command:

hdboost -a --device <user>@<hostname>:<device-path>

6. To list all configured backups, type the following command:

hdboost --listconfig

7. To restore each backed up directory from the previous list, type the following command:

hdboost --restore -o [hdfs://<hostname>/<dir> | hbase://<hostname>/<table-name>]

Restore data from a lost master Data Domain system

To address a situation where the master Data Domain system has failed, perform the following steps.

Procedure

1. Commission a new master Data Domain system.

Note

If a replica of the failed master Data Domain system exists, to copy the data from the replica to the new master Data Domain system, use the Data Domain replication feature.

2. If the new master Data Domain system or replica is not already in the configuration file, add the system to the file. Type the following:

hdboost --addconfig --device [email protected]:master

3. Note the number of the Data Domain system, and then configure it to be the master. Type the following:

hdboost --addconfig --master --deviceid <id> [ --secondary <optional replicate DD> ]

CHAPTER 6

Backup and Recovery Application Agent

This chapter includes the following topics:

l Backup and Recovery application agent overview

l Backup and Recovery application agent capabilities

l Software requirements

l Install BoostFS

l Passwordless SSH configuration

l Backup and Recovery application agent installation overview

l Configure synchronized backups with HDBoost and BRBoost

l Create synchronized backups

l Restore the databases

l Disaster recovery

l Disaster recovery scenarios

Backup and Recovery application agent overview

Backup and Recovery application agent provides backup and recovery of Hive metadata. To send and retrieve data to and from Data Domain systems, Backup and Recovery application agent uses Boost File System (BoostFS).

HDBoost first uses BRBoost to back up the Hive metadata, and then backs up either HDFS or HBase.

Backup and Recovery application agent capabilities

Backup and Recovery application agent (BRBoost) provides a CLI for the following operations:

l Configuring

l Perform backups

l Enable restores of backups

l List backups that reside on the Data Domain system

l Delete backups

l Expire backups

l Create activity logs

Software requirements

The following software is required to use the BRBoost feature:

l BoostFS

l Data Domain Operating System (DDOS)

l Hadoop-supported Hive MySQL metastore database that supports mysqldump

Note

To view the supported versions of BoostFS and DDOS, go to: http://compatibilityguide.emc.com:8080/CompGuideApp/

Install BoostFS

On the metastore database client host, install, and then configure the BoostFS package.

BRBoost only requires that the users running BRBoost have read-and-write access to the BoostFS-created mounts.

Procedure

1. Install the BoostFS RPM package that is bundled with BRBoost.

To download the RPM package, go to https://support.emc.com/downloads/.

2. Create the mount point as a root user for mounting the Data Domain system, and then set the following permissions:

mkdir /mnt/boostfs

3. Using Lockbox or Kerberos, configure the BoostFS credentials:

l RSA Lockbox

n RSA Lockbox is the default password manager.

n To set up the lockbox, type the boostfs lockbox set command, for example:

/opt/emc/boostfs/bin/boostfs lockbox set -d dd9500.company.com -u mysql -s su1 /mnt/boostfs

For more information about BoostFS commands, see the BoostFS Configuration Guide.

l Kerberos

n To set up Kerberos authentication, type the boostfs kerberos set command, for example:

/opt/emc/boostfs/bin/boostfs kerberos set -u admin -s su1 --kerberos-realm realm --kerberos-username mysql

n For more information about BoostFS and Kerberos authentication, see the BoostFS Configuration Guide.

4. To mount the Data Domain system, type the boostfs mount command, for example:

/opt/emc/boostfs/bin/boostfs mount -d dd9500.company.com -s su1 -o allow-others=true /mnt/boostfs

MySQL backup script

If you want to create scripts to back up MySQL instances and databases, verify that the script can take the %destdir% label, and executes successfully for the users who are to perform backups.

The %destdir% label is a placeholder that BRBoost uses to indicate where backup output data is placed. During backup, %destdir% is converted to the final backup path on the Data Domain mount path, or the local file path before the command is executed.

For security purposes, the password that is associated with the MySQL backup user is stored in a my.cnf file under a specific OS user account.
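
To test a backup command before you register it with BRBoost, you can reproduce the %destdir% substitution yourself. The following Python sketch is an illustration only: the function name and the sample destination path are hypothetical, and the shell quoting of the path is an assumption, because this guide does not describe how BRBoost performs the conversion internally.

```python
import shlex

PLACEHOLDER = "%destdir%"


def expand_destdir(command, dest_dir):
    """Replace every %destdir% label in a backup command with a real path.

    The path is shell-quoted so that a directory name containing spaces
    still forms one shell word (an assumption, not documented behavior).
    """
    if PLACEHOLDER not in command:
        raise ValueError("the backup command must contain the %destdir% label")
    return command.replace(PLACEHOLDER, shlex.quote(dest_dir))


if __name__ == "__main__":
    template = (
        "/usr/bin/mysqldump -u mysql --all-databases "
        "> %destdir%/whole_instance_backup.sql"
    )
    # Hypothetical destination directory on the BoostFS mount.
    print(expand_destdir(template, "/mnt/boostfs/hive_backup"))
```

Running the expanded command as the backup user is a quick way to confirm that the script accepts the label and that the user has write access to the mount.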

Passwordless SSH configuration

To allow HDBoost backup users to connect to the metastore database client host, and then run BRBoost, configure passwordless SSH.

Verify the SSH configuration

After you configure passwordless SSH, verify the configuration.

From the namenode, type the following command:

ssh mysql@<metastore client host name> "ls -l /"

Note

If passwordless SSH is configured correctly, you are not prompted for a password for the metastore client host. If you are prompted for a password, verify the passwordless SSH connection.
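
The same check can be scripted so that it fails instead of waiting at a password prompt. The following Python sketch is a suggestion rather than part of the product: it relies on the OpenSSH client option BatchMode=yes, which disables interactive password prompts, and the function names and the default mysql user are illustrative assumptions.

```python
import subprocess


def ssh_check_command(host, user="mysql"):
    """Build an ssh command that exits with an error instead of prompting."""
    # BatchMode=yes tells the OpenSSH client never to ask for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", "ls -l /"]


def passwordless_ssh_works(host, user="mysql"):
    """Return True when the remote command runs without any prompt."""
    result = subprocess.run(
        ssh_check_command(host, user),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

Call passwordless_ssh_works() from the namenode with the metastore client host name. A False result means that key-based login is not working for that user and host.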

Backup and Recovery application agent installation overview

Install the Backup and Recovery application agent from the root user account on the Hadoop node or client host where the metastore database client resides.

Type the following command:

rpm -i emcbrboost-1.0.0.1-1.x86_64.rpm

So that missing dependencies are reported and can be installed, it is recommended that you avoid using the --nodeps option.

Download the software

Download the Hadoop Application Agent and Backup and Recovery Application Agent software files from the Support website.

Procedure

1. On UNIX or Linux, log in as the root user.

2. In a local file system, create a temporary installation download directory with sufficient free disk space to contain both the downloaded software package and the software installation files that are extracted from the package. On UNIX or Linux, type:

mkdir /usr/extract_hdboost

3. Go to https://support.emc.com.

4. Browse to the Downloads page, and then search for Hadoop Application Agent.

5. Download the Hadoop Application Agent software file to the temporary installation download directory.

6. Extract the installation files from the downloaded software package:

a. On UNIX or Linux, to uncompress the downloaded package, type the gunzip command with the file_name.tar.gz name for the specific download file name:

gunzip emchdappagent-4.5.0.x-1-linux_x86_64.tar.gz

b. Extract the software from the uncompressed, tarred file:

tar -xvpBf emchdappagent-4.5.0.x-1-linux_x86_64.tar

The extraction lists the distribution software files on the screen.

Remain in the directory for the installation.

Install and configure BRBoost

Procedure

1. Go to https://support.emc.com/downloads/, download the BRBoost package, and then install it:

rpm -i emcbrboost-1.0.0.1-1.x86_64.rpm

2. To add a BoostFS mount point, type the following command:

brboost -a --device ddboostfs:///<BoostFS mount point>

3. To test the configured mount point, type the following command:

brboost -T

4. Add the backup command, and then associate that command with the device that you designated when you added the mount point. Type the following command:

brboost -a -n HiveBackup -o "/space1/scripts/backup_mysql.sh %destdir% root" --<deviceid>

The backup command (-o) can consist of an actual executable path with command options, or a path to a shell script with command-line options. Each backup command should include the %destdir% label for output paths, as shown in the following example:

/usr/bin/mysqldump -u mysql --all-databases > %destdir%/whole_instance_backup.sql

BRBoost uses the %destdir% label as a placeholder to indicate where backup output data is placed. During backup, %destdir% is converted to the final backup path on the DD mount path or the local file path before the command is executed.

In the following example, %destdir% is passed into the backup_mysql.sh script as an input parameter:

/<path to script>/backup_mysql.sh %destdir%

5. To test the backup, run a backup, for example:

brboost -b -n HiveBackup

Configure synchronized backups with HDBoost and BRBoost

Before you begin

Perform the following steps:

1. On the namenode, install HDBoost.

2. On the Hive database server, install BRBoost.

3. Configure passwordless SSH.

Procedure

1. To configure BRBoost with HDBoost, type the following command:

hdboost -a --hive brboost://<user>@hive.host.com:HiveBackup

2. To associate the HDBoost backup configuration with a BRBoost backup, type the following command:

hdboost -a -n <cfg-name> -o hdfs://dh-hdp-ph-nn1/data --deviceid <device ID> --maps 9 --hiveid <device-ID> -y

3. To configure an HDFS path to the backup and the associated BRBoost backup, type the following command:

hdboost -a -o hdfs://dh-hdp-ph-nn1/data --deviceid 1 --maps 9 --hiveid 1 -y

Create synchronized backups

When you have completed the installation and configuration, you can create a synchronized backup of the metastore database and Hadoop file system (HDFS).

To create synchronized backups, type the following command:

hdboost -b -n hdfs backup -y

Restore the databases

Although you can restore the BRBoost database and Hadoop independently, restore the Hive metastore database first because of the dependency between the Hive database and the data directory in the Hadoop file system (HDFS).

Procedure

1. To determine which MySQL backup and associated HDBoost backup to restore, type the following command:

hdboost -l

2. Shut down Hive and all other services that use the metastore database.

You can shut down Hive through the management console of the Hadoop installation you are using.

3. To restore the Hive metastore database, perform the following steps:

a. To locate the backup, type the following command:

brboost -l

b. To restore the database, use database-specific tools.

4. Restore the HDBoost backup using normal workflows.

See "Restore overview" for more information.

5. Restart HDFS services.

6. Restart Hive and other Hadoop components that employ the metastore database.

Disaster recovery

For a comprehensive disaster recovery plan, ensure that you can reconstruct the computing environment and all the associated files.

In all cases, you must know the hostname and credentials of either the master Data Domain system or a replica of that system so that you can retrieve the basic configuration and master index.

Disaster recovery scenarios

After you have retrieved the basic configuration and master index information, you can take additional steps to re-create the lost environment.

The following sections address three disaster recovery scenarios:

l Loss of the master Data Domain system

l Loss of the Hive Client

l Failure of the Hadoop database and the Hive server

In each scenario, assume the following:

l BoostFS uses the following master uniform resource identifier (URI):

boostfs:///boostfs-mountpoint-master

l The /mnt/boostfs/master mount point is mounted to the following:

dd-user:secret@ddr-hadoop.<domain-name>.com:master

Loss of the master Data Domain system

The master Data Domain system is used to store configuration and metadata about BRBoost backups.

The following procedure addresses a scenario in which the master Data Domain system that is configured on the Hive client is lost, but the Hive client backup system remains functional.

Procedure

1. Commission a new master Data Domain system.

If a replica exists, you can replicate existing data to the new master system outside of Hadoop application agent.

2. To mount the master device using BoostFS, perform one of the following:

l boostfs lockbox set -u dd-user -d ddr.emc.com -s master

l boostfs mount -d ddr.emc.com -s master /mnt/boostfs/master

3. If the new master or replica is not present in the configuration file, to add the master or replica, type the following command:

brboost -a --device boostfs:///mnt/boostfs/master

4. Perform a backup operation.

Do not skip the backup configuration options.

Loss of the Hive client

If the Hive client system is lost, you may have to perform one or more reinstallation and configuration operations.

These operations include any or all of the following:

l Reinstall the operating system.

l Reinstall the database or application software.

l Configure the database server or instance.

l Reinstall and configure BoostFS.

l Reinstall Backup and Recovery Application Agent (BRBoost).

Note

See the documentation for the applicable operating system and other third-party software for installation and configuration procedures.

Procedure

1. Based on the type of disaster that occurred, reconfigure the Hive client.

2. If you are setting up a new Data Domain system, reinstall BoostFS.

3. Mount the master mount point (/mnt/boostfs/master).

4. To install and configure BRBoost with the master mount point, type one of the following commands:

l brboost -a --device boostfs:///mnt/boostfs/master

l brboost -a --master --deviceid 1

This option is based on the assumption that the only device is the one configured in the previous step.

5. To restore the last known copy of the BRBoost configuration file, type the following command:

brboost --restore --config latest --deviceid 1

6. Based on the information that is listed in the configuration file, re-configure all BoostFS mount points. Type the following command:

brboost -k --device all

Based on the information in the previous steps, create, and then mount brboost mount points. Ensure that all mounts that are listed in the configuration are created with the same path and user information that is provided in the output of the brboost -k --device all command.

7. To verify that all the re-configured mount points are available, type the brboost -T command.

8. On the Hadoop console, if required, shut down any Hive-related services that are still running.

9. Prepare the database instance for recovery.

MySQL may require you to complete additional preparation steps before you perform a restore operation. Consult the MySQL documentation relevant to the environment for details.

10. To list the backup location of the associated backup configuration to be restored, type the following command:

brboost -n Hive_backup_full -y

11. Perform MySQL database restore operations using the MySQL backup found in the directory listing.

12. Bring the MySQL database into full operational mode.

13. Restart any dependent Hive services on the Hadoop console.

Synchronized disaster recovery

Although you can restore the BRBoost database and Hadoop independently, restore the Hive metastore database first because of the dependency between the Hive database and the data directory in the Hadoop file system (HDFS).

See "Restore the databases" for more information.

CHAPTER 7

Troubleshooting

This chapter includes the following topics:

l Troubleshooting overview

l Log information

Troubleshooting overview

To diagnose and resolve issues with Hadoop application agent, take the following actions:

l Verify that the Hadoop application agent environment is running supported versions of DD OS and Hadoop.

l To identify and resolve issues with the cluster services, use the management console of the Hadoop distribution (Cloudera or Hortonworks).

l Verify that basic Hadoop operations function as expected.

l Verify the status of Kerberos authentication in the Hadoop application agent environment.

l To generate detailed debug logs, specify the -D option with Hadoop application agent CLI operations.

l To preview the results of expire and delete operations without actually performing them, specify the --dryrun option with the commands.

l To test connectivity to the Data Domain system, type the hdboost --test command.

Log information

For more information when troubleshooting Hadoop application agent, check the following logs:

l Hadoop job history

l Hadoop application agent logs in: /var/opt/dlp/logs/debug

The following Hadoop application agent logs are available:

n Hadoop application agent binary logs

n DD BoostFS logs

n DDHCFS logs (one per node in the Hadoop cluster)

All normal Hadoop application agent activity, such as start, stop, success, and failure for Hadoop application agent operations, is logged as informational entries in:

/var/opt/dlp/logs/hdboost.<yyyymmdd>.<hhmmss>.<pid>.log

where:

l <yyyymmdd> is the date when the log was created.

l <hhmmss> is the time when the log was opened.

l <pid> is the Hadoop application agent process ID.
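
Because the creation date, time, and process ID are encoded in the file name, a script can find the log for a given run without opening each file. The following Python sketch shows one possible way to decode the name; the function name and the sample file name are hypothetical.

```python
import re
from datetime import datetime

# Matches hdboost.<yyyymmdd>.<hhmmss>.<pid>.log
_LOG_NAME = re.compile(r"^hdboost\.(\d{8})\.(\d{6})\.(\d+)\.log$")


def parse_log_name(name):
    """Return (opened_at, pid) decoded from an hdboost log file name."""
    match = _LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not an hdboost log file name: {name!r}")
    date_part, time_part, pid = match.groups()
    opened_at = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return opened_at, int(pid)


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(parse_log_name("hdboost.20160627.160030.12345.log"))
```

Sorting the decoded timestamps of the files in /var/opt/dlp/logs gives the logs in the order in which the operations were started.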

CHAPTER 8

Command Reference

This chapter includes the following topics:

l hdboost command overview

l brboost command overview

hdboost command overview

Be aware of the following hdboost options available with the Hadoop application agent CLI.

Help

For general Hadoop application agent help, type the hdboost {--help|-h} command.

For help with a specific command, type the command with the -? option.

Command options

The following command options are valid across multiple Hadoop application agent commands.

--kerberoscc

Specify the Kerberos credential cache file.

-D

Debug, send detailed information to the logs stored in /var/opt/dlp/logs.

-y

To allow operations to be scripted, assume yes to all user prompts.

-F or --format

Specify the format of the output. The supported format is json.

hdboost --addconfig

The hdboost --addconfig command allows users to add Data Domain systems to the Hadoop application agent, specify the master Data Domain system, and create primary and secondary system relationships between Data Domain systems.

hdboost {--addconfig|-a} -n <name> -o {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>} --deviceid <id> [--maps <maps>] [--until {[#y][#m][#d] | yyyy-mm-dd | forever}] [-options <distcp options>] [-P] [-N] [-D] [-y]

This command configures the configuration file backup, a directory to be backed up, an HBase table to be backed up, or a target Data Domain appliance to hold the backups. This command may be used to add Data Domain systems that contain Hadoop application agent backup data from another cluster to the configuration. To add a Data Domain system, input the Data Domain credentials, and then link it to a restore target directory.

hdboost {--addconfig|-a} --device user@hostname:device-path [--kerberoscc <credential-cache>] [--secondary <sid>] [--isSecondary] [-D] [-y]

This command configures a backup host. Hadoop application agent automatically assigns the configured Data Domain system the GUID, and appends it to the backup-devices section of the configuration file. If there is an existing entry for this Data Domain system, it is overwritten.

If this Data Domain system is the first to be configured, it is added to the master device section of the configuration file and becomes the default target where configuration files are backed up.

hdboost {--addconfig|-a} --master --deviceid <id> [--secondary <sid>] [-D] [-y]

This command specifies a configured device as the master device to hold backup metadata. By default, the first configured device is designated as the master device.

hdboost {--addconfig|-a} -o {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>} -n <name> --deviceid <id> [--maps <maps>] [-D] [-y]

This command configures a source HDFS directory or HBase table to be backed up to the target identified by the deviceid number, and assigns it the specified backup configuration name. If the assigned name exists, the configuration is updated with the invoked parameters.

If the source object is specified with hdfs://hadoop.emc.com/<directory>, an HDFS file system backup is assumed. If a parent directory of the source has already been configured for backup, an error is logged and is returned to the user.

If the source object is specified with hbase://hadoop.emc.com/<tablename>, an HBase backup is configured. Anything else results in an error being logged and returned to the user. If there is an existing configuration for the source object, it is overwritten.
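
The scheme-based rule described above can be checked in a wrapper script before hdboost is called, so that a mistyped source object is caught early. The following Python sketch is an illustration only; the function name and return values are hypothetical, and hdboost performs its own validation regardless.

```python
from urllib.parse import urlsplit


def backup_type(source):
    """Classify a source object as 'hdfs' or 'hbase' by its URI scheme."""
    parts = urlsplit(source)
    # Both forms need a host name and a directory or table name.
    if not parts.netloc or not parts.path.strip("/"):
        raise ValueError(f"expected <scheme>://<hostname>/<name>, got {source!r}")
    if parts.scheme == "hdfs":
        return "hdfs"
    if parts.scheme == "hbase":
        return "hbase"
    raise ValueError(f"unsupported source object: {source!r}")
```

For example, hdfs://nameservice1/test1 is classified as an HDFS file system backup, while a local path such as /data/test1 is rejected.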

The following list includes information about the options that can be specified with the hdboost --addconfig command.

--deviceid

Specifies which Data Domain configuration is used to perform the backup. If there is no configured Data Domain corresponding to this number, an error is logged and returned to the user.

--isSecondary

The --isSecondary option reserves the device and prevents it from being used by another backup configuration. Reserving the device prevents additional unwanted or accidental replication sessions from occurring in situations where devices are shared between backup configurations.

--kerberoscc

If using Kerberos, only a username is specified. If the user does not supply the Kerberos credential cache location with the --kerberoscc option, they are prompted for it.

--maps

Specifies the maximum number of map jobs to run, which limits the number of DD connections and reduces load on the Hadoop cluster at the expense of increased backup times.

--secondary

(Optional) Specify to indicate a replication destination. If there is no configured DD corresponding to this number, an error is logged and returned to the user.

hdboost --backup

The hdboost --backup command initiates Hadoop application agent backup operations for Hadoop objects and HBase tables.

hdboost {--backup|-b} { -? | {--until {[#y][#m][#d] | yyyy-mm-dd | forever}} [-n <name> | [-o {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>} --deviceid <id>] [--maps <maps>] [--kerberoscc <credential-cache>] [-options <distcp options>] [-N] [-P]] [-D] [-y]

This command snapshots and backs up the specified directory or HBase table.

Note

When you restore an HBase snapshot, any existing snapshot data with the same name is overwritten.

This command supports two ways of performing a backup:

l Using the provided backup configuration name to back up the source.

l Backing up the source that is specified by -o to the backup device specified by --deviceid.

If using Kerberos, the optional --kerberoscc option overrides the credential cache file set in the backup device configuration. If the user specifies --kerberoscc while Kerberos is not configured, an error is logged and returned to the user.

The optional --until parameter specifies retention time. Retention time relative to today's date is specified in the format:

l y: The number of years to retain the backup.

l m: The number of months to retain the backup.

l d: The number of days to retain the backup.

It is possible to specify #y, #m, and #d on the same line, for example, 1y3m2d would yield a retention time of 1 year, 3 months, and 2 days from the current local date.

A retention time in the yyyy-mm-dd format retains the backup until the specified absolute date. If either the date or path that is specified is invalid, an error is returned to the user. If no retention time is specified, 3 months from the current date is assumed. The backup is retained until the DD administrator manually deletes it, it is explicitly deleted with a --delete invocation, or an --expire operation is run after the retention date has passed. Retention time has a granularity of one day: the time is assumed to be 00:00:00 (midnight) on the specified date.
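
To see which calendar date a given --until value corresponds to, the rules above can be worked through in a few lines of Python. This is a sketch under stated assumptions, not the product's implementation: the function names are hypothetical, months are applied before days, and a day that does not exist in the target month (for example, the 31st) is clamped to the last day of that month. This guide does not specify either of those two details.

```python
import calendar
import re
from datetime import date, timedelta

# Relative form [#y][#m][#d], for example 1y3m2d.
_RELATIVE = re.compile(r"^(?:(\d+)y)?(?:(\d+)m)?(?:(\d+)d)?$")


def add_months(start, months):
    """Add calendar months, clamping the day to the target month's length."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def retention_date(until, today):
    """Resolve an --until value to a date; returns None for 'forever'."""
    if until is None:
        return add_months(today, 3)  # default of 3 months stated in the guide
    if until == "forever":
        return None
    match = _RELATIVE.match(until)
    if match and any(match.groups()):
        years, months, days = (int(group or 0) for group in match.groups())
        # Assumption: years and months are applied first, then days.
        return add_months(today, years * 12 + months) + timedelta(days=days)
    year, month, day = until.split("-")  # absolute yyyy-mm-dd form
    return date(int(year), int(month), int(day))
```

With a current date of 2016-06-27, the value 1y3m2d resolves to 2017-09-29 under these assumptions, and omitting --until resolves to 2016-09-27.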

If no source object is specified, all configured file system and HBase backups are performed in sequence. Each backed up object is assigned a unique backup URI. This operation assumes that the user has configured the backups so that there is enough available capacity to hold them all.

If the source object is specified with hdfs://<hostname>/<directory>, a file system backup is performed. If a parent directory of the source has been configured for backup, an error is logged and returned to the user because HDFS does not allow snapshot-capable directories to exist within other snapshot-capable directories.

Note

It is possible to back up all HDFS user snapshots by specifying hdfs://<hostname>/<directory>/.snapshot/ as the source. Each user snapshot is considered a separate backup.

If the source object is specified with hbase://<hostname>/<tablename>, an HBase table backup is performed. Anything else results in an error being logged and returned to the user. If there is an existing configuration for the source object, it is overwritten.

If the source object is specified as hbase://<hostname>/.hbase-snapshot/<snapshot_name>, the snapshot is exported to the Data Domain system. As with HDFS user snapshots, HBase user snapshots are not deleted after backup.

Note

It is possible to back up all HBase user snapshots by specifying hbase://<hostname>/.hbase-snapshot/ as the source. Each user snapshot serves as a separate backup.

For either the file system or HBase backup case, the backup is assigned a unique backup URI. If an invocation error occurs, no backup URI is assigned.
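The source URI forms described above can be summarized with a small Python classifier. This is a sketch for illustration; the function name and the returned labels are hypothetical, and the host name in the usage note is a placeholder.

```python
from urllib.parse import urlparse


def classify_source(uri):
    """Return a label for the kind of backup that a source URI selects."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("hdfs", "hbase") or not parsed.netloc:
        raise ValueError(f"unsupported source URI: {uri!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme == "hdfs":
        # hdfs://<hostname>/<directory>/.snapshot/ selects all user snapshots.
        if parts and parts[-1] == ".snapshot":
            return "hdfs-all-snapshots"
        return "hdfs-directory"
    if parts == [".hbase-snapshot"]:
        return "hbase-all-snapshots"
    if len(parts) == 2 and parts[0] == ".hbase-snapshot":
        return "hbase-snapshot"
    if len(parts) == 1:
        return "hbase-table"
    raise ValueError(f"unsupported source URI: {uri!r}")
```

For instance, `classify_source("hbase://nn1.example.com/.hbase-snapshot/")` returns `"hbase-all-snapshots"`, while any other scheme raises an error, mirroring the "anything else results in an error" rule.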

hdboost --delete
The hdboost --delete command allows the user to delete one or more backups that are stored on the Data Domain system.

hdboost {--delete|-d} { -? | [-o { <backup-URI> | hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>}] [ [--before | --after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]]] ] [--kerberoscc <credential-cache>] [--dryrun] [-D] [-y] [-F json] }

This command removes the backup target matching the specified URI, regardless of retention date, unless a retention lock has been set outside of Hadoop application agent by the Data Domain administrator.

If the user specifies no options, then all backups are deleted.

The -o option limits the deletion to a particular backed up file system or HBase object.

All deleted backups are logged at warning level.

If using Kerberos, the optional --kerberoscc option overrides the credential cache file set in the backup device configuration. If the user specifies --kerberoscc while Kerberos is not configured, an error is logged and returned to the user.

The --dryrun option prints the backups that would be deleted, so users can determine exactly what will be removed. If the --dryrun and -y options are specified together, Hadoop application agent still prompts the user for each phantom action, exactly as if a real delete was about to happen.

If the object does not exist, an error is logged and returned to the user.

The --before and --after options allow users to specify a baseline time and date, and delete all backups before or after that date.

The --from and --to options allow users to specify a date range, and delete all backups that fall within that range.
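The date filters can be pictured as a selection over backup timestamps, which is also what a --dryrun would report. The following Python sketch is illustrative only: the function name, the (timestamp, URI) tuples, and the boundary handling (strict for --before and --after, inclusive for --from and --to) are assumptions, not documented hdboost behavior.

```python
from datetime import datetime


def select_backups(backups, before=None, after=None, start=None, end=None):
    """Return the URIs of backups that match the given date filters.

    backups is an iterable of (snapshot_time, uri) tuples. With no filters,
    every backup is selected, matching "no options deletes all backups".
    """
    selected = []
    for taken, uri in backups:
        if before is not None and not taken < before:
            continue
        if after is not None and not taken > after:
            continue
        if start is not None and end is not None and not (start <= taken <= end):
            continue
        selected.append(uri)
    return selected
```

Printing the returned list without acting on it corresponds to a dry run; passing it to a delete step corresponds to the real operation.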

hdboost --eraseconfig
The hdboost --eraseconfig command erases configuration information from the Hadoop application agent configuration file.

hdboost {--eraseconfig|-x} -n <name> [-y]

This command removes the named backup configuration. The configuration for the Data Domain system that is used to back up the object is left untouched because it may be used for other backups. You are prompted to confirm the intent to erase the configuration unless the -y option is specified.

hdboost {--eraseconfig|-x} -o {hdfs://<hostname>/<dir> | hbase://<hostname>/<table-name>} [-D] [-y]

This command removes the HBase table or Hadoop directory object from the backup configuration. The configuration for the Data Domain system that is used to back up the object does not change, since it may be used for other backups.

hdboost {--eraseconfig|-x} --deviceid <id> [-D] [-y]

This command removes the configuration and credentials for the specified backup appliance. If the ID number specified is invalid, or if the device is still being used as the target of any backup operation, the operation fails, and an error is logged and returned to the user. The user must first erase or reconfigure the backup configurations that use it.

hdboost {--eraseconfig|-x} --device user@hostname:device-path [-D] [-y]

This command removes the configuration and credentials for the specified backup appliance. If the Data Domain system does not exist, or if it is still used as the target of any backup operation, the operation fails.

hdboost --expire
The hdboost --expire command initiates a cleanup operation that deletes all backups with lapsed retention periods.

hdboost {--expire|-e} { -? | [--kerberoscc <credential-cache>] [--dryrun] [-D] [-y] [-F json] }

This command runs a deletion scan that removes all backups older than the retention date specified in their metadata. Backups with a retention time of forever are not deleted. Backups with a retention lock set by the Data Domain administrator are not deleted until the retention lock is cleared.

The --dryrun option prints the backups that would be deleted, so users can determine exactly what will be removed. If the --dryrun and -y options are specified together, Hadoop application agent still prompts the user for each backup to be deleted, exactly as if a real delete was about to happen. If -F json is specified, the output is returned in JSON format.
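The expiry rules can be modeled in a few lines of Python. This is a sketch under stated assumptions: the `Backup` class and its fields are hypothetical stand-ins for the backup metadata, a retention of `None` models forever, and a backup is treated as expired only when its retention date is strictly earlier than today.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Backup:
    uri: str
    retention: Optional[date]  # None models a retention time of "forever"
    locked: bool = False       # retention lock set by the DD administrator


def expired(backups, today):
    """Return the URIs that an expire scan would remove."""
    return [
        b.uri
        for b in backups
        if b.retention is not None and not b.locked and b.retention < today
    ]
```

A locked backup becomes eligible again on a later scan once the lock is cleared, because the scan only looks at the current `locked` value.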

hdboost --job
This command allows you to list current jobs, view their states, and kill or purge jobs.

hdboost {--job|-j} {--status | --kill | --purge} [-D]

This command lists all the jobs in the /opt/emc/dlp/jobs/ path. There are two types of jobs, backup and restore. The state of a job can be aborted, failed, running, successful, or unknown. This command also provides options to list jobs by ID, type, and state. In addition, the command provides the option to cancel one or more jobs using the HDBoost-assigned job ID.

hdfs@bu-cloudera-nn:/opt/emc/dlp> hdboost -j -?
Hadoop App Agent Version: 4.5 Build: 1_SNAPSHOT_LOCAL_BUILD

hdboost --job|-j --status [--jobid <ID> | --type backup|restore | --state aborted|failed|running|successful|unknown] [--format|-F <format>]
    The command displays job status information. Optionally jobs can be listed by a specific ID, by type, or by state.

hdboost --job|-j --kill --jobid <ID> [--format|-F <format>]
    The command enables the termination of a specific running job by the job ID.

hdboost --job|-j --purge [--jobid <ID>] [--before <now | yyyy[-mm[-dd[.hh[-mm[-ss]]]]] > ] [--dryrun]
    The command enables the deletion of job records from the jobs directory. By default, the jobs that are older than two weeks are deleted unless a job ID or before time is provided.

Options:
    --format: Specify the format of the output. Supported format is 'json'.
    --status: List the status of the jobs.
    --kill: Cancel a job with the specified job ID.
    --purge: Delete the job records.
    --jobid: Job ID as assigned by hdboost.
    --state: List all the jobs in the specified state.
    --type: List all the jobs of the specified type.
    --before: Delete the jobs that are older than the specified time.
    --dryrun: Perform the purge operation without deleting any job records.
    -D: Add debugging information to the log.

hdboost {--job|-j} --status [--jobid <id> | --type backup|restore | --state aborted|failed|running|successful|unknown]
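The purge defaults in the help text can be sketched as follows. This Python snippet is an illustration, not the agent's logic: the function name, the job IDs, and the mapping of job ID to record time are hypothetical, and it assumes that a job ID takes precedence over a --before time.

```python
from datetime import datetime, timedelta


def jobs_to_purge(jobs, now, job_id=None, before=None):
    """Return the job IDs whose records a purge would delete.

    jobs maps a job ID to the time of its record.
    """
    if job_id is not None:
        return [job_id] if job_id in jobs else []
    # Documented default: purge job records older than two weeks.
    cutoff = before if before is not None else now - timedelta(weeks=2)
    return sorted(jid for jid, when in jobs.items() if when < cutoff)
```

Returning the list rather than deleting anything mirrors what --dryrun reports.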

hdboost --kerberos
The hdboost --kerberos command allows users to enable and disable Kerberos authentication.

hdboost {--kerberos|-K} { -? | [--disable|--enable] }

This command disables or enables Kerberos authentication. When Kerberos is disabled, the password from the dlp_cfg.jceks credentials file and the username from the hdboost_cfg.json file are used as the Data Domain credentials. If Kerberos is enabled, dlp_cfg.jceks is ignored.

hdboost --list
The hdboost --list command lists available backups with optional object and date range filtering.

hdboost {--list | -l} [-L] { -? | [-o { <backup-URI> | hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>}] [ [--before | --after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]]] ] [--kerberoscc <credential-cache>] [-D] [-F json] }

If using Kerberos, the optional --kerberoscc option overrides the credential cache file set in the backup device configuration. If the user specifies --kerberoscc while Kerberos is not configured, an error is logged and returned to the user.

The --before and --after options allow users to specify a baseline time and date, and list all backups before or after that date. The --from and --to options allow users to specify a date range, and list all backups that fall within that range.

There are short and long options for the list command. The short version is the default.

The following is an example of the short version:

Retention Backup URI
========= ============
20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/analytics2/20160414T231646
20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T030854

The following is an example of the long version [-L]:

Snapshot Time   Backup Start    Backup End       Retention Backup URI
=============== =============== ================ ========= ===========
20160414T231646 20160414T231707 20160414T231720  20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/analytics2/20160414T231646
20160415T030854 20160414T231936 20160414T231955  20160715  ddhcfs://[email protected]:pc_hdp/data/1/hadoop/hbase/hdboost-build.corp.emc.com/.hbase-snapshot/analytics2-snap/20160415T030854
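When the long listing is processed by a script, each row can be split into its five columns. The following Python sketch assumes the whitespace-separated layout shown in the example above; the function name is hypothetical, and the user and host in the sample URI (`user@dd.example.com`) are placeholders, not values from this guide. The -F json option is likely a more robust choice for scripting, but this guide does not show its output format.

```python
from datetime import datetime

STAMP = "%Y%m%dT%H%M%S"  # for example, 20160414T231646


def parse_long_row(row):
    """Split one row of the long (-L) listing into typed fields."""
    snapshot, start, end, retention, uri = row.split(maxsplit=4)
    return {
        "snapshot": datetime.strptime(snapshot, STAMP),
        "start": datetime.strptime(start, STAMP),
        "end": datetime.strptime(end, STAMP),
        "retention": datetime.strptime(retention, "%Y%m%d").date(),
        "uri": uri,
    }


row = (
    "20160414T231646 20160414T231707 20160414T231720 20160715 "
    "ddhcfs://user@dd.example.com:pc_hdp/data/analytics2/20160414T231646"
)
fields = parse_long_row(row)
duration = fields["end"] - fields["start"]  # how long the backup ran
```

Here `duration` is the difference between the Backup End and Backup Start columns.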

The following command restricts the list output to a single backup specified by the supplied URI:

hdboost {--list | -l} [-L] -o <backup-URI> [--kerberoscc <credential-cache>] [-D]

This command lists only a single backup if it is specified with --from or --to. If an invalid URI is specified, an error is logged and returned to the user.

hdboost --listconfig
The hdboost --listconfig command lists the configuration file entries, optionally filtered to show only those tied to the given backup Data Domain system, source object, or the master Data Domain system.

hdboost {--listconfig|-k} { -? | [--deviceid <id> | --device user@<hostname>:<device-path> | --master | -o {hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>}] }

The following shows an example of the command output:

Primary   Secondary max
Device ID Device ID maps Source URI
========= ========= ==== ====================
1                        hdfs://dlpm-build.corp.emc.com/user/admin
2                        hdfs://dlpm-build.corp.emc.com/test

Devices:

ID   Device
==== ======
1    [email protected]:cloudera1
2    [email protected]:cloudera2
3    [email protected]:pc_master

Master Device:

target: 3
secondary-target:

Env:

DDHCFS_KERBEROS: disabled
HADOOP_CLASSPATH: /opt/emc/dlp/java/ddhcfs.jar
MR2_CLASSPATH: /opt/emc/dlp/java/ddhcfs.jar
JAVA_LIBRARY_PATH: /usr/lib/dlp/lib64
HADOOP_BIN_PATH: /usr/bin
HBASE_CONF_DIR: /etc/hbase/conf.cloudera.hbase

hdboost --restore
The hdboost --restore command restores a Hadoop application agent backup of a Hadoop object or an HBase table from a Data Domain system.

hdboost {--restore|-r} { -? | { --config {latest | yyyy-mm-dd.hh-mm-ss } | -o {<backup-URI> [-S sub-directory] | hdfs://<hostname>/<dir> [--options <distcp-options>] [-S sub-directory] | hbase://<hostname>/<tablename>} } [-R dst-object] [ --deviceid <id> | --device user@hostname:device-path] [--kerberoscc <credential-cache>] [-D] [-p] [-y] }

This command restores Hadoop application agent backups of Hadoop objects or HBase tables, or Hadoop application agent configuration information.

Note

When you restore an HBase snapshot, any existing snapshot data with the same name is overwritten.

If the backup was for a Hadoop directory, the target location must be a directory. If the target location does not currently exist, the target location is created.

The following list includes information about the options that can be specified with the hdboost --restore command.

--config

Restores the configuration and dlp.idx files only. This option is normally used only if part of the configuration is corrupt or in the event of a disaster recovery. The user may specify the latest configuration with the keyword latest, or specify the timestamp of an earlier backup in yyyy-mm-dd.hh-mm-ss format. The latest configuration timestamp is determined by searching metadata/<version>/hadoop/dlp.idx of the supplied device. The timestamp is obtained from the master Data Domain system unless overridden with the --device or --deviceid invocation options.

--device

Overrides the Data Domain system in the configuration file and tries to restore from the Data Domain system that is specified at the command prompt.

--deviceid

Overrides the Data Domain system number in the configuration file, and tries to restore from the specified Data Domain system.

--kerberoscc

If using Kerberos authentication with --device, the logged in user running hdboost is authenticated as the DD user. The optional --kerberoscc option overrides the credential cache file set in the backup device configuration.

--options

Provides special options to the distcp invocation. You can use this option to set the number of maps that are used. For example, --options "-m 10" limits the restore to 10 maps.

-R or --redirect

The -R or --redirect option may be specified to redirect the restore operation to a new destination, which must be a fully qualified URI. If the backup was an HDFS directory, then the redirect must be to an HDFS directory as well. If an HBase table restore is redirected to another cluster, the table cannot be automatically recreated. The user must launch an HBase shell on the remote cluster and run clone_snapshot <snapname>, <tablename>. The required action is displayed when the restore command runs.

If the device does not exist, the credentials are wrong, or the requested backup is not found, an error is logged and returned to the user. This option is normally used for disaster recovery, or for replicated restores where the replica is not in the configuration file.

-S

Specifies a subdirectory for partial restore. Only the indicated subdirectory of the original backup is restored. If the specified backup is an HBase table backup and a subdirectory argument is specified, an error is logged and returned to the user.

-y

If the restore would overwrite any data, the operation stops and prompts the user for permission unless the -y option is specified. When -y is specified, no additional prompt is displayed, so restoring to a location that contains data behaves the same as restoring to an empty location.

hdboost --retention
The hdboost --retention command allows users to change the retention periods of Hadoop application agent backups that are stored on a Data Domain system.

hdboost {--retention|-t} { -? | [-o {<backup-URI> | hdfs://<hostname>/<dir> | hbase://<hostname>/<tablename>}] [ [--before | --after] yyyy[-mm[-dd[.hh[-mm[-ss]]]]] | --from yyyy[-mm[-dd[.hh[-mm[-ss]]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]]] ] {--until {yyyy-mm-dd|forever} | {+|-}[#y][#m][#d] } [--kerberoscc <credential-cache>] [-D] [-y] }

This command updates backup retention times.

The -o option restricts the retention time update to a particular file system backup, HBase table backup, or a backup URI.

A retention --until time in the yyyy-mm-dd format retains the backup until the specified date and time. A retention time of forever indicates that the backup is to be retained until explicitly deleted.

Note

The baseline time and date is based on the local system time.

The user may specify retention time relative to the currently specified retention time with the following format:

• y: The number of years to retain the backup.

• m: The number of months to retain the backup.

• d: The number of days to retain the backup.

The --before and --after options allow users to specify a baseline time and date, and select all backups before or after that date for the retention update.

The --from and --to options allow users to specify a date range, and select all backups that fall within that range for the retention update.

• If the month (mm) field is not specified, it is assumed to be 1 (January).

• If the dd field is not specified, it is assumed to be 1 (1st day of the month).

• If the hh field is not specified, it is assumed to be hour 0 (midnight).

• If the minute (mm) field is not specified, it is assumed to be minute 0.

• If the ss field is not specified, it is assumed to be second 0.
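The defaulting rules in the list above can be expressed as a small parser for the yyyy[-mm[-dd[.hh[-mm[-ss]]]]] form. This Python sketch is an illustration only; the function name is hypothetical, and the parser accepts one-digit or two-digit fields as an assumption.

```python
import re
from datetime import datetime

# yyyy[-mm[-dd[.hh[-mm[-ss]]]]] with every field after the year optional.
PATTERN = re.compile(
    r"^(\d{4})(?:-(\d{1,2})(?:-(\d{1,2})(?:\.(\d{1,2})"
    r"(?:-(\d{1,2})(?:-(\d{1,2}))?)?)?)?)?$"
)
# Defaults for year, month, day, hour, minute, second (the year is required).
DEFAULTS = (None, 1, 1, 0, 0, 0)


def parse_baseline(text):
    """Parse a partial timestamp, filling the missing fields with defaults."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"invalid timestamp: {text!r}")
    fields = [
        int(group) if group is not None else default
        for group, default in zip(match.groups(), DEFAULTS)
    ]
    return datetime(*fields)
```

With these defaults, `parse_baseline("2016")` yields midnight on January 1, 2016, and `parse_baseline("2016-04-14.23")` yields 23:00:00 on April 14, 2016.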

If preceded by a + or -, the retention time is adjusted relative to the current retention period.

You can combine #y, #m, and #d in a single value. For example, 1y3m2d yields a retention time of 1 year, 3 months, and 2 days from the current retention period.

If using Kerberos, the optional --kerberoscc option overrides the credential cache file set in the backup device configuration. If the user specifies --kerberoscc while Kerberos is not configured, an error is logged and returned to the user.

hdboost --search
The hdboost --search command allows users to search for specific backup information.

hdboost {--search|-s} { -? | -o {<backup-URI> | hdfs://<hostname>/<dir>} {--for object | --regex expression} [--kerberoscc <credential-cache>] [-D] [-L] [-V] [-F json] }

The user may search the specified backup URI for the directory or file object. The backup URI must be a directory backup; otherwise, an error is logged and returned to the user.

If an HDFS URI is specified instead, the most recent backup of the specified directory is searched for the desired object.

If using Kerberos, the --kerberoscc option overrides the credential cache file set in the backup device configuration.

The -V option searches for the object in subdirectories within the specified search directory.

The --regex option allows for regular expression searching.

The -L option displays the permissions, UID:GID, timestamp, and file size.

hdboost --test
The hdboost --test command tests the connection between Hadoop application agent and the Data Domain system.

hdboost {--test|-T} { -? | [--kerberoscc <credential-cache>] [-D] [--deviceid <ID>] }

This command tests connectivity to the configured target Data Domain systems. If the optional target ID is not specified, all Data Domain systems are tested in turn.

If using Kerberos, the optional --kerberoscc option overrides the credential cache file set in the backup device configuration.

hdboost --version
The hdboost --version command displays the version number of the Hadoop application agent software.

hdboost --version [-?]

This option displays the current hdboost and interface version number, to identify the product name and version.

brboost command overview
Be aware of the following brboost options available with the Hadoop application agent CLI.

Help
For general brboost help, type the brboost {--help | -h} command.

For help with a specific command, type the command with the -? option.

Command options
The following command options are valid across multiple brboost commands.

-y

Assume yes to all user prompts, to allow operations to be scripted.

-D

Include debug information in the brboost logs.

-?

Provides command-specific help.

brboost --addconfig
The brboost --addconfig command allows you to configure a mount point, set up a mount path to use for backups, specify a target device, and configure a MySQL backup command or script.

brboost {--addconfig|-a} { -? | master --deviceid <id> [--secondary <sid>] | {--name | -n <name>} -o {"<backup command>"} --deviceid <id> [--secondary <sid>] | --device [ddboostfs:///<mnt point> | file:///<local file path>] } [-y]

This configures a backup command for a target BoostFS mount point or a file path where the backups are to be stored.

Note

A (Y/N) prompt appears for you to confirm whether you want to update the configuration unless you specify the -y invocation option.

brboost {--addconfig | -a} master --deviceid <id> [--secondary <sid>] [-y]

Sets up the BoostFS mount path to which you want to send backups. If no BoostFS mount point is configured, an error message is displayed.

An optional --secondary parameter allows you to specify a replica.

brboost {--addconfig|-a} {--name | -n <name>} -o {"backup command"} --deviceid <id> [--secondary <sid>] [-y]

Configures the MySQL backup command or script to run, and specifies the target device to use, identified by its --deviceid number. If there is an existing configuration for the source object, it is overwritten.

--name | -n <name> is a user-specified unique name to identify the backup object.

The --deviceid option specifies the BoostFS mount configuration that is used for backups. If there is no configured BoostFS mount point that corresponds to this --deviceid, an error message is displayed.

brboost {--addconfig | -a} --device [ddboostfs:///<mnt point> | file:///<local file path>] [-y]

Configures a backup mount point. Brboost automatically assigns the configured mount point the next available ID number, then appends that ID to the backup devices section of the configuration file. Any existing entries for this mount point are overwritten.

If it is the first mount point, it is added to the master device section of the configuration file and becomes the default target for backups of configuration files.

brboost --backup
The brboost --backup command runs the backup defined by an existing backup configuration, identified by name.

brboost {--backup | -b} [--until {[#y][#m][#d] | yyyy-mm-dd | forever}] {--name | -n <name>} [-D] [-y] [-N]

You can adjust the retention time for this backup using the --until option.

If you specify -N, the operation skips the backup of the brboost_cfg.json file, the audit.log file, and the master index.

brboost --delete
The brboost --delete command allows you to delete all backups or specific individual backups.

brboost {--delete|-d} --guid <guid> [--dryrun] [-D] [-y]

This version of the brboost --delete command allows you to delete backups by their associated globally unique identifier (GUID).

The -d option allows you to delete all backups. You are prompted to confirm each deletion unless you include -y at the command prompt.

If you want to run the brboost --delete command without immediately making the deletions, include --dryrun at the command prompt.

After each successful deletion, the brboost.idx file is backed up to the master device.

brboost {--delete|-d} -o <backup-URI> [--dryrun] [-D] [-y]

This version of the brboost --delete command allows you to delete backups by their associated uniform resource identifiers (URI).

brboost --eraseconfig
The brboost --eraseconfig command erases configuration information from the brboost configuration file.

brboost {--eraseconfig | -x} --device <URI>:///<absolute path> [-y] [-D]

Allows you to remove a configured backup URI and path from the backup configuration. The acceptable URIs are ddboostfs:///<path> or file:///<path>. If errors occur, those errors appear in the logs unless you use the -D debug option.

brboost {--eraseconfig|-x} --deviceid <id> [-y] [-D]

Allows you to remove a device that is specified by the ID from the configuration file.

brboost --expire
The brboost --expire command expires any backups that are past their retention time.

brboost {--expire | -e} [--dryrun] [-D] [-y]

When used with the --dryrun option, the command lists all backups that are past their expiration date without deleting them.

After each successful deletion, the index is backed up to the master device.

brboost --list
The brboost --list command displays a list of backups from the master index file.

brboost {--list | -l} --from yyyy[-mm[-dd[.hh[-mm[-ss]]]]] --to yyyy[-mm[-dd[.hh[-mm[-ss]]]]] [--deviceid {n} | --device {device-URI}] [-L] [-D]

The --list option displays the backup name, GUID, retention date, and location of the backup. When the --list option is used with either the --deviceid or --device option, only master index listings containing that --deviceid or --device are displayed. If any of these commands are used in conjunction with -L, additional meta information that is associated with the backup is displayed.

Table 8 Options

Option               Description
yyyy                 Specifies a given year.
yyyy-mm              Specifies a given month.
yyyy-mm-dd           Specifies a given day.
yyyy-mm-dd.hh        Specifies a particular hour (24-hour format).
yyyy-mm-dd.hh-mm     Specifies a particular minute.
yyyy-mm-dd.hh-mm-ss  Specifies a particular second.

brboost --listconfig
The brboost --listconfig command allows you to list any of the configuration information that is stored in the brboost_cfg.json file and that was added with the brboost -a command.

brboost {-k|--listconfig} { -? | {--name | -n <name>} | master --deviceid <id> [--secondary <sid>] | -o {"<backup command>"} --deviceid <id> [--secondary <sid>] | --device [ddboostfs:///<mnt point> | file:///<local file path>] [-y] }

This option allows you to list a specific device that is based on the supplied URI.

brboost {-k|--listconfig} master

This option allows you to list the configured master device.

brboost {-k|--listconfig} { -? | {--name | -n <unique name>} }

This command lists the backup object with a specific name.

brboost --retention
The brboost --retention command updates the retention date of all backups or a specific backup that is identified with a uniform resource identifier (URI).

brboost {--retention | -t} [-o <backup-URI>] {--until {yyyy-mm-dd | forever} | {+|-}[#y][#m][#d]} [-D] [-y]

The backup retention can be updated so that the backup is kept until a specific date or kept indefinitely. The retention date is adjustable by number of years, months, or days.

brboost --test
The brboost --test command allows you to test the backup path mount point that is configured with the brboost -a command.

brboost {--test|-T}

When you run the brboost command with the --test option, brboost verifies all device paths that are configured in the backup configuration. If the backup path being tested uses the file:// URI, brboost verifies whether the specified file path exists. If the file path does not exist, an error message is displayed. If the backup path being tested uses the ddboostfs:// URI, brboost verifies whether the specified mount point path exists. Brboost also verifies whether the mount point is listed in /proc/mounts and is associated with a BoostFS FUSE mount point.
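The checks described above can be approximated in Python. This is a sketch, not the brboost implementation: the function names and result strings are hypothetical, and it assumes that a BoostFS mount appears in /proc/mounts with a file system type that starts with "fuse", which this guide does not confirm.

```python
import os


def parse_mounts(mounts_text):
    """Map each mount point to its file system type (/proc/mounts layout)."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:  # device, mount point, type, options, ...
            table[fields[1]] = fields[2]
    return table


def check_device(uri, mounts_text, path_exists=os.path.isdir):
    """Return 'ok' or a short reason why the device path fails the test."""
    scheme, separator, path = uri.partition("://")
    if not separator or scheme not in ("file", "ddboostfs"):
        return "unsupported URI"
    if not path_exists(path):
        return "path does not exist"
    if scheme == "file":
        return "ok"
    fstype = parse_mounts(mounts_text).get(path.rstrip("/") or "/")
    if fstype is None:
        return "not mounted"
    if not fstype.startswith("fuse"):  # assumed marker for a FUSE mount
        return "not a FUSE mount"
    return "ok"
```

On a live system, the mounts text would come from reading /proc/mounts; passing it in as a string keeps the sketch easy to exercise with sample data.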

brboost {--test|-T} [--deviceid <id>]

This version of the test command allows you to specify a single device to test, as opposed to testing all the configured device paths. An individual device path test occurs in the same fashion.

brboost --version
The brboost --version command displays the version number of the Hadoop application agent software.

brboost {--version | -v} [-?]

This option displays the current brboost and interface version number.
