Building Enterprise Grade Applications in Yarn with Apache Twill

26
Build Enterprise Grade Applications in YARN with Poorna Chandra [email protected] Big Data App Meetup July 27, 2016

Transcript of Building Enterprise Grade Applications in Yarn with Apache Twill

Page 1: Building Enterprise Grade Applications in Yarn with Apache Twill

Build Enterprise Grade Applications in YARN with

Poorna Chandra [email protected]

Big Data App MeetupJuly 27, 2016

Page 2: Building Enterprise Grade Applications in Yarn with Apache Twill

Agenda● Hadoop YARN● Challenges in building enterprise applications● Apache Twill● Architecture● Features● Real World Enterprise Use Case - CDAP● Roadmap● Q & A

2

Page 3: Building Enterprise Grade Applications in Yarn with Apache Twill

First: The NEWS

to the Apache Twill Community!!!

Apache Twill is now a Top-Level Project of the ASF

Announcement: https://s.apache.org/Rzsf

3

Page 4: Building Enterprise Grade Applications in Yarn with Apache Twill

Apache Hadoop® YARN● MapReduce NextGen aka MRv2● Resource management vs job scheduling/monitoring● New ResourceManager manages the global assignment of compute

resources to applications● Introduce concept of ApplicationMaster per application to communicate

with ResourceManager for compute resource management● Enables more than MR jobs on cluster - like Apache Spark, etc.

4

Page 5: Building Enterprise Grade Applications in Yarn with Apache Twill

How YARN Application Works

5

Page 6: Building Enterprise Grade Applications in Yarn with Apache Twill

YARN is powerful, but...● Every application needs to write boilerplate code

○ Negotiate resources from RM○ Talk to NM to run jobs○ Monitor running jobs

● Every application needs to handle ○ High availability ○ Long running applications

■ Security aspects - delegation token expiry○ Easy scalability

6

Page 7: Building Enterprise Grade Applications in Yarn with Apache Twill

● Provides abstraction for YARN to reduce complexity to develop complex and large scale distributed applications

● Adds simplicity to the power of YARN○ Java thread-like programming

model● Reduces boilerplate code● Offers common needs for distributed

enterprise-grade application development○ Lifecycle management○ High Availability○ Scalability○ Service discovery

Simplification with Apache Twill

7

Page 8: Building Enterprise Grade Applications in Yarn with Apache Twill

Hello World in TwillDefine a TwillRunnable

public class HelloWorldRunnable extends AbstractTwillRunnable {

@Override

public void run() {

LOG.info("Hello World. My first distributed application.");

}

}

8

Page 9: Building Enterprise Grade Applications in Yarn with Apache Twill

Hello World in TwillLaunch it!

public class HelloWorld {

public static void main(String[] args) throws Exception {

TwillRunnerService twillRunner =

new YarnTwillRunnerService(new YarnConfiguration(), "localhost:2181");

twillRunner.startAndWait();

TwillController controller = twillRunner.prepare(new HelloWorldRunnable());

controller.start();

controller.awaitTermination();

//...

}

}9

Page 10: Building Enterprise Grade Applications in Yarn with Apache Twill

Major Features● Service Discovery● Placement Policy● Elastic Scaling● Command Messages● State Recovery

10

Page 11: Building Enterprise Grade Applications in Yarn with Apache Twill

11

Service Discovery

Page 12: Building Enterprise Grade Applications in Yarn with Apache Twill

Placement Policy● Placement policy can be used to address

○ Performance○ Availability○ Resource conflict

● Exposes container placement policy from YARN● Will allow Twill to allocate containers in specific racks and host based on

DISTRIBUTED deployment mode

12

Page 13: Building Enterprise Grade Applications in Yarn with Apache Twill

Elastic Scaling● Ability to add or reduce number of YARN containers to run the

application● Scale your application based on load● No need to restart the application● Twill API TwillController.changeInstances is used to accomplish

this task

13

Page 14: Building Enterprise Grade Applications in Yarn with Apache Twill

14

Command Messages

Page 15: Building Enterprise Grade Applications in Yarn with Apache Twill

15

State Recovery

Page 16: Building Enterprise Grade Applications in Yarn with Apache Twill

Real World Enterprise Usages - CDAP● Cask Data Application Platform (CDAP) - http://cdap.io

○ Open source application and integration framework for big data○ Simplifies and enhances data application development and management

■ APIs for simplification, portability and standardization● Works across wide range of Hadoop versions and all common distros

■ Built-in System services, such as metrics and logs aggregation, dataset

management, and distributed transaction service for common big data applications needs

○ Extensions to enhance user experience■ Hydrator - Interactive data pipeline construction■ Tracker - Metadata discovery and data lineage

16

Page 17: Building Enterprise Grade Applications in Yarn with Apache Twill

Apache Twill in CDAP● CDAP runs different types of processes on YARN

○ Long running daemons○ REST services○ Real-time transactional streaming framework○ Workflow execution

● CDAP only interacts with Twill○ Greatly simplifies the CDAP code base○ Just a matter of minutes to add support for new type of work to run on YARN

● Twill support of common needs○ Service discovery○ Leader election○ Elastic scaling○ Security

17

Page 18: Building Enterprise Grade Applications in Yarn with Apache Twill

CDAP Architecture

18

Page 19: Building Enterprise Grade Applications in Yarn with Apache Twill

Service Discovery● CDAP exposes all functionalities through REST● Almost all CDAP HTTP services are running in YARN

○ No fixed host and port○ Bind to ephemeral port○ Announce the host and port through Twill

■ Unique service name for a given service type

● Router inspects the request URI to derive a service name○ Uses Twill discovery service client to locate actual host and port○ Proxy the request and response

19

Page 20: Building Enterprise Grade Applications in Yarn with Apache Twill

Long Running Applications● All CDAP services on YARN are long running

○ Transaction server, metrics and log processing, real-time data ingestion, …

● Many user applications are long running too○ Real-time streaming, HTTP service, application daemon

● Secure cluster, specifically Kerberos enabled cluster○ All all Hadoop services use delegation token

■ NN, RM, HBase Master, Hive, KMS, ... ○ YARN containers don’t have the keytab, hence can’t update the token

20

Page 21: Building Enterprise Grade Applications in Yarn with Apache Twill

Long Running Applications in Twill● Twill provides support for updating delegation tokens

○ TwillRunner.scheduleSecureStoreUpdate

● Update delegation tokens from the launcher process (kinit process)○ Acquires new delegation tokens periodically○ Serializes tokens to HDFS○ Notifies all running applications about the update

■ Through command message○ Each runnable refreshes delegation tokens by reading from HDFS

■ Requires a non-expired HDFS delegation token

● New launcher process will discover all Twill apps from ZK○ Can run HA launcher processes using leader election support from Twill

21

Page 22: Building Enterprise Grade Applications in Yarn with Apache Twill

Scalability● Many components in CDAP are linearly scalable, such as

○ Streaming data ingestion (through REST endpoint)○ Log processing

■ Reads from Kafka, writes to HDFS○ Metrics processing

■ Reads from Kafka, writes to timeseries table○ User real-time streaming DAG○ User HTTP service

● Twill supports adding/reducing YARN containers for a given TwillRunnable○ No need to restart application○ Guarantees a unique instance ID is assigned

■ Application can use it for partitioning

● Dynamic scaling using service discovery22

Page 23: Building Enterprise Grade Applications in Yarn with Apache Twill

High Availability● In production environment, it is important to have high availability● Twill provides couple means to achieve that

○ Running multiple instances of the same TwillRunnable○ Use dynamic service discovery to route requests○ Twill Automatic restart of TwillRunnable container if it gets killed / exit abnormally

■ Killed container will be removed from the service discovery■ Restarted container will be added to the service discovery

○ Built-in leader election support to have active-passive type of redundancy■ Tephra service use that, as it requires only having one active server

○ Placement policy to make sure that instances run on different hosts

23

Page 24: Building Enterprise Grade Applications in Yarn with Apache Twill

Apache Twill in Enterprise● CDAP, which uses Twill, is being used by large enterprises in production● Apache Twill runs on different cluster types

○ AWS, Azure, bare metal, VMs

● Compatible with wide range of Hadoop versions○ Vanilla Hadoop 2.0 - 2.7○ HDP 2.1 - 2.3○ CDH 5○ MapR 4.1 - 5.1

24

Page 25: Building Enterprise Grade Applications in Yarn with Apache Twill

Roadmap● Generalize to run on more frameworks

○ Apache Mesos, Kubernetes

● Smarter containers management○ Run simple runnable in AM○ Multiple runnables in one container

● Fine-grained control of containers lifecycle○ When to start, stop and restart on failure

● Smaller footprint○ Optional Kafka, optional ZooKeeper

25

Page 26: Building Enterprise Grade Applications in Yarn with Apache Twill

Thank you!● Apache Twill is Open Source

○ http://twill.apache.org ○ [email protected] ○ @ApacheTwill

● Contributions are welcome!

26