Toward Better Multi-Tenancy Support from HDFS

Xiaoyu Yao
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

About myself

⬢Member of Technical Staff at Hortonworks since 2014

⬢Apache Hadoop Committer and PMC member.

⬢Currently working on HDFS.

⬢This talk aims to give a better understanding of HDFS multi-tenancy support and of the ongoing work toward better resource management.

Agenda

⬢Overview

⬢Hadoop multi-tenancy features

⬢HDFS resources and multi-tenancy offerings

⬢HDFS resource management via resource coupon

⬢Q&A

Overview

⬢Centrally managed infrastructure
– Consolidate to simplify management and lower TCO
– Better utilization and efficiency

⬢Requirements
– Resource Sharing
– Resource Isolation
– Resource Control

Multi-Tenancy Support from Hadoop

Project | Resource Sharing | Resource Isolation             | Resource Management
HBase   | Y                | Namespace, Region Server Group | Quota
YARN    | Y                | Queue, Node Label, ...         | Capacity Scheduler, ...
HDFS    | Y                | Federation                     | Quota, FairCallQueue, Backoff

HDFS Resources

⬢Capacity
– Namespace
– Storage Space
– Storage Type

⬢Operational Resources
– Namenode
• RPC
– Datanode
• Disk & Network

HDFS Resource Sharing/Isolation – Federation

HDFS Capacity Management – Quota

⬢Quota
– Namespace
– Storage Space
– HDFS-7584 Quota by Storage Types

⬢Limitations
– Static
– Per directory
– No per-user/per-job control
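To make the per-directory nature of these limits concrete, here is a minimal Python sketch of a quota check. It assumes the HDFS convention that the namespace quota counts names (files plus directories) and the storage-space quota counts raw bytes, i.e. file length times replication. The class and method names are hypothetical, not the HDFS API, and real HDFS charges space incrementally as blocks are allocated rather than per whole file.

```python
from dataclasses import dataclass


@dataclass
class DirectoryQuota:
    """Hypothetical per-directory quota, loosely modeled on HDFS semantics."""
    ns_quota: int      # max number of names (files + directories) under the directory
    ss_quota: int      # max raw bytes (length x replication) under the directory
    ns_used: int = 0
    ss_used: int = 0

    def can_add_file(self, length: int, replication: int) -> bool:
        # A new file consumes one name and length * replication raw bytes.
        raw_bytes = length * replication
        return (self.ns_used + 1 <= self.ns_quota
                and self.ss_used + raw_bytes <= self.ss_quota)

    def add_file(self, length: int, replication: int) -> None:
        if not self.can_add_file(length, replication):
            raise RuntimeError("quota exceeded")  # stand-in for a quota exception
        self.ns_used += 1
        self.ss_used += length * replication


MB = 1024 * 1024
q = DirectoryQuota(ns_quota=10, ss_quota=300 * MB)
q.add_file(100 * MB, replication=3)           # uses the full 300 MB of raw quota
print(q.can_add_file(1 * MB, replication=3))  # False: space quota exhausted
```

Nothing in the check refers to the user or job performing the write, which is the "no per-user/per-job control" limitation listed above.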

HDFS Operational Resource Management – Namenode RPC Isolation (1)

⬢Internal RPC
– DN->NN block reports, heartbeats, etc.
– ZKFC->NN liveness checks

⬢External RPC
– Client RPCs from HDFS clients such as MapReduce jobs, Hive queries, and HBase

[Figure: Namenode RPC server: Client → Listener → Readers → Call Queue → Handlers → FSN (FSNamesystem)]
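The pipeline in the figure is a producer/consumer design. The following Python sketch, using only `queue` and `threading` from the standard library, models it under simplifying assumptions: one reader function stands in for the Listener and Reader threads, a bounded `queue.Queue` plays the call queue, and three handler threads drain it. The function names are hypothetical; the real server is Java code in Hadoop's IPC layer.

```python
import queue
import threading

call_queue = queue.Queue(maxsize=100)  # bounded call queue shared by all handlers
results = []
results_lock = threading.Lock()


def reader(calls):
    """Stands in for the Listener/Reader threads: decode incoming calls and enqueue them."""
    for call in calls:
        call_queue.put(call)  # blocks if the queue is full


def handler():
    """Take calls off the queue and 'execute' them until a shutdown sentinel arrives."""
    while True:
        call = call_queue.get()
        if call is None:  # sentinel: stop this handler
            return
        with results_lock:
            results.append(f"done:{call}")  # placeholder for the FSNamesystem operation


handlers = [threading.Thread(target=handler) for _ in range(3)]
for t in handlers:
    t.start()
reader([f"getFileInfo#{i}" for i in range(10)])
for _ in handlers:
    call_queue.put(None)  # one sentinel per handler, queued after all real calls
for t in handlers:
    t.join()
print(len(results))  # 10
```

Because every handler competes for the same queue, a flood of calls from one tenant delays everyone behind it, which motivates the isolation and scheduling mechanisms on the following slides.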

HDFS Operational Resource Management – Namenode RPC Isolation (2)

⬢Use cases:
– HDFS access from normal jobs is impacted by offending jobs
– Internal RPCs are impacted by external RPCs
– One blocked RPC method can affect others

⬢Protect HDFS internal RPCs:
– Dedicated service RPC server/port
• Isolates DN->NN block reports, heartbeats, etc.
– Dedicated lifeline RPC server/port
• Protects ZKFC->NN liveness checks

⬢All external RPCs go to the default port (e.g., 8020)
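A back-of-the-envelope model shows why a dedicated port matters. Assuming a FIFO call queue drained in parallel by H handlers with a service time of t per call, a call that arrives behind N queued calls waits roughly N·t/H. The numbers below are hypothetical, chosen only to illustrate the gap between sharing the client queue and using a separate service RPC server.

```python
def queue_wait_ms(calls_ahead: int, service_ms: float, handlers: int) -> float:
    """Approximate FIFO wait: the calls ahead are drained by `handlers` in parallel."""
    return calls_ahead * service_ms / handlers


# Shared port (hypothetical numbers): a heartbeat lands behind 1000 client calls,
# each taking 1 ms, served by 100 handlers.
shared = queue_wait_ms(calls_ahead=1000, service_ms=1.0, handlers=100)
# Dedicated service port: only a few other DN reports are ahead, served by 10 handlers.
dedicated = queue_wait_ms(calls_ahead=5, service_ms=1.0, handlers=10)
print(shared, dedicated)  # 10.0 0.5
```

The shared-queue wait grows with client load, while the dedicated queue's wait depends only on internal traffic, which is what keeps heartbeats and liveness checks timely under heavy client load.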

HDFS Resource Management – Namenode RPC Call Queue

⬢In a multi-tenant scenario, the call queue should act as a shock absorber that accommodates different workloads, converting bursty arrivals into smooth, steady departures.

⬢Good call queue
– No call bloat
– Catches and handles bursts with no more than a temporary increase in queue delay
– Maximum server utilization

⬢Bad call queue
– Exhibits call bloat
– Fills up on bursts and stays full
– Low utilization and high queue latency
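The difference can be illustrated with a toy discrete-time model (an assumption for illustration, not how the namenode measures its queue): each tick some calls arrive and the handlers drain up to a fixed number. After the same burst, the good queue drains back to empty, while in the bad case arrivals match the service rate and a standing backlog persists, so every later call pays the extra queue delay.

```python
def queue_lengths(arrivals, service_rate):
    """Queue length at the end of each tick for a single FIFO call queue."""
    length, history = 0, []
    for arriving in arrivals:
        length += arriving
        length -= min(length, service_rate)  # handlers drain up to service_rate calls per tick
        history.append(length)
    return history


# Good queue: the burst is absorbed and drains once arrivals fall below the service rate.
good = queue_lengths([5, 30, 5, 5, 5, 5], service_rate=10)
# Bad queue: after the same burst, arrivals match the service rate,
# so a standing queue of 20 calls never drains.
bad = queue_lengths([5, 30, 10, 10, 10, 10], service_rate=10)
print(good)  # [0, 20, 15, 10, 5, 0]
print(bad)   # [0, 20, 20, 20, 20, 20]
```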

HDFS Resource Management – Fair Call Queue

⬢Before HADOOP-9640: LinkedBlockingQueue
– Single queue
– Clients block, then time out or fail, when the queue is full

⬢HADOOP-9640: Fair Call Queue
– Multiple priority levels, each with its own call queue and processing priority
– The scheduler assigns each RPC a priority
– Higher-priority RPC calls go into call queues that have a higher probability of being served

[Figure: Scheduler → priority call queues (Queue 0 … Queue 2) → Multiplexer (WRR)]
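The following Python sketch shows the idea of several priority queues read by a weighted round-robin (WRR) multiplexer. It is a simplified model, not Hadoop's `FairCallQueue`: the class name is hypothetical, level 0 is treated as the highest priority, the weights are illustrative, and empty sub-queues are simply skipped.

```python
from collections import deque


class FairCallQueueSketch:
    """Toy multi-level call queue. Level 0 has the highest priority; a weighted
    round-robin multiplexer decides which sub-queue the next take() reads from."""

    def __init__(self, weights):
        self.queues = [deque() for _ in weights]
        self.weights = list(weights)
        self.current = 0  # sub-queue the multiplexer currently points at
        self.served = 0   # calls taken from it during the current turn

    def put(self, call, priority):
        self.queues[priority].append(call)  # priority is decided by the scheduler

    def _advance(self):
        self.current = (self.current + 1) % len(self.queues)
        self.served = 0

    def take(self):
        # Visit each sub-queue at most once; return None if all are empty.
        for _ in range(len(self.queues)):
            q = self.queues[self.current]
            if q:
                call = q.popleft()
                self.served += 1
                if self.served >= self.weights[self.current]:
                    self._advance()  # this sub-queue has used up its turn
                return call
            self._advance()  # empty sub-queue: move on to the next one
        return None


fcq = FairCallQueueSketch(weights=[2, 1])
for i in range(4):
    fcq.put(f"light{i}", priority=0)  # e.g. a user with few recent calls
for i in range(2):
    fcq.put(f"heavy{i}", priority=1)  # e.g. a user flooding the namenode
print([fcq.take() for _ in range(6)])
# ['light0', 'light1', 'heavy0', 'light2', 'light3', 'heavy1']
```

Lower-priority callers still make progress, just less often, which distinguishes weighted round-robin from strict priority scheduling.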

HDFS Resource Management – Namenode RPC Throttling <1>

⬢HADOOP-10597: Backoff when the call queue is full
– Send back a retriable exception
– Let the client wait exponentially longer between retries instead of blocking, timing out, or failing the call
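On the client side, this behavior amounts to retrying with exponentially growing delays. This is a generic Python sketch, not the HDFS client's retry policy: `ServerTooBusy` is a hypothetical stand-in for the retriable exception, and the delay parameters and jitter are assumptions. The `sleep` and `rng` hooks are injected so the timing can be checked without actually waiting.

```python
import random
import time


class ServerTooBusy(Exception):
    """Hypothetical stand-in for the retriable exception sent back on backoff."""


def call_with_backoff(rpc, max_retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call `rpc`; while the server answers 'too busy', wait exponentially longer and retry."""
    for attempt in range(max_retries + 1):
        try:
            return rpc()
        except ServerTooBusy:
            if attempt == max_retries:
                raise  # out of retries: surface the failure to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * (0.5 + rng()))  # jitter in [0.5, 1.5) x delay avoids lockstep retries


# Usage: a fake RPC that is rejected twice, then succeeds.
outcomes = iter([ServerTooBusy(), ServerTooBusy(), "ok"])


def flaky_rpc():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


slept = []
result = call_with_backoff(flaky_rpc, sleep=slept.append, rng=lambda: 0.5)
print(result, slept)  # ok [0.1, 0.2]
```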

HDFS Resource Management – Namenode RPC Throttling <2>

⬢HADOOP-12916: Backoff based on response time
– The basic idea: back off earlier to avoid call queue overload, so that the namenode can recover quickly
– Low-priority calls are backed off if the response time of high-priority calls exceeds a predefined threshold
– More per-user/per-queue metrics added for troubleshooting
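The backoff rule in the second bullet can be written as a small predicate. This Python sketch assumes level 0 is the highest priority, that per-level average response times and thresholds are already available, and that a call is backed off only when some higher-priority level is over its threshold. The function name and the numbers are hypothetical.

```python
def should_back_off(caller_priority, avg_response_ms, thresholds_ms):
    """Back off a call if any higher-priority level (smaller index) is currently
    slower than its threshold. Level 0 is the highest priority."""
    for level in range(caller_priority):
        if avg_response_ms[level] > thresholds_ms[level]:
            return True
    return False


avg = [12.0, 3.0, 40.0, 90.0]       # hypothetical recent average response times per level
limits = [10.0, 20.0, 30.0, 40.0]   # hypothetical per-level thresholds
print(should_back_off(0, avg, limits))  # False: no level ranks above level 0
print(should_back_off(2, avg, limits))  # True: level 0 is over its 10 ms threshold
```

In this sketch a call is never backed off because of its own level's latency; only slowdowns at higher-priority levels push lower-priority callers back, so high-priority work can recover first.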

HDFS Resource Management – Namenode RPC Throttling <3>

⬢Scheduler interface abstracted from the call queue, for pluggable RPC priority assignment
– DefaultRpcScheduler: all RPC calls get the same priority
– DecayRpcScheduler: from the original FairCallQueue; priority is assigned based on each user's previous call volume
– Other experimental schedulers: configurable lists of high-priority users/groups for low-latency jobs, medium-priority users/groups for normal jobs, and low-priority users/groups for batch jobs
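A minimal Python sketch of decay-based priority assignment, in the spirit of DecayRpcScheduler, follows. It assumes four priority levels, share thresholds of 1/8, 1/4 and 1/2, and a decay factor of 0.5 applied periodically to every user's call count. The class and user names are hypothetical, and this is not the Hadoop implementation.

```python
class DecaySchedulerSketch:
    """Toy decay-based priority assignment: users who made a larger share of
    recent calls get a lower priority (a higher level number)."""

    def __init__(self, thresholds=(0.125, 0.25, 0.5), decay_factor=0.5):
        self.thresholds = thresholds  # share boundaries between levels 0..3
        self.decay_factor = decay_factor
        self.counts = {}

    def record_call(self, user):
        self.counts[user] = self.counts.get(user, 0) + 1

    def decay(self):
        # Called periodically so that old traffic gradually stops counting.
        self.counts = {u: c * self.decay_factor for u, c in self.counts.items()}

    def priority(self, user):
        total = sum(self.counts.values())
        share = self.counts.get(user, 0) / total if total else 0.0
        level = 0
        for i, bound in enumerate(self.thresholds):
            if share >= bound:
                level = i + 1
        return level


s = DecaySchedulerSketch()
for user, n in (("etl_batch", 60), ("hive_adhoc", 30), ("hbase", 10)):
    for _ in range(n):
        s.record_call(user)
print([s.priority(u) for u in ("etl_batch", "hive_adhoc", "hbase")])  # [3, 2, 0]

s.decay()                 # counts become 30, 15, 5
for _ in range(40):
    s.record_call("hbase")  # hbase now dominates recent traffic
print([s.priority(u) for u in ("etl_batch", "hive_adhoc", "hbase")])  # [2, 1, 3]
```

Decay scales every count equally, so it does not change shares by itself; its effect is that new calls quickly outweigh old ones, so a user who stops flooding the namenode regains priority.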

HDFS Resource Management – QoS

⬢Use case:
– Allow a high-performance QoS mechanism with minimal decoding effort on the server side

⬢HADOOP-9194: QoS support for Hadoop RPC
– One byte in the RPC header to facilitate QoS mechanisms
– E.g., differentiate OLTP/OLAP or batch/streaming workloads against the same HDFS

⬢Limitation
– No mechanism-level implementation yet

HDFS resource management with YARN

⬢Use case
– Priority inversion without centralized resource management (e.g., RPC calls from high-priority YARN jobs may be put into a low-priority HDFS namenode call queue)
– Identify and manage “bad” callers effectively

⬢Namenode: RPC handlers
– FairCallQueue offers fair use of namenode RPC handlers
– No guarantee of differentiation

⬢Datanode: I/O bandwidth
– No differentiation of writers/readers or of their bandwidth usage
– Datanode allows static throttling of balancer I/O

HDFS Namenode Resource Reservation

⬢HADOOP-13128 proposes HDFS namenode resource reservation via resource coupons
– From throttling to management
– Similar to delegation tokens in many aspects
– Works for both Kerberos and non-Kerberos clusters
– Only privileged service users may request resource coupons from the namenode
– Coupons can be serialized/deserialized for use within a container
– Coupons can be renewed for long-running jobs or canceled after the intended job is finished

HDFS Namenode Resource Coupon

⬢Coupon identifier
– Finer-grained owner (MR job ID, Hive query ID) to help identify and manage “good” and “bad” callers
– Resource type (namenode RPC or datanode I/O bandwidth)
– Flexible management unit for different resources
• Min/max percentage (e.g., namenode RPC)
• Absolute value (e.g., datanode I/O bandwidth)
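The identifier fields listed above could be modeled as a small immutable record. The following Python sketch is hypothetical throughout: the field names, the `NN_RPC`/`DN_IO_BANDWIDTH` type strings, the validation rules and the JSON encoding are assumptions for illustration (Hadoop's delegation tokens use their own binary encoding). It shows the two management units and the serialize/deserialize round trip that would let a coupon travel into a container.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CouponIdentifier:
    """Hypothetical coupon identifier mirroring the fields listed on the slide."""
    owner: str                           # fine-grained owner, e.g. an MR job ID or Hive query ID
    resource_type: str                   # "NN_RPC" or "DN_IO_BANDWIDTH" (assumed names)
    min_percent: Optional[float] = None  # percentage unit, used for namenode RPC
    max_percent: Optional[float] = None
    bytes_per_sec: Optional[int] = None  # absolute unit, used for datanode I/O bandwidth

    def __post_init__(self):
        if self.resource_type == "NN_RPC":
            if self.min_percent is None or self.max_percent is None:
                raise ValueError("NN_RPC coupons need min/max percentages")
            if not 0 <= self.min_percent <= self.max_percent <= 100:
                raise ValueError("expected 0 <= min <= max <= 100")
        elif self.resource_type == "DN_IO_BANDWIDTH":
            if self.bytes_per_sec is None or self.bytes_per_sec <= 0:
                raise ValueError("DN_IO_BANDWIDTH coupons need a positive bytes_per_sec")
        else:
            raise ValueError(f"unknown resource type {self.resource_type!r}")

    def serialize(self) -> bytes:
        # Like a delegation token, the coupon can be shipped into a container as bytes.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def deserialize(cls, data: bytes) -> "CouponIdentifier":
        return cls(**json.loads(data.decode("utf-8")))


rpc_coupon = CouponIdentifier(owner="job_0042", resource_type="NN_RPC",
                              min_percent=10.0, max_percent=30.0)
io_coupon = CouponIdentifier(owner="hive_query_17", resource_type="DN_IO_BANDWIDTH",
                             bytes_per_sec=50 * 1024 * 1024)
print(CouponIdentifier.deserialize(rpc_coupon.serialize()) == rpc_coupon)  # True
```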

HDFS Namenode Resource Coupon Manager (RCM)

⬢Grant/Renew/Cancel resource coupon

⬢Monitor and report resource usage

⬢Check and validate resource use requests
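A minimal Python sketch of these RCM duties follows, under stated assumptions: only users in a configured privileged set may request coupons, the sum of granted minimum percentages may not exceed 100%, coupons expire unless renewed before expiry, and validation simply checks that a coupon exists and has not expired. All names and rules are hypothetical, not the HADOOP-13128 design, and usage monitoring/reporting is left out.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Coupon:
    coupon_id: int
    owner: str          # e.g. a job or query ID
    min_percent: float  # guaranteed share of namenode RPC capacity
    max_percent: float  # upper bound on that share
    expires_at: float


class ResourceCouponManagerSketch:
    """Hypothetical RCM: grant/renew/cancel coupons and validate their use."""

    def __init__(self, privileged_users, lifetime=3600.0):
        self.privileged_users = set(privileged_users)
        self.lifetime = lifetime
        self.coupons = {}
        self._next_id = itertools.count(1)

    def grant(self, requester, owner, min_percent, max_percent, now):
        if requester not in self.privileged_users:
            raise PermissionError(f"{requester} may not request coupons")
        if not 0 <= min_percent <= max_percent <= 100:
            raise ValueError("expected 0 <= min <= max <= 100")
        reserved = sum(c.min_percent for c in self.coupons.values())
        if reserved + min_percent > 100:
            raise ValueError("minimum reservations would exceed 100% of RPC capacity")
        coupon = Coupon(next(self._next_id), owner, min_percent, max_percent,
                        now + self.lifetime)
        self.coupons[coupon.coupon_id] = coupon
        return coupon

    def renew(self, coupon_id, now):
        if not self.is_valid(coupon_id, now):
            raise ValueError("unknown or expired coupon")
        self.coupons[coupon_id].expires_at = now + self.lifetime
        return self.coupons[coupon_id].expires_at

    def cancel(self, coupon_id):
        self.coupons.pop(coupon_id, None)  # job finished: release its reservation

    def is_valid(self, coupon_id, now):
        coupon = self.coupons.get(coupon_id)
        return coupon is not None and now < coupon.expires_at


rcm = ResourceCouponManagerSketch(privileged_users={"yarn"})
c1 = rcm.grant("yarn", "job_A", min_percent=60, max_percent=80, now=0.0)
rcm.renew(c1.coupon_id, now=3000.0)            # long-running job extends its coupon
print(rcm.is_valid(c1.coupon_id, now=4000.0))  # True
rcm.cancel(c1.coupon_id)
print(rcm.is_valid(c1.coupon_id, now=4000.0))  # False
```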

HDFS Namenode Resource Pool

[Figure: HDFS Namenode Resource Pool, divided into a Fairness Pool and a Managed Pool; callers are applications supporting resource coupons (YARN/HBase) and legacy applications without resource coupons]
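One way to picture the two pools is sketched below in Python, under assumptions that go beyond the slide: calls carrying a valid coupon are routed to the managed pool, everything else falls back to the fairness pool, and namenode RPC handlers are split in proportion to the coupons' minimum percentages. The function names and the split rule are hypothetical.

```python
def split_handlers(total_handlers, coupon_min_percents):
    """Reserve handlers for the managed pool from the coupons' minimum percentages;
    the remainder forms the fairness pool used by legacy (coupon-less) callers."""
    reserved_percent = min(100, sum(coupon_min_percents))
    managed = total_handlers * reserved_percent // 100
    return managed, total_handlers - managed


def route(call_coupon_id, valid_coupons):
    """Calls carrying a valid coupon go to the managed pool; all others fall back
    to the fairness pool (e.g. FairCallQueue-style scheduling)."""
    return "managed" if call_coupon_id in valid_coupons else "fairness"


print(split_handlers(100, [20, 15]))           # (35, 65)
print(route(7, {7, 9}), route(None, {7, 9}))   # managed fairness
```

Keeping a fairness pool means legacy applications continue to work unchanged, while coupon holders get a capacity floor they can rely on.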

HDFS Namenode Resource Coupon Manager (RCM)

[Figure: Resource coupon workflow among a new client, the YARN ResourceManager, the HDFS Namenode with its RCM, the HDFS Datanode, the YARN NodeManager, and a YARN Container]

HDFS Resource Management – Datanode

⬢Use case:
– When a client writes to HDFS faster than the DNs' disk bandwidth, it saturates the disks and puts the DNs into an unresponsive state.
– The client backs off only by aborting/recovering the pipeline, which causes failed writes and unnecessary pipeline recovery.

⬢Static I/O throttling
– HDFS-7265 Support HDFS IO throttling
– HDFS-9796 Use a throttler for replica write in datanode
– HDFS-4412 Add throttler for datanode bandwidth
– HADOOP-10410 Datanode QoS via ioprio_set on DataXceiver thread

⬢Dynamic I/O throttling
– HDFS-7270 Add congestion signaling capability to DataNode write pipeline (ECN)

⬢Future work: I/O bandwidth reservation with resource coupons
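Static throttling is commonly implemented by granting a byte budget per fixed period and sleeping once it is spent. The Python sketch below follows that approach, similar in spirit to HDFS's `DataTransferThrottler` but not a port of it; the class name, the 0.5 s period and the injectable clock are assumptions made so the behavior can be tested deterministically.

```python
import time


class BandwidthThrottler:
    """Static throttler: allow at most `bytes_per_sec` on average by granting a byte
    budget per fixed period and sleeping until the next period once it is used up."""

    def __init__(self, bytes_per_sec, period_sec=0.5, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_period = bytes_per_sec * period_sec
        self.period_sec = period_sec
        self.clock = clock
        self.sleep = sleep
        self.period_start = clock()
        self.cur_reserve = self.bytes_per_period

    def throttle(self, num_bytes):
        self.cur_reserve -= num_bytes
        while self.cur_reserve <= 0:
            now = self.clock()
            period_end = self.period_start + self.period_sec
            if now < period_end:
                self.sleep(period_end - now)  # wait out the rest of this period
                now = period_end
            # Start a new period and refill the budget, carrying over any deficit.
            self.period_start = now
            self.cur_reserve += self.bytes_per_period


class FakeClock:
    """Deterministic clock for the example: sleeping just advances time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clk = FakeClock()
th = BandwidthThrottler(bytes_per_sec=1000, period_sec=0.5, clock=clk.now, sleep=clk.sleep)
for _ in range(4):
    th.throttle(500)  # each call spends one full period's budget
print(clk.t)  # 2.0: 2000 bytes at 1000 bytes/s
```

Such a throttler caps bandwidth regardless of actual disk load, which is why the slide distinguishes it from dynamic, congestion-driven throttling.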

Thank you!

Q&A