Architecting a multi-tenanted platform

insyght.io Architecting a Multi-Tenanted Platform Hadoop Summit - Melbourne, 2016 Grant Priestley - insyght.io

Transcript of Architecting a multi-tenanted platform

Page 1: Architecting a multi-tenanted platform

insyght.io

Architecting a Multi-Tenanted Platform
Hadoop Summit - Melbourne, 2016

Grant Priestley - insyght.io

Page 2: Architecting a multi-tenanted platform

Copyright © 2016 insyght.io. All Rights Reserved. insyght.io


Agenda
• Hurdles
• Multi-tenancy Defined
• Strategies & Tactics
• Demonstration
• Considerations

Page 3: Architecting a multi-tenanted platform


Hurdles

• Cognitive Hurdle: Organisation is stuck on the status quo and there is a general unwillingness to change
• Resource Hurdle: Limited resources in the market with right skill set, even less with execution experience
• Political Hurdle: Opposition from powerful vested interests where any change will likely result in a shift of power

Page 4: Architecting a multi-tenanted platform


"Multi-tenancy is a reference to the mode of operation of software where multiple independent instances of one or multiple applications operate in a shared environment. The instances (tenants) are logically isolated, but physically integrated…"

- GARTNER IT

Page 5: Architecting a multi-tenanted platform


Multi-Tenancy in Hadoop

Single Hadoop cluster used by multiple groups/users (tenants) that is designed to:
• Have appropriate data access & security
• Share resources: storage (HDFS) and processing (cores, RAM) capacity
• Support mixed workloads (batch, interactive, etc.)
• Lower the total cost of ownership

Page 6: Architecting a multi-tenanted platform


First, a Word on Security

MUST Haves
• Authentication process supported by Kerberos
• Perimeter security
• Encryption (at rest, in motion)
• Centralised auditing & SIEM integration

MUST Haves for Multi-tenancy
• Process isolation
• Fine-grained access controls
• Data masking, anonymisation
• Centralised security policies

Page 7: Architecting a multi-tenanted platform


Strategy without tactics is the slowest route to victory.

Tactics without strategy is the noise before defeat.

- Sun Tzu -

Page 8: Architecting a multi-tenanted platform


Strategic Approaches

CONTAINERISATION
One cluster, multiple containers for applications

MULTI-CLUSTER
Single management of multiple clusters for users and applications

MULTI-TENANCY
One cluster, multiple users, multiple applications

Page 9: Architecting a multi-tenanted platform


Containerisation
Support for Docker Containers on YARN

Motivation
• Bring the sandboxing / dependency isolation of container technology to Hadoop
• Simple to use cluster resources for a wider range of applications

Current State of Affairs
• Success translating Kubernetes to AppMaster to launch container
• Success of custom container launcher for Docker fully managed by YARN
• [WIP] Run Docker and traditional YARN applications side by side on the same cluster concurrently

Page 10: Architecting a multi-tenanted platform


Multi-Cluster
Shared Management Approach

Motivation
• Leverages a shared management model that enables organisations to reduce the operational overhead of managing individual clusters separately
• Mitigate any risk in performance or resource contention
• Geographic recovery, redundancy and availability

Current State of Affairs
• Most common pattern for isolating SLA-bound from Dev/Test and DR

Page 11: Architecting a multi-tenanted platform


Multi-Tenancy
Shared Resources Approach

Motivation
• Increase utilisation of resources on the cluster
• Lower the total cost of ownership of the platform
• Eliminate data proliferation and data silos

Current State of Affairs
• All Hadoop distributions (Cloudera, Hortonworks, MapR) support multi-tenancy, some a little better than others in certain aspects

Page 12: Architecting a multi-tenanted platform


Implementation Tactics


Page 13: Architecting a multi-tenanted platform


Disk Space Quota

Tactic: Quota Management
In Use: High

Hard limit on the disk space (size) that users can write to a particular directory, preventing accidental or malicious consumption of disk.

# hadoop dfsadmin -setSpaceQuota <size> <directory>
hadoop dfsadmin -setSpaceQuota 150G /user/grant
hadoop dfsadmin -setSpaceQuota 3000G /user/john
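One detail worth keeping in mind when sizing these quotas is that HDFS charges every replica of a block against the space quota. The following is a minimal Python sketch, not part of the original deck, that builds the commands shown above and estimates how much user data fits under a quota. The helper names, the default replication factor of 3, and the tenant mapping are assumptions made for illustration.

```python
# Binary size suffixes handled by this sketch (case-insensitive).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '150G' into bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def space_quota_command(size: str, directory: str) -> str:
    """Build the quota command in the form used on the slide."""
    return f"hadoop dfsadmin -setSpaceQuota {size} {directory}"


def logical_capacity(size: str, replication: int = 3) -> int:
    """Bytes of user data that fit under the quota.

    Every replica counts against the space quota, so the usable amount
    is the quota divided by the replication factor (assumed to be 3).
    """
    return parse_size(size) // replication


# Hypothetical tenant-to-quota mapping, reusing the slide's two examples.
quotas = {"/user/grant": "150G", "/user/john": "3000G"}
commands = [space_quota_command(size, path) for path, size in quotas.items()]
```

Under these assumptions a 150G quota on /user/grant leaves room for roughly 50G of data at replication factor 3, which is worth communicating to tenants when the quota is set.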

Page 14: Architecting a multi-tenanted platform


Name Quota

Tactic: Quota Management
In Use: Low

Hard limit on the number of files and subdirectories within a particular directory, optimising the metadata subsystem (NameNode) within the Hadoop-based platform.

# hadoop dfsadmin -setQuota <numberOfFiles> <directory>
hadoop dfsadmin -setQuota 100 /user/grant
hadoop dfsadmin -setQuota 1000 /user/john
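Once name quotas are in place, tenants usually need a warning before they hit the limit. The sketch below, which is an illustration rather than anything shown in the talk, parses one line of `hdfs dfs -count -q` output (columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME) and flags directories close to their name quota. The sample line, the `QuotaReport` type, and the 90% threshold are assumptions.

```python
from typing import NamedTuple, Optional


class QuotaReport(NamedTuple):
    name_quota: Optional[int]       # None when no name quota is set
    names_remaining: Optional[int]
    dir_count: int
    file_count: int
    path: str


def parse_count_line(line: str) -> QuotaReport:
    """Parse one line of `hdfs dfs -count -q` output."""
    fields = line.split(None, 7)

    def number(value: str) -> Optional[int]:
        # Unset quotas are printed as 'none' / 'inf'.
        return None if value in ("none", "inf") else int(value)

    return QuotaReport(
        name_quota=number(fields[0]),
        names_remaining=number(fields[1]),
        dir_count=int(fields[4]),
        file_count=int(fields[5]),
        path=fields[7].strip(),
    )


def near_limit(report: QuotaReport, threshold: float = 0.9) -> bool:
    """True when the used share of the name quota reaches the threshold."""
    if report.name_quota is None or report.names_remaining is None:
        return False
    used = report.name_quota - report.names_remaining
    return used / report.name_quota >= threshold


# Hypothetical output line for /user/grant with a name quota of 100.
sample = "   100    97    none    inf    2    1    1024 /user/grant"
report = parse_count_line(sample)
```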

Page 15: Architecting a multi-tenanted platform


Capacity Scheduler

Tactic: Resource Management
In Use: High

Capacity of each queue specifies the percentage of cluster resources that are available for applications submitted to the queue. Queues can be set up in a hierarchy that reflects the resource requirements and access restrictions required by the various tenants, groups, and users of the cluster.

yarn.scheduler.capacity.root.queues=tenant1,tenant2,tenant3
yarn.scheduler.capacity.root.tenant1.capacity=50
yarn.scheduler.capacity.root.tenant2.capacity=25
yarn.scheduler.capacity.root.tenant3.capacity=25
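The Capacity Scheduler expects the capacities of the queues directly under one parent to add up to 100. A small pre-deployment check can catch mistakes before the configuration is loaded. The sketch below is one possible way to do that in Python; the function names are hypothetical and the property text is copied from the slide.

```python
def parse_properties(text: str) -> dict:
    """Read key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def child_capacities(props: dict, parent: str = "root") -> dict:
    """Capacity (in percent) of each queue directly under `parent`."""
    prefix = f"yarn.scheduler.capacity.{parent}"
    children = [q.strip() for q in props[f"{prefix}.queues"].split(",")]
    return {c: float(props[f"{prefix}.{c}.capacity"]) for c in children}


def check_capacities(props: dict, parent: str = "root") -> float:
    """Raise if the sibling capacities do not sum to 100."""
    total = sum(child_capacities(props, parent).values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError(f"queues under {parent} sum to {total}, expected 100")
    return total


slide_config = """
yarn.scheduler.capacity.root.queues=tenant1,tenant2,tenant3
yarn.scheduler.capacity.root.tenant1.capacity=50
yarn.scheduler.capacity.root.tenant2.capacity=25
yarn.scheduler.capacity.root.tenant3.capacity=25
"""
props = parse_properties(slide_config)
total = check_capacities(props)
```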

Page 16: Architecting a multi-tenanted platform


Tactic: Resource Management
Capacity Scheduler - Important Parameters

• Min User Limit % determines how much the scheduler will give to an application before evenly distributing it.
• Capacity defines what resources the scheduler tries to guarantee
• User Limit Factor defines the “hard” limit for individual users
• Max Capacity defines the “hard” limit for the queue

[Figure: queue diagram labelled Max Capacity, User Limit Factor, Min User Limit % and Capacity, with markers at 25% and 150%]
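How these four parameters interact is easier to see with numbers. The sketch below is a simplified model and not the scheduler's actual code: it assumes that each active user is entitled to an equal share of the queue but never less than Min User Limit %, and that one user can hold at most Capacity multiplied by User Limit Factor, bounded by Max Capacity. The function names and the sample values are illustrative assumptions.

```python
def user_share_percent(active_users: int, min_user_limit_percent: float) -> float:
    """Share of the queue each active user is entitled to, in percent.

    Users split the queue evenly until the even share would drop below
    the minimum user limit; additional users then have to wait.
    """
    if active_users < 1:
        raise ValueError("at least one active user is required")
    return max(100.0 / active_users, min_user_limit_percent)


def single_user_ceiling(capacity: float, user_limit_factor: float,
                        max_capacity: float) -> float:
    """Largest share of the cluster (percent) one user can hold.

    The user limit factor is a multiple of the queue's capacity, and the
    queue as a whole can never exceed its maximum capacity.
    """
    return min(capacity * user_limit_factor, max_capacity)


# Hypothetical queue: 25% capacity, 40% max capacity, min user limit 25%.
two_users = user_share_percent(2, 25)      # even split between two users
eight_users = user_share_percent(8, 25)    # floor set by the min user limit
ceiling = single_user_ceiling(25, 2, 40)   # capped by max capacity
```

With a User Limit Factor of 1 the same queue would cap a single user at its 25% capacity, which is why ad-hoc queues often raise the factor while guaranteed queues leave it at 1.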

Page 17: Architecting a multi-tenanted platform


Tactic: Resource Management
Capacity Scheduler - Best Practices

1. Tenant-based allocation for capex and metering, allowing tenants to manage capacity amongst their initiatives/major projects
2. Set up initiatives/major projects as sub-queues
3. Map users / groups to queues
4. Low “absolute” and high “absolute max” on ad-hoc queues, possibly with preemption
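The practices above can be sketched as a small generator that flattens a tenant/project hierarchy into Capacity Scheduler properties, computes the absolute (cluster-wide) share of a sub-queue, and emits a `yarn.scheduler.capacity.queue-mappings` entry for users and groups. This is an illustrative Python sketch only: the queue names under tenant1, the user and group names, and the helper functions are assumptions, not configuration from the talk.

```python
PREFIX = "yarn.scheduler.capacity"


def queue_properties(tree: dict, parent: str = "root") -> dict:
    """Flatten a nested {queue: (capacity, children)} tree into properties."""
    props = {f"{PREFIX}.{parent}.queues": ",".join(tree)}
    for name, (capacity, children) in tree.items():
        path = f"{parent}.{name}"
        props[f"{PREFIX}.{path}.capacity"] = str(capacity)
        if children:
            props.update(queue_properties(children, path))
    return props


def absolute_capacity(tree: dict, path: str) -> float:
    """Percent of the whole cluster guaranteed to a queue like 'tenant1.adhoc'."""
    share, level = 100.0, tree
    for name in path.split("."):
        capacity, level = level[name]
        share *= capacity / 100.0
    return share


def queue_mappings(users: dict, groups: dict) -> dict:
    """Build the user/group to queue mapping property."""
    entries = [f"u:{user}:{queue}" for user, queue in users.items()]
    entries += [f"g:{group}:{queue}" for group, queue in groups.items()]
    return {f"{PREFIX}.queue-mappings": ",".join(entries)}


# Hypothetical hierarchy: tenant1 splits its 50% between a project and ad-hoc work.
tree = {
    "tenant1": (50, {"projecta": (80, {}), "adhoc": (20, {})}),
    "tenant2": (25, {}),
    "tenant3": (25, {}),
}
props = queue_properties(tree)
props.update(queue_mappings({"grant": "adhoc"}, {"analysts": "projecta"}))
adhoc_share = absolute_capacity(tree, "tenant1.adhoc")
```

In this example the ad-hoc queue is guaranteed only about 10% of the cluster in absolute terms, which matches the "low absolute" advice; a high maximum capacity on the same queue would then let it borrow idle resources.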

Page 18: Architecting a multi-tenanted platform


Node Labels

Tactic: Resource Isolation
In Use: Emerging

Enables the configuration of node partitions as a mechanism to enforce node-level isolation while accounting for resource contention across non-YARN managed resources, allowing applications to specify where to execute on the cluster.

• Exclusive partition: Access is restricted to applications running in queues associated with that node label
• Non-exclusive partition: If idle capacity is available, resources are shared with all applications on the cluster

Page 19: Architecting a multi-tenanted platform


Node Labels - Emerging Practices

Tactic: Resource Isolation

1. Use non-exclusive node partitions to maximise resource utilisation
2. Factor in user-limits, capacity, etc. to ensure jobs can be launched
3. Combine node labels with queue management to:
   a. Allocate capacity of queue across different partitions
   b. Enable ACLs for node partition by attaching it to a specific queue.

yarn.scheduler.capacity.root.queues=tenant1,tenant2,tenant3
yarn.scheduler.capacity.root.tenant1.capacity=50
yarn.scheduler.capacity.root.tenant2.capacity=25
yarn.scheduler.capacity.root.tenant3.capacity=25

# Queue ACLs for node partitions
yarn.scheduler.capacity.root.tenant1.accessible-node-labels=GPU
yarn.scheduler.capacity.root.tenant1.accessible-node-labels.GPU.capacity=100
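As with plain queue capacities, the per-label capacities of sibling queues are expected to add up to 100 for each node label. The Python sketch below checks that for the configuration on the slide. It is a simplified illustration: it only looks at explicit `accessible-node-labels` lists and ignores inheritance from parent queues and the `*` wildcard, and the function names are hypothetical.

```python
PREFIX = "yarn.scheduler.capacity"


def label_capacities(props: dict, label: str, parent: str = "root") -> dict:
    """Per-queue capacity (percent) of one node label under `parent`."""
    queues = [q.strip() for q in props[f"{PREFIX}.{parent}.queues"].split(",")]
    result = {}
    for queue in queues:
        base = f"{PREFIX}.{parent}.{queue}.accessible-node-labels"
        labels = [l.strip() for l in props.get(base, "").split(",") if l.strip()]
        # Only queues that explicitly list the label are considered here.
        if label in labels:
            result[queue] = float(props.get(f"{base}.{label}.capacity", 0))
    return result


def check_label(props: dict, label: str, parent: str = "root") -> bool:
    """True when some queue can access the label and the shares sum to 100."""
    capacities = label_capacities(props, label, parent)
    return bool(capacities) and abs(sum(capacities.values()) - 100.0) < 1e-6


# Properties copied from the slide.
slide_props = {
    f"{PREFIX}.root.queues": "tenant1,tenant2,tenant3",
    f"{PREFIX}.root.tenant1.capacity": "50",
    f"{PREFIX}.root.tenant2.capacity": "25",
    f"{PREFIX}.root.tenant3.capacity": "25",
    f"{PREFIX}.root.tenant1.accessible-node-labels": "GPU",
    f"{PREFIX}.root.tenant1.accessible-node-labels.GPU.capacity": "100",
}
gpu_ok = check_label(slide_props, "GPU")
```

Here tenant1 is the only queue with access to the GPU partition and holds all of it, so the check passes; a label that no queue lists, such as a hypothetical SSD label, would fail it.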

Page 20: Architecting a multi-tenanted platform


Resource Preemption

Tactic: Resource Isolation & Management
In Use: High

When a queue is under its minimum resource and the cluster doesn’t have available resources, the preemption policy is able to get resources from other queues that are above their minimum resources.

[Figure: two charts of capacity used over time (t1, t2, t3) for App 01 and App 02, one with preemption enabled and one with preemption not enabled]

Page 21: Architecting a multi-tenanted platform


Resource Preemption - Best Practices

Tactic: Resource Isolation & Management

1. Control the pace of preemption

yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round

2. Control if/when preemption happens

yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity
yarn.scheduler.capacity.<queue-path>.disable_preemption
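To build intuition for how these settings shape a single preemption round, the following Python sketch models the decision in a deliberately simplified way. It is not the policy YARN implements: it assumes every under-served queue wants its full guarantee, treats resources as one number, and uses made-up parameter values. Queues within the `max_ignored_over_capacity` dead zone are skipped, only a `natural_termination_factor` fraction of the excess is reclaimed per round, and the round is capped at `total_preemption_per_round` of the cluster.

```python
def preemption_round(used: dict, guaranteed: dict, cluster_total: float,
                     max_ignored_over_capacity: float,
                     natural_termination_factor: float,
                     total_preemption_per_round: float) -> dict:
    """Resources to reclaim from each over-capacity queue in one round."""
    wanted = {}
    for queue, amount in used.items():
        target = guaranteed[queue]
        # Queues inside the dead zone above their guarantee are left alone.
        if amount > target * (1 + max_ignored_over_capacity):
            # Reclaim only part of the excess; the rest may finish naturally.
            wanted[queue] = (amount - target) * natural_termination_factor

    # Cap the total reclaimed in this round at a fraction of the cluster.
    budget = total_preemption_per_round * cluster_total
    total = sum(wanted.values())
    if total > budget:
        scale = budget / total
        wanted = {queue: value * scale for queue, value in wanted.items()}
    return wanted


# Hypothetical cluster of 100 units with the slide's 50/25/25 guarantees.
guaranteed = {"tenant1": 50, "tenant2": 25, "tenant3": 25}
used = {"tenant1": 90, "tenant2": 10, "tenant3": 0}

gentle = preemption_round(used, guaranteed, 100, 0.1, 0.2, 0.1)
aggressive = preemption_round(used, guaranteed, 100, 0.1, 1.0, 0.1)
```

In the gentle case about 8 units are reclaimed from tenant1 in the round; in the aggressive case the full excess of 40 is requested but the per-round cap limits it to about 10. Lower factors therefore slow the pace of preemption, while the dead zone and the per-queue disable flag decide whether it happens at all.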

Page 22: Architecting a multi-tenanted platform


Static Resource Allocation

Tactic: Resource Management
In Use: High

Enable CGroups for per-resource isolation for all non-YARN application workloads where applications require guaranteed access to resources. SLAs need to enforce (hard or soft) CPU allocations by vCores given to application containers.

Page 23: Architecting a multi-tenanted platform


Disk Resourcing

Tactic: Resource Management
In Use: Not Used

Enforces equal sharing of disk or dedication of spindles, allowing applications to request dedicated spindles while providing isolation through local disk IOPS at runtime, not HDFS reads/writes.

# Use CGroups resource handler
org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler

# Enable disk resource
yarn.nodemanager.resource.disk.enabled

Page 24: Architecting a multi-tenanted platform


Ambari Views

Tactic: Access Management
In Use: Emerging

Pluggable UI framework that provides a single point of entry to control access to Hadoop and ecosystem components based upon user roles and groups

Page 25: Architecting a multi-tenanted platform


Demonstration

Applying the Tactics

DEMO

Page 26: Architecting a multi-tenanted platform


Considerations

• Change Management: Required to handle updates to the cluster while ensuring that applications still run
• Data Organisation: How will data be accessed, how will it be secured, how will it be stored / managed
• Chargeback Model: Understand / communicate how tenants will be charged for the resources they consume

Page 27: Architecting a multi-tenanted platform


Thank You. Questions?

[email protected] https://au.linkedin.com/in/grant-priestley-89314762