Built-In Security for the Cloud

1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved Built-in Security For The Cloud DataWorks Summit Sydney September 2017



Built-in Security For The Cloud

DataWorks Summit Sydney

September 2017


Presenters

Jeff Sposetti

Senior Director of Product Management, Cloud

Hortonworks Data Cloud, Cloudbreak


Agenda

Introduction

Quick Demo

Security Building Blocks: Apache Ranger and Knox

Bringing It Together: Cloud and Data Lake Security

Longer Demo

Wrap Up

Q & A


Background: Ephemeral Workloads + Cloud Storage

Cloud is driving more ephemeral data processing use cases

Cloud workloads require robust integration with cloud storage

[Diagram: CLOUD STORAGE (S3, ADLS, WASB) serving WORKLOAD CLUSTERS, both durable and ephemeral]


Background: Hortonworks Data Cloud for AWS

Focuses on business agility, rather than infinite configurability and cluster management

Addresses prescriptive, ephemeral use cases around Apache Spark + Apache Hive

Pre-tuned and configured for use with Amazon S3

Learn more: http://hortonworks.com/products/cloud/aws/


Quick demo…


Security Building Blocks: Apache Ranger and Knox


Protecting the Elephant in the Castle…

Kerberos

Wire Encryption

HDFS Encryption

Apache Ranger

Network Segmentation, Firewalls

LDAP/AD

Apache Knox


Apache Knox

Proxying Services

★ Provides access to Hadoop by proxying HTTP resources

★ Ecosystem APIs and UIs, with Hadoop-oriented dispatching for Kerberos, doAs (impersonation), etc.

Authentication Services

★ REST API access, WebSSO flow for UIs

★ LDAP/AD, header-based PreAuth

★ Kerberos, SAML, OAuth

Client DSL/SDK Services

★ Scripting through a DSL

★ Using the Knox Shell classes directly as an SDK
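Knox publishes each proxied service under a predictable URL, so clients talk to one HTTPS endpoint instead of many individual service ports. The sketch below assumes the commonly documented layout `https://<host>:<port>/gateway/<topology>/<service path>`. Knox's out-of-the-box port is 8443, while the deployment described later in this deck exposes the gateway on 443. The host name, the topology name `default`, and the `knox_url` helper are hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def knox_url(host, topology, service_path, query=None, port=443, gateway_path="gateway"):
    """Build the URL of a cluster service proxied through a Knox gateway.

    Assumed layout: https://<host>:<port>/<gateway_path>/<topology>/<service_path>
    The host, topology name and port are illustrative assumptions.
    """
    # Join the path segments without doubling or dropping slashes.
    path = "/".join(part.strip("/") for part in (gateway_path, topology, service_path))
    # HTTPS only: the gateway is the single SSL-protected entry point.
    return urlunsplit(("https", f"{host}:{port}", "/" + path, urlencode(query or {}), ""))


# Example: list /tmp through WebHDFS behind the gateway (hypothetical host and topology).
print(knox_url("gateway.example.com", "default", "webhdfs/v1/tmp", {"op": "LISTSTATUS"}))
# https://gateway.example.com:443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS
```

A client would then call this URL with its LDAP/AD credentials (for example, via HTTP Basic authentication), and Knox authenticates the user before dispatching the request to the backing service.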


Apache Ranger

Comprehensive and Extensible Security Model

• Centralized platform to define, administer and manage security policies across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas)

• Extensible architecture with the ability to add custom policy conditions and user context enrichers

Fine-Grained Authorization

• Data access control at the database, table, and column level, for LDAP groups and specific users

Centralized Auditing

• Central audit location for all access requests

• Support for multiple audit destinations (HDFS, Solr, etc.)

• Real-time visual query interface

Advanced Security

• Dynamic Security Policies: Prohibition, Time, Location and Tag (Atlas)

• Dynamic Column Masking & Row Filtering
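To make "dynamic column masking and row filtering" concrete, the toy model below applies both to in-memory rows. It is a hypothetical illustration only: in a real deployment, Ranger policies are defined centrally and enforced by the Ranger plugin running inside the engine (for example, Hive). `TablePolicy`, `read_table`, the group names, and the sample data are invented for this sketch and are not Ranger's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, object]


@dataclass
class TablePolicy:
    """Hypothetical, simplified model of a table policy; not Ranger's API."""
    allowed_groups: Set[str]
    # Column name -> masking function, applied unless the user is in unmasked_groups.
    masks: Dict[str, Callable[[object], object]] = field(default_factory=dict)
    unmasked_groups: Set[str] = field(default_factory=set)
    # Optional row filter: (user's groups, row) -> keep the row?
    row_filter: Optional[Callable[[Set[str], Row], bool]] = None


def read_table(rows: List[Row], policy: TablePolicy, user_groups: Set[str]) -> List[Row]:
    # Fine-grained authorization: deny outright if none of the user's groups is allowed.
    if not user_groups & policy.allowed_groups:
        raise PermissionError("access denied by policy")
    visible = []
    for row in rows:
        # Row filtering: skip rows this user may not see.
        if policy.row_filter is not None and not policy.row_filter(user_groups, row):
            continue
        # Column masking: rewrite masked columns for users without unmasked access.
        if not user_groups & policy.unmasked_groups:
            row = {col: policy.masks[col](val) if col in policy.masks else val
                   for col, val in row.items()}
        visible.append(row)
    return visible


rows = [
    {"name": "Ann", "ssn": "123-45-6789", "region": "APAC"},
    {"name": "Bob", "ssn": "987-65-4321", "region": "EMEA"},
]
policy = TablePolicy(
    allowed_groups={"analysts", "hr"},
    masks={"ssn": lambda v: "XXX-XX-" + str(v)[-4:]},  # show only the last 4 characters
    unmasked_groups={"hr"},
    row_filter=lambda groups, row: "hr" in groups or row["region"] == "APAC",
)
print(read_table(rows, policy, {"analysts"}))
# [{'name': 'Ann', 'ssn': 'XXX-XX-6789', 'region': 'APAC'}]
```

The point of the model is that the same table yields different results per user: an analyst sees only APAC rows with masked SSNs, while an HR user sees every row unmasked.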

[Diagram: SECURITY, GOVERNANCE, and OPERATIONS spanning STORAGE and the Batch, Interactive, Search, Streaming, and Machine Learning workloads]


Bringing It Together: Cloud and Data Lake Services


[Diagram: CLOUD, DATA LAKE, SECURITY]


Key Components for Enterprise Security

SCHEMA

WHAT: Provides the Hive schema (tables, views, etc.).

WHY: If two or more workloads access the same data, the schema must be shared across those workloads.

HOW: Externalize the Hive Metastore for schema definition.

POLICY

WHAT: Defines security policies around the Hive schema.

WHY: If two or more users access the same data, policies must be consistently available and enforced.

HOW: Externalize Ranger, share it across workloads, and store policies externally.

AUDIT

WHAT: Audits user access.

WHY: Captures data access activity.

HOW: Externalize Ranger, share it across workloads, and use cloud storage for audit data.

GATEWAY

WHAT: Provides a single endpoint, protected with SSL and enabled for authentication, for access to cluster resources.

WHY: Avoids opening many ports, some of which may lack authentication or SSL protection.

HOW: Automatically deploy a centralized, protected gateway.

DIRECTORY

WHAT: Users and groups.

WHY: Provides the authentication source for users and the authorization source for groups.

HOW: Use an external LDAP or Active Directory.


Ephemeral Workloads: With Enterprise Security

Tuned and Optimized Infrastructure

Simplified, Automated Operations

S3 Integration

Protected Network Access

                  Ephemeral                  Enterprise Security
Schema            Shared (Hive Metastore)    Shared (Hive Metastore)
Authentication    Single-user                Multi-user (LDAP/AD)
Authorization     -                          Security Policies (Ranger)
Audit             -                          Audit (Ranger)


Ephemeral Workloads + Cloud Storage + Shared “Data Lake” Services

[Diagram: CLOUD STORAGE (S3, ADLS, WASB); WORKLOAD CLUSTERS, durable and ephemeral; long-running SHARED DATA LAKE SERVICES: Metastore (SCHEMA) and Ranger (POLICY)]

Define your data schema and security policies once for both your ephemeral and always-on workloads.

Secure access to workload clusters via a protected gateway enabled for authentication (AuthN) and HTTPS.


Shared Schema: Hive Metastore

Register external “Amazon RDS” instances to use with Hive Metastore

Preserve Hive schema across multiple ephemeral clusters
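Sharing the schema works because every cluster's Hive Metastore keeps its metadata in the same external database. As a sketch, assuming a PostgreSQL-backed RDS instance, the standard Hive properties involved are `javax.jdo.option.ConnectionURL`, `javax.jdo.option.ConnectionDriverName`, and `javax.jdo.option.ConnectionUserName`. The endpoint, database name, user, and the `metastore_site_xml` helper are hypothetical, and this only illustrates the Hive settings involved; it is not how Hortonworks Data Cloud registers RDS instances internally.

```python
import xml.etree.ElementTree as ET


def metastore_site_xml(jdbc_url: str, user: str, driver: str = "org.postgresql.Driver") -> str:
    """Render hive-site.xml properties that point a metastore at a shared database.

    Every ephemeral cluster rendering the same values sees the same schema.
    The password is deliberately omitted; keep it out of plain-text config.
    """
    conf = ET.Element("configuration")
    for name, value in [
        ("javax.jdo.option.ConnectionURL", jdbc_url),
        ("javax.jdo.option.ConnectionDriverName", driver),
        ("javax.jdo.option.ConnectionUserName", user),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


# Hypothetical RDS endpoint and database name.
shared_db = "jdbc:postgresql://hive-metastore.example.us-east-1.rds.amazonaws.com:5432/hive"
cluster_a = metastore_site_xml(shared_db, "hive")
cluster_b = metastore_site_xml(shared_db, "hive")
print(cluster_a == cluster_b)  # True: both clusters point at one schema store
```

When either cluster is terminated, the tables it defined remain in the RDS database and are visible to the next cluster that uses the same settings.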


Protected Network Access: Knox


Shared Security Policies: Ranger

Create a set of “Shared Data Lake Services”

Preserve Ranger Security Policies across multiple ephemeral clusters


Deployment Architecture

Access your cluster components through the protected gateway via SSL on port 443, which is open on the controller security group.

[Diagram components: CONTROLLER with the PROTECTED GATEWAY (USER ACCESS); Zeppelin; HIVE LLAP / SPARK WORKLOADS (Hive, LLAP, Spark); SHARED DATA LAKE SERVICES: Ranger for POLICY (RDS) and AUDIT (S3), Hive Metastore for SCHEMA (RDS), and DIRECTORY (LDAP/AD)]
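Because the gateway is the single entry point, the controller's security group needs only one inbound rule. The sketch below builds that rule in the `IpPermissions` shape used by the EC2 API. The source CIDR range, the `gateway_ingress` helper, and the choice to restrict the source range rather than open 443 to everyone are assumptions, not part of the documented product setup.

```python
import ipaddress
from typing import Dict, List


def gateway_ingress(cidrs: List[str], port: int = 443) -> List[Dict]:
    """Build an EC2 IpPermissions list that opens only the gateway's HTTPS port.

    Restricting the source ranges (instead of 0.0.0.0/0) is an assumption of this sketch.
    """
    ranges = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)  # ValueError on malformed input or host bits set
        if net.version != 4:
            raise ValueError(f"this sketch handles IPv4 only: {cidr}")
        ranges.append({"CidrIp": str(net), "Description": "protected gateway (HTTPS)"})
    return [{"IpProtocol": "tcp", "FromPort": port, "ToPort": port, "IpRanges": ranges}]


perms = gateway_ingress(["203.0.113.0/24"])  # hypothetical corporate range
print(perms[0]["FromPort"], perms[0]["ToPort"])  # 443 443
# With boto3, this list could be passed as:
#   ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=perms)
```

Every other service port stays closed; Hive, Spark, and Zeppelin UIs are reached through the gateway rather than directly.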


Hortonworks Data Cloud + Shared Data Lake Services

1. Register an authentication source (e.g., LDAP/AD).

2. Create a “Shared Data Lake”, specifying an S3 bucket and an RDS instance.

3. When you create a cluster, “attach” it to the Shared Data Lake Services:
• for multi-user AuthN (LDAP/AD)
• for AuthZ + Audit (Ranger)
• for schema (Hive Metastore)

PREREQUISITES

• LDAP/AD

• S3 Bucket

• RDS Instance


Longer demo…


General Guidelines

Think ephemeral. Keep all of your data in S3 and your metadata in RDS; do not create tables or files in the cluster's local HDFS.

For data lakes, the Hive warehouse is set up on S3. Create tables in this location rather than in individual S3 buckets; this makes them easier to manage.

Use Hive “external tables” for tables that live outside this warehouse, typically when the data is ingested through a path outside of Hadoop.

Create S3 bucket policies that exactly match usage, so that you can spin up clusters with least privilege.
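As an illustration of scoping access to actual usage, the sketch below generates a bucket policy that lets one cluster's IAM role list and read (and optionally write) a single prefix. The bucket name, warehouse prefix, role ARN, and `bucket_policy` helper are hypothetical, and the action list is a starting assumption: depending on the Hadoop S3A version and the workload, additional actions may be needed (for example, for multipart uploads).

```python
import json


def bucket_policy(bucket: str, prefix: str, role_arn: str, read_only: bool = False) -> dict:
    """Sketch of an S3 bucket policy scoped to one prefix and one IAM role."""
    prefix = prefix.strip("/")
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Allow listing, but only under the prefix this workload uses.
                "Sid": "ListWorkloadPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}},
            },
            {   # Object reads (and writes, unless read_only) under the same prefix.
                "Sid": "WorkloadObjectAccess",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket, warehouse prefix, and role ARN.
policy = bucket_policy("example-datalake", "apps/hive/warehouse",
                       "arn:aws:iam::123456789012:role/example-reporting-cluster",
                       read_only=True)
print(json.dumps(policy, indent=2))
```

A read-only variant like this suits a reporting cluster that only queries the warehouse; a cluster that writes tables would use `read_only=False`.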


Wrap Up


Takeaways

Cloud is driving more ephemeral data processing use cases

Ephemeral workloads leverage cloud storage

This pattern is driving an architectural approach for “Shared Data Lake Services”

Building blocks are Apache Ranger and Apache Knox

Resource Link

Hortonworks Data Cloud https://hortonworks.com/products/cloud/aws/

Apache Ranger https://hortonworks.com/apache/ranger/

Apache Knox https://hortonworks.com/apache/knox-gateway/


Learn More

Enterprise ready security and governance for Hadoop ecosystem

Breakout Session: Thursday, September 21 @ 3:10 PM

https://dataworkssummit.com/sydney-2017/sessions/treat-your-enterprise-data-lake-indigestion-enterprise-ready-security-and-governance-for-hadoop-ecosystem

Security, Governance and Cybersecurity

Birds of a Feather: Thursday, September 21 @ 6:00 PM

https://dataworkssummit.com/sydney-2017/birds-of-a-feather/security-governance-cybersecurity/


Thank You

https://hortonworks.com/products/cloud/aws/

https://hortonworks.com/apache/ranger/

https://hortonworks.com/apache/atlas/