Built-In Security for the Cloud
Transcript of Built-In Security for the Cloud
© Hortonworks Inc. 2011 – 2017. All Rights Reserved

Built-in Security for the Cloud
DataWorks Summit Sydney, September 2017
Presenters
Jeff Sposetti
Senior Director of Product Management, Cloud
Hortonworks Data Cloud, Cloudbreak
Agenda
Introduction
Quick Demo
Security Building Blocks: Apache Ranger and Knox
Bringing It Together: Cloud and Data Lake Security
Longer Demo
Wrap Up
Q & A
Background: Ephemeral Workloads + Cloud Storage
Cloud is driving more ephemeral data processing use cases
Cloud requires a robust integration with cloud storage
[Diagram: durable cloud storage (S3, ADLS, WASB) feeding ephemeral workload clusters]
Background: Hortonworks Data Cloud for AWS
Focuses on business agility, rather than infinite configurability and cluster management
Addresses prescriptive, ephemeral use cases around Apache Spark + Apache Hive
Pre-tuned and configured for use with Amazon S3
Learn more: http://hortonworks.com/products/cloud/aws/
Security Building Blocks: Apache Ranger and Knox
Protecting the Elephant in the Castle
[Diagram: layered defenses around the cluster]
• Kerberos
• Wire Encryption
• HDFS Encryption
• Apache Ranger
• Network Segmentation, Firewalls
• LDAP/AD
• Apache Knox
Apache Knox

Proxying Services
★ Provide access to Hadoop via proxying of HTTP resources
★ Ecosystem APIs and UIs, plus Hadoop-oriented dispatching for Kerberos, doAs (impersonation), etc.
Authentication Services
★ REST API access, WebSSO flow for UIs
★ LDAP/AD, Header-based PreAuth
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through DSL
★ Using Knox Shell classes directly as SDK
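The proxying pattern above can be made concrete: instead of hitting each service on its own host and port, clients hit a single Knox gateway URL of the form `/gateway/{topology}/{service}`. A minimal sketch of that URL rewriting, with hostnames, the `default` topology name, and port 8443 as assumptions rather than values from the deck:

```python
# Sketch: mapping a service path to its Knox-proxied HTTPS equivalent.
# Hostname, topology name, and port below are illustrative assumptions.

def knox_url(gateway_host, topology, service_path, port=8443):
    """Build the HTTPS URL for a service proxied through an Apache Knox gateway."""
    return f"https://{gateway_host}:{port}/gateway/{topology}/{service_path}"

# Direct WebHDFS access would hit the NameNode on its own port;
# through Knox the same call goes to one SSL-protected endpoint:
url = knox_url("knox.example.com", "default", "webhdfs/v1/tmp?op=LISTSTATUS")
print(url)
```

Because every service is reached through this one endpoint, only the gateway port needs to be opened, and authentication (LDAP/AD, SAML, etc.) is enforced in one place.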
Apache Ranger

Comprehensive and Extensible Security Model
• Centralized platform to define, administer and manage security policies across Hadoop components (HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas)
• Extensible Architecture with ability to add custom policy conditions, user context enrichers
Fine-Grained Authorization
• Data access control by Database, Table, and Column, for LDAP Groups & Specific Users
Centralized Auditing
• Central audit location for all access requests
• Support for multiple audit destinations (HDFS, Solr, etc.)
• Real-time visual query interface
Advanced Security
• Dynamic Security Policies: Prohibition, Time, Location and Tag (Atlas)
• Dynamic Column Masking & Row Filtering
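To make the fine-grained authorization model concrete, here is a sketch of the kind of resource-based policy body that can be submitted to the Ranger Admin REST API (`POST /service/public/v2/api/policy`). The service instance, database, table, and group names are hypothetical:

```python
# Sketch of a Ranger Hive policy: grant SELECT on sales.orders to the
# "analysts" LDAP/AD group. All names below are illustrative assumptions.
policy = {
    "service": "cl1_hive",                     # Ranger service instance (assumed name)
    "name": "analysts-read-sales-orders",
    "resources": {
        "database": {"values": ["sales"]},
        "table":    {"values": ["orders"]},
        "column":   {"values": ["*"]},
    },
    "policyItems": [
        {
            "groups":   ["analysts"],          # group resolved from LDAP/AD
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}
```

Because policies live in Ranger rather than in any one cluster, the same policy document governs every workload cluster that shares the Ranger instance.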
[Diagram: Ranger provides security across batch, interactive, streaming, machine learning, and search workloads, alongside the operations, governance, and storage layers]
Bringing It Together: Cloud and Data Lake Services
Key Components for Enterprise Security
SCHEMA
• What: Provides the Hive schema (tables, views, etc.).
• Why: If you have 2+ workloads accessing the same data, you need to share the schema across those workloads.
• How: Externalize the Hive Metastore for schema definition.

POLICY
• What: Defines security policies around the Hive schema.
• Why: If you have 2+ users accessing the same data, you need policies to be consistently available and enforced.
• How: Externalize Ranger, share it across workloads, and store policies externally.

AUDIT
• What: Audits user access.
• Why: Captures data access activity.
• How: Externalize and share Ranger across workloads; leverage cloud storage for audit data.

GATEWAY
• What: Provides a single endpoint that can be protected with SSL and enabled for authentication to access cluster resources.
• Why: Avoids opening many ports, some potentially without authentication or SSL protection.
• How: Deploy a centralized protected gateway automatically.

DIRECTORY
• What: Users and groups.
• Why: Provides the authentication source for users and the authorization source for groups.
• How: Leverage external LDAP or Active Directory.
Ephemeral Workloads: With Enterprise Security
Ephemeral → With Enterprise Security

• Tuned and Optimized Infrastructure
• Simplified, Automated Operations
• S3 Integration
• Protected Network Access

Schema:         Shared (Hive Metastore) → Shared (Hive Metastore)
Authentication: Single-user → Multi-User (LDAP/AD)
Authorization:  – → Security Policies (Ranger)
Audit:          – → Audit (Ranger)
Ephemeral Workloads + Cloud Storage + Shared “Data Lake” Services
[Diagram: durable cloud storage (S3, ADLS, WASB), ephemeral workload clusters, and long-running Shared Data Lake Services (Hive Metastore for schema, Ranger for policies)]
Define your data schema and security policies once for your ephemeral and always-on workloads.
Secure access to workload clusters via a Protected Gateway enabled for AuthN and HTTPS.
Shared Schema: Hive Metastore
Register external Amazon RDS instances to use with the Hive Metastore
Preserve Hive schema across multiple ephemeral clusters
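The deck does not show the configuration itself, but externalizing the Metastore typically means pointing Hive's JDBC connection properties in hive-site.xml at the RDS endpoint. A sketch, where the endpoint, database name, and user are placeholder assumptions:

```xml
<!-- Sketch: Hive Metastore backed by an external Amazon RDS (MySQL) instance.
     Endpoint, database, and credentials below are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://my-metastore.abc123.us-east-1.rds.amazonaws.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
```

Since the schema lives in RDS rather than on any cluster node, it survives cluster termination and can be attached to the next ephemeral cluster.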
Shared Security Policies: Ranger
Create a set of “Shared Data Lake Services”
Preserve Ranger Security Policies across multiple ephemeral clusters
Deployment Architecture
Access your cluster components through the protected gateway via SSL on port 443, open on the controller security group.
[Diagram: user access to Zeppelin and Hive LLAP / Spark workload clusters goes through a protected gateway on the controller; clusters attach to Shared Data Lake Services: policy (Ranger on RDS), audit (S3), schema (Hive Metastore on RDS), and directory (LDAP/AD)]
Hortonworks Data Cloud + Shared Data Lake Services
1. Register an Authentication Source (e.g., LDAP/AD).
2. Create a "Shared Data Lake"; specify the S3 Bucket & RDS.
3. When you create a cluster, "attach" it to the Shared Data Lake Services:
• for Multi-User AuthN (LDAP/AD)
• for AuthZ + Audit (Ranger)
• for Schema (Hive Metastore)
PREREQUISITES
• LDAP/AD
• S3 Bucket
• RDS Instance
General Guidelines
Think Ephemeral. Keep all of your data and metadata in S3 and RDS, respectively; do not create tables or files in the local HDFS.
The Hive warehouse is set up on S3 for data lakes. Create tables in this location rather than in individual S3 buckets; it will make them easier to manage.
Use Hive "external tables" for tables outside this warehouse, typically when the data is ingested through some path outside of Hadoop.
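For illustration, an external table over S3 data might look like the following; the table definition, columns, and bucket path are hypothetical, and `s3a://` is the usual Hadoop S3 connector scheme:

```sql
-- Sketch: external Hive table whose data lives on S3, outside the warehouse.
-- Table name, columns, and bucket path are placeholders.
CREATE EXTERNAL TABLE web_logs (
  ts      TIMESTAMP,
  user_id STRING,
  url     STRING
)
STORED AS ORC
LOCATION 's3a://my-ingest-bucket/web_logs/';
```

Dropping an external table removes only the metadata from the Metastore; the underlying S3 objects are left intact for the ingest pipeline that owns them.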
Create S3 bucket policies that exactly match usage so that you can spin up clusters with least privilege.
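As an illustration of the least-privilege point, a bucket policy can restrict a cluster's role to read-only access on exactly the data it needs. The account ID, role name, and bucket name below are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WorkloadReadOnly",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:role/cluster-workload-role"},
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-datalake-bucket",
        "arn:aws:s3:::my-datalake-bucket/*"
      ]
    }
  ]
}
```

A write-capable workload would get a separate statement scoped to its own prefix, so a compromised or misconfigured ephemeral cluster can touch only the data it was spun up for.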
Takeaways
Cloud is driving more ephemeral data processing use cases
Ephemeral workloads leverage cloud storage
This pattern is driving an architectural approach for “Shared Data Lake Services”
Building blocks are Apache Ranger and Apache Knox
Resource Links
Hortonworks Data Cloud https://hortonworks.com/products/cloud/aws/
Apache Ranger https://hortonworks.com/apache/ranger/
Apache Knox https://hortonworks.com/apache/knox-gateway/
Learn More
Enterprise ready security and governance for Hadoop ecosystem
Breakout Session: Thursday, September 21 @ 3:10p
https://dataworkssummit.com/sydney-2017/sessions/treat-your-enterprise-data-lake-indigestion-enterprise-ready-security-and-governance-for-hadoop-ecosystem
Security, Governance and Cybersecurity
Birds of a Feather: Thursday, September 21 @ 6:00p
https://dataworkssummit.com/sydney-2017/birds-of-a-feather/security-governance-cybersecurity/