Cloudera training: secure your Cloudera cluster

Page 1: Cloudera training: secure your Cloudera cluster

© Cloudera, Inc. All rights reserved.

Cloudera training: secure your Cloudera cluster

Page 2: Cloudera training: secure your Cloudera cluster

The demand for skills is high and Hadoop is the future. Customers cannot afford to move slowly in staffing their Big Data projects. Customers are building plans to ensure projects are staffed with skilled employees, and supported by a qualified services provider.

Job Trends from Indeed.com

What are you most concerned about when it comes to your readiness for Big Data and Hadoop?

Cloudera MDP webinar poll results, July 2016

Page 3: Cloudera training: secure your Cloudera cluster

Why Cloudera training? Aligned to best practices and the pace of change

1. Broadest range of courses: Learning paths for Developer, Admin, Analyst

2. Most experienced instructors: More than 40,000 trained since 2009

3. Leader in certification: Over 12,000 accredited Cloudera professionals

4. Trusted source for training: 100,000+ people have attended online courses

5. State of the art curriculum: Courses updated as Hadoop evolves

6. Widest geographic coverage: Most classes offered, in 50 cities worldwide plus online

7. Most relevant platform & community: CDH deployed more than all other distributions combined

8. Depth of training material: Hands-on labs and VMs support live instruction

9. Ongoing learning: Video tutorials and e-learning complement training

10. Commitment to big data education: University partnerships to teach Hadoop in colleges

Page 4: Cloudera training: secure your Cloudera cluster

Creating leaders in the field: Training enables Big Data solutions and innovation

• 94% would recommend or highly recommend Cloudera training to friends or colleagues

• 66% draw on lessons from Cloudera training on at least a monthly basis

• 40% develop new apps or perform business-critical analyses as a result of training alone

• 88% indicate Cloudera training provided the Hadoop expertise their roles require

Sources: Cloudera Past Public Training Participant Study, December 2012; Cloudera Customer Satisfaction Study, January 2013.

Page 5: Cloudera training: secure your Cloudera cluster

What is available from Cloudera University?

• Private training: Courses delivered at a location of the customer's choice to an internal audience

• Public training: Courses regularly scheduled around the globe; schedule available on the web

• Virtual training: Live training accessed via the internet; available for public and private courses

• OnDemand training: Pre-recorded lectures with the same content and exercises as the live training options

• Certification: Rigorously developed and meaningful bodies of knowledge

OnDemand | Virtual live classroom | Private onsite | Public live classroom

Page 6: Cloudera training: secure your Cloudera cluster

Suggested Cloudera University curricula

Developers

• Python/Scala Training

• Developer for Spark and Hadoop

• CCA: Spark and Hadoop Developer

• Spark ML & Kafka modules

• Topic-specific training (Search, HBase)

• Hands-on practice

• CCP: Data Engineer

Administrators

• Cloudera Administration training

• CCA: Administrator

• Cloudera Security OnDemand

Data Analysts/Data Scientists

• Data Analyst: Using Hive, Pig & Impala

• CCA: Data Analyst

• Cloudera Data Science

Page 7: Cloudera training: secure your Cloudera cluster

Security for Hadoop

Carlo Lazzaris | Technical Instructor

Page 8: Cloudera training: secure your Cloudera cluster

Security Webinar Agenda

1. The need for Hadoop Security

Hacker news and legal regulations

2. Cloudera Security Implementation

Five levels of security

3. How to secure your Cloudera cluster

Cloudera Documentation

Cloudera professional services

Cloudera OnDemand security course

Page 9: Cloudera training: secure your Cloudera cluster

The need for Hadoop security

Page 10: Cloudera training: secure your Cloudera cluster

Unguarded data stores are the victims

Page 11: Cloudera training: secure your Cloudera cluster

Regulatory Compliance

Organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater, for breaching GDPR
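
The fine ceiling above can be worked through as simple arithmetic. The sketch below is illustrative only: the function name and the sample turnover figures are assumptions, and it models just the upper bound stated on the slide, not how a regulator sets an actual fine.

```python
def max_gdpr_fine(annual_global_turnover_eur: float) -> float:
    """Upper bound of the fine: the greater of 4% of turnover or EUR 20 million."""
    four_percent = 0.04 * annual_global_turnover_eur
    flat_cap = 20_000_000.0
    return max(four_percent, flat_cap)


if __name__ == "__main__":
    # Hypothetical turnover figures, chosen to show both branches of the rule.
    for turnover in (100_000_000, 2_000_000_000):
        print(f"turnover EUR {turnover:,} -> ceiling EUR {max_gdpr_fine(turnover):,.0f}")
```

For smaller organizations the flat €20 million figure dominates; the 4% term only takes over above €500 million in turnover.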

Page 12: Cloudera training: secure your Cloudera cluster

Cloudera security implementation

Page 13: Cloudera training: secure your Cloudera cluster

Cloudera Enterprise CDH

The modern platform for machine learning and analytics optimized for the cloud

Platform diagram labels:

• Core services: Data Engineering, Operational Database, Analytic Database, Data Science

• Extensible services

• Data Catalog, Ingest & Replication, Security, Governance, Workload Management

• Storage services: S3, ADLS, HDFS, Kudu

Page 14: Cloudera training: secure your Cloudera cluster

Open platform services: Built for multi-function analytics | Optimized for cloud

• Unified security – protects sensitive data with consistent controls, even for transient and recurring workloads

• Consistent governance – enables secure self-service access to all relevant data and increases compliance

• Easy workload management – increases user productivity and boosts job predictability

• Flexible ingest and replication – aggregates a single copy of all data, provides disaster recovery, and eases migration

• Shared catalog – defines and preserves structure and business context of data for new applications and partner solutions

Page 15: Cloudera training: secure your Cloudera cluster

Cloudera Enterprise-Grade Security and Governance

• Identity: Validate users by membership in enterprise directory. Technical concepts: Authentication, User/group mapping. Component: Cloudera Manager

• Access: Defining what users and applications can do with data. Technical concepts: Permissions, Authorization. Component: Apache Sentry

• Data Protection: Shielding data in the cluster from unauthorized visibility. Technical concepts: Encryption at rest & in motion. Component: Navigator Encrypt & Key Trustee

• Visibility: Reporting on where data came from and how it’s being used. Technical concepts: Auditing, Lineage. Component: Cloudera Navigator

Page 16: Cloudera training: secure your Cloudera cluster

Cloudera Certified Technology Partners

Partner categories: Data Sources | Data Ingest | Process, Refine & Prep | Data Discovery | Advanced Analytics

Connected Machines/Data sources

Other Data Sources

Page 17: Cloudera training: secure your Cloudera cluster

Certification ensures a product integrates securely

• Authentication: Authenticate via Kerberos or LDAP

• Authorization: Work with Apache Sentry for Hive, Impala, Search, and HDFS

• Encryption: Support HDFS transport encryption and at-rest encryption; support SSL/TLS connection encryption

Page 18: Cloudera training: secure your Cloudera cluster

Vulnerability Response and Process

Vulnerability reports (upstream, internal, external) → Fix → Publish

Page 19: Cloudera training: secure your Cloudera cluster

Cluster Security Levels

Page 20: Cloudera training: secure your Cloudera cluster

Cloudera Enterprise

The modern platform for machine learning and analytics optimized for the cloud

Page 21: Cloudera training: secure your Cloudera cluster

Enterprise Encryption Performance

Page 22: Cloudera training: secure your Cloudera cluster

Disclaimer

This talk serves as a general guideline for security implementation on Hadoop.

The actual implementation procedures and scope of implementation vary on a case-by-case basis, and should be assessed by Cloudera’s Professional Services team or certified Cloudera SI Partners.

Page 23: Cloudera training: secure your Cloudera cluster

Non-secure #0

Data Free for All

Page 24: Cloudera training: secure your Cloudera cluster

Firewall

ActiveDirectory/KDC

Hadoop cluster

Cloudera Manager

Gateway node

Cloudera Worker nodes

Datacenter

Applications

Page 25: Cloudera training: secure your Cloudera cluster

4 modes of Identity Management

1. Simple Authentication

2. Kerberos

3. LDAP

4. SAML

Consideration in large enterprises:

• File group ownership

• AD integration

• SSSD or Centrify

via SSSD

via

Page 26: Cloudera training: secure your Cloudera cluster

Simple Authentication: detecting the user

Firewall

ActiveDirectory

Master

Worker Worker Worker

Cloudera Manager

Master

(SSSD/Centrify)

Page 27: Cloudera training: secure your Cloudera cluster

Simple authentication = no authentication
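
A minimal sketch of why this holds: under simple authentication the client states who it is and nothing checks the claim. The helper below is hypothetical and only mirrors the commonly documented Hadoop client behaviour, in which the `HADOOP_USER_NAME` environment variable, when set, overrides the operating-system login name.

```python
import getpass
import os


def effective_hadoop_user(environ=None, os_user=None):
    """Return the user name a client would assert under simple authentication.

    The value comes entirely from the client side: an environment variable
    if present, otherwise the local OS login. No password or ticket is involved.
    """
    environ = os.environ if environ is None else environ
    os_user = getpass.getuser() if os_user is None else os_user
    return environ.get("HADOOP_USER_NAME") or os_user


if __name__ == "__main__":
    # A user logged in as "mallory" can claim to be the HDFS superuser.
    print(effective_hadoop_user({"HADOOP_USER_NAME": "hdfs"}, os_user="mallory"))
```

Because the asserted name is all the cluster sees, any user who can reach the cluster can impersonate any other, which is the gap Kerberos closes.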

Page 28: Cloudera training: secure your Cloudera cluster

Minimal Security #1

Reduce Risk Exposure

Page 29: Cloudera training: secure your Cloudera cluster

How it works: Authentication

• Web UIs: LDAP and SAML authentication options

• SQL Access: LDAP/AD and Kerberos authentication options

• Command Lines: Kerberos authentication; automation provided by Cloudera Manager to leverage Active Directory (AD)

1. User authenticates to AD or KDC

2. Authenticated user gets Kerberos ticket

3. Ticket grants access to services, e.g. Impala

User [ssmith]

Password [***** ]

Page 30: Cloudera training: secure your Cloudera cluster

Kerberos

EXAMPLE.COM

KDC

[email protected]

Hadoop

[email protected]

user

Strong Authentication

KDC: Key Distribution Center

• MIT

• Active Directory (more common)

Principal parts: primary, realm
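
As an illustration of the primary and realm labels, the sketch below splits a Kerberos principal of the usual form `primary[/instance]@REALM` into its parts. The function and the sample principals are hypothetical and are not taken from the slide.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]  # e.g. a host name for service principals
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its components."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


if __name__ == "__main__":
    print(parse_principal("alice@EXAMPLE.COM"))
    print(parse_principal("hdfs/node1.example.com@EXAMPLE.COM"))
```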

Page 31: Cloudera training: secure your Cloudera cluster

Kerberos

Considerations in large enterprises

Time synchronization

CM Kerberos Wizard

• Configure AD to create a Kerberos principal for the CM server, and to delegate to CM the ability to create and manage Kerberos principals
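
Time synchronization matters because Kerberos rejects requests whose timestamps differ too much from the KDC's clock. The sketch below assumes the widely used default tolerance of five minutes; the names are hypothetical and your KDC may be configured differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed default tolerance; both MIT Kerberos and Active Directory
# commonly ship with five minutes, but it is configurable.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_clock_skew(client_time, kdc_time, tolerance=MAX_CLOCK_SKEW):
    """Return True if the two clocks are close enough for Kerberos to accept."""
    return abs(client_time - kdc_time) <= tolerance


if __name__ == "__main__":
    kdc = datetime(2018, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    drifted = kdc + timedelta(minutes=7)
    print(within_clock_skew(drifted, kdc))  # a host that drifted 7 minutes fails
```

In practice this is why every cluster host and the KDC should sync against the same time source.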

Page 33: Cloudera training: secure your Cloudera cluster

Kerberos Authentication

* LDAP over SSL

Page 34: Cloudera training: secure your Cloudera cluster

Authorization/Access Control

Access Control Lists (ACLs): HDFS file ACLs, YARN job submission, HBase ACLs, Oozie ACLs

Sentry managed (RBAC): Hive, Impala
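
The role-based model can be sketched in a few lines. This is not Sentry's API; it is a hypothetical illustration of the general RBAC idea, where users belong to groups, groups are granted roles, and roles carry privileges on objects. All names and sample data are made up.

```python
from typing import Dict, Set, Tuple

Privilege = Tuple[str, str]  # (action, object), e.g. ("SELECT", "sales.orders")


def is_allowed(
    user: str,
    action: str,
    obj: str,
    user_groups: Dict[str, Set[str]],
    group_roles: Dict[str, Set[str]],
    role_privileges: Dict[str, Set[Privilege]],
) -> bool:
    """Walk user -> groups -> roles -> privileges and look for a match."""
    for group in user_groups.get(user, set()):
        for role in group_roles.get(group, set()):
            if (action, obj) in role_privileges.get(role, set()):
                return True
    return False


# Hypothetical sample data.
USER_GROUPS = {"ssmith": {"analysts"}}
GROUP_ROLES = {"analysts": {"sales_reader"}}
ROLE_PRIVILEGES = {"sales_reader": {("SELECT", "sales.orders")}}

if __name__ == "__main__":
    print(is_allowed("ssmith", "SELECT", "sales.orders",
                     USER_GROUPS, GROUP_ROLES, ROLE_PRIVILEGES))
```

Granting to roles rather than to individual users keeps the privilege set manageable as people join, move, and leave.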

Page 35: Cloudera training: secure your Cloudera cluster

Auditing

Page 36: Cloudera training: secure your Cloudera cluster

Backup/Disaster Recovery

Cloudera Backup/Disaster Recovery (BDR)

• A high-performance data replicator

• Copies incremental changes from the source cluster on specified schedules

Supports:

• Kerberos

• Data encryption

• HDFS replication to cloud

Page 37: Cloudera training: secure your Cloudera cluster

Kerberized BDR Best Practice

Production cluster: KDC, realm PROD.EXAMPLE.COM

DR cluster: KDC, realm DR.EXAMPLE.COM

Cloudera BDR replicates between the two clusters; a cross-realm trust links the two KDCs

Page 38: Cloudera training: secure your Cloudera cluster

More Security #2

Managed, Secure, Protected

Page 39: Cloudera training: secure your Cloudera cluster

Data In-Motion Encryption

RPC encryption

Data transport encryption

• Supports AES CTR, up to 256-bit key length

HTTP TLS/SSL encryption

• No self-signed certificates in production

Master

Worker Worker Worker

Master

Application

RPC encryption

Transport encryption

TLS/SSL
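
On the client side, the rule against self-signed certificates amounts to always verifying the server's certificate chain and host name. A minimal sketch using Python's standard `ssl` module follows; the function name and the optional CA bundle path are assumptions.

```python
import ssl


def strict_client_context(ca_file=None):
    """Build a TLS client context that verifies certificates and host names.

    ca_file is an optional path to a CA bundle (hypothetical parameter);
    when omitted, the system's default trust store is used.
    """
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = strict_client_context()
    print(ctx.verify_mode, ctx.check_hostname)
```

With this context, a server presenting a self-signed certificate fails the handshake unless that certificate has been explicitly added to the trust store.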

Page 40: Cloudera training: secure your Cloudera cluster

Data At-Rest Encryption

Transparent encryption

Supports any Hadoop application

Encryption Zone

$ hadoop key create mykey

$ hadoop fs -mkdir /zone

$ hdfs crypto -createZone -keyName mykey -path /zone

/

/tmp /zone

foo bar

Encryption zone
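
If the three commands above need to be scripted, one option is to build them as argument lists so they can be passed to `subprocess.run` without shell quoting problems. The helper below is a hypothetical sketch; it only assembles the commands shown on the slide and does not execute anything.

```python
import shlex
from typing import List


def encryption_zone_commands(key_name: str, path: str) -> List[List[str]]:
    """Return the argv lists for creating a key, a directory, and an encryption zone."""
    if not path.startswith("/"):
        raise ValueError("path must be an absolute HDFS path")
    return [
        ["hadoop", "key", "create", key_name],
        ["hadoop", "fs", "-mkdir", path],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", path],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; running requires a configured cluster.
    for argv in encryption_zone_commands("mykey", "/zone"):
        print(shlex.join(argv))
```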

Page 41: Cloudera training: secure your Cloudera cluster

Key Management Server Deployment (non-prod)

HDFS NameNode

Client

Java Keystore

KMS

Keystore file

Separation of duties

• The Encryption Zone Key (EZK) is stored in the KMS server

• The HDFS superuser cannot decrypt files

Page 42: Cloudera training: secure your Cloudera cluster

Key Management Server/Key Trustee Server Deployment

HDFS NameNode

Client

Key Trustee KMS

Key Trustee KMS

Firewall

Key Trustee Server

(Active)

Key Trustee Server

(Passive)

synchronization

(or more)

Page 43: Cloudera training: secure your Cloudera cluster

KMS+KTS+HSM Deployment

HDFS NameNode

Client HSM KMS

HSM KMS

Firewall

Key Trustee Server

(Active)

Key Trustee Server

(Passive)

synchronization

Key HSM

(or more)

Key HSM

HSM

HSM

Page 44: Cloudera training: secure your Cloudera cluster

Troubleshooting: Encryption Performance Anomaly

• Configuration

• AES-NI Hardware acceleration

• OpenSSL library

• Entropy
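
Two of the items above can be checked quickly on a Linux x86 host. The sketch below is an assumption-laden illustration: it looks for the `aes` CPU flag in `/proc/cpuinfo` as a sign of AES-NI support and reads the kernel's available-entropy counter. Function names are hypothetical, and other platforms expose this information differently.

```python
from pathlib import Path


def has_aes_ni(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo content lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "aes" in value.split():
            return True
    return False


def read_entropy_avail(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
    except OSError:
        cpuinfo = ""
    print("AES-NI flag present:", has_aes_ni(cpuinfo))
    print("entropy_avail:", read_entropy_avail())
```

The OpenSSL library item is better checked with Hadoop's own `hadoop checknative` command, which reports whether the native OpenSSL bindings were loaded.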

Page 45: Cloudera training: secure your Cloudera cluster

Fine Grained Access Control with Apache Sentry

Page 46: Cloudera training: secure your Cloudera cluster

Most Security #3

Secure Data Vault

Page 47: Cloudera training: secure your Cloudera cluster

Level 3 Secure Data Vault

• All data, both at rest and in transit, is encrypted

• Key management system is fault-tolerant

• Auditing mechanisms comply with industry, government, and regulatory standards (PCI, HIPAA, NIST, for example)

• Auditing extends from the enterprise data hub (EDH) to the other systems that integrate with it

• Cluster administrators are well-trained

• Security procedures have been certified by an expert

• Cluster can pass technical review

Page 48: Cloudera training: secure your Cloudera cluster

Data Redaction

Personally Identifiable Information (PII)

• PCI-DSS, HIPAA

Best practices followed

Passwords

• Stored in credential files, not in configuration files

Logs and queries

• Cloudera Manager
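
Log and query redaction is typically rule-based: a regular expression finds sensitive values and a replacement string masks them. The sketch below shows the idea for 16-digit card numbers; the pattern, replacement text, and function name are illustrative assumptions, not Cloudera Manager's shipped rules.

```python
import re

# Hypothetical rule: four groups of four digits, optionally separated
# by a space or a hyphen.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def redact(text: str, pattern=CARD_PATTERN, replacement="XXXX-XXXX-XXXX-XXXX") -> str:
    """Replace every match of the pattern with the masking string."""
    return pattern.sub(replacement, text)


if __name__ == "__main__":
    line = "SELECT * FROM payments WHERE card = '4111 1111 1111 1111'"
    print(redact(line))
```

Real rule sets need testing against representative logs, since an overly broad pattern can mask harmless identifiers and a narrow one can miss variants.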

Page 49: Cloudera training: secure your Cloudera cluster

Full Encryption

Encrypt Data Spills

• MapReduce

• Impala

• Hive

• Flume

OS-level encryption

• Navigator Encrypt

Page 50: Cloudera training: secure your Cloudera cluster

How to secure your Cloudera cluster

Page 51: Cloudera training: secure your Cloudera cluster

Cloudera Documentation

Page 52: Cloudera training: secure your Cloudera cluster

Cloudera Professional Services security engagement

• Review security requirements and provide an overview of data security policies

• Audit architecture and current systems for security policies and best practices

• Custom tailor a security reference architecture

• Optimize OS and Java to take advantage of hardware-based crypto-acceleration

• Install and configure Kerberos with MIT Kerberos KDC or Active Directory

• Install and configure Sentry and Cloudera Navigator (license required)

• Install and configure Navigator Encrypt and Key Trustee with an HSM root of trust

• Review fine-grained permissions on sample data using Sentry

• Review audit and lineage on sample data using Navigator

• Use Cloudera Manager and Hue to review security integration for users

• Enable and configure HDFS encryption

https://www.cloudera.com/more/services-and-support/professional-services/security-integration-pilot.html

Page 53: Cloudera training: secure your Cloudera cluster

Cloudera online OnDemand security course

• Online self-paced training course: https://ondemand.cloudera.com

• Launch planned for mid-February 2018

• An estimated 3 days' worth of content, covering Cloudera security levels 1 and 2

• Currently about 375 slides, with 9 detailed chapters and 16 instructor demonstrations:

1. Security overview

2. Security Architecture

3. Host Security

4. Encrypting Data in motion

5. Authentication

6. Authorization

7. Encrypting Data at Rest

8. Auditing

9. Additional Considerations: Data Governance

Page 54: Cloudera training: secure your Cloudera cluster

Ondemand security course instructor guided demos

1. Potential Attack vectors

2. Securing the cluster hosts

3. Generating and managing keys for TLS

4. Configuring Cloudera Manager for TLS

5. Encrypting Data in Motion

6. Hadoop default authentication

7. Kerberizing Cluster with MIT Kerberos

8. Kerberizing Cluster with Active Directory

9. Configuring Authorization with Cloudera Manager

10. Controlling access to YARN

11. Controlling access to HDFS

12. Controlling access to Tables

13. Enabling HDFS Encryption

14. Protecting local data with NavEncrypt

15. Using Navigator for auditing

16. Reassessing cluster security

Page 55: Cloudera training: secure your Cloudera cluster

Ondemand security course disclaimer

THIS IS REALLY IMPORTANT:

The examples in this course are based on CM/CDH 5.12, running in a cloud-based deployment on a cluster using the CentOS 7.2 operating system.

Given the almost limitless permutations of possible configurations, including different versions of CDH, Cloudera Manager, operating systems, directory servers, Kerberos servers, web browsers, and other tools, as well as variations in policies, laws, and practices that affect each organization differently, it's impossible for a training course to cover all aspects of security.

This course is meant to provide a background that will help you to understand many important concepts and techniques, but is not intended as a replacement for the relevant documentation or a consulting engagement with an expert who can provide advice based on your specific requirements.

• Disclaimers apply because of the variety and permutations of security configurations

• Versions used: CDH 5.12 and CentOS 7.2

Page 56: Cloudera training: secure your Cloudera cluster

Ondemand security course scenario

• Many of our demonstrations are based on a hypothetical scenario

• However, the concepts should apply to nearly any organization

• Loudacre Mobile is a fast-growing wireless carrier

• Employees serving in a variety of roles

• Data ingested from many sources, in many formats

• Data processed by many tools

Page 57: Cloudera training: secure your Cloudera cluster

Ondemand security course environment

Page 58: Cloudera training: secure your Cloudera cluster

Comprehensive demonstration cluster

Page 59: Cloudera training: secure your Cloudera cluster

Sample chapter structure: Encrypting Data in Motion

• Encryption Fundamentals

• Certificates

• Key Management

Instructor-Led Demonstration: Generating and Managing Keys for TLS

• Configuring Cloudera Manager for TLS

Instructor-Led Demonstration: Configuring Cloudera Manager for TLS

• Encrypting Hadoop’s Data in Motion

Instructor-Led Demonstration: Encrypting Hadoop’s Data in Motion

• Essential Points

Page 60: Cloudera training: secure your Cloudera cluster

Register your interest for the OnDemand security course:

[email protected]

Page 61: Cloudera training: secure your Cloudera cluster

Thank you