Doing hadoop securely

Transcript of Doing hadoop securely

Page 1: Doing hadoop securely

Big Data Consulting
doing hadoop, securely

Page 2: Doing hadoop securely

Rob Gibbon
■ Architect @ Big Industries, Belgium
■ Focus on designing, deploying & integrating web-scale solutions with Hadoop
■ Deliveries for clients in telco, financial services & media

Page 3: Doing hadoop securely

Hadoop was built to survive data tsunamis

■ a response to challenges that enterprise vendors were unable to address

■ focused on data volumes and cost reduction
■ initially, the solution had some serious holes

Page 4: Doing hadoop securely

Confidentiality, Integrity, Availability

■ early prereleases couldn’t really meet any of these three fundamental infosec objectives

■ basic controls weren’t there

Page 5: Doing hadoop securely

the early days
■ Multiple SPoFs (single points of failure)
■ No authentication
■ Easily spoofed authorisation
■ No encryption of data at rest or in transit
■ No accounting

Page 6: Doing hadoop securely

enter the hadoop vendors
■ Vendors like Cloudera focus on making Apache Hadoop “enterprise ready”
■ Includes building robust infosec controls into Hadoop core
■ Multilayer security is now available for Hadoop

Page 7: Doing hadoop securely

running a cluster in non-secure mode

■ malicious|mistaken user:
  ■ recursively delete all the data please
  ■ by the way, I’m the system superuser

■ hadoop:
  ■ oh ok then
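
The exchange works because Hadoop’s default SIMPLE authentication trusts whatever identity the client asserts. A minimal sketch of the problem against a non-secure cluster, using the standard Hadoop client API (the NameNode host is hypothetical; don’t try this on a cluster you care about):

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class SpoofDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020"); // hypothetical host
            // In SIMPLE auth mode nothing verifies this claim of identity:
            UserGroupInformation spoofed = UserGroupInformation.createRemoteUser("hdfs");
            spoofed.doAs((PrivilegedExceptionAction<Void>) () -> {
                FileSystem fs = FileSystem.get(conf);
                fs.delete(new Path("/"), true); // "recursively delete all the data please"
                return null;
            });
        }
    }

The same effect is available from a shell by setting the HADOOP_USER_NAME environment variable; enabling Kerberos authentication closes this hole.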

Page 8: Doing hadoop securely

bad things happen with slack controls in place

Page 9: Doing hadoop securely

average cost of a data breach = $3.8m

Page 10: Doing hadoop securely

running a secure cluster

■ Kerberos is one of the primary security controls you can use

■ Btw, what’s wrong with this kerberos principal?
■ hdfs@EXAMPLE.COM

Page 11: Doing hadoop securely

kerberos continued

■ Kerberos uses a three-part principal: primary/instance@REALM
■ e.g. hdfs/nn1.example.com@EXAMPLE.COM
■ e.g. hdfs/nn2.example.com@EXAMPLE.COM

■ Best to use explicit mappings from kerberos principals to local users (see the sketch below)
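
A minimal sketch of explicit mappings, using the RULE syntax of the hadoop.security.auth_to_local property in core-site.xml; the realm and hostnames are illustrative. The rules can be exercised with the KerberosName helper from hadoop-auth:

    import java.io.IOException;
    import org.apache.hadoop.security.authentication.util.KerberosName;

    public class AuthToLocalDemo {
        public static void main(String[] args) throws IOException {
            // [2:$1@$0] reformats a two-component principal as "primary@REALM"
            // before the regex match; the sed-style substitution then yields
            // the local user name.
            KerberosName.setRules(
                "RULE:[2:$1@$0](hdfs@EXAMPLE\\.COM)s/.*/hdfs/\n" +
                "RULE:[1:$1@$0](.*@EXAMPLE\\.COM)s/@.*//\n" +
                "DEFAULT");
            // Service principals map explicitly to the local hdfs user:
            System.out.println(new KerberosName("hdfs/nn1.example.com@EXAMPLE.COM").getShortName()); // hdfs
            // Plain user principals map to their first component:
            System.out.println(new KerberosName("alice@EXAMPLE.COM").getShortName()); // alice
        }
    }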

Page 12: Doing hadoop securely

hive / impala
■ HiveServer doesn’t support Kerberos => use HiveServer2
■ Best to use Sentry to enforce role-based access controls from SQL (see the sketch below)
■ Users can upload and execute arbitrary [possibly hostile] UDFs => enable Sentry
■ Older versions of Metastore don’t enforce permissions on the grant_* and revoke_* APIs => stay up to date
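
A minimal sketch of a client connecting to a kerberized HiveServer2 over JDBC and defining Sentry roles; the host, service principal, role and group names are all hypothetical, and the caller needs a valid Kerberos ticket (kinit) before connecting:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SecureHiveClient {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC URL for a kerberized cluster:
            String url = "jdbc:hive2://hs2.example.com:10000/default;"
                       + "principal=hive/hs2.example.com@EXAMPLE.COM";
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement()) {
                // With Sentry enabled, each statement succeeds only when the
                // caller's role carries the matching privilege.
                stmt.execute("CREATE ROLE analyst");
                stmt.execute("GRANT SELECT ON DATABASE default TO ROLE analyst");
                stmt.execute("GRANT ROLE analyst TO GROUP analysts");
            }
        }
    }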

Page 13: Doing hadoop securely

availability
■ Most core components now support HA (see the client-side sketch below)

■ HDFS
■ YARN
■ Hive
■ HBase
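
For HDFS, HA is transparent to clients: they address a logical nameservice rather than a single NameNode. A minimal client-side sketch (the nameservice and hostnames are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class HaClientConfig {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
            // The proxy provider fails over to the standby NameNode transparently.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
            conf.set("fs.defaultFS", "hdfs://mycluster");
            FileSystem fs = FileSystem.get(conf); // survives an active-NN failure
            System.out.println(fs.getUri());
        }
    }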

Page 14: Doing hadoop securely

disaster recovery
■ HDFS and HBase offer point-in-time snapshots => consistency!
■ Vendor-tethered solutions for site-to-site replication are available
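
A minimal sketch of taking an HDFS snapshot through the Java API; the directory and snapshot name are hypothetical, and allowing snapshots on a directory needs superuser rights:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class SnapshotDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/data/warehouse"); // hypothetical directory
            // Equivalent to: hdfs dfsadmin -allowSnapshot /data/warehouse
            ((DistributedFileSystem) fs).allowSnapshot(dir);
            // A read-only, point-in-time image of the directory tree --
            // consistent even while writers keep changing the live data.
            Path snap = fs.createSnapshot(dir, "nightly-backup");
            System.out.println("snapshot created at " + snap);
        }
    }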

Page 15: Doing hadoop securely

encryption at rest
■ HDFS encryption zones
■ transparent to existing applications
■ minimal performance overhead on Intel architecture (AES-NI acceleration)
■ key management is externalised (Hadoop KMS)
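
A minimal sketch of setting up an encryption zone via the HdfsAdmin API; the KMS endpoint, path and key name are hypothetical, and the key would be created beforehand (hadoop key create pii-key):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.client.HdfsAdmin;

    public class EncryptionZoneDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Keys live outside the cluster, in the Hadoop KMS:
            conf.set("dfs.encryption.key.provider.uri",
                     "kms://http@kms.example.com:16000/kms");
            HdfsAdmin admin = new HdfsAdmin(new URI("hdfs://mycluster"), conf);
            // Every file written under /secure/pii is encrypted and decrypted
            // transparently in the client; datanodes only ever see ciphertext.
            admin.createEncryptionZone(new Path("/secure/pii"), "pii-key");
        }
    }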

Page 16: Doing hadoop securely

wire encryption
■ SSL/TLS encryption is now available for most Hadoop services
■ Note that AES-256 for SSL and for Kerberos preauth requires the extra JCE Unlimited Strength policy files on the cluster
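
A minimal sketch of the main wire-encryption knobs; the property names are real Hadoop settings, shown here as Configuration calls for brevity although they would normally live in core-site.xml / hdfs-site.xml:

    import org.apache.hadoop.conf.Configuration;

    public class WireEncryptionConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Encrypt Hadoop RPC with SASL "privacy" quality of protection.
            conf.set("hadoop.rpc.protection", "privacy");
            // Encrypt the HDFS block data path between clients and datanodes.
            conf.set("dfs.encrypt.data.transfer", "true");
            // AES for data transfer (hardware-accelerated where AES-NI exists).
            conf.set("dfs.encrypt.data.transfer.cipher.suites", "AES/CTR/NoPadding");
            // Serve web UIs and WebHDFS over HTTPS only.
            conf.set("dfs.http.policy", "HTTPS_ONLY");
            System.out.println(conf.get("hadoop.rpc.protection"));
        }
    }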

Page 17: Doing hadoop securely

accounting

■ Vendor-tethered solutions are available for auditing
■ Navigator for Cloudera clusters
■ Ranger for HortonWorks clusters

Page 18: Doing hadoop securely

tokenization

■ The process of substituting a sensitive data element with a non-sensitive equivalent

■ 3rd Party vendor solutions are available that integrate well with Hadoop
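
To make the idea concrete, a toy sketch of a token vault; real deployments use a hardened, persistent vendor product, and every name here is hypothetical:

    import java.security.SecureRandom;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class Tokenizer {
        private final Map<String, String> vault = new ConcurrentHashMap<>();
        private final SecureRandom rng = new SecureRandom();

        // Swap a sensitive value (e.g. a card number) for a random token
        // that carries no information about the original.
        public String tokenize(String sensitive) {
            byte[] bytes = new byte[16];
            rng.nextBytes(bytes);
            StringBuilder sb = new StringBuilder("tok_");
            for (byte b : bytes) sb.append(String.format("%02x", b));
            String token = sb.toString();
            vault.put(token, sensitive); // only the vault can reverse the mapping
            return token;
        }

        public String detokenize(String token) {
            return vault.get(token);
        }

        public static void main(String[] args) {
            Tokenizer t = new Tokenizer();
            String token = t.tokenize("4111-1111-1111-1111"); // a well-known test PAN
            System.out.println(token + " -> " + t.detokenize(token));
        }
    }

Downstream Hadoop jobs can then work on the tokens; only the vault, kept outside the cluster, can map them back to the sensitive values.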

Page 19: Doing hadoop securely

some places where there’s still some work to do

■ Setting up hadoop security controls is complex and time-consuming
■ Not much support for SELinux around here
■ No general, coherent, policy-based framework for controlling resource access demands
■ Apache Knox is a starting point
■ => what about network and host resource access?

Page 20: Doing hadoop securely

Integration
■ Integrating hadoop into an organisation’s services environment needs careful planning
■ Hadoop can conflict with established governance policies
■ system accounts & privileges
■ remote access
■ firewall flows
■ domains and trust
■ etc.

Page 21: Doing hadoop securely

layered security in hadoop-core
■ Authentication: Kerberos
■ Authorisation: local unix group or LDAP mappings
■ Authorisation: Sentry RBAC for hive/impala
■ Encryption: HDFS encryption
■ Encryption: SSL encryption for most services
■ Availability: active/passive failover for HDFS, YARN, HBase
■ Integrity: HDFS block replication & CRC checksums

Page 22: Doing hadoop securely

but what about poodle/heartbleed/shellshock/whatever...

■ underlines the need for a mature information security governance strategy & architecture

Page 23: Doing hadoop securely

defence-in-depth

■ A layered security architecture for Hadoop clusters is doable

■ eg. MasterCard’s Cloudera Hadoop cluster achieved PCI compliance in 2014 http://goo.gl/FP5DUt

Page 24: Doing hadoop securely

thanks for listening
be.linkedin.com/in/robertgibbon

www.bigindustries.be