Hadoop Security Today & Tomorrow with Apache Knox
Transcript of Hadoop Security Today & Tomorrow with Apache Knox
© Hortonworks Inc. 2014
Hadoop Security Today & Tomorrow
Amsterdam – April 3rd, 2014
Vinay Shukla, Twitter: @NeoMythos
Agenda
• What is Hadoop Security?
  – 4 security pillars & rings of defense
• What security elements exist today?
  – Authentication
  – Authorization
  – Audit
  – Data Protection
• What is on the security roadmap?
  – Coming soon
  – Longer-term projects
• Securing Hadoop with the Apache Knox Gateway
  – Knox overview
  – Demo
• How to get involved
Two Reasons for Security in Hadoop

1. Hadoop contains sensitive data
   – As Hadoop adoption grows, so too have the types of data organizations look to store. Often the data is proprietary or personal, and it must be protected.
   – In this context, Hadoop is governed by the same security requirements as any data center platform.

2. Hadoop is subject to compliance adherence
   – Organizations must often comply with regulations such as HIPAA, PCI DSS, and FISMA that require protection of personal information.
   – Adherence to other corporate security policies is also required.
What is Apache Hadoop Security?
Security in Apache Hadoop is defined by four key pillars:
authentication, authorization, accountability, and data
protection.
Security: Rings of Defense
• Perimeter-level security
  – Network security (i.e., firewalls)
  – Apache Knox (i.e., gateways)
• Authentication
  – Kerberos
• Authorization
  – MR ACLs
  – HDFS permissions
  – HDFS ACLs
  – Hive ATZ-NG
  – HBase ACLs
  – Accumulo label security
• OS security
• Data protection
  – Core Hadoop
  – Partners
Authentication in Hadoop Today…

• Authentication – who am I / prove it? Controls access to the cluster.
  – Kerberos in native Apache Hadoop
  – Perimeter security with the Apache Knox Gateway
• Authorization – restrict access to explicit data.
• Audit – understand who did what.
• Data protection – encrypt data at rest & in motion.
Kerberos Authentication in Hadoop

For more than 20 years, Kerberos has been the de facto standard for strong authentication; no other comparable option exists.

The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O'Malley in 2010.

What does Kerberos do?
• Establishes identity for clients, hosts, and services
• Prevents impersonation; passwords are never sent over the wire
• Integrates with enterprise identity management tools such as LDAP & Active Directory
• Enables more granular auditing of data access and job execution
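Turning Kerberos on in Hadoop starts with two properties in core-site.xml. A minimal sketch is below; the per-service keytab and principal settings that a real cluster also needs are omitted here:

```xml
<!-- core-site.xml: switch Hadoop from simple to Kerberos authentication -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- also enforce service-level authorization checks -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```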
Perimeter Security with Apache Knox

Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.

Single, simple point of access for a cluster:
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support

Central controls ensure consistency across one or more clusters:
• Eliminates the SSH "edge node"
• Central API management
• Central audit control
• Simple service-level authorization

Integrates with existing systems to simplify identity maintenance:
• SSO integration – SiteMinder, API Key*, OAuth* & SAML*
• LDAP & AD integration
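Knox's LDAP/AD integration and service routing are configured per cluster in a topology file. A minimal sketch follows; the host names, port, and user DN template are illustrative placeholders, not values from this deck:

```xml
<topology>
  <gateway>
    <!-- Authenticate REST callers against LDAP/AD via Apache Shiro -->
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
      <param><name>main.ldapRealm</name>
             <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value></param>
      <param><name>main.ldapRealm.userDnTemplate</name>
             <value>uid={0},ou=people,dc=example,dc=com</value></param>
      <param><name>main.ldapRealm.contextFactory.url</name>
             <value>ldap://ldap.example.com:389</value></param>
      <param><name>urls./**</name><value>authcBasic</value></param>
    </provider>
  </gateway>
  <!-- Map gateway paths to the backing cluster service -->
  <service>
    <role>WEBHDFS</role>
    <url>http://namenode.example.com:50070/webhdfs</url>
  </service>
</topology>
```

Each topology file deployed to the gateway defines one cluster, which is how a single Knox instance supports multiple clusters.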
Authentication & Audit in Hadoop today…

• Authentication – who am I / prove it? Controls access to the cluster.
  – Kerberos in native Apache Hadoop
  – Perimeter security with the Apache Knox Gateway
• Authorization – restrict access to explicit data / Audit – understand who did what.
  – Native in Apache Hadoop: MapReduce Access Control Lists, HDFS permissions, process execution audit trail
  – Cell-level access control in Apache Accumulo
• Data protection – encrypt data at rest & in motion.
Authorization: Who can do what in Hadoop?

• Access control services exist for each of the Hadoop components:
  – HDFS has file permissions
  – YARN, MapReduce, and HBase have Access Control Lists (ACLs)
  – Accumulo provides more granular label/cell-level security
• Improvements to these services are being led by the Hortonworks team:
  – HDFS improvements – extended ACLs, more flexible via multiple policies on the same file or directory
  – Hive improvements – a Hortonworks initiative called Hive ATZ-NG; better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secured.
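The SQL-style authorization referred to above uses the familiar GRANT/REVOKE pattern. An illustrative sketch, with made-up table, user, and role names:

```sql
-- Give a user read access to a table, then take it away
GRANT SELECT ON TABLE sales TO USER alice;
REVOKE SELECT ON TABLE sales FROM USER alice;

-- Roles group privileges so they can be granted to many users at once
CREATE ROLE analysts;
GRANT SELECT ON TABLE sales TO ROLE analysts;
GRANT ROLE analysts TO USER bob;
```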
Data Protection in Hadoop today…

• Authentication – Kerberos in native Apache Hadoop; perimeter security with the Apache Knox Gateway.
• Authorization / Audit – native in Apache Hadoop: MapReduce Access Control Lists, HDFS permissions, process execution audit trail; cell-level access control in Apache Accumulo.
• Data protection – encrypt data at rest & in motion:
  – Wire encryption in native Apache Hadoop
  – Orchestrated encryption with 3rd-party tools
Data Protection in Hadoop

Data protection must be applied at three different layers in Apache Hadoop:

• Storage: encrypt data while it is at rest. Direct data flows "into" and "out of" 3rd-party encryption tools and/or rely upon hardware-specific techniques (e.g., drive-level encryption).
• Transmission: encrypt data as it is in motion. Native Apache Hadoop 2.0 provides wire encryption.
• Upon access: apply restrictions when data is accessed. Direct data flows "into" and "out of" 3rd-party encryption tools.
Data Protection – Details – Today

• Encryption of data at rest
  – Option 1: OS- or hardware-level encryption (out of the box)
  – Option 2: Custom development
  – Option 3: Certified partners
  – Work is underway to add encryption in Hive, HDFS, and HBase as core platform capabilities.
• Encryption of data on the wire
  – All wire protocols can be encrypted by the HDP platform (2.x). Wire-level encryption enhancements are led by the HWX team.
• Column-level encryption
  – No current out-of-the-box support in Hadoop.
  – Certified partners provide these capabilities.
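Wire encryption in Hadoop 2.x is switched on per channel. A minimal sketch of the relevant properties follows; the values shown are the common secure choices, not the shipped defaults:

```xml
<!-- core-site.xml: encrypt Hadoop RPC traffic -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the HDFS data transfer protocol -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- mapred-site.xml: run the MapReduce shuffle over SSL -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```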
What can be done today?

• Authentication – who am I / prove it? Controls access to the cluster.
  – Kerberos in native Apache Hadoop
  – Perimeter security with the Apache Knox Gateway
• Authorization – restrict access to explicit data / Audit – understand who did what.
  – Native in Apache Hadoop: MapReduce Access Control Lists, HDFS permissions, process execution audit trail
  – Cell-level access control in Apache Accumulo
  – Service-level authorization with Knox; access audit with Knox
• Data protection – encrypt data at rest & in motion.
  – Wire encryption in native Apache Hadoop
  – Wire encryption with Knox
  – Orchestrated encryption with 3rd-party tools
Hadoop Security

Hortonworks is delivering secure Hadoop for the enterprise. Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks. For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability, and Data Protection, please visit our security labs page.

[HDP 2.1 stack diagram: Governance & Integration, Security, Operations, Data Access, Data Management]

New: Apache Knox – perimeter security for Hadoop
• A common place to perform authentication across Hadoop and all related projects
• Integrated with LDAP and AD
• Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
• Broad community effort, incubated with Microsoft; a broad set of developers involved
Security Investments

Phase 1:
• Strong AuthN with Kerberos
• HBase, Hive, HDFS basic AuthZ
• Encryption with SSL for NN, JT, etc.
• Wire encryption with Shuffle, HDFS, JDBC

Phase 2 (HDP 2.1):
• ACLs for HDFS
• Knox: Hadoop REST API security
• SQL-style Hive AuthZ (GRANT, REVOKE)
• SSL support for HiveServer2
• SSL for DN/NN UI & WebHDFS
• PAM support for Hive

Phase 3:
• Audit event correlation and audit viewer
• Data encryption in HDFS, Hive & HBase
• Knox for HDFS HA, Ambari & Falcon
• Support for token-based AuthN beyond Kerberos
Hadoop Security: Phase 2

HDP 2.1 Features

Release theme: REST API security, improved AuthZ, wire encryption.

Specific features:
• Hadoop REST API security with Apache Knox
  – Eliminates the SSH edge node
  – Single Hadoop access point
  – LDAP/AD-based authentication
  – Service-level authorization
  – Audit support for REST access
• SQL-style Hive authorization with fine-grained access
• HDFS Access Control Lists
• SSL support in HiveServer2
• SSL support in NN/DN UI & WebHDFS
• Pluggable Authentication Module (PAM) in Hive

Included components: Apache Knox, Hive, HDFS
Why Knox?

Apache Knox Gateway:
• REST/HTTP API security for Hadoop
• Eliminates the SSH edge node
• Single REST API access point
• Centralized authentication, authorization, and audit for Hadoop REST/HTTP services
• LDAP/AD authentication, service authorization, audit, etc.

Knox eliminates:
• The client's requirement for intimate knowledge of cluster topology
Knox Deployment with a Hadoop Cluster

[Deployment diagram: REST clients reach Knox through a load balancer in the web tier/DMZ; Knox forwards requests through switches to master nodes (NN, SNN) on rack 1 and slave nodes (DN) on racks 2 through N. Hadoop CLIs in the application tier connect to the cluster directly.]
Hadoop REST API Security: Drill-Down

[Architecture diagram: a REST client passes through the firewall/DMZ and a load balancer to the Knox Gateway, which authenticates the caller over LDAP against the enterprise identity provider (LDAP/AD) and forwards HTTP requests to services in one or more Hadoop clusters: masters (NN, RM, WebHCat, Oozie, HiveServer2, HBase) and slaves (DN, NM). An edge node with Hadoop CLIs still talks RPC directly to the cluster.]
What is Knox? Client > Knox > Hadoop Cluster

Request processing inside the Knox Gateway:
• Protocol Listener – listens for requests on the appropriate protocols (e.g., HTTP/HTTPS).
• Service Selector – selects the appropriate service filter chain based on request URL mapping rules.
• Service-specific filter chain – composed and configured at deployment time by service-specific plugins:
  – AuthN Filter – challenges the client for credentials and authenticates against the enterprise identity provider, or validates an SSO token from an enterprise/cloud SSO provider.
  – Identity Asserter – enforces propagation of the authenticated identity to Hadoop by modifying the request.
  – Rewrite Filter – translates URLs in the request and response between external and internal URLs based on service-specific rules.
  – Dispatch Filter – streams the request and response to and from the Hadoop service based on the rewritten URLs.
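The rewrite filter's external-to-internal URL translation is driven by per-service rule files. An illustrative sketch of the shape such a rule takes, simplified from real Knox service definitions (the rule name and pattern here are examples, not verbatim shipped rules):

```xml
<rules>
  <!-- Map an external gateway WebHDFS URL to the internal NameNode URL -->
  <rule dir="IN" name="WEBHDFS/webhdfs/inbound"
        pattern="*://*:*/**/webhdfs/{version}/{path=**}?{**}">
    <rewrite template="{$serviceUrl[WEBHDFS]}/{version}/{path=**}?{**}"/>
  </rule>
</rules>
```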
Knox Gateway in Action: Submit an MR Job via Knox
HDFS & MR Operations with Knox

• Create a few directories:
  curl -iku guest:guest-password -X PUT 'https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777'
  curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777"
  curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777"

• Upload files:
  curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE"
  curl -iku guest:guest-password -L -T README -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE"

• Run the MR job:
  curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar

• Query the jobs for a user:
  curl -iku guest:guest-password https://localhost:8443/gateway/sandbox/templeton/v1/queue

• Query the status of a given job:
  curl -iku guest:guest-password https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>

• Read the output file:
  curl -iku guest:guest-password -L -X GET "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN"

• Remove a directory:
  curl -iku guest:guest-password -X DELETE "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true"
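All of the WebHDFS calls above share the same URL shape: gateway host, topology name, API path, and operation. A tiny helper, purely illustrative and not part of Knox, makes that shape explicit:

```shell
# Build a Knox gateway URL for a WebHDFS call (illustrative helper only).
# Arguments: gateway host, topology name, HDFS path, WebHDFS operation.
knox_webhdfs_url() {
  local host="$1" topology="$2" path="$3" op="$4"
  echo "https://${host}:8443/gateway/${topology}/webhdfs/v1${path}?op=${op}"
}

# Produces the same URL as the first MKDIRS example above:
knox_webhdfs_url localhost sandbox /user/guest/test MKDIRS
# → https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS
```

curl can then consume it directly, e.g. `curl -iku guest:guest-password -X PUT "$(knox_webhdfs_url localhost sandbox /user/guest/test MKDIRS)&permission=777"`.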
How to Get Involved

• Security Labs: http://hortonworks.com/labs/security/
• Security Blogs: http://hortonworks.com/blog/category/innovation/security/
• Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
• Need help? http://hortonworks.com/community/forums/forum/security/ or [email protected]
Thank you!
Amsterdam – April 3rd, 2014
Vinay Shukla, Twitter: @NeoMythos