Hadoop & Security - Past, Present, Future
by uwe-seiler
Transcript of Hadoop & Security - Past, Present, Future
Page 5
Hadoop & Security 2010
Owen O'Malley @ Hadoop Summit 2010
http://de.slideshare.net/ydn/1-hadoop-securityindetailshadoopsummit2010
Page 7
Hadoop & Security (Not that long ago…)
User → SSH → SSH Gateway → hadoop fs -put → Hadoop Cluster (/user/uwe/)
Page 9
Security in Hadoop 2015
Authentication: Who am I / prove it?
• Kerberos in native Apache Hadoop
• HTTP/REST API secured with Apache Knox Gateway

Authorization: Restrict access to explicit data
• HDFS, YARN, MapReduce, Hive & HBase
• Storm & Knox
• Fine-grained access control

Audit: Understand who did what
• Centralized audit reporting
• Policy and access history

Data Protection: Encrypt data at rest & in motion
• Wire encryption in Hadoop
• File encryption: built-in since Hadoop 2.6, plus partner tools

Centralized Security Administration
Page 11
Typical Flow - Authenticate through Kerberos

Components: Beeline client, KDC, HiveServer2, HDFS.
1. Client gets Service Ticket for Hive from the KDC
2. Client uses Hive, submits query
3. Hive gets NameNode (NN) Service Ticket
4. Hive creates MapReduce/Tez job using NN
Page 12
Typical Flow - Authorization through Ranger

Same components as before (Beeline client, KDC, HiveServer2, HDFS), with Ranger added:
1. Client gets Service Ticket for Hive from the KDC
2. Client uses Hive, submits query
3. The Ranger plugin in HiveServer2 authorizes the query against the central policies
4. Hive gets NameNode (NN) Service Ticket
5. Hive creates MapReduce/Tez job using NN
Page 13
Typical Flow - Perimeter Security through Knox

1. Original request with user id/password goes to Knox
2. Knox gets Service Ticket for Hive from the KDC
3. Knox runs as proxy user using Hive
4. Hive gets NameNode (NN) Service Ticket
5. Hive creates MapReduce/Tez job using NN
6. Client gets query result
Page 14
Typical Flow - Wire & File Encryption

Same Knox-fronted flow as before, now with the channels encrypted (SSL between client, Knox and HiveServer2; SASL on the Hadoop RPC channels):
1. Original request with user id/password
2. Knox gets Service Ticket for Hive from the KDC
3. Knox runs as proxy user using Hive
4. Hive gets NameNode (NN) Service Ticket
5. Hive creates MapReduce/Tez job using NN
6. Client gets query result
Page 16
Kerberos Synopsis
• Client never sends a password
• Sends a username + token instead
• Authentication is centralized
• Key Distribution Center (KDC)
• Client will receive a Ticket-Granting-Ticket
• Allows authenticated client to request access to secured services
• Clients establish a timed session
• Clients establish trust with services by sending KDC-stamped tickets to service
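From a client shell, the synopsis above looks roughly like this (a sketch; the principal name and realm EXAMPLE.COM are placeholders, not from the slides):

```shell
# Obtain a Ticket-Granting-Ticket from the KDC (prompts for the password
# locally; only an encrypted authenticator crosses the wire, never the password)
kinit [email protected]

# Inspect the ticket cache: the TGT plus any service tickets acquired so far
klist

# Kerberos-enabled Hadoop commands now request service tickets transparently
hadoop fs -ls /user/uwe

# End the timed session
kdestroy
```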
Page 17
Kerberos + Active Directory/LDAP
Cross-realm trust between the corporate directory and a cluster-local KDC:
• AD/LDAP acts as the user store for authentication: use existing directory tools to manage users
  – Users: [email protected]
• The cluster KDC manages host + service principals: use Kerberos tools for these
  – Hosts: [email protected]
  – Services: hdfs/[email protected]
Page 18
Ambari & Kerberos
• Install & Configure Kerberos
  – Server on a single node
  – Client on the rest of the nodes
• Define Principals & Keytabs
  – A keytab (key table) is a file containing a key for a principal
  – Since there are a few dozen principals, Ambari can generate keytab data for your entire cluster as a downloadable CSV file
• Configure User Permissions
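Keytabs can also be created and inspected with the standard MIT Kerberos tools (a sketch; principal names, realm and paths are examples, not from the slides):

```shell
# Create a keytab for a service principal with kadmin (requires admin rights)
kadmin -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/[email protected]"

# List the keys stored in the keytab
klist -kt /etc/security/keytabs/nn.service.keytab

# A service (or a test) can then authenticate without a password prompt
kinit -kt /etc/security/keytabs/nn.service.keytab nn/[email protected]
```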
Page 20
Knox: Core Concept

[Diagram] External access converges on an edge node running Knox behind a load balancer:
• Business users: JDBC/ODBC and REST/HTTP through Knox into the cluster's application layer (HDFS, Hive, App C, App X)
• Admin / data operators (data ingest, ETL): RPC calls via Falcon, Oozie, Sqoop, Flume
• Hadoop admins: SSH
Page 21
Knox: Hadoop REST APIs

Service   Direct URL                            Knox URL
WebHDFS   http://namenode-host:50070/webhdfs    https://knox-host:8443/webhdfs
WebHCat   http://webhcat-host:50111/templeton   https://knox-host:8443/templeton
Oozie     http://oozie-host:11000/oozie         https://knox-host:8443/oozie
HBase     http://hbase-host:60080               https://knox-host:8443/hbase
Hive      http://hive-host:10001/cliservice     https://knox-host:8443/hive
YARN      http://yarn-host:yarn-port/ws         https://knox-host:8443/resourcemanager

Masters could be on many different hosts; with Knox: one host, one port, consistent paths, SSL config at one host.
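The difference shows in a simple WebHDFS call (a sketch; host names and credentials are examples, and the Knox paths follow the table's simplified form; real Knox deployments typically prefix paths with /gateway/<topology>/):

```shell
# Direct access: plain HTTP to the NameNode host, Kerberos/SPNEGO required
curl --negotiate -u : "http://namenode-host:50070/webhdfs/v1/tmp?op=LISTSTATUS"

# Through Knox: one SSL endpoint, HTTP Basic credentials checked against LDAP/AD
curl -k -u uwe:password "https://knox-host:8443/webhdfs/v1/tmp?op=LISTSTATUS"
```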
Page 22
Knox: Features

Simplified Access
• Kerberos encapsulation
• Single access point
• Multi-cluster support
• Single SSL certificate

Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH "edge node"

Enterprise Integration
• LDAP / AD integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility

Enhanced Security
• Protect network details
• SSL for non-SSL services
• WebApp vulnerability filter
Page 23
Knox: Architecture

[Diagram] REST clients pass through the outer firewall into a DMZ hosting a load balancer and multiple Knox instances, which authenticate against the enterprise identity provider. Behind the inner firewall sit one or more Hadoop clusters, each with master services (NN, RM, WebHCat, Oozie, HS2, HBase) and slave nodes (DN, NM). Knox forwards requests over HTTP; edge nodes / Hadoop CLIs still use RPC directly.
Page 24
Knox: What’s New in Version 0.6
• Knox support for HDFS HA
• Support for YARN REST API
• Support for SSL to Hadoop Cluster Services (WebHDFS, HBase, Hive & Oozie)
• Knox Management REST API
• Integration with Ranger for Knox Service Level Authorization
• Use Ambari for install/start/stop/configuration
Page 27
Authorization: Overview
• HDFS: permissions, ACLs
• YARN: queue ACLs
• Pig: no server component to check/enforce ACLs
• Hive: column-level ACLs
• HBase: cell-level ACLs
Page 28
Authorization: HDFS Permissions
hadoop fs -chown maya:sales /sales-data
hadoop fs -chmod 640 /sales-data
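HDFS reuses the POSIX owner/group/other model, so the effect of mode 640 can be illustrated on any local filesystem (a local sketch; the path is arbitrary):

```shell
# 640 = owner rw-, group r--, other ---: the sales group can read but not
# modify the data, everyone else is locked out
mkdir -p /tmp/hdfs-perm-demo
touch /tmp/hdfs-perm-demo/sales.csv
chmod 640 /tmp/hdfs-perm-demo/sales.csv
stat -c '%a %A' /tmp/hdfs-perm-demo/sales.csv   # prints: 640 -rw-r-----
```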
Page 29
Authorization: HDFS ACLs

New requirements:
– Maya, Diana and Clark are allowed to make modifications
– A new group, execs, should be able to read the sales data
Page 30
Authorization: HDFS ACLs
hdfs dfs -setfacl -m group:execs:r-- /sales-data
hdfs dfs -getfacl /sales-data
hadoop fs -ls /sales-data
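After the -setfacl call, -getfacl would report something like the following (a sketch of the expected output, assuming maya owns the directory with group sales and restrictive base permissions):

```shell
hdfs dfs -getfacl /sales-data
# # file: /sales-data
# # owner: maya
# # group: sales
# user::rw-
# group::r--
# group:execs:r--
# mask::r--
# other::---
```

In `hadoop fs -ls` output, a trailing + on the permission string marks entries that carry an ACL beyond the traditional bits.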
Page 31
Authorization: HDFS Best Practices
• Start with traditional HDFS file permissions to implement most permission requirements
• Define a small number of ACLs to handle exceptional cases
• A file/folder with an ACL incurs an additional memory cost in the NameNode compared to a file/folder with traditional permissions
Page 32
Authorization: YARN Permissions
yarn.scheduler.capacity.root.longrunning-jobs.acl_submit_applications="etl,admin,Uwe"
yarn.scheduler.capacity.root.longrunning-jobs.acl_administer_queue="admin,Uwe"
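With these ACLs in place, only the listed users/groups can submit to the queue; a job targets it via the queue name (a sketch; the jar and class names are placeholders):

```shell
# Submit a job into the restricted queue (allowed for etl, admin and Uwe only)
yarn jar my-app.jar com.example.MyJob \
  -Dmapreduce.job.queuename=longrunning-jobs

# Show which queue operations the current user is allowed to perform
mapred queue -showacls
```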
Page 33
Authorization: Hive
• Hive has traditionally offered full-table access control via HDFS access control
• Solution for column-based control:
  – Let HiveServer2 check and submit the query execution
  – Make the table accessible only by a special (technical) user
  – Provide an authorization plugin to restrict UDFs and file formats
• Use standard SQL permission constructs: GRANT / REVOKE
• Store the ACLs in the Hive Metastore
Page 35
Authorization: Hive
CREATE ROLE sales_role;
GRANT ALL ON DATABASE 'sales-data' TO ROLE 'sales_role';
GRANT SELECT ON DATABASE 'marketing-data' TO ROLE 'sales_role';

CREATE ROLE sales_column_role;
GRANT 'c1,c2,c3' TO 'sales_column_role';
GRANT 'SELECT(c1, c2, c3)' ON 'secret_table' TO 'sales_column_role';
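Roles take effect once assigned to users or groups, again via standard HiveServer2 SQL (a sketch; the user name maya is borrowed from the earlier HDFS example):

```sql
-- Assign the role and inspect the active roles of the session user
GRANT ROLE sales_role TO USER maya;
SHOW CURRENT ROLES;

-- Withdraw it again
REVOKE ROLE sales_role FROM USER maya;
```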
Page 36
Authorization: Pig
• There is no Pig (or MapReduce) server to submit and check column-based access
• Pig (and MapReduce) is restricted to full data access via HDFS access control
Page 37
Authorization: HBase
• The HBase permission model traditionally supported ACLs defined at the namespace, table, column family and column level
  – This is sufficient to meet most requirements
• Cell-based security was introduced with HBase 0.98
  – On par with the security model of Apache Accumulo
Page 39
Ranger: Central Security Administration
Apache Ranger
• Delivers a single pane of glass for the (security) administrator
• Centralizes administration of security policies
• Ensures consistent coverage across the entire Hadoop stack
Page 43
Ranger: What’s New in Version 0.4?
• New Components Coverage
• Storm Authorization & Auditing
• Knox Authorization & Auditing
• Deeper Integration with HDP
• Windows Support
• Integration with Hive Auth API, support grant/revoke commands
• Support grant/revoke commands in HBase
• Enterprise Readiness
• REST APIs for the policy manager
• Store Audit logs locally in HDFS
• Support Oracle DB
• Ambari support, as part of Ambari 2.0 release
Page 45
Encryption: Data in motion
• Hadoop client to DataNode via Data Transfer Protocol
  – Client reads/writes to HDFS over an encrypted channel
  – Configurable encryption strength
• ODBC/JDBC client to HiveServer2
  – Encryption via SASL Quality of Protection
• Mapper to Reducer during Shuffle/Sort phase
  – Shuffle is over HTTP(S)
  – Supports mutual authentication via SSL
  – Host name verification enabled
• REST protocols
  – SSL support
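The corresponding switches are ordinary Hadoop/Hive configuration (a fragment using the standard property names; values to be adapted per cluster):

```xml
<!-- hdfs-site.xml: encrypt the DataNode Data Transfer Protocol -->
<property><name>dfs.encrypt.data.transfer</name><value>true</value></property>

<!-- core-site.xml: "privacy" = authentication + integrity + encryption on RPC -->
<property><name>hadoop.rpc.protection</name><value>privacy</value></property>

<!-- mapred-site.xml: run the shuffle over HTTPS -->
<property><name>mapreduce.shuffle.ssl.enabled</name><value>true</value></property>

<!-- hive-site.xml: SASL QoP for HiveServer2, auth-conf = with confidentiality -->
<property><name>hive.server2.thrift.sasl.qop</name><value>auth-conf</value></property>
```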
Page 46
Encryption: Data at rest
HDFS Transparent Data Encryption
• Install and run KMS on top of HDP 2.2
• Change the corresponding HDFS parameters (via Ambari)
• Create an encryption key
  hadoop key create key1 -size 256
  hadoop key list -metadata
• Create an encryption zone using the key
  hdfs dfs -mkdir /zone1
  hdfs crypto -createZone -keyName key1 -path /zone1
  hdfs crypto -listZones
• Details:
– http://hortonworks.com/kb/hdfs-transparent-data-encryption/
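Once the zone exists, clients use it like any other HDFS directory; encryption and decryption happen transparently on the client side (a sketch; the file name is an example):

```shell
# Write into the zone: the client encrypts with a per-file key obtained
# as an EDEK from the KMS
hdfs dfs -put salary.csv /zone1/

# Reading back through HDFS decrypts transparently for authorized users
hdfs dfs -cat /zone1/salary.csv

# The raw, still-encrypted bytes remain visible to the superuser via the
# /.reserved/raw path (used e.g. for distcp of encrypted data)
hdfs dfs -cat /.reserved/raw/zone1/salary.csv
```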
Page 48
Apache Atlas: Data Classification
Currently in Incubation
– https://wiki.apache.org/incubator/AtlasProposal
Page 49
Apache Atlas: Tag-based Policies
[Diagram] Data ingestion / ETL (Falcon, Oozie, Sqoop, Flume) brings source data into the cluster; a metadata server records data classification (e.g. Table1 tagged "marketing"); Ranger enforces tag-based policies (created by the IT admin, with audit logs) on access through Beeline / HiveServer2 / HDFS.
Page 50
Future: More goodies
Dynamic, attribute-based access control (ABAC)
• Extend Ranger to support data or user attributes in policy decisions
• Example: use the geo-location of users

Enhanced auditing
• Ranger can stream audit data through Kafka & Storm into multiple stores
• Use Storm for correlation of data

Encryption as a first-class citizen
• Build native encryption support into HDFS, Hive & HBase
• Ranger-based key management to support encryption
Page 51
Contact Details
Twitter: @[email protected]
Mail: [email protected]
Phone: +49 176 1076531
XING: https://www.xing.com/profile/Uwe_Seiler