Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop
![Page 2: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/2.jpg)
Who I am
Jakob Homan
• HDFS full-time @Y!
• Apache Hadoop committer
• Past six months: security!
• @blueboxtraveler
8/14/2010
![Page 3: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/3.jpg)
Using Hadoop at Yahoo!
38,000+ nodes
170 PB worth of storage
More than 1,000,000 MR jobs monthly
Almost every product uses Hadoop in some way
![Page 4: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/4.jpg)
Developing Hadoop at Yahoo!
As of 2009, 72% of patches going into the Hadoop source code were coming from Yahoo!
![Page 5: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/5.jpg)
Developing Hadoop at Yahoo!
Q{A,E}
Yahoo! provides extensive QE and QA resources to test Hadoop releases at scale.
![Page 6: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/6.jpg)
Developing Hadoop at Yahoo!
The Yahoo! distribution of Hadoop, available on GitHub, is the same code we run internally on our servers.
Patches important to stability and performance are applied here, as well as to Apache.
![Page 7: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/7.jpg)
Developing Hadoop at Yahoo!
The rest of the family
![Page 8: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/8.jpg)
Hadoop at Yahoo! Sunnyvale
![Page 9: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/9.jpg)
Why do we need a secure Hadoop?
Silos don’t cut it anymore
• Different clusters for different data is not a workable solution
• Costs of operating clusters and moving data are too high
More sensitive data
• Personally Identifiable Information
• Financial data
• Regulatory requirements
![Page 10: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/10.jpg)
Current state of security in Hadoop
Less than ideal
File system
• POSIX-style permissions
• Audit logging available
Authentication
• Do we really know who we’re talking to?
• Both users and services
Authorization
• Who can see files, launch jobs?
• File system permissions help with this
![Page 11: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/11.jpg)
Current state of security in Hadoop
Bowser copyright Nintendo
![Page 12: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/12.jpg)
The elephant is too trusting
![Page 13: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/13.jpg)
Which can let bad people do bad things
![Page 14: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/14.jpg)
Why is securing Hadoop hard?
![Page 15: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/15.jpg)
Enter Kerberos!
Industry-standard network authentication protocol.
Open-source project from MIT.
Acts as trusted third party to identify and authenticate components in a Hadoop cluster.
It’s out there: Microsoft’s Active Directory can act as a KDC.
![Page 16: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/16.jpg)
Kerberos workflow
User or service authenticates to KDC
• Users use kinit, can be automatic upon login
• Services use keytabs
KDC provides a ticket-granting ticket (TGT)
• This verifies identity to other actors in system
• TGTs last for 10 hours, renewable for up to 7 days
User or service presents this ticket to the NameNode (NN) or JobTracker (JT)
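The ticket lifetimes in the workflow above (10 hours, renewable for up to 7 days) can be illustrated with a small model. This is only a sketch: `Ticket`, `issue_tgt`, and the principal name are hypothetical names invented for illustration, and real TGTs are encrypted structures issued and renewed by the KDC, not plain objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TICKET_LIFETIME = timedelta(hours=10)    # figure quoted on the slide
RENEWABLE_LIFETIME = timedelta(days=7)   # figure quoted on the slide


@dataclass(frozen=True)
class Ticket:
    """Toy stand-in for a TGT (hypothetical type, not a Kerberos API)."""
    principal: str
    issued_at: datetime
    expires_at: datetime    # end of the current validity window
    renew_until: datetime   # hard limit: no renewal extends past this

    def is_valid(self, now: datetime) -> bool:
        return self.issued_at <= now < self.expires_at

    def renew(self, now: datetime) -> "Ticket":
        """Return a ticket with a fresh window, capped at renew_until."""
        if not self.is_valid(now):
            raise ValueError("ticket expired; authenticate to the KDC again")
        new_expiry = min(now + TICKET_LIFETIME, self.renew_until)
        return Ticket(self.principal, self.issued_at, new_expiry, self.renew_until)


def issue_tgt(principal: str, now: datetime) -> Ticket:
    """Model the KDC handing out a TGT after kinit or a keytab login."""
    return Ticket(principal, now, now + TICKET_LIFETIME, now + RENEWABLE_LIFETIME)
```

A long-running service would renew before each window closes and fall back to a fresh keytab login once `renew_until` is reached.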
![Page 17: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/17.jpg)
RPC upgraded to use SASL/GSSAPI
Hadoop RPC
• Hadoop has own RPC framework
• Lots of players:
• Namenode
• Datanodes
• Clients
• JobTracker
• TaskTrackers
Simple Authentication and Security Layer (SASL)
• RFC 2222 (since superseded by RFC 4422): standard for lightweight authentication between clients and servers
• Works with GSSAPI to support Kerberos as an authentication method
• Delegation tokens are also supported
Delegation Tokens
• DIGEST-MD5-based identifiers generated by Namenode
• Alleviate load on Kerberos server when 10,000s of tasks launch simultaneously
• Used to support cross-cluster authentication
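One way to picture delegation tokens is as a shared secret derived from a key that only the issuer holds, so tasks can authenticate without each one contacting the KDC. The sketch below assumes an HMAC over a token identifier; the class name, identifier layout, and direct password comparison are all illustrative assumptions, not Hadoop's actual classes or wire format. In a real DIGEST-MD5 exchange the password is never sent; it is proven through a challenge-response.

```python
import hashlib
import hmac
import os
import time
from typing import Optional, Tuple


class TokenIssuer:
    """Toy model of a token issuer such as the NameNode (hypothetical class)."""

    def __init__(self, master_key: Optional[bytes] = None):
        # Secret known only to the issuer; random unless supplied.
        self._master_key = master_key or os.urandom(32)

    def _password(self, identifier: bytes) -> bytes:
        # The token "password" is a keyed hash of the identifier.
        return hmac.new(self._master_key, identifier, hashlib.sha1).digest()

    def issue(self, owner: str, renewer: str, lifetime_s: int = 86400,
              now: Optional[float] = None) -> Tuple[bytes, bytes]:
        now = time.time() if now is None else now
        # Assumed layout: owner|renewer|expiry-in-epoch-seconds
        identifier = f"{owner}|{renewer}|{int(now + lifetime_s)}".encode()
        return identifier, self._password(identifier)

    def verify(self, identifier: bytes, password: bytes,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Recompute the password; a forged or altered identifier will not match.
        if not hmac.compare_digest(self._password(identifier), password):
            return False
        expiry = int(identifier.decode().rsplit("|", 1)[1])
        return now < expiry
```

Because verification only needs the issuer's own key, tens of thousands of tasks can present tokens without any of them touching the Kerberos server.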
![Page 18: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/18.jpg)
What does a secure Hadoop look like?
![Page 19: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/19.jpg)
Like this
![Page 20: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/20.jpg)
Everyone now authenticated
Users browsing filesystem on command line
Users submitting jobs on command line
Servers within system
• DataNodes ↔ NameNode
• TaskTrackers ↔ JobTracker
• Secondary NameNode (SNN) ↔ NameNode
Oozie
• Submits jobs on behalf of users
• Configurable proxy user
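The "configurable proxy user" idea can be sketched as a rule check: a trusted service such as Oozie may act on behalf of a user only from approved hosts and only for users in approved groups. This is loosely modeled on the `hadoop.proxyuser.<name>.hosts` and `hadoop.proxyuser.<name>.groups` settings in later Hadoop releases; the `ProxyRule` type, function name, and host names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ProxyRule:
    """Where a proxy may connect from and whom it may act for (hypothetical type)."""
    allowed_hosts: FrozenSet[str]
    allowed_groups: FrozenSet[str]


def may_impersonate(rules: Dict[str, ProxyRule],
                    user_groups: Dict[str, FrozenSet[str]],
                    proxy: str, target_user: str, from_host: str) -> bool:
    """Return True if `proxy`, connecting from `from_host`, may act as `target_user`."""
    rule = rules.get(proxy)
    if rule is None:
        return False  # not configured as a proxy user at all
    if from_host not in rule.allowed_hosts:
        return False  # request came from an unapproved machine
    # The target user must belong to at least one approved group.
    return bool(rule.allowed_groups & user_groups.get(target_user, frozenset()))
```

The proxy itself would still authenticate with its own Kerberos credentials; the rule only decides whose identity it may assume afterwards.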
![Page 21: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/21.jpg)
Additional security throughout system
On-disk directory permissions are 700
• MapReduce system directory
• Task directory
• On-node HDFS directories
Tasks run as user who launched them
• Linux Task Controller now runs as user who owns job
DataNodes’ ports secured
• Use privileged ports for non-RPC calls
• Working on making this pluggable for other types of solutions
Streaming secured
• Streaming tasks can verify the identity of the TaskTracker and vice versa
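Two of the measures above are easy to show concretely on a POSIX system: a directory with mode 700 is accessible only to its owner, and a privileged port (below 1024) can normally be bound only by root, which is what makes it evidence that a trusted process is listening. The helper names and the directory name are hypothetical, chosen for illustration.

```python
import os
import stat
import tempfile


def make_private_dir(parent: str, name: str) -> str:
    """Create a directory readable, writable, and searchable by its owner only."""
    path = os.path.join(parent, name)
    os.mkdir(path, 0o700)
    # mkdir's mode is filtered by the process umask, so set it explicitly too.
    os.chmod(path, 0o700)
    return path


def mode_of(path: str) -> int:
    """Return just the permission bits of a path."""
    return stat.S_IMODE(os.stat(path).st_mode)


def is_privileged_port(port: int) -> bool:
    """On typical Unix systems only root may bind ports below 1024."""
    return 0 < port < 1024


# Usage: create a scratch task directory and inspect its permissions.
with tempfile.TemporaryDirectory() as parent:
    task_dir = make_private_dir(parent, "task_0001")
    print(oct(mode_of(task_dir)))
```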
![Page 22: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/22.jpg)
How do I write a secure MapReduce job?
Word count pre-security
![Page 23: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/23.jpg)
This is how
Word count post-security
No changes!
![Page 24: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/24.jpg)
Significant user-facing changes
UserGroupInformation.java
• Completely rewritten: nexus of authentication code
• Really should never have been public
New type of DistributedCache
• Public is available to all users
• Private is secured for only the submitting user
![Page 25: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/25.jpg)
Secure web access is pluggable
Authenticating users for web access is pluggable
• Yahoo! has an internal web authentication system; other organizations do as well.
Would really like to have a SPNEGO implementation
• Any volunteers?
Until then, the Doctor has returned
• Simple plugin returns DrWho for web access
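The pluggable design can be pictured as a filter that delegates to whatever authenticator it is given, with a default that answers DrWho for everyone. This is a minimal sketch of the pattern only: `WebAuthFilter`, `dr_who`, and the header name in the usage note are hypothetical and do not reflect Hadoop's actual plugin interface.

```python
from typing import Callable, Mapping

# An authenticator maps request headers to a user name (hypothetical contract).
WebAuthenticator = Callable[[Mapping[str, str]], str]


def dr_who(_headers: Mapping[str, str]) -> str:
    """Default plugin: every web visitor is treated as the same placeholder user."""
    return "DrWho"


class WebAuthFilter:
    """Resolves the user for a web request by delegating to a plugin."""

    def __init__(self, authenticator: WebAuthenticator = dr_who):
        self._authenticator = authenticator

    def user_for(self, headers: Mapping[str, str]) -> str:
        return self._authenticator(headers)
```

An organization with its own single sign-on would pass in a different callable, for example one that validates a session cookie or, eventually, a SPNEGO negotiation, without the filter itself changing.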
![Page 26: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/26.jpg)
DistCp works… in 3 out of 4 cases

| Source cluster | Destination: Unsecure 20 | Destination: Secure 20 |
|---|---|---|
| Unsecure 20 | ✔ | ✗ |
| Secure 20 | ✔ | ✔ |
![Page 27: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/27.jpg)
Out of scope
On-disk encryption
• DataNode directories’ permissions more locked down
• Actual block files and metadata not encrypted
On-the-wire encryption
• RPC and HTTP data transfer sent in the clear
• Assumption that network is secure
![Page 28: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/28.jpg)
Impact on performance
4%
Maximum performance degradation allowed by our performance team. We met or bested this requirement.
![Page 29: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/29.jpg)
Take security for a test drive
Download from http://yhoo.it/aVAke1
![Page 30: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/30.jpg)
Or build a secure cluster at home
Gory details at http://bit.ly/aze3Ba
![Page 31: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/31.jpg)
Other projects and security
Pig
• We’ve worked with the Pig team.
• Pig 6 and 7 support security
HBase
• Work in progress
• JIRA: HBASE-2016
Oozie
• Extensive collaboration.
• Oozie 2 supports security
Hive
• Early work in progress
• JIRA: HIVE-1264
![Page 32: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/32.jpg)
Current state
Yahoo!’s distribution
• Security deployed to all clusters.
• The rest soon.
• All patches in Yahoo!’s git repository at: http://github.com/yahoo/hadoop-common
• Committed to open-sourcing all improvements and bug fixes to Y20S.
![Page 33: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/33.jpg)
Current state
Apache distribution
• All of the security work has been forward-ported to trunk
• Still working on securing new-to-trunk features
• 22 will be first fully secured Apache release
![Page 34: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/34.jpg)
Security list
Send security holes to this email list
Already have had two security issues identified; fixes in flight
![Page 35: Up-Armoring The Elephant: Adding Kerberos-based Security to Hadoop](https://reader034.fdocuments.net/reader034/viewer/2022052622/5594a02e1a28ab19398b4619/html5/thumbnails/35.jpg)
Questions?