Hadoop Elephant in Active Directory Forest

Post on 10-Feb-2017

Marek Gawiński, Arkadiusz Osiński
Allegro Group

Agenda

● Goals and motivations
● Technology stack
● Architecture evolution
● Automating the integration of new servers
● Making AD users and groups visible to Linux
● Making the architecture resilient to AD service unavailability
● Auto-deployment of client software on desktops

Allegro Hadoop cluster in numbers

4 terabytes RAM
2 petabytes disk space
47 datanodes
79 projects
612 users

Goals and motivations

● Secured cluster
● Central authentication and authorisation
● Compliance for real and project users and groups
● Cluster resources available from desktop
● Integrating new servers automatically
● Making the whole architecture resilient to AD failures or timeouts
● Auto-deployment and autoconfiguration of Hadoop client software on users' desktops

Technology stack

● Cloudera CDH5
● MIT Kerberos
● Microsoft Active Directory
● FreeIPA
● sssd
● puppet
● msktutil
● Hadoop desktop client

History - FreeIPA+FreeIPA Kerberos

[Diagram. Labels: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC. Flows: User/pass; Check user/pass; Kerberos Service Ticket; Internal hadoop creds; Check groups.]

History - FreeIPA+own Kerberos

[Diagram. Labels: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC; Kerberos KDC MIT. Flows: User/pass; Check user/pass; Kerberos Service Ticket; Internal hadoop creds; Check groups.]

History - FreeIPA+own Kerberos+AD

[Diagram. Labels: Client; Secured Hadoop cluster; FreeIPA User; Local groups management; Kerberos KDC MIT; AD User&Groups; AD Kerberos. Flows: User/pass (twice); Check user/pass (twice); Check groups (twice); Kerberos Service Ticket; Internal hadoop creds.]

Final - own Kerberos+AD

[Diagram. Labels: Client; Secured Hadoop cluster; AD User&Groups; AD Kerberos; Kerberos KDC MIT. Flows: User/pass; Check user/pass; Check groups; Kerberos Service Ticket; Internal hadoop creds.]

Integrating new Linux servers automatically with AD

[Diagram. Labels: Msktutil; AD User&Groups; AD Kerberos; Kerberos keytab. Flows: Create user; Create principal.]

define get_ad_keytab (
  $path = '',
  ...
) {
  ...
  $realm     = 'SOME_REALM'
  $pass      = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  $command   = "echo ${pass} | kinit _hadoop_manager@${realm}; \
    /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; kdestroy"
  ...

msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
    --computer-name $COMPUTER_NAME \
    --server $SERVER_KRB \
    --realm $REALM \
    -b $USER_LDAP_ROOT \
    --dont-expire-password \
    --description "\"$DESCRIPTION\"" \
    --user-creds-only
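The msktutil call above can also be assembled programmatically, which avoids the nested shell quoting around the description. The following is a minimal sketch, not the authors' add_ad_princ.sh: the function name and all example values are hypothetical, and only flags that appear in the command above are used.

```python
import shlex


def build_msktutil_argv(principal, keytab, computer_name, server,
                        realm, ldap_base, description):
    """Return the msktutil argument list from the slide, one item per token."""
    return [
        "msktutil", "-c",
        "-s", principal,
        "--upn", principal,
        "-k", keytab,
        "--computer-name", computer_name,
        "--server", server,
        "--realm", realm,
        "-b", ldap_base,
        "--dont-expire-password",
        # Passed as a single argv item, so no extra escaped quotes are needed.
        "--description", description,
        "--user-creds-only",
    ]


# Hypothetical values, for illustration only.
argv = build_msktutil_argv(
    principal="host/nn1.local",
    keytab="/etc/krb5.keytab",
    computer_name="nn1-host",
    server="dc1.example",
    realm="AD.REALM",
    ldap_base="OU=Hadoop",
    description="Hadoop namenode nn1",
)
print(shlex.join(argv))  # shell-quoted form, handy for logging or a dry run
```

The list could be handed to subprocess.run(argv, check=True) after a successful kinit, as the Puppet define does before calling its script.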

root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (arcfour-hmac)
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (arcfour-hmac)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.local@AD.REALM (arcfour-hmac)
   4 08/17/2015 13:30:23 host/nn1.local@AD.REALM (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.local@AD.REALM (aes256-cts-hmac-sha1-96)
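A keytab listing like the one above can be checked automatically, for example to confirm that a freshly integrated host really holds AD principals next to the older IPA ones. This is a rough sketch that assumes the five-column row layout shown in the listing (KVNO, date, time, principal, encryption type); the function name is hypothetical.

```python
def principals_by_realm(klist_output):
    """Map realm -> {principal: set of encryption types} from `klist -ket` text."""
    result = {}
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry rows look like: KVNO DATE TIME PRINCIPAL (ENCTYPE)
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip the header, the separator and blank lines
        principal = fields[3]
        enctype = fields[4].strip("()")
        if "@" not in principal:
            continue
        realm = principal.rsplit("@", 1)[1]
        result.setdefault(realm, {}).setdefault(principal, set()).add(enctype)
    return result


# Shortened copy of the listing above.
SAMPLE = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/nn1.local@IPA.REALM (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 91c76848bc458b62e67$@AD.REALM (arcfour-hmac)
   4 08/17/2015 13:30:23 host/nn1.local@AD.REALM (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.local@AD.REALM (aes256-cts-hmac-sha1-96)
"""

realms = principals_by_realm(SAMPLE)
print(sorted(realms))
```

In practice the text would come from running klist -ket on the host; a missing AD.REALM key would indicate that the msktutil step did not complete.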

[Figure: separate subtree in the AD structure]

System Security Services Daemon

● Identity and authentication
● Multiple providers (FreeIPA, LDAP, AD)
● High availability for backends
● Provides PAM and NSS modules
● Caching
● > 1.11.x - stable support for AD forest auth

AD schema with no modifications

/etc/sssd/sssd.conf

[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/AD.REALM/%u
default_shell = /bin/false
ldap_referrals = false
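Since sssd.conf uses an ini-style layout, the settings that matter for failover can be read back and sanity-checked before a host goes live. The sketch below is an illustration only: the function names are hypothetical and the embedded text is a trimmed copy of the section above.

```python
import configparser

# Trimmed copy of the [domain/AD.REALM] section shown above.
SSSD_CONF = """\
[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
cache_credentials = True
fallback_homedir = /home/AD.REALM/%u
"""


def _split(value):
    """Split a comma-separated sssd.conf value into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def summarize_domain(conf_text, domain):
    """Return primary servers, backup servers and the credential-cache flag."""
    # interpolation=None keeps "%u" in fallback_homedir literal; the default
    # interpolation raises an error when such a value is fetched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["domain/" + domain]
    return {
        "primary": _split(section.get("ad_server", fallback="")),
        "backup": _split(section.get("ad_backup_server", fallback="")),
        "cache_credentials": section.getboolean("cache_credentials", fallback=False),
    }


print(summarize_domain(SSSD_CONF, "AD.REALM"))
```

A deployment check could fail early when the backup list is empty or cache_credentials is off, since both matter once the AD servers become unreachable.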

root@nn1:~# id _hc_tech_prod |tr "," "\n"
uid=1827653611(_hc_tech_prod)
gid=1827600513(domain users)
groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
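The large UID and GID values above follow from ldap_id_mapping = True: sssd derives each Unix ID from the object's Windows SID instead of reading POSIX attributes from AD, which is why the AD schema needs no modifications. Within a domain, the ID is the base of that domain's ID range plus the RID (the last component of the SID). The sketch below splits an ID back into those parts, assuming the sssd defaults ldap_idmap_range_min = 200000 and ldap_idmap_range_size = 200000, which the configuration above does not override. The function name is hypothetical.

```python
RANGE_MIN = 200_000   # assumed sssd default for ldap_idmap_range_min
RANGE_SIZE = 200_000  # assumed sssd default for ldap_idmap_range_size


def split_mapped_id(unix_id):
    """Return (slice_index, rid) for an ID produced by sssd's algorithmic mapping."""
    slice_index, rid = divmod(unix_id - RANGE_MIN, RANGE_SIZE)
    return slice_index, rid


print(split_mapped_id(1827600513))  # (9137, 513): 513 is the well-known Domain Users RID
print(split_mapped_id(1827653611))  # (9137, 53611): the _hc_tech_prod account
```

Because the mapping is deterministic, every host with the same settings resolves an AD account to the same UID without any shared state.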

Making the whole architecture resilient to failures

/etc/sssd/sssd.conf

[nss]
memcache_timeout = 3600

Local filesystem nss cache

Active Closest DC

Fallback servers in Remote DC
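The server preference implied by the labels above (closest data centre first, remote data centre as fallback) can be sketched as a simple ordered search. This is a simplified illustration, not sssd's actual failover code, which also deals with timeouts, DNS discovery and switching back to a primary server. The function name and the reachability callback are hypothetical.

```python
def pick_server(primary, backup, is_reachable):
    """Return the first reachable server, preferring primaries over backups."""
    for server in list(primary) + list(backup):
        if is_reachable(server):
            return server
    return None  # nothing reachable: only cached data can be served


# Example: the whole closest data centre is down.
down = {"h1", "h2", "h3"}
chosen = pick_server(["h1", "h2", "h3"], ["hb1", "hb2", "hb3"],
                     lambda server: server not in down)
print(chosen)  # hb1
```

When no server answers at all, lookups depend on the caches listed above until their entries expire.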

Auto-deployment and autoconfiguration on desktops

● Install script for the Hadoop client on desktops
● Refreshes configs to match the current prod environment
● Support for HDFS/YARN/Hive/Spark

[marek.gawinski:~/ALLEHADOOP] $ sh env.sh
Password for marek.gawinski@AD.REALM: **************

[marek.gawinski:~/ALLEHADOOP] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: marek.gawinski@AD.REALM

Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/AD.REALM@AD.REALM
        renew until 09/11/15 23:31:33

[marek.gawinski:~/ALLEHADOOP] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;

[marek.gawinski:~/ALLEHADOOP] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop          0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop         43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop         13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop          0 2015-04-16 16:21 tables

Benefits

● One standard for access control to all company resources

● Every new employee can automatically play with Hadoop, with no additional effort

● One password to all systems

Thank you!

Questions?