Hadoop Elephant in Active Directory Forest

Marek Gawiński, Arkadiusz Osiński, Allegro Group


Page 1: Hadoop Elephant in Active Directory Forest

Hadoop Elephant in Active Directory Forest

Marek Gawiński, Arkadiusz Osiński, Allegro Group

Page 3: Hadoop Elephant in Active Directory Forest

Agenda

● Goals and motivations
● Technology stack
● Architecture evolution
● Automating the integration of new servers
● Making AD users and groups visible to Linux
● Making the architecture resilient to AD service unavailability
● Auto-deployment of client software on desktops

Page 4: Hadoop Elephant in Active Directory Forest

Allegro Hadoop cluster in numbers

● 4 terabytes RAM
● 2 petabytes disk space
● 47 datanodes
● 79 projects
● 612 users

Page 5: Hadoop Elephant in Active Directory Forest

Goals and motivations

● Secured cluster
● Central authentication and authorisation
● Compliance for real and project users and groups
● Cluster resources available from the desktop
● Integrating new servers automatically
● Making the whole architecture resilient to AD failures and timeouts
● Auto-deployment and autoconfiguration of Hadoop client software on users' desktops

Page 6: Hadoop Elephant in Active Directory Forest

Technology stack

● Cloudera CDH5
● MIT Kerberos
● Microsoft Active Directory
● FreeIPA
● sssd
● puppet
● msktutil
● Hadoop desktop client

Page 7: Hadoop Elephant in Active Directory Forest

History - FreeIPA+FreeIPA Kerberos

[Diagram: Client, Secured Hadoop cluster, FreeIPA User, Local groups management, Kerberos KDC. Flow labels: User/pass, Check user/pass, Kerberos Service Ticket, Internal hadoop creds, Check groups.]

Page 8: Hadoop Elephant in Active Directory Forest

History - FreeIPA+own Kerberos

[Diagram: Client, Secured Hadoop cluster, FreeIPA User, Local groups management, Kerberos KDC, Kerberos KDC MIT. Flow labels: User/pass, Check user/pass, Kerberos Service Ticket, Internal hadoop creds, Check groups.]

Page 9: Hadoop Elephant in Active Directory Forest

History - FreeIPA+own Kerberos+AD

[Diagram: Client, Secured Hadoop cluster, FreeIPA User, Local groups management, Kerberos KDC MIT, AD User&Groups, AD Kerberos. Flow labels: User/pass (twice), Check user/pass (twice), Kerberos Service Ticket, Internal hadoop creds, Check groups (twice).]

Page 10: Hadoop Elephant in Active Directory Forest

Final - own Kerberos+AD

[Diagram: Client, Secured Hadoop cluster, Kerberos KDC MIT, AD User&Groups, AD Kerberos. Flow labels: User/pass, Check user/pass, Kerberos Service Ticket, Internal hadoop creds, Check groups.]

Page 11: Hadoop Elephant in Active Directory Forest

Integrating new Linux servers automatically with AD

[Diagram: Msktutil, AD User&Groups, AD Kerberos, Kerberos keytab. Flow labels: Create user, Create principal.]

Page 12: Hadoop Elephant in Active Directory Forest

Integrating new Linux servers automatically with AD

define get_ad_keytab ( $path = '', ...) {
  ...
  $realm = 'SOME_REALM'
  # Password of the AD manager account, looked up from Hiera
  $pass = hiera('hadoop_prod/ad/krb_manager_pass')
  $principal = "${title}/${host}@${realm}"
  # Authenticate as the manager, run the helper script, then destroy the ticket
  $command = "echo ${pass} | kinit _hadoop_manager@${realm}; \
    /usr/local/bin/add_ad_princ.sh ${title} ${host} ${path}; kdestroy"
  ...

msktutil -c -s $PRINCIPAL --upn $PRINCIPAL -k $KEYTAB \
  --computer-name $COMPUTER_NAME \
  --server $SERVER_KRB \
  --realm $REALM \
  -b $USER_LDAP_ROOT \
  --dont-expire-password \
  --description "\"$DESCRIPTION\"" \
  --user-creds-only

Page 13: Hadoop Elephant in Active Directory Forest

Integrating new Linux servers automatically with AD

root@nn1:~# klist -ket
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/[email protected] (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (aes128-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/[email protected] (des3-cbc-sha1)
   1 08/17/2015 13:26:45 host/[email protected] (arcfour-hmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia128-cts-cmac)
   1 08/17/2015 13:26:45 host/[email protected] (camellia256-cts-cmac)
   4 08/17/2015 13:30:23 [email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 [email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 [email protected] (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (arcfour-hmac)
   4 08/17/2015 13:30:23 host/[email protected] (aes128-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/[email protected] (aes256-cts-hmac-sha1-96)
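The listing above holds two batches of keys (KVNO 1 and KVNO 4) written a few minutes apart, and the camellia encryption types in the first batch are not ones Active Directory issues, so the keytab appears to combine entries from the MIT KDC and from AD. A short sketch of how such a listing could be grouped by principal and key version follows. The function name and the sample principals and realms are hypothetical; the real principal names are redacted in the transcript.

```python
import re
from collections import defaultdict

# One keytab entry as printed by `klist -ket`:
#   KVNO, timestamp, principal, (encryption type)
ENTRY = re.compile(
    r"^\s*(?P<kvno>\d+)\s+"
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<principal>.+?)\s+"
    r"\((?P<enctype>[^)]+)\)\s*$"
)


def summarize_keytab(klist_output):
    """Return {(principal, kvno): [enctype, ...]} for every entry line."""
    summary = defaultdict(list)
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match:  # header and separator lines do not match and are skipped
            key = (match["principal"], int(match["kvno"]))
            summary[key].append(match["enctype"])
    return dict(summary)


# Hypothetical sample in the same layout as the slide.
SAMPLE = """\
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------------------
   1 08/17/2015 13:26:45 host/nn1.example.com@MIT.EXAMPLE (aes256-cts-hmac-sha1-96)
   1 08/17/2015 13:26:45 host/nn1.example.com@MIT.EXAMPLE (camellia128-cts-cmac)
   4 08/17/2015 13:30:23 host/nn1.example.com@AD.EXAMPLE (aes256-cts-hmac-sha1-96)
   4 08/17/2015 13:30:23 host/nn1.example.com@AD.EXAMPLE (arcfour-hmac)
"""

summary = summarize_keytab(SAMPLE)
for (principal, kvno), enctypes in sorted(summary.items()):
    print(f"{principal} kvno={kvno}: {', '.join(enctypes)}")
```

Grouping this way makes it easy to check that every expected principal is present with the encryption types the target KDC supports.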

Page 14: Hadoop Elephant in Active Directory Forest

Integrating new Linux servers automatically with AD

Separate subtree in the AD structure

Page 15: Hadoop Elephant in Active Directory Forest

System Security Services Daemon

● Identity and authentication
● Multiple providers (FreeIPA, LDAP, AD)
● High availability for backends
● Provides PAM and NSS modules
● Caching
● Versions above 1.11.x: stable support for AD forest authentication

Page 16: Hadoop Elephant in Active Directory Forest

System Security Services Daemon

AD schema with no modifications

/etc/sssd/sssd.conf

[domain/AD.REALM]
id_provider = ad
ad_server = h1, h2, h3
ad_backup_server = hb1, hb2, hb3
auth_provider = ad
chpass_provider = ad
access_provider = ad
enumerate = False
krb5_realm = AD.REALM
ldap_schema = ad
ldap_id_mapping = True
cache_credentials = True
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
fallback_homedir = /home/AD.REALM/%u
default_shell = /bin/false
ldap_referrals = false

root@nn1:~# id _hc_tech_prod |tr "," "\n"
uid=1827653611(_hc_tech_prod) gid=1827600513(domain users) groups=1827600513(domain users)
1827652945(_gr_hc_users_common)
1827647474(_gr_hc_hadoop_prod)
1827652940(_gr_hc_project1_prod)
1827652919(_gr_hc_project2_prod)
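With `ldap_id_mapping = True`, SSSD computes UIDs and GIDs from Windows SIDs instead of reading POSIX attributes from AD, which is why the schema needs no modification. Each domain gets a slice of the ID space derived from its SID, and the object's RID is added as an offset. The arithmetic below decomposes the IDs from the `id` output, assuming the SSSD defaults `ldap_idmap_range_min = 200000` and `ldap_idmap_range_size = 200000` (the sssd.conf on the slide does not override them). The function name is hypothetical.

```python
# Assumed SSSD defaults; the slide's sssd.conf does not set these options.
RANGE_MIN = 200_000   # ldap_idmap_range_min
RANGE_SIZE = 200_000  # ldap_idmap_range_size


def split_mapped_id(unix_id, range_min=RANGE_MIN, range_size=RANGE_SIZE):
    """Split an ID-mapped UID/GID into (slice number, RID)."""
    offset = unix_id - range_min
    if offset < 0:
        raise ValueError("ID is below the mapping range")
    return divmod(offset, range_size)


# IDs taken from the `id _hc_tech_prod` output on the slide.
ids = {
    "_hc_tech_prod": 1827653611,
    "domain users": 1827600513,
    "_gr_hc_users_common": 1827652945,
    "_gr_hc_hadoop_prod": 1827647474,
}
for name, unix_id in ids.items():
    slice_no, rid = split_mapped_id(unix_id)
    print(f"{name}: slice={slice_no} rid={rid}")
```

All IDs fall into the same slice (9137), as expected for objects of one domain, and the primary group decodes to RID 513, the well-known RID of Domain Users. Because the mapping is deterministic, every server configured this way derives the same UID for the same AD user without any shared state.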

Page 17: Hadoop Elephant in Active Directory Forest

Making the whole architecture resilient to failures

/etc/sssd/sssd.conf

[nss]
memcache_timeout = 3600

Local filesystem nss cache

Active Closest DC

Fallback servers in Remote DC
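The combination of an active closest DC and fallback servers in a remote DC corresponds to the `ad_server` and `ad_backup_server` lists in sssd.conf. As a conceptual model only, and not SSSD's actual implementation, the selection order can be sketched as: try the primary servers in order, then the backups. The `pick_server` function and the `is_reachable` callback are hypothetical.

```python
from typing import Callable, Iterable, Optional


def pick_server(primary: Iterable[str],
                backup: Iterable[str],
                is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable primary server, else the first reachable backup."""
    for host in list(primary) + list(backup):
        if is_reachable(host):
            return host
    return None  # nothing reachable: only cached data can be served


# Host names as they appear in the sssd.conf slide.
primary = ["h1", "h2", "h3"]
backup = ["hb1", "hb2", "hb3"]

# Simulate an outage of the whole closest data centre.
down = {"h1", "h2", "h3"}
chosen = pick_server(primary, backup, lambda host: host not in down)
print(chosen)
```

When no server at all is reachable, the caches (`cache_credentials` in the domain section and `memcache_timeout` in the `[nss]` section) are what keep lookups and logins working.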

Page 18: Hadoop Elephant in Active Directory Forest

Auto-deployment and autoconfiguration on desktops

● Install script for the Hadoop client on desktops
● Refresh configs from the current prod environment
● Support for HDFS/YARN/Hive/Spark

[marek.gawinski:~/ALLEHADOOP] $ sh env.sh
Password for [email protected]: **************

[marek.gawinski:~/ALLEHADOOP] $ klist
Ticket cache: FILE:/tmp/krb5cc_1511317717
Default principal: [email protected]

Valid starting     Expires            Service principal
09/04/15 23:31:35  09/05/15 09:31:35  krbtgt/[email protected]
        renew until 09/11/15 23:31:33
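The timestamps in the klist output show how long a desktop session stays authenticated before `env.sh` has to be run again. The small calculation below reads the three timestamps from the slide and derives the ticket lifetime and the renewal window; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"  # klist prints month/day/two-digit year here

# Timestamps copied from the klist output on the slide.
valid_from = datetime.strptime("09/04/15 23:31:35", FMT)
expires = datetime.strptime("09/05/15 09:31:35", FMT)
renew_until = datetime.strptime("09/11/15 23:31:33", FMT)

lifetime = expires - valid_from           # how long the TGT is usable
renew_window = renew_until - valid_from   # how long it can be renewed

print("ticket lifetime:", lifetime)
print("renewable for:  ", renew_window)
```

The ticket is valid for 10 hours and can be renewed for just under 7 days, so a user who renews in time needs to enter the password roughly once a week.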

Page 19: Hadoop Elephant in Active Directory Forest

Auto-deployment and autoconfiguration on desktops

[marek.gawinski:~/ALLEHADOOP] $ hive
hive (default)> show databases;
OK
database_name
tpch_benchmarks
...
xwing_poc
Time taken: 0.816 seconds, Fetched: 72 row(s)
hive (default)> set hive.execution.engine = tez;
hive (default)> select count(*) from table1;

[marek.gawinski:~/ALLEHADOOP] $ hdfs dfs -ls
Found 8 items
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-06 02:00 .Trash
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-28 21:01 .hiveJars
drwxr-xr-x   - marek.gawinski hadoop          0 2015-07-09 10:43 .sparkStaging
drwx------   - marek.gawinski hadoop          0 2015-05-22 02:35 .staging
drwxr-xr-x   - marek.gawinski hadoop          0 2015-08-31 13:11 oozie1
-rw-r--r--   3 marek.gawinski hadoop         43 2015-05-26 15:26 ozzietest1.hql
-rw-r--r--   3 marek.gawinski hadoop         13 2015-08-31 12:30 pwd.txt
drwxr-xr-x   - marek.gawinski hadoop          0 2015-04-16 16:21 tables

Page 24: Hadoop Elephant in Active Directory Forest

Benefits

● One standard for access control to all company resources

● Every new employee can automatically use Hadoop with no additional effort

● One password for all systems

Page 25: Hadoop Elephant in Active Directory Forest

Thank you!

Questions?