Shug meetup Hops Hadoop
Multi-Tenant Hadoop-as-a-Service (for free!)
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO @ Hops AB
SHUG Meetup, Stockholm, April 21st 2016
www.hops.io @hopshadoop
(Some Slides by Prof. Tor Björn Minde, CEO SICS North Swedish ICT AB)
Talk Overview
• World’s First Open Data Centre for Big Data in Luleå
• Metadata in Hadoop
• True Multi-Tenancy for Hadoop
• DEMO: Spark/Flink/Hadoop-as-a-Service
Vision: SICS ICE research facility
A 2 MW datacenter research and test environment
Purpose: Increase knowledge, strengthen universities, companies and researchers
R&D institute, 5 lab modules, 3,000-4,000 servers, 2,000-3,000 square meters
What SICS ICE will offer
1. Compute capacity and tools for big data and cloud
• Hadoop/Spark/Flink-as-a-Service
2. Demonstration space for new products & solutions
3. Datacenter infrastructure for experiments and facility data
• Flexible lab modules and re-configuration
• Measurement equipment for energy, cooling, capacity
4. Competence for verticals and datacenter infrastructure
Status of SICS-ICE research facility
(ICE = Infrastructure and Cloud research Environment)
Phase 1 (1 room built)
• Establish test projects in a “room-in-room” commercial co-location facility
• Start of operation February 2016
• Officially launched in April 2016
Phase 2 (Design phase)
• Design of a flexible and general research facility summer-fall 2016
• Contracts with Akademiska Hus & E.ON
• Plan is to start build phase spring 2017
• Plan is to start installation fall 2017
• Plan is to start operation early 2018
Phase 1 room-in-room module 1
A Data Center Optimized for Hadoop
Dell servers from Hi5 in module 1
• 3600 cores
• 40 TB RAM
• Up to 7.5 petabyte storage
• 10/40 Gb/s network
• Separate management network
Hadoop-as-a-Service on SICS ICE
But First… MetaData in Hadoop
Metadata Totem Poles in Hadoop
Eventual Consistency
With Many Hadoop Clusters
[Diagram: Cluster 1 through Cluster N, each with its own MetaData Service, plus a MetaData Service (Aggregator)]
Eventually consistent MetaData aggregated using more eventually consistent protocols.
MetaData in Hops Hadoop
[Diagram labels: HDFS, YARN, NDB, Projects, DataSets, Users, Provenance, Search, History, Custom MetaData]
Case Study: Access Control as a MetaData Service
Access Control in Relational Databases
# Multi-tenancy for alice and bob on db1 and db2
GRANT ALL PRIVILEGES ON db1.* TO 'alice'@'%';
GRANT ALL PRIVILEGES ON db2.* TO 'bob'@'%';
# More fine-grained privileges
GRANT SELECT ON db2.sensitiveTable TO 'alice'@'192.168.1.2';
Databases ensure the consistency of security and policies using foreign keys.
“drop table db2.sensitiveTable” => delete associated privileges
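The foreign-key idea above can be tried out in a few lines. This is a minimal sketch using SQLite from Python, not how any particular database stores its grants. The `tables_meta` and `privileges` tables are hypothetical names invented for the illustration: privileges reference the table they protect, so removing the table's metadata row removes the privileges with it.

```python
import sqlite3

# Toy metadata schema (hypothetical names): each privilege row points at
# the table it protects through a foreign key with ON DELETE CASCADE.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE tables_meta (
        name TEXT PRIMARY KEY
    );
    CREATE TABLE privileges (
        grantee    TEXT NOT NULL,
        table_name TEXT NOT NULL
            REFERENCES tables_meta(name) ON DELETE CASCADE,
        privilege  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO tables_meta VALUES ('db2.sensitiveTable')")
conn.execute(
    "INSERT INTO privileges VALUES ('alice', 'db2.sensitiveTable', 'SELECT')"
)


def privilege_count(connection):
    """Return how many privilege rows currently exist."""
    return connection.execute("SELECT COUNT(*) FROM privileges").fetchone()[0]


before = privilege_count(conn)
# Stand-in for "drop table": deleting the metadata row cascades to privileges.
conn.execute("DELETE FROM tables_meta WHERE name = 'db2.sensitiveTable'")
after = privilege_count(conn)

print(before, after)  # prints: 1 0
```

No separate cleanup job is needed: the policy rows cannot outlive the data they describe, which is the consistency property the slide asks about.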
Access Control in Hadoop: Apache Sentry
How do you ensure the consistency of the policies and the data?
[Mujumdar’15]
Policy Editor for Sentry
Administrators administer privileges for users
Problem: Sensitive Data needs its own Cluster
[Diagram: Alice has access to the NSA DataSet and has access to the User DataSet]
Alice can copy/cross-link between data sets.
Alice has only one Kerberos identity. Neither attribute-based access control nor dynamic roles are supported in Hadoop.
Solution: Project-Specific UserIDs
[Diagram: NSA__Alice is a member of Project NSA; Users__Alice is a member of Project Users]
HDFS enforces access control.
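A minimal sketch of how project-specific identities like NSA__Alice and Users__Alice could be derived. The double-underscore separator is taken from the labels on the slide. The function names `project_user` and `split_project_user` are hypothetical and are not the HopsWorks API.

```python
from typing import Tuple

SEPARATOR = "__"  # matches the NSA__Alice / Users__Alice labels on the slide


def project_user(project: str, user: str) -> str:
    """Build the identity a user acts under inside one project."""
    if SEPARATOR in project or SEPARATOR in user:
        # Keep the mapping reversible: neither part may contain the separator.
        raise ValueError("project and user names must not contain '__'")
    return f"{project}{SEPARATOR}{user}"


def split_project_user(identity: str) -> Tuple[str, str]:
    """Recover (project, user) from a project-specific identity."""
    project, user = identity.split(SEPARATOR, 1)
    return project, user


# One person, two projects, two distinct identities for the file system.
identities = [project_user(p, "Alice") for p in ("NSA", "Users")]
print(identities)
```

Because the file system only ever sees the per-project identity, files written as NSA__Alice are not readable by Users__Alice unless a group grants it, even though both belong to the same person.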
How can we share DataSets between Projects?
Sharing DataSets with HopsWorks
[Diagram: Project NSA, Project Users, and a DataSet (edge labelled "owns"); NSA__Alice and Users__Alice are each a member of their project]
Add members of Project NSA to the DataSet group.
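The sharing step can be sketched as plain group membership. This is an illustration only: it assumes the DataSet belongs to Project Users, and the group name `Users__dataset1`, the member `Users__Bob`, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Project-specific identities per project (Users__Bob is made up).
project_members = {
    "NSA": {"NSA__Alice"},
    "Users": {"Users__Alice", "Users__Bob"},
}

# Hypothetical group guarding one DataSet, assumed to be owned by Project Users.
dataset_group = "Users__dataset1"
group_members = defaultdict(set)
group_members[dataset_group] |= project_members["Users"]


def can_access(identity, group):
    """Group-based check, in the spirit of file-system group permissions."""
    return identity in group_members[group]


def share_dataset(group, target_project):
    """Share a DataSet by adding the target project's members to its group."""
    group_members[group] |= project_members[target_project]


before = can_access("NSA__Alice", dataset_group)
share_dataset(dataset_group, "NSA")
after = can_access("NSA__Alice", dataset_group)

print(before, after)  # prints: False True
```

Sharing changes only group membership. Alice's NSA identity gains access to this one DataSet and nothing else in Project Users.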
Web Application Enforces Dynamic Roles
[Diagram labels: Authenticate, HopsWorks, Projects, Secure Impersonation, NSA__Alice, Users__Alice, HopsFS, HopsYARN]
User
• Authentication Provider
- JDBC Realm
- 2-Factor Authentication
- LDAP
Project
• Users
- Roles: Owner, Data Scientist
• DataSets
- Home project
- Can be shared
Project Roles
• Data Owner Privileges
- Import/Export data
- Manage Membership
- Share DataSets
• Data Scientist Privileges
- Write code
- Run code
- Request access to DataSets
We delegate administration of privileges to users
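The two role lists can be written down as a lookup table. This is a sketch that mirrors the slide literally. Whether a Data Owner also holds the Data Scientist privileges is not stated, so the sketch does not assume it. The name `is_allowed` is hypothetical.

```python
# Privileges per project role, copied from the slide's two lists.
ROLE_PRIVILEGES = {
    "Data Owner": {
        "import/export data",
        "manage membership",
        "share datasets",
    },
    "Data Scientist": {
        "write code",
        "run code",
        "request access to datasets",
    },
}


def is_allowed(role, action):
    """Return True if the role's privilege list contains the action."""
    return action in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("Data Owner", "share datasets"))
print(is_allowed("Data Scientist", "share datasets"))
```

An unknown role gets an empty privilege set, so the check denies by default.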
Per Project CPU and Storage Quotas
• 300 GB per Project
• 1000 CPU mins
• Uber-Style Pricing
- Elastic Demand Curve
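A minimal sketch of a per-project quota check using the two limits from the slide. The `ProjectUsage` class and its method names are hypothetical. The pricing curve is left out because the slide gives no formula for it.

```python
from dataclasses import dataclass

STORAGE_QUOTA_GB = 300     # per-project storage limit from the slide
CPU_QUOTA_MINUTES = 1000   # per-project CPU limit from the slide


@dataclass
class ProjectUsage:
    """Running totals for one project (hypothetical bookkeeping structure)."""
    storage_gb: float = 0.0
    cpu_minutes: float = 0.0

    def can_store(self, extra_gb: float) -> bool:
        """True if writing extra_gb more keeps the project within quota."""
        return self.storage_gb + extra_gb <= STORAGE_QUOTA_GB

    def can_run(self, extra_minutes: float) -> bool:
        """True if a job of extra_minutes keeps the project within quota."""
        return self.cpu_minutes + extra_minutes <= CPU_QUOTA_MINUTES


usage = ProjectUsage(storage_gb=290, cpu_minutes=990)
print(usage.can_store(10), usage.can_store(11))
print(usage.can_run(10), usage.can_run(11))
```

Checking before admission, rather than after the fact, keeps one project from using up storage or CPU that other tenants depend on.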
Sharing DataSets between Projects
The same as Sharing Folders in Dropbox
Delegate Access Control to HDFS
• HDFS enforces access control
- UserID per Project
- GroupID per Project and DataSet
• Metadata Integrity using Foreign Keys
- Removing a project removes all users, groups, extended metadata, and (optionally) DataSets.
Free Text Search with Consistent Metadata
[Diagram labels: Free-Text Search, Distributed Database, ElasticSearch, MetaData Designer, MetaData Entry]
The Distributed Database is the Single Source of Truth. Foreign keys ensure the integrity of Metadata.
The NoteBook Proxy Wars
Demo
Short-Term RoadMap
• Multi-tenant Kafka
- Per-project Topics
• Oozie Workflow Editor
• Genomics Support with Adam/Spark
• Tiered Storage: Hot Data, Normal, Archived
• Improved Data Ingress
- Sharing Public DataSets Globally using P2P technology
The Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Kamal Hakimzadeh, Ermias Gebremeskel, Theofilos Kakantousis, Johan Svedlund Nordström, Someya Sayeh, Vasileios Giannokostas, Antonios Kouzoupis, Misganu Dessalegn, Rizvi Hasan, Ahmad Al-Shishtawy, Ali Gholami, Paul Mälzer.
Alumni: K. “Sri” Srijeyanthan, Steffen Grohsschmiedt, Alberto Lorente, Andre Moré, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Jude D’Souza, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Conclusions
• HopsWorks is providing a world’s first: Hadoop-as-a-Service to researchers and industry.
• Workshop on 12th May, 17.30 – 20.00 in SICS, 6th Floor of the Electrum Building, Kista. Register at www.hops.io/?q=news
• Join the team – talk to me!
www.hops.io
www.hops.site