Taking Hadoop to Enterprise Security Standards

©2014 LinkedIn Corporation. All Rights Reserved. Taking Hadoop to Enterprise Security Standards Karthik Ramasamy Harsh Singhal Arvind Mani

Transcript of Taking Hadoop to Enterprise Security Standards

Page 1: Taking Hadoop to Enterprise Security Standards

Taking Hadoop to Enterprise Security Standards

Karthik Ramasamy, Harsh Singhal, Arvind Mani

Page 2: Taking Hadoop to Enterprise Security Standards

Access Control

Page 3: Taking Hadoop to Enterprise Security Standards

How many of you need or have access control in Hadoop?

Page 4: Taking Hadoop to Enterprise Security Standards

Keeping Data Secure

Users First

Internal Threat

External Threat

Page 5: Taking Hadoop to Enterprise Security Standards

The more granular the access controls are, the more people can have access to the data

Page 6: Taking Hadoop to Enterprise Security Standards

Hadoop – Status Quo

Multiple Query Execution Engines

Custom Code Execution

Auditing

Page 7: Taking Hadoop to Enterprise Security Standards

HDFS File Permissions

User ID, Email Address, IP Address, Billing Address

Security, Customer Service, Data Scientist

Adding or removing group membership can take up to a few hours

HDFS file permissions are very coarse (at file level)

Page 8: Taking Hadoop to Enterprise Security Standards

Other Access Control Solutions

Page 9: Taking Hadoop to Enterprise Security Standards

Challenges

Mixed Data

Multiple Data Processing Systems

Data for Everyone

Page 10: Taking Hadoop to Enterprise Security Standards

What do we need?

Extensible Authorization

Fine Grain Control

Fast Changes to Authorization Rules

Page 11: Taking Hadoop to Enterprise Security Standards

Our Solution: Access Control via Encryption

Diagram labels: Apache Kafka, Encrypted Events, ETL, Parquet, HDFS, Key Server, Event name, Symmetric Encryption Key

Page 12: Taking Hadoop to Enterprise Security Standards

Access Control via Encryption

Diagram labels: Producer Job, ETL User, Parquet File, Key Server, User A’s Job, User B’s Job, User C’s Job

User    Columns
A       5
B       2, 5
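The user-to-columns table on this slide can be read as a lookup that a key server performs before handing out decryption keys. The following is a minimal sketch under that reading, not the presenters' implementation: `ACCESS_TABLE`, `COLUMN_KEYS`, and `keys_for` are hypothetical names, and the assumptions that each column has its own key and that a user with no table entry (User C here) receives no keys are illustrative.

```python
import secrets

# Hypothetical access table mirroring the slide: user -> readable column numbers.
ACCESS_TABLE = {"A": {5}, "B": {2, 5}}

# Hypothetical per-column symmetric keys for a single Parquet file (assumption).
COLUMN_KEYS = {column: secrets.token_bytes(32) for column in range(1, 7)}


def keys_for(user: str) -> dict[int, bytes]:
    """Return only the column keys that `user` is authorized to receive."""
    allowed = ACCESS_TABLE.get(user, set())
    return {column: COLUMN_KEYS[column] for column in sorted(allowed)}


print(sorted(keys_for("A")))  # [5]
print(sorted(keys_for("B")))  # [2, 5]
print(sorted(keys_for("C")))  # [] because C has no entry in the table
```

Because every job reads the same encrypted file, changing a user's access in this sketch means editing one table entry rather than rewriting data or file permissions.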

Page 13: Taking Hadoop to Enterprise Security Standards

Brief Overview of Parquet

Columnar Storage

Parquet Format

Diagram labels: Row group, Column a, Column b, Page 0, Page 1, Page 2

Page 14: Taking Hadoop to Enterprise Security Standards

Encryption Support in Parquet*

Field Mode | Page Mode | Hybrid Mode

Diagram labels: Column, Page

*Yet to be integrated into open source Parquet

Page 15: Taking Hadoop to Enterprise Security Standards

Field Mode

Example: Emails. Analysts need them to join with other tables but may not require access to individual emails.

N Values (Page): encrypt one value at a time

[email protected] -> xxxxxxx
[email protected] -> yyyyyyy
[email protected] -> yyyyyyy
[email protected] -> zzzzzzz

Page 16: Taking Hadoop to Enterprise Security Standards

Field Mode

Joins, Counts, Distribution Analysis

No/Low compression
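Joins and counts can work on protected values only if equal plaintexts map to equal protected values. The sketch below illustrates that property with a keyed HMAC token from the Python standard library. This is a one-way stand-in chosen for illustration, not the cipher used in the talk, and `FIELD_KEY`, `protect`, and the sample tables are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

FIELD_KEY = b"hypothetical-field-key"  # assumption: would be issued by the key server


def protect(value: str) -> str:
    """Deterministic keyed token: equal inputs always give equal outputs."""
    digest = hmac.new(FIELD_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical tables keyed by email address.
signups = [("a@example.com", "2014-01-03"), ("b@example.com", "2014-02-11")]
logins = ["b@example.com", "b@example.com", "a@example.com", "c@example.com"]

protected_signups = {protect(email): day for email, day in signups}
protected_logins = [protect(email) for email in logins]

# Join and count on tokens without ever handling a plaintext email.
joined = [
    (token, protected_signups[token])
    for token in protected_logins
    if token in protected_signups
]
login_counts = Counter(protected_logins)

print(len(joined))
print(login_counts.most_common(1))
```

The same determinism is what exposes the value distribution to anyone who can read the column, and the protected values look random to a compressor, which is consistent with the "No/Low compression" note.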

Page 17: Taking Hadoop to Enterprise Security Standards

Page Mode

No information is leaked except the entropy of the data

Better performance than other modes

N Values (Page): Encode -> Compress -> Encrypt
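The order of the three steps matters: compressing before encrypting keeps the page small, whereas ciphertext is close to incompressible. The sketch below shows the pipeline on one page of values. It is illustrative only: `keystream_xor` is a toy SHA-256 counter-mode cipher written for this example (not secure, and not what the talk used), and `write_page` and `read_page` are hypothetical helpers.

```python
import hashlib
import os
import zlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def write_page(values: list[str], key: bytes) -> bytes:
    encoded = "\n".join(values).encode("utf-8")            # encode
    compressed = zlib.compress(encoded)                     # compress
    nonce = os.urandom(16)
    return nonce + keystream_xor(key, nonce, compressed)    # encrypt


def read_page(page: bytes, key: bytes) -> list[str]:
    nonce, body = page[:16], page[16:]
    decrypted = keystream_xor(key, nonce, body)
    return zlib.decompress(decrypted).decode("utf-8").split("\n")


key = os.urandom(32)
values = ["user@example.com"] * 1000
page = write_page(values, key)
assert read_page(page, key) == values

# For comparison: encrypting first leaves the compressor almost nothing to exploit.
raw = "\n".join(values).encode("utf-8")
encrypt_then_compress = zlib.compress(keystream_xor(key, os.urandom(16), raw))
print(len(raw), len(page), len(encrypt_then_compress))
```

In this sketch a reader without the key learns only the page length, which is roughly what the slide means by leaking nothing except the entropy of the data.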

Page 18: Taking Hadoop to Enterprise Security Standards

Hybrid Mode

Finer-grained control of information

Increased overhead due to double encryption/decryption

N Values (Page): Encrypt each value -> Encrypt the page

Page 19: Taking Hadoop to Enterprise Security Standards

Plain Text | Encrypted Value | No Access

Field Mode | Page Mode | Hybrid Mode

Page 20: Taking Hadoop to Enterprise Security Standards

Key Versioning

Each key is versioned and specific to a source (file/event name)

Reduces exposure in case of key leakage

Time-based access control

– All users by default can access only the last 30 days of data
– Give users access to data in a specific time period

Authentication of producers can be done separately
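Time-based access control follows naturally once keys are versioned: the key server can refuse to release key versions that fall outside a user's window. The sketch below assumes one key version per source per day, which the slides do not state. `GRANTS`, `key_id`, `may_fetch_key`, and the event name `PageViewEvent` are hypothetical.

```python
from datetime import date, timedelta

DEFAULT_WINDOW = timedelta(days=30)  # from the slide: last 30 days by default

# Hypothetical grants: (user, source) -> extra (start, end) date ranges.
GRANTS = {
    ("analyst", "PageViewEvent"): [(date(2014, 1, 1), date(2014, 1, 31))],
}


def key_id(source: str, day: date) -> str:
    """Assumption: keys are versioned per source and per day."""
    return f"{source}/{day.isoformat()}"


def may_fetch_key(user: str, source: str, day: date, today: date) -> bool:
    """Allow the default recent window, plus any explicitly granted period."""
    if today - DEFAULT_WINDOW <= day <= today:
        return True
    ranges = GRANTS.get((user, source), [])
    return any(start <= day <= end for start, end in ranges)


today = date(2014, 6, 1)
print(key_id("PageViewEvent", date(2014, 5, 20)))
print(may_fetch_key("analyst", "PageViewEvent", date(2014, 5, 20), today))
print(may_fetch_key("analyst", "PageViewEvent", date(2014, 1, 15), today))
print(may_fetch_key("intern", "PageViewEvent", date(2014, 1, 15), today))
```

Scoping each key to one source and one version also bounds the damage from a leaked key to the data that key protects.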

Page 21: Taking Hadoop to Enterprise Security Standards

Key Server Features

Better Auditing Coverage

Retention Enforcement

Multifactor Authentication

Page 22: Taking Hadoop to Enterprise Security Standards

Pig Usage

Page 23: Taking Hadoop to Enterprise Security Standards

Thank you!