Apache Eagle @ IEEE International Conference

EAGLE: User Profile-based Anomaly Detection for Securing Hadoop Clusters 01 NOV, 2015 CHAITALI GUPTA, RANJAN SINHA, YONG ZHANG


Page 1: Apache Eagle @ IEEE International Conference

EAGLE: User Profile-based Anomaly Detection for Securing Hadoop Clusters

01 NOV, 2015

CHAITALI GUPTA, RANJAN SINHA, YONG ZHANG

Page 2: Apache Eagle @ IEEE International Conference

Outline

Why EAGLE?

Architecture of EAGLE

User Profiles in EAGLE

Experiments

Performance Results

Future Work

Page 3: Apache Eagle @ IEEE International Conference

Big Data @ eBay

800M Listings*

159M Global Active Buyers *

*Q3 2015 data

7 Hadoop Clusters*

800M HDFS operations (single cluster)*

120 PB Data*

Page 4: Apache Eagle @ IEEE International Conference

Motivation

Who is accessing the data?

What data are they accessing?

Is someone trying to access data that they don’t have access to?

Are there any anomalous access patterns?

Is there a security threat?

How can we monitor access and get notified during, or before, an anomalous event?

Page 5: Apache Eagle @ IEEE International Conference

ARCHITECTURE

[Architecture diagram. Labels recovered from the figure:]

• Data Collector

• Kafka

• HDFS, Audit, Security

• Stream Processing Engine

• Metadata Manager

• Data Stores

• Remediation Engine

• Apache Ranger

• Custom module

• Machine Learning Module

• Machine Learning Training Module

• HDFS Archive

• Real Time Alert Dashboard

• Admin Console

• Security Analyst, Security Engineer

• Activities, Alerts, Insights, Metadata Management

• Policy, Thresholds, User properties, ML Thresholds

Page 6: Apache Eagle @ IEEE International Conference

USER PROFILE ALGORITHMS

Density Estimation

• Compute mean and standard deviation

• Compute probability density estimation

• Flag an anomaly if the probability density falls below the minimum probability density seen so far in the training set
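The three steps above can be sketched as follows. This is a minimal illustration, not Eagle's implementation: it assumes each feature is modeled as an independent Gaussian, and the function names (`fit_profile`, `density`, `is_anomaly`) and the sample counts are hypothetical.

```python
import math
from statistics import mean, pstdev

def density(row, mus, sigmas):
    """Product of per-feature Gaussian densities (independence is an assumption)."""
    p = 1.0
    for x, mu, sigma in zip(row, mus, sigmas):
        p *= math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return p

def fit_profile(training_rows):
    """Return per-feature means, standard deviations, and the minimum training density."""
    columns = list(zip(*training_rows))
    mus = [mean(col) for col in columns]
    # Floor the standard deviation so a constant feature does not divide by zero.
    sigmas = [max(pstdev(col), 1e-6) for col in columns]
    min_density = min(density(row, mus, sigmas) for row in training_rows)
    return mus, sigmas, min_density

def is_anomaly(row, profile):
    """Anomalous if the density is below the minimum density seen in training."""
    mus, sigmas, min_density = profile
    return density(row, mus, sigmas) < min_density

# Hypothetical per-minute counts for one user: [open operations, delete operations]
training = [[10, 1], [12, 0], [11, 2], [9, 1], [10, 1]]
profile = fit_profile(training)
typical = is_anomaly([10, 1], profile)
unusual = is_anomaly([200, 50], profile)
```

Using the training minimum as the threshold means no training point is ever flagged; a stricter or looser cut-off would be a tuning decision outside this sketch.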

Page 7: Apache Eagle @ IEEE International Conference

USER PROFILE ALGORITHMS…

Eigen Value Decomposition

• Compute mean and variance

• Compute Eigen Vectors and determine Principal Components

• Normal data points lie near first few principal components

• Abnormal data points lie further from first few principal components and closer to later components
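The idea can be illustrated with a small sketch. It makes several assumptions that are not in the slides: only the first principal component is kept, it is approximated with power iteration rather than a full eigenvalue decomposition, and the anomaly score is the distance from a point to that component. All names and the sample data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return means, [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Population covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]

def top_component(cov, iterations=200):
    """Power iteration: approximates the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def residual(row, means, component):
    """Distance from the point to the line spanned by the first component."""
    c = [x - m for x, m in zip(row, means)]
    proj = sum(a * b for a, b in zip(c, component))
    return math.sqrt(max(sum(a * a for a in c) - proj * proj, 0.0))

# Hypothetical training data lying close to a single direction
train = [[1, 2.1], [2, 3.9], [3, 6.0], [4, 8.1], [5, 9.9]]
means, centered = center(train)
component = top_component(covariance(centered))
threshold = max(residual(row, means, component) for row in train)

def is_anomaly(row):
    """Anomalous if the point lies farther from the component than any training point."""
    return residual(row, means, component) > threshold
```

A point such as `[3, 0.0]` sits well off the dominant direction and is flagged, while points that follow the training trend are not.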

Page 8: Apache Eagle @ IEEE International Conference

USER PROFILE ARCHITECTURE

Page 9: Apache Eagle @ IEEE International Conference

EXPERIMENTAL METHODOLOGY

User Population

• 1500 eBay users accessing Hadoop clusters

Features

• HDFS operation frequencies aggregated across one-minute intervals

• Examples: command frequencies, time of the job
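A minimal sketch of how per-minute command frequencies might be built from audit events. The event shape `(epoch_seconds, user, command)`, the command names, and the function `minute_features` are assumptions for illustration, not Eagle's actual log schema.

```python
from collections import Counter, defaultdict

def minute_features(events, commands):
    """Count each command per (user, minute) and return fixed-order feature vectors.

    events: iterable of (epoch_seconds, user, command) tuples (hypothetical shape).
    commands: ordered list of command names defining the vector layout.
    """
    counts = defaultdict(Counter)
    for ts, user, cmd in events:
        counts[(user, ts // 60)][cmd] += 1
    return {key: [c[cmd] for cmd in commands] for key, c in counts.items()}

events = [
    (0, "alice", "open"),
    (10, "alice", "open"),
    (30, "alice", "delete"),
    (65, "alice", "open"),
    (70, "bob", "rename"),
]
commands = ["open", "delete", "rename"]
features = minute_features(events, commands)
```

Each key is a `(user, minute_index)` pair, so the vectors for one user form the training rows for that user's profile.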

Page 10: Apache Eagle @ IEEE International Conference

EXPERIMENTAL METHODOLOGY…

Determine users who are behaviorally different

• Compute the Mahalanobis distance between users' data, using the mean and standard deviation

• Compute clusters

• For each user, use the behaviorally different users as the cross-validation set
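The exact distance formula did not survive in this transcript. As one plausible reading, the sketch below uses the diagonal-covariance form of the Mahalanobis distance, which needs only a per-feature mean and standard deviation. The function names and sample data are hypothetical, and comparing one user's mean vector against another user's profile is an assumption.

```python
import math
from statistics import mean, pstdev

def profile(rows):
    """Per-feature mean and standard deviation for one user's feature rows."""
    cols = list(zip(*rows))
    # Floor the standard deviation so a constant feature does not divide by zero.
    return [mean(c) for c in cols], [max(pstdev(c), 1e-6) for c in cols]

def mahalanobis_diag(point, mus, sigmas):
    """Mahalanobis distance assuming a diagonal covariance matrix."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(point, mus, sigmas)))

def user_distance(rows_a, rows_b):
    """Distance from user B's mean feature vector to user A's profile."""
    mus_a, sigmas_a = profile(rows_a)
    mus_b, _ = profile(rows_b)
    return mahalanobis_diag(mus_b, mus_a, sigmas_a)

# Hypothetical per-minute feature rows for three users
alice = [[10, 1], [12, 0], [11, 2], [9, 1]]
carol = [[11, 1], [10, 1], [12, 1], [10, 2]]
bob = [[100, 40], [110, 35], [90, 45], [105, 38]]

near = user_distance(alice, carol)
far = user_distance(alice, bob)
```

Users whose distance exceeds some cut-off (here `bob` relative to `alice`) would be the behaviorally different ones whose data serves as the cross-validation set.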

Page 11: Apache Eagle @ IEEE International Conference

PERFORMANCE RESULTS

Sensitivity

Page 12: Apache Eagle @ IEEE International Conference

FUTURE WORK

• Apache incubation releases

• Twitter feed: https://twitter.com/theapacheeagle

• Extend to Hive, HBase, Pig and other Big Data technologies

• Explore alternative algorithms

• Consider more features

Page 13: Apache Eagle @ IEEE International Conference

APACHE EAGLE - OPEN SOURCE

Eagle Site: http://goeagle.io

Tech Blog: http://www.ebaytechblog.com

GitHub Repo: https://github.com/eBay/Eagle

Apache Incubator Project: Oct 26, 2015

Page 14: Apache Eagle @ IEEE International Conference

Thank You!