Apache Eagle @ IEEE International Conference

Post on 09-Feb-2017


EAGLE: User Profile-based Anomaly Detection for Securing Hadoop Clusters

01 NOV, 2015

CHAITALI GUPTA, RANJAN SINHA, YONG ZHANG

Outline

Why EAGLE?

Architecture of EAGLE

User Profiles in EAGLE

Experiments

Performance Results

Future Work

Big Data @ eBay

800M Listings*

159M Global Active Buyers *

*Q3 2015 data

7 Hadoop Clusters*

800M HDFS operations (single cluster)*

120 PB Data*

Motivation

Who is accessing the data?

What data are they accessing?

Is someone trying to access data that they don’t have access to?

Are there any anomalous access patterns?

Is there a security threat?

How to monitor and get notified during or prior to an anomalous event occurring?

ARCHITECTURE

[Architecture diagram. Labeled components: Data Collector, Kafka, Stream Processing Engine, Metadata Manager, Data Stores, Remediation Engine, Apache Ranger, Custom module, Machine Learning Module, Machine Learning Training Module, Real Time Alert Dashboard, HDFS Archive, Admin Console.]

[Diagram inputs: HDFS, Audit, Security. Labeled flows: Activities, Alerts, Insights, Policy, Thresholds, User properties, ML Thresholds, Metadata Management. Roles: Security Analyst, Security Engineer.]

USER PROFILE ALGORITHMS: Density Estimation

• Compute mean and standard deviation

• Compute probability density estimation

• Flag an anomaly if the probability density falls below the minimum probability density observed in the training set
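The three density-estimation steps can be sketched roughly as follows. This is a minimal illustration, not Eagle's implementation: it assumes each feature is modeled as an independent Gaussian and works with log-densities to avoid underflow. The function names and the sample data are hypothetical.

```python
import math
from statistics import mean, pstdev

def log_density(row, means, stds):
    """Log of the product of independent per-feature Gaussian densities."""
    total = 0.0
    for x, mu, sigma in zip(row, means, stds):
        z = (x - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total

def fit_profile(training_rows):
    """Return per-feature means, standard deviations, and the lowest
    log-density observed over the training rows (used as the threshold)."""
    columns = list(zip(*training_rows))
    means = [mean(col) for col in columns]
    # Assumption: floor the standard deviation so a constant feature
    # does not cause a division by zero.
    stds = [max(pstdev(col), 1e-6) for col in columns]
    threshold = min(log_density(row, means, stds) for row in training_rows)
    return means, stds, threshold

def is_anomalous(row, profile):
    """Anomalous if the density is below the minimum seen in training."""
    means, stds, threshold = profile
    return log_density(row, means, stds) < threshold

# Hypothetical per-minute counts of two HDFS commands for one user.
training = [(10, 2), (12, 3), (11, 2), (9, 1), (10, 2)]
user_profile = fit_profile(training)
```

With this sketch, `is_anomalous((500, 40), user_profile)` flags a burst far outside the training range, while every training row stays at or above the threshold by construction.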

USER PROFILE ALGORITHMS: Eigenvalue Decomposition

• Compute mean and variance

• Compute eigenvectors and determine principal components

• Normal data points lie near the first few principal components

• Abnormal data points lie farther from the first few principal components and closer to later components
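The idea in the bullets above can be illustrated with a small sketch. To stay dependency-free it uses power iteration to approximate only the first principal component, rather than a full eigenvalue decomposition, and it scores a point by its distance from that component. The threshold (largest training residual), the function names, and the data are all assumptions made for illustration.

```python
import math

def fit_scaler(rows):
    """Per-feature mean and standard deviation of the training rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    variances = [sum((x - m) ** 2 for x in c) / n for c, m in zip(cols, means)]
    # Fall back to 1.0 for a constant feature to avoid dividing by zero.
    stds = [math.sqrt(v) or 1.0 for v in variances]
    return means, stds

def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def first_component(matrix, iterations=200):
    """Approximate the dominant eigenvector by power iteration.
    The asymmetric start vector lowers the chance of starting
    orthogonal to the dominant eigenvector."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def residual(row, component):
    """Distance from a scaled row to the line spanned by the component."""
    projection = sum(x * c for x, c in zip(row, component))
    return math.sqrt(sum((x - projection * c) ** 2
                         for x, c in zip(row, component)))

# Hypothetical training data: two strongly correlated features.
training = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
means, stds = fit_scaler(training)
scaled = [scale(r, means, stds) for r in training]
component = first_component(covariance(scaled))
threshold = max(residual(r, component) for r in scaled)

def is_anomalous(row):
    return residual(scale(row, means, stds), component) > threshold
```

A point such as `(5, 2)` breaks the correlation seen in training, so it lies far from the first component and is flagged. A fuller version would keep several leading components and measure the residual against that subspace.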

USER PROFILE ARCHITECTURE

EXPERIMENTAL METHODOLOGY

User Population

• 1,500 eBay users accessing Hadoop clusters

Features

• HDFS operation frequencies aggregated across one-minute intervals

• Examples

  • Command frequencies

  • Time of the job
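As a rough illustration of the feature extraction described above, the sketch below counts HDFS commands per user in one-minute buckets. The event tuple layout `(timestamp, user, command)`, the command names, and the function names are hypothetical, not Eagle's audit-log schema.

```python
from collections import Counter, defaultdict
from datetime import datetime

def aggregate_per_minute(events):
    """Count commands per (user, minute) bucket.

    `events` is an iterable of (timestamp, user, command) tuples.
    """
    buckets = defaultdict(Counter)
    for timestamp, user, command in events:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[(user, minute)][command] += 1
    return buckets

def to_feature_vector(counts, commands):
    """Order the counts by a fixed command list; missing commands count as 0."""
    return [counts[c] for c in commands]

# Hypothetical audit events for one user.
events = [
    (datetime(2015, 11, 1, 10, 0, 5), "alice", "open"),
    (datetime(2015, 11, 1, 10, 0, 40), "alice", "open"),
    (datetime(2015, 11, 1, 10, 0, 59), "alice", "delete"),
    (datetime(2015, 11, 1, 10, 1, 10), "alice", "open"),
]
buckets = aggregate_per_minute(events)
first_minute = buckets[("alice", datetime(2015, 11, 1, 10, 0))]
vector = to_feature_vector(first_minute, ["open", "delete", "rename"])
```

Each bucket then yields one fixed-length vector per user per minute, which is the shape of input the profile algorithms need. A time-of-job feature could be appended in the same way, for example the hour of the bucket.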

EXPERIMENTAL METHODOLOGY…

Determine users who are behaviorally different

• Compute the Mahalanobis distance between users' data, where μ and σ are the mean and standard deviation

• Compute clusters

• For each user, use the behaviorally different users as the cross-validation set
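One way to picture the selection of behaviorally different users is sketched below. It assumes a diagonal covariance, so the Mahalanobis distance reduces to sqrt(sum(((x_i - μ_i) / σ_i)^2)), and it replaces the clustering step with a plain distance cutoff. The cutoff value, the function names, and the data are hypothetical.

```python
import math
from statistics import mean, pstdev

def profile(rows):
    """Per-feature mean and standard deviation for one user's rows."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Assumption: floor the deviation to avoid dividing by zero.
    stds = [max(pstdev(c), 1e-6) for c in cols]
    return means, stds

def mahalanobis_diag(point, means, stds):
    """Mahalanobis distance under a diagonal covariance assumption."""
    return math.sqrt(sum(((x - m) / s) ** 2
                         for x, m, s in zip(point, means, stds)))

def different_users(target, user_rows, min_distance=3.0):
    """Users whose mean behavior lies at least `min_distance` away
    from the target user's profile (a stand-in for clustering)."""
    means, stds = profile(user_rows[target])
    result = []
    for user, rows in user_rows.items():
        if user == target:
            continue
        other_means, _ = profile(rows)
        if mahalanobis_diag(other_means, means, stds) >= min_distance:
            result.append(user)
    return sorted(result)

# Hypothetical per-minute feature rows for three users.
user_rows = {
    "alice": [(10, 2), (12, 3), (11, 2), (9, 1)],
    "bob": [(11, 2), (10, 3), (12, 2)],
    "carol": [(200, 50), (220, 60), (210, 55)],
}
validation_users = different_users("alice", user_rows)
```

Here the rows of the users returned for `"alice"` would serve as examples that her profile should flag as anomalous during cross-validation.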

PERFORMANCE RESULTS

Sensitivity

FUTURE WORK

• Apache incubation releases

• Twitter feed: https://twitter.com/theapacheeagle

• Extend to Hive, HBase, Pig, and other Big Data technologies

• Explore alternative algorithms

• Consider more features

APACHE EAGLE - OPEN SOURCE

Eagle Site: http://goeagle.io

Tech Blog: http://www.ebaytechblog.com

GitHub Repo: https://github.com/eBay/Eagle

Apache Incubator Project: Oct 26, 2015

Thank You!