Apache Eagle: A Distributed Real-Time Hadoop Data Security Engine from eBay


Transcript of Apache Eagle: A Distributed Real-Time Hadoop Data Security Engine from eBay

Page 1: Apache Eagle: A Distributed Real-Time Hadoop Data Security Engine from eBay

Apache Eagle: a distributed real-time Hadoop data security engine from eBay

蒋吉麟 | 赵晴雯, eBay

Page 2

Agenda
• About Eagle
• Front End
  – Evolution
  – Modularization
  – Features
• Back End
  – Architecture
  – Tech Highlights
  – Integration
• Q & A

Page 3

Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay. It was open-sourced as an Apache Incubator project on Oct 26th, 2015.

See http://eagle.incubator.apache.org or http://goeagle.io

Page 4

Hadoop @eBay
• 2007: 1-10 nodes
• 2009: 10+ nodes
• 2010: 100+ nodes, 1,000+ cores, 1 PB
• 2011: 1,000+ nodes, 10,000+ cores, 10+ PB
• 2013: 4,000+ nodes, 40,000+ cores, 50+ PB
• 2015: 10,000+ nodes, 150,000+ cores, 150+ PB

Page 5

• swf
• exe

Page 6

Features
• common
• metadata
• classification
• metrics

Page 7

common
• Policies
• Alerts

Page 8

metadata

Page 9

classification
• Tree View
• Table View

Page 10

metrics

Page 11

Architecture (diagram labels)
• Data Collection (Kafka, Yarn API, …)
• Hadoop JMX
• Stream Processing Engine
  – User-profile-based anomaly detection
  – Policy-evaluation-based framework
• User Profile training
• Eagle Storage (metadata, metrics, alerts, …)
• Eagle Query
• Data Sink (email, Kafka, …)
• Other remediation systems, …

Page 12

Tech Highlights
• Data Collection
• Stream Processing DSL
• Distributed Policy Engine
• ML-based anomaly detection
• Query Framework

NOTE: {NAME}-{NUMBER}, such as HDFS-6914, is the ticket ID of an open source project issue that we contributed.

Page 13

Apache Eagle – Data Collection

Decoupled with Apache Kafka
• High-throughput distributed messaging
• Easy to inject various kinds of data sources
• Python/Java/C++ Kafka clients

Currently supported data sources
• Hadoop data
  – HDFS and HBase audit logs
  – GC logs
  – JMX metrics
  – History/running MR job data
• …
• Generic format data
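To make the audit-log data source concrete, here is a minimal sketch of turning one HDFS audit log line into fields before it would be published to a message queue. The class name `AuditLogParser` is hypothetical and not part of Eagle, the sample line is made up for illustration, and the sketch assumes the usual HDFS layout of tab-separated key=value pairs after the `FSNamesystem.audit:` marker.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Eagle: turns one HDFS audit log line into key/value fields.
class AuditLogParser {

    // Parses the tab-separated key=value pairs that follow the "FSNamesystem.audit:" marker.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String marker = "FSNamesystem.audit:";
        int start = line.indexOf(marker);
        if (start < 0) {
            return fields; // not an audit line; return an empty map
        }
        String payload = line.substring(start + marker.length()).trim();
        for (String pair : payload.split("\t")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line in the assumed audit log layout.
        String line = "2015-11-20 10:00:00,123 INFO FSNamesystem.audit: allowed=true\t"
                + "ugi=alice (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\tsrc=/tmp/private\tdst=null\tperm=null";
        Map<String, String> event = parse(line);
        System.out.println(event.get("ugi") + " ran " + event.get("cmd") + " on " + event.get("src"));
    }
}
```

In a real deployment the resulting fields would be serialized and sent through a Kafka client; that step is left out here to keep the sketch free of external dependencies.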

Page 14

Apache Eagle – Stream Processing DSL

Easy to use
– Easily assemble data transformation, filtering, join, …

Flexibility
– Independent of the physical execution platform (stream processing engine)

    val env = ExecutionEnvironment.getStorm()
    env.fromKafka(KafkaConfig)
       .flatMap(AuditLogTransformer)
       .groupBy(_.user)
       .flatMap(UserProfileAggregator)
       .alert.persistAndEmail
    env.execute()
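The shape of the pipeline above (transform, group by user, aggregate) can be sketched with the standard `java.util.stream` API. This is only an analogy on an in-memory list, not Eagle's DSL and not a Storm topology; `PipelineSketch`, `AuditEvent`, and the `user,cmd` input format are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: plain java.util.stream, not Eagle's DSL or a Storm topology.
class PipelineSketch {

    // Hypothetical event type standing in for a parsed audit log entry.
    static class AuditEvent {
        final String user;
        final String cmd;
        AuditEvent(String user, String cmd) { this.user = user; this.cmd = cmd; }
        String getUser() { return user; }
    }

    // Stand-in for AuditLogTransformer: one "user,cmd" line yields zero or one event.
    static Stream<AuditEvent> transform(String line) {
        String[] parts = line.split(",");
        return parts.length == 2
                ? Stream.of(new AuditEvent(parts[0].trim(), parts[1].trim()))
                : Stream.empty();
    }

    // Stand-in for groupBy(_.user) plus UserProfileAggregator: count events per user.
    static Map<String, Long> eventsPerUser(List<String> lines) {
        return lines.stream()
                .flatMap(PipelineSketch::transform)
                .collect(Collectors.groupingBy(AuditEvent::getUser, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alice,open", "bob,delete", "malformed", "alice,delete");
        System.out.println(eventsPerUser(lines)); // prints {alice=2, bob=1}
    }
}
```

The point of the DSL on the slide is that the same chain of operators can be bound to a distributed engine, whereas this sketch runs in a single JVM.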

Page 15

Apache Eagle – Stream Processing DSL

    val env = ExecutionEnvironment.getStorm()
    env.fromKafka(KafkaConfig)
       .flatMap(AuditLogTransformer)
       .groupBy(_.user)
       .flatMap(UserProfileAggregator)
       .alert.persistAndEmail
    env.execute()

Diagram: env.execute() → Distributed Streaming Cluster Environment: Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Alerts

Page 16

Apache Eagle – Distributed Real-time Policy Engine

Features
• Extensibility
• Usability
• Real-time
• Scalability
• Metadata-driven

Diagram: Policy Management → Policy → Metadata Manager, which provides Dynamic Policy Deployment and Dynamic Stream Schema to the Distributed Streaming Cluster Environment: Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Alerts → Real Time Alerts

Page 17

Apache Eagle – Distributed Real-time Policy Engine

Extensibility
• Default is WSO2 Siddhi CEP
• Powerful SQL-like event stream processing
• Open to other customized policy engines

Extensible Policy Evaluator

    public interface PolicyEvaluatorServiceProvider {
        // literal string to identify one type of policy
        public String getPolicyType();
        // get policy evaluator implementation
        public Class<? extends PolicyEvaluator> getPolicyEvaluator();
        // policy text with json format to object mapping
        public List getBindingModules();
    }

    public interface PolicyEvaluator {
        // evaluate input event
        public void evaluate(ValuesArray input) throws Exception;
        // policy update
        public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef);
        // invoked when policy is deleted
        public void onPolicyDelete();
    }

Diagram: Metadata Manager → Policy/Metadata → Distributed Real-time Policy Engine (Siddhi CEP Policy Evaluator, Machine Learning Policy Evaluator)
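As a rough illustration of how a custom evaluator could plug into such a provider interface, the sketch below uses simplified stand-ins. Everything in it is hypothetical: the real interfaces take `ValuesArray` and `AlertDefinitionAPIEntity`, which are replaced here by plain `double` values, and `newEvaluator()` is an invented factory method used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the interfaces on the slide; Eagle's real
// signatures use ValuesArray and AlertDefinitionAPIEntity, which are not reproduced here.
class EvaluatorSketch {

    interface PolicyEvaluator {
        void evaluate(double value);           // evaluate one input event
        void onPolicyUpdate(double threshold); // policy definition changed
        void onPolicyDelete();                 // policy removed
    }

    interface PolicyEvaluatorServiceProvider {
        String getPolicyType();                // identifies one type of policy
        PolicyEvaluator newEvaluator();        // hypothetical factory method
    }

    // A custom evaluator that records an alert when a value exceeds the threshold.
    static class ThresholdEvaluator implements PolicyEvaluator {
        private double threshold;
        private boolean active = true;
        final List<Double> alerts = new ArrayList<>();

        ThresholdEvaluator(double threshold) { this.threshold = threshold; }

        @Override public void evaluate(double value) {
            if (active && value > threshold) alerts.add(value);
        }
        @Override public void onPolicyUpdate(double newThreshold) { threshold = newThreshold; }
        @Override public void onPolicyDelete() { active = false; }
    }

    static class ThresholdProvider implements PolicyEvaluatorServiceProvider {
        @Override public String getPolicyType() { return "threshold"; }
        @Override public ThresholdEvaluator newEvaluator() { return new ThresholdEvaluator(1000); }
    }

    public static void main(String[] args) {
        ThresholdEvaluator e = new ThresholdProvider().newEvaluator();
        e.evaluate(500);        // below threshold, no alert
        e.evaluate(1500);       // above threshold, alert recorded
        e.onPolicyUpdate(2000); // policy updated at runtime
        e.evaluate(1500);       // now below the new threshold
        System.out.println(e.alerts); // prints [1500.0]
    }
}
```

The update and delete callbacks are what allow a policy to change while the stream keeps running, which is the lifecycle the interface on the slide exposes.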

Page 18

Apache Eagle – Distributed Real-time Policy Engine

Usability
• Powerful SQL-like CEP CQL for policy definition
• Dynamic policy lifecycle management (deployment/update)
• Easy-to-use policy management and alert analytics UI

    from metricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;

Diagram: Policy Management → Policy → Metadata Manager → Dynamic Policy Deployment → Distributed Streaming Cluster Environment → Alerts → Real Time Alerts
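The semantics of the policy above can be spelled out in plain Java: every event on the input stream whose `name` equals `ReplLag` and whose `value` exceeds 1000 is passed through to the output. The sketch does not use the Siddhi library, and the `Metric` class is a hypothetical event shape with just the two attributes the query references.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration of the filter's semantics; this does not use the Siddhi library.
class ReplLagFilter {

    // Hypothetical event shape with the two attributes the query references.
    static class Metric {
        final String name;
        final double value;
        Metric(String name, double value) { this.name = name; this.value = value; }
    }

    // Keeps only events where name == 'ReplLag' and value > 1000.
    static List<Metric> apply(List<Metric> metricStream) {
        return metricStream.stream()
                .filter(m -> "ReplLag".equals(m.name) && m.value > 1000)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Metric> in = Arrays.asList(
                new Metric("ReplLag", 1500), new Metric("ReplLag", 200), new Metric("HeapUsed", 5000));
        for (Metric m : apply(in)) {
            System.out.println(m.name + " " + m.value);
        }
    }
}
```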

Page 19

Apache Eagle – Distributed Real-time Policy Engine

Page 20

Apache Eagle – Distributed Real-time Policy Engine

Real-time
• Stream events are processed and alerts are evaluated during streaming

Diagram: Real-time Event Stream (Stream_{1} … Stream_{*}) → Stream Processing → Distributed Streaming: AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N} → Alerts → Real Time Alerts

Page 21

Apache Eagle – Distributed Real-time Policy Engine

Metadata-Driven
• Stream Schema: AlertStreamSchemaEntity
• Policy Definition: AlertDefinitionAPIEntity

    @Table("alertdef")
    @ColumnFamily("f")
    @Prefix("alertdef")
    @Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
    @JsonIgnoreProperties(ignoreUnknown = true)
    @TimeSeries(false)
    @Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
    @Indexes({
        @Index(name="Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true),
    })
    public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity {
        @Column("a")
        private String desc;
        @Column("b")
        private String policyDef;
        @Column("c")
        private String dedupeDef;
        …

Diagram: Metadata Manager → Dynamic Metadata Loading → Distributed Real-time Policy Engine
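One way to picture "metadata-driven" is that the storage mapping is read from annotations at runtime instead of being hard-coded. The sketch below defines its own minimal `@Table` and `@Column` annotations that mirror the names on the slide; they are hypothetical stand-ins, not Eagle's classes, and the reflection logic is an assumption about how such metadata could be consumed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical annotations that mirror the names on the slide; not Eagle's own classes.
class MetadataSketch {

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface Table { String value(); }

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
    @interface Column { String value(); }

    @Table("alertdef")
    static class AlertDef {
        @Column("a") String desc;
        @Column("b") String policyDef;
        @Column("c") String dedupeDef;
    }

    // Reads the storage table name from the class-level annotation.
    static String tableOf(Class<?> entity) {
        return entity.getAnnotation(Table.class).value();
    }

    // Builds a field-name to column-qualifier mapping from the field annotations.
    static Map<String, String> columnsOf(Class<?> entity) {
        Map<String, String> columns = new TreeMap<>();
        for (Field f : entity.getDeclaredFields()) {
            Column c = f.getAnnotation(Column.class);
            if (c != null) {
                columns.put(f.getName(), c.value());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(tableOf(AlertDef.class) + " " + columnsOf(AlertDef.class));
    }
}
```

Because the mapping lives in the entity definition, adding a new entity type needs no change to the code that reads or writes it.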

Page 22

Apache Eagle – Distributed Real-time Policy Engine

Scalability
• Policy scalability: policy partitioning
• Event scalability: grouping
• Example: N users in 3 partitions and M policies in 2 partitions give 3*2 = 6 physical tasks

Diagram: Distributed Streaming Cluster Environment: Stream_{1} … Stream_{*} → Stream Processing → AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N}
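The arithmetic in the example can be sketched as follows: users are hashed into 3 event partitions, policies into 2 policy partitions, and each (event partition, policy partition) pair is one physical task. The hash-modulo scheme and all names are assumptions for illustration; Eagle's actual partitioner may differ.

```java
// Hypothetical partitioning scheme for illustration; Eagle's actual partitioner may differ.
class PartitionSketch {
    static final int EVENT_PARTITIONS = 3;  // events grouped by user into 3 partitions
    static final int POLICY_PARTITIONS = 2; // policies split into 2 partitions

    // Maps a key to a partition in [0, partitions) using a non-negative hash.
    static int partition(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    // Each (event partition, policy partition) pair is one physical task.
    static int taskFor(String user, String policyId) {
        return partition(user, EVENT_PARTITIONS) * POLICY_PARTITIONS
                + partition(policyId, POLICY_PARTITIONS);
    }

    static int taskCount() {
        return EVENT_PARTITIONS * POLICY_PARTITIONS;
    }

    public static void main(String[] args) {
        System.out.println("physical tasks: " + taskCount()); // prints "physical tasks: 6"
        System.out.println("alice/policy-1 -> task " + taskFor("alice", "policy-1"));
    }
}
```

With this layout the number of users N and policies M can grow without changing the task count; only the partition counts determine parallelism.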

Page 23

Apache Eagle – Query Framework

Query Syntax
• Full-function SQL-like REST query (aggregation, sorting, …)

Eagle Storage
• NoSQL storage like HBase
• RDBMS
• Other storage systems

Page 25

Apache Eagle – Integration I
• Eagle in Apache Ambari
  – Natively part of the Hadoop ecosystem
  – http://eagle.incubator.apache.org/docs/ambari-plugin-install.html
• Eagle in Docker
  – Runs natively on cloud/container platforms
  – https://github.com/apache/incubator-eagle

Page 26

Apache Eagle – Integration II
• Apache Ranger
  – Remediation engine
  – Eagle data source
• Splunk
  – Eagle alert consumer
  – Eagle alert output is the 1st abstraction of analytics and Splunk is the 2nd abstraction
• Dataguise, Apache Knox
  – Eagle data source

Page 27

Learn more about Apache Eagle
• EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)
• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER

Page 28

Q&A

apache/incubator-eagle

@TheApacheEagle

@ApacheEagle

http://eagle.incubator.apache.org