Integrating Apache NiFi and Apache Apex

Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Integrating Apache NiFi and Apache Apex

Feb 25th 2016

Bryan Bende – Member of Technical Staff

Outline

•  Introduction to NiFi

•  NiFi Site-To-Site

•  Apex + NiFi Integration

•  Use Case Discussion

About Me

•  Member of Technical Staff at Hortonworks

•  Apache NiFi Committer & PMC Member

•  Contributed NiFi + Apex Integration

•  Twitter: @bbende / Blog: bryanbende.com

Introduction to Apache NiFi

Apache NiFi

•  Powerful and reliable system to process and distribute data

•  Directed graphs of data routing and transformation

•  Web-based User Interface for creating, monitoring, & controlling data flows

•  Highly configurable - modify data flow at runtime, dynamically prioritize data

•  Data Provenance tracks data through entire system

•  Easily extensible through development of custom components

[1] https://nifi.apache.org/

NiFi - Terminology

FlowFile

•  Unit of data moving through the system

•  Content + Attributes (key/value pairs)

Processor

•  Performs the work, can access FlowFiles

Connection

•  Links between processors

•  Queues that can be dynamically prioritized

Process Group

•  Set of processors and their connections

•  Receive data via input ports, send data via output ports

NiFi - User Interface

•  Drag and drop processors to build a flow

•  Start, stop, and configure components in real time

•  View errors and corresponding error messages

•  View statistics and health of data flow

•  Create templates of common processors & connections

NiFi - Provenance

•  Tracks data at each point as it flows through the system

•  Records, indexes, and makes events available for display

•  Handles fan-in/fan-out, i.e. merging and splitting data

•  View attributes and content at given points in time

NiFi - Queue Prioritization

•  Configure a prioritizer per connection

•  Determine what is important for your data – time based, arrival order, importance of a data set

•  Funnel many connections down to a single connection to prioritize across data sets

•  Develop your own prioritizer if needed
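Conceptually, a prioritizer is a comparator applied to the items waiting in a connection's queue. The following is a minimal sketch of that idea using only the Java standard library. `QueuedItem`, the `priority` attribute, and the tie-breaking rule are hypothetical stand-ins chosen for illustration; they are not NiFi classes, and a real custom prioritizer would implement NiFi's own prioritizer extension point instead.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

class PrioritizerSketch {

    // Hypothetical stand-in for a queued FlowFile: id, arrival order, attributes.
    static final class QueuedItem {
        final String id;
        final long arrivalOrder;
        final Map<String, String> attributes;

        QueuedItem(String id, long arrivalOrder, Map<String, String> attributes) {
            this.id = id;
            this.arrivalOrder = arrivalOrder;
            this.attributes = attributes;
        }
    }

    // Reads the assumed "priority" attribute; lower numbers are served first.
    // Items with a missing or unparsable value sort last.
    static int priorityOf(QueuedItem item) {
        String raw = item.attributes.get("priority");
        try {
            return raw == null ? Integer.MAX_VALUE : Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return Integer.MAX_VALUE;
        }
    }

    // Order by the priority attribute, then by arrival order as a tie-breaker.
    static final Comparator<QueuedItem> BY_PRIORITY_THEN_ARRIVAL =
            Comparator.<QueuedItem>comparingInt(PrioritizerSketch::priorityOf)
                    .thenComparingLong(item -> item.arrivalOrder);

    public static void main(String[] args) {
        PriorityQueue<QueuedItem> queue = new PriorityQueue<>(BY_PRIORITY_THEN_ARRIVAL);
        queue.add(new QueuedItem("a", 1L, Collections.singletonMap("priority", "5")));
        queue.add(new QueuedItem("b", 2L, Collections.singletonMap("priority", "1")));
        queue.add(new QueuedItem("c", 3L, Collections.<String, String>emptyMap()));
        while (!queue.isEmpty()) {
            System.out.println(queue.poll().id);
        }
    }
}
```

Funnelling several connections into one, as described above, would correspond to adding items from several sources to the same queue so that one comparator orders them all.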

NiFi - Extensibility

Built from the ground up with extensions in mind

Service-loader pattern for…

•  Processors

•  Controller Services

•  Reporting Tasks

•  Prioritizers

Extensions packaged as NiFi Archives (NARs)

•  Deploy to the NiFi lib directory and restart

•  Provides ClassLoader isolation

•  Same model as standard components

NiFi - Architecture

[Diagram: a NiFi cluster. The master NiFi Cluster Manager (NCM) runs on its own OS/host in a JVM containing a Web Server and a Request Replicator. Each slave NiFi node runs on its own OS/host in a JVM containing a Web Server and a Flow Controller that hosts processors and extensions, backed by a FlowFile Repository, Content Repository, and Provenance Repository on local storage.]

NiFi Site-To-Site

•  Direct communication between two NiFi instances

•  Push to Input Port on receiver, or Pull from Output Port on source

•  Communicate between clusters, standalone instances, or both

•  Handles load balancing and reliable delivery

•  Secure connections using certificates (optional)

Site-To-Site Push

•  Source connects Remote Process Group to Input Port on destination

•  Site-To-Site takes care of load balancing across the nodes in the cluster

[Diagram: a standalone NiFi uses a Remote Process Group (RPG) to push to the Input Ports on Node 1 and Node 2 of a cluster managed by an NCM.]

Site-To-Site Pull

•  Destination connects Remote Process Group to Output Port on the source

•  If the source were a cluster, each destination node would pull from each node in the source cluster

[Diagram: Node 1 and Node 2 of a cluster managed by an NCM each use a Remote Process Group (RPG) to pull from the Output Port of a standalone NiFi.]

Site-To-Site Client

•  Code for Site-To-Site broken out into a reusable module: https://github.com/apache/nifi/tree/master/nifi-commons/nifi-site-to-site-client

•  Can be used from any Java program to push/pull from NiFi

[Diagram: a Java program embeds the Site-To-Site Client and communicates with the Output Ports on Node 1 and Node 2 of a cluster managed by an NCM.]

Apex + NiFi Integration

•  Use Site-To-Site Client in Apex/Malhar Operators

•  Input operators to pull data from NiFi Output Port

•  Output operators to push data to NiFi Input Port

•  NiFiDataPacket to represent data to/from NiFi (think FlowFile)

public interface NiFiDataPacket {
    byte[] getContent();
    Map<String, String> getAttributes();
}
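To make the interface concrete, here is a minimal sketch of an immutable implementation. The interface is repeated from the slide so the example compiles on its own; `SimpleNiFiDataPacket` and `ofText` are hypothetical names introduced for illustration and are not necessarily what the Malhar module provides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class DataPacketSketch {

    // Interface as shown on the slide, repeated so the sketch is self-contained.
    interface NiFiDataPacket {
        byte[] getContent();
        Map<String, String> getAttributes();
    }

    // Hypothetical immutable implementation: defensive copies on the way in and out.
    static final class SimpleNiFiDataPacket implements NiFiDataPacket {
        private final byte[] content;
        private final Map<String, String> attributes;

        SimpleNiFiDataPacket(byte[] content, Map<String, String> attributes) {
            this.content = content.clone();
            this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        }

        @Override
        public byte[] getContent() {
            return content.clone();
        }

        @Override
        public Map<String, String> getAttributes() {
            return attributes;
        }
    }

    // Hypothetical convenience factory for text payloads, encoded as UTF-8.
    static NiFiDataPacket ofText(String text, Map<String, String> attributes) {
        return new SimpleNiFiDataPacket(text.getBytes(StandardCharsets.UTF_8), attributes);
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("filename", "app.log");
        NiFiDataPacket packet = ofText("ERROR disk full", attributes);
        System.out.println(new String(packet.getContent(), StandardCharsets.UTF_8)
                + " " + packet.getAttributes());
    }
}
```

Copying the content and attributes is a design choice of this sketch: it keeps a packet stable after the caller mutates its inputs, at the cost of extra allocation for large payloads.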

Apex NiFi Input Operators

AbstractNiFiInputOperator

•  Base class for NiFi Input Operators

•  Provides interaction with Site-to-Site client, handles replaying of windows

•  Delegates to sub-classes for creating a tuple and emitting a list of tuples

AbstractNiFiSinglePortInputOperator

•  Extends AbstractNiFiInputOperator and adds a single OutputPort<T>

•  Emits a list of tuples to the provided output port

NiFiSinglePortInputOperator

•  Extends AbstractNiFiSinglePortInputOperator

•  Provides implementation that produces NiFiDataPackets
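The division of labour described above (a base class that pulls data and replays windows, with subclasses deciding how to create and emit tuples) can be sketched with plain Java. This is a simplified model under stated assumptions, not the Malhar source: `ReplayingInput`, `StringInput`, the in-memory queue standing in for the Site-to-Site client, and the in-memory map standing in for window storage are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class InputOperatorSketch {

    // Hypothetical base class: pulls raw packets, records what each window emitted,
    // and re-emits the recorded tuples if the same window is started again.
    abstract static class ReplayingInput<T> {
        private final Queue<byte[]> source;                               // stand-in for the Site-to-Site client
        private final Map<Long, List<T>> savedWindows = new HashMap<>();  // stand-in for window storage
        private List<T> current = new ArrayList<>();
        private long windowId;
        private boolean replaying;

        ReplayingInput(Queue<byte[]> source) {
            this.source = source;
        }

        // Subclasses decide how a raw packet becomes a tuple and where tuples go.
        protected abstract T createTuple(byte[] rawPacket);

        protected abstract void emit(List<T> tuples);

        void beginWindow(long id) {
            windowId = id;
            current = new ArrayList<>();
            List<T> saved = savedWindows.get(id);
            replaying = saved != null;
            if (replaying) {
                emit(saved); // replay: same tuples as the first time this window ran
            }
        }

        void emitTuples() {
            if (replaying) {
                return; // do not pull new data while replaying a saved window
            }
            List<T> batch = new ArrayList<>();
            byte[] raw;
            while ((raw = source.poll()) != null) {
                batch.add(createTuple(raw));
            }
            if (!batch.isEmpty()) {
                emit(batch);
                current.addAll(batch);
            }
        }

        void endWindow() {
            if (!replaying) {
                savedWindows.put(windowId, current);
            }
        }
    }

    // Concrete subclass: decodes packets as UTF-8 strings and collects them in a list.
    static final class StringInput extends ReplayingInput<String> {
        final List<String> emitted = new ArrayList<>();

        StringInput(Queue<byte[]> source) {
            super(source);
        }

        @Override
        protected String createTuple(byte[] rawPacket) {
            return new String(rawPacket, StandardCharsets.UTF_8);
        }

        @Override
        protected void emit(List<String> tuples) {
            emitted.addAll(tuples);
        }
    }

    public static void main(String[] args) {
        Queue<byte[]> source = new ArrayDeque<>();
        source.add("first".getBytes(StandardCharsets.UTF_8));
        source.add("second".getBytes(StandardCharsets.UTF_8));
        StringInput operator = new StringInput(source);
        operator.beginWindow(1L);
        operator.emitTuples();
        operator.endWindow();
        // Starting window 1 again simulates recovery: the saved tuples are re-emitted.
        operator.beginWindow(1L);
        operator.emitTuples();
        operator.endWindow();
        System.out.println(operator.emitted);
    }
}
```

A real operator would persist the saved windows durably rather than in memory; the map here only illustrates why a replayed window produces the same tuples as the original run.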

NiFi Input Operator Example

final SiteToSiteClient.Builder builder = new SiteToSiteClient.Builder()
    .url("http://localhost:8080/nifi")
    .portName("Apex")
    .requestBatchCount(5);

WindowDataManager wdm = new WindowDataManager.NoopWindowDataManager();

NiFiSinglePortInputOperator nifi = dag.addOperator("nifi",
    new NiFiSinglePortInputOperator(builder, wdm));

ConsoleOutputOperator console = dag.addOperator("console", new ConsoleOutputOperator());

dag.addStream("nifi_console", nifi.outputPort, console.input);

Apex NiFi Output Operators

AbstractNiFiOutputOperator

•  Base class for NiFi Output Operators

•  Provides method to process a list of tuples

•  Uses NiFiDataPacketBuilder to convert incoming tuples to NiFiDataPackets

NiFiSinglePortOutputOperator

•  Extends AbstractNiFiOutputOperator and adds a buffering Input Port

•  Buffering Input Port flushes tuples (i.e. sends to NiFi) when batch size is reached
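The buffering behaviour can be illustrated with a small stand-alone sketch: tuples accumulate until the batch size is reached, then the whole batch is handed to a sender. `BatchBuffer` and the `Consumer`-based sender are hypothetical; in the real operator the sender's role would be played by the Site-to-Site client. Whether leftover tuples are also flushed at the end of a window is an assumption of this sketch, modelled by the explicit `flush()` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchingSketch {

    // Hypothetical buffer: collects tuples and hands each full batch to the sender.
    static final class BatchBuffer<T> {
        private final int batchSize;
        private final Consumer<List<T>> sender;
        private final List<T> buffer = new ArrayList<>();

        BatchBuffer(int batchSize, Consumer<List<T>> sender) {
            if (batchSize < 1) {
                throw new IllegalArgumentException("batchSize must be >= 1");
            }
            this.batchSize = batchSize;
            this.sender = sender;
        }

        // Adds one tuple and flushes automatically once the batch size is reached.
        void add(T tuple) {
            buffer.add(tuple);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        // Sends whatever is buffered (if anything) as one batch and clears the buffer.
        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sender.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        BatchBuffer<String> buffer =
                new BatchBuffer<>(2, batch -> System.out.println("sending " + batch));
        buffer.add("a");
        buffer.add("b"); // batch size reached, first batch is sent
        buffer.add("c");
        buffer.flush();  // e.g. at the end of a window (assumption)
    }
}
```

A batch size of 1 sends every tuple immediately, which trades throughput for lower latency.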

NiFi Output Operator Example

final SiteToSiteClient.Builder builder = new SiteToSiteClient.Builder()
    .url("http://localhost:8080/nifi")
    .portName("Apex");

NiFiDataPacketBuilder<String> dpb = new StringNiFiDataPacketBuilder();

WindowDataManager wdm = new WindowDataManager.NoopWindowDataManager();

NiFiSinglePortOutputOperator nifi = dag.addOperator("nifi",
    new NiFiSinglePortOutputOperator(builder, dpb, wdm, 1));

RandomEventGenerator rand = dag.addOperator("rand", new RandomEventGenerator());

dag.addStream("rand_nifi", rand.string_data, nifi.inputPort);

Use Case Discussion

Drive Data to Apex for Analysis

•  Drive data from sources to central data center for analysis

•  Tiered collection approach at various locations, think regional data centers

[Diagram: two Edge NiFi instances send data to a Core NiFi, which forwards it to Apex.]

Dynamically Adjusting Data Flow

•  Push analytic results from Apex back to NiFi

•  Push results back to edge locations/devices to change behavior

[Diagram: Apex sends results back to the Core NiFi, which pushes them out to the two Edge NiFi instances.]

Example: Dynamic Log Collection

1.  Logs filtered by level and sent from Edge -> Core

2.  Apex produces new filter levels based on rate & sends back to core

3.  Edge polls core for new filter levels & updates filtering

[Diagram: Edge NiFi filters logs and sends them through a Log Output to the Log Input on Core NiFi, which passes them through a Logs Output to the Apex analytic. Apex returns new filters to a Result Input on Core NiFi, which stores the result and exposes it through a service. Edge NiFi polls that service, fetches the result, and applies the new filters.]

Dynamic Log Collection – Edge NiFi

Dynamic Log Collection – Core NiFi

Dynamic Log Collection – Apex Application

NiFiSinglePortInputOperator nifiInput = ...;
dag.addOperator("nifi-in", nifiInput);

LogLevelWindowCount count = dag.addOperator("count", new LogLevelWindowCount(attributName));

dag.setAttribute(count, OperatorContext.APPLICATION_WINDOW_COUNT, ...);

NiFiDataPacketBuilder<LogLevels> dataPacketBuilder = new DictionaryBuilder(...);

NiFiSinglePortOutputOperator nifiOutput = ...;
dag.addOperator("nifi-out", nifiOutput);

// nifi-in -> count -> nifi-out
dag.addStream("nifi-in-count", nifiInput.outputPort, count.input);
dag.addStream("count-nifi-out", count.output, nifiOutput.inputPort);
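The slides do not show what happens inside the counting step, so the following is only a sketch of one way a per-window count could turn into a new filter level. The level ordering, the `maxPerWindow` budget, and both method names are assumptions made for illustration; the policy in the example repository may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LogLevelSketch {

    // Assumed level ordering, from most verbose to least verbose.
    static final List<String> LEVELS = Arrays.asList("DEBUG", "INFO", "WARN", "ERROR");

    // Counts packets per log level, reading the level from the named attribute.
    static Map<String, Integer> countByLevel(List<Map<String, String>> packetAttributes,
                                             String attributeName) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> attributes : packetAttributes) {
            String level = attributes.get(attributeName);
            if (level != null) {
                counts.merge(level, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical policy: pick the most verbose level whose cumulative count
    // (that level and everything more severe) still fits the per-window budget.
    // Falls back to the most severe level when even that exceeds the budget.
    static String chooseFilterLevel(Map<String, Integer> counts, int maxPerWindow) {
        String chosen = LEVELS.get(LEVELS.size() - 1);
        int cumulative = 0;
        for (int i = LEVELS.size() - 1; i >= 0; i--) {
            cumulative += counts.getOrDefault(LEVELS.get(i), 0);
            if (cumulative > maxPerWindow) {
                break;
            }
            chosen = LEVELS.get(i);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Map<String, String>> window = Arrays.asList(
                Collections.singletonMap("level", "ERROR"),
                Collections.singletonMap("level", "INFO"),
                Collections.singletonMap("level", "INFO"),
                Collections.singletonMap("level", "DEBUG"));
        Map<String, Integer> counts = countByLevel(window, "level");
        System.out.println(counts + " -> " + chooseFilterLevel(counts, 3));
    }
}
```

Because the edge already filters logs before sending them, the counts seen in any one window only cover levels at or above the current filter; a production policy would need some way to probe lower levels again once the rate drops.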

Dynamic Log Collection – Full Flow

[Diagram: logs flow from the two Edge NiFi instances to the Core NiFi and on to Apex; new filters flow from Apex back to the Core NiFi and out to each Edge NiFi.]

Summary

•  Use NiFi to drive data from sources to Apex

•  Leverage results from Apex to adjust your dataflows

•  Dynamic Log Collection Example:

https://github.com/bbende/nifi-streaming-examples

Contact Info:

•  Email: [email protected]

•  Twitter: @bbende

Thank you