Data ingestion and distribution with Apache NiFi
Uploaded by lev-brailovskiy
Category: Data & Analytics
![Page 1: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/1.jpg)
Data Ingestion & Distribution with Apache NiFi
![Page 2: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/2.jpg)
Agenda
Introduction to NiFi
Our use case for NiFi
Demo
Q&A
![Page 3: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/3.jpg)
Introduction to NiFi
![Page 4: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/4.jpg)
History & Facts
Created by: NSA
Incubating: 2014
Available: 2015
Main contributors: Hortonworks
Stable version at the time of the talk: 1.1.1
Delivery guarantees: at least once
Out-of-order processing: no
Windowing: no
Back-pressure: yes
Latency: configurable
Resource management: native
API: REST (GUI)
![Page 5: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/5.jpg)
Ecosystem
Stream Processing
Data Moving
![Page 6: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/6.jpg)
Architecture
![Page 7: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/7.jpg)
Flow Files
● Basic Abstraction
● Pointer to content
● Content Attributes (key/value)
● Connection to provenance events
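To make the abstraction concrete, here is a toy Python model of a FlowFile: read-only key/value attributes, a pointer (claim) into content storage rather than the bytes themselves, and a list of linked provenance event IDs. The class and method names are hypothetical and do not mirror NiFi's Java API; only the `uuid` and `filename` attribute names follow NiFi's core-attribute convention.

```python
import uuid
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ContentClaim:
    """Hypothetical pointer into a content repository, not the bytes themselves."""
    container: str
    offset: int
    length: int


@dataclass(frozen=True)
class FlowFile:
    """Toy FlowFile: read-only attributes, a content pointer, provenance links."""
    attributes: Mapping[str, str]
    content: ContentClaim
    provenance_events: Tuple[str, ...] = ()

    @staticmethod
    def create(filename: str, content: ContentClaim) -> "FlowFile":
        attrs = {"uuid": str(uuid.uuid4()), "filename": filename}
        return FlowFile(MappingProxyType(attrs), content)

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        # Changing an attribute yields a new FlowFile; the content claim is shared.
        attrs = {**self.attributes, key: value}
        return replace(self, attributes=MappingProxyType(attrs))

    def with_event(self, event_id: str) -> "FlowFile":
        # Link this FlowFile to one more provenance event.
        return replace(self, provenance_events=self.provenance_events + (event_id,))


claim = ContentClaim("container-1", offset=0, length=1024)
ff = FlowFile.create("events.log", claim)
routed = ff.with_attribute("route", "hdfs")
print("route" in ff.attributes, routed.attributes["route"])  # False hdfs
print(routed.content is ff.content)                          # True
```

Keeping only a pointer in the FlowFile means routing or tagging a file touches a few small attributes, never the (possibly large) content.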
![Page 8: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/8.jpg)
Repositories
● FlowFile
● Content
● Provenance
● Immutable
● Copy-on-write
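The "immutable" and "copy-on-write" points can be illustrated with a toy in-memory content store. This is a simplification that assumes nothing about NiFi's actual on-disk repositories (which are persistent and far more involved); `ContentRepository` and its methods are hypothetical names.

```python
import itertools
from typing import Callable, Dict


class ContentRepository:
    """Toy append-only content store: a claim is never changed once written."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = bytes(data)  # keep an immutable copy
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]

    def modify(self, claim_id: int, transform: Callable[[bytes], bytes]) -> int:
        # Copy-on-write: the transformed bytes go to a NEW claim, the old one stays.
        return self.write(transform(self.read(claim_id)))


repo = ContentRepository()
original = repo.write(b"hello nifi")
upper = repo.modify(original, bytes.upper)
print(repo.read(original))  # b'hello nifi'
print(repo.read(upper))     # b'HELLO NIFI'
```

Because the old claim survives in this toy, the content before and after a change can both still be inspected, which is the property provenance relies on.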
![Page 9: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/9.jpg)
Processor
Processors actually perform the work of data routing, transformation, or mediation between systems. Processors have access to the attributes of a given FlowFile and its content stream. Processors can operate on zero or more FlowFiles in a given unit of work and either commit that work or roll it back.
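The commit-or-rollback unit of work can be sketched with a toy session that stages transfers and only delivers them downstream on commit. The method names `get`, `transfer`, `commit` and `rollback` loosely echo NiFi's `ProcessSession`, but `ToySession`, `on_trigger` and the `type` routing attribute are hypothetical Python stand-ins, not NiFi's API.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Record = Dict[str, str]


class ToySession:
    """Hypothetical session: transfers are staged and only become visible on commit."""

    def __init__(self, inbound: Deque[Record], outputs: Dict[str, List[Record]]) -> None:
        self.inbound = inbound  # upstream queue this processor reads from
        self.outputs = outputs  # relationship name -> records delivered downstream
        self._taken: List[Record] = []
        self._staged: List[Tuple[str, Record]] = []

    def get(self) -> Optional[Record]:
        if not self.inbound:
            return None
        record = self.inbound.popleft()
        self._taken.append(record)
        return record

    def transfer(self, record: Record, relationship: str) -> None:
        self._staged.append((relationship, record))

    def commit(self) -> None:
        for relationship, record in self._staged:
            self.outputs.setdefault(relationship, []).append(record)
        self._taken.clear()
        self._staged.clear()

    def rollback(self) -> None:
        # Return consumed records to the head of the queue in their original order.
        for record in reversed(self._taken):
            self.inbound.appendleft(record)
        self._taken.clear()
        self._staged.clear()


def on_trigger(session: ToySession) -> None:
    """One unit of work: route every queued record by its 'type' attribute."""
    try:
        while (record := session.get()) is not None:
            if "type" not in record:
                raise ValueError("record has no 'type' attribute")
            session.transfer({**record, "routed": "yes"}, record["type"])
        session.commit()
    except ValueError:
        session.rollback()  # nothing from this unit of work reaches downstream
```

If one record in the batch is bad, the whole unit of work is undone and every record goes back on the queue, so downstream never sees a half-processed batch.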
![Page 10: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/10.jpg)
Processor
● Basic Work Unit
● State
● Statistics
● Settings
● Input/Output
● Provenance
● Scheduling
● Logging (bulletins)
![Page 11: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/11.jpg)
Connection
Connections provide the actual linkage between processors. These act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enables back pressure.
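A connection behaves like a bounded priority queue. The toy below keeps an object-count and a byte-size threshold and reports back pressure once either is reached. Two simplifications are assumed: here the queue itself refuses new items, whereas NiFi instead stops scheduling the upstream processor, and lower numbers mean higher priority. `Connection`, `offer` and `poll` are hypothetical names.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class Connection:
    """Toy bounded priority queue between two processors."""

    def __init__(self, max_objects: int, max_bytes: int) -> None:
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self._heap: List[Tuple[int, int, str, int]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a priority
        self.queued_bytes = 0

    def back_pressure(self) -> bool:
        """True once either threshold is reached."""
        return len(self._heap) >= self.max_objects or self.queued_bytes >= self.max_bytes

    def offer(self, name: str, size: int, priority: int = 0) -> bool:
        if self.back_pressure():
            return False  # upstream must wait until the queue drains
        heapq.heappush(self._heap, (priority, next(self._seq), name, size))
        self.queued_bytes += size
        return True

    def poll(self) -> Optional[str]:
        if not self._heap:
            return None
        _, _, name, size = heapq.heappop(self._heap)
        self.queued_bytes -= size
        return name


c = Connection(max_objects=2, max_bytes=10_000)
c.offer("low.csv", 100, priority=5)
c.offer("urgent.csv", 100, priority=1)
print(c.offer("third.csv", 100))  # False: object threshold reached
print(c.poll())                   # urgent.csv
```

Like NiFi's thresholds, the check here happens before an item is added, so a single large item can push the queue past its byte limit; the limit only stops further input.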
![Page 12: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/12.jpg)
Connection
● Queue
● Statistics
● Settings
● Prioritization
● Details
![Page 13: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/13.jpg)
Process Group
A specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow the creation of entirely new components simply by composing other components.
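Composition is the key idea: a group takes data in through one port, runs it through its inner steps, and emits the result through another, so from the outside it looks like a single component. The sketch models a group as a plain function built from a linear chain of steps (real process groups are arbitrary graphs); `process_group`, `add_source`, `drop_empty` and the record fields are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def process_group(*steps: Step) -> Step:
    """Compose steps into one component: input port -> steps -> output port."""
    def run(input_port: List[Record]) -> List[Record]:
        batch = input_port
        for step in steps:
            batch = step(batch)
        return batch  # whatever reaches the end leaves via the output port
    return run


def add_source(batch: List[Record]) -> List[Record]:
    return [{**r, "source": "sftp"} for r in batch]


def drop_empty(batch: List[Record]) -> List[Record]:
    return [r for r in batch if r.get("size", "0") != "0"]


ingest = process_group(add_source, drop_empty)
# A group is itself a step, so groups can be nested inside other groups.
pipeline = process_group(ingest, lambda b: sorted(b, key=lambda r: r["name"]))
```

Calling `pipeline([{"name": "b", "size": "10"}, {"name": "a", "size": "0"}])` tags the records, drops the empty one and returns only `b`.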
![Page 14: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/14.jpg)
Templates
Dataflows tend to be highly pattern-oriented, and while there are often many different ways to solve a problem, it helps greatly to be able to share best practices. Templates allow subject matter experts to build and publish their flow designs so that others can benefit from and collaborate on them.
● XML based
● Reusable unit
● Versionable (e.g., with Git)
![Page 15: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/15.jpg)
Data Provenance
NiFi automatically records, indexes, and makes available provenance data as objects flow through the system, even across fan-in, fan-out, transformations, and more. This information becomes extremely critical in supporting compliance, troubleshooting, optimization, and other scenarios.
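Lineage across fan-in and fan-out comes from events that record which FlowFiles a new FlowFile was derived from. The toy log below stores such parent links and walks them back to find every ancestor. The event type names are borrowed from NiFi's provenance event types (`RECEIVE`, `JOIN`, `FORK`), but `ProvenanceEvent`, `ProvenanceLog` and `lineage` are hypothetical and far simpler than NiFi's indexed repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str                # e.g. RECEIVE, JOIN, FORK
    flowfile: str                  # id of the FlowFile this event describes
    parents: Tuple[str, ...] = ()  # FlowFiles it was derived from


class ProvenanceLog:
    def __init__(self) -> None:
        self._by_flowfile: Dict[str, List[ProvenanceEvent]] = {}

    def record(self, event: ProvenanceEvent) -> None:
        self._by_flowfile.setdefault(event.flowfile, []).append(event)

    def lineage(self, flowfile: str) -> List[str]:
        """The FlowFile itself plus every ancestor, found by following parent links."""
        seen: List[str] = []
        stack = [flowfile]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            for event in self._by_flowfile.get(current, []):
                stack.extend(event.parents)
        return seen


log = ProvenanceLog()
log.record(ProvenanceEvent("RECEIVE", "a"))
log.record(ProvenanceEvent("RECEIVE", "b"))
log.record(ProvenanceEvent("JOIN", "merged", parents=("a", "b")))  # fan-in
log.record(ProvenanceEvent("FORK", "part1", parents=("merged",)))  # fan-out
log.record(ProvenanceEvent("FORK", "part2", parents=("merged",)))
print(sorted(log.lineage("part1")))  # ['a', 'b', 'merged', 'part1']
```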
![Page 16: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/16.jpg)
Data Provenance
● Details
● Attributes
● Content
![Page 17: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/17.jpg)
Controller Service
Controller Services allow developers to share functionality and state across the JVM in a clean and consistent manner.
● No scheduling
● No connections
● Used by Processors,
Reporting Tasks, and other
Controller Services
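A minimal sketch of the idea, assuming a simple in-memory cache: one service instance is created once and handed to several processors, and it has no schedule or connections of its own. `CacheService` and `DedupProcessor` are hypothetical; the pattern is loosely analogous to NiFi's DetectDuplicate processor relying on a distributed map cache client service, but this is not NiFi's API.

```python
import threading
from typing import Dict


class CacheService:
    """Toy controller service: one shared, thread-safe instance, never scheduled."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True


class DedupProcessor:
    """Toy processor that references the shared service instead of owning state."""

    def __init__(self, name: str, cache: CacheService) -> None:
        self.name = name
        self.cache = cache

    def is_new(self, file_id: str) -> bool:
        return self.cache.put_if_absent(file_id, self.name)


cache = CacheService()  # configured once, then referenced by many processors
sftp_ingest = DedupProcessor("sftp", cache)
s3_ingest = DedupProcessor("s3", cache)
print(sftp_ingest.is_new("f1"), s3_ingest.is_new("f1"))  # True False
```

Because both processors hold the same service instance, a file seen by one is recognized as a duplicate by the other.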
![Page 18: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/18.jpg)
Reporting Tasks
Reporting Tasks provide a capability for reporting status, statistics, metrics, and monitoring information to external services.
● Examples: ElasticSearchProvenanceReporter and DataDogReportingTask
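A reporting task is essentially a periodic job that reads a snapshot of flow statistics and pushes it to an external system. The sketch below covers only the formatting step, assuming Graphite-style plaintext lines (`path value timestamp`); `build_report`, the `nifi.` prefix and the stat names are hypothetical, and scheduling and network delivery are left out.

```python
import time
from typing import Dict, List, Optional


def build_report(queue_stats: Dict[str, Dict[str, int]], now: Optional[int] = None) -> List[str]:
    """Turn a snapshot of per-connection stats into Graphite-style plaintext lines."""
    ts = int(time.time()) if now is None else now
    lines = []
    for connection, stats in sorted(queue_stats.items()):
        for metric, value in sorted(stats.items()):
            lines.append(f"nifi.{connection}.{metric} {value} {ts}")
    return lines


snapshot = {"sftp_to_hdfs": {"queued_count": 120, "queued_bytes": 4_096_000}}
for line in build_report(snapshot, now=1_700_000_000):
    print(line)
# nifi.sftp_to_hdfs.queued_bytes 4096000 1700000000
# nifi.sftp_to_hdfs.queued_count 120 1700000000
```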
![Page 19: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/19.jpg)
Extensibility
● Ready-to-use Maven template
● Well-defined interface for each component
● Classloader isolation (.nar files)
● Great documentation for developers
![Page 20: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/20.jpg)
Statistics
● 200+ built-in Processors
● 10+ built-in Controller Services
● 10+ built-in Reporting Tasks
![Page 21: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/21.jpg)
Introduction Summary
● Processor
● Connection
● Process Group
● Template
● Controller Service
● Reporting Task
![Page 22: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/22.jpg)
Our use case for NiFi
![Page 23: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/23.jpg)
What we had before
● In-house file collector
● Footprint of 10 servers
● Hard to manage, scale, and extend
![Page 24: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/24.jpg)
DWH Real Time
![Page 25: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/25.jpg)
DWH Batch
![Page 26: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/26.jpg)
Reports Distribution
![Page 27: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/27.jpg)
Statistics
● 20 TB of data ingested daily
● 250K files ingested daily
● Near-real-time data availability (minimum interval: 1 min)
● 1 TB of data distributed (reports)
● 30K files exported daily
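As a rough sanity check, the ingestion figures above imply an average file size and an average sustained rate. This assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over 24 hours; real ingestion is burstier, so peak rates would be higher.

```python
TB = 10**12  # decimal units assumed
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

ingested_bytes = 20 * TB
ingested_files = 250_000

avg_file_mb = ingested_bytes / ingested_files / MB
avg_rate_mb_s = ingested_bytes / SECONDS_PER_DAY / MB

print(f"average file size: {avg_file_mb:.0f} MB")     # average file size: 80 MB
print(f"average ingest rate: {avg_rate_mb_s:.1f} MB/s")  # average ingest rate: 231.5 MB/s
```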
![Page 28: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/28.jpg)
AWS - Hadoop Ingestion
![Page 29: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/29.jpg)
AWS - Hadoop Ingestion
![Page 30: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/30.jpg)
Kafka Reprocessing
![Page 31: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/31.jpg)
sFTP - HDFS Ingestion
![Page 32: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/32.jpg)
Let’s break something ;)
![Page 33: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/33.jpg)
Use Cases Summary
● Web User Interface
● Configurable
● Scalable
● Easy to Manage
● Designed for Extension
![Page 34: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/34.jpg)
Q & A
![Page 35: Data ingestion and distribution with apache NiFi](https://reader030.fdocuments.net/reader030/viewer/2022021423/58ac33db1a28ab145e8b5249/html5/thumbnails/35.jpg)
THANK YOU