Big Data Ingestion with Kafka -> HDFS using Apache Apex
Big Data Ingestion with Kafka
Chinmay [email protected]
Agenda
● Data Ingestion
● Use case: Kafka => HDFS
● Brief about Kafka
● Steps for development
● Let’s code!!!
Data Ingestion
● Reading data in
● Storing in accessible location
● Beginning of the data pipeline (the write path)
● From here, data is processed further (the read path)
Use case: KAFKA => HDFS
● Reading from Kafka Messaging Queue
● Writing to HDFS
[Diagram: KAFKA => HDFS]
Use case: Examples
● Log Aggregation
  ○ Collect logs from various sources
  ○ Stream them as a single topic
  ○ Put all the logs in a centralized place, i.e. HDFS
● Real-time sensor data processing
  ○ Read sensor data from various sources
  ○ Process the stream
  ○ Dump results to HDFS
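The log aggregation example can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `LogAggregator` and `aggregateTo` are hypothetical names, the `Writer` stands in for a file stream on HDFS, and no Kafka or Apex classes are involved. Each line is prefixed with the name of the source it came from so that the merged output stays traceable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of log aggregation; the Writer stands in for HDFS. */
class LogAggregator {

    /**
     * Merges the log lines of every source into one sink, tagging each line
     * with its source name. Returns the number of lines written.
     */
    public static int aggregateTo(Writer sink, Map<String, List<String>> logsBySource) {
        int written = 0;
        try {
            // TreeMap gives a deterministic source order for this sketch.
            for (Map.Entry<String, List<String>> e : new TreeMap<>(logsBySource).entrySet()) {
                for (String line : e.getValue()) {
                    sink.write(e.getKey() + "\t" + line + "\n");
                    written++;
                }
            }
            sink.flush();
        } catch (IOException ex) {
            // Surface I/O failures without forcing callers to handle a checked exception.
            throw new UncheckedIOException(ex);
        }
        return written;
    }
}
```

Passing a `java.io.StringWriter` as the sink is enough to try it out locally; a real pipeline would replace the in-memory map with a Kafka topic and the writer with an HDFS output operator.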
Brief about Kafka
● Distributed Messaging System
● Fast Reads and Writes
● Can handle a large number of clients
● Scalable, fault-tolerant, partitionable
● Persistent messages
Brief about Kafka (contd.)
● Terminologies
  ○ Topic
  ○ Producer
  ○ Consumer
  ○ Broker
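To make the four terms concrete, here is a toy in-memory model written in plain Java. It is not the Kafka client API: `MiniBroker`, `produce` and `fetch` are hypothetical names chosen for this sketch. The broker holds topics, a topic is an append-only log, a producer appends to it, and a consumer reads from an offset it tracks itself. Because reading does not remove anything, the messages stay available, which is roughly what "persistent messages" refers to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Kafka's vocabulary; hypothetical, not the Kafka client API. */
class MiniBroker {
    // Broker state: each topic name maps to an append-only log of messages.
    private final Map<String, List<String>> topics = new HashMap<>();

    /** Producer side: append a message to a topic and return its offset. */
    public synchronized long produce(String topic, String message) {
        List<String> log = topics.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(message);
        return log.size() - 1;
    }

    /** Consumer side: read up to max messages starting at the given offset. */
    public synchronized List<String> fetch(String topic, long offset, int max) {
        List<String> log = topics.getOrDefault(topic, new ArrayList<>());
        int from = (int) Math.min(Math.max(offset, 0L), (long) log.size());
        int to = (int) Math.min((long) from + Math.max(max, 0), (long) log.size());
        // Copy the slice so callers cannot modify the broker's log.
        return new ArrayList<>(log.subList(from, to));
    }
}
```

Two consumers can call `fetch` on the same topic with different offsets and each sees the messages it asked for, since the offset belongs to the consumer and not to the broker in this sketch.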
Steps for developing application
1. Create a Maven project using the Apex Maven archetype
2. Add required Maven dependencies
3. Add operators to DAG
4. Add stream(s) to DAG
5. Set properties in properties.xml
6. Compile and run
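Steps 3, 4 and 6 above can be sketched with a small stand-in written in plain Java. This is an assumption-laden illustration, not the Apex API: `MiniDag`, `addInputOperator`, `addOutputOperator`, `addStream` and `run` are hypothetical names, the input operator is a `Supplier` standing in for a Kafka reader, and the output operator is a `Consumer` standing in for an HDFS writer. A real application would use the Apex DAG API and Malhar operators instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-in for a streaming DAG; this is not the Apex API. */
class MiniDag {
    private final Map<String, Supplier<List<String>>> inputs = new LinkedHashMap<>();
    private final Map<String, Consumer<String>> outputs = new LinkedHashMap<>();
    private final List<String[]> streams = new ArrayList<>();

    /** Step 3: register an input operator (here, a reader standing in for Kafka). */
    public void addInputOperator(String name, Supplier<List<String>> reader) {
        inputs.put(name, reader);
    }

    /** Step 3: register an output operator (here, a writer standing in for HDFS). */
    public void addOutputOperator(String name, Consumer<String> writer) {
        outputs.put(name, writer);
    }

    /** Step 4: connect a registered input operator to a registered output operator. */
    public void addStream(String from, String to) {
        if (!inputs.containsKey(from) || !outputs.containsKey(to)) {
            throw new IllegalArgumentException("unknown operator in stream " + from + " -> " + to);
        }
        streams.add(new String[] {from, to});
    }

    /** Step 6: push every tuple along each stream; returns the number of tuples delivered. */
    public int run() {
        int delivered = 0;
        for (String[] stream : streams) {
            Consumer<String> sink = outputs.get(stream[1]);
            for (String tuple : inputs.get(stream[0]).get()) {
                sink.accept(tuple);
                delivered++;
            }
        }
        return delivered;
    }
}
```

A usage sketch would register one operator named, say, `kafkaIn` that supplies a couple of strings, one named `fileOut` that appends to a list, connect them with `addStream("kafkaIn", "fileOut")`, and call `run()`. Rejecting streams between unregistered operators mirrors the idea that a DAG is validated before it is launched.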
Summary
● Ease of development using Apex
● Reusable Malhar components
● Fault-tolerant, Scalable
● Reduced Time to Production
Resources
Apache Apex Meetup
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
  o @ApacheApex; Follow - https://twitter.com/apacheapex
  o @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
  o https://www.datatorrent.com/product/startup-accelerator/
We Are Hiring
• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders