Flurry Analytic Backend - Processing Terabytes of Data in Real-time
www.flurry.com
November 14, 2013
Anthony Watkins, Senior Director of Developer Relations
Processing Terabytes of Data in Real-Time
@flurrymobile
@antwatkins
Flurry is a leading mobile advertising and analytics provider
Publisher
Advertiser
Audience
AppCircle
Applications: 10,000+
Devices/month: 300M
Conversions/month: 120M
AppSpot
Applications: 2,500+
Devices/month: 250M
Impressions/month: 7.5B
Analytics
Applications: 400,000
Devices/month: 1.2B
Data points/month: 1.9T
Topics
• Why Flurry switched from a MapReduce framework to pipeline processing
• How Flurry uses Kafka in data processing
• Tuning Kafka to work in Flurry’s environment
• Flurry monitoring and error handling of streams
The Path to Real-Time Processing
The Why
Past Processing Model
Device Reports
NoSQL DataStore
Batch
Collectors
MapReduce
(jobs)
External
Action
Flurry Analytics MapReduce Architecture
Agent Portal Data Log Processor
Developer
Portal Metrics Computer
HDFS
HBase
HBase
Hadoop/HBase
Jetty
Jetty
HTTP
Binary Encoded
Data
Raw Data
Log Archive
Metrics Table
(Cube)
Normalized
Data Storage
User Profile
Data
MySQL
Hadoop Map/Reduce
Hadoop Map/Reduce
Web Layer Metrics Processing
Data Collection and Processing in MR
Pros
MapReduce
(jobs)
Data Collection and Processing in MR
Cons
Device Reports
MapReduce
(jobs)
Job Time
Startup Time
Flurry Kafka
The Move to Kafka
About Kafka
Origin
November 2010 June 2011 November 2012
About Kafka
Producer Producer Producer
Kafka Broker
Consumer Consumer Consumer
About Kafka
Kafka Broker
Partition image courtesy of http://kafka.apache.org/images/log_anatomy.png
About Kafka
Producer 1 Producer N Producer 2
Kafka Cluster
Broker 1
P0 P2
Broker 2
P1 P3
Consumer Group
C1 C2 C3
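The diagram above shows four partitions (P0 to P3) spread over two brokers and read by a three-member consumer group. As a minimal sketch of how partitions can be divided among group members so that each partition has exactly one owner, the following assumes a simple range-style split. The function name `assign_partitions` is hypothetical and this is not Kafka's own rebalancing code.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group.

    Range-style split (an assumption for illustration): consumers
    earlier in sorted order absorb any leftover partitions.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))

    assignment = {}
    start = 0
    for index, consumer in enumerate(consumers):
        count = base + (1 if index < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Labels taken from the slide: P0..P3 and C1..C3.
layout = assign_partitions(["P0", "P1", "P2", "P3"], ["C1", "C2", "C3"])
print(layout)
```

With four partitions and three consumers, one consumer ends up owning two partitions while the other two own one each.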
Why Kafka for Flurry
Device Reports
MapReduce (jobs)
Kafka
Startup Time
Introducing the Data Log Consumer (DLC)
Agent Portal Data Log Consumer
Developer
Portal Metrics Computer
HDFS
HBase
HBase
Hadoop/HBase
Jetty
Jetty
HTTP
Binary Encoded
Data
Metrics Table
(Cube)
Normalized
Data Storage
User Profile
Data
MySQL
Kafka
Hadoop Map/Reduce
Web Layer Metrics Processing
Tuning Kafka for Flurry
Challenges
• Zookeeper timeouts
• Completely async service
• Default fsync interval
• Commit threshold from local environments
How Flurry Uses Kafka
Infrastructure and Setup
Consumer Group
C1 C2 C… C325
Kafka Cluster
B1 B2 B3
Broker
P1 P2 P… P400
Topic
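The setup above lists 325 consumers (C1 to C325) and 400 partitions (P1 to P400). A rough sketch of what that ratio means for per-consumer load follows. It assumes the 400 partitions are the total for the topic and that they are spread as evenly as possible over the group; the slide states neither, and `load_histogram` is a hypothetical helper.

```python
def load_histogram(num_partitions, num_consumers):
    """Map partitions-per-consumer to the number of consumers at that load.

    Assumes an even spread: every consumer gets the base share and the
    remainder is handed out one partition at a time.
    """
    base, extra = divmod(num_partitions, num_consumers)
    histogram = {
        base + 1: extra,               # consumers carrying one extra partition
        base: num_consumers - extra,   # consumers carrying the base share
    }
    # Drop empty buckets so the result only lists loads that occur.
    return {load: count for load, count in histogram.items() if count > 0}


flurry_load = load_histogram(400, 325)
print(flurry_load)
```

Under these assumptions, 75 consumers would own two partitions and 250 would own one. A load of zero in the result would mean idle consumers, which happens whenever the group is larger than the partition count.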
Flurry Monitoring / Error Handling
Monitoring
• Alerts
• Consumer Failure
• Broker Failure
Error Handling
Next Steps: 0.8
Data Log Consumer
HDFS
Kafka
Data Log Consumer
Kafka
Kafka Cluster
Broker 1
P0 P2
Broker 2
P1 P3
P1’ P3’ P0’ P2’
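The primed labels (P0’ to P3’) in the diagram above appear to be the partition replicas introduced with Kafka 0.8, each stored on a different broker from its primary copy. A simplified sketch of such a placement follows. It uses plain round-robin, which is an assumption for illustration; Kafka's own replica assignment differs in detail, and `place_replicas` is a hypothetical name.

```python
def place_replicas(num_partitions, brokers, replication_factor=2):
    """Place each partition's primary copy and its replicas on distinct brokers.

    Simplified round-robin: partition p's primary lives on broker
    p mod N, and each replica goes to the next broker along.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")

    layout = {broker: {"primary": [], "replica": []} for broker in brokers}
    for p in range(num_partitions):
        primary_index = p % len(brokers)
        layout[brokers[primary_index]]["primary"].append(f"P{p}")
        for step in range(1, replication_factor):
            replica_index = (primary_index + step) % len(brokers)
            layout[brokers[replica_index]]["replica"].append(f"P{p}'")
    return layout


cluster = place_replicas(4, ["Broker 1", "Broker 2"])
for broker, copies in cluster.items():
    print(broker, copies)
```

For four partitions on two brokers this reproduces the layout on the slide: Broker 1 holds P0 and P2 plus replicas of P1 and P3, and Broker 2 holds the reverse, so losing either broker leaves a copy of every partition.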
Next Steps: Extended Pipeline
Input Data
NoSQL DataStore
Real-Time Batch
Collectors
Consumer/Producer Systems
MapReduce (jobs)
External Action
External Action
Next Steps: Topics and Consumer Groups
Infrastructure and Setup
Consumer Group 2
C1’ C2’ C… CN’
Topic 1
Consumer Group 1
C1 C2 C… CN
Consumer Group N
C1’’ C2’’ C… CN’’
Topic 2
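The slide above shows several consumer groups reading the same topics. In Kafka, each group tracks its own position in the log, so every group sees the full stream independently of the others. A small in-memory sketch of that behaviour follows. It assumes a single partition, and `TopicLog` is a hypothetical class, not part of any Kafka client.

```python
class TopicLog:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> index of the next unread message

    def publish(self, message):
        self.messages.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.messages[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


topic = TopicLog()
for report in ["report-1", "report-2", "report-3"]:
    topic.publish(report)

first_read_g1 = topic.poll("Consumer Group 1")   # sees all three reports
first_read_g2 = topic.poll("Consumer Group 2")   # also sees all three
second_read_g1 = topic.poll("Consumer Group 1")  # nothing new for group 1
print(first_read_g1, first_read_g2, second_read_g1)
```

Because offsets are kept per group, adding a new downstream system means adding a group rather than changing the producers or the existing consumers.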
blog.flurry.com
@flurrymobile
@antwatkins
Thank you