Real-time Big Data at FPT (for TechCamp University)

Real-time Big Data at FPT, and some key ideas for building a real-time big data platform from open source tools: Apache Spark and Reactive Function X (RFX). Presented by @tantrieuf31 (http://nguyentantrieu.info)

Transcript of Real-time Big Data at FPT (for TechCamp University)

Page 1: Real-time Big Data at FPT (for TechCamp University)

Real-time Big Data at FPT

and some key ideas for building a real-time big data platform from open source tools:
○ Apache Spark
○ Reactive Function X (RFX)

Presented by @tantrieuf31 (http://nguyentantrieu.info)


About me
● Full Stack Engineer and Tech Lead at AdsPlay, a startup project from FPT Telecom
● Founder at RFXLab.com, building the RFX framework and a Fast Data Intelligence Platform for data-driven organizations
● Tech Blogger at http://engineering.adsplay.net


Abstract

1. Just 5 minutes on the history of “Big Data”
2. Does Big Data solve big problems?
3. Overview of open source tools
   a. Netty (Event Collector)
   b. Kafka (Event Queue)
   c. RFX-Stream (Event Processor)
   d. Apache Spark (Big Data processing engine)
   e. RFX-Iris (Fast Data Query Interface)


5 minutes about the history of “Big Data”


Imagine: what if you had to build a GREAT pyramid?

In a sense, Big Data was born 3,000 years ago: whenever you have to build something great, you face decisions that involve lots of data.


How? Decisions without data?


OK, let’s get back to 2015


What if the business is not driven by data?
Refer: http://www.nytimes.com/2011/04/24/business/24unboxed.html


Since 2015, Fast Data, a new trend, has been replacing Big Data

http://www.tibco.com/blog/2015/03/27/how-analytics-facilitates-fast-data


Data Management Technology and Trends, 1970s → 2010s:
● Oracle, MySQL, PostgreSQL, ...
● Hadoop Ecosystem, NoSQL Ecosystem, ...
● Netty.io, Apache Storm, Apache Kafka, Apache Spark, RFX, ...


“Does Big Data solve our big problems?”


● Tracking all access logs and users’ activities
● Processing in real time (within seconds)!
● Storing multiple types of logs (video, web, mobile, like, comment, play, …)
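The three requirements above (track every access log and activity, process within seconds, store several log types) can be illustrated with a minimal sketch. It assumes a hypothetical tab-separated log format `timestamp<TAB>user_id<TAB>event_type` and groups parsed events into one bucket per log type, much as one might route each type to its own queue topic. The format, the `LogEvent` schema and the function names are assumptions, not details from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One tracked user activity (hypothetical schema)."""
    timestamp: int   # Unix epoch seconds
    user_id: str
    event_type: str  # e.g. "video", "web", "mobile", "like", "comment", "play"


def parse_line(line: str) -> LogEvent:
    """Parse an assumed 'timestamp<TAB>user_id<TAB>event_type' log line."""
    ts, user_id, event_type = line.rstrip("\n").split("\t")
    return LogEvent(int(ts), user_id, event_type)


def group_by_type(lines):
    """Store each parsed event in a per-type bucket (one bucket per log type)."""
    buckets = defaultdict(list)
    for line in lines:
        event = parse_line(line)
        buckets[event.event_type].append(event)
    return buckets


if __name__ == "__main__":
    raw = [
        "1430000000\tu1\tplay",
        "1430000001\tu2\tlike",
        "1430000002\tu1\tplay",
    ]
    for event_type, events in sorted(group_by_type(raw).items()):
        print(event_type, len(events))
```

Keeping one bucket per type means each log type can later get its own retention or processing rules without touching the others.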


http://www.rfxlab.com


Log events → Reactive events → boosting Sales Revenue / Profit
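One way to read the step from log events to reactive events is the observer pattern: handlers subscribe to an event type and run as soon as a matching event arrives, instead of waiting for a batch report. The sketch below is a generic illustration, not the RFX API; the `ReactiveBus` class and the "play triggers a recommendation" rule are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class ReactiveBus:
    """Hypothetical dispatcher: routes each event to the handlers for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        """Invoke every handler registered for event["type"]; return how many ran."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = ReactiveBus()
    offers: List[str] = []
    # Hypothetical reaction: a "play" log event immediately triggers an offer.
    bus.subscribe("play", lambda e: offers.append(f"recommend-to:{e['user']}"))
    bus.publish({"type": "play", "user": "u1"})
    bus.publish({"type": "like", "user": "u2"})  # no handler, nothing happens
    print(offers)  # ['recommend-to:u1']
```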


How is Big Data used at FPT?


Do Vietnamese people love football? The correlation says YES
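The slide does not say which correlation measure was used. Assuming Pearson's r, the correlation between two hourly series (for instance, whether a football match is on air and the page views in that hour) can be computed as below. The series in the usage example are toy values for illustration, not FPT data.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values: 1 = a football match is on air that hour, 0 = not.
    match_on_air = [0, 0, 1, 1, 0, 1]
    # Toy values: page views in the same hours.
    page_views = [120, 110, 480, 510, 130, 450]
    print(round(pearson(match_on_air, page_views), 3))
```

An r close to +1 means the two series rise and fall together; correlation alone does not establish cause.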


Analyzing trending events in real time!
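A common way to find trending items in real time is to count events inside a sliding time window and report the top k. The sketch below assumes events arrive in timestamp order as `(timestamp, item)` pairs; the `SlidingWindowTrends` name and the 60-second window are illustrative choices, not details from the talk.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowTrends:
    """Counts items seen in the last `window_seconds` and reports the top k."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[int, str]] = deque()
        self._counts: Counter = Counter()

    def add(self, timestamp: int, item: str) -> None:
        """Record one event; assumes timestamps arrive in non-decreasing order."""
        self._events.append((timestamp, item))
        self._counts[item] += 1
        self._evict(timestamp)

    def _evict(self, now: int) -> None:
        # Keep only events in the window (now - window_seconds, now].
        while self._events and self._events[0][0] <= now - self.window_seconds:
            _, old_item = self._events.popleft()
            self._counts[old_item] -= 1
            if self._counts[old_item] == 0:
                del self._counts[old_item]

    def top(self, k: int) -> List[Tuple[str, int]]:
        return self._counts.most_common(k)


if __name__ == "__main__":
    trends = SlidingWindowTrends(window_seconds=60)
    trends.add(0, "song-a")
    trends.add(10, "song-b")
    trends.add(20, "song-b")
    trends.add(65, "song-c")  # evicts the event at t=0 (0 <= 65 - 60)
    print(trends.top(2))  # [('song-b', 2), ('song-c', 1)]
```

Each event is appended and evicted once, so the cost per event stays constant on average; late or out-of-order events would need extra handling.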


Visualizing all users’ devices


Real-time Big Data Architecture
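The architecture diagram itself did not survive in this transcript. Using the roles named in the abstract (Netty as event collector, Kafka as event queue, RFX-Stream as event processor, RFX-Iris as query interface), the data flow can be sketched in-process, with Python's standard `queue.Queue` standing in for Kafka and a thread standing in for the stream processor. This is a simplification for illustration only: no real Netty, Kafka or RFX API is used or implied, and Spark's batch role is left out to keep it short.

```python
import queue
import threading
from collections import Counter
from typing import Iterable, Optional

_STOP = None  # sentinel telling the processor that collection has ended


def collector(raw_events: Iterable[str], out: "queue.Queue[Optional[str]]") -> None:
    """Collector stage (Netty's role in the talk): accept events and enqueue them."""
    for event in raw_events:
        out.put(event)
    out.put(_STOP)


def processor(inp: "queue.Queue[Optional[str]]", store: Counter) -> None:
    """Processor stage (RFX-Stream's role): consume events, update aggregates."""
    while True:
        event = inp.get()
        if event is _STOP:
            break
        store[event] += 1


def query(store: Counter, item: str) -> int:
    """Query stage (RFX-Iris's role): answer questions from the aggregates."""
    return store[item]


def run_pipeline(raw_events: Iterable[str]) -> Counter:
    events: "queue.Queue[Optional[str]]" = queue.Queue()
    store: Counter = Counter()
    worker = threading.Thread(target=processor, args=(events, store))
    worker.start()
    collector(raw_events, events)
    worker.join()  # only the worker writes `store`; we read it after join()
    return store


if __name__ == "__main__":
    store = run_pipeline(["play", "like", "play"])
    print(query(store, "play"))  # 2
```

The queue decouples the stages: the collector never waits for processing to finish, which is the main reason a real deployment puts a durable queue between them.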


“How do we build a ‘Just-Work’ real-time big data system?”


The KEY IDEA is “divide and conquer”
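“Divide and conquer” can be illustrated the way partition-based engines such as Apache Spark apply it: split the data into partitions, compute a partial result per partition in parallel, then merge the partial results. The sketch below does this for a word count with Python's `concurrent.futures`; the task, the round-robin partitioning and the function names are illustrative choices. Threads stand in for cluster executors, so under CPython's GIL this shows the structure rather than a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(items: List[str], n: int) -> List[List[str]]:
    """Divide: split items round-robin into n partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [items[i::n] for i in range(n)]


def count_partition(items: List[str]) -> Counter:
    """Conquer: solve the sub-problem (a word count) for one partition."""
    return Counter(items)


def merge(partials: List[Counter]) -> Counter:
    """Combine: add the partial counts into one final answer."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)
    return total


def parallel_word_count(words: List[str], n_partitions: int = 4) -> Counter:
    parts = partition(words, n_partitions)
    # Worker threads stand in for cluster executors.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(count_partition, parts))
    return merge(partials)


if __name__ == "__main__":
    words = ["hot", "song", "hot", "news", "hot"]
    print(parallel_word_count(words, n_partitions=2).most_common(1))  # [('hot', 3)]
```

The merge step only works because counting is associative: partial counts can be added in any order and still give the same total.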


User Story in plain English

1. Hercules is thinking about some questions, e.g. “What are the hot songs of Nhacso on Facebook?”
2. He decides to ask Iris this question.
3. Iris analyzes the question, turns it into “query messages”, and delivers them to Zeus.
4. Zeus uses his power of “large-scale data processing” to answer the question.
5. Done: Zeus returns the result, “hot songs on Facebook”, to Iris.
6. She sends the result to Hercules.
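The user story maps onto a request/response message flow. In the minimal sketch below, Iris turns a question into query messages and Zeus answers each one over a toy dataset. The `QueryMessage` fields, the keyword matching and the sample events are hypothetical; they only mirror the roles in the story (Hercules as the user, Iris as the query interface, Zeus as the large-scale processing engine).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QueryMessage:
    """A hypothetical query message produced by Iris."""
    source: str  # where the data comes from, e.g. "facebook"
    action: str  # which user action to rank by, e.g. "share"
    top_k: int


class Zeus:
    """Processing role: answers query messages over a (toy) dataset."""

    def __init__(self, events: List[Dict[str, str]]) -> None:
        self.events = events

    def process(self, msg: QueryMessage) -> List[str]:
        counts = Counter(
            e["song"]
            for e in self.events
            if e["source"] == msg.source and e["action"] == msg.action
        )
        return [song for song, _ in counts.most_common(msg.top_k)]


class Iris:
    """Query-interface role: turns a question into messages, relays the answer."""

    def __init__(self, zeus: Zeus) -> None:
        self.zeus = zeus

    def to_messages(self, question: str) -> List[QueryMessage]:
        # Hypothetical keyword matching; a real interface would parse the question.
        source = "facebook" if "facebook" in question.lower() else "web"
        return [QueryMessage(source=source, action="share", top_k=3)]

    def ask(self, question: str) -> List[str]:
        results: List[str] = []
        for msg in self.to_messages(question):
            results.extend(self.zeus.process(msg))
        return results


if __name__ == "__main__":
    toy_events = [
        {"song": "song-1", "source": "facebook", "action": "share"},
        {"song": "song-2", "source": "facebook", "action": "share"},
        {"song": "song-1", "source": "facebook", "action": "share"},
        {"song": "song-3", "source": "web", "action": "share"},
    ]
    iris = Iris(Zeus(toy_events))
    # Hercules asks Iris; Iris asks Zeus; the answer flows back.
    print(iris.ask("What are the hot songs of Nhacso on Facebook?"))  # ['song-1', 'song-2']
```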


Visualizing our user story

Question about Big Data: What are the hot songs of NhacSo.net on Facebook?

(Diagram: Hercules, Iris and Zeus exchanging messages)


Let’s see how it works


Awesome Open Source Projects to follow

RFXLab.com
◎ http://www.rfxlab.com
◎ https://github.com/rfxlab

◎ Kafka: http://kafka.apache.org
◎ Hadoop: http://hadoop.apache.org
◎ Apache Spark: https://spark.apache.org



Native Kafka driver: https://github.com/edenhill/librdkafka/

PHP Kafka driver: https://github.com/EVODelavega/phpkafka

Data Visualization JavaScript Library: https://github.com/nvd3-community/nvd3


Good reference books

"Spend some time alone and learn to develop your personal resources."

Alexander Reid Martin


More info at http://engineering.adsplay.net/jobs-at-adsplay-team