Using Spark at Vungle
Uploaded by: alicia-strait
Category: Engineering
Transcript of Using Spark at Vungle
![Page 1: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/1.jpg)
Introduction Old Architecture New Architecture Decoupling Streaming Conclusion
![Page 2: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/2.jpg)
● Introduction
● Old Architecture
● New Architecture
● Decoupling
● Streaming
● Conclusion
![Page 3: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/3.jpg)
● Legacy Java Process
  ○ “Crunches” data
  ○ Sends data downstream to our own datastores and to 3rd party analytics
  ○ Runs every hour
● Growth
  ○ Process can run over an hour
  ○ 12 GB -> 24 GB heap in less than 1 year
  ○ Cron is a horrible job management system
  ○ A failure requires rerunning a job from the beginning
● 2.0
  ○ Horizontally scalable
  ○ Real Time ETL
  ○ Reusable
![Page 4: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/4.jpg)
ETL @ Vungle
● ~1 Billion Events / Day
● Deduplication
● Calculating $$$
● Outputting data to various destinations
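The deduplication step listed above can be sketched in a few lines. This is a minimal single-process illustration, not the pipeline from the talk: the `event_id` field name and the sample events are assumptions made up for the example. On a cluster, the same idea would more likely be expressed with a keyed reduction or a distinct operation over the event key.

```python
from typing import Dict, Iterable, Iterator


def deduplicate(events: Iterable[Dict], key: str = "event_id") -> Iterator[Dict]:
    """Yield each event once, keeping the first occurrence of every key.

    The "event_id" field name is hypothetical; the slides do not name it.
    """
    seen = set()
    for event in events:
        event_key = event[key]
        if event_key in seen:
            continue  # duplicate delivery of an event we already emitted
        seen.add(event_key)
        yield event


# Made-up sample data: the second record repeats the first.
events = [
    {"event_id": "a1", "type": "request"},
    {"event_id": "a1", "type": "request"},
    {"event_id": "b2", "type": "view"},
]
unique = list(deduplicate(events))
```

The set of seen keys grows with the number of distinct events, so at around a billion events per day a real job would need to bound it, for example by deduplicating within a time window.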
![Page 5: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/5.jpg)
Old Architecture
![Page 6: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/6.jpg)
![Page 7: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/7.jpg)
![Page 8: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/8.jpg)
![Page 9: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/9.jpg)
![Page 10: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/10.jpg)
![Page 11: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/11.jpg)
![Page 12: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/12.jpg)
![Page 13: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/13.jpg)
![Page 14: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/14.jpg)
New Architecture
![Page 15: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/15.jpg)
![Page 16: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/16.jpg)
![Page 17: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/17.jpg)
![Page 18: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/18.jpg)
![Page 19: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/19.jpg)
![Page 20: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/20.jpg)
![Page 21: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/21.jpg)
![Page 22: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/22.jpg)
Decoupling
![Page 23: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/23.jpg)
![Page 24: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/24.jpg)
![Page 25: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/25.jpg)
![Page 26: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/26.jpg)
![Page 27: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/27.jpg)
![Page 28: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/28.jpg)
![Page 29: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/29.jpg)
![Page 30: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/30.jpg)
![Page 31: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/31.jpg)
Set up connection and Spark streams
Map each line of the log into Mongo objects and insert them into Mongo
![Page 32: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/32.jpg)
Set up connection and Spark streams
![Page 33: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/33.jpg)
Mapping to Mongo objects and insertions
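The code on this slide did not survive extraction, so the following is only a rough sketch of the mapping step, written in plain Python rather than against Spark or a Mongo driver. It assumes log lines are JSON objects carrying a hypothetical `event_id` field; `InMemoryCollection` and its `insert_new` method are stand-ins invented for the example, not a real driver API.

```python
import json
from typing import Dict, Iterable, List, Optional


def to_document(line: str) -> Optional[Dict]:
    """Parse one JSON log line into a dict shaped like a Mongo document.

    Returns None for blank or malformed lines so one bad record
    does not fail the whole batch.
    """
    line = line.strip()
    if not line:
        return None
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict) or "event_id" not in record:
        return None
    doc = dict(record)
    # Assumption: the event ID is used as the document's primary key.
    doc["_id"] = doc.pop("event_id")
    return doc


class InMemoryCollection:
    """Hypothetical stand-in for a Mongo collection (no server needed)."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict] = {}

    def insert_new(self, documents: Iterable[Dict]) -> int:
        """Insert documents whose _id is not present yet; return the count."""
        inserted = 0
        for doc in documents:
            if doc["_id"] not in self.docs:
                self.docs[doc["_id"]] = doc
                inserted += 1
        return inserted


def process_batch(lines: Iterable[str], collection: InMemoryCollection) -> int:
    """Map each log line to a document and insert the valid ones."""
    docs: List[Dict] = [d for d in map(to_document, lines) if d is not None]
    return collection.insert_new(docs) if docs else 0


# Made-up batch: one malformed line, one blank line, one duplicate.
collection = InMemoryCollection()
batch = [
    '{"event_id": "a1", "type": "view"}',
    "not json",
    "",
    '{"event_id": "a1", "type": "view"}',
    '{"event_id": "b2", "type": "install"}',
]
inserted = process_batch(batch, collection)
```

In a streaming job, a function like `process_batch` would typically be applied to each micro-batch or partition, so that one database connection is reused across many lines rather than opened per record.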
![Page 34: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/34.jpg)
Questions
![Page 35: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/35.jpg)
Streaming
![Page 36: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/36.jpg)
![Page 37: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/37.jpg)
![Page 38: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/38.jpg)
![Page 39: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/39.jpg)
Ingestion
![Page 40: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/40.jpg)
Ingestion Table Schema
Event ID | Request | View | Install | ... | Request Added | View Added | Install Added | Value
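The slide gives only the column names of the ingestion table, so the sketch below rests on a guess about how they are used: one row per event ID, a flag per event kind (request, view, install), an "added" timestamp recording when that kind first arrived, and a running value. All function names, the timestamp format, and the sample data are hypothetical.

```python
from typing import Dict

EVENT_KINDS = ("request", "view", "install")


def new_row(event_id: str) -> Dict:
    """Build an empty ingestion row with one flag and one timestamp per kind."""
    row = {"event_id": event_id, "value": 0.0}
    for kind in EVENT_KINDS:
        row[kind] = False
        row[f"{kind}_added"] = None
    return row


def record_event(table: Dict[str, Dict], event_id: str, kind: str,
                 added_at: str, value: float = 0.0) -> bool:
    """Record one event; return False if this kind was already recorded.

    Assumption: the "... Added" columns hold the time the event was first
    ingested, which makes repeated deliveries of the same event a no-op.
    """
    if kind not in EVENT_KINDS:
        raise ValueError(f"unknown event kind: {kind}")
    row = table.setdefault(event_id, new_row(event_id))
    if row[kind]:
        return False
    row[kind] = True
    row[f"{kind}_added"] = added_at
    row["value"] += value
    return True


# Made-up example: the view arrives twice, the second time is ignored.
table: Dict[str, Dict] = {}
first_request = record_event(table, "e1", "request", "2015-01-01T00:00:00Z")
first_view = record_event(table, "e1", "view", "2015-01-01T00:00:05Z", value=0.5)
repeat_view = record_event(table, "e1", "view", "2015-01-01T00:00:09Z", value=0.5)
```

Under this reading, keying the row on the event ID is what makes ingestion idempotent: replaying a batch after a failure does not double count the value.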
![Page 41: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/41.jpg)
Fact Table Schema
... | Date | Time | Deliveries | Views | Installs | Processed Deliveries | Processed Views | Processed Installs
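As a rough illustration of how rows for a fact table like this might be produced, the sketch below rolls events up into counts per date and time bucket. It covers only the Date, Time, Deliveries, Views, and Installs columns; the "Processed" columns are left out because the slide does not define them. The hourly bucket, the event field names, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical mapping from event type to fact-table column.
COLUMNS = {"delivery": "deliveries", "view": "views", "install": "installs"}


def roll_up(events: Iterable[Dict]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count events per (date, hour) bucket, one counter per column."""
    facts = defaultdict(Counter)
    for event in events:
        column = COLUMNS.get(event["type"])
        if column is None:
            continue  # event types without a fact column are skipped
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%S")
        key = (ts.strftime("%Y-%m-%d"), ts.strftime("%H:00"))
        facts[key][column] += 1
    return {key: dict(counts) for key, counts in facts.items()}


# Made-up events spanning two hourly buckets.
sample_events = [
    {"type": "delivery", "timestamp": "2015-03-01T10:05:00"},
    {"type": "view", "timestamp": "2015-03-01T10:40:00"},
    {"type": "delivery", "timestamp": "2015-03-01T11:02:00"},
    {"type": "click", "timestamp": "2015-03-01T11:03:00"},
]
facts = roll_up(sample_events)
```

Because the counts are plain sums, partial results from separate batches can be added together per key, which is what lets this kind of aggregation be spread across workers.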
![Page 42: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/42.jpg)
Ingestion
![Page 43: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/43.jpg)
![Page 44: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/44.jpg)
![Page 45: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/45.jpg)
![Page 46: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/46.jpg)
![Page 47: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/47.jpg)
![Page 48: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/48.jpg)
![Page 49: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/49.jpg)
Process
![Page 50: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/50.jpg)
![Page 51: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/51.jpg)
![Page 52: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/52.jpg)
![Page 53: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/53.jpg)
![Page 54: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/54.jpg)
![Page 55: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/55.jpg)
Next Steps
● Switching from JSON to ProtoBuf
● Using YARN to run multiple jobs on one cluster
● Data Science
● Who knows?
![Page 56: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/56.jpg)
Questions
![Page 57: Using Spark at Vungle](https://reader031.fdocuments.net/reader031/viewer/2022032222/55c3562bbb61eb0f7e8b463e/html5/thumbnails/57.jpg)
Thank you!