The Hive Think Tank: "Stream Processing Systems" by Karthik Ramasamy of Twitter
Stream Processing Systems
Karthik Ramasamy, Twitter
@karthikz
Value of Real Time Data: It’s contextual
[1] Courtesy Michael Franklin, BIRTE, 2015.
Heron Design: Goals
Batching of tuples: Amortizing the cost of transferring tuples
Task isolation: Ease of debug-ability/isolation/profiling
Fully API compatible with Storm: Directed acyclic graph; Topologies, Spouts and Bolts
Support for back pressure: Topologies should be self-adjusting
Use of mainstream languages: C++, Java and Python
Efficiency: Reduce resource consumption
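Two of the goals above, batching of tuples and back pressure, can be illustrated with a small sketch. This is plain Python, not Heron code or the Heron API; `BATCH_SIZE`, `QUEUE_CAPACITY`, `spout`, `bolt` and `run` are hypothetical names chosen for illustration. It assumes a single producer and a single slower consumer joined by a bounded queue: grouping tuples into batches amortizes the per-transfer cost, and the bounded queue makes the producer block when the consumer falls behind.

```python
import queue
import threading
import time

BATCH_SIZE = 4        # hypothetical: tuples grouped into one transfer
QUEUE_CAPACITY = 2    # hypothetical: max batches in flight before the spout blocks


def spout(out_q, n_tuples, stats):
    """Emit the integers 0..n_tuples-1, grouped into batches."""
    batch = []
    for i in range(n_tuples):
        batch.append(i)
        if len(batch) == BATCH_SIZE:
            # put() blocks while the queue is full: this is the back pressure.
            out_q.put(batch)
            stats["batches_sent"] += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        out_q.put(batch)
        stats["batches_sent"] += 1
    out_q.put(None)  # end-of-stream marker


def bolt(in_q, stats):
    """Consume batches more slowly than the spout can produce them."""
    while True:
        batch = in_q.get()
        if batch is None:
            break
        time.sleep(0.001)  # simulate per-batch processing cost
        stats["tuples_processed"] += len(batch)
        stats["total"] += sum(batch)


def run(n_tuples=10):
    stats = {"batches_sent": 0, "tuples_processed": 0, "total": 0}
    q = queue.Queue(maxsize=QUEUE_CAPACITY)
    consumer = threading.Thread(target=bolt, args=(q, stats))
    consumer.start()
    spout(q, n_tuples, stats)
    consumer.join()
    return stats
```

Calling `run(10)` sends ten tuples as three transfers instead of ten, and the spout never gets more than `QUEUE_CAPACITY` batches ahead of the bolt.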
Better Storm
Twitter Heron
Container Based Architecture
Separate Monitoring and Scheduling
Simplified Execution Model
Much Better Performance
Heron: Sample Topologies
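In the Storm/Heron model a topology is a directed acyclic graph of spouts (sources) and bolts (processing steps). A minimal word-count sketch of that idea follows. It is a toy model in plain Python, not the Heron or Storm API; the `Topology` class, its methods, and `word_count` are hypothetical names, and tuples are processed synchronously in one process.

```python
from collections import Counter, defaultdict


class Topology:
    """Hypothetical toy model: named components plus directed edges."""

    def __init__(self):
        self.spouts = {}                 # name -> iterable of tuples
        self.bolts = {}                  # name -> function(tuple) -> iterable
        self.edges = defaultdict(list)   # upstream name -> downstream names

    def add_spout(self, name, source):
        self.spouts[name] = source

    def add_bolt(self, name, func, inputs):
        self.bolts[name] = func
        for upstream in inputs:
            self.edges[upstream].append(name)

    def _emit(self, component, tup):
        # Deliver one tuple to every downstream bolt, then forward its output.
        for downstream in self.edges[component]:
            for out in self.bolts[downstream](tup):
                self._emit(downstream, out)

    def run(self):
        for name, source in self.spouts.items():
            for tup in source:
                self._emit(name, tup)


def word_count(sentences):
    counts = Counter()

    def split_bolt(sentence):
        return sentence.split()   # one output tuple per word

    def count_bolt(word):
        counts[word] += 1
        return []                 # terminal bolt emits nothing

    topo = Topology()
    topo.add_spout("sentences", sentences)
    topo.add_bolt("split", split_bolt, inputs=["sentences"])
    topo.add_bolt("count", count_bolt, inputs=["split"])
    topo.run()
    return counts
```

For example, `word_count(["a b a", "b c"])` routes each sentence from the spout through the split bolt to the count bolt and returns the per-word totals.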
Heron @ Twitter: Storm is decommissioned
LARGEST CLUSTER
100’s OF TOPOLOGIES
BILLIONS OF MESSAGES
100’s OF TERABYTES
REDUCED INCIDENTS
GOOD NIGHT SLEEP
3X reduction in resource usage
Auto scaling the system in the presence of unpredictability
Technology Challenges
The Road Ahead
Auto tuning of real time analytics jobs/queries
Exploiting faster networks for efficiently moving data
Get in Touch: @karthikz