Kamal Hakimzadeh – Reproducible Distributed Experiments
www.karamel.io 1
Reproducible Distributed
Experiments
Kamal Hakimzadeh, PhD Student
Jim Dowling, Associate Professor
Agenda
• Motivation
• Reproducibility
• Demo: Simple experiment 30-40 min
• Karamel Reproducibility
• Karamel Engine
• Orchestration
• Challenges
Motivation
Analytical vs Empirical proof
Distributed systems (DS) support many scientific advancements
Scheduling, fault tolerance, scalability … are extremely complex
Reproducible vs. Replicable
Reproducible: different (1) laboratory, (2) experimenter and (3) apparatus, same conclusion
Replicable: same (1) laboratory, (2) experimenter and (3) apparatus, same results
Computational Reproducibility: Infrastructure, software, experiment and data
Demo: Word Count
Hadoop NN, Flink JM
Hadoop DN + Flink TM (x3)
Text Generator (x3)
Word Count
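The demo's idea of several text generators feeding one word count can be illustrated on a single machine. The following is a minimal sketch only, not the Flink job or the Karamel experiment from the talk. The names `VOCABULARY`, `generate_text`, `word_count` and `run_experiment` are hypothetical, and the fixed seeds are an assumption made so that two runs see the same input and can be compared.

```python
import random
from collections import Counter

# Hypothetical vocabulary; the talk does not say what text is generated.
VOCABULARY = ["flink", "hadoop", "karamel", "chef", "cluster"]


def generate_text(seed, num_words):
    """Stand-in for one text generator: seeded, so its output is deterministic."""
    rng = random.Random(seed)
    return " ".join(rng.choice(VOCABULARY) for _ in range(num_words))


def word_count(texts):
    """Count how often each word occurs across all generated texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def run_experiment(seeds=(1, 2, 3), num_words=1000):
    """Run three generators (one per seed) and count the words they emit."""
    texts = [generate_text(seed, num_words) for seed in seeds]
    return word_count(texts)


first = run_experiment()
second = run_experiment()
print(first == second)  # identical inputs and code give identical counts
```

Pinning the input through seeds is only one ingredient; the infrastructure and software layers are also part of what the talk calls computational reproducibility.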
Karamel: Reproducibility in different layers
Bare Metal, Google Compute Engine
A virtual machine is an abstract entity
Software is defined in Chef. It is publicly available on GitHub.
Karamel Engine
DSL Service
Cloud Clients
Karamel Engine
Physical Mapping
Orchestrator
Challenges and future work
Scalability
Fault Recovery Model
Elasticity – Handle Churn
Instrumentation
Recommendation System
Language Support
Load generators
Scheduling
Container-based machines
Result Management
Debugging
Team members
Kamal Hakimzadeh, PhD Student at KTH
Alberto Lorente Leal, Software Developer at Comeon
Jim Dowling, Associate Professor at KTH
Hooman Peiro Sajjad, PhD Student at KTH
Abhimanyu Babbar, Backend Developer at Wrap