Post on 02-Jan-2016
Replay Debugging for Distributed Applications
D. Geels, G. Altekar, S. Shenker and I. Stoica
Presented by: Olusanya Soyannwo
Outline
• Introduction
• Design
• Challenges
• Limitations
• Evaluation
• Related Work
• Conclusion
Introduction
Goal
• Find non-deterministic failures in deployed, distributed applications
Motivation
• Growth of distributed applications
• Limitations of existing tools: network inconsistency, inadequacy of simulations
• Reproduction difficulty
Introduction
• Deterministic replay
• Remote debugging: latency, continuous interaction, connection problems
• Continuous logging: performance concerns
• Consistent group replay: multiple snapshots
• Mixed environment: determine (non-)cooperating peers
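To make the idea of deterministic replay concrete, here is a minimal sketch of the record-then-replay pattern. It is not liblog's implementation: the `ReplayLog` class, its `call` method, and the toy `application` function are hypothetical names introduced only for illustration. The assumption is simply that every non-deterministic result is logged during the original run and fed back in the same order during replay.

```python
import random
import time


class ReplayLog:
    """Records results of non-deterministic calls, then replays them in order."""

    def __init__(self):
        self.entries = []       # (call name, result) pairs, in call order
        self.replaying = False
        self.cursor = 0

    def call(self, name, fn, *args):
        if self.replaying:
            # Replay mode: return the logged result instead of calling fn.
            logged_name, result = self.entries[self.cursor]
            if logged_name != name:
                raise RuntimeError(
                    f"replay diverged: expected {logged_name}, got {name}"
                )
            self.cursor += 1
            return result
        # Record mode: perform the real call and log what it returned.
        result = fn(*args)
        self.entries.append((name, result))
        return result

    def start_replay(self):
        self.replaying = True
        self.cursor = 0


def application(log):
    """Toy application whose outcome depends on non-deterministic inputs."""
    started = log.call("time", time.time)
    roll = log.call("randint", random.randint, 1, 6)
    return (started, roll)


log = ReplayLog()
original = application(log)     # original run: results are logged
log.start_replay()
replayed = application(log)     # replay: results come from the log
print(original == replayed)     # True
```

Because the replayed run never touches the clock or the random number generator, it follows the same path as the original run, which is what makes a rare failure reproducible.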
Introduction
Liblog
• Provides consistent replay in mixed environments
• No additional hardware or patches
• Works on unmodified C/C++ applications
• Simple: startup script, GDB interface
Design
• Shared library implementation: intercepts calls to libc and vice versa; less complicated
• Message tagging and capture: log messages, time stamps
• Central replay, local replay: network bandwidth, matching hardware, data accessibility
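The message tagging item can be sketched as follows, assuming the time stamps are logical (Lamport-style) clocks attached to each message. The `Node` class and its methods are hypothetical and stand in for the interception layer; the slide itself only says that messages are tagged and logged.

```python
class Node:
    """A process that tags outgoing messages with a logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.log = []           # (clock, event, payload) entries

    def send(self, payload):
        self.clock += 1
        self.log.append((self.clock, "send", payload))
        return (self.clock, payload)    # the tag travels with the message

    def receive(self, message):
        tag, payload = message
        # Advance past the sender's clock so the receive is ordered after the send.
        self.clock = max(self.clock, tag) + 1
        self.log.append((self.clock, "recv", payload))
        return payload


a, b = Node("a"), Node("b")
b.receive(a.send(b"hello"))
a.receive(b.send(b"world"))

# Merging the per-node logs by clock yields an order consistent with causality.
merged = sorted(
    (clock, node.name, event, payload)
    for node in (a, b)
    for clock, event, payload in node.log
)
for entry in merged:
    print(entry)
```

With tags like these, logs collected from separate hosts can be merged so that no message is replayed as received before it was sent.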
Challenges
• Multi-threaded applications. Problem: shared memory. Solution: implement a new scheduler
• Illegal memory accesses. Problem: heap/stack corruption. Solution: zero out memory*
• TCP limitation
• Querying for non-cooperating peers
• GDB uniprocess restriction
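One way to read the scheduler solution for multi-threaded applications is sketched below. It assumes the scheduler runs one thread at a time and logs the order of context switches so that replay can follow the same order; the slide does not spell out these details. The `worker` and `run` functions are hypothetical, and Python generators stand in for threads.

```python
import random


def worker(name, shared, steps):
    """A toy 'thread': writes to shared memory, yielding at each switch point."""
    for i in range(steps):
        shared.append(f"{name}{i}")
        yield


def run(schedule=None, seed=None):
    """Run two workers one at a time; record the schedule or follow a logged one."""
    shared = []
    threads = {"A": worker("A", shared, 3), "B": worker("B", shared, 3)}
    rng = random.Random(seed)
    recorded = []
    replay = iter(schedule) if schedule is not None else None
    while threads:
        if replay is not None:
            name = next(replay)                 # replay: follow the logged order
        else:
            name = rng.choice(sorted(threads))  # record: arbitrary choice
        recorded.append(name)
        try:
            next(threads[name])
        except StopIteration:
            del threads[name]                   # this worker has finished
    return shared, recorded


original, schedule = run()
replayed, _ = run(schedule=schedule)
print(original == replayed)     # True
```

Since only one worker touches the shared list at a time and the switch order is logged, the interleaving of writes to shared memory is identical on replay.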
Limitations
• Log storage
• Host requirements
• Scheduling semantics
• Network overhead
• Limited consistency
• Completeness
• Soundness
Evaluation
Experiments
• Dual 3.06 GHz Pentium 4 Xeon, 512 KB L2 cache
• 2 GB of RAM, 80 GB 7500 rpm ATA/100 disk
• Broadcom 1000TX gigabit Ethernet
Conclusion
• Related work: liblog is similar to several other tools (DejaVu, Jockey, Flashback)
• Useful for select applications
• Needs many enhancements
Ideas/Issues
• Useful for simulations
• Restricted to non-resource-intensive applications
• No significant comparison
• How long can logging occur for? 4 MB/hr
• Inadequate citations/references
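The logging-duration question can be roughed out with simple arithmetic. This is a back-of-the-envelope sketch under two assumptions not stated on the slides: that 4 MB/hr is a steady log growth rate, and that the whole 80 GB disk from the evaluation machine (counted as 80,000 MB) is available for logs.

```python
LOG_RATE_MB_PER_HOUR = 4        # figure quoted on the slide, assumed steady
DISK_CAPACITY_MB = 80 * 1000    # assumption: the entire 80 GB disk holds logs

hours = DISK_CAPACITY_MB / LOG_RATE_MB_PER_HOUR
days = hours / 24

print(f"{hours:.0f} hours, about {days:.0f} days")
# 20000 hours, about 833 days
```

Under these assumptions disk capacity alone would allow logging for over two years; a busier application or a shared disk would shorten that proportionally.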