Replay Debugging for Distributed Application D. Geels, G. Altekar, S. Shenker and I. Stoica...

Post on 02-Jan-2016

Replay Debugging for Distributed Applications

D. Geels, G. Altekar, S. Shenker and I. Stoica

Presented by: Olusanya Soyannwo

Outline

  Introduction
  Design
  Challenges
  Limitations
  Evaluation
  Related Work
  Conclusion

Introduction

Goal
  Find non-deterministic failures in deployed, distributed applications

Motivation
  Growth of distributed applications
  Limitations of existing tools
    • Network inconsistency
    • Inadequacy of simulations
  Reproduction difficulty

Introduction

Deterministic Replay
  Remote debugging latency
  Continuous interaction
  Connection problems

Continuous Logging
  Performance concerns

Consistent Group Replay
  Multiple snapshots

Mixed Environment
  Determine (non-)cooperating peers
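
To make the idea of continuous logging plus deterministic replay concrete, here is a minimal sketch in C. It is not liblog's code: the names replay_ctx, replay_open, replay_close and logged_rand are hypothetical, and rand() merely stands in for any call whose result can differ from run to run. In record mode the wrapper calls the real function and appends the result to a log; in replay mode it returns the logged value instead of calling the function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for illustration only; not liblog's API. */
typedef enum { MODE_RECORD, MODE_REPLAY } replay_mode;

typedef struct {
    replay_mode mode;
    FILE *log;              /* results of non-deterministic calls */
} replay_ctx;

/* Open the log for writing (record) or reading (replay). Returns 0 on success. */
int replay_open(replay_ctx *ctx, const char *path, replay_mode mode) {
    ctx->mode = mode;
    ctx->log = fopen(path, mode == MODE_RECORD ? "wb" : "rb");
    return ctx->log != NULL ? 0 : -1;
}

void replay_close(replay_ctx *ctx) {
    if (ctx->log != NULL) {
        fclose(ctx->log);
        ctx->log = NULL;
    }
}

/* Wrapper around a non-deterministic call (rand() is a stand-in). */
int logged_rand(replay_ctx *ctx) {
    int value;
    assert(ctx->log != NULL);
    if (ctx->mode == MODE_RECORD) {
        value = rand();                              /* real call */
        fwrite(&value, sizeof value, 1, ctx->log);   /* remember the result */
    } else if (fread(&value, sizeof value, 1, ctx->log) != 1) {
        fprintf(stderr, "replay log exhausted\n");   /* replay ran past the log */
        exit(EXIT_FAILURE);
    }
    return value;
}
```

A caller would open the context once with MODE_RECORD on the deployed host, then later open the same file with MODE_REPLAY on the debugging machine; every logged_rand call then returns the value seen in the original run, in the original order.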

Introduction

Liblog
  Provides consistent replay in mixed environments
  No additional hardware or patches
  Works on unmodified C/C++ applications
  Simple
    • Startup script
    • GDB interface

Design

Shared Library Implementation
  Intercepts calls to libc and vice versa
  Less complicated

Message Tagging and Capture
  Log messages
  Time stamps

Central Replay
  Local replay
  Network bandwidth, matching hardware, data accessibility
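
The message tagging step can be sketched as follows. The slide only says "Time stamps"; this sketch assumes logical (Lamport-style) clocks, and the types lamport_clock and tagged_msg and the functions tag_outgoing and on_receive are hypothetical names, not liblog's interface. Each outgoing message carries the sender's clock, and the receiver advances its own clock past the tag, so the logs of different hosts can later be ordered consistently during replay.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_MAX 256

/* Hypothetical types for illustration only. */
typedef struct { uint64_t clock; } lamport_clock;

typedef struct {
    uint64_t timestamp;             /* sender's logical clock at send time */
    size_t   len;                   /* number of valid payload bytes */
    char     payload[PAYLOAD_MAX];
} tagged_msg;

/* Sender side: advance the local clock, then stamp the outgoing message. */
tagged_msg tag_outgoing(lamport_clock *lc, const void *data, size_t len) {
    tagged_msg m;
    assert(len <= PAYLOAD_MAX);
    memset(&m, 0, sizeof m);
    lc->clock += 1;
    m.timestamp = lc->clock;
    m.len = len;
    memcpy(m.payload, data, len);
    return m;
}

/* Receiver side: local clock becomes max(local, tag) + 1.
 * Returns the updated local clock. */
uint64_t on_receive(lamport_clock *lc, const tagged_msg *m) {
    if (m->timestamp > lc->clock) {
        lc->clock = m->timestamp;
    }
    lc->clock += 1;
    return lc->clock;
}
```

With this rule a receive event always has a larger timestamp than the matching send, which is the property a replay tool needs in order to deliver logged messages in an order consistent with the original run.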

Challenges

Multi-threaded applications
  Problem: shared memory
  Solution: implement new scheduler

Illegal memory accesses
  Problem: heap/stack corruption
  Solution: zero out memory*

TCP limitation
  Querying for non-cooperating peers

GDB uniprocess restriction
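
The "zero out memory" remedy can be illustrated with a small allocation wrapper. This is a sketch under assumptions: the slide does not say where the zeroing happens, and zeroed_malloc is a hypothetical name. The idea shown is that if every heap block starts out zero-filled, a read of never-written memory returns the same bytes in the original run and in replay, so a bug that depends on such a read behaves identically both times.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper for illustration: returns zero-filled heap memory
 * so that reads of uninitialized bytes are the same on every run. */
void *zeroed_malloc(size_t size) {
    void *p;
    assert(size > 0);           /* malloc(0) is implementation-defined */
    p = malloc(size);
    if (p != NULL) {
        memset(p, 0, size);     /* overwrite whatever the allocator left behind */
    }
    return p;
}
```

The effect is the same as calloc(1, size); writing it as a wrapper around malloc mirrors how an interposing library would handle the call on the application's behalf.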

Limitations

  Log storage
  Host requirements
  Scheduling semantics
  Network overhead
  Limited consistency
  Completeness
  Soundness

Evaluation

Experiments
  Dual 3.06 GHz Pentium 4 Xeon, 512K L2 cache
  2 GB of RAM, 80 GB 7500 rpm ATA/100 disk
  Broadcom 1000TX gigabit Ethernet

Conclusion

  Related work: Liblog is similar to several others (DejaVu, Jockey, Flashback)
  Useful for select applications
  Needs a lot of enhancements

Ideas/Issues

  Useful for simulations
  Restricted to non-resource-intensive applications
  No significant comparison
  How long can logging occur for? (4 MB/hr)
  Inadequate citations/references