Crash N' Burn: Writing Linux application fault handlers
Crash N' Burn
or: When bad things happen to good programs...
Version 1.1
Gilad Ben-Yossef, Chief Coffee Drinker, Codefidence Ltd.
[email protected] http://codefidence.com
What is this tutorial about?
Segmentation fault: core dumped
Dealing with faults
What's wrong with core dumps?
Instant gratification
No space left on device for 753Mb core dump
No source, no (network) access but working code needed for paycheck
Access to external state (e.g. FPGA)
Easier access to internal state machine
Custom fault behavior
Haiku error messages
Haiku error messages?
First smoke, then silence.
This thousand dollar router
dies so beautifully.
Segmentation fault: core dumped
The Plan
We shall:
Trap signals sent by the kernel in response to faults (SIGSEGV and friends)
Print back trace and custom state information (Haiku form optional)
???
Profit!
Easy to do. Difficult to do right.
Signals
Signals are asynchronous notifications sent to a process by the kernel, another process or itself
A process can register a signal handler function to respond to a signal
Process faults make the kernel generate a signal ...
... which the process can catch and respond to
Signals Worth Catching
SIGQUIT - Quit from keyboard
SIGILL - Illegal Instruction
SIGABRT - Abort signal from abort(3)
SIGFPE - Floating point exception
SIGSEGV - Invalid memory reference
SIGBUS - Bus error (bad memory access)
Catching Signals
    int sigaction(int signum,
                  const struct sigaction *act,
                  struct sigaction *oldact);
Register a signal handler.
signum: signal number.
act: pointer to new struct sigaction.
oldact: pointer to buffer to be filled with current sigaction (or NULL, if not interested).
Catching Signals cont.
The sigaction structure is defined as:
    struct sigaction {
        void (*sa_handler)(int);
        void (*sa_sigaction)(int, siginfo_t *, void *);
        sigset_t sa_mask;
        int sa_flags;
        ...
    }
Where:
sa_handler and sa_sigaction are two forms of signal handler callbacks. We'll use the SA_SIGINFO flag to choose the sa_sigaction form.
sa_mask holds the mask of signals which will be blocked while the callback runs. We'll set all bits, so every signal is blocked.
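Registering Handler Example
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = my_handler;  /* three-argument handler form */
    sigfillset(&act.sa_mask);       /* block all signals while the handler runs */
    act.sa_flags = SA_SIGINFO;      /* select sa_sigaction instead of sa_handler */
    return sigaction(SIGSEGV, &act, NULL);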
Signal Handler
Signal handler prototype:
    void handler(int signal, siginfo_t *siginfo, void *context)
Where:
signal is the signal number
siginfo is a pointer to a siginfo_t structure
context is a pointer to an architecture specific structure holding the context of the interrupted program.
Signal info
siginfo_t holds information about the signal delivered. Interesting fields for exceptions include:
si_errno: errno value
  Not always filled on all platforms/versions.
si_code: Error description code
  It's an index to a list of specific error descriptions. See sigaction(2).
si_addr: Fault address
  For SIGILL, SIGFPE, SIGSEGV, and SIGBUS only.
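The fields above can be read inside a three-argument handler. The following is a minimal sketch, not the tutorial's own code: the names record_fault, install_recorder and the last_* globals are made up for illustration, and the handler only stores what it sees so that the main program can inspect it later.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical globals: details of the last signal seen by the handler. */
volatile sig_atomic_t last_signo;
volatile sig_atomic_t last_code;
void *volatile last_addr;

/* Three-argument handler form, selected by SA_SIGINFO. */
void record_fault(int signo, siginfo_t *info, void *context)
{
    (void)context;              /* unused in this sketch */
    last_signo = signo;
    last_code = info->si_code;  /* see sigaction(2) for the per-signal codes */
    last_addr = info->si_addr;  /* only meaningful for SIGILL, SIGFPE, SIGSEGV, SIGBUS */
}

/* Register record_fault() for one signal; returns 0 on success, -1 on error. */
int install_recorder(int signo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = record_fault;
    sigfillset(&act.sa_mask);   /* block all signals while the handler runs */
    act.sa_flags = SA_SIGINFO;
    return sigaction(signo, &act, NULL);
}
```

Calling install_recorder(SIGSEGV) would arm it for real faults; a harmless way to try it out is to register it for SIGUSR1 and call raise(SIGUSR1).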
Signal Context
A structure that saves the hardware context which the signal interrupted
Architecture specific
Undocumented
Changes between releases
e.g. getting the IP (instruction pointer) on various architectures, after casting context to ucontext_t *:
x86: ((ucontext_t *)context)->uc_mcontext.gregs[REG_EIP]
PPC: ((ucontext_t *)context)->uc_mcontext.regs->nip
Check out sys/ucontext.h for your favorite architecture
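As a rough sketch of what this looks like in practice, the helper below pulls the interrupted instruction pointer out of the context argument. It assumes glibc on Linux; the function names and the set of architectures covered are choices made for this example, and the field names should be checked against sys/ucontext.h on the target system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes the REG_* register indices in glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

/* Return the interrupted instruction pointer, or NULL on architectures
 * this sketch does not know about. */
void *context_pc(void *context)
{
    ucontext_t *uc = (ucontext_t *)context;
#if defined(__x86_64__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    return (void *)(uintptr_t)uc->uc_mcontext.gregs[REG_EIP];
#elif defined(__aarch64__)
    return (void *)(uintptr_t)uc->uc_mcontext.pc;
#else
    (void)uc;
    return NULL;
#endif
}

/* Hypothetical globals filled in by the handler below. */
void *volatile captured_pc;
volatile sig_atomic_t pc_captured;

void capture_pc_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)info;
    captured_pc = context_pc(context);
    pc_captured = 1;
}

/* Register capture_pc_handler() for one signal; returns 0 on success. */
int install_pc_capture(int signo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = capture_pc_handler;
    sigfillset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    return sigaction(signo, &act, NULL);
}
```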
Getting a Backtrace
glibc back trace support:
    #include <execinfo.h>
    int backtrace(void **buffer, int size);
Fills the buffer with call stack addresses.
    char **backtrace_symbols(void *const *buffer, int size);
Returns a malloc-ed array of strings of function names. The returned buffer needs to be free()-ed.
    void backtrace_symbols_fd(void *const *buffer, int size, int fd);
Prints function names to file descriptor fd.
Symbols are taken from the dynamic symbol table; link with -rdynamic to populate it.
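A minimal sketch of the two calls working together, assuming glibc. The function name dump_backtrace and the frame limit are invented for this example. It uses backtrace_symbols_fd() rather than backtrace_symbols(), so nothing has to be free()-ed.

```c
#include <execinfo.h>

#define MAX_FRAMES 32   /* hypothetical limit on the depth we record */

/* Write the current call stack to fd, one frame per line.
 * Returns the number of frames captured. */
int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    backtrace_symbols_fd(frames, n, fd);
    return n;
}
```

For example, dump_backtrace(2) prints the stack to standard error. Without -rdynamic, most frames of the main executable show up as raw addresses only.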
Naïve Example
WARNING! The code you are about to see is wrong.
It is also very common...
What's Wrong?
Async-signal-unsafe functions
Heap usage after malloc arena corruption
Not thread safe
Signal handler induced stack munging is hiding the real fault location
  On some architectures at least.
Async-signal Safety
Signal handlers run asynchronously
Can't share locks between signal handler and main program
  If the lock is taken when the signal handler is called, we have a deadlock.
Can only use the async-signal-safe functions listed in POSIX.1-2003
  See signal(7) for the list (signal-safety(7) in newer man-pages).
  fprintf, malloc, backtrace_symbols, fflush are not on the list
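One way to live within the list is to format output by hand and emit it with write(2), which is async-signal-safe. The helpers below are a sketch under that assumption; format_hex and safe_report are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format value as "0x..." into buf and return the string length.
 * buf must hold at least 2 + 2 * sizeof(uintptr_t) + 1 bytes. */
size_t format_hex(char *buf, uintptr_t value)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, len = 0;

    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* emit them most significant first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* write(2) until everything is out or an error occurs. */
static void write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return;             /* nothing sensible to do inside a fault handler */
        p += n;
        len -= (size_t)n;
    }
}

/* Print "<label>0x<addr>\n" to fd without stdio or the heap. */
void safe_report(int fd, const char *label, const void *addr)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    size_t label_len = 0;
    size_t len = format_hex(buf, (uintptr_t)addr);

    while (label[label_len] != '\0')
        label_len++;
    write_all(fd, label, label_len);
    write_all(fd, buf, len);
    write_all(fd, "\n", 1);
}
```

In a handler this might be called as safe_report(2, "fault address: ", siginfo->si_addr).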
Heap Usage
The fault may have occurred due to malloc arena corruption
Trying to malloc() / free() memory may lead to a double fault.
So don't ...
  Do not call malloc() / free()
  Do not call functions that do
    free, backtrace_symbols obviously not good
Detecting Heap Usage
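Poison __malloc_hook and friends (declared in malloc.h; the hooks were removed from the glibc API in version 2.34, so this works on older glibc only):
    void *kill_malloc(size_t size, const void *caller)
    {
        printf("Malloc called from %p\n", caller);
        abort();
    }
    __malloc_hook = kill_malloc;
Poison the heap:
    char *p = sbrk(0);
    memset(p - 1024, 42, 1024);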
Dynamic linker heap usage
backtrace and friends are dynamically loaded from libgcc.so
The dynamic linker calls malloc to load the new library
So...
  Make a dummy call to backtrace when installing the handler, to force the linker to load libgcc with a sane heap.
  Or statically link libgcc in.
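The dummy call can sit right next to the sigaction() call. This is a sketch assuming glibc; install_fault_handler and fault_handler are placeholder names, and the handler body is deliberately minimal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Placeholder handler: dump the stack to stderr and terminate. */
void fault_handler(int signo, siginfo_t *info, void *context)
{
    void *frames[32];
    int n;

    (void)signo;
    (void)info;
    (void)context;
    n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    _exit(1);
}

/* Install the SIGSEGV handler; returns 0 on success, -1 on error. */
int install_fault_handler(void)
{
    struct sigaction act;
    void *dummy[4];

    /* Dummy call: makes the dynamic linker load libgcc now, while the
     * heap is still sane, instead of during the first real fault. */
    (void)backtrace(dummy, 4);

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = fault_handler;
    sigfillset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &act, NULL);
}
```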
Thread Safety
Multiple threads can fault together
Will garble our output
Use a spin lock in the signal handler to block concurrent faulting threads
Can't spin on the lock if the contending thread is of higher RT priority
Use pthread_spin_trylock() and sleep with pselect() if it fails.
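The try-then-sleep pattern can be sketched as follows. Note the substitution: to keep the example free of a pthread link dependency, a C11 atomic_flag stands in for the pthread_spin_trylock() call named above. The structure is the same: try once, and if the lock is held, sleep briefly with pselect() instead of spinning. All fault_lock_* names are invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Stand-in for a pthread spin lock: set means "held". */
static atomic_flag fault_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock once. Returns 1 if acquired, 0 if already held. */
int fault_lock_try(void)
{
    return !atomic_flag_test_and_set(&fault_lock);
}

/* Take the lock, sleeping between attempts rather than busy-waiting,
 * so a lower priority holder still gets CPU time to finish. */
void fault_lock_acquire(void)
{
    while (!fault_lock_try()) {
        struct timespec pause = { 0, 1000000 };  /* 1 ms, an arbitrary choice */

        (void)pselect(0, NULL, NULL, NULL, &pause, NULL);
    }
}

void fault_lock_release(void)
{
    atomic_flag_clear(&fault_lock);
}
```

A handler would call fault_lock_acquire() on entry, so that a second faulting thread waits until the first report has been written.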
Handler Stack Munging
[Figure: the original user mode stack compared with the munged user mode stack, before the handler is called and after it returns. Labels: foo(...), bar(...), signal_handler(); return addresses 0x1234..., 0x1255..., 0x1266..., 0xffffe...; signal handling code; kernel; trampoline in vsyscall page.]
Putting It All Together
Fork a watchdog process sleeping on a pipe to handle faults
System wide daemon also possible
Collect information in the signal handler and send it over the pipe to the watchdog process for analysis, printing etc.
Finalize by sending backtrace_symbols_fd down the pipe
Use EIP from the signal context to overcome stack munging
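The pieces above might fit together roughly as below. This is a sketch, not the code from the slides: struct fault_record, its fields and all function names are assumptions made for illustration, and the instruction pointer from the signal context is left out to keep it short.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size record sent from the handler to the watchdog. */
struct fault_record {
    int   signo;
    int   code;
    void *addr;
};

/* Handler side: uses only write(2). Returns 0 on success, -1 otherwise. */
int send_fault_record(int fd, const struct fault_record *rec)
{
    ssize_t n = write(fd, rec, sizeof(*rec));

    return n == (ssize_t)sizeof(*rec) ? 0 : -1;
}

/* Watchdog side: block until one full record has arrived. */
int read_fault_record(int fd, struct fault_record *rec)
{
    char *p = (char *)rec;
    size_t got = 0;

    while (got < sizeof(*rec)) {
        ssize_t n = read(fd, p + got, sizeof(*rec) - got);
        if (n <= 0)
            return -1;          /* EOF or error before a full record */
        got += (size_t)n;
    }
    return 0;
}

/* Fork the watchdog. Returns the pipe's write end in the parent, -1 on error. */
int start_watchdog(void)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {             /* watchdog: sleeps in read() until a fault */
        struct fault_record rec;
        char buf[256];
        ssize_t n;

        close(fds[1]);
        if (read_fault_record(fds[0], &rec) == 0) {
            fprintf(stderr, "fault: signal %d, code %d, address %p\n",
                    rec.signo, rec.code, rec.addr);
            /* Relay the backtrace text that follows the record. */
            while ((n = read(fds[0], buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t)n, stderr);
        }
        _exit(0);
    }
    close(fds[0]);
    return fds[1];
}

int watchdog_fd = -1;           /* set from start_watchdog() at startup */

/* Fault handler: send the record, then the symbolized backtrace. */
void report_fault(int signo, siginfo_t *info, void *context)
{
    struct fault_record rec = { signo, info->si_code, info->si_addr };
    void *frames[32];
    int n;

    (void)context;
    send_fault_record(watchdog_fd, &rec);
    n = backtrace(frames, 32);
    backtrace_symbols_fd(frames, n, watchdog_fd);
    _exit(1);
}
```

The formatting and printing happen in the watchdog, which has an intact heap and is free to use stdio; the faulting process sticks to write(2) and backtrace_symbols_fd().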
Questions?
Slides & code at: http://tuxology.net
© 2008 Codefidence Ltd. Released under a CC-by-sa 2.5 License.