Saturday, 16 May 2009

A really simple tracing debugger

gdb can be really useful for debugging when it prints the information you want, but sometimes it will fail to give a useful backtrace, and it can be hard to find out what happened just before your process crashed.

I faced that problem when porting glibc to Native Client. gdb (or strace -i) could tell me the value of the instruction pointer when the process crashed, but it couldn't produce a backtrace because it didn't know how to read memory with NaCl's x86 segmentation setup. (I should point out that gdb has since been ported.) Often I could look up the address with addr2line or objdump -d to find the cause of the problem. However, that didn't help when the process died by jumping to address 0, or when it jumped to _start when running atexit handlers.

I found another way to do debugging, using a trace of the program's execution.

So, here's how to create a debugger with a single shell pipeline (plus a small C program) that works on statically-linked executables.

The first part is a tool called itrace, which starts a subprocess and single-steps its execution using Linux's ptrace() system call and prints the process's eip and esp registers (instruction pointer and stack pointer) at every step. (The code is below.)

Here's what we get if we run itrace on a simple statically-linked executable:

$ ./itrace ./hellow-static | less
8048130 ffce4a50
8048132 ffce4a50
8048133 ffce4a54
8048135 ffce4a54
8048138 ffce4a50
...
To make this intelligible, all we have to do is pipe the output of itrace through addr2line. This tool is part of binutils and translates addresses to source code filenames and line numbers using the debugging information in the executable. Also, pipe through uniq to remove duplicate lines:
$ ./itrace ./hellow-static | addr2line -e ./hellow-static | uniq | less
??:0
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:106
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:106
...
/work/glibc/stdlib/exit.c:88
/work/glibc/stdlib/exit.c:90
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:25
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:29
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:30
addr2line prints "??:0" for addresses that it can't find in debugging information. If your executable hasn't been built with debugging information - the static libc.a in Debian/Ubuntu isn't built with debugging enabled - you can pass the -f option to addr2line to print function names as well, and then filter out the useless "??:0" lines:
$ cat hellow.c <<END
#include <stdio.h>
int main() { printf("Hello, world!\n"); return 0; }
END
$ gcc -static hellow.c -o hellow-static
$ ./itrace ./hellow-static | addr2line -e ./hellow-static -f | grep -v "??:0" | uniq | less
_start
__libc_start_main
_dl_aux_init
__libc_start_main
_dl_discover_osversion
__uname
_dl_discover_osversion
__libc_start_main
__pthread_initialize_minimal
__libc_setup_tls
...
_IO_default_setbuf
_IO_new_file_setbuf
_IO_unbuffer_write
_IO_cleanup
exit
_exit
This tells us what happened before the program exited. In this case, it was executing exit() and flushing stdio buffers.

Tracing debugging can often be more helpful than postmortem debugging. Debugging with printf is still popular, after all! However, single-stepping using ptrace() has quite a high overhead so this is not really suitable for longer-running processes. Potentially Valgrind could give us an execution trace more efficiently using its dynamic translation approach, but that would still be quite slow. Modern x86 CPUs have a "Branch Trace Store" feature for recording address of basic blocks as they are executed which has the potential to be a faster method, and there is a proposal to make this available in Linux through the ptrace() interface.

itrace.c

This assumes i386, so if you're on a x86-64 system you can build it with:
gcc -m32 itrace.c -o itrace
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#include <linux/user.h>
#include <sys/ptrace.h>

int main(int argc, char **argv)
{
  int pid = fork();
  if(pid == 0) {
    if(ptrace(PTRACE_TRACEME) < 0) {
      perror("ptrace");
      _exit(1);
    }
    execvp(argv[1], argv + 1);
    perror("exec");
    _exit(1);
  }
  while(1) {
    int status;
    struct user_regs_struct regs;
    if(waitpid(pid, &status, 0) < 0)
      perror("waitpid");
    if(!WIFSTOPPED(status))
      break;
    if(ptrace(PTRACE_GETREGS, pid, 0, ®s) < 0)
      perror("ptrace/GETREGS");
    printf("%lx %lx\n", regs.eip, regs.esp);
    if(ptrace(PTRACE_SINGLESTEP, pid, 0, 0) < 0)
      perror("ptrace/SINGLESTEP");
  }
  return 0;
}

No comments: