Friday, 18 September 2009

Logging with stack traces

I find that postmortem debugging is useful (e.g. examining process state in gdb or pdb after a crash or traceback), but trying to debug using breakpoints is an exercise in tedium. I prefer debugging using logging. However, logs obtained just by adding printf statements tend to lack context information, and if you add lots of printfs the logs require more effort to read.

You can get context information in Python by doing traceback.print_stack() after every message, but the result tends to be unreadable. Below is a tool which prints stack context in a more readable tree format. I found it very useful a couple of months ago when investigating a bug in which a recursive invocation of a callback function caused failures later on. Had I not seen the stack context, I might not have realised that the callback was being called recursively.

Here is some example output. It comes from running the Tahoe client, while monkey-patching some socket functions to show which sockets the process is binding and connecting to.

0:/usr/bin/twistd:21:<module>
 1:/usr/lib/python2.5/site-packages/twisted/scripts/twistd.py:27:run
  2:/usr/lib/python2.5/site-packages/twisted/application/app.py:379:run
   3:/usr/lib/python2.5/site-packages/twisted/scripts/twistd.py:23:runApp
    4:/usr/lib/python2.5/site-packages/twisted/application/app.py:158:run
     5:/usr/lib/python2.5/site-packages/twisted/scripts/_twistd_unix.py:213:postApplication
      6:/usr/lib/python2.5/site-packages/twisted/scripts/_twistd_unix.py:174:startApplication
       7:/usr/lib/python2.5/site-packages/twisted/application/service.py:228:privilegedStartService
        8:/usr/lib/python2.5/site-packages/twisted/application/service.py:228:privilegedStartService
         9:/usr/lib/python2.5/site-packages/twisted/application/service.py:228:privilegedStartService
          10:/usr/lib/python2.5/site-packages/twisted/application/internet.py:68:privilegedStartService
           11:/usr/lib/python2.5/site-packages/twisted/application/internet.py:86:_getPort
            12:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:467:listenTCP
             13:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:730:startListening
              14:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:718:createInternetSocket
               15:/usr/lib/python2.5/site-packages/twisted/internet/base.py:724:createInternetSocket
                16:tahoe-client.tac:55:__init__
                 create socket <Socket object at 0x8eb580c> (2, 1)
              14:tahoe-client.tac:64:bind
               bind <Socket object at 0x8eb580c> (('', 56530),)
foolscap.pb.Listener starting on 56530
         9:/usr/lib/python2.5/site-packages/twisted/application/service.py:228:privilegedStartService
          10:/usr/lib/python2.5/site-packages/twisted/application/internet.py:68:privilegedStartService
           11:/usr/lib/python2.5/site-packages/twisted/application/internet.py:86:_getPort
            12:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:467:listenTCP
             13:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:730:startListening
              14:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:718:createInternetSocket
               15:/usr/lib/python2.5/site-packages/twisted/internet/base.py:724:createInternetSocket
                16:tahoe-client.tac:55:__init__
                 create socket <Socket object at 0x8eb5844> (2, 1)
              14:tahoe-client.tac:64:bind
               bind <Socket object at 0x8eb5844> (('0.0.0.0', 3456),)
nevow.appserver.NevowSite starting on 3456
Starting factory <nevow.appserver.NevowSite instance at 0x8eb07ac>
My pid: 27115
      6:/usr/lib/python2.5/site-packages/twisted/application/app.py:113:runReactorWithLogging
       7:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:220:run
        8:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:228:mainLoop
         9:/usr/lib/python2.5/site-packages/twisted/internet/base.py:561:runUntilCurrent
          10:/usr/lib/python2.5/site-packages/foolscap/eventual.py:26:_turn
           11:/usr/lib/python2.5/site-packages/allmydata/node.py:259:_startService
            12:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:191:addCallback
             13:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:182:addCallbacks
              14:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:317:_runCallbacks
               15:/usr/lib/python2.5/site-packages/allmydata/node.py:259:<lambda>
                16:/usr/lib/python2.5/site-packages/allmydata/util/iputil.py:93:get_local_addresses_async
                 17:/usr/lib/python2.5/site-packages/allmydata/util/iputil.py:137:get_local_ip_for
                  18:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:388:listenUDP
                   19:/usr/lib/python2.5/site-packages/twisted/internet/udp.py:84:startListening
                    20:/usr/lib/python2.5/site-packages/twisted/internet/udp.py:89:_bindSocket
                     21:/usr/lib/python2.5/site-packages/twisted/internet/base.py:724:createInternetSocket
                      22:tahoe-client.tac:55:__init__
                       create socket <Socket object at 0x8eb58ec> (2, 2)
                     21:tahoe-client.tac:64:bind
                      bind <Socket object at 0x8eb58ec> (('', 0),)
twisted.internet.protocol.DatagramProtocol starting on 53017
Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8eb0dcc>
                  18:/usr/lib/python2.5/site-packages/twisted/internet/udp.py:181:connect
                   19:tahoe-client.tac:58:connect
                    connect <Socket object at 0x8eb58ec> (('198.41.0.4', 7),)
(Port 53017 Closed)
Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8eb0dcc>
         9:/usr/lib/python2.5/site-packages/twisted/internet/base.py:561:runUntilCurrent
          10:/usr/lib/python2.5/site-packages/foolscap/eventual.py:26:_turn
           11:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:239:callback
            12:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:304:_startRunCallbacks
             13:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:317:_runCallbacks
              14:/usr/lib/python2.5/site-packages/allmydata/client.py:136:_start_introducer_client
               15:/usr/lib/python2.5/site-packages/twisted/application/service.py:148:setServiceParent
                16:/usr/lib/python2.5/site-packages/twisted/application/service.py:260:addService
                 17:/usr/lib/python2.5/site-packages/allmydata/introducer/client.py:58:startService
                  18:/usr/lib/python2.5/site-packages/foolscap/pb.py:901:connectTo
                   19:/usr/lib/python2.5/site-packages/foolscap/reconnector.py:60:startConnecting
                    20:/usr/lib/python2.5/site-packages/foolscap/reconnector.py:87:_connect
                     21:/usr/lib/python2.5/site-packages/foolscap/pb.py:830:getReference
                      22:/usr/lib/python2.5/site-packages/twisted/internet/defer.py:107:maybeDeferred
                       23:/usr/lib/python2.5/site-packages/foolscap/pb.py:856:_getReference
                        24:/usr/lib/python2.5/site-packages/foolscap/pb.py:947:getBrokerForTubRef
                         25:/usr/lib/python2.5/site-packages/foolscap/negotiate.py:1332:connect
                          26:/usr/lib/python2.5/site-packages/foolscap/negotiate.py:1359:connectToAll
                           27:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:474:connectTCP
                            28:/usr/lib/python2.5/site-packages/twisted/internet/base.py:665:connect
                             29:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:885:_makeTransport
                              30:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:593:__init__
                               31:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:497:createInternetSocket
                                32:tahoe-client.tac:55:__init__
                                 create socket <Socket object at 0x8eb5a74> (2, 1)
                           27:/usr/lib/python2.5/site-packages/twisted/internet/posixbase.py:474:connectTCP
                            28:/usr/lib/python2.5/site-packages/twisted/internet/base.py:665:connect
                             29:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:885:_makeTransport
                              30:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:593:__init__
                               31:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:497:createInternetSocket
                                32:tahoe-client.tac:55:__init__
                                 create socket <Socket object at 0x8c59e64> (2, 1)
         9:/usr/lib/python2.5/site-packages/twisted/internet/base.py:561:runUntilCurrent
          10:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:506:resolveAddress
           11:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:513:_setRealAddress
            12:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:539:doConnect
             13:tahoe-client.tac:61:connect_ex
              connect_ex <Socket object at 0x8eb5a74> (('127.0.0.1', 45651),)
          10:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:506:resolveAddress
           11:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:513:_setRealAddress
            12:/usr/lib/python2.5/site-packages/twisted/internet/tcp.py:539:doConnect
             13:tahoe-client.tac:61:connect_ex
              connect_ex <Socket object at 0x8c59e64> (('192.168.20.244', 45651),)
Here is the code for the logging tool:
import sys

laststack = []

def tlog(*msgs):
    """Outputs a log message with stack trace context."""
    global laststack
    try:
        raise Exception()
    except Exception:
        frame = sys.exc_info()[2].tb_frame.f_back
    stack = extract(frame)
    for i, fr in enumerate(stack):
        if not (i < len(laststack) and laststack[i] is fr):
            lineno = fr.f_lineno
            co = fr.f_code
            filename = co.co_filename
            name = co.co_name
            print " " * i + ("%i:%s:%s:%s" % (i, filename, lineno, name))
    print " " * len(stack) + " ".join(map(str, msgs))
    laststack = stack

def extract(frame):
    got = []
    while frame is not None:
        got.append(frame)
        frame = frame.f_back
    got.reverse()
    return got

This makes use of the fact that Python's stack frames are heap-allocated, so they have identity, and it is possible to hold on to a stack frame object after the frame has exited. This means it is possible to distinguish between two different invocations of the same function, even when the tracebacks are textually identical. This would not be possible in C -- gdb would have to approximate.

For completeness, here is the monkey-patching code that I used:

import socket

class Socket(socket.socket):
    def __init__(self, *args, **kwargs):
        tlog("create socket", self, args)
        super(Socket, self).__init__(*args, **kwargs)
    def connect(self, *args, **kwargs):
        tlog("connect", self, args)
        super(Socket, self).connect(*args, **kwargs)
    def connect_ex(self, *args, **kwargs):
        tlog("connect_ex", self, args)
        super(Socket, self).connect_ex(*args, **kwargs)
    def bind(self, *args, **kwargs):
        tlog("bind", self, args)
        super(Socket, self).bind(*args, **kwargs)

socket.socket = Socket

No comments: