Monday, 31 August 2009

How to do model checking of Python code

I read about eXplode in early 2007. The paper describes a neat technique, inspired by model checking, for exhaustively exploring all execution paths of a program. The difference from model checking is that it doesn't require a model of the program to be written in a special language; it can operate on a program written in a normal language. The authors of the paper use the technique to check the correctness of Linux filesystems, and version control systems layered on top of them, in the face of crashes.

The basic idea is to have a special function, choose(), that performs a non-deterministic choice, e.g.

   choose([1, 2, 3])

Conceptually, this forks execution into three branches, returning 1, 2 or 3 in each branch.

The program can call choose() multiple times, so you end up with a choice tree.

The job of the eXplode check runner is to explore that choice tree, and to try to find sequences of choices that make the program fail.

eXplode does not try to snapshot the state of the program when choose() is called. (That's what other model checkers have tried to do, and it's what would happen if choose() were implemented using the Unix fork() call.)

Instead, eXplode restores the state of the program by running it from the start and replaying a previously-recorded sequence of choices by returning them one-by-one from choose(). A node in the choice tree is represented by the sequence of choices required to get to it, and so to explore that node's subtree the check runner will repeatedly run the program from the start, replaying the choices for that subtree. The check runner does a breadth-first traversal of the choice tree, so it has a queue of nodes to visit, and each visit will add zero or more subnodes to the queue. (See the code below for a concrete implementation.)

This replay technique means that the program must be deterministic, otherwise the choice tree won't get explored properly. All non-determinism must occur through choose().

There seem to be two situations in which you might call choose():

  1. You can call choose() from your test code to generate input.

    In eXplode, the test program writes files and tries to read them back, and it uses choose() to pick what kind of writes it does.

    So far I have used choose() in tests to generate input in two different ways:

    In one case, I started with valid input and mangled it in different ways, to check the error handling of the software-under-test. The test code used choose() to pick a file to delete or a line to remove. The idea was that the software-under-test should give a nicely-formatted error message, specifying the faulty file, rather than a Python traceback. There was only a bounded number of faults to introduce so the choice tree was finite.

    In another case, I built up XML trees to check that the software under test could process them. In this case the choice tree is infinite. I decided to prune it arbitrarily at a certain depth so that the test could run as part of a fairly quickly-running test suite.

    In both these cases, using the eXplode technique wasn't strictly necessary. It would have been possible to write the tests without choose(), perhaps by copying the inputs before modifying them. But I think choose() makes the test more concise, and not all objects have interfaces for making copies.

  2. You can also call choose() from the software-under-test to introduce deliberate faults, or from a fake object that the software-under-test is instantiated with.

    In eXplode, the filesystem is tested with a fake block device which tries to simulate all the possible failure modes of a real hard disc (note that Flash drives have different failure modes!). The fake device uses choose() to decide when to simulate a crash and restart, and which blocks should be successfully written to the virtual device when the crash happens.

    You can also wrap malloc() so that it non-deterministically returns NULL in order to test that software handles out-of-memory conditions gracefully. Testing error paths is difficult to do and this looks like a good way of doing it.

    In these cases, the calls to choose() can be buried deeply in the call stack, and the eXplode check runner is performing an inversion of control that would be hard to do in a conventional test case. A rough sketch of this style of fault injection follows below.
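
As a concrete illustration of the second style, here is a hypothetical fake block device (much simplified compared with eXplode's, and assuming the Chooser implementation given below) that uses choose() to decide whether each write reaches the disc:

class FakeBlockDevice(object):

    def __init__(self, chooser):
        self._chooser = chooser
        self.blocks = {}

    def write_block(self, index, data):
        # Fork execution: in one branch the write is persisted, in
        # the other a simulated crash loses it.
        if self._chooser.choose(["persisted", "lost"]) == "persisted":
            self.blocks[index] = data

The software-under-test writes through this device without knowing that the check runner will explore every combination of lost writes.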

Here's an example of how to implement the eXplode technique in Python:


import unittest


# If anyone catches and handles this, it will break the checking model.
class ModelCheckEscape(Exception):

    pass


class Chooser(object):

    def __init__(self, chosen, queue):
        self._so_far = chosen
        self._index = 0
        self._queue = queue

    def choose(self, choices):
        if self._index < len(self._so_far):
            # Replay phase: return the previously-recorded choices
            # one by one.
            choice = self._so_far[self._index]
            if choice not in choices:
                raise Exception("Program is not deterministic")
            self._index += 1
            return choice
        else:
            # Expansion phase: we have reached a new leaf of the
            # choice tree, so queue up its children to visit later
            # and abandon this run.
            for choice in choices:
                self._queue.append(self._so_far + [choice])
            raise ModelCheckEscape()


def check(func):
    queue = [[]]
    while len(queue) > 0:
        chosen = queue.pop(0)
        try:
            func(Chooser(chosen, queue))
        except ModelCheckEscape:
            pass
        # Can catch other exceptions here and report the failure
        # - along with the choices that caused it - and then carry on.


class ModelCheckTest(unittest.TestCase):

    def test(self):
        got = []

        def func(chooser):
            # Example of how to generate a Cartesian product using choose():
            v1 = chooser.choose(["a", "b", "c"])
            v2 = chooser.choose(["x", "y", "z"])
            # We don't normally expect the function being tested to
            # capture state - in this case the "got" list - but we
            # do it here for testing purposes.
            got.append(v1 + v2)

        check(func)
        self.assertEquals(sorted(got),
                          sorted(["ax", "ay", "az",
                                  "bx", "by", "bz",
                                  "cx", "cy", "cz"]))


if __name__ == "__main__":
    unittest.main()
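
The check() function above stops at the first failure it hits. Here's a sketch of the variant suggested in the comment (my elaboration, not from the paper): it records each failure along with the choice sequence that reproduces it, and then carries on exploring:

def check_reporting(func):
    queue = [[]]
    failures = []
    while len(queue) > 0:
        chosen = queue.pop(0)
        try:
            func(Chooser(chosen, queue))
        except ModelCheckEscape:
            pass
        except Exception, exn:
            # Record the failing choice sequence so that the run can
            # be replayed later, and keep exploring other branches.
            failures.append((chosen, exn))
    return failures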

Sunday, 28 June 2009

Encapsulation in Python: two approaches

Time for another post on how object-capability security might be done in Python.

Suppose function closures in Python were encapsulated (i.e. the attributes func_closure, func_globals etc. were removed) and were the only route for getting encapsulation for non-builtin objects, as is the case in Python's old restricted mode (a.k.a. rexec). There would be two ways of defining new encapsulated objects. Let's call them the Cajita approach and the Bastion approach.

The Cajita-style approach (named after the Cajita subset of Javascript) is to define an object using a record of function closures, one closure per method. Here's an example, from a 2003 post by Ka-Ping Yee:

def Counter():
    self = Namespace()
    self.i = 0

    def next():
        self.i += 1
        return self.i

    return ImmutableNamespace(next)
(There are some more examples - a read-only FileReader object, and Mint and Purse objects based on the example from E - in Tav's post on object-capabilities in Python.)
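
Namespace and ImmutableNamespace are assumed helpers here. Minimal sketches might look like this (note that without restricted mode these are not genuinely encapsulated or immutable, which is part of what this post is about):

class Namespace(object):
    # A plain mutable attribute holder.
    pass

class ImmutableNamespace(object):
    # Expose the given closures as methods, using each function's
    # __name__ attribute (see the point about method names below).
    def __init__(self, *functions):
        for func in functions:
            self.__dict__[func.__name__] = func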

This is instead of the far more idiomatic, but unencapsulated class definition:

class Counter(object):

    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i
(Ignoring that the more idiomatic way to do this is to use a generator or itertools.count().)

I am calling this the Cajita approach because it is analogous to how objects are defined in the Cajita subset of Javascript (part of the Caja project), where this code would be written as:

function Counter() {
    var i = 0;
    return Obj.freeze({
        next: function () {
            i += 1;
            return i;
        },
    });
}
The Python version of the Cajita style is a lot more awkward because Python is syntactically stricter than Javascript:
  • Expressions cannot contain statements, so function definitions cannot be embedded in an object creation expression. As a result, method names have to be specified twice: once in the method definitions and again when passed to ImmutableNamespace. (It would be three times if ImmutableNamespace did not use __name__.)
  • A function can only assign to a variable in an outer scope by using the nonlocal declaration (recently introduced), and that is awkward. The example works around this by doing "self = Namespace()" and assigning to the attribute self.i.
The Cajita style also has a memory usage cost. The size of each object will be O(n) in the number of methods, because each method is created as a separate function closure with its own pointer to the object's state. This is not trivial to optimise away because doing so would change the object's garbage collection behaviour.

For these reasons it would be better not to use the Cajita style in Python.

The alternative is the approach taken by the Bastion module, which the Python standard library used to provide for use with Python's restricted mode. (Restricted mode and Bastion.py are actually still present in Python 2.6, but restricted mode has holes and is unsupported, and Bastion.py is disabled.)

Bastion provided wrapper objects that exposed only the public attributes of the object being wrapped (where non-public attribute names are those that begin with an underscore). This approach means we can define objects using class definitions as usual, and wrap the resulting objects. To make a class encapsulated and uninheritable, we just have to add a decorator:

@sealed
class Counter(object):

    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i
where "sealed" can be defined as follows:
# Minimal version of Bastion.BastionClass.
# Converts a function closure to an object.
class BastionClass:

    def __init__(self, get):
        self._get = get

    def __getattr__(self, attr):
        return self._get(attr)

# Minimal version of Bastion.Bastion.
def Bastion(object):
    # Create a function closure wrapping the object.
    def get(attr):
        if type(attr) is str and not attr.startswith("_"):
            return getattr(object, attr)
        raise AttributeError(attr)
    return BastionClass(get)

def sealed(klass):
    def constructor(*args, **kwargs):
        return Bastion(klass(*args, **kwargs))
    return constructor
>>> c = Counter()
>>> c.next()
1
>>> c.next()
2
>>> c._i
AttributeError: _i

Mutability problem

One problem with the Bastion approach is that Bastion's wrapper objects are mutable. Although you can't use the Bastion object to change the wrapped object's attributes, you can change the Bastion object's attributes. This means that these wrapper objects cannot be safely shared between mutually distrusting components. Multiple holders of the object could use it as a communications channel or violate each other's expectations.
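
For example, with the sealed Counter above:

>>> c = Counter()
>>> c.foo = "smuggled"  # mutates the wrapper, not the wrapped Counter
>>> c.foo
'smuggled'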

There isn't an obvious way to fix this in pure Python. Overriding __setattr__ isn't enough. I expect the simplest way to deal with this is to implement Bastion in a C extension module instead of in Python. The same goes for ImmutableNamespace (referred to in the first example) - the Cajita-style approach faces the same issue.
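
For example, even if BastionClass overrode __setattr__ to reject assignments, the wrapper's __dict__ would still be reachable and mutable from pure Python:

>>> c = Counter()
>>> c.__dict__["foo"] = "smuggled"  # bypasses __setattr__ entirely
>>> c.foo
'smuggled'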

Wrapping hazard

There is a potential hazard in defining encapsulated objects via wrappers that is not present in CapPython. In methods, "self" will be bound to the private view of the object. There is a risk of accidentally writing code that passes the unwrapped self object to untrusted code.

An example:

class LoggerMixin(object):

    def sub_log(self, prefix):
        return PrefixLog(self, prefix)

@sealed
class PrefixLog(LoggerMixin):

    def __init__(self, log, prefix):
        self._log = log
        self._prefix = prefix

    def log(self, message):
        self._log.log("%s: %s" % (self._prefix, message))
(This isn't a great example because of the circular relationship between the two classes.)

This hazard is something we could lint for. It would be easy to warn about cases where self is used outside of an attribute access.
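
Here is a crude sketch of such a lint using the ast module (a real tool would need to understand nested scopes and other binding forms):

import ast

def warn_self_escapes(source, filename="<string>"):
    tree = ast.parse(source, filename)
    # Collect the Name nodes that are the object of an attribute
    # access, i.e. the "self" in "self.foo".
    attr_bases = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and \
                isinstance(node.value, ast.Name):
            attr_bases.add(node.value)
    # Warn about any other read of "self".
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == "self" and \
                isinstance(node.ctx, ast.Load) and node not in attr_bases:
            print "%s:%d: self escapes" % (filename, node.lineno)

Run over the LoggerMixin example, this flags the self passed to PrefixLog() in sub_log() but not the uses of self._log and self._prefix.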

The method could be changed to:

    def sub_log(self, prefix):
        return PrefixLog(seal_object(self), prefix)
This creates a new wrapper object each time (seal_object() here playing the role of the Bastion() function above), which is not always desirable. To avoid this we could store the wrapper object in an attribute of the wrapped object. Note that this would create a reference cycle, and while CPython's cycle collector is usually fine for collecting cycles, creating a cycle for every object might not be a good idea. An alternative would be to memoize seal_object() using a weak dictionary.

Sunday, 14 June 2009

Python standard library in Native Client

The Python standard library now works under Native Client in the web browser, including the Sqlite extension module.

By that I mean that it is possible to import modules from the standard library, but a lot of system calls won't be available. Sqlite works for in-memory use.

Here's a screenshot of the browser-based Python REPL:

The changes needed to make this work were quite minor:

  • Implementing stat() in glibc so that it does an RPC to the Javascript running on the web page. Python uses stat() when searching for modules in sys.path.
  • Changing fstat() in NaCl so that it doesn't return a fixed inode number. Python uses st_ino to determine whether an extension module is the same as a previously-loaded module, which doesn't work if st_ino is the same for different files.
  • Some tweaks to nacl-glibc-gcc, a wrapper script for nacl-gcc.
  • Ensuring that NaCl-glibc's libpthread.so works.
I didn't have to modify Python or Sqlite at all. I didn't even have to work out what arguments to pass to Sqlite's configure script - the Debian package for Sqlite builds without modification. It's just a matter of setting PATH to override gcc to run nacl-glibc-gcc.

Thursday, 28 May 2009

A really simple tracing debugger: part 2

In my last post I showed how to get a readable execution trace of a process using a simple ptrace()-based tracer (itrace) and binutils' addr2line. By doing a bit more work we can extract the function call tree from the trace. In order to distinguish callers and callees, we can use the stack pointer (%esp), which itrace outputs.

Here's the call tree for a "hello world" program using glibc 2.9 on i386, statically linked:

$ ./itrace ./hellow-static | python treeify.py ./hellow-static
_start ??:0 8048130
  __libc_start_main glibc/csu/libc-start.c:93 8048220
    _dl_aux_init glibc/elf/dl-support.c:165 80518e0
    _dl_discover_osversion glibc/elf/../sysdeps/unix/sysv/linux/dl-sysdep.c:67 80521e0
      __uname ??:0 806e160
    __pthread_initialize_minimal glibc/csu/libc-tls.c:247 8048750
      __libc_setup_tls glibc/csu/libc-tls.c:111 8048550
        __sbrk glibc/misc/sbrk.c:34 80505d0
          __brk glibc/misc/../sysdeps/unix/sysv/linux/i386/brk.c:36 806f9c0
          __brk glibc/misc/../sysdeps/unix/sysv/linux/i386/brk.c:36 806f9c0
        memcpy glibc/string/memcpy.c:33 804fc10
      init_slotinfo glibc/csu/libc-tls.c:83 80486cb
        init_static_tls glibc/csu/libc-tls.c:100 8048708
  _dl_setup_stack_chk_guard glibc/csu/../sysdeps/unix/sysv/linux/dl-osinfo.h:76 8048283
    __libc_init_first glibc/csu/../sysdeps/unix/sysv/linux/init-first.c:44 80522f0
      __setfpucw glibc/math/../sysdeps/i386/setfpucw.c:29 8056270
      __libc_init_secure glibc/elf/enbl-secure.c:34 8052190
      _dl_non_dynamic_init glibc/elf/dl-support.c:235 8051b50
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        _dl_init_paths glibc/elf/dl-load.c:630 8072ae0
          _dl_important_hwcaps glibc/elf/dl-support.c:306 8051b20
          __libc_malloc glibc/malloc/malloc.c:3540 804eb60
            malloc_hook_ini glibc/malloc/hooks.c:35 804f360
              ptmalloc_init glibc/malloc/arena.c:426 804b370
                ptmalloc_init_minimal glibc/malloc/arena.c:374 804b39f
                  __getpagesize glibc/misc/../sysdeps/unix/sysv/linux/getpagesize.c:29 8050650
                  __linkin_atfork glibc/nptl/../nptl/sysdeps/unix/sysv/linux/register-atfork.c:115 80513a0
                next_env_entry glibc/malloc/arena.c:344 804b45d
            __libc_malloc glibc/malloc/malloc.c:3540 804eb60
              _int_malloc glibc/malloc/malloc.c:4130 804cae0
                malloc_consolidate glibc/malloc/malloc.c:4835 804afd0
                malloc_init_state glibc/malloc/malloc.c:2412 804a8e0
              sYSMALLOc glibc/malloc/malloc.c:2929 804d200
                __default_morecore glibc/malloc/morecore.c:48 804f9c0
                  __sbrk glibc/misc/sbrk.c:34 80505d0
                    __brk glibc/misc/../sysdeps/unix/sysv/linux/i386/brk.c:36 806f9c0
                __default_morecore glibc/malloc/morecore.c:48 804f9c0
                  __sbrk glibc/misc/sbrk.c:34 80505d0
                    __brk glibc/misc/../sysdeps/unix/sysv/linux/i386/brk.c:36 806f9c0
          __libc_malloc glibc/malloc/malloc.c:3540 804eb60
            _int_malloc glibc/malloc/malloc.c:4130 804cae0
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
      dl_platform_init glibc/elf/../sysdeps/i386/dl-machine.h:273 8051c38
        getenv glibc/stdlib/getenv.c:36 8056ea0
          strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
    __init_misc glibc/misc/init-misc.c:31 8051120
      rindex glibc/string/../sysdeps/i386/strrchr.S:37 8067a30
    __cxa_atexit glibc/stdlib/cxa_atexit.c:34 8048ab0
      __new_exitfn glibc/stdlib/cxa_atexit.c:63 8048960
    __libc_csu_init glibc/csu/elf-init.c:65 80487b0
      _init /build/buildd/glibc-2.7/build-tree/amd64-i386/csu/crti.S:15 80480f4
        _init ??:0 8048116
          frame_dummy crtstuff.c:0 80481b0
            __register_frame_info ??:0 80a2b50
              __register_frame_info_bases ??:0 80a2ab0
                __i686.get_pc_thunk.bx ??:0 80a29f9
          __do_global_ctors_aux crtstuff.c:0 80a4b10
        _init /build/buildd/glibc-2.7/build-tree/amd64-i386/csu/crtn.S:10 8048120
    _setjmp glibc/setjmp/../sysdeps/i386/bsd-_setjmp.S:36 8048830
    main ??:0 80481f0
      _IO_puts glibc/libio/ioputs.c:34 8048b10
        strlen glibc/string/../sysdeps/i386/i486/strlen.S:34 804f9f0
        _IO_new_file_xsputn glibc/libio/fileops.c:1288 8065870
          _IO_new_file_overflow glibc/libio/fileops.c:829 80661a0
            _IO_doallocbuf glibc/libio/genops.c:419 8049780
              _IO_file_doallocate glibc/libio/filedoalloc.c:88 808bd50
                _IO_file_stat glibc/libio/fileops.c:1225 8065b90
                  ___fxstat64 glibc/io/../sysdeps/unix/sysv/linux/fxstat64.c:46 804fe20
                mmap glibc/misc/../sysdeps/unix/sysv/linux/i386/mmap.S:81 8051080
                _IO_setb glibc/libio/genops.c:404 8049670
            _IO_new_do_write glibc/libio/fileops.c:493 8065a50
          _IO_default_xsputn glibc/libio/genops.c:452 8049850
      _IO_acquire_lock_fct glibc/libio/libioP.h:968 8048b8d
    exit glibc/stdlib/exit.c:34 8048870
      __libc_csu_fini glibc/csu/elf-init.c:91 8048770
      _fini /build/buildd/glibc-2.7/build-tree/amd64-i386/csu/crti.S:41 80a5574
        _fini ??:0 80a5587
          __do_global_dtors_aux crtstuff.c:0 8048160
            fini glibc/dlfcn/dlerror.c:207 80975e0
              check_free glibc/dlfcn/dlerror.c:188 8097560
            __deregister_frame_info ??:0 80a3c30
            __deregister_frame_info_bases ??:0 80a3b20
              __i686.get_pc_thunk.bx ??:0 80a29f9
        _fini /build/buildd/glibc-2.7/build-tree/amd64-i386/csu/crtn.S:21 80a558c
      _IO_cleanup glibc/libio/genops.c:1007 8049fd0
        _IO_flush_all_lockp glibc/libio/genops.c:823 8049db0
          _IO_new_file_overflow glibc/libio/fileops.c:829 80661a0
            _IO_new_do_write glibc/libio/fileops.c:493 8065a50
              new_do_write glibc/libio/fileops.c:505 8065760
                _IO_new_file_write glibc/libio/fileops.c:1261 8065a80
                  ?? ??:0 8
                __libc_write ??:0 804ffa0
                  __write_nocancel ??:0 804ffaa
      _IO_unbuffer_write glibc/libio/genops.c:951 8049fe8
        _IO_new_file_setbuf glibc/libio/fileops.c:445 8066380
          _IO_default_setbuf glibc/libio/genops.c:562 80496d0
            _IO_new_file_sync glibc/libio/fileops.c:891 80660d0
            _IO_setb glibc/libio/genops.c:404 8049670
      _exit glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:25 804fdc0
Believe it or not, this trace is quite illuminating. Working out that these calls will happen by reading the glibc source could take a long time!

The Python script below (treeify.py) does two things:

  • Firstly, it filters the trace to include only function entry points. We can just about infer function entry points from addr2line's output. We assume that when a function name (plus filename and line number) appears for the first time in the trace, the current address is the function's entry point. This doesn't work fully when inline functions are instantiated multiple times though. We could use nm to find symbol addresses, but addr2line gives us source filenames.
  • Secondly, it works out the nesting of the call tree by looking at the stack pointer.
import subprocess
import sys

def read():
    for line in sys.stdin:
        try:
            regs = [int(x, 16) for x in line.split(" ")]
            yield {"eip": regs[0], "esp": regs[1]}
        # Ignore lines interspersed with other output!
        except (ValueError, IndexError):
            pass

def addr2line(iterable):
    proc = subprocess.Popen(["addr2line", "-e", sys.argv[1], "-f"],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for regs in iterable:
        proc.stdin.write("%x\n" % regs["eip"])
        a = proc.stdout.readline().rstrip("\n")
        b = proc.stdout.readline().rstrip("\n")
        regs["func"] = "%s %s" % (a, b)
        yield regs

def entry_points(iterable):
    funcs = {}
    # We treat the first address we see for the function as its entry
    # point, and only report those entries from this point on.
    for regs in iterable:
        func = regs["func"].split(":")[0]
        if funcs.setdefault(func, regs["eip"]) == regs["eip"]:
            yield regs

def add_nesting(iterable):
    # Sentinel value larger than any real stack pointer.
    stack = [2 ** 64]
    for regs in iterable:
        stack_pos = regs["esp"]
        # The stack grows downwards, so a smaller %esp means we have
        # entered a deeper call, and a larger %esp means we have
        # returned to a caller.
        if stack_pos < stack[-1]:
            stack.append(stack_pos)
        while stack_pos > stack[-1]:
            stack.pop()
        regs["indent"] = "  " * len(stack)
        yield regs

for x in add_nesting(entry_points(addr2line(read()))):
    print x["indent"], x["func"], "%x" % x["eip"]

Saturday, 16 May 2009

A really simple tracing debugger

gdb can be really useful for debugging when it prints the information you want, but sometimes it will fail to give a useful backtrace, and it can be hard to find out what happened just before your process crashed.

I faced that problem when porting glibc to Native Client. gdb (or strace -i) could tell me the value of the instruction pointer when the process crashed, but it couldn't produce a backtrace because it didn't know how to read memory with NaCl's x86 segmentation setup. (I should point out that gdb has since been ported.) Often I could look up the address with addr2line or objdump -d to find the cause of the problem. However, that didn't help when the process died by jumping to address 0, or when it jumped to _start when running atexit handlers.

I found another way to do debugging, using a trace of the program's execution.

So, here's how to create a debugger with a single shell pipeline (plus a small C program) that works on statically-linked executables.

The first part is a tool called itrace, which starts a subprocess and single-steps its execution using Linux's ptrace() system call and prints the process's eip and esp registers (instruction pointer and stack pointer) at every step. (The code is below.)

Here's what we get if we run itrace on a simple statically-linked executable:

$ ./itrace ./hellow-static | less
8048130 ffce4a50
8048132 ffce4a50
8048133 ffce4a54
8048135 ffce4a54
8048138 ffce4a50
...
To make this intelligible, all we have to do is pipe the output of itrace through addr2line. This tool is part of binutils and translates addresses to source code filenames and line numbers using the debugging information in the executable. Also, pipe through uniq to remove duplicate lines:
$ ./itrace ./hellow-static | addr2line -e ./hellow-static | uniq | less
??:0
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:93
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:106
/work/glibc/csu/libc-start.c:103
/work/glibc/csu/libc-start.c:106
...
/work/glibc/stdlib/exit.c:88
/work/glibc/stdlib/exit.c:90
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:25
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:29
/work/glibc/posix/../sysdeps/unix/sysv/linux/i386/_exit.S:30
addr2line prints "??:0" for addresses that it can't find in debugging information. If your executable hasn't been built with debugging information - the static libc.a in Debian/Ubuntu isn't built with debugging enabled - you can pass the -f option to addr2line to print function names as well, and then filter out the useless "??:0" lines:
$ cat >hellow.c <<END
#include <stdio.h>
int main() { printf("Hello, world!\n"); return 0; }
END
$ gcc -static hellow.c -o hellow-static
$ ./itrace ./hellow-static | addr2line -e ./hellow-static -f | grep -v "??:0" | uniq | less
_start
__libc_start_main
_dl_aux_init
__libc_start_main
_dl_discover_osversion
__uname
_dl_discover_osversion
__libc_start_main
__pthread_initialize_minimal
__libc_setup_tls
...
_IO_default_setbuf
_IO_new_file_setbuf
_IO_unbuffer_write
_IO_cleanup
exit
_exit
This tells us what happened before the program exited. In this case, it was executing exit() and flushing stdio buffers.

Tracing debugging can often be more helpful than postmortem debugging. Debugging with printf is still popular, after all! However, single-stepping using ptrace() has quite a high overhead, so this is not really suitable for longer-running processes. Potentially Valgrind could give us an execution trace more efficiently using its dynamic translation approach, but that would still be quite slow. Modern x86 CPUs have a "Branch Trace Store" feature for recording the addresses of basic blocks as they are executed, which has the potential to be a faster method, and there is a proposal to make this available in Linux through the ptrace() interface.

itrace.c

This assumes i386, so if you're on an x86-64 system you can build it with:
gcc -m32 itrace.c -o itrace
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

#include <linux/user.h>
#include <sys/ptrace.h>

int main(int argc, char **argv)
{
  int pid = fork();
  if(pid == 0) {
    if(ptrace(PTRACE_TRACEME) < 0) {
      perror("ptrace");
      _exit(1);
    }
    execvp(argv[1], argv + 1);
    perror("exec");
    _exit(1);
  }
  while(1) {
    int status;
    struct user_regs_struct regs;
    if(waitpid(pid, &status, 0) < 0)
      perror("waitpid");
    if(!WIFSTOPPED(status))
      break;
    if(ptrace(PTRACE_GETREGS, pid, 0, &regs) < 0)
      perror("ptrace/GETREGS");
    printf("%lx %lx\n", regs.eip, regs.esp);
    if(ptrace(PTRACE_SINGLESTEP, pid, 0, 0) < 0)
      perror("ptrace/SINGLESTEP");
  }
  return 0;
}

Monday, 4 May 2009

Progress on Native Client

Back in January I wrote that I was porting glibc to Native Client. I have made some good progress since then.

The port is at the stage where it can run simple statically-linked and dynamically-linked executables, from the command line and from the web browser.

In particular, Python works. I have put together a simple interactive top-level that demonstrates running Python from the web browser:

Upstream NaCl doesn't support any filename-based calls such as open(), but we do. In this setup, open() does not of course access the real filesystem on the host system. open() in NaCl-glibc sends a message to the Javascript code on the web page. The Javascript code can fetch the requested file, create a file descriptor for the file using the plugin's API, and send the file descriptor to the NaCl subprocess in reply.

This has not involved modifying Python at all, although I have added an extension module to wrap a couple of NaCl-specific system calls (imc_sendmsg() and imc_recvmsg()). The Python build runs to completion, including the parts where it runs the executable it builds. A simple instruction-rewriting trick means that dynamically-linked NaCl executables can run outside of NaCl, directly under Linux, linked against the regular Linux glibc. This means we can avoid some of the problems associated with cross-compilation when building for NaCl.

This work has involved extending Native Client:

  • Adding support for dynamic loading of code. Initially I have focused on just making it work. Now I'll focus on ensuring this is secure.
  • Writing a custom browser plugin for NaCl which allows Javascript code to handle asynchronous callbacks from NaCl subprocesses.
  • Making various changes to the NaCl versions of gcc and binutils to support dynamically linked code.
Hopefully these changes can get merged upstream eventually. Some of the toolchain changes have already gone in.

See the web page for instructions on how to get the code and try it out.

Saturday, 18 April 2009

Python variable binding semantics, part 2

Time for me to follow up my previous blog post and explain the code snippets.

Snippet 1

funcs = []
for x in (1, 2, 3):
    funcs.append(lambda: x)
for f in funcs:
    print f()
Although you might expect this Python code snippet to print "1 2 3", it actually prints "3 3 3". The reason for this is that the same mutable variable binding for x is used across all iterations of the loop. The for loop does not create new variable bindings.

The function closures that are created inside the loop do not capture the value of x; they capture the cell object that x is bound to under the hood (or the globals dictionary if this code snippet is run at the top level), and each iteration mutates the same cell object.
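
You can see the shared cell directly through a function's func_closure attribute, provided the loop runs inside a function so that x is captured as a cell rather than a global:

def make_funcs():
    funcs = []
    for x in (1, 2, 3):
        funcs.append(lambda: x)
    return funcs

funcs = make_funcs()
# All three closures share a single cell, whose final value is 3.
print funcs[0].func_closure[0] is funcs[1].func_closure[0]  # True
print [f.func_closure[0].cell_contents for f in funcs]      # [3, 3, 3]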

Interestingly, Python does not restrict the target of the for loop to be a variable or tuple of variables. It can be any lvalue expression that can appear on the left hand side of an assignment, including indexing expressions. For example:

x = [10, 20]
for x[0] in (1,2,3):
    pass
print x # Prints [3, 20]

Snippet 2

This problem with Python's variable binding semantics is not specific to lambdas; it also occurs if you use def.
funcs = []
for x in (1, 2, 3):
    def myfunc():
        return x
    funcs.append(myfunc)
for f in funcs:
    print f()
This also prints "3 3 3".

Snippet 3

The remaining snippets are examples of ways to work around this problem. They print the desired "1 2 3".

One way is to use default arguments:

funcs = []
for x in (1, 2, 3):
    funcs.append(lambda x=x: x)
for f in funcs:
    print f()
This is actually the trick that was used to get the effect of lexical scoping before lexical scoping was added to Python.

The default argument captures the value of x at the point where the function closure is created, and this value gets rebound to x when the function is called.

Although it is concise, I'm not too keen on this workaround. It is an abuse of default arguments. There is a risk that you accidentally pass too many arguments to the function and thereby override the captured value of x.

Snippet 4

funcs = []
for x in (1, 2, 3):
    def g(x):
        funcs.append(lambda: x)
    g(x)
for f in funcs:
    print f()
This is quite an ugly workaround, but it's perhaps the most general one. The loop body is wrapped in a function. The x inside function g and the x outside are statically different variable bindings. A new mutable binding of x is created on every call to g, but this binding is never modified.

Snippets 5 and 6

The remaining two snippets are attempts to make the code less ugly, by giving names to the intermediate functions that are introduced to rebind x and thereby give the desired behaviour.
funcs = []
def append_a_func(x):
    funcs.append(lambda: x)
for x in (1, 2, 3):
    append_a_func(x)
for f in funcs:
    print f()

def make_func(x):
    return lambda: x
funcs = []
for x in (1, 2, 3):
    funcs.append(make_func(x))
for f in funcs:
    print f()
I suppose there is a downside to de-uglifying the code. If the code looks too normal, there's a risk that someone will refactor it to the incorrect version without testing that it works properly when looping over lists containing multiple items.

Can the language be fixed?

The root cause is that Python does away with explicit variable binding declarations. If a variable is assigned within a function -- including being assigned as the target of a for loop -- it becomes local to the function, but variables are never local to smaller statement blocks such as a loop body. Python doesn't have block scoping.

There are a couple of naive ways in which you might try to fix the language so that the code does what you would expect, both of which have problems.

We could change the semantics of function closure creation so that closures take a snapshot of variables' values. But this would break mutually recursive function definitions, and cases where functions are referred to before they are defined, such as:
def f(): # This "def" would raise an UnboundLocalError or a NameError
    g()
def g():
    pass
We could change the semantics of for loops so that the target variable is given a variable binding that is local to the loop body. But this wouldn't change the status of variable bindings created by assignment, so this code would still have the unwanted behaviour:
for x in (1, 2, 3):
    y = x
    funcs.append(lambda: y)

Linting

It should be possible to create a lint tool to detect this hazard. It could look for closures that capture variables that are assigned by or in loops. I wonder how many false positives it would give on a typical codebase.
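
Here's a naive sketch of such a tool using the ast module. It only handles for loops with a single-variable target, and it would give false positives - for example it would flag the default-argument workaround from snippet 3, because it ignores rebinding:

import ast

def warn_loop_captures(source, filename="<string>"):
    tree = ast.parse(source, filename)
    for loop in ast.walk(tree):
        # Only handle "for <variable> in ..." loops.
        if not (isinstance(loop, ast.For) and
                isinstance(loop.target, ast.Name)):
            continue
        var = loop.target.id
        for func in ast.walk(loop):
            if not isinstance(func, (ast.Lambda, ast.FunctionDef)):
                continue
            for node in ast.walk(func):
                if isinstance(node, ast.Name) and node.id == var and \
                        isinstance(node.ctx, ast.Load):
                    print "%s:%d: closure may capture loop variable %r" % \
                        (filename, func.lineno, var)
                    break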

Javascript

Javascript suffers from the same problem:
funcs = [];
for(var x = 0; x < 3; x++) {
    funcs.push(function() { return x; })
}
for(var i = 0; i < funcs.length; i++) {
    print(funcs[i]());
}
This code prints "3 3 3" when run using rhino.

In Javascript's case the affliction is entirely gratuitous. It happens because var declarations are implicitly hoisted up to the top of the function, for no apparent reason - this is not the result of any trade-off.

The good news, though, is that the problem is recognised and a fix to Javascript is planned. I think the plan is for Ecmascript 3.1 to introduce a let declaration as an alternative to var without the hoisting behaviour.

List comprehensions

Back to Python. The same problem also applies to list comprehensions and generator expressions.
funcs = [(lambda: x) for x in (1, 2, 3)]
for f in funcs:
    print f()

funcs = list((lambda: x) for x in (1, 2, 3))
for f in funcs:
    print f()

These both print "3 3 3".

(Note that I added brackets around the lambdas for clarity but the syntax does not require them.)

This is forgivable for list comprehensions, at least in Python 2.x, because the assignment to x escapes into the surrounding function. (See my earlier post.)

But for generators (and for list comprehensions in Python 3.0), the scope of x is limited to the comprehension. Semantically, it would be easy to limit the scope of x to within a loop iteration, so that each iteration introduces a new variable binding.