Monday, 31 August 2009

How to do model checking of Python code

I read about eXplode in early 2007. The paper describes a neat technique, inspired by model checking, for exhaustively exploring all execution paths of a program. The difference from model checking is that it doesn't require a model of the program to be written in a special language; it can operate on a program written in a normal language. The authors of the paper use the technique to check the correctness of Linux filesystems, and version control systems layered on top of them, in the face of crashes.

The basic idea is to have a special function, choose(), that performs a non-deterministic choice, e.g.

   choose([1, 2, 3])

Conceptually, this forks execution into three branches, returning 1, 2 or 3 in each branch.

The program can call choose() multiple times, so you end up with a choice tree.

The job of the eXplode check runner is to explore that choice tree, and to try to find sequences of choices that make the program fail.

eXplode does not try to snapshot the state of the program when choose() is called. (That's what other model checkers have tried to do, and it's what would happen if choose() were implemented using the Unix fork() call.)

Instead, eXplode restores the state of the program by running it from the start and replaying a previously-recorded sequence of choices by returning them one-by-one from choose(). A node in the choice tree is represented by the sequence of choices required to get to it, and so to explore that node's subtree the check runner will repeatedly run the program from the start, replaying the choices for that subtree. The check runner does a breadth-first traversal of the choice tree, so it has a queue of nodes to visit, and each visit will add zero or more subnodes to the queue. (See the code below for a concrete implementation.)

This replay technique means that the program must be deterministic, otherwise the choice tree won't get explored properly. All non-determinism must occur through choose().

There seem to be two situations in which you might call choose():

  1. You can call choose() from your test code to generate input.

    In eXplode, the test program writes files and tries to read them back, and it uses choose() to pick what kind of writes it does.

    So far I have used choose() in tests to generate input in two different ways:

    In one case, I started with valid input and mangled it in different ways, to check the error handling of the software-under-test. The test code used choose() to pick a file to delete or line to remove. The idea was that the software-under-test should give a nicely-formatted error message, specifying the faulty file, rather than a Python traceback. There was only a bounded number of faults to introduce so the choice tree was finite.

    In another case, I built up XML trees to check that the software under test could process them. In this case the choice tree is infinite. I decided to prune it arbitrarily at a certain depth so that the test could run as part of a fairly quickly-running test suite.

    In both these cases, using the eXplode technique wasn't strictly necessary. It would have been possible to write the tests without choose(), perhaps by copying the inputs before modifying them. But I think choose() makes the test more concise, and not all objects have interfaces for making copies.

  2. You can also call choose() from the software-under-test to introduce deliberate faults, or from a fake object that the software-under-test is instantiated with.

    In eXplode, the filesystem is tested with a fake block device which tries to simulate all the possible failures modes of a real hard disc (note that Flash drives have different failure modes!). The fake device uses choose() to decide when to simulate a crash and restart, and which blocks should be successfully written to the virtual device when the crash happens.

    You can also wrap malloc() so that it non-deterministically returns NULL in order to test that software handles out-of-memory conditions gracefully. Testing error paths is difficult to do and this looks like a good way of doing it.

    In these cases, the calls to choose() can be buried deeply in the call stack, and the eXplode check runner is performing an inversion of control that would be hard to do in a conventional test case.

Here's an example of how to implement the eXplode technique in Python:


import unittest


# If anyone catches and handles this, it will break the checking model.
class ModelCheckEscape(Exception):

    pass


class Chooser(object):

    def __init__(self, chosen, queue):
        self._so_far = chosen
        self._index = 0
        self._queue = queue

    def choose(self, choices):
        if self._index < len(self._so_far):
            choice = self._so_far[self._index]
            if choice not in choices:
                raise Exception("Program is not deterministic")
            self._index += 1
            return choice
        else:
            for choice in choices:
                self._queue.append(self._so_far + [choice])
            raise ModelCheckEscape()


def check(func):
    queue = [[]]
    while len(queue) > 0:
        chosen = queue.pop(0)
        try:
            func(Chooser(chosen, queue))
        except ModelCheckEscape:
            pass
        # Can catch other exceptions here and report the failure
        # - along with the choices that caused it - and then carry on.


class ModelCheckTest(unittest.TestCase):

    def test(self):
        got = []

        def func(chooser):
            # Example of how to generate a Cartesian product using choose():
            v1 = chooser.choose(["a", "b", "c"])
            v2 = chooser.choose(["x", "y", "z"])
            # We don't normally expect the function being tested to
            # capture state - in this case the "got" list - but we
            # do it here for testing purposes.
            got.append(v1 + v2)

        check(func)
        self.assertEquals(sorted(got),
                          sorted(["ax", "ay", "az",
                                  "bx", "by", "bz",
                                  "cx", "cy", "cz"]))


if __name__ == "__main__":
    unittest.main()

5 comments:

Anonymous said...

Interesting article. Thanks.

Could you take a look at your code, though? Note that longer lines are cut off; they don't fully display.

Mark Seaborn said...

Good point. I have changed the theme to one that is more code friendly.

Florian Bauer said...

Very interesting approach. I would really like to use this for our company-internal test suite. Is this ok with you? What license is your code under?

Mark Seaborn said...

Yes, you can use this code. Any source code that I post inline in a blog post can be considered public domain. Attribution, e.g. a link to the blog post, is encouraged but not required!

Mark Seaborn said...

There's a usability bug in the code above. If you pass an empty list to choose(), it silently prunes the choice tree.

It should contain an assertion to guard against that:

assert len(choices) > 0

I discovered this because I'm using this code to model a multi-threading construct in order to check for deadlocks (specifically, NaCl's thread suspension logic). If we reach a point where no threads are runnable but not all have exited, there's a deadlock and we print an error. However, in one case I was passing a list of runnable thread IDs to choose(). This list could be empty, so this deadlock scenario was being ignored.