Tuesday 23 August 2011

Fixing the trouble with Buildbot

Last year I wrote a blog post, "The trouble with Buildbot", about how Buildbot creates a dilemma for complex projects because it forces you to choose between two ways of describing a project's build steps:
  • You can describe build steps in the Buildbot config. Buildbot configs are awkward to update -- someone has to restart the Buildbot master -- and hard to test, but you get the benefit that the build steps appear as separate steps in Buildbot's display.
  • You can write a script which runs the build steps directly, and check it into the same repository as your project. This is easier to maintain and test, but traditionally all the output from the script would appear as a single Buildbot build step, making the output hard to read.

Fortunately, Brad Nelson has addressed this problem with an extension to Buildbot known as "Buildbot Annotations". The Python code for this currently lives in chromium_step.py (see AnnotatedCommand).

The idea is that your checked-in script runs multiple steps sequentially but prints tags between them (e.g. "@@@BUILD_STEP tests@@@"), so that the Buildbot master can parse the output into chunks and display each chunk as a separate build step.
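To make the idea concrete, here is a minimal sketch of how a master-side parser might split a script's output at these tags. It is not the AnnotatedCommand code from chromium_step.py. The function name split_into_steps, the "setup" placeholder step and the exact-match regular expression are assumptions made for illustration. The sketch also handles the @@@STEP_FAILURE@@@ tag that appears in the script excerpt below.

```python
import re

# Assumed tag formats, taken from the examples in this post (not from chromium_step.py).
BUILD_STEP_RE = re.compile(r"^@@@BUILD_STEP (.*)@@@$")
STEP_FAILURE = "@@@STEP_FAILURE@@@"


def split_into_steps(lines):
    """Group output lines into (step_name, output_lines, failed) tuples.

    Anything printed before the first BUILD_STEP tag goes into a
    placeholder step called "setup" (an arbitrary name for this sketch).
    """
    steps = [["setup", [], False]]
    for raw in lines:
        line = raw.rstrip("\n")
        match = BUILD_STEP_RE.match(line.strip())
        if match:
            # A new tag closes the current step and opens the next one.
            steps.append([match.group(1), [], False])
        elif line.strip() == STEP_FAILURE:
            # The failure tag applies to the most recently opened step.
            steps[-1][2] = True
        else:
            steps[-1][1].append(line)
    # Drop the placeholder if nothing was printed before the first tag.
    first = steps[0]
    if len(steps) > 1 and not first[1] and not first[2]:
        steps.pop(0)
    return [tuple(step) for step in steps]


if __name__ == "__main__":
    # Made-up sample output, shaped like the NaCl script's.
    sample = [
        "@@@BUILD_STEP gyp_compile@@@",
        "(compiler output)",
        "@@@BUILD_STEP small_tests64@@@",
        "(test output)",
        "@@@STEP_FAILURE@@@",
    ]
    for name, output, failed in split_into_steps(sample):
        print(name, "FAILED" if failed else "ok", output)
    # prints:
    # gyp_compile ok ['(compiler output)']
    # small_tests64 FAILED ['(test output)']
```

Because every line is attributed to the most recently opened step, a flat tag stream like this has no way to express nested or interleaved steps.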

For example, an early version of Native Client's Annotations-based buildbot script looked something like this:

...
echo @@@BUILD_STEP gyp_compile@@@
make -C .. -k -j12 V=1 BUILDTYPE=${GYPMODE}

echo @@@BUILD_STEP scons_compile${BITS}@@@
./scons -j 8 -k --verbose ${GLIBCOPTS} --mode=${MODE}-host,nacl \
    platform=x86-${BITS}

echo @@@BUILD_STEP small_tests${BITS}@@@
./scons -k --verbose ${GLIBCOPTS} --mode=${MODE}-host,nacl small_tests \
    platform=x86-${BITS} ||
    { RETCODE=$? && echo @@@STEP_FAILURE@@@;}
...

(More recently, this shell script has been replaced with a Python script.)
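The Python script itself isn't shown here, but the same pattern carries over directly. The following is a minimal sketch, not NaCl's actual script. run_step and run_steps are hypothetical helper names, and in practice the commands passed to them would be the make and scons invocations from the shell version.

```python
import subprocess
import sys


def run_step(name, cmd):
    """Emit a BUILD_STEP tag, run cmd, and emit STEP_FAILURE if it fails.

    Returns the command's exit status.
    """
    print("@@@BUILD_STEP %s@@@" % name)
    # Flush so the tag is written before the child process's own output.
    sys.stdout.flush()
    status = subprocess.call(cmd)
    if status != 0:
        print("@@@STEP_FAILURE@@@")
        sys.stdout.flush()
    return status


def run_steps(steps):
    """Run (name, cmd) pairs in order.

    Like the shell version's `|| { RETCODE=$? ... }`, keep going after a
    failing step but remember its exit status, and return it at the end.
    """
    retcode = 0
    for name, cmd in steps:
        status = run_step(name, cmd)
        if status != 0:
            retcode = status
    return retcode
```

A script built this way would typically finish with something like `sys.exit(run_steps(steps))`, so that the overall failure is also reflected in the script's exit status.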

You can see this in use on the Native Client Buildbot page (and also on the trybot page, though that's less readable).

The logic for running NaCl's many build steps -- including a clobber step, a Scons build, a Gyp build, small_tests, medium_tests, large_tests, chrome_browser_tests etc. -- used to live in the Buildbot config, and we'd usually have to get Brad Nelson to change it on our behalf. Brad would have to restart the buildbot masters manually, and this would halt any builds that were in progress, including trybot jobs.

Now the knowledge of these build steps has moved into scripts that are checked into the native_client repo, which can easily be updated. We can change the scripts at the same time as changing other code, with an atomic SVN commit. Changes can be tested via the trybots.

Chromium is not using Buildbot Annotations yet for its buildbot, but it would be good to switch it over. One obstacle is timeout handling. Buildbot's build steps can have separate timeouts, and the Buildbot build slave is responsible for terminating a build step's subprocess(es) if they take too long. With Buildbot Annotations, the responsibility for doing per-step timeouts would move to the checked-in build script.
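One way a checked-in script could take on per-step timeouts is sketched below. This is a POSIX-only Python 3 sketch under stated assumptions: run_step_with_timeout is a hypothetical helper, and the choice of SIGKILL and of killing the step's whole process group is ours, not necessarily what the Buildbot slave does.

```python
import os
import signal
import subprocess


def run_step_with_timeout(name, cmd, timeout_secs):
    """Run one annotated build step, killing it if it exceeds timeout_secs.

    The child is started in its own session (so its process group id equals
    its pid). On timeout the whole group is killed, e.g. make and the
    compilers it spawned, not just the immediate child.
    """
    print("@@@BUILD_STEP %s@@@" % name, flush=True)
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        status = proc.wait(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited between the timeout and the kill
        status = proc.wait()  # reap the child; status is -SIGKILL on POSIX
        print("step %r timed out after %s seconds" % (name, timeout_secs), flush=True)
    if status != 0:
        print("@@@STEP_FAILURE@@@", flush=True)
    return status
```

Starting the step in a new session matters because a step like `make -j12` spawns many grandchildren. Killing only the immediate child, which is what `subprocess.run(..., timeout=...)` does, could leave them running.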

The current Annotations output format has some downsides:

  • The syntax is simple but kind of ugly.
  • It's not possible to nest build steps.
  • It's not possible to interleave output from two concurrent build steps.

Overall, Annotations reduces our dependence on Buildbot. If there were a simpler, more scalable alternative to Buildbot that also supported the Annotations format, we could easily switch to it, because our Buildbot config is not as complex as it used to be.