Monday 7 December 2009

Modular vs. monolithic build systems

I have spent some time packaging software as Debian packages. While the Debian packaging system has its faults, it has the nice property that it is modular. This post is an attempt to articulate what aspects of Debian packaging -- and other modular build systems -- are worth replicating, and why it is worth co-operating with these systems rather than ignoring them or working around them.

What is a modular build?

  1. A modular build consists of a set of modules,
  2. each of which can be built separately.
  3. Each module's build produces some output (a directory tree).
  4. A module may depend on the outputs of other modules, but it can't reach inside the others' build trees.
  5. There is a common interface that each module provides for building itself.
  6. The build tool can be replaced with another. The description of the module set is separate from the modules themselves.
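
The six properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Module` class, `build_order`, `build_all` and the `build_one` callback are hypothetical names, not part of any of the tools discussed in this post. It needs Python 3.9 or later for `graphlib`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one module, kept outside the module itself."""
    name: str
    depends: tuple = ()


def build_order(module_set):
    """Return module names in an order where dependencies come first."""
    graph = {m.name: set(m.depends) for m in module_set}
    return list(TopologicalSorter(graph).static_order())


def build_all(module_set, build_one):
    """Build each module separately through one common interface.

    build_one(name, dep_outputs) is given only the output trees of the
    module's declared dependencies, never their build trees.
    """
    by_name = {m.name: m for m in module_set}
    outputs = {}
    for name in build_order(module_set):
        dep_outputs = {d: outputs[d] for d in by_name[name].depends}
        outputs[name] = build_one(name, dep_outputs)
    return outputs
```

Because `build_one` is passed in, the build tool can be swapped without touching the module descriptions, which is roughly what property 6 asks for. The sketch assumes every dependency is itself listed in the module set.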

What is a non-modular, monolithic build?

  1. A monolithic build consists of one big build tree.
  2. Any part of the build can reference any other part via relative filenames.
  3. It might consist of multiple checkouts from version control, but they have to be checked out to specific directory tree locations (as in the Chromium build).

Some examples of modular builds:

  • Build systems/tools:
    • Debian packages (and presumably RPMs too)
    • JHBuild
    • Zero-Install
    • Nix
  • Module interfaces:
    • GNU autotools (./configure && make && make install)
    • Python distutils (setup.py)
  • Software collections:
    • GNOME
    • Xorg (7.0 onwards)
    • Sugar

Examples of monolithic builds:

  • XFree86 (and Xorg 6.9): Before Xorg was modularised, there was a big makefile that built everything, from Xlib to the X server to example X clients.
  • Chromium web browser: This uses a tool called "gyp" to generate a big makefile which compiles individual source files from several libraries, including WebKit, V8 and the Native Client IPC library. It ignores WebKit's own build system.
  • Native Client: One SCons build builds the core code as well as the NPAPI browser plugin and example code; it needs to know how to cross-compile NaCl code as well as compile host system code. Another makefile builds the compiler toolchain from tarballs and patch files that are checked into SVN.
  • CPython: The interpreter's own build also builds many of the standard library's C extension modules, in the same tree.

Modular build systems offer a number of advantages:

  • You can download and build only the parts you need. This can be a big help if some modules are huge but seldom change while the modules you work on are small and fast to build.
  • Some systems (such as Debian packages) give you binary packages so you don't need to build the dependencies of the modules that you want to work on. JHBuild doesn't provide this but it could be achieved with a little work.
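  • Dependencies are clearer, because each module has to declare which other modules' outputs it uses.
  • External interfaces are clearer too, because a module can only use what another module installs, not what sits in its build tree.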
  • It is possible to change one module's version independently of other modules (to the extent that differing versions are compatible).
  • They are relatively easy to use in a decentralised way. It is easy to create a new version of a module set which adds or removes modules.
  • You don't have to check huge dependencies into your version control system. Some projects check in monster tarballs or source trees, which dwarf the project's own code. If you avoid this practice you will make it easier for distributions to package your software.
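
Two of the advantages above, building only the parts you need and deriving a variant of a module set, can be sketched as follows. The dictionary layout (module name mapped to a list of dependency names) and the function names are assumptions made for this example, not the format of JHBuild or any other tool.

```python
def needed_modules(depends, targets):
    """Return the targets plus everything they depend on, transitively."""
    needed = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(depends.get(name, ()))
    return needed


def derive_module_set(depends, add=None, remove=()):
    """Return a new module set with modules added or removed.

    The original mapping is left untouched, so several variants of a
    module set can exist side by side.
    """
    new = {name: deps for name, deps in depends.items() if name not in remove}
    new.update(add or {})
    return new
```

For example, with a module set in which `app` depends on `libfoo` and `libfoo` depends on `libc`, `needed_modules(depends, ["app"])` returns only those three names, however many unrelated modules the set contains.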

The two categories can coexist: Each module may internally be a monolithic build which can be arbitrarily complex. Autotools is an example of that. This is not too bad because at least we have contained the complexity within the module. The layer on top, which connects modules together, can be relatively simple.

Despite its faults, autotools is very amenable to being part of a modular build:

  • The build tree does not need to be kept around after doing "make install".
  • Output can be directed using "--prefix=foo" and "make install DESTDIR=foo".
  • Inputs can be specified via --prefix and PATH and other environment variables.
  • The build tree can be separate from the source tree. It's easy to have multiple build trees with different build options.
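
As a rough sketch, a build tool could drive an autotools module using exactly these properties. The function names are hypothetical, and the choice of `PKG_CONFIG_PATH` as one of the "other environment variables" is my assumption; a real module may need different ones.

```python
import os
import subprocess


def autotools_commands(source_dir, prefix, destdir=None):
    """Return the configure/make/install commands for one module.

    The commands are meant to be run from a separate build directory,
    so configure is referenced by absolute path.
    """
    configure = os.path.join(os.path.abspath(source_dir), "configure")
    install = ["make", "install"]
    if destdir is not None:
        install.append("DESTDIR=" + destdir)
    return [[configure, "--prefix=" + prefix], ["make"], install]


def build_environment(dep_prefixes, base_env=None):
    """Expose the installed outputs of dependencies via environment variables."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dirs = [os.path.join(p, "bin") for p in dep_prefixes]
    pc_dirs = [os.path.join(p, "lib", "pkgconfig") for p in dep_prefixes]
    for var, dirs in (("PATH", bin_dirs), ("PKG_CONFIG_PATH", pc_dirs)):
        existing = [env[var]] if env.get(var) else []
        env[var] = os.pathsep.join(dirs + existing)
    return env


def build_module(source_dir, build_dir, prefix, destdir=None, dep_prefixes=()):
    """Run the build in its own build tree, which can be deleted afterwards."""
    os.makedirs(build_dir, exist_ok=True)
    env = build_environment(dep_prefixes)
    for command in autotools_commands(source_dir, prefix, destdir):
        subprocess.run(command, cwd=build_dir, env=env, check=True)
```

Calling `build_module` twice with different `build_dir` values gives two independent build trees from the same source tree.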

The systems I listed as modular all have their own problems. The main problem with Debian packages is that they are installed system-wide, which requires root access and makes it difficult to install multiple versions of a package. It is possible to work around this problem using chroots. JHBuild, Zero-Install and Nix avoid this problem. JHBuild and Zero-Install are not so good at capturing immutable snapshots of package sets. Nix is good at capturing snapshots, but Nix makes it difficult to change a library without rebuilding everything that uses it.
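
The trade-off with snapshots can be illustrated with a toy model in which each module's identity is a hash of its own source and of the identities of its dependencies. This is a deliberately simplified sketch, not Nix's actual hashing scheme; the function names and the string stand-ins for source trees are assumptions, and the dependency graph is assumed to have no cycles.

```python
import hashlib


def build_keys(sources, depends):
    """Map each module to a key derived from its source and its deps' keys."""
    keys = {}

    def key_for(name):
        if name not in keys:
            h = hashlib.sha256()
            h.update(name.encode())
            h.update(b"\0")
            h.update(sources[name].encode())
            for dep in sorted(depends.get(name, ())):
                h.update(b"\0")
                h.update(key_for(dep).encode())
            keys[name] = h.hexdigest()
        return keys[name]

    for name in sources:
        key_for(name)
    return keys


def needs_rebuild(old_keys, new_keys):
    """Return the modules whose key differs between two snapshots."""
    return {name for name, key in new_keys.items() if old_keys.get(name) != key}
```

In this model a set of keys is an immutable snapshot of the whole package set, but changing the source of one low-level library changes the key of every module that depends on it, directly or indirectly, so all of them show up in `needs_rebuild`.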

Despite these problems, these systems have a nice property: they are layered. It is possible to mix and match modules and replace the build layer. Hence it is possible to build Xorg and GNOME either with JHBuild or as Debian packages. In turn, there is a choice of tools for building Debian source packages. There is even a tool for making sets of Debian packages from JHBuild module descriptions.

These systems do not interoperate perfectly, but they do work and scale.

There are some arguments for having a monolithic system. In some situations it is difficult to split pieces of software into separately-built modules. For example, Plash-glibc is currently built by symlinking the source for the Plash IPC library into the glibc source tree, so that glibc builds it with the correct compiler flags and with the glibc internal header files. Ideally the IPC library would be built as a separate module, but for now it is more practical to leave it inside the glibc build.

Still, if you can find good module boundaries, it is a good idea to take advantage of them.
