Showing posts with label packaging. Show all posts

Monday, 7 December 2009

Modular vs. monolithic build systems

I have spent some time packaging software as Debian packages. While the Debian packaging system has its faults, it has the nice property that it is modular. This post is an attempt to articulate what aspects of Debian packaging -- and other modular build systems -- are worth replicating, and why it is worth co-operating with these systems rather than ignoring them or working around them.

What is a modular build?

  1. A modular build consists of a set of modules,
  2. each of which can be built separately.
  3. Each module's build produces some output (a directory tree).
  4. A module may depend on the outputs of other modules, but it can't reach inside the others' build trees.
  5. There is a common interface that each module provides for building itself.
  6. The build tool can be replaced with another. The description of the module set is separate from the modules themselves.
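As a rough illustration of the last point, here is a minimal Python sketch of a module set kept as plain data, separate from the modules themselves, together with a tiny driver that works out a build order. The module names and the `build_order` helper are made up for this example; this is not how JHBuild or Debian represent module sets.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical module set: each module maps to the modules whose
# installed output it depends on. The names are illustrative only.
MODULE_SET = {
    "xproto": [],
    "libx11": ["xproto"],
    "gtk": ["libx11"],
    "evince": ["gtk"],
}


def build_order(module_set, targets):
    """Return the targets plus their transitive dependencies, ordered so
    that every module comes after the modules it depends on."""
    needed = {}
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = module_set[name]
            stack.extend(module_set[name])
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(needed).static_order())


order = build_order(MODULE_SET, ["gtk"])
print(order)
```

Because the description is just data, a different build tool could read the same module set, and asking for "gtk" never touches "evince".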

What is a non-modular, monolithic build?

  1. A monolithic build consists of one big build tree.
  2. Any part of the build can reference any other part via relative filenames.
  3. It might consist of multiple checkouts from version control, but they have to be checked out to specific directory tree locations (as in the Chromium build).

Some examples of modular builds:

  • Build systems/tools:
    • Debian packages (and presumably RPMs too)
    • JHBuild
    • Zero-Install
    • Nix
  • Module interfaces:
    • GNU autotools (./configure && make && make install)
    • Python distutils (setup.py)
  • Software collections:
    • GNOME
    • Xorg (7.0 onwards)
    • Sugar

Examples of monolithic builds:

  • XFree86 (and Xorg 6.9): Before Xorg was modularised, there was a big makefile that built everything, from Xlib to the X server to example X clients.
  • Chromium web browser: This uses a tool called "gyp" to generate a big makefile which compiles individual source files from several libraries, including WebKit, V8 and the Native Client IPC library. It ignores WebKit's own build system.
  • Native Client: One SCons build builds the core code as well as the NPAPI browser plugin and example code; it needs to know how to cross-compile NaCl code as well as compile host system code. Another makefile builds the compiler toolchain from tarballs and patch files that are checked into SVN.
  • CPython: The interpreter's build also builds the standard library, including its many C extension modules.

Modular build systems offer a number of advantages:

  • You can download and build only the parts you need. This can be a big help if some modules are huge but seldom change while the modules you work on are small and fast to build.
  • Some systems (such as Debian packages) give you binary packages so you don't need to build the dependencies of the modules that you want to work on. JHBuild doesn't provide this but it could be achieved with a little work.
  • Dependencies are clearer.
  • External interfaces are clearer too.
  • It is possible to change one module's version independently of other modules (to the extent that differing versions are compatible).
  • They are relatively easy to use in a decentralised way. It is easy to create a new version of a module set which adds or removes modules.
  • You don't have to check huge dependencies into your version control system. Some projects check in monster tarballs or source trees, which dwarf the project's own code. If you avoid this practice you will make it easier for distributions to package your software.

The two categories can coexist: Each module may internally be a monolithic build which can be arbitrarily complex. Autotools is an example of that. This is not too bad because at least we have contained the complexity within the module. The layer on top, which connects modules together, can be relatively simple.

Despite its faults, autotools is very amenable to being part of a modular build:

  • The build tree does not need to be kept around after doing "make install".
  • Output can be directed using "--prefix=foo" and "make install DESTDIR=foo".
  • Inputs can be specified via --prefix and PATH and other environment variables.
  • The build tree can be separate from the source tree. It's easy to have multiple build trees with different build options.
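The properties above are what let a build tool treat "configure, make, make install" as a common module interface. The following Python sketch only assembles the command lines for an out-of-tree build with a staged install; the `autotools_commands` helper and the example paths are assumptions made for illustration, and nothing is executed.

```python
import shlex


def autotools_commands(source_dir, build_dir, prefix, dest_dir=None):
    """Return shell commands for an out-of-tree autotools build.

    The commands are only assembled, never run; the caller decides how
    (and whether) to execute them.
    """
    configure = shlex.join([f"{source_dir}/configure", f"--prefix={prefix}"])
    install = ["make", "install"]
    if dest_dir is not None:
        # DESTDIR stages the install under another directory without
        # changing the prefix that gets compiled into the package.
        install.append(f"DESTDIR={dest_dir}")
    return [
        f"mkdir -p {shlex.quote(build_dir)}",
        f"cd {shlex.quote(build_dir)}",
        configure,
        "make",
        shlex.join(install),
    ]


commands = autotools_commands(
    "/src/evince", "/build/evince", "/FAKEPREFIX", dest_dir="/tempdest"
)
print("\n".join(commands))
```

A second build tree with different options would just be another call with a different `build_dir`.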

The systems I listed as modular all have their own problems. The main problem with Debian packages is that they are installed system-wide, which requires root access and makes it difficult to install multiple versions of a package. It is possible to work around this problem using chroots. JHBuild, Zero-Install and Nix avoid this problem. JHBuild and Zero-Install are not so good at capturing immutable snapshots of package sets. Nix is good at capturing snapshots, but it makes it difficult to change a library without rebuilding everything that uses it.

Despite these problems, these systems have a nice property: they are layered. It is possible to mix and match modules and replace the build layer. Hence it is possible to build Xorg and GNOME either with JHBuild or as Debian packages. In turn, there is a choice of tools for building Debian source packages. There is even a tool for making sets of Debian packages from JHBuild module descriptions.

These systems do not interoperate perfectly, but they do work and scale.

There are some arguments for having a monolithic system. In some situations it is difficult to split pieces of software into separately-built modules. For example, Plash-glibc is currently built by symlinking the source for the Plash IPC library into the glibc source tree, so that glibc builds it with the correct compiler flags and with the glibc internal header files. Ideally the IPC library would be built as a separate module, but for now it is better not to.

Still, if you can find good module boundaries, it is a good idea to take advantage of them.

Monday, 27 October 2008

Making relocatable packages with JHBuild

I have been revisiting an experiment I began back in March with building GNOME with JHBuild. I wanted to see if it would be practical to use JHBuild to package all of GNOME with Zero-Install. The main issue with packaging anything with Zero-Install is relocatability.

If you build and install an autotools-based package with

./configure --prefix=/FOO && make install

the resulting files installed under /FOO will often have the pathname /FOO embedded in them, sometimes in text files, other times compiled into libraries and executables. This is a problem for Zero-Install because it runs under a normal user account and wants to install files into a user's home directory under ~/.cache. Currently if a program is to be packaged with Zero-Install it must be relocatable via environment variables. Compiling pathnames in is no good (at least without an environment variable override) because you don't know in advance where the program will be installed.

I found a few cases where pathnames get compiled in:

  • text: pkg-config .pc files
  • text: libtool .la files
  • text: shell scripts generated from .in files, such as gtkdocize, intltoolize and gettextize
  • binary: rpaths added by libtool
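One way to find cases like these is to scan an installed tree for the prefix. The sketch below is a hypothetical helper, not part of JHBuild or Zero-Install; it reports each file that embeds the prefix and guesses whether the file is text or binary by looking for a NUL byte, which is a crude heuristic.

```python
import os
import tempfile


def find_embedded_prefix(root, prefix):
    """Yield (relative_path, kind) for each regular file under root whose
    contents include prefix. kind is "text" or "binary"."""
    needle = prefix.encode()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            if needle in data:
                # Assumption: a NUL byte anywhere means "binary".
                kind = "binary" if b"\0" in data else "text"
                yield os.path.relpath(path, root), kind


# Demo on a throwaway tree standing in for an install directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib", "pkgconfig"))
    with open(os.path.join(root, "lib", "pkgconfig", "gtk.pc"), "wb") as f:
        f.write(b"prefix=/FOO\nlibdir=${prefix}/lib\n")
    with open(os.path.join(root, "lib", "libdemo.so"), "wb") as f:
        f.write(b"\x7fELF\0\0/FOO/lib\0")
    with open(os.path.join(root, "README"), "wb") as f:
        f.write(b"no paths here\n")
    found = dict(find_embedded_prefix(root, "/FOO"))

print(found)
```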

It is possible to handle these individual cases. Zero-Install's make-headers tool will fix up pkg-config .pc files. libtool .la files can apparently just be removed on Linux without any adverse effects. libtool could be modified to not use rpaths (unfortunately --disable-rpath doesn't seem to work), which are overridden by LD_LIBRARY_PATH anyway. gtkdocize et al could be modified. But that sounds like a lot of work. I'd like to get something working first.

In revisiting this I hoped that the only cases that would matter would be text files. It would be easy to do a search and replace inside text files to relocate packages. The idea would be to build with

./configure --prefix=/FAKEPREFIX
make install DESTDIR=/tempdest

and then rewrite /FAKEPREFIX to (say) /home/fred/realprefix. In a text file, changing the length of a pathname (and hence the size of the file) usually doesn't matter, but doing this to an ELF executable would completely screw the executable up, because the file's internal offsets would no longer line up. This search-and-replace trick would be a hack, but it would be worth trying.
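For text files the trick really is just search and replace. A minimal Python sketch, assuming the file fits in memory and treating anything containing a NUL byte as binary (the `relocate_text` name and the sample .pc contents are invented for this example):

```python
def relocate_text(data: bytes, old_prefix: str, new_prefix: str) -> bytes:
    """Replace every occurrence of old_prefix in a text file's contents.

    The result may differ in length from the input, which is harmless
    for text but not for binaries, so binaries are rejected.
    """
    if b"\0" in data:
        raise ValueError("looks like a binary file; refusing to rewrite")
    return data.replace(old_prefix.encode(), new_prefix.encode())


pc_file = (
    b"prefix=/FAKEPREFIX\n"
    b"libdir=${prefix}/lib\n"
    b"Cflags: -I/FAKEPREFIX/include\n"
)
relocated = relocate_text(pc_file, "/FAKEPREFIX", "/home/fred/realprefix")
print(relocated.decode())
```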

It turned out that Evince (which I was using as a test case) embeds the pathname /FAKEPREFIX/share/applications/evince.desktop in its own executable, and if this file doesn't exist, it segfaults on startup.

Then it occurred to me that I could rewrite filenames inside binary files without changing the length of the filename: just pad the filenames out to a fixed length at the start.

So the idea now is to build with something like:

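./configure --prefix=/home/bob/builddir/gtk-XXXXXXXXXXXXXXXXXXX
make install

and, when installing the files on another machine, rewrite

/home/bob/builddir/gtk-XXXXXXXXXXXXXXXXXXX

to

/home/fred/.cache/0install.net/gtk-XXXXXXX

Both pathnames are the same length (42 characters), so the rewrite does not change the size of any file.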

Just make sure you start off with enough padding to allow the package to be relocated to any path a user is likely to use in practice.
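A minimal Python sketch of the fixed-length idea, under the assumption that the install prefix is already exactly as long as the padded build prefix (how a shorter real path gets padded out is left open here). The helper names `pad_prefix` and `relocate_binary` are made up for this example.

```python
def pad_prefix(base: str, total_length: int, pad_char: str = "X") -> str:
    """Pad base with pad_char until it is exactly total_length long."""
    if len(base) > total_length:
        raise ValueError("base is already longer than the target length")
    return base + pad_char * (total_length - len(base))


def relocate_binary(data: bytes, build_prefix: str, install_prefix: str) -> bytes:
    """Swap one prefix for another of the same length, so that no byte
    offsets inside the file change."""
    old = build_prefix.encode()
    new = install_prefix.encode()
    if len(old) != len(new):
        raise ValueError("prefixes must have the same length")
    return data.replace(old, new)


install_prefix = "/home/fred/.cache/0install.net/gtk-XXXXXXX"
build_prefix = pad_prefix("/home/bob/builddir/gtk-", len(install_prefix))

# Stand-in for an executable with an embedded, NUL-terminated pathname.
binary = (
    b"\x7fELF\0"
    + build_prefix.encode()
    + b"/share/applications/evince.desktop\0"
)
relocated = relocate_binary(binary, build_prefix, install_prefix)
print(len(binary), len(relocated))
```

Refusing mismatched lengths outright keeps the function from ever producing a file whose size has changed.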

This is even hackier than rewriting filenames inside text files, but it's very simple!

This is partly inspired by Nix, which does something similar, but with a bit more complexity. Nix will install a package under (something like) /nix/store/<hash>, where <hash> is (roughly) a cryptographic hash of the binary package's contents. But packages like Evince contain embedded filenames referring to their own contents, so Nix will build it with:

./configure --prefix=/nix/store/<made-up-hash>
make install

where <made-up-hash> is chosen randomly. Afterwards, the output is rewritten to replace <made-up-hash> with the real hash of the output, but there is some cleverness to discount <made-up-hash> from affecting the real hash.
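To make the "discounting" idea concrete, here is a rough Python sketch of one way it could work: blank out the self-references before hashing, then substitute the resulting hash. This is not Nix's actual implementation; SHA-256, the truncation to the placeholder's length, and the function names are all assumptions made for illustration.

```python
import hashlib


def finalize_output(data: bytes, placeholder: str) -> tuple[str, bytes]:
    """Return (real_hash, rewritten_data) for output that refers to itself
    via placeholder. Assumes placeholder is at most 64 characters."""
    token = placeholder.encode()
    # Blank out self-references so the placeholder cannot affect the hash.
    neutral = data.replace(token, b"\0" * len(token))
    real = hashlib.sha256(neutral).hexdigest()[: len(token)]
    return real, data.replace(token, real.encode())


def verify_output(data: bytes, name: str) -> bool:
    """Check that name is the hash of data with its self-references blanked."""
    token = name.encode()
    neutral = data.replace(token, b"\0" * len(token))
    return hashlib.sha256(neutral).hexdigest()[: len(token)] == name


placeholder = "0123456789abcdef0123456789abcdef"  # made-up hash
output = b"\x7fELF\0/nix/store/" + placeholder.encode() + b"/share/evince\0"
real_hash, final = finalize_output(output, placeholder)
print(real_hash, verify_output(final, real_hash))
```

Because the real hash has the same length as the placeholder, the rewrite does not change the size of the output, and anyone holding the final files can recompute the name from them.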

(Earlier versions of Nix used the hash of the build input to identify packages rather than the hash of the build output. This avoided the need to do rewriting but didn't allow a package's contents to be verified based on its hash name.)

The fact that Nix uses this scheme successfully indicates that filename rewriting in binaries works, and filenames are not being checksummed or compressed or encoded in weird ways, which is good.

My plan now is:

  • Extend JHBuild to build packages into fixed-length prefixes and produce Zero-Install feeds. My work so far is on this Bazaar branch.
  • Extend Zero-Install to do filename rewriting inside files in order to relocate packages.