Thursday 4 November 2010

An introduction to FreeBSD-Capsicum

In my last blog post, I described one of the features in FreeBSD-Capsicum: process descriptors. Now it's time for an overview of Capsicum.

Capsicum is a set of new features for FreeBSD that adds better support for sandboxing, using a capability model in which the capabilities are Unix file descriptors (FDs).

Capsicum takes a fairly conservative approach, in that it does not make operations on file descriptors virtualisable. This approach has some limitations -- we do not get the advantages of having purely message-passing syscalls. However, it does mean that the new features are orthogonal.

The main new features are:

  • A per-process "capability mode", which is turned on via a new cap_enter() syscall.

    This mode disables any system call that provides ambient authority. So it disables system calls that use global namespaces, including the file namespace (e.g. open()), the PID namespace (e.g. kill()) and the network address namespace (e.g. connect()).

    This is not just a syscall filter, though. Some system calls optionally use a global namespace. For example, sendmsg() and sendto() optionally take a socket address. For openat(), the directory FD can be omitted. Capability mode disables those cases.

    Furthermore, capability mode disallows the use of ".." (parent directory) in filenames for openat() and the other *at() calls. This changes directory FDs to be limited-authority objects that convey access to a specific directory and not the whole filesystem. (It is interesting that this appears to be a property of the process, via capability mode, rather than of the directory FD itself.)

    Capability mode is inherited across fork and exec.

  • Finer-grained permissions for file descriptors. Each FD gets a large set of permission bits. A less-permissive copy of an FD can be created with cap_new(). For example, you can have read-only directory FDs, or non-seekable FDs for files.
  • Process descriptors. Capsicum doesn't allow kill() inside the sandbox because kill() uses a global namespace (the PID namespace). So Capsicum introduces process descriptors (a new FD type) as a replacement for process IDs, and adds pdfork(), pdwait() and pdkill() as replacements for fork(), wait() and kill().

Plus there are a couple of smaller features:

  • Message-based sockets. The Capsicum guys implemented Linux's SOCK_SEQPACKET interface for FreeBSD.
  • An fexecve() system call which takes a file descriptor for an executable. This replaces execve(), which is disabled in capability mode because execve() takes a filename.

    Capsicum's fexecve() ignores the implicit filename that is embedded in the executable's PT_INTERP field, so it is only good for loading the dynamic linker directly or for loading other statically linked executables.

Currently, the only programs that run under Capsicum are those that have been ported specially:

  • The Capsicum guys ported Chromium, and it works much the same way as on Linux. On both systems, Chromium's renderer process runs sandboxed, but the browser process does not. On both systems, Chromium needs to be able to turn on sandboxing after the process has started up, because it relies on legacy libraries that use open() during startup.
  • Some Unix utilities, including gzip and dhclient, have been extended to use sandboxing internally (privilege separation). Like Chromium, gzip can open files and then switch to capability mode.

However, it should be possible to run legacy Unix programs under Capsicum by porting Plash.

At first glance, it looks like Plash would have to do the same tricks under FreeBSD-Capsicum as it does under Linux to run legacy programs. Under Linux, Plash uses a modified version of glibc in order to intercept its system calls and convert them to system calls that work in the sandbox. That's because the Linux kernel doesn't provide any help with intercepting the system calls. The situation is similar under FreeBSD -- Capsicum does not add any extensions for bouncing syscalls back to a user space handler.

However, there are two aspects of FreeBSD that should make Plash easier to implement there than on Linux:

  • FreeBSD's libc is friendlier towards overriding its functions. On both systems, it is possible to override (for example) open() via an LD_PRELOAD library that defines its own "open" symbol. But with glibc on Linux, this doesn't work for libc's internal calls to open(), such as from fopen(). For a small gain in efficiency, these calls don't go through PLT entries and so cannot be intercepted.

    FreeBSD's libc doesn't use this optimisation and so it allows the internal calls to be intercepted too.

  • FreeBSD's dynamic linker and libc are not tightly coupled, so it is possible to change the dynamic linker to open its libraries via IPC calls without having to rebuild libc in lockstep.

    In contrast, Linux glibc's and are built together, share some data structures (such as TLS), and cannot be replaced independently.