Wednesday 23 July 2014

Implementing fork() on the Mill CPU

The Mill is a new CPU architecture that claims to provide high performance but at a much better performance-per-watt than conventional CPUs that use out-of-order execution.

The Mill achieves this by making various architectural simplifications. One of those is to remove the TLB from the fast path of memory accesses. Rather than having the TLB between the CPU core and the cache, the Mill's TLB is between the cache and main memory.

OS models

This means the Mill is best suited for running Single Address Space operating systems (SASOS). The intent is that different processes will live at different addresses within a shared 64-bit address space. A process runs with permissions to access restricted ranges of this address space. Switching between processes is therefore just a matter of switching those permissions, which is fast on the Mill. It doesn't involve an expensive flush of the TLB (as on a conventional OS). It doesn't involve flushing the Mill's virtual-address-tagged (VIVT) cache.

This runs into a problem if we want to run Unix programs that use fork(), though. Use of fork() assumes that multiple processes will want to use the same virtual addresses.

The Mill developers have said they have a scheme for handling fork(), but they haven't said what it is, so I'm going to speculate. :-)

Context switching

If you create forked processes that share a range of virtual address space, a Mill OS can just flush the TLB and cache for those ranges whenever it needs to context switch between the two processes.

Basically, a Mill OS can act like a Separate Address Space OS when handling forked processes.

Switching costs

How expensive that would be depends on how the TLB and cache work.

  • In conventional CPUs, the TLB is per-core. The Mill's TLB might be shared between cores, though, if the Mill has higher-level caches that are both virtually-tagged and shared between cores.

    If that's the case, forked processes wouldn't be able to run concurrently on multiple cores.

  • Flushing the TLB and/or cache when context switching between forked processes would be slow.

    However, the OS would only have to flush the address ranges that have diverged between the forked processes -- that is, those that have been written to (as copy-on-write) or mmap()'d.

This would be fine for most uses of fork() on Unix, where execve() is called shortly after fork(). Forked processes usually exist only briefly, so there's little need to context-switch between them.

Zygotes

Forked processes are sometimes used as an optimisation, though. Android and Linux Chrome fork many child processes from a parent "zygote" process. This is a way to speed up process creation and reduce memory usage by sharing pages. This technique would become a pessimisation on the Mill. Android and Chrome would have to find a different way to do this.

Mill portals

Implementing fork() on the Mill doesn't seem very difficult, then.

However, there is one Mill feature that fork() might interfere with.

Mill provides "portals", a mechanism for one protection domain (a process in the single 64-bit address space) to call into another. A Mill OS would have to ensure that a forked process can't be invoked directly via a portal if that forked process has been context-switched out. The portal would have to be redirected to a routine that waits for the forked process to be context-switched back in before invoking it. Alternatively, a Mill OS could just disallow invoking forked process via portals entirely.

This doesn't seem like a big problem. Portals are a Mill-specific feature for optimising inter-process calls, whereas fork() is a "legacy" Unix feature. We want fork() for running existing Unix programs, which won't need to define Mill portals.

There might be an issue if a portal invocation needs to return to a forked process. For example, Unix syscalls might be implemented as portal invocations. Maybe these invocations would need to go via a proxy that knows how to context-switch the calling process back in.