Sunday 4 January 2009

What does NaCl mean for Plash?

Google's Native Client (NaCl), announced last month, is an ingenious hack to get around the problem that existing OSes don't provide adequate security mechanisms for sandboxing native code.

You can look at NaCl as an interesting combination of OS-based and language-based security mechanisms:

  • NaCl uses a code verifier to prevent use of unsafe instructions such as those that perform system calls. This is not a million miles away from programming language subsets like Cajita and Joe-E, except that it operates at the level of x86 instructions rather than source code.

    Since x86 instructions are variable-length and unaligned, NaCl has to stop you from jumping into an unsafe instruction hidden in the middle of a safe instruction. It does that by requiring that all indirect jumps are jumps to the start of 32-byte-aligned blocks; instructions are not allowed to straddle the boundaries between these blocks.

  • It uses the x86 architecture's little-used segmentation feature to limit memory accesses to a range of address space. So the processor is doing bounds checking for free.

    Actually, segmentation has been used before - in L4 and EROS's "small spaces" facility for switching between processes with small address spaces without flushing the TLB. NaCl gets the same benefit: switching between trusted and untrusted code should be fast; faster than trapping system calls with ptrace(), for example.
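
To make the alignment rule concrete, here is a minimal sketch of the two checks it implies. It models addresses as plain integers and is not NaCl's actual verifier; the function names are made up for illustration. It assumes that indirect jump targets are forced onto a block boundary by masking off the low address bits, which is one way to enforce the rule.

```python
BLOCK_SIZE = 32  # alignment granularity described above


def mask_jump_target(addr: int) -> int:
    """Round an indirect jump target down to the start of its 32-byte block.

    Hypothetical helper: clears the low 5 bits of the address.
    """
    return addr & ~(BLOCK_SIZE - 1)


def straddles_block(start: int, length: int) -> bool:
    """Return True if an instruction of `length` bytes at `start` crosses a block boundary."""
    last_byte = start + length - 1
    return start // BLOCK_SIZE != last_byte // BLOCK_SIZE


def first_straddling(instructions):
    """Given (start, length) pairs, return the first one that straddles a boundary, or None."""
    for start, length in instructions:
        if straddles_block(start, length):
            return (start, length)
    return None
```

With both checks in place, every address an indirect jump can reach is the start of a block, and every block starts on an instruction boundary, so a jump cannot land in the middle of an instruction that the verifier has already accepted.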

Plash is also a hack to get sandboxing, specifically on Linux, but it has some limitations:
  1. it doesn't block network access;
  2. it doesn't limit CPU and memory resource usage, so sandboxed programs can still cause denial of service;
  3. it requires a custom glibc, which can be a pain to build;
  4. it changes the API/ABI that sandboxed programs see in some small but significant ways:
    • some syscalls are effectively disabled; programs must go through libc for these calls, which stops statically linked programs from working;
    • /proc/self doesn't work, and Plash's architecture makes it hard to emulate /proc.

Interface changes mean some programs require patching, e.g. to not depend on /proc. These problems can be addressed, with work, and if there were more people behind Plash the interface changes wouldn't be a big problem. But Plash hasn't really caught on, so the manpower isn't there.

NaCl also breaks the ABI - it breaks it totally. Code must be recompiled. However, NaCl provides bigger benefits in return. It allows programs to be deployed in new contexts: on Windows; in a web browser. It is more secure than Plash, because it can block network access and limit the amount of memory a process can allocate. Also, because NaCl mediates access more completely, it would be easier to emulate interfaces like /proc.

NaCl isn't only useful as a browser plugin. We could use it as a general purpose OS security mechanism. We could have GNU/Linux programs running on Windows (without the Linux bit).

Currently NaCl does not support all the features you'd need in a modern OS, in particular dynamic linking: it doesn't yet support loading code beyond an initial statically linked ELF executable. But we can add this. I am making a start at porting glibc, along with its dynamic linker. After all, I have ported glibc once before!

3 comments:

Anonymous said...

Mark, that's great news. FWIW I can't wait to see the result. I've still got plans in the back of my mind for how to leverage Plash but have not yet had the time to put them into action. Plash has so much potential, it ought to have caught on IMO.

Daira Hopwood said...

Why can't Plash block network access? Network access can only be done via syscalls that could be intercepted by ptrace, no?

Anonymous said...

Plash doesn't use ptrace.

(It could, but raw ptrace has a heavy performance overhead, because it adds some extra context switches to *every* system call the application makes -- even harmless ones like read() and write(). Better system call interposition tools provide support for trapping on only a subset of system calls, but those are not portable and are not available by default in the Linux kernel.)

Another plausible architecture would be to use something like seccomp. However, I think that this doesn't quite work with seccomp as currently formed, because seccomp blocks recvmsg() and sendmsg(), which are needed to pass fds between processes. But if we imagine a seccomp that does allow recvmsg() and sendmsg() (seccomp++, anyone?), then I think one could imagine a Plash variant that is built on top of seccomp++, and hence blocks network access. I've thought for a while that this would provide a robust foundation for a Plash-like sandbox, but I haven't written any code, so what do I know?

Well, that's my understanding, anyway. Hopefully Mark will correct any mistakes I have introduced.