Sunday, 28 June 2009

Encapsulation in Python: two approaches

Time for another post on how object-capability security might be done in Python.

Suppose function closures in Python were encapsulated (i.e. the attributes func_closure, func_globals etc. were removed) and were the only route for getting encapsulation for non-builtin objects, as is the case in Python's old restricted mode (a.k.a. rexec). There would be two ways of defining new encapsulated objects. Let's call them the Cajita approach and the Bastion approach.

The Cajita-style approach (named after the Cajita subset of Javascript) is to define an object using a record of function closures, one closure per method. Here's an example from a 2003 post by Ka-Ping Yee:

def Counter():
    self = Namespace()
    self.i = 0

    def next():
        self.i += 1
        return self.i

    return ImmutableNamespace(next)
(Tav's post on object-capabilities in Python has more examples: a read-only FileReader object, and Mint and Purse objects based on the example from E.)
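Namespace and ImmutableNamespace are not defined here. So that the Counter example can actually run, below is a minimal sketch of what they might look like. Both classes are hypothetical stand-ins, not Ka-Ping Yee's definitions, and ImmutableNamespace is assumed to store each function under its __name__. Overriding __setattr__ only blocks ordinary assignment, so this is not a real security boundary.

```python
class Namespace(object):
    """Hypothetical: a plain mutable record holding an object's private state."""


class ImmutableNamespace(object):
    """Hypothetical: a record of closures, each stored under its __name__.

    Overriding __setattr__/__delattr__ only blocks ordinary assignment;
    object.__setattr__ can still bypass it.
    """

    def __init__(self, *funcs):
        for func in funcs:
            object.__setattr__(self, func.__name__, func)

    def __setattr__(self, name, value):
        raise AttributeError("ImmutableNamespace is read-only")

    def __delattr__(self, name):
        raise AttributeError("ImmutableNamespace is read-only")


def Counter():
    self = Namespace()
    self.i = 0

    def next():
        self.i += 1
        return self.i

    return ImmutableNamespace(next)


c = Counter()
print(c.next())  # 1
print(c.next())  # 2
```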

This replaces the far more idiomatic, but unencapsulated, class definition:

class Counter(object):

    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i
(Ignoring that the more idiomatic way to do this is to use a generator or itertools.count().)

I am calling this the Cajita approach because it is analogous to how objects are defined in the Cajita subset of Javascript (part of the Caja project), where this code would be written as:

function Counter() {
    var i = 0;
    return Obj.freeze({
        next: function () {
            i += 1;
            return i;
        },
    });
}
The Python version of the Cajita style is a lot more awkward because Python is syntactically stricter than Javascript:
  • Expressions cannot contain statements, so function definitions cannot be embedded in an object creation expression. As a result, method names have to be specified twice: once in the method definitions and again when passed to ImmutableNamespace. (It would be three times if ImmutableNamespace did not use __name__.)
  • A function can only assign to a variable in an outer scope by using the nonlocal declaration (introduced recently, in Python 3.0), and that is awkward. The example works around this by doing "self = Namespace()" and assigning to the attribute self.i.
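For comparison, here is a rough sketch of the nonlocal route in Python 3. To keep it short, the counters return the bare closure rather than an ImmutableNamespace, and the names broken_counter and nonlocal_counter are illustrative. Without the declaration, assigning to i makes it local to the inner function, so the first read of it fails.

```python
def broken_counter():
    i = 0

    def next():
        # Assigning to i makes it local to next(), so reading it first fails.
        i += 1
        return i

    return next


def nonlocal_counter():
    i = 0

    def next():
        nonlocal i  # Python 3 only: rebind i in the enclosing scope
        i += 1
        return i

    return next


try:
    broken_counter()()
except UnboundLocalError as exc:
    print("broken:", exc)

step = nonlocal_counter()
print(step(), step())  # 1 2
```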
The Cajita style also has a memory usage cost. The size of each object will be O(n) in the number of methods, because each method is created as a separate function closure with its own pointer to the object's state. This is not trivial to optimise away because doing so would change the object's garbage collection behaviour.
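To make the per-object cost concrete, here is a small sketch comparing a three-method object in the Cajita style (a plain tuple stands in for ImmutableNamespace) with the equivalent class. It only inspects function objects and closure references; it doesn't measure memory. The names cajita_point and Point are illustrative.

```python
def cajita_point():
    # Cajita style: one closure per method, all sharing the state dict.
    state = {"x": 0, "y": 0}

    def get_x():
        return state["x"]

    def get_y():
        return state["y"]

    def move(dx, dy):
        state["x"] += dx
        state["y"] += dy

    return (get_x, get_y, move)


class Point(object):
    # Class style: the method functions are stored once, on the class.
    def __init__(self):
        self._x = 0
        self._y = 0

    def get_x(self):
        return self._x

    def get_y(self):
        return self._y

    def move(self, dx, dy):
        self._x += dx
        self._y += dy


a, b = cajita_point(), cajita_point()
# Each Cajita-style instance allocates a fresh function object per method...
print([fa is fb for fa, fb in zip(a, b)])  # [False, False, False]
# ...and each function has its own __closure__ tuple pointing at the state cell.
print([f.__closure__[0] is a[0].__closure__[0] for f in a])  # [True, True, True]

p, q = Point(), Point()
# Bound methods of different instances share one underlying function.
print(p.get_x.__func__ is q.get_x.__func__)  # True
```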

For these reasons it would be better not to use the Cajita style in Python.

The alternative is the approach taken by the Bastion module, which the Python standard library used to provide for use with Python's restricted mode. (Restricted mode and Bastion.py are actually still present in Python 2.6, but restricted mode has holes and is unsupported, and Bastion.py is disabled.)

Bastion provided wrapper objects that exposed only the public attributes of the object being wrapped (where non-public attribute names are those that begin with an underscore). This approach means we can define objects using class definitions as usual, and wrap the resulting objects. To make a class encapsulated and uninheritable, we just have to add a decorator:

@sealed
class Counter(object):

    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i
where "sealed" can be defined as follows:
# Minimal version of Bastion.BastionClass.
# Converts a function closure to an object.
class BastionClass:

    def __init__(self, get):
        self._get = get

    def __getattr__(self, attr):
        return self._get(attr)

# Minimal version of Bastion.Bastion.
def Bastion(obj):
    # Create a function closure wrapping the object.
    def get(attr):
        if type(attr) is str and not attr.startswith("_"):
            return getattr(obj, attr)
        raise AttributeError(attr)
    return BastionClass(get)

def sealed(klass):
    def constructor(*args, **kwargs):
        return Bastion(klass(*args, **kwargs))
    return constructor
>>> c = Counter()
>>> c.next()
1
>>> c.next()
2
>>> c._i
AttributeError: _i
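As a quick check of the "uninheritable" claim, the sketch below repeats the minimal definitions above (with the parameter renamed to obj) and tries to subclass the sealed Counter. Because the decorator replaces the class with a plain function, the class statement fails with a TypeError. The exact message varies between Python versions.

```python
class BastionClass(object):
    def __init__(self, get):
        self._get = get

    def __getattr__(self, attr):
        return self._get(attr)


def Bastion(obj):
    # Forward only public (non-underscore) attribute names.
    def get(attr):
        if type(attr) is str and not attr.startswith("_"):
            return getattr(obj, attr)
        raise AttributeError(attr)
    return BastionClass(get)


def sealed(klass):
    def constructor(*args, **kwargs):
        return Bastion(klass(*args, **kwargs))
    return constructor


@sealed
class Counter(object):
    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i


# After decoration, Counter is a plain function, not a class...
print(isinstance(Counter, type))  # False

# ...so trying to subclass it fails when the class statement runs.
try:
    class SubCounter(Counter):
        pass
except TypeError as exc:
    print("cannot subclass:", exc)
```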

Mutability problem

One problem with the Bastion approach is that Bastion's wrapper objects are mutable. Although you can't use the Bastion object to change the wrapped object's attributes, you can change the Bastion object's attributes. This means that these wrapper objects cannot be safely shared between mutually distrusting components. Multiple holders of the object could use it as a communications channel or violate each other's expectations.
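A sketch of the problem, using the minimal BastionClass and Bastion definitions above (repeated so the example runs on its own). Two components holding the same wrapper can both assign to it. The names shared and note are illustrative.

```python
class BastionClass(object):
    def __init__(self, get):
        self._get = get

    def __getattr__(self, attr):
        return self._get(attr)


def Bastion(obj):
    def get(attr):
        if type(attr) is str and not attr.startswith("_"):
            return getattr(obj, attr)
        raise AttributeError(attr)
    return BastionClass(get)


class Counter(object):
    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i


shared = Bastion(Counter())

# Component A cannot reach Counter._i through the wrapper, but it can assign
# to the wrapper itself, shadowing the forwarded method...
shared.next = lambda: 42
# ...or leave a message for any other holder of the same wrapper.
shared.note = "hello from A"

# Component B, holding the same wrapper, now sees A's changes.
print(shared.next())  # 42
print(shared.note)    # hello from A
```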

There isn't an obvious way to fix this in pure Python. Overriding __setattr__ isn't enough. I expect the simplest way to deal with this is to implement Bastion in a C extension module instead of in Python. The same goes for ImmutableNamespace (used in the first example); the Cajita-style approach faces the same issue.
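A sketch of why overriding __setattr__ isn't enough, at least in ordinary CPython. FrozenBastionClass and wrap are hypothetical names for an attempted fix using __slots__ plus a refusing __setattr__. Any holder can still call object.__setattr__ directly and swap out the lookup function.

```python
class FrozenBastionClass(object):
    # Attempted fix: no instance __dict__, and assignment is refused.
    __slots__ = ("_get",)

    def __init__(self, get):
        object.__setattr__(self, "_get", get)

    def __getattr__(self, attr):
        return self._get(attr)

    def __setattr__(self, attr, value):
        raise AttributeError("read-only")

    def __delattr__(self, attr):
        raise AttributeError("read-only")


class Counter(object):
    def __init__(self):
        self._i = 0

    def next(self):
        self._i += 1
        return self._i


def wrap(obj):
    def get(attr):
        if type(attr) is str and not attr.startswith("_"):
            return getattr(obj, attr)
        raise AttributeError(attr)
    return FrozenBastionClass(get)


w = wrap(Counter())
try:
    w.next = lambda: 42
except AttributeError:
    print("plain assignment refused")

# ...but any holder can bypass the override and replace the lookup function.
object.__setattr__(w, "_get", lambda attr: (lambda: 42))
print(w.next())  # 42
```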

Wrapping hazard

There is a potential hazard in defining encapsulated objects via wrappers that is not present in CapPython. In methods, "self" will be bound to the private view of the object. There is a risk of accidentally writing code that passes the unwrapped self object to untrusted code.

An example:

class LoggerMixin(object):

    def sub_log(self, prefix):
        return PrefixLog(self, prefix)

@sealed
class PrefixLog(LoggerMixin):

    def __init__(self, log, prefix):
        self._log = log
        self._prefix = prefix

    def log(self, message):
        self._log.log("%s: %s" % (self._prefix, message))
(This isn't a great example because of the circular relationship between the two classes.)

This hazard is something we could lint for. It would be easy to warn about cases where self is used outside of an attribute access.
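One possible shape for such a lint check, sketched with Python 3's ast module: for each method, flag any use of the first parameter that is not the object of an attribute access. find_self_escapes is a hypothetical helper, and it makes simplifying assumptions (every function directly inside a class body is treated as a method, so staticmethods would be checked too).

```python
import ast


def find_self_escapes(source):
    """Return (method name, line) pairs where a method's first parameter
    is used for anything other than an attribute access like self.x."""
    warnings = []
    for cls in ast.walk(ast.parse(source)):
        if not isinstance(cls, ast.ClassDef):
            continue
        for method in cls.body:
            if not isinstance(method, ast.FunctionDef) or not method.args.args:
                continue
            self_name = method.args.args[0].arg
            # Name nodes that are the object of an attribute access are allowed.
            allowed = {id(node.value) for node in ast.walk(method)
                       if isinstance(node, ast.Attribute)
                       and isinstance(node.value, ast.Name)}
            for node in ast.walk(method):
                if (isinstance(node, ast.Name) and node.id == self_name
                        and id(node) not in allowed):
                    warnings.append((method.name, node.lineno))
    return warnings


SAMPLE = '''
class LoggerMixin(object):
    def sub_log(self, prefix):
        return PrefixLog(self, prefix)

class PrefixLog(LoggerMixin):
    def log(self, message):
        self._log.log("%s: %s" % (self._prefix, message))
'''

print(find_self_escapes(SAMPLE))  # [('sub_log', 4)]
```

Run on the LoggerMixin example, it flags only the PrefixLog(self, prefix) call in sub_log.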

The method could be changed to:

    def sub_log(self, prefix):
        return PrefixLog(seal_object(self), prefix)
This creates a new wrapper object on each call, which is not always desirable. To avoid this we could store the wrapper object in an attribute of the wrapped object. Note that this would create a reference cycle; while CPython's cycle collector usually handles cycles fine, creating a cycle for every object might not be a good idea. An alternative would be to memoize seal_object() using a weak dictionary.

Sunday, 14 June 2009

Python standard library in Native Client

The Python standard library now works under Native Client in the web browser, including the Sqlite extension module.

By that I mean that it is possible to import modules from the standard library, but a lot of system calls won't be available. Sqlite works for in-memory use.

Here's a screenshot of the browser-based Python REPL:

The changes needed to make this work were quite minor:

  • Implementing stat() in glibc so that it does an RPC to the Javascript running on the web page. Python uses stat() when searching for modules in sys.path.
  • Changing fstat() in NaCl so that it doesn't return a fixed inode number. Python uses st_ino to determine whether an extension module is the same as a previously-loaded module, which doesn't work if st_ino is the same for different files.
  • Some tweaks to nacl-glibc-gcc, a wrapper script for nacl-gcc.
  • Ensuring that NaCl-glibc's libpthread.so works.
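A sketch of why a fixed inode number causes trouble, assuming the loader identifies an already-loaded extension module by its (st_dev, st_ino) pair. file_identity and fixed_inode_stat are hypothetical helpers for illustration; this is not CPython's actual loader code.

```python
import os
import tempfile
from types import SimpleNamespace


def file_identity(path, stat=os.stat):
    # Identify a file by (device, inode), as a loader might do to spot
    # a module that has already been loaded.
    st = stat(path)
    return (st.st_dev, st.st_ino)


def fixed_inode_stat(path):
    # Stand-in for a stat() that always reports the same inode number.
    real = os.stat(path)
    return SimpleNamespace(st_dev=real.st_dev, st_ino=1)


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.so")
    b = os.path.join(tmp, "b.so")
    for path in (a, b):
        open(path, "w").close()
    print(file_identity(a) == file_identity(b))  # False
    print(file_identity(a, fixed_inode_stat) == file_identity(b, fixed_inode_stat))  # True
```

With the constant inode, two different module files look like the same file, so the second would be mistaken for one that is already loaded.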
I didn't have to modify Python or Sqlite at all. I didn't even have to work out what arguments to pass to Sqlite's configure script: the Debian package for Sqlite builds without modification. It's just a matter of setting PATH so that invoking gcc runs nacl-glibc-gcc instead.