Sunday 14 September 2008

CapPython, unbound methods and Python 3.0

CapPython needs to block access to method functions. Luckily, one case is already covered by Python's unbound methods, but they are going away in Python 3.0.

Consider the following piece of Python code:

class C(object):

    def f(self):
        return self._field

x = C()

In Python, methods are built out of functions. "def" always defines a function. In this case, the function f is defined in a class scope, so it is stored in the class object C, and that is what makes it available as a method.
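The step from "function stored in a class" to "method" is the descriptor protocol: looking up x.f finds the plain function in C.__dict__ and calls its __get__ method, which builds a bound method. A minimal sketch, run under Python 3 (the type names printed are Python 3's; Python 2 reports "instancemethod" for the second one):

```python
class C(object):
    def f(self):
        return self._field

x = C()
x._field = 42

# The class dictionary holds the plain, unwrapped function.
raw = C.__dict__["f"]

# Looking up x.f effectively calls raw.__get__(x, C), producing a bound method.
bound = raw.__get__(x, C)

print(type(raw).__name__)    # function
print(type(bound).__name__)  # method
print(bound() == x.f())      # True
```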

There are three ways in which we might use function f:

  • Via instances of class C as a normal method, e.g. x.f(). This is the common case. The expression x.f returns a bound method, which wraps the instance x and the function f.
  • Via class C, e.g. C.f(x). The expression C.f returns an unbound method. When called as C.f(y), the unbound method first checks that y is an instance of C. If it is, it calls f(y); otherwise, it raises a TypeError.
  • Directly, as a function, assuming you can get hold of the unwrapped function. There are several ways to get hold of the function:
    • x.f.im_func or C.f.im_func. Bound and unbound methods make the function they wrap available via an attribute called "im_func".
    • In class scope, "f" is visible directly as a variable.
    • C.__dict__["f"]
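These paths can be checked interactively. The sketch below runs under Python 3, where the bound method attribute "im_func" is spelled "__func__" (Python 2.6 added "__func__" as an alias), and where the second path no longer produces an unbound method at all:

```python
class C(object):
    def f(self):
        return self._field

x = C()
x._field = "secret"

# 1. Via an instance: x.f is a bound method wrapping x and f.
print(x.f())                             # secret

# 2. Via the class: in Python 3, C.f is the plain function itself.
print(C.f is C.__dict__["f"])            # True

# 3. Directly: the bound method exposes the function it wraps.
print(x.f.__func__ is C.__dict__["f"])   # True
```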

CapPython allows the first two but aims to block direct use of method functions.

In CapPython, attribute access is restricted so that you can only access private attributes (those starting with an underscore) via a "self" variable inside a method function. For this to work, access to method functions must be restricted. Function f should only ever be used on instances of C and its subclasses.

Suppose that constraint was violated. If you could get hold of the unwrapped function f, you could apply it to an object y of any type, and f(y) would return the value of the private attribute, y._field. That would violate encapsulation.
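A short Python 3 sketch of that violation. The class Unrelated is a hypothetical stand-in for an object of any other type, and f is obtained via C.__dict__, one of the paths listed above (in Python 3, C.f would give the same function):

```python
class C(object):
    def __init__(self):
        self._field = "C's private state"

    def f(self):
        return self._field


class Unrelated(object):
    def __init__(self):
        self._field = "Unrelated's private state"


f = C.__dict__["f"]   # the unwrapped method function
y = Unrelated()

# Nothing checks that y is a C, so f reads y's private attribute.
print(f(y))           # Unrelated's private state
```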

To enforce encapsulation, CapPython blocks the paths for getting hold of f that are listed above, as well as some others:

  • "im_func" is treated as a private attribute, even though it doesn't start with an underscore.
  • In class scope, reading the variable f is forbidden. Or, to be more precise, if variable f is read, f is no longer treated as a method function, and its access to self._field is forbidden.
  • __dict__ is a private attribute, so the expression C.__dict__ is rejected.
  • Use of __metaclass__ is blocked, because it provides another way of getting hold of a class's __dict__.
  • Use of decorators is restricted.
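To make the attribute rule concrete, here is a minimal sketch using Python's standard ast module. It is not CapPython's verifier: the names is_private, private_accesses and EXTRA_PRIVATE are hypothetical, it trusts any variable called "self" (rather than only the self argument of a method function), and it does not model the class-scope, __metaclass__ or decorator rules.

```python
import ast

# Hypothetical: names treated as private even without a leading underscore.
EXTRA_PRIVATE = {"im_func"}


def is_private(name):
    """Private means a leading underscore (which covers __dict__) or a listed name."""
    return name.startswith("_") or name in EXTRA_PRIVATE


def private_accesses(source):
    """Return sorted (line, attribute) pairs for private attribute accesses
    made on anything other than a variable named 'self'."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and is_private(node.attr):
            on_self = isinstance(node.value, ast.Name) and node.value.id == "self"
            if not on_self:
                violations.append((node.lineno, node.attr))
    return sorted(violations)


example = '''class C(object):
    def f(self):
        return self._field

x = C()
a = x.f.im_func
b = C.__dict__
c = x._field
'''

print(private_accesses(example))
# [(6, 'im_func'), (7, '__dict__'), (8, '_field')]
```

The access self._field on line 3 is accepted; the three accesses through x and C are reported.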

Bound methods and unbound methods both wrap up function f so that it can be used safely.

However, this is changing in Python 3.0. Unbound methods are being removed. This means that C.f simply returns function f. If CapPython is going to work on Python 3.0, I am afraid it will have to become a lot more complicated. CapPython would have to apply rewriting to class definitions so that class objects do not expose unwrapped method functions. Pre-3.0 CapPython has been able to get away without doing source rewriting.
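One hypothetical shape for such a rewrite is to wrap each method function in a descriptor that restores the Python 2 behaviour. In the sketch below, checked_method is an illustrative name, not an existing API. Through an instance it yields an ordinary bound method; through the class it yields a wrapper that type-checks its first argument against the class used for the lookup, as Python 2's unbound methods do. The raw function remains reachable via _func and the wrapper's __closure__, but both start with an underscore and so count as private under CapPython's rule.

```python
class checked_method(object):
    """Hypothetical descriptor emulating a Python 2 unbound method."""

    def __init__(self, func):
        self._func = func

    def __get__(self, instance, owner):
        func = self._func
        if instance is not None:
            # x.f: delegate to the function's own binding (a bound method).
            return func.__get__(instance, owner)

        # C.f: check the first argument against the class used for lookup.
        # functools.wraps is deliberately avoided: it would expose the raw
        # function as the public-looking attribute __wrapped__.
        def unbound(obj, *args, **kwargs):
            if not isinstance(obj, owner):
                raise TypeError("%s() must be called with %s instance as first argument"
                                % (func.__name__, owner.__name__))
            return func(obj, *args, **kwargs)
        return unbound


class C(object):
    @checked_method
    def f(self):
        return self._field


class Other(object):
    _field = "other"


x = C()
x._field = "mine"
print(x.f())     # mine
print(C.f(x))    # mine
try:
    C.f(Other())
except TypeError as exc:
    print("rejected:", exc)
```

In this sketch the rewriter would insert the decorator itself; CapPython's restriction on decorators applies to the untrusted source, not to the rewriter's output.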

Pre-3.0 CapPython has a very nice property: It is possible for non-CapPython-verified code to pass classes and objects into verified CapPython code without allowing the latter to break encapsulation. The non-verified code has to be careful not to grant the CapPython code unsafe objects such as "type" or "globals" or "getattr", but the chances of doing that are fairly low, and this is something we could easily lint for. However, if almost every class in Python 3.0 provides access to objects that break CapPython's encapsulation (that is, method functions), so that the non-CapPython code must wrap every class, the risks of combining code in this way are significantly increased.

Ideally, I'd like to see this change in Python 3.0 reverted. Unbound methods were scheduled for removal in a one-liner in PEP 3100. This originated in a comment on Guido van Rossum's blog and a follow-on thread. The motivation seems to be to simplify the language, which is often good, but not in this case. However, I'm about 3 years too late, and Python 3.0 is scheduled to be released in the next few weeks.

9 comments:

Daira Hopwood said...

That's shocking. They are deliberately introducing exactly the problem that JavaScript has that makes it so difficult to create secure subsets of JavaScript.

Daira Hopwood said...

Hmm. Having looked at this in more detail, why does the same problem not occur even pre-Python-3.0 for builtin methods (which are never unbound)?

Anonymous said...

I'll echo David-Sarah's comments.

Can you write your static verifier to simply forbid the code from accessing unbound methods? e.g., forbid the syntax "C.method"? Seems like this would preserve the interoperability properties you're trying to achieve. Would this represent an unacceptable loss of functionality? Is there some reason why it's important to be able to use unbound methods?

Daira Hopwood said...

Just like in JavaScript (modulo some differences that I don't think are important here), the syntax "C.method" is used both for access to methods of a class, and for any other field/property access. I suspect that it is difficult to distinguish between these statically, since references to classes are first-class values.

OTOH, I haven't looked at this for Python in nearly as much detail as for JavaScript, so don't take my word for it.
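A small Python 3 sketch of why a purely syntactic ban on "C.method" is hard: inside the hypothetical helper get_f, the expression k.f is the same code whether k is an instance or the class object, and only the run-time value decides whether the result is a bound method or the unwrapped function.

```python
class C(object):
    def f(self):
        return self._field


def get_f(k):
    # Statically this is just "attribute f of k"; what it returns
    # depends on what k turns out to be at run time.
    return k.f


x = C()
x._field = 1
print(type(get_f(x)).__name__)   # method   (k is an instance)
print(type(get_f(C)).__name__)   # function (k is the class)
```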

Daira Hopwood said...

If you were to post something about this to the python-3000 (at) python.org list, I'd back you up.

Mark Seaborn said...

@David-Sarah: The same problem does not occur for builtin methods such as list.append because these already apply a type check:

>>> class C(object): pass
>>> list.append(C(), 1)
TypeError: descriptor 'append' requires a 'list' object but received a 'C'

Builtin unbound methods have to check this otherwise the interpreter could crash. list.append is a primitive operation that depends on the list type's private representation.

Unbound method wrappers are just adding the check that builtin methods already do.

Daira Hopwood said...

"Unbound method wrappers are just adding the check that builtin methods already do."

Ah, OK. So the post I referenced is completely wrong; all builtin methods are effectively unbound.

From the original post:
"Function f should only ever be used on instances of C and its subclasses."

Shouldn't that only be instances of exactly C? Being able to call a superclass method implementation on a subclass that has overridden that method, seems like an encapsulation violation.

(The Java VM design also has that problem, IIRC.)
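The concern can be illustrated with hypothetical classes in which D overrides a method to hide C's private state. The sketch runs under Python 3, but the last call would also succeed under Python 2, because an unbound method's check is isinstance, which accepts subclass instances:

```python
class C(object):
    def __init__(self, secret):
        self._secret = secret

    def reveal(self):
        return self._secret


class D(C):
    def reveal(self):
        # D restricts what reveal() discloses.
        return "redacted"


d = D("d's secret")
print(d.reveal())      # redacted
# Calling C's implementation through the class attribute bypasses D's override.
print(C.reveal(d))     # d's secret
```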

Mark Seaborn said...

@David-Sarah: "Being able to call a superclass method implementation on a subclass that has overridden that method, seems like an encapsulation violation." - Yes, it is a limited encapsulation violation, at least for non-self references. (You would expect to be able to do this on "self".) Let's call this the "override override problem" (or maybe "underride"?).

Some background: Python has two mechanisms for calling a superclass's method implementation: super() (newer) and class attributes (older). If unbound methods did not accept subclasses they would not have been usable for this purpose in normal Python.
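The two mechanisms side by side, with hypothetical class names. This is written for Python 3, where super() can be called with no arguments; Python 2 code would write super(ViaSuper, self).

```python
class C(object):
    def greet(self):
        return "C"


class ViaClassAttribute(C):
    def greet(self):
        # Older idiom: fetch the superclass implementation from the class.
        # Under Python 2 this is an unbound method, and its type check must
        # accept self, which is an instance of a subclass of C.
        return C.greet(self) + "+D"


class ViaSuper(C):
    def greet(self):
        # Newer idiom: super() finds the next implementation in the MRO.
        return super().greet() + "+D"


print(ViaClassAttribute().greet())   # C+D
print(ViaSuper().greet())            # C+D
```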

Ideally, class objects would not have attributes at all. It is a compromise for CapPython to rely on unbound methods' type check.

My answer to this particular "override override problem", for now, is that you have to be careful about how you use inheritance. Suppose you have a class hierarchy defined in one module: classes C and D, where D overrides C's method. You don't want C's version of the method to be callable on instances of D by outside code. In that case you should "close off" C and D: don't export the class objects themselves from the module, and export only the constructors.
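A minimal sketch of a module "closed off" in this way; the names _C, _D, make_c and make_d are hypothetical. It only works under CapPython's rules: in ordinary Python, outside code could still recover the classes with type(d) or d.__class__, but CapPython does not grant "type" to verified code and treats __class__ as a private attribute.

```python
# Hypothetical module contents: the classes stay inside the module,
# and only constructor functions are meant to be exported.

class _C(object):
    def __init__(self, secret):
        self._secret = secret

    def reveal(self):
        return self._secret


class _D(_C):
    def reveal(self):
        return "redacted"


def make_c(secret):
    """Exported constructor for C instances."""
    return _C(secret)


def make_d(secret):
    """Exported constructor for D instances."""
    return _D(secret)
```

Outside code holding an instance from make_d can call reveal() but has no class object through which to reach _C.reveal.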

There are some other approaches, but they are more complex and I'll leave them for another post.

Guido van Rossum said...

Mark, if you're listening, would you mind bringing this up on python-ideas@python.org? I'd like to have a discussion about this, but the comment section of a blog doesn't feel like a good place to have it. I am surprised by the fact that this matters to you so much -- I don't understand where the function object f gets its magic powers. So there is a lot to learn for me and the Python community about what it is that you do and need.