Tuesday, 5 August 2008

Four Python variable binding oddities

Python has some strange variable binding semantics. Here are some examples.

Oddity 1: If Python were a normal lambda language, you would expect the expression x to be equivalent to (lambda: x)(). I mean x to be a variable name here, but you would expect the equivalence to hold if x were any expression. However, there is one context in which the two are not equivalent: class scope.

x = 1
class C:
    x = 2
    print x
    print (lambda: x)()
Expected output:
2
2
Actual output:
2
1
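There is a fairly good reason for this: a lambda introduces a new scope, and new scopes ignore bindings from the enclosing class scope, so that variables defined in a class are not visible as bare variables inside its methods.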
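The same experiment can be sketched in Python 3 syntax, where print is a function. This is only an illustration: I am assuming a Python 3 interpreter rather than the Python 2 used above, and the names direct, via_lambda and method are made up for the example. It also includes a method, since methods are the case the scoping rule is designed around.

```python
x = 1

class C:
    x = 2
    # The class body itself sees the class-level binding.
    direct = x
    # A lambda is a new scope, so it skips the class namespace and finds the outer x.
    via_lambda = (lambda: x)()

    def method(self):
        # A method is also a new scope: a bare x here is the outer x, not C.x.
        return x

results = (C.direct, C.via_lambda, C().method())
print(results)  # (2, 1, 1)
```

Inside a method you would write self.x or C.x to reach the class attribute.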

Oddity 2: This is also about class scope. If you're familiar with Python's list comprehensions and generator expressions, you might expect list comprehensions to be just a special case of generators that evaluates the sequence up-front.

x = 1
class C:
    x = 2
    print [x for y in (1,2)]
    print list(x for y in (1,2))
Expected output:
[2, 2]
[2, 2]
Actual output:
[2, 2]
[1, 1]
This happens for a mixture of good and bad reasons. List comprehensions and generator expressions have different variable binding rules: a generator expression introduces a new scope, while a list comprehension runs in the enclosing scope. Class scopes are somewhat odd, but they are at least consistent in their oddness. If list comprehensions and generators were brought into line with each other, you would actually expect to get this output:
[1, 1]
[1, 1]
Otherwise class scopes would not behave as consistently.
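As a rough check of that expectation, here is a sketch of the same test in Python 3 syntax, assuming a Python 3 interpreter, where list comprehensions get their own scope just like generator expressions. The attribute names are invented for the example. The third line shows one remaining wrinkle: the outermost iterable is still evaluated in the class scope.

```python
x = 1

class C:
    x = 2
    # The element expression runs in the comprehension's own scope,
    # which skips the class namespace and finds the outer x.
    from_listcomp = [x for y in (1, 2)]
    from_genexp = list(x for y in (1, 2))
    # The outermost iterable, however, is evaluated in the class body,
    # so here x refers to the class-level binding.
    from_iterable = [y for y in (x, x)]

print(C.from_listcomp)  # [1, 1]
print(C.from_genexp)    # [1, 1]
print(C.from_iterable)  # [2, 2]
```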

Oddity 3:

x = "top"
print (lambda: (["a" for x in (1,2)], x))()
print (lambda: (list("a" for x in (1,2)), x))()
Expected output might be:
(['a', 'a'], 'top')
(['a', 'a'], 'top')
Or if you're aware of list comprehension oddness, you might expect it to be:
(['a', 'a'], 2)
(['a', 'a'], 2)
(assuming this particular ordering of the "print" statements). But it's actually:
(['a', 'a'], 2)
(['a', 'a'], 'top')
If you thought that you can't assign to a variable in an expression in Python, you'd be wrong. This expression:
[1 for x in [100]]
has the same effect on x as this statement:
x = 100
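For comparison, here is a sketch of the same expressions in Python 3 syntax, assuming a Python 3 interpreter, where the comprehension variable is confined to the comprehension. The variable names listcomp_result, genexp_result and leaked are made up for the example. A for statement is included to show that loop variables of statements still bind in the current scope.

```python
x = "top"

# In Python 3 neither form rebinds the x that the lambda returns.
listcomp_result = (lambda: (["a" for x in (1, 2)], x))()
genexp_result = (lambda: (list("a" for x in (1, 2)), x))()
print(listcomp_result)  # (['a', 'a'], 'top')
print(genexp_result)    # (['a', 'a'], 'top')

# A for statement, by contrast, leaves its loop variable bound afterwards.
for n in [100]:
    pass
leaked = n
print(leaked)  # 100
```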
Oddity 4: Back to class scopes again.
x = "xtop"
y = "ytop"
def func():
    x = "xlocal"
    y = "ylocal"
    class C:
        print x
        print y
        y = 1
func()
Naively you might expect it to print this:
xlocal
ylocal
If you know a bit more you might expect it to print something like this:
xlocal
Traceback ... UnboundLocalError: local variable 'y' referenced before assignment
(or a NameError instead of an UnboundLocalError)
Actually it prints this:
xlocal
ytop
I think this is the worst oddity, because I can't see a good use for it. For comparison, if you replace "class C" with a function scope, as follows:
x = "xtop"
y = "ytop"
def func():
    x = "xlocal"
    y = "ylocal"
    def g():
        print x
        print y
        y = 1
    g()
func()
then you get:
xlocal
Traceback ... UnboundLocalError: local variable 'y' referenced before assignment
I find that more reasonable.
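Here is a sketch of both variants in Python 3 syntax, assuming a Python 3 interpreter and assuming the code runs at module level, since the class-body fallback goes to the module globals. As far as I can tell this oddity behaves the same way there. The names seen_x, seen_y and func_with_inner_function are invented for the example.

```python
x = "xtop"
y = "ytop"

def func():
    x = "xlocal"
    y = "ylocal"
    class C:
        seen_x = x   # x is never assigned in the class body: found in func
        seen_y = y   # y is assigned below, so it is a class-local name;
                     # while still unbound, the lookup falls back to globals
        y = 1
    return C.seen_x, C.seen_y

def func_with_inner_function():
    x = "xlocal"
    y = "ylocal"
    def g():
        seen_x = x
        seen_y = y   # y is local to g and not yet bound: raises
        y = 1
    try:
        g()
    except UnboundLocalError as exc:
        return type(exc).__name__

print(func())                      # ('xlocal', 'ytop')
print(func_with_inner_function())  # UnboundLocalError
```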

Why bother? These issues become important if you want to write a verifier for an object-capability subset of Python. Consider an expression like this:

(lambda: ([open for open in (1,2)], open))()
It could be completely harmless, or it might be so dangerous that it could give the program that contains it the ability to read or write any of your files. You'd like to be able to tell. This particular expression is harmless. Or at least it is harmless until a new release of Python changes the semantics of list comprehensions...
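To give a flavour of the kind of check involved, here is a minimal sketch that uses the standard ast and builtins modules to flag comprehension variables whose names coincide with builtins. This is not how any real verifier is implemented; the function name is hypothetical, it assumes Python 3, and it only reports the names without deciding whether the expression is safe.

```python
import ast
import builtins

def comprehension_targets_shadowing_builtins(source):
    """Return the sorted builtin names that comprehension 'for' clauses bind."""
    tree = ast.parse(source)
    flagged = set()
    for node in ast.walk(tree):
        # Each 'for ... in ...' clause of a comprehension is an ast.comprehension node.
        if isinstance(node, ast.comprehension):
            # The target may be a single name or a tuple of names.
            for target in ast.walk(node.target):
                if isinstance(target, ast.Name) and hasattr(builtins, target.id):
                    flagged.add(target.id)
    return sorted(flagged)

flagged = comprehension_targets_shadowing_builtins(
    "(lambda: ([open for open in (1, 2)], open))()"
)
print(flagged)  # ['open']
```

Whether a flagged name is actually a problem depends on the scoping rules of the interpreter the code will run on, which is exactly why those rules matter to a verifier.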

5 comments:

Anonymous said...

I think you could avoid the need to rewrite/transform code, and stick to a simple static verifier, with the following trick: forbid all shadowing. If the same variable name appears at multiple scopes, reject the whole Python file, and tell the user "don't do that". Assuming that CapPython is intended for new code (not for neutering piles of legacy code), it seems a simple solution that should be straightforward for programmers -- unless I'm missing something.

I mentioned this to Mark in private email, and Mark points out that for robustness and future-proofing it would be a good idea to treat list comprehensions as if they introduced new scopes, even though they currently do not. I agree: this is a nice improvement on my suggestion.

rouli said...

Can you explain the "good reason" behind the first case?
Coming to Python from other languages (C++), I find it very unintuitive.

Unknown said...

ITT: functional language devotee discovers the world doesn't revolve around him

Mark Seaborn said...

@rouli: The reason for the first case is so that variables defined in the class are not visible as variables inside methods. See this post from python-dev.

Anything that introduces a new scope ignores the bindings from the class scope. So "lambda" and generators are simply behaving consistently with "def".

Danny said...

Also strange (excuse the somewhat convoluted way, but I just wanted to be sure):

fs = []

i = 0
for q in range(2):
    f = lambda: i
    del i
    i = 1
    fs.append(f)

for f in fs:
    print(f())

I'd expect that to print
0
1

But it prints
1
1

... in both Python 2 and Python 3. Weird...

A harder-to-understand oneliner would be:
[lambda: x for x in range(2)][0]()

I would expect that to give 0 but it gives 1.