Saturday 18 April 2009

Python variable binding semantics, part 2

Time for me to follow up my previous blog post and explain the code snippets.

Snippet 1

funcs = []
for x in (1, 2, 3):
    funcs.append(lambda: x)
for f in funcs:
    print f()
Although you might expect this Python code snippet to print "1 2 3", it actually prints "3 3 3". The reason for this is that the same mutable variable binding for x is used across all iterations of the loop. The for loop does not create new variable bindings.

The function closures that are created inside the loop do not capture the value of x; they capture the cell object that x is bound to under the hood (or the globals dictionary if this code snippet is run at the top level), and each iteration mutates the same cell object.

Interestingly, Python does not restrict the target of the for loop to be a variable or tuple of variables. It can be any lvalue expression that can appear on the left hand side of an assignment, including indexing expressions. For example:

x = [10, 20]
for x[0] in (1,2,3):
    pass
print x # Prints [3, 20]

Snippet 2

This problem is Python's variable binding semantics is not specific to lambdas and also occurs if you use def.
funcs = []
for x in (1, 2, 3):
    def myfunc():
        return x
    funcs.append(myfunc)
for f in funcs:
    print f()
This also prints "3 3 3".

Snippet 3

The remaining snippets are examples of ways to work around this problem. They print the desired "1 2 3".

One way is to use default arguments:

funcs = []
for x in (1, 2, 3):
    funcs.append(lambda x=x: x)
for f in funcs:
    print f()
This is actually the trick that was used to get the effect of lexical scoping before lexical scoping was added to Python.

The default argument captures the value of x at the point where the function closure is created, and this value gets rebound to x when the function is called.

Although it is concise, I'm not too keen on this workaround. It is an abuse of default arguments. There is a risk that you accidentally pass too many arguments to the function and thereby override the captured value of x.

Snippet 4

funcs = []
for x in (1, 2, 3):
    def g(x):
        funcs.append(lambda: x)
    g(x)
for f in funcs:
    print f()
This is quite an ugly workaround, but it's perhaps the most general one. The loop body is wrapped in a function. The x inside function g and the x outside are statically different variable bindings. A new mutable binding of x is created on every call to g, but this binding is never modified.

Snippets 5 and 6

The remaining two snippets are attempts to make the code less ugly, by giving names to the intermediate functions that are introduced to rebind x and thereby give the desired behaviour.
funcs = []
def append_a_func(x):
    funcs.append(lambda: x)
for x in (1, 2, 3):
    append_a_func(x)
for f in funcs:
    print f()

def make_func(x):
    return lambda: x
funcs = []
for x in (1, 2, 3):
    funcs.append(make_func(x))
for f in funcs:
    print f()
I suppose there is a downside to de-uglifying the code. If the code looks too normal, there's a risk that someone will refactor it to the incorrect version without testing that it works properly when looping over lists containing multiple items.

Can the language be fixed?

The root cause is that Python does away with explicit variable binding declarations. If a variable is assigned within a function -- including being assigned as the target of a for loop -- it becomes local to the function, but variables are never local to smaller statement blocks such as a loop body. Python doesn't have block scoping. There are a couple of naive ways in which you might try to fix the language so that the code does what you would expect, both of which have problems. We could change the semantics of function closure creation so that closures take a snapshot of variables' values. But this would break mutually recursive function definitions, and cases where functions are referred to before they are defined, such as:
def f(): # This "def" would raise an UnboundLocalError or a NameError
    g()
def g():
    pass
We could change the semantics of for loops so that the target variable is given a variable binding that is local to the loop body. But this wouldn't change the status of variable bindings created by assignment, so this code would still have the unwanted behaviour:
for x in (1, 2, 3):
    y = x
    funcs.append(lambda: y)

Linting

It should be possible to create a lint tool to detect this hazard. It could look for closures that capture variables that are assigned by or in loops. I wonder how many false positives it would give on a typical codebase.

Javascript

Javascript suffers from the same problem:
funcs = [];
for(var x = 0; x < 3; x++) {
    funcs.push(function() { return x; })
}
for(var i = 0; i < funcs.length; i++) {
    print(funcs[i]());
}
This code prints "3 3 3" when run using rhino.

In Javascript's case the affliction is entirely gratuitous. It happens because var declarations are implicitly hoisted up to the top of the function, for no apparent reason. It is gratuitous because this is not the result of a trade-off.

The good news, though, is that the problem is recognised and a fix to Javascript is planned. I think the plan is for Ecmascript 3.1 to introduce a let declaration as an alternative to var without the hoisting behaviour.

List comprehensions

Back to Python. The same problem also applies to list comprehensions and generator expressions.
funcs = [(lambda: x) for x in (1, 2, 3)]
for f in funcs:
    print f()

funcs = list((lambda: x) for x in (1, 2, 3))
for f in funcs:
    print f()

These both print "3 3 3".

(Note that I added brackets around the lambdas for clarity but the syntax does not require them.)

This is forgivable for list comprehensions, at least in Python 2.x, because the assignment to x escapes into the surrounding function. (See my earlier post.)

But for generators (and for list comprehensions in Python 3.0), the scope of x is limited to the comprehension. Semantically, it would be easy to limit the scope of x to within a loop iteration, so that each iteration introduces a new variable binding.