Mail me

What's upvalue?

A closure implementation in a compiler requires a mechanism for keeping track of values of variables that were bound in the environment surrounding the function at the moment it's declared. The values can change after the function is declared (usually by the function itself), so some form of non-local state needs to be maintained.

This was a lot of words, so here's a JavaScript example.

  function newCounter() {
      let i = 0;
      return function inner() {
          i += 1;
          return i;
      }
  }

  const c1 = newCounter();
  console.log(c1()); // 1
  console.log(c1()); // 2

  const c2 = newCounter();
  console.log(c2()); // 1

inner() accesses (and modifies) i, which is a variable declared outside inner() (we say it's a free variable in inner()). Each call to newCounter() will create a new closure with its own i.

Implementation

So how do you implement this? I only know about the Lua way, which I learned while reading Crafting Interpreters. (See also The implementation of Lua 5.0.) Free variables are stored in a structure that they call an upvalue.

When the compiler resolves a variable in a function, it first checks if it's a local variable. If not then it tries to resolve it as an upvalue, by first trying to resolve it as a local in the immediately surrounding block. If it can't then it tries to resolve it as an upvalue in the surrounding block.

So it's a recursive procedure, so resolving an upvalue might require going up an arbitrary number of blocks. However because the resolved upvalue gets added to the list of upvalues in the immediately surrounding block, it's still fast to look up at runtime.

One complication is introduced by the fact that multiple closures might refer to the same variable. For example:

  function init()
      local i = 0
      function get()
          return i
      end
      function set(j)
          i = j
      end
      return get, set
  end

  get, set = init()
  print(get()) // 0
  set(5)
  print(get()) // 5

Both get() and set() share the state of i. If each of them created their own upvalue this wouldn't work as they each would have their own state. The way Lua handles that is by keeping a list of all upvalues in the stack, which allows for checking for whether an upvalue has already been created before adding a new one.

What makes an upvalue different from a regular local variable is that when the environment where it's bound goes out of scope, it doesn't merely get popped off the stack. So there's also a simple mechanism for moving them from the stack to the heap when a block ends.

Lua example

From Programming in Lua:

  function newCounter()
      local i = 0
      return function()
          i = i + 1
          return i
      end
  end

  c1 = newCounter()
  print(c1())

This produces the following bytecode:

  main <closure.lua:0,0> (11 instructions at 0x5e0bec242a30)
  0+ params, 2 slots, 1 upvalue, 0 locals, 3 constants, 1 function
  	1	[1]		VARARGPREP	0
  	2	[7]		CLOSURE  	0 0	; 0x5e0bec242f90
  	3	[1]		SETTABUP 	0 0 0	; _ENV "newCounter"
  	4	[9]		GETTABUP 	0 0 0	; _ENV "newCounter"
  	5	[9]		CALL     	0 1 2	; 0 in 1 out
  	6	[9]		SETTABUP 	0 1 0	; _ENV "c1"
  	7	[10]	GETTABUP 	0 0 2	; _ENV "print"
  	8	[10]	GETTABUP 	1 0 1	; _ENV "c1"
  	9	[10]	CALL     	1 1 0	; 0 in all out
  	10	[10]	CALL     	0 0 1	; all in 0 out
  	11	[10]	RETURN   	0 1 1	; 0 out

  function <closure.lua:1,7> (4 instructions at 0x5e0bec242f90)
  0 params, 2 slots, 0 upvalues, 1 local, 0 constants, 1 function
  	1	[2]	LOADI    	0 0
  	2	[6]	CLOSURE  	1 0	; 0x5e0bec2432a0
  	3	[6]	RETURN   	1 2 0k	; 1 out
  	4	[7]	RETURN   	1 1 0k	; 0 out

  function <closure.lua:3,6> (7 instructions at 0x5e0bec2432a0)
  0 params, 2 slots, 1 upvalue, 0 locals, 0 constants, 0 functions
  	1	[4]	GETUPVAL 	0 0	; i
  	2	[4]	ADDI     	0 0 1
  	3	[4]	MMBINI   	0 1 6 0	; __add
  	4	[4]	SETUPVAL 	0 0	; i
  	5	[5]	GETUPVAL 	0 0	; i
  	6	[5]	RETURN1  	0
  	7	[6]	RETURN0

Note SETUPVAL and GETUPVAL. (Also note that Lua uses a register-based VM, unlike Lox.)

Is it an original concept?

The Lua 5.0 paper presents this concept as novel, but given how simple it is and given how important closures are to functional programming I wonder to which extent that's true. Perhaps the novelty comes from the design goal of being a high-level language that's also small and easy to embed, which imposes a simple implementation. I wonder how ECL does it…