How Stacksmith handles References

HyperTalk is designed to never crash. That is, barring a bug in HyperCard itself, or a bug in a native-code plugin, any code that a scripter writes should at worst bring up an error message and abort execution of the current event handler, dropping you back into the main event loop.

This sounds simple at first, but becomes a bit of a problem if you actually are the one implementing a HyperTalk interpreter, like I did for Stacksmith’s Hammer programming language.

Scripters can write things like

on mouseUp
  delete me
  set the name of me to "I'm gone"
end mouseUp 

If you implement that naively, you implement me as being simply a pointer to the object the script belongs to. But what do you do when it goes away? You could employ reference counting and just prevent the object from going away as long as a script is running, but then you’ve just punted the problem one level up. If a button deletes the window it sits in, the button (running the script) would stay, because the script holds on to it, but if you then ask the button for its window, you’d still not be able to get a valid object pointer.

You really want to have an object know who is referencing it, so it can just set those references to NULL. Then you make your code check for NULL returns and abort early. But that’s a lot of housekeeping data and maintenance overhead, especially if you have an object that is referenced many times (like me during the course of a script).

So what Stacksmith does is it adopts a handle approach: References to objects are stored as indexes into a master pointer table. Each entry consists of the actual pointer and a generation number, the seed.

Whenever you reference another object, you keep that seed and the index, and fill out a new such entry in the global array of references. You write the pointer to the actual object into it and increase the seed by one.

To retrieve such a referenced object, you find its entry in the array of master pointers, compare that the seed is the same as the one you saved, and if it is, copy the pointer.

Why the seed? Well, you see, for this to work without crashing, once created, a master pointer entry must stay around for the life of the program, so we want to be able to re-use them. So, what happens when an object goes away? Well, it knows about its master pointer and sets it to NULL.

Now, when we look for an unused master pointer, that’s what we look for: An entry in that table that is NULL. We increment its seed, and stick our pointer in it. If some code that referenced the old object comes around now and tries to access the pointer, it still finds valid memory (so no crash), however, when it compares the seed to the one it stored, it realizes that it doesn’t match (indicating that the object has been deleted and the slot re-used), and just behaves the same way as if it was NULL.

This behaviour goes pretty well with the performance characteristics of most programs:

In the common case, when the object is still around, the only penalty we incur is one additional pointer de-reference plus an int comparison.
In the uncommon case where an object has gone away, it is just as fast.
When an object is deleted, it simply sets one pointer to NULL, and everybody who still references it lazily finds out if they try to access it, is none the wiser if no access ever happens.

The penalties we pay here occur due to a bit less locality when accessing referenced memory, and when creating a reference to an object:

Our reference is larger, it stores a seed in addition to the actual pointer.
The first time a reference is created, we need to find an empty slot in the table of master pointers, currently via linear search.
If our table is full, we need to increase the table’s size and allocate a new block of master pointers, which now has to stay permanently resident in memory. While a pointer and seed only uses 16 bytes, this still means this memory ends up not going down after our peak memory usage.

Now, in Stacksmith, there are several ways this is (or could be) optimized:

The master pointer table is per “script context group”, so roughly per project. If you close a project, we know that there are no more scripts using this particular table, and we can free the memory.
We can remember the last master pointer entry we used, and just start our search from there, so in most cases our “linear search” will just find the next empty entry.
Every object keeps track of its reference (so it can set it to NULL when it is deleted). So when a second reference to the same object is created, we can just ask the object to give us its master pointer and seed, without having to scan the master pointer table for an empty slot.
Since our objects are reference-counted in addition to using this master pointer scheme, for some uses we can just retain the object instead of taking out a reference.

While originally designed for the above use case, these references have become a useful facility for avoiding all kinds of lifetime issues across the language, and will likely also come in handy when adding reference parameters to function calls as well. After all, these references do not care what data type they point to. As long as you keep the seed and properly deregister, you can store any pointer in such a reference. Even to a platform-native object.