Language Design


How Stacksmith handles References

HyperTalk is designed to never crash. That is, barring a bug in HyperCard itself, or a bug in a native-code plugin, any code that a scripter writes should at worst bring up an error message and abort execution of the current event handler, dropping you back into the main event loop.

This sounds simple at first, but becomes a bit of a problem if you actually are the one implementing a HyperTalk interpreter, like I did for Stacksmith’s Hammer programming language.

Scripters can write things like

on mouseUp
  delete me
  set the name of me to "I'm gone"
end mouseUp

If you implement that naively, you implement me as being simply a pointer to the object the script belongs to. But what do you do when it goes away? You could employ reference counting and just prevent the object from going away as long as a script is running, but then you’ve just punted the problem one level up. If a button deletes the window it sits in, the button (running the script) would stay, because the script holds on to it, but if you then ask the button for its window, you’d still not be able to get a valid object pointer.

You really want to have an object know who is referencing it, so it can just set those references to NULL. Then you make your code check for NULL returns and abort early. But that’s a lot of housekeeping data and maintenance overhead, especially if you have an object that is referenced many times (like me during the course of a script).

So what Stacksmith does is adopt a handle-based approach. References to objects are stored as indexes into a master pointer table. Each entry in that table consists of the actual pointer and a generation number, the seed.

Whenever you reference an object for the first time, you fill out a new entry in this global array: you write the pointer to the actual object into it and increase its seed by one. Your reference then keeps that entry's index and its seed.

To retrieve such a referenced object, you find its entry in the array of master pointers, check that its seed still matches the one you saved, and if it does, copy out the pointer.

Why the seed? Well, you see, for this to work without crashing, once created, a master pointer entry must stay around for the life of the program, so we want to be able to re-use them. So, what happens when an object goes away? Well, it knows about its master pointer and sets it to NULL.

Now, when we look for an unused master pointer, that’s what we look for: An entry in that table that is NULL. We increment its seed, and stick our pointer in it. If some code that referenced the old object comes around now and tries to access the pointer, it still finds valid memory (so no crash), however, when it compares the seed to the one it stored, it realizes that it doesn’t match (indicating that the object has been deleted and the slot re-used), and just returns NULL.
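In C-like terms, the scheme boils down to something like this (a simplified sketch for illustration, not Stacksmith's actual code; all names are made up):

#define MASTER_POINTER_COUNT  1024 // Grown on demand in the real thing.

typedef struct MasterPointer
{
    void     *object; // NULL once the object has been deleted.
    unsigned  seed;   // Incremented every time this slot is (re-)used.
} MasterPointer;

typedef struct ObjectRef // What a reference actually stores: no raw pointer.
{
    unsigned  index;
    unsigned  seed;
} ObjectRef;

MasterPointer gMasterPointers[MASTER_POINTER_COUNT];

// Returns NULL if the object is gone, even if its slot has since been re-used:
void *ResolveObjectRef( ObjectRef ref )
{
    MasterPointer *entry = &gMasterPointers[ref.index];
    if( entry->seed != ref.seed )
        return NULL; // Slot was re-used for a new object, ours is gone.
    return entry->object; // May still be NULL if deleted but not yet re-used.
}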

This behaviour goes pretty well with the performance characteristics of most programs:

  • In the common case, when the object is still around, the only penalty we incur is one additional pointer de-reference plus an int comparison.
  • In the uncommon case where an object has gone away, it is just as fast.
  • When an object is deleted, it simply sets one pointer to NULL. Everybody who still references it lazily finds out when they try to access it, and if no access ever happens, nobody is any the wiser.

The penalties we pay for this are a bit less locality when accessing referenced memory, and some overhead when creating a reference to an object:

  • Our reference is larger: it stores a seed in addition to the table index.
  • The first time a reference is created, we need to find an empty slot in the table of master pointers, currently via linear search.
  • If our table is full, we need to increase the table’s size and allocate a new block of master pointers, which then has to stay permanently resident in memory. While each pointer-and-seed entry only uses a handful of bytes, this still means our memory usage never goes back down after its peak.

Now, in Stacksmith, there are several ways this is (or could be) optimized:

  • The master pointer table is per “script context group”, so roughly per project. If you close a project, we know that there are no more scripts using this particular table, and we can free the memory.
  • We can remember the last master pointer entry we used, and just start our search from there, so in most cases our “linear search” will just find the next empty entry (see the sketch after this list).
  • Every object keeps track of its master pointer entry (so it can set it to NULL when it is deleted). So when a second reference to the same object is created, we can just ask the object for its master pointer’s index and seed, without having to scan the master pointer table for an empty slot.
  • Since our objects are reference-counted in addition to using this master pointer scheme, for some uses we can just retain the object instead of taking out a reference.
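The second optimization in this list, for instance, could look like the following sketch (gNextSlotHint and GrowMasterPointerTable() are made up for illustration):

unsigned gNextSlotHint = 0; // Where the last search succeeded.

unsigned FindEmptyMasterPointerSlot( void )
{
    for( unsigned i = 0; i < MASTER_POINTER_COUNT; i++ )
    {
        unsigned idx = (gNextSlotHint + i) % MASTER_POINTER_COUNT;
        if( gMasterPointers[idx].object == NULL )
        {
            gNextSlotHint = idx; // Start here next time.
            return idx; // Caller bumps the seed and fills in the pointer.
        }
    }
    return GrowMasterPointerTable(); // Assumed helper: enlarge the table, return the first new slot.
}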

While originally designed for the above use case, these references have become a useful facility for avoiding all kinds of lifetime issues across the language, and will likely also come in handy when adding reference parameters to function calls. After all, these references do not care what data type they point to: as long as you keep the seed and properly deregister the entry, you can store any pointer in such a reference, even to a platform-native object.

Death to Booleans!


One of the most annoying aspects of most C-descended languages is that function calls become kind of unreadable when they have more than a single boolean parameter. The calls start looking like:

    OpenFile( "/etc/passwd", true, true, false );

and you have no idea what effect each boolean actually has. Sometimes people solve this by naming all parameters in the function name, but of course that doesn’t permit adding more optional parameters to a function later, because you’d have to change the name:

    OpenFilePathEditableSaveSavingAllowNetworkURLs( "/etc/passwd", true, true, false );

A disciplined programmer will solve this by adding an enum and using that instead of the booleans:

    enum FileEditability { kReadOnly, kEditable };
    enum FileSafeSaveability { kSafeSave, kOverwriteInPlace };
    enum FileAllowNetworkURLs { kFileURLsOnly, kAllowNetworkURLs };
    void    OpenFile( const char* path, enum FileEditability fe, enum FileSafeSaveability fs, enum FileAllowNetworkURLs fu );
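With those enums, every call site documents itself, and no one has to remember what each position means:

    OpenFile( "/etc/passwd", kEditable, kSafeSave, kFileURLsOnly );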

Or maybe just make all booleans a “flags” bitfield:

    enum
    {
        kEditable = (1 << 0),
        kSafeSave = (1 << 1),
        kAllowNetworkURLs = (1 << 2)
    };
    typedef uint32_t FileOpenFlags;
    void    OpenFile( const char* path, FileOpenFlags inFlags );
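The flags version reads just as well at the call site, and you simply leave out whatever doesn’t apply:

    OpenFile( "/etc/passwd", kEditable | kSafeSave ); // No network URLs.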

But that requires the foresight to never use a single boolean. And of course the actual discipline.

Wouldn't it be nice if C had a special provision for naming booleans? My first thought was to allow specifying enums in-line for parameters:

    void OpenFile( const char* path, enum { kReadOnly, kEditable } inReadOnly );

But to be convenient, this would require some rather too-clever scoping rules. It'd be easy to make the enum available to all callers when they directly call the function, but what about cases where you want to store the value in a variable first? Maybe we could do C++-style scope resolution and allow saying OpenFile::kReadOnly?

Would be a nice way to make it easy to name parameters, but not really readable.

I guess that's why other languages have named parameters instead. Avoids all those issues. So...

The boolean is dead! Long live the boolean! (as long as you have named parameters to label them with)

How to write a compiler

Since there isn’t that much beginner info out there about this topic, here’s a very rough run-down of what I know about the basics of writing your own compiler, in particular about how the CPU works and how to generate code for it.

CPU/bytecode interpreter

A bytecode interpreter works just like a CPU, the difference being that it is in software, while the CPU is actual hardware. So all a fake or real CPU does is take a list of instructions and fetch them one by one.

To do that properly, there is one variable (in a real CPU, this is a register) that contains the position of the current instruction. This is called the program counter, or PC for short, and is basically just the memory address of whatever instruction is to be executed next.

Once an instruction finishes, the CPU adds to the PC to make it point at the next instruction (or, in the case of a conditional or loop, rewinds the PC back to the start of the loop, jumps over the ‘else’ section, or whatever).

So it’s fairly easy to create a bytecode interpreter. It’s just a loop:

#define NO_OP        0
#define PRINT        1
#define END          2
typedef struct Instruction { int instructionType; int param1; int param2; } Instruction;
Instruction *currentInstruction = LoadCodeFromDisk();

while( currentInstruction )
{
    if( currentInstruction->instructionType == NO_OP )
        currentInstruction++;
    else if( currentInstruction->instructionType == PRINT )
    {
        DoPrint( currentInstruction->param1, currentInstruction->param2 );
        currentInstruction++;
    }
    else if( currentInstruction->instructionType == END )
        currentInstruction = NULL;
    else
        exit(1); // UNKNOWN INSTRUCTION! BLOW UP!
}

So all that generating machine code or byte code is, is really adding items to an array of structs. Of course, if you want to generate Intel machine code it’s a little more complicated, because instructions can have different sizes, so you can’t use a classic array, but you can write the raw data to a file or memory block just the same.

Generating code that jumps

If you’ve ever programmed BASIC, you’ve probably seen the following program:

1 PRINT "Hello World"
2 GOTO 1

This is an endless loop. Line 1 prints some text to the screen, and line 2 jumps back to line 1. Once line 1 is done, we continue to line 2 again, which jumps back to line 1, forever and ever until someone turns off the computer. So all GOTO does is change currentInstruction from our example above, i.e. the program counter:

currentInstruction = 1;

You can implement GOTO the same way in your bytecode interpreter. However, since you usually don’t know what address your code will be loaded at (and it definitely won’t be address 1), you will generally write your code so it jumps relative to the current instruction’s location. So our version of GOTO, the JUMPBY instruction, would be

currentInstruction += currentInstruction->param1;

For our pseudo-machine-code:

PRINT "Hello World"
JUMPBY -1

With this instruction under your belt, you can quickly implement conditional statements like if. An if-instruction is simply an instruction that looks at a location in memory (whose address could be provided as param2) and if that location is 1 (true), jumps by param1. Otherwise it does the usual currentInstruction++.
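In the toy interpreter from above, those two jump instructions might look like this (a sketch; the instruction names and the gMemory array our variables live in are made up):

#define JUMPBY          3
#define JUMPBY_IF_TRUE  4

int gMemory[256]; // Variable storage; param2 is an index into this.

// ... inside the while loop:
    else if( currentInstruction->instructionType == JUMPBY )
        currentInstruction += currentInstruction->param1;
    else if( currentInstruction->instructionType == JUMPBY_IF_TRUE )
    {
        if( gMemory[currentInstruction->param2] == 1 ) // Condition true?
            currentInstruction += currentInstruction->param1;
        else
            currentInstruction++;
    }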

The conditional GOTO is the basic building block of all flow control. If/else:

1 FOO=1
2 GOTO 5 IF FOO
3 PRINT "Foo is false."
4 GOTO 6
5 PRINT "Foo is true."
6 END

(Execution flows through lines 1, 2, 5, 6 in the true case, and through 1, 2, 3, 4, 6 in the false case.)

loops:

1 FOO=0
2 GOTO 6 IF FOO
3 PRINT "Repeating."
4 DO_SOMETHING_THAT_COULD_CHANGE_FOO
5 GOTO 2
6 PRINT "End of loop"
7 END
(Execution order here is 1, 2, 3, 4, 5, back to 2, and then, if FOO is 1, on to 6 and 7, otherwise on to 3 again, etc.)

Note that bytecode has no operators, no expressions, no precedence. You provide operations in the order it is supposed to execute them. If you want to compare two strings, you do so in the instruction at the top of the loop, save its result to FOO, then loop over FOO:

1 FOO=COMPARE("a","b")
2 GOTO 6 IF FOO
3 PRINT "Repeating."
4 DO_SOMETHING_THAT_COULD_CHANGE_FOO
5 GOTO 1
6 PRINT "End of loop"
7 END

(Note how line 5 jumps to line *1* here, i.e. every time through the loop the condition is evaluated anew, and then the conditional GOTO tests it.)

Retroactive code generation

Now how do you generate code for this? How do I know, before I have read and generated the individual instructions, what line the GOTO in line 2 will have to jump to?

Well, you don’t. Instead, what you do is write out the GOTO as

2 GOTO 0 IF FOO

and then later, when you reach the end of the loop in line 5, where you write out the GOTO that jumps back to the condition, you simply change the destination of line 2 after the fact. This is usually fairly easy if you write a function for parsing every syntax element. Take the following C-like program:

strcpy( txt, "a" );
while( compare(txt,"b") )
{
    print("repeating");
    do_something_that_could_change_txt();
}
print( "End of loop" );

You’d have a function CompileOneLine() that reads one line and looks at the first word. If it is “if” it calls CompileIfLine(), if it is “while” it calls CompileWhileLine(), “print” – CompilePrintLine() etc.

CompileWhileLine would look something like:

void CompileWhileLine()
{
    int conditionOffset = ReadAndCompileExpression( "FOO" ); // Writes code that evaluates the condition into FOO.
    int conditionalGotoOffset = WriteInstruction( "GOTO IF", 0, "FOO" ); // Destination unknown yet, write 0 for now.
    if( NextChar() == '{' )
    {
        SkipChar('{');
        while( NextChar() != '}' )
            CompileOneLine();
        SkipChar('}');
    }
    else
        CompileOneLine();
    int gotoOffset = WriteInstruction( "GOTO", conditionOffset ); // Jump back up to the condition.
    SetDestinationOfGoto( conditionalGotoOffset, gotoOffset + 1 ); // Point the conditional GOTO past the loop's end, after the fact.
}

And since we call CompileOneLine() again to read the lines inside this while statement, we can nest while statements.

Variables

So far, this byte-code has one big downside: How do we do variables? We can’t put the variables in with the code, because that would mean that when a function calls itself, it would use the same variables as its previous invocation. So we need some sort of stack to keep track of the most recent function’s variables.

And this is what the stack in programming languages like C is: The code for a function has a prolog and an epilog. That is, a few instructions at the start, before any actual commands that the function contains, where it makes the stack larger to reserve space for the variables we need, and a few more after the end to get rid of them again. In our BASIC-like pseudocode, this could look like:

PUSH 0 // Reserve space for 3 int variables:
PUSH 0
PUSH 0
// ... actual code goes here
POP // Get rid of no longer needed 3 int variables
POP
POP

Now since we sometimes need a variable just for a short while (e.g. FOO for holding the loop condition’s result to hand off to the conditional GOTO instruction), it would be kind of awkward to find our variables by counting from the back of the stack. So for that we have a base pointer. A base pointer is another variable/register, like our program counter before. Before we push our variables on the stack, we write the size of the stack into the base pointer.

PUSH basePointer // Save the old basePointer (for whoever called us)
SET_BASEPOINTER // Write current stack size to basePointer
PUSH 0
PUSH 0
PUSH 0
// ... actual code goes here
POP
POP
POP
POP_BASEPOINTER // Restore the saved base pointer into the basePointer

Now, to use a variable, we can simply access it relative to the basePointer, i.e. basePointer[0]=1. No matter how many variables we add, these numbers stay the same for the lifetime of our function.
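In the toy interpreter, the stack machinery could be implemented along these lines (a sketch; the instruction names and the global stack are made up):

int gStack[1024];
int gStackSize = 0;    // Index of the next free slot, i.e. the current stack size.
int gBasePointer = 0;

// ... inside the while loop:
    else if( currentInstruction->instructionType == PUSH )
    {
        gStack[gStackSize++] = currentInstruction->param1;
        currentInstruction++;
    }
    else if( currentInstruction->instructionType == POP )
    {
        gStackSize--;
        currentInstruction++;
    }
    else if( currentInstruction->instructionType == SET_BASEPOINTER )
    {
        gBasePointer = gStackSize; // Our variables will sit at gStack[gBasePointer + n].
        currentInstruction++;
    }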

And once we have this, we can also implement parameters. Parameters are simply variables at a negative offset: since whoever calls us pushes parameters on the stack right before calling us, they can all be found right before our saved basePointer. So basePointer[-2] is parameter 1, basePointer[-3] is parameter 2, etc. This means the parameters need to be pushed in reverse order, but that’s all there is to it.

Returning

Given the above, it comes as no surprise that you can’t just return from your function. You need to make sure that, before you return, your clean-up code runs. So what we usually do is, we make a note of the position at which the clean-up code starts, put a RETURN statement right at its end, and then whenever the code wants to return, we just write the return value somewhere on the stack, GOTO the clean-up code and let it return.

Of course, you’ll have to remember all spots that GOTO that code, because when you write them, the clean-up code won’t have been written yet, and fill it out in retrospect. But if you got this far, that’s all old hat to you.

Strings and other (bigger) data

As you may have noticed, our instructions only have two ints as parameters. What if you need to e.g. provide a format string to printf? Well, you’ll need an instruction that loads a string constant and pushes its address on the stack, where the print command can then grab and display it.

Commonly, what one does is simply write the strings into the same file (or memory block) as the bytecode, e.g. right after the actual instructions, and then load the whole thing into RAM. Since you know how big your block of code is after you’ve written it, you can retroactively fill out the offset from the instruction to the string it is supposed to push on the stack, and at runtime, the instruction can use currentInstruction + n to get a pointer to it and push that on the stack.
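As a sketch, such an instruction could look like this in our toy interpreter (PUSH_STRING and PushPointer() are made up for illustration):

#define PUSH_STRING  5

// ... inside the while loop:
    else if( currentInstruction->instructionType == PUSH_STRING )
    {
        // param1 holds the offset from this instruction to its string,
        // which sits after the actual instructions in the same block:
        const char *str = ((const char *)currentInstruction) + currentInstruction->param1;
        PushPointer( str ); // Assumed helper that pushes an address on our stack.
        currentInstruction++;
    }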

And that’s pretty much how you create a bytecode interpreter and generate bytecode (and in a simplified form, this is how you generate machine code).

What a block really is

After quite a while of thinking that Objective-C blocks did some mean magic on the stack, it took seriously using C++’s lambdas (C++’s implementation of the same concept) for me to realize what blocks really are.

Effectively, a block is simply a declaration of a class, plus an instantiation of one instance of that class, hidden under syntactic sugar. Don’t believe me? Well, let’s have a look at C++ lambdas to clear things up:

MyVisitorPattern( [localVariableToCapture]( MyObject* objectToVisit ) { objectToVisit->Print(localVariableToCapture); }, 15 );

The lambda, everything from the opening square bracket through the closing brace, is a C++ block. It’s pretty much the same as an Objective-C block, with two differences:

  1. You explicitly specify which local variables to capture in square brackets.
  2. Instead of the ^-operator, you use those square brackets to indicate that this is a block.

Seeing the captured variables explicitly listed here, like parameters to a constructor, made me realize that that’s really all a block is: in-line syntax to declare a subclass of a functor (i.e. an object whose entire purpose is to call a single one of its methods), which also returns you an instance of that class. In ObjC-like pseudo-code, you could rewrite the above statement as:

@interface MYBlockSubClass : NSBlock
{
    int localVariableToCapture;
}

-(id) initWithLocalVar: (int)inLocalVariableToCapture;

-(void) runForObject: (MyObject*)objectToVisit;

@end

@implementation MYBlockSubClass
-(id) initWithLocalVar: (int)inLocalVariableToCapture
{
    self = [super init];
    if( self )
        localVariableToCapture = inLocalVariableToCapture;
    return self;
}

-(void) runForObject: (MyObject*)objectToVisit
{
    objectToVisit->Print(localVariableToCapture);
}
@end

and at the actual call site:

MyVisitorPattern( [[MYBlockSubClass alloc] initWithLocalVar: localVariableToCapture], 15 );

The difference is that C++ (and even more so Objective-C) automatically declare the class for you, create the instance variables and constructor for the variables you want to capture, pick a unique class name (which you can see in the stack backtraces if you stop the debugger inside a block) and instantiate the class all in a single line of code.

So there you see it, blocks aren’t really black magic, they’re 99% syntactic sugar. Delicious, readability-enhancing syntactic sugar. Mmmmmh…

PS – Of course I’m simplifying. Objective-C blocks are actually Objective-C objects created on the stack, which you usually can’t do in plain Objective-C, though it can be done with some clever C code if you insist.

A more magical approach to blocks

That said, there is a fancier way for a compiler developer to implement blocks that also makes them 100% compatible with regular C functions:

If you implement a function in assembler, you can stick additional data onto the end of a function and calculate an offset between an instruction and the end of the function (e.g. by just filling the end of the function with a bunch of 1-byte No-Ops). This means that if someone duplicates a block, they’ll duplicate this data section as well. So what you can do is declare a struct equivalent to the captured variables, and implement your code with (pseudocode):

void    MyBlock( void )
{
    struct CapturedIVars * capturedIVars = NULL;

currentInstruction:
    capturedIVars = pointer_to_current_instruction + (ivarsSection - currentInstruction);

    // Block's code goes here.

    goto ivarsSectionEnd; // Jump over ivars so we don't try to execute our data.

ivarsSection:
    assembler_magic_to_reserve_some_space;
ivarsSectionEnd:
    ;
}

Now you can use the capturedIVars pointer to access the data attached to your function, but to any caller, MyBlock is just a plain old function that takes no arguments. But if you look at it from a distance, this is simply an object prefixed with a stub that looks like a function, so our general theory of blocks being just special syntax for objects holds.

I presume this is how Swift implements its blocks, because it really doesn’t distinguish between blocks and functions.

Why Cocoa programmers can’t have nice things

Amy Worrall pointed me at a nice post on the technical feasibility of using exceptions for more than programming errors in Cocoa by Hari Karam Singh. Even though he is misled by some Mac documentation into thinking iOS didn’t have zero-cost exceptions and then disproves that documentation by disassembly, he draws lots of valid conclusions.

However, the problem is not one of technology. The problem is one of the sheer size of Apple’s Cocoa codebase, which would have to be updated to survive having an exception thrown through it. Apple would have to add @trys in every location where they call a SEL, after all, since they don’t know which of them may be user-provided and throw.

Since they’re not doing that, a user who decides to use exceptions anyway would have to add @trys to every method that might ever be called by the framework. That means you can’t catch exceptions thrown by that method when you call it, though, because it swallows them itself. So if you want to handle errors from that method, you either split it up into okButtonClickedThrows: and okButtonClicked:, duplicating every method and working in parallel with two error handling schemes, or you give up, like Apple, and just use one non-exception error handling scheme.

I love exceptions, but I don’t think my Cocoa code will be cleaner and error handling nicer if I put a try block at the top of every action and delegate method. NSError is less dangerous, because if an object returns nil and gives an error (and you don’t look at the returned error), the method call will simply collapse (a message to nil is a no-op), so nothing much will happen. Since I can’t put up an error dialog from drawing code or table delegate callbacks like numberOfSections, there’s not much difference there. The code is actually cleaner, because with NSError and nil returns I can just ignore errors, while with exceptions in an exception-unsafe Cocoa, I must catch here or I’ll risk throwing through Cocoa.

C++ also has an advantage when working with exceptions over Objective-C because it uses “Resource Acquisition Is Initialization” (or RAII for short). Locks, allocations, even changes of a global boolean can be implemented as stack objects using RAII to set themselves up when created and clean up behind themselves in their destructor. You don’t even have a ‘finally’ block in the language. OTOH, every method you write in an exception-handling ObjC would need an @finally block, even if it doesn’t care about the errors, just to clean up behind itself.

ARC, @autoreleasepool and @synchronized can help a little with clean-up of memory and locks these days, as they’ll get triggered on an exception anyway. But as Cocoa and Apple’s frameworks currently stand, using exceptions effectively doubles your work.

The same applies to existing code. Nobody wants to have to completely rewrite their apps for 10.9 just to adopt a new error handling scheme when their code already has working error handling with NSError. Apple understands that their developers want a certain degree of backward compatibility. That’s the reason why only iOS got the new runtime on 32-bit: There was no code that relied on the old runtime there, it was a new platform. But all existing Mac applications would have been broken if the system had suddenly no longer guaranteed instance variable layouts and method lookup-tables. However, since 64-bit required changes to pointer sizes and data structures anyway, nobody complained when Apple introduced the new runtime for 64 bit on the Mac. They had to re-test and update their applications anyway.

All that said, I would love a new Objective-C framework that uses exceptions and is exception-safe for new projects to be built against. It just doesn’t seem like something Apple can retrofit at a whim.

At best, they can slowly make each framework exception-safe, and then in every spot where there can be an error, instead of returning it, look at an “app supports exception handling”-flag and throw their errors if that is set. That way, existing applications will keep working, while new applications can be written exception-safe. And once the majority of applications have moved to exceptions, Apple can switch to using exceptions themselves (see above — you don’t want 2 versions, exception-safe and unsafe, of every method), and tell the stragglers to please make their code exception-safe.

A proposal: Categories for C++

Categories?

One of the most useful features I’ve used in object-oriented programming languages is the “category”. A category is a way of adding methods to a class dynamically. Take, for example, Objective-C’s “NSAttributedString” class. As defined in the Foundation framework, it is simply a string where you can attach key-value pairs to ranges of it.

Only a category in the GUI framework AppKit actually defines the methods and constants that specify what key to use for “bold”, and lets you draw such a string to the screen. Yet in everyday use with AppKit, these two disparate parts feel like one homogeneous whole. And, more importantly, anyone who wants to write a text-processing command line tool can use NSAttributedString without having to drag in all that unneeded drawing code.

How C++ objects work

But C++ does not have this feature. So let’s come up with an implementation of this concept that a compiler vendor could implement, and that stays true to the core of C++, mainly its compile-time determination of as much as possible. Of course we want to be able to add a category to a system class or a class in another module, define virtual methods in a category, and override them in a category on a subclass. So we want to end up with something that looks like this in C++:

category BarSupport : MyObject // extend class MyObject with some methods.
{
    void doBar( MyObject baz );
};

and can be called just like any other method on MyObject, e.g.

MyObject foo, dodo;
foo.doBar( dodo );

If we didn’t have to support virtual methods, things would be trivial. A non-virtual method call like

MyObject foo;
foo.doBar(baz);

compiles to something like

MyObject_doBar( &foo, baz );

Where the this pointer is simply passed in as the first parameter before the first parameter you defined. So all we’d have to do is tell the parser about these new methods, compile the functions that implement them, and call them.

But virtual methods work differently. First, there is a virtual function table, something like this:

struct MyObjectVTable
{
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).
};

And whenever you define a new class, what it actually does is add a hidden instance variable at the start:

struct MyObject
{
    struct MyObjectVTable *vtable;
    int                   firstInstanceVariable;
};

And it declares a global variable containing the vtable once, for all objects created with this class, and stashes it in each new object’s vtable instance variable:

struct MyObjectVTable gMyObjectSharedVTable = { MyObject_doFoo };

...

// Equivalent code to MyObject* foo = new MyObject :
struct MyObject* foo = malloc(sizeof(struct MyObject));
foo->vtable = &gMyObjectSharedVTable;
MyObject_Constructor( foo );

Here, MyObject_doFoo() is a function just like our MyObject_doBar() above. But when a virtual method is called, it is done slightly differently:

foo.vtable->doFoo( &foo );

This means that a subclass that wants to override doFoo() can provide its own gMySubclassSharedVTable that is also a struct MyObjectVTable, but contains a pointer to MySubclass_doFoo instead, and thus overrides the behaviour of doFoo() for all MySubclass objects. And if it wants to define additional virtual methods in its subclass, it can simply declare a struct MySubclassVTable that starts with the same fields in the same order as struct MyObjectVTable, and contains the additional methods after that. That way, anyone who expects an object for the base class doesn’t even have an inkling that there is more stuff after the methods it knows.

Applying that to Categories

Our categories also want to add methods, but the problem is we can’t just add new fields to the struct. The system classes like std::string have already been compiled, and we can’t just change their code to add these ivars. But what we can do is declare a parallel class hierarchy for our categories. So, first we declare a struct for our category:

struct MyObject_BarSupportVTable
{
    void (*doBar)( struct MyObject* this, struct MyObject* baz );
};

Then we extend the vtable to list categories:

struct CategoryEntry
{
    MyObject_BarSupportVTable* catVTable; // Simplified, each category's vtable is really a different struct.
};

struct MyObjectVTable
{
    struct CategoryEntry *cattable;
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).
};

When a class is first built, the cattable array is empty, but as soon as someone declares a category, it gets added to that list. We just add some initialization code at the start of main() that mallocs the list. But how do we look up the vtable for a category? We could store its name, but then we’d have to loop over this list on each call and compare category names until we find it. How can we make that more efficient?

Simple: When we add the category to the class, we remember the index at which we added this category to the list, cache it in a global variable int gMyObject_BarSupport_Index, and then we can call a virtual method in a category like:

foo.vtable->cattable[gMyObject_BarSupport_Index].catVTable->doBar( &foo, baz );

Again, a subclass-category that overrides this method can just provide its own doBar method in the struct MyObject_BarSupportVTable.
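Putting the pieces together, the registration code that runs at startup could look something like this sketch of the proposal (it assumes the vtable grew an extra numCategories field tracking how many categories are registered):

int gMyObject_BarSupport_Index = -1;

struct MyObject_BarSupportVTable gMyObject_BarSupportVTable = { MyObject_BarSupport_doBar };

void RegisterMyObjectBarSupportCategory( void ) // Generated by the compiler, called before main()'s actual work.
{
    gMyObject_BarSupport_Index = gMyObjectSharedVTable.numCategories++;
    gMyObjectSharedVTable.cattable[gMyObject_BarSupport_Index].catVTable = &gMyObject_BarSupportVTable;
}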

The cost of categories

Of course this means that, just like with C++ virtual methods, and a bit more so, you pay a price for calling a virtual method in a category. You also pay with a little bit of overhead at startup, when the categories are added to the base class vtable. And just like with C++ virtual methods, to override a category method in a subclass, you have to know that the category exists on the base class. And if your program uses threads and you load a dynamic library that contains a category on a system class, bad things could happen while the category lists change out from under your running code.

Another gotcha with this approach is that the category list has to be built from the base class to the subclasses, because once a subclass locks down an index in the table for its first category vtable, you can’t add another category to the base class (it would have the index of the subclass’s category). But even that can be fixed:

We can just give every subclass its own categories table. So it only deals with the base class table when it wants to call a method inherited from the base class. E.g.:

struct MySubclassVTable
{
    struct CategoryEntry *cattable;
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).
    struct CategoryEntry2 *cattable2;
    void (*doBar)( struct MyObject* this, struct MyObject baz ); // defined by user as doBar( MyObject baz ).
};

And now, calling a method in a category specific to the subclass simply goes through the cattable2.

Thoughts, suggestions, additional runtime geekery?

Creativity Finds a Way


Great observations

There is currently a nice little discussion on HyperCard going on in the comments on Stanislav Datskovskiy’s article Why HyperCard had to Die:

The article looks at the right facts, but I think draws the wrong conclusions: Yes, HyperCard was an echo of an era where a computer was a complex machine, and its owners were tinkerers who needed to customize it before it became useful. Yes, when Steve Jobs came back, he killed a lot of projects. And the Steve Jobs biography mentions that he doesn’t like other people screwing around with his designs.

But I do not think this automatically leads to the conclusion that Apple is on a grand crusade to remove the users’ control over their computers. Nor does it mean what many of the commenters say, that Apple is trying to dumb down programs and that programmers are underestimating their users.

How people work

Every programmer knows how important a coherent whole is: If a button appears in the wrong context, it will easily (and unintentionally) trick the user into thinking it does the opposite of what it really does. You can add paragraphs over paragraphs of text telling your users the opposite and they will not read it.

This is not because users are stupid, but because users “scan”. Screens are complex, and full of data. For the user to find something without spending hours of their life on it, they tend to quickly slide their eyes across the page, looking for words that come from the same category as the thing they are trying to do next.

This is a human efficiency optimization. It is a good thing. If we didn’t have this mechanism, we’d probably all be autistic, and incapable of coping with the world. Once a word is found, the user starts reading a little bit around it to verify the impression that this is what they want, and then they click the button.

It seems trivial to engineer a program for that, but it’s easy to overlook that a computer doesn’t run just a single application at a time. There are other things happening on the screen; there may be other windows open. There may be system alerts popping up. Even if they are marked with each application’s icon or name, chances are that most users are too busy getting actual work done to memorize application names and icons. They won’t be able to distinguish what is your application and what is another.

It is similar with haxies. Any halfway successful programmer probably has a story of how they tried to track down a crash or oddity a user encountered in their program that was actually caused by a plug-in or haxie that injects itself into every application to modify some behaviour system-wide. And once they are installed, even I occasionally forget I have them installed, or don’t expect them to have an effect: why should a tool that watches for my cursor hitting the edge of my screen and then remote-controls the cursor on another computer, as if it were an attached screen, cause the menu bar to just randomly not show up when switching between applications?

Software is complex. Designing reliable, usable software is complex. In a comment, Stanislav had a great analogy for this (in response to someone’s pipe dream that one would just have to use HTML, and the technical stuff was all already done, you just had to add the human touch):

All the pieces of the world’s greatest statue are sitting inside a granite mountain. Somebody just has to come and chip away all the extra granite, adding the human touch. The technical problems are all virtually solved!

Software is hard. I don’t say this because it makes me sound cooler when I say I’m a programmer, but because you’re not just building a thing. You are building behaviours. HyperCard was notorious for being the tool for the creation of a number of the ugliest, least Mac-like programs ever released on the Mac. Because even with the best camera, your movie is only as good as the camera man.

So was Steve Jobs happy to get rid of HyperCard and stop people from screwing with his design? Probably. Was he forced to let it linger instead of killing it outright because he didn’t want to lose the educational market? I can’t disprove it. But Steve Jobs was also known to be an idealist. He genuinely thought his work would improve the world. What would he gain by making everyone dumb and uncreative?

Why assume malice when Occam’s Razor is a much better explanation?

You can’t hold a good idea down

When the Mac was originally released, it was intended as a machine for everyone. To bring computers to the masses. Almost from day one, the goal of Apple Computer, Inc. has been to drop the darned “Computer” from their name. Compared to the mainframes of the time, the Apple ][ that started the home computing revolution already was a “dumbing down” of computers.

Was this the end of the world? Should we have stayed in the trees? Will people become un-creative? Look around on the net. There are people out there who have no programming skills, who dig around in the games they bought and modify them, create their own levels, use existing game engines to create a game about their favorite book or TV show. Heck, there are people out there who create a 3D game engine in Excel.

If there is one thing we can learn, it is that Creativity Finds a Way.

HyperCard was designed in the late 1980s, for hardware of the time, for what very smart people then thought would be the future. Being creative with a computer, at the time, meant writing code. So they gave us a better programming language, and ways to click on a “Link to…” button to create the code to change pages. Not unlike how Henry Ford’s competitors would have built you a better horse, but not a car.

Yes, I am saying that the developers of HyperCard didn’t quite anticipate the future correctly. They didn’t anticipate the internet, for example. That’s not a shame. It was ’87 back then. I didn’t get what the internet would be good for in ’91. I probably wouldn’t even have managed to invent a better horse. But anyway, all I am saying is that HyperCard’s creators didn’t know some things we know now, and probably made some compromises that wouldn’t make sense now.

The world has changed: This is 2011! All our programs do so much more. You can create 3D graphs in Excel, colorful drawings and animations in Keynote, and upload it all to the web with Sandvox. So many tools are available for such low prices. Why would you bother with a low-level, rudimentary tool like HyperCard when all you want to do is a movie with some branching?

A new tool for a new world

After all that, it might surprise you that I still agree with everyone in the comments who says that we need a new HyperCard for the 2010s. However, I do not agree that any of the examples the commenters mentioned (or even HyperCard as it shipped) are this program. Yes, Xcode and the NeXT-descended dev tools, and VB and others use the Rapid Application Development drag-and-drop manipulation to lay out your UI. But guess what? So does Pages.

Yes, you can use Ruby and Python and Smalltalk to branch between different choices. Or you could just use links to move between web pages built using Sandvox.

Yes, you can build real, runnable applications from your work with Java or AppleScript. But why would anyone want to build an application? Movies can be uploaded to YouTube, web sites can be built with WordPress, and I don’t have to transfer huge files to users. I just send my friends the link, and they know what to do. There’s no installer.

Our computing world has become so much richer, so much easier, that it is more efficient and actually smarter to just create your stuff with those tools that we old HyperCarders see as dumb. They can stand on the shoulders of giants, and spend their time creating the best possible gameplay instead of coding yet another 3D renderer. That is why HyperCard 2.4 just won’t cut it, or as David Stevens commented on that very same article:

most people get on a train to go somewhere, not because they really want to lay track, which explains the shortage of track laying machines in yard sales, and the demise of HyperCard.

The new HyperCard won’t be like HyperCard. Maybe the web is enough. Maybe it will just be a good “web editor”, like it used to be included in every copy of Netscape back in the day.

Or maybe, it will just be a niche product aimed at people who find that they want to do more than their tools let them do. This will not be the typical movie-maker, podcaster or writer. Like the directors, radio hosts or journalists in the generations before them, those will specialize. They will be exceptional at directing, making a show or researching the truth. But they will not care how the camera transports the film, they won’t care how their voice is really broadcast as radio waves and re-assembled in the receiver, nor how to build a printing press.

The people a new HyperCard is aimed at will be a person like you, who saw HyperCard, and at some point stood up and said: This is not enough. I want to create more. And then maybe went out and bought CompileIt!, which let her use the system APIs from the comfort of her HyperCard stack, only needing to use the scary memory management stuff when absolutely necessary. And then went and bought MPW, or THINK C, or CodeWarrior, or Xcode.

A real programmer doesn’t program because she wants to use HyperCard. A real programmer programs because she wants to. Because she just has to. A real programmer doesn’t limit herself to that one language and IDE she learned once so she never has to learn anything else. A real programmer learns new programming languages because each language teaches her a new way of looking at or solving a problem. A real programmer has been programming since she opened PowerPoint for the first time. She will find a way.

It’s been like that back in the days of HyperCard. Why shouldn’t it be like that again?

Dennis Ritchie Deceased

Apparently, a few days ago, Dennis Ritchie, the “R” in “K&R” and co-creator of the C programming language, has died.

Thank you for laying the groundwork for our profession, Mr. Ritchie.

Funny thing about C parameter evaluation order…

I just explained this to a friend today, and thought this might make an interesting blog posting:

#include <stdio.h>

int main( int argc, const char * argv[] )
{
    char  theText[2] = { 'A', 'B' };
    char* myString = theText;

    printf( "%c, %c\n", *(++myString), *myString );

    return 0;
}

The above code is platform-dependent in C. Yes, you read correctly: platform dependent. And I’m not nitpicking that this may cause a problem if your compiler is old or that some compiler may not have printf() or the POSIX standard.

This code is platform-dependent, because the C standard says that there is no guarantee in which order the parameters of a function call get evaluated. So, if you run the above code, it could print B, B (which most of you probably expected, because it corresponds to our left-to-right reading order) or it could print B, A. (Strictly speaking, incrementing myString in one argument while reading it in another, with no sequence point in between, is even undefined behaviour, so a compiler would be within its rights to do something stranger still.)

If you want to test this and you own an Intel Mac, you can do the following thanks to Rosetta’s PowerPC emulation: Create a new “Standard Tool” project in Xcode and paste the above code into the main.c file. Switch to “Release” and change “Architectures” in the build settings for the release build configuration to be “ppc”. Build and Run. It’ll print B, B. Now change the architecture to “i386” and build and run again. It’ll print B, A.

So, why doesn’t C define an order? Why did anyone think such odd behaviour was a good idea? Well, to explain that, we’ll have to look at what your computer does under the hood to execute a function call. In general, there are two steps: First, the parameters are evaluated and stored in some standardized place where the called function can find them, and then the processor “jumps” to the first command in the new function and starts executing it.

Some CPUs have registers inside the CPU, which are little variables that can hold short values, and which can be accessed a lot quicker than actually going over to a RAM chip and fetching a value. There are different registers for different kinds of values. Many CPUs have separate registers for floating-point numbers and integers. And just like with RAM, it’s sometimes faster to access these registers in a certain order.

So, it may be faster to first evaluate all integer-value parameters, and then those that contain floating-point values. Depending on what physical CPU your computer has (or in the case of Rosetta, what characteristics the emulated CPU your code is being run on has), these performance characteristics may be different. Some CPUs may have so few registers that the parameters will always have to be passed in RAM. Others may put larger parameters in RAM and smaller ones in registers; others again may put the first couple of parameters in registers (maybe even distributing a longer parameter across several registers) and put the rest that don’t fit into RAM, etc.

So, to make sure C can be made to run that little bit faster on any of these CPUs, its designers decided not to enforce an order for execution of parameters. And that’s one of the dangers of writing code in C++ or Objective C: It may look like a high-level language, but underneath it is still a portable assembler, with platform-dependencies like this.
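The upshot: if you rely on a particular order, sequence the evaluation yourself, for example with temporary variables:

char first = *(++myString); // Force the increment to happen first,
char second = *myString;    // then the second read, in exactly this order.
printf( "%c, %c\n", first, second );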

Generating Machine Code at Runtime

Okay, so my next attempt at learning how my computer works and how to speak machine language is the following little C program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*FuncPtr)();

int main( void )
{
    // Create a function (unsigned char, so the high opcode bytes don't overflow):
    unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                                   0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                                   0xC3 };                       // RET

    // Make a copy on the heap, OS doesn't like executing the stack:
    FuncPtr         testFuncPtr = (FuncPtr) malloc( sizeof(testFunc) );
    memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );

    printf( "Before function.\n" );
    int result = (*testFuncPtr)();
    printf( "Result %d\n", result );

    return 0;
}

Basically, this stores the raw opcodes of a function in an array of bytes. The first byte of each line is usually the opcode, i.e. 0x90 is No-Op, 0xB8 is a MOVL into the eax register (with the next 4 bytes being the number to store, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel’s documentation).

One thing to watch out for here (at least on Mac OS X), is that you’ll get a bad access error if you try to execute testFunc directly. That’s because testFunc is on the stack, and the stack shouldn’t contain executable code (it’s a small safety measure). So, what we do is we simply malloc some memory on the heap, and stuff our code in there.

You may wonder why I’m using eax of all registers to store my number 16 in. Easy: Because the convention is that an int return value (and most other 4-byte return values) goes in eax when a function returns. So, what this does is it essentially returns 16. Which our printf() proves. Neat!

Intel’s documentation describes the opcodes in a very complicated way, so what I essentially do is I write some assembler code and enclose the instruction whose byte sequence I want to find out in instructions whose byte sequence I already know (I like to use six nops, which are short and show up as 0x90 90 90 90 90 90). Then I compile that, and then use a hex editor to search for the known instructions, and whatever is between them must be my new one. Here’s a small table of other operations you may find in the typical program and what byte sequences they turn to:

0x50 pushl %eax
0x53 pushl %ebx
0x55 pushl %ebp
0x89 E5 movl %esp, %ebp
0x90 nop
0xB8 NN NN NN NN movl $N, %eax
0x68 NN NN NN NN pushl $N
0xE8 NN NN NN NN call relativeOffsetNFromEndOfInstruction
0x8B 1C 24 movl (%esp), %ebx
0x8D 83 NN NN NN NN leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN leal relativeOffsetToData(%ebp), %eax
0x5B popl %ebx
0x83 C4 NN addl $NN,%esp
0x83 EC NN subl $NN,%esp
0x8B 00 movl (%eax), %eax
0x89 45 NN movl %eax, NN(%ebp)
0xC9 leave
0xC3 ret

The code above is essentially what one would need to create a just-in-time compiler. For a real compiler, instead of executing this directly, we’d have to write it to a complete Mach-O file and link it with crt1.o.

Update: On top of the instructions for position-independent code (PIC), I’ve also added some more instructions that are useful for passing structs as parameters on the stack.