There are 7 entries in this Category.

Raw graphics output in Linux: Part 2


In Part 1 of this series, we’ve set up a command-line Linux in the VirtualBox emulator with support for direct frame buffer access, the git version control system and the clang compiler. Now let’s use this to draw graphics to the screen “by hand”.

Getting the code

The code we’ll be using is on my Github. So check it out, e.g. by doing:

mkdir ~/Programming
cd ~/Programming
git clone 'https://github.com/uliwitness/winner.git'

Now you’ll have a ‘winner’ folder in a ‘Programming’ folder inside your home folder. Let’s build and run the code:

cd winner
sudo ./winner

Screen Shot 2015-10-03 at 16.48.13

This code just drew a few shapes on the screen and then immediately quit. The Terminal was rather surprised about that, so just prints its last line on top of that.

How to access the screen

It took me a bit of googling, but eventually I found out that, to draw on the screen in Linux, you use the framebuffer. As most things in Linux, the frame buffer is a pseudo-file that you can just open and write to. This pseudo-file resides at /dev/fb0, and is the whole reason for the extra hoops we jumped through in Part 1 because a minimal Ubuntu doesn’t have this file.

So if you look at the file linux/framebuffer.hpp in our winner subversion repository, it simply opens that file and maps it into memory, using the ioctl() function and some selector constants defined in the system header linux/fb.h to find out how large our screen is and how the pixels are laid out.

This is necessary, as at this low level, a screen is simply a long chain of bytes. Third row chained after second row after first row. Each row consists of pixels, which consist of R, G, B and optionally alpha components.

By mapping it into memory, we can use the screen just like any other block of memory and don’t have to resort to seek() and write() to change pixels on the screen.


Since computers are sometimes faster when memory is aligned on certain multiples of numbers, and you also sometimes want to provide a frame buffer that is a subset of a bigger one (e.g. if a windowed operating system wanted to launch a framebuffer-based application and just trick it into thinking that the rectangle occupied by its window was the screen), the frame buffer includes a line length, x-offset and y-offset.

X and Y offset effectively shift all coordinates, so define the upper left corner of your screen inside the larger buffer. They’re usually 0 for our use case.

The line length is the number of bytes in one row of pixels, which may be larger than the number of pixels * number of bytes in one pixel, because it may include additional, unused “filler” bytes that the computer needs to more quickly access the memory (some computers access memory faster if it is e.g. on an even-numbered address).

Actually drawing into the frame buffer

The actual drawing code is in our image class, which doesn’t know about frame buffers. It just knows about a huge block of memory containing pixels, and its layout.

The main method in this class is set_pixel() which calculates a pointer to the first byte of a pixel at a given coordinate, and then, depending on the bit depth of the pixels in the bitmap, composes a 2-byte (16 bit) or 4-byte (32 bit) color value by filing out the given bits of our buffer.

All other drawing methods depend on this one:

Drawing rectangles

If you look at fill_rect, it simply takes a starting point (upper left corner of the rectangle) and then fills rows of pixels with that color.

To draw a frame around a rectangle is almost the same. We simply fill as many top and bottom rows as our line width dictates, and the rows in between get filled with a pixel (or whatever our line width is) at the left and right of our rectangle.

Drawing lines

Drawing one-pixel lines involves a tad of basic maths, but it’s nothing that you couldn’t get from a quick glance at Wikipedia. You take the line equation called the “point-slope-form”.

Then you calculate the line’s slope based on your start and end point. If the line is more horizontal than vertical, you loop over the X coordinate from start to end and use that and the slope to calculate the corresponding Y. If it is more vertical than horizontal, you loop over the Y coordinate to get the X instead.

Now, if you use this naïve approach, you may get small gaps in the line, because lines work with fractional numbers, while our computer screen only has full, integer pixels. This is why this example uses a variation on the same process that was invented by someone named “Bresenham”, which keeps track of the loss of precision and adds pixels in as needed.

Now drawing a line of more than one pixel width is a little harder. You see, lines are really infinitely thin, and don’t have a width. When you draw a line of a certain width, what computers usually do is either draw a rotated rectangle that is centered over the line and is as long as it is, and as wide as your line width, or it simply rubber-stamps a filled square or circle of the line width centered over each point on the line, which gives a similar look.

I essentially go with the latter approach in this example, but since I plan to eventually support different opacity for pixels, I do not want to draw whole boxes each time, because they would overlap and a 10% opaque line would end up 20% opaque in every spot where they overlap. So I just detect whether a line is mainly horizontal or vertical, then draw a horizontal or vertical 1 pixel line of the line width through each point.

This isn’t quite perfect and gives diagonal lines a slanted edge, and makes them a bit too wide, so I eventually plan to at least change the code so the small lines are drawn at a 90° angle to the actual line you’re drawing. But that’s not done yet.

Drawing circles

Again, I just get the equation for circles off Wikipedia. It says that r2 = (x-centerX)2+(y-centerY)2. Where “r” is the radius of the circle you want to draw, x and y are the coordinates of any point which you want to test whether it is on the circle, and centerX and centerY are the center of the circle.

Once you know that, you can draw a circle like you draw a rectangle. You calculate the enclosing rectangle of our circle (by subtracting/adding the radius from/to the center point) and then, instead of just drawing the rectangle, you insert each point into the circle equation. If the right-hand-side equates to r2 or less, the point is in the circle, and you can draw it, otherwise you skip this point.

Drawing the outline of a circle is just a specialized version of filling it here. Instead of checking whether the equation comes up as < r2, you also check whether it is greater than (r -lineWidth)2. So essentially you’re checking whether a point lies between two circles, the inner edge of your outline, and the outer edge of it.

This is probably not the optimal way to draw a circle, but it looks decent and is easy enough to understand. There are many tricks. For example, you could calculate only the upper right quarter of the circle, then flip the coordinate horizontally and vertically around the center and thus draw 4 points with every calculation. Bresenham even came with an algorithm where you only calculate 1/8th of a circle’s pixels.


The library doesn’t do ovals yet, but I think they could be implemented by using the circle equation and multiplying the coordinate of the longer side of the surrounding rectangle by the ratio between width and height. That way, your coordinates are “projected onto a square”, in which you can use the circle equation.

There are probably more efficient ways to do this.

Drawing bitmaps and text

To draw a bitmap (or rather, a pixel map) is basically a special case of rect drawing again. You take a buffer that already contains the raw pixels (like letterA in our example main.cpp). For simplicity, the code currently assumes that all images that you want to draw to the screen use 32-bit pixels. That also allows us to have a transparency value in the last 8 bits.

It simply draws a rectangle that is the size of the image, but instead of calling set_pixel() with a fixed color, it reads the color from the corresponding pixel in the pixel buffer we are supposed to draw. It also only draws pixels that are 100% opaque.

Text drawing is now simply a special case of this. You create a bitmap for every letter, then when asked to draw a certain character, load the corresponding bitmap and draw that. Of course, serious text processing would be more complex than that, but that is the foundational process as far as a drawing engine is concerned.

You’d of course need a text layout engine on top of that to handle wrapping, and other code to e.g. combine decomposed characters. Also, if you wanted to support the full Unicode character set (or even just all Chinese glyphs), you’d probably want to make your look-up happen in a way that you don’t need to load all bitmaps immediately, but can rather lazy-load them as they are used.


When we later implement our own window manager, we will need to be able to have windows overlap. To do that, we need to be able to designate areas as “covered” and have set_pixel() just not draw when asked to draw into those.

This is not yet implemented. The general approach is to have a bitmap (i.e. a pixel buffer whose pixels only occupy 1 bit, on or off) of the same size as our pixel buffer that indicates which pixels may be drawn into (usually that’s called a “mask”).

There are of course various optimizations you can apply to this. The original Macintosh’s QuickDraw engine used a compressed form of a bitmap called a “Region”, which simply contained entries for pixels in each line indicating the length of each color. I.e. “5 pixels off, 10 pixels full”. Some graphics engines simply only allow to clip to rectangles (which can be described by 4 coordinates). If all your windows are rectangular, that is sufficient.

The only clipping the image class currently implements is that circles that fall off any of the edges get clipped, and that rectangles and bitmaps that fall off the bottom or right edges get clipped. The way rectangles are currently specified, it is impossible to have them fall off the left or top, as that would require negative coordinates.

If you currently try to draw outside the image’s defined area using set_pixel(), you will corrupt memory. For a shipping drawing system you’d want to avoid this, and we’ll get to this once we implement a higher-level drawing system on top of this one that deals with clipping, coordinate systems and transformations.

Raw graphics output on Linux: Part 1


In my quest to understand better how my computer works, I decided I want to write a very minimal window server. The first step in that is to create something that performs raw graphics output to the screen, directly to its back buffer.

So, as a test bed, I decided to grab the VirtualBox emulator and install Ubuntu Minimal on it. Ubuntu Minimal is a (comparatively) small Linux that is still easy to install, and will provide the graphics drivers we’ll be talking to, and a file system and a loader to load the code to run.

If you just want to know how drawing itself works, feel free to skip to Part 2 in this blog series.

Setting up the virtual machine

Setting up a VM is fairly self-explanatory with the setup assistant in VirtualBox. It has presets for Linux and even for various Ubuntus, and most of the time the defaults are fine for us:

Screen Shot 2015-10-03 at 01.15.15

Screen Shot 2015-10-03 at 01.15.44

Screen Shot 2015-10-03 at 01.15.51

Screen Shot 2015-10-03 at 01.16.06

Screen Shot 2015-10-03 at 01.16.19

I’m choosing to name the VM “Winner”, short for window server, but you can choose whatever name you like:

Screen Shot 2015-10-03 at 01.16.34

Now you have a nice emulated empty computer

Screen Shot 2015-10-03 at 01.16.50

Now, we need to tell it to pretend that the mini.iso Linux disk image file we downloaded from Ubuntu was a CD inserted in its optical drive by selecting the “Empty” entry under the CD, then clicking the little disc icon next to the popup on the right to select a file:

Screen Shot 2015-10-03 at 01.17.14

Note that you would have to use the “Choose Virtual Optical Disk File…” item, I have the mini.iso entry in here already because I previously selected the file.

Screen Shot 2015-10-03 at 01.17.28

Screen Shot 2015-10-03 at 01.17.40

Now you can close the window using the “OK” button and click the green “Start” arrow toolbar icon to boot the emulated computer.

Installing Ubuntu Minimal

Screen Shot 2015-10-03 at 01.18.35

Ubuntu will boot up. Choose “Command-Line install” and use the arrow and return keys to navigate through the set-up. Pick your language, country and keyboard layout (if you’re on a Mac, choose to tell it instead of having it detect, and pick the “Macintosh” variant they offer):

Screen Shot 2015-10-03 at 01.18.49

It will then churn a bit:

Screen Shot 2015-10-03 at 01.21.03

And then it will ask you to name your computer:

Screen Shot 2015-10-03 at 01.21.24

You can pick pretty much any name for your emulated home computer, it doesn’t really matter for what we are doing. I picked “winner”.

Then it will ask you to choose the country you are currently in, so it can pick the closest server for downloading additional components:

Screen Shot 2015-10-03 at 01.21.35

And if they have several servers in your country, they’ll offer a choice. Just pick whatever it offers you, it’ll be fine.

Screen Shot 2015-10-03 at 01.21.58

Then it will ask you if you need to use a proxy. Unless you’re in a weird restrictive company or university network or trying to get around an oppressive government’s firewall, you can just leave the field empty and press return here to indicate no proxy is needed:

Screen Shot 2015-10-03 at 01.22.18

Then it will churn some more, downloading stuff off the internet etc.:

Screen Shot 2015-10-03 at 01.22.42

Now it’s time to set up your user account, password (twice) etc.:

Screen Shot 2015-10-03 at 01.23.39

Screen Shot 2015-10-03 at 01.23.45

In this emulator, we don’t need an encrypted hard disk (If you need it, your computer’s hard disk is probably already encrypted, and your emulated computer’s files are all stored on that anyway).

Screen Shot 2015-10-03 at 01.24.40

Then it will ask you about some system clock settings (the defaults should all be fine here:

Screen Shot 2015-10-03 at 01.25.06

Then it will ask how to partition and format the hard disk. You’re not dual-booting anything, the emulated computer is for Linux only, so just let it use the entire disk:

Screen Shot 2015-10-03 at 01.25.31

And don’t worry about selecting the wrong disk, it will only offer the emulated hard disk we created. Tell it to create whatever partitions it thinks are right:

Screen Shot 2015-10-03 at 01.26.02

And it will churn and download some more:

Screen Shot 2015-10-03 at 01.26.11

Since we may want to keep using this for a while, let’s play it safe and tell it to apply any important updates automatically:

Screen Shot 2015-10-03 at 01.36.03

And when it asks if it is OK to install the boot loader in the MBR, just say yes:

Screen Shot 2015-10-03 at 01.38.22

Again, there is no other operating system inside this emulation, they’re just being overly cautious because so many linux users have weird setups.

For the same reason, you can just let it run the emulator with a UTC system clock as it suggests:

Screen Shot 2015-10-03 at 01.38.38

That’s pretty much all. Tell it to restart, and quickly eject the CD disk image by un-checking it from your “Devices” menu:

Screen Shot 2015-10-03 at 01.38.39

Setting up Ubuntu

Ubuntu is pretty much ready to go. You’ll have a neat command line OS. However, for our purposes, we want to have graphics card drivers. Since this is the minimal Ubuntu, a lot is turned off, so let’s turn that back on again and install some missing parts that we want for our experiments. Log in with your username and password and edit the configuration file /etc/default/grub which tells the bootloader what to do:

Screen Shot 2015-10-03 at 12.22.58

If you’re unfamiliar with the Unix Terminal, just type sudo nano /etc/default/grub and enter your password once it asks. sudo means pretend you’re the computer’s administrator (as we’re changing basic system settings, that’s why it wants your password). nano is a small but fairly easy to use text editor. It shows you all the commands you can use at the bottom in little white boxes, with the keyboard shortcuts used to trigger them right in them (“^” stands for the control key there):

Screen Shot 2015-10-03 at 12.23.33

Most of the lines in this file are deactivated (commented out) using the “#” character. Remove the one in front of GRUB_GFXMODE to tell it we want it to use a graphical display of that size, not the usual text mode that we’re currently using.

Save and close the file (WriteOut and Exit, i.e. Ctrl+O, Ctrl+X in nano).

Now usually this would be enough, but Ubuntu Minimal is missing a few components. So now type sudo apt-get install v86d. This tells Ubuntu to install the v86d package that does … something. If you left out this step, you would get an error message telling you that v86d doesn’t work on the next step. Confirm that you want to install these whopping 370kb of code by pressing “y” when asked. It will churn a bit.

Type in sudo modprobe uvesafb. The graphics drivers on Linux all implement the so-called “framebuffer” commands. That’s what “fb” here stands for. VirtualBox emulates a “VESA” display, and “uvesafb” is the modern version of the “vesafb” graphics driver you’d want for that. So we’re telling our Kernel to load that module now.

If all works, all that you should see is that your screen resizes to 640×480, i.e. becomes more square-ish:

Screen Shot 2015-10-03 at 12.25.54

Now we don’t want to manually have to activate the frame buffer every time, so let’s add it to the list of modules the Kernel loads automatically at startup. Type sudo nano /etc/initramfs-tools/modules to edit the module list and add “uvesafb” to the end of the list (in my case, that list is empty):

Screen Shot 2015-10-03 at 14.51.45

The professionals also suggest that you check the file /etc/modprobe.d/blacklist-framebuffer.conf to make sure it doesn’t list “uvesafb” as one of the modules not to load. If it does, just put a “#” in front of it to deactivate it.

Screen Shot 2015-10-03 at 12.51.22

Now run sudo update-initramfs -u which tells the system to re-generate some of the startup files that are affected by us adding a new module to the list. It will churn for a moment.

Now we need a nice compiler to compile our code with. There’s probably a copy of GCC already on here, but just for kicks, let’s use clang instead, which gives nicer error messages. Enter sudo apt-get install clang:

Screen Shot 2015-10-03 at 12.26.34

Finally, we need a way to get our source code on this machine, so let’s install the git version control system:

sudo apt-get install git

OK, now pretty much everything we need is set up. Part 2 in this series will get us to actually running some code against this graphics card driver.

You can shut down your virtual Linux box until you’re ready to try Part 2 by typing sudo poweroff.

Death to Booleans!


One of the most annoying aspects of most C-descended languages is that function calls become kind of unreadable when they have more than a single boolean parameter. The calls start looking like:

    OpenFile( "/etc/passwd", true, true, false );

and you have no idea what effect each boolean actually has. Sometimes people solve this by naming all parameters in the function name, but of course that doesn’t permit adding more optional parameters to a function later, because you’d have to change the name:

    OpenFilePathEditableSaveSavingAllowNetworkURLs( "/etc/passwd", true, true, false );

A disciplined programmer will solve this by adding an enum and using that instead of the booleans:

    enum FileEditability { kReadOnly, kEditable }
    enum FileSafeSaveability { kSafeSave, kOverwriteInPlace }
    enum FileAllowNetworkURLs { kFileURLsOnly, kAllowNetworkURLs };
    void    OpenFile( const char* path, enum FileEditability fe, enum FileSafeSaveability fs, enum FileAllowNetworkURLs fu );

Or maybe just make all booleans a “flags” bitfield:

        kEditable = (1 << 0),
        kSafeSave = (1 << 1),
        kAllowNetworkURLs = (1 << 2)
    typedef uint32_t FileOpenFlags;
    void    OpenFile( const char* path, FileOpenFlags inFlags );

But that requires the foresight to never use a single boolean. And of course the actual discipline.

Wouldn't it be nice if C had a special provision for naming booleans? My first thought was allowing to specify enums in-line for parameters:

    void OpenFile( const char* path, enum { kReadOnly, kEditable } inReadOnly );

But to be convenient, this would require some rather too-clever scoping rules. It'd be easy to make the enum available to all callers when they directly call the function, but what about cases where you want to store the value in a variable? Maybe we could do C++-style scope resolution and allow saying OpenFile::kReadOnly ?

Would be a nice way to make it easy to name parameters, but not really readable.

I guess that's why other languages have named parameters instead. Avoids all those issues. So...

The boolean is dead! Long live the boolean! (as long as you have named parameters to label them with)

The fast road to unit tests with Xcode

Supposedly Xcode has unit test support. I’ve never seen that work for more than two Xcode revisions. So I’ve come up with a minimal unit test scheme that works reliably.

1) Add a “command line tool” target (Foundation application, C++ application, whatever makes sense). Put your test code in its main.m or whatever. After each test, print out a line starting with “error: ” if the test failed. If you want to be able to see the successes as well, start them with “note: “. Keep a counter of failed tests (e.g. in a global). Use the number as the app’s return value of your main().

2) Add a “Run Shell Script” build phase to this target, at the very end. Set it to run ${TARGET_BUILD_DIR}/${PRODUCT_NAME}. Yes, that’s right, we make it build the unit test app, then immediately run it. Xcode will see the “error: ” and “note: ” lines and format them correctly, including making the build fail.

3) Optionally, if you want these tests to run with every build, make that command line tool target a dependency of your main app, so it runs before every build. Otherwise, just make sure your build tester regularly builds this test target.

4) Add a preprocessor switch to the tests that lets you change all “error:” lines into “warning:” instead. Otherwise, when a test fails, you won’t be able to run it in the debugger to see what’s actually going wrong.

What a block really is

BlocksLegoBrickAfter quite a while of thinking that Objective-C blocks did some mean magic on the stack, it simply took me seriously using C++’s lambdas (their implementation of the concept) that I realized what blocks are.

Effectively, a block is simply a declaration of a class, plus an instantiation of one instance of that class, hidden under syntactic sugar. Don’t believe me? Well, let’s have a look at C++ lambdas to clear things up:

MyVisitorPattern( [localVariableToCapture]( MyObject* objectToVisit ) { objectToVisit->Print(localVariableToCapture); }, 15 );

The red part is a C++ block. It’s pretty much the same as an Objective-C block, with two differences:

  1. You explicitly specify which local variables to capture in square brackets.
  2. Instead of the ^-operator, you use those square brackets to indicate that this is a block.

Seeing the captured variables specified explicitly listed here, like parameters to a constructor, made me realize that that’s really all that a block is. In-line syntax to declare a subclass of a functor (i.e. an object whose entire purpose is to call a single of its methods), and return you an instance of that class. In ObjC-like pseudo-code, you could rewrite the above statement as:

@interface MYBlockSubClass : NSBlock
    int localVariableToCapture;

-(id) initWithLocalVar: (int)inLocalVariableToCapture;

-(void) runForObject: (MyObject*)objectToVisit;


@implementation MYBlockSubClass
-(id) initWithLocalVar: (int)inLocalVariableToCapture
    self = [super init];
    if( self )
        localVariableToCature = inLocalVariableToCapture;
    return self;

-(void) runForObject: (MyObject*)objectToVisit

and at the actual call site:

MyVisitorPattern( [[MYBlockSubClass alloc] initWithLocalVar: localVariableToCapture], 15 );

The difference is that C++ (and even more so Objective-C) automatically declare the class for you, create the instance variables and constructor for the variables you want to capture, pick a unique class name (which you can see in the stack backtraces if you stop the debugger inside a block) and instantiate the class all in a single line of code.

So there you see it, blocks aren’t really black magic, they’re 99% syntactic sugar. Delicious, readability-enhancing syntactic sugar. Mmmmmh…

PS – Of course I’m simplifying. Objective-C blocks are actually Objective-C objects created on the stack, which you usually can’t do in plain Objective-C, though it can be done with some clever C code if you insist.

A more magical approach to blocks

That said, there is a fancier way for a compiler developer to implement blocks that also makes them 100% compatible with regular C functions:

If you implement a function in assembler, you can stick additional data onto the end of a function and calculate an offset between an instruction and the end of the function (e.g. by just filling the end of the function with a bunch of 1-byte No-Ops). This means that if someone duplicates a block, they’ll duplicate this data section as well. So what you can do is declare a struct equivalent to the captured variables, and implement your code with (pseudocode):

void    MyBlock( void )
struct CapturedIVars * capturedIVars = NULL;

    capturedIVars = pointer_to_current_instruction + ivarsSection-currentInstruction;

    // Block's code goes here.
    goto ivarsSectionEnd; // Jump over ivars so we don't try to execute our data.

Now you can use the capturedIVars pointer to access the data attached to your function, but to any caller, MyBlock is just a plain old function that takes no arguments. But if you look at it from a distance, this is simply an object prefixed with a stub that looks like a function, so our general theory of blocks being just special syntax for objects holds.

I presume this is how Swift implements its blocks, because it really doesn’t distinguish between blocks and functions.

A proposal: Categories for C++


One of the most useful features I’ve used in object-oriented programming languages is the “category”. A category is a way of adding methods to a class dynamically. Take, for example Objective C’s “NSAttributedString” class. As defined in the Foundation framework, it is simply a string where you can attach key-value pairs to a range of it.

Only a category in the GUI framework AppKit actually defines the methods and constants that define what key to use for “bold”, and let you draw such a string to the screen. Yet in everyday use with AppKit, these two disparate parts feel like one homogeneous whole. And, more importantly, anyone who wants to write a text-processing command line tool can use NSAttributedString without having to drag in all that unneeded drawing code.

How C++ objects work

But C++ does not have this feature. So let’s come up with an implementation of this concept that a compiler vendor could implement, and that stays true to the core of C++, mainly its compile-time determination of as much as possible. But of course we want to be able to add a category to a system class or a class in another module, and want to define virtual methods in a category, and override them in a category on a subclass. So, of course we want to end up at something that looks like this in C++:

category BarSupport : MyObject // extend class MyObject with some methods.
    void doBar( MyObject baz );

and can be called just like any other method on MyObject, e.g.

MyObject foo, dodo;
foo.doBar( dodo );

If we didn’t have to support virtual methods, things would be trivial. A non-virtual method call like

MyObject foo;

compiles to something like

MyObject_doBar( foo, baz );

Where the this pointer is simply passed in as the first parameter before the first parameter you defined. So all we’d have to do is tell the parser about these new methods, compile the functions that implement them, and call them.

But virtual methods work differently. First, there is a virtual function table, something like this:

struct MyObjectVTable
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).

And whenever you define a new class, what it actually does is add a hidden instance variable at the start:

struct MyObject
    struct MyObjectVTable *vtable;
    int                   firstInstanceVariable;

And it declares a global variable containing the vtable once, for all objects created with this class, and stashes it in each new object’s vtable instance variable:

struct MyObjectVTable gMyObjectSharedVTable = { MyObject_doFoo };


// Equivalent code to MyObject* foo = new MyObject :
struct MyObject* foo = malloc(sizeof(struct MyObject));
foo->vtable = &gMyObjectSharedVTable;
MyObject_Constructor( foo );

Here, MyObject_doFoo() is a function just like our


above. But when a virtual method is called, it is done slightly differently:

foo.vtable->doFoo( &foo );

This means that a subclass that wants to override doFoo() can provide its own gMySubclassSharedVTable that is also a struct MyObjectVTable, but contains a pointer to MySubclass_doFoo instead, and thus overrides the behaviour of doFoo() for all MySubclass objects. And if it wants to define additional virtual methods in its subclass, it can simply declare a struct MySubclassVTable that starts with the same fields in the same order as struct MyObjectVTable, and contains the additional methods after that. That way, anyone who expects an object for the base class doesn’t even have an inkling that there is more stuff after the methods it knows.

Applying that to Categories

Our categories also want to add methods, but the problem is we can’t just add new fields to the struct. The system classes like std::string have already been compiled, and we can’t just change their code to add these ivars. But what we can do is declare a parallel class hierarchy for our categories. So, first we declare a struct for our category:

struct MyObject_BarSupportVTable
    void (*doBar)( struct MyObject* this, struct MyObject* baz );

Then we extend the vtable to list categories:

struct CategoryEntry
    MyObject_BarSupportVTable* catVTable; // Simplified, each category's vtable is really a different struct.

struct MyObjectVTable
    struct CategoryEntry *cattable;
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).

When a class is first built, the cattable array is empty, but as soon as someone declares a category, it gets added to that list. We just add some initialization code at the start of main() that mallocs the list. But how do we look up the vtable for a category? We could store its name, but then we’d have to loop over this list on each call and compare category names until we find it. How can we make that more efficient?

Simple: When we add the category to the class, we remember the index into the array where we added this category in the list, cache it in a global variable int gMyObject_BarSupport_Index; and then we can call a virtual method in a category like:

foo.vtable->cattable[gMyObject_BarSupport_Index].catVTable->doBar( &foo, baz );

Again, a subclass-category that overrides this method can just provide its own doBar method in the struct MyObject_BarSupportVTable.

The cost of categories

Of course this means that, just like with C++ virtual methods, and a bit more so, you pay a price for calling a virtual method in a category. You also pay with a little bit of overhead at startup, when the categories are added to the base class vtable. And just like with C++ virtual methods, to override a category method in a subclass, you have to know that the category exists on the base class. And if your program uses threads and you load a dynamic library that contains a category on a system class, bad things could happen while the category lists change under your running code’s rear.

Another gotcha with this approach is that the category list has to be built from the base class to the subclasses, because once a subclass locks down an index in the table for its first category vtable, you can’t add another category to the base class (it would have the index of the subclass’s category). But even that can be fixed:

We can just give every subclass its own categories table. So it only deals with the base class table when it wants to call a method inherited from the base class. E.g.:

struct MySubclassVTable
    struct CategoryEntry *cattable;
    void (*doFoo)( struct MyObject* this ); // defined by user as doFoo(void).
    struct CategoryEntry2 *cattable2;
    void (*doBar)( struct MyObject* this, struct MyObject baz ); // defined by user as doBar( MyObject baz ).

And now, calling a method in a category specific to the subclass simply goes through the cattable2.

Thoughts, suggestions, additional runtime geekery?

Runtime Time

[Image of stylized running human with ones and zeroes in its wake.]One of the more obscure parts of creating a new programming language is the so-called runtime. The runtime, essentially, is the part of your programming language that needs to be added to a compiled program to make it work. The best-known part of the runtime is probably the runtime library. Many programming languages have a runtime library, and it contains things like mathematical functions, common algorithms, and commands to deal with lists and strings.

While in most languages some of these things may actually be built into the language itself instead of being a separate library, the code has to come from somewhere, and usually consists of a pre-written module of code that just gets added to every program you compile. Especially in higher-level programming languages, the runtime library may take on epic dimensions, and include things like a sockets API for internet communications, or a whole 3D engine. Java is best known for having an extensive collection of useful libraries (and, as some people like to point out, maybe a few on top of that). But another thing that is usually part of the runtime in object-oriented programming languages is much less obvious to most people: The object model.

You see, computing (and actually, all of craft itself) has a long history of bootstrapping, i.e. building new things based on old things. The mold for a new chisel may be created by using its predecessor on a piece of rock, or whatever. Thus, most people who create a new object-oriented programming language do so in a procedural language. What’s misleading is that often people will say things like: “I can write programs in Objective C that I just wouldn’t be able to do in C”. That’s not quite true. There’s nothing keeping you from writing object-oriented code using straight C (In fact, early versions of C++ were simply a program that translated a special form of "C with classes" into straight C). It’s just that it’s a bit of a pain to do so, not to mention that if the OO features aren’t part of the language, you usually end up reinventing what you’d get for free and already optimized from an object-oriented language, and every library would invent their own, making inter-operability a bit of a problem.

That part you’d have to reinvent is the runtime. To make things easier, I’ll just walk you through how you’d write object-oriented code with C. The first thing you’d have to do is create classes. That’s easy: In C++, classes are the same as structs, so surely they’ll be similar. So, let’s have a struct Person:

[Memory layout of the Person struct: 20 bytes for mName, followed by 4 bytes for mAge]struct Person
    char    mName[20];
    int     mAge;

Now, we want to subclass Person. What we want is something that is a Person, plus something more. Let’s create a superhero, who has a super power:

[Memory layout of the Superhero struct: 20 bytes for mPerson.mName, 4 bytes for mPerson.mAge and that followed by 40 bytes for mSuperpower]struct Superhero
    Person mPerson;
    char    mSuperpower[40];

Now, as it so happens, everyone who expects a Person will find that one can also use an object of this type, as name and age are exactly in the same position, since the Person is first. That is, the above could be rewritten as:

struct Superhero
    char    mPerson_mName[20];
    int     mPerson_mAge;
    char    mSuperpower[40];

A Superhero will thus be usable everywhere a Person is expected, and we don’t even need a typecast, we can just pass its mPerson to functions that expect a person. And if we have a case where we are generally satisfied with a Person, but can do something special if it’s a Superhero, we can take advantage of the fact that a pointer to mPerson is the same as a pointer to the object itself, and do:

void SaveInnocents( Person* saver, int howMany )
    if( strcmp( saver->mName, "Clark Kent" ) == 0 ) // We know he's a superhero!
        DoSaveInnocentsWithSuperpower( (Superhero*) saver, howMany );
        DoSaveInnocentsAsBestYouCan( saver, howMany );

Handy, huh? But of course you’re burning to see how to call a function on an object here. Well, that’s easy. Where in C++ you would write

clarkKent->SaveInnocent( 15 );

In our OO-C, you’d simply rewrite that to:

SaveInnocent( clarkKent, 15 );

So, we simply insert a pointer to this as the first parameter. We can even name that parameter this or self or me or whatever your favorite programming language calls it.

Now, it’s a little awkward to have each SaveInnocents()-function contain a huge block of if-then-else statements to handle all classes. Imagine doing this for the reign of Supermen… there’s Clark Kent, Henry Irons, Superboy, the Cyborg, Eradicator, and then all of the Justice League… Surely there’s a way to split this across separate functions for each class, like in real OO languages? Yes, there is. What you do is simply use function pointers to remember the address of the function to call:

struct Person
    char    mName[20];
    int     mAge;
    void (*mSaveInnocentProc)( int howMany );

And when you create a new Person, you do:

struct Person peteRoss;

strcpy( peteRoss.mName, "Pete Ross" );
peteRoss.mAge = 20;
peteRoss.mSaveInnocentProc = DoSaveInnocentsAsBestYouCan;

And when you create a new Superhero you do:

struct Superhero barryAllen;

strcpy( barryAllen.mPerson.mName, "Barry Allen" );
barryAllen.mPerson.mAge = 30;
strcpy( barryAllen.mSuperpower, "High Speed" );
barryAllen.mPerson.mSaveInnocentProc = DoSaveInnocentsWithSuperpower;

And now, whenever you want to save an innocent, what you do is:

peteRoss.mSaveInnocentProc( &peteRoss, 15 );
barryAllen.mPerson.mSaveInnocentProc( &barryAllen, 15 );

And whaddaya know, except for having to explicitly provide a “this” object, this already looks a lot like C++. But this technique has one huge drawback: Every object carries around a pointer to all its member functions (or “methods”), and there’s no way to call through to the superclass easily. What to do? In general, all objects of the same class use the same methods, so the easiest route to go would probably be to just share the list of functions a class understands across all instances:

struct PersonClassData
    void (*mSaveInnocentProc)( int howMany );

struct PersonClassData gSharedPersonClassData = { DoSaveInnocentsAsBestYouCan };

struct SuperheroClassData
    PersonClassData mPersonMethods;
    // Any methods Superhero adds would follow here.

struct SuperheroClassData gSharedSuperheroClassData = { { DoSaveInnocentsWithSuperpower } };

Which uses the same “superclass stuff at identical position at start”-approach as we used for the actual objects of our class. And once we have that, we change Person to:

[A Person and a Superhero object, pointing to their corresponding ClassData structs]struct Person
    struct PersonClassData* mMethods;
    char    mName[20];
    int     mAge;

And now, when we create a new Person, we set its mMethods to point to gSharedPersonClassData, and when we create a new Superhero, we set it to gSharedSuperheroClassData instead. And when we call those two methods, we do:

peteRoss.mMethods->mSaveInnocentProc( &peteRoss, 15 );
barryAllen.mPerson.mMethods->mSaveInnocentProc( &barryAllen, 15 );

Not too shabby, isn’t it? And on top of that, we can now find out what class an object belongs to, by simply comparing mMethods to gSharedSuperheroClassData. But we still can’t call through to the superclass. Well, we could just directly look up mSaveInnocentProc in gSharedPersonClassData (which is effectively what C++ does these days). But you can also add a field named super to the base class’s shared class data, which points to NULL if this object really is of the base class, or to the immediate superclass if it’s a subclass. Then you can follow each of these pointers up to the base class, if you want to.

There are many more fun things you can do. For example, many languages have special “constructor” and “destructor” functions in the class data that are automatically called when the memory for an object has been allocated, so you can initialize the objects’ values, and share the initialization code for inherited variables by calling through to the superclass’s function but passing it your class’s mPerson or whatever is appropriate.

Also, the fixed layout above is a little problematic because, if you want to add a function to a class that has subclasses, all these subclasses will need to be recompiled because the functions they may have added will have been moved down. While this isn’t that big a deal if this is all your program, if you have some libraries that subclass the changed class, you might not be able to recompile them. One way to solve this is to keep the list of functions in a hash map or other kind of associative array, and look them up by name each time you call them, because no matter at what offset in the map your function ends up, the name will stay the same.

Similarly, you can make it possible to add instance variables to a class without having to recompile, or you can keep additional information in your class data, like name of the class, names of functions (so you can save a function name to a file, e.g. a script) and later look up the correct function to call, etc. etc.

If you’re curious about this, you may want to check out my Dan’s Runtime sample code, which demonstrates how to implement objects in plain C (well, mostly plain C, with a few C++-isms for convenience), as well as how to implement a garbage collector so you won’t have to manually allocate and release your objects, and is heavily commented and documented so you can get a second, slightly different explanation of all this.