tutorial


Custom Elements on WebKit Pages

Recently, I wanted to create a simple HTML editor. Basically, I just wanted to WYSIWYG-edit styled text, and maybe insert a few images. But of course, there was one reason I could not use one of the existing HTML editors:

I needed to be able to insert special placeholder items. Little boxes that could be double-clicked for editing, and would otherwise just be part of the text like a regular character. In the final output, these would expand to special commands in a server-side scripting language, but I didn’t want to see them in the editor of course.

Luckily, after a bunch of code contributions and bug reports by Karelia, makers of SandVox, and then Mail.app using WebKit for editing styled e-mails, WebKit supports in-line editing. It is quite easy, actually: You just call [myWebView setEditable: YES] and give the WebView an empty HTML document. That is all you need to have a styled edit field from which you can get HTML and to which you can assign HTML.

Apple’s website also documents how to load HTML into the WebView, and how to get it back out. Loading is trivial using [[myWebView mainFrame] loadHTMLString: @"<html><body></body></html>" baseURL: nil], and of course you can insert your HTML text in between the two body tags. Getting it back out is slightly more complicated: You get the DOMDocument from the mainFrame and the documentElement from that, then grab the element named @"body" out of it, and typecast that to a DOMHTMLElement to obtain its innerHTML, which is a string containing the HTML for your styled text. If you want the full document rather than just an HTML fragment, you can simply take the document element’s outerHTML.

That was all straightforward so far, once I’d grown used to WebKit’s terminology. Now, how to manage the placeholders? Well, first, you will have to create a new WebKit plug-in project from the Standard Apple plug-ins category. This gives you a simple NSView, which you can associate with a MIME type by editing its Info.plist. Set up this project, so that it is built with your application project, and make sure you have a Copy Files build phase that copies it into a PlugIns folder in your application’s bundle.

You should now be able to specify an embed tag in your HTML source code with the type attribute set to your plug-in’s MIME type, and WebKit will automatically load your plug-in and place it in this spot. However, you will either have to manually specify width and height attributes on the embed tag, or your plug-in will have to calculate its size and add these attributes to the embed tag in code. To do the latter, it manipulates the DOM tree through the DOMElement stored under WebPlugInContainingElementKey in the arguments dictionary it gets passed on creation. Since this key already refers to our embed tag, this is as simple as using the -setAttribute:value: call on this object, and using +stringWithFormat: to convert our sizes into strings.

What I learned from this? Once you know that there is no way to load a plug-in directly from inside your application, it is fairly straightforward to add your own objects to a WebView. Sadly, you always only know that after…

Generating Machine Code at Runtime

Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:

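#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*FuncPtr)(void);

int main(void)
{
    // Create a function (raw 32-bit Intel machine code):
    unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                                   0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                                   0xC3 };                       // RET

    // Make a copy on the heap, OS doesn't like executing the stack:
    void*           testFuncMem = malloc( sizeof(testFunc) );
    if( testFuncMem == NULL )
        return 1;
    memcpy( testFuncMem, testFunc, sizeof(testFunc) );
    FuncPtr         testFuncPtr = (FuncPtr) testFuncMem;

    printf("Before function.\n");
    int result = (*testFuncPtr)();
    printf("Result %d\n", result);

    free( testFuncMem );
    return 0;
}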

Basically, this stores the raw opcodes of a function in an array of chars. The first byte of each line is usually the opcode, i.e. 0x90 is No-Op, 0xB8 is a MOVL into the eax register (with the next 4 bytes being the number to store, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel’s documentation).

One thing to watch out for here (at least on Mac OS X) is that you’ll get a bad access error if you try to execute testFunc directly. That’s because testFunc is on the stack, and the stack shouldn’t contain executable code (it’s a small safety measure). So, what we do is we simply malloc some memory on the heap, and stuff our code in there. (Newer systems mark heap memory as non-executable as well; there you have to ask for executable memory explicitly, for example with mmap() and mprotect().)

You may wonder why I’m using eax of all registers to store my number 16 in. Easy: Because the convention is that an int return value (and most other 4-byte return values) goes in eax when a function returns. So, what this does is it essentially returns 16. Which our printf() proves. Neat!
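On systems that refuse to execute malloc()ed memory, the same experiment can be run from memory that was explicitly requested as executable. What follows is only a minimal sketch, assuming a POSIX system with mmap() and mprotect() and an x86 CPU. The name run_machine_code() is made up for this example, and hardened systems may still refuse the mprotect() call, in which case the function just reports failure.

```c
#define _DEFAULT_SOURCE 1   /* asks glibc to expose MAP_ANON / MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#if !defined(MAP_ANON) && defined(MAP_ANONYMOUS)
#define MAP_ANON MAP_ANONYMOUS
#endif

typedef int (*FuncPtr)(void);

/* Hypothetical helper: copies `size` bytes of machine code into freshly
   mapped memory, makes that memory executable, calls it and stores the
   value it returns in *result. Returns 0 on success, -1 on failure. */
int run_machine_code(const unsigned char *code, size_t size, int *result)
{
    assert(code != NULL && size > 0 && result != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* Ask for a writable region first, so the bytes can be copied in. */
    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, size);

    /* Then swap write permission for execute permission. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return -1;
    }

    *result = ((FuncPtr) mem)();
    munmap(mem, size);
    return 0;
#else
    return -1;  /* the byte sequences in this post are x86-only */
#endif
}
```

Calling run_machine_code(testFunc, sizeof(testFunc), &result) with the seven bytes from above should leave 16 in result on an x86 machine.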

Intel’s documentation describes the opcodes in a very complicated way, so what I essentially do is I write some assembler code and enclose the instruction whose byte sequence I want to find out in instructions whose byte sequence I already know (I like to use six nops, which are short and show up as 0x90 90 90 90 90 90). Then I compile that, and then use a hex editor to search for the known instructions, and whatever is between them must be my new one. Here’s a small table of other operations you may find in the typical program and what byte sequences they turn to:

0x50                 pushl %eax
0x53                 pushl %ebx
0x55                 pushl %ebp
0x89 E5              movl %esp, %ebp
0x90                 nop
0xB8 NN NN NN NN     movl $N, %eax
0x68 NN NN NN NN     pushl $N
0xE8 NN NN NN NN     call relativeOffsetNFromEndOfInstruction
0x8B 1C 24           movl (%esp), %ebx
0x8D 83 NN NN NN NN  leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN  leal relativeOffsetToData(%ebp), %eax
0x5B                 popl %ebx
0x83 C4 NN           addl $NN,%esp
0x83 EC NN           subl $NN,%esp
0x8B 00              movl (%eax), %eax
0x89 45 NN           movl %eax, NN(%ebp)
0xC9                 leave
0xC3                 ret

The code fragment above is essentially what one would need to create a just-in-time compiler. For a real compiler, instead of executing this directly, we’d have to write it to a complete Mach-O file and link it with crt1.o.

Update: On top of the instructions for position-independent code (PIC), I’ve also added some more that are useful when passing structs as parameters on the stack.
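The NN NN NN NN placeholders in the table are 32-bit values stored lowest byte first, which is why $16 showed up as 0x10 0x00 0x00 0x00 earlier. Here is a small sketch of how a code generator might emit such a sequence; emit_u32() and emit_return_constant() are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes a 32-bit value as the four "NN NN NN NN" bytes, lowest byte first. */
static size_t emit_u32(unsigned char *out, uint32_t value)
{
    out[0] = (unsigned char) (value & 0xFF);
    out[1] = (unsigned char) ((value >> 8) & 0xFF);
    out[2] = (unsigned char) ((value >> 16) & 0xFF);
    out[3] = (unsigned char) ((value >> 24) & 0xFF);
    return 4;
}

/* Hypothetical helper: emits "movl $value, %eax" followed by "ret" into
   out, which must have room for 6 bytes. Returns the bytes written. */
size_t emit_return_constant(unsigned char *out, uint32_t value)
{
    size_t n = 0;
    assert(out != NULL);
    out[n++] = 0xB8;                /* movl $value, %eax */
    n += emit_u32(out + n, value);
    out[n++] = 0xC3;                /* ret */
    return n;
}
```

With a value of 16 this produces the same bytes as the hand-written testFunc above, minus the leading NOP.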

Building a loader…

One neat thing about a computer’s internals is the “loader”. A loader is a little bit of code (generally in the system) that takes a file of compiled code and loads it into RAM, preparing it for execution.

Preparing code for execution means that it looks where the system functions are and fixes up any calls to them to point to the current address, and does the same for other dynamically loaded libraries that might not get loaded at the same address every time.

For my experiments with code generation, runtimes etc., I recently wrote myself a little loader. I could have used the system’s loader, but then my output files would have had to adhere to the Mach-O file format, and that looked a tad too complicated for me. So, I rolled my own that’s fairly simple, but in essence does the same things a real loader would do:

It loads the actual code and data from a file. Then it looks at the symbol table which is also part of the loaded data, and looks up the actual function corresponding to each symbol name, and inserts it in the symbol table. The way one generally does this is to have the symbol table consist of 5-byte placeholders. Why 5 bytes? Well, because that’s the size of a JMP (jump-to-address) instruction in Intel machine code.

[Illustration of the code, the symbol table entry and how control flows when it is called]

All code that calls the function is written so it jumps to these 5 bytes and executes them, using a CALL instruction. And when the loader prepares our code for execution, it fills these 5 bytes with the opcode for the JMP instruction plus the address of the actual function to be called, encoded as an offset relative to the end of the JMP instruction (in the case of our example image, that would be the address of printf()).

The nice part about this approach is that all CALL statements point to this central JMP statement in the symbol table (which is often called a “jump island”), and the loader only has to update this one place. Another nice part comes from the use of CALL and JMP: CALL remembers the location we were at when it got executed, so that the called function can return to that place. On the other hand, the JMP instruction just transfers execution to its destination. That means that when printf returns, it will not return to the JMP instruction, but rather directly to the instruction after the CALL, because that was the last time someone saved where to return to. So, we don’t need an additional RET (return) instruction after the JMP, and we don’t waste time jumping to the symbol table just to jump back to the caller.

But still, we make that extra jump. Can’t we do better? Yes! Since CALL pushes the address of the instruction after it onto the stack, it’s easy to find the CALL instruction that called into the jump island. So, many loaders don’t put the address of printf into that symbol table entry, but rather the address of a linker-fix-up-function. This function then inserts the address of printf into the CALL instruction. That way, the first time each CALL statement is executed, we go through the jump island, the fix-up function changes the CALL instruction to call printf directly and then jumps to printf. Subsequent times, the CALL statement will call printf directly.

So, how does the fix-up function know to call printf? Well, before it updates the CALL statement, it takes the address of our jump island from there. So, we can just use the return address to find the CALL statement, and the CALL statement to find our jump island, and then do whatever we would have done before to get the real function address (e.g. get the function’s name that our compiler stored before our jump island in the code), stash it in the CALL statement and jump to it ourselves.
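The two steps of such a fix-up function, finding the jump island from the return address and rewriting the CALL, can be sketched as follows. To keep it testable without real machine code running, this sketch treats offsets into a byte array as "addresses". The names island_for_call() and patch_call() are invented here, and looking up the real function for an island is left out.

```c
#include <assert.h>
#include <stdint.h>

enum { CALL_SIZE = 5 };  /* 0xE8 plus a 4-byte relative offset */

static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
           ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

static void write_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
    p[2] = (unsigned char) ((v >> 16) & 0xFF);
    p[3] = (unsigned char) ((v >> 24) & 0xFF);
}

/* The return address points just past the CALL, so the CALL starts 5 bytes
   earlier. Its offset is relative to the return address. */
uint32_t island_for_call(const unsigned char *image, uint32_t return_offset)
{
    uint32_t call_offset = return_offset - CALL_SIZE;
    assert(image[call_offset] == 0xE8);
    return return_offset + read_u32(image + call_offset + 1);
}

/* Rewrites the same CALL so that it goes straight to target_offset. */
void patch_call(unsigned char *image, uint32_t return_offset,
                uint32_t target_offset)
{
    uint32_t call_offset = return_offset - CALL_SIZE;
    assert(image[call_offset] == 0xE8);
    write_u32(image + call_offset + 1, target_offset - return_offset);
}
```

A real fix-up function would call something like island_for_call() first, use the island to look up the function’s name and address, then patch_call() and finally jump to the function itself.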

The advantage is that we can lazily update each CALL to directly call printf, which saves us one extra jump (and maybe even flushing the cache; who knows how far the symbol table is away from the code we’re executing). And since we do it only for code that is actually called, if a piece of code is never reached, we never look up that symbol. That may save us a lot of time. It’s also handy for weak-linking: If the function is never called, we don’t care whether the library is available. So, whoever wrote the code we’re loading can just check for existence of a library and not call into it if it isn’t there.

Neat, huh?

Carbon for the Cocoa Guy: Handles

One of the more confusing aspects of MacOS programming is the Handle. That doesn’t have to be so. I’ll quickly illustrate the history of the Handle and then everything should become clearer. I’ll also include a little memory-management-101 at the beginning.

Memory Fragmentation

When you use malloc() or similar APIs to allocate memory, you face the problem of fragmentation: Imagine you have the following (top) situation in memory:

[Illustration of three consecutive memory blocks, where the one in the middle gets deleted, fragmenting memory]

You have three blocks of memory: The blue one, the green one and the red one, each 6 bytes in length. Now, you dispose of the green one by calling free() (bottom). Now there’s a six-byte hole between the two blocks. The total free memory (white blocks) is 108 bytes. Trouble is, since the computer can only allocate contiguous blocks of memory, the largest block you can allocate is 102 bytes.

Now, we can’t just move the red block to the left to make more contiguous free space available, because our program keeps track of each block by its position (its “address”), and moving it would change that address. The system would have to go through your program and change each occurrence of the moved block’s address, which is simply impossible since only your program knows which parts of its memory are used for what. Not to mention it would cause pauses in execution. So, if you needed 103 bytes, you couldn’t get them, even though we have 108 free bytes in total, more than we’d need.
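To check the numbers, here is a small sketch that counts total free bytes and the largest contiguous run in a map of used bytes. It assumes a 120-byte heap, which is what the figures above add up to (108 free bytes plus the two remaining 6-byte blocks); free_space_stats() is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* used[i] != 0 means byte i belongs to an allocated block. Reports the
   total number of free bytes and the longest contiguous free run. */
void free_space_stats(const unsigned char *used, size_t count,
                      size_t *total_free, size_t *largest_free)
{
    size_t run = 0;  /* length of the free run we are currently inside */
    assert(used != NULL && total_free != NULL && largest_free != NULL);

    *total_free = 0;
    *largest_free = 0;
    for (size_t i = 0; i < count; i++) {
        if (used[i]) {
            run = 0;             /* an allocated byte ends the run */
            continue;
        }
        run++;
        (*total_free)++;
        if (run > *largest_free)
            *largest_free = run;
    }
}
```

With bytes 0 to 5 (blue) and 12 to 17 (red) marked as used, this reports 108 free bytes in total, but a largest run of only 102.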

[Illustration of three pointers pointing to the same memory block]

Handles – a modern solution to fragmentation

So, what Apple did is they created the Handle. A Handle is essentially a centralized way of storing pointers to memory blocks, so the system only has to change one centralized pointer when it needs to move a block to make more memory available.

[Illustration of three Handles pointing at a master pointer, which in turn points at the memory block]

Each pointer to the actual memory is owned by the system, and kept in a central table of pointers (the “master pointer block”). When you want to allocate memory, you ask the system to do it for you using the NewHandle() function. The system gives you a Handle, which is a pointer to the actual master pointer it owns. You only use this Handle to remember where your memory is. When you want to access the memory, you de-reference the Handle once to get at the pointer, and then use that pointer like any other.

The advantage of this is that all access to memory goes through those central master pointers. When the OS has to move a block of memory, all it has to do is change the master pointer, because all Handles point there. The OS also takes care to bunch up all master pointers at the start of memory, and to re-use old master pointers, so that they can’t fragment memory.
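The double indirection can be imitated in a few lines of plain C. This is only a toy model and not how the Memory Manager was actually implemented: the blocks come from malloc(), and all names (SimHandle, sim_new_handle() and so on) are invented so nobody confuses them with the real NewHandle() family.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char   *SimPtr;     /* points at the block itself */
typedef SimPtr *SimHandle;  /* points at the master pointer */

enum { MASTER_POINTER_COUNT = 16 };
static SimPtr master_pointers[MASTER_POINTER_COUNT];  /* "master pointer block" */

/* Finds an unused master pointer, allocates a block and hands out the
   address of the master pointer. Returns NULL on failure. */
SimHandle sim_new_handle(size_t size)
{
    for (int i = 0; i < MASTER_POINTER_COUNT; i++) {
        if (master_pointers[i] == NULL) {
            master_pointers[i] = malloc(size);
            return master_pointers[i] ? &master_pointers[i] : NULL;
        }
    }
    return NULL;
}

/* Moves the block elsewhere, as a compacting memory manager would.
   Only the master pointer changes, every copy of the Handle stays valid. */
int sim_move_block(SimHandle handle, size_t size)
{
    assert(handle != NULL && *handle != NULL);
    SimPtr new_block = malloc(size);
    if (new_block == NULL)
        return -1;
    memcpy(new_block, *handle, size);
    free(*handle);
    *handle = new_block;
    return 0;
}

/* Frees the block and marks the master pointer as reusable. */
void sim_dispose_handle(SimHandle handle)
{
    assert(handle != NULL);
    free(*handle);
    *handle = NULL;
}
```

Code that keeps only the SimHandle and de-references it each time (*handle) keeps working after sim_move_block(), while a copy of the de-referenced pointer taken before the move would now dangle.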

Since we don’t want the OS to move our block (and change the master pointer) while we are working with the de-referenced pointer, there is an HLock() function that lets you mark a Handle’s block as immovable, and HUnlock() to make it movable again as soon as you are finished. To get rid of a Handle, there is the DisposeHandle() function (so NewHandle()/DisposeHandle() can be seen as roughly equivalent to malloc() and free(), though they’re not interchangeable).

History, or Carbon, intervenes

By the time Apple switched to Mac OS X, pretty much everything was a Handle. There were ControlHandles used for buttons, MenuHandles used for menus … Since Mac OS X was very different under the hood, Apple needed to change the way menus and controls worked. Mac OS 9 had a single address space, so it was easy to just hand a menu Handle between applications.

But Mac OS X has protected memory. For security reasons, every application runs in its own segregated area of memory, with very limited access to memory in other applications. Moreover, the Window Server, responsible for drawing windows and the menu bar, is now a separate application. You can’t just pass it a MenuHandle, it has to be something different.

Since Apple didn’t want to break everyone’s code, they changed the name of ControlHandle to ControlRef, but kept the old name for compatibility, even though it isn’t a Handle anymore. Which means you have to be extra careful with xxxHandle data types.

Handles – a solution still used today

Unix offers a slightly improved version of the same approach: Each address a program uses isn’t a real address on the RAM chip, but rather a “virtual” address that gets translated to a physical address in RAM. There is a translation table for translating addresses from virtual to physical. When the OS needs to move a block of memory on the chip, it essentially just changes this translation table (I’m simplifying here). So, your code doesn’t even have to know its memory just moved, doesn’t have to mess with Handles. But that needs a faster, better CPU with a Memory Management Unit, which early Macs with their 680x0 CPUs didn’t have (at least not until the Performas and Quadras).

Since Mac OS X is a descendant of BSD Unix, it inherited this, and Apple uses this new Unix-style mechanism in most new APIs. Also, in Mac OS X, a Handle is never moved. Since, under the hood, the master pointer’s memory block is allocated using malloc(), it’s not necessary anymore. So, if you write new code, use malloc()/free(). If you need to talk with APIs that still use Handles, use NewHandle()/DisposeHandle() and double-de-reference, but there’s no need to call HLock()/HUnlock() anymore. Though it doesn’t hurt either.