Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:

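#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef int (*FuncPtr)(void);

int main(void)
{
    // Create a function (unsigned char, since values like 0xB8 don't fit in a signed char):
    unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                                   0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                                   0xC3 };                       // RET

    // Make a copy on the heap, OS doesn't like executing the stack:
    FuncPtr         testFuncPtr = (FuncPtr) malloc(sizeof(testFunc));
    memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );

    printf("Before function.\n");
    int result = (*testFuncPtr)();
    printf("Result %d\n", result);

    free( (void*) testFuncPtr );
    return 0;
}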

Basically, this stores the raw machine code of a function in an array of bytes. The first byte on each line of the array is the opcode: 0x90 is a no-op, 0xB8 is a MOVL of a constant into the eax register (the next 4 bytes are the number to store, least significant byte first because x86 is little-endian, so 0x10 0x00 0x00 0x00 means 16), and 0xC3 is the return instruction. (I had to look up the opcodes in Intel's documentation.)
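To see the little-endian immediate in action, here is a small sketch that produces the same bytes as testFunc for any return value. The helper names emit_movl_imm_eax() and emit_return_constant() are made up for illustration; the byte values are the ones from the array above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write "movl $value, %eax" into buf.
   Encoding: opcode 0xB8, then the 32-bit immediate, lowest byte first.
   Returns the number of bytes written (always 5). */
size_t emit_movl_imm_eax(unsigned char *buf, uint32_t value)
{
    buf[0] = 0xB8;
    buf[1] = (unsigned char)(value & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)((value >> 16) & 0xFF);
    buf[4] = (unsigned char)((value >> 24) & 0xFF);
    return 5;
}

/* Hypothetical helper: build the nop / movl / ret function from testFunc,
   but returning an arbitrary constant. buf needs room for 7 bytes.
   Returns the number of bytes written. */
size_t emit_return_constant(unsigned char *buf, uint32_t value)
{
    size_t n = 0;
    buf[n++] = 0x90;                        /* nop */
    n += emit_movl_imm_eax(buf + n, value); /* movl $value, %eax */
    buf[n++] = 0xC3;                        /* ret */
    return n;
}
```

Calling emit_return_constant(buf, 16) reproduces testFunc byte for byte, while a value like 0x12345678 ends up in memory as 78 56 34 12.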

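One thing to watch out for here (at least on Mac OS X) is that you'll get a bad access error if you try to call testFunc directly. That's because testFunc lives on the stack, and the stack is marked as non-executable (a small safety measure against buffer-overflow exploits). So what we do instead is malloc some memory on the heap and stuff our code in there. Be aware that this only works where heap memory is executable, as it is for the 32-bit process here; on systems that also mark the heap non-executable (which I believe includes 64-bit processes on Mac OS X), this crashes the same way, and you need memory that is explicitly mapped as executable, e.g. with mmap() and PROT_EXEC.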
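Where the heap isn't executable either, the usual approach is to ask the OS for memory that is. The sketch below assumes a POSIX system with mmap() and mprotect() (Mac OS X and Linux both have them). It copies the code into a fresh mapping while that mapping is writable, then switches it to read+execute, so the memory is never writable and executable at the same time. make_executable() and free_executable() are names I made up, and some hardened systems may still refuse this without extra, platform-specific flags.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy len bytes of machine code into a new anonymous
   mapping and make it executable. Returns NULL on failure. */
void *make_executable(const void *code, size_t len)
{
    /* Step 1: get private, zero-filled, writable (but not executable) memory. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    /* Step 2: stuff the code in while we can still write to it. */
    memcpy(mem, code, len);

    /* Step 3: drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}

/* Hypothetical helper: release memory obtained from make_executable(). */
void free_executable(void *mem, size_t len)
{
    munmap(mem, len);
}
```

With this, the malloc()/memmove() pair in the fragment above would become testFuncPtr = (FuncPtr) make_executable(testFunc, sizeof(testFunc)), with free_executable((void*) testFuncPtr, sizeof(testFunc)) replacing the free() at the end.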

You may wonder why I'm storing my number 16 in eax, of all registers. Easy: the 32-bit x86 calling convention says that an int return value (and most other integer or pointer return values of up to 4 bytes) is handed back to the caller in eax. So this function simply returns 16, which our printf() confirms. Neat!

Intel's documentation describes the opcodes in a rather complicated way, so I take a shortcut: I write some assembler code in which the instruction whose byte sequence I want to find out is sandwiched between instructions whose byte sequences I already know (I like to use six NOPs, which are short and show up as 0x90 90 90 90 90 90). Then I assemble that, open the result in a hex editor, and search for the known instructions; whatever is between them must be my new one. Here's a small table of other operations you may find in a typical program and the byte sequences they turn into (each NN stands for one byte of an immediate value or offset, stored least significant byte first):

Bytes                   Instruction
0x50                    pushl %eax
0x53                    pushl %ebx
0x55                    pushl %ebp
0x89 E5                 movl %esp, %ebp
0x90                    nop
0xB8 NN NN NN NN        movl $N, %eax
0x68 NN NN NN NN        pushl $N
0xE8 NN NN NN NN        call relativeOffsetNFromEndOfInstruction
0x8B 1C 24              movl (%esp), %ebx
0x8D 83 NN NN NN NN     leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN     leal relativeOffsetToData(%ebp), %eax
0x5B                    popl %ebx
0x83 C4 NN              addl $NN, %esp
0x83 EC NN              subl $NN, %esp
0x8B 00                 movl (%eax), %eax
0x89 45 NN              movl %eax, NN(%ebp)
0xC9                    leave
0xC3                    ret
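The hex-editor step of the NOP-sandwich trick described above is easy to automate. This sketch scans a buffer (say, the contents of an assembled object file read into memory) for the first two runs of six 0x90 bytes and reports what lies between them. The function names are hypothetical, and the scan is deliberately naive: if the instruction you're looking for itself starts or ends with 0x90, or the assembler adds padding NOPs next to your markers, the reported boundaries will be off.

```c
#include <stddef.h>
#include <string.h>

#define MARKER_LEN 6

/* The known "bookend" instructions: six one-byte NOPs. */
static const unsigned char nop_marker[MARKER_LEN] = {
    0x90, 0x90, 0x90, 0x90, 0x90, 0x90
};

/* Return the offset of the first marker at or after 'from', or -1 if none. */
static long find_marker(const unsigned char *buf, size_t len, size_t from)
{
    for (size_t i = from; i + MARKER_LEN <= len; i++)
        if (memcmp(buf + i, nop_marker, MARKER_LEN) == 0)
            return (long)i;
    return -1;
}

/* Find the bytes between the first two markers in buf.
   On success, stores their offset in *start and their length in *count
   and returns 1; returns 0 if there aren't two markers. */
int bytes_between_markers(const unsigned char *buf, size_t len,
                          size_t *start, size_t *count)
{
    long first = find_marker(buf, len, 0);
    if (first < 0)
        return 0;

    size_t after_first = (size_t)first + MARKER_LEN;
    long second = find_marker(buf, len, after_first);
    if (second < 0)
        return 0;

    *start = after_first;
    *count = (size_t)second - after_first;
    return 1;
}
```

Fed a sandwich around movl %esp, %ebp, it would report the two bytes 89 E5 from the table.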

The code fragment above is the core of what a just-in-time compiler does: generate machine code into memory at runtime and then call it. For an ahead-of-time compiler, instead of executing the code directly, we'd have to write it into a complete Mach-O file and link it with crt1.o.

Update: in addition to the instructions used for position-independent code (PIC), I've added a few more to the table that are useful for passing structs as parameters on the stack.
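To show how some of these table entries fit together, here is a sketch that emits two i386 sequences as raw bytes: a PIC-style load of a data address through a get-PC thunk (call the thunk, which does movl (%esp), %ebx; ret, then leal offset(%ebx), %eax), and a cdecl call that passes a struct of two ints by value by pushing its fields. This is my reading of how the entries are used, not a dump of real compiler output. The function names and example addresses are hypothetical, the absolute addresses only serve to compute the relative fields, and a real Mac OS X caller would also keep %esp 16-byte aligned at the call, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as four bytes, least significant first (x86 is little-endian). */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/* Get-PC thunk: the return address on the stack is the address of the
   instruction after the call, so copying it into %ebx gives a base address. */
size_t emit_pc_thunk(unsigned char *buf)
{
    buf[0] = 0x8B; buf[1] = 0x1C; buf[2] = 0x24;  /* movl (%esp), %ebx */
    buf[3] = 0xC3;                                /* ret */
    return 4;
}

/* Load the address of data_addr into %eax without hard-coding it:
 *     call thunk                          E8 rel32
 *   pic_base:
 *     leal (data - pic_base)(%ebx), %eax  8D 83 disp32
 * code_addr is where buf will sit. Only differences between addresses end
 * up in the bytes, so the code still works if everything moves together. */
size_t emit_pic_load(unsigned char *buf, uint32_t code_addr,
                     uint32_t thunk_addr, uint32_t data_addr)
{
    uint32_t pic_base = code_addr + 5;          /* end of the 5-byte call */
    buf[0] = 0xE8;
    put_le32(buf + 1, thunk_addr - pic_base);   /* call offset counts from end of call */
    buf[5] = 0x8D; buf[6] = 0x83;
    put_le32(buf + 7, data_addr - pic_base);    /* data offset relative to %ebx */
    return 11;
}

/* cdecl call to f(struct { int32_t a; int32_t b; } p), p passed by value.
   The struct must lie on the stack with a at the lower address, so b is
   pushed first; the caller pops the 8 bytes of arguments afterwards. */
size_t emit_call_with_pair(unsigned char *buf, uint32_t code_addr,
                           uint32_t target, int32_t a, int32_t b)
{
    buf[0] = 0x68; put_le32(buf + 1, (uint32_t)b);   /* pushl $b */
    buf[5] = 0x68; put_le32(buf + 6, (uint32_t)a);   /* pushl $a */
    buf[10] = 0xE8;                                  /* call target */
    put_le32(buf + 11, target - (code_addr + 15));   /* the call ends at offset 15 */
    buf[15] = 0x83; buf[16] = 0xC4; buf[17] = 0x08;  /* addl $8, %esp */
    return 18;
}
```

Using unsigned 32-bit arithmetic for the offsets means backward jumps come out as the correct two's-complement bytes automatically.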