i386


Generating Machine Code at Runtime

Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:

typedef int (*FuncPtr)();

// Create a function:
unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                               0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                               0xC3 };                       // RET

// Make a copy on the heap, OS doesn't like executing the stack:
FuncPtr         testFuncPtr = (FuncPtr) malloc(sizeof(testFunc));
memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );

printf("Before function.\n");
int result = (*testFuncPtr)();
printf("Result %d\n", result);

Basically, this stores the raw opcodes of a function in an array of chars. The first byte of each line is usually the opcode, i.e. 0x90 is No-Op, 0xB8 is a MOVL into the eax register (with the next 4 bytes being the number to store, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel’s documentation).

One thing to watch out for here (at least on Mac OS X) is that you’ll get a bad access error if you try to execute testFunc directly. That’s because testFunc is on the stack, and the stack shouldn’t contain executable code (it’s a small safety measure). So we simply malloc some memory on the heap and stuff our code in there.

You may wonder why I’m using eax of all registers to store my number 16 in. Easy: Because the convention is that an int return value (and most other 4-byte return values) goes in eax when a function returns. So, what this does is it essentially returns 16. Which our printf() proves. Neat!
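On many current systems, malloc'd heap memory is not executable either, so the fragment above may crash in the same way. Here’s a minimal sketch of one common alternative, assuming a POSIX system with mmap() and mprotect() and an x86 CPU. The names kTestFunc, copy_to_executable_page and run_test_func are made up for this illustration and are not part of the original fragment.

```c
#define _DEFAULT_SOURCE              /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON       /* older BSD / Mac OS X spelling */
#endif

typedef int (*FuncPtr)(void);

/* Same bytes as testFunc above: NOP, MOVL $16,%eax, RET. */
static const unsigned char kTestFunc[] = { 0x90,
                                           0xB8, 0x10, 0x00, 0x00, 0x00,
                                           0xC3 };

/* Hypothetical helper: copy len bytes of machine code into a fresh
   anonymous mapping, then switch that mapping from writable to
   executable. Returns NULL on failure. */
static void *copy_to_executable_page(const unsigned char *code, size_t len)
{
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, len);
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return NULL;
    }
    return page;
}

/* Runs the generated function and returns its result. Returns -1 if the
   mapping could not be set up, or if this is not an x86 CPU (the bytes
   are x86 machine code and mean nothing elsewhere). */
static int run_test_func(void)
{
#if defined(__i386__) || defined(__x86_64__)
    void *page = copy_to_executable_page(kTestFunc, sizeof kTestFunc);
    if (page == NULL)
        return -1;
    int result = ((FuncPtr)page)();
    munmap(page, sizeof kTestFunc);
    return result;
#else
    return -1;
#endif
}
```

Calling run_test_func() from your own main() and printing the result should behave like the printf() in the original fragment. Writing first and only then flipping the page to executable means the memory is never writable and executable at the same time.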

Intel’s documentation describes the opcodes in a very complicated way, so what I essentially do is write some assembler code and enclose the instruction whose byte sequence I want to find out in instructions whose byte sequence I already know (I like to use six nops, which are short and show up as 0x90 90 90 90 90 90). Then I compile that and use a hex editor to search for the known instructions, and whatever is between them must be my new one. Here’s a small table of other operations you may find in the typical program and what byte sequences they turn into:

0x50                   pushl %eax
0x53                   pushl %ebx
0x55                   pushl %ebp
0x89 E5                movl %esp, %ebp
0x90                   nop
0xB8 NN NN NN NN       movl $N, %eax
0x68 NN NN NN NN       pushl $N
0xE8 NN NN NN NN       call relativeOffsetNFromEndOfInstruction
0x8B 1C 24             movl (%esp), %ebx
0x8D 83 NN NN NN NN    leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN    leal relativeOffsetToData(%ebp), %eax
0x5B                   popl %ebx
0x83 C4 NN             addl $NN,%esp
0x83 EC NN             subl $NN,%esp
0x8B 00                movl (%eax), %eax
0x89 45 NN             movl %eax, NN(%ebp)
0xC9                   leave
0xC3                   ret
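The nop-bracketing trick described before the table can also be automated instead of done by eye in a hex editor. Here’s a rough sketch, assuming the object file has already been read into a byte buffer; find_marker and bytes_between_nops are hypothetical names invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define MARKER_LEN 6

/* Six NOPs in a row, the marker placed around the unknown instruction. */
static const unsigned char kMarker[MARKER_LEN] =
    { 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 };

/* Returns the index of the first marker at or after `from`, or -1. */
static long find_marker(const unsigned char *buf, size_t len, size_t from)
{
    for (size_t i = from; i + MARKER_LEN <= len; i++) {
        if (memcmp(buf + i, kMarker, MARKER_LEN) == 0)
            return (long)i;
    }
    return -1;
}

/* Finds the bytes enclosed by two six-NOP markers. On success, returns a
   pointer to the first enclosed byte and stores the byte count in
   *out_len; returns NULL if two markers are not found.
   Limitation: an enclosed instruction that itself ends in 0x90 bytes
   would be cut short, since those bytes look like part of the marker. */
static const unsigned char *bytes_between_nops(const unsigned char *buf,
                                               size_t len, size_t *out_len)
{
    long first = find_marker(buf, len, 0);
    if (first < 0)
        return NULL;

    size_t start = (size_t)first + MARKER_LEN;
    long second = find_marker(buf, len, start);
    if (second < 0)
        return NULL;

    *out_len = (size_t)second - start;
    return buf + start;
}
```

Feeding it a buffer that contains six nops, then 0x89 0xE5, then six more nops yields a pointer to the 0x89 with a length of 2, which matches the movl %esp, %ebp row of the table.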

The code fragment above is essentially what one would need to create a just-in-time compiler. For a real compiler, instead of executing this directly, we’d have to write it to a complete Mach-O file and link it with crt1.o.
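As a tiny first step in that direction, here’s a sketch of an emitter that uses two rows of the table to generate the bytes for a function returning an arbitrary constant. It assumes only the encodings listed above (0xB8 followed by a 32-bit immediate, and 0xC3 for ret); emit_return_constant is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes "movl $value, %eax ; ret" into buf and returns the number of
   bytes written (always 6), or 0 if cap is too small.
   The immediate is stored least significant byte first, because x86 is
   little-endian. */
static size_t emit_return_constant(unsigned char *buf, size_t cap,
                                   uint32_t value)
{
    if (cap < 6)
        return 0;

    buf[0] = 0xB8;                                   /* movl $N, %eax */
    buf[1] = (unsigned char)(value & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)((value >> 16) & 0xFF);
    buf[4] = (unsigned char)((value >> 24) & 0xFF);
    buf[5] = 0xC3;                                   /* ret */
    return 6;
}
```

With a value of 16 this produces B8 10 00 00 00 C3, i.e. the testFunc bytes from the start of this post minus the leading nop.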

Update: on top of the instructions for position-independent code (PIC), I’ve also added some more that are useful for passing structs as parameters on the stack.

Intel assembler on Mac OS X

I’ve always wanted to learn another assembler, and with one of my colleagues being a real assembler guru, and the Intel reference books on my bookshelf, and the Intel switch just behind us, I thought this would be a good opportunity to finally get going with x86 assembler.

Now, assembler programming under Mac OS X isn’t quite as well documented as one would wish. There’s no tutorial that I could find (lots of tutorials for Linux and Windows, but none for Mac OS X yet). This won’t be one either; rather, this is a blog posting of me sharing what I found out about assembler on OS X, and it is probably only useful to someone who already knows some assembler but just doesn’t know Intel on Mac OS X. My main approach is to compile C source code into assembler source files using GCC. Then I can look at that code and find out which assembler instructions correspond to which C statement. If all of this turns out to be correct and I should happen to have loads of time on my hands, I may still go out there and turn this into a decent tutorial.

The basics are pretty simple

	.text						# start of code indicator.
.globl _main					# make the main function visible to the outside.
_main:							# actually label this spot as the start of our main function.
	pushl	%ebp				# save the base pointer to the stack.
	movl	%esp, %ebp			# copy the current stack pointer into the base pointer.
	subl	$8, %esp			# Balance the stack onto a 16-byte boundary.
	movl	$0, %eax			# Stuff 0 into EAX, which is where result values go.
	leave						# leave cleans up base and stack pointers again.
	ret							# returns to whoever called us.

Now, the underscore in front of “main” is a convention in C, so just accept it. When you enter the _main function, the return address (i.e. the instruction where the program will continue after the function has finished, aka “back pointer”) has already been pushed on the stack, taking up 4 bytes. We also save the base pointer (the point where our caller can find its parameters on the stack) to the stack, and set it to the current stack pointer (which is where our parameters are). That takes another 4 bytes, so we have 8 bytes now. Since the stack should be aligned on 16 bytes before you can make a call to another function, we subtract another 8 from the stack pointer, which pads out the stack (we could also just do two “pushl $0” for the same effect). If we used any local variables, we would use this opportunity to subtract more for them.
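The padding arithmetic can be written down as a small calculation. This is only a sketch of the reasoning above, assuming 4 bytes for the return address, 4 bytes for the saved base pointer, and a 16-byte alignment requirement; min_frame_adjust and keeps_stack_aligned are hypothetical helper names.

```c
#include <stdint.h>

enum {
    kAlreadyPushed = 8,   /* 4 for the return address + 4 for saved %ebp */
    kStackAlign    = 16   /* required alignment before the next call     */
};

/* Smallest value for "subl $N, %esp" that leaves room for needed_bytes
   of locals and outgoing parameters and keeps the stack 16-byte aligned. */
static uint32_t min_frame_adjust(uint32_t needed_bytes)
{
    uint32_t total   = kAlreadyPushed + needed_bytes;
    uint32_t rounded = (total + kStackAlign - 1) / kStackAlign * kStackAlign;
    return rounded - kAlreadyPushed;
}

/* Checks whether a given "subl $N, %esp" leaves the stack aligned. */
static int keeps_stack_aligned(uint32_t adjust)
{
    return (kAlreadyPushed + adjust) % kStackAlign == 0;
}
```

With no locals this gives 8, the same as the subl $8 in the listing. Any larger value that is 8 plus a multiple of 16 (24, 40, ...) keeps the stack aligned as well; it just reserves more space than strictly needed.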

Now comes the actual body of our function. What we do is simply return 0. This is done by stuffing 0 in the eax register.

Finally, we have the tail end of our function, which executes leave (which cleans up by restoring our caller’s base pointer and stack pointer) and then ret, which pops the return address off the stack and continues execution there.

Calling a local function

Calling a function is fairly simple, as long as it’s a local one right in the same file as ours. In that case, what you do is you first define that function:

	.text
.globl _doSomething				# Our doSomething function.
_doSomething:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	nop							# does nothing.
	leave
	ret
.globl _main
_main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp			# 8 to align, 16 for our 4-byte parameter and padding.
	movl	$3, (%esp)			# write our parameter at the end of the stack (i.e. padding goes first).
	call	_doSomething		# call doSomething.
	movl	$0, %eax
	leave
	ret

“nop” is a do-nothing instruction I just inserted here to show where doSomething’s code would go. That’s pretty easy. You just write the function, push the parameters on the stack and use call to jump to the function, and that will take care of pushing the return address and all that. The only tricky thing is passing the parameters. You have to pad first, and then push (or mov, in our case) the parameters in reverse order (i.e. #1 ends up at the lowest address, right where the stack pointer points, #2 above it etc.). That’s because otherwise the function being called would have to skip the padding. Well, could be worse.

Accessing parameters

To access any parameters, you address relative to the base pointer. The value right at the base pointer is your caller’s saved base pointer, and the return address sits just above it, so you need to add 4 + 4 = 8 bytes to reach the first parameter. Yes, add: since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make the stack larger, you need to add to the stack pointer to find something on the stack. The same applies to our base pointer, of course:

	movl	12(%ebp), %eax	# get parameter 2 at offset 4 + 4 + 4
	addl	8(%ebp), %eax	# get parameter 1 at offset 4 + 4

This loads your second parameter into eax and then adds the first parameter to it, leaving the result in eax, where it’s ready for use as a return value. Note the ##(foo) syntax, which adds the number ## to the pointer in register foo. This is register-relative addressing.
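To make the offsets concrete, here’s a small C simulation of the frame as the called function sees it after its pushl %ebp / movl %esp, %ebp. It’s only a model, assuming 4-byte parameters and an array of 32-bit words standing in for the stack; param_offset, read_param and add_two_params are hypothetical names.

```c
#include <stdint.h>

/* Byte offset from %ebp of the index-th parameter (0-based): skip the
   saved %ebp (4 bytes) and the return address (4 bytes), then 4 bytes
   per earlier parameter. */
static int32_t param_offset(uint32_t index)
{
    return 8 + 4 * (int32_t)index;
}

/* ebp points at the word stored at 0(%ebp); words at higher addresses
   follow in the array. */
static uint32_t read_param(const uint32_t *ebp, uint32_t index)
{
    return ebp[param_offset(index) / 4];
}

/* Mirrors the two instructions above. */
static uint32_t add_two_params(const uint32_t *ebp)
{
    uint32_t eax = read_param(ebp, 1);   /* movl 12(%ebp), %eax */
    eax += read_param(ebp, 0);           /* addl  8(%ebp), %eax */
    return eax;
}
```

For a frame holding a saved base pointer, a return address, and then the parameters 3 and 4, add_two_params yields 7. The first two values are never read, which is why their contents don’t matter in the model.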

An added benefit of this is that you can actually pass more parameters to a function than it knows to handle, and it will just ignore the rest.

Fetching data

To access data (e.g. strings), it gets trickier. You declare data like the following:

	.cstring
myHelloWorld:
	.ascii "Hello World!\0"
	.text
.globl _main
_main:
. . .

So, you add a .cstring section at the top of the file, above your function’s code, and in that you declare a label and use the .ascii keyword to actually stash your string there. So far, so good, there’s only one problem:

All data manipulation is done using absolute addresses. But we don’t know at what position in memory our program will be loaded. Labels aren’t absolute addresses, they get compiled into relative offsets from the start of our code. So, how do we find out at which absolute address our string myHelloWorld is? Well, the trick Mach-O uses is that it knows that our program will be loaded as one huge chunk. So, we know that the distance between any instruction in our code and our string will always stay the same.

So, if we could only get the address of one instruction in our code that has a label, we could calculate the absolute address of our string from that. Now, look above, at our function call code. Notice anything? Our return address is an absolute pointer to the next instruction after a function call. So, all we need to do to get our address is call a function. When GCC compiles C source code, it names this helper function ___i686.get_pc_thunk.bx, which is quite a mouthful. Let’s just call it _nextInstructionAddress:

. . .
	call	_nextInstructionAddress
myAnchorPoint:
. . .

That’s what we call somewhere at the start of our code to find our own address. Note how I cleverly already added a label myAnchorPoint, which labels the instruction whose address we’ll get. Then we somewhere (e.g. at the bottom) define that function:

. . .
_nextInstructionAddress:
	movl	(%esp), %ebx
	ret

We don’t even bother aligning the stack or changing and restoring the base pointer. This simply peeks at the last item on the stack (the return address) and stashes that in register ebx. Then it returns (and obviously doesn’t call leave because we pushed no base pointer that it could restore).

Once we have this address in ebx, we can do the following to get our string’s address into a register, and from there onto the stack:

. . .
	leal	myHelloWorld-myAnchorPoint(%ebx), %eax
	movl	%eax, (%esp)
. . .

LEA means “Load Effective Address”, i.e. take an address and stash it into a register. myHelloWorld-myAnchorPoint calculates the difference between our two labels, and thus tells us how far myHelloWorld is from myAnchorPoint. Since myHelloWorld is probably at the start of the program, e.g. at address 3 maybe, and myAnchorPoint further down, say at address 20, what we get is a negative value, e.g. -17. And xxx(%ebx) is how you tell the assembler that you want to add an offset to a register to get a memory address. ebx contains the address of myAnchorPoint, so what this does is subtract 17 from myAnchorPoint’s absolute address, giving us the absolute address of myHelloWorld! Whooo! And this mess is called “position-independent code”.
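The arithmetic is easy to check in C. This sketch reuses the made-up positions from the paragraph above (myHelloWorld at offset 3, myAnchorPoint at offset 20) and an arbitrary load address; label_delta and effective_address are hypothetical names that model what the assembler and the leal instruction compute.

```c
#include <stdint.h>

/* What the assembler computes for "target - anchor": a signed distance
   between two labels that is fixed once the code is assembled. */
static int32_t label_delta(uint32_t target_offset, uint32_t anchor_offset)
{
    return (int32_t)(target_offset - anchor_offset);
}

/* What "leal displacement(%ebx), %eax" computes at runtime: the value
   in ebx plus the fixed displacement. */
static uint32_t effective_address(uint32_t ebx, int32_t displacement)
{
    return ebx + (uint32_t)displacement;
}
```

If the program happens to be loaded at 0x1000, ebx holds 0x1000 + 20 and the leal produces 0x1000 + 3. Load it at 0x8000 instead and the same -17 displacement still lands on the string, which is the whole point of position-independent code.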

Now, our call to LEAL loads a “Long” (which is 32 bits, i.e. the size of a pointer on a 32-bit CPU) and stashes it into register eax. And the movl call moves that long from our register into the last item on the stack, ready for use as a parameter to a function.

Calling a system function

Now, it’d be really nice if we could printf() or something, right? Well, trouble is, we don’t know the address of printf(). But this time it’s actually easy. We add a new section at the bottom of our code:

. . .
	.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
_printf_stub:
	.indirect_symbol _printf
	hlt ; hlt ; hlt ; hlt ; hlt
_getchar_stub:
	.indirect_symbol _getchar
	hlt ; hlt ; hlt ; hlt ; hlt

This is a new section named __IMPORT,__jump_table. It has the type symbol_stubs and the attributes self_modifying_code and pure_instructions. 5 is the size of each stub, and intentionally is the same as the number of hlt instructions below.

This section is special, because when our code is loaded, the loader will look at it. It will see that there is an .indirect_symbol directive for a function named “printf”, and will look up that function. Then it will replace the five hlt instructions, each of which is one byte in size, with an instruction to jump to that address (hence the self_modifying_code). We also added a label for each indirect symbol, which we name the same as the symbol, just with “_stub” appended.
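To illustrate what such a patch could look like, here’s a sketch that overwrites a 5-byte stub with a jump. It assumes the loader writes a near jump with a 32-bit offset relative to the end of the instruction (opcode 0xE9, the jmp counterpart of the 0xE8 call in the opcode table); that detail and the name patch_stub are assumptions for this example, not something taken from the loader’s source.

```c
#include <stdint.h>

enum { kStubSize = 5 };   /* five one-byte hlt instructions (0xF4 each) */

/* Overwrites a stub located at address stub_addr with "jmp target_addr".
   The offset is measured from the end of the 5-byte jump instruction and
   stored least significant byte first. */
static void patch_stub(unsigned char stub[kStubSize],
                       uint32_t stub_addr, uint32_t target_addr)
{
    uint32_t rel = target_addr - (stub_addr + kStubSize);

    stub[0] = 0xE9;                                  /* jmp rel32 */
    stub[1] = (unsigned char)(rel & 0xFF);
    stub[2] = (unsigned char)((rel >> 8) & 0xFF);
    stub[3] = (unsigned char)((rel >> 16) & 0xFF);
    stub[4] = (unsigned char)((rel >> 24) & 0xFF);
}
```

A stub at 0x1000 that should reach a function at 0x2000 gets the offset 0x2000 - 0x1005 = 0xFFB, so its five hlt bytes become E9 FB 0F 00 00.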

So, to call printf, all you have to do now is push the string on the stack and then

	call	_printf_stub

Which will jump to _printf_stub and immediately continue to printf itself. And just to show you that you can have several such imported symbols, I’ve also included a stub for getchar. Now note that the system usually doesn’t name these symbols “_foo_stub”, but rather “L_foo$stub” (yes, a label name can contain dollar signs. You can even put the label in quotes and have spaces in it…). Same difference.

Okay, so that’s how much I’ve guessed my way through it so far. Comments? Corrections? If you want

PS – Thanks to John Kohr, Alexandre Colucci, Jonas Maebe, Eric Albert and Jordan Krushen, all of whom helped me figure this out one way or the other. Thanks, guys!

Update: Added mention of how to actually access parameters.

Nice Intel assembler text…

[two of Intel's instruction set manuals]

I’ve recently been looking into assembler coding a little. I learned assembler theory back in High School in Mr. Trapp’s computer programming elective, and later learned a bit of 68000 assembler as well, but never got round to actually getting into it when the PPC arrived on the scene. So, when I recently heard at work how one can get a whole bunch of Intel reference books for free, I thought this might be a good opportunity to learn x86 assembler. After all, I’m a parser and compiler geek, it’s kind of a gap in my skill set if I can’t do the backend.

Now, trouble is, while there are many tutorials for Linux and Windows, I couldn’t find a single one for Mac OS X. So, I started googling, assembling C code and bothering some developers I know and others on mailing lists with my questions, and I thought I’d share my first findings:

  • I got a link to Apple’s Mac OS X ABI docs. This is really good, as it documents an important part of OS X in detail: How to align the stack (on 16 bytes, no matter what Intel’s docs tell you), and how to call your own functions.
  • Aforementioned 16-byte stack alignment is not always necessary, but when you call a function, you must give it a properly aligned stack. When you are called, however, the stack will have the return address on it, which is 4 bytes. So, after you push the base pointer on the stack (4 more bytes), you have to move the stack pointer by another 8 bytes at least to make it aligned on a 16-byte boundary again.
  • A nice way to learn assembler is by writing very simple C programs and using gcc -S my_simple_c_program.c to get it translated into assembler code. Note that by simple, I recommend you start out with stuff that doesn’t use any system functions, because those are dynamically linked and make for rather complex assembler.
  • To compile such a program, simply pass it to GCC again, as you would with a C source file. E.g. gcc my_simple_c_program.s -o my_simple_c_program

This might be a good point to mention my Memory Management chapter in the Masters of the Void C tutorial again, which illustrates how memory works. As I learn more, I may post supplements to that that slowly teach you assembler. Well, I’m not promising anything, but I’d love to do that.