I've always wanted to learn another assembler, and with one of my colleagues being a real assembler guru, and the Intel reference books on my bookshelf, and the Intel switch just behind us, I thought this would be a good opportunity to finally get going with x86 assembler.

Now, assembler programming under Mac OS X isn't quite as well documented as one would wish. There's no tutorial that I could find (lots of tutorials for Linux and Windows, but none for Mac OS X yet). This won't be one either; rather, this is a blog posting where I share what I found out about assembler on OS X, and it is probably only useful to someone who already knows some assembler but just doesn't know Intel on Mac OS X. My main approach is to compile C source code into assembler source files using GCC (its -S option stops after generating the assembler file). Then I can look at that code and find out which assembler instructions correspond to which C statement. If all of this turns out to be correct and I should happen to have loads of time on my hands, I may still go out there and turn this into a decent tutorial.

The basics are pretty simple

	.text						# start of code indicator.
.globl _main					# make the main function visible to the outside.
_main:							# actually label this spot as the start of our main function.
	pushl	%ebp				# save the base pointer to the stack.
	movl	%esp, %ebp			# put the previous stack pointer into the base pointer.
	subl	$8, %esp			# Balance the stack onto a 16-byte boundary.
	movl	$0, %eax			# Stuff 0 into EAX, which is where result values go.
	leave						# leave cleans up base and stack pointers again.
	ret							# returns to whoever called us.

Now, the underscore in front of "main" is a convention in C, so just accept it. When you enter the _main function, the return address (i.e. the instruction where the program will continue after the function has finished, aka "back pointer") has already been pushed on the stack, taking up 4 bytes. We also save the base pointer (the point where our caller can find its parameters on the stack) to the stack, and set it to the current stack pointer (which is where our parameters are). That takes another 4 bytes, so we have 8 bytes now. Since the stack should be aligned on 16 bytes before you can make a call to another function, we subtract another 8 from the stack pointer, which pads out the stack (we could also just do two "pushl $0" for the same effect). If we used any local variables, we would use this opportunity to subtract more for them.
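
The padding arithmetic can be checked with a few lines of C. This is only a sketch of the calculation, not something GCC emits, and stack_padding is a name I made up for it: given the number of bytes already on the stack, it returns how many more to subtract to land on a boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Padding (in bytes) to subtract from the stack pointer so that `used`
   bytes already on the stack end on an `alignment`-byte boundary.
   The name stack_padding is made up for this sketch. */
uint32_t stack_padding(uint32_t used, uint32_t alignment)
{
    /* alignment must be a power of two, e.g. 16 */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - used % alignment) % alignment;
}
```

With the 8 bytes from above (return address plus saved base pointer), stack_padding(8, 16) gives 8, which is the value in the subl instruction.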

Now comes the actual body of our function. What we do is simply return 0. This is done by stuffing 0 in the eax register.

Finally, we have the tail end of our function, which executes leave (which cleans up by restoring our caller's base pointer and stack pointer) and then ret, which pops the return address off the stack and continues execution there.

Calling a local function

Calling a function is fairly simple, as long as it's a local one right in the same file as ours. In that case, you first define that function and then call it from main:

.globl _doSomething				# Our doSomething function.
_doSomething:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	nop							# does nothing.
	leave
	ret
.globl _main
_main:
	pushl	%ebp
	movl	%esp, %ebp
	subl	$24, %esp			# 8 to align, 16 for our 4-byte parameter and padding.
	movl	$3, (%esp)			# write our parameter at the end of the stack (i.e. padding goes first).
	call	_doSomething		# call doSomething.
	movl	$0, %eax
	leave
	ret

"nop" is a do-nothing instruction I just inserted here to show where doSomething's code would go. That's pretty easy. You just write the function, put the parameters on the stack and use call to jump to the function, and that will take care of pushing the return address and all that. The only tricky thing is passing the parameters. You have to pad first, and then push (or mov, in our case) the parameters in reverse order, so that parameter #1 ends up at the lowest address, right at the stack pointer, with #2 four bytes above it, and so on. That's because otherwise the function being called would have to skip the padding. Well, could be worse.

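To see where the parameter ends up from the callee's point of view, here is a toy model of the stack in C that replays the instructions of main and doSomething one by one. It is a sketch only: ToyStack, push32 and replay_call are made-up names, the return addresses are dummy values, and I'm assuming the stack pointer is 16-byte aligned just before main gets called.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 256 };

/* A toy model of the 32-bit stack: some memory plus the two registers. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;   /* byte offset into mem; the stack grows downwards */
    uint32_t ebp;
} ToyStack;

void store32(ToyStack *s, uint32_t addr, uint32_t value)
{
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&s->mem[addr], &value, 4);
}

uint32_t load32(const ToyStack *s, uint32_t addr)
{
    uint32_t value;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&value, &s->mem[addr], 4);
    return value;
}

/* pushl, or the "push the return address" half of call. */
void push32(ToyStack *s, uint32_t value)
{
    s->esp -= 4;
    store32(s, s->esp, value);
}

/* Replays main's prologue and the call into doSomething, then reads the
   parameter the way doSomething would see it: at 8(%ebp). */
uint32_t replay_call(uint32_t parameter)
{
    ToyStack s;
    memset(&s, 0, sizeof s);
    s.esp = STACK_BYTES;            /* assumed aligned before main is called */

    push32(&s, 0xAAAA);             /* call _main: dummy return address      */
    push32(&s, s.ebp);              /* pushl %ebp                            */
    s.ebp = s.esp;                  /* movl  %esp, %ebp                      */
    s.esp -= 24;                    /* subl  $24, %esp                       */
    assert(s.esp % 16 == 0);        /* aligned again, ready for a call       */
    store32(&s, s.esp, parameter);  /* movl  $3, (%esp)                      */

    push32(&s, 0xBBBB);             /* call _doSomething: dummy return addr  */
    push32(&s, s.ebp);              /* pushl %ebp (inside doSomething)       */
    s.ebp = s.esp;                  /* movl  %esp, %ebp                      */

    return load32(&s, s.ebp + 8);   /* skip saved ebp and return address     */
}
```

Calling replay_call(3) walks through the same steps as the listing and hands back the 3 that main stored at the stack pointer.
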
Accessing parameters

To access any parameters, you address relative to the base pointer. The value right at the base pointer is your caller's saved base pointer, followed by the return address, so you need to add 4 + 4 = 8 bytes to reach the first parameter. Yes, add: since the stack starts at the end of memory and grows towards the beginning, and you subtract from the stack pointer to make the stack larger, you need to add to the stack pointer to find something that is already on the stack. The same applies to our base pointer, of course:

	movl	12(%ebp), %eax	# get parameter 2 at offset 4 + 4 + 4
	addl	8(%ebp), %eax	# get parameter 1 at offset 4 + 4

This loads your second parameter into eax and then adds the first parameter to it, leaving the result in eax, where it's ready for use as a return value. Note the ##(foo) syntax, which adds the number ## to the pointer foo. This is register-relative addressing.

An added benefit of this is that you can actually pass more parameters to a function than it knows to handle, and it will just ignore the rest.
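
The same lookup, written as a C sketch: the frame is modelled as an array of 32-bit words starting at the word the base pointer points at, so the word at byte offset 8 is index 2. The names param_offset and add_first_two are made up for illustration, and this is a model of the addressing, not what GCC generates.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of parameter n (1-based) from the base pointer: the
   4-byte saved base pointer and the 4-byte return address come first. */
uint32_t param_offset(uint32_t n)
{
    assert(n >= 1);
    return 4 + 4 + 4 * (n - 1);
}

/* C stand-in for the two instructions above. `frame` points at the word
   the base pointer points at; frame[i] is the word at byte offset 4*i. */
uint32_t add_first_two(const uint32_t *frame)
{
    uint32_t eax = frame[param_offset(2) / 4];   /* movl 12(%ebp), %eax */
    eax += frame[param_offset(1) / 4];           /* addl  8(%ebp), %eax */
    return eax;
}
```

If the caller stored a third parameter at offset 16, add_first_two never looks at it, which is the "ignore the rest" behaviour described above.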

Fetching data

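Accessing data (e.g. strings) is trickier. You declare data like the following:

	.cstring					# switch to the section for C strings.
myHelloWorld:					# label we use to find the string later.
	.ascii "Hello World!\0"		# .ascii adds no terminator, so we append the zero byte ourselves.
	.text						# back to the code section.
.globl _main
. . .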

So, you add a .cstring section above your function, and in it you declare a label (myHelloWorld here) and use the .ascii directive to actually stash your string there. So far, so good. There's only one problem:

All data manipulation is done using absolute addresses. But we don't know at what position in memory our program will be loaded. Labels aren't absolute addresses; they get compiled into relative offsets from the start of our code. So, how do we find out at which absolute address our string myHelloWorld is? Well, the trick Mach-O uses is that it knows that our program will be loaded as one huge chunk. So, we know that any instruction in our code will always stay at the same distance from our string.

So, if we could only get the address of one instruction in our code that has a label, we could calculate the absolute address of our string from that. Now, look above, at our function call code. Notice anything? Our return address is an absolute pointer to the next instruction after a function call. So, all we need to do to get our address is call a function. When you compile C source code, GCC names this helper function ___i686.get_pc_thunk.bx, which is quite a mouthful. Let's just call it _nextInstructionAddress:

. . .
	call	_nextInstructionAddress
myAnchorPoint:
. . .

That's what we call somewhere at the start of our code to find our own address. Note how I cleverly already added a label myAnchorPoint, which labels the instruction whose address we'll get. Then, somewhere else (e.g. at the bottom), we define that function:

. . .
_nextInstructionAddress:
	movl	(%esp), %ebx		# peek at the return address.
	ret

We don't even bother aligning the stack or changing and restoring the base pointer. This simply peeks at the last item on the stack (the return address) and stashes that in register ebx. Then it returns (and obviously doesn't call leave because we pushed no base pointer that it could restore).
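
For comparison, roughly the same trick can be written in C. This sketch relies on __builtin_return_address, which is a GCC/Clang extension rather than standard C, and next_instruction_address is just my name for it; it is not how GCC produces its own thunk.

```c
#include <assert.h>
#include <stddef.h>

/* Rough C counterpart of the helper: returns the address at which
   execution resumes after this call. noinline keeps it a real call,
   so there is a real return address to look at. */
__attribute__((noinline))
void *next_instruction_address(void)
{
    void *address = __builtin_return_address(0);
    assert(address != NULL);
    return address;
}
```

The pointer you get back lies inside whichever function made the call, just like the value the assembler version leaves in ebx.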

Once we have this address in ebx, we can do the following to get our string's address into a register, and from there onto the stack:

. . .
	leal	myHelloWorld-myAnchorPoint(%ebx), %eax
	movl	%eax, (%esp)
. . .

LEA means "Load Effective Address", i.e. take an address and stash it into a register. myHelloWorld-myAnchorPoint calculates the difference between our two labels, and thus tells us how far myHelloWorld is from myAnchorPoint. Since myHelloWorld is probably at the start of the program, e.g. at address 3 maybe, and myAnchorPoint further down, say at address 20, what we get is a negative value, e.g. -17. And xxx(%ebx) is how you tell the assembler that you want to add an offset to a register to get a memory address. ebx contains the address of myAnchorPoint, so what this does is subtract 17 from myAnchorPoint's absolute address, giving us the absolute address of myHelloWorld! Whooo! And this mess is called "position-independent code".
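
Here is the same calculation as a C sketch, using the example offsets 3 and 20 from the paragraph above. The "program" is just a byte buffer, and load_image and resolve_hello are made-up names; the point is only that the difference between the labels stays the same wherever the buffer happens to live.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    IMAGE_BYTES   = 64,
    HELLO_OFFSET  = 3,    /* where myHelloWorld sits in the image (example value) */
    ANCHOR_OFFSET = 20    /* where myAnchorPoint sits (example value)             */
};

/* Pretend to "load" the program at `image`: the string always lands at
   the same offset from the start, whatever the start address is. */
void load_image(char *image)
{
    memset(image, 0, IMAGE_BYTES);
    strcpy(image + HELLO_OFFSET, "Hello World!");
}

/* leal myHelloWorld-myAnchorPoint(%ebx), %eax
   `ebx` is the absolute address of myAnchorPoint found at run time. */
const char *resolve_hello(const char *ebx)
{
    ptrdiff_t displacement = HELLO_OFFSET - ANCHOR_OFFSET;  /* fixed at assembly time */
    assert(displacement == -17);
    return ebx + displacement;
}
```

Load the image into two different buffers, pass each buffer's address plus ANCHOR_OFFSET to resolve_hello, and both calls point at their own copy of the string.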

Now, our LEAL instruction computes a "Long" (which is 32 bits, i.e. the size of a pointer on a 32-bit CPU) and stashes it into register eax. And the movl instruction moves that long from our register into the last item on the stack, ready for use as a parameter to a function.

Calling a system function

Now, it'd be really nice if we could printf() or something, right? Well, trouble is, we don't know the address of printf(). But this time it's actually easy. We add a new section at the bottom of our code:

. . .
	.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
_printf_stub:
	.indirect_symbol _printf
	hlt ; hlt ; hlt ; hlt ; hlt
_getchar_stub:
	.indirect_symbol _getchar
	hlt ; hlt ; hlt ; hlt ; hlt

This is a new section named __IMPORT,__jump_table. It has the type symbol_stubs and the attributes self_modifying_code and pure_instructions. 5 is the size of each stub in bytes, and is intentionally the same as the number of hlt statements below.

This section is special, because when our code is loaded, the loader will look at it. It will see that there is an .indirect_symbol directive for a function named "printf", and will look up that function. Then it will replace the five hlt instructions, each of which is one byte in size, with an instruction to jump to that address (hence the self_modifying_code). We also added a label for each indirect symbol, which we name the same as the symbol, just with "_stub" appended.

So, to call printf, all you have to do now is push the string on the stack and then

	call	_printf_stub

This will jump to _printf_stub and immediately continue on to printf itself. And just to show you that you can have several such imported symbols, I've also included a stub for getchar. Now note that the system usually doesn't name these symbols "_foo_stub", but rather "L_foo$stub" (yes, a label name can contain dollar signs. You can even put the label in quotes and have spaces in it...). Same difference.
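
The patching step can be sketched in C. Big assumption here: I'm guessing the loader writes the usual five-byte x86 near jump (opcode 0xE9 followed by a 32-bit little-endian displacement counted from the end of the instruction), which is the encoding that fits exactly into five bytes. patch_stub and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { STUB_SIZE = 5, OPCODE_HLT = 0xF4, OPCODE_JMP_REL32 = 0xE9 };

/* Overwrites a five-byte stub that lives at address `stub_address`
   with "jmp target". Assumed encoding: 0xE9 plus a displacement that
   is relative to the end of the five-byte instruction. */
void patch_stub(uint8_t stub[STUB_SIZE], uint32_t stub_address, uint32_t target)
{
    /* Before patching, the stub is nothing but one-byte hlt instructions. */
    for (int i = 0; i < STUB_SIZE; i++)
        assert(stub[i] == OPCODE_HLT);

    uint32_t displacement = target - (stub_address + STUB_SIZE);
    stub[0] = OPCODE_JMP_REL32;
    for (int i = 0; i < 4; i++)
        stub[1 + i] = (uint8_t)(displacement >> (8 * i));   /* little-endian */
}
```

For a stub at 0x1000 and a target at 0x2000, the displacement is 0x2000 - 0x1005 = 0xFFB, so the five hlt bytes become E9 FB 0F 00 00.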

Okay, so that's how much I've guessed my way through it so far. Comments? Corrections? If you want

PS - Thanks to John Kohr, Alexandre Colucci, Jonas Maebe, Eric Albert and Jordan Krushen, all of whom helped me figure this out one way or another. Thanks, guys!

Update: Added mention of how to actually access parameters.