assembler


Funny thing about C parameter evaluation order…

I just explained this to a friend today, and thought this might make an interesting blog posting:

#include <stdio.h>

int main( int argc, const char * argv[] )
{
    char    theText[2] = { 'A', 'B' };
    char*   myString = theText;
    printf( "%c, %c\n", *(++myString), *myString );

    return 0;
}

The above code is platform-dependent in C. Yes, you read correctly: platform-dependent. And I’m not nitpicking about old compilers, or about platforms that lack printf() or POSIX support.

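This code is platform-dependent because the C standard does not guarantee the order in which the parameters of a function call get evaluated. (Strictly speaking, since myString is both modified and read in the same parameter list with no sequence point in between, the standard even calls this undefined behaviour, so a compiler is free to do something else entirely.) So, if you run the above code, it could print B, B (which most of you probably expected because it corresponds to our left-to-right reading order) or it could print B, A.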

If you want to test this and you own an Intel Mac, you can do the following thanks to Rosetta’s PowerPC emulation: Create a new “Standard Tool” project in Xcode and paste the above code into the main.c file. Switch to “Release” and change “Architectures” in the build settings for the release build configuration to be “ppc”. Build and Run. It’ll print B, B. Now change the architecture to “i386” and build and run again. It’ll print B, A.
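If you'd rather run the experiment without the pointer trick, here is a minimal sketch that stays within defined behaviour: each parameter is a call to a helper that notes when it got evaluated. The names record(), add3() and probe_order() are made up for this illustration. Which order ends up in the log is still up to your compiler and platform.

```c
#include <stdio.h>

/* Log of the order in which the parameters were evaluated. */
static int gOrder[3];
static int gCount = 0;

/* Hypothetical helper: remembers that parameter `id` was just evaluated. */
static int record( int id )
{
    gOrder[gCount++] = id;
    return id;
}

static int add3( int a, int b, int c )
{
    return a + b + c;
}

/* Calls add3() with three logging parameters and prints the order seen.
   Returns the sum, which is 6 whatever the order was. */
int probe_order( void )
{
    gCount = 0;
    int sum = add3( record(1), record(2), record(3) );
    printf( "evaluated in order: %d, %d, %d\n", gOrder[0], gOrder[1], gOrder[2] );
    return sum;
}
```

Call probe_order() from main() and build it for different architectures or with different compilers to compare the logged order.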

So, why doesn’t C define an order? Why did anyone think such odd behaviour was a good idea? Well, to explain that, we’ll have to look at what your computer does under the hood to execute a function call. In general, there are two steps: First, the parameters are evaluated and stored in some standardized place where the called function can find them, and then the processor “jumps” to the first instruction of the new function and starts executing it.

Most CPUs have registers, which are little variables inside the CPU that can hold short values, and which can be accessed a lot quicker than actually going over to a RAM chip and fetching a value. There are different registers for different kinds of values. Many CPUs have separate registers for floating-point numbers and integers. And just like with RAM, it’s sometimes faster to access these registers in a certain order.

So, it may be faster to first evaluate all integer-value parameters, and then those that contain floating-point values. Depending on what physical CPU your computer has (or in the case of Rosetta, what characteristics the emulated CPU your code is being run on has), these performance characteristics may be different. Some CPUs may have so few registers that the parameters will always have to be passed in RAM. Others may put larger parameters in RAM and smaller ones in registers. Others again may put the first couple of parameters in registers (maybe even distributing a longer parameter across several registers), and the rest, which don’t fit, in RAM, etc.

So, to make sure C can be made to run that little bit faster on any of these CPUs, its designers decided not to enforce an order for the evaluation of parameters. And that’s one of the dangers of writing code in C, C++ or Objective-C: It may look like a high-level language, but underneath it is still a portable assembler, with platform-dependencies like this.
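If you need one particular order, the usual way out is to evaluate the parameters into local variables first, because every statement finishes before the next one starts. A minimal sketch of the example from above, rewritten that way (format_in_order() is a made-up name, and it writes into a buffer instead of printing so the result is easy to check):

```c
#include <stdio.h>

/* Writes "X, Y" into `out`, evaluating the two values in a fixed order. */
void format_in_order( char *out, size_t outSize )
{
    char    theText[2] = { 'A', 'B' };
    char*   myString = theText;

    /* Each statement ends in a sequence point, so the order is fixed. */
    char    first = *(++myString);  /* advance, then read: 'B' */
    char    second = *myString;     /* read again: still 'B' */

    snprintf( out, outSize, "%c, %c", first, second );
}
```

With this version the buffer holds B, B on PowerPC and Intel alike, because the compiler no longer gets to pick the order.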

Generating Machine Code at Runtime

Okay, so my next attempt at learning how my computer works and how to speak machine language is the following C code fragment:

typedef int (*FuncPtr)();

// Create a function:
unsigned char   testFunc[] = { 0x90,                         // NOP (not really necessary...)
                               0xB8, 0x10, 0x00, 0x00, 0x00, // MOVL $16,%eax
                               0xC3 };                       // RET

// Make a copy on the heap, OS doesn't like executing the stack:
FuncPtr         testFuncPtr = (FuncPtr) malloc( sizeof(testFunc) );
memmove( (void*) testFuncPtr, testFunc, sizeof(testFunc) );

printf("Before function.\n");
int result = (*testFuncPtr)();
printf("Result %d\n", result);

Basically, this stores the raw opcodes of a function in an array of chars. The first byte of each line is usually the opcode, i.e. 0x90 is No-Op, 0xB8 is a MOVL into the eax register (with the next 4 bytes being the number to store, in this case 16), and 0xC3 is the return instruction (I had to look up the opcodes in Intel’s documentation).

One thing to watch out for here (at least on Mac OS X), is that you’ll get a bad access error if you try to execute testFunc directly. That’s because testFunc is on the stack, and the stack shouldn’t contain executable code (it’s a small safety measure). So, what we do is we simply malloc some memory on the heap, and stuff our code in there.

You may wonder why I’m using eax of all registers to store my number 16 in. Easy: Because the convention is that an int return value (and most other 4-byte return values) goes in eax when a function returns. So, what this does is it essentially returns 16. Which our printf() proves. Neat!
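One caveat: many current systems refuse to execute malloc()ed memory as well, not just the stack. A hedged sketch of one way around that, assuming a Unix-like system on an x86 CPU, is to ask for a page with mmap(), copy the code in, and then trade write permission for execute permission with mprotect(). The name run_generated_code() and the SKETCH_ macros are made up for this illustration, and on any platform the sketch doesn't cover it simply gives up and returns -1.

```c
#include <stddef.h>
#include <string.h>

/* Only try this on x86 CPUs and Unix-like systems (an assumption of this sketch). */
#if (defined(__i386__) || defined(__x86_64__)) && (defined(__unix__) || defined(__APPLE__))
  #include <sys/mman.h>
  #if defined(MAP_ANONYMOUS)
    #define SKETCH_MAP_ANON MAP_ANONYMOUS
  #elif defined(MAP_ANON)
    #define SKETCH_MAP_ANON MAP_ANON
  #endif
#endif

typedef int (*FuncPtr)( void );

/* Runs the "return 16" machine code from an executable memory page.
   Returns the function's result, or -1 if this platform is not covered
   by the sketch or the OS refuses the request. */
int run_generated_code( void )
{
#if defined(SKETCH_MAP_ANON)
    static const unsigned char code[] = { 0xB8, 0x10, 0x00, 0x00, 0x00,  /* MOVL $16,%eax */
                                          0xC3 };                        /* RET */
    size_t  pageSize = 4096;    /* assumption: plenty of room for 6 bytes */

    /* Ask for writable memory first... */
    void*   mem = mmap( NULL, pageSize, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | SKETCH_MAP_ANON, -1, 0 );
    if( mem == MAP_FAILED )
        return -1;
    memcpy( mem, code, sizeof(code) );

    /* ...then swap write permission for execute permission. */
    if( mprotect( mem, pageSize, PROT_READ | PROT_EXEC ) != 0 )
    {
        munmap( mem, pageSize );
        return -1;
    }

    FuncPtr func = (FuncPtr) mem;
    int     result = (*func)();
    munmap( mem, pageSize );
    return result;
#else
    return -1;
#endif
}
```

The same six bytes also work in 64-bit mode, where the 0xB8 form still loads eax and the int return value is still read from there.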

Intel’s documentation describes the opcodes in a very complicated way, so what I essentially do is I write some assembler code and enclose the instruction whose byte sequence I want to find out in instructions whose byte sequence I already know (I like to use six nops, which are short and show up as 0x90 90 90 90 90 90). Then I compile that, and then use a hex editor to search for the known instructions, and whatever is between them must be my new one. Here’s a small table of other operations you may find in the typical program and what byte sequences they turn to:
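If you don't feel like squinting at a hex editor, the search can be scripted. Here's a minimal sketch, assuming you have already read the compiled file into a byte buffer; find_marker() and bytes_between_markers() are made-up names. It looks for the first two runs of six nops and reports whatever sits between them.

```c
#include <stddef.h>
#include <string.h>

enum { MARKER_LENGTH = 6 };

/* Returns the index of the first run of six 0x90 bytes at or after `from`,
   or `length` if there is none. */
static size_t find_marker( const unsigned char *bytes, size_t length, size_t from )
{
    static const unsigned char marker[MARKER_LENGTH] =
        { 0x90, 0x90, 0x90, 0x90, 0x90, 0x90 };

    for( size_t i = from; i + MARKER_LENGTH <= length; i++ )
    {
        if( memcmp( bytes + i, marker, MARKER_LENGTH ) == 0 )
            return i;
    }
    return length;
}

/* Finds the bytes enclosed between the first two markers. On success,
   stores their offset and count and returns 1; otherwise returns 0. */
int bytes_between_markers( const unsigned char *bytes, size_t length,
                           size_t *offset, size_t *count )
{
    size_t first = find_marker( bytes, length, 0 );
    if( first == length )
        return 0;

    size_t start = first + MARKER_LENGTH;
    size_t second = find_marker( bytes, length, start );
    if( second == length )
        return 0;

    *offset = start;
    *count = second - start;
    return 1;
}
```

This only works if the instruction you are hunting for doesn't itself begin or end with 0x90, and if six nops in a row don't turn up elsewhere in the file first.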

0x50                    pushl %eax
0x53                    pushl %ebx
0x55                    pushl %ebp
0x89 E5                 movl %esp, %ebp
0x90                    nop
0xB8 NN NN NN NN        movl $N, %eax
0x68 NN NN NN NN        pushl $N
0xE8 NN NN NN NN        call relativeOffsetNFromEndOfInstruction
0x8B 1C 24              movl (%esp), %ebx
0x8D 83 NN NN NN NN     leal relativeOffsetToData(%ebx), %eax
0x8D 85 NN NN NN NN     leal relativeOffsetToData(%ebp), %eax
0x5B                    popl %ebx
0x83 C4 NN              addl $NN,%esp
0x83 EC NN              subl $NN,%esp
0x8B 00                 movl (%eax), %eax
0x89 45 NN              movl %eax, NN(%ebp)
0xC9                    leave
0xC3                    ret

The code fragment above is essentially what one would need to create a just-in-time compiler. For a real compiler, instead of executing this directly, we’d have to write it to a complete Mach-O file and link it with crt1.o.
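To give an idea of what the code-generating half of such a just-in-time compiler could look like, here is a minimal sketch that strings a few entries from the table together into a function that returns a constant. emit_int32() and emit_return_constant() are made-up names. The output is 32-bit code and is only written to a buffer here, not executed.

```c
#include <stddef.h>
#include <stdint.h>

/* Appends a 32-bit value in little-endian byte order, as x86 expects. */
static size_t emit_int32( unsigned char *out, size_t pos, uint32_t value )
{
    out[pos++] = (unsigned char) (value & 0xFF);
    out[pos++] = (unsigned char) ((value >> 8) & 0xFF);
    out[pos++] = (unsigned char) ((value >> 16) & 0xFF);
    out[pos++] = (unsigned char) ((value >> 24) & 0xFF);
    return pos;
}

/* Writes 32-bit x86 code for "int f() { return value; }" into `out`,
   which must have room for at least 10 bytes. Returns the byte count. */
size_t emit_return_constant( unsigned char *out, int32_t value )
{
    size_t pos = 0;

    out[pos++] = 0x55;                      /* pushl %ebp */
    out[pos++] = 0x89;                      /* movl %esp, %ebp */
    out[pos++] = 0xE5;
    out[pos++] = 0xB8;                      /* movl $value, %eax */
    pos = emit_int32( out, pos, (uint32_t) value );
    out[pos++] = 0xC9;                      /* leave */
    out[pos++] = 0xC3;                      /* ret */

    return pos;
}
```

Note how the NN NN NN NN in the table is stored with the lowest byte first, which is why 16 showed up as 0x10, 0x00, 0x00, 0x00 in the fragment further up.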

Update: on top of the instructions for position-independent code (PIC), I’ve also added some more that are useful for passing structs as parameters on the stack.

Nice Intel assembler text…

[two of Intel's instruction set manuals]

I’ve recently been looking into assembler coding a little. I learned assembler theory back in High School in Mr. Trapp’s computer programming elective, and later learned a bit of 68000 assembler as well, but never got round to actually getting into it when the PPC arrived on the scene. So, when I recently heard at work how one can get a whole bunch of Intel reference books for free, I thought this might be a good opportunity to learn x86 assembler. After all, I’m a parser and compiler geek, it’s kind of a gap in my skill set if I can’t do the backend.

Now, trouble is, while there are many tutorials for Linux and Windows, I couldn’t find a single one for Mac OS X. So, I started googling, assembling C code and bothering some developers I know and others on mailing lists with my questions, and I thought I’d share my first findings:

  • I got a link to Apple’s Mac OS X ABI docs. This is really good, as it documents an important part of OS X in detail: How to align the stack (on 16 bytes, no matter what Intel’s docs tell you), and how to call your own functions.
  • The aforementioned 16-byte stack alignment is not always necessary, but when you call a function, you must give it a properly aligned stack. When you are called, however, the stack will have the return address on it, which is 4 bytes. So, after you push the base pointer on the stack (4 more bytes), you have to move the stack pointer by another 8 bytes at least to make it aligned on a 16-byte boundary again.
  • A nice way to learn assembler is by writing very simple C programs and using gcc -S my_simple_c_program.c to get them translated into assembler code. By simple, I mean that you should start out with stuff that doesn’t use any system functions, because those are dynamically linked and make for rather complex assembler.
  • To turn the resulting assembler file into a program, simply pass it to GCC again, as you would with a C source file. E.g. gcc my_simple_c_program.s -o my_simple_c_program
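The alignment arithmetic from the second point can be written down as a little helper. This is just a sketch for 32-bit code, assuming the caller's stack was 16-byte aligned right before the call instruction; aligned_frame_size() is a made-up name. It computes the N you'd use in subl $N,%esp after pushing the base pointer.

```c
#include <stddef.h>

enum { STACK_ALIGNMENT = 16, RETURN_ADDRESS_SIZE = 4, SAVED_EBP_SIZE = 4 };

/* Returns the N for "subl $N,%esp" that leaves room for `localBytes` of
   locals and outgoing parameters and makes the stack 16-byte aligned again. */
size_t aligned_frame_size( size_t localBytes )
{
    size_t alreadyPushed = RETURN_ADDRESS_SIZE + SAVED_EBP_SIZE;   /* 8 bytes */
    size_t total = alreadyPushed + localBytes;
    size_t padding = (STACK_ALIGNMENT - total % STACK_ALIGNMENT) % STACK_ALIGNMENT;

    return localBytes + padding;
}
```

With no locals at all this gives 8, matching the "another 8 bytes at least" above. With 20 bytes of locals it gives 24, because 4 + 4 + 24 is 32, the next multiple of 16.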

This might be a good point to mention my Memory Management chapter in the Masters of the Void C tutorial again, which illustrates how memory works. As I learn more, I may post supplements to that that slowly teach you assembler. Well, I’m not promising anything, but I’d love to do that.