Cocoa: String comparisons and the optimizer

By uliwitness

Woman in front of a mirrorA while ago, a friend came to me with this bit of code:

NSString *a = @"X";
NSString *b = @"X";
if( a == b )
{
    NSLog(@"Same!");
}

“How come it works with the == operator? Didn’t you have to call isEqualToString: in the old days?”

Before we answer his question, let’s go into what he implicitly already knew:

Why wouldn’t == work on objects?

By default, C compares two pointers by simply comparing the addresses. That is logical, fast, and useful. However, it is also a little annoying with strings, arrays and other collections, because you may have two collections that still contain identical objects.

If you have the phone books from 2013 and 2014, do you just want to compare the numbers 2013 and 2014 and be told: “No that’s not the same phone book”, or are you actually interested in whether their contents are different? If nobody’s phone book entry changed in a particular city, wouldn’t you want to know that and save yourself the trip to the phone company to pick up a new phone book?

Since all Objective-C objects are pointers, the only way to do more than compare the addresses needs some special syntax. So NSString offers the isEqualToString: method, which, if the pointers do not match, goes on to check their contents. It compares each character to the same position in the second string to find out whether even though they’re not the same slip of paper, they at least have the same writing on it.

So why does the code above think they’re the same?

After all that, why does the code above think they are the same object after all? Doesn’t a point to the @"X" in the first line, b to the @"X" in the second line?

That is what is conceptually true, what a naïve compiler would do. However, most compilers these days are smart. Compilers know that a string constant can never change. And they see that the contents of both string objects pointed to by a and b are the same. So they just create one constant object to save memory, and make both point to the same object.

There is no difference for your program’s functionality. You get an immutable string containing “X”.

However, note that there is no guarantee that this will happen. Some compilers perform this optimization for identical strings in one file, but not across files. Others are smarter, and give you the same object even across source files in the same project. On some platforms, the string class keeps track of all strings it has created so far, and if you ask for one it already did, gives you that, to save RAM. In most, if you load a dynamic library (like a framework) it gets its own copy of each string, because the compiler can not know whether the surrounding application actually already has that string (it might be loaded into any arbitrary app.

Mutability is important

This is a neat trick compilers use that works only with immutable objects, like NSString or NSDictionary. It does not work with NSMutableString. Why? Because if it put the same mutable string into a and b, and you call appendString: on a, b would change as well. And of course we wouldn’t want to change our program’s behaviour that way.

For the same reason, NSString may be optimized so that copy is implemented like this:

-(id) copy
{
    return [self retain];
}

That’s right. It gives you the same object, just with the reference count bumped, because you can’t change this string once it has been created. From the outside it looks the same. Copy gives you the object with its retain count bumped, so you can release it safely once you’re done with it. It behaves just like a copy. The only hint you may have that this happened is that instead of an NSString with reference count owned solely by you, you get one with a reference count of 2 whose ownership you share with another object. But that’s what shared ownership is about after all.

Of course, this optimization doesn’t work with NSMutableString.

What I take away from this

So if someone walks up to you and shows you code that uses the == operator where it should really be checking for content equality, and argues that “it works, so it is correct”, now you’ll know why it just happens to work:

It’s a fluke, and if Apple decides to switch compilers or finds a better way to optimize performance or memory usage that requires them to no longer perform this optimization, they might just remove it, and this code will break, because it relied on a side effect. And we don’t want our code to break.

Share your thoughts