What is Bitcode?

LLVM Bitcode is a way of storing your code inside your executable in a form that allows Apple to re-target it to different CPUs. Some areas of Apple’s App Store ecosystem require that your app include LLVM Bitcode, and Xcode will simply compile it in as a second copy of your code. But what is it?

Bitcode is a compact binary encoding of LLVM IR, the “LLVM Intermediate Representation”. And what is LLVM IR?
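You can see the two forms side by side with a minimal sketch. It assumes Homebrew’s LLVM tools (`llvm-as` and `llvm-dis`) are installed. The file name `add.ll` and the function `@add` are made up for illustration. The script writes a tiny hand-written IR module in textual form, assembles it into binary bitcode, and disassembles it back into text:

```shell
#!/bin/sh
# Assumption: Homebrew's keg-only LLVM lives at $(brew --prefix llvm)/bin.
command -v brew >/dev/null 2>&1 && PATH="$(brew --prefix llvm)/bin:$PATH"

# A tiny, hand-written LLVM IR module (textual form).
cat > add.ll <<'EOF'
define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}
EOF

# Textual IR -> binary bitcode.
llvm-as add.ll -o add.bc

# Binary bitcode -> textual IR again.
llvm-dis add.bc -o roundtrip.ll

# Raw bitcode files begin with the magic bytes 'B' 'C' 0xC0 0xDE.
od -A n -t x1 -N 4 add.bc
cat roundtrip.ll
```

The `.bc` file carries the same module as the `.ll` file, just encoded for machines instead of people.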

LLVM IR is a sort of portable, assembler-like meta-language that LLVM uses internally. The compiler translates your code into these abstract concepts, and the back-end then takes this intermediate representation and turns it into actual machine code for whatever architecture you are compiling for.
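Here is a minimal sketch of that split, assuming Homebrew’s LLVM is installed. `square.ll` and `@square` are made-up names. It hands one hand-written IR function to `llc`, LLVM’s static compiler, once for 64-bit ARM and once for x86-64. The `-mtriple` flag picks the back-end, and the IR itself does not change. This only illustrates the principle on your own machine. It is not meant as a description of Apple’s pipeline.

```shell
#!/bin/sh
# Assumption: Homebrew's keg-only LLVM lives at $(brew --prefix llvm)/bin.
command -v brew >/dev/null 2>&1 && PATH="$(brew --prefix llvm)/bin:$PATH"

# One target-independent IR module (note: no "target triple" line).
cat > square.ll <<'EOF'
define i32 @square(i32 %x) {
entry:
  %r = mul i32 %x, %x
  ret i32 %r
}
EOF

# The same IR, lowered by two different back-ends.
llc -mtriple=arm64-apple-macos  square.ll -o square-arm64.s
llc -mtriple=x86_64-apple-macos square.ll -o square-x86_64.s

# Compare the generated assembly.
cat square-arm64.s square-x86_64.s
```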

This allows the optimizer to transform the code without having to know the underlying CPU, so you don’t need a new optimizer for each CPU. You just turn off the optimizations that depend on concepts a particular CPU doesn’t support.

In some ways, LLVM IR is higher-level than assembler, and in others it is just as low-level. For example, LLVM IR knows about functions and types, but it also knows about registers and the stack. It is generated from already-preprocessed code and bakes in platform-specific assumptions (such as the sizes of types), but it is not actually specific to a given CPU.
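You can watch this happen in real compiler output. The sketch below uses either the clang that ships with Xcode or Homebrew’s. `locals.c` and `scale` are made-up names. clang’s `-S -emit-llvm` flags write textual IR instead of an object file. At `-O0`, local variables live in stack slots (`alloca`), the `FACTOR` macro has already been expanded, and `sizeof(long)` has become a plain constant. The module header also records the target triple and data layout the IR was generated for.

```shell
#!/bin/sh
cat > locals.c <<'EOF'
#define FACTOR 3

long scale(long value) {
    long scaled = value * FACTOR;   /* macro is gone by the time IR exists */
    return scaled + (long)sizeof(long);
}
EOF

# -O0 keeps the naive stack-based code; -S -emit-llvm writes textual IR.
clang -O0 -S -emit-llvm locals.c -o locals.ll

# Platform assumptions recorded in the module header.
grep -E '^target (datalayout|triple)' locals.ll

# The function definition, its stack slots, and the arithmetic.
grep -E 'define|alloca|mul|add' locals.ll
```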

This also means that LLVM IR is about as readable (or unreadable) as assembly code. And since bitcode is not intended for humans to maintain, it isn’t quite as easy to read as hand-written assembly: there are no local variable names, for example.
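The missing names are easy to demonstrate. This sketch assumes a typical release build of clang, which discards local value names by default. `-fno-discard-value-names` is clang’s flag for keeping them. `names.c` and `area` are made up for the example.

```shell
#!/bin/sh
cat > names.c <<'EOF'
int area(int width, int height) {
    int result = width * height;
    return result;
}
EOF

# Default: a release build of clang discards local value names.
clang -O0 -S -emit-llvm names.c -o names-default.ll

# Ask clang to keep them, for comparison.
clang -O0 -S -emit-llvm -fno-discard-value-names names.c -o names-kept.ll

# diff exits non-zero when the files differ, which is expected here.
diff names-default.ll names-kept.ll || true
```

In the default output, parameters and locals should show up as numbered values such as `%0` and `%1`. In the second file, they keep names like `%width`.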

So This Means I’m Giving Out My Source Code?

Nope. As I said, it’s about as readable as assembly without variable names, so anyone who can point a disassembler at your executable already gets about the same level of readability. Also, while I haven’t verified it, I’m pretty certain Apple strips bitcode from executables before sending them to users, so Apple is the only party that even gets a copy of your bitcode.
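If you want to test the stripping theory yourself, note that embedded bitcode lives in a Mach-O segment named `__LLVM`. In a linked app, it sits in a section called `__bundle` that holds a xar archive. The minimal sketch below checks for that segment with `otool -l`. The script name `check-bitcode.sh` and the path `MyApp.app/MyApp` are hypothetical. If Apple does strip bitcode, a copy of your app as delivered to users should lack this segment. For universal binaries, you may want to add `-arch arm64` to the `otool` call.

```shell
#!/bin/sh
# Embedded bitcode is stored in a Mach-O segment named __LLVM.
has_bitcode_segment() {
    otool -l "$1" | grep -q 'segname __LLVM'
}

# Usage: ./check-bitcode.sh MyApp.app/MyApp   (hypothetical path)
for bin in "$@"; do
    if [ ! -f "$bin" ]; then
        echo "not found: $bin" >&2
    elif has_bitcode_segment "$bin"; then
        echo "$bin: contains an __LLVM segment (embedded bitcode)"
    else
        echo "$bin: no __LLVM segment"
    fi
done
```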

I Don’t Believe You. How Do I See for Myself?

For my explorations, I used Alex Denisov’s Bitcode Retriever. It extracts the bitcode from your executable as one XAR archive per architecture, such as arm64.xar. You can then unpack each archive using

xar -x -f arm64.xar

which gives you additional xar files with names like 1 and 2. Those are the actual segments of your application, represented as bitcode. Unpack them to get files named 01, 02, etc., which are the actual bitcode files.
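If your executable has several architectures, the numbered files from each archive would overwrite each other if extracted into one directory. The sketch below is a hypothetical helper, not part of Bitcode Retriever. It extracts each archive into its own directory, named after the archive. It then classifies every extracted file by its first four bytes:

- Raw LLVM bitcode starts with `'B' 'C' 0xC0 0xDE`.
- Wrapped bitcode starts with the magic `0x0B17C0DE`, stored little-endian.
- xar archives start with `xar!`.

This tells you which files still need unpacking and which are ready for llvm-dis.

```shell
#!/bin/sh
# Print what kind of file $1 is, judging only by its first four bytes.
bitcode_kind() {
    magic=$(od -A n -t x1 -N 4 "$1" | tr -d ' \n')
    case "$magic" in
        4243c0de) echo "raw bitcode" ;;        # 'B' 'C' 0xC0 0xDE
        dec0170b) echo "wrapped bitcode" ;;    # 0x0B17C0DE, little-endian
        78617221) echo "xar archive" ;;        # "xar!"
        *)        echo "unknown ($magic)" ;;
    esac
}

# Extract each per-architecture archive into its own directory.
for archive in *.xar; do
    [ -f "$archive" ] || continue          # no .xar files: glob did not match
    dir="${archive%.xar}"
    mkdir -p "$dir"
    (cd "$dir" && xar -x -f "../$archive")
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            echo "$f: $(bitcode_kind "$f")"
        fi
    done
done
```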

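Now you can install LLVM using Homebrew (brew install llvm) to get llvm-dis, the bitcode disassembler. It turns your bitcode back into LLVM IR, in files named 01.ll and so on. Homebrew installs LLVM keg-only, so call the tools by their full path. On Apple Silicon Macs, the prefix is /opt/homebrew instead of /usr/local: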

/usr/local/opt/llvm/bin/llvm-dis 01
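With more than a couple of files, a loop saves typing. This is a minimal sketch, assuming the numbered bitcode files sit in the current directory and Homebrew’s LLVM is installed. `disassemble_all` is a made-up helper name. llvm-dis may refuse bitcode written by an Apple clang that is newer than, or otherwise differs from, your Homebrew LLVM. The loop reports such files instead of stopping.

```shell
#!/bin/sh
# Assumption: Homebrew's keg-only LLVM lives at $(brew --prefix llvm)/bin.
command -v brew >/dev/null 2>&1 && PATH="$(brew --prefix llvm)/bin:$PATH"

# Hypothetical helper: run llvm-dis on every numbered file in directory $1.
disassemble_all() {
    for f in "$1"/[0-9]*; do
        case "$f" in
            *.ll) continue ;;             # skip output from an earlier run
        esac
        [ -f "$f" ] || continue           # glob matched nothing
        if llvm-dis "$f" -o "$f.ll"; then
            echo "wrote $f.ll"
        else
            echo "llvm-dis could not read $f" >&2
        fi
    done
}

disassemble_all .
```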

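You can also get some interesting information by pointing llvm-bcanalyzer, LLVM’s bitcode analyzer, at the files. With -dump, it prints the file’s internal block and record structure, and -show-binary-blobs includes the binary blobs in that dump: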

/usr/local/opt/llvm/bin/llvm-bcanalyzer -dump -show-binary-blobs 01

That’s pretty much all there is to seeing your app’s bitcode for yourself.