Anti-disassembly & obfuscation #1: Apple doesn’t follow their own Mach-O specifications?

I smile when I think about this “feature”! I liked it so much that things got out of control and I wrote a crackme to show it. It happens because Apple doesn’t follow their own documentation/specification, while the reversing tools of the trade do. The result is that IDA terminates, disassemblers output the wrong disassembly, strings are messed up, lldb (but not gdb) disassembles the wrong code, class-dump fails, and the reverser is left looking at a weird Mach-O header.

In the end, it’s just a funny illusion 🙂

If you try to load the crackme into IDA, it will complain about negative sizes and/or offsets. otool also outputs weird stuff, such as sections past the end of the file. The problem lies in the section structure and a few of its fields. The 32-bit version of this structure is:

struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

Let’s start with the field that produces the “best” results: offset. The definition in the reference document is:
“An integer specifying the offset to this section in the file.”

My interpretation of this is (should be?) the offset (anywhere) in the file where the code/data for the section is located. That makes sense, right? It’s an offset, so in theory it can point anywhere in the file; it doesn’t need to be sequential or in a specific order. Once again, it’s open to some kind of abuse 🙂

What happens if you change the offset value to somewhere else?
IDA, for example, will respect the content of the offset field and try to read the data it points to. Want to do a simple test? Grab a normal file, change the cstring section offset, save, and load it into IDA. Voilà, the strings are now “obfuscated” because IDA is reading the wrong data.
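The test above can also be done programmatically. This is only a minimal sketch, not the manglemacho code: it assumes a thin (non-fat) 32-bit Mach-O already read into memory, in the host's byte order, and the function name mangle_section_offset is made up for illustration. The structures are redeclared locally (mirroring the layout in <mach-o/loader.h>) so the sketch does not depend on Apple headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC   0xfeedfaceu  /* 32-bit Mach-O magic, host byte order */
#define LC_SEGMENT 0x1u

/* Local copies of the 32-bit layouts, so no Apple headers are needed. */
struct mach_header { uint32_t magic; int32_t cputype, cpusubtype;
                     uint32_t filetype, ncmds, sizeofcmds, flags; };
struct load_command { uint32_t cmd, cmdsize; };
struct segment_command { uint32_t cmd, cmdsize; char segname[16];
                         uint32_t vmaddr, vmsize, fileoff, filesize;
                         int32_t maxprot, initprot; uint32_t nsects, flags; };
struct section { char sectname[16], segname[16];
                 uint32_t addr, size, offset, align, reloff, nreloc,
                          flags, reserved1, reserved2; };

/* Hypothetical helper: overwrite the offset field of the first section
 * named `sectname`. Returns 1 if patched, 0 if not found or malformed. */
int mangle_section_offset(uint8_t *buf, size_t len,
                          const char *sectname, uint32_t new_offset)
{
    struct mach_header mh;
    assert(buf != NULL && sectname != NULL);
    if (len < sizeof mh) return 0;
    memcpy(&mh, buf, sizeof mh);
    if (mh.magic != MH_MAGIC || mh.sizeofcmds > len - sizeof mh) return 0;

    size_t pos = sizeof mh;
    size_t end = sizeof mh + mh.sizeofcmds;
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        struct load_command lc;
        if (end - pos < sizeof lc) return 0;
        memcpy(&lc, buf + pos, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > end - pos) return 0;

        if (lc.cmd == LC_SEGMENT && lc.cmdsize >= sizeof(struct segment_command)) {
            struct segment_command seg;
            memcpy(&seg, buf + pos, sizeof seg);
            /* Never walk past the end of this load command. */
            size_t room = (lc.cmdsize - sizeof seg) / sizeof(struct section);
            uint32_t n = seg.nsects < room ? seg.nsects : (uint32_t)room;
            for (uint32_t j = 0; j < n; j++) {
                size_t spos = pos + sizeof seg + (size_t)j * sizeof(struct section);
                struct section s;
                memcpy(&s, buf + spos, sizeof s);
                if (strncmp(s.sectname, sectname, sizeof s.sectname) == 0) {
                    s.offset = new_offset;          /* the only field touched */
                    memcpy(buf + spos, &s, sizeof s);
                    return 1;
                }
            }
        }
        pos += lc.cmdsize;
    }
    return 0;
}
```

Calling it with "__cstring" and some arbitrary offset, then writing the buffer back to disk, should reproduce the experiment: the tools follow the new offset, and nothing else in the file changes.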

That is fun, right? And if you try to run the modified binary, it works fine! That is, sort of, unexpected. Try the same trick with the text section. Now it’s the program code that is all wrong and it still runs fine. Hum…

What is happening? That is the fun part. I think that a good picture for this is that the kernel loads and maps the binary in a linear way from the disk and ignores the offset field.
The execve() system call is explained in detail starting on page 812 of the great Mac OS X Internals book. The exec_mach_imgact() function (bsd/kern/kern_exec.c) calls load_machfile(), which is responsible for loading the executable, handling certain Mach-O load commands, etc.
@bsd/kern/kern_exec.c

        /*
         * Actually load the image file we previously decided to load.
         */
        lret = load_machfile(imgp, mach_header, thread, map, &load_result);

Inside load_machfile(), we have a call to parse the new binary, parse_machfile().
@bsd/kern/mach_loader.c

        lret = parse_machfile(vp, map, thread, header, file_offset, macho_size,
                              0, result);

We can find there a nice description of this function:

/*
 * The file size of a mach-o file is limited to 32 bits; this is because
 * this is the limit on the kalloc() of enough bytes for a mach_header and
 * the contents of its sizeofcmds, which is currently constrained to 32
 * bits in the file format itself.  We read into the kernel buffer the
 * commands section, and then parse it in order to parse the mach-o file
 * format load_command segment(s).  We are only interested in a subset of
 * the total set of possible commands.
 */

Scrolling down that function, you can observe a loop that processes a subset of all possible commands. The section structures are found inside an LC_SEGMENT/LC_SEGMENT_64 command, so you want to take a look at load_segment(). There you can observe that verifications are only done at the segment command level, never at the section level (that’s why we can’t mangle the segment command :-)).
When parse_machfile() returns, all parsing is done, the dynamic linker is loaded, and soon the program entrypoint will be called. The binary was mapped as it is found on disk (which is why I picture it in a linear way) and the section info wasn’t used for anything. There’s an implicit assumption that the binary will be formatted correctly.
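To make the mismatch concrete, here is a simplified model of the two attitudes. It is emphatically not the xnu source: the struct and function names are invented for illustration, and the exact bounds check is an assumption about the kind of segment-level verification described above. The point is only that one check never looks at sections, while the other trusts them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down views of a segment and its sections. */
struct seg_model  { uint32_t fileoff, filesize, nsects; };
struct sect_model { uint32_t offset, size; };

/* Loader-style: only the segment's file range is verified.
 * The section array is deliberately not even a parameter. */
int loader_style_check(const struct seg_model *seg, uint64_t file_size)
{
    assert(seg != NULL);
    return (uint64_t)seg->fileoff + seg->filesize <= file_size;
}

/* Tool-style: additionally expects every section to live inside the
 * file range of its segment, i.e. it believes the offset/size fields. */
int tool_style_check(const struct seg_model *seg,
                     const struct sect_model *sects, uint64_t file_size)
{
    assert(seg != NULL && (sects != NULL || seg->nsects == 0));
    if (!loader_style_check(seg, file_size))
        return 0;
    uint64_t seg_end = (uint64_t)seg->fileoff + seg->filesize;
    for (uint32_t i = 0; i < seg->nsects; i++) {
        uint64_t start = sects[i].offset;
        uint64_t end   = start + sects[i].size;
        if (start < seg->fileoff || end > seg_end)
            return 0;   /* "section past end of file" style complaint */
    }
    return 1;
}
```

A binary with a mangled section offset passes the first check and fails the second, which matches what we see: it runs fine, and the tools complain.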

Is this behaviour correct? In my opinion, it’s not. The kernel does not respect the Mach-O specification. Or am I abusing my interpretation of the docs and the implicit assumption is correct? In an age of so much distrust (and wasted money) regarding user input, this kind of assumption should be made explicit and verified accordingly.

By the way, you should continue to read about the full load sequence – there’s another fun trick hidden in the crackme ;-).

You can also change the flags, size, section and segment names, and the order of the sections. That will confuse the tools and you, the reverser. What you need to do is to make the same assumption as the kernel and ignore those fields. That seems a bit odd, right?
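One way a tool could "ignore those fields" is to derive the file offset of a section from the segment command, which the kernel does verify. A minimal sketch, assuming the section's addr field was left intact (nothing in this post guarantees that) and that the section is backed by file data; seg_info and derive_section_offset are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the segment command fields needed here. */
struct seg_info { uint32_t vmaddr, fileoff, filesize; };

/* Compute where a section's bytes should be in the file, using only the
 * segment mapping and the section's virtual address. The section's own
 * offset field is never consulted.
 * Returns 1 and stores the result in *out, or 0 if the address is not
 * backed by file data in this segment (e.g. zero-fill at the tail). */
int derive_section_offset(const struct seg_info *seg,
                          uint32_t sect_addr, uint32_t *out)
{
    assert(seg != NULL && out != NULL);
    if (sect_addr < seg->vmaddr)
        return 0;
    uint32_t delta = sect_addr - seg->vmaddr;
    if (delta >= seg->filesize)
        return 0;
    *out = seg->fileoff + delta;
    return 1;
}
```

If addr has been mangled too, this no longer helps, and the reverser is back to working from the entrypoint and the segment contents.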

I hope you have enjoyed this one and that it motivates you to spend some time with xnu and dyld.

Have fun,
fG!

Update:
This is a small PoC that implements the trick described above. The code only handles 32-bit, non-fat binaries for console targets. If applied to an Objective-C binary, the target will not load, because not all sections can be mangled.

manglemacho.c.gz
SHA256(manglemacho.c.gz)= d79a612b72130732d7e47b2925fba7fc0b63824622d05f08e7f33641d522a8b5

Update 2:
As a matter of fact, all the fields in each section can be 0 without any adverse consequences (except the mod_init_func). I had played with this and didn’t take any notes. If there’s no further obfuscation, IDA is smart (in some cases) and can disassemble because of the valid entrypoint. IDA is more confused if we play with the offset and size fields.
Pass any value as the second argument to this improved version if you want to zero all fields.
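The zeroing itself is trivial. The following is a sketch under stated assumptions, not the code from manglemacho_v0.3: I read "except the mod_init_func" as "leave sections of type S_MOD_INIT_FUNC_POINTERS alone", and the helper name zero_section_fields is made up. The section struct is redeclared locally so the sketch does not need Apple headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTION_TYPE             0x000000ffu /* low byte of flags = section type */
#define S_MOD_INIT_FUNC_POINTERS 0x9u        /* module initializer pointers */

/* Local copy of the 32-bit section layout. */
struct section { char sectname[16], segname[16];
                 uint32_t addr, size, offset, align, reloff, nreloc,
                          flags, reserved1, reserved2; };

/* Hypothetical helper: zero every field of a section, names included,
 * unless it holds module initializer pointers (assumed to be the
 * exception mentioned in the text).
 * Returns 1 if the section was zeroed, 0 if it was left untouched. */
int zero_section_fields(struct section *s)
{
    assert(s != NULL);
    if ((s->flags & SECTION_TYPE) == S_MOD_INIT_FUNC_POINTERS)
        return 0;
    memset(s, 0, sizeof *s);
    return 1;
}
```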

manglemacho_v0.3.c.gz
SHA256(manglemacho_v0.3.c.gz)= 4b33dc5f43bbb9114e6a8c18dba8894ca44b991cd69a5e5e54bfdcd03607fc9c

4 thoughts on “Anti-disassembly & obfuscation #1: Apple doesn’t follow their own Mach-O specifications?”

  1. This is an inspiring article!

    I made a small tool to fix the binary that was mangled by manglemacho.

    http://reversi.ng/?p=805

    Also, after I finished the 64bit part of manglemacho v0.3, I noticed that some section in 64bit will cause problems if you mangled them, not sure why.

    1. Oh, this is nice!
      I was just poking around your blog and I see you have been busy.
      This is great to see, others developing tools and advancing knowledge.
      I will check your code later.

      Keep working!
