The title of this post is partly borrowed from the blog post "Dynamic Code Encryption as an Anti Dump and Anti Reverse Engineering measure". In it, Alexey describes a technique similar to the one I used in my crackme; the idea itself isn't all that new. His post is a good introduction to some possible attack vectors and what is at stake. You should give it a look.
The crackme uses a multi-layer dynamic code encryption approach with two different encryption algorithms (Rabbit and Salsa). The objective was to prevent an easy memory dump of all the code, and to act as an anti-debugging measure. I decided not to obfuscate the algorithms, apart from removing some obvious web-searchable stuff, because that wasn't the point of the crackme. One of the algorithms is also exposed right away, since it is needed for the first-stage decryption. The "trustable" XOR could have been used to add another stage of obfuscation, but again, that wasn't the point.
While I was thinking of a way to store the decryption information and searching around, I came across this interesting post on the Root Labs blog. It is a cute idea to help against breakpoints and patching, and it fit nicely with what I had in mind. In this case, the encryption/decryption key is built from the SHA256 checksum of the function to be protected: a 128-bit salt is appended to that checksum, and the result is hashed with SHA256 again to produce the final key. The salt values are stored in the four hardware debug registers (DR0 to DR3), a very simple and fast way to try to deter hardware breakpoints. All this work is done by an external binary, the protector, which runs against the compiled but not yet encrypted crackme.
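The key construction can be sketched as follows. This is an illustration only: to keep it short and self-contained, a 64-bit FNV-1a hash stands in for SHA256 (it is not cryptographically secure), and the names toy_digest and derive_key are hypothetical. The shape is what matters: key = H(H(function bytes) || salt), where the salt is the four 32-bit debug register values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in digest: 64-bit FNV-1a. The crackme uses SHA256 here. */
uint64_t toy_digest(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV-1a prime */
    }
    return h;
}

/* key = H( H(function bytes) || salt )
 * `salt` holds the values read back from DR0..DR3 (4 x 32 bits = 128 bits). */
uint64_t derive_key(const uint8_t *func, size_t func_len, const uint32_t salt[4])
{
    uint8_t buf[sizeof(uint64_t) + 4 * sizeof(uint32_t)];
    uint64_t inner = toy_digest(func, func_len);

    memcpy(buf, &inner, sizeof inner);                       /* checksum first */
    memcpy(buf + sizeof inner, salt, 4 * sizeof(uint32_t));  /* then the salt */
    return toy_digest(buf, sizeof buf);
}
```

Because the function's own bytes feed the hash, a software breakpoint (an 0xCC byte patched into the code) changes the checksum, and a hardware breakpoint changes the salt. Either way the derived key is wrong and the next stage decrypts to garbage.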
The last piece of the puzzle is the storage of the decryption information: where to decrypt and how many bytes. Alexey suggests some markers at the start and end of the function. My solution was to store that information inside the function that decrypts the next stage, in a fake piece of code that desyncs IDA's disassembly (a very basic fake jump), together with a marker to find it. This block can be located anywhere in the function and is found by a simple loop inserted into every function. The system function mach_vm_protect() is used to make the code section writable and then to restore the original protection; the call is lightly obfuscated using a function pointer and XORed strings.
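A minimal sketch of that marker search, assuming a little-endian x86 layout in which the 16-bit marker 0x3134 sits at the start of a 32-bit word and is followed by three 32-bit fields (address, size to decrypt, function size). The struct and function names are hypothetical, and the sketch scans an ordinary byte buffer rather than live code. The marker is compared in XORed form so that its plain value never appears as an immediate in the scanning loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three values stored after the marker. */
struct stage_info {
    uint32_t address;        /* where the next stage starts */
    uint32_t protect_size;   /* how many bytes to decrypt */
    uint32_t function_size;  /* size of the function that gets hashed */
};

/* memcpy-based reads avoid unaligned pointer casts. */
uint16_t read_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
uint32_t read_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }

/* Scan `code` for the marker and fill `out`. Returns 0 on success, -1 if the
 * marker is not found. 0x3134 ^ 0x4569 == 0x745D. */
int find_stage_info(const uint8_t *code, size_t len, struct stage_info *out)
{
    volatile uint16_t xor_key = 0x4569;   /* volatile: keep the XOR at run time */

    for (size_t i = 0; i + 16 <= len; i++) {
        if ((uint16_t)(read_u16(code + i) ^ xor_key) == 0x745D) {
            out->address       = read_u32(code + i + 4);
            out->protect_size  = read_u32(code + i + 8);
            out->function_size = read_u32(code + i + 12);
            return 0;
        }
    }
    return -1;
}
```

In the crackme the scan starts at the current instruction pointer, obtained with a call/pop pair, and covers at most 0x2000 bytes. The protector is what fills in the three fields after compilation.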
I think I’m not missing any big detail. The best way to explain it is to show a piece of the code. It’s huge and not exactly the most beautiful piece of code.
/*
 * the debug loop, we decrypt and encrypt the exception handler and the
 * function that deals with the exceptions
 */
void
debug_loop(void)
{
#if DEBUG
    printf("***************************\n");
    printf("[DEBUG] Started debug loop!\n");
    printf("***************************\n");
#endif
    JUNK_CODE1;
    uint32_t search;
    asm(".att_syntax prefix");
    asm __volatile__("nop\n\t"
                     "call 1f\n\t" // E8 00 00 00 00
                     "1:\n\t"
                     "pop %%eax\n\t" // 58
                     "mov %%eax, %0" // 89 C3
                     : "=r" (search)
                     :
                     :"eax");
    asm(".att_syntax prefix");
#if DEBUG
    printf("Search is %x %x\n", search, *(uint32_t*)(search+5));
#endif
    EVIL_ASM5;
    uint8_t iv[8] = "hackersz";
    mach_vm_address_t addr = 0;
    uint32_t protectSize = 0;
    uint32_t functionSize = 0;
    uint32_t functionBegin = 0;
    volatile uint16_t xorKey = 0x4569;
    for (uint32_t i = search; i < search + 0x2000; i++)
    {
        if ((*(uint16_t*)i ^xorKey) == 0x745D) // 0x3134
        {
            addr = *(uint32_t*)(i+4);
            protectSize = *(uint32_t*)(i+8);
            functionSize = *(uint32_t*)(i+12);
#if DEBUG
            printf("[DEBUG] Found exception_handler decrypting info at %x . Address:%x Size:%x Function Size:%x!\n", i,
                   *(uint32_t*)(i+4), *(uint32_t*)(i+8),*(uint32_t*)(i+12));
#endif
            break;
        }
    }
    xorKey = 0x1233;
    for (uint32_t i = search; i > 0x1000; i--)
    {
        if ((*(uint16_t*)i ^ xorKey) == 0xF7BA) // 0xe589
        {
            functionBegin = i - 1;
#if DEBUG
            printf("[DEBUG] Found debug_loop() beginning at %x\n", functionBegin);
#endif
            break;
        }
    }
    kern_return_t (*mymach_vm_protect)(vm_map_t target_task,mach_vm_address_t address,mach_vm_size_t size,
                                       boolean_t set_maximum,vm_prot_t new_protection);
    volatile uint8_t vmprotectsymbol[] = {0x2e,0x22,0x20,0x2b,0x1c,0x35,0x2e,0x1c,0x33,0x31,0x2c,0x37,0x26,0x20,0x37,0x43};
    DECRYPT_STRING(16, vmprotectsymbol, 0x43);
    mymach_vm_protect = DLSYM_GET(vmprotectsymbol);
    while (1)
    {
        // decrypt exception handler
        // build the key
        uint8_t key[32];
        EVIL_ASM3;
        sha2((uint8_t*)functionBegin, functionSize, key, 0);
        // build the salt array
        uint32_t salt[4];
        x86_debug_state32_t debug;
        mach_msg_type_number_t count;
        thread_state_flavor_t flavor;
        flavor = x86_DEBUG_STATE32;
        count = x86_DEBUG_STATE32_COUNT;
        thread_act_port_array_t thread_list;
        mach_msg_type_number_t thread_count;
        kern_return_t (*mytask_threads)(task_t target_task,thread_act_array_t *act_list,mach_msg_type_number_t *act_listCnt);
        volatile uint8_t taskthreadssymbol[] = {0x20,0x35,0x27,0x3f,0x0b,0x20,0x3c,0x26,0x31,0x35,0x30,0x27,0x54};
        DECRYPT_STRING(13,taskthreadssymbol,0x54);
        mytask_threads = DLSYM_GET(taskthreadssymbol);
        (*mytask_threads)(mach_task_self(), &thread_list, &thread_count);
        kern_return_t (*mythread_get_state)(thread_act_t target_act,thread_state_flavor_t flavor,thread_state_t old_state,
                                            mach_msg_type_number_t *old_stateCnt);
        volatile uint8_t threadgetstatesymbol[] = {0x37,0x2b,0x31,0x26,0x22,0x27,0x1c,0x24,0x26,0x37,0x1c,0x30,0x37,0x22,0x37,0x26,0x43};
        DECRYPT_STRING(17,threadgetstatesymbol,0x43);
        mythread_get_state = DLSYM_GET(threadgetstatesymbol);
        (*mythread_get_state)(thread_list[0], flavor, (thread_state_t)&debug, &count);
        salt[0] = debug.__dr0;
        salt[1] = debug.__dr1;
        salt[2] = debug.__dr2;
        salt[3] = debug.__dr3;
        uint8_t *tempKey = malloc(sizeof(salt) + sizeof(key));
        memcpy(tempKey, key, sizeof(key));
        memcpy(tempKey+sizeof(key), salt, sizeof(salt));
        // compute the final salted key
        sha2((uint8_t*)tempKey, sizeof(salt) + sizeof(key), key, 0);
        free(tempKey);
        EVIL_ASM5;
        // and now we can finally decrypt
        // modify memory protection so we can decrypt and write
        kern_return_t kr;
#if DEBUG
        printf("[DEBUG] Starting to decrypt exception_handler...\n");
#endif
        kr = (*mymach_vm_protect)(mach_task_self(), (mach_vm_address_t)addr, (mach_vm_size_t)protectSize, FALSE, WRITEPROTECTION);
#if DEBUG
        EXIT_ON_MACH_ERROR("Failurex", 1);
#endif
        // start decryption, the input buffer is the same as the output buffer
        SALSA_ctx ctx;
        SALSA_keysetup(&ctx, key, 256, 64);
        SALSA_ivsetup(&ctx, iv);
        SALSA_decrypt_bytes(&ctx, (uint8_t*)addr, (uint8_t*)addr, protectSize);
        EVIL_ASM1;
        // restore original memory permissions
        kr = (*mymach_vm_protect)(mach_task_self(), (mach_vm_address_t)addr, (mach_vm_size_t)protectSize, FALSE, READPROTECTION);
#if DEBUG
        EXIT_ON_MACH_ERROR("Failure", 1);
        printf("[DEBUG] End exception_handler decrypt\n");
        printf("[DEBUG] Calling exception handler...\n");
#endif
        exception_handler();
        // crypt exception handler?
#if DEBUG
        printf("[DEBUG] Starting to encrypt exception_handler...\n");
#endif
        kr = (*mymach_vm_protect)(mach_task_self(), (mach_vm_address_t)addr, (mach_vm_size_t)protectSize, FALSE, WRITEPROTECTION);
#if DEBUG
        EXIT_ON_MACH_ERROR("Failurex", 1);
#endif
        SALSA_keysetup(&ctx, key, 256, 64);
        SALSA_ivsetup(&ctx, iv);
        EVIL_ASM2;
        SALSA_encrypt_bytes(&ctx, (uint8_t*)addr, (uint8_t*)addr, protectSize);
        // restore original memory permissions
        kr = (*mymach_vm_protect)(mach_task_self(), (mach_vm_address_t)addr, (mach_vm_size_t)protectSize, FALSE, READPROTECTION);
#if DEBUG
        EXIT_ON_MACH_ERROR("Failure", 1);
        printf("[DEBUG] End exception_handler encrypt\n");
        printf("[DEBUG] Return from exception handler...\n");
#endif
        // the tail that will hold our decryption data
        asm(".intel_syntax noprefix");
        asm __volatile__ ("xor edx, edx\n\t" //31d2
                          "test edx, edx\n\t" // 85d2
                          "jz 1f\n\t" // 7416
                          // "jmp 1f\n\t"
                          ".byte 0x00\n\t"
                          ".long 0x0064a990\n\t"
                          ".long 0x00003134\n\t"
                          ".long 0x00000000\n\t" // address
                          ".long 0x00000000\n\t" // size
                          ".long 0x00000000\n\t" // function size
                          "1:\n\t");
        asm(".att_syntax prefix");
    }
}
You can find a text file with the same code here.
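The byte arrays in the listing (vmprotectsymbol, taskthreadssymbol, threadgetstatesymbol) are symbol names XORed with a one-byte key. The DECRYPT_STRING and DLSYM_GET macros are not shown in this post, so the sketch below rests on an assumption: that DECRYPT_STRING simply XORs every byte with the given key. Under that assumption the arrays decode to readable symbol names, which would then be resolved at run time (presumably through dlsym(), judging by the macro name). The function name xor_decode is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of `buf` with `key`, in place. The last byte of each array
 * in the listing equals the key, so it decodes to the terminating NUL. */
void xor_decode(uint8_t *buf, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}
```

Decoding vmprotectsymbol with key 0x43 this way yields "mach_vm_protect", and taskthreadssymbol with key 0x54 yields "task_threads", which matches the function pointer names they are assigned to. It keeps the API names out of the binary's string table, but it is only a speed bump for anyone who finds the key.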
So what was the objective of all this? Besides trying to hide the other tricks that I really wanted to show, it also demonstrates how easy it is to raise the bar in OS X, both for malware and for legitimate software protection. It's not exactly rocket science! The crackme and tools took about three weeks to write, and could be vastly improved by fixing some of the assumptions, especially if the target were something other than a crackme, where you have a single, well-defined objective.
I have a feeling the quality of this post is somewhat mehhhh! I have been too busy with Xcode in real-life projects and writing inspiration has been low. Sometimes one must get past the initial barrier, write something, and then come back later to fix it. Feel free to leave comments to clear up doubts and questions!
The crackme also uses code from the PolarSSL library. Salsa and Rabbit are from the ECRYPT project.
Have fun,
fG!