I was lucky enough to get my hands on an updated version of an interesting multiplatform virus and decided to reverse the OS X part. The original virus is from 2006 by JPanic and it’s called CAPZLOQ TEKNIQ v1.0. The new version adds support for infecting OS X binaries, 32-bit x86 only, although it also handles fat binaries (infecting only the x86 slice).
Source code for the original version is available. I just took a peek at a thing or two, mostly to confirm my assumptions, even if code is a bit different. The code is written in assembly, which is more fun to reverse due to the tricks it allows. It is refreshing to see interesting virus code in OS X for a change.
My reversing target is an already infected Mach-O binary, executed in a new environment. What happens is that the infected sample will try to infect other binaries in the new environment. The reason for this choice is that the original sample I got is the Windows binary, which was used to infect my OS X binary. Since the virus is multiplatform, the payload and infection logic are the same, so it is fine to reverse an infected binary instead of the first generation launcher.
Let’s do as the Lilliputians wanted to and start the house by the top. How can you spot an infected binary?
The __PAGEZERO segment is modified to hold the virus payload. File offset is modified to point to the payload location, appended to the end of the file, and memory protections are modified to read and execute. This is the technique described by Roy G Biv in his 2006 article “Infecting Mach-O Files”. The other modified command is LC_UNIXTHREAD. Here EIP is redirected to address 0 to execute the virus payload instead of the original entrypoint.
Under these conditions detection is easy, because the __PAGEZERO segment should neither have those permissions nor point to a valid file offset. If it does, something fishy is going on, either this particular virus infection or something else. Maybe Apple can fix this in the kernel Mach-O loader and close this “hole”? It is pretty easy to do and for now I can’t foresee any nasty side effects.
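A minimal sketch of that detection idea, assuming the caller has already located a 32-bit LC_SEGMENT command in an MH_EXECUTE file. The struct mirrors the field order of struct segment_command from <mach-o/loader.h> (with the protection fields simplified to uint32_t), and the function name pagezero_looks_infected is mine, not something from the virus or from Apple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LC_SEGMENT 0x1u

/* Same field order as the 32-bit struct segment_command. */
struct segment_command32 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint32_t vmaddr;
    uint32_t vmsize;
    uint32_t fileoff;
    uint32_t filesize;
    uint32_t maxprot;
    uint32_t initprot;
    uint32_t nsects;
    uint32_t flags;
};

/* Returns 1 when the segment mapped at address 0 is backed by file
 * contents or has any initial access permission, 0 otherwise.
 * A clean __PAGEZERO has filesize 0 and initprot 0. */
int pagezero_looks_infected(const struct segment_command32 *seg)
{
    assert(seg != NULL);
    if (seg->cmd != LC_SEGMENT || seg->vmaddr != 0)
        return 0;   /* not the zero-page segment, nothing to say */
    return seg->filesize != 0 || seg->initprot != 0;
}
```

This only makes sense for executables. Object files, for example, legitimately carry a segment at address 0 with file contents, so a real scanner would check the file type first.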
As far as I could tell there are no anti-debugging measures in the virus payload. These are uncommon in OS X and a bit trickier to implement due to its exception mechanism (prove me wrong, I would love to see them!). IDA does not have any special problems dealing with it, except for the jumps/calls into the middle of instructions, which are easy to fix manually. Hopper can’t disassemble the __PAGEZERO segment yet, so you need to load the target as RAW. Vincent is aware of this and will fix it soon.
The main body is located at address 0x0 and ends at 0xAE where it returns to the original entrypoint. The following tasks are executed here:
- CRC32 the virus payload, size is hardcoded at address 0x1A, and in my sample it is 0xB3A.
- Init a “table” with function pointers to other functions. This happens at address 0x48, which calls the function shown in the screenshot above.
- Mmap __PAGEZERO segment.
- Look up all files in the current directory and try to infect them. Supports 32-bit fat and non-fat Mach-O, Windows PE, and ELF. Also sets the access and modification times back to the original ones to hide its modifications.
- If running as root, try to infect binaries in /bin and /usr/bin.
- Munmap __PAGEZERO segment.
- Calculate the original entrypoint and return to it.
Fix the disassembly of function sub_1C9 and you get the following:
The function pointers are located in the middle of the code. This function just loops and loads them onto the stack. For my analysis I just imagined a table (int table[16]) with all the core data that is used in the code. A sample dump:
0xbffffa28: 0x421487a3 0x00000000 0x00000000 0x00000000
0xbffffa38: 0x00000000 0x00000000 0x0000002e 0x0000066b
0xbffffa48: 0x000006bd 0x0000073e 0x00000751 0x00000760
0xbffffa58: 0x000007d4 0x00000694 0x00000839 0x0000082f
The description of each field:
table.0 = current payload CRC32
table.1 = fd to current directory, ebp-0x5B
table.2 = fd to “/bin”, ebp-0x57
table.3 = fd to “/usr/bin”, ebp-0x53
table.4 = pointer to mapped region
table.5 = geteuid() result, ebp-0x4B
table.6 = 0x2e “.” (current dir), ebp-0x47
table.7 = function 0x0000066b, does geteuid() and mmap
table.8 = function 0x000006bd, ebp-0x3F, find first file to infect
table.9 = function 0x0000073e, ebp-0x3B, find next file
table.10 = function 0x00000751, ebp-0x37, close fd of current directory we tried to infect
table.11 = function 0x00000760, ebp-0x33, mmap target file for infection
table.12 = function 0x000007d4, ebp-0x2F, close file (restore original times, restore chmod, etc)
table.13 = function 0x00000694, ebp-0x2B, close all fds and munmap __PAGEZERO
table.14 = function 0x00000839, ebp-0x27, get fds for “.”, “/bin”, and “/usr/bin”
table.15 = function 0x0000082f, ebp-0x23, change working dir using fchdir()
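To keep the slots straight while reading the disassembly, the imagined table can be written down as an enum. This is just my own notation: the slot names are invented for readability, and the helper assumes the 4-byte stride visible in the listed offsets (table.1 at ebp-0x5b down to table.15 at ebp-0x23). The value it returns for table.0 is extrapolated from that stride, not something I confirmed in the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the 16 slots described above. */
enum table_slot {
    SLOT_CRC32 = 0,        /* current payload CRC32 */
    SLOT_FD_CWD,           /* fd to current directory */
    SLOT_FD_BIN,           /* fd to /bin */
    SLOT_FD_USR_BIN,       /* fd to /usr/bin */
    SLOT_MAPPED_REGION,    /* pointer to mapped region */
    SLOT_EUID,             /* geteuid() result */
    SLOT_DOT_STRING,       /* 0x2e, "." */
    SLOT_FN_INIT,          /* 0x66b: geteuid() and mmap */
    SLOT_FN_FIND_FIRST,    /* 0x6bd */
    SLOT_FN_FIND_NEXT,     /* 0x73e */
    SLOT_FN_CLOSE_DIR,     /* 0x751 */
    SLOT_FN_MAP_TARGET,    /* 0x760 */
    SLOT_FN_CLOSE_TARGET,  /* 0x7d4 */
    SLOT_FN_CLEANUP,       /* 0x694 */
    SLOT_FN_OPEN_DIRS,     /* 0x839 */
    SLOT_FN_FCHDIR,        /* 0x82f */
    SLOT_COUNT
};

/* Distance below ebp of a slot, so slot N lives at ebp - result.
 * 0x5f for slot 0 is an extrapolation from the 4-byte stride. */
uint32_t slot_ebp_distance(enum table_slot slot)
{
    assert((unsigned)slot < SLOT_COUNT);
    return 0x5fu - 4u * (uint32_t)slot;
}
```

With this, an access such as [ebp-0x33] in the listing maps back to SLOT_FN_MAP_TARGET without counting by hand.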
The core infection routine is called at 0x52 and located at 0xB8. It starts by calling getdirentries() on the current directory (where the infected file was executed) and looks for regular files (struct dirent field d_type == DT_REG). It also retrieves each file’s mode and size. The file is then opened and mmap’ed (chmod is executed first if necessary) and extended by 0x3000 bytes via ftruncate(). After this it uses the mmap’ed buffer to determine the target type by checking the magic values: MZ, PE, ELF, 0xFEEDFACE, and 0xBEBAFECA (the fat magic 0xCAFEBABE read as a little-endian dword). If the target is valid, it calls the infection routine for that type. After a successful infection, it unmaps the memory, restores any modified permissions and the original access and modification times, and continues processing the dirent buffer until there are no more entries left.
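The magic check can be sketched in C as follows. This is my own reconstruction of the idea, not a translation of the virus code: the enum and function names are made up, and locating the PE signature through the e_lfanew field at offset 0x3c of the MZ header is simply the standard PE layout, which I am assuming the virus follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum target_kind { KIND_UNKNOWN, KIND_PE, KIND_ELF, KIND_MACHO32, KIND_FAT };

/* Read a 32-bit little-endian value, as an x86 load would see it. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Classify a mapped file by its magic value. */
enum target_kind classify_target(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return KIND_UNKNOWN;

    uint32_t magic = read_le32(buf);
    if (magic == 0xFEEDFACEu)
        return KIND_MACHO32;            /* 32-bit Mach-O */
    if (magic == 0xBEBAFECAu)
        return KIND_FAT;                /* 0xCAFEBABE stored big-endian */
    if (memcmp(buf, "\x7f" "ELF", 4) == 0)
        return KIND_ELF;

    if (buf[0] == 'M' && buf[1] == 'Z' && len >= 0x40) {
        uint32_t pe_off = read_le32(buf + 0x3c);   /* e_lfanew */
        if (pe_off <= len - 4 && memcmp(buf + pe_off, "PE\0\0", 4) == 0)
            return KIND_PE;
    }
    return KIND_UNKNOWN;
}
```

Reading the magic as a little-endian dword is what makes the fat header show up as 0xBEBAFECA, since fat headers are always stored big-endian on disk.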
Let’s move inside the function that tries to infect 32-bit non-fat Mach-O binaries, sub_AA8.
It starts by verifying the CPU and file type, available in struct mach_header. It can only infect CPU_TYPE_I386 and MH_EXECUTE targets. Everything else is discarded! If these conditions are met, the segment commands are processed. The code is looking for two things. The first is a valid __PAGEZERO segment, identified by the vmaddr and filesize fields both being 0. This avoids a string comparison with __PAGEZERO and is a robust way to find this segment. The second is the LC_UNIXTHREAD command, which it modifies to redirect initial execution into the virus payload. Here it just looks for command 0x5 (LC_UNIXTHREAD) with flavor 0x1 (x86_THREAD_STATE32).
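The same walk over the load commands is handy on the analysis side, for example to list which binaries in a directory match the conditions. Below is a rough sketch under a few assumptions: the header and command offsets follow the 32-bit layouts in <mach-o/loader.h>, the constants are redefined locally so the snippet builds anywhere, and the names macho_scan and scan_macho32 are mine. It only reads the image and never modifies it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MH_MAGIC           0xFEEDFACEu
#define CPU_TYPE_I386      7u
#define MH_EXECUTE         2u
#define LC_SEGMENT         0x1u
#define LC_UNIXTHREAD      0x5u
#define x86_THREAD_STATE32 1u

struct macho_scan {
    size_t pagezero_off;    /* offset of the zero-page LC_SEGMENT, 0 if absent */
    size_t unixthread_off;  /* offset of LC_UNIXTHREAD, 0 if absent */
};

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 1 and fills `out` when a 32-bit x86 MH_EXECUTE image has both
 * a segment with vmaddr == 0 and filesize == 0 and an LC_UNIXTHREAD
 * command with the 32-bit x86 flavor; returns 0 otherwise. */
int scan_macho32(const uint8_t *buf, size_t len, struct macho_scan *out)
{
    assert(buf != NULL && out != NULL);
    out->pagezero_off = out->unixthread_off = 0;

    /* struct mach_header is 28 bytes: magic, cputype, cpusubtype,
     * filetype, ncmds, sizeofcmds, flags. */
    if (len < 28 || read_le32(buf) != MH_MAGIC ||
        read_le32(buf + 4) != CPU_TYPE_I386 ||
        read_le32(buf + 12) != MH_EXECUTE)
        return 0;

    uint32_t ncmds = read_le32(buf + 16);
    size_t off = 28;
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8)
            return 0;                              /* truncated command */
        uint32_t cmd = read_le32(buf + off);
        uint32_t cmdsize = read_le32(buf + off + 4);
        if (cmdsize < 8 || cmdsize > len - off)
            return 0;                              /* malformed size */

        if (cmd == LC_SEGMENT && cmdsize >= 56 && out->pagezero_off == 0 &&
            read_le32(buf + off + 24) == 0 &&      /* vmaddr */
            read_le32(buf + off + 36) == 0)        /* filesize */
            out->pagezero_off = off;
        else if (cmd == LC_UNIXTHREAD && cmdsize >= 16 &&
                 read_le32(buf + off + 8) == x86_THREAD_STATE32)
            out->unixthread_off = off;

        off += cmdsize;
    }
    return out->pagezero_off != 0 && out->unixthread_off != 0;
}
```

Note that an already infected binary fails this check, because its __PAGEZERO filesize is no longer 0.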
If both commands are found and valid for infection, the routine proceeds with the modification. As already demonstrated in the very first screenshot, the vmsize, fileoff, filesize, memory protections, and EIP are modified accordingly. There is a minor flaw in this process: it does not support the new LC_MAIN command, which is now used instead of LC_UNIXTHREAD to set the application entrypoint. This means it is unable to infect system binaries in /bin and /usr/bin in Mountain Lion. Well, it would be unable to anyway since most are 64-bit only binaries. It is easy to fix, so do not forget this new command if you are writing infectors and need to modify the entrypoint.
The last call at 0xB34 is responsible for copying the virus payload into the target and updating some values. Before looking at it, let’s take a quick jump to the end to see how the original entrypoint is resolved.
Those two values at 0x8A and 0x9C are modified on each infection and are specific to each binary. A quick hack in C to solve the OEP:
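#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* Unsigned on purpose: the arithmetic relies on 32-bit wraparound,
       which is undefined behaviour for signed integers in C. */
    uint32_t eax = 0x8AB1C871;      /* per-binary value from my sample */
    uint32_t ecx = eax, ebx = eax;

    for (eax *= ecx, eax--; eax != 0; eax--)
    {
        eax += ebx;
        ebx = eax;
        eax *= ecx;
    }
    printf("eax is %" PRIx32 " ebx %" PRIx32 "\n", eax, ebx);

    ebx = ebx * 0x385d8f8;          /* second per-binary value */
    printf("OEP is %" PRIx32 "\n", ebx);
    return 0;
}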
Back to function sub_120. It does three things – copy virus payload into the infected target mapped memory, update the two hardcoded values for solving the entrypoint, and modify three different memory locations inside the virus payload (0x196, 0x490, 0xA05).
A small detail is that you should use hardware breakpoints while debugging; otherwise any active software breakpoints will be copied into the infected target (the copy is done using rep movsb and the source is the memory of the running infected binary). One thing that I do not understand is why the three memory locations are being XORed. As far as I have seen they are not used for anything else. Checksums will always be different because of the entrypoint values, so I am not sure if that is the original goal or something else.
Right now I can’t remember anything else in particular to expose about Clapzok. The June Virus Bulletin is finally out and if you have access you can also read Peter Ferrie’s article about this same virus, covering all three platforms. Unfortunately I can’t share the issue. It contains some other details such as small bugs but no code samples. This post will probably be updated with some other details and/or fixes so keep watching. I am not sure yet if I can share my sample, if I do I will post it here. Before closing, the picture of the virus main().
Conclusions…
It was fun to reverse this PoC! Since it is coded in assembly it contains some optimizations and tricks that I find funny. There is room to improve it and fix some of its bugs. Its biggest issue is infecting __PAGEZERO, which is very easy to detect and even to fix if Apple really wants to (unless I am wrong!). But never forget that hindsight is always easy and what you do not know is not always clear. It is also a PoC, so there is no malicious intent from its author.
It is good to see some movement in the OS X virus arena. It can finally shake things up and call attention to the fact that OS X is not as safe a platform as most people want to believe. Assuming the author is the same as for version 1.0, congratulations to JPanic for his good work. Keep ’em coming.
Enjoy,
fG!
Update(s):
I forgot to mention two details. Syscalls are made via int 0x80 (as the Crisis dropper does, for example), and code-signed binaries will also be infected, which renders the code signature invalid. The easy way to avoid this is to look for the LC_CODE_SIGNATURE command and skip binaries containing this load command.
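Checking for a code signature is a generic load command lookup. A small sketch, assuming a 32-bit little-endian Mach-O image already in memory; find_load_command is a name I made up, and LC_CODE_SIGNATURE (0x1d) is redefined locally with its value from <mach-o/loader.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MH_MAGIC          0xFEEDFACEu
#define LC_CODE_SIGNATURE 0x1du

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns the file offset of the first load command whose type is
 * `wanted`, or 0 when there is none or the image is malformed. */
size_t find_load_command(const uint8_t *buf, size_t len, uint32_t wanted)
{
    assert(buf != NULL);
    if (len < 28 || read_le32(buf) != MH_MAGIC)
        return 0;

    uint32_t ncmds = read_le32(buf + 16);
    size_t off = 28;                    /* commands follow the 28-byte header */
    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8)
            return 0;
        uint32_t cmdsize = read_le32(buf + off + 4);
        if (cmdsize < 8 || cmdsize > len - off)
            return 0;
        if (read_le32(buf + off) == wanted)
            return off;
        off += cmdsize;
    }
    return 0;
}
```

A binary is signed when find_load_command(buf, len, LC_CODE_SIGNATURE) returns a non-zero offset. Since a load command can never sit at offset 0, using 0 as the "not found" value is safe here.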
I just read Ferrie’s article in detail and the three data areas I mention appear to contain credits and other information. The function address table set up in the beginning also seems to be used by the other platforms, if I am reading his article correctly. Since I only reversed the OS X part I am not fully aware of all the cross-platform mechanisms present in the code.