No. I just clickbaited you but don’t leave yet, keep reading for something fun!
A couple of days ago I found something curious on VirusTotal. There were more than 40 thousand binaries with the same size in a single day. That seemed very odd, so I loaded two random binaries and compared their contents. The only difference was in the strings section.
VirusTotal detections were very low (two to three) and identified the samples as EvilQuest/ThiefQuest malware.
To prove that all the binaries were the same except for strings, I wrote a quick Mach-O stats utility in Go (yes, 2020 is this crazy!) to hash the code and strings sections separately. The hypothesis is that the code section would have the same hash for all the samples, and the strings section would have a unique hash for each sample. The output confirmed that this was indeed the case - same code, different strings.
Running this program against 206091 binaries totalling 34GB of data:
Mach-O Stats
(c) 2020 Pedro Vilaca. All Rights Reserved
100% |███████████████████████████████████████| (206091/206091, 1563 it/s) [2m11s:0s]
__text map
cd87dfd659fc2334ccc59093c1f41ba9abf4c88046d438ddd8bc2d82f55859d7 206091
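The per-section hashing idea described above can be sketched in a few lines of Go with the standard debug/macho package. This is not the actual Mach-O Stats utility, just a minimal sketch that assumes thin (non-fat) Mach-O files passed as command line arguments; the helper names hashBytes and sectionHash are made up for illustration.

```go
package main

import (
	"crypto/sha256"
	"debug/macho"
	"encoding/hex"
	"fmt"
	"os"
)

// hashBytes returns the hex-encoded SHA-256 of data.
func hashBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// sectionHash opens a thin Mach-O file and hashes the contents of the
// named section (for example "__text" or "__cstring").
func sectionHash(path, name string) (string, error) {
	f, err := macho.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	sec := f.Section(name)
	if sec == nil {
		return "", fmt.Errorf("%s: no %s section", path, name)
	}
	data, err := sec.Data()
	if err != nil {
		return "", err
	}
	return hashBytes(data), nil
}

func main() {
	// One counter per section: hash -> number of files with that hash.
	counts := map[string]map[string]int{
		"__text":    {},
		"__cstring": {},
	}
	for _, path := range os.Args[1:] {
		for name, tally := range counts {
			h, err := sectionHash(path, name)
			if err != nil {
				fmt.Fprintln(os.Stderr, err)
				continue
			}
			tally[h]++
		}
	}
	for _, name := range []string{"__text", "__cstring"} {
		fmt.Println(name, "map")
		for h, n := range counts[name] {
			fmt.Println(h, n)
		}
	}
}
```

If the hypothesis holds, the __text map ends up with a single entry whose count equals the number of files, while the __cstring map has one entry per file.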
Given that the strings are encrypted/obfuscated, my first idea was that this could be a new version with mutated copies being used in different sources. That doesn’t make much sense given that the code was the same, but since EvilQuest has ransomware features, this could be, for example, a different Bitcoin wallet for each sample.
Now it was time to load one of the samples into a disassembler and take a look at its contents. Assuming that the VirusTotal detections were correct even if too low, I grabbed the known sample of EvilQuest. That sample contains debugging symbols, so it’s very easy to navigate since most function names are explicit about their intent. The new sample fixed that mistake and had that information removed.
Before bringing the heavy diffing guns such as BinDiff and Diaphora I like to give a look around to feel what’s going on. In this case the code had differences but was very similar. I could see what were clearly obfuscated/encrypted strings like in the original sample. So, I tried to find those functions using the symbols from the first sample. That was fast and easy and confirmed that the code was related (either from the same author or someone reusing it - attribution is hard :P).
Scott Knight released a script to decrypt/encrypt the original sample’s strings, but it doesn’t work with the new samples. It makes sense given that there are keys and tables that could have changed, and also what appears to be a new type of obfuscated/encrypted string format.
000Bg{0000090nQ4XL1qPsnl1ZjpKX0lkFoa0000053
The new string type appears to always start with 000Bg{.
Learning a new programming language is easier when you have things to do with it, so I decided to write a decrypter/deobfuscator in Go. In hindsight it wasn’t a smart decision because it’s kind of ugly to deal with buffers in Go and much easier in C (or I don’t know yet the best way to do it in Go).
$ ./evilquest_deobfuscator -s "000Bg{0000090nQ4XL1qPsnl1ZjpKX0lkFoa0000053"
EvilQuest String Deobfuscator
(c) 2020 Pedro Vilaca. All Rights Reserved
000Bg{0000090nQ4XL1qPsnl1ZjpKX0lkFoa0000053 -> rb+
Meanwhile, the next day there were again more than 40 thousand new samples with the same size. I confirmed again that the only difference was in the strings. While reversing and writing the strings decrypter I noticed that the hash of the sample I was using had changed. That generated a brain click, and I went to bed thinking that this wasn’t a big malware campaign (very sad!), because a campaign with so many samples didn’t make sense, but rather a VirusTotal issue: the VirusTotal sandbox had simply got trapped in an analysis loop. This idea was reinforced by the fact that the sample had been submitted from the ZZ country code, meaning unknown origin. Connecting these two ideas strengthened my belief that this was the right path.
After I finished the strings decrypter I could verify that my unique samples campaign hypothesis wasn’t valid. The strings were all the same, just encrypted/obfuscated with different keys.
So, the next step was to verify the code to see what was happening there. This was very easy to find since it’s the first thing the sample does.
At the entrypoint we can observe the mutation function being called first with argv[0] as its argument.
10001A8D0 public start
10001A8D0 start proc near
(...)
10001A8D0 push rbp
10001A8D1 mov rbp, rsp
10001A8D4 sub rsp, 2F0h
10001A8DB mov rax, cs:___stack_chk_guard_ptr
10001A8E2 mov rax, [rax]
10001A8E5 mov [rbp+var_8], rax
10001A8E9 mov [rbp+var_94], 0
10001A8F3 mov [rbp+var_98], edi
10001A8F9 mov [rbp+var_A0], rsi
10001A900 mov rax, [rbp+var_A0]
10001A907 mov rdi, [rax] ; argv[0]
10001A90A call fg_open_and_reencrypt_cstrings ; binary self modifies here
(...)
Next the sample opens its own executable in rb+ mode (reading and writing). Funnily enough, there is a memory leak because the decrypted string buffer is malloc’ed in the decryptor function and never freed. One of the differences between this sample and the previous one is the increased usage of dynamically allocated memory, which increases the potential for memory leaks. There are a lot more memory leaks all over the code. Xcode Instruments has a nice leak detector (hint, hint).
10001A840 fg_open_and_reencrypt_cstrings proc near
10001A840 ; CODE XREF: start+3A↓p
10001A840
10001A840 var_24 = dword ptr -24h
10001A840 __filename = qword ptr -20h
10001A840 FILE_pointer = qword ptr -18h
10001A840 var_10 = qword ptr -10h
10001A840 var_4 = dword ptr -4
10001A840
10001A840 push rbp
10001A841 mov rbp, rsp
10001A844 sub rsp, 30h
10001A848 mov [rbp+var_10], rdi
10001A84C mov rdi, [rbp+var_10]
10001A850 lea rax, a000bg0000090nq_18 ; "000Bg{0000090nQ4XL1qPsnl1ZjpKX0lkFoa000"...
10001A857 mov [rbp+__filename], rdi
10001A85B mov rdi, rax
10001A85E call fg_decrypt_0000Bg_string ; decrypt/decode string
10001A863 mov rdi, [rbp+__filename]
10001A867 mov rsi, rax ; "rb+"
10001A867 ; memleak here since the returned ptr was calloc'ed
10001A86A call _fopen
10001A86F mov [rbp+FILE_pointer], rax
10001A873 cmp [rbp+FILE_pointer], 0
10001A878 jz loc_10001A890
10001A87E mov rdi, [rbp+FILE_pointer] ; FILE *
10001A882 call _ftrylockfile
10001A887 cmp eax, 0
10001A88A jz loc_10001A89C
10001A890
10001A890 loc_10001A890: ; CODE XREF: fg_open_and_reencrypt_cstrings+38↑j
10001A890 mov [rbp+var_4], 0FFFFFFFFh
10001A897 jmp loc_10001A8C1
10001A89C ; ---------------------------------------------------------------------------
10001A89C
10001A89C loc_10001A89C: ; CODE XREF: fg_open_and_reencrypt_cstrings+4A↑j
10001A89C mov rdi, [rbp+FILE_pointer] ; FILE* handle
10001A8A0 call fg_reencrypt_cstrings
10001A8A5 mov rdi, [rbp+FILE_pointer] ; FILE *
10001A8A9 call _funlockfile
10001A8AE mov rdi, [rbp+FILE_pointer] ; FILE *
10001A8B2 call _fclose
10001A8B7 mov [rbp+var_4], 0
10001A8BE mov [rbp+var_24], eax
10001A8C1
10001A8C1 loc_10001A8C1: ; CODE XREF: fg_open_and_reencrypt_cstrings+57↑j
10001A8C1 mov eax, [rbp+var_4]
10001A8C4 add rsp, 30h
10001A8C8 pop rbp
10001A8C9 retn
10001A8C9 fg_open_and_reencrypt_cstrings endp
The fg_reencrypt_cstrings function in the previous listing is where the mutation occurs.
The function will find the __cstring section and iterate over its contents, decrypting and re-encrypting the strings, and write them back to the binary. The original binary is already modified when it returns from fg_open_and_reencrypt_cstrings.
(...)
for ( j = 0; j < sg->nsects; ++j ) {
v12 = (__int64)sub_100006580(a1, v17, 80LL);
// obfuscated string is "__cstring"
v2 = fg_decrypt_0000Bg_string("000Bg{00000H0nQ4XL1qPsnl3oBkir1CDCUq3Z{iy|22B2MZ0000073");
if ( !strcmp((const char *)v12, v2) ) {
v11 = (__int64)sub_100006580(a1, *(unsigned int *)(v12 + 48), *(_QWORD *)(v12 + 40));
v10 = 0LL;
v9 = 0LL;
v8 = 0;
fseek(a1, *(unsigned int *)(v12 + 48), 0);
while ( (unsigned __int64)v8 < *(_QWORD *)(v12 + 40) ) {
if ( *(_BYTE *)(v11 + v8) ) {
++v9;
}
else if ( v9 ) {
v7 = (char *)calloc(1uLL, v9 + 1);
__memcpy_chk(v7, v10 + v11, v9, -1LL);
v6 = fg_decrypt_0000Bg_string(v7);
__s = (char *)fg_encrypt_0000Bg_string(v6);
if ( v7 != v6 ) {
v3 = strlen(__s);
if ( v3 == strlen(v7) ) {
fseek(a1, v10 + *(unsigned int *)(v12 + 48), 0);
fwrite(__s, 1uLL, v9, a1);
free(v6);
}
}
free(v7);
free(__s);
v10 += v9 + 1;
v9 = 0LL;
}
else {
++v10;
}
++v8;
}
}
v17 += 80LL;
}
(...)
At this point it was very clear that the sandbox loop was happening. The original sample submitted to VirusTotal mutated itself, generating a new sample that was also submitted to the sandbox because it was a “new” executable file (the sandbox sees the original as a dropper), and so on. This explains why since 5th September 2020 there have been more than 40k new samples on most days.
Date | Total |
---|---|
2020-09-05 | 14033 |
2020-09-06 | 47782 |
2020-09-07 | 47887 |
2020-09-08 | 47849 |
2020-09-09 | 48540 |
2020-09-10 | 48819 |
2020-09-11 | 45681 |
2020-09-12 | 45263 |
2020-09-13 | 46797 |
2020-09-14 | 16437 |
2020-09-15 | 43440 |
2020-09-16 | 40616 |
2020-09-17 | 40165 |
2020-09-18 | 40382 |
2020-09-19 | 39901 |
It’s also easy to see in the VirusTotal graph feature, which shows the relationships between the samples. This is a simple example, but you can draw graphs with more items that show this relationship.
Given all these assumptions it should be easy to find the patient zero sample.
As far as I can see this started on 2020-09-05. I looked up the earliest submission date for that day and there were two samples with the following hashes:
9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721
2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c
Both have ookcucythguan as submission name, and were submitted from Germany via the web interface. A minute later we can observe other samples being analyzed for the first time.
2020-09-05 17:06:57 9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721 (P0)
2020-09-05 17:07:47 2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c (P0)
2020-09-05 17:08:32 3499d8119db3bd9365a7b1d0b3f677cc9adc5efe9097234ac92e1aa915ef11b1
2020-09-05 17:08:33 a3f9a98d8a60c77666d4bf73b9ae2b72dafa32251813cba3a79a2aeb7511037c
2020-09-05 17:08:35 ade3e5d2bc094dd2835905aea82d58801a8fb53aa6449bd520d404fcfbc19e88
2020-09-05 17:08:36 ca984bcda781d11cc220d45bc01b2e34bd8349e83139f6a9fdd9dd55ddbad4fd
2020-09-05 17:08:38 d1a11b45b807a9f7da05db69ba1706a850064497376a4775cbb15b1d94b95588
2020-09-05 17:08:39 60db8eb741601aba3514e809775a0514a47a05117f50a93773e9c38ce868326d
2020-09-05 17:09:16 bd6e1b8ee1c01cb0326d01e026d9d9adcce64c1d021a97b18416680f2e774ca1
2020-09-05 17:09:18 5e8a0a3b6aeb4a37fc1d949ebe4846accd5842d4eb145d37fa7af41b0acbc70c
(...)
Let’s see if the code for those initial samples is identical:
Mach-O Stats
(c) 2020 Pedro Vilaca. All Rights Reserved
__text map
cd87dfd659fc2334ccc59093c1f41ba9abf4c88046d438ddd8bc2d82f55859d7 10
__cstring map
c8bcd6734f292c094295c8902432e57bfd7e040e52b684966298789e79280a17 1
6e663fe4412847efc35eba032dd931ac47d68296a5498a2b32888ce721f52ae4 1
8feeda9ad8667378faa4f18c593941b25a9778f4dfb9d4c158af9dc960b0f3f8 1
5992aa7be51662a1aa4c9a8c817d914c8ae5e1af178ef440cb38d553f7ff626a 1
51245ddb85164d78d0e968a5b0ed9607b6ee54b11a6a43622018d7260fff1c95 1
f475383e966261ee28209a636bda87280640b34198adcb507d4518bad93e1728 1
112c82eaa856d4594d9c2e61020fb922e0f203fc0a7791fe2cc98f1a829f7bcd 1
6cdd2f844a19db4a9dafc95883a7bbc53b205637fc3a71b823ceda15c45c160e 1
f083c6df6305f979fd9228ea520997327fd3bda7dea7cbf98291235edb3c5683 1
3c444bd356c468001b1d2c79d4df2f9efd676302f4203b8342ecdb63334b0fa5 1
What this tells us is that the __text section is identical (same SHA256 for all 10 samples), while each __cstring section is unique (10 different hashes).
VirusTotal has information about the execution parents of processes submitted to the sandbox, for dropper analysis. Let’s check the execution parents of these samples:
First the patient zero samples:
{
"sha256": "9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721",
"submission_names": [
"ookcucythguan"
],
"execution_parents": null,
"first_seen": "2020-09-05 17:06:57"
}
{
"sha256": "2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c",
"submission_names": [
"ookcucythguan"
],
"execution_parents": null,
"first_seen": "2020-09-05 17:07:47"
}
And next, the patient zero “children”:
{
"sha256": "3499d8119db3bd9365a7b1d0b3f677cc9adc5efe9097234ac92e1aa915ef11b1",
"submission_names": [
"/Users/user1/Library/com.apple.fmdd"
],
"execution_parents": [
"9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721"
],
"first_seen": "2020-09-05 17:08:32"
}
{
"sha256": "a3f9a98d8a60c77666d4bf73b9ae2b72dafa32251813cba3a79a2aeb7511037c",
"submission_names": [
"/Users/user1/Library/com.apple.fmgd"
],
"execution_parents": [
"9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721"
],
"first_seen": "2020-09-05 17:08:33"
}
{
"sha256": "ade3e5d2bc094dd2835905aea82d58801a8fb53aa6449bd520d404fcfbc19e88",
"submission_names": [
"/Users/user1/Library/com.apple.fmjd"
],
"execution_parents": [
"9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721"
],
"first_seen": "2020-09-05 17:08:35"
}
{
"sha256": "ca984bcda781d11cc220d45bc01b2e34bd8349e83139f6a9fdd9dd55ddbad4fd",
"submission_names": [
"/Users/user1/Library/com.apple.fmld"
],
"execution_parents": [
"9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721"
]
}
{
"sha256": "d1a11b45b807a9f7da05db69ba1706a850064497376a4775cbb15b1d94b95588",
"submission_names": [
"/Users/user1/Library/osxmobiledata/com.apple.afsvcpd"
],
"execution_parents": [
"9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721"
],
"first_seen": "2020-09-05 17:08:38"
}
{
"sha256": "bd6e1b8ee1c01cb0326d01e026d9d9adcce64c1d021a97b18416680f2e774ca1",
"submission_names": [
"/Users/user1/Library/com.apple.fmdd"
],
"execution_parents": [
"2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c"
],
"first_seen": "2020-09-05 17:09:16"
}
A couple of samples later and we can already observe “grandsons” of patient zero:
{
"sha256": "5e8a0a3b6aeb4a37fc1d949ebe4846accd5842d4eb145d37fa7af41b0acbc70c",
"submission_names": [
"/Users/user1/Library/com.apple.fmhd"
],
"execution_parents": [
"3499d8119db3bd9365a7b1d0b3f677cc9adc5efe9097234ac92e1aa915ef11b1",
"2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c"
],
"first_seen": "2020-09-05 17:09:18"
}
{
"sha256": "6ecdd0f33349a66635ed29b57afd9eafc3391c5a6b2267a3a866b66045290efc",
"submission_names": [
"/Users/user1/Library/com.apple.fmfd"
],
"execution_parents": [
"5e8a0a3b6aeb4a37fc1d949ebe4846accd5842d4eb145d37fa7af41b0acbc70c",
"cda3796c74c9047466384fda223f618d7efe5c00390ae1654e9dff7b3ab07f36",
"d1a11b45b807a9f7da05db69ba1706a850064497376a4775cbb15b1d94b95588"
],
"first_seen": "2020-09-05 17:10:05"
}
If my theory is correct, we should be able to see the mutated samples being analyzed all day long and into the next day. This hypothesis holds true if we look at the next day’s execution timeline:
2020-09-06 00:00:00
2020-09-06 00:00:01
2020-09-06 00:00:25
2020-09-06 00:00:26
2020-09-06 00:00:30
2020-09-06 00:00:31
2020-09-06 00:00:33
2020-09-06 00:01:00
(...)
2020-09-06 23:59:37
2020-09-06 23:59:42
2020-09-06 23:59:43
2020-09-06 23:59:45
2020-09-06 23:59:48
2020-09-06 23:59:49
2020-09-06 23:59:58
We can find the first samples of the next day and see if their parent(s) belong to the previous day:
{
"sha256": "1a1e84793d5e68259e276d5728736f8dcdadbc10beaf579f3b56c415055f1474",
"submission_names": [
"/Users/user1/Library/osxmobiledata/com.apple.afsvcpd"
],
"execution_parents": [
"ee7075bd8f20e94b61436e6344631d0bbe7380a30c73f6dca39f14b402d5672f"
],
"first_seen": "2020-09-06 00:00:00"
}
{
"sha256": "f34f8e458fd98006f454c8e327c1df1872ce8cd93989ed9a41a914507061487e",
"submission_names": [
"/Users/user1/client/tmp/8e3b432bab64466a202b0557a8273f9eab1a63cff9b1ba62292a5180136cc95d/sample.bin"
],
"execution_parents": [
"d780db5b3796dfd39d6bb40aac51d94e4a758308be8414d6e1e76fa2c0c22f7f",
"8e3b432bab64466a202b0557a8273f9eab1a63cff9b1ba62292a5180136cc95d"
],
"first_seen": "2020-09-06 00:00:00"
}
Their parents from the previous day:
{
"sha256": "ee7075bd8f20e94b61436e6344631d0bbe7380a30c73f6dca39f14b402d5672f",
"submission_names": [
"/Users/user1/Library/com.apple.fmtd",
"/Library/osxmobiledata/com.apple.afsvcpd",
"/Users/user1/Library/osxmobiledata/com.apple.afsvcpd",
"com.apple.afsvcpd0"
],
"execution_parents": [
"7ccd8fe515bfc9316f9370e9191b971f231d5d4f28a865e549289b5e25a67f14"
],
"first_seen": "2020-09-05 18:12:03"
}
{
"sha256": "d780db5b3796dfd39d6bb40aac51d94e4a758308be8414d6e1e76fa2c0c22f7f",
"submission_names": [
"/Users/user1/Library/com.apple.fmkd"
],
"execution_parents": [
"e9a3d60e34b380fee9a3910ad4c652589fa683c32f471fc6dcac99759541b1ea"
],
"first_seen": "2020-09-05 18:11:32"
}
{
"sha256": "8e3b432bab64466a202b0557a8273f9eab1a63cff9b1ba62292a5180136cc95d",
"submission_names": [
"/Users/user1/Library/com.apple.fmrd"
],
"execution_parents": [
"dd3d15e7cf6a1f62922041f7213d4b1eef7595dee613c1663a98f6015a4f936f"
],
"first_seen": "2020-09-05 19:45:30"
}
Last, verify whether it is still the same code with different strings, which holds true.
Mach-O Stats
(c) 2020 Pedro Vilaca. All Rights Reserved
__text map
cd87dfd659fc2334ccc59093c1f41ba9abf4c88046d438ddd8bc2d82f55859d7 5
__cstring map
6ac22c26e24ceb5a922b3e4b57365757b12211eb858cfca5746a8a30846757bd 1
ef848f3c91e9ad50f02ae61970289b33b5398b2a48129bbd78c98ba85ae40e74 1
0239a72b6f4b339aa8e82504d2be8acc42dc089ddafbbb44330bb8c32d63f5ec 1
318e952c78afe7b772b54e93aded05857705114e71cf53d7520f1f3814730249 1
9ccedba8c6262fd7e1ac78380266c095308179862d547e9227d33aa6dd3fb24e 1
Given all this, it seems that there is a “bug” in the VirusTotal macOS sandbox that allows someone to “fork bomb” it. This behaviour is a feature to tackle polymorphic code, and it makes sense for it to exist. But in this case there is no code polymorphism, just strings being mutated. From my point of view there should be some kind of trigger to stop this after a day or two. But it can be a complicated decision and problem to solve. Where is the balance?
This could lead to a possible DoS or wasteful usage of the VirusTotal macOS sandbox by submitting a couple of different Mach-O samples that modify themselves. If the files are big enough it could consume a lot of disk space, and flood everyone else looking at daily feeds.
The sandbox appears to be executing a sample every 2 seconds so we might be able to infer VirusTotal macOS sandbox analysis capacity.
Regarding the sample itself, it appears to be a new version of EvilQuest/ThiefQuest. There is a command line switch to display the version number, currently 3.105. I haven’t yet analyzed its capabilities to understand if there are any new features or improvements over the initial version found in public last June. Its development appears to be active, so this threat might grow in the future.
The hardcoded C2 is still the same, 159.65.147.28, as described in this post about an updated version back in July.
In the first days it had very few detections (2 to 3), but it seems AVs finally caught up. On 2020-09-09 the number of detections grew to 5, doubling the next day, with most vendors catching up over the following days. A reanalysis of the initial samples returns 21 detections at the time of writing.
I guess those detection signatures for the first version weren’t that good.
{
"sha256": "9efc7a1f373026a266a642b8417544b92de08e25b6bcdc12d7bfd44bb8993721",
"positives": 2,
"scan_date": "2020-09-05 17:06:57"
}
{
"sha256": "2f1fbd634ebac9079c29e1e659fe3e3f3fd7f3d0aefd4d513563d371b558d22c",
"positives": 2,
"scan_date": "2020-09-05 17:07:47"
}
The question is whether the malware author was trying to mess around with the sandbox or whether it was just a coincidence. If the latter, I don’t understand the benefit of mutating the strings when the code stays the same. It might be able to fool lame AV signatures. Besides that, a significant amount of encrypted/obfuscated strings will just put the spotlight on this type of binary. It doesn’t make much sense.
The conclusion is that unfortunately it’s not the biggest malware attack ever, with nearly half a million samples on VirusTotal, but a new version of EvilQuest/ThiefQuest that triggered a cute loop in the VirusTotal sandbox.
Hope you have enjoyed this little adventure and that it was worth the clickbait.
Have fun,
fG!
P.S.: The Go code is already pushed to GitHub. Here and here.