Back in 2017 (feels like ages ago) I decided to take a peek into the ShadowBrokers leaks and reverse some of the tools.
I started on dewdrop
simply because it had a macOS version. I made local presentations at 0xOpoSec and BSidesLisbon but those slides were never published for obvious reasons (aka live implants all over the Internet).
Significant time has passed and everyone went crazy last week with the beautiful NSO exploit VM published by Project Zero, so why not ride the wave and present a simple NSA BPF VM. It is still an interesting work and you have to admire the great engineering that goes behind this code. It’s not everyday that you can take a peek at code developed by a well funded state actor.
This post is only going to focus on the BPF part of the implant so you will have to fill in the blanks about everything else.
So let’s start!
After we start reversing the dewdrop
binary we reach a point where we understand that a libpcap sniffer is installed and we are dealing with a port knocking backdoor with multiple targets such as Solaris, Linux, FreeBSD, HP-UX, JunOS, OS X. If it’s connected to the Internet there is a backdoor version. SIGINT all the things!
FX created the first (AFAIK!) port knocking backdoor, cd00r.c. Knock is a another example. Port knocking removes the need for port listening backdoors and hiding those listeners from traditional network tools (netstat and friends). It’s the year 2000, rootkits are all the rage! Holy crap, 21 years already?
Port knocking is essentially implemented as a custom sniffer looking for a magic packet (or a group of magic packets) and do something when it matches such as open a port, make a callback to a remote host, etc. Without the right sequence and/or data you can’t get in and detect it from the network scans.
Quoting cd00r.c
:
The approach of cd00r.c is to provide remote access to the system without showing an open port all the time. This is done by using a sniffer on the specified interface to capture all kinds of packets. The sniffer is not running in promiscuous mode to prevent a kernel message in syslog and detection by programs like AnitSniff.
Libpcap has all the necessary features to implement this. We just need to install a listener and activate a BPF based filter since we don’t need to capture everything (it’s the 2000s, CPUs are slow, we don’t want to raise alarms). When the magic packet or sequence is triggered a callback is executed and we can do whatever we want.
The following is the observed backdoor process diagram:
The initial process just forks and exit as a debugging measure (gdb used to have problems following child processes - software breakpoints crash because no debugger installed in the child). The first fork is the daemon and watchdog process responsible for managing the rest of the code. It then forks a child worker where the port knocking sniffer is installed.
The watchdog process is also very simple. It redirects all output to /dev/null
, removes all signal handlers, forks a worker child and waits for it. Core files are disabled, file limits are increased, and a signal handler to kill the child is set, to avoid zombie processes if the watchdog is killed. Too much gore in Unix, uh?
Strings are XOR obfuscated. You can find a Unicorn Engine based deobfuscation utility here. At the time I was playing a lot with Unicorn so it was just faster to copy the shellcode into a quick Unicorn emulator. Recently I used a better approach with Lamberts obfuscation with the delambert IDA plugin.
The same obfuscation algorithm is reused in other tools. Ooops, code reusage sin!
After we understand the process diagram it is clear that we want to debug the worker child. My trick at the time was to patch a sleep (or just use Trammell’s simple and powerful infinite loop trick, which takes less bytes and doesn’t need offset computations) in the code that the child is going to run, attach the debugger, restore (or emulate) the original bytes, rewind EIP and have fun. Or, you can just skip the fork, and patch code or set EIP to the code that would be executed by the child.
Not sure why I originally used a sleep patch instead of the infinite loop. Maybe because it would look nicer on slides or to show how to call imported symbols. Go figure!
I can’t remember how but I understood that libpcap was used. All the library original strings are removed to avoid (easy) library identification. I think at the time I just compiled a libpcap version and manually identified the most interesting functions. Bindiff or Diaphora are the tools of the trade for this kind of task.
Usually the flow to install a libpcap based sniffer is:
- Locate the network interfaces to sniff at.
- Compile a filter.
- Install the filter.
- Start sniffing.
- Get matching packets into a callback.
- Process the matched packets.
- Do something (evil).
Just read cd00r.c
code and you will understand what’s going in the backdoor code.
In this sample the clear text BPF filter is missing and a call to pcap_compile
to compile that filter. Instead of compiled on the fly, the (compiled) bytecode is embedded in the binary.
We can find the code that retrieves and installs the bytecode here:
__text:0x100001882 E8 59 23 00 00 call sub_100003BE0 ; retrieve the pre compiled bpf_program
__text:0x100001887 4C 89 E7 mov rdi, r12
__text:0x10000188A 48 89 C6 mov rsi, rax ; rax = 0x000000010000A640
__text:0x10000188A ;
__text:0x10000188A ; struct bpf_program {
__text:0x10000188A ; u_int bf_len;
__text:0x10000188A ; struct bpf_insn *bf_insns;
__text:0x10000188A ; };
__text:0x10000188A ;
__text:0x10000188A ; struct bpf_insn {
__text:0x10000188A ; u_short code;
__text:0x10000188A ; u_char jt;
__text:0x10000188A ; u_char jf;
__text:0x10000188A ; bpf_u_int32 k;
__text:0x10000188A ; };
__text:0x10000188D E8 9E 51 00 00 call pcap_setfilter ; int
__text:0x10000188D ; pcap_setfilter(pcap_t *p, struct bpf_program *fp)
The input to pcap_setfilter
is a bpf program:
pcap_setfilter()
is used to specify a filter program.fp
is a pointer to abpf_program
struct, usually the result of a call topcap_compile(3PCAP)
.
If we take a look at the referenced data it clearly looks like a potential bpf_program
:
__data:0x10000A640 39 00 00 00 dword_10000A640 dd 39h ; DATA XREF: sub_100003BE0+62↑o
__data:0x10000A644 00 db 0
__data:0x10000A645 00 db 0
__data:0x10000A646 00 db 0
__data:0x10000A647 00 db 0
__data:0x10000A648 60 A6 00 00 01 00 00 00 off_10000A648 dq offset port_knocking_bpf_program
__data:0x10000A650 00 00 00 00 00 00 00 00+ align 20h
__data:0x10000A660 30 port_knocking_bpf_program db 30h ; DATA XREF: __data:off_10000A648↑o
__data:0x10000A661 00 db 0
__data:0x10000A662 00 db 0
__data:0x10000A663 00 db 0
__data:0x10000A664 0E db 0Eh
__data:0x10000A665 00 db 0
__data:0x10000A666 00 db 0
__data:0x10000A667 00 db 0
__data:0x10000A668 15 db 15h
__data:0x10000A669 00 db 0
__data:0x10000A66A 01 db 1
__data:0x10000A66B 00 db 0
__data:0x10000A66C 45 db 45h ; E
We can define the following types in IDA to transform the data into an array:
struct bpf_insn {
u_short code;
u_char jt;
u_char jf;
int32 k;
};
struct bpf_program {
u_int bf_len;
struct bpf_insn *bf_insns;
};
Applying the new types to the data we can see the expected bpf_program
structure:
__data:0x10000A640 stru_10000A640 bpf_program <39h, 0, offset port_knocking_bpf_program>
__data:0x10000A640 ; DATA XREF: sub_100003BE0+62↑o
__data:0x10000A640 ; sub_100003BE0+4↑r ...
__data:0x10000A650 align 20h
__data:0x10000A660 port_knocking_bpf_program bpf_insn < 30h, 0, 0, 0Eh>
__data:0x10000A660 ; DATA XREF: __data:stru_10000A640↑o
__data:0x10000A660 bpf_insn < 15h, 1, 0, 45h>
__data:0x10000A660 bpf_insn < 6, 0, 0, 0>
__data:0x10000A660 bpf_insn < 28h, 0, 0, 10h>
__data:0x10000A660 bpf_insn < 4, 0, 0, 0Eh>
__data:0x10000A660 bpf_insn < 35h, 1, 0, 0AAh>
__data:0x10000A660 bpf_insn < 6, 0, 0, 0>
__data:0x10000A660 bpf_insn < 2, 0, 0, 1>
__data:0x10000A660 bpf_insn < 14h, 0, 0, 6>
__data:0x10000A660 bpf_insn < 7, 0, 0, 0>
__data:0x10000A660 bpf_insn < 48h, 0, 0, 0>
__data:0x10000A660 bpf_insn < 44h, 0, 0, 0E6CFh>
(...)
The bpf_program
header tells us the program has 57 instructions. We definitely want to disassemble and reverse it.
Before the BPF program is retrieved, the network frame header size is computed:
__text:0x100001877 mov rdi, r12 ; pcap_t *
__text:0x10000187A call find_frame_header_size
Quite a few cases are supported but the virtual machine executing the sample uses Ethernet so we will go through the DLT_EN10MB
case.
__text:0x100001550 find_frame_header_size proc near ; CODE XREF: sub_1000015F0+A8↓p
__text:0x100001550 ; sub_100001720+15A↓p
__text:0x100001550 ; __unwind {
__text:0x100001550 push rbp
__text:0x100001551 mov rbp, rsp
__text:0x100001554 call pcap_datalink ; int
__text:0x100001554 ; pcap_datalink(pcap_t *p)
__text:0x100001554 ; {
__text:0x100001554 ; return (p->linktype);
__text:0x100001554 ; }
__text:0x100001559 cmp eax, 6Bh ; 'k' ; DLT_FRELAY
__text:0x10000155C jg short loc_100001574
__text:0x10000155E cmp eax, 1 ; DLT_EN10MB
__text:0x100001561 jz short loc_100001585
__text:0x100001563 cmp eax, 8 ; DLT_SLIP
__text:0x100001566 jz short loc_10000158C
__text:0x100001568 cmp eax, 0Ch ; DLT_RAW
__text:0x10000156B jz short loc_100001593
__text:0x10000156D
__text:0x10000156D loc_10000156D: ; CODE XREF: find_frame_header_size+2C↓j
__text:0x10000156D mov eax, 0FFFFh
__text:0x100001572 pop rbp
__text:0x100001573 retn
__text:0x100001574 ; --------------------------------------------------------------------
__text:0x100001574
__text:0x100001574 loc_100001574: ; CODE XREF: find_frame_header_size+C↑j
__text:0x100001574 cmp eax, 71h ; 'q' ; DLT_LINUX_SLL
__text:0x100001577 jz short loc_10000158C
__text:0x100001579 cmp eax, 6Ch ; 'l' ; DLT_LOOP
__text:0x10000157C jnz short loc_10000156D
__text:0x10000157E mov eax, 4
__text:0x100001583 pop rbp
__text:0x100001584 retn
__text:0x100001585 ; --------------------------------------------------------------------
__text:0x100001585
__text:0x100001585 loc_100001585: ; CODE XREF: find_frame_header_size+11↑j
__text:0x100001585 mov eax, 0Eh ; /* Ethernet header length */
__text:0x10000158A pop rbp
__text:0x10000158B retn
__text:0x10000158C ; --------------------------------------------------------------------
__text:0x10000158C
__text:0x10000158C loc_10000158C: ; CODE XREF: find_frame_header_size+16↑j
__text:0x10000158C ; find_frame_header_size+27↑j
__text:0x10000158C mov eax, 10h
__text:0x100001591 pop rbp
__text:0x100001592 retn
__text:0x100001593 ; --------------------------------------------------------------------
__text:0x100001593
__text:0x100001593 loc_100001593: ; CODE XREF: find_frame_header_size+1B↑j
__text:0x100001593 xor eax, eax
__text:0x100001595 pop rbp
__text:0x100001596 retn
__text:0x100001596 ; } // starts at 100001550
__text:0x100001596 find_frame_header_size endp
The frame header size is necessary because the bpf program is modified to support those different link types (old stuff, uh?). This is what the function that returns the bpf program does:
__text:0x100003BE0 sub_100003BE0 proc near ; CODE XREF: sub_100001720+162↑p
__text:0x100003BE0 ; __unwind {
__text:0x100003BE0 push rbp
__text:0x100003BE1 mov rbp, rsp
__text:0x100003BE4 mov rax, cs:stru_10000A640.bf_insns
__text:0x100003BEB mov [rax+4], edi
__text:0x100003BEE lea eax, [rdi+2]
__text:0x100003BF1 mov rcx, cs:stru_10000A640.bf_insns
__text:0x100003BF8 mov [rcx+1Ch], eax
__text:0x100003BFB mov rax, cs:stru_10000A640.bf_insns
__text:0x100003C02 mov [rax+24h], edi
__text:0x100003C05 lea eax, [rdi+9Ch]
__text:0x100003C0B mov rcx, cs:stru_10000A640.bf_insns
__text:0x100003C12 mov [rcx+2Ch], eax
__text:0x100003C15 lea eax, [rdi+9]
__text:0x100003C18 mov rcx, cs:stru_10000A640.bf_insns
__text:0x100003C1F mov [rcx+0E4h], eax
__text:0x100003C25 lea eax, [rdi+20h]
__text:0x100003C28 mov rcx, cs:stru_10000A640.bf_insns
__text:0x100003C2F mov [rcx+0F4h], eax
__text:0x100003C35 mov rax, cs:stru_10000A640.bf_insns
__text:0x100003C3C mov [rax+11Ch], edi
__text:0x100003C42 lea rax, stru_10000A640 ; bpf_program size
__text:0x100003C49 pop rbp
__text:0x100003C4A retn
Or with decompiler assistance (IDA subscriptions model, lol…):
bpf_program *__fastcall sub_100003BE0(u_int32 frame_size)
{
stru_10000A640.bf_insns->k = frame_size;
stru_10000A640.bf_insns[3].k = frame_size + 2;
stru_10000A640.bf_insns[4].k = frame_size;
stru_10000A640.bf_insns[5].k = frame_size + 156;
stru_10000A640.bf_insns[28].k = frame_size + 9;
stru_10000A640.bf_insns[30].k = frame_size + 32;
stru_10000A640.bf_insns[35].k = frame_size;
return &stru_10000A640;
}
Recalling the struct bpf_insn
definition:
struct bpf_insn {
u_short code; // Instruction type and addressing mode
u_char jt; // Jump if false
u_char jf; // Jump if true
int32 k; // Generid field used for various purposes
};
That function is just adapting the bpf program to the different types of data links and fixing offsets.
The default program is defined for Ethernet, so the values will be the same as the original, meaning that we can dump directly the program from the binary. The following table comparing the original and computed values to modify proves this:
__data:0x10000A660 bpf_insn < 30h, 0, 0, 0Eh> <- 0: 0xE = 0xE
__data:0x10000A660 bpf_insn < 28h, 0, 0, 10h> <- 3: 0xE + 2 = 0x10
__data:0x10000A660 bpf_insn < 4, 0, 0, 0Eh> <- 4: 0xE = 0xE
__data:0x10000A660 bpf_insn < 35h, 1, 0, AAh> <- 5: 0xE + 156 = 0xAA
__data:0x10000A660 bpf_insn < 30h, 0, 0, 17h> <- 28: 0xE + 9 = 0x17
__data:0x10000A660 bpf_insn < 30h, 0, 0, 2Eh> <- 30: 0xE + 32 = 0x2E
__data:0x10000A660 bpf_insn < 48h, 0, 0, 0Eh> <- 35: 0xE = 0xE
To disassemble the bytecode we can use bpftools published by Cloudflare. This is based out of the available Linux kernel tools in kernel source code. This repo compiles easily while I think I had some issues compiling the kernel version. Just install the dependencies and compile (Linux only AFAIK). The bpf debugger and disassembler can be found at linux_tools/bpf_dbg
. If you feel brave (and lucky) you can always try radare2 (eh eh eh eh).
To load the bytecode we need to convert it to this tool format. Example:
load bpf 12,40 0 0 12,21 0 5 34525,48 0 0 20,21 6 0 17,21 0 6 44,48 0 0 54,21 3 4 17,21 0 3 2048,48 0 0 23,21 0 1 17,6 0 0 262144,6 0 0 0
The first field is the number of instructions, followed by each instruction in base 10 and space separated fields.
I created bpf_dbg_output to convert the instructions array to bpf_dbg
input format.
We can finally disassemble the bpf payload:
> load bpf 57,48 0 0 14,21 1 0 69,6 0 0 0,40 0 0 16,4 0 0 14,53 1 0 170,6 0 0 0,2 0 0 1,20 0 0 6,7 0 0 0,72 0 0 0,68 0 0 59087,2 0 0 15,72 0 0 0,84 0 0 59087,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,2 0 0 2,96 0 0 1,28 0 0 0,7 0 0 0,72 0 0 0,2 0 0 3,97 0 0 2,48 0 0 23,21 0 5 6,48 0 0 46,116 0 0 2,20 0 0 20,12 0 0 0,7 0 0 0,72 0 0 14,2 0 0 4,96 0 0 1,20 0 0 2,7 0 0 0,72 0 0 0,68 0 0 40298,2 0 0 15,72 0 0 0,84 0 0 40298,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,96 0 0 4,29 2 0 0,96 0 0 3,29 0 1 0,6 0 0 65535,6 0 0 0
> disassemble
l0: ldb [14]
l1: jeq #0x45, l3, l2
l2: ret #0
l3: ldh [16]
l4: add #14
l5: jge #0xaa, l7, l6
l6: ret #0
l7: st M[1]
l8: sub #6
l9: tax
l10: ldh [x+0]
l11: or #0xe6cf
l12: st M[15]
l13: ldh [x+0]
l14: and #0xe6cf
l15: neg
l16: sub #1
l17: tax
l18: ld M[15]
l19: and x
l20: tax
l21: st M[2]
l22: ld M[1]
l23: sub x
l24: tax
l25: ldh [x+0]
l26: st M[3]
l27: ldx M[2]
l28: ldb [23]
l29: jeq #0x6, l30, l35
l30: ldb [46]
l31: rsh #2
l32: sub #20
l33: add x
l34: tax
l35: ldh [x+14]
l36: st M[4]
l37: ld M[1]
l38: sub #2
l39: tax
l40: ldh [x+0]
l41: or #0x9d6a
l42: st M[15]
l43: ldh [x+0]
l44: and #0x9d6a
l45: neg
l46: sub #1
l47: tax
l48: ld M[15]
l49: and x
l50: tax
l51: ld M[4]
l52: jeq x, l55, l53
l53: ld M[3]
l54: jeq x, l55, l56
l55: ret #0xffff
l56: ret #0
To understand what is going on here we need the bpf instruction set:
Instruction Addressing mode Description
ld 1, 2, 3, 4, 10 Load word into A
ldi 4 Load word into A
ldh 1, 2 Load half-word into A
ldb 1, 2 Load byte into A
ldx 3, 4, 5, 10 Load word into X
ldxi 4 Load word into X
ldxb 5 Load byte into X
st 3 Store A into M[]
stx 3 Store X into M[]
jmp 6 Jump to label
ja 6 Jump to label
jeq 7, 8 Jump on A == k
jneq 8 Jump on A != k
jne 8 Jump on A != k
jlt 8 Jump on A < k
jle 8 Jump on A <= k
jgt 7, 8 Jump on A > k
jge 7, 8 Jump on A >= k
jset 7, 8 Jump on A & k
add 0, 4 A + <x>
sub 0, 4 A - <x>
mul 0, 4 A * <x>
div 0, 4 A / <x>
mod 0, 4 A % <x>
neg !A
and 0, 4 A & <x>
or 0, 4 A | <x>
xor 0, 4 A ^ <x>
lsh 0, 4 A << <x>
rsh 0, 4 A >> <x>
tax Copy A into X
txa Copy X into A
ret 4, 9 Return
The instruction set consists of load, store, branch, alu, miscellaneous and return instructions. Documentation from Linux kernel.
The following registers are available:
A
: 32 bit wide accumulatorX
: 32 bit wide X registerM[]
: 16 x 32 bit wide misc registers aka “scratch memory store”, addressable from 0 to 15
And finally the addressing modes:
Addressing mode Syntax Description
0 x/%x Register X
1 [k] BHW at byte offset k in the packet
2 [x + k] BHW at the offset X + k in the packet
3 M[k] Word at offset k in M[]
4 #k Literal value stored in k
5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet
6 L Jump label L
7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf
8 x/%x,Lt,Lf Jump to Lt if true, otherwise jump to Lf
9 #k,Lt Jump to Lt if predicate is true
10 x/%x,Lt Jump to Lt if predicate is true
11 a/%a Accumulator A
12 extension BPF extension
Note the relevant sizes definitions used here:
- Half word: 2 bytes
- Word: 4 bytes
The following valid ICMP packet (produced by their port knocking tool) is used to reverse the bpf program:
s10:40:09.498560 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 164)
192.168.30.14 > 192.168.30.15: ICMP echo request, id 24374, seq 25107, length 144
0x0000: 000c 29db 3ac4 000c 29b6 b3c6 0800 4500 ..).:...).....E.
0x0010: 00a4 0001 0000 4001 bcea c0a8 1e0e c0a8 ......@.........
0x0020: 1e0f 0800 9e57 5f36 6213 7e29 5000 1743 .....W_6b.~)P..C
0x0030: 5b50 ab0d addf b955 1089 578f 849b ecdf [P.....U..W.....
0x0040: 83fd 84c0 a779 7118 43ac ec65 1249 7e5f .....yq.C..e.I~_
0x0050: e7ea 2b2a 6265 a1d3 912f 2dd8 b3ab e30a ..+*be.../-.....
0x0060: 0d3b e0f4 e527 3955 9f44 46d7 0608 7703 .;...'9U.DF...w.
0x0070: f134 5138 4845 68bd 7382 d1c4 1fcb adf5 .4Q8HEh.s.......
0x0080: c2ae 87b4 ac48 b398 5f65 24d3 7090 6c04 .....H.._e$.p.l.
0x0090: fc1a cb5b 99b5 ec76 a129 596e edb3 668b ...[...v.)Yn..f.
0x00a0: 8848 9bce f31c 458e 07b5 52c7 e647 7e0f .H....E...R..G~.
0x00b0: e343 .C
The reversed and commented version of the program (modified instructions are 0, 3, 4, 5, 28, 30, 35):
> disassemble
l0: ldb [14] <- load byte from offset 14 (aka skip frame header) into A. A = 0xE
l1: jeq #0x45, l3, l2 <- check if it's 0x45 - IP packet, header length = 5, version 4
l2: ret #0 <- exit if not a IPv4 packet
l3: ldh [16] <- load IP packet total length into A: 0xA4 = 164 bytes. A = 0xA4
l4: add #14 <- add link layer header size to A. A = 0xA4 + 0xE = 0xB2 (178)
l5: jge #0xaa, l7, l6 <- full packet must be at least 170 bytes. A = 178 bytes
l6: ret #0 <- exit if packet too short
l7: st M[1] <- store packet length into misc register 1. M[1] = 178
l8: sub #6 <- A = packet_length - 6 = offset 172. A = 172
l9: tax <- copy A to X register. X = 0xAC
l10: ldh [x+0] <- load half word value from packet offset 0xAC. A = 0xE647
l11: or #0xe6cf <- 0xE647 | 0xE6CF. A = 0xE6CF
l12: st M[15] <- store result at M[15]. M[15] = 0xE6CF
l13: ldh [x+0] <- load half word value from packet offset 0xAC. A = 0xE647
l14: and #0xe6cf <- 0xE647 & 0xE6CF. A = 0xE647
l15: neg <- A = ~0xE647 + 1 = 0x19B9
l16: sub #1 <- A = 0x19B8
l17: tax <- copy A to X register. X = 0x19B8
l18: ld M[15] <- load from M[15]. A = 0xE6CF
l19: and x <- 0xE6CF & 0x19B8. A = 0x88 (payload size)
<- ((0xE647 | 0xE6CF) & (~(0xE647 & 0xE6CF) + 1) - 1) = 0x88
<- this is just 0xE647 ^ 0xE6CF
<- because there is no XOR operation in this VM
<- this just extracts the payload size, which is fixed at 136 bytes
l20: tax <- X = 0x88 (payload data size)
l21: st M[2] <- M[2] = 0x88 (136)
l22: ld M[1] <- A = 178 (0xB2)
l23: sub x <- A = 0xB2 - 0x88 = 42 (0x2A)
l24: tax <- X = 42 (size of all headers up to the payload). 178 - 136 = 42
l25: ldh [x+0] <- 42 is the ICMP data offset in the packet
- so all this just computes where the data starts
- they call it the trigger
- loads the data at packet offset 0x42. A = 0x7E29
l26: st M[3] <- M[3] = 0x7E29
l27: ldx M[2] <- X = 0x88
l28: ldb [23] <- loads the data at packet offset 23. A = 1
<- IP header protocol field, ICMP in this case
l29: jeq #0x6, l30, l35 <- check if it's TCP protocol
l30: ldb [46] <- it's TCP
l31: rsh #2 <-
l32: sub #20 <-
l33: add x <-
l34: tax <-
l35: ldh [x+14] <- load half word from offset 0x88 + 0xE = 150 (0x96). A = 0xEC76
<- not sure what is this
l36: st M[4] <- M[4] = 0xEC76
l37: ld M[1] <- A = 178 (0xB2) (total packet size)
l38: sub #2 <- A = 176
l39: tax <- X = 176
l40: ldh [x+0] <- A = 0xE343 (load half word from packet offset 0xB0).
<- It's the last half word from the packet.
l41: or #0x9d6a <- 0xE343 | 0x9D6A. A = 0xFF6B
l42: st M[15] <- M[15] = 0xFF6B
l43: ldh [x+0] <- A = 0xE343
l44: and #0x9d6a <- A = 0xE343 & 0x9D6A = 0x8142
l45: neg <- A = ~0x8142 + 1 = 0x7EBE
l46: sub #1 <- A = 0x7EBD
l47: tax <- X = 0x7EBD
l48: ld M[15] <- A = 0xFF6B
l49: and x <- A = 0xFF6B & 0x7EBD = 0x7E29
l50: tax <- X = 0x7E29
<- ((0xE343 | 0x9D6A) & (~(0xE343 & 0x9D6A) + 1) - 1) = 0x7E29
<- this is just 0xE343 ^ 0x9D6A = 0x7E29
<- this means that the last half word XOR 0x96DA must be
<- equal to the trigger
l51: ld M[4] <- A = 0xEC76 (the value at packet offset 0x96)
l52: jeq x, l55, l53 <- OK if equal
l53: ld M[3] <- A = 0x7E29 (the value at beginning of data payload)
l54: jeq x, l55, l56 <- OK if equal
l55: ret #0xffff <- WE HAVE A WINNER!
l56: ret #0 <- Bad luck, maybe next packet?
We can describe the format of the data payload:
struct __attribute__((packed)) payload {
uint16_t trigger; // = checksum ^ 0x9D6A
char data[128];
uint16_t size; // = sizeof(struct payload) ^ 0xE6CF
uint16_t unknown;
uint16_t checksum;
}; // sizeof() = 136 bytes
Log from the tool that does the port knocking (different packet):
TRIGGER DATA
COMMAND = 0x01
DESTINATION ADDRESS = 192.168.30.15
TRANSPORT PROTOCOL = icmp (1)
TIME STAMP = Mon Apr 17 10:00:00 2017 (1492437600)
TIME SKEW = 43200
ICMP TYPE, CODE = 8, 0
CALLBACK ADDRESS = 192.168.30.14:55555
SOURCE PORT = 55302
START OF TRIGGER = 0xdd67
The port knocking tool is extremely flexible and can send all kinds of packets and payloads. It supports TCP, UDP, ICMP, and besides raw packets it can produce DNS, SMTP, SIP application payloads. Can set different flags in TCP packets, for example, send a RST packet with the port knocking payload. Even has a PIX firewall bypass (SYN only packet). Pretty much port knocking on steroids.
Next are some of the different packets it can build.
SIP application packet:
17:36:08.576977 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 342)
192.168.30.14.64778 > 192.168.30.15.acmsoda: Flags [.], cksum 0x69fe (correct),
seq 32774:33076, ack 47681, win 32767, length 302
0x0000: 4500 0156 0002 0000 4006 bc32 c0a8 1e0e E..V....@..2....
0x0010: c0a8 1e0f fd0a 1b39 0000 8006 0000 ba41 .......9.......A
0x0020: 5010 7fff 69fe 0000 5245 4749 5354 4552 P...i...REGISTER
0x0030: 2073 6970 3a61 2053 4950 2f32 2e30 0d0a .sip:a.SIP/2.0..
0x0040: 546f 3a20 3837 203c 7369 703a 3130 3240 To:.87.<sip:102@
0x0050: 7878 2e6e 6574 3e0d 0a46 726f 6d3a 2032 xx.net>..From:.2
0x0060: 3435 203c 7369 703a 3130 3140 7878 2e6e 45.<sip:101@xx.n
0x0070: 6574 3e0d 0a43 616c 6c2d 4944 3a20 3233 et>..Call-ID:.23
0x0080: 3334 3340 7071 7879 760d 0a43 5365 713a 343@pqxyv..CSeq:
0x0090: 2032 3720 5245 4749 5354 4552 0d0a 436f .27.REGISTER..Co
0x00a0: 6e74 6163 743a 203c 7369 703a 3130 3140 ntact:.<sip:101@
0x00b0: 7878 2e6e 6574 3e0d 0a43 6f6e 7465 6e74 xx.net>..Content
0x00c0: 2d4c 656e 6774 683a 2031 3330 0d0a 1050 -Length:.130...P
0x00d0: 2545 647d a2e2 f84d 3281 31bf dcd8 3669 %Ed}...M2.1...6i
0x00e0: 4d73 a214 cb7f ff29 d61a 7e80 f543 8c71 Ms.....)..~..C.q
0x00f0: a7c8 b138 4fe8 59a8 02aa cecd c0c3 a94c ...8O.Y........L
0x0100: 2102 1938 58e4 32d6 3c4e f05a 9d9d be74 !..8X.2.<N.Z...t
0x0110: 0dd1 5617 6eb6 7f50 424b bf94 52e5 bc52 ..V.n..PBK..R..R
0x0120: a769 4bfa 47c6 31d2 989a 93e5 972b 2ff3 .iK.G.1......+/.
0x0130: 0987 6eec 5eed 3013 29dd 2e0f 5dcd 1c53 ..n.^.0.)...]..S
0x0140: 3943 4712 54f7 5688 6791 3313 0c82 b47a 9CG.T.V.g.3....z
0x0150: e647 1076 8d3a .G.v.:
The trigger is 0x8d3a ^ 0x9d6a = 0x1050
(offset 0xCE).
DNS packet:
17:37:18.675460 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 233)
192.168.30.14.56331 > 192.168.30.15.acmsoda: Flags [.], cksum 0x40a1 (correct), seq 10342:10535,
ack 48998, win 32767, length 193
0x0000: 4500 00e9 0002 0000 4006 bc9f c0a8 1e0e E.......@.......
0x0010: c0a8 1e0f dc0b 1b39 0000 2866 0000 bf66 .......9..(f...f
0x0020: 5010 7fff 40a1 0000 af99 0180 0001 0001 P...@...........
0x0030: 0000 0000 0231 3502 3330 0331 3638 0331 .....15.30.168.1
0x0040: 3932 0769 6e2d 6164 6472 0461 7270 6100 92.in-addr.arpa.
0x0050: 0010 0001 c00c 0010 0001 0000 ffff 0088 ................
0x0060: 8718 4938 38d6 654b 6539 8033 da58 cc73 ..I88.eKe9.3.X.s
0x0070: 3a57 25a3 7c08 5e9e 7734 a126 f954 3879 :W%.|.^.w4.&.T8y
0x0080: 64bf 8f7b 741e 2e33 8cbd b07b c750 ae14 d..{t..3...{.P..
0x0090: 819b d7f3 1742 1c05 2198 570d d509 f1a8 .....B..!.W.....
0x00a0: c323 da36 41b2 ca11 b955 dd59 67e2 d495 .#.6A....U.Yg...
0x00b0: fff3 e7cf 3ce7 33a2 6bc1 f8a3 5d98 9983 ....<.3.k...]...
0x00c0: 583f b3b9 289e c3b9 70bf 45e9 69eb db32 X?..(...p.E.i..2
0x00d0: 1a42 e586 220a fd23 77ad acff 75a2 027d .B.."..#w...u..}
0x00e0: 1a6c 83e6 4718 6f85 23 .l..G.o.#
The trigger is 0x8523 ^ 0x9d6a = 0x1849
(offset 0x61).
SMTP HELO packet:
17:38:18.413178 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 181)
192.168.30.14.19215 > 192.168.30.15.acmsoda: Flags [.], cksum 0x790a (correct), seq 51209:51350,
ack 1802, win 32767, length 141
0x0000: 4500 00b5 0002 0000 4006 bcd3 c0a8 1e0e E.......@.......
0x0010: c0a8 1e0f 4b0f 1b39 0000 c809 0000 070a ....K..9........
0x0020: 5010 7fff 790a 0000 4845 4c4f 204f dc75 P...y...HELO.O.u
0x0030: eee6 2436 82ce 0567 a237 e408 c0c5 ea55 ..$6...g.7.....U
0x0040: 4425 df77 fb9f 1498 c2b5 3cd3 84cd 4178 D%.w......<...Ax
0x0050: e9f7 23e2 0f02 0ce7 7c36 8aa6 f4ea 43cd ..#.....|6....C.
0x0060: d1fe 2406 77b5 29a7 83c4 f497 5c06 75a5 ..$.w.).....\.u.
0x0070: 1528 39a0 d80c 412f f35e 067d d857 ee1c .(9...A/.^.}.W..
0x0080: 7f3d b09b f209 b8ad fff1 1f1d 52a8 87ca .=..........R...
0x0090: 73dc 5c1a 4a69 4205 f5d9 6836 6f09 be5e s.\.JiB...h6o..^
0x00a0: 6b87 7ad6 1138 de67 6c3b 33d8 98a4 35e6 k.z..8.gl;3...5.
0x00b0: 474f fad2 b6 GO...
The trigger is 0xd2b6 ^ 0x9d6a = 0x4fdc
(offset 0x2D).
In the port knocking tool we can find references to CORDIALFLIMSY
. This appears to be the codename for the packet format (includes trigger and payload). Because the tool is to be used only by the attacker, it has a lot of debug messages and strings aren’t obfuscated. Guess they weren’t counting on losing all those tools.
The data contents are encrypted with RC5/6, and contain the callback address and port. In theory it would be possible to recover the callback hosts if we had network dumps of all the packets. The leaks contain traffic bouncer tools so these callback addresses should be just bouncer addresses.
Regarding the RC5/6 code, we can find the same constant 0x61C88647
identified by Kaspersky as specific to the Equation group:
__int64 __fastcall RC56_keysetup(int *a1, _DWORD *a2)
{
__int64 v2; // rax
int i; // ecx
int v4; // ecx
int v5; // edi
__int64 result; // rax
int v7; // er10
unsigned int v8; // er11
int v9; // er14
int v10[7]; // [rsp+0h] [rbp-1Ch]
v10[0] = *a1;
v10[1] = a1[1];
v10[2] = a1[2];
*a2 = 0xB7E15163;
v2 = 1LL;
for ( i = 0x5618CB1C; ; i -= 0x61C88647 )
{
a2[v2] = i;
if ( v2 == 0x31 )
break;
++v2;
}
v4 = 0;
v5 = 0x96;
LODWORD(result) = 0;
v7 = 0;
v8 = 0;
do
{
v7 = __ROL4__(a2[v8] + v4 + v7, 3);
a2[v8] = v7;
v9 = __ROL4__(v7 + v4 + v10[(unsigned int)result], (v7 + v4) & 0x1F);
v10[(unsigned int)result] = v9;
v8 = v8 - 50 * ((v8 + 1) / 0x32) + 1;
result = (unsigned int)result - 3 * (((int)result + 1) / 3u) + 1;
--v5;
v4 = v9;
}
while ( v5 );
return result;
}
Stephen Checkoway wrote that Kaspersky analysis is wrong (the original URL is gone since 2019 or something).
The funny thing is that the macOS binary analysed here was most probably compiled with Clang 4.x (and Xcode 4.x) so his GCC theory might be wrong. Who knows? :-)
$ otool -l dewdrop__v__3_3_2_2_x86_64-darwin
Load command 10
cmd LC_LOAD_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 2 Thu Jan 1 01:00:02 1970
current version 159.1.0
compatibility version 1.0.0
The 159.1.0 version of /usr/lib/libSystem.B.dylib
can be found in 10.7 SDK and at least Xcode 4.6.3. It’s a Lion 10.7.2 to 10.7.5 system library, and so it should be available in older Xcode versions. Xcode 4 was released between 2011 (4.0) and 2013 (4.6).
$ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/lib/libSystem.B.dylib
Load command 3
cmd LC_ID_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 1 Thu Jan 1 01:00:01 1970
current version 159.1.0
compatibility version 1.0.0
$ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/lib/libSystem.B.dylib
Load command 3
cmd LC_ID_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 1 Thu Jan 1 01:00:01 1970
current version 169.3.0
compatibility version 1.0.0
$ uname -an
Darwin lion.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64
$ otool -l /usr/lib/libSystem.B.dylib
Load command 3
cmd LC_ID_DYLIB
cmdsize 56
name /usr/lib/libSystem.B.dylib (offset 24)
time stamp 1 Thu Jan 1 01:00:01 1970
current version 159.1.0
compatibility version 1.0.0
There are some other hints that this codebase is quite old. For example, libpcap version is definitely between 0.9.6 and 0.9.8, all released in 2007. You can determine this via the struct pcap
definition. Version 0.9.5 was released in 2006, and version 1.0.0 in 2008. Most probably someone around 2007 or 2008 forked one of 0.9.6-0.9.8 versions and modified it to remove all strings and error messages build up. Need to verify the other operating systems versions to see if they match this information and use the same shared libpcap codebase.
Conclusions
The NSA toolset is pretty cool! I’m a big fan.
You can definitely see that architecture and code are carefully thought and engineered. They have been doing this for a long time and definitely have more resources than most (nation state) attackers. This isn’t some random proof of concept code. Almost every operating system is a target so their catalog is impressive. The problems of SIGINT and wanting to collect all the things.
There is no heavily obfuscated code, there are no hardcore anti-debugging measures, and no packers and/or cryptors used. They try to blend in and not be too unique. The leaked Lamberts (CIA) sample has similar ideas behind it. A common cyber philosophy? :-)
But they aren’t perfect (nobody is)! They suffer from the code reusage sin. For example, the same string obfuscation algorithm can be seen in other tools. The RC5/6 constant might be another example, if Kaspersky theory is right. Maybe this was true in the past, and the post ShadowBrokers future is better segmented.
Hindsight is always 20/20. It is very easy to detect the hacked machines after we have access to these tools and spend some time reverse engineering them. Four years ago I asked my friends at BinaryEdge to scan the Internet since this was their thing. Why reinvent the wheel if you know people who do great wheels?
Asked them to send this type of packet and check for the answer. The result is that we managed to find a bunch of hacked machines all over the Internet. That’s the main reason why I never published this before and this post is just focused on the BPF program. I am just a curious reverse engineer ;-).
I believe that the tools from the ShadowBrokers leaks are way more interesting than the exploits (altough the leaked exploits sure created a ton more damage worldwide). Reversing the tools allows you to understand a bit of their mindset, their engineering, and operations. And more important, appreciate their work. Here we care about bits and bytes.
Have fun,
fG!