Knock Knock! Who's There? - An NSA VM

December 17, 2021 · 27 min · 5571 words · fG!

Back in 2017 (feels like ages ago) I decided to take a peek into the ShadowBrokers leaks and reverse some of the tools.

I started on dewdrop simply because it had a macOS version. I made local presentations at 0xOpoSec and BSidesLisbon but those slides were never published for obvious reasons (aka live implants all over the Internet).

Significant time has passed and everyone went crazy last week with the beautiful NSO exploit VM published by Project Zero, so why not ride the wave and present a simple NSA BPF VM. It is still an interesting work and you have to admire the great engineering that goes behind this code. It’s not everyday that you can take a peek at code developed by a well funded state actor.

This post is only going to focus on the BPF part of the implant so you will have to fill in the blanks about everything else.

So let’s start!

After we start reversing the dewdrop binary we reach a point where we understand that a libpcap sniffer is installed and we are dealing with a port knocking backdoor with multiple targets such as Solaris, Linux, FreeBSD, HP-UX, JunOS, OS X. If it’s connected to the Internet there is a backdoor version. SIGINT all the things!

FX created the first (AFAIK!) port knocking backdoor, cd00r.c. Knock is a another example. Port knocking removes the need for port listening backdoors and hiding those listeners from traditional network tools (netstat and friends). It’s the year 2000, rootkits are all the rage! Holy crap, 21 years already?

Port knocking is essentially implemented as a custom sniffer looking for a magic packet (or a group of magic packets) and do something when it matches such as open a port, make a callback to a remote host, etc. Without the right sequence and/or data you can’t get in and detect it from the network scans.

Quoting cd00r.c:

The approach of cd00r.c is to provide remote access to the system without showing an open port all the time. This is done by using a sniffer on the specified interface to capture all kinds of packets. The sniffer is not running in promiscuous mode to prevent a kernel message in syslog and detection by programs like AnitSniff.

Libpcap has all the necessary features to implement this. We just need to install a listener and activate a BPF based filter since we don’t need to capture everything (it’s the 2000s, CPUs are slow, we don’t want to raise alarms). When the magic packet or sequence is triggered a callback is executed and we can do whatever we want.

The following is the observed backdoor process diagram:

process diagram

The initial process just forks and exit as a debugging measure (gdb used to have problems following child processes - software breakpoints crash because no debugger installed in the child). The first fork is the daemon and watchdog process responsible for managing the rest of the code. It then forks a child worker where the port knocking sniffer is installed.

The watchdog process is also very simple. It redirects all output to /dev/null, removes all signal handlers, forks a worker child and waits for it. Core files are disabled, file limits are increased, and a signal handler to kill the child is set, to avoid zombie processes if the watchdog is killed. Too much gore in Unix, uh?

Strings are XOR obfuscated. You can find a Unicorn Engine based deobfuscation utility here. At the time I was playing a lot with Unicorn so it was just faster to copy the shellcode into a quick Unicorn emulator. Recently I used a better approach with Lamberts obfuscation with the delambert IDA plugin.

The same obfuscation algorithm is reused in other tools. Ooops, code reusage sin!

After we understand the process diagram it is clear that we want to debug the worker child. My trick at the time was to patch a sleep (or just use Trammell’s simple and powerful infinite loop trick, which takes less bytes and doesn’t need offset computations) in the code that the child is going to run, attach the debugger, restore (or emulate) the original bytes, rewind EIP and have fun. Or, you can just skip the fork, and patch code or set EIP to the code that would be executed by the child.

Not sure why I originally used a sleep patch instead of the infinite loop. Maybe because it would look nicer on slides or to show how to call imported symbols. Go figure!

I can’t remember how but I understood that libpcap was used. All the library original strings are removed to avoid (easy) library identification. I think at the time I just compiled a libpcap version and manually identified the most interesting functions. Bindiff or Diaphora are the tools of the trade for this kind of task.

Usually the flow to install a libpcap based sniffer is:

Locate the network interfaces to sniff at.
Compile a filter.
Install the filter.
Start sniffing.
Get matching packets into a callback.
Process the matched packets.
Do something (evil).

Just read cd00r.c code and you will understand what’s going in the backdoor code.

In this sample the clear text BPF filter is missing and a call to pcap_compile to compile that filter. Instead of compiled on the fly, the (compiled) bytecode is embedded in the binary.

We can find the code that retrieves and installs the bytecode here:

IDA - dewdrop__v__3_3_2_2_x86_64-darwin

__text:0x100001882 E8 59 23 00 00  call    sub_100003BE0   ; retrieve the pre compiled bpf_program
__text:0x100001887 4C 89 E7        mov     rdi, r12
__text:0x10000188A 48 89 C6        mov     rsi, rax        ; rax = 0x000000010000A640
__text:0x10000188A                                         ;
__text:0x10000188A                                         ; struct bpf_program {
__text:0x10000188A                                         ;         u_int bf_len;
__text:0x10000188A                                         ;         struct bpf_insn *bf_insns;
__text:0x10000188A                                         ; };
__text:0x10000188A                                         ;
__text:0x10000188A                                         ; struct bpf_insn {
__text:0x10000188A                                         ;         u_short code;
__text:0x10000188A                                         ;         u_char  jt;
__text:0x10000188A                                         ;         u_char  jf;
__text:0x10000188A                                         ;         bpf_u_int32 k;
__text:0x10000188A                                         ; };
__text:0x10000188D E8 9E 51 00 00  call    pcap_setfilter  ; int
__text:0x10000188D                                         ; pcap_setfilter(pcap_t *p, struct bpf_program *fp)

The input to pcap_setfilter is a bpf program:

pcap_setfilter() is used to specify a filter program. fp is a pointer to a bpf_program struct, usually the result of a call to pcap_compile(3PCAP).

If we take a look at the referenced data it clearly looks like a potential bpf_program:

IDA __data:0x10000A644 00 __data:0x10000A645 00 __data:0x10000A646 00 __data:0x10000A647 00 __data:0x10000A648 __data:0x10000A650 __data:0x10000A660 30 __data:0x10000A661 00 __data:0x10000A662 00 __data:0x10000A663 00 __data:0x10000A664 0E __data:0x10000A665 00 __data:0x10000A666 00 __data:0x10000A667 00 __data:0x10000A668 15 __data:0x10000A669 00 __data:0x10000A66A 01 __data:0x10000A66B 00 __data:0x10000A66C 45 

We can define class="circle red"> - dewdrop__v__3_3_2_2_x86_64-darwin

__data:0x10000A640 39 00 00 00   dword_10000A640 dd 39h            ; DATA XREF: sub_100003BE0+62↑o db    0 db    0 db    0 db    0 60 A6 00 00 01 00 00 00 off_10000A648   dq offset port_knocking_bpf_program 00 00 00 00 00 00 00 00+                align 20h port_knocking_bpf_program db  30h ; DATA XREF: __data:off_10000A648↑o db    0 db    0 db    0 db  0Eh db    0 db    0 db    0 db  15h db    0 db    1 db    0 db  45h ; E the following types in IDA to transform the data into an array:
struct bpf_insn {
    u_short code;
    u_char  jt;
    u_char  jf;
    int32   k;
};

struct bpf_program {
    u_int bf_len;
    struct bpf_insn *bf_insns;
};
Applying the new types to the data we can see the expected bpf_program structure:


IDA - dewdrop__v__3_3_2_2_x86_64-darwin
__data:0x10000A640 stru_10000A640  bpf_program <39h, 0, offset port_knocking_bpf_program>
__data:0x10000A640                                   ; DATA XREF: sub_100003BE0+62↑o
__data:0x10000A640                                   ; sub_100003BE0+4↑r ...
__data:0x10000A650                 align 20h
__data:0x10000A660 port_knocking_bpf_program bpf_insn <     30h,        0,        0,      0Eh>
__data:0x10000A660                                   ; DATA XREF: __data:stru_10000A640↑o
__data:0x10000A660                 bpf_insn <     15h,        1,        0,      45h>
__data:0x10000A660                 bpf_insn <       6,        0,        0,        0>
__data:0x10000A660                 bpf_insn <     28h,        0,        0,      10h>
__data:0x10000A660                 bpf_insn <       4,        0,        0,      0Eh>
__data:0x10000A660                 bpf_insn <     35h,        1,        0,     0AAh>
__data:0x10000A660                 bpf_insn <       6,        0,        0,        0>
__data:0x10000A660                 bpf_insn <       2,        0,        0,        1>
__data:0x10000A660                 bpf_insn <     14h,        0,        0,        6>
__data:0x10000A660                 bpf_insn <       7,        0,        0,        0>
__data:0x10000A660                 bpf_insn <     48h,        0,        0,        0>
__data:0x10000A660                 bpf_insn <     44h,        0,        0,   0E6CFh>
 (...)

The bpf_program header tells us the program has 57 instructions. We definitely want to disassemble and reverse it.
Before the BPF program is retrieved, the network frame header size is computed:


IDA - dewdrop__v__3_3_2_2_x86_64-darwin
__text:0x100001877            mov     rdi, r12        ; pcap_t *
__text:0x10000187A            call    find_frame_header_size

Quite a few cases are supported but the virtual machine executing the sample uses Ethernet so we will go through the DLT_EN10MB case.


IDA - dewdrop__v__3_3_2_2_x86_64-darwin
__text:0x100001550 find_frame_header_size proc near   ; CODE XREF: sub_1000015F0+A8↓p
__text:0x100001550                                    ; sub_100001720+15A↓p
__text:0x100001550 ; __unwind {
__text:0x100001550            push    rbp
__text:0x100001551            mov     rbp, rsp
__text:0x100001554            call    pcap_datalink   ; int
__text:0x100001554                                    ; pcap_datalink(pcap_t *p)
__text:0x100001554                                    ; {
__text:0x100001554                                    ;         return (p->linktype);
__text:0x100001554                                    ; }
__text:0x100001559            cmp     eax, 6Bh ; 'k'  ; DLT_FRELAY
__text:0x10000155C            jg      short loc_100001574
__text:0x10000155E            cmp     eax, 1          ; DLT_EN10MB
__text:0x100001561            jz      short loc_100001585
__text:0x100001563            cmp     eax, 8          ; DLT_SLIP
__text:0x100001566            jz      short loc_10000158C
__text:0x100001568            cmp     eax, 0Ch        ; DLT_RAW
__text:0x10000156B            jz      short loc_100001593
__text:0x10000156D
__text:0x10000156D loc_10000156D:               ; CODE XREF: find_frame_header_size+2C↓j
__text:0x10000156D            mov     eax, 0FFFFh
__text:0x100001572            pop     rbp
__text:0x100001573            retn
__text:0x100001574 ; --------------------------------------------------------------------
__text:0x100001574
__text:0x100001574 loc_100001574:               ; CODE XREF: find_frame_header_size+C↑j
__text:0x100001574            cmp     eax, 71h ; 'q'  ; DLT_LINUX_SLL
__text:0x100001577            jz      short loc_10000158C
__text:0x100001579            cmp     eax, 6Ch ; 'l'  ; DLT_LOOP
__text:0x10000157C            jnz     short loc_10000156D
__text:0x10000157E            mov     eax, 4
__text:0x100001583            pop     rbp
__text:0x100001584            retn
__text:0x100001585 ; --------------------------------------------------------------------
__text:0x100001585
__text:0x100001585 loc_100001585:                ; CODE XREF: find_frame_header_size+11↑j
__text:0x100001585            mov     eax, 0Eh   ; /* Ethernet header length */
__text:0x10000158A            pop     rbp
__text:0x10000158B            retn
__text:0x10000158C ; --------------------------------------------------------------------
__text:0x10000158C
__text:0x10000158C loc_10000158C:                ; CODE XREF: find_frame_header_size+16↑j
__text:0x10000158C                               ; find_frame_header_size+27↑j
__text:0x10000158C            mov     eax, 10h
__text:0x100001591            pop     rbp
__text:0x100001592            retn
__text:0x100001593 ; --------------------------------------------------------------------
__text:0x100001593
__text:0x100001593 loc_100001593:                ; CODE XREF: find_frame_header_size+1B↑j
__text:0x100001593            xor     eax, eax
__text:0x100001595            pop     rbp
__text:0x100001596            retn
__text:0x100001596 ; } // starts at 100001550
__text:0x100001596 find_frame_header_size endp

The frame header size is necessary because the bpf program is modified to support those different link types (old stuff, uh?). This is what the function that returns the bpf program does:


IDA - dewdrop__v__3_3_2_2_x86_64-darwin
__text:0x100003BE0 sub_100003BE0   proc near     ; CODE XREF: sub_100001720+162↑p
__text:0x100003BE0 ; __unwind {
__text:0x100003BE0            push    rbp
__text:0x100003BE1            mov     rbp, rsp
__text:0x100003BE4            mov     rax, cs:stru_10000A640.bf_insns
__text:0x100003BEB            mov     [rax+4], edi
__text:0x100003BEE            lea     eax, [rdi+2]
__text:0x100003BF1            mov     rcx, cs:stru_10000A640.bf_insns
__text:0x100003BF8            mov     [rcx+1Ch], eax
__text:0x100003BFB            mov     rax, cs:stru_10000A640.bf_insns
__text:0x100003C02            mov     [rax+24h], edi
__text:0x100003C05            lea     eax, [rdi+9Ch]
__text:0x100003C0B            mov     rcx, cs:stru_10000A640.bf_insns
__text:0x100003C12            mov     [rcx+2Ch], eax
__text:0x100003C15            lea     eax, [rdi+9]
__text:0x100003C18            mov     rcx, cs:stru_10000A640.bf_insns
__text:0x100003C1F            mov     [rcx+0E4h], eax
__text:0x100003C25            lea     eax, [rdi+20h]
__text:0x100003C28            mov     rcx, cs:stru_10000A640.bf_insns
__text:0x100003C2F            mov     [rcx+0F4h], eax
__text:0x100003C35            mov     rax, cs:stru_10000A640.bf_insns
__text:0x100003C3C            mov     [rax+11Ch], edi
__text:0x100003C42            lea     rax, stru_10000A640 ; bpf_program size
__text:0x100003C49            pop     rbp
__text:0x100003C4A            retn

Or with decompiler assistance (IDA subscriptions model, lol…):
bpf_program *__fastcall sub_100003BE0(u_int32 frame_size)
{
  stru_10000A640.bf_insns->k = frame_size;
  stru_10000A640.bf_insns[3].k = frame_size + 2;
  stru_10000A640.bf_insns[4].k = frame_size;
  stru_10000A640.bf_insns[5].k = frame_size + 156;
  stru_10000A640.bf_insns[28].k = frame_size + 9;
  stru_10000A640.bf_insns[30].k = frame_size + 32;
  stru_10000A640.bf_insns[35].k = frame_size;
  return &stru_10000A640;
}
Recalling the struct bpf_insn definition:
struct bpf_insn {
    u_short code; // Instruction type and addressing mode
    u_char  jt;   // Jump if false
    u_char  jf;   // Jump if true
    int32   k;    // Generid field used for various purposes
};
That function is just adapting the bpf program to the different types of data links and fixing offsets.
The default program is defined for Ethernet, so the values will be the same as the original, meaning that we can dump directly the program from the binary. The following table comparing the original and computed values to modify proves this:


IDA - dewdrop__v__3_3_2_2_x86_64-darwin
__data:0x10000A660  bpf_insn < 30h, 0, 0, 0Eh> <- 0:  0xE = 0xE
__data:0x10000A660  bpf_insn < 28h, 0, 0, 10h> <- 3:  0xE + 2 = 0x10
__data:0x10000A660  bpf_insn <   4, 0, 0, 0Eh> <- 4:  0xE = 0xE
__data:0x10000A660  bpf_insn < 35h, 1, 0, AAh> <- 5:  0xE + 156 = 0xAA
__data:0x10000A660  bpf_insn < 30h, 0, 0, 17h> <- 28: 0xE + 9 = 0x17
__data:0x10000A660  bpf_insn < 30h, 0, 0, 2Eh> <- 30: 0xE + 32 = 0x2E
__data:0x10000A660  bpf_insn < 48h, 0, 0, 0Eh> <- 35: 0xE = 0xE

To disassemble the bytecode we can use bpftools published by Cloudflare. This is based out of the available Linux kernel tools in kernel source code. This repo compiles easily while I think I had some issues compiling the kernel version. Just install the dependencies and compile (Linux only AFAIK). The bpf debugger and disassembler can be found at linux_tools/bpf_dbg. If you feel brave (and lucky) you can always try radare2 (eh eh eh eh).
To load the bytecode we need to convert it to this tool format. Example:
load bpf 12,40 0 0 12,21 0 5 34525,48 0 0 20,21 6 0 17,21 0 6 44,48 0 0 54,21 3 4 17,21 0 3 2048,48 0 0 23,21 0 1 17,6 0 0 262144,6 0 0 0
The first field is the number of instructions, followed by each instruction in base 10 and space separated fields.
I created bpf_dbg_output to convert the instructions array to bpf_dbg input format.
We can finally disassemble the bpf payload:
> load bpf 57,48 0 0 14,21 1 0 69,6 0 0 0,40 0 0 16,4 0 0 14,53 1 0 170,6 0 0 0,2 0 0 1,20 0 0 6,7 0 0 0,72 0 0 0,68 0 0 59087,2 0 0 15,72 0 0 0,84 0 0 59087,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,2 0 0 2,96 0 0 1,28 0 0 0,7 0 0 0,72 0 0 0,2 0 0 3,97 0 0 2,48 0 0 23,21 0 5 6,48 0 0 46,116 0 0 2,20 0 0 20,12 0 0 0,7 0 0 0,72 0 0 14,2 0 0 4,96 0 0 1,20 0 0 2,7 0 0 0,72 0 0 0,68 0 0 40298,2 0 0 15,72 0 0 0,84 0 0 40298,132 0 0 0,20 0 0 1,7 0 0 0,96 0 0 15,92 0 0 0,7 0 0 0,96 0 0 4,29 2 0 0,96 0 0 3,29 0 1 0,6 0 0 65535,6 0 0 0
> disassemble
l0:     ldb [14]
l1:     jeq #0x45, l3, l2
l2:     ret #0
l3:     ldh [16]
l4:     add #14
l5:     jge #0xaa, l7, l6
l6:     ret #0
l7:     st M[1]
l8:     sub #6
l9:     tax 
l10:    ldh [x+0]
l11:    or #0xe6cf
l12:    st M[15]
l13:    ldh [x+0]
l14:    and #0xe6cf
l15:    neg 
l16:    sub #1
l17:    tax 
l18:    ld M[15]
l19:    and x
l20:    tax 
l21:    st M[2]
l22:    ld M[1]
l23:    sub x
l24:    tax 
l25:    ldh [x+0]
l26:    st M[3]
l27:    ldx M[2]
l28:    ldb [23]
l29:    jeq #0x6, l30, l35
l30:    ldb [46]
l31:    rsh #2
l32:    sub #20
l33:    add x
l34:    tax 
l35:    ldh [x+14]
l36:    st M[4]
l37:    ld M[1]
l38:    sub #2
l39:    tax 
l40:    ldh [x+0]
l41:    or #0x9d6a
l42:    st M[15]
l43:    ldh [x+0]
l44:    and #0x9d6a
l45:    neg 
l46:    sub #1
l47:    tax 
l48:    ld M[15]
l49:    and x
l50:    tax 
l51:    ld M[4]
l52:    jeq x, l55, l53
l53:    ld M[3]
l54:    jeq x, l55, l56
l55:    ret #0xffff
l56:    ret #0
To understand what is going on here we need the bpf instruction set:
  Instruction      Addressing mode      Description

  ld               1, 2, 3, 4, 10       Load word into A
  ldi              4                    Load word into A
  ldh              1, 2                 Load half-word into A
  ldb              1, 2                 Load byte into A
  ldx              3, 4, 5, 10          Load word into X
  ldxi             4                    Load word into X
  ldxb             5                    Load byte into X

  st               3                    Store A into M[]
  stx              3                    Store X into M[]

  jmp              6                    Jump to label
  ja               6                    Jump to label
  jeq              7, 8                 Jump on A == k
  jneq             8                    Jump on A != k
  jne              8                    Jump on A != k
  jlt              8                    Jump on A <  k
  jle              8                    Jump on A <= k
  jgt              7, 8                 Jump on A >  k
  jge              7, 8                 Jump on A >= k
  jset             7, 8                 Jump on A &  k

  add              0, 4                 A + <x>
  sub              0, 4                 A - <x>
  mul              0, 4                 A * <x>
  div              0, 4                 A / <x>
  mod              0, 4                 A % <x>
  neg                                   !A
  and              0, 4                 A & <x>
  or               0, 4                 A | <x>
  xor              0, 4                 A ^ <x>
  lsh              0, 4                 A << <x>
  rsh              0, 4                 A >> <x>

  tax                                   Copy A into X
  txa                                   Copy X into A

  ret              4, 9                 Return
The instruction set consists of load, store, branch, alu, miscellaneous and return instructions. Documentation from Linux kernel.
The following registers are available:
A : 32 bit wide accumulator
X : 32 bit wide X register
M[] : 16 x 32 bit wide misc registers aka “scratch memory store”, addressable from 0 to 15
And finally the addressing modes:
  Addressing mode  Syntax               Description

   0               x/%x                 Register X
   1               [k]                  BHW at byte offset k in the packet
   2               [x + k]              BHW at the offset X + k in the packet
   3               M[k]                 Word at offset k in M[]
   4               #k                   Literal value stored in k
   5               4*([k]&0xf)          Lower nibble * 4 at byte offset k in the packet
   6               L                    Jump label L
   7               #k,Lt,Lf             Jump to Lt if true, otherwise jump to Lf
   8               x/%x,Lt,Lf           Jump to Lt if true, otherwise jump to Lf
   9               #k,Lt                Jump to Lt if predicate is true
  10               x/%x,Lt              Jump to Lt if predicate is true
  11               a/%a                 Accumulator A
  12               extension            BPF extension
Note the relevant sizes definitions used here:
Half word: 2 bytes
Word: 4 bytes
The following valid ICMP packet (produced by their port knocking tool) is used to reverse the bpf program:
s10:40:09.498560 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto ICMP (1), length 164)
    192.168.30.14 > 192.168.30.15: ICMP echo request, id 24374, seq 25107, length 144
        0x0000:  000c 29db 3ac4 000c 29b6 b3c6 0800 4500  ..).:...).....E.
        0x0010:  00a4 0001 0000 4001 bcea c0a8 1e0e c0a8  ......@.........
        0x0020:  1e0f 0800 9e57 5f36 6213 7e29 5000 1743  .....W_6b.~)P..C
        0x0030:  5b50 ab0d addf b955 1089 578f 849b ecdf  [P.....U..W.....
        0x0040:  83fd 84c0 a779 7118 43ac ec65 1249 7e5f  .....yq.C..e.I~_
        0x0050:  e7ea 2b2a 6265 a1d3 912f 2dd8 b3ab e30a  ..+*be.../-.....
        0x0060:  0d3b e0f4 e527 3955 9f44 46d7 0608 7703  .;...'9U.DF...w.
        0x0070:  f134 5138 4845 68bd 7382 d1c4 1fcb adf5  .4Q8HEh.s.......
        0x0080:  c2ae 87b4 ac48 b398 5f65 24d3 7090 6c04  .....H.._e$.p.l.
        0x0090:  fc1a cb5b 99b5 ec76 a129 596e edb3 668b  ...[...v.)Yn..f.
        0x00a0:  8848 9bce f31c 458e 07b5 52c7 e647 7e0f  .H....E...R..G~.
        0x00b0:  e343                                     .C
The reversed and commented version of the program (modified instructions are 0, 3, 4, 5, 28, 30, 35):
> disassemble
l0:     ldb [14]          <- load byte from offset 14 (aka skip frame header) into A. A = 0xE
l1:     jeq #0x45, l3, l2 <- check if it's 0x45 - IP packet, header length = 5, version 4
l2:     ret #0            <- exit if not a IPv4 packet
l3:     ldh [16]          <- load IP packet total length into A: 0xA4 = 164 bytes. A = 0xA4
l4:     add #14           <- add link layer header size to A. A = 0xA4 + 0xE = 0xB2 (178)
l5:     jge #0xaa, l7, l6 <- full packet must be at least 170 bytes. A = 178 bytes
l6:     ret #0            <- exit if packet too short
l7:     st M[1]           <- store packet length into misc register 1. M[1] = 178
l8:     sub #6            <- A = packet_length - 6 = offset 172. A = 172
l9:     tax               <- copy A to X register. X = 0xAC
l10:    ldh [x+0]         <- load half word value from packet offset 0xAC. A = 0xE647
l11:    or #0xe6cf        <- 0xE647 | 0xE6CF. A = 0xE6CF
l12:    st M[15]          <- store result at M[15]. M[15] = 0xE6CF
l13:    ldh [x+0]         <- load half word value from packet offset 0xAC. A = 0xE647
l14:    and #0xe6cf       <- 0xE647 & 0xE6CF. A = 0xE647
l15:    neg               <- A = ~0xE647 + 1 = 0x19B9
l16:    sub #1            <- A = 0x19B8
l17:    tax               <- copy A to X register. X = 0x19B8
l18:    ld M[15]          <- load from M[15]. A = 0xE6CF
l19:    and x             <- 0xE6CF & 0x19B8. A = 0x88 (payload size)
                          <- ((0xE647 | 0xE6CF) & (~(0xE647 & 0xE6CF) + 1) - 1) = 0x88
                          <- this is just 0xE647 ^ 0xE6CF
                          <- because there is no XOR operation in this VM
                          <- this just extracts the payload size, which is fixed at 136 bytes
l20:    tax               <- X = 0x88 (payload data size)
l21:    st M[2]           <- M[2] = 0x88 (136)
l22:    ld M[1]           <- A = 178 (0xB2)
l23:    sub x             <- A = 0xB2 - 0x88 = 42 (0x2A)
l24:    tax               <- X = 42 (size of all headers up to the payload). 178 - 136 = 42
l25:    ldh [x+0]         <- 42 is the ICMP data offset in the packet 
                           - so all this just computes where the data starts
                           - they call it the trigger
                           - loads the data at packet offset 0x42. A = 0x7E29
l26:    st M[3]           <- M[3] = 0x7E29
l27:    ldx M[2]          <- X = 0x88
l28:    ldb [23]          <- loads the data at packet offset 23. A = 1 
                          <- IP header protocol field, ICMP in this case
l29:    jeq #0x6, l30, l35 <- check if it's TCP protocol
l30:    ldb [46]          <- it's TCP
l31:    rsh #2            <- 
l32:    sub #20           <-
l33:    add x             <-
l34:    tax               <-
l35:    ldh [x+14]        <- load half word from offset 0x88 + 0xE = 150 (0x96). A = 0xEC76
                          <- not sure what is this
l36:    st M[4]           <- M[4] = 0xEC76
l37:    ld M[1]           <- A = 178 (0xB2) (total packet size)
l38:    sub #2            <- A = 176
l39:    tax               <- X = 176
l40:    ldh [x+0]         <- A = 0xE343 (load half word from packet offset 0xB0).
                          <- It's the last half word from the packet.
l41:    or #0x9d6a        <- 0xE343 | 0x9D6A. A = 0xFF6B
l42:    st M[15]          <- M[15] = 0xFF6B
l43:    ldh [x+0]         <- A = 0xE343
l44:    and #0x9d6a       <- A = 0xE343 & 0x9D6A = 0x8142
l45:    neg               <- A = ~0x8142 + 1 = 0x7EBE
l46:    sub #1            <- A = 0x7EBD
l47:    tax               <- X = 0x7EBD
l48:    ld M[15]          <- A = 0xFF6B
l49:    and x             <- A = 0xFF6B & 0x7EBD = 0x7E29
l50:    tax               <- X = 0x7E29
                          <- ((0xE343 | 0x9D6A) & (~(0xE343 & 0x9D6A) + 1) - 1) = 0x7E29
                          <- this is just 0xE343 ^ 0x9D6A = 0x7E29
                          <- this means that the last half word XOR 0x96DA must be 
                          <- equal to the trigger
l51:    ld M[4]           <- A = 0xEC76 (the value at packet offset 0x96)
l52:    jeq x, l55, l53   <- OK if equal
l53:    ld M[3]           <- A = 0x7E29 (the value at beginning of data payload)
l54:    jeq x, l55, l56   <- OK if equal
l55:    ret #0xffff       <- WE HAVE A WINNER!
l56:    ret #0            <- Bad luck, maybe next packet?
We can describe the format of the data payload:
struct __attribute__((packed)) payload {
    uint16_t trigger;   // = checksum ^ 0x9D6A
    char     data[128];
    uint16_t size;      // = sizeof(struct payload) ^ 0xE6CF
    uint16_t unknown;
    uint16_t checksum;
}; // sizeof() = 136 bytes
Log from the tool that does the port knocking (different packet):
  TRIGGER DATA
  COMMAND                  = 0x01
  DESTINATION ADDRESS      = 192.168.30.15
  TRANSPORT PROTOCOL       = icmp (1)
  TIME STAMP               = Mon Apr 17 10:00:00 2017 (1492437600)
  TIME SKEW                = 43200
  ICMP TYPE, CODE          = 8, 0
  CALLBACK ADDRESS         = 192.168.30.14:55555
  SOURCE PORT              = 55302
  START OF TRIGGER         = 0xdd67
The port knocking tool is extremely flexible and can send all kinds of packets and payloads. It supports TCP, UDP, ICMP, and besides raw packets it can produce DNS, SMTP, SIP application payloads. Can set different flags in TCP packets, for example, send a RST packet with the port knocking payload. Even has a PIX firewall bypass (SYN only packet). Pretty much port knocking on steroids.
Next are some of the different packets it can build.
SIP application packet:


tcpdump
17:36:08.576977 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 342)
    192.168.30.14.64778 > 192.168.30.15.acmsoda: Flags [.], cksum 0x69fe (correct),
    seq 32774:33076, ack 47681, win 32767, length 302
        0x0000:  4500 0156 0002 0000 4006 bc32 c0a8 1e0e  E..V....@..2....
        0x0010:  c0a8 1e0f fd0a 1b39 0000 8006 0000 ba41  .......9.......A
        0x0020:  5010 7fff 69fe 0000 5245 4749 5354 4552  P...i...REGISTER
        0x0030:  2073 6970 3a61 2053 4950 2f32 2e30 0d0a  .sip:a.SIP/2.0..
        0x0040:  546f 3a20 3837 203c 7369 703a 3130 3240  To:.87.<sip:102@
        0x0050:  7878 2e6e 6574 3e0d 0a46 726f 6d3a 2032  xx.net>..From:.2
        0x0060:  3435 203c 7369 703a 3130 3140 7878 2e6e  45.<sip:101@xx.n
        0x0070:  6574 3e0d 0a43 616c 6c2d 4944 3a20 3233  et>..Call-ID:.23
        0x0080:  3334 3340 7071 7879 760d 0a43 5365 713a  343@pqxyv..CSeq:
        0x0090:  2032 3720 5245 4749 5354 4552 0d0a 436f  .27.REGISTER..Co
        0x00a0:  6e74 6163 743a 203c 7369 703a 3130 3140  ntact:.<sip:101@
        0x00b0:  7878 2e6e 6574 3e0d 0a43 6f6e 7465 6e74  xx.net>..Content
        0x00c0:  2d4c 656e 6774 683a 2031 3330 0d0a 1050  -Length:.130...P
        0x00d0:  2545 647d a2e2 f84d 3281 31bf dcd8 3669  %Ed}...M2.1...6i
        0x00e0:  4d73 a214 cb7f ff29 d61a 7e80 f543 8c71  Ms.....)..~..C.q
        0x00f0:  a7c8 b138 4fe8 59a8 02aa cecd c0c3 a94c  ...8O.Y........L
        0x0100:  2102 1938 58e4 32d6 3c4e f05a 9d9d be74  !..8X.2.<N.Z...t
        0x0110:  0dd1 5617 6eb6 7f50 424b bf94 52e5 bc52  ..V.n..PBK..R..R
        0x0120:  a769 4bfa 47c6 31d2 989a 93e5 972b 2ff3  .iK.G.1......+/.
        0x0130:  0987 6eec 5eed 3013 29dd 2e0f 5dcd 1c53  ..n.^.0.)...]..S
        0x0140:  3943 4712 54f7 5688 6791 3313 0c82 b47a  9CG.T.V.g.3....z
        0x0150:  e647 1076 8d3a                           .G.v.:

The trigger is 0x8d3a ^ 0x9d6a = 0x1050 (offset 0xCE).
DNS packet:


tcpdump
17:37:18.675460 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 233)
    192.168.30.14.56331 > 192.168.30.15.acmsoda: Flags [.], cksum 0x40a1 (correct), seq 10342:10535, 
    ack 48998, win 32767, length 193
        0x0000:  4500 00e9 0002 0000 4006 bc9f c0a8 1e0e  E.......@.......
        0x0010:  c0a8 1e0f dc0b 1b39 0000 2866 0000 bf66  .......9..(f...f
        0x0020:  5010 7fff 40a1 0000 af99 0180 0001 0001  P...@...........
        0x0030:  0000 0000 0231 3502 3330 0331 3638 0331  .....15.30.168.1
        0x0040:  3932 0769 6e2d 6164 6472 0461 7270 6100  92.in-addr.arpa.
        0x0050:  0010 0001 c00c 0010 0001 0000 ffff 0088  ................
        0x0060:  8718 4938 38d6 654b 6539 8033 da58 cc73  ..I88.eKe9.3.X.s
        0x0070:  3a57 25a3 7c08 5e9e 7734 a126 f954 3879  :W%.|.^.w4.&.T8y
        0x0080:  64bf 8f7b 741e 2e33 8cbd b07b c750 ae14  d..{t..3...{.P..
        0x0090:  819b d7f3 1742 1c05 2198 570d d509 f1a8  .....B..!.W.....
        0x00a0:  c323 da36 41b2 ca11 b955 dd59 67e2 d495  .#.6A....U.Yg...
        0x00b0:  fff3 e7cf 3ce7 33a2 6bc1 f8a3 5d98 9983  ....<.3.k...]...
        0x00c0:  583f b3b9 289e c3b9 70bf 45e9 69eb db32  X?..(...p.E.i..2
        0x00d0:  1a42 e586 220a fd23 77ad acff 75a2 027d  .B.."..#w...u..}
        0x00e0:  1a6c 83e6 4718 6f85 23                   .l..G.o.#

The trigger is 0x8523 ^ 0x9d6a = 0x1849 (offset 0x61).
SMTP HELO packet:


tcpdump
17:38:18.413178 IP (tos 0x0, ttl 64, id 2, offset 0, flags [none], proto TCP (6), length 181)
    192.168.30.14.19215 > 192.168.30.15.acmsoda: Flags [.], cksum 0x790a (correct), seq 51209:51350, 
    ack 1802, win 32767, length 141
        0x0000:  4500 00b5 0002 0000 4006 bcd3 c0a8 1e0e  E.......@.......
        0x0010:  c0a8 1e0f 4b0f 1b39 0000 c809 0000 070a  ....K..9........
        0x0020:  5010 7fff 790a 0000 4845 4c4f 204f dc75  P...y...HELO.O.u
        0x0030:  eee6 2436 82ce 0567 a237 e408 c0c5 ea55  ..$6...g.7.....U
        0x0040:  4425 df77 fb9f 1498 c2b5 3cd3 84cd 4178  D%.w......<...Ax
        0x0050:  e9f7 23e2 0f02 0ce7 7c36 8aa6 f4ea 43cd  ..#.....|6....C.
        0x0060:  d1fe 2406 77b5 29a7 83c4 f497 5c06 75a5  ..$.w.).....\.u.
        0x0070:  1528 39a0 d80c 412f f35e 067d d857 ee1c  .(9...A/.^.}.W..
        0x0080:  7f3d b09b f209 b8ad fff1 1f1d 52a8 87ca  .=..........R...
        0x0090:  73dc 5c1a 4a69 4205 f5d9 6836 6f09 be5e  s.\.JiB...h6o..^
        0x00a0:  6b87 7ad6 1138 de67 6c3b 33d8 98a4 35e6  k.z..8.gl;3...5.
        0x00b0:  474f fad2 b6                             GO...

The trigger is 0xd2b6 ^ 0x9d6a = 0x4fdc (offset 0x2D).
In the port knocking tool we can find references to CORDIALFLIMSY. This appears to be the codename for the packet format (includes trigger and payload). Because the tool is to be used only by the attacker, it has a lot of debug messages and strings aren’t obfuscated. Guess they weren’t counting on losing all those tools.
The data contents are encrypted with RC5/6, and contain the callback address and port. In theory it would be possible to recover the callback hosts if we had network dumps of all the packets. The leaks contain traffic bouncer tools so these callback addresses should be just bouncer addresses.
Regarding the RC5/6 code, we can find the same constant 0x61C88647 identified by Kaspersky as specific to the Equation group:
__int64 __fastcall RC56_keysetup(int *a1, _DWORD *a2)
{
  __int64 v2; // rax
  int i; // ecx
  int v4; // ecx
  int v5; // edi
  __int64 result; // rax
  int v7; // er10
  unsigned int v8; // er11
  int v9; // er14
  int v10[7]; // [rsp+0h] [rbp-1Ch]

  v10[0] = *a1;
  v10[1] = a1[1];
  v10[2] = a1[2];
  *a2 = 0xB7E15163;
  v2 = 1LL;
  for ( i = 0x5618CB1C; ; i -= 0x61C88647 )
  {
    a2[v2] = i;
    if ( v2 == 0x31 )
      break;
    ++v2;
  }
  v4 = 0;
  v5 = 0x96;
  LODWORD(result) = 0;
  v7 = 0;
  v8 = 0;
  do
  {
    v7 = __ROL4__(a2[v8] + v4 + v7, 3);
    a2[v8] = v7;
    v9 = __ROL4__(v7 + v4 + v10[(unsigned int)result], (v7 + v4) & 0x1F);
    v10[(unsigned int)result] = v9;
    v8 = v8 - 50 * ((v8 + 1) / 0x32) + 1;
    result = (unsigned int)result - 3 * (((int)result + 1) / 3u) + 1;
    --v5;
    v4 = v9;
  }
  while ( v5 );
  return result;
}
Stephen Checkoway wrote that Kaspersky analysis is wrong (the original URL is gone since 2019 or something).
The funny thing is that the macOS binary analysed here was most probably compiled with Clang 4.x (and Xcode 4.x) so his GCC theory might be wrong. Who knows? :-)


bash
$ otool -l dewdrop__v__3_3_2_2_x86_64-darwin
Load command 10
          cmd LC_LOAD_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 159.1.0
compatibility version 1.0.0

The 159.1.0 version of /usr/lib/libSystem.B.dylib can be found in 10.7 SDK and at least Xcode 4.6.3. It’s a Lion 10.7.2 to 10.7.5 system library, and so it should be available in older Xcode versions. Xcode 4 was released between 2011 (4.0) and 2013 (4.6).


bash
$ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk/usr/lib/libSystem.B.dylib
Load command 3
          cmd LC_ID_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 1 Thu Jan  1 01:00:01 1970
      current version 159.1.0
compatibility version 1.0.0

$ otool -l /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.8.sdk/usr/lib/libSystem.B.dylib
Load command 3
          cmd LC_ID_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 1 Thu Jan  1 01:00:01 1970
      current version 169.3.0
compatibility version 1.0.0

$ uname -an
Darwin lion.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

$ otool -l /usr/lib/libSystem.B.dylib
Load command 3
          cmd LC_ID_DYLIB
      cmdsize 56
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 1 Thu Jan  1 01:00:01 1970
      current version 159.1.0
compatibility version 1.0.0

There are some other hints that this codebase is quite old. For example, libpcap version is definitely between 0.9.6 and 0.9.8, all released in 2007. You can determine this via the struct pcap definition. Version 0.9.5 was released in 2006, and version 1.0.0 in 2008. Most probably someone around 2007 or 2008 forked one of 0.9.6-0.9.8 versions and modified it to remove all strings and error messages build up. Need to verify the other operating systems versions to see if they match this information and use the same shared libpcap codebase.
Conclusions#
The NSA toolset is pretty cool! I’m a big fan.
You can definitely see that architecture and code are carefully thought and engineered. They have been doing this for a long time and definitely have more resources than most (nation state) attackers. This isn’t some random proof of concept code. Almost every operating system is a target so their catalog is impressive. The problems of SIGINT and wanting to collect all the things.
There is no heavily obfuscated code, there are no hardcore anti-debugging measures, and no packers and/or cryptors used. They try to blend in and not be too unique. The leaked Lamberts (CIA) sample has similar ideas behind it. A common cyber philosophy? :-)
But they aren’t perfect (nobody is)! They suffer from the code reusage sin. For example, the same string obfuscation algorithm can be seen in other tools. The RC5/6 constant might be another example, if Kaspersky theory is right. Maybe this was true in the past, and the post ShadowBrokers future is better segmented.
Hindsight is always 20/20. It is very easy to detect the hacked machines after we have access to these tools and spend some time reverse engineering them. Four years ago I asked my friends at BinaryEdge to scan the Internet since this was their thing. Why reinvent the wheel if you know people who do great wheels?
Asked them to send this type of packet and check for the answer. The result is that we managed to find a bunch of hacked machines all over the Internet. That’s the main reason why I never published this before and this post is just focused on the BPF program. I am just a curious reverse engineer ;-).
I believe that the tools from the ShadowBrokers leaks are way more interesting than the exploits (altough the leaked exploits sure created a ton more damage worldwide). Reversing the tools allows you to understand a bit of their mindset, their engineering, and operations. And more important, appreciate their work. Here we care about bits and bytes.
Have fun,
fG!