These days the de facto debugger on macOS is LLDB. Apple’s old gdb fork no longer works, and GNU gdb is better these days but still quite meh (in the past it couldn’t deal with fat binary targets, and I suspect that still holds true). So we are all essentially stuck with LLDB, warts and all. I also hate the lack of gdbinit-style output, but Deroko started a project to fix that and I improved it as lldbinit.

Besides its horribly long command-line syntax, which is so unpopular that gdb-compatible commands were introduced, my biggest problem with it has been the lack of x86 hardware breakpoint support. While hardware breakpoints might not be needed to debug applications within Xcode, they are essential to any serious reverse engineer dealing with arbitrary untrusted targets such as malware, packers, obfuscators, and DRM. It has been a serious blocker for me against some targets and a source of immense frustration, because it should be a basic debugger feature.

Last week I finally got fed up enough to dive into the LLDB C++ codebase and try to implement this feature. Instead of just posting a patch, this post is a journey into LLDB internals and how I implemented it. Hopefully it will help others exploring the LLDB codebase, which feels unfriendly because there is no really good documentation of its architecture. Maybe this could lead to further improvements and make LLDB more reverse engineer friendly.

Step number 1: Building LLDB

The official build instructions seem slightly outdated, so I will start with how I build my LLDB. If we can’t build it, we can’t easily modify it.

Dependencies:

The version numbers are the ones I used, but others might also work. The host systems were macOS High Sierra and Mojave. This is untested on Catalina, but I don’t expect issues.

After all the dependencies are installed we can finally download the code. I am using the latest public release, version 9.0.0, but the patches merge successfully into the master branch through at least November 18. Apple uses a different version scheme tied to Xcode versions. There used to be separate source packages, but LLDB now lives in a single repository with LLVM and its other subprojects.

All the code snippets and features will be based on the 9.0.0 release tag. I have different Xcode versions, implying different LLDB versions and different sets of features, so it’s best to reference a single version.

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-9.0.0

Now we can start to build lldb and debugserver. First let’s generate the build files with CMake.

An important issue is that a code signing certificate is required to build debugserver. The build system defaults to a self-signed code signing certificate named lldb_codesign. The process is described here. The script lldb/scripts/macos-setup-codesign.sh should generate a functioning certificate if you don’t already have one on your Mac.

mkdir build
cd build
cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLDB_INCLUDE_TESTS=OFF ../llvm

If you have a valid Apple Developer certificate you can build with it and distribute your lldb to other computers. In this case we need to pass the certificate User ID/OU to CMake using the LLDB_CODESIGN_IDENTITY option. The codesign utility needs to be authorized to use this certificate; the easiest way is to select “Always Allow” at the Keychain authorization prompt, otherwise you will have to authenticate on every build.

mkdir build
cd build
cmake -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLDB_CODESIGN_IDENTITY="XXXXXXXX" -DLLDB_INCLUDE_TESTS=OFF ../llvm

When generation is finished we can finally build lldb using ninja. We need to build at least two targets, lldb and debugserver. More about debugserver next.

ninja lldb
ninja debugserver

On a 6-core Mac Pro, lldb takes around 20 minutes to build (around 6 minutes on a 28-core KVM/QEMU-based VM). debugserver is much faster, taking a couple of seconds.

The problem with this build is that it can’t be moved to other machines because of its liblldb.dylib references. The LLDB build documentation talks about a standalone build, but those instructions appear outdated and don’t really work. I have hacked together something that works, but it’s not perfect: the LLDB architecture doesn’t seem designed for simply copying two binaries (lldb and debugserver) between machines.

I have included a CMake file together with the patch. It is adapted from Apple-lldb-base.cmake, Apple-lldb-macOS.cmake, and Apple-lldb-Xcode.cmake, which are referenced in the LLDB build documentation. The build process is slightly different:

mkdir build
cd build
cmake -G Ninja -C /path/to/standalone_lldb.cmake -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLDB_CODESIGN_IDENTITY="XXXXXXX" ../llvm
ninja lldb
ninja debugserver

After the build is finished we need to modify the RPATH of the lldb binary. It uses an absolute path into the build environment, so if we move the binary to another machine it will not run.

Assuming that we are in the build folder root:

install_name_tool -rpath "$PWD/bin" . bin/lldb

This way we can move the bin folder to another machine and execute lldb from it, and the correct debugserver binary will be used. I haven’t found a better way to make lldb and debugserver portable. The install-distribution target by default generates an Xcode.app install tree, which is something we don’t want.

I would love to know if there is a better way to build a distributable LLDB. The best solution would be a statically linked lldb binary, but I doubt that is possible because of the Python dependencies inside LLDB.framework, among other things. This approach works and the compromise appears acceptable. Copy everything into /usr/local/bin and rename the lldb binary to avoid conflicts with Xcode’s.

Step number 2: About debugserver

I never bothered to understand the role of debugserver in LLDB. I was confused when the breakpoint error message I was looking at turned out to live in source files labeled “remote”, and I needed to understand why. The LLDB remote debugging page clarifies this: LLDB employs a client-server architecture using the gdb-remote protocol even for local debugging. Instead of having different code for local and remote debugging sessions (like gdb and gdbserver), the gdb-remote protocol is used for both. Local sessions communicate over the loopback interface, with debugserver acting as the remote debugging stub but listening locally. This design decision makes sense, and performance-wise it should be fine: if we are willing to accept the latency of remote debugging sessions, the same holds for local sessions. In practice you don’t really notice performance issues when debugging locally.
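The gdb-remote protocol itself uses simple text framing: each packet travels as $payload#checksum, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The following is a minimal sketch of that framing (FramePacket and ChecksumMatches are my own names, not LLDB’s); the checksums it produces match packets that appear in the logs later in this post, such as $OK#9a and $Z0,10000b19d,1#34.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Wrap a payload in gdb-remote framing: "$" + payload + "#" + checksum.
// The checksum is the sum of the payload bytes modulo 256, printed as
// two lowercase hex digits.
std::string FramePacket(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum += c;
  char checksum[3];
  std::snprintf(checksum, sizeof(checksum), "%02x", sum & 0xffu);
  return "$" + payload + "#" + checksum;
}

// Return true if a received frame is well formed and its checksum matches.
bool ChecksumMatches(const std::string &frame) {
  if (frame.size() < 4 || frame.front() != '$' ||
      frame[frame.size() - 3] != '#')
    return false;
  return FramePacket(frame.substr(1, frame.size() - 4)) == frame;
}
```

Once no-ack mode has been negotiated with QStartNoAckMode, the logs suggest debugserver no longer fills in the checksum (its replies show #00), so a check like ChecksumMatches only makes sense for acknowledged traffic.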

This is the main reason debugserver needs to be code signed: it is the real debugger process, responsible for controlling targets and handling their exceptions. On macOS this requires certain entitlements, which only take effect when the binary is code signed.

For example, if debugserver is not code signed, or is only ad-hoc signed, we get an error when we try to launch a target process:

(lldbinit) process launch -s
Process 22920 exited with status = -1 (0xffffffff) Error 1

In this case debugserver is unable to retrieve the task port for the target and so it can’t proceed as a functioning debugger.

Step number 3: LLDB logging

After some initial digging into the hardware breakpoint problem and a bit of frustrated ranting on Twitter, Jason Molenda (one of the developers who ported gdb to macOS) gave me a very useful hint on how to enable LLDB logging. This turned out to be super useful, because the logs show the functions and methods called when you try to set breakpoints. A big thank you to Jason!

To activate logging just use the following command before starting the target process:

settings set target.process.extra-startup-command QSetLogging:bitmask=LOG_ALL;

The bitmask can be a combination of the following:

LOG_VERBOSE
LOG_PROCESS
LOG_THREAD
LOG_EXCEPTIONS
LOG_SHLIB
LOG_MEMORY
LOG_MEMORY_DATA_SHORT
LOG_MEMORY_DATA_LONG
LOG_MEMORY_PROTECTIONS
LOG_BREAKPOINTS
LOG_EVENTS
LOG_WATCHPOINTS
LOG_STEP
LOG_TASK
LOG_ALL
LOG_DEFAULT
LOG_NONE
LOG_RNB_MINIMAL
LOG_RNB_MEDIUM
LOG_RNB_MAX
LOG_RNB_COMM
LOG_RNB_REMOTE
LOG_RNB_EVENTS
LOG_RNB_PROC
LOG_RNB_PACKETS
LOG_RNB_ALL
LOG_RNB_DEFAULT
LOG_DARWIN_LOG
LOG_RNB_NONE
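Several flags can be combined with |, as in the LOG_ALL|LOG_RNB_ALL setting used below. If you toggle logging often, a tiny helper that assembles the command avoids typos. This is purely a convenience sketch (MakeLoggingCommand is a hypothetical name) that only builds the string using the syntax shown above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build the lldb command that asks debugserver to enable logging for the
// given LOG_* flag names. Flags are joined with '|' and the value is
// terminated with ';', following the QSetLogging syntax shown above.
std::string MakeLoggingCommand(const std::vector<std::string> &flags) {
  std::string bitmask;
  for (const std::string &flag : flags) {
    if (!bitmask.empty())
      bitmask += '|';
    bitmask += flag;
  }
  return "settings set target.process.extra-startup-command "
         "QSetLogging:bitmask=" + bitmask + ";";
}
```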

The log messages are sent to the macOS log server so we can follow those messages in real time:

log stream --process debugserver --style compact

For example a setting of LOG_ALL|LOG_RNB_ALL gives us the following output when we set a software breakpoint:

[59b0/0e03]: ::read ( 8, 0x70000a1f98c0, 1024 ) => 18 err = 0x00000000
[59b0/0e03]: read: $Z0,10000b19d,1#34
[59b0/0e03]: getpkt: $Z0,10000b19d,1#34
[59b0/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) => 0x00000020 (read_packet_available )
[59b0/0307]: HandleReceivedPacket ("Z0,10000b19d,1");
[59b0/0307]: MachProcess::CreateBreakpoint ( addr = 0x10000b19d, length = 1, hardware = 0)
[59b0/0307]: MachProcess::EnableBreakpoint ( addr = 0x10000b19d )
[59b0/0307]: ::mach_vm_read ( task = 0x1503, addr = 0x10000b19d, size = 1, data => 0x103064000, dataCnt => 1 ) err = 0x00000000
[59b0/0307]: MachTask::ReadMemory ( addr = 0x10000b19d, size = 1, buf = 0x7fe9cc7012d0) => 1 bytes read 0x10000b19d: 6a
[59b0/0307]: ::mach_vm_region_recurse ( task = 0x1503, address => 0x10000a000, size => 307200, nesting_depth => 0, info => 0x7ffeed232024, infoCnt => 12) addr = 0x10000b19d  err = 0x00000000
[59b0/0307]: info = { prot = 5, max_prot = 7, inheritance = 0x00000001, offset = 0x00000000, user_tag = 0x00000000, ref_count = 829, shadow_depth = 2, ext_pager = 1, share_mode = 1, is_submap = 0, behavior = 0, object_id = 0x39739631, us
[59b0/0307]: ::mach_vm_protect ( task = 0x1503, addr = 0x10000b19d, size = 1, set_max = 0, prot = 3 ) err = 0x00000000
[59b0/0307]: ::mach_vm_write ( task = 0x1503, addr = 0x10000b19d, data = 0x102b1ceb8, dataCnt = 1 ) err = 0x00000000
[59b0/0307]: ::mach_vm_protect ( task = 0x1503, addr = 0x10000b19d, size = 1, set_max = 0, prot = 5 ) err = 0x00000000
[59b0/0307]: MachTask::WriteMemory ( addr = 0x10000b19d, size = 1, buf = 0x102b1ceb8) => 1 bytes written 0x10000b19d: cc
[59b0/0307]: ::mach_vm_read ( task = 0x1503, addr = 0x10000b19d, size = 1, data => 0x103064000, dataCnt => 1 ) err = 0x00000000
[59b0/0307]: MachTask::ReadMemory ( addr = 0x10000b19d, size = 1, buf = 0x7ffeed23217c) => 1 bytes read 0x10000b19d: cc
[59b0/0307]: MachProcess::EnableBreakpoint ( addr = 0x10000b19d ) : SUCCESS.
[59b0/0307]: MachProcess::CreateBreakpoint ( addr = 0x10000b19d, length = 1) => 0x7fe9cc7012c8
[59b0/0307]:  8832296 RNBRemote::SendPacket (OK) called
[59b0/0307]: ::write ( socket = 8, buffer = 0x7ffeed231de9, length = 6) => 6 err = 0x00000000
[59b0/0307]: putpkt: $OK#00
[59b0/0307]: sent: $OK#00
[59b0/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) ...

When we try to set a hardware breakpoint on unmodified LLDB:

[59b6/0f03]: ::read ( 8, 0x700004eb7a50, 1024 ) => 18 err = 0x00000000
[59b6/0f03]: read: $Z1,10000b19d,1#35
[59b6/0f03]: getpkt: $Z1,10000b19d,1#35
[59b6/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) => 0x00000020 (read_packet_available )
[59b6/0307]: unimplemented packet: 'Z1,10000b19d,1'
[59b6/0307]:  8898466 RNBRemote::HandlePacket_UNIMPLEMENTED("Z1,10000b19d,1")
[59b6/0307]:       24 RNBRemote::SendPacket () called
[59b6/0307]: ::write ( socket = 8, buffer = 0x7ffeef7bdb11, length = 4) => 4 err = 0x00000000
[59b6/0307]: putpkt: $#00
[59b6/0307]: sent: $#00
[59b6/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) ...

This makes it clear that software breakpoints are set with a Z0 packet and hardware breakpoints with Z1 (and removed with z0 and z1). The most helpful part of the output is the function and method names, which let us quickly find the relevant source code without wasting time understanding the entire LLDB codebase.
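Because the Z/z packets are plain text, their structure is easy to read: the letter selects insert (Z) or remove (z), the digit selects the stoppoint type (0 software, 1 hardware, 2 to 4 watchpoints), followed by the hex address and a kind/length field. As a hedged sketch, here is how such a payload can be parsed; the types and names are hypothetical, and the optional condition lists the protocol allows after the kind field are not handled:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Stoppoint types selected by the digit after 'Z'/'z'.
enum class StoppointType {
  Software = 0,    // Z0: memory (int3) breakpoint
  Hardware = 1,    // Z1: hardware breakpoint
  WriteWatch = 2,  // Z2: write watchpoint
  ReadWatch = 3,   // Z3: read watchpoint
  AccessWatch = 4  // Z4: read/write watchpoint
};

struct StoppointRequest {
  bool insert;         // true for 'Z', false for 'z'
  StoppointType type;
  uint64_t addr;       // hex address field
  uint64_t kind;       // hex kind/length field (1 in the logs above)
};

// Parse an unframed payload such as "Z1,10000b19d,1".
// Returns false for malformed payloads; `out` is only written on success.
bool ParseStoppoint(const std::string &payload, StoppointRequest &out) {
  if (payload.size() < 6 || (payload[0] != 'Z' && payload[0] != 'z') ||
      payload[1] < '0' || payload[1] > '4' || payload[2] != ',')
    return false;
  const char *p = payload.c_str() + 3;
  char *end = nullptr;
  const uint64_t addr = std::strtoull(p, &end, 16);
  if (end == p || *end != ',')
    return false;
  p = end + 1;
  const uint64_t kind = std::strtoull(p, &end, 16);
  if (end == p || *end != '\0')
    return false;
  out = {payload[0] == 'Z', static_cast<StoppointType>(payload[1] - '0'),
         addr, kind};
  return true;
}
```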

While writing this blog post and browsing the source code, I found an alternative way to enable logging: the log command inside lldb. Newer versions seem to produce better logging output than older ones. At first I thought this was due to differences in Apple’s version, but it’s not. The output also differs somewhat from the first logging method, so combining both might be a good choice.

The command to enable logging inside lldb is:

(lldbinit) help log enable
     Enable logging for a single log channel.

Syntax: log enable <cmd-options> <log-channel> <log-category> [<log-category> [...]]

We can list all the available channels with log list:

  • dwarf
  • gdb-remote
  • kdp-remote
  • lldb

And for each there are different log categories.

(lldbinit) log list
(...)
Logging categories for 'gdb-remote':
  all - all available logging categories
  default - default set of logging categories
  async - log asynchronous activity
  break - log breakpoints
  comm - log communication activity
  packets - log gdb remote packets
  memory - log memory reads and writes
  data-short - log memory bytes for memory reads and writes for short transactions only
  data-long - log memory bytes for memory reads and writes for all transactions
  process - log process events and activities
  step - log step related activities
  thread - log thread events and activities
  watch - log watchpoint related activities
(...)

For example, to enable gdb-remote packet logging:

(lldbinit) log enable gdb-remote packets
(lldbinit) breakpoint set -a 0x1000041a2
 history[1] tid=0x0307 <   1> send packet: +
 history[2] tid=0x0307 <  19> send packet: $QStartNoAckMode#b0
 history[3] tid=0x0307 <   1> read packet: +
 history[4] tid=0x0307 <   6> read packet: $OK#9a
 history[5] tid=0x0307 <   1> send packet: +
 history[6] tid=0x0307 <  41> send packet: $qSupported:xmlRegisters=i386,arm,mips#12
 history[7] tid=0x0307 <  48> read packet: $qXfer:features:read+;PacketSize=20000;qEcho+#00
 history[8] tid=0x0307 <  26> send packet: $QThreadSuffixSupported#e4
 history[9] tid=0x0307 <   6> read packet: $OK#00
 history[10] tid=0x0307 <  27> send packet: $QListThreadsInStopReply#21
 (...)

By default the log is sent to the lldb console, which can get a bit messy mixed with the debugging session. Output can be redirected to a file with the -f filename option of log enable.

Step number 4: Diving into LLDB codebase

The starting point is the hardware breakpoint error message.

(lldbinit) breakpoint set -a 0x10000b19d -H
warning: failed to set breakpoint site at 0x10000b19d for breakpoint 1.1: hardware breakpoints are not supported
Breakpoint 1: where = dyld`_dyld_start + 1, address = 0x000000010000b19d

It can be found at lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp:

Status ProcessGDBRemote::EnableBreakpointSite(BreakpointSite *bp_site) {
(...)
  // The process of setting a hardware breakpoint is much the same
  // as above.
  // We check the supported boolean for this breakpoint type, and if it is
  // thought to be supported then we will try to set this breakpoint with
  // a hardware breakpoint.
  if (m_gdb_comm.SupportsGDBStoppointPacket(eBreakpointHardware)) {
    // Try to send off a hardware breakpoint packet ($Z1)
    uint8_t error_no = m_gdb_comm.SendGDBStoppointTypePacket(
        eBreakpointHardware, true, addr, bp_op_size);
    if (error_no == 0) {
      // The breakpoint was placed successfully
      bp_site->SetEnabled(true);
      bp_site->SetType(BreakpointSite::eHardware);
      return error;
    }

    // Check if the error was something other then an unsupported
    // breakpoint type
    if (m_gdb_comm.SupportsGDBStoppointPacket(eBreakpointHardware)) {
      // Unable to set this hardware breakpoint
      if (error_no != UINT8_MAX)
        error.SetErrorStringWithFormat(
            "error: %d sending the hardware breakpoint request "
            "(hardware breakpoint resources might be exhausted"
            "or unavailable)", error_no);
      else
        error.SetErrorString("error sending the hardware breakpoint "
                             "request (hardware breakpoint resources "
                             "might be exhausted or unavailable)");
      return error;
    }

    // We will reach here when the stub gives an unsupported response to a
    // hardware breakpoint
    LLDB_LOGF(log, "Hardware breakpoints are unsupported");

    // Finally we will falling through to a #trap style breakpoint
  }

  // Don't fall through when hardware breakpoints were specifically
  // requested
  if (bp_site->HardwareRequired()) {
    error.SetErrorString("hardware breakpoints are not supported");
    return error;
  }

  // As a last resort we want to place a manual breakpoint. An instruction
  // is placed into the process memory using memory write packets.
  return EnableSoftwareBreakpoint(bp_site);
}
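To make the control flow easier to follow, here is a simplified model of the hardware branch above. It is a sketch under stated assumptions: the enum names are hypothetical, a failed send is folded into the error case, and the elided (...) part at the top of the function is not modeled.

```cpp
#include <cassert>

// Possible replies from the stub to a Z1 request.
enum class StubReply { Ok, Error, Unsupported };
// Where the breakpoint request ends up.
enum class Outcome { Hardware, HardwareError, NotSupportedError, Software };

// `z1_supported` mirrors SupportsGDBStoppointPacket(eBreakpointHardware)
// before the request; `reply` is only consulted when a packet is sent.
Outcome DecideBreakpoint(bool z1_supported, StubReply reply,
                         bool hardware_required) {
  if (z1_supported) {
    if (reply == StubReply::Ok)
      return Outcome::Hardware;
    if (reply == StubReply::Error)
      return Outcome::HardwareError;  // stub knows Z1 but could not set it
    // Empty reply: the client clears its Z1 flag and falls through.
  }
  if (hardware_required)
    return Outcome::NotSupportedError;  // "hardware breakpoints are not supported"
  return Outcome::Software;             // EnableSoftwareBreakpoint(bp_site)
}
```

With unmodified debugserver the Z1 reply is empty, so a breakpoint set with -H ends in the “hardware breakpoints are not supported” error shown above.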

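The first time SupportsGDBStoppointPacket(eBreakpointHardware) is executed it returns true, because m_supports_z1 starts out as true: the client assumes support until the stub says otherwise. The flag and the accessor are declared in lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h: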

  bool m_supports_qProcessInfoPID : 1, m_supports_qfProcessInfo : 1,
     m_supports_qUserName : 1, m_supports_qGroupName : 1,
     m_supports_qThreadStopInfo : 1, m_supports_z0 : 1, m_supports_z1 : 1,
     m_supports_z2 : 1, m_supports_z3 : 1, m_supports_z4 : 1,
     m_supports_QEnvironment : 1, m_supports_QEnvironmentHexEncoded : 1,
     m_supports_qSymbol : 1, m_qSymbol_requests_done : 1,
     m_supports_qModuleInfo : 1, m_supports_jThreadsInfo : 1,
     m_supports_jModulesInfo : 1;

  bool SupportsGDBStoppointPacket(GDBStoppointType type) {
    switch (type) {
    case eBreakpointSoftware:
      return m_supports_z0;
    case eBreakpointHardware:
      return m_supports_z1;
    case eWatchpointWrite:
      return m_supports_z2;
    case eWatchpointRead:
      return m_supports_z3;
    case eWatchpointReadWrite:
      return m_supports_z4;
    default:
      return false;
    }
  }

This means that SendGDBStoppointTypePacket will be executed and send a request to debugserver to set a hardware breakpoint.

Let’s look at that function:

uint8_t GDBRemoteCommunicationClient::SendGDBStoppointTypePacket(
    GDBStoppointType type, bool insert, addr_t addr, uint32_t length) {
  Log *log(GetLogIfAnyCategoriesSet(LIBLLDB_LOG_BREAKPOINTS));
  LLDB_LOGF(log, "GDBRemoteCommunicationClient::%s() %s at addr = 0x%"
            PRIx64, __FUNCTION__, insert ? "add" : "remove", addr);

  // Check if the stub is known not to support this breakpoint type
  if (!SupportsGDBStoppointPacket(type))
    return UINT8_MAX;
  // Construct the breakpoint packet
  char packet[64];
  const int packet_len =
      ::snprintf(packet, sizeof(packet), "%c%i,%" PRIx64 ",%x",
                 insert ? 'Z' : 'z', type, addr, length);
  // Check we haven't overwritten the end of the packet buffer
  assert(packet_len + 1 < (int)sizeof(packet));
  UNUSED_IF_ASSERT_DISABLED(packet_len);
  StringExtractorGDBRemote response;
  // Make sure the response is either "OK", "EXX" where XX are two hex 
  // digits, or "" (unsupported)
  response.SetResponseValidatorToOKErrorNotSupported();
  // Try to send the breakpoint packet, and check that it was correctly
  // sent
  if (SendPacketAndWaitForResponse(packet, response, true) ==
      PacketResult::Success) {
    // Receive and OK packet when the breakpoint successfully placed
    if (response.IsOKResponse())
      return 0;

    // Status while setting breakpoint, send back specific error
    if (response.IsErrorResponse())
      return response.GetError();

    // Empty packet informs us that breakpoint is not supported
    if (response.IsUnsupportedResponse()) {
      // Disable this breakpoint type since it is unsupported
      switch (type) {
      case eBreakpointSoftware:
        m_supports_z0 = false;
        break;
      case eBreakpointHardware:
        m_supports_z1 = false;
        break;
      case eWatchpointWrite:
        m_supports_z2 = false;
        break;
      case eWatchpointRead:
        m_supports_z3 = false;
        break;
      case eWatchpointReadWrite:
        m_supports_z4 = false;
        break;
      case eStoppointInvalid:
        return UINT8_MAX;
      }
    }
  }
  // Signal generic failure
  return UINT8_MAX;
}

If hardware breakpoints are not supported then m_supports_z1 will be set to false and further attempts to set a hardware breakpoint will not call SendGDBStoppointTypePacket again because SupportsGDBStoppointPacket(eBreakpointHardware) will now always return false.
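The caching behaviour can be sketched in isolation. This hypothetical class (not LLDB code) keeps a single flag that starts out true and is cleared on the first empty reply; it simplifies error handling by returning 0xff for every failure instead of decoding EXX error numbers:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical model of how the client caches Z1 support.
class HardwareBreakpointSupport {
public:
  using SendFn = std::function<std::string(const std::string &)>;

  // `send` transmits a payload and returns the stub's reply payload.
  explicit HardwareBreakpointSupport(SendFn send) : send_(std::move(send)) {}

  bool Supported() const { return supports_z1_; }

  // Returns 0 when the stub replies "OK" and 0xff on any failure.
  unsigned Insert(const std::string &payload) {
    if (!supports_z1_)
      return 0xff;            // known unsupported: nothing is sent
    const std::string reply = send_(payload);
    if (reply == "OK")
      return 0;
    if (reply.empty())
      supports_z1_ = false;   // empty reply means "unsupported"
    return 0xff;
  }

private:
  SendFn send_;
  bool supports_z1_ = true;   // optimistic default, like m_supports_z1
};
```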

We can also see in the code that a Z packet is being built, matching what we previously saw in the logs. So we need to turn our attention to the code that handles these packets in debugserver.
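The packet construction in SendGDBStoppointTypePacket can be lifted out as a standalone function. This sketch reuses the same format string; MakeStoppointPacket is a hypothetical name, and callers are assumed to pass the packet digit directly (0 for software, 1 for hardware, as the logs confirm):

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Build a Z/z payload with the same format string that
// SendGDBStoppointTypePacket uses, e.g. "Z1,10000b19d,1".
// `type` is the packet digit: 0 software, 1 hardware, 2-4 watchpoints.
std::string MakeStoppointPacket(bool insert, int type, uint64_t addr,
                                uint32_t length) {
  char packet[64];
  const int len =
      std::snprintf(packet, sizeof(packet), "%c%i,%" PRIx64 ",%x",
                    insert ? 'Z' : 'z', type, addr,
                    static_cast<unsigned>(length));
  assert(len > 0 && len < static_cast<int>(sizeof(packet)));
  return std::string(packet, static_cast<std::size_t>(len));
}
```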

The log points to HandleReceivedPacket that can be found at lldb/tools/debugserver/source/RNBRemote.cpp:

rnb_err_t RNBRemote::HandleReceivedPacket(PacketEnum *type) {
  static DNBTimer g_packetTimer(true);

  //  DNBLogThreadedIf (LOG_RNB_REMOTE, "%8u RNBRemote::%s",
  //  (uint32_t)m_comm.Timer().ElapsedMicroSeconds(true), __FUNCTION__);
  rnb_err_t err = rnb_err;
  std::string packet_data;
  RNBRemote::Packet packet_info;
  err = GetPacket(packet_data, packet_info, false);

  if (err == rnb_success) {
    DNBLogThreadedIf(LOG_RNB_REMOTE, "HandleReceivedPacket (\"%s\");",
                     packet_data.c_str());
    HandlePacketCallback packet_callback = packet_info.normal;
    if (packet_callback != NULL) {
      if (type != NULL)
        *type = packet_info.type;
      return (this->*packet_callback)(packet_data.c_str());
    } else {
      // Do not fall through to end of this function, if we have valid
      // packet_info and it has a NULL callback, then we need to respect
      // that it may not want any response or anything to be done.
      return err;
    }
  }
  return rnb_err;
}

This function is responsible for parsing the packet and executing the registered callback handler for that packet.

The “unimplemented packet” log message comes from RNBRemote::GetPacket. There we find the loop over the m_packets vector that matches the payload against each entry’s abbreviation; if the packet is known, the matching entry is returned in packet_info, from which the callback is then extracted.

rnb_err_t RNBRemote::GetPacket(std::string &packet_payload,
                               RNBRemote::Packet &packet_info, bool wait) {
(...)
  if (err == rnb_success) {
    Packet::iterator it;
    for (it = m_packets.begin(); it != m_packets.end(); ++it) {
      if (payload.compare(0, it->abbrev.size(), it->abbrev) == 0)
        break;
    }

    // A packet we don't have an entry for. This can happen when we
    // get a packet that we don't know about or support. We just reply
    // accordingly and go on.
    if (it == m_packets.end()) {
      DNBLogThreadedIf(LOG_RNB_PACKETS, "unimplemented packet: '%s'",
                       payload.c_str());
      HandlePacket_UNIMPLEMENTED(payload.c_str());
      return rnb_err;
    } else {
      packet_info = *it;
      packet_payload = payload;
    }
  }
  return err;
}
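The lookup above is a linear prefix match: the first table entry whose abbreviation is a prefix of the payload wins, and a payload with no match gets an empty reply, which the client interprets as “unsupported”. A minimal model of that, with hypothetical names, shows why adding table entries for Z1 and z1 is enough to get the packet routed:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Minimal model of debugserver's packet table: an abbreviation plus a
// handler that returns the reply payload.
struct PacketEntry {
  std::string abbrev;
  std::function<std::string(const std::string &)> handler;
};

// First entry whose abbreviation is a prefix of the payload handles it.
std::string Dispatch(const std::vector<PacketEntry> &table,
                     const std::string &payload) {
  for (const PacketEntry &entry : table)
    if (payload.compare(0, entry.abbrev.size(), entry.abbrev) == 0)
      return entry.handler(payload);
  return "";  // unimplemented: empty reply tells the client "unsupported"
}
```

Because matching is by prefix, table order matters: an entry with a shorter abbreviation placed earlier would shadow longer ones that share its prefix.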

If we track m_packets we can find the first place where patching is needed. The m_packets vector is initialized in RNBRemote::CreatePacketTable and we can see that there is no callback registered for hardware breakpoint packets.

void RNBRemote::CreatePacketTable() {
(...)
  std::vector<Packet> &t = m_packets;
(...)
t.push_back(Packet(insert_mem_bp, &RNBRemote::HandlePacket_z, NULL, "Z0",
                   "Insert memory breakpoint"));
t.push_back(Packet(remove_mem_bp, &RNBRemote::HandlePacket_z, NULL, "z0",
                   "Remove memory breakpoint"));
(...)
  //  t.push_back (Packet (insert_hardware_bp,
  //  &RNBRemote::HandlePacket_UNIMPLEMENTED, NULL, "Z1", "Insert hardware
  //  breakpoint"));
  //  t.push_back (Packet (remove_hardware_bp,
  //  &RNBRemote::HandlePacket_UNIMPLEMENTED, NULL, "z1", "Remove hardware
  //  breakpoint"));
t.push_back(Packet(insert_write_watch_bp, &RNBRemote::HandlePacket_z, 
                   NULL, "Z2", "Insert write watchpoint"));
t.push_back(Packet(remove_write_watch_bp, &RNBRemote::HandlePacket_z, 
                   NULL, "z2", "Remove write watchpoint"));
(...)
}

We need to uncomment the Z1 and z1 entries and change their packet handler callback from HandlePacket_UNIMPLEMENTED to HandlePacket_z. RNBRemote::HandlePacket_z can already handle hardware breakpoints, so no other modifications or new code are needed there (the ARM version of debugserver already supports hardware breakpoints):

  if (packet_cmd == 'Z') {
    // set
    switch (break_type) {
    case '0': // set software breakpoint
    case '1': // set hardware breakpoint
    {
      // gdb can send multiple Z packets for the same address and
      // these calls must be ref counted.
      bool hardware = (break_type == '1');

      if (DNBBreakpointSet(pid, addr, byte_size, hardware)) {
        // We successfully created a breakpoint, now lets full out
        // a ref count structure with the breakID and add it to our
        // map.
        return SendPacket("OK");
      } else {
        // We failed to set the software breakpoint
        return SendPacket("E09");
      }
    } break;

If we enable the handlers for Z1 and z1 packets and recompile we are now able to set a hardware breakpoint without errors.

(lldbinit) breakpoint set -a 0x10000419d -H
Breakpoint 2: where = dyld`_dyld_start + 1, address = 0x000000010000419d

And we get the following log messages:

[6582/1003]: ::read ( 8, 0x700007a908c0, 1024 ) => 18 err = 0x00000000
[6582/1003]: read: $Z1,10000419d,1#07
[6582/1003]: getpkt: $Z1,10000419d,1#07
[6582/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) => 0x00000020 (read_packet_available )
[6582/0307]: HandleReceivedPacket ("Z1,10000419d,1");
[6582/0307]: MachProcess::CreateBreakpoint ( addr = 0x10000419d, length = 1, hardware = 1)
[6582/0307]: MachProcess::EnableBreakpoint ( addr = 0x10000419d )
[6582/0307]: ::mach_vm_read ( task = 0x1503, addr = 0x10000419d, size = 1, data => 0x1071ac000, dataCnt => 1 ) err = 0x00000000
[6582/0307]: MachTask::ReadMemory ( addr = 0x10000419d, size = 1, buf = 0x7ff002e00080) => 1 bytes read 0x10000419d: 6a
[6582/0307]: ::mach_vm_region_recurse ( task = 0x1503, address => 0x100003000, size => 307200, nesting_depth => 0, info => 0x7ffee90e7fe4, infoCnt => 12) addr = 0x10000419d  err = 0x00000000
[6582/0307]: info = { prot = 5, max_prot = 7, inheritance = 0x00000001, offset = 0x00000000, user_tag = 0x00000000, ref_count = 804, shadow_depth = 2, ext_pager = 1, share_mode = 1, is_submap = 0, behavior = 0, object_id = 0x39b39031, us
[6582/0307]: ::mach_vm_protect ( task = 0x1503, addr = 0x10000419d, size = 1, set_max = 0, prot = 3 ) err = 0x00000000
[6582/0307]: ::mach_vm_write ( task = 0x1503, addr = 0x10000419d, data = 0x106c64f18, dataCnt = 1 ) err = 0x00000000
[6582/0307]: ::mach_vm_protect ( task = 0x1503, addr = 0x10000419d, size = 1, set_max = 0, prot = 5 ) err = 0x00000000
[6582/0307]: MachTask::WriteMemory ( addr = 0x10000419d, size = 1, buf = 0x106c64f18) => 1 bytes written 0x10000419d: cc
[6582/0307]: ::mach_vm_read ( task = 0x1503, addr = 0x10000419d, size = 1, data => 0x1071ac000, dataCnt => 1 ) err = 0x00000000
[6582/0307]: MachTask::ReadMemory ( addr = 0x10000419d, size = 1, buf = 0x7ffee90e813c) => 1 bytes read 0x10000419d: cc
[6582/0307]: MachProcess::EnableBreakpoint ( addr = 0x10000419d ) : SUCCESS.
[6582/0307]: MachProcess::CreateBreakpoint ( addr = 0x10000419d, length = 1) => 0x7ff002e00078
[6582/0307]: 18455015 RNBRemote::SendPacket (OK) called
[6582/0307]: ::write ( socket = 8, buffer = 0x7ffee90e7da9, length = 6) => 6 err = 0x00000000
[6582/0307]: putpkt: $OK#00
[6582/0307]: sent: $OK#00
[6582/0307]: RNBRunLoopInferiorExecuting ctx.Events().WaitForSetEvents(0x000000a5) ...

This time a breakpoint is set, but it is still a software breakpoint: a 0xCC byte (an int3 instruction) is written to the target address, so a software breakpoint exception is raised when it is hit.

Process 25985 stopped
* thread #1, stop reason = breakpoint 2.1
    frame #0: 0x000000010000419d dyld`_dyld_start + 1

[6582/2803]: ::catch_mach_exception_raise ( exc_port = 0x2903, thd_port = 0x2703, tsk_port = 0x1503, exc_type = 6 ( EXC_BREAKPOINT ), exc_data[2] = { 0x2, 0x0 })
(...)
[6582/2803]:     state { task_port = 0x1503, thread_port =  0x2703, exc_type = 6 (EXC_BREAKPOINT) ...
[6582/2803]:             exc_data[0]: 0x2
[6582/2803]:             exc_data[1]: 0x0
[6582/2803]: [  0] #  1 tid: 0x00098dfb, pc: 0x000000010000419d, sp: 0x00007ffeefbff9c0, user: 0.000001, system: 0.000038, cpu:  0, policy:  1, run_state:  3 (waiting), flags:  0, suspend_count:  0 (current  0), sleep_time: 0

The exception address 0x10000419d in the pc register is the address where we set the breakpoint. This time there are no error messages, but we still don’t have real hardware breakpoints.
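One way to tell the two kinds of trap apart in the log is the exception code. On x86, <mach/i386/exception.h> defines EXC_I386_SGL (1) for debug exceptions, which is what a debug-register hit or a single step produces, and EXC_I386_BPT (2) for int3. The exc_data[0] value of 0x2 above therefore points to an int3 trap. A small classification sketch (the names are mine, and the constants are copied rather than included so the example builds anywhere):

```cpp
#include <cassert>
#include <cstdint>

// x86 EXC_BREAKPOINT codes, mirroring <mach/i386/exception.h>.
constexpr uint64_t kExcI386Sgl = 1;  // EXC_I386_SGL: debug exception (#DB)
constexpr uint64_t kExcI386Bpt = 2;  // EXC_I386_BPT: int3 breakpoint (#BP)

enum class TrapKind { Int3, DebugException, Other };

// Classify an EXC_BREAKPOINT by its first exception code (exc_data[0]).
TrapKind ClassifyBreakpointException(uint64_t exc_data0) {
  switch (exc_data0) {
  case kExcI386Bpt:
    return TrapKind::Int3;
  case kExcI386Sgl:
    return TrapKind::DebugException;  // single step or debug-register hit
  default:
    return TrapKind::Other;
  }
}
```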

Step number 5: Understanding how breakpoints are set

We saw that support for hardware breakpoints already exists in the code and that it shares functions and methods with software breakpoints. Since hardware breakpoint requests end up as software breakpoints, we need to trace the implementation starting from the packet handler.

  if (packet_cmd == 'Z') {
    // set
    switch (break_type) {
    case '0': // set software breakpoint
    case '1': // set hardware breakpoint
    {
      // gdb can send multiple Z packets for the same address and
      // these calls must be ref counted.
      bool hardware = (break_type == '1');

      if (DNBBreakpointSet(pid, addr, byte_size, hardware)) {
        // We successfully created a breakpoint, now lets full out
        // a ref count structure with the breakID and add it to our
        // map.
        return SendPacket("OK");
      } else {
        // We failed to set the software breakpoint
        return SendPacket("E09");
      }
    } break;

DNBBreakpointSet can be found at lldb/tools/debugserver/source/DNB.cpp.

(...)
typedef std::shared_ptr<MachProcess> MachProcessSP;
(...)
// Breakpoints
nub_bool_t DNBBreakpointSet(nub_process_t pid, nub_addr_t addr, 
                            nub_size_t size, nub_bool_t hardware) {
  MachProcessSP procSP;
  if (GetProcessSP(pid, procSP))
    return procSP->CreateBreakpoint(addr, size, hardware) != NULL;
  return false;
}

The MachProcess class definition can be found at lldb/tools/debugserver/source/MacOSX/MachProcess.h and implementation at lldb/tools/debugserver/source/MacOSX/MachProcess.mm.

DNBBreakpoint *MachProcess::CreateBreakpoint(nub_addr_t addr, 
                                             nub_size_t length,
                                             bool hardware) {
  DNBLogThreadedIf(LOG_BREAKPOINTS, "MachProcess::CreateBreakpoint "
                                    "( addr = 0x%8.8llx, length = %llu,"
                                    " hardware = %i)",
                   (uint64_t)addr, (uint64_t)length, hardware);

  DNBBreakpoint *bp = m_breakpoints.FindByAddress(addr);
  if (bp)
    bp->Retain();
  else
    bp = m_breakpoints.Add(addr, length, hardware);

  if (EnableBreakpoint(addr)) {
    DNBLogThreadedIf(LOG_BREAKPOINTS, "MachProcess::CreateBreakpoint "
                                      "( addr = 0x%8.8llx, length = %llu)"
                                      " => %p",
                     (uint64_t)addr, (uint64_t)length,
                     reinterpret_cast<void *>(bp));
    return bp;
  } else if (bp->Release() == 0) {
    m_breakpoints.Remove(addr);
  }
  // We failed to enable the breakpoint
  return NULL;
}
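The retain/release handling in CreateBreakpoint is easy to miss: setting a breakpoint twice at the same address reuses the existing entry, and if enabling fails the reference is dropped again. Below is a hypothetical model of just that bookkeeping, where a boolean stands in for the result of EnableBreakpoint:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of the bookkeeping in MachProcess::CreateBreakpoint:
// one entry per address, with a retain count instead of duplicates.
class BreakpointTable {
public:
  // Retain (or add) the entry, then undo that if enabling fails.
  // `enable_succeeds` stands in for the result of EnableBreakpoint.
  bool Create(uint64_t addr, bool enable_succeeds) {
    ++retain_counts_[addr];
    if (enable_succeeds)
      return true;
    Release(addr);
    return false;
  }

  // Drop one reference; the entry is removed when the count reaches zero.
  uint32_t Release(uint64_t addr) {
    auto it = retain_counts_.find(addr);
    if (it == retain_counts_.end())
      return 0;
    if (--it->second == 0) {
      retain_counts_.erase(it);
      return 0;
    }
    return it->second;
  }

  uint32_t RetainCount(uint64_t addr) const {
    auto it = retain_counts_.find(addr);
    return it == retain_counts_.end() ? 0 : it->second;
  }

private:
  std::map<uint64_t, uint32_t> retain_counts_;
};
```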

The logging message at the top is the same one we saw in the earlier logs when setting a hardware breakpoint. Looking at this method’s code, it is clear that the next stop is EnableBreakpoint (which is also the next method in the log).

bool MachProcess::EnableBreakpoint(nub_addr_t addr) {
  DNBLogThreadedIf(LOG_BREAKPOINTS,
                   "MachProcess::EnableBreakpoint ( addr = 0x%8.8llx )",
                   (uint64_t)addr);
  DNBBreakpoint *bp = m_breakpoints.FindByAddress(addr);
  if (bp) {
    if (bp->IsEnabled()) {
      DNBLogWarning("MachProcess::EnableBreakpoint ( addr = 0x%8.8llx ): "
                    "breakpoint already enabled.",
                    (uint64_t)addr);
      return true;
    } else {
      if (bp->HardwarePreferred()) {
        bp->SetHardwareIndex(m_thread_list.EnableHardwareBreakpoint(bp));
        if (bp->IsHardware()) {
          bp->SetEnabled(true);
          return true;
        }
      }
(...)
}

Once again it appears that the code to deal with hardware breakpoints is already implemented. The first condition depends on bp->HardwarePreferred() found at lldb/tools/debugserver/source/DNBBreakpoint.h.

(...)
  bool HardwarePreferred() const { return m_hw_preferred; }
  bool IsHardware() const { return m_hw_index != INVALID_NUB_HW_INDEX; }
  uint32_t GetHardwareIndex() const { return m_hw_index; }
  void SetHardwareIndex(uint32_t hw_index) { m_hw_index = hw_index; }
(...)
private:
  uint32_t m_retain_count; // Each breakpoint is maintained by address and
                           // is ref counted in case multiple people set a
                           // breakpoint at the same address
  uint32_t m_byte_size;    // Length in bytes of the breakpoint if set in 
                           // memory
  uint8_t m_opcode[8];     // Saved opcode bytes
  nub_addr_t m_addr;       // Address of this breakpoint
  uint32_t m_enabled : 1,  // Flags for this breakpoint
      m_hw_preferred : 1,  // 1 if this point has been requested to be set
                           // using hardware
                           // (which may fail due to lack of resources)
      m_is_watchpoint : 1, // 1 if this is a watchpoint
      m_watch_read : 1,    // 1 if we stop when the watched data is read 
                           // from
      m_watch_write : 1;   // 1 if we stop when the watched data is 
                           // written to
  uint32_t m_hw_index;     // The hardware resource index for this 
                           // breakpoint/watchpoint

The condition will be true when we try to set a hardware breakpoint, so it’s not a problem. What we need to care about is the result of m_thread_list.EnableHardwareBreakpoint(bp), which seems to return the number of the debug register where the hardware breakpoint was set (remember that x86 hardware breakpoints use the debug registers DR0 to DR3). The next step, then, is to see what m_thread_list is about. We can find the instance variable in lldb/tools/debugserver/source/MacOSX/MachProcess.h.

  MachThreadList m_thread_list; // A list of threads that is 
                                // maintained/updated after each stop

The MachThreadList class is defined at lldb/tools/debugserver/source/MacOSX/MachThreadList.h.

class MachThreadList {
public:
(...)
  uint32_t EnableHardwareBreakpoint(const DNBBreakpoint *bp) const;
  bool DisableHardwareBreakpoint(const DNBBreakpoint *bp) const;
  uint32_t EnableHardwareWatchpoint(const DNBBreakpoint *wp) const;
  bool DisableHardwareWatchpoint(const DNBBreakpoint *wp) const;
  uint32_t NumSupportedHardwareWatchpoints() const;
(...)
};

This is where our search for the bug ends, because the MachThreadList::EnableHardwareBreakpoint implementation shows the problem:

uint32_t
MachThreadList::EnableHardwareBreakpoint(const DNBBreakpoint *bp) const {
  if (bp != NULL) {
    const size_t num_threads = m_threads.size();
    for (uint32_t idx = 0; idx < num_threads; ++idx)
      m_threads[idx]->EnableHardwareBreakpoint(bp);
  }
  return INVALID_NUB_HW_INDEX;
}

The return value is always the error value INVALID_NUB_HW_INDEX, no matter what the individual threads did (their return values are discarded). As a result bp->IsHardware() is false, and MachProcess::EnableBreakpoint falls through to the software breakpoint code because, as far as it knows, the hardware breakpoint wasn’t successfully set. Compare MachThreadList::EnableHardwareBreakpoint with the hardware watchpoint code; hardware watchpoints are set using the same DR0-DR3 registers.
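
The effect of that stub can be reproduced with a small model. EnableBreakpoint only keeps a breakpoint as hardware when the returned index differs from the INVALID_NUB_HW_INDEX sentinel, so a function that always returns the sentinel silently turns every hardware request into a software one. The names below (kInvalidHwIndex, ChooseBreakpointKind and the two index providers) are hypothetical, the sentinel value UINT32_MAX is an assumption made for the model, and only the control flow mirrors the code above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical sentinel meaning "no debug register was allocated"
// (assumed to be UINT32_MAX for this model).
constexpr uint32_t kInvalidHwIndex = UINT32_MAX;

// Mirrors the shape of MachProcess::EnableBreakpoint: try hardware first if
// it was requested, and fall back to a software (int3) breakpoint otherwise.
std::string ChooseBreakpointKind(bool hardware_preferred,
                                 const std::function<uint32_t()> &enable_hw) {
  assert(enable_hw); // the index provider must be callable
  if (hardware_preferred) {
    uint32_t hw_index = enable_hw();  // DR0-DR3 index, or the sentinel
    if (hw_index != kInvalidHwIndex)  // the IsHardware() check
      return "hardware DR" + std::to_string(hw_index);
  }
  return "software int3";
}

// The unpatched MachThreadList::EnableHardwareBreakpoint, reduced to its
// return value: whatever the threads did, the caller sees the sentinel.
uint32_t StubEnableHw() { return kInvalidHwIndex; }

// A working implementation would return the slot it programmed, e.g. DR0.
uint32_t WorkingEnableHw() { return 0; }
```

Nothing in this path reports an error to the user, which is why the failure shows up only as an unexpected int3 being written into the target.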

// DNBWatchpointSet() -> MachProcess::CreateWatchpoint() ->
// MachProcess::EnableWatchpoint()
// -> MachThreadList::EnableHardwareWatchpoint().
uint32_t
MachThreadList::EnableHardwareWatchpoint(const DNBBreakpoint *wp) const {
  uint32_t hw_index = INVALID_NUB_HW_INDEX;
  if (wp != NULL) {
    PTHREAD_MUTEX_LOCKER(locker, m_threads_mutex);
    const size_t num_threads = m_threads.size();
    // On Mac OS X we have to prime the control registers for new threads.
    // We do this using the control register data for the first thread,
    // for lack of a better way of choosing.
    bool also_set_on_task = true;
    for (uint32_t idx = 0; idx < num_threads; ++idx) {
      if ((hw_index = m_threads[idx]->EnableHardwareWatchpoint(
               wp, also_set_on_task)) == INVALID_NUB_HW_INDEX) {
        // We know that idx failed for some reason.  Let's rollback the
        // transaction for [0, idx).
        for (uint32_t i = 0; i < idx; ++i)
          m_threads[i]->RollbackTransForHWP();
        return INVALID_NUB_HW_INDEX;
      }
      also_set_on_task = false;
    }
    // Notify each thread to commit the pending transaction.
    for (uint32_t idx = 0; idx < num_threads; ++idx)
      m_threads[idx]->FinishTransForHWP();
  }
  return hw_index;
}

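It is more complete: it takes the threads mutex, rolls back the change on the threads already updated if any thread fails, and returns a valid index if everything goes well. We will use a slightly modified version of this watchpoint code for hardware breakpoints. The same applies to the code that disables hardware breakpoints (MachThreadList::DisableHardwareBreakpoint currently always returns false).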
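
The watchpoint code treats the update of all threads as one transaction. Each thread stages the new debug register state. If any thread fails, the threads that already succeeded are rolled back, and the change is committed only when every thread succeeds. The sketch below models that pattern with a hypothetical ModelThread that keeps a committed and a pending copy of DR0-DR3. It stands in for MachThread’s RollbackTransForHWP/FinishTransForHWP and is not their actual implementation; using 0 to mean "free slot" is also a simplification of the model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kInvalidHwIndex = UINT32_MAX; // hypothetical sentinel

// Hypothetical per-thread debug state: DR0-DR3 plus a staged copy.
struct ModelThread {
  std::array<uint64_t, 4> committed{}; // 0 means "slot free" in this model
  std::array<uint64_t, 4> pending{};
  bool fail = false; // simulates a failure to write this thread's state

  uint32_t Stage(uint64_t addr) {
    if (fail)
      return kInvalidHwIndex;
    pending = committed;
    for (uint32_t i = 0; i < pending.size(); ++i)
      if (pending[i] == 0) {
        pending[i] = addr;
        return i;
      }
    return kInvalidHwIndex; // all four debug registers in use
  }
  void Rollback() { pending = committed; }
  void Commit() { committed = pending; }
};

// Same shape as MachThreadList::EnableHardwareWatchpoint: stage on every
// thread, roll back [0, idx) on the first failure, commit only on success.
uint32_t EnableOnAllThreads(std::vector<ModelThread> &threads, uint64_t addr) {
  uint32_t hw_index = kInvalidHwIndex;
  for (std::size_t idx = 0; idx < threads.size(); ++idx) {
    hw_index = threads[idx].Stage(addr);
    if (hw_index == kInvalidHwIndex) {
      for (std::size_t i = 0; i < idx; ++i)
        threads[i].Rollback();
      return kInvalidHwIndex;
    }
  }
  for (auto &t : threads)
    t.Commit();
  return hw_index;
}
```

The all-or-nothing design matters for breakpoints too: a breakpoint that is armed on only some threads would trigger or not depending on which thread happens to execute the code.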

m_threads is a vector of MachThread objects, a class defined at lldb/tools/debugserver/source/MacOSX/MachThread.h. Let’s look at the EnableHardwareBreakpoint implementation there.

uint32_t MachThread::EnableHardwareBreakpoint(const DNBBreakpoint *bp) {
  if (bp != NULL && bp->IsBreakpoint())
    return m_arch_up->EnableHardwareBreakpoint(bp->Address(), bp->ByteSize());
  return INVALID_NUB_HW_INDEX;
}

uint32_t MachThread::EnableHardwareWatchpoint(const DNBBreakpoint *wp,
                                              bool also_set_on_task) {
  if (wp != NULL && wp->IsWatchpoint())
    return m_arch_up->EnableHardwareWatchpoint(
        wp->Address(), wp->ByteSize(), wp->WatchpointRead(),
        wp->WatchpointWrite(), also_set_on_task);
  return INVALID_NUB_HW_INDEX;
}

Both methods simply forward the request to whatever object m_arch_up points to:

  std::unique_ptr<DNBArchProtocol>
      m_arch_up; // Arch specific information for register state and more

We need to understand what DNBArchProtocol is all about.

Step 6: CPU architecture implementation

DNBArchProtocol is the base class from which each CPU-specific implementation derives. This is where LLDB’s object-oriented architecture makes sense: to extend LLDB to a new CPU and/or platform, we just need to implement the methods defined in the DNBArchProtocol class for that specific target.
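
This is plain C++ virtual dispatch. A base class can declare the hardware breakpoint hooks with conservative defaults that report "not supported", and each architecture overrides what it can. With that design a missing override does not cause a compile error; it silently degrades at runtime. The classes below are a hypothetical, heavily reduced model, and the assumption that DNBArchProtocol’s own defaults behave this way is mine, since the base class is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

constexpr uint32_t kInvalidHwIndex = UINT32_MAX; // hypothetical sentinel

// Reduced, hypothetical model of an architecture protocol base class.
class ArchProtocolModel {
public:
  virtual ~ArchProtocolModel() = default;
  // Defaults: "this CPU has no hardware breakpoint support".
  virtual uint32_t NumSupportedHardwareBreakpoints() { return 0; }
  virtual uint32_t EnableHardwareBreakpoint(uint64_t /*addr*/,
                                            uint64_t /*size*/) {
    return kInvalidHwIndex;
  }
};

// An architecture that does not override the hooks inherits the defaults,
// like the x86_64 implementation before the patch.
class X86_64Unpatched : public ArchProtocolModel {};

// An architecture that overrides them, as the ARM implementation does.
class ArmLike : public ArchProtocolModel {
public:
  uint32_t NumSupportedHardwareBreakpoints() override { return 4; } // illustrative
  uint32_t EnableHardwareBreakpoint(uint64_t, uint64_t) override { return 0; }
};

// Callers such as MachThread only ever see the base-class pointer.
uint32_t EnableVia(const std::unique_ptr<ArchProtocolModel> &arch,
                   uint64_t addr) {
  assert(arch);
  return arch->EnableHardwareBreakpoint(addr, 1);
}
```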

Each CPU supported for macOS targets has its own directory:

lldb/tools/debugserver/source/MacOSX/arm
lldb/tools/debugserver/source/MacOSX/arm64
lldb/tools/debugserver/source/MacOSX/i386
lldb/tools/debugserver/source/MacOSX/ppc
lldb/tools/debugserver/source/MacOSX/x86_64

The x86_64-specific implementation:

class DNBArchImplX86_64 : public DNBArchProtocol {
public:
(...)
  virtual uint32_t NumSupportedHardwareWatchpoints();
  virtual uint32_t EnableHardwareWatchpoint(nub_addr_t addr, 
                                            nub_size_t size,
                                            bool read, bool write,
                                            bool also_set_on_task);
  virtual bool DisableHardwareWatchpoint(uint32_t hw_break_index,
                                         bool also_set_on_task);
  virtual uint32_t GetHardwareWatchpointHit(nub_addr_t &addr);
(...)
};

And the ARM implementation:

class DNBArchMachARM : public DNBArchProtocol {
public:
(...)
  virtual uint32_t NumSupportedHardwareBreakpoints();
  virtual uint32_t NumSupportedHardwareWatchpoints();
  virtual uint32_t EnableHardwareBreakpoint(nub_addr_t addr, 
                                            nub_size_t size);
  virtual bool DisableHardwareBreakpoint(uint32_t hw_break_index);

  virtual uint32_t EnableHardwareWatchpoint(nub_addr_t addr, 
                                            nub_size_t size,
                                            bool read, bool write,
                                            bool also_set_on_task);
  virtual bool DisableHardwareWatchpoint(uint32_t hw_break_index,
                                         bool also_set_on_task);
  virtual bool DisableHardwareWatchpoint_helper(uint32_t hw_break_index,
                                                bool also_set_on_task);
  virtual bool ReenableHardwareWatchpoint(uint32_t hw_break_index);
  virtual bool ReenableHardwareWatchpoint_helper(uint32_t hw_break_index);
(...)
};

Now it is clear that the x86_64 implementation lacks hardware breakpoint support, while the ARM one has it (ARM64 does not).

What we need to do is to implement NumSupportedHardwareBreakpoints, EnableHardwareBreakpoint and DisableHardwareBreakpoint in lldb/tools/debugserver/source/MacOSX/x86_64/DNBArchImplX86_64.cpp.
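
On x86 the core of that work is architectural and independent of LLDB. The implementation puts the address into a free register among DR0-DR3 and sets that slot’s local enable bit in DR7. The slot’s R/W field is set to 00 (break on instruction execution) and its LEN field to 00 (one byte, which the Intel SDM requires for execution breakpoints). Since watchpoints use the same four registers, the free-slot search has to consider both kinds. The sketch below shows only the DR7 bit manipulation on a plain integer. The helper names are hypothetical, and reading and writing the real registers through the thread state is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating DR7 encoding for execution breakpoints.
constexpr uint32_t kNumDebugSlots = 4; // DR0-DR3

// Local enable bit Ln is bit 2*n; global enable Gn is bit 2*n+1.
inline uint64_t LocalEnableBit(uint32_t slot) { return 1ULL << (2 * slot); }

// R/Wn (2 bits) followed by LENn (2 bits) start at bit 16 + 4*n.
inline uint64_t ControlFieldMask(uint32_t slot) {
  return uint64_t{0xF} << (16 + 4 * slot);
}

// Returns the first slot whose local and global enable bits are both clear,
// or kNumDebugSlots if all four are in use (by breakpoints or watchpoints).
uint32_t FindFreeSlot(uint64_t dr7) {
  for (uint32_t n = 0; n < kNumDebugSlots; ++n)
    if ((dr7 & (3ULL << (2 * n))) == 0)
      return n;
  return kNumDebugSlots;
}

// Programs slot n as an execution breakpoint: R/W = 00, LEN = 00, Ln = 1.
// The breakpoint address itself goes into DRn, which is not modelled here.
uint64_t EnableExecBreakpoint(uint64_t dr7, uint32_t slot) {
  assert(slot < kNumDebugSlots);
  dr7 &= ~ControlFieldMask(slot); // R/W = 00 (execute), LEN = 00 (1 byte)
  dr7 |= LocalEnableBit(slot);
  return dr7;
}

// Clears the slot's local/global enable bits and its control field.
uint64_t DisableBreakpoint(uint64_t dr7, uint32_t slot) {
  assert(slot < kNumDebugSlots);
  return dr7 & ~(LocalEnableBit(slot) | (LocalEnableBit(slot) << 1) |
                 ControlFieldMask(slot));
}
```

Under this encoding NumSupportedHardwareBreakpoints on x86 would naturally report four slots, shared with hardware watchpoints.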

My patch does exactly this. I copied the watchpoint methods and modified them where necessary to account for the differences between enabling/disabling hardware breakpoints and watchpoints. I also added a bool also_set_on_task argument to the prototype. If the hardware breakpoint is also set on the task port, newly created threads inherit it; otherwise the breakpoint is only set on the threads that already exist. That would be a problem whenever the code we want to break on runs in a thread created after the breakpoint was set.
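
The difference between setting the state only on the existing threads and also on the task can be modelled in a few lines. In the model below, a new thread starts from the task’s default debug state, so only a breakpoint that was also written to the task survives into threads created later. ModelTask, SetBreakpoint and SpawnThread are hypothetical names for this illustration; on macOS the corresponding Mach calls are thread_set_state and task_set_state with a debug state flavor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: DR0-DR3 contents only, 0 meaning "unused".
using DebugRegs = std::array<uint64_t, 4>;

struct ModelTask {
  DebugRegs task_default{};       // state inherited by new threads
  std::vector<DebugRegs> threads; // per-thread debug registers

  // Writes `slot` on every existing thread, and optionally on the task.
  void SetBreakpoint(uint32_t slot, uint64_t addr, bool also_set_on_task) {
    assert(slot < 4);
    for (auto &regs : threads)
      regs[slot] = addr;
    if (also_set_on_task)
      task_default[slot] = addr;
  }

  // A thread created later starts from the task-wide default.
  std::size_t SpawnThread() {
    threads.push_back(task_default);
    return threads.size() - 1;
  }
};
```

In the MachThreadList loop the flag is only true on the first iteration, since writing the task default once is enough.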

The other major modification that we need to perform is at MachThreadList::EnableHardwareBreakpoint and MachThreadList::DisableHardwareBreakpoint because the current implementation will always return errors and false values as we have seen before.

The updated version is essentially a copy of the watchpoint implementation with the necessary modifications (calling EnableHardwareBreakpoint instead of EnableHardwareWatchpoint):

uint32_t
MachThreadList::EnableHardwareBreakpoint(const DNBBreakpoint *bp) const {
  uint32_t hw_index = INVALID_NUB_HW_INDEX;
  if (bp != NULL) {
    PTHREAD_MUTEX_LOCKER(locker, m_threads_mutex);
    const size_t num_threads = m_threads.size();
    // On Mac OS X we have to prime the control registers for new threads.
    // We do this using the control register data for the first thread,
    // for lack of a better way of choosing.
    bool also_set_on_task = true;
    for (uint32_t idx = 0; idx < num_threads; ++idx) {
      if ((hw_index = m_threads[idx]->EnableHardwareBreakpoint(
               bp, also_set_on_task)) == INVALID_NUB_HW_INDEX) {
        // We know that idx failed for some reason.  Let's rollback the
        // transaction for [0, idx).
        for (uint32_t i = 0; i < idx; ++i) {
          m_threads[i]->RollbackTransForHWP();
        }
        return INVALID_NUB_HW_INDEX;
      }
      also_set_on_task = false;
    }
    // Notify each thread to commit the pending transaction.
    for (uint32_t idx = 0; idx < num_threads; ++idx) {
      m_threads[idx]->FinishTransForHWP();
    }
  }
  return hw_index;
}

After patching and recompiling we finally have Intel 64-bit hardware breakpoint support in LLDB. The exception code for hardware breakpoints is EXC_I386_SGL, and the modified lldb and debugserver show it when the breakpoint is hit.

Process 25992 stopped
* thread #1, stop reason = EXC_BREAKPOINT (code=EXC_I386_SGL, subcode=0x10000b1a6)
    frame #0: 0x000000010000b1a6 dyld`_dyld_start + 10
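
EXC_I386_SGL is also what a single-step trap produces, because on x86 both arrive as the same debug exception (#DB). To tell them apart, a debugger typically looks at the DR6 status register: bits B0-B3 say which of DR0-DR3 matched, and bit 14 (BS) flags a single step. The decoding sketch below uses hypothetical function and type names; the DR6 value would come from the stopped thread’s debug state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a #DB trap from the DR6 status register.
enum class DebugTrap { HardwareBreakpoint, SingleStep, Unknown };

struct DecodedTrap {
  DebugTrap kind;
  int slot; // 0-3 for DR0-DR3 hits, -1 otherwise
};

DecodedTrap DecodeDR6(uint64_t dr6) {
  // B0-B3 (bits 0-3): breakpoint condition n was met. The SDM allows these
  // bits to be set for slots that are not enabled in DR7, so a careful
  // implementation would cross-check DR7 as well.
  for (int n = 0; n < 4; ++n)
    if (dr6 & (1ULL << n))
      return {DebugTrap::HardwareBreakpoint, n};
  // BS (bit 14): single-step trap caused by the trap flag.
  if (dr6 & (1ULL << 14))
    return {DebugTrap::SingleStep, -1};
  return {DebugTrap::Unknown, -1};
}
```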

Setting a hardware breakpoint inside a loop confirms that hardware breakpoints aren’t one-shot and work as expected.

Step 7: Caveats

This implementation has a small problem (or maybe not). When the target is restarted, the existing hardware breakpoints are not re-enabled, which means that we need to disable and enable them again. I am not sure I am happy with this behavior. It would be better to re-enable all hardware breakpoints that were enabled before the restart, just as happens with software breakpoints. This is something I need to explore further, and the fix may be very simple.

Temporary hardware breakpoints also don’t work (they aren’t disabled when hit). I still need to investigate why.

Step 8: Conclusion

I am not a fan of C++ or of the object paradigm in general. My silly brain doesn’t like to think in that paradigm, although I understand it is a good design choice for applications like (U)EFI parsing (my first attempt to parse EFI capsules was in C and it was a pure nightmare) and LLDB. Once we understand its architecture and where to look, everything is quite easy to implement. I was genuinely surprised at how little effort it took to get this feature done. I always overestimated the amount of work required and so never made the effort to get it done; clearly my bias against C++ played a role. The logging features saved extra effort and allowed me to understand the code and find the correct spots much faster.

You can find the patch on GitHub. I am not submitting the patch to LLVM, since I’m not in the mood to deal with license agreements and bureaucracy. I hereby place this code in the public domain. It is essentially the original code slightly tweaked, so I guess it inherits the LLVM license? The lldbinit script has also been updated to support hardware breakpoints, along with some fixes related to Python 3.

Now LLDB is finally a real debugger!
Maybe this can be the start of making LLDB a better debugger for reverse engineering.

As usual, a big thanks to Jeffrey Czerniak (@geekable) for pre-publication editing.

Have fun,
fG!

Update: the LLDB project has finally integrated this feature, and now it’s a real debugger :-)