Gatekeerper – A kernel extension to mitigate Gatekeeper bypasses

Last month Patrick Wardle presented “Exposing Gatekeeper” at VB2015 Prague.
The core of the presentation deals with Gatekeeper bypasses originating in the fact that Gatekeeper only verifies the code signatures of the main binary and not of any linked libraries/frameworks/bundles.
This means it is possible to run unsigned code in applications that should be protected by Gatekeeper, using the dynamic library hijacking techniques Patrick has also presented. His exploit takes an Apple code signed application that is vulnerable to dylib hijacking and modifies it to run unsigned code when downloaded from the Internet. In this scenario Gatekeeper enters into action and should verify if the download is code signed (assuming the default OS X configuration, where it is enabled). But Gatekeeper never verifies the linked code, and so it is effectively bypassed.

The core of the problem is that Gatekeeper only deals with the main binary code, and never verifies any linked code. This is obviously a flaw, and hopefully Apple will ship a fix sooner rather than later. Meanwhile we can try to build a fix ourselves using the TrustedBSD framework. For this I created Gatekeerper, a proof of concept kernel extension for Yosemite 10.10.5 (it can be easily adapted to work with El Capitan, but I don’t want to release that code).

What it does is verify all executable code being mapped into the main process; if any of it is not code signed, the application will not run. This only happens if the main binary is signed, since if it is unsigned there is really no point in verifying the linked code – the main binary can already be modified at will, and/or it wouldn’t bypass Gatekeeper (assuming no other Gatekeeper bypass exploit).
It can also be easily modified to not allow any kind of unsigned code to run.

From my perspective this is what Apple’s fix should do: verify the code signature of all executable code being loaded into a binary protected by Gatekeeper, and kill the process if unsigned code is loaded. This is not exactly the same as refusing to run any unsigned code as in iOS. We are just talking about the Gatekeeper feature – the protection of applications downloaded from the Internet.
Apple could go the extra mile and implement a configuration option to refuse to run any unsigned code, as happens in iOS. Such an implementation would be less secure than its iOS counterpart, mainly because there’s no trusted boot chain, kernel extensions can be loaded, and so on. Like rootless, such a feature would be one kernel exploit away from being disabled. By default this feature could be off and advanced users could enable it if they wish to do so. Rootless introduced new paradigms for Apple and I guess this type of option is acceptable in that “vision”.

How is Gatekeerper implemented?

My PoC is based on the fact that dyld (the dynamic linker) is responsible for mmap’ing the linked code – libraries, frameworks, bundles. If we browse the dyld source code we can see the following code responsible for mapping executable segments (there are different code paths due to cached libraries and so on):

void ImageLoaderMachO::mapSegments(int fd, uint64_t offsetInFat, uint64_t lenInFat, uint64_t fileLen, const LinkContext& context)
{
	// find address range for image
	intptr_t slide = this->assignSegmentAddresses(context);
	if ( context.verboseMapping )
		dyld::log("dyld: Mapping %s\n", this->getPath());
	// map in all segments
	for(unsigned int i=0, e=segmentCount(); i < e; ++i) {
		(...)
		// wholly zero-fill segments have nothing to mmap() in
		if ( size > 0 ) {
			if ( (fileOffset+size) > fileLen ) {
				dyld::throwf("truncated mach-o error: segment %s extends to %llu which is past end of file %llu", 
								segName(i), (uint64_t)(fileOffset+size), fileLen);
			}
			void* loadAddress = mmap((void*)requestedLoadAddress, size, protection, MAP_FIXED | MAP_PRIVATE, fd, fileOffset);
			if ( loadAddress == ((void*)(-1)) ) {
				dyld::throwf("mmap() error %d at address=0x%08lX, size=0x%08lX segment=%s in Segment::map() mapping %s", 
					errno, requestedLoadAddress, (uintptr_t)size, segName(i), getPath());
			}
		}
	}
	// update slide to reflect load location
	(...)
}

The TrustedBSD framework contains a useful mmap hook:

/**
  @brief Access control check for mapping a file
  @param cred Subject credential
  @param fg fileglob representing file to map
  @param label Policy label associated with vp
  @param prot mmap protections; see mmap(2)
  @param flags Type of mapped object; see mmap(2)
  @param maxprot Maximum rights

  Determine whether the subject identified by the credential should be
  allowed to map the file represented by fg with the protections specified
  in prot.  The maxprot field holds the maximum permissions on the new
  mapping, a combination of VM_PROT_READ, VM_PROT_WRITE, and VM_PROT_EXECUTE.
  To avoid overriding prior access control checks, a policy should only
  remove flags from maxprot.

  @return Return 0 if access is granted, otherwise an appropriate value for
  errno should be returned. Suggested failure: EACCES for label mismatch or
  EPERM for lack of privilege.
*/
typedef int mpo_file_check_mmap_t(
	kauth_cred_t cred,
	struct fileglob *fg,
	struct label *label,
	int prot,
	int flags,
	int *maxprot
);

Essentially this hook allows a kernel extension to control what is mmap’ed into a process, which is perfect to control what dyld tries to load into a process.
The TrustedBSD hook in the kernel’s mmap implementation can be seen here:

#if CONFIG_MACF
	error = mac_file_check_mmap(vfs_context_ucred(ctx),
				    fp->f_fglob, prot, flags, &maxprot);
	if (error) {
		goto bad;
	}
#endif /* MAC */

This is precisely one of the tasks of the infamous AMFI – it implements a hook here called file_check_mmap. You can disassemble the AppleMobileFileIntegrity kernel extension and see this and other AMFI hooks. The El Capitan version of AMFI is more complete than the Yosemite version. For example, there is new code that allows the linker (dyld) to reject any code that is not signed by the same Team ID if the main binary has a special codesign flag, CS_REQUIRE_LV. An example of this flag in use can be found in Xcode 7.x: if you try to inject a dynamic library into Xcode 7.x the process will be killed. The recent Pangu Team presentation at Ruxcon 2015 also discusses this new feature. I guess this new code originated in the iOS code base due to the frequent code injection abuse by iOS jailbreaks.

The XNU kernel contains two interesting code signing related functions, csfg_get_teamid and csfg_get_platform_binary. They allow retrieving the team id and code signing information from a file located in the filesystem. The problem is that they aren’t adequate for my purposes – what I want to know is whether the executable code is signed or not. In a bit I’ll explain why those functions fail in some cases.

Unfortunately for us the code signing features available inside the XNU kernel are very poor and most aren’t even available as KPIs (El Capitan adds more functions, but still not enough). This is definitely an area I would love Apple to improve: create a robust set of code signing KPIs so developers could use them to build their own security solutions instead of resorting to all kinds of potentially unstable tricks and hooks.

Apple, let’s finally assume security as a priority and that (some) developers are interested in developing security solutions. It doesn’t signal your security is bad, everyone already knows that it is so the step forward is to improve it (which you already are but more is required). Pretty please? 🙂

To understand my solution for code signature detection let me show you csfg_get_platform_binary source code from XNU kernel:

/*
 * Function: csfg_get_platform_binary
 *
 * Description: This function returns the 
 *		platform binary field for the 
 * 		fileglob fg
 */
int
csfg_get_platform_binary(struct fileglob *fg)
{
	int platform_binary = 0;
	struct ubc_info *uip;
	vnode_t vp;

	if (FILEGLOB_DTYPE(fg) != DTYPE_VNODE)
		return 0;

	vp = (struct vnode *)fg->fg_data;
	if (vp == NULL)
		return 0;

	vnode_lock(vp);
	if (!UBCINFOEXISTS(vp))
		goto out;

	uip = vp->v_ubcinfo;
	if (uip == NULL)
		goto out;
	if (uip->cs_blobs == NULL)
		goto out;

	/* It is OK to extract the teamid from the first blob
	   because all blobs of a vnode must have the same teamid */	
	platform_binary = uip->cs_blobs->csb_platform_binary;
out:
	vnode_unlock(vp);

	return platform_binary;
}

The return value of this function is zero if the target binary is not a platform binary and one if it is. A binary can only (potentially) be considered a platform binary if it is code signed; an unsigned binary never will be. This is part of what we want – to know whether something is code signed or not. The problem is that a binary can be code signed and not be a platform binary.
What is considered a platform binary?
It is a binary signed by Apple that is located in certain system paths:

  • “/private/var/db/dyld/dyld_shared_cache_”
  • “/usr/lib/”
  • “/usr/libexec/”
  • “/System/Library/”
  • “/usr/bin/”
  • “/bin/”
  • “/sbin/”
  • “/usr/sbin/”
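The path side of this decision can be sketched as a simple prefix match. This is a user-space illustration (the helper name is mine, not Apple’s); note that a real platform binary must also carry an Apple code signature, which is not modeled here.

```c
#include <stddef.h>
#include <string.h>

/* Trusted system prefixes from the list above. */
static const char *platform_paths[] = {
    "/private/var/db/dyld/dyld_shared_cache_",
    "/usr/lib/",
    "/usr/libexec/",
    "/System/Library/",
    "/usr/bin/",
    "/bin/",
    "/sbin/",
    "/usr/sbin/",
};

/* Return 1 if path starts with one of the trusted prefixes, 0 otherwise. */
static int path_in_platform_dirs(const char *path)
{
    for (size_t i = 0; i < sizeof(platform_paths) / sizeof(platform_paths[0]); i++) {
        if (strncmp(path, platform_paths[i], strlen(platform_paths[i])) == 0)
            return 1;
    }
    return 0;
}
```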

This decision also happens inside AMFI in the vnode_check_signature hook. The hook is triggered from kernel function ubc_cs_blob_add:

	/*
	 * Let policy module check whether the blob's signature is accepted.
	 */
#if CONFIG_MACF
	error = mac_vnode_check_signature(vp, base_offset, blob->csb_sha1,
					  (const void*)cd, size, &is_platform_binary);
	if (error) {
		if (cs_debug) 
			printf("check_signature[pid: %d], error = %d\n", current_proc()->p_pid, error);
		goto out;
	}
#endif
So our problem with this function is that we can’t really distinguish between unsigned and signed code – code signed applications that are not platform binaries also return zero. But this function is perfect if we modify it to something like:

	if (uip->cs_blobs == NULL)
		goto out;

	platform_binary = 1;

	return platform_binary;

This small modification guarantees that the function always returns zero if the code is not signed, and one if the code is signed – cs_blobs is never NULL for signed code.
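The effect of the modification can be modeled in user space with simplified stand-ins for the XNU structures (the struct and function names below are mine; only the fields the check touches are modeled):

```c
#include <stddef.h>

/* Simplified stand-ins for XNU's cs_blob and ubc_info. */
struct cs_blob_model { int csb_platform_binary; };
struct ubc_info_model { struct cs_blob_model *cs_blobs; };

/* Modeled clone: return 1 if the vnode has any code signing blob at all
 * (i.e. the file is code signed), regardless of the platform binary bit. */
static int cloned_is_code_signed(const struct ubc_info_model *uip)
{
    if (uip == NULL || uip->cs_blobs == NULL)
        return 0;   /* no blobs: unsigned */
    return 1;       /* any blob present: signed */
}

/* Demo instances: a signed non-platform binary and an unsigned one. */
static struct cs_blob_model demo_blob = { 0 };  /* csb_platform_binary = 0 */
static struct ubc_info_model demo_signed = { &demo_blob };
static struct ubc_info_model demo_unsigned = { NULL };
```

Note that the original csfg_get_platform_binary would return zero for demo_signed (it is signed but not a platform binary), while the modified version returns one.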

My solution is to clone this function, modify it, and then use it in our own hook to verify if target code is signed or not.
Let’s observe the disassembly output of this function to understand the necessary modifications:

__text:FFFFFF80007A7AA0                                         public _csfg_get_platform_binary
__text:FFFFFF80007A7AA0                         _csfg_get_platform_binary proc near
__text:FFFFFF80007A7AA0 55                                      push    rbp
__text:FFFFFF80007A7AA1 48 89 E5                                mov     rbp, rsp
__text:FFFFFF80007A7AA4 41 56                                   push    r14
__text:FFFFFF80007A7AA6 53                                      push    rbx
__text:FFFFFF80007A7AA7 48 8B 47 28                             mov     rax, [rdi+28h]
__text:FFFFFF80007A7AAB 31 DB                                   xor     ebx, ebx
__text:FFFFFF80007A7AAD 83 38 01                                cmp     dword ptr [rax], 1
__text:FFFFFF80007A7AB0 75 3A                                   jnz     short loc_FFFFFF80007A7AEC
__text:FFFFFF80007A7AB2 4C 8B 77 38                             mov     r14, [rdi+38h]
__text:FFFFFF80007A7AB6 4D 85 F6                                test    r14, r14
__text:FFFFFF80007A7AB9 74 31                                   jz      short loc_FFFFFF80007A7AEC
__text:FFFFFF80007A7ABB 4C 89 F7                                mov     rdi, r14
__text:FFFFFF80007A7ABE E8 2D 2D C6 FF                          call    _lck_mtx_lock   ; - fix relocation
__text:FFFFFF80007A7AC3 31 DB                                   xor     ebx, ebx
__text:FFFFFF80007A7AC5 41 0F B7 46 68                          movzx   eax, word ptr [r14+68h]
__text:FFFFFF80007A7ACA 83 F8 01                                cmp     eax, 1
__text:FFFFFF80007A7ACD 75 15                                   jnz     short loc_FFFFFF80007A7AE4
__text:FFFFFF80007A7ACF 49 8B 46 70                             mov     rax, [r14+70h]
__text:FFFFFF80007A7AD3 48 85 C0                                test    rax, rax
__text:FFFFFF80007A7AD6 74 0C                                   jz      short loc_FFFFFF80007A7AE4
__text:FFFFFF80007A7AD8 48 8B 40 50                             mov     rax, [rax+50h]
__text:FFFFFF80007A7ADC 48 85 C0                                test    rax, rax
__text:FFFFFF80007A7ADF 74 03                                   jz      short loc_FFFFFF80007A7AE4
__text:FFFFFF80007A7AE1 8B 58 68                                mov     ebx, [rax+68h]  ; - modify here
__text:FFFFFF80007A7AE4                         loc_FFFFFF80007A7AE4:                   ; CODE XREF: _csfg_get_platform_binary+2Dj
__text:FFFFFF80007A7AE4                                                                 ; _csfg_get_platform_binary+36j ...
__text:FFFFFF80007A7AE4 4C 89 F7                                mov     rdi, r14
__text:FFFFFF80007A7AE7 E8 04 33 C6 FF                          call    _lck_mtx_unlock ; - fix relocation
__text:FFFFFF80007A7AEC                         loc_FFFFFF80007A7AEC:                   ; CODE XREF: _csfg_get_platform_binary+10j
__text:FFFFFF80007A7AEC                                                                 ; _csfg_get_platform_binary+19j
__text:FFFFFF80007A7AEC 89 D8                                   mov     eax, ebx
__text:FFFFFF80007A7AEE 5B                                      pop     rbx
__text:FFFFFF80007A7AEF 41 5E                                   pop     r14
__text:FFFFFF80007A7AF1 5D                                      pop     rbp
__text:FFFFFF80007A7AF2 C3                                      retn
__text:FFFFFF80007A7AF2                         _csfg_get_platform_binary endp

The first thing we need is to modify the return value. The code at 0xFFFFFF80007A7AE1 is responsible for moving the value of csb_platform_binary into the platform_binary variable (platform_binary = uip->cs_blobs->csb_platform_binary;). Instead we can set it to one (platform_binary = 1;), meaning that cs_blobs isn’t NULL and the code is signed. The value in EBX is the platform_binary variable (it was just zeroed), so the necessary code modification is just an “inc ebx” replacing the original mov (the original instruction is three bytes, the new one is two, so the remaining byte is patched into a NOP).
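In byte terms the patch is tiny. Here is a sketch of applying it to a buffer holding the original instruction bytes (the helper function is illustrative; in the kext the patch is applied to the cloned function’s memory):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace the 3-byte "mov ebx, [rax+68h]" (8B 58 68) with
 * "inc ebx" (FF C3) and pad the leftover byte with a NOP (90),
 * so platform_binary becomes 1 whenever this path is reached
 * (i.e. cs_blobs != NULL). */
static void patch_mov_to_inc_ebx(uint8_t *code, size_t offset)
{
    static const uint8_t patch[3] = { 0xFF, 0xC3, 0x90 }; /* inc ebx ; nop */
    memcpy(code + offset, patch, sizeof(patch));
}
```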

The next step is to fix the two function calls to the locks (lck_mtx_lock and lck_mtx_unlock), since they are RIP relative and our copy is located at a different memory address. We simply need to compute the distance from our copy to those kernel functions and update the offset values in both calls. And voilà, with three simple patches we have a perfectly working cloned function that does what we need.
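The offset arithmetic is straightforward: a near call (opcode E8) encodes its target as a signed 32-bit displacement relative to the end of the 5-byte instruction. A sketch of recomputing it for the relocated copy (the helper name is mine):

```c
#include <stdint.h>

/* Recompute the rel32 displacement of an E8 near call: the displacement
 * is measured from the address right after the 5-byte instruction to the
 * call target. */
static int32_t fix_call_rel32(uint64_t call_insn_addr, uint64_t target_addr)
{
    return (int32_t)(target_addr - (call_insn_addr + 5));
}
```

You can cross-check this against the listing above: the call at 0xFFFFFF80007A7ABE carries the displacement 0xFFC62D2D (bytes 2D 2D C6 FF), which resolves to the kernel’s _lck_mtx_lock.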

A modified clone of csproc_get_teamid is needed to tell whether the current process (aka the main binary) is code signed or not. The original function returns a pointer, but I modified the cloned function to return an int value – zero if the main binary is not signed, one otherwise. This function is used to decide whether we want to verify the linked libraries or not – if the main binary is not code signed there is no interest in verifying any linked code. I use this function because it contains everything I need and is easy to modify for my purposes.

After we have working cloned functions that allow us to retrieve the code signing status of the main process and any code that dyld wants to mmap, the last step is to replace the original AMFI hook with my version.
What my version does is verify whether the current process is code signed. If it is, it then verifies whether all executable code being mapped is code signed. If something is not code signed, the mmap is refused and the process will crash (an alternative is to kill the process right there). After this, control is passed back to the original AMFI hook.
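Put together, the decision logic just described looks roughly like this user-space model (the helper and parameter names are mine; in Gatekeerper this runs inside the replaced file_check_mmap hook, with the signed/unsigned answers coming from the cloned functions):

```c
#include <errno.h>
#include <sys/mman.h>

/* Model of the Gatekeerper check: deny an executable mapping only when
 * the main binary is signed but the file being mapped is not. Returns 0
 * to allow the mapping, or an errno value (EPERM) to deny it, mirroring
 * the mpo_file_check_mmap_t convention. */
static int gatekeerper_check_mmap(int main_binary_signed,
                                  int file_signed,
                                  int prot)
{
    if (!(prot & PROT_EXEC))
        return 0;       /* only executable mappings matter */
    if (!main_binary_signed)
        return 0;       /* unsigned main binary: nothing to protect */
    if (!file_signed)
        return EPERM;   /* signed process loading unsigned code: deny */
    return 0;
}
```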
This means that I am replacing the original AMFI hook with my own copy. I can’t really remember the reason why I did this (I used this solution for something else so I just reused it here). I guess in this particular case we could just install a new hook after AMFI and verify there, instead of playing pointer exchange games with AMFI. Maybe you could try to implement it that way.

To make everything easier I hardcoded some space for the cloned functions inside the Gatekeerper kernel extension. This avoids intermediate jump islands in case dynamically allocated memory doesn’t fit within a 32 bit offset (check my SyScan rootkits presentation/code from this year). Because kernel extensions are guaranteed to always be within a 32 bit offset of the kernel, it’s very easy to fix the kernel symbols used in the cloned functions.

The PoC code only works with Yosemite 10.10.5 build 14F27, because the offsets for the cloned functions are hardcoded rather than dynamically discovered. This could be done with a disassembler, but this is just a PoC and I’m not working for free ;-). Making it work with El Capitan is just a bit of extra work.

You can find the code at Gatekeerper Github repo.

I can’t remember why I named it Gatekeerper. JP says it’s the natural evolution from the worm image I frequently use. Sounds like robust logic!

Have fun,
