
FlareOn11: Challenge 5 - sshd

Yousuf Alhajri

Nov 9, 2024 • 12 min read

Introduction

If you're new to FlareOn or haven't heard about it, FlareOn is an annual reverse-engineering CTF competition organized by the Mandiant team. It's designed to challenge security enthusiasts, malware analysts, and reverse engineers with a series of increasingly difficult puzzles that test a wide range of skills, from binary analysis and deobfuscation to encryption and scripting. Each level is crafted to mimic real-world scenarios in malware analysis, providing participants with hands-on experience in tackling complex reverse-engineering tasks.

In this writeup, I'll walk through my solution for the 5th challenge of FlareOn 11.

Walkthrough

This FlareOn 11 challenge reads as follows:

TITLE: sshd

Our server in the FLARE Intergalactic HQ has crashed! 
Now criminals are trying to sell me my own data!!! 
Do your part, random internet hacker, to help FLARE out 
and tell us what data they stole! We used the best forensic 
preservation technique of just copying all the files on the system for you.

7zip archive password: flare

The challenge can be downloaded from here (this file contains all FlareOn11 challenges). Extract the file, and you'll find the sshd.7z that corresponds to this challenge.

In this challenge, we are provided with the files of a Docker container that appears to have experienced a crash. We do not know yet what exactly crashed, but from the title of the challenge sshd, we can assume that it's something related to SSH (potentially sshd, which is the OpenSSH server process).

Before we start solving the challenge, I have to say that this challenge is one of my favorite FlareOn challenges this year. The reason is that I've been thinking for a while about creating a CTF challenge, as we organize a yearly national CTF competition, that involves analyzing a core dump from a process. It was fun to see a challenge this year with a similar concept, so kudos to the author!

First, we extract the provided sshd.7z archive and then unpack the ssh_container.tar inside it with the command tar -xf ssh_container.tar -C ssh_container. This gives us the files and folders of a Linux Docker container:

After further inspection within the provided file system, we can see that there's a core dump of the sshd process located at /var/lib/systemd/coredump/sshd.core.93794.0.0.11.1725917676:

A core dump, in simple words, is a file that captures the memory state of a running process at a specific point, typically when the process crashes. It includes the contents of the process's memory, such as the stack, heap, and register values, allowing us to debug and analyze the program's state at the time of the crash.
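On Linux, a core dump like this is normally an ELF file whose header type field is ET_CORE. The following is a minimal sketch for confirming that before loading a file into gdb. The helper name is_elf_core is made up for illustration, and the sketch assumes the dump is stored uncompressed.

```python
import struct

ET_CORE = 4  # e_type value that the ELF format assigns to core files

def is_elf_core(header: bytes) -> bool:
    """Return True if the given bytes start with an ELF core file header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        return False
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field that follows the 16-byte e_ident array
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type == ET_CORE

# Hypothetical usage against the dump found above:
# with open("sshd.core.93794.0.0.11.1725917676", "rb") as f:
#     print(is_elf_core(f.read(64)))
```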

To start our analysis, we'll first copy the sshd binary to the current folder along with its core dump, configure gdb to use the files from the target environment, and then load the core dump.

(base) joe@FlareVM:bins$ cp ../ssh_container/usr/sbin/sshd .
(base) joe@FlareVM:bins$ cp ../ssh_container/var/lib/systemd/coredump/sshd.core.93794.0.0.11.1725917676 .
(base) joe@FlareVM:bins$ gdb ./sshd
...
(gdb) set sysroot ../ssh_container/
(gdb) core ./sshd.core.93794.0.0.11.1725917676

Now that the analysis environment is configured, we can start inspecting why the process crashed using the bt (backtrace) command, which displays the call stack.

The call stack at the point when the crash occurred

We can observe that the crash occurred due to calling a null pointer. If we take a look at the return address, we can see that it's 0x00007f4a18c8f88f, which is located in the library liblzma.so.5. Let's load this library in IDA and inspect it further.

Once we load it in IDA, we'll need to set the base address to match when the crash occurred.

To do this, we can list the loaded shared libraries and their addresses with the info sharedlibrary command in gdb:

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
...
0x00007f4a18c8ad40  0x00007f4a18ca8d26  Yes (*)     ../ssh_container/lib/x86_64-linux-gnu/liblzma.so.5
...

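The From address represents the runtime start address of the .text section. To get the base address, we can use readelf to find the address of the .text section inside the library (here its address and file offset are both 0x4d40) and subtract it from the From address: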

(base) joe@FlareVM:infected_liblzma$ readelf -S liblzma.so.5.4.1 | grep .text
  [15] .text             PROGBITS         0000000000004d40  00004d40

(base) joe@FlareVM:infected_liblzma$ python
...
>>> hex(0x00007f4a18c8ad40 - 0x4d40)
'0x7f4a18c86000'
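The same arithmetic also tells us where the return address from the backtrace sits inside the library, which is handy for cross-checking the location in a disassembler that has not been rebased. This is a small sketch using the values shown above; the helper names load_base and module_offset are made up for illustration.

```python
TEXT_RUNTIME = 0x00007F4A18C8AD40  # "From" column of info sharedlibrary
TEXT_IN_FILE = 0x4D40              # .text address reported by readelf -S
RETURN_ADDR = 0x00007F4A18C8F88F   # return address seen in the backtrace

def load_base(text_runtime: int, text_in_file: int) -> int:
    """Runtime base address of the library."""
    return text_runtime - text_in_file

def module_offset(addr: int, base: int) -> int:
    """Offset of a runtime address relative to the library base."""
    return addr - base

base = load_base(TEXT_RUNTIME, TEXT_IN_FILE)
print(hex(base))                              # 0x7f4a18c86000
print(hex(module_offset(RETURN_ADDR, base)))  # 0x988f
```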

Next, we'll update the base address in IDA (Edit -> Segments -> Rebase program...):

Now that the base address is set, we can inspect the address at which the program crashed:

The location at which the crash happened

This confirms that the program crashed due to attempting to call a null address.

Now the point of this challenge is not to analyze why the program crashed, but to actually get a flag :v

If we take a closer look at the code of this library, we'll find that it's actually an infected library that sshd loaded.

The following is the code that comes right before where the crash happened:

As shown, right before the program crashed, it called several functions, including mmap and memcpy. mmap is used to dynamically allocate a region of memory, while memcpy is used to copy data to a target location.

When the program called mmap, it requested a memory protection value of 7. This value is the bitmask PROT_READ | PROT_WRITE | PROT_EXEC, which means that the allocated region is readable, writable, and executable (rwx). Malware commonly allocates such a region to hold its shellcode.
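To make the bitmask concrete, here is a short sketch that decodes a protection value into its flag names. The numeric values are the Linux definitions of the PROT_* constants, and decode_prot is a helper name invented for this example.

```python
# Linux values of the mmap protection flags
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}

def decode_prot(prot: int) -> str:
    """Turn an mmap protection bitmask into a readable flag list."""
    names = [name for bit, name in PROT_FLAGS.items() if prot & bit]
    return " | ".join(names) if names else "PROT_NONE"

print(decode_prot(7))  # PROT_READ | PROT_WRITE | PROT_EXEC
```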

Once a space has been allocated, the program calls memcpy to copy an encrypted shellcode from unk_7F4A18CA9960 to the allocated space.

memcpy also takes a number of bytes to be copied as the third argument. In this case, it's dword_7F4A18CB8360, which is a DWORD value that equals 0xF96.

Shellcode size

This means that the encrypted shellcode being copied is 0xF96 bytes long.

Our next step is to figure out how to decrypt the shellcode. In order to do this, we need to figure out the algorithm first.

Let's take a look at the functions sub_7F4A18C8F3F0 and sub_7F4A18C8F520 shown in the previous basic disassembly block and decompile them:

sub_7F4A18C8F520 is a bit too long, so I'm just going to include the necessary parts:

void __fastcall sub_7F4A18C8F520(unsigned __int64 *a1, _BYTE *a2, __int64 a3)
{
...
          v27 = __ROL4__(v25 ^ v12, 16);
...
          v30 = __ROL4__(v28 ^ v15, 12);
...
          v35 = __ROL4__(v33 ^ v27, 8);
...
          v39 = __ROL4__((v28 + v34) ^ v30, 7);
...
}

From the decompiled code, we can see that the algorithm used is ChaCha20. This assumption is based on the following observations:

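The decompiled code references the string expand 32-byte k, which is the sigma constant of ChaCha20.
The rotation amounts in the encryption/decryption function are 16, 12, 8, and 7. Salsa20 uses the same sigma constant and a very similar structure, but its rotations are 7, 9, 13, and 18, so this is ChaCha20 rather than Salsa20.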
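The rotation check can be written down as a tiny lookup. This is only a sketch of the heuristic: guess_cipher is a name invented here, and it assumes the rotation amounts are read off the decompiler output as left rotations, in the order they appear within one quarter round.

```python
CHACHA_ROTATIONS = (16, 12, 8, 7)   # ChaCha quarter round
SALSA_ROTATIONS = (7, 9, 13, 18)    # Salsa20 quarter round

def guess_cipher(rotations) -> str:
    """Guess the cipher family from the left-rotation amounts of one quarter round."""
    seen = tuple(rotations)
    if seen == CHACHA_ROTATIONS:
        return "ChaCha"
    if seen == SALSA_ROTATIONS:
        return "Salsa20"
    return "unknown"

# Rotation amounts taken from the __ROL4__ calls in sub_7F4A18C8F520
print(guess_cipher([16, 12, 8, 7]))  # ChaCha
```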

sub_7F4A18C8F3F0 is called to set up and initialize the ChaCha20 cipher state, while sub_7F4A18C8F520 is called to perform encryption/decryption.

Now that we know the algorithm, the next step is to find the ChaCha20 key and nonce.

Based on the two discussed ChaCha20 functions, the function responsible for taking the key and nonce is sub_7F4A18C8F3F0 as it's the one responsible for initializing the cipher state. a1 serves as the pointer to the memory location where the ChaCha20 cipher state is initialized and stored. The key is passed in a2, while the nonce is passed in a3.

Since the program crashed at 0x00007F4A18C8F88D, we can backtrack through the previous function calls. The second argument (key) passed to the function sub_7F4A18C8F3F0 is rbp+0x4, while the third argument (nonce) is rbp+0x24, right after the 32-byte key.

Knowing that the register rbp has not been modified at all before the crash, we can select the previous stack frame and easily dump the key and nonce as follows:

(gdb) frame 1
#1  0x00007f4a18c8f88f in ?? () from ../ssh_container/lib/x86_64-linux-gnu/liblzma.so.5
(gdb) x/32bx $rbp + 0x4
0x55b46d51dde4: 0x94    0x3d    0xf6    0x38    0xa8    0x18    0x13    0xe2
0x55b46d51ddec: 0xde    0x63    0x18    0xa5    0x07    0xf9    0xa0    0xba
0x55b46d51ddf4: 0x2d    0xbb    0x8a    0x7b    0xa6    0x36    0x66    0xd0
0x55b46d51ddfc: 0x8d    0x11    0xa6    0x5e    0xc9    0x14    0xd6    0x6f
(gdb) x/12bx $rbp + 0x24
0x55b46d51de04: 0xf2    0x36    0x83    0x9f    0x4d    0xcd    0x71    0x1a
0x55b46d51de0c: 0x52    0x86    0x29    0x55

key = 943df638a81813e2de6318a507f9a0ba2dbb8a7ba63666d08d11a65ec914d66f

nonce = f236839f4dcd711a52862955
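Copying bytes out of gdb output by hand is error-prone, so the following sketch turns the x/bx output above into hex strings. The helper parse_gdb_bytes is made up for this example and assumes every line has the form "address: 0x.. 0x.. ...".

```python
def parse_gdb_bytes(dump: str) -> bytes:
    """Collect the byte values from the output of gdb's x/<n>bx command."""
    out = bytearray()
    for line in dump.splitlines():
        # Everything after the first colon is a list of 0x-prefixed bytes
        _, _, values = line.partition(":")
        out.extend(int(token, 16) for token in values.split())
    return bytes(out)

KEY_DUMP = """\
0x55b46d51dde4: 0x94    0x3d    0xf6    0x38    0xa8    0x18    0x13    0xe2
0x55b46d51ddec: 0xde    0x63    0x18    0xa5    0x07    0xf9    0xa0    0xba
0x55b46d51ddf4: 0x2d    0xbb    0x8a    0x7b    0xa6    0x36    0x66    0xd0
0x55b46d51ddfc: 0x8d    0x11    0xa6    0x5e    0xc9    0x14    0xd6    0x6f
"""
NONCE_DUMP = """\
0x55b46d51de04: 0xf2    0x36    0x83    0x9f    0x4d    0xcd    0x71    0x1a
0x55b46d51de0c: 0x52    0x86    0x29    0x55
"""

print("key   =", parse_gdb_bytes(KEY_DUMP).hex())
print("nonce =", parse_gdb_bytes(NONCE_DUMP).hex())
```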

Next, we'll dump the shellcode from the library in order to decrypt it.

(gdb) dump memory shellcode.bin 0x00007f4a18ca9960 0x00007f4a18caa8f6
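The end address in that command is simply the start of the encrypted blob plus its size. As a quick sanity check, this sketch recomputes it from the two values found earlier; the variable names are chosen here for illustration.

```python
SHELLCODE_START = 0x00007F4A18CA9960  # unk_7F4A18CA9960 after rebasing
SHELLCODE_SIZE = 0xF96                # value of dword_7F4A18CB8360

shellcode_end = SHELLCODE_START + SHELLCODE_SIZE

# Rebuild the gdb command from the computed range
command = f"dump memory shellcode.bin {SHELLCODE_START:#x} {shellcode_end:#x}"
print(command)  # dump memory shellcode.bin 0x7f4a18ca9960 0x7f4a18caa8f6
```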

Now, you can decrypt the shellcode using your favorite tool. I'll use CyberChef:

Once the shellcode is decrypted, you can save it to a file for further analysis.

To analyze this shellcode, I'll switch to Binary Ninja as shellcodes can be easily loaded and analyzed there:

Loading the decrypted shellcode into Binary Ninja

Binary Ninja shows several functions within the shellcode. Instead of attempting to run the shellcode, let's analyze what it does so that we understand the overall logic.

The main function of the shellcode sub_0 is a wrapper around sub_dc2 that does all the work and is shown in the following figure:

As shown, sub_dc2 relies on raw syscalls to achieve its objective.

On the first line of this function, we notice the values 0x539 and 0xa00020f, which correspond to a port (1337) and IP address (10.0.2.15). The function sub_1a is used to create a socket for communication. For the sake of simplicity, we'll assume that this function creates a TCP/IP socket without any shenanigans.
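Both constants can be decoded with a couple of lines of Python. This is a minimal sketch: decode_endpoint is a helper name invented here, and it assumes the IP value is read as a big-endian 32-bit integer, as it is displayed in the decompiler.

```python
import ipaddress

def decode_endpoint(ip_value: int, port_value: int):
    """Convert the integer constants into a dotted IP address and a port."""
    return str(ipaddress.IPv4Address(ip_value)), port_value

print(decode_endpoint(0x0A00020F, 0x539))  # ('10.0.2.15', 1337)
```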

Unfortunately, the decompiled code is a bit too hard to follow afterwards, so we'll switch to the disassembly view.

Receiving 4 different values from the TCP socket

Once the socket is created, the shellcode executes several syscalls with the number 0x2d (45). Based on the Linux Syscall Table, this corresponds to the recvfrom syscall:

Based on the listed syscalls, the shellcode receives 4 buffers from the socket:

a buffer of (0x20) 32 bytes.
a buffer of (0xc) 12 bytes.
a buffer of 4 bytes.
a buffer that contains data of the size sent in buffer 3.

Next, the shellcode appends a NULL character (0x00) to buffer 4 and performs a syscall with a syscall number of 2 (open). It passes the last buffer (buffer 4) as the first argument to open, as shown in the following figure:

As shown in the open syscall manual page, the first argument is a path to a file in the file system. This means that the 4th buffer is a file path and the 3rd buffer is its length.

Next, the shellcode performs a read syscall (0x0) to read up to 0x80 bytes from the file:

The next few instructions are shown in the following figure:

Once the file has been read, the shellcode calculates the length of the file content, assuming that it's null-terminated. Next, it initializes the encryption context using a slightly modified implementation of ChaCha20. It then encrypts the file content and sends it to the server over the previously established socket.
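When following raw syscalls in disassembly, a small lookup table saves repeated trips to the syscall reference. The sketch below covers the x86-64 Linux numbers named in this writeup plus a few neighbouring ones that commonly appear in network shellcode. The extra entries are included as an assumption about what is useful, not as a list of what this shellcode calls.

```python
# x86-64 Linux syscall numbers
SYSCALLS = {
    0: "read",       # named in this writeup
    2: "open",       # named in this writeup
    45: "recvfrom",  # named in this writeup
    1: "write",
    3: "close",
    41: "socket",
    42: "connect",
    44: "sendto",
}

def syscall_name(number: int) -> str:
    """Map a syscall number (the value placed in rax) to its name."""
    return SYSCALLS.get(number, f"unknown({number:#x})")

for number in (0x2D, 0x2, 0x0):
    print(hex(number), syscall_name(number))
```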

With this in mind, we know the following about the buffers received by the shellcode:

a buffer of (0x20) 32 bytes = key
a buffer of (0xc) 12 bytes = nonce
a buffer of 4 bytes = length of file path
a buffer that contains data of the size sent in buffer 3 = file path
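The layout of the four buffers can be sketched as a small parser. This is an illustration only: it treats the four receives as one contiguous byte stream, assumes the length field is a little-endian 32-bit integer, and the names Request and parse_request are invented for the example.

```python
import struct
from typing import NamedTuple

class Request(NamedTuple):
    key: bytes    # buffer 1, 32 bytes
    nonce: bytes  # buffer 2, 12 bytes
    path: str     # buffer 4, length given by buffer 3

def parse_request(stream: bytes) -> Request:
    """Split the bytes sent to the shellcode into key, nonce, and file path."""
    key = stream[:32]
    nonce = stream[32:44]
    # Buffer 3: assumed to be a little-endian 32-bit path length
    (path_len,) = struct.unpack_from("<I", stream, 44)
    path = stream[48:48 + path_len]
    return Request(key, nonce, path.decode("ascii", errors="replace"))

# Made-up example message, not data from the challenge
example = bytes(32) + b"\x11" * 12 + struct.pack("<I", 9) + b"/tmp/flag"
print(parse_request(example).path)  # /tmp/flag
```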

Before proceeding to identify what needs to be decrypted, we should note that the ChaCha20 implementation in this shellcode is slightly modified, specifically in its sigma value, which is non-standard. In this case, the sigma value is expand 32-byte K, with a capital 'K' at the end:

With this in mind, let's try to find what we're supposed to decrypt. Since the shellcode successfully ran in the crashed process (because it was called before our crashing point), we'll probably be able to find some clues in the core dump file.

Since the shellcode took a file path along with other values, we'll attempt to find values that are in the same format. Let's look for file paths that are null-terminated. We immediately find the following path:

The file path is null-terminated and is preceded by a 4-byte value. That's a good sign that this is the data we need to decrypt.
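A search like this can also be scripted. The following is a rough sketch of one way to do it: scan a blob for null-terminated printable strings that start with a slash and report the 4 bytes in front of each hit as a little-endian integer. The function name find_paths and the regular expression are choices made for this example, not something taken from the challenge.

```python
import re

# A slash followed by at least 3 printable, non-space ASCII bytes and a NUL
PATH_RE = re.compile(rb"/[\x21-\x7e]{3,}\x00")

def find_paths(blob: bytes):
    """Return (offset, preceding 32-bit value, path) for each candidate path."""
    hits = []
    for match in PATH_RE.finditer(blob):
        start = match.start()
        if start < 4:
            continue  # no room for a 4-byte value before the string
        prefix = int.from_bytes(blob[start - 4:start], "little")
        hits.append((start, prefix, match.group()[:-1].decode("ascii")))
    return hits

# Made-up blob for demonstration; for the real search, read the core dump instead
demo = bytes(8) + (32).to_bytes(4, "little") + b"/root/a.txt\x00" + b"\xff" * 4
print(find_paths(demo))  # [(12, 32, '/root/a.txt')]
```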

Based on the value we found, here are the 4 buffers:

key = 8D EC 91 12 EB 76 0E DA 7C 7D 87 A4 43 27 1C 35 D9 E0 CB 87 89 93 B4 D9 04 AE F9 34 FA 21 66 D7
nonce = 11 11 11 11 11 11 11 11 11 11 11 11
path_length = 20 00 00 00 (little-endian, i.e. 0x20 = 32). This is smaller than the length of the file path we found, but we'll ignore that for now because it's not important for decryption
file_path = /root/certificate_authority_signing_key.txt
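The mismatch between the length field and the path is easy to confirm. This short sketch decodes the 4 bytes as a little-endian integer and compares the result with the length of the path string.

```python
path = b"/root/certificate_authority_signing_key.txt"
length_field = bytes.fromhex("20 00 00 00")

# Interpret the 4-byte field as a little-endian unsigned integer
declared = int.from_bytes(length_field, "little")

print(declared, len(path))  # 32 43
```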

The only remaining part is the encrypted file content.

Looking around the found file path, I tried to find a value that looks like an encrypted buffer and doesn't look like an address. The following value, located near the file path, seems like a good candidate:

The last step is to decrypt the file content. We'll write a simple Python script that implements ChaCha20 with a custom sigma value, matching the shellcode. The final script is shown below:

from typing import List
import struct

def rotl32(v: int, c: int) -> int:
    """Rotate left operation"""
    return ((v << c) & 0xffffffff) | (v >> (32 - c))

def quarter_round(state: List[int], a: int, b: int, c: int, d: int) -> None:
    """ChaCha20 quarter round operation"""
    state[a] = (state[a] + state[b]) & 0xffffffff
    state[d] ^= state[a]
    state[d] = rotl32(state[d], 16)
    
    state[c] = (state[c] + state[d]) & 0xffffffff
    state[b] ^= state[c]
    state[b] = rotl32(state[b], 12)
    
    state[a] = (state[a] + state[b]) & 0xffffffff
    state[d] ^= state[a]
    state[d] = rotl32(state[d], 8)
    
    state[c] = (state[c] + state[d]) & 0xffffffff
    state[b] ^= state[c]
    state[b] = rotl32(state[b], 7)

def chacha20_block(key: bytes, counter: int, nonce: bytes, sigma: bytes) -> bytes:
    """Generate a ChaCha20 block"""
    state = [0] * 16
    
    state[0:4] = struct.unpack('<IIII', sigma)
    state[4:12] = struct.unpack('<IIIIIIII', key)
    state[12] = counter
    state[13:16] = struct.unpack('<III', nonce)
    
    working_state = state.copy()
    
    for _ in range(10):
        quarter_round(working_state, 0, 4, 8, 12)
        quarter_round(working_state, 1, 5, 9, 13)
        quarter_round(working_state, 2, 6, 10, 14)
        quarter_round(working_state, 3, 7, 11, 15)
        quarter_round(working_state, 0, 5, 10, 15)
        quarter_round(working_state, 1, 6, 11, 12)
        quarter_round(working_state, 2, 7, 8, 13)
        quarter_round(working_state, 3, 4, 9, 14)
    
    working_state = [(working_state[i] + state[i]) & 0xffffffff for i in range(16)]
    
    return b''.join(struct.pack('<I', x) for x in working_state)

def chacha20_crypt(ciphertext: bytes, key: bytes, nonce: bytes, counter: int = 0, sigma: bytes = b'expand 32-byte k') -> bytes:        
    plaintext = bytearray()
    for i in range(0, len(ciphertext), 64):
        keystream = chacha20_block(key, counter + (i // 64), nonce, sigma)
        chunk = ciphertext[i:i + 64]
        plaintext.extend(x ^ y for x, y in zip(chunk, keystream[:len(chunk)]))
    
    return bytes(plaintext)

def main():
    # key (32 bytes)
    key = bytes.fromhex("8D EC 91 12 EB 76 0E DA 7C 7D 87 A4 43 27 1C 35 D9 E0 CB 87 89 93 B4 D9 04 AE F9 34 FA 21 66 D7".replace(" ", ""))
    # nonce (12 bytes)
    nonce = bytes.fromhex("11 11 11 11 11 11 11 11 11 11 11 11".replace(" ", ""))
    # Custom sigma constant (capital K)
    custom_sigma = b'expand 32-byte K'
    
    encrypted_data = bytes.fromhex("A9 F6 34 08 42 2A 9E 1C 0C 03 A8 08 94 70 BB 8D AA DC 6D 7B 24 FF 7F 24 7C DA 83 9E 92 F7 07 1D 02 63 90 2E C1 58".replace(" ", ""))
    
    decrypted = chacha20_crypt(encrypted_data, key, nonce, 0, custom_sigma)

    print(decrypted.decode(errors="ignore"))

if __name__ == "__main__":
    main()

Let's run the script:

$ python decrypt.py
supp1y_cha1n_sund4y@flare-on.com
...

Finally! supp1y_cha1n_sund4y@flare-on.com is the flag :)

Hope you enjoyed it and happy hacking!