PatriotCTF 2023 - write-ups

PatriotCTF 2023 was interesting, so I thought that I might document some of the reversing challenges. SavedByTheShell did not do at its best on this one, but we almost cleared the rev category.

[rev] Reduced Reduced Instruction Set

I heard RISC was pretty good, so I reduced it some more! Check out my password checker program I wrote with it.

Flag format: PCTF{}

Author: @arrilius

File: vm, password_checker.smol

VM analysis

With no hiding, this is indeed a VM challenge. We were also given bytecode of password_checker written for the VM. Let’s call this architecture SMOL.

Putting the VM in reversing framework, we quickly find the main loop of the VM. It reads 4-byte sequence by using decode_instruction, then execute it using execute_instruction. After that, it clears the decode buffer and continues the next iteration.

Main loop

When I first analyzed execute_instruction function in Relyze, it sadly did not recognize the code as a switch statement, and so I have to manually look for each case and the relevant code.

`execute_instruction`

Lucky for us, they are not too complex. Below is the register table that I deduced after decoding the bytecode. See the sequel challenge for instructions table (opcode from 0x00 to 0x0D).

Register table

Index	Size (bits)	Description
0	32	General purpose
1	32	General purpose
2	32	General purpose
3	32	General purpose
4	32	Zero flag
5	32	Unused/padding
6	64	Stack pointer

Bytecode quick analysis

For ease of debugging, I transpiled the bytecode into Python code. Since there is a jump instruction, I make each line a Python function. For most instruction, it will call the function next to it. If it is a jump instruction, we make an conditional if to jump to the destination line. The output was great, both the code size (> 200 KB) and the runnability. Now we can test it without the VM. I modify the test_sub(r32, r32) instruction to print out the value of two of the registers. Why? I noticed that it is called right before a jump instruction, and the way it does not write back the result resembles x86 CMP instruction, where temporary is discarded.

Executing password_checker independently without original VM

Fascinatingly, we find that the debug line has a value similar to what we input. I predicted that this is possibly a basic value comparison, and so I entered value similar to what was expected. After few iteration, I reached the end of the execution. Noticed that in our execution, we have first few lines to be an array with 3 zeros. In fact, it was my poor translation of print_nbytes instruction, which was supposed to print value on stack as if on actual memory (This is fixed in my script for the sequel challenge). Since the address in my implementation is actually a number object, we stored in a 32-bit number instead of consecutive 8-bit cells. It was fortunate that thare are no unaligned memory accesses. I simply convert back the 32-bit number to hex and then ASCII to get the flag. Decoding code is below.

import binascii

key = [
    1885566054,
    2071358815,
    1915975269,
    1920152425,
    1850171185,
    1935635570,
    2100310064,
]

print(binascii.a2b_hex("".join([hex(x)[2:] for x in key])))
# Output:
# b'pctf{vm_r3vers3inG_1s_tr}000'

[rev] Reduced Reduced Instruction Set 2

I added a couple improvements to my reduced reduced instruction set and have another password checker program for your to check out!

Flag format: PCTF{}

Author: @arrilius

File: vm2, password_checker2.smol

VM analysis, again

This is a continuation of the previous part. Fortunately, my overworked solution from the previous part did not go to waste, as the VM only adds new instructions.

This time, I opened vm2 in Cutter, and it clearly showed a switch statement with all cases of the switch table. All new instructions are easily deducible as well.

`execute_instruction` in Cutter

You can see below for full instructions table.

Instructions table

Opcode (hex)	Instruction name (self-deduced)	Description
00	`mov(r32, r32)`	Similar to x86 MOV - two registers form
01	`add3_imm8(r32, r32, i8)`	Add two registers and an 8-bit immediate
02	`add3_imm8_v2(r32, r32, i8)`	No differences from above
03	`print(r32)`	Print the value stored in the register
04	`mov_imm8(r32, i8)`	Similar to x86 MOV - one register, one immediate form
05	`push(r32)`	Push register value to a stack
06	`pop(r32)`	Pop value from stack to a register
07	`test_sub(r32, r32)`	Subtract two registers, stored result to zero flag. Resemble x86 CMP
08	`jz_rel(i16)`	If zero flag is zero (0), jump relatively to an offset declared in the instruction
09	`print_nbytes(i8)`	Print a string of n bytes in stack, starting from stack pointer
0A	`mul_add_imm8(r32, r32, i8)`	Multiply-add. Multiply two registers then add an immediate
0B	`add2_imm8(r32, i8)`	Add a register and an immediate
0C	`halt`	Stop execution
0D	`scan(r32)`	Read number and assign the value to the register
0E	`xor(r32, r32)`	XOR two registers
0F	`shr(r32, i5)`	Shift right
10	`getch(r32)`	Get a char from input and store in the register (and cast u8 -> u32)
11	`shl(r32, i5)`	Shift left
12	`sub2_imm8(r32, i8)`	Subtract a register and an immediate
13	`mod(r32, r32)`	Modulo operation on two registers

Improving bytecode translation

When I first converted to Python using the previous method, it outputs roughly 11 MB of code. The loading of the program is also sluggish, so I change the target language to C++. With optimization can be done by C++ compiler, hopefully we can reveal the true intention of the program. In a way, I am statically compiling bytecode into native machine code via C++. The C++ code size with bytecode comments is 7,051 KB, and the output executable is 5,724 KB. Compiling the source code takes over one minute.

The code is still slow due to the significant number of functions, thus I performed simplification on program flow. Previously, I systematically generate a function for each line, which Clang could not intelligently reduce. Now, if a line can be called from other line (via jump instruction, which can easily be pre-calculated), we make it the start of the function, else group it with previous line(s). Now the code size is 3,581 KB, and the output executable is 2,467 KB, with optimization enabled (-O2) is 828 KB. This is a great reduction on overhead.

When comparing the size to the original bytecode, it is still bigger, which means there can be better translations. However, we are more than ready to reverse the bytecode.

Exploring and actually reading new bytecode

Here we use our C++ compiled version to explore the bytecode.

Compared to the previous version, the difficulty of this one is increased. When we execute the program, it asks a lot of input, which we know through debugging code to be due to getch instruction that takes a character one by one.

Using binary search and Python, we find that the size of input is 840 bytes. Quite strange to be the size of a flag. With few attempts, we realize that it checks the last character, then the second-to-last character. Thinking that it uses technique similar to last one, we quickly found the last character to be A. But that does not hold for the second case. Our helpful injection reports that the second check is at line 8540, so let’s visit the code.

Cutter, sadly, wants to be explicit rather than useful.

password_checker2 - 2nd checker in cutter

Meanwhile, Relyze has a better insight, with a bit of help from Clang side.

password_checker2 - 2nd checker in Relyze

Looking at line 21, which is suspected to be printf code, we see that the second parameter, which refers to the first parameter of the instruction, is computed by (v1 ^ 0x41) + v3 + 1. Comparing with the C++ source code (with bytecode comment) side-by-side, we soon realize the actual behavior.

password_checker2 in C++ in Sakura Editor

Line 4062-4063: Xor value with 65
Line 4064-4065: Add value with 1
Line 4066-4067: Modulus 255 of value
Line 4068-4071: Compare the value with 98

Retroactively, we know that line 18 in Relyze is actually modulo code, rewritten and optimized by compiler using reciprocal multiplication.

With the program continuing execution after few characters, we believe the analysis is correct.

password_checker2 execution

Since all checkers have a similar pattern, with the exeception of the first checker, we can write a compiler pass to find and extract values from them.

password_checker2 key recover

Recovering the key (and reverse it), we got the flag within it.

Solve scripts are available here: dungwinux/patriotctf2023-script. Since they were written during the competition, they might contain some of my initial impression and can be misleading. Consult this writeup for my final thoughts.