PatriotCTF 2023 - write-ups
PatriotCTF 2023 was interesting, so I thought that I might document some of the reversing challenges. SavedByTheShell did not do at its best on this one, but we almost cleared the rev category.
[rev] Reduced Reduced Instruction Set
I heard RISC was pretty good, so I reduced it some more! Check out my password checker program I wrote with it.
Flag format: PCTF{}
Author: @arrilius
File: vm, password_checker.smol
VM analysis
With no hiding, this is indeed a VM challenge. We were also given bytecode of password_checker written for the VM. Let’s call this architecture SMOL.
Putting the VM in reversing framework, we quickly find the main loop of the VM.
It reads 4-byte sequence by using decode_instruction
,
then execute it using execute_instruction
. After that, it clears the
decode buffer and continues the next iteration.
When I first analyzed execute_instruction
function in
Relyze, it sadly did not recognize the code as a switch statement, and so I have
to manually look for each case and the relevant code.
Lucky for us, they are not too complex. Below is the register table that I deduced after decoding the bytecode. See the sequel challenge for instructions table (opcode from 0x00 to 0x0D).
Register table
Index | Size (bits) | Description |
---|---|---|
0 | 32 | General purpose |
1 | 32 | General purpose |
2 | 32 | General purpose |
3 | 32 | General purpose |
4 | 32 | Zero flag |
5 | 32 | Unused/padding |
6 | 64 | Stack pointer |
Bytecode quick analysis
For ease of debugging, I transpiled the bytecode into Python code.
Since there is a jump instruction, I make each line a Python function.
For most instruction, it will call the function next to it. If it is a jump
instruction, we make an conditional if
to jump to the destination line.
The output was great, both the code size (> 200 KB) and the runnability.
Now we can test it without the VM.
I modify the test_sub(r32, r32)
instruction to print out the value
of two of the registers. Why? I noticed that it is called right before a jump
instruction, and the way it does not write back the result resembles x86 CMP
instruction, where temporary is discarded.
Fascinatingly, we find that the debug line has a value similar to what we input. I predicted that this is possibly a basic value comparison, and so I entered value similar to what was expected. After few iteration, I reached the end of the execution. Noticed that in our execution, we have first few lines to be an array with 3 zeros. In fact, it was my poor translation of print_nbytes instruction, which was supposed to print value on stack as if on actual memory (This is fixed in my script for the sequel challenge). Since the address in my implementation is actually a number object, we stored in a 32-bit number instead of consecutive 8-bit cells. It was fortunate that thare are no unaligned memory accesses. I simply convert back the 32-bit number to hex and then ASCII to get the flag. Decoding code is below.
import binascii
key = [
1885566054,
2071358815,
1915975269,
1920152425,
1850171185,
1935635570,
2100310064,
]
print(binascii.a2b_hex("".join([hex(x)[2:] for x in key])))
# Output:
# b'pctf{vm_r3vers3inG_1s_tr}000'
[rev] Reduced Reduced Instruction Set 2
I added a couple improvements to my reduced reduced instruction set and have another password checker program for your to check out!
Flag format: PCTF{}
Author: @arrilius
File: vm2, password_checker2.smol
VM analysis, again
This is a continuation of the previous part. Fortunately, my overworked solution from the previous part did not go to waste, as the VM only adds new instructions.
This time, I opened vm2
in Cutter, and it clearly showed a switch statement
with all cases of the switch table. All new instructions are easily deducible
as well.
You can see below for full instructions table.
Instructions table
Opcode (hex) | Instruction name (self-deduced) | Description |
---|---|---|
00 | mov(r32, r32) |
Similar to x86 MOV - two registers form |
01 | add3_imm8(r32, r32, i8) |
Add two registers and an 8-bit immediate |
02 | add3_imm8_v2(r32, r32, i8) |
No differences from above |
03 | print(r32) |
Print the value stored in the register |
04 | mov_imm8(r32, i8) |
Similar to x86 MOV - one register, one immediate form |
05 | push(r32) |
Push register value to a stack |
06 | pop(r32) |
Pop value from stack to a register |
07 | test_sub(r32, r32) |
Subtract two registers, stored result to zero flag. Resemble x86 CMP |
08 | jz_rel(i16) |
If zero flag is zero (0), jump relatively to an offset declared in the instruction |
09 | print_nbytes(i8) |
Print a string of n bytes in stack, starting from stack pointer |
0A | mul_add_imm8(r32, r32, i8) |
Multiply-add. Multiply two registers then add an immediate |
0B | add2_imm8(r32, i8) |
Add a register and an immediate |
0C | halt |
Stop execution |
0D | scan(r32) |
Read number and assign the value to the register |
0E | xor(r32, r32) |
XOR two registers |
0F | shr(r32, i5) |
Shift right |
10 | getch(r32) |
Get a char from input and store in the register (and cast u8 -> u32) |
11 | shl(r32, i5) |
Shift left |
12 | sub2_imm8(r32, i8) |
Subtract a register and an immediate |
13 | mod(r32, r32) |
Modulo operation on two registers |
Improving bytecode translation
When I first converted to Python using the previous method, it outputs roughly 11 MB of code. The loading of the program is also sluggish, so I change the target language to C++. With optimization can be done by C++ compiler, hopefully we can reveal the true intention of the program. In a way, I am statically compiling bytecode into native machine code via C++. The C++ code size with bytecode comments is 7,051 KB, and the output executable is 5,724 KB. Compiling the source code takes over one minute.
The code is still slow due to the significant number of functions,
thus I performed simplification on program flow. Previously, I systematically generate
a function for each line, which Clang could not intelligently reduce.
Now, if a line can be called from other line
(via jump instruction, which can easily be pre-calculated), we make it the start
of the function, else group it with previous line(s).
Now the code size is 3,581 KB, and the output executable is 2,467 KB,
with optimization enabled (-O2
) is 828 KB. This is a great reduction on overhead.
When comparing the size to the original bytecode, it is still bigger, which means there can be better translations. However, we are more than ready to reverse the bytecode.
Exploring and actually reading new bytecode
Here we use our C++ compiled version to explore the bytecode.
Compared to the previous version, the difficulty of this one is increased.
When we execute the program, it asks a lot of input, which we know through
debugging code to be due to getch
instruction that takes a character one by one.
Using binary search and Python, we find that the size of input is 840 bytes.
Quite strange to be the size of a flag. With few attempts, we realize that it
checks the last character, then the second-to-last character. Thinking that it
uses technique similar to last one, we quickly found the last character to
be A
. But that does not hold for the second case. Our helpful injection reports
that the second check is at line 8540, so let’s visit the code.
Cutter, sadly, wants to be explicit rather than useful.
Meanwhile, Relyze has a better insight, with a bit of help from Clang side.
Looking at line 21, which is suspected to be printf
code, we see that the second
parameter, which refers to the first parameter of the instruction, is computed
by (v1 ^ 0x41) + v3 + 1
. Comparing with the C++ source code (with bytecode comment)
side-by-side, we soon realize the actual behavior.
Line 4062-4063: Xor value with 65
Line 4064-4065: Add value with 1
Line 4066-4067: Modulus 255 of value
Line 4068-4071: Compare the value with 98
Retroactively, we know that line 18 in Relyze is actually modulo code, rewritten and optimized by compiler using reciprocal multiplication.
With the program continuing execution after few characters, we believe the analysis is correct.
Since all checkers have a similar pattern, with the exeception of the first checker, we can write a compiler pass to find and extract values from them.
Recovering the key (and reverse it), we got the flag within it.
Solve scripts are available here: dungwinux/patriotctf2023-script. Since they were written during the competition, they might contain some of my initial impression and can be misleading. Consult this writeup for my final thoughts.