TSG CTF 2023 - T the weakest Write-up
TSG have a wonderful CTF for 2023, although I consider the challenges extremely hard. I consumed most of play time in solving one - T the weakest - because it is the coolest challenge I have ever played this year.
Description
Author: m1kit
18 solves
EN
T「AHHHHHH!」 S「LOOKS LIKE T IS DEFEATED」 G「HEH. HE’S ALWAYS THE WEAKEST OF THE BIG ONE HUNDRED.」 C「LOSING TO A MERE MORTAL. WHAT A DISGRANCE TO TSG-ER.」
JP
T「グアアアア」 S「Tがやられたようだな…」 G「フフフ…奴は百天王の中でも最弱…」 C「人間ごときに負けるとはTSGerの面汚しよ…」
Approach
I love this challenge even from its description. It is funny, and that is what CTF is supposed to be.
Here, we have a Linux ELF x86_64 binary, t_the_weakest
. Since this is a reversing challenge, let’s not bother checksec
-ing it.
Putting it in the reversing framework (I put into Cutter), the program looks simple. However, some instructions looks like they are not normally generated by a C compiler. Interestingly, when I put it in Ghidra, the decompiled code was different from Cutter. But the decompiler in Cutter is also ghidra! That is why the suspicion is raised.
When I look at the for loop, I realized something strange. For unknown reasons, we are having a loop that starts from data.0012c053
(an address?) then goes to 0, which makes no senses.
Turns out, the decompiler incorrectly deduced that 0x12c053 is an address. Looking at the x86 disassembly, the program is looping using one-instruction-loop instruction rep
. The variant is movesb
, so we are dealing with copying value of an 8-bit array from one place to another 8-bit array. The source is rsi (pointing to a place in data section) and the destination is rdi (pointing to a place in stack). We are looping ecx (0x12c053) times, which means it is also the length of the array. Adding 0x12c053 with 0x2d, we get 0x12c080 (the size of stack allocation), which means we are copying full data onto the stack. The size is huge, so I doubt this is the flag. Let’s call this part “Copy Operation”.
At this point, I decided to work on the rest of the challenge by relying on x86 disassembly instead.
After that, we have a simple cmp
that checks the first character in the second argument of the program execution, which in here is supposed to be T
, then use that value for later part. Let’s call this part “Verify Operation”.
Eventually, we are looping again (this time without using rep
), then do several calculation. I first thought this to be the fast division trick that a compiler would pull, but unfortunately this is just a series of calculation for a chaining cipher (using the previous result to decrypt the next value). Three parameters involve in this cipher, an addend, a multiplier, and a modulo divisor. The initialization value is the value that was checked in Verify Operation. Let’s call this part “Decrypt Operation”.
Later that, we have a syscall
with id 0x13f, a write
to an address which was returned from the syscall of id 319, a sprintf
, then an execv
. Let’s call this part “Execute Operation”.
You might be curious when I have name for each part. Surprisingly, when we decrypt the data section (by reconstructing the “Decrypt Operation”), we find another ELF binary! Inspecting the binary in the reversing framework, we find a very similar structure: Copy Operation, Verify Operation, Decrypt Operation, and Execute Operation. This is confirmed when we look up the syscall: 319 is sys_memfd_create
, and man
said it is used to create a temporary file. Basically, for Execution Operation, we are writing the decrypted program into a temporary file, then execute it using same program argument but the second argument is off by 1. The workflow of the program(s) is as below:
Copy <-----
V |
Verify |
V | (Fork)
Decrypt |
V |
Execute----
At this point, we can deduce that the flag is inputted via the second argument. I project that each program would only check one character of the flag, and it could go on forever, so I decided to script in Python.
Observe the pattern
Since we know the flag follows the standard format (TSGCTF{}
), we are guaranteed to check the first 7 versions of the program. The Execute Operation mostly remain the same. The dynamic parts are Copy Operation, Verify Operation and Decrypt Operation. The observations were done in Cutter.
Copy Operation
Through multiple layers, we find that:
- This part is always ended with the
rep movsb
instruction. lea rdi, ...
andlea rsi, ...
are always a few lines above. The values are different between layers.mov ecx, ...
would be anywhere between the start ofmain
function andrep movsb
. Since we only care the final result, we pick the closest one.-
lea rsi, ...
would point to the data section. For the first 21 layers, it is constantly 0x2018 in the binary. Later, the offset changes so I have to parse the assembly instead.For example, in the 22nd layer, we have
lea rsi, [rip + 0xf58]
when disassembling raw code. The address for the data would be 0xf58 bytes after the instruction.rip
points to right after the highest byte of the instruction in memory. The instruction is located atmain+0x17
, the instruction size is 7, andmain
in this binary starts at 0x10B0. Thus, we can find the data in the binary at 0x10B0 + 0x17 + 7 + 0xf58 = 0x2026.
Verify Operation
This part would spread out from the start of main
function to before Decrypt Operation. This will always check for only one character, the first one in argv[1]
. If it is correct, it will decrypt the program using it as a initialization value, else it will jump to a different function that prints out ng
(NGですか?). There are only 4 variations:
- One instruction. The program simply does a
cmp
with a constant value, then does amov
of same value to input it into the Decrypt Operation. (Whymov
again?)cmp edx, expect mov edx, expect
- Modulo. The program checks the modulo result of input with two divisors.
idiv devis1 cmp ..., expect1 / dec ... je ..., ... idiv devis2 cmp ..., expect2 / dec ... je ..., ...
(The
/
in code above means diferent variations.) In one case, I was confused when I cannot find the instruction that changes register flag beforetest
, until I run ZydisInfo and learn thatdec
also modifies the register flags. We can brute-force using the following python code:[x for x in range(0xff) if x % devis1 == expect1 and x % devis2 == expect2]
- Xorshift32 (but on 64-bit number). The program compares the output of Xorshift32(input) with pre-calculated number.
shl ..., 0xd shr ..., 0x11 shl ..., 0x5 cmp ..., expect
Only expect is non-static, so we only have to look for it when this method is used. Then, we can solve using the following python code:
[x for x in range(0xff) for y in [(x << 0xd) ^ x] for z in [(y >> 0x11) ^ y] if z ^ (z << 5) == expect]
- Multiplication OR. The program calculates a 5th degree polynomial (quintic) equation and check if the input is one of the roots/solutions. For example:
lea rax, [rdx - 0x5f] lea rcx, [rdx + 0xe] imul rax, rcx lea rcx, [rdx + 0x8a] imul rax, rcx lea rcx, [rdx - 0x97] imul rax, rcx lea rcx, [rdx - 0xb4] imul rax, rcx test rax, rax
Since the flag should be in readable ASCII, we can limit the solutions to the range [0x20, 0x7F]. While all those roots are valid solutions, the expected input here would be 0x5f.
Decrypt Operation
Through multiple layers, it is consistent that there would be:
- A
cqo
followed by anidiv ecx
that calculates moduloecx
- A
mul
with multiplier, located before thecqo
- An
add
with addend, located before themul
- An assignment
mov ecx, ...
that set the devisor in the modulo operation. We simply look for the nearest one above theidiv ecx
.
All the above decryption parameters are different at each layer, so we must parse them. Then, we can quickly decrypt data using Python list comprehension:
prev = ... # Previous value, init with the flag character in the current layer
[ch ^ (0xff & (prev := (overflowFix(prev * multiplier, 8) + addend) % modulo)) for ch in encrypted_data]
Semi-automated solve
I scripted in Python and used Capstone library for disassembling. I implemented detection and parsing for most of the parameters described above, except the Verify Operation part that has a lot of variations with complex patterns (I would do if I have time). Instead, I let the script prints the disassembly of the Verify Operation and prompts me its solution. Since each layer can be time-consuming to solve, I save its solution in a flag file and load it on later rerun.
Reconstructed image of my working terminal. Left: The script logs the parameters it found at stage 8, then asks me the solution of the Verify Operation. Top-right: A Python REPL interprets list comprehensions that brute-force the solution for variant 2 and 3 of Verify Operation. Bottom-right: ZydisInfo prints information on dec edx
instruction.
One lucky thing I encountered was when I try addressing the location of main
function in binary. My solve script sets it to constant 0x10A0, but in reality some binaries have different start. For example, in the 22nd layer, the address of main
is actually 0x10B0, off by 0x10 bytes. But those 0x10 bytes still make valid instructions and does not corrupt the parsing of main
code.
Conclusion
The whole process took me 6 hours in total, with around 40 minutes running the script and just-in-time-solving the Verify Operation part. During the process, I reckoned that I could have run it through symbolic execution, but that would cost me a lot more time making the tool works, and I learned a lot more about x86 architecture this way.
This Matryoshka-like mechanism reminds me of a malware type that unpacks actual payload only on execution. One layer of packing is fine, but what about 100 (like this challenge)? or 1000? This is where symbolic execution is needed. angr has this feature and runs on Python (which is quick to script on), so learning it goes straight into my bucket list after the CTF.
Like PatriotCTF’s reduced_reduced_instruction_set, this challenges my understanding of both reversing and assembly skills. And I love it! I’m sincerely curious how they crafted this challenge.
You can check out my script at here. Hope you enjoy reading the writeup like how I felt when solving the challenge.