TSG have a wonderful CTF for 2023, although I consider the challenges extremely hard. I consumed most of play time in solving one - T the weakest - because it is the coolest challenge I have ever played this year.

Description

Author: m1kit

18 solves

EN

T「AHHHHHH!」 S「LOOKS LIKE T IS DEFEATED」 G「HEH. HE’S ALWAYS THE WEAKEST OF THE BIG ONE HUNDRED.」 C「LOSING TO A MERE MORTAL. WHAT A DISGRANCE TO TSG-ER.」

JP

T「グアアアア」 S「Tがやられたようだな…」 G「フフフ…奴は百天王の中でも最弱…」 C「人間ごときに負けるとはTSGerの面汚しよ…」

Approach

I love this challenge even from its description. It is funny, and that is what CTF is supposed to be.

Here, we have a Linux ELF x86_64 binary, t_the_weakest. Since this is a reversing challenge, let’s not bother checksec-ing it.

Putting it in the reversing framework (I put into Cutter), the program looks simple. However, some instructions looks like they are not normally generated by a C compiler. Interestingly, when I put it in Ghidra, the decompiled code was different from Cutter. But the decompiler in Cutter is also ghidra! That is why the suspicion is raised.

Weird decompilation in Cutter

When I look at the for loop, I realized something strange. For unknown reasons, we are having a loop that starts from data.0012c053 (an address?) then goes to 0, which makes no senses.

Disassembly - Copy Operation

Turns out, the decompiler incorrectly deduced that 0x12c053 is an address. Looking at the x86 disassembly, the program is looping using one-instruction-loop instruction rep. The variant is movesb, so we are dealing with copying value of an 8-bit array from one place to another 8-bit array. The source is rsi (pointing to a place in data section) and the destination is rdi (pointing to a place in stack). We are looping ecx (0x12c053) times, which means it is also the length of the array. Adding 0x12c053 with 0x2d, we get 0x12c080 (the size of stack allocation), which means we are copying full data onto the stack. The size is huge, so I doubt this is the flag. Let’s call this part “Copy Operation”.

At this point, I decided to work on the rest of the challenge by relying on x86 disassembly instead.

After that, we have a simple cmp that checks the first character in the second argument of the program execution, which in here is supposed to be T, then use that value for later part. Let’s call this part “Verify Operation”.

Disassembly - Decrypt Operation

Eventually, we are looping again (this time without using rep), then do several calculation. I first thought this to be the fast division trick that a compiler would pull, but unfortunately this is just a series of calculation for a chaining cipher (using the previous result to decrypt the next value). Three parameters involve in this cipher, an addend, a multiplier, and a modulo divisor. The initialization value is the value that was checked in Verify Operation. Let’s call this part “Decrypt Operation”.

Disassembly - Execute Operation

Later that, we have a syscall with id 0x13f, a write to an address which was returned from the syscall of id 319, a sprintf, then an execv. Let’s call this part “Execute Operation”.

You might be curious when I have name for each part. Surprisingly, when we decrypt the data section (by reconstructing the “Decrypt Operation”), we find another ELF binary! Inspecting the binary in the reversing framework, we find a very similar structure: Copy Operation, Verify Operation, Decrypt Operation, and Execute Operation. This is confirmed when we look up the syscall: 319 is sys_memfd_create, and man said it is used to create a temporary file. Basically, for Execution Operation, we are writing the decrypted program into a temporary file, then execute it using same program argument but the second argument is off by 1. The workflow of the program(s) is as below:

Copy <-----
V         |
Verify    |
V         | (Fork)
Decrypt   |
V         |
Execute----

At this point, we can deduce that the flag is inputted via the second argument. I project that each program would only check one character of the flag, and it could go on forever, so I decided to script in Python.

Observe the pattern

Since we know the flag follows the standard format (TSGCTF{}), we are guaranteed to check the first 7 versions of the program. The Execute Operation mostly remain the same. The dynamic parts are Copy Operation, Verify Operation and Decrypt Operation. The observations were done in Cutter.

Copy Operation

Through multiple layers, we find that:

  • This part is always ended with the rep movsb instruction.
  • lea rdi, ... and lea rsi, ... are always a few lines above. The values are different between layers.
  • mov ecx, ... would be anywhere between the start of main function and rep movsb. Since we only care the final result, we pick the closest one.
  • lea rsi, ... would point to the data section. For the first 21 layers, it is constantly 0x2018 in the binary. Later, the offset changes so I have to parse the assembly instead.

    For example, in the 22nd layer, we have lea rsi, [rip + 0xf58] when disassembling raw code. The address for the data would be 0xf58 bytes after the instruction. rip points to right after the highest byte of the instruction in memory. The instruction is located at main+0x17, the instruction size is 7, and main in this binary starts at 0x10B0. Thus, we can find the data in the binary at 0x10B0 + 0x17 + 7 + 0xf58 = 0x2026.

Verify Operation

This part would spread out from the start of main function to before Decrypt Operation. This will always check for only one character, the first one in argv[1]. If it is correct, it will decrypt the program using it as a initialization value, else it will jump to a different function that prints out ng (NGですか?). There are only 4 variations:

  1. One instruction. The program simply does a cmp with a constant value, then does a mov of same value to input it into the Decrypt Operation. (Why mov again?)
     cmp edx, expect
     mov edx, expect
    
  2. Modulo. The program checks the modulo result of input with two divisors.
     idiv devis1
     cmp ..., expect1 / dec ...
     je ..., ...
     idiv devis2
     cmp ..., expect2 / dec ...
     je ..., ...
    

    (The / in code above means diferent variations.) In one case, I was confused when I cannot find the instruction that changes register flag before test, until I run ZydisInfo and learn that dec also modifies the register flags. We can brute-force using the following python code:

     [x for x in range(0xff) if x % devis1 == expect1 and x % devis2 == expect2]
    
  3. Xorshift32 (but on 64-bit number). The program compares the output of Xorshift32(input) with pre-calculated number.
     shl ..., 0xd
     shr ..., 0x11
     shl ..., 0x5
     cmp ..., expect
    

    Only expect is non-static, so we only have to look for it when this method is used. Then, we can solve using the following python code:

     [x for x in range(0xff) for y in [(x << 0xd) ^ x] for z in [(y >> 0x11) ^ y] if z ^ (z << 5) == expect]
    
  4. Multiplication OR. The program calculates a 5th degree polynomial (quintic) equation and check if the input is one of the roots/solutions. For example:
     lea     rax, [rdx - 0x5f]
     lea     rcx, [rdx + 0xe]
     imul    rax, rcx
     lea     rcx, [rdx + 0x8a]
     imul    rax, rcx
     lea     rcx, [rdx - 0x97]
     imul    rax, rcx
     lea     rcx, [rdx - 0xb4]
     imul    rax, rcx
     test    rax, rax
    

    Since the flag should be in readable ASCII, we can limit the solutions to the range [0x20, 0x7F]. While all those roots are valid solutions, the expected input here would be 0x5f.

Decrypt Operation

Through multiple layers, it is consistent that there would be:

  • A cqo followed by an idiv ecx that calculates modulo ecx
  • A mul with multiplier, located before the cqo
  • An add with addend, located before the mul
  • An assignment mov ecx, ... that set the devisor in the modulo operation. We simply look for the nearest one above the idiv ecx.

All the above decryption parameters are different at each layer, so we must parse them. Then, we can quickly decrypt data using Python list comprehension:

prev = ...  # Previous value, init with the flag character in the current layer
[ch ^ (0xff & (prev := (overflowFix(prev * multiplier, 8) + addend) % modulo)) for ch in encrypted_data]

Semi-automated solve

I scripted in Python and used Capstone library for disassembling. I implemented detection and parsing for most of the parameters described above, except the Verify Operation part that has a lot of variations with complex patterns (I would do if I have time). Instead, I let the script prints the disassembly of the Verify Operation and prompts me its solution. Since each layer can be time-consuming to solve, I save its solution in a flag file and load it on later rerun.

Reconstruction of my working terminal

Reconstructed image of my working terminal. Left: The script logs the parameters it found at stage 8, then asks me the solution of the Verify Operation. Top-right: A Python REPL interprets list comprehensions that brute-force the solution for variant 2 and 3 of Verify Operation. Bottom-right: ZydisInfo prints information on dec edx instruction.

One lucky thing I encountered was when I try addressing the location of main function in binary. My solve script sets it to constant 0x10A0, but in reality some binaries have different start. For example, in the 22nd layer, the address of main is actually 0x10B0, off by 0x10 bytes. But those 0x10 bytes still make valid instructions and does not corrupt the parsing of main code.

Conclusion

The whole process took me 6 hours in total, with around 40 minutes running the script and just-in-time-solving the Verify Operation part. During the process, I reckoned that I could have run it through symbolic execution, but that would cost me a lot more time making the tool works, and I learned a lot more about x86 architecture this way.

This Matryoshka-like mechanism reminds me of a malware type that unpacks actual payload only on execution. One layer of packing is fine, but what about 100 (like this challenge)? or 1000? This is where symbolic execution is needed. angr has this feature and runs on Python (which is quick to script on), so learning it goes straight into my bucket list after the CTF.

Like PatriotCTF’s reduced_reduced_instruction_set, this challenges my understanding of both reversing and assembly skills. And I love it! I’m sincerely curious how they crafted this challenge.

You can check out my script at here. Hope you enjoy reading the writeup like how I felt when solving the challenge.