Like last year, I bring rev and misc to UMassCTF25. The ideas are not new, but require some good observations. I hope all live players enjoyed the experience or learned something new. For others, be sure to try the challenges first before reading the write-ups.

Source code and build can be found on my GitHub: dungwinux/umassctf25-src. With that, let’s dive in.

ToC

[rev] the void

Description

  • Final points: 486
  • Solves: 20

It’s just you and me.

SHA256: 45bb5f7188f9c3723b5d32b9a45ad48eae97966e350857d5a2162da25122b36c

Small note: Be aware that some antivirus programs might flag this binary. If you plan to run it, make sure you have Visual C++ Redistributable installed :)

Visual C++ latest (aka.ms)

Files:

Comment

I learned about the Heaven’s Gate 2 years ago, but I haven’t seen a challenge doing it in an inverse way. Honestly, there are no advantages in doing this, other than confusing decompiler and debugger.

The core logic is written in x86-32 assembly with AVX2. For the idea, I remember a removed feture on x86: Binary-coded Decimal (BCD). It is not useful for the majority of modern applications, but it is certainly something new to those who aren’t familiar with retro computing. To spice up the reversing a bit, I also added my own implementation of string as a diversion. And that’s all the ingredients. Testing wasn’t so easy, as some debuggers exhibit their weakness:

  • x64dbg failed to parse instruction and cannot continue after switching CPU mode
  • windbg executed correctly, but failed to render disassembly and AVX registers after the switch (WinAPI issue?)
  • lldb and gdb also ran program correctly, also failed to render YMM registers.
  • Only an unofficial build of gdb was able to print YMMs correctly

Fascinatingly, an unexpected issue in player’s solve was found out during the few final hours of the CTF. Due to my selection of flag, there is an overflow in BCD addition at the third-to-last character, making some brute force attempts failed. To see why, read the solution below.

Solution

Diving through code, we can see our main program reads input, runs it through function at 0x004017a0, and compares the output with a specific string. To move on, we need to decipher that function.

The simplified anatomy of the function is:

  1. Check if CPU supports AVX2
  2. Switch to compatibility mode (64-bit -> 32-bit)
  3. While the length >= 32, use AVX2 implementation, else use Intel BCD opcodes
  4. Switch to a smaller encoding: Subtract by 0x30, and limit at 75 (any larger number becomes 75)
  5. Convert each byte of input into BCD representation (19 (0x13) -> 0x19)
  6. Perform BCD addition with immediate based on byte index (0x19 + 0x02 -> 0x21)
  7. Write-back to destination
  8. Go back to step 3 if not done
  9. Return to long mode (32-bit -> 64-bit)

The switch of CPU mode could be recognized via the use of retf instruction. As x86-64 inherits most of the instructions from x86-32, the majority of the disassembly are still readable. We should see the step 4 as:

AVX2 x86-32
vmovdqu ymm5, ymmword [rdx + rsi]
vpsubb ymm5, ymm5, ymmword [rsp + 0x58]
vpaddusb ymm5, ymm5, ymm1
vpsubb ymm5, ymm5, ymm1
movzx ebx, byte ptr [esi + edx]
sub bl, 0x30
mov ax, 75
cmp al, bl
cmova eax, ebx

and step 5 as:

AVX2 x86-32
vpunpcklbw ymm6, ymm5, ymm0
vpmullw ymm6, ymm6, ymm2
vpsrlw ymm6, ymm6, 0xb
vpunpckhbw ymm7, ymm5, ymm0
vpmullw ymm7, ymm7, ymm2
vpsrlw ymm7, ymm7, 0xb
vpackuswb ymm7, ymm6, ymm7
vpaddb ymm7, ymm7, ymm7
vpsllw ymm6, ymm7, 3
vpsubb ymm5, ymm5, ymm7
vpaddb ymm7, ymm7, ymm7
vpaddb ymm7, ymm7, ymm7
vpsubb ymm5, ymm5, ymm7
vpor ymm7, ymm5, ymm6
mov cx, 0xa04 ; 2564

div ch
shl al, cl
or al, ah

However, to recognize the intention of BCD addition is not straightforward, especially with AVX2 instructions seen in the disassembly. However, even if you failed to fully convince your tool to read them as x86-32, there is a tiny hint:

Cutter disassembly

The line with invalid starts with 0x27, which is the opcode for DAA Decimal Adjust AL After Addition. This opcode is used in conjunction with ADD to perform BCD addition. You can click the link to see how it is implemented. The AVX2 implementation can be seen roughly following the same idea, except that the CF=1 case is not accounted since it is impossible to overflow before adjustment with our encoding.

The first addend is our input converted into our encoding. The second addend can be reconstructed and represented through the following Python code (which I named in my solve script, confusingly, offset):

lo_offset = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]
hi_offset = [0, 1, 2, 3, 4, 5, 6, 7]
offset    = [((hi << 4) | lo) for hi in hi_offset for lo in lo_offset]

With those understandings, we can reverse the work by writing a BCD subtraction. This version of mine also includes underflow handling.

def bcd_sub(a: int, b: int):
    """BCD Subtraction. For 8-bit only"""

    a_ = 100 + int(f'{a:x}')
    b_ = int(f'{b:x}')
    ret = int(f'{a_ - b_}'[-2:], 10)
    # Verbose
    print(f'0x{a:x} ({a_ % 100}) - 0x{b:x} ({b_}) -> 0x{ret}')
    return ret

Running bcd_sub through the expected output, we can derive what the input (aka the flag) should be:

Expected Offset Derived input
0x0 (0) 0x0 (0) 0x0 0
0x21 (21) 0x1 (1) 0x20 D
0x22 (22) 0x2 (2) 0x20 D
0x4 (4) 0x3 (3) 0x1 1
0x77 (77) 0x4 (4) 0x73 y
0x52 (52) 0x5 (5) 0x47 _
0x30 (30) 0x6 (6) 0x24 H
0x11 (11) 0x7 (7) 0x4 4
0x70 (70) 0x8 (8) 0x62 n
0x29 (29) 0x9 (9) 0x20 D
0x47 (47) 0x0 (0) 0x47 _
0x1 (1) 0x1 (1) 0x0 0
0x66 (66) 0x2 (2) 0x64 p
0x71 (71) 0x3 (3) 0x68 t
0x61 (61) 0x4 (4) 0x57 i
0x34 (34) 0x5 (5) 0x29 M
0x67 (67) 0x10 (10) 0x57 i
0x13 (13) 0x11 (11) 0x2 2
0x65 (65) 0x12 (12) 0x53 e
0x33 (33) 0x13 (13) 0x20 D
0x61 (61) 0x14 (14) 0x47 _
0x55 (55) 0x15 (15) 0x40 X
0x24 (24) 0x16 (16) 0x8 8
0x23 (23) 0x17 (17) 0x6 6
0x24 (24) 0x18 (18) 0x6 6
0x23 (23) 0x19 (19) 0x4 4
0x57 (57) 0x10 (10) 0x47 _
0x75 (75) 0x11 (11) 0x64 p
0x78 (78) 0x12 (12) 0x66 r
0x13 (13) 0x13 (13) 0x0 0
0x69 (69) 0x14 (14) 0x55 g
0x81 (81) 0x15 (15) 0x66 r
0x24 (24) 0x20 (20) 0x4 4
0x50 (50) 0x21 (21) 0x29 M
0x69 (69) 0x22 (22) 0x47 _
0x73 (73) 0x23 (23) 0x50 b
0x97 (97) 0x24 (24) 0x73 y
0x72 (72) 0x25 (25) 0x47 _
0x87 (87) 0x26 (26) 0x61 m
0x30 (30) 0x27 (27) 0x3 3
0x75 (75) 0x28 (28) 0x47 _
0x40 (40) 0x29 (29) 0x11 ;
0x34 (34) 0x20 (20) 0x14 >
0x68 (68) 0x21 (21) 0x47 _
0x57 (57) 0x22 (22) 0x35 S
0x86 (86) 0x23 (23) 0x63 o
0x97 (97) 0x24 (24) 0x73 y
0x29 (29) 0x25 (25) 0x4 4
0x88 (88) 0x30 (30) 0x58 j
0x54 (54) 0x31 (31) 0x23 G
0x64 (64) 0x32 (32) 0x32 P
0x57 (57) 0x33 (33) 0x24 H
0x0 (0) 0x34 (34) 0x66 r
0x38 (38) 0x35 (35) 0x3 3
0x91 (91) 0x36 (36) 0x55 g

Thus, the flag is: UMASS{0DD1y_H4nD_0ptiMi2eD_X8664_pr0gr4M_by_m3_;>_Soy4jGPHr3g}

[misc] pandora

Description

  • Final points: 498
  • Solves: 9

Surprise, surprise!

SHA256: 4FE8721994E47EF16546380A01EBE66624D57B00DAE478C90DAB1F88278F6072

Note: The file can be considered malicious. Handle with care.

Files:

Comment

This is certainly what I expected when making the challenge:

Comment on CTF discord server

While brainstorming ideas, I made three observations of the zip file, which later become the three pillars of the challenge:

  1. You can have entries with duplicated names. Archive tool will try to unzip both and ask for confirmation from the second time due to the possibilty of overwrite.
  2. Each entry is compressed individually in Zip archive (antonym is solid). In other words, you can seek and decompress file seperately.
  3. DEFLATE has a maximum compression ratio of 1032:1. Meanwhile, BZip2 works well with exceptionally high repetition of characters.

Originally, I planned to block unzipping by putting huge file right before the flag in the archive. However, 7-Zip sees compressing everything is inefficient and decides to store instead of compress the flag. Instead, I decided to create a sparse file, with the data stored at the beginning. But it would be boring to just put flag in one file, right? So I hid the flag among what compression algorithm can do best: plain text - which leads us to Project Gutenberg.

But the longest step in the making process is compressing eveything, which took me ~25 hours doing naively through 7-Zip. All build commands can be found in my GitHub. Since the ideas are laid out, it should be obvious what to do.

Solution

In case you are new, this is called a zipbomb. Unless you have enough resources (perhaps in the near future), it would hurt your memory and disk to extract everything, so you are expected to seek and partially extract each file in the zip. There are many solutions beyond mine, as it rather depends on how your selected tool parses zip.

Using libzip

libzip comes with useful ziptool that you can use to seek to each entry then stream the decompression. Since the rest are \0 and would take long to decompress, you can exit program base on timeout. After checking each entry, you should find the 40th one containing the flag. You can find Bash script and PowerShell solution below.

seq 41 | xargs -i timeout 0.1 ziptool archive.zip cat "{}" | grep UMASS
0..41 | % -Parallel { busybox timeout 0.5 ziptool archive.zip cat $_ | sls UMASS }
7-Zip + Python

While you can also specify entry and print out its content using Python, there is an issue in zipfile library implementation which will decompress all entries in your memory before you’d even get there. A workaround is to delete entry after reading it, which can be done manually through 7-Zip by selecting the entry and press Delete key on your keyboard. Eventually, you should also find the flag in the 40th entry.

And the flag should be: UMASS{That script, it's got life in it. Working, it is. No doubt.}