Hack The Boo 2023 - Competition - write-ups
It seems like HackTheBox nailed another good CTF this year. The challenges are beginner-friendly while at the same time being trendy. I will highlight cool ones in this blog post. My solve scripts will be available at here.
[rev] SpookyCheck (Medium)
We were given check.pyc
, which is a compiled Python binary. What can we do with this type of file? Since the file contains Python bytecode, we can either disassemble it or decompile it.
Previously, we have used Decompyle++ for Python bytecode, so let’s use it again!
$ pycdc check.pyc
# Source Generated with Decompyle++
# File: check.pyc (Python 3.11)
KEY = b'SUP3RS3CR3TK3Y'
CHECK = bytearray(b'\xe9\xef\xc0V\x8d\x8a\x05\xbe\x8ek\xd9yX\x8b\x89\xd3\x8c\xfa\xdexu\xbe\xdf1\xde\xb6\\')
def transform(flag):
return enumerate(flag)()
def check(flag):
Error decompyling check.pyc: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
Surprisingly, decompiling this file triggers an assertion in Decompyle++. The error seems to be from std::vector
of C++ STL, which is not so helpful in this situation.
Rather than fixing the decompiler, I decided to disassemble the file using pycdas
, which should read the file as-is. And I love reading assembly!
Since the disassembly output is long, I will quote important parts that help solving the challenge.
First, we find the symbol list used at global:
'KEY'
'bytearray'
'CHECK'
'transform'
'check'
'__name__'
'print'
'input'
'inp'
'encode'
Cross-checking the partial output from the decompiler, we at least know that KEY
and CHECK
are two byte-strings.
CHECK
is probably for final comparison with our input. We don’t know what KEY
is for yet.
Then there are transform
and check
functions. Let’s see what they are!
Note ahead: you can find documentations for Python opcodes here.
transform
[Disassembly]
0 RESUME 0
2 LOAD_CONST 1: <CODE> <listcomp>
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 1: NULL + enumerate
18 LOAD_FAST 0: flag
20 PRECALL 1
24 CALL 1
34 GET_ITER
36 PRECALL 0
40 CALL 0
50 RETURN_VALUE
The disassembly of this function is almost as simple as what the decompiler guesses. But wait! There is a line that loads <CODE> <listcomp>
into the stack then creates a function from it. For those who don’t know, listcomp
refers to a feature in Python called List Comprehension, a syntactic sugar for transforming any iterable into a list. The ability is similar to a higher-order function, but the functionality is close to a lambda.
After that, transform
loads function pointer enumerate
and variable flag
, then use CALL 1
to call a function with one argument. Behind the scene, CALL
finds on the stack NULL
at the beginning, enumerate
as the callable, and flag
as a positional argument in ascending order, pops all of them then eventually pushes the execution result of enumerate(flag)
onto the stack. (More details on CALL
opcode can be found here). At transform+34
, GET_ITER
converts the result into an iterable, then we see another CALL
. With the state of the stack, we know this is a call to the anonymous <listcomp>
function. Let’s see what the function does.
<listcomp>
[Disassembly]
0 RESUME 0
2 BUILD_LIST 0
4 LOAD_FAST 0: .0
6 FOR_ITER 54 (to 116)
8 UNPACK_SEQUENCE 2
12 STORE_FAST 1: i
14 STORE_FAST 2: f
16 LOAD_FAST 2: f
18 LOAD_CONST 0: 24
20 BINARY_OP 0 (+)
24 LOAD_CONST 1: 255
26 BINARY_OP 1 (&)
30 LOAD_GLOBAL 0: KEY
42 LOAD_FAST 1: i
44 LOAD_GLOBAL 3: NULL + len
56 LOAD_GLOBAL 0: KEY
68 PRECALL 1
72 CALL 1
82 BINARY_OP 6 (%)
86 BINARY_SUBSCR
96 BINARY_OP 12 (^)
100 LOAD_CONST 2: 74
102 BINARY_OP 10 (-)
106 LOAD_CONST 1: 255
108 BINARY_OP 1 (&)
112 LIST_APPEND 2
114 JUMP_BACKWARD 55
116 RETURN_VALUE
The main part we can see here is a loop that covers most of the function: FOR_ITER
jumps to <listcomp>+116
after competing the loop. Before that, we see BUILD_LIST
, which means the top of the stack and probably what we are working on is a list. In the loop, we have:
- Unpacking two values from 1st argument:
i
andf
. We know that<listcomp>
is called onenumerate(flag)
that produces the new iterator with index in the value (for example,[4, 7, 9]
into[(0, 4), (1, 7), (2, 9)]
), soi
means index here, and f is the element offlag
. - From
<listcomp>+16
to<listcomp>+26
,f
and24
are pushed on the stack, then it callsBINARY_OP
, which means it pops the 2 stack values (in ascending order), performs operation on them, then pushes the result back. The argument forBINARY_OP
here is+
, so the it pushesf + 24
. Eventually, the value would be(f + 24) & 255
(A). - From
<listcomp>+30
to<listcomp>+86
, similarly withCALL
andBINARY_OP
, we know that the code callslen(KEY)
and computesi % len(KEY)
. What aboutBINARY_SUBSCR
? This is similar toBINARY_OP
but with[]
operation (for example, when the stack is: [str
,3
], the result isstr[3]
). Eventually, the value would beKEY[i % len(KEY)]
(B). - From
<listcomp>+96
to<listcomp>+108
, we have((A ^ B) - 74) & 255
.LIST_APPEND
then puts the result into another list that we have created beforehand at<listcomp>+2
.
In summary, this list comprehension can be expressed in Python as:
[
(
(
((f + 24) & 255) ^ KEY[i % len(KEY)]
) - 74
) & 255
for i, f in enumerate(flag)]
This is the core of the transform
function. With speculation, we can predict that check
and the rest of the program would be checking transform(user_input) == CHECK
. In the list comprehension above, XOR is reversible, modulo would wrap around, and thus transform
is a bijection function (one-to-one and onto), which means we can inverse it and find the flag.
[pwn] pinata (Easy)
This is not a first pwn that I solve, but the first one to have writeup on this blog. This would be a reference to any of my future pwn solve.
Running checksec
on our target, we identify that this program has no PIE and Stack is executable. Immediately, those made me think of ret2shellcode.
Arch: amd64-64-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX unknown - GNU_STACK missing
PIE: No PIE (0x400000)
Stack: Executable
RWX: Has RWX segments
Let’s speculate the decompiled view of the program.
The main
function is simple. Since we are looking forward to exploiting the program using user input, let’s look at reader
.
Jackpot! We found a classic overflow bug in the program: gets
. With the input allocated at rbp-0x10
and rbp
is pushed at the beginning of the function, we need to override 0x10 + 0x8 = 0x18
bytes before reaching the return address.
Now that we are allowed to change the return address and stack is executable, how can we return to the stack without knowing its address? The answer is simple: push rsp
. What this x86 instruction does is pushing the rsp
value onto the stack, and rsp
is exactly the top of the stack (before pushing). Combining with a ret
, we can change the rip
to rsp
, which is exactly what we wanted. Using rp++
, we can find the ROP gadget we want at 0x418c22
(push rsp; ret
). We are lucky not having to find the dynamic address since there are no PIE.
With the instruction pointer changing to the stack, what code do we write on the stack? As basic as it can be, I did an execve
syscall with /bin/sh
as the filename. The syscall number for execve
is 59. For argv
and envp
, I set them to a null pointer. Since "/bin/sh\0"
fits on 8 bytes (Little-endian 0x68732f6e69622f), one trick I tried was to push it onto the stack and use rsp
as pointer to the string. The following is the final shellcode for const char *const tmp[] = {NULL}; execve("/bin/sh", tmp, tmp);
:
mov rdi, 0x68732f6e69622f
push rdi
mov rdi, rsp
push 0
mov rsi, rsp
mov rdx, rsp
mov rax, 59
syscall
(Tips: I used Keystone Engine, a lightning-fast assembler library)
And the payload we send to the program would be:
payload = b"A" * 0x18 + p64(push_rsp) + shellcode
With this, you should be able to access the shell and do cat flag.txt
.
[pwn] claw_machine (Medium)
In retrospective, this is not a hard challenge, but certainly an one-step-up from pinata. Starting the program, we can see a simple claw game, and it asks for feedback after the game ends.
The game is impossible play, so we would not play fair either ;)
Here is the checksec
of the program.
Arch: amd64-64-little
RELRO: Full RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
RUNPATH: b'./glibc/'
No hopes for shellcoding. Since there are few input fields after the game ended, let’s search for them in the decompiler.
Turns out they are in the fb
function.
The problems in this function are:
printf((int64_t)&buf + 3)
, which is format string vulnerability, andread(0, &var_58h, 0x5e)
, which we can use to overflow buffer.
Looking at the symbol list of the program, we also find the read_flag
function, so this challenge is likely ret2win.
Alternatively, it is also possible to leak the libc address by using the format string vulnerability, but I did not find a good place to jump to.
Here is the devised plan.
- Leak the stack canary and return address using the format string vulnerability. Because the target OS of the binary is x86_64 Linux, arguments after 6th one would be stored on the stack. This function allocates 0x80 bytes, while the stack canary is at
rbp-0x10
, thus the distance is 0x70 bytes. Therefore, the canary is at the 21st argument (1 + 6 + (0x70 / 8) = 21
). The return address is 0x10 bytes after it, so it would be in the 23rd argument. - Bypass the canary check using the leaked stack canary. It is at
rbp-0x10
in the function frame. - Calculate the
main
address based on the return address that is stored when program callfb
, which is subtracting the return address with 53 (main+53
is right aftercall fb
inmain
). - Override the return address with
read_flag
address that is calculated based on the leakedmain
address. Using the buffer overflow bug, we can write starting fromrbp-0x50
, with the maximum size of 0x5e bytes.
The only “troublesome” part is that the distance between our vulnerable buffer and the return address is 0x58 bytes. This means we can override all the allocated memory after rbp-0x50
on stack (0x48 bytes), stack canary, saved rbp
, and only the least significant (bottom) 6 bytes of the return address. However, because we do not jump to stack nor heap address which has different upper 2-byte, this is not a big deal.
The payload for the format string would be: b'%21$p!%23$p!'
. I used !
for easier
parsing of the output.
And the payload for the buffer overflow would be:
payload = (b''
+ b'A' * 0x48 # buffer
+ p64(canary) # stack canary
+ b'R' * 8 # saved rbp
+ p64(e.sym["read_flag"]) # return address
)[:0x5e]
The flag should be printed out when we successfully exploit the program.