iutCTF 2024 (III): Ghost of the Past misc Challenge
Writeup for my misc challenge Ghost of the Past challenge from iutCTF24.
Ghost of the past
TL;DR Timing attack on trie data structure.
#!/usr/local/bin/python
import random
import hashlib
print('Welcome Welcome welcome. Want the flag? Predict my letters and you will get it!')
def get_word(sz):
ret = ''
for _ in range(sz):
toss = random.randint(1, 10)
if toss <= 5:
ret += 'a'
else:
ret += 'b'
return ret
def MD5(s):
return int(hashlib.md5(s.encode()).hexdigest(), 16)
sz = 50
word = get_word(sz)
mod = int(1e9 + 7)
def process_query(p, s):
global word
minSum = 1<<2048
st, ed = 0, len(s) - 1
while ed < len(word) and ed <= p:
s_ = word[st : ed + 1]
Sum = abs(MD5(s) - MD5(s_))
minSum = min(minSum, Sum)
st, ed = st + 1, ed + 1
return (pow(minSum, 42069) % mod + random.randint(1<<254, 1<<256)) % mod
for _ in range(sz):
q = input('Enter your query: ').split()
p, s = int(q[0]), q[1]
if len(s) > sz:
print('What a dumb thing to ask!')
exit()
print('Here is your answer:', process_query(p, s))
predicted = input('Moment of truth. Enter your prediction: ')
if word == predicted:
print(open('flag.txt', 'r').read())
else:
print('Lo053r!')
The goal is to predict a word with $50$ letters generated by the get_word
function. If we take a close look at that function, we can see that the word it generates only contains the letters “a” and “b”. We have $50$ queries to interact with the server and predict the word.
Each interaction with the server requires us to provide an integer $p$ and a string $s$ which returns us a hint (huge number modulo $10^9 + 7$). The hint is generated by the process_query
function.
Understanding the process_query
function
Assume the word we have to guess is $W$. We send as query $p$, $s$. The string $s$ is used to do some sort of searching (we will see in a bit) inside the word $W$. The role of $p$ is to restict the search bound. That is, if the length of $W$ is $n$, i.e $W = w_1w_2\cdots w_n$, the search is going to take place only upto $w_p$ where $p \le n$. What happens at the search is that,
- A window equal to length of $s$ is taken from the starting position of the word $W$.
- That is, two indices $st, ed$ is chosen where $st = 0, ed = \text{size}(s)-1$.
- Let the window be $s’ = w_{st}\cdots w_{ed}$. The window is hashed and subtracted from the hash of the query word $s$. In oher words, $|\text{H}(s’) - \text{H}(s)|$ is calculated where $| \ |$ denotes absolute value.
- At each iteration the variables $st, ed$ is increased by one and the hash difference is caluclated in the same way as explained in the previous step. This repeats until $ed > p$.
- Of all the calculated differences, the minimum one is taken. That value(suppose $x$) is raised to a huge power ($42069$) and added with some huge random value before being reduced modulo $10^9 + 7$. The value $y = x^{42069} + rand(2^{254}, 2^{256}) \mod (10^9 + 7)$ is returned.
Thinking of a solution
The first step is to notice the pow
function. In python, pow
can work in two ways:
pow(a, b)
: Calculates $a^b$ with a time complexity of $O(b)$. Linear in terms of $b$.pow(a, b, p)
: Calculates $a^b \mod p$ with a time complexity of $O(\log_2{b})$ steps. Logarithmic in terms of $b$. Uses something called binary exponentiation.
Which means that pow(a, b)
is extremely slower compared to pow(a, b, p)
. Suppose we have $a = 35, b = 42069, p = 10^9 + 7$.
Function | Operations | Python Execution Time |
---|---|---|
pow(35, 42069) | $42069$ | $> 3 \ s$ |
pow(35, 42069, 10^9 + 7) | $\log_2(42069) \approx 15$ | $< 0.5 \ s$ |
This significant time difference is what we are going to leverage in this problem. A thing to note is that $pow(a, b)$ will work faster only if $a = 0$ or $a = 1$ in which case there will be some internal optimization and the result will be returned immediately.
Let $W = abaaabb$ and we have somehow managed to recover the prefix $aba$. How do we transition to the next state $abaa$? That is, how do we predict that the next character is $a$? We can send $q = 4$ to restrict the search within the window $abaa$ of $W$. Let use see what happens when we send $s = abaa$ vs when we send $s = abab$.
q | s | minSum | pow(minSum, 42069) | Python Execution Time |
---|---|---|---|---|
$4$ | $abaa$ | $0$ | $0$ | $<0.5 \ s$ |
$4$ | $abab$ | $> 0$ | $> 0$ | $>3 \ s$ |
Using this time differenec, we can predict that the next character is going to be $a$. In this same way we can progress prefix by prefix and predict every single character. This is exactly like a trie
data structure, where we can check if a string is a prefix of another string.
from tqdm import tqdm
import time
io = ProcessWrapper(['python', '/content/server.py'])
sofar = ''
pos = 0
io.recvline()
sz = 256
for _ in tqdm(range(sz)):
io.recvuntil(': ')
to_send = str(pos) + ' ' + sofar + 'a'
io.sendline(to_send)
start = time.time()
io.recvline()
T = time.time() - start
if T <= 0.5:
sofar += 'a'
else:
sofar += 'b'
pos += 1
io.recvuntil(': ')
io.sendline(sofar)
print(io.recvline())
iutctf{t1m1n6_4774cK_0n_Tr1e_D474_57ruC7ure}