iutCTF 2024 (III): Ghost of the Past misc Challenge
Writeup for my misc challenge Ghost of the Past challenge from iutCTF24.
Ghost of the past
TL;DR Timing attack on trie data structure.
#!/usr/local/bin/python
import random
import hashlib
print('Welcome Welcome welcome. Want the flag? Predict my letters and you will get it!')
def get_word(sz):
    ret = ''
    for _ in range(sz):
        toss = random.randint(1, 10)
        if toss <= 5:
            ret += 'a'
        else:
            ret += 'b'
    return ret
def MD5(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)
sz = 50
word = get_word(sz)
mod = int(1e9 + 7)
def process_query(p, s):
    global word
    minSum = 1<<2048
    st, ed = 0, len(s) - 1
    while ed < len(word) and ed <= p:
        s_ = word[st : ed + 1]
        Sum = abs(MD5(s) - MD5(s_))
        minSum  = min(minSum, Sum)
        st, ed = st + 1, ed + 1
    return (pow(minSum, 42069) % mod + random.randint(1<<254, 1<<256)) % mod
for _ in range(sz):
    q = input('Enter your query: ').split()
    p, s = int(q[0]), q[1]
    if len(s) > sz:
        print('What a dumb thing to ask!')
        exit()
    print('Here is your answer:', process_query(p, s))
predicted = input('Moment of truth. Enter your prediction: ')
if word == predicted:
    print(open('flag.txt', 'r').read())
else:
    print('Lo053r!')
The goal is to predict a word with $50$ letters generated by the get_word function. If we take a close look at that function, we can see that the word it generates only contains the letters “a” and “b”. We have $50$ queries to interact with the server and predict the word.
Each interaction with the server requires us to provide an integer $p$ and a string $s$ which returns us a hint (huge number modulo $10^9 + 7$). The hint is generated by the process_query function.
Understanding the process_query function
Assume the word we have to guess is $W$. We send as query $p$, $s$. The string $s$ is used to do some sort of searching (we will see in a bit) inside the word $W$. The role of $p$ is to restict the search bound. That is, if the length of $W$ is $n$, i.e $W = w_1w_2\cdots w_n$, the search is going to take place only upto $w_p$ where $p \le n$. What happens at the search is that,
- A window equal to length of $s$ is taken from the starting position of the word $W$.
 - That is, two indices $st, ed$ is chosen where $st = 0, ed = \text{size}(s)-1$.
 - Let the window be $s’ = w_{st}\cdots w_{ed}$. The window is hashed and subtracted from the hash of the query word $s$. In oher words, $|\text{H}(s’) - \text{H}(s)|$ is calculated where $| \ |$ denotes absolute value.
 - At each iteration the variables $st, ed$ is increased by one and the hash difference is caluclated in the same way as explained in the previous step. This repeats until $ed > p$.
 - Of all the calculated differences, the minimum one is taken. That value(suppose $x$) is raised to a huge power ($42069$) and added with some huge random value before being reduced modulo $10^9 + 7$. The value $y = x^{42069} + rand(2^{254}, 2^{256}) \mod (10^9 + 7)$ is returned.
 
Thinking of a solution
The first step is to notice the pow function. In python, pow can work in two ways:
pow(a, b): Calculates $a^b$ with a time complexity of $O(b)$. Linear in terms of $b$.pow(a, b, p): Calculates $a^b \mod p$ with a time complexity of $O(\log_2{b})$ steps. Logarithmic in terms of $b$. Uses something called binary exponentiation.
Which means that pow(a, b) is extremely slower compared to pow(a, b, p). Suppose we have $a = 35, b = 42069, p = 10^9 + 7$.
| Function | Operations | Python Execution Time | 
|---|---|---|
pow(35, 42069) | $42069$ | $> 3 \ s$ | 
pow(35, 42069, 10^9 + 7) | $\log_2(42069) \approx 15$ | $< 0.5 \ s$ | 
This significant time difference is what we are going to leverage in this problem. A thing to note is that $pow(a, b)$ will work faster only if $a = 0$ or $a = 1$ in which case there will be some internal optimization and the result will be returned immediately.
Let $W = abaaabb$ and we have somehow managed to recover the prefix $aba$. How do we transition to the next state $abaa$? That is, how do we predict that the next character is $a$? We can send $q = 4$ to restrict the search within the window $abaa$ of $W$. Let use see what happens when we send $s = abaa$ vs when we send $s = abab$.
| q | s | minSum | pow(minSum, 42069) | Python Execution Time | 
|---|---|---|---|---|
| $4$ | $abaa$ | $0$ | $0$ | $<0.5 \ s$ | 
| $4$ | $abab$ | $> 0$ | $> 0$ | $>3 \ s$ | 
Using this time differenec, we can predict that the next character is going to be $a$. In this same way we can progress prefix by prefix and predict every single character. This is exactly like a trie data structure, where we can check if a string is a prefix of another string.
from tqdm import tqdm
import time
io = ProcessWrapper(['python', '/content/server.py'])
sofar = ''
pos = 0
io.recvline()
sz = 256
for _ in tqdm(range(sz)):
  io.recvuntil(': ')
  to_send = str(pos) + ' ' + sofar + 'a'
  io.sendline(to_send)
  start = time.time()
  io.recvline()
  T = time.time() - start
  if T <= 0.5:
    sofar += 'a'
  else:
    sofar += 'b'
  pos += 1
io.recvuntil(': ')
io.sendline(sofar)
print(io.recvline())
iutctf{t1m1n6_4774cK_0n_Tr1e_D474_57ruC7ure}