iutCTF 2024 (III): Ghost of the Past misc Challenge

2024-04-22 886 words 5 minutes

Writeup for my misc challenge Ghost of the Past challenge from iutCTF24.

Ghost of the past

TL;DR Timing attack on trie data structure.

#!/usr/local/bin/python

import random
import hashlib

print('Welcome Welcome welcome. Want the flag? Predict my letters and you will get it!')

def get_word(sz):
    ret = ''
    for _ in range(sz):
        toss = random.randint(1, 10)
        if toss <= 5:
            ret += 'a'
        else:
            ret += 'b'
    return ret

def MD5(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

sz = 50
word = get_word(sz)
mod = int(1e9 + 7)

def process_query(p, s):
    global word
    minSum = 1<<2048
    st, ed = 0, len(s) - 1
    while ed < len(word) and ed <= p:
        s_ = word[st : ed + 1]
        Sum = abs(MD5(s) - MD5(s_))
        minSum  = min(minSum, Sum)
        st, ed = st + 1, ed + 1
    return (pow(minSum, 42069) % mod + random.randint(1<<254, 1<<256)) % mod

for _ in range(sz):
    q = input('Enter your query: ').split()
    p, s = int(q[0]), q[1]
    if len(s) > sz:
        print('What a dumb thing to ask!')
        exit()
    print('Here is your answer:', process_query(p, s))

predicted = input('Moment of truth. Enter your prediction: ')
if word == predicted:
    print(open('flag.txt', 'r').read())
else:
    print('Lo053r!')

The goal is to predict a word with $50$ letters generated by the get_word function. If we take a close look at that function, we can see that the word it generates only contains the letters “a” and “b”. We have $50$ queries to interact with the server and predict the word.

Each interaction with the server requires us to provide an integer $p$ and a string $s$ which returns us a hint (huge number modulo $10^9 + 7$). The hint is generated by the process_query function.

Understanding the `process_query` function

Assume the word we have to guess is $W$. We send as query $p$, $s$. The string $s$ is used to do some sort of searching (we will see in a bit) inside the word $W$. The role of $p$ is to restict the search bound. That is, if the length of $W$ is $n$, i.e $W = w_1w_2\cdots w_n$, the search is going to take place only upto $w_p$ where $p \le n$. What happens at the search is that,

A window equal to length of $s$ is taken from the starting position of the word $W$.
That is, two indices $st, ed$ is chosen where $st = 0, ed = \text{size}(s)-1$.
Let the window be $s’ = w_{st}\cdots w_{ed}$. The window is hashed and subtracted from the hash of the query word $s$. In oher words, $|\text{H}(s’) - \text{H}(s)|$ is calculated where $| \ |$ denotes absolute value.
At each iteration the variables $st, ed$ is increased by one and the hash difference is caluclated in the same way as explained in the previous step. This repeats until $ed > p$.
Of all the calculated differences, the minimum one is taken. That value(suppose $x$) is raised to a huge power ($42069$) and added with some huge random value before being reduced modulo $10^9 + 7$. The value $y = x^{42069} + rand(2^{254}, 2^{256}) \mod (10^9 + 7)$ is returned.

Thinking of a solution

The first step is to notice the pow function. In python, pow can work in two ways:

pow(a, b) : Calculates $a^b$ with a time complexity of $O(b)$. Linear in terms of $b$.
pow(a, b, p) : Calculates $a^b \mod p$ with a time complexity of $O(\log_2{b})$ steps. Logarithmic in terms of $b$. Uses something called binary exponentiation.

Which means that pow(a, b) is extremely slower compared to pow(a, b, p). Suppose we have $a = 35, b = 42069, p = 10^9 + 7$.

Function	Operations	Python Execution Time
`pow(35, 42069)`	$42069$	$> 3 \ s$
`pow(35, 42069, 10^9 + 7)`	$\log_2(42069) \approx 15$	$< 0.5 \ s$

This significant time difference is what we are going to leverage in this problem. A thing to note is that $pow(a, b)$ will work faster only if $a = 0$ or $a = 1$ in which case there will be some internal optimization and the result will be returned immediately.

Let $W = abaaabb$ and we have somehow managed to recover the prefix $aba$. How do we transition to the next state $abaa$? That is, how do we predict that the next character is $a$? We can send $q = 4$ to restrict the search within the window $abaa$ of $W$. Let use see what happens when we send $s = abaa$ vs when we send $s = abab$.

q	s	minSum	`pow(minSum, 42069)`	Python Execution Time
$4$	$abaa$	$0$	$0$	$<0.5 \ s$
$4$	$abab$	$> 0$	$> 0$	$>3 \ s$

Using this time differenec, we can predict that the next character is going to be $a$. In this same way we can progress prefix by prefix and predict every single character. This is exactly like a trie data structure, where we can check if a string is a prefix of another string.

from tqdm import tqdm
import time

io = ProcessWrapper(['python', '/content/server.py'])

sofar = ''
pos = 0

io.recvline()

sz = 256
for _ in tqdm(range(sz)):
  io.recvuntil(': ')
  to_send = str(pos) + ' ' + sofar + 'a'
  io.sendline(to_send)
  start = time.time()
  io.recvline()
  T = time.time() - start

  if T <= 0.5:
    sofar += 'a'
  else:
    sofar += 'b'
  pos += 1

io.recvuntil(': ')
io.sendline(sofar)

print(io.recvline())

iutctf{t1m1n6_4774cK_0n_Tr1e_D474_57ruC7ure}

iutCTF 2024 (III): Ghost of the Past misc Challenge

Ghost of the past

Understanding the process_query function

Thinking of a solution

Understanding the `process_query` function