obfuscation

Attacking a stack based JavaScript virtual machine

Analysing Pixelmelt's Javascript Virtualisation Obfuscation - A Research Study

Ciarán

16 Jul 2025 • 13 min read

Artistic rendition of how I was asked to help with project

I was asked by my friend @PixelMelt to help out with a research project he was working on, focused on the analysis of a few JavaScript obfuscators. I happily agreed to take part and was asked to analyse one of three different obfuscators:

Obfuscator.io: Likely the most common JavaScript obfuscator.
JS-Confuser: A newer obfuscator with some nice techniques.
PISTOL: PixelMelt's JavaScript Virtualisation Obfuscation.

I chose to go with PISTOL as I am already familiar enough with the others, and VM based obfuscation is much more interesting than transformation obfuscation.

Preface:

You are going to be given 3 JavaScript samples to attempt to retrieve a flag from, easy, medium, and hard. 
You will stop after attempting to retrieve the flag from each sample either after 120 minutes pass, or you find the flag. 
The flag is a string in the format “FLAG{}”. 
Once you complete each test you will fill out the following form with your results. 
You should start with the easiest sample and work up to hard. 
You will have until June 28th to submit the results.

Pix did also make it clear that the primary goal of this study was not just to successfully reverse the samples, but to gather useful and thoughtful feedback on the challenges and techniques used. The analysis and insight provided being more important than just recovering the flag.
With this information I set out downloading the PISTOL samples 1.0.js, 2.o.js & 3.o.js, then writing down a few goals I had before starting.

Follow along:

If you would like to follow along with this post or attempt to solve the samples yourself beforehand (little competition between us lol) you can find the samples and my full solutions here:

Goals:

Complete all 3 challenges within the 120 minute time constraint
Reverse each sample properly but I also want to try and attack it in ways that I am maybe not expected to
We are supposed to be emulating an attacker, so thinking like an attacker I want to try the simplest possible attack vectors first
It will be funny if I can guess a flag :>

`1.0.js`: Swimming in the Constant Pool

After downloading the challenges, I saw what appeared to be Terser output, a popular aggressive JavaScript mangler and compressor. I ran the sample which revealed in its default state the program returns false, our goal is likely to make the program return true.

Establishing this goal I skimmed the code, making some observations:

The sample abuses quirks such as invoking functions via JavaScript's backticks e.g. V``
Around the bottom I noticed that what I assumed to be the VMs bytecode. It looked to be base64 encoded.
module.check at the very bottom seems to be what validates our flag, but check isn't defined anywhere.

Using that information, we want to try and look for base64 related functions to determine the VM entry point as it will need to be decoded first. We also know that module.check must be registered by the VM, we maybe want to do some tracing to try and figure out where its created.

Now we know what we're looking for, I started by doing a simple Ctrl + f search for "atob" and "base64" which didn't yield any immediate results. So I decided to look a bit closer, searching for logic which contained functions that are commonly used for string manipulation and came across 2 functions.

Function 1:

R = ((A, e = 0) => {
  for (l = A[E], n = (3 * l + 1) >> 2, a = e ? (n % 1 ? -~n : n / e) * e : n, f = new w(a), i = 0, _ = 0, s = 0; s < l; s++)
    if (((p = 3 & s), 
      (i |= ((r = A[s].charCodeAt(0)), ((r > 64) & (r < 91) ? r - 65 : (r > 96) & (r < 123) ? r - 71 : (r > 47) & (r < 58) ? r + 4 : 43 == r ? 62 : 47 == r ? 63 : 0) << (6 * (3 - p))
    )), 3 == p || l - s == 1)) {
      for (d = 0; d < 3 && _ < a; d++, _++) f[_] = (i >>> ((16 >>> d) & 24)) & 255;
      i = 0;
    } return f;
})(e)

This is the Base64 decoder, it's manually implemented but has the same character-to-value lookup table as standard b64 implementations.

Ctrl + clicking on the identifer R reveals that it is passed to an IIFE, which just so happens to be our second function. 👀

Function 2:

x = ((A) => {
	let e = 0;
	for (; 35 != A[e] || 74 != A[e + 1] || 13 != A[e + 2] || 91 != A[e + 3]; )
		e++;
	e += 4;
	let t = new c(A[U][b](e, e + 2))[0],
		r = new g(A[U][b](e + 2, e + 6))[0];
	e += 6;
	let l = new c(A[U][b](e, e + 6 * t)),
		n = {};
	for (let A = 0; A < l[E]; A += 3)
		n[(l[A + 2] << 16) | l[A + 1]] = String.fromCharCode(l[A]);
	e += 6 * t;
	let o = '',
		a = 0,
		f = 0,
		i = 0;
	for (let t of A[b](e))
		for (let A = 7; A >= 0 && i < r; A--) {
			(a = (a << 1) | ((t >> A) & 1)), f++, i++;
			let e = (a << 16) | f;
			n[e] && ((o += n[e]), (a = 0), (f = 0));
		}
	return o.split('|~|');
})(R)

It's pretty clear this function is doing some more manipulation on our now-decoded bytecode, and notably returning a string split at the separator |~|. This really stood out to me, so I added a console.log to figure out what it was.

Running the program a second time the console outputted:

VM62:96 FLAG{l3vel-0ne}|~|check
VM62:96 FLAG{l3vel-0ne}|~|check
false

Solving the challenge with the first try with the flag living in the constant pool.🎉

| Time taken (HH-MM): 00:05

`2.o.js`: Chasing a Rolling Key

Since the flag was so simple, in the first challenge and the constraints of the challenge allow for flag retrieval "by any means necessary". I followed the "Keep it Simple, Stupid" (KISS) principle, I attempted to run it again with a guessed flag. The previous flag was FLAG{l3vel-0ne}, so a logical guess was FLAG{l3vel-Tw0}.

Picking at the Seams

Glancing over the code once again, it looked virtually identical to the previous sample, so I decided to once again inspect the constant pool, outputting:

length|~|charCodeAt|~|check

This suggested the program's logic would:

length - Perform a length check on our input.
charCodeAt - Iterate through the character codes of the flag.
check - Register the check function.

Another "Find Usages" on the IIFE revealed it was only called by a single instruction.
Pulling up the chrome debugger and breakpointing this instruction revealed the first few operations:

The VM accessed check to create the module.check function, just as it had before.
It indexed length and the program terminated.

My assumption was that my empty input flag was failing the length check immediately. With this info I had two paths forward:

Bruteforce the length by trying different inputs and counting instructions executed.
Inspect the stack to see if the correct length was ever directly compared.

I chose the latter, as it would likely reveal more information.

A Glimmer in the Stack

Upon inspecting the stack, before any length-related logic even executed, something caught my eye; a newly created array of numbers.
[207, 130, 193, 133, 250, 206, 188, 200, 237, 148, 255, 134, 254, 222, 226, 144]

My first instinct was to map these to characters, but the result was a jumble of unprintable data: Ï\x82Á\x85úÎ¼Èí\x94ÿ\x86þÞâ\x90. This screamed "encrypted data." I mentally tagged it as the encrypted_array and kept digging.

Just after the length string was pulled from the constant pool, two numbers were pushed to the stack. The second was 16 the exact same length as my encrypted_array. The pieces were starting to click.

I set my input flag to a dummy value of the correct length, FLAG{xxxxxxxxxx}, and re-ran the debugger. Stepping through I saw the VM performed a strict not-equal (!==) check between my flag's length and 16, then took the logical NOT of the result (false). This inverted logic meant my flag of length 16 passed the first check. 🎉

Unraveling the Cipher

My working theory was that the program XORed the character codes of my input against the encrypted_array. So I tried to find any XOR related opcodes and placed breakpoints on them.

My assumption was close, but slightly off. Stepping through the execution revealed the following sequence for the first character, 'F':

The flag FLAG{xxxxxxxxxx} and charCodeAt are popped from the stack to get the character code of the character at index 0.

<- POPPED: FLAG{xxxxxxxxxx}
<- POPPED: charCodeAt
<- POPPED: 0
-> PUSHED: 70  // charCode for 'F'

This character code is then XORed with 137. I'd overlooked it before, but 137 was pushed to the stack right before the encrypted_array. This was our initial key.

<- POPPED: 70
<- POPPED: 137
-> PUSHED: 207

This result, 207, is then XORed against the character's index in the string (0).

<- POPPED: 207
<- POPPED: 0
-> PUSHED: 207

Finally, this result is compared against the first element of our encrypted_array. So, encrypted_array was really validation_array.

Following the execution for the second character, 'L' which was accessed sequentially, revealed the key wasn't static; it was a rolling key.

<- POPPED: 76
<- POPPED: 207
-> PUSHED: 131

The character code for 'L' (76) is retrieved, then it's XORed against the result of the previous operation. This result now serves as the key for the current character.

Reversing the final algorithm

The full validation logic was essentially doing:

result = (key ^ char_code) ^ index;
is_valid = (result === validation_arr[index]);

Since XOR is a reversible operation we can do the exact opposite to generate our flag:

char_code = (validation_array ^ index) ^ key;

Mapping the result to a string, revealed the final flag: FLAG{1ts-part-2}. 🎉

| Time taken (HH-MM): 00:17

`3.o.js`: When Tracers Lie

The third challenge began with familiar routine, with the script once again looking very similar to the previous samples.

Running the script revealed another length check, this time for 20 characters, using the same strict not-equal (!==) comparison as before. I padded my input to the correct length, then started my stack inspection.

This is where the routine broke down. The volume of data being pushed and popped was overwhelming; it was impossible to follow the logic mentally. I needed to see the machine's instructions, not just its data. I decided to pivot and build a naive tracer by asking Google Gemini to inject console.log statements for each opcode. In retrospect, I should have done this myself... :/

The trace worked, but it produced over 18,000 lines of output. While still massive, it was a structured view of the program's execution and we could hunt for patterns.

A Fingerprint: Key-Scheduling Algorithm

Almost immediately, a repeating block of operations at the very beginning of the trace caught my eye.

[PUSH_CONST] 256
[GET_CLOSURE_VAR] id: 12
[LT] 256 < 0
[BRANCH_IF_FALSE] cond: true target: 553
[PUSH_CONST] 257
[GET_CLOSURE_VAR] id: 12
[GET_CLOSURE_VAR] id: 11
[GET_CLOSURE_VAR] id: 9
[MUL] 6 * 251
[ADD] 0 + 1506
[MOD] 257 % 1506
[SET_CLOSURE_VAR] id: 9, scope: 0, value: 221
...
[INC] 0 + 1
[SET_CLOSURE_VAR] id: 12, scope: 0, value: 1
...
[JUMP] target: 350

This pattern of MUL -> ADD -> MOD was executing inside a loop that incremented from 0 to 255. Reconstructing this into a high-level representation:

// Closure variables
let S = [];       // v7
let j = 251;      // v9 (initial value from first iteration)
const C1 = 6;     // v11 (constant value from first iteration)

// Loop from i = 0 to 255
for (let i = 0; i < 256; i++) {
  // Update state variable 'j'
  j = (i + C1 * j) % 257;
  // Store the new 'j' in array 'S' at the current index 'i'
  S[i] = j;
}

My pattern-recognition alarms went off. This reminded me of a modified version of the Key-Scheduling Algorithm (KSA) from the RC4 stream cipher. The program was initialising a state array, or S-box, for a symmetric encryption scheme.

for i from 0 to 255
    S[i] := i
endfor
j := 0
for i from 0 to 255
    j := (j + S[i] + key[i mod keylength]) mod 256
    swap values of S[i] and S[j]
endfor

RC4 Psudeocode for reference

Sanitising the S-Box

After this first loop, the KSA array was fully populated but contained duplicate values. The program entered a second 256-iteration loop.

[PUSH_FALSE]
[PUSH_CONST] "fill"
[PUSH_CONST] 256
[PUSH_CONST] "Array"
[GET_GLOBAL] d[Array]
[NEW] constructor: 256 args count: 1
[CALL_PROP] obj: false prop: fill args count: 1
[SET_CLOSURE_VAR] id: 14, scope: 1, value: (256) [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, …]
[POP_MULTI] count: 1
[PUSH_CONST] 256
[PUSH_CONST] "Array"
[GET_GLOBAL] d[Array]
[NEW] constructor: 256 args count: 1
[SET_CLOSURE_VAR] id: 16, scope: 1, value: (256) [empty × 256]
[POP_MULTI] count: 1
[PUSH_CONST] 0
[SET_CLOSURE_VAR] id: 18, scope: 1, value: 0
[POP_MULTI] count: 1
[PUSH_CONST] 0
[SET_CLOSURE_VAR] id: 19, scope: 1, value: 0

This section initialises some new variables:

id: 14: An array of 256 false values. Let's call this unconfident_array.
id: 16: A new empty array of size 256. Let's call it KSA2.
id: 18: A counter initialised to 0. Let's call this fallback_counter.
id: 19: Another counter initialised to 0. This is the main loop counter.

Tracing its logic revealed a sanitisation process:

It iterates through the initial KSA.
For each value v = KSA[i], it checks if v has been seen before using the unconfident_array.
If v is new, it's placed into a second array, KSA2, at the current index i, and marked as "seen".
If v is a duplicate, the program uses the fallback_counter to find the next available integer (from 0-255) that hasn't been used, places that into KSA2[i], and marks it as seen.

The goal of this entire process was to create KSA2: a valid permutation where every number from 0-255 appears exactly once. This was the final, clean S-box for the cipher.

The Cipher's Core

With the key schedule established, the final validation loop began.

[SET_CLOSURE_VAR] id: 21, scope: 1, value: 83
[CREATE_ARRAY] with 20 items from stack
[SET_CLOSURE_VAR] id: 23, scope: 1, value: (20) [85, 39, 30, ...]
[SET_CLOSURE_VAR] id: 25, scope: 1, value: 83

It initialised a state variable (id: 21) to 83 and loads a hardcoded 20-byte array, which I tagged target_data. This was the ciphertext my input needed to produce.

For each character of my 20-character input, the VM performed the following steps:

Calculate an index into the S-box: x = state ^ i ^ charCodeAt(i)
Get the resulting keystream byte: result = KSA2[x]
Check its valid: is_valid = (result === target_data[i])
Update the state for the next round: state = result

Reversing the final algorithm and a Painful Twist

Since the entire cipher was based on XOR and S-box lookups, it was reversible. To find the original character code, I just needed to work backward:

For a given target_data[i], find the index j in KSA2 such that KSA2[j] == target_data[i].
This j is the result of the initial XOR chain: j = state ^ i ^ charCodeAt(i).
Solve for the unknown: charCodeAt(i) = state ^ i ^ j.
Remember that the state for the next iteration is simply the current target_data[i].

I wrote a solver script to perform the KSA generation and then run this decryption logic...

Which gave me the flag FLAG{chster-thr2e2}...

...what?

// This array (closure var 7) is generated by the first major loop
const initial_ksa = [
  221, 42, 254, 242, 171, 3, 24, 151, 143, 96, 72, 186, 100, 99, 94, 65, 149,
  140, 87, 27, 182, 85, 18, 131, 39, 2, 38, 255, 16, 125, 9, 85, 28, 201, 212,
  22, 168, 17, 140, 108, 174, 57, 127, 34, 248, 248, 249, 256, 42, 44, 57, 136,
  97, 121, 9, 109, 196, 205, 3, 77, 8, 109, 202, 247, 4, 89, 86, 69, 225, 134,
  103, 175, 94, 123, 41, 64, 203, 10, 138, 136, 125, 60, 185, 165, 46, 104, 196,
  235, 213, 82, 68, 242, 2, 105, 210, 70, 2, 109, 238, 242, 10, 161, 40, 86,
  106, 227, 183, 177, 142, 190, 222, 158, 32, 48, 145, 214, 115, 36, 77, 67, 8,
  169, 108, 0, 124, 98, 200, 42, 123, 96, 192, 255, 120, 82, 112, 36, 95, 193,
  11, 205, 85, 137, 193, 16, 240, 43, 147, 1, 154, 45, 163, 101, 244, 75, 90,
  181, 214, 156, 66, 41, 149, 27, 67, 51, 213, 158, 86, 169, 154, 65, 46, 190,
  27, 78, 128, 172, 180, 229, 10, 239, 72, 99, 5, 213, 177, 219, 215, 192, 55,
  5, 220, 226, 6, 229, 26, 94, 246, 131, 213, 192, 67, 89, 222, 250, 162, 149,
  72, 125, 187, 46, 229, 43, 213, 206, 165, 177, 250, 175, 240, 117, 151, 99,
  45, 236, 98, 42, 221, 11, 37, 194, 109, 114, 145, 75, 170, 227, 56, 59, 78,
  193, 113, 148, 102, 84, 234, 107, 117, 178, 31, 178, 33, 192, 119, 196, 145,
  97,
];

const KSA2 = new Array(256);
const unconfident_array = new Array(256).fill(false);
let fallback_counter = 0;

for (let i = 0; i < 256; i++) {
  const value_from_ksa = initial_ksa[i];
  if (!unconfident_array[value_from_ksa]) {
    KSA2[i] = value_from_ksa;
    unconfident_array[value_from_ksa] = true;
  } else {
    while (unconfident_array[fallback_counter]) {
      fallback_counter++;
    }
    KSA2[i] = fallback_counter;
    unconfident_array[fallback_counter] = true;
  }
}

// The target ciphertext array (closure var 23)
const target_data = [
  85, 39, 30, 68, 77, 34, 203, 235, 240, 137, 195, 215, 230, 71, 201, 110, 203,
  226, 129, 216,
];
// correct array
// [85,39,30,68,77,34,203,232,240,137,195,215,230,71,201,110,203,226,129,216]

// The initial state (closure var 21)
const initial_state = 83;

let flag = "";
let state = initial_state;
target_data.map((keystream_byte, i) => {
  const j = KSA2.indexOf(keystream_byte);

  // Original: j = state ^ i ^ charCode
  // Reversed: charCode = state ^ i ^ j
  const charCode = state ^ i ^ j;
  flag += String.fromCharCode(charCode);
  state = keystream_byte;
})

console.log("Recovered Flag:", flag);

solve-flag.js

Not being weary of how much time I had left I started re-analysing. I thought I must have missed some novel technique that was influencing the target_data somehow. The target_data array was the perfect length for our flag, but the flag string result FLAG{chster-thr2e2} was 19 characters, we needed 20?

After trying and failing to understand where I had went wrong, I finally noticed I had ran out of time :( and just guessed the flag: "chster"? That had to be "chapter." Our flag FLAG{ch4pter-thr2e2}, passed in the original VM but not my traced version?

I had the flag, but the fact that it only worked in the untampered VM revealed that the issue was with my own tooling.

Postmortem

The most frustrating part of this challenge was this near-miss. My logic was fine but the solver seemingly for no reason produced a bad result. The AI-generated trace was the culprit.

In its task to add logging into the VM's instructions, Gemini had made a single, almost imperceptible change. It had converted one character in the Base64-encoded bytecode from an 'A' to a 'Y'.

'A' in Base64 is index 0 (6-bit 000000).
'Y' in Base64 is index 24 (6-bit 011000).

This two-bit flip cascaded through the decoding process, corrupting a single byte in the hardcoded target_data array, arguably the worst thing it could have broken. The value 232 became 235. My reliance on an LLM for a "simple" but critical task introduced a bug that was nearly impossible to spot, leading me down a rabbit hole of debugging my already correct logic. This experience was a harsh lesson in trusting black-box tools: the blame lies with me for not validating the output. :/

| Time taken (HH-MM): 01:00

Did I achieve my goals?:

Goal	Success
Complete all challenges within time constraint	❌
Attack in novel way	✅
Simple attack vectors	✅
Guess a flag	✅ *not how I hoped

Thanks

Big thank you to @PixelMelt for providing the opportunity to take part and the fun challenge! :>