
The Universal Instruction Stream: Why All Code is (Theoretically) The Same

Updated: Jul 16



As programmers, our purpose is to define sequences of instructions. That's our craft, our raison d'être. But what if I told you that, at a fundamental level, all code is identical? What if every program could be written and executed using the exact same instruction stream? This isn't about interpreters where a VM's instructions execute different programs; that's old hat. This is about the processor executing precisely the same sequence of instructions, regardless of what "program" it's running. It might sound like a philosophical crisis, but it's a fascinating reality we've explored.



Beyond the MMU: The M/o/Vfuscator's Role


Before we dive in, let's clarify. Some might point to the x86 MMU's Turing-completeness, where programs can run with no instructions at all, with the MMU doing the heavy lifting. While true, that's "elsewhere" computation. My goal was to demonstrate universal instruction streams across any architecture, where the processor itself executes the identical sequence.


Our journey to this revelation begins with a rather audacious tool: the M/o/Vfuscator. This ridiculous compiler transforms any C program into nothing but x86 mov instructions. Now, these aren't identical movs – registers, operands, addressing modes, and sizes vary. But the key is, they're all movs. This isn't just a party trick; it's a crucial first step. By reducing programs to a single instruction type, we drastically simplify the instruction stream, paving the way for further homogenization. The M/o/Vfuscator outputs a continuous loop of these mov instructions.
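To make that concrete, here's a toy C sketch of the write-steering trick a mov-only compiler can use to fake an if/else without jumping: both arms always execute, and the condition merely picks whether each arm's store lands in the live variable or a throwaway scratch cell. (This illustrates the general idea only; the names and the x + 1 / x - 1 arms are mine, and the real M/o/Vfuscator also performs its arithmetic with mov-driven table lookups rather than add.)

```c
#include <stdint.h>

/* Fake `if (cond) live = x + 1; else live = x - 1;` with no branch:
 * both arms run, but each store is steered by an indexed "mov" to
 * either the live cell or a discard cell. */
static uint32_t mov_style_if(uint32_t cond, uint32_t x) {
    uint32_t live = 0;
    uint32_t discard = 0;
    uint32_t *target[2] = { &discard, &live };  /* index 1 = keep the write */

    *target[cond != 0] = x + 1;  /* "then" arm: sticks only when cond is true  */
    *target[cond == 0] = x - 1;  /* "else" arm: sticks only when cond is false */
    (void)discard;               /* nobody ever reads the discard cell */
    return live;
}
```

Both stores happen on every call; only the addresses differ. That's the shape of everything that follows.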



Homogenizing the movs: Towards RISC-like Simplicity


Even with just movs, x86 still offers a bewildering variety of addressing modes: from mov eax, edx to mov dl, [esi+4*ecx+0x19afc09], and everything in between. Many architectures lack such complexity. To achieve true universality, we needed to simplify these diverse movs into a uniform, 4-byte, indexed addressing form, using as few registers as possible. This mimics the simple load/store operations typical of RISC architectures.


For instance, a mov eax, edx (register-to-register) can be transformed. We can write edx to a scratch memory location (e.g., [0x10000+esi]) and then read it back into eax. Similarly, partial register operations like mov al, [0x20000] require padding and careful manipulation to ensure all reads and writes are 4-byte indexed operations. Complex addressing modes are broken down by the M/o/Vfuscator itself, using a series of mov operations to perform shifts and additions, ultimately reducing them to a simple mov with an indexed address.
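As a sketch of that register-to-register rewrite, here's a tiny C model (the buffer name and size are my own): the direct transfer becomes one indexed 4-byte store followed by one indexed 4-byte load, which is exactly the uniform load/store shape every other mov gets squeezed into.

```c
#include <stdint.h>

static uint32_t scratch[64];  /* stands in for the [0x10000 + esi] scratch area */

/* Model of `mov eax, edx` rewritten as two uniform indexed movs. */
static uint32_t reg_to_reg_via_memory(uint32_t edx, uint32_t esi) {
    scratch[esi] = edx;           /* write: mov [scratch + 4*esi], edx */
    uint32_t eax = scratch[esi];  /* read:  mov eax, [scratch + 4*esi] */
    return eax;
}
```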


After these transformations, all our reads look like mov esi/edi, [base + esi/edi] and all our writes like mov [base + esi/edi], esi/edi. By strategically inserting dummy reads and writes, we further normalize the instruction stream to consist only of alternating reads and writes. The only remaining variations are the choice of register and the base address.



The Pseudo-MOVE Interpreter: Abstracting Registers to Memory


This simplified stream opens the door for a truly non-branching mov interpreter. We elevate CPU registers to "virtual," memory-based registers. This means we treat esi and edi as labels on 4-byte memory locations, rather than hardcoding logic for specific CPU registers. Our program now becomes a list of these pseudo-MOVE operations, each a tuple of memory addresses representing source, destination, and virtual index registers.


The interpreter itself is a short, fixed instruction sequence that executes these pseudo-MOVE tuples. The physical esi register holds the address of the current tuple. The interpreter loop performs the following:

  1. Reads the source and destination addresses from the current tuple, effectively "dereferencing" the virtual registers.

  2. Performs the actual mov operation from the computed source to the computed destination.

  3. Loads esi with the address of the next tuple to execute.

  4. Jumps back to the beginning of the loop.

This creates the final system: a universal, single instruction stream. The operand list (our program's logic) is generated by the compiler, and this single, unchanging interpreter loop is appended to it.

In x86 assembly, the interpreter loop looks like this:

start:
    mov esi, operands   ; Initialize pointer to the first operand tuple

loop:
    mov ebx, [esi+0]    ; Read part 1 of source address (e.g., virtual base)
    mov ebx, [ebx]      ; Dereference virtual base to get actual base
    add ebx, [esi+4]    ; Add offset (e.g., virtual index register value)
    mov ebx, [ebx]      ; Dereference to get the final source data

    mov edx, [esi+8]    ; Read part 1 of destination address
    mov edx, [edx]      ; Dereference virtual base for destination
    add edx, [esi+12]   ; Add offset for destination

    mov [edx], ebx      ; Perform the actual data transfer (the "MOVE")

    mov esi, [esi+16]   ; Load ESI with address of the NEXT tuple
    jmp short loop      ; Loop endlessly

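The same loop is easy to model in C (struct and field names here are my own, with one liberty: a NULL next pointer lets the model halt, where the real loop never does). Each pseudo-MOVE tuple holds the addresses the assembly above reads at [esi+0] through [esi+16], and a single unchanging loop body shovels one word per tuple:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Tuple Tuple;
struct Tuple {
    uint32_t **src_base; /* [esi+0]:  cell holding the source base address */
    size_t     src_off;  /* [esi+4]:  word offset added to that base       */
    uint32_t **dst_base; /* [esi+8]:  cell holding the destination base    */
    size_t     dst_off;  /* [esi+12]: word offset for the destination      */
    Tuple     *next;     /* [esi+16]: address of the next tuple to execute */
};

/* The whole "processor": one fixed loop body, no per-program logic. */
static void run(Tuple *t) {
    while (t) {
        uint32_t *src = *t->src_base + t->src_off; /* deref base, add offset */
        uint32_t *dst = *t->dst_base + t->dst_off;
        *dst = *src;                               /* the one real MOVE     */
        t = t->next;                               /* mov esi, [esi+16]     */
    }
}
```

Two tuples that copy mem[0] into mem[1] and mem[2], say, differ only in the addresses they carry; the loop body executed for each is byte-for-byte identical.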

The Existential Implication


What does this mean? It means all C programs, reduced and compiled through this process, resolve to this exact instruction stream. The instructions are simple, making them adaptable to other architectures. Crucially, there are no conditional branches in this interpreter loop: the single jump at the bottom is unconditional and always taken, so the precise sequence of instructions executed by the processor is identical for every program. The program's logic isn't in the processor's branching decisions; it's externalized, distilled into a list of memory addresses that this mundane, endless data transfer loop processes.


So, if our job is to "code," and all "code" can be made equivalent, does our job become less interesting? Perhaps. But the essence of the program isn't gone; it's simply been removed from the immediate execution flow of the processor. It's diffused into a list of memory addresses.


When execution loses its branching meaning, and logic is distilled to nothing but data transfers, a programmer's job is no longer to "code." It's to "data!" It's a profound shift in perspective, isn't it?


You can find this project, including the reducing compiler and examples of AES and Minesweeper running with these identical instructions, on GitHub. It's a fun rabbit hole, I promise.


For a more in-depth treatment, check out PoC || GTFO volume 12 (starting at page 28)


Watch my Shakacon Presentation here


And, to read more on my single instruction compiler check out this blog post: M/o/Vfuscator


Christopher Domas (@xoreaxeaxeax)
