# PLAN — Comptime Bytecode VM + flat memory (then re-home the compiler-API on it)

> **Direction change (2026-06-17).** The comptime compiler-API stream pivots off the
> **byte-weld**. The weld (sx structs whose layout is validated to mirror the
> compiler's Zig types) + the **serialization / marshaling** bridge at the call
> boundary is the wrong direction — it bolts a parallel layout regime and hand-built
> byte-copies onto a comptime value model that fundamentally isn't bytes. We strip it
> and build the right foundation: a **bytecode VM over flat, byte-addressable
> memory**, where comptime values ARE native bytes (like runtime). On that base the
> compiler-API needs no weld, no validation, no marshaling — the compiler's own types
> are read/built directly as memory and its functions take/return real pointers.
>
> Supersedes the build order in `design/comptime-compiler-api.md` (kept for history).
> This is the active plan for the stream. Branch: `reify`.

## Why

`src/ir/interp.zig` is a tree-walking interpreter over the SSA IR that represents every
value as a tagged `Value` union (`int`, `float`, `aggregate: []const Value`, `type_tag`,
`heap_ptr`, …). Two consequences:

1. **Slow.** Per-value boxing in a tagged union; per-op `switch` over `Inst`; an
   aggregate is a heap `[]const Value`, walked element-by-element.
2. **Not native memory.** A struct value is `[]const Value` (tagged unions), NOT the
   struct's bytes. So a comptime `@ptrCast(*StructInfo)` reads the `Value` union's
   memory, not a `StructInfo` — which forced the whole weld+marshal detour.

Make comptime values **native bytes in a flat memory** and both problems dissolve:
structs/arrays/slices are their bytes at natural layout (no weld), the compiler's own
records are directly addressable (no marshal), and a bytecode loop over flat memory is
fast.
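
A minimal sketch of that contrast, assuming Zig 0.11+ builtin syntax. The two-variant `Value` union is a cut-down stand-in for the real one, and `storeU64`/`loadU64` are hypothetical helpers written for this illustration, not the `Machine` API. It shows the same two-field record once as a slice of tagged values and once as 16 raw bytes with fields at offsets 0 and 8.

```zig
const std = @import("std");

// Cut-down stand-in for the tagged representation (illustrative only).
const Value = union(enum) {
    int: i64,
    aggregate: []const Value,
};

// Hypothetical helper: store a 64-bit word little-endian at `addr`.
fn storeU64(mem: []u8, addr: usize, word: u64) void {
    var i: usize = 0;
    while (i < 8) : (i += 1) {
        mem[addr + i] = @truncate(word >> @intCast(8 * i));
    }
}

// Hypothetical helper: load a 64-bit little-endian word from `addr`.
fn loadU64(mem: []const u8, addr: usize) u64 {
    var word: u64 = 0;
    var i: usize = 0;
    while (i < 8) : (i += 1) {
        word |= @as(u64, mem[addr + i]) << @intCast(8 * i);
    }
    return word;
}

test "a struct is its bytes in flat memory, not a list of tagged values" {
    // Tagged model: the "struct" is a slice of unions, so its memory is not
    // the record's own bytes.
    const fields = [_]Value{ .{ .int = 20 }, .{ .int = 22 } };
    const tagged = Value{ .aggregate = &fields };
    try std.testing.expectEqual(@as(i64, 22), tagged.aggregate[1].int);

    // Flat model: a hypothetical { a: u64, b: u64 } record at address 16,
    // with fields at offsets 0 and 8.
    var mem = [_]u8{0} ** 64;
    const base: usize = 16;
    storeU64(&mem, base + 0, 20);
    storeU64(&mem, base + 8, 22);
    try std.testing.expectEqual(@as(u64, 22), loadU64(&mem, base + 8));
    // The field really is a raw byte run starting at base + 8.
    try std.testing.expectEqual(@as(u8, 22), mem[base + 8]);
}
```

Saved to a file, this can be run with `zig test`. In the flat model a field access is just base + offset, which is what lets a record be handed across a call boundary by address.
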

## End state

- Comptime execution = a **bytecode VM** over a **flat linear memory** (real
  host-allocated bytes; layout is **target-aware** via the type table's sizes). Values
  are bytes at addresses plus a scalar register file. No tagged `Value` union.
- The comptime compiler-API: the compiler **exposes its real types + functions** to
  comptime sx. sx reads/builds them as native memory and calls compiler functions by
  pointer. No `abi(.zig)` weld, no `validateStructLayout`, no `register_struct`
  field-by-field marshaling — gone.
- `declare`/`define`/`type_info` and `#compiler`/`BuildOptions` ride this one mechanism;
  the bespoke interp arms are deleted.

## Principles (hold at every step)

- **Green at every step.** `zig build && zig build test` pass after each sub-step. The
  existing tagged-`Value` interpreter stays the live evaluator until the VM reaches
  corpus parity; swap behind a build flag, then delete the old path.
- **Target-aware, not host-baked.** Flat-memory layout uses the type table's target
  sizes (`pointer_size`, `typeSizeBytes`/offsets), NEVER host `@sizeOf`. This is what
  keeps cross-compilation correct (the JIT-comptime alternative could not).
- **Sandboxed.** Flat-memory accesses are bounds-checked; step/call-depth budgets
  remain; an OOB / bad access traps to a build-gating diagnostic with a source span —
  never a compiler-process crash.
- **No silent fallbacks** (per CLAUDE.md): an unhandled op / shape bails loudly with a
  named reason, never a zero/default that looks like success.

## Phases

### Phase 0 — Strip the weld / serialize / marshal machinery

Delete the wrong-direction code so the VM builds on a clean base. Pure removal + corpus
rebaseline; suite green.

- `src/ir/compiler_lib.zig`: the reflection (`weldStruct` / `bound_types` /
  `FieldLayout` / `BoundType`), the layout validation (`validateStructLayout` /
  `LayoutMismatch` / `SxField`). Decide the fate of the `bound_fns` host-call registry
  (`intern`/`text_of` handlers) — it is likely subsumed by the VM's compiler-call path
  in Phase 3, but `intern`/`text_of` may survive as the first such calls.
- `src/ir/lower/nominal.zig`: `validateWeldedStruct` + `weldedFieldOrderStr` + the
  `sd.abi == .zig` validation call in `registerStructDecl`.
- `src/ir/interp.zig`: the `compiler_welded` dispatch branch.
- `src/backend/llvm/ops.zig`: the `emitCall` comptime-only gate keyed on
  `compiler_welded` (re-derive the comptime-only guard from a non-weld signal if still
  needed).
- Corpus: retire / convert the weld examples + diagnostics — `0625`, `0627` (welded
  struct), `1183`, `1186` (weld-layout diagnostics), `1184`/`1185` (welded-fn). Keep
  `0626` (`intern`/`text_of` round-trip) only if it survives the new call path.
- **Keep (re-evaluate in Phase 3), independent of the weld semantics:** the
  `#library "compiler"` decl, the `abi(.x)` annotation + `extern ` syntax, and the
  `callconv → abi` unification. These are surface syntax that may still serve the
  compiler-API; only the *weld semantics* are stripped here.

**Verification:** `zig build test` green with the weld machinery gone; the surviving
syntax still parses (parser unit tests).

### Phase 1 — Flat-memory value model (still IR-walking, no bytecode yet)

Introduce flat memory and move comptime values onto it, **decoupled from bytecode** so
the value-model change is isolated. Each sub-step ports one op group and keeps the
corpus green; the OLD tagged path stays behind a build flag (`-Dcomptime-flat`) until
all groups land, then the shim is deleted.

1. **Machine + scalars.** A flat memory region (host `[]u8`) with a stack (frames) +
   bump-allocated heap, and a scalar register file. Port `int`/`float`/`bool`/`undef`
   and arithmetic/compare/branch. Aggregates still go through a compat shim to the old
   representation.
2. **Aggregates.** Structs/arrays/tuples laid out in flat memory at **target** layout;
   port `struct_init` / `struct_get` / `array` / `index_gep` to read/write bytes at
   computed offsets.
3. **Slices / strings.** `{ptr, len}` fat pointers in flat memory.
4. **Optionals / enums / tagged unions.** Tag + payload bytes.
5. **Pointers.** `alloca` / `store` / `load` / GEP unified onto flat addresses; retire
   `slot_ptr` / `heap_ptr` / `byte_ptr` in favor of flat-memory addresses.
6. **Closures.** Fn id + captured env materialized in flat memory.
7. **Extern / host calls.** A struct arg is already bytes → pass its address; this
   removes most of `marshalExternArg`.
8. **Reflection / minting.** `declare` / `define` / `type_info` read flat-memory
   values; type-table mutation copies escaping data into compiler-owned memory at the
   boundary (lifetime), as today.

**Verification:** with `-Dcomptime-flat` the full corpus (currently 692) is
byte-for-byte identical to the tagged path; then make flat the default and delete the
shim.

### Phase 2 — Bytecode

Compile a comptime function's IR → a compact bytecode and execute the bytecode instead
of walking `Inst`. Pure encoding/speed; semantics identical to Phase 1. Land at least a
minimal register-bytecode loop (the stream's stated goal is a *bytecode* VM); a
fragment cache is optional follow-up.

**Verification:** corpus identical to Phase 1; comptime throughput measurably improved
on a heavy-comptime micro-benchmark.

### Phase 1.final — host wiring (the remaining integration)

The wiring ENTRY POINT exists: `comptime_vm.tryEval(gpa, module, func_id) ?Value` runs
a comptime function entirely on the VM and returns a legacy `Value`, or `null` to fall
back. Unit-tested (pure `6*7` → 42; unsupported → null). Remaining to actually route
the host through it:

1. **Panic→error hardening (prerequisite).** `Machine.readWord`/`writeWord`/`bytes`
   currently `assert` (debug panic) on null/OOB. For arbitrary host functions to be
   safe, make them return `error.OutOfBounds` so a malformed run BAILS (→ null →
   legacy) instead of crashing the compiler. Ripples through
   `readField`/`writeField`/slice helpers (add `try`).
2. **Implicit context.** Host comptime functions may have `has_implicit_ctx` (param 0 =
   `*Context`); the legacy `run` materializes a default ctx. The VM `run` does not — so
   either materialize it too, or only route `tryEval` at funcs without implicit ctx.
3. **Wire one site** behind a flag/env (`SX_COMPTIME_FLAT`, → `-Dcomptime-flat` later):
   the const-init fold in `emit_llvm.zig` `emitGlobals`
   (`result = tryEval(...) orelse interp.call(...)`). Default off → corpus unaffected.
4. **Parity + coverage.** Run the corpus with the flag ON; results must be
   byte-identical to legacy. Measure how many comptime evals the VM already handles;
   the bail `detail`s name what to port next (tagged-union payload / any / closures /
   builtins).
5. Grow coverage (port the deferred ops + `call_builtin`/`compiler_call` via the
   bridge) until the VM is the default and the legacy path is deleted.

### Phase 3 — Compiler-API on flat memory (resume the stream — no weld)

With native-byte comptime values, re-home the compiler-API:

- **Expose the compiler's real types.** Register the actual `types.zig` records
  (`StructInfo`, `EnumInfo`, `Field`, …) into the comptime type table under sx-visible
  names, with their **real (host) layout** — the type IS the compiler's, so there is
  nothing to validate or keep in sync. (This is the projection that *replaces* the
  weld's reflection — owned by the compiler, not declared in sx.)
- **Expose the compiler's functions.** `register_struct`, `find_type`, `intern`,
  `text_of`, and the reflection readers operate on flat-memory pointers / handles
  directly (no marshaling — the bytes already ARE the record).
- **Re-express** `declare` / `define` / `type_info` as sx over these; delete the
  bespoke interp arms (`defineStruct` / `defineEnum` / `defineTuple` /
  `reflectTypeInfo`); migrate `examples/0622` (struct), `0619`/`0620`/`0623`
  (enum/tuple).
- **Migrate `BuildOptions`** off `#compiler` onto this mechanism; **delete
  `#compiler`**.

**Verification:** the metatype + `#compiler` surfaces are gone, re-expressed as sx over
the exposed compiler-API; full corpus green.

## Open questions (resolve as reached, record decisions here)

- **Host-ABI vs target-ABI split.** The compiler runs on the host, so its OWN exposed
  records are host-laid-out; user comptime types are target-laid-out. The flat-memory
  model must carry both regimes (a per-type ABI tag on layout queries). Confirm the
  boundary where a flat-memory pointer to a compiler record is handed to host Zig code
  uses host layout.
- **Exposing compiler types to sx.** Mechanism for projecting `types.zig` records into
  the comptime type table with real offsets (the non-weld replacement) — a registry
  the compiler owns, keyed by sx-visible name → real Zig type's layout + a host-call
  ABI.
- **Bytecode shape.** IR-derived vs a fresh ISA; register vs stack; fragment caching.
- **Pointer escape / lifetime.** Flat-memory pointers stored into the persistent type
  table must be copied into compiler-owned memory at the boundary (as today).
- **Old-path retirement.** Keep the tagged interpreter until Phase 1 parity, then
  delete — confirm no non-comptime consumer depends on `Value`.
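
To make the host-vs-target question concrete, here is a hedged sketch of a layout query carrying an ABI tag (Zig 0.11+ syntax assumed). `Abi`, `Field`, `pointerField` and `layoutStruct` are hypothetical names invented for this example, and it assumes pointer alignment equals pointer size, which the real type table may not. It lays out one record, `{ tag: u8, next: *T, count: u32 }`, under a 4-byte and an 8-byte target pointer.

```zig
const std = @import("std");

// Hypothetical ABI tag carried on layout queries.
const Abi = enum { host, target };

// Size and alignment of one field, in bytes.
const Field = struct { size: usize, alignment: usize };

// Round `offset` up to the next multiple of `alignment` (a power of two).
fn alignUp(offset: usize, alignment: usize) usize {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// A pointer field under the chosen regime. `target_pointer_size` stands in
// for the type table's `pointer_size`. Assumption: alignment == size.
fn pointerField(abi: Abi, target_pointer_size: usize) Field {
    const size: usize = switch (abi) {
        .host => @sizeOf(usize), // compiler-owned records: host layout
        .target => target_pointer_size, // user comptime types: target layout
    };
    return .{ .size = size, .alignment = size };
}

// Natural layout: writes each field's offset, returns the padded total size.
fn layoutStruct(fields: []const Field, offsets: []usize) usize {
    std.debug.assert(offsets.len == fields.len);
    var offset: usize = 0;
    var max_align: usize = 1;
    for (fields, 0..) |f, i| {
        offset = alignUp(offset, f.alignment);
        offsets[i] = offset;
        offset += f.size;
        if (f.alignment > max_align) max_align = f.alignment;
    }
    return alignUp(offset, max_align);
}

test "one record, two target pointer sizes" {
    var offsets: [3]usize = undefined;

    const on32 = [_]Field{
        .{ .size = 1, .alignment = 1 }, // tag: u8
        pointerField(.target, 4), // next: *T
        .{ .size = 4, .alignment = 4 }, // count: u32
    };
    try std.testing.expectEqual(@as(usize, 12), layoutStruct(&on32, &offsets));
    try std.testing.expectEqual(@as(usize, 4), offsets[1]);

    const on64 = [_]Field{
        .{ .size = 1, .alignment = 1 },
        pointerField(.target, 8),
        .{ .size = 4, .alignment = 4 },
    };
    try std.testing.expectEqual(@as(usize, 24), layoutStruct(&on64, &offsets));
    try std.testing.expectEqual(@as(usize, 16), offsets[2]);
}
```

The record is 12 bytes with `next` at offset 4 on the 4-byte target, and 24 bytes with `count` at offset 16 on the 8-byte target. A layout baked from host `@sizeOf` would be wrong for one of them when cross-compiling, which is why only the `.host` arm is allowed to consult the host.
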

## File map (current → touched)

| Area | File | Phase |
|------|------|-------|
| Comptime evaluator | `src/ir/interp.zig` | 0 (strip weld dispatch), 1–2 (rebuild) |
| Weld registry | `src/ir/compiler_lib.zig` | 0 (strip), 3 (replace with type/fn exposure) |
| Weld validation | `src/ir/lower/nominal.zig` | 0 (strip `validateWeldedStruct`) |
| Comptime-only gate | `src/backend/llvm/ops.zig` | 0 (re-derive without weld signal) |
| Host-FFI marshalling | `src/ir/host_ffi.zig` | 1 (struct-by-pointer trims it) |
| Metatype arms | `src/ir/interp.zig` (`defineStruct`/…/`reflectTypeInfo`) | 3 (delete, re-express in sx) |
| `#compiler` / BuildOptions | `library/modules/build.sx`, `src/ir/compiler_hooks.zig` | 3 (migrate, delete `#compiler`) |
| Surface syntax | `src/parser.zig`, `src/ast.zig` (`abi`/`extern`/`#library`) | kept; revisited Phase 3 |

## Status

- **Phase 0 — DONE (2026-06-17).** The struct-weld machinery is stripped:
  `compiler_lib.zig` lost the type registry
  (`weldStruct`/`bound_types`/`BoundType`/`FieldLayout`/`findType`/`SxField`/`LayoutMismatch`/`validateStructLayout`);
  `nominal.zig` lost `validateWeldedStruct`/`weldedFieldOrderStr` + the `sd.abi == .zig`
  call; the struct-weld unit tests + examples `0625`/`0627`/`1183`/`1186` are removed.
  **Decision (recorded):** the `intern`/`text_of` function host-call bridge is KEPT —
  it is a clean scalar dispatch (string→handle), not weld/serialize/marshal, and is
  the seed Phase 3 grows the compiler-call path from. So the `compiler_welded` dispatch
  (`interp.callExtern` is unchanged at HEAD — the pre-branch in `call()`),
  `weldedCompilerFn` (decl.zig), the `emitCall` comptime-only gate (ops.zig), and
  examples `0626`/`1184`/`1185` stay. The `#library`/`abi`/`extern` SYNTAX stays.
  `zig build test` green (688 corpus, 0 failed; unit tests pass).
- **Phase 1 — in progress.**
  - **Sub-step 1 — DONE.** `src/ir/comptime_vm.zig`: the flat-memory `Machine` (linear
    byte memory + bump/stack allocator with `mark`/`reset` reclamation + scalar
    `readWord`/`writeWord` (1/2/4/8, little-endian) + `bytes` views; addr 0 reserved as
    `null_addr`) and `Frame` (register file indexed by Ref + stack reclamation on
    `deinit`). A register `Reg` is a raw u64 — immediate scalar OR `Addr`. Standalone +
    unit-tested (`comptime_vm.test.zig`, in the barrel); does NOT touch the live
    interpreter, so the corpus stays green (688). No op execution yet.
  - **Sub-step 2 — DONE.** The executor (`Vm` in `comptime_vm.zig`): walks the SAME IR
    `Inst` over flat-memory frames, mirroring the legacy interp's scalar semantics (i64
    wrapping/signed + f64 register words, keyed off the result/operand `TypeId`).
    Ported: constants (`const_int`/`float`/`bool`/`null`/`undef`), arithmetic
    (`add`/`sub`/`mul`/`div`/`mod`/`neg`), comparison (`cmp_*`), logical
    (`bool_and`/`or`/`not`), conversions (`widen`/`narrow`/`bitcast` passthrough,
    `int_to_float`/`float_to_int`), terminators (`br`/`cond_br`/`ret`/`ret_void`) and
    `block_param` (branch args passed as Refs — the same frame persists, SSA-safe). Any
    other op bails loudly (`error.Unsupported` + `detail = @tagName(op)`). Unit-tested
    on hand-built IR (`Fb` builder): integer add, f64 arithmetic, cond_br branch
    selection, a block-param loop summing i..1, div-by-zero + unsupported-op bails.
    Corpus untouched (688 green) — the executor is exercised by unit tests only, not
    yet wired to real comptime eval.
  - **Sub-step 3 — DONE.** Memory + structs on flat memory. `Vm` gained an optional
    `table: *const TypeTable` (target-aware layout). Ported `alloca`/`load`/`store`
    (over flat addresses, `Store.val_ty` drives width) and
    `struct_init`/`struct_get`/`struct_gep` (structs laid out at the table's natural
    offsets). The value model: a `Kind.word` (scalar/pointer ≤8B) sits in a register; a
    `Kind.aggregate` (struct) lives in flat memory and its "value" IS its address (read
    returns the address, write memcpys), so nested structs compose and `struct_gep` is
    just base+offset (no field-pointer dance). `kindOf` bails loudly on the
    not-yet-ported types (slice/string/any/optional/enum/array/tuple/…). The Addr-based
    value model survives allocator realloc (offsets are stable; slices are only
    materialized transiently). Unit-tested: struct_init+get round-trip,
    alloca+gep+store+load, nested-struct aggregate copy + nested read. Corpus untouched
    (688 green).
  - **Sub-step 4a — DONE.** Tuples + arrays. `kindOf` widened (`tuple`/`array` →
    aggregate). Ported `tuple_init`/`tuple_get` (positional, `tupleFieldOffset`),
    `index_get`/`index_gep` (`elemAddr` = base + idx*elem_size over
    array/pointer/many_pointer bases; slice/string bases bail), and `length` on an
    array value (static `ArrayInfo.length`). Unit-tested: mixed tuple round-trip,
    `[3]i64` gep/store + index_get sum (42), array `length` (3). 688 corpus green.
  - **Sub-step 4b — DONE.** Slices + strings as `{ptr@0 (pointer_size), len@8 (i64)}`
    fat pointers (`kindOf`: string/slice → aggregate). Ported `const_string`
    (materializes text+NUL in flat memory + a fat pointer), `length`/`data_ptr` (read
    len/ptr fields), `array_to_slice`, `subslice`, indexing *through* a slice/string
    (`elemAddr` loads `.ptr` first), and `str_eq`/`str_ne` (len+memcmp). Helpers
    `makeSlice`/`sliceLen`/`sliceData`. Unit-tested: string length + str_eq/ne,
    array→slice + slice index + slice length (23), array subslice (43). 688 corpus
    green.
  - **Sub-step 4c — DONE (optionals + payloadless enums).** `kindOf`: `enum` → word;
    `?T` → word if pointer-child (null==0) else `{T@0, i1@sizeof(T)}` aggregate. Ported
    `optional_wrap`/`unwrap`/`has_value`/`coalesce` (with `optChildIsPtr`/`optHas`
    helpers; `const_null` → `null_addr` reads as none), `enum_init` (payloadless: tag
    is the value), `enum_tag` (payloadless/word). Unit-tested: non-pointer `?i64`
    wrap/unwrap/coalesce (91), pointer `?*i64` null==0 (99), payloadless enum tag (11).
    688 corpus green.
  - **Sub-step 4d — partial (`addr_of`/`deref` DONE).** `addr_of` passes through (an
    aggregate value already IS its address; a pointer is already an address — mirrors
    the legacy); `deref` = `readField` through the pointer (`ins.ty` is the pointee).
    Unit-tested (deref a `*i64` → 77; addr_of a struct value + field read → 80).
    **Deferred to the wiring phase (intentionally, not ported blind):** tagged-union
    payload (`enum_init` w/ payload, `enum_payload` — the legacy stores *untyped*
    Values and `field_index` indexes payload sub-fields, not variants, so a byte
    model's payload type is ambiguous without a real call site), `any` boxing,
    closures, and the bitwise ops. These have subtleties best resolved against actual
    corpus cases — the VM's loud `error.Unsupported` + `detail` will name exactly what
    each real eval needs.
  - **Sub-step 1.5 — direct `call` DONE.** `Vm` gained `module: *const Module`
    (resolves a callee `FuncId`) + a `depth`/`max_depth` recursion guard. `call`
    marshals arg Refs → Reg words and recursively `run`s the callee; aggregate
    args/results pass as their `Addr` over the SHARED flat memory (no copy).
    **Stack-lifetime change:** `Frame` no longer reclaims the machine on exit (a
    returned aggregate's Addr would dangle) — a comptime eval's allocations live to
    `Vm.deinit`; `Machine.mark`/`reset` stay for explicit use. Extern/builtin callees
    (no blocks) bail loudly (1.5b). Unit-tested: direct call (`add(20,22)+100` → 142)
    and recursion (`sum(0..n)` → 15/55). 688 corpus green.
  - **Sub-step 1.5b — `Reg`↔`Value` boundary bridge DONE.** The
    builtin/`compiler_call`/extern handlers are all coupled to the legacy
    `Interpreter` (e.g. `compiler_lib` handlers take `*Interpreter`), so the VM can't
    call them directly — the wiring uses WHOLE-FUNCTION fallback instead (VM runs pure
    functions; a bail re-runs the whole eval in the legacy). That needs the boundary
    bridge: `valueToReg` (host `Value` arg → VM `Reg`, materializing aggregates into
    flat memory) + `regToValue` (VM result → `Value`, deep-copied out). Covers scalars
    + strings + structs (other aggregate shapes bail loudly; added as wiring surfaces
    them). Transitional — deleted once the VM owns comptime end-to-end. Unit-tested
    with round-trips. 688 corpus green.
  - **Then the wiring step** (below) — now unblocked.

### Decision (2026-06-17): pivot from blind op-porting to CALLS + hybrid wiring

The common leaf ops are ported (scalars, control flow, structs, tuples, arrays, slices,
strings, optionals, payloadless enums, deref/addr_of) and unit-tested. Continuing to
port the rarer ops (tagged-union payload, any, closures) in isolation risks subtle bugs
and has low signal. The higher-value path:

1. **Calls (sub-step 1.5)** — `call` (direct), then `call_builtin`/`compiler_call`. The
   shared flat memory makes aggregate args/results pass naturally (they're Addrs). The
   one design point: **aggregate-return lifetime** — a callee's stack-reclaim would
   dangle a returned struct Addr, so for comptime (bounded) the VM should stop
   reclaiming per-frame and let the whole eval's allocations live until `Vm.deinit`
   (keep `Machine.mark/reset` for explicit use; drop it from `Frame.deinit`).
2. **Hybrid wiring** — `-Dcomptime-flat` routes a comptime eval through the VM, falling
   back to the legacy interp on `error.Unsupported`. This makes the VM run the REAL
   corpus, proving parity incrementally and surfacing exactly which ops each real eval
   needs — far better signal than more isolated unit tests.
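
The hybrid routing in item 2 can be sketched in a few lines of Zig (0.11+ syntax assumed). Everything here is a toy stand-in: the three-op `Op` enum, `vmEval`, `legacyEval`, `Outcome` and `evalHybrid` are hypothetical names for illustration, and only the `tryEval`-returns-optional shape is taken from the plan. `bit_and` plays the role of an op the VM has not ported yet.

```zig
const std = @import("std");

const VmError = error{Unsupported};

// Hypothetical miniature op set; `bit_and` stands in for a deferred op.
const Op = enum { add, mul, bit_and };

// Stand-in for the flat-memory VM: bails loudly on anything unported.
fn vmEval(op: Op, a: i64, b: i64) VmError!i64 {
    return switch (op) {
        .add => a +% b,
        .mul => a *% b,
        .bit_and => error.Unsupported,
    };
}

// Stand-in for the legacy tagged-Value interpreter: handles every op.
fn legacyEval(op: Op, a: i64, b: i64) i64 {
    return switch (op) {
        .add => a +% b,
        .mul => a *% b,
        .bit_and => a & b,
    };
}

// Same shape as `tryEval(...) ?Value`: a VM bail becomes null.
fn tryEval(op: Op, a: i64, b: i64) ?i64 {
    return vmEval(op, a, b) catch null;
}

const Outcome = struct { value: i64, via_vm: bool };

// VM first when the flag is on; legacy when it is off or the VM bails.
fn evalHybrid(flat_enabled: bool, op: Op, a: i64, b: i64) Outcome {
    if (flat_enabled) {
        if (tryEval(op, a, b)) |v| return .{ .value = v, .via_vm = true };
    }
    return .{ .value = legacyEval(op, a, b), .via_vm = false };
}

test "VM handles what it can; legacy catches the rest" {
    const sum = evalHybrid(true, .add, 20, 22);
    try std.testing.expectEqual(@as(i64, 42), sum.value);
    try std.testing.expect(sum.via_vm);

    const masked = evalHybrid(true, .bit_and, 6, 3);
    try std.testing.expectEqual(@as(i64, 2), masked.value);
    try std.testing.expect(!masked.via_vm);

    // Flag off: everything stays on the legacy path.
    try std.testing.expect(!evalHybrid(false, .add, 1, 2).via_vm);
}
```

Recording `via_vm` per eval is one way to get the coverage count the parity step asks for. The sketch collapses the bail to `null`; a real wiring would also want to keep the bail `detail` so the run names which op to port next.
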