comptime VM: host wiring, full corpus parity, build flag, Phase 3 seed

Phase 1.final of the flat-memory comptime VM — wire the host through it,
reach corpus parity, and gate it behind a build flag — plus the first
Phase 3 (compiler-API) step. Default OFF; legacy interpreter unchanged.

Host wiring + hardening:
- Machine accessors return error.OutOfBounds (no debug panic) on bad
  addresses; Frame.get/set bounds-check and bail (no panic) on a malformed
  operand ref (e.g. a ret Ref.none from an unresolved name).
- tryEval routed at both comptime call sites in emit_llvm — the const-init
  fold and the #run side-effect path — with per-eval legacy fallback;
  yields .void_val for void/noreturn entries. Both sites sx_trace_clear()
  before the legacy fallback so a partial VM run that pushed trace frames
  doesn't double-push on re-run.

VM coverage (all corpus const-inits except the inline-asm global):
- Implicit context materialized from the __sx_default_context global; the
  full allocator protocol runs on the VM (context.allocator.alloc ->
  call_indirect -> CAllocator thunk -> libc_malloc -> native flat malloc).
- Native libc memory builtins (malloc/calloc/free/memcpy/memmove/memset)
  on flat memory; f32 stored/loaded as the 4-byte single; signed sub-64-bit
  loads sign-extended; global_get (lazy + memoized); func_ref/call_indirect
  (func-ref encoded fid+1, 0 reserved for null); string/slice fat-pointer
  field access; is_comptime; the failable/error cluster (error_set tuples,
  trace_frame + native sx_trace_push/clear -> raise/catch/or + return traces).

Build flag + Phase 3 seed:
- -Dcomptime-flat (build_opts module) OR SX_COMPTIME_FLAT env enables the VM;
  zig build test -Dcomptime-flat runs the full corpus on the VM (688/0).
- intern/text_of serviced natively on flat memory via Vm.callCompilerFn
  (compiler_welded boundary) — the seed the rest of the compiler-API grows on.

Parity 688/688 gate ON and OFF. Unit tests added throughout. The
lowering-time #insert wiring was explored and reverted (lowering-time IR can
be malformed; full malformed-IR hardening is a prerequisite, deferred).
agra
2026-06-18 08:27:58 +03:00
parent b8f3d6fd78
commit 0367d96d9b
7 changed files with 1142 additions and 108 deletions

@@ -26,8 +26,22 @@ with ONE welded mechanism. Branch: `reify` (off `master`). Update after every st
> breaks cross-compilation — host vs target layout — and loses the sandbox. A
> flat-memory VM keeps both while getting native bytes + speed.)
>
> **Next action:** execute Phase 0 of `PLAN-COMPILER-VM.md` (strip the weld machinery),
> then Phase 1 (flat-memory value model). Build/verify: `zig build && zig build test`.
> **Next action (2026-06-18):** Phase 1.final op-porting is essentially COMPLETE — the VM
> handles **36** real corpus const-inits (0 → 16 → 27 → 31 → 36), with only **2** fallbacks
> left, both principled (`intern` = the welded compiler-API fn, Phase 3; inline-asm global
> `1654`, never comptime-evaluable). Parity **688/688** (gate ON and OFF). The VM now covers
> scalars/control-flow/aggregates/strings/optionals/enums, calls+recursion, the implicit
> context + full allocator protocol, globals, and failables + return traces. BOTH comptime
> call sites (const-init + `#run` side-effects) route through the VM with legacy fallback.
> **The forward work is Phase 2 (bytecode) and Phase 3 (compiler-API on flat memory)**; flipping the VM to
> default + deleting the legacy path awaits those. See `PLAN-COMPILER-VM.md` Phase 1.final
> Status steps 7-10 (Phase 3 seed: `intern`/`text_of` native on the VM — `0626` handled).
> Build/verify: `zig build && zig build test` (688, gate OFF). Run the corpus ON the VM:
> `zig build test -Dcomptime-flat` (the build flag) OR env `SX_COMPTIME_FLAT=1`. Coverage
> trace: `SX_COMPTIME_FLAT_TRACE=1`. **Forward: Phase 3 — grow the compiler-API on the VM**
> (`find_type` / `register_struct` / reflection readers via `Vm.callCompilerFn`, then
> re-express `declare`/`define`/`type_info` as sx and delete the bespoke interp arms);
> Phase 2 (bytecode) is the orthogonal speed work.
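
The gate and the VM-with-fallback routing described above can be sketched roughly as follows. This is a toy model in Python, not the compiler's Zig code: `gate_enabled`, `vm_eval`, `legacy_eval`, `eval_comptime` and `trace_buffer` are hypothetical names, and treating any non-empty `SX_COMPTIME_FLAT` value as "on" is an assumption. It shows why the shared trace buffer is cleared before the legacy re-run.

```python
import os

# Hypothetical stand-in for the shared return-trace buffer that both
# evaluators push frames into.
trace_buffer = []


def gate_enabled(build_flag, env=None):
    """Gate is ON if the build flag is set OR the env var is non-empty.

    Assumption: any non-empty SX_COMPTIME_FLAT value counts as "on".
    """
    env = os.environ if env is None else env
    return bool(build_flag) or bool(env.get("SX_COMPTIME_FLAT"))


def vm_eval(bails):
    """Toy VM: pushes a trace frame, then either bails (None) or succeeds."""
    trace_buffer.append("vm-frame")
    return None if bails else "vm-result"


def legacy_eval():
    """Toy legacy interpreter: pushes its own frame and always succeeds."""
    trace_buffer.append("legacy-frame")
    return "legacy-result"


def eval_comptime(gate_on, vm_bails):
    """Try the VM first; on a bail, discard its partial trace and re-run."""
    if gate_on:
        result = vm_eval(vm_bails)
        if result is not None:
            return result
        # Without this clear, the legacy re-run would push on top of the
        # frames the VM already pushed before bailing.
        trace_buffer.clear()
    return legacy_eval()
```

With the gate on and a VM bail, `eval_comptime(True, True)` returns the legacy result and leaves only the legacy frame in `trace_buffer`.
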
### (superseded) prior weld resume
Phase 1 done; Phase 2 welded structs were working via reflection + memory-order
@@ -298,6 +312,122 @@ when reached (sentinels or accessor fns; see the design doc Risks).
`List` growth; orthogonal, see `current/CHECKPOINT-METATYPE.md`.)
## Log
- **VM robustness — `Frame` bounds-check; lowering-time `#insert` wiring explored + reverted (2026-06-18).**
Explored wiring the VM at the LOWERING-time comptime site (`evalComptimeString`, the
`#insert` string fold). 12/13 `#insert` examples ran on the VM with parity, but `0737`
(an `#insert` of an unresolved `secret()`) CRASHED the VM (SIGABRT): lowering-time IR can
be malformed (a `ret Ref.none` from the unresolved name) and `Frame.get` panicked on the
out-of-range index. **Decision: reverted the lowering-time wiring** — unlike the emit-time
folds (fully lowered IR), lowering-time IR can be erroneous, and hardening the VM against
ALL malformed IR (every `ref_types[...]` / `aggType` access, not just `Frame`) is out of
scope here. The emit-time sites already give full corpus coverage. **KEPT** the defensive
fix regardless (CLAUDE.md "never crash"): `Frame.get`/`set` now bounds-check and flip a
`bad_ref` flag; the `run` loop bails (`badRef`) instead of panicking. Unit test added
(malformed `ret Ref.none` → bail, not crash). Parity **688/688** both ways.
- **Phase 3 SEED (VM plan) — compiler-call path: `intern`/`text_of` native on the VM (2026-06-18).**
`invoke` now dispatches a welded `compiler`-library fn (gated on `compiler_welded`) to
`Vm.callCompilerFn`, serviced NATIVELY on flat memory (no legacy `Interpreter`):
`intern(string)->StringId` reads the flat-memory string bytes and `internString`s into the
const-cast table (pool-only — doesn't touch type layout, so cached sizes stay valid);
`text_of(StringId)->string` materializes the pooled text back into flat memory. Unlocked
`0626`; the ONLY remaining const-init fallback is now the inline-asm global (`1654`).
Parity **688/688** (gate ON and OFF); unit test added. This is the mechanism Phase 3 grows
— the next compiler fns (`find_type`, `register_struct`, reflection readers) bind the same
way (flat-memory pointer in, handle/pointer out, no marshaling).
- **Phase 1.final step 9 (VM plan) — `-Dcomptime-flat` build flag (the "swap behind a build flag" step) (2026-06-18).**
Added the `-Dcomptime-flat` build option (build.zig → a `build_opts` options module on
`mod`; `emit_llvm.init` reads `build_opts.comptime_flat or SX_COMPTIME_FLAT env`). This is
the plan's "reach parity → swap behind a build flag → delete the old path" mechanism.
`zig build test -Dcomptime-flat` runs the FULL corpus on the VM (688/0). Verified the flag
toggles the binary: flag-built `sx` reports VM HANDLED with no env var; default-built does
not. Default OFF — `zig build test` unchanged (688/0). Env var still works for ad-hoc runs.
Next (forward): Phase 2 (bytecode) / Phase 3 (compiler-API on flat memory); eventual
default-flip + legacy deletion.
- **Phase 1.final step 8 (VM plan) — wire the `#run` side-effect path + trace-clear-on-fallback (2026-06-18).**
Wired the SECOND comptime call site (`runComptimeSideEffects`, top-level `#run <expr>;`)
through `tryEval` with legacy fallback, mirroring the const-init fold. `tryEval` now
handles void/noreturn entries (→ `.void_val`) so a void side-effect doesn't bail at the
result conversion. **Fixed a trace-corruption** the new site exposed (`1035`): a
side-effect that pushes return-trace frames and then bails (e.g. on `print`) had the
legacy re-run DOUBLE-push them (`sx_trace_push` is a side effect on the shared buffer).
Both wiring sites now `sx_trace_clear()` right before the legacy fallback, discarding the
VM's partial pushes. **Parity 688/688** (gate ON and OFF). Most side-effects still bail
(print/global_addr/call_builtin) → legacy, but the path is now uniform. All comptime
evaluation routes through the VM-with-fallback.
- **Phase 1.final step 7 (VM plan) — is_comptime + failable/error cluster + signed-load fix; coverage 31→36 (2026-06-18).**
`is_comptime` → 1 (unlocked `1030`). Ported the failable/error-channel cluster (`1037`
escape, `1038` handled): `kindOf(error_set)→word`, `regToValue` bridges TUPLES (the
failable `(value…,tag)` shape `checkComptimeFailable` reads), `trace_frame` packs
`(func_id<<32|span.start)` from a new `call_stack` (pushed by invoke/runEntry), and
`sx_trace_push`/`sx_trace_clear` serviced NATIVELY (the VM calls the real sx_trace.c
functions linked into the compiler, so the return-trace buffer is populated identically
to legacy). raise/catch/or now run on the VM. **Surfaced + fixed a real GENERAL bug:**
`readField` was ZERO-extending signed sub-64-bit loads, so a stored `i32 -1` reloaded as
`0xFFFFFFFF` (+4.29e9) and `< 0` was false — silently hiding `raise error.Bad`; now
SIGN-extends `i8`/`i16`/`i32`/`isize` (gate-ON parity confirms it's a strict fix; unit
test added). VM HANDLES **36** corpus const-inits (was 31); **parity 688/688** (gate ON
and OFF). Only **2 fallbacks** remain, both principled: `intern` (`0626`, welded
compiler-API fn — Phase 3) + inline-asm global (`1654`). Forward work: Phase 2 (bytecode),
Phase 3 (compiler-API on flat memory).
- **Phase 1.final step 6 (VM plan) — real default context + call_indirect + func_ref + global_get; coverage 27→31 (2026-06-17).**
Per the user's direction ("the VM can set up a default context"), `runEntry` now
materializes the REAL default context instead of a zeroed one. The implicit-ctx param is
an opaque `*void`, so `materializeDefaultContext` finds the `__sx_default_context` global
and lays its initializer (`{ {null, alloc_fn, dealloc_fn}, null }`, the CAllocator thunk
func-refs) into flat memory via a new recursive `layoutConst`. With `func_ref` (function
value encoded `FuncId.index()+1`, reserving word 0 for the null fn-ptr) and
`call_indirect` (decode word → FuncId → dispatch; 0 → bail) ported, the whole allocator
protocol runs on the VM:
`context.allocator.alloc_bytes` → call_indirect → thunk → `CAllocator.alloc_bytes` →
`libc_malloc` → native flat malloc. Unlocked `0606` (string global). Also: `global_get`
lazily evaluates a comptime global's `comptime_func` (memoized) — unlocked `CT_CHAIN`;
field access (`fieldOffset`/`struct_get`) handles string/slice `{ptr@0,len@8}` fat
pointers (needed by `alloc_string`); `regToValue` maps function-typed words → `.func_ref`
(kept `1128`'s rejection byte-identical). Native `malloc` is still required (the thunk
bottoms out at it; a host pointer can't be used with flat-memory load/store). VM HANDLES
**31** corpus const-inits (was 27); **parity 688/688** (gate ON and OFF). Unit tests:
global_get, func_ref+call_indirect. Remaining fallbacks (7): `.unsupported` aggregates
(3× — `1037`/`1038`), extern/builtin `intern`+asm (2×), `trace_frame`, `is_comptime`.
- **Phase 1.final step 5 cont. (VM plan) — libc memory builtins + f32 fix; coverage 16→27 (2026-06-17).**
Identified the dominant fallback (`call to extern/builtin`) as **11× `malloc`** (0604) +
1× `intern`. Modeled a curated set of libc MEMORY builtins natively on flat memory
(`Vm.callMemBuiltin`): `malloc`/`calloc` → `allocBytes` (16-aligned, 256-MiB cap → bail),
`free` → no-op, `memcpy`/`memmove`/`memset` on flat bytes — sandboxed (no host heap/dlsym),
target-aware; the computed result is byte-identical to legacy (which calls real libc).
This surfaced a **real latent f32 bug**: float registers hold f64 bits, but f32 MEMORY is
the 4-byte single — `readField`/`writeField` were truncating the f64 bits (writing zeros
for `1.0`); now they `@floatCast` on f32 load/store (mirrors legacy `storeAtRawPtr`).
Result: VM HANDLES **27** corpus const-inits (was 16); **parity 688/688** (gate ON and
OFF). Unit tests added (f32 round-trip; malloc → usable flat memory). Next: the `kindOf`
`.unsupported` aggregates (3×), `global_get` (2×), the rest.
- **Phase 1.final step 5 (VM plan) — implicit-context materialization; coverage 0→16 (2026-06-17).**
`tryEval` now MATERIALIZES the implicit ctx instead of skipping it: a `has_implicit_ctx`
comptime entry (sole param `*Context`) gets a zeroed `Context` of the right size/align
in flat memory, its address passed as arg 0. Const bodies that ignore the ctx run; a
body that uses the allocator hits unported `call_indirect` → bails → legacy. No func-ref
materialization needed (handled bodies don't read ctx contents; parity is the guard).
Fixed a real bug surfaced by the coverage pass: storing a `null` non-pointer optional
(the `null_addr` sentinel) into an aggregate slot OOB-bailed — `writeField` now ZEROES
the destination for a `null_addr` aggregate source (= none/empty); unit-test regression
added. Result: VM HANDLES **16** corpus const-inits (was 0); **parity 688/688 both
gate ON and OFF**. Next: port the ops the trace names — `call_builtin`/`compiler_call`/
extern (13×, via the bridge), `kindOf` `.unsupported` aggregates (3×), `global_get` (2×),
func_ref / call_indirect / trace_frame / is_comptime.
- **Phase 1.final steps 1-4 (VM plan) — host wiring landed; coverage measured (2026-06-17).**
(1) **Hardening:** `Machine.readWord`/`writeWord`/`bytes` now return `error.OutOfBounds`
(null / out-of-range / oversized / overflow-safe) instead of `assert`-panicking;
`OutOfBounds` added to `Vm.Error`; `try` threaded through every helper + exec arm + the
bridge. New unit tests (accessor OOB returns; null-deref → `tryEval` null, not a crash).
(2) **Implicit context:** `tryEval` returns null for `has_implicit_ctx` funcs (legacy
fallback) — conservative; full ctx materialization deferred to step 5. (3) **Wiring:**
const-init fold in `emit_llvm.zig` `emitGlobals` is `(if (comptime_flat) tryEval(...) else
null) orelse interp.call(...)`, gated by env `SX_COMPTIME_FLAT` (read once into
`LLVMEmitter.comptime_flat`). Default OFF. (4) **Parity + coverage:** gate ON → full
corpus byte-identical (688, 0 failed) + manual 0605/0606/0607 byte-identical.
**Finding: 0 of 37 measured corpus const-inits are VM-handled — ALL are
`has_implicit_ctx`-gated.** Added a coverage-trace facility (`comptime_vm.last_bail_reason`
+ env `SX_COMPTIME_FLAT_TRACE`). **Next: step 5 = implicit-context materialization** (the
unblocker), then port the deferred ops. 688 corpus green (gate OFF).
- **Phase 1.final start (VM plan) — wiring entry point `tryEval` (2026-06-17).**
`comptime_vm.tryEval(gpa, module, func_id) ?Value` runs a comptime function entirely on
the VM, returns a legacy `Value` (deep-copied to `gpa`) or `null` to fall back.
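
The hardening in steps 1-4 and the `tryEval` contract (a value, or `null` to fall back) can be sketched as below. This is a hedged Python model, not the Zig implementation: `FlatMemory`, `read_word`, `write_word` and `try_eval` are hypothetical names, and treating address 0 as the null address is an assumption. Python integers cannot overflow, so the "overflow-safe" part of the real check has no counterpart here.

```python
import struct


class OutOfBounds(Exception):
    """Raised instead of asserting when an access leaves flat memory."""


class FlatMemory:
    def __init__(self, size):
        self.data = bytearray(size)

    def _check(self, addr, length):
        # Assumption: address 0 plays the role of the null pointer.
        if addr <= 0 or length > len(self.data) or addr + length > len(self.data):
            raise OutOfBounds(f"addr={addr} len={length}")

    def read_word(self, addr):
        self._check(addr, 8)
        return struct.unpack_from("<Q", self.data, addr)[0]

    def write_word(self, addr, value):
        self._check(addr, 8)
        struct.pack_into("<Q", self.data, addr, value & 0xFFFF_FFFF_FFFF_FFFF)


def try_eval(body, mem):
    """Run `body` on flat memory; return None (fall back) on a bad access."""
    try:
        return body(mem)
    except OutOfBounds:
        return None
```

A null dereference such as `try_eval(lambda m: m.read_word(0), FlatMemory(64))` yields `None`, which the caller treats as "use the legacy path" rather than crashing.
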

@@ -142,6 +142,117 @@ host through it:
5. Grow coverage (port the deferred ops + `call_builtin`/`compiler_call` via the bridge)
until the VM is the default and the legacy path is deleted.
**Status (2026-06-17): steps 1-4 DONE; step 5 = the next session.**
- **(1) Hardening — DONE.** `Machine.readWord`/`writeWord`/`bytes` return
`error.OutOfBounds` (null / out-of-range / oversized / overflow-safe) instead of
asserting. `OutOfBounds` added to `Vm.Error`; `try` threaded through
`readField`/`writeField`/`optHas`/`makeSlice`/`sliceLen`/`sliceData`/`elemAddr` and
every exec arm + the bridge. New unit tests: hardened-accessor OOB returns, and a
null-deref function → `tryEval` returns `null` (legacy fallback), not a panic.
- **(2) Implicit context — DONE (materialized, 2026-06-17 step 5).** Initially a
conservative skip; now `tryEval` MATERIALIZES the implicit ctx: a comptime entry with
`has_implicit_ctx` (whose sole param is the `*Context`) gets a zeroed `Context` of the
right size/align allocated in flat memory, its address passed as arg 0. The common
const body never reads the ctx; a body that USES the allocator loads a fn from it and
`call_indirect`s (unported) → bails → legacy. No func-ref materialization was needed:
handled bodies don't read the ctx contents, and gate-ON corpus parity (688, 0 failed)
empirically confirms no divergence. (A body that read+branched on a null allocator fn
could in principle diverge; none does — parity is the guard.)
- **(3) Wire one site — DONE.** Const-init fold in `emitGlobals` is `(if (comptime_flat)
tryEval(...) else null) orelse interp.call(...)`. Gated by env `SX_COMPTIME_FLAT`
(a `LLVMEmitter.comptime_flat` field read once from `std.c.getenv` in `init`).
Default OFF → corpus unaffected (688 green).
- **(4) Parity + coverage — DONE.** Gate ON: full corpus byte-identical (688, 0 failed);
manual `sx run` of 0605/0606/0607/0608 byte-identical to gate-OFF. Coverage-trace
facility in place (`comptime_vm.last_bail_reason` + env `SX_COMPTIME_FLAT_TRACE`,
printing HANDLED / fallback+reason per init).
- **(5) Implicit-context materialization + memory builtins + f32 — DONE; op-porting CONTINUES.**
Coverage climbed **0 → 16 → 27** handled corpus const-inits (fallbacks 22 → 11); parity
stays **688/688** (gate ON and OFF) at every step. Landed, in order: implicit ctx
materialized (→16); `writeField` null-aggregate fix (storing a `null` non-pointer
optional `null_addr` sentinel into an aggregate slot OOB-bailed → now ZEROES the
destination = none/empty; unit-test regression); curated libc MEMORY builtins on flat
memory (`Vm.callMemBuiltin`: `malloc`/`calloc` → `allocBytes` 16-aligned & 256-MiB-capped,
`free` → no-op, `memcpy`/`memmove`/`memset` on flat bytes — sandboxed, target-aware,
result byte-identical to legacy; unlocked `0604`'s 11 comptime mallocs); and an **f32
storage fix** (float registers hold f64 bits, but f32 memory is the 4-byte single —
`readField`/`writeField` now `@floatCast` instead of truncating the f64 bits, which had
written zeros for `1.0`; a real latent bug `0604` surfaced; unit tests added).
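
The f32 storage fix in step 5 can be illustrated with a small Python sketch. The function names are hypothetical, and the "buggy" variant assumes the truncation kept the low four bytes of the f64 bit pattern on a little-endian layout, which is consistent with the note that `1.0` was written as zeros.

```python
import struct


def f64_bits(x):
    """Bit pattern of a Python float as a 64-bit register word."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def store_f32_truncating(reg_bits):
    # Buggy variant (assumed): keep the low 4 bytes of the f64 bits.
    return struct.pack("<I", reg_bits & 0xFFFF_FFFF)


def store_f32_cast(reg_bits):
    # Fixed variant: reinterpret the register as f64, then narrow to f32.
    value = struct.unpack("<d", struct.pack("<Q", reg_bits))[0]
    return struct.pack("<f", value)


def load_f32(raw):
    # Widen the 4-byte single back to the f64 bits a register holds.
    return f64_bits(struct.unpack("<f", raw)[0])


one = f64_bits(1.0)                      # 0x3FF0000000000000
bad = store_f32_truncating(one)          # all-zero bytes
good = store_f32_cast(one)               # IEEE-754 single for 1.0
```

The low 32 bits of the f64 pattern for `1.0` are all zero, so the truncating store loses the value entirely, while the cast round-trips through `load_f32`.
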
- **(6) Real default context + call_indirect + func_ref + global_get — DONE.** Coverage
**27 → 31** handled (fallbacks 11 → 7); parity stays **688/688** both gate ON and OFF.
Per the user's direction ("the VM can set up a default context"), `runEntry` now
materializes the REAL default context (not a zeroed one): the implicit-ctx param is an
opaque `*void`, so `materializeDefaultContext` finds the `__sx_default_context` global
and lays its initializer constant (`{ {null, alloc_fn, dealloc_fn}, null }`, carrying
the CAllocator thunk func-refs) into flat memory via a new recursive `layoutConst`.
With `func_ref` (a function value encoded as `FuncId.index() + 1` so word 0 stays
reserved for the NULL function pointer — `funcRefWord`/`funcRefToId`) and `call_indirect`
(decode the callee word → `FuncId` → dispatch; 0 → bail) ported, a comptime body
that allocates via `context.allocator` now runs ENTIRELY on the VM: `alloc_string` →
`context.allocator.alloc_bytes` → `call_indirect` → thunk → `CAllocator.alloc_bytes` →
`libc_malloc` → the VM's native flat-memory `malloc`. Unlocked `0606` (string global via
the allocator). Also: `global_get` lazily evaluates a comptime global's `comptime_func`
(memoized in `global_cache`) — unlocked `CT_CHAIN`; struct field access (`fieldOffset`/
`struct_get`) now handles string/slice `{ptr@0,len@8}` fat pointers (needed by
`alloc_string`'s `s.ptr`/`s.len`); and `regToValue` maps a function-typed word back to
`.func_ref` so a func-ref result serializes identically to legacy (kept `1128`'s
rejection diagnostic byte-identical). Unit tests added (global_get, func_ref +
call_indirect). **Note: native `malloc` is still REQUIRED** — the CAllocator thunk
bottoms out at libc `malloc`, and the VM can't use a host pointer with flat-memory
load/store, so comptime `malloc` must allocate from flat memory. The default context
lets the allocator PROTOCOL run; native `malloc` is its final step.
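
The func-ref encoding in step 6 (index plus one, with word 0 reserved for the null function pointer) can be sketched as below. This is an illustrative Python model only: `func_ref_word`, `func_ref_to_id`, `call_indirect` and `Bail` are hypothetical names, and bailing on an out-of-range index is an assumption beyond what the notes state (they only say that 0 bails).

```python
class Bail(Exception):
    """Signals that the VM gives up and the legacy path should run."""


def func_ref_word(func_index):
    # Shift by one so that word 0 never names a real function.
    return func_index + 1


def func_ref_to_id(word):
    # Word 0 is the null function pointer; everything else decodes back.
    return None if word == 0 else word - 1


def call_indirect(functions, word, *args):
    fid = func_ref_to_id(word)
    if fid is None or fid >= len(functions):
        raise Bail(f"cannot dispatch callee word {word}")
    return functions[fid](*args)


# A toy function table: index 0 is a valid callee, encoded as word 1.
table = [lambda n: n * 2, lambda n: n + 100]
doubled = call_indirect(table, func_ref_word(0), 21)
```

Because function index 0 encodes as word 1, a zero-initialized slot in flat memory always decodes to "no function" and bails cleanly.
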
- **(7) `is_comptime` + failable/error cluster + the signed-load fix — DONE.** Coverage
**31 → 36** handled (fallbacks 7 → 2); parity stays **688/688** both gate ON and OFF.
- **`is_comptime`** → always 1 on the VM (folds to false in compiled code). Unlocked `1030`.
- **Failable / error-channel cluster** (`1037` escape, `1038` handled): `kindOf(error_set)
→ word` (a u32 tag id); `regToValue` now bridges TUPLES (the failable `(value…, tag)`
shape the host's `checkComptimeFailable` reads); `trace_frame` packs `(func_id<<32 |
span.start)` from a new `call_stack` (pushed by `invoke`/`runEntry`); and `sx_trace_push`
/ `sx_trace_clear` are serviced NATIVELY (the VM calls the real sx_trace.c functions —
linked into the compiler — so the return-trace buffer the host reads is populated
identically to the legacy dlsym path). `raise`/`catch`/`or` all run on the VM now.
- **Signed sub-64-bit load fix (a real GENERAL bug the failable case surfaced):**
`readField` now SIGN-extends `i8`/`i16`/`i32`/`isize` loads (was zero-extending, so a
stored `i32 -1` reloaded as `0xFFFFFFFF` = +4.29e9 and `< 0` was false — which silently
hid `raise error.Bad`). Affects any negative signed sub-64-bit value stored & reloaded;
gate-ON corpus parity confirms it's a strict fix. Unit test added (+ failable tests
pass via 1037/1038 in the corpus).
- **Remaining fallbacks (2, both principled — the VM correctly stays on legacy):**
`intern` (`0626`, the welded compiler-API fn — Phase 3 re-homes it) and the inline-asm
global call (`1654`, never comptime-evaluable). Every other measured corpus const-init
is handled on the VM.
At this point the flat-memory VM handles essentially the entire real comptime corpus
(scalars, control flow, structs/tuples/arrays/slices/strings/optionals/enums, calls +
recursion, the implicit context + allocator protocol, globals, failables + return
traces). Phase 2 (bytecode) and Phase 3 (compiler-API on flat memory) are the forward
work; flipping the VM to default + deleting the legacy path awaits those.
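
The signed-load fix in step 7 comes down to how a sub-64-bit value is widened into a 64-bit register. A minimal Python sketch, with hypothetical helper names and a little-endian layout assumed:

```python
MASK64 = (1 << 64) - 1


def load_zero_extended(raw):
    """Old behaviour: upper register bits are filled with zeros."""
    return int.from_bytes(raw, "little")


def load_sign_extended(raw):
    """Fixed behaviour: replicate the sign bit into the upper bits."""
    bits = 8 * len(raw)
    value = int.from_bytes(raw, "little")
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64


def as_signed64(word):
    """Interpret a 64-bit register word as a signed integer."""
    return word - (1 << 64) if word >> 63 else word


stored = (-1).to_bytes(4, "little", signed=True)   # an i32 holding -1
old = as_signed64(load_zero_extended(stored))      # 4294967295
new = as_signed64(load_sign_extended(stored))      # -1
```

With the zero-extending load, the comparison `old < 0` is false, which is how a negative tag could go unnoticed; with sign extension, `new < 0` holds.
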
- **(8) Wire the `#run` side-effect path; trace-clear-on-fallback — DONE.** The second
comptime call site (`emit_llvm.runComptimeSideEffects`, top-level `#run <expr>;`) now
routes through `tryEval` with legacy fallback, like the const-init fold; `tryEval` yields
`.void_val` for a void/noreturn entry. Fixed a trace-corruption the new site exposed
(`1035`): a side-effect that pushes trace frames then bails (on `print`) had the legacy
re-run double-push them — both sites now `sx_trace_clear()` right before the legacy
fallback to discard the VM's partial pushes. Parity **688/688** both gate ON and OFF. All
comptime evaluation now routes through the VM-with-fallback (uniform).
- **(9) `-Dcomptime-flat` build flag — DONE (the "swap behind a build flag" step).** The VM
gate is now a build option (`build.zig` → a `build_opts` module on `mod`; `emit_llvm.init`
reads `build_opts.comptime_flat or SX_COMPTIME_FLAT env`), default OFF. `zig build test
-Dcomptime-flat` runs the FULL corpus on the VM (688/0) — the build-integrated parity
gate. Verified the flag toggles the binary (flag-built `sx` uses the VM with no env var;
default-built does not). This is the prerequisite to eventually making the VM default +
deleting the legacy path (which still awaits Phase 2/3 + broader confidence).
- **(10) Compiler-call path on the VM — `intern`/`text_of` native (Phase 3 SEED) — DONE.**
`invoke` now services a welded `compiler`-library function (the `compiler_welded` flag is
the safety boundary) via `Vm.callCompilerFn` — natively on flat memory, NO legacy
`Interpreter`: `intern(s: string) -> StringId` reads the string bytes from flat memory and
`internString`s into the (const-cast) table (pool-only, never touches type layout, so the
VM's cached sizes stay valid); `text_of(id) -> string` materializes the pooled text back
into flat memory as a fat pointer. Unlocked `0626` — the ONLY remaining const-init fallback
is now the inline-asm global (`1654`, genuinely not comptime-evaluable). Parity **688/688**
both gate ON and OFF; unit test added. This is the mechanism Phase 3 grows: the next
compiler functions (`find_type`, `register_struct`, the reflection readers) are added the
same way — flat-memory pointer in, handle/pointer out, no marshaling.
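
The `intern`/`text_of` pair from step 10 can be modelled in a few lines. This is a sketch under stated assumptions, not the compiler's code: the `Vm` and `StringPool` classes and their method names are hypothetical, the `{ptr@0, len@8}` fat-pointer layout and 16-byte allocation alignment are taken from the notes above, and the bump allocator is a simplification.

```python
import struct


class StringPool:
    """Toy intern table: equal byte strings share one id."""

    def __init__(self):
        self.ids = {}
        self.texts = []

    def intern_string(self, text):
        if text not in self.ids:
            self.ids[text] = len(self.texts)
            self.texts.append(text)
        return self.ids[text]


class Vm:
    def __init__(self, size=4096):
        self.mem = bytearray(size)
        self.next_free = 16          # keep address 0 unused as "null"
        self.pool = StringPool()

    def alloc_bytes(self, n):
        addr = (self.next_free + 15) & ~15   # 16-byte alignment
        if addr + n > len(self.mem):
            raise MemoryError("flat memory exhausted")
        self.next_free = addr + n
        return addr

    def make_string(self, text):
        """Copy bytes into flat memory and return a fat-pointer address."""
        data = self.alloc_bytes(len(text))
        self.mem[data:data + len(text)] = text
        fat = self.alloc_bytes(16)
        struct.pack_into("<QQ", self.mem, fat, data, len(text))  # ptr@0, len@8
        return fat

    def read_string(self, fat):
        ptr, length = struct.unpack_from("<QQ", self.mem, fat)
        return bytes(self.mem[ptr:ptr + length])

    def intern(self, fat):
        # Flat-memory pointer in, handle out.
        return self.pool.intern_string(self.read_string(fat))

    def text_of(self, string_id):
        # Handle in, flat-memory fat pointer out.
        return self.make_string(self.pool.texts[string_id])
```

Interning touches only the pool, so nothing about type layout changes; the same pointer-in, handle-out shape is what the notes expect later compiler functions to follow.
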
### Phase 3 — Compiler-API on flat memory (resume the stream — no weld)
With native-byte comptime values, re-home the compiler-API: