sx/design/execution-evolution-roadmap.md
agra ded106333b docs(design): execution-model roadmap + reify implementation stream
Add the async-first execution-model roadmap (comptime JIT spine, colorblind
fibers/Io, atomics, hot-reload) with all seven decisions resolved and
three-way reviewed, and carve the first stream: comptime type_info/reify
(PLAN-REIFY + checkpoint) — the codebase-validated foundation for channel
result types and race's synthesized tagged union.
2026-06-16 16:43:29 +03:00


Execution-Model Evolution — Roadmap (comptime JIT · async · concurrency · hot-reload)

Status: exploratory design-of-record. Captures the forward plan for sx's execution model across five interlocking threads. Not yet an active PLAN-*/CHECKPOINT-* stream — this is the shared design the streams would be carved from. Cross-platform shipping (the bundled-zig backend + the sx bundler) is already landed; see bundled-zig-link-backend-design.md and ../current/PLAN-DIST.md.


0. The thesis

sx's compiler stays small by pushing capability into library sx + three general primitives (inline asm, extern/export, atomics) rather than baking features into codegen. Concretely:

  • Async is a library, not a language feature — colorblind, stackful fibers behind an Io interface (Zig-inspired). No function coloring, no async→state-machine transform. The implementation is pure sx down to a per-arch inline-asm context switch.
  • Comptime gains a JIT escape hatch — the interpreter stays the default (debuggable, portable), but drops to a host-JIT for the one thing it can't walk (inline asm) and, later, for whole fragments (the bundler).
  • One shared substrate — a persistent ORC LLJIT + host-target emitter — serves comptime-asm, the bundler, and JIT-resident hot-reload.

The honest trade is small surface, but each primitive is deep — not "small compiler." The net-new compiler obligations this plan adds (all verified absent today): atomics lowering (N1), generic enums enum($T), type_info + reify + field_type (comptime type construction), callconv(.naked), repointable-context codegen (+ per-fiber stack-limit), the S1 persistent JIT spine, C1 thunk synthesis, comptime-asm lifting (C3), and (later) the S2 ORC C++ shim. Async itself is genuinely a library; the enabling primitives are a major codegen/runtime investment. Already landed: inline asm (in flight), extern/export, the !/try/catch/onfail/raise ERR stream, value-level reflection, the sx run ORC LLJIT, and the host-FFI trampolines.


1. The spine (shared substrate)

| ID | Piece | What | Size |
|----|-------|------|------|
| S1 | Persistent JIT executor | A long-lived ORC LLJIT + a host-triple LLVMEmitter + a compiled-fragment cache, plumbed into the interpreter. Today the LLJIT exists only for sx run's main (target.zig:319); the emitter carries one target machine (emit_llvm.zig:274). | L |
| S2 | ORC C++ shim | MachOPlatform::Create + redirectable/lazy-reexport symbols. The bare LLVMOrcCreateLLJIT can't do thread-locals, C constructors, or symbol redefinition: the wall the C-with-sx JIT spike hit (_Thread_local SIGABRT; errors-* examples crashed). Required by any non-trivial JIT or symbol repoint. | M |

S1/S2 are the spine: built once, consumed by C1 (the FFI thunks — the main near-term consumer), C3, and (later) R2. S1 alone suffices for C1/C3 (bare calling/asm thunks — no TLS/ctors); S2 is only needed for R2 and JIT-ing C-with-sx.


2. Comptime / build layer

| ID | Piece | Unblocks | Depends | Size |
|----|-------|----------|---------|------|
| C1 | Real comptime FFI: JIT calling-thunks (LLVM = single ABI authority). Trivial calls (scalar/ptr/string args, single-reg return) keep the existing host_ffi.zig trampoline fast-path; everything else (floats, structs-by-value, aggregate returns, >8 args, varargs) synthesizes a per-signature thunk, JIT-compiles it via S1, and calls it with an args buffer the interpreter fills by known layout (type_info). LLVM emits the ABI-correct call (the same lowering as runtime codegen), so comptime and runtime FFI share ONE ABI implementation. Rejected: libffi (foreign 2nd ABI impl), hand-rolled sx+asm (3rd impl + drift risk + needs C3 to run its own asm leaf anyway). | struct/string/slice/float signatures at comptime; full C interop in #run; lifts the bundler's API straightjacket; unifies comptime+runtime FFI | S1 (fast-path: none) | L |
| C2 | #compilerextern collapse: BuildOptions hooks become real exported C symbols resolved through C1; *BuildConfig threaded via global/handle; delete .compiler_expr/compiler_call/Registry. | one FFI mechanism, not two | C1 (extern/export already shipped) | M |
| C3 | Comptime asm via host-JIT: stop bailing on inline_asm (interp.zig:1019); lift the block (operand model at inst.zig:354: inputs/out_value/out_place/out_ty/clobbers) to a host-arch thunk via LLVMGetInlineAsm, JIT, call through C1, cache by template+sig. | running asm-containing code at comptime | S1, C1 (+S2 non-trivial) | M |
| C4 | (DROPPED) JIT-the-bundler: not built (Decision 6). Interp+C1 is the shipping bundler (I/O-bound, so native speed is moot; C1 closes the only capability gap). Remains an always-available S1 optimization if profiling ever shows the bundler's own logic is a hotspot. | | | |
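
The C1 split between the trampoline fast-path and the per-signature thunk can be pictured as a small classifier. The following is a minimal Python sketch under stated assumptions, not the compiler's actual logic: the kind names, `Sig`, and `needs_thunk` are hypothetical, and "single-reg return" is assumed to mean a void, integer, or pointer return.

```python
from dataclasses import dataclass

# Hypothetical type kinds for illustration; not the compiler's real type table.
TRIVIAL_ARG_KINDS = {"int", "ptr", "string"}
TRIVIAL_RET_KINDS = {"void", "int", "ptr"}   # assumed to fit a single register
MAX_FAST_PATH_ARGS = 8


@dataclass(frozen=True)
class Sig:
    args: tuple            # tuple of kind names, e.g. ("int", "ptr")
    ret: str = "void"
    varargs: bool = False


def needs_thunk(sig: Sig) -> bool:
    """True when the call must use a per-signature JIT thunk (and so needs S1)."""
    if sig.varargs or len(sig.args) > MAX_FAST_PATH_ARGS:
        return True
    if sig.ret not in TRIVIAL_RET_KINDS:
        return True
    # Floats and structs-by-value are not in the trivial set, so they force a thunk.
    return any(kind not in TRIVIAL_ARG_KINDS for kind in sig.args)
```

Under these assumptions, `needs_thunk(Sig(("int", "ptr"), "int"))` stays on the trampoline, while any float argument, aggregate return, or ninth argument routes to the thunk path.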

Residue: cross-arch comptime asm (C3) can't run on the host — narrows the bail to the cross-compile case; needs a sharp diagnostic ("asm targets <arch>, host is <host>").


3. Concurrency primitives (atomics + threads)

Why this is its own section: we are doing multiple OS threads, so the async runtime and any lock-free structure need real atomics. OS threads already exist; atomics do not.

| ID | Piece | State | Size |
|----|-------|-------|------|
| N1 | Atomics: NET-NEW compiler feature. Atomic load/store/RMW (add/sub/and/or/xor/swap + fetch_min/fetch_max; no nand), compare_exchange/_weak (→ ?T, null = success), and fences, with orderings (relaxed/acquire/release/acq_rel/seq_cst). LLVM provides all; this is an emit feature, not a runtime library. Surface LOCKED = Atomic($T) wrapper + Ordering enum (not @atomic_* builtins; @ is address-of in sx). | lowering absent: zero LLVM atomicrmw/cmpxchg/fence emission today; some IR/inference scaffolding exists | M |
| N2 | OS threads + pthread Mutex/Cond + worker Pool | landed: std/thread.sx (pthread_create/join/detach, in-place Mutex/Cond, bounded Pool). NOTE: pthread mutex blocks the OS thread; it is not fiber-aware (it would park every fiber on that thread); fiber-aware sync is N3, built on N1. | |
| N3 | Fiber-aware sync: mutex / channel / waitgroup that suspend the fiber, not the OS thread. Hybrid: atomic fast-path (N1) + fiber-suspend slow-path (A2/A5). Distinct from the pthread primitives in N2. | new library | M |

Compiler obligation for N1: the emit must map sx orderings to LLVM's and not reorder across atomics/fences. Comptime is single-threaded, so the interpreter can treat atomic ops as ordinary ops (seq_cst is trivially satisfied with one thread) — no interp atomics machinery needed.

N1 is a prerequisite for M:N scheduling (A5) and N3, and is broadly useful (lock-free queues, refcounts, the allocator). It is the load-bearing new primitive this revision adds.


4. Async — colorblind, stackful, pure-sx

Commitment: no function coloring, no async→state-machine transform. Async is a capability carried in context (like context.allocator), not a property of a function's signature. A function does I/O through context.io; whether the call suspends is decided by the Io implementation, transparently.

| ID | Piece | Notes | Size |
|----|-------|-------|------|
| A1 | Io interface + context.io: a protocol/vtable threaded like Allocator. io.async(fn,args) → Future, future.await, cancellation. | leverages protocols + context | M |
| A2 | Stackful coroutine runtime: in sx lib, NOT a compiler builtin. The context-switch is a callconv(.naked) sx fn with an inline-asm body (save callee-saved + SP/LR into *from, load from *to, ret); fiber bootstrap + stack alloc (mmap+guard via extern) also sx. The compiler's job is only (a) the general primitives (inline asm, callconv(.naked), atomics) and (b) fiber-safe codegen: context lowered as a repointable indirection (never raw TLS) so the switch can repoint it, and stack-limit guards (if emitted) read from a swappable per-fiber location. Most arch-delicate sx in the tree (must match the platform callee-saved set + the compiler ABI), but it's inspectable sx, not a black box. | per-arch, arch-gated; co-validate vs codegen | M |
| A3 | Event-loop Io impls: kqueue / epoll / io_uring drive readiness, then the (now-ready) syscall via C1. Plus a trivial blocking Io. | pure sx around syscall externs | L |
| A4 | Stdlib I/O rework: fs/socket/process take/use context.io instead of raw blocking syscalls, so existing calls participate in async. | mirrors the allocator-threading rule | M |
| A5 | Schedulers: M:1 → N×(M:1) → M:N, all sx std-lib Io vtables (committed; M:N last, not deferred). M:1 first (minimal vehicle to validate the colorblind stack; covers I/O-bound). N×(M:1) = first parallel step (per-thread M:1 loops + std/thread.sx spawn; shared state uses N1 atomics, which is expected under parallelism, not a wart). M:N work-stealing last (most machinery: thread-safe steal queues + migration + errno/TLS discipline). All over N1 atomics + the A2 asm context-switch + extern syscalls. Pinning API for thread-affine work (UI main thread, GL context). | see §4.3 | M (M:1) / M (N×M:1) / L (M:N) |

4.1 How control enters sx (the colorblind model)

  • sx→sx is ordinary. The whole call chain lives on the fiber stack; a suspend at a leaf io.* freezes the native stack verbatim. No frame knows it suspended. Zero special handling at call boundaries — that's the point.
  • Three inbound boundaries where the runtime enters sx:
    1. Task entry (io.async(fn)) — a trampoline starts fn on a fresh fiber stack via the normal calling convention.
    2. Resumption — a context-switch (asm), not a call; sx continues mid-stack.
    3. C callback → sx — must be export/callconv(.c); runs on the event-loop stack (not a fiber) so it cannot itself suspend — it may resume/enqueue a fiber or run a non-suspending sx fn to completion (leaf-only).

4.2 context is fiber-local (the key obligation)

context.io/context.allocator/the push Context stack are dynamically scoped. Fibers time-share OS threads (and migrate under M:N), so context must travel with the fiber — saved/restored on every context-switch — never a raw TLS read. A spawned task snapshots the spawner's context, then evolves its own push Context stack. This is the CLAUDE.md "capture your owning allocator" rule one level up: ambient state that outlives a suspension point must be carried by the fiber.
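
The obligation in §4.2 can be modeled in a few lines. This is a hedged Python sketch, not the sx runtime: `Fiber`, `Runtime`, and their methods are hypothetical names, and the snapshot is assumed to copy only the spawner's current top frame. The point it illustrates is that code reads context through one repointable slot (`runtime.current`), and the switch repoints that slot.

```python
class Fiber:
    def __init__(self, name, base_context):
        self.name = name
        # Each fiber owns its own push-Context stack, seeded from a snapshot.
        self.context_stack = [dict(base_context)]


class Runtime:
    """Models the repointable indirection: reads go through `current`, never TLS."""

    def __init__(self):
        self.root = Fiber("root", {"allocator": "heap", "io": "blocking"})
        self.current = self.root

    def context(self):
        return self.current.context_stack[-1]

    def push_context(self, **overrides):
        self.current.context_stack.append({**self.context(), **overrides})

    def spawn(self, name):
        # Snapshot the spawner's context at spawn time (assumption: top frame only).
        return Fiber(name, self.context())

    def switch_to(self, fiber):
        # The context switch repoints the slot, so the context travels with the fiber.
        self.current = fiber
```

After a spawn, later `push_context` calls on the spawner are invisible to the child, and the child's own pushes are restored whenever it is switched back in.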

4.3 Threads & the two hazard classes (why atomics)

| Model | Parallelism | Migration | Hazards |
|-------|-------------|-----------|---------|
| M:1 (1 OS thread) | none | none | cooperative, race-free; simplest |
| N×(M:1) (per-thread schedulers, no migration) | yes | none | data races on shared state → atomics/locks |
| M:N (work-stealing) | yes | yes | data races + TLS-migration hazards |

  • Parallelism hazard (any N>1): shared mutable state races → needs N1 atomics + N3 fiber-aware sync. The M:1 "no locks" simplicity is gone.
  • Migration hazard (M:N only): a fiber that moves threads across a suspend reads the wrong thread's TLS. errno must be captured immediately after each syscall; context must be fiber-local (§4.2) — non-negotiable under M:N.
  • Pinning (io.pinToThread()): some work must stay put — the UI main thread (UIKit/macOS/Android — directly the app targets in §6), OpenGL current-context, TLS-using FFI. M:N needs a "don't migrate / main-thread-only" fiber attribute (Go's LockOSThread).

4.4 Pure-sx boundary

Everything is sx except the irreducible FFI floor: the asm context-switch (per-arch, in .sx), syscall externs (kernel-implemented, like any libc binding), and raw stack memory (mmap). The schedulers, event loops, futures, cancellation, and sync primitives are ordinary sx. Payoff: swappable Io vtables — blocking, io_uring, kqueue, a mock Io for tests, a deterministic-simulation Io (fake clock, scripted readiness) for reproducible concurrency tests — all libraries.

4.5 Comptime async = blocking Io

At comptime install the blocking Io: io.* just blocks; no fibers, no scheduler, no suspend. Same source, different vtable. The interpreter never needs suspend/resume, and the FFI (C1) needs no async awareness. This is why the colorblind model resolves comptime async for free.

4.6 Syntax surface (grounded against the grammar)

All of the concurrency/atomics surface lands on existing sx grammar — enum tagged unions + if x == { case … } match (specs.md:364,408), first-class tuples with named fields (specs.md:815-852), => closures, struct($T) generics, callconv(...), and the ERR keywords (try/catch/onfail/raise/error). race/async/await/atomic are not reserved words (specs.md:168), so they stay library types/methods — no keyword additions. One genuinely-new compiler capability is required (see end).

Atomics (N1) — generic wrapper type.

Ordering :: enum { relaxed; acquire; release; acq_rel; seq_cst; }
Atomic   :: ($T: Type) -> Type #builtin;   // atomicity carried by the type

counter : Atomic(i64) = .init(0);
counter.store(0, .relaxed);
n    := counter.load(.acquire);
prev := counter.fetch_add(1, .seq_cst);            // + fetch_sub/and/or/xor/min/max (no nand)
old  := counter.swap(42, .acq_rel);
got  := counter.compare_exchange(old, new, .acq_rel, .acquire);        // strong → ?T (null = success)
got2 := counter.compare_exchange_weak(old, new, .acq_rel, .acquire);   // may fail spuriously; for retry loops
fence(.seq_cst);
  • CAS takes two orderings (success, failure); failure ordering may not be release/acq_rel nor stronger than success — enforce in the compiler.
  • Weak vs strong matters on aarch64 (LL/SC) — weak in a loop is the idiom; both compile identically on x86.
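
The CAS ordering rule above is mechanical enough to state as a check. The sketch below is a Python model of one possible reading, not the compiler's implementation: `check_cas_orderings` is a hypothetical name, and the strength ranking (acquire and release treated as equally strong) is an assumption.

```python
# Strength ranks are an assumption for illustration: acquire and release are
# treated as equally strong because they constrain different directions.
STRENGTH = {"relaxed": 0, "acquire": 1, "release": 1, "acq_rel": 2, "seq_cst": 3}


def check_cas_orderings(success: str, failure: str) -> None:
    """Raise ValueError for a (success, failure) pair that the rule rejects."""
    for name in (success, failure):
        if name not in STRENGTH:
            raise ValueError(f"unknown ordering: {name}")
    # A failed CAS performs no store, so a release component is meaningless.
    if failure in ("release", "acq_rel"):
        raise ValueError(f"failure ordering may not be {failure}")
    if STRENGTH[failure] > STRENGTH[success]:
        raise ValueError(
            f"failure ordering {failure} is stronger than success ordering {success}"
        )
```

With this ranking, `(.acq_rel, .acquire)` from the example passes, while `(.relaxed, .acquire)` and `(.acq_rel, .release)` are rejected.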

Channels (N3) — methods only (no <-); recv returns a tagged union (not (v, ok)).

RecvResult :: enum($T: Type) { value: T; closed; }        // ordinary generic enum (not the race-synthesized union)
TryResult  :: enum($T: Type) { value: T; empty; closed; } // non-blocking: 3 states a bool can't express

ch := Channel(i64).make(16);     // capacity; .make() unbuffered
ch.send(v);
if ch.recv() == { case .value: (v) { use(v); }  case .closed: { /* drained */ } }
ch.close();
// ergonomic layer: `for ch (v) { … }` consumes until closed, hiding RecvResult

Fiber-aware locks (N3) — explicit lock + defer (no guard sugar).

m : Mutex;
m.lock();  defer m.unlock();

Futures & spawn (A1).

f := context.io.async(worker, arg);     // Future(R)
r := f.await();                         // suspends this fiber
f.cancel();
d := context.io.timeout(5000);          // a Future too — raceable like any other

Pinning (A5) — spawn attribute, accepts a thread handle.

PinTarget :: enum { any; main; on: Thread; }            // default = .any (may migrate)
f := context.io.async(render, pin = .main);
f := context.io.async(worker, pin = .on(some_thread));

race (Zig model — over futures, named tuple in → synthesized tagged-union out). The input is a named tuple (positional also allowed → .0/.1 tags); the result is an anonymous tagged union whose variants mirror the tuple's labels, each payload = that field's Future(T) projected to T. Losers are cancelled and joined before race returns (structured).

fa := context.io.async(read_a, conn);     // Future(A)
fb := context.io.async(read_b, conn);     // Future(B)

winner := context.io.race((a: fa, b: fb));   // RaceResult = enum { a: A; b: B }
if winner == {
    case .a: (v) { handle_a(v); }            // v : A
    case .b: (v) { handle_b(v); }            // v : B
}
// positional form: race((fa, fb)) → tags .0 / .1

The Go-style handler-map and the map literal that propped it up are dropped: race over futures subsumes select, and cancellation handles the losers.

Cancellation rides ERR. A cancelled io.* raises; the fiber unwinds through defer/onfail (try/catch/raise are real keywords). Cancellation is cooperative (observed only at suspend points — every io.* is a cancellation point) and structured (race joins losers' teardown before returning). No parallel unwind path — it reuses the error channel.

Context switch (A2).

swap_context :: (from: *Fiber, to: *Fiber) callconv(.naked) {
    asm { /* save callee-saved + SP into *from; load from *to; ret */ };
}

callconv(.naked) differs from callconv(.c): it has no prologue/epilogue/frame, which is required because a context switch deliberately makes SP-in ≠ SP-out (a .c epilogue would restore from the wrong stack). Body is a single asm block; you emit your own ret. Args arrive in ABI registers, read directly from asm.

One new compiler capability (gates race): comptime tuple→tagged-union synthesis. Reflection today only reads types (field_count/field_name/type_of); RaceResult(T) must construct an anonymous enum from a tuple's (label, payload-type) pairs. Supporting pieces: a field_type($T, i) -> Type reflection accessor (we have value-level field_value + type_of, but type-only field projection is missing) and Future(T) → T projection (falls out of generics). This is the generic "derive a sum from a product", useful beyond race.
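
The "derive a sum from a product" step can be sketched outside sx. The following Python model is illustrative only: `Future` and `race_result_variants` are hypothetical stand-ins, with a dict playing the named tuple and a plain tuple playing the positional form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Future:
    value_type: type   # stands in for Future(T) exposing its payload type T


def race_result_variants(futures):
    """Map race's input to the (tag, payload type) variants of its result union.

    `futures` is a dict for the named form or a tuple for the positional form.
    """
    if isinstance(futures, dict):
        pairs = list(futures.items())              # labels become variant tags
    else:
        pairs = [(str(i), f) for i, f in enumerate(futures)]   # tags "0", "1", ...
    # Project each field's Future(T) to T, mirroring the Future(T) -> T projection.
    return [(tag, f.value_type) for tag, f in pairs]
```

For example, `race_result_variants({"a": Future(int), "b": Future(str)})` yields variants `a: int` and `b: str`, matching the shape of `RaceResult = enum { a: A; b: B }`.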


5. Dev loop / hot-reload

ID Piece Notes Depends Size
R1 Hot-reload (dylib swap) — host owns State+allocator; reloadable module is a .dylib with a fixed export interface; watch→rebuild→dlopen→rebind→dlclose. State survives (host-owned). leans on export (shipped); sidesteps S2; native M
R2 Hot-reload (JIT-resident) — program runs under S1's LLJIT; reloadable calls route through ORC indirection stubs, repointed on change. Finer granularity; same spine. S1, S2 L
R3 Incremental compilation — dependency tracking + recompile-only-changed. Perf enabler; coarse per-file v1 suffices first. L

Core rule: the data that must survive a reload cannot be owned by the code that reloads. Code/state separation — the CLAUDE.md owning-allocator discipline, one level up.

Residue — state migration on layout change: body-only changes hot-swap; layout/signature/global-type changes are detected (compare new vs running State layout via types.zig) and trigger rebuild+restart. Migration hooks (on_reload(old)→new) are a hard later item. Design against silent corruption.
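
The detect-and-degrade rule for layout changes can be sketched as a comparison of the running and rebuilt State layouts. This is a minimal Python model under assumptions: a layout is taken to be a list of `(field_name, type_name, offset)` tuples, and `layout_diff` / `reload_action` are hypothetical names, not functions in types.zig.

```python
def layout_diff(running, new):
    """List human-readable differences between two State layouts."""
    old = {name: (ty, off) for name, ty, off in running}
    cur = {name: (ty, off) for name, ty, off in new}
    diffs = [f"removed field {n}" for n in sorted(old.keys() - cur.keys())]
    diffs += [f"added field {n}" for n in sorted(cur.keys() - old.keys())]
    diffs += [
        f"changed field {n}: {old[n]} -> {cur[n]}"
        for n in sorted(old.keys() & cur.keys())
        if old[n] != cur[n]
    ]
    return diffs


def reload_action(running, new):
    """Hot-swap only when the layout is unchanged; otherwise rebuild + restart."""
    diffs = layout_diff(running, new)
    return ("hot-swap", []) if not diffs else ("restart", diffs)
```

Returning the diff alongside the decision keeps the restart loud: the host can print exactly which field forced it instead of silently reinterpreting old memory.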


6. Cross-platform (mostly landed) — from a macOS laptop

6.1 Landed

| Capability | State | Reach from a mac |
|------------|-------|------------------|
| extern/export C linkage | done (replaced #foreign) | all targets |
| Bundled-zig cc cross-link backend | Phases 0–2 done; packaging pending | macOS, Linux(-musl/static), Windows(-gnu) verified |
| sx-side bundler (.app/.apk) | done | macOS, iOS sim/device, Android |
| JIT sx run (ORC LLJIT) | done | host |
| Target shorthands | done | macos[-arm], linux[-musl[-arm]], windows[-gnu], ios[-arm], ios-sim[-arm/-x86], android[-arm64/-x86_64], wasm |

6.2 Workflows

# macOS (native): inner loop is JIT; ship is Mach-O / .app
sx run app.sx
sx build app.sx -o app
sx build app.sx --bundle MyApp.app

# Linux (cross, landed killer feature): static, zero-dep ELF
sx build app.sx --target linux-musl -o app      # scp anywhere, runs

# Windows (cross, landed, MinGW path): PE32+
sx build app.sx --target windows-gnu -o app.exe # cf. example 1660 (win32)

# iOS simulator (mac-only host)
sx build app.sx --target ios-sim --bundle App.app

# iOS device — signing threaded via the build program (BuildOptions setters)
#   #run { o := build_options(); o.set_bundle_id(...); o.set_codesign_identity(...);
#          o.set_provisioning_profile(...); }
sx build build.sx --target ios --bundle App.app

# Android (cross + bundle): javac → d8 → aapt2 → zipalign → apksigner, then adb
sx build app.sx --target android --apk app.apk

6.3 Where the roadmap lights up cross-platform

  • C1 (with the dropped C4 as an optional later optimization) → the iOS/Android bundlers (orchestrate ~a dozen host tools at comptime; biggest win; always host-arch so no cross-arch risk).
  • R1/R2 + A1–A5 → the inner dev loop for non-host targets: push-a-dylib + remote-trigger-reload over an async laptop↔device channel, a capability that doesn't exist today short of full rebuild+reinstall.
  • A1/A2 colorblind Io → the dev tooling is itself async, and the same networking code runs blocking inside the bundler (adb push) and async in the live session — no coloring.
  • Pinning (A5) → the UI render fiber pins to the main OS thread on every app target.

The single hard constraint the matrix exposes: cross builds mean target arch ≠ host arch, so C3's residue bites — comptime/#run code reaching target-arch inline asm can't execute on the mac. Native macOS dev never hits it; every cross target must gate comptime asm to host-arch (when host_arch == …) or get a loud diagnostic.


7. Linear build sequence (async-first — no parallel streams)

Single ordered list; deps satisfied at every step. Async-first (user-chosen): the async story needs no JIT spine (syscalls use the existing trampoline FFI; comptime async = blocking Io), so the FFI/JIT cluster comes after. C4 is omitted (dropped — an S1 optimization if ever profiled). Net-new compiler prereqs (per the codebase grounding) are explicit steps, not buried.

Foundations — compiler primitives the async story needs (all net-new):

  1. N1 — Atomics lowering. IR/inference scaffolding exists; add LLVM atomicrmw/cmpxchg/fence emission + orderings. Surface = Atomic($T) wrapper. Gates channels/N3 + parallel schedulers.
  2. Generic enums enum($T): DROPPED. RecvResult($T)/TryResult($T) are type-fns over reify (step 3), not a new enum($T) language feature, and type-fns (user ($T)->Type in type position) already work (e.g. Make, Complex). A declarative enum($T) surface, if ever wanted, is later sugar desugaring to a type-fn-over-reify.
  3. type_info + reify + field_type — comptime metaprogramming floor. Gates race synthesis and channel RecvResult/TryResult (all type-fns over reify; generic-enum syntax dropped). Validated against the codebase (3 reviewers): a small extension reusing existing machinery throughout — not net-new architecture. Five contracts:
    1. Nominal identity via type-fn memoization — type-fns dedup by mangled (fn,args) name (generic.zig:1620-1629) + reify findByName, so RecvResult(i64) is one TypeId and the body runs once. (NOT structural dedup — enums are nominal via nominal_id, types.zig:1110.)
    2. Functional through codegen — layout / construct / match+exhaustiveness / toLLVMType / type_name+format are all type-table-driven, zero AST coupling, so a backing-decl-less reify'd enum flows through unmodified.
    3. Validate loudly at the single intern/internNominal choke point (types.zig:411-439): reject dup variants / bad backing / unresolved payloads.
    4. Comptime-only, JIT-free — a type-table op in the interp; no S1 dependency (keeps reify, hence channels + race, off the JIT critical path).
    5. Reference-based self-reference (v1): *Self/[]Self payloads via the reserve-placeholder→complete path recursive source types already use (nominal.zig:86/108/120, types.zig:442); by-value recursion rejected (loud, infinite size). reify gains a reify_rec((self) => …) builder form.
    • Type-minting precedents (7): monomorphization, protocol vtables, tuples, vector/array, ptr/slice ctors, FFI stubs, type-fn instantiation — all construct TypeInfo programmatically + intern(). Residual = plumbing, not capability: name reify-results by the instantiation's mangled name (done for inline-struct bodies — extend to reify-results) + reify input validation.
  4. callconv(.naked) — extend CallConv {default, c} (types.zig:169) + skip prologue/epilogue lowering. Gates A2.
  5. Repointable-context codegen — lower context as a swappable indirection (never raw TLS) + per-fiber stack-limit. Compiler obligation; gates A2 and cross-fiber context.io correctness. (Reviewer note: this is a prerequisite of A2, not a successor.)
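
Contract 1 of step 3 (nominal identity via type-fn memoization) can be illustrated with a toy type table. This is a Python sketch under assumptions, not the code in generic.zig or types.zig: `TypeTable`, `instantiate`, and the mangling format are hypothetical.

```python
class TypeTable:
    """Toy type table: type-fn instances are interned by mangled (fn, args) name."""

    def __init__(self):
        self.types = []        # TypeId -> (mangled_name, variants)
        self._by_name = {}     # mangled_name -> TypeId
        self.body_runs = 0

    def instantiate(self, fn_name, args, body):
        mangled = f"{fn_name}({', '.join(args)})"
        if mangled in self._by_name:       # name hit: reuse the TypeId, skip the body
            return self._by_name[mangled]
        self.body_runs += 1
        variants = body(*args)             # stands in for running the reify body
        type_id = len(self.types)
        self.types.append((mangled, variants))
        self._by_name[mangled] = type_id
        return type_id


def recv_result(t):
    # Variant list a reify-based RecvResult(T) body might produce.
    return [("value", t), ("closed", None)]
```

Two calls with the same `(fn, args)` return one TypeId and run the body once, while a differently named type-fn producing identical variants still gets its own TypeId: dedup is by name, not by structure.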

Async runtime (sx lib over the primitives):

  6. A1: Io interface + context.io + Future + cancel() API.
  7. A2: fiber runtime (naked context-switch asm, bootstrap, mmap stacks).
  8. A3: blocking Io → deterministic-sim Io (keystone, calibrated) → event-loop Io.
  9. A5·M:1: single-thread scheduler.
  10. N3: fiber-aware sync (channels/mutex/waitgroup; recv → RecvResult).
  11. A6: Cancellation. .canceled in the ! channel (model a); per-fiber atomic flag (N1); every io.* a cancellation point; structured cancel-and-join; masked during cleanup.
  12. A4: stdlib I/O rework (fs/socket/process onto context.io).
  13. A5·N×(M:1): first parallel (errno-capture + context-fiber-local discipline).
  14. A5·M:N: work-stealing (steal queues + migration + pinning).

Then comptime / FFI / JIT cluster:

  15. S1: persistent JIT spine.
  16. C1: real FFI (LLVM = ABI authority, on S1).
  17. C2: #compilerextern.
  18. C3: comptime asm (S1 + C1; +S2 if TLS/ctors).

Deferred tail:

  19. S2: ORC C++ shim (highest-risk, see §8; macOS MachOPlatform; ELF/COFF unplanned).
  20. R1: dylib reload (shipped export).
  21. R2: JIT-resident reload (S1 + S2; ↔ async live-fiber coupling, §8).
  22. R3: incremental compilation.

Hard edges to remember: C1 depends on S1 (the non-trivial FFI cases); C3 depends on C1 (calls through its thunk path); R1/R2 couple to the async runtime (can't hot-swap code with live suspended fibers — runtime + long-lived fibers stay persistent, only leaf logic reloads).
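
The claim that dependencies are satisfied at every step can be checked mechanically. The Python sketch below encodes the sequence with abbreviated labels (the dropped generic-enums step is omitted) and only the dependency edges this document states; it is a partial graph for illustration, and `order_violations` is a hypothetical helper.

```python
# Build order from the sequence above (labels abbreviated; dropped step 2 omitted).
ORDER = ["N1", "reify", "naked", "repointable-context",
         "A1", "A2", "A3", "A5-M:1", "N3", "A6", "A4", "A5-NxM:1", "A5-M:N",
         "S1", "C1", "C2", "C3", "S2", "R1", "R2", "R3"]

# Dependencies stated in this document (not an exhaustive graph).
DEPS = {
    "A2": ["naked", "repointable-context"],
    "N3": ["N1", "A2", "A5-M:1"],
    "A6": ["N1"],
    "A5-NxM:1": ["N1"],
    "A5-M:N": ["N1", "A2"],
    "C1": ["S1"],
    "C2": ["C1"],
    "C3": ["S1", "C1"],
    "R2": ["S1", "S2"],
}


def order_violations(order, deps):
    """Return (step, dependency) pairs where the dependency comes later in order."""
    position = {step: i for i, step in enumerate(order)}
    return [(step, dep)
            for step, needs in deps.items()
            for dep in needs
            if position[dep] > position[step]]
```

On the order as listed, `order_violations(ORDER, DEPS)` is empty; moving C1 ahead of S1 would surface the `("C1", "S1")` edge.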


8. Irreducible hard problems (detect-and-degrade, don't pretend)

  1. State migration across layout change (R1/R2) → v1 detects + rebuild/restart; migration hooks later.
  2. Cross-arch comptime asm (C3) → can't run on host; narrows the bail + loud diagnostic; gate to host-arch.
  3. M:N migration hazards (A5) → errno-capture discipline + fiber-local context (mandatory), pinning for thread-affine work.

8.1 Highest technical risks (from review — ranked, async-first lens)

  1. A2 context-switch correctness (in the async critical path). Silent stack corruption, per-arch, untestable by the deterministic-Io harness (it tests scheduling, not the switch); a one-register slip is invisible until it crashes on the right arch. Couples library asm to the compiler ABI — ABI drift breaks it silently later. → needs a dedicated switch-stress test (§10).
  2. reify → anonymous-tagged-union → match-codegen (gates race + channels). DE-RISKED by review (§7 step 3): all enum stages are type-table-driven with zero AST coupling, identity is handled by existing type-fn mangled-name memoization, and forward-declaration for self-ref already exists. Residual is plumbing (name reify-results by mangled name + input validation), not new architecture.
  3. Deterministic-Io is the test keystone yet itself uncalibrated — a buggy deterministic scheduler yields deterministic-wrong stdout that snapshots lock in. → calibrate against the blocking Io / property-test fixed order (§10).
  4. context-fiber-local + errno discipline (A5 M:N). "Non-negotiable" but enforced by manual rule, not the compiler; M:1 can't even exercise migration.
  5. S2 ORC shim (deferred, but highest-risk when reached): only C++ in the tree, already failed a spike (_Thread_local SIGABRT), MachOPlatform is macOS-specific — Linux/Windows JIT-resident reload + non-Mac TLS/ctor JIT have no named plan. One "M" box hides a per-OS effort.
  6. C1 args-buffer layout-vs-ABI — "LLVM emits the call" covers the call, not the interpreter's buffer pack from type_info. Disagreement on edge layouts (over-aligned/empty structs, aarch64 small-struct register splitting, bool) = silent comptime corruption. → adversarial layout cases (§10).

9. Decisions log (all resolved)

Sequencing (locked): async-first (§7). The async cluster (steps 1–14) precedes the FFI/JIT cluster (15–18) because async needs no JIT spine. Cancellation (A6) = model (a): a .canceled variant in the existing ! error channel that io.* already returns (I/O is inherently fallible, so io.* is already !-typed, and the "keep calls clean" argument for the non-local-raise model is moot). Reuses !/try/catch/onfail; no new unwind primitive. Net-new prereq surfaced by grounding: callconv(.naked) (only .default/.c today). Generic enums dropped: RecvResult($T)/TryResult($T) are type-fns over reify (type-fns already work in type position, e.g. Make/Complex), so no enum($T) feature is needed; reify gains two contracts (deterministic identity + functional-enum output, §7 step 3).

Locked (see §4.6 for the grounded surface):

  • N1 atomics surface = generic wrapper Atomic($T) + Ordering enum, .init, compare_exchange/_weak returning ?T (null = success — pinned, opposite of most priors). (Not @atomic_* builtins — @ is address-of in sx.) RMW set = add/sub/and/or/xor/swap + fetch_min/fetch_max (free from LLVM); no nand.
  • race = over futures (Zig model), single named-tuple in (race((a: fa, b: fb))) → synthesized tagged-union out; Go-style handler-map + map literal dropped. No async spawn-sugar — always context.io.async(...).
  • Channels = send/recv methods (no <-); recv returns a tagged union RecvResult($T){ value; closed } (not (v, ok)), try_recv{ value; empty; closed }; optional for ch (v) {…} iteration sugar. locks = lock() + defer unlock() (no guard sugar). race/async/await stay library, not keywords.
  • Comptime type metaprogramming = type_info + reify builtins only (Zig @typeInfo/@Type model). Everything else is sx lib: make_enum, field_type, RaceResult. reify coverage starts at enum/struct/tuple, grows later. Future($T) exposes Value :: T so Future(X)→X is plain member access (no type_arg builtin).
  • C1 FFI engine = LLVM as single ABI authority — per-signature JIT calling-thunks via S1 (LLVM emits the ABI-correct call, same as runtime codegen); trampoline fast-path for trivial calls. libffi/dyncall + hand-rolled-sx rejected (2nd/3rd ABI impl; hand-rolled needs C3 for its own asm leaf anyway). Promotes S1 to foundational (shared by C1, C3).

Scheduler (Decision 5) — locked: M:1 → N×(M:1) → M:N, all sx std-lib Io vtables (compiler only provides N1 atomics + the A2 asm context-switch + extern syscalls). M:1 ships first (validates the colorblind stack, covers I/O-bound); N×(M:1) is the first parallel step; M:N is last in sequence but committed — not deferred. Data races under parallelism are expected and handled with atomics + fiber-aware sync — that is parallelism, not a wart; M:1's lock-freedom is just a property of the single-threaded case.

Deferred, orthogonal additions (Decisions 6–7), both addable later without revisiting anything locked:

  • C4 (Decision 6) — fully orthogonal; not built now. Pure deferred optimization riding S1 (already present for C1/C3): JIT the bundler subgraph instead of interpreting it. Zero coupling — same bundler sx, same C1 FFI. Apply only if profiling ever shows the bundler's own logic is a hotspot (it's I/O-bound, so unlikely). Interp+C1 is the shipping bundler.
  • Hot-reload (Decision 7) — deferred; mechanism additive. Substrate ready: R1 (dylib-swap) needs only shipped export; R2 (JIT-resident) needs S1 + the S2 ORC shim. R1-vs-R2 chosen at pickup. One coupling (a design constraint, not a decision change): you can't hot-swap code with live suspended fibers pointing into the old module — so the async runtime + long-lived fibers stay on the persistent side, only transient leaf logic is reloadable (or quiesce fibers before swap).

10. Testing & gates

Inherits the project cadence (CLAUDE.md): zig build && zig build test after every step; xfail-then-green or behavior-lock — no commit both adds a test AND makes it pass; never regenerate snapshots while red; corpus = examples/ + issues/ with .exit/.stdout/.stderr/.ir snapshots. Per-step gates live in the eventual PLAN-* streams; this section is the design-level verification strategy that those streams must implement.

10.1 The async test harness = the deterministic-simulation Io (the keystone)

Concurrency is nondeterministic (scheduling/readiness order), which breaks snapshot testing outright. So the deterministic-sim Io (fixed clock, scripted readiness, deterministic single-stepping scheduler) is not merely a feature — it is the test harness for everything async. Every concurrency example runs under it → reproducible stdout → snapshottable. Consequence for sequencing: build the deterministic Io right after the blocking Io (it's the simplest scheduler after blocking and it gates the ability to test fibers/channels/race/schedulers at all). The 10 patterns in §4.6-adjacent examples become corpus tests only because they run under it.
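
To make the keystone concrete, here is a minimal Python model of a deterministic-simulation scheduler with a fake clock. It is a sketch under assumptions, not the sx Io vtable: `SimIo` and `worker` are hypothetical, and Python generators stand in for fibers (generators are colored, unlike sx fibers, so only the scheduling determinism is modeled).

```python
import heapq


class SimIo:
    """Deterministic single-threaded scheduler driven by a fake clock.

    A task is a generator that yields a delay in fake milliseconds, standing in
    for a fiber suspending at an io.* call. Ties are broken by enqueue order.
    """

    def __init__(self):
        self.now = 0
        self.trace = []
        self._queue = []   # heap of (wake_time, seq, name, task)
        self._seq = 0

    def spawn(self, name, task):
        self._push(self.now, name, task)

    def _push(self, wake, name, task):
        heapq.heappush(self._queue, (wake, self._seq, name, task))
        self._seq += 1     # unique seq keeps heap ordering total and stable

    def log(self, name, msg):
        self.trace.append((self.now, name, msg))

    def run(self):
        while self._queue:
            wake, _, name, task = heapq.heappop(self._queue)
            self.now = wake                # time advances only here, never by wall clock
            try:
                delay = next(task)         # run the task until its next suspend point
            except StopIteration:
                continue                   # task finished
            self._push(self.now + delay, name, task)
        return self.trace


def worker(io, name, delay):
    io.log(name, "start")
    yield delay                            # suspend for `delay` fake milliseconds
    io.log(name, "done")
```

Because the clock and tie-breaking are fully scripted, two runs of the same program produce the same trace, which is the property that makes concurrency examples snapshottable.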

10.2 What is NOT snapshot-testable

True parallel data races (N×M:1 / M:N) are nondeterministic by construction. Programs that can race still run under the deterministic Io to reproduce correctness bugs, but race detection needs a separate stress harness (run-N-times / TSan-style), not the corpus. Any such coverage bound must be stated loudly (a log()-style note in the harness), never silently skipped, per the REJECTED-PATTERNS rule against silent gaps.
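A run-N-times harness with a loud coverage note could look roughly like the following Python sketch. Everything here is illustrative: `stress` and `locked_counter_trial` are hypothetical names, and the trial uses a lock-protected counter simply so the example has a known-correct outcome. The sketch shows the shape, not the eventual sx harness.

```python
import threading

def locked_counter_trial(threads=4, iters=1000):
    """One trial: several threads bump a shared counter under a lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def bump():
        for _ in range(iters):
            with lock:
                counter["value"] += 1

    workers = [threading.Thread(target=bump) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter["value"] == threads * iters

def stress(trial, runs, log=print):
    """Run `trial` repeatedly and state the coverage bound loudly."""
    log(f"NOTE: stress harness: {runs} runs; a pass bounds, "
        "but does not prove, race freedom")
    failures = sum(1 for _ in range(runs) if not trial())
    log(f"stress harness: {failures}/{runs} runs failed")
    return failures

if __name__ == "__main__":
    stress(locked_counter_trial, runs=20)
```

The note is emitted before any trial runs, so the bound is visible even if the harness is interrupted.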

10.3 Arch-sensitive lowering — atomics + context-switch

Atomic orderings lower differently per arch (x86 lock-prefix / plain MOV vs aarch64 LL/SC / ldar/stlr), and the A2 context-switch is per-arch asm. Lock both with the existing inline-asm cross-arch sibling pattern: a .build {"target": "…"} sidecar runs ir-only on a non-matching host (asserts .ir + .exit + .stderr from sx ir --target) and end-to-end on a matching CI runner. So Atomic lowering carries x86_64 + aarch64 .ir snapshots; the context-switch gets per-arch run tests on matching runners.
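The gating rule above (ir-only on a non-matching host, end-to-end on a matching runner) reduces to a small decision function. The Python sketch below assumes the sidecar target can be reduced to an architecture name by taking the text before the first `-`; that string format, the alias table, and the function names are assumptions for illustration, not the real sidecar schema.

```python
import platform

# Hypothetical alias table: different OSes report the same arch differently.
_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}

def normalize_arch(name):
    """Lower-case an arch name and fold common aliases together."""
    arch = name.lower()
    return _ALIASES.get(arch, arch)

def gate_mode(target, host_arch=None):
    """Return "end-to-end" when the target arch matches the host, else "ir-only".

    `target` is assumed to start with the arch, e.g. "aarch64" or
    "aarch64-<rest>"; only the part before the first "-" is compared.
    """
    target_arch = normalize_arch(target.split("-", 1)[0])
    host = normalize_arch(host_arch or platform.machine())
    return "end-to-end" if target_arch == host else "ir-only"

if __name__ == "__main__":
    for t in ("x86_64", "aarch64"):
        print(t, "->", gate_mode(t))
```

Passing `host_arch` explicitly keeps the decision itself testable on any machine; only the default consults `platform.machine()`.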

10.4 New corpus categories

17xx atomics · 18xx concurrency (fibers/channels/race/async, all under the deterministic Io). Comptime metaprogramming (type_info/reify) + comptime-asm extend 06xx; C1 FFI extends 12xx; the cross-arch comptime-asm loud bail and the cancellation diagnostics are 11xx.

10.5 Per-piece gates (design level)

| Piece | Locks via |
| --- | --- |
| N1 atomics | unit emit_llvm.test.zig (LLVM atomicrmw/cmpxchg/fence + ordering emission); corpus 17xx single-thread (deterministic); arch-gated .ir (x86_64 + aarch64) |
| type_info / reify | unit (reflect round-trips; reify'd enum has correct layout/match codegen); corpus 06xx comptime (deterministic) |
| C1 FFI | behavior-lock existing trampoline cases first; then xfail→green 12xx comptime extern with floats / structs-by-value / aggregate ({ptr,len}) returns; unit for thunk-synth + args-buffer marshal |
| S1 spine | infra — exercised transitively via C1/C3 examples; unit for LLJIT lifecycle + thunk cache |
| C3 comptime asm | corpus 06xx host-arch #run asm computes a value; 11xx diagnostic asserts the cross-arch loud bail |
| A1/A2 fibers | unit (scheduler step, fiber bootstrap); context-switch arch-gated run tests; corpus 18xx under deterministic Io |
| A3/A5 schedulers, channels, race, cancel | corpus 18xx (the 10 patterns) under deterministic Io → deterministic snapshots; cancellation cleanup (onfail/defer) asserted via stdout ordering |

10.6 Cadence example (atomics, N1)

  1. xfail — add examples/17xx-atomics-fetch-add.sx using Atomic(i64).fetch_add; seed the .exit marker → red (codegen missing). (test added, not yet passing)
  2. green — emit LLVM atomicrmw add + ordering; example passes; capture .stdout + x86_64/aarch64 .ir snapshots; review the diff. (makes it pass, no new test)

This satisfies "no commit both adds a test and makes it pass," and every other piece follows the same xfail→green (or behavior-lock→extend) shape.

10.7 Review-surfaced gaps (the high-corruption-risk pieces need correctness, not existence, tests)

The §10.5 gates prove things run; the §8.1 risks are silent-corruption modes a run/snapshot test won't catch. Each needs an explicit adversarial gate:

  • A2 context-switch — switch-stress test. Scribble every callee-saved register + a stack-canary before suspend; deep/recursive fiber chains; verify all survive post-resume. Run/snapshot tests don't prove register preservation. (The single highest-corruption-risk piece, §8.1.1.)
  • Deterministic-Io — calibrate the oracle. Cross-check a handful of cases against the blocking Io and property-test that scheduling order is actually fixed, before trusting it to gate everything async (a deterministic-but-wrong scheduler snapshots garbage).
  • context-fiber-local invariant — named test at the N×M:1/M:N step. M:1 can't exercise migration; add a test that forces a fiber to migrate and asserts it reads its context/errno, not the new thread's.
  • N1 ordering semantics are out of snapshot scope — state it loudly. .ir snapshots prove the keyword emitted, not weak-memory correctness (e.g. relaxed where acquire was needed ships green). Declare this out-of-scope parallel to §10.2's race carve-out; lock-free structures need the stress harness.
  • C1 args-buffer — adversarial layout cases. Over-aligned structs, empty structs, aarch64 small-struct register splitting, bool — a wrong layout that happens to print right passes a stdout test. Call these out explicitly, not just "structs-by-value."
  • S2 — has no gate today despite a prior spike failure. When reached, add a TLS + C-constructor JIT test (the exact _Thread_local SIGABRT case), per host OS.
  • Hot-reload — no row today. When picked up: state-survival test + the live-suspended-fiber-into-stale-module hazard (R1/R2).
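As one illustration of the "calibrate the oracle" gate above, the Python sketch below cross-checks a stepping scheduler against a run-to-completion ("blocking") runner and property-tests that the stepping trace is identical across repeats. It is a toy model under stated assumptions: tasks are generators, and `task`, `run_blocking`, `run_stepping`, and `calibrate` are hypothetical names rather than part of sx.

```python
def task(name, n):
    """Sum 1..n, suspending after each addition; the yield is a trace event."""
    total = 0
    for i in range(1, n + 1):
        total += i
        yield f"{name}:{i}"
    return total

def run_blocking(specs):
    """Reference oracle: run each task to completion, one after another."""
    results = {}
    for name, n in specs:
        gen = task(name, n)
        try:
            while True:
                next(gen)
        except StopIteration as stop:
            results[name] = stop.value
    return results

def run_stepping(specs):
    """Round-robin single-stepping scheduler; returns (results, trace)."""
    queue = [(name, task(name, n)) for name, n in specs]
    results, trace = {}, []
    while queue:
        name, gen = queue.pop(0)
        try:
            trace.append(next(gen))
            queue.append((name, gen))   # still runnable: back of the queue
        except StopIteration as stop:
            results[name] = stop.value
    return results, trace

def calibrate(specs, repeats=100):
    """Check results agree with the oracle and the order never varies."""
    expected = run_blocking(specs)
    first_results, first_trace = run_stepping(specs)
    assert first_results == expected, "stepping scheduler disagrees with blocking"
    for _ in range(repeats):
        results, trace = run_stepping(specs)
        assert results == expected
        assert trace == first_trace, "scheduling order is not fixed"
    return first_trace

if __name__ == "__main__":
    print(calibrate([("a", 3), ("b", 2)]))
```

The two assertions cover different failure modes: agreement with the blocking runner catches a scheduler that is deterministic but wrong, and trace equality catches one that is right but not actually fixed-order.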