sx/current/PLAN-FIBERS.md
agra 7044b8133b fibers: carve Stream B1 (PLAN-FIBERS + CHECKPOINT-FIBERS)
Carve the async-runtime fibers stream off PLAN-POST-METATYPE Stream B,
mirroring the atomics carve. Grounds the B1 compiler floor against the
tree:

- abi(.pure) exists in the ABI enum but is inert (type_resolver maps it
  to .default CC, emit emits no naked attr) -> B1.0 makes it emit LLVM
  naked + skip prologue/ctx. Corrected the design's callconv(.naked)
  spelling to the real abi(.pure).
- context is already an implicit *Context param (slot 0) + push Context
  is a stack alloca -> fiber-local for free; only shared root is the
  __sx_default_context global. B1.1 grounded as likely library-only
  (probe-first).
- B1.0 snapshot story corrected: naked body is raw per-arch asm -> two
  arch-gated examples (aarch64 + x86_64), not one host .ir.

Full xfail->green step detail + a B1.0a kickoff prompt. Baseline green
(721/0). No code change; first implementation step is B1.0a.
2026-06-20 14:16:39 +03:00


PLAN-FIBERS — Stream B1 (fibers + Io + M:1 scheduler)

STATUS: 🚧 carved, not started. First implementation step = B1.0a (naked-ABI lock commit). See the kickoff prompt at the bottom.

Carved from PLAN-POST-METATYPE.md Stream B (§B1) + the design-of-record ../design/execution-evolution-roadmap.md §4 (async), §7 steps 4–9, §8.1 (risks), §10 (testing). Progress in CHECKPOINT-FIBERS.md. Stream B2 (channels/cancel/stdlib) is a separate carve ([PLAN-CHANNELS.md], when reached) and depends on this + atomics.

Goal: the colorblind, stackful, pure-sx async runtime — fibers behind an Io interface, an M:1 scheduler, blocking + deterministic-sim + event-loop Io impls. The compiler floor is small and net-new: make abi(.pure) actually emit an LLVM naked function (B1.0), and confirm/close the per-fiber context root (B1.1). Everything else — the context-switch asm, fiber bootstrap, mmap stacks, the scheduler, futures, the Io vtables — is ordinary sx library code (design §4, §4.4). The irreducible FFI floor: the per-arch asm context-switch (in .sx), syscall externs, and mmap.

Cadence (IMPASSIBLE): no commit both adds a test AND makes it pass (lock-to-bail, then flip to green); zig build && zig build test green after every step; never regen snapshots while red; scope regens with -Dname=examples/NNNN-…sx -Dupdate-goldens + review the diff. New corpus category: 18xx concurrency. On an unrelated compiler bug → file issues/NNNN, mark this checkpoint BLOCKED, STOP (CLAUDE.md). The in-session worker-fix override (delegate a blocker to a worker) applies only with explicit user authorization.


Design (grounded against the tree)

B1.0 — abi(.pure) ⇒ LLVM naked (the one genuinely net-new compiler piece in B1)

The design doc spells callconv(.naked); the real sx surface is abi(.pure) — written in the postfix slot, name :: (sig) -> Ret abi(.pure) { asm { … }; } (cf. build_options :: () -> BuildOptions abi(.compiler); in build.sx:28).

Grounding (verified — do not re-derive):

  • The ABI enum already carries .pure: ABI = enum { default, c, compiler, pure } (ast.zig:142), documented "pure / naked function (inline asm body), no calling-convention prologue/epilogue." So B1.0 is NOT "extend the enum."
  • .pure is inert today: type_resolver.zig:237 maps .compiler, .pure → .default CC, and emit_llvm emits no naked attribute. So the net-new work is exactly: carry abi == .pure into the IR Function, emit the LLVM naked attr, and skip the implicit-Context / prologue lowering so the body is just the asm block + its own ret.
  • The IR Function struct (inst.zig:605) carries call_conv (default/c) + is_compiler_domain, but no naked flag — add one (is_naked: bool).
  • Attribute API is in-tree: nounwind is set at emit_llvm.zig:1339 via LLVMGetEnumAttributeKindForName("nounwind", 8) → LLVMCreateEnumAttribute(ctx, id, 0) → LLVMAddAttributeAtIndex(func, func_idx_attr /* -1 */, attr). The naked attr is the same shape: LLVMGetEnumAttributeKindForName("naked", 5).
  • The .c ABI already skips the implicit ctx at lowering — lam.abi == .c / fd.abi == .c gates (closure.zig:171, decl.zig:515). .pure must skip it too (a naked fn gets no synthetic __sx_ctx, no stack frame, no prologue — args arrive in ABI registers and are read directly from asm).
  • Inline asm already works end-to-end (lower→emit→JIT): aarch64 (examples/1645), x86_64 (examples/1651), global asm, JIT (1653). emitInlineAsm / LLVMGetInlineAsm at ops.zig:915. The naked body is a single asm block reusing this path.

.pure ≠ .c (design §4.6 context-switch note): a .c epilogue restores SP from the frame; a context switch deliberately makes SP-in ≠ SP-out, so the .c epilogue would restore from the wrong stack. naked = no prologue/epilogue/frame — the asm emits its own ret. This is why the switch must be naked, not .c.

Snapshot story (per the atomics precedent, corrected): the LLVM naked attribute text is arch-invariant, but a naked fn's body is raw per-arch asm (it can't be portable — that's the point). So unlike atomics (where one host .ir sufficed), B1.0 needs two arch-gated examples — an aarch64 one and an x86_64 one — exactly like the existing asm corpus split (1645 aarch64 / 1651 x86). Each carries a .build {"target": "<triple>"} sidecar: it runs end-to-end on a matching host and falls to ir-only on a mismatch (asserting the .ir shows define … #N with the naked attribute + the asm body). State loudly: the .ir proves the naked keyword + asm emitted, NOT that the hand-written register save/restore is correct — that is the B1.3 switch-stress harness's job, never the corpus's.
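The run-versus-ir-only split described above can be modeled in a few lines. This is a minimal sketch in Python, not the corpus runner: `example_mode`, `ARCH_ALIASES`, and the choice to compare only the architecture component of the triple are assumptions made for illustration. The plan itself only says "matching host".

```python
import json

# Assumption: host spellings that name the same architecture as a triple prefix.
ARCH_ALIASES = {"arm64": "aarch64", "amd64": "x86_64"}

def example_mode(sidecar_json: str, host_arch: str) -> str:
    """Return "run" when the sidecar's target arch matches the host, else "ir-only"."""
    target = json.loads(sidecar_json)["target"]   # e.g. "aarch64-macos"
    target_arch = target.split("-", 1)[0]         # hypothetical: arch-only comparison
    host = ARCH_ALIASES.get(host_arch, host_arch)
    return "run" if target_arch == host else "ir-only"

modes = [
    example_mode('{"target": "aarch64-macos"}', "arm64"),
    example_mode('{"target": "x86_64-linux"}', "arm64"),
]
```

Under this model, an arm64 host runs the aarch64 example end-to-end and checks only the .ir of the x86_64 one, so each arch's asm body is executed only where it can execute.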

B1.1 — per-fiber context root (grounding says this is SMALL, likely library-only)

Grounding (verified — closes the design doc's open sizing question):

  • context is an implicit *Context parameter (__sx_ctx, slot 0), threaded through every default-conv sx call (lower.zig:259) — not raw TLS. Inside a function current_ctx_ref = Ref.fromIndex(0) (the param) → it rides the fiber stack frame for free.
  • push Context.{…} allocates the new Context with a stack alloca and rebinds current_ctx_ref to that slot (stmt.zig:1263) — "No global, no walk." So push frames are fiber-local for free.
  • The only shared root is the __sx_default_context global, bound at entry-points / abi(.c) fns before any user code runs (decl.zig:2667, :2815).

⇒ The design doc's "lower as swappable indirection, never raw TLS" guards a non-problem (confirmed). The real, now-sized B1.1 work is purely a library convention: a freshly-spawned fiber must take its root Context from the spawner's snapshot (passed as the fiber-entry fn's __sx_ctx slot-0 arg by the spawn trampoline), not the __sx_default_context global. That is sx-side (the trampoline already controls slot 0) — expected to be ZERO compiler change. B1.1's first action is a probe confirming this; if a fiber genuinely re-reads the global root mid-stack (it should not — entry binds once), then and only then is there a compiler obligation. Ground the probe before sizing any compiler work. Prerequisite of B1.3 (a fiber needs a valid root before it switches).
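The spawn-trampoline convention can be illustrated with a small model. This is a sketch in Python rather than sx, and every name in it (`Context`, `spawn`, `fiber_entry`, `DEFAULT_CONTEXT`) is hypothetical: the explicit first argument stands in for the `__sx_ctx` slot-0 parameter, and the frozen dataclass stands in for a by-value snapshot, which is needed because the spawner's pushed Context lives in a stack frame that may be gone by the time the fiber runs.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Context:
    allocator: str  # stand-in for the Allocator at Context field 0

DEFAULT_CONTEXT = Context(allocator="default")  # models the __sx_default_context global

def fiber_entry(ctx, log):
    # ctx is "slot 0": the only root this fiber reads. It never consults the global.
    log.append(ctx.allocator)

def spawn(ctx, entry, log):
    # Hypothetical trampoline: snapshot the spawner's ctx by value and
    # hand it to the entry function as its slot-0 argument.
    snapshot = ctx
    return lambda: entry(snapshot, log)

def spawner(ctx, log):
    pushed = replace(ctx, allocator="arena")  # models `push Context.{...}`
    return spawn(pushed, fiber_entry, log)

log = []
fiber = spawner(DEFAULT_CONTEXT, log)
fiber()  # runs after the spawner's frame has returned
```

After this runs, `log` holds `"arena"`, not `"default"`: the fiber inherited the spawner's pushed Context. A fiber that read `DEFAULT_CONTEXT` directly would be the "re-reads the global root" case that the probe is meant to rule out.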

B1.2–B1.5 — pure sx over the primitives (design §4)

  • B1.2 (A1): Io interface + context.io + Future + cancel() — a protocol/vtable threaded exactly like Allocator (which already lives at Context field 0; see allocViaContext call.zig:1214). Io becomes another Context field. No compiler change — protocols + context already carry it.
  • B1.3 (A2): the fiber runtime — naked context-switch asm (per-arch), bootstrap, mmap stacks with mandatory guard pages. All sx. Highest corruption risk in the stream (§8.1.1) and untestable by the deterministic Io (which tests scheduling, not the switch). Its first deliverable, before the scheduler AND the deterministic Io: a standalone 2-fiber ping-pong switch-stress harness (§10.7) — scribble every callee-saved register + a stack canary before each suspend, deep/recursive chains, verify all survive post-resume. This harness — not B1.4 — is A2's correctness gate.
  • B1.4 (A3): Io impls in order blocking → deterministic-sim (KEYSTONE) → event-loop (kqueue/epoll/io_uring). Build the deterministic Io right after blocking; calibrate it against blocking Io before trusting it to gate everything async (§8.1.3, §10.7) — a deterministic-but-wrong scheduler snapshots garbage. (Open, deferred: the event loop does not yet cooperate with a platform UI run loop — CFRunLoop/ALooper; that's a §6 app-target gap, out of B1.)
  • B1.5 (A5·M:1): the single-thread scheduler — validates the whole colorblind stack end-to-end. 18xx corpus runs under the deterministic Io, asserting a program-emitted ordering contract (sequence markers), not raw interleaving, so scheduler-policy tweaks don't churn every snapshot.
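To make "ordering contract, not raw interleaving" concrete, here is a minimal sketch in Python. It is a model only: Python generators are stackless, unlike the stackful fibers planned here, and the seeded pick policy, the `worker`/`run`/`contract_holds` names, and the contract itself are assumptions for illustration.

```python
import random

def worker(name, steps, trace):
    for i in range(steps):
        trace.append(f"{name}{i}")  # program-emitted sequence marker
        yield                       # cooperative suspend point

def run(seed, trace):
    """M:1 scheduler model: one thread, seeded choice of the next ready fiber."""
    rng = random.Random(seed)
    ready = [worker("a", 3, trace), worker("b", 3, trace)]
    while ready:
        fiber = ready.pop(rng.randrange(len(ready)))
        try:
            next(fiber)
            ready.append(fiber)     # still runnable: requeue
        except StopIteration:
            pass                    # fiber finished

def contract_holds(trace):
    # Contract: each fiber's own markers appear once, in program order.
    # How the two fibers interleave is deliberately left unconstrained.
    for name in "ab":
        own = [m for m in trace if m.startswith(name)]
        if own != [f"{name}{i}" for i in range(3)]:
            return False
    return len(trace) == 6

traces = []
for seed in range(8):
    t = []
    run(seed, t)
    traces.append(t)
```

Different seeds may produce different interleavings, yet `contract_holds` accepts all of them, which is why a scheduler-policy tweak would not churn a snapshot that records only the contract. The same seed always reproduces the same trace, which is what makes the run deterministic.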

Files the compiler floor touches (B1.0 only; B1.1–B1.5 are library + tests)

B1.0 (naked) forces the exhaustive-switch / plumbing sites:

  • ast.zig:142, ABI.pure (exists; reference only).
  • inst.zig:605 — add is_naked: bool = false to Function.
  • decl.zig — set is_naked from fd.abi == .pure; gate the implicit-ctx / param-stack / prologue lowering off for .pure (mirror the .c skips at decl.zig:515 + the entry-ctx bind at :2667/:2815 — a naked fn binds no ctx).
  • type_resolver.zig:237 — leave CC .default (a naked fn-pointer type has no CC of its own; the nakedness is a decl-level emit attribute).
  • emit_llvm.zig:1339-adjacent — emit the naked enum attr on the LLVM function when is_naked; ensure no body-prologue is generated (naked body = the asm block only).
  • Any .op/Function-field switch the Zig build flags — let the build tell you.

Phases (xfail→green steps)

B1.0 — abi(.pure) ⇒ LLVM naked ← START HERE

  • B1.0a (lock) — carry abi == .pure into IR Function.is_naked; thread through decl.zig (skip implicit-ctx for .pure, like .c); in emit_llvm BAIL loudly when a naked fn is emitted ("naked (abi(.pure)) emission not yet implemented"). Add examples/1800-concurrency-naked-aarch64.sx (a tiny naked fn with an aarch64 asm body that reads its arg from x0/returns via ret, .build {"target":"aarch64-macos"}) and examples/1801-concurrency-naked-x86_64.sx (x86_64 sibling, .build {"target":"x86_64-linux"}). Seed markers; capture the bail diagnostic as the locked snapshot (these are ir-only on the non-matching host, so the bail must surface at emit/IR time, not run). zig build && zig build test green against the bail. Commit.
  • B1.0b (green) — emit the LLVM naked attr (LLVMGetEnumAttributeKindForName("naked", 5) + add at func index -1, the same func_idx_attr used for nounwind); ensure the naked body lowers to only the asm block (no prologue/epilogue, no ctx). On a matching host the example runs (asserts the computed value); on a mismatch it's ir-only (assert .ir shows naked + the asm). Capture both arch .ir snapshots; add a unit test in emit_llvm.test.zig asserting the naked attribute is present on a .pure function. Review the diff (no stray error text). Commit.

B1.1 — per-fiber context root (probe-first; likely zero compiler change)

  • B1.1a (probe + lock) — write a probe (.sx-tmp/) + an 18xx example that snapshots a Context (e.g. a custom allocator pushed via push Context) and confirms it is carried by slot 0 across an ordinary call chain (it is — grounded). If the probe shows a fiber-entry trampoline can pass a snapshotted ctx as slot 0 with no compiler change, this phase is a library convention doc (record it in the checkpoint) + a corpus example locking the behavior. If (and only if) the probe surfaces a real compiler gap (a path re-reads __sx_default_context mid-stack), file it as a step here and size it then.

B1.2 — A1: Io interface + context.io + Future + cancel() API

Library-only. Io as a protocol added to Context (mirror Allocator). Future/cancel API surface. xfail→green via an 18xx example exercising the blocking Io default (real suspend lands in B1.3). No compiler change expected; if a protocol-in-context gap appears, file it.

B1.3 — A2: fiber runtime (naked switch + bootstrap + guarded mmap stacks)

  • B1.3a (switch-stress harness FIRST) — the standalone 2-fiber ping-pong harness (register + canary survival, deep chains) per §10.7. This is A2's gate and predates the scheduler + deterministic Io. Arch-gated run test (matching-host run; ir-only elsewhere).
  • B1.3b — fiber bootstrap + mmap stacks with guard pages (mandatory — §8.1.1).
  • (Cadence inside B1.3 follows lock→green per sub-piece; the asm switch is the highest-risk artifact — review adversarially, with a worker if authorized.)
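The scribble-then-verify protocol of the harness can be sketched as a model. This is Python, not the sx/asm harness, and it simulates a register file instead of touching real registers. The four-register list is an illustrative subset of the aarch64 callee-saved set, and `Fiber`, `switch`, and `stress` are hypothetical names. The stack canary is omitted here; its check would follow the same shape.

```python
CALLEE_SAVED = ["x19", "x20", "x21", "x22"]  # illustrative subset, aarch64 names

class Fiber:
    def __init__(self, tag):
        self.tag = tag
        self.saved = {r: 0 for r in CALLEE_SAVED}  # this fiber's register save area

def switch(cpu, frm, to, saved_set):
    """Model of the context switch: spill frm's registers, reload to's."""
    for r in saved_set:
        frm.saved[r] = cpu[r]
    for r in saved_set:
        cpu[r] = to.saved[r]

def stress(saved_set, rounds=4):
    """Return True if every scribbled register survives every suspend/resume."""
    cpu = {r: 0 for r in CALLEE_SAVED}
    cur, other = Fiber(1), Fiber(2)
    expected = {}  # fiber tag -> values it scribbled just before suspending
    for step in range(2 * rounds):
        # On resume: everything scribbled before the last suspend must be intact.
        if cur.tag in expected and any(cpu[r] != v for r, v in expected[cur.tag].items()):
            return False
        # Scribble a pattern unique to (fiber, step, register), then suspend.
        expected[cur.tag] = {r: (cur.tag << 16) | (step << 8) | i
                             for i, r in enumerate(CALLEE_SAVED)}
        cpu.update(expected[cur.tag])
        switch(cpu, cur, other, saved_set)
        cur, other = other, cur
    return True

ok = stress(CALLEE_SAVED)          # switch saves every register in the set
buggy = stress(CALLEE_SAVED[:-1])  # switch forgets x22
```

Here `ok` is true and `buggy` is false: a switch that forgets a single register is caught on the first resume, because the other fiber's scribble is still sitting in that register. A scheduling-level test would not see this, which is the reason the harness comes before the scheduler.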

B1.4 — A3: Io impls (blocking → deterministic-sim KEYSTONE → event-loop)

Blocking first; then the deterministic-sim Io, calibrated against blocking before any 18xx test trusts it; then the event loop. The deterministic Io is the test harness for all of B1.5 + Stream B2.
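Calibration here means running one program against both Io implementations and requiring identical observable results. A minimal sketch in Python follows, assuming a two-method `write`/`read` interface that is far smaller than the planned Io vtable; `BlockingIo`, `SimIo`, `program`, and `calibrate` are hypothetical names.

```python
import os

class BlockingIo:
    """Backed by a real OS pipe; read blocks until data is available."""
    def __init__(self):
        self.r, self.w = os.pipe()
    def write(self, data: bytes):
        os.write(self.w, data)
    def read(self, n: int) -> bytes:
        return os.read(self.r, n)
    def close(self):
        os.close(self.r)
        os.close(self.w)

class SimIo:
    """In-memory stand-in: same interface, no syscalls."""
    def __init__(self):
        self.buf = bytearray()
    def write(self, data: bytes):
        self.buf += data
    def read(self, n: int) -> bytes:
        out = bytes(self.buf[:n])
        del self.buf[:n]
        return out
    def close(self):
        pass

def program(io):
    # The program under test sees only the Io interface.
    for msg in (b"ab", b"cd", b"ef"):
        io.write(msg)
    return [io.read(2) for _ in range(3)]

def calibrate():
    """True when the simulated Io reproduces what the blocking Io observed."""
    results = []
    for io in (BlockingIo(), SimIo()):
        try:
            results.append(program(io))
        finally:
            io.close()
    return results[0] == results[1]

calibrated = calibrate()
```

If `calibrated` were false, the defect would be in the simulated Io rather than in the program, and any snapshot taken under it would be recording the simulator's mistake.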

B1.5 — A5: M:1 scheduler

End-to-end validation of the colorblind stack. 18xx corpus under the deterministic Io, asserting program-emitted ordering contracts.


Gates

  • B1.0: unit emit_llvm.test.zig (the naked attr present on a .pure fn); two arch-gated examples (aarch64 + x86_64) run end-to-end on a matching host, ir-only on a mismatch (assert naked + asm in .ir). OUT of corpus scope, stated loudly: the correctness of any hand-written register save/restore — that's the B1.3 stress harness.
  • B1.1: an 18xx example locking context-carried-by-slot-0 behavior + a checkpoint note on the spawn-trampoline convention.
  • B1.3: the switch-stress harness is A2's gate (register/canary survival — §10.7), NOT a run/snapshot test; plus arch-gated run tests.
  • B1.4: deterministic Io calibrated against blocking Io (§8.1.3) before trusting it; 18xx under the deterministic Io.
  • B1.5: 18xx ordering-contract snapshots under the deterministic Io.

Kickoff prompt (B1.0a — paste into a fresh session)

Implement Stream B1 step B1.0a (naked-ABI lock commit) per current/PLAN-FIBERS.md. Verify zig build && zig build test is green first. Then: (1) add is_naked: bool = false to the IR Function struct (src/ir/inst.zig:605); (2) in src/ir/lower/decl.zig, set is_naked from fd.abi == .pure and gate the implicit-Context / param-stack / entry-ctx lowering OFF for .pure (mirror the existing fd.abi == .c skips at decl.zig:515 + the __sx_default_context binds at :2667/:2815 — a naked fn binds no ctx); (3) in src/ir/emit_llvm.zig, BAIL loudly when emitting a naked function ("naked (abi(.pure)) emission not yet implemented") — do NOT emit the attr yet; (4) add examples/1800-concurrency-naked-aarch64.sx (tiny naked fn, aarch64 asm body, .build {"target":"aarch64-macos"}) and examples/1801-concurrency-naked-x86_64.sx (x86_64 sibling, .build {"target":"x86_64-linux"}); seed the .exit markers, capture the emit/IR-time bail diagnostic as the locked snapshot, confirm zig build test green, review the diff, commit. STOP — B1.0b (real naked emission) is the next step; do NOT implement emission in the same commit that adds the examples. Handle any exhaustive-switch site the Zig build flags from the new Function field. If you hit an UNRELATED compiler bug, file issues/NNNN, mark CHECKPOINT-FIBERS.md BLOCKED, and STOP.