sx/current/PLAN-FIBERS.md
agra 7044b8133b fibers: carve Stream B1 (PLAN-FIBERS + CHECKPOINT-FIBERS)
Carve the async-runtime fibers stream off PLAN-POST-METATYPE Stream B,
mirroring the atomics carve. Grounds the B1 compiler floor against the
tree:

- abi(.pure) exists in the ABI enum but is inert (type_resolver maps it
  to .default CC, emit emits no naked attr) -> B1.0 makes it emit LLVM
  naked + skip prologue/ctx. Corrected the design's callconv(.naked)
  spelling to the real abi(.pure).
- context is already an implicit *Context param (slot 0) + push Context
  is a stack alloca -> fiber-local for free; only shared root is the
  __sx_default_context global. B1.1 grounded as likely library-only
  (probe-first).
- B1.0 snapshot story corrected: naked body is raw per-arch asm -> two
  arch-gated examples (aarch64 + x86_64), not one host .ir.

Full xfail->green step detail + a B1.0a kickoff prompt. Baseline green
(721/0). No code change; first implementation step is B1.0a.
2026-06-20 14:16:39 +03:00

# PLAN-FIBERS — Stream B1 (fibers + Io + M:1 scheduler)

> **STATUS: 🚧 carved, not started.** First implementation step = **B1.0a** (naked-ABI
> lock commit). See the kickoff prompt at the bottom.

Carved from [PLAN-POST-METATYPE.md](PLAN-POST-METATYPE.md) Stream B (§B1) + the
design-of-record [../design/execution-evolution-roadmap.md](../design/execution-evolution-roadmap.md)
§4 (async), §7 steps 49, §8.1 (risks), §10 (testing). Progress in
[CHECKPOINT-FIBERS.md](CHECKPOINT-FIBERS.md). Stream B2 (channels/cancel/stdlib) is a
separate carve ([PLAN-CHANNELS.md], when reached) and depends on this + atomics (✅).

**Goal:** the colorblind, stackful, **pure-sx** async runtime — fibers behind an `Io`
interface, an M:1 scheduler, blocking + deterministic-sim + event-loop `Io` impls. The
**compiler floor is small and net-new**: make `abi(.pure)` actually emit an LLVM `naked`
function (B1.0), and confirm/close the per-fiber `context` root (B1.1). **Everything
else — the context-switch asm, fiber bootstrap, `mmap` stacks, the scheduler, futures,
the `Io` vtables — is ordinary sx library code** (design §4, §4.4). The irreducible FFI
floor: the per-arch asm context-switch (in `.sx`), syscall `extern`s, and `mmap`.

**Cadence (INVIOLABLE):** no commit both adds a test AND makes it pass (lock-to-bail, then
flip to green); `zig build && zig build test` green after every step; never regen snapshots
while red; scope regens with `-Dname=examples/NNNN-…sx -Dupdate-goldens` + review the diff.
New corpus category: `18xx` concurrency. On an **unrelated** compiler bug → file
`issues/NNNN`, mark [CHECKPOINT-FIBERS.md](CHECKPOINT-FIBERS.md) BLOCKED, STOP (CLAUDE.md).
The in-session worker-fix override (delegating a blocker to a worker) applies only with
explicit user authorization.

---
## Design (grounded against the tree)
### B1.0 — `abi(.pure)` ⇒ LLVM naked (the one genuinely net-new compiler piece in B1)
The design doc spells `callconv(.naked)`; the **real sx surface is `abi(.pure)`** — written
in the postfix slot, `name :: (sig) -> Ret abi(.pure) { asm { … }; }` (cf.
`build_options :: () -> BuildOptions abi(.compiler);` in [build.sx:28](../library/modules/build.sx#L28)).
**Grounding (verified — do not re-derive):**
- The `ABI` enum **already carries `.pure`**: `ABI = enum { default, c, compiler, pure }`
([ast.zig:142](../src/ast.zig#L142)), documented "pure / naked function (inline asm
body), no calling-convention prologue/epilogue." So B1.0 is **NOT** "extend the enum."
- `.pure` is **inert today**: [type_resolver.zig:237](../src/ir/type_resolver.zig#L237)
maps `.compiler, .pure → .default` CC, and `emit_llvm` emits **no naked attribute**. So
the net-new work is exactly: **carry `abi == .pure` into the IR `Function`, emit the LLVM
`naked` attr, and skip the implicit-`Context` / prologue lowering** so the body is just
the asm block + its own `ret`.
- The IR `Function` struct ([inst.zig:605](../src/ir/inst.zig#L605)) carries `call_conv`
(default/c) + `is_compiler_domain`, but **no naked flag** — add one (`is_naked: bool`).
- Attribute API is in-tree: `nounwind` is set at
[emit_llvm.zig:1339](../src/ir/emit_llvm.zig#L1339) via
`LLVMGetEnumAttributeKindForName("nounwind", 8)` → `LLVMCreateEnumAttribute(ctx, id, 0)` →
`LLVMAddAttributeAtIndex(func, func_idx_attr /* -1 */, attr)`. The `naked` attr is the
same shape: `LLVMGetEnumAttributeKindForName("naked", 5)`.
- The `.c` ABI **already skips the implicit ctx** at lowering — `lam.abi == .c` /
`fd.abi == .c` gates (closure.zig:171, [decl.zig:515](../src/ir/lower/decl.zig#L515)).
`.pure` must skip it **too** (a naked fn gets no synthetic `__sx_ctx`, no stack frame,
no prologue — args arrive in ABI registers and are read directly from asm).
- **Inline asm already works end-to-end** (lower→emit→JIT): aarch64
([examples/1645](../examples/1645-platform-asm-aarch64-add.sx)), x86_64
([examples/1651](../examples/1651-platform-asm-x86-syscall-write.sx)), global asm, JIT
([1653](../examples/1653-platform-asm-global-jit.sx)). `emitInlineAsm` /
`LLVMGetInlineAsm` at [ops.zig:915](../src/backend/llvm/ops.zig#L915). The naked body is
a single asm block reusing this path.
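
For intuition only, the same shape exists in C: Clang, and GCC 8+ on x86, accept `__attribute__((naked))`. The sketch below is a hypothetical C analogue of the planned 1800/1801 examples. It is not sx and not the B1.0 emitter. `naked_add` is an illustrative name, and the `#if` arch gates stand in for the `.build` target sidecars. The body is one asm block that reads its arguments straight from the ABI registers and executes its own `ret`.

```c
#include <assert.h>

/* Hypothetical C analogue of examples/1800 (aarch64) and 1801 (x86_64):
 * a naked function whose whole body is one asm block. No prologue, no
 * frame; the asm reads the ABI argument registers and returns itself. */
#if defined(__x86_64__) && !defined(_WIN32) && (defined(__clang__) || __GNUC__ >= 8)
__attribute__((naked)) int naked_add(int a, int b) {
    /* SysV x86_64: a in edi, b in esi, result in eax. */
    __asm__("leal (%rdi,%rsi), %eax\n\t"
            "ret");
}
#elif defined(__aarch64__) && defined(__clang__)
__attribute__((naked)) int naked_add(int a, int b) {
    /* AAPCS64: a in w0, b in w1, result in w0. */
    __asm__("add w0, w0, w1\n\t"
            "ret");
}
#else
/* Any other host: a plain function. This mirrors the corpus rule that a
 * mismatched host never executes the per-arch asm. */
int naked_add(int a, int b) { return a + b; }
#endif
```

The naked variants touch no stack at all, which is also why a naked body cannot safely contain anything except the asm.
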

**`.pure` vs `.c` (design §4.6 context-switch note):** a `.c` epilogue restores SP from the
frame; a context switch deliberately makes SP-in ≠ SP-out, so the `.c` epilogue would
restore from the *wrong* stack. `naked` = no prologue/epilogue/frame — the asm emits its
own `ret`. This is *why* the switch must be naked, not `.c`.

**Snapshot story (per the atomics precedent, corrected):** the LLVM `naked` attribute text
is **arch-invariant**, but a naked fn's *body is raw per-arch asm* (it can't be portable —
that's the point). So unlike atomics (where one host `.ir` sufficed), B1.0 needs **two
arch-gated examples** — an aarch64 one and an x86_64 one — exactly like the existing asm
corpus split (1645 aarch64 / 1651 x86). Each carries a `.build {"target": "<triple>"}`
sidecar: it runs end-to-end on a matching host and falls back to **ir-only** on a mismatch
(asserting the `.ir` shows `define … #N` with the `naked` attribute + the asm body). State
loudly: **the `.ir` proves the `naked` keyword + asm emitted, NOT that the hand-written
register save/restore is correct** — that is the B1.3 switch-stress harness's job, never
the corpus's.
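
The sidecar rule can be sketched as a per-example mode decision plus the ir-only assertion. This is a hypothetical model. `mode_for` and `ir_shows_naked_asm` are illustrative names, and treating "matching" as an exact target-triple string match is an assumption, not the harness's real rule. The sketch also makes the stated limit concrete: the ir-only check sees the attribute and the asm text, never register behavior.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check modes for an arch-gated corpus example. */
typedef enum { MODE_RUN, MODE_IR_ONLY } CheckMode;

/* Assumption: a host "matches" when its triple equals the sidecar's
 * target triple exactly (e.g. "aarch64-macos"). */
static CheckMode mode_for(const char *host_triple, const char *target_triple) {
    assert(host_triple != NULL && target_triple != NULL);
    return strcmp(host_triple, target_triple) == 0 ? MODE_RUN : MODE_IR_ONLY;
}

/* ir-only assertion: the emitted IR mentions the `naked` attribute and
 * an inline-asm body. Substring checks stand in for the golden diff and
 * prove nothing about register save/restore correctness. */
static int ir_shows_naked_asm(const char *ir) {
    return strstr(ir, "naked") != NULL && strstr(ir, "asm ") != NULL;
}
```
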
### B1.1 — per-fiber `context` root (grounding says this is SMALL, likely library-only)
**Grounding (verified — closes the design doc's open sizing question):**
- `context` is an **implicit `*Context` parameter** (`__sx_ctx`, slot 0), threaded through
every default-conv sx call ([lower.zig:259](../src/ir/lower.zig#L259)) — **not raw TLS**.
Inside a function `current_ctx_ref = Ref.fromIndex(0)` (the param) → it **rides the fiber
stack frame for free**.
- `push Context.{…}` allocates the new `Context` with a **stack `alloca`** and rebinds
`current_ctx_ref` to that slot ([stmt.zig:1263](../src/ir/lower/stmt.zig#L1263)) — "No
global, no walk." So **push frames are fiber-local for free**.
- The **only shared root** is the `__sx_default_context` **global**, bound at
entry-points / `abi(.c)` fns *before any user code runs*
([decl.zig:2667](../src/ir/lower/decl.zig#L2667), :2815).

⇒ The design doc's "lower as swappable indirection, never raw TLS" guards a **non-problem**
(confirmed). The **real, now-sized** B1.1 work is purely a **library convention**: a
freshly-`spawn`ed fiber must take its root `Context` from the **spawner's snapshot** (passed
as the fiber-entry fn's `__sx_ctx` slot-0 arg by the spawn trampoline), **not** the
`__sx_default_context` global. That is sx-side (the trampoline already controls slot 0) —
**expected to be ZERO compiler change.** B1.1's first action is a probe confirming this; if
a fiber genuinely re-reads the global root mid-stack (it should not — entry binds once),
*then* and only then is there a compiler obligation. **Ground the probe before sizing any
compiler work.** Prerequisite of B1.3 (a fiber needs a valid root before it switches).
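
The following is a hypothetical C model of the grounding above. Parameter 0 plays `__sx_ctx`, a stack copy plays `push Context`, and one static plays `__sx_default_context`. The names (`with_arena`, `spawn_trampoline`, the integer allocator ids) are illustrative assumptions. The point is only the convention: the trampoline forwards the spawner's slot 0 instead of rereading the global.

```c
#include <assert.h>

/* Hypothetical C model of the sx context plumbing; names are illustrative. */
enum { ALLOC_DEFAULT = 1, ALLOC_ARENA = 2 };

typedef struct Context {
    int allocator; /* stands in for Context field 0 (the Allocator) */
} Context;

/* The one shared root, playing __sx_default_context. Only entry points
 * bind it, before any user code runs. */
static Context default_context = { ALLOC_DEFAULT };

/* Default-conv functions take the context as parameter 0 (the __sx_ctx
 * slot), so it rides whatever stack is executing; no TLS involved. */
static int current_allocator(Context *ctx) { return ctx->allocator; }

/* `push Context.{...}` in this model: a stack-allocated copy rebinds
 * slot 0 for everything called inside the push scope. */
static int with_arena(Context *ctx) {
    Context pushed = *ctx;
    pushed.allocator = ALLOC_ARENA;
    return current_allocator(&pushed);
}

/* B1.1 spawn convention: the fiber entry receives the spawner's context
 * as its slot-0 argument and never rereads default_context. */
typedef int (*FiberEntry)(Context *ctx);

static int spawn_trampoline(FiberEntry entry, Context *spawner_ctx) {
    /* A real trampoline would switch to the fiber's own stack first. */
    return entry(spawner_ctx);
}
```

A fiber spawned under a push therefore inherits the pushed allocator. An entry that read `default_context` instead would silently drop it, which is exactly the mid-stack reread the B1.1a probe looks for.
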
### B1.2–B1.5 — pure sx over the primitives (design §4)
- **B1.2 (A1):** `Io` interface + `context.io` + `Future` + `cancel()` — a protocol/vtable
threaded exactly like `Allocator` (which already lives at `Context` field 0; see
`allocViaContext` [call.zig:1214](../src/ir/lower/call.zig#L1214)). `Io` becomes another
`Context` field. No compiler change — protocols + context already carry it.
- **B1.3 (A2):** the fiber runtime — naked context-switch asm (per-arch), bootstrap, `mmap`
stacks **with mandatory guard pages**. All sx. **Highest corruption risk in the stream**
(§8.1.1) and **untestable by the deterministic `Io`** (which tests *scheduling*, not the
*switch*). Its **first deliverable, before the scheduler AND the deterministic `Io`**: a
standalone **2-fiber ping-pong switch-stress harness** (§10.7) — scribble every
callee-saved register + a stack canary before each suspend, deep/recursive chains, verify
all survive post-resume. This harness — not B1.4 — is A2's correctness gate.
- **B1.4 (A3):** `Io` impls in order **blocking → deterministic-sim (KEYSTONE) → event-loop**
(kqueue/epoll/io_uring). Build the deterministic `Io` right after blocking; **calibrate it
against blocking `Io`** before trusting it to gate everything async (§8.1.3, §10.7) — a
deterministic-but-wrong scheduler snapshots garbage. (Open, deferred: the event loop does
**not** yet cooperate with a platform UI run loop — CFRunLoop/ALooper; that's a §6
app-target gap, out of B1.)
- **B1.5 (A5·M:1):** the single-thread scheduler — validates the whole colorblind stack
end-to-end. `18xx` corpus runs under the deterministic `Io`, asserting a **program-emitted
ordering contract** (sequence markers), not raw interleaving, so scheduler-policy tweaks
don't churn every snapshot.
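
Sketched in C, "threaded exactly like `Allocator`" means each interface is a self pointer plus a function table stored in `Context`. Library code calls through `ctx->io` without knowing which impl sits behind it. Everything here is a hypothetical shape: the `read` signature, `BlockingIo`, and `read_header` are assumptions, not the planned sx API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shapes: an interface is a self pointer plus function
 * pointers, stored by value in the Context. */
typedef struct Allocator {
    void *self;
    void *(*alloc)(void *self, size_t n);
} Allocator;

typedef struct Io {
    void *self;
    /* Fill `buf`, return bytes read. A fiber-backed impl would suspend
     * the calling fiber here instead of blocking the thread. */
    long (*read)(void *self, int fd, char *buf, size_t n);
} Io;

typedef struct Context {
    Allocator allocator; /* field 0, as in the tree today */
    Io io;               /* B1.2: one more field, same pattern */
} Context;

/* Blocking stand-in: satisfies the read synchronously. */
typedef struct { char fill; int calls; } BlockingIo;

static long blocking_read(void *self, int fd, char *buf, size_t n) {
    BlockingIo *io = self;
    (void)fd; /* a real impl would read(2) from fd */
    for (size_t i = 0; i < n; i++) buf[i] = io->fill;
    io->calls++;
    return (long)n;
}

/* Colorblind library code: it only knows ctx->io, so it runs unchanged
 * under blocking, deterministic-sim, or event-loop implementations. */
static long read_header(Context *ctx, char *buf, size_t n) {
    return ctx->io.read(ctx->io.self, 0, buf, n);
}
```

Swapping `ctx.io` for a deterministic-sim or event-loop impl leaves `read_header` untouched. That is the colorblind property B1.5 validates end to end.
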
### Files the compiler floor touches (B1.0 only; B1.1–B1.5 are library + tests)
B1.0 (naked) forces the exhaustive-switch / plumbing sites:
- [ast.zig:142](../src/ast.zig#L142) — `ABI.pure` (exists; reference only).
- [inst.zig:605](../src/ir/inst.zig#L605) — add `is_naked: bool = false` to `Function`.
- [decl.zig](../src/ir/lower/decl.zig) — set `is_naked` from `fd.abi == .pure`; gate the
implicit-ctx / param-stack / prologue lowering off for `.pure` (mirror the `.c` skips at
decl.zig:515 + the entry-ctx bind at :2667/:2815 — a naked fn binds no ctx).
- [type_resolver.zig:237](../src/ir/type_resolver.zig#L237) — leave CC `.default` (a naked
fn-pointer type has no CC of its own; the nakedness is a decl-level emit attribute).
- [emit_llvm.zig:1339](../src/ir/emit_llvm.zig#L1339)-adjacent — emit the `naked` enum attr
on the LLVM function when `is_naked`; ensure no body-prologue is generated (naked body =
the asm block only).
- Any `.op`/`Function`-field switch the Zig build flags — let the build tell you.
---
## Phases (xfail→green steps)
### B1.0 — `abi(.pure)` ⇒ LLVM naked ← START HERE
- **B1.0a (lock)** — carry `abi == .pure` into IR `Function.is_naked`; thread through
`decl.zig` (skip implicit-ctx for `.pure`, like `.c`); in `emit_llvm` **BAIL loudly** when
a naked fn is emitted ("naked (`abi(.pure)`) emission not yet implemented"). Add
`examples/1800-concurrency-naked-aarch64.sx` (a tiny naked fn with an aarch64 asm body that
reads its arg from `x0`/returns via `ret`, `.build {"target":"aarch64-macos"}`) **and**
`examples/1801-concurrency-naked-x86_64.sx` (x86_64 sibling, `.build
{"target":"x86_64-linux"}`). Seed markers; capture the **bail diagnostic** as the locked
snapshot (these are ir-only on the non-matching host, so the bail must surface at emit/IR
time, not at run time). `zig build && zig build test` green against the bail. Commit.
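- **B1.0b (green)** — emit the LLVM `naked` attr (`LLVMGetEnumAttributeKindForName("naked",
5)` + add it at the function index, `LLVMAttributeFunctionIndex` (`-1`, the same `func_idx_attr` slot `nounwind` uses; index 1 would attach it to the first parameter)); ensure the naked body lowers to *only* the asm block (no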
prologue/epilogue, no ctx). On a matching host the example runs (asserts the computed
value); on a mismatch it's ir-only (assert `.ir` shows `naked` + the asm). Capture both
arch `.ir` snapshots; add a unit test in `emit_llvm.test.zig` asserting the `naked`
attribute is present on a `.pure` function. Review the diff (no stray error text). Commit.
### B1.1 — per-fiber `context` root (probe-first; likely zero compiler change)
- **B1.1a (probe + lock)** — write a probe (`.sx-tmp/`) + an `18xx` example that snapshots a
`Context` (e.g. a custom allocator pushed via `push Context`) and confirms it is carried by
slot 0 across an ordinary call chain (it is — grounded). If the probe shows a fiber-entry
trampoline can pass a snapshotted ctx as slot 0 with **no compiler change**, this phase is a
**library convention doc** (record it in the checkpoint) + a corpus example locking the
behavior. If (and only if) the probe surfaces a real compiler gap (a path re-reads
`__sx_default_context` mid-stack), file it as a step here and size it then.
### B1.2 — A1: `Io` interface + `context.io` + `Future` + `cancel()` API
Library-only. `Io` as a protocol added to `Context` (mirror `Allocator`). `Future`/`cancel`
API surface. xfail→green via an `18xx` example exercising the blocking `Io` default (real
suspend lands in B1.3). No compiler change expected; if a protocol-in-context gap appears,
file it.
### B1.3 — A2: fiber runtime (naked switch + bootstrap + guarded `mmap` stacks)
- **B1.3a (switch-stress harness FIRST)** — the standalone 2-fiber ping-pong harness
(register + canary survival, deep chains) per §10.7. This is A2's gate and predates the
scheduler + deterministic `Io`. Arch-gated run test (matching-host run; ir-only elsewhere).
- **B1.3b** — fiber bootstrap + `mmap` stacks **with guard pages** (mandatory — §8.1.1).
- (Cadence inside B1.3 follows lock→green per sub-piece; the asm switch is the highest-risk
artifact — review adversarially, with a worker if authorized.)
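
Here is a minimal sketch of the harness shape, assuming Linux/glibc. POSIX `ucontext` (`getcontext`/`makecontext`/`swapcontext`) stands in for the naked asm switch, so this exercises the harness logic, not B1.3's hand-written switch. A C compiler owns register allocation, so the real harness's scribble of every callee-saved register has to come from asm. Here a volatile per-frame canary is the checkable proxy. Stacks are `mmap`ed with a `PROT_NONE` guard page per B1.3b. `STACK_PAGES`, `ROUNDS`, and `DEPTH` are arbitrary illustrative values.

```c
#define _DEFAULT_SOURCE /* glibc: expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

/* Harness shape for B1.3a/B1.3b. Assumption: ucontext replaces the
 * naked switch, so this tests the harness, not that switch. */
enum { STACK_PAGES = 16, ROUNDS = 1000, DEPTH = 32 };

static ucontext_t main_ctx, ping_ctx, pong_ctx;
static int rounds_ok;

/* mmap a fiber stack with a PROT_NONE guard page at its low end: stacks
 * grow down, so an overflow faults instead of corrupting a neighbour. */
static void *guarded_stack(size_t *usable) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = page * (STACK_PAGES + 1);
    char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(base != MAP_FAILED);
    int rc = mprotect(base, page, PROT_NONE);
    assert(rc == 0);
    (void)rc;
    *usable = total - page;
    return base + page;
}

/* Recurse DEPTH frames, then ping-pong ROUNDS times at the bottom,
 * scribbling the canary before each suspend and checking it after. */
static void chain(int depth, ucontext_t *self, ucontext_t *peer, uint64_t seed) {
    const uint64_t expect = seed ^ ((uint64_t)depth << 32);
    volatile uint64_t canary = expect;
    if (depth > 0) {
        chain(depth - 1, self, peer, seed);
    } else {
        for (uint64_t i = 0; i < ROUNDS; i++) {
            canary = seed * 0x9E3779B97F4A7C15ull + i; /* scribble */
            int rc = swapcontext(self, peer);           /* suspend */
            assert(rc == 0);
            (void)rc;
            assert(canary == seed * 0x9E3779B97F4A7C15ull + i);
            rounds_ok++;
        }
        canary = expect;
    }
    assert(canary == expect); /* every frame intact on unwind */
}

static void ping_entry(void) { chain(DEPTH, &ping_ctx, &pong_ctx, 1); }
static void pong_entry(void) { chain(DEPTH, &pong_ctx, &ping_ctx, 2); }

static void init_fiber(ucontext_t *uc, void (*entry)(void)) {
    size_t size;
    int rc = getcontext(uc);
    assert(rc == 0);
    (void)rc;
    uc->uc_stack.ss_sp = guarded_stack(&size);
    uc->uc_stack.ss_size = size;
    uc->uc_link = &main_ctx; /* returning from entry resumes main */
    makecontext(uc, entry, 0);
}

/* Returns how many resumes passed their checks; expect 2 * ROUNDS. */
static int run_switch_stress(void) {
    rounds_ok = 0;
    init_fiber(&ping_ctx, ping_entry);
    init_fiber(&pong_ctx, pong_entry);
    swapcontext(&main_ctx, &ping_ctx); /* back here when ping finishes */
    swapcontext(&main_ctx, &pong_ctx); /* let pong finish its last round */
    return rounds_ok;
}
```

A corrupted switch shows up as a canary mismatch after a resume. Overflowing a stack hits the guard page and faults instead of scribbling on the neighbouring fiber. The stacks are not unmapped in this sketch.
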
### B1.4 — A3: `Io` impls (blocking → deterministic-sim KEYSTONE → event-loop)
Blocking first; then the deterministic-sim `Io`, **calibrated against blocking** before any
`18xx` test trusts it; then the event loop. The deterministic `Io` is the test harness for
*all* of B1.5 + Stream B2.
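
One way to read "calibrated against blocking" is this: run the same tasks under both impls and require identical program-visible results for many seeds, while the sim's interleaving stays reproducible per seed. In this hypothetical sketch, tasks are step functions rather than real fibers, with each step standing for the work between two suspend points. xorshift32 is an arbitrary choice of seeded generator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a Task is a resumable step function standing in
 * for a fiber; each call is the work between two suspend points. */
enum { NTASKS = 3, STEPS = 4 };

typedef struct { int id, step; long acc; } Task;

/* Do one step; return 1 while the task has more steps, 0 when done. */
static int task_step(Task *t) {
    t->acc = t->acc * 10 + t->id;
    return ++t->step < STEPS;
}

static void reset(Task *ts) {
    for (int i = 0; i < NTASKS; i++) ts[i] = (Task){ i + 1, 0, 0 };
}

/* Blocking Io: each task runs to completion before the next starts. */
static void run_blocking(Task *ts) {
    for (int i = 0; i < NTASKS; i++)
        while (task_step(&ts[i])) {}
}

/* xorshift32: the seeded generator driving the sim. */
static uint32_t next_rand(uint32_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 17;
    *s ^= *s << 5;
    return *s;
}

/* Deterministic-sim Io: the PRNG picks the next runnable task, so one
 * seed always yields one interleaving, recorded in `trace`
 * (NTASKS * STEPS entries). */
static void run_sim(Task *ts, uint32_t seed, int *trace) {
    int done[NTASKS] = {0}, left = NTASKS, n = 0;
    uint32_t s = seed ? seed : 1; /* xorshift state must be nonzero */
    while (left > 0) {
        int i = (int)(next_rand(&s) % NTASKS);
        if (done[i]) continue;
        trace[n++] = ts[i].id;
        if (!task_step(&ts[i])) { done[i] = 1; left--; }
    }
}

/* Calibration: for every seed, per-task results must equal blocking's. */
static int calibrate(uint32_t nseeds) {
    for (uint32_t seed = 1; seed <= nseeds; seed++) {
        Task want[NTASKS], got[NTASKS];
        int trace[NTASKS * STEPS];
        reset(want);
        reset(got);
        run_blocking(want);
        run_sim(got, seed, trace);
        for (int i = 0; i < NTASKS; i++)
            if (want[i].acc != got[i].acc || got[i].step != STEPS) return 0;
    }
    return 1;
}
```

A sim that drops or repeats a step diverges from the blocking result and fails `calibrate` before any `18xx` snapshot is trusted.
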
### B1.5 — A5: M:1 scheduler
End-to-end validation of the colorblind stack. `18xx` corpus under the deterministic `Io`,
asserting program-emitted ordering contracts.

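
An ordering contract can be checked as a set of happens-before pairs over program-emitted markers, so any interleaving that respects them passes. This is a hypothetical sketch: integer markers and the `Edge`/`satisfies` names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering-contract check: the program emits integer
 * sequence markers, and the snapshot asserts only the happens-before
 * pairs the program guarantees. */
typedef struct { int before, after; } Edge;

/* Position of `marker` in the trace, or -1 if it never appeared. */
static int index_of(const int *trace, size_t n, int marker) {
    for (size_t i = 0; i < n; i++)
        if (trace[i] == marker) return (int)i;
    return -1;
}

/* 1 if every required edge holds: both markers present, in order. */
static int satisfies(const int *trace, size_t n, const Edge *edges, size_t ne) {
    for (size_t e = 0; e < ne; e++) {
        int b = index_of(trace, n, edges[e].before);
        int a = index_of(trace, n, edges[e].after);
        if (b < 0 || a < 0 || b > a) return 0;
    }
    return 1;
}
```

Take send markers 10 and 11, receive markers 20 and 21, and the contract 10→20, 11→21, 20→21. Both `10 20 11 21` and `10 11 20 21` pass. `20 10 11 21` fails because a receive appears before its send. A policy tweak that reorders independent events therefore leaves the snapshot alone.
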
---
## Gates
- **B1.0:** unit `emit_llvm.test.zig` (the `naked` attr present on a `.pure` fn); two
arch-gated examples (aarch64 + x86_64) run end-to-end on a matching host, ir-only on a
mismatch (assert `naked` + asm in `.ir`). **OUT of corpus scope, stated loudly:** the
*correctness* of any hand-written register save/restore — that's the B1.3 stress harness.
- **B1.1:** an `18xx` example locking context-carried-by-slot-0 behavior + a checkpoint note
on the spawn-trampoline convention.
- **B1.3:** the **switch-stress harness is A2's gate** (register/canary survival — §10.7),
NOT a run/snapshot test; plus arch-gated run tests.
- **B1.4:** deterministic `Io` **calibrated** against blocking `Io` (§8.1.3) before trusting
it; `18xx` under the deterministic `Io`.
- **B1.5:** `18xx` ordering-contract snapshots under the deterministic `Io`.
## Kickoff prompt (B1.0a — paste into a fresh session)
> Implement Stream B1 step **B1.0a** (naked-ABI lock commit) per
> `current/PLAN-FIBERS.md`. Verify `zig build && zig build test` is green first. Then: (1)
> add `is_naked: bool = false` to the IR `Function` struct (`src/ir/inst.zig:605`); (2) in
> `src/ir/lower/decl.zig`, set `is_naked` from `fd.abi == .pure` and gate the implicit-`Context`
> / param-stack / entry-ctx lowering OFF for `.pure` (mirror the existing `fd.abi == .c`
> skips at decl.zig:515 + the `__sx_default_context` binds at :2667/:2815 — a naked fn binds
> no ctx); (3) in `src/ir/emit_llvm.zig`, **BAIL loudly** when emitting a naked function
> ("naked (`abi(.pure)`) emission not yet implemented") — do NOT emit the attr yet; (4) add
> `examples/1800-concurrency-naked-aarch64.sx` (tiny naked fn, aarch64 asm body, `.build
> {"target":"aarch64-macos"}`) and `examples/1801-concurrency-naked-x86_64.sx` (x86_64
> sibling, `.build {"target":"x86_64-linux"}`); seed the `.exit` markers, capture the
> emit/IR-time bail diagnostic as the locked snapshot, confirm `zig build test` green, review
> the diff, commit. STOP — B1.0b (real `naked` emission) is the next step; do NOT implement
> emission in the same commit that adds the examples. Handle any exhaustive-switch site the
> Zig build flags from the new `Function` field. If you hit an UNRELATED compiler bug, file
> `issues/NNNN`, mark `CHECKPOINT-FIBERS.md` BLOCKED, and STOP.