fibers: carve Stream B1 (PLAN-FIBERS + CHECKPOINT-FIBERS)

Carve the async-runtime fibers stream off PLAN-POST-METATYPE Stream B,
mirroring the atomics carve. Grounds the B1 compiler floor against the
tree:

- abi(.pure) exists in the ABI enum but is inert (type_resolver maps it
  to .default CC, emit_llvm emits no naked attr) -> B1.0 makes it emit LLVM
  naked + skip prologue/ctx. Corrected the design's callconv(.naked)
  spelling to the real abi(.pure).
- context is already an implicit *Context param (slot 0) + push Context
  is a stack alloca -> fiber-local for free; only shared root is the
  __sx_default_context global. B1.1 grounded as likely library-only
  (probe-first).
- B1.0 snapshot story corrected: naked body is raw per-arch asm -> two
  arch-gated examples (aarch64 + x86_64), not one host .ir.

Full xfail->green step detail + a B1.0a kickoff prompt. Baseline green
(721/0). No code change; first implementation step is B1.0a.
agra
2026-06-20 14:16:39 +03:00
parent 3fad2d5a21
commit 7044b8133b
3 changed files with 305 additions and 1 deletion

current/CHECKPOINT-FIBERS.md

@@ -0,0 +1,78 @@
# CHECKPOINT-FIBERS — Stream B1 (fibers + Io + M:1 scheduler)
Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step at a time,
per the cadence rule). New corpus category: `18xx` concurrency.
## Last completed step
**Carve** — wrote PLAN-FIBERS.md + this checkpoint. Grounded the B1 compiler floor against
the tree (see Decisions). Baseline verified green: `zig build && zig build test` → **721
ran, 0 failed** (one Android-SDK-gated example skipped; the trailing "failed command:" line
is the zig listen-protocol echo, not a failure). HEAD `3fad2d5`, tree clean.
## Current state
Stream A (atomics) is feature-complete (✅) and unblocks B2-channels. Stream B1 is **carved,
not started**. No fibers/Io/scheduler code exists yet. The compiler floor for B1 is grounded:
- `abi(.pure)` exists in the `ABI` enum but is **inert** — maps to `.default` CC, emits no
naked attribute. B1.0 makes it actually emit LLVM `naked`.
- `context` is already an implicit `*Context` param (slot 0) + `push Context` is a stack
`alloca` → **fiber-local for free**. The only shared root is the `__sx_default_context`
global (entry-point bind). B1.1 is therefore expected to be a **library convention** (spawn
trampoline snapshots the spawner's ctx into slot 0), **likely zero compiler change**;
confirm by probe first.
- Inline asm works end-to-end (lower→emit→JIT, aarch64 + x86_64) — the naked body reuses it.
## Next step
**B1.0a (naked-ABI lock commit)** — per PLAN-FIBERS.md "Phases → B1.0 → B1.0a" and the
kickoff prompt at the bottom of that file. Add `Function.is_naked`, thread `abi == .pure`
through `decl.zig` (skip implicit-ctx like `.c`), make `emit_llvm` **BAIL loudly** on a naked
fn, add the two arch-gated examples (`1800` aarch64 / `1801` x86_64), lock to the bail
diagnostic. STOP before B1.0b (real emission) — separate commit (cadence rule).
## Known issues / capability gaps
- **Orthogonal (not a B1 blocker):** default VALUES for comptime params don't bind on
generic-struct methods (free-fn defaults DO work) — inherited from Stream A. Only matters
if a B2 lib type wants a defaulted comptime param; atomics/fibers require explicit values,
so they are unaffected.
- **Issue 0144 (open, independent):** calling an unrecognized bodiless `#builtin` silently
returns 0 / exit 0 — a silent-fallback footgun in the generic builtin-call path. Filed;
leave for its own fix session unless prioritized. Not a B1 blocker.
- **Deferred design gap (documented):** the B1.4 event-loop `Io` does not yet cooperate with
a platform UI run loop (CFRunLoop/NSRunLoop/ALooper); pinning gives thread-affinity, not
run-loop integration — a §6 app-target concern, out of B1 scope.
## Decisions (Stream B1 specifics; surface locked in design §4 / §4.6)
- **The async runtime is sx LIBRARY code.** The compiler provides only: the general
primitives (inline asm ✅, `abi(.pure)` naked [B1.0], atomics ✅) + fiber-safe codegen
(`context` already fiber-local — B1.1). Schedulers, fibers, channels, futures, `Io`
vtables, `mmap` stacks are all sx.
- **`abi(.pure)` is the real spelling of the design's `callconv(.naked)`** — postfix slot,
`name :: (sig) -> Ret abi(.pure) { asm { … }; }`. B1.0 = carry it into IR + emit LLVM
`naked` + skip prologue/ctx (mirror the existing `.c` skip), NOT extend the enum (it's
already there, just inert).
- **`.pure` ≠ `.c`:** a `.c` epilogue would restore SP from the wrong stack across a context
switch (SP-in ≠ SP-out by design). naked = no prologue/epilogue/frame; the asm emits its
own `ret`. This is why the switch must be naked.
- **B1.0 snapshot scope:** the `naked` attr text is arch-invariant, but a naked body is raw
per-arch asm — so B1.0 needs **two arch-gated examples** (aarch64 + x86_64, `.build`
target-gated, ir-only on mismatch), unlike atomics' single host `.ir`. The `.ir` proves
`naked` + asm emitted, NOT register-save correctness (that's B1.3's stress harness).
- **B1.1 grounded as library-only (pending probe):** push frames are stack-`alloca`'d and
the implicit ctx rides slot 0, so a spawn trampoline can pass a snapshotted ctx with no
compiler change. The design doc's "never raw TLS" guards a non-problem (context is not
TLS). Probe to confirm before sizing any compiler work.
- **Test keystones (design §10):** the **B1.3 switch-stress harness** gates the
context-switch (the one piece the deterministic `Io` can't test — §8.1.1, §10.7); the
**B1.4 deterministic-sim `Io`** (calibrated against blocking `Io` — §8.1.3) gates all
scheduling tests. Both must exist + be calibrated before the async tests they gate are
trusted. `18xx` asserts program-emitted ordering contracts, not raw interleaving.
## Log
- **carve** — wrote PLAN-FIBERS.md + CHECKPOINT-FIBERS.md. Grounded the B1 compiler floor:
`ABI.pure` inert (type_resolver.zig:237), IR `Function` has no naked flag (inst.zig:605),
attribute API pattern (emit_llvm.zig:1339 nounwind), `.c` ctx-skip precedent
(decl.zig:515), `push Context` stack-alloca + slot-0 implicit ctx (stmt.zig:1263,
lower.zig:259), `__sx_default_context` root (decl.zig:2667/2815), inline-asm corpus
(1645/1651). Corrected the design's `callconv(.naked)` → real `abi(.pure)` spelling and
the B1.0 snapshot story (two arch-gated examples, not one host `.ir`). B1.1 grounded as
likely library-only. Baseline green (721/0). Stream ready; **B1.0a is the first
implementation step.**

current/PLAN-FIBERS.md

@@ -0,0 +1,226 @@
# PLAN-FIBERS — Stream B1 (fibers + Io + M:1 scheduler)
> **STATUS: 🚧 carved, not started.** First implementation step = **B1.0a** (naked-ABI
> lock commit). See the kickoff prompt at the bottom.
Carved from [PLAN-POST-METATYPE.md](PLAN-POST-METATYPE.md) Stream B (§B1) + the
design-of-record [../design/execution-evolution-roadmap.md](../design/execution-evolution-roadmap.md)
§4 (async), §7 steps 4–9, §8.1 (risks), §10 (testing). Progress in
[CHECKPOINT-FIBERS.md](CHECKPOINT-FIBERS.md). Stream B2 (channels/cancel/stdlib) is a
separate carve ([PLAN-CHANNELS.md], when reached) and depends on this + atomics (✅).
**Goal:** the colorblind, stackful, **pure-sx** async runtime — fibers behind an `Io`
interface, an M:1 scheduler, blocking + deterministic-sim + event-loop `Io` impls. The
**compiler floor is small and net-new**: make `abi(.pure)` actually emit an LLVM `naked`
function (B1.0), and confirm/close the per-fiber `context` root (B1.1). **Everything
else — the context-switch asm, fiber bootstrap, `mmap` stacks, the scheduler, futures,
the `Io` vtables — is ordinary sx library code** (design §4, §4.4). The irreducible FFI
floor: the per-arch asm context-switch (in `.sx`), syscall `extern`s, and `mmap`.
**Cadence (IMPASSIBLE):** no commit both adds a test AND makes it pass (lock-to-bail, then
flip to green); `zig build && zig build test` green after every step; never regen snapshots
while red; scope regens with `-Dname=examples/NNNN-…sx -Dupdate-goldens` + review the diff.
New corpus category: `18xx` concurrency. On an **unrelated** compiler bug → file
`issues/NNNN`, mark `CHECKPOINT-FIBERS.md` BLOCKED, STOP (CLAUDE.md). The in-session
worker-fix override (delegate a blocker to a worker) applies only with explicit user
authorization.
---
## Design (grounded against the tree)
### B1.0 — `abi(.pure)` ⇒ LLVM naked (the one genuinely net-new compiler piece in B1)
The design doc spells `callconv(.naked)`; the **real sx surface is `abi(.pure)`** — written
in the postfix slot, `name :: (sig) -> Ret abi(.pure) { asm { … }; }` (cf.
`build_options :: () -> BuildOptions abi(.compiler);` in [build.sx:28](../library/modules/build.sx#L28)).
**Grounding (verified — do not re-derive):**
- The `ABI` enum **already carries `.pure`**: `ABI = enum { default, c, compiler, pure }`
([ast.zig:142](../src/ast.zig#L142)), documented "pure / naked function (inline asm
body), no calling-convention prologue/epilogue." So B1.0 is **NOT** "extend the enum."
- `.pure` is **inert today**: [type_resolver.zig:237](../src/ir/type_resolver.zig#L237)
maps `.compiler, .pure → .default` CC, and `emit_llvm` emits **no naked attribute**. So
the net-new work is exactly: **carry `abi == .pure` into the IR `Function`, emit the LLVM
`naked` attr, and skip the implicit-`Context` / prologue lowering** so the body is just
the asm block + its own `ret`.
- The IR `Function` struct ([inst.zig:605](../src/ir/inst.zig#L605)) carries `call_conv`
(default/c) + `is_compiler_domain`, but **no naked flag** — add one (`is_naked: bool`).
- Attribute API is in-tree: `nounwind` is set at
[emit_llvm.zig:1339](../src/ir/emit_llvm.zig#L1339) via
`LLVMGetEnumAttributeKindForName("nounwind", 8)` → `LLVMCreateEnumAttribute(ctx, id, 0)` →
`LLVMAddAttributeAtIndex(func, func_idx_attr /* -1 */, attr)`. The `naked` attr is the
same shape: `LLVMGetEnumAttributeKindForName("naked", 5)`.
- The `.c` ABI **already skips the implicit ctx** at lowering — `lam.abi == .c` /
`fd.abi == .c` gates (closure.zig:171, [decl.zig:515](../src/ir/lower/decl.zig#L515)).
`.pure` must skip it **too** (a naked fn gets no synthetic `__sx_ctx`, no stack frame,
no prologue — args arrive in ABI registers and are read directly from asm).
- **Inline asm already works end-to-end** (lower→emit→JIT): aarch64
([examples/1645](../examples/1645-platform-asm-aarch64-add.sx)), x86_64
([examples/1651](../examples/1651-platform-asm-x86-syscall-write.sx)), global asm, JIT
([1653](../examples/1653-platform-asm-global-jit.sx)). `emitInlineAsm` /
`LLVMGetInlineAsm` at [ops.zig:915](../src/backend/llvm/ops.zig#L915). The naked body is
a single asm block reusing this path.
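For orientation on the attribute API above: the LLVM C API (`llvm-c/Core.h`) addresses attributes by index. `LLVMAttributeReturnIndex` (0) is the return value, indices 1..n are the parameters, and `LLVMAttributeFunctionIndex` (-1, i.e. `~0U` once it becomes the unsigned `LLVMAttributeIndex`) is the function itself. `naked`, like `nounwind`, is a function attribute. The Python sketch below only models that index mapping. `attr_target` is a hypothetical helper, not part of any LLVM binding.

```python
# Index constants as declared in llvm-c/Core.h. LLVMAttributeIndex is an unsigned
# type, so LLVMAttributeFunctionIndex (-1) is 0xFFFFFFFF at the ABI level.
LLVM_ATTRIBUTE_RETURN_INDEX = 0
LLVM_ATTRIBUTE_FUNCTION_INDEX = 0xFFFFFFFF


def attr_target(index: int, n_params: int) -> str:
    """Hypothetical helper: where LLVMAddAttributeAtIndex(fn, index, attr) lands."""
    index &= 0xFFFFFFFF  # wrap -1 the way the unsigned C parameter does
    if index == LLVM_ATTRIBUTE_FUNCTION_INDEX:
        return "function"
    if index == LLVM_ATTRIBUTE_RETURN_INDEX:
        return "return"
    if 1 <= index <= n_params:
        return f"param {index - 1}"
    raise ValueError(f"no attribute slot {index} for a {n_params}-param function")


print(attr_target(-1, 1))  # function: where `nounwind` and `naked` belong
print(attr_target(1, 1))   # param 0: index 1 is the first parameter
```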
**`.pure` ≠ `.c` (design §4.6 context-switch note):** a `.c` epilogue restores SP from the
frame; a context switch deliberately makes SP-in ≠ SP-out, so the `.c` epilogue would
restore from the *wrong* stack. `naked` = no prologue/epilogue/frame — the asm emits its
own `ret`. This is *why* the switch must be naked, not `.c`.
**Snapshot story (per the atomics precedent, corrected):** the LLVM `naked` attribute text
is **arch-invariant**, but a naked fn's *body is raw per-arch asm* (it can't be portable —
that's the point). So unlike atomics (where one host `.ir` sufficed), B1.0 needs **two
arch-gated examples** — an aarch64 one and an x86_64 one — exactly like the existing asm
corpus split (1645 aarch64 / 1651 x86). Each carries a `.build {"target": "<triple>"}`
sidecar: it runs end-to-end on a matching host and falls to **ir-only** on a mismatch
(asserting the `.ir` shows `define … #N` with the `naked` attribute + the asm body). State
loudly: **the `.ir` proves the `naked` keyword + asm emitted, NOT that the hand-written
register save/restore is correct** — that is the B1.3 switch-stress harness's job, never
the corpus's.
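The sketch below shows what the ir-only assertion checks. It assumes textual LLVM IR in which a `define` line references an attribute group as `#N` and the group is declared as `attributes #N = { … }`. The helper `naked_functions` and the sample IR are hypothetical. The corpus itself compares golden snapshots, so this is only a way to read or spot-check an `.ir` diff. Like the snapshot, it says nothing about register save/restore correctness.

```python
import re


def naked_functions(ir: str) -> set:
    """Names of `define`d functions whose attribute group(s) contain `naked`."""
    groups = {
        m.group(1): m.group(2).split()
        for m in re.finditer(r"^attributes #(\d+) = \{([^}]*)\}", ir, re.M)
    }
    found = set()
    for m in re.finditer(r"^define [^@]*@([\w.$]+)\(.*\)([^{]*)\{", ir, re.M):
        name, tail = m.group(1), m.group(2)  # tail: text between `)` and `{`
        if any("naked" in groups.get(ref, []) for ref in re.findall(r"#(\d+)", tail)):
            found.add(name)
    return found


# Hypothetical aarch64 sample: a naked body is just the asm call + `unreachable`.
SAMPLE_IR = r"""
define i64 @add_one(i64 %x) #0 {
entry:
  call void asm sideeffect "add x0, x0, #1\0Aret", ""()
  unreachable
}

define i32 @main() #1 {
entry:
  ret i32 0
}

attributes #0 = { naked nounwind }
attributes #1 = { nounwind }
"""

print(naked_functions(SAMPLE_IR))  # {'add_one'}
```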
### B1.1 — per-fiber `context` root (grounding says this is SMALL, likely library-only)
**Grounding (verified — closes the design doc's open sizing question):**
- `context` is an **implicit `*Context` parameter** (`__sx_ctx`, slot 0), threaded through
every default-conv sx call ([lower.zig:259](../src/ir/lower.zig#L259)) — **not raw TLS**.
Inside a function `current_ctx_ref = Ref.fromIndex(0)` (the param) → it **rides the fiber
stack frame for free**.
- `push Context.{…}` allocates the new `Context` with a **stack `alloca`** and rebinds
`current_ctx_ref` to that slot ([stmt.zig:1263](../src/ir/lower/stmt.zig#L1263)) — "No
global, no walk." So **push frames are fiber-local for free**.
- The **only shared root** is the `__sx_default_context` **global**, bound at
entry-points / `abi(.c)` fns *before any user code runs*
([decl.zig:2667](../src/ir/lower/decl.zig#L2667), :2815).
⇒ The design doc's "lower as swappable indirection, never raw TLS" guards a **non-problem**
(confirmed). The **real, now-sized** B1.1 work is purely a **library convention**: a
freshly-`spawn`ed fiber must take its root `Context` from the **spawner's snapshot** (passed
as the fiber-entry fn's `__sx_ctx` slot-0 arg by the spawn trampoline), **not** the
`__sx_default_context` global. That is sx-side (the trampoline already controls slot 0) —
**expected to be ZERO compiler change.** B1.1's first action is a probe confirming this; if
a fiber genuinely re-reads the global root mid-stack (it should not — entry binds once),
*then* and only then is there a compiler obligation. **Ground the probe before sizing any
compiler work.** Prerequisite of B1.3 (a fiber needs a valid root before it switches).
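The slot-0 argument can be modelled in a few lines. This Python sketch is a model under stated assumptions, not sx. `Context`, `DEFAULT_CONTEXT`, `call_with_push`, and `spawn` are hypothetical stand-ins for the sx `Context`, the `__sx_default_context` global, `push Context`, and a spawn trampoline. It shows the property the B1.1 probe should confirm: callees only ever see the context they were handed, and a spawned entry runs under the spawner's snapshot rather than the global root.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Context:
    allocator: str


# Stand-in for the one shared root, the `__sx_default_context` global.
DEFAULT_CONTEXT = Context(allocator="default")


def current_allocator(ctx: Context) -> str:
    # A default-conv callee: its context arrives as the implicit slot-0 argument,
    # so it never consults DEFAULT_CONTEXT.
    return ctx.allocator


def call_with_push(ctx: Context, allocator: str) -> str:
    # `push Context.{...}`: a new, frame-local Context; the caller's is untouched.
    pushed = replace(ctx, allocator=allocator)
    return current_allocator(pushed)


def spawn(entry, spawner_ctx: Context):
    # Hypothetical trampoline: the fiber entry's slot 0 is the spawner's snapshot.
    # Copying by value (instead of keeping a pointer into the spawner's stack
    # frame, which may be gone by the time the fiber runs) is an assumption here.
    snapshot = replace(spawner_ctx)
    return lambda: entry(snapshot)


fiber_entry = spawn(current_allocator, Context(allocator="arena"))
print(fiber_entry())  # arena
```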
### B1.2–B1.5 — pure sx over the primitives (design §4)
- **B1.2 (A1):** `Io` interface + `context.io` + `Future` + `cancel()` — a protocol/vtable
threaded exactly like `Allocator` (which already lives at `Context` field 0; see
`allocViaContext` [call.zig:1214](../src/ir/lower/call.zig#L1214)). `Io` becomes another
`Context` field. No compiler change — protocols + context already carry it.
- **B1.3 (A2):** the fiber runtime — naked context-switch asm (per-arch), bootstrap, `mmap`
stacks **with mandatory guard pages**. All sx. **Highest corruption risk in the stream**
(§8.1.1) and **untestable by the deterministic `Io`** (which tests *scheduling*, not the
*switch*). Its **first deliverable, before the scheduler AND the deterministic `Io`**: a
standalone **2-fiber ping-pong switch-stress harness** (§10.7) — scribble every
callee-saved register + a stack canary before each suspend, deep/recursive chains, verify
all survive post-resume. This harness — not B1.4 — is A2's correctness gate.
- **B1.4 (A3):** `Io` impls in order **blocking → deterministic-sim (KEYSTONE) → event-loop**
(kqueue/epoll/io_uring). Build the deterministic `Io` right after blocking; **calibrate it
against blocking `Io`** before trusting it to gate everything async (§8.1.3, §10.7) — a
deterministic-but-wrong scheduler snapshots garbage. (Open, deferred: the event loop does
**not** yet cooperate with a platform UI run loop — CFRunLoop/ALooper; that's a §6
app-target gap, out of B1.)
- **B1.5 (A5·M:1):** the single-thread scheduler — validates the whole colorblind stack
end-to-end. `18xx` corpus runs under the deterministic `Io`, asserting a **program-emitted
ordering contract** (sequence markers), not raw interleaving, so scheduler-policy tweaks
don't churn every snapshot.
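The switch-stress harness's *shape* can be sketched with Python generators standing in for fibers. Its substance cannot: Python preserves generator locals by construction, so this model can never fail. It only illustrates the pattern the real sx + asm harness needs:

- scribble distinct values before each suspend;
- suspend at the bottom of a deep call chain;
- strictly alternate two fibers;
- verify everything on resume.

The ten values per frame are an assumption echoing aarch64's ten callee-saved GPRs (x19–x28). A real harness would write the actual registers, plus the FP/SIMD callee-saved set (d8–d15), in asm.

```python
def chain(fid: int, rnd: int, depth: int):
    """One frame of a deep call chain; suspends at the bottom (depth 0)."""
    # Scribble: values unique per fiber/round/depth, standing in for the
    # callee-saved registers and a stack canary the real harness writes in asm.
    regs = [(fid << 24) ^ (rnd << 8) ^ (depth << 4) ^ k for k in range(10)]
    canary = 0xC0FFEE ^ (fid << 20) ^ (rnd << 4) ^ depth
    if depth == 0:
        yield (fid, rnd)                       # the context switch happens here
    else:
        yield from chain(fid, rnd, depth - 1)  # recurse, suspend further down
    # Resume: everything scribbled before the suspend must have survived.
    assert regs == [(fid << 24) ^ (rnd << 8) ^ (depth << 4) ^ k for k in range(10)]
    assert canary == 0xC0FFEE ^ (fid << 20) ^ (rnd << 4) ^ depth


def fiber(fid: int, rounds: int, depth: int):
    for rnd in range(rounds):
        yield from chain(fid, rnd, depth)


def ping_pong(rounds: int = 1000, depth: int = 16) -> int:
    """Strictly alternate two fibers until both finish; return the switch count."""
    live = [fiber(1, rounds, depth), fiber(2, rounds, depth)]
    switches = 0
    while live:
        for f in list(live):
            try:
                next(f)
                switches += 1
            except StopIteration:
                live.remove(f)
    return switches


print(ping_pong(rounds=100, depth=8))  # 200
```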
### Files the compiler floor touches (B1.0 only; B1.1–B1.5 are library + tests)
B1.0 (naked) forces the exhaustive-switch / plumbing sites:
- [ast.zig:142](../src/ast.zig#L142) — `ABI.pure` (exists; reference only).
- [inst.zig:605](../src/ir/inst.zig#L605) — add `is_naked: bool = false` to `Function`.
- [decl.zig](../src/ir/lower/decl.zig) — set `is_naked` from `fd.abi == .pure`; gate the
implicit-ctx / param-stack / prologue lowering off for `.pure` (mirror the `.c` skips at
decl.zig:515 + the entry-ctx bind at :2667/:2815 — a naked fn binds no ctx).
- [type_resolver.zig:237](../src/ir/type_resolver.zig#L237) — leave CC `.default` (a naked
fn-pointer type has no CC of its own; the nakedness is a decl-level emit attribute).
- [emit_llvm.zig:1339](../src/ir/emit_llvm.zig#L1339)-adjacent — emit the `naked` enum attr
on the LLVM function when `is_naked`; ensure no body-prologue is generated (naked body =
the asm block only).
- Any `.op`/`Function`-field switch the Zig build flags — let the build tell you.
---
## Phases (xfail→green steps)
### B1.0 — `abi(.pure)` ⇒ LLVM naked ← START HERE
- **B1.0a (lock)** — carry `abi == .pure` into IR `Function.is_naked`; thread through
`decl.zig` (skip implicit-ctx for `.pure`, like `.c`); in `emit_llvm` **BAIL loudly** when
a naked fn is emitted ("naked (`abi(.pure)`) emission not yet implemented"). Add
`examples/1800-concurrency-naked-aarch64.sx` (a tiny naked fn with an aarch64 asm body that
reads its arg from `x0`/returns via `ret`, `.build {"target":"aarch64-macos"}`) **and**
`examples/1801-concurrency-naked-x86_64.sx` (x86_64 sibling, `.build
{"target":"x86_64-linux"}`). Seed markers; capture the **bail diagnostic** as the locked
snapshot (these are ir-only on the non-matching host, so the bail must surface at emit/IR
time, not run). `zig build && zig build test` green against the bail. Commit.
- **B1.0b (green)** — emit the LLVM `naked` attr (`LLVMGetEnumAttributeKindForName("naked",
5)`, added at the function index `-1` like `nounwind`; index 1 would be the first
parameter); ensure the naked body lowers to *only* the asm block (no
prologue/epilogue, no ctx). On a matching host the example runs (asserts the computed
value); on a mismatch it's ir-only (assert `.ir` shows `naked` + the asm). Capture both
arch `.ir` snapshots; add a unit test in `emit_llvm.test.zig` asserting the `naked`
attribute is present on a `.pure` function. Review the diff (no stray error text). Commit.
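The lock→green split can be shown as a toy model. It assumes a hypothetical `Function` record carrying the planned `is_naked` flag. A `naked_supported` switch stands in for the difference between the B1.0a and B1.0b commits. In B1.0a the emitter refuses loudly; in B1.0b the same input yields the `naked` attribute. Either way, a naked function never falls through to ordinary emission.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    is_naked: bool = False  # mirrors the planned IR flag (default false)


class EmitError(Exception):
    """Raised instead of silently emitting something wrong."""


def function_attrs(fn: Function, naked_supported: bool) -> list:
    """Function-level attributes a (toy) emitter would attach to `fn`."""
    attrs = []
    if fn.is_naked:
        if not naked_supported:
            # B1.0a: bail loudly; the locked snapshot captures this diagnostic.
            raise EmitError("naked (`abi(.pure)`) emission not yet implemented")
        # B1.0b: attach `naked` at the function index; the body is only the asm block.
        attrs.append("naked")
    return attrs


print(function_attrs(Function("switch_ctx", is_naked=True), naked_supported=True))  # ['naked']
```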
### B1.1 — per-fiber `context` root (probe-first; likely zero compiler change)
- **B1.1a (probe + lock)** — write a probe (`.sx-tmp/`) + an `18xx` example that snapshots a
`Context` (e.g. a custom allocator pushed via `push Context`) and confirms it is carried by
slot 0 across an ordinary call chain (it is — grounded). If the probe shows a fiber-entry
trampoline can pass a snapshotted ctx as slot 0 with **no compiler change**, this phase is a
**library convention doc** (record it in the checkpoint) + a corpus example locking the
behavior. If (and only if) the probe surfaces a real compiler gap (a path re-reads
`__sx_default_context` mid-stack), file it as a step here and size it then.
### B1.2 — A1: `Io` interface + `context.io` + `Future` + `cancel()` API
Library-only. `Io` as a protocol added to `Context` (mirror `Allocator`). `Future`/`cancel`
API surface. xfail→green via an `18xx` example exercising the blocking `Io` default (real
suspend lands in B1.3). No compiler change expected; if a protocol-in-context gap appears,
file it.
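A minimal Python model of `Io` riding in `Context` the way `Allocator` does, assuming a vtable of plain callables. The field names (`read`, `sleep`) and the in-memory file table are hypothetical; the real protocol surface is what B1.2 decides. The model shows the colorblind property: the call site just asks `ctx.io`. Swapping the impl (blocking, deterministic-sim, event-loop) changes behavior without changing the caller.

```python
import time
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Io:
    """Hypothetical vtable: one callable per operation."""
    read: Callable[[str], str]
    sleep: Callable[[float], None]


@dataclass(frozen=True)
class Context:
    allocator: str  # stands in for the existing field-0 Allocator
    io: Io          # the new field B1.2 adds


def blocking_io(files: dict) -> Io:
    # Blocking impl: every call completes before returning.
    return Io(read=lambda path: files[path], sleep=time.sleep)


def load_greeting(ctx: Context) -> str:
    # No async coloring: the caller just uses whatever Io its context carries.
    return ctx.io.read("greeting.txt")


ctx = Context(allocator="default", io=blocking_io({"greeting.txt": "hello"}))
print(load_greeting(ctx))  # hello
sim = Io(read=lambda path: "from-sim", sleep=lambda seconds: None)
print(load_greeting(replace(ctx, io=sim)))  # from-sim
```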
### B1.3 — A2: fiber runtime (naked switch + bootstrap + guarded `mmap` stacks)
- **B1.3a (switch-stress harness FIRST)** — the standalone 2-fiber ping-pong harness
(register + canary survival, deep chains) per §10.7. This is A2's gate and predates the
scheduler + deterministic `Io`. Arch-gated run test (matching-host run; ir-only elsewhere).
- **B1.3b** — fiber bootstrap + `mmap` stacks **with guard pages** (mandatory — §8.1.1).
- (Cadence inside B1.3 follows lock→green per sub-piece; the asm switch is the highest-risk
artifact — review adversarially, with a worker if authorized.)
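A sketch of a guarded fiber stack, written in Python (`mmap` + `ctypes`) purely for illustration and assuming a POSIX host. The sx version would go through the `mmap` FFI floor, and using `mprotect` for the guard page is an assumption here. The lowest page of the mapping is made inaccessible. A fiber that overruns its down-growing stack then faults immediately instead of silently scribbling over a neighbor.

```python
import ctypes
import mmap

PROT_NONE = 0  # from <sys/mman.h>; 0 on Linux and macOS
libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process (incl. libc)
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.mprotect.restype = ctypes.c_int


def alloc_fiber_stack(usable: int):
    """Map one guard page + `usable` bytes; return (mapping, stack_low, stack_top)."""
    page = mmap.PAGESIZE
    usable = -(-usable // page) * page     # round up to whole pages
    region = mmap.mmap(-1, page + usable)  # anonymous, page-aligned, read/write
    anchor = ctypes.c_char.from_buffer(region)
    base = ctypes.addressof(anchor)
    del anchor                             # drop the buffer export so close() works later
    # Stacks grow down, so the guard goes at the LOW end of the mapping.
    if libc.mprotect(base, page, PROT_NONE) != 0:
        err = ctypes.get_errno()
        region.close()
        raise OSError(err, "mprotect(guard page) failed")
    return region, base + page, base + page + usable  # initial SP = stack_top


region, low, top = alloc_fiber_stack(64 * 1024)
region[mmap.PAGESIZE] = 0x2A  # first usable byte: fine
# region[0] = 0x2A            # would fault (SIGSEGV): that byte is the guard page
region.close()
```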
### B1.4 — A3: `Io` impls (blocking → deterministic-sim KEYSTONE → event-loop)
Blocking first; then the deterministic-sim `Io`, **calibrated against blocking** before any
`18xx` test trusts it; then the event loop. The deterministic `Io` is the test harness for
*all* of B1.5 + Stream B2.
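At its simplest, calibration means running the same program under both impls. The program-visible results must be identical, and the sim must produce the same trace for the same seed. Below is a Python sketch for a single-fiber program, with hypothetical `BlockingIo`/`SimIo` classes and a virtual clock. Real calibration would also cover multi-fiber programs.

```python
import random
import time


class BlockingIo:
    """Reference impl: real (if tiny) waits, results straight from the backing store."""
    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]

    def sleep(self, seconds):
        time.sleep(seconds)


class SimIo:
    """Deterministic-sim impl: virtual clock, seeded latencies, no real waiting."""
    def __init__(self, files, seed):
        self.files, self.rng = files, random.Random(seed)
        self.now, self.trace = 0.0, []

    def read(self, path):
        self.now += self.rng.uniform(0.001, 0.010)  # simulated I/O latency
        self.trace.append((self.now, "read", path))
        return self.files[path]

    def sleep(self, seconds):
        self.now += seconds  # advance virtual time instead of waiting
        self.trace.append((self.now, "sleep", seconds))


def program(io):
    # A test program only observes the world through its Io.
    first = io.read("a")
    io.sleep(0.001)
    return [first, io.read("b")]


def calibrate(prog, files, sim=SimIo, seeds=range(20)):
    expected = prog(BlockingIo(files))  # ground truth from the blocking Io
    for seed in seeds:
        a, b = sim(files, seed), sim(files, seed)
        if prog(a) != expected or prog(b) != expected:
            return False  # sim disagrees with blocking: don't trust it yet
        if a.trace != b.trace:
            return False  # the same seed must replay identically
    return True


print(calibrate(program, {"a": "1", "b": "2"}))  # True
```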
### B1.5 — A5: M:1 scheduler
End-to-end validation of the colorblind stack. `18xx` corpus under the deterministic `Io`,
asserting program-emitted ordering contracts.
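A toy version of the idea uses generator fibers, with a seeded random pick standing in for the deterministic `Io`'s scheduling choices. The program emits `produced`/`consumed` markers. The test asserts the contract (nothing is consumed before it is produced, and items are consumed in order) rather than one exact interleaving. Different seeds may interleave differently, and the contract holds for all of them. All names are hypothetical.

```python
import random
from collections import deque


def run(fibers, seed):
    """Toy M:1 scheduler: one thread, cooperative generators, seeded pick order."""
    rng = random.Random(seed)
    ready = list(fibers)
    while ready:
        fiber = ready[rng.randrange(len(ready))]
        try:
            next(fiber)  # run until the fiber's next suspend point (`yield`)
        except StopIteration:
            ready.remove(fiber)


def producer(queue, log, n):
    for k in range(n):
        queue.append(k)
        log.append(("produced", k))
        yield


def consumer(queue, log, n):
    got = 0
    while got < n:
        if queue:
            k = queue.popleft()
            log.append(("consumed", k))
            got += 1
        yield


def contract_holds(log, n):
    """Ordering contract: each item consumed after it was produced, in production order."""
    produced, consumed = set(), []
    for event, k in log:
        if event == "produced":
            produced.add(k)
        elif k in produced:
            consumed.append(k)
        else:
            return False
    return consumed == list(range(n))


log, queue = [], deque()
run([producer(queue, log, 5), consumer(queue, log, 5)], seed=7)
print(contract_holds(log, 5))  # True
```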
---
## Gates
- **B1.0:** unit `emit_llvm.test.zig` (the `naked` attr present on a `.pure` fn); two
arch-gated examples (aarch64 + x86_64) run end-to-end on a matching host, ir-only on a
mismatch (assert `naked` + asm in `.ir`). **OUT of corpus scope, stated loudly:** the
*correctness* of any hand-written register save/restore — that's the B1.3 stress harness.
- **B1.1:** an `18xx` example locking context-carried-by-slot-0 behavior + a checkpoint note
on the spawn-trampoline convention.
- **B1.3:** the **switch-stress harness is A2's gate** (register/canary survival — §10.7),
NOT a run/snapshot test; plus arch-gated run tests.
- **B1.4:** deterministic `Io` **calibrated** against blocking `Io` (§8.1.3) before trusting
it; `18xx` under the deterministic `Io`.
- **B1.5:** `18xx` ordering-contract snapshots under the deterministic `Io`.
## Kickoff prompt (B1.0a — paste into a fresh session)
> Implement Stream B1 step **B1.0a** (naked-ABI lock commit) per
> `current/PLAN-FIBERS.md`. Verify `zig build && zig build test` is green first. Then: (1)
> add `is_naked: bool = false` to the IR `Function` struct (`src/ir/inst.zig:605`); (2) in
> `src/ir/lower/decl.zig`, set `is_naked` from `fd.abi == .pure` and gate the implicit-`Context`
> / param-stack / entry-ctx lowering OFF for `.pure` (mirror the existing `fd.abi == .c`
> skips at decl.zig:515 + the `__sx_default_context` binds at :2667/:2815 — a naked fn binds
> no ctx); (3) in `src/ir/emit_llvm.zig`, **BAIL loudly** when emitting a naked function
> ("naked (`abi(.pure)`) emission not yet implemented") — do NOT emit the attr yet; (4) add
> `examples/1800-concurrency-naked-aarch64.sx` (tiny naked fn, aarch64 asm body, `.build
> {"target":"aarch64-macos"}`) and `examples/1801-concurrency-naked-x86_64.sx` (x86_64
> sibling, `.build {"target":"x86_64-linux"}`); seed the `.exit` markers, capture the
> emit/IR-time bail diagnostic as the locked snapshot, confirm `zig build test` green, review
> the diff, commit. STOP — B1.0b (real `naked` emission) is the next step; do NOT implement
> emission in the same commit that adds the examples. Handle any exhaustive-switch site the
> Zig build flags from the new `Function` field. If you hit an UNRELATED compiler bug, file
> `issues/NNNN`, mark `CHECKPOINT-FIBERS.md` BLOCKED, and STOP.

current/PLAN-POST-METATYPE.md

@@ -60,7 +60,7 @@ differently — x86 vs LL/SC). **Out of snapshot scope, state loudly:** ordering
The colorblind, stackful, pure-sx async runtime (design §4). Compiler floor is small;
the runtime is sx lib. Likely carved as two PLANs:
### B1 — Fibers + Io + M:1 (the runtime; `PLAN-FIBERS.md`) · 🚧 **CARVED** (not started; first step B1.0a)
- B1.0 **`abi(.naked)` — make the EXISTING `.pure` ABI actually naked.** The enum
already carries `.pure` (ast.zig:142, documented "pure/naked, no prologue/epilogue"),
but it is an **inert label today**: `type_resolver.zig:237` maps `.pure → .default`