Files
sx/current/CHECKPOINT-FIBERS.md
agra 0ab26c8a40 fibers B1.2: record review findings — async surface blocked on 0151 (widened)
Adversarial review of 45d869d: the Io infrastructure (both materializers,
push-inherit, 37 .ir regens, !-lint) is correct + landed; but await/cancel
(*Future($R)) are uncallable in EVERY form because sx can't infer a generic
$T from a pointer-wrapped arg. Widened issue 0151 to that root (repro:
unbox(b: *Box($T)) -> $T). Checkpoint: B1.2 partially landed; next = fix 0151
generic inference -> make await/cancel callable -> add 1805/1806 -> B1.3.
2026-06-21 00:43:09 +03:00

294 lines
22 KiB
Markdown

# CHECKPOINT-FIBERS — Stream B1 (fibers + Io + M:1 scheduler)
Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step at a time,
per the cadence rule). New corpus category: `18xx` concurrency.
## Last completed step
**B1.2 (Io capability) — PARTIALLY LANDED + adversarially reviewed. Infrastructure GOOD;
async SURFACE blocked on a generic-inference compiler bug (issue 0151, widened).**
Commits `a1b14f0` (lock) + `45d869d` (Io capability) + `3eeb965` (issue 0151). Suite green
**726/0**, master clean.
- **LANDED + review-confirmed correct** (commit 45d869d): `Io :: protocol #inline`
(spawn_raw/suspend_raw/ready/poll/now_ms/arm_timer) + `io` field on `Context`
(`{allocator; data; io}`, io LAST); BOTH `__sx_default_context` materializers
(protocol.zig + comptime_vm.zig) build an identical CBlockingIo→Io vtable (review verified
byte-for-byte agreement; `context.io.now_ms()` dispatches at runtime AND comptime); the
`push Context.{…}` omitted-field-**inherits-ambient** fix (review: correct, right fix, no
bad blast radius); `library/modules/std/io.sx` (`Future($R)`, `CBlockingIo`,
`async`/`await`/`cancel`); the `!`-protocol-impl-lint suppression; 37 `.ir` regens
(review: pure layout/type-table, no error text, zero .exit/.stdout/.stderr change).
- **BLOCKED — async surface non-functional:** `await`/`cancel` take `*Future($R)` and are
**uncallable in EVERY form** (not just UFCS) — sx can't infer a generic `$T` from a
pointer-wrapped arg (`*Future($R)`). `async(...)` (create) works via explicit call and
produces a correct `.ready` Future, but you can't `await` it. Root bug = **issue 0151
(WIDENED)**: infer `$T` from `*T`-wrapped params + closure-return-via-pack + UFCS dispatch.
Minimal repro: `unbox :: (b: *Box($T)) -> $T` fails to infer `T`.
- **No async example in the corpus** (1805 was removed because it needs the blocked surface)
→ the green suite does NOT cover async. Restore `1805` (async/await) + add `1806` (cancel)
once 0151 is fixed.
### Earlier — B1.1 (per-fiber `context` root) — DONE. Zero compiler change (confirmed by probe).
The fiber-spawn context convention works end-to-end with ordinary language features:
- `snap := context` captures the spawner's `Context` as a value;
- the snapshot is stored in a struct (the stand-in `Fiber`);
- a trampoline running under a *different* ambient context installs the fiber's stored root
with `push f.root { … }`, and the body reads the snapshot — not the trampoline's ambient
context — because `context` is an implicit slot-0 `*Context` param (call-carried, rides the
callee's own stack) and `push` allocates on the caller frame (no global, no TLS).
- Locked by `examples/1804-concurrency-context-snapshot.sx`: prints `fiber root: 42` (the
installed snapshot wins over ambient 99) + `ambient after: 99` (the `push` scope restores
the ambient context on exit). No fiber runtime yet (that's B1.3) — this proves the plumbing
it will build on. No `.build` pin (pure sx, host-independent).
- **Probe result:** the design doc's "lower as swappable indirection, never raw TLS" guarded
a non-problem — context was already param-carried, never TLS. No path re-reads
`__sx_default_context` mid-stack, so there is **no compiler obligation** here.
- `zig build && zig build test` green: **726 ran, 0 failed**.
### Earlier — B1.0 (`abi(.naked)` codegen) — complete
Replaced the emit bail with real LLVM `naked` emission:
- `emit_llvm` declaration pass: for `func.is_naked`, add the LLVM `naked` + `noinline` +
`nounwind` attributes and **skip** the `frame-pointer=all` attribute (incompatible with a
frameless function). Pass 2 now emits the `.naked` body normally — `naked` makes the
backend emit it verbatim (the inline asm + its own `ret`) with no prologue/epilogue.
- IR shape (verified): `; Function Attrs: naked noinline nounwind` / `define internal i64
@answer() #0 { entry: call void asm sideeffect "…ret…", ""() unreachable }` /
`attributes #0 = { naked noinline nounwind }`. The caller invokes it as an ordinary
`() -> i64` call (`.naked` is `call_conv == .default`).
- `examples/1800-concurrency-naked-asm.sx` — now GREEN, aarch64-pinned (`.build {"target":
"macos"}`): runs end-to-end → **exit 42** on this host, ir-only on a mismatch; `.ir`
snapshot captured.
- `examples/1801-concurrency-naked-generic.sx` (renamed from `-bail`) — the generic `.naked`
now emits a correct naked `answer__i64` (exit 42), proving generic.zig produces a naked
body, not a framed one. aarch64-pinned.
- `examples/1802-concurrency-naked-asm-x86.sx` — x86_64 cross sibling (`.build {"target":
"x86_64-linux"}`, ir-only here): `.ir` locks `naked` + `movl $42, %eax` / `ret`.
- Unit test `emit: abi(.naked) function gets the naked attribute (no frame-pointer)` in
`emit_llvm.test.zig` (asserts `naked` present, `frame-pointer` absent).
- **B1.0c (review-hardening):** a param-bearing `.naked` fn emitted invalid LLVM (loud
verifier error "cannot use argument of naked function") because the param-alloca loop
wasn't gated. Fixed forward (this *enables* the B1.3 context-switch use case rather than
rejecting it): gated the param-alloca loop on `fd.abi != .naked` in decl.zig (both paths) +
generic.zig; a naked fn's args stay in registers (read by asm), declared-but-unused in
LLVM. Locked by `examples/1803-concurrency-naked-asm-param.sx` (`add(a,b)` → x0+x1 → 42).
- `zig build && zig build test` green: **725 ran, 0 failed** + unit tests.
### Earlier — B1.0a (lock + review hardening)
Plumbed `Function.is_naked` (set from `fd.abi == .naked` at both decl sites + generic.zig +
pack.zig); `funcWantsImplicitCtx` skips `.naked` (no synthetic ctx, like `.c`); all
body-lowering paths bypass `lowerValueBody` for `.naked` (asm body + `unreachable` cap — no sx
return); `emit_llvm` Pass 2 bailed loudly (since flipped to real emission). Adversarial
review caught the generic/pack `is_naked` gap (a generic `.naked` silently shipped a framed
body); closed + locked. The review's `.naked`-lambda CRITICAL was a false positive
(unparseable — `isLambda` breaks on the `abi` keyword).
## Current state
**B1.2 is UNBLOCKED.** Master GREEN (726/0), installed `sx` clean. The earlier "blockers"
were NOT real: issue **0151 was INVALID** (its repro used the non-idiomatic `($A)->$R`
bare-fn-ptr form) — **removed**. The correct `async` idiom **works today with no compiler
change** (verified live): `spawn :: (worker: Closure(..$args) -> $R, ..$args) -> Wrap($R)`
with a **lambda worker** + the `w : Wrap($R) = ---; w.v = worker(..args);` build form —
mirrors the canonical `examples/0543-packs-canonical-map.sx`. Ran `42 42` for homogeneous +
heterogeneous args. Caveats (work within them, not "bugs"): lambda params must be annotated
(`(a: i64, b: i64) -> i64 => …`); a bare *named* fn passed as the worker is non-idiomatic —
use a lambda; build the result struct with `= ---` + field-assign, not a struct-literal in
`return`. Issue **0150** (`void` struct field → SIGTRAP exit 133) is a **real** bug but only
reached via `Future(void)` (void-returning worker / `timeout`) — **DEFERRED**: B1.2 supports
non-void workers; revisit `Future(void)` in B1.4 (or fix 0150 standalone). The B1.2 *design*
(Io protocol on Context, blocking `CBlockingIo`, `context.io.now_ms()`) was validated live;
WIP at `.sx-tmp/b12-wip/` has the working Io/Context/materializer parts — reuse those, rewrite
the async layer to the pack-lambda idiom above.
### B1.2 attempt (BLOCKED — design proven, two compiler bugs filed)
What was built + verified WORKING (then reverted to keep master green):
- `Io :: protocol #inline { spawn_raw; suspend_raw -> !; ready; poll; now_ms; arm_timer; }`
in `core.sx` next to `Allocator`, with `SpawnOpts{ pin: PinTarget }` + `ParkToken{ handle }`.
Six methods, each justified by a downstream consumer (B1.3-B1.5).
- `Context :: struct { allocator; data; io: Io; }` — `io` appended LAST so `allocator` stays
index 0 (the `call.zig:1229` hardcode) and `data` keeps index 1 (minimal VM-fallback churn).
- Both `__sx_default_context` materializers updated in lockstep + verified: `protocol.zig`
`emitDefaultContextGlobal` (extended `ctx_fields` 2→3, built the `CBlockingIo→Io` inline
7-word vtable `{null-ctx, fn0..fn5}` via `getOrCreateThunks("Io","CBlockingIo")`) and
`comptime_vm.zig` `materializeDefaultContext` fallback (wrote the 6 thunk func-refs at
`io_base = addr + 4*ps`, offset `+ (i+1)*ps`). The global path auto-followed the 3-field
Context type. **`context.io.now_ms()` printed `clock ok` live — the capability threads + the
vtable dispatches correctly.**
- Stateless `CBlockingIo :: struct {}` + `impl Io for CBlockingIo` (mirror of `CAllocator`):
blocking semantics — `spawn_raw`/`ready`/`poll`/`arm_timer` no-op/0, `now_ms` → `time.mono_ms()`.
- **push-inherit-omitted fix** (`stmt.zig` `lowerPush`): a `push Context.{...}` now SEEDS the
new slot from the ambient context (load+store), then overwrites ONLY the literal's named
fields — so omitted fields (now incl. `io`) are INHERITED, never zero-inited to a null
vtable. Eliminates the omitted-field footgun globally (zero per-site churn across the 17
partial-literal sites). This is the correct capability-bag semantics; it compiled clean.
- **`!`-protocol-method warning fix** (`error_analysis.zig` + a new `Lowering.impl_method_names`
set populated in `protocols.zig` `registerImplBlock`): a protocol impl method may be declared
`!` by contract (e.g. `Io.suspend_raw`) yet never raise; the "declared `!` but never errors —
drop the `!`" hint is a false positive for impl methods, now suppressed for them.
Where it BROKE (the two blockers — both INDEPENDENT of the Io design, both repro standalone):
- **issue 0150** — `Future(void)` (for `timeout -> Future(void)`) makes a `result: void` field;
a `void` struct field crashes the compiler with an unsized-type SIGTRAP in LLVM
`getTypeSizeInBits` (a bare `struct { v: void; }` repros it). `timeout` was DEFERRED (it is a
B1.4 stub needing `arm_timer` anyway) rather than routed around with a non-void shape.
- **issue 0151** — `async(io, worker: ($A) -> $R, arg: $A) -> Future($R)`: `$R` inferred from a
fn-pointer parameter's RETURN type type-checks the call but is NOT bound as a usable type in
the body, so `Future(R)` errors `unknown type 'R'`. A direct `arg: $A` binds fine — the gap is
specific to type-vars nested in a fn-ptr/closure param signature. This blocks the central
`async`/`await` free-fns. (Manifested as the "unresolved type reached LLVM emission" panic —
the same one another session filed against my dirty binary as issue 0149, now moot after the
revert.)
Per the IMPASSABLE STOP rule: filed 0150 + 0151, reverted all B1.2 working changes (master
green again, photo project unbroken), STOPPED. Resume B1.2 once 0150 + 0151 land — the WIP in
`.sx-tmp/b12-wip/` makes it ~mechanical (the design is proven).
### Earlier — B1.0 + B1.1 complete
Stream A (atomics) is feature-complete (✅). Stream B1: **B1.0 + B1.1 complete.** The two
compiler-floor preconditions for the fiber runtime are in place: (1) `abi(.naked)` emits a
real LLVM `naked` function end-to-end (decl, generic, pack paths) — the context-switch
substrate; (2) per-fiber `context` root needs **no compiler change** — the spawn convention
(snapshot `context`, store, `push` it from the trampoline) is pure library sx. No
fibers/Io/scheduler code yet. Grounded floor facts:
- `context` is an implicit slot-0 `*Context` param + `push Context` is a stack `alloca` ⇒
**fiber-local for free** (confirmed by the B1.1 probe — never TLS, never re-read from the
`__sx_default_context` global mid-stack). A spawn passes the snapshot as the fiber-entry
fn's slot-0 ctx via `push f.root { entry(args) }`. Locked by `1804-...-context-snapshot`.
- Inline asm works end-to-end (lower→emit→JIT, aarch64 + x86_64) — the `.naked` body reuses it.
- **`.naked` with PARAMS works** (B1.0c, the B1.3 substrate): the param-alloca loop is gated
on `fd.abi != .naked` in decl.zig (both paths) + generic.zig — a naked fn's args stay in
ABI registers (read by the asm body), declared-but-unused in LLVM (verifier-legal).
Example `1803-concurrency-naked-asm-param.sx` (`add(a,b)` reads x0/x1). **Unsupported (loud,
not silent):** a `.naked` *variadic-pack* fn (pack.zig's param loop is intertwined with
comptime-param/`#insert` handling, and a naked fn can't read a runtime-sized pack from
registers anyway) → loud LLVM-verifier error for that nonsensical construct. Acceptable
boundary; a sharper sx diagnostic for it is a candidate polish, not a blocker.
## Next step
**Fix issue 0151 (generic inference through a pointer) so `await`/`cancel` become callable —
then complete B1.2's async surface.** Sequence:
1. **Fix 0151 (WIDENED):** make the generic-inference engine bind `$T` from a pointer-wrapped
param (`b: *Box($T)` ⇒ `unbox(@b)` infers `T`), which unblocks `await`/`cancel`
(`*Future($R)`). The same bug has two more faces — `$R` from a closure-return via a pack,
and via UFCS dot-dispatch (the original 0151 + the `f.await()` SIGTRAP). Acceptance cases
are in `issues/0151-...md`. This is generic-inference-engine work
(`src/ir/lower/generic.zig` `extractTypeParam` + the UFCS call path) — its own focused step.
2. **Restore/add the async examples:** `examples/1805-concurrency-io-blocking-async.sx`
(`context.io.async((a:i64,b:i64)->i64 => a+b, 40, 2).await()` → 42) + `1806-...-io-cancel.sx`
(cancel → state `.canceled`). lock→green. Regen `.ir` only after green; confirm layout-only.
3. Then B1.2 is truly done → proceed to **B1.3 (fiber runtime)**.
**Deferred (do NOT block B1.2 on these):** issue **0150** (`void` struct field SIGTRAP) — only
`Future(void)`/`timeout`, which are B1.4. The **`::` callable-parameter feature** (named-fn
async workers `async(read_a, conn)`) — WIP at `.sx-tmp/wip-callable-params/patch.diff` (parser
done, inference incomplete); a dedicated effort; lambda workers are the B1.2 idiom meanwhile.
`Context` layout settled: `{ allocator; data; io; }` (allocator index 0 fixed by
`call.zig:1229`, io last). Io protocol + materializers + push-inherit are LANDED + reviewed.
## Known issues / capability gaps
- **🔴 B1.2 BLOCKERS (both filed, both standalone-reproducible, both independent of the Io
design):**
- **issue 0150** — a `void` struct field crashes the compiler (unsized-type SIGTRAP in LLVM
`getTypeSizeInBits`). Blocks `Future(void)` → `timeout`. Repro: `issues/0150-...`.
- **issue 0151** — a type-var inferred from a fn-pointer parameter's RETURN type is not bound
in the function body (`unknown type 'R'`). Blocks `async(io, worker: ($A)->$R, arg)`'s
`Future(R)`. Repro: `issues/0151-...`.
- (Note: **issue 0149**, filed by another session against the dirty in-progress binary, was a
manifestation of 0151 — "unresolved type reached LLVM emission". Moot after the revert; its
real root cause is 0151.)
- **Orthogonal (not a B1 blocker):** default VALUES for comptime params don't bind on
generic-struct methods (free-fn defaults DO work) — inherited from Stream A. Only matters
if a B2 lib type wants a defaulted comptime param; atomics/fibers require explicit, so
unaffected.
- **Issue 0144 (open, independent):** calling an unrecognized bodiless `#builtin` silently
returns 0 / exit 0 — a silent-fallback footgun in the generic builtin-call path. Filed;
leave for its own fix session unless prioritized. Not a B1 blocker.
- **Deferred design gap (documented):** the B1.4 event-loop `Io` does not yet cooperate with
a platform UI run loop (CFRunLoop/NSRunLoop/ALooper); pinning gives thread-affinity, not
run-loop integration — a §6 app-target concern, out of B1 scope.
## Decisions (Stream B1 specifics; surface locked in design §4 / §4.6)
- **The async runtime is sx LIBRARY code.** The compiler provides only: the general
primitives (inline asm ✅, `abi(.naked)` naked [B1.0], atomics ✅) + fiber-safe codegen
(`context` already fiber-local — B1.1). Schedulers, fibers, channels, futures, `Io`
vtables, `mmap` stacks are all sx.
- **`abi(.naked)` is the real spelling of the design's `callconv(.naked)`** — postfix slot,
`name :: (sig) -> Ret abi(.naked) { asm { … }; }`. B1.0 = carry it into IR + emit LLVM
`naked` + skip prologue/ctx (mirror the existing `.c` skip), NOT extend the enum (it's
already there, just inert).
- **`.naked` ≠ `.c`:** a `.c` epilogue would restore SP from the wrong stack across a context
switch (SP-in ≠ SP-out by design). `.naked` = no prologue/epilogue/frame; the asm emits its
own `ret`. This is why the switch must be `.naked`.
- **Naming:** sx-facing name is **`naked`** (keyword `abi(.naked)`, field `is_naked`, the
diagnostic), matching LLVM's `naked` attribute and the industry term (Zig/Rust/GCC/Clang).
The ABI variant was renamed `.pure → .naked` (user direction): "pure" universally means
*side-effect-free*, the opposite of a register-clobbering context switch.
- **B1.0 snapshot scope:** a `.naked` body is raw per-arch asm; LLVM's `naked` attr text is
arch-invariant. **B1.0a** = one host example locked to the emit bail (host-independent —
fires before instruction selection; no `.build` pin). **B1.0b** = pin aarch64 + add an
x86_64 cross sibling (`.build` target-gated, ir-only on mismatch), like the asm corpus
split. The `.ir` proves the `naked` attr + asm emitted, NOT register-save correctness
(that's B1.3's stress harness).
- **B1.1 — per-fiber context is library-only (CONFIRMED by probe):** push frames are
stack-`alloca`'d and the implicit ctx rides slot 0, so the spawn convention — snapshot
`context`, store it, `push f.root { entry(args) }` from the trampoline — installs the
fiber's root with no compiler change. Verified: the body reads the snapshot over a different
ambient context, and `push` restores ambient on exit (`1804-...-context-snapshot`). The
design doc's "never raw TLS" guarded a non-problem (context was never TLS).
- **Test keystones (design §10):** the **B1.3 switch-stress harness** gates the
context-switch (the one piece the deterministic `Io` can't test — §8.1.1, §10.7); the
**B1.4 deterministic-sim `Io`** (calibrated against blocking `Io` — §8.1.3) gates all
scheduling tests. Both must exist + be calibrated before the async tests they gate are
trusted. `18xx` asserts program-emitted ordering contracts, not raw interleaving.
## Log
- **carve** — wrote PLAN-FIBERS.md + CHECKPOINT-FIBERS.md. Grounded the B1 compiler floor:
`ABI.naked` inert (type_resolver.zig:237), IR `Function` has no naked flag (inst.zig:605),
attribute API pattern (emit_llvm.zig:1339 nounwind), `.c` ctx-skip precedent
(decl.zig:515), `push Context` stack-alloca + slot-0 implicit ctx (stmt.zig:1263,
lower.zig:259), `__sx_default_context` root (decl.zig:2667/2815), inline-asm corpus
(1645/1651). Corrected the design's `callconv(.naked)` → real `abi(.naked)` spelling and
the B1.0 snapshot story. B1.1 grounded as likely library-only. Baseline green (721/0).
- **B1.0a** — plumbed `Function.is_naked` (set from `fd.abi == .naked` at both decl sites);
`funcWantsImplicitCtx` skips `.naked` (no implicit ctx, like `.c`); both body-lowering
paths bypass `lowerValueBody` for `.naked` (asm body + `unreachable` cap — no sx return);
`emit_llvm` Pass 2 bails loudly on `func.is_naked`. `examples/1800-concurrency-naked-asm.sx`
locked to the bail (exit 1 + diagnostic). Suite green (722/0). (ABI variant later renamed
`.pure → .naked` — see the Naming decision above — so all `is_*`/`abi(.*)`/example names
here read `naked`.)
- **B1.0a review-hardening** — adversarial review found generic/pack Function-creation paths
left `is_naked` false (silent framed body for a generic `.naked` instance — returned 42 but
corrupted the stack). Fixed generic.zig + pack.zig (set `is_naked` + asm-only `unreachable`
cap); locked by `examples/1801-concurrency-naked-generic-bail.sx`. The review's `.naked`-
lambda CRITICAL was a false positive (unparseable — `isLambda` breaks on `abi`). Suite
green (723/0).
- **B1.0b** — real `naked` emission: emit_llvm declaration pass adds LLVM `naked`/`noinline`/
`nounwind` + skips `frame-pointer` for `func.is_naked`; Pass 2 emits the body verbatim (no
prologue). `1800` green aarch64-pinned (exit 42 + `.ir`); renamed `1801` → `-generic`
(generic `.naked` emits a naked body, exit 42); added x86_64 sibling `1802` (ir-only, `.ir`
locks `naked` + `movl $42, %eax`). Unit test asserts `naked` present + `frame-pointer`
absent. Suite green (724/0).
- **B1.0c** — review-hardening: param-bearing `.naked` emitted invalid LLVM (loud verifier
error). Gated the param-alloca loop on `fd.abi != .naked` (decl.zig both paths + generic.zig)
— naked args stay in registers, read by the asm body (the B1.3 context-switch shape).
Locked by `examples/1803-concurrency-naked-asm-param.sx`. Pack `.naked` left unsupported
(loud, nonsensical). **B1.0 complete.** Suite green (725/0).
- **rename** — ABI variant `.pure → .naked` (keyword, `Function.is_naked`, diagnostics,
examples 1800-1803 `*-pure-* → *-naked-*`, docs). "pure" universally means side-effect-free
— wrong for a register-clobbering switch; "naked" matches LLVM/Zig/Rust/GCC/Clang. Pure
cosmetics, no semantic change. Suite green (725/0).
- **B1.1** — per-fiber `context` root: **zero compiler change** (probe-confirmed). The spawn
convention (snapshot `context` → store in a struct → `push f.root { entry() }` from the
trampoline) installs the fiber's root via the implicit slot-0 `*Context` param; the body
reads the snapshot, not the trampoline's ambient ctx, and the `push` scope restores ambient
on exit. Locked by `examples/1804-concurrency-context-snapshot.sx` (prints `fiber root: 42`
/ `ambient after: 99`). Suite green (726/0). **Next: B1.2 (Io interface + context.io).**
- **B1.2 (BLOCKED)** — built the full `Io` capability (protocol on `Context`, stateless
`CBlockingIo` blocking default, both `__sx_default_context` materializers, push-inherit-omitted
fix, `!`-impl-method warning fix) and VERIFIED the core works live (`context.io.now_ms()` →
`clock ok`). Two independent compiler bugs blocked the `async`/`await`/`timeout` layer:
**0150** (`void` struct field → unsized SIGTRAP, blocks `Future(void)`) and **0151** (type-var
from a fn-ptr param's return type not bound in the body, blocks `async`'s `Future(R)`). Both
filed with standalone repros + investigation prompts. Per the STOP rule: reverted ALL B1.2
working changes (master green again, 726/0; the dirty binary had broken the photo project —
see the now-moot 0149), saved WIP to `.sx-tmp/b12-wip/`, STOPPED. Resume after 0150 + 0151.