diff --git a/current/PLAN-IO-UNIFY.md b/current/PLAN-IO-UNIFY.md
new file mode 100644
index 00000000..bf940445
--- /dev/null
+++ b/current/PLAN-IO-UNIFY.md
@@ -0,0 +1,126 @@
+# PLAN-IO-UNIFY — fold the fiber scheduler behind `context.io`, re-home `race`
+
+## Why
+Today there are **two parallel async stacks**:
+
+| stack | behind `context.io`? | real suspension? | cancellation channel |
+|---|---|---|---|
+| io.sx `async`/`await`/`cancel`/`Future` | yes (`impl Io for CBlockingIo`) | **no** — runs the worker inline to completion | `suspend_raw -> !` / `IoErr.Canceled` (designed, unused) |
+| sched.sx `go`/`wait`/`cancel`/`race` (just landed) | **no** | yes (`swap_context` fibers) | none — `suspend_self -> void` |
+
+`context.io` is structurally Zig's `std.Io` (an `Io` protocol carried *implicitly* in `Context` — better
+ergonomics than Zig's explicit `io:` param), and the roadmap (§A5, §4.6) already says the fiber
+scheduler should be **one of its `Io` vtables** and that `race` is **`context.io.race(..)` over Futures**.
+The just-landed `race` on `sched.Scheduler` over `*Task` is the proven LOGIC at the wrong LAYER.
+
+**Goal:** make the fiber `Scheduler` an `impl Io`, lift `async`/`await`/`cancel`/`race` onto the `Io`
+protocol so they run colorblind under either impl, and let cancellation fall out of the existing
+`suspend_raw -> !` contract (the "true cancellation, model A" the user picked — already the interface's
+design). One async stack, behind `context.io`.
+
+## The fiber → `Io` mapping (the crux)
+`Io :: protocol { spawn_raw, suspend_raw -> !, ready, poll, now_ms, arm_timer }` (core.sx). Map each onto
+the existing fiber primitives in sched.sx (`spawn`/`suspend_self`/`wake`/`sleep`/`block_on_fd`/`run`):
+
+| `Io` method | fiber realization |
+|---|---|
+| `spawn_raw(entry, arg, opts) -> *void` | `spawn` a fiber whose body invokes `entry(arg)` (raw C-ABI thunk, not a closure — see Bridge below). Returns the `*Fiber` as the opaque handle. |
+| `suspend_raw(park) -> !` | `suspend_self()`, then on resume CHECK the current task's cancel flag and `raise IoErr.Canceled` if set. `park.handle` = the `*Fiber` to re-ready. **This is the cancellation delivery point.** |
+| `ready(park)` | `wake(park.handle as *Fiber)` (already guarded on `.suspended`). |
+| `arm_timer(deadline_ms, park) -> *void` | arm a `Timer{deadline, fiber=park.handle}` (today's `sleep` minus the self-suspend); return the timer handle so a cancel can evict it. |
+| `poll(deadline_ms) -> i64` | ONE iteration of the `run` loop: drain ready, then fire the earliest timer / block on fds up to `deadline_ms`. Returns the next pending deadline (or sentinel when idle). |
+| `now_ms() -> i64` | the virtual `clock_ms` (deterministic), NOT a wall clock — keeps 1817/1821-style tests reproducible. |
+
+`Scheduler.run()` stays as the explicit DRIVER (the top-level loop that calls `poll` to quiescence),
+installed via `push Context { io = xx scheduler } { … s.run(); }` — exactly the existing sched examples,
+just with the scheduler now reachable as `context.io`.
+
+## Status (2026-06-27)
+- **Phase 0 — fibers inherit the spawn-time context. DONE** (`2f2d7f1d`). Discovered during Phase 1: a
+  fiber body ran under `__sx_default_context` (the `abi(.c)` `fib_dispatch` dropped the implicit
+  context), so a scheduler installed as `context.io` was invisible inside a worker. Fixed:
+  `Scheduler.spawn` snapshots `context` → `Fiber.dctx`; `fib_dispatch` re-pushes it. Behavior-preserving
+  (suite 828/0), no cross-fiber leak (context is parameter-threaded per stack). Lock: example 1822.
+- **Phase 1 — `impl Io for Scheduler`. DONE** (`5c30bfe0`, hardened `da7dd1f1`). Six methods over the
+  fiber primitives; `spawn_raw` bridges the erased `(*void)->void` worker thunk via an fn-ptr round-trip.
+  Lock: example 1823 (spawn→arm→suspend→ready→resume entirely through `context.io`, deterministic).
+  Adversarial review fixed: `arm_timer`/`spawn_raw` null guards, `poll` fd-pending abort + `deadline_ms`
+  doc, stale `fib_dispatch` comment.
+- **Resolved design decisions:** D1 = direct `impl Io for Scheduler` (chosen). D2 = `now_ms` returns the
+  virtual `clock_ms` (deterministic) — a real-clock variant is later. D4 = deferred to Phase 3.
+- **Open for Phase 2:**
+  - **ParkToken↔fiber binding.** `ready(park)` needs `park.handle` = the awaiter `*Fiber`. The scheduler
+    knows `self.current` at suspend; the cleanest is `suspend_raw(park: *ParkToken)` writing
+    `park.handle = self.current` before parking (a small protocol change: the materializer installs
+    thunks by name/order, signature-agnostic — verified low-risk). Decide vs a token→fiber registry.
+  - **`ready()` liveness (review CONCERN 6).** Casting a stale/reaped `*Fiber` handle and `wake`-ing it is
+    a latent UAF once real `await` runs — `wake`'s `.suspended` value-check on freed bytes is luck, not
+    safety. Phase 2 must guarantee single-ready / deregistration (mirror the bespoke-race deregister).
+- **Out-of-scope compiler bug found by review (not filed yet):** closure free-var analysis does not
+  descend into a nested `push Context {…}` block inside a closure body — a var used only there reports
+  `unresolved`. Phase 0 sidesteps it (capture is at the `Fiber` level, not via closure), so it does NOT
+  block the unification; worth an `issues/` entry in a separate session.
+
+## Phases (each: implement → lock with an example → `zig build test` green → both platforms)
+
+1. **`impl Io for Scheduler` (the vehicle).** Implement the six methods over the fiber primitives. Add
+   a `Fiber.canceled`/task back-ref so `suspend_raw` can raise on resume. Keep `CBlockingIo` intact.
+   Lock: install the fiber Io into `context.io`, run a root fiber that `suspend_raw`s and is `ready()`'d —
+   asserts real park/resume through the protocol (not inline). **Bridge** (the one fiddly bit): `async`'s
+   generic `Closure(..$args) -> $R` worker → `spawn_raw`'s raw `entry/arg`. Box the worker thunk on the
+   heap; `entry` is a C-ABI `(env: *void) -> void` invoke-thunk (mirrors `fib_dispatch`), `arg` is the env.
+
+2. **`async`/`await` over the fiber Io (real interleaving).** Under a suspending Io, `async` calls
+   `spawn_raw` and returns a PENDING `Future($R)` (no longer born `.ready`); the spawned body fills
+   `f.value`/`f.state` and `ready(f.park)`s the awaiter. `await(f)` checks `.ready` else `suspend_raw(f.park)`
+   then returns/raises — the suspending sibling of today's immediate `await`. `CBlockingIo` keeps the
+   run-inline path (degenerate, still correct). Lock: two `context.io.async` tasks interleave under the
+   fiber Io (the io.sx layer, replacing the bespoke `sched.go`).
+
+3. **True cancellation via `suspend_raw -> !`.** `cancel(f)` flips `f.canceled` AND `ready(f.park)`s /
+   wakes the worker fiber so its NEXT `suspend_raw` raises `IoErr.Canceled`. The worker's suspends
+   (`await`, a future `io.sleep`) propagate via `try`/`!`; the worker body unwinds, the future ends
+   `.canceled`, its post-cancel side-effects DON'T run. This is the model-A "true cancellation" — now
+   delivered through the protocol, not bespoke. Lock: a cancelled task's work stops at its next suspend
+   (assert via a shared log: the post-suspend line never prints).
+
+4. **`race` over Futures — `context.io.race((a: fa, b: fb))`.** Re-home the proven race logic (winner
+   scan, deregister-all-on-wake, structured cancel+join of losers) from `sched.race(*Task tuple)` onto
+   `*Future` handles + the `Io` protocol. The type-level machinery ports UNCHANGED — `RaceResult($T)`,
+   `make_variant`, the tuple reflection (GAP 1/2, all landed) — only the runtime swaps `*Task`→`*Future`
+   and `suspend_self`→`suspend_raw`/`ready`. Cancellation of losers now uses Phase 3 (their next suspend
+   raises), so `race` returns at WINNER-time, not slowest-loser-time. Lock: re-point 1821 at
+   `context.io.race`; assert winner value + losers' work stopped (not merely flagged).
+
+5. **Converge — retire the bespoke fiber async API.** Fold `sched.go`/`wait`/`cancel`/`race` into the
+   io.sx layer; `Scheduler` stays as the fiber Io's engine + driver. Migrate 1811–1821 to the
+   `context.io` API. One async stack, all behind the protocol. Update the roadmap/checkpoints.
+
+## Open decisions (need a call before/within the phase noted)
+- **D1 (Phase 1) — `impl Io for Scheduler` vs a `FiberIo` wrapper.** Direct impl makes `context.io` BE the
+  scheduler (`xx scheduler` as the Io value, stateful receiver — mirrors the allocator `xx local` rule).
+  A wrapper adds a level but decouples the public Io vtable from the scheduler internals. *Lean: direct
+  impl* (simplest, matches the allocator convention).
+- **D2 (Phase 1) — virtual vs real clock under the fiber Io.** Tests need the deterministic virtual clock
+  (`clock_ms`); a real deployment wants `time.mono_ms`. Thread it as a Scheduler mode, or two Io impls
+  (`FiberIo` virtual-clock for tests, real-clock for prod). *Lean: a `clock: enum { virtual; real }` field
+  so one impl serves both; tests pin `.virtual`.*
+- **D3 (Phase 2) — `Future(void)` (issue 0150 SIGTRAP).** A `void`-result task can't build `Future(void)`
+  today. Defer (race/async target non-void), or fix the `void` struct-field path. *Lean: defer, gate with
+  a diagnostic.*
+- **D4 (Phase 3) — where the cancel flag lives.** The `Future` already has `canceled: Atomic(bool)`; the
+  fiber needs to reach it from `suspend_raw`. Give `Fiber` a `*Atomic(bool)` back-ref to its future's flag
+  (set at `spawn_raw`), so `suspend_raw` consults it with no per-suspend lookup. *Lean: back-ref pointer.*
+
+## Validation (every phase)
+- `zig build && zig build test` green (full corpus).
+- New/changed `18xx` examples byte-identical on aarch64-macOS host AND aarch64-linux container
+  (deterministic virtual clock).
+- Adversarial review of each phase (worker + read-only reviewer), per the session workflow.
+
+## What this supersedes
+- `sched.sx`'s bespoke `go`/`wait`/`cancel`/`race` (Phase 5 retires them; the proven logic moves onto the
+  protocol). The just-landed `race` (commit `9099735e`) is the reference logic for Phase 4, not the final
+  home.
+- PLAN-RACE.md's "race on `sched.Scheduler`" framing — this plan moves it onto `context.io` per the
+  roadmap's §A5 / §4.6 design-of-record.
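
Reviewer sketch (not part of the patch): the Phase 3 claim that cancellation is delivered at the suspend point can be modeled outside sx in a few lines of Python. This is a minimal illustration under stated assumptions, not the `sched.sx` implementation: generators stand in for fibers, and `TinyScheduler`, `Fiber`, `Canceled`, `worker`, and `demo` are hypothetical names that only borrow the plan's vocabulary (`suspend_raw`, `ready`, `cancel`). As in the mapping table, the cancel flag is checked only on resume.

```python
from collections import deque


class Canceled(Exception):
    """Stand-in for IoErr.Canceled (hypothetical name for this sketch)."""


class Fiber:
    def __init__(self, gen):
        self.gen = gen          # generator standing in for a fiber stack
        self.canceled = False   # cancel flag consulted on resume
        self.state = "ready"    # ready | suspended | done | canceled


class TinyScheduler:
    def __init__(self):
        self.ready_q = deque()
        self.current = None

    def spawn(self, body, *args):
        fiber = Fiber(body(self, *args))
        self.ready_q.append(fiber)
        return fiber

    def suspend_raw(self):
        """Park the current fiber; on resume, deliver a pending cancel."""
        fiber = self.current
        fiber.state = "suspended"
        yield                   # hand control back to the driver loop
        if fiber.canceled:
            raise Canceled()    # the cancellation delivery point

    def ready(self, fiber):
        # Guarded on "suspended", so a double ready() is a no-op.
        if fiber.state == "suspended":
            fiber.state = "ready"
            self.ready_q.append(fiber)

    def cancel(self, fiber):
        # Only flips the flag and re-readies; nothing is raised here.
        fiber.canceled = True
        self.ready(fiber)

    def run(self):
        """Driver: resume ready fibers until none are left."""
        while self.ready_q:
            fiber = self.ready_q.popleft()
            self.current = fiber
            try:
                next(fiber.gen)
            except StopIteration:
                fiber.state = "done"
            except Canceled:
                fiber.state = "canceled"
            self.current = None


def worker(sched, log):
    log.append("worker: before suspend")
    yield from sched.suspend_raw()
    log.append("worker: after suspend")  # skipped when canceled


def demo():
    log = []
    sched = TinyScheduler()
    w = sched.spawn(worker, log)
    sched.run()       # worker runs to its first suspend and parks
    sched.cancel(w)   # flag + re-ready
    sched.run()       # resume: suspend_raw raises inside the worker
    return log, w.state
```

Calling `demo()` mirrors the Phase 3 lock: the shared log holds only the pre-suspend line and the fiber ends in the `canceled` state. Replacing `sched.cancel(w)` with `sched.ready(w)` lets the worker run to completion instead.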