docs: PLAN-IO-UNIFY status — Phase 0 + Phase 1 done, Phase 2 open items
current/PLAN-IO-UNIFY.md
# PLAN-IO-UNIFY — fold the fiber scheduler behind `context.io`, re-home `race`
## Why
Today there are **two parallel async stacks**:

| stack | behind `context.io`? | real suspension? | cancellation channel |
|---|---|---|---|
| io.sx `async`/`await`/`cancel`/`Future` | yes (`impl Io for CBlockingIo`) | **no**: runs the worker inline to completion | `suspend_raw -> !` / `IoErr.Canceled` (designed, unused) |
| sched.sx `go`/`wait`/`cancel`/`race` (just landed) | **no** | yes (`swap_context` fibers) | none: `suspend_self -> void` |

`context.io` is structurally Zig's `std.Io` (an `Io` protocol carried *implicitly* in `Context`, which is
better ergonomics than Zig's explicit `io:` param), and the roadmap (§A5, §4.6) already says the fiber
scheduler should be **one of its `Io` vtables** and that `race` is **`context.io.race(..)` over Futures**.
The just-landed `race` on `sched.Scheduler` over `*Task` is the proven LOGIC at the wrong LAYER.

**Goal:** make the fiber `Scheduler` an `impl Io`, lift `async`/`await`/`cancel`/`race` onto the `Io`
protocol so they run colorblind under either impl, and let cancellation fall out of the existing
`suspend_raw -> !` contract (the "true cancellation, model A" the user picked, which is already the
interface's design). One async stack, behind `context.io`.

## The fiber → `Io` mapping (the crux)
`Io :: protocol { spawn_raw, suspend_raw -> !, ready, poll, now_ms, arm_timer }` (core.sx). Map each onto
the existing fiber primitives in sched.sx (`spawn`/`suspend_self`/`wake`/`sleep`/`block_on_fd`/`run`):

| `Io` method | fiber realization |
|---|---|
| `spawn_raw(entry, arg, opts) -> *void` | `spawn` a fiber whose body invokes `entry(arg)` (raw C-ABI thunk, not a closure; see Bridge below). Returns the `*Fiber` as the opaque handle. |
| `suspend_raw(park) -> !` | `suspend_self()`, then on resume CHECK the current task's cancel flag and `raise IoErr.Canceled` if set. `park.handle` = the `*Fiber` to re-ready. **This is the cancellation delivery point.** |
| `ready(park)` | `wake(park.handle as *Fiber)` (already guarded on `.suspended`). |
| `arm_timer(deadline_ms, park) -> *void` | arm a `Timer{deadline, fiber=park.handle}` (today's `sleep` minus the self-suspend); return the timer handle so a cancel can evict it. |
| `poll(deadline_ms) -> i64` | ONE iteration of the `run` loop: drain ready, then fire the earliest timer / block on fds up to `deadline_ms`. Returns the next pending deadline (or a sentinel when idle). |
| `now_ms() -> i64` | the virtual `clock_ms` (deterministic), NOT a wall clock; keeps 1817/1821-style tests reproducible. |

`Scheduler.run()` stays as the explicit DRIVER (the top-level loop that calls `poll` to quiescence),
installed via `push Context { io = xx scheduler } { … s.run(); }`, exactly as in the existing sched
examples, just with the scheduler now reachable as `context.io`.

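The mapping above can be modelled roughly in Python, with generators standing in for fibers and `yield` standing in for `suspend_raw`. This is only a sketch under stated assumptions, not the sx implementation: the `Fiber` and `Scheduler` classes, the `IDLE` sentinel value, the `sleeper` worker, and the omission of `opts`, fds, and park tokens are all simplifications introduced here for illustration.

```python
import heapq
from collections import deque

class Fiber:
    """Hypothetical stand-in for *Fiber: a generator plus a run state."""
    def __init__(self):
        self.gen = None
        self.state = "ready"  # ready | suspended | done

class Scheduler:
    IDLE = -1  # assumed sentinel: no timers armed and nothing ready

    def __init__(self):
        self.clock_ms = 0       # virtual clock, advanced only by fired timers
        self.ready_q = deque()
        self.timers = []        # min-heap of (deadline_ms, seq, fiber)
        self.seq = 0            # tie-breaker so fibers are never compared

    def now_ms(self):
        return self.clock_ms

    def spawn_raw(self, entry, arg):
        fiber = Fiber()
        fiber.gen = entry(self, fiber, arg)  # body starts on the first poll
        self.ready_q.append(fiber)
        return fiber                         # opaque handle

    def ready(self, fiber):
        if fiber.state == "suspended":       # guard: ignore double-ready
            fiber.state = "ready"
            self.ready_q.append(fiber)

    def arm_timer(self, deadline_ms, fiber):
        self.seq += 1
        timer = (deadline_ms, self.seq, fiber)
        heapq.heappush(self.timers, timer)
        return timer

    def poll(self, deadline_ms):
        while self.ready_q:                  # 1) drain ready fibers
            fiber = self.ready_q.popleft()
            try:
                next(fiber.gen)              # run to the next `yield`
                fiber.state = "suspended"
            except StopIteration:
                fiber.state = "done"
        if self.timers and self.timers[0][0] <= deadline_ms:
            due, _, fiber = heapq.heappop(self.timers)  # 2) earliest timer
            self.clock_ms = max(self.clock_ms, due)
            self.ready(fiber)
        if self.timers:
            return self.timers[0][0]         # next pending deadline
        return self.clock_ms if self.ready_q else self.IDLE

    def run(self):
        # The explicit driver: call poll until nothing is ready or armed.
        while self.ready_q or self.timers:
            self.poll(float("inf"))

log = []

def sleeper(io, me, arg):
    name, delay_ms = arg
    io.arm_timer(io.now_ms() + delay_ms, me)
    yield                                    # plays the role of suspend_raw
    log.append((name, io.now_ms()))

sched = Scheduler()
sched.spawn_raw(sleeper, ("a", 30))
sched.spawn_raw(sleeper, ("b", 10))
sched.run()
print(log)  # [('b', 10), ('a', 30)]
```

Because the clock only moves when a timer fires, the wake order and timestamps are the same on every run, which is the property the virtual `clock_ms` is meant to preserve.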
## Status (2026-06-27)
- **Phase 0: fibers inherit the spawn-time context. DONE** (`2f2d7f1d`). Discovered during Phase 1: a
  fiber body ran under `__sx_default_context` (the `abi(.c)` `fib_dispatch` dropped the implicit
  context), so a scheduler installed as `context.io` was invisible inside a worker. Fixed:
  `Scheduler.spawn` snapshots `context` → `Fiber.dctx`; `fib_dispatch` re-pushes it. Behavior-preserving
  (suite 828/0), no cross-fiber leak (context is parameter-threaded per stack). Lock: example 1822.
- **Phase 1: `impl Io for Scheduler`. DONE** (`5c30bfe0`, hardened `da7dd1f1`). Six methods over the
  fiber primitives; `spawn_raw` bridges the erased `(*void)->void` worker thunk via an fn-ptr round-trip.
  Lock: example 1823 (spawn→arm→suspend→ready→resume entirely through `context.io`, deterministic).
  Adversarial review fixed: `arm_timer`/`spawn_raw` null guards, `poll` fd-pending abort + `deadline_ms`
  doc, stale `fib_dispatch` comment.
- **Resolved design decisions:** D1 = direct `impl Io for Scheduler` (chosen). D2 = `now_ms` returns the
  virtual `clock_ms` (deterministic); a real-clock variant comes later. D4 = deferred to Phase 3.
- **Open for Phase 2:**
  - **ParkToken↔fiber binding.** `ready(park)` needs `park.handle` = the awaiter `*Fiber`. The scheduler
    knows `self.current` at suspend; the cleanest option is for `suspend_raw(park: *ParkToken)` to write
    `park.handle = self.current` before parking (a small protocol change: the materializer installs
    thunks by name/order, signature-agnostic; verified low-risk). Decide between this and a
    token→fiber registry.
  - **`ready()` liveness (review CONCERN 6).** Casting a stale/reaped `*Fiber` handle and `wake`-ing it is
    a latent use-after-free (UAF) once real `await` runs: `wake`'s `.suspended` value-check on freed
    bytes is luck, not safety. Phase 2 must guarantee single-ready / deregistration (mirror the
    bespoke-race deregister).
- **Out-of-scope compiler bug found by review (not filed yet):** closure free-var analysis does not
  descend into a nested `push Context {…}` block inside a closure body, so a var used only there reports
  `unresolved`. Phase 0 sidesteps it (capture is at the `Fiber` level, not via closure), so it does NOT
  block the unification; worth an `issues/` entry in a separate session.

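For the `ready()` liveness concern, one possible shape of a single-ready guarantee is a generation-checked handle. The sketch below is in Python and is not what the plan has decided: the slot table, the `(index, generation)` handle, and every name in it are hypothetical, shown only to make the stale-handle hazard concrete.

```python
class FiberTable:
    """Hypothetical fiber registry; handles are (slot index, generation)."""

    def __init__(self):
        self.slots = []    # each slot is [generation, state]
        self.free = []     # indices of reaped slots available for reuse
        self.ready_q = []

    def alloc(self):
        if self.free:
            idx = self.free.pop()
            self.slots[idx][1] = "suspended"
        else:
            idx = len(self.slots)
            self.slots.append([0, "suspended"])
        return (idx, self.slots[idx][0])

    def reap(self, handle):
        idx, gen = handle
        if self.slots[idx][0] != gen:
            return                   # already reaped; nothing to do
        self.slots[idx][0] += 1      # invalidates every outstanding handle
        self.slots[idx][1] = "free"
        self.free.append(idx)

    def ready(self, handle):
        idx, gen = handle
        slot = self.slots[idx]
        if slot[0] != gen or slot[1] != "suspended":
            return False             # stale handle or already readied
        slot[1] = "ready"
        self.ready_q.append(handle)
        return True

table = FiberTable()
h1 = table.alloc()
first = table.ready(h1)    # accepted
second = table.ready(h1)   # rejected: single-ready
table.reap(h1)
h2 = table.alloc()         # reuses the slot under a new generation
stale = table.ready(h1)    # rejected: generation mismatch, no freed memory touched
fresh = table.ready(h2)    # accepted
```

The check never dereferences freed fiber memory, which is the difference from relying on `wake`'s `.suspended` test. The cost is an extra indirection per `ready`, so it is a trade-off to weigh against explicit deregistration.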
## Phases (each: implement → lock with an example → `zig build test` green → both platforms)
1. **`impl Io for Scheduler` (the vehicle).** Implement the six methods over the fiber primitives. Add
   a `Fiber.canceled`/task back-ref so `suspend_raw` can raise on resume. Keep `CBlockingIo` intact.
   Lock: install the fiber Io into `context.io`, then run a root fiber that `suspend_raw`s and is
   `ready()`'d; this asserts real park/resume through the protocol (not inline). **Bridge** (the one
   fiddly bit): adapt `async`'s generic `Closure(..$args) -> $R` worker to `spawn_raw`'s raw
   `entry`/`arg`. Box the worker thunk on the heap; `entry` is a C-ABI `(env: *void) -> void`
   invoke-thunk (mirrors `fib_dispatch`), and `arg` is the env.

2. **`async`/`await` over the fiber Io (real interleaving).** Under a suspending Io, `async` calls
   `spawn_raw` and returns a PENDING `Future($R)` (no longer born `.ready`); the spawned body fills
   `f.value`/`f.state` and `ready(f.park)`s the awaiter. `await(f)` returns at once if the future is
   `.ready`; otherwise it calls `suspend_raw(f.park)` and then returns or raises. It is the suspending
   sibling of today's immediate `await`. `CBlockingIo` keeps the run-inline path (degenerate, still
   correct). Lock: two `context.io.async` tasks interleave under the fiber Io (the io.sx layer,
   replacing the bespoke `sched.go`).

3. **True cancellation via `suspend_raw -> !`.** `cancel(f)` flips `f.canceled` AND `ready(f.park)`s /
   wakes the worker fiber so its NEXT `suspend_raw` raises `IoErr.Canceled`. The worker's suspend points
   (`await`, a future `io.sleep`) propagate the error via `try`/`!`; the worker body unwinds, the future
   ends `.canceled`, and its post-cancel side-effects DON'T run. This is the model-A "true
   cancellation", now delivered through the protocol, not bespoke. Lock: a cancelled task's work stops
   at its next suspend (assert via a shared log: the post-suspend line never prints).

4. **`race` over Futures: `context.io.race((a: fa, b: fb))`.** Re-home the proven race logic (winner
   scan, deregister-all-on-wake, structured cancel+join of losers) from `sched.race(*Task tuple)` onto
   `*Future` handles + the `Io` protocol. The type-level machinery ports UNCHANGED: `RaceResult($T)`,
   `make_variant`, and the tuple reflection (GAP 1/2, all landed). Only the runtime swaps
   `*Task`→`*Future` and `suspend_self`→`suspend_raw`/`ready`. Cancellation of losers now uses Phase 3
   (their next suspend raises), so `race` returns at WINNER-time, not slowest-loser-time. Lock:
   re-point 1821 at `context.io.race`; assert winner value + losers' work stopped (not merely flagged).

5. **Converge: retire the bespoke fiber async API.** Fold `sched.go`/`wait`/`cancel`/`race` into the
   io.sx layer; `Scheduler` stays as the fiber Io's engine + driver. Migrate 1811–1821 to the
   `context.io` API. One async stack, all behind the protocol. Update the roadmap/checkpoints.

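Phases 2 and 3 can be illustrated with a small Python model in which generators play the fibers. It is a sketch under loud assumptions, not the io.sx code: `FiberIo`, `async_`, `await_`, and the `worker`/`parent` functions are names invented here, and the run loop re-readies every parked task each round instead of using park tokens. What it does preserve is the core contract: a future is born pending, `await` suspends, and the cancel flag is checked on resume inside `suspend_raw`, so a cancelled task stops at its next suspend.

```python
from collections import deque

class Canceled(Exception):
    """Stands in for IoErr.Canceled."""

class Future:
    def __init__(self):
        self.state = "pending"  # pending | ready | canceled
        self.value = None
        self.canceled = False   # the cancel flag consulted on resume

class FiberIo:
    def __init__(self):
        self.ready_q = deque()  # (future, generator) pairs

    def async_(self, worker, *args):
        fut = Future()                       # born PENDING, not ready
        self.ready_q.append((fut, worker(self, fut, *args)))
        return fut

    def suspend_raw(self, me):
        yield                                # park
        if me.canceled:                      # checked on resume
            raise Canceled()

    def await_(self, me, target):
        while target.state == "pending":
            yield from self.suspend_raw(me)
        if target.state == "canceled":
            raise Canceled()
        return target.value

    def cancel(self, fut):
        fut.canceled = True                  # delivered at the next suspend

    def run(self):
        while self.ready_q:
            fut, gen = self.ready_q.popleft()
            try:
                next(gen)
                self.ready_q.append((fut, gen))  # simplification: re-ready at once
            except StopIteration as done:
                fut.value, fut.state = done.value, "ready"
            except Canceled:
                fut.state = "canceled"       # body unwound; later steps never ran

log = []

def worker(io, me, name, steps):
    for i in range(steps):
        log.append(f"{name}{i}")
        yield from io.suspend_raw(me)
    return name

def parent(io, me, first, other):
    got = yield from io.await_(me, first)    # suspends until `first` settles
    io.cancel(other)
    return got

io = FiberIo()
fa = io.async_(worker, "a", 2)
fb = io.async_(worker, "b", 4)
fp = io.async_(parent, fa, fb)
io.run()
print(log, fa.state, fb.state, fp.value)
# ['a0', 'b0', 'a1', 'b1', 'b2'] ready canceled a
```

The log shows `a` and `b` interleaving, and `b3` never appears: `b` was cancelled while parked and unwound at its next resume, which is the shared-log assertion the Phase 3 lock describes.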
## Open decisions (need a call before/within the phase noted)
- **D1 (Phase 1): `impl Io for Scheduler` vs a `FiberIo` wrapper.** Direct impl makes `context.io` BE the
  scheduler (`xx scheduler` as the Io value, stateful receiver; mirrors the allocator `xx local` rule).
  A wrapper adds a level of indirection but decouples the public Io vtable from the scheduler
  internals. *Lean: direct impl* (simplest, matches the allocator convention).
- **D2 (Phase 1): virtual vs real clock under the fiber Io.** Tests need the deterministic virtual clock
  (`clock_ms`); a real deployment wants `time.mono_ms`. Either thread it as a Scheduler mode, or provide
  two Io impls (`FiberIo` virtual-clock for tests, real-clock for prod). *Lean: a
  `clock: enum { virtual; real }` field so one impl serves both; tests pin `.virtual`.*
- **D3 (Phase 2): `Future(void)` (issue 0150 SIGTRAP).** A `void`-result task can't build `Future(void)`
  today. Either defer (race/async target non-void results) or fix the `void` struct-field path.
  *Lean: defer, gate with a diagnostic.*
- **D4 (Phase 3): where the cancel flag lives.** The `Future` already has `canceled: Atomic(bool)`; the
  fiber needs to reach it from `suspend_raw`. Give `Fiber` a `*Atomic(bool)` back-ref to its future's flag
  (set at `spawn_raw`), so `suspend_raw` consults it with no per-suspend lookup. *Lean: back-ref pointer.*

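The D2 lean (one impl with a clock-mode field) might look like the following in Python. This is an illustration only: `Clock`, `SchedulerClock`, and `advance_to` are hypothetical names, and Python's `time.monotonic_ns` stands in for `time.mono_ms`.

```python
import enum
import time

class Clock(enum.Enum):
    VIRTUAL = "virtual"
    REAL = "real"

class SchedulerClock:
    """Hypothetical clock component shared by one scheduler impl."""

    def __init__(self, mode):
        self.mode = mode
        self.clock_ms = 0  # only meaningful in VIRTUAL mode

    def now_ms(self):
        if self.mode is Clock.VIRTUAL:
            return self.clock_ms                 # deterministic
        return time.monotonic_ns() // 1_000_000  # monotonic wall time

    def advance_to(self, deadline_ms):
        # Called when a timer fires: jump the virtual clock forward.
        # Never moves backwards; a no-op in REAL mode.
        if self.mode is Clock.VIRTUAL:
            self.clock_ms = max(self.clock_ms, deadline_ms)

virtual = SchedulerClock(Clock.VIRTUAL)
virtual.advance_to(250)
virtual.advance_to(100)          # earlier deadline: clock stays at 250
real = SchedulerClock(Clock.REAL)
t0 = real.now_ms()
t1 = real.now_ms()
```

Tests would construct the scheduler with the virtual mode pinned, so timestamps in example output do not depend on the host.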
## Validation (every phase)
- `zig build && zig build test` green (full corpus).
- New/changed `18xx` examples byte-identical on the aarch64-macOS host AND the aarch64-linux container
  (deterministic virtual clock).
- Adversarial review of each phase (worker + read-only reviewer), per the session workflow.

## What this supersedes
- `sched.sx`'s bespoke `go`/`wait`/`cancel`/`race` (Phase 5 retires them; the proven logic moves onto the
  protocol). The just-landed `race` (commit `9099735e`) is the reference logic for Phase 4, not the final
  home.
- PLAN-RACE.md's "race on `sched.Scheduler`" framing: this plan moves it onto `context.io` per the
  roadmap's §A5 / §4.6 design-of-record.