PLAN-IO-UNIFY — fold the fiber scheduler behind context.io, re-home race
Why
Today there are two parallel async stacks:
| stack | behind context.io? | real suspension? | cancellation channel |
|---|---|---|---|
| io.sx async/await/cancel/Future | yes (impl Io for CBlockingIo) | no; runs the worker inline to completion | suspend_raw -> ! / IoErr.Canceled (designed, unused) |
| sched.sx go/wait/cancel/race (just landed) | no | yes (swap_context fibers) | none; suspend_self -> void |
context.io is structurally Zig's std.Io (an Io protocol carried implicitly in Context — better
ergonomics than Zig's explicit io: param), and the roadmap (§A5, §4.6) already says the fiber
scheduler should be one of its Io vtables and that race is context.io.race(..) over Futures.
The just-landed race on sched.Scheduler over *Task is the proven LOGIC at the wrong LAYER.
Goal: make the fiber Scheduler an impl Io, lift async/await/cancel/race onto the Io
protocol so they run colorblind under either impl, and let cancellation fall out of the existing
suspend_raw -> ! contract (the "true cancellation, model A" the user picked — already the interface's
design). One async stack, behind context.io.
The fiber → Io mapping (the crux)
Io :: protocol { spawn_raw, suspend_raw -> !, ready, poll, now_ms, arm_timer } (core.sx). Map each onto
the existing fiber primitives in sched.sx (spawn/suspend_self/wake/sleep/block_on_fd/run):
| Io method | fiber realization |
|---|---|
| spawn_raw(entry, arg, opts) -> *void | spawn a fiber whose body invokes entry(arg) (raw C-ABI thunk, not a closure; see Bridge below). Returns the *Fiber as the opaque handle. |
| suspend_raw(park) -> ! | suspend_self(), then on resume CHECK the current task's cancel flag and raise IoErr.Canceled if set. park.handle = the *Fiber to re-ready. This is the cancellation delivery point. |
| ready(park) | wake(park.handle as *Fiber) (already guarded on .suspended). |
| arm_timer(deadline_ms, park) -> *void | arm a Timer{deadline, fiber=park.handle} (today's sleep minus the self-suspend); return the timer handle so a cancel can evict it. |
| poll(deadline_ms) -> i64 | ONE iteration of the run loop: drain ready, then fire the earliest timer / block on fds up to deadline_ms. Returns the next pending deadline (or sentinel when idle). |
| now_ms() -> i64 | the virtual clock_ms (deterministic), NOT a wall clock, which keeps 1817/1821-style tests reproducible. |
Scheduler.run() stays as the explicit DRIVER (the top-level loop that calls poll to quiescence),
installed via push Context { io = xx scheduler } { … s.run(); } — exactly the existing sched examples,
just with the scheduler now reachable as context.io.
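The mapping above can be modeled in a few lines of Python, with generators standing in for fibers. This is a sketch of the control flow only, not sx code. It assumes that `yield park` plays the role of suspend_raw(park), that the driver performs the park.handle write-back, and that fd blocking and the opts parameter can be left out. The IDLE sentinel value, the class layout, and the sleeper helper are hypothetical names introduced for illustration.

```python
import heapq
from collections import deque

class Fiber:
    def __init__(self, gen):
        self.gen = gen
        self.state = "ready"            # ready | suspended | done

class ParkToken:
    def __init__(self):
        self.handle = None              # the Fiber to re-ready

class Scheduler:
    IDLE = -1                           # hypothetical "no pending deadline" sentinel

    def __init__(self):
        self.clock_ms = 0               # virtual clock, advanced only by timers
        self.ready_q = deque()
        self.timers = []                # min-heap of (deadline_ms, seq, park)
        self.seq = 0

    def spawn_raw(self, entry, arg):
        fiber = Fiber(entry(arg))       # the fiber body invokes entry(arg)
        self.ready_q.append(fiber)
        return fiber                    # opaque handle

    def ready(self, park):
        fiber = park.handle
        if fiber is not None and fiber.state == "suspended":
            fiber.state = "ready"
            self.ready_q.append(fiber)

    def arm_timer(self, deadline_ms, park):
        self.seq += 1
        timer = (deadline_ms, self.seq, park)
        heapq.heappush(self.timers, timer)
        return timer                    # handle a cancel could use to evict it

    def now_ms(self):
        return self.clock_ms

    def poll(self, deadline_ms):
        while self.ready_q:             # drain ready fibers
            fiber = self.ready_q.popleft()
            try:
                park = next(fiber.gen)  # runs until the fiber yields a ParkToken
            except StopIteration:
                fiber.state = "done"
            else:                       # suspend_raw: write back, then park
                park.handle = fiber
                fiber.state = "suspended"
        if self.timers and self.timers[0][0] <= deadline_ms:
            due, _, park = heapq.heappop(self.timers)
            self.clock_ms = max(self.clock_ms, due)
            self.ready(park)
        return self.timers[0][0] if self.timers else self.IDLE

    def run(self):                      # explicit driver: poll to quiescence
        while self.ready_q or self.timers:
            self.poll(float("inf"))

def sleeper(arg):
    io, name, delay_ms, log = arg
    park = ParkToken()
    io.arm_timer(io.now_ms() + delay_ms, park)
    yield park                          # suspend until the timer readies us
    log.append((name, io.now_ms()))

io, log = Scheduler(), []
io.spawn_raw(sleeper, (io, "a", 30, log))
io.spawn_raw(sleeper, (io, "b", 10, log))
io.run()
print(log)  # [('b', 10), ('a', 30)]
```

Because now_ms reads the virtual clock, the output order depends only on the armed deadlines, which is the property the deterministic tests rely on.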
Status (2026-06-27)
- Phase 0: fibers inherit the spawn-time context. DONE (2f2d7f1d). Discovered during Phase 1: a fiber body ran under __sx_default_context (the abi(.c) fib_dispatch dropped the implicit context), so a scheduler installed as context.io was invisible inside a worker. Fixed: Scheduler.spawn snapshots context → Fiber.dctx; fib_dispatch re-pushes it. Behavior-preserving (suite 828/0), no cross-fiber leak (context is parameter-threaded per stack). Lock: example 1822.
- Phase 1: impl Io for Scheduler. DONE (5c30bfe0, hardened da7dd1f1). Six methods over the fiber primitives; spawn_raw bridges the erased (*void)->void worker thunk via an fn-ptr round-trip. Lock: example 1823 (spawn→arm→suspend→ready→resume entirely through context.io, deterministic). Adversarial review fixed: arm_timer/spawn_raw null guards, poll fd-pending abort + deadline_ms doc, stale fib_dispatch comment.
- Resolved design decisions: D1 = direct impl Io for Scheduler (chosen). D2 = now_ms returns the virtual clock_ms (deterministic); a real-clock variant is later. D4 = deferred to Phase 3.
- Phase 2: async/await colorblind over the fiber Io. DONE (967aed67, hardened ada8d162). async heap-allocs a *Future, boxes a completion closure in a monomorphic ThunkBox, and submits via io.spawn_raw (inline under CBlockingIo, a fiber under the scheduler); await parks via suspend_raw until ready. Protocol changed to suspend_raw(park: *ParkToken) (write-back of the awaiter). Workers are nullary (call-site capture). Migrated 1805/1806; adopted push .{ … }. Lock: example 1824 (deferral visible: 1 2 10 20 123). Review fixed: one-awaiter await guard; documented the Future allocator-lifetime contract + that cancel doesn't stop an already-spawned worker (Phase 3).
  - Resolved D2 (ParkToken): suspend_raw(*ParkToken) write-back (chosen over a registry). ready() liveness (CONCERN 6): safe for single async/await (awaiter is suspended, not reaped, when readied); race fan-in must still deregister (Phase 4).
  - Carried to convergence: async should capture the scheduler's long-lived allocator (like sched.go's own_allocator) instead of the call-site context.allocator; this needs a protocol affordance and is documented as a contract for now.
- Open for later phases:
  - ParkToken↔fiber binding. ready(park) needs park.handle = the awaiter *Fiber. The scheduler knows self.current at suspend; the cleanest is suspend_raw(park: *ParkToken) writing park.handle = self.current before parking (a small protocol change: the materializer installs thunks by name/order, signature-agnostic, verified low-risk). Decide vs a token→fiber registry.
  - ready() liveness (review CONCERN 6). Casting a stale/reaped *Fiber handle and wake-ing it is a latent UAF once real await runs: wake's .suspended value-check on freed bytes is luck, not safety. Phase 2 must guarantee single-ready / deregistration (mirror the bespoke-race deregister).
- Out-of-scope compiler bug found by review (not filed yet): closure free-var analysis does not descend into a nested push Context {…} block inside a closure body; a var used only there reports unresolved. Phase 0 sidesteps it (capture is at the Fiber level, not via closure), so it does NOT block the unification; worth an issues/ entry in a separate session.
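The Phase 0 fix (snapshot the context at spawn, re-push it at dispatch) can be illustrated with a small Python model. It is a sketch under stated assumptions: the implicit context is modeled as an explicit stack, the C-ABI entry point is modeled by pushing the default context, and the `inherit` switch, `context()`, `push()` and `io_seen_by_worker` are hypothetical names that exist only to contrast the behavior before and after the fix.

```python
from contextlib import contextmanager

DEFAULT_CONTEXT = {"io": "CBlockingIo"}   # stands in for __sx_default_context
_stack = [DEFAULT_CONTEXT]

def context():
    return _stack[-1]

@contextmanager
def push(ctx):
    _stack.append(ctx)
    try:
        yield
    finally:
        _stack.pop()

class Fiber:
    def __init__(self, body, dctx):
        self.body = body
        self.dctx = dctx                  # context snapshot taken at spawn time

class Scheduler:
    def __init__(self, inherit):
        self.inherit = inherit            # False models the pre-fix behavior
        self.fibers = []

    def spawn(self, body):
        self.fibers.append(Fiber(body, context()))

    def dispatch(self, fiber):
        with push(DEFAULT_CONTEXT):       # the C-ABI entry drops the implicit context
            if self.inherit:
                with push(fiber.dctx):    # the fix: re-push the snapshot
                    fiber.body()
            else:
                fiber.body()

    def run(self):
        while self.fibers:
            self.dispatch(self.fibers.pop(0))

def io_seen_by_worker(inherit):
    seen = []
    sched = Scheduler(inherit)
    with push({"io": sched}):
        sched.spawn(lambda: seen.append(context()["io"]))
        sched.run()
    return seen[0], sched

before, _ = io_seen_by_worker(inherit=False)
after, sched = io_seen_by_worker(inherit=True)
print(before)            # CBlockingIo
print(after is sched)    # True
```

Without the re-push the worker falls back to the default Io even though the scheduler was installed around the run call; with it, the worker sees the scheduler that spawned it.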
Phases (each: implement → lock with an example → zig build test green → both platforms)
1. impl Io for Scheduler (the vehicle). Implement the six methods over the fiber primitives. Add a Fiber.canceled/task back-ref so suspend_raw can raise on resume. Keep CBlockingIo intact. Lock: install the fiber Io into context.io, run a root fiber that suspend_raws and is ready()'d; this asserts real park/resume through the protocol (not inline). Bridge (the one fiddly bit): async's generic Closure(..$args) -> $R worker → spawn_raw's raw entry/arg. Box the worker thunk on the heap; entry is a C-ABI (env: *void) -> void invoke-thunk (mirrors fib_dispatch), arg is the env.
2. async/await over the fiber Io (real interleaving). Under a suspending Io, async calls spawn_raw and returns a PENDING Future($R) (no longer born .ready); the spawned body fills f.value/f.state and ready(f.park)s the awaiter. await(f) checks .ready else suspend_raw(f.park) then returns/raises; it is the suspending sibling of today's immediate await. CBlockingIo keeps the run-inline path (degenerate, still correct). Lock: two context.io.async tasks interleave under the fiber Io (the io.sx layer, replacing the bespoke sched.go).
3. True cancellation via suspend_raw -> !. cancel(f) flips f.canceled AND ready(f.park)s / wakes the worker fiber so its NEXT suspend_raw raises IoErr.Canceled. The worker's suspends (await, a future io.sleep) propagate via try/!; the worker body unwinds, the future ends .canceled, its post-cancel side-effects DON'T run. This is the model-A "true cancellation", now delivered through the protocol, not bespoke. Lock: a cancelled task's work stops at its next suspend (assert via a shared log: the post-suspend line never prints).
4. race over Futures: context.io.race((a: fa, b: fb)). Re-home the proven race logic (winner scan, deregister-all-on-wake, structured cancel+join of losers) from sched.race(*Task tuple) onto *Future handles + the Io protocol. The type-level machinery ports UNCHANGED (RaceResult($T), make_variant, the tuple reflection (GAP 1/2, all landed)); only the runtime swaps *Task → *Future and suspend_self → suspend_raw/ready. Cancellation of losers now uses Phase 3 (their next suspend raises), so race returns at WINNER-time, not slowest-loser-time. Lock: re-point 1821 at context.io.race; assert winner value + losers' work stopped (not merely flagged).
5. Converge: retire the bespoke fiber async API. Fold sched.go/wait/cancel/race into the io.sx layer; Scheduler stays as the fiber Io's engine + driver. Migrate 1811–1821 to the context.io API. One async stack, all behind the protocol. Update the roadmap/checkpoints.
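The runtime behavior that Phases 3 and 4 aim for (a cancel that raises at the loser's next suspend, and a race that scans for a winner, deregisters on wake, then cancels and joins the losers) can be modeled in Python with generators as fibers. This is an illustrative sketch, not the sx implementation: it assumes one fiber per future (the model merges them into a single Future object), a dict in place of the named tuple, and a plain (name, value) pair in place of RaceResult. FiberIo, step, worker and root are hypothetical names.

```python
from collections import deque

class Canceled(Exception):              # stands in for IoErr.Canceled
    pass

class Future:                           # model merges the worker fiber and its Future
    def __init__(self, gen):
        self.gen = gen
        self.state, self.value = "pending", None   # pending | ready | canceled
        self.canceled = False           # cancel flag, consulted on resume
        self.parked = False
        self.waiter = None              # the single fiber parked on this future

class FiberIo:
    def __init__(self):
        self.ready_q = deque()
        self.timers = []                # (deadline_ms, future) on a virtual clock
        self.clock_ms, self.current = 0, None

    def spawn(self, body, *args):
        f = Future(body(self, *args))
        self.ready_q.append(f)
        return f

    def wake(self, f):
        if f.parked:                    # only a parked fiber is re-readied
            f.parked = False
            self.ready_q.append(f)

    def cancel(self, f):
        if f.state == "pending":
            f.canceled = True
            self.timers = [t for t in self.timers if t[1] is not f]  # evict timer
            self.wake(f)                # its next resume raises Canceled

    def sleep(self, ms):
        self.timers.append((self.clock_ms + ms, self.current))
        yield                           # park until the timer or a cancel wakes us

    def race(self, futures):
        me, winner = self.current, None
        while winner is None:
            winner = next(((name, f.value) for name, f in futures.items()
                           if f.state == "ready"), None)   # winner scan
            if winner is None:
                for f in futures.values():
                    f.waiter = me
                yield                   # park on every racer
                for f in futures.values():
                    f.waiter = None     # deregister all on wake
        for f in futures.values():
            self.cancel(f)              # no-op for settled futures
        for f in futures.values():      # join the losers
            while f.state == "pending":
                f.waiter = me
                yield
                f.waiter = None
        return winner

    def step(self, f):
        self.current = f
        try:
            if f.canceled:
                f.gen.throw(Canceled()) # the suspend point raises on resume
            else:
                next(f.gen)
            f.parked = True             # the fiber yielded, so it is parked
        except StopIteration as done:
            f.state, f.value = "ready", done.value
        except Canceled:
            f.state = "canceled"
        if f.state != "pending" and f.waiter is not None:
            self.wake(f.waiter)

    def run(self):
        while self.ready_q or self.timers:
            while self.ready_q:
                self.step(self.ready_q.popleft())
            if self.timers:
                due = min(self.timers, key=lambda t: t[0])
                self.timers.remove(due)
                self.clock_ms = due[0]
                self.wake(due[1])

def worker(io, ms, value, log):
    yield from io.sleep(ms)
    log.append(value)                   # post-suspend side effect
    return value

def root(io, log):
    fa = io.spawn(worker, 50, "slow", log)
    fb = io.spawn(worker, 10, "fast", log)
    winner = yield from io.race({"a": fa, "b": fb})
    log.append((winner, io.clock_ms, fa.state))

io, log = FiberIo(), []
io.spawn(root, log)
io.run()
print(log)  # ['fast', (('b', 'fast'), 10, 'canceled')]
```

The log shows the two properties the locks ask for: "slow" is never appended, so the loser's post-suspend work did not run, and the race returns at virtual time 10 rather than 50.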
Open decisions (need a call before/within the phase noted)
- D1 (Phase 1): impl Io for Scheduler vs a FiberIo wrapper. Direct impl makes context.io BE the scheduler (xx scheduler as the Io value, stateful receiver; mirrors the allocator xx local rule). A wrapper adds a level but decouples the public Io vtable from the scheduler internals. Lean: direct impl (simplest, matches the allocator convention).
- D2 (Phase 1): virtual vs real clock under the fiber Io. Tests need the deterministic virtual clock (clock_ms); a real deployment wants time.mono_ms. Thread it as a Scheduler mode, or two Io impls (FiberIo virtual-clock for tests, real-clock for prod). Lean: a clock: enum { virtual; real } field so one impl serves both; tests pin .virtual.
- D3 (Phase 2): Future(void) (issue 0150 SIGTRAP). A void-result task can't build Future(void) today. Defer (race/async target non-void), or fix the void struct-field path. Lean: defer, gate with a diagnostic.
- D4 (Phase 3): where the cancel flag lives. The Future already has canceled: Atomic(bool); the fiber needs to reach it from suspend_raw. Give Fiber a *Atomic(bool) back-ref to its future's flag (set at spawn_raw), so suspend_raw consults it with no per-suspend lookup. Lean: back-ref pointer.
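If the D2 lean (one impl with a clock mode field) is taken up for a later real-clock variant, the shape might look roughly like the Python sketch below. It is only a model of the idea: `Clock`, `wait_until` and the use of Python's monotonic clock as a stand-in for time.mono_ms are assumptions, not names from the codebase.

```python
import enum
import time

class Clock(enum.Enum):
    VIRTUAL = "virtual"
    REAL = "real"

class Scheduler:
    def __init__(self, clock=Clock.VIRTUAL):
        self.clock = clock
        self.clock_ms = 0                 # only meaningful in VIRTUAL mode

    def now_ms(self):
        if self.clock is Clock.VIRTUAL:
            return self.clock_ms
        return time.monotonic_ns() // 1_000_000   # stand-in for time.mono_ms

    def wait_until(self, deadline_ms):
        """Hypothetical hook the run loop would call when only timers are pending."""
        if self.clock is Clock.VIRTUAL:
            self.clock_ms = max(self.clock_ms, deadline_ms)   # jump, no real wait
        else:
            remaining_ms = deadline_ms - self.now_ms()
            if remaining_ms > 0:
                time.sleep(remaining_ms / 1000)

test_io = Scheduler(Clock.VIRTUAL)
test_io.wait_until(250)
print(test_io.now_ms())  # 250
```

Tests would pin the virtual mode, so advancing to a deadline is a plain assignment and the example output stays byte-identical across platforms.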
Validation (every phase)
- zig build && zig build test green (full corpus).
- New/changed 18xx examples byte-identical on aarch64-macOS host AND aarch64-linux container (deterministic virtual clock).
- Adversarial review of each phase (worker + read-only reviewer), per the session workflow.
What this supersedes
- sched.sx's bespoke go/wait/cancel/race (Phase 5 retires them; the proven logic moves onto the protocol). The just-landed race (commit 9099735e) is the reference logic for Phase 4, not the final home.
- PLAN-RACE.md's "race on sched.Scheduler" framing: this plan moves it onto context.io per the roadmap's §A5 / §4.6 design-of-record.