feat: true cancellation for the fiber Io layer (PLAN-IO-UNIFY Phase 3)
A cancelled async worker now abandons its body at its next suspend instead
of running to completion.
- Cancel-flag back-ref (D4): SpawnOpts.cancel_flag (core.sx) + Fiber.cancel_flag
(sched.sx), set from opts.cancel_flag in Scheduler.spawn_raw; async passes
xx @f.canceled (the Future.canceled Atomic(bool) erased to *void).
- Delivery: Scheduler.suspend_raw consults fiber_canceled(self.current) PRE-park
(raise without parking — no deadlock if cancel landed before the worker ran)
and POST-resume (cancel landed while parked), raising error.Canceled.
cancel(f) flips the sticky flag, marks .canceled, and wakes the worker.
- async worker is failable Closure() -> ($R, !); the completion closure
f.value = worker() catch {…} marks .canceled/.failed and wakes the awaiter,
so post-suspend side effects never run. New failable io.sleep(ms) is the
cancellation point.
- Compiler: a -> ! fn whose only error source is try-ing a protocol method
(io.suspend_raw) was wrongly flagged 'declared ! but never errors';
collectErrorSites now marks a try of a non-identifier callee as a dynamic
(opaque) error source, suppressing the warning.
- Two UAFs found by adversarial review and fixed: (1) cancel-before-park
orphaned io.sleep's armed timer — suspend_raw's pre-park raise now evicts the
current fiber's timer/waiter first; (2) cancel(f) could wake a reaped worker —
now only wakes when was_pending.
Migrated 1805/1806/1824 to failable workers. Lock: example 1825 (seq: 1 -99,
post-suspend line never runs); byte-identical on aarch64-macOS + aarch64-linux.
.ir churn is the SpawnOpts layout change (type-table string renumbering).
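As a rough illustration of the delivery rule described above (check the cancel flag before parking and again after resuming, evict any armed timer before the pre-park raise, and only wake a worker that was still pending), here is a minimal Python model. It is not the sx implementation: generators stand in for fibers, `yield` stands in for parking, and every name in it (`Scheduler`, `Future`, `async_`, `flag_of`, `demo`, the tick-based timer table, and the timer eviction inside `wake`) is an assumption made for the sketch.

```python
from collections import deque

class Canceled(Exception):
    """Stands in for error.Canceled."""

class Future:
    def __init__(self):
        self.state = "pending"   # pending | done | canceled | failed
        self.canceled = False    # sticky cancel flag
        self.value = None
        self.task = None         # worker fiber, set by async_

class Scheduler:
    def __init__(self):
        self.ready = deque()     # runnable fibers (generators)
        self.timers = {}         # fiber -> due tick (hypothetical timer table)
        self.flag_of = {}        # fiber -> Future: the cancel-flag back-ref
        self.current = None
        self.now = 0

    def fiber_canceled(self, fiber):
        fut = self.flag_of.get(fiber)
        return fut is not None and fut.canceled

    def suspend_raw(self):
        fiber = self.current
        if self.fiber_canceled(fiber):       # PRE-park: raise without parking
            self.timers.pop(fiber, None)     # evict the armed timer first
            raise Canceled()
        yield                                # park until woken
        if self.fiber_canceled(fiber):       # POST-resume: cancel landed while parked
            raise Canceled()

    def sleep(self, ticks):
        self.timers[self.current] = self.now + ticks   # arm the timer
        yield from self.suspend_raw()                  # cancellation point

    def wake(self, fiber):
        self.timers.pop(fiber, None)         # assumption: a wake also drops the timer
        if fiber not in self.ready:
            self.ready.append(fiber)

    def spawn(self, fiber, cancel_flag=None):
        if cancel_flag is not None:
            self.flag_of[fiber] = cancel_flag
        self.ready.append(fiber)
        return fiber

    def async_(self, worker):
        fut = Future()
        def completion():
            try:
                fut.value = yield from worker(self)
                if fut.state == "pending":
                    fut.state = "done"
            except Canceled:
                fut.state = "canceled"
            except Exception:
                fut.state = "failed"
        fut.task = self.spawn(completion(), cancel_flag=fut)
        return fut

    def cancel(self, fut):
        was_pending = fut.state == "pending"   # sampled before the store
        fut.canceled = True
        if was_pending:                        # never wake a finished worker
            fut.state = "canceled"
            self.wake(fut.task)

    def run(self):
        while self.ready or self.timers:
            if not self.ready:                 # idle: jump to the earliest timer
                fiber = min(self.timers, key=self.timers.get)
                self.now = self.timers.pop(fiber)
                self.ready.append(fiber)
            self.current = self.ready.popleft()
            try:
                next(self.current)
            except StopIteration:
                self.flag_of.pop(self.current, None)

def demo(mode):
    sched, seq = Scheduler(), []
    def worker(io):
        seq.append(1)
        yield from io.sleep(10)
        seq.append(2)                          # post-suspend side effect
        return 42
    fut = sched.async_(worker)
    if mode == "before-run":
        sched.cancel(fut)
    elif mode == "while-parked":
        def canceller():
            sched.cancel(fut)
            yield from ()                      # makes this a generator that never parks
        sched.spawn(canceller())
    sched.run()
    return seq, fut.state, fut.value, len(sched.timers)
```

In this model `demo("none")` returns `([1, 2], 'done', 42, 0)`, while `demo("before-run")` and `demo("while-parked")` both return `([1], 'canceled', None, 0)`: the line after the sleep never runs and no timer is left armed.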
@@ -35,6 +35,66 @@ the existing fiber primitives in sched.sx (`spawn`/`suspend_self`/`wake`/`sleep`
installed via `push Context { io = xx scheduler } { … s.run(); }`, exactly the existing sched examples,
just with the scheduler now reachable as `context.io`.

## Status (2026-06-28)
- **Phase 3: TRUE cancellation via `suspend_raw -> !`. DONE.** A cancelled async
  worker now abandons its body at its next suspend instead of running to
  completion. Pieces:
  - **Cancel-flag back-ref (D4: back-ref pointer, chosen):**
    `SpawnOpts.cancel_flag: *void` (core.sx) + `Fiber.cancel_flag: *void` (sched.sx), set from
    `opts.cancel_flag` in `Scheduler.spawn_raw`. `async` passes `xx @f.canceled`
    (the `Future.canceled` `Atomic(bool)` erased to `*void`).
  - **Delivery:** `Scheduler.suspend_raw` checks `fiber_canceled(self.current)` (a
    `*Atomic(bool)` load) PRE-park (raise without parking, so no deadlock if cancel
    landed before the worker ran) and POST-resume (cancel landed while parked),
    raising `error.Canceled` (a bare `-> !`; set inferred). `cancel(f)` flips the
    sticky flag, marks `.canceled`, and wakes the worker via `ready(.{handle=f.task})`.
  - **Worker is failable** `Closure() -> ($R, !)`: the `async` completion closure
    `f.value = worker() catch { … }` (the captured-failable-closure call the
    Phase-3-prereq fix enabled) marks `.canceled`/`.failed` and wakes the awaiter;
    the worker's post-suspend side effects never run. New failable `io.sleep(ms)`
    (`arm_timer` + `try suspend_raw`) is the cancellation point.
  - **Compiler gap fixed:** a `-> !` fn whose only error source is `try`-ing a
    protocol method (`io.suspend_raw`) was wrongly flagged "declared `!` but never
    errors". `collectErrorSites` (error_analysis.zig) now sets a `dyn` flag for a
    `try` of a non-identifier callee (opaque error channel), suppressing the
    warning.
  - **Two UAFs found by adversarial review and FIXED:** (1) cancel-before-park
    orphaned `io.sleep`'s armed timer → `suspend_raw`'s pre-park raise now evicts
    the current fiber's timer/waiter first. (2) `cancel(f)` woke a possibly-reaped
    worker → now only wakes when `was_pending` (`.pending` before the store).
  - Migrated 1805/1806/1824 to failable workers. Lock:
    `examples/concurrency/1825-concurrency-fiber-cancel-suspend.sx` (`seq: 1 -99`;
    post-suspend line never runs). **Validated byte-identical on aarch64-macOS
    host AND aarch64-linux container** (1824 + 1825). Suite 853/0. Expected `.ir`
    churn (SpawnOpts layout) regenerated; no non-`.ir` snapshot changed.

- **Phase 3 PREREQUISITE: captured-failable-closure call typing. DONE.** The
  async completion closure (`b.run = () => { f.value = worker() catch {…} }`)
  captures a failable `worker` and consumes its error channel; the free-variable
  capture analysis (`collectCaptures` in `src/ir/lower/closure.zig`) did not
  descend into the error-handling / context / asm / multi-assign nodes, so
  `worker` was never captured: inside the lambda it resolved against an empty
  scope and typed as `.unresolved` (`catch`/`try` then rejected it). Fixed: added
  `try_expr`, `catch_expr`, `onfail_stmt`, `raise_stmt`, `multi_assign`,
  `push_stmt`, `comptime_expr`, `insert_expr`, `spread_expr`, `asm_expr` arms to
  `collectCaptures`. Adversarially reviewed (captures resolve, locals correctly
  excluded, no false-positive captures, 851/0). Lock: example
  `examples/closures/0314-closures-capture-failable-call.sx` (catch + try over a
  captured failable closure; pure language feature, host-only). The `push_stmt`
  arm also fixes the previously-noted "free-var analysis doesn't descend into a
  nested `push Context {…}`" gap. **Phase 3 is now unblocked.**
  - Two PRE-EXISTING, orthogonal bugs surfaced during review (neither blocked
    Phase 3): (1) calling a closure stored in a **struct data field** typed as
    `unresolved` (value → garbage; failable → can't `catch`). **RESOLVED**
    (`issues/0201`): `CallResolver.plan` gained a closure/fn-pointer field arm and
    the lowering closure-field arm now also handles bare `.function` fields;
    regression `examples/closures/0315-closures-struct-field-call.sx`. (2) asm
    write-through place through a deref (`asm { … "+r" -> @(p.*) }`) fails LLVM
    verification. It repros with NO closure (independent of capture analysis);
    possibly an unsupported deref-place form rather than a confirmed bug, so not
    filed.
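The capture-analysis fix in the Phase 3 prerequisite item can be illustrated with a toy free-variable walker. The following Python sketch is hypothetical and is not the code in `src/ir/lower/closure.zig`: the `Node` shape, the `BASE_KINDS`/`ERROR_KINDS` sets, and the `collect_captures` signature are all invented for illustration. It only shows why a walker that skips `catch_expr`/`try_expr` nodes never sees `worker`, and why adding arms for those kinds makes the capture appear.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # "ident", "let", "call", "catch_expr", ...
    name: str = ""
    children: list = field(default_factory=list)


# Hypothetical node kinds: what the walker descended into before the fix,
# and two of the arms the fix added.
BASE_KINDS = {"block", "assign", "call"}
ERROR_KINDS = {"try_expr", "catch_expr"}


def collect_captures(body, enclosing, descend):
    """Return names used in `body` that come from the enclosing scope."""
    local_names, captures = set(), []

    def walk(node):
        if node.kind == "ident":
            free = node.name not in local_names and node.name in enclosing
            if free and node.name not in captures:
                captures.append(node.name)
        elif node.kind == "let":
            for child in node.children:   # the initializer is walked first
                walk(child)
            local_names.add(node.name)    # then the new local shadows outer names
        elif node.kind in descend:
            for child in node.children:
                walk(child)
        # Any other kind is skipped entirely, so identifiers under it are missed.

    walk(body)
    return captures


# Models `f.value = worker() catch { ... }` inside the completion closure.
body = Node("assign", children=[
    Node("ident", "f"),
    Node("catch_expr", children=[Node("call", children=[Node("ident", "worker")])]),
])
outer = {"f", "worker"}

before = collect_captures(body, outer, BASE_KINDS)
after = collect_captures(body, outer, BASE_KINDS | ERROR_KINDS)
print(before, after)
```

With the base set only, `worker` is missed (`before` is `['f']`); once the error-handling kinds are walked, `after` is `['f', 'worker']`. Locals introduced by `let` are still excluded in both cases.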

## Status (2026-06-27)
- **Phase 0: fibers inherit the spawn-time context. DONE** (`2f2d7f1d`). Discovered during Phase 1: a
  fiber body ran under `__sx_default_context` (the `abi(.c)` `fib_dispatch` dropped the implicit