fibers: deterministic virtual-time timers (B1.4b)

Add a virtual clock + sleep timers to the M:1 scheduler so fibers
schedule in reproducible simulated time. Scheduler gains clock_ms (the
virtual clock, which advances only as timers fire), a timers list, now_ms(),
sleep(ms) (arm {clock_ms+ms, current} + suspend), and a timer-driven
run (drain ready -> fire earliest timer -> advance clock -> wake ->
repeat; the orphan-suspend deadlock check is preserved for a genuine
no-timer park). Wakes fire in deadline order with a FIFO tiebreak.

Adversarial review found a use-after-free: a fiber woken early (manual
or Task wake) before its sleep timer fired was reaped while its Timer
kept a dangling *Fiber, so a later fire dereferenced freed memory.
Fixed: wake evicts the fiber's pending timer (cancel_timer_for) -- every
re-ready path funnels through wake, so no stale timer outlives its fiber.

Examples: 1814 (sim-timer deadline ordering), 1815 (early-wake timer
eviction regression). Suite green 753/0.
agra
2026-06-21 19:09:22 +03:00
parent 02ab077bfb
commit 62ffea0663
13 changed files with 363 additions and 64 deletions

@@ -4,8 +4,33 @@ Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step
per the cadence rule). New corpus category: `18xx` concurrency.
## Last completed step
**B1.4a: a truly-SUSPENDING fiber-task async layer (`go`/`wait`/`cancel`), landed +
adversarially reviewed; cleared two more compiler blockers en route.** `library/modules/std/sched.sx`
**B1.4b: deterministic VIRTUAL-TIME timer scheduling (the KEYSTONE), landed + adversarially
reviewed (caught a CRITICAL UAF, fixed).** `library/modules/std/sched.sx` gained a virtual clock +
sleep timers so fibers schedule in reproducible simulated time (no real clock): `clock_ms` (advances
ONLY as timers fire), a `timers: List(Timer)` (insertion-order, linear min-scan, FIFO tiebreak),
`now_ms()`, `sleep(ms)` (arm `{clock_ms+ms, current}` + `suspend_self`), and a timer-driven `run`
(drain ready → fire earliest timer → advance clock → wake → repeat; orphan-deadlock check preserved
for a genuine no-timer suspend). Locked by `1814` (5 fibers sleep 30/10/20/15/15 → wake order
B@10, D@15, E@15 (FIFO), C@20, A@30 — deadline order, not spawn order; `now_ms()` reads each virtual
deadline; final clock 30). §8.1.3 calibration note in the header: the deterministic wake ORDER
equals what real `sleep`s produce, reproducing blocking semantics' observable ordering without real
time. The deterministic-sim `Io` is realized at the scheduler level (`sleep`/`now_ms`/timer-`run`),
not as an erased `Io`-protocol impl (same erasure reason as FiberIo).
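
The timer-driven `run` loop described above can be sketched as a toy Python model. Assumptions:

- Fibers are modelled as generators that yield `("sleep", ms)` or `("suspend", None)`, not as stackful fibers with a context switch.
- The names only mirror `sched.sx` (`clock_ms`, `timers`, `now_ms`, `run`). None of this is the actual `sched.sx` code.

The usage at the bottom replays the `1814` scenario.

```python
from collections import deque


class Scheduler:
    """Toy M:1 scheduler with a virtual clock (a model, not sched.sx itself).

    Fibers are Python generators. A fiber yields ("sleep", ms) to arm a timer
    and suspend, or ("suspend", None) to park with no timer at all.
    """

    def __init__(self):
        self.clock_ms = 0       # virtual time: moves only when a timer fires
        self.timers = []        # (deadline_ms, fiber) in insertion order
        self.ready = deque()    # runnable fibers, FIFO
        self.n_suspended = 0    # fibers parked on a timer or on nothing

    def now_ms(self):
        return self.clock_ms

    def spawn(self, fiber):
        self.ready.append(fiber)

    def _step(self, fiber):
        try:
            op, arg = next(fiber)
        except StopIteration:
            return                                   # fiber finished
        self.n_suspended += 1                        # both ops park the fiber
        if op == "sleep":
            self.timers.append((self.clock_ms + arg, fiber))  # arm {clock_ms+ms, current}

    def run(self):
        while True:
            while self.ready:                        # 1. drain the ready queue
                self._step(self.ready.popleft())
            if not self.timers:
                if self.n_suspended:                 # parked with nothing left to wake it
                    raise RuntimeError("deadlock: suspended fibers, no pending timers")
                return
            best = 0                                 # 2. linear min-scan for the earliest timer;
            for i in range(1, len(self.timers)):     #    strict '<' keeps the first-armed on ties
                if self.timers[i][0] < self.timers[best][0]:
                    best = i
            deadline, fiber = self.timers.pop(best)
            self.clock_ms = deadline                 # 3. advance the clock to that deadline
            self.n_suspended -= 1
            self.ready.append(fiber)                 # 4. wake the fiber, then repeat


log = []

def sleeper(s, name, ms):
    yield ("sleep", ms)
    log.append(f"{name}@{s.now_ms()}")

s = Scheduler()
for name, ms in [("A", 30), ("B", 10), ("C", 20), ("D", 15), ("E", 15)]:
    s.spawn(sleeper(s, name, ms))
s.run()
print(" ".join(log), "| final clock", s.now_ms())
# B@10 D@15 E@15 C@20 A@30 | final clock 30
```

A `heapq` keyed on `(deadline, arm_sequence)` would give the same deadline-then-FIFO order at O(log n) per timer. The linear scan keeps the list simple at the cost of O(n) per fire.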
- **Adversarial review (worker) of the run-loop change: found a CRITICAL use-after-free** — a fiber
that armed a `sleep` timer but was woken EARLY by another path (a manual/`Task` `wake`) ran to
completion + was reaped (stack `munmap`'d, `Fiber` freed) while its `Timer` still held a dangling
`*Fiber`; a later fire would `wake` freed memory (silent corruption: it "passes" only because the
freed slot coincidentally read `state != .suspended`). FIXED: `wake` now evicts the woken fiber's
pending timer (`cancel_timer_for`) — every re-ready path funnels through `wake` (the timer-fire in
`run` already removed the fired timer, so it's a harmless re-scan there), so no stale timer can
outlive its fiber. Regression `1815-concurrency-fiber-timer-early-wake.sx` (early wake → `clock: 0`,
the stale 100ms timer evicted, not fired). Review CLEARED: `n_suspended` accounting,
orphan-deadlock false-positives, timer-list integrity (re-arm during fire), clock monotonicity,
termination — all traced/probed safe.
- Suite GREEN (count below). Next: **B1.4c** (event-loop `Io` — real fd readiness, kqueue/epoll).
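
The early-wake fix can be modelled with a generator-based toy scheduler. This is an assumption for illustration, not the `sched.sx` code. Python has no manual memory management, so the stale timer cannot corrupt anything here. Instead the bug would show up as the virtual clock jumping to 100 when the stale timer fires. With `cancel_timer_for` inside `wake`, the `1815`-style scenario ends at `clock: 0` with no timer left.

```python
from collections import deque


class Scheduler:
    """Toy model: generator fibers, a virtual clock, and wake-time timer eviction."""

    def __init__(self):
        self.clock_ms = 0
        self.timers = []          # (deadline_ms, fiber) in insertion order
        self.ready = deque()
        self.suspended = set()    # fibers currently parked

    def spawn(self, fiber):
        self.ready.append(fiber)

    def cancel_timer_for(self, fiber):
        # Drop any pending timer that still refers to `fiber`.
        self.timers = [t for t in self.timers if t[1] is not fiber]

    def wake(self, fiber):
        # Every re-ready path in this model (manual wake, timer fire) goes
        # through here, so no timer is left pointing at a fiber that may
        # later finish and be reaped.
        if fiber in self.suspended:
            self.suspended.discard(fiber)
            self.cancel_timer_for(fiber)
            self.ready.append(fiber)

    def _step(self, fiber):
        try:
            op, arg = next(fiber)
        except StopIteration:
            return                        # fiber finished (would be reaped)
        if op == "sleep":
            self.timers.append((self.clock_ms + arg, fiber))
            self.suspended.add(fiber)
        elif op == "yield":
            self.ready.append(fiber)

    def run(self):
        while True:
            while self.ready:
                self._step(self.ready.popleft())
            if not self.timers:
                return
            # min() returns the first minimal index, so ties stay FIFO.
            i = min(range(len(self.timers)), key=lambda k: self.timers[k][0])
            deadline, fiber = self.timers.pop(i)
            self.clock_ms = deadline
            self.wake(fiber)              # fired timer already popped: eviction is a no-op re-scan


s = Scheduler()
events = []

def sleeper():
    yield ("sleep", 100)
    events.append(f"sleeper resumed at {s.clock_ms}")

def waker(target):
    yield ("yield", None)                 # one round: the sleeper has armed its 100 ms timer
    s.wake(target)                        # early wake, long before the timer is due
    events.append("waker done")

z = sleeper()
s.spawn(z)
s.spawn(waker(z))
s.run()
print(events, "clock:", s.clock_ms)
# ['waker done', 'sleeper resumed at 0'] clock: 0
```

Dropping the `cancel_timer_for` call from `wake` leaves `(100, z)` in `timers`, and `run` then advances the clock to 100 for a fiber that has already finished.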
### Earlier — B1.4a — a truly-SUSPENDING fiber-task async layer (`go`/`wait`/`cancel`)
landed + adversarially reviewed; cleared two more compiler blockers en route. `library/modules/std/sched.sx`
now carries `Task($R)` + `Scheduler.go(work) -> *Task($R)` + `wait`/`cancel` (a `ufcs` layer over
the M:1 scheduler). `s.go(work)` runs the nullary thunk `work` as a REAL fiber; `t.wait()` SUSPENDS
the caller until it completes (vs io.sx's blocking `context.io.async`, which runs inline). Locked by
@@ -257,21 +282,21 @@ body); closed + locked. The review's `.naked`-lambda CRITICAL was a false positi
(unparseable — `isLambda` breaks on the `abi` keyword).
## Current state
**B1.4a COMPLETE — truly-suspending fiber-task async exists.** `library/modules/std/sched.sx` carries
the M:1 scheduler core (B1.5a) PLUS the async-task layer: `Task($R)` + `Scheduler.go(work) ->
*Task($R)` + `wait`/`cancel`. `s.go(work)` spawns a nullary thunk as a fiber; `t.wait()` suspends
the caller until it completes. Locked by `1813` (`sequence: 1 2 3 42 100 -99` — real interleave +
awaited values + cancel). Two compiler blockers fixed en route (0156 Part 1 — `$R` type-arg in a
pack-fn; 0157 — UFCS generic name collision), both regression-tested (`0216`, `0217`). Adversarially
reviewed; determinism + non-fiber-wait + cancel-skip-work all hardened. The io.sx blocking
`context.io.async` (1805/1806) is untouched and coexists. Suite GREEN 751/0.
**B1.4b COMPLETE — deterministic virtual-time timer scheduling exists.** `library/modules/std/sched.sx`
now carries: the M:1 scheduler core (B1.5a: `spawn`/`yield_now`/`suspend_self`/`wake`/`run`), the
suspending fiber-task async (B1.4a: `Task($R)`/`go`/`wait`/`cancel`), AND deterministic timers (B1.4b:
`clock_ms` virtual clock, `timers` list, `now_ms`/`sleep`, timer-driven `run`). Fibers `sleep(ms)` in
reproducible simulated time and wake in deadline order. The timer-vs-early-wake UAF found in review is
fixed (`wake` evicts the fiber's pending timer). Locked by `1811` (round-robin), `1812` (suspend/wake),
`1813` (async go/wait/cancel), `1814` (sim-timer deadline ordering), `1815` (timer early-wake eviction).
Suite GREEN (count below).
The remaining B1.4 work: **B1.4b** the deterministic-sim `Io` (virtual clock + timer min-heap,
calibrated against blocking — the KEYSTONE test harness), **B1.4c** the event-loop `Io`
(kqueue/epoll). Then **B1.5** end-to-end M:1 validation under the deterministic `Io`. NOTE: the
suspending async lives as `sched.go`/`wait` (M:1, receiver-driven), NOT routed through the erased
The remaining B1 work: **B1.4c** the event-loop `Io` (kqueue mac / epoll linux — real fd readiness),
then **B1.5** end-to-end M:1 validation under the deterministic timers. NOTE: the suspending async +
deterministic timers live as `sched.*` methods (M:1, receiver-driven), NOT routed through the erased
`context.io` (which would force sched.sx into every std consumer + duplicate the `_fib_tramp` global
asm); the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready` remain reserved for the M:N evolution.
asm); the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready`/`arm_timer`/`poll` remain reserved for the
M:N evolution / when a program wants the capability-threaded form.
### Earlier — B1.5a COMPLETE — the M:1 scheduler CORE exists
`library/modules/std/sched.sx` drives N fibers
@@ -363,24 +388,22 @@ fibers/Io/scheduler code yet. Grounded floor facts:
boundary; a sharper sx diagnostic for it is a candidate polish, not a blocker.
## Next step
**→ B1.4b — the deterministic-sim `Io` (the KEYSTONE test harness).** B1.4a (suspending fiber-task
async, `sched.go`/`wait`) is done. Now build a deterministic `Io` impl: a virtual clock (`now_ms`
returns simulated time), a timer min-heap (`arm_timer` schedules a wake at a sim deadline), and
`poll` advances the clock to the next due timer and wakes its parked fiber. Drive it over the M:1
scheduler so a program using sim-time sleeps/timeouts runs fully deterministically. **Calibrate it
against blocking `Io`** (§8.1.3): the same program under blocking vs deterministic `Io` must produce
the same observable result before the deterministic one is trusted to gate async tests. Lock with an
`18xx` example asserting a program-emitted ORDERING contract (sim-time scheduling), aarch64-pinned
(`.build {"target":"macos"}`). This harness gates B1.5 + Stream B2.
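
Here is the §8.1.3 calibration in miniature:

1. Run one program with real threads and real `time.sleep`.
2. Run the same schedule on a virtual clock.
3. Require the same observable order from both.

The three-task program, its delays, and both helper functions are hypothetical. The delays are distinct because real time cannot reliably order equal deadlines. Ties need the simulator's own FIFO test instead, as `1814` does for D and E.

```python
import threading
import time

# Hypothetical program: each task sleeps, then records its name.
DELAYS_MS = {"A": 150, "B": 50, "C": 100}


def blocking_order():
    """Reference run: one OS thread per task, real sleeps."""
    order, lock = [], threading.Lock()

    def task(name, ms):
        time.sleep(ms / 1000)
        with lock:
            order.append(name)

    threads = [threading.Thread(target=task, args=item) for item in DELAYS_MS.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


def simulated_order():
    """Same program on a virtual clock: every task arms a timer at t=0, no real waiting."""
    clock, timers, order = 0, [], []
    for name, ms in DELAYS_MS.items():
        timers.append((clock + ms, name))
    while timers:
        i = min(range(len(timers)), key=lambda k: timers[k][0])  # earliest; FIFO on ties
        clock, name = timers.pop(i)                              # advance the virtual clock
        order.append(name)
    return order


real, sim = blocking_order(), simulated_order()
print(real, sim)
assert real == sim, "deterministic model disagrees with blocking reference"
```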
**→ B1.4c — the event-loop `Io` (real fd readiness).** B1.4b (deterministic virtual-time timers,
`sched.sleep`/`now_ms`/timer-`run`) is done — the KEYSTONE deterministic harness exists at the
scheduler level. Now add real-I/O readiness: a `poll`-style step over `kqueue` (macOS) / `epoll`
(linux) that blocks until an fd is readable/writable (or a real-time timeout), then wakes the parked
fiber waiting on it. Likely shape: a `block_on_fd(fd, events)` that registers the current fiber's
interest, suspends, and is woken when `run`'s poll step reports the fd ready. Lock with an `18xx`
example doing genuine fd I/O (e.g. a `pipe(2)`: a fiber blocks reading, another writes, the reader
wakes with the bytes) — aarch64-macOS-pinned, kqueue. The deterministic timers (1814) and real I/O
should compose (a real `poll` with a timeout vs the virtual clock — keep them as separate run modes,
or unify with care). Then **B1.5** end-to-end M:1 validation. The §10.7 gate (1808) + guarded-stack
(1809) + Win64 (1810) + scheduler/async/timers (1811-1815) must keep passing throughout.
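
The `block_on_fd` shape above might look like this as a separate run mode. It mirrors the proposed `pipe(2)` test: one fiber blocks reading, another writes, and the reader wakes with the bytes. Assumptions:

- Fibers are Python generators that yield `("block_on_fd", fd, events)`.
- `Loop` and its methods are hypothetical names.
- Python's `selectors.DefaultSelector` stands in for a direct `kqueue`/`epoll` binding. It picks `kqueue` on macOS and `epoll` on Linux.
- Interest is one-shot: an fd is unregistered as soon as it reports ready.
- There is no real-time timeout.
- POSIX only; on Windows `select` does not accept pipe fds.

```python
import os
import selectors
from collections import deque


class Loop:
    """Toy event-loop run mode: generator fibers parked on fd readiness."""

    def __init__(self):
        self.ready = deque()
        self.sel = selectors.DefaultSelector()   # kqueue on macOS, epoll on Linux

    def spawn(self, fiber):
        self.ready.append(fiber)

    def _step(self, fiber):
        try:
            op, fd, events = next(fiber)
        except StopIteration:
            return
        if op == "block_on_fd":
            # Register the current fiber's interest in `fd`, then leave it parked.
            self.sel.register(fd, events, data=fiber)

    def run(self):
        while True:
            while self.ready:
                self._step(self.ready.popleft())
            if not self.sel.get_map():
                return                               # nothing parked on I/O: done
            for key, _mask in self.sel.select():     # blocks until some fd is ready
                self.sel.unregister(key.fileobj)     # one-shot interest
                self.ready.append(key.data)          # wake the parked fiber


r, w = os.pipe()
got = []

def reader():
    yield ("block_on_fd", r, selectors.EVENT_READ)   # parks until the pipe has data
    got.append(os.read(r, 100))

def writer():
    yield ("block_on_fd", w, selectors.EVENT_WRITE)  # an empty pipe is writable at once
    os.write(w, b"hello")

loop = Loop()
loop.spawn(reader())
loop.spawn(writer())
loop.run()
os.close(r)
os.close(w)
loop.sel.close()
print(got)   # [b'hello']
```

Keeping this loop separate from the virtual-clock `run` matches the "separate run modes" option above. Unifying them would mean passing the next virtual deadline as `select(timeout=...)`, which reintroduces real time.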
Then: **B1.4c** event-loop `Io` (kqueue mac / epoll linux — real fd readiness), **B1.5** end-to-end
M:1 validation under the deterministic `Io`. The §10.7 gate (1808) + guarded-stack (1809) + Win64
(1810) + scheduler (1811/1812) + async (1813) must keep passing throughout.
Open design question for B1.4b/c: a deterministic/event-loop `Io` needs a current-`Scheduler`
handle to park/wake. `sched.go`/`wait` thread it via the `Task`; an `Io` impl that wants the same
will likely need an ambient current-scheduler accessor in sched.sx (deferred from B1.4a — the
`Task`-threaded form sufficed). Decide when wiring `arm_timer` → a parked fiber.
Design note carried forward: an event-loop `Io` needs a current-`Scheduler` handle. `sched.*` methods
thread it via `self`/the `Task`; if B1.4c wants the capability-threaded `context.io` form it'll need
an ambient current-scheduler accessor in sched.sx (still deferred — the `sched.*`-method form
suffices). The `Io` protocol's `poll`/`arm_timer` map onto this when/if that wiring is built.
**Side thread (optional, low priority): the SysV/Linux x86_64 sibling.** A THIRD switch variant
for `x86_64-linux`: SysV callee-saved = rbx, rbp, r12-r15 + rsp (6 GP + sp; **no** callee-saved
@@ -670,3 +693,13 @@ incomplete); a dedicated effort; lambda workers are the idiom meanwhile.
diagnostic), a `wait`-outside-fiber null-deref (loud guard), and cancel-not-skipping-work (skip
if pre-canceled) — all fixed. Simplified `1812` (`**Fiber` → `Sh.parked`). 0156 Part 2 reframed
OPEN/non-blocking. Suite GREEN **751/0**. Next: B1.4b (deterministic-sim `Io`, the KEYSTONE).
- **B1.4b COMPLETE (this session) — deterministic virtual-time timers + a CRITICAL UAF fix.** Added
`clock_ms`/`timers`/`now_ms`/`sleep` + a timer-driven `run` to `sched.sx` (worker-built): fibers
sleep in reproducible simulated time, waking in deadline order (FIFO tiebreak). Locked `1814`
(5 fibers, wake order B@10/D@15/E@15/C@20/A@30). Adversarial review of the run-loop change found a
CRITICAL use-after-free — a fiber woken EARLY (manual/Task `wake`) before its `sleep` timer fired
was reaped while its `Timer` kept a dangling `*Fiber`; a later fire dereferenced freed memory
(silent "pass" only by luck). Fixed: `wake` evicts the fiber's pending timer (`cancel_timer_for`);
regression `1815` (early wake → `clock: 0`, stale timer never fires). Review cleared n_suspended
accounting, deadlock false-positives, timer-list integrity, clock monotonicity, termination.
Suite GREEN **753/0**. Next: B1.4c (event-loop `Io`, kqueue/epoll).

@@ -7,11 +7,10 @@
> `suspend_self`/`wake`/`run`) ✅** (fixed blocker 0154) · **B1.4a (suspending fiber-task async —
> `sched.go`/`wait`/`cancel` over `Task($R)`, nullary-thunk) ✅** (adversarially reviewed; fixed
> blockers 0156-Part1 + 0157 en route; locked `1813`).
> **→ NOW: B1.4b** — the deterministic-sim `Io` (virtual clock + timer min-heap, calibrated against
> blocking — §8.1.3, the KEYSTONE test harness). Then B1.4c (event-loop `Io`), B1.5 (end-to-end M:1
> under deterministic `Io`). Detailed progress in [CHECKPOINT-FIBERS.md](CHECKPOINT-FIBERS.md).
> NOTE: the suspending async is `sched.go`/`wait` (M:1, receiver-driven), NOT routed through the
> erased `context.io` (avoids forcing sched.sx into every std consumer + the `_fib_tramp` dup-symbol
> **B1.4b (deterministic virtual-time timers — sched.sleep/now_ms/timer-run) ✅** (reviewed; fixed a CRITICAL timer-vs-early-wake UAF; locked 1814/1815).
> **→ NOW: B1.4c** — the event-loop `Io` (kqueue/epoll, real fd readiness). Then B1.5 (end-to-end
> M:1). Detailed progress in [CHECKPOINT-FIBERS.md](CHECKPOINT-FIBERS.md). NOTE: suspending async +
> deterministic timers live as `sched.*` methods (M:1), NOT routed through the erased `context.io` (avoids forcing sched.sx into every std consumer + the `_fib_tramp` dup-symbol
> trap); the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready` stay reserved for M:N. Deferred:
> issue 0150 (`Future(void)`/`timeout`); 0156-Part2 (deferred `..` spread); the `::` callable-param
> feature.