fibers B1.3a-1: stackful context switch (naked swap_context + fiber bootstrap)

The first piece of the B1.3 fiber runtime: the stackful context switch, written
in pure sx over abi(.naked). swap_context(from, to) saves the callee-saved
registers plus SP/LR into *from, loads them from *to, then rets onto to's stack.
SP on entry differs from SP on exit by design, which is why the function must be
.naked. Fibers are bootstrapped by hand: the saved context starts with SP = top
of an alloc_bytes stack, LR = a global-asm trampoline (mov x0, x19; bl _fib_body,
reaching the sx body via export), and x19 = the *Fiber.

Locked by examples/1807-concurrency-fiber-context-switch.sx (aarch64-pinned):
- 2-fiber ping-pong (A <-> B, 3 rounds each): rounds: 6, and a per-fiber stack
  canary held live across every suspend survives (canary fails: 0);
- a 64-frame deep recursive chain suspended at the bottom and resumed, verifying
  every frame's stack-local on the unwind (frames verified: 64, depth fails: 0).

Scope (honest): exercises register/stack preservation INDIRECTLY (compiler-
allocated live values + the canary). The EXPLICIT every-callee-saved GP
(x19-x28) + FP (d8-d15) sentinel scribble — the full design-section-10.7 gate —
is B1.3a-2, still owed. x86_64 sibling + mmap guard-page stacks are B1.3b.

Suite green 733/0. Runs under the JIT on aarch64; on a non-arm host the example is ir-only.
agra
2026-06-21 06:16:58 +03:00
parent 37d68e72be
commit b234b7df6f
7 changed files with 17175 additions and 8 deletions


@@ -4,9 +4,24 @@ Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step
per the cadence rule). New corpus category: `18xx` concurrency.
## Last completed step
**B1.2 COMPLETE — the async surface works end-to-end.** All three surface blockers (0151, 0152,
0153) are FIXED + committed; the async examples are landed + green. Suite green **732/0**, master
clean.
**B1.3a-1 — the stackful context switch works (foundational + indirect survival harness).**
Pure sx over `abi(.naked)`: a naked `swap_context(from, to)` saves callee-saved + SP/LR into
`*from` and loads from `*to`; fibers are bootstrapped by hand (SP = top of an `alloc_bytes`
stack, LR = a global-asm trampoline, x19 = `*Fiber`; the trampoline `mov x0, x19; bl _fib_body`).
Locked by `examples/1807-concurrency-fiber-context-switch.sx` (aarch64-pinned, `.build
{"target":"macos"}`, `.ir` captured): a 2-fiber ping-pong (A⇄B, 3 rounds each → `rounds: 6`) with
a per-fiber stack canary surviving every switch (`canary fails: 0`), plus a 64-frame deep
recursive chain suspended at the bottom and resumed, verifying every frame's stack-local on the
unwind (`frames verified: 64` / `depth fails: 0`). Suite green **733/0**, master clean.
- **Honest scope:** this exercises register/stack preservation INDIRECTLY (compiler-allocated
live values + the canary), which catches a broken SP/LR or a dropped callee-saved. It does NOT
yet EXPLICITLY scribble every callee-saved GP (x19-x28) + FP (d8-d15) with sentinels — that
full §10.7 gate is **B1.3a-2** (a dedicated naked scribble/verify routine, see Next step).
- The mechanism (bootstrap + naked switch + resume-mid-stack + `alloc_bytes` stacks) is proven;
the WIP probe lives at `.sx-tmp/fib_full.sx` / `.sx-tmp/fib_probe.sx`.
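
A minimal C sketch of the by-hand bootstrap described above, for readers who want to see the save area spelled out. It is not the sx code from `1807`: the `FiberCtx` layout, the slot order, and the names `fiber_ctx_bootstrap` and `demo_trampoline` are all hypothetical. It only models the three fields the bootstrap sets (SP, LR, x19) on top of the registers the notes say `swap_context` saves (x19-x28, fp/lr, sp).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical save area: one 8-byte slot per register the notes list.
 * The real slot order in the sx implementation is not stated. */
typedef struct {
    uint64_t x19_x28[10]; /* callee-saved GP registers x19..x28 */
    uint64_t fp;          /* x29 */
    uint64_t lr;          /* x30: where the first `ret` will land */
    uint64_t sp;          /* stack pointer restored before `ret` */
} FiberCtx;

/* Stand-in for the global-asm trampoline; it is never called here. */
static void demo_trampoline(void) {}

/* Prepare a context so that the first switch into it "returns" to the
 * trampoline with x19 holding the fiber pointer. */
static void fiber_ctx_bootstrap(FiberCtx *ctx, void *stack_base,
                                size_t stack_bytes,
                                void (*trampoline)(void), void *fiber) {
    assert(stack_bytes >= 16);
    /* Stacks grow down: start at the top, rounded down to the 16-byte
     * alignment AArch64 requires of SP. */
    uintptr_t top = ((uintptr_t)stack_base + stack_bytes) & ~(uintptr_t)15;
    *ctx = (FiberCtx){0};
    ctx->sp = (uint64_t)top;
    ctx->lr = (uint64_t)(uintptr_t)trampoline;
    ctx->x19_x28[0] = (uint64_t)(uintptr_t)fiber; /* x19 = *Fiber */
}
```

The trampoline then only has to move x19 into x0 and branch to the body, which is why the fiber pointer is parked in a callee-saved register rather than an argument register: the switch restores callee-saved registers but not x0.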
### Earlier — B1.2 COMPLETE — the async surface works end-to-end
All three surface blockers (0151, 0152, 0153) FIXED + committed; async examples landed + green.
- **0151 fixed** (`362674f`): generic `$T` infers through generic-struct / pointer / UFCS-pack
params. Regression `0214` + `0215`.
- **0152 fixed** (`e5586f6`): `Atomic(bool)` load/store byte-promoted to `i8` in the codegen
@@ -189,11 +204,28 @@ fibers/Io/scheduler code yet. Grounded floor facts:
boundary; a sharper sx diagnostic for it is a candidate polish, not a blocker.
## Next step
**B1.2 is done → start B1.3 (fiber runtime).** The compiler floor (B1.0 `abi(.naked)`, B1.1
per-fiber `context`) + the capability surface (B1.2 Io / `async`/`await`/`cancel`) are all in.
B1.3 builds the actual M:1 fiber scheduler on the `.naked` context-switch substrate — see
`PLAN-FIBERS.md` for the B1.3 step list. The B1.3 switch-stress harness (design §10.7) gates the
context-switch correctness the deterministic Io can't test.
**B1.3a-2 — the EXPLICIT register/FP scribble gate (design §10.7), then B1.3b.** The foundational
switch (B1.3a-1) is in; the rigorous gate is still owed. Sequence:
1. **B1.3a-2 (explicit scribble — the real §10.7 gate):** a naked `scribble_verify(self, peer,
base)` that loads known sentinels into EVERY callee-saved GP (x19-x28) AND FP (d8-d15), saves
the original return addr on the stack, swaps to the peer, and on resume reads every register
back and returns a mismatch count. Mechanism worked out: a naked fn CAN `bl swap_context` and
resume in-place (the swap saves/restores its lr), so push the caller-return on the fiber stack
before the swap and pop it after (sp is part of the saved context, so it round-trips). Add to
`1807` (or a sibling `1808`); expect 0 mismatches. This is the single highest-corruption-risk
asm in the stream — **review adversarially (worker if authorized)** per the plan.
2. **B1.3b:** the x86_64 sibling of `swap_context` (rbx/rbp/r12-r15/rsp save area — different
slot count + regs) + `mmap` stacks **with mandatory guard pages** (`mprotect` the low page
`PROT_NONE`; a fixed stack without a guard silently corrupts neighbors — §8.1.1). Replace the
`alloc_bytes` stack in `1807` with the guarded `mmap` path; add the x86_64 run sibling.
3. Then B1.3 (fiber runtime substrate) is done → **B1.4** (`Io` impls: blocking ✅ →
deterministic-sim KEYSTONE → event-loop) and **B1.5** (M:1 scheduler) build the real scheduler
on top, replacing the hand-bootstrapped ping-pong with `spawn`/`yield`/`resume`.
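
The guarded-stack idea in step 2 can be sketched in C as follows. This is an illustration under stated assumptions, not the planned sx code: it assumes a POSIX host where `MAP_ANONYMOUS` is available, and `GuardedStack`, `guarded_stack_alloc`, and `guarded_stack_free` are hypothetical names. The lowest page is made `PROT_NONE`, so an overflow faults instead of writing into a neighboring allocation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *base;  /* start of the mapping (this is the guard page) */
    size_t total; /* guard page + usable bytes */
    void  *top;   /* initial SP: one past the highest usable byte */
} GuardedStack;

/* Map a stack of at least `usable_bytes` with one inaccessible page
 * below it. Returns 0 on success, -1 on failure. */
static int guarded_stack_alloc(GuardedStack *s, size_t usable_bytes) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t usable = (usable_bytes + page - 1) / page * page; /* round up */
    size_t total = usable + page;

    void *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;

    /* Stacks grow down, so the guard goes at the low end. */
    if (mprotect(base, page, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }
    s->base = base;
    s->total = total;
    s->top = (char *)base + total; /* page-aligned, hence 16-byte aligned */
    return 0;
}

static void guarded_stack_free(GuardedStack *s) {
    assert(s->base != NULL);
    munmap(s->base, s->total);
    s->base = NULL;
    s->top = NULL;
    s->total = 0;
}
```

A single guard page only catches overflows that touch it; a frame that jumps more than one page past the bottom can skip it, which is a separate concern from the silent-corruption case the plan names.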
**Deferred (do NOT block on these):** issue **0150** (`void` struct field SIGTRAP) — only
`Future(void)`/`timeout`, which are B1.4. The **`::` callable-parameter feature** (named-fn
async workers `async(read_a, conn)`) — WIP at `.sx-tmp/wip-callable-params/patch.diff` (parser
done, inference incomplete); a dedicated effort; lambda workers are the B1.2 idiom meanwhile.
@@ -350,3 +382,15 @@ done, inference incomplete); a dedicated effort; lambda workers are the B1.2 idi
42` / `double: 42` / `clock ok`) + **`1806`** (`cancel` → `await` raises `.Canceled` → `or`
default; `ok: 7` / `canceled: -99`). **B1.2 (Io capability + M:1 async surface) is COMPLETE.**
Next: B1.3 (fiber runtime) on the `.naked` context-switch substrate.
- **B1.3a-1 — context switch works.** Implemented the stackful switch in pure sx over
`abi(.naked)`: `swap_context(from, to)` (save callee-saved x19-x28 + fp/lr + sp into `*from`,
load from `*to`, `ret` onto `to`'s stack) + by-hand fiber bootstrap (SP = top of an
`alloc_bytes` stack, LR = a `.global _fib_tramp` global-asm trampoline that does `mov x0, x19;
bl _fib_body`, x19 = `*Fiber`). Proven via a probe (main↔fiber), then locked by
`examples/1807-concurrency-fiber-context-switch.sx` (aarch64-pinned): a 2-fiber ping-pong
(`rounds: 6`, `canary fails: 0` — a per-fiber stack canary survives every switch) + a 64-frame
deep recursive chain suspended at the bottom and resumed (`frames verified: 64` / `depth fails:
0`). The `bl _fib_body` reaches the sx body via `export "fib_body"` (the 1655 asm→sx pattern);
runs under JIT, ir-only on a non-arm host (`.ir` captured — `swap_context` shows `naked noinline
nounwind`). Suite green 733/0. **Honest scope:** indirect register/stack survival only; the
EXPLICIT every-callee-saved + FP scribble (§10.7) is B1.3a-2, still owed. Next: B1.3a-2.
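
For comparison, the shape of the ping-pong harness can be reproduced in C with the `ucontext` API. This is a sketch of the same test idea, not a port of `1807`: it swaps the hand-written naked switch for glibc's `swapcontext`, assumes a Linux/glibc host, and every name in it (`fiber_body`, `run_ping_pong`, `CANARY_BASE`) is hypothetical. Like the sx example it checks survival only indirectly, through a stack-local canary that stays live across every suspend.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_BYTES (64 * 1024)
#define ROUNDS_EACH 3
#define CANARY_BASE 0xC0FFEE1234ABCDEFull

static ucontext_t main_ctx, ctx_a, ctx_b;
static int rounds, canary_fails;

/* Each fiber bumps the shared round counter, yields to its peer, and
 * re-checks its own stack-local canary after every resume. */
static void fiber_body(int is_a) {
    volatile uint64_t canary = CANARY_BASE ^ (uint64_t)is_a;
    for (int i = 0; i < ROUNDS_EACH; i++) {
        rounds++;
        if (is_a) swapcontext(&ctx_a, &ctx_b);
        else      swapcontext(&ctx_b, &ctx_a);
        if (canary != (CANARY_BASE ^ (uint64_t)is_a)) canary_fails++;
    }
    /* Returning resumes uc_link, i.e. main_ctx. */
}

static void fiber_init(ucontext_t *ctx, void *stack, int is_a) {
    getcontext(ctx);
    ctx->uc_stack.ss_sp = stack;
    ctx->uc_stack.ss_size = STACK_BYTES;
    ctx->uc_link = &main_ctx;
    makecontext(ctx, (void (*)(void))fiber_body, 1, is_a);
}

/* Runs A <-> B for ROUNDS_EACH rounds each; returns the round count and
 * stores the number of canary mismatches in *fails_out. */
static int run_ping_pong(int *fails_out) {
    void *stack_a = malloc(STACK_BYTES);
    void *stack_b = malloc(STACK_BYTES);
    assert(stack_a != NULL && stack_b != NULL);
    rounds = 0;
    canary_fails = 0;
    fiber_init(&ctx_a, stack_a, 1);
    fiber_init(&ctx_b, stack_b, 0);
    swapcontext(&main_ctx, &ctx_a); /* comes back when A finishes */
    swapcontext(&main_ctx, &ctx_b); /* resume B so it finishes too */
    free(stack_a);
    free(stack_b);
    *fails_out = canary_fails;
    return rounds;
}
```

Calling `run_ping_pong(&fails)` should yield 6 rounds and 0 canary failures, matching the `rounds: 6` / `canary fails: 0` lines the example locks. The second `swapcontext` from main is needed because B is still parked in its last yield when A returns.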