From 95bedf726d26dc12cac26bddd9bcbf694b6a1240 Mon Sep 17 00:00:00 2001 From: agra Date: Fri, 26 Jun 2026 10:50:50 +0300 Subject: [PATCH] =?UTF-8?q?docs:=20file=20issue=200193=20=E2=80=94=20linux?= =?UTF-8?q?=20fiber-runtime=20port=20WIP=20+=20wrapped-asm=20drop?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Port of std/sched.sx (the M:1 fiber runtime) to aarch64-linux. The epoll bindings + std.event.Loop epoll backend are already committed and runtime- validated (cc137002); this records the SCHEDULER port, which is WIP: - WORKS, validated in an Apple `container` Linux VM: 1811 (round-robin) and 1816 (block_on_fd over the epoll fd path) run identically to macOS kqueue. - Bug A: a register-indirect trampoline (naked fn + `br x20`, to avoid a per-OS hand-written global-asm symbol) bus-errors on the 1817 go/wait/sleep capstone on both platforms, though 1811/1816 work — unresolved. - Bug B: wrapping the original global `asm` trampoline in an `inline if`/`case` drops it (nm: fib_tramp U) in sched.sx's context, though every minimal repro emits fine — a flatten/lowering interaction in src/imports.zig. The WIP sched.sx port is preserved both in `git stash` and as issues/0193-linux-fiber-port.patch. Two resolution paths (either suffices) documented in the issue. sched.sx itself is left at HEAD (macOS green). 
--- ...3-linux-fiber-port-and-wrapped-asm-drop.md | 73 ++++++ issues/0193-linux-fiber-port.patch | 226 ++++++++++++++++++ 2 files changed, 299 insertions(+) create mode 100644 issues/0193-linux-fiber-port-and-wrapped-asm-drop.md create mode 100644 issues/0193-linux-fiber-port.patch diff --git a/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md b/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md new file mode 100644 index 00000000..aebaa5bb --- /dev/null +++ b/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md @@ -0,0 +1,73 @@ +# Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level `asm` drop + +Status: **OPEN.** Two intertwined items uncovered while porting `library/modules/std/sched.sx` +(the M:1 fiber runtime) to aarch64-linux. The WIP sched.sx port is preserved in +`git stash` (`stash@{0}`, "WIP on fix/0192-qualified-import-const-comptime") — pop it to resume. + +The epoll *bindings* + `std.event.Loop` epoll backend are already committed (`cc137002`) and +**runtime-validated on real Linux** via Apple `container` (see the event.sx VALIDATION note / the +[[apple-container-linux-testing]] memory). This issue is only about the **fiber scheduler** port. + +## What WORKS (validated on aarch64-linux in an Apple `container`) + +With the stashed sched.sx port, built `--target aarch64-linux --self-contained` and run in an +alpine container: +- **1811** (scheduler round-robin via `yield_now`): `sequence: 0 1 2 0 1 2 0 1 2`, all done. ✓ +- **1816** (`block_on_fd` over a pipe — the **epoll** fd path): `log: wrote read 3 [97 98 99]`, + `n_suspended: 0` — identical to macOS kqueue. ✓ +- macOS (kqueue) stays green for both. 
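For readers unfamiliar with the 1811 expectation, the sequence `0 1 2 0 1 2 0 1 2` is what a FIFO ready queue produces when three fibers each do one unit of work and then yield, three times over. The following is a minimal Python model of that behaviour, not the sched.sx implementation: generators stand in for fibers, a bare `yield` stands in for `yield_now`, and `run_round_robin` and its parameters are hypothetical names chosen for illustration.

```python
from collections import deque


def run_round_robin(n_fibers=3, rounds=3):
    """Run generator-based 'fibers' round-robin and return the order they ran in."""
    log = []

    def fiber(fid):
        for _ in range(rounds):
            log.append(fid)  # one unit of work: record which fiber ran
            yield            # stand-in for yield_now(): hand control back

    # FIFO ready queue, analogous in spirit to the scheduler's run queue.
    ready = deque(fiber(i) for i in range(n_fibers))
    while ready:
        f = ready.popleft()
        try:
            next(f)          # resume the fiber until its next yield
        except StopIteration:
            continue         # fiber body returned: drop it
        ready.append(f)      # still alive: requeue at the back
    return log


if __name__ == "__main__":
    print("sequence:", " ".join(map(str, run_round_robin())))
```

Because every fiber is requeued at the back after each yield, no fiber runs twice before the others have run once, which is the property 1811 checks on both platforms.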
+ +The port (all in sched.sx) is: `MAP_AP` 0x1002→0x22; an `inline if OS == .linux { ep :: #import +"modules/std/net/epoll.sx" }`; and `inline if OS == { case .linux: case .macos: }` +branches in `block_on_fd` (open + `EPOLLIN|EPOLLONESHOT` register), the run-loop Mode-2 +(`epoll_wait` + `EPOLL_CTL_DEL`-on-fire for one-shot parity), and `cancel_io_waiter_for` +(`EPOLL_CTL_DEL`-on-early-wake). Those epoll branches are correct (1816 proves it). + +## Bug A — register-indirect trampoline bus-errors on the go/wait/sleep capstone (1817) + +To get the fiber trampoline onto linux without a per-OS hand-written global-asm symbol +(`_fib_tramp` vs `fib_tramp`), the stash replaces the global `asm` trampoline with a **naked sx fn ++ register-indirect branch**: `spawn` presets `regs[1]` (x20) = `xx fib_dispatch`, and +`fib_tramp :: () abi(.naked) { asm { mov x0, x19 ; br x20 } }` tail-branches to dispatch. Its own +symbol is auto-emitted per-OS, so no `.global`/`bl ` literal. + +This **works for 1811 + 1816** (both run on linux AND macOS) but **bus-errors immediately on 1817** +(`go`/`wait`/`sleep`) on BOTH macOS and linux — `Bus error`, no output, a short recursive-looking +stack trace. HEAD's 1817 (committed global-asm trampoline) works (`sum: 123`), so the redesign is +the regression. Root cause not yet found: 1811/1816 use the same `spawn`/tramp path; the only thing +1817 adds is timer `sleep` + `Task` `go`/`wait` (suspend/resume). Suspect something about the +naked-fn tramp or x20 liveness specific to the Task-closure / resume path — needs a debugger on the +container build. 
+ +## Bug B — a top-level `asm` block wrapped in an `inline if` is DROPPED (in sched.sx's context) + +The redesign in Bug A was forced by this: wrapping the **original** global `asm` trampoline in +`inline if OS == { case .linux: asm{…fib_tramp…} case .macos: asm{…_fib_tramp…} }` (or the plain +`inline if OS == .linux { asm } else { asm }` form) makes the asm **not emit at all** — `nm` shows +`fib_tramp` as `U` (undefined), both platforms fail to link. A PLAIN unwrapped `asm{}` emits fine. + +NOT reproducible in isolation: minimal/medium repros (top-level asm in a case; two case blocks; +case asm in an imported module; naked fn + case asm with `bl` to an exported fn; a one-sided +`inline if .linux { #import }` before it) ALL emit + link correctly. Only sched.sx (the full module) +drops it. So there's a real flatten/lowering interaction in `src/imports.zig` +`flattenComptimeConditionals` / `appendBranchDecls` (the comptime-conditional pre-pass that surfaces +top-level decls from a taken `if_expr`/`match_expr` arm) with a top-level `asm` node, triggered by +something else in sched.sx — not yet isolated. + +## Two paths to resolve (either suffices) + +- **Path A (compiler):** fix Bug B — make a top-level `asm` block survive `inline if`/`case` + flattening in all module contexts. Then the original global-asm trampoline can be OS-branched with + the `case` form directly (no tramp redesign), sidestepping Bug A entirely. This is what the user + asked for ("case form to emit top-level asm block"). Start: instrument + `flattenComptimeConditionals` to dump the surfaced top-level decls for sched.sx and see where the + `asm` node is lost. +- **Path B (library):** fix Bug A — debug the register-indirect tramp's 1817 bus error (gdb/lldb in + the container on the aarch64-linux build, or a reduced go/wait/sleep repro). No compiler change. 
+ +## Verification + +`git stash pop`; then per-example: `sx build --target aarch64-linux --self-contained -o /tmp/x +examples/concurrency/.sx` and `container run --rm -v "$PWD/.sx-tmp:/work" alpine /work/x` +(see [[apple-container-linux-testing]]). Target: 1811/1814/1816/1817 all green on linux AND macOS, +plus the full `zig build test` macOS suite. diff --git a/issues/0193-linux-fiber-port.patch b/issues/0193-linux-fiber-port.patch new file mode 100644 index 00000000..cff9aba2 --- /dev/null +++ b/issues/0193-linux-fiber-port.patch @@ -0,0 +1,226 @@ +diff --git a/library/modules/std/sched.sx b/library/modules/std/sched.sx +index c2f4b564..c2a99271 100644 +--- a/library/modules/std/sched.sx ++++ b/library/modules/std/sched.sx +@@ -26,6 +26,14 @@ + // mismatch. + #import "modules/std.sx"; + kqb :: #import "modules/std/net/kqueue.sx"; ++// The fd-readiness backend is per-OS: kqueue (kqb, above) on darwin, epoll on ++// linux. The epoll import is scoped to the linux branch so darwin never pulls ++// epoll's types into the concurrency examples' type tables (the same ++// std-barrel-drift rule std.event.Loop follows); `block_on_fd` / the run loop ++// reference `ep` only inside their own `inline if OS == .linux` arms. ++inline if OS == .linux { ++ ep :: #import "modules/std/net/epoll.sx"; ++} + + // --- libc mmap stack primitives ------------------------------------------- + +@@ -40,7 +48,14 @@ abort :: () -> noreturn extern libc "abort"; + + PROT_NONE :: 0; + PROT_RW :: 3; // PROT_READ | PROT_WRITE +-MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000) ++// Exhaustive on the SUPPORTED OSes (linux/macOS), no default case: an ++// unsupported target matches no case → MAP_AP undefined → a loud compile error ++// on use rather than a silent wrong flag. (The fiber runtime is aarch64-only ++// anyway — the swap_context asm — so only these two platforms are wired.) 
++inline if OS == { ++ case .linux: MAP_AP :: 0x22; // linux MAP_PRIVATE (0x2) | MAP_ANON (0x20) ++ case .macos: MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000) ++} + GUARD :: 16384; // one 16 KB page (aarch64-macOS) + STACK :: 131072; // 128 KB usable per fiber + +@@ -172,10 +187,11 @@ Scheduler :: struct { + self.n_spawned = self.n_spawned + 1; + + top := boot_stack(f, STACK); +- f.ctx.regs[0] = xx f; // x19 = self +- f.ctx.regs[10] = 0; // fp +- f.ctx.regs[11] = xx fib_tramp; // lr → trampoline +- f.ctx.regs[12] = top; // sp ++ f.ctx.regs[0] = xx f; // x19 = self (→ x0 in the tramp) ++ f.ctx.regs[1] = xx fib_dispatch; // x20 = dispatch entry (tramp `br`s to it) ++ f.ctx.regs[10] = 0; // fp ++ f.ctx.regs[11] = xx fib_tramp; // lr → trampoline ++ f.ctx.regs[12] = top; // sp + + f.state = .ready; + enqueue(self, f); +@@ -333,20 +349,38 @@ Scheduler :: struct { + } + j = j + 1; + } +- // Lazily open the kqueue fd the first time fd-blocking is used. ++ // Lazily open the event-queue fd the first time fd-blocking is used: ++ // kqueue on darwin, epoll on linux. `self.kq` holds whichever — it is ++ // just "the readiness queue fd". + if self.kq < 0 { +- self.kq = kqb.kqueue(); ++ inline if OS == { ++ case .linux: self.kq = ep.ep_create(); ++ case .macos: self.kq = kqb.kqueue(); ++ } + if self.kq < 0 { +- print("sched: kqueue() failed to open the event queue\n"); ++ print("sched: failed to open the event queue\n"); + abort(); + } + } +- // Arm a one-shot read-readiness registration for `fd`. udata is unused +- // (we match the waiter by fd in the drain), so pass 0. +- chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0); +- if !kqb.kq_apply(self.kq, chg) { +- print("sched: kevent() failed to register fd {} for read readiness\n", fd); +- abort(); ++ // Arm a one-shot read-readiness registration for `fd`, matched back by ++ // the run-loop drain (kqueue by ident; epoll stashes the fd in `data`). 
++ // darwin EV_ONESHOT auto-removes the registration on fire; epoll's ++ // EPOLLONESHOT only DISABLES it, so the linux paths additionally ++ // EPOLL_CTL_DEL on fire (run) and on early-wake (cancel_io_waiter_for). ++ inline if OS == { ++ case .linux: { ++ if !ep.ep_ctl(self.kq, ep.EPOLL_CTL_ADD, fd, ep.EPOLLIN | ep.EPOLLONESHOT) { ++ print("sched: epoll_ctl() failed to register fd {} for read readiness\n", fd); ++ abort(); ++ } ++ } ++ case .macos: { ++ chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0); ++ if !kqb.kq_apply(self.kq, chg) { ++ print("sched: kevent() failed to register fd {} for read readiness\n", fd); ++ abort(); ++ } ++ } + } + // Record the waiter BEFORE parking — the run loop matches the fired + // event's ident back to this record. Long-lived-container rule: the +@@ -407,20 +441,42 @@ Scheduler :: struct { + // kernel reports at least one fd ready, then wake every waiter whose + // fd fired. (null timeout via -1 → wait forever.) + if self.io_waiters.len > 0 { +- evbuf : [MAXEV]kqb.Kevent = ---; +- n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1); +- if n < 0 { +- print("sched: kevent() wait failed while blocking on fd readiness\n"); +- abort(); +- } +- // For each fired event, find the io-waiter whose fd matches its +- // ident, evict it, and wake its fiber. EV_ONESHOT already removed +- // the kernel registration, so we only drop the waiter record. +- i := 0; +- while i < n { +- ready_fd : i32 = xx evbuf[i].ident; +- wake_io_waiter_for_fd(self, ready_fd); +- i = i + 1; ++ // BLOCK on the readiness queue until ≥1 fd fires (timeout -1 = ++ // forever), then for each fired event match the fd back to its ++ // io-waiter, evict the record, and wake the fiber. 
++ inline if OS == { ++ case .linux: { ++ evbuf : [MAXEV]ep.EpollEvent = ---; ++ n := ep.ep_wait(self.kq, .{ ptr = @evbuf[0], len = MAXEV }, MAXEV, -1); ++ if n < 0 { ++ print("sched: epoll_wait() failed while blocking on fd readiness\n"); ++ abort(); ++ } ++ i := 0; ++ while i < n { ++ ready_fd := ep.ev_fd(evbuf[i]); ++ wake_io_waiter_for_fd(self, ready_fd); ++ // EPOLLONESHOT only DISABLED the registration; remove it ++ // fully so the fd can be re-armed by a future block_on_fd ++ // (kqueue's EV_ONESHOT removes it for free). ++ ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, ready_fd, 0); ++ i = i + 1; ++ } ++ } ++ case .macos: { ++ evbuf : [MAXEV]kqb.Kevent = ---; ++ n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1); ++ if n < 0 { ++ print("sched: kevent() wait failed while blocking on fd readiness\n"); ++ abort(); ++ } ++ i := 0; ++ while i < n { ++ ready_fd : i32 = xx evbuf[i].ident; ++ wake_io_waiter_for_fd(self, ready_fd); ++ i = i + 1; ++ } ++ } + } + continue; + } +@@ -542,21 +598,37 @@ ASM + // First-entry trampoline: a fiber's bootstrapped LR points here. x19 holds the + // `*Fiber` (preset in the saved context); move it to x0 and call the generic + // dispatch. +-asm { +- #string T +-.global _fib_tramp +-_fib_tramp: ++// Symbol naming is per-OS: darwin prefixes user/exported symbols with `_` ++// (`_fib_tramp` / `_fib_dispatch`), linux does not. The sx-side `fib_tramp` ++// extern + `export "fib_dispatch"` resolve to the platform-prefixed name ++// automatically; only this hand-written asm must spell the literal symbol, so ++// branch it. (The `swap_context` naked asm above has no symbol literals — only ++// instructions — so it is shared.) ++// First-entry trampoline: a fiber's bootstrapped LR points here, with x19 = ++// `*Fiber` and x20 = `&fib_dispatch` (both preset in the saved context by ++// `spawn`, both callee-saved so `swap_context` restores them on first entry). 
++// Move the fiber to x0 and tail-branch to dispatch via the REGISTER — no ++// hand-written global-asm symbol, so nothing here needs per-OS symbol naming ++// (`_fib_tramp`/`fib_tramp`) or a `bl` to a named export. As a naked sx fn its ++// own symbol is emitted with the platform-correct name automatically, so ++// `spawn`'s `xx fib_tramp` resolves on every target. (This register-indirect ++// bootstrap replaced an OS-conditional global `asm` block: a top-level `asm` ++// wrapped in an `inline if` is dropped in this module's context — see ++// issues/0193 — and a naked fn + `br` sidesteps the hand-written symbol ++// entirely, which is cleaner regardless.) ++fib_tramp :: () abi(.naked) { ++ asm volatile { ++ #string T + mov x0, x19 +- bl _fib_dispatch +- brk #0 +-T, +-}; +-fib_tramp :: () extern; ++ br x20 ++T ++ }; ++} + +-// The ONE place that runs a fiber body. Reached only from `_fib_tramp` on first ++// The ONE place that runs a fiber body. Reached only from `fib_tramp` on first + // entry, on the fiber's own fresh stack. Runs the body, marks the fiber done, + // and switches back to the scheduler — never returns past the final switch. +-fib_dispatch :: (self: *Fiber) export "fib_dispatch" { ++fib_dispatch :: (self: *Fiber) { + self.body(); + self.state = .done; + swap_context(@self.ctx, @self.sched.sched_ctx); +@@ -687,7 +759,19 @@ cancel_io_waiter_for :: (self: *Scheduler, f: *Fiber) { + i := 0; + while i < self.io_waiters.len { + if self.io_waiters.items[i].fiber == f { +- remove_io_waiter(self, i); ++ // Early-wake: the fiber is re-readied by another path while its fd ++ // registration is still armed. kqueue's EV_ONESHOT lingers ++ // harmlessly (a never-fired one-shot the drain ignores); epoll's ++ // EPOLLONESHOT registration stays enabled — it could fire later with ++ // no waiter, and blocks a re-arm of the same fd — so remove it. 
++ inline if OS == { ++ case .linux: { ++ fd := self.io_waiters.items[i].fd; ++ remove_io_waiter(self, i); ++ if self.kq >= 0 { ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, fd, 0); } ++ } ++ case .macos: remove_io_waiter(self, i); ++ } + return; + } + i = i + 1;
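The linux arms in this patch issue `EPOLL_CTL_DEL` on fire and on early-wake because `EPOLLONESHOT` only disables a registration; the fd stays in the epoll interest list, so a later `EPOLL_CTL_ADD` for the same fd fails with `EEXIST`. The sketch below illustrates that kernel behaviour using Python's standard `select.epoll` rather than the sx `ep` bindings. It assumes a Linux host (it returns `None` elsewhere), and `oneshot_rearm_demo` is a hypothetical name.

```python
import os
import select


def oneshot_rearm_demo():
    """Return (fired, add_again_fails, add_after_del_ok), or None without epoll."""
    if not hasattr(select, "epoll"):
        return None  # not Linux: nothing to demonstrate
    r, w = os.pipe()
    ep = select.epoll()
    try:
        # Arm a one-shot read-readiness registration (EPOLL_CTL_ADD).
        ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)
        os.write(w, b"abc")
        fired = [fd for fd, _ in ep.poll(1.0)] == [r]

        # The one-shot fired, but the fd is only disabled, not removed,
        # so a second ADD is rejected with EEXIST.
        try:
            ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)
            add_again_fails = False
        except FileExistsError:
            add_again_fails = True

        # EPOLL_CTL_DEL first, then ADD succeeds and the fd can fire again
        # (the pipe still holds unread data, so it is immediately ready).
        ep.unregister(r)
        ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)
        add_after_del_ok = [fd for fd, _ in ep.poll(1.0)] == [r]
        return fired, add_again_fails, add_after_del_ok
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(oneshot_rearm_demo())
```

An alternative to delete-then-add is re-enabling the existing registration with `EPOLL_CTL_MOD`; the patch chooses deletion, which keeps the linux path's observable state the same as kqueue's auto-removing `EV_ONESHOT`.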