Files
sx/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md
agra 95bedf726d docs: file issue 0193 — linux fiber-runtime port WIP + wrapped-asm drop
Port of std/sched.sx (the M:1 fiber runtime) to aarch64-linux. The epoll
bindings + std.event.Loop epoll backend are already committed and runtime-
validated (cc137002); this records the SCHEDULER port, which is WIP:

- WORKS, validated in an Apple `container` Linux VM: 1811 (round-robin) and 1816
  (block_on_fd over the epoll fd path) run identically to macOS kqueue.
- Bug A: a register-indirect trampoline (naked fn + `br x20`, to avoid a per-OS
  hand-written global-asm symbol) bus-errors on the 1817 go/wait/sleep capstone
  on both platforms, though 1811/1816 work — unresolved.
- Bug B: wrapping the original global `asm` trampoline in an `inline if`/`case`
  drops it (nm: fib_tramp U) in sched.sx's context, though every minimal repro
  emits fine — a flatten/lowering interaction in src/imports.zig.

The WIP sched.sx port is preserved both in `git stash` and as
issues/0193-linux-fiber-port.patch. Two resolution paths (either suffices)
documented in the issue. sched.sx itself is left at HEAD (macOS green).
2026-06-26 10:50:50 +03:00

4.8 KiB

Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level asm drop

Status: OPEN. Two intertwined items uncovered while porting library/modules/std/sched.sx (the M:1 fiber runtime) to aarch64-linux. The WIP sched.sx port is preserved in git stash (stash@{0}, "WIP on fix/0192-qualified-import-const-comptime") — pop it to resume.

The epoll bindings + std.event.Loop epoll backend are already committed (cc137002) and runtime-validated on real Linux via Apple container (see the event.sx VALIDATION note / the apple-container-linux-testing memory). This issue is only about the fiber scheduler port.

What WORKS (validated on aarch64-linux in an Apple container)

With the stashed sched.sx port, built --target aarch64-linux --self-contained and run in an alpine container:

  • 1811 (scheduler round-robin via yield_now): sequence: 0 1 2 0 1 2 0 1 2, all done. ✓
  • 1816 (block_on_fd over a pipe — the epoll fd path): log: wrote read 3 [97 98 99], n_suspended: 0 — identical to macOS kqueue. ✓
  • macOS (kqueue) stays green for both.

The port (all in sched.sx) is: MAP_AP 0x1002→0x22; an inline if OS == .linux { ep :: #import "modules/std/net/epoll.sx" }; and inline if OS == { case .linux: <epoll> case .macos: <kqueue> } branches in block_on_fd (open + EPOLLIN|EPOLLONESHOT register), the run-loop Mode-2 (epoll_wait + EPOLL_CTL_DEL-on-fire for one-shot parity), and cancel_io_waiter_for (EPOLL_CTL_DEL-on-early-wake). Those epoll branches are correct (1816 proves it).

Bug A — register-indirect trampoline bus-errors on the go/wait/sleep capstone (1817)

To get the fiber trampoline onto linux without a per-OS hand-written global-asm symbol (_fib_tramp vs fib_tramp), the stash replaces the global asm trampoline with a **naked sx fn

  • register-indirect branch**: spawn presets regs[1] (x20) = xx fib_dispatch, and fib_tramp :: () abi(.naked) { asm { mov x0, x19 ; br x20 } } tail-branches to dispatch. Its own symbol is auto-emitted per-OS, so no .global/bl <name> literal.

This works for 1811 + 1816 (both run on linux AND macOS) but bus-errors immediately on 1817 (go/wait/sleep) on BOTH macOS and linux — Bus error, no output, a short recursive-looking stack trace. HEAD's 1817 (committed global-asm trampoline) works (sum: 123), so the redesign is the regression. Root cause not yet found: 1811/1816 use the same spawn/tramp path; the only thing 1817 adds is timer sleep + Task go/wait (suspend/resume). Suspect something about the naked-fn tramp or x20 liveness specific to the Task-closure / resume path — needs a debugger on the container build.

Bug B — a top-level asm block wrapped in an inline if is DROPPED (in sched.sx's context)

The redesign in Bug A was forced by this: wrapping the original global asm trampoline in inline if OS == { case .linux: asm{…fib_tramp…} case .macos: asm{…_fib_tramp…} } (or the plain inline if OS == .linux { asm } else { asm } form) makes the asm not emit at allnm shows fib_tramp as U (undefined), both platforms fail to link. A PLAIN unwrapped asm{} emits fine.

NOT reproducible in isolation: minimal/medium repros (top-level asm in a case; two case blocks; case asm in an imported module; naked fn + case asm with bl to an exported fn; a one-sided inline if .linux { #import } before it) ALL emit + link correctly. Only sched.sx (the full module) drops it. So there's a real flatten/lowering interaction in src/imports.zig flattenComptimeConditionals / appendBranchDecls (the comptime-conditional pre-pass that surfaces top-level decls from a taken if_expr/match_expr arm) with a top-level asm node, triggered by something else in sched.sx — not yet isolated.

Two paths to resolve (either suffices)

  • Path A (compiler): fix Bug B — make a top-level asm block survive inline if/case flattening in all module contexts. Then the original global-asm trampoline can be OS-branched with the case form directly (no tramp redesign), sidestepping Bug A entirely. This is what the user asked for ("case form to emit top-level asm block"). Start: instrument flattenComptimeConditionals to dump the surfaced top-level decls for sched.sx and see where the asm node is lost.
  • Path B (library): fix Bug A — debug the register-indirect tramp's 1817 bus error (gdb/lldb in the container on the aarch64-linux build, or a reduced go/wait/sleep repro). No compiler change.

Verification

git stash pop; then per-example: sx build --target aarch64-linux --self-contained -o /tmp/x examples/concurrency/<ex>.sx and container run --rm -v "$PWD/.sx-tmp:/work" alpine /work/x (see apple-container-linux-testing). Target: 1811/1814/1816/1817 all green on linux AND macOS, plus the full zig build test macOS suite.