Port of std/sched.sx (the M:1 fiber runtime) to aarch64-linux. The epoll
bindings + std.event.Loop epoll backend are already committed and runtime-
validated (cc137002); this records the SCHEDULER port, which is WIP:
- WORKS, validated in an Apple `container` Linux VM: 1811 (round-robin) and 1816
(block_on_fd over the epoll fd path) run identically to macOS kqueue.
- Bug A: a register-indirect trampoline (naked fn + `br x20`, to avoid a per-OS
hand-written global-asm symbol) bus-errors on the 1817 go/wait/sleep capstone
on both platforms, though 1811/1816 work — unresolved.
- Bug B: wrapping the original global `asm` trampoline in an `inline if`/`case`
drops it (nm: fib_tramp U) in sched.sx's context, though every minimal repro
emits fine — a flatten/lowering interaction in src/imports.zig.
The WIP sched.sx port is preserved both in `git stash` and as
issues/0193-linux-fiber-port.patch. Two resolution paths (either suffices) are
documented in the issue. sched.sx itself is left at HEAD (macOS green).
Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level asm drop
Status: OPEN. Two intertwined items uncovered while porting library/modules/std/sched.sx
(the M:1 fiber runtime) to aarch64-linux. The WIP sched.sx port is preserved in
git stash (stash@{0}, "WIP on fix/0192-qualified-import-const-comptime") — pop it to resume.
The epoll bindings + std.event.Loop epoll backend are already committed (cc137002) and
runtime-validated on real Linux via Apple container (see the event.sx VALIDATION note / the
apple-container-linux-testing memory). This issue is only about the fiber scheduler port.
What WORKS (validated on aarch64-linux in an Apple container)
With the stashed sched.sx port, built --target aarch64-linux --self-contained and run in an
alpine container:
- 1811 (scheduler round-robin via yield_now): sequence: 0 1 2 0 1 2 0 1 2, all done. ✓
- 1816 (block_on_fd over a pipe, i.e. the epoll fd path): log: wrote read 3 [97 98 99], n_suspended: 0; identical to macOS kqueue. ✓
- macOS (kqueue) stays green for both.
The port (all in sched.sx) consists of:
- MAP_AP 0x1002→0x22;
- an inline if OS == .linux { ep :: #import "modules/std/net/epoll.sx" };
- inline if OS == { case .linux: <epoll> case .macos: <kqueue> } branches in three places:
  - block_on_fd (open + EPOLLIN|EPOLLONESHOT register),
  - the run-loop Mode-2 (epoll_wait + EPOLL_CTL_DEL-on-fire for one-shot parity),
  - cancel_io_waiter_for (EPOLL_CTL_DEL-on-early-wake).
Those epoll branches are correct (1816 proves it).
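For reference, the one-shot parity point can be reproduced outside sx. The following is a minimal Python sketch (Linux only, and not the sched.sx code): it registers a pipe's read end with EPOLLIN|EPOLLONESHOT, shows that the registration is only disarmed after it fires (a second poll reports nothing even though the data is still unread), and then removes it, which corresponds to EPOLL_CTL_DEL. kqueue's EV_ONESHOT, which the "one-shot parity" wording suggests the macOS branch relies on, deletes the event on delivery; that is presumably why the epoll branch needs the explicit delete. The name oneshot_pipe_demo is made up for illustration.

```python
import os
import select


def oneshot_pipe_demo():
    """Fire a one-shot epoll registration once, then delete it.

    Returns (events on first poll, events on second poll, bytes read).
    Requires Linux: select.epoll does not exist on macOS.
    """
    r, w = os.pipe()
    ep = select.epoll()
    try:
        # Same event mask as the block_on_fd branch described above.
        ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)
        os.write(w, b"abc")

        first = ep.poll(0.1)   # the read end is readable: one event
        second = ep.poll(0)    # disarmed by EPOLLONESHOT: no event,
                               # although the data has not been read yet

        # The fd is still registered (only disabled), so remove it
        # explicitly; unregister() issues EPOLL_CTL_DEL.
        ep.unregister(r)

        data = os.read(r, 3)
        return len(first), len(second), list(data)
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__" and hasattr(select, "epoll"):
    print(oneshot_pipe_demo())
```

On Linux this should return (1, 0, [97, 98, 99]): one event, then none, then the same three bytes that 1816 logs.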
Bug A — register-indirect trampoline bus-errors on the go/wait/sleep capstone (1817)
To get the fiber trampoline onto linux without a per-OS hand-written global-asm symbol
(_fib_tramp vs fib_tramp), the stash replaces the global asm trampoline with a naked sx fn +
register-indirect branch: spawn presets regs[1] (x20) = xx fib_dispatch, and
fib_tramp :: () abi(.naked) { asm { mov x0, x19 ; br x20 } } tail-branches to dispatch. Its own
symbol is auto-emitted per-OS, so no .global / bl <name> literal is needed.
This works for 1811 + 1816 (both run on linux AND macOS) but bus-errors immediately on 1817
(go/wait/sleep) on BOTH macOS and linux — Bus error, no output, a short recursive-looking
stack trace. HEAD's 1817 (committed global-asm trampoline) works (sum: 123), so the redesign is
the regression. Root cause not yet found: 1811/1816 use the same spawn/tramp path; the only thing
1817 adds is timer sleep + Task go/wait (suspend/resume). Suspect something about the
naked-fn tramp or x20 liveness specific to the Task-closure / resume path — needs a debugger on the
container build.
Bug B — a top-level asm block wrapped in an inline if is DROPPED (in sched.sx's context)
The redesign in Bug A was forced by this: wrapping the original global asm trampoline in
inline if OS == { case .linux: asm{…fib_tramp…} case .macos: asm{…_fib_tramp…} } (or the plain
inline if OS == .linux { asm } else { asm } form) makes the asm not emit at all — nm shows
fib_tramp as U (undefined), both platforms fail to link. A PLAIN unwrapped asm{} emits fine.
NOT reproducible in isolation: minimal/medium repros (top-level asm in a case; two case blocks;
case asm in an imported module; naked fn + case asm with bl to an exported fn; a one-sided
inline if .linux { #import } before it) ALL emit + link correctly. Only sched.sx (the full module)
drops it. So there's a real flatten/lowering interaction in src/imports.zig
flattenComptimeConditionals / appendBranchDecls (the comptime-conditional pre-pass that surfaces
top-level decls from a taken if_expr/match_expr arm) with a top-level asm node, triggered by
something else in sched.sx — not yet isolated.
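The nm check used above can be scripted so that it runs after every build attempt while bisecting sched.sx. A minimal sketch in Python, assuming the default BSD-style nm text output (optional address, type letter, symbol name). symbol_status is a hypothetical helper, and the sample text below is hand-written for illustration, not captured output.

```python
def symbol_status(nm_output, name):
    """Return the nm type letter for `name`, or None if it is not listed.

    Matches both `name` and `_name`, since macOS prefixes the symbol
    with an underscore (_fib_tramp vs fib_tramp).
    """
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        # The last two fields are always <type letter> <symbol>;
        # undefined symbols simply have no address field in front.
        kind, sym = parts[-2], parts[-1]
        if sym in (name, "_" + name):
            return kind
    return None


# Hand-written samples in nm's layout (illustrative only).
DROPPED = "                 U fib_tramp\n0000000000000040 T fib_dispatch\n"
EMITTED = "0000000000000000 T _fib_tramp\n0000000000000040 T _fib_dispatch\n"

if __name__ == "__main__":
    print(symbol_status(DROPPED, "fib_tramp"))  # U: the asm block was dropped
    print(symbol_status(EMITTED, "fib_tramp"))  # T: the asm block was emitted
```

In practice the text would come from running nm on the object or binary produced by the build; a result of "U" or None for fib_tramp means the wrapped asm block was lost.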
Two paths to resolve (either suffices)
- Path A (compiler): fix Bug B by making a top-level asm block survive inline if/case flattening in all module contexts. Then the original global-asm trampoline can be OS-branched with the case form directly (no tramp redesign), sidestepping Bug A entirely. This is what the user asked for ("case form to emit top-level asm block"). Start: instrument flattenComptimeConditionals to dump the surfaced top-level decls for sched.sx and see where the asm node is lost.
- Path B (library): fix Bug A by debugging the register-indirect tramp's 1817 bus error (gdb/lldb in the container on the aarch64-linux build, or a reduced go/wait/sleep repro). No compiler change.
Verification
git stash pop; then, per example:
    sx build --target aarch64-linux --self-contained -o /tmp/x examples/concurrency/<ex>.sx
    container run --rm -v "$PWD/.sx-tmp:/work" alpine /work/x
(see apple-container-linux-testing). Target: 1811/1814/1816/1817 all green on linux AND macOS,
plus the full zig build test macOS suite.