Files
sx/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md
agra f3f061ef00 fix: aarch64-linux port of the M:1 fiber runtime (sched.sx)
Port library/modules/std/sched.sx to run on aarch64-linux alongside
aarch64-macOS, validated byte-identical on both via Apple `container`.

Per-OS bits are comptime-branched:
- MAP_AP (mmap MAP_ANON flag): linux 0x22 / macOS 0x1002.
- fd-readiness backend: epoll on linux, kqueue on darwin (epoll import
  scoped to the linux branch). block_on_fd, the run-loop Mode-2 drain,
  and cancel_io_waiter_for each branch; the epoll paths EPOLL_CTL_DEL on
  fire and on early-wake (EPOLLONESHOT only disables a registration;
  kqueue EV_ONESHOT auto-removes it).
- first-entry trampoline: a per-OS hand-written global-asm symbol becomes
  a naked sx fn fib_tramp (mov x0,x19; br x20) + register-indirect
  dispatch (spawn presets regs[1] == x20 == &fib_dispatch), dropping the
  per-OS .global symbol entirely.

Fixes issue 0193 Bug A: the trampoline redesign bus-errored on the
go/wait/sleep capstone (1817) because fib_dispatch was not pinned to the
C ABI. Without an explicit ABI, fib_dispatch uses sx's internal calling
convention (x0 = implicit context, first arg self shifted to x1) while
the trampoline hands self over in x0 (C-ABI); on first entry the body
runs (x1 happens to alias self) but the closure then loads
regs[1] == &fib_dispatch as its first capture and re-invokes fib_dispatch
forever -> stack overflow -> bus error. Annotating fib_dispatch
`abi(.c)` pins it to the C-ABI (self in x0), matching the trampoline.
`abi(.c)` rather than `export` because the fn is reached only by address
through the trampoline, never by an external name -- so it needs the
convention, not a public symbol (it stays a local symbol). Root cause
found via lldb on an AOT build; confirmed against the compiler source.

Bug B (a top-level asm block wrapped in inline-if is dropped during the
comptime-conditional flatten) is carved out to issue 0194 (OPEN) -- no
live trigger remains, since the naked-fn trampoline sidesteps it.

1811/1814/1816/1817 run byte-identical on the aarch64-macOS host and in
an aarch64-linux container; full suite green (817/0). Documents the fiber
runtime in readme.md.
2026-06-26 11:51:46 +03:00

105 lines
7.0 KiB
Markdown

# Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level `asm` drop
> **RESOLVED — port landed on aarch64-linux.**
>
> **Bug A (register-indirect trampoline bus-errors on 1817): FIXED.** Root cause found via lldb on an
> AOT macOS build (the bug reproduced on macOS too, so no container needed): the WIP port had left
> `fib_dispatch` with no explicit ABI annotation (the original pinned the C-ABI via `export
> "fib_dispatch"`, which the redesign dropped). Without a C-ABI pin the fn uses sx's INTERNAL calling
> convention, which reserves x0 for the implicit `context` pointer and shifts the first real arg `self`
> to x1 — but the trampoline (`mov x0, x19; br x20`) hands the fiber over in x0, C-ABI style. On first
> entry x1 coincidentally aliases `&fiber.ctx == self` (left there by the scheduler's prior
> `swap_context(from, to)`, x1 = to), so the body runs once; but inside it the closure loads
> `[Fiber+8] == ctx.regs[1] == &fib_dispatch` as its "first capture" and re-invokes `fib_dispatch`
> forever → stack overflow → bus error. **Fix:** annotate `fib_dispatch` `abi(.c)` so it keeps the
> C-ABI (`self` in x0), matching what the trampoline supplies — a one-line library change, no compiler
> change. `abi(.c)` is used rather than `export "fib_dispatch"` because the fn is reached only by
> address through the trampoline (`xx fib_dispatch`), never by an external name, so it needs the
> convention, not a public symbol (it stays a local symbol). The register-indirect naked-fn trampoline
> design is kept (it sidesteps Bug B's hand-written per-OS global-asm symbol). Adversarially reviewed
> against the compiler source (`src/ir/lower/decl.zig` `funcWantsImplicitCtx`/`wants_ctx`/
> `CallingConvention.c`); root cause + fix confirmed CORRECT.
>
> **Validation:** 1811 / 1814 / 1816 / 1817 (the go/wait/sleep capstone) all run **byte-identical** on
> the aarch64-macOS host AND in an aarch64-linux Apple `container` (`sum: 123`, completion order
> `2@10 3@20 1@30`, etc.). Full `zig build test` macOS suite GREEN (817/0).
>
> **Bug B (wrapped top-level `asm` dropped): carved out to `issues/0194-wrapped-toplevel-asm-dropped.md`
> as an OPEN compiler bug.** It is no longer triggered anywhere in the tree (the port no longer uses a
> wrapped global-asm block), so it does not block anything — but it is a real defect and stays filed.
>
> Original writeup below for history.
---
Status: **(historical — see RESOLVED banner above).** Two intertwined items uncovered while porting
`library/modules/std/sched.sx` (the M:1 fiber runtime) to aarch64-linux.
The epoll *bindings* + `std.event.Loop` epoll backend are already committed (`cc137002`) and
**runtime-validated on real Linux** via Apple `container` (see the event.sx VALIDATION note / the
[[apple-container-linux-testing]] memory). This issue is only about the **fiber scheduler** port.
## What WORKS (validated on aarch64-linux in an Apple `container`)
With the stashed sched.sx port, built `--target aarch64-linux --self-contained` and run in an
alpine container:
- **1811** (scheduler round-robin via `yield_now`): `sequence: 0 1 2 0 1 2 0 1 2`, all done. ✓
- **1816** (`block_on_fd` over a pipe — the **epoll** fd path): `log: wrote read 3 [97 98 99]`,
`n_suspended: 0` — identical to macOS kqueue. ✓
- macOS (kqueue) stays green for both.
The port (all in sched.sx) is: `MAP_AP` 0x1002→0x22; an `inline if OS == .linux { ep :: #import
"modules/std/net/epoll.sx" }`; and `inline if OS == { case .linux: <epoll> case .macos: <kqueue> }`
branches in `block_on_fd` (open + `EPOLLIN|EPOLLONESHOT` register), the run-loop Mode-2
(`epoll_wait` + `EPOLL_CTL_DEL`-on-fire for one-shot parity), and `cancel_io_waiter_for`
(`EPOLL_CTL_DEL`-on-early-wake). Those epoll branches are correct (1816 proves it).
## Bug A — register-indirect trampoline bus-errors on the go/wait/sleep capstone (1817)
To get the fiber trampoline onto linux without a per-OS hand-written global-asm symbol
(`_fib_tramp` vs `fib_tramp`), the stash replaces the global `asm` trampoline with a **naked sx fn
+ register-indirect branch**: `spawn` presets `regs[1]` (x20) = `xx fib_dispatch`, and
`fib_tramp :: () abi(.naked) { asm { mov x0, x19 ; br x20 } }` tail-branches to dispatch. Its own
symbol is auto-emitted per-OS, so no `.global`/`bl <name>` literal.
This **works for 1811 + 1816** (both run on linux AND macOS) but **bus-errors immediately on 1817**
(`go`/`wait`/`sleep`) on BOTH macOS and linux — `Bus error`, no output, a short recursive-looking
stack trace. HEAD's 1817 (committed global-asm trampoline) works (`sum: 123`), so the redesign is
the regression. Root cause not yet found: 1811/1816 use the same `spawn`/tramp path; the only thing
1817 adds is timer `sleep` + `Task` `go`/`wait` (suspend/resume). Suspect something about the
naked-fn tramp or x20 liveness specific to the Task-closure / resume path — needs a debugger on the
container build.
## Bug B — a top-level `asm` block wrapped in an `inline if` is DROPPED (in sched.sx's context)
The redesign in Bug A was forced by this: wrapping the **original** global `asm` trampoline in
`inline if OS == { case .linux: asm{…fib_tramp…} case .macos: asm{…_fib_tramp…} }` (or the plain
`inline if OS == .linux { asm } else { asm }` form) makes the asm **not emit at all**`nm` shows
`fib_tramp` as `U` (undefined), both platforms fail to link. A PLAIN unwrapped `asm{}` emits fine.
NOT reproducible in isolation: minimal/medium repros (top-level asm in a case; two case blocks;
case asm in an imported module; naked fn + case asm with `bl` to an exported fn; a one-sided
`inline if .linux { #import }` before it) ALL emit + link correctly. Only sched.sx (the full module)
drops it. So there's a real flatten/lowering interaction in `src/imports.zig`
`flattenComptimeConditionals` / `appendBranchDecls` (the comptime-conditional pre-pass that surfaces
top-level decls from a taken `if_expr`/`match_expr` arm) with a top-level `asm` node, triggered by
something else in sched.sx — not yet isolated.
## Two paths to resolve (either suffices)
- **Path A (compiler):** fix Bug B — make a top-level `asm` block survive `inline if`/`case`
flattening in all module contexts. Then the original global-asm trampoline can be OS-branched with
the `case` form directly (no tramp redesign), sidestepping Bug A entirely. This is what the user
asked for ("case form to emit top-level asm block"). Start: instrument
`flattenComptimeConditionals` to dump the surfaced top-level decls for sched.sx and see where the
`asm` node is lost.
- **Path B (library):** fix Bug A — debug the register-indirect tramp's 1817 bus error (gdb/lldb in
the container on the aarch64-linux build, or a reduced go/wait/sleep repro). No compiler change.
## Verification
`git stash pop`; then per-example: `sx build --target aarch64-linux --self-contained -o /tmp/x
examples/concurrency/<ex>.sx` and `container run --rm -v "$PWD/.sx-tmp:/work" alpine /work/x`
(see [[apple-container-linux-testing]]). Target: 1811/1814/1816/1817 all green on linux AND macOS,
plus the full `zig build test` macOS suite.