From 22f4719e834dbd1ad119e9531f73b99cc812214d Mon Sep 17 00:00:00 2001
From: agra
Date: Fri, 26 Jun 2026 11:32:01 +0300
Subject: [PATCH] fix: aarch64-linux port of the M:1 fiber runtime (sched.sx)

Port library/modules/std/sched.sx to run on aarch64-linux alongside
aarch64-macOS, validated byte-identical on both via Apple `container`.

Per-OS bits are comptime-branched:
- MAP_AP (mmap MAP_ANON flag): linux 0x22 / macOS 0x1002.
- fd-readiness backend: epoll on linux, kqueue on darwin (epoll import
  scoped to the linux branch). block_on_fd, the run-loop Mode-2 drain, and
  cancel_io_waiter_for each branch; the epoll paths EPOLL_CTL_DEL on fire
  and on early-wake (EPOLLONESHOT only disables a registration; kqueue
  EV_ONESHOT auto-removes it).
- first-entry trampoline: a per-OS hand-written global-asm symbol becomes a
  naked sx fn fib_tramp (mov x0,x19; br x20) + register-indirect dispatch
  (spawn presets regs[1] == x20 == &fib_dispatch), dropping the per-OS
  .global symbol entirely.

Fixes issue 0193 Bug A: the trampoline redesign bus-errored on the
go/wait/sleep capstone (1817) until `export "fib_dispatch"` was restored.
Without the export, fib_dispatch reverts to sx's internal ABI (x0 = implicit
context, first arg self shifted to x1) while the trampoline hands self over
in x0 (C-ABI); on first entry the body runs (x1 happens to alias self) but
the closure then loads regs[1] == &fib_dispatch as its first capture and
re-invokes fib_dispatch forever -> stack overflow -> bus error. The export
pins fib_dispatch to the C-ABI (self in x0), matching the trampoline. Root
cause found via lldb on an AOT build; confirmed against the compiler source.

Bug B (a top-level asm block wrapped in inline-if is dropped during the
comptime-conditional flatten) is carved out to issue 0194 (OPEN) -- no live
trigger remains, since the naked-fn trampoline sidesteps it.

1811/1814/1816/1817 run byte-identical on the aarch64-macOS host and in an
aarch64-linux container; full suite green (817/0).
Documents the fiber runtime in readme.md. --- current/CHECKPOINT-FIBERS.md | 50 ++++- ...3-linux-fiber-port-and-wrapped-asm-drop.md | 33 ++- issues/0194-wrapped-toplevel-asm-dropped.md | 99 +++++++++ library/modules/std/sched.sx | 207 +++++++++++++----- readme.md | 46 ++++ 5 files changed, 370 insertions(+), 65 deletions(-) create mode 100644 issues/0194-wrapped-toplevel-asm-dropped.md diff --git a/current/CHECKPOINT-FIBERS.md b/current/CHECKPOINT-FIBERS.md index f4b3081e..abbc159e 100644 --- a/current/CHECKPOINT-FIBERS.md +++ b/current/CHECKPOINT-FIBERS.md @@ -4,7 +4,32 @@ Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step per the cadence rule). New corpus category: `18xx` concurrency. ## Last completed step -**B1 follow-up — `Scheduler.deinit` (close the bounded leaks).** Post-B1 non-blocking cleanup: a +**B1.6 — aarch64-LINUX port of the M:1 fiber runtime (sched.sx).** `library/modules/std/sched.sx` +now runs end-to-end on aarch64-linux as well as aarch64-macOS, validated **byte-identical** on both +via Apple `container` (static ELF, no emulation). The per-OS bits are comptime-branched: +- `MAP_AP` (mmap MAP_ANON flag) — `inline if OS == { case .linux: 0x22 case .macos: 0x1002 }`, + exhaustive on the supported OSes (no default → a new target fails loud on use). +- The fd-readiness backend — kqueue on darwin, **epoll on linux**. The `epoll` import is scoped to + the linux branch (`inline if OS == .linux { ep :: #import "modules/std/net/epoll.sx" }`) so darwin + never pulls epoll types into the concurrency examples (the std-barrel-drift rule). `block_on_fd`, the + run-loop Mode-2 drain, and `cancel_io_waiter_for` each branch kqueue/epoll; epoll additionally + `EPOLL_CTL_DEL`s on fire + on early-wake (EPOLLONESHOT only DISABLES, kqueue EV_ONESHOT auto-removes). 
+- The first-entry trampoline was redesigned from a per-OS hand-written global-asm symbol to a **naked + sx fn** `fib_tramp` (`mov x0, x19; br x20`) + register-indirect dispatch (spawn presets + `regs[1] == x20 == &fib_dispatch`), so no per-OS `.global _fib_tramp`/`fib_tramp` symbol literal is + needed. This sidesteps a compiler bug (wrapped top-level `asm` dropped — now **issue 0194**, OPEN). + +**Bug fixed en route (issue 0193 Bug A):** the tramp redesign initially bus-errored on the 1817 +go/wait/sleep capstone (both OSes) because the WIP had dropped `export "fib_dispatch"`. Without the +export `fib_dispatch` uses sx's internal ABI (x0 = implicit `context`, `self` shifted to x1), but the +trampoline hands `self` in x0 (C-ABI) → on first entry the body runs (x1 happens to alias `self`) but +the closure then loads `regs[1] == &fib_dispatch` as its first capture and recurses forever → stack +overflow. **Fix: restore `export "fib_dispatch"`** (pins it to C-ABI, `self` in x0). Root cause found +via lldb on an AOT macOS build; confirmed by an adversarial source review (`src/ir/lower/decl.zig`). +The 1817 capstone in the suite guards the fix. Suite GREEN **817/0**; 1811/1814/1816/1817 byte-identical +macOS host ↔ aarch64-linux container. + +### Earlier — B1 follow-up — `Scheduler.deinit` (close the bounded leaks). Post-B1 non-blocking cleanup: a terminal `deinit` on `library/modules/std/sched.sx`'s `Scheduler` releases the resources B1 documented as leaked. Frees, in order: (1) any fibers still enqueued ready (leak-safety net for `spawn`/`go` without `run()` — `munmap` stack + free struct; a suspended off-queue fiber is unreachable, but a clean @@ -401,12 +426,12 @@ env remains unfreeable (language limitation). 
Locked by `18xx` 1800–1820 (nake blocking async, the switch + §10.7 stress gate + guarded stacks + Win64 sibling, scheduler round-robin, suspend/wake, async go/wait/cancel, sim-timer ordering, timer early-wake eviction, kqueue pipe I/O, the **1817 end-to-end capstone**, sleep-negative/double-wait guards, and **1820 scheduler-deinit**). Suite -GREEN **759/0**, committed. +GREEN **817/0**, committed. **B1.6: now also runs on aarch64-linux** (epoll fd-backend + comptime-branched +`MAP_AP` + naked-fn trampoline) — validated byte-identical to macOS in an Apple `container`. -Future work (none blocking B1): a **linux epoll twin** of `block_on_fd` (mirror via `std/net/epoll`; -OS-neutral facade `std.event`) — B1.4c wired macOS kqueue only; routing the suspending async through -the erased `context.io` (forces sched.sx into every std consumer + duplicates the `_fib_tramp` global -asm — deferred to the M:N model, where the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready`/ +Future work (none blocking B1): routing the suspending async through +the erased `context.io` (forces sched.sx into every std consumer — deferred to the M:N model, where +the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready`/ `arm_timer`/`poll` hooks take over); `Future(void)`/`timeout` (issue 0150); freeing the heap-Task / closure-env / kq-fd (a Scheduler `deinit` + closure-env-ownership affordance). **Next carve: Stream B2** (channels / structured cancel / async stdlib) — see PLAN-CHANNELS.md when started. @@ -684,6 +709,19 @@ incomplete); a dedicated effort; lambda workers are the idiom meanwhile. trusted. `18xx` asserts program-emitted ordering contracts, not raw interleaving. 
## Log +- **B1.6 — aarch64-linux port of sched.sx.** Comptime-branched the per-OS bits: `MAP_AP` (linux + `0x22` / macOS `0x1002`), the fd-readiness backend (epoll on linux, kqueue on darwin — epoll import + scoped to the linux branch; `block_on_fd` / run-loop Mode-2 / `cancel_io_waiter_for` each branch, + epoll `EPOLL_CTL_DEL`s on fire + early-wake), and the first-entry trampoline (per-OS global-asm + symbol → naked sx fn `fib_tramp` + register-indirect `br x20` to `&fib_dispatch` preset in + `regs[1]`). **Fixed issue 0193 Bug A:** the tramp redesign bus-errored on 1817 (both OSes) until + `export "fib_dispatch"` was restored — without it the fn uses sx's internal ABI (x0 = implicit + `context`, `self` → x1) while the trampoline supplies `self` in x0, so the closure loads + `regs[1] == &fib_dispatch` as its first capture and recurses forever → stack-overflow bus error. + Root cause found via lldb (AOT macOS build) + an adversarial source review. **Bug B** (wrapped + top-level `asm` dropped) carved to **issue 0194** (OPEN; no live trigger — the naked-fn tramp + sidesteps it). Validated byte-identical on aarch64-macOS host AND aarch64-linux Apple `container` + for 1811/1814/1816/1817; full suite GREEN **817/0**. - **B1 follow-up — `Scheduler.deinit`.** Closes the bounded leaks B1 documented. 
Added a `task_allocs: List(*void)` field (appended in `go` so the scheduler can reach its generic `Task($R)`s) + a canonical `close` extern, then a terminal idempotent `deinit`: reap leftover ready fibers (`munmap` + free) → diff --git a/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md b/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md index aebaa5bb..7067146c 100644 --- a/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md +++ b/issues/0193-linux-fiber-port-and-wrapped-asm-drop.md @@ -1,8 +1,35 @@ # Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level `asm` drop -Status: **OPEN.** Two intertwined items uncovered while porting `library/modules/std/sched.sx` -(the M:1 fiber runtime) to aarch64-linux. The WIP sched.sx port is preserved in -`git stash` (`stash@{0}`, "WIP on fix/0192-qualified-import-const-comptime") — pop it to resume. +> **RESOLVED — port landed on aarch64-linux.** +> +> **Bug A (register-indirect trampoline bus-errors on 1817): FIXED.** Root cause found via lldb on an +> AOT macOS build (the bug reproduced on macOS too, so no container needed): the WIP port had dropped +> `export "fib_dispatch"` from `fib_dispatch`. Without the export the fn reverts to sx's INTERNAL +> calling convention, which reserves x0 for the implicit `context` pointer and shifts the first real +> arg `self` to x1 — but the trampoline (`mov x0, x19; br x20`) hands the fiber over in x0, C-ABI +> style. On first entry x1 coincidentally aliases `&fiber.ctx == self` (left there by the scheduler's +> prior `swap_context(from, to)`, x1 = to), so the body runs once; but inside it the closure loads +> `[Fiber+8] == ctx.regs[1] == &fib_dispatch` as its "first capture" and re-invokes `fib_dispatch` +> forever → stack overflow → bus error. **Fix:** restore `export "fib_dispatch"` so the fn keeps the +> C-ABI (`self` in x0), matching what the trampoline supplies — a one-line library change, no compiler +> change. 
The register-indirect naked-fn trampoline design is kept (it sidesteps Bug B's hand-written +> per-OS global-asm symbol). Adversarially reviewed against the compiler source (`src/ir/lower/decl.zig` +> `funcWantsImplicitCtx`/`wants_ctx`/`CallingConvention.c`); root cause + fix confirmed CORRECT. +> +> **Validation:** 1811 / 1814 / 1816 / 1817 (the go/wait/sleep capstone) all run **byte-identical** on +> the aarch64-macOS host AND in an aarch64-linux Apple `container` (`sum: 123`, completion order +> `2@10 3@20 1@30`, etc.). Full `zig build test` macOS suite GREEN (817/0). +> +> **Bug B (wrapped top-level `asm` dropped): carved out to `issues/0194-wrapped-toplevel-asm-dropped.md` +> as an OPEN compiler bug.** It is no longer triggered anywhere in the tree (the port no longer uses a +> wrapped global-asm block), so it does not block anything — but it is a real defect and stays filed. +> +> Original writeup below for history. + +--- + +Status: **(historical — see RESOLVED banner above).** Two intertwined items uncovered while porting +`library/modules/std/sched.sx` (the M:1 fiber runtime) to aarch64-linux. The epoll *bindings* + `std.event.Loop` epoll backend are already committed (`cc137002`) and **runtime-validated on real Linux** via Apple `container` (see the event.sx VALIDATION note / the diff --git a/issues/0194-wrapped-toplevel-asm-dropped.md b/issues/0194-wrapped-toplevel-asm-dropped.md new file mode 100644 index 00000000..ce8b7597 --- /dev/null +++ b/issues/0194-wrapped-toplevel-asm-dropped.md @@ -0,0 +1,99 @@ +# Issue 0194 — a top-level global `asm` block wrapped in `inline if` / `case` is DROPPED + +Status: **OPEN.** Carved out of issue 0193 (the linux fiber-runtime port). 
The port itself is +RESOLVED — it sidesteps this bug entirely by using a naked-sx-fn trampoline (`fib_tramp`) plus a +register-indirect `br x20` instead of a hand-written global-asm symbol, so there is **no live trigger +for this bug in the tree today.** It is filed standalone so the compiler defect is not lost. + +## Symptom + +A top-level global `asm { … }` block that defines a symbol (e.g. `.global _foo` / `_foo: …`) is +**not emitted** when it is wrapped in a comptime `inline if OS == { case … }` (or +`inline if OS == .linux { asm } else { asm }`). `nm main.o` shows the symbol as `U` (undefined) and +the link fails on both platforms. A PLAIN, unwrapped top-level `asm { … }` emits fine. + +- **Observed:** symbol undefined, link error. +- **Expected:** the `asm` block in the taken comptime arm emits its template into the module's global + asm exactly as an unwrapped block would (the comptime-conditional pre-pass already surfaces the + taken arm's *other* top-level decls — fns, consts, imports — correctly; only the `asm_global` node + is lost). + +## Reproduction + +**Not yet reproducible in isolation.** During the 0193 port, minimal/medium repros ALL emitted + +linked correctly: a top-level `asm` in a single `case`; two `case` blocks; a `case` asm in an +imported module; a naked fn + `case` asm with `bl` to an exported fn; a one-sided +`inline if .linux { #import }` before the asm. **Only the full `library/modules/std/sched.sx` +dropped it** — so the trigger is an interaction with something else in that module, not the wrapped +`asm` alone. 
+ +The exact form that triggered it (now replaced on the branch, recoverable from history): the original +global trampoline + +```sx +asm { + #string T +.global _fib_tramp +_fib_tramp: + mov x0, x19 + bl _fib_dispatch + brk #0 +T, +}; +fib_tramp :: () extern; +``` + +wrapped as + +```sx +inline if OS == { + case .linux: asm { #string T +fib_tramp: + mov x0, x19 + bl fib_dispatch + br x30 +T, }; + case .macos: asm { #string T +.global _fib_tramp +_fib_tramp: + mov x0, x19 + bl _fib_dispatch + brk #0 +T, }; +} +``` + +dropped the asm in BOTH arms (whichever was taken). See `issues/0193-linux-fiber-port.patch` for the +full module context that triggers it, and the 0193 writeup for the larger investigation history. + +## Investigation prompt (ready to paste) + +> A top-level global `asm` block defining a symbol is dropped when wrapped in a comptime +> `inline if OS == { case … }` — but only inside the full `library/modules/std/sched.sx`; it can't be +> reproduced in isolation. Find where the surfaced `asm_global` node is lost between the +> comptime-conditional flatten and IR lowering. +> +> Key files: +> - `src/imports.zig` — `flattenComptimeConditionals` (line ~38) + `appendBranchDecls` (line ~72): the +> pre-pass that surfaces a taken comptime arm's top-level decls. It *appears* correct — it appends +> every node of the taken branch's block, `asm_global` included — so confirm the flattened slice +> actually carries the `asm_global` node (dump `flat_decls` at `src/imports.zig:932`). +> - `src/ir/lower/decl.zig` — `lowerMainAndComptime` (line ~1494), whose `.asm_global` arm (line ~1503) +> appends the verbatim template to `self.module.global_asm`. **Prime suspect:** does the lowering +> entry point feed `lowerMainAndComptime` the *flattened* decl list, or a pre-flatten `root.decls` +> that never contains the surfaced (formerly-nested) `asm_global`? 
If the asm-emission pass walks a +> different decl list than the one flattening wrote to, a surfaced `asm_global` is silently skipped. +> - `src/ir/emit_llvm.zig:384` — where `module.global_asm` is concatenated into the LLVM module. If the +> node never reached `global_asm`, it never emits. +> +> Steps: (1) build sched.sx's wrapped-asm variant (recover from `issues/0193-linux-fiber-port.patch` +> or git history of branch `fix/0192-qualified-import-const-comptime`), (2) instrument +> `flattenComptimeConditionals` to log whether the `asm_global` node survives into `flat_decls`, +> (3) instrument `lowerMainAndComptime` to log whether it ever *sees* an `asm_global`, (4) bisect what +> else in sched.sx must be present for the drop to occur (the isolation repros lacked it). +> Verification: `nm` the object shows the wrapped-asm symbol DEFINED (not `U`); the wrapped form links +> and runs identically to a plain unwrapped `asm`. +> +> **Verify it isn't a syntax issue first:** it reproduces with both the `case` and `if/else` forms, +> and plain unwrapped asm emits fine — so the wrapping, not the asm itself, is the trigger. That points +> to the flatten/lowering interaction, not user error. diff --git a/library/modules/std/sched.sx b/library/modules/std/sched.sx index c2f4b564..d918d764 100644 --- a/library/modules/std/sched.sx +++ b/library/modules/std/sched.sx @@ -13,19 +13,28 @@ // - `swap_context` (aarch64 `abi(.naked)`, 13-slot save area: x19..x28, fp, // lr, sp) saves the callee-saved registers + SP into `*from` and loads them // from `*to`, then `ret`s onto `to`'s stack. -// - the `_fib_tramp` global-asm first-entry trampoline: x19 holds the -// bootstrapped `*Fiber`; it moves it to x0 and `bl`s the exported generic -// dispatch `fib_dispatch`, which calls the body then switches back to the -// scheduler. 
+// - the `fib_tramp` first-entry trampoline (a naked sx fn): x19 holds the +// bootstrapped `*Fiber` and x20 = `&fib_dispatch`; it moves the fiber to x0 +// and `br`s through x20 to the C-ABI `fib_dispatch`, which calls the body +// then switches back to the scheduler. // - guarded `mmap` stacks: `[GUARD | usable]`, low GUARD page `mprotect`'d // PROT_NONE, 16-aligned top returned as the bootstrapped SP. // -// aarch64-macOS-pinned: the `swap_context` asm + the 13-slot save area are -// per-arch; the `mmap` flag constants (MAP_ANON = 0x1000) and the 16 KB guard -// page are Apple-specific. Runs end-to-end on a matching host, ir-only on a -// mismatch. +// aarch64-pinned (macOS + linux): the `swap_context` asm + the 13-slot save +// area are per-arch. The per-OS bits are branched at comptime — `mmap`'s +// MAP_ANON flag (`MAP_AP`) and the fd-readiness backend (kqueue on darwin, +// epoll on linux). Runs end-to-end on a matching aarch64 host, ir-only on an +// arch mismatch. #import "modules/std.sx"; kqb :: #import "modules/std/net/kqueue.sx"; +// The fd-readiness backend is per-OS: kqueue (kqb, above) on darwin, epoll on +// linux. The epoll import is scoped to the linux branch so darwin never pulls +// epoll's types into the concurrency examples' type tables (the same +// std-barrel-drift rule std.event.Loop follows); `block_on_fd` / the run loop +// reference `ep` only inside their own `inline if OS == .linux` arms. +inline if OS == .linux { + ep :: #import "modules/std/net/epoll.sx"; +} // --- libc mmap stack primitives ------------------------------------------- @@ -40,7 +49,14 @@ abort :: () -> noreturn extern libc "abort"; PROT_NONE :: 0; PROT_RW :: 3; // PROT_READ | PROT_WRITE -MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000) +// Exhaustive on the SUPPORTED OSes (linux/macOS), no default case: an +// unsupported target matches no case → MAP_AP undefined → a loud compile error +// on use rather than a silent wrong flag. 
(The fiber runtime is aarch64-only +// anyway — the swap_context asm — so only these two platforms are wired.) +inline if OS == { + case .linux: MAP_AP :: 0x22; // linux MAP_PRIVATE (0x2) | MAP_ANON (0x20) + case .macos: MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000) +} GUARD :: 16384; // one 16 KB page (aarch64-macOS) STACK :: 131072; // 128 KB usable per fiber @@ -172,10 +188,11 @@ Scheduler :: struct { self.n_spawned = self.n_spawned + 1; top := boot_stack(f, STACK); - f.ctx.regs[0] = xx f; // x19 = self - f.ctx.regs[10] = 0; // fp - f.ctx.regs[11] = xx fib_tramp; // lr → trampoline - f.ctx.regs[12] = top; // sp + f.ctx.regs[0] = xx f; // x19 = self (→ x0 in the tramp) + f.ctx.regs[1] = xx fib_dispatch; // x20 = dispatch entry (tramp `br`s to it) + f.ctx.regs[10] = 0; // fp + f.ctx.regs[11] = xx fib_tramp; // lr → trampoline + f.ctx.regs[12] = top; // sp f.state = .ready; enqueue(self, f); @@ -239,12 +256,13 @@ Scheduler :: struct { // but was woken by another path (a manual wake, a Task completion), its // `IoWaiter` would otherwise survive pointing at a fiber that runs to // completion and is reaped (stack munmap'd + Fiber freed). A later - // kqueue drain matching that stale record would `wake` freed memory. - // Evict it here. NOTE: we do NOT EV_DELETE the kqueue registration — it - // is EV_ONESHOT, so a never-fired registration simply lingers in the - // kernel queue until the fd is readable, at which point the drain finds - // no matching waiter and ignores it (see `run`). The fd is the example's - // to close; closing it auto-removes any pending registration. + // readiness drain matching that stale record would `wake` freed memory. + // Evict it here. 
The kernel-side registration is handled per-OS inside + // `cancel_io_waiter_for`: on darwin the EV_ONESHOT kqueue registration is + // left to linger (a never-fired one-shot the drain ignores; the fd's + // owner closes it, auto-removing it), but on linux the EPOLLONESHOT + // registration stays enabled and must be `EPOLL_CTL_DEL`'d (else it could + // fire later with no waiter and would block a re-arm of the same fd). cancel_io_waiter_for(self, f); self.n_suspended = self.n_suspended - 1; f.state = .ready; @@ -333,20 +351,38 @@ Scheduler :: struct { } j = j + 1; } - // Lazily open the kqueue fd the first time fd-blocking is used. + // Lazily open the event-queue fd the first time fd-blocking is used: + // kqueue on darwin, epoll on linux. `self.kq` holds whichever — it is + // just "the readiness queue fd". if self.kq < 0 { - self.kq = kqb.kqueue(); + inline if OS == { + case .linux: self.kq = ep.ep_create(); + case .macos: self.kq = kqb.kqueue(); + } if self.kq < 0 { - print("sched: kqueue() failed to open the event queue\n"); + print("sched: failed to open the event queue\n"); abort(); } } - // Arm a one-shot read-readiness registration for `fd`. udata is unused - // (we match the waiter by fd in the drain), so pass 0. - chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0); - if !kqb.kq_apply(self.kq, chg) { - print("sched: kevent() failed to register fd {} for read readiness\n", fd); - abort(); + // Arm a one-shot read-readiness registration for `fd`, matched back by + // the run-loop drain (kqueue by ident; epoll stashes the fd in `data`). + // darwin EV_ONESHOT auto-removes the registration on fire; epoll's + // EPOLLONESHOT only DISABLES it, so the linux paths additionally + // EPOLL_CTL_DEL on fire (run) and on early-wake (cancel_io_waiter_for). 
+ inline if OS == { + case .linux: { + if !ep.ep_ctl(self.kq, ep.EPOLL_CTL_ADD, fd, ep.EPOLLIN | ep.EPOLLONESHOT) { + print("sched: epoll_ctl() failed to register fd {} for read readiness\n", fd); + abort(); + } + } + case .macos: { + chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0); + if !kqb.kq_apply(self.kq, chg) { + print("sched: kevent() failed to register fd {} for read readiness\n", fd); + abort(); + } + } } // Record the waiter BEFORE parking — the run loop matches the fired // event's ident back to this record. Long-lived-container rule: the @@ -407,20 +443,42 @@ Scheduler :: struct { // kernel reports at least one fd ready, then wake every waiter whose // fd fired. (null timeout via -1 → wait forever.) if self.io_waiters.len > 0 { - evbuf : [MAXEV]kqb.Kevent = ---; - n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1); - if n < 0 { - print("sched: kevent() wait failed while blocking on fd readiness\n"); - abort(); - } - // For each fired event, find the io-waiter whose fd matches its - // ident, evict it, and wake its fiber. EV_ONESHOT already removed - // the kernel registration, so we only drop the waiter record. - i := 0; - while i < n { - ready_fd : i32 = xx evbuf[i].ident; - wake_io_waiter_for_fd(self, ready_fd); - i = i + 1; + // BLOCK on the readiness queue until ≥1 fd fires (timeout -1 = + // forever), then for each fired event match the fd back to its + // io-waiter, evict the record, and wake the fiber. 
+ inline if OS == { + case .linux: { + evbuf : [MAXEV]ep.EpollEvent = ---; + n := ep.ep_wait(self.kq, .{ ptr = @evbuf[0], len = MAXEV }, MAXEV, -1); + if n < 0 { + print("sched: epoll_wait() failed while blocking on fd readiness\n"); + abort(); + } + i := 0; + while i < n { + ready_fd := ep.ev_fd(evbuf[i]); + wake_io_waiter_for_fd(self, ready_fd); + // EPOLLONESHOT only DISABLED the registration; remove it + // fully so the fd can be re-armed by a future block_on_fd + // (kqueue's EV_ONESHOT removes it for free). + ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, ready_fd, 0); + i = i + 1; + } + } + case .macos: { + evbuf : [MAXEV]kqb.Kevent = ---; + n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1); + if n < 0 { + print("sched: kevent() wait failed while blocking on fd readiness\n"); + abort(); + } + i := 0; + while i < n { + ready_fd : i32 = xx evbuf[i].ident; + wake_io_waiter_for_fd(self, ready_fd); + i = i + 1; + } + } } continue; } @@ -539,23 +597,48 @@ ASM }; } -// First-entry trampoline: a fiber's bootstrapped LR points here. x19 holds the -// `*Fiber` (preset in the saved context); move it to x0 and call the generic -// dispatch. -asm { - #string T -.global _fib_tramp -_fib_tramp: +// First-entry trampoline: a fiber's bootstrapped LR points here, with x19 = +// `*Fiber` and x20 = `&fib_dispatch` (both preset in the saved context by +// `spawn`, both callee-saved so `swap_context` restores them on first entry). +// Move the fiber to x0 and tail-branch to dispatch via the REGISTER (x20) — so +// there is no hand-written global-asm symbol and nothing here needs per-OS +// symbol naming (`_fib_tramp` on darwin vs `fib_tramp` on linux) or a `bl` to a +// named export. As a naked sx fn `fib_tramp`'s own symbol is emitted with the +// platform-correct name automatically, so `spawn`'s `xx fib_tramp` resolves on +// every target. 
This register-indirect bootstrap replaced an OS-conditional +// global `asm` block (a top-level `asm` wrapped in an `inline if` is dropped in +// this module's context — see issues/0194) and sidesteps the hand-written +// symbol entirely, which is cleaner regardless. +fib_tramp :: () abi(.naked) { + asm volatile { + #string T mov x0, x19 - bl _fib_dispatch - brk #0 -T, -}; -fib_tramp :: () extern; + br x20 +T + }; +} -// The ONE place that runs a fiber body. Reached only from `_fib_tramp` on first +// The ONE place that runs a fiber body. Reached only from `fib_tramp` on first // entry, on the fiber's own fresh stack. Runs the body, marks the fiber done, // and switches back to the scheduler — never returns past the final switch. +// +// `export "fib_dispatch"` is MANDATORY, not decorative: it pins this fn to the +// **C ABI** (first real arg `self` in x0). The trampoline hands the fiber over +// in x0 (`mov x0, x19; br x20`), which is exactly C-ABI. Drop the export and the +// fn reverts to sx's INTERNAL calling convention, which reserves x0 for the +// implicit `context` pointer and shifts `self` to x1 — so the trampoline's x0 +// would land in the context slot and `self` would be read from a garbage x1. On +// first entry that garbage happens to alias `&fiber.ctx == self` (left in x1 by +// the scheduler's prior `swap_context`), so the body runs once; but inside it +// the closure loads `[Fiber+8] == regs[1] == &fib_dispatch` as its "first +// capture" and re-invokes `fib_dispatch` forever → stack overflow → bus error +// (issue 0193 Bug A, observed only on the go/wait/sleep capstone 1817). +// +// One consequence of the C-ABI boundary: an exported fn has no implicit +// `context` param, so `self.body()` runs under the static `__sx_default_context` +// — NOT whatever `push Context { allocator = ... }` was in force at the +// `run()` call site. 
Fiber bodies do not inherit a caller-scoped allocator; a +// body that needs one must capture it explicitly (the long-lived-container rule). fib_dispatch :: (self: *Fiber) export "fib_dispatch" { self.body(); self.state = .done; @@ -687,7 +770,19 @@ cancel_io_waiter_for :: (self: *Scheduler, f: *Fiber) { i := 0; while i < self.io_waiters.len { if self.io_waiters.items[i].fiber == f { - remove_io_waiter(self, i); + // Early-wake: the fiber is re-readied by another path while its fd + // registration is still armed. kqueue's EV_ONESHOT lingers + // harmlessly (a never-fired one-shot the drain ignores); epoll's + // EPOLLONESHOT registration stays enabled — it could fire later with + // no waiter, and blocks a re-arm of the same fd — so remove it. + inline if OS == { + case .linux: { + fd := self.io_waiters.items[i].fd; + remove_io_waiter(self, i); + if self.kq >= 0 { ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, fd, 0); } + } + case .macos: remove_io_waiter(self, i); + } return; } i = i + 1; diff --git a/readme.md b/readme.md index 1929461b..31fd4fb4 100644 --- a/readme.md +++ b/readme.md @@ -30,6 +30,7 @@ main :: () { - Pattern matching on enums, optionals, and type categories - C interop via `extern` / `export` and `#import c` - Inline assembly as a first-class expression +- Colorblind async via a pure-sx cooperative fiber runtime (no function coloring) - Targets: macOS (ARM64, x86_64), Linux (x86_64, ARM64), Windows (x86_64), WebAssembly ## Usage @@ -511,6 +512,51 @@ fence(.seq_cst); // standalone memory fence combinations are compile errors. The same operations run at compile time (`#run`) under single-threaded semantics. +### Async / Concurrency (`modules/std/sched.sx`) + +A pure-sx cooperative fiber runtime — **colorblind async**, with no `async` / +`await` keywords and no function coloring. Any function can suspend; a `Scheduler` +drives any number of stackful fibers, each on its own guard-paged stack. 
The +high-level API is `go` to spawn a task and `wait` to suspend until it completes: + +```sx +#import "modules/std.sx"; +sched :: #import "modules/std/sched.sx"; + +main :: () { + s := sched.Scheduler.init(); + ps := @s; // closures capture by value — capture a pointer to the scheduler + + // The coordinator runs as a fiber so `wait` has a fiber to park. + s.spawn(() => { + a := ps.go(() -> i64 => { ps.sleep(30); 100 }); // launch async tasks + b := ps.go(() -> i64 => { ps.sleep(10); 20 }); + c := ps.go(() -> i64 => { ps.sleep(20); 3 }); + + sum := (a.wait() or 0) + (b.wait() or 0) + (c.wait() or 0); // 123 + print("sum: {}\n", sum); + }); + + s.run(); // drive the scheduler until all fibers finish +} +``` + +Tasks complete in deadline order, not spawn or await order. The runtime offers: + +- **`go(work) -> *Task($R)`** / **`wait() -> R !TaskErr`** / **`cancel()`** — the + task layer. `wait` rides the `!` error channel so a cancel surfaces as + `error.Canceled`. +- **`spawn`**, **`yield_now`**, **`suspend_self`**, **`wake`** — the raw fiber + primitives the task layer is built on. +- **`sleep(ms)`** / **`now_ms()`** — timer-driven suspension on a virtual clock + (deterministic, no real wall time). +- **`block_on_fd(fd, want_read)`** — suspend until a file descriptor is ready, + backed by kqueue (darwin) or epoll (linux). + +It's an M:1 model (cooperative, no preemption — so no data races between fibers +and no atomics needed across them), built on `abi(.naked)` context switching over +guarded `mmap` stacks. Currently aarch64-pinned (macOS + Linux). + ### Command-line interface (`modules/std/cli.sx`) `std.cli` builds command-line front-ends over an explicit logical argv: `os_args`