fix: aarch64-linux port of the M:1 fiber runtime (sched.sx)

Port library/modules/std/sched.sx to run on aarch64-linux alongside
aarch64-macOS. Program output is byte-identical on both; the linux runs
execute inside an Apple `container`.

Per-OS bits are comptime-branched:
- MAP_AP (mmap MAP_ANON flag): linux 0x22 / macOS 0x1002.
- fd-readiness backend: epoll on linux, kqueue on darwin (epoll import
  scoped to the linux branch). block_on_fd, the run-loop Mode-2 drain,
  and cancel_io_waiter_for each branch per OS. The epoll paths also
  issue EPOLL_CTL_DEL on fire and on early-wake, because EPOLLONESHOT
  only disables a registration, whereas kqueue EV_ONESHOT removes it
  automatically.
- first-entry trampoline: the per-OS hand-written global-asm symbol is
  replaced by a naked sx fn fib_tramp (mov x0, x19; br x20) with
  register-indirect dispatch (spawn presets regs[1], i.e. x20, to
  &fib_dispatch), so no per-OS .global symbol is needed.

Fixes issue 0193 Bug A: the trampoline redesign bus-errored on the
go/wait/sleep capstone (1817) until `export "fib_dispatch"` was restored.
Without the export, fib_dispatch reverts to sx's internal ABI (x0 =
implicit context, first arg self shifted to x1) while the trampoline
hands self over in x0 (C-ABI); on first entry the body runs (x1 happens
to alias self) but the closure then loads regs[1] == &fib_dispatch as its
first capture and re-invokes fib_dispatch forever -> stack overflow ->
bus error. The export pins fib_dispatch to the C-ABI (self in x0),
matching the trampoline. Root cause found via lldb on an AOT build;
confirmed against the compiler source.

Bug B (a top-level asm block wrapped in inline-if is dropped during the
comptime-conditional flatten) is carved out to issue 0194 (OPEN) -- no
live trigger remains, since the naked-fn trampoline sidesteps it.

1811/1814/1816/1817 run byte-identical on the aarch64-macOS host and in
an aarch64-linux container; full suite green (817/0). Documents the fiber
runtime in readme.md.
agra
2026-06-26 11:32:01 +03:00
parent 7218280bf0
commit 22f4719e83
5 changed files with 370 additions and 65 deletions


@@ -4,7 +4,32 @@ Companion to [PLAN-FIBERS.md](PLAN-FIBERS.md). Update after every step (one step
 per the cadence rule). New corpus category: `18xx` concurrency.
 ## Last completed step
-**B1 follow-up — `Scheduler.deinit` (close the bounded leaks).** Post-B1 non-blocking cleanup: a
+**B1.6 — aarch64-LINUX port of the M:1 fiber runtime (sched.sx).** `library/modules/std/sched.sx`
+now runs end-to-end on aarch64-linux as well as aarch64-macOS, validated **byte-identical** on both
+via Apple `container` (static ELF, no emulation). The per-OS bits are comptime-branched:
+- `MAP_AP` (mmap MAP_ANON flag) — `inline if OS == { case .linux: 0x22 case .macos: 0x1002 }`,
+  exhaustive on the supported OSes (no default → a new target fails loud on use).
+- The fd-readiness backend — kqueue on darwin, **epoll on linux**. The `epoll` import is scoped to
+  the linux branch (`inline if OS == .linux { ep :: #import "modules/std/net/epoll.sx" }`) so darwin
+  never pulls epoll types into the concurrency examples (the std-barrel-drift rule). `block_on_fd`, the
+  run-loop Mode-2 drain, and `cancel_io_waiter_for` each branch kqueue/epoll; epoll additionally
+  `EPOLL_CTL_DEL`s on fire + on early-wake (EPOLLONESHOT only DISABLES, kqueue EV_ONESHOT auto-removes).
+- The first-entry trampoline was redesigned from a per-OS hand-written global-asm symbol to a **naked
+  sx fn** `fib_tramp` (`mov x0, x19; br x20`) + register-indirect dispatch (spawn presets
+  `regs[1] == x20 == &fib_dispatch`), so no per-OS `.global _fib_tramp`/`fib_tramp` symbol literal is
+  needed. This sidesteps a compiler bug (wrapped top-level `asm` dropped — now **issue 0194**, OPEN).
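The one-shot difference in the fd-readiness bullet can be observed from user space. Below is a minimal sketch in Python, using the standard library's `select.epoll` wrapper rather than the sx `ep` bindings; the helper name `oneshot_demo` is hypothetical, and the function returns `None` on hosts without epoll. It suggests why the epoll paths need an explicit `EPOLL_CTL_DEL`: a fired `EPOLLONESHOT` registration goes quiet but still occupies the fd's slot, so a fresh `EPOLL_CTL_ADD` fails until the entry is removed.

```python
import errno
import os
import select


def oneshot_demo():
    """Show that EPOLLONESHOT disables, but does not remove, a registration."""
    if not hasattr(select, "epoll"):  # epoll is Linux-only
        return None
    r, w = os.pipe()
    ep = select.epoll()
    try:
        ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)  # EPOLL_CTL_ADD
        os.write(w, b"x")  # make the read end ready; it is never drained below

        fired_once = [fd for fd, _ in ep.poll(0)] == [r]
        # The byte is still unread, yet the disabled registration stays silent.
        silent_after_fire = ep.poll(0) == []

        # A second ADD on the same fd fails with EEXIST: the entry still exists.
        try:
            ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)
            readd_blocked = False
        except OSError as exc:
            readd_blocked = exc.errno == errno.EEXIST

        ep.unregister(r)  # EPOLL_CTL_DEL
        ep.register(r, select.EPOLLIN | select.EPOLLONESHOT)  # re-arm now works
        rearmed_after_del = [fd for fd, _ in ep.poll(0)] == [r]

        return {
            "fired_once": fired_once,
            "silent_after_fire": silent_after_fire,
            "readd_blocked": readd_blocked,
            "rearmed_after_del": rearmed_after_del,
        }
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

`EPOLL_CTL_MOD` is the other way to re-enable a disabled one-shot entry; the port as described deletes the entry instead, which also covers the early-wake case where no waiter remains.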
+**Bug fixed en route (issue 0193 Bug A):** the tramp redesign initially bus-errored on the 1817
+go/wait/sleep capstone (both OSes) because the WIP had dropped `export "fib_dispatch"`. Without the
+export `fib_dispatch` uses sx's internal ABI (x0 = implicit `context`, `self` shifted to x1), but the
+trampoline hands `self` in x0 (C-ABI) → on first entry the body runs (x1 happens to alias `self`) but
+the closure then loads `regs[1] == &fib_dispatch` as its first capture and recurses forever → stack
+overflow. **Fix: restore `export "fib_dispatch"`** (pins it to C-ABI, `self` in x0). Root cause found
+via lldb on an AOT macOS build; confirmed by an adversarial source review (`src/ir/lower/decl.zig`).
+The 1817 capstone in the suite guards the fix. Suite GREEN **817/0**; 1811/1814/1816/1817 byte-identical
+macOS host ↔ aarch64-linux container.
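The register mismatch behind Bug A can be pictured with a toy model. The Python sketch below is purely illustrative: the `Regs` class and the two `dispatch_*` functions are hypothetical stand-ins, not sx or compiler code, and it models only which register each convention reads `self` from, not the later closure recursion.

```python
from dataclasses import dataclass


@dataclass
class Regs:
    """A tiny slice of the aarch64 register file (toy model)."""
    x0: object = None
    x1: object = None
    x19: object = None
    x20: object = None


def fib_tramp(regs):
    """Model of the trampoline body: `mov x0, x19; br x20`."""
    regs.x0 = regs.x19
    return regs.x20(regs)


def dispatch_c_abi(regs):
    # With the export: first real argument `self` arrives in x0.
    return {"context": None, "self": regs.x0}


def dispatch_internal_abi(regs):
    # Without the export (as described above): x0 is taken as the
    # implicit context and `self` is read from x1.
    return {"context": regs.x0, "self": regs.x1}


def first_entry(dispatch, fiber, stale_x1):
    """Bootstrap as spawn does: x19 = fiber, x20 = dispatch entry."""
    regs = Regs(x1=stale_x1, x19=fiber, x20=dispatch)
    return fib_tramp(regs)
```

Calling `first_entry(dispatch_internal_abi, fiber, stale)` yields the fiber in the `context` slot and whatever was left in x1 as `self`, which is the mismatch the restored export removes.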
+### Earlier — B1 follow-up — `Scheduler.deinit` (close the bounded leaks). Post-B1 non-blocking cleanup: a
 terminal `deinit` on `library/modules/std/sched.sx`'s `Scheduler` releases the resources B1 documented
 as leaked. Frees, in order: (1) any fibers still enqueued ready (leak-safety net for `spawn`/`go`
 without `run()`: `munmap` stack + free struct; a suspended off-queue fiber is unreachable, but a clean
@@ -401,12 +426,12 @@ env remains unfreeable (language limitation). Locked by `18xx` 1800-1820 (nake
 blocking async, the switch + §10.7 stress gate + guarded stacks + Win64 sibling, scheduler round-robin,
 suspend/wake, async go/wait/cancel, sim-timer ordering, timer early-wake eviction, kqueue pipe I/O, the
 **1817 end-to-end capstone**, sleep-negative/double-wait guards, and **1820 scheduler-deinit**). Suite
-GREEN **759/0**, committed.
+GREEN **817/0**, committed. **B1.6: now also runs on aarch64-linux** (epoll fd-backend + comptime-branched
+`MAP_AP` + naked-fn trampoline) — validated byte-identical to macOS in an Apple `container`.
-Future work (none blocking B1): a **linux epoll twin** of `block_on_fd` (mirror via `std/net/epoll`;
-OS-neutral facade `std.event`) — B1.4c wired macOS kqueue only; routing the suspending async through
-the erased `context.io` (forces sched.sx into every std consumer + duplicates the `_fib_tramp` global
-asm — deferred to the M:N model, where the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready`/
+Future work (none blocking B1): routing the suspending async through
+the erased `context.io` (forces sched.sx into every std consumer — deferred to the M:N model, where
+the `Io` protocol's `spawn_raw`/`suspend_raw`/`ready`/
 `arm_timer`/`poll` hooks take over); `Future(void)`/`timeout` (issue 0150); freeing the heap-Task /
 closure-env / kq-fd (a Scheduler `deinit` + closure-env-ownership affordance). **Next carve: Stream
 B2** (channels / structured cancel / async stdlib) — see PLAN-CHANNELS.md when started.
@@ -684,6 +709,19 @@ incomplete); a dedicated effort; lambda workers are the idiom meanwhile.
 trusted. `18xx` asserts program-emitted ordering contracts, not raw interleaving.
 ## Log
+- **B1.6 — aarch64-linux port of sched.sx.** Comptime-branched the per-OS bits: `MAP_AP` (linux
+  `0x22` / macOS `0x1002`), the fd-readiness backend (epoll on linux, kqueue on darwin — epoll import
+  scoped to the linux branch; `block_on_fd` / run-loop Mode-2 / `cancel_io_waiter_for` each branch,
+  epoll `EPOLL_CTL_DEL`s on fire + early-wake), and the first-entry trampoline (per-OS global-asm
+  symbol → naked sx fn `fib_tramp` + register-indirect `br x20` to `&fib_dispatch` preset in
+  `regs[1]`). **Fixed issue 0193 Bug A:** the tramp redesign bus-errored on 1817 (both OSes) until
+  `export "fib_dispatch"` was restored — without it the fn uses sx's internal ABI (x0 = implicit
+  `context`, `self` → x1) while the trampoline supplies `self` in x0, so the closure loads
+  `regs[1] == &fib_dispatch` as its first capture and recurses forever → stack-overflow bus error.
+  Root cause found via lldb (AOT macOS build) + an adversarial source review. **Bug B** (wrapped
+  top-level `asm` dropped) carved to **issue 0194** (OPEN; no live trigger — the naked-fn tramp
+  sidesteps it). Validated byte-identical on aarch64-macOS host AND aarch64-linux Apple `container`
+  for 1811/1814/1816/1817; full suite GREEN **817/0**.
 - **B1 follow-up — `Scheduler.deinit`.** Closes the bounded leaks B1 documented. Added a `task_allocs:
   List(*void)` field (appended in `go` so the scheduler can reach its generic `Task($R)`s) + a canonical
   `close` extern, then a terminal idempotent `deinit`: reap leftover ready fibers (`munmap` + free) →


@@ -1,8 +1,35 @@
 # Issue 0193 — linux fiber-runtime port (sched.sx) + a wrapped top-level `asm` drop
-Status: **OPEN.** Two intertwined items uncovered while porting `library/modules/std/sched.sx`
-(the M:1 fiber runtime) to aarch64-linux. The WIP sched.sx port is preserved in
-`git stash` (`stash@{0}`, "WIP on fix/0192-qualified-import-const-comptime") — pop it to resume.
+> **RESOLVED — port landed on aarch64-linux.**
+>
+> **Bug A (register-indirect trampoline bus-errors on 1817): FIXED.** Root cause found via lldb on an
+> AOT macOS build (the bug reproduced on macOS too, so no container needed): the WIP port had dropped
+> `export "fib_dispatch"` from `fib_dispatch`. Without the export the fn reverts to sx's INTERNAL
+> calling convention, which reserves x0 for the implicit `context` pointer and shifts the first real
+> arg `self` to x1 — but the trampoline (`mov x0, x19; br x20`) hands the fiber over in x0, C-ABI
+> style. On first entry x1 coincidentally aliases `&fiber.ctx == self` (left there by the scheduler's
+> prior `swap_context(from, to)`, x1 = to), so the body runs once; but inside it the closure loads
+> `[Fiber+8] == ctx.regs[1] == &fib_dispatch` as its "first capture" and re-invokes `fib_dispatch`
+> forever → stack overflow → bus error. **Fix:** restore `export "fib_dispatch"` so the fn keeps the
+> C-ABI (`self` in x0), matching what the trampoline supplies — a one-line library change, no compiler
+> change. The register-indirect naked-fn trampoline design is kept (it sidesteps Bug B's hand-written
+> per-OS global-asm symbol). Adversarially reviewed against the compiler source (`src/ir/lower/decl.zig`
+> `funcWantsImplicitCtx`/`wants_ctx`/`CallingConvention.c`); root cause + fix confirmed CORRECT.
+>
+> **Validation:** 1811 / 1814 / 1816 / 1817 (the go/wait/sleep capstone) all run **byte-identical** on
+> the aarch64-macOS host AND in an aarch64-linux Apple `container` (`sum: 123`, completion order
+> `2@10 3@20 1@30`, etc.). Full `zig build test` macOS suite GREEN (817/0).
+>
+> **Bug B (wrapped top-level `asm` dropped): carved out to `issues/0194-wrapped-toplevel-asm-dropped.md`
+> as an OPEN compiler bug.** It is no longer triggered anywhere in the tree (the port no longer uses a
+> wrapped global-asm block), so it does not block anything — but it is a real defect and stays filed.
+>
+> Original writeup below for history.
+---
+Status: **(historical — see RESOLVED banner above).** Two intertwined items uncovered while porting
+`library/modules/std/sched.sx` (the M:1 fiber runtime) to aarch64-linux.
 The epoll *bindings* + `std.event.Loop` epoll backend are already committed (`cc137002`) and
 **runtime-validated on real Linux** via Apple `container` (see the event.sx VALIDATION note / the


@@ -0,0 +1,99 @@
# Issue 0194 — a top-level global `asm` block wrapped in `inline if` / `case` is DROPPED
Status: **OPEN.** Carved out of issue 0193 (the linux fiber-runtime port). The port itself is
RESOLVED — it sidesteps this bug entirely by using a naked-sx-fn trampoline (`fib_tramp`) plus a
register-indirect `br x20` instead of a hand-written global-asm symbol, so there is **no live trigger
for this bug in the tree today.** It is filed standalone so the compiler defect is not lost.
## Symptom
A top-level global `asm { … }` block that defines a symbol (e.g. `.global _foo` / `_foo: …`) is
**not emitted** when it is wrapped in a comptime `inline if OS == { case … }` (or
`inline if OS == .linux { asm } else { asm }`). `nm main.o` shows the symbol as `U` (undefined) and
the link fails on both platforms. A PLAIN, unwrapped top-level `asm { … }` emits fine.
- **Observed:** symbol undefined, link error.
- **Expected:** the `asm` block in the taken comptime arm emits its template into the module's global
asm exactly as an unwrapped block would (the comptime-conditional pre-pass already surfaces the
taken arm's *other* top-level decls — fns, consts, imports — correctly; only the `asm_global` node
is lost).
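The `U` versus defined distinction in the symptom can be checked mechanically while bisecting. The following is a small Python sketch that parses the text output of `nm`; the helper names and the sample listing are hypothetical, written for illustration rather than taken from the repo, and only the plain `U` type letter is treated as undefined.

```python
def nm_symbol_types(nm_output):
    """Map symbol name -> nm type letter from `nm` text output."""
    types = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3:      # "<address> <type> <name>"
            _, sym_type, name = fields
        elif len(fields) == 2:    # "<type> <name>" (no address, e.g. undefined)
            sym_type, name = fields
        else:
            continue              # blank or unrecognized line
        types[name] = sym_type
    return types


def is_defined(nm_output, name):
    """True only if the symbol is listed with a type other than 'U'."""
    sym_type = nm_symbol_types(nm_output).get(name)
    return sym_type is not None and sym_type != "U"


# Illustrative listing shaped like the failure described above.
SAMPLE = """\
0000000000000000 T _main
                 U _fib_tramp
0000000000000040 T _fib_dispatch
"""
```

In practice the listing would come from running `nm` on the built object; with the sample above, `is_defined(SAMPLE, "_fib_tramp")` is false, which is the dropped-asm signature.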
## Reproduction
**Not yet reproducible in isolation.** During the 0193 port, minimal/medium repros ALL emitted +
linked correctly: a top-level `asm` in a single `case`; two `case` blocks; a `case` asm in an
imported module; a naked fn + `case` asm with `bl` to an exported fn; a one-sided
`inline if .linux { #import }` before the asm. **Only the full `library/modules/std/sched.sx`
dropped it** — so the trigger is an interaction with something else in that module, not the wrapped
`asm` alone.
The exact form that triggered it (now replaced on the branch, recoverable from history): the original
global trampoline
```sx
asm {
#string T
.global _fib_tramp
_fib_tramp:
mov x0, x19
bl _fib_dispatch
brk #0
T,
};
fib_tramp :: () extern;
```
wrapped as
```sx
inline if OS == {
case .linux: asm { #string T
fib_tramp:
mov x0, x19
bl fib_dispatch
br x30
T, };
case .macos: asm { #string T
.global _fib_tramp
_fib_tramp:
mov x0, x19
bl _fib_dispatch
brk #0
T, };
}
```
dropped the asm in BOTH arms (whichever was taken). See `issues/0193-linux-fiber-port.patch` for the
full module context that triggers it, and the 0193 writeup for the larger investigation history.
## Investigation prompt (ready to paste)
> A top-level global `asm` block defining a symbol is dropped when wrapped in a comptime
> `inline if OS == { case … }` — but only inside the full `library/modules/std/sched.sx`; it can't be
> reproduced in isolation. Find where the surfaced `asm_global` node is lost between the
> comptime-conditional flatten and IR lowering.
>
> Key files:
> - `src/imports.zig` — `flattenComptimeConditionals` (line ~38) + `appendBranchDecls` (line ~72): the
> pre-pass that surfaces a taken comptime arm's top-level decls. It *appears* correct — it appends
> every node of the taken branch's block, `asm_global` included — so confirm the flattened slice
> actually carries the `asm_global` node (dump `flat_decls` at `src/imports.zig:932`).
> - `src/ir/lower/decl.zig` — `lowerMainAndComptime` (line ~1494), whose `.asm_global` arm (line ~1503)
> appends the verbatim template to `self.module.global_asm`. **Prime suspect:** does the lowering
> entry point feed `lowerMainAndComptime` the *flattened* decl list, or a pre-flatten `root.decls`
> that never contains the surfaced (formerly-nested) `asm_global`? If the asm-emission pass walks a
> different decl list than the one flattening wrote to, a surfaced `asm_global` is silently skipped.
> - `src/ir/emit_llvm.zig:384` — where `module.global_asm` is concatenated into the LLVM module. If the
> node never reached `global_asm`, it never emits.
>
> Steps: (1) build sched.sx's wrapped-asm variant (recover from `issues/0193-linux-fiber-port.patch`
> or git history of branch `fix/0192-qualified-import-const-comptime`), (2) instrument
> `flattenComptimeConditionals` to log whether the `asm_global` node survives into `flat_decls`,
> (3) instrument `lowerMainAndComptime` to log whether it ever *sees* an `asm_global`, (4) bisect what
> else in sched.sx must be present for the drop to occur (the isolation repros lacked it).
> Verification: `nm` the object shows the wrapped-asm symbol DEFINED (not `U`); the wrapped form links
> and runs identically to a plain unwrapped `asm`.
>
> **Verify it isn't a syntax issue first:** it reproduces with both the `case` and `if/else` forms,
> and plain unwrapped asm emits fine — so the wrapping, not the asm itself, is the trigger. That points
> to the flatten/lowering interaction, not user error.


@@ -13,19 +13,28 @@
 // - `swap_context` (aarch64 `abi(.naked)`, 13-slot save area: x19..x28, fp,
 // lr, sp) saves the callee-saved registers + SP into `*from` and loads them
 // from `*to`, then `ret`s onto `to`'s stack.
-// - the `_fib_tramp` global-asm first-entry trampoline: x19 holds the
-// bootstrapped `*Fiber`; it moves it to x0 and `bl`s the exported generic
-// dispatch `fib_dispatch`, which calls the body then switches back to the
-// scheduler.
+// - the `fib_tramp` first-entry trampoline (a naked sx fn): x19 holds the
+// bootstrapped `*Fiber` and x20 = `&fib_dispatch`; it moves the fiber to x0
+// and `br`s through x20 to the C-ABI `fib_dispatch`, which calls the body
+// then switches back to the scheduler.
 // - guarded `mmap` stacks: `[GUARD | usable]`, low GUARD page `mprotect`'d
 // PROT_NONE, 16-aligned top returned as the bootstrapped SP.
 //
-// aarch64-macOS-pinned: the `swap_context` asm + the 13-slot save area are
-// per-arch; the `mmap` flag constants (MAP_ANON = 0x1000) and the 16 KB guard
-// page are Apple-specific. Runs end-to-end on a matching host, ir-only on a
-// mismatch.
+// aarch64-pinned (macOS + linux): the `swap_context` asm + the 13-slot save
+// area are per-arch. The per-OS bits are branched at comptime — `mmap`'s
+// MAP_ANON flag (`MAP_AP`) and the fd-readiness backend (kqueue on darwin,
+// epoll on linux). Runs end-to-end on a matching aarch64 host, ir-only on an
+// arch mismatch.
 #import "modules/std.sx";
 kqb :: #import "modules/std/net/kqueue.sx";
+// The fd-readiness backend is per-OS: kqueue (kqb, above) on darwin, epoll on
+// linux. The epoll import is scoped to the linux branch so darwin never pulls
+// epoll's types into the concurrency examples' type tables (the same
+// std-barrel-drift rule std.event.Loop follows); `block_on_fd` / the run loop
+// reference `ep` only inside their own `inline if OS == .linux` arms.
+inline if OS == .linux {
+    ep :: #import "modules/std/net/epoll.sx";
+}
 // --- libc mmap stack primitives -------------------------------------------
@@ -40,7 +49,14 @@ abort :: () -> noreturn extern libc "abort";
 PROT_NONE :: 0;
 PROT_RW :: 3; // PROT_READ | PROT_WRITE
-MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000)
+// Exhaustive on the SUPPORTED OSes (linux/macOS), no default case: an
+// unsupported target matches no case → MAP_AP undefined → a loud compile error
+// on use rather than a silent wrong flag. (The fiber runtime is aarch64-only
+// anyway — the swap_context asm — so only these two platforms are wired.)
+inline if OS == {
+    case .linux: MAP_AP :: 0x22; // linux MAP_PRIVATE (0x2) | MAP_ANON (0x20)
+    case .macos: MAP_AP :: 0x1002; // macOS MAP_PRIVATE (0x2) | MAP_ANON (0x1000)
+}
 GUARD :: 16384; // one 16 KB page (aarch64-macOS)
 STACK :: 131072; // 128 KB usable per fiber
@@ -172,10 +188,11 @@ Scheduler :: struct {
 self.n_spawned = self.n_spawned + 1;
 top := boot_stack(f, STACK);
-f.ctx.regs[0] = xx f; // x19 = self
-f.ctx.regs[10] = 0; // fp
-f.ctx.regs[11] = xx fib_tramp; // lr → trampoline
-f.ctx.regs[12] = top; // sp
+f.ctx.regs[0] = xx f; // x19 = self (→ x0 in the tramp)
+f.ctx.regs[1] = xx fib_dispatch; // x20 = dispatch entry (tramp `br`s to it)
+f.ctx.regs[10] = 0; // fp
+f.ctx.regs[11] = xx fib_tramp; // lr → trampoline
+f.ctx.regs[12] = top; // sp
 f.state = .ready;
 enqueue(self, f);
@@ -239,12 +256,13 @@ Scheduler :: struct {
 // but was woken by another path (a manual wake, a Task completion), its
 // `IoWaiter` would otherwise survive pointing at a fiber that runs to
 // completion and is reaped (stack munmap'd + Fiber freed). A later
-// kqueue drain matching that stale record would `wake` freed memory.
-// Evict it here. NOTE: we do NOT EV_DELETE the kqueue registration — it
-// is EV_ONESHOT, so a never-fired registration simply lingers in the
-// kernel queue until the fd is readable, at which point the drain finds
-// no matching waiter and ignores it (see `run`). The fd is the example's
-// to close; closing it auto-removes any pending registration.
+// readiness drain matching that stale record would `wake` freed memory.
+// Evict it here. The kernel-side registration is handled per-OS inside
+// `cancel_io_waiter_for`: on darwin the EV_ONESHOT kqueue registration is
+// left to linger (a never-fired one-shot the drain ignores; the fd's
+// owner closes it, auto-removing it), but on linux the EPOLLONESHOT
+// registration stays enabled and must be `EPOLL_CTL_DEL`'d (else it could
+// fire later with no waiter and would block a re-arm of the same fd).
 cancel_io_waiter_for(self, f);
 self.n_suspended = self.n_suspended - 1;
 f.state = .ready;
@@ -333,20 +351,38 @@ Scheduler :: struct {
     }
     j = j + 1;
 }
-// Lazily open the kqueue fd the first time fd-blocking is used.
+// Lazily open the event-queue fd the first time fd-blocking is used:
+// kqueue on darwin, epoll on linux. `self.kq` holds whichever — it is
+// just "the readiness queue fd".
 if self.kq < 0 {
-    self.kq = kqb.kqueue();
+    inline if OS == {
+        case .linux: self.kq = ep.ep_create();
+        case .macos: self.kq = kqb.kqueue();
+    }
     if self.kq < 0 {
-        print("sched: kqueue() failed to open the event queue\n");
+        print("sched: failed to open the event queue\n");
         abort();
     }
 }
-// Arm a one-shot read-readiness registration for `fd`. udata is unused
-// (we match the waiter by fd in the drain), so pass 0.
-chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0);
-if !kqb.kq_apply(self.kq, chg) {
-    print("sched: kevent() failed to register fd {} for read readiness\n", fd);
-    abort();
+// Arm a one-shot read-readiness registration for `fd`, matched back by
+// the run-loop drain (kqueue by ident; epoll stashes the fd in `data`).
+// darwin EV_ONESHOT auto-removes the registration on fire; epoll's
+// EPOLLONESHOT only DISABLES it, so the linux paths additionally
+// EPOLL_CTL_DEL on fire (run) and on early-wake (cancel_io_waiter_for).
+inline if OS == {
+    case .linux: {
+        if !ep.ep_ctl(self.kq, ep.EPOLL_CTL_ADD, fd, ep.EPOLLIN | ep.EPOLLONESHOT) {
+            print("sched: epoll_ctl() failed to register fd {} for read readiness\n", fd);
+            abort();
+        }
+    }
+    case .macos: {
+        chg := kqb.kev_change(fd, kqb.EVFILT_READ, kqb.EV_ADD | kqb.EV_ENABLE | kqb.EV_ONESHOT, 0);
+        if !kqb.kq_apply(self.kq, chg) {
+            print("sched: kevent() failed to register fd {} for read readiness\n", fd);
+            abort();
+        }
+    }
 }
 // Record the waiter BEFORE parking — the run loop matches the fired
 // event's ident back to this record. Long-lived-container rule: the
@@ -407,20 +443,42 @@ Scheduler :: struct {
 // kernel reports at least one fd ready, then wake every waiter whose
 // fd fired. (null timeout via -1 → wait forever.)
 if self.io_waiters.len > 0 {
-    evbuf : [MAXEV]kqb.Kevent = ---;
-    n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1);
-    if n < 0 {
-        print("sched: kevent() wait failed while blocking on fd readiness\n");
-        abort();
-    }
-    // For each fired event, find the io-waiter whose fd matches its
-    // ident, evict it, and wake its fiber. EV_ONESHOT already removed
-    // the kernel registration, so we only drop the waiter record.
-    i := 0;
-    while i < n {
-        ready_fd : i32 = xx evbuf[i].ident;
-        wake_io_waiter_for_fd(self, ready_fd);
-        i = i + 1;
+    // BLOCK on the readiness queue until ≥1 fd fires (timeout -1 =
+    // forever), then for each fired event match the fd back to its
+    // io-waiter, evict the record, and wake the fiber.
+    inline if OS == {
+        case .linux: {
+            evbuf : [MAXEV]ep.EpollEvent = ---;
+            n := ep.ep_wait(self.kq, .{ ptr = @evbuf[0], len = MAXEV }, MAXEV, -1);
+            if n < 0 {
+                print("sched: epoll_wait() failed while blocking on fd readiness\n");
+                abort();
+            }
+            i := 0;
+            while i < n {
+                ready_fd := ep.ev_fd(evbuf[i]);
+                wake_io_waiter_for_fd(self, ready_fd);
+                // EPOLLONESHOT only DISABLED the registration; remove it
+                // fully so the fd can be re-armed by a future block_on_fd
+                // (kqueue's EV_ONESHOT removes it for free).
+                ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, ready_fd, 0);
+                i = i + 1;
+            }
+        }
+        case .macos: {
+            evbuf : [MAXEV]kqb.Kevent = ---;
+            n := kqb.kq_wait(self.kq, @evbuf[0], MAXEV, -1);
+            if n < 0 {
+                print("sched: kevent() wait failed while blocking on fd readiness\n");
+                abort();
+            }
+            i := 0;
+            while i < n {
+                ready_fd : i32 = xx evbuf[i].ident;
+                wake_io_waiter_for_fd(self, ready_fd);
+                i = i + 1;
+            }
+        }
     }
     continue;
 }
@@ -539,23 +597,48 @@ ASM
 };
 }
-// First-entry trampoline: a fiber's bootstrapped LR points here. x19 holds the
-// `*Fiber` (preset in the saved context); move it to x0 and call the generic
-// dispatch.
-asm {
-#string T
-.global _fib_tramp
-_fib_tramp:
+// First-entry trampoline: a fiber's bootstrapped LR points here, with x19 =
+// `*Fiber` and x20 = `&fib_dispatch` (both preset in the saved context by
+// `spawn`, both callee-saved so `swap_context` restores them on first entry).
+// Move the fiber to x0 and tail-branch to dispatch via the REGISTER (x20) — so
+// there is no hand-written global-asm symbol and nothing here needs per-OS
+// symbol naming (`_fib_tramp` on darwin vs `fib_tramp` on linux) or a `bl` to a
+// named export. As a naked sx fn `fib_tramp`'s own symbol is emitted with the
+// platform-correct name automatically, so `spawn`'s `xx fib_tramp` resolves on
+// every target. This register-indirect bootstrap replaced an OS-conditional
+// global `asm` block (a top-level `asm` wrapped in an `inline if` is dropped in
+// this module's context — see issues/0193) and sidesteps the hand-written
+// symbol entirely, which is cleaner regardless.
+fib_tramp :: () abi(.naked) {
+asm volatile {
+#string T
 mov x0, x19
-bl _fib_dispatch
-brk #0
-T,
-};
-fib_tramp :: () extern;
+br x20
+T
+};
+}
-// The ONE place that runs a fiber body. Reached only from `_fib_tramp` on first
+// The ONE place that runs a fiber body. Reached only from `fib_tramp` on first
 // entry, on the fiber's own fresh stack. Runs the body, marks the fiber done,
 // and switches back to the scheduler — never returns past the final switch.
+//
+// `export "fib_dispatch"` is MANDATORY, not decorative: it pins this fn to the
+// **C ABI** (first real arg `self` in x0). The trampoline hands the fiber over
+// in x0 (`mov x0, x19; br x20`), which is exactly C-ABI. Drop the export and the
+// fn reverts to sx's INTERNAL calling convention, which reserves x0 for the
+// implicit `context` pointer and shifts `self` to x1 — so the trampoline's x0
+// would land in the context slot and `self` would be read from a garbage x1. On
+// first entry that garbage happens to alias `&fiber.ctx == self` (left in x1 by
+// the scheduler's prior `swap_context`), so the body runs once; but inside it
+// the closure loads `[Fiber+8] == regs[1] == &fib_dispatch` as its "first
+// capture" and re-invokes `fib_dispatch` forever → stack overflow → bus error
+// (issue 0193 Bug A, observed only on the go/wait/sleep capstone 1817).
+//
+// One consequence of the C-ABI boundary: an exported fn has no implicit
+// `context` param, so `self.body()` runs under the static `__sx_default_context`
+// — NOT whatever `push Context { allocator = ... }` was in force at the
+// `run()` call site. Fiber bodies do not inherit a caller-scoped allocator; a
+// body that needs one must capture it explicitly (the long-lived-container rule).
 fib_dispatch :: (self: *Fiber) export "fib_dispatch" {
     self.body();
     self.state = .done;
@@ -687,7 +770,19 @@ cancel_io_waiter_for :: (self: *Scheduler, f: *Fiber) {
 i := 0;
 while i < self.io_waiters.len {
     if self.io_waiters.items[i].fiber == f {
-        remove_io_waiter(self, i);
+        // Early-wake: the fiber is re-readied by another path while its fd
+        // registration is still armed. kqueue's EV_ONESHOT lingers
+        // harmlessly (a never-fired one-shot the drain ignores); epoll's
+        // EPOLLONESHOT registration stays enabled — it could fire later with
+        // no waiter, and blocks a re-arm of the same fd — so remove it.
+        inline if OS == {
+            case .linux: {
+                fd := self.io_waiters.items[i].fd;
+                remove_io_waiter(self, i);
+                if self.kq >= 0 { ep.ep_ctl(self.kq, ep.EPOLL_CTL_DEL, fd, 0); }
+            }
+            case .macos: remove_io_waiter(self, i);
+        }
         return;
     }
     i = i + 1;


@@ -30,6 +30,7 @@ main :: () {
 - Pattern matching on enums, optionals, and type categories
 - C interop via `extern` / `export` and `#import c`
 - Inline assembly as a first-class expression
+- Colorblind async via a pure-sx cooperative fiber runtime (no function coloring)
 - Targets: macOS (ARM64, x86_64), Linux (x86_64, ARM64), Windows (x86_64), WebAssembly
 ## Usage
@@ -511,6 +512,51 @@ fence(.seq_cst); // standalone memory fence
 combinations are compile errors. The same operations run at compile time (`#run`)
 under single-threaded semantics.
+### Async / Concurrency (`modules/std/sched.sx`)
+A pure-sx cooperative fiber runtime — **colorblind async**, with no `async` /
+`await` keywords and no function coloring. Any function can suspend; a `Scheduler`
+drives any number of stackful fibers, each on its own guard-paged stack. The
+high-level API is `go` to spawn a task and `wait` to suspend until it completes:
+```sx
+#import "modules/std.sx";
+sched :: #import "modules/std/sched.sx";
+main :: () {
+    s := sched.Scheduler.init();
+    ps := @s; // closures capture by value — capture a pointer to the scheduler
+    // The coordinator runs as a fiber so `wait` has a fiber to park.
+    s.spawn(() => {
+        a := ps.go(() -> i64 => { ps.sleep(30); 100 }); // launch async tasks
+        b := ps.go(() -> i64 => { ps.sleep(10); 20 });
+        c := ps.go(() -> i64 => { ps.sleep(20); 3 });
+        sum := (a.wait() or 0) + (b.wait() or 0) + (c.wait() or 0); // 123
+        print("sum: {}\n", sum);
+    });
+    s.run(); // drive the scheduler until all fibers finish
+}
+```
+Tasks complete in deadline order, not spawn or await order. The runtime offers:
+- **`go(work) -> *Task($R)`** / **`wait() -> R !TaskErr`** / **`cancel()`** — the
+  task layer. `wait` rides the `!` error channel so a cancel surfaces as
+  `error.Canceled`.
+- **`spawn`**, **`yield_now`**, **`suspend_self`**, **`wake`** — the raw fiber
+  primitives the task layer is built on.
+- **`sleep(ms)`** / **`now_ms()`** — timer-driven suspension on a virtual clock
+  (deterministic, no real wall time).
+- **`block_on_fd(fd, want_read)`** — suspend until a file descriptor is ready,
+  backed by kqueue (darwin) or epoll (linux).
+It's an M:1 model (cooperative, no preemption — so no data races between fibers
+and no atomics needed across them), built on `abi(.naked)` context switching over
+guarded `mmap` stacks. Currently aarch64-pinned (macOS + Linux).
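The deadline-order claim can be illustrated outside sx. The sketch below is a Python analogy, not the `sched.sx` implementation: the `VirtualScheduler` class and `sleeper` helper are hypothetical, and Python generators are stackless, unlike the stackful fibers described above. It shows how a timer heap on a virtual clock makes tasks finish in deadline order regardless of spawn order.

```python
import heapq
from itertools import count


class VirtualScheduler:
    """Toy M:1 scheduler: generators yield a sleep duration in virtual ms."""

    def __init__(self):
        self.now = 0            # virtual clock, advanced only by timers
        self._timers = []       # heap of (deadline, seq, task_id, generator)
        self._seq = count()     # tie-breaker so generators are never compared
        self.results = {}       # task_id -> return value
        self.order = []         # completion log, "id@time"

    def go(self, task_id, work):
        self._step(task_id, work())

    def _step(self, task_id, gen):
        try:
            delay = next(gen)   # run until the task sleeps or returns
        except StopIteration as done:
            self.results[task_id] = done.value
            self.order.append(f"{task_id}@{self.now}")
            return
        deadline = self.now + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), task_id, gen))

    def run(self):
        while self._timers:     # jump the clock to the earliest deadline
            deadline, _, task_id, gen = heapq.heappop(self._timers)
            self.now = deadline
            self._step(task_id, gen)


def sleeper(ms, value):
    def work():
        yield ms                # stands in for ps.sleep(ms)
        return value
    return work


s = VirtualScheduler()
s.go(1, sleeper(30, 100))
s.go(2, sleeper(10, 20))
s.go(3, sleeper(20, 3))
s.run()
total = sum(s.results.values())
```

With the same sleeps as the sx example, `s.order` comes out as `2@10`, `3@20`, `1@30` and `total` is 123, with no wall-clock time spent.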
 ### Command-line interface (`modules/std/cli.sx`)
 `std.cli` builds command-line front-ends over an explicit logical argv: `os_args`