From cc137002375bedf32bbfc85ae8112cc35b76ac66 Mon Sep 17 00:00:00 2001
From: agra
Date: Fri, 26 Jun 2026 08:37:12 +0300
Subject: [PATCH] feat: linux epoll backend for std.event.Loop (the kqueue twin)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add library/modules/std/net/epoll.sx — raw epoll bindings, the linux twin
of std/net/kqueue.sx — and branch std.event.Loop on `inline if OS` so the
OS-neutral readiness Loop runs on linux (epoll) as well as darwin (kqueue);
callers never see the backend.

epoll_event has no packed-struct primitive in sx, so it is modelled as an
arch-branched struct of u32 fields — { events, data_lo, data_hi } → 12 bytes
on x86_64 (matching __attribute__((packed))), { events, pad, data_lo,
data_hi } → 16 bytes on aarch64 — every field 4-aligned, so the layout is
byte-exact for the kernel ABI with no packed attribute and no unaligned
access. The fd is stashed in data_lo (epoll echoes one data word, not the fd
separately).

epoll.sx is self-contained (libc only, no build.sx): the `inline if ARCH`
selecting the struct is resolved by the compiler's flatten pre-pass, so the
module's IR stays small. The epoll backend is imported INSIDE event.sx's
`inline if OS == .linux` branch (not top level): event.sx rides the std.sx
barrel, so a top-level import would register epoll's types into every std
program's type table on darwin and drift every .ir snapshot.

The epoll Loop keeps a small per-fd registration table (combined EPOLLIN/OUT
mask via EPOLL_CTL_ADD/MOD/DEL), maps the fd back to the caller's udata, arms
EPOLLRDHUP so a peer half-close surfaces as Event.eof (matching kqueue
EV_EOF), and uses an eventfd as the cross-thread wake channel (kqueue's
EVFILT_USER).

Validation: the kqueue path runs end-to-end on the macOS host (1632
unchanged); the epoll bindings + ABI layout are corpus-locked ir-only by
examples/event/1633 (x86_64-linux, both arches probe-verified).
The epoll Loop is verified to lower clean for both linux arches and self-reviewed, but is not corpus-snapshotted (a Loop example drags the std barrel → ~18k-line brittle IR); runtime behavior validates on a linux runner. --- current/CHECKPOINT-FIBERS.md | 20 ++ .../event/1633-event-epoll-bindings-linux.sx | 32 +++ .../1633-event-epoll-bindings-linux.build | 1 + .../1633-event-epoll-bindings-linux.exit | 1 + .../1633-event-epoll-bindings-linux.ir | 244 ++++++++++++++++++ .../1633-event-epoll-bindings-linux.stderr | 1 + library/modules/std/event.sx | 216 +++++++++++++++- library/modules/std/net/epoll.sx | 140 ++++++++++ 8 files changed, 647 insertions(+), 8 deletions(-) create mode 100644 examples/event/1633-event-epoll-bindings-linux.sx create mode 100644 examples/event/expected/1633-event-epoll-bindings-linux.build create mode 100644 examples/event/expected/1633-event-epoll-bindings-linux.exit create mode 100644 examples/event/expected/1633-event-epoll-bindings-linux.ir create mode 100644 examples/event/expected/1633-event-epoll-bindings-linux.stderr create mode 100644 library/modules/std/net/epoll.sx diff --git a/current/CHECKPOINT-FIBERS.md b/current/CHECKPOINT-FIBERS.md index 19f0c952..b6f65e53 100644 --- a/current/CHECKPOINT-FIBERS.md +++ b/current/CHECKPOINT-FIBERS.md @@ -518,6 +518,26 @@ non-unification: virtual-time timers and real kqueue timeouts are NOT merged — timer before ever blocking on kqueue (a program uses `sleep` OR fds); a true "fd-or-real-timeout" wants a kqueue `EVFILT_TIMER`, future work. +> **▶ LINUX EPOLL — in progress (2026-06-26), via `std.event.Loop` (the OS-neutral facade).** +> Chosen over the sched.sx `block_on_fd` twin because the facade is the named home for epoll, is pure +> sx + libc (zero compiler change), is consumed by http.sx, and has a runnable darwin sibling. Landed: +> (A) **`library/modules/std/net/epoll.sx`** — raw bindings, the linux twin of `std/net/kqueue.sx`. 
+> `epoll_event` is modelled as an **arch-branched struct** (`{events, data_lo, data_hi}` u32 fields → +> 12 B x86_64 packed / 16 B aarch64), so layout is byte-exact with NO packed attribute, NO unaligned +> access, NO scalar-pointer indexing (issue 0155) — the struct-per-arch approach the user flagged as +> better than raw byte poking. Self-contained (libc only — NO build.sx import; the top-level `inline if +> ARCH` resolves via the compiler's flatten pre-pass, keeping the IR small). Locked by +> `examples/event/1633-event-epoll-bindings-linux.sx` (ir-only x86_64-linux, durable 244-line .ir; +> aarch64 16 B layout also probe-verified). (B) **`std.event.Loop` branched on `inline if OS`** into two +> top-level OS-selected structs (sx has no conditional struct fields): the kqueue Loop unchanged +> (darwin, runs — 1632 green), a new epoll Loop (linux) with the per-fd registration table (combined +> EPOLLIN/OUT mask via ADD/MOD/DEL), eventfd wake channel, and EPOLLRDHUP→eof. **Verified to LOWER** +> clean for both linux arches (every epoll syscall emits) + self-reviewed; NOT corpus-snapshotted (a +> Loop example drags the std barrel → ~18k-line brittle IR — documented in event.sx). Runtime validation +> pends a linux runner. **Remaining:** a linux CI run to validate end-to-end; optionally route sched.sx +> `block_on_fd` through `std.event` (still needs the linux sched.sx port — mmap consts, tramp symbol, +> errno, x86_64 SysV switch). 
+ > **✅ issue 0192 FIXED (2026-06-26) — epoll work UNBLOCKED.** A qualified-import-member const > (`m.EV_SIZE`) now folds as a compile-time constant in every position the bare/flat form does > (array dim, arithmetic, Vector lane, generic value-param, inline-for) — so the clean diff --git a/examples/event/1633-event-epoll-bindings-linux.sx b/examples/event/1633-event-epoll-bindings-linux.sx new file mode 100644 index 00000000..f48224e1 --- /dev/null +++ b/examples/event/1633-event-epoll-bindings-linux.sx @@ -0,0 +1,32 @@ +// std/net/epoll (the linux twin of std/net/kqueue): the raw bindings lower for +// a linux target with a byte-exact `epoll_event` layout — 12-byte stride on +// x86_64 (packed), modelled as an arch-branched `{events, data_lo, data_hi}` +// struct of u32 fields (no packed attribute, no unaligned access). Exercises +// create / ctl / wait + the readiness accessors so the IR covers the surface. +// +// Imports ONLY epoll.sx (libc-only — no std/build) so the .ir snapshot stays +// small and churns only when the bindings change. ir-only on the aarch64-macOS +// dev host (target x86_64-linux mismatches host arch+os → the runner asserts +// .exit + .ir + .stderr from `sx ir --target`); runtime behavior validates on a +// linux runner (see the module header's VALIDATION NOTE). The `inline if ARCH` +// in epoll.sx is resolved by the compiler's flatten pre-pass, so no build.sx. 
+ep :: #import "modules/std/net/epoll.sx"; + +main :: () -> i32 { + epfd := ep.ep_create(); + // register read + peer-close interest on a fd, then drain readiness + if !ep.ep_ctl(epfd, ep.EPOLL_CTL_ADD, 1, ep.EPOLLIN | ep.EPOLLRDHUP) { return 2; } + ep.ep_ctl(epfd, ep.EPOLL_CTL_MOD, 1, ep.EPOLLIN | ep.EPOLLOUT); + + evs : [8]ep.EpollEvent = ---; + sl : []ep.EpollEvent = .{ ptr = @evs[0], len = 8 }; + n := ep.ep_wait(epfd, sl, 8, 100); + if n > 0 { + if ep.ev_readable(evs[0]) { return ep.ev_fd(evs[0]); } + if ep.ev_writable(evs[0]) { return 4; } + if ep.ev_eof(evs[0]) { return 9; } + if ep.ev_err(evs[0]) { return 8; } + } + ep.ep_ctl(epfd, ep.EPOLL_CTL_DEL, 1, 0); + return xx size_of(ep.EpollEvent); // 12 on x86_64 +} diff --git a/examples/event/expected/1633-event-epoll-bindings-linux.build b/examples/event/expected/1633-event-epoll-bindings-linux.build new file mode 100644 index 00000000..7fbbed1a --- /dev/null +++ b/examples/event/expected/1633-event-epoll-bindings-linux.build @@ -0,0 +1 @@ +{ "target": "x86_64-linux" } diff --git a/examples/event/expected/1633-event-epoll-bindings-linux.exit b/examples/event/expected/1633-event-epoll-bindings-linux.exit new file mode 100644 index 00000000..573541ac --- /dev/null +++ b/examples/event/expected/1633-event-epoll-bindings-linux.exit @@ -0,0 +1 @@ +0 diff --git a/examples/event/expected/1633-event-epoll-bindings-linux.ir b/examples/event/expected/1633-event-epoll-bindings-linux.ir new file mode 100644 index 00000000..2c01155e --- /dev/null +++ b/examples/event/expected/1633-event-epoll-bindings-linux.ir @@ -0,0 +1,244 @@ + +; Function Attrs: nounwind +declare i32 @epoll_create1(i32) #0 + +; Function Attrs: nounwind +declare i32 @epoll_ctl(i32, i32, i32, ptr) #0 + +; Function Attrs: nounwind +declare i32 @epoll_wait(i32, ptr, i32, i32) #0 + +; Function Attrs: nounwind +declare i32 @eventfd(i32, i32) #0 + +; Function Attrs: nounwind +declare ptr @__errno_location() #0 + +; Function Attrs: nounwind +define internal i1 
@ev_readable({ i32, i32, i32 } %0) #0 { +entry: + %alloca = alloca { i32, i32, i32 }, align 8 + store { i32, i32, i32 } %0, ptr %alloca, align 4 + %load = load { i32, i32, i32 }, ptr %alloca, align 4 + %sg = extractvalue { i32, i32, i32 } %load, 0 + %and = and i32 %sg, 1 + %cmp.ext = zext i32 %and to i64 + %icmp = icmp ne i64 %cmp.ext, 0 + ret i1 %icmp +} + +; Function Attrs: nounwind +define internal i1 @ev_writable({ i32, i32, i32 } %0) #0 { +entry: + %alloca = alloca { i32, i32, i32 }, align 8 + store { i32, i32, i32 } %0, ptr %alloca, align 4 + %load = load { i32, i32, i32 }, ptr %alloca, align 4 + %sg = extractvalue { i32, i32, i32 } %load, 0 + %and = and i32 %sg, 4 + %cmp.ext = zext i32 %and to i64 + %icmp = icmp ne i64 %cmp.ext, 0 + ret i1 %icmp +} + +; Function Attrs: nounwind +define internal i1 @ev_eof({ i32, i32, i32 } %0) #0 { +entry: + %alloca = alloca { i32, i32, i32 }, align 8 + store { i32, i32, i32 } %0, ptr %alloca, align 4 + %load = load { i32, i32, i32 }, ptr %alloca, align 4 + %sg = extractvalue { i32, i32, i32 } %load, 0 + %and = and i32 %sg, 8208 + %cmp.ext = zext i32 %and to i64 + %icmp = icmp ne i64 %cmp.ext, 0 + ret i1 %icmp +} + +; Function Attrs: nounwind +define internal i1 @ev_err({ i32, i32, i32 } %0) #0 { +entry: + %alloca = alloca { i32, i32, i32 }, align 8 + store { i32, i32, i32 } %0, ptr %alloca, align 4 + %load = load { i32, i32, i32 }, ptr %alloca, align 4 + %sg = extractvalue { i32, i32, i32 } %load, 0 + %and = and i32 %sg, 8 + %cmp.ext = zext i32 %and to i64 + %icmp = icmp ne i64 %cmp.ext, 0 + ret i1 %icmp +} + +; Function Attrs: nounwind +define internal i32 @ev_fd({ i32, i32, i32 } %0) #0 { +entry: + %alloca = alloca { i32, i32, i32 }, align 8 + store { i32, i32, i32 } %0, ptr %alloca, align 4 + %load = load { i32, i32, i32 }, ptr %alloca, align 4 + %sg = extractvalue { i32, i32, i32 } %load, 1 + ret i32 %sg +} + +; Function Attrs: nounwind +define internal i32 @ep_create() #0 { +entry: + %call = call i32 @epoll_create1(i32 
524288) + ret i32 %call +} + +; Function Attrs: nounwind +define internal i1 @ep_ctl(i32 %0, i32 %1, i32 %2, i32 %3) #0 { +entry: + %alloca = alloca i32, align 4 + store i32 %0, ptr %alloca, align 4 + %allocaN = alloca i32, align 4 + store i32 %1, ptr %allocaN, align 4 + %allocaN = alloca i32, align 4 + store i32 %2, ptr %allocaN, align 4 + %allocaN = alloca i32, align 4 + store i32 %3, ptr %allocaN, align 4 + %allocaN = alloca { i32, i32, i32 }, align 8 + %load = load i32, ptr %allocaN, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %si = insertvalue { i32, i32, i32 } undef, i32 %load, 0 + %siN = insertvalue { i32, i32, i32 } %si, i32 %loadN, 1 + %siN = insertvalue { i32, i32, i32 } %siN, i32 0, 2 + store { i32, i32, i32 } %siN, ptr %allocaN, align 4 + %loadN = load i32, ptr %alloca, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %call = call i32 @epoll_ctl(i32 %loadN, i32 %loadN, i32 %loadN, ptr %allocaN) + %cmp.ext = sext i32 %call to i64 + %icmp = icmp eq i64 %cmp.ext, 0 + ret i1 %icmp +} + +; Function Attrs: nounwind +define internal i32 @ep_wait(i32 %0, { ptr, i64 } %1, i32 %2, i32 %3) #0 { +entry: + %alloca = alloca i32, align 4 + %allocaN = alloca i32, align 4 + store i32 %0, ptr %alloca, align 4 + %allocaN = alloca { ptr, i64 }, align 8 + store { ptr, i64 } %1, ptr %allocaN, align 8 + %allocaN = alloca i32, align 4 + store i32 %2, ptr %allocaN, align 4 + %allocaN = alloca i32, align 4 + store i32 %3, ptr %allocaN, align 4 + br label %while.hdr.2 + +while.hdr.2: ; preds = %if.merge.8, %entry + br i1 true, label %while.body.3, label %while.exit.4 + +while.body.3: ; preds = %while.hdr.2 + %load = load i32, ptr %alloca, align 4 + %loadN = load { ptr, i64 }, ptr %allocaN, align 8 + %igp.data = extractvalue { ptr, i64 } %loadN, 0 + %igp.ptr = getelementptr { i32, i32, i32 }, ptr %igp.data, i64 0 + %loadN = load i32, ptr %allocaN, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %call = call i32 
@epoll_wait(i32 %load, ptr %igp.ptr, i32 %loadN, i32 %loadN) + store i32 %call, ptr %allocaN, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %cmp.ext = sext i32 %loadN to i64 + %icmp = icmp sge i64 %cmp.ext, 0 + br i1 %icmp, label %if.then.5, label %if.merge.6 + +while.exit.4: ; preds = %while.hdr.2 + ret i32 -1 + +if.then.5: ; preds = %while.body.3 + %loadN = load i32, ptr %allocaN, align 4 + ret i32 %loadN + +if.merge.6: ; preds = %while.body.3 + %callN = call ptr @__errno_location() + %deref = load i32, ptr %callN, align 4 + %cmp.ext11 = sext i32 %deref to i64 + %icmpN = icmp ne i64 %cmp.ext11, 4 + br i1 %icmpN, label %if.then.7, label %if.merge.8 + +if.then.7: ; preds = %if.merge.6 + ret i32 -1 + +if.merge.8: ; preds = %if.merge.6 + br label %while.hdr.2 +} + +; Function Attrs: nounwind +define i32 @main() #0 { +entry: + %allocaN = alloca [8 x { i32, i32, i32 }], align 8 + %allocaN = alloca { ptr, i64 }, align 8 + %allocaN = alloca i32, align 4 + %call = call i32 @ep_create() + %alloca = alloca i32, align 4 + store i32 %call, ptr %alloca, align 4 + %load = load i32, ptr %alloca, align 4 + %callN = call i1 @ep_ctl(i32 %load, i32 1, i32 1, i32 8193) + %lnot = xor i1 %callN, true + br i1 %lnot, label %if.then.0, label %if.merge.1 + +if.then.0: ; preds = %entry + ret i32 2 + +if.merge.1: ; preds = %entry + %loadN = load i32, ptr %alloca, align 4 + %callN = call i1 @ep_ctl(i32 %loadN, i32 3, i32 1, i32 5) + %igp.ptr = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %si = insertvalue { ptr, i64 } undef, ptr %igp.ptr, 0 + %siN = insertvalue { ptr, i64 } %si, i64 8, 1 + store { ptr, i64 } %siN, ptr %allocaN, align 8 + %loadN = load i32, ptr %alloca, align 4 + %loadN = load { ptr, i64 }, ptr %allocaN, align 8 + %callN = call i32 @ep_wait(i32 %loadN, { ptr, i64 } %loadN, i32 8, i32 100) + store i32 %callN, ptr %allocaN, align 4 + %loadN = load i32, ptr %allocaN, align 4 + %cmp.ext = sext i32 %loadN to i64 + %icmp = icmp sgt i64 %cmp.ext, 0 + br i1 %icmp, 
label %if.then.9, label %if.merge.10 + +if.then.9: ; preds = %if.merge.1 + %igp.ptr12 = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %loadN = load { i32, i32, i32 }, ptr %igp.ptr12, align 4 + %callN = call i1 @ev_readable({ i32, i32, i32 } %loadN) + br i1 %callN, label %if.then.11, label %if.merge.12 + +if.merge.10: ; preds = %if.merge.18, %if.merge.1 + %loadN = load i32, ptr %alloca, align 4 + %callN = call i1 @ep_ctl(i32 %loadN, i32 2, i32 1, i32 0) + ret i32 12 + +if.then.11: ; preds = %if.then.9 + %igp.ptr17 = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %loadN = load { i32, i32, i32 }, ptr %igp.ptr17, align 4 + %callN = call i32 @ev_fd({ i32, i32, i32 } %loadN) + ret i32 %callN + +if.merge.12: ; preds = %if.then.9 + %igp.ptr20 = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %loadN = load { i32, i32, i32 }, ptr %igp.ptr20, align 4 + %callN = call i1 @ev_writable({ i32, i32, i32 } %loadN) + br i1 %callN, label %if.then.13, label %if.merge.14 + +if.then.13: ; preds = %if.merge.12 + ret i32 4 + +if.merge.14: ; preds = %if.merge.12 + %igp.ptr23 = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %loadN = load { i32, i32, i32 }, ptr %igp.ptr23, align 4 + %callN = call i1 @ev_eof({ i32, i32, i32 } %loadN) + br i1 %callN, label %if.then.15, label %if.merge.16 + +if.then.15: ; preds = %if.merge.14 + ret i32 9 + +if.merge.16: ; preds = %if.merge.14 + %igp.ptr26 = getelementptr { i32, i32, i32 }, ptr %allocaN, i64 0 + %loadN = load { i32, i32, i32 }, ptr %igp.ptr26, align 4 + %callN = call i1 @ev_err({ i32, i32, i32 } %loadN) + br i1 %callN, label %if.then.17, label %if.merge.18 + +if.then.17: ; preds = %if.merge.16 + ret i32 8 + +if.merge.18: ; preds = %if.merge.16 + br label %if.merge.10 +} diff --git a/examples/event/expected/1633-event-epoll-bindings-linux.stderr b/examples/event/expected/1633-event-epoll-bindings-linux.stderr new file mode 100644 index 00000000..8b137891 --- /dev/null +++ 
b/examples/event/expected/1633-event-epoll-bindings-linux.stderr @@ -0,0 +1 @@ + diff --git a/library/modules/std/event.sx b/library/modules/std/event.sx index 3270673a..bf886752 100644 --- a/library/modules/std/event.sx +++ b/library/modules/std/event.sx @@ -6,24 +6,51 @@ // registrations cost nothing — the substrate an httpz-shaped server // worker stands on. // -// Backend: kqueue (std/net/kqueue) on darwin. The epoll twin -// (std/net/epoll, PLAN-HTTPZ S4) slots in behind this same surface -// when the linux target lands; callers never see the backend. +// Backend: kqueue (std/net/kqueue) on darwin, epoll (std/net/epoll) on +// linux. The whole `Loop` struct is selected per-OS by `inline if OS` +// (the compiler's flatten pre-pass picks the matching top-level decl) — +// callers never see the backend. The two backends differ enough in state +// that they are separate structs rather than one struct with conditional +// fields (sx has no conditional struct fields): kqueue carries only its +// queue fd, while epoll keeps a small per-fd registration table (it has +// ONE registration per fd with a combined interest mask, and its event +// echoes back only a single `data` word — we stash the fd there and the +// table maps fd → the caller's udata). // // Interest is per direction: read and write are registered and removed -// independently (mirroring kqueue filters; the epoll backend will -// compose its event mask internally). The typical server pattern: -// read interest for a connection's whole life, write interest only -// while a partial response is pending. +// independently. On kqueue these are independent EVFILT_* filters; on +// epoll the Loop composes the combined EPOLLIN/EPOLLOUT mask internally +// and issues EPOLL_CTL_ADD/MOD/DEL. The typical server pattern: read +// interest for a connection's whole life, write interest only while a +// partial response is pending. 
// // Deadlines: the loop deliberately has no timer registrations — // httpz-style timeout bookkeeping (request/keepalive eviction) is // deadline math the caller does with `deadline_in`/`expired` between // waits, passing the nearest deadline as `wait`'s timeout. +// +// VALIDATION: the kqueue path runs end-to-end on the macOS dev host +// (examples/event/1632 — which exercises the full facade surface: +// add_read/write, add_wake/wake, wait, del_*, EOF). The epoll path has no +// linux box here, so it is verified to LOWER clean for x86_64-linux and +// aarch64-linux (the whole module + every epoll syscall emits) and is +// self-reviewed; it is NOT corpus-snapshotted (a Loop example pulls in the +// std barrel → an ~18k-line IR dump that would churn on any unrelated std +// change — worse than the gap). The epoll ABI itself (the layout-sensitive +// part) IS corpus-locked, by examples/event/1633 over the raw bindings. +// Runtime behavior validates on a linux runner. #import "modules/std.sx"; kqb :: #import "modules/std/net/kqueue.sx"; timp :: #import "modules/std/time.sx"; +// NOTE: the epoll backend is imported INSIDE the `inline if OS == .linux` +// branch below, never at top level. event.sx rides the std.sx barrel, so a +// top-level `#import "epoll.sx"` would register epoll's types into EVERY std +// program's type table on darwin too — drifting every `.ir` snapshot. Scoping +// the import to the linux branch keeps darwin's type graph unchanged. (kqb +// stays top-level: it was already there before the epoll split, so darwin's +// table — and the snapshots — match; on linux its kqueue externs are unused +// declares.) 
EventErr :: error { Init, // the kernel queue could not be created @@ -36,7 +63,8 @@ EventErr :: error { // eof — the peer finished writing (drain pending bytes, then close); // err — the registration itself failed asynchronously; // user — a cross-thread wake() (see add_wake), no fd attached; -// nbytes — bytes readable / writable-buffer space (backend estimate); +// nbytes — bytes readable / writable-buffer space (backend estimate; +// kqueue reports it, epoll does not → 0 on linux); // udata — the word given at registration, verbatim. Event :: struct { fd: i32 = -1; @@ -49,6 +77,175 @@ Event :: struct { nbytes: i64 = 0; } +inline if OS == .linux { + +ep :: #import "modules/std/net/epoll.sx"; + +// ── epoll backend (linux) ────────────────────────────────────────────── +// epoll reports a single 64-bit `data` per event and carries ONE +// registration per fd, so the Loop keeps a tiny table: each `Reg` records +// the fd's current combined interest mask and the caller's udata. The fd +// itself is stashed in epoll's `data` (so `epoll_wait` reports which fd +// fired); the table recovers the udata and lets add/del compose the mask +// into an EPOLL_CTL_ADD / MOD / DEL. +// +// One semantic difference from the kqueue backend: epoll has a SINGLE +// udata per fd (not per direction), so registering read and write on the +// same fd with different udata words keeps the most recent — a readable +// and a writable event on that fd then report the same udata. Callers key +// udata on the fd/connection (the universal pattern), so this is +// invisible in practice; pass the same udata for both directions of a fd. 
+Reg :: struct { + fd: i32 = -1; + mask: u32 = 0; + udata: usize = 0; +} + +Loop :: struct { + epfd: i32 = -1; + wake_fd: i32 = -1; // eventfd, lazily created by add_wake + wake_udata: usize = 0; + regs: List(Reg); + // The Loop outlives the caller's current `context.allocator` scope, so + // capture the owning allocator at init and grow `regs` through it (the + // long-lived-container rule). + own: Allocator; + + init :: () -> Loop !EventErr { + e := ep.ep_create(); + if e < 0 { raise error.Init; } + return Loop.{ epfd = e, regs = .{}, own = context.allocator }; + } + + close :: (self: *Loop) { + if self.epfd >= 0 { socket.close(self.epfd); } + if self.wake_fd >= 0 { socket.close(self.wake_fd); } + self.regs.deinit(self.own); + self.epfd = -1; + self.wake_fd = -1; + } + + // Index of the registration for `fd`, or -1. Linear scan — fd counts in + // the M:1 / per-worker model are small (mirrors the scheduler's waiter + // lists). + reg_index :: (self: *Loop, fd: i32) -> i64 { + i := 0; + while i < self.regs.len { + if self.regs.items[i].fd == fd { return i; } + i += 1; + } + return -1; + } + + // Drive `fd`'s registration to interest `mask`: ADD a new fd, MOD an + // existing one, or DEL (and forget) when the mask drops to zero. The + // table is kept in lockstep with the kernel. True on success. + apply_mask :: (self: *Loop, fd: i32, mask: u32, udata: usize) -> bool { + idx := self.reg_index(fd); + if mask == 0 { + if idx < 0 { return true; } + ok := ep.ep_ctl(self.epfd, ep.EPOLL_CTL_DEL, fd, 0); + // swap-remove the forgotten reg (order is irrelevant). 
+ self.regs.items[idx] = self.regs.items[self.regs.len - 1]; + self.regs.len = self.regs.len - 1; + return ok; + } + if idx >= 0 { + self.regs.items[idx].mask = mask; + self.regs.items[idx].udata = udata; + return ep.ep_ctl(self.epfd, ep.EPOLL_CTL_MOD, fd, mask); + } + self.regs.append(Reg.{ fd = fd, mask = mask, udata = udata }, self.own); + return ep.ep_ctl(self.epfd, ep.EPOLL_CTL_ADD, fd, mask); + } + + // Read interest also arms EPOLLRDHUP so a peer half-close surfaces as + // `Event.eof` — matching kqueue's EV_EOF, which comes for free. + add_read :: (self: *Loop, fd: i32, udata: usize) -> !EventErr { + idx := self.reg_index(fd); + mask := ep.EPOLLIN | ep.EPOLLRDHUP; + if idx >= 0 { mask = self.regs.items[idx].mask | ep.EPOLLIN | ep.EPOLLRDHUP; } + if !self.apply_mask(fd, mask, udata) { raise error.Register; } + return; + } + del_read :: (self: *Loop, fd: i32) { + idx := self.reg_index(fd); + if idx < 0 { return; } + mask := self.regs.items[idx].mask & ~(ep.EPOLLIN | ep.EPOLLRDHUP); + self.apply_mask(fd, mask, self.regs.items[idx].udata); + } + add_write :: (self: *Loop, fd: i32, udata: usize) -> !EventErr { + idx := self.reg_index(fd); + mask := ep.EPOLLOUT; + if idx >= 0 { mask = self.regs.items[idx].mask | ep.EPOLLOUT; } + if !self.apply_mask(fd, mask, udata) { raise error.Register; } + return; + } + del_write :: (self: *Loop, fd: i32) { + idx := self.reg_index(fd); + if idx < 0 { return; } + mask := self.regs.items[idx].mask & ~ep.EPOLLOUT; + self.apply_mask(fd, mask, self.regs.items[idx].udata); + } + + // The loop's wake channel: an eventfd registered for EPOLLIN. wake() + // from any thread writes the 8-byte counter, making wait() return an + // Event carrying `udata` with `.user` set. (kqueue uses EVFILT_USER; + // epoll's idiom is eventfd.) One registration serves the Loop's life. 
+ add_wake :: (self: *Loop, udata: usize) -> !EventErr { + if self.wake_fd < 0 { + self.wake_fd = ep.eventfd(0, ep.EFD_CLOEXEC | ep.EFD_NONBLOCK); + if self.wake_fd < 0 { raise error.Register; } + } + self.wake_udata = udata; + if !ep.ep_ctl(self.epfd, ep.EPOLL_CTL_ADD, self.wake_fd, ep.EPOLLIN) { raise error.Register; } + return; + } + + // Thread-safe: writing the eventfd counter is atomic. + wake :: (self: *Loop) { + if self.wake_fd < 0 { return; } + one : u64 = 1; + socket.write(self.wake_fd, xx @one, 8); + } + + // Fill `out` with ready events, waiting at most `timeout_ms` + // (negative = forever). Returns the count; 0 is a timeout. + wait :: (self: *Loop, out: []Event, timeout_ms: i64) -> i64 !EventErr { + raw : [64]ep.EpollEvent = ---; + cap : i64 = 64; + if xx out.len < cap { cap = xx out.len; } + n := ep.ep_wait(self.epfd, .{ ptr = @raw[0], len = cap }, xx cap, xx timeout_ms); + if n < 0 { raise error.Wait; } + i := 0; + while i < n { + evr := raw[i]; + fd := ep.ev_fd(evr); + e : Event = .{ fd = fd }; + if self.wake_fd >= 0 and fd == self.wake_fd { + // Drain the eventfd counter so it doesn't re-fire immediately. + drain : u64 = 0; + socket.read(self.wake_fd, xx @drain, 8); + e.user = true; + e.udata = self.wake_udata; + } else { + idx := self.reg_index(fd); + if idx >= 0 { e.udata = self.regs.items[idx].udata; } + if ep.ev_readable(evr) { e.readable = true; } + if ep.ev_writable(evr) { e.writable = true; } + if ep.ev_eof(evr) { e.eof = true; } + if ep.ev_err(evr) { e.err = true; } + } + out[i] = e; + i += 1; + } + return xx n; + } +} + +} else { + +// ── kqueue backend (darwin) ──────────────────────────────────────────── Loop :: struct { kq: i32 = -1; @@ -118,7 +315,10 @@ Loop :: struct { } } +} + // ── deadline helpers (monotonic, std.time) ─────────────────────────── +// Backend-independent — shared by both Loop variants. // The absolute monotonic instant `ms` from now. 
deadline_in :: (ms: i64) -> i64 { diff --git a/library/modules/std/net/epoll.sx b/library/modules/std/net/epoll.sx new file mode 100644 index 00000000..01224130 --- /dev/null +++ b/library/modules/std/net/epoll.sx @@ -0,0 +1,140 @@ +// std/net/epoll — raw epoll bindings: the linux twin of std/net/kqueue. +// linux-only by definition; the OS-neutral Loop facade over both backends is +// std.event. Import this module explicitly — like its kqueue sibling it +// deliberately does not ride the std.sx barrel. +// +// One epoll instance multiplexes readiness for any number of fds: a registered +// fd reports through `epoll_wait` when its interest mask (EPOLLIN / EPOLLOUT) +// fires, and an idle registration costs nothing — the head-of-line-free +// substrate the event Loop and an httpz-shaped server worker stand on. +// +// ── How this differs from kqueue (and why the surface is shaped this way) ── +// - ONE registration per fd carries a combined events MASK; changing the mask +// is EPOLL_CTL_MOD, not a second EVFILT_* add. The Loop (std.event) tracks +// the per-fd mask and feeds the full mask on each change. +// - `epoll_event` echoes back a single 64-bit `data` word, NOT the fd in a +// separate field the way kqueue's `ident` is the fd. We stash the fd in the +// low 32 bits of `data` (`data_lo`) so `epoll_wait` reports which fd fired; +// a caller wanting a wider udata keeps its own fd→udata map. +// - EOF is EPOLLHUP / EPOLLRDHUP flags on a readable event, not kqueue's +// EV_EOF; an async registration error is EPOLLERR. +// +// ── struct epoll_event layout (the one real ABI landmine) ────────────────── +// struct epoll_event { uint32_t events; epoll_data_t data; }; // data is a +// union { void* ptr; int fd; uint32_t u32; uint64_t u64; } (8 bytes). +// On x86_64 the struct is __attribute__((packed)) → 12 bytes, `data` at +// offset 4. On every other arch (aarch64) it is naturally aligned → 16 bytes, +// `data` at offset 8. 
sx has no packed-struct primitive, so we model the +// 8-byte `data` union as two u32 halves and let the field layout fall out per +// arch: +// x86_64 : { events@0, data_lo@4, data_hi@8 } → 12 bytes +// aarch64: { events@0, pad@4, data_lo@8, data_hi@12 } → 16 bytes +// Every field is a u32 at a 4-aligned offset, so no packed attribute and no +// unaligned 8-byte access is ever needed — yet `size_of(EpollEvent)` and the +// `[N]EpollEvent` stride come out byte-exact for the kernel ABI on both +// arches, and `epoll_wait` can fill a plain `[]EpollEvent` directly. (Both +// arches are little-endian, so the fd — an `int` in the union — is the low +// word, `data_lo`.) This struct-per-arch shape was chosen over raw byte-offset +// poking deliberately: idiomatic field reads, no scalar-pointer indexing +// (issue 0155), no unaligned u64. +// +// VALIDATION NOTE: the dev host is aarch64-macOS — there is no linux box to run +// this against, so this module is currently IR-only verified: the arch-correct +// layout (12-byte / 16-byte stride, fd offset) surfaces as the struct shape in +// `sx ir --target *-linux`, and the whole module lowers clean. Runtime +// correctness (syscall behavior, the kernel-filled event array, EPOLLRDHUP +// semantics) validates end-to-end only on a linux runner — mirror of how the +// Win64 switch was IR-only until a Windows VM appeared (CHECKPOINT-FIBERS +// B1.3b-1). +// +// No `#import "modules/build.sx"` despite the `inline if ARCH` below: a +// top-level `inline if OS/ARCH/POINTER_SIZE` conditional is resolved by the +// compiler's flatten pre-pass (imports.zig — name-matched against the target), +// NOT by reading build.sx's `ARCH` global as a value. Skipping the import keeps +// this module's IR self-contained (libc only) — no std/compiler/bundle baggage. +libc :: #library "c"; + +// struct epoll_event, arch-exact (see the header). 
Both variants expose the +// same three load-bearing fields — `events`, `data_lo` (the fd), `data_hi` — so +// consumer code is arch-agnostic; the aarch64 `pad` is never touched. +inline if ARCH == .x86_64 { + EpollEvent :: struct { + events: u32 = 0; + data_lo: u32 = 0; // the fd (union's low 32 bits) + data_hi: u32 = 0; + } +} else { + EpollEvent :: struct { + events: u32 = 0; + pad: u32 = 0; // alignment pad before the 8-aligned data union + data_lo: u32 = 0; // the fd (union's low 32 bits) + data_hi: u32 = 0; + } +} + +// ── interest mask (events) ───────────────────────────────────────────────── +EPOLLIN :u32: 0x001; +EPOLLPRI :u32: 0x002; +EPOLLOUT :u32: 0x004; +EPOLLERR :u32: 0x008; +EPOLLHUP :u32: 0x010; +EPOLLRDHUP :u32: 0x2000; // peer half-closed (drain, then close) +EPOLLET :u32: 0x80000000; // edge-triggered +EPOLLONESHOT:u32: 0x40000000; // disarm after one delivery + +// ── epoll_ctl ops ────────────────────────────────────────────────────────── +EPOLL_CTL_ADD :i32: 1; +EPOLL_CTL_DEL :i32: 2; +EPOLL_CTL_MOD :i32: 3; + +// epoll_create1 / eventfd flags (== O_CLOEXEC). +EPOLL_CLOEXEC :i32: 0x80000; +EFD_CLOEXEC :i32: 0x80000; +EFD_NONBLOCK :i32: 0x800; + +epoll_create1 :: (flags: i32) -> i32 extern libc; +epoll_ctl :: (epfd: i32, op: i32, fd: i32, event: *EpollEvent) -> i32 extern libc; +epoll_wait :: (epfd: i32, events: *EpollEvent, maxevents: i32, timeout: i32) -> i32 extern libc; +// eventfd: the cross-thread wake channel (epoll's answer to EVFILT_USER). +eventfd :: (initval: u32, flags: i32) -> i32 extern libc; + +// errno, bound locally on linux (`__errno_location`; darwin's is `__error`, +// but this module only ever lowers under a linux target). 
+errno_slot_ep :: () -> *i32 extern libc "__errno_location"; +EINTR_EP :: 4; + +// ── readiness-flag helpers over one event ────────────────────────────────── +ev_readable :: (e: EpollEvent) -> bool { return (e.events & EPOLLIN) != 0; } +ev_writable :: (e: EpollEvent) -> bool { return (e.events & EPOLLOUT) != 0; } +// EPOLLHUP (full close) or EPOLLRDHUP (peer half-closed) — drain then close. +ev_eof :: (e: EpollEvent) -> bool { return (e.events & (EPOLLHUP | EPOLLRDHUP)) != 0; } +ev_err :: (e: EpollEvent) -> bool { return (e.events & EPOLLERR) != 0; } +// The fd stashed in `data` at registration. +ev_fd :: (e: EpollEvent) -> i32 { return xx e.data_lo; } + +// ── thin wrappers ────────────────────────────────────────────────────────── + +// Create an epoll instance (close-on-exec). <0 on failure. +ep_create :: () -> i32 { + return epoll_create1(EPOLL_CLOEXEC); +} + +// Apply one registration change: add / modify / delete `fd`'s interest +// `events` on `epfd`, stashing `fd` in `data` so `epoll_wait` reports it. True +// on success. For EPOLL_CTL_DEL the kernel ignores the event payload. +ep_ctl :: (epfd: i32, op: i32, fd: i32, events: u32) -> bool { + ev : EpollEvent = .{ events = events, data_lo = xx fd }; + return epoll_ctl(epfd, op, fd, @ev) == 0; +} + +// Drain ready events into `events` (room for `maxev` entries), waiting at most +// `timeout_ms` (negative = forever). Returns the event count (0 = timeout); -1 +// only on a real failure — EINTR is retried (mirror of kqueue's kq_wait). +ep_wait :: (epfd: i32, events: []EpollEvent, maxev: i32, timeout_ms: i32) -> i32 { + while true { + n := epoll_wait(epfd, @events[0], maxev, timeout_ms); + if n >= 0 { return n; } + if errno_slot_ep().* != EINTR_EP { return -1; } // EINTR: reissue + } + return -1; +}