diff --git a/current/PLAN-POST-METATYPE.md b/current/PLAN-POST-METATYPE.md index d9a23215..316fee85 100644 --- a/current/PLAN-POST-METATYPE.md +++ b/current/PLAN-POST-METATYPE.md @@ -34,8 +34,12 @@ earlier if FFI/`#compiler`-collapse becomes a priority). ## Stream A — ATOMICS (N1) · `PLAN-ATOMICS.md` when carved **Goal:** LLVM atomic codegen — the net-new emit primitive. Surface = `Atomic($T)` -wrapper + `Ordering` enum (locked, design §4.6). Some IR/inference scaffolding exists; -**lowering is absent**. +wrapper + `Ordering` enum (locked, design §4.6). **Grounding correction: this is 100% +net-new — there is NO atomics scaffolding.** `Atomic`/`Ordering` exist nowhere in +`library/` (the only `thread.sx` hit is the word "Atomically" in a comment), and the +only "ordering" in `lower.zig:1400-1418` is **comparison** ordering (`< <= > >=`), +entirely unrelated to memory ordering — do not mistake it for groundwork. A.0 must +build the type, the IR op, inference, AND lowering from zero. **Phases:** - A.0 `Atomic($T)` + `Ordering` lib types + `load`/`store` → LLVM `load atomic`/`store @@ -57,24 +61,50 @@ The colorblind, stackful, pure-sx async runtime (design §4). Compiler floor is the runtime is sx lib. Likely carved as two PLANs: ### B1 — Fibers + Io + M:1 (the runtime; `PLAN-FIBERS.md`) -- B1.0 **`callconv(.naked)`** — extend `CallConv {default, c}` (types.zig:169) + skip - prologue/epilogue lowering. (Net-new; gates the context-switch.) -- B1.1 **Repointable-`context` codegen** — lower `context` as a swappable indirection - (never raw TLS) + per-fiber stack-limit. **Prerequisite of B1.3, not a successor.** +- B1.0 **`abi(.naked)` — make the EXISTING `.pure` ABI actually naked.** The enum + already carries `.pure` (ast.zig:142, documented "pure/naked, no prologue/epilogue"), + but it is an **inert label today**: `type_resolver.zig:237` maps `.pure → .default` + CC and there is **zero naked-attribute emission in emit_llvm**. 
So B1.0 is NOT + "extend the enum" (done) — it is "emit the LLVM `naked` attr + skip prologue/epilogue + lowering for `.pure`," genuinely net-new. (Roadmap §7-step-4's "extend + `CallConv {default, c}`" is stale — CallConv was renamed ABI and already gained + `compiler`/`pure` in the compiler-API stream.) Gates the context-switch. +- B1.1 **Per-fiber `context` root + `push Context`-stack storage.** Grounding correction: + `context` is **already an implicit `*Context` parameter** (comptime_vm.zig:392, + lower.zig:257 "Implicit Context parameter machinery"), **not raw TLS** — so it already + rides the fiber stack and the design doc's "lower as swappable indirection, never raw + TLS" guards a non-problem. The **real, currently-unsized** scope is: (a) where a + freshly-spawned fiber's *root* `Context` comes from, and (b) where the `push Context` + stack frames live (if on the caller stack, fiber-local for free; if a global root, + that root must become per-fiber). **Ground the current mechanism FIRST** — B1.1's size + is unknown until then, and it may be much smaller than the prior "M" estimate. + **Prerequisite of B1.3, not a successor.** - B1.2 **A1 — `Io` interface + `context.io` + `Future` + `cancel()` API** (protocol/ vtable threaded like `Allocator`). -- B1.3 **A2 — fiber runtime**: `callconv(.naked)` context-switch asm (per-arch), - bootstrap, `mmap` stacks. **sx lib, not a compiler builtin** (design §4 A2). +- B1.3 **A2 — fiber runtime**: `abi(.naked)` context-switch asm (per-arch), bootstrap, + `mmap` stacks **with mandatory guard pages** (NOT optional — a fixed-stack fiber that + overflows without a guard corrupts adjacent fiber memory silently; §8.1.1). **sx lib, + not a compiler builtin** (design §4 A2). 
**First deliverable of B1.3, before the + scheduler AND before the deterministic `Io`: a standalone 2-fiber ping-pong + switch-stress harness** (scribble every callee-saved reg + a stack canary before each + suspend, deep/recursive fiber chains, verify all survive post-resume — §10.7). It + needs no scheduler and is the *only* gate that catches a one-register slip; A2 is + untestable by the deterministic-`Io` harness (which tests *scheduling*, not the + *switch*), so this harness — not B1.4 — is A2's correctness gate. - B1.4 **A3 — `Io` impls: blocking → deterministic-sim (KEYSTONE) → event-loop** (kqueue/epoll/io_uring). Build the deterministic `Io` *before* the event loop — it - is the test harness (§10.1). + is the test harness for *scheduling* (§10.1). (Note: the **event loop does not yet + cooperate with a platform UI run loop** — CFRunLoop/NSRunLoop/ALooper; pinning gives + thread-affinity, not run-loop integration. Tracked as an open design gap for the §6 + app targets, deferred out of B1.) - B1.5 **A5·M:1 scheduler** — validates the whole colorblind stack end-to-end. -**Gates:** deterministic-`Io` **calibrated** against blocking `Io` (don't trust an -uncalibrated oracle — §8.1.3); corpus `18xx` under deterministic `Io`; **A2 -switch-stress test** (scribble every callee-saved reg + canary, deep fiber chains, -verify post-resume — §10.7) + arch-gated run tests. A2 is the highest-corruption-risk -piece (§8.1.1). +**Gates:** the **B1.3 switch-stress harness is A2's gate** (register/canary survival, +not run/snapshot — §8.1.1, §10.7) + arch-gated run tests; deterministic-`Io` +**calibrated** against blocking `Io` (don't trust an uncalibrated oracle — §8.1.3); +corpus `18xx` under deterministic `Io` asserts a program-emitted **ordering contract** +(sequence markers), not raw interleaving, so scheduler-internal policy changes don't +churn every snapshot. 
### B2 — Channels + cancellation + stdlib (`PLAN-CHANNELS.md`) - B2.0 **N3 — channels** (`Channel($T)`; `recv → RecvResult($T)` tagged union built via @@ -96,9 +126,19 @@ ordering; `RecvResult` exercises the metatype primitives. - C.1 **M:N** — work-stealing (thread-safe steal queues + migration); **pinning** API (`pin = .main | .any | .on(thread)`). M:N is **committed, not deferred** — just last. -**Gates:** data races aren't snapshottable → **stress harness** (run-N / TSan-style), -*loudly* out of corpus scope (§10.2). **Named `context`-fiber-local + errno migration -test** (M:1 can't exercise migration — §10.7). +**Gates:** data races aren't snapshottable, but "out of corpus scope" is **not** "no +plan" — Stream C is **blocked on a concrete, named stress harness landing FIRST** (a +gating artifact carved into `PLAN-PARALLEL.md`, not a footnote): +1. **Sanitizer build** — a `zig build`-integrated TSan (and ASan) variant of the + concurrency corpus; CI runs `18xx`/parallel examples under it. +2. **Run-N driver** — each parallel example executed N times (configurable, default + ≥100) with interleaving perturbation (randomized ready-queue / yield injection); any + nondeterministic divergence or sanitizer report fails the build. +3. **Coverage-bound `log()`** — the harness emits, loudly, exactly which guarantees it + does and does NOT cover (per the REJECTED-PATTERNS rule against silent gaps). +This harness is the **only** correctness story for N×(M:1)/M:N; C.0/C.1 do not start +until it exists and is calibrated. Plus the **named `context`-fiber-local + errno +migration test** (M:1 can't exercise migration — §10.7). --- @@ -143,14 +183,20 @@ module hazard; S2 TLS + C-constructor JIT test per host OS (the exact prior-spik ## Cross-cutting (applies across streams) -- **Testing keystone:** the deterministic-sim `Io` (B1.4) must exist + be calibrated - before *any* async test is trusted (§10.1). 
-- **Top risks to watch (§8.1):** A2 context-switch correctness (B1.3), minted-enum → - match codegen (de-risked, metatype stream), deterministic-`Io` oracle calibration, - `context`-fiber-local/errno (C), S2 (E), C1 args-buffer layout (D). -- **The compiler floor stays small, but deep:** atomics, `callconv(.naked)`, repointable- - `context` codegen, `declare`/`define`/`type_info` (metatype stream), the S1 JIT spine. - Everything else — schedulers, fibers, channels, the bundler — is sx lib. +- **Testing keystone:** the deterministic-sim `Io` (B1.4) gates *scheduling* tests + (§10.1); the **B1.3 switch-stress harness gates the context-switch** (the one piece + the deterministic `Io` can't test). Both must exist + be calibrated before the async + tests they gate are trusted. +- **Top risks to watch (§8.1):** A2 context-switch correctness (B1.3 — gated by its own + stress harness, not the deterministic `Io`), minted-enum → match codegen (de-risked, + metatype stream), deterministic-`Io` oracle calibration, `context`-fiber-local/errno + (C — gated by the named stress harness), S2 (E), C1 args-buffer layout (D). +- **The compiler floor stays small, but deep — net-new pieces, grounded:** atomics + (100% net-new, no scaffolding), making `abi(.pure)` actually naked (the enum variant + exists but is inert today), per-fiber `context` root + push-stack storage (`context` + is already an implicit param, NOT TLS — so this is smaller/different than "repointable + codegen" implied), `declare`/`define`/`type_info` (metatype stream — **done**), the + S1 JIT spine. Everything else — schedulers, fibers, channels, the bundler — is sx lib. ## Carving protocol diff --git a/design/execution-evolution-roadmap.md b/design/execution-evolution-roadmap.md index 86244964..eaabb9ca 100644 --- a/design/execution-evolution-roadmap.md +++ b/design/execution-evolution-roadmap.md @@ -74,7 +74,7 @@ is ``"). 
| ID | Piece | State | Size | |----|-------|-------|------| -| **N1** | **Atomics — NET-NEW compiler feature.** Atomic load/store/RMW (`add/sub/and/or/xor/swap` + `fetch_min`/`fetch_max`; no `nand`), `compare_exchange`/`_weak` (→ `?T`, **null = success**), and fences, with orderings (relaxed/acquire/release/acq_rel/seq_cst). LLVM provides all — an **emit** feature, not a runtime library. **Surface LOCKED = `Atomic($T)` wrapper + `Ordering` enum** (not `@atomic_*` — `@` is address-of in sx). | **lowering absent** — zero LLVM `atomicrmw`/`cmpxchg`/`fence` emission today; some IR/inference scaffolding exists | M | +| **N1** | **Atomics — NET-NEW compiler feature.** Atomic load/store/RMW (`add/sub/and/or/xor/swap` + `fetch_min`/`fetch_max`; no `nand`), `compare_exchange`/`_weak` (→ `?T`, **null = success**), and fences, with orderings (relaxed/acquire/release/acq_rel/seq_cst). LLVM provides all — an **emit** feature, not a runtime library. **Surface LOCKED = `Atomic($T)` wrapper + `Ordering` enum** (not `@atomic_*` — `@` is address-of in sx). | **fully net-new** — zero LLVM `atomicrmw`/`cmpxchg`/`fence` emission **and no atomics scaffolding**: `Atomic`/`Ordering` exist nowhere in `library/`, and the only "ordering" in `lower.zig:1400` is *comparison* ordering (`< <= > >=`), unrelated to memory ordering | M | | **N2** | **OS threads + pthread Mutex/Cond + worker Pool** | **landed** — [std/thread.sx](../library/modules/std/thread.sx) (`pthread_create`/`join`/`detach`, in-place `Mutex`/`Cond`, bounded `Pool`). NOTE: pthread mutex **blocks the OS thread** — it is *not* fiber-aware (it would park every fiber on that thread); fiber-aware sync is N3, built on N1. | — | | **N3** | **Fiber-aware sync** — mutex / channel / waitgroup that **suspend the fiber**, not the OS thread. Hybrid: atomic fast-path (N1) + fiber-suspend slow-path (A2/A5). Distinct from the pthread primitives in N2.
| new library | M | @@ -99,7 +99,7 @@ suspends is decided by the `Io` *implementation*, transparently. | ID | Piece | Notes | Size | |----|-------|-------|------| | **A1** | **`Io` interface + `context.io`** — a protocol/vtable threaded like `Allocator`. `io.async(fn,args) → Future`, `future.await`, cancellation. | leverages protocols + context | M | -| **A2** | **Stackful coroutine runtime — in sx lib, NOT a compiler builtin.** The context-switch is a `callconv(.naked)` sx fn with an inline-asm body (save callee-saved + SP/LR into `*from`, load from `*to`, `ret`); fiber bootstrap + stack alloc (`mmap`+guard via `extern`) also sx. The **compiler's** job is only (a) the general primitives — inline asm, `callconv(.naked)`, atomics — and (b) **fiber-safe codegen**: `context` lowered as a *repointable indirection* (never raw TLS) so the switch can repoint it, and stack-limit guards (if emitted) read from a swappable per-fiber location. Most arch-delicate sx in the tree (must match the platform callee-saved set + the compiler ABI), but it's inspectable sx, not a black box. | per-arch, arch-gated; co-validate vs codegen | M | +| **A2** | **Stackful coroutine runtime — in sx lib, NOT a compiler builtin.** The context-switch is an `abi(.naked)` sx fn with an inline-asm body (save callee-saved + SP/LR into `*from`, load from `*to`, `ret`); fiber bootstrap + stack alloc (`mmap`+guard via `extern`) also sx. The **compiler's** job is only (a) the general primitives — inline asm, `abi(.naked)`, atomics — and (b) **fiber-safe codegen**: `context` is **already an implicit `*Context` param** (not TLS — see §7 step 5), so the switch repoints it for free by swapping the per-fiber root; the open work is the per-fiber root + push-stack storage, and stack-limit guards (**mandatory, not optional** — fixed mmap stacks without a guard corrupt neighbors silently) reading from a swappable per-fiber location.
Most arch-delicate sx in the tree (must match the platform callee-saved set + the compiler ABI), but it's inspectable sx, not a black box. | per-arch, arch-gated; co-validate vs codegen | M | | **A3** | **Event-loop `Io` impls** — kqueue / epoll / io_uring drive readiness, then the (now-ready) syscall via C1. Plus a trivial **blocking `Io`**. | pure sx around syscall `extern`s | L | | **A4** | **Stdlib I/O rework** — fs/socket/process take/use `context.io` instead of raw blocking syscalls, so existing calls participate in async. | mirrors the allocator-threading rule | M | | **A5** | **Schedulers — M:1 → N×(M:1) → M:N, all sx std-lib `Io` vtables (committed; M:N last, not deferred).** M:1 first (minimal vehicle to validate the colorblind stack; covers I/O-bound). N×(M:1) = first parallel step (per-thread M:1 loops + `std/thread.sx` spawn; shared state uses N1 atomics — expected under parallelism, not a wart). M:N work-stealing last (most machinery: thread-safe steal queues + migration + errno/TLS discipline). All over N1 atomics + the A2 asm context-switch + `extern` syscalls. **pinning** API for thread-affine work (UI main thread, GL context). | see §4.3 | M (M:1) / M (N×M:1) / L (M:N) | @@ -395,12 +395,21 @@ grounding) are explicit steps, not buried. construct `TypeInfo` programmatically + `intern()`. **Residual = plumbing, not capability:** name minted results by the instantiation's mangled name + input validation. -4. **`callconv(.naked)`** — extend `CallConv {default, c}` (types.zig:169) + skip +4. **`abi(.naked)`** — *correction:* `CallConv` was renamed `ABI` and **already carries + `.pure`** (ast.zig:142, "pure/naked, no prologue/epilogue") during the compiler-API + stream — so this is NOT "extend the enum." `.pure` is an **inert label today**: + `type_resolver.zig:237` maps it to `.default` CC and emit_llvm emits **no** naked + attribute. The net-new work is making `.pure` actually emit LLVM `naked` + skip prologue/epilogue lowering. Gates A2. -5. 
**Repointable-`context` codegen** — lower `context` as a swappable indirection - (never raw TLS) + per-fiber stack-limit. Compiler obligation; gates A2 *and* - cross-fiber `context.io` correctness. (Reviewer note: this is a **prerequisite** - of A2, not a successor.) +5. **Per-fiber `context` root + push-stack storage** — *correction:* `context` is + **already an implicit `*Context` parameter** (comptime_vm.zig:392, lower.zig:257 + "Implicit Context parameter machinery"), **not raw TLS** — so the "lower as swappable + indirection, never raw TLS" framing guards a non-problem; it already rides the fiber + stack. The real, **currently-unsized** obligation is (a) where a freshly-spawned + fiber's *root* `Context` comes from and (b) where `push Context` frames live (caller + stack ⇒ fiber-local for free; a global root ⇒ must become per-fiber) + per-fiber + stack-limit. **Ground the current mechanism before sizing this.** Prerequisite of + A2, not a successor. **Async runtime — sx lib over the primitives:** 6. **A1 — `Io` interface + `context.io` + `Future` + `cancel()` API.**