sx/current/PLAN-ATOMICS.md

# PLAN-ATOMICS — Stream A (atomics lowering)

Carved from [PLAN-POST-METATYPE.md](PLAN-POST-METATYPE.md) Stream A + the design-of-record
[../design/execution-evolution-roadmap.md](../design/execution-evolution-roadmap.md) §3 (N1)
+ §4.6 (locked surface). Progress in [CHECKPOINT-ATOMICS.md](CHECKPOINT-ATOMICS.md).

**Goal:** net-new LLVM atomic codegen. Surface = a pure-sx `Atomic($T)` generic struct +
an `Ordering` enum (ordinary sx), with the actual atomic operations recognized as
`#builtin` intrinsics at lower-time and emitted as new IR ops. This is **100% net-new** —
no atomics scaffolding exists (the only `lower.zig` "ordering" is *comparison* ordering
`< <= >=`, unrelated to memory ordering — do not mistake it for groundwork).

**Cadence (IMPASSIBLE):** no commit both adds a test AND makes it pass (lock-to-bail, then
flip to green); `zig build && zig build test` green after every step; never regen snapshots
while red; scope regens with `-Dname=examples/NNNN-…sx -Dupdate-goldens` + review the diff.
New corpus category: `17xx` atomics.

---

## Design (grounded against the tree)

### Representation — minimal compiler surface
- **`Ordering`** is an ordinary sx enum, zero compiler coupling:
  ```sx
  Ordering :: enum { relaxed; acquire; release; acq_rel; seq_cst; }  // tags 0..4
  ```
- **`Atomic($T)`** is an ordinary sx **generic struct** (mirrors `List :: struct ($T: Type)`
  at [list.sx:5](../library/modules/std/list.sx#L5)), a transparent 1-field wrapper —
  atomicity is a property of the *operation*, not the storage, so `Atomic(i64)` has the
  exact layout/size/align of `i64`. NO new IR *type*, NO type-system coupling:
  ```sx
  Atomic :: struct ($T: Type) {
      value: T;
      init  :: (v: T) -> Atomic(T) { return .{ value = v }; }
      load  :: (self: *Atomic(T), o: Ordering) -> T       { return atomic_load(T, @self.value, o); }
      store :: (self: *Atomic(T), v: T, o: Ordering)      { atomic_store(T, @self.value, v, o); }
  }
  ```
- The **operations** are `#builtin` intrinsic free functions, recognized by name at
  lower-time (the established pattern — `size_of`/`type_info` in
  [`tryLowerReflectionCall`](../src/ir/lower/call.zig#L1672), recognized BEFORE arg lowering):
  ```sx
  atomic_load  :: ($T: Type, ptr: *T, o: Ordering) -> T #builtin;
  atomic_store :: ($T: Type, ptr: *T, v: T, o: Ordering)  #builtin;
  ```
  Explicit `$T` first arg follows the `size_of($T)` / `field_name($T, idx)` mixed
  type+value precedent (lowest-risk; the reflection path already resolves type args).

### Ordering is compile-time-only by construction — and that forces a capability gap
LLVM atomic ordering is an **instruction attribute**, not a runtime operand, so the
ordering MUST be known at emit time. The lower-time handler reads the ordering arg's
variant name statically (it must be a **constant enum literal** `.seq_cst`) and bakes it
into the IR op as a Zig enum field (`AtomicOrdering`). A non-literal ordering is a **loud
diagnostic**, never a silent default (REJECTED-PATTERNS).

**Discovered gap (grounded):** a generic `Atomic(T)` method `load(self, o: Ordering)` would
forward `o` — a *runtime parameter* — to the intrinsic, where it is NOT a literal. And
**comptime enum value params don't exist** (`$o: Ordering` → `o` is "unresolved" in the
body; `resolveValueParamArg` folds integer constraints only). A runtime dispatch hack
(`if o == { case .acquire: atomic_load(…, .acquire) … }`) also fails: `load` with a
`release`/`acq_rel` ordering is *invalid LLVM*, so the arms can't be uniform. Therefore the
**full ordering surface is blocked on a net-new capability** (comptime-constant ordering
propagation — either comptime enum value params, or compiler-recognized `Atomic` method
calls). That capability is its **own step (A.0.5)**, sequenced before ordering-bearing ops.

### sx tag → LLVM ordering is EXPLICIT (non-contiguous!)
LLVM's `LLVMAtomicOrdering` is **not** 0..4: `Monotonic=2, Acquire=4, Release=5,
AcquireRelease=6, SequentiallyConsistent=7` ([Core.h:338-354]). The sx `Ordering` tags
(relaxed=0…seq_cst=4) map via an explicit `switch`, never an identity cast:
`relaxed→Monotonic, acquire→Acquire, release→Release, acq_rel→AcquireRelease,
seq_cst→SequentiallyConsistent`.

### LLVM-C API (verified present in `llvm-c/Core.h`, no new extern decls needed)
- Atomic load = `LLVMBuildLoad2` + `LLVMSetOrdering(v, ord)` + `LLVMSetAlignment(v, size)`
  (**alignment is mandatory** on atomic load/store — LLVM verifier rejects atomics without
  it). There is **no** `LLVMBuildAtomicLoad`/`Store` (the Explore agent was wrong).
- Atomic store = `LLVMBuildStore` + `LLVMSetOrdering` + `LLVMSetAlignment`.
- (Later) `LLVMBuildAtomicRMW(B, op, ptr, val, ord, singleThread)`,
  `LLVMBuildAtomicCmpXchg(B, ptr, cmp, new, succOrd, failOrd, singleThread)`,
  `LLVMBuildFence(B, ord, singleThread, name)`, `LLVMSetWeak`.
- `singleThread = 0` (multi-thread / cross-thread ordering). Atomic-eligible `T` =
  integer / pointer / float of size 1·2·4·8(·16). **Reject non-scalar / bad-size `T`
  loudly** (diagnostic), do not silently emit.

### Comptime VM treats atomics as ordinary load/store
Comptime is single-threaded, so seq_cst is trivially satisfied — the
[`comptime_vm`](../src/ir/comptime_vm.zig#L659) arms for `atomic_load`/`atomic_store`
reuse the ordinary `load`/`store` paths (correct, NOT a bail). `sx run` JITs via LLVM so
runtime atomics execute the real ops; the VM arm only matters for `#run`/const-init.

### Files the new IR op variants force (exhaustive switches)
`atomic_load` / `atomic_store` variants must be handled in every `Op` switch or the Zig
build fails (this is the desired tripwire):
- [inst.zig:159](../src/ir/inst.zig#L159) — add `atomic_load: AtomicLoad`, `atomic_store: AtomicStore` + the structs (mirror `Store` at [inst.zig:286](../src/ir/inst.zig#L286)).
- [lower/call.zig:1672](../src/ir/lower/call.zig#L1672) — recognize the intrinsics, emit the ops (new `tryLowerAtomicIntrinsic`, called alongside `tryLowerReflectionCall` at [call.zig:80](../src/ir/lower/call.zig#L80)).
- [print.zig:231](../src/ir/print.zig#L231) — print arms (sx-IR / `ir-dump`).
- [emit_llvm.zig:1566](../src/ir/emit_llvm.zig#L1566) — dispatch arms → ops.zig.
- [backend/llvm/ops.zig:325](../src/backend/llvm/ops.zig#L325) — `emitAtomicLoad`/`emitAtomicStore` (mirror `emitLoad`/`emitStore`).
- [comptime_vm.zig:659](../src/ir/comptime_vm.zig#L659) — arms reusing load/store.
- Any other `.op` switch the Zig compiler flags (module.zig / program_index.zig) — let the build tell you.

### Test snapshots — the arch-`.ir` requirement is a MISCONCEPTION for atomics
`sx ir` = [`emitIR`](../src/main.zig#L210), which emits **LLVM IR** (respects `--target`);
`sx ir-dump` is the sx-IR printer. At the **LLVM-IR level**, `load atomic i64, ptr %x
seq_cst, align 8` is **arch-invariant** — identical text for x86_64 and aarch64. The
x86-`lock`/MOV vs aarch64-`ldar`/`stlr` divergence happens only at *instruction selection*
(`sx asm`), which the corpus does **not** snapshot. So:
- **A single host `.ir` snapshot** proves the achievable gate (the `load atomic <ordering>`
  keyword + correct ordering + alignment emitted). PLAN-POST §A / design §10.3's
  "arch-gated x86_64 + aarch64 `.ir`" would capture **byte-identical** files — drop it.
- Optionally add ONE cross-arch ir-only example (`.build {"target":"x86_64-linux"}` on an
  aarch64 host) purely as a **cross-target-emission-doesn't-crash** smoke — note in its
  header that the IR body is identical to host.
- **State loudly (out of snapshot scope, parallel to the ordering-semantics caveat):**
  asm-level arch lowering AND weak-memory ordering *semantics* are NOT proven by `.ir`;
  those need the Stream-C stress harness, not the corpus.

---

## Phases

### A.0 — `Atomic($T)` + `Ordering` + **`seq_cst`-only** `load`/`store`  ← START HERE
**Scope (descoped per the discovered gap above):** ship the net-new atomic load/store
codegen with a **`seq_cst` literal baked in the method bodies** — `load(self) -> T` /
`store(self, v)` (NO ordering param yet). The intrinsic still carries the full
`AtomicOrdering` field (always `.seq_cst` here); the recognizer + emit handle all five
orderings already, so A.0.5 only has to plumb the *constant* through. Explicit orderings
(`a.load(.acquire)`) land in A.0.5. seq_cst-only is correct (conservative-strongest), not a
silent fallback.

Two-commit cadence (lock-to-bail → green):

- **A.0a (lock)** — land the lib + IR plumbing with emit deliberately bailing:
  1. New `library/modules/std/atomic.sx`: `Ordering` enum, `Atomic($T)` struct (value +
     `init`/`load`/`store`), `atomic_load`/`atomic_store` `#builtin` decls. **Opt-in import
     (`#import "modules/std/atomic.sx"`), NOT carried by the universal `std.sx` facade** —
     mirrors `trace`. Rationale (grounded): adding the concrete `Ordering` enum to the
     universal prelude registers it into EVERY program's global type table, growing
     `@__sx_type_is_unsigned` (378→380) and shifting all string-global numbering → churned
     37 unrelated `.ir` snapshots + bloats every binary. Atomics is a deliberate concurrency
     capability, so consumers import it explicitly.
  2. Add IR ops `atomic_load`/`atomic_store` + `AtomicOrdering` + the two op structs
     (inst.zig); print arms; comptime_vm arms (reuse load/store); lower recognition
     (`tryLowerAtomicIntrinsic`) incl. the const-ordering-literal guard + non-scalar-`T`
     reject.
  3. emit_llvm/ops.zig arms **bail loudly** for now: `emitAtomicLoad`/`Store` call the
     emitter's bail-with-diagnostic path ("atomic load/store LLVM emission not yet
     implemented") so the Zig build is exhaustive but the example is red-by-diagnostic.
  4. Add `examples/1700-atomics-load-store.sx` (construct `Atomic(i64).init`, `store`,
     `load`, `print`). Seed marker; capture snapshot = the emit-bail diagnostic (nonzero
     exit). `zig build && zig build test` green (matches the locked bail snapshot). Commit.
- **A.0b (green)** — replace the emit bail with real emission:
  `LLVMBuildLoad2`+`LLVMSetOrdering`+`LLVMSetAlignment` / `LLVMBuildStore`+`LLVMSetOrdering`
  +`LLVMSetAlignment`, ordering via the explicit sx-tag→LLVM `switch`. Regen `1700` to
  success output + capture its host `.ir` (asserts `load atomic`/`store atomic` + ordering).
  Add a unit test in `emit_llvm.test.zig` (correct op + ordering + alignment emission).
  Review the diff (no stray error text). Commit.

### A.0.5 — comptime-constant ordering propagation (the capability gap)
Enable `a.load(.acquire)` etc. — i.e. an `Ordering` that reaches the intrinsic as a
compile-time constant through a method. Two candidate designs (pick at pickup):
- **(a) comptime enum value params** — make `$o: Ordering` resolve in the body to its
  variant tag (extend `comptime_value_bindings`/the typer beyond integers). General,
  reusable; larger typer change.
- **(b) compiler-recognized `Atomic` methods** — special-case `Atomic(T).load/store/…`
  calls (read the literal ordering arg at the method call site), bounded coupling to the
  std `Atomic` type (cf. how `Vector` is special-cased). Smaller; less general.
Also enforce per-op ordering validity (load: relaxed/acquire/seq_cst; store:
relaxed/release/seq_cst; CAS's dual orderings) as **compile errors**, which is exactly what
the constant-ordering path buys. Retrofit the ordering param onto `load`/`store` here.

### A.1 — RMW: `fetch_add/sub/and/or/xor` + `fetch_min/max` → `atomicrmw` (no `nand`)
One IR op `atomic_rmw` carrying an `RmwKind` (maps to `LLVMAtomicRMWBinOp*`). Signed vs
unsigned min/max picks `Max/Min` vs `UMax/UMin` from `T`'s signedness. Same lock→green
cadence; `17xx` examples.

### A.2 — `compare_exchange`/`_weak` → `cmpxchg` (returns **`?T`, null = success**)
`atomic_cmpxchg` op (ptr, cmp, new, success_ord, failure_ord, weak). LLVM `cmpxchg`
returns `{T, i1}`; lower to `?T` where **null = success** (extract the i1, invert).
**Validate the two orderings in the compiler** (design §4.6): failure ordering may not be
`release`/`acq_rel` nor stronger than success — loud diagnostic. `_weak` sets `LLVMSetWeak`.

### A.3 — `swap` + `fence(.ordering)`
`swap` = `atomic_rmw` with `Xchg` kind (folds into A.1's op). `fence` = a new `atomic_fence`
op (ordering only) → `LLVMBuildFence`. `17xx` examples.

---

## Gates (per the corrected snapshot story)
- **unit** `emit_llvm.test.zig`: each op emits the right LLVM builder + ordering + alignment.
- **corpus** `17xx` single-thread deterministic (`sx run`, JIT executes real atomics).
- **host `.ir`** snapshot per op proves the keyword/ordering/alignment lowered.
- **OUT of snapshot scope, stated loudly:** asm-level arch divergence (`sx asm`) and
  weak-memory ordering *semantics* — Stream-C stress harness territory, not the corpus.

## Kickoff prompt (A.0a — paste into a fresh session)
> Implement Stream A step A.0a (atomics lock commit) per `current/PLAN-ATOMICS.md`. Verify
> `zig build && zig build test` is green first. Then: (1) create
> `library/modules/std/atomic.sx` with the `Ordering` enum, `Atomic($T)` struct, and
> `atomic_load`/`atomic_store` `#builtin` decls; wire into `library/modules/std.sx`'s tail.
> (2) Add the `atomic_load`/`atomic_store` IR ops + `AtomicOrdering` + op structs in
> `src/ir/inst.zig`; handle them in every exhaustive `Op` switch the Zig build flags
> (print.zig, comptime_vm.zig reuse load/store, emit_llvm dispatch). (3) Add
> `tryLowerAtomicIntrinsic` in `src/ir/lower/call.zig` (recognize the two builtins, bake the
> const ordering literal into the op, loud-reject non-literal ordering AND non-scalar/bad-size
> `T`). (4) Make `emitAtomicLoad`/`emitAtomicStore` in `src/backend/llvm/ops.zig` BAIL loudly
> ("not yet implemented") this commit. (5) Add `examples/1700-atomics-load-store.sx`, seed the
> marker, capture the bail diagnostic as the locked snapshot, confirm `zig build test` green,
> commit. STOP — A.0b (real emission) is the next step. Do NOT implement emission in the same
> commit that adds the example.