# sx Inline Assembly: Implementation Plan (ASM stream)

**Design source of truth:** [docs/inline-asm-design.md](../docs/inline-asm-design.md). This plan turns that doc's §II.7 stage-map + §II.8 phasing into ordered, commit-sized, testable steps. Read the design doc first: this file is the *how/when*, not the *what/why*.

**Surface (decided):** `asm volatile { "template", "=r" -> T, "r" = expr, clobbers(.cc, .memory) }`: brace block; `->` output / `=` input; `clobbers(.…)` dot-name list; N `-> Type` outputs return a tuple; templates are pure AT&T (via LLVM).

**Feasibility (confirmed):** sx links LLVM@19; `src/llvm_api.zig` `@cImport`s `llvm-c/Core.h`, so `llvm_api.c.*` already exposes `LLVMGetInlineAsm` (9-arg), `LLVMInlineAsmDialectATT`, `LLVMBuildCall2`, `LLVMAppendModuleInlineAsm`. No shim.

**Relationship to other streams:**

- Phases A–E (the inline-asm *expression*) are independent of EXTERN-EXPORT.
- Phase F (global asm) consumes `extern`/`export` to import/expose asm symbols, so do it **after** `PLAN-EXTERN-EXPORT.md` Phase 2.

## Cadence (IMPASSIBLE)

No commit may both add a test AND make it pass. Each feature step is either a behavior-locking PASSING test, or an xfail test the *next* commit turns green. Arch-pinned tests live in `examples/16xx-platform-asm-*` and declare their target via the `target` key of the `expected/.build` config file (Phase 0). Never regenerate snapshots while red.

## Phase 0: corpus target-gating (test-infra prerequisite; no compiler code)

**Why first.** The flagship v1 examples are `x86_64` (syscall-write, divmod, cpuid) but the dev host is `aarch64`-Darwin, and the corpus runner ([src/corpus_run.test.zig](../src/corpus_run.test.zig)) currently (a) never threads a per-example `--target` and (b) has no host-arch gate: its only skip is "marker has no `.sx`".
So D.0's `…-syscall-write` markers asserting exit/stdout describe output the harness *cannot* produce on this host, which would violate the cadence rule (the "next commit turns it green" can never happen). Phase 0 closes that gap. It touches **only the runner + two fixtures**: zero compiler code, zero risk to A–E, and it unblocks every arch-pinned asm example.

**Marker taxonomy (the cleanup).** The runner currently spreads per-example *directives* across standalone boolean/value sidecars (`.aot` now, `.target` proposed, more later). Replace that sprawl with **one optional config file, `expected/.build`**, holding all build/run directives; the output snapshots (`.exit` / `.stdout` / `.stderr` / `.ir`) stay separate because they are machine-regenerated data, not config. `.exit` remains the **test-discovery key** (every test has one; `.build` is optional).

**`.build` format:** JSON, parsed with `std.json`:

```json
{ "aot": true, "target": "x86_64-linux" }
```

Parse via `std.json.parseFromSlice(BuildConfig, …)` into `struct { aot: bool = false, target: ?[]const u8 = null }`. Field defaults cover omitted keys; `std.json`'s default `ignore_unknown_fields = false` makes an **unknown key a loud `error.UnknownField`** (surfaced as a runner failure, never a silent ignore, per the CLAUDE.md no-silent-default rule). Extensible: future `"cpu"`, `"link"`, `"cwd"` are just new optional struct fields, no new sidecar file and no custom parser.

**What the directives do:**

1. **`target = `** threads `--target ` into every `sx` invocation for that example (`run` / `build` / `ir`; `--target` is a global flag, confirmed [main.zig:39](../src/main.zig#L39)), AND **host-match selects the mode.** The runner parses the leading `arch` + `os` tokens of the resolved triple and compares them to `@import("builtin").target` (normalizing `arm64`→`aarch64`):
   - **match** → *execute* exactly as today (`sx run`, or `aot` build+exec) with the target threaded, plus the `.ir` diff if an `.ir` snapshot exists.
     ⇒ an x86_64 example gives **real end-to-end coverage on an x86_64 CI runner**.
   - **mismatch** → **ir-only**: run *only* `sx ir --target `; assert `.exit` (the ir command's exit), `.ir` (normalized stdout), and `.stderr` (diagnostics, normally empty). Do **not** run/build/exec; do **not** assert `.stdout`. An `.ir` snapshot is **required** in ir-only mode: its absence is a loud runner failure ("arch-pinned : ir-only mode requires an .ir snapshot"), never a silent pass. Robust even if `sx ir` treats `--target` as a partial no-op: the `inline_asm` op carries the template + constraint string verbatim, so the IR snapshot still locks the exact thing §II.11 flags as silently-miscompiling (the constraint assembler + template rewrite).
2. **`aot`** is the existing JIT-vs-build+exec switch, just relocated from the standalone `.aot` marker into `.build`.

**Negative compile-error examples need NO `.build`.** `…-missing-volatile` (no-output-without-`volatile`) is a Sema diagnostic raised before codegen/JIT, so plain `sx run` reports it identically on any host; it stays a normal example with no config file.

**update-goldens interaction:** in ir-only mode, `-Dupdate-goldens` writes `.exit` (ir exit) + `.ir` (+ `.stderr` if non-empty) and skips `.stdout`. Execute mode (incl. `aot`) is unchanged. `.build` is hand-authored; update-goldens never writes it.

| Step | Commit | What | Files |
|---|---|---|---|
| 0.0 | lock | Add `BuildConfig` + `std.json` parse of `expected/.build` (unknown-key ⇒ `error.UnknownField`); **migrate** the 2 existing `.aot` markers → `.build` (content `{ "aot": true }`) and delete them; thread `target`'s `--target` into the spawned argv; add `hostMatchesTarget(value) bool` (arch+os token parse, `arm64`→`aarch64`) gating the **execute** path. Lock with `examples/16xx-platform-target-host.sx` (trivial `main`) + a `.build` `{ "target": "" }` (still runs+passes) and unit `test`s for the JSON parse + `hostMatchesTarget`. | `src/corpus_run.test.zig`, `examples/expected/1226-*.{aot→build}`, `…/1227-*`, + fixture |
| 0.1 | lock | Implement the **mismatch ⇒ ir-only** branch (skip run/build/exec; assert `.exit`+`.ir`+`.stderr` from `sx ir --target`; require `.ir`). Lock with `examples/16xx-platform-target-cross.sx` (asm-free `() -> i64 { return 0; }`) + `.build` `{ "target": "x86_64-linux" }` + a checked-in `.ir` snapshot, which exercises ir-only on the arm64 host. | `src/corpus_run.test.zig` + fixture |
| 0.2 | docs | Update CLAUDE.md §"Test layout"/§"Testing" to document `.build` (format + `aot`/`target` keys) replacing the standalone `.aot` marker prose (lines ~435, ~492). | `CLAUDE.md` |

Both 0.0 and 0.1 are **lock** commits: the runner change and the fixture that exercises it land together and pass the moment they land (the mechanism works immediately; nothing is left red), which is the cadence rule's "lock in current behavior" flavor, not a feature red→green. No asm lowering is gated on either.

**Phase 0 verification:** `zig build test` green; deliberately corrupt the cross-target `.ir` fixture and confirm the runner reports an IR mismatch (proves ir-only actually asserts, isn't a no-op); delete it and confirm the "requires an .ir snapshot" failure fires.

**Estimated runner delta:** ~70–90 lines (sidecar read + `--target` argv threading + `hostMatchesTarget` + the ir-only branch + update-mode tweak). Within the "no step > ~500 new lines" rule; well under the read budget.
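
The two runner pieces that step 0.0 locks (the `.build` JSON parse and the host-match check) can be sketched as below. This is a minimal illustration, not the runner's actual code: it assumes Zig-style `arch-os[-abi]` triples, and `normalizeArch` and `tripleMatches` are hypothetical helpers introduced here so the comparison can be unit-tested on any host.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Shape taken from the plan; field defaults cover omitted keys.
const BuildConfig = struct {
    aot: bool = false,
    target: ?[]const u8 = null,
};

/// Hypothetical helper: map the `arm64` spelling onto `aarch64`.
fn normalizeArch(arch: []const u8) []const u8 {
    if (std.mem.eql(u8, arch, "arm64")) return "aarch64";
    return arch;
}

/// Hypothetical helper: compare the leading `arch` and `os` tokens of a
/// triple against explicit host names. Assumes `arch-os[-abi]` ordering.
fn tripleMatches(triple: []const u8, host_arch: []const u8, host_os: []const u8) bool {
    var it = std.mem.splitScalar(u8, triple, '-');
    const arch = normalizeArch(it.next() orelse return false);
    const os = it.next() orelse return false;
    return std.mem.eql(u8, arch, normalizeArch(host_arch)) and
        std.mem.eql(u8, os, host_os);
}

/// Name and signature from the plan; the body is an assumption.
fn hostMatchesTarget(value: []const u8) bool {
    return tripleMatches(
        value,
        @tagName(builtin.target.cpu.arch),
        @tagName(builtin.target.os.tag),
    );
}

test "BuildConfig: defaults apply, unknown keys are loud" {
    const a = std.testing.allocator;
    const parsed = try std.json.parseFromSlice(BuildConfig, a, "{ \"target\": \"x86_64-linux\" }", .{});
    defer parsed.deinit();
    try std.testing.expectEqual(false, parsed.value.aot);
    try std.testing.expectEqualStrings("x86_64-linux", parsed.value.target.?);
    // Default options reject keys the struct does not declare.
    try std.testing.expectError(
        error.UnknownField,
        std.json.parseFromSlice(BuildConfig, a, "{ \"cpu\": \"native\" }", .{}),
    );
}

test "tripleMatches compares arch + os tokens" {
    try std.testing.expect(tripleMatches("x86_64-linux", "x86_64", "linux"));
    try std.testing.expect(tripleMatches("arm64-macos", "aarch64", "macos"));
    try std.testing.expect(!tripleMatches("x86_64-linux", "aarch64", "macos"));
    _ = hostMatchesTarget("x86_64-linux"); // result depends on the host
}
```

Keeping the string comparison separate from the `builtin` lookup means the mismatch branch can be tested on the arm64 dev host without cross-compiling the runner.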

## Phase A: keyword + AST + parser (parses; no codegen)

| Step | Commit | What | Files |
|---|---|---|---|
| A.0 | lock | add `kw_asm` keyword + map entry; unit lex test `asm → kw_asm` | `src/token.zig`, `src/lexer.zig` + `.test.zig` |
| A.1 | xfail | parse `asm { … }` → `AsmExpr`/`AsmOperand` in `parsePrimary`; pin an AST/`sx ir` parse snapshot; lowering still `bailDetail("inline asm codegen unimplemented")` | `src/ast.zig` (:85 union arm, :721 structs), `src/parser.zig` (parsePrimary), `src/ir/interp.zig` |
| A.2 | green | parse-shape snapshot lands green; the unimplemented bail is loud + named | (none) |

## Phase B: sema / typing

| Step | Commit | What | Files |
|---|---|---|---|
| B.0 | xfail | result-type rule (0→`void` / 1→`T` / N→named-or-positional tuple) + checklist (no-output⇒`volatile`, layout, comptime-string template); pin error messages | `src/ir/expr_typer.zig` |
| B.1 | green | typing + diagnostics implemented; `.unresolved` sentinel on failure (no silent default) | `src/ir/expr_typer.zig`, `src/ir/semantic_diagnostics.zig` |

## Phase C: IR op + lowering

| Step | Commit | What | Files |
|---|---|---|---|
| C.0 | lock | add `inline_asm: InlineAsm` to `Op` + `AsmOperand` (role/name/constraint/operand) + interp `bailDetail` arm; unit tests for the IR shape | `src/ir/inst.zig` (:80), `src/ir/interp.zig` |
| C.1 | xfail→green | `lowerAsmExpr` in `lowerExpr` dispatch: interns template/constraints/clobber-names, lowers input `Ref`s, sets result `TypeId` | `src/ir/lower/expr.zig` |

## Phase D: LLVM emit (single value-output; the core)

| Step | Commit | What | Files |
|---|---|---|---|
| D.0 | xfail | `examples/16xx-platform-asm-syscall-write.sx` + `…-register-read.sx` + `…-no-output-volatile.sx` + `…-missing-volatile.sx` (expected compile error), all red | examples + `expected/` markers |
| D.1 | green | `emitInlineAsm`, a **port of `FuncGen.airAssembly`**: constraint-string assembler (outputs `=`/`+`, inputs, `clobbers(.name)`→`~{name}`), `%[name]`→`${N}` / `%%` / `%=` template rewriter, `LLVMGetInlineAsm`+`LLVMBuildCall2`, `sideeffect=volatile`, AT&T dialect | `src/ir/emit_llvm.zig` (emitInst dispatch + handler) |
| D.2 | green | lock the template-rewrite + constraint string via an `expected/*.ir` snapshot on `…-template-subst.sx` | examples |

**Phase D verification:** `zig build test`; the syscall example runs on `x86_64-linux`; IR snapshot matches the design doc's worked `sys_write` lowering.

## Phase E: multi-return tuples + `clobbers(.…)`

| Step | Commit | What | Files |
|---|---|---|---|
| E.0 | xfail | `…-asm-multi-return.sx` (`divmod`→`(quot,rem)`, `cpuid`→4-tuple) red | examples |
| E.1 | green | N `out_value` → LLVM struct return + `extractvalue i` → sx tuple (named when operands named); `clobbers(.name)` dot-name lowering finalized | `src/ir/emit_llvm.zig`, `src/ir/lower/expr.zig` |

## Phase F: global asm (needs EXTERN-EXPORT Phase 2)

| Step | Commit | What | Files |
|---|---|---|---|
| F.0 | xfail | top-level `asm { … }` decl parsed (reject operands/`volatile`); `…-asm-global.sx` (defines a symbol, imported via `extern`) red | `src/parser.zig`, `src/ast.zig` |
| F.1 | green | lower `asm_global` → `c.LLVMAppendModuleInlineAsm`; comptime-call guard (dlsym-miss is loud); blocks concatenate in source order | `src/ir/lower/decl.zig`, `src/ir/emit_llvm.zig`, `src/ir/interp.zig` |

## Phase G: later (own steps when scheduled)

`-> @place` write-through + read-write (`"+r" -> @place`) + indirect-memory (`"=*m"`) outputs · `%=` unique-id · output-to-const rejection · Intel-dialect opt-in · naked functions (`callconv(.naked)`, coordinate with EXTERN-EXPORT).

## Open decisions (design doc §II.10)

Dialect (AT&T-only v1, recommended) · `volatile` contextual-keyword (recommended) · brace separator comma (recommended) · `clobbers(.name)` dot-name sugar now → checked per-arch `Clobber` enum later (Phase 4 of the design doc).
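
To make the `clobbers(.name)` dot-name lowering and the Phase D.1 constraint-string assembly concrete, here is a minimal sketch of how the LLVM constraint string might be built: output constraints first, then inputs, then each clobber as `~{name}`, all comma-separated. The `Operand` struct, the `input` role name, and the `appendPart` / `buildConstraints` helpers are hypothetical and only illustrate the ordering; they are not the IR types the plan defines.

```zig
const std = @import("std");

/// Hypothetical operand roles; `out_value` is the plan's name, `input` is assumed.
const Role = enum { out_value, input };

const Operand = struct {
    role: Role,
    constraint: []const u8, // as written in source, e.g. "=r" or "r"
};

/// Append `part` to `buf` at `len.*`, with a comma between entries.
fn appendPart(buf: []u8, len: *usize, part: []const u8) error{NoSpaceLeft}!void {
    const sep: usize = if (len.* == 0) 0 else 1;
    if (len.* + sep + part.len > buf.len) return error.NoSpaceLeft;
    if (sep == 1) {
        buf[len.*] = ',';
        len.* += 1;
    }
    @memcpy(buf[len.*..][0..part.len], part);
    len.* += part.len;
}

/// Build "outputs,inputs,~{clobber},..." into the caller's buffer.
fn buildConstraints(
    buf: []u8,
    operands: []const Operand,
    clobbers: []const []const u8,
) error{NoSpaceLeft}![]const u8 {
    var len: usize = 0;
    // Outputs come before inputs regardless of source order.
    for ([_]Role{ .out_value, .input }) |role| {
        for (operands) |op| {
            if (op.role == role) try appendPart(buf, &len, op.constraint);
        }
    }
    // `clobbers(.cc)` becomes `~{cc}`.
    for (clobbers) |name| {
        var tmp: [64]u8 = undefined;
        const clobber = try std.fmt.bufPrint(&tmp, "~{{{s}}}", .{name});
        try appendPart(buf, &len, clobber);
    }
    return buf[0..len];
}

test "constraint string: outputs, inputs, then clobbers" {
    var buf: [128]u8 = undefined;
    const ops = [_]Operand{
        .{ .role = .input, .constraint = "r" },
        .{ .role = .out_value, .constraint = "=r" },
    };
    const got = try buildConstraints(&buf, &ops, &.{ "cc", "memory" });
    try std.testing.expectEqualStrings("=r,r,~{cc},~{memory}", got);
}
```

Because the operands are reordered with outputs first, any `%[name]`→`${N}` index produced by the template rewriter has to be computed against this final order rather than the source order.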

## End-to-end verification (per phase)

`zig build && zig build test`; for arch-pinned examples confirm they run on a matching host or assert on `sx ir`/`.s` snapshots. After intentional output changes only: `zig build test -Dupdate-goldens`, then review the diff.