sx Inline Assembly — Implementation Plan (ASM stream)
Design source of truth: design/inline-asm-design.md. This plan turns that doc's §II.7 stage-map + §II.8 phasing into ordered, commit-sized, testable steps. Read the design doc first — this file is the how/when, not the what/why.
Surface (decided):
asm volatile { "template", "=r" -> T, "r" = expr, clobbers(.cc, .memory) }
— brace block; -> output / = input; clobbers(.…) dot-name list; N -> Type
outputs return a tuple; templates are pure AT&T (via LLVM).
Feasibility (confirmed): sx links LLVM@19; src/llvm_api.zig @cImports
llvm-c/Core.h, so llvm_api.c.* already exposes LLVMGetInlineAsm (9-arg),
LLVMInlineAsmDialectATT, LLVMBuildCall2, LLVMAppendModuleInlineAsm. No shim.
Relationship to other streams:
- Phases A–E (the inline-asm expression) are independent of EXTERN-EXPORT.
- Phase F (global asm) consumes extern/export to import/expose asm symbols; do it after PLAN-EXTERN-EXPORT.md Phase 2.
Cadence (IMPASSIBLE)
No commit may both add a test AND make it pass. Each feature step is either a
behavior-locking PASSING test, or an xfail test the next commit turns green.
Arch-pinned tests live in examples/16xx-platform-asm-* and declare their target
via the "target" key of the expected/<name>.build config file (Phase 0). Never
regenerate snapshots while red.
Phase 0 — corpus target-gating (test-infra prerequisite; no compiler code)
Why first. The flagship v1 examples are x86_64 (syscall-write, divmod,
cpuid) but the dev host is aarch64-Darwin, and the corpus runner
(src/corpus_run.test.zig) currently (a) never threads
a per-example --target and (b) has no host-arch gate — its only skip is "marker
has no .sx". So D.0's …-syscall-write markers asserting exit/stdout describe
output the harness cannot produce on this host, which would violate the cadence
rule (the "next commit turns it green" can never happen). Phase 0 closes that gap.
It touches only the runner + two fixtures — zero compiler code, zero risk to
A–E, and unblocks every arch-pinned asm example.
Marker taxonomy (the cleanup). The runner currently spreads per-example
directives across standalone boolean/value sidecars (.aot now, .target
proposed, more later). Replace that sprawl with one optional config file,
expected/<name>.build, holding all build/run directives; the output snapshots
(.exit / .stdout / .stderr / .ir) stay separate — they are
machine-regenerated data, not config. .exit remains the test-discovery key
(every test has one; .build is optional).
.build format — JSON, parsed with std.json:
{ "aot": true, "target": "x86_64-linux" }
Parse via std.json.parseFromSlice(BuildConfig, …) into
struct { aot: bool = false, target: ?[]const u8 = null }. Field defaults cover
omitted keys; std.json's default ignore_unknown_fields = false makes an
unknown key a loud error.UnknownField (surfaced as a runner failure, never a
silent ignore — CLAUDE.md no-silent-default rule). Extensible: future "cpu",
"link", "cwd" are just new optional struct fields, no new sidecar file and no
custom parser.
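The strict-parse behavior described above can be modeled in a few lines. This is a minimal Python sketch, not the Zig runner code: BuildConfig mirrors the struct named in the text, while parse_build_config and the UnknownField exception class are hypothetical names standing in for std.json.parseFromSlice and error.UnknownField.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


class UnknownField(ValueError):
    """Python stand-in for Zig's error.UnknownField (hypothetical name)."""


@dataclass(frozen=True)
class BuildConfig:
    aot: bool = False             # omitted key -> default JIT path
    target: Optional[str] = None  # omitted key -> no --target threaded


def parse_build_config(text: str) -> BuildConfig:
    """Parse a .build file, rejecting any key the struct does not declare."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError(".build must hold a single JSON object")
    known = {f.name for f in fields(BuildConfig)}
    unknown = sorted(set(raw) - known)
    if unknown:
        # Loud failure, never a silent ignore.
        raise UnknownField("unknown .build key: " + ", ".join(unknown))
    if "aot" in raw and not isinstance(raw["aot"], bool):
        raise TypeError('"aot" must be a JSON boolean')
    if raw.get("target") is not None and not isinstance(raw["target"], str):
        raise TypeError('"target" must be a JSON string')
    return BuildConfig(**raw)
```

Adding a future directive such as "cpu" would then amount to one more optional field on BuildConfig; until that field exists, a .build file that uses it fails loudly.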
What the directives do:
- target = <triple|shorthand> threads --target <value> into every sx invocation for that example (run/build/ir; --target is a global flag, confirmed main.zig:39), AND host-match selects the mode. The runner parses the leading arch+os tokens of the resolved triple and compares them to @import("builtin").target (normalizing arm64→aarch64):
  - match → execute exactly as today (sx run, or aot build+exec) with the target threaded, plus the .ir diff if an .ir snapshot exists. ⇒ an x86_64 example gives real end-to-end coverage on an x86_64 CI runner.
  - mismatch → ir-only: run only sx ir <file> --target <t>; assert .exit (the ir command's exit), .ir (normalized stdout), and .stderr (diagnostics, normally empty). Do not run/build/exec; do not assert .stdout. An .ir snapshot is required in ir-only mode; its absence is a loud runner failure ("arch-pinned : ir-only mode requires an .ir snapshot"), never a silent pass. Robust even if sx ir treats --target as a partial no-op: the inline_asm op carries the template + constraint string verbatim, so the IR snapshot still locks the exact thing §II.11 flags as silently-miscompiling (the constraint assembler + template rewrite).
- aot is the existing JIT-vs-build+exec switch, just relocated from the standalone .aot marker into .build.
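The host-match gate and the --target threading can be sketched as follows. This is a Python model under stated assumptions, not the Zig implementation: it assumes the resolved value is spelled arch-os[-abi] (as in "x86_64-linux"), so a four-part arch-vendor-os-env triple would need resolving first. The helper names split_arch_os, select_mode, and sx_argv are hypothetical; host_matches_target mirrors the hostMatchesTarget helper named in Step 0.0, with the host passed in explicitly so the logic stays testable.

```python
from typing import List, Optional, Tuple

# Only the normalization the plan names; other aliases are out of scope here.
ARCH_ALIASES = {"arm64": "aarch64"}


def split_arch_os(triple: str) -> Tuple[str, str]:
    """Return (arch, os) from the two leading tokens (assumed arch-os order)."""
    parts = triple.split("-")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("target %r needs at least <arch>-<os>" % triple)
    return ARCH_ALIASES.get(parts[0], parts[0]), parts[1]


def host_matches_target(target: str, host_arch: str, host_os: str) -> bool:
    arch, os_name = split_arch_os(target)
    return arch == ARCH_ALIASES.get(host_arch, host_arch) and os_name == host_os


def select_mode(target: Optional[str], host_arch: str, host_os: str) -> str:
    """No target, or a matching one, executes; a mismatch degrades to ir-only."""
    if target is None or host_matches_target(target, host_arch, host_os):
        return "execute"
    return "ir-only"


def sx_argv(subcommand: str, source: str, target: Optional[str]) -> List[str]:
    """Build the spawned argv, threading --target only when one is configured."""
    argv = ["sx", subcommand, source]
    if target is not None:
        argv += ["--target", target]
    return argv
```

On the aarch64 dev host described above, select_mode("x86_64-linux", "aarch64", "macos") yields "ir-only", so the runner would spawn only sx_argv("ir", ...) for that example.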
Negative compile-error examples need NO .build. …-missing-volatile
(no-output-without-volatile) is a Sema diagnostic raised before codegen/JIT, so
plain sx run reports it identically on any host — it stays a normal example with
no config file.
update-goldens interaction: in ir-only mode, -Dupdate-goldens writes .exit
(ir exit) + .ir (+ .stderr if non-empty) and skips .stdout. Execute mode
(incl. aot) is unchanged. .build is hand-authored — update-goldens never
writes it.
| Step | Commit | What | Files |
|---|---|---|---|
| 0.0 | lock | Add BuildConfig + std.json parse of expected/<name>.build (unknown-key ⇒ error.UnknownField); migrate the 2 existing .aot markers → .build (content { "aot": true }) and delete them; thread target's --target into the spawned argv; add hostMatchesTarget(value) bool (arch+os token parse, arm64→aarch64) gating the execute path. Lock with examples/16xx-platform-target-host.sx (trivial main) + a .build { "target": "<host arch triple>" } (still runs+passes) and unit tests for the JSON parse + hostMatchesTarget. | src/corpus_run.test.zig, examples/expected/1226-*.{aot→build}, …/1227-*, + fixture |
| 0.1 | lock | Implement the mismatch ⇒ ir-only branch (skip run/build/exec; assert .exit+.ir+.stderr from sx ir --target; require .ir). Lock with examples/16xx-platform-target-cross.sx (asm-free () -> i64 { return 0; }) + .build { "target": "x86_64-linux" } + a checked-in .ir snapshot; this exercises ir-only on the arm64 host. | src/corpus_run.test.zig + fixture |
| 0.2 | docs | Update CLAUDE.md §"Test layout"/§"Testing" to document .build (format + aot/target keys) replacing the standalone .aot marker prose (lines ~435, ~492). | CLAUDE.md |
Both 0.0 and 0.1 are lock commits: the runner change and the fixture that exercises it land together and pass the moment they land (the mechanism works immediately — nothing is left red), which is the cadence rule's "lock in current behavior" flavor, not a feature red→green. No asm lowering is gated on either.
Phase 0 verification: zig build test green; deliberately corrupt the
cross-target .ir fixture and confirm the runner reports an IR mismatch (proves
ir-only actually asserts, isn't a no-op); delete it and confirm the
"requires an .ir snapshot" failure fires.
Estimated runner delta: ~70–90 lines (sidecar read + --target argv threading +
hostMatchesTarget + the ir-only branch + update-mode tweak). Within the "no step > ~500 new lines" rule; well under the read budget.
Phase A — keyword + AST + parser (parses; no codegen)
| Step | Commit | What | Files |
|---|---|---|---|
| A.0 | lock | add kw_asm keyword + map entry; unit lex test asm → kw_asm | src/token.zig, src/lexer.zig + .test.zig |
| A.1 | xfail | parse asm { … } → AsmExpr/AsmOperand in parsePrimary; pin an AST/sx ir parse snapshot; lowering still bailDetail("inline asm codegen unimplemented") | src/ast.zig (:85 union arm, :721 structs), src/parser.zig (parsePrimary), src/ir/interp.zig |
| A.2 | green | parse-shape snapshot lands green; the unimplemented bail is loud + named | — |
Phase B — sema / typing
| Step | Commit | What | Files |
|---|---|---|---|
| B.0 | xfail | result-type rule (0→void / 1→T / N→named-or-positional tuple) + checklist (no-output⇒volatile, layout, comptime-string template); pin error messages | src/ir/expr_typer.zig |
| B.1 | green | typing + diagnostics implemented; .unresolved sentinel on failure (no silent default) | src/ir/expr_typer.zig, src/ir/semantic_diagnostics.zig |
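The result-type rule in B.0 is small enough to model directly. The following Python sketch is illustrative only: asm_result_type and AsmTypeError are hypothetical names, the diagnostic wording is a placeholder (B.0 pins the real messages), and both the tuple spelling and the choice to treat a mix of named and unnamed outputs as positional are assumptions the plan does not settle.

```python
from typing import List, Optional, Tuple

# (operand name or None, sx type name); the string spelling is illustrative.
Output = Tuple[Optional[str], str]


class AsmTypeError(Exception):
    """Stand-in for a Sema diagnostic (hypothetical)."""


def asm_result_type(outputs: List[Output], is_volatile: bool) -> str:
    if not outputs:
        # Checklist item: an asm with no outputs is only meaningful if volatile.
        if not is_volatile:
            raise AsmTypeError("asm with no outputs must be marked volatile")
        return "void"
    if len(outputs) == 1:
        return outputs[0][1]
    if all(name is not None for name, _ in outputs):
        # Every output operand is named -> named tuple.
        return "(" + ", ".join("%s: %s" % (n, t) for n, t in outputs) + ")"
    # Assumption: any unnamed operand makes the whole tuple positional.
    return "(" + ", ".join(t for _, t in outputs) + ")"
```

For a divmod-style block with outputs named quot and rem, this model yields a two-field named tuple; a block with no outputs and no volatile raises instead of defaulting to void.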
Phase C — IR op + lowering
| Step | Commit | What | Files |
|---|---|---|---|
| C.0 | lock | add inline_asm: InlineAsm to Op + AsmOperand (role/name/constraint/operand) + interp bailDetail arm; unit tests for the IR shape | src/ir/inst.zig (:80), src/ir/interp.zig |
| C.1 | xfail→green | lowerAsmExpr in lowerExpr dispatch: interns template/constraints/clobber-names, lowers input Refs, sets result TypeId | src/ir/lower/expr.zig |
Phase D — LLVM emit (single value-output; the core)
| Step | Commit | What | Files |
|---|---|---|---|
| D.0 | xfail | examples/16xx-platform-asm-syscall-write.sx + …-register-read.sx + …-no-output-volatile.sx + …-missing-volatile.sx (expected compile error); all red | examples + expected/ markers |
| D.1 | green | emitInlineAsm: port FuncGen.airAssembly; constraint-string assembler (outputs =/+, inputs, clobbers(.name)→~{name}), %[name]→${N} / %% / %= template rewriter, LLVMGetInlineAsm+LLVMBuildCall2, sideeffect=volatile, AT&T dialect | src/ir/emit_llvm.zig (emitInst dispatch + handler) |
| D.2 | green | lock the template-rewrite + constraint string via an expected/*.ir snapshot on …-template-subst.sx | examples |
Phase D verification: zig build test; the syscall example runs on
x86_64-linux; IR snapshot matches the design doc's worked sys_write lowering.
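The two string transforms in D.1 can be modeled as below. This is a hedged Python sketch, not the emitInlineAsm port. The function names are hypothetical. Operand numbering (outputs first, then inputs) and the ~{name} clobber spelling follow LLVM's inline-asm constraint syntax. The mappings %% → %, %= → ${:uid}, and $ → $$ are assumptions based on LLVM's asm-string escapes, since the plan names those tokens without giving their targets. Read-write "+" outputs are deliberately left out.

```python
from typing import List


def build_constraints(outputs: List[str], inputs: List[str],
                      clobbers: List[str]) -> str:
    """Join constraints in LLVM order: outputs, inputs, then ~{clobber} entries."""
    for c in outputs:
        if not c.startswith("="):
            raise ValueError("output constraint %r must start with '='" % c)
    for c in inputs:
        if c[:1] in ("=", "+"):
            raise ValueError("input constraint %r must not be an output" % c)
    return ",".join(outputs + inputs + ["~{%s}" % name for name in clobbers])


def rewrite_template(template: str, operand_names: List[str]) -> str:
    """Rewrite %[name] to ${N}; operand_names lists outputs first, then inputs."""
    index = {name: i for i, name in enumerate(operand_names)}
    out = []
    i, n = 0, len(template)
    while i < n:
        ch = template[i]
        nxt = template[i + 1] if i + 1 < n else ""
        if ch == "$":
            out.append("$$")            # assumed: escape literal '$' for LLVM
            i += 1
        elif ch == "%" and nxt == "%":
            out.append("%")             # assumed: %% is a literal percent
            i += 2
        elif ch == "%" and nxt == "=":
            out.append("${:uid}")       # assumed: LLVM's per-instance unique id
            i += 2
        elif ch == "%" and nxt == "[":
            end = template.find("]", i + 2)
            if end == -1:
                raise ValueError("unterminated %[ in asm template")
            name = template[i + 2:end]
            if name not in index:
                raise ValueError("unknown asm operand %r" % name)
            out.append("${%d}" % index[name])
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A syscall-style operand list such as one "={rax}" output, four register inputs, and clobbers rcx, r11, memory would assemble to "={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}" in this model; the design doc's worked sys_write lowering remains the authority for the real string.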
Phase E — multi-return tuples + clobbers(.…)
| Step | Commit | What | Files |
|---|---|---|---|
| E.0 | xfail | …-asm-multi-return.sx (divmod→(quot,rem), cpuid→4-tuple) red | examples |
| E.1 | green | N out_value → LLVM struct return + extractvalue i → sx tuple (named when operands named); clobbers(.name) dot-name lowering finalized | src/ir/emit_llvm.zig, src/ir/lower/expr.zig |
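The shape of E.1's multi-output lowering can be illustrated by generating the textual LLVM IR involved. This Python sketch only models the strings; the real emitter would go through the LLVM C API rather than printing text. The helper names llvm_return_type and extract_outputs, and the %r.0 / %r.1 value naming, are hypothetical.

```python
from typing import List


def llvm_return_type(out_types: List[str]) -> str:
    """0 outputs -> void, 1 -> the type itself, N -> a literal struct type."""
    if not out_types:
        return "void"
    if len(out_types) == 1:
        return out_types[0]
    return "{ " + ", ".join(out_types) + " }"


def extract_outputs(call_value: str, out_types: List[str]) -> List[str]:
    """Emit one extractvalue per output; needed only for the N >= 2 struct case."""
    if len(out_types) < 2:
        return []
    agg = llvm_return_type(out_types)
    return [
        "%%%s.%d = extractvalue %s %%%s, %d" % (call_value, i, agg, call_value, i)
        for i in range(len(out_types))
    ]
```

For a two-output divmod-style call named r with i64 results, the model produces two extractvalue lines over { i64, i64 }, whose results would then be packed into the sx tuple.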
Phase F — global asm (needs EXTERN-EXPORT Phase 2)
| Step | Commit | What | Files |
|---|---|---|---|
| F.0 | xfail | top-level asm { … } decl parsed (reject operands/volatile); …-asm-global.sx (defines a symbol, imported via extern) red | src/parser.zig, src/ast.zig |
| F.1 | green | lower asm_global → c.LLVMAppendModuleInlineAsm; comptime-call guard (dlsym-miss is loud); blocks concatenate in source order | src/ir/lower/decl.zig, src/ir/emit_llvm.zig, src/ir/interp.zig |
Phase G — later (own steps when scheduled)
-> @place write-through + read-write ("+r" -> @place) + indirect-memory
("=*m") outputs · %= unique-id · output-to-const rejection · Intel-dialect
opt-in · naked functions (callconv(.naked), coordinate with EXTERN-EXPORT).
Open decisions (design doc §II.10)
Dialect (AT&T-only v1, recommended) · volatile contextual-keyword (recommended)
· brace separator comma (recommended) · clobbers(.name) dot-name sugar now →
checked per-arch Clobber enum later (Phase 4 of the design doc).
End-to-end verification (per phase)
zig build && zig build test; for arch-pinned examples confirm they run on a
matching host or assert on sx ir/.s snapshots. After intentional output
changes only: zig build test -Dupdate-goldens, then review the diff.