Files
sx/current/CHECKPOINT-ASM.md
agra 4d75b9323c feat(asm): Phase F — global (module-scope) asm
A top-level `asm { "tmpl", };` block (template only) lowers to LLVM `module asm`;
a lib-less `extern` declaration calls into the symbols it defines (the import
direction reuses the existing C-FFI extern path — no new surface).

- ast.zig: asm_global node (AsmGlobal { template }).
- parser.zig: parseAsmGlobal, dispatched from parseTopLevel on kw_asm — rejects
  `volatile` and any operands/clobbers (template only). The in-function asm
  expression form stays in parsePrimary.
- module.zig: Module.global_asm list; lower/decl.zig captures each template in
  lowerMainAndComptime (the real top-level pass — lowerDecls is dead for
  top-level); emit_llvm.zig emit() appends each via LLVMAppendModuleInlineAsm in
  source order.
- the new node forced asm_global arms in sema.zig (analyzeNode +
  findNodeAtOffset) and semantic_diagnostics.zig (checkBindingNames).

Verified end-to-end: an aarch64 `_my_add` global routine, called via `extern`,
returns 42 — AOT only (the ORC JIT doesn't link module-asm symbols; global-asm
symbols live in the final linked binary). Locked with 1648-platform-asm-global
({ "aot": true, "target": "macos" } → AOT build+run on aarch64, ir-only else).

zig build test green (656 corpus, 446 unit).
2026-06-15 22:22:29 +03:00

235 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# sx Inline Assembly — Checkpoint (ASM stream)
Companion to `current/PLAN-ASM.md`; design in
[docs/inline-asm-design.md](../docs/inline-asm-design.md). Update after every
commit, one step at a time per the cadence rule (no commit may both add a test
and make it pass).
## Last completed step
**F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
only) lowers to LLVM `module asm`, and a lib-less `extern` calls into the symbols
it defines. New `asm_global` AST node (`src/ast.zig`) + `parseAsmGlobal`
(`src/parser.zig`, dispatched from `parseTopLevel` on `kw_asm`) — rejects
`volatile` and any operands/clobbers. The node forced (and got) arms in the same
three `Node.Data` switches as `asm_expr` (`sema.zig` ×2, `semantic_diagnostics.zig`).
`Module` gains a `global_asm: ArrayList([]const u8)` (`src/ir/module.zig`);
`lowerMainAndComptime` captures each template (the dead `lowerDecls` is NOT the
top-level pass — `lowerRoot` Pass 2 uses `lowerMainAndComptime`); `emit_llvm.zig`'s
`emit()` appends each via `LLVMAppendModuleInlineAsm` (source order). Verified
end-to-end: an aarch64 `_my_add` global routine called via `extern` returns 42 —
**AOT only** (the ORC JIT doesn't link module-asm symbols, so `sx run` is wrong;
the design ties global-asm symbols to the final linked binary). Locked with
`examples/1648-platform-asm-global.sx` (`.build { "aot": true, "target": "macos" }`
→ AOT build+run on aarch64, ir-only elsewhere). `zig build test` green (656
corpus, 446 unit). Files: `src/ast.zig`, `src/parser.zig`, `src/sema.zig`,
`src/ir/semantic_diagnostics.zig`, `src/ir/module.zig`, `src/ir/lower/decl.zig`,
`src/ir/emit_llvm.zig`, `examples/1648-*`.
Prior: **E** — multi-output tuples. **Inline asm now returns tuples.** Replaced the
N>1 bail with a shared `asmResultType` helper (`src/ir/lower/expr.zig`, mixed
into `Lowering`) that derives the result type from the `out_value` operands
(0→void, 1→T, N→named tuple, named via the §II.5 effective-name rule). The key
realization: `toLLVMType(tuple)` already produces a literal struct `{T1,…,Tn}`
exactly LLVM's multi-output asm return — so **emit needed NO change**; building
the op with a tuple result type makes the asm call return the struct, which IS
sx's tuple value (destructured by the normal `tuple_get` path). `inferType`'s
`.asm_expr` arm now also delegates to `asmResultType` (single owner), so
`return asm`, `x := asm`, and `q, r := asm` all agree on the type. Verified
end-to-end on aarch64: `split(0x1234)``(lo=52, hi=18)`, a udiv/msub divmod→
`(3, 2)`. IR is textbook: `call { i64, i64 } asm "divq ${4}",
"={rax},={rdx},{rax},{rdx},r,~{cc}"(…)` → extractvalue → tuple. Converted 1640 to
the x86_64 multi-output IR lock (ir-only) + added `1647-platform-asm-aarch64-multi`
(runs on aarch64). `zig build test` green (655 corpus, 446 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `src/ir/expr_typer.zig`,
`examples/164{0,7}-*`.
Prior: **C.1 + D** — inline asm CODEGEN (lowering builds the op + LLVM emit). **Inline
assembly now runs end-to-end.** `lowerAsmExpr` (`src/ir/lower/expr.zig`) stops
bailing: it resolves each operand's effective name (§II.5 auto-naming), interns
template/constraints/clobbers, lowers input `Ref`s, derives the result `TypeId`
(0→void, 1→T), and builds the `inline_asm` op. Added a `%[name]`-references-a-
real-operand check (the last deferred validation). Multi-output (N>1) still bails
loudly ("Phase E"). `emitInlineAsm` (`src/backend/llvm/ops.zig`, port of Zig's
`airAssembly`): assembles the LLVM constraint string (outputs→inputs→`~{clobber}`,
`,``|`), rewrites the template (`%[name]``${N}`, `%%``%`, `$``$$`, `%=`
`${:uid}`), then `LLVMGetInlineAsm` + `LLVMBuildCall2` (AT&T). Dispatch wired
(`emit_llvm.zig`, replacing the C.0 `@panic`). **`llvm_shim.c`**: added
`LLVMInitializeNativeAsmParser()` — the JIT must assemble inline asm at run time.
Verified end-to-end: aarch64 `add`/`mov` run on the host (exit 42), `nop volatile`
runs (1642 now exit 0), IR is textbook (`call i64 asm "add ${0},${1},${2}",
"=r,r,r"(…)`). Locked with `examples/1645-platform-asm-aarch64-add.sx` (runs on
aarch64, ir-only elsewhere via `.build` + `.ir`). Also added the `inferType`
`.asm_expr` arm (`src/ir/expr_typer.zig`, 0→void / 1→T) — without it a bare
`x := asm {…-> T}` binding inferred `.unresolved` and silently produced 0;
regression-locked with `examples/1646-platform-asm-value-binding.sx`. Updated
1640 (now Phase-E bail) + 1642 (now runs). `zig build test` green (654 corpus,
446 unit). Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`,
`src/ir/emit_llvm.zig`, `src/ir/expr_typer.zig`, `llvm_shim.c`,
`examples/164{0,2,5,6}-*`.
Prior: **C.0** — IR op `inline_asm` (lock; no behavior change). Added `inline_asm:
InlineAsm` to the IR `Op` union + the `InlineAsm` struct (`template: StringId`,
`operands: []const AsmOperand` {role/name/constraint/operand}, `clobbers:
[]const StringId`, `has_side_effects`) in `src/ir/inst.zig` — all strings
interned, operands in source order, result on `Inst.ty`. The new variant forced
(and got) arms in two exhaustive `Op` switches: `src/ir/interp.zig` (loud
`bailDetail` — inline asm is never comptime-evaluable) and `src/ir/print.zig`
(IR dump). `src/ir/emit_llvm.zig` gets a `@panic` **tripwire** — emit lands in
Phase D, and until then `lowerAsmExpr` still bails so no `inline_asm` op is ever
created (reaching emit would be a lowering-switched-over-too-early bug). Unit
test `inline_asm op shape` in `src/ir/inst.test.zig`. `zig build test` green
(652 corpus, 446 unit). Files: `src/ir/inst.zig`, `src/ir/interp.zig`,
`src/ir/print.zig`, `src/ir/emit_llvm.zig`, `src/ir/inst.test.zig`.
Prior: **B.1** — operand-name validation (design §II.5 auto-naming rule). Extended
`lowerAsmExpr` with a `pinnedRegister(constraint)` helper (`"={eax}"``eax`,
`"+{rax}"``rax`, `"=r"`→null) and two checks: (1) **reject the echo form**
`[eax] "={eax}"` — a label identical to its own pinned register is redundant
(the operand is already auto-named after the register); (2) **reject duplicate
operand names** (ambiguous `%[name]` / result field). Locked with
`examples/1643-platform-asm-echo-name.sx` + `1644-platform-asm-duplicate-name.sx`.
`zig build test` green (652 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`.
Prior: **B.0** — asm shape validation (compile-path diagnostics). Restructured the
`.asm_expr` lowering arm into `lowerAsmExpr` (`src/ir/lower/expr.zig`, mixed into
`Lowering` in `src/ir/lower.zig`): it validates BEFORE the not-yet-implemented
codegen bail, so the user sees the real problem first. Two checklist items now
enforced with named diagnostics: (1) **template must be a compile-time-known
string** (`"..."` / `#string`); (2) **no value outputs ⇒ must be `volatile`**
(mirrors Zig — a result-less asm could be deleted). Valid shapes still bail with
the "codegen not yet implemented" message. Result-type derivation + auto-naming
stay deferred to a later step (observable only once Phase C produces a real IR
op). Locked with `examples/1641-platform-asm-missing-volatile.sx` (volatile
error) + `1642-platform-asm-nop-volatile.sx` (volatile no-output accepted →
codegen bail). `zig build test` green (650 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `examples/164{1,2}-*`.
Prior: **A.1** — parse `asm { … }` + loud lowering bail (folded A.1+A.2 into one honest
lock commit, since the loud bail IS current correct behavior — cadence option
(a)). Added `AsmExpr`/`AsmOperand` to `src/ast.zig` + the `asm_expr` `Node.Data`
arm; `parseAsmExpr` in `src/parser.zig` (`parsePrimary` `.kw_asm` dispatch) —
parses the template, flat operand list (`[name]? "constraint" -> Type` value
output / `= expr` input), and `clobbers(.…)`; `volatile`/`clobbers` recognized
contextually via `isContextualWord`. The new `asm_expr` tag forced (and got)
arms in three exhaustive `Node.Data` switches: `src/sema.zig` `analyzeNode` +
`findNodeAtOffset`, `src/ir/semantic_diagnostics.zig` `checkBindingNames` (all
recurse into template + operand payloads). Lowering bails LOUD + named in
`src/ir/lower/expr.zig` ("inline assembly codegen is not yet implemented…") via
an explicit `.asm_expr` arm (not the generic `unknown_expr` else) returning
`emitPlaceholder`. `-> @place` write-through is rejected with a clear "Phase 2"
parse error. Locked with `examples/1640-platform-asm-parse.sx` (multi-output
`divmod`, named operands, register pins, clobbers — parses then bails; called
from `main`). `zig build test` green (648 corpus, 0 failed; 445 unit). Files:
`src/ast.zig`, `src/parser.zig`, `src/sema.zig`, `src/ir/semantic_diagnostics.zig`,
`src/ir/lower/expr.zig`, `examples/1640-*`.
Prior: **A.0**`kw_asm` keyword (first compiler code). Added the `kw_asm` `Token.Tag`
variant + `.{ "asm", .kw_asm }` keyword-map entry in `src/token.zig`; `volatile` /
`clobbers` deliberately stay OUT of the global table (contextual). New exhaustive
`Tag` switch in `src/lsp/server.zig` `classifyToken` flagged the missing arm (the
intended coverage tripwire) — added `.kw_asm` to the keyword group. Lock test in
new `src/lexer.test.zig` (`asm``kw_asm`, `volatile`/`clobbers``identifier`),
wired into the `src/root.zig` barrel as `lexer_tests`. `zig build test` green (648
corpus, 0 failed; 445 unit, 0 failed — +1). Files: `src/token.zig`,
`src/lexer.test.zig`, `src/root.zig`, `src/lsp/server.zig`.
Prior: **0.2** — CLAUDE.md docs for `<name>.build`; **Phase 0 COMPLETE**.
**0.1** — corpus runner **ir-only branch** for cross-target examples. Replaced
0.0's loud placeholder bail: when `cfg.target` doesn't match the host (`ir_only`),
`sweepRoot` skips run/build/exec and verifies via `sx ir --target` only —
asserting `.exit` (ir cmd) + `.ir` (normalized stdout) + `.stderr`, never
`.stdout` (write skipped in update mode, assertion skipped in verify mode). An
`.ir` snapshot is **required** in ir-only mode — its absence is a loud failure
("needs an .ir snapshot for ir-only mode"). Locked with
`examples/1639-platform-target-cross.sx` (asm-free `main :: () -> i64 { return 0;
}`), `.build` `{ "target": "x86_64-linux" }`, + checked-in `.ir`. Verified both
guards fire: corrupting the `.ir` → IR mismatch; deleting it → the require-failure.
`zig build test` green (647 corpus, 0 failed; 444 unit). Files:
`src/corpus_run.test.zig`, `examples/1639-*`.
## Current state
**Inline assembly works end-to-end: 0, 1, and N value outputs (tuples).** Full
pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + `%[name]` check) → IR op
(C.0) → lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D) → multi-output
tuples (E). Register-class + register-pinned operands, inputs, clobbers, `#string`
multi-instruction templates, `%[name]`/`%%` rewriting, and the §II.5 auto-naming
rule all work and execute on the host JIT. Global `asm { … }` (Phase F) works AOT (call-into-asm
via lib-less `extern`). **Remaining feature gap:** `-> @place` write-through /
read-write / indirect-memory outputs (rejected at parse — Phase 2). Smaller
follow-ups: the comptime-call guard for global asm (`#run` into a module-asm
symbol should fail loud via dlsym-miss — pin a test), a JIT-vs-global-asm note
(`sx run` silently mishandles module-asm symbols; AOT is correct), the x86_64
syscall ir-only example, and a `readme.md` inline-asm section (docs-track-changes).
Known orthogonal bug: **issue 0137**`sx run` on a program with no `main`
segfaults (`src/target.zig:256-273`, unguarded JIT entry lookup). Pre-existing,
asm-independent; does NOT block the ASM stream (every example has a `main`).
Phase EF feasibility already confirmed against the live tree
(`LLVMGetInlineAsm` / `LLVMBuildCall2` / `LLVMAppendModuleInlineAsm` in LLVM@19
`Core.h`; ERR-stream `extractvalue`→tuple in `emit_llvm.zig:726-927`; lib-less
`extern`, 60 sites; `--target` a global CLI flag).
## Next step
**Phase 2 — `-> @place` outputs** (the last feature gap): write-through
(`"=…" -> @place`), read-write (`"+…" -> @place`), and indirect-memory (`"=*m"`)
outputs, currently rejected at parse. Needs: parse `-> @<place-expr>` into an
`out_place` operand (payload = the place expr), lower the place to an address +
`store` the asm result through it (place outputs don't join the result tuple),
the `+` read-write seeding, and output-to-`const` rejection. See `PLAN-ASM.md`
Phase G / design §II.2 Dev 5 + cookbook (`cas`, `memcpy_bytes`, `cpuid_into`).
Smaller polish (any order): comptime-call guard test for global asm; `sx run`
should error (not silently mishandle) a module-asm symbol; x86_64 syscall-write
ir-only example; `readme.md` inline-asm section. Orthogonal: **issue 0137**.
## Log
- (init) Plan + design doc written; ASM stream opened.
- (0.0) Corpus runner target-gating: `<name>.build` JSON config (replaces `.aot`
marker), `--target` threading, `hostMatchesTarget` execute-gate, loud
cross-target placeholder bail. Migrated 1226/1227 `.aot``.build`; locked with
1638 fixture + unit tests. `zig build test` green.
- (0.1) ir-only branch: cross-target examples verify via `sx ir --target` only
(exit+ir+stderr, no stdout; `.ir` required). Locked with 1639 fixture; verified
corrupt-.ir → mismatch and missing-.ir → loud failure. `zig build test` green.
- (0.2) docs: CLAUDE.md documents `<name>.build` JSON sidecar (aot + target +
ir-only gating), replacing stale `.aot` marker prose. **Phase 0 COMPLETE.**
- (A.0) `kw_asm` keyword in token.zig (+ map entry); LSP `classifyToken` switch
coverage; lock test in new `lexer.test.zig` (wired via root.zig). `volatile` /
`clobbers` stay contextual identifiers. `zig build test` green (445 unit, +1).
- (A.1) parse `asm { … }``AsmExpr` + loud lowering bail; `asm_expr` arms in 3
exhaustive `Node.Data` switches; `-> @place` rejected (Phase 2). Adopted operand
auto-naming rule (design §II.5). Locked with 1640 fixture. Filed orthogonal
issue 0137 (no-`main` JIT segfault). `zig build test` green (648 corpus, 445 unit).
- (B.0) asm shape validation in `lowerAsmExpr`: comptime-string template +
no-output⇒volatile, with named diagnostics before the codegen bail. Locked with
1641 (volatile error) + 1642 (volatile accepted). `zig build test` green (650
corpus, 445 unit).
- (B.1) operand-name validation: `pinnedRegister` helper + reject echo form
(`[eax] "={eax}"`) and duplicate names. Locked with 1643 + 1644. `zig build
test` green (652 corpus, 445 unit).
- (C.0) IR op `inline_asm: InlineAsm` + interp `bailDetail` + print arm + emit
`@panic` tripwire (Phase D). No behavior change (lowering still bails). Unit
test `inline_asm op shape`. `zig build test` green (652 corpus, 446 unit).
- (C.1+D) CODEGEN — `lowerAsmExpr` builds the op (effective names, interned
strings, input Refs, 0/1 result type) + `%[name]` validation; `emitInlineAsm`
(constraint string + template rewrite + `LLVMGetInlineAsm`/`BuildCall2`, AT&T);
`inferType` arm; `LLVMInitializeNativeAsmParser` for the JIT. **Inline asm runs
end-to-end.** N>1 bails (Phase E). Locked with 1645 (aarch64 add, runs) + 1646
(`:=` binding); updated 1640/1642. `zig build test` green (654 corpus, 446 unit).
- (E) multi-output tuples — `asmResultType` helper (0→void/1→T/N→named tuple),
shared by lowering + `inferType`. `toLLVMType(tuple)` == LLVM multi-output
struct, so emit unchanged; the asm struct return IS the sx tuple. Runs on
aarch64 (1647: `split``(lo,hi)`); 1640 → x86 multi-output IR lock (ir-only).
`zig build test` green (655 corpus, 446 unit).
- (F) global asm — `asm_global` AST node + `parseAsmGlobal` (top-level, rejects
volatile/operands); `Module.global_asm` captured in `lowerMainAndComptime`;
`emit()` appends via `LLVMAppendModuleInlineAsm`; call-into via lib-less
`extern`. AOT-verified (1648, `_my_add`→42). `zig build test` green (656 corpus).
## Known issues
- **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry
lookup, `src/target.zig:256-273`). Pre-existing, asm-independent. Filed
`issues/0137-jit-run-no-main-segfault.md`. Does not block A.1.