Files
sx/current/CHECKPOINT-ASM.md

318 lines
22 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# sx Inline Assembly — Checkpoint (ASM stream)
Companion to `current/PLAN-ASM.md`; design in
[docs/inline-asm-design.md](../docs/inline-asm-design.md). Update after every
commit, one step at a time per the cadence rule (no commit may both add a test
and make it pass).
## Last completed step
**G (read-write `+` place outputs)** — a `+r` / `+{reg}` `-> @place` output is now
implemented (the last substantive feature). LLVM has no `+` constraint, so a
read-write place lowers to: an output **`=`** constraint (return slot, stored back
through the place after the call; the leading `+` rewritten to `=` in
`appendAsmConstraints`), **plus** a **tied input** (the decimal index of that
output) appended **after** the regular inputs, seeded with the place's loaded
value passed as a call arg. Tied inputs come **last** so existing operand indices
(`%[name]``${N}`) are undisturbed — `asmOperandIndex` unchanged. Lowering
(`lowerAsmExpr`) no longer rejects `+` (indirect `*` still rejected loudly).
`emitInlineAsm` (`src/backend/llvm/ops.zig`): grows arg/param arrays by the rw
count (`n_args = n_inputs + n_rw`), loads each seed (`asm.rw.seed`), emits the
tied constraint, and the existing store-back path writes the modified output back.
New `asmIsReadWrite(e, op)` helper. Verified by **running**: increment-in-place
(41→42, IR `"=r,0"`) and a mixed case (rw place + regular input + value output) →
textbook `"=r,=r,r,0"` with correct `${N}` indices and args `(input, seed)`. Two
commits per cadence: (1) `examples/1650-platform-asm-rw-place.sx` locked the
rejection; (2) implemented + flipped 1650 to a runnable aarch64-pinned example
(`{ "target": "macos" }`, ir-only elsewhere). `zig build test` green (658 corpus,
446 unit). Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`,
`examples/1650-*`.
Prior: **2**`-> @place` write-through outputs. An asm result can be **stored through
a place** (local / struct field) instead of returned; the place output does NOT
join the result tuple. Parser: `-> @place` parses the `@place` as an ordinary
address-of expression → an `out_place` operand (`src/parser.zig`). Lowering
(`lowerAsmExpr`): out_place operand = the lowered `@place` address, `out_ty` =
the pointee; read-write (`+`) and indirect-memory (`*`) constraints rejected
loudly (not yet implemented). Added `out_ty: TypeId` to the IR `AsmOperand`
(`src/ir/inst.zig`) so emit builds the **combined** return struct (ALL outputs).
`emitInlineAsm` rewrite (`src/backend/llvm/ops.zig`): the LLVM return type is now
built from every output's `out_ty`; after the call, out_place slots are
`store`d through their address and out_value slots rebuild the sx result — with a
**fast path** (no place outputs → the asm's struct return IS the result, so
pure-value asm IR is unchanged). Verified: write-to-local (`get42`→42), struct
field (`@p.b`), mixed value+place (`v=10 b=20`), `+` rejected. Locked with
`examples/1649-platform-asm-place-output.sx` (mixed, runs on aarch64). `zig build
test` green (657 corpus, 446 unit). Files: `src/parser.zig`, `src/ir/inst.zig`,
`src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`, `examples/1649-*`.
Prior: **F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
only) lowers to LLVM `module asm`, and a lib-less `extern` calls into the symbols
it defines. New `asm_global` AST node (`src/ast.zig`) + `parseAsmGlobal`
(`src/parser.zig`, dispatched from `parseTopLevel` on `kw_asm`) — rejects
`volatile` and any operands/clobbers. The node forced (and got) arms in the same
three `Node.Data` switches as `asm_expr` (`sema.zig` ×2, `semantic_diagnostics.zig`).
`Module` gains a `global_asm: ArrayList([]const u8)` (`src/ir/module.zig`);
`lowerMainAndComptime` captures each template (the dead `lowerDecls` is NOT the
top-level pass — `lowerRoot` Pass 2 uses `lowerMainAndComptime`); `emit_llvm.zig`'s
`emit()` appends each via `LLVMAppendModuleInlineAsm` (source order). Verified
end-to-end: an aarch64 `_my_add` global routine called via `extern` returns 42 —
**AOT only** (the ORC JIT doesn't link module-asm symbols, so `sx run` is wrong;
the design ties global-asm symbols to the final linked binary). Locked with
`examples/1648-platform-asm-global.sx` (`.build { "aot": true, "target": "macos" }`
→ AOT build+run on aarch64, ir-only elsewhere). `zig build test` green (656
corpus, 446 unit). Files: `src/ast.zig`, `src/parser.zig`, `src/sema.zig`,
`src/ir/semantic_diagnostics.zig`, `src/ir/module.zig`, `src/ir/lower/decl.zig`,
`src/ir/emit_llvm.zig`, `examples/1648-*`.
Prior: **E** — multi-output tuples. **Inline asm now returns tuples.** Replaced the
N>1 bail with a shared `asmResultType` helper (`src/ir/lower/expr.zig`, mixed
into `Lowering`) that derives the result type from the `out_value` operands
(0→void, 1→T, N→named tuple, named via the §II.5 effective-name rule). The key
realization: `toLLVMType(tuple)` already produces a literal struct `{T1,…,Tn}`
exactly LLVM's multi-output asm return — so **emit needed NO change**; building
the op with a tuple result type makes the asm call return the struct, which IS
sx's tuple value (destructured by the normal `tuple_get` path). `inferType`'s
`.asm_expr` arm now also delegates to `asmResultType` (single owner), so
`return asm`, `x := asm`, and `q, r := asm` all agree on the type. Verified
end-to-end on aarch64: `split(0x1234)``(lo=52, hi=18)`, a udiv/msub divmod→
`(3, 2)`. IR is textbook: `call { i64, i64 } asm "divq ${4}",
"={rax},={rdx},{rax},{rdx},r,~{cc}"(…)` → extractvalue → tuple. Converted 1640 to
the x86_64 multi-output IR lock (ir-only) + added `1647-platform-asm-aarch64-multi`
(runs on aarch64). `zig build test` green (655 corpus, 446 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `src/ir/expr_typer.zig`,
`examples/164{0,7}-*`.
Prior: **C.1 + D** — inline asm CODEGEN (lowering builds the op + LLVM emit). **Inline
assembly now runs end-to-end.** `lowerAsmExpr` (`src/ir/lower/expr.zig`) stops
bailing: it resolves each operand's effective name (§II.5 auto-naming), interns
template/constraints/clobbers, lowers input `Ref`s, derives the result `TypeId`
(0→void, 1→T), and builds the `inline_asm` op. Added a `%[name]`-references-a-
real-operand check (the last deferred validation). Multi-output (N>1) still bails
loudly ("Phase E"). `emitInlineAsm` (`src/backend/llvm/ops.zig`, port of Zig's
`airAssembly`): assembles the LLVM constraint string (outputs→inputs→`~{clobber}`,
`,``|`), rewrites the template (`%[name]``${N}`, `%%``%`, `$``$$`, `%=`
`${:uid}`), then `LLVMGetInlineAsm` + `LLVMBuildCall2` (AT&T). Dispatch wired
(`emit_llvm.zig`, replacing the C.0 `@panic`). **`llvm_shim.c`**: added
`LLVMInitializeNativeAsmParser()` — the JIT must assemble inline asm at run time.
Verified end-to-end: aarch64 `add`/`mov` run on the host (exit 42), `nop volatile`
runs (1642 now exit 0), IR is textbook (`call i64 asm "add ${0},${1},${2}",
"=r,r,r"(…)`). Locked with `examples/1645-platform-asm-aarch64-add.sx` (runs on
aarch64, ir-only elsewhere via `.build` + `.ir`). Also added the `inferType`
`.asm_expr` arm (`src/ir/expr_typer.zig`, 0→void / 1→T) — without it a bare
`x := asm {…-> T}` binding inferred `.unresolved` and silently produced 0;
regression-locked with `examples/1646-platform-asm-value-binding.sx`. Updated
1640 (now Phase-E bail) + 1642 (now runs). `zig build test` green (654 corpus,
446 unit). Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`,
`src/ir/emit_llvm.zig`, `src/ir/expr_typer.zig`, `llvm_shim.c`,
`examples/164{0,2,5,6}-*`.
Prior: **C.0** — IR op `inline_asm` (lock; no behavior change). Added `inline_asm:
InlineAsm` to the IR `Op` union + the `InlineAsm` struct (`template: StringId`,
`operands: []const AsmOperand` {role/name/constraint/operand}, `clobbers:
[]const StringId`, `has_side_effects`) in `src/ir/inst.zig` — all strings
interned, operands in source order, result on `Inst.ty`. The new variant forced
(and got) arms in two exhaustive `Op` switches: `src/ir/interp.zig` (loud
`bailDetail` — inline asm is never comptime-evaluable) and `src/ir/print.zig`
(IR dump). `src/ir/emit_llvm.zig` gets a `@panic` **tripwire** — emit lands in
Phase D, and until then `lowerAsmExpr` still bails so no `inline_asm` op is ever
created (reaching emit would be a lowering-switched-over-too-early bug). Unit
test `inline_asm op shape` in `src/ir/inst.test.zig`. `zig build test` green
(652 corpus, 446 unit). Files: `src/ir/inst.zig`, `src/ir/interp.zig`,
`src/ir/print.zig`, `src/ir/emit_llvm.zig`, `src/ir/inst.test.zig`.
Prior: **B.1** — operand-name validation (design §II.5 auto-naming rule). Extended
`lowerAsmExpr` with a `pinnedRegister(constraint)` helper (`"={eax}"``eax`,
`"+{rax}"``rax`, `"=r"`→null) and two checks: (1) **reject the echo form**
`[eax] "={eax}"` — a label identical to its own pinned register is redundant
(the operand is already auto-named after the register); (2) **reject duplicate
operand names** (ambiguous `%[name]` / result field). Locked with
`examples/1643-platform-asm-echo-name.sx` + `1644-platform-asm-duplicate-name.sx`.
`zig build test` green (652 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`.
Prior: **B.0** — asm shape validation (compile-path diagnostics). Restructured the
`.asm_expr` lowering arm into `lowerAsmExpr` (`src/ir/lower/expr.zig`, mixed into
`Lowering` in `src/ir/lower.zig`): it validates BEFORE the not-yet-implemented
codegen bail, so the user sees the real problem first. Two checklist items now
enforced with named diagnostics: (1) **template must be a compile-time-known
string** (`"..."` / `#string`); (2) **no value outputs ⇒ must be `volatile`**
(mirrors Zig — a result-less asm could be deleted). Valid shapes still bail with
the "codegen not yet implemented" message. Result-type derivation + auto-naming
stay deferred to a later step (observable only once Phase C produces a real IR
op). Locked with `examples/1641-platform-asm-missing-volatile.sx` (volatile
error) + `1642-platform-asm-nop-volatile.sx` (volatile no-output accepted →
codegen bail). `zig build test` green (650 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `examples/164{1,2}-*`.
Prior: **A.1** — parse `asm { … }` + loud lowering bail (folded A.1+A.2 into one honest
lock commit, since the loud bail IS current correct behavior — cadence option
(a)). Added `AsmExpr`/`AsmOperand` to `src/ast.zig` + the `asm_expr` `Node.Data`
arm; `parseAsmExpr` in `src/parser.zig` (`parsePrimary` `.kw_asm` dispatch) —
parses the template, flat operand list (`[name]? "constraint" -> Type` value
output / `= expr` input), and `clobbers(.…)`; `volatile`/`clobbers` recognized
contextually via `isContextualWord`. The new `asm_expr` tag forced (and got)
arms in three exhaustive `Node.Data` switches: `src/sema.zig` `analyzeNode` +
`findNodeAtOffset`, `src/ir/semantic_diagnostics.zig` `checkBindingNames` (all
recurse into template + operand payloads). Lowering bails LOUD + named in
`src/ir/lower/expr.zig` ("inline assembly codegen is not yet implemented…") via
an explicit `.asm_expr` arm (not the generic `unknown_expr` else) returning
`emitPlaceholder`. `-> @place` write-through is rejected with a clear "Phase 2"
parse error. Locked with `examples/1640-platform-asm-parse.sx` (multi-output
`divmod`, named operands, register pins, clobbers — parses then bails; called
from `main`). `zig build test` green (648 corpus, 0 failed; 445 unit). Files:
`src/ast.zig`, `src/parser.zig`, `src/sema.zig`, `src/ir/semantic_diagnostics.zig`,
`src/ir/lower/expr.zig`, `examples/1640-*`.
Prior: **A.0**`kw_asm` keyword (first compiler code). Added the `kw_asm` `Token.Tag`
variant + `.{ "asm", .kw_asm }` keyword-map entry in `src/token.zig`; `volatile` /
`clobbers` deliberately stay OUT of the global table (contextual). New exhaustive
`Tag` switch in `src/lsp/server.zig` `classifyToken` flagged the missing arm (the
intended coverage tripwire) — added `.kw_asm` to the keyword group. Lock test in
new `src/lexer.test.zig` (`asm``kw_asm`, `volatile`/`clobbers``identifier`),
wired into the `src/root.zig` barrel as `lexer_tests`. `zig build test` green (648
corpus, 0 failed; 445 unit, 0 failed — +1). Files: `src/token.zig`,
`src/lexer.test.zig`, `src/root.zig`, `src/lsp/server.zig`.
Prior: **0.2** — CLAUDE.md docs for `<name>.build`; **Phase 0 COMPLETE**.
**0.1** — corpus runner **ir-only branch** for cross-target examples. Replaced
0.0's loud placeholder bail: when `cfg.target` doesn't match the host (`ir_only`),
`sweepRoot` skips run/build/exec and verifies via `sx ir --target` only —
asserting `.exit` (ir cmd) + `.ir` (normalized stdout) + `.stderr`, never
`.stdout` (write skipped in update mode, assertion skipped in verify mode). An
`.ir` snapshot is **required** in ir-only mode — its absence is a loud failure
("needs an .ir snapshot for ir-only mode"). Locked with
`examples/1639-platform-target-cross.sx` (asm-free `main :: () -> i64 { return 0;
}`), `.build` `{ "target": "x86_64-linux" }`, + checked-in `.ir`. Verified both
guards fire: corrupting the `.ir` → IR mismatch; deleting it → the require-failure.
`zig build test` green (647 corpus, 0 failed; 444 unit). Files:
`src/corpus_run.test.zig`, `examples/1639-*`.
## Current state
**Inline assembly works end-to-end: 0, 1, and N value outputs (tuples).** Full
pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + `%[name]` check) → IR op
(C.0) → lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D) → multi-output
tuples (E). Register-class + register-pinned operands, inputs, clobbers, `#string`
multi-instruction templates, `%[name]`/`%%` rewriting, and the §II.5 auto-naming
rule all work and execute on the host JIT. Global `asm { … }` (Phase F) works AOT (call-into-asm
via lib-less `extern`). `-> @place` **write-through** outputs work (Phase 2) and
**read-write (`+`)** place outputs work (Phase G — tied-input lowering, runs on
aarch64). Indirect-memory (`*`) place outputs are still rejected loudly as
not-yet-implemented — the only remaining substantive feature. The x86_64
syscall-write ir-only example is DONE (1651). Smaller follow-ups: the
comptime-call guard for global asm (`#run` into a module-asm symbol should fail
loud via dlsym-miss — pin a test) and a JIT-vs-global-asm note (`sx run` silently
mishandles module-asm symbols; AOT is correct). `readme.md` now has an "Inline
Assembly" section.
Known orthogonal bug: **issue 0137**`sx run` on a program with no `main`
segfaults (`src/target.zig:256-273`, unguarded JIT entry lookup). Pre-existing,
asm-independent; does NOT block the ASM stream (every example has a `main`).
Phase EF feasibility already confirmed against the live tree
(`LLVMGetInlineAsm` / `LLVMBuildCall2` / `LLVMAppendModuleInlineAsm` in LLVM@19
`Core.h`; ERR-stream `extractvalue`→tuple in `emit_llvm.zig:726-927`; lib-less
`extern`, 60 sites; `--target` a global CLI flag).
## Next step
The output-to-`const` rejection item is **DONE** (via issue 0138, now RESOLVED).
The general `@const` address-of bug — `@scalar_const` reinterpreting the folded
value as a pointer (`inttoptr (i64 40 to ptr)`) → segfault on deref / invalid
store for asm `-> @const` — was fixed in `src/ir/lower/expr.zig`'s `.address_of`
path (scalar `::` consts now diagnose "no storage"; array/struct consts keep real
storage). Because asm `-> @place` lowers `@place` through that same path, asm
`-> @const` now reports the clean diagnostic for free — no asm-specific code
needed. Regression: `examples/1177-diagnostics-addr-of-const-rejected.sx`.
Remaining work, all optional / additive:
- **Indirect-memory (`"=*m"`) outputs**: pass the place address as an arg, asm
writes through it (no return slot). Currently rejected. The last substantive
feature; needs `elementtype` type-attribute plumbing on the indirect arg
(precedent: `sret` in `emit_llvm.zig`/`ops.zig`).
- **Polish**: comptime-call guard test for global asm; make `sx run` error (not
silently mishandle) a module-asm symbol.
Done since last: output-to-`const` rejection (issue 0138), x86_64 syscall-write
ir-only example (1651).
Orthogonal: **issue 0137** (no-`main` segfault).
## Log
- (init) Plan + design doc written; ASM stream opened.
- (0.0) Corpus runner target-gating: `<name>.build` JSON config (replaces `.aot`
marker), `--target` threading, `hostMatchesTarget` execute-gate, loud
cross-target placeholder bail. Migrated 1226/1227 `.aot``.build`; locked with
1638 fixture + unit tests. `zig build test` green.
- (0.1) ir-only branch: cross-target examples verify via `sx ir --target` only
(exit+ir+stderr, no stdout; `.ir` required). Locked with 1639 fixture; verified
corrupt-.ir → mismatch and missing-.ir → loud failure. `zig build test` green.
- (0.2) docs: CLAUDE.md documents `<name>.build` JSON sidecar (aot + target +
ir-only gating), replacing stale `.aot` marker prose. **Phase 0 COMPLETE.**
- (A.0) `kw_asm` keyword in token.zig (+ map entry); LSP `classifyToken` switch
coverage; lock test in new `lexer.test.zig` (wired via root.zig). `volatile` /
`clobbers` stay contextual identifiers. `zig build test` green (445 unit, +1).
- (A.1) parse `asm { … }``AsmExpr` + loud lowering bail; `asm_expr` arms in 3
exhaustive `Node.Data` switches; `-> @place` rejected (Phase 2). Adopted operand
auto-naming rule (design §II.5). Locked with 1640 fixture. Filed orthogonal
issue 0137 (no-`main` JIT segfault). `zig build test` green (648 corpus, 445 unit).
- (B.0) asm shape validation in `lowerAsmExpr`: comptime-string template +
no-output⇒volatile, with named diagnostics before the codegen bail. Locked with
1641 (volatile error) + 1642 (volatile accepted). `zig build test` green (650
corpus, 445 unit).
- (B.1) operand-name validation: `pinnedRegister` helper + reject echo form
(`[eax] "={eax}"`) and duplicate names. Locked with 1643 + 1644. `zig build
test` green (652 corpus, 445 unit).
- (C.0) IR op `inline_asm: InlineAsm` + interp `bailDetail` + print arm + emit
`@panic` tripwire (Phase D). No behavior change (lowering still bails). Unit
test `inline_asm op shape`. `zig build test` green (652 corpus, 446 unit).
- (C.1+D) CODEGEN — `lowerAsmExpr` builds the op (effective names, interned
strings, input Refs, 0/1 result type) + `%[name]` validation; `emitInlineAsm`
(constraint string + template rewrite + `LLVMGetInlineAsm`/`BuildCall2`, AT&T);
`inferType` arm; `LLVMInitializeNativeAsmParser` for the JIT. **Inline asm runs
end-to-end.** N>1 bails (Phase E). Locked with 1645 (aarch64 add, runs) + 1646
(`:=` binding); updated 1640/1642. `zig build test` green (654 corpus, 446 unit).
- (E) multi-output tuples — `asmResultType` helper (0→void/1→T/N→named tuple),
shared by lowering + `inferType`. `toLLVMType(tuple)` == LLVM multi-output
struct, so emit unchanged; the asm struct return IS the sx tuple. Runs on
aarch64 (1647: `split``(lo,hi)`); 1640 → x86 multi-output IR lock (ir-only).
`zig build test` green (655 corpus, 446 unit).
- (F) global asm — `asm_global` AST node + `parseAsmGlobal` (top-level, rejects
volatile/operands); `Module.global_asm` captured in `lowerMainAndComptime`;
`emit()` appends via `LLVMAppendModuleInlineAsm`; call-into via lib-less
`extern`. AOT-verified (1648, `_my_add`→42). `zig build test` green (656 corpus).
- (docs) readme.md "Inline Assembly" section (b8800a2).
- (2) `-> @place` write-through — `out_place` operand; `out_ty` on the IR
AsmOperand; `emitInlineAsm` builds the combined output struct + splits
(out_place → store-through, out_value → result), fast-path when no places.
`+`/`*` rejected. Locked with 1649 (mixed, runs). `zig build test` green (657
corpus, 446 unit).
- (G) read-write `+` place outputs — `+` lowers to an output `=` + a tied input
(output-index constraint) seeded with the place's loaded value, tied inputs
appended last (operand indices undisturbed). `appendAsmConstraints` rewrites
`+``=`; `emitInlineAsm` grows args by the rw count + loads seeds;
`asmIsReadWrite` helper. Lowering stops rejecting `+` (`*` still rejected). Two
commits (cadence): 1650 locked the rejection, then flipped to a runnable
aarch64 example (`"=r,0"` IR). `zig build test` green (658 corpus, 446 unit).
- (0138) output-to-`const` rejection — fixed the underlying general bug: scalar
`@const` (address-of a folded `::` constant) reinterpreted the value as a
pointer (`inttoptr`). `src/ir/lower/expr.zig` `.address_of` now diagnoses a
scalar const (local + module) instead of falling through; array/struct consts
keep storage. asm `-> @const` gets the clean diagnostic for free (same path).
Regression `examples/1177-diagnostics-addr-of-const-rejected.sx`. Issue 0138
RESOLVED. `zig build test` green (659 corpus, 446 unit).
- (x86 syscall) x86_64 Linux `write(2)` via raw `syscall` — locks the constraint
string `={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}` (register-pinned
inputs + pinned value output + pointer input + clobbers). ir-only on aarch64
(`.ir` asserted), runs on x86_64-linux (hand-authored `"ok\n"` stdout).
`examples/1651-platform-asm-x86-syscall-write.sx`. Pure additive lock, no
compiler change. `zig build test` green (660 corpus, 446 unit).
## Known issues
- **0138** — RESOLVED. `@const` (address-of a `::` comptime constant) yielded a
wild pointer (`inttoptr (i64 <value> to ptr)`). Fixed by diagnosing scalar
`@const` in `src/ir/lower/expr.zig` `.address_of` (no storage; array/struct
consts unaffected). Delivered the ASM "output-to-`const` rejection" for free.
Regression `examples/1177-diagnostics-addr-of-const-rejected.sx`.
- **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry
lookup, `src/target.zig:256-273`). Pre-existing, asm-independent. Filed
`issues/0137-jit-run-no-main-segfault.md`. Does not block A.1.