feat(asm): Phase 2 — -> @place write-through outputs

An asm result can be STORED through a place (a local or a struct field) instead
of being returned; a place output does not join the result tuple.

- parser.zig: `-> @place` parses `@place` as an ordinary address-of expression
  → an out_place operand (the in-function form; reuses the existing `@` prefix).
- inst.zig: AsmOperand gains out_ty (the output slot's value type) so emit can
  build the combined return struct without re-deriving from Inst.ty.
- lower/expr.zig: out_place operand = the lowered @place address, out_ty = the
  pointee. Read-write (`+`) and indirect-memory (`*`) constraints rejected loudly
  (not yet implemented) rather than miscompiled.
- ops.zig emitInlineAsm: the LLVM return type is built from ALL outputs
  (out_value + out_place); after the call, out_place slots are stored through
  their address and out_value slots rebuild the sx result. Fast path when there
  are no place outputs (the struct return IS the result — pure-value asm IR
  unchanged).

Verified: write-to-local (42), struct field, mixed value+place (v=10 b=20), `+`
rejected. Locked with 1649-platform-asm-place-output (mixed, runs on aarch64).

zig build test green (657 corpus, 446 unit).
agra
2026-06-15 22:47:34 +03:00
parent b8800a234c
commit 967005621a
11 changed files with 198 additions and 24 deletions


@@ -6,7 +6,25 @@ commit, one step at a time per the cadence rule (no commit may both add a test
and make it pass).
## Last completed step
**F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
**2** — `-> @place` write-through outputs. An asm result can be **stored through
a place** (local / struct field) instead of returned; the place output does NOT
join the result tuple. Parser: `-> @place` parses the `@place` as an ordinary
address-of expression → an `out_place` operand (`src/parser.zig`). Lowering
(`lowerAsmExpr`): out_place operand = the lowered `@place` address, `out_ty` =
the pointee; read-write (`+`) and indirect-memory (`*`) constraints rejected
loudly (not yet implemented). Added `out_ty: TypeId` to the IR `AsmOperand`
(`src/ir/inst.zig`) so emit builds the **combined** return struct (ALL outputs).
`emitInlineAsm` rewrite (`src/backend/llvm/ops.zig`): the LLVM return type is now
built from every output's `out_ty`; after the call, out_place slots are
`store`d through their address and out_value slots rebuild the sx result — with a
**fast path** (no place outputs → the asm's struct return IS the result, so
pure-value asm IR is unchanged). Verified: write-to-local (`get42`→42), struct
field (`@p.b`), mixed value+place (`v=10 b=20`), `+` rejected. Locked with
`examples/1649-platform-asm-place-output.sx` (mixed, runs on aarch64). `zig build
test` green (657 corpus, 446 unit). Files: `src/parser.zig`, `src/ir/inst.zig`,
`src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`, `examples/1649-*`.
Prior: **F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
only) lowers to LLVM `module asm`, and a lib-less `extern` calls into the symbols
it defines. New `asm_global` AST node (`src/ast.zig`) + `parseAsmGlobal`
(`src/parser.zig`, dispatched from `parseTopLevel` on `kw_asm`) — rejects
@@ -155,8 +173,9 @@ pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + `%[name]` check) →
tuples (E). Register-class + register-pinned operands, inputs, clobbers, `#string`
multi-instruction templates, `%[name]`/`%%` rewriting, and the §II.5 auto-naming
rule all work and execute on the host JIT. Global `asm { … }` (Phase F) works AOT (call-into-asm
via lib-less `extern`). **Remaining feature gap:** `-> @place` write-through /
read-write / indirect-memory outputs (rejected at parse — Phase 2). Smaller
via lib-less `extern`). `-> @place` **write-through** outputs work (Phase 2);
read-write (`+`) and indirect-memory (`*`) place outputs are rejected loudly as
not-yet-implemented — the remaining feature work. Smaller
follow-ups: the comptime-call guard for global asm (`#run` into a module-asm
symbol should fail loud via dlsym-miss — pin a test), a JIT-vs-global-asm note
(`sx run` silently mishandles module-asm symbols; AOT is correct), and the x86_64
@@ -172,17 +191,20 @@ Phase EF feasibility already confirmed against the live tree
`extern`, 60 sites; `--target` a global CLI flag).
## Next step
**Phase 2 — `-> @place` outputs** (the last feature gap): write-through
(`"=…" -> @place`), read-write (`"+…" -> @place`), and indirect-memory (`"=*m"`)
outputs, currently rejected at parse. Needs: parse `-> @<place-expr>` into an
`out_place` operand (payload = the place expr), lower the place to an address +
`store` the asm result through it (place outputs don't join the result tuple),
the `+` read-write seeding, and output-to-`const` rejection. See `PLAN-ASM.md`
Phase G / design §II.2 Dev 5 + cookbook (`cas`, `memcpy_bytes`, `cpuid_into`).
Inline assembly is **feature-complete for the common surface**. Remaining work,
all optional / additive (pick any):
- **Read-write (`"+…" -> @place`) place outputs**: LLVM expresses `+` as an
output `=` + a TIED input (`0` referencing the output index), with the seed
value passed as an arg — Zig's `llvm_rw_vals` mechanism. Currently rejected at
lowering. Needs the tied-input plumbing in `emitInlineAsm` + seeding a load of
the place.
- **Indirect-memory (`"=*m"`) outputs**: pass the place address as an arg, asm
writes through it (no return slot). Currently rejected.
- **Output-to-`const` rejection** for `-> @place` (the place must be mutable).
- **Polish**: comptime-call guard test for global asm; make `sx run` error (not
silently mishandle) a module-asm symbol; x86_64 syscall-write ir-only example.
Smaller polish (any order): comptime-call guard test for global asm; `sx run`
should error (not silently mishandle) a module-asm symbol; x86_64 syscall-write
ir-only example; `readme.md` inline-asm section. Orthogonal: **issue 0137**.
Orthogonal: **issue 0137** (no-`main` segfault).
## Log
- (init) Plan + design doc written; ASM stream opened.
@@ -227,6 +249,12 @@ ir-only example; `readme.md` inline-asm section. Orthogonal: **issue 0137**.
volatile/operands); `Module.global_asm` captured in `lowerMainAndComptime`;
`emit()` appends via `LLVMAppendModuleInlineAsm`; call-into via lib-less
`extern`. AOT-verified (1648, `_my_add`→42). `zig build test` green (656 corpus).
- (docs) readme.md "Inline Assembly" section (b8800a2).
- (2) `-> @place` write-through — `out_place` operand; `out_ty` on the IR
AsmOperand; `emitInlineAsm` builds the combined output struct + splits
(out_place → store-through, out_value → result), fast-path when no places.
`+`/`*` rejected. Locked with 1649 (mixed, runs). `zig build test` green (657
corpus, 446 unit).
## Known issues
- **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry