feat(asm): Phase F — global (module-scope) asm

A top-level `asm { "tmpl", };` block (template only) lowers to LLVM `module asm`;
a lib-less `extern` declaration calls into the symbols it defines (the import
direction reuses the existing C-FFI extern path — no new surface).

- ast.zig: asm_global node (AsmGlobal { template }).
- parser.zig: parseAsmGlobal, dispatched from parseTopLevel on kw_asm — rejects
  `volatile` and any operands/clobbers (template only). The in-function asm
  expression form stays in parsePrimary.
- module.zig: Module.global_asm list; lower/decl.zig captures each template in
  lowerMainAndComptime (the real top-level pass — lowerDecls is dead for
  top-level); emit_llvm.zig emit() appends each via LLVMAppendModuleInlineAsm in
  source order.
- the new node forced asm_global arms in sema.zig (analyzeNode +
  findNodeAtOffset) and semantic_diagnostics.zig (checkBindingNames).

Verified end-to-end: an aarch64 `_my_add` global routine, called via `extern`,
returns 42 — AOT only (the ORC JIT doesn't link module-asm symbols; global-asm
symbols live in the final linked binary). Locked with 1648-platform-asm-global
({ "aot": true, "target": "macos" } → AOT build+run on aarch64, ir-only else).
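
For orientation, a rough C analogue of the same mechanism (not the sx source of 1648, which is not reproduced here): clang lowers a file-scope `__asm__` block to LLVM `module asm`, and a bare prototype plays the role of the lib-less `extern`. The `ASM_SYM` macro, the `call_into_asm` helper, the instruction bodies, and the arguments 40 and 2 are all assumptions made for illustration; the plain-C fallback branch exists only so the sketch builds on other targets.

```c
#include <assert.h>

/* Mach-O (macOS) prefixes C symbols with an underscore; ELF does not.
 * ASM_SYM is a hypothetical helper, not part of this commit. */
#if defined(__APPLE__)
#define ASM_SYM(name) "_" #name
#else
#define ASM_SYM(name) #name
#endif

/* Prototype only, no definition: the C analogue of a lib-less `extern`. */
extern int my_add(int a, int b);

#if defined(__aarch64__)
/* File-scope asm: template only, no operands, no clobbers. */
__asm__(
    ".text\n"
    ".p2align 2\n"
    ".globl " ASM_SYM(my_add) "\n"
    ASM_SYM(my_add) ":\n"
    "    add w0, w0, w1\n"   /* w0 = a + b */
    "    ret\n"
);
#elif defined(__x86_64__) && !defined(_WIN32)
/* System V x86_64: a in edi, b in esi, result in eax. */
__asm__(
    ".text\n"
    ".globl " ASM_SYM(my_add) "\n"
    ASM_SYM(my_add) ":\n"
    "    leal (%rdi,%rsi), %eax\n"
    "    ret\n"
);
#else
/* Fallback so the sketch still links on targets not covered above. */
int my_add(int a, int b) { return a + b; }
#endif

/* Hypothetical caller: the call resolves at link time to the asm-defined
 * symbol, which is why a JIT that never links module asm cannot find it. */
int call_into_asm(void) {
    int r = my_add(40, 2);
    assert(r == 42);
    return r;
}
```

Calling `call_into_asm()` from a `main` in an ahead-of-time build exercises the same path the commit verifies: the symbol exists only once the module-level assembly has been assembled and linked into the final binary.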

zig build test green (656 corpus, 446 unit).
agra
2026-06-15 22:22:29 +03:00
parent d3c6ffed5a
commit 4d75b9323c
14 changed files with 146 additions and 19 deletions


@@ -6,7 +6,26 @@ commit, one step at a time per the cadence rule (no commit may both add a test
and make it pass).
## Last completed step
**E** — multi-output tuples. **Inline asm now returns tuples.** Replaced the
**F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
only) lowers to LLVM `module asm`, and a lib-less `extern` calls into the symbols
it defines. New `asm_global` AST node (`src/ast.zig`) + `parseAsmGlobal`
(`src/parser.zig`, dispatched from `parseTopLevel` on `kw_asm`) — rejects
`volatile` and any operands/clobbers. The node forced (and got) arms in the same
three `Node.Data` switches as `asm_expr` (`sema.zig` ×2, `semantic_diagnostics.zig`).
`Module` gains a `global_asm: ArrayList([]const u8)` (`src/ir/module.zig`);
`lowerMainAndComptime` captures each template (the dead `lowerDecls` is NOT the
top-level pass — `lowerRoot` Pass 2 uses `lowerMainAndComptime`); `emit_llvm.zig`'s
`emit()` appends each via `LLVMAppendModuleInlineAsm` (source order). Verified
end-to-end: an aarch64 `_my_add` global routine called via `extern` returns 42 —
**AOT only** (the ORC JIT doesn't link module-asm symbols, so `sx run` mishandles them;
the design ties global-asm symbols to the final linked binary). Locked with
`examples/1648-platform-asm-global.sx` (`.build { "aot": true, "target": "macos" }`
→ AOT build+run on aarch64, ir-only elsewhere). `zig build test` green (656
corpus, 446 unit). Files: `src/ast.zig`, `src/parser.zig`, `src/sema.zig`,
`src/ir/semantic_diagnostics.zig`, `src/ir/module.zig`, `src/ir/lower/decl.zig`,
`src/ir/emit_llvm.zig`, `examples/1648-*`.
Prior: **E** — multi-output tuples. **Inline asm now returns tuples.** Replaced the
N>1 bail with a shared `asmResultType` helper (`src/ir/lower/expr.zig`, mixed
into `Lowering`) that derives the result type from the `out_value` operands
(0→void, 1→T, N→named tuple, named via the §II.5 effective-name rule). The key
@@ -135,10 +154,13 @@ pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + `%[name]` check) →
(C.0) → lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D) → multi-output
tuples (E). Register-class + register-pinned operands, inputs, clobbers, `#string`
multi-instruction templates, `%[name]`/`%%` rewriting, and the §II.5 auto-naming
rule all work and execute on the host JIT. **Remaining feature gaps:** `-> @place`
write-through / read-write / indirect-memory outputs (rejected at parse — Phase 2)
and global `asm { … }` + `extern` call-into-asm (Phase F). `readme.md` has no
inline-asm section yet (docs-track-changes follow-up).
rule all work and execute on the host JIT. Global `asm { … }` (Phase F) works AOT (call-into-asm
via lib-less `extern`). **Remaining feature gap:** `-> @place` write-through /
read-write / indirect-memory outputs (rejected at parse — Phase 2). Smaller
follow-ups: the comptime-call guard for global asm (`#run` into a module-asm
symbol should fail loud via dlsym-miss — pin a test), a JIT-vs-global-asm note
(`sx run` silently mishandles module-asm symbols; AOT is correct), the x86_64
syscall ir-only example, and a `readme.md` inline-asm section (docs-track-changes).
Known orthogonal bug, **issue 0137**: `sx run` on a program with no `main`
segfaults (`src/target.zig:256-273`, unguarded JIT entry lookup). Pre-existing,
@@ -150,21 +172,17 @@ Phase EF feasibility already confirmed against the live tree
`extern`, 60 sites; `--target` a global CLI flag).
## Next step
Two independent directions (pick either):
- **Phase F — global asm** (smaller; the plan calls it "Small"): top-level
`asm { … }` decl (template only — reject operands/`volatile`) → lower to
`c.LLVMAppendModuleInlineAsm`; the call-INTO-asm direction reuses the existing
lib-less `extern` (no new surface). Parser: recognize `asm {` at decl scope →
an `asm_global` decl. Plus the comptime-call guard (a global-asm symbol isn't
in the JIT host — dlsym-miss must be loud). See `PLAN-ASM.md` Phase F.
- **Phase 2 — `-> @place` outputs** (write-through, read-write `"+r" -> @place`,
indirect-memory `"=*m"`): currently rejected at parse. Needs place-expr
lowering for the output target + the indirect-constraint handling, plus
output-to-`const` rejection.
**Phase 2 — `-> @place` outputs** (the last feature gap): write-through
(`"=…" -> @place`), read-write (`"+…" -> @place`), and indirect-memory (`"=*m"`)
outputs, currently rejected at parse. Needs: parse `-> @<place-expr>` into an
`out_place` operand (payload = the place expr), lower the place to an address +
`store` the asm result through it (place outputs don't join the result tuple),
the `+` read-write seeding, and output-to-`const` rejection. See `PLAN-ASM.md`
Phase G / design §II.2 Dev 5 + cookbook (`cas`, `memcpy_bytes`, `cpuid_into`).
Also worth doing soon: the **x86_64 syscall-write** ir-only example (plan's D
verification) and a **readme.md** inline-asm section (docs-track-changes). And the
orthogonal **issue 0137** (no-`main` segfault) whenever.
Smaller polish (any order): comptime-call guard test for global asm; `sx run`
should error (not silently mishandle) a module-asm symbol; x86_64 syscall-write
ir-only example; `readme.md` inline-asm section. Orthogonal: **issue 0137**.
## Log
- (init) Plan + design doc written; ASM stream opened.
@@ -205,6 +223,10 @@ orthogonal **issue 0137** (no-`main` segfault) whenever.
struct, so emit unchanged; the asm struct return IS the sx tuple. Runs on
aarch64 (1647: `split`→`(lo,hi)`); 1640 → x86 multi-output IR lock (ir-only).
`zig build test` green (655 corpus, 446 unit).
- (F) global asm — `asm_global` AST node + `parseAsmGlobal` (top-level, rejects
volatile/operands); `Module.global_asm` captured in `lowerMainAndComptime`;
`emit()` appends via `LLVMAppendModuleInlineAsm`; call-into via lib-less
`extern`. AOT-verified (1648, `_my_add`→42). `zig build test` green (656 corpus).
## Known issues
- **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry