feat(asm): Phase C.1 + D — inline asm codegen (runs end-to-end)

lowerAsmExpr stops bailing and builds the inline_asm op: it resolves each operand's effective name (§II.5: explicit [name], else the {reg} pin), interns template/constraints/clobbers, lowers input Refs, and derives the result TypeId (0 outputs → void, 1 output → T). It also adds the last deferred validation (every %[name] must name an operand). Multi-output (N>1) bails with a named "Phase E" diagnostic.

emitInlineAsm (backend/llvm/ops.zig) ports Zig's airAssembly: it assembles the LLVM constraint string (outputs → inputs → ~{clobber}, ',' → '|'), rewrites the template (%[name]→${N}, %%→%, $→$$, %=→${:uid}), then calls LLVMGetInlineAsm + LLVMBuildCall2 (AT&T dialect). Dispatch is wired in emit_llvm.zig, replacing the C.0 @panic tripwire.

inferType gains an .asm_expr arm (expr_typer.zig) so a bare `x := asm {…-> T}` binding types correctly; without it the binding inferred .unresolved and silently produced 0.

llvm_shim.c: adds LLVMInitializeNativeAsmParser(), because the JIT must assemble inline asm at run time.

Verified end-to-end on the aarch64 host: `add` with register-class inputs and a value output runs (exit 42), `mov` into a value output runs (exit 99), and `nop volatile` runs (exit 0). IR is textbook: `call i64 asm "add ${0}, ${1}, ${2}", "=r,r,r"(…)`.

Locked with 1645 (aarch64 add, runs; ir-only on non-aarch64) + 1646 (:= binding). Updated 1640 (now Phase-E bail) + 1642 (now runs). `zig build test` green (654 corpus, 446 unit).
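
For readers who want to see the two string transformations concretely, here is a minimal Python sketch of the constraint-string assembly and template rewrite described above. It is not the Zig code in backend/llvm/ops.zig: the function names `build_constraints` and `rewrite_template`, the `(name, constraint)` tuple shape, and the choice to pass a lone `%` through unchanged are all assumptions made for illustration. Only the rewrite rules themselves come from this commit message.

```python
def build_constraints(outputs, inputs, clobbers):
    """Join constraints in the order outputs, inputs, clobbers.

    outputs/inputs are lists of (name, constraint) pairs (hypothetical shape).
    """
    # LLVM separates operands with ',', so a ',' inside one constraint
    # is replaced with '|' before joining.
    parts = [c.replace(",", "|") for _, c in outputs]
    parts += [c.replace(",", "|") for _, c in inputs]
    parts += ["~{" + name + "}" for name in clobbers]
    return ",".join(parts)


def rewrite_template(template, names):
    """Rewrite a source template into LLVM inline-asm template syntax.

    names lists operand names in index order (outputs first, then inputs).
    """
    index = {name: i for i, name in enumerate(names)}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "$":
            out.append("$$")  # '$' is LLVM's escape character, so double it
            i += 1
        elif ch == "%":
            nxt = template[i + 1:i + 2]
            if nxt == "%":
                out.append("%")  # '%%' is a literal '%'
                i += 2
            elif nxt == "=":
                out.append("${:uid}")  # unique id per asm instance
                i += 2
            elif nxt == "[":
                end = template.find("]", i + 2)
                if end == -1:
                    raise ValueError("unterminated %[")
                name = template[i + 2:end]
                if name not in index:
                    # Mirrors the validation: every %[name] must name an operand.
                    raise ValueError("%[" + name + "] does not name an operand")
                out.append("${" + str(index[name]) + "}")
                i = end + 1
            else:
                out.append("%")  # assumption: a lone '%' passes through
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Fed the operands of example 1645 (`out` as the only output, then inputs `a` and `b`), the sketch yields the template `add ${0}, ${1}, ${2}` and the constraint string `=r,r,r`, which match the `.ir` snapshot in this commit.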
2026-06-15 21:39:54 +03:00
parent 6c08de8ec1
commit 5a5e04c6d5
23 changed files with 395 additions and 50 deletions
--- a/examples/1640-platform-asm-parse.sx
+++ b/examples/1640-platform-asm-parse.sx
@@ -1,10 +1,10 @@
-// ASM stream Phase A.1 — `asm { … }` PARSES into an AsmExpr: template, named
-// value outputs (`[quot] "={rax}" -> u64`), register-pinned inputs, and a
-// `clobbers(.…)` clause are all accepted with no parse error. Codegen is not
-// implemented yet (the IR op + LLVM emit land in Phases C–E), so lowering bails
-// LOUD + named. This example pins that intermediate diagnostic; a later phase
-// turns it into a running multi-return example. Called from `main` so lowering
-// actually reaches the asm body (lazy lowering skips uncalled functions).
+// ASM stream — `asm { … }` parses + validates the full rich shape: named value
+// outputs (`[quot] "={rax}" -> u64`), register-pinned inputs, and a
+// `clobbers(.…)` clause, all accepted. This is a MULTI-output (tuple-returning)
+// asm, which is deferred to Phase E — so lowering bails LOUD + named with the
+// specific "Phase E" diagnostic (single-output asm already runs; see 1645).
+// Called from `main` so lowering reaches the asm body (lazy lowering skips
+// uncalled functions).
 divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
    return asm {
        "divq %[d]",
--- a/examples/1642-platform-asm-nop-volatile.sx
+++ b/examples/1642-platform-asm-nop-volatile.sx
@@ -1,5 +1,5 @@
-// ASM stream Phase B — the no-output form IS accepted when `volatile` is
-// present: validation passes, and lowering then bails on the not-yet-
-// implemented codegen (Phases C–E). Confirms the volatile rule's positive side.
+// ASM stream — the no-output `volatile` form runs end-to-end: a bare `nop`
+// (no operands, no result) assembles and executes cleanly (exit 0). Confirms
+// the no-output⇒volatile rule's positive side AND the zero-operand emit path.
 nop :: () { asm volatile { "nop" }; }
 main :: () { nop(); }
--- a/examples/1645-platform-asm-aarch64-add.sx
+++ b/examples/1645-platform-asm-aarch64-add.sx
@@ -0,0 +1,10 @@
+// ASM stream Phase D — inline assembly that RUNS end-to-end. An aarch64 `add`
+// with two register-class inputs (`%[a]`, `%[b]`) and a value output (`%[out]`)
+// returned from the function. The `.build` pins aarch64-macOS: on a matching
+// host the runner executes it (exit 42); elsewhere it falls to ir-only mode and
+// asserts the `.ir` snapshot (the inline_asm op + LLVM `call asm` are target-
+// independent in the IR text). Regression for the full lower→emit→JIT path.
+add_asm :: (a: i64, b: i64) -> i64 {
+    return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b };
+}
+main :: () -> i64 { return add_asm(40, 2); }
--- a/examples/1646-platform-asm-value-binding.sx
+++ b/examples/1646-platform-asm-value-binding.sx
@@ -0,0 +1,9 @@
+// ASM stream Phase D — a bare `x := asm { … -> T }` binding (not a direct
+// `return asm`) types correctly: the value output flows through the local and
+// out as the exit code. Regression for the `inferType` `.asm_expr` arm (without
+// it the binding inferred `.unresolved` and silently produced 0). aarch64-pinned
+// via `.build` → runs on a matching host, ir-only elsewhere.
+main :: () -> i64 {
+    x := asm { "mov %[out], #99", [out] "=r" -> i64 };
+    return x;
+}
--- a/examples/expected/1640-platform-asm-parse.stderr
+++ b/examples/expected/1640-platform-asm-parse.stderr
@@ -1,4 +1,4 @@
-error: inline assembly codegen is not yet implemented (ASM stream: lowering + emit land in Phases C–E)
+error: multi-output (tuple-returning) inline assembly is not yet implemented (ASM stream Phase E)
  --> examples/1640-platform-asm-parse.sx:9:12
   |
 9 |     return asm {
--- a/examples/expected/1642-platform-asm-nop-volatile.exit
+++ b/examples/expected/1642-platform-asm-nop-volatile.exit
@@ -1 +1 @@
-1
+0
--- a/examples/expected/1642-platform-asm-nop-volatile.stderr
+++ b/examples/expected/1642-platform-asm-nop-volatile.stderr
@@ -1,5 +1 @@
-error: inline assembly codegen is not yet implemented (ASM stream: lowering + emit land in Phases C–E)
-  --> examples/1642-platform-asm-nop-volatile.sx:4:13
-   |
- 4 | nop :: () { asm volatile { "nop" }; }
-   |             ^^^^^^^^^^^^^^^^^^^^^^
+
--- a/examples/expected/1645-platform-asm-aarch64-add.build
+++ b/examples/expected/1645-platform-asm-aarch64-add.build
@@ -0,0 +1 @@
+{ "target": "macos" }
--- a/examples/expected/1645-platform-asm-aarch64-add.exit
+++ b/examples/expected/1645-platform-asm-aarch64-add.exit
@@ -0,0 +1 @@
+42
--- a/examples/expected/1645-platform-asm-aarch64-add.ir
+++ b/examples/expected/1645-platform-asm-aarch64-add.ir
@@ -0,0 +1,21 @@
+
+; Function Attrs: nounwind
+define internal i64 @add_asm(i64 %0, i64 %1) #0 {
+entry:
+  %alloca = alloca i64, align 8
+  store i64 %0, ptr %alloca, align 8
+  %allocaN = alloca i64, align 8
+  store i64 %1, ptr %allocaN, align 8
+  %load = load i64, ptr %alloca, align 8
+  %loadN = load i64, ptr %allocaN, align 8
+  %asm = call i64 asm "add ${0}, ${1}, ${2}", "=r,r,r"(i64 %load, i64 %loadN)
+  ret i64 %asm
+}
+
+; Function Attrs: nounwind
+define i32 @main() #0 {
+entry:
+  %call = call i64 @add_asm(i64 40, i64 2)
+  %ca.tr = trunc i64 %call to i32
+  ret i32 %ca.tr
+}
--- a/examples/expected/1645-platform-asm-aarch64-add.stderr
+++ b/examples/expected/1645-platform-asm-aarch64-add.stderr
@@ -0,0 +1 @@
+
--- a/examples/expected/1645-platform-asm-aarch64-add.stdout
+++ b/examples/expected/1645-platform-asm-aarch64-add.stdout
@@ -0,0 +1 @@
+
--- a/examples/expected/1646-platform-asm-value-binding.build
+++ b/examples/expected/1646-platform-asm-value-binding.build
@@ -0,0 +1 @@
+{ "target": "macos" }
--- a/examples/expected/1646-platform-asm-value-binding.exit
+++ b/examples/expected/1646-platform-asm-value-binding.exit
@@ -0,0 +1 @@
+99
--- a/examples/expected/1646-platform-asm-value-binding.ir
+++ b/examples/expected/1646-platform-asm-value-binding.ir
@@ -0,0 +1,11 @@
+
+; Function Attrs: nounwind
+define i32 @main() #0 {
+entry:
+  %asm = call i64 asm "mov ${0}, #99", "=r"()
+  %alloca = alloca i64, align 8
+  store i64 %asm, ptr %alloca, align 8
+  %load = load i64, ptr %alloca, align 8
+  %ca.tr = trunc i64 %load to i32
+  ret i32 %ca.tr
+}
--- a/examples/expected/1646-platform-asm-value-binding.stderr
+++ b/examples/expected/1646-platform-asm-value-binding.stderr
@@ -0,0 +1 @@
+
--- a/examples/expected/1646-platform-asm-value-binding.stdout
+++ b/examples/expected/1646-platform-asm-value-binding.stdout
@@ -0,0 +1 @@
+