diff --git a/current/CHECKPOINT-ASM.md b/current/CHECKPOINT-ASM.md index 7ee5d5c..7b2970f 100644 --- a/current/CHECKPOINT-ASM.md +++ b/current/CHECKPOINT-ASM.md @@ -6,7 +6,31 @@ commit, one step at a time per the cadence rule (no commit may both add a test and make it pass). ## Last completed step -**C.0** — IR op `inline_asm` (lock; no behavior change). Added `inline_asm: +**C.1 + D** — inline asm CODEGEN (lowering builds the op + LLVM emit). **Inline +assembly now runs end-to-end.** `lowerAsmExpr` (`src/ir/lower/expr.zig`) stops +bailing: it resolves each operand's effective name (§II.5 auto-naming), interns +template/constraints/clobbers, lowers input `Ref`s, derives the result `TypeId` +(0→void, 1→T), and builds the `inline_asm` op. Added a `%[name]`-references-a- +real-operand check (the last deferred validation). Multi-output (N>1) still bails +loudly ("Phase E"). `emitInlineAsm` (`src/backend/llvm/ops.zig`, port of Zig's +`airAssembly`): assembles the LLVM constraint string (outputs→inputs→`~{clobber}`, +`,`→`|`), rewrites the template (`%[name]`→`${N}`, `%%`→`%`, `$`→`$$`, `%=`→ +`${:uid}`), then `LLVMGetInlineAsm` + `LLVMBuildCall2` (AT&T). Dispatch wired +(`emit_llvm.zig`, replacing the C.0 `@panic`). **`llvm_shim.c`**: added +`LLVMInitializeNativeAsmParser()` — the JIT must assemble inline asm at run time. +Verified end-to-end: aarch64 `add`/`mov` run on the host (exit 42 / 99), `nop volatile` +runs (1642 now exit 0), IR is textbook (`call i64 asm "add ${0}, ${1}, ${2}", +"=r,r,r"(…)`). Locked with `examples/1645-platform-asm-aarch64-add.sx` (runs on +aarch64, ir-only elsewhere via `.build` + `.ir`). Also added the `inferType` +`.asm_expr` arm (`src/ir/expr_typer.zig`, 0→void / 1→T) — without it a bare +`x := asm {…-> T}` binding inferred `.unresolved` and silently produced 0; +regression-locked with `examples/1646-platform-asm-value-binding.sx`. Updated +1640 (now Phase-E bail) + 1642 (now runs). `zig build test` green (654 corpus, +446 unit). 
Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`, +`src/ir/emit_llvm.zig`, `src/ir/expr_typer.zig`, `llvm_shim.c`, +`examples/164{0,2,5,6}-*`. + +Prior: **C.0** — IR op `inline_asm` (lock; no behavior change). Added `inline_asm: InlineAsm` to the IR `Op` union + the `InlineAsm` struct (`template: StringId`, `operands: []const AsmOperand` {role/name/constraint/operand}, `clobbers: []const StringId`, `has_side_effects`) in `src/ir/inst.zig` — all strings @@ -88,40 +112,40 @@ guards fire: corrupting the `.ir` → IR mismatch; deleting it → the require-f `src/corpus_run.test.zig`, `examples/1639-*`. ## Current state -Phase A underway: `asm { … }` lexes (A.0) and **parses** into `AsmExpr` (A.1); -lowering bails LOUD + named (no IR op / emit yet). Result-type derivation, the -operand auto-naming rule, and the validation checklist are **Phase B** (not yet -implemented — any asm reaching lowering errors out). The adopted **operand -auto-naming rule** (design §II.5, decided this session): name auto-derived from a -`{reg}` pin; explicit `[name]` only when it differs or for register-class (`=r`) -operands; echo form `[eax] "={eax}"` rejected. Parser stores `name: ?[]const u8`; -the rule is a Phase-B (typing) concern, so the parser needs no change for it. +**Inline assembly works end-to-end for 0/1 value outputs.** Pipeline complete: +lex (A.0) → parse (A.1) → validate (B.0/B.1 + the `%[name]` check) → IR op (C.0) +→ lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D). Single-value-output +and no-output `volatile` asm assemble and execute on the host JIT; the auto-naming +rule (§II.5) is live (effective name = explicit `[name]` else `{reg}`). **Phase E +(multi-output tuples) is the remaining feature gap** — N>1 value outputs bail with +a named "Phase E" diagnostic (1640). `-> @place` write-through outputs are still +rejected at parse (Phase 2). Global asm (Phase F) not started. 
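The effective-name rule (§II.5) and the `%[name]` reference check that lowering now performs can be sketched in C, the language of `llvm_shim.c`. This is a standalone illustration under stated assumptions, not the `lowerAsmExpr` code. `AsmOperand`, `pinned_register`, `effective_name` and `template_refs_valid` are hypothetical names. The sketch also assumes that a register pin is written as `{reg}` inside the constraint (e.g. `={rax}`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one parsed asm operand: `name` is the explicit
 * `[name]` (NULL when absent); `constraint` is the raw text, e.g. "={rax}". */
typedef struct {
    const char *name;
    const char *constraint;
} AsmOperand;

/* Copies the register inside a `{reg}` pin into `out` ("={rax}" -> "rax").
 * Returns false when there is no pin (e.g. "=r") or it does not fit in `cap`. */
static bool pinned_register(const char *constraint, char *out, size_t cap) {
    assert(cap > 0);
    const char *open = strchr(constraint, '{');
    if (open == NULL) return false;
    const char *close = strchr(open + 1, '}');
    if (close == NULL) return false;
    size_t len = (size_t)(close - (open + 1));
    if (len == 0 || len >= cap) return false;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return true;
}

/* Effective name (design section II.5): explicit `[name]`, else the pinned
 * register, else "" (anonymous). Explicit names longer than cap-1 are cut. */
static void effective_name(const AsmOperand *op, char *out, size_t cap) {
    assert(cap > 0);
    if (op->name != NULL) {
        strncpy(out, op->name, cap - 1);
        out[cap - 1] = '\0';
    } else if (!pinned_register(op->constraint, out, cap)) {
        out[0] = '\0';
    }
}

/* True when every `%[name]` / `%[name:mod]` in `tmpl` matches some operand's
 * effective name. `%%` and `%=` are skipped; an unterminated `%[` fails. */
static bool template_refs_valid(const char *tmpl, const AsmOperand *ops, size_t n_ops) {
    size_t len = strlen(tmpl);
    for (size_t i = 0; i < len; i++) {
        if (tmpl[i] != '%' || i + 1 >= len) continue;
        char nxt = tmpl[i + 1];
        if (nxt == '%' || nxt == '=') { i++; continue; }
        if (nxt != '[') continue;
        const char *ref = tmpl + i + 2;
        const char *close = strchr(ref, ']');
        if (close == NULL) return false;
        size_t ref_len = (size_t)(close - ref);
        const char *colon = memchr(ref, ':', ref_len);
        if (colon != NULL) ref_len = (size_t)(colon - ref); /* drop ":mod" */
        bool found = false;
        for (size_t k = 0; k < n_ops && !found; k++) {
            char eff[64];
            effective_name(&ops[k], eff, sizeof eff);
            found = eff[0] != '\0' && strlen(eff) == ref_len &&
                    strncmp(eff, ref, ref_len) == 0;
        }
        if (!found) return false;
        i = (size_t)(close - tmpl); /* the loop's i++ steps past ']' */
    }
    return true;
}
```

Under this rule, an operand that is neither explicitly named nor register-pinned (a bare `=r` output) is anonymous. A template therefore has no way to reference it with `%[...]`.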
Known orthogonal bug: **issue 0137** — `sx run` on a program with no `main` segfaults (`src/target.zig:256-273`, unguarded JIT entry lookup). Pre-existing, asm-independent; does NOT block the ASM stream (every example has a `main`). -Phase B–E feasibility already confirmed against the live tree +Phase E–F feasibility already confirmed against the live tree (`LLVMGetInlineAsm` / `LLVMBuildCall2` / `LLVMAppendModuleInlineAsm` in LLVM@19 `Core.h`; ERR-stream `extractvalue`→tuple in `emit_llvm.zig:726-927`; lib-less `extern`, 60 sites; `--target` a global CLI flag). ## Next step -**C.1 + D together** (must land as one green step) — wire `lowerAsmExpr` to BUILD -the `inline_asm` op (intern template + constraints + clobber names; resolve each -operand's effective name via the §II.5 auto-naming rule; lower input `Ref`s; -compute the result `TypeId` from the `out_value` operands — 0→void, 1→T, N→tuple, -named) AND implement `emitInlineAsm` in `src/ir/emit_llvm.zig` (replacing the -`@panic` tripwire) — the port of Zig's `airAssembly`: assemble the LLVM constraint -string (outputs `=`/`+`, inputs, `clobbers`→`~{name}`), rewrite `%[name]`→`${N}` / -`%%` / `%=`, `LLVMGetInlineAsm` + `LLVMBuildCall2`, AT&T dialect. They land -together because the moment lowering stops bailing, emit is reached — a half-step -would hit the tripwire. First target: the single-value-output syscall on -`x86_64-linux` (ir-only via a `.build` `{ "target": "x86_64-linux" }` + `.ir` -snapshot, since the host is aarch64). Result-type derivation for `expr_typer.zig` -(`inferType` `.asm_expr` arm) also lands here — now observable. Then E (multi- -return tuples) + remaining validation (`%[name]` references a real operand). See -`PLAN-ASM.md` Phases C–E + design §II.6. 
+**Phase E** (multi-output tuples) — replace the N>1 "Phase E" bail in +`lowerAsmExpr`: build a tuple `TypeId` from the `out_value` types (named via the +effective-name rule), set it as the op result, and in `emitInlineAsm` make the +LLVM return type an anonymous struct `{T1,…,Tn}`, then `extractvalue i` per +`out_value` → assemble the sx tuple. Lock with `divmod`→`(quot,rem)` (reuse 1640's +shape, which should then run instead of bailing) + `cpuid`→4-tuple, arch-pinned. See `PLAN-ASM.md` Phase E + +design §II.6 (multi-return). Also worth adding: the x86_64-linux syscall-write +example (ir-only on this host via `.build { "target": "x86_64-linux" }` + `.ir`) +to lock the cross-target lowering, per the plan's D verification. + +Then Phase 2 (`-> @place` write-through / read-write / indirect-memory) and Phase +F (global asm + `extern` call into asm symbols). Result-type derivation for the +0/1 cases now lives in BOTH `lowerAsmExpr` (the op's `Inst.ty`) and +`expr_typer.zig`'s `inferType` (for `:=`/value-position typing); Phase E extends +both to the tuple case. ## Log - (init) Plan + design doc written; ASM stream opened. @@ -151,6 +175,12 @@ return tuples) + remaining validation (`%[name]` references a real operand). See - (C.0) IR op `inline_asm: InlineAsm` + interp `bailDetail` + print arm + emit `@panic` tripwire (Phase D). No behavior change (lowering still bails). Unit test `inline_asm op shape`. `zig build test` green (652 corpus, 446 unit). +- (C.1+D) CODEGEN — `lowerAsmExpr` builds the op (effective names, interned + strings, input Refs, 0/1 result type) + `%[name]` validation; `emitInlineAsm` + (constraint string + template rewrite + `LLVMGetInlineAsm`/`BuildCall2`, AT&T); + `inferType` arm; `LLVMInitializeNativeAsmParser` for the JIT. **Inline asm runs + end-to-end.** N>1 bails (Phase E). Locked with 1645 (aarch64 add, runs) + 1646 + (`:=` binding); updated 1640/1642. `zig build test` green (654 corpus, 446 unit). 
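`emitInlineAsm` performs two string transforms: it assembles the constraint string and it rewrites the template. Both can be shown with a small standalone C sketch. This is not the Zig implementation. `AsmOp`, `build_constraints`, `render_template`, `operand_index` and the fixed-size buffers are hypothetical, and the sketch assumes effective names have already been resolved. Given 1645's operands, it produces the same strings as that example's `.ir` snapshot (`"add ${0}, ${1}, ${2}"`, `"=r,r,r"`).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ROLE_OUT_VALUE, ROLE_OUT_PLACE, ROLE_INPUT } AsmRole;

/* Hypothetical operand after name resolution: `name` is the effective name
 * ("" when anonymous), `constraint` the raw constraint text. */
typedef struct {
    AsmRole role;
    const char *name;
    const char *constraint;
} AsmOp;

/* Appends n bytes of s to buf, keeping it NUL-terminated; asserts it fits. */
static void put_n(char *buf, size_t *len, size_t cap, const char *s, size_t n) {
    assert(*len + n < cap);
    memcpy(buf + *len, s, n);
    *len += n;
    buf[*len] = '\0';
}

static void put(char *buf, size_t *len, size_t cap, const char *s) {
    put_n(buf, len, cap, s, strlen(s));
}

/* Outputs first, then inputs, then ~{clobber}; ',' inside a constraint -> '|'. */
static void build_constraints(const AsmOp *ops, size_t n, const char *const *clobbers,
                              size_t n_clob, char *buf, size_t cap) {
    size_t len = 0;
    buf[0] = '\0';
    for (int inputs = 0; inputs < 2; inputs++) {
        for (size_t i = 0; i < n; i++) {
            if ((ops[i].role == ROLE_INPUT) != inputs) continue;
            if (len != 0) put(buf, &len, cap, ",");
            for (const char *p = ops[i].constraint; *p != '\0'; p++) {
                char ch = (*p == ',') ? '|' : *p;
                put_n(buf, &len, cap, &ch, 1);
            }
        }
    }
    for (size_t i = 0; i < n_clob; i++) {
        if (len != 0) put(buf, &len, cap, ",");
        put(buf, &len, cap, "~{");
        put(buf, &len, cap, clobbers[i]);
        put(buf, &len, cap, "}");
    }
}

/* Position of `name` in LLVM operand order (outputs, then inputs), or -1. */
static long operand_index(const AsmOp *ops, size_t n, const char *name, size_t name_len) {
    long idx = 0;
    for (int inputs = 0; inputs < 2; inputs++) {
        for (size_t i = 0; i < n; i++) {
            if ((ops[i].role == ROLE_INPUT) != inputs) continue;
            if (ops[i].name[0] != '\0' && strlen(ops[i].name) == name_len &&
                strncmp(ops[i].name, name, name_len) == 0)
                return idx;
            idx++;
        }
    }
    return -1;
}

/* $ -> $$, %% -> %, %= -> ${:uid}, %[name] -> ${N}, %[name:mod] -> ${N:mod}. */
static void render_template(const char *tmpl, const AsmOp *ops, size_t n, char *buf, size_t cap) {
    size_t len = 0, i = 0;
    buf[0] = '\0';
    while (tmpl[i] != '\0') {
        char ch = tmpl[i];
        if (ch == '$') { put(buf, &len, cap, "$$"); i++; continue; }
        if (ch == '%' && tmpl[i + 1] != '\0') {
            char nxt = tmpl[i + 1];
            if (nxt == '%') { put(buf, &len, cap, "%"); i += 2; continue; }
            if (nxt == '=') { put(buf, &len, cap, "${:uid}"); i += 2; continue; }
            if (nxt == '[') {
                const char *start = tmpl + i + 2;
                const char *close = strchr(start, ']');
                assert(close != NULL); /* assumed already validated by lowering */
                const char *colon = memchr(start, ':', (size_t)(close - start));
                const char *name_end = (colon != NULL) ? colon : close;
                long idx = operand_index(ops, n, start, (size_t)(name_end - start));
                assert(idx >= 0); /* likewise: every %[name] names an operand */
                char num[32];
                snprintf(num, sizeof num, "${%ld", idx);
                put(buf, &len, cap, num);
                if (colon != NULL) put_n(buf, &len, cap, colon, (size_t)(close - colon));
                put(buf, &len, cap, "}");
                i = (size_t)(close - tmpl) + 1;
                continue;
            }
        }
        put_n(buf, &len, cap, &ch, 1);
        i++;
    }
}
```

Operands are numbered outputs-first. In a syscall-shaped operand list, an input pinned to `rdi` that is written before the `rax` output still becomes `${1}`, so the rewriter cannot index operands by their source position.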
## Known issues - **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry diff --git a/examples/1640-platform-asm-parse.sx b/examples/1640-platform-asm-parse.sx index d2efaec..17a819f 100644 --- a/examples/1640-platform-asm-parse.sx +++ b/examples/1640-platform-asm-parse.sx @@ -1,10 +1,10 @@ -// ASM stream Phase A.1 — `asm { … }` PARSES into an AsmExpr: template, named -// value outputs (`[quot] "={rax}" -> u64`), register-pinned inputs, and a -// `clobbers(.…)` clause are all accepted with no parse error. Codegen is not -// implemented yet (the IR op + LLVM emit land in Phases C–E), so lowering bails -// LOUD + named. This example pins that intermediate diagnostic; a later phase -// turns it into a running multi-return example. Called from `main` so lowering -// actually reaches the asm body (lazy lowering skips uncalled functions). +// ASM stream — `asm { … }` parses + validates the full rich shape: named value +// outputs (`[quot] "={rax}" -> u64`), register-pinned inputs, and a +// `clobbers(.…)` clause, all accepted. This is a MULTI-output (tuple-returning) +// asm, which is deferred to Phase E — so lowering bails LOUD + named with the +// specific "Phase E" diagnostic (single-output asm already runs; see 1645). +// Called from `main` so lowering reaches the asm body (lazy lowering skips +// uncalled functions). divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) { return asm { "divq %[d]", diff --git a/examples/1642-platform-asm-nop-volatile.sx b/examples/1642-platform-asm-nop-volatile.sx index 2281ae8..457f192 100644 --- a/examples/1642-platform-asm-nop-volatile.sx +++ b/examples/1642-platform-asm-nop-volatile.sx @@ -1,5 +1,5 @@ -// ASM stream Phase B — the no-output form IS accepted when `volatile` is -// present: validation passes, and lowering then bails on the not-yet- -// implemented codegen (Phases C–E). Confirms the volatile rule's positive side. 
+// ASM stream — the no-output `volatile` form runs end-to-end: a bare `nop` +// (no operands, no result) assembles and executes cleanly (exit 0). Confirms +// the no-output⇒volatile rule's positive side AND the zero-operand emit path. nop :: () { asm volatile { "nop" }; } main :: () { nop(); } diff --git a/examples/1645-platform-asm-aarch64-add.sx b/examples/1645-platform-asm-aarch64-add.sx new file mode 100644 index 0000000..73f2954 --- /dev/null +++ b/examples/1645-platform-asm-aarch64-add.sx @@ -0,0 +1,10 @@ +// ASM stream Phase D — inline assembly that RUNS end-to-end. An aarch64 `add` +// with two register-class inputs (`%[a]`, `%[b]`) and a value output (`%[out]`) +// returned from the function. The `.build` pins aarch64-macOS: on a matching +// host the runner executes it (exit 42); elsewhere it falls to ir-only mode and +// asserts the `.ir` snapshot (the inline_asm op + LLVM `call asm` are target- +// independent in the IR text). Regression for the full lower→emit→JIT path. +add_asm :: (a: i64, b: i64) -> i64 { + return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b }; +} +main :: () -> i64 { return add_asm(40, 2); } diff --git a/examples/1646-platform-asm-value-binding.sx b/examples/1646-platform-asm-value-binding.sx new file mode 100644 index 0000000..dbb614a --- /dev/null +++ b/examples/1646-platform-asm-value-binding.sx @@ -0,0 +1,9 @@ +// ASM stream Phase D — a bare `x := asm { … -> T }` binding (not a direct +// `return asm`) types correctly: the value output flows through the local and +// out as the exit code. Regression for the `inferType` `.asm_expr` arm (without +// it the binding inferred `.unresolved` and silently produced 0). aarch64-pinned +// via `.build` → runs on a matching host, ir-only elsewhere. 
+main :: () -> i64 { + x := asm { "mov %[out], #99", [out] "=r" -> i64 }; + return x; +} diff --git a/examples/expected/1640-platform-asm-parse.stderr b/examples/expected/1640-platform-asm-parse.stderr index a34f506..7122ba9 100644 --- a/examples/expected/1640-platform-asm-parse.stderr +++ b/examples/expected/1640-platform-asm-parse.stderr @@ -1,4 +1,4 @@ -error: inline assembly codegen is not yet implemented (ASM stream: lowering + emit land in Phases C–E) +error: multi-output (tuple-returning) inline assembly is not yet implemented (ASM stream Phase E) --> examples/1640-platform-asm-parse.sx:9:12 | 9 | return asm { diff --git a/examples/expected/1642-platform-asm-nop-volatile.exit b/examples/expected/1642-platform-asm-nop-volatile.exit index d00491f..573541a 100644 --- a/examples/expected/1642-platform-asm-nop-volatile.exit +++ b/examples/expected/1642-platform-asm-nop-volatile.exit @@ -1 +1 @@ -1 +0 diff --git a/examples/expected/1642-platform-asm-nop-volatile.stderr b/examples/expected/1642-platform-asm-nop-volatile.stderr index 27eff4c..8b13789 100644 --- a/examples/expected/1642-platform-asm-nop-volatile.stderr +++ b/examples/expected/1642-platform-asm-nop-volatile.stderr @@ -1,5 +1 @@ -error: inline assembly codegen is not yet implemented (ASM stream: lowering + emit land in Phases C–E) - --> examples/1642-platform-asm-nop-volatile.sx:4:13 - | - 4 | nop :: () { asm volatile { "nop" }; } - | ^^^^^^^^^^^^^^^^^^^^^^ + diff --git a/examples/expected/1645-platform-asm-aarch64-add.build b/examples/expected/1645-platform-asm-aarch64-add.build new file mode 100644 index 0000000..42e24dd --- /dev/null +++ b/examples/expected/1645-platform-asm-aarch64-add.build @@ -0,0 +1 @@ +{ "target": "macos" } diff --git a/examples/expected/1645-platform-asm-aarch64-add.exit b/examples/expected/1645-platform-asm-aarch64-add.exit new file mode 100644 index 0000000..d81cc07 --- /dev/null +++ b/examples/expected/1645-platform-asm-aarch64-add.exit @@ -0,0 +1 @@ +42 diff --git 
a/examples/expected/1645-platform-asm-aarch64-add.ir b/examples/expected/1645-platform-asm-aarch64-add.ir new file mode 100644 index 0000000..95ba700 --- /dev/null +++ b/examples/expected/1645-platform-asm-aarch64-add.ir @@ -0,0 +1,21 @@ + +; Function Attrs: nounwind +define internal i64 @add_asm(i64 %0, i64 %1) #0 { +entry: + %alloca = alloca i64, align 8 + store i64 %0, ptr %alloca, align 8 + %allocaN = alloca i64, align 8 + store i64 %1, ptr %allocaN, align 8 + %load = load i64, ptr %alloca, align 8 + %loadN = load i64, ptr %allocaN, align 8 + %asm = call i64 asm "add ${0}, ${1}, ${2}", "=r,r,r"(i64 %load, i64 %loadN) + ret i64 %asm +} + +; Function Attrs: nounwind +define i32 @main() #0 { +entry: + %call = call i64 @add_asm(i64 40, i64 2) + %ca.tr = trunc i64 %call to i32 + ret i32 %ca.tr +} diff --git a/examples/expected/1645-platform-asm-aarch64-add.stderr b/examples/expected/1645-platform-asm-aarch64-add.stderr new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/examples/expected/1645-platform-asm-aarch64-add.stderr @@ -0,0 +1 @@ + diff --git a/examples/expected/1645-platform-asm-aarch64-add.stdout b/examples/expected/1645-platform-asm-aarch64-add.stdout new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/examples/expected/1645-platform-asm-aarch64-add.stdout @@ -0,0 +1 @@ + diff --git a/examples/expected/1646-platform-asm-value-binding.build b/examples/expected/1646-platform-asm-value-binding.build new file mode 100644 index 0000000..42e24dd --- /dev/null +++ b/examples/expected/1646-platform-asm-value-binding.build @@ -0,0 +1 @@ +{ "target": "macos" } diff --git a/examples/expected/1646-platform-asm-value-binding.exit b/examples/expected/1646-platform-asm-value-binding.exit new file mode 100644 index 0000000..3ad5abd --- /dev/null +++ b/examples/expected/1646-platform-asm-value-binding.exit @@ -0,0 +1 @@ +99 diff --git a/examples/expected/1646-platform-asm-value-binding.ir b/examples/expected/1646-platform-asm-value-binding.ir new 
file mode 100644 index 0000000..e8c21b4 --- /dev/null +++ b/examples/expected/1646-platform-asm-value-binding.ir @@ -0,0 +1,11 @@ + +; Function Attrs: nounwind +define i32 @main() #0 { +entry: + %asm = call i64 asm "mov ${0}, #99", "=r"() + %alloca = alloca i64, align 8 + store i64 %asm, ptr %alloca, align 8 + %load = load i64, ptr %alloca, align 8 + %ca.tr = trunc i64 %load to i32 + ret i32 %ca.tr +} diff --git a/examples/expected/1646-platform-asm-value-binding.stderr b/examples/expected/1646-platform-asm-value-binding.stderr new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/examples/expected/1646-platform-asm-value-binding.stderr @@ -0,0 +1 @@ + diff --git a/examples/expected/1646-platform-asm-value-binding.stdout b/examples/expected/1646-platform-asm-value-binding.stdout new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/examples/expected/1646-platform-asm-value-binding.stdout @@ -0,0 +1 @@ + diff --git a/llvm_shim.c b/llvm_shim.c index c0fd238..f9c8f51 100644 --- a/llvm_shim.c +++ b/llvm_shim.c @@ -14,4 +14,7 @@ void sx_llvm_init_all_targets(void) { void sx_llvm_init_native_target(void) { LLVMInitializeNativeTarget(); LLVMInitializeNativeAsmPrinter(); + // Required for inline assembly: the JIT must assemble the asm template at + // run time, which needs the target's asm parser (ASM stream Phase D). 
+ LLVMInitializeNativeAsmParser(); } diff --git a/src/backend/llvm/ops.zig b/src/backend/llvm/ops.zig index 17a977c..757856d 100644 --- a/src/backend/llvm/ops.zig +++ b/src/backend/llvm/ops.zig @@ -24,6 +24,7 @@ const Call = ir_inst.Call; const CallIndirect = ir_inst.CallIndirect; const ObjcMsgSend = ir_inst.ObjcMsgSend; const JniMsgSend = ir_inst.JniMsgSend; +const InlineAsm = ir_inst.InlineAsm; const BuiltinCall = ir_inst.BuiltinCall; const TriOp = ir_inst.TriOp; const Branch = ir_inst.Branch; @@ -774,6 +775,161 @@ pub const Ops = struct { self.e.mapRef(result); } + /// Inline assembly (ASM stream Phase D) — the port of Zig's `airAssembly`. + /// Handles 0 value outputs (void) and 1 (scalar); multi-output tuples are + /// Phase E (lowering bails before reaching here). Builds the LLVM constraint + /// string, rewrites the `%[name]` template, then `LLVMGetInlineAsm` + + /// `LLVMBuildCall2`. + pub fn emitInlineAsm(self: Ops, instruction: *const Inst, a: InlineAsm) void { + const e = self.e; + const alloc = e.alloc; + + var n_inputs: usize = 0; + for (a.operands) |op| { + if (op.role == .input) n_inputs += 1; + } + + // Result LLVM type: void (no value output) or the single scalar. + const ret_ty = if (instruction.ty == .void) e.cached_void else e.toLLVMType(instruction.ty); + + // One LLVM call param per input operand, in source order. + const param_types = alloc.alloc(c.LLVMTypeRef, n_inputs) catch unreachable; + defer alloc.free(param_types); + const call_args = alloc.alloc(c.LLVMValueRef, n_inputs) catch unreachable; + defer alloc.free(call_args); + { + var i: usize = 0; + for (a.operands) |op| { + if (op.role != .input) continue; + const raw_ty = e.argIRTypeOrFail(op.operand); + const llvm_ty = e.toLLVMType(raw_ty); + param_types[i] = llvm_ty; + call_args[i] = e.coerceArg(e.resolveRef(op.operand), llvm_ty); + i += 1; + } + } + + // ── Constraint string: outputs first, then inputs, then ~{clobber}. 
── + var cons: std.ArrayList(u8) = .empty; + defer cons.deinit(alloc); + self.appendAsmConstraints(&cons, a, false); // outputs (out_value / out_place) + self.appendAsmConstraints(&cons, a, true); // inputs + for (a.clobbers) |cl| { + if (cons.items.len != 0) cons.append(alloc, ',') catch unreachable; + cons.appendSlice(alloc, "~{") catch unreachable; + cons.appendSlice(alloc, e.ir_mod.types.getString(cl)) catch unreachable; + cons.append(alloc, '}') catch unreachable; + } + + // ── Template rewrite: %[name]->${N}, %%->%, $->$$, %=->${:uid}. ── + var rendered: std.ArrayList(u8) = .empty; + defer rendered.deinit(alloc); + self.renderAsmTemplate(&rendered, a); + + const fn_ty = c.LLVMFunctionType(ret_ty, param_types.ptr, @intCast(n_inputs), 0); + const asm_val = c.LLVMGetInlineAsm( + fn_ty, + rendered.items.ptr, + rendered.items.len, + cons.items.ptr, + cons.items.len, + @intFromBool(a.has_side_effects), + 0, // IsAlignStack + c.LLVMInlineAsmDialectATT, + 0, // CanThrow + ); + const label: [*:0]const u8 = if (instruction.ty == .void) "" else "asm"; + const result = c.LLVMBuildCall2(e.builder, fn_ty, asm_val, call_args.ptr, @intCast(n_inputs), label); + // Always mapRef — the IR Ref counter advances regardless of result type. + e.mapRef(result); + } + + /// Append the constraint fragments for one role group (outputs or inputs), + /// comma-separated, with each operand's `,` rewritten to LLVM's `|` + /// (alternative-constraint separator). Mirrors `FuncGen.airAssembly`. 
+ fn appendAsmConstraints(self: Ops, cons: *std.ArrayList(u8), a: InlineAsm, inputs: bool) void { + const e = self.e; + const alloc = e.alloc; + for (a.operands) |op| { + const is_input = op.role == .input; + if (is_input != inputs) continue; + if (cons.items.len != 0) cons.append(alloc, ',') catch unreachable; + const s = e.ir_mod.types.getString(op.constraint); + for (s) |ch| cons.append(alloc, if (ch == ',') '|' else ch) catch unreachable; + } + } + + /// The positional index of a named operand in the LLVM operand list + /// (outputs first, then inputs) — the `N` in `%[name]` → `${N}`. Lowering + /// guarantees every `%[name]` names an operand, so callers can assume a hit. + fn asmOperandIndex(self: Ops, a: InlineAsm, name: []const u8) ?usize { + const e = self.e; + var idx: usize = 0; + for ([_]bool{ false, true }) |inputs| { + for (a.operands) |op| { + const is_input = op.role == .input; + if (is_input != inputs) continue; + if (op.name != .empty and std.mem.eql(u8, e.ir_mod.types.getString(op.name), name)) return idx; + idx += 1; + } + } + return null; + } + + /// Rewrite the asm template into LLVM form. State machine over the bytes: + /// `$`→`$$`, `%%`→`%`, `%=`→`${:uid}`, `%[name]`→`${N}`, `%[name:mod]`→ + /// `${N:mod}`. Port of `FuncGen.zig`'s template rewriter. 
+ fn renderAsmTemplate(self: Ops, out: *std.ArrayList(u8), a: InlineAsm) void { + const e = self.e; + const alloc = e.alloc; + const tmpl = e.ir_mod.types.getString(a.template); + var i: usize = 0; + while (i < tmpl.len) { + const ch = tmpl[i]; + if (ch == '$') { + out.appendSlice(alloc, "$$") catch unreachable; + i += 1; + continue; + } + if (ch == '%' and i + 1 < tmpl.len) { + const nxt = tmpl[i + 1]; + if (nxt == '%') { + out.append(alloc, '%') catch unreachable; + i += 2; + continue; + } + if (nxt == '=') { + out.appendSlice(alloc, "${:uid}") catch unreachable; + i += 2; + continue; + } + if (nxt == '[') { + const close = std.mem.indexOfScalarPos(u8, tmpl, i + 2, ']').?; // lowering validated + var name = tmpl[i + 2 .. close]; + var modifier: ?[]const u8 = null; + if (std.mem.indexOfScalar(u8, name, ':')) |colon| { + modifier = name[colon + 1 ..]; + name = name[0..colon]; + } + const idx = self.asmOperandIndex(a, name).?; // lowering validated + var buf: [16]u8 = undefined; + const ds = std.fmt.bufPrint(&buf, "{d}", .{idx}) catch unreachable; + out.appendSlice(alloc, "${") catch unreachable; + out.appendSlice(alloc, ds) catch unreachable; + if (modifier) |m| { + out.append(alloc, ':') catch unreachable; + out.appendSlice(alloc, m) catch unreachable; + } + out.append(alloc, '}') catch unreachable; + i = close + 1; + continue; + } + } + out.append(alloc, ch) catch unreachable; + i += 1; + } + } + pub fn emitCall(self: Ops, instruction: *const Inst, call_op: Call) void { // Evaluate comptime functions at compile time const callee_func = &self.e.ir_mod.functions.items[call_op.callee.index()]; diff --git a/src/ir/emit_llvm.zig b/src/ir/emit_llvm.zig index ea4430f..bb99e3c 100644 --- a/src/ir/emit_llvm.zig +++ b/src/ir/emit_llvm.zig @@ -1563,11 +1563,7 @@ pub const LLVMEmitter = struct { // ── Calls ───────────────────────────────────────────── .objc_msg_send => |msg| self.ops().emitObjcMsgSend(instruction, msg), .jni_msg_send => |msg| 
self.ops().emitJniMsgSend(instruction, msg), - // Tripwire (ASM stream): the IR op exists (Phase C.0) but emit lands - // in Phase D. Until then `lowerAsmExpr` still bails, so no inline_asm - // op is ever created — reaching here means lowering switched over - // before emit was ready. Crash loudly rather than miscompile. - .inline_asm => @panic("inline_asm reached LLVM emit before Phase D — lowering must still bail until emitInlineAsm lands"), + .inline_asm => |a| self.ops().emitInlineAsm(instruction, a), .call => |call_op| self.ops().emitCall(instruction, call_op), .call_indirect => |call_op| self.ops().emitCallIndirect(instruction, call_op), diff --git a/src/ir/expr_typer.zig b/src/ir/expr_typer.zig index d8ede4c..e0473dc 100644 --- a/src/ir/expr_typer.zig +++ b/src/ir/expr_typer.zig @@ -398,6 +398,22 @@ pub const ExprTyper = struct { } break :blk self.l.inferExprType(nc.rhs); }, + // Inline asm result type from the `out_value` operands: 0 → void, + // 1 → that operand's type. N>1 (tuple) is Phase E → `.unresolved` + // here (lowering bails on it anyway). Mirrors `lowerAsmExpr`, so a + // bare `x := asm {…-> T}` binding types correctly. + .asm_expr => |ae| blk: { + var n_out: usize = 0; + var first_out: ?*Node = null; + for (ae.operands) |op| { + if (op.role != .out_value) continue; + n_out += 1; + if (first_out == null) first_out = op.payload; + } + if (n_out == 0) break :blk .void; + if (n_out == 1) break :blk self.l.resolveTypeWithBindings(first_out.?); + break :blk .unresolved; + }, // Statements don't produce values (`.return_stmt` is handled above // as `.noreturn` — it diverges rather than yielding `void`). 
.assignment, .var_decl, .const_decl, .fn_decl, diff --git a/src/ir/lower/expr.zig b/src/ir/lower/expr.zig index a643d41..b901960 100644 --- a/src/ir/lower/expr.zig +++ b/src/ir/lower/expr.zig @@ -2261,9 +2261,98 @@ pub fn lowerAsmExpr(self: *Lowering, ae: *const ast.AsmExpr, span: ast.Span) Ref return self.emitPlaceholder("inline_asm"); } - // Shape is valid — codegen just isn't implemented yet (Phases C–E). - diags.addFmt(.err, span, "inline assembly codegen is not yet implemented (ASM stream: lowering + emit land in Phases C–E)", .{}); - return self.emitPlaceholder("inline_asm"); + // (4) Every `%[name]` in the template must name an operand (effective name: + // explicit `[name]` or auto-derived register). Caught here so emit's + // template rewriter never sees an unknown reference. §II.6. + { + const tmpl = ae.template.data.string_literal.raw; + var i: usize = 0; + while (i < tmpl.len) : (i += 1) { + if (tmpl[i] != '%' or i + 1 >= tmpl.len) continue; + const nxt = tmpl[i + 1]; + if (nxt == '%' or nxt == '=') { + i += 1; + continue; + } + if (nxt != '[') continue; + const close = std.mem.indexOfScalarPos(u8, tmpl, i + 2, ']') orelse { + diags.addFmt(.err, span, "unterminated `%[` in asm template", .{}); + return self.emitPlaceholder("inline_asm"); + }; + var ref_name = tmpl[i + 2 .. close]; + if (std.mem.indexOfScalar(u8, ref_name, ':')) |colon| ref_name = ref_name[0..colon]; + var found = false; + for (ae.operands) |op| { + const eff = op.name orelse (pinnedRegister(op.constraint) orelse ""); + if (eff.len != 0 and std.mem.eql(u8, eff, ref_name)) { + found = true; + break; + } + } + if (!found) { + diags.addFmt(.err, span, "asm template references `%[{s}]` but no operand is named `{s}`", .{ ref_name, ref_name }); + return self.emitPlaceholder("inline_asm"); + } + i = close; + } + } + + // ── Build the IR op (C.1). D emits 0 or 1 value output; N>1 (tuple result) + // is Phase E — bail loudly until then. 
── + var n_value_outputs: usize = 0; + for (ae.operands) |op| { + if (op.role == .out_value) n_value_outputs += 1; + } + if (n_value_outputs > 1) { + diags.addFmt(.err, span, "multi-output (tuple-returning) inline assembly is not yet implemented (ASM stream Phase E)", .{}); + return self.emitPlaceholder("inline_asm"); + } + + // Result type: 0 outputs → void; 1 → that operand's resolved type. (The + // resolver diagnoses an unresolvable type and returns `.unresolved`.) + var result_ty: TypeId = .void; + for (ae.operands) |op| { + if (op.role == .out_value) { + result_ty = self.resolveTypeWithBindings(op.payload); + break; + } + } + if (result_ty == .unresolved) return self.emitPlaceholder("inline_asm"); + + // IR operands, in source order (= the `%N` index space); emit reorders them + // outputs-first, then inputs, to number LLVM's `${N}` operands. + const ir_ops = self.alloc.alloc(inst_mod.InlineAsm.AsmOperand, ae.operands.len) catch unreachable; + for (ae.operands, 0..) |op, i| { + // Effective name (design §II.5): explicit `[name]`, else auto-derived + // from a `{reg}` pin, else anonymous (`.empty`). + const eff_name: []const u8 = op.name orelse (pinnedRegister(op.constraint) orelse ""); + ir_ops[i] = .{ + .role = switch (op.role) { + .out_value => .out_value, + .out_place => .out_place, + .input => .input, + }, + .name = if (eff_name.len == 0) types.StringId.empty else self.module.types.internString(eff_name), + .constraint = self.module.types.internString(op.constraint), + // input → the lowered value Ref; an output consumes no Ref (its value is + // the op's result), so `.none`. + .operand = if (op.role == .input) self.lowerExpr(op.payload) else Ref.none, + }; + } + + const ir_clobbers = self.alloc.alloc(types.StringId, ae.clobbers.len) catch unreachable; + for (ae.clobbers, 0..) |cl, i| { + ir_clobbers[i] = self.module.types.internString(cl); + } + + // Template text RAW — no sx escape processing (matches `#string` literal + // bytes; the `%[name]`/`%%`/`$` rewrite happens at emit). §II.11. 
+ const template_text = ae.template.data.string_literal.raw; + + return self.builder.emit(.{ .inline_asm = .{ + .template = self.module.types.internString(template_text), + .operands = ir_ops, + .clobbers = ir_clobbers, + .has_side_effects = ae.is_volatile, + } }, result_ty); } /// If `node` names a `for xs: (*x)` by-ref capture (an `*elem`), returns