sx Inline Assembly — Checkpoint (ASM stream)

Companion to current/PLAN-ASM.md; design in docs/inline-asm-design.md. Update after every commit, one step at a time per the cadence rule (no commit may both add a test and make it pass).

Last completed step

G (read-write + place outputs) — a +r / +{reg} -> @place output is now implemented (the last substantive feature). LLVM has no + constraint, so a read-write place lowers to: an output = constraint (return slot, stored back through the place after the call; the leading + rewritten to = in appendAsmConstraints), plus a tied input (the decimal index of that output) appended after the regular inputs, seeded with the place's loaded value passed as a call arg. Tied inputs come last so existing operand indices (%[name]→${N}) are undisturbed — asmOperandIndex unchanged. Lowering (lowerAsmExpr) no longer rejects + (indirect * still rejected loudly). emitInlineAsm (src/backend/llvm/ops.zig): grows arg/param arrays by the rw count (n_args = n_inputs + n_rw), loads each seed (asm.rw.seed), emits the tied constraint, and the existing store-back path writes the modified output back. New asmIsReadWrite(e, op) helper. Verified by running: increment-in-place (41→42, IR "=r,0") and a mixed case (rw place + regular input + value output) → textbook "=r,=r,r,0" with correct ${N} indices and args (input, seed). Two commits per cadence: (1) examples/1650-platform-asm-rw-place.sx locked the rejection; (2) implemented + flipped 1650 to a runnable aarch64-pinned example ({ "target": "macos" }, ir-only elsewhere). zig build test green (658 corpus, 446 unit). Files: src/ir/lower/expr.zig, src/backend/llvm/ops.zig, examples/1650-*.
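
The tied-input lowering described above can be modelled in a few lines. This is a minimal Python sketch, not the Zig code in appendAsmConstraints: the function name `sketch_constraints` and the `(role, constraint)` tuple shape are assumptions made for illustration. It reproduces the two constraint strings quoted in the step ("=r,0" and "=r,=r,r,0").

```python
def sketch_constraints(operands, clobbers=()):
    """Build an LLVM-style constraint string from (role, constraint) pairs.

    Hypothetical model: roles are "out_value", "out_place", or "input",
    given in source order. A leading "+" marks a read-write output.
    """
    outputs = [c for role, c in operands if role in ("out_value", "out_place")]
    inputs = [c for role, c in operands if role == "input"]

    parts, tied = [], []
    for index, constraint in enumerate(outputs):
        if constraint.startswith("+"):
            # LLVM has no "+": emit "=" and remember a tied input that
            # names this output by its decimal index.
            constraint = "=" + constraint[1:]
            tied.append(str(index))
        parts.append(constraint)

    parts.extend(inputs)  # regular inputs keep their existing indices
    parts.extend(tied)    # tied inputs come last
    parts.extend("~{" + name + "}" for name in clobbers)
    return ",".join(parts)


# Increment-in-place: one read-write place output.
print(sketch_constraints([("out_place", "+r")]))
# Mixed case: rw place + regular input + value output.
print(sketch_constraints([("out_place", "+r"), ("input", "r"), ("out_value", "=r")]))
```

Because the tied constraints are appended after the regular inputs, the call arguments in this model would be ordered (input, seed), matching the order reported for the mixed case.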

Prior: 2 — -> @place write-through outputs. An asm result can be stored through a place (local / struct field) instead of returned; the place output does NOT join the result tuple. Parser: -> @place parses the @place as an ordinary address-of expression → an out_place operand (src/parser.zig). Lowering (lowerAsmExpr): out_place operand = the lowered @place address, out_ty = the pointee; read-write (+) and indirect-memory (*) constraints rejected loudly (not yet implemented). Added out_ty: TypeId to the IR AsmOperand (src/ir/inst.zig) so emit builds the combined return struct (ALL outputs). emitInlineAsm rewrite (src/backend/llvm/ops.zig): the LLVM return type is now built from every output's out_ty; after the call, out_place slots are stored through their address and out_value slots rebuild the sx result — with a fast path (no place outputs → the asm's struct return IS the result, so pure-value asm IR is unchanged). Verified: write-to-local (get42→42), struct field (@p.b), mixed value+place (v=10 b=20), + rejected. Locked with examples/1649-platform-asm-place-output.sx (mixed, runs on aarch64). zig build test green (657 corpus, 446 unit). Files: src/parser.zig, src/ir/inst.zig, src/ir/lower/expr.zig, src/backend/llvm/ops.zig, examples/1649-*.
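
The post-call split between place outputs and value outputs can be pictured with a small model. This is a Python sketch under assumptions: `split_outputs`, the `(role, place)` pairs, and the dictionary standing in for memory are all hypothetical, and the real emitInlineAsm works on LLVM values rather than Python objects.

```python
def split_outputs(outputs, returned, memory):
    """Distribute the asm call's returned slots.

    outputs:  list of (role, place) pairs in source order; place is a key
              into `memory` for "out_place", or None for "out_value".
    returned: one value per output, modelling the combined return struct.
    memory:   dict standing in for addressable storage (hypothetical).
    """
    values = []
    for (role, place), value in zip(outputs, returned):
        if role == "out_place":
            memory[place] = value   # store through the place address
        else:
            values.append(value)    # joins the sx result

    if not values:
        return None                 # no value outputs: void result
    if len(values) == 1:
        return values[0]            # single value: plain T
    return tuple(values)            # several values: tuple


# The mixed value+place case from the step above (v=10, b=20).
memory = {}
v = split_outputs([("out_value", None), ("out_place", "p.b")], (10, 20), memory)
print(v, memory)
```

In this model the place output never appears in the returned result, which mirrors the rule that a place output does not join the result tuple.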

Prior: F — global (module-scope) asm. A top-level asm { "tmpl", }; block (template only) lowers to LLVM module asm, and a lib-less extern calls into the symbols it defines. New asm_global AST node (src/ast.zig) + parseAsmGlobal (src/parser.zig, dispatched from parseTopLevel on kw_asm) — rejects volatile and any operands/clobbers. The node forced (and got) arms in the same three Node.Data switches as asm_expr (sema.zig ×2, semantic_diagnostics.zig). Module gains a global_asm: ArrayList([]const u8) (src/ir/module.zig); lowerMainAndComptime captures each template (the dead lowerDecls is NOT the top-level pass — lowerRoot Pass 2 uses lowerMainAndComptime); emit_llvm.zig's emit() appends each via LLVMAppendModuleInlineAsm (source order). Verified end-to-end: an aarch64 _my_add global routine called via extern returns 42 — AOT only (the ORC JIT doesn't link module-asm symbols, so sx run is wrong; the design ties global-asm symbols to the final linked binary). Locked with examples/1648-platform-asm-global.sx (.build { "aot": true, "target": "macos" } → AOT build+run on aarch64, ir-only elsewhere). zig build test green (656 corpus, 446 unit). Files: src/ast.zig, src/parser.zig, src/sema.zig, src/ir/semantic_diagnostics.zig, src/ir/module.zig, src/ir/lower/decl.zig, src/ir/emit_llvm.zig, examples/1648-*.

Prior: E — multi-output tuples. Inline asm now returns tuples. Replaced the N>1 bail with a shared asmResultType helper (src/ir/lower/expr.zig, mixed into Lowering) that derives the result type from the out_value operands (0→void, 1→T, N→named tuple, named via the §II.5 effective-name rule). The key realization: toLLVMType(tuple) already produces a literal struct {T1,…,Tn} — exactly LLVM's multi-output asm return — so emit needed NO change; building the op with a tuple result type makes the asm call return the struct, which IS sx's tuple value (destructured by the normal tuple_get path). inferType's .asm_expr arm now also delegates to asmResultType (single owner), so return asm, x := asm, and q, r := asm all agree on the type. Verified end-to-end on aarch64: split(0x1234) → (lo=52, hi=18), a udiv/msub divmod → (3, 2). IR is textbook: call { i64, i64 } asm "divq ${4}", "={rax},={rdx},{rax},{rdx},r,~{cc}"(…) → extractvalue → tuple. Converted 1640 to the x86_64 multi-output IR lock (ir-only) + added 1647-platform-asm-aarch64-multi (runs on aarch64). zig build test green (655 corpus, 446 unit). Files: src/ir/lower/expr.zig, src/ir/lower.zig, src/ir/expr_typer.zig, examples/164{0,7}-*.
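
The 0→void, 1→T, N→named tuple rule is simple enough to state as code. The following Python sketch is only a model of that rule: the function name, the string type names, and the use of a tuple of `(name, type)` pairs for a named tuple are assumptions, not the representation used by asmResultType.

```python
def sketch_result_type(value_outputs):
    """Derive a result type from (effective_name, type_name) value outputs.

    Hypothetical encoding: "void" for no outputs, the bare type name for
    one output, and a tuple of (name, type) pairs for several.
    """
    if not value_outputs:
        return "void"
    if len(value_outputs) == 1:
        return value_outputs[0][1]
    return tuple(value_outputs)


print(sketch_result_type([]))
print(sketch_result_type([("result", "i64")]))
print(sketch_result_type([("lo", "i64"), ("hi", "i64")]))
```

Having a single function own this derivation is the point of the step: lowering and type inference call the same helper, so they cannot disagree on the shape of the result.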

Prior: C.1 + D — inline asm CODEGEN (lowering builds the op + LLVM emit). Inline assembly now runs end-to-end. lowerAsmExpr (src/ir/lower/expr.zig) stops bailing: it resolves each operand's effective name (§II.5 auto-naming), interns template/constraints/clobbers, lowers input Refs, derives the result TypeId (0→void, 1→T), and builds the inline_asm op. Added a %[name]-references-a-real-operand check (the last deferred validation). Multi-output (N>1) still bails loudly ("Phase E"). emitInlineAsm (src/backend/llvm/ops.zig, port of Zig's airAssembly): assembles the LLVM constraint string (outputs→inputs→~{clobber}, with a , inside a constraint rewritten to |), rewrites the template (%[name]→${N}, %%→%, $→$$, %=→${:uid}), then LLVMGetInlineAsm + LLVMBuildCall2 (AT&T). Dispatch wired (emit_llvm.zig, replacing the C.0 @panic). llvm_shim.c: added LLVMInitializeNativeAsmParser() — the JIT must assemble inline asm at run time. Verified end-to-end: aarch64 add/mov run on the host (exit 42), nop volatile runs (1642 now exit 0), IR is textbook (call i64 asm "add ${0},${1},${2}", "=r,r,r"(…)). Locked with examples/1645-platform-asm-aarch64-add.sx (runs on aarch64, ir-only elsewhere via .build + .ir). Also added the inferType .asm_expr arm (src/ir/expr_typer.zig, 0→void / 1→T) — without it a bare x := asm {…-> T} binding inferred .unresolved and silently produced 0; regression-locked with examples/1646-platform-asm-value-binding.sx. Updated 1640 (now Phase-E bail) + 1642 (now runs). zig build test green (654 corpus, 446 unit). Files: src/ir/lower/expr.zig, src/backend/llvm/ops.zig, src/ir/emit_llvm.zig, src/ir/expr_typer.zig, llvm_shim.c, examples/164{0,2,5,6}-*.
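
The template rewrite can be sketched as a single left-to-right scan. This Python model assumes the rules are %[name] to ${N}, %% to %, a bare $ to $$, and %= to ${:uid}, following the behaviour of Zig's airAssembly that the step says it ports. The function name `sketch_rewrite_template` and the list-of-names operand table are hypothetical, and operand modifiers are not handled.

```python
def sketch_rewrite_template(template, names):
    """Rewrite a %-style asm template into LLVM's $-style template.

    names: effective operand names in operand-index order (hypothetical
    stand-in for the real operand table).
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "$":
            out.append("$$")                 # escape a literal dollar for LLVM
            i += 1
        elif ch == "%":
            nxt = template[i + 1:i + 2]
            if nxt == "%":
                out.append("%")              # %% is a literal percent
                i += 2
            elif nxt == "=":
                out.append("${:uid}")        # unique id per asm instance
                i += 2
            elif nxt == "[":
                end = template.find("]", i + 2)
                if end == -1:
                    raise ValueError("unterminated %[ in asm template")
                name = template[i + 2:end]
                if name not in names:
                    raise ValueError("%[" + name + "] does not name an operand")
                out.append("${" + str(names.index(name)) + "}")
                i = end + 1
            else:
                out.append("%")              # lone percent passes through
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(sketch_rewrite_template("add %[d],%[a],%[b]", ["d", "a", "b"]))
```

The ValueError for an unknown name corresponds to the "%[name] references a real operand" check; in the compiler that check is a diagnostic raised during lowering rather than an exception at emit time.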

Prior: C.0 — IR op inline_asm (lock; no behavior change). Added inline_asm: InlineAsm to the IR Op union + the InlineAsm struct (template: StringId, operands: []const AsmOperand {role/name/constraint/operand}, clobbers: []const StringId, has_side_effects) in src/ir/inst.zig — all strings interned, operands in source order, result on Inst.ty. The new variant forced (and got) arms in two exhaustive Op switches: src/ir/interp.zig (loud bailDetail — inline asm is never comptime-evaluable) and src/ir/print.zig (IR dump). src/ir/emit_llvm.zig gets a @panic tripwire — emit lands in Phase D, and until then lowerAsmExpr still bails so no inline_asm op is ever created (reaching emit would be a lowering-switched-over-too-early bug). Unit test inline_asm op shape in src/ir/inst.test.zig. zig build test green (652 corpus, 446 unit). Files: src/ir/inst.zig, src/ir/interp.zig, src/ir/print.zig, src/ir/emit_llvm.zig, src/ir/inst.test.zig.

Prior: B.1 — operand-name validation (design §II.5 auto-naming rule). Extended lowerAsmExpr with a pinnedRegister(constraint) helper ("={eax}"→eax, "+{rax}"→rax, "=r"→null) and two checks: (1) reject the echo form [eax] "={eax}" — a label identical to its own pinned register is redundant (the operand is already auto-named after the register); (2) reject duplicate operand names (ambiguous %[name] / result field). Locked with examples/1643-platform-asm-echo-name.sx + 1644-platform-asm-duplicate-name.sx. zig build test green (652 corpus, 0 failed; 445 unit). Files: src/ir/lower/expr.zig.
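
The helper and the two checks can be modelled as follows. This is a Python sketch, not the Zig helper: `pinned_register` mirrors the three mappings listed above, while `check_operand_names`, the `(label, constraint)` pairs, and the assumption that an operand's effective name is its label or else its pinned register are illustrative.

```python
def pinned_register(constraint):
    """Return the register named by a {reg} constraint, or None."""
    body = constraint.lstrip("=+")           # drop output / read-write markers
    if len(body) > 2 and body.startswith("{") and body.endswith("}"):
        return body[1:-1]
    return None


def check_operand_names(operands):
    """Validate (label, constraint) pairs; label may be None.

    Returns the effective names in operand order. Raises ValueError for
    the echo form and for duplicate names.
    """
    seen = set()
    names = []
    for label, constraint in operands:
        reg = pinned_register(constraint)
        if label is not None and label == reg:
            raise ValueError("label '" + label + "' echoes its pinned register")
        name = label if label is not None else reg   # assumed naming rule
        if name is not None:
            if name in seen:
                raise ValueError("duplicate operand name '" + name + "'")
            seen.add(name)
        names.append(name)
    return names


print(pinned_register("={eax}"), pinned_register("+{rax}"), pinned_register("=r"))
print(check_operand_names([("quot", "={rax}"), (None, "={rdx}"), ("d", "r")]))
```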

Prior: B.0 — asm shape validation (compile-path diagnostics). Restructured the .asm_expr lowering arm into lowerAsmExpr (src/ir/lower/expr.zig, mixed into Lowering in src/ir/lower.zig): it validates BEFORE the not-yet-implemented codegen bail, so the user sees the real problem first. Two checklist items now enforced with named diagnostics: (1) template must be a compile-time-known string ("..." / #string); (2) no value outputs ⇒ must be volatile (mirrors Zig — a result-less asm could be deleted). Valid shapes still bail with the "codegen not yet implemented" message. Result-type derivation + auto-naming stay deferred to a later step (observable only once Phase C produces a real IR op). Locked with examples/1641-platform-asm-missing-volatile.sx (volatile error) + 1642-platform-asm-nop-volatile.sx (volatile no-output accepted → codegen bail). zig build test green (650 corpus, 0 failed; 445 unit). Files: src/ir/lower/expr.zig, src/ir/lower.zig, examples/164{1,2}-*.
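
The two shape rules are order-sensitive: both run before any codegen, so the user sees the real problem first. A minimal Python model, with a hypothetical function name and hypothetical message wording (the compiler's actual diagnostics are not quoted here):

```python
def sketch_validate_shape(template_is_const_string, n_value_outputs, is_volatile):
    """Model of the two B.0 shape checks, in the order they are applied."""
    if not template_is_const_string:
        raise ValueError("asm template must be a compile-time-known string")
    if n_value_outputs == 0 and not is_volatile:
        # A result-less asm with no side-effect marker could be deleted.
        raise ValueError("asm with no value outputs must be volatile")


# A volatile, output-less asm (the `nop` case) passes both checks.
sketch_validate_shape(True, 0, True)
# An asm with one value output does not need volatile.
sketch_validate_shape(True, 1, False)
```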

Prior: A.1 — parse asm { … } + loud lowering bail (folded A.1+A.2 into one honest lock commit, since the loud bail IS current correct behavior — cadence option (a)). Added AsmExpr/AsmOperand to src/ast.zig + the asm_expr Node.Data arm; parseAsmExpr in src/parser.zig (parsePrimary .kw_asm dispatch) — parses the template, flat operand list ([name]? "constraint" -> Type value output / = expr input), and clobbers(.…); volatile/clobbers recognized contextually via isContextualWord. The new asm_expr tag forced (and got) arms in three exhaustive Node.Data switches: src/sema.zig analyzeNode + findNodeAtOffset, src/ir/semantic_diagnostics.zig checkBindingNames (all recurse into template + operand payloads). Lowering bails LOUD + named in src/ir/lower/expr.zig ("inline assembly codegen is not yet implemented…") via an explicit .asm_expr arm (not the generic unknown_expr else) returning emitPlaceholder. -> @place write-through is rejected with a clear "Phase 2" parse error. Locked with examples/1640-platform-asm-parse.sx (multi-output divmod, named operands, register pins, clobbers — parses then bails; called from main). zig build test green (648 corpus, 0 failed; 445 unit). Files: src/ast.zig, src/parser.zig, src/sema.zig, src/ir/semantic_diagnostics.zig, src/ir/lower/expr.zig, examples/1640-*.

Prior: A.0 — kw_asm keyword (first compiler code). Added the kw_asm Token.Tag variant + .{ "asm", .kw_asm } keyword-map entry in src/token.zig; volatile / clobbers deliberately stay OUT of the global table (contextual). New exhaustive Tag switch in src/lsp/server.zig classifyToken flagged the missing arm (the intended coverage tripwire) — added .kw_asm to the keyword group. Lock test in new src/lexer.test.zig (asm→kw_asm, volatile/clobbers→identifier), wired into the src/root.zig barrel as lexer_tests. zig build test green (648 corpus, 0 failed; 445 unit, 0 failed — +1). Files: src/token.zig, src/lexer.test.zig, src/root.zig, src/lsp/server.zig.

Prior: 0.2 — CLAUDE.md docs for <name>.build; Phase 0 COMPLETE.

Prior: 0.1 — corpus runner ir-only branch for cross-target examples. Replaced 0.0's loud placeholder bail: when cfg.target doesn't match the host (ir_only), sweepRoot skips run/build/exec and verifies via sx ir --target only — asserting .exit (ir cmd) + .ir (normalized stdout) + .stderr, never .stdout (write skipped in update mode, assertion skipped in verify mode). An .ir snapshot is required in ir-only mode — its absence is a loud failure ("needs an .ir snapshot for ir-only mode"). Locked with examples/1639-platform-target-cross.sx (asm-free main :: () -> i64 { return 0; }), .build { "target": "x86_64-linux" }, + checked-in .ir. Verified both guards fire: corrupting the .ir → IR mismatch; deleting it → the require-failure. zig build test green (647 corpus, 0 failed; 444 unit). Files: src/corpus_run.test.zig, examples/1639-*.

Current state

Inline assembly works end-to-end: 0, 1, and N value outputs (tuples). Full pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + %[name] check) → IR op (C.0) → lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D) → multi-output tuples (E). Register-class + register-pinned operands, inputs, clobbers, #string multi-instruction templates, %[name]/%% rewriting, and the §II.5 auto-naming rule all work and execute on the host JIT. Global asm { … } (Phase F) works AOT (call-into-asm via lib-less extern). -> @place write-through outputs work (Phase 2) and read-write (+) place outputs work (Phase G — tied-input lowering, runs on aarch64). Indirect-memory (*) place outputs are still rejected loudly as not-yet-implemented — the only remaining substantive feature. Smaller follow-ups: the comptime-call guard for global asm (#run into a module-asm symbol should fail loud via dlsym-miss — pin a test), a JIT-vs-global-asm note (sx run silently mishandles module-asm symbols; AOT is correct), and the x86_64 syscall ir-only example. readme.md now has an "Inline Assembly" section.

Known orthogonal bug: issue 0137 — sx run on a program with no main segfaults (src/target.zig:256-273, unguarded JIT entry lookup). Pre-existing, asm-independent; does NOT block the ASM stream (every example has a main).

Phase E/F feasibility already confirmed against the live tree (LLVMGetInlineAsm / LLVMBuildCall2 / LLVMAppendModuleInlineAsm in LLVM@19 Core.h; ERR-stream extractvalue→tuple in emit_llvm.zig:726-927; lib-less extern, 60 sites; --target a global CLI flag).

Next step

The output-to-const rejection item is DONE (via issue 0138, now RESOLVED). The general @const address-of bug — @scalar_const reinterpreting the folded value as a pointer (inttoptr (i64 40 to ptr)) → segfault on deref / invalid store for asm -> @const — was fixed in src/ir/lower/expr.zig's .address_of path (scalar :: consts now diagnose "no storage"; array/struct consts keep real storage). Because asm -> @place lowers @place through that same path, asm -> @const now reports the clean diagnostic for free — no asm-specific code needed. Regression: examples/1177-diagnostics-addr-of-const-rejected.sx.

Remaining work, all optional / additive:

  • Indirect-memory ("=*m") outputs: pass the place address as an arg, asm writes through it (no return slot). Currently rejected.
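  • Output-to-const rejection for -> @place (the place must be mutable): DONE via issue 0138 (RESOLVED); asm -> @const now reports the "no storage" diagnostic, so nothing remains here.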
  • Polish: comptime-call guard test for global asm; make sx run error (not silently mishandle) a module-asm symbol; x86_64 syscall-write ir-only example.

Orthogonal: issue 0137 (no-main segfault).

Log

  • (init) Plan + design doc written; ASM stream opened.
  • (0.0) Corpus runner target-gating: <name>.build JSON config (replaces .aot marker), --target threading, hostMatchesTarget execute-gate, loud cross-target placeholder bail. Migrated 1226/1227 .aot → .build; locked with 1638 fixture + unit tests. zig build test green.
  • (0.1) ir-only branch: cross-target examples verify via sx ir --target only (exit+ir+stderr, no stdout; .ir required). Locked with 1639 fixture; verified corrupt-.ir → mismatch and missing-.ir → loud failure. zig build test green.
  • (0.2) docs: CLAUDE.md documents <name>.build JSON sidecar (aot + target + ir-only gating), replacing stale .aot marker prose. Phase 0 COMPLETE.
  • (A.0) kw_asm keyword in token.zig (+ map entry); LSP classifyToken switch coverage; lock test in new lexer.test.zig (wired via root.zig). volatile / clobbers stay contextual identifiers. zig build test green (445 unit, +1).
  • (A.1) parse asm { … } → AsmExpr + loud lowering bail; asm_expr arms in 3 exhaustive Node.Data switches; -> @place rejected (Phase 2). Adopted operand auto-naming rule (design §II.5). Locked with 1640 fixture. Filed orthogonal issue 0137 (no-main JIT segfault). zig build test green (648 corpus, 445 unit).
  • (B.0) asm shape validation in lowerAsmExpr: comptime-string template + no-output⇒volatile, with named diagnostics before the codegen bail. Locked with 1641 (volatile error) + 1642 (volatile accepted). zig build test green (650 corpus, 445 unit).
  • (B.1) operand-name validation: pinnedRegister helper + reject echo form ([eax] "={eax}") and duplicate names. Locked with 1643 + 1644. zig build test green (652 corpus, 445 unit).
  • (C.0) IR op inline_asm: InlineAsm + interp bailDetail + print arm + emit @panic tripwire (Phase D). No behavior change (lowering still bails). Unit test inline_asm op shape. zig build test green (652 corpus, 446 unit).
  • (C.1+D) CODEGEN — lowerAsmExpr builds the op (effective names, interned strings, input Refs, 0/1 result type) + %[name] validation; emitInlineAsm (constraint string + template rewrite + LLVMGetInlineAsm/BuildCall2, AT&T); inferType arm; LLVMInitializeNativeAsmParser for the JIT. Inline asm runs end-to-end. N>1 bails (Phase E). Locked with 1645 (aarch64 add, runs) + 1646 (:= binding); updated 1640/1642. zig build test green (654 corpus, 446 unit).
  • (E) multi-output tuples — asmResultType helper (0→void/1→T/N→named tuple), shared by lowering + inferType. toLLVMType(tuple) == LLVM multi-output struct, so emit unchanged; the asm struct return IS the sx tuple. Runs on aarch64 (1647: split(lo,hi)); 1640 → x86 multi-output IR lock (ir-only). zig build test green (655 corpus, 446 unit).
  • (F) global asm — asm_global AST node + parseAsmGlobal (top-level, rejects volatile/operands); Module.global_asm captured in lowerMainAndComptime; emit() appends via LLVMAppendModuleInlineAsm; call-into via lib-less extern. AOT-verified (1648, _my_add→42). zig build test green (656 corpus).
  • (docs) readme.md "Inline Assembly" section (b8800a2).
  • (2) -> @place write-through — out_place operand; out_ty on the IR AsmOperand; emitInlineAsm builds the combined output struct + splits (out_place → store-through, out_value → result), fast-path when no places. +/* rejected. Locked with 1649 (mixed, runs). zig build test green (657 corpus, 446 unit).
  • (G) read-write + place outputs — + lowers to an output = + a tied input (output-index constraint) seeded with the place's loaded value, tied inputs appended last (operand indices undisturbed). appendAsmConstraints rewrites + → =; emitInlineAsm grows args by the rw count + loads seeds; asmIsReadWrite helper. Lowering stops rejecting + (* still rejected). Two commits (cadence): 1650 locked the rejection, then flipped to a runnable aarch64 example ("=r,0" IR). zig build test green (658 corpus, 446 unit).
  • (0138) output-to-const rejection — fixed the underlying general bug: scalar @const (address-of a folded :: constant) reinterpreted the value as a pointer (inttoptr). src/ir/lower/expr.zig .address_of now diagnoses a scalar const (local + module) instead of falling through; array/struct consts keep storage. asm -> @const gets the clean diagnostic for free (same path). Regression examples/1177-diagnostics-addr-of-const-rejected.sx. Issue 0138 RESOLVED. zig build test green (659 corpus, 446 unit).

Known issues

  • 0138 — RESOLVED. @const (address-of a :: comptime constant) yielded a wild pointer (inttoptr (i64 <value> to ptr)). Fixed by diagnosing scalar @const in src/ir/lower/expr.zig .address_of (no storage; array/struct consts unaffected). Delivered the ASM "output-to-const rejection" for free. Regression examples/1177-diagnostics-addr-of-const-rejected.sx.
  • 0137 — sx run on a program with no main segfaults (unguarded JIT entry lookup, src/target.zig:256-273). Pre-existing, asm-independent. Filed issues/0137-jit-run-no-main-segfault.md. Does not block A.1.