sx/design/inline-asm-design.md
agra e7eeecc0f3 docs: move inline-asm design doc to a top-level design/ folder
Moves docs/inline-asm-design.md -> design/inline-asm-design.md (the
internal design record now lives under design/, separate from the
user-facing docs/). Updates all links: current/CHECKPOINT-ASM.md,
current/PLAN-ASM.md, current/PLAN-EXTERN-EXPORT.md (../docs -> ../design)
and docs/inline-assembly.md (same-dir -> ../design).
2026-06-16 07:46:01 +03:00


Inline Assembly for sx — Design Doc & Proposal

Status: proposal / not yet scheduled into a workstream
Author: research pass over the Zig compiler (~/projects/zig, 0.16-dev) + the sx compiler
Scope: how Zig implements inline assembly end-to-end, and a minimal-deviation proposal to bring the same model to sx.

Guiding constraint for this doc: mirror Zig's design; deviate only where sx's grammar or stdlib makes a 1:1 copy impossible, and call every deviation out explicitly with its justification. Every deviation below is tagged [DEVIATION] with a reason.


0. TL;DR + feasibility

  • Feasible today, no new infrastructure. sx already links LLVM (build.zig:10 → /opt/homebrew/opt/llvm@19) and @cImports llvm-c/Core.h (src/llvm_api.zig:1-17). That header exposes everything inline asm needs, reachable right now through llvm_api.c.*:
    • LLVMGetInlineAsm(Ty, AsmString, AsmStringSize, Constraints, ConstraintsSize, HasSideEffects, IsAlignStack, Dialect, CanThrow) — builds the asm callee (LLVM 19/21 share this 9-arg signature).
    • LLVMInlineAsmDialectATT / LLVMInlineAsmDialectIntel.
    • LLVMBuildCall2(...) — already used pervasively in src/ir/emit_llvm.zig (e.g. the Obj-C msgSend path) — calls the asm value like a function.
    • LLVMAppendModuleInlineAsm(M, Asm, Len) — module-level (global) asm.
  • The hard part is not codegen. Codegen is ~80 lines of well-trodden LLVM-C. The real work is (a) the parser grammar, (b) a faithful port of Zig's LLVM constraint-string assembly and %[name]$N template rewrite, and (c) Sema validation rules. All three are fully specified below.
  • Surface form (decided, §II.2): asm volatile { "tmpl", "=r" -> T, "r" = x, clobbers(.cc, .memory) } — a brace block; -> marks outputs / = marks inputs (no positional : sections); enum-literal clobbers(.…); and N -> Type outputs return a tuple (sx has tuples — Zig caps at one output).
  • Inline asm is never comptime-evaluable. The interpreter must bail loudly (bailDetail), per CLAUDE.md's "no silent unimplemented arms" rule.
  • One naming note: sx already has a sx asm <file> CLI subcommand (src/main.zig:203,386) that emits a .s file. That is a compiler output mode, a different namespace from a language token. No conflict, but worth knowing so nobody confuses the two.

PART I — How Zig implements inline assembly

All file references in Part I are under ~/projects/zig (0.16-dev, commit 3deb86bafd). Parser/AST/AstGen live in lib/std/zig/; Sema/AIR/codegen in src/.

I.1 Surface syntax

The canonical example (doc/langref/inline_assembly.zig), a Linux x86_64 syscall:

pub fn syscall3(number: usize, arg1: usize, arg2: usize, arg3: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize),
        : [number] "{rax}" (number),
          [arg1] "{rdi}" (arg1),
          [arg2] "{rsi}" (arg2),
          [arg3] "{rdx}" (arg3),
        : .{ .rcx = true, .r11 = true });
}

Grammar shape:

asm volatile? ( <template-string>
    : <output-item> , <output-item> , ...      # outputs   (optional section)
    : <input-item>  , <input-item>  , ...      # inputs    (optional section)
    : <clobbers> )                              # clobbers  (optional section)

output-item :  [name] "constraint" (-> Type)    # asm result becomes the value
            |  [name] "constraint" (lvalue)      # asm writes through the pointer
input-item  :  [name] "constraint" (expr)
clobbers    :  .{ .reg0 = true, .reg1 = true }   # struct literal (0.16-dev)

Key semantics (from doc/langref.html.in:4217-4300):

  • volatile marks side effects. Without it, an asm expression whose result is unused may be deleted. An asm expression with no outputs must be volatile (else compile error).

  • x86/x86_64 use AT&T syntax (LLVM provides the parser; Intel support is "buggy and not well tested").

  • %[name] in the template refers to a named operand's register; %% is a literal %.

  • Clobbers are registers the asm trashes that are not inputs/outputs. "memory" (the .memory = true field) means "writes to arbitrary memory." Failing to declare a clobber is unchecked illegal behavior.

  • Global assembly = an asm(...) in a namespace-level comptime block. It has different rules: volatile is forbidden, there are no inputs/outputs/clobbers, no % substitution, and all global asm is concatenated verbatim:

    // doc/langref/test_global_assembly.zig
    comptime {
        asm (
            \\.global my_func;
            \\.type my_func, @function;
            \\my_func:
            \\  lea (%rdi,%rsi,1),%eax
            \\  retq
        );
    }
    extern fn my_func(a: i32, b: i32) i32;   // call into the global-asm symbol
    

I.2 Pipeline, stage by stage

Tokenizer — lib/std/zig/tokenizer.zig

Two keywords in the StaticStringMap: .{ "asm", .keyword_asm } and .{ "volatile", .keyword_volatile }.

AST — lib/std/zig/Ast.zig

Four node tags (Ast.zig:3789-3817):

  • asm_simple: asm(template) only, no operands.
  • @"asm": full form; data is node_and_extra → (template node, ExtraIndex to an Asm).
  • asm_output: [a] "b" (-> Type) or [a] "b" (ident).
  • asm_input: [a] "b" (expr).

The "full" view the rest of the compiler consumes (Ast.zig:2797-2809):

pub const Asm = struct {
    ast: Components,
    volatile_token: ?TokenIndex,
    outputs: []const Node.Index,
    inputs: []const Node.Index,
    pub const Components = struct {
        asm_token: TokenIndex,
        template: Node.Index,
        items: []const Node.Index,       // outputs ++ inputs, interleaved order preserved
        clobbers: Node.OptionalIndex,    // a comptime expression (the struct literal)
        rparen: TokenIndex,
    };
};

The on-disk extra record (Ast.zig:3969-3975) stores items_start/items_end (a span into the node list), clobbers (optional node), and rparen.

Parser — lib/std/zig/Parse.zig

expectAsmExpr (Parse.zig:2771-2838) implements the grammar:

fn expectAsmExpr(p: *Parse) !Node.Index {
    const asm_token = p.assertToken(.keyword_asm);
    _ = p.eatToken(.keyword_volatile);
    _ = try p.expectToken(.l_paren);
    const template = try p.expectExpr();
    if (p.eatToken(.r_paren)) |rparen| { /* asm_simple */ }
    _ = try p.expectToken(.colon);
    // ... parse output items until a `:`/`)` ...
    const clobbers: Node.OptionalIndex = if (p.eatToken(.colon)) |_| clobbers: {
        // ... parse input items until a `:`/`)` ...
        _ = p.eatToken(.colon) orelse break :clobbers .none;
        break :clobbers (try p.expectExpr()).toOptional();   // clobbers = an expression
    } else .none;
    // ...
}
  • parseAsmOutputItem (Parse.zig:2840-2864): LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN (MINUSRARROW TypeExpr | IDENT) RPAREN.
  • parseAsmInputItem (Parse.zig:2866-2883): LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN Expr RPAREN.
  • Clobbers parse as a generic expression ((try p.expectExpr())), not a string list — this is the 0.16-dev change. It is later coerced to a std.lang.assembly.Clobbers struct at Sema time.

AST → ZIR — lib/std/zig/AstGen.zig

asmExpr (AstGen.zig:8553-8669) + addAsm (12257-12310). The ZIR payload (lib/std/zig/Zir.zig:2531-2564):

pub const Asm = struct {
    src_node: Ast.Node.Offset,
    asm_source: NullTerminatedString,   // template (string-literal case)
    output_type_bits: u32,              // bit i = output i uses `-> T` (vs ptr)
    clobbers: Ref,                      // comptime ref → assembly.Clobbers value
    pub const Small = packed struct(u16) { is_volatile: bool, outputs_len: u7, inputs_len: u8 };
    pub const Output = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
    pub const Input  = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
};

AstGen already enforces the structural rules:

  • Global (container-level) asm: rejects volatile, rejects any outputs/inputs/clobbers (AstGen.zig:8583-8587).
  • Local asm: "assembly expression with no output must be marked volatile."
  • outputs.len < 16, inputs.len < 32 (fit Small.outputs_len/inputs_len).
  • At most one output may use the -> T form ("inline assembly allows up to one output value"); output_type_bits records which.
  • Two ZIR tags: .@"asm" (string-literal template) vs .asm_expr (comptime expression template).

ZIR → AIR (Sema) — src/Sema.zig

zirAsm (Sema.zig:15044-15231, dispatched at 1396-1397). This is where all semantic validation happens. It:

  • Resolves the template to a comptime string (resolveConstString).
  • Global asm (func_index == .none): asserts no operands, then zcu.addGlobalAssembly(owner, asm_source) and returns .void_value.
  • requireRuntimeBlock — local asm can't run at comptime.
  • Per output: if -> T, resolve the type, ensureLayoutResolved, set the expression's result type; else resolve the operand pointer. Validates:
    • output type has a well-defined in-memory layout (else error);
    • cannot output to a const pointer ("asm cannot output to const '{s}'");
    • output must be a runtime value (no reference to a comptime var).
  • Per input: resolve operand, reject comptime-only refs, coerce comptime_int → usize, comptime_float → f64.
  • Clobbers: coerce the expression to std.lang.assembly.Clobbers, resolve to a comptime value.

The AIR payload (src/Air.zig:1485-1497):

pub const Asm = struct {
    source_len: u32,
    inputs_len: u32,
    clobbers: InternPool.Index,         // comptime assembly.Clobbers value
    flags: packed struct(u32) { outputs_len: u31, is_volatile: bool },
};
// trailing: out operand refs, in operand refs, then the template bytes and
// (constraint\0 name\0) pairs packed into air_extra.

AIR → LLVM — src/codegen/llvm/FuncGen.zig

airAssembly (FuncGen.zig:2473-2852) is the crux. This is the algorithm sx must port. Three sub-tasks:

(a) Assemble the LLVM constraint string. Comma-separated. For each output: emit = (write-only) or + (read-write, recorded in llvm_rw_vals); a * prefix marks an indirect (memory) output passed as a pointer parameter; a non-indirect output contributes to the return type. The user's leading =/+ in constraint[0] is consumed and re-emitted; the rest is copied with Zig commas rewritten to LLVM | (alternative constraints). Inputs are copied similarly (no =). Clobbers: iterate the Clobbers struct's bool fields as a bigint; for each true field emit ~{fieldname} (via appendConstraints, which also expands target-specific aliases).

(b) Rewrite the template %[name] → LLVM positional ${N} (state machine, FuncGen.zig:2735-2802):

| input | output | note |
|---|---|---|
| $ | $$ | escape LLVM's $ |
| %% | % | literal percent |
| %= | ${:uid} | unique id |
| %[name] | ${N} | N = position in name_map |
| %[name:mod] | ${N:mod} | with modifier |

name_map maps each operand's [name] to its positional index across all outputs+inputs.

(c) Build & call. Pick the LLVM function type: return_count == 0 → void; == 1 → the single return type; > 1 → an anonymous struct of the return types. Then:

const call = try self.wip.callAsm(
    attributes, llvm_fn_ty,
    .{ .sideeffect = is_volatile },        // Assembly.Info: sideeffect/alignstack/inteldialect/unwind
    rendered_template, llvm_constraints, llvm_param_values, "");

callAsm (lib/std/zig/llvm/Builder.zig:6131-6143) is a thin wrapper that builds the asm constant (asmValue) and emits a normal call. In LLVM-C terms this is exactly LLVMGetInlineAsm(...) + LLVMBuildCall2(...). Finally, non-indirect outputs are read back: with one return it's the call result; with several it's extractvalue i per output; indirect outputs were already written by the asm via their pointer parameter.

C backend — src/codegen/c.zig

No airAssembly for inline asm in the C backend in this tree; only global asm flows out (as module asm). For sx this is irrelevant — sx only has an LLVM backend.

Global asm & naked functions

  • Global asm bypasses everything above: Sema.addGlobalAssembly accumulates the verbatim source; the LLVM object emits it via the module-level asm string (LLVM-C: LLVMAppendModuleInlineAsm). Symbols it defines are reached with extern fn.
  • Naked functions (callconv(.naked)) drop the prologue/epilogue; the body is entirely inline asm. This is an orthogonal calling-convention feature, not part of the asm expression itself.

PART II — Proposal for sx

II.1 Design principles

  1. Copy Zig's semantic model exactly: a template + register/memory operands + clobbers + a volatile flag; AT&T syntax by default via LLVM; "no-output asm must be volatile"; %[name] substitution.
  2. Copy the LLVM lowering exactly (the constraint-string assembler + template rewriter from FuncGen.zig are reproduced verbatim in §II.6 — these are the parts where "inventing our own" would silently miscompile).
  3. Diverge from Zig's surface syntax where sx has a better-fitting idiom, and only there. The deviations (§II.2) are deliberate: a brace block instead of ( … ); ->/= operand markers instead of positional : sections; an enum-literal clobbers(.…) list; and — because sx has tuples and Zig does not — true multiple return values instead of Zig's one-output cap.

II.2 sx surface syntax

asm is an expression (it yields the output value/tuple), introduced by a new asm keyword. The body is a brace block of comma-separated parts: a template string first, then operands, then an optional clobbers(.…) clause. Each operand is [name]? "constraint" <role>, where the role marker is:

  • -> Type — an output that produces a value (joins the result).
  • -> @place — an output that writes through to existing storage (Phase 2).
  • = expr — an input (the value fed in).

-> reuses sx's "produces" arrow (as in (a: i32) -> i32); = reuses sx's "is set to" binding. There are no positional : sections.

// x86_64-linux — write(2) via syscall
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
    return asm volatile {
        "syscall",
        "={rax}" -> i64,              // output → the expression's value
        "{rax}"  = 1,                 // SYS_write
        "{rdi}"  = fd,
        "{rsi}"  = buf,
        "{rdx}"  = len,
        clobbers(.rcx, .r11, .memory),
    };
}

// read a register, no inputs, named operand for %[out]
sp :: () -> u64 {
    return asm { "mov %%rsp, %[out]", [out] "=r" -> u64 };
}

Multi-instruction templates use sx's existing #string heredoc (src/lexer.zig:402) or a multi-line "..." literal — no new lexer feature:

serialize :: () {
    asm volatile {
        #string ATT
        mfence
        lfence
ATT,
    };
}

Outputs and the result type. A -> Type output contributes one value to the asm expression's result; the count decides the shape:

| -> Type outputs | result | spelling |
|---|---|---|
| 0 | void (must be volatile) | asm volatile { … } |
| 1 | that type T | x := asm { …, "=r" -> T }; |
| N | a tuple (T1,…,Tn) (declaration order) | a, b := asm { … }; |

A [name] on an output becomes a named tuple field — the same name you'd use for %[name] does double duty:

// sx has tuples, so asm gets real multiple return values (Zig caps you at one).
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
    return asm {
        "divq %[d]",
        [quot] "={rax}" -> u64,       // → .quot   (operand 0)
        [rem]  "={rdx}" -> u64,       // → .rem    (operand 1)
        "{rax}" = n,
        "{rdx}" = 0,
        [d] "r" = d,
        clobbers(.cc),
    };
}
q, r := divmod(17, 5);                // q = 3, r = 2

Deviations from Zig (each deliberate; semantics unchanged)

  • [DEVIATION 1 — brace block, not ( … ).] The asm body is asm { … }, a comma-separated brace block (trailing comma allowed, per specs.md:226,501), not Zig's parenthesised form. Braces read as "a block of code," which is what an asm template is; #string heredoc templates especially benefit. asm is a keyword, so asm { / asm volatile { is unambiguous.

  • [DEVIATION 2 — ->/= operand markers, not : sections.] Zig groups operands into positional : outputs : inputs : clobbers sections (count the colons; : : for an empty one). sx tags each operand by role instead — -> Type / -> @place (output) and = expr (input) — so the list is flat, order-independent, with no positional colons. (<- for inputs was considered and rejected: it can't be a global token without mis-lexing a < -b; = reuses an existing token and the existing "binding" meaning.)

  • [DEVIATION 3 — clobbers are an enum-literal list clobbers(.cc, .memory).] Zig 0.16 uses a struct literal : .{ .rcx = true } coerced to a per-arch std.lang.assembly.Clobbers; older Zig used a string list. sx uses a dot-literal list, cleaner than both. v1: each .name is a dot-name lowered straight to ~{name} (.memory/.cc are recognized specials; register names pass through verbatim; LLVM validates). Phase 4: upgrade .name to members of a compile-time-checked per-arch Clobber enum — same syntax, gains typo-checking. Note the call-looking clobbers(…) is a declarative clause, not a call — nothing executes; it only feeds the register allocator.

  • [DEVIATION 4 — volatile is a contextual keyword.] sx's keyword set (specs.md:168) has neither asm nor volatile. asm becomes a real keyword; volatile appears only right after asm, so it can be recognized contextually (a plain identifier everywhere else), avoiding reserving it globally. The surface is byte-identical to Zig. (Alternative: reserve globally — simpler lexer, small source-compat risk. Recommend contextual.)

  • [DEVIATION 5 — multiple value-outputs return a tuple (sx ⊃ Zig).] Zig allows at most one -> T output; the rest must be pointer/lvalue outputs. sx has tuples, so N -> Type outputs return (T1,…,Tn) (named when operands are named), destructured with a, b := …. A deliberate improvement over Zig, enabled by a feature Zig lacks, and maps onto LLVM's existing multi-output struct return (§II.6). The other output flavor — -> @place write-through, plus read-write ("+r" -> @place) and indirect-memory ("=*m") outputs — is Phase 2 (needs indirect-constraint handling); the value-tuple form does not.

  • [DEVIATION 6 — global asm is a top-level asm { … } declaration.] sx has no namespace-level comptime {} block (it has #run, specs.md:2598), so global asm is a top-level statement:

    asm {
        #string ATT
        .global my_func
        .type my_func, @function
        my_func:
          lea (%rdi,%rsi,1), %eax
          retq
    ATT,
    };

    my_func :: (a: i32, b: i32) -> i32 extern;   // extern, no library; valid sx today

    Only the comptime {} wrapper is dropped; lowers to LLVMAppendModuleInlineAsm.

Calling the asm symbol reuses the C-FFI import path (no new mechanism for v1). A lib-less extern fn declaration (its library is optional; used in 50+ stdlib sites, e.g. chdir :: (path: [*]u8) -> i32 extern;) emits exactly the artifact needed to call into the asm symbol: an external-linkage, C-calling-convention, raw-named, link-time-resolved declaration, the same thing Zig's extern fn produces (also C-callconv). The reverse direction (asm calling back into an sx function) is handled by export, the define-and-expose dual of extern.

Everything semantic (comptime-known template, register/memory constraints verbatim to LLVM, clobber meaning, "no-output ⇒ must be volatile," AT&T default, %[name]/%% substitution) is identical to Zig. Only the surface (block, ->/=, clobbers(.…), tuple returns) differs.

II.3 sx AST

sx's AST is a pointer-based tagged union (Data = union(enum) at src/ast.zig:13, nodes built via Parser.createNode), much simpler than Zig's SoA extra_data scheme, so we can store slices directly. Add one arm to the Node.Data union (src/ast.zig:13):

// in Node.Data union(enum):
asm_expr: AsmExpr,

// new node struct, alongside the other expression node defs:
pub const AsmExpr = struct {
    template: *Node,                  // string-literal / #string node (comptime string)
    is_volatile: bool = false,
    operands: []const AsmOperand,     // declaration order preserved (= %N indexing)
    clobbers: []const []const u8,     // dot-names from clobbers(.…): "rcx","cc","memory"
};

pub const AsmOperand = struct {
    name:       ?[]const u8 = null,   // optional [name]: the %[name] alias and, for out_value, the result field name
    constraint: []const u8,           // verbatim, e.g. "={rax}", "=r", "+r", "{rdi}", "r"
    role:       Role,
    payload:    *Node,                // out_value → Type node; out_place/input → expr node

    pub const Role = enum {
        out_value,   // `-> Type`     value output; N of these → a tuple result
        out_place,   // `-> @place`   write-through to existing storage (Phase 2)
        input,       // `= expr`
    };
};

A single flat operands list (not split into outputs/inputs) preserves source order — what the %0/%[name] indices and the LLVM constraint order key off. The result type is derived in Sema from the out_value operands (§II.5).

II.4 sx parser

asm is parsed in expression position. sx dispatches primary expressions in Parser.parsePrimary (src/parser.zig); add a .kw_asm case (mirroring how existing keyword/#-directive expressions like #run are handled):

  1. consume asm; contextually consume volatile if the next token is the word volatile (Deviation 4).
  2. expect(.l_brace); parse the first element as the template expression.
  3. then a comma-separated list until }. Each element is either:
    • an operand: [name]? (a bracketed identifier), a string-literal constraint, then a role: -> Type (out_value) · -> @place (out_place, Phase 2) · = expr (input); or
    • the clobbers clause: clobbers ( .ident (, .ident)* ).
  4. allow a trailing comma; expect(.r_brace); createNode(start, .{ .asm_expr = … }).

The first element is unambiguously the template (a string not followed by a role marker). -> vs = after the constraint disambiguates output vs input; inside a -> target, a leading @ marks a write-through place vs a type.
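
The parsing steps above can be sketched in a few lines of Python. This is an illustration of the disambiguation logic only, not the sx parser (which would live in src/parser.zig); the function name parse_asm_body, the (kind, text) token tuples, and the simplification that every type or expression payload is a single token are all assumptions made for the sketch.

```python
def parse_asm_body(tokens):
    """Parse the tokens between `{` and `}` of an asm block.

    tokens: list of (kind, text) tuples, kind in {"str", "ident", "num", "punct"}.
    Returns (template, operands, clobbers); each operand is
    (name_or_None, constraint, role, payload_text).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind=None, text=None):
        nonlocal pos
        k, t = peek()
        if k is None or (kind and k != kind) or (text and t != text):
            raise SyntaxError(f"expected {text or kind or 'a token'}, got {t!r}")
        pos += 1
        return t

    template = take("str")                      # first element is the template
    operands, clobbers = [], []
    while peek() == ("punct", ","):
        take("punct", ",")
        if peek() == (None, None):              # trailing comma before `}`
            break
        if peek() == ("ident", "clobbers"):     # contextual word, not a keyword
            take("ident")
            take("punct", "(")
            while True:
                take("punct", ".")
                clobbers.append(take("ident"))
                if peek() != ("punct", ","):
                    break
                take("punct", ",")
            take("punct", ")")
            continue
        name = None
        if peek() == ("punct", "["):            # optional [name]
            take("punct", "[")
            name = take("ident")
            take("punct", "]")
        constraint = take("str")
        marker = take("punct")
        if marker == "->" and peek() == ("punct", "@"):
            take("punct", "@")
            role = "out_place"                  # write-through place (Phase 2)
        elif marker == "->":
            role = "out_value"
        elif marker == "=":
            role = "input"
        else:
            raise SyntaxError(f"expected -> or = after constraint, got {marker!r}")
        operands.append((name, constraint, role, take()))  # single-token payload
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return template, operands, clobbers


tokens = [
    ("str", "syscall"), ("punct", ","),
    ("str", "={rax}"), ("punct", "->"), ("ident", "i64"), ("punct", ","),
    ("str", "{rdi}"), ("punct", "="), ("ident", "fd"), ("punct", ","),
    ("ident", "clobbers"), ("punct", "("), ("punct", "."), ("ident", "rcx"),
    ("punct", ","), ("punct", "."), ("ident", "memory"), ("punct", ")"),
    ("punct", ","),
]
template, operands, clobbers = parse_asm_body(tokens)
```

Keeping one flat operands list, with the role decided by the marker after the constraint, is what lets the later stages recover declaration order without any positional sections.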

Top-level/global asm (Deviation 6): recognize asm { at declaration scope and build a dedicated asm_global decl (template only — reject operands/volatile).

Lexer/token: add kw_asm to the Token.Tag enum + keyword StaticStringMap in src/token.zig; volatile and clobbers stay out of the global table (contextual). No new operator tokens: -> (arrow), = (equal), . (dot) and {} already exist.

II.5 sx Sema / typing

  • Result type from the out_value operands (-> Type), in declaration order: 0 → void (and the asm must be volatile); 1 → that operand's type T; N → a tuple (T1,…,Tn), named when the operands carry [name]s ((name1: T1, …)), positional otherwise. Implement in the expression typer (src/ir/expr_typer.zig / wherever inferExprType lives), returning the resolved TypeId (a tuple TypeId for N>1). Do not fall back to a silent default — an unresolvable output type is a real error (CLAUDE.md silent-default rule): emit a diagnostic and return the project's .unresolved sentinel.
  • Port Zig's validation checklist (these are the user-facing error messages):
    1. no output operand ⇒ the asm must be volatile;
    2. each out_value result type must have a well-defined in-memory layout;
    3. inputs must be runtime values; coerce comptime int→i64, float→f64;
    4. template must be a comptime-known string;
    5. (Phase 2) out_place cannot write a const; indirect-memory rules.
  • Every %[name] referenced in the template must name an operand (best surfaced as a Sema diagnostic; also caught at codegen during the rewrite — §II.6).
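
Two of the checks above (rule 1 and the %[name] reference check) are easy to get subtly wrong around the %% escape, so here is a minimal Python sketch of them. It is illustrative only; the names referenced_names and check_asm, the (name, role) operand tuples, and the diagnostic wording are assumptions, not the sx Sema API.

```python
def referenced_names(template):
    """Names used as %[name] or %[name:mod], skipping the %% escape."""
    names, i = [], 0
    while i < len(template):
        if template.startswith("%%", i):
            i += 2                                # literal percent, not a reference
        elif template.startswith("%[", i):
            end = template.find("]", i)
            if end == -1:
                raise ValueError("unterminated %[ in asm template")
            names.append(template[i + 2:end].split(":", 1)[0])
            i = end + 1
        else:
            i += 1
    return names


def check_asm(template, is_volatile, operands):
    """operands: list of (name_or_None, role). Returns a list of diagnostics."""
    errors = []
    has_output = any(role in ("out_value", "out_place") for _, role in operands)
    if not has_output and not is_volatile:
        errors.append("asm with no output must be marked volatile")
    known = {name for name, _ in operands if name is not None}
    for ref in referenced_names(template):
        if ref not in known:
            errors.append(f"template references unknown operand %[{ref}]")
    return errors
```

For example, check_asm("mfence", False, []) reports the missing volatile, while "mov %%rsp, %[out]" with a single named output passes both checks.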

Operand naming rule (auto-name from a {reg} pin) — DECIDED

The [name] label on an operand is purely an sx-surface convenience: it provides the %[name] template alias and (for out_value) the result tuple's field name. LLVM never sees it (it sees positional ${N} + the constraint). To kill the common redundancy where a label just echoes its pinned register ([eax] "={eax}"), the operand name is derived as follows, uniformly across every operand kind (out_value / out_place / read-write / input):

  1. Explicit [name] wins — use it verbatim (the %[name] alias / field name).
  2. Else, if the constraint pins a single register ("={eax}", "{rdi}", "+{rax}", i.e. a {reg} body, optionally with a =/+ prefix), the operand is auto-named after that register (eax, rdi, rax). It is then usable as %[eax] and as the tuple field name.
  3. Else (register-class =r/+r/r, or memory =m, …) — the operand has no implicit name. A [name] is then required if the template references it (%[name]) or, for out_value, if a named result field is wanted; otherwise it is anonymous (positional tuple field).

Corollaries:

  • Reject the echo form. An explicit [name] that is identical to the register its own constraint pins ([eax] "={eax}") carries no information — emit a diagnostic ("redundant operand name eax — it already names the pinned register; drop the [eax]"). The useful form is a label that differs from the register ([quot] "={rax}" → field quot over register rax).
  • Result field names (the §II.5 result-type rule above) come from each out_value's effective name — explicit [name], else the auto-derived register name; positional only when neither exists (a class-constrained output with no [name]).
  • This is a typing-stage rule: the parser still stores name: ?[]const u8 (null when no [name] was written); Sema computes the effective name. No parser change.

Note: there is no "≤1 output" rule (that was Zig's limit; sx's tuples lift it).
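
The naming rule and the result-shape rule combine as in the following Python sketch. It is a hedged illustration, not the sx typer: effective_name and result_shape are hypothetical names, types are plain strings, and the regular expression assumes the only prefixes in front of a {reg} pin are = and +, as rule 2 states.

```python
import re

# A constraint that pins exactly one register: "{reg}" with an optional
# "=" or "+" prefix (assumption: no other prefixes are recognised here).
_PINNED = re.compile(r"^[=+]?\{([^{}]+)\}$")


def effective_name(explicit, constraint):
    """Return the operand's effective name, or None if it stays anonymous."""
    match = _PINNED.match(constraint)
    pinned = match.group(1) if match else None
    if explicit is not None:
        if explicit == pinned:                    # the rejected echo form
            raise ValueError(f"redundant operand name {explicit}")
        return explicit                           # rule 1: explicit [name] wins
    return pinned                                 # rule 2, or None for rule 3


def result_shape(out_values):
    """out_values: (explicit_name, constraint, type) per `-> Type` operand,
    in declaration order. Returns "void", a single type, or a tuple of
    (field_name_or_None, type) pairs."""
    if not out_values:
        return "void"
    if len(out_values) == 1:
        return out_values[0][2]
    return tuple((effective_name(n, c), t) for n, c, t in out_values)
```

With the divmod operands, result_shape yields the fields quot and rem; dropping both labels would instead auto-name them rax and rdx after their pinned registers.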

II.6 sx IR + LLVM codegen (the part that must match Zig bit-for-bit)

IR op — src/ir/inst.zig

Add to Op = union(enum) (src/ir/inst.zig:80), next to objc_msg_send (:219). Strings are interned (StringId, as const_string at :85); operands are SSA Refs:

inline_asm: InlineAsm,

pub const InlineAsm = struct {
    template:    StringId,                  // interned, RAW (rewritten at emit)
    operands:    []const AsmOperand,        // declaration order (= %N indexing)
    clobbers:    []const StringId,          // interned dot-names: "rcx","cc","memory"
    has_side_effects: bool,
    // result rides on Inst.ty: void / a scalar TypeId / a tuple TypeId (N outputs)
};

pub const AsmOperand = struct {
    role:       enum { out_value, out_place, input },
    name:       StringId,                   // .none when unnamed
    constraint: StringId,                   // verbatim "={rax}" / "=r" / "+r" / "{rdi}"
    operand:    Ref,                        // out_value → .none; out_place/input → the Ref
};

Lowering — src/ir/lower/expr.zig

Add .asm_expr => self.lowerAsmExpr(...) to the lowerExpr dispatch. It interns the template + constraint strings + clobber names, lowers each input operand to a Ref, computes the result TypeId (§II.5), and emits the inline_asm op. (Same shape as the existing objc_msg_send lowering.)

Emit — src/ir/emit_llvm.zig

Add .inline_asm => self.emitInlineAsm(...) to the emitInst dispatch. This is a direct port of FuncGen.airAssembly. Using the already-imported llvm_api.c:

fn emitInlineAsm(self: *Emitter, inst: *const Inst, a: InlineAsm) void {
    // 1) result LLVM type + param types/values from constraints
    const ret_ty = self.lowerType(inst.ty);                 // void if no typed output
    var param_tys: ...; var args: ...;                       // one per `input` constraint
    // 2) assemble the LLVM constraint string  (see algorithm below)
    //    outputs first ("=..."/"+..."), then inputs, then "~{reg}" clobbers, comma-joined
    // 3) rewrite the template  %[name]->${N}, %%->%, %=->${:uid}, $->$$   (state machine below)
    const fn_ty = c.LLVMFunctionType(ret_ty, param_tys.ptr, n_params, 0);
    const asm_val = c.LLVMGetInlineAsm(
        fn_ty,
        rendered_template.ptr, rendered_template.len,
        constraint_str.ptr,    constraint_str.len,
        @intFromBool(a.has_side_effects),   // HasSideEffects (volatile)
        0,                                  // IsAlignStack
        c.LLVMInlineAsmDialectATT,          // AT&T (Deviation: none — matches Zig default)
        0,                                  // CanThrow
    );
    const result = c.LLVMBuildCall2(self.builder, fn_ty, asm_val, args.ptr, n_params, "");
    self.mapRef(inst, result);              // 1 output: the value; N: extractvalue i per out_value → tuple
}

(Optionally cache the asm value keyed by (template, constraints, fn_ty) the way emit_llvm.zig:167 caches objc_msg_send_value — but per-site construction is fine; LLVM uniques inline-asm constants internally.)

Constraint-string assembler (port of FuncGen.airAssembly):

parts = []
for op in operands where role == out_value or out_place:    # outputs first
    parts.append( op.constraint with ',' replaced by '|' )   # "={rax}", "=r", "+r" …
for op in operands where role == input:
    parts.append( op.constraint with ',' replaced by '|' )   # "{rdi}", "r" …
for name in clobbers:                                        # from clobbers(.name,…)
    parts.append( "~{" + name + "}" )                        # "~{rcx}", "~{cc}", "~{memory}"
constraint_str = ",".join(parts)

LLVM return type follows the out_value count: 0 → void; 1 → that type; N → an anonymous struct {T1,…,Tn}. After the call, extractvalue i per out_value builds the sx tuple (the multi-return path, §II.2 Dev 5). out_place outputs are stored through their Ref afterward instead.

For sys_write (one output): constraint ={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}, fn_ty = i64 (i64,i64,ptr,i64) (one parameter per input, including the literal 1 pinned to rax), args = [1, fd, buf, len], sideeffect = true. For divmod (two outputs): ={rax},={rdx},{rax},{rdx},r,~{cc}, fn_ty = {i64,i64} (i64,i64,i64), and the two extractvalues become the (quot, rem) tuple.
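
As a cross-check on the two constraint strings quoted above, here is a minimal runnable Python version of the assembler pseudocode. It is illustrative only, not the sx emitter; the Operand dataclass and the assemble_constraints name are hypothetical, and only the fields needed to build the string are modelled.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    role: str          # "out_value", "out_place", or "input"
    constraint: str    # verbatim, e.g. "={rax}", "r"


def assemble_constraints(operands, clobbers):
    """Outputs first, then inputs, then ~{clobber} entries, comma-joined."""
    outputs = [op for op in operands if op.role in ("out_value", "out_place")]
    inputs = [op for op in operands if op.role == "input"]
    # Commas inside one constraint mean alternatives; LLVM spells those with '|'.
    parts = [op.constraint.replace(",", "|") for op in outputs + inputs]
    parts += ["~{" + name + "}" for name in clobbers]
    return ",".join(parts)


sys_write = [
    Operand("out_value", "={rax}"),
    Operand("input", "{rax}"),
    Operand("input", "{rdi}"),
    Operand("input", "{rsi}"),
    Operand("input", "{rdx}"),
]
divmod_ops = [
    Operand("out_value", "={rax}"),
    Operand("out_value", "={rdx}"),
    Operand("input", "{rax}"),
    Operand("input", "{rdx}"),
    Operand("input", "r"),
]

print(assemble_constraints(sys_write, ["rcx", "r11", "memory"]))
print(assemble_constraints(divmod_ops, ["cc"]))
```

Because outputs are collected before inputs regardless of where they were written, an operand list that interleaves the two still produces the ordering LLVM expects.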

Template rewriter (port verbatim from FuncGen.zig:2735-2802): state machine over the template bytes with a name_map: [name] -> positional index built from outputs ++ inputs:

state start:   '%' -> percent ;  '$' -> emit "$$" ;  else emit byte
state percent: '%' -> emit '%', start
               '[' -> emit "${", state input
               '=' -> emit "${:uid}", start
               else -> emit '%', emit byte, start
state input:   ']' -> emit name_map[name], emit '}', start
               ':' -> emit name_map[name], emit ':', state modifier
               else accumulate name
state modifier:']' -> emit accumulated modifier, emit '}', start
               else accumulate

An unknown %[name] is a hard error (mirror Zig's todo/diagnostic — not a silent pass-through; CLAUDE.md no-silent-arms rule).
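
The state machine above translates almost line for line into Python. The sketch below is illustrative, not the sx port: rewrite_template is a hypothetical name, the operand names arrive as a plain list in outputs ++ inputs order (None for unnamed operands), and treating a template that ends mid-escape as an error is an assumption of the sketch.

```python
def rewrite_template(template, names):
    """Rewrite %[name] references to LLVM's positional ${N} form."""
    name_map = {n: i for i, n in enumerate(names) if n is not None}
    out, state, acc = [], "start", ""
    for ch in template:
        if state == "start":
            if ch == "%":
                state = "percent"
            elif ch == "$":
                out.append("$$")                  # escape LLVM's own '$'
            else:
                out.append(ch)
        elif state == "percent":
            if ch == "%":
                out.append("%")                   # %% is a literal percent
                state = "start"
            elif ch == "[":
                out.append("${")
                acc, state = "", "input"
            elif ch == "=":
                out.append("${:uid}")
                state = "start"
            else:
                out.append("%" + ch)
                state = "start"
        elif state == "input":
            if ch in "]:":
                if acc not in name_map:           # hard error, never a pass-through
                    raise ValueError(f"unknown asm operand %[{acc}]")
                out.append(str(name_map[acc]))
                if ch == "]":
                    out.append("}")
                    state = "start"
                else:
                    out.append(":")
                    acc, state = "", "modifier"
            else:
                acc += ch
        else:                                     # state == "modifier"
            if ch == "]":
                out.append(acc + "}")
                state = "start"
            else:
                acc += ch
    if state != "start":
        raise ValueError("unterminated escape in asm template")
    return "".join(out)
```

For the sp example, rewrite_template("mov %%rsp, %[out]", ["out"]) gives "mov %rsp, ${0}"; for divmod, where d is the fifth operand, "divq %[d]" becomes "divq ${4}".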

Interpreter — src/ir/interp.zig

Inline asm cannot be comptime-evaluated. In the interpreter's op switch:

.inline_asm => return bailDetail("inline asm requires native execution; not available at comptime"),

(Same bailDetail pattern as the Obj-C/JNI ops — surfaces op=inline_asm: ... rather than a silent default.)

Global asm (Deviation 6)

Lower the top-level asm_global decl to a one-shot emit: c.LLVMAppendModuleInlineAsm(module, src.ptr, src.len) (present in the linked LLVM — @19/include/llvm-c/Core.h:971). No operands, no rewrite, no volatile; multiple blocks concatenate in source order (as Zig does).

Calling into an asm-defined symbol needs no new machinery — declare it with a lib-less extern (Deviation 6, §II.2): my_func :: (sig) -> R extern; emits an external-linkage, raw-named, C-ABI extern that the linker resolves against the .global the asm block defines.

Guard (CLAUDE.md no-silent-arms): a global-asm symbol exists only in the final linked binary, not in the #run/JIT host process. The interpreter resolves externs via dlsym(RTLD_DEFAULT) (host_ffi.zig), which won't find it — calling such a symbol at comptime must fail loudly (it should already, via the dlsym-miss diagnostic; pin it with a test). Edge case: a symbol referenced only by other asm/external code may need llvm.used / .no_dead_strip to survive dead-stripping; the common "sx references it" case is safe.

II.7 Stage-to-file map (implementation checklist)

| Stage | Zig reference | sx file + insertion point | New code |
|---|---|---|---|
| Keyword | tokenizer.zig keywords | src/token.zig: Token.Tag + keyword StaticStringMap | kw_asm (+ contextual volatile) |
| AST node | Ast.zig:2797,3789 | src/ast.zig:13,85,721: Node.Data + new AsmExpr/AsmOperand | ~25 lines |
| Parser | Parse.zig:2771-2883 | src/parser.zig: parsePrimary .kw_asm case + global-asm at decl scope | ~120 lines |
| Sema/typing | Sema.zig:15044 | src/ir/expr_typer.zig (inferExprType) + validation | ~80 lines |
| IR op | Air.zig:1485, Zir.zig:2531 | src/ir/inst.zig:80: inline_asm: InlineAsm | ~25 lines |
| Lowering | AstGen.zig:8553 | src/ir/lower/expr.zig: lowerExpr .asm_expr case | ~60 lines |
| LLVM emit | FuncGen.zig:2473-2852 | src/ir/emit_llvm.zig: emitInst .inline_asm case | ~120 lines (constraint asm + template rewrite + LLVMGetInlineAsm/BuildCall2) |
| Global asm | Sema.addGlobalAssembly + module asm | decl lowering → c.LLVMAppendModuleInlineAsm | ~15 lines |
| Interp bail | n/a | src/ir/interp.zig op switch | 1 line |

No change to src/codegen.zig is needed (the IR/LLVM path owns this).

II.8 Phasing

  • Phase 1 (MVP). asm { … } block; asm volatile; string-literal/#string template; = expr inputs; -> Type outputs including N→tuple multi-return; clobbers(.…) dot-name list; %[name]/%% substitution; "no-output ⇒ volatile" check; AT&T. Target: Linux/macOS x86_64 + aarch64 syscalls, intrinsics, and multi-value ops (divmod, cpuid, add_carry).
  • Phase 2. -> @place write-through outputs, read-write ("+r" -> @place) and indirect-memory ("=*m") constraints, %= unique-id, output-to-const rejection.
  • Phase 3. Global/module asm decl (LLVMAppendModuleInlineAsm) + the comptime-call guard, plus Intel-dialect opt-in. Small: the extern-call path already exists (lib-less extern).
  • Phase 4 (optional). Upgrade clobbers(.name) from dot-name sugar to a compile-time-checked per-architecture Clobber enum (typo-checking; same syntax).
  • Phase 5 (optional). Naked functions (callconv-equivalent) for full freestanding entry points.
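The Phase 1 and Phase 2 output rules combine into one small typing decision. The Python sketch below is illustrative only: `asm_result_type` and `AsmSemaError` are hypothetical names, types are plain strings, and it assumes that a write-through output (`-> @place`) counts as an output for the "no-output ⇒ volatile" check but contributes nothing to the expression's result.

```python
class AsmSemaError(Exception):
    """Validation failure for an asm expression (hypothetical error type)."""


def asm_result_type(outputs, is_volatile):
    """Return the result type name of an asm expression.

    outputs: list of (kind, type_name) pairs, where kind is
             "value" for `-> Type` or "place" for `-> @place`.
    """
    for kind, _ in outputs:
        if kind not in ("value", "place"):
            raise AsmSemaError("unknown output kind: " + kind)
    # An asm with no outputs has no observable result, so it must be volatile.
    if not outputs and not is_volatile:
        raise AsmSemaError("asm with no outputs must be marked volatile")
    value_types = [ty for kind, ty in outputs if kind == "value"]
    if not value_types:
        return "void"               # only write-through outputs, or none at all
    if len(value_types) == 1:
        return value_types[0]       # single value-output: the type itself
    return "(" + ", ".join(value_types) + ")"  # N value-outputs: a tuple
```

Under these assumptions the `divmod` shape (two `u64` value-outputs) types as `(u64, u64)`, while a bare `asm { "nop" }` without `volatile` is rejected.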

II.9 Testing

Asm output is target-specific, so each test must pin a target and assert on the emitted IR or the exit status; run a test natively only when the host matches the pinned target. Use the existing corpus harness and the 16xx platform block (the closest fit in the XXXX-category scheme; specs.md/CLAUDE.md test-layout). Mirror Zig's own matrix:

  • examples/16xx-platform-asm-syscall-write.sx — x86_64-linux write(2), assert exit/stdout.
  • examples/16xx-platform-asm-register-read.sx — mov %%rsp,%[out], no-input output.
  • examples/16xx-platform-asm-no-output-volatile.sx — bare asm volatile { "nop" }.
  • examples/16xx-platform-asm-missing-volatile.sx — expected compile error (no output, no volatile) — pins the diagnostic.
  • examples/16xx-platform-asm-template-subst.sx — %[a]/%% rewriting, assert on the sx ir/.s snapshot.
  • examples/16xx-platform-asm-multi-return.sx — divmod (quot, rem) tuple, destructured.
  • examples/16xx-platform-asm-global.sx (Phase 3) — global asm + extern call.

Add an IR/.s snapshot (expected/*.ir) for the substitution test so the constraint-string + template-rewrite output is locked. Seed markers and regenerate with zig build test -Dupdate-goldens, then review the diff (CLAUDE.md snapshot-integrity rule).

II.10 Open decisions for the user

Largely settled through design review; what remains:

  1. Dialect: AT&T only (Zig's default) for v1, or expose an Intel opt-in (LLVMInlineAsmDialectIntel) from the start? Recommend AT&T-only v1.
  2. volatile keyword (Deviation 4): contextual (recommended, no source-compat risk) vs globally reserved (simpler lexer).
  3. Brace separator: comma (recommended — trailing-comma-friendly, literal-style) vs ; (matches sx statement blocks).
  4. Asm-symbol extern spelling (Deviation 6): RESOLVED — use the lib-less extern keyword to call into an asm symbol (import), and export for the reverse direction (an sx function asm can call back into). The dedicated linkage keywords landed (FFI-linkage stream), so no new surface is needed and both directions are covered.

Decided: brace block { … } (Dev 1) · ->/= markers, : sections dropped, <- rejected (Dev 2) · clobbers(.…) enum-literal list, dot-name sugar now → checked enum later (Dev 3) · multiple value-outputs return a tuple (Dev 5). For global asm (Dev 6) the call-into-asm direction reuses lib-less extern (Decision 4, resolved).

II.11 Risks

  • Constraint and template mistakes are silent: a bad constraint string or a wrong template rewrite miscompiles with no diagnostic. Mitigation: port Zig's constraint-string assembly and template rewrite verbatim (don't paraphrase) and lock IR snapshots in tests.
  • Register-name validity is unchecked in v1's clobbers(.name) dot-name form — a typo'd register (.raxx) surfaces only as an LLVM error. This is exactly the gap the Phase-4 checked Clobber enum closes; acceptable for v1 (LLVM validates the emitted ~{…}).
  • #string heredoc + AT&T %/$ interplay: ensure the heredoc delivers the template bytes literally (no sx-level escape processing of %/$) before the rewrite stage.
  • Target gating: asm examples must declare their target or they break the corpus on other hosts; the test plan pins targets.
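To make the register-name risk concrete, here is a rough Python sketch of what a checked clobber list could look like once Phase 4 lands. It is not the planned implementation: `clobber_constraints` and `X86_64_CLOBBERS` are hypothetical, and the register set is a deliberately tiny illustrative subset. The only part taken from the text above is that each clobber is emitted as an LLVM `~{name}` constraint fragment.

```python
# Hypothetical, incomplete register table used only for this illustration.
X86_64_CLOBBERS = frozenset({
    "rax", "rbx", "rcx", "rdx", "rsi", "rdi",
    "r8", "r9", "r10", "r11",
    "xmm0", "xmm1",
    "cc", "memory",
})


def clobber_constraints(names, known=X86_64_CLOBBERS):
    """Turn clobbers(.a, .b, ...) names into the LLVM clobber constraint list."""
    parts = []
    for name in names:
        if name not in known:
            # The Phase-4 idea: reject a typo here instead of waiting for LLVM.
            raise ValueError("unknown clobber register: ." + name)
        parts.append("~{" + name + "}")
    return ",".join(parts)
```

For `clobbers(.rcx, .r11, .memory)` this yields `~{rcx},~{r11},~{memory}`, and a typo such as `.raxx` fails before any LLVM call is made. In v1 the same string would be produced without the membership check.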

Appendix A — exact LLVM-C calls (already reachable via llvm_api.c)

// src/llvm_api.zig @cInclude("llvm-c/Core.h") exposes all of these:
LLVMValueRef LLVMGetInlineAsm(LLVMTypeRef Ty,
    const char *AsmString,   size_t AsmStringSize,
    const char *Constraints, size_t ConstraintsSize,
    LLVMBool HasSideEffects, LLVMBool IsAlignStack,
    LLVMInlineAsmDialect Dialect, LLVMBool CanThrow);   // LLVM 19 & 21: identical
LLVMValueRef LLVMBuildCall2(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef Fn,
    LLVMValueRef *Args, unsigned NumArgs, const char *Name);
void LLVMAppendModuleInlineAsm(LLVMModuleRef M, const char *Asm, size_t Len);  // global asm
// enum: LLVMInlineAsmDialectATT, LLVMInlineAsmDialectIntel

Appendix B — file index

Zig (reference, ~/projects/zig): lib/std/zig/tokenizer.zig (keywords) · lib/std/zig/Ast.zig:2797,3789,3969 (nodes) · lib/std/zig/Parse.zig:2771-2883 (grammar) · lib/std/zig/AstGen.zig:8553-8669,12257 + lib/std/zig/Zir.zig:2531 (ZIR) · src/Sema.zig:15044-15231 (validation) · src/Air.zig:1485 (AIR) · src/codegen/llvm/FuncGen.zig:2473-2852 + lib/std/zig/llvm/Builder.zig:6131 (LLVM) · doc/langref/inline_assembly.zig, doc/langref/test_global_assembly.zig (syntax) · doc/langref.html.in:4217-4300 (spec).

sx (target, ~/projects/sx): src/token.zig · src/lexer.zig:402 (#string) · src/ast.zig:13 · src/parser.zig (parsePrimary), the optional extern library tail · src/ir/expr_typer.zig · src/ir/inst.zig:80,219,260 · src/ir/lower/expr.zig · src/ir/module.zig:300 (declareExtern) · src/ir/emit_llvm.zig:167 (msgSend cache), :1244 (extern⇒C-ABI), :1279 (raw symbol name) · src/ir/interp.zig (bailDetail) · src/llvm_api.zig:1-17 · build.zig:10 (LLVM@19).

Appendix C — Cookbook (final form: asm { … }, ->/=, clobbers(.…), pure AT&T)

// ── v1 ────────────────────────────────────────────────────────────────────

asm volatile { "nop" };                          // bare side-effecting

// write(2) syscall — register-pinned inputs, one value-output
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
    return asm volatile {
        "syscall",
        "={rax}" -> i64,
        "{rax}" = 1,  "{rdi}" = fd,  "{rsi}" = buf,  "{rdx}" = len,
        clobbers(.rcx, .r11, .memory),
    };
}

// mmap — full 6-arg syscall ABI (arg4 in r10, not rcx)
mmap :: (addr: *void, len: u64, prot: i32, flags: i32, fd: i32, off: i64) -> *void {
    return asm volatile {
        "syscall",
        "={rax}" -> *void,
        "{rax}" = 9, "{rdi}" = addr, "{rsi}" = len, "{rdx}" = prot,
        "{r10}" = flags, "{r8}" = fd, "{r9}" = off,
        clobbers(.rcx, .r11, .memory),
    };
}

// AT&T scaled-index addressing — arr[i]
load_idx :: (arr: *i64, i: u64) -> i64 {
    return asm {
        "movq (%[arr],%[i],8), %[out]",
        [out] "=r" -> i64,  [arr] "r" = arr,  [i] "r" = i,
    };
}

// CPUID AVX probe — immediates, heavy clobber set, single value-result
has_avx :: () -> bool {
    return asm volatile {
        #string ATT
        movl    $1, %%eax
        cpuid
        andl    $0x10000000, %%ecx
        setne   %[ok]
ATT,
        [ok] "=r" -> bool,
        clobbers(.rax, .rbx, .rcx, .rdx, .cc),
    };
}

// SSE packed add — xmm regs, no outputs ⇒ volatile
vadd4 :: (a: *f32, b: *f32, out: *f32) {
    asm volatile {
        #string ATT
        movups  (%[a]), %%xmm0
        movups  (%[b]), %%xmm1
        addps   %%xmm1, %%xmm0
        movups  %%xmm0, (%[out])
ATT,
        [a] "r" = a,  [b] "r" = b,  [out] "r" = out,
        clobbers(.xmm0, .xmm1, .memory),
    };
}

// ── multi-return (v1; sx has tuples, Zig caps at one output) ────────────────

// 64-bit divide → (quotient, remainder)
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
    return asm {
        "divq %[d]",
        [quot] "={rax}" -> u64,
        [rem]  "={rdx}" -> u64,
        "{rax}" = n,  "{rdx}" = 0,  [d] "r" = d,
        clobbers(.cc),
    };
}

// rdtsc → two 32-bit halves, destructured straight out of the asm
rdtsc :: () -> u64 {
    lo, hi := asm volatile {
        "rdtsc",
        [lo] "={eax}" -> u32,
        [hi] "={edx}" -> u32,
    };
    return (xx hi << 32) | xx lo;
}

// cpuid → a clean 4-tuple
cpuid :: (leaf: u32, subleaf: u32) -> (eax: u32, ebx: u32, ecx: u32, edx: u32) {
    return asm volatile {
        "cpuid",
        [eax] "={eax}" -> u32,  [ebx] "={ebx}" -> u32,
        [ecx] "={ecx}" -> u32,  [edx] "={edx}" -> u32,
        "{eax}" = leaf,  "{ecx}" = subleaf,
    };
}

// add-with-carry → (sum, carry): value-output + tied input + flag capture
add_carry :: (a: u64, b: u64) -> (sum: u64, carry: u8) {
    return asm {
        #string ATT
        addq    %[b], %[sum]
        setc    %[carry]
ATT,
        [sum]   "=r" -> u64,
        [carry] "=r" -> u8,
        [a] "0" = a,  [b] "r" = b,
        clobbers(.cc),
    };
}

// ── Phase 2 (write-through / read-write / indirect) ─────────────────────────

// byte memcpy — labels, loop, read-write operands
memcpy_bytes :: (dst: [*]u8, src: [*]u8, n: u64) {
    d := dst;  s := src;  c := n;
    asm volatile {
        #string ATT
        testq   %[c], %[c]
        jz      2f
    1:  movb    (%[s]), %%al
        movb    %%al, (%[d])
        incq    %[s]
        incq    %[d]
        decq    %[c]
        jnz     1b
    2:
ATT,
        [d] "+r" -> @d,  [s] "+r" -> @s,  [c] "+r" -> @c,
        clobbers(.rax, .cc, .memory),
    };
}

// lock cmpxchg CAS — lock prefix, pinned read-write rax, two outputs
cas :: (ptr: *i64, expected: i64, desired: i64) -> bool {
    old := expected;  ok: bool = ---;
    asm volatile {
        #string ATT
        lock cmpxchgq %[desired], (%[ptr])
        sete    %[ok]
ATT,
        [ok]      "=r"     -> @ok,
        [old]     "+{rax}" -> @old,
        [ptr]     "r"      = ptr,
        [desired] "r"      = desired,
        clobbers(.cc, .memory),
    };
    return ok;
}

// fill an existing struct (write-through, no tuple)
cpuid_into :: (out: *CpuId, leaf: u32) {
    asm volatile {
        "cpuid",
        "={eax}" -> @out.eax,  "={ebx}" -> @out.ebx,
        "={ecx}" -> @out.ecx,  "={edx}" -> @out.edx,
        "{eax}" = leaf,
    };
}

Global asm + extern (Phase 3):

asm {
    #string ATT
    .global my_add
    my_add:
      lea (%rdi,%rsi,1), %eax
      retq
ATT,
};
my_add :: (a: i32, b: i32) -> i32 extern;       // lib-less extern = Zig's `extern fn`