Two new workstreams:
- ASM: inline assembly — asm { "tmpl", "=r" -> T, "r" = expr, clobbers(.…) },
multi-return tuples; lowers via the existing llvm_api.c (no shim).
- FFI-linkage: add extern/export postfix keywords, migrate every #foreign onto
them, then purge 'foreign' from the tree (end-state invariant).
Drop current/ from .gitignore so plans + checkpoints are tracked normally
(the dir was ignored; only checkpoints had been force-added). Includes
docs/inline-asm-design.md. specs.md change left uncommitted.
44 KiB
Inline Assembly for sx — Design Doc & Proposal
Status: proposal / not yet scheduled into a workstream
Author: research pass over the Zig compiler (~/projects/zig, 0.16-dev) + the sx compiler
Scope: how Zig implements inline assembly end-to-end, and a minimal-deviation proposal to bring the same model to sx.
Guiding constraint for this doc: mirror Zig's design; deviate only where sx's grammar or stdlib makes a 1:1 copy impossible, and call every deviation out explicitly with its justification. Every deviation below is tagged [DEVIATION] with a reason.
0. TL;DR + feasibility
- Feasible today, no new infrastructure. sx already links LLVM (
build.zig:10→/opt/homebrew/opt/llvm@19) and@cImportsllvm-c/Core.h(src/llvm_api.zig:1-17). That header exposes everything inline asm needs, reachable right now throughllvm_api.c.*:LLVMGetInlineAsm(Ty, AsmString, AsmStringSize, Constraints, ConstraintsSize, HasSideEffects, IsAlignStack, Dialect, CanThrow)— builds the asm callee (LLVM 19/21 share this 9-arg signature).LLVMInlineAsmDialectATT/LLVMInlineAsmDialectIntel.LLVMBuildCall2(...)— already used pervasively insrc/ir/emit_llvm.zig(e.g. the Obj-C msgSend path) — calls the asm value like a function.LLVMAppendModuleInlineAsm(M, Asm, Len)— module-level (global) asm.
- The hard part is not codegen. Codegen is ~80 lines of well-trodden LLVM-C.
The real work is (a) the parser grammar, (b) a faithful port of Zig's
LLVM constraint-string assembly and
%[name]→$Ntemplate rewrite, and (c) Sema validation rules. All three are fully specified below. - Surface form (decided, §II.2):
asm volatile { "tmpl", "=r" -> T, "r" = x, clobbers(.cc, .memory) }— a brace block;->marks outputs /=marks inputs (no positional:sections); enum-literalclobbers(.…); and N-> Typeoutputs return a tuple (sx has tuples — Zig caps at one output). - Inline asm is never comptime-evaluable. The interpreter must bail loudly
(
bailDetail), per CLAUDE.md's "no silent unimplemented arms" rule. - One naming note: sx already has a
sx asm <file>CLI subcommand (src/main.zig:203,386) that emits a.sfile. That is a compiler output mode, a different namespace from a language token. No conflict, but worth knowing so nobody confuses the two.
PART I — How Zig implements inline assembly
All file references in Part I are under ~/projects/zig (0.16-dev,
commit 3deb86bafd). Parser/AST/AstGen live in lib/std/zig/; Sema/AIR/codegen
in src/.
I.1 Surface syntax
The canonical example (doc/langref/inline_assembly.zig), a Linux x86_64 syscall:
pub fn syscall3(number: usize, arg1: usize, arg2: usize, arg3: usize) usize {
return asm volatile ("syscall"
: [ret] "={rax}" (-> usize),
: [number] "{rax}" (number),
[arg1] "{rdi}" (arg1),
[arg2] "{rsi}" (arg2),
[arg3] "{rdx}" (arg3),
: .{ .rcx = true, .r11 = true });
}
Grammar shape:
asm volatile? ( <template-string>
: <output-item> , <output-item> , ... # outputs (optional section)
: <input-item> , <input-item> , ... # inputs (optional section)
: <clobbers> ) # clobbers (optional section)
output-item : [name] "constraint" (-> Type) # asm result becomes the value
| [name] "constraint" (lvalue) # asm writes through the pointer
input-item : [name] "constraint" (expr)
clobbers : .{ .reg0 = true, .reg1 = true } # struct literal (0.16-dev)
Key semantics (from doc/langref.html.in:4217-4300):
-
volatilemarks side effects. Without it, an asm expression whose result is unused may be deleted. An asm expression with no outputs must bevolatile(else compile error). -
x86/x86_64 use AT&T syntax (LLVM provides the parser; Intel support is "buggy and not well tested").
-
%[name]in the template refers to a named operand's register;%%is a literal%. -
Clobbers are registers the asm trashes that are not inputs/outputs.
"memory"(the.memory = truefield) means "writes to arbitrary memory." Failing to declare a clobber is unchecked illegal behavior. -
Global assembly = an
asm(...)in a namespace-levelcomptimeblock. It has different rules:volatileis forbidden, there are no inputs/outputs/ clobbers, no%substitution, and all global asm is concatenated verbatim:// doc/langref/test_global_assembly.zig comptime { asm ( \\.global my_func; \\.type my_func, @function; \\my_func: \\ lea (%rdi,%rsi,1),%eax \\ retq ); } extern fn my_func(a: i32, b: i32) i32; // call into the global-asm symbol
I.2 Pipeline, stage by stage
Tokenizer — lib/std/zig/tokenizer.zig
Two keywords in the StaticStringMap: .{ "asm", .keyword_asm } and
.{ "volatile", .keyword_volatile }.
AST — lib/std/zig/Ast.zig
Four node tags (Ast.zig:3789-3817):
asm_simple—asm(template)only, no operands.@"asm"— full form;dataisnode_and_extra→ (template node,ExtraIndexto anAsm).asm_output—[a] "b" (-> Type)or[a] "b" (ident).asm_input—[a] "b" (expr).
The "full" view the rest of the compiler consumes (Ast.zig:2797-2809):
pub const Asm = struct {
ast: Components,
volatile_token: ?TokenIndex,
outputs: []const Node.Index,
inputs: []const Node.Index,
pub const Components = struct {
asm_token: TokenIndex,
template: Node.Index,
items: []const Node.Index, // outputs ++ inputs, interleaved order preserved
clobbers: Node.OptionalIndex, // a comptime expression (the struct literal)
rparen: TokenIndex,
};
};
The on-disk extra record (Ast.zig:3969-3975) stores items_start/items_end
(a span into the node list), clobbers (optional node), and rparen.
Parser — lib/std/zig/Parse.zig
expectAsmExpr (Parse.zig:2771-2838) implements the grammar:
fn expectAsmExpr(p: *Parse) !Node.Index {
const asm_token = p.assertToken(.keyword_asm);
_ = p.eatToken(.keyword_volatile);
_ = try p.expectToken(.l_paren);
const template = try p.expectExpr();
if (p.eatToken(.r_paren)) |rparen| { /* asm_simple */ }
_ = try p.expectToken(.colon);
// ... parse output items until a `:`/`)` ...
const clobbers: Node.OptionalIndex = if (p.eatToken(.colon)) |_| clobbers: {
// ... parse input items until a `:`/`)` ...
_ = p.eatToken(.colon) orelse break :clobbers .none;
break :clobbers (try p.expectExpr()).toOptional(); // clobbers = an expression
} else .none;
// ...
}
parseAsmOutputItem(Parse.zig:2840-2864):LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN (MINUSRARROW TypeExpr | IDENT) RPAREN.parseAsmInputItem(Parse.zig:2866-2883):LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN Expr RPAREN.- Clobbers parse as a generic expression (
(try p.expectExpr())), not a string list — this is the 0.16-dev change. It is later coerced to astd.lang.assembly.Clobbersstruct at Sema time.
AST → ZIR — lib/std/zig/AstGen.zig
asmExpr (AstGen.zig:8553-8669) + addAsm (12257-12310). The ZIR payload
(lib/std/zig/Zir.zig:2531-2564):
pub const Asm = struct {
src_node: Ast.Node.Offset,
asm_source: NullTerminatedString, // template (string-literal case)
output_type_bits: u32, // bit i = output i uses `-> T` (vs ptr)
clobbers: Ref, // comptime ref → assembly.Clobbers value
pub const Small = packed struct(u16) { is_volatile: bool, outputs_len: u7, inputs_len: u8 };
pub const Output = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
pub const Input = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
};
AstGen already enforces the structural rules:
- Global (container-level) asm: rejects
volatile, rejects any outputs/inputs/clobbers (AstGen.zig:8583-8587). - Local asm: "assembly expression with no output must be marked volatile."
outputs.len < 16,inputs.len < 32(fitSmall.outputs_len/inputs_len).- At most one output may use the
-> Tform ("inline assembly allows up to one output value");output_type_bitsrecords which. - Two ZIR tags:
.@"asm"(string-literal template) vs.asm_expr(comptime expression template).
ZIR → AIR (Sema) — src/Sema.zig
zirAsm (Sema.zig:15044-15231, dispatched at 1396-1397). This is where all
semantic validation happens. It:
- Resolves the template to a comptime string (
resolveConstString). - Global asm (
func_index == .none): asserts no operands, thenzcu.addGlobalAssembly(owner, asm_source)and returns.void_value. requireRuntimeBlock— local asm can't run at comptime.- Per output: if
-> T, resolve the type,ensureLayoutResolved, set the expression's result type; else resolve the operand pointer. Validates:- output type has a well-defined in-memory layout (else error);
- cannot output to a
constpointer ("asm cannot output to const '{s}'"); - output must be a runtime value (no reference to a comptime var).
- Per input: resolve operand, reject comptime-only refs, coerce
comptime_int→usize,comptime_float→f64. - Clobbers: coerce the expression to
std.lang.assembly.Clobbers, resolve to a comptime value.
The AIR payload (src/Air.zig:1485-1497):
pub const Asm = struct {
source_len: u32,
inputs_len: u32,
clobbers: InternPool.Index, // comptime assembly.Clobbers value
flags: packed struct(u32) { outputs_len: u31, is_volatile: bool },
};
// trailing: out operand refs, in operand refs, then the template bytes and
// (constraint\0 name\0) pairs packed into air_extra.
AIR → LLVM — src/codegen/llvm/FuncGen.zig
airAssembly (FuncGen.zig:2473-2852) is the crux. This is the algorithm sx
must port. Three sub-tasks:
(a) Assemble the LLVM constraint string. Comma-separated. For each output:
emit = (write-only) or + (read-write, recorded in llvm_rw_vals); a *
prefix marks an indirect (memory) output passed as a pointer parameter; a
non-indirect output contributes to the return type. The user's leading =/+
in constraint[0] is consumed and re-emitted; the rest is copied with Zig
commas rewritten to LLVM | (alternative constraints). Inputs are copied
similarly (no =). Clobbers: iterate the Clobbers struct's bool fields as a
bigint; for each true field emit ~{fieldname} (via appendConstraints,
which also expands target-specific aliases).
(b) Rewrite the template %[name] → LLVM positional ${N} (state machine,
FuncGen.zig:2735-2802):
| input | output | note |
|---|---|---|
$ |
$$ |
escape LLVM's $ |
%% |
% |
literal percent |
%= |
${:uid} |
unique id |
%[name] |
${N} |
N = position in name_map |
%[name:mod] |
${N:mod} |
with modifier |
name_map maps each operand's [name] to its positional index across all
outputs+inputs.
(c) Build & call. Pick the LLVM function type:
return_count == 0 → void; == 1 → the single return type; > 1 → an
anonymous struct of the return types. Then:
const call = try self.wip.callAsm(
attributes, llvm_fn_ty,
.{ .sideeffect = is_volatile }, // Assembly.Info: sideeffect/alignstack/inteldialect/unwind
rendered_template, llvm_constraints, llvm_param_values, "");
callAsm (lib/std/zig/llvm/Builder.zig:6131-6143) is a thin wrapper that
builds the asm constant (asmValue) and emits a normal call. In LLVM-C terms
this is exactly LLVMGetInlineAsm(...) + LLVMBuildCall2(...). Finally,
non-indirect outputs are read back: with one return it's the call result; with
several it's extractvalue i per output; indirect outputs were already written
by the asm via their pointer parameter.
C backend — src/codegen/c.zig
No airAssembly for inline asm in the C backend in this tree; only global asm
flows out (as module asm). For sx this is irrelevant — sx only has an LLVM
backend.
Global asm & naked functions
- Global asm bypasses everything above:
Sema.addGlobalAssemblyaccumulates the verbatim source; the LLVM object emits it via the module-level asm string (LLVM-C:LLVMAppendModuleInlineAsm). Symbols it defines are reached withextern fn. - Naked functions (
callconv(.naked)) drop the prologue/epilogue; the body is entirely inline asm. This is an orthogonal calling-convention feature, not part of the asm expression itself.
PART II — Proposal for sx
II.1 Design principles
- Copy Zig's semantic model exactly: a template + register/memory operands
- clobbers + a
volatileflag; AT&T syntax via LLVM; "no-output asm must be volatile";%[name]substitution; AT&T-by-default.
- clobbers + a
- Copy the LLVM lowering exactly (the constraint-string assembler + template
rewriter from
FuncGen.zigare reproduced verbatim in §II.6 — these are the parts where "inventing our own" would silently miscompile). - Diverge from Zig's surface syntax where sx has a better-fitting idiom, and
only there. The deviations (§II.2) are deliberate: a brace block instead of
( … );->/=operand markers instead of positional:sections; an enum-literalclobbers(.…)list; and — because sx has tuples and Zig does not — true multiple return values instead of Zig's one-output cap.
II.2 sx surface syntax
asm is an expression (it yields the output value/tuple), introduced by a new
asm keyword. The body is a brace block of comma-separated parts: a template
string first, then operands, then an optional clobbers(.…) clause. Each operand
is [name]? "constraint" <role>, where the role marker is:
-> Type— an output that produces a value (joins the result).-> @place— an output that writes through to existing storage (Phase 2).= expr— an input (the value fed in).
-> reuses sx's "produces" arrow (as in (a: i32) -> i32); = reuses sx's
"is set to" binding. There are no positional : sections.
// x86_64-linux — write(2) via syscall
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
return asm volatile {
"syscall",
"={rax}" -> i64, // output → the expression's value
"{rax}" = 1, // SYS_write
"{rdi}" = fd,
"{rsi}" = buf,
"{rdx}" = len,
clobbers(.rcx, .r11, .memory),
};
}
// read a register, no inputs, named operand for %[out]
sp :: () -> u64 {
return asm { "mov %%rsp, %[out]", [out] "=r" -> u64 };
}
Multi-instruction templates use sx's existing #string heredoc
(src/lexer.zig:402) or a multi-line "..." literal — no new lexer feature:
serialize :: () {
asm volatile {
#string ATT
mfence
lfence
ATT,
};
}
Outputs and the result type. A -> Type output contributes one value to the
asm expression's result; the count decides the shape:
-> Type outputs |
result | spelling |
|---|---|---|
| 0 | void (must be volatile) |
asm volatile { … } |
| 1 | that type T |
x := asm { …, "=r" -> T }; |
| N | a tuple (T1,…,Tn) (declaration order) |
a, b := asm { … }; |
A [name] on an output becomes a named tuple field — the same name you'd use
for %[name] does double duty:
// sx has tuples, so asm gets real multiple return values (Zig caps you at one).
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64, // → .quot (operand 0)
[rem] "={rdx}" -> u64, // → .rem (operand 1)
"{rax}" = n,
"{rdx}" = 0,
[d] "r" = d,
clobbers(.cc),
};
}
q, r := divmod(17, 5); // q = 3, r = 2
Deviations from Zig (each deliberate; semantics unchanged)
-
[DEVIATION 1 — brace block, not
( … ).] The asm body isasm { … }, a comma-separated brace block (trailing comma allowed, perspecs.md:226,501), not Zig's parenthesised form. Braces read as "a block of code," which is what an asm template is;#stringheredoc templates especially benefit.asmis a keyword, soasm {/asm volatile {is unambiguous. -
[DEVIATION 2 —
->/=operand markers, not:sections.] Zig groups operands into positional: outputs : inputs : clobberssections (count the colons;: :for an empty one). sx tags each operand by role instead —-> Type/-> @place(output) and= expr(input) — so the list is flat, order-independent, with no positional colons. (<-for inputs was considered and rejected: it can't be a global token without mis-lexinga < -b;=reuses an existing token and the existing "binding" meaning.) -
[DEVIATION 3 — clobbers are an enum-literal list
clobbers(.cc, .memory).] Zig 0.16 uses a struct literal: .{ .rcx = true }coerced to a per-archstd.lang.assembly.Clobbers; older Zig used a string list. sx uses a dot-literal list, cleaner than both. v1: each.nameis a dot-name lowered straight to~{name}(.memory/.ccare recognized specials; register names pass through verbatim; LLVM validates). Phase 4: upgrade.nameto members of a compile-time-checked per-archClobberenum — same syntax, gains typo-checking. Note the call-lookingclobbers(…)is a declarative clause, not a call — nothing executes; it only feeds the register allocator. -
[DEVIATION 4 —
volatileis a contextual keyword.] sx's keyword set (specs.md:168) has neitherasmnorvolatile.asmbecomes a real keyword;volatileappears only right afterasm, so it can be recognized contextually (a plain identifier everywhere else), avoiding reserving it globally. The surface is byte-identical to Zig. (Alternative: reserve globally — simpler lexer, small source-compat risk. Recommend contextual.) -
[DEVIATION 5 — multiple value-outputs return a tuple (sx ⊃ Zig).] Zig allows at most one
-> Toutput; the rest must be pointer/lvalue outputs. sx has tuples, so N-> Typeoutputs return(T1,…,Tn)(named when operands are named), destructured witha, b := …. A deliberate improvement over Zig, enabled by a feature Zig lacks, and maps onto LLVM's existing multi-output struct return (§II.6). The other output flavor —-> @placewrite-through, plus read-write ("+r" -> @place) and indirect-memory ("=*m") outputs — is Phase 2 (needs indirect-constraint handling); the value-tuple form does not. -
[DEVIATION 6 — global asm is a top-level
asm { … }declaration.] sx has no namespace-levelcomptime {}block (it has#run,specs.md:2598), so global asm is a top-level statement:asm { #string ATT .global my_func .type my_func, @function my_func: lea (%rdi,%rsi,1), %eax retq
ATT, };
my_func :: (a: i32, b: i32) -> i32 #foreign; // extern, no library — valid sx today
Only the `comptime {}` wrapper is dropped; lowers to `LLVMAppendModuleInlineAsm`.
**Calling the asm symbol reuses the C-FFI *import* path** (no new mechanism for
v1) — but note `#foreign` is **not** a general `extern`. A lib-less `#foreign`
(its library is optional: `src/parser.zig:319-325`; used in 50+ stdlib sites,
e.g. `chdir :: (path: [*]u8) -> i32 #foreign;`) emits exactly the artifact needed
to *call into* the asm symbol — an external-linkage, **C-calling-convention**,
raw-named (`emit_llvm.zig:1279`), link-time-resolved declaration — the same thing
Zig's `extern fn` produces (Zig's `extern fn` is also C-callconv). At the IR level
`is_extern` is set straight from `is_foreign` (`decl.zig:1123`) and `#foreign`
forces the C ABI (`decl.zig:2110`). The two real differences from a general
`extern`: (1) `#foreign` is **import-only** — sx has no `#export`/linkname, so the
reverse direction (asm calling *back into* an sx function) is unsupported; (2) it
carries C-ABI marshaling and reads as "a foreign C function," a category-borrow
for a symbol your own module defines. It is the right *mechanism* but an imperfect
*spelling*; a dedicated `#extern`/linkname is an open question (§II.10).
(`specs.md:1209` was corrected to drop the false "library mandatory" claim.)
Everything *semantic* — comptime-known template, register/memory constraints
verbatim to LLVM, clobber meaning, "no-output ⇒ must be volatile," AT&T default,
`%[name]`/`%%` substitution — is **identical to Zig**. Only the surface (block,
`->`/`=`, `clobbers(.…)`, tuple returns) differs.
## II.3 sx AST
sx's AST is a pointer-based tagged union (`Data = union(enum)` at
`src/ast.zig:13`, nodes built via `Parser.createNode`), much simpler than Zig's
SoA `extra_data` scheme — so we can store slices directly. Add one arm next to
`foreign_expr` (`src/ast.zig:85`):
```zig
// in Node.Data union(enum):
asm_expr: AsmExpr,
// new, near ForeignExpr (src/ast.zig:721):
pub const AsmExpr = struct {
template: *Node, // string-literal / #string node (comptime string)
is_volatile: bool = false,
operands: []const AsmOperand, // declaration order preserved (= %N indexing)
clobbers: []const []const u8, // dot-names from clobbers(.…): "rcx","cc","memory"
};
pub const AsmOperand = struct {
name: ?[]const u8 = null, // optional [name]; only needed for %[name]
constraint: []const u8, // verbatim, e.g. "={rax}", "=r", "+r", "{rdi}", "r"
role: Role,
payload: *Node, // out_value → Type node; out_place/input → expr node
pub const Role = enum {
out_value, // `-> Type` value output; N of these → a tuple result
out_place, // `-> @place` write-through to existing storage (Phase 2)
input, // `= expr`
};
};
A single flat operands list (not split into outputs/inputs) preserves source
order — what the %0/%[name] indices and the LLVM constraint order key off. The
result type is derived in Sema from the out_value operands (§II.5).
II.4 sx parser
asm is parsed in expression position. sx dispatches primary expressions in
Parser.parsePrimary (src/parser.zig); add a .kw_asm case (mirroring how
existing keyword/#-directive expressions like #run are handled):
- consume
asm; contextually consumevolatileif the next token is the wordvolatile(Deviation 4). expect(.l_brace); parse the first element as the template expression.- then a comma-separated list until
}. Each element is either:- an operand —
[name]?(a bracketed identifier), a string-literal constraint, then a role:->Type(out_value) ·->@-place (out_place, Phase 2) ·=expr(input); or - the clobbers clause —
clobbers(.ident (,.ident)*).
- an operand —
- allow a trailing comma;
expect(.r_brace);createNode(start, .{ .asm_expr = … }).
The first element is unambiguously the template (a string not followed by a role
marker). -> vs = after the constraint disambiguates output vs input; inside a
-> target, a leading @ marks a write-through place vs a type.
Top-level/global asm (Deviation 6): recognize asm { at declaration scope and
build a dedicated asm_global decl (template only — reject operands/volatile).
Lexer/token: add kw_asm to the Token.Tag enum + keyword StaticStringMap in
src/token.zig; volatile and clobbers stay out of the global table
(contextual). No new operator tokens — -> (arrow), = (equal), .
(dot) and {} already exist.
II.5 sx Sema / typing
- Result type from the
out_valueoperands (-> Type), in declaration order: 0 →void(and the asm must bevolatile); 1 → that operand's typeT; N → a tuple(T1,…,Tn), named when the operands carry[name]s ((name1: T1, …)), positional otherwise. Implement in the expression typer (src/ir/expr_typer.zig/ whereverinferExprTypelives), returning the resolvedTypeId(a tupleTypeIdfor N>1). Do not fall back to a silent default — an unresolvable output type is a real error (CLAUDE.md silent-default rule): emit a diagnostic and return the project's.unresolvedsentinel. - Port Zig's validation checklist (these are the user-facing error messages):
- no output operand ⇒ the asm must be
volatile; - each
out_valueresult type must have a well-defined in-memory layout; - inputs must be runtime values; coerce comptime int→
i64, float→f64; - template must be a comptime-known string;
- (Phase 2)
out_placecannot write aconst; indirect-memory rules.
- no output operand ⇒ the asm must be
- Every
%[name]referenced in the template must name an operand (best surfaced as a Sema diagnostic; also caught at codegen during the rewrite — §II.6).
Note: there is no "≤1 output" rule (that was Zig's limit; sx's tuples lift it).
II.6 sx IR + LLVM codegen (the part that must match Zig bit-for-bit)
IR op — src/ir/inst.zig
Add to Op = union(enum) (src/ir/inst.zig:80), next to objc_msg_send
(:219). Strings are interned (StringId, as const_string at :85); operands
are SSA Refs:
inline_asm: InlineAsm,
pub const InlineAsm = struct {
template: StringId, // interned, RAW (rewritten at emit)
operands: []const AsmOperand, // declaration order (= %N indexing)
clobbers: []const StringId, // interned dot-names: "rcx","cc","memory"
has_side_effects: bool,
// result rides on Inst.ty: void / a scalar TypeId / a tuple TypeId (N outputs)
};
pub const AsmOperand = struct {
role: enum { out_value, out_place, input },
name: StringId, // .none when unnamed
constraint: StringId, // verbatim "={rax}" / "=r" / "+r" / "{rdi}"
operand: Ref, // out_value → .none; out_place/input → the Ref
};
Lowering — src/ir/lower/expr.zig
Add .asm_expr => self.lowerAsmExpr(...) to the lowerExpr dispatch. It interns
the template + constraint strings + clobber names, lowers each input operand to a
Ref, computes the result TypeId (§II.5), and emits the inline_asm op. (Same
shape as the existing objc_msg_send lowering.)
Emit — src/ir/emit_llvm.zig
Add .inline_asm => self.emitInlineAsm(...) to the emitInst dispatch. This is a
direct port of FuncGen.airAssembly. Using the already-imported
llvm_api.c:
fn emitInlineAsm(self: *Emitter, inst: *const Inst, a: InlineAsm) void {
// 1) result LLVM type + param types/values from constraints
const ret_ty = self.lowerType(inst.ty); // void if no typed output
var param_tys: ...; var args: ...; // one per `input` constraint
// 2) assemble the LLVM constraint string (see algorithm below)
// outputs first ("=..."/"+..."), then inputs, then "~{reg}" clobbers, comma-joined
// 3) rewrite the template %[name]->${N}, %%->%, %=->${:uid}, $->$$ (state machine below)
const fn_ty = c.LLVMFunctionType(ret_ty, param_tys.ptr, n_params, 0);
const asm_val = c.LLVMGetInlineAsm(
fn_ty,
rendered_template.ptr, rendered_template.len,
constraint_str.ptr, constraint_str.len,
@intFromBool(a.has_side_effects), // HasSideEffects (volatile)
0, // IsAlignStack
c.LLVMInlineAsmDialectATT, // AT&T (Deviation: none — matches Zig default)
0, // CanThrow
);
const result = c.LLVMBuildCall2(self.builder, fn_ty, asm_val, args.ptr, n_params, "");
self.mapRef(inst, result); // 1 output: the value; N: extractvalue i per out_value → tuple
}
(Optionally cache the asm value keyed by (template, constraints, fn_ty) the way
emit_llvm.zig:167 caches objc_msg_send_value — but per-site construction is
fine; LLVM uniques inline-asm constants internally.)
Constraint-string assembler (port of FuncGen.airAssembly):
parts = []
for op in operands where role == out_value or out_place: # outputs first
parts.append( op.constraint with ',' replaced by '|' ) # "={rax}", "=r", "+r" …
for op in operands where role == input:
parts.append( op.constraint with ',' replaced by '|' ) # "{rdi}", "r" …
for name in clobbers: # from clobbers(.name,…)
parts.append( "~{" + name + "}" ) # "~{rcx}", "~{cc}", "~{memory}"
constraint_str = ",".join(parts)
LLVM return type follows the out_value count: 0 → void; 1 → that type;
N → an anonymous struct {T1,…,Tn} — after the call, extractvalue i per
out_value builds the sx tuple (the multi-return path, §II.2 Dev 5). out_place
outputs are stored through their Ref afterward instead.
For sys_write (one output): constraint
={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}, fn_ty = i64 (i64,ptr,i64),
args = [1, fd, buf, len], sideeffect = true. For divmod (two outputs):
={rax},={rdx},{rax},{rdx},r,~{cc}, fn_ty = {i64,i64} (i64,i64,i64), and the two
extractvalues become the (quot, rem) tuple.
Template rewriter (port verbatim from FuncGen.zig:2735-2802): state machine
over the template bytes with a name_map: [name] -> positional index built from
outputs ++ inputs:
state start: '%' -> percent ; '$' -> emit "$$" ; else emit byte
state percent: '%' -> emit '%', start
'[' -> emit "${", state input
'=' -> emit "${:uid}", start
else -> emit '%', emit byte, start
state input: ']' -> emit name_map[name], emit '}', start
':' -> emit name_map[name], emit ':', state modifier
else accumulate name
state modifier:']' -> emit accumulated modifier, emit '}', start
else accumulate
An unknown %[name] is a hard error (mirror Zig's todo/diagnostic — not a
silent pass-through; CLAUDE.md no-silent-arms rule).
Interpreter — src/ir/interp.zig
Inline asm cannot be comptime-evaluated. In the interpreter's op switch:
.inline_asm => return bailDetail("inline asm requires native execution; not available at comptime"),
(Same bailDetail pattern as the Obj-C/JNI ops — surfaces op=inline_asm: ...
rather than a silent default.)
Global asm (Deviation 6)
Lower the top-level asm_global decl to a one-shot emit:
c.LLVMAppendModuleInlineAsm(module, src.ptr, src.len) (present in the linked
LLVM — @19/include/llvm-c/Core.h:971). No operands, no rewrite, no volatile;
multiple blocks concatenate in source order (as Zig does).
Calling into an asm-defined symbol needs no new machinery — declare it with a
lib-less #foreign (Deviation 6, §II.2): my_func :: (sig) -> R #foreign; emits
an external-linkage, raw-named, C-ABI extern that the linker resolves against the
.global the asm block defines.
Guard (CLAUDE.md no-silent-arms): a global-asm symbol exists only in the final
linked binary, not in the #run/JIT host process. The interpreter resolves
externs via dlsym(RTLD_DEFAULT) (host_ffi.zig), which won't find it — calling
such a symbol at comptime must fail loudly (it should already, via the
dlsym-miss diagnostic; pin it with a test). Edge case: a symbol referenced only
by other asm/external code may need llvm.used / .no_dead_strip to survive
dead-stripping; the common "sx references it" case is safe.
II.7 Stage-to-file map (implementation checklist)
| Stage | Zig reference | sx file + insertion point | New code |
|---|---|---|---|
| Keyword | tokenizer.zig keywords |
src/token.zig — Token.Tag + keyword StaticStringMap |
kw_asm (+ contextual volatile) |
| AST node | Ast.zig:2797,3789 |
src/ast.zig:13,85,721 — Node.Data + new AsmExpr/AsmOperand |
~25 lines |
| Parser | Parse.zig:2771-2883 |
src/parser.zig — parsePrimary .kw_asm case + global-asm at decl scope |
~120 lines |
| Sema/typing | Sema.zig:15044 |
src/ir/expr_typer.zig (inferExprType) + validation |
~80 lines |
| IR op | Air.zig:1485, Zir.zig:2531 |
src/ir/inst.zig:80 — inline_asm: InlineAsm |
~25 lines |
| Lowering | AstGen.zig:8553 |
src/ir/lower/expr.zig — lowerExpr .asm_expr case |
~60 lines |
| LLVM emit | FuncGen.zig:2473-2852 |
src/ir/emit_llvm.zig — emitInst .inline_asm case |
~120 lines (constraint asm + template rewrite + LLVMGetInlineAsm/BuildCall2) |
| Global asm | Sema.addGlobalAssembly + module asm |
decl lowering → c.LLVMAppendModuleInlineAsm |
~15 lines |
| Interp bail | n/a | src/ir/interp.zig op switch |
1 line |
No change to src/codegen.zig is needed (the IR/LLVM path owns this).
II.8 Phasing
- Phase 1 (MVP).
asm { … }block;asm volatile; string-literal/#stringtemplate;= exprinputs;-> Typeoutputs including N→tuple multi-return;clobbers(.…)dot-name list;%[name]/%%substitution; "no-output ⇒ volatile" check; AT&T. Target: Linux/macOSx86_64+aarch64syscalls, intrinsics, and multi-value ops (divmod,cpuid,add_carry). - Phase 2.
-> @placewrite-through outputs, read-write ("+r" -> @place) and indirect-memory ("=*m") constraints,%=unique-id, output-to-const rejection. - Phase 3. Global/module asm decl (
LLVMAppendModuleInlineAsm) + the comptime-call guard, plus Intel-dialect opt-in. Small: the extern-call path already exists (lib-less#foreign). - Phase 4 (optional). Upgrade
clobbers(.name)from dot-name sugar to a compile-time-checked per-architectureClobberenum (typo-checking; same syntax). - Phase 5 (optional). Naked functions (
callconv-equivalent) for full freestanding entry points.
II.9 Testing
asm output is target-specific, so tests must pin a target and assert on
emitted IR/exit, not run host-natively unless the host matches. Use the existing
corpus harness and the 16xx platform block (the closest fit in the
XXXX-category scheme; specs.md/CLAUDE.md test-layout). Mirror Zig's own
matrix:
examples/16xx-platform-asm-syscall-write.sx— x86_64-linux write(2), assert exit/stdout.examples/16xx-platform-asm-register-read.sx—mov %%rsp,%[out], no-input output.examples/16xx-platform-asm-no-output-volatile.sx— bareasm volatile { "nop" }.examples/16xx-platform-asm-missing-volatile.sx— expected compile error (no output, no volatile) — pins the diagnostic.examples/16xx-platform-asm-template-subst.sx—%[a]/%%rewriting, assert on thesx ir/.ssnapshot.examples/16xx-platform-asm-multi-return.sx—divmod→(quot, rem)tuple, destructured.examples/16xx-platform-asm-global.sx(Phase 3) — global asm + extern call.
Add an IR/.s snapshot (expected/*.ir) for the substitution test so the
constraint-string + template-rewrite output is locked. Seed markers and
regenerate with zig build test -Dupdate-goldens, then review the diff
(CLAUDE.md snapshot-integrity rule).
II.10 Open decisions for the user
Largely settled through design review; what remains:
- Dialect: AT&T only (Zig's default) for v1, or expose an Intel opt-in
(
LLVMInlineAsmDialectIntel) from the start? Recommend AT&T-only v1. volatilekeyword (Deviation 4): contextual (recommended, no source-compat risk) vs globally reserved (simpler lexer).- Brace separator: comma (recommended — trailing-comma-friendly,
literal-style) vs
;(matches sx statement blocks). - Asm-symbol extern spelling (Deviation 6): reuse lib-less
#foreignfor v1 (works, zero new surface — but it is a C-FFI import binding: import-only, C-ABI, spelled "foreign") vs a dedicated#extern/linkname (cleaner spelling, and the only path that could also export an sx symbol so asm can call back in — which#foreigncannot). Recommend#foreignfor v1; revisit#externif/when asm-calls-into-sx or non-C-ABI symbols are needed.
Decided: brace block { … } (Dev 1) · ->/= markers, : sections dropped,
<- rejected (Dev 2) · clobbers(.…) enum-literal list, dot-name sugar now →
checked enum later (Dev 3) · multiple value-outputs return a tuple (Dev 5). For
global asm (Dev 6) the call-into-asm direction reuses lib-less #foreign
(specs.md:1209 updated); the extern spelling is open decision 4 above.
II.11 Risks
- Constraint/template correctness is silent if wrong — a bad constraint string miscompiles with no diagnostic. Mitigation: port Zig's assembler/rewrite verbatim (don't paraphrase) and lock IR snapshots in tests.
- Register-name validity is unchecked in v1's
clobbers(.name)dot-name form — a typo'd register (.raxx) surfaces only as an LLVM error. This is exactly the gap the Phase-4 checkedClobberenum closes; acceptable for v1 (LLVM validates the emitted~{…}). #stringheredoc + AT&T%/$interplay: ensure the heredoc delivers the template bytes literally (no sx-level escape processing of%/$) before the rewrite stage.- Target gating: asm examples must declare their target or they break the corpus on other hosts; the test plan pins targets.
Appendix A — exact LLVM-C calls (already reachable via llvm_api.c)
// src/llvm_api.zig @cInclude("llvm-c/Core.h") exposes all of these:
LLVMValueRef LLVMGetInlineAsm(LLVMTypeRef Ty,
const char *AsmString, size_t AsmStringSize,
const char *Constraints, size_t ConstraintsSize,
LLVMBool HasSideEffects, LLVMBool IsAlignStack,
LLVMInlineAsmDialect Dialect, LLVMBool CanThrow); // LLVM 19 & 21: identical
LLVMValueRef LLVMBuildCall2(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef Fn,
LLVMValueRef *Args, unsigned NumArgs, const char *Name);
void LLVMAppendModuleInlineAsm(LLVMModuleRef M, const char *Asm, size_t Len); // global asm
// enum: LLVMInlineAsmDialectATT, LLVMInlineAsmDialectIntel
Appendix B — file index
Zig (reference, ~/projects/zig): lib/std/zig/tokenizer.zig (keywords) ·
lib/std/zig/Ast.zig:2797,3789,3969 (nodes) · lib/std/zig/Parse.zig:2771-2883
(grammar) · lib/std/zig/AstGen.zig:8553-8669,12257 + lib/std/zig/Zir.zig:2531
(ZIR) · src/Sema.zig:15044-15231 (validation) · src/Air.zig:1485 (AIR) ·
src/codegen/llvm/FuncGen.zig:2473-2852 + lib/std/zig/llvm/Builder.zig:6131
(LLVM) · doc/langref/inline_assembly.zig, doc/langref/test_global_assembly.zig
(syntax) · doc/langref.html.in:4217-4300 (spec).
sx (target, ~/projects/sx): src/token.zig · src/lexer.zig:402 (#string) ·
src/ast.zig:13,85,721 · src/parser.zig (parsePrimary), :319 (optional
#foreign library) · src/ir/expr_typer.zig · src/ir/inst.zig:80,219,260 ·
src/ir/lower/expr.zig · src/ir/module.zig:300 (declareExtern) ·
src/ir/emit_llvm.zig:167 (msgSend cache), :1244 (extern⇒C-ABI), :1279
(raw symbol name) · src/ir/interp.zig (bailDetail) · src/llvm_api.zig:1-17 ·
build.zig:10 (LLVM@19).
Appendix C — Cookbook (final form: asm { … }, ->/=, clobbers(.…), pure AT&T)
// ── v1 ────────────────────────────────────────────────────────────────────
asm volatile { "nop" }; // bare side-effecting
// write(2) syscall — register-pinned inputs, one value-output
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
return asm volatile {
"syscall",
"={rax}" -> i64,
"{rax}" = 1, "{rdi}" = fd, "{rsi}" = buf, "{rdx}" = len,
clobbers(.rcx, .r11, .memory),
};
}
// mmap — full 6-arg syscall ABI (arg4 in r10, not rcx)
mmap :: (addr: *void, len: u64, prot: i32, flags: i32, fd: i32, off: i64) -> *void {
return asm volatile {
"syscall",
"={rax}" -> *void,
"{rax}" = 9, "{rdi}" = addr, "{rsi}" = len, "{rdx}" = prot,
"{r10}" = flags, "{r8}" = fd, "{r9}" = off,
clobbers(.rcx, .r11, .memory),
};
}
// AT&T scaled-index addressing — arr[i]
load_idx :: (arr: *i64, i: u64) -> i64 {
return asm {
"movq (%[arr],%[i],8), %[out]",
[out] "=r" -> i64, [arr] "r" = arr, [i] "r" = i,
};
}
// CPUID AVX probe — immediates, heavy clobber set, single value-result
has_avx :: () -> bool {
return asm volatile {
#string ATT
movl $1, %%eax
cpuid
andl $0x10000000, %%ecx
setne %[ok]
ATT,
[ok] "=r" -> bool,
clobbers(.rax, .rbx, .rcx, .rdx, .cc),
};
}
// SSE packed add — xmm regs, no outputs ⇒ volatile
vadd4 :: (a: *f32, b: *f32, out: *f32) {
asm volatile {
#string ATT
movups (%[a]), %%xmm0
movups (%[b]), %%xmm1
addps %%xmm1, %%xmm0
movups %%xmm0, (%[out])
ATT,
[a] "r" = a, [b] "r" = b, [out] "r" = out,
clobbers(.xmm0, .xmm1, .memory),
};
}
// ── multi-return (v1; sx has tuples, Zig caps at one output) ────────────────
// 64-bit divide → (quotient, remainder)
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64,
[rem] "={rdx}" -> u64,
"{rax}" = n, "{rdx}" = 0, [d] "r" = d,
clobbers(.cc),
};
}
// rdtsc → two 32-bit halves, destructured straight out of the asm
rdtsc :: () -> u64 {
lo, hi := asm volatile {
"rdtsc",
[lo] "={eax}" -> u32,
[hi] "={edx}" -> u32,
};
return (xx hi << 32) | xx lo;
}
// cpuid → a clean 4-tuple
cpuid :: (leaf: u32, subleaf: u32) -> (eax: u32, ebx: u32, ecx: u32, edx: u32) {
return asm volatile {
"cpuid",
[eax] "={eax}" -> u32, [ebx] "={ebx}" -> u32,
[ecx] "={ecx}" -> u32, [edx] "={edx}" -> u32,
"{eax}" = leaf, "{ecx}" = subleaf,
};
}
// add-with-carry → (sum, carry): value-output + tied input + flag capture
add_carry :: (a: u64, b: u64) -> (sum: u64, carry: u8) {
return asm {
#string ATT
addq %[b], %[sum]
setc %[carry]
ATT,
[sum] "=r" -> u64,
[carry] "=r" -> u8,
[a] "0" = a, [b] "r" = b,
clobbers(.cc),
};
}
// ── Phase 2 (write-through / read-write / indirect) ─────────────────────────
// byte memcpy — labels, loop, read-write operands
memcpy_bytes :: (dst: [*]u8, src: [*]u8, n: u64) {
d := dst; s := src; c := n;
asm volatile {
#string ATT
testq %[c], %[c]
jz 2f
1: movb (%[s]), %%al
movb %%al, (%[d])
incq %[s]
incq %[d]
decq %[c]
jnz 1b
2:
ATT,
[d] "+r" -> @d, [s] "+r" -> @s, [c] "+r" -> @c,
clobbers(.rax, .cc, .memory),
};
}
// lock cmpxchg CAS — lock prefix, pinned read-write rax, two outputs
cas :: (ptr: *i64, expected: i64, desired: i64) -> bool {
old := expected; ok: bool = ---;
asm volatile {
#string ATT
lock cmpxchgq %[desired], (%[ptr])
sete %[ok]
ATT,
[ok] "=r" -> @ok,
[old] "+{rax}" -> @old,
[ptr] "r" = ptr,
[desired] "r" = desired,
clobbers(.cc, .memory),
};
return ok;
}
// fill an existing struct (write-through, no tuple)
cpuid_into :: (out: *CpuId, leaf: u32) {
asm volatile {
"cpuid",
"={eax}" -> @out.eax, "={ebx}" -> @out.ebx,
"={ecx}" -> @out.ecx, "={edx}" -> @out.edx,
"{eax}" = leaf,
};
}
Global asm + extern (Phase 3):
asm {
#string ATT
.global my_add
my_add:
lea (%rdi,%rsi,1), %eax
retq
ATT,
};
my_add :: (a: i32, b: i32) -> i32 #foreign; // lib-less extern = Zig's `extern fn`