sx/design/inline-asm-design.md
2026-06-17 09:58:43 +03:00

# Inline Assembly for sx — Design Doc & Proposal
**Status:** proposal / not yet scheduled into a workstream
**Author:** research pass over the Zig compiler (`~/projects/zig`, 0.16-dev) + the sx compiler
**Scope:** how Zig implements inline assembly end-to-end, and a minimal-deviation proposal to bring the same model to sx.
> Guiding constraint for this doc: **mirror Zig's design; deviate only where sx's
> grammar or stdlib makes a 1:1 copy impossible, and call every deviation out
> explicitly with its justification.** Every deviation below is tagged
> **[DEVIATION]** with a reason.
---
## 0. TL;DR + feasibility
* **Feasible today, no new infrastructure.** sx already links LLVM (`build.zig:10`
`/opt/homebrew/opt/llvm@22`) and `@cImport`s `llvm-c/Core.h`
(`src/llvm_api.zig:1-17`). That header exposes everything inline asm needs,
reachable right now through `llvm_api.c.*`:
* `LLVMGetInlineAsm(Ty, AsmString, AsmStringSize, Constraints, ConstraintsSize, HasSideEffects, IsAlignStack, Dialect, CanThrow)`, which builds the asm callee (LLVM 19 through 22 share this 9-arg signature).
* `LLVMInlineAsmDialectATT` / `LLVMInlineAsmDialectIntel`.
* `LLVMBuildCall2(...)` — already used pervasively in `src/ir/emit_llvm.zig` (e.g. the Obj-C msgSend path) — calls the asm value like a function.
* `LLVMAppendModuleInlineAsm(M, Asm, Len)` — module-level (global) asm.
* **The hard part is not codegen.** Codegen is ~80 lines of well-trodden LLVM-C.
The real work is (a) the parser grammar, (b) a faithful port of Zig's
*LLVM constraint-string assembly* and *`%[name]`→`$N` template rewrite*, and
(c) Sema validation rules. All three are fully specified below.
* **Surface form (decided, §II.2):** `asm volatile { "tmpl", "=r" -> T, "r" = x, clobbers(.cc, .memory) }`
— a brace block; `->` marks outputs / `=` marks inputs (no positional `:`
sections); enum-literal `clobbers(.…)`; and N `-> Type` outputs return a
**tuple** (sx has tuples; Zig allows at most one `-> T` value output).
* **Inline asm is never comptime-evaluable.** The interpreter must bail loudly
(`bailDetail`), per CLAUDE.md's "no silent unimplemented arms" rule.
* **One naming note:** sx already has a `sx asm <file>` *CLI subcommand*
(`src/main.zig:203,386`) that emits a `.s` file. That is a compiler output
mode, a different namespace from a language token. No conflict, but worth
knowing so nobody confuses the two.
---
# PART I — How Zig implements inline assembly
All file references in Part I are under `~/projects/zig` (0.16-dev,
commit `3deb86bafd`). Parser/AST/AstGen live in `lib/std/zig/`; Sema/AIR/codegen
in `src/`.
## I.1 Surface syntax
The canonical example (`doc/langref/inline_assembly.zig`), a Linux x86_64 syscall:
```zig
pub fn syscall3(number: usize, arg1: usize, arg2: usize, arg3: usize) usize {
    return asm volatile ("syscall"
        : [ret] "={rax}" (-> usize),
        : [number] "{rax}" (number),
          [arg1] "{rdi}" (arg1),
          [arg2] "{rsi}" (arg2),
          [arg3] "{rdx}" (arg3),
        : .{ .rcx = true, .r11 = true });
}
```
Grammar shape:
```
asm volatile? ( <template-string>
    : <output-item> , <output-item> , ...      # outputs (optional section)
    : <input-item> , <input-item> , ...        # inputs (optional section)
    : <clobbers> )                             # clobbers (optional section)

output-item : [name] "constraint" (-> Type)    # asm result becomes the value
            | [name] "constraint" (lvalue)     # asm writes through the pointer
input-item  : [name] "constraint" (expr)
clobbers    : .{ .reg0 = true, .reg1 = true }  # struct literal (0.16-dev)
```
Key semantics (from `doc/langref.html.in:4217-4300`):
* **`volatile`** marks side effects. Without it, an asm expression whose result
is unused may be deleted. An asm expression with **no outputs must be
`volatile`** (else compile error).
* **x86/x86_64 use AT&T syntax** (LLVM provides the parser; Intel support is
"buggy and not well tested").
* **`%[name]`** in the template refers to a named operand's register; **`%%`** is
a literal `%`.
* **Clobbers** are registers the asm trashes that are *not* inputs/outputs.
`"memory"` (the `.memory = true` field) means "writes to arbitrary memory."
Failing to declare a clobber is unchecked illegal behavior.
* **Global assembly** = an `asm(...)` in a namespace-level `comptime` block. It
has *different rules*: `volatile` is forbidden, there are **no inputs/outputs/
clobbers**, no `%` substitution, and all global asm is concatenated verbatim:
```zig
// doc/langref/test_global_assembly.zig
comptime {
    asm (
        \\.global my_func;
        \\.type my_func, @function;
        \\my_func:
        \\ lea (%rdi,%rsi,1),%eax
        \\ retq
    );
}
extern fn my_func(a: i32, b: i32) i32; // call into the global-asm symbol
```
## I.2 Pipeline, stage by stage
### Tokenizer — `lib/std/zig/tokenizer.zig`
Two keywords in the `StaticStringMap`: `.{ "asm", .keyword_asm }` and
`.{ "volatile", .keyword_volatile }`.
### AST — `lib/std/zig/Ast.zig`
Four node tags (`Ast.zig:3789-3817`):
* `asm_simple` — `asm(template)` only, no operands.
* `@"asm"` — full form; `data` is `node_and_extra` → (template node, `ExtraIndex` to an `Asm`).
* `asm_output` — `[a] "b" (-> Type)` or `[a] "b" (ident)`.
* `asm_input` — `[a] "b" (expr)`.
The "full" view the rest of the compiler consumes (`Ast.zig:2797-2809`):
```zig
pub const Asm = struct {
    ast: Components,
    volatile_token: ?TokenIndex,
    outputs: []const Node.Index,
    inputs: []const Node.Index,
    pub const Components = struct {
        asm_token: TokenIndex,
        template: Node.Index,
        items: []const Node.Index, // outputs ++ inputs, interleaved order preserved
        clobbers: Node.OptionalIndex, // a comptime expression (the struct literal)
        rparen: TokenIndex,
    };
};
```
The on-disk extra record (`Ast.zig:3969-3975`) stores `items_start/items_end`
(a span into the node list), `clobbers` (optional node), and `rparen`.
### Parser — `lib/std/zig/Parse.zig`
`expectAsmExpr` (`Parse.zig:2771-2838`) implements the grammar:
```zig
fn expectAsmExpr(p: *Parse) !Node.Index {
    const asm_token = p.assertToken(.keyword_asm);
    _ = p.eatToken(.keyword_volatile);
    _ = try p.expectToken(.l_paren);
    const template = try p.expectExpr();
    if (p.eatToken(.r_paren)) |rparen| { /* asm_simple */ }
    _ = try p.expectToken(.colon);
    // ... parse output items until a `:`/`)` ...
    const clobbers: Node.OptionalIndex = if (p.eatToken(.colon)) |_| clobbers: {
        // ... parse input items until a `:`/`)` ...
        _ = p.eatToken(.colon) orelse break :clobbers .none;
        break :clobbers (try p.expectExpr()).toOptional(); // clobbers = an expression
    } else .none;
    // ...
}
```
* `parseAsmOutputItem` (`Parse.zig:2840-2864`):
`LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN (MINUSRARROW TypeExpr | IDENT) RPAREN`.
* `parseAsmInputItem` (`Parse.zig:2866-2883`):
`LBRACKET IDENT RBRACKET STRINGLITERAL LPAREN Expr RPAREN`.
* **Clobbers parse as a generic expression** (`(try p.expectExpr())`), not a
string list — this is the 0.16-dev change. It is later coerced to a
`std.lang.assembly.Clobbers` struct at Sema time.
### AST → ZIR — `lib/std/zig/AstGen.zig`
`asmExpr` (`AstGen.zig:8553-8669`) + `addAsm` (`12257-12310`). The ZIR payload
(`lib/std/zig/Zir.zig:2531-2564`):
```zig
pub const Asm = struct {
    src_node: Ast.Node.Offset,
    asm_source: NullTerminatedString, // template (string-literal case)
    output_type_bits: u32, // bit i = output i uses `-> T` (vs ptr)
    clobbers: Ref, // comptime ref → assembly.Clobbers value
    pub const Small = packed struct(u16) { is_volatile: bool, outputs_len: u7, inputs_len: u8 };
    pub const Output = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
    pub const Input = struct { name: NullTerminatedString, constraint: NullTerminatedString, operand: Ref };
};
```
AstGen already enforces the structural rules:
* Global (container-level) asm: rejects `volatile`, rejects any
outputs/inputs/clobbers (`AstGen.zig:8583-8587`).
* Local asm: **"assembly expression with no output must be marked volatile."**
* `outputs.len < 16`, `inputs.len < 32` (fit `Small.outputs_len`/`inputs_len`).
* At most one output may use the `-> T` form ("inline assembly allows up to one
output value"); `output_type_bits` records which.
* Two ZIR tags: `.@"asm"` (string-literal template) vs `.asm_expr` (comptime
expression template).
### ZIR → AIR (Sema) — `src/Sema.zig`
`zirAsm` (`Sema.zig:15044-15231`, dispatched at `1396-1397`). This is where all
*semantic* validation happens. It:
* Resolves the template to a comptime string (`resolveConstString`).
* **Global asm** (`func_index == .none`): asserts no operands, then
`zcu.addGlobalAssembly(owner, asm_source)` and returns `.void_value`.
* `requireRuntimeBlock` — local asm can't run at comptime.
* Per output: if `-> T`, resolve the type, `ensureLayoutResolved`, set the
expression's result type; else resolve the operand pointer. Validates:
* **output type has a well-defined in-memory layout** (else error);
* **cannot output to a `const` pointer** (`"asm cannot output to const '{s}'"`);
* output must be a runtime value (no reference to a comptime var).
* Per input: resolve operand, reject comptime-only refs, **coerce
`comptime_int`→`usize`, `comptime_float`→`f64`**.
* Clobbers: coerce the expression to `std.lang.assembly.Clobbers`, resolve to a
comptime value.
The AIR payload (`src/Air.zig:1485-1497`):
```zig
pub const Asm = struct {
    source_len: u32,
    inputs_len: u32,
    clobbers: InternPool.Index, // comptime assembly.Clobbers value
    flags: packed struct(u32) { outputs_len: u31, is_volatile: bool },
};
// trailing: out operand refs, in operand refs, then the template bytes and
// (constraint\0 name\0) pairs packed into air_extra.
```
### AIR → LLVM — `src/codegen/llvm/FuncGen.zig`
`airAssembly` (`FuncGen.zig:2473-2852`) is the crux. **This is the algorithm sx
must port.** Three sub-tasks:
**(a) Assemble the LLVM constraint string.** Comma-separated. For each output:
emit `=` (write-only) or `+` (read-write, recorded in `llvm_rw_vals`); a `*`
prefix marks an *indirect* (memory) output passed as a pointer parameter; a
non-indirect output contributes to the return type. The user's leading `=`/`+`
in `constraint[0]` is consumed and re-emitted; the rest is copied with Zig
commas rewritten to LLVM `|` (alternative constraints). Inputs are copied
similarly (no `=`). Clobbers: iterate the `Clobbers` struct's bool fields as a
bigint; for each `true` field emit `~{fieldname}` (via `appendConstraints`,
which also expands target-specific aliases).
**(b) Rewrite the template** `%[name]` → LLVM positional `${N}` (state machine,
`FuncGen.zig:2735-2802`):
| input | output | note |
|---|---|---|
| `$` | `$$` | escape LLVM's `$` |
| `%%` | `%` | literal percent |
| `%=` | `${:uid}` | unique id |
| `%[name]` | `${N}` | `N` = position in `name_map` |
| `%[name:mod]` | `${N:mod}` | with modifier |
`name_map` maps each operand's `[name]` to its positional index across all
outputs+inputs.
**(c) Build & call.** Pick the LLVM function type:
`return_count == 0` → `void`; `== 1` → the single return type; `> 1` → an
anonymous struct of the return types. Then:
```zig
const call = try self.wip.callAsm(
    attributes, llvm_fn_ty,
    .{ .sideeffect = is_volatile }, // Assembly.Info: sideeffect/alignstack/inteldialect/unwind
    rendered_template, llvm_constraints, llvm_param_values, "");
```
`callAsm` (`lib/std/zig/llvm/Builder.zig:6131-6143`) is a thin wrapper that
builds the asm constant (`asmValue`) and emits a normal `call`. In LLVM-C terms
this is exactly `LLVMGetInlineAsm(...)` + `LLVMBuildCall2(...)`. Finally,
non-indirect outputs are read back: with one return it's the call result; with
several it's `extractvalue i` per output; indirect outputs were already written
by the asm via their pointer parameter.
### C backend — `src/codegen/c.zig`
No `airAssembly` for *inline* asm in the C backend in this tree; only global asm
flows out (as `module asm`). For sx this is irrelevant — sx only has an LLVM
backend.
### Global asm & naked functions
* **Global asm** bypasses everything above: Sema calls `zcu.addGlobalAssembly`, which accumulates
the verbatim source; the LLVM object emits it via the module-level asm string
(LLVM-C: `LLVMAppendModuleInlineAsm`). Symbols it defines are reached with
`extern fn`.
* **Naked functions** (`callconv(.naked)`) drop the prologue/epilogue; the body
is entirely inline asm. This is an orthogonal calling-convention feature, not
part of the asm expression itself.
---
# PART II — Proposal for sx
## II.1 Design principles
1. **Copy Zig's *semantic* model exactly**: a template, register/memory operands,
clobbers, and a `volatile` flag; AT&T syntax by default (via LLVM); "no-output asm
must be volatile"; and `%[name]` substitution.
2. **Copy the LLVM lowering exactly** (the constraint-string assembler + template
rewriter from `FuncGen.zig` are reproduced verbatim in §II.6 — these are the
parts where "inventing our own" would silently miscompile).
3. **Diverge from Zig's *surface* syntax where sx has a better-fitting idiom**, and
only there. The deviations (§II.2) are deliberate: a brace block instead of
`( … )`; `->`/`=` operand markers instead of positional `:` sections; an
enum-literal `clobbers(.…)` list; and, because an sx expression can yield a tuple,
**true multiple return values** instead of Zig's cap of one `-> T` value output.
## II.2 sx surface syntax
`asm` is an **expression** (it yields the output value/tuple), introduced by a new
`asm` keyword. The body is a **brace block** of comma-separated parts: a template
string first, then operands, then an optional `clobbers(.…)` clause. Each operand
is `[name]? "constraint" <role>`, where the role marker is:
* **`-> Type`** — an **output** that produces a value (joins the result).
* **`-> @place`** — an output that writes through to existing storage (Phase 2).
* **`= expr`** — an **input** (the value fed in).
`->` reuses sx's "produces" arrow (as in `(a: i32) -> i32`); `=` reuses sx's
"is set to" binding. There are no positional `:` sections.
```sx
// x86_64-linux — write(2) via syscall
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
    return asm volatile {
        "syscall",
        "={rax}" -> i64, // output → the expression's value
        "{rax}" = 1, // SYS_write
        "{rdi}" = fd,
        "{rsi}" = buf,
        "{rdx}" = len,
        clobbers(.rcx, .r11, .memory),
    };
}
// read a register, no inputs, named operand for %[out]
sp :: () -> u64 {
    return asm { "mov %%rsp, %[out]", [out] "=r" -> u64 };
}
```
Multi-instruction templates use sx's existing **`#string` heredoc**
(`src/lexer.zig:402`) or a multi-line `"..."` literal — no new lexer feature:
```sx
serialize :: () {
asm volatile {
#string ATT
mfence
lfence
ATT,
};
}
```
**Outputs and the result type.** A `-> Type` output contributes one value to the
asm expression's result; the count decides the shape:
| `-> Type` outputs | result | spelling |
|---|---|---|
| 0 | `void` (must be `volatile`) | `asm volatile { … }` |
| 1 | that type `T` | `x := asm { …, "=r" -> T };` |
| N | a **tuple** `(T1,…,Tn)` (declaration order) | `a, b := asm { … };` |
A `[name]` on an output becomes a **named tuple field** — the same name you'd use
for `%[name]` does double duty:
```sx
// sx has tuples, so asm gets real multiple return values (Zig caps you at one).
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
    return asm {
        "divq %[d]",
        [quot] "={rax}" -> u64, // → .quot (operand 0)
        [rem] "={rdx}" -> u64, // → .rem (operand 1)
        "{rax}" = n,
        "{rdx}" = 0,
        [d] "r" = d,
        clobbers(.cc),
    };
}
q, r := divmod(17, 5); // q = 3, r = 2
```
### Deviations from Zig (each deliberate; semantics unchanged)
* **[DEVIATION 1 — brace block, not `( … )`.]** The asm body is `asm { … }`, a
comma-separated brace block (trailing comma allowed, per `specs.md:226,501`),
not Zig's parenthesised form. Braces read as "a block of code," which is what an
asm template is; `#string` heredoc templates especially benefit. `asm` is a
keyword, so `asm {` / `asm volatile {` is unambiguous.
* **[DEVIATION 2 — `->`/`=` operand markers, not `:` sections.]** Zig groups
operands into positional `: outputs : inputs : clobbers` sections (count the
colons; `: :` for an empty one). sx tags each operand by role instead — `-> Type`
/ `-> @place` (output) and `= expr` (input) — so the list is flat,
order-independent, with no positional colons. *(`<-` for inputs was considered
and rejected: it can't be a global token without mis-lexing `a<-b`, which must keep
meaning `a < -b`; `=` reuses an existing token and the existing "binding" meaning.)*
* **[DEVIATION 3 — clobbers are an enum-literal list `clobbers(.cc, .memory)`.]**
Zig 0.16 uses a struct literal `: .{ .rcx = true }` coerced to a per-arch
`std.lang.assembly.Clobbers`; older Zig used a string list. sx uses a dot-literal
list, cleaner than both. **v1:** each `.name` is a dot-name lowered straight to
`~{name}` (`.memory`/`.cc` are recognized specials; register names pass through
verbatim; LLVM validates). **Phase 4:** upgrade `.name` to members of a
compile-time-checked per-arch `Clobber` enum — *same syntax*, gains typo-checking.
Note the call-looking `clobbers(…)` is a declarative clause, **not** a call —
nothing executes; it only feeds the register allocator.
* **[DEVIATION 4 — `volatile` is a *contextual* keyword.]** sx's keyword set
(`specs.md:168`) has neither `asm` nor `volatile`. `asm` becomes a real keyword;
`volatile` appears *only* right after `asm`, so it can be recognized contextually
(a plain identifier everywhere else), avoiding reserving it globally. The surface
is byte-identical to Zig. (Alternative: reserve globally — simpler lexer, small
source-compat risk. Recommend contextual.)
* **[DEVIATION 5 — multiple value-outputs return a tuple (sx ⊃ Zig).]** Zig allows
at most one `-> T` output; the rest must be pointer/lvalue outputs. sx has
tuples, so N `-> Type` outputs return `(T1,…,Tn)` (named when operands are
named), destructured with `a, b := …`. A deliberate *improvement* over Zig's inline
asm, enabled by sx's tuple results, and maps onto LLVM's existing multi-output
struct return (§II.6). The other output flavor — `-> @place` write-through, plus
read-write (`"+r" -> @place`) and indirect-memory (`"=*m"`) outputs — is
**Phase 2** (needs indirect-constraint handling); the value-tuple form does not.
* **[DEVIATION 6 — global asm is a top-level `asm { … }` declaration.]** sx has no
namespace-level `comptime {}` block (it has `#run`, `specs.md:2598`), so global
asm is a top-level statement:
```sx
asm {
#string ATT
.global my_func
.type my_func, @function
my_func:
lea (%rdi,%rsi,1), %eax
retq
ATT,
};
my_func :: (a: i32, b: i32) -> i32 extern; // extern, no library — valid sx today
```
Only the `comptime {}` wrapper is dropped; lowers to `LLVMAppendModuleInlineAsm`.
**Calling the asm symbol reuses the C-FFI *import* path** (no new mechanism for
v1). A lib-less `extern` fn declaration (its library is optional; used in 50+
stdlib sites, e.g. `chdir :: (path: [*]u8) -> i32 extern;`) emits exactly the
artifact needed to *call into* the asm symbol — an external-linkage,
**C-calling-convention**, raw-named, link-time-resolved declaration — the same
thing Zig's `extern fn` produces (also C-callconv). The reverse direction (asm
calling *back into* an sx function) is handled by `export`, the define-and-expose
dual of `extern`.
Everything *semantic* — comptime-known template, register/memory constraints
verbatim to LLVM, clobber meaning, "no-output ⇒ must be volatile," AT&T default,
`%[name]`/`%%` substitution — is **identical to Zig**. Only the surface (block,
`->`/`=`, `clobbers(.…)`, tuple returns) differs.
## II.3 sx AST
sx's AST is a pointer-based tagged union (`Data = union(enum)` at
`src/ast.zig:13`, nodes built via `Parser.createNode`), much simpler than Zig's
SoA `extra_data` scheme — so we can store slices directly. Add one arm to the
`Node.Data` union (`src/ast.zig:13`):
```zig
// in Node.Data union(enum):
asm_expr: AsmExpr,
// new node struct, alongside the other expression node defs:
pub const AsmExpr = struct {
    template: *Node, // string-literal / #string node (comptime string)
    is_volatile: bool = false,
    operands: []const AsmOperand, // declaration order preserved (= %N indexing)
    clobbers: []const []const u8, // dot-names from clobbers(.…): "rcx","cc","memory"
};
pub const AsmOperand = struct {
    name: ?[]const u8 = null, // optional [name]; only needed for %[name]
    constraint: []const u8, // verbatim, e.g. "={rax}", "=r", "+r", "{rdi}", "r"
    role: Role,
    payload: *Node, // out_value → Type node; out_place/input → expr node
    pub const Role = enum {
        out_value, // `-> Type` value output; N of these → a tuple result
        out_place, // `-> @place` write-through to existing storage (Phase 2)
        input, // `= expr`
    };
};
```
A single flat `operands` list (not split into outputs/inputs) preserves source
order. The positional indices that `%[name]` resolves to, and the LLVM constraint
order, are then assigned outputs first and inputs second, each group in source
order (§II.6). The result type is derived in Sema from the `out_value` operands (§II.5).
## II.4 sx parser
`asm` is parsed in expression position. sx dispatches primary expressions in
`Parser.parsePrimary` (`src/parser.zig`); add a `.kw_asm` case (mirroring how
existing keyword/`#`-directive expressions like `#run` are handled):
1. consume `asm`; contextually consume `volatile` if the next token is the word
`volatile` (Deviation 4).
2. `expect(.l_brace)`; parse the first element as the **template** expression.
3. then a comma-separated list until `}`. Each element is either:
* an **operand** — `[name]?` (a bracketed identifier), a string-literal
constraint, then a role: `->` `Type` (out_value) · `->` `@`-place
(out_place, Phase 2) · `=` `expr` (input); or
* the **clobbers clause** — `clobbers` `(` `.`ident (`,` `.`ident)* `)`.
4. allow a trailing comma; `expect(.r_brace)`;
`createNode(start, .{ .asm_expr = … })`.
The first element is unambiguously the template (a string not followed by a role
marker). `->` vs `=` after the constraint disambiguates output vs input; inside a
`->` target, a leading `@` marks a write-through place vs a type.
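The disambiguation described above can be sketched as a small classifier over a token slice. This is only an illustration under stated assumptions: the `Tag` names (including a separate `at_sign` token for `@place`), the `Token` struct, and `classifyElement` are hypothetical and are not taken from `src/token.zig` or `src/parser.zig`.

```zig
const std = @import("std");

// Hypothetical token tags; the real names live in sx's token definitions.
const Tag = enum { l_bracket, r_bracket, identifier, string_literal, arrow, equal, at_sign, l_paren, other };
const Token = struct { tag: Tag, text: []const u8 = "" };

const ElementKind = enum { out_value, out_place, input, clobbers };
const ParseError = error{ ExpectedOperandName, ExpectedConstraint, ExpectedRoleMarker };

// Returns the tag at `i`, or `.other` when past the end of the slice.
fn tagAt(tokens: []const Token, i: usize) Tag {
    return if (i < tokens.len) tokens[i].tag else .other;
}

// Classifies one comma-separated element of the asm body (after the template).
fn classifyElement(tokens: []const Token) ParseError!ElementKind {
    // `clobbers` is contextual: an identifier spelled "clobbers" followed by `(`.
    if (tagAt(tokens, 0) == .identifier and tagAt(tokens, 1) == .l_paren and
        std.mem.eql(u8, tokens[0].text, "clobbers")) return .clobbers;

    var i: usize = 0;
    // Optional `[name]` label.
    if (tagAt(tokens, i) == .l_bracket) {
        if (tagAt(tokens, i + 1) != .identifier or tagAt(tokens, i + 2) != .r_bracket)
            return error.ExpectedOperandName;
        i += 3;
    }
    // Mandatory string-literal constraint.
    if (tagAt(tokens, i) != .string_literal) return error.ExpectedConstraint;
    i += 1;
    // Role marker: `=` is an input; `->` is an output, and a leading `@`
    // on its target marks a write-through place rather than a type.
    const kind: ElementKind = switch (tagAt(tokens, i)) {
        .equal => .input,
        .arrow => if (tagAt(tokens, i + 1) == .at_sign) .out_place else .out_value,
        else => return error.ExpectedRoleMarker,
    };
    return kind;
}
```

A real parser would consume tokens and build `AsmOperand` nodes rather than return a kind; the sketch only shows that one token of lookahead after the constraint is enough to pick the role.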
Top-level/global asm (Deviation 6): recognize `asm {` at declaration scope and
build a dedicated `asm_global` decl (template only — reject operands/`volatile`).
Lexer/token: add `kw_asm` to the `Token.Tag` enum + keyword `StaticStringMap` in
`src/token.zig`; `volatile` and `clobbers` stay out of the global table
(contextual). **No new operator tokens** — `->` (`arrow`), `=` (`equal`), `.`
(`dot`) and `{}` already exist.
## II.5 sx Sema / typing
* **Result type** from the `out_value` operands (`-> Type`), in declaration order:
0 → `void` (and the asm **must** be `volatile`); 1 → that operand's type `T`;
N → a tuple `(T1,…,Tn)`, **named** when the operands carry `[name]`s
(`(name1: T1, …)`), positional otherwise. Implement in the expression typer
(`src/ir/expr_typer.zig` / wherever `inferExprType` lives), returning the resolved
`TypeId` (a tuple `TypeId` for N>1). **Do not** fall back to a silent default — an
unresolvable output type is a real error (CLAUDE.md silent-default rule): emit a
diagnostic and return the project's `.unresolved` sentinel.
* Port Zig's validation checklist (these are the user-facing error messages):
1. no output operand ⇒ the asm **must** be `volatile`;
2. each `out_value` result type must have a well-defined in-memory layout;
3. inputs must be runtime values; coerce comptime int→`i64`, float→`f64`;
4. template must be a comptime-known string;
5. (Phase 2) `out_place` cannot write a `const`; indirect-memory rules.
* Every `%[name]` referenced in the template must name an operand (best surfaced as
a Sema diagnostic; also caught at codegen during the rewrite — §II.6).
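The result-type rule and check 1 can be illustrated with a short sketch. The `Role` enum mirrors the proposal's AST roles, but `ResultShape`, `SemaError`, and `resultShape` are hypothetical names for illustration. The sketch also assumes that a Phase 2 `out_place` operand counts as an output for the "must be volatile" check, which follows Zig's behavior but is an interpretation of the checklist.

```zig
const std = @import("std");

const Role = enum { out_value, out_place, input };
const ResultShape = enum { void_result, scalar, tuple };
const SemaError = error{NoOutputRequiresVolatile};

// Derives the shape of the asm expression's result from its operand roles.
fn resultShape(roles: []const Role, is_volatile: bool) SemaError!ResultShape {
    var value_outputs: usize = 0;
    var any_output = false;
    for (roles) |role| {
        switch (role) {
            .out_value => {
                value_outputs += 1;
                any_output = true;
            },
            // Assumption: a write-through output also satisfies check 1.
            .out_place => {
                any_output = true;
            },
            .input => {},
        }
    }
    // Check 1: asm with no output operand must be marked volatile.
    if (!any_output and !is_volatile) return error.NoOutputRequiresVolatile;
    return switch (value_outputs) {
        0 => .void_result,
        1 => .scalar,
        else => .tuple,
    };
}
```

In the real typer the `.scalar` and `.tuple` cases would return a resolved `TypeId` rather than an enum tag; the sketch only captures the counting rule.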
### Operand naming rule (auto-name from a `{reg}` pin) — DECIDED
The `[name]` label on an operand is purely an sx-surface convenience: it provides
the `%[name]` template alias and (for `out_value`) the result tuple's field name.
LLVM never sees it (it sees positional `${N}` + the constraint). To kill the
common redundancy where a label just echoes its pinned register
(`[eax] "={eax}"`), the **operand name is derived as follows**, uniformly across
every operand kind (`out_value` / `out_place` / read-write / `input`):
1. **Explicit `[name]` wins** — use it verbatim (the `%[name]` alias / field name).
2. **Else, if the constraint pins a single register** — `"={eax}"`, `"{rdi}"`,
`"+{rax}"`, i.e. a `{reg}` body (optionally with a `=`/`+` prefix) — the operand
is **auto-named after that register** (`eax`, `rdi`, `rax`). Usable as
`%[eax]` and as the tuple field name.
3. **Else (register-class `=r`/`+r`/`r`, or memory `=m`, …)** — the operand has
**no implicit name**. A `[name]` is then **required** if the template
references it (`%[name]`) or, for `out_value`, if a named result field is
wanted; otherwise it is anonymous (positional tuple field).
Corollaries:
* **Reject the echo form.** An explicit `[name]` that is identical to the
register its own constraint pins (`[eax] "={eax}"`) carries no information —
emit a diagnostic ("redundant operand name `eax` — it already names the pinned
register; drop the `[eax]`"). The useful form is a label that *differs* from the
register (`[quot] "={rax}"` → field `quot` over register `rax`).
* **Result field names** (the §II.5 result-type rule above) come from each
`out_value`'s *effective* name — explicit `[name]`, else the auto-derived
register name; positional only when neither exists (a class-constrained output
with no `[name]`).
* This is a **typing-stage** rule: the parser still stores `name: ?[]const u8`
(null when no `[name]` was written); Sema computes the effective name. No
parser change.
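A minimal sketch of the naming rule and its "reject the echo form" corollary, assuming hypothetical helper names (`pinnedRegister`, `effectiveName`) and a hypothetical error name. It treats a constraint as a single-register pin only when its body, after an optional `=`/`+` prefix, is exactly one `{reg}`; that reading of "pins a single register" is an assumption.

```zig
const std = @import("std");

const NameError = error{RedundantOperandName};

// Returns the register named by a `{reg}` constraint body, or null for
// register-class ("=r") and memory ("=m") constraints.
fn pinnedRegister(constraint: []const u8) ?[]const u8 {
    var body = constraint;
    if (body.len > 0 and (body[0] == '=' or body[0] == '+')) body = body[1..];
    if (body.len < 3) return null; // shortest pin is "{x}"
    if (body[0] != '{' or body[body.len - 1] != '}') return null;
    const reg = body[1 .. body.len - 1];
    // Assumption: nested braces or ',' alternatives are not a single pin.
    for (reg) |ch| {
        if (ch == '{' or ch == '}' or ch == ',') return null;
    }
    return reg;
}

// Rule 1: explicit name wins. Rule 2: else the pinned register. Rule 3: else none.
fn effectiveName(explicit: ?[]const u8, constraint: []const u8) NameError!?[]const u8 {
    const pinned = pinnedRegister(constraint);
    if (explicit) |name| {
        if (pinned) |reg| {
            // Corollary: `[eax] "={eax}"` carries no information.
            if (std.mem.eql(u8, name, reg)) return error.RedundantOperandName;
        }
        return name;
    }
    return pinned;
}
```

For example, `effectiveName(null, "={eax}")` yields `eax`, `effectiveName("quot", "={rax}")` yields `quot`, and `effectiveName(null, "=r")` yields no name.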
Note: there is **no** "≤1 output" rule (that was Zig's limit; sx's tuples lift it).
## II.6 sx IR + LLVM codegen (the part that must match Zig bit-for-bit)
### IR op — `src/ir/inst.zig`
Add to `Op = union(enum)` (`src/ir/inst.zig:80`), next to `objc_msg_send`
(`:219`). Strings are interned (`StringId`, as `const_string` at `:85`); operands
are SSA `Ref`s:
```zig
inline_asm: InlineAsm,
pub const InlineAsm = struct {
    template: StringId, // interned, RAW (rewritten at emit)
    operands: []const AsmOperand, // declaration order (= %N indexing)
    clobbers: []const StringId, // interned dot-names: "rcx","cc","memory"
    has_side_effects: bool,
    // result rides on Inst.ty: void / a scalar TypeId / a tuple TypeId (N outputs)
};
pub const AsmOperand = struct {
    role: enum { out_value, out_place, input },
    name: StringId, // .none when unnamed
    constraint: StringId, // verbatim "={rax}" / "=r" / "+r" / "{rdi}"
    operand: Ref, // out_value → .none; out_place/input → the Ref
};
```
### Lowering — `src/ir/lower/expr.zig`
Add `.asm_expr => self.lowerAsmExpr(...)` to the `lowerExpr` dispatch. It interns
the template + constraint strings + clobber names, lowers each input operand to a
`Ref`, computes the result `TypeId` (§II.5), and emits the `inline_asm` op. (Same
shape as the existing `objc_msg_send` lowering.)
### Emit — `src/ir/emit_llvm.zig`
Add `.inline_asm => self.emitInlineAsm(...)` to the `emitInst` dispatch. This is a
**direct port of `FuncGen.airAssembly`**. Using the already-imported
`llvm_api.c`:
```zig
fn emitInlineAsm(self: *Emitter, inst: *const Inst, a: InlineAsm) void {
    // 1) result LLVM type + param types/values from constraints
    const ret_ty = self.lowerType(inst.ty); // void if no typed output
    var param_tys: ...; var args: ...; // one per `input` constraint
    // 2) assemble the LLVM constraint string (see algorithm below)
    //    outputs first ("=..."/"+..."), then inputs, then "~{reg}" clobbers, comma-joined
    // 3) rewrite the template %[name]->${N}, %%->%, %=->${:uid}, $->$$ (state machine below)
    const fn_ty = c.LLVMFunctionType(ret_ty, param_tys.ptr, n_params, 0);
    const asm_val = c.LLVMGetInlineAsm(
        fn_ty,
        rendered_template.ptr, rendered_template.len,
        constraint_str.ptr, constraint_str.len,
        @intFromBool(a.has_side_effects), // HasSideEffects (volatile)
        0, // IsAlignStack
        c.LLVMInlineAsmDialectATT, // AT&T (Deviation: none — matches Zig default)
        0, // CanThrow
    );
    const result = c.LLVMBuildCall2(self.builder, fn_ty, asm_val, args.ptr, n_params, "");
    self.mapRef(inst, result); // 1 output: the value; N: extractvalue i per out_value → tuple
}
```
(Optionally cache the asm value keyed by `(template, constraints, fn_ty)` the way
`emit_llvm.zig:167` caches `objc_msg_send_value` — but per-site construction is
fine; LLVM uniques inline-asm constants internally.)
**Constraint-string assembler (port of `FuncGen.airAssembly`):**
```
parts = []
for op in operands where role == out_value or out_place:     # outputs first
    parts.append( op.constraint with ',' replaced by '|' )   # "={rax}", "=r", "+r" …
for op in operands where role == input:
    parts.append( op.constraint with ',' replaced by '|' )   # "{rdi}", "r" …
for name in clobbers:                                        # from clobbers(.name,…)
    parts.append( "~{" + name + "}" )                        # "~{rcx}", "~{cc}", "~{memory}"
constraint_str = ",".join(parts)
```
LLVM return type follows the `out_value` count: **0** → `void`; **1** → that type;
**N** → an anonymous struct `{T1,…,Tn}` — after the call, `extractvalue i` per
`out_value` builds the sx tuple (the multi-return path, §II.2 Dev 5). `out_place`
outputs are `store`d through their `Ref` afterward instead.
For `sys_write` (one output): constraint
`={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}`, `fn_ty = i64 (i64,i64,ptr,i64)`,
`args = [1, fd, buf, len]`, `sideeffect = true`. For `divmod` (two outputs):
`={rax},={rdx},{rax},{rdx},r,~{cc}`, `fn_ty = {i64,i64} (i64,i64,i64)`, and the two
`extractvalue`s become the `(quot, rem)` tuple.
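The constraint-string assembler can be sketched in Zig as follows. This is an illustration of the pseudocode above, not the real emitter: `Role`, `Operand`, `Sink`, and `assembleConstraints` are hypothetical stand-ins, a fixed caller-supplied buffer replaces whatever allocator the emitter would use, and clobber names are plain strings instead of interned `StringId`s.

```zig
const std = @import("std");

// Hypothetical, simplified mirror of the proposal's operand record.
const Role = enum { out_value, out_place, input };
const Operand = struct { role: Role, constraint: []const u8 };

// Fixed-buffer sink so the sketch needs no allocator.
const Sink = struct {
    buf: []u8,
    len: usize = 0,
    parts: usize = 0,

    fn put(self: *Sink, bytes: []const u8) error{OutOfSpace}!void {
        if (bytes.len > self.buf.len - self.len) return error.OutOfSpace;
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }

    // Starts a new comma-separated part.
    fn beginPart(self: *Sink) error{OutOfSpace}!void {
        if (self.parts != 0) try self.put(",");
        self.parts += 1;
    }
};

// Appends one constraint, rewriting ',' (alternatives) to LLVM's '|'.
fn putConstraint(sink: *Sink, constraint: []const u8) error{OutOfSpace}!void {
    try sink.beginPart();
    for (constraint) |byte| {
        const mapped = [1]u8{if (byte == ',') '|' else byte};
        try sink.put(&mapped);
    }
}

fn assembleConstraints(
    operands: []const Operand,
    clobbers: []const []const u8,
    out: []u8,
) error{OutOfSpace}![]const u8 {
    var sink = Sink{ .buf = out };
    // Pass 1: outputs (value and place), in source order.
    for (operands) |op| {
        if (op.role != .input) try putConstraint(&sink, op.constraint);
    }
    // Pass 2: inputs, in source order.
    for (operands) |op| {
        if (op.role == .input) try putConstraint(&sink, op.constraint);
    }
    // Clobbers become "~{name}".
    for (clobbers) |name| {
        try sink.beginPart();
        try sink.put("~{");
        try sink.put(name);
        try sink.put("}");
    }
    return sink.buf[0..sink.len];
}
```

Fed the `sys_write` and `divmod` operand lists, this reproduces the two constraint strings quoted above. Because outputs are collected in a separate pass, an input written before an output in the source still lands after it in the LLVM string.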
**Template rewriter (port verbatim from `FuncGen.zig:2735-2802`):** state machine
over the template bytes with a `name_map: [name] -> positional index` built from
`outputs ++ inputs`:
```
state start:    '%' -> percent ; '$' -> emit "$$" ; else emit byte
state percent:  '%' -> emit '%', start
                '[' -> emit "${", state input
                '=' -> emit "${:uid}", start
                else -> emit '%', emit byte, start
state input:    ']' -> emit name_map[name], emit '}', start
                ':' -> emit name_map[name], emit ':', state modifier
                else accumulate name
state modifier: ']' -> emit accumulated modifier, emit '}', start
                else accumulate
```
An unknown `%[name]` is a hard error (mirror Zig's `todo`/diagnostic — **not** a
silent pass-through; CLAUDE.md no-silent-arms rule).
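The state machine above can be sketched in Zig as follows. This is a self-contained illustration, not the code from `FuncGen.zig`: `Named`, `Sink`, `RewriteError`, and `rewriteTemplate` are hypothetical names, the name map is a plain slice, and output goes to a caller-supplied buffer. Two behaviors are assumptions the state table does not specify: a trailing lone `%` is kept literally, and a template that ends inside `%[...` is reported as an error.

```zig
const std = @import("std");

// Hypothetical name-map entry: operand `[name]` -> positional index.
const Named = struct { name: []const u8, index: usize };

const RewriteError = error{ UnknownOperand, UnterminatedOperand, OutOfSpace };

// Fixed-buffer sink so the sketch needs no allocator.
const Sink = struct {
    buf: []u8,
    len: usize = 0,

    fn put(self: *Sink, bytes: []const u8) RewriteError!void {
        if (bytes.len > self.buf.len - self.len) return error.OutOfSpace;
        @memcpy(self.buf[self.len..][0..bytes.len], bytes);
        self.len += bytes.len;
    }
};

fn lookup(names: []const Named, name: []const u8) ?usize {
    for (names) |entry| {
        if (std.mem.eql(u8, entry.name, name)) return entry.index;
    }
    return null;
}

fn rewriteTemplate(template: []const u8, names: []const Named, out: []u8) RewriteError![]const u8 {
    var sink = Sink{ .buf = out };
    var state: enum { start, percent, input, modifier } = .start;
    var mark: usize = 0; // where the current name or modifier begins
    var digits: [20]u8 = undefined;

    for (template, 0..) |byte, i| {
        const here = template[i .. i + 1];
        switch (state) {
            .start => switch (byte) {
                '%' => state = .percent,
                '$' => try sink.put("$$"), // escape LLVM's '$'
                else => try sink.put(here),
            },
            .percent => {
                state = .start;
                switch (byte) {
                    '%' => try sink.put("%"),
                    '=' => try sink.put("${:uid}"),
                    '[' => {
                        try sink.put("${");
                        mark = i + 1;
                        state = .input;
                    },
                    else => {
                        try sink.put("%");
                        try sink.put(here);
                    },
                }
            },
            .input => if (byte == ']' or byte == ':') {
                // An unknown name is a hard error, never a pass-through.
                const index = lookup(names, template[mark..i]) orelse return error.UnknownOperand;
                try sink.put(std.fmt.bufPrint(&digits, "{d}", .{index}) catch unreachable);
                if (byte == ':') {
                    try sink.put(":");
                    mark = i + 1;
                    state = .modifier;
                } else {
                    try sink.put("}");
                    state = .start;
                }
            },
            .modifier => if (byte == ']') {
                try sink.put(template[mark..i]);
                try sink.put("}");
                state = .start;
            },
        }
    }
    switch (state) {
        .start => {},
        .percent => try sink.put("%"), // assumption: keep a trailing lone '%'
        .input, .modifier => return error.UnterminatedOperand,
    }
    return sink.buf[0..sink.len];
}
```

With the `divmod` name map (`quot` = 0, `rem` = 1, and `d` = 4 because the two unnamed inputs take indices 2 and 3), the template `divq %[d]` becomes `divq ${4}`.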
### Interpreter — `src/ir/interp.zig`
Inline asm cannot be comptime-evaluated. In the interpreter's op switch:
```zig
.inline_asm => return bailDetail("inline asm requires native execution; not available at comptime"),
```
(Same `bailDetail` pattern as the Obj-C/JNI ops — surfaces `op=inline_asm: ...`
rather than a silent default.)
### Global asm (Deviation 6)
Lower the top-level `asm_global` decl to a one-shot emit:
`c.LLVMAppendModuleInlineAsm(module, src.ptr, src.len)` (present in the linked
LLVM — `@19/include/llvm-c/Core.h:971`). No operands, no rewrite, no volatile;
multiple blocks concatenate in source order (as Zig does).
**Calling into an asm-defined symbol needs no new machinery** — declare it with a
lib-less `extern` (Deviation 6, §II.2): `my_func :: (sig) -> R extern;` emits
an external-linkage, raw-named, C-ABI extern that the linker resolves against the
`.global` the asm block defines.
**Guard (CLAUDE.md no-silent-arms):** a global-asm symbol exists only in the final
linked binary, not in the `#run`/JIT host process. The interpreter resolves
externs via `dlsym(RTLD_DEFAULT)` (`host_ffi.zig`), which won't find it — calling
such a symbol at comptime must fail **loudly** (it should already, via the
dlsym-miss diagnostic; pin it with a test). Edge case: a symbol referenced *only*
by other asm/external code may need `llvm.used` / `.no_dead_strip` to survive
dead-stripping; the common "sx references it" case is safe.
## II.7 Stage-to-file map (implementation checklist)
| Stage | Zig reference | sx file + insertion point | New code |
|---|---|---|---|
| Keyword | `tokenizer.zig` keywords | `src/token.zig` — `Token.Tag` + keyword `StaticStringMap` | `kw_asm` (+ contextual `volatile`) |
| AST node | `Ast.zig:2797,3789` | `src/ast.zig:13,85,721` — `Node.Data` + new `AsmExpr`/`AsmOperand` | ~25 lines |
| Parser | `Parse.zig:2771-2883` | `src/parser.zig` — `parsePrimary` `.kw_asm` case + global-asm at decl scope | ~120 lines |
| Sema/typing | `Sema.zig:15044` | `src/ir/expr_typer.zig` (`inferExprType`) + validation | ~80 lines |
| IR op | `Air.zig:1485`, `Zir.zig:2531` | `src/ir/inst.zig:80` — `inline_asm: InlineAsm` | ~25 lines |
| Lowering | `AstGen.zig:8553` | `src/ir/lower/expr.zig` — `lowerExpr` `.asm_expr` case | ~60 lines |
| LLVM emit | `FuncGen.zig:2473-2852` | `src/ir/emit_llvm.zig` — `emitInst` `.inline_asm` case | ~120 lines (constraint asm + template rewrite + `LLVMGetInlineAsm`/`BuildCall2`) |
| Global asm | `Sema.addGlobalAssembly` + `module asm` | decl lowering → `c.LLVMAppendModuleInlineAsm` | ~15 lines |
| Interp bail | n/a | `src/ir/interp.zig` op switch | 1 line |

No change to `src/codegen.zig` is needed (the IR/LLVM path owns this).
## II.8 Phasing
* **Phase 1 (MVP).** `asm { … }` block; `asm volatile`; string-literal/`#string`
template; `= expr` inputs; `-> Type` outputs **including N→tuple multi-return**;
`clobbers(.…)` dot-name list; `%[name]`/`%%` substitution; "no-output ⇒ volatile"
check; AT&T. Target: Linux/macOS `x86_64` + `aarch64` syscalls, intrinsics, and
multi-value ops (`divmod`, `cpuid`, `add_carry`).
* **Phase 2.** `-> @place` write-through outputs, read-write (`"+r" -> @place`) and
indirect-memory (`"=*m"`) constraints, `%=` unique-id, output-to-const rejection.
* **Phase 3.** Global/module asm decl (`LLVMAppendModuleInlineAsm`) + the
comptime-call guard, plus Intel-dialect opt-in. This phase is small because the
extern-call path already exists (lib-less `extern`).
* **Phase 4 (optional).** Upgrade `clobbers(.name)` from dot-name sugar to a
compile-time-checked per-architecture `Clobber` enum (typo-checking; same syntax).
* **Phase 5 (optional).** Naked functions (`callconv`-equivalent) for full
freestanding entry points.
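
The two Sema rules named in the phases above ("no-output ⇒ volatile" in Phase 1, output-to-const rejection in Phase 2) could be checked roughly as follows. This is a minimal C sketch under assumptions, not sx's or Zig's actual code: `AsmOutput`, `AsmCheck`, and `check_asm` are hypothetical names, and the real check would live in the Sema/typing stage and report a source location.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes for the validation pass. */
typedef enum {
    ASM_OK,
    ASM_ERR_NEEDS_VOLATILE,   /* no outputs and not marked volatile */
    ASM_ERR_OUTPUT_TO_CONST   /* `-> @place` targets a constant */
} AsmCheck;

/* Hypothetical summary of one output operand. */
typedef struct {
    bool is_place;        /* `-> @place` write-through (Phase 2) vs `-> Type` */
    bool place_is_const;  /* only meaningful when is_place is true */
} AsmOutput;

AsmCheck check_asm(const AsmOutput *outs, size_t n_outs, bool is_volatile) {
    /* An asm with no outputs has no observable result, so it must be
     * volatile or the optimizer would be free to delete it. */
    if (n_outs == 0 && !is_volatile)
        return ASM_ERR_NEEDS_VOLATILE;
    /* A write-through output must not target a constant place. */
    for (size_t i = 0; i < n_outs; i++) {
        if (outs[i].is_place && outs[i].place_is_const)
            return ASM_ERR_OUTPUT_TO_CONST;
    }
    return ASM_OK;
}
```

Returning a distinct code per rule keeps each diagnostic testable on its own, which matches the plan to pin the missing-volatile error with a dedicated example.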
## II.9 Testing
Inline-asm output is target-specific, so tests must pin a target and assert on
the emitted IR or the exit code; run a test host-natively only when the host
matches the pinned target. Use the existing
corpus harness and the **`16xx` platform block** (the closest fit in the
`XXXX-category` scheme; `specs.md`/CLAUDE.md test-layout). Mirror Zig's own
matrix:
* `examples/16xx-platform-asm-syscall-write.sx` — x86_64-linux write(2), assert exit/stdout.
* `examples/16xx-platform-asm-register-read.sx` — `mov %%rsp,%[out]`, no-input output.
* `examples/16xx-platform-asm-no-output-volatile.sx` — bare `asm volatile { "nop" }`.
* `examples/16xx-platform-asm-missing-volatile.sx` — **expected compile error**
(no output, no volatile) — pins the diagnostic.
* `examples/16xx-platform-asm-template-subst.sx` — `%[a]`/`%%` rewriting, assert
on the `sx ir`/`.s` snapshot.
* `examples/16xx-platform-asm-multi-return.sx` — `divmod` → `(quot, rem)` tuple, destructured.
* `examples/16xx-platform-asm-global.sx` (Phase 3) — global asm + extern call.

Add an IR/`.s` snapshot (`expected/*.ir`) for the substitution test so the
constraint-string + template-rewrite output is locked. Seed markers and
regenerate with `zig build test -Dupdate-goldens`, then review the diff
(CLAUDE.md snapshot-integrity rule).
## II.10 Open decisions for the user
Largely settled through design review; what remains:
1. **Dialect:** AT&T only (Zig's default) for v1, or expose an Intel opt-in
(`LLVMInlineAsmDialectIntel`) from the start? **Recommend AT&T-only v1.**
2. **`volatile` keyword (Deviation 4):** contextual *(recommended, no
source-compat risk)* vs globally reserved *(simpler lexer)*.
3. **Brace separator:** comma *(recommended — trailing-comma-friendly,
literal-style)* vs `;` *(matches sx statement blocks)*.
4. **Asm-symbol extern spelling (Deviation 6): RESOLVED** — use the lib-less `extern`
keyword to call *into* an asm symbol (import), and `export` for the reverse
direction (an sx function asm can call *back into*). The dedicated linkage
keywords landed (FFI-linkage stream), so no new surface is needed and both
directions are covered.

*Decided:* brace block `{ … }` (Dev 1) · `->`/`=` markers, `:` sections dropped,
`<-` rejected (Dev 2) · `clobbers(.…)` enum-literal list, dot-name sugar now →
checked enum later (Dev 3) · multiple value-outputs return a tuple (Dev 5). For
global asm (Dev 6) the call-*into*-asm direction reuses lib-less `extern` (Decision
4, resolved).
## II.11 Risks
* **Constraint/template mistakes fail silently:** a bad constraint string
miscompiles with no diagnostic. Mitigation: port Zig's assembler/rewrite
verbatim (don't paraphrase) and lock IR snapshots in tests.
* **Register-name validity is unchecked** in v1's `clobbers(.name)` dot-name form —
a typo'd register (`.raxx`) surfaces only as an LLVM error. This is exactly the
gap the Phase-4 checked `Clobber` enum closes; acceptable for v1 (LLVM validates
the emitted `~{…}`).
* **`#string` heredoc + AT&T `%`/`$`** interplay: ensure the heredoc delivers the
template bytes literally (no sx-level escape processing of `%`/`$`) before the
rewrite stage.
* **Target gating:** asm examples must declare their target or they break the
corpus on other hosts; the test plan pins targets.
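
To make the first two risks concrete, here is a minimal C sketch of how the LLVM constraint string might be assembled: outputs first, then inputs, then each clobber wrapped as `~{name}`. `build_constraints` and `append_item` are hypothetical names; the sketch assumes operand constraints are already in LLVM spelling (as in the cookbook), adds no target-specific default clobbers, and does not validate register names, which is exactly the gap described above.

```c
#include <stddef.h>
#include <stdio.h>

/* Appends pre+s+post to out, comma-separated from any earlier item.
 * Returns -1 instead of truncating when the buffer is too small. */
static int append_item(char *out, size_t cap, size_t *len,
                       const char *pre, const char *s, const char *post) {
    int n = snprintf(out + *len, cap - *len, "%s%s%s%s",
                     *len ? "," : "", pre, s, post);
    if (n < 0 || (size_t)n >= cap - *len) return -1;
    *len += (size_t)n;
    return 0;
}

/* Joins outputs, inputs, and clobbers into one constraint string.
 * Returns 0 on success, -1 on overflow. */
int build_constraints(const char *const *outputs, size_t n_out,
                      const char *const *inputs, size_t n_in,
                      const char *const *clobbers, size_t n_clob,
                      char *out, size_t cap) {
    size_t len = 0;
    if (cap == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n_out; i++)
        if (append_item(out, cap, &len, "", outputs[i], "") != 0) return -1;
    for (size_t i = 0; i < n_in; i++)
        if (append_item(out, cap, &len, "", inputs[i], "") != 0) return -1;
    /* A clobber such as `.rcx` is emitted as `~{rcx}`; a typo like `.raxx`
     * would pass through unchecked in this form. */
    for (size_t i = 0; i < n_clob; i++)
        if (append_item(out, cap, &len, "~{", clobbers[i], "}") != 0) return -1;
    return 0;
}
```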
---
## Appendix A — exact LLVM-C calls (already reachable via `llvm_api.c`)
```c
// src/llvm_api.zig @cInclude("llvm-c/Core.h") exposes all of these:
LLVMValueRef LLVMGetInlineAsm(LLVMTypeRef Ty,
const char *AsmString, size_t AsmStringSize,
const char *Constraints, size_t ConstraintsSize,
LLVMBool HasSideEffects, LLVMBool IsAlignStack,
LLVMInlineAsmDialect Dialect, LLVMBool CanThrow); // LLVM 19 & 21: identical
LLVMValueRef LLVMBuildCall2(LLVMBuilderRef, LLVMTypeRef, LLVMValueRef Fn,
LLVMValueRef *Args, unsigned NumArgs, const char *Name);
void LLVMAppendModuleInlineAsm(LLVMModuleRef M, const char *Asm, size_t Len); // global asm
// enum: LLVMInlineAsmDialectATT, LLVMInlineAsmDialectIntel
```
## Appendix B — file index
**Zig (reference, `~/projects/zig`):** `lib/std/zig/tokenizer.zig` (keywords) ·
`lib/std/zig/Ast.zig:2797,3789,3969` (nodes) · `lib/std/zig/Parse.zig:2771-2883`
(grammar) · `lib/std/zig/AstGen.zig:8553-8669,12257` + `lib/std/zig/Zir.zig:2531`
(ZIR) · `src/Sema.zig:15044-15231` (validation) · `src/Air.zig:1485` (AIR) ·
`src/codegen/llvm/FuncGen.zig:2473-2852` + `lib/std/zig/llvm/Builder.zig:6131`
(LLVM) · `doc/langref/inline_assembly.zig`, `doc/langref/test_global_assembly.zig`
(syntax) · `doc/langref.html.in:4217-4300` (spec).

**sx (target, `~/projects/sx`):** `src/token.zig` · `src/lexer.zig:402` (#string) ·
`src/ast.zig:13` · `src/parser.zig` (`parsePrimary`), the optional `extern`
library tail · `src/ir/expr_typer.zig` · `src/ir/inst.zig:80,219,260` ·
`src/ir/lower/expr.zig` · `src/ir/module.zig:300` (`declareExtern`) ·
`src/ir/emit_llvm.zig:167` (msgSend cache), `:1244` (extern⇒C-ABI), `:1279`
(raw symbol name) · `src/ir/interp.zig` (`bailDetail`) · `src/llvm_api.zig:1-17` ·
`build.zig:10` (LLVM@22).
## Appendix C — Cookbook (final form: `asm { … }`, `->`/`=`, `clobbers(.…)`, pure AT&T)
```sx
// ── v1 ────────────────────────────────────────────────────────────────────
asm volatile { "nop" }; // bare side-effecting
// write(2) syscall — register-pinned inputs, one value-output
sys_write :: (fd: i64, buf: [*]u8, len: u64) -> i64 {
return asm volatile {
"syscall",
"={rax}" -> i64,
"{rax}" = 1, "{rdi}" = fd, "{rsi}" = buf, "{rdx}" = len,
clobbers(.rcx, .r11, .memory),
};
}
// mmap — full 6-arg syscall ABI (arg4 in r10, not rcx)
mmap :: (addr: *void, len: u64, prot: i32, flags: i32, fd: i32, off: i64) -> *void {
return asm volatile {
"syscall",
"={rax}" -> *void,
"{rax}" = 9, "{rdi}" = addr, "{rsi}" = len, "{rdx}" = prot,
"{r10}" = flags, "{r8}" = fd, "{r9}" = off,
clobbers(.rcx, .r11, .memory),
};
}
// AT&T scaled-index addressing — arr[i]
load_idx :: (arr: *i64, i: u64) -> i64 {
return asm {
"movq (%[arr],%[i],8), %[out]",
[out] "=r" -> i64, [arr] "r" = arr, [i] "r" = i,
};
}
// CPUID AVX probe — immediates, heavy clobber set, single value-result
has_avx :: () -> bool {
return asm volatile {
#string ATT
movl $1, %%eax
cpuid
andl $0x10000000, %%ecx
setne %[ok]
ATT,
[ok] "=r" -> bool,
clobbers(.rax, .rbx, .rcx, .rdx, .cc),
};
}
// SSE packed add — xmm regs, no outputs ⇒ volatile
vadd4 :: (a: *f32, b: *f32, out: *f32) {
asm volatile {
#string ATT
movups (%[a]), %%xmm0
movups (%[b]), %%xmm1
addps %%xmm1, %%xmm0
movups %%xmm0, (%[out])
ATT,
[a] "r" = a, [b] "r" = b, [out] "r" = out,
clobbers(.xmm0, .xmm1, .memory),
};
}
// ── multi-return (v1; sx has tuples, Zig caps at one output) ────────────────
// 64-bit divide → (quotient, remainder)
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64,
[rem] "={rdx}" -> u64,
"{rax}" = n, "{rdx}" = 0, [d] "r" = d,
clobbers(.cc),
};
}
// rdtsc → two 32-bit halves, destructured straight out of the asm
rdtsc :: () -> u64 {
lo, hi := asm volatile {
"rdtsc",
[lo] "={eax}" -> u32,
[hi] "={edx}" -> u32,
};
return (xx hi << 32) | xx lo;
}
// cpuid → a clean 4-tuple
cpuid :: (leaf: u32, subleaf: u32) -> (eax: u32, ebx: u32, ecx: u32, edx: u32) {
return asm volatile {
"cpuid",
[eax] "={eax}" -> u32, [ebx] "={ebx}" -> u32,
[ecx] "={ecx}" -> u32, [edx] "={edx}" -> u32,
"{eax}" = leaf, "{ecx}" = subleaf,
};
}
// add-with-carry → (sum, carry): value-output + tied input + flag capture
add_carry :: (a: u64, b: u64) -> (sum: u64, carry: u8) {
return asm {
#string ATT
addq %[b], %[sum]
setc %[carry]
ATT,
[sum] "=r" -> u64,
[carry] "=r" -> u8,
[a] "0" = a, [b] "r" = b,
clobbers(.cc),
};
}
// ── Phase 2 (write-through / read-write / indirect) ─────────────────────────
// byte memcpy — labels, loop, read-write operands
memcpy_bytes :: (dst: [*]u8, src: [*]u8, n: u64) {
d := dst; s := src; c := n;
asm volatile {
#string ATT
testq %[c], %[c]
jz 2f
1: movb (%[s]), %%al
movb %%al, (%[d])
incq %[s]
incq %[d]
decq %[c]
jnz 1b
2:
ATT,
[d] "+r" -> @d, [s] "+r" -> @s, [c] "+r" -> @c,
clobbers(.rax, .cc, .memory),
};
}
// lock cmpxchg CAS — lock prefix, pinned read-write rax, two outputs
cas :: (ptr: *i64, expected: i64, desired: i64) -> bool {
old := expected; ok: bool = ---;
asm volatile {
#string ATT
lock cmpxchgq %[desired], (%[ptr])
sete %[ok]
ATT,
[ok] "=r" -> @ok,
[old] "+{rax}" -> @old,
[ptr] "r" = ptr,
[desired] "r" = desired,
clobbers(.cc, .memory),
};
return ok;
}
// fill an existing struct (write-through, no tuple)
cpuid_into :: (out: *CpuId, leaf: u32) {
asm volatile {
"cpuid",
"={eax}" -> @out.eax, "={ebx}" -> @out.ebx,
"={ecx}" -> @out.ecx, "={edx}" -> @out.edx,
"{eax}" = leaf,
};
}
```
Global asm + extern (Phase 3):
```sx
asm {
#string ATT
.global my_add
my_add:
lea (%rdi,%rsi,1), %eax
retq
ATT,
};
my_add :: (a: i32, b: i32) -> i32 extern; // lib-less extern = Zig's `extern fn`
```