sx/docs/inline-assembly.md
agra 0e0ee40528 docs(asm): symbol refs are portable — explain the auto-:c mechanism
Updates the symbol-operand guide: x86 now uses the same plain %[fn] as
aarch64, and a 'How the portability works' note explains the mechanism
(compiler auto-injects LLVM's :c modifier for "s" operands, equivalent
to GCC :P/%P0 for x86 calls, no-op on aarch64, overridable). Drops the
stale per-arch :P guidance; checkpoint updated.
2026-06-16 09:05:15 +03:00


# Inline Assembly in sx
A guide to writing inline assembly in sx — emitting raw target
instructions, wiring values in and out, writing through memory, and
defining whole routines in assembly.
> Looking for the *why* behind the design (how it maps to LLVM, the
> Zig comparison, the emit algorithm)? That lives in
> [inline-asm-design.md](../design/inline-asm-design.md). This page is the
> user-facing how-to.
---
## The mental model
`asm` is an **expression**. It drops to the machine: you write a
template of real instructions, declare which sx values feed registers
going in and which come back out, and the block evaluates to the
output value (or a tuple of them).
```sx
add :: (a: i64, b: i64) -> i64 {
return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b };
}
```
Three things to know up front:
1. **The body is a brace block of comma-separated parts:** the template
string first, then operands, then an optional `clobbers(.…)` clause.
2. **Each operand is tagged by role**, not by position: `-> Type` is a
value output, `= expr` is an input, `-> @place` writes through to
existing storage. The list is flat and order-independent — there are
no positional `:` sections.
3. **The outputs decide the result.** Zero outputs → `void` (and the
block must be `volatile`); one → that type; many → a tuple.

Templates use the target's assembler syntax (**AT&T syntax** on x86, lowered through LLVM), are **target-specific**,
and **never run at compile time**; see [When it runs](#when-it-runs).

---
## Operands
An operand is `[name]? "constraint" <role>`. The constraint string is
the LLVM/GCC-style constraint; the role marker says what the operand
does.
### Inputs — `= expr`
`= expr` feeds a value in. The constraint picks where it lands:
```sx
[a] "r" = a // any general register
"{rdi}" = fd // pinned to a specific register (x86_64 rdi)
```
### Symbol inputs — `"s" = fn`
A `"s"` input feeds a **function or global symbol** (not a runtime value).
In the template, `%[name]` expands to the symbol's **platform-mangled
name**, so you can branch or call straight to it:
```sx
cb :: (n: i64) -> i64 export "cb" { return n + 1; }
trampoline :: (n: i64) -> i64 {
return asm volatile {
#string ASM
mov x0, %[arg]
bl %[fn] // DIRECT call — `bl _cb` on macOS, `bl cb` on Linux
mov %[res], x0
ASM,
[res] "=r" -> i64,
[arg] "r" = n,
[fn] "s" = cb, // symbol operand
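// a C-ABI callee may trash any caller-saved integer register, not only x0
clobbers(.x0, .x1, .x2, .x3, .x4, .x5, .x6, .x7, .x8, .x9, .x10, .x11, .x12, .x13, .x14, .x15, .x16, .x17, .x30, .memory),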
};
}
```
The same `%[fn]` works on **x86_64** — just the branch mnemonic differs:
```sx
return asm volatile {
"call %[fn]", // x86_64 — same portable %[fn]
[ret] "={rax}" -> i64,
"{rdi}" = n,
[fn] "s" = cb,
clobbers(.rcx, .rdx, .rsi, .r8, .r9, .r10, .r11, .memory),
};
```
Two reasons to prefer this over passing a function *pointer* in a plain
`"r"` register and using an indirect `blr`/`call *`:
- **One fewer indirection** — a direct PC-relative branch, no pointer
load into a register, and a predictable (non-indirect) branch.
- **Portable** — `%[fn]` is the same on every target; the backend emits
the correctly-mangled name, so you never hardcode the macOS leading
underscore *or* a per-arch operand modifier.

**How the portability works.** A bare `%[fn]` would render differently
per target: on x86 the symbol prints as `$cb` (an immediate `$`-prefix
that `call` rejects), while aarch64 prints it bare. So for a symbol (`"s"`)
operand the compiler **auto-injects LLVM's `:c` operand modifier** (`%[fn]`
→ `${N:c}`, "print the constant with no punctuation"). `:c` prints the
plain symbol on every target — equivalent to the GCC `:P`/`%P0` call-target
idiom on x86 (both emit the same `R_X86_64_PLT32` relocation) and a no-op
on aarch64. You can still override it with an explicit `%[fn:X]` if you
ever need a different rendering, but for a call/branch you never should.

The callee needs a stable, externally-linked symbol, i.e. `export`
(which also gives it the C ABI). A plain or `callconv(.c)`-only function
is `internal` and gets dead-code-eliminated, so the symbol won't link.
(A global-scope `asm { … }` routine has no operand list, so it can't use
a symbol operand; it references the literal symbol in its text.)
### Value outputs — `-> Type`
`-> Type` produces a value that becomes (part of) the block's result:
```sx
[out] "=r" -> i64 // result in any register
"={rax}" -> i64 // result pinned to rax
```
### Naming and `%[name]`
Inside the template, `%[name]` refers to an operand by its **effective
name**. An operand pinned to a register is **auto-named after that
register** — `"{rdi}"` is reachable as `%[rdi]`, `"={rax}"` as `%[rax]`
— so an explicit `[name]` is only needed:
- for a register-**class** operand (`"=r"`, `"r"`), which has no register
to name it; or
- to give a pinned operand a name *different* from its register.

Two labels are rejected so names stay unambiguous:
- the **echo form** `[rax] "={rax}"` — the label just repeats the pin, so
drop it (the operand is already `%[rax]`); and
- **duplicate** operand names.
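
As a small illustration of auto-naming, the sketch below refers to two pinned operands by their register names without declaring any `[name]`. The function name `pass_through` is hypothetical, and the template assumes x86_64 with AT&T operand order (source first):

```sx
// x86_64: copy the argument from rdi into rax and return it
pass_through :: (x: i64) -> i64 {
return asm {
"movq %[rdi], %[rax]", // both names come from the pins below
"={rax}" -> i64, // output, auto-named rax
"{rdi}" = x, // input, auto-named rdi
};
}
```
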

In the template, `%%` is a literal `%`, and `%=` expands to a unique id
(handy for a local label that must differ across inlinings).
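
A sketch of `%=` in practice, assuming aarch64 and a hypothetical helper `abs64`: the label gets a unique suffix, so two inlined copies of the block do not define the same label twice. The output is written first, but `%[x]` is not read again afterward, so the two operands sharing a register would be harmless here.

```sx
// aarch64: absolute value with a local forward branch
abs64 :: (x: i64) -> i64 {
return asm {
#string ASM
mov %[out], %[x]
cmp %[out], #0
b.ge .Lnonneg%=
neg %[out], %[out]
.Lnonneg%=:
ASM,
[out] "=r" -> i64,
[x] "r" = x,
clobbers(.cc), // cmp sets the condition flags
};
}
```
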
### The result type
The number of **value** outputs (`-> Type`) decides the block's type:
| `-> Type` outputs | result | example |
|---|---|---|
| 0 | `void` — must be `volatile` | `asm volatile { "dmb ish" }` |
| 1 | that type `T` | `x := asm { …, "=r" -> i64 }` |
| N | a **tuple**, fields named by each operand's name | `lo, hi := asm { … }` |

With multiple outputs you get real multiple return values; a named
operand becomes a named tuple field:
```sx
// aarch64 — split a value into low/high bytes
split :: (x: u64) -> (lo: u64, hi: u64) {
return asm {
#string ASM
and %[l], %[x], #0xff
lsr %[h], %[x], #8
ASM,
[l] "=&r" -> u64, // → .lo (operand 0); early-clobber, since %[x] is read again after %[l] is written
[h] "=r" -> u64, // → .hi (operand 1)
[x] "r" = x,
};
}
lo, hi := split(0x1234); // (0x34, 0x12) = (52, 18)
```
---
## `volatile`
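`asm volatile { … }` marks the block as having side effects, so the
optimizer won't delete it or reorder it against other side-effecting
operations. Following the usual GCC/LLVM convention, do not rely on
`volatile` alone to order ordinary loads and stores around the block;
declare `clobbers(.memory)` for that. It is **required whenever there are
no value outputs**; a result-less, non-volatile asm would be dead code.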
```sx
barrier :: () { asm volatile { "dmb ish", clobbers(.memory) }; } // aarch64 full barrier
```
A block with outputs may still be `volatile` when its effects matter
beyond the returned value (e.g. a syscall).
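
One case where this matters is reading a register whose value changes between reads. The sketch below assumes aarch64 and an OS that allows user-mode reads of the virtual counter `cntvct_el0`; the name `ticks` is hypothetical. Without `volatile`, two identical reads could be merged into one, because the block has no inputs to tell them apart.

```sx
// aarch64: read the virtual counter; volatile keeps every read
ticks :: () -> u64 {
return asm volatile { "mrs %[out], cntvct_el0", [out] "=r" -> u64 };
}
```
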

---
## `clobbers(.…)`
`clobbers(.…)` is a dot-name list of registers and flags the asm trashes
that aren't already operands — so the register allocator keeps clear of
them:
```sx
clobbers(.rcx, .r11, .memory) // x86_64 syscall trashes rcx, r11, and memory
clobbers(.cc) // condition flags
```
`.memory` means "this asm reads or writes memory the compiler can't see,"
and `.cc` means "the condition flags are modified."
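
A minimal sketch of a `.cc` clobber, assuming aarch64 and a hypothetical helper `is_negative`: `cmp` overwrites the condition flags, which are not an operand, so they are listed as clobbered.

```sx
// aarch64: returns 1 when x is negative, else 0
is_negative :: (x: i64) -> i64 {
return asm {
#string ASM
cmp %[x], #0
cset %[out], lt
ASM,
[out] "=r" -> i64,
[x] "r" = x,
clobbers(.cc), // cmp modifies the flags
};
}
```
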

---
## Writing through memory — `-> @place`
Sometimes the asm should write into existing storage (a local, a struct
field) rather than *return* a value. `-> @place` does that: the place
output does **not** join the result tuple. There are three forms,
distinguished by the constraint.
### Write-through — `= …` constraint
The asm computes a value into a register; sx stores it through the
place's address afterward.
```sx
compute :: () -> i64 {
other : i64 = 0;
main_val := asm volatile {
#string ASM
mov %[m], #5
mov %[o], #37
ASM,
[m] "=r" -> i64, // value output → returned into main_val
[o] "=r" -> @other, // place output → stored through @other
};
return main_val + other; // 5 + 37 = 42
}
```
A value output and one or more place outputs can mix freely; only the
value outputs build the returned tuple.
### Read-write — `+` constraint
A `+` operand is read **and** written: the place's current value is fed
in, the asm updates it in place, and the result is stored back.
```sx
// increment-in-place: x is loaded, the asm adds 1, the result is stored back
bump :: () -> i64 {
x : i64 = 41;
asm volatile { "add %[v], %[v], #1", [v] "+r" -> @x };
return x; // 42
}
```
### Indirect memory — `=*m` constraint
An `=*m` operand passes the place's **address** to the asm, which writes
through it directly (no register round-trip, no return slot):
```sx
// store 42 straight into x's storage
poke :: () -> i64 {
x : i64 = 0;
asm volatile {
#string ASM
mov x9, #42
str x9, %[out]
ASM,
[out] "=*m" -> @x,
clobbers(.x9),
};
return x; // 42
}
```
**The place must be mutable storage.** Taking the address of a scalar
`::` constant has no meaning — a scalar constant folds to its value and
has no storage — so `-> @SOME_CONST` is a compile error:
```
cannot take the address of constant 'SOME_CONST' — a scalar '::'
constant has no storage (use a '=' variable or a local copy)
```
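
Following the hint in the error message, one way around it is to copy the constant into a local and pass that as the place. This is a sketch on aarch64; `SOME_CONST` and `bump_const` are illustrative names, and the untyped `::` declaration is assumed to behave like the other constants on this page.

```sx
SOME_CONST :: 41;
bump_const :: () -> i64 {
tmp : i64 = SOME_CONST; // local copy has storage
asm volatile { "add %[v], %[v], #1", [v] "+r" -> @tmp };
return tmp; // 42
}
```
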
---
## Multi-instruction templates
A single `"…"` string is one fragment. For several instructions, use a
multi-line string literal or sx's **`#string` heredoc**, which is
delivered **verbatim** — no escape processing — so you write assembly
exactly as it should appear:
```sx
// x86_64
serialize :: () {
asm volatile {
#string ASM
mfence
lfence
ASM,
};
}
```
---
## Global (module-scope) assembly
A top-level `asm { … }` block is **global assembly** — template only
(no operands, no `volatile`), emitted as module-level assembly. It is
the place to define a whole routine in assembly. Symbols it defines are
reached from sx with a **lib-less `extern`** declaration:
```sx
// aarch64, macOS symbol naming (note the leading underscore)
asm {
#string ASM
.global _my_add
_my_add:
add x0, x0, x1
ret
ASM,
};
my_add :: (a: i64, b: i64) -> i64 extern;
main :: () -> i64 {
return my_add(40, 2); // 42 — computed by the global-asm routine
}
```
Multiple global blocks concatenate in source order. (Symbol naming
follows the platform convention — a leading underscore on macOS, none
on Linux.)
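
For comparison, a sketch of the same routine with Linux symbol naming (still aarch64), where the leading underscore is dropped. The `extern` declaration on the sx side is assumed to stay unchanged.

```sx
// aarch64, Linux symbol naming (no leading underscore)
asm {
#string ASM
.global my_add
my_add:
add x0, x0, x1
ret
ASM,
};
my_add :: (a: i64, b: i64) -> i64 extern;
```
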

---
## When it runs
Inline assembly is emitted into the program and runs at **runtime**,
under both execution paths:
- **`sx run` (JIT)** — the module is compiled to an in-memory object
(the integrated assembler assembles your asm, including global blocks),
then run. Both inline and global asm work.
- **`sx build` (AOT)** — same, into a native binary.

It does **not** run at **compile time**. A `#run` (comptime) call into a
global-asm symbol fails loudly:
```sx
COMPUTED :: #run my_add(40, 2); // error: the symbol isn't linked yet at comptime
```
```
comptime extern call: symbol not found via dlsym
```
The comptime interpreter resolves `extern` calls against the host
process; a module-asm symbol only exists once the program is
assembled and linked, so call it at runtime, not in a `#run`.

---
## Cookbook
**Read a register** (no inputs):
```sx
stack_ptr :: () -> u64 {
return asm { "mov %[out], sp", [out] "=r" -> u64 }; // aarch64
}
```
**x86_64 syscall** (`write(2)`), with pinned registers and clobbers:
```sx
sys_write :: (fd: i64, buf: *u8, count: i64) -> i64 {
return asm volatile {
"syscall",
[ret] "={rax}" -> i64, // bytes written, in rax
"{rax}" = 1, // SYS_write
"{rdi}" = fd,
"{rsi}" = buf,
"{rdx}" = count,
clobbers(.rcx, .r11, .memory),
};
}
```
**x86_64 divmod** — one instruction, two outputs, returned as a tuple:
```sx
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64,
[rem] "={rdx}" -> u64,
"{rax}" = n, "{rdx}" = 0, [d] "r" = d,
clobbers(.cc),
};
}
q, r := divmod(17, 5); // (3, 2)
```
---
## Rules of thumb
- **`asm` yields a value.** Bind it (`x := asm { … }`), `return` it, or
destructure a multi-output tuple (`a, b := asm { … }`). A block with no
value outputs must be `volatile`.
- **Pinned operands name themselves.** `"{rdi}"` is `%[rdi]`; only add
`[name]` for register-class operands or to rename. Don't echo a pin
(`[rax] "={rax}"`).
- **`%%` for a literal percent; `%[name]` for an operand.** x86 templates are
AT&T syntax.
- **List everything you trash** in `clobbers(.…)` — scratch registers,
`.cc`, and `.memory` if the asm touches memory the compiler can't see.
- **`-> @place` writes storage; pick the form:** `=` (compute then
store), `+` (read-modify-write), `=*m` (write through the address).
The place must be mutable — not a scalar `::` constant.
- **Global `asm { … }`** defines symbols; import them with a lib-less
`extern`. They run under JIT and AOT, but **not** in a `#run`.
- **It's target-specific.** Gate or pick instructions per architecture;
there is no portable instruction set.
---
## See also
- [inline-asm-design.md](../design/inline-asm-design.md) — the design rationale and
LLVM mapping.
- `examples/16xx-platform-asm-*` — the full, runnable example matrix
(basic in/out, tuples, the three `-> @place` forms, global asm, the
x86_64 syscall, and the comptime-boundary guard).
- The "Inline Assembly" section of [readme.md](../readme.md) for a
one-screen overview.