lab/sx

Files

agra 0e0ee40528 docs(asm): symbol refs are portable — explain the auto-:c mechanism

Updates the symbol-operand guide: x86 now uses the same plain %[fn] as
aarch64, and a 'How the portability works' note explains the mechanism
(compiler auto-injects LLVM's :c modifier for "s" operands, equivalent
to GCC :P/%P0 for x86 calls, no-op on aarch64, overridable). Drops the
stale per-arch :P guidance; checkpoint updated.

2026-06-16 09:05:15 +03:00

13 KiB

Raw Permalink Blame History

Inline Assembly in sx

A guide to writing inline assembly in sx — emitting raw target instructions, wiring values in and out, writing through memory, and defining whole routines in assembly.

Looking for the why behind the design (how it maps to LLVM, the Zig comparison, the emit algorithm)? That lives in inline-asm-design.md. This page is the user-facing how-to.

The mental model

asm is an expression. It drops to the machine: you write a template of real instructions, declare which sx values feed registers going in and which come back out, and the block evaluates to the output value (or a tuple of them).

add :: (a: i64, b: i64) -> i64 {
    return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b };
}

Three things to know up front:

The body is a brace block of comma-separated parts: the template string first, then operands, then an optional clobbers(.…) clause.
Each operand is tagged by role, not by position: -> Type is a value output, = expr is an input, -> @place writes through to existing storage. The list is flat and order-independent — there are no positional : sections.
The outputs decide the result. Zero outputs → void (and the block must be volatile); one → that type; many → a tuple.

Templates are AT&T syntax (lowered through LLVM), target-specific, and never run at compile time — see When it runs.

Operands

An operand is [name]? "constraint" <role>. The constraint string is the LLVM/GCC-style constraint; the role marker says what the operand does.

Inputs — `= expr`

= expr feeds a value in. The constraint picks where it lands:

[a] "r"     = a      // any general register
"{rdi}"     = fd     // pinned to a specific register (x86_64 rdi)

Symbol inputs — `"s" = fn`

A "s" input feeds a function or global symbol (not a runtime value). In the template, %[name] expands to the symbol's platform-mangled name, so you can branch or call straight to it:

cb :: (n: i64) -> i64 export "cb" { return n + 1; }

trampoline :: (n: i64) -> i64 {
    return asm volatile {
        #string ASM
        mov x0, %[arg]
        bl %[fn]            // DIRECT call — `bl _cb` on macOS, `bl cb` on Linux
        mov %[res], x0
ASM,
        [res] "=r" -> i64,
        [arg] "r" = n,
        [fn]  "s" = cb,     // symbol operand
        clobbers(.x0, .x30, .memory),
    };
}

The same %[fn] works on x86_64 — just the branch mnemonic differs:

return asm volatile {
    "call %[fn]",              // x86_64 — same portable %[fn]
    [ret] "={rax}" -> i64,
    "{rdi}" = n,
    [fn]   "s" = cb,
    clobbers(.rcx, .rdx, .rsi, .r8, .r9, .r10, .r11, .memory),
};

Two reasons to prefer this over passing a function pointer in a plain "r" register and using an indirect blr/call *:

One fewer indirection — a direct PC-relative branch, no pointer load into a register, and a predictable (non-indirect) branch.
Portable — %[fn] is the same on every target; the backend emits the correctly-mangled name, so you never hardcode the macOS leading underscore or a per-arch operand modifier.

How the portability works. A bare %[fn] would render differently per target — on x86 the symbol prints as $cb (an immediate $-prefix that call rejects), while aarch64 prints it bare. So for a symbol ("s") operand the compiler auto-injects LLVM's :c operand modifier (%[fn] → ${N:c}, "print the constant with no punctuation"). :c prints the plain symbol on every target — equivalent to the GCC :P/%P0 call-target idiom on x86 (both emit the same R_X86_64_PLT32 relocation) and a no-op on aarch64. You can still override it with an explicit %[fn:X] if you ever need a different rendering, but for a call/branch you never should.

The callee needs a stable, externally-linked symbol — i.e. export (which also gives it the C ABI). A plain or callconv(.c)-only function is internal and gets dead-code-eliminated, so the symbol won't link. (A global-scope asm { … } routine has no operand list, so it can't use a symbol operand — it references the literal symbol in its text.)

Value outputs — `-> Type`

-> Type produces a value that becomes (part of) the block's result:

[out] "=r"    -> i64    // result in any register
"={rax}"      -> i64    // result pinned to rax

Naming and `%[name]`

Inside the template, %[name] refers to an operand by its effective name. An operand pinned to a register is auto-named after that register — "{rdi}" is reachable as %[rdi], "={rax}" as %[rax] — so an explicit [name] is only needed:

for a register-class operand ("=r", "r"), which has no register to name it; or
to give a pinned operand a name different from its register.

Two labels are rejected so names stay unambiguous:

the echo form [rax] "={rax}" — the label just repeats the pin, so drop it (the operand is already %[rax]); and
duplicate operand names.

In the template, %% is a literal %, and %= expands to a unique id (handy for a local label that must differ across inlinings).

The result type

The number of value outputs (-> Type) decides the block's type:

`-> Type` outputs	result	example
0	`void` — must be `volatile`	`asm volatile { "dmb ish" }`
1	that type `T`	`x := asm { …, "=r" -> i64 }`
N	a tuple, fields named by each operand's name	`lo, hi := asm { … }`

With multiple outputs you get real multiple return values — a named operand becomes a named tuple field:

// aarch64 — split a value into low/high bytes
split :: (x: u64) -> (lo: u64, hi: u64) {
    return asm {
        #string ASM
        and %[l], %[x], #0xff
        lsr %[h], %[x], #8
ASM,
        [l] "=r" -> u64,        // → .lo   (operand 0)
        [h] "=r" -> u64,        // → .hi   (operand 1)
        [x] "r" = x,
    };
}
lo, hi := split(0x1234);        // (0x34, 0x12) = (52, 18)

`volatile`

asm volatile { … } marks the block as having side effects, so the optimizer won't move or delete it. It is required whenever there are no value outputs — a result-less, non-volatile asm would be dead code.

barrier :: () { asm volatile { "dmb ish" }; }   // aarch64 full barrier

A block with outputs may still be volatile when its effects matter beyond the returned value (e.g. a syscall).

`clobbers(.…)`

clobbers(.…) is a dot-name list of registers and flags the asm trashes that aren't already operands — so the register allocator keeps clear of them:

clobbers(.rcx, .r11, .memory)   // x86_64 syscall trashes rcx, r11, and memory
clobbers(.cc)                   // condition flags

.memory means "this asm reads or writes memory the compiler can't see," and .cc means "the condition flags are modified."

Writing through memory — `-> @place`

Sometimes the asm should write into existing storage (a local, a struct field) rather than return a value. -> @place does that: the place output does not join the result tuple. There are three forms, distinguished by the constraint.

Write-through — `= …` constraint

The asm computes a value into a register; sx stores it through the place's address afterward.

compute :: () -> i64 {
    other : i64 = 0;
    main_val := asm volatile {
        #string ASM
        mov %[m], #5
        mov %[o], #37
ASM,
        [m] "=r" -> i64,        // value output → returned into main_val
        [o] "=r" -> @other,     // place output → stored through @other
    };
    return main_val + other;    // 5 + 37 = 42
}

A value output and one or more place outputs can mix freely; only the value outputs build the returned tuple.

Read-write — `+` constraint

A + operand is read and written: the place's current value is fed in, the asm updates it in place, and the result is stored back.

// increment-in-place: x is loaded, the asm adds 1, the result is stored back
bump :: () -> i64 {
    x : i64 = 41;
    asm volatile { "add %[v], %[v], #1", [v] "+r" -> @x };
    return x;   // 42
}

Indirect memory — `=*m` constraint

An =*m operand passes the place's address to the asm, which writes through it directly (no register round-trip, no return slot):

// store 42 straight into x's storage
poke :: () -> i64 {
    x : i64 = 0;
    asm volatile {
        #string ASM
        mov x9, #42
        str x9, %[out]
ASM,
        [out] "=*m" -> @x,
        clobbers(.x9),
    };
    return x;   // 42
}

The place must be mutable storage. Taking the address of a scalar :: constant has no meaning — a scalar constant folds to its value and has no storage — so -> @SOME_CONST is a compile error:

cannot take the address of constant 'SOME_CONST' — a scalar '::'
constant has no storage (use a '=' variable or a local copy)

Multi-instruction templates

A single "…" string is one fragment. For several instructions, use a multi-line string literal or sx's #string heredoc, which is delivered verbatim — no escape processing — so you write assembly exactly as it should appear:

serialize :: () {
    asm volatile {
        #string ASM
        mfence
        lfence
ASM,
    };
}

Global (module-scope) assembly

A top-level asm { … } block is global assembly — template only (no operands, no volatile), emitted as module-level assembly. It is the place to define a whole routine in assembly. Symbols it defines are reached from sx with a lib-less extern declaration:

asm {
    #string ASM
.global _my_add
_my_add:
    add x0, x0, x1
    ret
ASM,
};

my_add :: (a: i64, b: i64) -> i64 extern;

main :: () -> i64 {
    return my_add(40, 2);   // 42 — computed by the global-asm routine
}

Multiple global blocks concatenate in source order. (Symbol naming follows the platform convention — a leading underscore on macOS, none on Linux.)

When it runs

Inline assembly is emitted into the program and runs at runtime, under both execution paths:

sx run (JIT) — the module is compiled to an in-memory object (the integrated assembler assembles your asm, including global blocks), then run. Both inline and global asm work.
sx build (AOT) — same, into a native binary.

It does not run at compile time. A #run (comptime) call into a global-asm symbol fails loudly:

COMPUTED :: #run my_add(40, 2);   // error: the symbol isn't linked yet at comptime

comptime extern call: symbol not found via dlsym

The comptime interpreter resolves extern calls against the host process; a module-asm symbol only exists once the program is assembled and linked, so call it at runtime, not in a #run.

Cookbook

Read a register (no inputs):

stack_ptr :: () -> u64 {
    return asm { "mov %[out], sp", [out] "=r" -> u64 };   // aarch64
}

x86_64 syscall — write(2), with pinned registers and clobbers:

sys_write :: (fd: i64, buf: *u8, count: i64) -> i64 {
    return asm volatile {
        "syscall",
        [ret] "={rax}" -> i64,      // bytes written, in rax
        "{rax}" = 1,                // SYS_write
        "{rdi}" = fd,
        "{rsi}" = buf,
        "{rdx}" = count,
        clobbers(.rcx, .r11, .memory),
    };
}

x86_64 divmod — one instruction, two outputs, returned as a tuple:

divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
    return asm {
        "divq %[d]",
        [quot] "={rax}" -> u64,
        [rem]  "={rdx}" -> u64,
        "{rax}" = n,  "{rdx}" = 0,  [d] "r" = d,
        clobbers(.cc),
    };
}
q, r := divmod(17, 5);              // (3, 2)

Rules of thumb

asm yields a value. Bind it (x := asm { … }), return it, or destructure a multi-output tuple (a, b := asm { … }). A block with no value outputs must be volatile.
Pinned operands name themselves. "{rdi}" is %[rdi]; only add [name] for register-class operands or to rename. Don't echo a pin ([rax] "={rax}").
%% for a literal percent; %[name] for an operand. Templates are AT&T.
List everything you trash in clobbers(.…) — scratch registers, .cc, and .memory if the asm touches memory the compiler can't see.
-> @place writes storage; pick the form: = (compute then store), + (read-modify-write), =*m (write through the address). The place must be mutable — not a scalar :: constant.
Global asm { … } defines symbols; import them with a lib-less extern. They run under JIT and AOT, but not in a #run.
It's target-specific. Gate or pick instructions per architecture; there is no portable instruction set.

13 KiB Raw Permalink Blame History

Inline Assembly in sx

The mental model

Operands

Inputs — = expr

Symbol inputs — "s" = fn

Value outputs — -> Type

Naming and %[name]