Compare commits

...

40 Commits

Author SHA1 Message Date
agra
ded106333b docs(design): execution-model roadmap + reify implementation stream
Add the async-first execution-model roadmap (comptime JIT spine, colorblind
fibers/Io, atomics, hot-reload) with all seven decisions resolved and
three-way reviewed, and carve the first stream: comptime type_info/reify
(PLAN-REIFY + checkpoint) — the codebase-validated foundation for channel
result types and race's synthesized tagged union.
2026-06-16 16:43:29 +03:00
agra
b6a7378af4 feat(dist): bundled-zig link backend for hermetic macOS/Linux/Windows builds
Drive a bundled `zig` as `zig cc` for the AOT link step, supplying lld + CRT
+ libc (musl/glibc/mingw) so `sx build` produces native binaries with no host
toolchain. Default Linux output is static musl (portable-anywhere).

- src/zig_backend.zig: discover zig ($SX_ZIG / bundled-next-to-exe / PATH);
  bundled-vs-PATH provenance gates auto-activation.
- src/target.zig: selectZigLinker + emitZigLinkArgv + zigTargetTriple, dispatched
  before the per-OS branches; macOS/Linux/Windows in scope.
- src/ir/emit_llvm.zig: LLVMNormalizeTargetTriple so vendor-less zig triples
  (e.g. x86_64-windows-gnu) parse to the correct OS/object format (COFF not ELF).
- src/main.zig: --self-contained / --no-self-contained; linux-musl, linux-musl-arm,
  windows-gnu shorthands; de-vendor linux/linux-arm to match the corpus runner.
- examples/1660: Windows Win32 print-42 + exit(0) via kernel32 (ir-only off-Windows).

Auto-activates only for a bundled zig; a PATH-only zig engages under
--self-contained, so native dev/CI builds are never silently rerouted.

Docs: readme Cross-Compilation, design/bundled-zig-link-backend-design.md, current/PLAN-DIST.md.
2026-06-16 15:56:06 +03:00
agra
0e0ee40528 docs(asm): symbol refs are portable — explain the auto-:c mechanism
Updates the symbol-operand guide: x86 now uses the same plain %[fn] as
aarch64, and a 'How the portability works' note explains the mechanism
(compiler auto-injects LLVM's :c modifier for "s" operands, equivalent
to GCC :P/%P0 for x86 calls, no-op on aarch64, overridable). Drops the
stale per-arch :P guidance; checkpoint updated.
2026-06-16 09:05:15 +03:00
agra
066ba54346 feat(asm): portable symbol refs — auto-inject :c operand modifier
A `%[name]` that references a symbol ("s") operand without an explicit
modifier now lowers to `${N:c}` (LLVM 'bare constant — no punctuation')
instead of `${N}`. This makes `bl %[fn]` / `call %[fn]` portable across
targets with no per-arch knowledge: x86 would otherwise render `$cb`
(an invalid call target, requiring a hand-written `:P`); aarch64 is
unaffected. Verified `:c` is equivalent to `:P` for x86-64 calls (both
emit R_X86_64_PLT32), and correct for branch targets, RIP-relative
addressing, and `$`-prefixed absolute immediates.

renderAsmTemplate injects `:c` only for symbol operands lacking an
explicit modifier (asmNamedIsSymbol helper); an explicit `%[name:X]`
still wins (escape hatch). x86 example 1659 drops its `:P` for the same
plain `%[fn]` as aarch64 1656. Snapshots regen to `${N:c}`. zig build
test green (668 corpus, 446 unit).
2026-06-16 09:04:23 +03:00
agra
79042ab9ab docs(asm): note x86 %[fn:P] call modifier + checkpoint x86 coverage 2026-06-16 08:37:09 +03:00
agra
17e3b91eb9 test(asm): x86_64 cross-arch siblings for place + symbol operands
Adds ir-only x86_64 examples mirroring the aarch64 feature examples, so
each emit path is locked on both arches:
- 1657 read-write `+`  → "incq ${0}", "=r,0" (tied input)
- 1658 indirect `=*m`   → "movq $$42, ${0}", "=*m"(ptr elementtype i64)
- 1659 symbol `"s"`     → "call ${2:P}", direct call to an exported sx fn

Each is x86-pinned (ir-only on this aarch64 host — the .ir is the
assertion; runs on x86_64-linux, main returns 0 on success / 1 if the
asm misbehaved). x86 templates validated by cross-emitting an object
(LLVM's integrated assembler accepts them; objdump confirms 1659 is a
direct `call` reloc to cb). Note: x86 direct calls need the `P` operand
modifier (`%[fn:P]`); aarch64 `bl %[fn]` needs none. Pure additive
locks, no compiler change. zig build test green (668 corpus, 446 unit).
2026-06-16 08:36:33 +03:00
agra
a0face7571 docs(asm): document symbol operands ("s") + checkpoint
Adds a 'Symbol inputs — "s" = fn' section to docs/inline-assembly.md
(direct bl/call, portability, the export-vs-callconv linkage point) and
logs the symbol-operand + round-trip work in CHECKPOINT-ASM.
2026-06-16 08:26:22 +03:00
agra
10f4137cbd feat(asm): symbol operands ("s") — direct call/branch to a function
A `"s"` input operand feeds a function/global symbol; the template's
%[name] emits the platform-mangled name, so `bl %[fn]` / `call %[fn]`
branches DIRECTLY to it (PC-relative, no register load — one fewer
indirection than register-indirect `blr`).

Lowering: an `"s"` input lowers its RHS normally (a function name →
`ptr @fn`); the rejection added last commit is removed. Emit: a symbol
operand is passed with its OWN llvm type (LLVMTypeOf) and no coercion —
the function value is a `ptr`, and the old coerce-to-register-int path
mistyped it and failed the verifier. New asmIsSymbol helper.

Verified on aarch64: examples/1656 (sx → asm → bl _cb → sx → 42); the
emitted asm is a direct `bl <_cb>` (objdump-confirmed), IR constraint
`...,s,...`(ptr @cb). Flipped 1656 from the rejection lock to a runnable
aarch64 example. zig build test green (665 corpus, 446 unit).
2026-06-16 08:24:53 +03:00
agra
c187122531 test(asm): reject symbol "s" operands cleanly + lock (symbol-op prep)
A symbol operand (constraint "s") feeds a function/global symbol whose
mangled name the template emits — enabling a DIRECT `bl %[fn]` (one
fewer indirection than register-indirect `blr`). Until now `"s" = fn`
fell through to emit and produced an LLVM-verifier crash (param type
mismatch). Reject it at lowering with a clear diagnostic instead, and
lock that with examples/1656-platform-asm-symbol-operand.sx. The next
commit implements it and flips the example to run (→ 42).
2026-06-16 08:19:18 +03:00
agra
1346a2d020 test(asm): round-trip example — asm calls back into an sx function
Adds examples/1655-platform-asm-callback-into-sx.sx: a global-asm
trampoline (_caller) that `bl _cb` back into an `export`ed sx function.
Demonstrates the sx → asm → sx round trip and that `export` (external
linkage + stable C symbol + C ABI) is what makes the callback symbol
resolvable — `callconv(.c)` alone leaves it internal and it DCE's away.
Runs under the JIT on aarch64-macos (→ 42); ir-only elsewhere. Locks
current behavior; no compiler change.
2026-06-16 07:55:05 +03:00
agra
e7eeecc0f3 docs: move inline-asm design doc to a top-level design/ folder
Moves docs/inline-asm-design.md -> design/inline-asm-design.md (the
internal design record now lives under design/, separate from the
user-facing docs/). Updates all links: current/CHECKPOINT-ASM.md,
current/PLAN-ASM.md, current/PLAN-EXTERN-EXPORT.md (../docs -> ../design)
and docs/inline-assembly.md (same-dir -> ../design).
2026-06-16 07:46:01 +03:00
agra
b4d1ce78c3 docs(asm): add user-facing inline-assembly guide
Adds docs/inline-assembly.md — a how-to guide for inline assembly in the
docs/error-handling.md style: mental model, operands (inputs / value
outputs / naming + auto-naming rule), the result-type table, volatile,
clobbers, all three `-> @place` forms (write-through / read-write /
indirect-memory), multi-instruction `#string` templates, global asm +
lib-less extern, the JIT/AOT-yes vs `#run`-no execution model, a
cookbook (read-register, x86_64 syscall, divmod), and rules of thumb.
All aarch64 snippets are verified to run; x86_64 ones are labeled. The
design doc (docs/inline-asm-design.md) stays as the internal rationale;
this guide is the user-facing companion, linked from readme.md.
2026-06-16 07:41:14 +03:00
agra
73f5f0ed11 docs(asm): checkpoint comptime-call guard (1654) 2026-06-16 07:29:56 +03:00
agra
ab7fc393b6 test(asm): pin loud failure of #run into a module-asm symbol
Adds examples/1654-platform-asm-global-comptime-call.sx — the comptime
guard. A module-asm symbol only exists after assemble+link; the comptime
interpreter resolves extern calls via host dlsym, where it's absent, so
`#run my_add(…)` fails with a clear diagnostic ("comptime extern call:
symbol not found via dlsym") rather than misfiring. Runtime calls work
(1648/1653). dlsym-miss precedes asm assembly, so arch-independent — no
.build. Locks current behavior; no compiler change.
2026-06-16 07:29:38 +03:00
agra
66e1e39418 docs(asm): correct stale 'AOT only' module-asm prose (JIT works)
sx run compiles to an object before ORC relocation, so module asm is
assembled in and its symbols resolve at JIT main execution. Corrected
the Phase F note, Current state, and Next step; the only real boundary
is a compile-time #run into a module-asm symbol (loud dlsym-miss).
2026-06-16 07:25:32 +03:00
agra
e954f044d8 test(asm): global asm runs under the JIT (sx run), not just AOT
Adds examples/1653-platform-asm-global-jit.sx — a module-scope asm { … }
block executed via `sx run` (no `aot`). sx run compiles the module to an
in-memory object (the integrated assembler assembles the `module asm`
into it), then ORC relocates and runs it, so a module-asm symbol IS
resolvable at JIT main execution — the long-assumed "AOT only" limit was
stale. Sibling of 1648 (same feature via AOT). Locks current behavior
(exit 42); no compiler change.
2026-06-16 07:24:09 +03:00
agra
d5aee7a222 docs(asm): checkpoint indirect-memory =*m — inline asm feature-complete 2026-06-16 07:10:31 +03:00
agra
cb6c032c58 feat(asm): indirect-memory =*m place outputs
Implements indirect-memory (`=*m`) `-> @place` outputs — the last
substantive asm feature. Unlike a write-through `=` output (which
returns a value that is then stored), an indirect output passes the
place ADDRESS to the asm and the asm writes through it; there is no
return slot.

emitInlineAsm:
  - indirect outputs are excluded from the LLVM return type;
  - their pointer is passed as an opaque `ptr` call arg, placed FIRST
    (the arg-consuming constraint order is: output-section indirect
    pointers, then inputs, then read-write tied seeds);
  - each indirect arg gets an `elementtype(T)` call-site attribute
    (required in the opaque-pointer era), T = the pointee type;
  - the store-back loop skips indirect outputs (already written).
New asmIsIndirect helper. Lowering stops rejecting `*` (constraint kept
verbatim; `=*m` reaches the constraint string as-is). asmOperandIndex
is unchanged — indirect outputs still count as operands, so `%[name]`
${N} numbering holds.

Verified by running on aarch64: store-through-pointer (str x9, %[out]
→ 42, IR `=*m,~{x9}` with `ptr elementtype(i64)`) and a mixed case
(indirect + value output + input → `=*m,=r,r`, indirect ptr arg first,
${0}/${1}/${2} correct). 1652 flipped from the rejection lock to a
runnable aarch64 example (ir-only elsewhere). zig build test green
(661 corpus, 446 unit).
2026-06-16 07:09:17 +03:00
agra
2a43713d7f test(asm): lock indirect-memory =*m rejection (Phase G prep)
Adds examples/1652-platform-asm-indirect-mem.sx exercising a `=*m -> @x`
indirect-memory place output. Currently rejected loudly at lowering
("not yet implemented"); this locks that behavior as a passing test.
The next commit implements indirect-memory outputs and flips this
example to run end-to-end (store-through-pointer → 42).
2026-06-16 07:05:05 +03:00
agra
59469f2b2f docs(asm): checkpoint x86_64 syscall-write example (1651) 2026-06-16 06:39:14 +03:00
agra
cdd920b692 test(asm): x86_64 Linux syscall-write example (ir-only lock)
Adds examples/1651-platform-asm-x86-syscall-write.sx — the canonical
inline-asm use case: `write(2)` via a raw x86_64 `syscall` (SYS_write
in rax, fd/buf/count pinned to rdi/rsi/rdx, rcx+r11+memory clobbered,
byte count returned in rax). Exercises register-pinned inputs, a pinned
value output, a pointer input (*u8 -> rsi), and clobbers(.…) lowering
together.

x86-pinned via .build { "target": "x86_64-linux" }: ir-only on this
aarch64 host (the .ir snapshot locks the exact constraint string
`={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}` — the §II.11
silent-miscompile risk zone), runs natively on x86_64-linux printing
"ok\n" (hand-authored .stdout, asserted only in execute mode).

Pure additive test coverage — no compiler change (lock commit).
zig build test green (660 corpus, 446 unit).
2026-06-16 06:38:13 +03:00
agra
9e7661b915 docs(asm): checkpoint 0138 resolved — output-to-const rejection done 2026-06-16 06:30:22 +03:00
agra
2a954ceeb6 fix(0138): diagnose @scalar-const address-of (no storage)
A scalar `::` constant folds to its value and has no storage. The
unary `.address_of` lowering (src/ir/lower/expr.zig) skipped the
alloca path (is_alloca == false) and resolveGlobalRef (scalar consts
get no storage global), falling through to the generic addr_of arm,
which reinterpreted the folded value as a pointer:
`inttoptr (i64 <value> to ptr)`. That wild pointer segfaulted on
deref and emitted invalid stores for inline-asm `-> @const`.

Diagnose instead, in the address_of(identifier) path: a non-alloca,
non-ref-capture, non-pack-elem scope binding (local scalar const) and
a module_const_map name not backed by storage (module scalar const)
both report "cannot take the address of constant '<name>' — a scalar
'::' constant has no storage …" and return a placeholder Ref. Chose
diagnose over materializing read-only storage (consistent with the
fold-only scalar model). Array/struct consts keep real storage and
stay addressable (@K/@LIT unchanged).

Also gives the ASM stream's planned output-to-const rejection for
free — asm `-> @const` lowers through the same path. Regression:
examples/1177-diagnostics-addr-of-const-rejected.sx. Resolves 0138.
2026-06-16 06:29:36 +03:00
agra
c760b92548 issue(0138): @const address-of yields wild pointer; ASM output-to-const BLOCKED
Filed issues/0138: `@const` (address-of a `::` comptime constant) lowers
to `inttoptr (i64 <value> to ptr)` — segfaults on deref, invalid store for
asm `-> @const`. Root cause in src/ir/lower/expr.zig .address_of (not asm).
Marked CHECKPOINT-ASM Next step BLOCKED on 0138 for the output-to-const
rejection item.
2026-06-15 23:18:37 +03:00
agra
97a4050462 docs(asm): checkpoint Phase G — read-write + place outputs 2026-06-15 23:08:24 +03:00
agra
4128416d48 feat(asm): read-write + place outputs
Implements read-write (`+r` / `+{reg}`) `-> @place` outputs. LLVM has
no `+` constraint, so a read-write place lowers to:

  - an output `=` constraint (return slot, stored back through the
    place after the call), with the leading `+` rewritten to `=`; plus
  - a TIED input constraint (the decimal index of that output) appended
    after the regular inputs, seeded with the place's loaded value
    passed as a call arg.

Tied inputs are appended last so existing operand indices (%[name] ->
${N}) are undisturbed; asmOperandIndex stays correct. Lowering no longer
rejects `+` (indirect `*` still rejected). emitInlineAsm grows the
arg/param arrays by the rw count, loads each seed, and emits the tied
constraint.

Verified by running: increment-in-place (41 -> 42) and a mixed case
(rw place + regular input + value output) producing the textbook
"=r,=r,r,0" constraint with correct ${N} indices. 1650 flipped from
the rejection lock to a runnable aarch64-pinned example (ir-only
elsewhere). zig build test green (658 corpus, 446 unit).
2026-06-15 23:07:38 +03:00
agra
335ac52374 test(asm): lock read-write + place-output rejection (Phase G prep)
Adds examples/1650-platform-asm-rw-place.sx exercising a `+r -> @x`
read-write place output. Currently rejected loudly at lowering
("not yet implemented"); this locks that behavior as a passing test.
The next commit implements read-write outputs and flips this example
to run end-to-end (increment-in-place → 42).
2026-06-15 23:00:48 +03:00
agra
967005621a feat(asm): Phase 2 — -> @place write-through outputs
An asm result can be STORED through a place (a local / struct field) instead of
returned; the place output does not join the result tuple.

- parser.zig: `-> @place` parses `@place` as an ordinary address-of expression
  → an out_place operand (the in-function form; reuses the existing `@` prefix).
- inst.zig: AsmOperand gains out_ty (the output slot's value type) so emit can
  build the combined return struct without re-deriving from Inst.ty.
- lower/expr.zig: out_place operand = the lowered @place address, out_ty = the
  pointee. Read-write (`+`) and indirect-memory (`*`) constraints rejected loudly
  (not yet implemented) rather than miscompiled.
- ops.zig emitInlineAsm: the LLVM return type is built from ALL outputs
  (out_value + out_place); after the call, out_place slots are stored through
  their address and out_value slots rebuild the sx result. Fast path when there
  are no place outputs (the struct return IS the result — pure-value asm IR
  unchanged).

Verified: write-to-local (42), struct field, mixed value+place (v=10 b=20), `+`
rejected. Locked with 1649-platform-asm-place-output (mixed, runs on aarch64).

zig build test green (657 corpus, 446 unit).
2026-06-15 22:47:34 +03:00
agra
b8800a234c docs(asm): add Inline Assembly section to readme
Documents the `asm { … }` expression (template + `-> Type` / `= expr` operands +
clobbers), the §II.5 auto-naming rule (register pin → implicit name; echo form
rejected), the result-shape rule (0→void+volatile / 1→T / N→tuple), `#string`
multi-instruction templates, and top-level global asm + lib-less `extern`
call-into. Per the docs-track-changes rule (inline asm is a landed user-facing
feature). Examples are ones verified running in the corpus.
2026-06-15 22:28:10 +03:00
agra
4d75b9323c feat(asm): Phase F — global (module-scope) asm
A top-level `asm { "tmpl", };` block (template only) lowers to LLVM `module asm`;
a lib-less `extern` declaration calls into the symbols it defines (the import
direction reuses the existing C-FFI extern path — no new surface).

- ast.zig: asm_global node (AsmGlobal { template }).
- parser.zig: parseAsmGlobal, dispatched from parseTopLevel on kw_asm — rejects
  `volatile` and any operands/clobbers (template only). The in-function asm
  expression form stays in parsePrimary.
- module.zig: Module.global_asm list; lower/decl.zig captures each template in
  lowerMainAndComptime (the real top-level pass — lowerDecls is dead for
  top-level); emit_llvm.zig emit() appends each via LLVMAppendModuleInlineAsm in
  source order.
- the new node forced asm_global arms in sema.zig (analyzeNode +
  findNodeAtOffset) and semantic_diagnostics.zig (checkBindingNames).

Verified end-to-end: an aarch64 `_my_add` global routine, called via `extern`,
returns 42 — AOT only (the ORC JIT doesn't link module-asm symbols; global-asm
symbols live in the final linked binary). Locked with 1648-platform-asm-global
({ "aot": true, "target": "macos" } → AOT build+run on aarch64, ir-only else).

zig build test green (656 corpus, 446 unit).
2026-06-15 22:22:29 +03:00
agra
d3c6ffed5a feat(asm): Phase E — multi-output asm returns tuples
Replaces the N>1 "Phase E" bail with a shared asmResultType helper (lowering +
inferType) that derives the result type from the out_value operands: 0→void,
1→T, N→a named tuple (fields named via the §II.5 effective-name rule).

Key realization: toLLVMType(tuple) already produces a literal struct {T1,…,Tn} —
exactly what LLVM's multi-output inline asm returns — so emit needs NO change.
Building the op with a tuple result type makes the asm call return the struct,
which IS sx's tuple value (destructured by the normal tuple_get path).

inferType's .asm_expr arm now also delegates to asmResultType (single owner), so
`return asm`, `x := asm`, and `q, r := asm` all agree on the type.

Verified end-to-end on aarch64: split(0x1234)→(lo=52,hi=18), a udiv/msub
divmod→(3,2). IR: `call { i64, i64 } asm "divq ${4}",
"={rax},={rdx},{rax},{rdx},r,~{cc}"(…)` → extractvalue → tuple.

1640 → the x86_64 multi-output IR lock (ir-only); 1647 → a multi-output example
that runs on aarch64.

zig build test green (655 corpus, 446 unit).
2026-06-15 21:55:38 +03:00
agra
5a5e04c6d5 feat(asm): Phase C.1 + D — inline asm codegen (runs end-to-end)
lowerAsmExpr stops bailing and builds the inline_asm op: resolves each operand's
effective name (§II.5 — explicit [name] else the {reg} pin), interns
template/constraints/clobbers, lowers input Refs, derives the result TypeId
(0→void, 1→T). Adds the last deferred validation (every %[name] must name an
operand). Multi-output (N>1) bails with a named "Phase E" diagnostic.

emitInlineAsm (backend/llvm/ops.zig) ports Zig's airAssembly: assembles the LLVM
constraint string (outputs → inputs → ~{clobber}, ',' → '|'), rewrites the
template (%[name]→${N}, %%→%, $→$$, %=→${:uid}), then LLVMGetInlineAsm +
LLVMBuildCall2 (AT&T dialect). Dispatch wired in emit_llvm.zig (replacing the C.0
@panic tripwire).

inferType gains an .asm_expr arm (expr_typer.zig) so a bare `x := asm {…-> T}`
binding types correctly — without it the binding inferred .unresolved and
silently produced 0.

llvm_shim.c: LLVMInitializeNativeAsmParser() — the JIT must assemble inline asm
at run time.

Verified end-to-end on the aarch64 host: `mov`/`add` with register-class inputs
and a value output run (exit 42/99), `nop volatile` runs (exit 0). IR is
textbook: `call i64 asm "add ${0},${1},${2}", "=r,r,r"(…)`.

Locked with 1645 (aarch64 add, runs; ir-only on non-aarch64) + 1646 (:= binding).
Updated 1640 (now Phase-E bail) + 1642 (now runs).

zig build test green (654 corpus, 446 unit).
2026-06-15 21:39:54 +03:00
agra
6c08de8ec1 feat(asm): Phase C.0 — add inline_asm IR op (lock, no behavior change)
Adds the `inline_asm: InlineAsm` opcode to the IR Op union (inst.zig): interned
template + operand list (role/name/constraint/operand) + interned clobber names
+ has_side_effects; the result rides on Inst.ty (void / scalar / tuple).

The new variant forces coverage in the exhaustive Op switches:
- interp.zig: loud bailDetail — inline asm is never comptime-evaluable.
- print.zig: an IR-dump arm.
- emit_llvm.zig: a @panic TRIPWIRE — emit lands in Phase D, and until then
  lowerAsmExpr still bails, so no inline_asm op is ever created. Reaching emit
  would mean lowering switched over before emit was ready; crash loudly rather
  than miscompile.

No behavior change: lowering still bails, the op is constructed only in the new
`inline_asm op shape` unit test (inst.test.zig).

zig build test green (652 corpus, 446 unit).
2026-06-15 21:00:12 +03:00
agra
5f444aae26 feat(asm): Phase B.1 — operand-name validation (echo + duplicates)
Extends lowerAsmExpr with a pinnedRegister(constraint) helper and two §II.5
operand-naming checks, in the compile path before the codegen bail:

- reject the echo form `[eax] "={eax}"` — a label identical to the register its
  own constraint pins is redundant (the operand is already auto-named after the
  register); the useful form is a label that differs (`[quot] "={rax}"`);
- reject duplicate operand names (ambiguous %[name] / result field).

Locked with 1643-platform-asm-echo-name and 1644-platform-asm-duplicate-name.

zig build test green (652 corpus, 445 unit).
2026-06-15 20:41:41 +03:00
agra
1040b8c776 feat(asm): Phase B.0 — validate asm shape in the compile path
Restructures the .asm_expr lowering arm into lowerAsmExpr, which validates the
asm shape with specific named diagnostics BEFORE the not-yet-implemented codegen
bail, so the user sees the real problem first. Two checklist items enforced:

- template must be a compile-time-known string ("..." or #string), not a
  runtime expression;
- an asm with no value outputs must be `volatile` (else its effects could be
  deleted) — mirrors Zig's rule.

Valid shapes still bail with the "codegen not yet implemented" message. Result-
type derivation + the operand auto-naming rule stay deferred to Phase C, where a
real IR op makes the result type observable/testable.

Locked with 1641-platform-asm-missing-volatile (the volatile error) and
1642-platform-asm-nop-volatile (no-output + volatile accepted → codegen bail).

zig build test green (650 corpus, 445 unit).
2026-06-15 20:35:43 +03:00
agra
f8e029d719 feat(asm): Phase A.1 — parse asm { … } into AsmExpr; loud lowering bail
`asm volatile? { "tmpl", [name]? "constraint" (-> Type | = expr), …,
clobbers(.…) }` now parses into a flat-operand AsmExpr/AsmOperand (ast.zig +
parser.zig parseAsmExpr, dispatched from parsePrimary on .kw_asm). `volatile`
and `clobbers` are recognized contextually (not reserved). `-> @place`
write-through is rejected with a clear "Phase 2" parse error.

Codegen is not implemented yet (IR op + LLVM emit are Phases C–E), so lowering
bails LOUD + named via an explicit .asm_expr arm in lower/expr.zig (not the
generic unknown_expr else) — emitPlaceholder makes hasErrors() abort the build
on the message.

The new asm_expr tag forced (and got) arms in three exhaustive Node.Data
switches: sema.zig analyzeNode + findNodeAtOffset, semantic_diagnostics.zig
checkBindingNames — each recurses into template + operand payloads.

Design: adopted the operand auto-naming rule (design §II.5) — name auto-derived
from a {reg} pin, explicit [name] only when it differs or for register-class
operands, echo form rejected. Typing-stage rule; parser stores name: ?[]const u8.

Locked with examples/1640-platform-asm-parse.sx (multi-output divmod: named
operands, register pins, clobbers — parses then bails, called from main).

Also files issue 0137 (pre-existing, orthogonal: `sx run` with no `main`
segfaults via an unguarded JIT entry lookup in target.zig — not an asm bug).

zig build test green (648 corpus, 445 unit).
2026-06-15 20:21:25 +03:00
agra
3c9ecd0b42 feat(asm): Phase A.0 — add kw_asm keyword + lex test
`asm` now lexes as a dedicated `kw_asm` keyword (Token.Tag + keyword map entry).
`volatile` and `clobbers` stay out of the global keyword table — they are
recognized contextually only inside an `asm { … }` body (PLAN-ASM Deviation 4).

- token.zig: kw_asm tag + `.{ "asm", .kw_asm }` map entry.
- lsp/server.zig: classifyToken exhaustive switch gained the .kw_asm arm
  (the new enum value forced coverage — intended tripwire).
- lexer.test.zig (new, wired into root.zig barrel): locks `asm`->kw_asm and
  `volatile`/`clobbers`->identifier.

Lock commit (behavior-locking passing test). zig build test green (445 unit).
2026-06-15 18:32:34 +03:00
agra
c92d11e748 docs(asm): Phase 0.2 — document <name>.build sidecar; Phase 0 complete
CLAUDE.md §Testing + §Test-layout now describe the optional `<name>.build` JSON
config (aot + target keys, ir-only arch-gating, unknown-key-is-error) and list
it among the `expected/` files, replacing the stale standalone `.aot` marker
prose. Closes Phase 0 (corpus target-gating); next is Phase A (kw_asm keyword).
2026-06-15 18:20:33 +03:00
agra
0095584105 test(asm): Phase 0.1 — corpus ir-only branch for cross-target examples
When a `.build` target doesn't match the host, the runner can't execute the
example here, so it verifies via `sx ir --target` only: asserts exit + the `.ir`
snapshot (stdout) + diagnostics (stderr), never `.stdout`. An `.ir` snapshot is
REQUIRED in ir-only mode — its absence is a loud failure, never a silent pass.

- corpus_run.test.zig: ir_only flag (target set & !hostMatchesTarget); first
  dispatch arm runs `sx ir`, sets act_exit/act_err/act_ir; skip stdout in both
  update and verify modes; require ir_raw.
- lock fixture 1639-platform-target-cross (asm-free main, target x86_64-linux,
  checked-in .ir). Verified: corrupt .ir => IR mismatch; delete .ir => require
  failure.

Test-infra only; no compiler code. zig build test green (647 corpus, 444 unit).
2026-06-15 18:19:17 +03:00
agra
c88f4fbcef test(asm): Phase 0.0 — corpus target-gating + .build JSON config
Adds per-example build/run directives to the corpus runner via an optional
`expected/<name>.build` JSON sidecar (`BuildConfig { aot, target }`), replacing
the standalone `.aot` marker. Threads `--target` into the run/build/ir spawns
and gates the execute path on host arch+os match; a cross-target example fails
loudly ("ir-only mode not yet implemented") pending Phase 0.1.

- corpus_run.test.zig: BuildConfig + std.json parse (unknown-key => error),
  hostMatchesTarget (shorthand-expand + arch/os token match, arm64->aarch64),
  withTarget argv helper; unit tests for both.
- migrate 1226/1227 `.aot` markers -> `.build` { "aot": true }.
- lock fixture 1638-platform-target-host (`.build` { "target": "macos" }).

Test-infra only; no compiler code. zig build test green (646 corpus, 444 unit).
2026-06-15 17:37:35 +03:00
169 changed files with 4761 additions and 45 deletions

View File

@@ -431,10 +431,12 @@ After any compiler change:
- A test is still keyed off its `expected/<name>.exit` marker, so seed an
empty marker first for a brand-new example (see "Adding a feature").
`zig build test` is the only way to run the corpus — there is no standalone
shell runner (the legacy `tests/run_examples.sh` was removed). An
`expected/<name>.aot` marker switches an example from JIT `sx run` to a
`sx build` + execute flow (needed to exercise a C-ABI symbol exported FROM sx
a JIT-resident symbol is invisible to a dlopen'd C dylib).
shell runner (the legacy `tests/run_examples.sh` was removed). Per-example
build/run directives live in an optional `expected/<name>.build` **JSON** sidecar
(see "Test layout" below): `{ "aot": true }` switches an example from JIT `sx run`
to a `sx build` + execute flow (needed to exercise a C-ABI symbol exported FROM sx
— a JIT-resident symbol is invisible to a dlopen'd C dylib); `{ "target":
"x86_64-linux" }` threads `--target` and arch-gates the example.
### Test layout
@@ -453,6 +455,7 @@ split into three streams (no more merged `2>&1`) plus an optional IR snapshot:
<root>/expected/XXXX-category-name.stdout # normalized stdout
<root>/expected/XXXX-category-name.stderr # normalized stderr
<root>/expected/XXXX-category-name.ir # optional `sx ir` snapshot
<root>/expected/XXXX-category-name.build # optional JSON build/run directives
```
A test is any `<name>.sx` with an `expected/<name>.exit` marker. The runner
@@ -460,6 +463,20 @@ scans two roots: `examples/` (the feature suite) and `issues/` (pinned bug
repros). Multi-file tests keep companions (`.c`/`.h`, imported `.sx`, fixture
dirs) under the same `XXXX-` prefix.
The optional `<name>.build` JSON sidecar carries per-example directives
(unknown keys are a hard error — never silently ignored):
- `"aot": true` — build a native binary and execute it instead of JIT `sx run`.
- `"target": "<triple|shorthand>"` — thread `--target` into every `sx`
invocation and gate on the host. If the target's arch+os **match** the host,
the example runs normally; if they **mismatch** (e.g. `x86_64-linux` on an
aarch64 host), the runner switches to **ir-only** mode — it skips
run/build/exec and asserts only `.exit` + `.ir` + `.stderr` from
`sx ir --target` (`.stdout` is not asserted). An `.ir` snapshot is **required**
in ir-only mode (its absence is a loud failure). This is how arch-pinned
examples (e.g. x86_64 inline-asm) are tested on a non-matching dev host while
still running end-to-end on a matching CI runner.
### Snapshot integrity
**Never regenerate snapshots while tests are failing.** `-Dupdate-goldens` (and the legacy `--update`) blindly overwrite expected output with whatever the compiler produces — including error messages. If you regenerate during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.

View File

@@ -1,24 +1,394 @@
# sx Inline Assembly — Checkpoint (ASM stream)
Companion to `current/PLAN-ASM.md`; design in
[docs/inline-asm-design.md](../docs/inline-asm-design.md). Update after every
[design/inline-asm-design.md](../design/inline-asm-design.md). Update after every
commit, one step at a time per the cadence rule (no commit may both add a test
and make it pass).
## Last completed step
None — plan authored, not yet started.
**G (indirect-memory `=*m` place outputs)** — the LAST substantive asm feature.
Unlike a write-through `=` output (which returns a value then stored), an
indirect output passes the place ADDRESS to the asm and the asm writes through
it — no return slot. `emitInlineAsm` (`src/backend/llvm/ops.zig`): indirect
outputs are excluded from the LLVM return type; their pointer is an opaque `ptr`
call arg placed **first** (arg-consuming constraint order = output-section
indirect pointers → inputs → read-write tied seeds); each gets an
`elementtype(T)` call-site attribute (required in the opaque-pointer era) via
`LLVMCreateTypeAttribute`/`LLVMAddCallSiteAttribute`; the store-back loop skips
them. New `asmIsIndirect(e, op)` helper. Lowering (`lowerAsmExpr`) stops
rejecting `*` (constraint kept verbatim, `=*m` reaches the constraint string
as-is). `asmOperandIndex` unchanged — indirect outputs still count as operands,
so `%[name]``${N}` holds. Verified by **running** on aarch64: store-through-
pointer (`str x9, %[out]` → 42, IR `"=*m,~{x9}"(ptr elementtype(i64) …)`) and a
mixed case (indirect + value output + input → `"=*m,=r,r"`, indirect ptr arg
first, `${0}/${1}/${2}` correct). Two commits per cadence: (1)
`examples/1652-platform-asm-indirect-mem.sx` locked the rejection; (2) implemented
+ flipped 1652 to a runnable aarch64-pinned example (`{ "target": "macos" }`,
ir-only elsewhere). `zig build test` green (661 corpus, 446 unit). Files:
`src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`, `examples/1652-*`.
Prior: **G (read-write `+` place outputs)** — a `+r` / `+{reg}` `-> @place` output is now
implemented. LLVM has no `+` constraint, so a
read-write place lowers to: an output **`=`** constraint (return slot, stored back
through the place after the call; the leading `+` rewritten to `=` in
`appendAsmConstraints`), **plus** a **tied input** (the decimal index of that
output) appended **after** the regular inputs, seeded with the place's loaded
value passed as a call arg. Tied inputs come **last** so existing operand indices
(`%[name]``${N}`) are undisturbed — `asmOperandIndex` unchanged. Lowering
(`lowerAsmExpr`) no longer rejects `+` (indirect `*` still rejected loudly).
`emitInlineAsm` (`src/backend/llvm/ops.zig`): grows arg/param arrays by the rw
count (`n_args = n_inputs + n_rw`), loads each seed (`asm.rw.seed`), emits the
tied constraint, and the existing store-back path writes the modified output back.
New `asmIsReadWrite(e, op)` helper. Verified by **running**: increment-in-place
(41→42, IR `"=r,0"`) and a mixed case (rw place + regular input + value output) →
textbook `"=r,=r,r,0"` with correct `${N}` indices and args `(input, seed)`. Two
commits per cadence: (1) `examples/1650-platform-asm-rw-place.sx` locked the
rejection; (2) implemented + flipped 1650 to a runnable aarch64-pinned example
(`{ "target": "macos" }`, ir-only elsewhere). `zig build test` green (658 corpus,
446 unit). Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`,
`examples/1650-*`.
Prior: **2**`-> @place` write-through outputs. An asm result can be **stored through
a place** (local / struct field) instead of returned; the place output does NOT
join the result tuple. Parser: `-> @place` parses the `@place` as an ordinary
address-of expression → an `out_place` operand (`src/parser.zig`). Lowering
(`lowerAsmExpr`): out_place operand = the lowered `@place` address, `out_ty` =
the pointee; read-write (`+`) and indirect-memory (`*`) constraints rejected
loudly (not yet implemented). Added `out_ty: TypeId` to the IR `AsmOperand`
(`src/ir/inst.zig`) so emit builds the **combined** return struct (ALL outputs).
`emitInlineAsm` rewrite (`src/backend/llvm/ops.zig`): the LLVM return type is now
built from every output's `out_ty`; after the call, out_place slots are
`store`d through their address and out_value slots rebuild the sx result — with a
**fast path** (no place outputs → the asm's struct return IS the result, so
pure-value asm IR is unchanged). Verified: write-to-local (`get42`→42), struct
field (`@p.b`), mixed value+place (`v=10 b=20`), `+` rejected. Locked with
`examples/1649-platform-asm-place-output.sx` (mixed, runs on aarch64). `zig build
test` green (657 corpus, 446 unit). Files: `src/parser.zig`, `src/ir/inst.zig`,
`src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`, `examples/1649-*`.
Prior: **F** — global (module-scope) asm. A top-level `asm { "tmpl", };` block (template
only) lowers to LLVM `module asm`, and a lib-less `extern` calls into the symbols
it defines. New `asm_global` AST node (`src/ast.zig`) + `parseAsmGlobal`
(`src/parser.zig`, dispatched from `parseTopLevel` on `kw_asm`) — rejects
`volatile` and any operands/clobbers. The node forced (and got) arms in the same
three `Node.Data` switches as `asm_expr` (`sema.zig` ×2, `semantic_diagnostics.zig`).
`Module` gains a `global_asm: ArrayList([]const u8)` (`src/ir/module.zig`);
`lowerMainAndComptime` captures each template (the dead `lowerDecls` is NOT the
top-level pass — `lowerRoot` Pass 2 uses `lowerMainAndComptime`); `emit_llvm.zig`'s
`emit()` appends each via `LLVMAppendModuleInlineAsm` (source order). Verified
end-to-end: an aarch64 `_my_add` global routine called via `extern` returns 42.
Locked with `examples/1648-platform-asm-global.sx`
(`.build { "aot": true, "target": "macos" }` → AOT build+run on aarch64, ir-only
elsewhere). `zig build test` green (656 corpus, 446 unit). **(Correction, later:
module asm ALSO runs under the JIT — `sx run` compiles to an in-memory object,
the integrated assembler assembles the `module asm` into it, ORC relocates and
runs it, so the symbol is resolvable at JIT main execution. The original "AOT
only" note was wrong; see 1653 for the JIT sibling. The genuine boundary is a
COMPILE-TIME `#run` call into a module-asm symbol, which fails loud via host
dlsym-miss — see 1654.)** Files: `src/ast.zig`, `src/parser.zig`, `src/sema.zig`,
`src/ir/semantic_diagnostics.zig`, `src/ir/module.zig`, `src/ir/lower/decl.zig`,
`src/ir/emit_llvm.zig`, `examples/1648-*`.
Prior: **E** — multi-output tuples. **Inline asm now returns tuples.** Replaced the
N>1 bail with a shared `asmResultType` helper (`src/ir/lower/expr.zig`, mixed
into `Lowering`) that derives the result type from the `out_value` operands
(0→void, 1→T, N→named tuple, named via the §II.5 effective-name rule). The key
realization: `toLLVMType(tuple)` already produces a literal struct `{T1,…,Tn}`
exactly LLVM's multi-output asm return — so **emit needed NO change**; building
the op with a tuple result type makes the asm call return the struct, which IS
sx's tuple value (destructured by the normal `tuple_get` path). `inferType`'s
`.asm_expr` arm now also delegates to `asmResultType` (single owner), so
`return asm`, `x := asm`, and `q, r := asm` all agree on the type. Verified
end-to-end on aarch64: `split(0x1234)``(lo=52, hi=18)`, a udiv/msub divmod→
`(3, 2)`. IR is textbook: `call { i64, i64 } asm "divq ${4}",
"={rax},={rdx},{rax},{rdx},r,~{cc}"(…)` → extractvalue → tuple. Converted 1640 to
the x86_64 multi-output IR lock (ir-only) + added `1647-platform-asm-aarch64-multi`
(runs on aarch64). `zig build test` green (655 corpus, 446 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `src/ir/expr_typer.zig`,
`examples/164{0,7}-*`.
Prior: **C.1 + D** — inline asm CODEGEN (lowering builds the op + LLVM emit). **Inline
assembly now runs end-to-end.** `lowerAsmExpr` (`src/ir/lower/expr.zig`) stops
bailing: it resolves each operand's effective name (§II.5 auto-naming), interns
template/constraints/clobbers, lowers input `Ref`s, derives the result `TypeId`
(0→void, 1→T), and builds the `inline_asm` op. Added a `%[name]`-references-a-
real-operand check (the last deferred validation). Multi-output (N>1) still bails
loudly ("Phase E"). `emitInlineAsm` (`src/backend/llvm/ops.zig`, port of Zig's
`airAssembly`): assembles the LLVM constraint string (outputs→inputs→`~{clobber}`,
`,``|`), rewrites the template (`%[name]``${N}`, `%%``%`, `$``$$`, `%=`
`${:uid}`), then `LLVMGetInlineAsm` + `LLVMBuildCall2` (AT&T). Dispatch wired
(`emit_llvm.zig`, replacing the C.0 `@panic`). **`llvm_shim.c`**: added
`LLVMInitializeNativeAsmParser()` — the JIT must assemble inline asm at run time.
Verified end-to-end: aarch64 `add`/`mov` run on the host (exit 42), `nop volatile`
runs (1642 now exit 0), IR is textbook (`call i64 asm "add ${0},${1},${2}",
"=r,r,r"(…)`). Locked with `examples/1645-platform-asm-aarch64-add.sx` (runs on
aarch64, ir-only elsewhere via `.build` + `.ir`). Also added the `inferType`
`.asm_expr` arm (`src/ir/expr_typer.zig`, 0→void / 1→T) — without it a bare
`x := asm {…-> T}` binding inferred `.unresolved` and silently produced 0;
regression-locked with `examples/1646-platform-asm-value-binding.sx`. Updated
1640 (now Phase-E bail) + 1642 (now runs). `zig build test` green (654 corpus,
446 unit). Files: `src/ir/lower/expr.zig`, `src/backend/llvm/ops.zig`,
`src/ir/emit_llvm.zig`, `src/ir/expr_typer.zig`, `llvm_shim.c`,
`examples/164{0,2,5,6}-*`.
Prior: **C.0** — IR op `inline_asm` (lock; no behavior change). Added `inline_asm:
InlineAsm` to the IR `Op` union + the `InlineAsm` struct (`template: StringId`,
`operands: []const AsmOperand` {role/name/constraint/operand}, `clobbers:
[]const StringId`, `has_side_effects`) in `src/ir/inst.zig` — all strings
interned, operands in source order, result on `Inst.ty`. The new variant forced
(and got) arms in two exhaustive `Op` switches: `src/ir/interp.zig` (loud
`bailDetail` — inline asm is never comptime-evaluable) and `src/ir/print.zig`
(IR dump). `src/ir/emit_llvm.zig` gets a `@panic` **tripwire** — emit lands in
Phase D, and until then `lowerAsmExpr` still bails so no `inline_asm` op is ever
created (reaching emit would be a lowering-switched-over-too-early bug). Unit
test `inline_asm op shape` in `src/ir/inst.test.zig`. `zig build test` green
(652 corpus, 446 unit). Files: `src/ir/inst.zig`, `src/ir/interp.zig`,
`src/ir/print.zig`, `src/ir/emit_llvm.zig`, `src/ir/inst.test.zig`.
Prior: **B.1** — operand-name validation (design §II.5 auto-naming rule). Extended
`lowerAsmExpr` with a `pinnedRegister(constraint)` helper (`"={eax}"``eax`,
`"+{rax}"``rax`, `"=r"`→null) and two checks: (1) **reject the echo form**
`[eax] "={eax}"` — a label identical to its own pinned register is redundant
(the operand is already auto-named after the register); (2) **reject duplicate
operand names** (ambiguous `%[name]` / result field). Locked with
`examples/1643-platform-asm-echo-name.sx` + `1644-platform-asm-duplicate-name.sx`.
`zig build test` green (652 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`.
Prior: **B.0** — asm shape validation (compile-path diagnostics). Restructured the
`.asm_expr` lowering arm into `lowerAsmExpr` (`src/ir/lower/expr.zig`, mixed into
`Lowering` in `src/ir/lower.zig`): it validates BEFORE the not-yet-implemented
codegen bail, so the user sees the real problem first. Two checklist items now
enforced with named diagnostics: (1) **template must be a compile-time-known
string** (`"..."` / `#string`); (2) **no value outputs ⇒ must be `volatile`**
(mirrors Zig — a result-less asm could be deleted). Valid shapes still bail with
the "codegen not yet implemented" message. Result-type derivation + auto-naming
stay deferred to a later step (observable only once Phase C produces a real IR
op). Locked with `examples/1641-platform-asm-missing-volatile.sx` (volatile
error) + `1642-platform-asm-nop-volatile.sx` (volatile no-output accepted →
codegen bail). `zig build test` green (650 corpus, 0 failed; 445 unit). Files:
`src/ir/lower/expr.zig`, `src/ir/lower.zig`, `examples/164{1,2}-*`.
Prior: **A.1** — parse `asm { … }` + loud lowering bail (folded A.1+A.2 into one honest
lock commit, since the loud bail IS current correct behavior — cadence option
(a)). Added `AsmExpr`/`AsmOperand` to `src/ast.zig` + the `asm_expr` `Node.Data`
arm; `parseAsmExpr` in `src/parser.zig` (`parsePrimary` `.kw_asm` dispatch) —
parses the template, flat operand list (`[name]? "constraint" -> Type` value
output / `= expr` input), and `clobbers(.…)`; `volatile`/`clobbers` recognized
contextually via `isContextualWord`. The new `asm_expr` tag forced (and got)
arms in three exhaustive `Node.Data` switches: `src/sema.zig` `analyzeNode` +
`findNodeAtOffset`, `src/ir/semantic_diagnostics.zig` `checkBindingNames` (all
recurse into template + operand payloads). Lowering bails LOUD + named in
`src/ir/lower/expr.zig` ("inline assembly codegen is not yet implemented…") via
an explicit `.asm_expr` arm (not the generic `unknown_expr` else) returning
`emitPlaceholder`. `-> @place` write-through is rejected with a clear "Phase 2"
parse error. Locked with `examples/1640-platform-asm-parse.sx` (multi-output
`divmod`, named operands, register pins, clobbers — parses then bails; called
from `main`). `zig build test` green (648 corpus, 0 failed; 445 unit). Files:
`src/ast.zig`, `src/parser.zig`, `src/sema.zig`, `src/ir/semantic_diagnostics.zig`,
`src/ir/lower/expr.zig`, `examples/1640-*`.
Prior: **A.0**`kw_asm` keyword (first compiler code). Added the `kw_asm` `Token.Tag`
variant + `.{ "asm", .kw_asm }` keyword-map entry in `src/token.zig`; `volatile` /
`clobbers` deliberately stay OUT of the global table (contextual). New exhaustive
`Tag` switch in `src/lsp/server.zig` `classifyToken` flagged the missing arm (the
intended coverage tripwire) — added `.kw_asm` to the keyword group. Lock test in
new `src/lexer.test.zig` (`asm``kw_asm`, `volatile`/`clobbers``identifier`),
wired into the `src/root.zig` barrel as `lexer_tests`. `zig build test` green (648
corpus, 0 failed; 445 unit, 0 failed — +1). Files: `src/token.zig`,
`src/lexer.test.zig`, `src/root.zig`, `src/lsp/server.zig`.
Prior: **0.2** — CLAUDE.md docs for `<name>.build`; **Phase 0 COMPLETE**.
**0.1** — corpus runner **ir-only branch** for cross-target examples. Replaced
0.0's loud placeholder bail: when `cfg.target` doesn't match the host (`ir_only`),
`sweepRoot` skips run/build/exec and verifies via `sx ir --target` only —
asserting `.exit` (ir cmd) + `.ir` (normalized stdout) + `.stderr`, never
`.stdout` (write skipped in update mode, assertion skipped in verify mode). An
`.ir` snapshot is **required** in ir-only mode — its absence is a loud failure
("needs an .ir snapshot for ir-only mode"). Locked with
`examples/1639-platform-target-cross.sx` (asm-free `main :: () -> i64 { return 0;
}`), `.build` `{ "target": "x86_64-linux" }`, + checked-in `.ir`. Verified both
guards fire: corrupting the `.ir` → IR mismatch; deleting it → the require-failure.
`zig build test` green (647 corpus, 0 failed; 444 unit). Files:
`src/corpus_run.test.zig`, `examples/1639-*`.
## Current state
Design fully converged (`docs/inline-asm-design.md`). Feasibility confirmed:
`llvm_api.c.*` exposes `LLVMGetInlineAsm` / `LLVMBuildCall2` /
`LLVMAppendModuleInlineAsm` (LLVM@19). No code written.
**Inline assembly works end-to-end: 0, 1, and N value outputs (tuples).** Full
pipeline: lex (A.0) → parse (A.1) → validate (B.0/B.1 + `%[name]` check) → IR op
(C.0) → lower-builds-op + LLVM emit + JIT asm-parser init (C.1/D) → multi-output
tuples (E). Register-class + register-pinned operands, inputs, **symbol operands
(`"s"` → direct `bl`/`call` to a function/global by mangled name)**, clobbers,
`#string` multi-instruction templates, `%[name]`/`%%` rewriting, and the §II.5
auto-naming rule all work and execute on the host JIT. Global `asm { … }` (Phase F) works via
lib-less `extern` under BOTH the JIT (`sx run` → 1653) and AOT (1648) — `sx run`
compiles to an object, so the integrated assembler bakes the `module asm` symbol
in and ORC resolves it. All three `-> @place` output forms now work and execute
on aarch64: **write-through** `=` (Phase 2), **read-write** `+` (tied input), and
**indirect-memory** `=*m` (pointer arg + `elementtype`, asm writes through it).
**Inline assembly is now feature-complete — no substantive features remain.** The
x86_64 syscall-write ir-only example is DONE (1651). Global asm runs under both
JIT (1653) and AOT (1648). `readme.md` now has an "Inline Assembly" section.
Known orthogonal bug: **issue 0137**`sx run` on a program with no `main`
segfaults (`src/target.zig:256-273`, unguarded JIT entry lookup). Pre-existing,
asm-independent; does NOT block the ASM stream (every example has a `main`).
Phase EF feasibility already confirmed against the live tree
(`LLVMGetInlineAsm` / `LLVMBuildCall2` / `LLVMAppendModuleInlineAsm` in LLVM@19
`Core.h`; ERR-stream `extractvalue`→tuple in `emit_llvm.zig:726-927`; lib-less
`extern`, 60 sites; `--target` a global CLI flag).
## Next step
**A.0** — add the `kw_asm` keyword (`src/token.zig` Tag + `StaticStringMap`) and a
unit lex test. Then A.1 (parse `asm { … }``AsmExpr`, lowering bails loudly).
**Inline assembly is feature-complete.** All substantive features are done:
0/1/N value outputs (tuples), register-class + pinned operands, inputs, clobbers,
`#string` templates, `%[name]`/`%%`/`$`/`%=` rewriting, §II.5 auto-naming, global
`asm { … }` (AOT), and all three `-> @place` output forms — write-through (`=`),
read-write (`+`), and indirect-memory (`=*m`). The x86_64 syscall-write ir-only
example (1651) and the output-to-`const` rejection (issue 0138) are also done.
Global asm runs under BOTH the JIT (`sx run` → object → ORC; 1653) and AOT (1648)
— the earlier "AOT only / `sx run` mishandles module-asm" note was stale and has
been corrected. The one genuine boundary is a COMPILE-TIME `#run` into a
module-asm symbol: the interpreter resolves externs via host dlsym, the symbol
isn't linked yet, so it already fails loud (`comptime extern call: symbol not
found via dlsym`) — pinned by 1654.
Remaining work, all **polish** (optional):
- None substantive. Possible niceties: tighten the `#run`-into-module-asm error
text to name module-asm specifically; broaden clobber validation to a checked
per-arch enum (design doc Phase 4).
Orthogonal: **issue 0137** (no-`main` JIT segfault).
Done since last: output-to-`const` rejection (issue 0138), x86_64 syscall-write
ir-only example (1651).
Orthogonal: **issue 0137** (no-`main` segfault).
## Log
- (init) Plan + design doc written; ASM stream opened.
- (0.0) Corpus runner target-gating: `<name>.build` JSON config (replaces `.aot`
marker), `--target` threading, `hostMatchesTarget` execute-gate, loud
cross-target placeholder bail. Migrated 1226/1227 `.aot``.build`; locked with
1638 fixture + unit tests. `zig build test` green.
- (0.1) ir-only branch: cross-target examples verify via `sx ir --target` only
(exit+ir+stderr, no stdout; `.ir` required). Locked with 1639 fixture; verified
corrupt-.ir → mismatch and missing-.ir → loud failure. `zig build test` green.
- (0.2) docs: CLAUDE.md documents `<name>.build` JSON sidecar (aot + target +
ir-only gating), replacing stale `.aot` marker prose. **Phase 0 COMPLETE.**
- (A.0) `kw_asm` keyword in token.zig (+ map entry); LSP `classifyToken` switch
coverage; lock test in new `lexer.test.zig` (wired via root.zig). `volatile` /
`clobbers` stay contextual identifiers. `zig build test` green (445 unit, +1).
- (A.1) parse `asm { … }``AsmExpr` + loud lowering bail; `asm_expr` arms in 3
exhaustive `Node.Data` switches; `-> @place` rejected (Phase 2). Adopted operand
auto-naming rule (design §II.5). Locked with 1640 fixture. Filed orthogonal
issue 0137 (no-`main` JIT segfault). `zig build test` green (648 corpus, 445 unit).
- (B.0) asm shape validation in `lowerAsmExpr`: comptime-string template +
no-output⇒volatile, with named diagnostics before the codegen bail. Locked with
1641 (volatile error) + 1642 (volatile accepted). `zig build test` green (650
corpus, 445 unit).
- (B.1) operand-name validation: `pinnedRegister` helper + reject echo form
(`[eax] "={eax}"`) and duplicate names. Locked with 1643 + 1644. `zig build
test` green (652 corpus, 445 unit).
- (C.0) IR op `inline_asm: InlineAsm` + interp `bailDetail` + print arm + emit
`@panic` tripwire (Phase D). No behavior change (lowering still bails). Unit
test `inline_asm op shape`. `zig build test` green (652 corpus, 446 unit).
- (C.1+D) CODEGEN — `lowerAsmExpr` builds the op (effective names, interned
strings, input Refs, 0/1 result type) + `%[name]` validation; `emitInlineAsm`
(constraint string + template rewrite + `LLVMGetInlineAsm`/`BuildCall2`, AT&T);
`inferType` arm; `LLVMInitializeNativeAsmParser` for the JIT. **Inline asm runs
end-to-end.** N>1 bails (Phase E). Locked with 1645 (aarch64 add, runs) + 1646
(`:=` binding); updated 1640/1642. `zig build test` green (654 corpus, 446 unit).
- (E) multi-output tuples — `asmResultType` helper (0→void/1→T/N→named tuple),
shared by lowering + `inferType`. `toLLVMType(tuple)` == LLVM multi-output
struct, so emit unchanged; the asm struct return IS the sx tuple. Runs on
aarch64 (1647: `split``(lo,hi)`); 1640 → x86 multi-output IR lock (ir-only).
`zig build test` green (655 corpus, 446 unit).
- (F) global asm — `asm_global` AST node + `parseAsmGlobal` (top-level, rejects
volatile/operands); `Module.global_asm` captured in `lowerMainAndComptime`;
`emit()` appends via `LLVMAppendModuleInlineAsm`; call-into via lib-less
`extern`. AOT-verified (1648, `_my_add`→42). `zig build test` green (656 corpus).
- (docs) readme.md "Inline Assembly" section (b8800a2).
- (2) `-> @place` write-through — `out_place` operand; `out_ty` on the IR
AsmOperand; `emitInlineAsm` builds the combined output struct + splits
(out_place → store-through, out_value → result), fast-path when no places.
`+`/`*` rejected. Locked with 1649 (mixed, runs). `zig build test` green (657
corpus, 446 unit).
- (G) read-write `+` place outputs — `+` lowers to an output `=` + a tied input
(output-index constraint) seeded with the place's loaded value, tied inputs
appended last (operand indices undisturbed). `appendAsmConstraints` rewrites
`+``=`; `emitInlineAsm` grows args by the rw count + loads seeds;
`asmIsReadWrite` helper. Lowering stops rejecting `+` (`*` still rejected). Two
commits (cadence): 1650 locked the rejection, then flipped to a runnable
aarch64 example (`"=r,0"` IR). `zig build test` green (658 corpus, 446 unit).
- (0138) output-to-`const` rejection — fixed the underlying general bug: scalar
`@const` (address-of a folded `::` constant) reinterpreted the value as a
pointer (`inttoptr`). `src/ir/lower/expr.zig` `.address_of` now diagnoses a
scalar const (local + module) instead of falling through; array/struct consts
keep storage. asm `-> @const` gets the clean diagnostic for free (same path).
Regression `examples/1177-diagnostics-addr-of-const-rejected.sx`. Issue 0138
RESOLVED. `zig build test` green (659 corpus, 446 unit).
- (x86 syscall) x86_64 Linux `write(2)` via raw `syscall` — locks the constraint
string `={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}` (register-pinned
inputs + pinned value output + pointer input + clobbers). ir-only on aarch64
(`.ir` asserted), runs on x86_64-linux (hand-authored `"ok\n"` stdout).
`examples/1651-platform-asm-x86-syscall-write.sx`. Pure additive lock, no
compiler change. `zig build test` green (660 corpus, 446 unit).
- (G indirect) indirect-memory `=*m` place outputs — the place address is passed
as an opaque `ptr` arg (with an `elementtype(T)` call-site attr), placed before
inputs; asm writes through it; no return slot; store-back skips it.
`asmIsIndirect` helper; lowering stops rejecting `*`. Verified by running on
aarch64 (store-through → 42; mixed indirect+value+input → `"=*m,=r,r"`). Two
commits (cadence): 1652 locked the rejection, then flipped to a runnable aarch64
example. **Inline asm now feature-complete.** `zig build test` green (661 corpus,
446 unit).
- (jit) explored "asm in JIT": found it ALREADY works — `sx run` emits an
in-memory object (integrated assembler bakes in both in-function inline asm and
`module asm`), then ORC relocates+runs it. The stale "AOT only / `sx run`
mishandles module-asm" checkpoint prose was corrected. Locked global-asm-under-
JIT with `examples/1653-platform-asm-global-jit.sx` (`{ "target": "macos" }`, no
aot, → 42). `zig build test` green (662 corpus, 446 unit).
- (comptime guard) pinned the one genuine module-asm boundary:
`examples/1654-platform-asm-global-comptime-call.sx``#run` into a module-asm
symbol fails loud (`comptime extern call: symbol not found via dlsym`) because
the interpreter resolves externs via host dlsym before link. Arch-independent
(no `.build`). `zig build test` green (663 corpus, 446 unit).
- (round trip) `examples/1655-platform-asm-callback-into-sx.sx` — global-asm
trampoline that `bl _cb` back into an `export`ed sx function (sx→asm→sx, → 42).
Documented that `export` (external linkage + C symbol + C ABI) is what makes
the callback resolvable; `callconv(.c)` alone leaves it `internal` (DCE'd).
`zig build test` green (664 corpus, 446 unit).
- (symbol ops) symbol operands (`"s"`) — feed a function/global symbol; the
template emits its platform-mangled name so `bl %[fn]` is a DIRECT branch (one
fewer indirection than register-indirect `blr`, portable — no hardcoded `_`).
Emit passes the operand with its own llvm type (LLVMTypeOf), no coercion
(`asmIsSymbol` helper); lowering lowers the function RHS to `ptr @fn`. Decided
AGAINST mirroring Zig (which has no symbol operand — 483 std asm sites, none
call a function) because the direct `bl` matters. Two commits (cadence): 1656
locked the rejection (replacing an LLVM-verifier crash), then implemented +
flipped to a runnable aarch64 example (objdump-confirmed direct `bl <_cb>`).
`zig build test` green (665 corpus, 446 unit).
- (x86 cross-arch) ir-only x86_64 siblings so each emit path is locked on BOTH
arches: 1657 read-write (`"incq ${0}","=r,0"`), 1658 indirect (`"movq $$42,
${0}","=*m"`(ptr elementtype)), 1659 symbol (`"call ${2:P}"`, direct call). x86
templates validated by cross-emitting an object (integrated assembler accepts;
objdump confirms 1659's direct `call` reloc). Pure additive locks. `zig build
test` green (668 corpus, 446 unit).
- (symbol portability) made `%[fn]` portable across arches — `renderAsmTemplate`
auto-injects LLVM's `:c` modifier (`${N}``${N:c}`) for symbol (`"s"`) operands
lacking an explicit modifier (`asmNamedIsSymbol` helper). Without it x86 renders
`$cb` (a bad `call` target needing a hand-written `:P`); aarch64 unaffected.
Verified `:c``:P` for x86-64 calls (both → `R_X86_64_PLT32`). Explicit
`%[fn:X]` still wins (escape hatch). 1659 dropped its `:P` → same plain `%[fn]`
as aarch64 1656; both IRs regen to `${N:c}`. `zig build test` green (668 corpus,
446 unit).
## Known issues
None yet.
- **0138** — RESOLVED. `@const` (address-of a `::` comptime constant) yielded a
wild pointer (`inttoptr (i64 <value> to ptr)`). Fixed by diagnosing scalar
`@const` in `src/ir/lower/expr.zig` `.address_of` (no storage; array/struct
consts unaffected). Delivered the ASM "output-to-`const` rejection" for free.
Regression `examples/1177-diagnostics-addr-of-const-rejected.sx`.
- **0137** — `sx run` on a program with no `main` segfaults (unguarded JIT entry
lookup, `src/target.zig:256-273`). Pre-existing, asm-independent. Filed
`issues/0137-jit-run-no-main-segfault.md`. Does not block A.1.

View File

@@ -0,0 +1,32 @@
# CHECKPOINT-REIFY — comptime `type_info` / `reify` (async-first foundation, step 3)
Companion to [PLAN-REIFY.md](PLAN-REIFY.md). Update after every step (one step at a
time, per the cadence rule).
## Last completed step
**None — stream just carved.** Design validated (3 codebase reviewers; all five reify
contracts confirmed feasible). No code written yet.
## Current state
- The plan + the five locked contracts exist in `PLAN-REIFY.md`; design-of-record is
`design/execution-evolution-roadmap.md` §7 step 3 + §8.1.
- **Nothing built.** `reify`/`type_info`/`field_type` do not exist in the compiler.
- Confirmed against the source (anchors in the plan): type minting via
`intern`/`internNominal` is programmatic and AST-free; type-fns memoize by mangled
name; enum codegen is fully type-table-driven (zero AST coupling); recursive
forward-declaration (reserve→complete) already exists for source types.
## Next step
**Phase 0.0 (lock):** add `TypeInfo`/`EnumInfo`/`EnumVariant` data types + bodyless
`#builtin` decls for `reify`/`type_info`/`field_type` to `library/modules/std/core.sx`
(parsed, unimplemented → loud bail), with a unit test that the decls parse. Then 0.1
(xfail: `examples/06xx-comptime-reify-enum.sx`) → 0.2 (green: implement `reify(.enum_)`).
## Known issues
None yet.
## Log
- **Stream carved.** Selected as the first async-first foundation: `reify` gates both
channel result types (`RecvResult($T)`) and `race`'s synthesized union, is fully
validated (3 reviewers), and is a self-contained compiler/type-system feature
testable in isolation (`06xx` comptime). Generic-enum syntax dropped in its favor.

View File

@@ -1,6 +1,6 @@
# sx Inline Assembly — Implementation Plan (ASM stream)
**Design source of truth:** [docs/inline-asm-design.md](../docs/inline-asm-design.md).
**Design source of truth:** [design/inline-asm-design.md](../design/inline-asm-design.md).
This plan turns that doc's §II.7 stage-map + §II.8 phasing into ordered,
commit-sized, testable steps. Read the design doc first — this file is the
*how/when*, not the *what/why*.
@@ -22,8 +22,93 @@ outputs return a tuple; templates are pure AT&T (via LLVM).
## Cadence (IMPASSIBLE)
No commit may both add a test AND make it pass. Each feature step is either a
behavior-locking PASSING test, or an xfail test the *next* commit turns green.
Arch-pinned tests live in `examples/16xx-platform-asm-*` (must declare `target=`).
Never regenerate snapshots while red.
Arch-pinned tests live in `examples/16xx-platform-asm-*` and declare their target
via the `expected/<name>.target` sidecar marker (Phase 0). Never regenerate
snapshots while red.
## Phase 0 — corpus target-gating (test-infra prerequisite; no compiler code)
**Why first.** The flagship v1 examples are `x86_64` (syscall-write, divmod,
cpuid) but the dev host is `aarch64`-Darwin, and the corpus runner
([src/corpus_run.test.zig](../src/corpus_run.test.zig)) currently (a) never threads
a per-example `--target` and (b) has no host-arch gate — its only skip is "marker
has no `.sx`". So D.0's `…-syscall-write` markers asserting exit/stdout describe
output the harness *cannot* produce on this host, which would violate the cadence
rule (the "next commit turns it green" can never happen). Phase 0 closes that gap.
It touches **only the runner + two fixtures** — zero compiler code, zero risk to
AE, and unblocks every arch-pinned asm example.
**Marker taxonomy (the cleanup).** The runner currently spreads per-example
*directives* across standalone boolean/value sidecars (`.aot` now, `.target`
proposed, more later). Replace that sprawl with **one optional config file,
`expected/<name>.build`**, holding all build/run directives; the output snapshots
(`.exit` / `.stdout` / `.stderr` / `.ir`) stay separate — they are
machine-regenerated data, not config. `.exit` remains the **test-discovery key**
(every test has one; `.build` is optional).
**`.build` format** — JSON, parsed with `std.json`:
```json
{ "aot": true, "target": "x86_64-linux" }
```
Parse via `std.json.parseFromSlice(BuildConfig, …)` into
`struct { aot: bool = false, target: ?[]const u8 = null }`. Field defaults cover
omitted keys; `std.json`'s default `ignore_unknown_fields = false` makes an
**unknown key a loud `error.UnknownField`** (surfaced as a runner failure, never a
silent ignore — CLAUDE.md no-silent-default rule). Extensible: future `"cpu"`,
`"link"`, `"cwd"` are just new optional struct fields, no new sidecar file and no
custom parser.
**What the directives do:**
1. **`target = <triple|shorthand>`** threads `--target <value>` into every `sx`
invocation for that example (`run` / `build` / `ir``--target` is a global
flag, confirmed [main.zig:39](../src/main.zig#L39)), AND **host-match selects
the mode.** The runner parses the leading `arch` + `os` tokens of the resolved
triple and compares them to `@import("builtin").target` (normalizing
`arm64``aarch64`):
- **match** → *execute* exactly as today (`sx run`, or `aot` build+exec) with
the target threaded, plus the `.ir` diff if an `.ir` snapshot exists. ⇒ an
x86_64 example gives **real end-to-end coverage on an x86_64 CI runner**.
- **mismatch** → **ir-only**: run *only* `sx ir <file> --target <t>`; assert
`.exit` (the ir command's exit), `.ir` (normalized stdout), and `.stderr`
(diagnostics, normally empty). Do **not** run/build/exec; do **not** assert
`.stdout`. An `.ir` snapshot is **required** in ir-only mode — its absence is
a loud runner failure ("arch-pinned <name>: ir-only mode requires an .ir
snapshot"), never a silent pass. Robust even if `sx ir` treats `--target` as
a partial no-op: the `inline_asm` op carries the template + constraint string
verbatim, so the IR snapshot still locks the exact thing §II.11 flags as
silently-miscompiling (the constraint assembler + template rewrite).
2. **`aot`** is the existing JIT-vs-build+exec switch, just relocated from the
standalone `.aot` marker into `.build`.
**Negative compile-error examples need NO `.build`.** `…-missing-volatile`
(no-output-without-`volatile`) is a Sema diagnostic raised before codegen/JIT, so
plain `sx run` reports it identically on any host — it stays a normal example with
no config file.
**update-goldens interaction:** in ir-only mode, `-Dupdate-goldens` writes `.exit`
(ir exit) + `.ir` (+ `.stderr` if non-empty) and skips `.stdout`. Execute mode
(incl. `aot`) is unchanged. `.build` is hand-authored — update-goldens never
writes it.
| Step | Commit | What | Files |
|---|---|---|---|
| 0.0 | lock | Add `BuildConfig` + `std.json` parse of `expected/<name>.build` (unknown-key ⇒ `error.UnknownField`); **migrate** the 2 existing `.aot` markers → `.build` (content `{ "aot": true }`) and delete them; thread `target`'s `--target` into the spawned argv; add `hostMatchesTarget(value) bool` (arch+os token parse, `arm64``aarch64`) gating the **execute** path. Lock with `examples/16xx-platform-target-host.sx` (trivial `main`) + a `.build` `{ "target": "<host arch triple>" }` (still runs+passes) and unit `test`s for the JSON parse + `hostMatchesTarget`. | `src/corpus_run.test.zig`, `examples/expected/1226-*.{aot→build}`, `…/1227-*`, + fixture |
| 0.1 | lock | Implement the **mismatch ⇒ ir-only** branch (skip run/build/exec; assert `.exit`+`.ir`+`.stderr` from `sx ir --target`; require `.ir`). Lock with `examples/16xx-platform-target-cross.sx` (asm-free `() -> i64 { return 0; }`) + `.build` `{ "target": "x86_64-linux" }` + a checked-in `.ir` snapshot — exercises ir-only on the arm64 host. | `src/corpus_run.test.zig` + fixture |
| 0.2 | docs | Update CLAUDE.md §"Test layout"/§"Testing" to document `.build` (format + `aot`/`target` keys) replacing the standalone `.aot` marker prose (lines ~435, ~492). | `CLAUDE.md` |
Both 0.0 and 0.1 are **lock** commits: the runner change and the fixture that
exercises it land together and pass the moment they land (the mechanism works
immediately — nothing is left red), which is the cadence rule's "lock in current
behavior" flavor, not a feature red→green. No asm lowering is gated on either.
**Phase 0 verification:** `zig build test` green; deliberately corrupt the
cross-target `.ir` fixture and confirm the runner reports an IR mismatch (proves
ir-only actually asserts, isn't a no-op); delete it and confirm the
"requires an .ir snapshot" failure fires.
**Estimated runner delta:** ~7090 lines (sidecar read + `--target` argv threading
+ `hostMatchesTarget` + the ir-only branch + update-mode tweak). Within the
"no step > ~500 new lines" rule; well under the read budget.
## Phase A — keyword + AST + parser (parses; no codegen)
| Step | Commit | What | Files |

124
current/PLAN-DIST.md Normal file
View File

@@ -0,0 +1,124 @@
# PLAN-DIST — bundle `zig` as sx's hermetic link/libc backend
## Goal
`sx build` produces a native binary by driving a **bundled `zig`**
(`zig cc`) as the linker, so a distributed sx on Linux needs no system
`cc`/lld/libc/CRT. `sx run` (JIT) is unaffected — it never links.
This is the "be like Zig" move: reuse Zig's hermetic toolchain (lld +
crt objects + musl/glibc, all bundled in the `zig` distribution) instead
of building our own lld-in-process + libc-from-source pipeline.
> **Configuration surface** (env vars, flags, resolution order,
> activation truth table, target→ABI map, distribution layout) is
> specified in [../design/bundled-zig-link-backend-design.md](../design/bundled-zig-link-backend-design.md) — the design-of-record
> for how the backend is configured. Keep the two files in sync.
## Locked decisions
1. **Default Linux output ABI = static musl** (`x86_64-linux-musl`,
`-static`). Output runs on ANY Linux with zero deps — the property
that makes Zig binaries portable. glibc/dynamic only via explicit
`--target x86_64-linux-gnu`.
2. **Activation = auto** when a bundled/resolvable `zig` exists AND the
user passed no `--linker`. Falls back to system `cc` otherwise.
3. **Dev uses PATH `zig`** (0.16.0 already installed). Defer copying a
vendored toolchain into `libexec/` until Phase 3 packaging.
## Why `zig cc`, not raw `ld.lld`
`zig cc` is a clang-compatible driver, so it slots into the **existing**
cc-style argv branch in `src/target.zig` almost unchanged, and supplies
lld + crt objects + musl/glibc automatically per `-target`. Driving
`ld.lld` directly would force us to locate/pass crt1.o/crti.o/libc
ourselves — exactly the work we're avoiding.
## Key code anchors (verified)
- Linker selection hook: `TargetConfig.getLinker()``src/target.zig:194-196`
(`self.linker orelse "cc"`).
- Unix `cc`-style link branch: `src/target.zig:524-564` (this is where
the zig backend hooks in; `-o`/`-L`/`-l`/extra objects already pass
through clang-compatibly).
- Exe-relative resolution pattern to mirror for finding zig:
`src/imports.zig:204-227` (`discoverStdlibPaths`, `$SX_STDLIB_PATH`
override + `<exe>/..` candidates).
- `--linker` CLI flag parsing: `src/main.zig:87-90`.
- Emit triple (must agree with link target): `src/ir/emit_llvm.zig`
(`LLVMSetTarget`, ~L246-284).
## Phases
### Phase 0 — Resolve a bundled/host zig
- New `src/zig_backend.zig`: `discoverZig(alloc) -> ?[]const u8`.
Resolution order:
1. `$SX_ZIG` env override.
2. `<exe>/../libexec/zig/zig` (install layout, Phase 3).
3. `<exe>/../../zig-bundle/zig` (dev vendored layout, Phase 3).
4. `zig` on `PATH` (dev fallback — active now).
- Add `SX_DEBUG_ZIG` trace, matching existing `SX_DEBUG_*` hooks.
- No behavior change yet; just resolution + a debug/print hook to confirm.
### Phase 1 — `zig cc` link backend (core change)
- `src/target.zig`: generalize the linker from a single token to a
**driver argv**. Today `getLinker()` returns one string at `argv[0]`;
introduce a `LinkBackend` so the internal backend contributes
`{zigPath, "cc"}` as leading entries.
- In the Unix branch (L524-564), when backend = zig:
- prepend `zig cc`,
- append `-target <mapped triple>`,
- add `-static` for musl,
- everything else (`-o`, `-L`, `-l`, extra objects, extra link flags)
passes through unchanged.
- Add `sxTripleToZig()` mapping (sx shorthand/triple → zig `-target`);
unspecified-on-Linux → `x86_64-linux-musl`.
- Align emit triple: when the zig backend is selected, set the LLVM
module triple in `emit_llvm.zig` to match the link target
(x86_64-linux), so the `.o` links cleanly against musl crt.
### Phase 2 — Activation
- Auto-enable: if `discoverZig()` succeeds and no `--linker` override,
use the zig backend for `sx build`. System `cc` remains the fallback.
- Optional explicit `--self-contained` / `--no-self-contained` to force.
- Confirm `sx run`/JIT path is untouched (no link step).
### Phase 3 — Distribution packaging
- `build.zig`: a `dist` step assembling
- `bin/sx` (built with `-Dstatic-llvm`),
- `libexec/zig/` (vendored zig binary **and its `lib/`**, copied from a
pinned ziglang.org release per host arch),
- `library/` (stdlib),
into a relocatable tarball.
- Pin the zig version (currently 0.16.0).
### Phase 4 — Verify & lock
- Manual first: `sx build hello.sx` (auto zig backend) then `file`/`ldd`
the output → expect "statically linked".
- Honor snapshot-integrity + FFI-cadence rules before adding a corpus
test (host/arch-gated, likely a `.build` sidecar).
## Risks / watch
- **Bundle size**: zig + its `lib/` ≈ 5060 MB.
- **gnu vs musl ABI**: pure codegen objects link fine against musl;
TLS/stack-protector are the only realistic friction. Aligning the emit
triple (Phase 1) covers the common path.
- **macOS/Windows cross** via the same `zig cc -target` is nearly free
after Phase 1, but Apple-SDK linking has caveats — scope to Linux
target first; treat the rest as follow-up.
- **c_import.zig** also shells `cc` for C imports (JIT). Out of scope
here; same backend can absorb it later.
## Status
- [x] Phase 0 — resolve zig (`src/zig_backend.zig`)
- [x] Phase 1 — zig cc link backend (`target.zig` + `emit_llvm` triple normalize)
- [x] Phase 2 — activation (`--self-contained`/`--no-self-contained`; auto on bundled zig)
- [ ] Phase 3 — dist packaging (vendor `zig` into `libexec/`)
- [ ] Phase 4 — verify & lock (manual ✓ macOS/Linux/Windows; corpus test pending runner `--self-contained` support)
Scope landed as **macOS + Linux + Windows** (not Linux-first). See the
"Implementation status" section in
[../design/bundled-zig-link-backend-design.md](../design/bundled-zig-link-backend-design.md)
for what refined the original locked decisions.

View File

@@ -5,7 +5,7 @@
They are *one* plan: Part B can't start until Part A is a behavior-equivalent
superset of `#foreign`, and Part A isn't "done" until Part B reaches the invariant.
**Design rationale:** [docs/inline-asm-design.md](../docs/inline-asm-design.md) §II.2
**Design rationale:** [design/inline-asm-design.md](../design/inline-asm-design.md) §II.2
(Deviation 6) + §II.10 #4 + the syntax evaluation.
**Decided syntax**
@@ -173,7 +173,7 @@ gate only the live tree (recommended) vs purge everything. Confirm 6 before Phas
> Work the FFI-linkage stream per `current/PLAN-EXTERN-EXPORT.md` (+ checkpoint
> `current/CHECKPOINT-EXTERN-EXPORT.md`). First read the plan's header (Decided
> syntax, Naming constraint, Key finding) and Part A; rationale is in
> `docs/inline-asm-design.md` §II.2 (Deviation 6) + §II.10 #4.
> `design/inline-asm-design.md` §II.2 (Deviation 6) + §II.10 #4.
>
> **This session = Part A, Phases 0 and 1 only** (`extern` works as a bare postfix
> keyword equivalent to a lib-less `#foreign` fn/global binding; `#foreign` stays

125
current/PLAN-REIFY.md Normal file
View File

@@ -0,0 +1,125 @@
# PLAN-REIFY — comptime type reflection + construction (`type_info` / `reify`)
## Goal
Add the two comptime metaprogramming builtins — **`type_info($T) -> TypeInfo`**
(reflect a type → data) and **`reify(info: TypeInfo) -> Type`** (construct a *new
nominal type* from data) — plus the sx-lib helpers (`make_enum`, `field_type`,
`RecvResult`/`TryResult`) built over them. This is **step 3 of the async-first
sequence** ([../design/execution-evolution-roadmap.md](../design/execution-evolution-roadmap.md)
§7); it gates channel result types (`RecvResult($T)`) and `race`'s synthesized
tagged-union, and **replaces** a would-be `enum($T)` generic-enum language feature.
> Rationale + the five validated contracts: design doc §7 step 3 + §8.1. The approach
> was grounded by three codebase reviewers — it is a **small extension reusing existing
> machinery**, not net-new architecture.
## Locked design (the five reify contracts — all codebase-validated)
1. **Nominal identity via type-fn memoization.** `RecvResult(i64)` is one `TypeId`
because type-fns dedup by mangled `(fn,args)` name (`generic.zig:1620-1629`) +
reify `findByName`. NOT structural dedup — enums are nominal (`types.zig:1110`).
2. **Functional through codegen.** A reify'd enum has **no backing AST decl**, and
every enum stage is type-table-driven (layout, construct, match+exhaustiveness,
`toLLVMType`, `type_name`/format) — so it flows through **unmodified**.
3. **Validate loudly** at the `intern`/`internNominal` choke point (`types.zig:411-439`).
4. **Comptime-only, JIT-free** — a type-table op in the interpreter; no S1 dependency.
5. **Reference-based self-reference** (`*Self`/`[]Self`) via reserve-placeholder→
complete (`nominal.zig:86/108/120`, `types.zig:442`); **by-value recursion rejected**.
Surface follows the **`#builtin`** pattern of the existing reflection builtins
(`type_of`/`field_count`/`field_name` in `library/modules/std/core.sx`,
`specs.md:2594-2600`) — NOT the BuildOptions compiler-hook registry.
## Key code anchors (verified by review)
- Type minting: `TypeTable.intern` / `internNominal``src/ir/types.zig:411-439`.
- Type-fn instantiation + mangled-name cache — `src/ir/lower/generic.zig:1575-1689`
(cache check `:1620-1629`; register inline-struct result `:1663-1689`).
- Forward-declare reserve (recursive types) — `src/ir/lower/nominal.zig:86/108/120`;
complete a forward-declared type — `src/ir/types.zig:442`.
- Enum codegen (all type-table-driven, the reify target shape): size `types.zig:633-636`;
`resolveVariantIndex` `lower/expr.zig:1159-1177`; match `lower/control_flow.zig:748-945`;
`toLLVMType` `backend/llvm/types.zig:111-154`; `type_name` `types.zig:846-882`.
- Existing reflection builtins to mirror — `core.sx` (`#builtin`) + their interp/lower
handlers (`src/ir/interp.zig` `type_name`/reflection at ~`:1911`).
- Match form — `specs.md:408-424`.
## Cadence (IMPASSIBLE)
No commit may both add a test AND make it pass (xfail-then-green, or a behavior-lock).
`zig build && zig build test` after every step. Never regenerate snapshots while red.
Examples: `06xx` (comptime, deterministic), `11xx` (diagnostics for loud failures).
---
## Phases
### Phase 0 — `reify` of a flat enum (the core)
| Step | Commit | What | Files |
|---|---|---|---|
| 0.0 | lock | `TypeInfo`/`EnumInfo`/`EnumVariant` lib types in `core.sx` (data only); `reify`/`type_info`/`field_type` as bodyless `#builtin` decls (parsed, unimplemented → loud bail). Unit: decls parse. | `library/modules/std/core.sx`, `src/ir/interp.zig` |
| 0.1 | xfail | `examples/06xx-comptime-reify-enum.sx``reify(.enum_(.{variants=[.{name="value",payload=i64},.{name="closed",payload=void}]}))`, construct `.value(3)`, match it. Red (reify unimplemented). | `examples/06xx-*` |
| 0.2 | green | implement `reify(.enum_)` → build `EnumInfo`/`TaggedUnionInfo` `TypeInfo`, `internNominal(info, fresh_nominal_id)`, return `TypeId`. Example green; construct + match work unmodified (Contract 2). | `src/ir/interp.zig`, (`src/ir/types.zig` if a helper is wanted) |
### Phase 1 — type-fn → reify identity
| Step | Commit | What | Files |
|---|---|---|---|
| 1.0 | xfail | `examples/06xx-comptime-reify-typefn-identity.sx``R :: ($T)->Type { reify(...) }`; assert `R(i64)` from two sites is ONE type (assignable/matchable across sites). Red if reify-result not registered by mangled name. | `examples/06xx-*` |
| 1.1 | green | register a reify-returning type-fn's result under the instantiation mangled name (mirror the inline-struct path `generic.zig:1663-1689`). Identity holds (Contract 1). | `src/ir/lower/generic.zig` |
### Phase 2 — `type_info` (reflect) + `field_type`
| Step | Commit | What | Files |
|---|---|---|---|
| 2.0 | xfail | reflect a struct/tuple → read variant/field names + **types** (`field_type($T,i)`). Red. | `examples/06xx-*` |
| 2.1 | green | implement `type_info`/`field_type` over the type table (reuse the `field_count`/`field_name` reflection path). | `src/ir/interp.zig` |
### Phase 3 — `make_enum` + `RecvResult`/`TryResult` (sx lib)
| Step | Commit | What | Files |
|---|---|---|---|
| 3.0 | lock | `make_enum(variants) -> Type` (sx lib over `reify`); `RecvResult($T)`/`TryResult($T)` as type-fns. Behavior-lock: `RecvResult(i64)` constructs + matches. | `library/modules/std/*` |
### Phase 4 — reference-based self-reference
| Step | Commit | What | Files |
|---|---|---|---|
| 4.0 | xfail | recursive enum via `*Self` (tree/list): `reify_rec((self)=> .enum_(... payload = ptr_to(self) ...))`. Red. | `examples/06xx-*` |
| 4.1 | green | `reify_rec` reserve-placeholder→complete (reuse `nominal.zig:86`/`types.zig:442` recursive path). | `src/ir/interp.zig`, `src/ir/types.zig` |
### Phase 5 — validation + loud diagnostics
| Step | Commit | What | Files |
|---|---|---|---|
| 5.0 | xfail | `examples/11xx-diagnostics-reify-*` — dup variant names, non-integer backing, **by-value self-reference** ("infinite size; use `*Self`"). Pin the messages. | `examples/11xx-*` |
| 5.1 | green | validate `TypeInfo` at the `intern`/`internNominal` choke point; emit diagnostics, never a broken type (Contract 3). | `src/ir/interp.zig` / `src/ir/types.zig` |
> `RaceResult` (tuple→tagged-union synthesis) is **not** in this stream — it lands with
> `race` (async cluster), but it consumes exactly the `type_info`+`field_type`+`reify`
> primitives built here.
## Risks / watch
- **Mangled-name plumbing (Phase 1)** is the one real unknown — confirm the type-fn
path registers a *reify-returned* result (not just inline `struct {…}` literals).
Fallback: have `reify` itself name the type by the instantiation key + `findByName`.
- **Self-ref completion (Phase 4)** must reuse the existing recursive-type
reserve→complete path; do not invent a new mutate-after-intern mechanism.
- Keep `reify` **comptime-only**: a `reify` reached at runtime is a hard error.
## Status
- [ ] Phase 0 — `reify` flat enum
- [ ] Phase 1 — type-fn identity
- [ ] Phase 2 — `type_info` + `field_type`
- [ ] Phase 3 — `make_enum` + `RecvResult`/`TryResult`
- [ ] Phase 4 — reference self-reference
- [ ] Phase 5 — validation + diagnostics
## Kickoff prompt (paste into a fresh session)
> Work the REIFY stream per `current/PLAN-REIFY.md` (+ checkpoint
> `current/CHECKPOINT-REIFY.md`). Read the plan header (goal, five locked contracts,
> key anchors) first; rationale is in `design/execution-evolution-roadmap.md` §7 step 3
> + §8.1. **This session = Phase 0 only** (`TypeInfo` lib types + `reify` of a flat
> enum: construct + match). Cadence (IMPASSIBLE): no commit both adds a test and makes
> it pass — lock, then xfail→green. `zig build && zig build test` after every step. If
> you hit an unrelated compiler bug, follow the CLAUDE.md IMPASSIBLE RULE (file an
> issue, stop). Stop at the end of Phase 0; update the checkpoint.

View File

@@ -0,0 +1,384 @@
# Bundled `zig` Link Backend for sx — Design Doc & Proposal
> Status: **core landed (macOS / Linux / Windows).** This is the
> design-of-record for how a distributed sx links native binaries
> hermetically. The phased plan lives in
> [../current/PLAN-DIST.md](../current/PLAN-DIST.md); keep the two in sync.
> User-facing surface is documented in `readme.md` (Cross-Compilation §).
---
## Implementation status (landed)
The core backend is implemented and verified on a macOS host:
| Target | Result | Notes |
|--------|--------|-------|
| `--target linux-musl` | static ELF | `zig cc -target x86_64-linux-musl -static` |
| `--target windows-gnu` | PE32+ | `zig cc -target x86_64-windows-gnu` |
| `--target macos` | Mach-O (runs) | `zig cc -target <arch>-macos`, no `-static` |
What shipped, and where it **refined** the original locked decisions:
- **Scope = macOS + Linux + Windows** (not Linux-first). iOS/Android/wasm keep
their specialized toolchains. (`TargetConfig.zigBackendInScope`.)
- **Auto-activation = a *bundled* zig is found** (a real distribution, or a
pinned `$SX_ZIG`). A `PATH`-only zig is the dev fallback and engages **only**
under `--self-contained` — so native dev/CI builds are never silently
rerouted, across all three OSes. This is the precise meaning of the §5.5
"zig found (B)" column: **B = bundled**. *(Refinement of "auto when zig
found": PATH-zig does not auto-engage; the musl-only auto gating considered
mid-design was dropped in favor of bundled-vs-PATH, which is OS-agnostic.)*
- **No translation table** (per the triple-scheme decision): sx triples are
passed straight to `zig cc`, and `emit_llvm` runs them through
`LLVMNormalizeTargetTriple` so vendor-less zig triples (e.g.
`x86_64-windows-gnu`) land their OS/env in LLVM's canonical positions —
otherwise "windows" sits in the vendor slot and the object silently falls
back to ELF. The one unavoidable exception is **macOS**: the object must be
emitted from Apple's `apple-darwin` triple (LLVM needs it for Mach-O), but
zig's `-target` parser rejects that scheme, so the *linker* triple alone is
the vendor-less `<arch>-macos`. One OS-specific line, not a table.
- **New shorthands:** `linux-musl`, `linux-musl-arm`, `windows-gnu` (zig
scheme). The existing `linux`/`linux-arm` shorthands were also de-vendored
(`x86_64-linux-gnu`, matching the corpus runner's own expander).
Files: `src/zig_backend.zig` (discovery), `src/target.zig`
(`selectZigLinker` / `emitZigLinkArgv` / `zigTargetTriple` / dispatch in
`link`), `src/ir/emit_llvm.zig` (triple normalization), `src/main.zig`
(`--self-contained` / `--no-self-contained` + shorthands).
Not yet done: distribution packaging (Phase 3 — vendoring `zig` into
`libexec/`), and a corpus regression test (needs the runner to thread
`--self-contained`; manual verification only so far).
The sections below are the original proposal; where they say "Linux-first" or
"follow-up" for macOS/Windows, the table above supersedes them.
---
## 0. TL;DR + feasibility
**Problem.** A distributed `sx` compiler can run on a Linux box (static-LLVM
binary + relocatable `library/`), but it cannot *finish a build*: the final
link step shells out to the host's `cc`, and relies on the host's libc + CRT
objects. No `cc`/glibc/SDK on the box → no binary. That is the gap between
"sx runs here" and "sx is a toolchain here."
**Proposal.** Bundle a pinned `zig` binary inside the sx distribution and use
`zig cc` as the link backend for `sx build`. `zig cc` brings its own lld,
CRT objects, and libc (musl or glibc) for the chosen target. Default Linux
output is **statically-linked musl**, which runs on any Linux with zero
dependencies — the property that makes Zig's own output portable.
**Feasibility: high.** The change is contained:
- The linker is selected through a single hook —
`TargetConfig.getLinker()` at `src/target.zig:194-196` — and the final
link argv is built in one place, the Unix `cc`-style branch at
`src/target.zig:524-564`.
- `zig cc` is a clang-compatible driver, so `-o` / `-L` / `-l` / extra
objects pass through that branch unchanged. The backend only has to
prepend `zig cc` and add `-target …` / `-static`.
- Exe-relative resolution (for finding the bundled zig) is already solved
for the stdlib in `src/imports.zig:204-227` and can be mirrored.
- `sx run` is JIT and never links, so it is wholly unaffected.
The cost is a ~5060 MB vendored `zig` (binary + its `lib/`) in the
distribution, and version-pinning discipline.
---
## 1. Motivation & background
### 1.1 Current state
| Concern | Today | File |
|---------|-------|------|
| Compiler binary | Self-containable via `-Dstatic-llvm` (no system LLVM) | `build.zig:9-10,156-162` |
| Stdlib | Relocatable, found relative to the exe | `src/imports.zig:204-227` |
| **Linking** | **Shells to system `cc`** | `src/target.zig:524-564` |
| **libc / CRT** | **Provided by the host `cc` driver implicitly** | (no `-lc`/crt passed) |
So two of three legs of a portable toolchain already stand. The third — the
linker and the libc/CRT it pulls in — is the host dependency this design
removes.
### 1.2 Why this matters for distribution
The goal is to hand someone a tarball and have `sx build app.sx` produce a
working binary on a stock Linux machine — a fresh container, a minimal CI
image, a box without `build-essential`. Today that fails at the link step.
Zig solved exactly this problem for its own users; since sx is *built with*
Zig, the cleanest fix is to stand on Zig's hermetic toolchain rather than
re-implement it.
---
## 2. Goals & non-goals
### Goals
- `sx build` produces a native Linux binary with **no host `cc`/ld/libc/SDK**.
- Default Linux output is **portable** (static musl): runs on any Linux.
- **Zero-config in the common case**: a bundled or PATH `zig` is detected and
used automatically; the operator sets nothing.
- A fully-specified, documented configuration surface (this document) for the
cases that *do* need tuning.
- No regression for existing users: system `cc` remains a fallback, and any
explicit `--linker` still wins.
### Non-goals (this iteration)
- Reimplementing lld in-process or building libc from source (see §7 —
Zig already does both; we reuse it).
- First-class Windows/macOS cross-compilation (nearly free as a follow-up,
but unverified — §11).
- Routing C-import compilation (`src/c_import.zig`, which also shells `cc`)
through the backend.
- Glibc-floor version pinning (`…-gnu.2.28`); exposed only if needed.
---
## 3. How Zig achieves hermetic builds (the model we're borrowing)
Zig's turnkey cross-compilation rests on bundling the two things sx borrows
from the host:
1. **In-process lld.** Zig embeds LLVM's lld (ELF/COFF/Mach-O/wasm) and links
without spawning an external linker.
2. **libc as data.** Zig ships musl *source* (builds `libc.a` + `crt*.o` on
demand, cached → static, no dynamic linker → portable output) and glibc
stubs generated from `.abilist` per version. For Windows it ships mingw
`.def` files and synthesizes import libraries.
`zig cc` exposes all of this behind a clang-compatible driver: `zig cc
-target x86_64-linux-musl -static foo.o -o foo` yields a portable binary on
any host, with nothing installed. **This design consumes that driver rather
than rebuilding its internals** — the whole second column above arrives for
free by vendoring the `zig` binary.
---
## 4. Design overview
`sx build` gains a **link backend** abstraction with two implementations:
- `system_cc` — today's behavior (shell `cc`, host libc).
- `bundled_zig` — shell `<zig> cc -target <triple> [-static] …`.
Selection is automatic (§5.5): if a usable `zig` is discovered and the user
gave no explicit `--linker`, `bundled_zig` is used; otherwise `system_cc`.
The backend plugs into the existing Unix link branch — it contributes the
leading `zig cc` tokens and the `-target`/`-static` flags; the rest of the
argv assembly is unchanged because `zig cc` is clang-compatible.
One supporting change: when `bundled_zig` is active, the triple handed to
LLVM in `src/ir/emit_llvm.zig` is aligned to the link target (`x86_64-linux`)
so the emitted object links cleanly against the selected musl CRT.
---
## 5. Detailed design (the configuration surface)
### 5.1 zig discovery — resolution order
`discoverZig()` (new `src/zig_backend.zig`) returns the first hit:
1. `$SX_ZIG` — explicit override.
2. `<exe_dir>/../libexec/zig/zig`**install layout** (§6).
3. `<exe_dir>/../../zig-bundle/zig`**dev vendored layout** (§6).
4. `zig` on `PATH`**dev fallback** (the only one active today).
`<exe_dir>` is resolved exactly as `src/imports.zig` resolves the stdlib.
If none resolve, behavior depends on activation (§5.5): auto-mode silently
falls back to `system_cc`; `--self-contained` errors.
### 5.2 Environment variables
| Var | Effect | Default |
|-----|--------|---------|
| `SX_ZIG` | Absolute path to the `zig` used as the link backend. Highest-priority discovery source. | unset |
| `ZIG_LIB_DIR` | Path to the bundled zig's `lib/`. Needed **only** if `zig` was relocated away from its `lib/`. In the supported layout (§6) they ship together and zig self-locates — leave unset. | unset |
| `SX_DEBUG_ZIG` | Trace discovery: each candidate path and the chosen one (or "none → cc"). Mirrors `SX_DEBUG_STDLIB`. | unset |
| `SX_DEBUG_LINK` | **Existing.** Prints the full link argv — shows the exact `zig cc …` invocation. | unset |
| `SX_STDLIB_PATH` | **Existing.** Stdlib override; unrelated to linking but noted because a full distribution sets neither and relies on exe-relative discovery for both. | unset |
### 5.3 CLI flags (`sx build`)
| Flag | Effect |
|------|--------|
| `--self-contained` | Force `bundled_zig` ON. If no usable zig is found, **error** — do not silently fall back. |
| `--no-self-contained` | Force `system_cc`. |
| `--linker <cmd>` | **Existing.** Explicit linker; supplying it **disables** auto-activation (user's choice wins). To pin a specific zig, prefer `SX_ZIG` + `--self-contained`. |
| `--target <triple\|shorthand>` | **Existing.** Selects target + ABI (§5.4). With `bundled_zig` active and target unspecified on a Linux host → `x86_64-linux-musl` static. |
| `--sysroot <path>` | **Existing.** Forwarded to the linker; rarely needed with `bundled_zig` (zig brings its own sysroot). |
### 5.4 Target → ABI mapping
The default (no `--target`) deliberately differs from the legacy `linux`
shorthand, because portable static output is the entire point.
| `sx` invocation | zig `-target` | Link mode | Portable? |
|-----------------|---------------|-----------|-----------|
| *(no `--target`, Linux host)* | `x86_64-linux-musl` | `-static` | ✅ any Linux |
| `--target linux-musl` *(new)* | `x86_64-linux-musl` | `-static` | ✅ |
| `--target linux` / `linux-x86` | `x86_64-linux-gnu` | dynamic | ❌ host glibc, versioned |
| `--target linux-arm` | `aarch64-linux-musl` | `-static` | ✅ |
| `--target windows` | `x86_64-windows-gnu` | per zig | follow-up (§11) |
| `--target macos` / `macos-arm` | `aarch64-macos` | per zig | follow-up (§11) |
- A **new** `linux-musl` shorthand is added; the existing `linux` shorthand
keeps its current gnu/dynamic meaning for back-compat.
- The LLVM emit triple is aligned to the link target so the `.o` links
cleanly against the selected libc/CRT (§4).
### 5.5 Activation truth table
`B` = a usable zig was discovered (§5.1). Subcommand = `sx build`.
| `--self-contained` | `--no-self-contained` | `--linker` | zig found (B) | Result |
|:---:|:---:|:---:|:---:|--------|
| — | — | no | yes | **bundled_zig** (auto) |
| — | — | no | no | system `cc` (silent fallback) |
| — | — | yes | * | user's `--linker` |
| yes | — | * | yes | **bundled_zig** (forced) |
| yes | — | * | no | **error**: `--self-contained` but no zig |
| — | yes | * | * | system `cc` (forced off) |
- `--self-contained` + `--linker` together: backend choice goes to
`--self-contained`; treat the literal combination as a usage error
(document, don't guess).
- `sx run` / `sx ir` / `sx asm` never link → backend not consulted.
### 5.6 Emit-triple alignment
`src/ir/emit_llvm.zig` (`LLVMSetTarget`, ~L246-284) currently uses the host
default triple when `--target` is unspecified (on Linux,
`x86_64-unknown-linux-gnu`). When `bundled_zig` is active, set the module
triple to match the link target (`x86_64-linux`) so codegen and the musl CRT
agree. Pure codegen objects are ABI-compatible across gnu/musl; aligning the
triple removes the edge-case risk (TLS model, stack protector) up front.
---
## 6. Distribution layout (packaging)
A relocatable tree; everything resolves relative to `bin/sx`, so the whole
directory moves/untars anywhere with no env vars set:
```
sx-<os>-<arch>/
├── bin/
│ └── sx # built -Dstatic-llvm (no system LLVM dep)
├── libexec/
│ └── zig/
│ ├── zig # pinned zig binary
│ └── lib/ # zig's lib/ (musl/glibc sources, lld data, …)
└── library/ # sx stdlib (existing discovery)
└── modules/…
```
Rules:
- `zig` and its `lib/` **must** ship together under `libexec/zig/` so zig
self-locates `lib/`; splitting them forces `ZIG_LIB_DIR`.
- Pinned zig version: **0.16.0** (matches the build toolchain). Record the
exact version in the release manifest — a mismatched `zig cc` CLI is the
likeliest future breakage.
- Vendor the matching zig release per host os/arch from ziglang.org at
package time.
---
## 7. Alternatives considered
| Alternative | Why not (now) |
|-------------|---------------|
| **In-process lld + bundled musl sysroot** (sx owns the pipeline; no zig) | Requires a custom LLVM build *with* lld — the Homebrew `llvm@19` here ships none (`liblld*.a`, headers, `ld.lld` all absent) — plus a C++ lld shim and per-arch prebuilt musl. Strictly more work for the same user-visible result. The right *eventual* target if we want zero foreign binaries; tracked as a follow-up. |
| **Full Zig-style: build libc from source on demand** | Most flexible (any arch/libc version, no prebuilt blobs) but the most work; only worth it after the in-process-lld path exists. |
| **Document a hard dependency on system `cc`** | Zero engineering, but defeats the goal — the box still needs `build-essential`. Acceptable only as the current fallback, not the distribution story. |
| **Bundle just `ld.lld` + a musl sysroot (no full zig)** | Smaller than a whole zig, but we'd hand-manage crt object selection, dynamic-linker paths, and import libs — i.e. re-derive what `zig cc` already encapsulates. Bundle-size saving doesn't justify the fragility. |
Vendoring `zig` wins on effort-to-result because sx already builds with Zig:
it's a first-party dependency, not a foreign toolchain, and it unlocks
Windows/macOS targets later for nearly free.
---
## 8. Phasing
Detail in [../current/PLAN-DIST.md](../current/PLAN-DIST.md). Summary:
0. **Resolve zig**`discoverZig()` + `SX_DEBUG_ZIG`; PATH fallback only.
1. **Link backend** — generalize the linker to a driver argv; emit
`zig cc -target … -static`; align the emit triple.
2. **Auto activation** — wire the §5.5 truth table; `cc` fallback intact.
3. **Packaging**`build.zig` `dist` step assembling the §6 tree.
4. **Verify & lock**`file`/`ldd` shows "statically linked"; host/arch-gated
corpus test honoring the snapshot-integrity + FFI-cadence rules.
The minimum end-to-end proof is Phases 0+1 against PATH zig.
---
## 9. Open decisions
**Locked:**
- Default Linux ABI = **static musl** (portable output).
- Activation = **auto** when a usable zig is found and no `--linker`.
- Dev uses **PATH zig**; vendoring deferred to Phase 3.
**Still open:**
- Exact spelling of the force flags (`--self-contained` vs e.g.
`--bundled-linker`); name chosen here pending review.
- Whether auto-mode should *warn* on silent `cc` fallback or stay quiet
(leaning quiet, with `SX_DEBUG_ZIG` for diagnosis).
- Whether to gate the Phase-4 corpus test behind a `.build` `target`
sidecar or keep it manual until a Linux CI runner exists.
---
## 10. Risks
- **Bundle size** ≈ 5060 MB (zig + `lib/`). Acceptable for a toolchain;
call it out in release notes.
- **zig CLI drift** across versions — pin hard, record in the manifest;
the most likely future breakage.
- **gnu vs musl ABI** for the emitted object — covered by the emit-triple
alignment (§5.6); TLS/stack-protector are the only realistic friction.
- **Operator confusion**: default-no-target (musl) diverging from the
`linux` shorthand (gnu). Mitigated by the new `linux-musl` shorthand and
explicit documentation (§5.4).
---
## 11. Out of scope / follow-ups
- **Windows / macOS targets** via the same `zig cc -target`: nearly free
after the Linux path, but Apple-SDK and Windows specifics need their own
verification — not documented as supported until tested.
- **`src/c_import.zig`** still shells system `cc` for C imports in JIT mode;
route through the backend later.
- **In-process lld** (alternative in §7) as the eventual zero-foreign-binary
endgame.
---
## Appendix — quick recipes (once implemented)
```sh
# Portable static Linux binary (default when a bundled zig is present):
sx build app.sx -o app
file app # → "ELF 64-bit … statically linked"
# Force the backend; fail loudly if no zig is bundled:
sx build app.sx --self-contained
# Use a specific zig:
SX_ZIG=/opt/zig-0.16.0/zig sx build app.sx --self-contained
# Opt out, use the system toolchain:
sx build app.sx --no-self-contained
# Dynamic glibc instead of static musl:
sx build app.sx --target linux
# Debug discovery + the exact link invocation:
SX_DEBUG_ZIG=1 SX_DEBUG_LINK=1 sx build app.sx
```

View File

@@ -0,0 +1,625 @@
# Execution-Model Evolution — Roadmap (comptime JIT · async · concurrency · hot-reload)
> Status: **exploratory design-of-record.** Captures the forward plan for sx's
> execution model across five interlocking threads. Not yet an active
> `PLAN-*`/`CHECKPOINT-*` stream — this is the shared design the streams would be
> carved from. Cross-platform shipping (the bundled-zig backend + the sx bundler)
> is **already landed**; see [bundled-zig-link-backend-design.md](bundled-zig-link-backend-design.md)
> and [../current/PLAN-DIST.md](../current/PLAN-DIST.md).
---
## 0. The thesis
sx's compiler stays small by pushing capability into **library sx + three general
primitives** (`inline asm`, `extern`/`export`, `atomics`) rather than baking
features into codegen. Concretely:
- **Async is a library, not a language feature** — colorblind, stackful fibers
behind an `Io` interface (Zig-inspired). No function coloring, no
async→state-machine transform. The implementation is pure sx down to a per-arch
inline-asm context switch.
- **Comptime gains a JIT escape hatch** — the interpreter stays the default
(debuggable, portable), but drops to a host-JIT for the one thing it can't
walk (inline asm) and, later, for whole fragments (the bundler).
- **One shared substrate** — a persistent ORC LLJIT + host-target emitter — serves
comptime-asm, the bundler, and JIT-resident hot-reload.
The honest trade is **small *surface*, but each primitive is *deep*** — not "small
compiler." The net-new **compiler** obligations this plan adds (all verified absent
today): **atomics lowering** (N1), **generic enums** `enum($T)`, **`type_info` +
`reify` + `field_type`** (comptime type construction), **`callconv(.naked)`**,
**repointable-`context` codegen** (+ per-fiber stack-limit), the **S1 persistent JIT
spine**, **C1 thunk synthesis**, **comptime-asm lifting** (C3), and (later) the **S2
ORC C++ shim**. Async itself is genuinely a library; the *enabling primitives* are a
major codegen/runtime investment. Already landed: `inline asm` (in flight),
`extern`/`export`, the `!`/`try`/`catch`/`onfail`/`raise` ERR stream, value-level
reflection, the `sx run` ORC LLJIT, and the host-FFI trampolines.
---
## 1. The spine (shared substrate)
| ID | Piece | What | Size |
|----|-------|------|------|
| **S1** | Persistent JIT executor | A long-lived ORC LLJIT + a host-triple `LLVMEmitter` + a compiled-fragment cache, plumbed into the interpreter. Today the LLJIT exists only for `sx run`'s `main` ([target.zig:319](../src/target.zig#L319)); the emitter carries one target machine ([emit_llvm.zig:274](../src/ir/emit_llvm.zig#L274)). | L |
| **S2** | ORC C++ shim | `MachOPlatform::Create` + redirectable/lazy-reexport symbols. The bare `LLVMOrcCreateLLJIT` can't do thread-locals, C constructors, or symbol redefinition — the wall the C-with-sx JIT spike hit (`_Thread_local` SIGABRT; `errors-*` examples crashed). Required by any non-trivial JIT or symbol repoint. | M |
S1/S2 are the spine: built once, consumed by **C1** (the FFI thunks — the main
near-term consumer), **C3**, and (later) **R2**. S1 alone suffices for C1/C3 (bare
calling/asm thunks — no TLS/ctors); S2 is only needed for R2 and JIT-ing C-with-sx.
---
## 2. Comptime / build layer
| ID | Piece | Unblocks | Depends | Size |
|----|-------|----------|---------|------|
| **C1** | **Real comptime FFI — JIT calling-thunks (LLVM = single ABI authority).** Trivial calls (scalar/ptr/string args, single-reg return) keep the existing `host_ffi.zig` trampoline fast-path; everything else (floats, structs-by-value, aggregate returns, >8 args, varargs) synthesizes a per-signature thunk, JIT-compiles it via **S1**, and calls it with an args buffer the interpreter fills by known layout (`type_info`). **LLVM emits the ABI-correct call — the same lowering as runtime codegen — so comptime and runtime FFI share ONE ABI implementation.** Rejected: libffi (foreign 2nd ABI impl), hand-rolled sx+asm (3rd impl + drift risk + needs C3 to run its own asm leaf anyway). | struct/string/slice/float signatures at comptime; full C interop in `#run`; lifts the bundler's API straightjacket; unifies comptime+runtime FFI | S1 (fast-path: none) | L |
| **C2** | **`#compiler``extern` collapse** — BuildOptions hooks become real exported C symbols resolved through C1; `*BuildConfig` threaded via global/handle; delete `.compiler_expr`/`compiler_call`/Registry. | one FFI mechanism, not two | C1 (`extern`/`export` already shipped) | M |
| **C3** | **Comptime asm via host-JIT** — stop bailing on `inline_asm` ([interp.zig:1019](../src/ir/interp.zig#L1019)); lift the block (operand model at [inst.zig:354](../src/ir/inst.zig#L354): inputs/`out_value`/`out_place`/`out_ty`/clobbers) to a host-arch thunk via `LLVMGetInlineAsm`, JIT, call through C1, cache by template+sig. | running asm-containing code at comptime | S1, C1 (+S2 non-trivial) | M |
| **C4** *(DROPPED)* | **JIT-the-bundler****not built** (Decision 6). Interp+C1 is the shipping bundler (I/O-bound, so native speed is moot; C1 closes the only capability gap). Remains an always-available S1 optimization if profiling ever shows the bundler's *own logic* is a hotspot. | — | — | — |
**Residue:** cross-arch comptime asm (C3) can't run on the host — narrows the bail
to the cross-compile case; needs a sharp diagnostic ("asm targets `<arch>`, host
is `<host>`").
---
## 3. Concurrency primitives (atomics + threads)
> **Why this is its own section:** we are doing **multiple OS threads**, so the
> async runtime and any lock-free structure need real atomics. OS threads already
> exist; atomics do not.
| ID | Piece | State | Size |
|----|-------|-------|------|
| **N1** | **Atomics — NET-NEW compiler feature.** Atomic load/store/RMW (`add/sub/and/or/xor/swap` + `fetch_min`/`fetch_max`; no `nand`), `compare_exchange`/`_weak` (→ `?T`, **null = success**), and fences, with orderings (relaxed/acquire/release/acq_rel/seq_cst). LLVM provides all — an **emit** feature, not a runtime library. **Surface LOCKED = `Atomic($T)` wrapper + `Ordering` enum** (not `@atomic_*``@` is address-of in sx). | **lowering absent** — zero LLVM `atomicrmw`/`cmpxchg`/`fence` emission today; some IR/inference scaffolding exists | M |
| **N2** | **OS threads + pthread Mutex/Cond + worker Pool** | **landed** — [std/thread.sx](../library/modules/std/thread.sx) (`pthread_create`/`join`/`detach`, in-place `Mutex`/`Cond`, bounded `Pool`). NOTE: pthread mutex **blocks the OS thread** — it is *not* fiber-aware (it would park every fiber on that thread); fiber-aware sync is N3, built on N1. | — |
| **N3** | **Fiber-aware sync** — mutex / channel / waitgroup that **suspend the fiber**, not the OS thread. Hybrid: atomic fast-path (N1) + fiber-suspend slow-path (A2/A5). Distinct from the pthread primitives in N2. | new library | M |
**Compiler obligation for N1:** the emit must map sx orderings to LLVM's and **not
reorder across atomics/fences**. Comptime is single-threaded, so the interpreter
can treat atomic ops as ordinary ops (seq_cst is trivially satisfied with one
thread) — no interp atomics machinery needed.
**N1 is a prerequisite for M:N scheduling (A5) and N3, and is broadly useful**
(lock-free queues, refcounts, the allocator). It is the load-bearing new primitive
this revision adds.
---
## 4. Async — colorblind, stackful, pure-sx
**Commitment:** no function coloring, no async→state-machine transform. Async is a
capability carried in `context` (like `context.allocator`), not a property of a
function's signature. A function does I/O through `context.io`; whether the call
suspends is decided by the `Io` *implementation*, transparently.
| ID | Piece | Notes | Size |
|----|-------|-------|------|
| **A1** | **`Io` interface + `context.io`** — a protocol/vtable threaded like `Allocator`. `io.async(fn,args) → Future`, `future.await`, cancellation. | leverages protocols + context | M |
| **A2** | **Stackful coroutine runtime — in sx lib, NOT a compiler builtin.** The context-switch is a `callconv(.naked)` sx fn with an inline-asm body (save callee-saved + SP/LR into `*from`, load from `*to`, `ret`); fiber bootstrap + stack alloc (`mmap`+guard via `extern`) also sx. The **compiler's** job is only (a) the general primitives — inline asm, `callconv(.naked)`, atomics — and (b) **fiber-safe codegen**: `context` lowered as a *repointable indirection* (never raw TLS) so the switch can repoint it, and stack-limit guards (if emitted) read from a swappable per-fiber location. Most arch-delicate sx in the tree (must match the platform callee-saved set + the compiler ABI), but it's inspectable sx, not a black box. | per-arch, arch-gated; co-validate vs codegen | M |
| **A3** | **Event-loop `Io` impls** — kqueue / epoll / io_uring drive readiness, then the (now-ready) syscall via C1. Plus a trivial **blocking `Io`**. | pure sx around syscall `extern`s | L |
| **A4** | **Stdlib I/O rework** — fs/socket/process take/use `context.io` instead of raw blocking syscalls, so existing calls participate in async. | mirrors the allocator-threading rule | M |
| **A5** | **Schedulers — M:1 → N×(M:1) → M:N, all sx std-lib `Io` vtables (committed; M:N last, not deferred).** M:1 first (minimal vehicle to validate the colorblind stack; covers I/O-bound). N×(M:1) = first parallel step (per-thread M:1 loops + `std/thread.sx` spawn; shared state uses N1 atomics — expected under parallelism, not a wart). M:N work-stealing last (most machinery: thread-safe steal queues + migration + errno/TLS discipline). All over N1 atomics + the A2 asm context-switch + `extern` syscalls. **pinning** API for thread-affine work (UI main thread, GL context). | see §4.3 | M (M:1) / M (N×M:1) / L (M:N) |
### 4.1 How control enters sx (the colorblind model)
- **sx→sx is ordinary.** The whole call chain lives on the fiber stack; a suspend
at a leaf `io.*` freezes the native stack verbatim. No frame knows it suspended.
**Zero special handling at call boundaries** — that's the point.
- **Three inbound boundaries** where the runtime enters sx:
1. **Task entry** (`io.async(fn)`) — a trampoline starts `fn` on a fresh fiber
stack via the normal calling convention.
2. **Resumption** — a context-switch (asm), *not* a call; sx continues mid-stack.
3. **C callback → sx** — must be `export`/`callconv(.c)`; runs on the event-loop
stack (not a fiber) so it **cannot itself suspend** — it may resume/enqueue a
fiber or run a non-suspending sx fn to completion (leaf-only).
### 4.2 `context` is fiber-local (the key obligation)
`context.io`/`context.allocator`/the `push Context` stack are dynamically scoped.
Fibers time-share OS threads (and **migrate** under M:N), so `context` must travel
**with the fiber** — saved/restored on every context-switch — **never a raw TLS
read.** A spawned task snapshots the spawner's context, then evolves its own
`push Context` stack. This is the CLAUDE.md "capture your owning allocator" rule one
level up: ambient state that outlives a suspension point must be carried by the
fiber.
### 4.3 Threads & the two hazard classes (why atomics)
| Model | Parallelism | Migration | Hazards |
|-------|-------------|-----------|---------|
| **M:1** (1 OS thread) | none | none | cooperative, race-free — simplest |
| **N×(M:1)** (per-thread schedulers, no migration) | yes | none | **data races** on shared state → atomics/locks |
| **M:N** (work-stealing) | yes | yes | data races **+** TLS-migration hazards |
- **Parallelism hazard** (any N>1): shared mutable state races → needs **N1
atomics** + N3 fiber-aware sync. The M:1 "no locks" simplicity is gone.
- **Migration hazard** (M:N only): a fiber that moves threads across a suspend
reads the *wrong* thread's TLS. **`errno` must be captured immediately** after
each syscall; **`context` must be fiber-local** (§4.2) — non-negotiable under M:N.
- **Pinning** (`io.pinToThread()`): some work must stay put — the **UI main
thread** (UIKit/macOS/Android — directly the app targets in §6), OpenGL
current-context, TLS-using FFI. M:N needs a "don't migrate / main-thread-only"
fiber attribute (Go's `LockOSThread`).
### 4.4 Pure-sx boundary
Everything is sx except the irreducible FFI floor: the **asm context-switch**
(per-arch, in `.sx`), **syscall `extern`s** (kernel-implemented, like any libc
binding), and **raw stack memory** (`mmap`). The schedulers, event loops, futures,
cancellation, and sync primitives are ordinary sx. Payoff: **swappable `Io`
vtables** — blocking, io_uring, kqueue, a **mock `Io`** for tests, a
**deterministic-simulation `Io`** (fake clock, scripted readiness) for reproducible
concurrency tests — all libraries.
### 4.5 Comptime async = blocking `Io`
At comptime install the **blocking `Io`**: `io.*` just blocks; no fibers, no
scheduler, no suspend. Same source, different vtable. The interpreter never needs
suspend/resume, and the FFI (C1) needs no async awareness. This is *why* the
colorblind model resolves comptime async for free.
### 4.6 Syntax surface (grounded against the grammar)
All of the concurrency/atomics surface lands on **existing** sx grammar — `enum`
tagged unions + `if x == { case … }` match ([specs.md:364,408](../specs.md#L408)),
first-class **tuples** with named fields ([specs.md:815-852](../specs.md#L815)),
`=>` closures, `struct($T)` generics, `callconv(...)`, and the ERR keywords
(`try`/`catch`/`onfail`/`raise`/`error`). `race`/`async`/`await`/`atomic` are **not
reserved words** ([specs.md:168](../specs.md#L168)), so they stay library
types/methods — no keyword additions. One genuinely-new compiler capability is
required (see end).
**Atomics (N1) — generic wrapper type.**
```sx
Ordering :: enum { relaxed; acquire; release; acq_rel; seq_cst; }
Atomic :: ($T: Type) -> Type #builtin; // atomicity carried by the type
counter : Atomic(i64) = .init(0);
counter.store(0, .relaxed);
n := counter.load(.acquire);
prev := counter.fetch_add(1, .seq_cst); // + fetch_sub/and/or/xor (min/max: open)
old := counter.swap(42, .acq_rel);
got := counter.compare_exchange(old, new, .acq_rel, .acquire); // strong → ?T (null = success)
got2 := counter.compare_exchange_weak(old, new, .acq_rel, .acquire); // may fail spuriously; for retry loops
fence(.seq_cst);
```
- CAS takes **two orderings** (success, failure); failure ordering may not be
`release`/`acq_rel` nor stronger than success — enforce in the compiler.
- Weak vs strong matters on **aarch64** (LL/SC) — weak in a loop is the idiom;
both compile identically on x86.
**Channels (N3) — methods only (no `<-`); `recv` returns a tagged union (not `(v, ok)`).**
```sx
RecvResult :: enum($T: Type) { value: T; closed; } // ordinary generic enum (not the race-synthesized union)
TryResult :: enum($T: Type) { value: T; empty; closed; } // non-blocking: 3 states a bool can't express
ch := Channel(i64).make(16); // capacity; .make() unbuffered
ch.send(v);
if ch.recv() == { case .value: (v) { use(v); } case .closed: { /* drained */ } }
ch.close();
// ergonomic layer: `for ch (v) { … }` consumes until closed, hiding RecvResult
```
**Fiber-aware locks (N3) — explicit lock + `defer` (no guard sugar).**
```sx
m : Mutex;
m.lock(); defer m.unlock();
```
**Futures & spawn (A1).**
```sx
f := context.io.async(worker, arg); // Future(R)
r := f.await(); // suspends this fiber
f.cancel();
d := context.io.timeout(5000); // a Future too — raceable like any other
```
**Pinning (A5) — spawn attribute, accepts a thread handle.**
```sx
PinTarget :: enum { any; main; on: Thread; } // default = .any (may migrate)
f := context.io.async(render, pin = .main);
f := context.io.async(worker, pin = .on(some_thread));
```
**`race` (Zig model — over futures, named tuple in → synthesized tagged-union out).**
The input is a **named tuple** (positional also allowed → `.0`/`.1` tags); the
result is an anonymous tagged union whose variants mirror the tuple's labels, each
payload = that field's `Future(T)` projected to `T`. Losers are **cancelled and
joined** before `race` returns (structured).
```sx
fa := context.io.async(read_a, conn); // Future(A)
fb := context.io.async(read_b, conn); // Future(B)
winner := context.io.race((a: fa, b: fb)); // RaceResult = enum { a: A; b: B }
if winner == {
case .a: (v) { handle_a(v); } // v : A
case .b: (v) { handle_b(v); } // v : B
}
// positional form: race((fa, fb)) → tags .0 / .1
```
The Go-style handler-map and the map literal that propped it up are **dropped**
`race` over futures subsumes select, and cancellation handles the losers.
**Cancellation rides ERR.** A cancelled `io.*` **raises**; the fiber unwinds
through `defer`/`onfail` (`try`/`catch`/`raise` are real keywords). Cancellation is
**cooperative** (observed only at suspend points — every `io.*` is a cancellation
point) and **structured** (`race` joins losers' teardown before returning). No
parallel unwind path — it reuses the error channel.
**Context switch (A2).**
```sx
swap_context :: (from: *Fiber, to: *Fiber) callconv(.naked) {
asm { /* save callee-saved + SP into *from; load from *to; ret */ };
}
```
`callconv(.naked)``callconv(.c)`: **no prologue/epilogue/frame** — required
because a context switch deliberately makes SP-in ≠ SP-out (a `.c` epilogue would
restore from the wrong stack). Body is a single `asm` block; you emit your own
`ret`. Args arrive in ABI registers, read directly from asm.
**One new compiler capability (gates `race`):** *comptime tuple→tagged-union
synthesis.* Reflection today only **reads** types (`field_count`/`field_name`/
`type_of`); `RaceResult(T)` must **construct** an anonymous `enum` from a tuple's
`(label, payload-type)` pairs. Supporting pieces: a `field_type($T, i) -> Type`
reflection accessor (we have value-level `field_value` + `type_of`, but type-only
field projection is missing) and `Future(T) → T` projection (falls out of
generics). This is the generic "derive a sum from a product" — useful beyond
`race`.
---
## 5. Dev loop / hot-reload
| ID | Piece | Notes | Depends | Size |
|----|-------|-------|---------|------|
| **R1** | **Hot-reload (dylib swap)** — host owns `State`+allocator; reloadable module is a `.dylib` with a fixed `export` interface; watch→rebuild→`dlopen`→rebind→`dlclose`. State survives (host-owned). | leans on `export` (shipped); sidesteps S2; native | — | M |
| **R2** | **Hot-reload (JIT-resident)** — program runs under S1's LLJIT; reloadable calls route through ORC indirection stubs, repointed on change. Finer granularity; same spine. | | S1, S2 | L |
| **R3** | **Incremental compilation** — dependency tracking + recompile-only-changed. Perf enabler; coarse per-file v1 suffices first. | | — | L |
**Core rule:** the data that must survive a reload cannot be owned by the code that
reloads. Code/state separation — the CLAUDE.md owning-allocator discipline, one
level up.
**Residue — state migration on layout change:** body-only changes hot-swap;
layout/signature/global-type changes are **detected** (compare new vs running
`State` layout via `types.zig`) and trigger **rebuild+restart**. Migration hooks
(`on_reload(old)→new`) are a hard later item. Design against *silent* corruption.
---
## 6. Cross-platform (mostly landed) — from a macOS laptop
### 6.1 Landed
| Capability | State | Reach from a mac |
|---|---|---|
| `extern`/`export` C linkage | done (replaced `#foreign`) | all targets |
| Bundled-`zig cc` cross-link backend | Phases 02 done; packaging pending | **macOS, Linux(-musl/static), Windows(-gnu)** verified |
| sx-side bundler (`.app`/`.apk`) | done | macOS, iOS sim/device, Android |
| JIT `sx run` (ORC LLJIT) | done | host |
| Target shorthands | done | `macos[-arm]`, `linux[-musl[-arm]]`, `windows[-gnu]`, `ios[-arm]`, `ios-sim[-arm/-x86]`, `android[-arm64/-x86_64]`, `wasm` |
### 6.2 Workflows
```sh
# macOS (native): inner loop is JIT; ship is Mach-O / .app
sx run app.sx
sx build app.sx -o app
sx build app.sx --bundle MyApp.app
# Linux (cross, landed killer feature): static, zero-dep ELF
sx build app.sx --target linux-musl -o app # scp anywhere, runs
# Windows (cross, landed, MinGW path): PE32+
sx build app.sx --target windows-gnu -o app.exe # cf. example 1660 (win32)
# iOS simulator (mac-only host)
sx build app.sx --target ios-sim --bundle App.app
# iOS device — signing threaded via the build program (BuildOptions setters)
# #run { o := build_options(); o.set_bundle_id(...); o.set_codesign_identity(...);
# o.set_provisioning_profile(...); }
sx build build.sx --target ios --bundle App.app
# Android (cross + bundle): javac → d8 → aapt2 → zipalign → apksigner, then adb
sx build app.sx --target android --apk app.apk
```
### 6.3 Where the roadmap lights up cross-platform
- **C1 + C4** → the iOS/Android **bundlers** (orchestrate ~a dozen host tools at
comptime; biggest win; always host-arch so no cross-arch risk).
- **R1/R2 + A1A5** → the **inner dev loop for non-host targets**: push-a-dylib +
remote-trigger-reload over an async laptop↔device channel — a capability that
*doesn't exist today* short of full rebuild+reinstall.
- **A1/A2 colorblind `Io`** → the dev tooling is itself async, and the **same
networking code runs blocking inside the bundler** (`adb push`) and async in the
live session — no coloring.
- **Pinning (A5)** → the UI render fiber pins to the main OS thread on every app
target.
**The single hard constraint the matrix exposes:** cross builds mean target arch ≠
host arch, so **C3's residue bites** — comptime/`#run` code reaching *target-arch*
inline asm can't execute on the mac. Native macOS dev never hits it; every cross
target must gate comptime asm to host-arch (`when host_arch == …`) or get a loud
diagnostic.
---
## 7. Linear build sequence (async-first — no parallel streams)
Single ordered list; deps satisfied at every step. **Async-first** (user-chosen): the
async story needs no JIT spine (syscalls use the existing trampoline FFI; comptime
async = blocking `Io`), so the FFI/JIT cluster comes *after*. C4 is omitted (dropped —
an S1 optimization if ever profiled). Net-new compiler prereqs (per the codebase
grounding) are explicit steps, not buried.
**Foundations — compiler primitives the async story needs (all net-new):**
1. **N1 — Atomics lowering.** IR/inference scaffolding exists; add LLVM
`atomicrmw`/`cmpxchg`/`fence` emission + orderings. Surface = `Atomic($T)` wrapper.
Gates channels/N3 + parallel schedulers.
2. ~~**Generic enums** `enum($T)`~~ **DROPPED.** `RecvResult($T)`/`TryResult($T)` are
**type-fns over `reify`** (step 3), not a new `enum($T)` language feature — and
type-fns (user `($T)->Type` in type position) **already work** (e.g.
[`Make`](../examples/0208-generics-value-param-type-function.sx),
[`Complex`](../examples/0201-generics-generic-struct.sx)). A declarative `enum($T)`
surface, if ever wanted, is later *sugar* desugaring to a type-fn-over-`reify`.
3. **`type_info` + `reify` + `field_type`** — comptime metaprogramming floor. Gates
`race` synthesis **and** channel `RecvResult`/`TryResult` (all type-fns over
`reify`; **generic-enum syntax dropped**). **Validated against the codebase (3
reviewers): a small extension reusing existing machinery throughout — not net-new
architecture.** Five contracts:
1. **Nominal identity via type-fn memoization** — type-fns dedup by mangled
`(fn,args)` name (generic.zig:1620-1629) + reify `findByName`, so `RecvResult(i64)`
is one `TypeId` and the body runs once. (NOT structural dedup — enums are
nominal via `nominal_id`, types.zig:1110.)
2. **Functional through codegen** — layout / construct / match+exhaustiveness /
`toLLVMType` / `type_name`+format are **all type-table-driven, zero AST
coupling**, so a backing-decl-less reify'd enum flows through unmodified.
3. **Validate loudly** at the single `intern`/`internNominal` choke point
(types.zig:411-439): reject dup variants / bad backing / unresolved payloads.
4. **Comptime-only, JIT-free** — a type-table op in the interp; no S1 dependency
(keeps reify, hence channels + `race`, off the JIT critical path).
5. **Reference-based self-reference (v1)**`*Self`/`[]Self` payloads via the
reserve-placeholder→complete path recursive *source* types already use
(nominal.zig:86/108/120, types.zig:442); **by-value recursion rejected** (loud,
infinite size). reify gains a `reify_rec((self) => …)` builder form.
- **Type-minting precedents (7):** monomorphization, protocol vtables, tuples,
vector/array, ptr/slice ctors, FFI stubs, **type-fn instantiation** — all
construct `TypeInfo` programmatically + `intern()`. **Residual = plumbing, not
capability:** name reify-results by the instantiation's mangled name (done for
inline-struct bodies — extend to reify-results) + reify input validation.
4. **`callconv(.naked)`** — extend `CallConv {default, c}` (types.zig:169) + skip
prologue/epilogue lowering. Gates A2.
5. **Repointable-`context` codegen** — lower `context` as a swappable indirection
(never raw TLS) + per-fiber stack-limit. Compiler obligation; gates A2 *and*
cross-fiber `context.io` correctness. (Reviewer note: this is a **prerequisite**
of A2, not a successor.)
**Async runtime — sx lib over the primitives:**
6. **A1 — `Io` interface + `context.io` + `Future` + `cancel()` API.**
7. **A2 — fiber runtime** (naked context-switch asm, bootstrap, `mmap` stacks).
8. **A3 — blocking `Io` → deterministic-sim `Io` (keystone, calibrated) → event-loop `Io`.**
9. **A5·M:1 — single-thread scheduler.**
10. **N3 — fiber-aware sync** (channels/mutex/waitgroup; `recv → RecvResult`).
11. **A6 — Cancellation.** `.canceled` in the `!` channel (model a); per-fiber atomic
flag (N1); every `io.*` a cancellation point; structured cancel-and-join; **masked
during cleanup**.
12. **A4 — stdlib I/O rework** (fs/socket/process onto `context.io`).
13. **A5·N×(M:1)** — first parallel (errno-capture + `context`-fiber-local discipline).
14. **A5·M:N** — work-stealing (steal queues + migration + pinning).
**Then comptime / FFI / JIT cluster:**
15. **S1 — persistent JIT spine** → 16. **C1 — real FFI (LLVM = ABI authority, on S1)**
→ 17. **C2 — `#compiler`→`extern`** → 18. **C3 — comptime asm** (S1 + C1; +S2 if
TLS/ctors).
**Deferred tail:**
19. **S2 — ORC C++ shim** (highest-risk — see §8; macOS `MachOPlatform`; ELF/COFF
unplanned) → 20. **R1 — dylib reload** (shipped `export`) → 21. **R2 —
JIT-resident reload** (S1 + S2; **↔ async live-fiber coupling**, §8) → 22. **R3 —
incremental compilation**.
Hard edges to remember: **C1 depends on S1** (the non-trivial FFI cases); **C3 depends
on C1** (calls through its thunk path); **R1/R2 couple to the async runtime** (can't
hot-swap code with live suspended fibers — runtime + long-lived fibers stay
persistent, only leaf logic reloads).
---
## 8. Irreducible hard problems (detect-and-degrade, don't pretend)
1. **State migration across layout change** (R1/R2) → v1 detects + rebuild/restart;
migration hooks later.
2. **Cross-arch comptime asm** (C3) → can't run on host; narrows the bail + loud
diagnostic; gate to host-arch.
3. **M:N migration hazards** (A5) → errno-capture discipline + fiber-local context
(mandatory), pinning for thread-affine work.
### 8.1 Highest technical risks (from review — ranked, async-first lens)
1. **A2 context-switch correctness** (in the async critical path). Silent stack
corruption, per-arch, **untestable by the deterministic-`Io` harness** (it tests
*scheduling*, not the *switch*); a one-register slip is invisible until it crashes
on the right arch. Couples *library asm* to the *compiler ABI* — ABI drift breaks
it silently later. → needs a dedicated **switch-stress test** (§10).
2. **`reify` → anonymous-tagged-union → match-codegen** (gates `race` + channels).
**DE-RISKED by review** (§7 step 3): all enum stages are type-table-driven with
zero AST coupling, identity is handled by existing type-fn mangled-name memoization,
and forward-declaration for self-ref already exists. Residual is *plumbing*
(name reify-results by mangled name + input validation), not new architecture.
3. **Deterministic-`Io` is the test keystone yet itself uncalibrated** — a buggy
deterministic scheduler yields deterministic-*wrong* stdout that snapshots lock in.
→ calibrate against the blocking `Io` / property-test fixed order (§10).
4. **`context`-fiber-local + errno discipline** (A5 M:N). "Non-negotiable" but
enforced by manual rule, not the compiler; M:1 can't even exercise migration.
5. **S2 ORC shim** (deferred, but highest-risk when reached): only C++ in the tree,
**already failed a spike** (`_Thread_local` SIGABRT), `MachOPlatform` is
macOS-specific — **Linux/Windows JIT-resident reload + non-Mac TLS/ctor JIT have no
named plan**. One "M" box hides a per-OS effort.
6. **C1 args-buffer layout-vs-ABI** — "LLVM emits the call" covers the *call*, not the
interpreter's *buffer pack* from `type_info`. Disagreement on edge layouts
(over-aligned/empty structs, aarch64 small-struct register splitting, `bool`) =
silent comptime corruption. → adversarial layout cases (§10).
---
## 9. Decisions log (all resolved)
**Sequencing — locked:** **async-first** (§7). The async cluster (steps 114)
precedes the FFI/JIT cluster (1518) because async needs no JIT spine. **Cancellation
(A6) = model (a)** — a `.canceled` variant in the **existing `!` error channel** that
`io.*` already returns (I/O is inherently fallible, so `io.*` is already `!`-typed —
the "keep calls clean" argument for the non-local-`raise` model is moot). Reuses
`!`/`try`/`catch`/`onfail`; no new unwind primitive. **Net-new prereq surfaced by
grounding:** `callconv(.naked)` (only `.default`/`.c` today). **Generic enums dropped**
`RecvResult($T)`/`TryResult($T)` are **type-fns over `reify`** (type-fns already work
in type position, e.g. `Make`/`Complex`), so no `enum($T)` feature is needed; `reify`
gains two contracts (deterministic identity + functional-enum output, §7 step 3).
**Locked (see §4.6 for the grounded surface):**
- **N1 atomics surface = generic wrapper `Atomic($T)`** + `Ordering` enum, `.init`,
`compare_exchange`/`_weak` returning `?T` (**null = success** — pinned, opposite of
most priors). (Not `@atomic_*` builtins — `@` is address-of in sx.) **RMW set** =
`add/sub/and/or/xor/swap` + `fetch_min`/`fetch_max` (free from LLVM); **no `nand`**.
- **`race` = over futures** (Zig model), **single named-tuple in** (`race((a: fa, b:
fb))`) → synthesized tagged-union out; Go-style handler-map + map literal
**dropped**. **No `async` spawn-sugar** — always `context.io.async(...)`.
- **Channels** = `send`/`recv` methods (no `<-`); **`recv` returns a tagged union**
`RecvResult($T){ value; closed }` (not `(v, ok)`), `try_recv` → `{ value; empty;
closed }`; optional `for ch (v) {…}` iteration sugar. **locks** = `lock()` + `defer
unlock()` (no guard sugar). `race`/`async`/`await` stay library, not keywords.
- **Comptime type metaprogramming = `type_info` + `reify` builtins only** (Zig
`@typeInfo`/`@Type` model). **Everything else is sx lib** — `make_enum`,
`field_type`, `RaceResult`. `reify` coverage starts at **enum/struct/tuple**, grows
later. `Future($T)` exposes `Value :: T` so `Future(X)→X` is plain member access
(no `type_arg` builtin).
- **C1 FFI engine = LLVM as single ABI authority** — per-signature JIT calling-thunks
via S1 (LLVM emits the ABI-correct call, same as runtime codegen); trampoline
fast-path for trivial calls. **libffi/dyncall + hand-rolled-sx rejected** (2nd/3rd
ABI impl; hand-rolled needs C3 for its own asm leaf anyway). Promotes **S1 to
foundational** (shared by C1, C3).
**Scheduler (Decision 5) — locked:** **M:1 → N×(M:1) → M:N**, all **sx std-lib `Io`
vtables** (compiler only provides N1 atomics + the A2 asm context-switch + `extern`
syscalls). M:1 ships first (validates the colorblind stack, covers I/O-bound);
N×(M:1) is the first parallel step; **M:N is last in sequence but committed — not
deferred.** Data races under parallelism are expected and handled with atomics +
fiber-aware sync — that *is* parallelism, not a wart; M:1's lock-freedom is just a
property of the single-threaded case.
**Deferred, orthogonal additions (Decisions 67) — both addable later without
revisiting anything locked:**
- **C4 (Decision 6) — fully orthogonal; not built now.** Pure deferred optimization
riding S1 (already present for C1/C3): JIT the bundler subgraph instead of
interpreting it. Zero coupling — same bundler sx, same C1 FFI. Apply only if
profiling ever shows the bundler's *own logic* is a hotspot (it's I/O-bound, so
unlikely). Interp+C1 is the shipping bundler.
- **Hot-reload (Decision 7) — deferred; mechanism additive.** Substrate ready: R1
(dylib-swap) needs only shipped `export`; R2 (JIT-resident) needs S1 + the S2 ORC
shim. **R1-vs-R2 chosen at pickup.** One coupling (a design constraint, not a
decision change): you can't hot-swap code with **live suspended fibers** pointing
into the old module — so the async runtime + long-lived fibers stay on the
*persistent* side, only transient **leaf logic** is reloadable (or quiesce fibers
before swap).
---
## 10. Testing & gates
Inherits the project cadence (CLAUDE.md): `zig build && zig build test` after every
step; **xfail-then-green or behavior-lock — no commit both adds a test AND makes it
pass**; never regenerate snapshots while red; corpus = `examples/` + `issues/` with
`.exit`/`.stdout`/`.stderr`/`.ir` snapshots. Per-*step* gates live in the eventual
`PLAN-*` streams; this section is the design-level verification strategy that those
streams must implement.
### 10.1 The async test harness = the deterministic-simulation `Io` (the keystone)
Concurrency is nondeterministic (scheduling/readiness order), which **breaks snapshot
testing** outright. So the **deterministic-sim `Io`** (fixed clock, scripted
readiness, deterministic single-stepping scheduler) is not merely a feature — it is
**the test harness for everything async**. Every concurrency example runs under it →
reproducible stdout → snapshottable. Consequence for sequencing: **build the
deterministic `Io` right after the blocking `Io`** (it's the simplest scheduler after
blocking and it *gates the ability to test* fibers/channels/race/schedulers at all).
The 10 patterns in §4.6-adjacent examples become corpus tests only because they run
under it.
### 10.2 What is NOT snapshot-testable
True parallel **data races** (N×M:1 / M:N) are nondeterministic by construction. They
run under the deterministic `Io` for *correctness* repro, but race-detection needs a
separate **stress harness** (run-N-times / TSan-style), **not** the corpus. Any such
coverage bound must be stated loudly (a `log()`-style note in the harness), never
silently skipped — per the REJECTED-PATTERNS rule against silent gaps.
### 10.3 Arch-sensitive lowering — atomics + context-switch
Atomic orderings lower differently per arch (x86 `lock`-prefix / plain MOV vs aarch64
LL/SC / `ldar`/`stlr`), and the A2 context-switch is per-arch asm. Lock both with the
**existing inline-asm cross-arch sibling pattern**: a `.build` `{"target": "…"}`
sidecar runs **ir-only** on a non-matching host (asserts `.ir` + `.exit` + `.stderr`
from `sx ir --target`) and **end-to-end** on a matching CI runner. So `Atomic`
lowering carries **x86_64 + aarch64 `.ir`** snapshots; the context-switch gets
per-arch run tests on matching runners.
### 10.4 New corpus categories
`17xx` atomics · `18xx` concurrency (fibers/channels/race/async, all under the
deterministic `Io`). Comptime metaprogramming (`type_info`/`reify`) + comptime-asm
extend `06xx`; C1 FFI extends `12xx`; the cross-arch comptime-asm **loud bail** and
the cancellation diagnostics are `11xx`.
### 10.5 Per-piece gates (design level)
| Piece | Locks via |
|---|---|
| **N1 atomics** | unit `emit_llvm.test.zig` (LLVM `atomicrmw`/`cmpxchg`/`fence` + ordering emission); corpus `17xx` single-thread (deterministic); arch-gated `.ir` (x86_64 + aarch64) |
| **type_info / reify** | unit (reflect round-trips; reify'd enum has correct layout/match codegen); corpus `06xx` comptime (deterministic) |
| **C1 FFI** | **behavior-lock** existing trampoline cases first; then xfail→green `12xx` comptime extern with floats / structs-by-value / aggregate (`{ptr,len}`) returns; unit for thunk-synth + args-buffer marshal |
| **S1 spine** | infra — exercised transitively via C1/C3 examples; unit for LLJIT lifecycle + thunk cache |
| **C3 comptime asm** | corpus `06xx` host-arch `#run` asm computes a value; `11xx` diagnostic asserts the cross-arch loud bail |
| **A1/A2 fibers** | unit (scheduler step, fiber bootstrap); context-switch arch-gated run tests; corpus `18xx` under deterministic `Io` |
| **A3/A5 schedulers, channels, race, cancel** | corpus `18xx` (the 10 patterns) under deterministic `Io` → deterministic snapshots; cancellation cleanup (`onfail`/`defer`) asserted via stdout ordering |
### 10.6 Cadence example (atomics, N1)
1. **xfail** — add `examples/17xx-atomics-fetch-add.sx` using `Atomic(i64).fetch_add`; seed the `.exit` marker → **red** (codegen missing). *(test added, not yet passing)*
2. **green** — emit LLVM `atomicrmw add` + ordering; example passes; capture `.stdout` + x86_64/aarch64 `.ir` snapshots; review the diff. *(makes it pass, no new test)*
This satisfies "no commit both adds a test and makes it pass," and every other piece
follows the same xfail→green (or behavior-lock→extend) shape.
### 10.7 Review-surfaced gaps (the high-corruption-risk pieces need *correctness*, not existence, tests)
The §10.5 gates prove things *run*; the §8.1 risks are silent-corruption modes a
run/snapshot test won't catch. Each needs an explicit adversarial gate:
- **A2 context-switch — switch-stress test.** Scribble *every* callee-saved register
+ a stack-canary before suspend; deep/recursive fiber chains; verify all survive
post-resume. Run/snapshot tests don't prove register preservation. (The single
highest-corruption-risk piece, §8.1.1.)
- **Deterministic-`Io` — calibrate the oracle.** Cross-check a handful of cases
against the blocking `Io` and property-test that scheduling order is actually fixed,
*before* trusting it to gate everything async (a deterministic-but-wrong scheduler
snapshots garbage).
- **`context`-fiber-local invariant — named test at the N×M:1/M:N step.** M:1 can't
exercise migration; add a test that forces a fiber to migrate and asserts it reads
*its* `context`/`errno`, not the new thread's.
- **N1 ordering *semantics* are out of snapshot scope — state it loudly.** `.ir`
snapshots prove the *keyword emitted*, not weak-memory correctness (e.g. `relaxed`
where `acquire` was needed ships green). Declare this out-of-scope parallel to
§10.2's race carve-out; lock-free structures need the stress harness.
- **C1 args-buffer — adversarial layout cases.** Over-aligned structs, empty structs,
aarch64 small-struct register splitting, `bool` — a wrong layout that happens to
print right passes a stdout test. Call these out explicitly, not just
"structs-by-value."
- **S2 — has no gate today despite a prior spike failure.** When reached, add a TLS +
C-constructor JIT test (the exact `_Thread_local` SIGABRT case), per host OS.
- **Hot-reload — no row today.** When picked up: state-survival test + the
live-suspended-fiber-into-stale-module hazard (R1/R2).

View File

@@ -549,6 +549,40 @@ Lexer/token: add `kw_asm` to the `Token.Tag` enum + keyword `StaticStringMap` in
* Every `%[name]` referenced in the template must name an operand (best surfaced as
a Sema diagnostic; also caught at codegen during the rewrite — §II.6).
### Operand naming rule (auto-name from a `{reg}` pin) — DECIDED
The `[name]` label on an operand is purely an sx-surface convenience: it provides
the `%[name]` template alias and (for `out_value`) the result tuple's field name.
LLVM never sees it (it sees positional `${N}` + the constraint). To kill the
common redundancy where a label just echoes its pinned register
(`[eax] "={eax}"`), the **operand name is derived as follows**, uniformly across
every operand kind (`out_value` / `out_place` / read-write / `input`):
1. **Explicit `[name]` wins** — use it verbatim (the `%[name]` alias / field name).
2. **Else, if the constraint pins a single register**`"={eax}"`, `"{rdi}"`,
`"+{rax}"`, i.e. a `{reg}` body (optionally with a `=`/`+` prefix) — the operand
is **auto-named after that register** (`eax`, `rdi`, `rax`). Usable as
`%[eax]` and as the tuple field name.
3. **Else (register-class `=r`/`+r`/`r`, or memory `=m`, …)** — the operand has
**no implicit name**. A `[name]` is then **required** if the template
references it (`%[name]`) or, for `out_value`, if a named result field is
wanted; otherwise it is anonymous (positional tuple field).
Corollaries:
* **Reject the echo form.** An explicit `[name]` that is identical to the
register its own constraint pins (`[eax] "={eax}"`) carries no information —
emit a diagnostic ("redundant operand name `eax` — it already names the pinned
register; drop the `[eax]`"). The useful form is a label that *differs* from the
register (`[quot] "={rax}"` → field `quot` over register `rax`).
* **Result field names** (the §II.5 result-type rule above) come from each
`out_value`'s *effective* name — explicit `[name]`, else the auto-derived
register name; positional only when neither exists (a class-constrained output
with no `[name]`).
* This is a **typing-stage** rule: the parser still stores `name: ?[]const u8`
(null when no `[name]` was written); Sema computes the effective name. No
parser change.
Note: there is **no** "≤1 output" rule (that was Zig's limit; sx's tuples lift it).
## II.6 sx IR + LLVM codegen (the part that must match Zig bit-for-bit)

437
docs/inline-assembly.md Normal file
View File

@@ -0,0 +1,437 @@
# Inline Assembly in sx
A guide to writing inline assembly in sx — emitting raw target
instructions, wiring values in and out, writing through memory, and
defining whole routines in assembly.
> Looking for the *why* behind the design (how it maps to LLVM, the
> Zig comparison, the emit algorithm)? That lives in
> [inline-asm-design.md](../design/inline-asm-design.md). This page is the
> user-facing how-to.
---
## The mental model
`asm` is an **expression**. It drops to the machine: you write a
template of real instructions, declare which sx values feed registers
going in and which come back out, and the block evaluates to the
output value (or a tuple of them).
```sx
add :: (a: i64, b: i64) -> i64 {
return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b };
}
```
Three things to know up front:
1. **The body is a brace block of comma-separated parts:** the template
string first, then operands, then an optional `clobbers(.…)` clause.
2. **Each operand is tagged by role**, not by position: `-> Type` is a
value output, `= expr` is an input, `-> @place` writes through to
existing storage. The list is flat and order-independent — there are
no positional `:` sections.
3. **The outputs decide the result.** Zero outputs → `void` (and the
block must be `volatile`); one → that type; many → a tuple.
Templates are **AT&T syntax** (lowered through LLVM), **target-specific**,
and **never run at compile time** — see [When it runs](#when-it-runs).
---
## Operands
An operand is `[name]? "constraint" <role>`. The constraint string is
the LLVM/GCC-style constraint; the role marker says what the operand
does.
### Inputs — `= expr`
`= expr` feeds a value in. The constraint picks where it lands:
```sx
[a] "r" = a // any general register
"{rdi}" = fd // pinned to a specific register (x86_64 rdi)
```
### Symbol inputs — `"s" = fn`
A `"s"` input feeds a **function or global symbol** (not a runtime value).
In the template, `%[name]` expands to the symbol's **platform-mangled
name**, so you can branch or call straight to it:
```sx
cb :: (n: i64) -> i64 export "cb" { return n + 1; }
trampoline :: (n: i64) -> i64 {
return asm volatile {
#string ASM
mov x0, %[arg]
bl %[fn] // DIRECT call — `bl _cb` on macOS, `bl cb` on Linux
mov %[res], x0
ASM,
[res] "=r" -> i64,
[arg] "r" = n,
[fn] "s" = cb, // symbol operand
clobbers(.x0, .x30, .memory),
};
}
```
The same `%[fn]` works on **x86_64** — just the branch mnemonic differs:
```sx
return asm volatile {
"call %[fn]", // x86_64 — same portable %[fn]
[ret] "={rax}" -> i64,
"{rdi}" = n,
[fn] "s" = cb,
clobbers(.rcx, .rdx, .rsi, .r8, .r9, .r10, .r11, .memory),
};
```
Two reasons to prefer this over passing a function *pointer* in a plain
`"r"` register and using an indirect `blr`/`call *`:
- **One fewer indirection** — a direct PC-relative branch, no pointer
load into a register, and a predictable (non-indirect) branch.
- **Portable** — `%[fn]` is the same on every target; the backend emits
the correctly-mangled name, so you never hardcode the macOS leading
underscore *or* a per-arch operand modifier.
**How the portability works.** A bare `%[fn]` would render differently
per target — on x86 the symbol prints as `$cb` (an immediate `$`-prefix
that `call` rejects), while aarch64 prints it bare. So for a symbol (`"s"`)
operand the compiler **auto-injects LLVM's `:c` operand modifier** (`%[fn]`
`${N:c}`, "print the constant with no punctuation"). `:c` prints the
plain symbol on every target — equivalent to the GCC `:P`/`%P0` call-target
idiom on x86 (both emit the same `R_X86_64_PLT32` relocation) and a no-op
on aarch64. You can still override it with an explicit `%[fn:X]` if you
ever need a different rendering, but for a call/branch you never should.
The callee needs a stable, externally-linked symbol — i.e. `export`
(which also gives it the C ABI). A plain or `callconv(.c)`-only function
is `internal` and gets dead-code-eliminated, so the symbol won't link.
(A global-scope `asm { … }` routine has no operand list, so it can't use
a symbol operand — it references the literal symbol in its text.)
### Value outputs — `-> Type`
`-> Type` produces a value that becomes (part of) the block's result:
```sx
[out] "=r" -> i64 // result in any register
"={rax}" -> i64 // result pinned to rax
```
### Naming and `%[name]`
Inside the template, `%[name]` refers to an operand by its **effective
name**. An operand pinned to a register is **auto-named after that
register** — `"{rdi}"` is reachable as `%[rdi]`, `"={rax}"` as `%[rax]`
— so an explicit `[name]` is only needed:
- for a register-**class** operand (`"=r"`, `"r"`), which has no register
to name it; or
- to give a pinned operand a name *different* from its register.
Two labels are rejected so names stay unambiguous:
- the **echo form** `[rax] "={rax}"` — the label just repeats the pin, so
drop it (the operand is already `%[rax]`); and
- **duplicate** operand names.
In the template, `%%` is a literal `%`, and `%=` expands to a unique id
(handy for a local label that must differ across inlinings).
### The result type
The number of **value** outputs (`-> Type`) decides the block's type:
| `-> Type` outputs | result | example |
|---|---|---|
| 0 | `void` — must be `volatile` | `asm volatile { "dmb ish" }` |
| 1 | that type `T` | `x := asm { …, "=r" -> i64 }` |
| N | a **tuple**, fields named by each operand's name | `lo, hi := asm { … }` |
With multiple outputs you get real multiple return values — a named
operand becomes a named tuple field:
```sx
// aarch64 — split a value into low/high bytes
split :: (x: u64) -> (lo: u64, hi: u64) {
return asm {
#string ASM
and %[l], %[x], #0xff
lsr %[h], %[x], #8
ASM,
[l] "=r" -> u64, // → .lo (operand 0)
[h] "=r" -> u64, // → .hi (operand 1)
[x] "r" = x,
};
}
lo, hi := split(0x1234); // (0x34, 0x12) = (52, 18)
```
---
## `volatile`
`asm volatile { … }` marks the block as having side effects, so the
optimizer won't move or delete it. It is **required whenever there are
no value outputs** — a result-less, non-volatile asm would be dead code.
```sx
barrier :: () { asm volatile { "dmb ish" }; } // aarch64 full barrier
```
A block with outputs may still be `volatile` when its effects matter
beyond the returned value (e.g. a syscall).
---
## `clobbers(.…)`
`clobbers(.…)` is a dot-name list of registers and flags the asm trashes
that aren't already operands — so the register allocator keeps clear of
them:
```sx
clobbers(.rcx, .r11, .memory) // x86_64 syscall trashes rcx, r11, and memory
clobbers(.cc) // condition flags
```
`.memory` means "this asm reads or writes memory the compiler can't see,"
and `.cc` means "the condition flags are modified."
---
## Writing through memory — `-> @place`
Sometimes the asm should write into existing storage (a local, a struct
field) rather than *return* a value. `-> @place` does that: the place
output does **not** join the result tuple. There are three forms,
distinguished by the constraint.
### Write-through — `= …` constraint
The asm computes a value into a register; sx stores it through the
place's address afterward.
```sx
compute :: () -> i64 {
other : i64 = 0;
main_val := asm volatile {
#string ASM
mov %[m], #5
mov %[o], #37
ASM,
[m] "=r" -> i64, // value output → returned into main_val
[o] "=r" -> @other, // place output → stored through @other
};
return main_val + other; // 5 + 37 = 42
}
```
A value output and one or more place outputs can mix freely; only the
value outputs build the returned tuple.
### Read-write — `+` constraint
A `+` operand is read **and** written: the place's current value is fed
in, the asm updates it in place, and the result is stored back.
```sx
// increment-in-place: x is loaded, the asm adds 1, the result is stored back
bump :: () -> i64 {
x : i64 = 41;
asm volatile { "add %[v], %[v], #1", [v] "+r" -> @x };
return x; // 42
}
```
### Indirect memory — `=*m` constraint
An `=*m` operand passes the place's **address** to the asm, which writes
through it directly (no register round-trip, no return slot):
```sx
// store 42 straight into x's storage
poke :: () -> i64 {
x : i64 = 0;
asm volatile {
#string ASM
mov x9, #42
str x9, %[out]
ASM,
[out] "=*m" -> @x,
clobbers(.x9),
};
return x; // 42
}
```
**The place must be mutable storage.** Taking the address of a scalar
`::` constant has no meaning — a scalar constant folds to its value and
has no storage — so `-> @SOME_CONST` is a compile error:
```
cannot take the address of constant 'SOME_CONST' — a scalar '::'
constant has no storage (use a '=' variable or a local copy)
```
---
## Multi-instruction templates
A single `"…"` string is one fragment. For several instructions, use a
multi-line string literal or sx's **`#string` heredoc**, which is
delivered **verbatim** — no escape processing — so you write assembly
exactly as it should appear:
```sx
serialize :: () {
asm volatile {
#string ASM
mfence
lfence
ASM,
};
}
```
---
## Global (module-scope) assembly
A top-level `asm { … }` block is **global assembly** — template only
(no operands, no `volatile`), emitted as module-level assembly. It is
the place to define a whole routine in assembly. Symbols it defines are
reached from sx with a **lib-less `extern`** declaration:
```sx
asm {
#string ASM
.global _my_add
_my_add:
add x0, x0, x1
ret
ASM,
};
my_add :: (a: i64, b: i64) -> i64 extern;
main :: () -> i64 {
return my_add(40, 2); // 42 — computed by the global-asm routine
}
```
Multiple global blocks concatenate in source order. (Symbol naming
follows the platform convention — a leading underscore on macOS, none
on Linux.)
---
## When it runs
Inline assembly is emitted into the program and runs at **runtime**,
under both execution paths:
- **`sx run` (JIT)** — the module is compiled to an in-memory object
(the integrated assembler assembles your asm, including global blocks),
then run. Both inline and global asm work.
- **`sx build` (AOT)** — same, into a native binary.
It does **not** run at **compile time**. A `#run` (comptime) call into a
global-asm symbol fails loudly:
```sx
COMPUTED :: #run my_add(40, 2); // error: the symbol isn't linked yet at comptime
```
```
comptime extern call: symbol not found via dlsym
```
The comptime interpreter resolves `extern` calls against the host
process; a module-asm symbol only exists once the program is
assembled and linked, so call it at runtime, not in a `#run`.
---
## Cookbook
**Read a register** (no inputs):
```sx
stack_ptr :: () -> u64 {
return asm { "mov %[out], sp", [out] "=r" -> u64 }; // aarch64
}
```
**x86_64 syscall**`write(2)`, with pinned registers and clobbers:
```sx
sys_write :: (fd: i64, buf: *u8, count: i64) -> i64 {
return asm volatile {
"syscall",
[ret] "={rax}" -> i64, // bytes written, in rax
"{rax}" = 1, // SYS_write
"{rdi}" = fd,
"{rsi}" = buf,
"{rdx}" = count,
clobbers(.rcx, .r11, .memory),
};
}
```
**x86_64 divmod** — one instruction, two outputs, returned as a tuple:
```sx
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64,
[rem] "={rdx}" -> u64,
"{rax}" = n, "{rdx}" = 0, [d] "r" = d,
clobbers(.cc),
};
}
q, r := divmod(17, 5); // (3, 2)
```
---
## Rules of thumb
- **`asm` yields a value.** Bind it (`x := asm { … }`), `return` it, or
destructure a multi-output tuple (`a, b := asm { … }`). A block with no
value outputs must be `volatile`.
- **Pinned operands name themselves.** `"{rdi}"` is `%[rdi]`; only add
`[name]` for register-class operands or to rename. Don't echo a pin
(`[rax] "={rax}"`).
- **`%%` for a literal percent; `%[name]` for an operand.** Templates are
AT&T.
- **List everything you trash** in `clobbers(.…)` — scratch registers,
`.cc`, and `.memory` if the asm touches memory the compiler can't see.
- **`-> @place` writes storage; pick the form:** `=` (compute then
store), `+` (read-modify-write), `=*m` (write through the address).
The place must be mutable — not a scalar `::` constant.
- **Global `asm { … }`** defines symbols; import them with a lib-less
`extern`. They run under JIT and AOT, but **not** in a `#run`.
- **It's target-specific.** Gate or pick instructions per architecture;
there is no portable instruction set.
---
## See also
- [inline-asm-design.md](../design/inline-asm-design.md) — the design rationale and
LLVM mapping.
- `examples/16xx-platform-asm-*` — the full, runnable example matrix
(basic in/out, tuples, the three `-> @place` forms, global asm, the
x86_64 syscall, and the comptime-boundary guard).
- The "Inline Assembly" section of [readme.md](../readme.md) for a
one-screen overview.
```

View File

@@ -0,0 +1,18 @@
// Taking the address of a scalar `::` constant is a compile error: a scalar
// constant folds to its value and has NO storage (only array/struct constants
// are immutable globals with a real address — see 0177). Covers a module-scope
// const, a local const, and an inline-asm `-> @const` write-through (the path
// that surfaced the bug). Before the fix, `@N` lowered to `inttoptr (i64 40 to
// ptr)` — a wild pointer that segfaulted on deref and emitted invalid stores
// for asm `-> @const`. Regression (issue 0138).
takes :: (p: *i64) {}
N :: 40;
main :: () {
takes(@N); // module scalar const — no storage
x :: 7;
takes(@x); // local scalar const — no storage
asm volatile { "mov %[c], #99", [c] "=r" -> @N }; // write-through to a const
}

View File

@@ -0,0 +1,9 @@
// Phase 0 (ASM stream) test-infra lock: exercises the `<name>.build` JSON
// config + `--target` threading + the host-match EXECUTE path of the corpus
// runner. The companion `.build` pins the HOST target (`{ "target": "macos" }`
// resolves to the host arch+os), so the runner threads `--target` and still
// runs the example natively — its stdout is asserted as usual.
#import "modules/std.sx";
main :: () {
print("target-host ok\n");
}

View File

@@ -0,0 +1,7 @@
// Phase 0 (ASM stream) test-infra lock: exercises the corpus runner's
// CROSS-TARGET ir-only path. The `.build` pins `x86_64-linux`, which does NOT
// match this aarch64 host, so the runner skips run/build/exec and verifies via
// `sx ir --target x86_64-linux` only — asserting exit + the `.ir` snapshot +
// stderr (no `.stdout`). Asm-free on purpose: it locks the harness gating, not
// any inline-asm lowering (that arrives in Phase A+).
main :: () -> i64 { return 0; }

View File

@@ -0,0 +1,20 @@
// ASM stream Phase E — x86_64 multi-output asm: `divq` produces quotient in rax
// and remainder in rdx, returned as a `(quot, rem)` tuple. Two `={rax}`/`={rdx}`
// value outputs ⇒ LLVM returns a `{ i64, i64 }` struct, which IS sx's tuple
// representation (so `q, r := …` destructures it directly). x86-pinned via
// `.build`: ir-only on a non-x86 host (the `.ir` snapshot locks the struct
// return + `%[name]` rewrite); runs natively on x86_64-linux. See 1647 for a
// multi-output example that executes on aarch64.
divmod :: (n: u64, d: u64) -> (quot: u64, rem: u64) {
return asm {
"divq %[d]",
[quot] "={rax}" -> u64,
[rem] "={rdx}" -> u64,
"{rax}" = n, "{rdx}" = 0, [d] "r" = d,
clobbers(.cc),
};
}
main :: () {
q, r := divmod(17, 5);
}

View File

@@ -0,0 +1,6 @@
// ASM stream Phase B — an asm with no value outputs yields no result, so its
// effects could be deleted unless it is marked `volatile`. This omits
// `volatile` ⇒ a compile error. Pins that diagnostic (mirrors Zig's rule).
// Called from `main` so lowering reaches the asm body.
nope :: () { asm { "nop" }; }
main :: () { nope(); }

View File

@@ -0,0 +1,5 @@
// ASM stream — the no-output `volatile` form runs end-to-end: a bare `nop`
// (no operands, no result) assembles and executes cleanly (exit 0). Confirms
// the no-output⇒volatile rule's positive side AND the zero-operand emit path.
nop :: () { asm volatile { "nop" }; }
main :: () { nop(); }

View File

@@ -0,0 +1,6 @@
// ASM stream Phase B — operand naming rule (§II.5): an explicit `[name]` that
// just echoes the register its own constraint pins (`[eax] "={eax}"`) carries
// no information — the operand is already auto-named after the register. Reject
// it. The useful form is a label that DIFFERS (e.g. `[quot] "={rax}"`).
f :: () -> u32 { return asm volatile { "cpuid", [eax] "={eax}" -> u32, "{eax}" = 1 }; }
main :: () { x := f(); }

View File

@@ -0,0 +1,4 @@
// ASM stream Phase B — two asm operands may not share a `[name]`: the `%[name]`
// template reference (and the result tuple field) would be ambiguous.
f :: () -> u64 { return asm volatile { "nop", [x] "=r" -> u64, [x] "r" = 5 }; }
main :: () { v := f(); }

View File

@@ -0,0 +1,10 @@
// ASM stream Phase D — inline assembly that RUNS end-to-end. An aarch64 `add`
// with two register-class inputs (`%[a]`, `%[b]`) and a value output (`%[out]`)
// returned from the function. The `.build` pins aarch64-macOS: on a matching
// host the runner executes it (exit 42); elsewhere it falls to ir-only mode and
// asserts the `.ir` snapshot (the inline_asm op + LLVM `call asm` are target-
// independent in the IR text). Regression for the full lower→emit→JIT path.
add_asm :: (a: i64, b: i64) -> i64 {
return asm { "add %[out], %[a], %[b]", [out] "=r" -> i64, [a] "r" = a, [b] "r" = b };
}
main :: () -> i64 { return add_asm(40, 2); }

View File

@@ -0,0 +1,9 @@
// ASM stream Phase D — a bare `x := asm { … -> T }` binding (not a direct
// `return asm`) types correctly: the value output flows through the local and
// out as the exit code. Regression for the `inferType` `.asm_expr` arm (without
// it the binding inferred `.unresolved` and silently produced 0). aarch64-pinned
// via `.build` → runs on a matching host, ir-only elsewhere.
main :: () -> i64 {
x := asm { "mov %[out], #99", [out] "=r" -> i64 };
return x;
}

View File

@@ -0,0 +1,20 @@
// ASM stream Phase E — multi-output asm that RUNS end-to-end on aarch64. Splits
// a value into low/high bytes via two value outputs, returned + destructured as
// a `(lo, hi)` tuple. The two outputs become an LLVM `{ i64, i64 }` struct =
// sx's tuple. aarch64-pinned via `.build`: executes on a matching host (exit
// reflects lo+hi), ir-only elsewhere.
split :: (x: u64) -> (lo: u64, hi: u64) {
return asm {
#string ASM
and %[l], %[x], #0xff
lsr %[h], %[x], #8
ASM,
[l] "=r" -> u64,
[h] "=r" -> u64,
[x] "r" = x,
};
}
main :: () -> i64 {
lo, hi := split(0x1234);
return xx (lo + hi); // 0x34 + 0x12 = 52 + 18 = 70
}

View File

@@ -0,0 +1,20 @@
// ASM stream Phase F — top-level (global) `asm { … }`: a template-only block at
// module scope, lowered to LLVM `module asm` (LLVMAppendModuleInlineAsm). It
// defines a symbol that a lib-less `extern` declaration calls into — the
// import direction reuses the existing C-FFI extern path, no new surface.
// Built+run via `aot` (a module-asm symbol lives in the final linked binary,
// not the JIT host); aarch64-macos-pinned, so ir-only on a non-matching host.
asm {
#string ASM
.global _my_add
_my_add:
add x0, x0, x1
ret
ASM,
};
my_add :: (a: i64, b: i64) -> i64 extern;
main :: () -> i64 {
return my_add(40, 2); // 42, computed by the global-asm routine
}

View File

@@ -0,0 +1,19 @@
// ASM stream Phase 2 — `-> @place` write-through output. An asm result can be
// STORED through a place (a local / struct field) instead of returned: the
// place output does NOT join the result tuple. Here one value output is
// returned (into `main_val`) while a second is written through `@other`. The
// two are combined to 42. Read-write (`+`) and indirect (`*`) place outputs are
// not yet implemented (rejected at lowering). aarch64-pinned; ir-only elsewhere.
compute :: () -> i64 {
other : i64 = 0;
main_val := asm volatile {
#string ASM
mov %[m], #5
mov %[o], #37
ASM,
[m] "=r" -> i64, // value output → returned
[o] "=r" -> @other, // place output → stored through @other
};
return main_val + other; // 5 + 37 = 42
}
main :: () -> i64 { return compute(); }

View File

@@ -0,0 +1,11 @@
// ASM stream Phase 2 — read-write (`+`) place output. The place is LOADED as a
// seed, the asm both reads and writes the operand register (tied input ↔ output),
// and the (modified) result is STORED back through the place. Increment-in-place:
// the register holds 41 on entry, the asm adds 1, 42 is written back to `x`.
// aarch64-pinned; ir-only elsewhere.
compute :: () -> i64 {
x : i64 = 41;
asm volatile { "add %[v], %[v], #1", [v] "+r" -> @x };
return x; // 42
}
main :: () -> i64 { return compute(); }

View File

@@ -0,0 +1,27 @@
// ASM stream — x86_64 Linux `write(2)` via a raw `syscall`. The canonical inline-
// asm use case: SYS_write (rax=1) with fd/buf/count pinned to rdi/rsi/rdx, the
// `syscall` instruction clobbering rcx + r11 (+ memory), and the byte count
// returned in rax. Demonstrates register-pinned inputs, a pinned value output,
// a pointer input (`*u8` → rsi), and `clobbers(.…)` lowering all at once.
//
// x86-pinned via `.build`: ir-only on a non-x86 host — the `.ir` snapshot locks
// the exact constraint string (`={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},
// ~{memory}`), which is the §II.11 silent-miscompile risk zone — and runs
// natively on x86_64-linux (printing "ok\n"). See 1640 for an x86 multi-output
// example, 1645/1647/1649/1650 for aarch64 examples that execute on this host.
sys_write :: (fd: i64, buf: *u8, count: i64) -> i64 {
return asm volatile {
"syscall",
[ret] "={rax}" -> i64, // return: bytes written, in rax
"{rax}" = 1, // SYS_write (x86_64 Linux)
"{rdi}" = fd, // fd
"{rsi}" = buf, // buf
"{rdx}" = count, // count
clobbers(.rcx, .r11, .memory),
};
}
main :: () {
msg : [3]u8 = .[111, 107, 10]; // "ok\n"
n := sys_write(1, @msg[0], 3);
}

View File

@@ -0,0 +1,19 @@
// ASM stream — indirect-memory (`=*m`) place output. The place address is passed
// to the asm as a pointer and the asm writes THROUGH it (no return slot): here
// `str x9, %[out]` stores 42 into `x`'s storage directly. Distinct from a
// write-through `=` output (which returns a value that is then stored). Mixes a
// value output and an input below to exercise operand ordering. aarch64-pinned;
// ir-only elsewhere (the `.ir` locks the `=*m` constraint + `elementtype` attr).
poke :: () -> i64 {
x : i64 = 0;
asm volatile {
#string ASM
mov x9, #42
str x9, %[out]
ASM,
[out] "=*m" -> @x,
clobbers(.x9),
};
return x; // 42 — written through the pointer
}
main :: () -> i64 { return poke(); }

View File

@@ -0,0 +1,22 @@
// ASM stream — global (module-scope) `asm { … }` executed via the JIT (`sx run`),
// NOT AOT. `sx run` compiles the whole module to an in-memory object (the
// integrated assembler assembles the `module asm` block into it), then ORC
// relocates and runs it — so a module-asm symbol IS resolvable at JIT main
// execution, just like a normal symbol. The only path that can't see it is a
// COMPILE-TIME `#run` call (the interpreter resolves externs via host dlsym; the
// symbol isn't linked yet — see 1654). Sibling of 1648 (which exercises the same
// feature via AOT). aarch64-macos-pinned; ir-only elsewhere.
asm {
#string ASM
.global _my_sub
_my_sub:
sub x0, x0, x1
ret
ASM,
};
my_sub :: (a: i64, b: i64) -> i64 extern;
main :: () -> i64 {
return my_sub(44, 2); // 42, computed by the global-asm routine, under JIT
}

View File

@@ -0,0 +1,22 @@
// ASM stream — calling a global-asm symbol at COMPILE TIME (`#run`) fails loud.
// A module-asm symbol only exists once the module is assembled+linked; the
// comptime interpreter resolves `extern` calls via host `dlsym` (RTLD_DEFAULT),
// where the symbol is absent — so `#run my_add(…)` cannot evaluate and reports a
// clear diagnostic instead of silently misfiring. (Calling the same symbol at
// RUNTIME works under both JIT and AOT — see 1648/1653.) The failure is at
// dlsym resolution, before any asm is assembled, so it is arch-independent —
// no `.build` target needed. Regression guard for the comptime boundary.
asm {
#string ASM
.global _my_add
_my_add:
add x0, x0, x1
ret
ASM,
};
my_add :: (a: i64, b: i64) -> i64 extern;
COMPUTED :: #run my_add(40, 2); // compile-time call — module-asm symbol not yet linked
main :: () -> i64 { return COMPUTED; }

View File

@@ -0,0 +1,35 @@
// ASM stream — the round trip: sx → asm → sx. A global-asm routine (`_caller`)
// CALLS BACK into an sx function (`cb`) by its symbol, then returns. For the asm
// `bl _cb` to resolve, the sx callback needs EXTERNAL LINKAGE and a stable C
// symbol — that is exactly what `export` provides (it also implies the C ABI, so
// no hidden context parameter). `callconv(.c)` alone is NOT enough: it sets the
// ABI but leaves the function `internal`, so it is dead-code-eliminated (nothing
// in the IR references it — the `bl` is opaque to the optimizer) and `_cb` is
// undefined at link. macOS gives `export "cb"` the symbol `_cb` (leading
// underscore), which the template references. aarch64-macos-pinned; runs under
// the JIT here (sx run), ir-only elsewhere.
// The sx callback — `export` gives it external linkage + the `_cb` symbol + C ABI.
cb :: (n: i64) -> i64 export "cb" {
return n + 1;
}
// A global-asm trampoline that calls back into `cb`. It saves/restores the link
// register (x30) around the `bl` — it was itself reached via `bl`, so the return
// address must survive the nested call.
asm {
#string ASM
.global _caller
_caller:
stp x29, x30, [sp, #-16]!
bl _cb // x0 = cb(x0) — back into sx
ldp x29, x30, [sp], #16
ret
ASM,
};
caller :: (n: i64) -> i64 extern;
main :: () -> i64 {
return caller(41); // sx main → asm caller → bl _cb → sx cb → 42
}

View File

@@ -0,0 +1,25 @@
// ASM stream — symbol operand (`"s"`): feed a function/global SYMBOL into the
// template so a DIRECT `bl %[fn]` (PC-relative — one fewer indirection than a
// register-indirect `blr`: no pointer load, a relative reloc, a predictable
// branch) goes straight to it. The backend emits the platform-mangled name
// (`_cb` on macOS, `cb` on Linux), so the template stays portable — no hardcoded
// underscore. Round trip: sx → asm → `bl _cb` → sx → 42. aarch64-macos-pinned;
// runs under the JIT here, ir-only elsewhere (the `.ir` locks `"s"`/`ptr @cb`).
cb :: (n: i64) -> i64 export "cb" { return n + 1; }
tramp :: (n: i64) -> i64 {
return asm volatile {
#string ASM
stp x29, x30, [sp, #-16]!
mov x0, %[arg]
bl %[fn]
mov %[res], x0
ldp x29, x30, [sp], #16
ASM,
[res] "=r" -> i64,
[arg] "r" = n,
[fn] "s" = cb, // symbol operand → direct `bl _cb`
clobbers(.x0, .x30, .memory),
};
}
main :: () -> i64 { return tramp(41); }

View File

@@ -0,0 +1,12 @@
// ASM stream — read-write (`+`) place output on x86_64 (cross-arch sibling of
// the aarch64 1650). `incq %[v]` reads the operand register, increments it, and
// the result is stored back through the place. Locks the x86 lowering of `+`:
// an output `=r` plus a tied input (`=r,0`) seeded with the place's value.
// x86-pinned via `.build`: ir-only here (the `.ir` is the assertion), runs
// natively on x86_64-linux (main returns 0 on success, 1 if the asm misbehaved).
bump :: () -> i64 {
x : i64 = 41;
asm volatile { "incq %[v]", [v] "+r" -> @x };
return x; // 42
}
main :: () -> i64 { if bump() != 42 { return 1; } return 0; }

View File

@@ -0,0 +1,12 @@
// ASM stream — indirect-memory (`=*m`) place output on x86_64 (cross-arch sibling
// of the aarch64 1652). `movq $42, %[out]` stores straight through the place's
// address — the address is passed as an opaque `ptr` with an `elementtype(i64)`
// attribute, no return slot. Note `$42`: a literal `$` in the template is escaped
// to LLVM's `$$` and emitted back as `$42` (an x86 immediate). x86-pinned;
// ir-only here, runs on x86_64-linux.
poke :: () -> i64 {
x : i64 = 0;
asm volatile { "movq $42, %[out]", [out] "=*m" -> @x };
return x; // 42
}
main :: () -> i64 { if poke() != 42 { return 1; } return 0; }

View File

@@ -0,0 +1,19 @@
// ASM stream — symbol operand (`"s"`) on x86_64 (cross-arch sibling of the
// aarch64 1656). A DIRECT `call` to an `export`ed sx function by symbol, written
// with the SAME portable `%[fn]` as the aarch64 example — the compiler injects
// the `:c` operand modifier for symbol operands, so the symbol prints bare on
// every target (x86 would otherwise render `$cb`, a bad call target). The
// backend emits the platform-mangled name (`call cb` on Linux). x86-pinned;
// ir-only here, runs on x86_64-linux. Round trip: sx → asm → call cb → sx → 42.
cb :: (n: i64) -> i64 export "cb" { return n + 1; }
tramp :: (n: i64) -> i64 {
return asm volatile {
"call %[fn]",
[ret] "={rax}" -> i64,
"{rdi}" = n, // arg in rdi (SysV)
[fn] "s" = cb, // symbol operand → direct `call cb`
clobbers(.rcx, .rdx, .rsi, .r8, .r9, .r10, .r11, .memory),
};
}
main :: () -> i64 { if tramp(41) != 42 { return 1; } return 0; }

View File

@@ -0,0 +1,29 @@
// Windows x86_64 — print "42" and exit(0) through the Win32 system-call
// boundary. The Windows analog of the Linux raw-`syscall` write (see
// 1651): Windows has no stable raw syscall ABI (NtWriteFile's ordinal
// shifts between OS builds), so the documented boundary IS kernel32 —
// `GetStdHandle` + `WriteFile` to print, `ExitProcess` to terminate.
//
// Exercises the bundled-`zig` link backend end to end: built with
// `--target windows-gnu --self-contained`, zig cc (mingw) auto-resolves
// kernel32, producing a PE32+ that prints "42\n" and exits 0.
//
// Pinned `x86_64-windows-gnu` via `.build`: ir-only on this non-Windows
// host (the `.ir` snapshot locks the Win64-ABI lowering of the three
// extern calls); runs end-to-end on a Windows x86_64 runner.
kernel32 :: #library "kernel32";
// DWORD = u32, HANDLE/LPVOID = *void, BOOL = i32.
GetStdHandle :: (n_std_handle: u32) -> *void extern;
WriteFile :: (file: *void, buf: *u8, n: u32, written: *u32, overlapped: *void) -> i32 extern;
ExitProcess :: (code: u32) -> void extern;
main :: () {
// STD_OUTPUT_HANDLE = (DWORD)-11 = 0xFFFFFFF5.
out := GetStdHandle(0xFFFFFFF5);
msg : [3]u8 = .[52, 50, 10]; // "42\n"
written : u32 = 0;
WriteFile(out, @msg[0], 3, @written, null);
ExitProcess(0);
}

View File

@@ -0,0 +1 @@
1

View File

@@ -0,0 +1,17 @@
error: cannot take the address of constant 'N' — a scalar '::' constant has no storage (use a '=' variable or a local copy for mutable data)
--> examples/1177-diagnostics-addr-of-const-rejected.sx:14:11
|
14 | takes(@N); // module scalar const — no storage
| ^^
error: cannot take the address of constant 'x' — a scalar '::' constant has no storage (use a '=' variable or a local copy for mutable data)
--> examples/1177-diagnostics-addr-of-const-rejected.sx:16:11
|
16 | takes(@x); // local scalar const — no storage
| ^^
error: cannot take the address of constant 'N' — a scalar '::' constant has no storage (use a '=' variable or a local copy for mutable data)
--> examples/1177-diagnostics-addr-of-const-rejected.sx:17:49
|
17 | asm volatile { "mov %[c], #99", [c] "=r" -> @N }; // write-through to a const
| ^^

View File

@@ -0,0 +1 @@
{ "aot": true }

View File

@@ -0,0 +1 @@
{ "aot": true }

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
0

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
target-host ok

View File

@@ -0,0 +1 @@
{ "target": "x86_64-linux" }

View File

@@ -0,0 +1 @@
0

View File

@@ -0,0 +1,6 @@
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
ret i32 0
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "x86_64-linux" }

View File

@@ -0,0 +1 @@
0

View File

@@ -0,0 +1,26 @@
; Function Attrs: nounwind
define internal { i64, i64 } @divmod(i64 %0, i64 %1) #0 {
entry:
%alloca = alloca i64, align 8
store i64 %0, ptr %alloca, align 8
%allocaN = alloca i64, align 8
store i64 %1, ptr %allocaN, align 8
%load = load i64, ptr %alloca, align 8
%loadN = load i64, ptr %allocaN, align 8
%asm = call { i64, i64 } asm "divq ${4}", "={rax},={rdx},{rax},{rdx},r,~{cc}"(i64 %load, i64 0, i64 %loadN)
ret { i64, i64 } %asm
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call { i64, i64 } @divmod(i64 17, i64 5)
%tg = extractvalue { i64, i64 } %call, 0
%alloca = alloca i64, align 8
store i64 %tg, ptr %alloca, align 8
%tgN = extractvalue { i64, i64 } %call, 1
%allocaN = alloca i64, align 8
store i64 %tgN, ptr %allocaN, align 8
ret i32 0
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
1

View File

@@ -0,0 +1,5 @@
error: asm expression with no outputs must be marked `volatile`
--> examples/1641-platform-asm-missing-volatile.sx:5:14
|
5 | nope :: () { asm { "nop" }; }
| ^^^^^^^^^^^^^

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
0

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
1

View File

@@ -0,0 +1,5 @@
error: redundant asm operand name `eax` — it already names the pinned register; drop the `[eax]`
--> examples/1643-platform-asm-echo-name.sx:5:25
|
5 | f :: () -> u32 { return asm volatile { "cpuid", [eax] "={eax}" -> u32, "{eax}" = 1 }; }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
1

View File

@@ -0,0 +1,5 @@
error: duplicate asm operand name `x`
--> examples/1644-platform-asm-duplicate-name.sx:3:25
|
3 | f :: () -> u64 { return asm volatile { "nop", [x] "=r" -> u64, [x] "r" = 5 }; }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
42

View File

@@ -0,0 +1,21 @@
; Function Attrs: nounwind
define internal i64 @add_asm(i64 %0, i64 %1) #0 {
entry:
%alloca = alloca i64, align 8
store i64 %0, ptr %alloca, align 8
%allocaN = alloca i64, align 8
store i64 %1, ptr %allocaN, align 8
%load = load i64, ptr %alloca, align 8
%loadN = load i64, ptr %allocaN, align 8
%asm = call i64 asm "add ${0}, ${1}, ${2}", "=r,r,r"(i64 %load, i64 %loadN)
ret i64 %asm
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call i64 @add_asm(i64 40, i64 2)
%ca.tr = trunc i64 %call to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
99

View File

@@ -0,0 +1,11 @@
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%asm = call i64 asm "mov ${0}, #99", "=r"()
%alloca = alloca i64, align 8
store i64 %asm, ptr %alloca, align 8
%load = load i64, ptr %alloca, align 8
%ca.tr = trunc i64 %load to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
70

View File

@@ -0,0 +1,31 @@
; Function Attrs: nounwind
define internal { i64, i64 } @split(i64 %0) #0 {
entry:
%alloca = alloca i64, align 8
store i64 %0, ptr %alloca, align 8
%load = load i64, ptr %alloca, align 8
%asm = call { i64, i64 } asm " and ${0}, ${2}, #0xff\0A lsr ${1}, ${2}, #8\0A", "=r,=r,r"(i64 %load)
%tg = extractvalue { i64, i64 } %asm, 0
%tgN = extractvalue { i64, i64 } %asm, 1
%ti = insertvalue { i64, i64 } undef, i64 %tg, 0
%tiN = insertvalue { i64, i64 } %ti, i64 %tgN, 1
ret { i64, i64 } %tiN
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call { i64, i64 } @split(i64 4660)
%tg = extractvalue { i64, i64 } %call, 0
%alloca = alloca i64, align 8
store i64 %tg, ptr %alloca, align 8
%tgN = extractvalue { i64, i64 } %call, 1
%allocaN = alloca i64, align 8
store i64 %tgN, ptr %allocaN, align 8
%load = load i64, ptr %alloca, align 8
%loadN = load i64, ptr %allocaN, align 8
%add = add i64 %load, %loadN
%ca.tr = trunc i64 %add to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "aot": true, "target": "macos" }

View File

@@ -0,0 +1 @@
42

View File

@@ -0,0 +1,16 @@
module asm ".global _my_add"
module asm "_my_add:"
module asm " add x0, x0, x1"
module asm " ret"
; Function Attrs: nounwind
declare i64 @my_add(i64, i64) #0
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call i64 @my_add(i64 40, i64 2)
%ca.tr = trunc i64 %call to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
42

View File

@@ -0,0 +1,25 @@
; Function Attrs: nounwind
define internal i64 @compute() #0 {
entry:
%alloca = alloca i64, align 8
store i64 0, ptr %alloca, align 8
%asm = call { i64, i64 } asm sideeffect " mov ${0}, #5\0A mov ${1}, #37\0A", "=r,=r"()
%asm.out = extractvalue { i64, i64 } %asm, 0
%asm.out1 = extractvalue { i64, i64 } %asm, 1
store i64 %asm.out1, ptr %alloca, align 8
%allocaN = alloca i64, align 8
store i64 %asm.out, ptr %allocaN, align 8
%load = load i64, ptr %allocaN, align 8
%loadN = load i64, ptr %alloca, align 8
%add = add i64 %load, %loadN
ret i64 %add
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call i64 @compute()
%ca.tr = trunc i64 %call to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "macos" }

View File

@@ -0,0 +1 @@
42

View File

@@ -0,0 +1,20 @@
; Function Attrs: nounwind
define internal i64 @compute() #0 {
entry:
%alloca = alloca i64, align 8
store i64 41, ptr %alloca, align 8
%asm.rw.seed = load i64, ptr %alloca, align 8
%asm = call i64 asm sideeffect "add ${0}, ${0}, #1", "=r,0"(i64 %asm.rw.seed)
store i64 %asm, ptr %alloca, align 8
%load = load i64, ptr %alloca, align 8
ret i64 %load
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%call = call i64 @compute()
%ca.tr = trunc i64 %call to i32
ret i32 %ca.tr
}

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@

View File

@@ -0,0 +1 @@
{ "target": "x86_64-linux" }

View File

@@ -0,0 +1 @@
0

View File

@@ -0,0 +1,28 @@
; Function Attrs: nounwind
define internal i64 @sys_write(i64 %0, ptr %1, i64 %2) #0 {
entry:
%alloca = alloca i64, align 8
store i64 %0, ptr %alloca, align 8
%allocaN = alloca ptr, align 8
store ptr %1, ptr %allocaN, align 8
%allocaN = alloca i64, align 8
store i64 %2, ptr %allocaN, align 8
%load = load i64, ptr %alloca, align 8
%loadN = load ptr, ptr %allocaN, align 8
%loadN = load i64, ptr %allocaN, align 8
%asm = call i64 asm sideeffect "syscall", "={rax},{rax},{rdi},{rsi},{rdx},~{rcx},~{r11},~{memory}"(i64 1, i64 %load, ptr %loadN, i64 %loadN)
ret i64 %asm
}
; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%alloca = alloca [3 x i8], align 1
store [3 x i8] c"ok\0A", ptr %alloca, align 1
%igp.ptr = getelementptr i8, ptr %alloca, i64 0
%call = call i64 @sys_write(i64 1, ptr %igp.ptr, i64 3)
%allocaN = alloca i64, align 8
store i64 %call, ptr %allocaN, align 8
ret i32 0
}

Some files were not shown because too many files have changed in this diff Show More