Files
sx/design/comptime-compiler-api.md
agra 18af8eb845 comptime-API: strip the byte-weld; pivot to a flat-memory comptime VM
The byte-weld (sx structs whose layout was validated to mirror the
compiler's Zig records) plus the serialization/marshaling bridge was the
wrong direction: it bolted a parallel layout regime and hand-built
byte-copies onto a comptime value model that fundamentally isn't bytes.

Strip the struct-weld machinery:
- compiler_lib.zig loses the type registry (weldStruct / bound_types /
  BoundType / FieldLayout / findType / SxField / LayoutMismatch /
  validateStructLayout); it is now just the intern/text_of function
  host-call bridge (kept as the Phase-3 compiler-call seed).
- nominal.zig loses validateWeldedStruct / weldedFieldOrderStr + the
  sd.abi == .zig validation call.
- Remove the struct-weld unit tests and examples 0625/0627 (welded
  structs) + 1183/1186 (weld-layout diagnostics).
- The #library / abi / extern syntax stays.

Record the new direction: a bytecode VM over flat, byte-addressable
memory so comptime values are native bytes (no weld/validation/marshal),
target-aware (preserves cross-compilation) and sandboxed. See
current/PLAN-COMPILER-VM.md (Phase 0 strip -> Phase 1 flat-memory value
model -> Phase 2 bytecode -> Phase 3 compiler-API on flat memory).
design/comptime-compiler-api.md gets a SUPERSEDED banner. Also drop the
"~500 lines / split the step" rule from CLAUDE.md.
2026-06-17 19:29:36 +03:00

14 KiB

Comptime Compiler API — #library "compiler" + abi(.zig) extern

⚠ SUPERSEDED (2026-06-17) — direction changed. See ../current/PLAN-COMPILER-VM.md. The byte-weld approach below (sx structs whose layout is validated to mirror the compiler's Zig types, plus serialization / marshaling at the call boundary) is the wrong direction and is being stripped. The comptime value model fundamentally isn't bytes, so the weld bolts a parallel layout regime + hand-built byte-copies onto it. The new foundation: a bytecode VM over flat, byte-addressable memory, where comptime values ARE native bytes — so the compiler-API needs no weld, no validation, no marshaling (the compiler exposes its real types/functions and sx reads/builds them directly as memory). The goal below (unify declare/define/type_info + #compiler onto one mechanism, delete the bespoke arms) is unchanged; only the mechanism is. This doc is retained for history and to scope the Phase 0 strip — do NOT implement the weld machinery from here.

Original status: design-of-record. Captured a unified mechanism for sx↔compiler binding that subsumes the metatype declare/define primitives AND the #compiler struct attribute, and exposes the compiler's own type-table API to comptime sx. Design locked 2026-06-17; weld mechanism pivoted same day.

Motivation

Today the compiler↔sx boundary is two ad-hoc mechanisms:

  • #compiler structs (BuildOptions) — sx struct whose methods are compiler hooks (registered in compiler_hooks.zig). A handle to compiler state, method-bound.
  • The metatype declare/define/type_info #builtins — comptime sx reaching into the type table through a narrow, fixed keyhole, with a separate, translated TypeInfo data model in meta.sx (marshalled by hand in interp.zig).

Both are the SAME idea — comptime sx interacting with the compiler — implemented twice, differently. And the metatype path carries real costs: a projected data model that drifts from types.zig, hand-written marshaling, and the staging fragility of issue 0141 (constructor bodies lowered at scanDecls in a half-built world → wrong IR).

This unifies them. One mechanism: a named compiler library that exposes a curated set of the compiler's real types (welded by layout) and functions (host-call bridged), reachable from comptime sx. declare/define/type_info become sx library code over the real API; #compiler is deleted; BuildOptions migrates onto it.

The mechanism

#library "compiler"

compiler :: #library "compiler";

A named binding target that resolves NOT to a .dylib but to the compiler's own internal surface (Zig types + functions). Two defining properties:

  • It IS the safety boundary. The compiler library exports exactly the curated set of types + functions the compiler chooses to expose. Anything not on that export list is unreachable from user comptime code — the boundary is the lib's symbol table, not a convention.
  • It is comptime-only. The compiler isn't present at runtime, so every function from compiler resolves only under the comptime interpreter; calling one at runtime is a clean "comptime-only symbol" error, falling out of the existing is_comptime boundary. (Welded types are still usable as plain runtime data; only the functions are comptime-gated.)

abi(.zig) + extern <lib> — the binding surface

Syntax decision (2026-06-17, supersedes the original extern(.zig) <lib> single-qualifier form). The ABI/layout selector and the linkage keyword are two orthogonal things, so they are two annotations, not one fused qualifier:

  • abi(.x) — the ABI / calling-convention annotation, in the postfix slot before extern/export. It is the unified replacement for the old callconv(...) (which is removed): ABI = { default, c, zig, pure }.c (C ABI / cdecl), .zig (Zig-layout weld → the compiler library), .pure (naked asm). .default = unannotated (ordinary sx convention).
  • extern <lib> — the linkage keyword + binding source (the named library).

abi(...) sits where callconv(...) went (after the return type for fns); the extern/export keyword and the library handle follow. For welded types, the same abi(.zig) + extern <lib> pair sits after struct:

// functions:
text_of       :: (id: StringId)     -> string    abi(.zig) extern compiler;
intern        :: (s: string)        -> StringId   abi(.zig) extern compiler;
register_type :: (info: StructInfo) -> Type       abi(.zig) extern compiler;
find_type     :: (name: StringId)   -> ?Type      abi(.zig) extern compiler;

// types (layout-welded to the lib's real Zig type):
Field      :: struct abi(.zig) extern compiler { name: StringId; ty: Type; };
StructInfo :: struct abi(.zig) extern compiler {
    name: StringId; fields: []Field; is_protocol: bool; nominal_id: u32;
};

abi(.zig) = "Zig ABI / Zig layout"; extern compiler = the linkage + binding source.

Layout welding — why it's exact, not brittle

The sx compiler is itself a Zig program; types.zig is part of it. So at compiler-build time the real record's layout is available via @offsetOf / @sizeOf / @alignOf. An abi(.zig) extern compiler struct is laid out to the bound Zig type's EXACT offsets (queried, not guessed), and the compiler ASSERTS the sx declaration matches the Zig type byte-for-byte (a mismatch is a build error — the sx side is a header checked against the implementation). Because the same compiler builds both, they're guaranteed identical, and a types.zig change re-bakes the offsets on the next build — both sides move together.

Implementation note (how it's exact, concretely). No layout-override engine is needed. The sx header DECLARES its fields in the compiler type's memory order (Zig may reorder a struct from source order). The compiler REFLECTS the bound Zig type — field names from @typeInfo, offsets from @offsetOf, size from @sizeOf, nothing hand-maintained — and VALIDATES the header matches that memory order, with loud diagnostics on drift (field not found, wrong field order + the expected order, type/layout size mismatch). On pass the sx struct's NATURAL layout already equals the Zig layout, so it is an ordinary struct — no reorder, no padding tricks, no index/remap tables, no special LLVM path — and @ptrCasting it to the compiler's own type and dereferencing is byte-identical. When types.zig shifts, the header stops matching and the developer gets a specific message to fix it.

This is what C-ABI extern can't do: it copies Zig's REAL layout, so Zig slices ({ptr,len}), field reordering, and union(enum) tag placement all "just work" — no slice→ptr+len surgery on types.zig, no version fragility.

Host-call bridge (functions)

compiler functions dispatch, under the comptime interp, to the registered internal Zig function — the generalization of the path that already exists (host_ffi.zig resolves comptime extern "c" via dlsym; compiler_hooks.zig registers #compiler method hooks). The compiler lib's registry maps each exported sx name → its Zig function + welded signature.

The exposed surface (curated)

Types (welded): StringId (u32 handle), Type (≡ TypeId, u32), Field, StructInfo, EnumInfo, TaggedUnionInfo, TupleInfo, and a kind-tagged TypeInfo view (see Risks — the union(enum) is the one harder shape).

Functions (comptime-only): intern(string)->StringId, text_of(StringId)->string, find_type(StringId)->?Type, guarded mutators register_struct/register_enum/register_tuple(info)->Type, and the reflection readers (type_of, field/variant iteration) over the welded records.

declare/define/type_info collapse into thin sx over register_*/find_type — or disappear. The bespoke interp arms (.declare/.define/.type_info, defineEnum/defineStruct/defineTuple/reflectTypeInfo) are deleted.

What it buys (and the one honest limit)

Dissolves: the bespoke declare/define surface, the projected TypeInfo model, the hand-marshaling, the #compiler duplication, and the 0141 class of bugs — registration becomes a direct, guarded API call, not "evaluate an sx stdlib body (List/append) at scanDecls," so there's no body to mis-lower at a half-built stage.

Does NOT repeal: the ordering law — a type's layout must exist before code that uses it is lowered. That's inherent to the compiler, not machinery. The win is that it stops leaking as "weird exposed stages" and becomes an encapsulated contract inside the compiler API (the API decides how a registration slots in), instead of the user threading declare→forward-slot→define→eval-timing by hand.

Safety boundary

  • Only the compiler export list is reachable — no raw *TypeTable.
  • Mutators are guarded (register_* validate: dup field/variant names, kind changes, well-formedness) — the same checks define does today, now at the API.
  • Comptime-only enforcement on functions; runtime use is a clean error.
  • Mirrors Zig's own discipline: comptime builds types through sanctioned doors (@Type), it doesn't let user code scribble on the compiler's tables.

BuildOptions migration

BuildOptions :: struct #compiler { ... } + build_options() #compilerabi(.zig) extern compiler: the setter/getter hook-methods become abi(.zig) extern compiler functions (or methods on a welded/handle BuildOptions), backed by the same BuildConfig state. The compiler_hooks.zig registry becomes the compiler lib's function/type registry. Net: the build DSL and the metatype API ride one mechanism.

#compiler removal

After both consumers are migrated, delete the #compiler attribute and its special paths: lexer/parser token + sema handling (src/lexer.zig, src/parser.zig, src/sema.zig, src/token.zig, src/ast.zig), and the #compiler-specific registration in compiler_hooks.zig (the registry stays, re-homed under compiler). sx footprint is tiny (2 lines in library/modules/build.sx).

Code anchors (confirmed 2026-06-17)

Foundation that ALREADY exists:

  • #library "name" lexes (hash_library, src/lexer.zig:91) and parses into a library_decl { lib_name, name } AST node (src/parser.zig:210). So compiler :: #library "compiler"; works today (used for FFI libs like raylib).
  • extern / export are keywords (src/token.zig:46, kw_extern/kw_export).

New work for Phase 1:

  • Lexer/parser: the abi(.zig) annotation (a new abi keyword replacing callconv; ABI = { default, c, zig, pure }) in the slot before extern, followed by the <lib> handle — … abi(.zig) extern <lib> postfix on FN decls (after the return type, before extern) and STRUCT decls (beside struct #compiler). DONE (parse-only)parseOptionalAbi (src/parser.zig) wired on fn decls AND struct decls, ast.ABI, parser unit tests; the callconvabi rename migrated 52 sx files + the compiler's CC-mismatch diagnostic.
  • AST: the abi: ABI field lives on FnDecl / Lambda / FunctionTypeExpr (carries .zig for a welded fn); StructDecl gained abi: ABI + extern_lib: ?[]const u8. DONE.
  • Binding registry: re-home / generalize src/ir/compiler_hooks.zig (today's #compiler registry) into the compiler lib's type+function registry, keyed by exported sx name → Zig type (@offsetOf layout) / Zig fn (host-call).
  • Layout + emit: sx struct layout (src/ir/types.zig / lowering) honors the bound type's offsets; LLVM emission (src/backend/llvm/types.zig) hits them.
  • Host-call bridge: extend the comptime path (src/ir/host_ffi.zig + interp.zig) to dispatch compiler functions to their registered Zig fns, comptime-only.

Build order (each phase keeps zig build test green)

  1. abi(.zig) extern <lib> + #library foundation — parse the postfix annotation (the #library decl already exists); a binding registry (sx name → Zig type/fn); the layout engine honoring the bound type's @offsetOf offsets + LLVM emission that hits them; build-time layout-equality assertion. Prove with Field (two u32s). First testable sub-step DONE: abi(.zig) extern <lib> PARSES on a fn decl (parser unit test), AST carries the binding (abi == .zig, extern_lib) — no semantics yet.
  2. Weld StructInfo + StringId accessors (intern/text_of) over the host-call bridge.
  3. Re-express type_info/define (struct) as sx over register_struct/ find_type; migrate examples/0622; delete the struct interp arms; suite green.
  4. Widen to enum/tuple — weld EnumInfo/TaggedUnionInfo/TupleInfo (optional fields → sentinels: backing_type .unresolved, explicit_values len-0); migrate examples/0619/0623; delete the enum/tuple interp arms.
  5. Migrate BuildOptions to abi(.zig) extern compiler.
  6. Delete #compiler; suite green.

Risks / open questions

  • union(enum) welding. TypeInfo is a Zig tagged union; mirroring its tag placement is the one shape harder than plain structs. Start with a kind-tagged view (weld the payload structs, drive the discriminant via a kind accessor), defer full-union welding. type_info/define mostly traffic in the payload records anyway.
  • Optional fields in welded records (?[]const i64, ?TypeId) — represent via sentinels on the sx side, or expose through accessor functions rather than raw fields.
  • LLVM layout emission for arbitrary external offsets (padding / byte-offset GEPs) is the meatiest part of phase 1.
  • Mutation safety — the guarded-mutator surface must cover every invariant the type table relies on (interning, nominal ids, forward slots).
  • @offsetOf binding for nested/parameterized types — the registry must map each exported sx type to a concrete Zig type; generic Zig types need a concrete instantiation to bind.