sx/issues/0125-any-to-string-array-arms-by-value.md
agra 48eb7bf48a P5.6 (macOS): default_pipeline drives bundling; fix issue 0125 (array-format blowup)
build.sx now `#import`s the sx bundler and `default_pipeline` delegates to its
`bundle_main` when a bundle was requested (emit + link, then wrap the binary into
the `.app`/`.apk`); otherwise it just emit+links via the shared `emit_and_link`
core. The Zig `--bundle`/`post_link_module` dispatch shim is removed — the CLI
bundle flags only feed `BuildConfig`, and `default_pipeline` branches on
`bundle_path()`. Validated end-to-end on macOS: `sx build --bundle App.app
--bundle-id … foo.sx` on a plain program AND auto-bundle from `set_bundle_path`
both produce a valid signed `.app` (correct `Contents/MacOS/` layout, Info.plist,
passes `codesign`, binary runs). Also fixed a pre-existing host-build bug:
target_triple was left empty for host builds → `is_macos()` false → wrong flat
layout; main.zig now exposes the host triple when `--target` is absent.

bundle_main no longer re-calls `build_options()` (the handle is already its `opts`
param).

Fix issue 0125 (root cause): the type-match dispatcher unboxed each interned array
tag to the concrete array type — a whole-array load — and passed it to
`array_to_string` by value, which LLVM scalarized into one SelectionDAG node per
element (~12s / segfault at [65536]u8). The bundler's `format("…{}…")` instantiates
`any_to_string`, so importing it into the prelude surfaced 0125 for any large-array
program. Fix (route 1): `any_to_string`'s `case array:` arm calls `slice_to_string`,
and `lowerRuntimeDispatchCall` detects an ARRAY tag bound to a SLICE param and builds
a `{ptr,len}` slice VIEW of the payload pointer (`unbox_any → [*]elem` is an
int-to-ptr with NO load, paired with the array length) instead of loading the array.
Output is byte-identical (`[a, b, c]`). Pinned as
examples/0056-basic-large-array-format-no-blowup.sx; 0055 drops 12s → 0.2s.

37 `.ir` snapshots regenerated (build.sx now pulls in the bundler's types + the
array-format lowering changed); verified `.ir`-only, zero behavior-stream diffs.
705/0 both gates.
2026-06-19 15:32:07 +03:00


0125 — any_to_string's array arms materialize every interned array type by value

RESOLVED (2026-06-19). Root cause as described below: the type-match dispatcher (lowerRuntimeDispatchCall, src/ir/lower/call.zig) unboxed each interned array tag to the concrete array type — a whole-array load — and fed it to array_to_string by value, which LLVM scalarized to one DAG node per element. Fix (route 1): any_to_string's case array: arm now calls slice_to_string (library/modules/std/fmt.sx); the dispatcher detects an ARRAY tag bound to a SLICE param and builds a {ptr,len} slice VIEW of the payload pointer (unbox_any → [*]elem is an int-to-ptr with NO load, paired with the array length) instead of loading the array. Output is byte-identical ([a, b, c]). The repro compiles fast and prints correctly; pinned as examples/0056-basic-large-array-format-no-blowup.sx.

Symptom

A program that (a) interns any large (~64KB+) array type and (b) uses {} formatting anywhere crashes sx build inside libLLVM (DAGCombiner::visitMERGE_VALUES → SelectionDAG::ReplaceAllUsesWith) and makes sx run (-O0) take ~18s to compile a trivial file. A print("{}\n", 5) of a plain int is enough to trigger it. The two triggers are independent: the array need never be printed, sliced, or passed anywhere near the format call.

  • Observed: segfault under sx build; multi-second compiles under sx run.
  • Expected: formatting an int is unaffected by an unrelated large array type; printing the array itself formats in place.

Root cause shape: any_to_string's comptime type-switch (library/modules/std/fmt.sx, case array: arm) expands one arm per interned array type, and each arm is array_to_string(cast(type) val):

  1. the cast(type) val unbox loads the WHOLE array from the Any payload pointer (coerceFromI64, src/ir/emit_llvm.zig ~2240, ua.load),
  2. the call passes the array BY VALUE to the array_to_string mono,
  3. the mono spills its by-value param to an alloca and (since the param is an SSA value, not addressable storage) reads elements via index_get on the value — copy-whole-array per element.

LLVM's legalizer scalarizes each whole-aggregate op into one SelectionDAG node per element; at ~64K elements the DAG combiner recurses until it crashes. This is the sibling of issue 0124, which fixed the local-variable shapes (the --- undef store and index reads on addressable storage).
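The difference between the two parameter shapes can be sketched in C. This is only an analogy under stated assumptions, not the sx compiler's code: C has no first-class array values, so a struct wrapper (the hypothetical ByValueArray) stands in for the by-value array param, ByteSlice stands in for a {ptr,len} slice, and N is a tiny stand-in for 65536. All names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define N 4 /* tiny stand-in for the 65536 elements in the repro */

/* C arrays are not first-class values, so a struct wrapper imitates
   passing a whole array by value (hypothetical type). */
typedef struct { unsigned char items[N]; } ByValueArray;

/* Hypothetical {ptr,len} view: two words, however many elements
   it describes. */
typedef struct { const unsigned char *ptr; size_t len; } ByteSlice;

/* Writes "[a, b, c]" into out and returns the length written. */
static size_t format_elems(const unsigned char *p, size_t len,
                           char *out, size_t cap)
{
    size_t used = 0;
    if (cap == 0) return 0;
    used += (size_t)snprintf(out, cap, "[");
    for (size_t i = 0; i < len && used < cap; i++) {
        const char *sep = (i == 0) ? "" : ", ";
        used += (size_t)snprintf(out + used, cap - used, "%s%u",
                                 sep, (unsigned)p[i]);
    }
    if (used < cap)
        used += (size_t)snprintf(out + used, cap - used, "]");
    return used < cap ? used : cap - 1;
}

/* Shape of the problem: the callee receives a full copy of the array. */
static size_t array_to_string_by_value(ByValueArray a, char *out, size_t cap)
{
    return format_elems(a.items, N, out, cap);
}

/* Shape of the fix: the callee receives only a pointer and a length. */
static size_t slice_to_string(ByteSlice s, char *out, size_t cap)
{
    return format_elems(s.ptr, s.len, out, cap);
}

/* Self-check: both paths render the same text. */
static void check_outputs_match(void)
{
    ByValueArray a = { { 1, 2, 3, 4 } };
    ByteSlice view = { a.items, N };
    char by_value[64], by_view[64];

    array_to_string_by_value(a, by_value, sizeof by_value);
    slice_to_string(view, by_view, sizeof by_view);
    assert(strcmp(by_value, by_view) == 0);
    assert(strcmp(by_view, "[1, 2, 3, 4]") == 0);
}
```

Calling check_outputs_match() from a main function exercises both paths. The by-value call has to copy N bytes for every call, while the view stays the same size at any N, which is the property the fix relies on.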

Reproduction

#import "modules/std.sx";

f :: () {
    buf : [65536]u8 = ---;
    buf[0] = 1;
    out(string.{ ptr = @buf[0], len = 1 });
}

main :: () -> i32 {
    f();
    print("{}\n", 5);
    return 0;
}

Observed (with 0124's fix in place): sx build segfaults in libLLVM; sx ir shows the giant arm inside @any_to_string:

%ua.load = load [65536 x i8], ptr %ua.ptr, align 1
%call = call { ptr, i64 } @array_to_string__AR_65536_u8(ptr %0, [65536 x i8] %ua.load)

Investigation prompt

The fix needs the array formatting chain to never materialize the array as a first-class value. The Any payload for an array IS a pointer to its storage (coerceFromI64 converts the payload with intToPtr and then loads through it), so the arm has everything it needs to format in place. Plausible routes, most contained first:

  1. Lower the case array: arm to a slice view: box the payload pointer + the array's element count as a []elem and call slice_to_string (slices unbox as a 16-byte {ptr,len} — no giant ops). Needs the element type at arm-expansion time — the comptime type-switch already has the concrete array TypeId in hand; an element_type(T)-style comptime accessor may need to be added for the sx-level spelling, or the arm can be synthesized in the compiler where both pieces are known.
  2. Teach array_to_string :: (a: $T) monos (and the unbox cast) an indirect ABI for array-typed params — bigger blast radius: touches call emission, param spills, and many .ir snapshots.
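Route 1 can be illustrated with a small C model. It is a sketch under assumptions, not the dispatcher's actual lowering: AnyBox, U8Slice, ArrayTypeInfo, ARRAY_TYPES, box_array, and array_view_from_any are all hypothetical names, and the tag values are invented. The point it shows is that the payload integer is converted to a pointer and paired with the element count for that tag, with no load of the array itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a boxed Any: a type tag plus an integer
   payload that, for arrays, holds the address of the storage. */
typedef struct { uint32_t tag; uintptr_t payload; } AnyBox;

/* Hypothetical {ptr,len} slice of u8. */
typedef struct { const uint8_t *ptr; size_t len; } U8Slice;

/* Hypothetical interned-type table: element count per array tag. */
typedef struct { uint32_t tag; size_t elem_count; } ArrayTypeInfo;

static const ArrayTypeInfo ARRAY_TYPES[] = {
    { 1, 3 },     /* stands for [3]u8 */
    { 2, 65536 }, /* stands for [65536]u8 */
};

/* Boxes the address of existing storage; nothing is copied. */
static AnyBox box_array(uint32_t tag, const void *storage)
{
    AnyBox b = { tag, (uintptr_t)storage };
    return b;
}

/* Returns 1 and fills *out when the tag names a known array type.
   The payload is only converted to a pointer; the array is never read. */
static int array_view_from_any(AnyBox any, U8Slice *out)
{
    size_t count = sizeof ARRAY_TYPES / sizeof ARRAY_TYPES[0];
    for (size_t i = 0; i < count; i++) {
        if (ARRAY_TYPES[i].tag == any.tag) {
            out->ptr = (const uint8_t *)any.payload; /* int-to-ptr, no load */
            out->len = ARRAY_TYPES[i].elem_count;
            return 1;
        }
    }
    return 0;
}

static uint8_t big[65536]; /* plays the role of buf in the repro */

/* Self-check: the view aliases the original storage. */
static void check_view_aliases_storage(void)
{
    U8Slice view;
    big[0] = 1;
    assert(array_view_from_any(box_array(2, big), &view) == 1);
    assert(view.ptr == big);
    assert(view.len == 65536);
    assert(view.ptr[0] == 1);
}
```

A formatter that takes U8Slice then touches one element at a time through view.ptr, so the cost of building the argument does not depend on the array length.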

Suspected files: src/ir/lower/comptime.zig / lower/call.zig (the type-switch arm expansion and cast(type) lowering), src/ir/emit_llvm.zig coerceFromI64, library/modules/std/fmt.sx (any_to_string, array_to_string).

Verification: the repro builds and runs printing 5; printing the array itself (print("{}\n", buf) on a small array) still renders element lists (pinned by 0101/0904 et al.); zig build test and bash tests/run_examples.sh green; the repro pinned as an example.