Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.
Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).
Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.
zig build test: 426/426; examples suite: 595/595.
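The name-classification change above can be sketched as a prefix dispatch. This is a hypothetical Python model, not the compiler's Zig code. The function name, the `(kind, width)` return shape, and the 1..64 width check are assumptions drawn from the s1..s64 / u1..u64 ranges. After the rename, the `s` arm only matches `string`, and signed widths parse in the `i` arm next to `isize`.

```python
from typing import Optional, Tuple

# Hypothetical model of surface type-name classification after the rename.
# Returns (kind, width); width is None for the pointer-sized types and string.
def classify_type_name(name: str) -> Optional[Tuple[str, Optional[int]]]:
    def parse_width(digits: str) -> Optional[int]:
        # ASCII decimal digits only; widths outside 1..64 are rejected.
        if not (digits.isascii() and digits.isdigit()):
            return None
        width = int(digits)
        return width if 1 <= width <= 64 else None

    if not name:
        return None
    prefix, rest = name[0], name[1:]
    if prefix == "s":
        # The s-prefix arm only keeps 'string'; s8, s32, ... are gone.
        return ("string", None) if name == "string" else None
    if prefix == "i":
        # Signed widths now parse here, next to isize.
        if name == "isize":
            return ("signed", None)
        width = parse_width(rest)
        return ("signed", width) if width is not None else None
    if prefix == "u":
        # Unsigned family is unchanged by the rename.
        if name == "usize":
            return ("unsigned", None)
        width = parse_width(rest)
        return ("unsigned", width) if width is not None else None
    return None
```

Under this model, `classify_type_name("s32")` returns `None`, while `"string"` still resolves through the `s` arm.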
# 0125 — any_to_string's array arms materialize every interned array type by value
## Symptom

A program that (a) interns any large (~64KB+) array type and (b) uses
`{}` formatting anywhere (`print("{}\n", 5)` of a plain int is enough)
crashes `sx build` inside libLLVM (`DAGCombiner::visitMERGE_VALUES` →
`SelectionDAG::ReplaceAllUsesWith`) and makes `sx run` (-O0) take ~18s
to compile a trivial file. The two conditions need not be related: the
array never has to be printed, sliced, or passed anywhere near the
format call.

- **Observed**: segfault under `sx build`; multi-second compiles under
  `sx run`.
- **Expected**: formatting an int is unaffected by an unrelated large
  array type, and printing the array itself formats it in place.

Root cause shape: `any_to_string`'s comptime type-switch
(library/modules/std/fmt.sx, `case array:` arm) expands one arm per
interned array type, and each arm is
`array_to_string(cast(type) val)`:

1. the `cast(type) val` unbox loads the WHOLE array from the Any
   payload pointer (`coerceFromI64`, src/ir/emit_llvm.zig ~2240,
   `ua.load`);
2. the call passes the array BY VALUE to the `array_to_string` mono;
3. the mono spills its by-value param to an alloca and, since the
   param is an SSA value rather than addressable storage, reads
   elements via `index_get` on the value, copying the whole array per
   element.

LLVM's legalizer scalarizes each whole-aggregate op into one
SelectionDAG node per element; at ~64K elements the DAG combiner
recurses deeply enough to crash. This is the sibling of issue 0124,
which fixed the local-variable shapes (the `---` undef store and index
reads on addressable storage).
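To see why this grows so steeply with array length, here is a back-of-the-envelope Python model of the three steps above. It assumes, purely for illustration, that every whole-array operation is charged one element-sized op per element; these are not measured SelectionDAG node counts, and the function names are hypothetical. The slice figure assumes a two-word {ptr,len} unbox followed by one load per element.

```python
# Illustrative cost model (an assumption, not measured SelectionDAG counts):
# every whole-array operation is charged one element-sized op per element.

def by_value_ops(n: int) -> int:
    """Ops for the current arm: unbox load, by-value call argument,
    alloca spill, then a whole-array copy for each of the n element reads."""
    unbox_load = n
    by_value_arg = n
    param_spill = n
    element_reads = n * n  # index_get on an SSA value copies the array per read
    return unbox_load + by_value_arg + param_spill + element_reads


def slice_view_ops(n: int) -> int:
    """Ops if the arm unboxes a two-word {ptr, len} view instead and then
    does one load per element read through the pointer."""
    return 2 + n


if __name__ == "__main__":
    for n in (16, 1024, 65536):
        print(f"n={n:>6}: by-value {by_value_ops(n):>13,}  slice {slice_view_ops(n):>7,}")
```

Under these assumptions, n = 65536 costs 4,295,163,904 ops on the by-value path against 65,538 through a slice. In this model the n² term from step 3 dominates.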
## Reproduction

```sx
#import "modules/std.sx";

f :: () {
    buf : [65536]u8 = ---;
    buf[0] = 1;
    out(string.{ ptr = @buf[0], len = 1 });
}

main :: () -> i32 {
    f();
    print("{}\n", 5);
    return 0;
}
```

Observed (with 0124's fix in place): `sx build` segfaults in libLLVM,
and `sx ir` shows the giant arm inside `@any_to_string`:

```llvm
%ua.load = load [65536 x i8], ptr %ua.ptr, align 1
%call = call { ptr, i64 } @array_to_string__AR_65536_u8(ptr %0, [65536 x i8] %ua.load)
```
## Investigation prompt

The fix needs the array formatting chain to never materialize the
array as a first-class value. The Any payload for an array IS a
pointer to its storage (`coerceFromI64` converts it with intToPtr and
then loads through it), so the arm has everything it needs to format
in place. Plausible routes, most contained first:

1. Lower the `case array:` arm to a slice view: box the payload
   pointer plus the array's element count as a `[]elem` and call
   `slice_to_string` (slices unbox as a 16-byte {ptr,len}, so there
   are no giant ops). This needs the element type at arm-expansion
   time. The comptime type-switch already has the concrete array
   TypeId in hand; an `element_type(T)`-style comptime accessor may
   need to be added for the sx-level spelling, or the arm can be
   synthesized in the compiler, where both pieces are known.
2. Teach `array_to_string :: (a: $T)` monos (and the unbox `cast`) an
   indirect ABI for array-typed params. This has a bigger blast
   radius: it touches call emission, param spills, and many `.ir`
   snapshots.

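Route 1 can be modelled in Python, with a `memoryview` standing in for the Any payload pointer: the arm never copies the array, it only pairs the storage with the element count and formats through that view. This is a hedged sketch, not sx and not the compiler. `ArrayType`, `AnyArray`, `array_arm`, and the `[a, b, ...]` output shape are all hypothetical; only the name `slice_to_string` comes from the route description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an interned array TypeId: [count]elem."""
    count: int
    elem: str


@dataclass
class AnyArray:
    """An Any holding an array: the payload refers to the array's storage."""
    type: ArrayType
    payload: memoryview


def slice_to_string(view: memoryview) -> str:
    # Reads one element at a time through the view; nothing is copied up front.
    return "[" + ", ".join(str(x) for x in view) + "]"


def array_arm(value: AnyArray) -> str:
    # Route 1: pair the payload "pointer" with the element count known at
    # arm-expansion time and format through the resulting slice view.
    view = value.payload[: value.type.count]
    return slice_to_string(view)


buf = bytearray(4)
buf[0] = 1
boxed = AnyArray(ArrayType(4, "u8"), memoryview(buf))
print(array_arm(boxed))  # [1, 0, 0, 0]
buf[3] = 9               # mutate the storage after boxing
print(array_arm(boxed))  # [1, 0, 0, 9]: the view reads the storage in place
```

Because only the view is handed to `slice_to_string`, the work done before the first element read does not depend on the array's length in this model, which is the property route 1 is after.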
Suspected files: src/ir/lower/comptime.zig / lower/call.zig (the
type-switch arm expansion and `cast(type)` lowering),
src/ir/emit_llvm.zig `coerceFromI64`, and
library/modules/std/fmt.sx (`any_to_string`, `array_to_string`).

Verification: the repro builds and runs, printing `5`; printing the
array itself (`print("{}\n", buf)` on a small array) still renders
element lists (pinned by 0101/0904 et al.); `zig build test` and
`bash tests/run_examples.sh` are green; and the repro is pinned as an
example.