0124 — 64K+ stack arrays emit whole-aggregate load/store ops that segfault LLVM
Symptom
Declaring a large (~64KB+) stack array in a function reachable from
main crashes the compiler during native emission — a segfault inside
libLLVM, not a diagnostic.
- Observed: Segmentation fault at address 0x16b... (a stack address) under sx build, inside DAGCombiner::visitMERGE_VALUES → SelectionDAG::ReplaceAllUsesWith (via LLVMTargetMachineEmitToFile, src/ir/emit_llvm.zig:2894).
- Expected: the program compiles; the array lives in the frame and is accessed in place.
The crash threshold is DAG-shape dependent, not a clean size boundary
([65535]u8 and [65537]u8 compile, [65536]u8, [66000]u8,
[131072]u8 crash), because the real problem is the SelectionDAG
node count: lowering materializes the array as a FIRST-CLASS LLVM
value, and the legalizer scalarizes each whole-aggregate op into one
node per element. Two emission shapes produce such ops:
- buf : [N]u8 = ---; stores a whole-array undef constant (store [N x i8] undef, ptr %alloca): a store of nothing, for an explicitly uninitialized local.
- buf[i] reads on a local array lower as index_get on the array VALUE: load the entire array as an SSA value, spill it to an ig.tmp alloca, GEP one element (the general-expression sibling of resolved issue 0110, which fixed only lowerFor's element fetch). Besides the crash, this copies N bytes to read 1.
Each shape crashes llc in isolation on the dumped IR; with both replaced by in-place access the module compiles.
Reproduction
#import "modules/std.sx";
f :: (fd: s32) {
    buf : [65536]u8 = ---;
    if buf[0] > 0 { out("x\n"); }
}
main :: () -> s32 {
    f(1);
    return 0;
}
Observed at master 7f2b8b5: sx build segfaults in libLLVM with the
stack trace above. sx ir shows the two whole-aggregate ops:
%alloca1 = alloca [65536 x i8], align 1
store [65536 x i8] undef, ptr %alloca1, align 1
%load = load [65536 x i8], ptr %alloca1, align 1
%ig.tmp = alloca [65536 x i8], align 1
store [65536 x i8] %load, ptr %ig.tmp, align 1
%ig.ptr = getelementptr [65536 x i8], ptr %ig.tmp, i64 0, i64 0
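When reviewing IR dumps like the one above, a small scanner can make the presence or absence of whole-aggregate ops mechanical. The Python sketch below is not part of the sx toolchain: the function name find_aggregate_ops, the regex, and the IN_PLACE_IR sample (a guess at what in-place access might look like, with made-up value names) are all illustrative assumptions. It flags load and store instructions whose operand type is a whole array, and ignores alloca and getelementptr lines that merely name the array type.

```python
import re

# Matches a load or store whose operand type is an array, for example
#   %load = load [65536 x i8], ptr %alloca1, align 1
#   store [65536 x i8] undef, ptr %alloca1, align 1
# Lines that only mention the array type (alloca, getelementptr) do not match.
AGGREGATE_OP = re.compile(r"\b(load|store)\s+\[(\d+)\s+x\s+[^\]]+\]")


def find_aggregate_ops(ir_text, min_elems=1):
    """Return (line_number, opcode, element_count) for each whole-array load/store."""
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        match = AGGREGATE_OP.search(line)
        if match and int(match.group(2)) >= min_elems:
            hits.append((lineno, match.group(1), int(match.group(2))))
    return hits


# The IR from the report, copied verbatim.
CRASHING_IR = """\
%alloca1 = alloca [65536 x i8], align 1
store [65536 x i8] undef, ptr %alloca1, align 1
%load = load [65536 x i8], ptr %alloca1, align 1
%ig.tmp = alloca [65536 x i8], align 1
store [65536 x i8] %load, ptr %ig.tmp, align 1
%ig.ptr = getelementptr [65536 x i8], ptr %ig.tmp, i64 0, i64 0
"""

# ASSUMPTION: a hypothetical in-place shape (no undef store, GEP on the
# original alloca, single-element load). Value names are invented.
IN_PLACE_IR = """\
%alloca1 = alloca [65536 x i8], align 1
%elem.ptr = getelementptr [65536 x i8], ptr %alloca1, i64 0, i64 0
%elem = load i8, ptr %elem.ptr, align 1
"""

if __name__ == "__main__":
    for name, text in (("crashing", CRASHING_IR), ("in-place", IN_PLACE_IR)):
        print(name, find_aggregate_ops(text))
```

On the dump from the report this flags three instructions: the undef store from the first shape, and the load plus spill store from the index_get shape. On the hypothetical in-place sample it flags none.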
Investigation prompt
Two lowering sites produce the whole-aggregate ops; fix both:
- src/ir/lower/stmt.zig lowerVarDecl (annotated branch): a .undef_literal initializer falls through to lowerExpr(val) → constUndef(array type) → store. --- means explicitly uninitialized, so emit NO store at all (keep the existing tuple zero-init carve-out above it).
- src/ir/lower/expr.zig lowerIndexExpr: when the indexed object is an array with addressable storage (getExprAlloca hit, same guard as 0110's lowerFor fix), emit index_gep on the storage + a single-element load instead of index_get on the loaded array value. Storage-less arrays (rvalues) keep the index_get fallback. The object must NOT be lowered as a value on the storage path or the dead whole-array load still reaches the DAG.
Verification: the repro builds and runs (it prints nothing or x
depending on stack garbage, so gate on exit 0 of the build, not on the
read); the [65535]/[65537]/[131072] variants all build. Pin a
regression example that builds AND deterministically runs (write
before read). zig build && zig build test and
bash tests/run_examples.sh must be green; expect .ir snapshot churn from
the removed undef stores and the new gep+load shape, so re-pin and review.
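To exercise the size variants without hand-editing the repro, the sources can be generated from a template. This is a minimal sketch under stated assumptions: the helper name repro_sources and the repro_N.sx file names are hypothetical, the template is the reproduction from this report with only the array length parameterized, and the size list combines the sizes mentioned above.

```python
# Template copied from the reproduction in this report; only the array
# length is parameterized. Braces are doubled for str.format.
REPRO_TEMPLATE = """#import "modules/std.sx";
f :: (fd: s32) {{
    buf : [{n}]u8 = ---;
    if buf[0] > 0 {{ out("x\\n"); }}
}}
main :: () -> s32 {{
    f(1);
    return 0;
}}
"""

# Sizes named in the report: two that compiled before the fix and
# three that crashed.
SIZES = (65535, 65536, 65537, 66000, 131072)


def repro_sources(sizes=SIZES):
    """Map a hypothetical file name to the repro source for each array size."""
    return {f"repro_{n}.sx": REPRO_TEMPLATE.format(n=n) for n in sizes}


if __name__ == "__main__":
    for name, source in repro_sources().items():
        print(name, len(source), "bytes")
```

Each generated source can then be written to disk and handed to sx build, gating on the build's exit status as described above. The exact command-line form is not shown in this report, so it is left out of the sketch.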