diff --git a/issues/0124-large-stack-array-aggregate-ops-crash-llvm.md b/issues/0124-large-stack-array-aggregate-ops-crash-llvm.md
new file mode 100644
index 0000000..e8aecaa
--- /dev/null
+++ b/issues/0124-large-stack-array-aggregate-ops-crash-llvm.md
@@ -0,0 +1,86 @@
+# 0124 — 64K+ stack arrays emit whole-aggregate load/store ops that segfault LLVM
+
+## Symptom
+
+Declaring a large (~64KB+) stack array in a function reachable from
+`main` crashes the compiler during native emission — a segfault inside
+libLLVM, not a diagnostic.
+
+- **Observed**: `Segmentation fault at address 0x16b...` (a stack
+  address) under `sx build`, inside
+  `DAGCombiner::visitMERGE_VALUES` → `SelectionDAG::ReplaceAllUsesWith`
+  (via `LLVMTargetMachineEmitToFile`, src/ir/emit_llvm.zig:2894).
+- **Expected**: the program compiles; the array lives in the frame and
+  is accessed in place.
+
+The crash threshold is DAG-shape dependent, not a clean size boundary
+(`[65535]u8` and `[65537]u8` compile, `[65536]u8`, `[66000]u8`,
+`[131072]u8` crash), because the real problem is the SelectionDAG
+node count: lowering materializes the array as a FIRST-CLASS LLVM
+value, and the legalizer scalarizes each whole-aggregate op into one
+node per element. Two emission shapes produce such ops:
+
+1. `buf : [N]u8 = ---;` stores a whole-array undef constant
+   (`store [N x i8] undef, ptr %alloca`) — a store of nothing, for an
+   explicitly-uninitialized local.
+2. `buf[i]` reads on a local array lower as `index_get` on the array
+   VALUE: load the entire array as an SSA value, spill it to an
+   `ig.tmp` alloca, GEP one element (the general-expression sibling of
+   resolved issue 0110, which fixed only `lowerFor`'s element fetch).
+   Besides the crash, this copies N bytes to read 1.
+
+Each shape crashes llc in isolation on the dumped IR; with both
+replaced by in-place access the module compiles.
+
+## Reproduction
+
+```sx
+#import "modules/std.sx";
+
+f :: (fd: s32) {
+    buf : [65536]u8 = ---;
+    if buf[0] > 0 { out("x\n"); }
+}
+
+main :: () -> s32 {
+    f(1);
+    return 0;
+}
+```
+
+Observed at master 7f2b8b5: `sx build` segfaults in libLLVM with the
+stack trace above. `sx ir` shows the two whole-aggregate ops:
+
+```llvm
+%alloca1 = alloca [65536 x i8], align 1
+store [65536 x i8] undef, ptr %alloca1, align 1
+%load = load [65536 x i8], ptr %alloca1, align 1
+%ig.tmp = alloca [65536 x i8], align 1
+store [65536 x i8] %load, ptr %ig.tmp, align 1
+%ig.ptr = getelementptr [65536 x i8], ptr %ig.tmp, i64 0, i64 0
+```
+
+## Investigation prompt
+
+Two lowering sites produce the whole-aggregate ops; fix both:
+
+1. `src/ir/lower/stmt.zig` `lowerVarDecl` (annotated branch): a
+   `.undef_literal` initializer falls through to
+   `lowerExpr(val)` → `constUndef(array type)` → `store`. `---` means
+   explicitly uninitialized — emit NO store at all (keep the existing
+   tuple zero-init carve-out above it).
+2. `src/ir/lower/expr.zig` `lowerIndexExpr`: when the indexed object
+   is an array with addressable storage (`getExprAlloca` hit, same
+   guard as 0110's `lowerFor` fix), emit `index_gep` on the storage +
+   a single-element `load` instead of `index_get` on the loaded array
+   value. Storage-less arrays (rvalues) keep the `index_get` fallback.
+   The object must NOT be lowered as a value on the storage path or
+   the dead whole-array `load` still reaches the DAG.
+
+Verification: the repro builds and runs (prints nothing or `x`
+depending on stack garbage — gate on exit 0 of the build, not the
+read); `[65535]`/`[65537]`/`[131072]` variants all build. Pin a
+regression example that builds AND deterministically runs (write
+before read). `zig build && zig build test`,
+`bash tests/run_examples.sh` green; expect `.ir` snapshot churn from
+removed undef stores and the new gep+load shape — re-pin and review.
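
Reviewer note (outside the hunk): the verification step expects the whole-aggregate ops to disappear from the re-pinned `.ir` snapshots. Below is a minimal sketch of a text-level check for that. It assumes the `sx ir` output is available as plain text and that array-typed `load`/`store` operands are spelled as in the repro. The helper name `find_aggregate_ops` and the `min_elems` threshold are hypothetical and not part of the sx toolchain.

```python
import re

# Matches a load or store whose operand type is a first-class array, e.g.
#   store [65536 x i8] undef, ptr %alloca1
#   %load = load [65536 x i8], ptr %alloca1
# Assumption: no qualifiers (volatile/atomic) sit between the opcode and the type.
AGGREGATE_OP = re.compile(r"\b(load|store)\s+\[(\d+)\s+x\s")

# The IR quoted in the issue, used here as sample input.
REPRO_IR = """\
%alloca1 = alloca [65536 x i8], align 1
store [65536 x i8] undef, ptr %alloca1, align 1
%load = load [65536 x i8], ptr %alloca1, align 1
%ig.tmp = alloca [65536 x i8], align 1
store [65536 x i8] %load, ptr %ig.tmp, align 1
%ig.ptr = getelementptr [65536 x i8], ptr %ig.tmp, i64 0, i64 0
"""


def find_aggregate_ops(ir_text, min_elems=1024):
    """Return (line_number, opcode, element_count) for each whole-array load/store.

    min_elems is an arbitrary illustrative cutoff, not a known crash threshold.
    """
    hits = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        match = AGGREGATE_OP.search(line)
        if match and int(match.group(2)) >= min_elems:
            hits.append((lineno, match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    for lineno, opcode, count in find_aggregate_ops(REPRO_IR):
        print(f"line {lineno}: whole-array {opcode} of {count} elements")
```

On the repro IR this flags the undef `store`, the whole-array `load`, and the spill `store` into `%ig.tmp`. The `alloca` and `getelementptr` lines are not flagged, since they name the array type without moving the array as a value. A single-element `load i8` of the kind the fix is meant to produce is not flagged either.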