sx/issues/0109-loop-body-alloca-stack-growth.md
agra d8076b9333 lang: rename signed integer types sN -> iN
Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.

Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).

Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.

zig build test: 426/426; examples suite: 595/595.
2026-06-12 09:31:53 +03:00


RESOLVED — 0109: allocas inside loop bodies accumulate stack per iteration

Root cause: emitAlloca (and ~18 sibling LLVMBuildAlloca temp sites in the LLVM backend) built allocas at the builder's current position. An alloca inside a loop body re-executes on every iteration, and LLVM reclaims allocas only at ret, so the frame grew with the trip count. Body locals, nested-loop index slots, and spill temps (ig.tmp etc.) could each exhaust the stack and segfault a long loop.

Fix: new LLVMEmitter.buildEntryAlloca (src/ir/emit_llvm.zig) builds every per-instruction alloca in the function's entry block, after any existing entry allocas, and then restores the builder's previous position. All LLVMBuildAlloca sites reachable during instruction emission in src/backend/llvm/ops.zig, src/backend/llvm/abi.zig and src/ir/emit_llvm.zig route through it. Initialization stores stay at the use site, so per-iteration re-init semantics are unchanged; entry-block slots are also mem2reg-promotable. ~35 .ir snapshots churned, all pure alloca position moves (each file was verified to allocate the same multiset of types before and after).
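
The per-file snapshot check can be partly mechanized. Below is a minimal Python sketch of it. It assumes the .ir snapshots contain LLVM-style `%x = alloca T, align N` lines. `alloca_types` and `is_pure_alloca_move` are hypothetical helpers, not part of the repo. The sketch compares the multiset of allocated types before and after, ignoring which block each alloca sits in.

```python
import re
from collections import Counter

# Captures the allocated type in lines like "%alloca2 = alloca [128 x i64], align 8":
# everything after "alloca" up to the first comma/semicolon (or end of line).
ALLOCA_TYPE_RE = re.compile(r"=\s*alloca\s+([^,;]+?)\s*(?:[,;]|$)")


def alloca_types(ir_text: str) -> Counter:
    """Multiset of allocated types in an IR snapshot, ignoring block placement."""
    types = Counter()
    for line in ir_text.splitlines():
        m = ALLOCA_TYPE_RE.search(line)
        if m:
            types[m.group(1)] += 1
    return types


def is_pure_alloca_move(old_ir: str, new_ir: str) -> bool:
    """True if both snapshots allocate the same types the same number of times."""
    return alloca_types(old_ir) == alloca_types(new_ir)


before = """entry:
  %sum = alloca i64, align 8
for.body.1:
  %alloca2 = alloca [128 x i64], align 8
"""
after = """entry:
  %sum = alloca i64, align 8
  %alloca2 = alloca [128 x i64], align 8
for.body.1:
"""
print(is_pure_alloca_move(before, after))  # same types, different blocks
```

The check says nothing about non-alloca lines. Those still need the manual `git diff` review.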

Regression test: examples/0047-basic-loop-local-stack-reuse.sx (1M-iteration body-local loop prints sum=499999500000; 3M-iteration nested loop prints n=3000000; both segfaulted pre-fix).


0109 — allocas inside loop bodies accumulate stack per iteration → segfault on long loops

Symptom. Any alloca that lands inside a loop's body block executes anew on every iteration, and LLVM stack allocas are only reclaimed at function return — so the frame grows monotonically with the trip count. Observed: a 1M-iteration loop with a body-local array segfaults (stack overflow, fault address at the guard page); so does a 3M-iteration nested loop with no user locals at all (the inner loop's hidden index slot is itself a body-block alloca of the outer loop). Expected: loop-local storage is reused across iterations; stack usage is static per frame regardless of trip count.

This hits three shapes, all confirmed:

  1. user locals declared in a loop body (buf : [128]i64 = ---;),
  2. nested loops (inner for's idx_slot alloca sits in the outer body),
  3. compiler temporaries spilled in the body (e.g. index_get's ig.tmp — see issue 0110 for the for-over-array case specifically).

Reproduction

Repro A — body local (issues/0109-loop-body-alloca-stack-growth.sx):

#import "modules/std.sx";

main :: () -> i32 {
    sum := 0;
    for 0..1000000: (i) {
        buf : [128]i64 = ---;
        buf[0] = i;
        sum += buf[0];
    }
    print("sum={}\n", sum);
    0
}
  • Observed: Segmentation fault at address 0x16e70ffd0 (guard page). With 0..1000 instead, it prints sum=499500 and exits 0: the program logic is correct, and only the stack accumulation kills it.
  • Expected: prints sum=499999500000, exit 0, at any trip count.

Repro B — pure nested loops, zero user locals:

#import "modules/std.sx";

main :: () -> i32 {
    n := 0;
    for 0..3000000: (i) {
        for 0..1: (j) { n += 1; }
    }
    print("n={}\n", n);
    0
}
  • Observed: segfault. Expected: n=3000000, exit 0.
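
A quick calculation shows why only the trip count matters. It rests on two assumptions, neither measured by the repros:

- The stack limit is 8 MiB, a common default.
- Repro B's inner index slot costs 8 bytes per outer iteration.

Real per-iteration costs may be larger because of stack alignment.

```python
# Assumed, not measured: an 8 MiB stack limit and an 8-byte inner index slot.
STACK_LIMIT = 8 * 1024 * 1024


def stack_growth(trip_count: int, bytes_per_iter: int) -> int:
    """Bytes pushed by body-block allocas before the function returns."""
    return trip_count * bytes_per_iter


A_PER_ITER = 2 * 128 * 8  # buf plus ig.tmp, each [128 x i64] = 1024 bytes
B_PER_ITER = 8            # inner loop's idx_slot, one per outer iteration

for label, trips, per_iter in [
    ("A 0..1000", 1_000, A_PER_ITER),
    ("A 0..1000000", 1_000_000, A_PER_ITER),
    ("B 0..3000000", 3_000_000, B_PER_ITER),
]:
    grown = stack_growth(trips, per_iter)
    print(f"{label}: {grown:,} bytes, exceeds limit: {grown > STACK_LIMIT}")

# What the repros should print once stack use no longer scales with trips.
print("sum =", sum(range(1_000_000)))     # Repro A
print("n =", 3_000_000 * len(range(1)))   # Repro B: one inner trip per outer trip
```

Under these assumptions, 0..1000 stays at about 2 MB, under the limit. This matches the observation that it runs to completion, while both failing repros overshoot.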

The emitted IR shows the cause directly (sx ir, body of repro A):

for.body.1:
  %alloca2 = alloca [128 x i64], align 8   ; fresh 1KB every iteration
  ...
  %ig.tmp = alloca [128 x i64], align 8    ; plus a 1KB spill temp

Root cause (suspected area)

Builder.alloca (src/ir/module.zig ~474) emits the .alloca instruction into the current block, and the LLVM emitter (src/backend/llvm/ops.zig emitAlloca ~327) calls LLVMBuildAlloca at the current insertion point, so loop-body allocas are executed per iteration. LLVM treats only entry-block allocas as static frame slots (and mem2reg/SROA promote only those). A non-entry alloca is dynamic: it re-executes and grows the stack on every pass, until ret.
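
To make the growth concrete, here is a toy Python model of a function whose loop body holds one 1 KiB alloca. It is not LLVM's real frame lowering. `peak_stack` and its parameters are hypothetical. The model encodes only the rule stated above: each executed alloca bumps the stack pointer, and nothing is released before ret.

```python
def peak_stack(loop_trips: int, hoisted: bool, slot_size: int = 1024) -> int:
    """Peak frame bytes for a toy function with one alloca in its loop body."""
    sp = 0    # bytes allocated in this frame so far
    peak = 0
    if hoisted:
        sp += slot_size          # entry-block alloca: runs once, on entry
        peak = sp
    for _ in range(loop_trips):
        if not hoisted:
            sp += slot_size      # body-block alloca: runs on every iteration
            peak = max(peak, sp)
        # nothing inside the loop releases the slot; only ret would
    return peak                  # ret: the whole frame is released here


for trips in (1, 10, 1000):
    print(trips, peak_stack(trips, hoisted=False), peak_stack(trips, hoisted=True))
```

With the alloca in the body, peak usage is linear in the trip count. With the alloca in the entry block, it is constant.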

The standard fix (what clang does) is to emit all static allocas into the function's entry block. The least-invasive locus is the emitter. In emitAlloca:

1. Save the current insertion point.
2. Position the builder at the entry block's first non-alloca instruction, or at the end of the entry block if it has none.
3. Build the alloca there.
4. Restore the saved position.
5. Call mapRef as before.

The IR shape and the interpreter are untouched. All sx allocas are statically sized (by TypeId), so every one is hoistable.
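
The hoisting step can be sketched on a toy IR in Python. A function is an ordered dict of blocks whose first entry is the entry block, and each block is a list of instruction strings. `Function`, `Builder` and `build_entry_alloca` are hypothetical stand-ins, not the sx or LLVM-C data model. Only the placement rule is modelled: skip the entry block's leading allocas and insert before the first non-alloca instruction (or at the end).

```python
from dataclasses import dataclass


@dataclass
class Function:
    blocks: dict  # block name -> list of instruction strings; first is entry

    def entry(self) -> list:
        return next(iter(self.blocks.values()))


@dataclass
class Builder:
    fn: Function
    block: str  # current insertion point: the end of this block

    def emit(self, inst: str) -> None:
        """Append at the current insertion point."""
        self.fn.blocks[self.block].append(inst)

    def build_entry_alloca(self, name: str, ty: str) -> str:
        """Hoist an alloca into the entry block, after its existing allocas.

        With LLVM-C this means saving the insert block, repositioning the
        builder before the entry block's first non-alloca instruction (or at
        its end), building the alloca, then restoring the saved position.
        This toy inserts into the list directly, so the builder never moves.
        """
        entry = self.fn.entry()
        i = 0
        while i < len(entry) and " = alloca " in entry[i]:
            i += 1  # skip the leading run of existing allocas
        entry.insert(i, f"%{name} = alloca {ty}")
        return f"%{name}"


fn = Function({
    "entry": ["%sum = alloca i64", "store i64 0, ptr %sum", "br label %for.body.1"],
    "for.body.1": [],
})
b = Builder(fn, "for.body.1")           # builder is emitting the loop body
buf = b.build_entry_alloca("alloca2", "[128 x i64]")
b.emit(f"store i64 %i, ptr {buf}")      # the use-site store stays in the body
print(fn.blocks)
```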

Investigation prompt (paste into a fresh session)

Fix issue 0109: loop-body allocas grow the stack per iteration and long loops segfault. In src/backend/llvm/ops.zig emitAlloca (~327), hoist the alloca to the current function's entry block: get the function via the current insert block's parent, position the builder before the entry block's first non-alloca instruction (LLVMGetEntryBasicBlock + LLVMGetFirstInstruction walk past LLVMAlloca opcodes — same positioning pattern as injectCtorIntoMain in src/ir/emit_llvm.zig ~466), build the alloca + mapRef, then restore the previous insertion point (LLVMGetInsertBlock before / LLVMPositionBuilderAtEnd after). Audit the other in-place LLVMBuildAlloca temporaries in src/ir/emit_llvm.zig (ba.tmp, abi.tmp, ig.tmp, etc. — grep BuildAlloca) and route the ones reachable inside loops through the same hoist helper.

Semantics note: per-iteration re-initialization must not regress. Initialization stores (e.g. store undef / = .{...} inits) stay where the decl was, in the body block; only the alloca itself moves to entry.
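
Store placement matters because a hoisted slot outlives any single iteration. Below is a toy Python model of a body-local `x := 0; x += i`, where a dict stands in for a stack slot and `run_loop` is hypothetical. Moving only the alloca preserves the per-iteration results. Moving the init store to entry as well would let the value leak across iterations.

```python
def run_loop(trips: int, hoist_alloca: bool, hoist_init: bool = False) -> list:
    """Toy loop body `x := 0; x += i`, recording x at the end of each iteration.

    hoist_alloca: one slot allocated before the loop (entry block) instead of
                  a fresh slot per iteration (body block).
    hoist_init:   also move the `x = 0` store to entry (the placement the
                  semantics note rules out); requires hoist_alloca.
    """
    assert hoist_alloca or not hoist_init
    observed = []
    slot = None
    if hoist_alloca:
        slot = {"x": None}        # single entry-block slot
        if hoist_init:
            slot["x"] = 0         # init runs once, on function entry
    for i in range(trips):
        if not hoist_alloca:
            slot = {"x": None}    # pre-fix: fresh alloca every iteration
        if not hoist_init:
            slot["x"] = 0         # init store stays at the decl site
        slot["x"] += i
        observed.append(slot["x"])
    return observed


print(run_loop(5, hoist_alloca=False))                  # pre-fix placement
print(run_loop(5, hoist_alloca=True))                   # the fix
print(run_loop(5, hoist_alloca=True, hoist_init=True))  # init wrongly hoisted
```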

Verify: both repros in issues/0109-loop-body-alloca-stack-growth.md (A is issues/0109-loop-body-alloca-stack-growth.sx) now print sum=499999500000 / n=3000000 and exit 0; sx ir on repro A shows no alloca inside for.body.*. Then zig build && zig build test && bash tests/run_examples.sh — any .ir snapshot churn from alloca placement must be reviewed (git diff examples/expected/) before --update. Promote a trip-count-bounded variant (e.g. 200k iterations, small buf) to examples/00xx-basic-loop-local-stack-reuse.sx as the pinned regression.
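
The "no alloca inside for.body.*" check can be scripted. Below is a minimal Python sketch. It assumes `sx ir` prints textual LLVM IR with `define` lines and `name:` block labels, as in the excerpt of repro A's body. `non_entry_allocas` is a hypothetical helper, not part of the repo, and it does not handle quoted labels.

```python
import re

LABEL_RE = re.compile(r"^([A-Za-z0-9_.$-]+):")  # e.g. "for.body.1:"
ALLOCA_RE = re.compile(r"^\s*(%[A-Za-z0-9_.$-]+)\s*=\s*alloca\b")


def non_entry_allocas(ir_text: str) -> list:
    """(block, value) for every alloca outside its function's entry block."""
    hits = []
    block, in_entry, seen_inst = None, False, False
    for line in ir_text.splitlines():
        stripped = line.strip()
        if line.startswith("define"):
            # New function: its first block is the entry, labelled or not.
            block, in_entry, seen_inst = "<entry>", True, False
            continue
        label = LABEL_RE.match(line)
        if label:
            # A label before any instruction merely names the entry block;
            # any later label starts a non-entry block.
            in_entry = in_entry and not seen_inst
            block = label.group(1)
            continue
        if not stripped or stripped.startswith(";") or stripped == "}":
            continue
        seen_inst = True
        m = ALLOCA_RE.match(line)
        if m and not in_entry:
            hits.append((block, m.group(1)))
    return hits


pre_fix = """define i32 @main() {
entry:
  %sum = alloca i64, align 8
  br label %for.body.1
for.body.1:
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
  br label %for.body.1
}"""
print(non_entry_allocas(pre_fix))
```

After the fix, running it over repro A's `sx ir` output should return an empty list.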