sx/issues/0109-loop-body-alloca-stack-growth.md
agra d8076b9333 lang: rename signed integer types sN -> iN
Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.

Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).

Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.

zig build test: 426/426; examples suite: 595/595.
2026-06-12 09:31:53 +03:00


# RESOLVED — 0109: allocas inside loop bodies accumulate stack per iteration
**Root cause:** `emitAlloca` (and ~18 sibling `LLVMBuildAlloca` temp sites in the
LLVM backend) built allocas at the builder's current position. An alloca inside a
loop body re-executes per iteration and LLVM reclaims allocas only at `ret`, so
the frame grew with the trip count: body locals, nested-loop index slots, and
spill temps (`ig.tmp` etc.) could each exhaust the stack and segfault a long loop.

**Fix:** new `LLVMEmitter.buildEntryAlloca` (src/ir/emit_llvm.zig) builds every
per-instruction alloca in the function's entry block (after existing entry
allocas, builder position restored); all `LLVMBuildAlloca` sites reachable
during instruction emission in src/backend/llvm/ops.zig, src/backend/llvm/abi.zig
and src/ir/emit_llvm.zig route through it. Initialization stores stay at the
use site, so per-iteration re-init semantics are unchanged; entry-block slots
are also mem2reg-promotable. ~35 `.ir` snapshots churned (pure alloca position
moves, verified type-multiset-identical per file).

**Regression test:** `examples/0047-basic-loop-local-stack-reuse.sx` (1M-iteration
body-local loop prints `sum=499999500000`; 3M-iteration nested loop prints
`n=3000000`; both segfaulted pre-fix).

---

# 0109 — allocas inside loop bodies accumulate stack per iteration → segfault on long loops
**Symptom.** Any `alloca` that lands inside a loop's body block executes anew
on every iteration, and LLVM stack allocas are only reclaimed at function
return — so the frame grows monotonically with the trip count. Observed: a
1M-iteration loop with a body-local array segfaults (stack overflow, fault
address at the guard page); so does a 3M-iteration nested loop with **no user
locals at all** (the inner loop's hidden index slot is itself a body-block
alloca of the outer loop). Expected: loop-local storage is reused across
iterations; stack usage is static per frame regardless of trip count.

This hits three shapes, all confirmed:

1. user locals declared in a loop body (`buf : [128]i64 = ---;`),
2. nested loops (inner `for`'s `idx_slot` alloca sits in the outer body),
3. compiler temporaries spilled in the body (e.g. `index_get`'s `ig.tmp`;
   see issue 0110 for the for-over-array case specifically).

## Reproduction
Repro A — body local (`issues/0109-loop-body-alloca-stack-growth.sx`):
```sx
#import "modules/std.sx";
main :: () -> i32 {
    sum := 0;
    for 0..1000000: (i) {
        buf : [128]i64 = ---;
        buf[0] = i;
        sum += buf[0];
    }
    print("sum={}\n", sum);
    0
}
```
- **Observed**: `Segmentation fault at address 0x16e70ffd0` (guard page).
With `0..1000` instead it prints `sum=499500` and exits 0 — the program is
correct, only the stack accumulation kills it.
- **Expected**: prints `sum=499999500000`, exit 0, at any trip count.

Repro B, pure nested loops with zero user locals:
```sx
#import "modules/std.sx";
main :: () -> i32 {
    n := 0;
    for 0..3000000: (i) {
        for 0..1: (j) { n += 1; }
    }
    print("n={}\n", n);
    0
}
```
- **Observed**: segfault. **Expected**: `n=3000000`, exit 0.

The emitted IR shows the cause directly (`sx ir`, body of repro A):
```llvm
for.body.1:
  %alloca2 = alloca [128 x i64], align 8   ; fresh 1 KiB every iteration
  ...
  %ig.tmp = alloca [128 x i64], align 8    ; plus a 1 KiB spill temp
```
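
As a rough worked check of why repro A dies at 1M iterations but survives at 1000, the two `[128 x i64]` allocas in that IR cost at least 2 × 1024 = 2048 bytes per iteration. The sketch below assumes an 8 MiB stack limit, a common default for the main thread but not something this issue states, and it ignores the rest of the frame. `stack_growth_bytes` and `iterations_that_fit` are hypothetical helper names used only for this illustration.

```c
#include <stdint.h>

/* Size of one [128 x i64] slot: 128 elements * 8 bytes = 1024 bytes. */
enum { SLOT_BYTES = 128 * 8 };

/* Stack consumed when `bytes_per_iter` of allocas re-execute `trips` times. */
static uint64_t stack_growth_bytes(uint64_t bytes_per_iter, uint64_t trips) {
    return bytes_per_iter * trips;
}

/* Whole iterations that fit under `stack_limit` (assumed value, see lead-in). */
static uint64_t iterations_that_fit(uint64_t bytes_per_iter, uint64_t stack_limit) {
    return stack_limit / bytes_per_iter;
}
```

With `2 * SLOT_BYTES` per iteration, 1000 trips consume 2,048,000 bytes (about 2 MB, which fits), 1,000,000 trips would need 2,048,000,000 bytes (about 1.9 GiB), and under the assumed 8 MiB limit the budget runs out after roughly 4096 iterations.
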
## Root cause (suspected area)
`Builder.alloca` (`src/ir/module.zig` ~474) emits the `.alloca` instruction
into the current block, and the LLVM emitter (`src/backend/llvm/ops.zig`
`emitAlloca` ~327) builds `LLVMBuildAlloca` at the current insertion point —
so loop-body allocas are *executed* per iteration. LLVM only treats
entry-block allocas as static frame slots (and mem2reg/SROA only promote
those); a non-entry alloca re-executes and grows the stack each time, until
`ret`.
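
One way to spot the pattern mechanically is to scan `sx ir` output for `alloca` instructions that sit under a `for.body.*` label. The following is a minimal text-level sketch, not part of the compiler: it assumes one instruction per line, block labels of the form `for.body.N:` as in the IR excerpt from repro A, and a closing `}` at the end of each function. `count_loop_body_allocas` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* True if the `len`-byte span starting at `s` contains `needle`. */
static int span_contains(const char *s, size_t len, const char *needle) {
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(s + i, needle, n) == 0) return 1;
    }
    return 0;
}

/* Count `alloca` lines that appear under a block label starting with "for.body". */
static int count_loop_body_allocas(const char *ir) {
    int in_loop_body = 0;
    int count = 0;
    while (*ir) {
        const char *nl = strchr(ir, '\n');
        size_t len = nl ? (size_t)(nl - ir) : strlen(ir);

        /* Locate the first whitespace-delimited token of the line. */
        size_t start = 0;
        while (start < len && (ir[start] == ' ' || ir[start] == '\t')) start++;
        size_t stop = start;
        while (stop < len && ir[stop] != ' ' && ir[stop] != '\t') stop++;

        if (stop > start && ir[stop - 1] == ':') {
            /* Block label: remember whether it names a loop body. */
            in_loop_body = (stop - start >= 8) &&
                           memcmp(ir + start, "for.body", 8) == 0;
        } else if (stop > start && ir[start] == '}') {
            in_loop_body = 0;  /* end of function */
        } else if (in_loop_body && span_contains(ir, len, " = alloca ")) {
            count++;
        }

        ir += len;
        if (*ir == '\n') ir++;
    }
    return count;
}
```

A nonzero result means at least one alloca would re-execute per iteration; zero is the shape expected once every alloca lives in the entry block.
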
The standard fix (what clang does): emit **all** static allocas into the
function's entry block. Least-invasive locus is the emitter — in
`emitAlloca`, save the current insertion point, position the builder at the
entry block's first non-alloca instruction (or end of entry if empty), build
the alloca there, restore the position, `mapRef` as before. The IR shape and
the interpreter are untouched. All sx allocas are statically sized (TypeId),
so every one is hoistable.
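
The save, reposition, build, restore sequence can be sketched on a toy IR. This is a simplified model in C, not the Zig emitter or the LLVM-C API: `Block`, `Builder`, `first_non_alloca`, and `build_entry_alloca` are hypothetical names, and a block is just a fixed-size array of opcodes.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_INSTS = 64 };

typedef enum { OP_ALLOCA, OP_STORE, OP_BR } Opcode;

typedef struct {
    Opcode ops[MAX_INSTS];
    size_t len;
} Block;

typedef struct {
    Block *block;  /* block where the next instruction is appended */
} Builder;

/* Index of the first non-alloca instruction (== len if none, or if empty). */
static size_t first_non_alloca(const Block *entry) {
    size_t i = 0;
    while (i < entry->len && entry->ops[i] == OP_ALLOCA) i++;
    return i;
}

/* Append `op` at the builder's current position. Returns 0 on success. */
static int append(Builder *b, Opcode op) {
    if (b->block->len >= MAX_INSTS) return -1;
    b->block->ops[b->block->len++] = op;
    return 0;
}

/* Insert an alloca into `entry` after its existing leading allocas, then put
 * the builder back where it was, so the initializing store that follows
 * still lands at the use site. Returns the slot index, or (size_t)-1 if full. */
static size_t build_entry_alloca(Builder *b, Block *entry) {
    if (entry->len >= MAX_INSTS) return (size_t)-1;
    Block *saved = b->block;              /* save insertion point */
    b->block = entry;                     /* reposition at the entry block */
    size_t at = first_non_alloca(entry);
    memmove(&entry->ops[at + 1], &entry->ops[at],
            (entry->len - at) * sizeof entry->ops[0]);
    entry->ops[at] = OP_ALLOCA;
    entry->len++;
    b->block = saved;                     /* restore insertion point */
    return at;
}
```

In this model, calling `build_entry_alloca` while the builder sits in a loop body leaves the body untouched; a following `append(&b, OP_STORE)` still goes into the body, which mirrors the requirement that only the slot moves and the initialization stays per iteration.
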
## Investigation prompt (paste into a fresh session)
> Fix issue 0109: loop-body allocas grow the stack per iteration and long
> loops segfault. In `src/backend/llvm/ops.zig` `emitAlloca` (~327), hoist the
> alloca to the current function's entry block: get the function via the
> current insert block's parent, position the builder before the entry
> block's first non-alloca instruction (`LLVMGetEntryBasicBlock` +
> `LLVMGetFirstInstruction` walk past `LLVMAlloca` opcodes — same positioning
> pattern as `injectCtorIntoMain` in `src/ir/emit_llvm.zig` ~466), build the
> alloca + `mapRef`, then restore the previous insertion point
> (`LLVMGetInsertBlock` before / `LLVMPositionBuilderAtEnd` after). Audit the
> other in-place `LLVMBuildAlloca` temporaries in `src/ir/emit_llvm.zig`
> (`ba.tmp`, `abi.tmp`, `ig.tmp`, etc. — grep `BuildAlloca`) and route the
> ones reachable inside loops through the same hoist helper.
>
> Semantics note: per-iteration re-zeroing must not regress — initialization
> stores (e.g. `store undef` / `= .{...}` inits) stay where the decl was, in
> the body block; only the `alloca` itself moves to entry.
>
> Verify: both repros in `issues/0109-loop-body-alloca-stack-growth.md` (A is
> `issues/0109-loop-body-alloca-stack-growth.sx`) now print
> `sum=499999500000` / `n=3000000` and exit 0; `sx ir` on repro A shows no
> `alloca` inside `for.body.*`. Then `zig build && zig build test && bash
> tests/run_examples.sh` — any `.ir` snapshot churn from alloca placement must
> be reviewed (`git diff examples/expected/`) before `--update`. Promote a
> trip-count-bounded variant (e.g. 200k iterations, small buf) to
> `examples/00xx-basic-loop-local-stack-reuse.sx` as the pinned regression.
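
When reviewing `.ir` snapshot churn, one possible mechanical check is that a snapshot's allocas only moved: the multiset of alloca signatures should be identical before and after. The sketch below is an assumption about how such a check could look, not the procedure actually used. It compares the text after `= alloca ` (type plus alignment, so it is slightly stricter than a type-only comparison), and `collect_allocas` and `same_alloca_multiset` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

enum { MAX_ALLOCAS = 256, MAX_SIG = 128 };

typedef struct {
    char sigs[MAX_ALLOCAS][MAX_SIG];
    size_t len;
} AllocaSet;

static int cmp_sig(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

/* Collect the text after "= alloca " on every alloca line, then sort it.
 * Allocas beyond MAX_ALLOCAS are ignored and long signatures are truncated;
 * both limits are arbitrary choices for this sketch. */
static void collect_allocas(const char *ir, AllocaSet *out) {
    static const char key[] = "= alloca ";
    const char *p = ir;
    out->len = 0;
    while ((p = strstr(p, key)) != NULL && out->len < MAX_ALLOCAS) {
        p += sizeof key - 1;
        size_t n = strcspn(p, ";\n");  /* stop at a comment or end of line */
        while (n > 0 && (p[n - 1] == ' ' || p[n - 1] == '\t')) n--;
        if (n >= MAX_SIG) n = MAX_SIG - 1;
        memcpy(out->sigs[out->len], p, n);
        out->sigs[out->len][n] = '\0';
        out->len++;
        p += n;
    }
    qsort(out->sigs, out->len, sizeof out->sigs[0], cmp_sig);
}

/* 1 if both IR texts contain the same alloca signatures with the same counts. */
static int same_alloca_multiset(const char *before, const char *after) {
    static AllocaSet a, b;  /* static: 32 KiB each, kept off the stack */
    collect_allocas(before, &a);
    collect_allocas(after, &b);
    if (a.len != b.len) return 0;
    for (size_t i = 0; i < a.len; i++) {
        if (strcmp(a.sigs[i], b.sigs[i]) != 0) return 0;
    }
    return 1;
}
```

A result of 1 says only that no alloca was added, dropped, or retyped; whether each one now sits in the entry block still needs a separate look at the diff.
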