fix(0109): hoist all per-instruction allocas to the function entry block

An alloca built at its use site re-executes on every pass through that block, and LLVM reclaims allocas only at ret, so loop-body locals, nested-loop index slots, and emitter spill temps (ig.tmp, sret slots, ABI coercion temps, byval materialization) grew the stack per iteration and long loops segfaulted on stack exhaustion.

New LLVMEmitter.buildEntryAlloca inserts after existing entry-block allocas and restores the builder position; every LLVMBuildAlloca site reachable during instruction emission now routes through it. Initialization stores stay at the use site (per-iteration re-init is unchanged), and entry slots become mem2reg-promotable.

The 35 .ir snapshot diffs are pure alloca position moves (type multisets verified identical per file). Regression: examples/0047-basic-loop-local-stack-reuse.sx (segfaulted pre-fix on both the 1M-iteration body-local loop and the 3M-iteration nested loop).
136
issues/0109-loop-body-alloca-stack-growth.md
Normal file
@@ -0,0 +1,136 @@
# RESOLVED (0109): allocas inside loop bodies accumulate stack per iteration

**Root cause:** `emitAlloca` (and ~18 sibling `LLVMBuildAlloca` temp sites in the
LLVM backend) built allocas at the builder's current position. An alloca inside a
loop body re-executes per iteration and LLVM reclaims allocas only at `ret`, so
the frame grew with the trip count: body locals, nested-loop index slots, and
spill temps (`ig.tmp` etc.) all segfaulted long loops on stack exhaustion.

**Fix:** new `LLVMEmitter.buildEntryAlloca` (src/ir/emit_llvm.zig) builds every
per-instruction alloca in the function's entry block (after existing entry
allocas, builder position restored); all `LLVMBuildAlloca` sites reachable
during instruction emission in src/backend/llvm/ops.zig, src/backend/llvm/abi.zig
and src/ir/emit_llvm.zig route through it. Initialization stores stay at the
use site, so per-iteration re-init semantics are unchanged; entry-block slots
are also mem2reg-promotable. ~35 `.ir` snapshots churned (pure alloca position
moves, verified type-multiset-identical per file).

**Regression test:** `examples/0047-basic-loop-local-stack-reuse.sx` (1M-iteration
body-local loop prints `sum=499999500000`; 3M-iteration nested loop prints
`n=3000000`; both segfaulted pre-fix).

---

# 0109: allocas inside loop bodies accumulate stack per iteration → segfault on long loops

**Symptom.** Any `alloca` that lands inside a loop's body block executes anew
on every iteration, and LLVM stack allocas are only reclaimed at function
return, so the frame grows monotonically with the trip count. Observed: a
1M-iteration loop with a body-local array segfaults (stack overflow, fault
address at the guard page); so does a 3M-iteration nested loop with **no user
locals at all** (the inner loop's hidden index slot is itself a body-block
alloca of the outer loop). Expected: loop-local storage is reused across
iterations; stack usage is static per frame regardless of trip count.

This hits three shapes, all confirmed:

1. user locals declared in a loop body (`buf : [128]s64 = ---;`),
2. nested loops (inner `for`'s `idx_slot` alloca sits in the outer body),
3. compiler temporaries spilled in the body (e.g. `index_get`'s `ig.tmp`;
   see issue 0110 for the for-over-array case specifically).

## Reproduction

Repro A (body local, `issues/0109-loop-body-alloca-stack-growth.sx`):

```sx
#import "modules/std.sx";

main :: () -> s32 {
    sum := 0;
    for 0..1000000: (i) {
        buf : [128]s64 = ---;
        buf[0] = i;
        sum += buf[0];
    }
    print("sum={}\n", sum);
    0
}
```

- **Observed**: `Segmentation fault at address 0x16e70ffd0` (guard page).
  With `0..1000` instead it prints `sum=499500` and exits 0: the program is
  correct, only the stack accumulation kills it.
- **Expected**: prints `sum=499999500000`, exit 0, at any trip count.

Repro B (pure nested loops, zero user locals):

```sx
#import "modules/std.sx";

main :: () -> s32 {
    n := 0;
    for 0..3000000: (i) {
        for 0..1: (j) { n += 1; }
    }
    print("n={}\n", n);
    0
}
```

- **Observed**: segfault. **Expected**: `n=3000000`, exit 0.

The emitted IR shows the cause directly (`sx ir`, body of repro A):

```llvm
for.body.1:
  %alloca2 = alloca [128 x i64], align 8   ; fresh 1KB every iteration
  ...
  %ig.tmp = alloca [128 x i64], align 8    ; plus a 1KB spill temp
```
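
As a rough back-of-the-envelope check of the IR above, the sketch below estimates how many iterations fit before the stack runs out. The 8 MiB stack size and the helper name `iterations_until_overflow` are assumptions for illustration (the report does not state the platform's stack limit); the two 1 KiB allocas per iteration are the ones visible in the excerpt, so the result is an upper bound that ignores everything else in the frame.

```python
def iterations_until_overflow(bytes_per_iteration: int, stack_bytes: int) -> int:
    """Number of whole iterations whose allocas fit in the given stack budget."""
    if bytes_per_iteration <= 0:
        raise ValueError("bytes_per_iteration must be positive")
    return stack_bytes // bytes_per_iteration


STACK_BYTES = 8 * 1024 * 1024   # ASSUMPTION: 8 MiB main-thread stack (not from the report)
BUF_BYTES = 128 * 8             # [128 x i64] = 1024 bytes
PER_ITERATION = 2 * BUF_BYTES   # %alloca2 plus %ig.tmp, as seen in the excerpt

limit = iterations_until_overflow(PER_ITERATION, STACK_BYTES)
print(limit)                    # 4096
print(1_000 <= limit)           # True:  the 0..1000 variant fits
print(1_000_000 <= limit)       # False: the 1M-iteration repro cannot fit
```

Under that assumed limit the numbers line up with the observations: the `0..1000` run stays well inside the budget, while the 1M-iteration run would need roughly 2 GiB of stack.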

## Root cause (suspected area)

`Builder.alloca` (`src/ir/module.zig` ~474) emits the `.alloca` instruction
into the current block, and the LLVM emitter (`src/backend/llvm/ops.zig`
`emitAlloca` ~327) builds `LLVMBuildAlloca` at the current insertion point,
so loop-body allocas are *executed* per iteration. LLVM only treats
entry-block allocas as static frame slots (and mem2reg/SROA only promote
those); a non-entry alloca re-executes and grows the stack each time, until
`ret`.

The standard fix (what clang does): emit **all** static allocas into the
function's entry block. Least-invasive locus is the emitter: in
`emitAlloca`, save the current insertion point, position the builder at the
entry block's first non-alloca instruction (or end of entry if empty), build
the alloca there, restore the position, `mapRef` as before. The IR shape and
the interpreter are untouched. All sx allocas are statically sized (TypeId),
so every one is hoistable.
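
The positioning steps above can be sketched on a toy model. This is not the emitter's code: `Block`, `Builder`, and `build_entry_alloca` are hypothetical stand-ins that mimic an insertion-point builder with plain Python lists, so the save / position / build / restore sequence can be run in isolation.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    instrs: list = field(default_factory=list)  # instructions as plain strings


@dataclass
class Builder:
    block: Block  # current insertion block
    index: int    # insert before instrs[index]; len(instrs) means "at end"

    def emit(self, text: str) -> None:
        self.block.instrs.insert(self.index, text)
        self.index += 1


def is_alloca(instr: str) -> bool:
    return " = alloca " in instr


def build_entry_alloca(builder: Builder, entry: Block, name: str, ty: str) -> str:
    """Build `name = alloca ty` after the existing entry allocas, then restore the builder."""
    saved_block, saved_index = builder.block, builder.index
    # First non-alloca instruction in the entry block (or end of block if none).
    pos = next((i for i, ins in enumerate(entry.instrs) if not is_alloca(ins)),
               len(entry.instrs))
    builder.block, builder.index = entry, pos
    builder.emit(f"{name} = alloca {ty}")
    # If we were already emitting into the entry block at or after `pos`,
    # the saved index moved down by one because of the inserted alloca.
    if saved_block is entry and saved_index >= pos:
        saved_index += 1
    builder.block, builder.index = saved_block, saved_index
    return name


# Hand-written toy IR, not real compiler output.
entry = Block("entry", ["%sum = alloca i64", "store i64 0, ptr %sum", "br label %for.body"])
body = Block("for.body", [])
b = Builder(body, 0)

slot = build_entry_alloca(b, entry, "%buf", "[128 x i64]")
b.emit(f"store [128 x i64] undef, ptr {slot}")  # the init store stays at the use site

print(entry.instrs)
print(body.instrs)
```

In this model the alloca lands between `%sum` and the first store in `entry`, while the initialization store is still emitted into `for.body`, which is the split the fix relies on. With the real LLVM-C builder, positions are expressed relative to blocks and instructions rather than list indices, so the index adjustment here has no direct counterpart.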

## Investigation prompt (paste into a fresh session)

> Fix issue 0109: loop-body allocas grow the stack per iteration and long
> loops segfault. In `src/backend/llvm/ops.zig` `emitAlloca` (~327), hoist the
> alloca to the current function's entry block: get the function via the
> current insert block's parent, position the builder before the entry
> block's first non-alloca instruction (`LLVMGetEntryBasicBlock` +
> `LLVMGetFirstInstruction` walk past `LLVMAlloca` opcodes; same positioning
> pattern as `injectCtorIntoMain` in `src/ir/emit_llvm.zig` ~466), build the
> alloca + `mapRef`, then restore the previous insertion point
> (`LLVMGetInsertBlock` before / `LLVMPositionBuilderAtEnd` after). Audit the
> other in-place `LLVMBuildAlloca` temporaries in `src/ir/emit_llvm.zig`
> (`ba.tmp`, `abi.tmp`, `ig.tmp`, etc.; grep `BuildAlloca`) and route the
> ones reachable inside loops through the same hoist helper.
>
> Semantics note: per-iteration re-zeroing must not regress. Initialization
> stores (e.g. `store undef` / `= .{...}` inits) stay where the decl was, in
> the body block; only the `alloca` itself moves to entry.
>
> Verify: both repros in `issues/0109-loop-body-alloca-stack-growth.md` (A is
> `issues/0109-loop-body-alloca-stack-growth.sx`) now print
> `sum=499999500000` / `n=3000000` and exit 0; `sx ir` on repro A shows no
> `alloca` inside `for.body.*`. Then run
> `zig build && zig build test && bash tests/run_examples.sh`. Any `.ir`
> snapshot churn from alloca placement must be reviewed
> (`git diff examples/expected/`) before `--update`. Promote a
> trip-count-bounded variant (e.g. 200k iterations, small buf) to
> `examples/00xx-basic-loop-local-stack-reuse.sx` as the pinned regression.
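
The `sx ir` check in the prompt (no `alloca` inside `for.body.*`) could be automated with a small script along these lines. It is only a sketch: it assumes the IR text opens each block with a `label:` line at column 0, as in the excerpt earlier in this issue, and the function name `allocas_in_loop_bodies` is made up for illustration and is not part of the repository. The `before` and `after` strings are hand-written samples, not real compiler output.

```python
import re

LABEL = re.compile(r"^([A-Za-z0-9_.$-]+):")


def allocas_in_loop_bodies(ir_text: str, prefix: str = "for.body") -> list:
    """Return (block_label, line) pairs for allocas in blocks whose label starts with prefix."""
    hits = []
    current = None
    for line in ir_text.splitlines():
        m = LABEL.match(line)
        if m:
            current = m.group(1)   # a new basic block starts here
            continue
        if line.startswith("}"):
            current = None         # end of the function body
            continue
        if current and current.startswith(prefix) and re.search(r"=\s*alloca\b", line):
            hits.append((current, line.strip()))
    return hits


before = """\
entry:
  br label %for.body.1
for.body.1:
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
"""

after = """\
entry:
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
  br label %for.body.1
for.body.1:
  store i64 0, ptr %alloca2
"""

print(len(allocas_in_loop_bodies(before)))  # 2
print(len(allocas_in_loop_bodies(after)))   # 0
```

A check like this complements, rather than replaces, running the two repros: it confirms alloca placement in the text, while the repros confirm the stack no longer grows at run time.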