Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.
Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).
Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.
zig build test: 426/426; examples suite: 595/595.
# RESOLVED — 0109: allocas inside loop bodies accumulate stack per iteration

**Root cause:** `emitAlloca` (and ~18 sibling `LLVMBuildAlloca` temp sites in the
LLVM backend) built allocas at the builder's current position. An alloca inside a
loop body re-executes per iteration and LLVM reclaims allocas only at `ret`, so
the frame grew with the trip count: body locals, nested-loop index slots, and
spill temps (`ig.tmp` etc.) all caused long loops to segfault on stack exhaustion.

**Fix:** new `LLVMEmitter.buildEntryAlloca` (src/ir/emit_llvm.zig) builds every
per-instruction alloca in the function's entry block (after existing entry
allocas, builder position restored); all `LLVMBuildAlloca` sites reachable
during instruction emission in src/backend/llvm/ops.zig, src/backend/llvm/abi.zig
and src/ir/emit_llvm.zig route through it. Initialization stores stay at the
use site, so per-iteration re-init semantics are unchanged; entry-block slots
are also mem2reg-promotable. ~35 `.ir` snapshots churned (pure alloca position
moves, verified type-multiset-identical per file).
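The "type-multiset-identical" check on the churned snapshots can be illustrated with a short script. This is a minimal sketch, not the check that was actually run: the helper names (`alloca_type`, `alloca_types`, `same_alloca_multiset`) are hypothetical, and it assumes the `.ir` snapshots are textual LLVM IR with one instruction per line.

```python
import re
from collections import Counter

def alloca_type(line):
    """Return the allocated type on an `alloca` line, or None for other lines."""
    m = re.search(r"=\s*alloca\s+", line)
    if m is None:
        return None
    rest = line[m.end():].split(";", 1)[0]  # drop a trailing IR comment
    depth = 0
    for i, ch in enumerate(rest):
        if ch in "[{(<":
            depth += 1
        elif ch in "]})>":
            depth -= 1
        elif ch == "," and depth == 0:
            # first top-level comma ends the type (", align 8" follows)
            return rest[:i].strip()
    return rest.strip()

def alloca_types(ir_text):
    """Multiset (Counter) of allocated types in one IR file."""
    types = (alloca_type(line) for line in ir_text.splitlines())
    return Counter(t for t in types if t is not None)

def same_alloca_multiset(before, after):
    """True when both files allocate the same types the same number of times."""
    return alloca_types(before) == alloca_types(after)

# Hypothetical before/after snapshot fragments: only the alloca positions differ.
before = """entry:
  %n = alloca i64, align 8
for.body.1:
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
"""
after = """entry:
  %n = alloca i64, align 8
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
for.body.1:
"""
print(same_alloca_multiset(before, after))
```

A position-only move leaves the multiset unchanged, so any snapshot where this comparison fails would have gained, lost, or retyped a slot and deserves a manual look.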

**Regression test:** `examples/0047-basic-loop-local-stack-reuse.sx` (1M-iteration
body-local loop prints `sum=499999500000`; 3M-iteration nested loop prints
`n=3000000`; both segfaulted pre-fix).

---

# 0109 — allocas inside loop bodies accumulate stack per iteration → segfault on long loops

**Symptom.** Any `alloca` that lands inside a loop's body block executes anew
on every iteration, and LLVM stack allocas are only reclaimed at function
return, so the frame grows monotonically with the trip count. Observed: a
1M-iteration loop with a body-local array segfaults (stack overflow, fault
address at the guard page); so does a 3M-iteration nested loop with **no user
locals at all** (the inner loop's hidden index slot is itself a body-block
alloca of the outer loop). Expected: loop-local storage is reused across
iterations; stack usage is static per frame regardless of trip count.
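To make the growth concrete, here is a back-of-the-envelope calculation. It is a sketch under stated assumptions: the 8 MiB stack limit is assumed (the writeup does not state the actual limit), and the per-iteration cost counts only one `[128]i64` slot plus one equally sized spill temp, ignoring alignment padding and any other temporaries.

```python
def iterations_until_overflow(stack_bytes, bytes_per_iteration):
    """Whole iterations that fit before a linearly growing frame exhausts the stack."""
    return stack_bytes // bytes_per_iteration

STACK_BYTES = 8 * 1024 * 1024   # ASSUMPTION: 8 MiB stack limit
BUF_BYTES = 128 * 8             # one [128 x i64] slot = 1024 bytes
PER_ITERATION = 2 * BUF_BYTES   # ASSUMPTION: body local + one same-sized spill temp

print(iterations_until_overflow(STACK_BYTES, PER_ITERATION))  # 4096
print(1_000_000 * PER_ITERATION)                              # 2048000000 bytes requested by a 1M-trip loop
```

Under these assumptions a 1000-iteration loop stays well inside the limit while a 1M-iteration loop asks for roughly 2 GB of stack, which matches the shape of the observation: short trip counts pass and long ones fault.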

This hits three shapes, all confirmed:

1. user locals declared in a loop body (`buf : [128]i64 = ---;`),
2. nested loops (inner `for`'s `idx_slot` alloca sits in the outer body),
3. compiler temporaries spilled in the body (e.g. `index_get`'s `ig.tmp`;
   see issue 0110 for the for-over-array case specifically).

## Reproduction

Repro A (body local, `issues/0109-loop-body-alloca-stack-growth.sx`):

```sx
#import "modules/std.sx";

main :: () -> i32 {
    sum := 0;
    for 0..1000000: (i) {
        buf : [128]i64 = ---;
        buf[0] = i;
        sum += buf[0];
    }
    print("sum={}\n", sum);
    0
}
```

- **Observed**: `Segmentation fault at address 0x16e70ffd0` (guard page).
  With `0..1000` instead it prints `sum=499500` and exits 0: the program is
  correct, only the stack accumulation kills it.
- **Expected**: prints `sum=499999500000`, exit 0, at any trip count.

Repro B (pure nested loops, zero user locals):

```sx
#import "modules/std.sx";

main :: () -> i32 {
    n := 0;
    for 0..3000000: (i) {
        for 0..1: (j) { n += 1; }
    }
    print("n={}\n", n);
    0
}
```

- **Observed**: segfault. **Expected**: `n=3000000`, exit 0.

The emitted IR shows the cause directly (`sx ir`, body of repro A):

```llvm
for.body.1:
  %alloca2 = alloca [128 x i64], align 8   ; fresh 1KB every iteration
  ...
  %ig.tmp = alloca [128 x i64], align 8    ; plus a 1KB spill temp
```

## Root cause (suspected area)

`Builder.alloca` (`src/ir/module.zig` ~474) emits the `.alloca` instruction
into the current block, and the LLVM emitter (`src/backend/llvm/ops.zig`
`emitAlloca` ~327) builds `LLVMBuildAlloca` at the current insertion point,
so loop-body allocas are *executed* per iteration. LLVM only treats
entry-block allocas as static frame slots (and mem2reg/SROA only promote
those); a non-entry alloca re-executes and grows the stack each time, until
`ret`.

The standard fix (what clang does): emit **all** static allocas into the
function's entry block. Least-invasive locus is the emitter: in
`emitAlloca`, save the current insertion point, position the builder at the
entry block's first non-alloca instruction (or end of entry if empty), build
the alloca there, restore the position, `mapRef` as before. The IR shape and
the interpreter are untouched. All sx allocas are statically sized (TypeId),
so every one is hoistable.
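The placement rule can be shown with a toy model. This is an illustrative sketch only, not the emitter's code: `Function`, `Builder`, and `build_entry_alloca` are hypothetical stand-ins, instructions are plain strings, and the toy builder only ever appends to its current block, so there is no insertion point to save and restore as there is with LLVM's builder.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    # Ordered mapping: block label -> list of instruction strings.
    # The first block is treated as the entry block.
    blocks: dict = field(default_factory=dict)

@dataclass
class Builder:
    fn: Function
    block: str  # label of the block instructions are currently appended to

    def emit(self, inst):
        """Append an instruction at the current position (end of current block)."""
        self.fn.blocks[self.block].append(inst)

    def build_entry_alloca(self, name, ty):
        """Place the alloca in the entry block, after any allocas already there."""
        entry = self.fn.blocks[next(iter(self.fn.blocks))]
        # Index of the first non-alloca instruction, or end of entry if none.
        pos = next(
            (i for i, inst in enumerate(entry) if " = alloca " not in inst),
            len(entry),
        )
        entry.insert(pos, f"%{name} = alloca {ty}")
        # self.block is untouched, so later emits still land in the loop body.
        return f"%{name}"

fn = Function({
    "entry": ["%sum = alloca i64", "store i64 0, ptr %sum", "br label %for.body"],
    "for.body": [],
})
b = Builder(fn, "for.body")
slot = b.build_entry_alloca("buf", "[128 x i64]")
b.emit(f"store i64 %i, ptr {slot}")  # the initialization store stays in the body

for label, insts in fn.blocks.items():
    print(label, insts)
```

The slot is created once in `entry` while the store that initializes it still runs on every iteration, which is the split between allocation and initialization that the fix relies on.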
## Investigation prompt (paste into a fresh session)

> Fix issue 0109: loop-body allocas grow the stack per iteration and long
> loops segfault. In `src/backend/llvm/ops.zig` `emitAlloca` (~327), hoist the
> alloca to the current function's entry block: get the function via the
> current insert block's parent, position the builder before the entry
> block's first non-alloca instruction (`LLVMGetEntryBasicBlock` +
> `LLVMGetFirstInstruction` walk past `LLVMAlloca` opcodes; same positioning
> pattern as `injectCtorIntoMain` in `src/ir/emit_llvm.zig` ~466), build the
> alloca + `mapRef`, then restore the previous insertion point
> (`LLVMGetInsertBlock` before / `LLVMPositionBuilderAtEnd` after). Audit the
> other in-place `LLVMBuildAlloca` temporaries in `src/ir/emit_llvm.zig`
> (`ba.tmp`, `abi.tmp`, `ig.tmp`, etc.; grep `BuildAlloca`) and route the
> ones reachable inside loops through the same hoist helper.
>
> Semantics note: per-iteration re-zeroing must not regress. Initialization
> stores (e.g. `store undef` / `= .{...}` inits) stay where the decl was, in
> the body block; only the `alloca` itself moves to entry.
>
> Verify: both repros in `issues/0109-loop-body-alloca-stack-growth.md` (A is
> `issues/0109-loop-body-alloca-stack-growth.sx`) now print
> `sum=499999500000` / `n=3000000` and exit 0; `sx ir` on repro A shows no
> `alloca` inside `for.body.*`. Then run
> `zig build && zig build test && bash tests/run_examples.sh`. Any `.ir`
> snapshot churn from alloca placement must be reviewed
> (`git diff examples/expected/`) before `--update`. Promote a
> trip-count-bounded variant (e.g. 200k iterations, small buf) to
> `examples/00xx-basic-loop-local-stack-reuse.sx` as the pinned regression.

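The "no `alloca` inside `for.body.*`" check can be automated with a small scan over the IR text. This is a hedged sketch: `non_entry_allocas` is a hypothetical helper, and it assumes the output of `sx ir` is standard textual LLVM IR, with functions opened by a `define ... {` line and closed by a lone `}`.

```python
import re

LABEL_RE = re.compile(r"^([A-Za-z0-9_.$-]+):")
ALLOCA_RE = re.compile(r"=\s*alloca\b")

def non_entry_allocas(ir_text):
    """Return (block_label, instruction) pairs for allocas outside the entry block."""
    offenders = []
    in_function = False
    in_entry = False
    seen_inst = False
    label = None
    for raw in ir_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop IR comments and indentation
        if not line:
            continue
        if line.startswith("define "):
            in_function, in_entry, seen_inst, label = True, True, False, "<entry>"
        elif line == "}":
            in_function = False
        elif in_function:
            m = LABEL_RE.match(line)
            if m:
                # A label before any instruction just names the entry block;
                # any later label starts a non-entry block.
                if seen_inst:
                    in_entry = False
                label = m.group(1)
            else:
                seen_inst = True
                if not in_entry and ALLOCA_RE.search(line):
                    offenders.append((label, line))
    return offenders

# Hypothetical pre-fix shape, modeled on the fragment shown for repro A.
pre_fix = """define i32 @main() {
entry:
  %sum = alloca i64, align 8
  br label %for.body.1
for.body.1:
  %alloca2 = alloca [128 x i64], align 8 ; fresh 1KB every iteration
  br label %for.body.1
}
"""
print(non_entry_allocas(pre_fix))
```

An empty result on the post-fix IR is the pass condition; any pair it returns names the block and the instruction that still allocates per iteration.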