Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.
Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).
Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.
zig build test: 426/426; examples suite: 595/595.
RESOLVED — 0109: allocas inside loop bodies accumulate stack per iteration
Root cause: emitAlloca (and ~18 sibling LLVMBuildAlloca temp sites in the
LLVM backend) built allocas at the builder's current position. An alloca inside a
loop body re-executes per iteration and LLVM reclaims allocas only at ret, so
the frame grew with the trip count. Body locals, nested-loop index slots, and
spill temps (ig.tmp etc.) all exhausted the stack and segfaulted on long loops.
Fix: new LLVMEmitter.buildEntryAlloca (src/ir/emit_llvm.zig) builds every
per-instruction alloca in the function's entry block (after existing entry
allocas, builder position restored); all LLVMBuildAlloca sites reachable
during instruction emission in src/backend/llvm/ops.zig, src/backend/llvm/abi.zig
and src/ir/emit_llvm.zig route through it. Initialization stores stay at the
use site, so per-iteration re-init semantics are unchanged; entry-block slots
are also mem2reg-promotable. ~35 .ir snapshots churned (pure alloca position
moves — verified type-multiset-identical per file).
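The "type-multiset-identical" claim can be approximated with a short script. What follows is a minimal sketch of such a check, not the tool actually used for the verification: the function names and the regex are hypothetical, and it assumes allocas appear in the textual form `%name = alloca <type>, align N`.

```python
import re
from collections import Counter

# Matches textual LLVM IR allocas such as:
#   %ig.tmp = alloca [128 x i64], align 8
# The alignment suffix and a trailing ';' comment are optional.
ALLOCA_RE = re.compile(r"=\s*alloca\s+(.+?)(?:,\s*align\s+\d+)?\s*(?:;.*)?$")


def alloca_type_multiset(ir_text):
    """Count how many allocas of each type appear in the IR text."""
    counts = Counter()
    for line in ir_text.splitlines():
        match = ALLOCA_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def same_alloca_types(before_ir, after_ir):
    """True if both IR texts allocate the same types the same number of times."""
    return alloca_type_multiset(before_ir) == alloca_type_multiset(after_ir)


# Hypothetical before/after snapshots: the allocas only change position.
before = """
entry:
  %x = alloca i64, align 8
  br label %for.body.1
for.body.1:
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
"""
after = """
entry:
  %x = alloca i64, align 8
  %alloca2 = alloca [128 x i64], align 8
  %ig.tmp = alloca [128 x i64], align 8
  br label %for.body.1
for.body.1:
"""
assert same_alloca_types(before, after)
```

A check like this only confirms that no alloca was added, dropped, or retyped. It says nothing about where the allocas ended up, so the position itself still needs a look at the diff.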
Regression test: examples/0047-basic-loop-local-stack-reuse.sx (1M-iteration
body-local loop prints sum=499999500000; 3M-iteration nested loop prints
n=3000000; both segfaulted pre-fix).
0109 — allocas inside loop bodies accumulate stack per iteration → segfault on long loops
Symptom. Any alloca that lands inside a loop's body block executes anew
on every iteration, and LLVM stack allocas are only reclaimed at function
return — so the frame grows monotonically with the trip count. Observed: a
1M-iteration loop with a body-local array segfaults (stack overflow, fault
address at the guard page); so does a 3M-iteration nested loop with no user
locals at all (the inner loop's hidden index slot is itself a body-block
alloca of the outer loop). Expected: loop-local storage is reused across
iterations; stack usage is static per frame regardless of trip count.
This hits three shapes, all confirmed:
- user locals declared in a loop body (buf : [128]i64 = ---;),
- nested loops (the inner for's idx_slot alloca sits in the outer body),
- compiler temporaries spilled in the body (e.g. index_get's ig.tmp; see issue 0110 for the for-over-array case specifically).
Reproduction
Repro A — body local (issues/0109-loop-body-alloca-stack-growth.sx):
#import "modules/std.sx";
main :: () -> i32 {
    sum := 0;
    for 0..1000000: (i) {
        buf : [128]i64 = ---;
        buf[0] = i;
        sum += buf[0];
    }
    print("sum={}\n", sum);
    0
}
- Observed: Segmentation fault at address 0x16e70ffd0 (guard page). With 0..1000 instead it prints sum=499500 and exits 0: the program is correct, only the stack accumulation kills it.
- Expected: prints sum=499999500000, exit 0, at any trip count.
Repro B — pure nested loops, zero user locals:
#import "modules/std.sx";
main :: () -> i32 {
    n := 0;
    for 0..3000000: (i) {
        for 0..1: (j) { n += 1; }
    }
    print("n={}\n", n);
    0
}
- Observed: segfault.
- Expected: n=3000000, exit 0.
The emitted IR shows the cause directly (sx ir, body of repro A):
for.body.1:
  %alloca2 = alloca [128 x i64], align 8   ; fresh 1KB every iteration
  ...
  %ig.tmp = alloca [128 x i64], align 8    ; plus a 1KB spill temp
Root cause (suspected area)
Builder.alloca (src/ir/module.zig ~474) emits the .alloca instruction
into the current block, and the LLVM emitter (src/backend/llvm/ops.zig
emitAlloca ~327) builds LLVMBuildAlloca at the current insertion point —
so loop-body allocas are executed per iteration. LLVM only treats
entry-block allocas as static frame slots (and mem2reg/SROA only promote
those); a non-entry alloca re-executes and grows the stack each time, until
ret.
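A rough back-of-the-envelope model shows why the short run survives and the long one does not. The sketch below counts only the two [128 x i64] allocas visible in the IR excerpt, and the 8 MiB stack limit is an assumption made for illustration (the actual limit on the failing machine is not stated here); the function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Two [128 x i64] allocas per iteration (buf and ig.tmp), 8 bytes per i64.
BYTES_PER_ITERATION = 2 * 128 * 8

# ASSUMPTION: a typical 8 MiB main-thread stack; not taken from the issue.
ASSUMED_STACK_LIMIT = 8 * MIB


def stack_bytes_used(trip_count, bytes_per_iteration, hoisted):
    """Stack consumed by the loop's allocas after trip_count iterations.

    hoisted=False models allocas that re-execute inside the loop body;
    hoisted=True models entry-block slots that are allocated once and reused.
    """
    if hoisted:
        return bytes_per_iteration
    return trip_count * bytes_per_iteration


for trips in (1_000, 1_000_000):
    in_body = stack_bytes_used(trips, BYTES_PER_ITERATION, hoisted=False)
    in_entry = stack_bytes_used(trips, BYTES_PER_ITERATION, hoisted=True)
    print(f"{trips} iterations: body allocas use {in_body} bytes "
          f"(overflows assumed limit: {in_body > ASSUMED_STACK_LIMIT}), "
          f"entry allocas use {in_entry} bytes")
```

Under these assumptions the 1000-iteration run stays around 2 MB while the 1M-iteration run would need about 2 GB, which is consistent with the observed behavior of repro A.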
The standard fix (what clang does): emit all static allocas into the
function's entry block. Least-invasive locus is the emitter — in
emitAlloca, save the current insertion point, position the builder at the
entry block's first non-alloca instruction (or end of entry if empty), build
the alloca there, restore the position, mapRef as before. The IR shape and
the interpreter are untouched. All sx allocas are statically sized (TypeId),
so every one is hoistable.
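The intended placement can be illustrated on a toy IR. This is only a model of the resulting block layout, written as a post-pass over hypothetical tuple-based instructions; the real emitter would instead reposition the LLVM builder while emitting each alloca, and none of the names below come from the compiler.

```python
def hoist_allocas(blocks, entry="entry"):
    """Return a copy of blocks with every alloca moved into the entry block.

    blocks maps a block label to a list of (opcode, *operands) tuples.
    Hoisted allocas are placed after the entry block's existing leading
    allocas. Every other instruction, including initialization stores,
    stays in its original block and order.
    """
    result = {label: [] for label in blocks}
    hoisted = []
    for label, insts in blocks.items():
        for inst in insts:
            if label != entry and inst[0] == "alloca":
                hoisted.append(inst)
            else:
                result[label].append(inst)

    # Locate the entry block's first non-alloca instruction.
    entry_insts = result[entry]
    split = 0
    while split < len(entry_insts) and entry_insts[split][0] == "alloca":
        split += 1
    result[entry] = entry_insts[:split] + hoisted + entry_insts[split:]
    return result


# Toy version of repro A: buf and ig.tmp start out in the loop body.
func = {
    "entry": [
        ("alloca", "%sum", "i64"),
        ("store", "0", "%sum"),
        ("br", "for.body.1"),
    ],
    "for.body.1": [
        ("alloca", "%buf", "[128 x i64]"),
        ("store", "undef", "%buf"),  # per-iteration init stays in the body
        ("alloca", "%ig.tmp", "[128 x i64]"),
        ("br", "for.body.1"),
    ],
}

hoisted_func = hoist_allocas(func)
assert all(inst[0] != "alloca" for inst in hoisted_func["for.body.1"])
assert ("store", "undef", "%buf") in hoisted_func["for.body.1"]
```

Keeping the store in the body block is what preserves per-iteration re-initialization: the slot is allocated once, but it is still re-initialized on every pass through the loop.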
Investigation prompt (paste into a fresh session)
Fix issue 0109: loop-body allocas grow the stack per iteration and long loops segfault. In src/backend/llvm/ops.zig emitAlloca (~327), hoist the alloca to the current function's entry block: get the function via the current insert block's parent, position the builder before the entry block's first non-alloca instruction (LLVMGetEntryBasicBlock + LLVMGetFirstInstruction walk past LLVMAlloca opcodes, the same positioning pattern as injectCtorIntoMain in src/ir/emit_llvm.zig ~466), build the alloca + mapRef, then restore the previous insertion point (LLVMGetInsertBlock before / LLVMPositionBuilderAtEnd after). Audit the other in-place LLVMBuildAlloca temporaries in src/ir/emit_llvm.zig (ba.tmp, abi.tmp, ig.tmp, etc.; grep BuildAlloca) and route the ones reachable inside loops through the same hoist helper.
Semantics note: per-iteration re-zeroing must not regress. Initialization stores (e.g. store undef / = .{...} inits) stay where the decl was, in the body block; only the alloca itself moves to entry.
Verify: both repros in issues/0109-loop-body-alloca-stack-growth.md (A is issues/0109-loop-body-alloca-stack-growth.sx) now print sum=499999500000 / n=3000000 and exit 0; sx ir on repro A shows no alloca inside for.body.*. Then zig build && zig build test && bash tests/run_examples.sh. Any .ir snapshot churn from alloca placement must be reviewed (git diff examples/expected/) before --update. Promote a trip-count-bounded variant (e.g. 200k iterations, small buf) to examples/00xx-basic-loop-local-stack-reuse.sx as the pinned regression.