Files
sx/issues/0110-for-array-by-value-full-array-spill.md
agra bf47146085 fix(0110): for-over-array by-value fetch reads one element, not a full copy
lowerFor's by-value element fetch emitted index_get on the array VALUE;
the emitter realizes that as a whole-array spill to a stack temp + GEP,
per iteration — O(N^2) bytes copied per loop (and pre-0109 it also grew
the stack per iteration, segfaulting a [4096]s64 loop).

When the iterable is an array with addressable storage (and not deref'd
from a pointer, whose identifier alloca holds the pointer rather than
the array), the fetch is now index_gep on the storage + one element
load. Storage-less arrays keep the index_get fallback. The loaded
element remains a copy — mutating the capture does not write back.

Regression: examples/0048-basic-for-array-large.sx (sum over 4096
elements + by-value copy-guard).
2026-06-10 17:34:35 +03:00

113 lines
5.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# RESOLVED — 0110: `for arr: (x)` over an array copies the ENTIRE array per element
**Root cause:** `lowerFor`'s by-value element fetch emitted `index_get` on the
array *value*; the emitter realizes that as spill-whole-array-to-temp + GEP one
element, per iteration — O(N²) bytes copied (and pre-0109, per-iteration stack
growth that segfaulted a `[4096]s64` loop).
**Fix:** in `src/ir/lower/control_flow.zig` `lowerFor`, when the iterable is an
array with addressable storage (`getExprAlloca` hit, and the iterable was not
deref'd from a pointer — a deref'd identifier's alloca holds the pointer, not
the array), the by-value fetch is `index_gep` on the storage + a single element
`load`. Storage-less arrays and the deref'd-pointer case keep the `index_get`
fallback. Capture semantics unchanged: the loaded element is still a copy
(mutating it does not write back), matching the by-ref branch's base resolution.
**Regression test:** `examples/0048-basic-for-array-large.sx` (4096-element sum
prints `sum=8386560`, segfaulted pre-fix; copy-guard confirms by-value capture
mutation leaves the array untouched).
---
# 0110 — `for arr: (x)` over an array copies the ENTIRE array per element
**Symptom.** The collection-form `for` with by-value capture over an array
lowers the element fetch as `index_get` on the array *value*, which the LLVM
emitter realizes as: load the whole array as an SSA value, spill it to a
fresh `ig.tmp` alloca, GEP one element. Per iteration. Observed: a `for` over
a `[4096]s64` array segfaults (4096 iterations × 32KB spill = ~134MB of stack
— see issue 0109 for why body allocas never unwind); a `[256]s64` version
completes but copies 256 × 2KB = 512KB to read 2KB of data. Expected: O(1)
stack, O(N) total work — GEP into the array's existing storage and load the
single element.
Even after 0109's entry-block hoist (which collapses the spill to one reused
slot and fixes the crash), the full-array copy per element remains — O(N²)
bytes copied per loop. The by-ref capture path (`for arr: (*x)`) already does
this right: it GEPs into the array's alloca (`getExprAlloca`) without
copying. By-value should reuse the same base and just add a load.
## Reproduction
```sx
#import "modules/std.sx";
main :: () -> s32 {
arr : [4096]s64 = ---;
i := 0;
while i < 4096 { arr[i] = i; i += 1; }
sum := 0;
for arr: (x) { sum += x; }
print("sum={}\n", sum);
0
}
```
- **Observed**: `Segmentation fault` (stack overflow). With `256` instead of
`4096` it prints `sum=32640` — and `sx ir` shows the per-iteration spill:
```llvm
for.body.4:
%ig.tmp = alloca [256 x i64], align 8 ; fresh full-array temp
store [256 x i64] %load6, ptr %ig.tmp, align 8 ; copy ALL 256 elements
%ig.ptr = getelementptr [256 x i64], ptr %ig.tmp, i64 0, i64 %load8
%ig.val = load i64, ptr %ig.ptr, align 8 ; ...to read ONE
```
- **Expected**: `sum=8386560`, exit 0; the body GEPs `arr`'s own storage:
`getelementptr` on the original alloca + a single `i64` load, no array-sized
temp, no copy.
Repro co-located: `issues/0110-for-array-by-value-full-array-spill.sx`
(unpinned until fixed; depends on 0109 only for the crash, not for the copy).
## Root cause (suspected area)
`src/ir/lower/control_flow.zig` `lowerFor` (~336-343): the by-ref branch
builds `index_gep` on the array's storage (`getExprAlloca(fe.iterable) orelse
iterable`), but the by-value branch emits
`.index_get = .{ .lhs = iterable, .rhs = idx_val }` where `iterable` is the
loaded array **value**. The emitter's `index_get` on an aggregate SSA value
(`src/ir/emit_llvm.zig`, `ig.tmp` site ~2100s) has no address to index, so it
spills the whole value first.
Fix shape: in `lowerFor`, when the iterable is an array with storage
(`getExprAlloca` hit — same condition the by-ref branch uses), lower the
by-value element as `index_gep` on that storage followed by a `load` of the
element type, instead of `index_get` on the value. Capture semantics are
unchanged: the load still produces a per-iteration copy of the *element*
(mutating `x` must not write back; mutating the array inside the body is
already visible to subsequent iterations through the same storage in the
by-ref form — match existing behavior with a test). Keep the `index_get`
fallback for storage-less iterables (e.g. an array returned by a call).
## Investigation prompt (paste into a fresh session)
> Fix issue 0110: collection-form `for arr: (x)` (by-value capture) over an
> array spills the entire array per iteration. In
> `src/ir/lower/control_flow.zig` `lowerFor` (~336-343), make the by-value
> element fetch reuse the by-ref branch's base resolution: if the iterable is
> an `.array` and `getExprAlloca(fe.iterable)` (or the deref'd pointer base)
> yields storage, emit `index_gep` on that storage + `builder.load` of the
> element type; otherwise keep the existing `index_get` path. Slices/List
> views are unaffected (their `index_get` is already pointer-backed).
>
> Verify: `./zig-out/bin/sx run issues/0110-for-array-by-value-full-array-spill.sx`
> prints `sum=8386560` exit 0 (after 0109 lands; before it, verify the
> 256-element variant's `sx ir` shows no `ig.tmp` array spill in
> `for.body.*`). Add a semantics guard test: mutate `x` in the body and
> confirm the array is unchanged after the loop (by-value capture stays a
> copy). Then `zig build && zig build test && bash tests/run_examples.sh`,
> review any `.ir` snapshot diffs, and promote the repro to
> `examples/00xx-basic-for-array-large.sx`.