test: run example corpus in zig build test; sx ir → stdout

`zig build test` now runs the full examples/ + issues/ regression corpus alongside the Zig unit tests, driven by a pure-Zig test (src/corpus_run.test.zig), with no shell script in the build path. It spawns the installed `sx` per example (subprocess-isolated, per-run timeout), diffs stdout/stderr/exit and optional `sx ir` snapshots, and fails the build on any mismatch. The file list is enumerated at runtime, so new examples are covered with no test edit.

- `sx ir` / `ir-dump` now write to stdout (fd 1) instead of stderr, so the dumps can be piped/redirected.
- `zig build test -Dupdate-goldens` regenerates snapshots in-build, byte-identical to the legacy `run_examples.sh --update`; on mismatch the runner prints how to regenerate.
- run_examples.sh kept (still used by tools/verify-step.sh) and made portable to a bare macOS: timeout/gtimeout fallback, bash 3.2-safe empty-array handling.
- CLAUDE.md: document the new workflow.
2026-06-13 09:41:56 +03:00
parent 39488133c9
commit ab3c9202ff
7 changed files with 464 additions and 25 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -405,7 +405,10 @@ any can be advanced independently.
 After any code change:
 ```sh
 zig build                              # must compile
-zig build test                         # must pass
+zig build test                         # must pass — runs the Zig unit tests
                                       # AND the full examples/ + issues/
                                       # regression corpus (a failing example
                                       # fails the build)
 ```
 After completing a phase's final step, run the phase's end-to-end verification command listed in `current/PLAN.md`.
@@ -415,9 +418,27 @@ After completing a phase's final step, run the phase's end-to-end verification c
 After any compiler change:
 1. **Build**: `zig build && zig build test`
-2. **Run regression tests**: `bash tests/run_examples.sh`
-   - Every test must show `ok` (currently 324)
-   - Zero failures, zero timeouts
+   - `zig build test` runs the unit tests **and** the example/issue corpus as
+     one suite — a failing example fails the build. The corpus is driven by a
+     pure-Zig test (`src/corpus_run.test.zig`) that spawns the installed `sx`
     binary per example (subprocess-isolated, with a per-run timeout), so no
     shell script is involved.
 2. **Regenerate snapshots**: `zig build test -Dupdate-goldens`
   - Flips the corpus test to write each example's expected
     `.exit`/`.stdout`/`.stderr` (+ `.ir` where one already exists) from
     freshly-normalized output instead of asserting against it. This is the
     preferred way to update snapshots — no shell script needed.
   - A test is still keyed off its `expected/<name>.exit` marker, so seed an
     empty marker first for a brand-new example (see "Adding a feature").
 3. **Standalone corpus run** (optional): `bash tests/run_examples.sh`
   - Runs the corpus independent of `zig build test` (used by
     `tools/verify-step.sh`). `--update` still regenerates snapshots and
     produces byte-identical output to `-Dupdate-goldens`.
   - Every test must show `ok` (currently 626); zero failures, zero timeouts.
   - Uses GNU `timeout`/`gtimeout` when present (Homebrew coreutils on macOS)
     and runs without a per-test wall-clock guard when neither is found.
   - The two normalizers (`normalize`/`normalize_ir` in the script and the
     mirrors in `src/corpus_run.test.zig`) must stay in lockstep.
 ### Test layout
@@ -445,12 +466,12 @@ dirs) under the same `XXXX-` prefix.
 ### Snapshot integrity
-**Never run `--update` while tests are failing.** The `--update` flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
+**Never regenerate snapshots while tests are failing.** `-Dupdate-goldens` (and the legacy `--update`) blindly overwrite expected output with whatever the compiler produces — including error messages. If you regenerate during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
 Safe workflow:
-1. Fix the code until `bash tests/run_examples.sh` passes against the **existing** snapshots.
+1. Fix the code until `zig build test` passes against the **existing** snapshots.
-2. Only run `--update` when you've intentionally changed output (new feature, new test, changed formatting).
+2. Only run `zig build test -Dupdate-goldens` when you've intentionally changed output (new feature, new test, changed formatting).
-3. After `--update`, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.
+3. After regenerating, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.
 ### Adding a new language feature
@@ -461,19 +482,20 @@ There is no monolithic smoke file — each feature is its own focused example.
 2. Run it: `./zig-out/bin/sx run examples/XXXX-<category>-<name>.sx`
 3. Seed the marker and capture expected output:
   `: > examples/expected/XXXX-<category>-<name>.exit` then
-   `bash tests/run_examples.sh --update`
+   `zig build test -Dupdate-goldens`
-4. Verify all tests still pass: `bash tests/run_examples.sh`
+4. Verify all tests still pass: `zig build test`
 ### Test file roles
 | File | Purpose |
 |------|---------|
 | `examples/XXXX-category-name.sx` | Focused feature example — one feature per file. |
-| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `--update`. |
+| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `zig build test -Dupdate-goldens`. |
 | `examples/expected/XXXX-category-name.ir` | Optional `sx ir` snapshot — present only where lowering shape is locked. |
 | `issues/NNNN-slug.md` | Open-issue / bug-report writeup (mark RESOLVED in a banner when fixed; the `.md` stays). |
 | `issues/NNNN-slug.sx` (+ `issues/NNNN-slug/`) | The issue's minimal repro, co-located with the `.md`. A repro with an `issues/expected/NNNN-slug.exit` marker runs in the suite; unpinned ones don't. |
-| `tests/run_examples.sh` | Test runner. Scans `examples/` and `issues/`; compares stdout/stderr/exit (+ optional IR) per test. |
+| `src/corpus_run.test.zig` | The corpus runner inside `zig build test` — spawns `sx` per example, diffs stdout/stderr/exit (+ optional IR); regenerates snapshots under `-Dupdate-goldens`. |
 | `tests/run_examples.sh` | Standalone shell runner (used by `tools/verify-step.sh`); same compare + `--update` as the Zig test. |
 ### Unit test file convention
@@ -496,8 +518,8 @@ All Zig unit tests live in separate `*.test.zig` files alongside the source they
   open bug, `issues/NNNN-slug.{md,sx}` (repro co-located with the writeup).
 2. Run it: `./zig-out/bin/sx run <path>.sx`
 3. Seed the marker (`: > <root>/expected/<name>.exit`) and capture expected:
-   `bash tests/run_examples.sh --update`
+   `zig build test -Dupdate-goldens`
-4. Verify: `bash tests/run_examples.sh`
+4. Verify: `zig build test`
 ### Resolving an open issue
@@ -505,8 +527,8 @@ When a bug filed under `issues/NNNN-slug.{md,sx}` is fixed:
 1. Move the repro into the feature suite as a regression test:
   `git mv issues/NNNN-slug.sx examples/XXXX-<category>-<name>.sx`.
-2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with `--update`,
-   and review the diff.
+2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with
+   `zig build test -Dupdate-goldens`, and review the diff.
 3. Tighten the example's comment header to describe the feature (keep a one-line
   `Regression (issue NNNN)` note for provenance).
 4. Mark `issues/NNNN-slug.md` RESOLVED with a short banner (root cause + fix +
--- a/build.zig
+++ b/build.zig
@@ -193,28 +193,49 @@ pub fn build(b: *std.Build) void {
        run_cmd.addArgs(args);
    }
-    // Corpus paths for the LSP corpus-sweep test (src/lsp/corpus_sweep.test.zig).
-    // Inject absolute corpus dirs at configure time so the in-process analyzer
-    // sweep is CWD-independent; the test still ENUMERATES the directory
-    // contents at runtime (new examples are covered with no test edit).
+    // Corpus paths for the corpus tests (src/lsp/corpus_sweep.test.zig — the
+    // in-process analyzer sweep — and src/corpus_run.test.zig — the end-to-end
+    // example/issue runner). Inject absolute corpus dirs + the installed `sx`
+    // binary path at configure time so the tests are CWD-independent; the
    // runner still ENUMERATES the directory contents at runtime, so new
    // examples are covered with no test edit.
    const corpus_opts = b.addOptions();
    corpus_opts.addOption([]const u8, "examples_dir", b.path("examples").getPath(b));
    corpus_opts.addOption([]const u8, "issues_dir", b.path("issues").getPath(b));
    corpus_opts.addOption([]const u8, "library_dir", b.path("library").getPath(b));
    // Absolute path to the installed `sx` binary the corpus runner spawns per
    // example. The runner test depends on the install step (below) so this
    // exists — and so the sibling library/ tree the binary loads is in place.
    corpus_opts.addOption([]const u8, "sx_exe", b.getInstallPath(.bin, "sx"));
    // `zig build test -Dupdate-goldens` flips src/corpus_run.test.zig from
    // verify mode to regenerate mode: it overwrites each example's expected
    // .exit/.stdout/.stderr (+ .ir where one exists) with freshly-normalized
    // output instead of asserting against it. The in-build equivalent of the
    // legacy `run_examples.sh --update`.
    const update_goldens = b.option(
        bool,
        "update-goldens",
        "Regenerate example/issue snapshots instead of verifying them (use with `zig build test`)",
    ) orelse false;
    corpus_opts.addOption(bool, "update_goldens", update_goldens);
    mod.addOptions("corpus_paths", corpus_opts);
    const mod_tests = b.addTest(.{
        .root_module = mod,
    });
    const run_mod_tests = b.addRunArtifact(mod_tests);
    // src/corpus_run.test.zig spawns the installed `sx` binary per example, so
    // the mod test binary must not run until `zig-out/bin/sx` + `zig-out/library`
    // are installed. This is what folds the full example/issue regression suite
    // into `zig build test` — no shell script, just a Zig test.
    run_mod_tests.step.dependOn(b.getInstallStep());
    const exe_tests = b.addTest(.{
        .root_module = exe.root_module,
    });
    const run_exe_tests = b.addRunArtifact(exe_tests);
-    const test_step = b.step("test", "Run tests");
+    const test_step = b.step("test", "Run unit tests + the example/issue regression suite");
    test_step.dependOn(&run_mod_tests.step);
    test_step.dependOn(&run_exe_tests.step);
 }
--- a/src/corpus_run.test.zig
+++ b/src/corpus_run.test.zig
@@ -0,0 +1,367 @@
 const std = @import("std");
 const corpus_paths = @import("corpus_paths");
 // End-to-end example/issue regression runner — the pure-Zig replacement for
 // `tests/run_examples.sh`. For every `<root>/expected/<name>.exit` marker under
 // examples/ and issues/, spawn the installed `sx` binary on `<name>.sx`, capture
 // stdout/stderr/exit, normalize, and diff against the stored snapshot. Optional
 // `<name>.ir` snapshots additionally diff `sx ir` output.
 //
 // Each example runs in its OWN subprocess (via std.process.run), so a crashing
 // example reports its exit code (or 128+signal, matching a shell's `$?`) instead
 // of taking down the test binary. A per-run deadline guards against hangs.
 //
 // Paths + the `sx` binary path are injected at configure time (build.zig
 // `corpus_paths`); the FILE LIST is enumerated at test time, so new examples are
 // covered with no edit here. The child runs with cwd = repo root and is handed a
 // repo-relative path (e.g. `examples/0001-foo.sx`) — exactly the form the stored
 // snapshots are normalized to, and the cwd `tests/fixtures/` imports resolve
 // against. (The shell runner passes absolute paths and relies on a sed rule to
 // collapse them back; running relatively makes that rule a no-op, so it is not
 // reimplemented here.)
 //
 // Snapshots are regenerated in-build with `zig build test -Dupdate-goldens`
 // (see the update-mode branch below) — no shell script needed. The legacy
 // `bash tests/run_examples.sh --update` still works and produces byte-identical
 // output; the two normalizers (here and in run_examples.sh) must stay in lockstep.
 const TIMEOUT_SECS = 10;
 const MAX_OUTPUT = 16 * 1024 * 1024;
 /// Wrap the live C `environ` so spawned children inherit the test process's
 /// environment. `Io.Threaded`'s default `process_environ` is EMPTY, and a null
 /// `environ_map` on a spawn falls back to it — so without this the child `sx`
 /// runs with no PATH (and getenv-based examples like 1222 fail spuriously).
 /// The slice points into the process-lifetime `environ` global; no copy needed.
 fn currentEnviron() std.process.Environ {
    const raw: [*:null]const ?[*:0]const u8 = @ptrCast(std.c.environ);
    return .{ .block = .{ .slice = std.mem.span(raw) } };
 }
 var g_test_threaded: ?std.Io.Threaded = null;
 fn test_io() std.Io {
    if (g_test_threaded == null) {
        g_test_threaded = std.Io.Threaded.init(std.heap.page_allocator, .{ .environ = currentEnviron() });
    }
    return g_test_threaded.?.io();
 }
 fn isLowerHex(c: u8) bool {
    return (c >= '0' and c <= '9') or (c >= 'a' and c <= 'f');
 }
 /// Mirror of `normalize()` in run_examples.sh: collapse `0x` + 4-or-more
 /// lowercase-hex digits to `0xADDR` so heap/fn addresses don't desync snapshots.
 /// (The path-collapse sed rule is intentionally omitted — see file header.)
 fn normalizeStd(arena: std.mem.Allocator, in: []const u8) ![]u8 {
    var out: std.ArrayList(u8) = .empty;
    var i: usize = 0;
    while (i < in.len) {
        if (in[i] == '0' and i + 1 < in.len and in[i + 1] == 'x') {
            var j = i + 2;
            while (j < in.len and isLowerHex(in[j])) j += 1;
            if (j - (i + 2) >= 4) {
                try out.appendSlice(arena, "0xADDR");
                i = j;
                continue;
            }
        }
        try out.append(arena, in[i]);
        i += 1;
    }
    return out.items;
 }
 /// `^attributes #[0-9]+ = \{` — one of normalize_ir's line-drop patterns.
 fn isAttributesLine(line: []const u8) bool {
    const pfx = "attributes #";
    if (!std.mem.startsWith(u8, line, pfx)) return false;
    var k: usize = pfx.len;
    const start = k;
    while (k < line.len and line[k] >= '0' and line[k] <= '9') k += 1;
    return k > start and std.mem.startsWith(u8, line[k..], " = {");
 }
 fn dropIrLine(line: []const u8) bool {
    return std.mem.startsWith(u8, line, "; ModuleID =") or
        std.mem.startsWith(u8, line, "source_filename =") or
        std.mem.startsWith(u8, line, "target datalayout =") or
        std.mem.startsWith(u8, line, "target triple =") or
        isAttributesLine(line);
 }
 /// Apply `s/%([a-z]+)[0-9]+/%\1N/g` to one line — collapse LLVM's auto-suffixed
 /// temporaries (`%tmp17` -> `%tmpN`) so renumbering doesn't desync snapshots.
 fn appendIrSubst(arena: std.mem.Allocator, out: *std.ArrayList(u8), line: []const u8) !void {
    var i: usize = 0;
    while (i < line.len) {
        if (line[i] == '%') {
            const lstart = i + 1;
            var j = lstart;
            while (j < line.len and line[j] >= 'a' and line[j] <= 'z') j += 1;
            const letters_end = j;
            const dstart = j;
            while (j < line.len and line[j] >= '0' and line[j] <= '9') j += 1;
            if (letters_end > lstart and j > dstart) {
                try out.append(arena, '%');
                try out.appendSlice(arena, line[lstart..letters_end]);
                try out.append(arena, 'N');
                i = j;
                continue;
            }
        }
        try out.append(arena, line[i]);
        i += 1;
    }
 }
 /// Mirror of `normalize_ir()` in run_examples.sh.
 fn normalizeIr(arena: std.mem.Allocator, in: []const u8) ![]u8 {
    var out: std.ArrayList(u8) = .empty;
    var lines = std.mem.splitScalar(u8, in, '\n');
    var first = true;
    while (lines.next()) |line| {
        if (dropIrLine(line)) continue;
        if (!first) try out.append(arena, '\n');
        first = false;
        try appendIrSubst(arena, &out, line);
    }
    return out.items;
 }
 /// Match the shell runner's `$(...)` capture, which strips trailing newlines
 /// from both expected and actual before comparing.
 fn trimNl(s: []const u8) []const u8 {
    return std.mem.trimEnd(u8, s, "\n");
 }
 /// bash `$?` convention: normal exit -> code; signal-terminated -> 128+signal.
 fn termCode(term: std.process.Child.Term) u32 {
    return switch (term) {
        .exited => |c| c,
        .signal, .stopped => |s| 128 + @as(u32, @intCast(@intFromEnum(s))),
        .unknown => |u| u,
    };
 }
 fn deadline(io: std.Io) std.Io.Timeout {
    const dur: std.Io.Clock.Duration = .{
        .raw = std.Io.Duration.fromSeconds(TIMEOUT_SECS),
        .clock = .awake,
    };
    return .{ .deadline = std.Io.Clock.Timestamp.fromNow(io, dur) };
 }
 fn readOptional(io: std.Io, gpa: std.mem.Allocator, abs_path: []const u8) ?[]u8 {
    return std.Io.Dir.readFileAlloc(.cwd(), io, abs_path, gpa, .limited(MAX_OUTPUT)) catch null;
 }
 /// Run every `<root>/expected/*.exit` test. Appends a formatted diagnostic to
 /// `failures` (owned by `fail_gpa`) for each mismatch. Returns the number of
 /// tests actually run (markers whose `.sx` is missing are skipped).
 fn sweepRoot(
    fail_gpa: std.mem.Allocator,
    io: std.Io,
    root_dir: []const u8,
    failures: *std.ArrayList([]const u8),
 ) !usize {
    // Repo root (parent of examples/ or issues/) is the child's cwd: relative
    // source paths land in diagnostics already-normalized, and tests/fixtures/
    // imports resolve here.
    const repo_root = std.fs.path.dirname(root_dir) orelse ".";
    const root_base = std.fs.path.basename(root_dir); // "examples" | "issues"
    var name_arena_state = std.heap.ArenaAllocator.init(fail_gpa);
    defer name_arena_state.deinit();
    const name_arena = name_arena_state.allocator();
    const expected_dir_path = try std.fs.path.join(name_arena, &.{ root_dir, "expected" });
    var dir = std.Io.Dir.openDirAbsolute(io, expected_dir_path, .{ .iterate = true }) catch return 0;
    defer dir.close(io);
    // Collect marker names first (entry.name is only valid until the next
    // iterate step; spawning subprocesses mid-iteration is asking for trouble).
    var names: std.ArrayList([]const u8) = .empty;
    var it = dir.iterate();
    while (try it.next(io)) |entry| {
        if (entry.kind == .directory) continue; // accept .file and .unknown d_type
        if (!std.mem.endsWith(u8, entry.name, ".exit")) continue;
        const name = entry.name[0 .. entry.name.len - ".exit".len];
        try names.append(name_arena, try name_arena.dupe(u8, name));
    }
    var work_state = std.heap.ArenaAllocator.init(fail_gpa);
    defer work_state.deinit();
    var ran: usize = 0;
    var skipped: usize = 0;
    var updated: usize = 0;
    for (names.items) |name| {
        _ = work_state.reset(.retain_capacity);
        const a = work_state.allocator();
        const sx_abs = try std.fs.path.join(a, &.{ root_dir, try std.fmt.allocPrint(a, "{s}.sx", .{name}) });
        std.Io.Dir.access(.cwd(), io, sx_abs, .{}) catch { // marker without source
            skipped += 1;
            std.debug.print("[corpus-run] skip {s} (no {s}.sx)\n", .{ name, name });
            continue;
        };
        ran += 1;
        const rel_path = try std.fmt.allocPrint(a, "{s}/{s}.sx", .{ root_base, name });
        const exp_dir = expected_dir_path;
        const exit_raw = readOptional(io, a, try std.fmt.allocPrint(a, "{s}/{s}.exit", .{ exp_dir, name })) orelse "";
        const out_raw = readOptional(io, a, try std.fmt.allocPrint(a, "{s}/{s}.stdout", .{ exp_dir, name })) orelse "";
        const err_raw = readOptional(io, a, try std.fmt.allocPrint(a, "{s}/{s}.stderr", .{ exp_dir, name })) orelse "";
        const ir_raw = readOptional(io, a, try std.fmt.allocPrint(a, "{s}/{s}.ir", .{ exp_dir, name }));
        // --- sx run ---
        const run_res = std.process.run(a, io, .{
            .argv = &.{ corpus_paths.sx_exe, "run", rel_path },
            .cwd = .{ .path = repo_root },
            .timeout = deadline(io),
        }) catch |err| {
            try failures.append(fail_gpa, try std.fmt.allocPrint(fail_gpa, "{s}: `sx run` {s}{s}", .{
                name,
                @errorName(err),
                 if (err == error.Timeout) std.fmt.comptimePrint(" (>{d}s)", .{TIMEOUT_SECS}) else "",
            }));
            continue;
        };
        const act_exit = termCode(run_res.term);
        const act_out = trimNl(try normalizeStd(a, run_res.stdout));
        const act_err = trimNl(try normalizeStd(a, run_res.stderr));
        // --- sx ir (only when a snapshot already exists; mirrors the shell's
        // `$has_ir` gate — update mode never CREATES new .ir files) ---
        var act_ir: ?[]const u8 = null;
        if (ir_raw != null) {
            const ir_res = std.process.run(a, io, .{
                .argv = &.{ corpus_paths.sx_exe, "ir", rel_path },
                .cwd = .{ .path = repo_root },
                .timeout = deadline(io),
            }) catch |err| {
                try failures.append(fail_gpa, try std.fmt.allocPrint(fail_gpa, "{s}: `sx ir` {s}", .{ name, @errorName(err) }));
                continue;
            };
            // `sx ir` writes IR to stdout; mirror the shell's `2>&1` by appending
            // stderr (empty for a clean dump).
            const merged = try std.fmt.allocPrint(a, "{s}{s}", .{ ir_res.stdout, ir_res.stderr });
            act_ir = trimNl(try normalizeIr(a, merged));
        }
        // --- update mode: overwrite snapshots with freshly-normalized output ---
        if (corpus_paths.update_goldens) {
            try writeGolden(io, a, exp_dir, name, "exit", try std.fmt.allocPrint(a, "{d}", .{act_exit}));
            try writeGolden(io, a, exp_dir, name, "stdout", act_out);
            try writeGolden(io, a, exp_dir, name, "stderr", act_err);
            if (act_ir) |ir| try writeGolden(io, a, exp_dir, name, "ir", ir);
            updated += 1;
            continue;
        }
        // --- verify against stored snapshot ---
        const exp_exit = std.fmt.parseInt(u32, std.mem.trim(u8, exit_raw, " \t\r\n"), 10) catch {
            try failures.append(fail_gpa, try std.fmt.allocPrint(fail_gpa, "{s}: unparseable expected exit '{s}'", .{ name, std.mem.trim(u8, exit_raw, " \t\r\n") }));
            continue;
        };
        const exp_out = trimNl(try normalizeStd(a, out_raw));
        const exp_err = trimNl(try normalizeStd(a, err_raw));
        var diag: std.ArrayList(u8) = .empty;
        if (act_exit != exp_exit)
            try diag.appendSlice(a, try std.fmt.allocPrint(a, "  exit: expected={d} actual={d}\n", .{ exp_exit, act_exit }));
        if (!std.mem.eql(u8, act_out, exp_out))
            try appendDiff(a, &diag, "stdout", exp_out, act_out);
        if (!std.mem.eql(u8, act_err, exp_err))
            try appendDiff(a, &diag, "stderr", exp_err, act_err);
        if (ir_raw) |ir_expected_raw| {
            const exp_ir = trimNl(try normalizeIr(a, ir_expected_raw));
            if (!std.mem.eql(u8, act_ir.?, exp_ir))
                try appendDiff(a, &diag, "IR", exp_ir, act_ir.?);
        }
        try recordIfFailed(fail_gpa, failures, name, diag.items);
    }
    if (skipped > 0)
        std.debug.print("[corpus-run] {s}: {d} marker(s) skipped (no matching .sx)\n", .{ root_base, skipped });
    if (corpus_paths.update_goldens)
        std.debug.print("[corpus-run] {s}: {d} snapshot(s) regenerated\n", .{ root_base, updated });
    return ran;
 }
 /// Overwrite `<exp_dir>/<name>.<ext>` with `content` + a trailing newline —
 /// matching the shell runner's `echo "$x" > file` (command substitution strips
 /// trailing newlines; echo re-adds exactly one). Update mode only.
 fn writeGolden(
    io: std.Io,
    a: std.mem.Allocator,
    exp_dir: []const u8,
    name: []const u8,
    ext: []const u8,
    content: []const u8,
 ) !void {
    const path = try std.fmt.allocPrint(a, "{s}/{s}.{s}", .{ exp_dir, name, ext });
    const data = try std.fmt.allocPrint(a, "{s}\n", .{content});
    try std.Io.Dir.writeFile(.cwd(), io, .{ .sub_path = path, .data = data });
 }
 fn recordIfFailed(
    fail_gpa: std.mem.Allocator,
    failures: *std.ArrayList([]const u8),
    name: []const u8,
    diag: []const u8,
 ) !void {
    if (diag.len == 0) return;
    try failures.append(fail_gpa, try std.fmt.allocPrint(fail_gpa, "{s}:\n{s}", .{ name, diag }));
 }
 const DIFF_CAP = 2000;
 fn appendDiff(a: std.mem.Allocator, diag: *std.ArrayList(u8), label: []const u8, expected: []const u8, actual: []const u8) !void {
    try diag.appendSlice(a, try std.fmt.allocPrint(a, "  --- {s}: expected ---\n{s}\n  --- {s}: actual ---\n{s}\n", .{
        label, cap(expected), label, cap(actual),
    }));
 }
 fn cap(s: []const u8) []const u8 {
    return if (s.len > DIFF_CAP) s[0..DIFF_CAP] else s;
 }
 fn reportFailures(label: []const u8, ran: usize, failures: []const []const u8) !void {
    std.debug.print("[corpus-run] {s}: {d} ran, {d} failed\n", .{ label, ran, failures.len });
    for (failures) |f| std.debug.print("FAIL {s}\n", .{f});
    if (failures.len > 0 and !corpus_paths.update_goldens) std.debug.print(
        \\
        \\  ── snapshot mismatch ──────────────────────────────────────────────
        \\  If the new output is CORRECT (intentional change), regenerate snapshots:
        \\      zig build test -Dupdate-goldens
        \\      git diff examples/expected/ issues/expected/   # review before committing
        \\  Otherwise this is a regression — fix the code, don't update the snapshot.
        \\  ───────────────────────────────────────────────────────────────────
        \\
    , .{});
    try std.testing.expect(failures.len == 0);
 }
 test "examples corpus: every examples/*.sx runs and matches its snapshot" {
    const io = test_io();
    var failures: std.ArrayList([]const u8) = .empty;
    defer failures.deinit(std.testing.allocator);
     // Registered before the sweep so diagnostics are freed even if sweepRoot errors midway.
     defer for (failures.items) |f| std.testing.allocator.free(f);
     const ran = try sweepRoot(std.testing.allocator, io, corpus_paths.examples_dir, &failures);
    try std.testing.expect(ran > 0);
    try reportFailures("examples", ran, failures.items);
 }
 test "issues corpus: every pinned issues/*.sx repro runs and matches its snapshot" {
    const io = test_io();
    var failures: std.ArrayList([]const u8) = .empty;
    defer failures.deinit(std.testing.allocator);
     // Registered before the sweep so diagnostics are freed even if sweepRoot errors midway.
     defer for (failures.items) |f| std.testing.allocator.free(f);
     const ran = try sweepRoot(std.testing.allocator, io, corpus_paths.issues_dir, &failures);
    try reportFailures("issues", ran, failures.items);
 }
--- a/src/ir/emit_llvm.zig
+++ b/src/ir/emit_llvm.zig
@@ -2917,7 +2917,11 @@ pub const LLVMEmitter = struct {
        const ir_str = c.LLVMPrintModuleToString(self.llvm_module);
        defer c.LLVMDisposeMessage(ir_str);
        const len = std.mem.len(ir_str);
-        std.debug.print("{s}\n", .{ir_str[0..len]});
+        // Write to fd 1 (stdout), not std.debug.print (stderr): `sx ir` is a
        // data-emitting command meant to be piped/redirected, so the IR text
        // belongs on stdout. Mirrors core.flushInterpOutput's raw-write route.
        _ = std.c.write(1, ir_str, len);
        _ = std.c.write(1, "\n", 1);
    }
    /// Emit the module as an object file to disk.
--- a/src/main.zig
+++ b/src/main.zig
@@ -520,7 +520,9 @@ fn dumpSxIR(allocator: std.mem.Allocator, io: std.Io, input_path: []const u8, st
    sx.ir.printModule(&ir_module, &aw.writer) catch return error.CompileError;
    var result = aw.writer.toArrayList();
    defer result.deinit(allocator);
-    std.debug.print("{s}", .{result.items});
+    // Emit to stdout (fd 1), not stderr: `ir-dump` is a data-emitting command
    // meant to be piped/redirected. Matches `sx ir`'s stdout routing.
    _ = std.c.write(1, result.items.ptr, result.items.len);
 }
 fn emitIR(allocator: std.mem.Allocator, io: std.Io, input_path: []const u8, target_config: sx.target.TargetConfig, stdlib_paths: []const []const u8) !void {
--- a/src/root.zig
+++ b/src/root.zig
@@ -17,6 +17,7 @@ pub const imports_tests = @import("imports.test.zig");
 pub const core = @import("core.zig");
 pub const c_import = @import("c_import.zig");
 pub const c_import_tests = @import("c_import.test.zig");
 pub const corpus_run_tests = @import("corpus_run.test.zig");
 pub const ir = @import("ir/ir.zig");
 pub const lsp = struct {
--- a/tests/run_examples.sh
+++ b/tests/run_examples.sh
@@ -31,6 +31,28 @@ if [[ "${1:-}" == "--update" ]]; then
    UPDATE=1
 fi
 # Per-test wall-clock guard. GNU `timeout` (or `gtimeout` from Homebrew
 # coreutils) kills a hung test after $TIMEOUT seconds. Neither ships on a
 # bare macOS, so degrade gracefully: when no timeout binary is found, run the
 # command directly (a hang then blocks the suite, but the suite still works).
 TIMEOUT_CMD=()
 if command -v timeout >/dev/null 2>&1; then
    TIMEOUT_CMD=(timeout "$TIMEOUT")
 elif command -v gtimeout >/dev/null 2>&1; then
    TIMEOUT_CMD=(gtimeout "$TIMEOUT")
 fi
 # Run a command under the timeout wrapper if one is available, else directly.
 # The length check (not "${arr[@]}") keeps this safe under bash 3.2 + `set -u`,
 # where expanding an empty array trips "unbound variable".
 run_sx() {
    if [[ ${#TIMEOUT_CMD[@]} -gt 0 ]]; then
        "${TIMEOUT_CMD[@]}" "$@"
    else
        "$@"
    fi
 }
 # Normalize stdout/stderr for snapshot diffing. Applied identically to both
 # expected and actual, so it can only reconcile location/host noise — never
 # desync an otherwise-matching pair. The path rule collapses any absolute
@@ -75,7 +97,7 @@ for root in "${ROOTS[@]}"; do
        fi
        printf "  %-48s" "$name"
-        actual_out=$(timeout "$TIMEOUT" "$SX" run "$sx_file" 2>"$TMP_ERR" | normalize)
+        actual_out=$(run_sx "$SX" run "$sx_file" 2>"$TMP_ERR" | normalize)
        actual_exit=${PIPESTATUS[0]}
        actual_err=$(normalize < "$TMP_ERR")
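
Note on the normalizers: the bodies of `normalize` / `normalize_ir` in run_examples.sh are not part of this diff, so the following is only a minimal sketch of the rules as the comments in src/corpus_run.test.zig describe them. The function names `normalize_sketch` and `normalize_ir_sketch` are hypothetical, and the script's path-collapse rule is deliberately left out because its exact form is not shown here. It may be useful as a reference when checking that the shell and Zig normalizers stay in lockstep.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the rules described in src/corpus_run.test.zig.
# The real normalize()/normalize_ir() in run_examples.sh may differ in detail.

# Collapse `0x` followed by 4-or-more lowercase hex digits to `0xADDR`,
# so heap/function addresses do not desync snapshots.
normalize_sketch() {
    sed -E 's/0x[0-9a-f]{4,}/0xADDR/g'
}

# Drop host-specific LLVM header/attribute lines, then collapse
# auto-suffixed temporaries (`%tmp17` -> `%tmpN`).
normalize_ir_sketch() {
    sed -E \
        -e '/^; ModuleID =/d' \
        -e '/^source_filename =/d' \
        -e '/^target datalayout =/d' \
        -e '/^target triple =/d' \
        -e '/^attributes #[0-9]+ = \{/d' \
        -e 's/%([a-z]+)[0-9]+/%\1N/g'
}

# Usage: short hex literals survive, long ones are collapsed.
printf 'ptr=0x7ffee3a1 n=0x1f\n' | normalize_sketch   # prints: ptr=0xADDR n=0x1f
```

Both functions read stdin and write stdout, so they can be applied identically to expected and actual output, which is the property the script's own comment relies on.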