test: run example corpus in zig build test; sx ir → stdout

`zig build test` now runs the full examples/ + issues/ regression corpus alongside the Zig unit tests, driven by a pure-Zig test (src/corpus_run.test.zig), with no shell script in the build path. It spawns the installed `sx` per example (subprocess-isolated, per-run timeout), diffs stdout/stderr/exit and optional `sx ir` snapshots, and fails the build on any mismatch. The file list is enumerated at runtime, so new examples are covered with no test edit.

- `sx ir` / `ir-dump` now write to stdout (fd 1) instead of stderr, so the dumps can be piped/redirected.
- `zig build test -Dupdate-goldens` regenerates snapshots in-build, byte-identical to the legacy `run_examples.sh --update`; on mismatch the runner prints how to regenerate.
- run_examples.sh kept (still used by tools/verify-step.sh) and made portable to a bare macOS: timeout/gtimeout fallback, bash 3.2-safe empty-array handling.
- CLAUDE.md: document the new workflow.
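
For readers unfamiliar with the portability fix mentioned above (timeout/gtimeout fallback plus bash 3.2-safe empty-array handling), the following is a minimal sketch of how such a guard is commonly written. It is not copied from `tests/run_examples.sh`: the names `timeout_cmd` and `run_guarded` and the 10-second limit are hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
set -u

# Prefer GNU `timeout`, fall back to Homebrew's `gtimeout`, and otherwise
# leave the array empty so commands run without a wall-clock guard.
# The 10-second limit is an illustrative assumption.
timeout_cmd=()
if command -v timeout >/dev/null 2>&1; then
  timeout_cmd=(timeout 10)
elif command -v gtimeout >/dev/null 2>&1; then
  timeout_cmd=(gtimeout 10)
fi

# Under `set -u`, bash 3.2 (the version shipped with macOS) treats
# "${arr[@]}" on an empty array as an unbound variable. The
# ${arr[@]+"${arr[@]}"} form expands to nothing when the array is empty
# and to the quoted elements otherwise.
run_guarded() {
  ${timeout_cmd[@]+"${timeout_cmd[@]}"} "$@"
}

run_guarded echo "guarded run ok"
```

Because the wrapper is prepended as an external command, `run_guarded` in this sketch can only guard executables found on `PATH`, not shell functions or builtins that have no standalone binary.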
2026-06-13 09:41:56 +03:00
parent 39488133c9
commit ab3c9202ff
7 changed files with 464 additions and 25 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -405,7 +405,10 @@ any can be advanced independently.
 After any code change:
 ```sh
 zig build                              # must compile
-zig build test                         # must pass
+zig build test                         # must pass — runs the Zig unit tests
+                                       # AND the full examples/ + issues/
+                                       # regression corpus (a failing example
+                                       # fails the build)
 ```

 After completing a phase's final step, run the phase's end-to-end verification command listed in `current/PLAN.md`.
@@ -415,9 +418,27 @@ After completing a phase's final step, run the phase's end-to-end verification c
 After any compiler change:

 1. **Build**: `zig build && zig build test`
-2. **Run regression tests**: `bash tests/run_examples.sh`
-   - Every test must show `ok` (currently 324)
-   - Zero failures, zero timeouts
+   - `zig build test` runs the unit tests **and** the example/issue corpus as
+     one suite — a failing example fails the build. The corpus is driven by a
+     pure-Zig test (`src/corpus_run.test.zig`) that spawns the installed `sx`
+     binary per example (subprocess-isolated, with a per-run timeout), so no
+     shell script is involved.
+2. **Regenerate snapshots**: `zig build test -Dupdate-goldens`
+   - Flips the corpus test to write each example's expected
+     `.exit`/`.stdout`/`.stderr` (+ `.ir` where one already exists) from
+     freshly-normalized output instead of asserting against it. This is the
+     preferred way to update snapshots — no shell script needed.
+   - A test is still keyed off its `expected/<name>.exit` marker, so seed an
+     empty marker first for a brand-new example (see "Adding a feature").
+3. **Standalone corpus run** (optional): `bash tests/run_examples.sh`
+   - Runs the corpus independent of `zig build test` (used by
+     `tools/verify-step.sh`). `--update` still regenerates snapshots and
+     produces byte-identical output to `-Dupdate-goldens`.
+   - Every test must show `ok` (currently 626); zero failures, zero timeouts.
+   - Uses GNU `timeout`/`gtimeout` when present (Homebrew coreutils on macOS)
+     and runs without a per-test wall-clock guard when neither is found.
+   - The two normalizers (`normalize`/`normalize_ir` in the script and the
+     mirrors in `src/corpus_run.test.zig`) must stay in lockstep.

 ### Test layout

@@ -445,12 +466,12 @@ dirs) under the same `XXXX-` prefix.

 ### Snapshot integrity

-**Never run `--update` while tests are failing.** The `--update` flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
+**Never regenerate snapshots while tests are failing.** `-Dupdate-goldens` (and the legacy `--update`) blindly overwrite expected output with whatever the compiler produces — including error messages. If you regenerate during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.

 Safe workflow:
-1. Fix the code until `bash tests/run_examples.sh` passes against the **existing** snapshots.
-2. Only run `--update` when you've intentionally changed output (new feature, new test, changed formatting).
-3. After `--update`, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.
+1. Fix the code until `zig build test` passes against the **existing** snapshots.
+2. Only run `zig build test -Dupdate-goldens` when you've intentionally changed output (new feature, new test, changed formatting).
+3. After regenerating, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.

 ### Adding a new language feature

@@ -461,19 +482,20 @@ There is no monolithic smoke file — each feature is its own focused example.
 2. Run it: `./zig-out/bin/sx run examples/XXXX-<category>-<name>.sx`
 3. Seed the marker and capture expected output:
   `: > examples/expected/XXXX-<category>-<name>.exit` then
-   `bash tests/run_examples.sh --update`
-4. Verify all tests still pass: `bash tests/run_examples.sh`
+   `zig build test -Dupdate-goldens`
+4. Verify all tests still pass: `zig build test`

 ### Test file roles

 | File | Purpose |
 |------|---------|
 | `examples/XXXX-category-name.sx` | Focused feature example — one feature per file. |
-| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `--update`. |
+| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `zig build test -Dupdate-goldens`. |
 | `examples/expected/XXXX-category-name.ir` | Optional `sx ir` snapshot — present only where lowering shape is locked. |
 | `issues/NNNN-slug.md` | Open-issue / bug-report writeup (mark RESOLVED in a banner when fixed; the `.md` stays). |
 | `issues/NNNN-slug.sx` (+ `issues/NNNN-slug/`) | The issue's minimal repro, co-located with the `.md`. A repro with an `issues/expected/NNNN-slug.exit` marker runs in the suite; unpinned ones don't. |
-| `tests/run_examples.sh` | Test runner. Scans `examples/` and `issues/`; compares stdout/stderr/exit (+ optional IR) per test. |
+| `src/corpus_run.test.zig` | The corpus runner inside `zig build test` — spawns `sx` per example, diffs stdout/stderr/exit (+ optional IR); regenerates snapshots under `-Dupdate-goldens`. |
+| `tests/run_examples.sh` | Standalone shell runner (used by `tools/verify-step.sh`); same compare + `--update` as the Zig test. |

 ### Unit test file convention

@@ -496,8 +518,8 @@ All Zig unit tests live in separate `*.test.zig` files alongside the source they
   open bug, `issues/NNNN-slug.{md,sx}` (repro co-located with the writeup).
 2. Run it: `./zig-out/bin/sx run <path>.sx`
 3. Seed the marker (`: > <root>/expected/<name>.exit`) and capture expected:
-   `bash tests/run_examples.sh --update`
-4. Verify: `bash tests/run_examples.sh`
+   `zig build test -Dupdate-goldens`
+4. Verify: `zig build test`

 ### Resolving an open issue

@@ -505,8 +527,8 @@ When a bug filed under `issues/NNNN-slug.{md,sx}` is fixed:

 1. Move the repro into the feature suite as a regression test:
   `git mv issues/NNNN-slug.sx examples/XXXX-<category>-<name>.sx`.
-2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with `--update`,
-   and review the diff.
+2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with
+   `zig build test -Dupdate-goldens`, and review the diff.
 3. Tighten the example's comment header to describe the feature (keep a one-line
   `Regression (issue NNNN)` note for provenance).
 4. Mark `issues/NNNN-slug.md` RESOLVED with a short banner (root cause + fix +