test: run example corpus in zig build test; sx ir → stdout

`zig build test` now runs the full examples/ + issues/ regression corpus
alongside the Zig unit tests, driven by a pure-Zig test
(src/corpus_run.test.zig) — no shell script in the build path. It spawns
the installed `sx` per example (subprocess-isolated, per-run timeout),
diffs stdout/stderr/exit and optional `sx ir` snapshots, and fails the
build on any mismatch. The file list is enumerated at runtime, so new
examples are covered with no test edit.

- `sx ir` / `ir-dump` now write to stdout (fd 1) instead of stderr, so
  the dumps can be piped/redirected.
- `zig build test -Dupdate-goldens` regenerates snapshots in-build,
  byte-identical to the legacy `run_examples.sh --update`; on mismatch
  the runner prints how to regenerate.
- run_examples.sh kept (still used by tools/verify-step.sh) and made
  portable to a bare macOS: timeout/gtimeout fallback, bash 3.2-safe
  empty-array handling.
- CLAUDE.md: document the new workflow.
agra
2026-06-13 09:41:56 +03:00
parent 39488133c9
commit ab3c9202ff
7 changed files with 464 additions and 25 deletions


@@ -405,7 +405,10 @@ any can be advanced independently.
After any code change:
```sh
zig build # must compile
-zig build test # must pass
+zig build test # must pass — runs the Zig unit tests
+# AND the full examples/ + issues/
+# regression corpus (a failing example
+# fails the build)
```
After completing a phase's final step, run the phase's end-to-end verification command listed in `current/PLAN.md`.
@@ -415,9 +418,27 @@ After completing a phase's final step, run the phase's end-to-end verification c
After any compiler change:
1. **Build**: `zig build && zig build test`
-2. **Run regression tests**: `bash tests/run_examples.sh`
-- Every test must show `ok` (currently 324)
-- Zero failures, zero timeouts
+- `zig build test` runs the unit tests **and** the example/issue corpus as
+one suite — a failing example fails the build. The corpus is driven by a
+pure-Zig test (`src/corpus_run.test.zig`) that spawns the installed `sx`
+binary per example (subprocess-isolated, with a per-run timeout), so no
+shell script is involved.
+2. **Regenerate snapshots**: `zig build test -Dupdate-goldens`
+- Flips the corpus test to write each example's expected
+`.exit`/`.stdout`/`.stderr` (+ `.ir` where one already exists) from
+freshly-normalized output instead of asserting against it. This is the
+preferred way to update snapshots — no shell script needed.
+- A test is still keyed off its `expected/<name>.exit` marker, so seed an
+empty marker first for a brand-new example (see "Adding a feature").
+3. **Standalone corpus run** (optional): `bash tests/run_examples.sh`
+- Runs the corpus independent of `zig build test` (used by
+`tools/verify-step.sh`). `--update` still regenerates snapshots and
+produces byte-identical output to `-Dupdate-goldens`.
+- Every test must show `ok` (currently 626); zero failures, zero timeouts.
+- Uses GNU `timeout`/`gtimeout` when present (Homebrew coreutils on macOS)
+and runs without a per-test wall-clock guard when neither is found.
+- The two normalizers (`normalize`/`normalize_ir` in the script and the
+mirrors in `src/corpus_run.test.zig`) must stay in lockstep.
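The `timeout`/`gtimeout` fallback described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in `tests/run_examples.sh`: the function names `pick_timeout` and `run_guarded` are hypothetical, and the sketch sticks to POSIX sh.

```sh
# Hypothetical helpers; not taken from tests/run_examples.sh.

# Print the first available timeout command (GNU `timeout`, or
# `gtimeout` from Homebrew coreutils). Returns non-zero if neither exists.
pick_timeout() {
    for candidate in timeout gtimeout; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Run a command under a wall-clock limit (in seconds) when a timeout
# tool is available; otherwise run it without a guard.
run_guarded() {
    limit=$1
    shift
    if tool=$(pick_timeout); then
        "$tool" "$limit" "$@"
    else
        "$@"
    fi
}
```

A call such as `run_guarded 10 ./zig-out/bin/sx run <path>.sx` would then behave the same on a machine with or without coreutils, minus the time limit in the latter case.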
### Test layout
@@ -445,12 +466,12 @@ dirs) under the same `XXXX-` prefix.
### Snapshot integrity
-**Never run `--update` while tests are failing.** The `--update` flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
+**Never regenerate snapshots while tests are failing.** `-Dupdate-goldens` (and the legacy `--update`) blindly overwrite expected output with whatever the compiler produces — including error messages. If you regenerate during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
Safe workflow:
-1. Fix the code until `bash tests/run_examples.sh` passes against the **existing** snapshots.
-2. Only run `--update` when you've intentionally changed output (new feature, new test, changed formatting).
-3. After `--update`, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.
+1. Fix the code until `zig build test` passes against the **existing** snapshots.
+2. Only run `zig build test -Dupdate-goldens` when you've intentionally changed output (new feature, new test, changed formatting).
+3. After regenerating, review the diff (`git diff examples/expected/ issues/expected/`) to confirm no error messages or empty output were captured.
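As a rough aid for the review in step 3, a crude heuristic scan might look like the sketch below. It is an assumption-laden illustration, not part of the repository: the function name `flag_suspect_snapshots` is hypothetical, and some examples may legitimately have empty stdout or the word "error" in stderr, so the output is only a list of files worth a second look alongside `git diff`.

```sh
# Hypothetical review helper; a heuristic, not a replacement for `git diff`.
# Flags snapshot files in one expected/ directory that look like they may
# have captured a broken state: empty .stdout files and .stderr files
# that mention "error" (case-insensitive).
flag_suspect_snapshots() {
    dir=$1
    for f in "$dir"/*.stdout; do
        [ -e "$f" ] || continue          # glob matched nothing
        [ -s "$f" ] || echo "empty: $f"  # zero-byte stdout snapshot
    done
    for f in "$dir"/*.stderr; do
        [ -e "$f" ] || continue
        if grep -qi 'error' "$f"; then
            echo "error text: $f"
        fi
    done
}
```

It could be run once per snapshot directory, for example `flag_suspect_snapshots examples/expected` and `flag_suspect_snapshots issues/expected`.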
### Adding a new language feature
@@ -461,19 +482,20 @@ There is no monolithic smoke file — each feature is its own focused example.
2. Run it: `./zig-out/bin/sx run examples/XXXX-<category>-<name>.sx`
3. Seed the marker and capture expected output:
`: > examples/expected/XXXX-<category>-<name>.exit` then
-`bash tests/run_examples.sh --update`
-4. Verify all tests still pass: `bash tests/run_examples.sh`
+`zig build test -Dupdate-goldens`
+4. Verify all tests still pass: `zig build test`
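The marker-seeding step can be wrapped in a small helper so that an existing marker is never truncated by accident. This is a sketch under assumptions: `seed_marker` is a hypothetical name, and creating the `expected/` directory on demand is a convenience added here rather than something the workflow above requires.

```sh
# Hypothetical helper; equivalent to `: > <root>/expected/<name>.exit`,
# but refuses to overwrite a marker that already exists.
seed_marker() {
    root=$1   # e.g. examples or issues
    name=$2   # e.g. XXXX-<category>-<name>
    marker="$root/expected/$name.exit"
    mkdir -p "$root/expected"
    if [ -e "$marker" ]; then
        echo "marker already exists: $marker" >&2
        return 1
    fi
    : > "$marker"   # create an empty marker file
}
```

After `seed_marker examples XXXX-<category>-<name>` succeeds, `zig build test -Dupdate-goldens` captures the expected output as described in step 3.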
### Test file roles
| File | Purpose |
|------|---------|
| `examples/XXXX-category-name.sx` | Focused feature example — one feature per file. |
-| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `--update`. |
+| `examples/expected/XXXX-category-name.{exit,stdout,stderr}` | Expected exit code + the two output streams. Regenerate with `zig build test -Dupdate-goldens`. |
| `examples/expected/XXXX-category-name.ir` | Optional `sx ir` snapshot — present only where lowering shape is locked. |
| `issues/NNNN-slug.md` | Open-issue / bug-report writeup (mark RESOLVED in a banner when fixed; the `.md` stays). |
| `issues/NNNN-slug.sx` (+ `issues/NNNN-slug/`) | The issue's minimal repro, co-located with the `.md`. A repro with an `issues/expected/NNNN-slug.exit` marker runs in the suite; unpinned ones don't. |
-| `tests/run_examples.sh` | Test runner. Scans `examples/` and `issues/`; compares stdout/stderr/exit (+ optional IR) per test. |
+| `src/corpus_run.test.zig` | The corpus runner inside `zig build test` — spawns `sx` per example, diffs stdout/stderr/exit (+ optional IR); regenerates snapshots under `-Dupdate-goldens`. |
+| `tests/run_examples.sh` | Standalone shell runner (used by `tools/verify-step.sh`); same compare + `--update` as the Zig test. |
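The per-example comparison that both runners perform can be pictured with the simplified sketch below. It is not the code of either runner: `check_example` is a hypothetical name, the `.exit` file is assumed to hold the decimal exit code followed by a newline, and the output normalization and optional `.ir` comparison are omitted.

```sh
# Hypothetical sketch of a single corpus check; normalization is omitted.
# Usage: check_example <expected-dir> <name> <command> [args...]
check_example() {
    expected_dir=$1
    name=$2
    shift 2
    tmp=$(mktemp -d) || return 2

    # Capture both streams and the exit code of the command under test.
    code=0
    "$@" >"$tmp/stdout" 2>"$tmp/stderr" || code=$?
    printf '%s\n' "$code" >"$tmp/exit"

    # Compare each captured artifact with its expected snapshot.
    status=0
    for stream in exit stdout stderr; do
        if ! cmp -s "$expected_dir/$name.$stream" "$tmp/$stream"; then
            echo "MISMATCH $name.$stream"
            status=1
        fi
    done

    rm -rf "$tmp"
    return "$status"
}
```

For instance, `check_example examples/expected XXXX-category-name ./zig-out/bin/sx run examples/XXXX-category-name.sx` would return non-zero and name the differing stream on any mismatch.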
### Unit test file convention
@@ -496,8 +518,8 @@ All Zig unit tests live in separate `*.test.zig` files alongside the source they
open bug, `issues/NNNN-slug.{md,sx}` (repro co-located with the writeup).
2. Run it: `./zig-out/bin/sx run <path>.sx`
3. Seed the marker (`: > <root>/expected/<name>.exit`) and capture expected:
-`bash tests/run_examples.sh --update`
-4. Verify: `bash tests/run_examples.sh`
+`zig build test -Dupdate-goldens`
+4. Verify: `zig build test`
### Resolving an open issue
@@ -505,8 +527,8 @@ When a bug filed under `issues/NNNN-slug.{md,sx}` is fixed:
1. Move the repro into the feature suite as a regression test:
`git mv issues/NNNN-slug.sx examples/XXXX-<category>-<name>.sx`.
-2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with `--update`,
-and review the diff.
+2. Seed `examples/expected/XXXX-<category>-<name>.exit`, capture with
+`zig build test -Dupdate-goldens`, and review the diff.
3. Tighten the example's comment header to describe the feature (keep a one-line
`Regression (issue NNNN)` note for provenance).
4. Mark `issues/NNNN-slug.md` RESOLVED with a short banner (root cause + fix +