The chess panel-text regression (text vanished after the first move on macOS) had a single root cause: GlyphCache's entries List, hash table, and shaped_buf grew through `context.allocator`, which during render is the per-frame arena. On the next arena reset the backing died, and subsequent glyph lookups read garbage / wrote into freshly-allocated view-tree memory.
Fix is shaped as the user proposed: `List(T)`'s mutations take an optional trailing `alloc: Allocator = context.allocator` argument. No allocator stored on the container, no init ceremony, every existing `list.append(item)` callsite keeps working unchanged. Long-lived owners now write `list.append(item, self.parent_allocator)` and the arena-leak bug becomes impossible to write accidentally.
Default-arg substitution previously only fired for identifier callees (`expandCallDefaults` at lower.zig:7978). Extended to the generic struct-method dispatch path (`list.append(...)` lands here) via a new `appendDefaultArgs` helper that lowers fd.params[i].default_expr in the caller's scope and appends to the lowered args slice.
Long-lived owners updated to capture `parent_allocator: Allocator` at init and use it for every internal growth:
- GlyphCache (the chess bug): entries, shaped_buf, hash_keys, hash_vals, atlas bitmap.
- DockInteraction: drops the existing `push Context` workaround in `ensure_capacity` for the explicit-arg form.
- StateStore: entries list + per-entry data buffer.
- Gles3Gpu, MetalGPU: shaders, buffers, textures (atlas-grow during render would otherwise leak resources into the frame arena).
Also kept: an operator-precedence fix in pipeline.sx (`(self.frame_index & 1) == 0` instead of `self.frame_index & 1 == 0`, which parses as `self.frame_index & (1 == 0)` = always 0). That was a stealth single-arena-only bug that masked the GlyphCache one for a long time.
Docs:
- specs.md §11 documents `param: T = expr` default parameter values. The parser already supported it; it is now formalised in the spec.
- current/CHECKPOINT-MEM.md logs the change.
- CLAUDE.md REJECTED PATTERNS gains a "Long-lived containers growing through context.allocator" section with the `parent_allocator` capture template and the list of existing examples to mirror.
155/155 example tests pass, with zero diff against snapshots since every existing callsite still resolves to `context.allocator`.
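The precedence trap behind the pipeline.sx fix is not unique to sx: shell arithmetic `$(( ))` follows the C rules, where `==` also binds tighter than `&`. The sketch below only illustrates the parse; the function names are hypothetical and it is not the pipeline.sx code.

```shell
# Buggy shape: `==` binds tighter than `&`, so this evaluates
# n & (1 == 0), i.e. n & 0, which is 0 for every n.
is_even_buggy() { echo $(( $1 & 1 == 0 )); }

# Fixed shape: parenthesise the mask before comparing.
is_even_fixed() { echo $(( ($1 & 1) == 0 )); }

for n in 2 3; do
    echo "n=$n buggy=$(is_even_buggy "$n") fixed=$(is_even_fixed "$n")"
done
```

The buggy form never reports an even frame, which matches the "always 0" behaviour the commit describes.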
sx compiler — session instructions
IMPASSIBLE RULES — no exceptions
When you hit a sx compiler bug during normal work
STOP. File the issue. Wait for a fix in another session. Do NOT work around the bug. Do NOT continue with adjacent work. Do NOT land code that depends on the not-yet-fixed behaviour.
Procedure:
- Create issues/NNNN-slug.md (next free NNNN, see existing issues/).
- The file must contain:
  - Symptom: one-line summary + observed vs expected.
  - Reproduction: minimal sx code (inline fenced block). Must reproduce the bug standalone, no project dependencies beyond modules/std.sx / modules/allocators.sx.
  - Investigation prompt: a ready-to-paste prompt the user can drop into a fresh session to fix the bug. Should include: the suspected area of the compiler (file + function), what the fix likely needs to do, and the verification step (run the repro, expect new output).
- Update current/CHECKPOINT-MEM.md (or the relevant stream's checkpoint): mark ## Current state as BLOCKED on issue NNNN.
- Tell the user: bug filed at issues/NNNN-slug.md, work paused pending fix.
- STOP. Do not migrate, do not refactor, do not look for an alternative path. The session ends until the user returns with confirmation the fix landed.
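Picking the "next free NNNN" can be scripted. This is a minimal sketch under two assumptions not stated in the text: numbers are four-digit and zero-padded, and "next free" means highest existing number plus one. The helper name `next_issue_number` is hypothetical.

```shell
# Print the next issue number for a directory of NNNN-slug.md files.
# The body runs in a subshell, so its variables stay local.
next_issue_number() (
    dir="${1:-issues}"
    max=0
    for f in "$dir"/[0-9][0-9][0-9][0-9]-*.md; do
        [ -e "$f" ] || continue        # glob matched nothing
        num="${f##*/}"                 # strip the directory
        num="${num%%-*}"               # keep the NNNN prefix
        # Strip leading zeros so "0042" is never read as octal.
        while [ "${num#0}" != "$num" ]; do num="${num#0}"; done
        num="${num:-0}"
        if [ "$num" -gt "$max" ]; then max="$num"; fi
    done
    printf '%04d\n' "$((max + 1))"
)
```

Usage: `next_issue_number issues` prints the number to use in the new file name.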
This rule is uncontestable. None of the following override the stop:
- "But the workaround is small."
- "But this just affects one file."
- "But I can route around it."
- "But the bug is pre-existing — not introduced by my change."
- "But it doesn't block what I was just doing — only future work."
- "But I already finished the step I was on; this is adjacent."
- "But the fix is obviously safe to land alongside."
If filing an issue is on the table, the answer is STOP. Period. Do not weigh "blocking vs non-blocking". Do not bundle the issue filing with continued work in the same session. Do not finalise the checkpoint, regen snapshots, or move to the next phase after filing. Filing the issue IS the last action of the session.
Every workaround during execution becomes technical debt that hides the bug from the next person who hits it. Every "I'll just keep going since it's pre-existing" leaves the next session believing the bug is gated behind their work and not yours. The cost of stopping is one paused session; the cost of working around — or "just finishing this last thing" — is permanent silent fragility.
If you genuinely think the bug is not a bug (e.g. you've misunderstood specs.md), STILL file the issue with that hypothesis in the prompt — let the fix session confirm or correct.
The two acceptable actions after recognising a compiler bug:
1. File issues/NNNN-slug.md, mark the checkpoint BLOCKED, stop.
2. If the bug surfaced AFTER you've already shipped a complete step (built + tests pass + checkpoint logged), still file as in (1) and still stop: do not roll forward into the next step.
There is no "wrap up first" option.
REJECTED PATTERNS — never generate these
Silent fallback defaults in the compiler
❌ Forbidden: returning a "reasonable-looking" default value when a lookup fails in the compiler. Examples of the pattern to root out:
// NEVER write this — lookup fails, return s64 and pretend nothing
// happened. Any caller asking `what type is this?` gets a lie.
return self.module.types.findByName(name_id) orelse .s64;
// NEVER write this — same shape, dressed up:
return scope.lookup(name) orelse default_type;
return resolved_field orelse .void;
These defaults silently produce wrong results in cases the implementer
didn't think of. The classic failure mode: the default coincidentally
matches the size/shape of one common case, so the test suite passes
and the bug ships invisibly. issue-0042 lived for years because
resolveTypeArg's orelse .s64 returned 8 bytes for unresolved
type-alias names — coincidentally correct for any 8-byte target
(s64, *T, f64, function pointers), and silently wrong for
everything else.
✅ Required: when a lookup that must succeed fails, emit a
diagnostic via self.diagnostics.addFmt(.err, span, "...", .{...})
and return the most clearly-broken sentinel the calling code can
survive (e.g. .void, a Ref.none, or via a ?T return that forces
the caller to handle the null). Errors must surface to the user as
text, not as a silently-corrupted size or alignment.
If you find an existing default-return in the compiler that swallows a lookup failure, treat it as a discovered bug — file an issue per the IMPASSIBLE RULES above, do not just delete the default in place without surfacing what it was hiding.
Silent unimplemented arms (catch-all else branches)
❌ Forbidden: a switch / if-chain over a Value tag, Op variant,
TypeId, etc. whose else branch silently does the wrong thing —
returns the input unchanged, returns a zero/null/undef default, picks
one common width and writes that many bytes, swallows an error into
.void_val so the caller fills a zero-init const, etc. The pattern
is identical in spirit to the silent-fallback-defaults rule above:
a case the implementer didn't think of falls through to behaviour
that looks like it worked but corrupts the data downstream.
Examples of patterns we've burned ourselves on:
// NEVER write this — `.int` value at a raw destination, write 8
// bytes regardless of actual IR type. Silently clobbers neighbors
// when the destination is sub-8.
.int => |v| {
    const bytes = std.mem.toBytes(v);
    @memcpy(dst[0..bytes.len], &bytes);
},
// NEVER write this — `.deref` of anything but slot_ptr passes the
// value through unchanged. Looks like a successful deref to callers;
// silently wrong for raw pointers.
.deref => |u| switch (frame.getRef(u.operand)) {
    .slot_ptr => |s| return frame.loadSlot(s),
    else => return val, // ← silently wrong
},
// NEVER write this — comptime init error becomes void_val, which the
// LLVM emitter happily turns into a zero-init constant. The user sees
// the const evaluating to 0 with no diagnostic.
const result = interp.call(func_id, &.{}) catch .void_val;
✅ Required: either (1) implement the arm correctly in the same
step as the one that introduces the new shape, or (2) bail loudly
with a one-line diagnostic that names the specific case. For the
interp we have bailDetail(comptime msg) which sets
Interpreter.last_bail_detail so the host diagnostic surfaces "op=X:
" instead of a bare CannotEvalComptime. Mirror the same
pattern in any new evaluator / interpreter / serializer.
Preferred order: implement the arm. Only fall back to "bail loudly" when the implementation requires plumbing that's out of scope for the current step. In that case, leave a one-line comment explaining what would be needed to implement it properly — so the next person hitting the diagnostic has a head-start.
If a path requires width / type / layout information that isn't
threaded into the IR op yet, prefer to add the field to the op
struct (Store.val_ty-style) over leaving an "8 bytes assumed"
shortcut. The field plumbing is a one-time cost; silent-clobber
debugging is forever.
When in doubt: else => return bailDetail("clear one-line reason")
beats else => unreachable beats else => /* hope */.
Allocator construction
❌ Forbidden: the "caller provides storage" pattern (in any form):
// NEVER write this — explicit @ptr:
g_gpa : GPA = ---;
alloc := GPA.create(@g_gpa);
// NEVER write this — UFCS-disguised same pattern:
gpa_state : GPA = .{ alloc_count = 0 };
gpa := gpa_state.create();
// NEVER write this — in-place init on a struct field:
self.arena_a.create(parent, size);
❌ Also forbidden: wrapping an init result through a cast just
to bind a typed pointer you already have:
// NEVER write this — tracker is already *TrackingAllocator:
tracker := TrackingAllocator.init(context.allocator);
t : *TrackingAllocator = xx tracker; // redundant rename
t.report();
✅ Required: init returns the concrete typed pointer (*T);
caller casts xx ptr to Allocator only at use sites that need the
protocol value.
gpa := GPA.init(); // *GPA
arena := Arena.init(xx gpa, 4096); // *Arena ; xx gpa → Allocator for parent
tracker := TrackingAllocator.init(context.allocator); // *TrackingAllocator
push Context.{ allocator = xx tracker, data = null } { ... }
print("gpa allocs: {}\n", gpa.alloc_count); // direct field access
tracker.report(); // direct method call
arena.reset(); // direct method call
The rule exists because:
- create returning Allocator forces an instance() accessor or a cast-back to recover the typed pointer (extra step every time).
- Caller-storage patterns are verbose, error-prone (easy to pass the wrong @ptr), and an artifact of an earlier allocator design.
- init returning *T matches Zig conventions and lets the caller decide where/how to cast to Allocator.
See current/CHECKPOINT-MEM.md ISSUE-MEM-005 for the migration
history. If an existing allocator type still uses the old create
pattern, migrate it OR ask the user — never propagate the pattern
in new code, docstrings, examples, or tests.
Long-lived containers growing through context.allocator
❌ Forbidden: a struct that outlives any single
push Context { ... } scope (caches, persistent UI state, GPU
resource tables, anything-accessed-across-frame-boundaries) appending
to or growing an internal List/hash/buffer using whatever
context.allocator happens to be at the call site:
GlyphCache :: struct {
    entries: List(GlyphEntry);
    // ...
    rasterize :: (self: *GlyphCache, ...) {
        // BAD: during render, context.allocator is a per-frame arena,
        // so the entries List backing dies on the next arena reset.
        self.entries.append(entry);
    }
}
The chess panel-text bug (text vanished after the first move) was
exactly this shape: GlyphCache.entries, hash_keys, hash_vals,
and shaped_buf all grew through the per-frame arena.
✅ Required: capture the long-lived allocator at init time on a
parent_allocator: Allocator field, and forward it explicitly to
every internal growth point. List(T) mutations take an optional
trailing alloc: Allocator = context.allocator, so the call site
just names the owner:
GlyphCache :: struct {
    entries: List(GlyphEntry);
    parent_allocator: Allocator;
    init :: (self: *GlyphCache, ...) {
        self.parent_allocator = context.allocator; // libc / GPA at init
        // ...
    }
    rasterize :: (self: *GlyphCache, ...) {
        // GOOD: entries always grows through the long-lived owner,
        // regardless of who's pushed what context above us.
        self.entries.append(entry, self.parent_allocator);
    }
}
Heuristic for "is this struct long-lived?" — if its init is called
once at startup (or once per logical instance) and its methods are
called from a frame/event/render hot path, it's long-lived. Capture
parent_allocator and use it for every internal growth call.
The same applies to direct context.allocator.alloc(...) /
.dealloc(...) inside such structs — replace with
self.parent_allocator.alloc(...) / .dealloc(...).
Existing examples of this pattern (use as templates):
- library/modules/ui/glyph_cache.sx: atlas, hash table, entries, shaped_buf.
- library/modules/ui/dock.sx: DockInteraction's nine per-child Lists.
- library/modules/ui/state.sx: StateStore.entries.
- library/modules/gpu/gles3.sx, library/modules/gpu/metal.sx: shaders / buffers / textures.
- library/modules/ui/pipeline.sx: UIPipeline (used for arena parents).
RenderTree.nodes in pipeline.sx is the opposite case — it's
intentionally per-frame arena-allocated and gets its items field
zeroed at the top of tick_with_body. Don't migrate that one.
On every session start
Three active workstreams run in parallel — IR (the language compiler), FFI (Obj-C / JNI ceremony reduction), and MEM (memory module overhaul, mem.sx + protocol expansion). They touch mostly disjoint files; any can be advanced independently.
- Read all three checkpoints to see where each stream is paused:
  - current/CHECKPOINT.md: IR progress tracker.
  - current/CHECKPOINT-FFI.md: FFI progress tracker.
  - current/CHECKPOINT-MEM.md: MEM progress tracker + issues log.
- Read the plan that corresponds to the stream the user wants to advance:
  - current/PLAN.md: IR implementation plan.
  - current/PLAN-FFI.md: FFI ceremony reduction plan.
  - ~/.claude/plans/tidy-doodling-cray.md: MEM (mem.sx) implementation plan.
- Read specs.md if you need to understand language behavior.
- Pick up from the next incomplete step in the relevant CHECKPOINT*.md. If the user hasn't said which stream to work on, ask before picking.
Note: implementation_plan.md is the archive of completed work (closures, protocols, auto type erasure, init blocks). Do NOT pick up unchecked items from it: those are on hold until the IR work is done.
While working
- Work on one step at a time. Complete it fully before moving on.
- After completing a step, immediately update the relevant checkpoint (current/CHECKPOINT.md for IR, current/CHECKPOINT-FFI.md for FFI):
  - Update ## Last completed step with the step you just finished.
  - Update ## Current state with what exists now.
  - Update ## Next step with what comes next.
  - Add a log entry under ## Log.
- If a step fails or you get stuck:
  - Add the issue to ## Known issues in the relevant checkpoint.
  - Do NOT skip the step: fix the blocker or ask the user.
- If you make a design decision not already in specs.md, add it to ## Decisions Log in implementation_plan.md.
- FFI cadence rule (from current/PLAN-FFI.md): no commit may both add a test AND make it pass. Either lock in current behavior with a passing test, or land an expected-failing test that the very next commit turns green.
IR-specific rules (from current/PLAN.md)
- Never modify src/codegen.zig in Phases 0–1. It is the safety net.
- In Phase 3, only read specific sections of codegen.zig (grep for the relevant handler).
- No step should require reading more than ~1,000 lines of existing code. If it does, split it.
- No step should produce more than ~500 lines of new code. If it does, split it.
- If Claude gets confused mid-step, stop, update current/CHECKPOINT.md with partial progress, and tell the user to start a new session.
Context management
- Each step is scoped so it can be completed in a single session without exhausting context.
- If you're running low on context, stop at the current step boundary, update current/CHECKPOINT.md with your progress, and tell the user to start a new session.
- Never try to complete multiple phases in one session unless each is small.
Build and verify
After any code change:
zig build # must compile
zig build test # must pass
After completing a phase's final step, run the phase's end-to-end verification command listed in current/PLAN.md.
Testing
After any compiler change:
- Build: zig build && zig build test
- Run regression tests: bash tests/run_examples.sh
  - All tests must show ok
  - Zero failures, zero timeouts
Snapshot integrity
Never run --update while tests are failing. The --update flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.
Safe workflow:
- Fix the code until bash tests/run_examples.sh passes against the existing snapshots.
- Only run --update when you've intentionally changed output (new feature, new test, changed formatting).
- After --update, review the diff (git diff tests/expected/) to confirm no error messages or empty output were captured.
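The post-update review can be partly mechanised. The sketch below flags snapshot files that are empty or mention "error" so a human can inspect them. The helper name `suspicious_snapshots` and the `error` pattern are assumptions, not part of the test runner, and an example that prints an error on purpose will be flagged too.

```shell
# List snapshot files that deserve a second look after --update.
# Exits non-zero if anything was flagged. Runs in a subshell so
# its variables stay local.
suspicious_snapshots() (
    dir="${1:-tests/expected}"
    status=0
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue        # glob matched nothing
        if [ ! -s "$f" ]; then
            echo "EMPTY $f"            # zero-byte expected output
            status=1
        elif grep -qi 'error' "$f"; then
            echo "ERROR $f"            # possible captured diagnostic
            status=1
        fi
    done
    exit "$status"
)
```

This supplements the git diff review rather than replacing it: a wrong but plausible-looking output passes this check.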
Adding a new language feature
When implementing a new feature:
- Add test case(s) to examples/50-smoke.sx in the appropriate section.
- Run ./zig-out/bin/sx run examples/50-smoke.sx to verify it works.
- Regenerate expected output: bash tests/run_examples.sh --update
- Verify all tests still pass: bash tests/run_examples.sh
Test file roles
| File | Purpose |
|---|---|
| examples/50-smoke.sx | Comprehensive feature coverage (~200 tests). Add new features here. |
| examples/NN-name.sx | Focused feature examples (e.g. 01-basic.sx, 52-frameworks.sx). Created either fresh or by renaming a resolved issue-*.sx once its bug is fixed. |
| examples/issue-NNNN.sx | Open bug repros. Each file is a literal item on the open-issue list, named after the issue number. When the bug is fixed, rename the file (and its tests/expected/issue-NNNN.{txt,exit}) to a focused feature example so the issue-* namespace shrinks to exactly the unresolved set. A file without a matching tests/expected/issue-NNNN.txt is an open issue that hasn't been pinned in the test suite yet. |
| tests/expected/*.txt | Expected output per example. Regenerate with --update. |
| tests/expected/*.exit | Expected exit codes. Auto-generated with --update. |
| tests/run_examples.sh | Test runner. Compares actual vs expected output. |
Unit test file convention
All Zig unit tests live in separate *.test.zig files alongside the source they test:
| Source file | Test file |
|---|---|
| src/ir/types.zig | src/ir/types.test.zig |
| src/ir/interp.zig | src/ir/interp.test.zig |
| src/ir/lower.zig | src/ir/lower.test.zig |
| ... | ... |
- Never put test blocks directly in source files. All tests go in the corresponding .test.zig file.
- Each .test.zig file must be imported in the barrel file (src/ir/ir.zig) so zig build test discovers them via refAllDecls.
- When adding a new source file that needs tests, create the .test.zig file and add pub const foo_tests = @import("foo.test.zig"); to the barrel.
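A forgotten barrel import means the tests silently never run, so it may be worth checking for. This sketch lists .test.zig files in one directory that the barrel does not @import. The helper name `unimported_tests` is hypothetical, and it assumes the import is written with the bare file name, as in the foo.test.zig example above.

```shell
# Print every *.test.zig in a directory that the barrel file does not
# import. Exits non-zero if any are missing. Runs in a subshell so its
# variables stay local.
unimported_tests() (
    dir="${1:-src/ir}"
    barrel="${2:-$dir/ir.zig}"
    status=0
    for f in "$dir"/*.test.zig; do
        [ -e "$f" ] || continue        # glob matched nothing
        name="${f##*/}"
        # Fixed-string match on the exact import expression.
        if ! grep -qF "@import(\"$name\")" "$barrel"; then
            echo "$name"
            status=1
        fi
    done
    exit "$status"
)
```

Usage: run `unimported_tests` from the repository root before `zig build test`. Empty output means every test file in src/ir is wired in.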
Creating a new standalone test
- Create examples/NN-name.sx (focused feature example) or examples/issue-NNNN.sx (open-bug repro).
- Run it: ./zig-out/bin/sx run examples/<file>.sx
- Create expected output: bash tests/run_examples.sh --update
- Verify: bash tests/run_examples.sh
Resolving an open issue
When a bug filed as examples/issue-NNNN.sx is fixed, the file should leave the issue-namespace:
1. Pick a focused feature name with the next free number, e.g. examples/52-frameworks.sx.
2. git mv examples/issue-NNNN.sx examples/NN-name.sx.
3. git mv tests/expected/issue-NNNN.txt tests/expected/NN-name.txt (and the .exit file).
4. Tighten the comment header to describe the feature (drop the issue-NNNN provenance, which lives in git history now).
5. Run the suite to confirm nothing else broke.
The examples/issue-* glob after step 5 is the literal list of what's still open.
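Because the glob is the open-issue list, a small helper can print it along with whether each repro has been pinned with an expected-output file. This is a minimal sketch: the name `list_open_issues` and the pinned/unpinned labels are hypothetical, and the directory defaults follow the paths used in this file.

```shell
# Print each open issue repro and whether it is pinned in the suite.
# Runs in a subshell so its variables stay local.
list_open_issues() (
    examples="${1:-examples}"
    expected="${2:-tests/expected}"
    for f in "$examples"/issue-*.sx; do
        [ -e "$f" ] || continue        # glob matched nothing
        name="${f##*/}"                # strip the directory
        name="${name%.sx}"             # strip the extension
        if [ -e "$expected/$name.txt" ]; then
            echo "$name pinned"
        else
            echo "$name unpinned"
        fi
    done
)
```

Usage: `list_open_issues` from the repository root. An empty result means no issue repros remain.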
Known bugs
When you encounter a known bug during unrelated work (e.g. a closure returning a pointer instead of a value while fixing forward references), do NOT fix it inline. Instead:
- Add it to implementation_plan.md under a ## Known Bugs section with a short description, reproduction steps, and the file/line where it manifests.
- Continue with the current task, working around the bug if needed.
- Fix it at the end of the session (or in a future session) as a separate step.
This keeps the current task focused and avoids scope creep from non-trivial side-fixes.
Git commits
- Never add Co-Authored-By lines to commit messages.
Bundling lives in sx
Platform-specific bundling (Apple .app, Android .apk) is sx code.
The compiler shrinks to: parse → IR → codegen → link → invoke a sx
function. Codesigning / Info.plist / AndroidManifest / javac / d8 /
aapt2 / zipalign / apksigner / framework embed / entitlements / asset
trees all run in the IR interpreter post-link via libc / process.run
foreign calls.
| File | Role |
|---|---|
| library/modules/platform/bundle.sx | All four targets (macOS, iOS sim, iOS device, Android). Branches on BuildOptions.is_macos / is_ios / is_ios_device / is_ios_simulator / is_android accessors. |
| library/modules/fs.sx | POSIX file stdlib (open / read / write / copy / mkdir / unlink / chmod / rename / exists / basename / dirname). |
| library/modules/process.sx | popen-based run(cmd) -> ?ProcessResult + env(name) + find_executable(name). |
| library/modules/compiler.sx | BuildOptions setters + accessors. Adding a new bundling parameter = add a setter here + a hook in compiler_hooks.zig. |
| library/modules/platform/android.sx | AndroidPlatform (state-on-struct, no module globals). sx_android_* helpers take plat: *AndroidPlatform as first arg. logical_w field drives dpi_scale = pixel_w / logical_w so consumer's design-width fits any physical resolution. |
| src/ir/compiler_hooks.zig | BuildConfig + every BuildOptions.* hook. Hook registry is in Registry.registerDefaults. |
| src/ir/host_ffi.zig | dlsym(RTLD_DEFAULT) + arity-switched cdecl trampolines. Lets #foreign("c") decls resolve at #run / post-link time against host libc. |
| src/main.zig | After target.link(), threads target_triple + frameworks + jni_main emissions into BuildConfig, then invokes the post-link callback by FuncId (or by <module>.bundle_main name). --bundle / --apk flags feed bundle_path; auto-fallback to post_link_module = "platform.bundle" when bundle_path is set without a registered callback. |
Specifics in specs.md §10.5. The full bundling pipeline spec — what runs per Apple target vs Android, what each accessor returns, the BuildConfig forwarded from main.zig — lives there.
Wiring a new bundling step:
- Add the parameter as a setter on BuildOptions :: struct #compiler { ... } in library/modules/compiler.sx.
- Add the BuildConfig field + setter hook + accessor hook in src/ir/compiler_hooks.zig. Register both in Registry.registerDefaults.
- Optionally forward a CLI flag in src/main.zig before the post-link invocation.
- Read the accessor from library/modules/platform/bundle.sx.
File roles
| File | Role |
|---|---|
| specs.md | Language specification. Source of truth for syntax/semantics. |
| current/PLAN.md | Active IR implementation plan. |
| current/CHECKPOINT.md | Active IR progress tracker. Update after every step. |
| current/PLAN-FFI.md | Active FFI ceremony reduction plan (Obj-C / JNI intrinsics, JNI DSL, Obj-C header import). |
| current/CHECKPOINT-FFI.md | Active FFI progress tracker. Update after every step. |
| implementation_plan.md | Archive of completed work (closures, protocols, etc.). Do not pick up tasks from here. |
| readme.md | Original syntax sketches. Do not modify. |
| CLAUDE.md | This file. Session instructions. |
| library/modules/platform/bundle.sx | sx-side .app / .apk bundler. See "Bundling lives in sx" above. |
| library/modules/fs.sx, library/modules/process.sx | POSIX stdlib for the bundler + general consumer use. |