Files
sx/CLAUDE.md
agra 72593db953 mem: List(T) mutations gain optional alloc: Allocator = context.allocator
The chess panel-text regression (text vanished after the first move on
macOS) had a single root cause: GlyphCache's entries List, hash table,
and shaped_buf grew through `context.allocator` — which during render
is the per-frame arena. On the next arena reset the backing died, and
subsequent glyph lookups read garbage / wrote into freshly-allocated
view-tree memory.

Fix is shaped as the user proposed: `List(T)`'s mutations take an
optional trailing `alloc: Allocator = context.allocator` argument. No
allocator stored on the container, no init ceremony, every existing
`list.append(item)` callsite keeps working unchanged. Long-lived
owners now write `list.append(item, self.parent_allocator)` and the
arena-leak bug becomes impossible to write accidentally.

Default-arg substitution previously only fired for identifier callees
(`expandCallDefaults` at lower.zig:7978). Extended to the generic
struct-method dispatch path (`list.append(...)` lands here) via a new
`appendDefaultArgs` helper that lowers fd.params[i].default_expr in
the caller's scope and appends to the lowered args slice.

Long-lived owners updated to capture `parent_allocator: Allocator` at
init and use it for every internal growth:

- GlyphCache (the chess bug) — entries, shaped_buf, hash_keys,
  hash_vals, atlas bitmap.
- DockInteraction — drops the existing `push Context` workaround in
  `ensure_capacity` for the explicit-arg form.
- StateStore — entries list + per-entry data buffer.
- Gles3Gpu, MetalGPU — shaders, buffers, textures (atlas-grow during
  render would otherwise leak resources into the frame arena).

Also kept: an operator-precedence fix in pipeline.sx
(`(self.frame_index & 1) == 0` instead of
`self.frame_index & 1 == 0`, which parses as
`self.frame_index & (1 == 0)` = always 0). That was a stealth
single-arena-only bug that masked the GlyphCache one for a long time.

Docs:
- specs.md §11 documents `param: T = expr` default parameter values.
  The parser already supported it — formalised in the spec now.
- current/CHECKPOINT-MEM.md logs the change.
- CLAUDE.md REJECTED PATTERNS gains a "Long-lived containers growing
  through context.allocator" section with the `parent_allocator`
  capture template and the list of existing examples to mirror.

155/155 example tests pass — zero-diff against snapshots since every
existing callsite still resolves to `context.allocator`.
2026-05-25 14:41:17 +03:00

23 KiB
Raw Blame History

sx compiler — session instructions

IMPASSIBLE RULES — no exceptions

When you hit a sx compiler bug during normal work

STOP. File the issue. Wait for a fix in another session. Do NOT work around the bug. Do NOT continue with adjacent work. Do NOT land code that depends on the not-yet-fixed behaviour.

Procedure:

  1. Create issues/NNNN-slug.md (next free NNNN, see existing issues/).
  2. The file must contain:
    • Symptom — one-line summary + observed vs expected.
    • Reproduction — minimal sx code (inline fenced block). Must reproduce the bug standalone, no project dependencies beyond modules/std.sx / modules/allocators.sx.
    • Investigation prompt — a ready-to-paste prompt the user can drop into a fresh session to fix the bug. Should include: the suspected area of the compiler (file + function), what the fix likely needs to do, and the verification step (run the repro, expect new output).
  3. Update current/CHECKPOINT-MEM.md (or the relevant stream's checkpoint) — mark ## Current state as BLOCKED on issue NNNN.
  4. Tell the user: bug filed at issues/NNNN-slug.md, work paused pending fix.
  5. STOP. Do not migrate, do not refactor, do not look for an alternative path. The session ends until the user returns with confirmation the fix landed.

This rule is uncontestable. None of the following override the stop:

  • "But the workaround is small."
  • "But this just affects one file."
  • "But I can route around it."
  • "But the bug is pre-existing — not introduced by my change."
  • "But it doesn't block what I was just doing — only future work."
  • "But I already finished the step I was on; this is adjacent."
  • "But the fix is obviously safe to land alongside."

If filing an issue is on the table, the answer is STOP. Period. Do not weigh "blocking vs non-blocking". Do not bundle the issue filing with continued work in the same session. Do not finalise the checkpoint, regen snapshots, or move to the next phase after filing. Filing the issue IS the last action of the session.

Every workaround during execution becomes technical debt that hides the bug from the next person who hits it. Every "I'll just keep going since it's pre-existing" leaves the next session believing the bug is gated behind their work and not yours. The cost of stopping is one paused session; the cost of working around — or "just finishing this last thing" — is permanent silent fragility.

If you genuinely think the bug is not a bug (e.g. you've misunderstood specs.md), STILL file the issue with that hypothesis in the prompt — let the fix session confirm or correct.

The two acceptable actions after recognising a compiler bug:

  1. File issues/NNNN-slug.md, mark the checkpoint BLOCKED, stop.
  2. If the bug surfaced AFTER you've already shipped a complete step (built + tests pass + checkpoint logged), still file as in (1) and still stop — do not roll forward into the next step.

There is no "wrap up first" option.

REJECTED PATTERNS — never generate these

Silent fallback defaults in the compiler

Forbidden: returning a "reasonable-looking" default value when a lookup fails in the compiler. Examples of the pattern to root out:

// NEVER write this — lookup fails, return s64 and pretend nothing
// happened. Any caller asking `what type is this?` gets a lie.
return self.module.types.findByName(name_id) orelse .s64;

// NEVER write this — same shape, dressed up:
return scope.lookup(name) orelse default_type;
return resolved_field orelse .void;

These defaults silently produce wrong results in cases the implementer didn't think of. The classic failure mode: the default coincidentally matches the size/shape of one common case, so the test suite passes and the bug ships invisibly. issue-0042 lived for years because resolveTypeArg's orelse .s64 returned 8 bytes for unresolved type-alias names — coincidentally correct for any 8-byte target (s64, *T, f64, function pointers), and silently wrong for everything else.

Required: when a lookup that must succeed fails, emit a diagnostic via self.diagnostics.addFmt(.err, span, "...", .{...}) and return the most clearly-broken sentinel the calling code can survive (e.g. .void, a Ref.none, or via a ?T return that forces the caller to handle the null). Errors must surface to the user as text, not as a silently-corrupted size or alignment.

If you find an existing default-return in the compiler that swallows a lookup failure, treat it as a discovered bug — file an issue per the IMPASSIBLE RULES above, do not just delete the default in place without surfacing what it was hiding.

Silent unimplemented arms (catch-all else branches)

Forbidden: a switch / if-chain over a Value tag, Op variant, TypeId, etc. whose else branch silently does the wrong thing — returns the input unchanged, returns a zero/null/undef default, picks one common width and writes that many bytes, swallows an error into .void_val so the caller fills a zero-init const, etc. The pattern is identical in spirit to the silent-fallback-defaults rule above: a case the implementer didn't think of falls through to behaviour that looks like it worked but corrupts the data downstream.

Examples of patterns we've burned ourselves on:

// NEVER write this — `.int` value at a raw destination, write 8
// bytes regardless of actual IR type. Silently clobbers neighbors
// when the destination is sub-8.
.int => |v| {
    const bytes = std.mem.toBytes(v);
    @memcpy(dst[0..bytes.len], &bytes);
},

// NEVER write this — `.deref` of anything but slot_ptr passes the
// value through unchanged. Looks like a successful deref to callers;
// silently wrong for raw pointers.
.deref => |u| switch (frame.getRef(u.operand)) {
    .slot_ptr => |s| return frame.loadSlot(s),
    else => return val,  // ← silently wrong
},

// NEVER write this — comptime init error becomes void_val, which the
// LLVM emitter happily turns into a zero-init constant. The user sees
// the const evaluating to 0 with no diagnostic.
const result = interp.call(func_id, &.{}) catch .void_val;

Required: either (1) implement the arm correctly in the same step as the one that introduces the new shape, or (2) bail loudly with a one-line diagnostic that names the specific case. For the interp we have bailDetail(comptime msg) which sets Interpreter.last_bail_detail so the host diagnostic surfaces "op=X: " instead of a bare CannotEvalComptime. Mirror the same pattern in any new evaluator / interpreter / serializer.

Preferred order: implement the arm. Only fall back to "bail loudly" when the implementation requires plumbing that's out of scope for the current step. In that case, leave a one-line comment explaining what would be needed to implement it properly — so the next person hitting the diagnostic has a head-start.

If a path requires width / type / layout information that isn't threaded into the IR op yet, prefer to add the field to the op struct (Store.val_ty-style) over leaving an "8 bytes assumed" shortcut. The field plumbing is a one-time cost; silent-clobber debugging is forever.

When in doubt: else => return bailDetail("clear one-line reason") beats else => unreachable beats else => /* hope */.

Allocator construction

Forbidden: the "caller provides storage" pattern (in any form):

// NEVER write this — explicit @ptr:
g_gpa : GPA = ---;
alloc := GPA.create(@g_gpa);

// NEVER write this — UFCS-disguised same pattern:
gpa_state : GPA = .{ alloc_count = 0 };
gpa := gpa_state.create();

// NEVER write this — in-place init on a struct field:
self.arena_a.create(parent, size);

Also forbidden: wrapping an init result through a cast just to bind a typed pointer you already have:

// NEVER write this — tracker is already *TrackingAllocator:
tracker := TrackingAllocator.init(context.allocator);
t : *TrackingAllocator = xx tracker;   // redundant rename
t.report();

Required: init returns the concrete typed pointer (*T); caller casts xx ptr to Allocator only at use sites that need the protocol value.

gpa := GPA.init();                          // *GPA
arena := Arena.init(xx gpa, 4096);          // *Arena ; xx gpa → Allocator for parent
tracker := TrackingAllocator.init(context.allocator);  // *TrackingAllocator

push Context.{ allocator = xx tracker, data = null } { ... }

print("gpa allocs: {}\n", gpa.alloc_count);   // direct field access
tracker.report();                              // direct method call
arena.reset();                                 // direct method call

The rule exists because:

  • create returning Allocator forces an instance() accessor or a cast-back to recover the typed pointer (extra step every time).
  • Caller-storage patterns are verbose, error-prone (easy to pass the wrong @ptr), and an artifact of an earlier allocator design.
  • init returning *T matches Zig conventions and lets the caller decide where/how to cast to Allocator.

See current/CHECKPOINT-MEM.md ISSUE-MEM-005 for the migration history. If an existing allocator type still uses the old create pattern, migrate it OR ask the user — never propagate the pattern in new code, docstrings, examples, or tests.

Long-lived containers growing through context.allocator

Forbidden: a struct that outlives any single push Context { ... } scope (caches, persistent UI state, GPU resource tables, anything-accessed-across-frame-boundaries) appending to or growing an internal List/hash/buffer using whatever context.allocator happens to be at the call site:

GlyphCache :: struct {
    entries: List(GlyphEntry);
    // ...
    rasterize :: (self: *GlyphCache, ...) {
        // BAD — during render, context.allocator is a per-frame arena,
        // so the entries List backing dies on the next arena reset.
        self.entries.append(entry);
    }
}

The chess panel-text bug (text vanished after the first move) was exactly this shape: GlyphCache.entries, hash_keys, hash_vals, and shaped_buf all grew through the per-frame arena.

Required: capture the long-lived allocator at init time on a parent_allocator: Allocator field, and forward it explicitly to every internal growth point. List(T) mutations take an optional trailing alloc: Allocator = context.allocator, so the call site just names the owner:

GlyphCache :: struct {
    entries: List(GlyphEntry);
    parent_allocator: Allocator;

    init :: (self: *GlyphCache, ...) {
        self.parent_allocator = context.allocator;  // libc / GPA at init
        // ...
    }

    rasterize :: (self: *GlyphCache, ...) {
        // GOOD — entries always grows through the long-lived owner,
        // regardless of who's pushed what context above us.
        self.entries.append(entry, self.parent_allocator);
    }
}

Heuristic for "is this struct long-lived?" — if its init is called once at startup (or once per logical instance) and its methods are called from a frame/event/render hot path, it's long-lived. Capture parent_allocator and use it for every internal growth call.

The same applies to direct context.allocator.alloc(...) / .dealloc(...) inside such structs — replace with self.parent_allocator.alloc(...) / .dealloc(...).

Existing examples of this pattern (use as templates):

  • library/modules/ui/glyph_cache.sx — atlas, hash table, entries, shaped_buf.
  • library/modules/ui/dock.sx — DockInteraction's nine per-child Lists.
  • library/modules/ui/state.sx — StateStore.entries.
  • library/modules/gpu/gles3.sx, library/modules/gpu/metal.sx — shaders / buffers / textures.
  • library/modules/ui/pipeline.sx — UIPipeline (used for arena parents).

RenderTree.nodes in pipeline.sx is the opposite case — it's intentionally per-frame arena-allocated and gets its items field zeroed at the top of tick_with_body. Don't migrate that one.

On every session start

Three active workstreams run in parallel — IR (the language compiler), FFI (Obj-C / JNI ceremony reduction), and MEM (memory module overhaul, mem.sx + protocol expansion). They touch mostly disjoint files; any can be advanced independently.

  1. Read all three checkpoints to see where each stream is paused:
    • current/CHECKPOINT.md — IR progress tracker.
    • current/CHECKPOINT-FFI.md — FFI progress tracker.
    • current/CHECKPOINT-MEM.md — MEM progress tracker + issues log.
  2. Read the plan that corresponds to the stream the user wants to advance:
    • current/PLAN.md — IR implementation plan.
    • current/PLAN-FFI.md — FFI ceremony reduction plan.
    • ~/.claude/plans/tidy-doodling-cray.md — MEM (mem.sx) implementation plan.
  3. Read specs.md if you need to understand language behavior.
  4. Pick up from the next incomplete step in the relevant CHECKPOINT*.md. If the user hasn't said which stream to work on, ask before picking.

Note: implementation_plan.md is the archive of completed work (closures, protocols, auto type erasure, init blocks). Do NOT pick up unchecked items from it — those are on hold until the IR work is done.

While working

  • Work on one step at a time. Complete it fully before moving on.
  • After completing a step, immediately update the relevant checkpoint (current/CHECKPOINT.md for IR, current/CHECKPOINT-FFI.md for FFI):
    • Update ## Last completed step with the step you just finished.
    • Update ## Current state with what exists now.
    • Update ## Next step with what comes next.
    • Add a log entry under ## Log.
  • If a step fails or you get stuck:
    • Add the issue to ## Known issues in the relevant checkpoint.
    • Do NOT skip the step — fix the blocker or ask the user.
  • If you make a design decision not already in specs.md, add it to ## Decisions Log in implementation_plan.md.
  • FFI cadence rule (from current/PLAN-FFI.md): no commit may both add a test AND make it pass. Either lock in current behavior with a passing test, or land an expected-failing test that the very next commit turns green.

IR-specific rules (from current/PLAN.md)

  • Never modify src/codegen.zig in Phases 01. It is the safety net.
  • In Phase 3, only read specific sections of codegen.zig (grep for the relevant handler).
  • No step should require reading more than ~1,000 lines of existing code. If it does, split it.
  • No step should produce more than ~500 lines of new code. If it does, split it.
  • If Claude gets confused mid-step, stop, update current/CHECKPOINT.md with partial progress, and tell the user to start a new session.

Context management

  • Each step is scoped so it can be completed in a single session without exhausting context.
  • If you're running low on context, stop at the current step boundary, update current/CHECKPOINT.md with your progress, and tell the user to start a new session.
  • Never try to complete multiple phases in one session unless each is small.

Build and verify

After any code change:

zig build                              # must compile
zig build test                         # must pass

After completing a phase's final step, run the phase's end-to-end verification command listed in current/PLAN.md.

Testing

After any compiler change:

  1. Build: zig build && zig build test
  2. Run regression tests: bash tests/run_examples.sh
    • All 29 tests must show ok
    • Zero failures, zero timeouts

Snapshot integrity

Never run --update while tests are failing. The --update flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.

Safe workflow:

  1. Fix the code until bash tests/run_examples.sh passes against the existing snapshots.
  2. Only run --update when you've intentionally changed output (new feature, new test, changed formatting).
  3. After --update, review the diff (git diff tests/expected/) to confirm no error messages or empty output were captured.

Adding a new language feature

When implementing a new feature:

  1. Add test case(s) to examples/50-smoke.sx in the appropriate section
  2. Run ./zig-out/bin/sx run examples/50-smoke.sx to verify it works
  3. Regenerate expected output: bash tests/run_examples.sh --update
  4. Verify all tests still pass: bash tests/run_examples.sh

Test file roles

File Purpose
examples/50-smoke.sx Comprehensive feature coverage (~200 tests). Add new features here.
examples/NN-name.sx Focused feature examples (e.g. 01-basic.sx, 52-frameworks.sx). Created either fresh or by renaming a resolved issue-*.sx once its bug is fixed.
examples/issue-NNNN.sx Open bug repros. Each file is a literal item on the open-issue list, named after the issue number. When the bug is fixed, rename the file (and its tests/expected/issue-NNNN.{txt,exit}) to a focused feature example so the issue-* namespace shrinks to exactly the unresolved set. A file without a matching tests/expected/issue-NNNN.txt is an open issue that hasn't been pinned in the test suite yet.
tests/expected/*.txt Expected output per example. Regenerate with --update.
tests/expected/*.exit Expected exit codes. Auto-generated with --update.
tests/run_examples.sh Test runner. Compares actual vs expected output.

Unit test file convention

All Zig unit tests live in separate *.test.zig files alongside the source they test:

Source file Test file
src/ir/types.zig src/ir/types.test.zig
src/ir/interp.zig src/ir/interp.test.zig
src/ir/lower.zig src/ir/lower.test.zig
... ...
  • Never put test blocks directly in source files. All tests go in the corresponding .test.zig file.
  • Each .test.zig file must be imported in the barrel file (src/ir/ir.zig) so zig build test discovers them via refAllDecls.
  • When adding a new source file that needs tests, create the .test.zig file and add pub const foo_tests = @import("foo.test.zig"); to the barrel.

Creating a new standalone test

  1. Create examples/NN-name.sx (focused feature example) or examples/issue-NNNN.sx (open-bug repro).
  2. Run it: ./zig-out/bin/sx run examples/<file>.sx
  3. Create expected output: bash tests/run_examples.sh --update
  4. Verify: bash tests/run_examples.sh

Resolving an open issue

When a bug filed as examples/issue-NNNN.sx is fixed, the file should leave the issue-namespace:

  1. Pick a focused feature name with the next free number, e.g. examples/52-frameworks.sx.
  2. git mv examples/issue-NNNN.sx examples/NN-name.sx.
  3. git mv tests/expected/issue-NNNN.txt tests/expected/NN-name.txt (and the .exit file).
  4. Tighten the comment header to describe the feature (drop the issue-NNNN provenance — that lives in git history now).
  5. Run the suite to confirm nothing else broke.

The examples/issue-* glob after step 5 is the literal list of what's still open.

Known bugs

When you encounter a known bug during unrelated work (e.g. a closure returning a pointer instead of a value while fixing forward references), do NOT fix it inline. Instead:

  1. Add it to implementation_plan.md under a ## Known Bugs section with a short description, reproduction steps, and the file/line where it manifests.
  2. Continue with the current task, working around the bug if needed.
  3. Fix it at the end of the session (or in a future session) as a separate step.

This keeps the current task focused and avoids scope creep from non-trivial side-fixes.

Git commits

  • Never add Co-Authored-By lines to commit messages.

Bundling lives in sx

Platform-specific bundling (Apple .app, Android .apk) is sx code. The compiler shrinks to: parse → IR → codegen → link → invoke a sx function. Codesigning / Info.plist / AndroidManifest / javac / d8 / aapt2 / zipalign / apksigner / framework embed / entitlements / asset trees all run in the IR interpreter post-link via libc / process.run foreign calls.

File Role
library/modules/platform/bundle.sx All four targets (macOS, iOS sim, iOS device, Android). Branches on BuildOptions.is_macos / is_ios / is_ios_device / is_ios_simulator / is_android accessors.
library/modules/fs.sx POSIX file stdlib (open / read / write / copy / mkdir / unlink / chmod / rename / exists / basename / dirname).
library/modules/process.sx popen-based run(cmd) -> ?ProcessResult + env(name) + find_executable(name).
library/modules/compiler.sx BuildOptions setters + accessors. Adding a new bundling parameter = add a setter here + a hook in compiler_hooks.zig.
library/modules/platform/android.sx AndroidPlatform (state-on-struct, no module globals). sx_android_* helpers take plat: *AndroidPlatform as first arg. logical_w field drives dpi_scale = pixel_w / logical_w so consumer's design-width fits any physical resolution.
src/ir/compiler_hooks.zig BuildConfig + every BuildOptions.* hook. Hook registry is in Registry.registerDefaults.
src/ir/host_ffi.zig dlsym(RTLD_DEFAULT) + arity-switched cdecl trampolines. Lets #foreign("c") decls resolve at #run / post-link time against host libc.
src/main.zig After target.link(), threads target_triple + frameworks + jni_main emissions into BuildConfig, then invokes the post-link callback by FuncId (or by <module>.bundle_main name). --bundle / --apk flags feed bundle_path; auto-fallback to post_link_module = "platform.bundle" when bundle_path is set without a registered callback.

Specifics in specs.md §10.5. The full bundling pipeline spec — what runs per Apple target vs Android, what each accessor returns, the BuildConfig forwarded from main.zig — lives there.

Wiring a new bundling step:

  1. Add the parameter as a setter on BuildOptions :: struct #compiler { ... } in library/modules/compiler.sx.
  2. Add the BuildConfig field + setter hook + accessor hook in src/ir/compiler_hooks.zig. Register both in Registry.registerDefaults.
  3. Optionally forward a CLI flag in src/main.zig before the post-link invocation.
  4. Read the accessor from library/modules/platform/bundle.sx.

File roles

File Role
specs.md Language specification. Source of truth for syntax/semantics.
current/PLAN.md Active IR implementation plan.
current/CHECKPOINT.md Active IR progress tracker. Update after every step.
current/PLAN-FFI.md Active FFI ceremony reduction plan (Obj-C / JNI intrinsics, JNI DSL, Obj-C header import).
current/CHECKPOINT-FFI.md Active FFI progress tracker. Update after every step.
implementation_plan.md Archive of completed work (closures, protocols, etc.). Do not pick up tasks from here.
readme.md Original syntax sketches. Do not modify.
CLAUDE.md This file. Session instructions.
library/modules/platform/bundle.sx sx-side .app / .apk bundler. See "Bundling lives in sx" above.
library/modules/fs.sx, library/modules/process.sx POSIX stdlib for the bundler + general consumer use.