sx/CLAUDE.md

# sx compiler — session instructions

## IMPASSIBLE RULES — no exceptions

### When you hit a sx compiler bug during normal work

**STOP. File the issue. Wait for a fix in another session. Do NOT
work around the bug. Do NOT continue with adjacent work. Do NOT
land code that depends on the not-yet-fixed behaviour.**

Procedure:

1. Create `issues/NNNN-slug.md` (next free `NNNN`, see existing
   `issues/`).
2. The file must contain:
   - **Symptom** — one-line summary + observed vs expected.
   - **Reproduction** — minimal sx code (inline fenced block). Must
     reproduce the bug standalone, no project dependencies beyond
     `modules/std.sx` / `modules/allocators.sx`.
   - **Investigation prompt** — a ready-to-paste prompt the user can
     drop into a fresh session to fix the bug. Should include: the
     suspected area of the compiler (file + function), what the
     fix likely needs to do, and the verification step (run the
     repro, expect new output).
3. Update `current/CHECKPOINT-MEM.md` (or the relevant stream's
   checkpoint) — mark `## Current state` as BLOCKED on issue NNNN.
4. Tell the user: bug filed at `issues/NNNN-slug.md`, work paused
   pending fix.
5. **STOP.** Do not migrate, do not refactor, do not look for an
   alternative path. The session ends until the user returns with
   confirmation the fix landed.

This rule is uncontestable. None of the following override the stop:

- "But the workaround is small."
- "But this just affects one file."
- "But I can route around it."
- **"But the bug is pre-existing — not introduced by my change."**
- **"But it doesn't block what I was just doing — only future work."**
- **"But I already finished the step I was on; this is adjacent."**
- **"But the fix is obviously safe to land alongside."**

If filing an issue is on the table, the answer is STOP. Period.
Do not weigh "blocking vs non-blocking". Do not bundle the issue
filing with continued work in the same session. Do not finalise the
checkpoint, regen snapshots, or move to the next phase after filing.
Filing the issue IS the last action of the session.

Every workaround during execution becomes technical debt that hides
the bug from the next person who hits it. Every "I'll just keep
going since it's pre-existing" leaves the next session believing
the bug is gated behind their work and not yours. The cost of
stopping is one paused session; the cost of working around — or
"just finishing this last thing" — is permanent silent fragility.

If you genuinely think the bug is not a bug (e.g. you've
misunderstood specs.md), STILL file the issue with that hypothesis
in the prompt — let the fix session confirm or correct.

The two acceptable actions after recognising a compiler bug:
1. File `issues/NNNN-slug.md`, mark the checkpoint BLOCKED, stop.
2. If the bug surfaced AFTER you've already shipped a complete
   step (built + tests pass + checkpoint logged), still file as in
   (1) and still stop — do not roll forward into the next step.

There is no "wrap up first" option.

## REJECTED PATTERNS — never generate these

### Silent fallback defaults in the compiler

❌ **Forbidden:** returning a "reasonable-looking" default value when
a lookup fails in the compiler. Examples of the pattern to root out:

```zig
// NEVER write this — lookup fails, return s64 and pretend nothing
// happened. Any caller asking `what type is this?` gets a lie.
return self.module.types.findByName(name_id) orelse .s64;

// NEVER write this — same shape, dressed up:
return scope.lookup(name) orelse default_type;
return resolved_field orelse .void;
```

These defaults silently produce wrong results in cases the implementer
didn't think of. The classic failure mode: the default coincidentally
matches the size/shape of one common case, so the test suite passes
and the bug ships invisibly. issue-0042 lived for years because
`resolveTypeArg`'s `orelse .s64` returned 8 bytes for unresolved
type-alias names — coincidentally correct for any 8-byte target
(`s64`, `*T`, `f64`, function pointers), and silently wrong for
everything else.

✅ **Required:** when a lookup that *must* succeed fails, emit a
diagnostic via `self.diagnostics.addFmt(.err, span, "...", .{...})`
and return the most clearly-broken sentinel the calling code can
survive (e.g. `.void`, a `Ref.none`, or via a `?T` return that forces
the caller to handle the null). Errors must surface to the user as
text, not as a silently-corrupted size or alignment.

If you find an existing default-return in the compiler that swallows
a lookup failure, treat it as a discovered bug — file an issue per
the IMPASSIBLE RULES above, do not just delete the default in place
without surfacing what it was hiding.

### Silent unimplemented arms (catch-all `else` branches)

❌ **Forbidden:** a `switch` / `if-chain` over a Value tag, Op variant,
TypeId, etc. whose `else` branch silently does the wrong thing —
returns the input unchanged, returns a zero/null/undef default, picks
one common width and writes that many bytes, swallows an error into
`.void_val` so the caller fills a zero-init const, etc. The pattern
is identical in spirit to the silent-fallback-defaults rule above:
a case the implementer didn't think of falls through to behaviour
that *looks* like it worked but corrupts the data downstream.

Examples of patterns we've burned ourselves on:

```zig
// NEVER write this — `.int` value at a raw destination, write 8
// bytes regardless of actual IR type. Silently clobbers neighbors
// when the destination is sub-8.
.int => |v| {
    const bytes = std.mem.toBytes(v);
    @memcpy(dst[0..bytes.len], &bytes);
},

// NEVER write this — `.deref` of anything but slot_ptr passes the
// value through unchanged. Looks like a successful deref to callers;
// silently wrong for raw pointers.
.deref => |u| switch (frame.getRef(u.operand)) {
    .slot_ptr => |s| return frame.loadSlot(s),
    else => return val,  // ← silently wrong
},

// NEVER write this — comptime init error becomes void_val, which the
// LLVM emitter happily turns into a zero-init constant. The user sees
// the const evaluating to 0 with no diagnostic.
const result = interp.call(func_id, &.{}) catch .void_val;
```

✅ **Required:** either (1) implement the arm correctly *in the same
step* as the one that introduces the new shape, or (2) bail loudly
with a one-line diagnostic that names the specific case. For the
interp we have `bailDetail(comptime msg)` which sets
`Interpreter.last_bail_detail` so the host diagnostic surfaces "op=X:
<reason>" instead of a bare `CannotEvalComptime`. Mirror the same
pattern in any new evaluator / interpreter / serializer.

Preferred order: **implement the arm**. Only fall back to "bail loudly"
when the implementation requires plumbing that's out of scope for the
current step. In that case, leave a one-line comment explaining what
would be needed to implement it properly — so the next person hitting
the diagnostic has a head-start.

If a path requires width / type / layout information that isn't
threaded into the IR op yet, prefer to add the field to the op
struct (`Store.val_ty`-style) over leaving an "8 bytes assumed"
shortcut. The field plumbing is a one-time cost; silent-clobber
debugging is forever.

When in doubt: `else => return bailDetail("clear one-line reason")`
beats `else => unreachable` beats `else => /* hope */`.

### Allocator construction

❌ **Forbidden:** the "caller provides storage" pattern (in any form):

```sx
// NEVER write this — explicit @ptr:
g_gpa : GPA = ---;
alloc := GPA.create(@g_gpa);

// NEVER write this — UFCS-disguised same pattern:
gpa_state : GPA = .{ alloc_count = 0 };
gpa := gpa_state.create();

// NEVER write this — in-place init on a struct field:
self.arena_a.create(parent, size);
```

❌ **Also forbidden:** wrapping an `init` result through a cast just
to bind a typed pointer you already have:

```sx
// NEVER write this — tracker is already *TrackingAllocator:
tracker := TrackingAllocator.init(context.allocator);
t : *TrackingAllocator = xx tracker;   // redundant rename
t.report();
```

✅ **Required:** `init` returns the concrete typed pointer (`*T`);
caller casts `xx ptr` to `Allocator` only at use sites that need the
protocol value.

```sx
gpa := GPA.init();                          // *GPA
arena := Arena.init(xx gpa, 4096);          // *Arena ; xx gpa → Allocator for parent
tracker := TrackingAllocator.init(context.allocator);  // *TrackingAllocator

push Context.{ allocator = xx tracker, data = null } { ... }

print("gpa allocs: {}\n", gpa.alloc_count);   // direct field access
tracker.report();                              // direct method call
arena.reset();                                 // direct method call
```

The rule exists because:
- `create` returning `Allocator` forces an `instance()` accessor or a
  cast-back to recover the typed pointer (extra step every time).
- Caller-storage patterns are verbose, error-prone (easy to pass the
  wrong @ptr), and an artifact of an earlier allocator design.
- `init` returning `*T` matches Zig conventions and lets the caller
  decide where/how to cast to `Allocator`.

See `current/CHECKPOINT-MEM.md` ISSUE-MEM-005 for the migration
history. If an existing allocator type still uses the old `create`
pattern, migrate it OR ask the user — never propagate the pattern
in new code, docstrings, examples, or tests.

### Long-lived containers growing through `context.allocator`

A struct's lifetime can outlast its caller's current `context.allocator`.
When that happens, any internal allocation made via `context.allocator`
(directly, or via a `List.append(item)` that uses the default) binds to
a *transient* allocator, and dies the moment that transient scope is
torn down — even though the owning struct is still alive and reachable.

❌ **Forbidden:** the implicit-context capture in any struct whose
lifetime crosses a `push Context { ... }` boundary:

```sx
LongLived :: struct {
    items: List(Entry);

    add :: (self: *LongLived, e: Entry) {
        // BAD — `items` grows through whichever allocator happens to
        // be current at this call. If the caller is inside a transient
        // `push Context { allocator = ... }`, the new backing lives in
        // that transient allocator. When the push scope ends, the
        // backing is freed/reset, but `items.items` still points at it.
        self.items.append(e);
    }
}
```

The same trap applies to direct `context.allocator.alloc(...)` /
`.dealloc(...)` calls inside such structs.

✅ **Required:** capture the owning allocator at construction time
and forward it explicitly to every internal growth point. The
container's API supports this directly — `List(T)`'s mutations take
an optional trailing `alloc: Allocator = context.allocator`:

```sx
LongLived :: struct {
    items: List(Entry);
    own_allocator: Allocator;

    init :: (self: *LongLived) {
        // Snapshot whatever allocator is in scope at construction.
        // That same allocator must outlive this struct.
        self.own_allocator = context.allocator;
    }

    add :: (self: *LongLived, e: Entry) {
        self.items.append(e, self.own_allocator);
    }

    // Direct allocs too:
    grow_buf :: (self: *LongLived, n: s64) {
        self.buf = self.own_allocator.alloc(n);
    }
}
```

Two-question test for whether a struct needs this pattern:

1. Can a caller's `push Context { ... }` wrap a method on this struct?
2. Does any method allocate (directly, or by triggering List growth)?

If both yes, capture the owning allocator at init. Field name is by
convention (`parent_allocator`, `owner`, `own_allocator` — pick the
project's existing one and follow it).

Sibling case (do NOT migrate): a container whose backing is
*intentionally* tied to the caller's scope — typically a per-scope
scratch buffer that is reset/zeroed at the top of every scope. Those
SHOULD use `context.allocator` so they live and die with the scope.
A clear comment at the declaration site is mandatory.

## On every session start

Three active workstreams run in parallel — **IR** (the language compiler),
**FFI** (Obj-C / JNI ceremony reduction), and **MEM** (memory module
overhaul, mem.sx + protocol expansion). They touch mostly disjoint
files; any can be advanced independently.

1. Read all three checkpoints to see where each stream is paused:
   - `current/CHECKPOINT.md` — IR progress tracker.
   - `current/CHECKPOINT-FFI.md` — FFI progress tracker.
   - `current/CHECKPOINT-MEM.md` — MEM progress tracker + issues log.
2. Read the plan that corresponds to the stream the user wants to advance:
   - `current/PLAN.md` — IR implementation plan.
   - `current/PLAN-FFI.md` — FFI ceremony reduction plan.
   - `~/.claude/plans/tidy-doodling-cray.md` — MEM (mem.sx) implementation plan.
3. Read `specs.md` if you need to understand language behavior.
4. Pick up from the next incomplete step in the relevant `CHECKPOINT*.md`.
   If the user hasn't said which stream to work on, ask before picking.

> **Note:** `implementation_plan.md` is the archive of completed work (closures, protocols,
> auto type erasure, init blocks). Do NOT pick up unchecked items from it — those are
> on hold until the IR work is done.

## While working

- Work on **one step at a time**. Complete it fully before moving on.
- After completing a step, immediately update the relevant checkpoint
  (`current/CHECKPOINT.md` for IR, `current/CHECKPOINT-FFI.md` for FFI):
  - Update `## Last completed step` with the step you just finished.
  - Update `## Current state` with what exists now.
  - Update `## Next step` with what comes next.
  - Add a log entry under `## Log`.
- If a step fails or you get stuck:
  - Add the issue to `## Known issues` in the relevant checkpoint.
  - Do NOT skip the step — fix the blocker or ask the user.
- If you make a design decision not already in `specs.md`, add it to `## Decisions Log` in `implementation_plan.md`.
- **FFI cadence rule** (from `current/PLAN-FFI.md`): no commit may both
  add a test AND make it pass. Either lock in current behavior with a
  passing test, or land an expected-failing test that the very next
  commit turns green.

### IR-specific rules (from current/PLAN.md)

- **Never modify `src/codegen.zig` in Phases 0–1.** It is the safety net.
- In Phase 3, only read specific sections of codegen.zig (grep for the relevant handler).
- No step should require reading more than ~1,000 lines of existing code. If it does, split it.
- No step should produce more than ~500 lines of new code. If it does, split it.
- If Claude gets confused mid-step, stop, update `current/CHECKPOINT.md` with partial progress, and tell the user to start a new session.

## Context management

- Each step is scoped so it can be completed in a single session without exhausting context.
- If you're running low on context, stop at the current step boundary, update `current/CHECKPOINT.md` with your progress, and tell the user to start a new session.
- Never try to complete multiple phases in one session unless each is small.

## Build and verify

After any code change:
```sh
zig build                              # must compile
zig build test                         # must pass
```

After completing a phase's final step, run the phase's end-to-end verification command listed in `current/PLAN.md`.

## Testing

After any compiler change:

1. **Build**: `zig build && zig build test`
2. **Run regression tests**: `bash tests/run_examples.sh`
   - All 29 tests must show `ok`
   - Zero failures, zero timeouts

### Snapshot integrity

**Never run `--update` while tests are failing.** The `--update` flag blindly overwrites expected output with whatever the compiler produces — including error messages. If you update snapshots during a broken state, the test suite will "pass" against garbage output and real regressions become invisible.

Safe workflow:
1. Fix the code until `bash tests/run_examples.sh` passes against the **existing** snapshots.
2. Only run `--update` when you've intentionally changed output (new feature, new test, changed formatting).
3. After `--update`, review the diff (`git diff tests/expected/`) to confirm no error messages or empty output were captured.

### Adding a new language feature

When implementing a new feature:

1. Add test case(s) to `examples/50-smoke.sx` in the appropriate section
2. Run `./zig-out/bin/sx run examples/50-smoke.sx` to verify it works
3. Regenerate expected output: `bash tests/run_examples.sh --update`
4. Verify all tests still pass: `bash tests/run_examples.sh`

### Test file roles

| File | Purpose |
|------|---------|
| `examples/50-smoke.sx` | Comprehensive feature coverage (~200 tests). Add new features here. |
| `examples/NN-name.sx` | Focused feature examples (e.g. `01-basic.sx`, `52-frameworks.sx`). Created either fresh or by renaming a resolved `issue-*.sx` once its bug is fixed. |
| `examples/issue-NNNN.sx` | **Open bug repros.** Each file is a literal item on the open-issue list, named after the issue number. When the bug is fixed, rename the file (and its `tests/expected/issue-NNNN.{txt,exit}`) to a focused feature example so the `issue-*` namespace shrinks to exactly the unresolved set. A file without a matching `tests/expected/issue-NNNN.txt` is an open issue that hasn't been pinned in the test suite yet. |
| `tests/expected/*.txt` | Expected output per example. Regenerate with `--update`. |
| `tests/expected/*.exit` | Expected exit codes. Auto-generated with `--update`. |
| `tests/run_examples.sh` | Test runner. Compares actual vs expected output. |

### Unit test file convention

All Zig unit tests live in separate `*.test.zig` files alongside the source they test:

| Source file | Test file |
|-------------|-----------|
| `src/ir/types.zig` | `src/ir/types.test.zig` |
| `src/ir/interp.zig` | `src/ir/interp.test.zig` |
| `src/ir/lower.zig` | `src/ir/lower.test.zig` |
| ... | ... |

- **Never put `test` blocks directly in source files.** All tests go in the corresponding `.test.zig` file.
- Each `.test.zig` file must be imported in the barrel file (`src/ir/ir.zig`) so `zig build test` discovers them via `refAllDecls`.
- When adding a new source file that needs tests, create the `.test.zig` file and add `pub const foo_tests = @import("foo.test.zig");` to the barrel.

### Creating a new standalone test

1. Create `examples/NN-name.sx` (focused feature example) **or** `examples/issue-NNNN.sx` (open-bug repro).
2. Run it: `./zig-out/bin/sx run examples/<file>.sx`
3. Create expected output: `bash tests/run_examples.sh --update`
4. Verify: `bash tests/run_examples.sh`

### Resolving an open issue

When a bug filed as `examples/issue-NNNN.sx` is fixed, the file should leave the issue-namespace:

1. Pick a focused feature name with the next free number, e.g. `examples/52-frameworks.sx`.
2. `git mv examples/issue-NNNN.sx examples/NN-name.sx`.
3. `git mv tests/expected/issue-NNNN.txt tests/expected/NN-name.txt` (and the `.exit` file).
4. Tighten the comment header to describe the feature (drop the issue-NNNN provenance — that lives in git history now).
5. Run the suite to confirm nothing else broke.

The `examples/issue-*` glob after step 5 is the literal list of what's still open.

## Known bugs

When you encounter a known bug during unrelated work (e.g. a closure returning a pointer instead of a value while fixing forward references), do NOT fix it inline. Instead:

1. Add it to `implementation_plan.md` under a `## Known Bugs` section with a short description, reproduction steps, and the file/line where it manifests.
2. Continue with the current task, working around the bug if needed.
3. Fix it at the end of the session (or in a future session) as a separate step.

This keeps the current task focused and avoids scope creep from non-trivial side-fixes.

## Git commits

- Never add Co-Authored-By lines to commit messages.

## Bundling lives in sx

Platform-specific bundling (Apple `.app`, Android `.apk`) is sx code.
The compiler shrinks to: parse → IR → codegen → link → invoke a sx
function. Codesigning / Info.plist / AndroidManifest / javac / d8 /
aapt2 / zipalign / apksigner / framework embed / entitlements / asset
trees all run in the IR interpreter post-link via libc / process.run
foreign calls.

| File | Role |
|------|------|
| [library/modules/platform/bundle.sx](library/modules/platform/bundle.sx) | All four targets (macOS, iOS sim, iOS device, Android). Branches on `BuildOptions.is_macos / is_ios / is_ios_device / is_ios_simulator / is_android` accessors. |
| [library/modules/fs.sx](library/modules/fs.sx) | POSIX file stdlib (open / read / write / copy / mkdir / unlink / chmod / rename / exists / basename / dirname). |
| [library/modules/process.sx](library/modules/process.sx) | popen-based `run(cmd) -> ?ProcessResult` + `env(name)` + `find_executable(name)`. |
| [library/modules/compiler.sx](library/modules/compiler.sx) | `BuildOptions` setters + accessors. Adding a new bundling parameter = add a setter here + a hook in compiler_hooks.zig. |
| [library/modules/platform/android.sx](library/modules/platform/android.sx) | `AndroidPlatform` (state-on-struct, no module globals). `sx_android_*` helpers take `plat: *AndroidPlatform` as first arg. `logical_w` field drives `dpi_scale = pixel_w / logical_w` so consumer's design-width fits any physical resolution. |
| [src/ir/compiler_hooks.zig](src/ir/compiler_hooks.zig) | `BuildConfig` + every `BuildOptions.*` hook. Hook registry is in `Registry.registerDefaults`. |
| [src/ir/host_ffi.zig](src/ir/host_ffi.zig) | `dlsym(RTLD_DEFAULT)` + arity-switched cdecl trampolines. Lets `#foreign("c")` decls resolve at `#run` / post-link time against host libc. |
| [src/main.zig](src/main.zig) | After `target.link()`, threads target_triple + frameworks + jni_main emissions into BuildConfig, then invokes the post-link callback by FuncId (or by `<module>.bundle_main` name). `--bundle` / `--apk` flags feed `bundle_path`; auto-fallback to `post_link_module = "platform.bundle"` when bundle_path is set without a registered callback. |

Specifics in [specs.md §10.5](specs.md). The full bundling pipeline
spec — what runs per Apple target vs Android, what each accessor
returns, the BuildConfig forwarded from main.zig — lives there.

Wiring a new bundling step:
1. Add the parameter as a setter on `BuildOptions :: struct #compiler { ... }` in [library/modules/compiler.sx](library/modules/compiler.sx).
2. Add the `BuildConfig` field + setter hook + accessor hook in [src/ir/compiler_hooks.zig](src/ir/compiler_hooks.zig). Register both in `Registry.registerDefaults`.
3. Optionally forward a CLI flag in [src/main.zig](src/main.zig) before the post-link invocation.
4. Read the accessor from [library/modules/platform/bundle.sx](library/modules/platform/bundle.sx).

## File roles

| File | Role |
|------|------|
| `specs.md` | Language specification. Source of truth for syntax/semantics. |
| `current/PLAN.md` | **Active** IR implementation plan. |
| `current/CHECKPOINT.md` | **Active** IR progress tracker. Update after every step. |
| `current/PLAN-FFI.md` | **Active** FFI ceremony reduction plan (Obj-C / JNI intrinsics, JNI DSL, Obj-C header import). |
| `current/CHECKPOINT-FFI.md` | **Active** FFI progress tracker. Update after every step. |
| `implementation_plan.md` | Archive of completed work (closures, protocols, etc.). Do not pick up tasks from here. |
| `readme.md` | Original syntax sketches. Do not modify. |
| `CLAUDE.md` | This file. Session instructions. |
| `library/modules/platform/bundle.sx` | sx-side `.app` / `.apk` bundler. See "Bundling lives in sx" above. |
| `library/modules/fs.sx`, `library/modules/process.sx` | POSIX stdlib for the bundler + general consumer use. |