# Debugging sx: traces, debug info, and stepping

This is the architecture spec for sx's debugging story — error return traces, DWARF debug info, and source-level stepping. It records *what* each piece does, *how* it works, and *why* it's built this way.

For the user-facing guide to writing fallible code (and what a trace looks like in practice), see [error-handling.md](error-handling.md). This document is the implementer/architect reference.

---

## The guiding principle

Debugging splits into two jobs, and conflating them is the trap:

1. **"My program errored — where, and along what path?"** (≈99% of the time)
2. **"I want to single-step in a real debugger."** (rare, deep)

sx solves #1 **itself, in-process, with zero OS dependencies** — the source location is baked in at compile time, so a trace needs no DWARF reader, no symbolizer, no `/proc`, no `atos`.

sx solves #2 by **emitting standard DWARF and handing it to an external debugger** (`lldb`/`gdb`), which already knows every platform's symbolization rules. We ship no symbolizer of our own.

The payoff: error traces work identically and deterministically on every target — desktop JIT, AOT binary, comptime interpreter, even a locked-down iOS device with no debugger attached — while real single-stepping is available for free wherever a debugger exists.

---

## The three execution contexts

sx code runs in three different machines, and the trace/debug design has to satisfy all three. "JIT" and "comptime" are **not** the same thing.

| Context | What runs the code | Trace frame representation |
|---|---|---|
| **AOT** (`sx build`) | native machine code in an on-disk binary | pointer to an interned `Frame` |
| **JIT** (`sx run`) | ORC-JIT'd machine code in anonymous memory | pointer to an interned `Frame` |
| **Comptime** (`#run`) | the IR interpreter (`interp.zig`) — no machine code | packed `(func_id, ir_offset)` |

The crucial constraint: **the same lowered IR runs in the compiled backend *and* the interpreter.** So a value the IR produces (like a trace frame) must mean the right thing in both — which is why the trace-push is a context-sensitive op (below), not a plain constant.

A second fact shaped the design: **iOS devices forbid JIT** (no `mmap(PROT_EXEC|PROT_WRITE)` for third-party apps). On-device sx is therefore AOT-only, and the trace must be readable on a device with no debugger attached — which the in-process embedded-`Frame` design delivers and a PC-symbolization design could not.

---

## Error return traces

A return trace is the path an error took from its `raise` site up through every `try` that propagated it. It is recorded as the error travels and formatted where it's caught (a `catch` handler, or the failable-`main` wrapper).

### The buffer

A thread-local fixed-cap ring of opaque `u64` frames lives in a vendored C runtime, [`library/vendors/sx_trace_runtime/sx_trace.c`](../library/vendors/sx_trace_runtime/sx_trace.c):

- `sx_trace_push(u64)` / `sx_trace_clear()` / `sx_trace_len()` / `sx_trace_truncated()` / `sx_trace_frame_at(u32)`.
- Capacity 32; overflow keeps the **newest** frames (Zig-style) and latches a `truncated` flag so the formatter can note "N frames omitted."

It lives in a separately-linked C file (not an emitted `thread_local` IR global) for the same reason as the JNI env slot: LLVM's ORC JIT doesn't initialize TLS for objects added via `AddObjectFile`.
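As a rough illustration of the buffer semantics listed above (fixed capacity, newest-wins overflow, a latched truncation flag), here is a minimal C sketch. It is not the contents of `sx_trace.c`: the `sketch_` names, the return types, the field layout, and the extra `sketch_trace_dropped` counter (one way to back an "N frames omitted" note) are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TRACE_CAP 32u /* capacity stated in the text */

/* One buffer per thread (C11 _Thread_local). The layout is an assumption;
 * the real runtime's fields are not shown in this document. */
static _Thread_local struct {
    uint64_t frames[SKETCH_TRACE_CAP];
    uint32_t head;    /* index of the oldest retained frame */
    uint32_t len;     /* number of retained frames, <= SKETCH_TRACE_CAP */
    uint32_t dropped; /* frames overwritten since the last clear */
} g_trace;

void sketch_trace_push(uint64_t frame) {
    if (g_trace.len < SKETCH_TRACE_CAP) {
        g_trace.frames[(g_trace.head + g_trace.len) % SKETCH_TRACE_CAP] = frame;
        g_trace.len++;
    } else {
        /* Full: overwrite the oldest slot so the newest frames survive. */
        g_trace.frames[g_trace.head] = frame;
        g_trace.head = (g_trace.head + 1u) % SKETCH_TRACE_CAP;
        g_trace.dropped++;
    }
}

void sketch_trace_clear(void) {
    g_trace.head = 0;
    g_trace.len = 0;
    g_trace.dropped = 0; /* clearing also resets the truncation latch */
}

uint32_t sketch_trace_len(void) { return g_trace.len; }

bool sketch_trace_truncated(void) { return g_trace.dropped != 0; }

/* Hypothetical helper, not in the API list above. */
uint32_t sketch_trace_dropped(void) { return g_trace.dropped; }

/* Index 0 is the oldest retained frame. */
uint64_t sketch_trace_frame_at(uint32_t i) {
    assert(i < g_trace.len);
    return g_trace.frames[(g_trace.head + i) % SKETCH_TRACE_CAP];
}
```

Pushing 40 frames into this sketch leaves the last 32 readable in push order, with the truncation flag set and 8 counted as dropped.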
The compiler links the `.c` so the JIT resolves `sx_trace_*` via `dlsym`; AOT targets pick it up as an auto-injected `#source` (gated on `Lowering.needs_trace_runtime`).

The buffer neither knows nor cares what a frame *means* — it just stores `u64`s. The producer and the formatter agree on the interpretation per context (next section).

### The frame: an embedded `Frame`, not a PC

**A runtime frame is a pointer to a compile-time-interned `Frame {file, line, col, func, line_text}`.** The lowerer already knows the push site's source location (the instruction's span + the enclosing function), so the location — *and the offending source line itself* (`line_text`, for the `^` caret snippet) — is baked into read-only data at compile time and the formatter reads it directly. No PC capture, no DWARF, no symbolizer, no runtime file read.

A comptime frame is instead a packed `(func_id: u32, ir_offset: u32)`, resolved through the interpreter's in-memory IR/source tables. The interpreter **never dereferences the compiled `Frame` pointer** — it uses its own representation — so the compiled and interpreted memory models never collide.

### The niladic trace-push op

Because the same IR runs in both machines, the push is a **dedicated, niladic, span-stamped IR op** — the same pattern as `is_comptime` / `interp_print_frames`. It carries **no operands and no global reference**; each backend derives the frame from its own context:

- **`emit_llvm`:** resolves the op's `span` + current function → `{file, line, col, func}` (reusing the source map wired in for DWARF), **interns and builds the `Frame` global in `emit_llvm`** (the same mechanism as the tag-name table), then emits `call sx_trace_push(ptr)`.
- **`interp`:** pushes the packed `(func_id, ir_offset)` from its own execution context.

This keeps the lowerer thin: at each push site it emits the op and nothing else — no operand wiring, no global construction.
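To make the two frame representations concrete, the following C sketch shows how a single `u64` slot could carry either one. Everything here is illustrative: the bit layout (`func_id` in the high 32 bits), the C mirror of `Frame` with its guessed field types, and all helper names are assumptions, since the document specifies only the field names and the `(func_id: u32, ir_offset: u32)` pair.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C mirror of the sx-defined Frame; field types are guesses. */
typedef struct Frame {
    const char *file;
    uint32_t line;
    uint32_t col;
    const char *func;
    const char *line_text;
} Frame;

/* Comptime representation: an id pair instead of an address. */
typedef struct ComptimeFrame {
    uint32_t func_id;
    uint32_t ir_offset;
} ComptimeFrame;

/* A pointer must fit in the u64 slot for the compiled encoding to work. */
static_assert(sizeof(uintptr_t) <= sizeof(uint64_t), "pointer wider than slot");

/* Compiled side: the slot holds the address of an interned Frame. */
static inline uint64_t compiled_frame_encode(const Frame *f) {
    return (uint64_t)(uintptr_t)f;
}

static inline const Frame *compiled_frame_decode(uint64_t raw) {
    return (const Frame *)(uintptr_t)raw;
}

/* Interpreter side: assumed layout, func_id high and ir_offset low. */
static inline uint64_t comptime_frame_pack(uint32_t func_id, uint32_t ir_offset) {
    return ((uint64_t)func_id << 32) | (uint64_t)ir_offset;
}

static inline ComptimeFrame comptime_frame_unpack(uint64_t raw) {
    ComptimeFrame cf;
    cf.func_id = (uint32_t)(raw >> 32);
    cf.ir_offset = (uint32_t)(raw & 0xFFFFFFFFu);
    return cf;
}
```

Nothing in the slot says which encoding it holds. That is safe only because, as the text states, each machine reads back the frames it pushed itself.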
The rejected alternative — an op carrying a `GlobalId` to an IR-level `Frame` global — would make the global visible to the interpreter (forcing comptime onto the pointer-deref path) and fatten the lowerer; **do not do this.**

`Frame` is defined **once** in sx (`trace.sx`/std); `emit_llvm` builds the interned global off that `TypeId` through the normal struct-emission path, never a bespoke byte layout (which would risk the "8-bytes-assumed" clobber class of bug).

`file`/`func` strings are interned into a shared pool so a path shared by N push sites is stored once — the table stays tiny. File paths are normalized to a stable relative form so trace output is machine-independent and snapshot-testable.

### Push and clear sites

Push (one frame each):

- `raise EXPR` — at the raise site.
- `try X` — on X's failure path, wherever that failure routes next.
- a bare failable in its legal positions (LHS of `catch`, LHS of an `or value` terminator, RHS of a destructure) — at the failure point.

Clear (every absorbing site — the error stops here):

- `catch e { ... }` runs (cleared at handler exit, so the handler still sees the chain and the buffer is empty afterward).
- an attempt succeeds inside an `or` chain.
- an `or value` terminator absorbs the failure.
- a destructure binds the error slot (the user now owns the error).

So at format time the buffer holds exactly the frames of failures that actually escaped to where you're formatting. Absorbed failures are push-then-clear and leave no residue — the steady state mirrors Zig's. `process.exit(code)` discards the buffer (immediate syscall, no flush).

### Output format

```
error return trace (most recent call last):
  parse at parse.sx:12:5
    if !is_digit(s[0]) raise error.BadDigit;
    ^
  run at main.sx:20:9
    v := try parse(s);
    ^
```

`func at file:line:col` per frame, oldest-first ("most recent call last"), with a best-effort source snippet + `^` caret.
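The output shape described above can be sketched in C as follows. The real formatter is written in sx, so this only illustrates the layout. The indentation widths, the caret being placed under the first character of the snippet, the C `Frame` mirror, and both function names are assumptions. A frame without `line_text` falls back to the bare location line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed C mirror of the sx-defined Frame; field types are guesses. */
typedef struct Frame {
    const char *file;
    uint32_t line;
    uint32_t col;
    const char *func;
    const char *line_text; /* NULL when no source line is available */
} Frame;

/* Formats one frame into out; returns snprintf's would-be length. */
int sketch_format_frame(char *out, size_t cap, const Frame *f) {
    assert(f != NULL);
    if (f->line_text == NULL) {
        /* Degraded form: location only, no snippet or caret. */
        return snprintf(out, cap, "  %s at %s:%u:%u\n",
                        f->func, f->file, (unsigned)f->line, (unsigned)f->col);
    }
    return snprintf(out, cap, "  %s at %s:%u:%u\n    %s\n    ^\n",
                    f->func, f->file, (unsigned)f->line, (unsigned)f->col,
                    f->line_text);
}

/* Formats a header plus frames[0..n), oldest first. Output is always
 * NUL-terminated; returns the number of bytes kept (truncating at cap). */
size_t sketch_format_trace(char *out, size_t cap,
                           const Frame *const *frames, size_t n) {
    assert(out != NULL && cap > 0);
    int w = snprintf(out, cap, "error return trace (most recent call last):\n");
    if (w < 0) { out[0] = '\0'; return 0; }
    if ((size_t)w >= cap) return cap - 1; /* header alone was truncated */
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = sketch_format_frame(out + used, cap - used, frames[i]);
        if (w < 0) { out[used] = '\0'; break; }
        if ((size_t)w >= cap - used) { used = cap - 1; break; }
        used += (size_t)w;
    }
    return used;
}
```

Because every field is a pre-baked string or integer, formatting involves no lookups of any kind, only concatenation.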
The snippet reads the source file if available (always true under `sx run`); it degrades to the bare `file:line:col` line when the source isn't present.

The formatter lives in [`library/modules/trace.sx`](../library/modules/trace.sx) (`to_string` / `print_current`); the failable-`main` reporter is `sx_trace_report_unhandled` in `sx_trace.c`.

### Build-mode gating

Traces follow the optimization level (mirrors `Lowering.tracesEnabled`):

- **Debug (`-O0`/`-O1`, the `sx run` default):** push/clear emitted; the `Frame` table is emitted.
- **Release (`-O2`/`-O3`):** push/clear are no-ops, no `Frame` table — a future `--release-traces` flag flips them back on.
- **Comptime (`#run`):** always on, regardless of build mode — a `#run` failure must produce a useful diagnostic even in a release build.

The success path costs nothing; the failure path costs one pointer push.

---

## DWARF debug info — a debugger-only artifact

sx emits standard DWARF so external debuggers can step sx code. **DWARF is not used by the trace formatter** — it exists solely for `lldb`/`gdb` (and on-device iOS debugging). It is independent debugger sugar that can be stripped without affecting traces.

### What's emitted

In [`src/ir/emit_llvm.zig`](../src/ir/emit_llvm.zig), gated on the same debug opt levels + a wired source map (`setDebugContext`):

- one `DICompileUnit` + `DIFile` on the main file,
- a `DISubprogram` per emitted function (`LLVMSetSubprogram`),
- a `DILocation` per instruction, resolved from `Inst.span` via `errors.SourceLoc.compute`, scoped to the function's subprogram,
- the `"Debug Info Version"` / `"Dwarf Version"` module flags, finalized with `LLVMDIBuilderFinalize`.

The `llvm-c/DebugInfo.h` DIBuilder API is bound in [`src/llvm_api.zig`](../src/llvm_api.zig).

### What it enables (and what it doesn't, yet)

- ✅ **breakpoints, `step`, `stepi`, backtrace, source-line mapping** — enabled by the line table + subprograms.
- ⚠️ **variable inspection (`p x`)** — needs `DILocalVariable` + `DIType` + location expressions per IR slot, which are **not emitted yet**. lldb can step and show the right source line, but `p x` reports no variable. This is an optional future slice; it's not required for stepping.

### macOS / iOS note

A linked Mach-O contains **no DWARF** — `ld` leaves a debug map (`OSO` stabs) pointing at the `.o` files. So `llvm-dwarfdump` on the executable shows nothing; you run `dsymutil` to collect a `.dSYM`, which lldb (and `atos`) consume. This is a standard build-time step, **not** something sx parses at runtime.

---

## Wiring: exactly how it's connected

This section is the file-and-function map — the concrete data flow for both the trace path and the DWARF path. Items marked ✅ exist today; ⏳ are the planned slice-3 shape.

### Where the pieces live

| File | Responsibility |
|---|---|
| [`src/core.zig`](../src/core.zig) | `Compilation`: owns `import_sources` (file→source map), constructs the emitter, calls `setDebugContext` + `emit`; re-enters the interpreter for `#run`/post-link |
| [`src/ir/lower.zig`](../src/ir/lower.zig) | AST→IR. Stamps `Inst.span`; emits push/clear at failure/absorb sites; `tracesEnabled` gate; declares the `sx_trace_*` externs |
| [`src/ir/emit_llvm.zig`](../src/ir/emit_llvm.zig) | IR→LLVM. Builds the interned `Frame` table; lowers the push op to a pointer push; emits all DWARF metadata |
| [`src/ir/interp.zig`](../src/ir/interp.zig) | Comptime IR interpreter. Lowers the push op to a packed `(func_id, offset)`; resolves comptime frames |
| [`src/errors.zig`](../src/errors.zig) | `SourceLoc.compute(source, offset) → {line, col}`; the `import_sources` map type |
| [`src/ir/inst.zig`](../src/ir/inst.zig) | `Inst.span`, `Function.source_file`, the `Op` union (home of the trace-push op) |
| [`library/vendors/sx_trace_runtime/sx_trace.c`](../library/vendors/sx_trace_runtime/sx_trace.c) | the thread-local ring buffer + `sx_trace_report_unhandled` |
| [`library/modules/trace.sx`](../library/modules/trace.sx) | the formatter (`to_string` / `print_current`) |
| [`src/llvm_api.zig`](../src/llvm_api.zig) | binds `llvm-c/Core.h` + `llvm-c/DebugInfo.h` |
| [`src/target.zig`](../src/target.zig) | `TargetConfig.opt_level` (the gate) + `is_aot` |

### The shared spine: one source-location resolver

Both paths resolve a byte offset to `file:line:col` the same way, so traces and DWARF can never disagree:

- ✅ `import_sources : StringHashMap([:0]const u8)` (file path → source text) is built in `core.zig` during `resolveImports` (main file + every import), and shared with both the diagnostics renderer and the emitter (via `setDebugContext`).
- ✅ `Inst.span` (a `{start, end}` byte range) is threaded onto every instruction by `Builder.current_span`, which `lower.zig` sets as it walks each expr/stmt (E3.0 slice 1). `Function.source_file` records which file a function's spans index.
- ✅ `errors.SourceLoc.compute(source, span.start)` turns an offset into `{line, col}`. Used by the diagnostics renderer, `#caller_location`, the DWARF emitter, and (planned) the trace formatter — one function, every consumer.

### Trace path: compile → run → format

**Producer (compile time) ✅ (3a)**

1. `lower.zig` reaches a failure site — `lowerRaise`, `lowerTry`'s propagation branch, `lowerFailableOr`, or `lowerDestructureDecl` — and (when `tracesEnabled()`) emits the niladic `.trace_frame_push` op, replacing today's `emitTracePush(placeholderTraceFrame())`.
   Absorbing sites emit `emitTraceClear()` → `call sx_trace_clear()`.
2. **Compiled backend** (`emit_llvm.emitInst`, `.trace_frame_push` arm): resolve the op's `span` + current function → `{file,line,col,func}`, intern into the `Frame` table (built alongside `tag_name_array`), and emit `call sx_trace_push(ptr_to_Frame)`. The `sx_trace_push` extern is declared lazily by `getTraceFids()` (which sets `needs_trace_runtime`).
3. **Interpreter** (`interp.zig`, same op): pack `(current_func_id, ir_offset)` into a `u64` and call the foreign `sx_trace_push` (resolved via `host_ffi` `dlsym` against the linked `sx_trace.c`).

**Buffer (run time) ✅** — `sx_trace.c` stores the `u64`s. Linked into the compiler so the JIT resolves `sx_trace_*` via `dlsym`; auto-injected as a `#source` for AOT when `needs_trace_runtime` is set.

**Formatter (run time) ✅ (compiled 3a, comptime 3b)** — `trace.sx` `to_string()` loops `sx_trace_len()` / `sx_trace_frame_at(i)` and resolves each `u64` through a **read-side context-split primitive** (the mirror of the push op):

- compiled: cast the `u64` → `*Frame`, load the fields.
- comptime: unpack `(func_id, offset)`, resolve via the interpreter's IR/source tables → a `Frame`.

The same `trace.sx` source works in both because it runs in the matching machine — a compiled program formats compiled frames, a `#run` formats comptime frames. It then prints `func at file:line:col` + a best-effort source snippet.

**Consumers ✅** — a `catch` handler calling `trace.print_current()`, and the failable-`main` wrapper, whose `ret` path in `emit_llvm` (`emitFailableMainRet`) calls `sx_trace_report_unhandled` in `sx_trace.c`.

### DWARF path: compile → debugger ✅

1. `core.zig` `generateCode`: `LLVMEmitter.init(...)` → `emitter.setDebugContext(&self.import_sources, self.file_path)` → `emitter.emit()`.
2. `emit()` **Pass -1** `initDebugInfo()`: gated by `debugEnabled()` (source map present + opt none/less).
   Creates the `DIBuilder`, adds the `"Debug Info Version"`/`"Dwarf Version"` module flags, and one `DICompileUnit` on `diFileFor(main_file)`.
3. **Pass 2** `emitFunction` → `beginFunctionDebug(func, llvm_func, name)`: `diFileFor(func.source_file)` → `LLVMDIBuilderCreateFunction` → `LLVMSetSubprogram`; stores it as `di_scope`.
4. `emitInst` (top, every instruction): `setInstDebugLocation(inst.span)` → `SourceLoc.compute` over `sourceForFile(current_func_file)` → `LLVMDIBuilderCreateDebugLocation(scope = di_scope)` → `LLVMSetCurrentDebugLocation2`. So every LLVM instruction the op emits carries the right `!dbg`.
5. `endFunctionDebug` clears `di_scope` + the builder location, so the synthetic Obj-C / global-ctor functions (no subprogram) inherit none.
6. **Pass 4** `finalizeDebugInfo()` → `LLVMDIBuilderFinalize`; `LLVMDisposeDIBuilder` in `deinit`.
7. Backend emits the object / JIT module. AOT Mach-O carries a debug map → `dsymutil` collects a `.dSYM` → `lldb`/`gdb` symbolize.

In release `debugEnabled()` is false → no `DIBuilder` runs → strippable to nothing.

### The gate: one switch, two consumers

`Lowering.tracesEnabled()` (lower.zig) and `LLVMEmitter.debugEnabled()` (emit_llvm) both reduce to `opt_level == .none or .less`. The `Frame` table + push/clear ride `tracesEnabled`; DWARF rides `debugEnabled`. Release (`-O2`/`-O3`) emits neither. `sx run` defaults to `-O0` (both on); `sx ir`/`sx asm` default to `-O2` (both off) — which is why the `.ir` snapshots don't drift when this machinery is present.

---

## Why not return-address PCs + DWARF (decision, 2026-06-01)

The original design captured return-address PCs and symbolized them via DWARF, Zig-style. We changed course. The full rationale lives in `implementation_plan.md` §Decisions Log; in brief:

- **The dual-execution split is unavoidable regardless.** Compiled code and the interpreter run the same IR, so a frame must be context-split whether it's a PC or a `Frame` pointer — PCs buy no simplification here.
- **JIT code has no on-disk DWARF.** `sx run` (the primary dev path, and what the test suite exercises) JITs into anonymous memory; symbolizing those PCs needs GDB-JIT registration + an in-process DWARF reader — the single largest chunk of the Zig-faithful approach.
- **iOS forbids JIT and prints best with no debugger.** Device builds are AOT; the embedded-`Frame` trace prints source-mapped to stderr/`os_log` with nothing attached — the biggest DX win on a locked-down platform, and impossible with PC symbolization there.
- **macOS keeps no DWARF in the linked binary** (debug-map → `.o`/`.dSYM`), so even AOT self-symbolization means porting a Mach-O debug-map + `.debug_line` reader.
- **Determinism.** Interned `Frame`s have no ASLR addresses, so trace output is snapshot-testable; raw PCs are not.

DWARF is still emitted (it's how Zig's own `std.debug` reads program debug info), but **demoted to the debugger-only role above**. All OS-specific symbolization is delegated to the platform debugger — sx ships none.

---

## Runtime artifacts

| Artifact | Lookup | Size | Shipped in release? |
|---|---|---|---|
| **Tag-name table** | tag id → name string | tiny (per distinct tag) | **yes, always** — `{}` interpolation, the `main` wrapper, and the trace's "raised error.X" line need names even in release |
| **`Frame` location table** | push site → `{file,line,col,func}` | small (interned strings; per push site) | **debug / `--release-traces` only** — rides the trace-mode gate |
| **DWARF (`.debug_line` / `DISubprogram`)** | PC → file:line:col, for *debuggers* | larger (per source position) | **debug / `--release-traces` only**, strippable; consumed by `lldb`/`gdb`, never by the trace formatter |

The tag-name table is always linked (it's how a tag renders as `BadDigit` in any build). The `Frame` table powers traces. DWARF is independent debugger sugar.
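The "interned strings" entry in the table above is what keeps the `Frame` table small: a file path or function name shared by many push sites is stored once. The C sketch below shows the idea with a deliberately simple linear-scan pool. The actual interning happens at compile time inside the emitter and is not shown in this document, so `StringPool`, `pool_intern`, and the fixed `POOL_MAX` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX 64 /* arbitrary bound for the sketch */

typedef struct StringPool {
    const char *items[POOL_MAX];
    size_t count;
} StringPool;

/* Returns the canonical pointer for the contents of s, adding s to the
 * pool on first sight. Returns NULL when the pool is full. The pool does
 * not copy: callers must keep the strings alive. */
const char *pool_intern(StringPool *pool, const char *s) {
    assert(pool != NULL && s != NULL);
    for (size_t i = 0; i < pool->count; i++) {
        if (strcmp(pool->items[i], s) == 0) {
            return pool->items[i]; /* already stored: reuse it */
        }
    }
    if (pool->count == POOL_MAX) {
        return NULL;
    }
    pool->items[pool->count++] = s;
    return s;
}
```

With this scheme, two push sites in `parse.sx` end up holding the same `file` pointer, so the per-site cost is the fixed-size `Frame` record rather than another copy of the path. A production version would likely hash instead of scanning.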

---

## Stepping and deep debugging

Stepping is delegated entirely to the platform debugger via the DWARF we emit; sx provides the artifacts and a launch convenience, nothing more.

### Artifacts

`sx build --emit-obj` keeps the DWARF-bearing object at its link-time path (`.sx-tmp/main.o`) instead of deleting it, and implies `-O0` (DWARF only emits at opt none/less). On **macOS** the linked binary's debug map resolves to that `.o`, so `lldb`/`gdb` run from the project root can step the binary directly; on **Linux** the DWARF is in the binary, so the `.o` isn't even needed. A portable `.dSYM` (via `dsymutil`) is only required for the on-device iOS rung (below).

### The verification ladder

Source-level stepping is verified manually/interactively (it needs `dsymutil`/`lldb`, and on device a signing identity + a `get-task-allow` provisioning profile — not a `run_examples.sh` test). Climb cheapest-first; the device run is the final sign-off:

1. **macOS native ✅ verified** — `sx build --emit-obj` → drive `lldb --batch` (the debug map resolves to the kept `.o`; no `dsymutil` needed locally). Checked in as `tests/debug_stepping_smoke.sh`: file:line breakpoint resolves to `.sx:line` + a source-mapped `bt`. The automatable rung.
2. **iOS simulator ✅ verified** — `sx build --target ios-sim --emit-obj` produces an `arm64-ios-simulator` Mach-O that runs under `simctl spawn` and steps in `lldb` (the backtrace shows a `dyld_sim` frame — proof it's the sim runtime). The `tests/debug_stepping_smoke.sh` rung-2 exercises this *against an already-booted sim* (it never boots one itself — use a single simulator); it also collects a `.dSYM` via `dsymutil`, removes the `.o`, and confirms lldb still resolves via the `.dSYM` — proving the device-applicable artifact path. Skipped when no sim is booted.
3. **iOS device (capstone) — manual, needs hardware + Apple signing.** Every *technical* piece is already verified above (DWARF, the `.dSYM` workflow, stepping under the sim runtime); the device rung adds only Apple-toolchain steps that require a phone + a development identity, so it's a checklist, not a compiler deliverable:
   1. `sx build --target ios --emit-obj …` (DWARF in the kept `.o`).
   2. `dsymutil -o .app.dSYM` (the `.app` ships no `.o`).
   3. bundle the `.app` (existing `--bundle` path) + debug-sign with a provisioning profile carrying **`get-task-allow`**.
   4. `xcrun devicectl device install app …` then launch under `debugserver`.
   5. attach `lldb` (it finds the adjacent `.dSYM`) and single-step sx source.

   No new compiler code is required — `--emit-obj` + standard Apple tools suffice. (A `--debug` convenience flag that chains 1–4 could be added later, but should be built with a device in hand to verify it.)

Independently, **Tier-0 always works with no debugger**: a plain on-device run still prints the embedded-`Frame` trace to stderr/`os_log`.

### Dependencies

Everything OS-specific is a **build-/run-time tool on the host** (the same ones any iOS app needs): `dsymutil`, `codesign` + provisioning, `devicectl`/`simctl`, `lldb`/`debugserver`. At **runtime, on the target, sx's dependency is zero** — the trace is `write(2, ...)` of pre-baked strings. We never call `atos`/`addr2line`, never read `/proc`, never parse a Mach-O debug map, never register JIT DWARF.
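To illustrate what "`write(2, ...)` of pre-baked strings" can look like, here is a small POSIX C sketch that emits a `file:line:col` location using only `write` and a hand-rolled decimal conversion, with no stdio and no allocation. It is an illustration under assumptions, not the runtime's reporter: the function names are hypothetical, and taking the file descriptor as a parameter is a choice made here so the sketch can be pointed at a pipe as easily as at stderr.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Writes all len bytes, retrying on partial writes and EINTR.
 * Returns 0 on success, -1 on any other error. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Converts v to decimal digits in out (no terminator); returns the digit
 * count. A uint32_t needs at most 10 digits. */
static size_t u32_to_dec(uint32_t v, char out[10]) {
    char tmp[10];
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + (v % 10u));
        v /= 10u;
    } while (v != 0);
    for (size_t i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i]; /* digits were produced in reverse */
    }
    return n;
}

/* Emits "file:line:col\n" to fd. Returns 0 on success, -1 on error. */
int sketch_emit_location(int fd, const char *file, uint32_t line, uint32_t col) {
    assert(file != NULL);
    char num[10];
    size_t n;
    if (write_all(fd, file, strlen(file)) != 0) return -1;
    if (write_all(fd, ":", 1) != 0) return -1;
    n = u32_to_dec(line, num);
    if (write_all(fd, num, n) != 0) return -1;
    if (write_all(fd, ":", 1) != 0) return -1;
    n = u32_to_dec(col, num);
    if (write_all(fd, num, n) != 0) return -1;
    return write_all(fd, "\n", 1);
}
```

Calling `sketch_emit_location(2, "main.sx", 20, 9)` would send `main.sx:20:9` and a newline to stderr. The only thing the target has to provide is the `write` system call.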

---

## Implementation status

| Piece | Status |
|---|---|
| Tag-name table + `{}` interpolation | ✅ done (`a3ff503`) |
| Trace buffer (`sx_trace.c`) + push/clear wiring | ✅ done (`51f5277` / `ea40724`) |
| `trace.sx` formatting (placeholder locations) | ✅ done (`bb20339`) |
| IR instructions carry source spans | ✅ done — E3.0 slice 1 (`b44a5d0`) |
| DWARF emission (compile unit / subprogram / line table) | ✅ done — E3.0 slice 2 (`c32d694`) |
| Niladic trace-push op + interned `Frame` table (runtime) | ✅ done — E3.3 slice 3a (`1b6cbc1`) |
| Comptime resolver (`func_id, ir_offset` → location) | ✅ done — slice 3b |
| Source snippet + `^` caret | ✅ done — slice 3c (line embedded in `Frame`) |
| `--emit-obj` artifact plumbing | ✅ done — slice 3d |
| Stepping verification: macOS lldb | ✅ done — 3e rung 1 (`tests/debug_stepping_smoke.sh`) |
| Stepping verification: iOS simulator + `.dSYM` path | ✅ done — 3e rung 2 (verified; smoke skips if no booted sim) |
| Stepping verification: iOS device | 📋 manual checklist — needs hardware + signing (no compiler gap) |
| DWARF variable info (`DILocalVariable`, for `p x`) | ⏳ optional follow-on |

The active plan and step breakdown live in `current/PLAN-ERR.md` (§"Why not PCs + DWARF" + Step E3.0/E3.3) and `current/CHECKPOINT-ERR.md`; the design decisions are logged in `implementation_plan.md` §Decisions Log.