lab/sx

agra 4cd641c946 ERR/E3.0 (slice 3d): --emit-obj + macOS lldb stepping verified

`sx build --emit-obj` keeps the DWARF-bearing object so a debugger can
step the binary, completing the deep-debug half of the trace story.

- --emit-obj flag + TargetConfig.emit_obj. Implies -O0 (DWARF only
  emits at opt none/less); keeps the object at its link-time path
  .sx-tmp/main.o so the binary's debug map resolves to it; skips the
  Level-1 binary cache; reports the object path. macOS resolves via the
  debug map -> .o; Linux carries DWARF in the binary. Build-flow only,
  no runtime/codegen change.
- tests/debug_stepping_smoke.sh (3e rung 1; macOS, lldb, not in
  run_examples): builds with --emit-obj, drives an lldb file:line
  breakpoint, asserts resolution + a source-mapped backtrace. Passing —
  proves the slice 1-2 DWARF drives real source-level stepping.

(Also normalizes the 253 .exit trailing newline from the 3c --update.)
Gates: zig build, zig build test, run_examples.sh -> 291 passed.

2026-06-01 15:55:05 +03:00

Debugging sx: traces, debug info, and stepping

This is the architecture spec for sx's debugging story — error return traces, DWARF debug info, and source-level stepping. It records what each piece does, how it works, and why it's built this way.

For the user-facing guide to writing fallible code (and what a trace looks like in practice), see error-handling.md. This document is the implementer/architect reference.

The guiding principle

Debugging splits into two jobs, and conflating them is the trap:

1. "My program errored: where, and along what path?" (≈99% of the time)
2. "I want to single-step in a real debugger." (rare, deep)

sx solves #1 itself, in-process, with zero OS dependencies — the source location is baked in at compile time, so a trace needs no DWARF reader, no symbolizer, no /proc, no atos. sx solves #2 by emitting standard DWARF and handing it to an external debugger (lldb/gdb), which already knows every platform's symbolization rules. We ship no symbolizer of our own.

The payoff: error traces work identically and deterministically on every target — desktop JIT, AOT binary, comptime interpreter, even a locked-down iOS device with no debugger attached — while real single-stepping is available for free wherever a debugger exists.

The three execution contexts

sx code runs in three different machines, and the trace/debug design has to satisfy all three. "JIT" and "comptime" are not the same thing.

| Context | What runs the code | Trace frame representation |
| --- | --- | --- |
| AOT (`sx build`) | native machine code in an on-disk binary | pointer to an interned `Frame` |
| JIT (`sx run`) | ORC-JIT'd machine code in anonymous memory | pointer to an interned `Frame` |
| Comptime (`#run`) | the IR interpreter (`interp.zig`); no machine code | packed `(func_id, ir_offset)` |

The crucial constraint: the same lowered IR runs in the compiled backend and the interpreter. So a value the IR produces (like a trace frame) must mean the right thing in both — which is why the trace-push is a context-sensitive op (below), not a plain constant.

A second fact shaped the design: iOS devices forbid JIT (no mmap(PROT_EXEC|PROT_WRITE) for third-party apps). On-device sx is therefore AOT-only, and the trace must be readable on a device with no debugger attached — which the in-process embedded-Frame design delivers and a PC-symbolization design could not.

Error return traces

A return trace is the path an error took from its raise site up through every try that propagated it. It is recorded as the error travels and formatted where it's caught (a catch handler, or the failable-main wrapper).

The buffer

A thread-local fixed-cap ring of opaque u64 frames lives in a vendored C runtime, library/vendors/sx_trace_runtime/sx_trace.c:

sx_trace_push(u64) / sx_trace_clear() / sx_trace_len() / sx_trace_truncated() / sx_trace_frame_at(u32).
Capacity 32; overflow keeps the newest frames (Zig-style) and latches a truncated flag so the formatter can note "N frames omitted."

It lives in a separately-linked C file (not an emitted thread_local IR global) for the same reason as the JNI env slot: LLVM's ORC JIT doesn't initialize TLS for objects added via AddObjectFile. The compiler links the .c so the JIT resolves sx_trace_* via dlsym; AOT targets pick it up as an auto-injected #source (gated on Lowering.needs_trace_runtime).

The buffer neither knows nor cares what a frame means — it just stores u64s. The producer and the formatter agree on the interpretation per context (next section).
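The buffer contract described above can be sketched in a few lines of C. This is a minimal illustration, not the contents of sx_trace.c: the five function names and the capacity of 32 come from the text, while the return types, the internal head/count layout, and the choice to reset the truncated flag on clear are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SX_TRACE_CAP 32u /* capacity stated in the text */

/* One buffer per thread. C11 _Thread_local is an assumption about how
   the real runtime declares its storage. */
static _Thread_local uint64_t frames[SX_TRACE_CAP];
static _Thread_local uint32_t head;  /* slot of the oldest retained frame */
static _Thread_local uint32_t count; /* retained frames, <= SX_TRACE_CAP */
static _Thread_local bool truncated; /* latched once a frame is dropped */

void sx_trace_push(uint64_t frame) {
    if (count < SX_TRACE_CAP) {
        frames[(head + count) % SX_TRACE_CAP] = frame;
        count++;
    } else {
        /* Full: overwrite the oldest frame so the newest ones survive. */
        frames[head] = frame;
        head = (head + 1) % SX_TRACE_CAP;
        truncated = true;
    }
}

/* Assumption: clearing also resets the truncated latch. */
void sx_trace_clear(void) { head = 0; count = 0; truncated = false; }

uint32_t sx_trace_len(void) { return count; }

bool sx_trace_truncated(void) { return truncated; }

/* Index 0 is the oldest retained frame (an assumption about ordering). */
uint64_t sx_trace_frame_at(uint32_t i) {
    assert(i < count);
    return frames[(head + i) % SX_TRACE_CAP];
}
```

Pushing 40 frames numbered 0 to 39 into this sketch leaves 32 entries, with frame 8 at index 0 and frame 39 at index 31, and the truncated flag set.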

The frame: an embedded `Frame`, not a PC

A runtime frame is a pointer to a compile-time-interned Frame {file, line, col, func, line_text}. The lowerer already knows the push site's source location (the instruction's span + the enclosing function), so the location — and the offending source line itself (line_text, for the ^ caret snippet) — is baked into read-only data at compile time and the formatter reads it directly. No PC capture, no DWARF, no symbolizer, no runtime file read.

A comptime frame is instead a packed (func_id: u32, ir_offset: u32), resolved through the interpreter's in-memory IR/source tables. The interpreter never dereferences the compiled Frame pointer — it uses its own representation — so the compiled and interpreted memory models never collide.
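To make the two representations concrete, here is a hedged C sketch of how a single u64 slot could carry either one. The Frame field names follow the text; the field types, the helper names, and the choice to put func_id in the high 32 bits are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the text; the field types are assumptions. */
typedef struct {
    const char *file;
    uint32_t line;
    uint32_t col;
    const char *func;
    const char *line_text;
} Frame;

/* Compiled contexts (AOT/JIT): the u64 is the address of an interned Frame. */
uint64_t frame_to_u64(const Frame *f) {
    assert(f != NULL);
    return (uint64_t)(uintptr_t)f;
}

const Frame *frame_from_u64(uint64_t v) {
    return (const Frame *)(uintptr_t)v;
}

/* Comptime: pack (func_id, ir_offset) into one u64.
   Placing func_id in the high half is an assumption. */
uint64_t pack_comptime(uint32_t func_id, uint32_t ir_offset) {
    return ((uint64_t)func_id << 32) | (uint64_t)ir_offset;
}

uint32_t comptime_func_id(uint64_t v) { return (uint32_t)(v >> 32); }

uint32_t comptime_ir_offset(uint64_t v) { return (uint32_t)(v & 0xFFFFFFFFu); }
```

The buffer never needs to know which encoding it holds; only the producer and the formatter running in the same context have to agree.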

The niladic trace-push op

Because the same IR runs in both machines, the push is a dedicated, niladic, span-stamped IR op — the same pattern as is_comptime / interp_print_frames. It carries no operands and no global reference; each backend derives the frame from its own context:

emit_llvm: resolves the op's span + current function → {file, line, col, func} (reusing the source map wired in for DWARF), interns and builds the Frame global in emit_llvm (the same mechanism as the tag-name table), then emits call sx_trace_push(ptr).
interp: pushes the packed (func_id, ir_offset) from its own execution context.

This keeps the lowerer thin: at each push site it emits the op and nothing else — no operand wiring, no global construction. The rejected alternative — an op carrying a GlobalId to an IR-level Frame global — would make the global visible to the interpreter (forcing comptime onto the pointer-deref path) and fatten the lowerer; do not do this.

Frame is defined once in sx (trace.sx/std); emit_llvm builds the interned global off that TypeId through the normal struct-emission path, never a bespoke byte layout (which would risk the "8-bytes-assumed" clobber class of bug). file/func strings are interned into a shared pool so a path shared by N push sites is stored once — the table stays tiny. File paths are normalized to a stable relative form so trace output is machine-independent and snapshot-testable.
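The string-pool idea is simple enough to show in isolation. The sketch below is not the emitter's implementation (which lives in Zig and presumably uses a hash map); it is a linear-scan C illustration with hypothetical names (intern, intern_count, POOL_MAX) showing why N push sites in one file cost one stored path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX 64 /* arbitrary cap for the sketch */

static const char *pool[POOL_MAX];
static size_t pool_len;

/* Returns a canonical pointer for s: equal strings share one pool entry.
   The pool keeps the caller's pointer rather than copying, so s must
   outlive the pool. */
const char *intern(const char *s) {
    for (size_t i = 0; i < pool_len; i++) {
        if (strcmp(pool[i], s) == 0) return pool[i];
    }
    assert(pool_len < POOL_MAX);
    pool[pool_len++] = s;
    return s;
}

/* Number of distinct strings stored so far. */
size_t intern_count(void) { return pool_len; }
```

Two Frames built for different push sites in parse.sx would then hold the same file pointer, so the path bytes appear once in read-only data.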

Push and clear sites

Push (one frame each):

raise EXPR — at the raise site.
try X — on X's failure path, wherever that failure routes next.
a bare failable in its legal positions (LHS of catch, LHS of an or value terminator, RHS of a destructure) — at the failure point.

Clear (every absorbing site — the error stops here):

catch e { ... } runs (cleared when the handler exits, so the handler still sees the chain and the buffer is empty afterwards).
an attempt succeeds inside an or chain.
an or value terminator absorbs the failure.
a destructure binds the error slot (the user now owns the error).

So at format time the buffer holds exactly the frames of failures that actually escaped to where you're formatting. Absorbed failures are push-then-clear and leave no residue — the steady state mirrors Zig's.

process.exit(code) discards the buffer (immediate syscall, no flush).
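The push and clear rules can be modelled with ordinary C control flow. This is a toy model under stated assumptions, not generated code: error propagation is simulated with int return codes, the buffer is a bare array, and every name (parse, run, frames_seen_by_catch, the SITE_ ids) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the trace buffer: just enough to count frames. */
static uint64_t buf[32];
static uint32_t len;
static void push(uint64_t site) { if (len < 32) buf[len++] = site; }
static void clear(void) { len = 0; }

enum { OK = 0, ERR_BAD_DIGIT = 1 };
enum { SITE_RAISE = 1, SITE_TRY = 2 }; /* hypothetical frame ids */

/* Models `raise error.BadDigit`: one frame pushed at the raise site. */
static int parse(char c) {
    if (c < '0' || c > '9') { push(SITE_RAISE); return ERR_BAD_DIGIT; }
    return OK;
}

/* Models `try parse(c)`: one frame pushed on the failure path, then propagate. */
static int run(char c) {
    int e = parse(c);
    if (e != OK) { push(SITE_TRY); return e; }
    return OK;
}

/* Models `run(c) catch e { ... }`: the handler observes the chain,
   then the absorbing site clears the buffer. */
uint32_t frames_seen_by_catch(char c) {
    uint32_t seen = 0;
    if (run(c) != OK) {
        seen = len; /* handler body: chain still intact */
        clear();    /* the error stops here */
    }
    return seen;
}

uint32_t frames_left(void) { return len; }
```

A failing input shows two frames to the handler (the raise and the try) and leaves nothing behind; a succeeding input pushes nothing at all.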

Output format

error return trace (most recent call last):
  parse at parse.sx:12:5
     if !is_digit(s[0]) raise error.BadDigit;
                        ^
  run   at main.sx:20:9
     v := try parse(s);
          ^

func at file:line:col per frame, oldest-first ("most recent call last"), with a best-effort source snippet + ^ caret. The snippet reads the source file if available (always true under sx run); it degrades to the bare file:line:col line when the source isn't present. The formatter lives in library/modules/trace.sx (to_string / print_current); the failable-main reporter is sx_trace_report_unhandled in sx_trace.c.
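As an illustration of the per-frame layout, the following C sketch formats one frame into a caller-supplied buffer. It is not the trace.sx formatter: the function name format_frame, the five-space snippet indent, and the rule that the caret sits under the 1-based col of line_text are assumptions, and the sketch does not pad function names into an aligned column as the sample output does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *file;
    uint32_t line, col;    /* assumed 1-based */
    const char *func;
    const char *line_text; /* NULL when no snippet is available */
} Frame;

/* Writes one frame into out[0..cap) and returns the number of bytes
   written, or 0 if the text did not fit. */
size_t format_frame(char *out, size_t cap, const Frame *f) {
    int n;
    assert(f != NULL);
    if (f->line_text != NULL) {
        /* "%*s" right-aligns "^" in a field of width col, so the caret
           lands under column col of the snippet. */
        n = snprintf(out, cap, "  %s at %s:%u:%u\n     %s\n     %*s\n",
                     f->func, f->file, (unsigned)f->line, (unsigned)f->col,
                     f->line_text, (int)f->col, "^");
    } else {
        /* Degraded form: location line only. */
        n = snprintf(out, cap, "  %s at %s:%u:%u\n",
                     f->func, f->file, (unsigned)f->line, (unsigned)f->col);
    }
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}
```

A full formatter would print the header line once, then call something like this for each index from 0 to sx_trace_len() - 1.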

Build-mode gating

Traces follow the optimization level (mirrors Lowering.tracesEnabled):

Debug (-O0/-O1, the sx run default): push/clear emitted; the Frame table is emitted.
Release (-O2/-O3): push/clear are no-ops, no Frame table — a future --release-traces flag flips them back on.
Comptime (#run): always on, regardless of build mode — a #run failure must produce a useful diagnostic even in a release build.

The success path costs nothing; the failure path costs one pointer push.
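The gating rule reduces to a small predicate. The C sketch below restates it under assumptions: the enum names are hypothetical stand-ins for the compiler's opt levels and execution contexts, and the future --release-traces flag is left out because it does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for -O0, -O1, -O2, -O3. */
typedef enum { OPT_NONE, OPT_LESS, OPT_DEFAULT, OPT_AGGRESSIVE } OptLevel;

/* The three execution contexts from the text. */
typedef enum { CTX_AOT, CTX_JIT, CTX_COMPTIME } Context;

bool traces_enabled(OptLevel opt, Context ctx) {
    assert(opt <= OPT_AGGRESSIVE);
    if (ctx == CTX_COMPTIME) return true;      /* #run: always on */
    return opt == OPT_NONE || opt == OPT_LESS; /* debug builds only */
}
```

When the predicate is false, the lowerer emits no push or clear at all, which is what makes the release success path free.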

DWARF debug info — a debugger-only artifact

sx emits standard DWARF so external debuggers can step sx code. DWARF is not used by the trace formatter — it exists solely for lldb/gdb (and on-device iOS debugging). It is independent debugger sugar that can be stripped without affecting traces.

What's emitted

In src/ir/emit_llvm.zig, gated on the same debug opt levels + a wired source map (setDebugContext):

one DICompileUnit + DIFile on the main file,
a DISubprogram per emitted function (LLVMSetSubprogram),
a DILocation per instruction, resolved from Inst.span via errors.SourceLoc.compute, scoped to the function's subprogram,
the "Debug Info Version" / "Dwarf Version" module flags, finalized with LLVMDIBuilderFinalize.

The llvm-c/DebugInfo.h DIBuilder API is bound in src/llvm_api.zig.

What it enables (and what it doesn't, yet)

✅ breakpoints, step, stepi, backtrace, source-line mapping — enabled by the line table + subprograms.
⚠️ variable inspection (p x) — needs DILocalVariable + DIType + location expressions per IR slot, which are not emitted yet. lldb can step and show the right source line, but p x reports no variable. This is an optional future slice; it's not required for stepping.

macOS / iOS note

A linked Mach-O contains no DWARF: ld leaves a debug map (OSO stabs) pointing at the .o files, so llvm-dwarfdump on the executable shows nothing. lldb can follow that debug map to the .o files while they are still on disk; for a portable bundle you run dsymutil to collect a .dSYM, which lldb (and atos) consume. dsymutil is a standard build-time step, not something sx runs or parses at runtime.

Wiring: exactly how it's connected

This section is the file-and-function map — the concrete data flow for both the trace path and the DWARF path. Items marked ✅ exist today; ⏳ are the planned slice-3 shape.

Where the pieces live

| File | Responsibility |
| --- | --- |
| `src/core.zig` | `Compilation`: owns `import_sources` (file→source map), constructs the emitter, calls `setDebugContext` + `emit`; re-enters the interpreter for `#run`/post-link |
| `src/ir/lower.zig` | AST→IR. Stamps `Inst.span`; emits push/clear at failure/absorb sites; `tracesEnabled` gate; declares the `sx_trace_*` externs |
| `src/ir/emit_llvm.zig` | IR→LLVM. Builds the interned `Frame` table; lowers the push op to a pointer push; emits all DWARF metadata |
| `src/ir/interp.zig` | Comptime IR interpreter. Lowers the push op to a packed `(func_id, offset)`; resolves comptime frames |
| `src/errors.zig` | `SourceLoc.compute(source, offset) → {line, col}`; the `import_sources` map type |
| `src/ir/inst.zig` | `Inst.span`, `Function.source_file`, the `Op` union (home of the trace-push op) |
| `library/vendors/sx_trace_runtime/sx_trace.c` | the thread-local ring buffer + `sx_trace_report_unhandled` |
| `library/modules/trace.sx` | the formatter (`to_string` / `print_current`) |
| `src/llvm_api.zig` | binds `llvm-c/Core.h` + `llvm-c/DebugInfo.h` |
| `src/target.zig` | `TargetConfig.opt_level` (the gate) + `is_aot` |

The shared spine: one source-location resolver

Both paths resolve a byte offset to file:line:col the same way, so traces and DWARF can never disagree:

✅ import_sources : StringHashMap([:0]const u8) (file path → source text) is built in core.zig during resolveImports (main file + every import), and shared with both the diagnostics renderer and the emitter (via setDebugContext).
✅ Inst.span (a {start, end} byte range) is threaded onto every instruction by Builder.current_span, which lower.zig sets as it walks each expr/stmt (E3.0 slice 1). Function.source_file records which file a function's spans index.
✅ errors.SourceLoc.compute(source, span.start) turns an offset into {line, col}. Used by the diagnostics renderer, #caller_location, the DWARF emitter, and the compile-time Frame builder that feeds the trace formatter: one function, every consumer.
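For readers unfamiliar with this kind of resolver, here is a minimal C sketch of turning a byte offset into a line and column. It is an illustration, not a port of errors.SourceLoc.compute: the name source_loc_compute, the 1-based numbering, and counting columns in bytes rather than code points are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned line;
    unsigned col;
} SourceLoc;

/* Walks source[0..offset) and counts newlines. Lines and columns are
   1-based here; col counts bytes since the last newline. Stops early
   at the terminating NUL if offset runs past the end. */
SourceLoc source_loc_compute(const char *source, size_t offset) {
    SourceLoc loc = {1, 1};
    assert(source != NULL);
    for (size_t i = 0; i < offset && source[i] != '\0'; i++) {
        if (source[i] == '\n') {
            loc.line++;
            loc.col = 1;
        } else {
            loc.col++;
        }
    }
    return loc;
}
```

Because every consumer calls the same function on the same source text and span, a trace frame and a DWARF line entry for one instruction resolve to the same position.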

Trace path: compile → run → format

Producer (compile time) ✅ (3a)

lower.zig reaches a failure site (lowerRaise, lowerTry's propagation branch, lowerFailableOr, or lowerDestructureDecl) and, when tracesEnabled(), emits the niladic .trace_frame_push op, which replaced the earlier emitTracePush(placeholderTraceFrame()) placeholder. Absorbing sites emit emitTraceClear() → call sx_trace_clear().
Compiled backend (emit_llvm.emitInst, .trace_frame_push arm): resolve the op's span + current function → {file,line,col,func}, intern into the Frame table (built alongside tag_name_array), and emit call sx_trace_push(ptr_to_Frame). The sx_trace_push extern is declared lazily by getTraceFids() (which sets needs_trace_runtime).
Interpreter (interp.zig, same op): pack (current_func_id, ir_offset) into a u64 and call the foreign sx_trace_push (resolved via host_ffi dlsym against the linked sx_trace.c).

Buffer (run time) ✅ — sx_trace.c stores the u64s. Linked into the compiler so the JIT resolves sx_trace_* via dlsym; auto-injected as a #source for AOT when needs_trace_runtime is set.

Formatter (run time) ✅ (compiled 3a, comptime 3b) — trace.sx to_string() loops sx_trace_len() / sx_trace_frame_at(i) and resolves each u64 through a read-side context-split primitive (the mirror of the push op):

compiled: cast the u64 → *Frame, load the fields.
comptime: unpack (func_id, offset), resolve via the interpreter's IR/source tables → a Frame.

The same trace.sx source works in both because it runs in the matching machine — a compiled program formats compiled frames, a #run formats comptime frames. It then prints func at file:line:col + a best-effort source snippet.

Consumers ✅ — a catch handler calling trace.print_current(), and the failable-main wrapper, whose ret path in emit_llvm (emitFailableMainRet) calls sx_trace_report_unhandled in sx_trace.c.

DWARF path: compile → debugger ✅

core.zig generateCode: LLVMEmitter.init(...) → emitter.setDebugContext(&self.import_sources, self.file_path) → emitter.emit().
emit() Pass -1 initDebugInfo(): gated by debugEnabled() (source map present + opt none/less). Creates the DIBuilder, adds the "Debug Info Version"/"Dwarf Version" module flags, and one DICompileUnit on diFileFor(main_file).
Pass 2 emitFunction → beginFunctionDebug(func, llvm_func, name): diFileFor(func.source_file) → LLVMDIBuilderCreateFunction → LLVMSetSubprogram; stores it as di_scope.
emitInst (top, every instruction): setInstDebugLocation(inst.span) → SourceLoc.compute over sourceForFile(current_func_file) → LLVMDIBuilderCreateDebugLocation(scope = di_scope) → LLVMSetCurrentDebugLocation2. So every LLVM instruction the op emits carries the right !dbg.
endFunctionDebug clears di_scope + the builder location, so the synthetic Obj-C / global-ctor functions (no subprogram) inherit none.
Pass 4 finalizeDebugInfo() → LLVMDIBuilderFinalize; LLVMDisposeDIBuilder in deinit.
Backend emits the object / JIT module. AOT Mach-O carries a debug map → dsymutil collects a .dSYM → lldb/gdb symbolize. In release debugEnabled() is false → no DIBuilder runs → strippable to nothing.

The gate: one switch, two consumers

Lowering.tracesEnabled() (lower.zig) and LLVMEmitter.debugEnabled() (emit_llvm) both reduce to opt_level == .none or .less. The Frame table + push/clear ride tracesEnabled; DWARF rides debugEnabled. Release (-O2/-O3) emits neither. sx run defaults to -O0 (both on); sx ir/sx asm default to -O2 (both off) — which is why the .ir snapshots don't drift when this machinery is present.

Why not return-address PCs + DWARF (decision, 2026-06-01)

The original design captured return-address PCs and symbolized them via DWARF, Zig-style. We changed course. The full rationale lives in implementation_plan.md §Decisions Log; in brief:

The dual-execution split is unavoidable regardless. Compiled code and the interpreter run the same IR, so a frame must be context-split whether it's a PC or a Frame pointer — PCs buy no simplification here.
JIT code has no on-disk DWARF. sx run (the primary dev path, and what the test suite exercises) JITs into anonymous memory; symbolizing those PCs needs GDB-JIT registration + an in-process DWARF reader — the single largest chunk of the Zig-faithful approach.
iOS forbids JIT and prints best with no debugger. Device builds are AOT; the embedded-Frame trace prints source-mapped to stderr/os_log with nothing attached — the biggest DX win on a locked-down platform, and impossible with PC symbolization there.
macOS keeps no DWARF in the linked binary (debug-map → .o/.dSYM), so even AOT self-symbolization means porting a Mach-O debug-map + .debug_line reader.
Determinism. Interned Frames have no ASLR addresses, so trace output is snapshot-testable; raw PCs are not.

DWARF is still emitted (it's how Zig's own std.debug reads program debug info), but demoted to the debugger-only role above. All OS-specific symbolization is delegated to the platform debugger — sx ships none.

Runtime artifacts

| Artifact | Lookup | Size | Shipped in release? |
| --- | --- | --- | --- |
| Tag-name table | tag id → name string | tiny (per distinct tag) | yes, always: `{}` interpolation, the `main` wrapper, and the trace's "raised error.X" line need names even in release |
| `Frame` location table | push site → `{file,line,col,func}` | small (interned strings; per push site) | debug / `--release-traces` only; rides the trace-mode gate |
| DWARF (`.debug_line` / `DISubprogram`) | PC → file:line:col, for debuggers | larger (per source position) | debug / `--release-traces` only, strippable; consumed by `lldb`/`gdb`, never by the trace formatter |

The tag-name table is always linked (it's how a tag renders as BadDigit in any build). The Frame table powers traces. DWARF is independent debugger sugar.

Stepping and deep debugging

Stepping is delegated entirely to the platform debugger via the DWARF we emit; sx provides the artifacts and a launch convenience, nothing more.

Artifacts

sx build --emit-obj keeps the DWARF-bearing object at its link-time path (.sx-tmp/main.o) instead of deleting it, and implies -O0 (DWARF only emits at opt none/less). On macOS the linked binary's debug map resolves to that .o, so lldb/gdb run from the project root can step the binary directly; on Linux the DWARF is in the binary, so the .o isn't even needed. A portable .dSYM (via dsymutil) is only required for the on-device iOS rung (below).

The verification ladder

Source-level stepping is verified manually/interactively (it needs dsymutil/lldb, and on device a signing identity + a get-task-allow provisioning profile — not a run_examples.sh test). Climb cheapest-first; the device run is the final sign-off:

1. macOS native ✅: sx build --emit-obj, then drive lldb --batch (the debug map resolves to the kept .o; no dsymutil needed locally). Checked in as tests/debug_stepping_smoke.sh: set a breakpoint on an sx function, run, and assert that it stops at the right .sx:line, that next/stepi advance, and that bt is source-mapped. This is the one automatable rung.
2. iOS simulator: bundle the .app, install to a booted simulator (simctl), launch under lldb, repeat the checks. No device, no signing.
3. iOS device (capstone): with --debug, emit DWARF → dsymutil .dSYM, debug-sign with get-task-allow, install via devicectl, launch under debugserver, attach lldb, and single-step sx source on the phone. If stepping works here, on the most locked-down target, the DWARF story is proven everywhere.

Independently, Tier-0 always works with no debugger: a plain on-device run still prints the embedded-Frame trace to stderr/os_log.

Dependencies

Everything OS-specific is a build-/run-time tool on the host (the same ones any iOS app needs): dsymutil, codesign + provisioning, devicectl/simctl, lldb/debugserver. At runtime, on the target, sx's dependency is zero — the trace is write(2, ...) of pre-baked strings. We never call atos/addr2line, never read /proc, never parse a Mach-O debug map, never register JIT DWARF.

Implementation status

| Piece | Status |
| --- | --- |
| Tag-name table + `{}` interpolation | ✅ done (`a3ff503`) |
| Trace buffer (`sx_trace.c`) + push/clear wiring | ✅ done (`51f5277` / `ea40724`) |
| `trace.sx` formatting (placeholder locations) | ✅ done (`bb20339`) |
| IR instructions carry source spans | ✅ done: E3.0 slice 1 (`b44a5d0`) |
| DWARF emission (compile unit / subprogram / line table) | ✅ done: E3.0 slice 2 (`c32d694`) |
| Niladic trace-push op + interned `Frame` table (runtime) | ✅ done: E3.3 slice 3a (`1b6cbc1`) |
| Comptime resolver (`func_id, ir_offset` → location) | ✅ done: slice 3b |
| Source snippet + `^` caret | ✅ done: slice 3c (line embedded in `Frame`) |
| `--emit-obj` artifact plumbing | ✅ done: slice 3d |
| Stepping verification: macOS lldb | ✅ done: slice 3e rung 1 (`tests/debug_stepping_smoke.sh`) |
| Stepping verification: iOS simulator → device | ⏳ planned: slice 3e rungs 2–3 (capstone) |
| DWARF variable info (`DILocalVariable`, for `p x`) | ⏳ optional follow-on |

The active plan and step breakdown live in current/PLAN-ERR.md (§"Why not PCs + DWARF" + Step E3.0/E3.3) and current/CHECKPOINT-ERR.md; the design decisions are logged in implementation_plan.md §Decisions Log.
