Debugging sx: traces, debug info, and stepping
This is the architecture spec for sx's debugging story — error return traces, DWARF debug info, and source-level stepping. It records what each piece does, how it works, and why it's built this way.
For the user-facing guide to writing fallible code (and what a trace looks like in practice), see error-handling.md. This document is the implementer/architect reference.
The guiding principle
Debugging splits into two jobs, and conflating them is the trap:
- "My program errored — where, and along what path?" (≈99% of the time)
- "I want to single-step in a real debugger." (rare, deep)
sx solves #1 itself, in-process, with zero OS dependencies — the
source location is baked in at compile time, so a trace needs no DWARF
reader, no symbolizer, no /proc, no atos. sx solves #2 by emitting
standard DWARF and handing it to an external debugger (lldb/gdb),
which already knows every platform's symbolization rules. We ship no
symbolizer of our own.
The payoff: error traces work identically and deterministically on every target — desktop JIT, AOT binary, comptime interpreter, even a locked-down iOS device with no debugger attached — while real single-stepping is available for free wherever a debugger exists.
The three execution contexts
sx code runs in three different machines, and the trace/debug design has to satisfy all three. "JIT" and "comptime" are not the same thing.
| Context | What runs the code | Trace frame representation |
|---|---|---|
| AOT (sx build) | native machine code in an on-disk binary | pointer to an interned Frame |
| JIT (sx run) | ORC-JIT'd machine code in anonymous memory | pointer to an interned Frame |
| Comptime (#run) | the IR interpreter (interp.zig), no machine code | packed (func_id, ir_offset) |
The crucial constraint: the same lowered IR runs in the compiled backend and the interpreter. So a value the IR produces (like a trace frame) must mean the right thing in both — which is why the trace-push is a context-sensitive op (below), not a plain constant.
A second fact shaped the design: iOS devices forbid JIT (no
mmap(PROT_EXEC|PROT_WRITE) for third-party apps). On-device sx is
therefore AOT-only, and the trace must be readable on a device with no
debugger attached — which the in-process embedded-Frame design delivers
and a PC-symbolization design could not.
Error return traces
A return trace is the path an error took from its raise site up through
every try that propagated it. It is recorded as the error travels and
formatted where it's caught (a catch handler, or the failable-main
wrapper).
The buffer
A thread-local fixed-cap ring of opaque u64 frames lives in a vendored
C runtime, library/vendors/sx_trace_runtime/sx_trace.c:
- sx_trace_push(u64) / sx_trace_clear() / sx_trace_len() / sx_trace_truncated() / sx_trace_frame_at(u32).
- Capacity 32; overflow keeps the newest frames (Zig-style) and latches a truncated flag so the formatter can note "N frames omitted."
It lives in a separately-linked C file (not an emitted thread_local IR
global) for the same reason as the JNI env slot: LLVM's ORC JIT doesn't
initialize TLS for objects added via AddObjectFile. The compiler links
the .c so the JIT resolves sx_trace_* via dlsym; AOT targets pick it
up as an auto-injected #source (gated on Lowering.needs_trace_runtime).
The buffer neither knows nor cares what a frame means — it just stores
u64s. The producer and the formatter agree on the interpretation per
context (next section).
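A minimal sketch of such a ring, assuming C11 `_Thread_local` storage. The five function names and the capacity of 32 come from the list above; the return types, the field names, and the choice to reset the truncated flag on clear are assumptions, not the contents of the real sx_trace.c.

```c
#include <assert.h>
#include <stdint.h>

#define SX_TRACE_CAP 32u  /* capacity stated in the text */

/* One ring per thread; thread-local statics start zeroed. */
static _Thread_local uint64_t sx_frames[SX_TRACE_CAP];
static _Thread_local uint32_t sx_head;      /* slot of the oldest kept frame */
static _Thread_local uint32_t sx_len;       /* frames currently stored */
static _Thread_local int      sx_truncated; /* latched once a frame is dropped */

void sx_trace_push(uint64_t frame) {
    if (sx_len < SX_TRACE_CAP) {
        sx_frames[(sx_head + sx_len) % SX_TRACE_CAP] = frame;
        sx_len++;
    } else {
        /* Full: overwrite the oldest frame so the newest ones survive. */
        sx_frames[sx_head] = frame;
        sx_head = (sx_head + 1) % SX_TRACE_CAP;
        sx_truncated = 1;
    }
}

/* Assumption: clearing also resets the truncated latch. */
void sx_trace_clear(void) { sx_head = 0; sx_len = 0; sx_truncated = 0; }

uint32_t sx_trace_len(void) { return sx_len; }
int sx_trace_truncated(void) { return sx_truncated; }

/* Index 0 is the oldest frame still held. */
uint64_t sx_trace_frame_at(uint32_t i) {
    assert(i < sx_len);
    return sx_frames[(sx_head + i) % SX_TRACE_CAP];
}
```

The ring only moves opaque 64-bit values around, which is what lets the compiled backend and the interpreter store different encodings in it.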
The frame: an embedded Frame, not a PC
A runtime frame is a pointer to a compile-time-interned
Frame {file, line, col, func, line_text}. The lowerer already knows the push
site's source location (the instruction's span + the enclosing function),
so the location — and the offending source line itself (line_text, for the
^ caret snippet) — is baked into read-only data at compile time and the
formatter reads it directly. No PC capture, no DWARF, no symbolizer, no runtime
file read.
A comptime frame is instead a packed (func_id: u32, ir_offset: u32),
resolved through the interpreter's in-memory IR/source tables. The
interpreter never dereferences the compiled Frame pointer — it uses
its own representation — so the compiled and interpreted memory models
never collide.
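The two encodings can be sketched in C as follows. The Frame field names follow the text; the C field types, the helper names, and the placement of func_id in the high 32 bits are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the text; the C types are assumptions. */
typedef struct {
    const char *file;
    uint32_t    line;
    uint32_t    col;
    const char *func;
    const char *line_text;
} Frame;

/* Compiled code: the u64 is the address of an interned, read-only Frame. */
static uint64_t frame_to_u64(const Frame *f) {
    assert(f != NULL);
    return (uint64_t)(uintptr_t)f;
}

static const Frame *frame_from_u64(uint64_t v) {
    return (const Frame *)(uintptr_t)v;
}

/* Comptime: the u64 packs (func_id, ir_offset).
 * Putting func_id in the high half is an assumption. */
static uint64_t pack_comptime(uint32_t func_id, uint32_t ir_offset) {
    return ((uint64_t)func_id << 32) | ir_offset;
}

static uint32_t comptime_func_id(uint64_t v)   { return (uint32_t)(v >> 32); }
static uint32_t comptime_ir_offset(uint64_t v) { return (uint32_t)(v & 0xFFFFFFFFu); }
```

Both encodings fit the same u64 slot, but nothing in the value says which one it is; the reader has to know which machine produced it.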
The niladic trace-push op
Because the same IR runs in both machines, the push is a dedicated,
niladic, span-stamped IR op — the same pattern as is_comptime /
interp_print_frames. It carries no operands and no global reference;
each backend derives the frame from its own context:
- emit_llvm: resolves the op's span + current function → {file, line, col, func} (reusing the source map wired in for DWARF), interns and builds the Frame global in emit_llvm (the same mechanism as the tag-name table), then emits call sx_trace_push(ptr).
- interp: pushes the packed (func_id, ir_offset) from its own execution context.
This keeps the lowerer thin: at each push site it emits the op and nothing
else — no operand wiring, no global construction. The rejected
alternative — an op carrying a GlobalId to an IR-level Frame global —
would make the global visible to the interpreter (forcing comptime onto
the pointer-deref path) and fatten the lowerer; do not do this.
Frame is defined once in sx (trace.sx/std); emit_llvm builds the
interned global off that TypeId through the normal struct-emission path,
never a bespoke byte layout (which would risk the "8-bytes-assumed"
clobber class of bug). file/func strings are interned into a shared
pool so a path shared by N push sites is stored once — the table stays
tiny. File paths are normalized to a stable relative form so trace output
is machine-independent and snapshot-testable.
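The interning idea can be illustrated with a small C sketch. This is not the compiler's code (that lives in emit_llvm.zig); `StringPool`, `pool_intern`, the fixed bound, and the linear scan are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX 256  /* hypothetical fixed bound, enough for a sketch */

typedef struct {
    const char *items[POOL_MAX];
    size_t      count;
} StringPool;

/* Returns the pooled pointer for s, adding s on first sight.
 * Equal strings always yield the same pointer, so N push sites in one
 * file share a single stored path. The caller keeps s alive. */
static const char *pool_intern(StringPool *pool, const char *s) {
    for (size_t i = 0; i < pool->count; i++) {
        if (strcmp(pool->items[i], s) == 0) return pool->items[i];
    }
    assert(pool->count < POOL_MAX);
    pool->items[pool->count] = s;
    return pool->items[pool->count++];
}
```

A production pool would use a hash map rather than a scan; the property that matters here is pointer identity for equal strings.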
Push and clear sites
Push (one frame each):
- raise EXPR: at the raise site.
- try X: on X's failure path, wherever that failure routes next.
- a bare failable in its legal positions (LHS of catch, LHS of an or value terminator, RHS of a destructure): at the failure point.
Clear (every absorbing site — the error stops here):
- catch e { ... } runs (the clear happens after the handler, so the handler still sees the chain; the buffer is empty once the handler exits).
- an attempt succeeds inside an or chain.
- an or value terminator absorbs the failure.
- a destructure binds the error slot (the user now owns the error).
So at format time the buffer holds exactly the frames of failures that actually escaped to where you're formatting. Absorbed failures are push-then-clear and leave no residue — the steady state mirrors Zig's.
process.exit(code) discards the buffer (immediate syscall, no flush).
Output format
error return trace (most recent call last):
parse at parse.sx:12:5
if !is_digit(s[0]) raise error.BadDigit;
^
run at main.sx:20:9
v := try parse(s);
^
func at file:line:col per frame, oldest-first ("most recent call
last"), with a best-effort source snippet + ^ caret. The snippet reads
the source file if available (always true under sx run); it degrades to
the bare file:line:col line when the source isn't present. The
formatter lives in library/modules/trace.sx
(to_string / print_current); the failable-main reporter is
sx_trace_report_unhandled in sx_trace.c.
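A rough C sketch of that output shape. The real formatter is written in sx (trace.sx); here `format_trace` is a hypothetical name, the indentation widths are guesses, and the sketch assumes line_text holds the whole source line with col counted 1-based in bytes within it. A missing snippet is modeled as a NULL line_text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *file;
    unsigned    line;
    unsigned    col;        /* assumed 1-based */
    const char *func;
    const char *line_text;  /* NULL when no snippet is available */
} Frame;

/* Prints frames[0..n) oldest-first, i.e. "most recent call last". */
static void format_trace(FILE *out, const Frame *const *frames, size_t n) {
    fputs("error return trace (most recent call last):\n", out);
    for (size_t i = 0; i < n; i++) {
        const Frame *f = frames[i];
        assert(f->col >= 1);
        fprintf(out, "  %s at %s:%u:%u\n", f->func, f->file, f->line, f->col);
        if (f->line_text == NULL) continue;  /* degrade to the bare location */
        fprintf(out, "    %s\n", f->line_text);
        /* %*s pads with (col - 1) spaces so the caret sits under column col. */
        fprintf(out, "    %*s^\n", (int)(f->col - 1), "");
    }
}
```

Passing the frames in buffer order is enough to get the oldest-first layout, since index 0 of the trace buffer is the oldest frame.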
Build-mode gating
Traces follow the optimization level (mirrors Lowering.tracesEnabled):
- Debug (-O0/-O1, the sx run default): push/clear emitted; the Frame table is emitted.
- Release (-O2/-O3): push/clear are no-ops, no Frame table; a future --release-traces flag flips them back on.
- Comptime (#run): always on, regardless of build mode; a #run failure must produce a useful diagnostic even in a release build.
The success path costs nothing; the failure path costs one pointer push.
DWARF debug info — a debugger-only artifact
sx emits standard DWARF so external debuggers can step sx code. DWARF is
not used by the trace formatter — it exists solely for lldb/gdb (and
on-device iOS debugging). It is independent debugger sugar that can be
stripped without affecting traces.
What's emitted
In src/ir/emit_llvm.zig, gated on the same
debug opt levels + a wired source map (setDebugContext):
- one DICompileUnit + DIFile on the main file,
- a DISubprogram per emitted function (LLVMSetSubprogram),
- a DILocation per instruction, resolved from Inst.span via errors.SourceLoc.compute, scoped to the function's subprogram,
- the "Debug Info Version" / "Dwarf Version" module flags, finalized with LLVMDIBuilderFinalize.
The llvm-c/DebugInfo.h DIBuilder API is bound in
src/llvm_api.zig.
What it enables (and what it doesn't, yet)
- ✅ breakpoints, step, stepi, backtrace, source-line mapping: enabled by the line table + subprograms.
- ⚠️ variable inspection (p x): needs DILocalVariable + DIType + location expressions per IR slot, which are not emitted yet. lldb can step and show the right source line, but p x reports no variable. This is an optional future slice; it's not required for stepping.
macOS / iOS note
A linked Mach-O contains no DWARF — ld leaves a debug map (OSO
stabs) pointing at the .o files. So llvm-dwarfdump on the executable
shows nothing; you run dsymutil to collect a .dSYM, which lldb (and
atos) consume. This is a standard build-time step, not something sx
parses at runtime.
Wiring: exactly how it's connected
This section is the file-and-function map — the concrete data flow for both the trace path and the DWARF path. Items marked ✅ exist today; ⏳ are the planned slice-3 shape.
Where the pieces live
| File | Responsibility |
|---|---|
| src/core.zig | Compilation: owns import_sources (file→source map), constructs the emitter, calls setDebugContext + emit; re-enters the interpreter for #run/post-link |
| src/ir/lower.zig | AST→IR. Stamps Inst.span; emits push/clear at failure/absorb sites; tracesEnabled gate; declares the sx_trace_* externs |
| src/ir/emit_llvm.zig | IR→LLVM. Builds the interned Frame table; lowers the push op to a pointer push; emits all DWARF metadata |
| src/ir/interp.zig | Comptime IR interpreter. Lowers the push op to a packed (func_id, offset); resolves comptime frames |
| src/errors.zig | SourceLoc.compute(source, offset) → {line, col}; the import_sources map type |
| src/ir/inst.zig | Inst.span, Function.source_file, the Op union (home of the trace-push op) |
| library/vendors/sx_trace_runtime/sx_trace.c | the thread-local ring buffer + sx_trace_report_unhandled |
| library/modules/trace.sx | the formatter (to_string / print_current) |
| src/llvm_api.zig | binds llvm-c/Core.h + llvm-c/DebugInfo.h |
| src/target.zig | TargetConfig.opt_level (the gate) + is_aot |
The shared spine: one source-location resolver
Both paths resolve a byte offset to file:line:col the same way, so
traces and DWARF can never disagree:
- ✅ import_sources : StringHashMap([:0]const u8) (file path → source text) is built in core.zig during resolveImports (main file + every import), and shared with both the diagnostics renderer and the emitter (via setDebugContext).
- ✅ Inst.span (a {start, end} byte range) is threaded onto every instruction by Builder.current_span, which lower.zig sets as it walks each expr/stmt (E3.0 slice 1). Function.source_file records which file a function's spans index.
- ✅ errors.SourceLoc.compute(source, span.start) turns an offset into {line, col}. Used by the diagnostics renderer, #caller_location, the DWARF emitter, and (planned) the trace formatter: one function, every consumer.
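The offset-to-location step is small enough to sketch in C. This is an illustration, not the Zig implementation in src/errors.zig: `source_loc_compute` is a hypothetical name, and 1-based lines and columns with columns counted in bytes are assumptions (1-based matches the parse.sx:12:5 style of the trace output).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { unsigned line; unsigned col; } SourceLoc;

/* 1-based line and column of byte `offset` in NUL-terminated `source`. */
static SourceLoc source_loc_compute(const char *source, size_t offset) {
    assert(offset <= strlen(source));
    SourceLoc loc = {1, 1};
    for (size_t i = 0; i < offset; i++) {
        if (source[i] == '\n') {
            loc.line++;     /* a newline starts the next line... */
            loc.col = 1;    /* ...at its first column */
        } else {
            loc.col++;
        }
    }
    return loc;
}
```

Because every consumer funnels through one such function, a span resolves to the same line and column in a diagnostic, a DILocation, and a trace frame.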
Trace path: compile → run → format
Producer (compile time) ✅ (3a)
- lower.zig reaches a failure site (lowerRaise, lowerTry's propagation branch, lowerFailableOr, or lowerDestructureDecl) and, when tracesEnabled(), emits the niladic .trace_frame_push op, replacing today's emitTracePush(placeholderTraceFrame()). Absorbing sites emit emitTraceClear() → call sx_trace_clear().
- Compiled backend (emit_llvm.emitInst, .trace_frame_push arm): resolve the op's span + current function → {file,line,col,func}, intern into the Frame table (built alongside tag_name_array), and emit call sx_trace_push(ptr_to_Frame). The sx_trace_push extern is declared lazily by getTraceFids() (which sets needs_trace_runtime).
- Interpreter (interp.zig, same op): pack (current_func_id, ir_offset) into a u64 and call the foreign sx_trace_push (resolved via host_ffi dlsym against the linked sx_trace.c).
Buffer (run time) ✅ — sx_trace.c stores the u64s. Linked into the
compiler so the JIT resolves sx_trace_* via dlsym; auto-injected as a
#source for AOT when needs_trace_runtime is set.
Formatter (run time) ✅ (compiled 3a, comptime 3b) — trace.sx to_string() loops
sx_trace_len() / sx_trace_frame_at(i) and resolves each u64 through
a read-side context-split primitive (the mirror of the push op):
- compiled: cast the u64 → *Frame, load the fields.
- comptime: unpack (func_id, offset), resolve via the interpreter's IR/source tables → a Frame.
The same trace.sx source works in both because it runs in the matching
machine — a compiled program formats compiled frames, a #run formats
comptime frames. It then prints func at file:line:col + a best-effort
source snippet.
Consumers ✅ — a catch handler calling trace.print_current(), and
the failable-main wrapper, whose ret path in emit_llvm
(emitFailableMainRet) calls sx_trace_report_unhandled in sx_trace.c.
DWARF path: compile → debugger ✅
- core.zig generateCode: LLVMEmitter.init(...) → emitter.setDebugContext(&self.import_sources, self.file_path) → emitter.emit().
- emit() Pass -1 initDebugInfo(): gated by debugEnabled() (source map present + opt none/less). Creates the DIBuilder, adds the "Debug Info Version" / "Dwarf Version" module flags, and one DICompileUnit on diFileFor(main_file).
- Pass 2 emitFunction → beginFunctionDebug(func, llvm_func, name): diFileFor(func.source_file) → LLVMDIBuilderCreateFunction → LLVMSetSubprogram; stores it as di_scope.
- emitInst (top, every instruction): setInstDebugLocation(inst.span) → SourceLoc.compute over sourceForFile(current_func_file) → LLVMDIBuilderCreateDebugLocation(scope = di_scope) → LLVMSetCurrentDebugLocation2. So every LLVM instruction the op emits carries the right !dbg.
- endFunctionDebug clears di_scope + the builder location, so the synthetic Obj-C / global-ctor functions (no subprogram) inherit none.
- Pass 4 finalizeDebugInfo() → LLVMDIBuilderFinalize; LLVMDisposeDIBuilder in deinit.
- Backend emits the object / JIT module. AOT Mach-O carries a debug map → dsymutil collects a .dSYM → lldb/gdb symbolize. In release debugEnabled() is false → no DIBuilder runs → strippable to nothing.
The gate: one switch, two consumers
Lowering.tracesEnabled() (lower.zig) and LLVMEmitter.debugEnabled()
(emit_llvm) both reduce to opt_level == .none or .less. The Frame
table + push/clear ride tracesEnabled; DWARF rides debugEnabled.
Release (-O2/-O3) emits neither. sx run defaults to -O0 (both on);
sx ir/sx asm default to -O2 (both off) — which is why the .ir
snapshots don't drift when this machinery is present.
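The two predicates can be modeled in a few lines of C. The real ones are Zig methods; the enum names mirror LLVM's codegen levels, and mapping -O0 through -O3 onto them, the `opt_from_flag` helper, and the opaque `source_map` pointer are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed mapping: -O0 none, -O1 less, -O2 default, -O3 aggressive. */
typedef enum { OPT_NONE, OPT_LESS, OPT_DEFAULT, OPT_AGGRESSIVE } OptLevel;

/* Hypothetical helper: map the N of -ON to an OptLevel. */
static OptLevel opt_from_flag(int n) {
    assert(n >= 0 && n <= 3);
    return (OptLevel)n;
}

static bool is_debug_opt(OptLevel opt) {
    return opt == OPT_NONE || opt == OPT_LESS;
}

/* Frame table + push/clear ride this gate. */
static bool traces_enabled(OptLevel opt) { return is_debug_opt(opt); }

/* DWARF rides this gate; it also needs a wired source map. */
static bool debug_enabled(OptLevel opt, const void *source_map) {
    return source_map != NULL && is_debug_opt(opt);
}
```

Sharing one opt-level test keeps the two consumers in step: a build never carries a Frame table at an opt level where DWARF is off, or the reverse, as long as a source map is wired.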
Why not return-address PCs + DWARF (decision, 2026-06-01)
The original design captured return-address PCs and symbolized them via
DWARF, Zig-style. We changed course. The full rationale lives in
implementation_plan.md §Decisions Log; in brief:
- The dual-execution split is unavoidable regardless. Compiled code and the interpreter run the same IR, so a frame must be context-split whether it's a PC or a Frame pointer; PCs buy no simplification here.
- JIT code has no on-disk DWARF. sx run (the primary dev path, and what the test suite exercises) JITs into anonymous memory; symbolizing those PCs needs GDB-JIT registration + an in-process DWARF reader, the single largest chunk of the Zig-faithful approach.
- iOS forbids JIT and prints best with no debugger. Device builds are AOT; the embedded-Frame trace prints source-mapped to stderr/os_log with nothing attached. That is the biggest DX win on a locked-down platform, and impossible with PC symbolization there.
- macOS keeps no DWARF in the linked binary (debug-map → .o/.dSYM), so even AOT self-symbolization means porting a Mach-O debug-map + .debug_line reader.
- Determinism. Interned Frames have no ASLR addresses, so trace output is snapshot-testable; raw PCs are not.
DWARF is still emitted (it's how Zig's own std.debug reads program debug
info), but demoted to the debugger-only role above. All OS-specific
symbolization is delegated to the platform debugger — sx ships none.
Runtime artifacts
| Artifact | Lookup | Size | Shipped in release? |
|---|---|---|---|
| Tag-name table | tag id → name string | tiny (per distinct tag) | yes, always — {} interpolation, the main wrapper, and the trace's "raised error.X" line need names even in release |
| Frame location table | push site → {file,line,col,func} | small (interned strings; per push site) | debug / --release-traces only; rides the trace-mode gate |
| DWARF (.debug_line / DISubprogram) | PC → file:line:col, for debuggers | larger (per source position) | debug / --release-traces only, strippable; consumed by lldb/gdb, never by the trace formatter |
The tag-name table is always linked (it's how a tag renders as BadDigit
in any build). The Frame table powers traces. DWARF is independent
debugger sugar.
Stepping and deep debugging
Stepping is delegated entirely to the platform debugger via the DWARF we emit; sx provides the artifacts and a launch convenience, nothing more.
Artifacts
sx build --emit-obj keeps the DWARF-bearing object at its link-time path
(.sx-tmp/main.o) instead of deleting it, and implies -O0 (DWARF only emits
at opt none/less). On macOS the linked binary's debug map resolves to that
.o, so lldb/gdb run from the project root can step the binary directly; on
Linux the DWARF is in the binary, so the .o isn't even needed. A portable
.dSYM (via dsymutil) is only required for the on-device iOS rung (below).
The verification ladder
Source-level stepping is verified manually/interactively (it needs
dsymutil/lldb, and on device a signing identity + a get-task-allow
provisioning profile — not a run_examples.sh test). Climb cheapest-first;
the device run is the final sign-off:
- macOS native ✅: sx build --emit-obj → drive lldb --batch (the debug map resolves to the kept .o; no dsymutil needed locally). Checked in as tests/debug_stepping_smoke.sh: breakpoint on an sx function, run, assert it stops at the right .sx:line, next/stepi advance, bt is source-mapped. The automatable rung (a checked-in smoke script).
- iOS simulator: bundle the .app, install to a booted simulator (simctl), launch under lldb, repeat the checks. No device, no signing.
- iOS device (capstone): --debug: emit DWARF → dsymutil .dSYM, debug-sign with get-task-allow, install via devicectl, launch under debugserver, attach lldb, single-step sx source on the phone. If stepping works here, on the most locked-down target, the DWARF story is proven everywhere.
Independently, Tier-0 always works with no debugger: a plain on-device
run still prints the embedded-Frame trace to stderr/os_log.
Dependencies
Everything OS-specific is a build-/run-time tool on the host (the same
ones any iOS app needs): dsymutil, codesign + provisioning,
devicectl/simctl, lldb/debugserver. At runtime, on the target,
sx's dependency is zero — the trace is write(2, ...) of pre-baked
strings. We never call atos/addr2line, never read /proc, never parse
a Mach-O debug map, never register JIT DWARF.
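To make the "write(2, ...) of pre-baked strings" point concrete, here is a hedged POSIX sketch. `write_all` and `report_line` are hypothetical names, not functions from sx_trace.c; the only system call involved is write.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Writes all of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on any other error. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Emits one pre-baked, NUL-terminated line; pass fd 2 for stderr. */
static int report_line(int fd, const char *prebaked) {
    assert(prebaked != NULL);
    return write_all(fd, prebaked, strlen(prebaked));
}
```

Nothing here allocates, opens a file, or consults a symbol table, which is why the same path works on a device with no debugger attached.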
Implementation status
| Piece | Status |
|---|---|
| Tag-name table + {} interpolation | ✅ done (a3ff503) |
| Trace buffer (sx_trace.c) + push/clear wiring | ✅ done (51f5277 / ea40724) |
| trace.sx formatting (placeholder locations) | ✅ done (bb20339) |
| IR instructions carry source spans | ✅ done, E3.0 slice 1 (b44a5d0) |
| DWARF emission (compile unit / subprogram / line table) | ✅ done, E3.0 slice 2 (c32d694) |
| Niladic trace-push op + interned Frame table (runtime) | ✅ done, E3.3 slice 3a (1b6cbc1) |
| Comptime resolver (func_id, ir_offset → location) | ✅ done, slice 3b |
| Source snippet + ^ caret | ✅ done, slice 3c (line embedded in Frame) |
| --emit-obj artifact plumbing | ✅ done, slice 3d |
| Stepping verification: macOS lldb | ✅ done, slice 3e rung 1 (tests/debug_stepping_smoke.sh) |
| Stepping verification: iOS simulator → device | ⏳ planned, slice 3e rungs 2–3 (capstone) |
| DWARF variable info (DILocalVariable, for p x) | ⏳ optional follow-on |
The active plan and step breakdown live in current/PLAN-ERR.md
(§"Why not PCs + DWARF" + Step E3.0/E3.3) and current/CHECKPOINT-ERR.md;
the design decisions are logged in implementation_plan.md §Decisions Log.