parse_string scanned for `"` and `\` but accepted every other byte,
including raw control characters. RFC 8259 §7 requires those bytes to be
escaped inside a string; an unescaped one is invalid JSON and must surface
a parse error, not be silently accepted.
Add `BadControlChar` to JsonParseError and reject any unescaped byte < 0x20
in the string body scan (which gates the decode path too, so escaped forms
like \t/\n/ still decode correctly; 0x20 and 0x7F are not over-rejected).
Regression test in examples/0714: raw 0x09/0x0A/0x00 each raise
BadControlChar via `?`/`!`; a positive case proves the escaped forms still
decode to the right bytes. All prior assertions kept.
Issue 0078 (string == as an and/or operand emitting an invalid PHI) is
resolved on this branch, so the example no longer needs the split that
worked around it. Restore the natural combined assertion
sub.items[0].key == "k" and sub.items[0].val.str == "v"
(one nested-pair report), and the in_range containment helper to
return x >= lo and x < hi;
Drop the now-stale issues/0078 references. Re-captured expected stdout
(nested-key/nested-val -> nested-pair). json.sx and src/ untouched.
A string `==`/`!=` used as an operand of a short-circuit `and`/`or` emitted
invalid LLVM (`PHI node entries do not match predecessors!`). String compares
expand into their own memcmp sub-CFG during LLVM emission, so the operand
finishes in a later basic block (`str.merge`) than the one the IR block
started in. `fixupPhiNodes` wired the short-circuit merge PHI's incoming edge
to `block_map[ir_block]` (the block the IR block started as), recording a
stale predecessor (`%entry`/`%and.rhs.0`).
Fix: record the builder's actual insertion block after emitting each IR
block's instructions (`term_block_map`, via `LLVMGetInsertBlock`) and use it
as the PHI predecessor. General — corrects the incoming block for any operand
that emitted intermediate basic blocks (string `==`, value `match`, …), not
just string `==`.
Regression: examples/0045-basic-string-eq-short-circuit.sx (string `==` on
both sides of `and` and of `or`, plus a match-value + enum-payload `==` shape).
Fails (LLVM abort) pre-fix, passes after.
Add the JSON reader (parser) to library/modules/std/json.sx, the inverse
of the F2.1 writer over the same value model: insertion-ordered objects,
arrays, strings (full unescaping incl. \uXXXX + surrogate pairs), s64
integers, bool, null.
Heap discipline (binding): exactly two allocation kinds, both through the
EXPLICIT `alloc` parameter, never the implicit context allocator —
composite backing stores (Array/Object.items via add/put) and decoded
escaped-string buffers (bounded by the raw span). Un-escaped string
values are zero-copy VIEWS into the input buffer (valid only while it
lives); scalars carry no heap.
Failure surfacing (hard contract): malformed input raises a meaningful
JsonParseError variant (UnexpectedToken / UnexpectedEnd / BadEscape /
BadNumber / TrailingGarbage) on the error channel, never a bogus value.
Trailing non-whitespace is TrailingGarbage; fractions/exponents,
out-of-s64 magnitudes, and leading zeros are BadNumber. Number
accumulation runs in negative space so s64 MIN parses exactly.
examples/0714-modules-json-reader.sx asserts the parsed structure
(insertion order, every kind), proves the view-vs-decoded heap split by
pointer containment, round-trips back through the writer byte-for-byte,
decodes a surrogate-pair into 4 UTF-8 bytes, and checks every malformed
variant.
Filed issues/0078: a string `==` (or any sub-CFG operand) used in a
short-circuit `and`/`or` emits invalid LLVM IR (stale PHI predecessor),
hit while writing the example's assertions and worked around there by not
combining comparisons with `and`/`or`. src/ untouched.
Close the coverage gap from attempt 1: example 0713 now builds integer
fields holding s64 MIN (-9223372036854775808) and s64 MAX
(9223372036854775807) — plus zero, a small negative, and a small positive —
and asserts the EXACT emitted bytes. This permanently pins the edge that
write_int is specifically engineered for (folding positives into negative
space so MIN's non-representable-positive magnitude serializes correctly).
s64 MIN is expressed as (0 - 9223372036854775807 - 1) because its magnitude
is not a representable positive s64 literal.
Test hygiene: stream to a repo-local, gitignored .sx-tmp/ path (created if
missing) instead of a fixed /tmp name, and unlink it right after read-back
so nothing leaks. Writer/model logic and src/ are untouched.
Add library/modules/std/json.sx — the JSON value model and writer
(reader lands in a later step).
Value model: a tagged union over null/bool/integer(s64)/string/array/
object. Objects are an ORDERED list of (key,value) pairs preserving
INSERTION ORDER (no hash map, never sorted/deduped). Integers only — no
fraction/exponent this milestone.
Heap discipline:
- Scalars carry no heap; string values are VIEWS into caller memory
(never copied into the node).
- Composite nodes (Array/Object) own growable child storage, allocated
through an EXPLICIT allocator parameter on the builder methods
(arr.add(v, alloc) / obj.put(key, val, alloc), mirroring List.append)
— never the implicit context allocator.
- The writer adds ZERO output allocations: it emits into a caller-
provided Sink, either a fixed []u8 buffer (overflow raises, never
truncates) or streaming straight to an fs.File through a small caller
staging buffer (no whole-document string; peak memory O(staging)).
Integer digits format in a stack [20]u8; s64 MIN is handled by
formatting in negative space. Sink/IO/overflow surface on the !
error channel.
examples/0713-modules-json-writer.sx builds a nested object + array +
string with every escape kind + negative int + bool + null, then asserts
the EXACT bytes (insertion order, escaping) from both the buffer sink and
the file-streaming sink, plus the overflow-raises path.
Make the SHA-256 digest path allocation-free (foundation heap-discipline):
- final() and sha256_hex() now return the 64-char lowercase hex digest as
a [64]u8 by value on the stack; the cstring(64) heap allocation is gone.
- sha256_file() streams the file in fixed 64KB stack chunks via open_file/
File.read/File.close (defer-closed on every path) instead of slurping it
with read_file; peak memory is O(chunk), not O(filesize).
Tests (compare via a zero-copy string view over the [64]u8):
- 0710 updated to the by-value API (output unchanged).
- 0711 known-answer vectors: "", "abc", NIST-56/112, padding boundaries
{0,55,56,57,63,64,65,119,120}, and 1000 / 1,000,000 'a' repeats, each
pinned to its published digest (cross-checked with shasum -a 256).
- 0712 streaming equivalence (one-shot == byte-at-a-time == split-mid-block
== split-on-boundary) plus sha256_file(temp) == in-memory digest.
src/ untouched. zig build && zig build test && tests/run_examples.sh green.
Add a pure-sx streaming SHA-256 (FIPS 180-4) stdlib module, importable
as `#import "modules/std/hash.sx";`. All 32-bit word arithmetic is done
in s64 and masked back with `& MASK32`, so digests are deterministic and
platform-independent — no shelling out, no native crypto.
API:
- init() -> Sha256 (by-value *self pattern)
- update(*Sha256, string) (multi-block + partial-block buffering)
- final(*Sha256) -> string (32-byte digest as lowercase hex)
- sha256_hex(string) -> string (one-shot)
- sha256_file([:0]u8) -> ?string (digest of a file via fs.read_file)
Verified against FIPS/NIST known-answer vectors and `shasum -a 256`:
"" , "abc", the 56- and 112-byte multi-block vectors, 1000×'a', and the
64/65-byte block boundaries; chunked update() matches the one-shot call.
examples/0710-modules-sha256.sx pins the KAT vectors + the streaming
invariant; gate green (zig build, zig build test, run_examples 370/0/0/0).
The reserved-type-name binding diagnostic fired correctly but underlined the
enclosing statement / if / while / for / match / protocol / #objc_class block
because every binding-name check reused the parent `node.span`.
Thread each binding name's own span through the AST and parser, and pass it to
`checkBindingNames`:
- ast: add name spans to VarDecl, DestructureDecl, If/WhileExpr, ForExpr
(capture + index), MatchArm, Catch/OnFailStmt, Protocol/ForeignMethodDecl.
- parser: populate each span at the binding site from the name token's loc;
destructure reuses each target identifier's own span.
- semantic_diagnostics: every checkBindingName call now passes the binding's
own span — no site falls back to node.span. fn/lambda params already used
Param.name_span.
Carets now land on the offending identifier itself. New regression
examples/1125 asserts the protocol default-body and sx-defined #objc_class
method param spans; 0125/1119-1124 expected updated to the precise carets.
The reserved/builtin-type-name binding diagnostic was a hand-walked subset
of binding-bearing AST nodes with a silent `else => {}`, so each review
found another syntactic binding form that bypassed it and hit the original
LLVM verifier abort: destructure names (`s2, x := …`), `impl` method
params/locals, and `if` / `while` / `for` / match-arm / `catch` / `onfail`
captures.
Rewrite `checkBindingNames` (src/ir/semantic_diagnostics.zig) as an
EXHAUSTIVE `switch` over every `Node.Data` tag with NO `else` arm — a future
binding-bearing node type now fails to compile until it is handled here, so
coverage is enforced by the compiler instead of a hand-maintained list. The
check stays in the pre-lowering semantic pass rather than moving to the
`Scope.put` scope-registration choke point: lowering is lazy, so an
uncalled function's bindings never reach `Scope.put`, yet they must still be
rejected at their declaration (e.g. the never-called `takes_u8` in 1119).
No lowering special-case; `lower.zig` unchanged.
Regression tests (fail-before: LLVM abort or silent accept → pass-after:
clean diagnostic, exit 1):
- 1121 control-flow: destructure, if/while bindings, for capture+index,
match-arm capture
- 1122 impl-block method: reserved param AND reserved local
- 1123 catch + onfail tag bindings
- 1124 destructure name reserved in an imported module
Existing 0125 / 1119 / 0135 / 1120 tests kept; full suite 368 passed.
The issue-0076 reserved-type-name binding diagnostic only ran over main-file
decls, so an imported module (or the stdlib) could still declare `s2 := ...`
and reach lowering, where the address-of family loads the whole aggregate and
passes it by value to a `ptr` param — LLVM verifier abort.
Extend coverage to every compiled module: a dedicated `checkBindingNames` walk
(in semantic_diagnostics.zig) visits every var/`:=`/typed-local binding name and
function/lambda/struct-method parameter at any depth, with NO main-file filter,
descending the `namespace_decl` that a `mod :: #import` wraps so imported-module
decls are reached. It tracks each module's source_file (save/restore per node)
so the diagnostic renders against the imported module's text. Rejection still
defers to the parser's `Type.fromName` classifier; the unknown-type check (0064)
stays main-file-only. No lowering special-case; `.identifier`-only address-of
paths are unchanged.
Stdlib audit: the only reserved-name bindings under library/ were two `u1`
locals in ui/renderer.sx (UV coords) — renamed to u_min/u_max/v_min/v_max.
Regression test: examples/1120-diagnostics-imported-reserved-type-name.sx (+
companion mod.sx) — an imported `s2 := ...` now emits the clean diagnostic at
the import's declaration site (exit 1), not an LLVM abort.
Resolves issues 0076 (coverage extension) and 0077.
A value binding (local/global `var` or a parameter) spelled as a
reserved/builtin type name parses as a `.type_expr` rather than an
`.identifier` (parser.zig, via `Type.fromName`), so the address-of
family in lower.zig never saw a scoped local and mis-lowered it —
loading the aggregate and passing it by value to a `ptr` parameter
(LLVM verifier abort, or a silent `*self`-mutation-losing copy).
Add a declaration-site diagnostic in semantic_diagnostics.zig
(`UnknownTypeChecker.checkBindingName`): reject any parameter name or
`var` binding name (`:=` / typed-local / global forms) whose spelling
collides with a reserved type name. `isReservedTypeName` defers to the
parser's own classifier (`types.Type.fromName`) so the rejected set
never drifts from the set that would parse as a type — the named
builtins (bool/string/void/f32/f64/usize/isize/Any) and `[su]N` over
sx's 1-64 range. Bare value names (`s`, `self`, `index`) are untouched.
No lowering special-case; the `.identifier`-only address-of paths are
correct once type-shaped names can never be bound. The rejected
attempt-1 `bareVarName` approach was never landed.
Tests:
- 0125-types-type-named-var-rejected: `:=` form (s2) rejected
(repurposed from the old test that asserted the now-illegal behavior).
- 1119-diagnostics-reserved-type-name-as-identifier: parameter (u8),
typed-local (s64, bool), `:=` (string) forms rejected.
- 0135-types-self-streaming-nonreserved: positive — `*self` streaming
with non-reserved names accumulates correctly via both call styles.
- 0904-optionals: renamed incidental locals s1/s2 -> filled/empty.
The trace docs predated the current formatter. Corrected against the real
output (library/modules/trace.sx to_string + examples/expected/1025-errors-
trace-format.stderr):
- error-handling.md: replace the obsolete trace example ("error trace:" /
"raised error.X" / "at func (file:line)") with the real format —
"error return trace (most recent call last):" + per-frame "func at
file:line:col" + source line + caret.
- debugger.md: drop the stale "(planned)" marker on the trace formatter
(it is implemented); the tag-name table note now cites the failable-main
reporter's "unhandled error reached main: error.X" line, not a
nonexistent "raised error.X" trace line.
The `type_name` / `type_eq` reflection builtins resolved their Type arg's IR
type via `getRefIRType(...) orelse TypeId.s64`, then gated `== .any`. A failed
must-succeed lookup silently became `.s64` (`!= .any`), classifying a boxed
`Any` arg as bare i64 and reading the wrong value with no diagnostic.
Add the sibling classifier `LLVMEmitter.reflectArgRepr`, which routes the
lookup through `argIRTypeOrFail` (the issue-0074 `.unresolved` resolver) and
returns `{ boxed, bare, unresolved }`. The three emit sites in ops.zig
(`type_name` + `type_eq` x2) now switch on it: `.boxed` extracts the Any value
field, `.bare` uses the value directly, `.unresolved` hits a hard `@panic`
tripwire — never silently treated as bare. Real args always resolve, so the
happy path is byte-identical (suite stays 361/0, zero snapshot churn).
Secondary `lower.zig` `null_literal`/`undef_literal => target_type orelse .void`
confirmed intentional (typeless-literal default deliberately handled by
emitConstNull/emitConstUndef as null-ptr / undef-i64) — left with an invariant
comment, not the `.unresolved` tripwire.
Regression test in emit_llvm.test.zig asserts the loud path: fail-before with
`orelse .s64` yields `.bare`; pass-after yields `.unresolved`.
Discovered during the 0074 fix + a codebase-wide silent-type-fallback sweep.
getRefIRType(...) orelse TypeId.s64 at ops.zig:1023/1049/1055 (type_name/type_eq).
Blocker; to be resolved before the arch-refactor stream closes.
Four FFI call-arg lowering sites resolved an argument's IR type via
`getRefIRType(arg_ref) orelse .void` — a silent fallback to the load-bearing
real type `.void`. A failed lookup there is a codegen invariant violation, but
`.void` is treated by downstream `toLLVMType` → `abiCoerceParamType` →
`coerceArg` as a legitimate void-typed foreign argument, corrupting the call
ABI with no diagnostic.
Add one shared resolver `LLVMEmitter.argIRTypeOrFail` that returns the
dedicated `.unresolved` sentinel on a failed lookup — never `.void`/`.s64` — so
the failure cannot masquerade as a real type and trips `toLLVMType`'s existing
hard `@panic` tripwire at the call site. Route all four sites through it:
- src/ir/emit_llvm.zig JNI constructor (NewObject) arg loop
- src/backend/llvm/ops.zig objc_msgSend arg loop
- src/backend/llvm/ops.zig JNI non-virtual call arg loop
- src/backend/llvm/ops.zig JNI Call<Type>Method arg loop
Happy path is byte-identical (every real arg already has a resolved type); FFI
examples stay green with zero snapshot churn.
Regression test (fail-before/pass-after) in src/ir/emit_llvm.test.zig asserts an
unresolvable FFI arg ref now yields `.unresolved`, not the old silent `.void`.
The interp's .trace_frame op only yields the packed value; the separate
sx_trace_push call op is executed by the interp as a foreign call via
host_ffi/dlsym, so the prior 'no sx_trace_push call runs' / 'never calls
sx_trace_push' phrasing was wrong. The packed low word is the op's
span.start (a source byte offset), not an IR instruction offset; renamed
every ir_offset/offset reference to span.start.
The compiled backend builds each trace Frame global as an LLVM named-struct
constant over the cached getFrameStructType() layout (file, line, col, func,
line_text) via LLVMConstNamedStruct -- a type-safe LLVM struct, not the sx
Frame TypeId / normal struct-emission path. Also correct the file field to
the source basename (full paths live in DWARF).
The .trace_frame op is niladic: it carries no operand and no GlobalId.
The compiled backend yields the interned Frame global's address as the
op's value (reflection.emitTraceFrame); the interpreter yields a packed
(func_id, ir_offset) as the op's value and never calls sx_trace_push
(recovered later by .trace_resolve). The sx_trace_push call is a separate
call op emitted by lower.zig at each push site, consuming the op's value.
Reword every passage that stated the old/wrong model: the niladic
invariant is about the op (not the push site emitting only one
instruction); reflection yields the op's value rather than lowering a
push; the interp returns the packed value rather than calling the foreign
sx_trace_push via host_ffi dlsym.
Name the niladic op `.trace_frame` (no `.trace_frame_push` op exists) in
the trace-path roadmap, matching the rest of the doc and src/ir/inst.zig.
Describe the `.trace_frame` arm as building/interning the Frame global and
yielding its address as the op's value; the separate sx_trace_push call is
emitted by the lowerer via normal call lowering, not by the arm itself.
Refresh the debugging architecture reference for the A7.2 relocation:
DWARF emission lives in src/backend/llvm/debug.zig (DebugInfo) and the
interned Frame / tag-name tables in src/backend/llvm/reflection.zig
(Reflection); emit_llvm.zig is the orchestrator that owns LLVMEmitter and
dispatches to them. Behavior is unchanged; only the file-and-function map,
the 'what's emitted' home, and the debugEnabled() owner are corrected.
Remove the legacy parallel type model's compiler-like surface. The
compiler pipeline resolves/lowers/lays out against canonical
src/ir/types.zig (TypeId/TypeTable); src/types.zig.Type is now strictly
editor-indexing + parse-time name metadata.
- src/types.zig: delete the type-resolution surface (widen, bitWidth,
isImplicitlyConvertibleTo) and every helper left dead once it was gone
(eql, isInt/isFloat/isSigned/isUnsigned, isTuple/isVector, and the
already-unused classification predicates isEnum/isUnion/isString/
isStringLike/isAny/optionalChild/sliceElementType/manyPointerElementType/
vectorElementType/isFunctionType/isClosureType/isCallable). Keep the Type
union plus the display/name-classification helpers sema/lsp/parser use
(fromName, fromTypeExpr, toName, displayName, isStruct/isOptional/isSlice/
isPointer/isManyPointer/isArray, pointerPointeeType). Seal the file with a
doc comment.
- src/sema.zig: inferExprType no longer calls Type.widen for arithmetic;
it approximates the display type as the left operand's (no second
resolver in the editor index).
- src/ir/type_bridge.zig: delete the dead bridgeType (legacy Type -> TypeId)
function + its sole sx_types import; resolveAstType and the AST->TypeId
path are untouched.
- src/ir/ir.zig: drop the bridgeType re-export.
- src/ir/type_bridge.test.zig: drop the two bridgeType tests (function gone).
Gate: zig build, zig build test (exit 0), tests/run_examples.sh 361/0,
zero examples/expected churn.
Remove the last compiler dependency on sema as semantic truth and stop
publishing as-you-type sema diagnostics from the LSP.
- core.zig: drop dead `Compilation.analyze()`, the `sema_result` field,
and the sema->diagnostics merge; drop the now-orphaned sema import.
The CLI pipeline (parse -> resolveImports -> generateCode) never called
analyze(), so this removes only dead code.
- lsp/server.zig: rename `analyzeAndPublish` -> `refreshEditorIndex` and
delete its sema-diagnostic publish (and the now-unused `semaToLspDiags`).
The editor index (doc.sema) is still refreshed for nav/refs/completion/
tokens. On-save/on-open diagnostics still come solely from the canonical
compiler pipeline in `runProjectCheck` (unchanged).
- Document sema as an editor-indexing API (doc.sema field comment).
Intended behavior change: as-you-type sema diagnostics no longer publish;
on-save canonical diagnostics are the sole source. CLI compile output and
the 361-example suite are unchanged (361/0, zero snapshot churn).
Move the final inline emitInst handler groups (terminators, box/unbox-Any,
reflection, switch-branch, closure-creation, vector, block-param, misc) into
the Ops facade in src/backend/llvm/ops.zig. emitInst is now pure dispatch:
every arm delegates to self.ops().*, leaving only setInstDebugLocation plus
one-line delegations.
Widen the shared infra the moved bodies reach (emitFailableMainRet, getBlock,
anyTag, isSignedTypeEx, coerceToI64/coerceToI64Signed/coerceFromI64,
emitFieldValueGet) to pub on LLVMEmitter; helper and ref-tracking sections
stay put. Pure relocation: emitted LLVM IR byte-identical, zero snapshot churn.
Relocate the struct, enum, union, array/slice, tuple, and optional
opcode handler bodies out of emitInst into the existing Ops facade.
Each moved arm now delegates via self.ops().emit<Op>(...); shared infra
stays on LLVMEmitter, with resolveAggregate/resolveGepStructType widened
to pub as the GEP handlers require. Pure relocation, behavior-preserving:
zero snapshot churn (361/0).
Relocate the Calls (objc_msg_send / jni_msg_send / call / call_indirect)
and Call-extensions (call_builtin / compiler_call / call_closure) emitInst
handler groups out of emit_llvm.zig into the existing Ops facade. Each
emitInst arm now delegates via self.ops().emit<Op>(...). Behavior-preserving
pure relocation; emitted LLVM IR is byte-identical (361/0 examples, no
snapshot churn).
Shared call infra stays on LLVMEmitter, widened pub only as the moved
bodies require: extractSlicePtr, loadJniFn, getObjcMsgSendValue, the math
F32/F64 declarators + types, getOrDeclareWrite/getWriteType, ffiCtors,
materializeByvalArg, emitCStringGlobal, emitJniConstructor, and the Jni
slot-offset constants. emitJniConstructor remains in emit_llvm.zig (A7.3
decision); the moved jni arm calls it via self.e.emitJniConstructor(...).
Relocate the `// ── Memory ──`, `// ── Globals ──`, `// ── Conversions ──`,
and `// ── Pointer ops ──` opcode handler bodies out of `emitInst` in
src/ir/emit_llvm.zig into the existing `Ops` facade in
src/backend/llvm/ops.zig. Each `emitInst` arm now delegates via
`self.ops().emit<Op>(...)`. Widen `emitConversion`, `coerceArg`, and
`getRefIRType` to `pub` (the only helpers the moved bodies call).
Pure relocation: zero snapshot churn.
Move the Constants/Arithmetic/Bitwise/Comparisons/Logical opcode handler
bodies out of emitInst into a new Ops facade in src/backend/llvm/ops.zig.
emitInst's scalar arms now delegate via self.ops().*; the shared infra they
call (mapRef/resolveRef/matchBinOpTypes/emitCmp/emitCmpOrdered/emitStrCmp/
emitStringConstant/reflection + isFloatOrVecFloat/isSignedType) stays on
LLVMEmitter, widened to pub as needed. Pure relocation: zero snapshot churn.
Move getOrCreateJniSlots (the cls/methodid slot-cache builder) out of
emit_llvm.zig into the FfiCtors backend *LLVMEmitter facade. Behavior-preserving
— self.* -> self.e.* only.
- FfiCtors gains getOrCreateJniSlots (pub). The jni_slots cache + mangleJniKey
stay on LLVMEmitter; mangleJniKey is widened to pub (the facade calls it back,
like lazyDeclareCRuntime/emitPrivateCString), and JniSlotPair is widened to pub
(the facade returns it; the call site consumes it). 1 call site routed via
ffiCtors().
- emitJniConstructor intentionally NOT moved in this slice: it is emission-heavy
(resolveRef/mapRef/coerceArg/getRefIRType/extractSlicePtr/loadJniFn/
emitCStringGlobal — 100+ internal callers for the first two), so relocating it
would pub-expose the emitter's core value-emission machinery. Consistent with
A7.2 keeping emitFieldValueGet in emit_llvm.zig. Pending an explicit decision.
Gate: zig build, zig build test, bash tests/run_examples.sh -> 361/0
(JNI anchors 1402/1408/1418/1425 green, no churn).