cstring is ONE pointer to a null-terminated u8 buffer, C's char*: thin
(8 bytes, no length; cstring_len walks to the terminator), crossing
#foreign boundaries verbatim in both directions, with ?cstring as the
nullable case lowering to the same bare pointer (null = absent).
Conversion discipline mirrors Odin: a string LITERAL coerces implicitly
(its bytes are terminated constants); any other string is rejected with
a diagnostic naming to_cstring (it may be an unterminated view); and
cstring never coerces to string implicitly — from_cstring(c) is the
explicit zero-copy view, pricing the strlen.
Plumbing: TypeId/TypeInfo builtin slot 18 (first_user 19), name
classifiers, size/align/name tables, LLVM ptr lowering, the ?T pointer
niche, the xx pointer ladder, the literal-gated coercion plan
(isConstString + data_ptr), and the reserved-spelling set. std gains
cstring_len/from_cstring/to_cstring (fmt.sx, re-exported); the old
cstring(size) allocator helper is renamed alloc_string everywhere;
getenv migrates to (name: cstring) -> ?cstring as the canonical user
and env() drops its manual strlen/memcpy.
Pinned: examples/1222 (FFI both directions, literal coercion,
?cstring null paths, round trip) and examples/1173 (both coercion
diagnostics); FAIL pre-feature. The alloc_string rename + getenv
signature shift the .ir snapshots — regenerated. zig build test
426/426; run_examples 604/604.
Spec: reserved spelling + cstring section + C-interop rows.
Two genuine defects behind the 0128 filing (whose original repros were
both poisoned by binding getenv, which std already declares -> *u8):
1. Re-declaring a C symbol was silent first-wins: every call through
the later declaration was typed by the older signature. Foreign
registration now dedupes — equal signatures share one FuncId,
conflicting ones are diagnosed.
2. Foreign -> string / -> ?string returns read garbage: C returns one
char*, but the LLVM signature declared the fat {ptr,i64} (len =
register garbage), and ?string was mis-declared SRET (the hidden
out-pointer landed in the callee's first arg register). cstrRetKind
now classifies such returns, declares them as plain ptr (never
sret), and the call site synthesizes {ptr, strlen} via a
branch-guarded strlen (NULL -> {null,0} / optional null), wrapping
{string, i1} for ?string.
?[:0]u8 itself resolves fine (it is ?string); the spelling works in
return, param, local, and alias positions.
Regression: examples/1221 (plain + optional non-null + NULL paths) and
examples/1172 (conflict diagnostic); both FAIL pre-fix. The extern
dedupe collapses duplicate libc decls, so affected .ir snapshots were
regenerated. zig build test 426/426; run_examples 602/602;
distribution suite 21/21.
Surface rename of the signed integer family: s1..s64 become i1..i64
(u1..u64, usize, isize unchanged). 'string' keeps the s-prefix arm in
name classification; width parsing moves to the i-prefix arm next to
isize.
Internal TypeId tags follow the surface (.s8/.s16/.s32/.s64 ->
.i8/.i16/.i32/.i64), as do mono-key mangle fragments (ptr_i64,
tu_i64_bool) and all display/diagnostic formatting (i{d}).
Migrated in the same sweep: stdlib + examples + issue repros + FFI C
companions (shared symbol names like ffi_id_i64), expected
stdout/stderr/ir snapshots, specs.md, readme.md, CLAUDE.md/AGENTS.md,
implementation_plan.md, docs/, issue writeups. Vendored stb_image and
historical flow state left untouched.
zig build test: 426/426; examples suite: 595/595.
Two lowering sites materialized a local array as a whole LLVM value;
the legalizer scalarizes each such op into one SelectionDAG node per
element, and at ~64K elements the DAG combiner segfaults
(DAGCombiner::visitMERGE_VALUES → ReplaceAllUsesWith).
- lowerVarDecl: an array-typed `---` initializer emits NO store — the
slot stays uninitialized instead of receiving a whole-array undef
store. The tuple zero-init carve-out stays; non-array `---` keeps
the undef store. The interp is unchanged either way (slots start
.undef).
- lowerIndexExpr: element reads on an array with addressable storage
GEP the storage and load one element — the general-expression
sibling of 0110's lowerFor fix — without value-lowering the object
(a dead whole-array load would still reach the DAG). Storage-less
arrays keep the index_get fallback.
Sibling shape filed as 0125: any_to_string's per-array-type arms still
pass the array by value, so a 64K+ array type + any {} print crashes.
Regression: examples/0055-basic-large-stack-array.sx (sx build
segfaulted pre-fix). 22 .ir snapshots re-pinned: removed undef stores
and ig.tmp spills, in-place gep+load (instruction-shape-only churn,
reviewed).
The flat #import of mem.sx predated the namespace tail — the tail's
mem :: #import already puts mem.sx in the program graph, which is all
the ufcs helpers (context.allocator.create/alloc/free/clone) and the
CAllocator default-context machinery need; std.sx itself references no
mem name. Probe-verified the full mem surface + all gates: suite
588/588, zig build test 0, m3te 23/23, game builds + bundles. The
double import was also duplicating lowered IR — the 37 re-pinned .ir
snapshots net ~2.5k lines smaller; output streams byte-identical.
std.sx now contains only alias declarations (the re-export mechanism:
own decls carry one flat-import level) over three part-files: core.sx
(builtins, libc escape hatch, Source_Location/Allocator/Context/Into,
the reserved `string` decl — which needs and permits no alias), fmt.sx
(print/format/any_to_string/string ops/cstring/alloc_slice), list.sx
(List). The namespace tail is unchanged; the part-file namespaces
(core/fmt/list) carry alongside it. Consumer surface is byte-identical
— every bare prelude name resolves through the aliases (0120/0121
machinery). 37 .ir snapshots re-pinned: pure string-constant
renumbering from the changed import graph (digit-normalized diff is
empty). Gates: zig build test 426/426, suite 588/588, m3te 23/23,
game SxChess builds + bundles.
K : [4]s64 : .[...] and the untyped A :: .[1, 2, 3] register as
is_const globals: one storage, reads GEP it, dead-global elimination
drops unused ones, source-aware reads come free via selectGlobalAuthor.
- registerConstArrayGlobal (scanDecls pass 2): typed via the annotation
(array-ness + dimension/count checked), untyped via element-type
unification — all ints s64; ANY float promotes the element type to
f64 with ints converting exactly; bool/string homogeneous; a
non-numeric mix or non-inferable element asks for an annotation.
- constExprValue converts int elements into float destinations exactly
(the int+float promotion rule, element-wise).
- emitGlobals marks is_const globals LLVMSetGlobalConstant — also flips
the comptime-backed #run globals and __sx_default_context to
'constant' (37 pinned IR snapshots regenerated; runtime unchanged).
- Element shapes: nested arrays, struct elements, strings, bools.
Non-constant elements / dim mismatch / mixed types diagnose loudly.
Examples: 0177 (feature matrix incl. @K reads through *[4]s64 — needs
the 0117 fix), 1159/1160/1161 (diagnostics), 0837 repointed to values.
With 0115's own-wins globals landed, the remaining tail modules join
std.sx: every '#import "modules/std.sx"' now carries mem/xml/log/fs/
process/socket/json/cli/hash/test as namespaces (trace stays a direct
import).
Enablers in the same change:
- emit: dead-global elimination — a plain-data global no instruction
references is not emitted, so tail modules' data (hash's 64-entry K
table, OS/ARCH/POINTER_SIZE) stays out of binaries that don't use it.
Comptime-backed globals keep their #run evaluation. 37 pinned IR
snapshots regenerated (dead globals dropped + string renumbering from
the larger module).
- 1055/1056 stop pinning the global error-tag ordinal (it shifts with
program composition); they assert nonzero + tag identity + name.
- specs/readme/CLAUDE.md tail docs updated.
allocators/fs/process/socket/log/trace/test move under modules/std/
(allocators.sx becomes std/mem.sx; the Allocator protocol moves into
the std.sx prelude, impls stay in mem.sx). New std/xml.sx holds
xml_escape as xml.escape. std.sx gains the carried namespace tail —
flat-importing std.sx now also provides mem./xml./log. — with the
remaining modules (fs/process/socket/json/cli/hash/test) deferred from
the tail until the global last-wins maps are fully own-wins (pulling
them into every closure collides bare names corpus-wide; they stay
direct imports: modules/std/fs.sx etc.). log.sx's internal emit
renamed log_emit (it clobbered consumer fns named emit program-wide).
bundle.sx uses xml.escape via the carried alias. Consumer import paths
swept mechanically; .ir snapshots recaptured for the larger std
closure. m3te + game build unchanged.
An alloca built at its use site re-executes on every pass through that
block, and LLVM reclaims allocas only at ret — so loop-body locals,
nested-loop index slots, and emitter spill temps (ig.tmp, sret slots, ABI
coercion temps, byval materialization) grew the stack per iteration and
long loops segfaulted on stack exhaustion.
New LLVMEmitter.buildEntryAlloca inserts after existing entry-block
allocas and restores the builder position; every LLVMBuildAlloca site
reachable during instruction emission now routes through it.
Initialization stores stay at the use site (per-iteration re-init is
unchanged), and entry slots become mem2reg-promotable. The 35 .ir
snapshot diffs are pure alloca position moves (type multisets verified
identical per file).
Regression: examples/0047-basic-loop-local-stack-reuse.sx (segfaulted
pre-fix on both the 1M-iteration body-local loop and the 3M-iteration
nested loop).
`type_name` / `type_is_unsigned` on an `Any` argument unconditionally read
the Any's payload as a TypeId index. That is correct only when the Any holds
a Type value (`{ .any, tid }`); for an Any holding a runtime *value*
(`av : Any = 6`, tag s64, payload 6) it returned `types[6]` — `type_name(av)`
gave "u8" and `type_is_unsigned(av)` gave true.
Both backends now branch on the Any's runtime type-tag: tag == `.any` → the
box is a Type value, use the payload as the TypeId; otherwise the tag IS the
held value's type. So `type_name(av)` → "s64", `type_is_unsigned(av)` → false,
while `type_name(type_of(x))` still names the held type. The `{}` formatter is
unchanged (it already passed `type_of(val)`, a proper Type value).
- src/ir/interp.zig: shared `Value.reflectTypeId` tag-branching resolver; the
`type_name` / `type_is_unsigned` interp arms route through it.
- src/backend/llvm/ops.zig: shared `Ops.reflectArgTypeId` emits
extractvalue-tag / icmp-eq-.any / select for the runtime path; both
reflection arms route through it. The two backends agree.
- examples/0164-types-reflection-any-tag.sx: regression pinning type_name /
type_is_unsigned / print on an Any holding a value vs a Type.
- src/ir/interp.test.zig: unit test for `reflectTypeId`.
- 22 .ir snapshots: the new select appears in every std-importing program's
IR (any_to_string embeds these builtins) — benign, verified structurally
identical apart from the three new instructions.
- issues/0090, specs.md: documented the Any-tag rule.
Resolves issue 0090. The `{}` integer formatter mis-rendered both ends of
the 64-bit range:
- `int_to_string` computed the magnitude as `0 - n`, which overflows for
`s64::MIN` (its magnitude is unrepresentable as a positive s64) — the
value stayed negative, the digit loop ran zero times, so only `-`
printed. It now extracts digits straight from `n` (per-digit
`|n % 10|`, `n` truncating toward zero), never negating MIN.
- `any_to_string`'s `case int:` formatted every integer as s64, so a u64
all-ones value printed as `-1`. There was no `uint` type-category to
distinguish signedness. Added an additive `type_is_unsigned(T)`
reflection builtin (static fold + dynamic interp/LLVM paths, mirroring
`type_name`), backed by the new `TypeTable.isUnsignedInt` predicate, and
a `uint_to_string` formatter (unsigned decimal via long-division over
four 16-bit limbs). `case int:` routes through `type_is_unsigned(type)`.
The 16-bit-limb split is factored into a shared `decompose_u16x4`, now
reused by `int_to_hex_string` (no second unsigned-math routine).
Regression: examples/0046-basic-int-formatter-extremes pins both extremes
plus a width spread; unit tests cover `isUnsignedInt`. Docs (specs.md
representation note, readme std API) updated for unsigned/extreme `{}`
behavior. IR snapshots refreshed for the two new std functions.
A module-global initialized with an enum literal silently zero-initialized
to the first tag (`chosen : Color = .green` read back as `.red`), and an
enum tag inside a global array/struct was rejected as non-constant. The
constant serializer had no enum-literal arm.
Add `Lowering.constEnumLiteral`: serialize an enum literal to a
`ConstantValue.int` holding the variant's tag value, resolved against the
destination enum type and respecting explicit variant values; the global's
type drives the backing width at emit time. Wired into `globalInitValue`
(scalar global) and `constExprValue` (array element / struct field / nested
aggregate). A non-enum destination or unknown variant is diagnosed loudly,
never silently zero-initialized. The compiler-injected OS/ARCH globals now
serialize to their real `.unknown` tag (6 / 4); runtime reads are unchanged
(they resolve through comptime_constants), so only the static initializer in
the pinned .ir snapshots changes.
Remove the silent `func_ref => orelse LLVMConstNull` fallbacks in the LLVM
constant emitters: aggregate func_ref leaves carry a `require_resolved` flag
(transient null in Pass 0, loud diagnostic if still unresolved in the
Pass-1.5 re-emit), a top-level func_ref global is resolved in
initVtableGlobals, and the comptime (#run) path bails loudly instead of
emitting a null function pointer.
Regression: examples/0139-types-global-enum-literal-init.sx (scalar, array,
struct field, explicit-value enum u16 stride, struct-array with enum field);
negative: examples/1127-diagnostics-global-enum-literal-bad-variant.sx.
Mark issue 0082 RESOLVED.
Rename all example tests/companions to the XXXX-category-test-name scheme
(per-category 100-blocks: basic 0010, types 0100, ... errors 1000,
diagnostics 1100, ffi 1200, ffi-objc 1300, ffi-jni 1400, vectors 1500,
platform 1600). Companions and dir/C fixtures move in lockstep with their
parent test; #import/#source/#include paths rewritten to match.
Expected output now lives in examples/expected/ (a sibling dir of the
tests) split into three streams per the new convention:
<name>.exit / <name>.stdout / <name>.stderr (+ optional <name>.ir)
run_examples.sh rewritten: scans examples/ and issues/ for an
expected/<name>.exit marker, captures stdout and stderr separately (no
more 2>&1), compares each stream + exit + optional IR snapshot.
Behavior validated unchanged: every renamed test reproduces its prior
merged output + exit (diffs limited to file paths/basenames embedded in
diagnostics + traces, which correctly reflect the new names). Suite:
292 passed, 0 failed. 50-smoke.sx split + issue relocation + docs follow
in subsequent commits.