metal: pause step 3b pending sx-side fixes (filed 0024-0030)

Step 3b code is wired across UIRenderer + GlyphCache + UIPipeline +
chess game (gpu_mode = .metal on iOS, MetalGPU bound via the GPU
protocol). macOS GL chess, iOS-sim GLES chess, and iOS-sim Metal
triangle (63-metal-clear.sx) all still render.

iOS-sim Metal chess crashes inside replaceRegion while uploading the 1MB
font atlas. Bisecting that crash exposed several sx-language issues:
mid-bisect tracers (NSLog inside if/else branch bodies) produced no
output, which blocked further investigation.

Filing each finding as examples/issue-NNNN.sx rather than working
around piecemeal:

Bugs:
- 0024 NSLog/foreign-call inside if/else body not producing output
       (may instead be unified-log buffer loss on crash; unconfirmed)
- 0025 C-ABI param coercion incomplete for composites >16B
       (combines the direct-call abiCoerceParamType TODO with the
        call_indirect path, which applies no C-ABI coercion at all)
- 0026 replaceRegion 1MB upload crash (likely downstream of 0025)

Features needed for step 4 + cleanup:
- 0027 Obj-C block bridge (^{...}) for animateWithDuration:
- 0028 Optional protocol box (?GPU = null) as an alternative to T = ---; has_T: bool
- 0029 destroy_texture/buffer/shader on GPU protocol
- 0030 extern cross-file globals

Library-side: renderer.sx + glyph_cache.sx + pipeline.sx gain a
`gpu: GPU = ---; has_gpu: bool` field pair + branches that route every
GL touchpoint through the protocol when has_gpu. glyph_cache.init
saves/restores those fields around its memset. pipeline.set_gpu()
propagates to renderer + font. Renderer's MSL shader source added as
UI_MSL_SRC using packed_float2/packed_float4 to keep the 12-float
interleaved vertex layout tight (48 bytes).

metal.sx: dual-phase init (init(null, 0, 0) for eager device+queue,
re-init with the layer once UIKit installs the SxMetalView).
setStorageMode:.shared on every texture descriptor to ensure CPU-
writable atlas pixels on Apple Silicon iOS-sim.

Regression suite: 68 passing, 0 failed. WASM chess build currently
broken under step 3b state (silent compiler crash); documented in
CHECKPOINT.md, likely fallout from one of the filed issues (probably
0028 — the verbose protocol-box pattern). Step 3b resumes after
0024-0030 land.
agra
2026-05-17 21:17:17 +03:00
parent a938c4f900
commit a1647eab9b
11 changed files with 783 additions and 97 deletions

90
examples/issue-0024.sx Normal file

@@ -0,0 +1,90 @@
// issue-0024: NSLog/foreign-side-effect calls placed as the FIRST statement
// of an `if X { ... } else { ... }` branch body do not produce visible
// output, even when the branch is provably taken (the SECOND statement in
// the same body — also a foreign call — does produce output).
//
// ── Observed iOS-side symptom (session 59 bisect) ─────────────────────────
//
// In library/modules/gpu/metal.sx's `metal_create_texture_ios`:
//
//     slot : TextureSlot = .{ tex = tex, bytes_per_pixel = bytes_per_pixel };
//     self.textures.append(slot);
//     NSLog(ns_string("[metal] T6 appended\n".ptr));       // ← fires
//
//     pixels_null := pixels == null;
//     if pixels_null {
//         NSLog(ns_string("[metal] T6b null\n".ptr));      // ← never fires
//     } else {
//         NSLog(ns_string("[metal] T6a non-null\n".ptr));  // ← never fires
//         handle : u32 = xx self.textures.len;
//         // ↓ DOES fire (its first NSLog at fn entry appears in the unified log)
//         metal_update_texture_region_ios(self, handle, 0, 0, w, h, pixels);
//         NSLog(ns_string("[metal] T7 done\n".ptr));       // ← (helper crashed before this)
//     }
//
// T6 appears in the iOS unified log. T6a/T6b never appear. The else
// branch's helper call DOES fire (its own first-statement NSLog inside
// the helper appears). So the else-branch IS entered; just its first
// NSLog statement produces no output.
//
// ── Pure-sx repro below does NOT trigger ───────────────────────────────────
//
// Running `sx run examples/issue-0024.sx` exits 0 (counter == 4 — all
// bumps fired). The bug only manifests with foreign calls (NSLog / ns_string),
// and possibly only when the process subsequently crashes (replaceRegion
// in the metal.sx case) — which raises the alternative hypothesis that
// the missing NSLog output is just iOS unified-logging buffer-loss on
// process death, not a sx compiler bug. The runtime sequence between T6
// and the crash was ~500μs; logs within ~1ms of an unhandled exception
// can be lost to OSLog's internal buffering on Apple Silicon iOS-sim.
//
// ── Investigation plan ─────────────────────────────────────────────────────
//
// Two paths to disambiguate:
// 1. Replace NSLog markers with `write(STDERR_FILENO, ...)` calls
// (synchronous, no OSLog involvement). If markers still don't appear:
// sx compiler bug — likely in src/ir/lower.zig:2166-2196 (the
// `is_value` branch of `lowerIfExpr` and downstream `lowerBlockValue`
// around 922-948). Possible: side-effecting leading statements
// dropped when branches are treated as values.
// 2. If markers DO appear with synchronous write: the iOS-side symptom
// is unified-logging buffer-loss, not a compiler bug. Close this issue
// as "wontfix — diagnostic limitation" and move the iOS debugging to
// foreign-write tracing.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Bisecting issue-0026 (replaceRegion crash) is currently blocked: without
// trustworthy markers inside if/else branches we can't tell which arg
// arrives wrong. Resolution unblocks step 3b of the Metal port.
#import "modules/std.sx";
counter : s64 = 0;
bump :: () { counter = counter + 1; }
probe :: (skip: bool) {
    bump();
    if skip {
        bump();
        bump();
    } else {
        bump();
        bump();
    }
    bump();
}
main :: () -> s32 {
    probe(false);
    // counter == 4 (entry + 2 in false branch + exit) → exit 0
    if counter == 4 then 0 else 1;
}
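
The synchronous tracer from step 1 of the issue-0024 investigation plan can be sketched in C as below. This is a minimal sketch, not code from the sx tree: `trace_marker_fd` and `trace_marker` are hypothetical names, and the sx-side foreign binding for `write` is left out because its syntax is not shown in this commit. The point is only that `write(2)` has no user-space buffer, so a marker is handed to the descriptor before the call returns, unlike a logging path that may still hold the message when the process dies.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of sx or metal.sx): write `msg` to `fd`
 * with write(2), looping over short writes. Returns 0 on success and
 * -1 on a real error. */
int trace_marker_fd(int fd, const char *msg) {
    size_t left = strlen(msg);
    while (left > 0) {
        ssize_t n = write(fd, msg, left);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted: retry */
            return -1;                     /* real error */
        }
        msg += n;
        left -= (size_t)n;
    }
    return 0;
}

/* Convenience wrapper for the STDERR_FILENO case named in the plan. */
int trace_marker(const char *msg) {
    return trace_marker_fd(STDERR_FILENO, msg);
}
```

A call such as `trace_marker("[metal] T6a non-null\n")` as the first statement of each branch would then separate the two hypotheses: a marker that still goes missing points at the compiler, a marker that shows up points at log buffering.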

94
examples/issue-0025.sx Normal file

@@ -0,0 +1,94 @@
// issue-0025: Composite types larger than 16 bytes are passed without the
// LLVM `byval(<ty>)` attribute, and the `call_indirect` (fn-pointer cast)
// path doesn't apply C-ABI parameter coercion at all. Both gaps cause
// silent shape-mismatch when sx code calls foreign C functions that take
// large aggregates by value, OR when sx code calls a sx fn through a
// fn-pointer typed with a large-aggregate parameter.
//
// ── Two failing forms ─────────────────────────────────────────────────────
//
// (A) Direct call to a sx function with a >16B param:
//
// Wide :: struct { a: s64; b: s64; c: s64; d: s64; } // 32 bytes
// accept :: (w: Wide) -> s64 { w.a + w.b + w.c + w.d; }
// accept(Wide.{ a = 1, b = 10, c = 100, d = 1000 }) // expect 1111
//
// src/ir/emit_llvm.zig:2747-2795 (`abiCoerceParamType`):
// - <=8 bytes → coerced to i64
// - 9-16 bytes → coerced to [2 x i64]
// - >16 bytes → returns llvm_ty unchanged with TODO at line 2793
//
// The TODO is the bug: large composites must be passed indirectly:
// `ptr` + `byval(struct.T)` on x86-64 SysV, a plain `ptr` to a
// caller-made copy on arm64 (AAPCS64 uses no `byval`). Today the
// struct is left as-is, which LLVM tries to pass across registers +
// stack slots in ways that don't match the C ABI promise.
//
// (B) Indirect call via fn-pointer cast (the `xx objc_msgSend` idiom):
//
// fn_ptr : (Wide) -> s64 = xx accept;
// fn_ptr(Wide.{ ... })
//
// src/ir/emit_llvm.zig:902-967 (`.call_indirect`): both the
// FunctionInfo-known arm (939-952) and the LLVMTypeOf-fallback arm
// (953-956) construct `param_tys[j]` WITHOUT routing through
// `abiCoerceParamType`. So even if (A) is fixed, fn-pointer-cast call
// sites still mis-marshal large composites.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Every `xx objc_msgSend` call site in library/modules/platform/uikit.sx
// + library/modules/gpu/metal.sx. Works in practice today only because:
// - We never pass aggregates >16 bytes by value through fn-pointer casts
// (workaround: declare param as `*T` + pass `@local`; arm64 AAPCS's
// indirect-by-ref happens to match this machine-state-wise).
// - HFAs (CGSize 2×f64, MTLClearColor 4×f64, CGRect 4×f64 as return)
// are correctly classified at emit_llvm.zig:2766-2779.
//
// ── Workarounds in use ─────────────────────────────────────────────────────
//
// library/modules/gpu/metal.sx declares MTLRegion (48B) + MTLScissorRect
// (32B) call sites with `*MTLRegion` / `*MTLScissorRect` and passes
// `@region` / `@rect`. Should not be needed once this issue is fixed.
//
// ── Fix sketch ─────────────────────────────────────────────────────────────
//
// (A) emit_llvm.zig:2793 — return `ptr` and emit `byval(struct.T)` on
// the param via `LLVMAddCallSiteAttribute` / `LLVMCreateTypeAttribute`.
// At call sites, alloca + memcpy + pass the alloca pointer. Apply
// identically at function-definition emission so direct calls roundtrip.
//
// (B) emit_llvm.zig:902-967 — factor out a helper
// `coerceCallParams(param_count, src_args, dst_fn_param_tys)
// -> (coerced_args, coerced_tys)` that wraps `abiCoerceParamType`.
// Use the helper from both arms.
//
// Edge cases to preserve:
// - Variadic foreign functions (printf family) — variadic tail per
// AAPCS64 still passes composites in their natural form. Keep
// existing behavior for variadic args.
// - HFAs already handled at 2766-2779 — don't touch.
// - Structs <=8 bytes coerced to `i64`, 9-16 bytes to `[2 x i64]` —
// don't touch.
#import "modules/std.sx";
Wide :: struct {
    a: s64; b: s64; c: s64; d: s64;
}
accept :: (w: Wide) -> s64 {
    w.a + w.b + w.c + w.d;
}
main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    direct := accept(w); // exercises path (A)
    if direct != 1111 { return 1; }
    fn_ptr : (Wide) -> s64 = xx accept;
    indirect := fn_ptr(w); // exercises path (B)
    if indirect != 1111 { return 2; }
    0;
}
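
For comparison, the same two call shapes from issue-0025 in C, where the compiler applies the platform ABI on both sides. This is a sketch under stated assumptions: `accept_wide`, `call_through_pointer`, `ParamClass` and `classify_aggregate` are hypothetical names, and the classifier only mirrors the size tiers quoted in the issue text (it ignores HFAs and every other rule of a real ABI).

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the sx `Wide` struct: four 64-bit fields, 32 bytes. */
typedef struct { int64_t a, b, c, d; } Wide;

/* Path (A): a by-value aggregate parameter, called directly. */
int64_t accept_wide(Wide w) { return w.a + w.b + w.c + w.d; }

/* Path (B): the same call made through a function pointer. */
int64_t call_through_pointer(int64_t (*fn)(Wide), Wide w) { return fn(w); }

/* Hypothetical size tiers, following the issue text only. */
typedef enum { PASS_I64, PASS_I64_PAIR, PASS_INDIRECT } ParamClass;

ParamClass classify_aggregate(size_t size_bytes) {
    if (size_bytes <= 8)  return PASS_I64;       /* fits one register */
    if (size_bytes <= 16) return PASS_I64_PAIR;  /* fits two registers */
    return PASS_INDIRECT;                        /* the >16B case with the TODO */
}
```

Both paths must agree on how `Wide` travels. That is why the issue asks for one shared coercion step rather than a fix in the direct-call arm alone.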

68
examples/issue-0026.sx Normal file

@@ -0,0 +1,68 @@
// issue-0026: Chess game on iOS-sim with `plat.gpu_mode = .metal` crashes
// inside `[MTLTexture replaceRegion:mipmapLevel:withBytes:bytesPerRow:]`
// when uploading the 1024×1024 R8 font atlas. The 1×1 RGBA8 white tex
// through the SAME code path (metal_update_texture_region_ios in
// library/modules/gpu/metal.sx) works.
//
// Blocked on issue-0024 (NSLog inside if/else not firing — or unified-log
// buffer loss on crash; investigation pending) — without a trustworthy
// tracer we can't reliably bisect which arg arrives wrong. Most likely
// cause: this is downstream of issue-0025's ABI gaps (MTLRegion is 48
// bytes and goes through `xx objc_msgSend` cast, which is the
// call_indirect path that issue-0025 part B covers).
//
// ── Reproduction recipe ───────────────────────────────────────────────────
//
// cd /Users/agra/projects/game
// /Users/agra/projects/sx/zig-out/bin/sx build --target ios-sim main.sx \
// --bundle sx-out/ios/SxChess.app --bundle-id co.swipelab.sxchess \
// -F ~/Library/Frameworks
// cp -R assets sx-out/ios/SxChess.app/
// codesign --force --sign - --timestamp=none sx-out/ios/SxChess.app
// xcrun simctl install booted sx-out/ios/SxChess.app
// xcrun simctl launch --terminate-running-process booted co.swipelab.sxchess
// sleep 4 && xcrun simctl io booted screenshot /tmp/sx-chess.png
//
// Expected (after fix): chess board renders via Metal.
// Observed: the app launches, then returns immediately to the home screen
// without drawing anything. The simpler examples/63-metal-clear.sx demo
// still renders the colored triangle on the same sim, so the Metal
// pipeline itself works for small uploads.
//
// ── Candidate root causes (in priority order) ─────────────────────────────
//
// 1. issue-0025 fallout (most likely): MTLRegion (48 B by value) passed
// via the *MTLRegion workaround. The call_indirect path (issue-0025
// part B) doesn't ABI-coerce, so the pointer-shaped declaration may
// not actually pass the address in the right register slot for that
// call site shape (6 args, including the indirect aggregate).
//
// 2. iOS-sim Metal-driver limitation: `setStorageMode:.shared` may not be
// honored for r8 textures of this size; default may be `.private`
// which precludes CPU-side replaceRegion. Workaround would be to
// upload via `MTLBuffer` + `MTLBlitCommandEncoder` (newBufferWithBytes
// + copyFromBuffer:sourceOffset:sourceBytesPerRow:...:toTexture:...).
//
// 3. sx-side `xx` cast bug: bytes_per_row : u64 = xx (u32_expr) may
// truncate or sign-extend incorrectly. Less likely (the math comes
// out to 1024, which fits in any width).
//
// ── How to resolve ────────────────────────────────────────────────────────
//
// After issues 0024 + 0025 land:
// 1. Re-add the trace NSLog markers ("[metal] U1..U5" in
// metal_update_texture_region_ios) — now they should actually print.
// 2. Re-build + relaunch chess on iOS-sim.
// 3. If U5 fires after U4 (no crash inside msg_replace), the bug was
// ABI-related; declare success and rename this file to
// examples/NN-metal-large-region-upload.sx (next free NN).
// 4. If U4 → crash persists, fall back to the MTLBuffer + blit
// encoder path in metal.sx's create_texture (when pixels != null,
// allocate a temporary MTLBuffer with newBufferWithBytes:length:options:
// then run a one-shot command buffer with a MTLBlitCommandEncoder
// copying the buffer into the texture). This is the Apple-recommended
// approach for large texture initial-uploads.
#import "modules/std.sx";
main :: () -> s32 { 0; }
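
The numbers quoted in issue-0026 (a 48-byte region, 1024 bytes per row, a 1MB upload) can be checked with a small C sketch. The struct names below are hypothetical stand-ins for Metal's MTLOrigin / MTLSize / MTLRegion, and the sketch assumes a 64-bit `NSUInteger`, as on arm64 and x86-64. It does not call Metal.

```c
#include <stdint.h>

/* Stand-ins for MTLOrigin, MTLSize and MTLRegion (assumed layout:
 * six 64-bit unsigned fields). */
typedef struct { uint64_t x, y, z; } OriginSketch;
typedef struct { uint64_t width, height, depth; } SizeSketch;
typedef struct { OriginSketch origin; SizeSketch size; } RegionSketch;

/* A 2D region: z origin 0 and depth 1. */
RegionSketch region_2d(uint64_t x, uint64_t y, uint64_t w, uint64_t h) {
    RegionSketch r = { { x, y, 0 }, { w, h, 1 } };
    return r;
}

/* Widen before multiplying so the product cannot wrap in 32 bits. */
uint64_t bytes_per_row(uint32_t width, uint32_t bytes_per_pixel) {
    return (uint64_t)width * (uint64_t)bytes_per_pixel;
}

/* Total bytes handed to the upload call for a tightly packed image. */
uint64_t upload_size(uint32_t width, uint32_t height, uint32_t bytes_per_pixel) {
    return bytes_per_row(width, bytes_per_pixel) * (uint64_t)height;
}
```

For the 1024×1024 R8 atlas this gives 1024 bytes per row and 1048576 bytes in total; the 1×1 RGBA8 texture that works is 4 bytes. The region struct is the same 48 bytes in both cases, so a pure size-of-struct ABI fault would be expected to affect both uploads.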

50
examples/issue-0027.sx Normal file

@@ -0,0 +1,50 @@
// issue-0027: Feature — support Obj-C blocks (^{...}) so sx code can call
// APIs that take a block parameter. Required for step 4 of the Metal port
// (keyboard lockstep via `[UIView animateWithDuration:animations:^{...}]`),
// and broadly useful for any UIKit/AppKit API.
//
// ── Proposed surface ──────────────────────────────────────────────────────
//
// Option A — comptime intrinsic that wraps a sx closure as a block:
//
// block := objc_block(@my_closure); // returns *void (an id<Block>)
// msg_block(view, sel, 0.3, block); // pass like any id arg
//
// Internals: emit a Block_literal struct constant with the right invoke
// fn pointer, isa, flags, descriptor pointer. Approximately what clang
// generates for ^{...}.
//
// Option B — surface-level syntax `^{ ... }` that lowers to Option A
// automatically. Cleaner for users; more parser work.
//
// Recommended: start with Option A (intrinsic). Migrate to Option B once
// the codegen path is proven.
//
// ── Implementation sketch ────────────────────────────────────────────────
//
// 1. New `library/modules/std/objc_block.sx` defining the Block_literal
// struct that mirrors clang's layout (isa, flags, reserved, invoke fn
// pointer, descriptor pointer).
// 2. `objc_block(fn_or_closure) -> *void` intrinsic that builds the
// literal at the call site. Initial implementation can be a
// stack-allocated block (_NSConcreteStackBlock); upgrade to
// heap-promoted (_Block_copy) once block lifetime exceeds the call.
// 3. Link libSystem's symbols `_NSConcreteStackBlock` and
// `_NSConcreteGlobalBlock` (auto on iOS; may need `#library "System"`
// on macOS).
// 4. (Deferred) surface syntax `^{ ... }` — parser hook + lowering
// to the intrinsic. Must not clash with bitwise XOR `^`.
//
// ── References ────────────────────────────────────────────────────────────
//
// - Apple block ABI spec (clang's "Block Implementation Specification")
// - _NSConcreteStackBlock + _NSConcreteGlobalBlock from libSystem
//
// ── Real-world impact ─────────────────────────────────────────────────────
//
// Without this, the keyboard inset cannot be animated in lockstep with the
// keyboard slide. See library/modules/platform/uikit.sx's
// uikit_keyboard_will_change_frame comments for the deferred lockstep work.
#import "modules/std.sx";
main :: () -> s32 { 0; }
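
The block literal described in issue-0027's implementation sketch has roughly the shape below, written in C. Field order follows clang's Block Implementation Specification (isa, flags, reserved, invoke, descriptor, then captured variables). Everything else is an assumption for illustration: the struct and function names are hypothetical, and `isa` is passed in as a parameter so the sketch builds without libSystem. A real block would point `isa` at `_NSConcreteStackBlock` or `_NSConcreteGlobalBlock`.

```c
#include <stddef.h>

/* Minimal descriptor: reserved word plus the literal's total size. */
struct BlockDescriptor {
    unsigned long reserved;
    unsigned long size;
};

/* Block header followed by one captured variable. */
struct CounterBlock {
    void *isa;
    int flags;
    int reserved;
    void (*invoke)(void *block);   /* first argument is the block itself */
    struct BlockDescriptor *descriptor;
    int *counter;                  /* captured variable */
};

/* Body of the block: bump the captured counter. */
static void counter_invoke(void *block) {
    struct CounterBlock *b = block;
    *b->counter += 1;
}

static struct BlockDescriptor counter_descriptor = {
    0, sizeof(struct CounterBlock)
};

/* Build the literal by value, as a stack block would be. */
struct CounterBlock make_counter_block(void *isa, int *counter) {
    struct CounterBlock b = {
        isa, 0, 0, counter_invoke, &counter_descriptor, counter
    };
    return b;
}

/* What a callee does with a `void (^)(void)` block: call invoke(block). */
void call_block(void *block) {
    struct CounterBlock *b = block;  /* only header fields are read here */
    b->invoke(block);
}
```

With `isa` left null the literal is only usable by `call_block` in this sketch, not by the Obj-C runtime, which inspects `isa` and `flags` when it copies a block.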

53
examples/issue-0028.sx Normal file

@@ -0,0 +1,53 @@
// issue-0028: Feature — make protocol boxes assignable to an optional
// type so callers can spell "no GPU bound" as `?GPU = null` instead of
// the verbose `T = ---; has_T: bool` pattern.
//
// ── Current pattern (verbose) ─────────────────────────────────────────────
//
// gpu: GPU = ---;
// has_gpu: bool = false;
// ...
// if self.has_gpu { self.gpu.create_shader(...); }
//
// ── Proposed pattern ──────────────────────────────────────────────────────
//
// gpu: ?GPU = null;
// ...
// if self.gpu != null { self.gpu.create_shader(...); }
//
// ── Where the verbose pattern lives today ─────────────────────────────────
//
// library/modules/ui/renderer.sx — UIRenderer.gpu + has_gpu
// library/modules/ui/glyph_cache.sx — GlyphCache.gpu + has_gpu
// library/modules/ui/pipeline.sx — UIPipeline.gpu + has_gpu (+ set_gpu)
// library/modules/platform/uikit.sx — UIKitPlatform.frame_closure +
// has_frame_closure (Closure type,
// same pattern but on a closure)
//
// ── Implementation sketch ─────────────────────────────────────────────────
//
// Protocol boxes are 2-pointer structs ({vtable, ctx} or {ctx, fn_ptrs...}
// depending on the inline-vs-vtable shape — see src/ir/lower.zig
// `buildProtocolValue` ~7800-7869). `?T` for these can use `vtable_ptr ==
// null` (or `ctx == null`, depending on layout choice) as the "none"
// sentinel — no extra storage needed. This matches the existing
// optional-closure handling at src/ir/emit_llvm.zig where `?Closure` uses
// `fn_ptr == null` as none.
//
// Approach:
// 1. Extend `?T` type construction to accept T being a protocol type.
// Files: src/ir/types.zig + src/ir/lower.zig (type-resolution).
// 2. Implement `optional_wrap` / `optional_unwrap` /
// `optional_has_value` for protocol-typed payloads in
// src/ir/emit_llvm.zig — model after the closure-optional path.
// 3. Keep the existing `T = ---; has_T: bool` pattern working — the
// new `?T` is additive, not a replacement. Don't churn existing
// files (uikit.sx's frame_closure pattern stays).
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// `?T` syntax already exists for primitives + pointers. Extending to
// protocols is a type-system change; no new surface syntax needed.
#import "modules/std.sx";
main :: () -> s32 { 0; }
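
The null-sentinel idea in issue-0028 can be illustrated in C, assuming the `{vtable, ctx}` layout. This is a sketch only: `GpuVTable`, `GpuBox`, `FAKE_GPU` and the helper functions are hypothetical, and the single `create_shader` entry stands in for the real protocol.

```c
#include <stdbool.h>
#include <stddef.h>

/* One-method stand-in for the GPU protocol's vtable. */
typedef struct {
    int (*create_shader)(void *ctx, const char *src);
} GpuVTable;

/* Two-pointer protocol box; vtable == NULL encodes "none". */
typedef struct {
    const GpuVTable *vtable;
    void *ctx;
} GpuBox;

GpuBox gpu_none(void) { GpuBox b = { NULL, NULL }; return b; }
GpuBox gpu_some(const GpuVTable *vt, void *ctx) { GpuBox b = { vt, ctx }; return b; }
bool gpu_has_value(GpuBox b) { return b.vtable != NULL; }

/* Example backend: counts calls through its ctx pointer. */
static int fake_create_shader(void *ctx, const char *src) {
    (void)src;
    int *calls = ctx;
    *calls += 1;
    return *calls;
}
const GpuVTable FAKE_GPU = { fake_create_shader };

/* Caller side: the equivalent of `if self.gpu != null { ... }`. */
int create_shader_if_bound(GpuBox gpu, const char *src) {
    if (!gpu_has_value(gpu)) return 0;
    return gpu.vtable->create_shader(gpu.ctx, src);
}
```

The optional costs no extra storage: the box stays two pointers wide, and the separate `has_gpu` flag goes away.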

47
examples/issue-0029.sx Normal file

@@ -0,0 +1,47 @@
// issue-0029: Feature — add explicit destructors to the GPU protocol so
// resources can be freed without leaking.
//
// ── Proposed additions to library/modules/gpu/api.sx ──────────────────────
//
// destroy_shader :: (h: ShaderHandle);
// destroy_buffer :: (h: BufferHandle);
// destroy_texture :: (h: TextureHandle);
//
// ── Why ────────────────────────────────────────────────────────────────────
//
// Today, library/modules/ui/glyph_cache.sx's `grow()` method recreates
// the atlas texture at a larger size but has no way to release the old
// one — see the comment in metal.sx that explicitly notes the leak. The
// GL path uses glDeleteTextures(1, @self.texture_id); the GPU protocol
// has no equivalent yet.
//
// ── Implementation notes ──────────────────────────────────────────────────
//
// Metal backend: send `release` to the MTLTexture / MTLBuffer /
// MTLRenderPipelineState (or call CFRelease, since these are
// CFTypeRef-compatible). Clear the corresponding slot in
// MetalGPU.textures / buffers / shaders to `null` / 0.
//
// GL backend (future): glDeleteTextures / glDeleteBuffers / glDeleteProgram.
//
// Handle lifecycle: destroy releases the resource and marks its slot in
// the backend List as `null`. Caller's handles remain valid until destroy.
// A later version may hand destroyed slots to new allocations; the MVP
// should not. Keep handles append-only with a `null` marker for destroyed
// entries (matches the current shape).
//
// ── Touch points ──────────────────────────────────────────────────────────
//
// library/modules/gpu/api.sx — add 3 protocol method signatures
// library/modules/gpu/metal.sx — implement them (release + null
// the slot)
// library/modules/ui/glyph_cache.sx — call destroy_texture(old_handle)
// in grow() before creating the
// new atlas
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// None — straight protocol-method addition.
#import "modules/std.sx";
main :: () -> s32 { 0; }
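
The append-only handle lifecycle from issue-0029 can be sketched in C as follows. The names (`TextureTable`, `texture_add`, `texture_get`, `texture_destroy`) and the fixed capacity are hypothetical. The 1-based handle is an assumption taken from the metal.sx snippet quoted in issue-0024, where the handle is the list length after the append. Releasing the backend object is left to the caller, since that part is Metal-specific.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TEXTURES 16

/* Backend object pointers; NULL marks a destroyed entry. */
typedef struct {
    void *slots[MAX_TEXTURES];
    uint32_t len;
} TextureTable;

/* Append a resource; returns a 1-based handle, or 0 on failure. */
uint32_t texture_add(TextureTable *t, void *resource) {
    if (resource == NULL || t->len == MAX_TEXTURES) return 0;
    t->slots[t->len] = resource;
    t->len += 1;
    return t->len;
}

/* Look up a handle; NULL for invalid or destroyed handles. */
void *texture_get(const TextureTable *t, uint32_t handle) {
    if (handle == 0 || handle > t->len) return NULL;
    return t->slots[handle - 1];
}

/* Null the slot and return the object the caller should release.
 * Returns NULL if the handle is invalid or already destroyed. */
void *texture_destroy(TextureTable *t, uint32_t handle) {
    void *res = texture_get(t, handle);
    if (res != NULL) t->slots[handle - 1] = NULL;
    return res;
}
```

Because slots are never reused, a stale handle can only ever resolve to `NULL`, never to a different texture, and a double destroy is a harmless no-op.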

57
examples/issue-0030.sx Normal file

@@ -0,0 +1,57 @@
// issue-0030: Feature — support `extern` global declarations so a global
// declared in one sx source file can be referenced from another without
// parameter threading.
//
// ── Use case from the Metal port ──────────────────────────────────────────
//
// // game/main.sx
// g_metal_gpu : *MetalGPU = null;
//
// // game/chess/pieces.sx
// extern g_metal_gpu : *MetalGPU;
//
// load :: (self: *ChessPieces, path: [:0]u8) {
// ...
// inline if OS == .ios {
// tex := g_metal_gpu.create_texture(w, h, .rgba8, xx pixels);
// } else {
// // GL path
// }
// }
//
// Today, pieces.load takes `has_gpu: bool, gpu: GPU` parameters and
// game/main.sx threads them through. Cross-file `extern` globals would
// let us drop those parameters.
//
// ── Implementation sketch ─────────────────────────────────────────────────
//
// Mirror how foreign function declarations work — declared in one file,
// defined elsewhere, linker resolves. Globals already have first-class
// addresses in the IR; just add an "extern" flag that says "don't emit
// storage, emit a reference."
//
// Files:
// - parser (sx surface syntax for `extern G : T;`)
// - src/ir/lower.zig (record an extern global stub that resolves at
// module-link time)
// - src/ir/emit_llvm.zig (emit an `external` LLVM global)
//
// ── Syntax constraint ─────────────────────────────────────────────────────
//
// `extern G : T;` is a NEW top-level form. Must not clash with:
// - `G :: T;` (type alias)
// - `G : T = ---;` (uninitialized global with explicit type)
// - `G : T;` (does this currently parse as anything?)
//
// The parser MUST reject `extern G : T = expr;` — extern cannot have an
// initializer (the definition lives elsewhere).
//
// ── Caveat ────────────────────────────────────────────────────────────────
//
// Encourages spaghetti globals. Documentation should steer callers toward
// explicit parameter passing where reasonable. Useful for genuine
// process-singletons (the active GPU, the active platform, etc.) where
// threading them through every call site is more noise than signal.
#import "modules/std.sx";
main :: () -> s32 { 0; }
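
C's `extern` is the model that issue-0030's sketch leans on: a declaration without storage in one place, and exactly one definition elsewhere. The sketch below keeps both halves in one translation unit so it builds on its own. In the real use case they would sit in separate files and the linker would join them. `MetalGpuSketch` and `pieces_load` are hypothetical stand-ins.

```c
#include <stddef.h>

/* Stand-in for the MetalGPU type; only a counter for the demo. */
typedef struct { int texture_count; } MetalGpuSketch;

/* "pieces" side: declaration only, no storage and no initializer. */
extern MetalGpuSketch *g_metal_gpu;

/* Uses the global instead of taking the GPU as a parameter.
 * Returns -1 when no GPU is bound, else the new texture count. */
int pieces_load(void) {
    if (g_metal_gpu == NULL) return -1;
    g_metal_gpu->texture_count += 1;
    return g_metal_gpu->texture_count;
}

/* "main" side: the single definition, which owns the storage. */
MetalGpuSketch *g_metal_gpu = NULL;
```

The rule that an extern declaration cannot carry an initializer maps onto the parser constraint in the issue: the initializer belongs to the definition alone.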