metal: pause step 3b pending sx-side fixes (filed 0024-0030)

Step 3b code is wired across UIRenderer + GlyphCache + UIPipeline +
chess game (gpu_mode = .metal on iOS, MetalGPU bound via the GPU
protocol). macOS GL chess, iOS-sim GLES chess, and iOS-sim Metal
triangle (63-metal-clear.sx) all still render.

iOS-sim Metal chess crashes inside replaceRegion while uploading the 1MB
font atlas. Bisecting that crash exposed several sx-language issues,
including mid-bisect tracers (NSLog inside if/else branch bodies) that
produced no output, which blocked further investigation.

Filing each finding as examples/issue-NNNN.sx rather than working
around piecemeal:

Bugs:
- 0024 NSLog/foreign-call inside if/else body not producing output
- 0025 C-ABI param coercion incomplete for composites >16B
       (combined direct-call abiCoerceParamType TODO + call_indirect
        path that doesn't apply C-ABI coercion at all)
- 0026 replaceRegion 1MB upload crash (likely downstream of 0025)

Features needed for step 4 + cleanup:
- 0027 Obj-C block bridge (^{...}) for animateWithDuration:
- 0028 Optional protocol box (?GPU = null) replaces T = ---; has_T: bool
- 0029 destroy_texture/buffer/shader on GPU protocol
- 0030 extern cross-file globals

Library-side: renderer.sx + glyph_cache.sx + pipeline.sx gain a
`gpu: GPU = ---; has_gpu: bool` field pair + branches that route every
GL touchpoint through the protocol when has_gpu. glyph_cache.init
saves/restores those fields around its memset. pipeline.set_gpu()
propagates to renderer + font. Renderer's MSL shader source added as
UI_MSL_SRC using packed_float2/packed_float4 to keep the 12-float
interleaved vertex layout tight (48 bytes).

metal.sx: dual-phase init (init(null, 0, 0) for eager device+queue,
re-init with the layer once UIKit installs the SxMetalView).
setStorageMode:.shared on every texture descriptor to ensure CPU-
writable atlas pixels on Apple Silicon iOS-sim.

Regression suite: 68 passing, 0 failed. WASM chess build currently
broken under step 3b state (silent compiler crash); documented in
CHECKPOINT.md, likely fallout from one of the filed issues (probably
0028 — the verbose protocol-box pattern). Step 3b resumes after
0024-0030 land.
agra
2026-05-17 21:17:17 +03:00
parent a938c4f900
commit a1647eab9b
11 changed files with 783 additions and 97 deletions

examples/issue-0025.sx Normal file

@@ -0,0 +1,94 @@
// issue-0025: Composite types larger than 16 bytes are passed without the
// LLVM `byval(<ty>)` attribute, and the `call_indirect` (fn-pointer cast)
// path doesn't apply C-ABI parameter coercion at all. Both gaps cause
// silent shape-mismatch when sx code calls foreign C functions that take
// large aggregates by value, OR when sx code calls a sx fn through a
// fn-pointer typed with a large-aggregate parameter.
//
// ── Two failing forms ─────────────────────────────────────────────────────
//
// (A) Direct call to a sx function with a >16B param:
//
// Wide :: struct { a: s64; b: s64; c: s64; d: s64; } // 32 bytes
// accept :: (w: Wide) -> s64 { w.a + w.b + w.c + w.d; }
// accept(Wide.{ a = 1, b = 10, c = 100, d = 1000 }) // expect 1111
//
// src/ir/emit_llvm.zig:2747-2795 (`abiCoerceParamType`):
// - <=8 bytes → coerced to i64
// - 9-16 bytes → coerced to [2 x i64]
// - >16 bytes → returns llvm_ty unchanged with TODO at line 2793
//
// The TODO is the bug: large composites should be passed indirectly,
// as a `ptr` to a caller-made copy. On x86-64 SysV that is a `ptr`
// with a `byval(struct.T)` LLVM attribute; on arm64 (AAPCS64) it is a
// plain `ptr` with no `byval`. LLVM does not pick this per target on
// its own, so the emitter has to. Today the struct is left as-is, and
// LLVM passes it across registers + stack slots in ways that don't
// match the C ABI.
//
// (B) Indirect call via fn-pointer cast (the `xx objc_msgSend` idiom):
//
// fn_ptr : (Wide) -> s64 = xx accept;
// fn_ptr(Wide.{ ... })
//
// src/ir/emit_llvm.zig:902-967 (`.call_indirect`): both the
// FunctionInfo-known arm (939-952) and the LLVMTypeOf-fallback arm
// (953-956) construct `param_tys[j]` WITHOUT routing through
// `abiCoerceParamType`. So even if (A) is fixed, fn-pointer-cast call
// sites still mis-marshal large composites.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Every `xx objc_msgSend` call site in library/modules/platform/uikit.sx
// + library/modules/gpu/metal.sx. Works in practice today only because:
// - We never pass aggregates >16 bytes by value through fn-pointer casts
// (workaround: declare param as `*T` + pass `@local`; arm64 AAPCS's
// indirect-by-ref happens to match this machine-state-wise).
// - HFAs (CGSize 2×f64, MTLClearColor 4×f64, CGRect 4×f64 as return)
// are correctly classified at emit_llvm.zig:2766-2779.
//
// ── Workarounds in use ─────────────────────────────────────────────────────
//
// library/modules/gpu/metal.sx declares MTLRegion (48B) + MTLScissorRect
// (32B) call sites with `*MTLRegion` / `*MTLScissorRect` and passes
// `@region` / `@rect`. Should not be needed once this issue is fixed.
//
// ── Fix sketch ─────────────────────────────────────────────────────────────
//
// (A) emit_llvm.zig:2793: return `ptr`. At call sites, alloca + memcpy +
// pass the alloca pointer. On targets whose C ABI expects the copy in
// the stack argument area (x86-64 SysV), also emit `byval(struct.T)`
// on the param via `LLVMCreateTypeAttribute` +
// `LLVMAddCallSiteAttribute`; on arm64 pass the plain pointer. Apply
// identically at function-definition emission so direct calls roundtrip.
//
// (B) emit_llvm.zig:902-967 — factor out a helper
// `coerceCallParams(param_count, src_args, dst_fn_param_tys)
// -> (coerced_args, coerced_tys)` that wraps `abiCoerceParamType`.
// Use the helper from both arms.
//
// Edge cases to preserve:
// - Variadic foreign functions (printf family) — variadic tail per
// AAPCS64 still passes composites in their natural form. Keep
// existing behavior for variadic args.
// - HFAs already handled at 2766-2779 — don't touch.
// - Structs <=8 bytes coerced to `i64`, 9-16 bytes to `[2 x i64]` —
// don't touch.

#import "modules/std.sx";

Wide :: struct {
    a: s64; b: s64; c: s64; d: s64;
}

accept :: (w: Wide) -> s64 {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };

    direct := accept(w);            // exercises path (A)
    if direct != 1111 { return 1; }

    fn_ptr : (Wide) -> s64 = xx accept;
    indirect := fn_ptr(w);          // exercises path (B)
    if indirect != 1111 { return 2; }

    0;
}
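
For comparison, the same repro can be sketched in C, where the platform C compiler already applies the by-value rules this issue asks sx to match. This is an illustrative sketch only, not the compiler's code: `classify_param`, `ParamClass`, `accept_wide`, `call_direct`, and `call_indirect` are hypothetical names, and the classifier only mirrors the three size classes listed in the issue text (it ignores the HFA special case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the sx `Wide` struct: four 64-bit integers, 32 bytes. */
typedef struct { int64_t a, b, c, d; } Wide;
static_assert(sizeof(Wide) == 32, "Wide must be 32 bytes");

/* Hypothetical mirror of the size classes described for
 * abiCoerceParamType. HFAs are deliberately not modelled here. */
typedef enum { COERCE_I64, COERCE_I64_PAIR, PASS_INDIRECT } ParamClass;

static ParamClass classify_param(size_t size_bytes) {
    if (size_bytes <= 8)  return COERCE_I64;       /* fits one register  */
    if (size_bytes <= 16) return COERCE_I64_PAIR;  /* fits two registers */
    return PASS_INDIRECT;                          /* the >16B case      */
}

/* Callee taking the 32-byte aggregate by value. */
static int64_t accept_wide(Wide w) { return w.a + w.b + w.c + w.d; }

/* Path (A): direct call. */
static int64_t call_direct(void) {
    Wide w = {1, 10, 100, 1000};
    return accept_wide(w);
}

/* Path (B): the same call through a function pointer. */
static int64_t call_indirect(void) {
    int64_t (*fn_ptr)(Wide) = accept_wide;
    Wide w = {1, 10, 100, 1000};
    return fn_ptr(w);
}
```

Both `call_direct()` and `call_indirect()` should return 1111 on any conforming C toolchain, which is the behaviour the sx test above expects once paths (A) and (B) are fixed. Compiling this file with `clang -S -emit-llvm` for the target in question is one way to see how that target's C ABI lowers the `Wide` parameter.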