abi: pass >16B aggregates by ptr-in-next-reg (Apple ARM64 ABI) + Path B for fn-ptr casts
Three stacked compiler bugs were causing the iOS-sim chess game to crash inside [MTLTexture replaceRegion:...]. With all three fixed, every replaceRegion call site succeeds (1×1 RGBA8, 1 MB R8 atlas, 440×440 chess pieces).

Path B for callconv(.c) fn-pointer casts:
- FunctionInfo now carries call_conv: CallConv (TypeInfo.CallConv), so function-type interning distinguishes sx-CC from C-CC. Inst.zig's Function.CallingConvention aliases the same enum.
- The parser accepts an optional `callconv(.c)` suffix on fn-pointer type spellings (factored into parseOptionalCallConv(), shared with parseFnDecl and parseLambda).
- resolveFunctionType passes the parsed CC through functionTypeCC().
- .call_indirect checks fp.call_conv == .c and, when it is set, applies the C-ABI alloca+materialize to >16B aggregate args (the same behaviour Path A has at .call).

Apple ARM64 ABI (drop LLVM byval):
- A side-by-side asm diff against clang's output for the equivalent C call site showed that LLVM's `byval` attribute makes the Apple arm64 backend pass the struct copy on the stack, while clang passes a pointer to the copy in the next integer register (x2 for replaceRegion:). The runtime objc_msgSend dispatch path expects clang's convention.
- Dropped the byval attribute from function-signature emission and from both call sites (.call and .call_indirect). The materialize-into-alloca + pass-plain-ptr pattern stays, so the call site now matches clang's `mov x2, sp` exactly.
- Path A's sx-to-sx case keeps working because both ends use a plain ptr: the caller does alloca+store+pass, and the callee loads from the ptr in its prologue.

Protocol dispatch (emitProtocolDispatch):
- An untargeted `null` lowers as const_null with type .void (from `target_type orelse .void`). The wrap-value-in-alloca-and-pass-pointer branch then alloca'd a void slot, which trips an assertion in LLVM's IRBuilder: EXC_BREAKPOINT in getTypeSizeInBits, surfacing as exit 133 (SIGTRAP) while building the chess game. Fixed by re-emitting the value as constNull(void_ptr) when arg_ty == .void && expected_ty == void_ptr.
- is_pointer_ty only recognized .pointer, so [*]T (many_pointer) was alloca-wrapped: the heap pixel pointer from stbi_load was stored into a stack slot, and the slot's address was passed as the *void arg. Fixed by extending the check to `.pointer or .many_pointer`.

metal.sx call sites + lifecycle guards:
- msg_replace (replaceRegion:, MTLRegion = 48B) and the two setScissorRect: sites (MTLScissorRect = 32B) now spell their fn-pointer types with by-value params + callconv(.c); the *MTLRegion/@local workaround is gone.
- metal_begin_frame_ios bails out before nextDrawable when pixel_w/h are 0 (a 0×0 drawableSize makes nextDrawable abort via XPC).
- metal_init_ios only sets drawableSize when both dimensions are positive.
- begin_frame's encoder/cmd_buffer failure paths now clear self.drawable, so a partial failure doesn't leak a drawable back into the pool.

Examples + tests:
- examples/86-callconv-c-fnptr-large-aggregate.sx: new, covers Path B with a C-CC fn-ptr cast.
- examples/87-fnptr-cast-large-aggregate.sx: renamed from issue-0025.sx, covers Path B with the default sx-CC (the negative case).
- examples/85-cc-c-large-aggregate.sx: from Session 60, covers Path A.
- examples/issue-0014.sx, issue-0024.sx, issue-0025.sx: removed (resolved earlier in this work).

71 regression tests pass, 0 failed. The chess game builds cleanly for the iOS simulator and reaches its frame loop without aborting.

Runtime: the chess UI still doesn't render. The remaining issue is in the UIKit lifecycle / CAMetalLayer setup (legacy-app vs scene-API hybrid), not a compiler bug. See current/CHECKPOINT.md, "Next step", for the diagnosis and options.
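The convention the Apple ARM64 section relies on (a non-HFA composite larger than 16 bytes is copied by the caller, and only a pointer to that copy is passed, in the next free integer register) can be modelled in plain C by lowering the call by hand. This is a sketch for intuition, not the compiler's code: `struct Wide`, `accept_by_value`, `accept_lowered` and `call_lowered` are hypothetical names that mirror the sx examples in this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: all names here are hypothetical and mirror the sx
   examples in this commit. */

/* 32 bytes of integers: not an HFA, and larger than 16 bytes, so AAPCS64
   passes it indirectly. */
struct Wide { int64_t a, b, c, d; };
static_assert(sizeof(struct Wide) == 32, "Wide must be larger than 16 bytes");

/* What the source says: the struct is passed by value. */
int64_t accept_by_value(struct Wide w) {
    return w.a + w.b + w.c + w.d;
}

/* Roughly what the ABI turns the callee into: it receives a pointer to a
   copy that the caller owns. */
int64_t accept_lowered(struct Wide *w) {
    int64_t sum = w->a + w->b + w->c + w->d;
    w->a = -1; /* scratch write: lands in the caller's temporary only */
    return sum;
}

/* Roughly what the ABI turns the caller into: materialize a temporary
   (the "alloca"), copy the argument into it, pass the temporary's address. */
int64_t call_lowered(const struct Wide *original) {
    struct Wide tmp = *original;
    return accept_lowered(&tmp);
}
```

Because the temporary is caller-owned, the callee may even write through the pointer without affecting the original `w`, which is what keeps the by-value semantics intact.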
examples/85-cc-c-large-aggregate.sx (new file, 31 lines)
@@ -0,0 +1,31 @@
// Regression test for issue-0025 path A.
//
// sx functions declared with `callconv(.c)` that take a composite > 16 bytes
// by value must marshal the arg the way AAPCS64 requires: the caller copies
// the struct to an alloca and passes the alloca's address as a plain `ptr`
// (no `byval`, matching clang on Apple arm64), and the callee's entry block
// loads the struct back from the pointer.
//
// Before the fix, abiCoerceParamType returned the raw LLVM struct type for
// >16-byte composites (TODO at src/ir/emit_llvm.zig:2793), so the C ABI
// promise was silently violated whenever sx-emitted C-callable code
// interoperated with a real C caller.

#import "modules/std.sx";

Wide :: struct {
    a: s64;
    b: s64;
    c: s64;
    d: s64;
}

accept_c :: (w: Wide) -> s64 callconv(.c) {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    if accept_c(w) != 1111 { return 1; }
    0;
}
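A plain-C counterpart of this test is handy for the kind of side-by-side asm comparison the commit message describes. The sketch below mirrors the sx file; `run_example_85` is a hypothetical stand-in for the sx `main`. Compiled for an arm64 Apple target without optimization (for example `clang -S -O0 --target=arm64-apple-macos`), clang is expected to copy `w` into a stack temporary and pass its address in x0, the first argument register, but check the emitted assembly rather than relying on this description.

```c
#include <assert.h>
#include <stdint.h>

/* C analogue of examples/85-cc-c-large-aggregate.sx; run_example_85 is a
   hypothetical stand-in for the sx main(). */
struct Wide { int64_t a, b, c, d; };
static_assert(sizeof(struct Wide) == 32, "Wide must be larger than 16 bytes");

/* Direct call, >16-byte aggregate passed by value (Path A's shape). */
int64_t accept_c(struct Wide w) {
    return w.a + w.b + w.c + w.d;
}

/* Returns 0 on success and 1 on a wrong sum, like the sx test's main(). */
int run_example_85(void) {
    struct Wide w = { .a = 1, .b = 10, .c = 100, .d = 1000 };
    if (accept_c(w) != 1111) { return 1; }
    return 0;
}
```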
examples/86-callconv-c-fnptr-large-aggregate.sx (new file, 37 lines)
@@ -0,0 +1,37 @@
// Regression test for issue-0025 path B.
//
// When a fn-pointer's type is spelled with `callconv(.c)`, the indirect
// call must apply the same C-ABI coercion that direct C-ABI calls get at
// the call site (path A): >16-byte non-HFA aggregates are copied to an
// alloca and passed as a plain `ptr` to the copy. Without the fix, the
// indirect call site builds an LLVM function type whose param slot is the
// raw struct, which the AArch64/x86_64 backend spreads across registers and
// stack in a way that doesn't match the callee's `ptr` parameter; the
// callee then reads garbage out of the wrong machine-state slots.
//
// The opt-in is the `callconv(.c)` on the fn-pointer type spelling.
// Pure-sx fn-pointer casts (no callconv suffix) keep their default
// calling convention, as verified by examples/87-fnptr-cast-large-aggregate.sx.

#import "modules/std.sx";

Wide :: struct {
    a: s64;
    b: s64;
    c: s64;
    d: s64;
}

accept_c :: (w: Wide) -> s64 callconv(.c) {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    if accept_c(w) != 1111 { return 1; }

    fn_ptr : (Wide) -> s64 callconv(.c) = xx accept_c;
    if fn_ptr(w) != 1111 { return 2; }

    0;
}
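In C, absent calling-convention attributes, a function-pointer type uses the same default convention as the function it points to, so an indirect call marshals a 32-byte argument exactly like a direct one. The sketch below is the C analogue of this test; `wide_fn` and `run_example_86` are hypothetical names. In sx the same guarantee holds only when the fn-pointer type is spelled with `callconv(.c)`.

```c
#include <assert.h>
#include <stdint.h>

/* C analogue of examples/86-callconv-c-fnptr-large-aggregate.sx; wide_fn and
   run_example_86 are hypothetical names. */
struct Wide { int64_t a, b, c, d; };
static_assert(sizeof(struct Wide) == 32, "Wide must be larger than 16 bytes");

int64_t accept_c(struct Wide w) {
    return w.a + w.b + w.c + w.d;
}

/* Pointer type with the same by-value signature. It shares the default C
   convention with accept_c, so the indirect call passes w the same way. */
typedef int64_t (*wide_fn)(struct Wide);

/* Returns 0 on success, or 1/2 for the failing call, like the sx main(). */
int run_example_86(void) {
    struct Wide w = { .a = 1, .b = 10, .c = 100, .d = 1000 };
    if (accept_c(w) != 1111) { return 1; }

    wide_fn fn_ptr = accept_c; /* no cast needed: the types already match */
    if (fn_ptr(w) != 1111) { return 2; }
    return 0;
}
```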
examples/87-fnptr-cast-large-aggregate.sx (new file, 29 lines)
@@ -0,0 +1,29 @@
// Pure-sx fn-pointer cast: a function-pointer typed without `callconv(.c)`
// keeps the default (sx) calling convention. Passing a >16-byte aggregate
// through that pointer must not get the C-ABI indirect-pointer coercion:
// the sx-CC callee expects the struct as an SSA value, not as a `ptr`.
//
// Pair with examples/86-callconv-c-fnptr-large-aggregate.sx, which covers
// the opposite arm (a fn-pointer typed `callconv(.c)` does get the coercion).

#import "modules/std.sx";

Wide :: struct {
    a: s64; b: s64; c: s64; d: s64;
}

accept :: (w: Wide) -> s64 {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    direct := accept(w);
    if direct != 1111 { return 1; }

    fn_ptr : (Wide) -> s64 = xx accept;
    indirect := fn_ptr(w);
    if indirect != 1111 { return 2; }

    0;
}
examples/issue-0014.sx (deleted, 15 lines)
@@ -1,15 +0,0 @@
// issue-0014: Feature request: {{{ CONTENT_HASH }}} template variable for wasm shell
//
// When targeting wasm, the compiler processes shell.html and substitutes
// {{{ SCRIPT }}} with the <script> tag. Add a {{{ CONTENT_HASH }}} variable
// that is a short hash (e.g. 8-char hex) of the final build outputs
// (.js + .wasm + .data), so the shell can use it for cache busting:
//
//   <script>
//     Module.locateFile=function(path){return path+'?v={{{ CONTENT_HASH }}}'};
//   </script>
//   <script async src="index.js?v={{{ CONTENT_HASH }}}"></script>
//
// This lets browsers cache until the next build, then bust automatically.
// No changes needed to build.sx or modules/compiler.sx; just the compiler
// recognizing the new placeholder during shell template substitution.
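One way to produce the 8-character hex value this issue asks for is a 32-bit FNV-1a hash streamed over each build output in turn. This is a hypothetical sketch: the issue does not name a hash function, and `content_hash_update` / `content_hash_hex` are illustrative names, not compiler APIs. FNV-1a is not collision-resistant, which is acceptable for cache busting but not for integrity checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a parameters. Hypothetical choice of hash for CONTENT_HASH. */
#define FNV32_OFFSET_BASIS 0x811c9dc5u
#define FNV32_PRIME        0x01000193u

/* Fold one buffer (e.g. the bytes of index.js) into a running hash.
   Start from FNV32_OFFSET_BASIS, then feed .js, .wasm and .data in a
   fixed order to get one hash over all outputs. */
uint32_t content_hash_update(uint32_t h, const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= FNV32_PRIME;
    }
    return h;
}

/* Format as exactly 8 lowercase hex digits plus a NUL terminator. */
void content_hash_hex(uint32_t h, char out[9]) {
    snprintf(out, 9, "%08x", (unsigned)h);
}
```

Streaming the files one after another gives the same result as hashing their concatenation, so the outputs never need to be loaded into one buffer.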
examples/issue-0024.sx (deleted, 90 lines)
@@ -1,90 +0,0 @@
// issue-0024: NSLog/foreign-side-effect calls placed as the FIRST statement
// of an `if X { ... } else { ... }` branch body do not produce visible
// output, even when the branch is provably taken (the SECOND statement in
// the same body, also a foreign call, does produce output).
//
// ── Observed iOS-side symptom (session 59 bisect) ─────────────────────────
//
// In library/modules/gpu/metal.sx's `metal_create_texture_ios`:
//
//     slot : TextureSlot = .{ tex = tex, bytes_per_pixel = bytes_per_pixel };
//     self.textures.append(slot);
//     NSLog(ns_string("[metal] T6 appended\n".ptr));        // ← fires
//
//     pixels_null := pixels == null;
//     if pixels_null {
//         NSLog(ns_string("[metal] T6b null\n".ptr));       // ← never fires
//     } else {
//         NSLog(ns_string("[metal] T6a non-null\n".ptr));   // ← never fires
//         handle : u32 = xx self.textures.len;
//         metal_update_texture_region_ios(self, handle, 0, 0, w, h, pixels);
//                                                           // ← DOES fire
//                                                           //   (its first
//                                                           //   NSLog at
//                                                           //   fn entry
//                                                           //   appears in
//                                                           //   the unified
//                                                           //   log)
//         NSLog(ns_string("[metal] T7 done\n".ptr));        // ← (helper crashed
//                                                           //   before this)
//     }
//
// T6 appears in the iOS unified log. T6a/T6b never appear. The else
// branch's helper call DOES fire (its own first-statement NSLog inside
// the helper appears). So the else-branch IS entered; just its first
// NSLog statement produces no output.
//
// ── Pure-sx repro below does NOT trigger ───────────────────────────────────
//
// Running `sx run examples/issue-0024.sx` exits 0 (counter == 4: all
// bumps fired). The bug only manifests with foreign calls (NSLog / ns_string),
// and possibly only when the process subsequently crashes (replaceRegion
// in the metal.sx case), which raises the alternative hypothesis that
// the missing NSLog output is just iOS unified-logging buffer-loss on
// process death, not a sx compiler bug. The runtime sequence between T6
// and the crash was ~500μs; logs within ~1ms of an unhandled exception
// can be lost to OSLog's internal buffering on Apple Silicon iOS-sim.
//
// ── Investigation plan ─────────────────────────────────────────────────────
//
// Two paths to disambiguate:
//   1. Replace NSLog markers with `write(STDERR_FILENO, ...)` calls
//      (synchronous, no OSLog involvement). If markers still don't appear:
//      sx compiler bug, likely in src/ir/lower.zig:2166-2196 (the
//      `is_value` branch of `lowerIfExpr` and downstream `lowerBlockValue`
//      around 922-948). Possible: side-effecting leading statements
//      dropped when branches are treated as values.
//   2. If markers DO appear with synchronous write: the iOS-side symptom
//      is unified-logging buffer-loss, not a compiler bug. Close this issue
//      as "wontfix: diagnostic limitation" and move the iOS debugging to
//      foreign-write tracing.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Bisecting issue-0026 (replaceRegion crash) is currently blocked: without
// trustworthy markers inside if/else branches we can't tell which arg
// arrives wrong. Resolution unblocks step 3b of the Metal port.

#import "modules/std.sx";

counter : s64 = 0;

bump :: () { counter = counter + 1; }

probe :: (skip: bool) {
    bump();
    if skip {
        bump();
        bump();
    } else {
        bump();
        bump();
    }
    bump();
}

main :: () -> s32 {
    probe(false);
    // counter == 4 (entry + 2 in false branch + exit) → exit 0
    if counter == 4 then 0 else 1;
}
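Step 1 of the investigation plan calls for markers written with `write(STDERR_FILENO, ...)` instead of NSLog. A minimal POSIX sketch of such a marker follows; `trace_mark` is a hypothetical helper name, not something in metal.sx. It bypasses both OSLog and stdio buffering, so a marker that executed has already been handed to the kernel by the time the call returns.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `msg` to stderr with write(2), with no OSLog
   and no stdio buffer in between. Retries on partial writes and EINTR.
   Returns 0 on success, -1 on error. */
int trace_mark(const char *msg) {
    assert(msg != NULL);
    size_t len = strlen(msg);
    while (len > 0) {
        ssize_t n = write(STDERR_FILENO, msg, len);
        if (n < 0 && errno == EINTR) { continue; }
        if (n <= 0) { return -1; }
        msg += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Usage would be a drop-in replacement for each marker, e.g. `trace_mark("[metal] T6a non-null\n");` as the first statement of the else branch.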
examples/issue-0025.sx (deleted, 94 lines)
@@ -1,94 +0,0 @@
// issue-0025: Composite types larger than 16 bytes are passed without the
// LLVM `byval(<ty>)` attribute, and the `call_indirect` (fn-pointer cast)
// path doesn't apply C-ABI parameter coercion at all. Both gaps cause
// silent shape-mismatch when sx code calls foreign C functions that take
// large aggregates by value, OR when sx code calls a sx fn through a
// fn-pointer typed with a large-aggregate parameter.
//
// ── Two failing forms ─────────────────────────────────────────────────────
//
// (A) Direct call to a sx function with a >16B param:
//
//     Wide :: struct { a: s64; b: s64; c: s64; d: s64; }    // 32 bytes
//     accept :: (w: Wide) -> s64 { w.a + w.b + w.c + w.d; }
//     accept(Wide.{ a = 1, b = 10, c = 100, d = 1000 })     // expect 1111
//
//     src/ir/emit_llvm.zig:2747-2795 (`abiCoerceParamType`):
//       - <=8 bytes  → coerced to i64
//       - 9-16 bytes → coerced to [2 x i64]
//       - >16 bytes  → returns llvm_ty unchanged with TODO at line 2793
//
//     The TODO is the bug: large composites should be coerced to `ptr`
//     with a `byval(struct.T)` LLVM attribute. LLVM's mid-end then
//     materializes the right machine code per target. Today the struct
//     is left as-is, which LLVM tries to pass across registers + stack
//     slots in ways that don't match the C ABI promise.
//
// (B) Indirect call via fn-pointer cast (the `xx objc_msgSend` idiom):
//
//     fn_ptr : (Wide) -> s64 = xx accept;
//     fn_ptr(Wide.{ ... })
//
//     src/ir/emit_llvm.zig:902-967 (`.call_indirect`): both the
//     FunctionInfo-known arm (939-952) and the LLVMTypeOf-fallback arm
//     (953-956) construct `param_tys[j]` WITHOUT routing through
//     `abiCoerceParamType`. So even if (A) is fixed, fn-pointer-cast call
//     sites still mis-marshal large composites.
//
// ── Real-world impact ──────────────────────────────────────────────────────
//
// Every `xx objc_msgSend` call site in library/modules/platform/uikit.sx
// + library/modules/gpu/metal.sx. Works in practice today only because:
//   - We never pass aggregates >16 bytes by value through fn-pointer casts
//     (workaround: declare param as `*T` + pass `@local`; arm64 AAPCS's
//     indirect-by-ref happens to match this machine-state-wise).
//   - HFAs (CGSize 2×f64, MTLClearColor 4×f64, CGRect 4×f64 as return)
//     are correctly classified at emit_llvm.zig:2766-2779.
//
// ── Workarounds in use ─────────────────────────────────────────────────────
//
// library/modules/gpu/metal.sx declares MTLRegion (48B) + MTLScissorRect
// (32B) call sites with `*MTLRegion` / `*MTLScissorRect` and passes
// `@region` / `@rect`. Should not be needed once this issue is fixed.
//
// ── Fix sketch ─────────────────────────────────────────────────────────────
//
// (A) emit_llvm.zig:2793: return `ptr` and emit `byval(struct.T)` on
//     the param via `LLVMAddCallSiteAttribute` / `LLVMCreateTypeAttribute`.
//     At call sites, alloca + memcpy + pass the alloca pointer. Apply
//     identically at function-definition emission so direct calls roundtrip.
//
// (B) emit_llvm.zig:902-967: factor out a helper
//     `coerceCallParams(param_count, src_args, dst_fn_param_tys)
//        -> (coerced_args, coerced_tys)` that wraps `abiCoerceParamType`.
//     Use the helper from both arms.
//
// Edge cases to preserve:
//   - Variadic foreign functions (printf family): the variadic tail per
//     AAPCS64 still passes composites in their natural form. Keep
//     existing behavior for variadic args.
//   - HFAs already handled at 2766-2779: don't touch.
//   - Structs <=8 bytes coerced to `i64`, 9-16 bytes to `[2 x i64]`:
//     don't touch.

#import "modules/std.sx";

Wide :: struct {
    a: s64; b: s64; c: s64; d: s64;
}

accept :: (w: Wide) -> s64 {
    w.a + w.b + w.c + w.d;
}

main :: () -> s32 {
    w := Wide.{ a = 1, b = 10, c = 100, d = 1000 };
    direct := accept(w);        // exercises path (A)
    if direct != 1111 { return 1; }

    fn_ptr : (Wide) -> s64 = xx accept;
    indirect := fn_ptr(w);      // exercises path (B)
    if indirect != 1111 { return 2; }

    0;
}
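The size rules quoted in this issue (8 bytes or less → `i64`, 9 to 16 bytes → `[2 x i64]`, more than 16 bytes → indirect, with HFAs handled separately and first) amount to a small decision function. This sketch models the AAPCS64 rule for non-variadic arguments; it is not the compiler's `abiCoerceParamType`, the enum and function names are hypothetical, and the caller is assumed to have already determined whether the type is an HFA (one to four members of the same floating-point type). Per the commit message, the indirect case is now emitted as a plain `ptr` without `byval`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of how a composite argument travels under AAPCS64
   (non-variadic case). Not the compiler's abiCoerceParamType. */
enum arg_class {
    ARG_HFA,       /* 1-4 same-typed float/double members: FP/SIMD registers */
    ARG_ONE_GPR,   /* <= 8 bytes: one integer register (LLVM: i64) */
    ARG_TWO_GPRS,  /* 9-16 bytes: two integer registers (LLVM: [2 x i64]) */
    ARG_INDIRECT,  /* > 16 bytes: pointer to a caller-made copy */
};

enum arg_class classify_composite(size_t size, bool is_hfa) {
    if (is_hfa) { return ARG_HFA; }  /* checked first: four doubles is 32 bytes */
    if (size <= 8) { return ARG_ONE_GPR; }
    if (size <= 16) { return ARG_TWO_GPRS; }
    return ARG_INDIRECT;
}

/* Shapes from the discussion above (same sizes as the Metal types). */
struct Wide { int64_t a, b, c, d; };                     /* 32 B, not an HFA */
struct ClearColor { double red, green, blue, alpha; };   /* 32 B HFA, like MTLClearColor */
struct Region { uint64_t origin[3]; uint64_t size[3]; }; /* 48 B, like MTLRegion */
static_assert(sizeof(struct Region) == 48, "Region must match MTLRegion's size");
```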